[01:15:01] (PS6) Mstyles: dockerfiles: Add Java sonarcloud implementation [integration/config] - https://gerrit.wikimedia.org/r/556513 (https://phabricator.wikimedia.org/T238004)
[01:15:03] (PS1) Mstyles: jib: Add Java codehealth project and job-template [integration/config] - https://gerrit.wikimedia.org/r/557154 (https://phabricator.wikimedia.org/T238004)
[01:15:48] (CR) jerkins-bot: [V: -1] jib: Add Java codehealth project and job-template [integration/config] - https://gerrit.wikimedia.org/r/557154 (https://phabricator.wikimedia.org/T238004) (owner: Mstyles)
[06:11:22] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused
[06:11:22] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused
[06:25:38] PROBLEM - Puppet errors on deployment-wdqs01 is CRITICAL: (Service Check Timed Out)
[06:25:46] PROBLEM - Puppet staleness on deployment-hadoop-test-3 is CRITICAL: (Service Check Timed Out)
[10:13:07] (Abandoned) Legoktm: More aggressively skip things based on which stages are run [integration/quibble] - https://gerrit.wikimedia.org/r/438084 (owner: Legoktm)
[11:49:04] (PS6) Awight: [WIP] Sniff for undocumented, unchecked @throws annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/556654 (https://phabricator.wikimedia.org/T240672)
[12:26:14] (PS5) Daimona Eaytoy: phpunit deprecations: Handle assertType() as well [tools/codesniffer] - https://gerrit.wikimedia.org/r/557051 (https://phabricator.wikimedia.org/T192167)
[12:30:18] (CR) jerkins-bot: [V: -1] phpunit deprecations: Handle assertType() as well [tools/codesniffer] - https://gerrit.wikimedia.org/r/557051 (https://phabricator.wikimedia.org/T192167) (owner: Daimona Eaytoy)
[12:47:03] (PS6) Daimona Eaytoy: phpunit deprecations: Handle assertType() as well [tools/codesniffer] - https://gerrit.wikimedia.org/r/557051 (https://phabricator.wikimedia.org/T192167)
[12:51:44] (CR) jerkins-bot: [V: -1] phpunit deprecations: Handle assertType() as well [tools/codesniffer] - https://gerrit.wikimedia.org/r/557051 (https://phabricator.wikimedia.org/T192167) (owner: Daimona Eaytoy)
[12:53:45] (PS7) Awight: Sniff for undocumented, unchecked @throws annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/556654 (https://phabricator.wikimedia.org/T240672)
[12:58:16] (PS8) Awight: Sniff for undocumented, unchecked @throws annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/556654 (https://phabricator.wikimedia.org/T240672)
[14:18:56] (CR) Awight: "I'm not sure what the standards are for this sort of thing, but it seems useful to include a boolean configuration knob to detect *all* an" [tools/codesniffer] - https://gerrit.wikimedia.org/r/556654 (https://phabricator.wikimedia.org/T240672) (owner: Awight)
[14:38:46] (PS1) Awight: [WIP] Knob to forbid any unchecked exception annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/557253 (https://phabricator.wikimedia.org/T240672)
[19:07:27] (PS1) Jayprakash12345: Register bookreader for tox-docker tests [integration/config] - https://gerrit.wikimedia.org/r/557275
[19:09:51] (PS2) Jayprakash12345: Register bookreader for tox-docker tests [integration/config] - https://gerrit.wikimedia.org/r/557275
[19:10:55] (CR) Jayprakash12345: "tox.ini added in https://gerrit.wikimedia.org/r/#/c/labs/tools/bookreader/+/557274/" [integration/config] - https://gerrit.wikimedia.org/r/557275 (owner: Jayprakash12345)
[19:44:51] (CR) Jforrester: "Is this good to go?" [integration/config] - https://gerrit.wikimedia.org/r/554859 (owner: Hashar)
[19:45:16] (CR) Jforrester: [C: -1] "Not ready yet." [integration/config] - https://gerrit.wikimedia.org/r/554326 (https://phabricator.wikimedia.org/T233012) (owner: Jforrester)
[19:47:48] (CR) Jforrester: [C: +2] jjb: Add back codehealth messages and adjust success/failure pattern [integration/config] - https://gerrit.wikimedia.org/r/557138 (https://phabricator.wikimedia.org/T217008) (owner: Kosta Harlan)
[19:48:38] (Merged) jenkins-bot: jjb: Add back codehealth messages and adjust success/failure pattern [integration/config] - https://gerrit.wikimedia.org/r/557138 (https://phabricator.wikimedia.org/T217008) (owner: Kosta Harlan)
[19:48:40] (CR) Jforrester: [C: +2] Register bookreader for tox-docker tests [integration/config] - https://gerrit.wikimedia.org/r/557275 (owner: Jayprakash12345)
[19:49:26] !log Zuul: Deploying https://gerrit.wikimedia.org/r/557138
[19:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:49:29] (Merged) jenkins-bot: Register bookreader for tox-docker tests [integration/config] - https://gerrit.wikimedia.org/r/557275 (owner: Jayprakash12345)
[19:49:46] !log Zuul: Deploying https://gerrit.wikimedia.org/r/557275
[19:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:52:16] (PS4) Legoktm: Whitelist SLF Bot for extension-SecureLinkFixer [integration/config] - https://gerrit.wikimedia.org/r/539724 (owner: MarcoAurelio)
[20:01:15] I think gerrit-replica is down
[20:01:36] paladox: around?
[20:01:43] yup
[20:02:01] T240763
[20:02:02] T240763: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763
[20:02:03] oh, it's indeed down
[20:02:17] not sure why
[20:02:25] thcipriani ^
[20:07:01] I disabled libup for now
[20:15:08] Thanks, legoktm.
[20:35:20] (CR) Legoktm: Begin migration to pytest, use it as test runner (6 comments) [integration/config] - https://gerrit.wikimedia.org/r/525209 (owner: Legoktm)
[20:36:51] legoktm: Hah, I didn't mean to nerd-snipe you into doing it immediately! :_)
[20:37:31] Also, I imagine the rebase will be non-trivial.
[20:40:01] I think I might be better off just doing it from scratch again
[20:40:05] :-(
[20:40:16] But yeah, I run into that state sometimes.
[20:41:53] it's just the first patch that's bad. the follow-ups are split well enough it should be reasonable to rebase them
[21:12:30] fixed puppet on deployment-acme-chief0[34] by updating structure of authdns_servers hiera data
[21:14:48] (Abandoned) Legoktm: tests: Further parametrize test_dockerfiles::test_has_files [integration/config] - https://gerrit.wikimedia.org/r/525475 (owner: Legoktm)
[21:17:45] fixed puppet on deployment-puppetdb02 by specifying profile::puppetdb::elk_logging hieradata
[21:18:52] (PS2) Legoktm: Begin migration to pytest, use it as test runner [integration/config] - https://gerrit.wikimedia.org/r/525209
[21:18:54] (PS2) Legoktm: Port test_zuul_doc_functions to pytest [integration/config] - https://gerrit.wikimedia.org/r/525210
[21:18:56] (PS2) Legoktm: Port test_zuul_set_gated_extensions to pytest [integration/config] - https://gerrit.wikimedia.org/r/525211
[21:18:58] (PS3) Legoktm: Port test_files_structure to pytest [integration/config] - https://gerrit.wikimedia.org/r/525212
[21:19:00] (PS2) Legoktm: Port test_zuul_mw_dependencies to pytest [integration/config] - https://gerrit.wikimedia.org/r/525479
[21:19:10] rebased all but 3 patches
[21:19:23] those can be done later though
[21:19:44] (CR) jerkins-bot: [V: -1] Begin migration to pytest, use it as test runner [integration/config] - https://gerrit.wikimedia.org/r/525209 (owner: Legoktm)
[21:20:10] (CR) jerkins-bot: [V: -1] Port test_zuul_doc_functions to pytest [integration/config] - https://gerrit.wikimedia.org/r/525210 (owner: Legoktm)
[21:20:17] (CR) jerkins-bot: [V: -1] Port test_zuul_set_gated_extensions to pytest [integration/config] - https://gerrit.wikimedia.org/r/525211 (owner: Legoktm)
[21:20:17] * legoktm -> food
[21:20:23] set profile::backup::director_seed: '' in deployment-prep project hiera to fix puppet errors on deployment-webperf12, deployment-mwmaint01, and deployment-deploy0[12]
[21:20:23] (CR) jerkins-bot: [V: -1] Port test_files_structure to pytest [integration/config] - https://gerrit.wikimedia.org/r/525212 (owner: Legoktm)
[21:20:34] (CR) jerkins-bot: [V: -1] Port test_zuul_mw_dependencies to pytest [integration/config] - https://gerrit.wikimedia.org/r/525479 (owner: Legoktm)
[21:20:49] (we have profile::backup::enable: false in labs.yaml, maybe that should also set director_seed to empty string)
[21:50:10] PROBLEM - Puppet staleness on deployment-db05 is CRITICAL: (Service Check Timed Out)
[21:50:23] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: (Service Check Timed Out)
[21:51:30] PROBLEM - Puppet errors on deployment-acme-chief03 is CRITICAL: (Service Check Timed Out)
[21:51:57] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: (Service Check Timed Out)
[21:51:58] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: (Service Check Timed Out)
[21:52:13] PROBLEM - Puppet errors on integration-agent-docker-1006 is CRITICAL: (Service Check Timed Out)
[21:53:00] PROBLEM - Puppet staleness on integration-agent-jessie-docker-1001 is CRITICAL: (Service Check Timed Out)
[21:54:01] PROBLEM - Puppet staleness on deployment-sca04 is CRITICAL: (Service Check Timed Out)
[21:54:25] PROBLEM - Puppet errors on integration-agent-docker-1016 is CRITICAL: (Service Check Timed Out)
[21:54:48] PROBLEM - Puppet staleness on deployment-puppetdb02 is CRITICAL: (Service Check Timed Out)
[21:55:04] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-parsoid10 is CRITICAL: (Service Check Timed Out)
[21:55:13] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:55:35] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: (Service Check Timed Out)
[21:56:09] PROBLEM - Free space - all mounts on deployment-urldownloader02 is CRITICAL: (Service Check Timed Out)
[21:56:19] PROBLEM - Free space - all mounts on deployment-hadoop-test-1 is CRITICAL: (Service Check Timed Out)
[21:56:28] PROBLEM - Puppet errors on integration-cumin is CRITICAL: (Service Check Timed Out)
[21:56:48] what's shinken's problem?
[21:56:58] PROBLEM - Free space - all mounts on deployment-sca01 is CRITICAL: (Service Check Timed Out)
[21:58:04] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: (Service Check Timed Out)
[21:58:36] PROBLEM - Puppet staleness on integration-agent-docker-1003 is CRITICAL: (Service Check Timed Out)
[21:59:02] PROBLEM - Puppet staleness on deployment-deploy02 is CRITICAL: (Service Check Timed Out)
[21:59:23] PROBLEM - Puppet staleness on deployment-hadoop-test-2 is CRITICAL: (Service Check Timed Out)
[21:59:51] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: (Service Check Timed Out)
[21:59:58] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-parsoid10 is OK: HTTP OK: HTTP/1.1 200 OK - 49272 bytes in 1.707 second response time
[22:00:02] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 49248 bytes in 0.803 second response time
[22:00:25] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 37031 bytes in 0.599 second response time
[22:00:59] RECOVERY - Free space - all mounts on deployment-urldownloader02 is OK: OK: All targets OK
[22:01:08] RECOVERY - Free space - all mounts on deployment-hadoop-test-1 is OK: OK: All targets OK
[22:01:50] PROBLEM - Puppet errors on integration-agent-pkgbuilder-1002 is CRITICAL: (Service Check Timed Out)
[22:02:24] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: (Service Check Timed Out)
[22:02:36] (PS3) Legoktm: Begin migration to pytest, use it as test runner [integration/config] - https://gerrit.wikimedia.org/r/525209
[22:02:38] (PS3) Legoktm: Port test_zuul_doc_functions to pytest [integration/config] - https://gerrit.wikimedia.org/r/525210
[22:02:40] (PS3) Legoktm: Port test_zuul_set_gated_extensions to pytest [integration/config] - https://gerrit.wikimedia.org/r/525211
[22:02:42] (PS4) Legoktm: Port test_files_structure to pytest [integration/config] - https://gerrit.wikimedia.org/r/525212
[22:02:43] PROBLEM - Puppet staleness on integration-agent-docker-1004 is CRITICAL: (Service Check Timed Out)
[22:02:44] (PS3) Legoktm: Port test_zuul_mw_dependencies to pytest [integration/config] - https://gerrit.wikimedia.org/r/525479
[22:05:33] PROBLEM - Puppet errors on deployment-puppetdb02 is CRITICAL: (Service Check Timed Out)
[22:06:05] PROBLEM - Free space - all mounts on deployment-zookeeper02 is CRITICAL: (Service Check Timed Out)
[22:07:39] PROBLEM - Puppet staleness on deployment-jobrunner03 is CRITICAL: (Service Check Timed Out)
[22:07:50] PROBLEM - Puppet staleness on deployment-acme-chief04 is CRITICAL: (Service Check Timed Out)
[22:08:41] RECOVERY - Free space - all mounts on deployment-ms-be05 is OK: OK: All targets OK
[22:09:42] PROBLEM - Puppet errors on integration-agent-puppet-docker-1001 is CRITICAL: (Service Check Timed Out)
[22:09:58] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: (Service Check Timed Out)
[22:10:07] PROBLEM - Puppet errors on deployment-mediawiki-09 is CRITICAL: (Service Check Timed Out)
[22:10:21] PROBLEM - Puppet staleness on deployment-fluorine02 is CRITICAL: (Service Check Timed Out)
[22:10:59] RECOVERY - Free space - all mounts on deployment-zookeeper02 is OK: OK: All targets OK
[22:11:38] PROBLEM - Free space - all mounts on integration-agent-docker-1007 is CRITICAL: (Service Check Timed Out)
[22:12:13] PROBLEM - Puppet staleness on integration-slave-jessie-1002 is CRITICAL: (Service Check Timed Out)
[22:13:06] PROBLEM - Puppet staleness on integration-cumin is CRITICAL: (Service Check Timed Out)
[22:14:01] PROBLEM - Puppet errors on integration-agent-docker-1005 is CRITICAL: (Service Check Timed Out)
[22:15:19] PROBLEM - Puppet staleness on integration-agent-docker-1002 is CRITICAL: (Service Check Timed Out)
[22:15:58] PROBLEM - Puppet staleness on deployment-elastic07 is CRITICAL: (Service Check Timed Out)
[22:16:23] PROBLEM - Puppet staleness on deployment-ores01 is CRITICAL: (Service Check Timed Out)
[22:16:47] RECOVERY - Free space - all mounts on deployment-sca01 is OK: OK: deployment-prep.deployment-sca01.diskspace._var.byte_percentfree (No valid datapoints found) deployment-prep.deployment-sca01.diskspace._srv.byte_percentfree (No valid datapoints found) deployment-prep.deployment-sca01.diskspace._mnt.byte_percentfree (No valid datapoints found) deployment-prep.deployment-sca01.diskspace._var_log.byte_percentfree (No valid datapoints
[22:16:54] PROBLEM - Free space - all mounts on deployment-imagescaler02 is CRITICAL: (Service Check Timed Out)
[22:17:42] PROBLEM - Free space - all mounts on deployment-aqs01 is CRITICAL: (Service Check Timed Out)
[22:18:23] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: (Service Check Timed Out)
[22:18:45] PROBLEM - Free space - all mounts on deployment-ores01 is CRITICAL: (Service Check Timed Out)
[22:18:45] PROBLEM - Free space - all mounts on deployment-docker-citoid01 is CRITICAL: (Service Check Timed Out)
[22:19:25] PROBLEM - Puppet staleness on integration-agent-stretch-1001 is CRITICAL: (Service Check Timed Out)
[22:19:45] PROBLEM - Puppet errors on deployment-webperf12 is CRITICAL: (Service Check Timed Out)
[22:19:46] PROBLEM - Puppet errors on deployment-webperf12 is CRITICAL: (Service Check Timed Out)
[22:20:56] PROBLEM - Free space - all mounts on integration-agent-puppet-docker-1001 is CRITICAL: (Service Check Timed Out)
[22:22:34] RECOVERY - Free space - all mounts on deployment-aqs01 is OK: OK: All targets OK
[22:22:41] PROBLEM - Puppet errors on deployment-memc07 is CRITICAL: (Service Check Timed Out)
[22:23:27] PROBLEM - Puppet staleness on deployment-mcs01 is CRITICAL: (Service Check Timed Out)
[22:23:34] PROBLEM - Puppet staleness on deployment-mediawiki-09 is CRITICAL: (Service Check Timed Out)
[22:23:37] RECOVERY - Free space - all mounts on deployment-ores01 is OK: OK: All targets OK
[22:23:37] RECOVERY - Free space - all mounts on deployment-docker-citoid01 is OK: OK: All targets OK
[22:25:45] RECOVERY - Free space - all mounts on integration-agent-puppet-docker-1001 is OK: OK: All targets OK
[22:26:14] PROBLEM - Puppet staleness on deployment-docker-cxserver01 is CRITICAL: (Service Check Timed Out)
[22:26:44] RECOVERY - Free space - all mounts on deployment-imagescaler02 is OK: OK: All targets OK
[22:26:56] monitoring flap .. nothing to worry about apparently
[22:26:57] ;)
[22:27:17] PROBLEM - Puppet staleness on deployment-wdqs01 is CRITICAL: (Service Check Timed Out)
[22:28:59] I think something is still wrong with it actually
[22:29:08] (the monitoring system)
[22:29:34] if I try running the checks on shinken-02 from the CLI it's mostly fine but sometimes I get "UNKNOWN: execution of the check script exited with exception ('The read operation timed out',)"
[22:30:30] PROBLEM - Puppet staleness on deployment-cache-text05 is CRITICAL: (Service Check Timed Out)
[22:30:30] PROBLEM - Puppet staleness on deployment-aqs01 is CRITICAL: (Service Check Timed Out)
[22:31:09] PROBLEM - Puppet staleness on deployment-memc04 is CRITICAL: (Service Check Timed Out)
[22:32:37] PROBLEM - Puppet staleness on deployment-memc06 is CRITICAL: (Service Check Timed Out)
[22:33:15] I'm wondering if there's something flakey about the labs network actually
[22:33:42] PROBLEM - Free space - all mounts on deployment-aqs01 is CRITICAL: (Service Check Timed Out)
[22:36:02] hm no
[22:36:13] Krenair: or the icinga system is broken somehow :/
[22:36:44] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: (Service Check Timed Out)
[22:36:44] well it's shinken rather than icinga, but right now I'm wondering if something is up with the host behind graphite-labs.wikimedia.org that shinken queries for a lot of this stuff
[22:37:24] I was just repeatedly running "curl https://graphite-labs.wikimedia.org -v" on the box
[22:37:43] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: (Service Check Timed Out)
[22:38:51] I found that it sometimes gets stuck between sending the HTTP request and getting a response back
[22:39:30] not any particularly special query or anything, just plain GET /
[22:40:55] PROBLEM - Puppet errors on deployment-imagescaler03 is CRITICAL: (Service Check Timed Out)
[22:41:08] takes almost exactly a minute
[22:41:59] PROBLEM - Puppet staleness on deployment-imagescaler02 is CRITICAL: (Service Check Timed Out)
[22:41:59] ATS config in hieradata/common/profile/trafficserver/backend.yaml maps it directly to http://labmon1001.eqiad.wmnet
[22:42:00] PROBLEM - Puppet errors on deployment-cache-text05 is CRITICAL: (Service Check Timed Out)
[22:42:16] hieradata/role/common/cache/text.yaml does the same
[22:44:02] PROBLEM - Puppet errors on integration-agent-docker-1007 is CRITICAL: (Service Check Timed Out)
[22:44:41] PROBLEM - Free space - all mounts on deployment-kafka-main-2 is CRITICAL: (Service Check Timed Out)
[22:48:55] hashar: are you able to look at why gerrit-replica is down? (https://phabricator.wikimedia.org/T240763)
[22:49:06] legoktm: yes i am on it
[22:49:22] thank you!
[22:50:21] legoktm: I have restarted it
[22:50:56] hashar i wonder what it says in the logs
[22:50:58] (gerrit logs)
[22:51:04] out of memory :/
[22:51:10] oh, really?
[22:51:31] Gerrit, LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) [2019-12-14 16:07:25,808] [HTTP-87043] WARN org.eclipse.jetty.servlet.ServletHandler : Error for /r/mediawiki/extensions/DataTransfer.git...
[22:51:54] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912), LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar)
[22:52:02] works now :P
[22:52:03] awesome, libup is running again :)
[22:53:15] legoktm: try to throttle it a bit maybe
[22:53:19] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912), LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (Legoktm) Thank you hashar :) Confirmed that libup...
[22:53:30] if you can paste on the task the rough date/time you started it that could help
[22:53:57] not sure whether the replica was already in a bad shape or whether the libup script caused it to explode :-\
[22:54:03] hashar: right now it's running with concurrency of 2, I can drop that down to 1 if you want
[22:54:12] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912), LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) And the trace analysis is https://fastthr...
[22:54:21] legoktm: well I don't know what went wrong ;]
[22:55:07] hashar https://gerrit-replica.wikimedia.org/r/monitoring?part=graph&graph=usedMemory
[22:55:49] slow and steady wins the race
[22:55:51] lowering it to 1
[22:56:25] that shows that it started to reach 32gb around midday (and stayed there)
[22:56:36] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912), LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) No out of memory messages for December 12...
[22:57:03] paladox: I cant even login :D
[22:57:13] hashar you /r/login/
[22:57:35] good point
[22:57:41] then you type in gerrit-replica url after logging in, as it will redirect you to gerrit.wikimedia.org
[22:57:49] hashar also https://phabricator.wikimedia.org/F31475954 :)
[22:58:20] https://gerrit-replica.wikimedia.org/r/monitoring?part=graph&graph=usedMemory&period=mois
[22:58:29] 250 hits/min on 2,707 requests ;D
[22:58:37] looks like it just hit its breaking point
[22:58:38] yeh
[22:59:48] https://gerrit-replica.wikimedia.org/r/monitoring?part=graph&graph=cpu&period=jour makes it very obvious to see when libup started running
[23:01:24] and extdist is hammering it as well
[23:01:50] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912), LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (Legoktm) Looks like the memory usage has just bee...
[23:02:12] extdist theoretically has been running since ever, while libup had been down for ~20 days
[23:02:59] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912), LibUp, Operations: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) Open→Resolved a:hashar tldr:...
[23:03:41] legoktm: I guess it will be fine now more or less
[23:05:43] legoktm: paladox I am off and sleeping now. Will try to remember to look at it tomorrow :)
[23:05:49] good night :)
[23:05:55] meanwhile a restart is enough to clear the jvm memory
[23:05:55] I will keep an eye on the monitoring page for the next few hours
[23:06:01] and whatever was stuck at 100% cpu
[23:06:12] and thanks for libup!
[23:06:21] thanks!
[23:06:36] hashar possibly thats why the memory went up?
[23:06:43] (100% cpu)
[23:09:37] possibly :)
[23:09:40] I dont know really
[23:09:48] ;)
[23:14:59] (PS1) QChris: Allow “Gerrit Managers” to import history [labs/tools/sonarqubebot] (refs/meta/config) - https://gerrit.wikimedia.org/r/557289
[23:15:01] (CR) QChris: [V: +2 C: +2] Allow “Gerrit Managers” to import history [labs/tools/sonarqubebot] (refs/meta/config) - https://gerrit.wikimedia.org/r/557289 (owner: QChris)
[23:15:07] (PS1) QChris: Import done. Revoke import grants [labs/tools/sonarqubebot] (refs/meta/config) - https://gerrit.wikimedia.org/r/557290
[23:15:09] (CR) QChris: [V: +2 C: +2] Import done. Revoke import grants [labs/tools/sonarqubebot] (refs/meta/config) - https://gerrit.wikimedia.org/r/557290 (owner: QChris)
[23:24:10] (PS1) QChris: Allow “Gerrit Managers” to import history [extensions/ShortDescription] (refs/meta/config) - https://gerrit.wikimedia.org/r/557297
[23:24:12] (CR) QChris: [V: +2 C: +2] Allow “Gerrit Managers” to import history [extensions/ShortDescription] (refs/meta/config) - https://gerrit.wikimedia.org/r/557297 (owner: QChris)
[23:24:18] (PS1) QChris: Import done. Revoke import grants [extensions/ShortDescription] (refs/meta/config) - https://gerrit.wikimedia.org/r/557298
[23:24:20] (CR) QChris: [V: +2 C: +2] Import done. Revoke import grants [extensions/ShortDescription] (refs/meta/config) - https://gerrit.wikimedia.org/r/557298 (owner: QChris)
[23:33:01] (PS1) QChris: Allow “Gerrit Managers” to import history [extensions/CreateAPage] (refs/meta/config) - https://gerrit.wikimedia.org/r/557300
[23:33:03] (CR) QChris: [V: +2 C: +2] Allow “Gerrit Managers” to import history [extensions/CreateAPage] (refs/meta/config) - https://gerrit.wikimedia.org/r/557300 (owner: QChris)
[23:33:08] (PS1) QChris: Import done. Revoke import grants [extensions/CreateAPage] (refs/meta/config) - https://gerrit.wikimedia.org/r/557301
[23:33:10] (CR) QChris: [V: +2 C: +2] Import done. Revoke import grants [extensions/CreateAPage] (refs/meta/config) - https://gerrit.wikimedia.org/r/557301 (owner: QChris)
[23:34:54] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Krenair)
[23:40:36] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Krenair)
[23:43:47] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Krenair)
[23:56:03] Beta-Cluster-Infrastructure: Figure out future for newly created deployment-prep jessie instances - https://phabricator.wikimedia.org/T218609 (Krenair) I've just found https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/546731/1 which made a new service in the MW config called echostore and pointe...