[00:06:20] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Patch-For-Review, Release, Train Deployments: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 (greg)
[00:39:05] Release-Engineering-Team-TODO, MediaWiki-Release-Tools, MediaWiki-Releasing (Workflow Improvements), Patch-For-Review: merge branch.py and make-wmf-branch - https://phabricator.wikimedia.org/T222829 (Jdforrester-WMF) Is this really "external"? It feels both internal and mostly-done?
[00:48:14] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), MediaWiki-Release-Tools, MediaWiki-Releasing (Workflow Improvements), Patch-For-Review: merge branch.py and make-wmf-branch - https://phabricator.wikimedia.org/T222829 (thcipriani)
[00:48:59] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), MediaWiki-Release-Tools: Automate weekly branch cut - https://phabricator.wikimedia.org/T196517 (thcipriani)
[03:18:35] Beta-Cluster-Infrastructure, User-DannyS712: BetaCluster: ExternalStoreException - Unable to store text to external storage - https://phabricator.wikimedia.org/T228088 (DannyS712) Just got `[XhVJzawQBGoAAFZofegAAAAO] Exception caught: Unable to store text to external storage` again
[04:32:02] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Commons, Wikidata, User-brennen: Split group1 so that Commons and Wikidata aren't in the general group1, but their own buckets - https://phabricator.wikimedia.org/T223410 (Ladsgroup) > Try to roll out the Wikidata...
[04:32:07] Phabricator, User-DannyS712: Exception - You must use withSourcePHIDs() to query edges. - https://phabricator.wikimedia.org/T242186 (DannyS712)
[04:32:40] Phabricator, User-DannyS712: Exception - You must use withSourcePHIDs() to query edges. - https://phabricator.wikimedia.org/T242186 (DannyS712) Suspect that this is #phabricator-upstream, but not tagging in case there is a configuration issue that caused it (not very familiar with phabricator software)
[04:55:38] Phabricator, User-DannyS712: Exception - You must use withSourcePHIDs() to query edges. - https://phabricator.wikimedia.org/T242186 (JJMC89) This may be caused by @Chase's work for {T242032}.
[05:10:38] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 21525 bytes in 0.143 second response time
[05:10:53] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 4336 bytes in 0.084 second response time
[05:11:19] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 4883 bytes in 0.094 second response time
[05:12:11] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 4336 bytes in 0.086 second response time
[05:12:22] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-parsoid10 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 4344 bytes in 0.143 second response time
[05:21:08] Project beta-update-databases-eqiad build #39313: FAILURE in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39313/
[06:17:21] Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712)
[06:18:06] Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712) p:Triage→Unbreak! Potential cause: {T215853} - https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/562657/
[06:21:09] Project beta-update-databases-eqiad build #39314: STILL FAILING in 1 min 8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39314/
[06:22:44] Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712)
[06:33:46] Beta-Cluster-Infrastructure, Lexicographical data, Operations, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712)
[06:45:14] Beta-Cluster-Infrastructure, Lexicographical data, Operations, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712) @Reedy can I ask why https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseLexeme/+/562646/ was abandoned?...
[07:13:41] Phabricator, Security-Team, Security: Audit members of #security for more than x duration of no activity - https://phabricator.wikimedia.org/T241781 (Legoktm) @Wim_b, @QuiteUnusual, @Samtar and @Pmlineditor are all Wikimedia Stewards, presumably they have access because of that. @csteipp is the OG s...
[07:18:23] Continuous-Integration-Config, Security: Add some form of static analysis for package-lock.json - https://phabricator.wikimedia.org/T242058 (Legoktm) >>! In T242058#5784237, @Jdforrester-WMF wrote: > Adding a `npx lockfile-lint --path package-lock.json --type npm -validate-https --allowed-hosts registry....
[07:21:50] Project beta-update-databases-eqiad build #39315: STILL FAILING in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39315/
[07:26:21] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 49502 bytes in 2.447 second response time
[07:27:18] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 48978 bytes in 0.936 second response time
[07:27:22] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-parsoid10 is OK: HTTP OK: HTTP/1.1 200 OK - 49002 bytes in 1.771 second response time
[07:30:37] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 37774 bytes in 0.552 second response time
[07:30:54] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 48978 bytes in 0.503 second response time
[08:06:37] Continuous-Integration-Config, Code-Health-Metrics, Product-Infrastructure-Team-Backlog: Enable codehealth pipeline for node services - https://phabricator.wikimedia.org/T240989 (kostajh) @MSantos T238004 is not yet live but it has all the patches which show what you'd want to do. Are those patches s...
[08:06:54] Beta-Cluster-Infrastructure, MediaWiki-extensions-StopForumSpam, Patch-For-Review, Wikimedia-extension-review-queue: Deploy StopForumSpam to the Beta Cluster - https://phabricator.wikimedia.org/T181217 (Tgr)
[08:21:02] Yippee, build fixed!
[08:21:03] Project beta-update-databases-eqiad build #39316: FIXED in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39316/
[08:27:40] Phabricator, User-DannyS712: Exception - You must use withSourcePHIDs() to query edges. - https://phabricator.wikimedia.org/T242186 (Tgr) Visiting {T125338} also gives that error.
[08:41:26] Beta-Cluster-Infrastructure, Lexicographical data, Operations, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (Reedy) >>! In T242188#5784973, @DannyS712 wrote: > @Reedy can I ask why https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions...
[08:41:41] Beta-Cluster-Infrastructure, Lexicographical data, Operations, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (Reedy) Open→Resolved a:Reedy
[13:21:04] MediaWiki-Codesniffer: Forbid usage of array_push with a single element - https://phabricator.wikimedia.org/T242218 (Daimona)
[13:58:06] Phabricator, User-DannyS712: Exception: "You must use withSourcePHIDs() to query edges." for #deprecated-security-team-reviews and some of its associated tasks - https://phabricator.wikimedia.org/T242186 (Aklapper) p:Triage→High
[14:00:58] Phabricator, User-DannyS712: Exception: "You must use withSourcePHIDs() to query edges." for #deprecated-security-team-reviews and some of its associated tasks - https://phabricator.wikimedia.org/T242186 (Aklapper) >>! In T242186#5784846, @JJMC89 wrote: > This may be caused by `@Chase`'s work for {T24203...
[14:15:14] Phabricator, User-DannyS712: Exception: "You must use withSourcePHIDs() to query edges." for #deprecated-security-team-reviews and some of its associated tasks - https://phabricator.wikimedia.org/T242186 (Aklapper) Interesting that https://phabricator.wikimedia.org/project/profile/944/ can be reached. L...
[15:06:46] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:07:03] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:07:30] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:08:19] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:08:32] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-parsoid10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:10:58] PROBLEM - Free space - all mounts on deployment-ores01 is CRITICAL: (Service Check Timed Out)
[15:15:49] RECOVERY - Free space - all mounts on deployment-ores01 is OK: OK: All targets OK
[15:16:38] Continuous-Integration-Config, Code-Health-Metrics, Product-Infrastructure-Team-Backlog, Services: Enable codehealth pipeline for node services - https://phabricator.wikimedia.org/T240989 (MSantos) p:Triage→Low @kostajh thanks for pointing that out, our team is under-resourced for this qu...
[15:28:24] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-parsoid10 is OK: HTTP OK: HTTP/1.1 200 OK - 49002 bytes in 1.673 second response time
[15:31:38] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 37748 bytes in 0.594 second response time
[15:31:53] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 48978 bytes in 0.594 second response time
[15:37:20] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 49485 bytes in 0.662 second response time
[15:38:10] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 48962 bytes in 0.894 second response time
[15:53:24] PROBLEM - Free space - all mounts on integration-cumin is CRITICAL: (Service Check Timed Out)
[15:56:06] PROBLEM - Free space - all mounts on deployment-deploy02 is CRITICAL: (Service Check Timed Out)
[15:58:23] RECOVERY - Free space - all mounts on integration-cumin is OK: OK: All targets OK
[16:01:06] RECOVERY - Free space - all mounts on deployment-deploy02 is OK: OK: All targets OK
[16:01:59] PROBLEM - Free space - all mounts on deployment-ores01 is CRITICAL: (Service Check Timed Out)
[16:03:01] PROBLEM - Free space - all mounts on deployment-puppetdb02 is CRITICAL: (Service Check Timed Out)
[16:06:49] RECOVERY - Free space - all mounts on deployment-ores01 is OK: OK: All targets OK
[16:07:53] RECOVERY - Free space - all mounts on deployment-puppetdb02 is OK: OK: All targets OK
[16:13:27] Phabricator, Security-Team, Security: Audit members of #security for more than x duration of no activity - https://phabricator.wikimedia.org/T241781 (Anomie) >>! In T241781#5784989, @Legoktm wrote: > @csteipp is the OG security team To clarify, he was the entire Security "team" for a while (2012-201...
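The PROBLEM and RECOVERY notices above follow a regular shape, so the length of each outage can be read off mechanically. What follows is a minimal Python sketch, not a tool used in this channel: the regular expression, the `outage_durations` name, and the assumption that all timestamps fall on the same day are illustrative choices made for this example.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the notices in this log:
#   [HH:MM:SS] PROBLEM - <check> on <host> is CRITICAL: <detail>
#   [HH:MM:SS] RECOVERY - <check> on <host> is OK: <detail>
ALERT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] (PROBLEM|RECOVERY) - (.+?) on (\S+) is (?:CRITICAL|OK):"
)


def outage_durations(lines):
    """Pair each PROBLEM with the next RECOVERY for the same check and host.

    Returns a list of (check, host, seconds). Assumes every timestamp is on
    the same day, which holds for this excerpt but not in general.
    """
    open_alerts = {}
    durations = []
    for line in lines:
        match = ALERT_RE.match(line)
        if not match:
            continue  # Task updates, chat, and CI notices are ignored.
        stamp, kind, check, host = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        key = (check, host)
        if kind == "PROBLEM":
            # Keep the earliest unresolved PROBLEM for this check and host.
            open_alerts.setdefault(key, when)
        elif key in open_alerts:
            started = open_alerts.pop(key)
            durations.append((check, host, int((when - started).total_seconds())))
    return durations


sample = [
    "[15:10:58] PROBLEM - Free space - all mounts on deployment-ores01 is CRITICAL: (Service Check Timed Out)",
    "[15:15:49] RECOVERY - Free space - all mounts on deployment-ores01 is OK: OK: All targets OK",
]
print(outage_durations(sample))
```

For the two sample lines, taken from the 15:10 to 15:15 window of the log, the sketch reports one outage of 291 seconds. A RECOVERY with no earlier PROBLEM is skipped rather than guessed at.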
[16:28:54] Beta-Cluster-Infrastructure, MediaWiki-extensions-StopForumSpam, Patch-For-Review, Wikimedia-extension-review-queue: Deploy StopForumSpam to the Beta Cluster - https://phabricator.wikimedia.org/T181217 (MarcoAurelio) p:Lowest→Triage
[16:52:06] Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Releng Q3 priorities vs resources Chart - https://phabricator.wikimedia.org/T242237 (thcipriani)
[16:52:07] (CR) Zoranzoki21: [C: +1] Change email for ptrcnull [integration/config] - https://gerrit.wikimedia.org/r/562869 (owner: Bjornskjald)
[16:52:08] Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Releng Q3 priorities vs resources Chart - https://phabricator.wikimedia.org/T242237 (thcipriani) p:Triage→Normal
[17:01:56] Phabricator, Security-Team, PM: make security-team-reviews a subproject of security-team-services - https://phabricator.wikimedia.org/T242032 (chasemp) note https://phabricator.wikimedia.org/T242163#5786664
[17:33:08] Beta-Cluster-Infrastructure, ChangeProp, Core Platform Team, WMF-JobQueue: Job queue broken on Beta Cluster - https://phabricator.wikimedia.org/T241448 (WDoranWMF)
[17:47:54] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused
[17:47:54] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused
[17:49:04] There seems to be quite a few CI jobs failing cloning repos atm from gerrit
[17:52:39] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-brennen: Set up new personal desktop - https://phabricator.wikimedia.org/T242248 (brennen)
[17:54:52] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-brennen: Explore tools for remote pairing - https://phabricator.wikimedia.org/T240484 (brennen)
[17:57:47] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, DBA, and 2 others: Prepare a disaster recovery plan for failing over Phabricator - https://phabricator.wikimedia.org/T190572 (mmodell) I believe we can close this as resolved?
[18:00:40] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Security-Team: Make ‘Protect as security issue’ add project #security-team - https://phabricator.wikimedia.org/T242018 (mmodell)
[18:03:41] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-brennen: logspam.pl: Shorten paths and include fatals - https://phabricator.wikimedia.org/T242252 (brennen)
[18:17:12] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): move_project breaks the world when moving a subproject that already has subprojects. - https://phabricator.wikimedia.org/T242254 (mmodell)
[18:18:30] Phabricator, Release-Engineering-Team-TODO: move_project breaks the world when moving a subproject that already has subprojects. - https://phabricator.wikimedia.org/T242254 (mmodell) p:Triage→High
[18:22:51] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-brennen: logspam.pl: Shorten paths and include fatals - https://phabricator.wikimedia.org/T242252 (thcipriani) p:Triage→Normal
[18:32:50] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Write proposal for personal development solution - https://phabricator.wikimedia.org/T242258 (thcipriani) p:Triage→Normal
[18:34:00] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Write proposal for personal development solution - https://phabricator.wikimedia.org/T242258 (thcipriani)
[18:34:17] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Update parsoid image and helm charts for php - https://phabricator.wikimedia.org/T242259 (jeena)
[18:35:30] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Update parsoid image and helm charts for php - https://phabricator.wikimedia.org/T242259 (thcipriani)
[18:35:47] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Update parsoid image and helm charts for php - https://phabricator.wikimedia.org/T242259 (thcipriani) p:Triage→Normal
[18:37:24] (CR) Daimona Eaytoy: [C: -1] "See also If738c488afc207920b6348af43c27ca3edf308d1. Basically, this all is blocked on the sniff for PHPUnit's assertArraySubset. If we rel" [tools/codesniffer] - https://gerrit.wikimedia.org/r/562626 (owner: Jforrester)
[18:44:39] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Code-Review-Workgroup: Review and refine the Code Review Office Hours model of engagment - https://phabricator.wikimedia.org/T229512 (mmodell)
[18:46:12] Phabricator, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): move_project breaks the world when moving a subproject that already has subprojects. - https://phabricator.wikimedia.org/T242254 (mmodell)
[18:57:22] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Write command line API client for Ick controller, in Go - https://phabricator.wikimedia.org/T239899 (LarsWirzenius) Open→Resolved This is written now and works sufficiently that I can use it. I'll be presenting it at a future Go study group mee...
[18:57:24] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Learn Go - https://phabricator.wikimedia.org/T234543 (LarsWirzenius)
[19:08:27] (PS4) Jforrester: Npm install before each node command [integration/quibble] - https://gerrit.wikimedia.org/r/540387 (https://phabricator.wikimedia.org/T225008) (owner: Awight)
[19:09:00] (CR) Jforrester: [C: +1] "PS4: Manual rebase." [integration/quibble] - https://gerrit.wikimedia.org/r/540387 (https://phabricator.wikimedia.org/T225008) (owner: Awight)
[19:12:06] Beta-Cluster-Infrastructure, ChangeProp, Core Platform Team, WMF-JobQueue: Job queue broken on Beta Cluster - https://phabricator.wikimedia.org/T241448 (Pchelolo) Open→Resolved a:Pchelolo > It worked but took ca. 6 minutes to complete. I don't think it would be possible any more to d...
[19:12:08] Beta-Cluster-Infrastructure, GlobalRename, MediaWiki-extensions-CentralAuth, User-DannyS712: Global renames aren't being processed on beta cluster - https://phabricator.wikimedia.org/T241294 (Pchelolo)
[19:15:51] Beta-Cluster-Infrastructure, GlobalRename, MediaWiki-extensions-CentralAuth, User-DannyS712: Global renames aren't being processed on beta cluster - https://phabricator.wikimedia.org/T241294 (DannyS712) Now that the job queue should be running, I've tested https://deployment.wikipedia.beta.wmflab...
[19:20:04] (PS1) Jforrester: Release Quibble 0.0.40 [integration/quibble] - https://gerrit.wikimedia.org/r/562918 (https://phabricator.wikimedia.org/T192167)
[19:22:42] (CR) Jforrester: [C: +2] "What could possibly go wrong?" [integration/quibble] - https://gerrit.wikimedia.org/r/562918 (https://phabricator.wikimedia.org/T192167) (owner: Jforrester)
[19:23:47] (Merged) jenkins-bot: Release Quibble 0.0.40 [integration/quibble] - https://gerrit.wikimedia.org/r/562918 (https://phabricator.wikimedia.org/T192167) (owner: Jforrester)
[19:25:08] !log Tagged Quibble 0.0.40 @ ba365ab36fd84a87fcfb10e0910255e8b44432f1 T192167 T220586 T236222 T236680
[19:25:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:25:13] T236680: Enable API integration tests in CI for MediaWiki core - https://phabricator.wikimedia.org/T236680
[19:25:14] T192167: Upgrade PHPUnit from 4/6 to 8 - https://phabricator.wikimedia.org/T192167
[19:25:14] T220586: Quibble to output markers for processing its output - https://phabricator.wikimedia.org/T220586
[19:25:39] (PS1) Jforrester: changelog: Begin new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/562924
[19:27:22] (CR) Jforrester: [C: +2] changelog: Begin new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/562924 (owner: Jforrester)
[19:28:16] (Merged) jenkins-bot: changelog: Begin new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/562924 (owner: Jforrester)
[19:30:43] Do we have a ticket for random failures caused by gerrit returning 500s?
[19:30:55] Like this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/562783#message-ab0a604a118a31f535f134f108d5d414ba2fac45
[19:31:00] I've not seen one... But definitely should have one if not
[19:32:54] Amir1 did that just happen to you?
[19:33:05] yup, for weeks now
[19:33:23] oh!
[19:33:24] ci
[19:34:50] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments: 1.35.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T233863 (thcipriani) p:Triage→Normal
[19:37:23] I've seen it numerous times in the last few days
[19:39:12] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Patch-For-Review, User-brennen: logspam.pl: Shorten paths and include fatals - https://phabricator.wikimedia.org/T242252 (brennen)
[19:39:20] I see nothing in https://gerrit.wikimedia.org/r/monitoring (that could be the cause) so maybe network?
[19:40:52] http 502 is bad gateway...
[19:42:24] oh, right
[19:43:11] thcipriani ^ maybe you see something in the logs that could explain the 502? (apache logs)
[19:52:13] hrm, I see some upload-pack errors for core around 19:30
[19:52:20] some connection reset by peer...which is fine
[19:52:28] trying to see if there are other causes in here
[19:54:28] looks like there was a hiccup talking to reviewdb as well
[19:56:57] I'll update the integration agents' cache, that should mean less fetching overall
[19:57:25] !log root@integration-cumin:~# cumin -b1 'name:integration-agent-docker' 'sleep 1; git -C /srv/git/mediawiki/core.git fetch'
[19:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:01:54] (PS1) Jforrester: RELEASING: Drop reference to now-shut qa mailing list [integration/quibble] - https://gerrit.wikimedia.org/r/562941
[20:06:37] (CR) Kosta Harlan: [C: +2] RELEASING: Drop reference to now-shut qa mailing list [integration/quibble] - https://gerrit.wikimedia.org/r/562941 (owner: Jforrester)
[20:26:33] (CR) Umherirrender: "recheck" [integration/docroot] - https://gerrit.wikimedia.org/r/562953 (owner: Libraryupgrader)
[20:36:14] o/
[20:36:28] is jenkins stalled? a few of my recent patches don't seem to have noticed
[20:36:41] apparently also happened https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/562639/ too
[20:36:41] they were simple enough for me to just merge, but now I want service-pipeline-publish to run...
[20:36:42] :)
[20:36:49] *to
[20:37:00] also here to leave that jenkins comment
[20:37:11] ah you see that too mutante ?
[20:37:23] https://integration.wikimedia.org/zuul/ is looking very quiet
[20:37:32] yea, i did not get a jenkins-bot vote yet
[20:37:34] yeah jenkins is mostly idle
[20:37:43] is zuul dead?
[20:37:50] Jeena looks to be getting annoyed too
[20:37:53] goes to contint1001
[20:39:00] Hey, I'm safe-restarting Jenkins to see if that helps.
[20:39:36] it looks to me like either zuul is dead or gerrit isn't sending any events
[20:39:39] !log Safe-restarting Jenkins to see if it fixes the lack of progress
[20:39:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:39:43] zuul log is very quiet
[20:39:48] let me restart zuul
[20:39:52] that is usually it..afaik
[20:40:06] Sadly restarting zuul is destructive, but yes.
[20:40:20] Wow, that did something.
[20:40:55] zuul log is now active
[20:41:23] well it's destructive if there are patches moving through it
[20:41:24] !log contint1001 - restarted zuul service
[20:41:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:41:34] Did it looseohh
[20:41:36] ohh
[20:41:36] thcipriani: Yeah.
[20:41:44] it appears it could have been the db?
[20:42:00] i see:
[20:42:01] WARN [com.google.gerrit.sshd.CommandFactoryProvider] Cannot start command "gerrit query --format json --commit-message --current-patch-set message:Ic3dd6651b85a7f46c2fa38fd2d80a129ec1c1346" for user jenkins-bot
[20:42:09] Jenkins lists active jobs not on Zuul's list.
[20:42:17] * James_F ponders terminating them.
[20:42:28] +1 terminating them
[20:42:41] !log Terminating some jobs Zuul doesn't know about.
[20:42:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:42:52] zuul hasn't logged anything since restarting
[20:43:14] it did log some DEBUG lines
[20:43:30] but now it's just DEBUG zuul.Scheduler: Run handler sleeping
[20:44:05] Has its connection with gerrit not been re-established?
[20:44:06] seems that jenkins is restarting now that it's "safe"
[20:44:12] Yeah.
[20:44:28] I see 2 connections from jenkins-bot in gerrit ssh
[20:44:34] zuul log is showing things again
[20:44:43] it's updating repos ..
[20:44:45] (which is "normal" but I don't remember why)
[20:45:19] fixed
[20:45:30] Jeena's change now has a vote after my last "recheck"
[20:45:33] Yup.
[20:45:40] G&S now running on 562946, longma.
[20:46:02] thanks James_F
[20:46:09] thanks mutante and James_F :)
[20:46:23] hm ok if i merged a patch already that was to trigger service-pipeline-publish
[20:46:27] what should I do?
[20:46:48] I'm not sure we can synthetically trigger publish jobs from gerrit.
[20:46:59] Manually run the jenkins job?
[20:47:00] ottomata: merge another dummy change on the same repo?
[20:47:02] ok
[20:47:05] thcipriani i think that's because it watches the ssh events?
[20:47:37] ottomata: could also push a tag
[20:47:48] that'll trigger service-pipeline-publish
[20:48:09] Dummy change/tag is probably best, yes.
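The two suggestions above (merge a dummy change, or push a tag) both come down to a couple of git commands. The sketch below only builds and prints those commands rather than running them. The function name `retrigger_commands`, the commit and tag messages, the tag name, and the `master` and `origin` defaults are all hypothetical placeholders, and whether a given Gerrit project accepts an empty commit or a tag push depends on its configuration and on the pusher's permissions.

```python
import shlex


def retrigger_commands(tag=None, branch="master", remote="origin"):
    """Return git command lists for one of the two suggestions, without running them.

    With a tag name, build an annotated tag and push it. Without one, build an
    empty commit and push it to Gerrit's review ref (refs/for/<branch>), where
    it would still need to be reviewed and merged.
    """
    if tag is not None:
        return [
            ["git", "tag", "-a", tag, "-m", f"Tag {tag} to re-run the publish job"],
            ["git", "push", remote, tag],
        ]
    return [
        ["git", "commit", "--allow-empty", "-m", "Empty change to re-run the publish job"],
        ["git", "push", remote, f"HEAD:refs/for/{branch}"],
    ]


# Dry run: print the commands so they can be checked before anyone runs them.
for command in retrigger_commands(tag="example-tag"):
    print(shlex.join(command))
for command in retrigger_commands():
    print(shlex.join(command))
```

Printing instead of executing keeps the sketch safe to run anywhere. Which tag names or branches actually trigger service-pipeline-publish is not stated in the log, so that part is left to the job configuration.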
[20:54:06] (PS1) Jforrester: dockerfiles: Create images for Quibble version 0.0.40 [integration/config] - https://gerrit.wikimedia.org/r/562959 (https://phabricator.wikimedia.org/T192167)
[20:54:12] (PS1) Jforrester: jjb: Switch over to images using Quibble version 0.0.40 [integration/config] - https://gerrit.wikimedia.org/r/562960 (https://phabricator.wikimedia.org/T192167)
[20:54:32] (CR) Jforrester: [C: +2] dockerfiles: Create images for Quibble version 0.0.40 [integration/config] - https://gerrit.wikimedia.org/r/562959 (https://phabricator.wikimedia.org/T192167) (owner: Jforrester)
[20:55:03] ty that worked
[20:55:43] (Merged) jenkins-bot: dockerfiles: Create images for Quibble version 0.0.40 [integration/config] - https://gerrit.wikimedia.org/r/562959 (https://phabricator.wikimedia.org/T192167) (owner: Jforrester)
[20:56:33] !log Docker: Publishing Quibble 0.0.40 images on contint1001 T192167 T220586 T236222 T236680
[20:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:56:38] T236680: Enable API integration tests in CI for MediaWiki core - https://phabricator.wikimedia.org/T236680
[20:56:39] T192167: Upgrade PHPUnit from 4/6 to 8 - https://phabricator.wikimedia.org/T192167
[20:56:39] T220586: Quibble to output markers for processing its output - https://phabricator.wikimedia.org/T220586
[20:57:54] (CR) Umherirrender: "recheck" [integration/docroot] - https://gerrit.wikimedia.org/r/562953 (owner: Libraryupgrader)
[21:05:56] (CR) Jforrester: [C: +2] "…" [integration/quibble] - https://gerrit.wikimedia.org/r/562941 (owner: Jforrester)
[21:12:30] (Merged) jenkins-bot: RELEASING: Drop reference to now-shut qa mailing list [integration/quibble] - https://gerrit.wikimedia.org/r/562941 (owner: Jforrester)
[21:18:24] longma: Found a train-blocker for Commons; should probably revert there. :-(
[21:18:46] just commons or all of group1?
[21:19:01] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Patch-For-Review, Release, Train Deployments: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 (Jdforrester-WMF)
[21:20:08] longma: Just Commons, it's only a bug in the SDC code AFAICT.
[21:20:20] okay
[21:27:30] James_F: running scap sync now
[21:30:37] Release-Engineering-Team, serviceops, Patch-For-Review: decommission phab1003.eqiad.wmnet - https://phabricator.wikimedia.org/T238957 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `phab1003.eqiad.wmnet` - phab1003.eqiad.wmnet (**PASS**) - Downtimed ho...
[21:32:22] Release-Engineering-Team, serviceops, Patch-For-Review: decommission phab1003.eqiad.wmnet - https://phabricator.wikimedia.org/T238957 (Dzahn)
[21:39:49] MediaWiki-Codesniffer, FR-Smashpig, Fundraising-Backlog, Patch-For-Review: Write mutant code style config for SmashPig, or fully adopt MediaWiki style - https://phabricator.wikimedia.org/T133576 (Ejegg) .phpcs.xml is a decent de-facto style guide which can be tightened up as time goes on.
[21:40:03] MediaWiki-Codesniffer, FR-Smashpig, Fundraising-Backlog, Patch-For-Review: Write mutant code style config for SmashPig, or fully adopt MediaWiki style - https://phabricator.wikimedia.org/T133576 (Ejegg) Open→Resolved
[22:04:31] Project beta-code-update-eqiad build #279340: FAILURE in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/279340/
[22:11:14] 22:03:03 fatal: unable to access 'https://gerrit.wikimedia.org/r/mediawiki/core/': The requested URL returned error: 502
[22:11:59] :O
[22:12:03] Reedy: Yeah, seen a few of those today.
[22:12:04] hmm. that's a "Not Found"
[22:12:18] mutante that's a gateway error :)
[22:12:46] i see another db error at 07
[22:12:53] paladox: yea, but the actual website says "Not Found" as if it should be a 4xx
[22:13:00] oh
[22:13:01] yeh
[22:13:06] that's because it's a git clone :)
[22:13:14] (it's not meant to serve ui)
[22:13:26] i remember now
[22:13:34] maybe we should still redirect those
[22:13:44] if we can tell it's in browser
[22:14:07] paladox: what db error?
[22:14:21] Caused by: com.google.gwtorm.server.OrmException: Cannot open database connection
[22:14:32] Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
[22:14:32] The last packet successfully received from the server was 1 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.
[22:14:36] Yippee, build fixed!
[22:14:36] Project beta-code-update-eqiad build #279341: FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/279341/
[22:15:20] there seems to have been many db errors today
[22:15:37] hrrmmmm
[22:15:49] but the timing does not match what Reedy posted
[22:16:17] Maybe the issue is described in apache error log? :)
[22:19:37] E486: Pattern not found: 502 76,91-96 6%
[22:20:00] pattern?
[22:20:07] "502" is not in the logfile
[22:20:32] oh
[22:21:02] well.. i can clone that without problems
[22:21:11] mediawiki/core is working for me
[22:21:19] i don't think it'll log the 502 like that
[22:21:27] mutante it's intermittent :)
[22:23:23] https://phabricator.wikimedia.org/T240763 ?
[22:23:38] no...
[22:24:19] that's not related
[22:24:24] that was due to the OOM
[22:27:21] Project-Admins, Office-IT: Scope of #Office-IT project tag in Phabricator? - https://phabricator.wikimedia.org/T242292 (Aklapper)
[22:30:04] Release-Engineering-Team-TODO, Quality-and-Test-Engineering-Team (QTE): Archive mediawiki/selenium? - https://phabricator.wikimedia.org/T242293 (Jdforrester-WMF)
[23:05:40] (CR) Jforrester: [C: +2] "Deployed." [integration/config] - https://gerrit.wikimedia.org/r/562960 (https://phabricator.wikimedia.org/T192167) (owner: Jforrester)
[23:06:41] (Merged) jenkins-bot: jjb: Switch over to images using Quibble version 0.0.40 [integration/config] - https://gerrit.wikimedia.org/r/562960 (https://phabricator.wikimedia.org/T192167) (owner: Jforrester)
[23:35:50] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Patch-For-Review, Release, Train Deployments: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 (Jdforrester-WMF)