[00:54:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[01:03:45] PROBLEM - Puppet staleness on deployment-logstash2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[01:49:03] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.230 second response time
[02:29:53] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<50.00%)
[06:00:44] Gerrit, translatewiki.net: Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Nikerabbit) p:Triage→Unbreak! RepoNg defaults to number of cores (8 on translatewiki.net server). Same limit is used...
[06:13:15] Gerrit, translatewiki.net, Patch-For-Review: Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Nikerabbit) a:Nikerabbit
[06:13:36] Gerrit, translatewiki.net, Language-Team (Language-2019-April-June), Patch-For-Review: Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Nikerabbit)
[06:25:57] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: (Service Check Timed Out)
[06:29:24] Gerrit, translatewiki.net, Language-Team (Language-2019-April-June): Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Nikerabbit) The above change is deployed. I would be curious to hear from @Ray...
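The T222546 notices above describe an export whose parallelism defaulted to the number of cores (8 on the translatewiki.net server) while Gerrit allows at most 4 concurrent connections per user. RepoNg's code is not part of this log, so the following is only a hypothetical Python sketch of capping the worker count at the server-side limit instead of at the core count; `export_all`, `worker_count` and the `export_one` callback are illustrative names, not RepoNg's API.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Per-user limit quoted in Gerrit's error: "Too many concurrent connections (4) - max. allowed: 4"
GERRIT_MAX_SSH_CONNECTIONS = 4


def worker_count(max_connections: int = GERRIT_MAX_SSH_CONNECTIONS) -> int:
    """Start from the core count (the default described for RepoNg), capped at the Gerrit limit."""
    cores = os.cpu_count() or 1
    return max(1, min(cores, max_connections))


def export_all(repos, export_one, max_connections: int = GERRIT_MAX_SSH_CONNECTIONS):
    """Run export_one(repo) for every repo with at most max_connections calls in flight."""
    with ThreadPoolExecutor(max_workers=worker_count(max_connections)) as pool:
        # map() returns results in the same order as the input repos.
        return list(pool.map(export_one, repos))
```

On an 8-core host `worker_count()` would return 4 under these assumptions, so even a long list of extension repositories never opens more than four connections at once.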
[06:40:06] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[06:49:52] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[06:50:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.024 second response time
[06:56:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[07:15:13] Gerrit, translatewiki.net, Language-Team (Language-2019-April-June): Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Raymond) >>! In T222546#5158738, @Nikerabbit wrote: > The above change is depl...
[08:03:27] (PS1) Hashar: github-php-security: rename grantmetrics to eventmetrics [integration/config] - https://gerrit.wikimedia.org/r/508282 (https://phabricator.wikimedia.org/T222455)
[08:04:25] (CR) Hashar: [C: +2] github-php-security: rename grantmetrics to eventmetrics [integration/config] - https://gerrit.wikimedia.org/r/508282 (https://phabricator.wikimedia.org/T222455) (owner: Hashar)
[08:04:52] Continuous-Integration-Config, Patch-For-Review: Change https://github.com/wikimedia/grantmetrics to https://github.com/wikimedia/eventmetrics for php-composer-security-docker - https://phabricator.wikimedia.org/T222455 (hashar) Open→Resolved a:hashar I have updated https://integration.wikime...
[08:06:27] (Merged) jenkins-bot: github-php-security: rename grantmetrics to eventmetrics [integration/config] - https://gerrit.wikimedia.org/r/508282 (https://phabricator.wikimedia.org/T222455) (owner: Hashar)
[08:22:22] Good morning, @hashar: could you maybe have a look at this change: https://gerrit.wikimedia.org/r/c/integration/config/+/507298 ? We're hoping to finish this process of renaming soon, and this is one of the last things left open.
[08:27:29] noa_wmde: sounds easy ;) deploying it right now! thank you
[08:27:35] (CR) Hashar: [C: +2] Update for WikibaseSchema → EntitySchema rename [integration/config] - https://gerrit.wikimedia.org/r/507298 (https://phabricator.wikimedia.org/T222189) (owner: Lucas Werkmeister (WMDE))
[08:29:03] (Merged) jenkins-bot: Update for WikibaseSchema → EntitySchema rename [integration/config] - https://gerrit.wikimedia.org/r/507298 (https://phabricator.wikimedia.org/T222189) (owner: Lucas Werkmeister (WMDE))
[08:29:28] hashar: no thank you!
[08:30:58] (CR) Hashar: [C: +2] "Should be good now :]" [integration/config] - https://gerrit.wikimedia.org/r/507298 (https://phabricator.wikimedia.org/T222189) (owner: Lucas Werkmeister (WMDE))
[08:40:41] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[08:40:44] Continuous-Integration-Config, Release-Engineering-Team (Kanban), MW-1.27-release-notes, Patch-For-Review, Technical-Debt: Phaseout CI mediawiki config / extensions_load.txt to load extensions - https://phabricator.wikimedia.org/T189567 (hashar) Open→Resolved The last use case of `EXT...
[08:50:36] Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() - when accessing "Burndown" - https://phabricator.wikimedia.org/T222585 (Vlaza-servoy-com)
[08:53:23] Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() - when accessing "Burndown" - https://phabricator.wikimedia.org/T222586 (Vlaza-servoy-com)
[09:08:49] (PS1) Hashar: docker: use libtidy-0.99.0 Debian package [integration/config] - https://gerrit.wikimedia.org/r/508288 (https://phabricator.wikimedia.org/T191771)
[09:17:29] (PS2) Hashar: docker: use libtidy-0.99.0 Debian package [integration/config] - https://gerrit.wikimedia.org/r/508288 (https://phabricator.wikimedia.org/T191771)
[09:20:37] (CR) Hashar: [C: +2] docker: use libtidy-0.99.0 Debian package [integration/config] - https://gerrit.wikimedia.org/r/508288 (https://phabricator.wikimedia.org/T191771) (owner: Hashar)
[09:20:43] Release-Engineering-Team (Kanban), MediaWiki-Core-Testing, MediaWiki-Parser, Quibble, and 2 others: [REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy - https://phabricator.wikimedia.org/T191771 (hashar) Open→Resolved a:hashar I have migr...
[09:22:08] (Merged) jenkins-bot: docker: use libtidy-0.99.0 Debian package [integration/config] - https://gerrit.wikimedia.org/r/508288 (https://phabricator.wikimedia.org/T191771) (owner: Hashar)
[09:26:03] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Patch-For-Review: 'recheck' on a CR+2 patch should trigger gate-and-submit, not test - https://phabricator.wikimedia.org/T105474 (hashar) Open→Resolved
[09:27:57] (CR) Hashar: [C: +2] zuul: skip test/test-prio for CR+2 changes [integration/config] - https://gerrit.wikimedia.org/r/368154 (https://phabricator.wikimedia.org/T105474) (owner: Hashar)
[09:27:59] (Merged) jenkins-bot: zuul: skip test/test-prio for CR+2 changes [integration/config] - https://gerrit.wikimedia.org/r/368154 (https://phabricator.wikimedia.org/T105474) (owner: Hashar)
[09:37:49] (CR) Lars Wirzenius: [C: +1] "Commit message may or may not need amending. If you don't think it does, treat this as a +2 or tell me that you do, and I'll +2 this." (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199) (owner: Dduvall)
[09:43:11] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Operations: contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (hashar) Eventually I have unzipped them, the reason is the log rotation is handled by python logging not by...
[10:35:03] <_joe_> hi, it's more than 20 minutes I wait for a patch of mine to be checked by CI
[10:35:14] <_joe_> what can be done to fix it?
[10:46:03] (PS3) Hashar: Adding trailling slash to doc publishing URL [integration/config] - https://gerrit.wikimedia.org/r/483682 (https://phabricator.wikimedia.org/T213509)
[10:46:23] (CR) Hashar: [C: +2] Adding trailling slash to doc publishing URL [integration/config] - https://gerrit.wikimedia.org/r/483682 (https://phabricator.wikimedia.org/T213509) (owner: Hashar)
[10:48:41] (CR) Hashar: [C: +2] [ThrottleOverride] Add phan [integration/config] - https://gerrit.wikimedia.org/r/506882 (owner: Umherirrender)
[10:49:17] (Merged) jenkins-bot: Adding trailling slash to doc publishing URL [integration/config] - https://gerrit.wikimedia.org/r/483682 (https://phabricator.wikimedia.org/T213509) (owner: Hashar)
[10:50:11] (Merged) jenkins-bot: [ThrottleOverride] Add phan [integration/config] - https://gerrit.wikimedia.org/r/506882 (owner: Umherirrender)
[10:58:44] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (fgiunchedi) >>! In T218729#5156079, @Krenair wrote: >>>! In T218729#5155492, @fgiunchedi wrote: >>>>! In T218729#5153739, @Krenair wrote: >>>>>! In T218729#5143033, @fgiunchedi wrote: >...
[11:01:49] Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() - when accessing "Burndown" - https://phabricator.wikimedia.org/T222585 (Mainframe98)
[11:01:56] Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() - when accessing "Burndown" - https://phabricator.wikimedia.org/T222586 (Mainframe98)
[11:20:37] Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() - when accessing "Burndown" - https://phabricator.wikimedia.org/T222586 (Aklapper) p:Triage→Lowest
[11:36:03] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.033 second response time
[11:42:06] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[11:51:59] Release-Engineering-Team: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (Joe)
[11:52:11] Release-Engineering-Team: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (Joe) p:Triage→Unbreak!
[11:54:16] Continuous-Integration-Infrastructure, Tool-extjsonuploader, Test-Coverage: Provide list of repos with coverage information in machine-readable format - https://phabricator.wikimedia.org/T221510 (hashar) p:Triage→Low I am not sure what you are looking for. Do you need a machine friendly list...
[12:03:45] (CR) Hashar: [C: +2] [TemplateStyles] Add phan [integration/config] - https://gerrit.wikimedia.org/r/507575 (owner: Umherirrender)
[12:05:16] (Merged) jenkins-bot: [TemplateStyles] Add phan [integration/config] - https://gerrit.wikimedia.org/r/507575 (owner: Umherirrender)
[12:11:47] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[12:34:11] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[12:41:32] Continuous-Integration-Infrastructure, Jenkins: JENKINS-2111 path sanitization ineffective when using legacy Workspace Root Directory - https://phabricator.wikimedia.org/T213956 (hashar)
[12:41:35] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Jenkins: Jenkins warning JENKINS-2111 path sanitization ineffective - https://phabricator.wikimedia.org/T217791 (hashar)
[12:44:28] Phabricator, Release-Engineering-Team (Backlog): Getting Admin rights for Phabricator - https://phabricator.wikimedia.org/T221136 (RazShuty) @Aklapper sorry for the super late reply just came back from a long vacation... I just enabled the 2fa for my Phab... and yes you are absolutely right about the und...
[12:44:45] apparently gerrit upstream are creating a steering committee and mentorships roles.
[12:44:58] https://gerrit-review.googlesource.com/c/gerrit/+/223472/
[12:45:04] https://gerrit-review.googlesource.com/c/gerrit/+/223474/
[12:48:44] google's increasing it's investment into gerrit too.
[12:52:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.058 second response time
[12:54:43] Is anyone having a look at T222605 by any chance?
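The Zuul Gearman alert reports the share of recent datapoints that crossed a threshold rather than a single reading: it went CRITICAL with 57.14% of datapoints above 140 and recovered once fewer than 30% were above 90. Below is a hedged Python sketch of that style of evaluation. It assumes, as the two messages suggest but do not state outright, a warning threshold of 90, a critical threshold of 140 and a 30% trigger ratio; `Thresholds` and `evaluate` are hypothetical names, not the actual monitoring plugin.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Thresholds:
    warning: float
    critical: float
    percentage: float  # minimum share (in %) of datapoints that must exceed a threshold


def evaluate(datapoints: Sequence[Optional[float]], t: Thresholds) -> str:
    # Graphite-style series use None for gaps; drop them instead of counting them as OK.
    points = [p for p in datapoints if p is not None]
    if not points:
        return "UNKNOWN: no datapoints"

    def percent_over(limit: float) -> float:
        return 100.0 * sum(1 for p in points if p > limit) / len(points)

    crit = percent_over(t.critical)
    if crit >= t.percentage:
        return f"CRITICAL: {crit:.2f}% of data above the critical threshold [{t.critical}]"
    warn = percent_over(t.warning)
    if warn >= t.percentage:
        return f"WARNING: {warn:.2f}% of data above the warning threshold [{t.warning}]"
    return f"OK: Less than {t.percentage:.2f}% above the threshold [{t.warning}]"
```

With these assumed settings, 4 of 7 datapoints above 140 reproduces the "57.14%" CRITICAL message, while a single spike in a seven-point window (14.29%) would not alert, which makes the check tolerant of brief bursts in the Gearman queue.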
[12:54:44] T222605: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605
[12:55:09] PROBLEM - Citoid on deployment-sca02 is CRITICAL: connect to address 172.16.5.112 and port 1970: Connection refused
[12:55:31] hashar ^^
[13:00:23] Gerrit appears to be more transparent!
[13:03:03] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[13:05:08] RECOVERY - Citoid on deployment-sca02 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.026 second response time
[13:16:35] Continuous-Integration-Infrastructure: Jenkins jobs regularly being queued while resources appear to be readily available - https://phabricator.wikimedia.org/T218458 (hashar) I have trouble figuring out how the job end up being scheduled. The Zuul scheduler triggers Gearman jobs (eg: `build:npm-node-6-docke...
[13:16:53] Continuous-Integration-Infrastructure: Jenkins jobs regularly being queued while resources appear to be readily available - https://phabricator.wikimedia.org/T218458 (hashar) TLDR: we should look at installing the **Least Load plugin** https://plugins.jenkins.io/leastload
[13:18:30] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:31:28] !log Jenkins: installed Least Load plugin | T218458
[13:31:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:31:31] T218458: Jenkins jobs regularly being queued while resources appear to be readily available - https://phabricator.wikimedia.org/T218458
[13:34:49] <_joe_> hashar: are you aware CI is not working?
[13:35:16] <_joe_> no patch is getting checked since a few hours
[13:41:36] _joe_: hi, nope I am unaware of any brokage then I haven't really looked at the ci stack today
[13:41:53] but I got plenty of patches passing through properly
[13:42:01] <_joe_> how can I catch the attention of someone in your team when something like this happens?
[13:42:26] <_joe_> well, you must be lucky, no patch is being voted on of the last 5 I submitted
[13:43:11] <_joe_> and I'm not the only one having that problem
[13:43:25] (PS1) Volans: cumin-tox-publish: use the Python 3.7 env [integration/config] - https://gerrit.wikimedia.org/r/508319
[13:43:26] <_joe_> also all the patches I've reviewed in multiple repos
[13:44:36] (PS1) Hashar: phan: use composer and mysql [integration/config] - https://gerrit.wikimedia.org/r/508320 (https://phabricator.wikimedia.org/T189567)
[13:46:25] my change is not going through ci https://integration.wikimedia.org/zuul/
[13:46:47] Release-Engineering-Team: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (CDanis) My patches are also stuck in the queue, and I'm seeing teammates manually V+2 their Puppet changes.
[13:47:32] (CR) Hashar: "Example failure: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ThrottleOverride/+/506881/" [integration/config] - https://gerrit.wikimedia.org/r/508320 (https://phabricator.wikimedia.org/T189567) (owner: Hashar)
[13:48:39] _joe_: paladox : just give me a repo / change that had nothing triggered and I will gladly investigate
[13:48:51] is it correct to interpret https://grafana.wikimedia.org/d/000000321/zuul?from=now-6h&to=now&panelId=13&fullscreen&orgId=1 as meaning that no operations/puppet changes have been processed for 2.5 hours?
[13:48:51] hashar https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/508127/
[13:48:51] hashar: at least 6 puppet patches have been merged with manual V+2 because of this
[13:48:55] :)
[13:49:03] <_joe_> https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/504578
[13:49:12] hashar: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/508011/
[13:49:17] ok ok :)
[13:49:19] <_joe_> https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
[13:49:27] so that is operations/puppet.git
[13:49:28] <_joe_> we can go on for 1 hour I guess :D
[13:49:34] <_joe_> not just puppet
[13:49:36] <_joe_> all repos
[13:49:39] <_joe_> really
[13:49:47] na not all repos
[13:49:55] <_joe_> https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/508172
[13:50:01] <_joe_> well 99%?
[13:50:14] Release-Engineering-Team: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (Marostegui) >>! In T222605#5159921, @CDanis wrote: > My patches are also stuck in the queue, and I'm seeing teammates manually V+2 their Puppet changes. Same here with my patches.
[13:50:15] <_joe_> anyways, please take a look
[13:50:36] could this be caused by the lowering of how many ssh connections a user can have in gerrit?
[13:51:46] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): CI no more triggers for some/all? repositories! - https://phabricator.wikimedia.org/T222614 (hashar)
[13:51:52] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): CI no more triggers for some/all? repositories! - https://phabricator.wikimedia.org/T222614 (hashar) p:Triage→Unbreak!
[13:51:53] paladox: yeah potentially :-
[13:52:23] anyway I have filled https://phabricator.wikimedia.org/T222614
[13:52:28] digging into logs now
[13:52:44] <_joe_> hashar: there is already a ticket
[13:52:48] <_joe_> UBN
[13:52:51] hashar: there was T222605 as pinged earlier
[13:52:51] T222605: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605
[13:53:44] just mark mine as a dupe I guess
[13:53:52] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): CI no more triggers for some/all? repositories! - https://phabricator.wikimedia.org/T222614 (Marostegui) This looks like a duplicate of {T222605}?
[13:53:57] anywya I am digging in los
[13:54:00] ogs
[13:55:57] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): CI no more triggers for some/all? repositories! - https://phabricator.wikimedia.org/T222614 (Joe)
[13:55:59] Release-Engineering-Team: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (Joe)
[13:56:32] Continuous-Integration-Config, Release-Engineering-Team: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (Paladox)
[13:57:21] Hello. It seems nobody is handling https://phabricator.wikimedia.org/tag/gerrit-privilege-requests/
[14:03:58] (PS1) Hashar: Revert "zuul: skip test/test-prio for CR+2 changes" [integration/config] - https://gerrit.wikimedia.org/r/508323 (https://phabricator.wikimedia.org/T105474)
[14:04:16] (CR) Hashar: [C: +2] Revert "zuul: skip test/test-prio for CR+2 changes" [integration/config] - https://gerrit.wikimedia.org/r/508323 (https://phabricator.wikimedia.org/T105474) (owner: Hashar)
[14:04:45] cdanis: volans: _joe_: paladox: eventually I broken the zuul workflow earlier this morning despite writing tests and testing it live :-///
[14:04:49] revert is ongoing
[14:04:59] oh :(
[14:05:09] ack
[14:05:20] <_joe_> thanks
[14:05:44] (Merged) jenkins-bot: Revert "zuul: skip test/test-prio for CR+2 changes" [integration/config] - https://gerrit.wikimedia.org/r/508323 (https://phabricator.wikimedia.org/T105474) (owner: Hashar)
[14:08:14] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Patch-For-Review: 'recheck' on a CR+2 patch should trigger gate-and-submit, not test - https://phabricator.wikimedia.org/T105474 (hashar) Resolved→Open
[14:09:20] Continuous-Integration-Config, Release-Engineering-Team, Patch-For-Review: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (hashar) Open→Resolved a:hashar I broke Zuul workflow earlier this morning when deploying a change for T105474 despite tests :-(...
[14:13:43] Continuous-Integration-Config, Release-Engineering-Team, Patch-For-Review: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (D3r1ck01) Thank you very much @hashar \o/, I can confirm that the pipeline is back up & running.
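T105474 asked that a 'recheck' on a change already approved with Code-Review +2 be sent to gate-and-submit instead of test; the first attempt instead kept changes from entering the test pipeline, which is what the "CI is unavailable since around 10:00 UTC" reports traced back to. The real logic lives in Zuul's layout configuration, which this log does not show. As a purely illustrative Python model (the `GerritEvent` type and `pipelines_for` function are hypothetical, not Zuul's API), the intended routing could look like this:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GerritEvent:
    kind: str             # e.g. "patchset-created" or "comment-added"
    comment: str = ""     # comment text for "comment-added" events
    code_review: int = 0  # highest Code-Review vote currently on the change


def pipelines_for(event: GerritEvent) -> List[str]:
    """Hypothetical model of the routing requested in T105474."""
    approved = event.code_review >= 2
    is_recheck = event.kind == "comment-added" and event.comment.strip().lower() == "recheck"

    if event.kind == "patchset-created":
        # New patchsets must always reach the test pipeline.
        return ["test"]
    if is_recheck:
        # 'recheck' on an approved change goes straight to gating instead of test.
        return ["gate-and-submit"] if approved else ["test"]
    if event.kind == "comment-added" and approved:
        # A +2 vote enqueues the change for gating.
        return ["gate-and-submit"]
    return []
```

A regression table that asserts an ordinary unapproved patchset still lands in `test` is the kind of check that guards the path lost during this incident.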
[14:18:08] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.076 second response time
[14:23:12] (CR) Hashar: [C: +2] "That fixed https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ThrottleOverride/+/506881/" [integration/config] - https://gerrit.wikimedia.org/r/508320 (https://phabricator.wikimedia.org/T189567) (owner: Hashar)
[14:24:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[14:27:29] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[14:28:20] hmm seems like none of the tests queued are running - https://integration.wikimedia.org/zuul/
[14:28:23] hashar ^^
[14:31:11] (Merged) jenkins-bot: phan: use composer and mysql [integration/config] - https://gerrit.wikimedia.org/r/508320 (https://phabricator.wikimedia.org/T189567) (owner: Hashar)
[14:33:41] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.34.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T220730 (hashar)
[14:34:52] hmm
[14:35:41] paladox: yeah it is busy processing a lot of changes
[14:36:13] oh, though it dosen't look like any tests are running.
[14:52:19] <_joe_> hashar: ops/puppet seems to still be stuck
[14:52:22] hm, i'm not seeing my eventgate trigger-service-pipeline-test-and-publish job starting
[14:52:26] i just pushed a tag to eventgate-ci
[14:52:39] <_joe_> yeah I'm not sure CI is working again
[14:52:41] is there somewhere other than e.g. https://integration.wikimedia.org/ci/blue/organizations/jenkins/trigger-service-pipeline-test-and-publish/activity I cna look?
[14:52:42] ohhhh
[14:54:18] _joe_ all tests seem stuck.
[14:55:04] (PS1) Hoo man: Add EntitySchema to make-wmf-branch/config.json [tools/release] - https://gerrit.wikimedia.org/r/508339 (https://phabricator.wikimedia.org/T221648)
[14:55:10] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Patch-For-Review: 'recheck' on a CR+2 patch should trigger gate-and-submit, not test - https://phabricator.wikimedia.org/T105474 (hashar) Eventually my change broke Zuul workflow entirely. Various changes could not enter the test pipel...
[14:56:35] Release-Engineering-Team, Release Pipeline, serviceops, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[14:56:40] Release-Engineering-Team (Next), CX-cxserver, Release Pipeline, serviceops, and 3 others: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (jijiki) Open→Resolved a:jijiki
[14:57:28] _joe_: yeah it has been overflowed
[15:10:30] Release-Engineering-Team (Next), ChangeProp, Release Pipeline, serviceops, and 2 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (Jdforrester-WMF) Does this need cleanup or can it be marked as Resolved?
[15:12:24] (CR) Jforrester: [C: +2] Add EntitySchema to make-wmf-branch/config.json [tools/release] - https://gerrit.wikimedia.org/r/508339 (https://phabricator.wikimedia.org/T221648) (owner: Hoo man)
[15:13:09] (Merged) jenkins-bot: Add EntitySchema to make-wmf-branch/config.json [tools/release] - https://gerrit.wikimedia.org/r/508339 (https://phabricator.wikimedia.org/T221648) (owner: Hoo man)
[15:16:45] Release-Engineering-Team (Next), ChangeProp, Release Pipeline, serviceops, and 2 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (Pchelolo) changeprop has not been moved to k8s, so no, it can not be marked as resolved.
[15:18:18] Release-Engineering-Team (Next), ChangeProp, Release Pipeline, serviceops, and 2 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (Jdforrester-WMF) Oh, sorry, mis-read the above message.
[15:24:17] (PS5) Kosta Harlan: Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598)
[15:26:24] (PS1) Zfilipin: Send e-mail notification to Ephemeralwaves if a job fails [integration/config] - https://gerrit.wikimedia.org/r/508350 (https://phabricator.wikimedia.org/T217051)
[15:28:17] Gerrit, Release-Engineering-Team, translatewiki.net, Language-Team (Language-2019-April-June): Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Reedy) >>! In T222546#5158936, @Raymond wrote: >...
[15:28:54] Gerrit, Release-Engineering-Team, translatewiki.net, Language-Team (Language-2019-April-June): Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (Reedy) p:Unbreak!→High
[15:29:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.037 second response time
[15:33:10] Gerrit, Release-Engineering-Team, translatewiki.net, Language-Team (Language-2019-April-June): Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4 - https://phabricator.wikimedia.org/T222546 (abi_) I ran the exports today and they ran witho...
[15:35:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[15:38:15] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.34.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T220729 (Jakob_WMDE)
[16:19:49] (PS2) Dduvall: dockerfiles: Provide gradle a writable directory [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199)
[16:21:18] (PS3) Dduvall: dockerfiles: Provide gradle a writable directory [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199)
[16:21:20] (PS6) Dduvall: doc: Publish documentation for pipelinelib [integration/config] - https://gerrit.wikimedia.org/r/507871 (https://phabricator.wikimedia.org/T222199)
[16:22:02] (CR) Dduvall: dockerfiles: Provide gradle a writable directory (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199) (owner: Dduvall)
[16:23:55] (PS4) Dduvall: dockerfiles: Provide gradle a writable directory [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199)
[16:23:59] (PS7) Dduvall: doc: Publish documentation for pipelinelib [integration/config] - https://gerrit.wikimedia.org/r/507871 (https://phabricator.wikimedia.org/T222199)
[16:26:21] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:28:49] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.34.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T220729 (Jdforrester-WMF)
[16:33:04] Release-Engineering-Team (Kanban), User-MModell: Talk with Timo and Fillipo about graphana and sentury for LM ("logging, monitoring, metrics") - https://phabricator.wikimedia.org/T222638 (mmodell)
[16:34:41] Release-Engineering-Team (Kanban), User-MModell: Talk with Timo and Fillipo about grafana and sentury for LM ("logging, monitoring, metrics") - https://phabricator.wikimedia.org/T222638 (mmodell)
[16:53:06] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Installer, MW-1.32-release, and 3 others: MediaWiki web installer do not show extension when their dependency is missing - https://phabricator.wikimedia.org/T220514 (Krinkle)
[16:53:26] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Installer, MW-1.32-release, and 2 others: MediaWiki web installer do not show extension when their dependency is missing - https://phabricator.wikimedia.org/T220514 (Krinkle)
[16:53:36] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Installer, MW-1.33-release, and 2 others: MediaWiki web installer do not show extension when their dependency is missing - https://phabricator.wikimedia.org/T220514 (Krinkle)
[16:53:52] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Installer, MW-1.33-release, Patch-For-Review: MediaWiki web installer do not show extension when their dependency is missing - https://phabricator.wikimedia.org/T220514 (Krinkle)
[17:00:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.024 second response time
[17:01:03] Phabricator, Project-Admins, Release-Engineering-Team (Next), Operations, and 2 others: Document how to convert projects into subprojects/milestones etc (sudo privileges for phab admins to run move_project script) - https://phabricator.wikimedia.org/T221112 (Dzahn) @Aklapper @mmodell The request...
[17:02:01] Phabricator, Project-Admins, Release-Engineering-Team (Next), Operations, and 2 others: Document how to convert projects into subprojects/milestones etc (sudo privileges for phab admins to run move_project script) - https://phabricator.wikimedia.org/T221112 (mmodell) @dzahn: it has a bunch of par...
[17:02:12] Phabricator, Project-Admins, Release-Engineering-Team (Next), Operations, and 2 others: Document how to convert projects into subprojects/milestones etc (sudo privileges for phab admins to run move_project script) - https://phabricator.wikimedia.org/T221112 (Dzahn) a:Aklapper Puppet ran on ph...
[17:02:39] Phabricator, Project-Admins, Release-Engineering-Team (Next), Operations, and 2 others: Document how to convert projects into subprojects/milestones etc (sudo privileges for phab admins to run move_project script) - https://phabricator.wikimedia.org/T221112 (mmodell) see T221112#5121800
[17:03:33] Phabricator, Project-Admins, Release-Engineering-Team (Next), Operations, and 2 others: Document how to convert projects into subprojects/milestones etc (sudo privileges for phab admins to run move_project script) - https://phabricator.wikimedia.org/T221112 (Dzahn) >>! In T221112#5160984, @mmodel...
[17:03:50] Continuous-Integration-Infrastructure, MediaWiki-Installer, MW-1.33-release, Patch-For-Review: MediaWiki web installer do not show extension when their dependency is missing - https://phabricator.wikimedia.org/T220514 (greg)
[17:20:51] (CR) Dduvall: [C: +2] dockerfiles: Provide gradle a writable directory [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199) (owner: Dduvall)
[17:22:05] (Merged) jenkins-bot: dockerfiles: Provide gradle a writable directory [integration/config] - https://gerrit.wikimedia.org/r/508036 (https://phabricator.wikimedia.org/T222199) (owner: Dduvall)
[17:26:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[17:28:31] Continuous-Integration-Config, Tool-extjsonuploader, Test-Coverage: Provide list of repos with coverage information in machine-readable format - https://phabricator.wikimedia.org/T221510 (Tgr) > Do you need a machine friendly list of (extension, coverage percent)? Yeah.
[17:34:38] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (Krenair) >>! In T218729#5159473, @fgiunchedi wrote: >>>! In T218729#5156079, @Krenair wrote: >> No worries I'm happy to take care of that sort of problem, I've got the puppet repo sorte...
[17:44:32] !log Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/c/integration/config/+/508036
[17:44:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:51:18] Gerrit, Release-Engineering-Team (Watching / External), Operations, ops-eqiad, serviceops: Gerrit Hardware Upgrade - https://phabricator.wikimedia.org/T222391 (CDanis) cc @mark who I know is about to start looking at hardware requests for the coming FY
[17:52:37] Gerrit, Release-Engineering-Team (Watching / External), Operations, ops-eqiad, serviceops: Gerrit Hardware Upgrade - https://phabricator.wikimedia.org/T222391 (Dzahn) I expect this to be a topic in our (DP - SRE) meeting this Wednesday.
[17:57:07] (PS1) Dduvall: dockerfiles: Fix GRADLE_USER_HOME creation [integration/config] - https://gerrit.wikimedia.org/r/508370
[18:06:31] Continuous-Integration-Infrastructure: Add two more zuul-merger process - https://phabricator.wikimedia.org/T222645 (hashar)
[18:06:42] Continuous-Integration-Infrastructure, Wikimedia-Incident: Add two more zuul-merger process - https://phabricator.wikimedia.org/T222645 (hashar)
[18:08:42] Phabricator, Project-Admins, Release-Engineering-Team (Next), Operations, and 2 others: Document how to convert projects into subprojects/milestones etc (sudo privileges for phab admins to run move_project script) - https://phabricator.wikimedia.org/T221112 (Dzahn) >>! In T221112#5160991, @Dzahn...
[18:12:09] Continuous-Integration-Infrastructure, Wikimedia-Incident: Create a controlled and ongoing CI pipeline test job that we can alert on - https://phabricator.wikimedia.org/T158054 (hashar)
[18:18:45] Release-Engineering-Team (Kanban), Developer Productivity, Release Pipeline, local-charts, Patch-For-Review: Define a base docker-pkg template and .pipeline/blubber.yaml for mediawiki/core - https://phabricator.wikimedia.org/T218360 (brennen)
[18:20:15] (CR) Jforrester: "The task this is tagged against is marked as Resolved (via I7a895aede4); does that mean this can be abandoned?" [integration/config] - https://gerrit.wikimedia.org/r/395610 (https://phabricator.wikimedia.org/T159591) (owner: Hashar)
[18:35:14] (CR) Dduvall: [C: +2] dockerfiles: Fix GRADLE_USER_HOME creation [integration/config] - https://gerrit.wikimedia.org/r/508370 (owner: Dduvall)
[18:36:42] (Merged) jenkins-bot: dockerfiles: Fix GRADLE_USER_HOME creation [integration/config] - https://gerrit.wikimedia.org/r/508370 (owner: Dduvall)
[18:37:40] !log Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/c/integration/config/+/508370
[18:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:45:29] (PS8) Dduvall: doc: Publish documentation for pipelinelib [integration/config] - https://gerrit.wikimedia.org/r/507871 (https://phabricator.wikimedia.org/T222199)
[19:02:59] heads up
[19:03:21] Jenkins has a different scheduling mecanism applied
[19:03:33] so that it should spread the load more evenly across all nodes
[19:03:37] https://phabricator.wikimedia.org/T218458#5159850
[19:04:02] or in short, I have installed the Least Load plugin https://plugins.jenkins.io/leastload
[19:04:52] Continuous-Integration-Infrastructure: Jenkins jobs regularly being queued while resources appear to be readily available - https://phabricator.wikimedia.org/T218458 (hashar) Open→Resolved a:hashar I am assuming the plugin magically fixed it up.
[19:05:44] (CR) Krinkle: [C: +1] "Per the zuul diff, the postmerge job is now on the master branch only. Previously could be on other branches/tags as well, but never made " [integration/config] - https://gerrit.wikimedia.org/r/508012 (owner: Jforrester)
[19:05:47] (PS2) Krinkle: [TemplateData] Replace extension-jsduck and mwext-jsduck-publish with extension-javascript-documentation [integration/config] - https://gerrit.wikimedia.org/r/508012 (owner: Jforrester)
[19:05:50] (CR) Krinkle: [C: +2] [TemplateData] Replace extension-jsduck and mwext-jsduck-publish with extension-javascript-documentation [integration/config] - https://gerrit.wikimedia.org/r/508012 (owner: Jforrester)
[19:06:52] hashar: Node 6 is now EOL.
[19:06:57] https://phabricator.wikimedia.org/T211784
[19:07:19] (Merged) jenkins-bot: [TemplateData] Replace extension-jsduck and mwext-jsduck-publish with extension-javascript-documentation [integration/config] - https://gerrit.wikimedia.org/r/508012 (owner: Jforrester)
[19:07:45] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/508012
[19:07:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:08:49] Krinkle: Want to work on https://phabricator.wikimedia.org/T222406 with me (or suggest something less drastic)? :-)
[19:11:13] James_F: My only advice would be to skip step 1 and 7, and instead use an auto-detect for that step (already exists afaik). So they're disabled in individual repos first (your step 2), and then re-enabled at some point in a way that will be required to pass CI and can be worked on/iterated on accordingly until it passes (your step 5-6).
[19:11:33] For myself, I won't have time for that I think, but can certainly consult a little here and there.
[19:11:50] Note that it is blocked on T199116 either way.
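For readers unfamiliar with "least load" scheduling (the Least Load plugin installed at 19:04), here is a minimal sketch of the idea under stated assumptions: pick the node whose executors are proportionally the least busy, instead of piling builds onto a node that already has work. This is not the plugin's code. The `Node` model, the `pick_node` helper, the node names and the tie-breaking rule are all hypothetical and only illustrate the "spread the load more evenly across all nodes" goal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical model of a CI agent: name, total executors, busy executors.
    name: str
    executors: int
    busy: int

    @property
    def idle(self) -> int:
        return self.executors - self.busy

    @property
    def load(self) -> float:
        # Fraction of executors in use; 0.0 means the node is completely idle.
        return self.busy / self.executors


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the least loaded node that still has a free executor, or None."""
    candidates = [n for n in nodes if n.idle > 0]
    if not candidates:
        return None  # every executor is busy: the build stays in the queue
    # Lowest load first; on a tie, prefer the node with more idle executors.
    return min(candidates, key=lambda n: (n.load, -n.idle))


nodes = [Node("agent-1", 4, 3), Node("agent-2", 4, 1), Node("agent-3", 2, 2)]
print(pick_node(nodes).name)  # agent-2
```

A real scheduler also has to honour labels and node restrictions; the sketch ignores those and only shows the ordering rule.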
[19:11:51] T199116: Quibble should run `npm install` and `npm run selenium-test` for each extension/skin that has Selenium tests - https://phabricator.wikimedia.org/T199116
[19:11:58] Right now it's all global un-upgradeable.
[19:12:05] (PS6) Kosta Harlan: Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598)
[19:12:06] Same as pre-2013 jshint.
[19:12:23] Fixing that is a blocker to being able to change anything, unfortunately.
[19:12:24] Auto-detect what?
[19:12:32] James_F: Auto-detect whether to run selenium tests.
[19:12:42] We do that already for most other steps, and I think even for wdio as well already.
[19:12:43] Oh, no, I mean "don't run them at all anywhere".
[19:13:08] Not on a per-repo basis. This is the upgrade-global step.
[19:13:31] Right, but actual changes will be needed per-repo no matter what, if anything, the wdio dependency needs to be bumped, and test suites reformatted.
[19:13:46] The only thing quibble does is run an npm-run command.
[19:13:59] And it won't be able to even do 'npm install/test' for eslint if there are incompat dependencies.
[19:14:04] So it'll need to be disabled per-repo.
[19:14:07] Yes, but right now we can't fix things anywhere because we can't bump MW-core because quibble will not let us bump npm.
[19:14:10] At which point, no need to disable anything in quibbe :)
[19:14:16] (CR) jerkins-bot: [V: -1] Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598) (owner: Kosta Harlan)
[19:14:36] (PS2) Kosta Harlan: sonar-scanner: Adjust polling script [integration/config] - https://gerrit.wikimedia.org/r/508086 (https://phabricator.wikimedia.org/T218598)
[19:14:43] So fixing individual repos after the fact is a nice-to-have.
[19:15:58] James_F: I assume you'd like repos that have no activity but some wdio 4 tests to fail after we're done, instead of remain disabled, so that the failures are seen.
[19:16:07] I have seen James had a proposal to handle the non trivial nodejs / webdriver.io upgrade
[19:16:12] Right now quibble is running `npm run selenium`.
[19:16:27] but I could not process the info :/
[19:16:33] Krinkle: Yes, that's what the "re-enable selenium" step would do.
[19:17:06] hashar: Essentially, I propose to break some repos to get CI running node 10, and then fix them.
[19:17:10] at least REL1_27 does not have any wdio test s:]
[19:17:29] hmm
[19:17:35] well we can probably craft an experimental job
[19:17:45] that just runs quibble --run npm-test,selenium
[19:17:56] I could just patch Core's selenium.sh job to exit.
[19:18:05] run that on all the important repo (that is reasonably easy to do ) collect result and assert it is not too bad
[19:18:07] James_F: yes, I agree. My point is, it is a step we don't need. the process requires that, no matter what, for CI to work during the migration, any repos that want to use Gerrit need to disable it locally in their repos as well. As otherwise 'npm install' would fail with incompat deps.
[19:18:24] James_F: Any repos that dont participate won't be doing that step either. And thus will still fail afterwards just the same way.
[19:18:36] Oh, right, for local webdriveio issues, yes, good point.
[19:18:40] quibble's run of 'npm run selenium' is conditional.
[19:19:03] So we'd need to force-merge for repos in gate that have webdriverio specified.
[19:19:04] For example, Minerva didnt have wdio tests until recently, now it does. It just started to pick them up. It didn't run that command previously.
[19:19:23] * James_F sighs.
[19:19:35] and Minerva is now in the wmf-quibble jobs gate
[19:19:35] No config change was needed for that, because just like 'npm test', it is conditional on certain things being found in the repo.
[19:19:46] (asked by Jon last week since it is hmm critical to mobile apparently)
[19:20:03] Anyway, it's a little extra work, if those doing it insist on it, that's fine, I'm just saying it can be done in fewer steps with no observable difference
[19:20:22] So we need to patch quibble to both not try to run selenium, but also to patch out requests to install webdriverio?
[19:20:38] It's not `npm test` that will fail on node 10, it's `npm install`.
[19:20:43] "Temporarily remove webdriverio from core's npm build (as it's node10-incompatible)"
[19:20:51] This means wdio won't run, thus skipped naturally.
[19:20:58] Yeah, but not just core, also everyone else's repos.
[19:21:12] the conditional is checking for the existence of tests/selenium/ https://github.com/wikimedia/quibble/blob/master/quibble/cmd.py#L553-L556
[19:21:20] does installing wdio 4 on node 10 fail only in core, not in other repos?
[19:21:21] E.g. https://gerrit.wikimedia.org/g/mediawiki/extensions/Cite/+/HEAD/package.json which is in gate.
[19:21:52] Krinkle: No, it will fail in any repo. WDIO 4 depends on fibres 2 which is not node 10 compatible.
[19:21:52] Yeah, so regardless of whether we patch quibble to hard-skip "npm run selenium" we need to remove those from indivudla repos' packagejson
[19:21:53] last time I checked that is because wdio4 depends on wdio-async which depends on fibers@2.x
[19:21:59] Exactly :)
[19:22:07] and fivers@2.x only works with nodejs 6 / 8 due to abi compatibility
[19:22:14] OK, so. What are we going to do about it, and when?
[19:22:16] hashar: yep, https://phabricator.wikimedia.org/T222406 is what we are discussing.
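The conditional described at 19:21:12 is why Minerva started running its new wdio tests without any config change: the Selenium step is gated on the presence of a `tests/selenium/` directory. A minimal sketch of that kind of check, assuming only what the log states. The function name `should_run_selenium` is hypothetical and this is not Quibble's code (the linked `cmd.py` lines are the real implementation).

```python
import os
import tempfile


def should_run_selenium(repo_dir: str) -> bool:
    """Hypothetical check: only run the Selenium npm script if tests/selenium/ exists."""
    return os.path.isdir(os.path.join(repo_dir, "tests", "selenium"))


with tempfile.TemporaryDirectory() as repo:
    print(should_run_selenium(repo))  # False: repository has no browser tests yet
    os.makedirs(os.path.join(repo, "tests", "selenium"))
    print(should_run_selenium(repo))  # True: tests were added, the step is picked up
```

Note that such a check only decides whether the test command runs. As pointed out at 19:20:38, the failure on node 10 happens earlier, at `npm install`, so a directory check alone cannot skip it.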
[19:22:29] and upgrading wdio to 5 has a dependency on fibers@3.x
[19:22:47] then maybe we can get wdio4 to work with fibers@2.x and hence on nodejs 10
[19:22:55] yeah
[19:23:07] Continuous-Integration-Config, Release-Engineering-Team (Backlog), JavaScript: Switch quibble-based CI jobs from node6 to node10 - https://phabricator.wikimedia.org/T222406 (Jdforrester-WMF)
[19:23:09] zeljkof reported that fiber/nodejs/wdio a few weeks ago and we dig a little bit into it
[19:23:21] Continuous-Integration-Config, Release-Engineering-Team (Backlog), JavaScript: Switch quibble-based CI jobs from node6 to node10 - https://phabricator.wikimedia.org/T222406 (Jdforrester-WMF)
[19:23:43] ah yeah https://phabricator.wikimedia.org/T210506#5065813
[19:24:10] hashar: We can also stop using fibers in wdio. It is only used for the counter-intuitive weird way that we do async code. Normally in JS we use promises for that. Instead, wdio has an (optional) feature for magically making the JS pause whenever a promise is pending (sometimes) and then continue. That is what fibers does.
[19:24:26] If we use promises instead, we can migrate on wdio 4 and then move to node 10 without any downtime.
[19:24:38] Timo, you are largely over estimating my knowledge about the javascript world :]
[19:24:38] and then upgrade to wdio 5/6 whenever.
[19:24:40] It's in the install tree though.
[19:24:54] but yeah webdriver.io > wdio.async > fiber
[19:24:57] And wdio 4->5 is a breaking change.
[19:25:19] so maybe there is a way to get wdio4 to be installable and work just fine under nodejs 10
[19:25:24] which would ease the migration
[19:25:30] James_F: kind of, fibers comes in via 'wdio-sync'
[19:25:54] Which is not part of 'webdriverio'
[19:26:03] we're installing it somehow, I don't recall how
[19:26:29] Ah, wdio-mocha-framework
[19:26:52] Yeah.
[19:27:00] I cant remember the history , but I think wdio 3 was async
[19:27:10] and went with sync mode instead to make it easier
[19:27:23] one can then optionally set sync: false to restore the old behavior
[19:27:43] (PS7) Kosta Harlan: Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598)
[19:27:44] easier for people who are okay with post-poning learning how JS works everywhere else in the world. harder for anyone else, including beginners that have learned javascript a litlte bit correctly.
[19:27:55] https://phabricator.wikimedia.org/T182412 Investigate if WebdriverIO `sync: false` would be useful to us and document how to use it
[19:27:58] and very very hard to debug.
[19:28:13] :-(
[19:28:26] which i have declined, apparently because sync: false is a feature flag for legacy webdriver 3
[19:28:33] Also, I've seen most developers (wmf, wmde, volunteers) frequently forget to use browser.call() to trigger this magic, and thus have async stacks that are not sync-ified.
[19:28:53] which then leads to flaky tests
[19:28:53] well
[19:29:00] and last 3 months, most flaky tests are all wdio.
[19:29:12] we need a javascript endowment to start cloning Timo and save the web!
[19:29:21] If only.
[19:29:34] I expect wdio to deprecate sync mode in wdio 6 in favour of async-await on Node 10
[19:29:37] Timo.clone() ERR: Method not yet implemented.
[19:29:40] to be fair, wdio/javascript is not the only culpirt. There are bunch of races with mediawiki itself :-(
[19:29:46] Their new exxamples already do not use sync mode anymore.
[19:30:02] the good news is that I have eventually found a couple of very nasty bugs which Brad kindly fixed up
[19:30:41] I'm sure they exist, although so far all browser tests I have seen are simple and do not have mw-related race conditions. Would be interested in an example to help incorporate in docs I might be writing about wdio.
[19:30:41] but even if tests were switched to use async
[19:30:43] Links welcome :)
[19:30:57] the wdio-mocha-framework would still have a dependency on wdio-sync and thus fiber ?
[19:32:16] Yes, we might need to fork that or work with upstream to change their mind.
[19:32:26] It does not appear to be needed for it.
[19:32:28] gtg
[19:34:29] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324 (thcipriani) >>! In T141324#5122978, @herron wrote: >>>! In T141324#5122806, @Gehel wrote: >> * structured logging from log4j can...
[19:36:08] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.569 second response time
[19:40:28] (PS1) Hashar: WMF: backport Don't call the merger for non-live items [integration/zuul] (patch-queue/debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/508388 (https://phabricator.wikimedia.org/T140297)
[19:42:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[19:43:26] (PS1) Hashar: 2.5.1-wmf8: Don't call merger for non live item [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/508390 (https://phabricator.wikimedia.org/T140297)
[19:46:21] thcipriani: brennen: longma: theorically, might it be a good thing to have a repo that uses docker-pkg / helm to "easily" reproduce our stack of gerrit/zuul/jenkins ?
[19:47:20] I am notably interested in having an easy way to setup gerrit and zuul (jenkins not so much but might still be interesting)
[19:47:57] I think it would be nice if we could easily recreate our CI/CD tooling, yeah
[19:48:50] if we use helm they would have to be running in k8s though
[19:49:11] no minikube?
[19:49:29] ,D
[19:49:43] it would be helpful from time-to-time to debug certain scenarios.
[19:49:51] it is probably nicer to have helm > k8s rather than docker composer which is not going to be the stack we recommend/use
[19:50:09] i don't think i have a good understanding of what it would take to reproduce that environment now
[19:50:15] so it seems like a good idea in that sense
[19:50:28] yeah I am just looking for a quick feedback as whether it might make sense
[19:50:32] or whether I am just over thinking
[19:53:05] in all case, I would guess that any state should be stored on the k8s vm or the host
[19:53:15] I think could be fine? (it is a k8s cluster)...I just don't know how we currently run gerrit/zuul/jenkins either
[19:53:25] I don't know either
[19:53:28] * hashar grins
[19:53:30] :)
[19:53:31] hah
[19:53:36] they are on baremetal
[19:53:41] that's a good sign. ;)
[19:53:41] with mixed deployment cases
[19:54:06] zuul is a hacked debian package that runs pip install (via dh_virtualenv) to grab missing python modules from pypi.python.org
[19:54:09] its terrible
[19:54:38] Jenkins is an upstream Debian package that install a .jar , with the start up script being defined in puppet and all the rest not in any config management :-(
[19:54:57] and Gerrit uses scap + git-fat to deploy with startup and config in pupet
[19:55:10] but if I just need a sandbox area
[19:55:12] we ran jenkins in k8s at my old job, but I wasn't involved in setting it up
[19:55:22] so a container with zuul , another with the gerrit version we use
[19:55:31] the git repos on the host
[19:55:31] rapid fire questions: what's the use-case for this? I can see it being useful occasionally for debugging plugins and that kind of thing. Are there other use-cases for something like this? Is the idea to eventually to deploy to prod on k8s?
[19:55:36] upstream have a repo for gerrit in k8s
[19:55:42] then have helm tomagically inject the IP of gerrit in zuul.conf
[19:55:42] ;)
[19:55:54] hashar thcipriani longma brennen https://gerrit-review.googlesource.com/admin/repos/k8s-gerrit
[19:56:17] my use case is testing patches for zuul
[19:56:29] and potentially test zuul against new version of Gerrit
[19:56:50] so really I am just thinking of a quick way to setup a dev environment
[19:56:57] automated or manual testing?
[19:57:05] manual develop
[19:57:18] though zuul has a fairly large integration suite for Gerrit
[19:57:26] in reality, it just mock all Gerrit calls
[19:57:55] interesting
[19:58:13] for Gerrit I don't know
[19:58:25] I guess they are making it database less so that eventually it can run in a container
[19:58:33] I mean
[19:58:44] so that Gerrit can run in a container on k8s and thus get stateless
[19:58:50] why can't it run in a container and use an external database?
[19:59:08] hashar yup (gerrit 3.0 dropped the db)
[19:59:10] cause databases are so not 2010's / hipster?
[19:59:17] hah
[19:59:22] I am kidding
[19:59:32] paladox: what was the use case for dropping the database ?
[19:59:37] 2020s hipster: we're living in a weird scifi future.
[19:59:43] https://gerrit-review.googlesource.com/Documentation/note-db.html
[20:00:21] it's so that you doin't need dbas to maintain gerrit
[20:00:22] it seems possible. Gerrit is currently just a bunch of jar files. You could run it with an h2 database for testing (i.e., just a file)
[20:00:30] anyway, is the question more about wanting infrastructure as code to set up our CI/CD tooling or is it about wanting it in containers?
[20:01:41] very easy to deploy gerrit too
[20:01:50] (from 2.16)
[20:03:43] longma: I guess longterm almost everything will be in containers / k8s
[20:04:10] so I would say it is more aout having infra as code
[20:05:50] * brennen -> lunch.
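The "inject the IP of gerrit in zuul.conf" idea from 19:55:42 amounts to rendering the `[gerrit]` section of Zuul's config from whatever address the Gerrit container gets. A minimal sketch, with Python's `configparser` standing in for the Helm templating discussed above. Assumptions: the `server`/`port`/`user`/`sshkey` keys follow the Zuul 2.x `zuul.conf` layout as we understand it, and the `render_zuul_conf` helper, account name and key path are hypothetical placeholders.

```python
import configparser
import io


def render_zuul_conf(gerrit_host: str, gerrit_port: int = 29418) -> str:
    """Render a minimal zuul.conf [gerrit] section (hypothetical helper)."""
    conf = configparser.ConfigParser()
    conf["gerrit"] = {
        "server": gerrit_host,                 # address of the Gerrit container/service
        "port": str(gerrit_port),              # Gerrit SSH port; 29418 is Gerrit's default
        "user": "zuul",                        # placeholder account name
        "sshkey": "/var/lib/zuul/ssh/id_rsa",  # placeholder key path
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(render_zuul_conf("10.0.0.5"))
```

In a Helm chart the same value would come from the Gerrit service name rather than a hard-coded IP, which avoids re-rendering when the pod is rescheduled.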
[20:05:50] 👍 I think it's a good idea...I think marxarelli was interested in a reproducible deployment environment as well
[20:06:09] yeah, seems a good idea.
[20:06:22] * brennen -> actually lunch.
[20:09:25] good
[20:09:34] so maybe I will look at crafting such env at some point :]
[20:17:36] hashar: cool, I'd be happy to help where I can!
[20:18:16] (CR) Dzahn: [C: +1] "lgtm, unicode chars confirmed" [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:18:51] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): Upgrade python-pbr on contint1001 / contint2001 and restart Zuul process - https://phabricator.wikimedia.org/T222659 (hashar)
[20:20:22] (CR) Hashar: "Oops I forgot about this change. We might need to rebase this one in case we had new pipelines added meanwhile." [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:20:46] (CR) Dzahn: "was there a IRC conversation about it or a specific request?" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/508068 (owner: Paladox)
[20:20:51] paladox: if you can rebase the " ❌ for failure and ✅ for success" https://gerrit.wikimedia.org/r/#/c/integration/config/+/464643/ I guess it is about time to have it deployed!
[20:21:02] yup
[20:21:07] * paladox does
[20:27:53] (CR) Dzahn: "adding Alex" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/508068 (owner: Paladox)
[20:28:36] (PS9) Paladox: Use ❌ for failure and ✅ for success [integration/config] - https://gerrit.wikimedia.org/r/464643
[20:28:46] hashar ^^
[20:29:11] (CR) Paladox: "> was there a IRC conversation about it or a specific request?" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/508068 (owner: Paladox)
[20:30:04] paladox: :]
[20:34:43] (PS2) Paladox: grant access to Javamelody Monitoring for ldap/ops [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/508068
[20:34:48] (CR) Hashar: [C: +2] Use ❌ for failure and ✅ for success [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:35:10] \o/
[20:36:38] (Merged) jenkins-bot: Use ❌ for failure and ✅ for success [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:40:27] (CR) Paladox: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:45:05] Hey, I ran into an issue where Gerrit no longer recognizes my previously working Wikitech user credentials. I tried the workarounds suggested in earlier similar tickets (i.e. trying to log in all case variations of the user name) but it didn't help. I was told you might be able to help me out :)
[20:46:44] mszabo-wikia hi, is it the error when it brings up an error saying something about attaching?
[20:47:10] yeah the full context is in https://phabricator.wikimedia.org/T222186
[20:47:16] "Cannot assign user name "tk-999" to account <...>; name already in use."
[20:48:05] thanks
[20:48:07] thcipriani ^^
[20:50:42] Continuous-Integration-Infrastructure: zuul git-daemon sometime reject connections: Too many children, dropping connection - https://phabricator.wikimedia.org/T222661 (hashar)
[20:54:09] (CR) Dzahn: "+ chaomodus" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/508068 (owner: Paladox)
[20:54:47] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): zuul git-daemon sometime reject connections: Too many children, dropping connection - https://phabricator.wikimedia.org/T222661 (hashar) a:hashar
[20:56:00] (CR) Hashar: [C: +2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:56:50] paladox: not quite there yet : https://gerrit.wikimedia.org/r/#/c/test/gerrit-ping/+/508376/ :-(
[20:57:10] oh
[20:57:54] apparently I did a review on ps4
[20:58:22] (CR) CRusnov: [C: +1] "This seems like a logical change." [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/508068 (owner: Paladox)
[20:58:34] hashar should we do https://gerrit.wikimedia.org/r/c/integration/config/+/464643/9#message-04af741362fc9f5260130d50095273deb6d5b7ee instead ?
[20:59:00] (PS8) Kosta Harlan: Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598)
[20:59:04] (PS1) Hashar: Revert "Use ❌ for failure and ✅ for success" [integration/config] - https://gerrit.wikimedia.org/r/508412
[20:59:31] (CR) Hashar: [C: +2] "https://gerrit.wikimedia.org/r/#/c/test/gerrit-ping/+/508376/" [integration/config] - https://gerrit.wikimedia.org/r/508412 (owner: Hashar)
[21:00:33] and in the ssh log: jenkins-bot a/75 gerrit.review.--project.test/gerrit-ping.--message.xe2x9cx85 Main test build succeeded.
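The `xe2x9cx85` in the ssh log at 21:00:33 is the UTF-8 encoding of ✅ (U+2705), bytes `E2 9C 85`, written out as escapes whose backslashes were lost somewhere along the way. The sketch below reproduces that string. The `escape_non_ascii` helper is hypothetical, and the assumption that the logging side escaped each non-ASCII byte as `\xNN` is ours: the log does not show which component did the escaping.

```python
def escape_non_ascii(text: str, prefix: str = "\\x") -> str:
    """Render each non-ASCII UTF-8 byte of `text` as prefix + two hex digits."""
    return "".join(
        chr(b) if b < 0x80 else f"{prefix}{b:02x}"  # ASCII bytes pass through unchanged
        for b in text.encode("utf-8")
    )


print("✅".encode("utf-8"))                  # b'\xe2\x9c\x85'
print(escape_non_ascii("✅"))                # \xe2\x9c\x85
print(escape_non_ascii("✅", prefix="x"))    # xe2x9cx85 (as seen in the ssh log)
print(escape_non_ascii("❌", prefix="x"))    # xe2x9dx8c
```

Seeing raw bytes instead of the character suggests the message was treated as bytes rather than text at some point between Zuul and Gerrit's ssh command, which fits the decision to revert.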
[21:00:35] bah
[21:01:42] (Merged) jenkins-bot: Revert "Use ❌ for failure and ✅ for success" [integration/config] - https://gerrit.wikimedia.org/r/508412 (owner: Hashar)
[21:02:01] paladox: reverted sorry :(
[21:02:06] ok
[21:02:58] Gerrit, Release-Engineering-Team (Watching / External), Operations: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (crusnov) Just to +1 the idea of shipping javamelody to prometheus. Let me know if I can help at all.
[21:03:57] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog): fatal: remote error: access denied or repository not exported: /mediawiki/extensions/ReadingLists - https://phabricator.wikimedia.org/T187897 (hashar) > Also we might be reaching the maximum number of connections which defaults to...
[21:04:12] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Patch-For-Review: zuul git-daemon sometime reject connections: Too many children, dropping connection - https://phabricator.wikimedia.org/T222661 (hashar) p:Triage→Low
[21:08:36] Gerrit, Release-Engineering-Team (Watching / External), Operations: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (Paladox) @crusnov we could use your help, yup. We need to create a prometheusBearerToken [plugin.javamelody.prometheusBearerToken] https://gerrit.googleso...
[21:11:28] Phabricator, Developer-Advocacy (Apr-Jun 2019): Re-evaluate our use of Phabricator Conpherence chat - https://phabricator.wikimedia.org/T127640 (Dzahn) > Re "messaging problematic users in Phab itself": This is what notifications are for. Mentioning a user by @name or subscribing them will notify them....
[21:14:18] (PS3) Kosta Harlan: sonar-scanner: Adjust polling script [integration/config] - https://gerrit.wikimedia.org/r/508086 (https://phabricator.wikimedia.org/T218598)
[21:15:49] Phabricator, Developer-Advocacy (Apr-Jun 2019): Re-evaluate our use of Phabricator Conpherence chat - https://phabricator.wikimedia.org/T127640 (Aklapper) >>! In T127640#5162133, Dzahn wrote: >> Re "messaging problematic users in Phab itself": > > This is what notifications are for. Mentioning a user b...
[21:20:57] Continuous-Integration-Config, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: CI is unavailable since around 10:00 UTC - https://phabricator.wikimedia.org/T222605 (hashar) Sorry for the mess today, I was really not paying attention to any IRC notifications and was otherwise busy...
[21:35:54] (CR) Hashar: [C: -2] "I guess we also need https://review.opendev.org/#/c/589762/" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/508390 (https://phabricator.wikimedia.org/T140297) (owner: Hashar)
[21:36:58] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Zuul, Patch-For-Review, Upstream: 'recheck' on a CR+2 patch should trigger gate-and-submit, not test - https://phabricator.wikimedia.org/T105474 (hashar) I need a test that is the other way around, namely ensure a patch without...
[21:40:50] paladox: good thing about today zuul incident, I guess we have hit two bugs in zuul :]
[21:40:59] heh, yeh
[21:41:02] which is good because the fix seems to be available
[21:41:15] and bad cause we are so outdated that we should have benefited frmo those hotfixes :-\
[21:41:24] I should have commited to maintained a v2.5 on upstream repo
[21:41:33] :)
[21:42:05] and handle all the backportings :D
[21:42:06] anyway
[21:42:10] bed time!
[21:42:11] heh
[21:42:58] pasted at https://wikitech.wikimedia.org/wiki/Incident_documentation/20190506-zuul
[21:43:01] *wave*
[21:43:18] i saw that the fix was only published a few months ago
[21:43:21] into zuul v3
[21:44:07] yes
[21:44:12] i will need to pick them
[21:44:17] and then fix our tests ideally
[21:44:28] or setup a gerrit/zuul and do some manual testing ;/
[21:44:37] anyway that will be after a good night and a breakfast!
[21:44:38] ;)
[21:46:46] (PS4) Kosta Harlan: sonar-scanner: Adjust polling script, drop JSON pretty-print output [integration/config] - https://gerrit.wikimedia.org/r/508086 (https://phabricator.wikimedia.org/T218598)
[21:49:31] (CR) Thcipriani: [C: +2] sonar-scanner: Adjust polling script, drop JSON pretty-print output [integration/config] - https://gerrit.wikimedia.org/r/508086 (https://phabricator.wikimedia.org/T218598) (owner: Kosta Harlan)
[21:50:57] (Merged) jenkins-bot: sonar-scanner: Adjust polling script, drop JSON pretty-print output [integration/config] - https://gerrit.wikimedia.org/r/508086 (https://phabricator.wikimedia.org/T218598) (owner: Kosta Harlan)
[21:52:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.026 second response time
[21:54:14] (PS26) Kosta Harlan: Establish codehealth pipeline [integration/config] - https://gerrit.wikimedia.org/r/502606 (https://phabricator.wikimedia.org/T218598)
[21:54:52] Release-Engineering-Team (Backlog), Browser-Tests, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q4), Spike, User-zeljkofilipin: [Spike] Have a discussion around Minerva selenium browser test architecture - https://phabricator.wikimedia.org/T220755 (Jdlrobson) Open→Resolved
[21:57:56] !log update docker-pkg on contint1001 for https://gerrit.wikimedia.org/r/508086
[21:57:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:58:03] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[22:18:23] Release-Engineering-Team (Kanban), MediaWiki-Core-Testing, MW-1.34-notes (1.34.0-wmf.4; 2019-05-07), Patch-For-Review: Stop using jsonlint (as it's abandonware) and instead use eslint-plugin-json for the linting - https://phabricator.wikimedia.org/T220036 (Jdforrester-WMF)
[22:21:18] Gerrit, Release-Engineering-Team, VPS-project-libraryupgrader: Re-enable use of Gerrit HTTP token to push patchsets - https://phabricator.wikimedia.org/T218750 (Paladox)
[22:36:55] Beta-Cluster-Infrastructure: Can't run mwscript without explicit sudo on Beta Cluster - https://phabricator.wikimedia.org/T89802 (Dzahn) The ability to run commands as the "apache" user has been removed from the prod admins module today.
[22:37:47] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Make www-data the web-serving user (is currently apache) - https://phabricator.wikimedia.org/T78076 (Dzahn) The ability to run commands as the 'apache' user has been removed from prod admins module sudo privileges today.
[22:58:07] (CR) Dzahn: "why was it reverted?" [integration/config] - https://gerrit.wikimedia.org/r/508412 (owner: Hashar)
[23:23:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.039 second response time
[23:25:57] Gerrit, LDAP: Gerrit login failure for user tk-999 - https://phabricator.wikimedia.org/T222186 (Paladox) i think this is blocked on us moving to 2.16. See T220867#5124861
[23:28:03] Gerrit, LDAP: Gerrit login failure for user tk-999 - https://phabricator.wikimedia.org/T222186 (TK-999) Thanks @Paladox for investigating :) The timing of the issue is a bit unfortunate with the WM Hackathon coming up in a week, but I guess I can create a secondary account for that if all else fails.
[23:29:07] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[23:33:45] Gerrit, Operations, cloud-services-team, serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (Dzahn)
[23:33:49] PROBLEM - Content Translation Server on deployment-sca01 is CRITICAL: connect to address 172.16.5.13 and port 8080: Connection refused
[23:38:48] RECOVERY - Content Translation Server on deployment-sca01 is OK: HTTP OK: HTTP/1.1 200 OK - 904 bytes in 0.027 second response time
[23:39:50] Gerrit, LDAP: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use. - https://phabricator.wikimedia.org/T220867 (TK-999) For reference, it seems the Gerrit 2.16 upgrade rollout is being tracked in T200739
[23:53:16] MediaWiki-Codesniffer, MediaWiki-General-or-Unknown: Opening brace indent level should match that of preceding keyword - https://phabricator.wikimedia.org/T222673 (Tgr)
[23:57:50] MediaWiki-Codesniffer, MediaWiki-General-or-Unknown: Opening brace indent level should match that of preceding keyword - https://phabricator.wikimedia.org/T222673 (Tgr)