[00:35:37] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:55:54] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:02:34] (03PS1) 10D3r1ck01: Add SendGrid extension to Jenkins test [integration/config] - 10https://gerrit.wikimedia.org/r/373408 [01:10:35] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:16:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:30:54] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [01:39:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<55.56%) [01:51:07] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [02:02:35] 10Gerrit, 10Release-Engineering-Team (Backlog), 10Patch-For-Review: Update gerrit to 2.14.3 - https://phabricator.wikimedia.org/T156120#3547592 (10Paladox) [02:12:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:12:54] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:33:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:45:22] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [02:52:07] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [02:52:53] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:06:35] PROBLEM - Puppet errors on integration-r-lang-01 
is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [03:12:58] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:41:35] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:02:36] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [04:18:47] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #495: 04FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/495/ [04:20:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [04:42:35] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:58:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [05:38:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0] [05:56:14] (03PS1) 10MaxSem: WIP: prohibit some globals [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/373433 [05:59:34] (03CR) 10jerkins-bot: [V: 04-1] WIP: prohibit some globals [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/373433 (owner: 10MaxSem) [06:16:22] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [06:51:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [07:14:04] RECOVERY - Free space - all mounts on deployment-kafka01 is OK: OK: All targets OK [07:24:04] * addshore thinks hashar is on vacation? 
[07:26:23] Gerrit, Wikidata, User-Ladsgroup, Wikidata-Sprint-2016-03-01, Wikidata-Sprint-2016-04-12: [Task] Move DataTypes repository from Github to gerrit - https://phabricator.wikimedia.org/T127292#2038409 (Addshore) Just a note from IRC earlier, this looks more like an extension than a library, and t...
[07:33:37] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:41:57] Continuous-Integration-Config: CI job debian-glue-non-voting: add support for BACKPORTS=yes - https://phabricator.wikimedia.org/T173999#3547757 (Volans)
[08:13:36] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:43:08] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[08:43:47] addshore: he is, will be back on Monday IIRC
[08:58:48] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[08:59:31] godog: ack!
[09:23:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [09:32:30] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [09:34:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [09:44:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:54:47] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [10:12:32] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [10:14:00] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:19:05] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [10:29:49] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [10:43:54] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [10:45:07] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [10:48:41] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3548084 (10zeljkofilipin) [10:49:54] 10Release-Engineering-Team (Kanban), 10MediaWiki-extensions-CentralAuth, 10Browser-Tests, 10User-Tgr, 10User-zeljkofilipin: Port CentralAuth Selenium tests from Ruby to Node - https://phabricator.wikimedia.org/T173989#3548091 (10zeljkofilipin) a:03zeljkofilipin To get this going, I will create a sample... 
[10:55:50] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[10:56:30] Release-Engineering-Team (Kanban), MediaWiki-extensions-CentralAuth, Browser-Tests, User-Tgr, User-zeljkofilipin: Port CentralAuth Selenium tests from Ruby to Node - https://phabricator.wikimedia.org/T173989#3547318 (zeljkofilipin) p:Triage>Normal
[11:08:56] zeljkof: can I have a user whitelisted in zuul, please?
[11:09:25] TabbyCat: I'm probably not the correct one to ask :)
[11:09:32] did you create a phab task?
[11:09:42] that's how it probably works
[11:09:53] zeljkof: I've got a patch and added some people to it
[11:09:59] never created a task before
[11:10:12] if it is the procedure now, I'll do that
[11:13:38] TabbyCat: if you have done it before, it probably did not change
[11:20:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[11:21:59] zeljkof: if it helps or you know who to stalk to get https://gerrit.wikimedia.org/r/#/c/372780/ merged ;)
[11:23:52] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:24:05] TabbyCat: you have a good list there, somebody should know what to do
[11:24:26] I would also create a task in phab, that might get some more eyes on it
[11:24:48] zeljkof: okay, and I'll add a coffee token; for the laziness :P
[11:27:19] TabbyCat: I would probably suggest waiting the ~4 days for chad to get back, from memory it involves restarting some of the CI stuff to get the changes to be picked up after merging
[11:27:31] s/chad/hashar
[11:30:09] hashar is on vacation, should be back next week, but there will be a big backlog of stuff for him...
[11:30:49] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [11:46:44] (03PS1) 10Zfilipin: Run mediawiki-core-qunit-selenium-jessie for CentralAuth [integration/config] - 10https://gerrit.wikimedia.org/r/373523 (https://phabricator.wikimedia.org/T173989) [11:52:34] (03CR) 10Zfilipin: [C: 032] Run mediawiki-core-qunit-selenium-jessie for CentralAuth [integration/config] - 10https://gerrit.wikimedia.org/r/373523 (https://phabricator.wikimedia.org/T173989) (owner: 10Zfilipin) [11:53:28] (03Merged) 10jenkins-bot: Run mediawiki-core-qunit-selenium-jessie for CentralAuth [integration/config] - 10https://gerrit.wikimedia.org/r/373523 (https://phabricator.wikimedia.org/T173989) (owner: 10Zfilipin) [11:56:50] !log Reloading Zuul to deploy c20a7402467efb669a30dc06ec70c41fc6853193 [11:56:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:57:50] (03CR) 10Zfilipin: "Deployed: https://wikitech.wikimedia.org/w/index.php?title=Release_Engineering%2FSAL&type=revision&diff=1768553&oldid=1768364" [integration/config] - 10https://gerrit.wikimedia.org/r/373523 (https://phabricator.wikimedia.org/T173989) (owner: 10Zfilipin) [13:04:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:18:11] 10Release-Engineering-Team (Kanban), 10MediaWiki-extensions-CentralAuth, 10Browser-Tests, 10Patch-For-Review, and 2 others: Port CentralAuth Selenium tests from Ruby to Node - https://phabricator.wikimedia.org/T173989#3548362 (10zeljkofilipin) a:05zeljkofilipin>03None [13:19:17] 10Release-Engineering-Team (Kanban), 10MediaWiki-extensions-CentralAuth, 10Browser-Tests, 10Patch-For-Review, and 2 others: Port CentralAuth Selenium tests from Ruby to Node - https://phabricator.wikimedia.org/T173989#3547318 (10zeljkofilipin) The setup is done, let me know if you need help. I am available... 
[13:20:49] 10Release-Engineering-Team (Watching / External), 10Operations, 10Ops-Access-Requests, 10User-Addshore: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3548373 (10daniel) Thank you! [13:22:18] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:23:56] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [13:29:52] 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10User-zeljkofilipin: For MediaWiki extensions, merge rubocop inside mwext-mw-selenium-jessie - https://phabricator.wikimedia.org/T164479#3234955 (10zeljkofilipin) a:03zeljkofilipin [13:30:44] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: WebdriverIO tech talk - https://phabricator.wikimedia.org/T171852#3548397 (10zeljkofilipin) p:05High>03Normal [13:30:51] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10Ruby, 10User-zeljkofilipin: Announce Selenium Ruby framework deprecation on appropriate mailing lists (QA, engineering, wikitech-l) - https://phabricator.wikimedia.org/T173488#3548398 (10zeljkofilipin) p:05High>03Normal [13:38:49] 10Release-Engineering-Team (Kanban), 10RelatedArticles, 10MW-1.30-release-notes (WMF-deploy-2017-08-22 (1.30.0-wmf.15)), 10Readers-Web-Backlog (Tracking), 10User-zeljkofilipin: Rewrite Related pages browser tests in Node.js - https://phabricator.wikimedia.org/T164024#3548403 (10Jdlrobson) [13:39:03] 10Release-Engineering-Team (Kanban), 10MobileFrontend, 10Readers-Web-Backlog, 10RelatedArticles, and 2 others: [EPIC] Port Selenium tests from Ruby to Node.js on Reading Web extensions - https://phabricator.wikimedia.org/T162256#3548407 (10Jdlrobson) [13:39:06] 10Release-Engineering-Team (Kanban), 10RelatedArticles, 10MW-1.30-release-notes (WMF-deploy-2017-08-22 
(1.30.0-wmf.15)), 10Readers-Web-Backlog (Tracking), 10User-zeljkofilipin: Rewrite Related pages browser tests in Node.js - https://phabricator.wikimedia.org/T164024#3218685 (10Jdlrobson) 05Open>03Res... [13:39:16] 10Release-Engineering-Team (Kanban), 10MobileFrontend, 10Readers-Web-Backlog, 10RelatedArticles, and 2 others: [EPIC] Port Selenium tests from Ruby to Node.js on Reading Web extensions - https://phabricator.wikimedia.org/T162256#3157035 (10Jdlrobson) [13:39:36] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [13:39:47] 10Release-Engineering-Team (Kanban), 10RelatedArticles, 10MW-1.30-release-notes (WMF-deploy-2017-08-22 (1.30.0-wmf.15)), 10Readers-Web-Backlog (Tracking), 10User-zeljkofilipin: Rewrite Related pages browser tests in Node.js - https://phabricator.wikimedia.org/T164024#3548411 (10Jdlrobson) [13:39:49] 10Release-Engineering-Team (Kanban), 10RelatedArticles, 10MW-1.30-release-notes (WMF-deploy-2017-08-22 (1.30.0-wmf.15)), 10Patch-For-Review, and 2 others: Create Jenkins job that runs RelatedArticles Selenium tests daily - https://phabricator.wikimedia.org/T171847#3548409 (10Jdlrobson) 05Open>03Resolved... 
[13:45:13] 10Release-Engineering-Team (Kanban), 10RelatedArticles, 10MW-1.30-release-notes (WMF-deploy-2017-08-22 (1.30.0-wmf.15)), 10Readers-Web-Backlog (Tracking), 10User-zeljkofilipin: Rewrite Related pages browser tests in Node.js - https://phabricator.wikimedia.org/T164024#3548427 (10Jdlrobson) [13:46:13] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #501: 04FAILURE in 2 min 11 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/501/ [13:53:50] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:02:14] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:03:56] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:30:53] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3548755 (10zeljkofilipin) [14:33:53] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:40:09] 10Gerrit, 10Release-Engineering-Team (Backlog), 10Patch-For-Review: Update gerrit to 2.14.3 - https://phabricator.wikimedia.org/T156120#3548844 (10Paladox) [14:42:22] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [14:44:10] 10Release-Engineering-Team (Kanban), 10MinervaNeue, 10Readers-Web-Backlog, 10User-zeljkofilipin: Port Minerva's browser tests to Selenium - https://phabricator.wikimedia.org/T174018#3548878 (10zeljkofilipin) a:03zeljkofilipin [14:45:10] 10Release-Engineering-Team (Kanban), 10MinervaNeue, 10Readers-Web-Backlog, 10User-zeljkofilipin: Port Minerva's browser tests to Selenium - 
https://phabricator.wikimedia.org/T174018#3548434 (10zeljkofilipin) I will set this up and create a sample test. [14:45:49] 10Gerrit: Migrate to NoteDb - https://phabricator.wikimedia.org/T174034#3548890 (10Paladox) [14:46:00] 10Gerrit: Migrate to NoteDb - https://phabricator.wikimedia.org/T174034#3548903 (10Paladox) p:05Triage>03Lowest [14:56:40] 10Gerrit: Migrate to NoteDb - https://phabricator.wikimedia.org/T174034#3549012 (10Paladox) [15:22:20] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [15:27:32] Anyone able to merge and deploy https://gerrit.wikimedia.org/r/#/c/371138/ (Jenkins config change)? [15:32:52] James_F: sure [15:33:50] Thanks! [15:34:07] (03CR) 10BryanDavis: [C: 031] Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (owner: 10MarcoAurelio) [15:34:19] (03CR) 10Thcipriani: [C: 032] Add npm jobs [integration/config] - 10https://gerrit.wikimedia.org/r/371138 (owner: 10Umherirrender) [15:35:12] (03Merged) 10jenkins-bot: Add npm jobs [integration/config] - 10https://gerrit.wikimedia.org/r/371138 (owner: 10Umherirrender) [15:37:22] !log reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/371138/ [15:37:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:37:54] James_F: should be live now [15:38:36] thcipriani: Thanks, checking. [15:38:53] nice, looks to be running https://integration.wikimedia.org/ci/job/mwgate-npm-node-6-jessie/20805/ for CreatePageUw [15:39:12] Yup. Thanks! [15:40:51] 10Gerrit, 10Repository-Admins: Can not change group membership in gerrit as a group member anymore - https://phabricator.wikimedia.org/T173337#3549246 (10Florian) @Legoktm: I'm pretty sure, that I already was able to add and remove members from groups where I'm a member of, and it was not something special lik... 
[15:44:10] 10Gerrit, 10Repository-Admins: Can not change group membership in gerrit as a group member anymore - https://phabricator.wikimedia.org/T173337#3549253 (10Paladox) @Florian hi, we uninstalled the single groups plugin. All new groups will own them selfs with exiting groups needing to request that there group own... [15:50:52] (03PS2) 10Umherirrender: [EmailDiff] Add npm job [integration/config] - 10https://gerrit.wikimedia.org/r/371608 [15:51:55] 10Continuous-Integration-Config: CI job debian-glue-non-voting: add support for BACKPORTS=yes - https://phabricator.wikimedia.org/T173999#3547757 (10thcipriani) You can inject environmental variables in a job inside the `set_parameters` function inside `parameter_functions` in the `integration/config` repo. Thi... [15:55:17] 10Continuous-Integration-Config, 10User-MarcoAurelio: Whitelist @Melos on integration/config - https://phabricator.wikimedia.org/T174050#3549296 (10MarcoAurelio) [15:56:03] (03PS5) 10MarcoAurelio: Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) [15:56:09] (03PS6) 10MarcoAurelio: Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) [16:05:47] (03CR) 10MarcoAurelio: [C: 04-1] "Patch is messed up. Will reset and upload a new one." 
[integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) (owner: 10MarcoAurelio) [16:13:26] (03PS7) 10MarcoAurelio: Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) [16:15:47] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [140.0] [16:16:48] (03CR) 10Jforrester: [C: 031] Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) (owner: 10MarcoAurelio) [16:17:48] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [140.0] [16:22:41] yep, looks like a big backlog of patches to review ^ but nothing broken afaict: https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1 [16:23:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:25:11] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:29:07] 10Release-Engineering-Team (Kanban), 10Math, 10Browser-Tests, 10JavaScript, and 2 others: WebdriverIO tests for Math - https://phabricator.wikimedia.org/T162455#3549450 (10zeljkofilipin) To make it explicit, we did not invest time in updating CI so the example test runs fine there because the repository ow... [16:29:42] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:33:36] (03CR) 10MarcoAurelio: "Sorry, I somewhat messed this patch with edit and rebase. 
I think that this is okay now but please do carefully review just in case I remo" [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) (owner: 10MarcoAurelio) [16:39:42] phuedx: heya! can you add MinervaNue to https://www.mediawiki.org/wiki/Developers/Maintainers (cf https://www.mediawiki.org/w/index.php?title=Reading/Component_responsibility&diff=next&oldid=2533052 ) :) [16:41:44] greg-g: suresies [16:41:57] :) [16:43:41] 10Continuous-Integration-Config: CI job debian-glue-non-voting: add support for BACKPORTS=yes - https://phabricator.wikimedia.org/T173999#3549505 (10thcipriani) instead of continuing the pattern inside the `parameter_functions.py` file of hard-coding repository names, i.e.: ```lang=python if 'debian-glue' in jo... [16:48:53] (03CR) 10Thcipriani: [C: 032] Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) (owner: 10MarcoAurelio) [16:50:18] (03PS4) 10Umherirrender: Whitelist second email of Kghbln [integration/config] - 10https://gerrit.wikimedia.org/r/372181 [16:51:06] (03Merged) 10jenkins-bot: Add Melos to trusted user list [integration/config] - 10https://gerrit.wikimedia.org/r/372780 (https://phabricator.wikimedia.org/T174050) (owner: 10MarcoAurelio) [16:52:37] such table [16:54:00] !log reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/372780/7 [16:54:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:56:24] phuedx: inorite [17:01:07] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] [17:20:50] zuul dashboard not updating for anyone else? 
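A minimal sketch of the alternative thcipriani describes in the T173999 comment at 16:43:41, where per-repository settings live in a lookup table rather than in hard-coded `if` branches. The comment is truncated in this log, so everything below is an assumption: the `(item, job, params)` signature follows the Zuul v2 parameter-function convention, and `EXTRA_ENV`, the project name, and the value are hypothetical placeholders, not the contents of `parameter_functions.py`.

```python
from types import SimpleNamespace

# Hypothetical lookup table: project name -> extra environment variables.
# Neither the project name nor the value comes from the real repository.
EXTRA_ENV = {
    "example/project": {"BACKPORTS": "yes"},
}


def set_parameters(item, job, params):
    """Inject extra variables into debian-glue jobs from a lookup table.

    `params` is modified in place, which is assumed to be how Zuul v2
    parameter functions hand values to the job.
    """
    if "debian-glue" not in job.name:
        return
    project = params.get("ZUUL_PROJECT")
    params.update(EXTRA_ENV.get(project, {}))


# Usage with stand-in objects (a real call would come from Zuul itself).
demo_params = {"ZUUL_PROJECT": "example/project"}
set_parameters(None, SimpleNamespace(name="debian-glue-non-voting"), demo_params)
print(demo_params)
```

Keeping the table separate from the function means a new repository only needs a new entry, not a new branch.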
[17:22:50] thcipriani: that big string of wikidata jobs is gone, at least (from my last look)
[17:23:11] greg-g: no I mean, it's not getting updates via ajax
[17:23:18] the actual dashboard page
[17:23:24] https://integration.wikimedia.org/zuul/
[17:23:32] hmm, I see nothing moving...
[17:23:55] or is that just my ad blockers somewhere?
[17:24:06] * thcipriani files task
[17:25:11] yeah, even in my throwaway browser it's still not updating
[17:25:15] (Chrome)
[17:25:52] Continuous-Integration-Infrastructure: Zuul status page not updating - https://phabricator.wikimedia.org/T174058#3549685 (thcipriani)
[17:26:06] I think I see the problem...
[17:26:47] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[17:28:59] Continuous-Integration-Infrastructure: Zuul status page not updating - https://phabricator.wikimedia.org/T174058#3549710 (thcipriani) It looks like this commit added `mw` as an undefined reference to that script: https://gerrit.wikimedia.org/r/#/c/372496/ We include mw.org's version of that file in integrat...
[17:34:24] heh
[17:34:27] we should stop doing that
[17:35:46] Continuous-Integration-Infrastructure, Zuul: Zuul status page not updating - https://phabricator.wikimedia.org/T174058#3549720 (greg)
[17:44:13] Continuous-Integration-Config: Reject non-executable files with execute bits with a build check - https://phabricator.wikimedia.org/T168659#3549755 (Legoktm) Today @anomie removed a bunch of executable bits from extensions. [10:34:08] legoktm: find . \( -name .git -o -name node_modules -o -name ven...
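A rough sketch of the kind of build check T168659 asks for, written in Python rather than as the `find` one-liner quoted above, which is truncated in this log and is not reproduced here. It rests on two assumptions: a file counts as "non-executable" when it does not start with a `#!` shebang, and only the two directory names visible in the quoted command (`.git`, `node_modules`) are pruned. The function name is hypothetical.

```python
import os
import stat

# Directory names visible in the quoted command; the rest of it is truncated.
PRUNE = {".git", "node_modules"}
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def find_suspect_executables(root):
    """Return paths (relative to root) with an execute bit but no shebang."""
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in PRUNE]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if not os.stat(path).st_mode & EXEC_BITS:
                continue
            with open(path, "rb") as handle:
                if handle.read(2) != b"#!":
                    suspects.append(os.path.relpath(path, root))
    return sorted(suspects)
```

A CI job could fail the build whenever the returned list is non-empty; whether a shebang is the right test for "meant to be executable" is a judgment call the task would have to settle.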
[18:01:49] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [18:04:59] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [18:05:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [18:10:49] 10MediaWiki-Codesniffer, 10MediaWiki-Platform-Team, 10Patch-For-Review: Enforce one class per file in preparation for PSR-4 - https://phabricator.wikimedia.org/T173798#3549899 (10Legoktm) 05Resolved>03Open Unfortunately this doesn't exactly do what we need. This enforces separately one class per file, on... [18:23:10] 10MediaWiki-Codesniffer, 10MediaWiki-Platform-Team, 10Patch-For-Review: Enforce one class per file in preparation for PSR-4 - https://phabricator.wikimedia.org/T173798#3549929 (10Legoktm) https://github.com/squizlabs/PHP_CodeSniffer/issues/1627 [18:40:00] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:40:04] 10Release-Engineering-Team, 10MediaWiki-Containers, 10Operations, 10Epic, and 3 others: FY2017/18 Program 6 - Outcome 2 - Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3550012 (10dbarratt) [18:40:11] 10MediaWiki-Releasing, 10Release-Engineering-Team (Watching / External), 10Architecture, 10Parsoid, and 2 others: Evaluate and decide on a distribution strategy targeted at VMs - https://phabricator.wikimedia.org/T87774#3550013 (10dbarratt) [18:40:17] 10MediaWiki-Releasing, 10MediaWiki-Containers, 10Wikimania-Hackathon-2017, 10Services (doing), and 2 others: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#3550010 (10dbarratt) 05Open>03Resolved BOOM! https://hub.docker.com/_/mediawiki/ This is obviously **not** m... 
[18:40:36] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:52:17] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10User-MarcoAurelio: Whitelist @Melos on integration/config - https://phabricator.wikimedia.org/T174050#3550131 (10MarcoAurelio) 05Open>03Resolved a:03thcipriani Thanks to @thcipriani for merging the patch. Does... [18:52:32] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10User-MarcoAurelio: Whitelist @Melos on integration/config - https://phabricator.wikimedia.org/T174050#3550132 (10MarcoAurelio) [18:55:16] Hello. Would someone mind reviewing (possibly +2) for https://gerrit.wikimedia.org/r/#/c/371947/? (it's an extension used by the WMF) [18:55:51] 10MediaWiki-Releasing, 10MediaWiki-Containers, 10Wikimania-Hackathon-2017, 10Services (doing), and 2 others: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#1484393 (10Legoktm) Woohoo, congrats :) Do you think you could fill out the docker section on https://www.mediawi... 
[19:03:06] (03PS1) 10Thcipriani: Add global mw obj, remove jqXHR `complete` [integration/docroot] - 10https://gerrit.wikimedia.org/r/373667 (https://phabricator.wikimedia.org/T174058) [19:09:05] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:09:39] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:12:20] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [19:17:33] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:20:49] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:20:55] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [19:30:40] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [19:50:18] 10Continuous-Integration-Config, 10User-MarcoAurelio: zuul seems stuck and has some jobs running for 20-40 minutes already - https://phabricator.wikimedia.org/T174083#3550503 (10MarcoAurelio) [19:53:30] 10Continuous-Integration-Config, 10User-MarcoAurelio: zuul seems stuck and has some jobs running for 20-40 minutes already - https://phabricator.wikimedia.org/T174083#3550534 (10Zppix) Releng and @thcipriani are aware @chasemp and @andrewbogott seem to be investigating. [19:54:15] 10Beta-Cluster-Infrastructure, 10MediaWiki-Authentication-and-authorization, 10MediaWiki-extensions-CentralAuth, 10MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)), 10Patch-For-Review: "Loss of session data" on Beta Cluster - https://phabricator.wikimedia.org/T172560#3550539 (10Etonkovidova)... 
[19:55:38] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Cloud-VPS, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3550552 (10chasemp) We have been having rabbitmq and/or timeout issues wi... [19:56:20] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [19:59:20] RECOVERY - nodepoold running on labnodepool1001 is OK: PROCS OK: 1 process with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [20:11:14] ok, now nodepool is running again and cleaned up a few things... [20:11:21] but now it's just sitting there. Not sure what it's waiting for. [20:11:46] hrm, right now seems like it's: Adding gearman server contint1001.wikimedia.org_4730 [20:11:56] stopping now? [20:12:27] oh, ok, I kicked it prematurely, I'll let it sit longer next time [20:13:32] FWIW /var/log/nodepool/debug.log seems more useful than any of the other nodepool logs [20:16:23] thanks, maybe that will show >0 activity... [20:17:12] https://integration.wikimedia.org/zuul/ hasn't changed [20:17:37] unplug and replug not an option right? ;) [20:17:51] TabbyCat: not really it take longer than lol [20:18:06] nodepool looks like it's starting to make requests to launch nodes, so there's something [20:18:48] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3: Addressing technical debt - https://phabricator.wikimedia.org/T174087#3550609 (10greg) [20:19:54] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 1: The amount of orphaned code that is running Wikimedia “production” services is reduced. - https://phabricator.wikimedia.org/T174088#3550622 (10greg) [20:20:06] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 2: Organizational technical debt is reduced. 
- https://phabricator.wikimedia.org/T174089#3550634 (10greg) [20:20:43] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3550646 (10greg) [20:20:56] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 1 Objective 2: Identify and find stewards for high-priority/high use code segment orphans - https://phabricator.wikimedia.org/T174091#3550658 (10greg) [20:21:03] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 1 Objective 3: Define and steward a light-weight process for adopting or orphaning/sunsetting products and infrastructure. - https://phabricator.wikimedia.org/T174092#3550670 (10greg) [20:21:40] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 2 Objective 1: Define a “Technical Debt Project Manager” role that regularly communicates with all Foundation engineering teams regarding their technical debt - https://phabricator.wikimedia.org/T174093#3550682 (10greg) [20:21:47] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 2 Objective 2: Define and implement a process to regularly address technical debt across the Foundation - https://phabricator.wikimedia.org/T174095#3550706 (10greg) [20:21:56] 10Release-Engineering-Team, 10Epic: FY2017/18 Program 3 Outcome 2 Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s)) - https://phabricator.wikimedia.org/T174096#3550718 (10greg) [20:21:58] thcipriani: sure, luckily I don't have anything to SWAT today :) [20:22:04] :) [20:22:13] thcipriani: does this mean anything? 
DEBUG nodepool.NodeCompleteThread: Unable to find node with nodename: deployment-tin.eqiad [20:22:33] beta cluster [20:23:13] ok, there, finally it created a new node [20:23:21] andrewbogott: I saw that and I don't recall seeing it before, but I'm guessing that it's just digging through the zuul gearman queue figuring out if its responsible for jobs. That is, I don't think it's a problem. [20:23:30] *it's [20:23:43] looks like it's working now, just took a long time to get its bearings [20:24:10] sounds like me [20:24:25] yeah, I saw 5 or so "building" nodes get hostnames...so that's good :) [20:24:43] and so far openstack is keeping up ok [20:26:35] releng-201718-q2, Epic: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3550736 (greg) [20:26:51] hey and jobs are running [20:27:07] andrewbogott: thanks for your help! [20:28:11] Release-Engineering-Team, Epic: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3550646 (greg) [20:29:30] releng-201718-q2, Epic: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3550751 (greg) [20:29:37] * greg-g is done fighting with myself [20:30:51] Lol [20:47:47] is something going on with CI? My patch is being processed for 1.5 hours, still less than half done... [20:48:28] SMalyshev: yeah, there were problems with nodepool, it's still recovering from the backlog. [20:48:51] thcipriani: ah, thanks. Will exercise patience then :) [20:50:10] * Zppix suggests that a topic change for status is ideal? 
[20:51:26] If it was during the time we needed to restart nodepool, yes, now it's just 'delayed' and that will pass very soon [20:52:30] greg-g: whelp i forgot to refresh i didnt realise the backlog decrease (gotta love hitting f4 instead of 5) [20:59:13] https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1&from=now-15m&to=now <-- looks good to you? [21:00:54] TabbyCat: it looks like a nodepool stablizing to me but fwiw im not the most experienced in that area, im learning slowly though [21:01:51] TabbyCat: that's what it looks like when busy, see: https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1 [21:02:00] that dip is the outage [21:02:05] that it is recovering from [21:02:32] greg-g: you mean the gap from 20:00 to 20:30 right? [21:02:36] yes [21:02:53] I somewhat figured out that //that// might be the problem [21:03:04] it seems our patient is recovering then [21:03:20] yes, as I said :) [21:05:02] good, good :D [21:05:26] greg-g: i hear rumors of ci moving to docker? [21:07:30] Zppix: very very early stages. The context is that services in production are moving to docker (in the long term, we're just starting the project this quarter) and to keep production and CI in sync we'll migrate CI to docker as well. We've done some experiments with docker based CI recently (notably the ops/puppet repo now runs via docker images instead of nodepool). But the timeline for any othe [21:07:35] repos is not clear/anything soon. [21:08:25] How will that affect maintainers who use ci? If docker is used entirely? 
[21:08:28] there was also plans to migrate development from Gerrit to Differential and I think that they can be ditched at this point [21:09:46] TabbyCat: i like my gerrit i dont really like diffusion idea [21:09:58] I got used to gerrit already [21:10:18] and I might need to get used to Arcanist to use Differential [21:10:34] so things can stay as they are for me [21:12:35] TabbyCat: gerrit + differential are unrelated to this entirely [21:13:00] greg-g: I know. Just a comparaison about the plans to move to docker. It might take some time :) [21:13:02] Zppix: we'll have clear instructions if any action is needed of the maintainers [21:14:01] TabbyCat: not at all a valid comparison. One was effectively blocked by... reasons we don't need to get into here. And this is a 3 party cross team endeavor, codified in our annual plan [21:14:20] I'd appreciate if the snark level decreased in this channel, especially when it comes to release engineering related projects. [21:16:19] greg-g: no snark on my part, if I understood the dictionary definition rightly. [21:16:43] (Biting, cruel humor or wit, commonly used to verbally attack someone or something.) [21:16:50] not at all [21:17:22] also: sarcasm or mocking [21:17:33] neither [21:18:02] I felt it was mocking of release engineering efforts in the past and using that to degrade current plans. but I admit the differential thing is a sore spot for me. [21:18:41] (it still hurts, the differential (non)migration) :( [21:19:00] greg-g: apolgises if i came across that way i just was stating my opinion. [21:19:43] greg-g: be sure that I respect the whole releng team not to make derogatory statements about you or your work; I am aware that the non-migration is because there are still things to be resolved (ie: Harbormaster is still pretty rough afaik, among other things) [21:19:55] ^ [21:20:05] Ditto [21:22:16] thanks both. 
[21:22:22] Np [21:22:57] np too; sorry if I said something wrong though; in my defense, I do not speak English [21:23:12] Browser-Tests-Infrastructure, releng-201718-q1, MediaWiki-General-or-Unknown, Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3217943 (debt) [21:24:54] TabbyCat: you speak it well :) And yeah, understand linguistic differences, we're good. :) [21:25:19] (CR) Krinkle: [C: -1] "Defining an empty mw object won't suffice given it will still fail on the next level of property access. Fixing the deprecated use will wo" [integration/docroot] - https://gerrit.wikimedia.org/r/373667 (https://phabricator.wikimedia.org/T174058) (owner: Thcipriani) [21:25:34] TabbyCat: i cant either and im a native english speaker xD [21:45:52] Continuous-Integration-Config: CI job debian-glue-non-voting: add support for BACKPORTS=yes - https://phabricator.wikimedia.org/T173999#3551123 (Volans) Thanks @thcipriani for the answers, with my little knowledge of the zuul-jenkins relationship and (in)direct variables settings, it seems to me a fairly nor... 
[21:48:07] RECOVERY - Free space - all mounts on integration-slave-jessie-android is OK: OK: integration.integration-slave-jessie-android.diskspace._mnt.byte_percentfree (No valid datapoints found) [21:55:25] !log disk was full on integration-slave-jessie-android; deleted ~8gb of old screenshots from /tmp to clear some space [21:55:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:01:36] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [22:12:40] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [22:41:34] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:52:40] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:59:32] MediaWiki-Releasing, MediaWiki-Containers, Wikimania-Hackathon-2017, Services (doing), and 2 others: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#3551324 (dbarratt) >>! In T92826#3550173, @Legoktm wrote: > Woohoo, congrats :) Do you think you could fill out... [23:02:34] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [23:07:04] Beta-Cluster-Infrastructure, MediaWiki-Authentication-and-authorization, MediaWiki-extensions-CentralAuth, MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)), Patch-For-Review: "Loss of session data" on Beta Cluster - https://phabricator.wikimedia.org/T172560#3502076 (Krenair) Is t... [23:42:34] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]