[00:01:04] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[00:01:27] James_F: I got storybook publishing > https://integration.wikimedia.org/ci/job/mwext-node10-docs-docker-publish/1036/console
[00:01:43] i'm not too happy with the result but it does the curling on the client and the results are committed
[00:01:55] do experimental jobs publish to a location?
[00:02:10] Jdlrobson: Nice. Yes, that should have actually published.
[00:02:14] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/532736/
[00:02:17] Want me to convert the repo?
[00:02:22] yes please
[00:02:31] Check you are okay with https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/532736/ before doing so though :)
[00:02:37] https://doc.wikimedia.org/MobileFrontend/master/js/js/
[00:02:41] Documentation generated by JSDoc 3.6.3 on Wed Aug 28 2019 00:00:05 GMT+0000 (GMT)
[00:02:45] Definitely fresh. :-)
[00:03:13] that's the js. Should be a ui folder too
[00:03:13] (CR) Jforrester: [C: +2] layout: [MobileFrontend] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - https://gerrit.wikimedia.org/r/532417 (https://phabricator.wikimedia.org/T230841) (owner: Jforrester)
[00:03:26] ahah https://doc.wikimedia.org/MobileFrontend/master/js/ui/?path=/story/anchor--normal
[00:03:29] w00t
[00:03:33] Yes https://doc.wikimedia.org/MobileFrontend/master/js/ui/?path=/story/anchor--normal
[00:04:01] (CR) Jforrester: [C: +2] jjb: Drop mwext-npm-doc-publish [integration/config] - https://gerrit.wikimedia.org/r/532420 (owner: Jforrester)
[00:04:54] (Merged) jenkins-bot: layout: [MobileFrontend] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - https://gerrit.wikimedia.org/r/532417 (https://phabricator.wikimedia.org/T230841) (owner: Jforrester)
[00:06:43] (Merged) jenkins-bot: jjb: Drop mwext-npm-doc-publish [integration/config] - https://gerrit.wikimedia.org/r/532420 (owner: Jforrester)
[00:07:02] !log Zuul: Migrate MobileFrontend to extension-javascript-documentation T230831
[00:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:07:05] T230831: Add GrowthExperiments to tasks tagged as GrowthExperiments-* - https://phabricator.wikimedia.org/T230831
[00:07:37] !log JJB: Delete mwext-npm-doc-publish, no longer used
[00:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:07:50] Jdlrobson: Should be working well now. Will merge your patch.
[00:07:58] thanks James_F !
[00:11:05] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[00:31:05] Continuous-Integration-Infrastructure: castor rsync's taking 3-5 minutes for mwgate-npm jobs - https://phabricator.wikimedia.org/T188375 (tstarling) I talked with @Legoktm about the possibility of removing the npm cache from this rsync'd cache directory, and instead using a central network service for indivi...
[00:40:14] Continuous-Integration-Config, Release-Engineering-Team, MobileFrontend, Documentation, Readers-Web-Backlog (Tracking): Migrate documentation generation to Node 10.15.2 from node 6.11.0 - https://phabricator.wikimedia.org/T230841 (Jdlrobson) Open→Resolved hurrah!
[00:40:17] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908), JavaScript: Upgrade all CI jobs from node6/npm3 to node10/npm6 across all projects - https://phabricator.wikimedia.org/T211784 (Jdlrobson)
[01:01:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[01:38:45] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<40.00%)
[02:11:04] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[04:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[04:31:12] Continuous-Integration-Config, RESTBase, Documentation, good first bug: Remove link to defunct https://rest.wikimedia.org/ on https://doc.wikimedia.org/ - https://phabricator.wikimedia.org/T227766 (Zoranzoki21) I would like to work on this, and I think it should link to page on MediaWiki, but noo...
[05:01:04] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:46:01] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:58:47] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:41:36] (CR) Hashar: php-fpm: restart as mwdeploy as root (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/532829 (owner: Thcipriani)
[08:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[08:11:19] Project-Admins: Wiki Club West project - https://phabricator.wikimedia.org/T231416 (JarrahTree)
[08:36:11] Continuous-Integration-Config, MediaWiki-Core-Testing: Audit using a library to bypass 'final' keyword in PHPUnit - https://phabricator.wikimedia.org/T231419 (Daimona)
[08:44:05] Project-Admins: Wiki Club West project - https://phabricator.wikimedia.org/T231416 (Peachey88) Hi @JarrahTree, For a project created a few extra details are needed (https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects). Just to confirm you want: Name: Wiki-Club-West Type: Group
[08:58:00] Project-Admins: Wiki Club West project - https://phabricator.wikimedia.org/T231416 (JarrahTree) That's great if possible
[08:59:21] Project-Admins: Wiki Club West project - https://phabricator.wikimedia.org/T231416 (JarrahTree) oops adding info
[09:05:40] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: /srv 50402 MB (5% inode=93%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=contint1001&var-datasource=eqiad+prometheus/ops
[09:12:38] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201907), Operations, serviceops: contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (Marostegui)
[09:12:41] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201907), Operations: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (Marostegui) Resolved→Open This is alerting again: ` [09:05:40] <+icinga-wm> PROBLEM...
[09:13:07] Continuous-Integration-Config, Documentation: doc.wikimedia.org contains empties on end of page - https://phabricator.wikimedia.org/T231421 (Zoranzoki21)
[09:28:26] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: /srv 50705 MB (5% inode=93%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=contint1001&var-datasource=eqiad+prometheus/ops
[10:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[10:16:09] Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Request Sauce Labs access for niedzielski - https://phabricator.wikimedia.org/T206358 (zeljkofilipin)
[10:34:25] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201907), Operations: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (Dzahn) /srv/jenkins 753G (!) @hashar
[10:48:22] PROBLEM - Free space - all mounts on deployment-mediawiki-07 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki-07.diskspace.root.byte_percentfree (<100.00%)
[11:25:18] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201907), Operations: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (hashar) The builds are growing insane again. In megabytes: ` contint1001:/srv$ du /srv/jenkins...
[11:55:22] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201907), Operations: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (hashar) There are a lot of `mw-debug-cli.log` files which are 130MBytes. It is generated by Me...
[11:58:26] 161125 [DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly [0s] localhost:/workspace/db/quibble-mysql-6nyjl11f/socket: SELECT @@GLOBAL.read_only AS Value
[11:58:26] bah
[12:02:29] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[12:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[12:24:05] (PS1) Hashar: Compress MediaWiki logs in all jobs [integration/config] - https://gerrit.wikimedia.org/r/532995 (https://phabricator.wikimedia.org/T219850)
[12:24:47] (CR) Hashar: [C: +2] "jobs updated" [integration/config] - https://gerrit.wikimedia.org/r/532995 (https://phabricator.wikimedia.org/T219850) (owner: Hashar)
[12:28:04] (Merged) jenkins-bot: Compress MediaWiki logs in all jobs [integration/config] - https://gerrit.wikimedia.org/r/532995 (https://phabricator.wikimedia.org/T219850) (owner: Hashar)
[12:33:27] RECOVERY - Disk space on contint1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=contint1001&var-datasource=eqiad+prometheus/ops
[12:41:22] Release-Engineering-Team-TODO (201908), ContentTranslation, User-zeljkofilipin: error while trying to run wdio tests - https://phabricator.wikimedia.org/T231305 (Jpita) Open→Resolved
[12:51:27] Release-Engineering-Team-TODO (201908), ContentTranslation, User-zeljkofilipin: error while trying to run wdio tests - https://phabricator.wikimedia.org/T231305 (zeljkofilipin) Resolved while pairing with @Jpita on T231428.
[13:04:31] Release-Engineering-Team-TODO (201908): [Cloud VPS alert] Puppet failure on integration-agent-puppet-docker-1001.integration.eqiad.wmflabs - https://phabricator.wikimedia.org/T231447 (zeljkofilipin)
[13:20:25] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (zeljkofilipin)
[13:23:01] (PS1) Hashar: Align mediawiki-quibble* jobs with other quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/533010
[13:24:59] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[13:25:11] (PS1) Hashar: mediawiki-quibble* jobs now reuse the main Quibble template [integration/config] - https://gerrit.wikimedia.org/r/533011
[13:28:34] (PS1) Hashar: Further simply mediawiki-quibble* jobs [integration/config] - https://gerrit.wikimedia.org/r/533013
[13:29:45] (PS2) Hashar: Align mediawiki-quibble* jobs with other quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/533010 (https://phabricator.wikimedia.org/T219850)
[13:30:13] (CR) Hashar: [C: +2] Align mediawiki-quibble* jobs with other quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/533010 (https://phabricator.wikimedia.org/T219850) (owner: Hashar)
[13:30:32] (CR) Hashar: [C: +2] "It is indeed a noop" [integration/config] - https://gerrit.wikimedia.org/r/533011 (owner: Hashar)
[13:31:54] (CR) jerkins-bot: [V: -1] Align mediawiki-quibble* jobs with other quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/533010 (https://phabricator.wikimedia.org/T219850) (owner: Hashar)
[13:32:37] (Merged) jenkins-bot: Align mediawiki-quibble* jobs with other quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/533010 (https://phabricator.wikimedia.org/T219850) (owner: Hashar)
[13:33:05] Gerrit, User-zeljkofilipin: Can not download a specific patch from Gerrit using git-review - https://phabricator.wikimedia.org/T194520 (Huji) Resolved→Open This was fixed in upstream about a year ago, yet on our instance of gerrit for WMF this issue still prevails. ` > git review -d 532713,1 WAR...
[13:38:13] (PS2) Hashar: mediawiki-quibble* jobs now reuse the main Quibble template [integration/config] - https://gerrit.wikimedia.org/r/533011
[13:38:25] (CR) Hashar: [C: +2] mediawiki-quibble* jobs now reuse the main Quibble template [integration/config] - https://gerrit.wikimedia.org/r/533011 (owner: Hashar)
[13:38:43] (CR) Hashar: [C: +2] Further simply mediawiki-quibble* jobs [integration/config] - https://gerrit.wikimedia.org/r/533013 (owner: Hashar)
[13:41:04] (PS1) Hashar: Revert "jjb: Drop three custom tox-docker jobs, unused" [integration/config] - https://gerrit.wikimedia.org/r/533018
[13:41:29] (PS2) Hashar: Further simply mediawiki-quibble* jobs [integration/config] - https://gerrit.wikimedia.org/r/533013
[13:41:37] (CR) Hashar: [C: +2] Further simply mediawiki-quibble* jobs [integration/config] - https://gerrit.wikimedia.org/r/533013 (owner: Hashar)
[13:43:27] (Merged) jenkins-bot: mediawiki-quibble* jobs now reuse the main Quibble template [integration/config] - https://gerrit.wikimedia.org/r/533011 (owner: Hashar)
[13:44:44] (Merged) jenkins-bot: Further simply mediawiki-quibble* jobs [integration/config] - https://gerrit.wikimedia.org/r/533013 (owner: Hashar)
[13:45:30] (PS1) Hashar: Revert "layout: Move three repos to generic tox-docker job" [integration/config] - https://gerrit.wikimedia.org/r/533021
[13:46:18] (PS2) Hashar: Revert "jjb: Drop three custom tox-docker jobs, unused" [integration/config] - https://gerrit.wikimedia.org/r/533018
[13:46:20] (PS2) Hashar: Revert "layout: Move three repos to generic tox-docker job" [integration/config] - https://gerrit.wikimedia.org/r/533021
[13:47:23] (CR) Hashar: [C: +2] "INFO:jenkins_jobs.builder:Creating jenkins job cassandra-table-properties-tox-docker" [integration/config] - https://gerrit.wikimedia.org/r/533018 (owner: Hashar)
[13:47:46] (CR) Hashar: [C: +2] Revert "layout: Move three repos to generic tox-docker job" [integration/config] - https://gerrit.wikimedia.org/r/533021 (owner: Hashar)
[13:48:44] (CR) jerkins-bot: [V: -1] Revert "layout: Move three repos to generic tox-docker job" [integration/config] - https://gerrit.wikimedia.org/r/533021 (owner: Hashar)
[13:49:24] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201907), Operations, Patch-For-Review: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (hashar) Open→Resolved Some of the jobs (`mediawiki-quibble-*`, F...
[13:49:28] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201907), Operations, serviceops: contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (hashar)
[13:49:53] (Merged) jenkins-bot: Revert "jjb: Drop three custom tox-docker jobs, unused" [integration/config] - https://gerrit.wikimedia.org/r/533018 (owner: Hashar)
[13:49:56] (Merged) jenkins-bot: Revert "layout: Move three repos to generic tox-docker job" [integration/config] - https://gerrit.wikimedia.org/r/533021 (owner: Hashar)
[13:54:47] RECOVERY - Puppet staleness on integration-agent-puppet-docker-1001 is OK: OK: Less than 1.00% above the threshold [3600.0]
[14:11:05] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[14:29:33] Gerrit, Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Can not download a specific patch from Gerrit using git-review - https://phabricator.wikimedia.org/T194520 (zeljkofilipin)
[14:40:18] Project-Admins: Create a "Language-analytics" project under "Language-Team" - https://phabricator.wikimedia.org/T231455 (Neil_P._Quinn_WMF)
[14:53:30] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201907), Operations: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (hashar) Once cleanup has completed: ` $ df -h /srv Filesystem Size Us...
[14:56:30] (CR) Hashar: [C: +2] "https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/TimedMediaHandler/+/528599/ got merged :)" [integration/config] - https://gerrit.wikimedia.org/r/528600 (https://phabricator.wikimedia.org/T224766) (owner: Umherirrender)
[14:59:00] (Merged) jenkins-bot: [TimedMediaHandler] Run phan job [integration/config] - https://gerrit.wikimedia.org/r/528600 (https://phabricator.wikimedia.org/T224766) (owner: Umherirrender)
[15:00:12] Gerrit, Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Can not download a specific patch from Gerrit using git-review - https://phabricator.wikimedia.org/T194520 (hashar) @Huji that was an issue about `git-review` not supporting our version of Gerrit. The upstream change has been release...
[15:33:21] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO: Re-enable use of Gerrit HTTP token to push patchsets - https://phabricator.wikimedia.org/T218750 (thcipriani) Open→Resolved This has been re-enabled since August 8th. I have been waiting to see if this cha...
[15:36:58] (CR) Jforrester: "C-1. We don't need this, this just adds complexity for a few seconds' speed-up." [integration/config] - https://gerrit.wikimedia.org/r/533018 (owner: Hashar)
[15:52:50] Gerrit, Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Can not download a specific patch from Gerrit using git-review - https://phabricator.wikimedia.org/T194520 (Huji) Open→Resolved a:Huji Oh, I see. I was using 1.26.0 and now I upgraded to 1.28.0 and things work smoothly.
[15:53:32] Continuous-Integration-Infrastructure, MediaWiki-Installer, Core Platform Team Workboards (Clinic Duty Team), MW-1.32-release, and 3 others: install.php --with-extensions silently ignores extensions whose dependencies are not satisfied - https://phabricator.wikimedia.org/T225512 (eprodromou) @WDo...
[16:08:33] !log integration: docker container prune -f ; docker image prune -f
[16:08:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:11:04] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[16:11:35] !log Upgraded Jenkins for security update (2.176.3)
[16:11:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:17:05] Project-Admins: Create a "Language-analytics" project under "Language-Team" - https://phabricator.wikimedia.org/T231455 (Pginer-WMF) Open→Resolved a:Pginer-WMF I created the project #Language-analytics.
[16:18:01] stupid plugins ..
[16:19:58] Continuous-Integration-Infrastructure, MediaWiki-Installer, Core Platform Team Workboards (Clinic Duty Team), MW-1.32-release, and 3 others: install.php --with-extensions silently ignores extensions whose dependencies are not satisfied - https://phabricator.wikimedia.org/T225512 (WDoranWMF) It di...
[16:23:03] Continuous-Integration-Infrastructure, MediaWiki-Installer, Core Platform Team Workboards (Clinic Duty Team), MW-1.32-release, and 3 others: install.php --with-extensions silently ignores extensions whose dependencies are not satisfied - https://phabricator.wikimedia.org/T225512 (eprodromou) Than...
[16:24:18] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908): [Cloud VPS alert] Puppet failure on integration-agent-puppet-docker-1001.integration.eqiad.wmflabs - https://phabricator.wikimedia.org/T231447 (greg) "You are receiving this email because you are listed as member fo...
[16:24:48] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Jenkins: Switch back to upstream jenkins xunit plugin after PHPUnit fix is released - https://phabricator.wikimedia.org/T194096 (hashar) Our fork is marked `1.103-wmf.1`. Patche...
[16:25:12] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Jenkins: Switch back to upstream jenkins xunit plugin after PHPUnit fix is released - https://phabricator.wikimedia.org/T194096 (hashar) p:Lowest→Normal
[16:33:41] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (Reedy)
[17:05:25] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[17:15:48] PROBLEM - Host deployment-mediawiki-jhuneidi is DOWN: CRITICAL - Host Unreachable (172.16.1.48)
[17:18:50] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (Jdforrester-WMF)
[17:23:11] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (Jdforrester-WMF)
[17:55:23] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-jhuneidi is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 487 bytes in 0.002 second response time
[18:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[18:25:09] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:44:38] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (zeljkofilipin)
[19:33:31] Phabricator, Project-Admins, Growth-Team (Current Sprint): Add GrowthExperiments to tasks tagged as GrowthExperiments-* - https://phabricator.wikimedia.org/T230831 (MarcoAurelio) @JTannerWMF Are you okay with @Aklapper's explanation above? He can convert those tags to subprojects if you're okay with...
[19:55:16] So https://phabricator.wikimedia.org/T231365 showed up in my notifications, despite my not being subscribed to it or requesting notifications from any of those projects that i can tell...
[19:55:21] whyyyyy?
[19:55:52] robh: because SRE-Access-Requests is in the "Subscribers" field.
[19:56:05] Which shouldn't.
[19:56:08] ahhhh... huh
[19:56:12] indeed, it shouldnt and i didnt notice
[19:56:14] thanks! ill remove
[19:56:14] got my email fwiw robh ?
[19:57:02] ?
[19:57:24] email?
[19:57:42] robh: yup, sent you an email some days ago
[19:59:15] I don't see anything, can you tell me the subject line to search for? I do a LOT of emails though so i could have missed it accidentally.
[19:59:52] robh: sure, let me fetch my outbound folders
[20:00:22] robh: "LDAP Access Requests"
[20:00:32] see if it's in Spam just in case as well
[20:01:06] I don't send many emails to be blacklisted but spam filters sometimes do funny things.
[20:01:22] ahh, i see it
[20:01:27] so employees dont have nda checks
[20:01:34] since you are on nda during your hiring paperwork
[20:01:41] so you dont see ldap confirmation if its a staff account
[20:01:51] Got it
[20:01:55] rephrase: you dont see legal confirming nda is there when they are staff accounts
[20:02:04] just for volunteers and contractors
[20:02:09] but paranoia is always good =]
[20:02:14] Heh
[20:02:21] Thanks for the info
[20:02:30] yeah it just got lost in the avalanche of email, sorry about that
[20:02:47] No problem. Happens to me from time to time as well
[20:02:51] also because its not a secret
[20:03:02] there is a google sheet legal only has write access to
[20:03:11] that they share read access to individuals doing clinic duty
[20:03:19] so, those folks can also check the google sheet
[20:03:31] but im paranoid so when i do that, i note that i've confirmed nda status via that, etc...
[20:03:51] I knew something like that existed, yep. I think it's mentioned on Wikitech somewhere.
[20:03:52] Paranoia = best practices for next decade.
[20:04:23] I still don't get why people have to sign L3 AND Cobblestone fwiw. The latter is more comprehensive.
[20:04:32] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (Jdforrester-WMF)
[20:04:40] oh, most of SRE doesnt have cobblestone access
[20:04:44] or L2 - too many L*
[20:04:44] and no way to confirm it
[20:04:58] i mean, sre directors access cobblestone and so do managers
[20:05:08] but no one else that im aware of... i have a login.... i hate that software
[20:05:10] its terrible.
[20:05:22] i have no clue how to use it since ive never had to, we hired willy ;D
[20:05:32] anybody here knows how I can change an email that I have on gerrit login?
[20:05:52] I've changed it on Wikitech but that doesn't seem to change anything for gerrit
[20:06:00] I prefer to sign important stuff in front of a Notary robh :)
[20:06:14] Civil Law Notary fwiw
[20:06:44] SMalyshev: I'm not sure but maybe changing the email you use on Wikitech could work
[20:06:53] SMalyshev: try hitting reload in the gerrit settings
[20:07:00] hauskatze: well I did and it still not changed
[20:07:11] SMalyshev: true, sorry, didn't read that line
[20:07:27] ah wait maybe I should log out and log in again?
[20:07:57] robh: fwiw I thought only Legal and Finance had access to Cobblestone
[20:07:58] ok, that actually helped :) sweet
[20:08:10] at least Rachel told me only those people had access to
[20:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[20:17:50] hey releng, is zuul held up at the moment?
[20:18:54] the first item in the queue has been there for 4h+ if I'm reading it correctly
[20:21:07] Good Lord.
[20:21:19] greg-g: zuul's down apparently
[20:21:25] cc James_F
[20:22:06] or performing very slowly
[20:22:10] tons of stuff there
[20:27:37] * James_F looks.
[20:28:24] Hmm. https://integration.wikimedia.org/ci/view/CI/job/wmf-quibble-core-vendor-mysql-hhvm-docker/ is executing…
[20:29:03] Also, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/533056 landed three hours ago; why is it still in test?
[20:29:22] I know Antoine restarted jenkins earlier.
[20:29:56] I could restart Zuul to clear the queues but that'll lose everything, and is very disruptive.
[20:31:10] Maybe thcipriani might know what to do?
[20:31:24] * thcipriani looks
[20:31:27] Each time you restart zuul, Antoine kills a kitty. Be mindful of the kitties.
[20:31:34] Indeed.
[20:31:47] Reloading is fine, restarting seems disruptive I think.
[20:32:02] but I don't think reloading would fix it? - I'm not a CI guy.
[20:32:07] Yeah, but reloading is what we did earlier and what might have caused the lost item.
[20:32:29] I could just wait 'til late tonight when all the queues are empty anyway and restart then?
[20:32:39] Probably zuul is in prorogation too.
[20:33:43] I'm not +2ing anything today
[20:33:52] Plus I don't want to contribute to increase the backlog.
[20:34:45] * James_F grins.
[20:36:39] I'll go play some cards then :)
[20:37:53] hrm, well, according to gearman there are 420 queued jobs...
[20:37:58] and 29 running
[20:38:21] Does gearman think 533056,1 is still queued?
[20:38:26] in the debug log I see a lot of "No changes needed"
[20:38:47] hrm, lemme see if there's a way to figure that out
[20:39:02] (there is, just dunno the magic offhand)
[20:45:22] welp, maybe there is no magic to get that, other than the rpc which is what the status page uses.
[20:45:38] I don't know how to get that info from gearman directly, but zuul thinks it's still queued
[20:50:53] hrm, was there a zuul upgrade at some point?
[20:51:02] There was a jenkins upgrade this morning.
[20:51:20] So the jenkins/gearman/zuul state may have got into an eddy somewhere.
[20:51:34] I see this in the debug log https://phabricator.wikimedia.org/P8998
[20:52:03] Hmm, that's not ideal.
[20:52:35] Is it happening lots?
[20:54:34] hrm, not a whole lot
[20:55:02] and it looks like it's been happening for...a while
[20:55:09] looking at old logs
[20:57:54] Oh good.
[21:00:49] thcipriani: There's only one thing in gate-and-submit; I suggest now is a not-bad time to do a full reset.
[21:05:22] Hi, can anyone refresh zuul as it is stuck with jobs?
[21:05:59] James_F: should we wait for gate-and-submit to clear or just do it?
[21:06:52] Just do it; I'll re-trigger the Parsoid job.
[21:07:50] * thcipriani jfdi
[21:08:11] !log restarting zuul server since it's out-of-sync with reality
[21:08:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:09:45] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<40.00%)
[21:10:25] James_F: done
[21:11:35] Yep, it is better now thank you!
[21:11:48] * thcipriani files task
[21:11:53] Well, it's now empty.
[21:12:03] You're assuming that that's better. :-)
[21:13:06] But everything was mixed
[21:13:17] It queued already merged patches for tests and etc
[21:14:01] Can you +2 on https://gerrit.wikimedia.org/r/532854 and https://gerrit.wikimedia.org/r/532855 :)
[21:21:55] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[21:23:17] James_F: Requeue https://gerrit.wikimedia.org/r/531703 and https://gerrit.wikimedia.org/r/531672
[21:24:52] Zoranzoki21: Yeah, now that things seem to be getting back on track I
[21:24:57] 'm going through https://gerrit.wikimedia.org/r/q/is:open+label:Code-Review%252B2+age:1h
[21:25:09] James_F: I can help you if you want
[21:25:10] those were +2d at 6am yesterday...so...39 hours ago?
[21:25:13] I doing it too
[21:25:25] Let's take it slowly.
[21:25:33] thcipriani: I no know, before I reported you problem I saw ~5 hours queued patches
[21:27:33] I think zuul will get a stroke
[21:35:05] James_F: Add in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Editcount/+/525984/ T201491 as topic
[21:35:06] T201491: Fix common typos in code - https://phabricator.wikimedia.org/T201491
[21:36:03] Zoranzoki21: Why are you linking me to a patch I've already C+2'ed?
[21:37:10] Because I can't add topic
[21:38:19] I don't understand.
[21:40:20] See https://snipboard.io/HJ2tgG.jpg
[21:40:33] I want to add topic T201491 but I can't
[21:40:34] T201491: Fix common typos in code - https://phabricator.wikimedia.org/T201491
[21:40:37] Why?
[21:41:01] Because it is patch for which we have task but it isn't included in commit message.
[21:41:29] It's a low-value co-ordination task. Tagging it doesn't make the world significantly better. :-)
[21:41:41] Ok, I understand you. No problem
[21:42:01] I found same typo in GlobalContribs extension.
[21:42:05] So, I made patch :)
[21:42:17] Yes, and I fixed your patch to be less confusing and triggered merge on it.
[21:42:21] Thank you. :-)
[21:43:19] Oh, thank you very much! I apologize to everyone if you feel bad for me
[21:43:42] My goal is not to hurt anyone, but to help ;)
[21:43:57] No worries. Try to make patches that are easier to understand from the title in future, they get merged faster.
[21:44:33] Thank you! I will listen to this advice for sure.
[22:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[22:49:18] (PS2) Jforrester: integration: add tests for the labs/tools/maintain-kubeusers repo [integration/config] - https://gerrit.wikimedia.org/r/530606 (https://phabricator.wikimedia.org/T228499) (owner: Bstorm)
[22:49:44] (CR) Jforrester: [C: +2] "> Patch Set 1:" [integration/config] - https://gerrit.wikimedia.org/r/530606 (https://phabricator.wikimedia.org/T228499) (owner: Bstorm)
[22:58:31] (Merged) jenkins-bot: integration: add tests for the labs/tools/maintain-kubeusers repo [integration/config] - https://gerrit.wikimedia.org/r/530606 (https://phabricator.wikimedia.org/T228499) (owner: Bstorm)
[22:58:39] Finally.
[23:00:20] !log Zuul: [TimedMediaHandler] Run phan job T224766 (merged but not deployed?)
[23:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:00:24] T224766: Add phan to TimedMediaHandler extension - https://phabricator.wikimedia.org/T224766
[23:00:38] !log Zuul: add tests for the labs/tools/maintain-kubeusers repo T228499
[23:00:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:00:41] T228499: Toolforge: changes to maintain-kubeusers - https://phabricator.wikimedia.org/T228499
[23:09:16] (PS4) Jforrester: tests: Expand tests to explicitly cover each branch we care about [integration/config] - https://gerrit.wikimedia.org/r/518179
[23:12:52] (CR) jerkins-bot: [V: -1] tests: Expand tests to explicitly cover each branch we care about [integration/config] - https://gerrit.wikimedia.org/r/518179 (owner: Jforrester)
[23:19:32] (PS5) Jforrester: tests: Expand tests to explicitly cover each branch we care about [integration/config] - https://gerrit.wikimedia.org/r/518179
[23:22:33] (CR) jerkins-bot: [V: -1] tests: Expand tests to explicitly cover each branch we care about [integration/config] - https://gerrit.wikimedia.org/r/518179 (owner: Jforrester)
[23:24:38] Phabricator: Disable "Browse Gerrit Projects" on https://phabricator.wikimedia.org/r/ - https://phabricator.wikimedia.org/T228507 (mmodell) @MarcoAurelio I intend to automate the updates of this page, I just haven't gotten around to it. I still find it useful but maybe it's not worth the effort to write the...
[23:25:49] (PS6) Jforrester: tests: Expand tests to explicitly cover each branch we care about [integration/config] - https://gerrit.wikimedia.org/r/518179
[23:31:31] (PS5) Jforrester: layout: Replace negative branch matches of quibble jobs with positive ones [integration/config] - https://gerrit.wikimedia.org/r/518088
[23:33:04] (CR) jerkins-bot: [V: -1] layout: Replace negative branch matches of quibble jobs with positive ones [integration/config] - https://gerrit.wikimedia.org/r/518088 (owner: Jforrester)
[23:42:51] (CR) Thcipriani: php-fpm: restart as mwdeploy as root (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/532829 (owner: Thcipriani)