[00:49:47] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[01:14:43] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:34:52] Project beta-scap-eqiad build #36905: FAILURE in 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36905/
[01:42:27] Continuous-Integration: Gallium must be backed up (tracking) - https://phabricator.wikimedia.org/T65934#954351 (Krinkle) p:Normal>High
[01:42:54] Continuous-Integration: Figure out paths that needs to be backed up on gallium - https://phabricator.wikimedia.org/T65938#954355 (Krinkle)
[01:44:43] Continuous-Integration: Figure out paths that needs to be backed up on gallium - https://phabricator.wikimedia.org/T65938#710875 (Krinkle)
[01:44:55] Project beta-scap-eqiad build #36906: STILL FAILING in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36906/
[01:55:25] Yippee, build fixed!
[01:55:25] Project beta-scap-eqiad build #36907: FIXED in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36907/
[02:11:34] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<11.11%)
[02:46:31] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<30.00%)
[03:16:37] (CR) Krinkle: [C: -1] Setup php-composer-validate for operations/mediawiki-config (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/180591 (owner: Legoktm)
[05:34:00] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #468: FAILURE in 47 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/468/
[05:55:00] Yippee, build fixed!
[05:55:01] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #376: FIXED in 47 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/376/
[06:36:32] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[06:38:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:03:11] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:05:21] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #387: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/387/
[07:30:10] (PS2) Legoktm: Setup php-composer-validate for operations/mediawiki-config [integration/config] - https://gerrit.wikimedia.org/r/180591
[07:30:12] (CR) jenkins-bot: [V: -1] Setup php-composer-validate for operations/mediawiki-config [integration/config] - https://gerrit.wikimedia.org/r/180591 (owner: Legoktm)
[07:34:46] (PS3) Legoktm: Setup php-composer-validate for operations/mediawiki-config [integration/config] - https://gerrit.wikimedia.org/r/180591
[07:36:29] (PS1) Legoktm: Add php-composer-validate for wikimedia/wikimania-scholarships [integration/config] - https://gerrit.wikimedia.org/r/182769
[07:40:42] (CR) Legoktm: Setup php-composer-validate for operations/mediawiki-config (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/180591 (owner: Legoktm)
[07:55:00] Project beta-scap-eqiad build #36943: FAILURE in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36943/
[08:05:03] Project beta-scap-eqiad build #36944: STILL FAILING in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36944/
[08:15:49] Yippee, build fixed!
[08:15:49] Project beta-scap-eqiad build #36945: FIXED in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36945/
[09:47:39] !log upgrade tox on all Jenkins slaves to 1.8.1 {{bug|T85662}}
[09:48:17] Continuous-Integration: Upgrade tox on integration slaves - https://phabricator.wikimedia.org/T85662#954522 (hashar) Open>Resolved a:hashar I have upgraded tox on all Jenkins labs slaves so they now have tox 1.8.1. Thank you!
[09:58:57] (CR) Hashar: [C: -1] "When running flake8 with python 3.4.2 I got two errors:" [integration/config] - https://gerrit.wikimedia.org/r/182067 (owner: XZise)
[10:12:40] Beta-Cluster, operations: Beta servers can be badly misconfigured if mwyaml hiera backend fails - https://phabricator.wikimedia.org/T78408#954536 (Joe) Open>Resolved
[10:21:51] (PS1) Hashar: pywikibot-i18n-npm [integration/config] - https://gerrit.wikimedia.org/r/182775
[10:24:12] (CR) Hashar: [C: 2] pywikibot-i18n-npm [integration/config] - https://gerrit.wikimedia.org/r/182775 (owner: Hashar)
[10:26:53] Project beta-scap-eqiad build #36959: FAILURE in 1 min 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36959/
[10:32:07] (Merged) jenkins-bot: pywikibot-i18n-npm [integration/config] - https://gerrit.wikimedia.org/r/182775 (owner: Hashar)
[10:35:30] (PS1) Hashar: zuul: expand pywikibot/i18n jobs [integration/config] - https://gerrit.wikimedia.org/r/182779
[10:35:48] (PS2) Hashar: zuul: expand jobs teamplate for pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/182779
[10:35:54] Yippee, build fixed!
[10:35:54] Project beta-scap-eqiad build #36960: FIXED in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36960/
[10:36:59] (CR) Hashar: [C: 2] zuul: expand jobs teamplate for pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/182779 (owner: Hashar)
[10:37:42] (CR) jenkins-bot: [V: -1] zuul: expand jobs teamplate for pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/182779 (owner: Hashar)
[10:39:00] (PS3) Hashar: zuul: expand jobs teamplate for pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/182779
[10:39:14] (CR) Hashar: [C: 2] zuul: expand jobs teamplate for pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/182779 (owner: Hashar)
[10:39:59] (Merged) jenkins-bot: zuul: expand jobs teamplate for pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/182779 (owner: Hashar)
[10:58:21] (PS2) Hashar: Make jslint voting for VipsScaler [integration/config] - https://gerrit.wikimedia.org/r/181948 (owner: Unicodesnowman)
[10:59:17] (CR) Hashar: [C: 2] "Thank you!" [integration/config] - https://gerrit.wikimedia.org/r/181948 (owner: Unicodesnowman)
[11:00:08] (Merged) jenkins-bot: Make jslint voting for VipsScaler [integration/config] - https://gerrit.wikimedia.org/r/181948 (owner: Unicodesnowman)
[11:05:21] (PS5) Hashar: Translate depends on ULS [integration/config] - https://gerrit.wikimedia.org/r/181574 (owner: Jarry1250)
[11:07:49] (PS6) Hashar: Translate depends on ULS [integration/config] - https://gerrit.wikimedia.org/r/181574 (owner: Jarry1250)
[11:11:11] (CR) Hashar: [C: 2] "I have fixed the Bug: link and listed the jobs being updated. Confirmed to fix the Translate job :)" [integration/config] - https://gerrit.wikimedia.org/r/181574 (owner: Jarry1250)
[11:11:22] (CR) Hashar: "Oh a" [integration/config] - https://gerrit.wikimedia.org/r/181574 (owner: Jarry1250)
[11:11:40] (CR) Hashar: "Oh and jobs have been refreshed:" [integration/config] - https://gerrit.wikimedia.org/r/181574 (owner: Jarry1250)
[11:18:48] (Merged) jenkins-bot: Translate depends on ULS [integration/config] - https://gerrit.wikimedia.org/r/181574 (owner: Jarry1250)
[11:30:26] (CR) Hashar: [C: 2] "Thanks!" [tools/codesniffer] - https://gerrit.wikimedia.org/r/182379 (owner: BryanDavis)
[11:30:29] (Merged) jenkins-bot: Add a Composer package definition [tools/codesniffer] - https://gerrit.wikimedia.org/r/182379 (owner: BryanDavis)
[11:34:43] Librarization, MediaWiki-Core-Team, Continuous-Integration: Publish MediaWiki codesniffer config on Packagist - https://phabricator.wikimedia.org/T85631#954621 (hashar) Open>Resolved The repository now has a composer.json with the package name `mediawiki/mediawiki-codesniffer`. I have added it to packa...
[11:35:16] (CR) Hashar: "https://packagist.org/packages/mediawiki/mediawiki-codesniffer \O/" [tools/codesniffer] - https://gerrit.wikimedia.org/r/182379 (owner: BryanDavis)
[12:03:46] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[12:09:30] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:28:43] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:31:39] PROBLEM - Puppet failure on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:34:33] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:45:10] Project beta-scap-eqiad build #36973: FAILURE in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36973/
[12:55:12] Yippee, build fixed!
[12:55:13] Project beta-scap-eqiad build #36974: FIXED in 1 min 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36974/
[13:06:01] Beta-Cluster: DNS lookup fails for http://wikidata.beta.wmflabs.org/ - https://phabricator.wikimedia.org/T85793#954860 (Tobi_WMDE_SW) NEW
[13:07:12] Beta-Cluster: DNS lookup fails for http://wikidata.beta.wmflabs.org/ - https://phabricator.wikimedia.org/T85793#954869 (hoo) p:Triage>High
[13:10:46] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:12:28] Beta-Cluster: DNS lookup fails for http://wikidata.beta.wmflabs.org/ - https://phabricator.wikimedia.org/T85793#954876 (yuvipanda) Caused by my actions in T85469 apparently. I've removed that set of domain names, and things seem to work fine.
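The Puppet alerts in this log report what share of recent datapoints exceeded a threshold of 0.0, i.e. any recorded Puppet failure. Below is a minimal Python sketch of that kind of percentage-over-threshold check. It is not the actual shinken-wm check: the sample window and the 1% critical cut-off (inferred only from the "Less than 1.00%" wording) are assumptions, and `check_puppet_failures` is a hypothetical helper.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Return the percentage of non-null datapoints strictly above `threshold`."""
    valid = [p for p in datapoints if p is not None]  # treat missing samples (None) as no data
    if not valid:
        return 0.0
    above = sum(1 for p in valid if p > threshold)
    return 100.0 * above / len(valid)


def check_puppet_failures(
    datapoints: Sequence[Optional[float]],
    threshold: float = 0.0,
    critical_percent: float = 1.0,  # assumed cut-off, not taken from the real check
) -> str:
    """Format a status line in the style of the alerts above (hypothetical helper)."""
    pct = percent_above(datapoints, threshold)
    if pct >= critical_percent:
        return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
    return f"OK: Less than {critical_percent:.2f}% above the threshold [{threshold}]"


# Ten samples of a failed-runs metric, three of them non-zero.
print(check_puppet_failures([0, 0, 1, 0, 2, 0, 0, 1, 0, 0]))
# CRITICAL: 30.00% of data above the critical threshold [0.0]
```

With a cut-off this low, a single failed run anywhere in the window is enough to go CRITICAL, and the check only returns to OK once every failure has aged out of the window.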
[13:12:43] Beta-Cluster: m.wikidata.beta.wmflabs.org should point to a mobile IP - https://phabricator.wikimedia.org/T85469#954880 (yuvipanda) Resolved>Open
[13:13:10] Wikidata, Beta-Cluster: DNS lookup fails for http://wikidata.beta.wmflabs.org/ - https://phabricator.wikimedia.org/T85793#954881 (Lydia_Pintscher)
[13:23:59] Yippee, build fixed!
[13:23:59] Project browsertests-Wikidata-PerformanceTests-linux-firefox-sauce build #108: FIXED in 1 min 49 sec: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-PerformanceTests-linux-firefox-sauce/108/
[13:24:53] MediaWiki-extensions-Translate, Continuous-Integration: CI tests fail for Translate extension patches - https://phabricator.wikimedia.org/T85664#954897 (hashar) Open>Resolved a:hashar Fixed by adding the required dependencies.
[13:24:58] Wikidata, Beta-Cluster: DNS lookup fails for http://wikidata.beta.wmflabs.org/ - https://phabricator.wikimedia.org/T85793#954901 (yuvipanda) Works now (P193 for dig output)
[13:29:26] Beta-Cluster: Setup monitoring for Beta cluster (tracking) - https://phabricator.wikimedia.org/T53497#954907 (hashar)
[13:33:17] Wikidata, Beta-Cluster: DNS lookup fails for http://wikidata.beta.wmflabs.org/ - https://phabricator.wikimedia.org/T85793#954926 (yuvipanda) Open>Resolved a:yuvipanda
[13:38:55] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:40:44] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:53:16] Continuous-Integration, Wikimedia-Labs-Infrastructure: CI labs instances can't start on reboot: tmpfs: Bad value 'jenkins-deploy' for mount option 'uid' - https://phabricator.wikimedia.org/T76250#954959 (hashar) To workaround the boot sequence not finding jenkins-slave user, we could have the tmpfs mounted ju...
[14:03:56] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[14:26:57] Continuous-Integration: mw-debug.log missing in Jenkins jobs (Failed to be created "Permission denied") - https://phabricator.wikimedia.org/T85799#955014 (Krinkle) NEW
[14:27:07] Continuous-Integration: fix the qunit tests for wikidata: mwext-Wikibase-qunit - https://phabricator.wikimedia.org/T74184#955021 (Krinkle) Filed: * {T73058} * {T85799}
[14:44:24] Continuous-Integration: V+1 checks for non-whitelisted users are missing some linters included in V+2 voting checks - https://phabricator.wikimedia.org/T85800#955063 (matmarex) NEW
[14:49:40] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[14:51:44] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[14:51:46] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[14:54:51] !log rebooting integration-slave1001 to ensure it manages to mount the tmpfs on boot ( https://phabricator.wikimedia.org/T76250 )
[14:54:54] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:55:20] !log slave1001 came back \O/
[14:56:26] Continuous-Integration, Wikimedia-Labs-Infrastructure: CI labs instances can't start on reboot: tmpfs: Bad value 'jenkins-deploy' for mount option 'uid' - https://phabricator.wikimedia.org/T76250#955074 (hashar) Applied on all labs CI slaves using: ``` umount /mnt/home/jenkins-deploy/tmpfs rmdir /mnt/home/jen...
[14:56:40] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:03:45] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:10:29] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:10:40] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[15:11:56] umm
[15:11:56] tests are broken
[15:11:56] https://gerrit.wikimedia.org/r/#/c/182652/
[15:12:02] 15:00:47 mkdir: cannot create directory ‘/mnt/home/jenkins-deploy/tmpfs/mediawiki-phpunit-hhvm’: Permission denied
[15:12:05] 15:00:45 mkdir: cannot create directory `/mnt/home/jenkins-deploy/tmpfs/mediawiki-core-qunit': Permission denied
[15:13:26] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:13:48] hashar: halp
[15:13:56] bah
[15:14:03] MatmaRex: yeah that is me blame me :(
[15:16:29] !log /mnt/home/jenkins-deploy/tmpfs was not properly mounted on integration-slave1009 causing the mediawiki-phpunit-hhvm job to fail
[15:16:33] MatmaRex: being fixed hopefully
[15:16:45] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:17:15] stupid puppet
[15:19:40] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[15:19:44] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:19:44] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:21:46] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:26:38] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:28:23] MatmaRex: should be good now
[15:28:44] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:30:40] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:35:29] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:37:40] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:38:50] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:40:45] RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:43:23] RECOVERY - Puppet failure on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:44:42] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:45:57] Project beta-scap-eqiad build #36988: FAILURE in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36988/
[15:46:28] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:49:27] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:49:47] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:51:59] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:55:38] Yippee, build fixed!
[15:55:39] Project beta-scap-eqiad build #36989: FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/36989/
[16:00:43] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:02:43] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:03:47] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:04:55] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[16:08:33] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:13:11] (CR) Hashar: [C: -1] "Patchset 3 removed the use of Zuul cloner when I want to eventually get rid of the Jenkins git plugin entirely. I suspect Timo did it bec" [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:14:23] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:14:49] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:15:39] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[16:16:32] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:16:59] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0]
[16:17:18] Beta-Cluster, Labs-Team, operations: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#955291 (greg) >>! In T1259#943783, @yuvipanda wrote: > Also, all of these are hhvm - are the hhvm core dumps from beta useful at all, or should we disable them? @joe or @ori or @bd808 ?
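The beta-scap-eqiad announcements in this log follow a visible pattern: the first failing build is reported as FAILURE, a further failure as STILL FAILING, and the first success after a failure as FIXED, preceded by "Yippee, build fixed!". The jumps in build numbers suggest ordinary successes are not announced at all. The small Python sketch below reproduces that pattern from a sequence of build results. It models only what the log shows and is not the Jenkins IRC notifier's code. In the example, the result of build #36904 is an assumption; the results for #36905 to #36907 come from the log.

```python
from typing import Iterable, List, Optional, Tuple


def label(previous: Optional[str], current: str) -> Optional[str]:
    """Map (previous result, current result) to the announced status, or None."""
    if current == "FAILURE":
        return "STILL FAILING" if previous == "FAILURE" else "FAILURE"
    if current == "SUCCESS" and previous == "FAILURE":
        return "FIXED"
    return None  # e.g. a success following a success: nothing is announced


def announce(builds: Iterable[Tuple[int, str]]) -> List[str]:
    """Return the lines a bot following this pattern would post."""
    lines: List[str] = []
    previous: Optional[str] = None
    for number, result in builds:
        status = label(previous, result)
        if status == "FIXED":
            lines.append("Yippee, build fixed!")
        if status is not None:
            lines.append(f"build #{number}: {status}")
        previous = result
    return lines


# #36904 is assumed to have succeeded; #36905-#36907 match the log above.
for line in announce([(36904, "SUCCESS"), (36905, "FAILURE"), (36906, "FAILURE"), (36907, "SUCCESS")]):
    print(line)
```

Keying the announcement on the transition rather than the result alone is what keeps a run of green builds silent, while still reporting recovery exactly once.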
[16:19:33] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:19:43] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:21:04] Beta-Cluster, Labs-Team, operations: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#955312 (bd808) >>! In T1259#955291, @greg wrote: >>>! In T1259#943783, @yuvipanda wrote: >> Also, all of these are hhvm - are the hhvm core dumps from beta useful at all, or should we disab...
[16:22:25] (CR) Hashar: "Also looking at composer cli documentation https://getcomposer.org/doc/03-cli.md there is a few env variables we might want to set:" [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:23:22] (CR) BryanDavis: "Is perfect being the enemy of good enough here? It seems a little crazy that it takes 2 months of discussion to create a new template to r" [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:29:10] Release-Engineering, Continuous-Integration: Zuul-cloner forgets to clear workspace - https://phabricator.wikimedia.org/T76304#955330 (Krinkle) @hashar That is not an option. Both composer and npm have proven in the past that they do not support long-lived local directories. Things like trying to cover every...
[16:30:43] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:31:39] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:32:44] (CR) Krinkle: "If we need manually set environmental variable and removal scripts to clear in-job logistics like 'vendor', then the environment is flawed" [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:32:45] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:32:57] (PS5) Krinkle: Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:33:30] (CR) Krinkle: [C: 1] "Merging this soon as the minimal template. Anything else discussed can be added later. We're not losing anything." [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:33:48] (PS4) Krinkle: Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[16:34:02] (CR) Krinkle: [C: 1] Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[16:34:11] (CR) Hashar: "I give up." [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[16:38:34] RECOVERY - Puppet failure on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:39:16] Continuous-Integration, Wikimedia-Labs-Infrastructure: CI labs instances can't start on reboot: tmpfs: Bad value 'jenkins-deploy' for mount option 'uid' - https://phabricator.wikimedia.org/T76250#955365 (Krinkle) Open>Resolved Thanks! By the way, do we have a ticket to track LDAP not being available? O...
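The T76250 error suggests the kernel was handed the literal user name. tmpfs parses `uid=` as a number, so a name only works if something resolves it to a numeric id before the mount. hashar's note about the boot sequence not finding the user, and Krinkle's question about LDAP availability, suggest the account could not be looked up that early in boot. Below is a minimal Python sketch of resolving the owner to a numeric uid up front and failing loudly instead of passing the name through. `tmpfs_mount_options` is a hypothetical helper, and the `mode=0755` default is an assumption, not the Puppet configuration actually used.

```python
import pwd


def tmpfs_mount_options(owner: str, mode: str = "0755") -> str:
    """Build tmpfs options with a numeric uid (hypothetical helper, not the Puppet code).

    Raises RuntimeError if `owner` cannot be resolved, instead of letting the
    raw user name reach the kernel, which rejects non-numeric uid= values.
    """
    try:
        uid = pwd.getpwnam(owner).pw_uid  # goes through NSS, which may include LDAP
    except KeyError:
        raise RuntimeError(f"user {owner!r} not resolvable yet; not mounting tmpfs") from None
    return f"uid={uid},mode={mode}"


print(tmpfs_mount_options("root"))  # uid=0,mode=0755 on a typical Linux host
```

The actual workaround hashar applied is truncated in this log, so this only illustrates the failure mode, not the fix that was deployed.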
[16:39:24] RECOVERY - Puppet failure on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:44:47] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:45:07] PROBLEM - Puppet failure on deployment-cache-mobile03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:46:21] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:48:00] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:48:15] oh good morning to you too, shinken-wm
[16:49:45] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:50:44] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:55:40] Quality-Assurance, MediaWiki-extensions-Flow: Flow reply_moderation browser test has erroneous selector for "3rd comment on the topic" - https://phabricator.wikimedia.org/T85201#955377 (Cmcmahon) p:Low>Normal
[16:56:26] Quality-Assurance, MediaWiki-extensions-MultimediaViewer, Multimedia: Navigation browser test no longer works with Safari driver - https://phabricator.wikimedia.org/T85802#955381 (Gilles)
[16:56:41] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[16:56:44] Quality-Assurance, MediaWiki-extensions-MultimediaViewer, Multimedia: Navigation browser test no longer works with Safari driver - https://phabricator.wikimedia.org/T85802#955382 (Gilles) a:Cmcmahon
[16:57:06] greg-g: I was going to silence puppet failures from shinken-wm until the DNS stuff gets fixed, but then less incentive for people to poke Coren :P
[16:58:20] :)
[17:02:45] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:05:39] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:06:09] (PS6) Krinkle: Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:06:11] (PS5) Krinkle: Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[17:07:46] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[17:08:46] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[17:11:22] !log restarted logstash on logstash1. 127910 events in redis queue
[17:12:45] Labs-Team, Beta-Cluster, operations: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#955410 (greg) @Ori, ideas on how to manage these? Should you (or someone else close to HHVM) take a weekly gander at the dumps on beta or something else? Ideally we'd have auto bug reportin...
[17:14:09] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3617 bytes in 0.085 second response time
[17:14:23] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3617 bytes in 0.109 second response time
[17:14:26] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3897 bytes in 0.093 second response time
[17:14:43] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3617 bytes in 0.106 second response time
[17:16:38] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:17:44] (PS1) Krinkle: Upgrade JSHint from v2.5.6 to 2.5.11 [integration/jenkins] - https://gerrit.wikimedia.org/r/182836
[17:17:59] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0]
[17:19:36] "Unexpected non-MediaWiki exception encountered, of type "InvalidArgumentException"
[17:19:36] [1dff465f] /wiki/Main_Page InvalidArgumentException from line 88 of /srv/mediawiki/php-master/includes/libs/ObjectFactory.php: Provided specification lacks both factory and class parameters."
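The exception quoted above comes from MediaWiki's ObjectFactory, which builds objects from array specifications. Judging from the message, a spec must name either a `factory` callable or a `class`, and a spec with neither is rejected. Below is a rough Python analogue of that dispatch. The `args` handling and the use of `ValueError` in place of PHP's `InvalidArgumentException` are assumptions of this sketch, and the other options the real PHP code supports are omitted. The sketch shows how a malformed configuration entry, such as one in the logging configuration bd808 suspected, would end up in the error branch.

```python
from typing import Any, Callable, Mapping


def object_from_spec(spec: Mapping[str, Any]) -> Any:
    """Build an object from a spec dict (Python analogue, not MediaWiki's PHP code)."""
    args = spec.get("args", [])
    if "factory" in spec:
        factory: Callable[..., Any] = spec["factory"]
        return factory(*args)
    if "class" in spec:
        return spec["class"](*args)  # here the "class" is a Python type object
    raise ValueError("Provided specification lacks both factory and class parameters.")


print(object_from_spec({"class": list, "args": [(1, 2)]}))  # [1, 2]
try:
    object_from_spec({"args": []})  # neither key present
except ValueError as err:
    print(err)  # Provided specification lacks both factory and class parameters.
```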
[17:19:46] yeah
[17:21:53] (from -operations, where I accidentally pasted the same thing) bd-808 thinks it's logging (probably right), is looking into it
[17:22:20] (PS2) Krinkle: Upgrade JSHint from v2.5.6 to 2.5.11 [integration/jenkins] - https://gerrit.wikimedia.org/r/182836
[17:22:43] (CR) Krinkle: [C: 2] Upgrade JSHint from v2.5.6 to 2.5.11 [integration/jenkins] - https://gerrit.wikimedia.org/r/182836 (owner: Krinkle)
[17:22:49] (Merged) jenkins-bot: Upgrade JSHint from v2.5.6 to 2.5.11 [integration/jenkins] - https://gerrit.wikimedia.org/r/182836 (owner: Krinkle)
[17:23:18] (now we just wait for scap on beta, the merge is in)
[17:24:10] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 49901 bytes in 0.820 second response time
[17:24:13] http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page WFM now
[17:24:16] yay
[17:24:19] ty sir
[17:24:22] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 49892 bytes in 0.553 second response time
[17:24:26] (PS7) Krinkle: Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:24:28] (PS6) Krinkle: Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[17:24:29] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 50097 bytes in 0.680 second response time
[17:24:43] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 49902 bytes in 0.692 second response time
[17:25:08] bd808: also, "welcome back" (though you were back at work things before I was)
[17:25:32] Did I ever leave? ;)
[17:25:46] unsure, I only assumed :)
[17:26:40] PROBLEM - Puppet failure on deployment-restbase03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:26:49] bd808: I am leaving for the night, but as a teaser you got mail :D https://www.mediawiki.org/wiki/Continuous_integration/Tests_entry_points
[17:27:07] hashar: awesome.
[17:27:09] bd808: was busy with other things in December, sorry for the cdb / composer jobs :-(
[17:27:27] I wanted to go one way and timo to another way, that surely did not help hehe
[17:27:30] no worries. we all have 10 more things to work on than we can get to
[17:28:00] the composer things are what left in my mail backlog
[17:28:10] so I guess I will work it out with Timo this week
[17:28:20] can't wait to drop all the legacy phpcs jobs we have
[17:28:23] that would be most excellent
[17:28:25] and delegate to devs
[17:28:27] * greg-g waves to hashar
[17:28:46] greg-g: happy new year :-]
[17:28:56] I can add the result to the RFC on how to run a library project too
[17:29:15] that would be totally awesome
[17:30:05] also have to finish up the job that tests multiple extensions together
[17:30:26] mobile and languages teams are eager to migrate to it
[17:30:33] (CR) Krinkle: [C: 2] Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:30:33] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:31:21] (CR) Krinkle: [C: 2] "Deployed new Jenkins job cdb-composer." [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[17:31:25] greg-g: the beta cluster has been reporting bunch of failures for the last two weeks or so. The issue is the labs DNS which has plenty of other bad effete :-(
[17:31:28] effects
[17:32:19] (CR) Hashar: [C: 1] Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[17:32:37] (CR) Hashar: [C: 1] "As Timo said, that is good enough for now :D" [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:32:52] enough overthinking, heading back home *wave*
[17:35:32] Project beta-scap-eqiad build #37003: FAILURE in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37003/
[17:37:45] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:38:39] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:38:43] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:38:46] (CR) jenkins-bot: [V: -1] Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:38:48] (CR) jenkins-bot: [V: -1] Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[17:40:46] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:41:41] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[17:41:43] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:45:15] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:45:40] Yippee, build fixed!
[17:45:40] Project beta-scap-eqiad build #37004: FIXED in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37004/
[17:46:33] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:49:18] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[17:49:51] (CR) Krinkle: [V: 2] "recheck" [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:49:59] (CR) Krinkle: [C: 2] Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[17:51:44] RECOVERY - Puppet failure on deployment-restbase03 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:57:39] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:58:36] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:59:45] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[18:01:37] (Merged) jenkins-bot: Add job template for running composer scripts [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[18:01:40] (Merged) jenkins-bot: Replace cdb-phpunit with cdb-composer [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[18:03:39] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:03:56] (CR) Legoktm: Add job template for running composer scripts (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[18:04:19] !log Reload Zuul to deploy I986bc438acfb19a0a7b36e1b435dfd4423a66a25
[18:06:44] Project beta-scap-eqiad build #37006: FAILURE in 2 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37006/
[18:11:40] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:14:17] RECOVERY - Puppet failure on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:15:15] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:15:46] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[18:19:41] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[18:22:44] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[18:28:31] (PS1) Legoktm: Fix formatting [tools/codesniffer] - https://gerrit.wikimedia.org/r/182855
[18:28:33] (PS1) Legoktm: Update README.md [tools/codesniffer] - https://gerrit.wikimedia.org/r/182856
[18:28:36] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:38] (CR) BryanDavis: [C: 2] Fix formatting [tools/codesniffer] - https://gerrit.wikimedia.org/r/182855 (owner: Legoktm)
[18:29:40] (Merged) jenkins-bot: Fix formatting [tools/codesniffer] - https://gerrit.wikimedia.org/r/182855 (owner: Legoktm)
[18:29:41] RECOVERY - Puppet failure on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:30:15] (CR) BryanDavis: [C: 2] Update README.md [tools/codesniffer] - https://gerrit.wikimedia.org/r/182856 (owner: Legoktm)
[18:30:17] (Merged) jenkins-bot: Update README.md [tools/codesniffer] - https://gerrit.wikimedia.org/r/182856 (owner: Legoktm)
[18:31:44] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:33:46] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[18:35:38] (CR) Legoktm: "I changed the package to point to Github so it'll download zipballs which can be cached, and set up the packagist service hook." [tools/codesniffer] - https://gerrit.wikimedia.org/r/182379 (owner: BryanDavis)
[18:38:23] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:38:51] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #180: FAILURE in 36 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/180/
[18:40:53] Yippee, build fixed!
[18:40:54] Project beta-scap-eqiad build #37007: FIXED in 26 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37007/
[18:43:19] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #234: FAILURE in 36 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/234/
[18:45:12] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[18:45:49] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:47:30] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[18:47:34] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[18:51:44] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[18:56:44] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[18:56:57] Project browsertests-Flow-test2.wikipedia.org-windows_8-internet_explorer-sauce build #374: FAILURE in 55 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-test2.wikipedia.org-windows_8-internet_explorer-sauce/374/
[19:03:23] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[19:12:33] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:15:14] RECOVERY - Puppet failure on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:16:44] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:23:40] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[19:24:38] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:26:17] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:26:45] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:28:45] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[19:34:43] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:37:32] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:43:34] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[19:44:53] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #218: FAILURE in 44 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/218/
[19:48:23] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[19:50:20] (CR) Hashar: "From my comment on the task:" [integration/config] - https://gerrit.wikimedia.org/r/180591 (owner: Legoktm)
[19:54:56] (CR) Legoktm: "Sorry, I didn't link https://gerrit.wikimedia.org/r/#/c/180589/ here, which moves the composer.json into the repo root." [integration/config] - https://gerrit.wikimedia.org/r/180591 (owner: Legoktm)
[19:56:15] RECOVERY - Puppet failure on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:58:45] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:59:47] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:05:56] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:08:58] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[20:09:17] Wikimedia-Labs-Infrastructure, Continuous-Integration: CI labs instances can't start on reboot: tmpfs: Bad value 'jenkins-deploy' for mount option 'uid' - https://phabricator.wikimedia.org/T76250#955973 (hashar) >>! In T76250#955365, @Krinkle wrote: > By the way, do we have a ticket to track LDAP not being av...
[20:15:54] Project beta-scap-eqiad build #37018: FAILURE in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37018/
[20:18:23] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:19:11] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:19:16] (CR) Hashar: Add job template for running composer scripts (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[20:22:37] (PS2) Jdlrobson: Publish documentation for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/181693
[20:25:26] Project beta-scap-eqiad build #37019: STILL FAILING in 1 min 28 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37019/
[20:27:32] (CR) BryanDavis: Add job template for running composer scripts (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[20:30:41] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:30:54] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:30:58] (CR) Hashar: "The rsync craziness and run on gallium is legacy, should be done by running the doc on labs instance and use the push-doc macro to publish" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/181693 (owner: Jdlrobson)
[20:31:27] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #437: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/437/
[20:32:44] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:33:39] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:39] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:34:43] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:35:45] Yippee, build fixed!
[20:35:45] Project beta-scap-eqiad build #37020: FIXED in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37020/
[20:38:59] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0]
[20:40:42] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:42:42] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:43:02] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:44:12] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:45:38] Project beta-scap-eqiad build #37021: FAILURE in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37021/
[20:46:55] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:55:31] Yippee, build fixed!
[20:55:31] Project beta-scap-eqiad build #37022: FIXED in 1 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37022/
[20:57:41] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:04:40] RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:04:47] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[21:07:45] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:08:33] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:08:43] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[21:08:56] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:10:42] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:11:54] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:12:42] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[21:13:01] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:14:24] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #377: FAILURE in 50 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/377/
[21:19:25] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:20:45] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:22:53] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[21:27:39] PROBLEM - Puppet failure on deployment-restbase03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[21:29:10] Release-Engineering, Continuous-Integration: Zuul-cloner forgets to clear workspace - https://phabricator.wikimedia.org/T76304#956276 (hashar) I acknowledged that Zuul cloner not cleaning the workspace can be problematic. I am not denying it, I simply lack cycles to add support for clearing the repositories a...
[21:33:56] RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:34:48] RECOVERY - Puppet failure on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[21:36:03] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:36:39] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:37:39] RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:38:08] Beta-Cluster: VE connection to Parsoid is broken again - https://phabricator.wikimedia.org/T85863#956300 (Krenair)
[21:38:37] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:43:46] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[21:44:26] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[21:44:42] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[21:46:35] woops
[21:49:30] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:49:45] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[21:50:00] bd808 or YuviPanda: there was something I wanted to check on deployment-bastion, but I no longer seem to be able to ssh there from bastion.wmflabs.org. ($ eval `ssh-agent`/ssh -A cmcmahon@bastion.wmflabs.org/ssh deployment-bastion/Permission denied (publickey)
[21:52:14] chrismcmahon: hmmm... you are a member of the group. Sounds like a ssh key problem. You got logged into bastion ok though?
[21:52:41] RECOVERY - Puppet failure on deployment-restbase03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:52:55] bd808: yeah, I get into bastion.wmflabs.org just fine. just getting from there to any of the deployment-foo hosts has not worked for me in a while
[21:52:59] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:53:38] bd808: at one point I was a bit confused about key forwarding but I think I did it right just now.
[21:53:55] I do it this way -- https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29
[21:54:15] but yeah ssh -A should forward your agent
[21:55:13] on bastion you can type `ssh-add -l` to see if the agent forwarding is actually working
[21:55:14] you'd think
[21:55:22] checking...
[21:55:40] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:55:57] $ `ssh-add -l`
[21:55:58] No command 'The' found, did you mean:
[21:55:58] Command 'the' from package 'the' (universe)
[21:55:59] The: command not found
[21:56:10] bd808: mean anything to you? ^^
[21:56:16] heh. leave off the backticks
[21:56:34] $ ssh-add -l
[21:56:34] The agent has no identities.
[21:56:48] ssh is all about the backticks
[21:56:54] so forwarding isn't really working for you I guess
[21:57:18] you might try setting up the proxy command bits in your .ssh/config
[21:57:31] that's a better way to get past the bastion anyway
[21:58:00] better in it doesn't expose your ssh-agent to folks with root on the basion
[21:58:03] *bastion
[21:58:07] OK. thanks bd808
[21:58:32] (or folks with a local root exploit either)
[21:58:45] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:59:23] weird, I know this used to work.
[21:59:37] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:00:41] (PS1) Hashar: Composer test entry point [tools/codesniffer] - https://gerrit.wikimedia.org/r/182899
[22:01:03] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:03:49] (PS1) Hashar: Switch mediawiki/tools/codesniffer to composer [integration/config] - https://gerrit.wikimedia.org/r/182900
[22:04:22] (CR) Hashar: "Jenkins job and Zuul config made with https://gerrit.wikimedia.org/r/182900" [tools/codesniffer] - https://gerrit.wikimedia.org/r/182899 (owner: Hashar)
[22:06:40] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:03] (CR) Hashar: "A run of this patchset under Jenkins is https://integration.wikimedia.org/ci/job/mw-tools-codesniffer-composer/1/console (success)" [tools/codesniffer] - https://gerrit.wikimedia.org/r/182899 (owner: Hashar)
[22:10:58] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[22:11:18] Project beta-scap-eqiad build #37029: FAILURE in 1 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37029/
[22:13:22] (PS1) Hashar: Clean up {name}-composer comment [integration/config] - https://gerrit.wikimedia.org/r/182932
[22:14:31] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:15:21] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:15:33] (PS1) Hashar: composer install no more output progress [integration/config] - https://gerrit.wikimedia.org/r/182933
[22:16:04] (CR) Hashar: "Example of mangled output: https://integration.wikimedia.org/ci/job/mw-tools-codesniffer-composer/1/console" [integration/config] - https://gerrit.wikimedia.org/r/182933 (owner: Hashar)
[22:17:05] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:18:09] (CR) Legoktm: [C: 1] composer install no more output progress [integration/config] - https://gerrit.wikimedia.org/r/182933 (owner: Hashar)
[22:18:42] (CR) Legoktm: [C: 2] Composer test entry point [tools/codesniffer] - https://gerrit.wikimedia.org/r/182899 (owner: Hashar)
[22:18:44] (Merged) jenkins-bot: Composer test entry point [tools/codesniffer] - https://gerrit.wikimedia.org/r/182899 (owner: Hashar)
[22:19:48] (CR) Legoktm: [C: 1] Switch mediawiki/tools/codesniffer to composer [integration/config] - https://gerrit.wikimedia.org/r/182900 (owner: Hashar)
[22:20:40] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:20:43] RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:24:36] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:28:42] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:30:58] Yippee, build fixed!
[22:30:58] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #388: FIXED in 28 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/388/
[22:31:43] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[22:32:36] mmm chrome sauce
[22:33:05] twentyafterfour: :)
[22:36:38] Beta-Cluster: VE connection to Parsoid is broken again - https://phabricator.wikimedia.org/T85863#956508 (Ryasmeen)
[22:37:09] Yippee, build fixed!
[22:37:09] Project beta-scap-eqiad build #37030: FIXED in 22 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/37030/
[22:40:10] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[22:40:18] RECOVERY - Puppet failure on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:40:58] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:42:04] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:44:02] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[22:49:43] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[22:50:41] RECOVERY - Puppet failure on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:41] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[22:59:47] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:01:45] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:05:11] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:09:05] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:19:43] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:36:24] hi marxarelli
[23:36:35] chrismcmahon: howdy
[23:38:34] marxarelli: I'm going to ask for some help with a big ol' refactoring pretty soon here I think. The page object for Flow has issues, and the deeper I dig the funnier it looks. :-)
[23:39:47] chrismcmahon: sure thing!
[23:40:39] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[23:56:34] Beta-Cluster: m.wikidata.beta.wmflabs.org should point to a mobile IP - https://phabricator.wikimedia.org/T85469#956703 (greg) p:Triage>Normal
[23:56:43] Beta-Cluster, operations, Labs-Team: Backport new salt-syndic packages - https://phabricator.wikimedia.org/T85442#956706 (greg) p:Triage>Normal
[23:59:14] Beta-Cluster, MediaWiki-extensions-Score: FileBackendException using tag on beta labs: No backend defined with the name 'global-multiwrite' - https://phabricator.wikimedia.org/T85049#956719 (greg) p:Triage>Normal