[00:01:43] !log deploy-prep awight disabled ORES service [00:01:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [00:02:16] !log deploy-prep awight enabled ORES service [00:02:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [00:12:54] Grabbing an emergency deployment window, beginning now. [00:43:58] 10Release-Engineering-Team (Kanban), 10MediaWiki-Cache, 10MediaWiki-Vagrant, 10Performance-Team, 10User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3795136 (10aaron) This looks like an integration issue with ChronologyProtector vs W... [00:46:46] greg-g: done with “my” window [00:46:52] :) [00:47:02] you can keep it! :p [00:47:02] I have to run now, don't break anything [00:47:09] no problem ;-) [00:52:46] 10Release-Engineering-Team (Kanban), 10MediaWiki-Cache, 10MediaWiki-Vagrant, 10Performance-Team, 10User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3795161 (10aaron) The simple thing is to not set INTERIM keys in the same request th... [01:23:43] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:29:47] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [01:39:01] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:42:26] 10Continuous-Integration-Config, 10BlueSpice, 10Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#3795227 (10Paladox) Oh sorry for late reply. Your change may require that the integration config change is merged or that the other repo is published to composer... 
[01:43:47] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:00:15] Project mediawiki-core-code-coverage build #3161: 04FAILURE in 14 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3161/ [03:23:02] !log Jenkins jobs for mediawiki-core-php55lint consistently failing on integration-slave-jessie ("git: stderr: error: failed to write..") [03:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [03:23:26] gotta go. Failed twice on https://gerrit.wikimedia.org/r/#/c/393983/ [03:23:28] o/ [03:23:54] 10Beta-Cluster-Infrastructure, 10Performance-Team: Make MediaWiki profiler in Beta match production - https://phabricator.wikimedia.org/T180766#3795357 (10Krinkle) [03:53:26] 03:00:13 DEBUG:git.cmd:AutoInterrupt wait stderr: 'fatal: write error: No space left on device\nfatal: index-pack failed' [03:53:51] The last Puppet run was at Tue Nov 28 17:19:11 UTC 2017 (634 minutes ago). [03:56:01] !log deleted all workspaces on integration-slave-jessie-1003 /srv ran out of space [03:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [03:56:07] /dev/mapper/vd-second--local--disk 21G 4.5G 15G 24% /srv [03:58:23] Yippee, build fixed! [03:58:24] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #592: 09FIXED in 2 min 23 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/592/ [04:00:21] 10Continuous-Integration-Infrastructure: mwgate-php55lint workspaces are getting huge - https://phabricator.wikimedia.org/T179963#3795398 (10Legoktm) Just had to clean this up on integration-slave-jessie-1003. 
[04:07:50] RECOVERY - Free space - all mounts on integration-slave-jessie-1003 is OK: OK: All targets OK [04:52:10] 10Release-Engineering-Team (Kanban), 10MediaWiki-Cache, 10MediaWiki-Vagrant, 10Performance-Team, and 2 others: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3795413 (10aaron) I noticed a worse bug of cpPosTime cookies not being used (not related to WA... [04:54:50] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0] [05:19:52] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [05:51:22] 10Release-Engineering-Team (Kanban), 10MediaWiki-Cache, 10MediaWiki-Vagrant, 10Performance-Team, and 2 others: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3795451 (10aaron) >>! In T180035#3795413, @aaron wrote: > I noticed a worse bug of cpPosTime c... [07:00:01] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [08:54:11] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:00:39] hello! I am wondering if https://phabricator.wikimedia.org/T181219 is enough to ask for a github mirror from gerrit [09:06:10] elukey: hmm yeah maybe ? :) [09:08:01] !log github: created repo operations-software-druid_exporter | T181219 [09:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:08:05] T181219: Mirror the gerrit repo operations/software/druid_exporter - https://phabricator.wikimedia.org/T181219 [09:08:31] ahahahah I didn't mean to push you to do it now, just wanted to know if I got the right procedure :/ [09:08:38] but thanks! 
[09:09:44] elukey: in theory we just have to create the repo on GitHub and supposedly Gerrit will end up replicating to it [09:09:45] https://github.com/wikimedia/operations-software-druid_exporter [09:09:56] not sure whether the underscore will end up being a problem though [09:10:27] all right I'll keep it monitored! thanks! [09:10:37] we also have https://github.com/wikimedia/operations-software-hhvm_exporter [09:10:48] so the _ should be fine [09:12:09] it does not seem to replicate though :( [09:14:05] !log github: created wikimedia/operations-debs-contenttranslation-apertium-crh-tur and wikimedia/operations-debs-prometheus-openldap-exporter [09:14:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:18:18] !log gerrit: forcing replication: ssh -p 29418 hashar@gerrit.wikimedia.org replication start operations/software/druid_exporter # T181219 [09:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:18:22] T181219: Mirror the gerrit repo operations/software/druid_exporter - https://phabricator.wikimedia.org/T181219 [09:19:42] elukey: mirrored! https://github.com/wikimedia/operations-software-druid_exporter [09:21:15] hashar: you rock, thank you!!! [10:39:57] (03Draft1) 10MarcoAurelio: Archive the ActionEditSubmit extension [integration/config] - 10https://gerrit.wikimedia.org/r/394033 (https://phabricator.wikimedia.org/T180808) [10:40:02] (03PS2) 10MarcoAurelio: Archive the ActionEditSubmit extension [integration/config] - 10https://gerrit.wikimedia.org/r/394033 (https://phabricator.wikimedia.org/T180808) [12:05:50] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Upgrade WebdriverIO to 4.9 - https://phabricator.wikimedia.org/T180144#3796173 (10zeljkofilipin) a:03zeljkofilipin [12:22:47] Yippee, build fixed! 
[12:22:49] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #601: 09FIXED in 46 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/601/ [13:13:41] 10Continuous-Integration-Config, 10Security: Run composer with `--dev` flag - https://phabricator.wikimedia.org/T180235#3796300 (10Reedy) 05stalled>03declined [13:23:28] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current), and 2 others: Git refusing to clone some ORES submodules - https://phabricator.wikimedia.org/T181552#3796325 (10awight) @thcipriani @mmodell Is the fix for T179013 deployed to production? I'm hoping the fix will be that s... [14:22:20] (03PS1) 10Phuedx: Add npm job for the Chromium render service [integration/config] - 10https://gerrit.wikimedia.org/r/394058 (https://phabricator.wikimedia.org/T179552) [14:30:55] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [14:31:25] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:10:54] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [15:11:28] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0] [16:28:18] Yippee, build fixed! 
[16:28:19] Project mediawiki-core-code-coverage build #3162: 09FIXED in 1 hr 28 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3162/ [16:46:27] 10Continuous-Integration-Infrastructure: Jenkins silently ignores patches if a dependency loop is involved somehow - https://phabricator.wikimedia.org/T181574#3796972 (10Anomie) [17:24:20] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:24:22] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:24:27] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:24:53] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:24:57] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:25:05] PROBLEM - Puppet errors on jenkinstest is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:25:45] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:26:34] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:27:46] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:28:12] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:29:05] PROBLEM - Puppet errors on integration-slave-docker-1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:29:55] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:30:03] PROBLEM - Puppet errors on 
deployment-trending01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:30:11] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:30:25] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:30:26] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:30:34] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:30:38] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:30:44] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:30:46] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:31:37] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:31:53] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:32:17] eeek? [17:32:25] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:32:49] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:32:53] PROBLEM - Puppet errors on integration-cumin is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:32:55] i think those may be the same as i am getting [17:32:58] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Variable $::nameservers is not defined! 
at /etc/puppet/modules/base/manifests/resolving.pp:6 on node phabricator.phabricator.eqiad.wmflabs [17:32:58] Warning: Not using cache on failed catalog [17:32:58] Error: Could not retrieve catalog; skipping run [17:33:26] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:34:12] PROBLEM - Puppet errors on integration-slave-docker-1005 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:34:48] PROBLEM - Puppet errors on deployment-eventlog02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:35:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:35:02] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:35:22] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:36:17] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:36:23] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:36:29] PROBLEM - Puppet errors on integration-publishing is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:36:31] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:37:19] PROBLEM - Puppet errors on saucelabs-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:37:39] PROBLEM - Puppet errors on saucelabs-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:38:07] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:38:34] PROBLEM - Puppet errors on deployment-memc07 is CRITICAL: 
CRITICAL: 44.44% of data above the critical threshold [0.0] [17:39:22] PROBLEM - Puppet errors on deployment-videoscaler01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:39:38] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:39:40] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:39:42] PROBLEM - Puppet errors on deployment-cumin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:40:26] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current), and 2 others: Git refusing to clone some ORES submodules - https://phabricator.wikimedia.org/T181552#3797253 (10awight) @thcipriani OK thank you for the workaround. I'll note that I don't have permissions to do that mysel... [17:40:27] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:40:37] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:40:57] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:41:03] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:42:22] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:42:46] PROBLEM - Puppet errors on integration-slave-jessie-android is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:43:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:43:12] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:43:14] PROBLEM - Puppet 
errors on deployment-memc04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:43:47] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:43:51] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:43:54] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:44:21] PROBLEM - Puppet errors on integration-slave-docker-1007 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:44:45] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:44:47] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:45:03] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:45:23] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:45:27] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:45:35] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:45:39] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:45:53] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:46:14] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:49:21] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:49:35] PROBLEM - Puppet errors on 
deployment-fluorine02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:49:39] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:49:51] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:50:01] PROBLEM - Puppet errors on integration-slave-docker-1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:50:17] PROBLEM - Puppet errors on integration-slave-docker-1006 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:50:22] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:50:28] PROBLEM - Puppet errors on deployment-cpjobqueue is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:50:38] PROBLEM - Puppet errors on deployment-redis05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:50:58] PROBLEM - Puppet errors on deployment-mediawiki07 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:51:27] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:52:27] How would I find the internal service URL for a beta cluster machine, in this case ores-beta.wmflabs.org ? I don’t see that configured in puppet... 
[17:52:40] PROBLEM - Puppet errors on webperformance is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:53:29] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:53:34] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:54:52] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:55:16] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:55:48] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:56:08] Answering my question, I found the mapping in https://tools.wmflabs.org/openstack-browser/project/deployment-prep [17:56:32] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:56:41] However, I’d like to discover it dynamically, like using ores.discovery.wmnet in production. [17:57:27] My motivation is to have the ORES clients on beta connect directly to the server rather than going through an external proxy. Should I not bother? [17:59:01] K all the other services use the external DNS names. I won’t bother. 
[18:09:56] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current): Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3797349 (10awight) [18:09:59] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current): Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3797349 (10awight) a:05awight>03None [18:14:05] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current), and 2 others: Git refusing to clone some ORES submodules - https://phabricator.wikimedia.org/T181552#3797381 (10awight) a:05awight>03None [18:28:41] paladox: sorry, in meetings back-to-back, any more info? [18:29:19] hrm... Error 400 on SERVER: Variable $::nameservers is not defined! at /etc/puppet/modules/base/manifests/resolving.pp:6 [18:30:04] and...that's right, it isn't defined....but was it before... [18:36:36] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10User-zeljkofilipin: Run Cucumber+Selenium+Node.js in CI - https://phabricator.wikimedia.org/T179190#3797446 (10zeljkofilipin) a:05zeljkofilipin>03None [18:37:36] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Use createAccount() in Selenium tests - https://phabricator.wikimedia.org/T180379#3797449 (10zeljkofilipin) a:05zeljkofilipin>03None [18:39:13] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 5 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3797455 (10zeljkofilipin) a:05zeljkofilipin>03None [18:39:43] 10Release-Engineering-Team (Kanban), 10Discovery, 10Discovery-Search (Current work), 10User-zeljkofilipin: Run selenium-EXTENSION-jessie Jenkins job for CirrusSearch - https://phabricator.wikimedia.org/T175179#3797457 (10zeljkofilipin) a:05zeljkofilipin>03None [18:40:37] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10User-zeljkofilipin: Run Selenium 
tests in CI for extensions - https://phabricator.wikimedia.org/T164721#3797470 (10zeljkofilipin) a:05zeljkofilipin>03None [18:42:50] 10Release-Engineering-Team (Watching / External), 10MediaWiki-Core-Tests, 10Patch-For-Review, 10User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#3797479 (10zeljkofilipin) [18:42:52] 10Continuous-Integration-Infrastructure, 10User-zeljkofilipin: Upgrade to Chromium 59 or newer on Debian Jessie in CI - https://phabricator.wikimedia.org/T170032#3797478 (10zeljkofilipin) 05Open>03stalled [18:45:27] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Next), 10User-zeljkofilipin: Create a Jenkins job that runs Echo RSpec tests daily - https://phabricator.wikimedia.org/T171753#3797494 (10zeljkofilipin) 05Open>03declined [18:47:52] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 5 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3797516 (10zeljkofilipin) a:03zeljkofilipin [18:48:03] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current): Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3797349 (10mmodell) This is very strange. I can't tell exactly what would be causing this. 
[19:05:13] PROBLEM - Free space - all mounts on integration-slave-docker-1001 is CRITICAL: CRITICAL: integration.integration-slave-docker-1001.diskspace._var_lib_docker_devicemapper_mnt_*.byte_percentfree (No valid datapoints found), repeated for each devicemapper mount [19:05:22] 10Beta-Cluster-Infrastructure, 10WMF-Legal, 10Privacy, 10Security: Require email address to register on Beta Cluster - https://phabricator.wikimedia.org/T181034#3797631 (10Bawolff) p:05Triage>03Normal [19:06:31] 10Continuous-Integration-Config, 10Librarization, 10Security-Team, 10Composer, and 2 others: Expand our usage of FriendsOfPHP/security-advisories - https://phabricator.wikimedia.org/T180278#3797633 (10Bawolff) p:05Triage>03Normal [19:11:48] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0] [19:12:09] 10Release-Engineering-Team (Kanban), 10StructuredDiscussions, 10Browser-Tests, 10Collaboration-Team-Triage (Collab-Team-This-Quarter), and 2 others: Flow: Migrate browser tests from Ruby to node.js - https://phabricator.wikimedia.org/T174591#3797647 (10zeljkofilipin) p:05Triage>03Normal a:03zeljkofili... [19:12:48] greg-g it was fixed but andrew says the cron should run and fix this soon. He's going to keep an eye on it. [19:13:26] to fix it manually, replace the default_manifest line in puppet.conf with default_manifest = $confdir/manifests and then run puppet. [19:13:37] then sudo service apache2 restart [19:13:53] then puppet should start working. you only need to do that on the puppetmaster. [19:14:21] thcipriani ^^ [19:14:29] that's only going to work if https://gerrit.wikimedia.org/r/#/c/394098/4 has landed there I imagine [19:14:37] yeh [19:14:38] 34.todd_ │19:12:47 paladox greg-g it was fixed but andrew says the cron should run and fix this soon. He's going to keep an eye on it. 
[19:14:55] heh middle click paste rather than middle click link click [19:15:10] sorry for pings [19:15:48] cc chasemp ^^ :) [19:16:14] chasemp: afaict that has landed on deployment-puppetmaster02 [19:16:25] i think all puppetmasters may need to do that. [19:17:40] paladox: so I wondered the same but that does not seem to fix things [19:17:46] I know it's part of the issue anyway [19:17:54] hmm, it fixes it for me [19:18:29] there were other transitional changes that maybe didn't happen here first? [19:18:46] that's possible [19:20:05] 1. edit /etc/puppet/puppet.conf with vi (replace the default_manifest line with default_manifest = $confdir/manifests), then save [19:20:09] 2. restart apache [19:20:13] then run puppet [19:20:57] 10Release-Engineering-Team (Kanban), 10StructuredDiscussions, 10Browser-Tests, 10Collaboration-Team-Triage (Collab-Team-This-Quarter), and 2 others: Flow: Migrate browser tests from Ruby to node.js - https://phabricator.wikimedia.org/T174591#3797689 (10zeljkofilipin) I have assigned the task to myself unti... [19:21:24] ah...I did just notice that puppet.conf had the wrong default_manifest line [19:21:31] and now it seems to be running ok [19:21:41] I changed it back after it didn't work the first time [19:21:51] it's possible I race conditioned originally w/ the puppet checkout [19:22:02] order of operational things [19:22:11] ah, gotcha, well puppet run was just successful on deployment-puppetmaster02 [19:22:23] thcipriani: can you spot check a few clients as well please? [19:22:28] yep, doing [19:23:37] looking good for a handful of them! [19:23:54] chasemp: paladox thanks for your help! [19:24:00] ok cool [19:24:09] I think we'll start to see recoveries from shinken soon :) [19:24:14] You're welcome, though andrew gave me the hint to replace the default_manifest line :) [19:24:45] chasemp i think this will need to be done on all local puppet masters. 
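The manual fix described above (point default_manifest at $confdir/manifests in puppet.conf, restart apache, run puppet) could be scripted roughly as follows. This is only a sketch of the file-editing step, not what anyone in the channel actually ran: the helper name set_default_manifest and the sample file contents are hypothetical, and it assumes puppet.conf uses plain "key = value" lines.

```python
import re

# Value quoted in the channel for the default_manifest setting.
DEFAULT_MANIFEST = "$confdir/manifests"


def set_default_manifest(conf_text: str, value: str = DEFAULT_MANIFEST) -> str:
    """Return conf_text with every default_manifest line set to `value`.

    Hypothetical helper: it only rewrites text, it does not touch the
    file on disk, restart apache, or run puppet.
    """
    pattern = re.compile(r"^([ \t]*)default_manifest[ \t]*=.*$", re.MULTILINE)
    # A function is used as the replacement so that `value` is inserted
    # literally, keeping the original indentation captured in group 1.
    return pattern.sub(
        lambda m: f"{m.group(1)}default_manifest = {value}", conf_text
    )


# Made-up puppet.conf fragment, for illustration only.
sample = (
    "[master]\n"
    "default_manifest = /etc/puppet/manifests/site.pp\n"
    "ssldir = /var/lib/puppet/ssl\n"
)
print(set_default_manifest(sample))
```

After writing the result back to /etc/puppet/puppet.conf on the puppetmaster, the remaining steps from the log still apply: sudo service apache2 restart, then a puppet run.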
[19:24:46] there are a lot of broken masters out there I imagine https://tools.wmflabs.org/openstack-browser/puppetclass/role::puppetmaster::standalone [19:25:05] puppet-phabricator.phabricator.eqiad.wmflabs and puppet-paladox3 are fixed :) [19:27:20] * thcipriani checks integration [19:31:31] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:32:15] yeap. Looks like integration-puppetmaster needed a manual nudge as well. [19:32:46] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:32:52] (03PS1) 10Zfilipin: Do not run Ruby Selenium jobs for Flow [integration/config] - 10https://gerrit.wikimedia.org/r/394111 (https://phabricator.wikimedia.org/T174591) [19:34:16] RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:34:22] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [19:34:26] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:34:52] (03PS2) 10Zfilipin: Do not run Ruby Selenium jobs for Flow [integration/config] - 10https://gerrit.wikimedia.org/r/394111 (https://phabricator.wikimedia.org/T174591) [19:34:53] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [19:34:55] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:34:57] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:35:11] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:35:26] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [19:35:33] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:35:47] RECOVERY 
- Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:35:59] RECOVERY - Puppet errors on deployment-mediawiki07 is OK: OK: Less than 1.00% above the threshold [0.0] [19:36:38] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [19:37:50] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:38:12] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [19:39:55] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:40:01] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:40:03] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:40:03] RECOVERY - Puppet errors on deployment-trending01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:40:23] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:40:39] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:40:43] RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:41:22] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:41:28] RECOVERY - Puppet errors on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0] [19:41:30] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:41:54] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [19:42:26] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0] [19:43:25] RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the 
threshold [0.0] [19:44:47] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:45:40] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:46:05] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:46:20] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [19:47:20] RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:47:40] RECOVERY - Puppet errors on saucelabs-02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:48:06] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:48:10] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:48:14] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:48:34] RECOVERY - Puppet errors on deployment-memc07 is OK: OK: Less than 1.00% above the threshold [0.0] [19:48:45] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:48:51] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:23] RECOVERY - Puppet errors on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:38] RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:40] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:46] RECOVERY - Puppet errors on deployment-cumin is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:46] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:50:04] RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 
1.00% above the threshold [0.0] [19:50:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [19:50:25] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:50:30] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0] [19:50:39] RECOVERY - Puppet errors on integration-slave-docker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [19:50:55] RECOVERY - Puppet errors on deployment-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:51:14] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:52:45] RECOVERY - Puppet errors on integration-slave-jessie-android is OK: OK: Less than 1.00% above the threshold [0.0] [19:53:05] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [19:53:51] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [19:53:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [10.0] [19:54:22] RECOVERY - Puppet errors on integration-slave-docker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [19:54:42] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:55:20] RECOVERY - Puppet errors on deployment-sca04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:55:30] RECOVERY - Puppet errors on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:55:34] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:55:36] RECOVERY - Puppet errors on deployment-redis05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:55:53] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] 
[19:57:42] RECOVERY - Puppet errors on webperformance is OK: OK: Less than 1.00% above the threshold [0.0] [19:58:26] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:58:34] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:59:20] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:59:35] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:59:39] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:59:51] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:59:55] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:03] RECOVERY - Puppet errors on integration-slave-docker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:05] RECOVERY - Puppet errors on jenkinstest is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:17] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:17] RECOVERY - Puppet errors on integration-slave-docker-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:32] RECOVERY - Puppet errors on deployment-cpjobqueue is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:46] RECOVERY - Puppet errors on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [20:01:26] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:01:26] RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [20:04:05] RECOVERY - Puppet errors on 
integration-slave-docker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [20:05:46] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:07:54] RECOVERY - Puppet errors on integration-cumin is OK: OK: Less than 1.00% above the threshold [0.0] [20:09:13] RECOVERY - Puppet errors on integration-slave-docker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [20:09:56] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 3 others: Support git-lfs files in gerrit - https://phabricator.wikimedia.org/T171758#3797904 (10Paladox) git-lfs is now supported :). See https://gerrit.wikimedia.org/r/#/c/394125/ [20:10:27] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:11:07] RECOVERY - Puppet errors on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [20:11:36] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:12:18] RECOVERY - Puppet errors on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:15:35] we should write up some lfs docs now :). [20:16:39] PROBLEM - Puppet errors on deployment-redis05 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:16:58] paladox: in a really large file... [20:17:20] yep. 1mb file. i just used head to try and create a test file. 
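The 1 MB test file mentioned at 20:17:20 was made with head; the exact command is not in the log. A rough Python equivalent might look like the sketch below. The helper name, the file name, and the choice of random bytes are assumptions for illustration, not what was actually run.

```python
import os
import tempfile


def write_test_blob(path, size_bytes=1024 * 1024):
    """Write size_bytes of random data to path and return the size on disk."""
    # Hypothetical helper: random bytes are used so the file cannot be
    # compressed down to almost nothing.
    with open(path, "wb") as fh:
        fh.write(os.urandom(size_bytes))
    return os.path.getsize(path)


if __name__ == "__main__":
    # Write into a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "lfs-test.bin")
        print(write_test_blob(target))
```

Whether such a file is stored through git-lfs still depends on the repository's own tracking rules, which are not shown in the log.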
[20:18:00] hehe "Kanban), Scap (Tech Debt Sprint FY201718-Q2), ORES, and 3 others" :) [20:18:12] heh [20:18:42] PROBLEM - Puppet errors on webperformance is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:19:26] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:20:20] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:20:35] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:20:41] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:20:51] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:20:55] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:21:01] PROBLEM - Puppet errors on integration-slave-docker-1004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:21:15] PROBLEM - Puppet errors on integration-slave-docker-1006 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:21:29] PROBLEM - Puppet errors on deployment-cpjobqueue is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:21:37] hmm [20:21:46] PROBLEM - Puppet errors on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:21:48] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:21:50] it happened for me too, but as soon as i re ran it worked [20:21:57] PROBLEM - Puppet errors on deployment-mediawiki07 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:22:27] PROBLEM - Puppet errors on 
integration-slave-jessie-1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:22:29] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:23:12] (03CR) 10Jdlrobson: [C: 031] "The unit tests have been fixed now.. Keen to get this merged so they don't break again!" [integration/config] - 10https://gerrit.wikimedia.org/r/393642 (https://phabricator.wikimedia.org/T181429) (owner: 10Jdlrobson) [20:24:35] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:31:27] 10Release-Engineering-Team (Watching / External), 10Scap, 10ORES, 10Operations, 10Scoring-platform-team: ORES should use a git large file plugin for storing serialized binaries - https://phabricator.wikimedia.org/T171619#3797980 (10demon) [20:31:35] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 3 others: Support git-lfs files in gerrit - https://phabricator.wikimedia.org/T171758#3797978 (10demon) 05Open>03Resolved a:03demon [20:32:09] PROBLEM - Puppet errors on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:37:07] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 2 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798012 (10awight) [20:38:50] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0] [20:39:57] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 2 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798028 (10awight) [20:41:12] hashar: zeljkof: why is https://phabricator.wikimedia.org/T170032 stalled? [20:41:17] What is it blocked on? 
[20:44:11] Krinkle: will reopen and write a status update [20:45:55] Krinkle: there is no progress on it in months [20:45:56] 10Release-Engineering-Team (Watching / External), 10MediaWiki-Core-Tests, 10Patch-For-Review, 10User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#3798083 (10hashar) [20:45:58] 10Continuous-Integration-Infrastructure, 10User-zeljkofilipin: Upgrade to Chromium 59 or newer on Debian Jessie in CI - https://phabricator.wikimedia.org/T170032#3798079 (10hashar) 05stalled>03Open **Status update** T179360 provides a Docker container with a Xvfb driver and running npm install. That was a... [20:45:59] Krinkle: that is a work in progress more or less [20:46:29] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:46:38] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 3 others: Support git-lfs files in gerrit - https://phabricator.wikimedia.org/T171758#3798085 (10awight) [20:46:44] 10Release-Engineering-Team (Watching / External), 10Scap, 10ORES, 10Operations, 10Scoring-platform-team: ORES should use a git large file plugin for storing serialized binaries - https://phabricator.wikimedia.org/T171619#3798086 (10awight) [20:46:48] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 2 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798084 (10awight) [20:46:53] hashar: sorry, I thought it was stalled :) [20:46:54] Krinkle: the container that has Xvfb in the background and phantomjs is https://gerrit.wikimedia.org/r/#/c/393232/2 [20:47:10] Krinkle: then I will switch it to stretch and add chromium/firefox [20:48:10] zeljkof: no worries :) [20:48:26] 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review: Create "npm-browser" docker image with npm, xvfb, chromium, and firefox 
installed - https://phabricator.wikimedia.org/T179360#3798096 (10hashar) [20:48:28] 10Continuous-Integration-Infrastructure, 10User-zeljkofilipin: Upgrade to Chromium 59 or newer on Debian Jessie in CI - https://phabricator.wikimedia.org/T170032#3798095 (10hashar) [20:48:34] hashar: ok. [20:49:26] 10Release-Engineering-Team (Watching / External), 10Scap, 10ORES, 10Operations, 10Scoring-platform-team: ORES should use a git large file plugin for storing serialized binaries - https://phabricator.wikimedia.org/T171619#3798097 (10Paladox) 05stalled>03Open [20:49:45] 10Continuous-Integration-Infrastructure (shipyard), 10User-zeljkofilipin: Upgrade to Chromium 59 or newer on Debian Jessie in CI - https://phabricator.wikimedia.org/T170032#3417522 (10hashar) [20:50:07] Krinkle: the thing I have to dig into is how to auto rebuild the container when a new chromium is available [20:50:17] but there is some tooling being built for that by ops [20:51:57] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 2 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798104 (10awight) I'm guessing we want to do something like, # Copy repos to a read-only location. # Set LFS flags... [20:53:05] 10Continuous-Integration-Infrastructure (shipyard), 10User-zeljkofilipin: Upgrade to Chromium 59 or newer on Debian Jessie in CI - https://phabricator.wikimedia.org/T170032#3798112 (10hashar) And I thought I commented on this task sorry. My thoughts are: * I am not going to add a stretch image to Nodepool. Nod... 
[20:55:18] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:55:41] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:55:59] RECOVERY - Puppet errors on integration-slave-docker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [20:56:17] RECOVERY - Puppet errors on integration-slave-docker-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [20:56:27] RECOVERY - Puppet errors on deployment-cpjobqueue is OK: OK: Less than 1.00% above the threshold [0.0] [20:56:37] RECOVERY - Puppet errors on deployment-redis05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:56:58] RECOVERY - Puppet errors on deployment-mediawiki07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:16] hashar: Makes sense. [20:58:21] hashar: What about mediawiki/php/apache [20:58:31] e.g. for npm jobs for mediawiki and extensions [20:58:33] no clue yet :( [20:58:40] What are the blockers for that? [20:58:42] RECOVERY - Puppet errors on webperformance is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:08] I haven't thought much about it yet [20:59:27] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:35] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:33] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:50] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:55] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team (Current): Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3798137 (10thcipriani) Hrm. I think this error probably has something to do with ssh client timeout. I'm not sure if anything rece... 
[21:01:47] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 2 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798142 (10demon) They'll mirror just fine since Phabricator just observes upstream. [21:01:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [21:02:26] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:39] 10Release-Engineering-Team (Kanban), 10Operations, 10Release Pipeline, 10User-Joe: Upgrade latest docker-registry.wikimedia.org/nodejs-devel to stretch - https://phabricator.wikimedia.org/T180524#3798234 (10Joe) p:05Triage>03High [21:17:03] !log deployment-prep Verbose logging for ORES Celery [21:17:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:30:20] (03CR) 10Hashar: [C: 032] Run QUnit tests on Minerva skin non-experimental [integration/config] - 10https://gerrit.wikimedia.org/r/393642 (https://phabricator.wikimedia.org/T181429) (owner: 10Jdlrobson) [21:30:34] thcipriani: Just curious, I see that fetching a new revision of ORES takes a long time even if it’s just a minor submodule bump. I would’ve thought that the git cache would make that fast. Is it really just a helluva lot of pure I/O file copying? [21:30:49] PROBLEM - Free space - all mounts on deployment-sca03 is CRITICAL: CRITICAL: deployment-prep.deployment-sca03.diskspace._srv.byte_percentfree (<30.00%) [21:31:29] (03Merged) 10jenkins-bot: Run QUnit tests on Minerva skin non-experimental [integration/config] - 10https://gerrit.wikimedia.org/r/393642 (https://phabricator.wikimedia.org/T181429) (owner: 10Jdlrobson) [21:31:50] awight: #REDIRECT twentyafterfour ^ ( thcipriani is trying to focus on non-scap stuff as much as possible) [21:31:51] awight: are you seeing that on beta or prod? 
We're doing things two different ways currently :) [21:32:03] heh, or that works, too :) [21:32:34] twentyafterfour: This is on beta (pretending you are thcipriani :) [21:33:31] Interestingly, a scap --force of the same revision is lightning-fast. [21:33:50] 10Continuous-Integration-Config, 10Patch-For-Review, 10User-Jdlrobson: QUnit tests are not running on Minerva skin - https://phabricator.wikimedia.org/T181429#3798288 (10hashar) a:03Jdlrobson Thanks Jon for taking care of the CI config :] [21:35:57] well. To finish my thought. If it were on prod it's because submodule caching is still only in master. Since it's happening on beta it bears some investigation: I don't know, it should be pretty fast since there is a cache for submodules as well now, IIRC. [21:46:18] 10Continuous-Integration-Config, 10Proton, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking): Set up Jenkins for chromium-render and chromium-render-deploy repositories - https://phabricator.wikimedia.org/T179552#3798316 (10hashar) There is some eslint missing: ``` ├─┬ UNMET PEER DEPENDENCY eslint@3.19.... [21:46:18] (03CR) 10Hashar: "Replied on T179552 :D Lets poke each other!" [integration/config] - 10https://gerrit.wikimedia.org/r/394058 (https://phabricator.wikimedia.org/T179552) (owner: 10Phuedx) [21:47:29] no_justification wondering could you merge https://gerrit.wikimedia.org/r/#/c/394179/ please? :) [21:48:45] Lemme finish this other thing up then yeah I'll have a look [21:48:54] Unrelated: we should have a jenkins job for refs/meta/config changes [21:48:59] Should be easy to lint [21:49:31] ah [21:49:49] hmm, i wonder how jenkins will lint the project.config and groups.config files? [21:49:56] is there a lint for that? 
:) [21:50:13] (also thanks for reviewing it when you can :)) [21:50:34] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [21:52:15] paladox: gitconfig files should be easy to lint -- looks like there's an npm package that does it already [21:52:21] (ew, npm, i suggested that?!) [21:52:29] ah, and lol [21:52:43] i guess we can do that [21:54:06] * paladox wonders what the package name is [21:54:20] I need to shower... i just saw the word npm xD [21:54:22] Actually, ConfigParser in Python works just fine, if you strip the leading spaces [21:57:28] paladox: Merged [21:57:30] thanks no_justification [21:58:05] yw [22:22:30] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Cloud-VPS, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3798413 (10hashar) The issue ( T170492#3581822 ) still happens from time... [22:23:22] Nodepool should be back [22:24:06] something stalled in openstack at 21:09 and probably again afterwards [22:24:09] but it seems to be recovering [22:51:48] addshore: btw, thoughts on using a subdirectory for MW in docker-dev? [22:51:54] (e.g. 'w' or mediawiki, instead of root) [22:52:21] Yup, can do! :) [22:52:42] Again, thanks for all the PRs..... I've not had much time to poke at it recently [22:52:44] addshore: Did you go with root because it was easiest with the webdev images, or because it is preferred? [22:52:50] I really want to make running unit tests easier [22:52:52] addshore: Thank you for reviewing them :) [22:53:29] I simply went with root as I didn't see a reason not to really :P [22:53:35] Ah, okay. 
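The ConfigParser suggestion at 21:54:22 could look roughly like the sketch below. It uses Python 3's configparser module (the log does not say which Python version was meant). The function name lint_gitconfig and the sample text are hypothetical, and the parser options are assumptions chosen so that repeated keys and value-less keys do not count as errors. It only checks that the text parses; it does not validate Gerrit semantics.

```python
import configparser


def lint_gitconfig(text):
    """Return None if text parses as INI-style config, else an error string."""
    # Strip leading whitespace from every line, as suggested in the chat,
    # so indented keys are never mistaken for continuation lines.
    stripped = "\n".join(line.lstrip() for line in text.splitlines())
    parser = configparser.ConfigParser(
        strict=False,          # assumption: tolerate repeated keys and sections
        interpolation=None,    # do not treat "%" in values as special
        delimiters=("=",),     # only "=" separates a key from its value
        allow_no_value=True,   # accept bare keys without "= value"
    )
    try:
        parser.read_string(stripped)
    except configparser.Error as exc:
        return str(exc)
    return None


# Hypothetical sample in the style of a project.config file.
SAMPLE = (
    '[access "refs/heads/*"]\n'
    "\tpush = group Administrators\n"
    "\tpush = group Developers\n"
)

if __name__ == "__main__":
    print(lint_gitconfig(SAMPLE))
    print(lint_gitconfig("push = group Administrators\n"))
```

With strict=False a repeated key silently keeps only its last value, which is acceptable for a syntax check but would not be for reading the permissions themselves.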
[22:53:53] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Cloud-VPS, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3798548 (10Andrew) >>! In T170492#3798417, @Stashbot wrote: > {nav icon=f... [22:53:58] I suppose it would still be preferable to avoid having to write nginx/apache 2.2/apache 2.4/whatever config separately just for a simple rewrite rule [22:54:06] But I worked around that by using an index.php wrapper instead. [22:54:08] in the document root [22:54:10] simple enough [22:55:04] Ahh yes, perhaps thats why I did it, to avoid touching the images where possible and try to just use the env vars you can pass in [22:55:25] Yeah, they have no env API for rewrite rules (eww..) [22:55:55] Atm it's blocking QUnit and other npm commands because Karma will serve the HTML transparently, and then MW will try to relatively access itself where / is already taken by Karma itself. [22:55:56] why should they? [22:56:06] I'll submit a PR :) [22:56:25] wouldn't it be cool (once CI is all run in containers) to be able to have CI run in the same docker environment locally as it does currently within the wmf infrastructure? ;) [22:56:33] anyway! I'm off to bed for now! [22:56:37] Platonides: Yeah, it makes sense not to want to provide a server-agnostic API to rewrite rules in form of env variables. [22:57:21] but it also means that to mount a web app elsewhere than root, you have to commit to a specific web server when writing the config, whereas right now the niceness is that we can swap the nginx/apache base images in mediawiki-docker-dev without any modifications. [22:57:24] Only run-time plugging. 
[22:58:33] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 3 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798554 (10Halfak) Trying to start a gerrit review for wheels. Got this: ``` Do you really want to submit the above... [22:59:47] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 3 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798556 (10Halfak) Putting repo backups here: https://analytics.wikimedia.org/datasets/archive/public-datasets/all/... [23:17:56] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 3 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3798581 (10Halfak) https://github.com/wiki-ai/draftquality is fully updated. [23:59:36] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1