[00:00:06] (CR) jenkins-bot: [V: -1] Add service deploy via scap [tools/scap] - https://gerrit.wikimedia.org/r/224374 (owner: Thcipriani)
[00:06:55] (PS2) Thcipriani: Add service deploy via scap [tools/scap] - https://gerrit.wikimedia.org/r/224374
[00:09:18] (CR) BryanDavis: Add service deploy via scap (10 comments) [tools/scap] - https://gerrit.wikimedia.org/r/224374 (owner: Thcipriani)
[03:29:03] (PS3) Thcipriani: Add service deploy via scap [tools/scap] - https://gerrit.wikimedia.org/r/224374
[03:43:25] thcipriani|afk: Looks like a pretty good start.
[03:45:20] bd808: definitely, scap extends nicely to this use-case, I think.
[03:46:26] One thing you will need is to figure out how to let a random server find out what the "current" hash is
[03:46:38] so that puppet can ensue that the host is up to date
[03:46:46] *ensure
[03:47:30] scap just pulls whatever the rsync server on tin has but I'm guessing you can do better with the backing git repo
[03:48:20] I think trebuchet does this by writing some metadata file into the repo
[03:48:54] yeah, there's a file + a tag in the repo on tin
[03:50:35] writing a tag seems like a righteous way to go. I'll look more into this going forward. I just wanted to get something out into the world before then next deployment cabal meeting.
[03:50:45] *nod*
[03:51:25] I have a patch up that is close to what needs to happen to sync tin with mira that might be worth reading over for sanity
[03:51:44] I need to build a second deploy server in beta cluster so I can test it
[03:54:07] that shouldn't be too bad. Setting up a new tin in labs has seemingly been getting easier. Also, I've been getting less new, so that may just be a matter of perspective.
[03:58:22] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<33.33%)
[04:13:20] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[06:43:29] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<50.00%)
[07:08:58] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:10:24] PROBLEM - Puppet failure on deployment-stream is CRITICAL 20.00% of data above the critical threshold [0.0]
[07:11:36] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 30.00% of data above the critical threshold [0.0]
[07:12:12] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 44.44% of data above the critical threshold [0.0]
[07:12:46] PROBLEM - Puppet failure on deployment-salt is CRITICAL 40.00% of data above the critical threshold [0.0]
[07:12:52] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[07:12:56] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[07:13:23] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL 44.44% of data above the critical threshold [0.0]
[07:13:23] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[07:14:19] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[07:14:49] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[07:32:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
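The 03:46-03:50 exchange above sketches the idea being discussed: the deploy host records which revision is "current" (Trebuchet does it with a metadata file; a git tag works too), and puppet on each target can then check whether the host is up to date. Below is a minimal sketch of that idea, assuming a plain git repo on the deploy host; the function names, tag format, and layout are illustrative only and are not scap's actual API.

```python
import subprocess


def mark_current(repo_dir, tag_prefix="scap/deploy"):
    """On the deploy host: record the revision that was just deployed."""
    rev = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir).decode().strip()
    tag = "{}/{}".format(tag_prefix, rev[:12])
    # Annotated tag so the deploy time and tagger are kept in git itself.
    subprocess.check_call(
        ["git", "tag", "-a", tag, "-m", "deployed", rev], cwd=repo_dir)
    return rev


def current_rev(repo_dir, tag_prefix="scap/deploy"):
    """On a target host (or from puppet): resolve the newest deploy tag."""
    tag = subprocess.check_output(
        ["git", "for-each-ref", "--sort=-creatordate", "--count=1",
         "--format=%(refname:short)", "refs/tags/{}/*".format(tag_prefix)],
        cwd=repo_dir).decode().strip()
    return subprocess.check_output(
        ["git", "rev-parse", tag + "^{commit}"], cwd=repo_dir).decode().strip()
```

A target (or puppet) would then re-run the fetch/checkout whenever its own HEAD differs from current_rev(), instead of blindly rsyncing whatever tin currently has.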
[07:35:25] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[07:56:05] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 22.22% of data above the critical threshold [0.0]
[07:56:35] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[07:57:57] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[07:59:57] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:00:15] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 55.56% of data above the critical threshold [0.0]
[08:01:35] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:01:55] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 40.00% of data above the critical threshold [0.0]
[08:05:09] PROBLEM - Puppet failure on deployment-test is CRITICAL 33.33% of data above the critical threshold [0.0]
[08:07:29] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 20.00% of data above the critical threshold [0.0]
[08:09:21] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[08:09:37] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:09:41] PROBLEM - Puppet failure on deployment-upload is CRITICAL 70.00% of data above the critical threshold [0.0]
[08:11:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 55.56% of data above the critical threshold [0.0]
[08:11:52] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[08:12:28] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 40.00% of data above the critical threshold [0.0]
[08:12:44] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:13:04] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 44.44% of data above the critical threshold [0.0]
[08:13:05] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 22.22% of data above the critical threshold [0.0]
[08:13:41] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:14:29] PROBLEM - Puppet failure on deployment-urldownloader is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:14:33] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[08:16:07] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[08:17:39] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:18:54] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:21:30] PROBLEM - Puppet failure on deployment-stream is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:21:32] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[08:21:42] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:22:44] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:22:26] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0]
[10:06:46] !log integration: kicking puppet master. It is stalled somehow
[10:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:08:06] PROBLEM - Host integration-t102108-trusty-new2 is DOWN: CRITICAL - Host Unreachable (10.68.17.6)
[10:09:33] PROBLEM - Host integration-t102108-jessie-new2 is DOWN: CRITICAL - Host Unreachable (10.68.16.128)
[10:11:43] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL 40.00% of data above the critical threshold [0.0]
[10:12:54] !log deployment-prep: killing puppetmaster
[10:12:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:18:35] Continuous-Integration-Infrastructure: MediaWiki phpunit jobs should collect php errors from installer - https://phabricator.wikimedia.org/T104909#1449030 (hashar) Open>Resolved a:hashar
[10:24:56] !log pushed mediawiki/ruby/api tags for versions 0.4.0 and 0.4.1
[10:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:26:14] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[10:26:52] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[10:27:38] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[10:27:45] bah
[10:27:51] it s coming up!!!!!!!!
[10:28:40] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[10:29:35] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0]
[10:29:35] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[10:31:31] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[10:31:43] RECOVERY - Puppet failure on deployment-elastic06 is OK Less than 1.00% above the threshold [0.0]
[10:32:39] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[10:32:43] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[10:33:54] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[10:35:12] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0]
[10:36:28] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[10:37:12] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[10:37:54] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[10:38:21] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[10:39:19] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[10:39:37] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[10:39:57] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[10:41:35] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[10:41:35] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[10:41:55] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[10:42:42] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0]
[10:42:54] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[10:43:18] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[10:44:47] RECOVERY - Puppet failure on deployment-eventlogging02 is OK Less than 1.00% above the threshold [0.0]
[10:45:09] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[10:46:05] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[10:46:33] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[10:49:17] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[10:49:41] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[10:51:41] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK Less than 1.00% above the threshold [0.0]
[10:52:29] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[10:52:30] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[10:53:04] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[10:53:06] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[10:53:27] Deployment-Systems, Release-Engineering: scap should be LCStore-agnostic - https://phabricator.wikimedia.org/T105683#1449057 (ori) NEW
[10:56:08] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[12:24:32] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<10.00%)
[13:07:05] Yippee, build fixed!
[13:07:05] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #715: FIXED in 35 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/715/
[13:49:18] Continuous-Integration-Infrastructure, operations, Blocked-on-Operations: Update jenkins-debian-glue packages on Jessie to v0.13.0 - https://phabricator.wikimedia.org/T102106#1449275 (Joe) p:Triage>Normal
[13:50:09] Continuous-Integration-Infrastructure, operations, Blocked-on-Operations: Update jenkins-debian-glue packages on Jessie to v0.13.0 - https://phabricator.wikimedia.org/T102106#1449278 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[14:13:17] Beta-Cluster, Continuous-Integration-Config, Math: beta-recompile-math-texvc-eqiad job fails with "/usr/local/bin/scap-recompile: No such file or directory" - https://phabricator.wikimedia.org/T91191#1449357 (hashar) p:High>Low
[14:14:35] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation, Patch-For-Review: Puppetize Nodepool configuration - https://phabricator.wikimedia.org/T89143#1449362 (hashar) Open>Resolved The bulk of the work has been done. There are some refinements going on but overall labnodepool1001...
[14:25:29] (PS1) Hashar: tests: precompile some regex [integration/config] - https://gerrit.wikimedia.org/r/224417
[14:29:06] (Abandoned) Hashar: Reënable Apache lint check [integration/config] - https://gerrit.wikimedia.org/r/166033 (https://bugzilla.wikimedia.org/70068) (owner: Hashar)
[14:29:44] (Abandoned) Hashar: phabricator job to run arc lint on all repo [integration/config] - https://gerrit.wikimedia.org/r/183094 (https://phabricator.wikimedia.org/T85123) (owner: Hashar)
[14:32:14] (PS1) Hashar: tests: save a level of indent in test_repos_have_required_jobs [integration/config] - https://gerrit.wikimedia.org/r/224418
[15:23:27] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[15:52:32] bd808 or whoever, shall I merge https://gerrit.wikimedia.org/r/#/c/224219/2 now?
[15:56:12] Deployment-Systems, Release-Engineering: scap should be LCStore-agnostic - https://phabricator.wikimedia.org/T105683#1449611 (bd808) The specific check that exists for `l10n_cache-en.cdb` is to determine if the `--force` flag needs to be passed to `rebuildLocalisationCache.php` or not. To see why this is...
[16:00:23] I have this room to myself, don’t I
[16:02:02] Deployment-Systems: Investigate what changes are needed to deploy MW+Extensions by percentage of users (instead of by domain/wiki) - https://phabricator.wikimedia.org/T104398#1449627 (greg) Normal prio is fine, we should just be able to give an answer (with our hands waving a bit) when asked "what's needed t...
[16:06:47] andrewbogott: please do merge that, get rid of a cherry pick on deployment-salt
[16:07:16] thcipriani: ok, will merge in 10 or so
[16:07:23] kk, thanks
[16:26:07] Release-Engineering, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1449713 (BBlack) So, where are we at on removing the redirection workarounds here? I'd still like to get these re...
[16:59:17] just double checking, this week follows standard deploy?
[17:01:04] ebernhardson: looking at the deployments page - appears so
[17:34:56] yep
[17:34:59] ebernhardson: yep
[17:48:09] marxarelli: Does mw-selenium work properly with TMPDIR not existing? It's assigned but not created as far as I can see.
[17:49:09] per https://github.com/wikimedia/integration-jenkins/blob/master/bin/global-teardown.sh, the teardown can use rm -rf unconditionally since it already tests internally with -f
[17:49:30] I'm also curious why it does a mkdir in https://gerrit.wikimedia.org/r/#/c/224192/1/bin/mw-selenium-teardown.sh afterward.
[17:56:13] Krinkle: https://gerrit.wikimedia.org/r/#/c/224179/1/jjb/macro.yaml
[17:56:42] Krinkle: ha! i never knew that about `rm -f`
[17:57:17] marxarelli: Yeah, see global-setup and global-teardown for comparison. Just rm -rf in teardown and mkdir in setup. If it exists, mkdir will produce an error which is good, because it shouldn't exit
[17:57:19] exist*
[17:57:22] Krinkle: the mkdir is super counterintuitive, but makes it useful for both setup and teardown
[17:59:23] Krinkle: i think i was too eager to get that working on friday and should have thought about it for a second longer
[17:59:29] and that's how you get cruft :)
[17:59:33] Yeah.
[17:59:50] We should make thing strict. No tolerance or unneeded resiliance. It'll only cause bugs.
[17:59:54] e.g. files that shouldn't be there
[18:01:24] (PS1) Pastakhov: add mediawiki/extensions/PhpTagsStorage [integration/config] - https://gerrit.wikimedia.org/r/224446
[18:01:54] Krinkle: so nothing else uses a TMPDIR at /tmp?
[18:02:02] i thought i saw that somewhere
[18:02:03] marxarelli: It does.
[18:02:05] Yes
[18:02:17] Lots of places use it
[18:02:27] but you create your own directory that's unique for that executor slot at that time
[18:02:33] if something else is occupying that, that's wrong
[18:02:38] right right
[18:02:47] that makes sense
[18:02:55] and unlike production, tests should fail hard if anything is unexpected. No trying to make it work under strange circumstances. Can't trust the test results in that case :)
[18:06:05] i agree with that position, but it's a little difficult to reason about when working with jjb. i.e. it's very abstracted
[18:07:48] PROBLEM - Free space - all mounts on deployment-fluorine is CRITICAL deployment-prep.deployment-fluorine.diskspace.root.byte_percentfree (<40.00%)
[18:08:37] Krinkle: so you think separate setup/teardown (mkdir -p and rm -f, respectively) mw-selenium scripts is the right approach?
[18:09:41] (PS1) Chad: Default to eqiad, not pmtpa [tools/scap] - https://gerrit.wikimedia.org/r/224449
[18:10:24] it seems redundant of what global-setup/-teardown is doing but i don't see how i can incorporate those scripts without separating them from global-set-env.sh
[18:14:08] marxarelli: Yeah, you won't be able to call global-set-env
[18:14:13] marxarelli: Why is this not using those though?
[18:14:26] what's the issue with it using tmpfs potentially
[18:14:36] e.g. the default TMPDIR provided by global
[18:14:38] Krinkle: some bug in xkb on Ubuntu
[18:14:53] What kind of bug. Path too long?
[18:14:58] Krinkle: https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/972324
[18:15:10] Hm..
[18:15:11] it seems like an Ubuntu specific patch
[18:15:18] i don't know the details
[18:15:53] i was going to try to run the job on jessie but we don't seem to be there yet
[18:16:08] marxarelli: could have mw-selenium-set-env set a variable before including global-set-env that will make it avoid tmpfs
[18:16:59] then we can at least re-use global-set-env
[18:17:01] Krinkle: possibly. when is global-setup.sh called ...
[18:17:12] yeah, let me look into it
[18:17:23] global-setup and global-teardown aren't needed, you'll implement your own mw-senenium versions instead
[18:17:27] it's only mkdir/rm-rf anyway
[18:17:42] but both would include mw-selenium-set-env which uses global-set-env
[18:17:59] so you have DISPLAY and the centralised logic for the TMPDIR variable
[18:29:48] Deployment-Systems, Release-Engineering, Puppet: Puppet failure on deployment-sentry2 - https://phabricator.wikimedia.org/T78411#1450217 (thcipriani) Open>Resolved a:thcipriani Puppet on deployment-sentry2 seems to be running just fine now
[18:35:40] Release-Engineering, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1450241 (Tgr) >>! In T102566#1449713, @BBlack wrote: > Have we released new software with https:// URLs? No. [[ h...
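The setup/teardown thread above (17:48-18:17) boils down to two semantics: teardown can be an unconditional rm -rf, because -f already tolerates a missing path, while setup should use a plain mkdir (no -p) so that a leftover per-executor-slot directory fails the job loudly instead of silently reusing cruft; the directory itself should live off tmpfs because of the Ubuntu xkb bug linked at 18:14. The real helpers are shell scripts in integration/jenkins; the sketch below only restates the same behaviour in Python, with an assumed base path and naming scheme.

```python
import os
import shutil

# Assumed base directory; per the 18:14 discussion it should not be on tmpfs.
TMP_BASE = "/tmp"


def setup_tmpdir(executor_slot):
    """Strict setup: a directory left over from a previous run is an error."""
    tmpdir = os.path.join(TMP_BASE, "mw-selenium-{}".format(executor_slot))
    os.mkdir(tmpdir)  # plain mkdir, no -p/exist_ok: fail hard on cruft
    return tmpdir


def teardown_tmpdir(executor_slot):
    """Forgiving teardown, like `rm -rf`: a missing path is not an error."""
    tmpdir = os.path.join(TMP_BASE, "mw-selenium-{}".format(executor_slot))
    shutil.rmtree(tmpdir, ignore_errors=True)
```

The asymmetry is deliberate: teardown stays forgiving so a failed job can always be cleaned up, while setup stays strict so that anything unexpected invalidates the test run rather than being papered over.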
[18:39:36] Release-Engineering, Analytics-Backlog, Performance-Team, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1450247 (Krinkle)
[18:40:41] Release-Engineering, Analytics-Backlog, Performance-Team, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1379593 (Krinkle) Adding Release-Engineering since Deployment-Systems was removed. This is related to t...
[18:41:53] greg-g: can I get a small deploy window after the services one today to turn on $wgCentralAuthStrict = true;? it will prevent people without global accounts from logging in (should affect no one :P)
[18:47:50] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1450295 (mmodell)
[18:47:52] Deployment-Systems: [scap] multi datacenter aware without (major) performance hit - https://phabricator.wikimedia.org/T71572#1450294 (mmodell)
[18:48:02] Release-Engineering, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1450297 (demon) >>! In T102566#1449713, @BBlack wrote: > So, where are we at on removing the redirection workaroun...
[18:52:23] (PS1) Dduvall: Separate mw-selenium setup/teardown and be strict [integration/jenkins] - https://gerrit.wikimedia.org/r/224456
[18:53:16] (PS2) Dduvall: Separate mw-selenium setup/teardown and be strict [integration/jenkins] - https://gerrit.wikimedia.org/r/224456
[18:53:53] Krinkle: ^
[19:00:19] (CR) Dduvall: [C: -2] "Needs to be refactored according to If13aadef929d4b2e7682cc5c2bbe79e474b7c3ba" [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[19:17:28] (CR) BryanDavis: [C: 2] Default to eqiad, not pmtpa [tools/scap] - https://gerrit.wikimedia.org/r/224449 (owner: Chad)
[19:17:51] (Merged) jenkins-bot: Default to eqiad, not pmtpa [tools/scap] - https://gerrit.wikimedia.org/r/224449 (owner: Chad)
[19:19:25] (CR) Dzahn: "can we have datacenter2: codfw ?:)" [tools/scap] - https://gerrit.wikimedia.org/r/224449 (owner: Chad)
[19:26:07] Beta-Cluster, Staging: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1450387 (thcipriani)
[19:28:02] Beta-Cluster, Release-Engineering, Continuous-Integration-Config: Send beta cluster Jenkins alerts to betacluster-alert list - https://phabricator.wikimedia.org/T1125#1450399 (mmodell)
[19:29:53] Beta-Cluster, MediaWiki-File-management, Multimedia: Thumbnail 404s get cached - https://phabricator.wikimedia.org/T69056#1450412 (thcipriani) p:Triage>Normal
[19:55:25] PROBLEM - Puppet staleness on integration-slave-trusty-1013 is CRITICAL 20.00% of data above the critical threshold [43200.0]
[20:23:44] PROBLEM - Puppet failure on deployment-salt is CRITICAL 30.00% of data above the critical threshold [0.0]
[20:28:26] greg-g: also, [11:41:54] greg-g: can I get a small deploy window after the services one today to turn on $wgCentralAuthStrict = true;? it will prevent people without global accounts from logging in (should affect no one :P)
[20:38:45] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0]
[20:39:37] !log restarting puppetmaster on deployment-salt, seeing weird errors on instances
[20:39:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:43:27] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 20.00% of data above the critical threshold [0.0]
[20:44:03] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[20:44:05] !log might be some failures, puppetmaster refused to stop as usual, had to kill pid and restart
[20:44:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:47:06] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[20:47:16] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 55.56% of data above the critical threshold [0.0]
[20:48:17] legoktm: cool
[20:48:23] legoktm: iow: go ahead
[20:48:26] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 60.00% of data above the critical threshold [0.0]
[20:48:31] thanks :)
[20:48:35] :)
[20:49:08] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 66.67% of data above the critical threshold [0.0]
[20:55:42] Beta-Cluster, Release-Engineering, Patch-For-Review, Regression: Beta cluster logo broken (/static/images/project-logos 404 Not Found) - https://phabricator.wikimedia.org/T105541#1450606 (thcipriani) This gerrit patch is cherry picked on deployment-prep puppetmaster and seems to have fixed the issu...
[20:58:32] Beta-Cluster, operations, HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1450629 (Dzahn) Can't they go to jessie right away? I guess they can't because i hear we don't build HHVM for jessie.
[21:18:27] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:19:21] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<44.44%)
[21:20:43] (PS1) Hashar: tests: consistency of Zuul pipelines declarations [integration/config] - https://gerrit.wikimedia.org/r/224517
[21:22:11] (CR) Hashar: [C: -2] "WIP, need to add a few more related tests and fix up the current issues (such as analytics/kraken or ArticleFeedbackv5)." [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[21:22:17] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[21:23:27] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[21:23:27] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[21:23:44] (CR) jenkins-bot: [V: -1] tests: consistency of Zuul pipelines declarations [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[21:24:06] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[21:24:06] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[21:26:16] PROBLEM - Puppet failure on deployment-cache-text03 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:27:06] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[21:42:55] (PS1) Ori.livneh: Don't assume current l10n cache files are .cdb [tools/scap] - https://gerrit.wikimedia.org/r/224520
[22:10:25] RECOVERY - Puppet staleness on integration-slave-trusty-1013 is OK Less than 1.00% above the threshold [3600.0]
[22:43:52] (PS2) Ori.livneh: Don't assume current l10n cache files are .cdb [tools/scap] - https://gerrit.wikimedia.org/r/224520
[22:47:00] (CR) BryanDavis: Don't assume current l10n cache files are .cdb (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/224520 (owner: Ori.livneh)
[22:49:42] (CR) Ori.livneh: Don't assume current l10n cache files are .cdb (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/224520 (owner: Ori.livneh)
[22:50:33] (PS3) Ori.livneh: Don't assume current l10n cache files are .cdb [tools/scap] - https://gerrit.wikimedia.org/r/224520
[23:20:11] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 100.00% of data above the critical threshold [0.0]
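For T105683 ("scap should be LCStore-agnostic") and the "Don't assume current l10n cache files are .cdb" patch above, the relevant detail is bd808's 15:56 comment: the existing l10n_cache-en.cdb lookup only decides whether --force is passed to rebuildLocalisationCache.php. A store-agnostic version of that decision could simply ask whether any English l10n cache file exists, whatever its extension. The snippet below is only an illustration of that idea under assumed paths and naming, not the code in Gerrit change 224520.

```python
import glob
import os


def needs_force_rebuild(cache_dir):
    """Pass --force to rebuildLocalisationCache.php only when no English
    l10n cache file exists yet, regardless of its extension (.cdb or not)."""
    return not glob.glob(os.path.join(cache_dir, "l10n_cache-en.*"))
```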