[00:05:24] Reedy: Did you find anything to work on? I was hiding from my computer for the weekend
[00:36:08] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1428520 (Krenair) Shall we open a separate ticket against scap for that?
[00:44:55] Deployment-Systems, HHVM: [scap] Compile HHVM bytecode cache as deployment step - https://phabricator.wikimedia.org/T66272#1428524 (bd808) >>! In T66272#786613, @bd808 wrote: > I did some work previously during the #hhvm project to make a proof of concept for the compilation step. Prior POC code now publ...
[00:47:38] Deployment-Systems: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1428530 (bd808) NEW
[00:52:24] Deployment-Systems: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1428540 (Krenair)
[00:52:26] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1428539 (Krenair)
[02:30:45] Release-Engineering, Wikimania-Hackathon-2015: Sprint: Wikimedia Site Requests triage/cleanup/process - https://phabricator.wikimedia.org/T90468#1428581 (Krenair)
[02:33:35] Release-Engineering, Wikimania-Hackathon-2015: Sprint: Wikimedia Site Requests triage/cleanup/process - https://phabricator.wikimedia.org/T90468#1428582 (Krenair) Listed at https://wikimania2015.wikimedia.org/wiki/Hackathon#Projects_for_experienced_contributors now (Obviously we want experienced contribut...
[03:41:58] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[04:31:36] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #491: FAILURE in 39 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/491/
[06:30:21] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #671: FAILURE in 11 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/671/
[06:39:18] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[06:59:02] PROBLEM - Puppet failure on integration-zuul-server is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:08:06] (Abandoned) Krinkle: Add jsonlint and jshint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[08:10:55] (Restored) Paladox: Add jsonlint and jshint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[08:11:03] (CR) Paladox: [C: -1] Add jsonlint and jshint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[08:25:04] Browser-Tests: Support headless gem's video recording feature for headless Jenkins jobs - https://phabricator.wikimedia.org/T104583#1428887 (hashar)
[08:33:28] Browser-Tests: Support headless gem's video recording feature for headless Jenkins jobs - https://phabricator.wikimedia.org/T104583#1428918 (hashar) `ffmpeg` has been forked to a new project named `libav`. On the CI slaves, we include the puppet class `mediawiki::packages::multimedia` which uses ffmpeg on Pr...
[08:33:49] (PS5) Hashar: WIP Video recording of headless execution [selenium] - https://gerrit.wikimedia.org/r/222346 (https://phabricator.wikimedia.org/T104583) (owner: Dduvall)
[08:34:34] (CR) Hashar: "I have just linked Gerrit change to the task T104583 (Support headless gem's video recording feature for headless Jenkins jobs)." [selenium] - https://gerrit.wikimedia.org/r/222346 (https://phabricator.wikimedia.org/T104583) (owner: Dduvall)
[08:47:59] Continuous-Integration-Infrastructure, operations: Provide Jessie package to fullfil Mediawiki::Packages requirement - https://phabricator.wikimedia.org/T95002#1428988 (hashar)
[08:48:03] Continuous-Integration-Infrastructure, Multimedia, operations: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1428989 (hashar)
[08:51:33] (PS6) Paladox: Add jsonlint and jshint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592
[08:51:58] (CR) Paladox: [C: 1] Add jsonlint and jshint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[08:52:07] (PS7) Paladox: Add jsonlint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592
[09:01:35] Continuous-Integration-Infrastructure, operations, Patch-For-Review: Provide Jessie package to fullfil Mediawiki::Packages requirement - https://phabricator.wikimedia.org/T95002#1429013 (hashar)
[09:02:22] Continuous-Integration-Infrastructure, Multimedia, operations, Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1429015 (MoritzMuehlenhoff) For jessie I recommend we use a backport of ffmpeg 2.7.1 as current...
[09:09:39] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:10:35] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:12:17] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:14:28] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 40.00% of data above the critical threshold [0.0]
[09:15:32] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:15:33] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[09:15:34] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:16:04] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 44.44% of data above the critical threshold [0.0]
[09:16:11] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 22.22% of data above the critical threshold [0.0]
[09:16:29] PROBLEM - Puppet failure on deployment-urldownloader is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:17:13] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 22.22% of data above the critical threshold [0.0]
[09:17:29] PROBLEM - Puppet failure on deployment-stream is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:18:11] PROBLEM - Puppet failure on deployment-test is CRITICAL 55.56% of data above the critical threshold [0.0]
[09:18:31] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:19:20] (CR) Hashar: [C: -1] "The repo lacks a .php entry point. You probably want to fix up the jshint errors as well." (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/222760 (https://phabricator.wikimedia.org/T104760) (owner: Eranroz)
[09:19:53] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:20:17] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 66.67% of data above the critical threshold [0.0]
[09:20:21] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[09:20:38] bah
[09:21:13] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:21:33] ^^^ graphite having an issue
[09:21:35] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 60.00% of data above the critical threshold [0.0]
[09:23:47] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:24:40] PROBLEM - Puppet failure on deployment-upload is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:24:42] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:24:52] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 60.00% of data above the critical threshold [0.0]
[09:25:08] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 22.22% of data above the critical threshold [0.0]
[09:25:20] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL 33.33% of data above the critical threshold [0.0]
[09:25:26] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:25:54] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:26:22] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:27:56] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:28:12] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 55.56% of data above the critical threshold [0.0]
[09:29:16] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[09:29:52] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:30:37] (PS1) Krinkle: mw-teardown: Only cp LocalSettings if it exists [integration/jenkins] - https://gerrit.wikimedia.org/r/223003
[09:30:55] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[09:31:03] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 22.22% of data above the critical threshold [0.0]
[09:31:33] (CR) Krinkle: [C: 2] "Example https://integration.wikimedia.org/ci/job/mediawiki-phpunit-hhvm/10793/console – can't get stack trace without log artefact." [integration/jenkins] - https://gerrit.wikimedia.org/r/223003 (owner: Krinkle)
[09:31:43] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 40.00% of data above the critical threshold [0.0]
[09:32:11] (Merged) jenkins-bot: mw-teardown: Only cp LocalSettings if it exists [integration/jenkins] - https://gerrit.wikimedia.org/r/223003 (owner: Krinkle)
[09:32:39] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 60.00% of data above the critical threshold [0.0]
[09:34:13] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL 66.67% of data above the critical threshold [0.0]
[09:46:43] Continuous-Integration-Infrastructure, Multimedia, operations, Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1429108 (MoritzMuehlenhoff) >>! In T103335#1429015, @MoritzMuehlenhoff wrote: > In Debian the De...
[10:26:36] Continuous-Integration-Infrastructure, Multimedia, operations, Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1429138 (Bawolff) Umm, we already use libav. In debian (unless you go back far enough), the ffmp...
[10:49:12] RECOVERY - Puppet failure on deployment-logstash1 is OK Less than 1.00% above the threshold [0.0]
[10:49:40] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[10:50:34] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[10:52:14] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[10:53:08] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[10:54:28] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[10:55:14] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[10:55:20] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[10:55:32] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[10:56:04] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[10:56:12] RECOVERY - Puppet failure on deployment-restbase02 is OK Less than 1.00% above the threshold [0.0]
[10:56:32] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0]
[10:57:12] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0]
[10:57:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[10:57:54] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[10:58:37] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[11:00:29] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[11:00:35] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[11:01:37] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[11:03:47] RECOVERY - Puppet failure on deployment-eventlogging02 is OK Less than 1.00% above the threshold [0.0]
[11:04:35] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[11:04:51] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[11:04:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[11:05:07] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[11:05:19] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[11:05:25] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[11:05:54] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[11:06:10] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[11:06:26] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[11:07:42] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[11:08:12] RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0]
[11:09:16] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[11:09:42] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[11:09:54] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[11:10:54] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[11:11:06] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[11:11:44] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[12:07:18] PROBLEM - Puppet failure on integration-slave-jessie-1001 is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:47:29] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation: Puppetize Nodepool configuration - https://phabricator.wikimedia.org/T89143#1429499 (hashar) a:hashar
[12:48:55] Continuous-Integration-Infrastructure: Request Jenkins shell access for account "sniedzielski" - https://phabricator.wikimedia.org/T103192#1429516 (hashar) p:Normal>High
[12:49:01] Continuous-Integration-Infrastructure: Request Jenkins shell access for account "sniedzielski" - https://phabricator.wikimedia.org/T103192#1429518 (hashar) a:hashar>None
[12:49:49] Continuous-Integration-Infrastructure, operations, HHVM: HHVM Jenkins job throw: Unable to set CoreFileSize to 8589934592: Operation not permitted (1) - https://phabricator.wikimedia.org/T78799#1429522 (hashar)
[12:50:22] Browser-Tests, Continuous-Integration-Config: Browser test jobs should use xUnit publisher instead of Junit - https://phabricator.wikimedia.org/T94684#1429526 (hashar)
[13:05:23] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #708: FAILURE in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/708/
[13:27:53] Browser-Tests, Continuous-Integration-Config: Browser test jobs should use xUnit publisher instead of Junit - https://phabricator.wikimedia.org/T94684#1429615 (hashar) The only jobs that uses the xunit publisher are: ``` $ grep -l xunit /var/lib/jenkins/jobs/*/config.xml|cut -d\/ -f6 integration-phpunit-...
[13:33:19] Continuous-Integration-Infrastructure, Jenkins: Upgrade Jenkins to 1.609.1 - https://phabricator.wikimedia.org/T101884#1429633 (faidon)
[13:33:21] Continuous-Integration-Infrastructure, operations, Blocked-on-Operations, Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.1 - https://phabricator.wikimedia.org/T103343#1429630 (faidon) Open>Resolved a:faidon Done!
[13:42:45] bd808: I ended up spending most of the weekend in random fbos waiting for weather to pass
[13:42:53] 24 hours away, less than 5 hours time logged... About the same for the other british guy
[13:54:58] Browser-Tests, Continuous-Integration-Config: Browser test jobs should use xUnit publisher instead of Junit - https://phabricator.wikimedia.org/T94684#1429699 (hashar) Comparing the two plugins, the output are exactly the same. Xunit has the ability to tweak thresholds though but for most jobs we are usi...
[14:02:13] Continuous-Integration-Isolation, operations, Patch-For-Review: Backport python-diskimage-builder 0.1.46 from testing to jessie-wikimedia - https://phabricator.wikimedia.org/T102880#1429709 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[14:08:17] !log Disconnected lanthanum Jenkins slave. Being phased out https://phabricator.wikimedia.org/T86658
[14:08:19] Logged the message, Master
[14:08:54] Continuous-Integration-Infrastructure, operations, Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1429721 (hashar) I have disconnected the server from Jenkins master/slave config https://integration.wikimedia.org/ci/computer/lanthanum/
[14:17:22] hashar: is there anything else left with lanthanum?
[14:21:04] JohnFLewis: why?
[14:21:08] what ? :D
[14:21:26] just looking at it and noticed your disconnect above :P
[14:21:38] JohnFLewis: yeah most everything now runs on labs instances
[14:21:44] and nothing is left to run on lanthanum
[14:22:10] so we will decommission the server so it can be reused by some other project
[14:22:25] the decom part is mostly what I'm looking at
[14:47:37] Continuous-Integration-Isolation, operations, Patch-For-Review: Backport python-diskimage-builder 0.1.46 from testing to jessie-wikimedia - https://phabricator.wikimedia.org/T102880#1429900 (MoritzMuehlenhoff) python-diskimage-builder 0.1.46-1+wmf1 has been added to apt.wikimedia.org @hashar: Please m...
[15:30:00] legoktm: Imma let you finish your patch, but first, lemme add my own bits.
[15:33:10] thcipriani: The thing with restbase you were stuck on? Design decision :)
[15:33:31] It doesn't start automagically because cassandra doesn't either. Fixed as soon as I enabled the service in systemd.
[15:33:55] I guess actionable there is possibly detecting it and failing in a slightly easier to understand manner?
[15:34:33] ostriches: restbase just wouldn't start. Restbase01 looks fine now. trying to spin up other instances, but there's a bunch of nfs things that are tripping up the palladium now :(
[15:34:46] Blehhh, ok
[15:35:51] by "looks fine" I mean all the ports look fine and the services report as up, but haven't done any sort of actual testing on it.
[15:36:33] thcipriani: It started as soon as you enabled it in systemd.
[15:37:30] https://phabricator.wikimedia.org/T104276#1417765
[15:37:32] ostriches: cool, yeah, I saw your note on it on Wednesday, thanks for checking that.
[15:37:57] Yeah, and mobrovac said it was by design. We just didn't expect that and the actual way puppet fails in that case is cruddy
[15:38:34] heh, yeah, totally.
[15:39:25] Something something systemd :p
[15:40:18] What puppet saw: WHERE ARE TEH FILES?
[15:40:26] What we saw: Files there, why won't you start?
[15:40:38] What systemd said: I can't start the service, no files
[15:40:45] What systemd meant: The service is not enabled
[15:41:36] Also, last week I learned that `man systemctl` is clearly not written by humans who expect other humans to read it.
[15:42:33] systemd is a big opaque blob of wtf for me so far
[15:42:52] old geeks, new tricks, yadda yadda yadda
[15:44:04] I feel like it's going to make my life on desktop linux easier in about 6 months and my life using linux server harder forever.
[15:44:23] but that's just the view from the bottom of the hill.
[15:47:05] I've been hearing that desktop linux will be usable for normal people "soon" longer than folks waited for Duke Nukem Forever
[15:53:55] bd808: :D
[15:56:21] If anything, desktop linux has become less usable in recent years. 2009 was probably as close as any distros got to being usable mainstream. Metacity and gnome 2 were window managers folks could grok. Unity, gnome3, not fun-to-use gentle introductions for neophytes IMO.
[15:59:10] My last linux on the desktop experience was with KDE2 ;)
[16:19:48] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[16:21:31] ru-roh
[16:23:02] Yeah, same nfs bs.
[16:24:06] Ah, home directories.
[16:25:02] Can't we disable nfs home directories for the project if we don't need them?
[16:25:02] We really don't in staging.
[16:25:16] ostriches: they should be disabled in staging and deployment-prep
[16:27:00] Hmm, yeah I don't see it in nfs-mounts.yaml
[16:29:02] To #-labs!
[16:31:15] ostriches: I'd guess we need to update deployment-salt to latest origin/production and possibly kick the puppetmaster, that's what I'm doing in staging right now.
[16:31:49] Ah, could be
[16:34:56] RECOVERY - Puppet staleness on deployment-restbase01 is OK Less than 1.00% above the threshold [3600.0]
[16:38:33] That did it
[16:38:33] Or not...
[16:38:33] Hmm, puppet still failing
[16:41:39] on restbase01? huh, looks like it's looking for sid-wikimedia in http://apt.wikimedia.org/wikimedia/dists/
[16:42:22] didn't realize it was running sid.
[17:02:43] thcipriani, re restbase and C* not starting automatically: we are trying not to start services with unknown data and/or code state
[17:03:16] in Cassandra's case that can lead to data corruption, in $random_service's case that can run to random behavior when a node comes back with old code
[17:03:29] s/run/lead/
[17:11:39] gwicke: Yeah, it makes sense now that it was explained :p
[17:42:12] Continuous-Integration-Isolation, Labs, Labs-Infrastructure, Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1430922 (Andrew)
[17:47:39] Beta-Cluster: Install Checkuser extension on Beta-Cluster - https://phabricator.wikimedia.org/T104883#1430983 (Bugreporter) NEW
[17:50:32] ugh
[17:50:32] ongoing vandalism at beta cluster
[17:58:32] thcipriani, twentyafterfour, mobrovac, marxarelli: https://phabricator.wikimedia.org/diffusion/GDEP/
[17:59:00] nice
[17:59:56] "Synergy and other buzzwords relating to workflows and deployments."
[18:00:57] Glaisher: haha :)
[18:01:00] it's a synergistic thing?
[18:01:28] ostriches: thanks, I will push the requisite boilerplate
[18:01:31] (c) 2015. Synergistic Work Engine
[18:01:39] yey
[18:01:51] finally a true differential repo
[18:02:18] twentyafterfour: How do we push?
[18:03:02] We need to turn on diffusion.allow-http-auth for now, until we have SSH
[18:03:14] yeah
[18:03:40] I'll write a patch for puppet to do it perma-like
[18:04:33] greg-g: there is a nice bug about enabling CU on beta cluster, fair to say 'declined'? :)
[18:06:57] Beta-Cluster: Install Checkuser extension on Beta-Cluster - https://phabricator.wikimedia.org/T104883#1431075 (JohnLewis) Open>declined a:JohnLewis Beta is not really designed for any sort of production use so this means nothing. Also releasing IP and user agent information to people freely with no...
[18:12:13] Deployment-Systems, Release-Engineering, Performance-Team, operations, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1431094 (mmodell) p:High>Low
[18:12:44] Deployment-Systems, Release-Engineering, Performance-Team, operations, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414314 (mmodell) p:Low>Normal
[18:14:42] Deployment-Systems, operations, Patch-For-Review: Trebuchet doesn't like when a deployer server is also a minion, a edge case for scap - https://phabricator.wikimedia.org/T67549#1431113 (thcipriani)
[18:15:47] Deployment-Systems, operations, Patch-For-Review: Trebuchet doesn't like when a deployer server is also a minion, a edge case for scap - https://phabricator.wikimedia.org/T67549#1431124 (thcipriani) Are there any updates that need to happen on this patch? Could pull to deployment-prep for a sanity check.
[18:17:30] Deployment-Systems: make-wmf-branch should support resuming - https://phabricator.wikimedia.org/T101935#1431134 (mmodell)
[18:20:17] Beta-Cluster: Install Checkuser extension on Beta-Cluster - https://phabricator.wikimedia.org/T104883#1431148 (hashar) What @JohnLewis. We explicitly disabled CheckUser on June 2nd 2012 with https://gerrit.wikimedia.org/r/#/c/9796/ There was no bug, that was asked internally following discussions with commu...
[18:23:57] Deployment-Systems, Release-Engineering, Epic: Rethinking mediawiki deployment process - https://phabricator.wikimedia.org/T89945#1431188 (mmodell) p:Low>Normal
[18:24:06] Deployment-Systems: Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched - https://phabricator.wikimedia.org/T51392#1431191 (mmodell)
[18:24:08] Deployment-Systems, Release-Engineering, Epic: Rethinking mediawiki deployment process - https://phabricator.wikimedia.org/T89945#1049275 (mmodell)
[18:27:07] Deployment-Systems, Release-Engineering: Update mediawiki-tools-release to use new API continuation - https://phabricator.wikimedia.org/T102866#1431220 (thcipriani) p:Triage>Normal
[18:31:01] Deployment-Systems: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1431243 (mmodell) p:Triage>Normal
[18:34:19] Deployment-Systems, Graphite: Record sync wikiversions event in graphite - https://phabricator.wikimedia.org/T104635#1431265 (mmodell) I think this sounds easy to implement but I don't know specifically how to push an event like that into graphite. Can someone point me to some kind of example where we are...
[18:34:57] Deployment-Systems, Graphite: Record sync wikiversions event in graphite - https://phabricator.wikimedia.org/T104635#1431268 (mmodell) p:Triage>Normal
[18:38:00] Deployment-Systems: Investigate what changes are needed to deploy MW+Extensions by percentage of users (instead of by domain/wiki) - https://phabricator.wikimedia.org/T104398#1431278 (mmodell) p:Triage>Normal this seems epic, not sure how actionable this is at the moment, backlogging @greg: when you g...
[18:38:15] Deployment-Systems, RESTBase: Setup staging for testing RESTBase deploys - https://phabricator.wikimedia.org/T104276#1431283 (thcipriani) p:Triage>High
[18:39:33] Deployment-Systems, Services: Use semantic versioning for services (for consistency with mediawiki core) - https://phabricator.wikimedia.org/T102550#1431296 (mmodell) p:Low>Normal
[18:39:55] Deployment-Systems, Release-Engineering, Epic: Rethinking mediawiki deployment process - https://phabricator.wikimedia.org/T89945#1431306 (mmodell)
[18:39:57] Deployment-Systems, Services: Use semantic versioning for services (for consistency with mediawiki core) - https://phabricator.wikimedia.org/T102550#1367472 (mmodell)
[18:40:03] Deployment-Systems, RESTBase: Setup staging for testing RESTBase deploys - https://phabricator.wikimedia.org/T104276#1431308 (GWicke) > This is a safety measure put in place to disable starting RESTBase on boot because Cassandra is not started on boot either since that might harm the data and corrupt it....
[18:40:40] Deployment-Systems, Release-Engineering, Performance-Team, operations, and 2 others: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1431310 (mmodell) p:High>Normal
[18:45:28] Deployment-Systems, Performance-Team, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431343 (mmodell) Who should be working on this? I have no idea how to get this data.
[18:45:41] Deployment-Systems, Performance-Team, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431345 (mmodell)
[18:52:20] Deployment-Systems, Graphite: Record sync wikiversions event in graphite - https://phabricator.wikimedia.org/T104635#1431421 (bd808) As simple as adding `self.get_stats().increment('deploy.sync-wikiversions')` after `self.announce(...)` in scap.main.SyncWikiversions. It would probably be useful to record...
[18:56:40] Deployment-Systems: Trebuchet blockers (tracking) - https://phabricator.wikimedia.org/T45338#1431446 (mmodell)
[18:56:42] Deployment-Systems: [Trebuchet] Git-deploy should have an rsync backend - https://phabricator.wikimedia.org/T56185#1431443 (mmodell) Open>declined a:mmodell the new deployment framework (rGDEP) will likely support pluggable transport mechanisms. At any rate I think that git-deploy will become extinc...
[19:08:15] Beta-Cluster, Citoid: On beta cluster citoid should self update and reload after change is merged - https://phabricator.wikimedia.org/T95652#1431520 (hashar) p:Triage>High
[19:08:33] Beta-Cluster, Citoid: On beta cluster citoid should self update and reload after change is merged - https://phabricator.wikimedia.org/T95652#1431523 (hashar) p:High>Normal
[19:09:48] Beta-Cluster, Citoid: On beta cluster citoid should self update and reload after change is merged - https://phabricator.wikimedia.org/T95652#1196889 (hashar) @mobrovac and I had some pairing last week to train him up on JJB usage. The informal .plan is to revisit each of the mediawiki services backend an...
[19:10:26] Continuous-Integration-Infrastructure: Request Jenkins shell access for account "sniedzielski" - https://phabricator.wikimedia.org/T103192#1431543 (Niedzielski)
[19:17:21] Beta-Cluster: Logging out of Commons beta: Cannot contact the database server: Unknown database 'incubatorwiki' - https://phabricator.wikimedia.org/T71898#1431586 (hashar) We deleted a bunch of wiki projects / databases on the beta cluster ({T48104}). They are still referenced in CentralAuth global username...
[19:18:21] Beta-Cluster: Logging out of Commons beta: Cannot contact the database server: Unknown database 'incubatorwiki' - https://phabricator.wikimedia.org/T71898#1431604 (hashar) See T65396 and T68401 for a possible fix
[19:18:48] Beta-Cluster, Patch-For-Review, Reading-Web, Wikimedia-log-errors: Visiting sign up form shows 500 - https://phabricator.wikimedia.org/T103107#1431611 (demon)
[19:19:56] Beta-Cluster: Logging out of Commons beta: Cannot contact the database server: Unknown database 'incubatorwiki' - https://phabricator.wikimedia.org/T71898#1431627 (thcipriani) p:Triage>Low
[19:20:37] Deployment-Systems, Release-Engineering: Update mediawiki-tools-release to use new API continuation - https://phabricator.wikimedia.org/T102866#1431645 (Reedy) Yeah, I copied botclasses.php from elsewhere. I'm not 100% sure where the actual canonical source for it is. However, for what we are/were using...
[19:21:09] haha "canonical source"
[19:21:42] thcipriani: Could we push a config patch now? Trying to fix a fatal in Beta Cluster (which'll become a prod fatal when the train goes out).
[19:21:44] Reedy: there is no canonical source, everyone just copies it into their projects and modifies as necessary :P
[19:21:58] legoktm: I did presume as much... :P
[19:22:11] Beta-Cluster, Compact-Personal-Bar-(Beta), MediaWiki-extensions-VectorBeta, Beta-Feature, user-notice: Remove Compact Personal Bar from Beta Cluster and Production - https://phabricator.wikimedia.org/T104659#1431648 (thcipriani) p:Triage>Normal
[19:22:56] James_F: thcipriani: I guess you can push it :-D
[19:23:12] it is not like you guys are going to push a few thousands lines of experimental code hehe
[19:23:17] No. :-)
[19:23:34] Krenair: You want to do the honours
[19:23:35] ?
[19:24:23] Reedy: I think it would be worthwhile to switch to https://github.com/addwiki/mediawiki-api, It is on my list to kill botclasses from my bots as well...
[19:24:53] legoktm: WFM. botclasses.php was just the first/simplest php api interface I found
[19:25:22] James_F, ok
[19:26:10] Beta-Cluster, MediaWiki-API: mw: interwiki prefix missing on beta cluster, so API's "complete documentation" is a 404. - https://phabricator.wikimedia.org/T104504#1431659 (hashar) p:Triage>Low Maybe related to {T69931} ? One can dig at http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:Inte...
[19:28:22] Beta-Cluster, Graphite, Shinken: Delete specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T104091#1431669 (hashar) a:yuvipanda Hello Yuvi, would you mind having a look at those stalled graphite entries for no more existent disk partition? Shinken complains about them f...
[19:28:35] Beta-Cluster, Graphite, Shinken: Delete specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T104091#1431672 (hashar) p:Triage>Normal
[19:31:54] Beta-Cluster, Wikimedia-Logstash: deployment-logstash02 fails puppet: Apache2 can't start, mod_authz_groupfile not enabled on Jessie - https://phabricator.wikimedia.org/T103804#1431681 (hashar) Hello @bd808 , seems our puppet manifests for logstash are not complete for Jessie. We fail to enable the Apac...
[19:32:27] 10Beta-Cluster, 10Wikimedia-Logstash: deployment-logstash02 fails puppet: Apache2 can't start, mod_authz_groupfile not enabled on Jessie - https://phabricator.wikimedia.org/T103804#1431693 (10hashar) p:5Triage>3Normal [19:32:47] 10Beta-Cluster, 10Wikimedia-Logstash: deployment-logstash02 fails puppet: Apache2 can't start, mod_authz_groupfile not enabled on Jessie - https://phabricator.wikimedia.org/T103804#1399518 (10hashar) p:5Triage>3Normal [19:34:28] 10Beta-Cluster, 7Varnish: deployment-parsoidcache02 fails puppet: Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]: Could not evaluate: Could not find init script or upstart conf file... - https://phabricator.wikimedia.org/T103660#1431699 [19:34:39] hashar, thcipriani: Thanks; all clear. [19:34:47] hashar, fyi, at some point I fiddled with one of those mods to fix a puppet error, moaned about it on phabricator so we wouldn't forget, and then forgot [19:36:13] 10Beta-Cluster, 6Release-Engineering, 10Continuous-Integration-Config, 10Parsoid: Parsoid patches don't update Beta Cluster automatically -- only deploy repo patches seem to update that code - https://phabricator.wikimedia.org/T92871#1431713 (10hashar) So that is similar to {T95652} where I wrote: @mobrov... [19:36:34] 10Beta-Cluster, 10Citoid: On beta cluster citoid should self update and reload after change is merged - https://phabricator.wikimedia.org/T95652#1196889 (10hashar) [19:36:37] 10Beta-Cluster, 7Epic: Meeting: Automatic deployment of backend services on beta cluster - https://phabricator.wikimedia.org/T100099#1431719 (10hashar) [19:37:00] 10Beta-Cluster, 7Epic: Meeting: Automatic deployment of backend services on beta cluster - https://phabricator.wikimedia.org/T100099#1431724 (10hashar) From {T95652}: @mobrovac and I had some pairing last week to train him up on JJB usage. The informal .plan is to revisit each of the mediawiki services backen... 
[19:37:51] 10Beta-Cluster, 6Release-Engineering, 10Continuous-Integration-Config, 10Parsoid: Parsoid patches don't update Beta Cluster automatically -- only deploy repo patches seem to update that code - https://phabricator.wikimedia.org/T92871#1431729 (10hashar) p:5High>3Normal [19:48:37] 10Beta-Cluster, 10Staging, 6Collaboration-Team, 7Database: Use External Store on Beta Cluster - https://phabricator.wikimedia.org/T95871#1431750 (10hashar) @Mattflaschen External Store is a bit tricky. Gotta need to setup a few instances to hold the SQL databases then add the related mediawiki-config conf... [19:51:20] James_F: I don't think we have a formal process regarding pushing changes just before a branch cut [19:51:44] hashar: Depends on the team; in VE we very much try to avoid that. [19:51:46] James_F: thcipriani just told me that apparently we tend to freeze features on friday [19:51:52] and potentially will use monday as a QA day [19:51:57] then cut on tuesday for deployment [19:52:06] having probably enough confidence the branch is clean [19:52:10] That's roughly our aspiration in VE, yes. [19:52:21] other teams are adopting that trends [19:52:27] that slow down things a bit [19:52:32] but I think it is worth it [19:52:35] less troubles when deploying [19:52:43] less maintenance / bugs / fire fight on deployment [19:52:55] so overall should be a net gain (then I don't have metrics to back up my assumption) [19:52:58] time will tell! 
[19:53:20] one sure thing, multiple devs asked for it :) That is a good sign QA is nowadays taken seriously [19:59:48] it was brought up on wikitech before, but too many people had different opinions to make any kind of standard [20:04:32] ebernhardson|lch: if we get the most important codes / fancy features to adhere to that model without enforcing it, we will be fine :-} [20:05:24] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1431819 (10mmodell) p:5Normal>3Low [20:05:42] ostriches: I disabled Gerrit replication to lanthanum (CI slave) with https://gerrit.wikimedia.org/r/222595 [20:05:51] 10Deployment-Systems, 10RESTBase: Setup staging for testing RESTBase deploys - https://phabricator.wikimedia.org/T104276#1431834 (10mmodell) [20:05:54] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1136938 (10mmodell) [20:06:03] ostriches: the conf landed, but puppet does not restart gerrit for replication.config changes. Would you mind kicking it please ? [20:06:39] ostriches: also I could not find a command line to check the status of replication. But maybe there is none. Might be good to add something on https://wikitech.wikimedia.org/wiki/Gerrit [20:07:09] oh `gerrit show-queue -w` does still show lanthanum :D [20:10:51] 10Deployment-Systems, 6Analytics-Backlog, 6Performance-Team, 6operations, 7Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431847 (10Krinkle) [20:14:07] 10Deployment-Systems, 6Analytics-Backlog, 6Performance-Team, 6operations, 7Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431863 (10Krinkle) Presumably by adding a tail subscriber to the varnish stream. Basically we'd collect... 
[20:20:39] 10Continuous-Integration-Infrastructure: MediaWiki phpunit jobs should collect php errors from installer - https://phabricator.wikimedia.org/T104909#1431875 (10Krinkle) 3NEW [20:21:41] hashar: i'm trying to further generalize the mw-selenium builder by factoring out the mwext dependency bits [20:23:12] is the "right" way to use the ext_dependencies python script? [20:23:18] and make a generic builder? [20:25:28] marxarelli: hi! [20:25:46] legoktm: :) [20:25:55] marxarelli: the ext_dependencies script sets an env variable that your jenkins job can use [20:25:59] i was just looking through the git logs [20:27:02] we load the function in zuul with: [20:27:02] - name: ^mwext-(testextension|qunit).*? [20:27:02] parameter-function: set_ext_dependencies [20:27:04] legoktm: looks like prepare-mediawiki-zuul-project is maybe what i want [20:27:24] yep, probably [20:27:40] nifty [20:29:17] marxarelli: legoktm is the boss :-) [20:29:27] we used to have the dependencies in the yaml on a per job/repo basis [20:29:43] If i understand it properly, there is now a shared job which is feed the dependencies to use [20:29:59] legoktm: awesome. looks pretty straightforward [20:30:11] hashar: You have to kick the plugin, not the full gerrit. [20:30:12] iirc [20:30:17] Or I can't remember. [20:30:38] 10Beta-Cluster: Enable the possibility to block users by the AbuseFilter at the deployment wiki at the beta cluster - https://phabricator.wikimedia.org/T103060#1431944 (10Luke081515) p:5Triage>3High At the moment, there is a lot of spam, so blocking would help a lot. [20:30:43] ostriches: seems I am lacking access rights for the replication plugin :/ [20:31:09] To do what with it? 
[20:31:21] to kick the plugin :-) [20:31:30] or at least query it for whatever info it might have [20:31:34] `ssh -p 29418 gerrit.wikimedia.org gerrit plugin reload replication` [20:32:01] `gerrit plugin ls` gives you a list of all plugins and their state [20:32:08] !log Gerrit: reloading replication plugin: gerrit plugin reload replication [20:32:12] Logged the message, Master [20:32:16] I already did :p [20:32:19] Anyway, you're good [20:32:20] ah [20:32:36] but does the replication plugin has any command line interface ? [20:32:43] like to show currently configured replicats [20:32:49] No [20:33:07] `replication start` is the only thing you can do [20:33:10] To manually kick it off [20:33:23] oh my, are we getting a new gerrit maintainer? :D [20:33:45] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Disable Gerrit replication to production slaves - https://phabricator.wikimedia.org/T86661#1431959 (10hashar) From Chad, had to reload the replication plugin to take in account the configuration change: `gerrit plugin reload replication`. Lanthanum s... [20:33:51] ostriches: okkk :) [20:34:29] !log lanthanum: deleting gerrit replicas under /srv/ssd/gerrit [20:34:32] Logged the message, Master [20:35:00] legoktm: the description of set_ext_dependencies() says "reads dependencies from the yaml file" but it looks like just a static dictionary at the top. were there plans to create a convention/entrypoint for that? [20:36:30] marxarelli: oh yeah, originally I had it in a yaml file, but I couldn't figure out the path handling so I just stuck the mapping in the python file itself [20:38:32] legoktm: ok. i was having an issue with mw-selenium tests needed to have a skin installed as well (https://gerrit.wikimedia.org/r/#/c/222223/). would it be reasonable to add cloning of vector to the prepare-mediawiki-zuul-project builder? [20:38:50] or would that break shit? 
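Sketch of the approach legoktm describes above: a Zuul parameter function that looks up dependencies in a static dictionary inside the Python file and hands them to the Jenkins job through a build parameter. This is an illustration under assumptions, not the code in integration/config. The mapping contents, the `resolve` helper, and the `EXT_DEPENDENCIES` parameter name are hypothetical. The `(item, job, params)` signature and the `ZUUL_PROJECT` key follow the Zuul v2 parameter-function convention.

```python
# Hypothetical extension -> dependencies mapping, kept in the Python file
# itself rather than in a separate yaml file.
DEPENDENCIES = {
    'ExampleExtension': ['HelperExtension'],
    'HelperExtension': ['BaseExtension'],
}


def resolve(ext_name, mapping):
    """Return all transitive dependencies of ext_name, in discovery order."""
    resolved = []
    pending = list(mapping.get(ext_name, []))
    while pending:
        dep = pending.pop(0)
        # Skip anything already handled, and guard against cycles.
        if dep in resolved or dep == ext_name:
            continue
        resolved.append(dep)
        pending.extend(mapping.get(dep, []))
    return resolved


def set_ext_dependencies(item, job, params):
    """Zuul-style parameter function; mutates params in place."""
    project = params.get('ZUUL_PROJECT', '')
    prefix = 'mediawiki/extensions/'
    if not project.startswith(prefix):
        return  # not an extension repo, nothing to add
    ext_name = project[len(prefix):]
    deps = resolve(ext_name, DEPENDENCIES)
    # EXT_DEPENDENCIES is an assumed name for the variable the job reads.
    params['EXT_DEPENDENCIES'] = '\n'.join(prefix + d for d in deps)


if __name__ == '__main__':
    params = {'ZUUL_PROJECT': 'mediawiki/extensions/ExampleExtension'}
    set_ext_dependencies(None, None, params)
    print(params['EXT_DEPENDENCIES'])
```

Keeping the mapping in the module avoids the path-handling problem mentioned above, at the cost of needing an integration/config change whenever a dependency is added.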
[20:40:09] Krinkle: if you are around, we lanthanum.eqiad.wmnet no more runs any job (everything that still ran on that host got moved to labs) [20:40:25] Krinkle: so I am going to ask ops to phase out / wipe the machine entirely ( https://phabricator.wikimedia.org/T86658 ) [20:40:48] hashar: OK [20:40:52] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1431987 (10hashar) [20:40:53] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Disable Gerrit replication to production slaves - https://phabricator.wikimedia.org/T86661#1431986 (10hashar) [20:40:56] 10Continuous-Integration-Infrastructure, 6operations, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1431985 (10hashar) [20:41:26] hashar: https://phabricator.wikimedia.org/T86659 [20:41:44] marxarelli: I don't think we should since the qunit and phpunit tests should all work with no skins installed. Also I don't think any of our current extension loading infra (the stuff in integration/jenkins) supports skins yet [20:41:44] it all hardcodes extensions [20:42:52] 10Continuous-Integration-Infrastructure, 6operations, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1431990 (10hashar) a:3RobH Removed the blockers that have been achieved for lanthanum.eqiad.wmnet There will be some puppet cleanup to conduct. Some part sti... [20:43:25] Krinkle: yeah [20:43:29] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1431993 (10Krinkle) [20:43:35] Krinkle: some jobs are still on gallium -:-/ [20:43:50] legoktm: ok, that makes sense not to bloat the qunit and unittest jobs [20:44:03] hashar: OK. [20:44:18] hashar: Any progress on git-cache and smaller instances? 
[20:44:32] nop [20:44:51] but got a disk image recipe on my other comp to create Jessie images that have some basic puppet manifest [20:44:59] legoktm: i would like to figure out a way to generalize the ext dependency setup, to allow for cloning of arbitrary dependencies (a skin for mw-selenium) [20:45:03] marxarelli and I working on it [20:45:30] Krinkle: for git-cache , we should probably revisit the way Zuul is installed on labs instance. Using .deb package is a pain in the *** [20:45:50] Yes, it is. :P [20:46:08] Does it need any kind of installation? [20:46:12] marxarelli: you could optionally add the dependency by checking what job.name is in set_ext_dependencies. [20:46:14] Or can we just trebuchet out of git? [20:46:18] (git::clone) [20:46:31] Trebuchet is dead [20:46:38] but [20:46:41] I know, but I mean fake trebuchet [20:46:53] when creating the base image, we could create a Zuul virtual env based on integration/zuul @ some branch [20:46:58] then pip install :-D [20:47:05] We're not creating new instances all the time [20:47:11] this is about existing instances, right? [20:47:15] legoktm: that's true [20:47:37] legoktm: what was the problem with reading the dependency list from file/yaml? [20:47:39] marxarelli: the extension loading logic in integration/jenkins's 50_mw_ext_loader.php needs to be updated to support skins. I think everything else should just work(TM)... [20:47:54] marxarelli: some weird path stuff, __file__ wasn't actually the file. I think zuul moved the python file around or something. [20:48:06] hashar: What changes do we need in Zuul though? Is it not good enough right know? [20:48:17] I would expect the blockers to be on our side, not new patches to Zuul. [20:48:42] legoktm: oh, i see what you're saying. i actually was thinking of reading it from a file in the repo [20:48:59] legoktm: like `tests/dependencies.yaml` or something [20:49:16] hashar: I don't know ETA on isolation, but I expect that will be many months to finalise. 
It'd be quite useful and valuable to end-users if we have git-cache working (which requires smaller instances due to non-race syncing as expained on phab) – so that we can wipe workspace and have fix dozens of instabilities [20:49:38] it will also be a good preparation for isolation after that. [20:49:50] marxarelli: I think just sticking it into the python file is good enough for now. [20:49:52] that way, it can simply be declared by the project, less coupling in jjb [20:50:21] !log removing lanthanum from Jenkins slave configuration. Server is gone ( https://phabricator.wikimedia.org/T86658 ) [20:50:24] Logged the message, Master [20:50:32] I left most of it documented in Phabricator and in Gerrit. It'd be sad to go to waste. If you need a session to brainstorm, I'm available for that. [20:51:33] 10Continuous-Integration-Infrastructure, 6operations, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1432015 (10hashar) Removed it from the Jenkins slaves configuration. [20:55:54] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/223172/ for testing [20:55:57] Logged the message, Master [20:56:11] Krinkle: basically gotta cherry pick a few more patches and rebuild the Zuul packages then get them uploaded to apt [20:56:23] Krinkle: and figure out the potential impacts :( [20:56:56] hashar: which patches? [20:57:08] !log restarted puppetmaster on deployment-salt [20:57:11] Logged the message, Master [20:57:11] what issues are there in Zuul that we need fixed? [20:59:23] Krinkle: it is bed time for me sorry. 
Poke me during our afternoon [20:59:32] k [21:00:20] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:00:54] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:01:06] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:01:26] 10Beta-Cluster, 10Wikimedia-Site-requests: Enable the possibility to block users by the AbuseFilter at the deployment wiki at the beta cluster - https://phabricator.wikimedia.org/T103060#1432078 (10Krenair) [21:01:31] 10Beta-Cluster, 10Wikimedia-Site-requests: Enable the possibility to block users by the AbuseFilter at the deployment wiki at the beta cluster - https://phabricator.wikimedia.org/T103060#1432080 (10Krenair) a:3Krenair [21:11:08] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0] [21:23:17] 10Beta-Cluster, 10Wikimedia-Logstash, 5Patch-For-Review, 15User-Bd808-Test: Build jessie based elasticsearch/logstash/kibana (ELK) host for beta testing - https://phabricator.wikimedia.org/T101541#1432177 (10bd808) [21:26:23] 10Beta-Cluster, 10Wikimedia-Site-requests, 5Patch-For-Review: Enable the possibility to block users by the AbuseFilter at the deployment wiki at the beta cluster - https://phabricator.wikimedia.org/T103060#1432193 (10MGChecker) Is it active for deploymentwiki or for all beta-wikis? In the second case, maybe... [21:30:19] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0] [21:30:53] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0] [21:32:37] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/223184/ to deployment-salt [21:32:40] Logged the message, Master [21:36:34] Yippee, build fixed! 
[21:36:34] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #185: FIXED in 13 min: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/185/ [21:38:53] 10Beta-Cluster: Install Checkuser extension on Beta-Cluster - https://phabricator.wikimedia.org/T104883#1432225 (10MGChecker) I think CheckUser would be useful on beta, because spam is really annoying and beta is in productive use. It is a great place to test new Lua scripts, new JavaScripts and Bots. If there w... [21:54:06] (03PS1) 10Dduvall: Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) [21:54:08] (03PS1) 10Dduvall: Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) [21:54:28] (03CR) 10jenkins-bot: [V: 04-1] Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [21:56:12] (03CR) 10jenkins-bot: [V: 04-1] Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:14:13] (03PS2) 10Dduvall: Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) [22:14:37] (03CR) 10jenkins-bot: [V: 04-1] Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:18:08] (03PS3) 10Dduvall: Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) 
[22:18:10] (03PS2) 10Dduvall: Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) [22:18:31] (03CR) 10jenkins-bot: [V: 04-1] Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:20:06] (03CR) 10jenkins-bot: [V: 04-1] Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:20:19] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%) [22:21:41] (03Abandoned) 10Dduvall: Fix tests directory and "missing skin" errors in MW-Selenium jobs [integration/config] - 10https://gerrit.wikimedia.org/r/222223 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:24:25] legoktm: think i did that right (https://gerrit.wikimedia.org/r/#/c/223188/) though tox is failing for seemingly DNS reasons [22:25:44] marxarelli: hmm, see -labs [22:26:21] legoktm: doh. interesting. i was having an issue at the office a minute ago [22:26:39] dhcp renew fixed it [22:28:56] (03CR) 10Legoktm: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:34:10] (03CR) 10Legoktm: [C: 04-1] "Looks good, could you add tests for this?" 
[integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [22:52:14] (03PS4) 10Dduvall: Generalize MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223189 (https://phabricator.wikimedia.org/T103039) [22:52:16] (03PS3) 10Dduvall: Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) [23:04:44] (03PS6) 10Dduvall: WIP Video recording of headless execution [selenium] - 10https://gerrit.wikimedia.org/r/222346 (https://phabricator.wikimedia.org/T104583) [23:05:53] (03CR) 10Dduvall: "Legoktm, done!" [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [23:07:25] 10Deployment-Systems, 7Graphite: Record sync wikiversions event in graphite - https://phabricator.wikimedia.org/T104635#1432456 (10mmodell) a:3mmodell @bd808: thanks, I'll see what I can do [23:13:04] (03CR) 10Legoktm: [C: 031] Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [23:13:28] marxarelli: not +2ing ^ so you can deploy whenever [23:13:44] legoktm: right on. thanks! [23:15:15] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1432481 (10GWicke) We have been using Ansible for [RESTBase deployments](https://wikitech.wikimedia.org/wiki/RESTBase) and Cassandra restarts for some time now (~2 1/2 months in staging, 1 month in... [23:15:17] legoktm: will reloading zuul reload the python scripts as well? (i.e. 
just needs the usual `fab deploy_zuul`) [23:15:22] marxarelli: yep [23:15:26] sweet [23:16:24] (03CR) 10Dduvall: [C: 032] Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [23:20:02] Project beta-update-databases-eqiad build #1281: FAILURE in 1.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/1281/ [23:23:59] !log restarted nutcracker on deployment-mediawiki01 [23:24:02] Logged the message, Master [23:28:35] (03Merged) 10jenkins-bot: Support generic MW-Selenium job for MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/223188 (https://phabricator.wikimedia.org/T103039) (owner: 10Dduvall) [23:31:45] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL 100.00% of data above the critical threshold [0.0] [23:34:02] !log Reloading Zuul to deploy I33ac72e7df498e58f0e25d8c59f167d13eae06cf [23:34:04] Logged the message, Master [23:37:48] 10Beta-Cluster, 10Wikimedia-Site-requests, 5Patch-For-Review: Enable the possibility to block users by the AbuseFilter at the deployment wiki at the beta cluster - https://phabricator.wikimedia.org/T103060#1432513 (10Krenair) 5Open>3Resolved @Luke081515: You should be able to get AF to block on all beta...