[00:18:47] greg-g: have a moment ?
[00:19:52] matanya: on IRC yeah, I'm wfh today
[00:20:04] yeah, irc works
[00:20:27] i wanted to ask to push recent wmf branches to the fdc wiki
[00:20:39] i am not sure to what group it belongs, if at all
[00:20:56] you mean it isn't being updated along with the train?
[00:21:07] but it is missing some nice features the entire cluster got in the past months
[00:21:13] yes, i fear so
[00:21:25] * greg-g is confuszored
[00:21:32] 1.27.0-wmf.6 (796411d)
[00:21:32] 10:24, 17 November 2015
[00:21:49] that's today :)
[00:21:50] in special:version
[00:22:18] * greg-g is now more confused
[00:22:22] yes, but for example, the split between echo and notifications is not there
[00:22:49] oh, same on meta, that might be a settings issue?
[00:23:11] ah, didn't think of that
[00:24:04] greg-g: thanks, i'll compare the diff's and submit patches for the most annoying misses
[00:24:10] * greg-g nods
[00:24:11] np
[00:30:39] matanya: the messages pane doesn't show up until someone leaves you a message.
[00:31:04] legoktm: thanks, i meant gui-wise
[00:31:32] yes, the second tab won't show up until someone writes something on your talk page
[00:31:42] tab/icon/badge/flyout/popup
[00:31:49] whatever you call it :P
[00:32:12] confirmed, thanks
[00:33:07] * matanya goes to write some recommendation text instead.
[00:35:00] legoktm: doohicky
[01:29:27] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms
[01:57:18] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[03:34:24] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[03:37:14] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[03:42:17] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[03:53:47] Beta-Cluster-Infrastructure, Graphoid: Graphoid does not deploy sync to deployment-sca01 - https://phabricator.wikimedia.org/T118929#1812944 (Yurik) NEW
[03:54:41] Beta-Cluster-Infrastructure, Graphoid: Graphoid does not deploy sync to deployment-sca01 - https://phabricator.wikimedia.org/T118929#1812951 (Yurik)
[04:28:22] (PS1) Tim Starling: Note results of libdc1394 bug investigation [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253827
[04:44:38] (CR) Tim Starling: [C: 2 V: 2] Note results of libdc1394 bug investigation [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253827 (owner: Tim Starling)
[05:29:40] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #606: FAILURE in 27 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/606/
[06:39:26] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[08:33:27] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #786: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/786/
[08:43:59] (CR) Ori.livneh: [C: 1] Configure thumbor/exif-optimizer [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722) (owner: Gilles)
[09:00:24] zeljkof: I am around more or less
[09:00:50] hashar: more or less? :)
[09:02:36] hashar: can not join the hangout?
[09:05:15] nature's call
[09:05:15] :D
[09:20:45] (PS2) Hashar: Added building the gem to `rake test` (CI entry point for Ruby) [selenium] - https://gerrit.wikimedia.org/r/252693 (https://phabricator.wikimedia.org/T117993) (owner: Zfilipin)
[09:30:48] (PS1) Hashar: Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/253858 (https://phabricator.wikimedia.org/T114860)
[09:30:56] (CR) Hashar: "restored with https://gerrit.wikimedia.org/r/253858" [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[09:31:08] (CR) Hashar: [C: 2] Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/253858 (https://phabricator.wikimedia.org/T114860) (owner: Hashar)
[09:32:02] (Merged) jenkins-bot: Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/253858 (https://phabricator.wikimedia.org/T114860) (owner: Hashar)
[09:32:15] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[09:32:56] (CR) Hashar: [C: 2] Added Rakefile [ruby/api] - https://gerrit.wikimedia.org/r/252698 (https://phabricator.wikimedia.org/T117993) (owner: Zfilipin)
[09:33:29] (CR) Hashar: "recheck" [selenium] - https://gerrit.wikimedia.org/r/252693 (https://phabricator.wikimedia.org/T117993) (owner: Zfilipin)
[09:33:38] (Merged) jenkins-bot: Added Rakefile [ruby/api] - https://gerrit.wikimedia.org/r/252698 (https://phabricator.wikimedia.org/T117993) (owner: Zfilipin)
[09:38:52] (CR) Hashar: [C: 2] "rake-jessie pass \O/" [selenium] - https://gerrit.wikimedia.org/r/252693 (https://phabricator.wikimedia.org/T117993) (owner: Zfilipin)
[09:47:29] Project beta-scap-eqiad build #79039: FAILURE in 2 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/79039/
[09:48:00] (Merged) jenkins-bot: Added building the gem to `rake test` (CI entry point for Ruby) [selenium] - https://gerrit.wikimedia.org/r/252693 (https://phabricator.wikimedia.org/T117993) (owner: Zfilipin)
[09:51:29] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[09:54:47] PROBLEM - Puppet staleness on integration-dev is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[10:07:34] Release-Engineering-Team, Browser-Tests-Infrastructure, Ruby, Tracking: Update repositories that use mediawiki_selenium Ruby gem to version 1.x - https://phabricator.wikimedia.org/T94083#1813263 (zeljkofilipin)
[10:19:28] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
[10:32:15] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.99 ms
[10:42:18] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[10:47:40] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[11:11:30] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms
[11:12:16] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms
[11:32:38] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[11:44:25] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[12:47:07] Project browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce build #702: FAILURE in 1 min 7 sec: https://integration.wikimedia.org/ci/job/browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce/702/
[12:55:55] PROBLEM - Host deployment-mediawiki02 is DOWN: CRITICAL - Host Unreachable (10.68.16.127)
[12:58:00] RECOVERY - Host deployment-mediawiki02 is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms
[13:01:06] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #852: FAILURE in 29 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/852/
[13:06:29] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.90 ms
[13:15:12] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[13:34:44] hashar: any idea why rake-jessie ran for this commit? https://gerrit.wikimedia.org/r/#/c/236038/
[13:35:00] zeljkof: because of zuul/layout.yaml ?
[13:35:02] it was merged in september
[13:35:04] have you looked at it?
[13:35:14] oh
[13:35:22] I mean, why would we run a job for already merged commit
[13:35:22] probably manually enqueued to verify it is working properly
[13:35:30] I see
[13:35:35] I sometime trigger rechecks from the command line against the last change that got merged
[13:35:47] on gallium that is with "zuul enqueue"
[13:36:15] nevermind, I have just stumbled upon it and got confused
[13:38:19] zeljkof: http://docs.openstack.org/infra/zuul/client.html#enqueue :D
[13:38:29] so you can do:
[13:38:37] hashar: thanks, was not aware of that
[13:38:52] zuul enqueue --trigger gerrit --pipeline test --project mediawiki/core --change 12345,42
[13:39:14] which would inject in Zuul scheduler the exact same thing as if 12345,42 entered the test pipeline (i.e. a new patch)
[13:40:23] cool
[13:48:44] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.42 ms
[13:49:26] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms
[13:59:56] (PS2) Hashar: Configure thumbor/exif-optimizer [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722) (owner: Gilles)
[14:00:31] (CR) Hashar: [C: 2] Configure thumbor/exif-optimizer [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722) (owner: Gilles)
[14:01:23] (CR) jenkins-bot: [V: -1] Configure thumbor/exif-optimizer [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722) (owner: Gilles)
[14:05:30] (CR) Hashar: [C: 2] "gate failed because of a network error preventing access to Jenkins API." [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722) (owner: Gilles)
[14:06:20] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[14:06:22] (Merged) jenkins-bot: Configure thumbor/exif-optimizer [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722) (owner: Gilles)
[14:08:09] gilles: thanks for taking the time to set up CI for thumbor :-)
[14:08:26] hashar: thanks for the reviews!
[14:19:27] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms
[14:22:41] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[14:22:48] hashar: why don't I see the changes I cherry-picked on deployment-bastion?
[14:23:00] puppet agent -tv doesn't help on deployment-restbase0x
[14:23:01] :(
[14:24:14] mobrovac: cherry-picked what, and where?
[14:24:29] for puppet it has to be on integration-puppetmaster in /var/lib/git/operations/puppet
[14:24:41] ah ok
[14:24:44] sigh
[14:25:01] hashar: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated doesn't say that :P
[14:25:05] if it's for mediawiki / /srv/mediawiki-staging is maintained by a jenkins job that wipes everything every 10 minutes
[14:25:18] no no, it's RESTBase
[14:25:23] aahhh
[14:25:36] RESTBase no clue
[14:25:57] there was the experiment two weeks ago to use scap deploy for restbase
[14:26:10] I'm trying integration-puppetmaster
[14:26:12] but I admit that today I don't know how restbase is pushed/updated on beta :(
[14:26:35] yup, but this is to validate https://gerrit.wikimedia.org/r/#/c/253895/ before going to prod
[14:27:13] uh? sudo su asks me for the password
[14:27:36] hashar: ^ ?
[14:27:47] RECOVERY - Puppet staleness on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[14:27:51] ah puppet
[14:27:59] so it does have to be cherry-picked on integration-puppetmaster indeed
[14:28:11] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #694: FAILURE in 2 min 10 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/694/
[14:28:24] I'm there, but I can't become root over there
[14:28:30] why?
[14:28:35] hashar@integration-puppetmaster:~$ sudo su -
[14:28:36] root@integration-puppetmaster:~#
[14:28:46] nope
[14:28:50] still asks for the password
[14:29:08] ah!
[14:29:12] ok ok
[14:29:21] you must not have the rights ...
[14:29:22] grr
[14:29:44] yep
[14:29:47] wait
[14:29:53] what's that?
[14:29:56] !log granting root access to mobrovac on beta
[14:30:07] integration-puppetmaster.integration.eqiad.wmflabs == integration-puppetmaster.eqiad.wmflabs
[14:30:17] oh
[14:30:18] man
[14:30:21] but integration-puppetmaster.deployment-prep.eqiad.wmflabs doesn't exist
[14:30:29] lets switch to english because my french is crap for work related business
[14:30:31] so yeah
[14:30:34] haha
[14:30:42] we have a couple puppet master, one for beta and one for integration
[14:30:44] and I keep mixing up each of them
[14:30:56] so for beta it is deployment-puppetmaster.deployment-prep.eqiad.wmflabs
[14:31:12] User mobrovac may run the following commands on deployment-puppetmaster:
[14:31:14] (ALL) NOPASSWD: ALL
[14:31:14] mobrovac: sorry :-/
[14:31:18] right, so i cherry-picked on the good one in the first place
[14:31:31] but still can't see the changes on deployment-restbase0x
[14:31:35] hm hm hm
[14:31:42] * mobrovac goes to dig
[14:31:42] yeah
[14:32:03] sometime I have to kick puppetmaster process
[14:32:12] for it to catch up with cherry picks
[14:33:29] ah no, i got it now, the config in beta is managed by scap
[14:33:31] it's a symlink
[14:33:36] so it doesn't change
[14:33:37] uf
[14:33:38] kk
[14:33:42] thnx hashar!
[14:33:42] mess mess
[14:33:45] yeah
[14:36:24] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #327: FAILURE in 8 min 24 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/327/
[14:56:49] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1813655 (hashar) The two last puppet patches has let the zuul-merger on scandium to reach out gallium AND let the slaves git clone from scandium git...
[14:57:19] (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/173830 (owner: Hashar)
[14:57:37] (CR) jenkins-bot: [V: -1] tests: factor out code to find defined project [integration/config] - https://gerrit.wikimedia.org/r/173830 (owner: Hashar)
[15:04:43] andrewbogott: zuul-merger pooled on scandium thanks for the follow up patche
[15:04:48] I am monitoring it
[15:04:49] :-D
[15:05:00] working so far?
[15:11:29] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[15:12:21] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[15:15:36] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1813684 (hashar) More unpuppetized / badly pauperized stuff: ``` stderr: 'Cloning into '/srv/ssd/zuul/git/mediawiki/core'... Warning: Identity file...
[15:16:54] !log stopping zuul-merger on scandium. Lacks ssh private key to reach gerrit
[15:18:04] PROBLEM - zuul_merger_service_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[15:19:30] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[15:27:06] I'd like to set a config for wikipedia sites only, I see that I can add a 'wikipedia' key in mediawiki-config InitialiseSettings.php (like wgFavicon).
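Note on the 'wikipedia' key question above: a minimal Python sketch of how a table of per-key overrides could be resolved for one wiki. It assumes lookup goes from the exact wiki name, to its tags, to its suffix, to 'default'. This is a simplified illustration, not MediaWiki's SiteConfiguration code; `resolve_setting`, `FAVICON`, and the favicon paths are hypothetical names and values.

```python
from typing import Any, Iterable, Mapping, Optional


def resolve_setting(
    overrides: Mapping[str, Any],
    dbname: str,
    tags: Iterable[str] = (),
    suffix: Optional[str] = None,
) -> Any:
    """Return the most specific override for one wiki (assumed precedence)."""
    # 1. An entry keyed by the exact wiki name wins.
    if dbname in overrides:
        return overrides[dbname]
    # 2. Otherwise the first tag that has an entry.
    for tag in tags:
        if tag in overrides:
            return overrides[tag]
    # 3. Otherwise the suffix (project family), if it has an entry.
    if suffix is not None and suffix in overrides:
        return overrides[suffix]
    # 4. Fall back to 'default'; None when nothing matches.
    return overrides.get("default")


# Illustrative values only, not taken from mediawiki-config.
FAVICON = {
    "default": "/static/favicon/default.ico",
    "wikipedia": "/static/favicon/wikipedia.ico",
    "testwiki": "/static/favicon/testwiki.ico",
}

print(resolve_setting(FAVICON, "enwiki", tags=["wikipedia"]))
```

With a table like this, a value under the 'wikipedia' key applies to every wiki that carries that tag or suffix unless a wiki-specific entry overrides it.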
[15:28:02] And I have a test that uses SiteConfiguration but these vars are not resolved properly, I was wondering where this resolution is made
[15:55:31] RECOVERY - Puppet failure on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:00:11] Beta-Cluster-Infrastructure, Deployment-Systems, Patch-For-Review, WorkType-Maintenance: beta-scap-eqiad mira / deployment-bastion permissions problem - https://phabricator.wikimedia.org/T117016#1813816 (hashar) https://gerrit.wikimedia.org/r/253040 got merged at ` Wed Nov 18 09:22:43 2015 UTC`. T...
[16:03:14] dcausse: it is based on the dblist files
[16:03:23] dcausse: can't remember offhand how the resolution is done though :(
[16:03:56] probably somewhere in multiversion/MWWikiversions.ph
[16:04:18] hashar: thanks, I'll continue to search, resolution by suffix seems to work in my case but not with sitename :/
[16:04:23] wmf-config/CommonSettings.php: $dblist = MWWikiversions::readDbListFile( $tag );
[16:04:23] wmf-config/CommonSettings.php: if ( in_array( $wgDBname, $dblist ) ) {
[16:04:50] bah barely make sense
[16:04:58] but CommonSettings.php has some code
[16:05:02] reading the dblist files
[16:05:18] what's the meaning of tag?
[16:05:31] and adding matching .dblist files to a $wikiTags variable which is passed as a parameter to the wgConf object
[16:06:48] that is mediawiki/core includes/SiteConfiguration.php
[16:10:30] hashar: thanks!
[16:26:31] thcipriani: I can't land D48. I'm getting "Exception: You do not have permission to push to this repository." using ssh://vcs@git-ssh.wikimedia.org/diffusion/MSCA/scap.git as my origin
[16:27:21] bd808: blerg. I'm guessing for whatever reason it's limited to RelEng team (but I'm not sure about that) twentyafterfour ostriches ^
[16:30:29] thcipriani: *nod* that makes some sense. Feel free to land it yourself if you guys are ready. It looks like the associated puppet change merged
[16:31:29] bd808: kk, I can land it, would be nice to know how to change these permissions though. Blindly clicking around interfaces isn't yielding anything that looks useful so far.
[16:32:55] PROBLEM - Host deployment-parsoidcache02 is DOWN: PING CRITICAL - Packet loss = 100%
[16:37:31] bd808: thanks I'll give you push permissions ;)
[16:38:01] PROBLEM - Puppet failure on deployment-tmh01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:38:07] * bd808 runs wild with scap changes
[16:38:49] Do the permissions not match gerrit?
[16:41:00] Krenair: we haven't migrated permissions in any way, the scap repo is a one-off which we are using to test out differential more thoroughly
[16:42:56] bd808: haven't started any sort of landing for D48, FYI, so now that you have permissions, feel free. I need to upgrade my arcanist and wanted to wait until I was done deploying.
[16:43:19] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[16:45:13] twentyafterfour: I'm still getting the same error. What should my git remote look like? I have "ssh://vcs@git-ssh.wikimedia.org/diffusion/MSCA/scap.git"
[16:45:39] bd808: I just added you maybe you tried before I hit save?
[16:45:45] your remote looks correct
[16:46:07] Krenair: No, permissions do not match gerrit. Gerrit permissions are not exactly a model to follow :)
[16:46:08] victory!
[16:46:14] If anything we should be *more* permissive.
[16:46:53] ostriches: with scap we probably shouldn't be very permissive, given how critical it is
[16:46:57] ostriches, well, yes
[16:47:05] thcipriani, twentyafterfour: it's merged now.
[16:47:17] twentyafterfour: Well, since we're planning to package it I'm not too worried.
[16:47:18] bd808: awesome thanks!
[16:47:25] ostriches, point is, the set of people able to do things in the gerrit repo should probably have been granted the equivalent
[16:47:55] No, it started as "just repo admins" and then "just releng" and then I'll probably widen it to "anyone"
[16:47:56] * twentyafterfour doesn't even know who had commit access in gerrit
[16:48:22] allow anyone to push to the repository?
[16:48:23] ostriches: there is a per-repository setting for "can push" and there are also project based policies
[16:48:31] I know.
[16:48:42] I think anyone with an account should be able to push to it, sure.
[16:49:03] (CR) Paladox: "Can now be merged." [integration/config] - https://gerrit.wikimedia.org/r/253376 (owner: Paladox)
[16:49:05] to the master branch?
[16:49:08] I'm not entirely opposed to that but anyone can get an account...
[16:49:19] so that's pretty much wide-open to anything goes
[16:49:21] hashar: Please merge https://gerrit.wikimedia.org/r/#/c/253376/ and https://gerrit.wikimedia.org/r/#/c/253165/ they can be merged now.
[16:49:38] * twentyafterfour thinks we might want a bit more restriction than that.
[16:49:47] unreviewed, in my review queue. Maybe others here can process them though :)
[16:49:48] PROBLEM - Puppet failure on mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:49:59] Maybe we could have a group like we did with Code Review.
[16:50:08] Let anyone who's a member add any other member.
[16:50:19] And those people can generally push to *any* repo unless it's otherwise ACL'd
[16:50:22] (ie: puppet)
[16:50:46] so circle of trust, essentially?
[16:50:49] that makes sense
[16:50:50] Yep
[16:51:40] if we use packages and audits then each team can set up audits on the code that they care about (so that we catch any dangerous rogue commits before they get deployed anywhere)
[16:51:57] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:52:09] audit really is a nice tool and it eliminates the need for strict push controls, generally
[16:55:46] And really, the number of repos that actually need strict push controls is limited (puppet, dns)
[16:56:21] Most other things go through (or should go through) a packaging process which would be the extra set of control you need.
[17:03:09] ostriches, deployment branches?
[17:03:22] Deployment branches should die in a fire.
[17:03:25] Terrible pattern
[17:03:31] At least the way we do them for MW
[17:03:35] + exts
[17:03:38] what do you suggest we replace them with?
[17:03:50] A shared repo for deployment that's a "fork" of MW.
[17:03:57] *That* repo would have all the permissions
[17:04:07] And extensions/core can just be left alone and permissive.
[17:08:42] there's a task for that :)
[17:08:54] we were just talking about it during antoine and I's 1:1
[17:13:29] Krenair: Anyway, if we did per-branch push permissions we'd set them up in Herald anyway. The "Can push" permission would remain permissive.
[17:18:37] !log updating scap in beta cluster to deploy https://phabricator.wikimedia.org/D48
[17:25:13] so far, off to a bad start: 2/10 minions completed fetch :(
[17:34:57] !log scap update in beta failing, mediawikis failing git fetch, unable to find needed commit, removing directory and refetching
[17:40:02] Beta-Cluster-Infrastructure, Release-Engineering-Team: Rebuild deployment master - https://phabricator.wikimedia.org/T117504#1814180 (demon) a:demon>None
[17:41:12] thcipriani: are you scap3-ing?
[17:41:40] marxarelli: trebucheting scap3 on deployment-prep
[17:41:51] ahhh
[17:41:54] right, minions
[17:42:00] corrupt git checkout on most of the minions.
[17:43:40] it's weird, with mira in deployment prep, I just had to login and run the salt-call commands manually and everything worked fine: sudo salt-call deploy.fetch 'scap/scap' then deploy.checkout 'scap/scap'
[17:46:58] Deployment-Systems, Release-Engineering-Team: Take heat off day before the weekly branch-cut? - https://phabricator.wikimedia.org/T118212#1814203 (Nikerabbit) The proposed end goal might cause more SWAT deploys will be used instead, working against the original goal. I am also cautious whether staying lo...
[17:49:18] RECOVERY - Host integration-labsvagrant is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms
[17:50:07] Deployment-Systems, Release-Engineering-Team: Take heat off day before the weekly branch-cut? - https://phabricator.wikimedia.org/T118212#1814216 (greg) We can easily cut the next branch before we deploy it (say, on Friday for the upcoming Tuesday train start) but we'll need yet-another-Beta-Cluster to te...
[18:16:20] bd808: ostriches so now I'm still seeing permissions issues with sync-masters, but now it has to do tasks.merge_cdb_updates https://integration.wikimedia.org/ci/job/beta-scap-eqiad/79087/console
[18:16:22] Beta-Cluster-Infrastructure, operations, HHVM, Patch-For-Review: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1814315 (dduvall)
[18:19:40] thcipriani: hmmm... need help digging into that?
[18:20:20] Bleh.
[18:21:08] bd808: I can start looking into it, I just tracked down what was happening, poked you guys in case it was something obvious.
[18:21:35] that directory is 0755 l10nupdate:wikidev
[18:25:20] do we usually run the cdb update task as the l10nupdate user? /me looks
[18:25:45] that's what I was just digging for. I know there's a sudoer rule to allow mwdeploy to run some things as l10nupdate
[18:26:50] thcipriani: I think that's the problem. look for other calls to scap-rebuild-cdbs
[18:27:10] okie doke
[18:28:05] * ostriches mutters something about doing everything as root
[18:28:15] well.. no. It looks we call that script as mwdeploy
[18:28:41] main.RebuildCdbs actually asserts that it is run as mwdeploy
[18:29:15] greg-g: 'staging' sounds interesting. Is there a task or page about that?
[18:31:07] Krinkle: not really, but summary: yet-another-test-cluster that has two goals: 1) be yet another cluster and 2) clean up the hacks that make up beta cluster
[18:31:26] thcipriani: this is probably related to the differences between staging and the deploy dir
[18:31:28] Krinkle: staging isn't really on our near-team roadmap right now (it's not a simple thing) :)
[18:31:51] in staging we want l10nupdate to be able to mess with the cdb files to do the nightly update with mater message changes
[18:32:07] Krinkle: there's https://phabricator.wikimedia.org/tag/staging/ but that's too "down in the weeds"
[18:32:22] but in the deploy dir (which is what scap normally operates on) everything is mwdeploy:mwdeploy
[18:32:45] * greg-g is in the SoS and then the monthly eng managers meeting, slow to respond now
[18:33:09] *master message changes
[18:33:09] bd808: that makes sense, mostly :)
[18:34:16] * bd808 tries to find the difference between syncing from master->master and just updating the "normal" master
[18:36:19] thcipriani: on the deploy master we run tasks.update_localization_cache
[18:36:37] that has the sudo as l10nupdate bits in it
[18:36:55] That process creates the l10n json files
[18:37:29] seems like we only do the merge_cdb_updates task on targets in scap (which is what master → master is doing)
[18:37:29] what we are trying to do now in the mira process is the reverse (recreate the cdbs from the json)
[18:38:29] I don't think I thought this through fully when I set it up to happen
[18:39:19] we need to recreate the cdbs on mira as though l10nupdate had created them in the first place.
[18:39:40] that's going to take code changes in scap (new wrapper script)
[18:40:03] because we need to sudo as l10nupdate when recreating the cdbs
[18:40:26] hi again! bd808, asking you cause I SEE YOU.
[18:40:37] the new eventlogging repo I created is not being replicated to github
[18:40:40] i see
[18:40:44] sudo -u l10nupdate is something mwdeploy can do in sudoers files.
[18:40:47] Missing repository created; retry replication to git@github.com:wikimedia/eventlogging
[18:40:54] (along with the same message for several other repos)
[18:40:55] in
[18:41:03] /var/lib/gerrit2/review_site/logs/error_log
[18:41:20] twentyafterfour: I see what thcipriani sees re "no newline" by the way: https://phabricator.wikimedia.org/F2973024
[18:41:26] thcipriani: yeah, we just need a scap entry point that can run the right command as the l10nupdate user I think
[18:41:49] ottomata: above my pay grade, but ostriches might be able to help
[18:42:05] :
[18:42:06] :)
[18:42:55] thcipriani: am I making sense when I babble about this?
[18:44:00] bd808: heh, yeah, this is the part of scap that I generally try to not touch so I'm a little behind.
[18:44:10] I think what we are missing is an entry point script that we can use to run tasks.update_l10n_cdb as the l10nupdate user
[18:44:33] and then use that instead of directly calling tasks.update_l10n_cdb when we sync to the master
[18:44:46] right. I can put this on my todo list. Would probably help me get more familiar with this section of code.
[18:45:35] thcipriani: cool. yell at me if you give up, but it shouldn't be too tricky
[18:45:49] bd808: thanks, will do :)
[18:54:11] ottomata: New repo on gerrit?
[18:54:48] ostriches: ja, couple days ago I made https://gerrit.wikimedia.org/r/#/admin/projects/eventlogging
[18:54:52] everything is cool w gerrit
[18:54:54] Hmm, "Missing repository created" should mean it exists now
[18:54:58] but it isn't replicating to github
[18:54:59] oh
[18:55:01] It'll replicate next time something changes.
[18:55:04] https://github.com/wikimedia/eventlogging
[18:55:10] hm, lots has been changing
[18:55:18] there was some trickiness with it though too
[18:55:28] I hate github.
[18:55:32] i created the repo, and accidentally committed the wrong first commit
[18:55:36] so i deleted it from gerrit and recreated it
[18:56:09] i first pushed to it, not through gerrit
[18:56:11] so maybe that's the problem?
[18:56:14] hmm, i bet that is it
[18:56:17] for most of them
[18:56:22] there are other repos there that have that problem
[18:56:32] since i was filter-branching to create this new repo from a dir of another
[18:56:43] i had to push (didn't want to review hundreds of commits in gerrit)
[18:57:24] Maybe? I dunno, gerrit replication breaks over the tiniest thing.
[18:57:59] Beta-Cluster-Infrastructure, Graphoid: Graphoid does not deploy sync to deployment-sca01 - https://phabricator.wikimedia.org/T118929#1814427 (akosiaris) Open>Resolved a:akosiaris For some reason there were empty objects under .git directory. Deleting them fixed that and trebuchet works again.
[19:05:00] Beta-Cluster-Infrastructure, Deployment-Systems, Patch-For-Review, WorkType-Maintenance: beta-scap-eqiad mira / deployment-bastion permissions problem - https://phabricator.wikimedia.org/T117016#1814439 (bd808) >>! In T117016#1813816, @hashar wrote: > https://gerrit.wikimedia.org/r/253040 got merge...
[19:29:28] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[20:06:41] uhm ... wmf.7 contains reference to undefined constant CURLOPT_SAFE_UPLOAD
[20:07:21] https://phabricator.wikimedia.org/T118988
[20:13:45] jzerebecki: Could you review https://gerrit.wikimedia.org/r/#/c/253376/ and https://gerrit.wikimedia.org/r/#/c/253165/ the first link is for moving jshint to check: since i moved jshint to npm. Source patch was merged. Second link is for migrating WikiEditor to extension-gate. And since codeeditor depends on wikieditor but is not added as dependency in jenkins yet so this makes it easier.
[20:41:42] andrewbogott: looks like I created some tech debt a year or so ago :-/
[20:41:57] the zuul-merger ssh key is only on gallium and is not in puppet.
[20:42:01] (was https://gerrit.wikimedia.org/r/#/c/253925/ )
[20:49:42] Beta-Cluster-Infrastructure, Graphoid: Graphoid does not deploy sync to deployment-sca01 - https://phabricator.wikimedia.org/T118929#1814906 (hashar) Great finding @akosiaris! It might be due to one of the labvirt node that had a full disk a few days ago. In such a case, files can still be created (ther...
[21:21:05] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #74: FAILURE in 5 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/74/
[21:24:31] (PS1) Ottomata: Removing python code from EventLogging MW extension, don't need to tox anymore [integration/config] - https://gerrit.wikimedia.org/r/254032 (https://phabricator.wikimedia.org/T118863)
[21:24:42] (PS2) Ottomata: Removing python code from EventLogging MW extension, don't need to tox anymore [integration/config] - https://gerrit.wikimedia.org/r/254032 (https://phabricator.wikimedia.org/T118863)
[21:24:56] hashar: , whenever you get a chance, danke
[21:25:39] (CR) Hashar: [C: 2] Removing python code from EventLogging MW extension, don't need to tox anymore [integration/config] - https://gerrit.wikimedia.org/r/254032 (https://phabricator.wikimedia.org/T118863) (owner: Ottomata)
[21:25:48] ottomata: will deploy once merged :-)
[21:28:31] (Merged) jenkins-bot: Removing python code from EventLogging MW extension, don't need to tox anymore [integration/config] - https://gerrit.wikimedia.org/r/254032 (https://phabricator.wikimedia.org/T118863) (owner: Ottomata)
[21:29:18] ottomata: done!
[21:29:25] ottomata: one day we will get it migrated to scap :-}
[21:29:37] I am off, have fun everyone
[21:30:05] danke laters!
[21:43:06] Deployment-Systems, Release-Engineering-Team, Patch-For-Review: Move the train deployment from Thursday to Wednesday for some Wikipedia sites - https://phabricator.wikimedia.org/T115002#1815170 (greg) Alright, given the next two weeks we won't be having the train anyways (see: https://wikitech.wikimedi...
[23:13:44] this is super cool: https://corner.squareup.com/2015/11/fastclone.html
[23:22:11] that is pretty neat.
[23:23:51] oh, ruby. https://github.com/square/git-fastclone/blob/master/lib/git-fastclone.rb#L18
[23:24:11] haha
[23:24:49] iirc, the gem has a very unprofessional post-install message
[23:25:55] oh maybe it was just the description
[23:25:58] "A small library for doing (command) lines"
[23:29:58] thcipriani: are you thinking we could port it to python?
[23:30:06] fastclone that is
[23:30:26] Deployment-Systems, Scap3: Need a way to see config diffs in Scap - https://phabricator.wikimedia.org/T118206#1815478 (mmodell) p:Triage>High
[23:30:41] marxarelli: heh, I wasn't thinking that, but that is a pattern we follow :)
[23:31:01] no, twentyafterfour posted it, I just started digging through.
[23:31:58] I was thinking it might be nice for CI stuff
[23:32:09] combine that with drydock and we'd have a jenkins killer
[23:33:45] i was thinking recently that we should combine drydock and kubernetes :)
[23:34:19] the latter is already coming to Tool Labs
[23:35:04] kubernetes for allocating VM instances?
[23:35:11] containers
[23:35:15] ah
[23:35:17] right on
[23:37:06] so what is the scope of drydock exactly?
[23:37:11] i'm a little confused about that
[23:38:24] drydock is all about allocating and leasing resources
[23:38:50] so instances, working trees, or any arbitrary resource (it's very abstract)
[23:45:46] currently the resources are pre-configured statically and it handles leasing the jobs out to harbormaster builds
[23:46:49] * marxarelli reads all the docs
[23:48:02] they just wrote a bunch of new documentation, which I read last night. it's still not 100% clear but it's pretty good.
[23:49:51] it'd be great to sit down and suss out a long-term vision for CI and Phab in January
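Note on the "allocating and leasing resources" description above: a minimal Python sketch of the general idea, a statically configured pool whose resources are leased to builds and handed back afterwards. It is not Drydock's API or data model; `StaticResourcePool`, its methods, and the resource and build names are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional


class StaticResourcePool:
    """A fixed set of resources, each leased to at most one holder at a time."""

    def __init__(self, resources: Iterable[str]) -> None:
        self._free: Deque[str] = deque(resources)  # not currently leased
        self._leases: Dict[str, str] = {}          # resource -> holder

    def acquire(self, holder: str) -> Optional[str]:
        """Lease the next free resource to holder, or return None if none is free."""
        if not self._free:
            return None
        resource = self._free.popleft()
        self._leases[resource] = holder
        return resource

    def release(self, resource: str) -> None:
        """End the lease on resource and return it to the free list."""
        if resource not in self._leases:
            raise KeyError(f"{resource!r} is not leased")
        del self._leases[resource]
        self._free.append(resource)

    def holder(self, resource: str) -> Optional[str]:
        """Return who currently holds resource, or None if it is free."""
        return self._leases.get(resource)


pool = StaticResourcePool(["working-tree-1", "working-tree-2"])
lease = pool.acquire("build-101")
print(lease, "leased to", pool.holder(lease))
```

A build that finds the pool empty gets None back and would have to wait or retry; a real system would also need lease expiry and cleanup, which this sketch leaves out.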