[00:00:25] shallow-clone: true
[00:00:33] Yeah, it's doing a shallow clone alright
[00:00:38] I saw the clone with a limited hitory
[00:00:40] history*
[00:00:45] For 5 minutes
[00:00:49] http://ci.openstack.org/jenkins-job-builder/scm.html#scm.git
[00:00:58] but after the initial clone it's doing the fetching and checkout
[00:01:02] and that somehow makes it fetch it all
[00:01:36] https://wiki.jenkins-ci.org/display/JENKINS/Git+Plugin
[00:01:45] It's just yaml>xml that jjb does
[00:03:26] http://stackoverflow.com/questions/19352894/how-to-git-fetch-efficiently-from-a-shallow-clone
[00:03:29] Might be git behaviour
[00:04:58] legoktm: Fixed in git 1.8.5
[00:05:13] but precise >.>
[00:05:14] This is a Precise instance with 1.7.9
[00:05:22] Yeah
[00:05:27] and php53
[00:05:38] backport git?
[00:05:57] http://stackoverflow.com/questions/19352894/how-to-git-fetch-efficiently-from-a-shallow-clone > https://github.com/Webconverger/webc/issues/174 > https://github.com/git/git/commit/238504b014230d0bc244fb0de84990863fcddd59
[00:06:05] legoktm: Might require server-side as well
[00:06:11] What does gerrit emulate?
[00:06:18] * legoktm has no idea
[00:07:05] Let me test by running it on a Trusty instance
[00:08:48] Continuous-Integration: Generic phplint job is extremely slow for mediawiki/core - https://phabricator.wikimedia.org/T92042#1102391 (Legoktm) No longer blocking as of da36d0db3a67ec55ee1f4183cbc15c9b58807c76
[00:10:51] WRT the wikimedia/fundraising/crm repo, we're looking at a situation similar to the mediawiki/vendor one. We don't want to deal with Git+Composer unless we're actually changing production deployment packages... So I'm curious, how do we deal with rollbacks if vendor/ is not a proper submodule? This is handled manually?
[00:11:13] s/rollbacks/heterogeneous versioning/ :D
[00:14:08] awight: in prod the mw/vendor repo is branched with the normal wmf branches like extensions and skins are, and it's just a submodule in the main deployment repo that can be rolled back like extensions
[00:25:08] legoktm: awesome, thanks for explaining. That's pretty much the scheme we were slowly gravitating towards.
[00:33:15] Staging: Create staging-db* (databases) - https://phabricator.wikimedia.org/T91545#1102469 (thcipriani) using puppet modules: standard mariadb::packages mariadb::config Seem to be a small handful of things that need to be tweaked: Something should address: https://gerrit.wikimedia.org/r/#/c/195328...
[00:38:49] * legoktm is off to catch a train
[00:51:08] PROBLEM - Puppet staleness on deployment-zotero01 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [43200.0]
[01:00:07] Continuous-Integration: Generic phplint job is extremely slow for mediawiki/core - https://phabricator.wikimedia.org/T92042#1102518 (Krinkle) The initial clone is correctly shallow. There's no bug in JJB translating this option into XML, and no bug in the Jenkins Git Plugin using it in its `git` commands. I...
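A minimal sketch of the version check implied by the shallow-clone discussion above, assuming (as stated in channel) that efficient fetching from a shallow clone was fixed in git 1.8.5. The function names are hypothetical and only illustrate the comparison; the input is the text printed by `git --version`.

```python
import re

# Assumption taken from the channel discussion: the shallow-fetch fix is in git 1.8.5.
SHALLOW_FETCH_FIXED_IN = (1, 8, 5)


def parse_git_version(text):
    """Extract (major, minor, patch) from text such as 'git version 1.7.9.5'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % (text,))
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def has_shallow_fetch_fix(version_text):
    """Return True if the reported git version is at least 1.8.5."""
    return parse_git_version(version_text) >= SHALLOW_FETCH_FIXED_IN
```

On a host like the Precise instance mentioned above, `has_shallow_fetch_fix("git version 1.7.9.5")` would report that the fix is missing, which matches the slow fetch seen in the job.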
[01:54:23] Continuous-Integration, VisualEditor, VisualEditor 2014/15 Q3 blockers: Concurrent builds using local Chromium/Firefox browsers on Linux host fail - https://phabricator.wikimedia.org/T90673#1102552 (Krinkle)
[03:08:16] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #460: FAILURE in 2 min 15 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/460/
[03:12:33] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #361: FAILURE in 5 min 32 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/361/
[03:13:23] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #515: FAILURE in 3 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/515/
[03:21:12] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #169: FAILURE in 3 min 11 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/169/
[03:23:24] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #498: FAILURE in 3 min 23 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/498/
[03:25:17] Continuous-Integration: Generic phplint job is extremely slow for mediawiki/core - https://phabricator.wikimedia.org/T92042#1102574 (Legoktm) 4. Use a specific job with a workspace for mediawiki-core to avoid doing a fresh clone each time, the status quo before we introduced a single phplint job I don't thi...
[03:26:38] Project browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce build #14: FAILURE in 3 min 13 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce/14/
[03:27:25] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #525: FAILURE in 8 min 23 sec: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/525/
[03:29:41] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #528: FAILURE in 10 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/528/
[03:29:42] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #415: FAILURE in 2 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/415/
[03:29:45] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #553: STILL FAILING in 1 min 19 sec: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/553/
[03:29:47] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #358: FAILURE in 3 min 8 sec: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/358/
[03:29:54] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #448: FAILURE in 12 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/448/
[03:30:11] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #232: FAILURE in 10 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/232/
[03:31:21] Project
browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #530: FAILURE in 20 sec: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/530/
[03:33:13] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #489: FAILURE in 12 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/489/
[03:41:26] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #453: FAILURE in 24 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/453/
[03:46:10] Project browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce build #424: FAILURE in 9.2 sec: https://integration.wikimedia.org/ci/job/browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce/424/
[03:47:18] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #167: FAILURE in 17 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/167/
[03:48:16] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce build #166: FAILURE in 14 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce/166/
[03:48:22] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #511: FAILURE in 12 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/511/
[03:50:13] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #513: FAILURE in 11 sec:
https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/513/
[03:52:12] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #357: FAILURE in 11 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/357/
[03:53:11] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #231: FAILURE in 10 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/231/
[04:02:50] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #19: FAILURE in 5 min 49 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/19/
[04:54:41] Project beta-scap-eqiad build #44679: FAILURE in 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44679/
[05:14:57] Yippee, build fixed!
[05:14:57] Project beta-scap-eqiad build #44681: FIXED in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44681/
[06:37:09] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[06:49:48] Project browsertests-VisualEditor-production-linux-firefox-sauce build #50: FAILURE in 1 hr 49 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-production-linux-firefox-sauce/50/
[08:46:17] (PS2) Hashar: Job passes now, let's keep it that way [integration/config] - https://gerrit.wikimedia.org/r/195341 (owner: Awight)
[08:46:30] (PS3) Hashar: Make wikimedia-fundraising-tools-yamllint voting [integration/config] - https://gerrit.wikimedia.org/r/195341 (owner: Awight)
[08:46:39] (CR) Hashar: [C: 2] "\O/" [integration/config] - https://gerrit.wikimedia.org/r/195341 (owner: Awight)
[08:47:56] (Merged) jenkins-bot: Make wikimedia-fundraising-tools-yamllint voting [integration/config] - https://gerrit.wikimedia.org/r/195341 (owner: Awight)
[09:17:58] Continuous-Integration: Generic phplint job is extremely slow for mediawiki/core - https://phabricator.wikimedia.org/T92042#1103698 (hashar) https://integration.wikimedia.org/ci/job/phplint/402/ has been triggered by change: 195447,1 Branch: wmf/1.25wmf19 . The mediawiki/core wmf branch have submodules and...
[09:25:26] Beta-Cluster, ContentTranslation-Deployments, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release4, LE-Sprint-84: Setup new wikis in Beta Cluster for Content Translation - https://phabricator.wikimedia.org/T90683#1103716 (Arrbee)
[09:36:12] Staging, Patch-For-Review: Setup staging-tin as deployment host - https://phabricator.wikimedia.org/T88442#1103758 (yuvipanda) Deep, deep down in a rabbit hole...
[09:38:25] Staging, Patch-For-Review: Setup staging-tin as deployment host - https://phabricator.wikimedia.org/T88442#1103761 (yuvipanda) So I got it to try to install scap via trebuchet, and that kept failing. According to https://wikitech.wikimedia.org/wiki/Trebuchet it looks like first deploy has to be manual (EU...
[11:14:45] Project beta-scap-eqiad build #44716: FAILURE in 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44716/
[11:20:02] Staging, Patch-For-Review: Setup staging-tin as deployment host - https://phabricator.wikimedia.org/T88442#1104030 (yuvipanda) Manual steps I've had to do so far: # git deploy start / sync on scap # Clone /srv/mediawiki-staging to be mediawiki-config # Clone /srv/mediawiki-staging/php-master to be mediaw...
[11:34:56] Yippee, build fixed!
[11:34:56] Project beta-scap-eqiad build #44718: FIXED in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44718/
[11:49:38] (PS1) Hashar: Replace python shebang with python2.7 [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195540
[11:53:56] ...
[11:59:14] (PS5) Hashar: Package python deps with dh-virtualenv [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552)
[11:59:37] (PS2) Hashar: Replace python shebang with python2.7 [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195540
[11:59:39] (PS2) Hashar: Merger: ensure_cloned() now looks for '.git' [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195281
[11:59:41] (PS2) Hashar: wmf: soften requirements [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195280
[11:59:43] (PS2) Hashar: Ensure the repository configuration lock is released [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195283
[11:59:45] (PS2) Hashar: Update merge status after merge:merge is submitted [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195282
[11:59:47] (PS1) Hashar: Package python deps with dh-virtualenv [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195541 (https://phabricator.wikimedia.org/T48552)
[12:09:35] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[12:09:45] yeah
[12:14:00] Project beta-scap-eqiad build #44722: FAILURE in 0.79 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44722/
[12:19:59] NOBODY PANIC
[12:20:05] I am in the process of breaking scap.
[12:20:09] should be unbroken shortly
[12:34:08] alright, I think it’s fine now :)
[12:34:41] :-D
[12:38:17] omg looks like I can kill beta/scap soon
[12:44:34] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[12:46:39] ^demon|away: poke when around.
I think I might’ve managed to actually kill the beta specific scap code while trying to get staging-tin done :)
[13:08:01] while you're here YuviPanda (:-P )... do you or have you ever
[13:08:28] ... used 'git deploy restart' in deployment-prep, or anywhere else for that matter?
[13:08:31] that sounds like the start of one of the questions I needed to fill in on my Visa form.
[13:08:36] heh
[13:08:40] ‘do you or have you ever ordered genocide?'
[13:08:42] (actual question)
[13:08:45] wow
[13:08:59] for greeks coming to the US it was if you've ever been a member of the communist party
[13:09:02] (no joke)
[13:09:06] > Did you seek to enter the United States to engage in export control violations, subversive or terrorist activities, or any other unlawful purpose? Are you a member or representative of a terrorist organization as currently designated by the U.S. Secretary of State? Have you ever participated in persecutions directed by the Nazi government of Germany; or
[13:09:06] have you ever participated in genocide?
[13:09:12] but they don't need a visa now
[13:09:27] well now that you mention it
[13:09:38] there was one teeny tiny instance of genocide I've almost forgotten about
[13:09:45] I mean seriously... who comes up with these
[13:10:31] also, "subversive activities"
[13:10:35] I mean that's half of the wmf
[13:10:48] subverting intellectual property for the common good
[13:11:13] apergos: :D
[13:11:13] anyways... git deploy restart? any takers?
[13:11:21] I haven’t, but I’m the wrong person to ask...
[13:11:24] ah
[13:11:28] bd808 then?
[13:11:30] yeah
[13:11:37] and duly pinged
[13:13:49] brb must get lunch
[13:21:53] I’m going to get food.
[13:22:03] scap isn’t actually failing - it’s only failing on deployment-bastion itself...
[14:15:02] either SauceLabs or Jenkins (or both) had major issues overnight
[14:15:30] I’ve been futzing with scap
[14:15:32] but not overnight
[14:15:37] only for the last 2h
[14:17:49] oh well, that's why we run these things often
[14:18:27] Selenium has not been working well with Chrome when clicking things that invoke WMF-stylel
[14:18:51] style "overlays". I made some changes yesterday to see if I could work around that, but then all the builds failed for other reasons.
[14:20:44] heh
[14:20:46] not me then
[14:37:14] hi thcipriani
[14:37:35] howdy YuviPanda
[14:37:38] https://gerrit.wikimedia.org/r/#/c/195340/ has birthed 4 other patches (those have been merged!)
[14:37:45] and staging-tin almost works
[14:38:01] thcipriani: however, a nice side effect is that I applied it to deployment::bastion and *that* works now.
[14:38:12] this means we can deprecate the beta/scap puppet code, which is super nice :D
[14:38:29] heh, that's awesome! That's a lot of patches :)
[14:39:06] thcipriani yup. and this one’s still a big one :)
[14:39:48] thcipriani: I also realized that prod is using keyholder module to do ssh keyholdering stuff, while beta is using a physical key
[14:40:28] so we probably want to move to using the keyholdering stuff for staging.
[14:42:27] YuviPanda: I've been working my way through mariadb. I'm pretty sure I need this patch to move forward: https://gerrit.wikimedia.org/r/#/c/195328/ but it definitely feels a bit hacky. I'm wondering if what's happening is libmysqlclient18 is being installed before an apt-get update is getting run after adding the mariadb apt repo.
[14:42:44] thcipriani: yup. am looking at it.
[14:46:26] YuviPanda: 503 from the API on beta labs just now, was that you?
[14:46:45] I’m not doing anything atm...
[14:46:48] * chrismcmahon could go check the logs I guess
[14:47:24] weird.
I'll try one more time
[14:49:00] YuviPanda: so after thinking about this and kicking myself a bit, I did the following
[14:49:21] I am removing a capability in trigger in order for git deploy sync (the checkout piece) to work
[14:49:41] this is in the trigger package, only deployed on deployment-bastion in the deployment prep project
[14:49:59] if I upload new packages, it's in the prod repo and I remove this one specific capability everywhere, which is bad
[14:50:12] I could build a deb and dpkg install it there
[14:50:20] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #364: ABORTED in 10 hr: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/364/
[14:50:30] but that's just as bad as changing two lines of code in the py file :-P
[14:50:46] !log Browsertest job was stuck for > 10hrs. Jobs should not be allowed to run that long.
[14:50:52] Logged the message, Master
[14:50:58] i.e. it's not in puppet and not going to be; we really have to wait for the new 2014.7 salt to come out with the backport of their bugfix
[14:51:26] so I propose to just log the manual patch, it won't be overwritten by anyone unless deployment-bastion is reinstalled
[14:51:36] well before then I hope we'll have the new salt package which I'll shove around
[14:51:49] and in the meantime upgrade on prod is held back (sadly)
[14:51:58] make sense to you?
[14:52:28] Continuous-Integration, Quality-Assurance, Release-Engineering: browsertest jobs should not be allowed to run for 10 hours - https://phabricator.wikimedia.org/T92275#1104513 (Krinkle) NEW
[14:52:28] apergos: you’re still talking to the wrong person, because I’ve absolutely no idea how git deploy / trigger works at all :D
[14:52:40] ah ha :-D
[14:52:53] Continuous-Integration, Quality-Assurance, Release-Engineering: browsertest jobs should not be allowed to run for 10 hours - https://phabricator.wikimedia.org/T92275#1104520 (Krinkle)
[14:52:54] ok I'll ping bd808 again
[14:52:57] :D
[14:53:07] in any case I did make the manual modification
[14:53:22] so git deploy sync is now working on deployment-prep
[14:53:31] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #507: ABORTED in 10 hr: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/507/
[14:53:32] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #456: ABORTED in 10 hr: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/456/
[14:54:31] wtf
[14:55:53] Krinkle: I'm guessing you just killed that Jenkins build? Thanks, something went sideways overnight and I'm not sure what it was.
[14:56:05] chrismcmahon: Yup
[14:56:17] It seems it got stuck after the exception
[14:57:12] A quick survey of failed builds shows all sorts of connection errors, 404s, 503s, it's just a mishmash of problems.
[14:58:21] hashar: can we move the meeting tomorrow to 30 minutes later?
[14:58:50] thcipriani: I wonder if the apt errors are because of differences in OS version.
[14:58:59] zeljkof: sure
[14:59:05] zeljkof: just move it :)
[14:59:07] * chrismcmahonbrb shakes fist at Chrome
[14:59:11] thcipriani: either way, I’d want sean pringle to weigh in before we do anything about that
[14:59:13] srsly brb
[14:59:13] hashar: great, will do
[14:59:23] YuviPanda: tried it on precise and trusty, both seemed upset :\
[14:59:28] thcipriani: bah.
[15:00:05] thcipriani: let’s email sean? cc me as well...
[15:00:27] * bd808 rubs sleep from his eyes and looks at backscroll
[15:03:44] apergos: I don't think I've used `git deploy restart` before, but I think parsoid uses it (or at least did at one point)
[15:05:09] scap, kibana, wikimania scholarships and IEG grants are all boring `git deploy sync` only users of trebuchet
[15:06:35] Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1104574 (ArielGlenn) NEW a:ArielGlenn
[15:06:58] Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1104583 (ArielGlenn)
[15:11:17] Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1104585 (ArielGlenn) Salt-common, salt-master and salt-minion have been updated across deployment-prep. 2014.7 salt-syndic has been installed on deployment-salt. I have removed the salt ppa repo from sources...
[15:11:44] hrm does it use it in deployment-prep??
[15:11:47] bd808:
[15:12:32] because right now I've just contained the damage, so to speak: git deploy sync works but w/o --force, and restart will not work at all
[15:12:44] I could hardcode the batchsize in there I guess if we really need it
[15:14:09] oh, also, good morning, now I see you are just waking up, it's no rush, so reply any time
[15:15:58] apergos: I ignored pings until I was done with breakfast. :) The latest znc took away the plugin that changes nick when detached.
[15:16:07] ahhh
[15:16:19] that seems a bit silly
[15:16:37] maybe I should add you on the 'upgrade salt in deployment prep' task I just created :-P
[15:16:38] I'm not sure if parsoid or any of the other deployment-prep apps are using restart
[15:16:44] ah then I won't bother
[15:16:49] even bettah
[15:17:35] I think parsoid is actually only using trebuchet in prod and some other update system via Jenkins in beta
[15:17:41] cool
[15:17:46] but ocg is using trebuchet I think
[15:17:51] hm
[15:18:00] restart? or just deploy?
[15:18:04] cause eploy works
[15:18:07] *deploy
[15:18:41] *shrug* They could certainly work around restart in beta. There is only 1 or 2 instances they would need to poke
[15:19:19] ok
[15:19:36] Continuous-Integration, Quality-Assurance, Release-Engineering: browsertest jobs should not be allowed to run for 10 hours - https://phabricator.wikimedia.org/T92275#1104612 (zeljkofilipin) 30 minutes is not enough for some browser test jobs. Either the jobs should be split into smaller ones, or large...
[15:19:54] then I'll leave things like they are for now and nag the salt guys til they give us a new point release
[15:20:04] works for me
[15:20:08] thanks!
[15:20:08] Staging, Patch-For-Review: Setup staging-tin as deployment host - https://phabricator.wikimedia.org/T88442#1104619 (greg) >>! In T88442#1103761, @yuvipanda wrote: > According to https://wikitech.wikimedia.org/wiki/Trebuchet it looks like first deploy has to be manual (EUGH?), so I did that and scap is now...
[15:20:23] I'll try to remember why it's broken when and if someone notices
[15:20:36] is there somewhere I could put a link to that task?
[15:20:52] Staging, Patch-For-Review: Setup staging-tin as deployment host - https://phabricator.wikimedia.org/T88442#1104626 (yuvipanda) @greg me neither. the plan is for me to get this patch in, then destroy staging-tin, and rebuild it. Repeat until 0 manual steps remain.
[15:21:04] Continuous-Integration, Quality-Assurance, Release-Engineering, Browser-Tests: browsertest jobs should not be allowed to run for 10 hours - https://phabricator.wikimedia.org/T92275#1104630 (greg)
[15:21:08] greg-g_: I have not gone to the dark side :)
[15:21:16] :-D
[15:21:25] apergos: Maybe just !log it here? Then we could find it in SAL
[15:21:28] ok sure
[15:21:33] email releng@ maybe
[15:22:31] YuviPanda: eh?
[15:22:40] !log after update of salt in deployment-prep git deploy restart is likely broken. details; https://phabricator.wikimedia.org/T92276
[15:22:43] Logged the message, Master
[15:22:46] greg-g_: in response to your question about manual steps :)
[15:22:49] on phab
[15:22:57] I'm not on that list
[15:22:58] YuviPanda: feel free to rewrite all the deployment systems so they are sane. The last time anyone tried the project was declared "done" before we did much more than figure out what the old code was doing and make it more readable
[15:22:58] YuviPanda: you want me to email releng@?
[15:23:25] * greg-g_ is confused by what YuviPanda was trying to communicate
[15:23:39] bd808: I’m not complaining about trebuchet / scap. I’m just mostly whining about my own ignorance about how they work, and using the phab ticket to record what I’m doing
[15:23:58] YuviPanda: NO IT'S ALL YOUR FAULT!!!!!
[15:24:00] :)
[15:24:00] don't worry, I'll complain enough for the both of us :-P
[15:24:26] bd808: :) esp. because I’m doing handhack -> update patch -> apply, try again, so...
[15:25:06] There will be another opportunity in codfw soon too
[15:25:15] yeah
[15:25:27] and I suppose my patch right now would make that slightly simpler, perhaps?
[15:25:34] setting up the deployment host, that is
[15:25:39] *nod*
[15:26:07] and right now you could be reimaging the whole host after you fix each broken part
[15:26:22] which is harder to do later and why this stuff is so scabby
[15:26:25] yeah
[15:26:58] and also other differences - like mwdeploy key being present on homedir in beta vs ‘keyholder’ services in prod.
[15:27:09] Trebuchet has a lot of new project quirks just because it didn't get run often enough
[15:27:09] mostly I didn’t realize that ‘setup staging-tin’ would go this deep down the rabbit hole :)
[15:27:14] yeah
[15:27:36] We just never converted beta after keyholder was invented
[15:27:40] I’m not sure if the sync I did was the only thing that happened, or if the initial sync made it ‘just work'
[15:27:40] yeah
[15:27:49] it would be good to do so obviously
[15:27:50] so that’s getting balled into this as well
[15:27:56] excellent
[15:27:57] I’m also splitting out patches when this one gets too big
[15:28:07] it’s already given out 3 patches that’ve been no-op merged, so that’s good
[15:28:34] Maybe we can even setup l10nupdate so we can finally debug why it has been broken for 4 months :)
[15:28:48] (that was the sarcastic we by the way)
[15:28:52] hehe
[15:29:03] bd808: when isn't it?
[15:29:47] Continuous-Integration, VisualEditor, VisualEditor 2014/15 Q3 blockers: Concurrent builds using local Chromium/Firefox browsers on Linux host fail - https://phabricator.wikimedia.org/T90673#1104669 (Krinkle) Per the suggestion from upstream ([karma-users thread](https://groups.google.com/forum/#!msg/k...
[15:29:52] greg-g_: hey! I resemble that remark.
[15:30:14] * bd808 would love to work on this stuff soooo much more than PM thinks
[15:30:33] * greg-g_ says nothing
[15:30:49] * YuviPanda would love for bd808 to work on this stuff so much more than PM thinks
[15:31:38] greg-g_: did you freeze or I?
[15:32:01] bd808: who am I to ask scap questions to?
[15:32:07] or should I just dig in and make whiny notes on phab...
[15:32:10] me I think, rejoining, or trying to at least...
[15:32:12] I would be happy to spend a chunk of time on trebuchet and its olk
[15:32:13] ilk
[15:32:23] basically anything salt-ish
[15:33:09] apergos: oh?
[15:33:09] * greg-g_ rubs his hands together with an evil grin on his face
[15:33:09] zeljkof: google hangouts now isn't loading at all for me
[15:33:10] nor is plus.google.com
[15:33:27] YuviPanda: ask me, but whine in the general direction of twentyafterfour and tcipriani too
[15:33:27] I'd have to get ma rk to say it's cool but um yeah
[15:33:35] greg-g_: firefox halo/hello? (whatever the name is)
[15:34:47] while I wait for google to return to my list of pingable internet hosts...
[15:34:53] apergos: if serious, I would appreciate your contributions and perspective and time on that work. thcipriani, ^demon|away, and twentyafterfour are the main goto people from RelEng (with special guest appearances by bd808 ).
[15:35:16] bd808: is there an ‘architecture of scap’ document somewhere?
[15:35:17] now bear in mind I said salt + trebuchet. I know next to nothing about scap :-D
[15:35:19] I can’t find much on wikitech
[15:35:49] YuviPanda: Closest thing is https://doc.wikimedia.org/mw-tools-scap/
[15:35:49] Continuous-Integration, Quality-Assurance, Release-Engineering, Browser-Tests: browsertest jobs should not be allowed to run for 10 hours - https://phabricator.wikimedia.org/T92275#1104687 (Tobi_WMDE_SW) The min Wikidata browsertest suite takes 3h at the moment. See https://integration.wikimedia.or...
[15:35:50] but inasmuch as it depends on salt I am happy to look at those bits too.
[15:36:04] YuviPanda: specifically https://doc.wikimedia.org/mw-tools-scap/scripts.html#scap
[15:36:17] * bd808 sees that the formatting got mangled there
[15:36:29] aha!
[15:37:02] YuviPanda: The code is pretty heavily commented.
scap/main.py is a good place to start reading
[15:37:52] scap is actually easy. It's not a great process but today it is reasonably easy to understand
[15:37:53] bd808: alright
[15:37:56] Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1104692 (hashar) As a workaround, we could rebuild the ppa package and add the three commits mentioned on [[ https://github.com/saltstack/salt/issues/18317 | issue 18317 ]] the mark it with -wmf1 or something....
[15:38:03] I’m trying to see why it fails only on one host
[15:38:22] bad permissions or missing ssh creds
[15:38:25] most likely
[15:38:47] Or a firewall problem
[15:39:04] right
[15:39:36] bd808: problem is, it is scap running on deployment-bastion trying to ssh to itself and failing
[15:39:36] besides doing a bit of prep work on the local file system, scap just uses ssh to connect to each mw host and asks it to rsync some files
[15:39:49] hmmm
[15:39:50] I’m not even sure if you can ssh to yourself
[15:39:55] sure you can
[15:39:56] or if it is even supposed to do that
[15:40:07] It doesn't have to
[15:40:14] right.
[15:40:26] that means that deployment-bastion is in the dsh group file
[15:40:34] yeah
[15:40:45] which probably happened by accident
[15:40:46] so I’m not sure, if this has been breaking forever or is breaking after my patch...
[15:41:02] bd808: oh, it doesn’t make sense to deploy to -bastion?
[15:41:23] it does that locally already [15:41:40] "Sync deploy directory on localhost with staging area" [15:42:08] It does that before it updates the rsync proxies [15:42:20] I see > 15:42:12 Copying to deployment-bastion.eqiad.wmflabs from deployment-bastion.eqiad.wmflabs [15:42:25] so I guess that’s that [15:42:26] so that it is running the latest code locally when it does a few php steps [15:42:34] *nod* [15:42:42] right [15:42:46] so let me get rid of that from the group [15:48:46] (PS2) Hashar: Package python deps with dh-virtualenv [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195541 (https://phabricator.wikimedia.org/T48552) [15:48:48] (PS3) Hashar: Replace python shebang with python2.7 [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195540 [15:48:50] (PS3) Hashar: Merger: ensure_cloned() now looks for '.git' [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195281 [15:48:52] (PS3) Hashar: wmf: soften requirements [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195280 [15:48:52] bd808: that works! (for deployment-prep!) Thank you! 
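The fix discussed above (taking deployment-bastion out of the dsh group so scap stops trying to ssh to itself) can be sketched in Python. This is only an illustration and not scap's actual code: the function name `remote_targets`, the one-host-per-line group format with `#` comments, and the `mw-host-*.example` hostnames are all assumptions made for the example.

```python
def remote_targets(group_text, local_host):
    """Return hosts from a dsh-style group listing, skipping the local host.

    Assumed format (hypothetical): one hostname per line, '#' starts a comment.
    """
    targets = []
    for raw in group_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        host = raw.split("#", 1)[0].strip()
        if not host:
            continue
        # The deploy host already syncs itself locally, so never ssh to it.
        if host.lower() == local_host.lower():
            continue
        # Keep first-seen order and drop duplicate entries.
        if host not in targets:
            targets.append(host)
    return targets


if __name__ == "__main__":
    group = """\
# hypothetical group file contents
deployment-bastion.eqiad.wmflabs
mw-host-01.example
mw-host-02.example
"""
    print(remote_targets(group, "deployment-bastion.eqiad.wmflabs"))
```

Filtering at read time is a belt-and-braces alternative to editing the group file by hand: even if the deploy host creeps back into the list, it would be skipped.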
[15:48:54] (PS3) Hashar: Ensure the repository configuration lock is released [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195283 [15:48:56] (PS3) Hashar: Update merge status after merge:merge is submitted [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195282 [15:49:02] sweet [15:49:11] kill all the cruft you can YuviPanda [15:49:36] I can guarantee it will creep back in [15:49:46] totally [15:50:04] it comes and goes in waves, I suppose [15:50:14] cruft maintenance ftw [15:50:40] In an ideal world we would figure out how to automate provisioning the whole test environment and then make it rebuild once a month/week [15:50:47] BaaS [15:50:55] I would like us to get there before end of this calendar year [15:51:06] which isn’t actually that bad. [15:51:19] yes yes yes [15:51:20] there’s a YAML based ENC planned, and exposing Nova APIs to external folks planned... [15:51:23] bd808: I'd really like to see that, especially to blow away data used only once, like images, users, etc. [15:51:49] Yippee, build fixed! [15:51:49] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #418: FIXED in 10 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/418/ [15:51:49] so with the YAML based ENC you’d get a site.pp equivalent, and then you just script the nova starts and boom you’re good to go. [15:51:55] FINALLY [15:52:01] staging-* is an important part of that [15:52:07] because we’re fixing the puppet code as we go. [15:52:14] to involve zero ‘manual’ steps [15:52:36] bd808: greg-g_ although, next quarter I think I might have to dive into toollabs, and I am not sure what’ll happen here... [15:52:40] server login considered harmful [15:52:47] yup [15:53:09] ssh should be like a museum. 
‘look but don’t touch' [15:53:32] speaking of which, YuviPanda did you see I got the ssh master public key thing for puppet salt::minions up on gerrit? [15:53:36] (PS6) Hashar: Package python deps with dh-virtualenv [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) [15:53:43] We need to pick our battles. There is an endless number of things that *should* be worked on but only so many of us to do work. [15:53:46] thcipriani: yup! did you manage to test it? [15:53:49] bd808: yeah... [15:54:11] toollabs getting more stable/easy would be a huge win [15:54:12] YuviPanda: yeah, seems to work, it's patched on staging-palladium now [15:54:27] bd808: yeah. both stability + an actual ‘developer story' [15:54:40] bd808: end game would be Labs : Tools :: AWS : Heroku [15:54:43] Yippee, build fixed! [15:54:43] Project beta-scap-eqiad build #44747: FIXED in 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44747/ [15:54:50] +1 [15:55:12] Building an open source heroku would be great [15:55:32] I wonder if we could trick^Wconvince heroku staff to help [15:55:33] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [15:55:50] bd808: a lot of their stuff is already open source [15:55:58] bd808: we’re however stuck with GridEngine... [15:56:04] for a while, at least [15:56:32] strange message, shinken-wm. the errors have all cleared... [16:07:29] Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1104783 (ArielGlenn) I can but I'd have to build it for 3 platforms, and I'd rather just wait the couple of weeks for the point release. [16:08:02] Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1104794 (ArielGlenn) well 4 platforms actually... 
[16:10:37] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [16:26:34] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [16:36:41] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [16:52:33] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [17:00:22] YuviPanda: by the way, my reference in that ticket might have not been universal, but the "no sir I don't like it" is from a Ren and Stimpy episode: here's the quote: https://duckduckgo.com/l/?kh=-1&uddg=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcDGlN6mluGA [17:00:40] greg-g: aaaaaah. that makes more sense [17:01:02] yeah, sorry about that :) [17:01:19] greg-g: also, I found out today that your second name has an ‘r’ in it [17:01:23] mind was blown [17:01:30] grossmeier? [17:01:48] ( we say "last name" in 'merica) [17:01:50] greg-g: https://en.wikipedia.org/wiki/Gossamer is how I’ve always ‘read’ it [17:01:57] ah, nope [17:02:00] greg-g: ah, right. [17:02:07] greg-g: yeah, interestingly long amount of blindness [17:02:17] greg-g: also, I’ll be in the office in about 3 weeks. booking tickets now. [17:02:21] it's actually a misnomer, in a way. Gross == Big and Meier == "land owner". I'm definitely not a owner of any land. [17:02:32] aaaaaaaahhh [17:02:32] right [17:09:43] thcipriani: btw, that patch has led me deeeeeep down a rabbit hole. [17:09:51] am currently making labs use keyholder... [17:10:00] the deployment hosts have differed enough from prod that this is needed now [17:12:36] bd808: I see your local hack on the labs private repo on deployment-prep, with private keys for scap. [17:12:52] I’m putting a new mwdeploy key on labs/private... [17:13:08] I guess (hope?) mwdeploy can’t ssh in anyway, plus this is keyholder, so it should be alright... [17:13:09] as a non-local? 
[17:13:57] actually, let me think. [17:13:59] if that key goes out in git then anyone with access to the labs bastion can change the php code for beta [17:14:10] which is not super awesome [17:14:13] right [17:14:36] so basically what I want is for it to be public, and still only usable by people with deployment-prep access [17:14:39] * YuviPanda considers [17:14:45] YuviPanda: ohhh good. I just hope that the number of roles that aren't rabbit holes outnumber the number of roles that are :) [17:14:52] The private key only needs to exist on the deploy server [17:14:58] like it does in prod [17:15:07] indeed. [17:15:38] I am wondering if I have any sane reasons for attempting to go deeper into this hole other than ‘must not have local cherry-picks' [17:15:51] but you’re probably right. I’ll just switch keys again, and put only the public one out [17:15:54] (keyholder requires a passphrase) [17:16:35] Honestly I think the no cherry-picks thing is a bit off the deep end [17:16:49] but I don't have root [17:17:42] I have this -- http://www.amazon.com/Palisades-Ren-Stimpy-Shaven-Action/dp/B000302CQM -- on my desk to remind me not to shave yaks that should be self-shaving [17:18:30] oh wow [17:18:36] yeah, I’ll probably not shave this particular yak [17:19:08] bd808: the no long lived cherry-picks on deployment-prep is also a commitment for ops, I think - to not just go ‘eh, beta, who cares’ but actually work on + fix things... [17:19:25] I guess it’ll be tested ‘next quarter’ when I’ll be less available.. [17:19:43] which is great! But I've seen you revert things (eg break things) in service of not having the cherry-pick [17:19:52] which is the wrong way to game the stat [17:20:08] have I? [17:20:12] I don’t recall doing that. [17:20:23] I have tried to not do that at all... [17:20:41] You were asking to do it in -ops yesterday [17:20:41] do call me out if I do it.. [17:20:46] zotero? 
[17:20:51] yeah [17:20:59] it was a nudge :) [17:21:08] mostly checking to see if it wasn’t just ‘forgotten' [17:21:33] I most definitely need to work on the words I pick... [17:22:05] bd808: intention is to be ‘hey! this patch is around, want some help moving it around? whom do we poke?’ etc rather than ‘hah! YOU ARE OUT OF TIME! I WILL KICK IT OUT NOW!' [17:22:14] :) [17:22:22] That I'm fine with [17:22:34] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [17:22:52] And I'm super excited that people besides me and hashar started to care [17:23:30] bd808: yup, yup :) I am happy we were able to test the entirety of the www-data move on beta, for example. [17:25:17] bd808: I’ll appreciate being called out in a PM if you think I’m doing something stupid :) [17:25:32] PM? I shame in public ;) [17:25:50] that too :)P [17:25:52] :P [17:26:09] when I’ve asked people to do that in public before they have been like ‘eh, no' [17:30:36] <^demon|away> YuviPanda: I'm going to rebuild -mc* as jessie. That works. [17:30:42] <^demon|away> (and is what we're doing in codfw) [17:30:53] ^demon|away: I think we should stick to precise... [17:31:04] we’ve to battle lots of puppet woes I think [17:31:19] <^demon|away> I just build -mc4 as jessie and it worked just fine [17:31:21] and we shouldn’t be changing two things at once... [17:31:25] that we know of [17:31:32] it’s running a different version of memcached (probably) than prod [17:31:47] etc [17:32:03] so I guess we’ll move prod (eqiad, or whatever is active DC) from precise to something else at some point [17:32:08] and then we can use staging to test that as well... [17:32:33] so I *really* think we should have the same OS / version combo that prod does. [17:32:41] <^demon|away> But codfw isn't being done with precise [17:32:57] codfw is also not serving any wiki pages atm [17:32:59] or anything at all... [17:33:15] the currently in-use memcached instances are precise. 
[17:34:15] ^demon|away: once we start doing both DCs actively in prod, we can activate the labs cluster in codfw, and then set that up to mirror codfw prod... [17:34:37] thcipriani: twentyafterfour ^ thoughts? [17:34:43] Project beta-scap-eqiad build #44756: FAILURE in 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44756/ [17:35:36] * twentyafterfour reads... [17:36:33] I was working with the end goal of having staging match prod. I thought that would be a win:win since I assumed all the puppet roles should work for prod, so it's weird that -mc works for jessie and not precise. [17:36:50] +1 to thcipriani [17:36:52] <^demon|away> It works for jessie and precise. [17:36:57] <^demon|away> Just not trusty (which is fine) [17:37:07] yeah, because we have no mc hosts with trusty [17:37:20] <^demon|away> Again, which is fine :) [17:37:36] I think eqiad labs should match eqiad prod, and then we can spin up codfw labs and have that match codfw prod, and then people can test migration on the staging environ before migrating prod... [17:37:49] gotcha, yeah, initially I thought they all had to be on jessie, then I thought they all had to be on trusty, but now I think that that was an early misunderstanding of production on my part. [17:40:17] yeah [17:40:22] production is a mismatch of things... [17:40:32] thcipriani: it even has some… lucid… hosts... [17:40:41] (we are killing them, and they are misc hosts we won’t have to repro in staging) [17:53:05] * ^demon|away nukes his non-precise instances, sighs audibly in YuviPanda's direction ;-) [17:53:17] \o/ yay :) [17:54:52] Yippee, build fixed! [17:54:52] Project beta-scap-eqiad build #44758: FIXED in 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44758/ [18:01:31] I am 99% certain we can avoid these test failures with Chrome as soon as SauceLabs supports se 2.45 and Chrome 41. 2.45 was a big update for selenium. [18:07:42] Yippee, build fixed! 
[18:07:42] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #461: FIXED in 1 min 40 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/461/ [18:11:50] thcipriani: I’m going to head to bed now. I’m now figuring out how to distribute public keys properly for different environs. This should be fun :) [18:11:57] Yippee, build fixed! [18:11:58] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #568: FIXED in 26 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/568/ [18:12:02] I’ll dig deeper tomorrow, see how far it goes. [18:12:14] YuviPanda: nice. Sounds good. [18:12:48] thcipriani: :) reach out to the ops@ mailing list maybe about the package conflict? [18:13:16] right, I'll do that. I think I may have a way around it, not sure yet, but I'll email. Testing it now. [18:13:33] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [18:13:56] thcipriani: \o/ cool [18:13:59] Like I said, I don't think it's a big deal as long as we can ensure that an update runs before anything tries to install with mariadb [18:14:20] ...I _think_ [18:19:17] so belatedly... I think I agree with YuviPanda, the staging environment should mirror production as closely as possible, including os versions [18:19:44] Yippee, build fixed! [18:19:44] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #170: FIXED in 1 min 42 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/170/ [18:22:45] twentyafterfour: yeah I think that's a good goal and policy, too. 
Also, if you have any time to dig into puppet role surgery, that would be a good thing: there's more of it to be done than I think I initially thought. [18:23:16] "think I initially thought"? Yup. sticking with that phrasing. [18:24:35] :D [18:24:46] thcipriani: yeah I can work on some puppet stuff, point me in the right direction [18:24:48] thcipriani: but I think when we leave them we’re leaving them cleaner than we found... [18:24:54] thcipriani: but I have to do deployment right now ... [18:25:33] YuviPanda: heh, hopefully we'll still think that in a few months :) [18:25:41] :D [18:25:42] true, true [18:26:21] twentyafterfour: sure, yeah, a likely excuse :) [18:27:34] twentyafterfour: but seriously, whenever you have time, just pick a set of servers from the staging blocking task and dig in. [18:28:35] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [18:35:50] Yippee, build fixed! [18:35:50] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #529: FIXED in 16 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/529/ [18:53:06] Yippee, build fixed! [18:53:07] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #362: FIXED in 46 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/362/ [18:56:10] Yippee, build fixed! [18:56:10] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #499: FIXED in 3 min 2 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/499/ [18:57:39] YuviPanda, legoktm, "wikibugs has quit (Excess Flood)" and hasn't rejoined [18:57:56] (almost 3 hours ago) [18:58:22] in a few channels, but not all. 
[18:58:28] right [18:59:19] it looks like it's still in -dev and -labs, but nowhere else. [19:09:23] Yippee, build fixed! [19:09:23] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #526: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/526/ [19:10:23] Yippee, build fixed! [19:10:23] Project browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce build #15: FIXED in 3 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce/15/ [19:17:30] Yippee, build fixed! [19:17:31] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #516: FIXED in 1 hr 7 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/516/ [19:42:14] Yippee, build fixed! [19:42:14] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #449: FIXED in 1 min 8 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/449/ [19:43:59] Yippee, build fixed! [19:44:00] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #233: FIXED in 1 min 45 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/233/ [20:14:16] Yippee, build fixed! [20:14:16] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #490: FIXED in 4 min 24 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/490/ [20:30:58] Yippee, build fixed! 
[20:30:58] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #454: FIXED in 2 min 20 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/454/ [20:34:59] Yippee, build fixed! [20:34:59] Project browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce build #425: FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce/425/ [20:36:19] Yippee, build fixed! [20:36:19] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #168: FIXED in 1 min 19 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/168/ [20:48:25] what's up with the bots? [20:48:32] no grrit-wm nor wikibugs [20:53:15] greg-g: I thought I saw someone mention it in -operations, but I can't find it in the scrollback [20:55:13] thcipriani: legoktm is "wtf'ing" over in -labs [20:57:02] so many channels on which to "wtf", hard to keep track. [20:57:04] Yippee, build fixed! [20:57:05] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce build #167: FIXED in 1 min 48 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce/167/ [20:57:26] thcipriani: 'tis true [20:57:55] it's being discussed over in -labs because wikibugs and grrit-wm run on the toollabs infra [21:01:18] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #569: FAILURE in 24 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/569/ [21:40:32] Yippee, build fixed! 
[21:40:32] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #232: FIXED in 1 min 2 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/232/ [21:43:45] Yippee, build fixed! [21:43:45] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #512: FIXED in 42 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/512/ [21:50:09] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<12.50%) [22:05:08] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<11.11%) [22:05:08] Yippee, build fixed! [22:05:09] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #514: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/514/ [22:24:11] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #508: STILL FAILING in 19 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/508/ [22:25:45] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #457: SUCCESS in 1 min 34 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/457/ [22:26:21] Yippee, build fixed! 
[22:26:22] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #358: FIXED in 48 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/358/ [22:39:43] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #365: STILL FAILING in 52 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/365/ [22:40:46] legoktm: https://integration.wikimedia.org/ci/job/phplint/680/console [22:40:48] Only took 1 minute [22:40:51] interesting? [22:41:11] Still a long time compared to shallow but impressive [22:41:30] Krinkle: on the bug hashar said it might be because its checking out submodules [23:23:06] twentyafterfour: I have a better-better fatalmonitor that removes the "repeated N times" leader and correctly counts the de-duplicated lines. Can I amend your patch to use it or should I toss in a followup patch? [23:23:34] bd808: amend away [23:23:41] sweet [23:24:42] bd808: I seriously considered adding up the repeated count but the thought of doing it in shell script made my brain hurt [23:25:17] and rewriting it in php didn't seem much better [23:30:02] twentyafterfour: It's a bit eye bleedy -- https://gerrit.wikimedia.org/r/#/c/195657 [23:30:31] If my awk fu was stronger I'm sure that could have been one line [23:31:12] heh ... less lines isn't always better [23:31:48] perl golf takes over when my brain sees problems like this [23:32:05] I was gonna say, looks like a Perl problem :-) [23:41:51] how can I tell who has +2 on operations/puppet? I never know who to cc on my changes to that repo [23:42:06] anyone with root [23:42:20] https://meta.wikimedia.org/wiki/System_administrators [23:43:16] legoktm: is that actually updated regularly? and how? 
[23:43:36] https://meta.wikimedia.org/w/index.php?title=System_administrators&action=history reasonably [23:43:40] manually? :P [23:43:51] cluster-wide root doesn't change that often [23:43:57] but shell? [23:44:19] whatever, I trust the config files and prefer not to repeat on wiki pages :) [23:44:45] :P [23:45:05] * greg-g goes to eat a brownie with his son [23:45:22] (now that the mark lanegan album is over and the last few things ticked off the list) [23:47:52] wiki pages live forever ;) [23:48:57] there is a gerrit group for almost every repo, but not for puppet [23:53:29] that's probably an ldap group? [23:54:16] I suppose ... no idea how it's set up in gerrit [23:54:41] Project beta-scap-eqiad build #44795: FAILURE in 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44795/ [23:58:16] twentyafterfour: https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet,access [23:58:19] ldap/ops
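The fatalmonitor change bd808 describes at 23:23 (strip the "repeated N times" leader and count the de-duplicated lines correctly) could look roughly like this in Python. It is a sketch only, not the patch at https://gerrit.wikimedia.org/r/#/c/195657: the leader format `message repeated N times: [ ... ]` and the name `count_messages` are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed leader format (hypothetical for this log):
#   "message repeated 3 times: [ original message]"
LEADER = re.compile(r"message repeated (\d+) times: \[\s*(.*?)\s*\]\s*$")


def count_messages(lines):
    """Count occurrences per message, expanding 'repeated N times' lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = LEADER.search(line)
        if match:
            # The leader stands for N further copies of the wrapped message.
            counts[match.group(2)] += int(match.group(1))
        else:
            counts[line] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "PHP Fatal error: foo",
        "message repeated 3 times: [ PHP Fatal error: foo]",
        "PHP Fatal error: bar",
    ]
    for message, total in count_messages(sample).most_common():
        print(total, message)
```

Folding the repeat count into the same key as the plain line is what keeps the totals honest; counting the leader line as a single distinct message would under-report the noisiest errors.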