[04:04:27] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #559: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/559/
[04:42:00] Staging, Patch-For-Review, releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1156024 (mmodell) [[ https://graphite.wmflabs.org//render?width=600&from=-8hours&until=now&height=400&target=cactiStyle%28alias%28averageSeries...
[04:50:53] Yippee, build fixed!
[04:50:54] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #566: FIXED in 41 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/566/
[05:13:42] 05:12:09 A database query error has occurred.
[05:13:42] 05:12:09 Query: CREATE UNIQUE INDEX pp_propname_sortkey_page ON `page_props` (pp_propname,pp_sortkey,pp_page)
[05:13:42] 05:12:09
[05:13:42] 05:12:09 Function: DatabaseBase::sourceFile( /mnt/jenkins-workspace/workspace/mediawiki-extensions-zend/src/maintenance/tables.sql )
[05:13:42] 05:12:09 Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost)
[05:40:02] Yippee, build fixed!
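For readers of the error pasted at 05:13:42: the `Errcode` that MySQL prints is the operating system errno, and on Linux errno 28 is ENOSPC ("No space left on device"), which fits the slave being taken offline shortly afterwards. A minimal Python sketch for decoding such a code and checking free space, assuming a POSIX host; `explain_errcode` and `free_fraction` are hypothetical helper names, not part of the CI tooling discussed in this log.

```python
import errno
import os
import shutil


def explain_errcode(code):
    """Translate an OS errno (as printed in MySQL's 'Errcode: N') to text."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def free_fraction(path="."):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    print(explain_errcode(28))
    print(f"free space on current filesystem: {free_fraction('.'):.1%}")
```

Running it on the affected slave would show whether the partition holding the MySQL data directory is actually full; the path to check is an assumption and has to be adjusted to wherever the database files live.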
[05:40:02] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #59: FIXED in 6 min 28 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/59/
[05:42:21] Continuous-Integration: Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost) on integration-slave1001 in mediawiki-extensions-zend job - https://phabricator.wikimedia.org/T94138#1156057 (Legoktm) NEW
[05:42:36] !log marked integration-slave1001 as offline due to https://phabricator.wikimedia.org/T94138
[05:42:42] Logged the message, Master
[06:29:19] Continuous-Integration, MediaWiki-Codesniffer, Google-Summer-of-Code-2015, Outreachy-Round-10: GSoC Proposal for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T94140#1156079 (Hharchani) NEW a:Hharchani
[06:49:00] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [0.0]
[07:14:00] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:36:05] (PS1) Hharchani: Add sniff to check for "goto" [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121
[07:39:06] (CR) Polybuildr: [C: 1] Add sniff to check for "goto" [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121 (owner: Hharchani)
[07:39:08] (CR) Legoktm: [C: -1] Add sniff to check for "goto" (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121 (owner: Hharchani)
[07:40:28] (PS2) Hharchani: Add sniff to check for "goto" [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121
[07:55:39] (CR) Legoktm: "recheck" [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121 (owner: Hharchani)
[07:56:23] (CR) Legoktm: [C: 2] Add sniff to check for "goto" [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121 (owner: Hharchani)
[07:56:36] (Merged) jenkins-bot: Add sniff to check for "goto" [tools/codesniffer] - https://gerrit.wikimedia.org/r/200121 (owner: Hharchani)
[07:57:02] (PS1) Legoktm: Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001)
[08:00:27] (CR) Legoktm: "Untested" [integration/config] - https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) (owner: Legoktm)
[08:14:16] (CR) Zfilipin: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/199934 (https://phabricator.wikimedia.org/T94032) (owner: Greg Grossmeier)
[08:14:19] (CR) Zfilipin: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[08:17:00] (CR) Aude: [C: 1] "looks sane to me and we definitely want to *not* depend on gitblit" [integration/config] - https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) (owner: Legoktm)
[08:22:52] zeljkof: thanks for the merges
[08:23:04] most of them are just padding, but one is a real bug that elitre noticed
[08:23:31] [ https://gerrit.wikimedia.org/r/#/c/200055/ ]
[08:23:50] aharoni: saw that
[08:24:02] you could have merged all the padding commits into one :)
[08:24:08] but this is fine too
[08:26:15] moving to another machine
[08:31:21] * zeljkof is back
[08:32:11] (PS2) Zfilipin: Remove Chris from email alerts [integration/config] - https://gerrit.wikimedia.org/r/199934 (https://phabricator.wikimedia.org/T94032) (owner: Greg Grossmeier)
[08:32:18] (CR) Zfilipin: [C: 1] Remove Chris from email alerts [integration/config] - https://gerrit.wikimedia.org/r/199934 (https://phabricator.wikimedia.org/T94032) (owner: Greg Grossmeier)
[08:34:06] (PS2) Zfilipin: Stop throttling SauceLabs jobs [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[08:34:12] (CR) Zfilipin: [C: 1] Stop throttling SauceLabs jobs [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[08:52:34] zeljkof: good morning
[08:52:43] hashar: good morning
[08:52:55] zeljkof: wanna start right now? I am ready
[08:53:29] hashar: I need a few minutes to finish something, I will be ready at 10 :)
[08:53:40] ok :)
[09:34:23] (CR) Hashar: "Timo is right we might reach out SauceLabs quota." [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[09:35:14] (CR) Zfilipin: [C: 2] Stop throttling SauceLabs jobs [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[09:36:17] (CR) Hashar: "Deploying the jobs. Now we can switch them to @daily in another change." [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[09:48:36] (Merged) jenkins-bot: Stop throttling SauceLabs jobs [integration/config] - https://gerrit.wikimedia.org/r/199932 (owner: Hashar)
[09:48:58] Continuous-Integration, Browser-Tests, Jenkins: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1156195 (zeljkofilipin) NEW a:zeljkofilipin
[09:50:14] Continuous-Integration, Browser-Tests, Jenkins: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1156203 (hashar) From our 1/1, change the timer to @daily and in the browsertest default, then for the VE screenshot job make it @annually :)
[09:52:33] (PS2) Zfilipin: Abort browsertests* jobs if they do not complete in 4 hours [integration/config] - https://gerrit.wikimedia.org/r/199919 (https://phabricator.wikimedia.org/T92275)
[09:57:34] (CR) Hashar: [C: -1] "The 4 hours timeout is merely for the Wikidata job. I am not a fan of having the same timeout for all jobs. Instead we can introduce a ne" [integration/config] - https://gerrit.wikimedia.org/r/199919 (https://phabricator.wikimedia.org/T92275) (owner: Zfilipin)
[10:00:51] zeljkof: please do the @daily so we can get it deployed soonish :)
[10:01:28] hasharCall: Cool. What is our quota actually?
[10:01:34] * Krinkle has no clue
[10:01:39] 10 vm
[10:01:41] 15 for mac
[10:01:48] all in one shared account?
[10:01:58] that's quite good.
[10:02:06] Krinkle: sorry for the nitpick/ rage quit a few days ago.
[10:02:13] 10 concurrent vm I assume, and practically unlimited minutes?
[10:02:33] hasharCall: No worries, you made a good point. I sent you an email.
[10:02:42] Krinkle: sorry for the nitpick/ rage quit a few days ago.
[10:03:16] I was so happy when I found the trick to change the image that I wanted to see it :D
[10:04:28] while you are around, we have yet another bug in Zuul which causes changes to block the gate for 5 minutes whenever they are force merged https://phabricator.wikimedia.org/T93812 :(
[10:04:39] good news: the Zuul packaging work for Precise / Trusty is mostly done
[10:04:46] will deploy it next week once I have written some doc
[10:05:36] hashar: Yeah, the deadlock is interesting. It must be a regression, right? I don't think self-merged commits were a problem in the past.
[10:05:48] it most probably has always been around
[10:05:56] but we barely noticed it or blamed something else
[10:06:02] Hm..
[10:06:07] maybe our replication is slower?
[10:06:09] it is much more noticeable now that all projects are sharing the same queue
[10:06:16] it is unrelated
[10:06:19] to the replication
[10:06:26] remember that bug I told you about where a MobileFrontend job got a mediawiki master of almost a week old on a prod slave.
[10:06:37] well the faulty function was meant to monitor replication / Gerrit lag in submitting a change
[10:06:47] but it is given wrong parameters due to a race condition
[10:06:59] in short, the merge event is handled before the gate and submit
[10:07:04] Oh, right. That makes sense. Sharing the queue makes it worse.
[10:07:21] so the new state of the repository ( master branch) is already pointing to the change in gate
[10:07:40] and in gate, it compares the @master branch sha1 with the one of the change
[10:07:57] and loops because they are identical
[10:08:10] from what I understand. Need to write a unit test for it
[10:08:16] and figure out the parameters being passed to the method
[10:08:30] interesting.
[10:08:37] yeah
[10:08:44] doesn't impact OpenStack since they never ever force merge
[10:08:44] hashar: Looks like that wait function in Zuul doesn't account for multiple commits appearing at once
[10:08:50] it doesn't check "contain" it checks "is last"
[10:08:55] right?
[10:08:56] yeah
[10:09:06] "is last" can be checked by asking Gerrit some refs
[10:09:21] "contain" would need a clone of the repository to do something like git branch --contains
[10:09:26] Yeah, which means if there was a merge commit, or multiple commits at once between the interval, it doesn't see it ever as "last"
[10:09:39] exactly
[10:09:46] and thus loops till it shows up in the repo
[10:09:49] which never happens
[10:09:59] Hm.. there's got to be a way to do it without a clone. Maybe a way over gerrit to see status of change id.
[10:10:03] Or a way with git fetch
[10:10:10] although git fetch would find it before merge.
[10:10:11] the whole process happens after the gate job has passed all tests. It is to ensure that Gerrit properly merged it on --submit
[10:10:31] Hm.. yeah
[10:10:34] so if no change is merged, and since the gate guarantees changes are merged in sequence. The tip of the branch should be equal to the change being processed
[10:10:37] and it's a synchronous loop in the main thread.
[10:10:38] unless Gerrit fails to merge
[10:10:43] yeah sync loop
[10:10:49] so that deadlocks all processing
[10:10:53] Yup
[10:10:55] that is done in the main loop
[10:10:56] Great stuff :P
[10:11:02] in a method named processOneItem or something
[10:11:05] lovely
[10:11:22] but I am quite happy to have found the root cause in less than 10 minutes the other day
[10:11:28] Yeah!
[10:11:31] Nice catch no doubt
[10:11:32] when we had an outage rather late in the evening
[10:11:43] there is another very nasty bug we had which you pointed to
[10:11:53] So, I've got a couple ideas, but nothing solid. What's your next step on this you recommend?
[10:11:59] which was changes being held because a bunch of jobs are stalled in "queued" mode
[10:12:17] this one is fixed on our setup since October and it definitely solved the issue :)
[10:12:26] so for the deadlock
[10:12:40] my absolute top priority right now is to get a zuul debian package
[10:12:50] to replace the crazy pip install I came up with a long time ago
[10:13:00] and which is a mess to deal with as you noticed when recreating slaves
[10:13:02] from there
[10:13:11] hashar: Yeah.
[10:13:13] we will be able to "easily" update Zuul
[10:13:24] we are 3 or 4 months away from tip of upstream
[10:13:29] so I want to push it forward
[10:13:34] get our pending changes merged
[10:13:40] this will reconcile us with upstream
[10:13:42] hashar: So.. I don't know if you read my mail yet, but I'll be bold: I'd like to propose we disable dependent gate pipeline. At least in the short term.
[10:13:50] then work on the bug hitting us and craft patches
[10:14:11] I haven't read it :(
[10:14:26] I suspect having all projects sharing the same queue is too disruptive / delays merges, isn't it ?
[10:14:31] Yeah
[10:14:54] dependent gate is nice, but isn't all that important in my opinion. It's a nice trick to help with a minor edge case sometimes.
[10:14:56] have you heard complaints from devs?
[10:15:03] I can imagine it impacts SWAT / train deploys
[10:15:13] At least ^demon|away seemed interested in disabling it.
[10:15:20] And it does delay SWAT a lot, yeah.
[10:15:25] They prefer to not merge ahead of time
[10:15:30] they should imho
[10:15:32] so they merge 2 wmf changes, deploy, merge 2 more, deploy.
[10:15:43] And a window can have a lot of changes sometimes.
[10:15:45] and yeah the way we roll out extensions patches is crazy
[10:15:47] And then other people doing stuff at the same time.
[10:16:08] a single patch needs 1 merge to branch, 2 cherry picks to wmf branches, 2 submodule updates. OR 5 commits
[10:16:12] thus 10 jobs run
[10:16:16] Also some repos with only very quick tests (e.g. ios/app jslint) have to wait 10 minutes for Zend mediawiki core if that is merged at the same time
[10:16:34] and dependent also doesn't look at branch. So master is blocked with wmf and vice versa.
[10:17:13] (CR) Filippo Giunchedi: Package python deps with dh-virtualenv (1 comment) [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[10:17:48] hashar: will do
[10:18:00] Krinkle: yeah that is on purpose
[10:18:09] because Zuul has no idea which branches the jobs are going to rely on
[10:18:16] that is not the same as on purpose :P
[10:18:38] a job trigger changes for a wmf branch could well use a repository using 'master' branch
[10:19:03] so if a change for that repo master branch is ahead in the queue, you need the dependency since it can impact the wmf branch change
[10:19:09] at least in theory :)
[10:20:09] hashar: Yeah, but if they're merged 5 minutes apart the other way around, then they both merge and nothing fails.
[10:20:32] yeah
[10:20:38] which is more common, and just normal practice in software development. Upstream might break a project, and you'll find out next commit. No big deal. Saves a few minutes, and has to be fixed either way.
[10:20:55] unless the repo using 'master' also has tests covering the other repositories branches
[10:21:01] that is a never ending mess
[10:21:01] True
[10:22:04] what we can come up with
[10:22:30] add an option to Zuul so we can disable the automatic regrouping of jobs if they share the same job
[10:22:43] and add a way to specify the queue on a per project basis
[10:24:42] hashar: Yes, but on the short term. Because we have other priorities (I need to get back to my Q3 goals)
[10:24:58] * Krinkle adds your notes to the Task
[10:25:21] I wish I had seen your huge refactor :(
[10:25:30] I would probably have noticed that side effect
[10:25:42] anyway using shallow clone and having less jobs is a huge win
[10:26:43] hashar: Next low-prio task in that category (Just an idea, open to input from you!) is to change mwext jobs to use shallow clone for core, before calling zuul-cloner
[10:27:10] mind to elaborate ? :)
[10:27:19] because shallow clone uses git plugin, not zuul.
[10:27:20] the workspaces are permanent so already have the clone around
[10:27:29] so no submodule or dependencies or mediawiki core
[10:27:35] hashar: Yeah, which is huge.
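Going back to the wait loop discussed between 10:08 and 10:10: the difference between an "is last" check and a "contains" check can be shown on a toy commit graph. This is only a sketch of the reasoning in the conversation, not Zuul's actual code; the commit ids, the `PARENTS` graph, the `OBSERVED_TIPS` polling sequence and all function names are made up for illustration.

```python
# Hypothetical history: a1 <- b2 <- c3.
# b2 is the change that went through the gate; c3 is a force-merged
# change that landed right after it, before the next poll.
PARENTS = {"a1": [], "b2": ["a1"], "c3": ["b2"]}

# Branch tips as seen by three successive polls. The poller never
# observes b2 as the tip because c3 landed in the same interval.
OBSERVED_TIPS = ["a1", "c3", "c3"]


def is_last(tip, change):
    """'is last': the change must be exactly the branch tip."""
    return tip == change


def contains(parents, tip, change):
    """'contains': the change must be reachable from the branch tip."""
    stack, seen = [tip], set()
    while stack:
        sha = stack.pop()
        if sha == change:
            return True
        if sha not in seen:
            seen.add(sha)
            stack.extend(parents[sha])
    return False


def first_poll_seen(check, tips):
    """Index of the first poll where check(tip) holds, or None if never."""
    for index, tip in enumerate(tips):
        if check(tip):
            return index
    return None


if __name__ == "__main__":
    print(first_poll_seen(lambda tip: is_last(tip, "b2"), OBSERVED_TIPS))
    print(first_poll_seen(lambda tip: contains(PARENTS, tip, "b2"), OBSERVED_TIPS))
```

With the "is last" check no poll ever succeeds, which in a synchronous loop without a timeout means waiting forever; the "contains" check succeeds on the second poll. The trade-off raised in the conversation still applies: a reachability check needs the commit graph, that is, a clone or an equivalent query.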
[10:27:46] once they are cloned, zuul-cloner just git remote update && git checkout (ZUUL_REF|ZUUL_BRANCH|--branch|master)
[10:27:49] hashar: Lego and I had to disable l10nbot again
[10:27:56] oh
[10:27:58] because we don't have enough space to have mwext workspaces with mwcore for all projects
[10:28:03] ah yeah
[10:28:07] I commented on one of the tasks
[10:28:15] and had a discussion with legoktm about it
[10:28:24] the idea would be to have a mirror of mediawiki/core on each instance
[10:28:42] BEFORE running zuul cloner, we would do a git clone --share mirror src/
[10:28:57] which would populate mediawiki/core from the mirror and have the git repo point to the mirror
[10:29:07] which makes core a 110KB repo in the workspace
[10:29:14] hashar: Hm.. yeah, but how would that work inside a VM? I don't think this is important enough to implement something like that before we switch to VM. Might as well delay it or do in a way that would also work in a fresh VM.
[10:29:20] Yeah, that'd be sweet!
[10:29:27] then we launch zuul cloner which will grab from the zuul git repo whatever it is missing
[10:29:44] that would save a shit ton of disk space / IO
[10:29:48] (CR) Filippo Giunchedi: Package python deps with dh-virtualenv (1 comment) [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[10:29:52] Yay
[10:30:09] there is one drawback: whenever the mirror / reference repo is written to, all core clones will become corrupted apparently
[10:30:28] written to?
[10:30:37] the workaround is to delete it in the workspace and reclone it again. Which is fast and still only takes 110KB
[10:30:48] written to: running a commit in the mirror repo (unlikely)
[10:30:55] more probable: doing a git repack
[10:31:23] or a git remote update, which might trigger 'git gc --auto' which will kill loose objects which might be referenced in the workspaces
[10:31:27] I am not sure to be honest
[10:31:48] but we might end up having corrupted repos; so to play it safe it is easier to just restart with a fresh clone every time
[10:31:57] I might actually solve it next time if I have half a day to dedicate
[10:34:58] Krinkle: I can't find the task :(
[10:35:15] ah here it is https://phabricator.wikimedia.org/T93703#1144542
[10:35:22] ^
[10:36:45] hashar: Sounds good.
[10:36:57] (PS1) Hashar: Support per browsertest job timeout [integration/config] - https://gerrit.wikimedia.org/r/200129
[10:37:20] hashar: So VM isolation, what is current checkpoint on that? No rush, just curious where we are at and what is the next actionable blocker?
[10:39:31] Zuul package mostly done
[10:39:47] I have attempted packaging Nodepool and it seems trivial to do for Debian/Jessie
[10:39:58] I will pair with Zeljkof to do the puppet grunt work that is needed
[10:40:16] so Zeljkof levels up on puppet and knows about our ci infra
[10:40:39] couple blockers: get a new hardware in labs subnet which will host nodepool and zuul mergers
[10:40:54] so we can drive the openstack API to create VMs and have the VM / labs instance fetch patches
[10:41:07] my goal is to have a proof of concept by Lyon
[10:41:19] then migrate some of our jobs to vm. Probably the integration/* ones first
[10:41:25] then over the summer migrate the rest
[10:41:25] (PS1) Giuseppe Lavagetto: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130
[10:41:44] (CR) jenkins-bot: [V: -1] proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[10:43:22] (PS2) Hashar: Support per browsertest job timeout [integration/config] - https://gerrit.wikimedia.org/r/200129 (https://phabricator.wikimedia.org/T92275)
[10:43:24] (PS3) Hashar: Limit WikidataTests browser test to four hours [integration/config] - https://gerrit.wikimedia.org/r/199919 (https://phabricator.wikimedia.org/T92275) (owner: Zfilipin)
[10:43:36] zeljkof: here are the patches :)
[10:43:50] hashar: great, I am working on @daily
[10:44:17] Krinkle: to make it clear. Greg acked CI isolation is one of our team's top priorities
[10:44:25] Krinkle: the other being the staging cluster.
[10:44:41] Krinkle: each project is handled by a subset of people. CI isolation being Zeljko/Dan/ I
[10:44:48] Cool.
[10:44:49] I have yet to catch up with Dan though due to TZ diff :(
[10:45:03] he is smart / senior enough that I am confident he will catch up easily
[10:45:08] he is quite busy with MW Vagrant
[10:45:10] When I initially pitched it to greg beginning of quarter, it was more of a side project. Cool to see it raised :)
[10:45:12] (CR) Zfilipin: [C: 2] Support per browsertest job timeout [integration/config] - https://gerrit.wikimedia.org/r/200129 (https://phabricator.wikimedia.org/T92275) (owner: Hashar)
[10:45:22] yeah it has waited too long
[10:45:37] zuul-cloner I worked on last year was part of the effort though
[10:45:56] Continuous-Integration, Browser-Tests, Jenkins: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1156268 (zeljkofilipin)
[10:46:01] hashar: I had two meetings with Dan in SF last year to familiarize him with about 80% of our CI infrastructure.
[10:46:08] oh awesome
[10:46:33] Not much hands-on in practice yet, but expect a lot of "Oh, right, yeah, I remember"
[10:46:55] Yeah, he's a quick learner :)
[10:46:59] will definitely get him enrolled in the work to provision our test images to boot instances of them
[10:47:08] I would like to use Vagrant to populate them
[10:47:11] (CR) Zfilipin: [C: 2] Limit WikidataTests browser test to four hours [integration/config] - https://gerrit.wikimedia.org/r/199919 (https://phabricator.wikimedia.org/T92275) (owner: Zfilipin)
[10:47:22] hashar: Hm.. manually?
[10:47:22] and next year, eventually unify our CI vagrant with the MediaWikiVagrant project
[10:47:34] so that devs can easily reproduce what is running in CI
[10:47:40] Maybe it's for later, but I thought it would create instances itself based on a base image from labs and puppet roles.
[10:47:42] (vagrant up && role ci  # hack)
[10:47:45] Oh, right!
[10:47:49] That would be awesome
[10:47:54] nodepool is able to create instances for us
[10:47:59] Yeah
[10:48:10] we tell it something like: bootstrap debian jessie then run populate.sh
[10:48:11] and then create a template from that/snapshot to quickly duplicate instances from
[10:48:24] populate.sh would call the vagrant stuff we need
[10:48:37] then once it is populated, the image is pushed to openstack and newly created VM use that
[10:48:48] Yep
[10:48:54] so you can in theory easily reproduce what CI is doing
[10:49:03] They talked me through this in openstack-infra.
[10:49:12] and if we get everything in git repos (instead of lame scripts in jenkins jobs), that means our devs will be able to tweak what CI is running
[10:49:17] Roan and I actually had dinner with them in SF. That was so unexpected, but really nice.
[10:49:20] Cool folks
[10:49:21] by just proposing changes to the slave scripts
[10:49:26] that is going to be fucking cool
[10:49:33] then I can leave to greener pastures :)
[10:49:46] Wait, you were there.
[10:49:47] I'm stupid.
[10:49:49] yeah
[10:49:53] at the other side of the table hehe
[10:49:54] (Merged) jenkins-bot: Support per browsertest job timeout [integration/config] - https://gerrit.wikimedia.org/r/200129 (https://phabricator.wikimedia.org/T92275) (owner: Hashar)
[10:50:00] Yeah, so far away :P
[10:50:03] had interesting discussions with totally unrelated projects
[10:50:09] they were quite impressed about our CI
[10:50:13] ah, it was all shop talk on our side of the table
[10:50:28] especially the qunit / karma / browser tests I told them about
[10:50:56] and I discovered they have a project to manage real servers (baremetal) from openstack
[10:51:00] instead of vms
[10:51:04] and that is crazy
[10:51:05] yeah, using ironic
[10:51:11] yup that is the name
[10:51:15] The guy on my side of the table worked on that.
[10:51:27] It was confusing, they introduced him as "He is ironic"
[10:51:38] I wasn't sure whether it was a name or adjective, but it was neither.
[10:51:43] (Merged) jenkins-bot: Limit WikidataTests browser test to four hours [integration/config] - https://gerrit.wikimedia.org/r/199919 (https://phabricator.wikimedia.org/T92275) (owner: Zfilipin)
[10:51:44] more like "Antoine is CI"
[10:51:45] ahaha
[10:51:59] I can imagine the confusion
[10:52:25] poh
[10:52:27] unrelated
[10:52:35] do you have any timeslot during european afternoon?
[10:52:42] I would like to start a weekly CI meeting
[10:52:48] Yes
[10:52:48] with europe / east coast people
[10:53:00] more or less what we did a few times with just the both of us
[10:53:02] Any day in mind?
[10:53:07] * Krinkle looks at Engineering calendar
[10:53:10] but with more people from WMF / WMDE and volunteers
[10:53:21] I thought about doing them early in the week
[10:53:33] hashar: btw, I'm finalising Karma and MySQL today. Will write email today or tomorrow to wikitech
[10:53:36] I found out checkins late in the week tend to get items forgotten over the weekend
[10:53:43] There's a few minor issues to clean up and then we're ready to announce it
[10:53:51] oh yeah mysql, be bold please. That is a huge step forward
[10:53:53] Yeah, monday or tuesday ideally.
[10:54:26] hashar: The worst part is Zend/Precise. core tests got 4-5 minutes slower. Really annoying. But also proof that our tests use too much database for no reason.
[10:54:31] Interestingly, hhvm/trusty almost no difference.
[10:54:50] and for extensions no noticeable difference either
[10:54:56] ohh shit
[10:55:06] so zend/precise went from 6 minutes to more than 10 ?
[10:55:10] 9 or 10
[10:55:32] that means probably half an hour to push a SWAT change :(
[10:55:44] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/4169/testReport/(root)/ - sort by Duration :)
[10:55:50] 8 minutes 34 sec
[10:55:51] we should probably get rid of the zend tests for the wmf branches
[10:55:55] if not done already
[10:55:59] ah, good point
[10:56:04] * hashar files task
[10:56:18] prod is not entirely on HHVM yet though
[10:56:41] (PS1) Zfilipin: Run almost all browsertests* jobs daily [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145)
[10:56:50] hashar: ^
[10:56:57] Continuous-Integration: Get rid of zend tests for wmf branches - https://phabricator.wikimedia.org/T94149#1156282 (hashar) NEW
[10:57:03] zeljkof: greatness
[10:57:12] Krinkle: https://phabricator.wikimedia.org/T94149 :)
[10:57:39] Krinkle: might be quite easy to achieve using zuul job filters. Though there might be some side effects since some have branch: filters :(
[10:58:08] Krinkle: is the engineering calendar accurate? It seems mostly empty
[11:00:06] Continuous-Integration, Browser-Tests, Jenkins, Tracking: Delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1156290 (zeljkofilipin) NEW a:zeljkofilipin
[11:00:54] hashar: Yeah.
[11:01:04] I think most people moved to meetings with individuals invited
[11:01:39] enabling me and greg in calendar on the right should give more coverage
[11:01:47] I enabled you and greg.
[11:02:43] Monday 1pm (SF) / 9pm (UTC; me) / 10pm (Europe)
[11:02:45] That could work
[11:02:58] maybe too late?
[11:03:01] (CR) Hashar: "I have updated the job and confirmed it is using a 240 minutes timeout now ( https://integration.wikimedia.org/ci/job/browsertests-Wikidat" [integration/config] - https://gerrit.wikimedia.org/r/199919 (https://phabricator.wikimedia.org/T92275) (owner: Zfilipin)
[11:03:17] Krinkle: yeah
[11:03:29] Krinkle: I wanted to do a few during european afternoon
[11:03:32] it is easier for me
[11:03:40] then eventually shift it to SF
[11:03:44] that is my afternoon :P
[11:03:45] or have two meetings
[11:03:58] one in the morning for india/europe
[11:04:05] and one in our evenings for US
[11:04:08] Hm.. do we really need two?
[11:04:17] for TZ coverage yeah :D
[11:04:32] well at least at the beginning, I don't think I will be able to lead such a meeting in our late evenings
[11:04:36] cause of kids / being tired
[11:04:49] Yeah, but a few hours earlier and everyone should be able to join
[11:05:06] I wake up at 7am every single day. So by 9pm / 10pm I am not very operational beside for maintenance tasks
[11:05:21] and from 6pm to 9pm our time, that is family / kids caring
[11:05:30] which is unfortunately 9am / noon SF :(
[11:05:41] Right
[11:05:48] so either WMF or I should relocate to US east coast
[11:06:08] so my idea was to skip SF for now
[11:06:15] pair up with wmde / and european folks
[11:06:18] see how it is going
[11:06:26] then elaborate a scenario to cover SF as well
[11:06:29] We can use RelEng meeting to sync with SF
[11:06:32] either shift the meeting or have a second one
[11:06:32] for summary
[11:06:42] and notes of course
[11:06:44] yeah or publish the report weekly
[11:06:48] Continuous-Integration, MediaWiki-extensions-CentralNotice, Browser-Tests, Jenkins: Fix or delete failing and disabled browsertests-CentralNotice* Jenkins jobs - https://phabricator.wikimedia.org/T94151#1156306 (zeljkofilipin) NEW
[11:06:50] meetbot should help
[11:06:58] meetings are overrated. I'd rather not have two.
[11:07:01] wait, you mean irc?
[11:07:19] or hangout / etherpad
[11:07:23] not sure what is going to work best
[11:07:32] the RFC meetings are using meetbot and that seems to work for them
[11:07:32] IRC could work. like a public triage basically
[11:07:37] I guess we will want to experiment
[11:07:43] but yeah public triage is what I have in mind
[11:07:47] as well as open QA for anyone
[11:07:54] as well as having an opportunity for devs to speak about their needs
[11:08:00] Yup
[11:08:11] If they can't attend, they can send in answers ahead of time via etherpad.
[11:08:14] (or file a phab task)
[11:08:22] yeah
[11:08:24] questions*
[11:08:26] phab tasks are nice
[11:08:51] having a meeting though let them call out a task for action
[11:08:54] hashar: so what time works best for you?
[11:09:08] (PS2) Zfilipin: Run almost all browsertests* jobs daily [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145)
[11:09:13] 3 4 5 pm our time?
[11:09:34] which are US East mornings
[11:09:50] I am not sure when you start your days
[11:09:55] well, we should triage all untriaged tasks in backlog.
[11:10:02] yup
[11:10:16] I tend to start around 2 or 3
[11:10:35] Today is an exception because my bank is .. suboptimal.
[11:11:19] and I have no recurring meetings before 4 yet, so that should work
[11:11:43] (CR) Hashar: "I am hoping @daily will actually spread the jobs over the course of the day. We dont want 30+ jobs to trigger at the same time or we will " (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145) (owner: Zfilipin)
[11:11:57] hashar: btw, did you really mean annually?
[11:12:00] so 3pm our time?
[11:12:00] for VE jobs?
[11:12:13] yeah that is taking screenshots and it is run manually
[11:12:23] oh
[11:12:26] right
[11:12:33] so there is no point in running it daily. Zeljkof had the idea to trigger them annually, which is a clever hack :)
[11:12:57] Monday 3 pm ?
[11:13:20] OK
[11:13:36] Wanna create it on WMF Engineering?
[11:14:13] Continuous-Integration, Echo, Browser-Tests, Jenkins: Delete or fix failed Echo browsertests Jenkins job - https://phabricator.wikimedia.org/T94152#1156318 (zeljkofilipin) NEW
[11:16:26] Krinkle: doing so right now
[11:16:42] zeljkof: Jenkins phab is for Jenkins system issues, not jobs :)
[11:16:52] Krinkle: oops, sorry
[11:16:54] (PS1) Giuseppe Lavagetto: proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137
[11:16:55] will remove tags
[11:17:11] (CR) jenkins-bot: [V: -1] proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137 (owner: Giuseppe Lavagetto)
[11:17:22] zeljkof: no worries, I haven't cleaned it up much so it looks confusing.
[11:17:25] Continuous-Integration, Flow, Browser-Tests: Delete or fix failed Flow browsertests Jenkins job - https://phabricator.wikimedia.org/T94153#1156329 (zeljkofilipin) NEW
[11:18:39] Continuous-Integration, Echo, Browser-Tests: Delete or fix failed Echo browsertests Jenkins job - https://phabricator.wikimedia.org/T94152#1156343 (zeljkofilipin)
[11:18:48] Continuous-Integration, Release-Engineering: Create list of performance-related improvements for Jenkins jobs - https://phabricator.wikimedia.org/T423#1156346 (Krinkle)
[11:18:54] Continuous-Integration, MediaWiki-extensions-CentralNotice, Browser-Tests: Fix or delete failing and disabled browsertests-CentralNotice* Jenkins jobs - https://phabricator.wikimedia.org/T94151#1156348 (zeljkofilipin)
[11:19:04] Continuous-Integration, Browser-Tests, Tracking: Delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1156350 (zeljkofilipin)
[11:19:15] Continuous-Integration, Browser-Tests, Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1156352 (zeljkofilipin)
[11:19:32] Continuous-Integration, Release-Engineering: Send beta cluster Jenkins alerts to betacluster-alert list - https://phabricator.wikimedia.org/T1125#1156354 (zeljkofilipin)
[11:19:57] (PS2) Giuseppe Lavagetto: proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137
[11:20:12] Continuous-Integration, Release-Engineering: Create list of performance-related improvements for Jenkins jobs - https://phabricator.wikimedia.org/T423#4466 (Krinkle)
[11:20:13] (CR) jenkins-bot: [V: -1] proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137 (owner: Giuseppe Lavagetto)
[11:20:13] Krinkle: you should have an invite
[11:20:19] Krinkle: cleaned up jenkins phab project, the only one browser tests related is this one https://phabricator.wikimedia.org/T68449
[11:20:48] Continuous-Integration, Gather: Set up qunit Jenkins job for Extension:Gather - https://phabricator.wikimedia.org/T91708#1156365 (Krinkle)
[11:20:56] Krinkle: looks like it is related to jenkins performance, so I have left it there, feel free to remove if it is not related
[11:22:24] (CR) Zfilipin: Run almost all browsertests* jobs daily (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145) (owner: Zfilipin)
[11:22:30] (PS3) Zfilipin: Run almost all browsertests* jobs daily [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145)
[11:22:45] Continuous-Integration, Wikidata: PHP fatal errors are not visible in jenkins output in mwext-Wikibase-client-tests job - https://phabricator.wikimedia.org/T92397#1156368 (Krinkle)
[11:23:25] Continuous-Integration, Release-Engineering, Puppet: Suggestion: disable autoloader_layout checks in our jenkins puppet-lint - https://phabricator.wikimedia.org/T1289#1156369 (Krinkle)
[11:24:48] (PS4) Zfilipin: Run almost all browsertests* jobs daily [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145)
[11:24:51] (PS3) Giuseppe Lavagetto: proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137
[11:25:18] (CR) Zfilipin: Run almost all browsertests* jobs daily (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145) (owner: Zfilipin)
[11:30:20] Continuous-Integration, Browser-Tests, Tracking: Delete or fix failed GettingStarted browsertests Jenkins job - https://phabricator.wikimedia.org/T94154#1156375 (zeljkofilipin) NEW
[11:30:58] Continuous-Integration, Flow, Browser-Tests: Delete or fix failed Flow browsertests Jenkins job - https://phabricator.wikimedia.org/T94153#1156382 (zeljkofilipin)
[11:32:43] Continuous-Integration, MediaWiki-Codesniffer, Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1156398 (phuedx)
[11:33:48] Continuous-Integration, Regression, Upstream: Manually starting builds in Jenkins throws "java.lang.IndexOutOfBoundsException: Index: 0, Size: 0" - https://phabricator.wikimedia.org/T93321#1156405 (Krinkle)
[11:34:00] Continuous-Integration: integration-zuul-layoutdiff job creating 16MB of logs - https://phabricator.wikimedia.org/T92757#1156406 (Krinkle)
[11:34:26] Continuous-Integration: Bump Zuul support for python-statsd 3.x - https://phabricator.wikimedia.org/T78402#1156410 (Krinkle)
[11:37:52] (PS2) Giuseppe Lavagetto: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130
[11:38:07] (CR) jenkins-bot: [V: -1] proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[11:38:20] Continuous-Integration,
6Release-Engineering, 7Browser-Tests: Map operations/mediawiki-config/extension-list entries to Jenkins browser test job - https://phabricator.wikimedia.org/T456#1156412 (10Krinkle) [11:39:23] 10Continuous-Integration, 6Release-Engineering, 7Browser-Tests: Map operations/mediawiki-config/extension-list entries to Jenkins browser test job - https://phabricator.wikimedia.org/T456#4629 (10Krinkle) Updated description as I assume this is specifically about browser tests. Unit test job contain non-nota... [11:40:52] 10Continuous-Integration, 7Browser-Tests, 5Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1156417 (10zeljkofilipin) 5Open>3Resolved [11:41:03] oh men [11:41:07] we have made so much progress [11:41:14] we have a template editor now https://www.mediawiki.org/wiki/Help:TemplateData [11:41:32] 10Continuous-Integration, 6Release-Engineering: Learn how Zuul works - https://phabricator.wikimedia.org/T1367#1156418 (10zeljkofilipin) a:5zeljkofilipin>3None [11:41:50] 6Release-Engineering, 7Documentation: Document RuboCop workflow - https://phabricator.wikimedia.org/T1368#1156419 (10zeljkofilipin) a:5zeljkofilipin>3None [11:42:26] 10Continuous-Integration, 6Release-Engineering: Repositories with Ruby code should be documented and appropriate Jenkins jobs should be running - https://phabricator.wikimedia.org/T1361#1156423 (10zeljkofilipin) a:5zeljkofilipin>3None [11:45:56] 10Continuous-Integration, 10MediaWiki-extensions-CentralNotice, 7Browser-Tests: Delete or fix failing and disabled CentralNotice browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94151#1156431 (10zeljkofilipin) [11:45:58] hashar: :) [11:49:17] 10Continuous-Integration, 6Mobile-Web, 7Browser-Tests, 7Tracking: Delete or fix failed MobileFrontend browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94156#1156443 (10zeljkofilipin) 3NEW a:3zeljkofilipin [11:51:15] zeljkof: one last for today: https://gerrit.wikimedia.org/r/#/c/200142/ [11:51:17] 
very trivial [11:51:40] I thought of creating a few new screenshots for citations UI today, but apparently it's all changing now, so I'll wait until it stabilizes. [11:52:16] aharoni: merged! [11:52:21] thanks [11:52:25] enjoy the weekend! [11:53:04] aharoni: thanks, you too! :) [11:53:41] 10Continuous-Integration, 10MediaWiki-extensions-MultimediaViewer, 6Multimedia, 7Browser-Tests, 7Tracking: Delete or fix failed MultimediaViewer browsertests Jenkins job - https://phabricator.wikimedia.org/T94157#1156464 (10zeljkofilipin) 3NEW [11:54:20] 10Continuous-Integration, 6Mobile-Web, 7Browser-Tests, 7Tracking: Delete or fix failed MobileFrontend browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94156#1156475 (10zeljkofilipin) a:5zeljkofilipin>3None [11:57:48] 10Continuous-Integration, 10MediaWiki-extensions-UniversalLanguageSelector, 7Browser-Tests, 7Tracking: Delete or fix failed UniversalLanguageSelector browsertests Jenkins job - https://phabricator.wikimedia.org/T94158#1156479 (10zeljkofilipin) 3NEW [12:00:12] hashar: do you know how to find Mark Holmquist/MarkTraceur in phab? [12:02:09] zeljkof: no clu [12:02:10] e [12:02:27] zeljkof: in the top right search box write: markt [12:02:27] :) [12:02:38] zeljkof: https://phabricator.wikimedia.org/p/MarkTraceur/ ! [12:04:23] (03CR) 10Hashar: [C: 032] "Great, lets see what happens. Refreshing jobs." 
[integration/config] - 10https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145) (owner: 10Zfilipin) [12:05:50] zeljkof: I am afraid @daily cause the jobs to run all at 00:00 UTC :D [12:08:51] (03Merged) 10jenkins-bot: Run almost all browsertests* jobs daily [integration/config] - 10https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145) (owner: 10Zfilipin) [12:08:53] (03CR) 10Hashar: "The runs are apparently spread https://github.com/jenkinsci/jenkins/commit/b1bb3f66676b550971db08725d5c3cef5b42191b :)" [integration/config] - 10https://gerrit.wikimedia.org/r/200135 (https://phabricator.wikimedia.org/T94145) (owner: 10Zfilipin) [12:14:26] 10Continuous-Integration, 7Browser-Tests, 5Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1156495 (10hashar) The jobs should be spread over the day as per Jenkins commit https://github.com/jenkinsci/jenkins/commit/b1bb3f66676b550971db08725d5c3cef5b42191b Can you announc... [12:14:45] zeljkof: so what is left to do is announce that browser tests are now only run on a daily basis https://phabricator.wikimedia.org/T94145 :) [12:15:06] 10Continuous-Integration, 10MediaWiki-extensions-UniversalLanguageSelector, 7Browser-Tests, 7Tracking: Delete or fix failed UniversalLanguageSelector browsertests Jenkins job - https://phabricator.wikimedia.org/T94158#1156496 (10Amire80) Let's take a look at this on Monday. [12:24:47] 10Beta-Cluster, 10MediaWiki-ResourceLoader: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences lacks normal styling - https://phabricator.wikimedia.org/T93050#1156511 (10Florian) Same on testwiki and test2wiki, btw. 
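The worry raised at 12:05:50 (every `@daily` job firing at 00:00 UTC) and the answer at 12:08:53 (the runs are spread) come down to deriving each job's start time from its name. The sketch below only illustrates that general idea. It is not Jenkins's actual algorithm, and the function name `spread_minute_of_day` and the job names are hypothetical.

```shell
# Illustration only: map a job name to a stable minute of the day (0..1439),
# so that jobs sharing one "daily" schedule do not all start at midnight.
# This is NOT how Jenkins computes its hash; it only shows the principle.
spread_minute_of_day() {
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( crc % 1440 ))
}

# Hypothetical job names, used only to show the spread.
for job in browsertests-Echo browsertests-Flow browsertests-Gather; do
    minute=$(spread_minute_of_day "$job")
    printf '%s -> %02d:%02d UTC\n' "$job" $(( minute / 60 )) $(( minute % 60 ))
done
```

Because the slot depends only on the job name, a given job keeps the same time from day to day while different jobs tend to land on different minutes.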
[12:30:54] (03PS26) 10Adrian Lang: Fix WikibaseJavaScriptApi tests [integration/config] - 10https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) [12:34:01] 10Continuous-Integration: Get rid of zend tests for wmf branches - https://phabricator.wikimedia.org/T94149#1156535 (10Krinkle) [12:36:45] damn food [12:36:47] needed [12:37:35] PROBLEM - SSH on deployment-lucid-salt is CRITICAL: Connection refused [12:39:10] 10Beta-Cluster, 10Deployment-Systems, 5Patch-For-Review: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1156550 (10hashar) Thank you for the fix and I know how complicated our l10n cache sync is so kudos! [12:45:13] 10Continuous-Integration, 10MediaWiki-extensions-UploadWizard, 6Multimedia, 7Browser-Tests, 7Tracking: Delete or fix failed UploadWizard browsertests Jenkins job - https://phabricator.wikimedia.org/T94161#1156557 (10zeljkofilipin) 3NEW [12:48:18] 10Continuous-Integration, 10VisualEditor, 7Browser-Tests, 7Tracking: Delete or fix failed VisualEditor browsertests Jenkins job - https://phabricator.wikimedia.org/T94162#1156565 (10zeljkofilipin) 3NEW [12:52:19] 10Continuous-Integration, 10Wikidata, 7Browser-Tests, 7Tracking: Delete or fix failed Wikidata browsertests Jenkins job - https://phabricator.wikimedia.org/T94163#1156574 (10zeljkofilipin) 3NEW [12:54:38] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #1: SUCCESS in 1 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/1/ [12:54:46] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-chrome-sauce build #1: SUCCESS in 57 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-chrome-sauce/1/ [12:55:15] Project 
browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-firefox-sauce build #1: SUCCESS in 1 min 18 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-firefox-sauce/1/ [13:00:14] 6Release-Engineering, 7Browser-Tests: mediawiki_selenium 0.4.0 does not timeout on sauce_api call - https://phabricator.wikimedia.org/T88221#1156584 (10zeljkofilipin) p:5Triage>3Normal [13:01:20] 6Release-Engineering, 7Browser-Tests: mediawiki_selenium 0.4.0 does not timeout on sauce_api call - https://phabricator.wikimedia.org/T88221#1006426 (10zeljkofilipin) @dduvall: is this still the problem in mediawiki_selenium 1.x? [13:01:40] 10Continuous-Integration, 10Wikidata, 7Browser-Tests, 7Tracking: Delete or fix failed Wikidata browsertests Jenkins job - https://phabricator.wikimedia.org/T94163#1156595 (10hashar) The Wikidata-WikidataTests one got aborted due to a timeout. It has been raised from 3 to 4 hours with https://gerrit.wikimed... [13:02:04] AndyRussG: I see you have joined the hangout, but I can not see or hear you [13:02:28] zeljkof: I could hear you, but not see you... I think the problem's on my end, one sec :) [13:02:43] open your eyes?! [13:02:51] (assuming you are not blind) [13:03:00] hashar: ;P [13:03:19] 6Release-Engineering, 7Browser-Tests: mediawiki_selenium always use the same default xvfb display 99 - https://phabricator.wikimedia.org/T73602#1156601 (10zeljkofilipin) @dduvall, @hashar: is this still the problem? [13:06:36] 6Release-Engineering, 7Browser-Tests: mediawiki_selenium always use the same default xvfb display 99 - https://phabricator.wikimedia.org/T73602#1156609 (10hashar) p:5High>3Low Dan added support to fix the issue with mediawiki_selenium 0.4.1. Per my comment on T73602#774867 we have to update the job templat... 
[13:08:32] https://integration.wikimedia.org/ci/view/BrowserTests/view/CentralNotice/ [13:08:33] AndyRussG: https://phabricator.wikimedia.org/T94151 [13:08:46] 10Beta-Cluster: m.wikidata.beta.wmflabs.org should point to a mobile IP - https://phabricator.wikimedia.org/T85469#1156621 (10hashar) Currently it resolves to: $ dig m.wikidata.beta.wmflabs.org m.wikidata.beta.wmflabs.org. 3600 IN A 208.80.155.139 From [[ https://wikitech.wikimedia.org/wiki/Sp... [13:09:18] 10Continuous-Integration, 10MediaWiki-extensions-CentralNotice, 7Browser-Tests: Delete or fix failing and disabled CentralNotice browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94151#1156623 (10zeljkofilipin) [13:19:05] https://docs.saucelabs.com/reference/platforms-configurator/?_ga=1.116398918.1392785498.1427461076#/ [13:19:43] 6Release-Engineering, 6MediaWiki-Core-Team, 6Multimedia, 6Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1156644 (10Qgil) [13:19:54] 6Release-Engineering, 7Browser-Tests: mediawiki_selenium 0.4.0 does not timeout on sauce_api call - https://phabricator.wikimedia.org/T88221#1156645 (10hashar) It is still a problem as I pointed at the bottom of my previous comment: > Looking at lib/mediawiki_selenium/remote_browser_factory.rb in our gem 1.0.... [13:20:33] 6Release-Engineering, 7Browser-Tests: mediawiki_selenium does not timeout on sauce_api call - https://phabricator.wikimedia.org/T88221#1156646 (10hashar) [13:20:37] 6Release-Engineering, 6MediaWiki-Core-Team, 6Multimedia, 6Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1156650 (10Qgil) Instead of "Blocks" should be "Blocked by", right? But we don't need all these notifications, so le... 
[13:22:51] 10Continuous-Integration, 6Release-Engineering: Learn how Zuul works - https://phabricator.wikimedia.org/T1367#1156653 (10hashar) @zeljkofilipin can we just close this task? We can cover Zuul behavior during our recurring 1/1 meeting. @addshore if you some time, feel free to contact me so we can arrange a fe... [13:25:56] 10Continuous-Integration, 6Release-Engineering: Create list of performance-related improvements for Jenkins jobs - https://phabricator.wikimedia.org/T423#1156659 (10hashar) [13:27:41] 10Continuous-Integration, 6Release-Engineering: Learn how Zuul works - https://phabricator.wikimedia.org/T1367#1156668 (10Addshore) +1 [13:50:06] # Set MEDIAWIKI_ENVIRONMENT for mediawiki-selenium >= 1.0.0 [13:50:06] case 'en.m.wikipedia.beta.wmflabs.org' in [13:50:06] 'en.wikipedia.beta.wmflabs.org') [13:50:06] export MEDIAWIKI_ENVIRONMENT=beta [13:50:06] ;; [13:50:07] 'test2.wikipedia.org') [13:50:10] export MEDIAWIKI_ENVIRONMENT=test2 [13:50:12] ;; [13:50:14] esac [13:58:16] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [13:59:43] 10Continuous-Integration: Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost) on integration-slave1001 in mediawiki-extensions-zend job - https://phabricator.wikimedia.org/T94138#1156743 (10hashar) [14:00:01] 10Continuous-Integration: Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost) on integration-slave1001 in mediawiki-extensions-zend job - https://phabricator.wikimedia.org/T94138#1156057 (10hashar) Edited tasks with some details. In short /var/ went full because of core files. 
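The `case` statement pasted at 13:50 switches on the literal host `en.m.wikipedia.beta.wmflabs.org`, which matches neither of its two patterns, so `MEDIAWIKI_ENVIRONMENT` stays unset for that job. A minimal sketch of one way to cover any beta host, assuming a glob pattern is acceptable for the job template; the helper name `mediawiki_environment_for` is hypothetical and not taken from integration/config:

```shell
# Hypothetical helper: derive the mediawiki-selenium environment name
# from the host a browser test job targets.
mediawiki_environment_for() {
    case "$1" in
        # Any beta cluster host, including mobile (en.m.*) and commons.
        *.beta.wmflabs.org) echo beta ;;
        test2.wikipedia.org) echo test2 ;;
        # Unknown host: report failure instead of guessing.
        *) return 1 ;;
    esac
}

# Export the variable only when the host is recognised.
if env_name=$(mediawiki_environment_for 'en.m.wikipedia.beta.wmflabs.org'); then
    export MEDIAWIKI_ENVIRONMENT="$env_name"
fi
```

The alternative is to drop the `case` entirely and pass the environment name to each job explicitly, which avoids pattern matching on host names altogether.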
[14:08:14] 10Continuous-Integration: Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost) on integration-slave1001 in mediawiki-extensions-zend job - https://phabricator.wikimedia.org/T94138#1156763 (10hashar) [14:24:53] AndyRussG: that one will not catch jobs targetting commons.wikimedia.beta.wmflabs.org though ;) [14:25:05] commented about it on some Gerrit change zeljkof been working on earlier today [14:25:32] grocery shopping then kid. Be back later this evening. [14:25:37] hashar: thx! [14:28:53] AndyRussG: zeljkof: see my comment on https://gerrit.wikimedia.org/r/#/c/197975/ :) [14:29:06] that is a bug :) [14:29:41] hashar: thanks [14:30:13] hashar: zeljkof: in this case it's not detecting beta cluster mobile site, I think? [14:30:29] AndyRussG: in the above commit? [14:30:50] zeljkof: the code I pasted, further above, from the CentralNotice iphone browsertest [14:31:23] yeah we need to be smarter [14:31:29] AndyRussG: yes, we would have to take a look how gem version 1.x sets up the env variables [14:31:31] or drop the shell case / esac [14:31:39] and pass the env name explicitly to jobs [14:32:31] https://integration.wikimedia.org/ci/view/BrowserTests/view/CentralNotice/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-os_x_10.10-iphone-sauce/configure [14:33:02] 6Release-Engineering, 6WMF-Legal, 6operations, 7Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1156816 (10Dzahn) "Copyright Platform" seems not optimal, i would wonder what that actually is. Maybe "Wikimedia Platform Engineering Team"... [14:45:32] I am off now *wave* [14:46:20] * AndyRussG waves [14:48:29] Yippee, build fixed! 
[14:48:29] Project browsertests-Wikidata-SmokeTests-linux-firefox-sauce build #199: FIXED in 31 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox-sauce/199/ [15:00:11] 10Continuous-Integration, 10MediaWiki-extensions-MultimediaViewer, 6Multimedia, 7Browser-Tests, 7Tracking: Delete or fix failed MultimediaViewer browsertests Jenkins job - https://phabricator.wikimedia.org/T94157#1156903 (10Gilles) Please don't delete them, a ton of work went into them. I've done many ro... [15:00:31] 10Continuous-Integration, 10Wikidata, 7Browser-Tests, 7Tracking: Delete or fix failed Wikidata browsertests Jenkins job - https://phabricator.wikimedia.org/T94163#1156904 (10Tobi_WMDE_SW) Ok, first, please do not remove the Wikidata jobs, even if they are failing. There are several reasons: 1) Due to the h... [15:02:07] 10Continuous-Integration, 10MediaWiki-extensions-MultimediaViewer, 6Multimedia, 7Browser-Tests, 7Tracking: Delete or fix failed MultimediaViewer browsertests Jenkins job - https://phabricator.wikimedia.org/T94157#1156905 (10Gilles) Also, a week is insufficient as a way to measure that a job is unsalvagea... [15:02:42] AndyRussG: asked sauce labs to transfer the main account to me... [15:04:45] 10Continuous-Integration, 10MediaWiki-extensions-UploadWizard, 6Multimedia, 7Browser-Tests, 7Tracking: Delete or fix failed UploadWizard browsertests Jenkins job - https://phabricator.wikimedia.org/T94161#1156913 (10Gilles) Same remarks as the ones I left on T94157 apply here. It's not realistic to hold... [15:07:33] 6Release-Engineering, 7Browser-Tests, 5Patch-For-Review: Things to do after Chris leaves - https://phabricator.wikimedia.org/T94032#1156915 (10zeljkofilipin) [15:17:32] zeljkof: thanks! 
[15:29:21] 6Release-Engineering, 7Browser-Tests: Ask Sauce Labs support if there is a way to disable Selenium log temporarily - https://phabricator.wikimedia.org/T89353#1156969 (10zeljkofilipin) > zfilipin > Mar 25, 9:46 AM > > Hi, > > I really like that Selenium log is available for every job, but since all our jobs a... [15:30:51] 6Release-Engineering, 7Browser-Tests: Ask Sauce Labs support if there is a way to disable Selenium log temporarily - https://phabricator.wikimedia.org/T89353#1156971 (10zeljkofilipin) > Dylan > Sauce Labs > > G'day! > > That's a great feature request! I've passed it onto our Product management team. > > Wh... [15:31:07] 6Release-Engineering, 6MediaWiki-Core-Team, 6Multimedia, 6Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1156972 (10bd808) >>! In T91803#1156650, @Qgil wrote: > Instead of "Blocks" should be "Blocked by", right? But we do... [15:33:32] 10Continuous-Integration, 6Release-Engineering, 6Phabricator, 10Wikimedia-Git-or-Gerrit: Gerritbot shouldn't post "Change merged by jenkins-bot:" messages any more - https://phabricator.wikimedia.org/T91766#1156980 (10Jdforrester-WMF) [15:34:22] 6Release-Engineering, 7Browser-Tests: Ask Sauce Labs support if there is a way to disable Selenium log temporarily - https://phabricator.wikimedia.org/T89353#1156982 (10zeljkofilipin) @dduvall, @hashar, do you think we should play with cookie manipulation? Or do we for now just disable Selenium log? 
[15:40:06] 10Beta-Cluster, 10Continuous-Integration, 6Release-Engineering: Send beta cluster Jenkins alerts to betacluster-alert list - https://phabricator.wikimedia.org/T1125#1157019 (10greg) [15:42:31] 10Beta-Cluster: m.wikidata.beta.wmflabs.org should point to a mobile IP - https://phabricator.wikimedia.org/T85469#1157032 (10greg) 5Open>3Resolved a:3yuvipanda [15:49:42] 10Continuous-Integration, 7Browser-Tests, 5Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1157075 (10zeljkofilipin) @hashar: done https://lists.wikimedia.org/pipermail/qa/2015-March/002214.html [15:51:24] 6Release-Engineering, 10MediaWiki-extensions-GettingStarted, 7Browser-Tests, 5Patch-For-Review: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD in on Jenkins so GettingStarted browser tests pass - https://phabricator.wikimedia.org/T91220#1157076 (10zeljkofilipin) Is this still the problem? GettingStarted browser te... [15:52:23] 6Release-Engineering, 10VisualEditor, 7Browser-Tests: Selenium bug with Firefox causes VE test failure - https://phabricator.wikimedia.org/T90651#1157077 (10zeljkofilipin) I see the commit is merged. Can this be closed? [15:53:29] 6Release-Engineering, 10VisualEditor, 7Browser-Tests: Create VisualEditor tests targeting the older version of browsers (Chrome and Firefox for now) for better backward compatibility - https://phabricator.wikimedia.org/T90678#1157081 (10zeljkofilipin) Do you still plan to work on this, or can the task be clo... [15:55:14] 6Release-Engineering, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015, 7Browser-Tests: Create pool of user accounts on beta cluster for browser test builds in Jenkins - https://phabricator.wikimedia.org/T90964#1157092 (10zeljkofilipin) I am not sure how useful this would be. I think we should focus... [16:11:16] 5am :((( [16:12:09] legoktm: CI meeting? [16:12:13] yeah [16:12:26] it's actually 6am! [16:12:30] you can totally make it! 
[16:12:44] [04:05:48] so either WMF or I should relocate to US east coast <-- yes [16:13:33] I'm fine with appalachia [16:13:34] oh I'm bad at math today. too early!!! [16:21:25] 10Continuous-Integration, 7Browser-Tests, 5Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1157144 (10greg) What does ``` @daily ``` do? Does it intelligently spread them out over the day or run them all at once or? /me is just curious [16:31:03] 10Continuous-Integration, 7Browser-Tests, 5Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1157177 (10zeljkofilipin) @greg: yes, `@daily` should spread the jobs over the day. [16:38:50] 10Continuous-Integration, 10MediaWiki-Codesniffer, 10Possible-Tech-Projects, 3Google-Summer-of-Code-2015: GSOC Project Proposal for the Idea : Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T93934#1157208 (10lucky) @legoktm Sir Any comments on my proposal [16:41:38] 10Continuous-Integration, 10MediaWiki-Codesniffer, 10Possible-Tech-Projects, 3Google-Summer-of-Code-2015: GSOC Project Proposal for the Idea : Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T93934#1157224 (10NiharikaKohli) @lucky, what microtask(s) have you completed or a... [16:42:03] 6Release-Engineering, 3releng-201415-Q3: [Quarterly Success Metric] RelEng+TPG process discussion and improvements (tracking) - https://phabricator.wikimedia.org/T88708#1157226 (10greg) a:5Cmcmahon>3greg [16:53:55] 10Continuous-Integration, 10Fundraising Tech Backlog, 6Scrum-of-Scrums, 10Wikimedia-Fundraising-CiviCRM, and 2 others: Continuous integration - CiviCRM - https://phabricator.wikimedia.org/T78100#1157260 (10greg) [16:54:49] 6Release-Engineering, 6Phabricator, 10Phabricator-Sprint-Extension, 7Browser-Tests, and 2 others: Create Browser Tests for Phabricator - https://phabricator.wikimedia.org/T87359#1157265 (10greg) >>! 
In T87359#1136192, @greg wrote: > Was this supposed to be just for the sprint extension? If so, that seems d... [16:56:20] 10Deployment-Systems, 6Release-Engineering, 6operations, 5Patch-For-Review: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1157273 (10greg) I think the only patch left here is https://gerrit.wikimedia.org/r/#/c/199857/ After that is this don... [16:57:25] 10Continuous-Integration, 6Release-Engineering, 5Patch-For-Review: Zuul-cloner forgets to clear workspace - https://phabricator.wikimedia.org/T76304#1157278 (10greg) p:5Unbreak!>3High This is unbreak now! but no movement in a long time.... what should happen next here? [17:13:02] (03PS2) 10Legoktm: Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - 10https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) [17:30:15] (03PS3) 10Legoktm: Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - 10https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) [17:37:13] (03PS1) 10JanZerebecki: Wikidata: add phpunit group Purtle [integration/config] - 10https://gerrit.wikimedia.org/r/200187 (https://phabricator.wikimedia.org/T94172) [17:48:00] !log marked integration-slave1002 as offline, /var filled up [17:48:08] Logged the message, Master [17:48:42] do we have alerts for that? [17:48:53] (did I miss it?) [17:49:00] 10Continuous-Integration: Error: 3 Error writing file './jenkins_u4_mw/#sql-3f4_cff.frm' (Errcode: 28) (localhost) on integration-slave1001 in mediawiki-extensions-zend job - https://phabricator.wikimedia.org/T94138#1157449 (10Legoktm) p:5Triage>3High /var just filled up on integration-slave1002. 
[17:49:37] 10Continuous-Integration: Small /var partition is filling up due to mysql usage on labs slaves - https://phabricator.wikimedia.org/T94138#1157451 (10Legoktm) [17:49:41] greg-g: idk, ^ is the bug for it [17:49:54] Project browsertests-Wikidata-WikidataTests-linux-firefox-sauce build #175: STILL FAILING in 3 hr 27 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/175/ [17:49:58] legoktm: thanks [17:50:50] !log deleted core dumps from integration-slave1002 [17:50:55] Logged the message, Master [17:57:21] 10Continuous-Integration, 10Flow, 7Browser-Tests: Delete or fix failed Flow browsertests Jenkins job - https://phabricator.wikimedia.org/T94153#1157467 (10Mattflaschen) Yes, we could use help if you have time available. When did it start failing? [17:58:20] 10Continuous-Integration, 10Echo, 7Browser-Tests: Delete or fix failed Echo browsertests Jenkins job - https://phabricator.wikimedia.org/T94152#1157471 (10Mattflaschen) If you have time available to help fixing it, we would appreciate it. When did it start failing? [18:01:23] 10Continuous-Integration, 10Echo, 7Browser-Tests: Delete or fix failed Echo browsertests Jenkins job - https://phabricator.wikimedia.org/T94152#1157491 (10Jdlrobson) It would be good to fix these up. I set them up as Echo had no tests whatsoever and it would be good to have confidence the feature is working. 
[18:10:26] (03PS4) 10Legoktm: Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - 10https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) [18:11:06] (03PS5) 10Legoktm: Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - 10https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) [18:11:53] (03CR) 10Legoktm: [C: 032] Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - 10https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) (owner: 10Legoktm) [18:12:44] 10Continuous-Integration, 7Browser-Tests, 7Tracking: Delete or fix failed GettingStarted browsertests Jenkins job - https://phabricator.wikimedia.org/T94154#1157514 (10Mattflaschen) [18:12:45] 6Release-Engineering, 10MediaWiki-extensions-GettingStarted, 7Browser-Tests, 5Patch-For-Review: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD in on Jenkins so GettingStarted browser tests pass - https://phabricator.wikimedia.org/T91220#1157515 (10Mattflaschen) [18:14:01] 6Release-Engineering, 10MediaWiki-extensions-GettingStarted, 7Browser-Tests, 5Patch-For-Review: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD in on Jenkins so GettingStarted browser tests pass - https://phabricator.wikimedia.org/T91220#1157519 (10Mattflaschen) p:5High>3Unbreak! Yes, I've been trying to get th... [18:19:51] (03CR) 10Faidon Liambotis: [C: 04-2] "Relaying our IRC conversation:" [tools/scap] - 10https://gerrit.wikimedia.org/r/200137 (owner: 10Giuseppe Lavagetto) [18:19:54] (03PS3) 10Mattflaschen: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD to GettingStarted browser test [integration/config] - 10https://gerrit.wikimedia.org/r/194749 (https://phabricator.wikimedia.org/T91220) [18:20:16] (03CR) 10Mattflaschen: "Again, this is breaking the GettingStarted browser tests. Please review." 
[integration/config] - 10https://gerrit.wikimedia.org/r/194749 (https://phabricator.wikimedia.org/T91220) (owner: 10Mattflaschen) [18:20:23] 10Continuous-Integration, 10Wikidata: Switch or add additional jenkins jobs to run Wikibase tests with mysql - https://phabricator.wikimedia.org/T94208#1157529 (10aude) 3NEW [18:21:42] (03Merged) 10jenkins-bot: Use zuul-cloner for Wikibase/Wikidata jobs [integration/config] - 10https://gerrit.wikimedia.org/r/200122 (https://phabricator.wikimedia.org/T74001) (owner: 10Legoktm) [18:21:50] 6Release-Engineering, 10MediaWiki-extensions-GettingStarted, 7Blocked-on-Continuous-Integration, 7Browser-Tests, 5Patch-For-Review: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD in on Jenkins so GettingStarted browser tests pass - https://phabricator.wikimedia.org/T91220#1157539 (10Mattflaschen) [18:21:58] 10Continuous-Integration, 6Release-Engineering, 10MediaWiki-extensions-GettingStarted, 7Blocked-on-Continuous-Integration, and 2 others: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD in on Jenkins so GettingStarted browser tests pass - https://phabricator.wikimedia.org/T91220#1076825 (10Mattflaschen) [18:22:42] 10Continuous-Integration, 10MediaWiki-Codesniffer, 10Possible-Tech-Projects, 3Google-Summer-of-Code-2015: GSOC Project Proposal for the Idea : Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T93934#1157543 (10lucky) @niharikaKohli I have almost completed one microtask whic... [18:25:20] 10Continuous-Integration, 10Wikidata, 5Patch-For-Review, 7Technical-Debt: Remove dependency on git.wikimedia.org - https://phabricator.wikimedia.org/T74001#1157563 (10Legoktm) Wikibase/Wikidata jobs are now using zuul-cloner. Only usage left of the 'mw-setup-extension' macro is browser tests ('{name}-{ext-... 
[18:27:11] 10Continuous-Integration, 10MediaWiki-Codesniffer, 10Possible-Tech-Projects, 3Google-Summer-of-Code-2015: GSOC Project Proposal for the Idea : Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T93934#1157577 (10NiharikaKohli) @lucky, that microtask has already been completed... [18:32:22] (03PS1) 10Dduvall: Set MEDIAWIKI_ENVIRONMENT for browser tests on any beta wiki [integration/config] - 10https://gerrit.wikimedia.org/r/200207 [18:33:08] 10Continuous-Integration, 6Mobile-Web, 7Browser-Tests, 7Tracking: Delete or fix failed MobileFrontend browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94156#1157590 (10Jdlrobson) I started a mail thread about this. I think we should split the big one into smaller jobs. We also need to find a... [18:37:51] 10Continuous-Integration, 7Browser-Tests: Re-try each failed test within a build - https://phabricator.wikimedia.org/T67773#1157606 (10greg) [18:39:40] 10Continuous-Integration, 10Wikimedia-Hackathon-2015: All new extensions should be setup automatically with Zuul - https://phabricator.wikimedia.org/T92909#1157611 (10Legoktm) All extensions + skins currently have a basic phplint job running. jsonlint will soon be added to that list. https://gerrit.wikimedia.o... [18:39:53] 10Continuous-Integration, 10Wikimedia-Hackathon-2015: All new extensions should be setup automatically with Zuul - https://phabricator.wikimedia.org/T92909#1157612 (10Legoktm) [18:41:35] 10Continuous-Integration, 7Browser-Tests: Mark test build as useless due to non-test issues (beta, jenkins, etc) - https://phabricator.wikimedia.org/T66957#1157621 (10greg) [18:45:40] 10Continuous-Integration, 7Browser-Tests: Accommodate flaky tests flapping - https://phabricator.wikimedia.org/T94212#1157641 (10greg) 3NEW [18:45:50] 10Continuous-Integration: Jenkins tests shouldn't go red when it's not its fault - https://phabricator.wikimedia.org/T74722#1157653 (10greg) 5Open>3declined a:3greg >>! 
In T74722#801937, @zeljkofilipin wrote: > Is this still happening? I do not remember seeing it since we moved to wikimedia jenkins. Closi... [18:50:09] !log running `jenkins-jobs update` to update 'browsertests-UploadWizard-*' with Id33ffde07f0c15e153d52388cf130be4c59b4559 [18:50:14] Logged the message, Master [18:51:33] 10Continuous-Integration, 10MediaWiki-Codesniffer, 3Google-Summer-of-Code-2015, 3Outreachy-Round-10: GSoC Proposal for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T94140#1157693 (10Hharchani) [18:54:43] 10Continuous-Integration, 10Wikimedia-Hackathon-2015: All new extensions should be setup automatically with Zuul - https://phabricator.wikimedia.org/T92909#1157701 (10Jdlrobson) Qunit tests and PHP units should be in by default and my main concern right now. There is an issue in Gather at the moment due to a b... [18:56:36] 6Release-Engineering, 10Wikimedia-Hackathon-2015: Release/QA tasks at the Wikimedia Hackathon 2015 - https://phabricator.wikimedia.org/T92565#1157706 (10greg) [18:58:14] 10Continuous-Integration, 10MediaWiki-Codesniffer, 10Possible-Tech-Projects, 3Google-Summer-of-Code-2015: GSOC Project Proposal for the Idea : Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T93934#1157709 (10lucky) @niharikaKohli Then I have completed not a single microtask [19:05:31] 10Continuous-Integration, 10MediaWiki-Codesniffer, 3Google-Summer-of-Code-2015, 3Outreachy-Round-10: GSoC Proposal for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T94140#1157740 (10Hharchani) [19:07:19] 6Release-Engineering, 6WMF-Legal, 6operations, 7Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1157745 (10Mattflaschen) >>! In T94000#1156816, @Dzahn wrote: > "Copyright Platform" seems not optimal, i would wonder what that actually i... 
[19:07:54] 6Release-Engineering, 6MediaWiki-Core-Team, 6WMF-Legal, 6operations, 7Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1157746 (10Mattflaschen) [19:10:51] 6Release-Engineering, 6MediaWiki-Core-Team, 6WMF-Legal, 6operations, 7Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1157755 (10greg) FYI: Just putting: ``` (C) Wikimedia Foundation Licensed under $whatever_license, see LICENSE for... [19:22:43] 10Continuous-Integration, 10Wikidata: Switch or add additional jenkins jobs to run Wikibase tests with mysql - https://phabricator.wikimedia.org/T94208#1157786 (10JanZerebecki) Both please. [19:23:30] 10Deployment-Systems, 5Patch-For-Review: LocalisationUpdate needs to support updating skins/ as well as extensions/ - https://phabricator.wikimedia.org/T69154#715669 (10greg) The only patch left unmerged is: https://gerrit.wikimedia.org/r/#/c/169716/ [19:24:05] 10Deployment-Systems: [l10n] l10nupdate process should respect the scap lock file - https://phabricator.wikimedia.org/T72752#1157790 (10greg) a:5Reedy>3None [19:24:20] (03CR) 10Dduvall: [C: 032] "Tested against https://integration.wikimedia.org/ci/view/BrowserTests/view/-Dashboard/job/browsertests-UploadWizard-commons.wikimedia.beta" [integration/config] - 10https://gerrit.wikimedia.org/r/200207 (owner: 10Dduvall) [19:24:31] 10Deployment-Systems, 6operations: Use FQDNs for mediawiki-installation - https://phabricator.wikimedia.org/T93983#1157794 (10Dzahn) >>! In T93983#1151838, @bd808 wrote: > The fix will be to update `mediawiki-installation` which is currently maintained manually to use FQDNs. A big comment should be added to th... [19:25:19] 10Deployment-Systems, 6operations: Use FQDNs for mediawiki-installation - https://phabricator.wikimedia.org/T93983#1157796 (10Dzahn) >>! In T93983#1157794, @Dzahn wrote: >>>! 
In T93983#1151838, @bd808 wrote: >> The fix will be to update `mediawiki-installation` which is currently maintained manually to use FQD... [19:29:50] 10Continuous-Integration, 7Technical-Debt: Remove dependency on git.wikimedia.org - https://phabricator.wikimedia.org/T74001#1157808 (10JanZerebecki) [19:31:52] 10Deployment-Systems, 5Patch-For-Review: [l10n] Use Scap in Localisation Update - https://phabricator.wikimedia.org/T72443#1157817 (10greg) a:5Reedy>3None [19:32:42] 10Deployment-Systems: Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched - https://phabricator.wikimedia.org/T51392#1157825 (10greg) [19:34:22] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » sv,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 7 hr 36 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=sv,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [19:34:27] 10Deployment-Systems, 6Release-Engineering, 5Patch-For-Review: Don't commit interwiki cdbs - https://phabricator.wikimedia.org/T75905#1157836 (10greg) [19:41:00] (03CR) 1020after4: [C: 031] "looks good, I don't have a scap test environment set up to run, however, from a visual inspection of the code it looks like this should wo" [tools/scap] - 10https://gerrit.wikimedia.org/r/200130 (owner: 10Giuseppe Lavagetto) [19:42:09] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » nb,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 7 hr 44 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=nb,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [19:46:05] (03CR) 10Dzahn: [C: 04-1] "what jenkins says: undefined name 'cfg'. that's in utils.py, see inline comment. should be "conf" instead?" 
(031 comment) [tools/scap] - 10https://gerrit.wikimedia.org/r/200130 (owner: 10Giuseppe Lavagetto) [19:50:51] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » krc,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 7 hr 53 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=krc,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:00:41] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » ro,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 3 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ro,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:01:24] (03PS3) 1020after4: proxies: allow filtering by datacenter [tools/scap] - 10https://gerrit.wikimedia.org/r/200130 (owner: 10Giuseppe Lavagetto) [20:01:40] (03CR) 10jenkins-bot: [V: 04-1] proxies: allow filtering by datacenter [tools/scap] - 10https://gerrit.wikimedia.org/r/200130 (owner: 10Giuseppe Lavagetto) [20:10:34] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » hr,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 12 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=hr,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:13:11] (03PS2) 10JanZerebecki: Wikidata: add phpunit group Purtle [integration/config] - 10https://gerrit.wikimedia.org/r/200187 (https://phabricator.wikimedia.org/T94172) [20:14:49] (03PS4) 1020after4: proxies: allow filtering by datacenter [tools/scap] - 10https://gerrit.wikimedia.org/r/200130 (owner: 10Giuseppe Lavagetto) [20:16:05] (03CR) 1020after4: "re-factored slightly to be more efficient, only do the for loop once (this also avoids the long line that flake8 was complaining about)" [tools/scap] - 10https://gerrit.wikimedia.org/r/200130 (owner: 
10Giuseppe Lavagetto) [20:17:30] 10Continuous-Integration, 10Wikimedia-Hackathon-2015: All new extensions should be setup automatically with Zuul - https://phabricator.wikimedia.org/T92909#1157931 (10Spage) >>! In T92909#1157611, @Legoktm wrote: > We could potentially create magic composer/npm jobs that run "composer test" / "npm test" if it... [20:20:40] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » ru,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 23 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ru,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:28:21] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » kn,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 30 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=kn,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:38:06] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » ms,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 40 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ms,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:40:34] (03CR) 10Legoktm: [C: 032] Wikidata: add phpunit group Purtle [integration/config] - 10https://gerrit.wikimedia.org/r/200187 (https://phabricator.wikimedia.org/T94172) (owner: 10JanZerebecki) [20:41:12] 10Deployment-Systems, 6WMF-Legal, 7Documentation: mediawiki/tools/scap is lacking a license - https://phabricator.wikimedia.org/T94239#1158049 (10hashar) 3NEW [20:43:01] 6Release-Engineering, 6WMF-Legal, 7Documentation: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158076 (10hashar) 3NEW [20:43:18] 10Deployment-Systems, 6WMF-Legal, 7Documentation: mediawiki/tools/scap is lacking 
a license - https://phabricator.wikimedia.org/T94239#1158082 (10hashar) [20:45:27] 6Release-Engineering, 6WMF-Legal, 7Documentation: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158116 (10hashar) @greg can you ack on CC-BY-SA ? I dont mind doing the grunt work, crafting a Gerrit patch and have you +2 it for approval of the license. Should cover u... [20:45:58] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » fr,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 48 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=fr,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:46:23] 6Release-Engineering, 6WMF-Legal, 7Documentation: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158125 (10greg) >>! In T94242#1158116, @hashar wrote: > @greg can you ack on CC-BY-SA ? I dont mind doing the grunt work, crafting a Gerrit patch and have you +2 it for ap... 
[20:47:06] (03Merged) 10jenkins-bot: Wikidata: add phpunit group Purtle [integration/config] - 10https://gerrit.wikimedia.org/r/200187 (https://phabricator.wikimedia.org/T94172) (owner: 10JanZerebecki) [20:47:34] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3323 bytes in 0.089 second response time [20:48:56] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3062 bytes in 0.070 second response time [20:49:00] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3318 bytes in 0.094 second response time [20:49:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3062 bytes in 0.062 second response time [20:49:15] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0] [20:49:35] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0] [20:49:43] eek [20:50:09] twentyafterfour: ^ [20:50:14] marxarelli: ^ [20:50:49] bd808: "[a4aff1a0] /wiki/Main_Page?debug=true ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist" [20:51:10] bah [20:51:16] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #588: FAILURE in 25 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/588/ [20:51:21] need to update 
logging config for beta [20:51:21] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [20:51:21] PROBLEM - Puppet failure on deployment-test is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:51:52] greg-g: https://gerrit.wikimedia.org/r/#/c/198662/ [20:52:07] (03CR) 10JanZerebecki: "Updated and checked Jenkins jobs: mwext-Wikibase-repo-api-tests, mwext-Wikidata-repo-nonexperimental-tests, mwext-Wikidata-repo-tests" [integration/config] - 10https://gerrit.wikimedia.org/r/200187 (https://phabricator.wikimedia.org/T94172) (owner: 10JanZerebecki) [20:52:45] bd808: without that beta stays broken? [20:53:00] yeah. Or a smaller fix [20:53:09] kk [20:53:18] I have a meeting in ~7 minutes [20:53:35] Can somebody push that for me? [20:53:45] review and push [20:53:47] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [20:54:13] 6Release-Engineering, 6WMF-Legal, 7Documentation: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158141 (10Slaporte) @greg do you prefer CC BY over CC0 for these sort of docs? [20:54:21] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » lb,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 8 hr 56 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=lb,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [20:55:29] 6Release-Engineering, 6WMF-Legal, 7Documentation: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158142 (10greg) >>! In T94242#1158141, @Slaporte wrote: > @greg do you prefer CC BY over CC0 for these sort of docs? CC0 is fine with me as well, actually. Just a flow ch... [20:55:56] twentyafterfour: ^d can you review/merge that patch from bryan re logging on beta? 
beta's broken without it [20:56:18] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [20:56:28] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [20:56:37] !log Beta Cluster is down, known [20:56:42] Logged the message, Master [20:57:04] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [20:57:13] <^d> greg-g: 198662? [20:57:14] <^d> Looking [20:57:25] greg-g: I think Roan is going to revert the breaking change [20:57:31] either way will fix beta [20:57:48] <^d> I can +2 [20:57:48] I should have merged a smaller config change earlier [20:57:59] 10Beta-Cluster, 10MediaWiki-Logging: ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist - https://phabricator.wikimedia.org/T94249#1158163 (10greg) 3NEW [20:58:11] heh. Roan just +2'd a revert [20:58:17] we can sort it out later [20:58:18] 10Beta-Cluster, 10MediaWiki-Logging: ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist - https://phabricator.wikimedia.org/T94249#1158170 (10greg) [20:58:20] kk [20:58:33] <^d> Oh and I just gave you a +2 too [20:58:37] :) [20:58:55] 10Beta-Cluster, 10MediaWiki-Logging: ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist - https://phabricator.wikimedia.org/T94249#1158175 (10Catrope) https://gerrit.wikimedia.org/r/#/c/200235/ reverts the offending commit and is currently being merged. 
[21:00:00] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [21:00:44] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [21:00:58] 10Beta-Cluster, 10MediaWiki-Logging: ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist - https://phabricator.wikimedia.org/T94249#1158189 (10Catrope) Strike that, Chad just +2ed https://gerrit.wikimedia.org/r/198662 which is the compensating config change, so I've abandoned... [21:01:15] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [0.0] [21:01:34] oh, all the fun chatter was over in -dev :( [21:03:37] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » mk,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 9 hr 6 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=mk,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [21:05:17] hi [21:06:04] hashar: :) [21:06:21] greg-g: I am reading Slaporte reply suggesting to use CC0 :D [21:06:33] I dont think I can legally put my work under CC0 by french laws [21:06:36] but IANAL [21:06:54] in addition I have a joint copyright agreement with wmf hehe [21:07:05] so I guess wmf can do whatever they want with the part I did [21:07:31] so in theory: I could add the CC0 and state that I am doing it on behalf of the wmf but not for me as an individual [21:07:35] going to be fun case in court [21:07:40] hashar: there's a license fallback for CC0 [21:07:50] so we default to CC0 ? 
[21:08:42] http://creativecommons.org/publicdomain/zero/1.0/legalcode "Public license fallback" [21:08:59] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 49090 bytes in 0.922 second response time [21:08:59] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 49289 bytes in 0.552 second response time [21:09:00] sure, for that repo at least, cc0 is fine with me [21:09:05] yay, things are coming back! [21:09:05] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 49090 bytes in 0.536 second response time [21:09:18] !! [21:10:10] greg-g: abuse :p [21:10:17] 6Release-Engineering, 6Phabricator, 10Phabricator-Sprint-Extension, 7Browser-Tests, and 2 others: Create Browser Tests for Phabricator - https://phabricator.wikimedia.org/T87359#1158226 (10Christopher) The browser tests in the Sprint extension are very limited and cover only a sampling of stuff. The base... [21:11:07] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » el,contintLabsSlave && UbuntuTrusty build #33: FAILURE in 9 hr 13 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=el,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [21:12:36] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30138 bytes in 0.546 second response time [21:13:19] !log things be better [21:13:24] Logged the message, Master [21:13:54] 10Deployment-Systems, 5Patch-For-Review: LocalisationUpdate needs to support updating skins/ as well as extensions/ - https://phabricator.wikimedia.org/T69154#1158231 (10mmodell) I'd merge it but it should really be merged and monitored by whoever does the next swat deploy? 
[21:14:36] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0] [21:15:49] 10Deployment-Systems, 5Patch-For-Review: LocalisationUpdate needs to support updating skins/ as well as extensions/ - https://phabricator.wikimedia.org/T69154#1158234 (10greg) This can be merged and deployed at any time that someone is willing to either run l10nupdate manually or wait until the nightly run. C... [21:16:13] RECOVERY - Puppet failure on deployment-test is OK: OK: Less than 1.00% above the threshold [0.0] [21:17:17] http://shinken.wmflabs.org/problems?search=hg:deployment-prep still got a lot of puppet failures though, what's up with that? [21:19:22] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:17] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:27] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:22:00] 10Beta-Cluster, 10MediaWiki-Logging: ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist - https://phabricator.wikimedia.org/T94249#1158271 (10greg) 5Open>3Resolved a:3greg This is fixed now, thanks all for the quick turn around. [21:22:08] 10Beta-Cluster, 10MediaWiki-Logging: ReflectionException from line of : Class MWLoggerMonologSamplingHandler does not exist - https://phabricator.wikimedia.org/T94249#1158274 (10greg) a:5greg>3None [21:22:09] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [21:22:53] the build queue in Jenkins scared the hell out of me just now (https://integration.wikimedia.org/ci/ ) until I realized it was all of the db update jobs and that's ok. [21:23:22] well, now I'm curious if those db update jobs are actually running... 
[21:23:38] (ignore me for now) [21:24:37] shit, yeah :( [21:24:38] Configuration beta-update-databases-eqiad » deployment-bastion-eqiad,fawiki is still in the queue: Waiting for next available executor on deployment-bastion.eqiad [21:25:01] RECOVERY - Puppet failure on deployment-kafka02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:25:23] hashar: what should I do to kick that? ^ [21:25:37] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0] [21:26:17] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0] [21:26:49] (03PS1) 10Hashar: Clarify license is CC0 [tools/releng] - 10https://gerrit.wikimedia.org/r/200240 (https://phabricator.wikimedia.org/T94242) [21:27:06] help?? https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/8493/console [21:27:34] (03CR) 10Hashar: "greg-g please +2 if you agree to the license :)" [tools/releng] - 10https://gerrit.wikimedia.org/r/200240 (https://phabricator.wikimedia.org/T94242) (owner: 10Hashar) [21:27:43] ah poor database [21:27:48] damn that keeps happening [21:27:53] greg-g: looking at it [21:28:02] there is some deadlock / race condition somewhere in jenkins [21:28:24] (03CR) 10Greg Grossmeier: [C: 032] "Let it be free." 
[tools/releng] - 10https://gerrit.wikimedia.org/r/200240 (https://phabricator.wikimedia.org/T94242) (owner: 10Hashar) [21:28:43] 10Beta-Cluster: beta-scap-eqiad works but throws errors - https://phabricator.wikimedia.org/T94261#1158319 (10Catrope) 3NEW [21:28:45] waiting for scap to finish [21:29:00] (03Merged) 10jenkins-bot: Clarify license is CC0 [tools/releng] - 10https://gerrit.wikimedia.org/r/200240 (https://phabricator.wikimedia.org/T94242) (owner: 10Hashar) [21:30:27] 10Continuous-Integration, 10MediaWiki-extensions-CentralNotice, 7Browser-Tests: Delete or fix failing and disabled CentralNotice browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94151#1158341 (10AndyRussG) [21:30:35] 6Release-Engineering, 6WMF-Legal, 7Documentation, 5Patch-For-Review: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158342 (10hashar) a:3hashar [21:31:26] 6Release-Engineering, 6WMF-Legal, 7Documentation, 5Patch-For-Review: mediawiki/tools/releng is lacking a license - https://phabricator.wikimedia.org/T94242#1158343 (10hashar) 5Open>3Resolved Greg and I agreed on CC0 The above code change should make us legally happy. The site is up to date and the mai... 
[21:31:51] greg-g: magically solved itself [21:32:09] greg-g: apparently the scap job is preventing other jobs from running :( [21:32:52] ahhh [21:33:45] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0] [21:33:55] anyway yeah [21:33:58] just have to wait [21:34:36] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0] [21:34:51] hashar: well, alright then [21:34:53] :) [21:35:11] anyway news from the front [21:35:23] we now run browser tests once per day at some random time of the day [21:35:26] instead of twice per day [21:35:36] cool [21:35:47] Timo and I have an open CI triage session on Monday 1pm UTC [21:35:57] which is the middle of the night for SF but who cares :D [21:36:08] <^d> greg-g: Give it a little longer to recover on its own and then we'll have a better idea of what needs poking [21:36:14] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0] [21:36:15] * greg-g nods [21:36:15] <^d> (still seems to be slowly getting there) [21:36:19] <^d> eg ^ :) [21:36:24] * greg-g is just antsy today [21:37:13] 10Beta-Cluster, 10Deployment-Systems: scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' - https://phabricator.wikimedia.org/T94261#1158375 (10hashar) [21:38:18] 10Beta-Cluster, 10Deployment-Systems: scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' - https://phabricator.wikimedia.org/T94261#1158319 (10hashar) IOError: [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' That sounds ve...
[21:39:40] RECOVERY - Puppet failure on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0] [21:44:12] legoktm: dvipng core dumps for some reason :( [21:59:20] 10Continuous-Integration: dvipng spurts coredump on Precise instance - https://phabricator.wikimedia.org/T94273#1158472 (10hashar) 3NEW [21:59:54] 10Continuous-Integration: Small /var partition is filling up due to mysql usage on labs slaves - https://phabricator.wikimedia.org/T94138#1158478 (10hashar) [22:02:24] hey. I can look at the scap perms thing if nobody else it [22:02:27] *is [22:02:55] that would be nice, bd808 [22:02:58] It's probably perms fallout from the change thcipriani made yesterday [22:03:29] which was an attempt to undo the change that made me change all the perms on wednesday [22:03:56] chmod -R 777 * [22:04:15] heh. let's not go there again [22:04:48] 10Beta-Cluster, 10Deployment-Systems: scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' - https://phabricator.wikimedia.org/T94261#1158491 (10bd808) a:3bd808 [22:04:56] 10Beta-Cluster, 10Deployment-Systems: scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' - https://phabricator.wikimedia.org/T94261#1158319 (10bd808) p:5Triage>3High [22:05:11] PROBLEM - SSH on deployment-salt is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:05:23] ohh, good. yeah, if you're not part of the mwdeploy group or root, you'll get a permissions denied in beta now. [22:06:00] files in the dir still have wonky perms [22:07:24] ah, crud, yeah I see that. [22:07:47] the top level directory has the correct ownership, now everything in the directory is still hosed. [22:08:13] yeah. puppet isn't set to recursively manage that. 
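Note: the observation above (top-level directory has the right ownership, contents do not) can be checked by listing every entry whose group differs from the top-level directory's group. This is only a minimal sketch and was not run in the channel; the function name `list_group_mismatches` and its optional second argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every entry under DIR whose group differs
# from the expected group. If no group is given, the numeric gid of DIR
# itself is used as the reference.
# Usage: list_group_mismatches DIR [GROUP_OR_GID]
list_group_mismatches() {
    local dir=$1
    local gid=${2:-}
    if [ -z "$gid" ]; then
        # Fourth column of `ls -nd` is the numeric group id of DIR.
        gid=$(ls -nd "$dir" | awk '{print $4}')
    fi
    # find's -group test accepts a group name or a numeric gid.
    find "$dir" ! -group "$gid" -print
}
```

Under these assumptions, `list_group_mismatches /srv/mediawiki-staging` would print nothing once the whole tree is consistent with the top-level directory.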
[22:08:21] and actually shouldn't be [22:08:30] recursive perms in puppet are evil [22:08:38] yeah, it just does a git "update" on that directory [22:08:46] http://bd808.com/blog/2014/09/30/puppet-file-recurse-pitfall/ [22:09:09] I think that dir should really be chgrp wikidev [22:09:48] I remember reading this right when I first started. [22:09:56] did I have you tell puppet to make it mwdeploy? [22:10:01] wait...wikidev? I thought it was supposed to be mwdeploy? [22:10:07] RECOVERY - SSH on deployment-salt is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [22:10:15] mediawiki-staging should be wikidev, mediawiki should be mwdeploy [22:10:33] wikidev == humans and bot that update the origin [22:10:47] mwdeploy == bot that copies stuff [22:10:59] so the problem there is that there are jenkins postmerge jobs that run as mwdeploy [22:11:12] or, at least, that was the problem I was trying to fix yesterday [22:11:17] hmmm [22:11:19] 10Continuous-Integration, 10Flow, 7Browser-Tests: Delete or fix failed Flow browsertests Jenkins job - https://phabricator.wikimedia.org/T94153#1158516 (10hashar) You can look at each of the four links mentioned in the task detail. They show a build trend with passing/failing builds for the last... [22:11:32] because of the switch to the shared ssh agent right? [22:12:21] mwdeploy isn't a group that I or the jenkins bot belong to [22:12:21] well that was the original purpose of the merge of those roles, I think. Or at least part of the idea. [22:12:48] no, but it runs scripts as mwdeploy...does exec sudo -u etc...
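Note: the "runs scripts as mwdeploy ... exec sudo -u" pattern mentioned just above is a script that re-executes itself as a fixed account when started by anyone else. The sketch below only illustrates the pattern; the helper names `needs_reexec` and `reexec_as` are hypothetical, and `mwdeploy` as the default is taken from this discussion, not from the real script.

```shell
#!/usr/bin/env bash
# Target account for the re-exec; mwdeploy is the value named in the log.
RUN_AS=${RUN_AS:-mwdeploy}

# Hypothetical helper: succeeds (returns 0) when the current user is not
# the target user, i.e. when a re-exec through sudo would be needed.
needs_reexec() {
    local current=$1 target=$2
    [[ "$current" != "$target" ]]
}

# Hypothetical wrapper: replace this process with the same script run as
# the target user. -H sets HOME for the target, -E keeps the environment.
reexec_as() {
    local target=$1
    shift
    if needs_reexec "$(whoami)" "$target"; then
        exec sudo -H -E -u "$target" -- "$0" "$@"
    fi
}

# A script using the guard would call this first (left commented out so
# the sketch has no side effects when sourced or run):
# reexec_as "$RUN_AS" "$@"
```

Dropping the guard, as proposed later in the conversation, would mean the script simply runs as whichever user invokes it.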
[22:12:52] mediawiki-staging on tin is 2775 root:wikidev [22:13:21] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » zh-hans,contintLabsSlave && UbuntuTrusty build #33: SUCCESS in 10 hr: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=zh-hans,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [22:13:27] The problem is that we are using mwdeploy differently in beta cluster then [22:13:50] In prod, mwdeploy is only used on the leaf nodes [22:14:08] to populate /srv/mediawiki via rsync and run things after [22:14:38] /usr/local/bin/wmf-beta-mwconfig-update was the script I was thinking of, fyi. [22:14:42] the work in mediawiki-staging happens as the deployer who is a wikidev member [22:15:58] yeah. that should be running as the jenkins-deploy user [22:16:03] not mwdeploy [22:16:28] sure, but that script has this line: [[ $(whoami) = $RUN_AS ]] || exec sudo -H -E -u $RUN_AS -- "$0" "$@" [22:16:39] where RUN_AS=mwdeploy [22:16:46] Oh I see that. It just doesn't make sense [22:17:16] so, maybe the Right Thing™ would be to change the group to wikidev, then update all those scripts to wikidev...? [22:17:41] it'd be nice to have it be consistent with prod. [22:18:21] jenkins-deploy is the user that should be changing things in /srv/mediawiki-staging [22:18:31] That user is a member of wikidev [22:18:32] except that wikidev is only a group. dang. [22:18:38] as are all the humans [22:18:56] oh, then yeah, get rid of RUN_AS all together. I get it. [22:20:10] yeah. This is blowback from the changes to merge things with prod actually [22:20:26] that script has been running as mwdeploy for a long time [22:20:37] but mydeploy used to be different I'm pretty sure [22:20:44] *mwdeploy [22:21:01] lets think for a minute before randomly whacking things [22:21:08] absolutely it is. I'm not sure of the original permissions. 
If I had to guess it would be 775 mwdeploy:wikidev [22:21:17] note there are some sudo rights set in the wikitech web interface [22:21:37] and I think the deployment group got changed from wikidev to project-deployment-prep fairly recently [22:21:43] o hi hashar o/ [22:21:50] hi!! :) [22:22:06] yes. thcipriani change that back yesterday [22:22:21] but we haven't chmod -R things back [22:22:32] all that pile of users / scripts is really a mess :( [22:22:48] meh. ad hoc automation always is [22:22:54] it's getting better [22:23:02] can we do everything as root again? [22:23:12] * hashar grins [22:23:14] thcipriani: I think you are right about the prior ownership [22:23:35] but I think now we can just get rid of the sudo mwdeploy [22:23:44] after we fix all the perms [22:24:11] 10Continuous-Integration, 10Wikimedia-Hackathon-2015: All new extensions should be setup automatically with Zuul - https://phabricator.wikimedia.org/T92909#1158556 (10Jdlrobson) @SPage the boilerplate is great but some extensions may not choose to use it for some reason... my goal is to not have to waste time... [22:25:24] So I think we should chmod -R jenkins-deploy:wikidev in /srv/mediawiki-staging [22:25:49] and remove the sudo from /usr/local/bin/wmf-beta-mwconfig-update [22:26:12] and then see what user is running /usr/local/bin/wmf-beta-scap [22:26:27] which I would hope in jenkins-deploy [22:26:34] but if not fix that too [22:26:39] looks like that's whomever runs that file [22:26:46] nod [22:26:51] the jenkins-deploy user is the one used on labs to run jenkins scripts [22:27:10] and the beta-scap-eqiad job is fairly simple: /usr/local/bin/scap "$JOB_NAME (build $BUILD_DISPLAY_NAME)" [22:27:33] yeah. It used to be fancier but it got whittled down to that [22:27:40] which is nice [22:27:44] I like when Jenkins jobs are dumb [22:27:49] It used to start the ssh agent [22:28:14] Awesome! 
It's not even used anymore -- https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/configure [22:28:30] note the job should not be changed via the web interface [22:28:36] but via integration/config.git and jjb [22:28:39] although that path to the scap bin should be different [22:28:57] yeah. just checking it there [22:28:59] cause we sometimes refresh all the jobs using jjb, so any hack done via the Jenkins web interface will disappear on refresh [22:29:03] reading jjb isn't always fun [22:29:19] :syntax off is what I learned [22:29:46] bd808: suggestions for improvement welcome as tasks :) [22:29:53] I will be happy to refactor [22:31:03] hashar: :) It's just following the macros when they are used. can be a twisty little maze [22:31:13] and I have being eaten by the grue [22:31:18] *hate [22:31:33] yeah I have been through that pain at one point :( [22:31:52] hashar: hey could you clean up a lingering src/vendor which makes https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-api-tests/6580/console on integration-slave1001 fail? [22:32:08] I feel some parts could probably be made more obvious / simpler [22:32:27] jzerebecki: sure :( [22:32:42] jzerebecki: don't you have access to the slaves? [22:32:51] hashar: i wish [22:33:12] jzerebecki: please file a task to request access on the integration labs project against #continuous-integration project [22:33:17] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [22:33:20] jzerebecki: feel free to add other WMDE people to it as well [22:33:52] hashar: will do. thx. [22:34:28] !log integration-slave1001 rm -fR mwext-Wikibase-repo-api-tests/src/vendor [22:34:32] Logged the message, Master [22:34:47] seems mediawiki/core has a /vendor/ directory now? 
[22:35:19] ah yeah from wmf branches bah [22:36:48] deployment-salt is a slow pig today [22:36:56] bd808 hashar : amended patch from yesterday, should not interfere with any permissions changes, i.e. back to wikidev [22:37:13] thcipriani: oh cool [22:37:14] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [22:37:18] thcipriani: I have noticed the group change by YuviPanda but I cant remember whether I mentionned it on a task :/ [22:37:23] I can re-cherry-pick [22:37:26] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [22:37:26] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [22:37:26] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [22:37:48] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [22:37:54] bd808: please do. Actually makes stuff a bit cleaner, too. [22:37:55] why is the puppet master load so crazy? 
[22:38:03] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0] [22:38:16] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [22:38:19] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [22:38:43] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [22:38:57] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [22:40:01] !log added jzerebecki to the integration labs project as a normal member [22:40:05] Logged the message, Master [22:40:09] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [22:40:09] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [22:40:09] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [22:40:33] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0] [22:40:33] !log integration: created sudo policy allowing members to run any command as jenkins-deploy on all hosts. 
[22:40:35] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [22:40:35] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [22:40:37] Logged the message, Master [22:40:47] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0] [22:40:59] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [22:41:00] jzerebecki: you should be able to connect to the slaves now and sudo as 'jenkins-deploy' which is the user that runs the jobs [22:41:06] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/200248/ and https://gerrit.wikimedia.org/r/#/c/199988/ [22:41:10] Logged the message, Master [22:41:11] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 85.71% of data above the critical threshold [0.0] [22:41:15] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [22:41:22] !log forcing puppet run on deployment-bastion [22:41:25] labs sure has been wacky today. I got an instance without an ec2id earlier. Not even sure how that happens. [22:41:26] Logged the message, Master [22:41:43] ldap burp [22:41:47] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [22:41:47] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [22:42:15] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [22:42:17] thcipriani: iirc the ec2id is provided by a puppet fact which might fail its http request due to random reason. 
It might not always abort as expected by puppet, which ends up with an empty result [22:42:27] thcipriani: might be worth filing a task :) [22:43:02] operations/puppet.git modules/base/lib/facter/ec2id.rb has the code, and it is just a curl :/ [22:43:30] Continuous-Integration, MediaWiki-extensions-General-or-Unknown: Separate BoilerPlate extension from extension/examples - https://phabricator.wikimedia.org/T94279#1158623 (Spage) NEW [22:43:59] !log deployment-bastion: chown -R jenkins-deploy:wikidev /srv/mediawiki-staging/ [22:44:03] Logged the message, Master [22:44:15] hashar: yeah, looks like it's calling out to openstack. [22:44:21] if I had to guess. [22:44:38] Continuous-Integration, Browser-Tests, Patch-For-Review: Run browser tests daily - https://phabricator.wikimedia.org/T94145#1158634 (hashar) >>! In T94145#1157144, @greg wrote: > What does > ``` > @daily > ``` > do? Does it intelligently spread them out over the day or run them all at once or? /me is... [22:45:39] hashar: yes, works. thx. so no ticket anymore? [22:45:52] * bd808 twiddles thumbs while a long chown -R runs [22:45:54] jzerebecki: please file one so other CI folks know about it :) [22:46:10] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 85.71% of data above the critical threshold [0.0] [22:46:11] jzerebecki: and this way we can refer to the task to get more folks added [22:46:19] k [22:46:28] jzerebecki: feel free to create a very basic one such as: please add me to integration labs project [22:46:44] jzerebecki: and we can fill the gaps next week. I am heading to bed in a few [22:47:21] jzerebecki: also, we have a CI triage Monday at 3pm CET. 
You are welcome to join :) [22:47:26] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:47:35] Continuous-Integration: access request to labs project integration - https://phabricator.wikimedia.org/T94280#1158643 (JanZerebecki) NEW [22:48:56] RECOVERY - Puppet failure on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:49:26] hashar: good to know. good night. [22:51:42] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:52:12] !log chown -R l10nupdate:wikidev /srv/mediawiki-staging/php-master/cache/l10n [22:52:16] Continuous-Integration: access request to labs project integration - https://phabricator.wikimedia.org/T94280#1158663 (hashar) Open>Resolved a:hashar I have added @JanZerebecki to the 'integration' labs project. To let him and other regular members interact with the Jenkins workspace, I have created... [22:52:16] Logged the message, Master [22:52:26] jzerebecki: thanks a ton! this way we have some log :) [22:52:56] !log integration: jzerebecki addition and sudo policy tracked for history purpose as {{bug|T94280}} [22:52:57] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [22:53:00] Logged the message, Master [22:53:00] Project beta-mediawiki-config-update-eqiad build #2192: FAILURE in 0.66 sec: https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/2192/ [22:53:19] thcipriani: bd808 good luck to both of you ! [22:53:39] why did that config run fail? [22:53:43] /srv/mediawiki-staging error: cannot open .git/FETCH_HEAD: Permission denied [22:53:44] o/ hashar [22:53:47] RECOVERY - Puppet failure on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [22:53:49] hashar: have a good night! 
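The "Permission denied" on .git/FETCH_HEAD above is the kind of problem an ownership audit can surface. A minimal sketch, assuming a helper named find_mismatched (hypothetical, not part of any tool discussed here) that walks a tree, dot-directories included, and reports entries whose owner or group differs from what is expected. The temporary directory stands in for /srv/mediawiki-staging, and the root directory itself is not checked.

```python
import os
import tempfile


def find_mismatched(root, uid=None, gid=None):
    """List paths under root whose owner or group is not the expected one.

    A uid or gid of None means "do not check". os.walk descends into
    hidden directories such as .git, so dotfiles are covered as well.
    """
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect symlinks themselves
            if uid is not None and st.st_uid != uid:
                mismatched.append(path)
            elif gid is not None and st.st_gid != gid:
                mismatched.append(path)
    return sorted(mismatched)


# Build a tiny stand-in tree containing a .git/FETCH_HEAD file.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, ".git"))
fetch_head = os.path.join(demo_root, ".git", "FETCH_HEAD")
open(fetch_head, "w").close()

me = os.geteuid()
owned_by_me = find_mismatched(demo_root, uid=me)          # nothing to report
owned_by_other = find_mismatched(demo_root, uid=me + 1)   # everything reported
print(owned_by_me, owned_by_other)
```

An empty result for the expected user does not prove a job will succeed, since the job may run as a different user than assumed, which is the question raised next in the log.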
[22:53:50] https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/2192/console [22:54:01] Project beta-code-update-eqiad build #49371: FAILURE in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/49371/ [22:54:05] maybe you need to chown -R the dotfiles as well ? [22:54:14] I did [22:54:23] does jenkins not run as jenkins-deploy? [22:54:23] so that job might be using a sudo :( [22:54:42] the jenkins master ssh to the labs instance has jenkins-deploy [22:54:46] push a bunch of .jar [22:54:58] which starts a client running as jenkins-deploy [22:55:02] from there commands are run [22:55:12] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:55:22] the beta-code-update job runs /usr/local/bin/wmf-beta-mwconfig-update [22:55:34] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0] [22:55:34] RECOVERY - Puppet failure on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:55:46] bah. the change I made isn't there. 
/usr/local/bin/wmf-beta-mwconfig-update [22:55:54] :( [22:56:09] RECOVERY - Puppet failure on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [22:56:18] * bd808 forces another puppet run [22:56:27] stupid puppet templates [22:56:50] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:57:16] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:57:16] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0] [22:57:28] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:58:03] RECOVERY - Puppet failure on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:58:19] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [23:00:11] RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0] [23:00:11] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:00:37] RECOVERY - Puppet failure on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0] [23:00:58] !log restarted puppetmaster [23:01:03] Logged the message, Master [23:01:09] well [23:01:13] RECOVERY - Puppet failure on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0] [23:01:13] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0] [23:01:21] ding dong time to escape or: https://youtu.be/Dyl_nW2n528?t=72 [23:02:31] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:03:01] wtf. 
deployment-bastion isn't picking up puppet changes made on deployment-salt [23:03:34] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #60: FAILURE in 6 min 34 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/60/ [23:05:40] thcipriani: Any idea why changes to operations/puppet on deployment-salt wouldn't be picked up on deployment-bastion? [23:05:50] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [23:05:55] I checked /etc/puppet/puppet.conf and it is pointed there [23:05:58] I'm looking at that...you restarted the puppet master.. [23:06:03] and the puppet run isn't failing [23:06:06] yeah [23:06:17] That was my first wild guess [23:06:33] occasionally I've seen the master not notice on-disk changes [23:07:02] The role is applied on deployment-bastion according to the local log file [23:07:03] are you looking at /usr/local/bin/wmf-beta-mwconfig-update? [23:07:24] seems to be up-to-date on bastion [23:07:29] Yeah. It's right now because I manually changed it :/ [23:07:33] oh [23:07:35] https://gerrit.wikimedia.org/r/#/c/200248/ wasn't applying [23:07:47] but it is in the repo on d-salt [23:08:04] * bd808 is baffled [23:08:22] I'm going to move it out of the way and see if it is recreated [23:08:26] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0] [23:08:40] Continuous-Integration, Release-Engineering, Wikidata, Browser-Tests, Patch-For-Review: browsertest jobs should not be allowed to run for 10 hours - https://phabricator.wikimedia.org/T92275#1158771 (Tobi_WMDE_SW) @hashar & @zeljkofilipin great thanks to you for fixing this! [23:11:39] bd808: looks like that file is part of the beta::autoupdater module class which is included with role::beta::bastion which isn't assigned to deployment-bastion according to ldap. [23:12:44] which is weird.. 
[23:16:11] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » it,contintLabsSlave && UbuntuTrusty build #33: SUCCESS in 11 hr: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=it,label=contintLabsSlave%20&&%20UbuntuTrusty/33/ [23:17:23] but.../var/lib/puppet/client_yaml/catalog/i-0000010b.eqiad.wmflabs.yaml says it is applied :/ [23:17:47] I think I remember YuviPanda|flight killing that role recently though [23:18:25] I'll make sure it is applied directly via wikitech I guess [23:20:18] It's applied now :) [23:20:44] oh fuck [23:20:55] There is another role for the same file :( [23:21:07] !log Duplicate declaration: Git::Clone[operations/mediawiki-config] is already declared in file /etc/puppet/modules/beta/manifests/autoupdater.pp:46; cannot redeclare at /etc/puppet/modules/scap/manifests/master.pp:22 [23:21:11] Logged the message, Master [23:21:57] ah I see [23:21:59] well, that explains why that role wasn't applied anymore I guess [23:22:06] I can fix this [23:23:07] YuviPanda|flight: added the git clone to scap master when he got rid of the scap master role as part of combining deployment_server roles [23:23:26] yeah. sloppy [23:23:59] he abandoned the role without making sure it was empty basically [23:24:12] "beta must be link prod!!!!" [23:24:20] *like [23:24:28] 10Beta-Cluster: wmf-beta-mwconfig-update does not exist on Beta Labs - https://phabricator.wikimedia.org/T94287#1158871 (10Mattflaschen) 3NEW [23:25:36] well, to be fair, there were some big strides made in that direction. Mostly this branch was for staging so that we can spin up and down bastions whenever. [23:26:22] 10Beta-Cluster: wmf-beta-mwconfig-update does not exist on Beta Labs - https://phabricator.wikimedia.org/T94287#1158878 (10Mattflaschen) Breaks deployments to Beta Labs, e.g. https://gerrit.wikimedia.org/r/#/c/200252/ . 
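The "Duplicate declaration" error logged above arises because a resource of a given type and title may be declared only once per catalog. The following toy model, with the hypothetical names Catalog and DuplicateDeclaration, illustrates that rule only; it is not how Puppet is implemented. The file locations are the ones quoted in the logged error.

```python
class DuplicateDeclaration(Exception):
    """Raised when the same (type, title) pair is declared twice."""


class Catalog:
    """Toy catalog that remembers where each resource was first declared."""

    def __init__(self):
        self._declared = {}

    def declare(self, rtype, title, location):
        key = (rtype, title)
        if key in self._declared:
            raise DuplicateDeclaration(
                f"{rtype}[{title}] is already declared in file "
                f"{self._declared[key]}; cannot redeclare at {location}"
            )
        self._declared[key] = location


catalog = Catalog()
catalog.declare(
    "Git::Clone", "operations/mediawiki-config",
    "/etc/puppet/modules/beta/manifests/autoupdater.pp:46",
)
try:
    # A second class declaring the same resource makes the compile fail.
    catalog.declare(
        "Git::Clone", "operations/mediawiki-config",
        "/etc/puppet/modules/scap/manifests/master.pp:22",
    )
    error_message = ""
except DuplicateDeclaration as exc:
    error_message = str(exc)
print(error_message)
```

In this model the fix is the same as in the discussion: only one of the two classes may own the declaration.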
[23:26:36] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [23:27:35] yeah. I'm not knocking the progress; just the aftershocks [23:28:43] !log applied beta::autoupdater directly to deployment-bastion via wikitech interface [23:28:48] Logged the message, Master [23:29:11] That class does a lot of stuff for jenkins integration [23:29:32] arguably it could be moved to the integration module [23:30:47] is there an integration module? [23:31:08] somewhere... [23:31:20] "contint" [23:32:05] Its what sets up the "normal" jenkins slaves [23:32:10] and the master I think [23:34:02] yeah, contint definitely seems like a module with a stronger focus on CI. Beta is just fairly tightly intertwined with it, seemingly. [23:34:19] or at least, beta::autoupdater. [23:34:31] yeah [23:34:57] this is the beta<->jenkins bridge class [23:37:53] yeah, I could see that. [23:39:40] grr.. something not right [23:40:34] I'm getting closer... running puppet with --debug to see what's taking so long [23:43:01] 10Deployment-Systems, 6WMF-Legal, 7Documentation: mediawiki/tools/scap is lacking a license - https://phabricator.wikimedia.org/T94239#1158916 (10bd808) It's a derivative of unlicensed code that was lifted from operations/puppet :( I'd be glad to license my contributions under any OSI license with a prefere... [23:45:16] Yippee, build fixed! [23:45:16] Project beta-mediawiki-config-update-eqiad build #2194: FIXED in 3.6 sec: https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/2194/ [23:45:30] ^ yay! [23:45:43] now to see if scap is fixed too [23:50:48] blerg. not quite [23:51:40] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [23:52:26] do we know about those nutcracker system errors in prod? [23:53:41] Krenair: details? 
[23:53:57] krenair@fluorine:~$ grep "SYSTEM ERROR" /a/mw-log/hhvm.log [23:54:34] bunch of huge "Unable to unserialise" results with text like this at the end: [23:54:57] memcached-serious 2015-03-27 23:35:46 mw1147 enwiktionary: Memcached error for key "enwiktionary:lag_times:db1024:lock" on server "/var/run/nutcracker/nutcracker.sock:0": SYSTEM ERROR [23:55:35] I haven't seen that one before, no [23:56:31] Beta-Cluster: wmf-beta-mwconfig-update does not exist on Beta Labs - https://phabricator.wikimedia.org/T94287#1158939 (Mattflaschen) [23:56:33] Beta-Cluster, Deployment-Systems, Patch-For-Review: scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' - https://phabricator.wikimedia.org/T94261#1158941 (Mattflaschen)