[00:01:45] 10Release-Engineering-Team (Watching / External), 10Scap, 10Operations: Scap: Standardize git version - https://phabricator.wikimedia.org/T179353#3721989 (10thcipriani) Adding @MoritzMuehlenhoff explicitly since IIRC he did the work to add git 2.11 to jessie-backports. [00:02:46] 10Continuous-Integration-Infrastructure (shipyard), 10Operations: wikimedia-jessie & wikimedia-stretch docker images don't have deb-src set for apt.wikimedia.org - https://phabricator.wikimedia.org/T179354#3721999 (10Legoktm) [00:25:45] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0] [00:42:05] (03PS1) 10Legoktm: Add mwgate-npm-node-6-docker [integration/config] - 10https://gerrit.wikimedia.org/r/387499 [00:44:56] (03CR) 10Legoktm: [C: 032] Add mwgate-npm-node-6-docker [integration/config] - 10https://gerrit.wikimedia.org/r/387499 (owner: 10Legoktm) [00:51:51] (03Merged) 10jenkins-bot: Add mwgate-npm-node-6-docker [integration/config] - 10https://gerrit.wikimedia.org/r/387499 (owner: 10Legoktm) [00:53:37] (03PS1) 10Legoktm: Move mwgate-npm over to docker [integration/config] - 10https://gerrit.wikimedia.org/r/387500 [00:53:48] (03CR) 10Legoktm: [C: 032] Move mwgate-npm over to docker [integration/config] - 10https://gerrit.wikimedia.org/r/387500 (owner: 10Legoktm) [01:06:25] I wonder if we should prioritize integration/config in zuul so we can deploy changes when everything is backlogged [01:17:45] (03Merged) 10jenkins-bot: Move mwgate-npm over to docker [integration/config] - 10https://gerrit.wikimedia.org/r/387500 (owner: 10Legoktm) [01:19:26] oh crap [01:19:33] it doesn't have jsduck [01:19:39] !log deployed https://gerrit.wikimedia.org/r/387500 [01:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [01:31:57] RECOVERY - Work requests waiting in Zuul Gearman server 
https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] [01:35:58] (03CR) 10Jdlrobson: "This seems to have broken `make docs` for Minerva and probably MobileFrontend (and others) which blocks merges." [integration/config] - 10https://gerrit.wikimedia.org/r/387500 (owner: 10Legoktm) [01:36:21] (03CR) 10Legoktm: "Yes, I'm nearly done fixing it." [integration/config] - 10https://gerrit.wikimedia.org/r/387500 (owner: 10Legoktm) [01:41:09] (03PS1) 10Legoktm: Install jsduck in npm image [integration/config] - 10https://gerrit.wikimedia.org/r/387505 [01:46:42] jdlrobson: it should work now [01:48:51] (03CR) 10Legoktm: [C: 032] Install jsduck in npm image [integration/config] - 10https://gerrit.wikimedia.org/r/387505 (owner: 10Legoktm) [01:50:53] (03Merged) 10jenkins-bot: Install jsduck in npm image [integration/config] - 10https://gerrit.wikimedia.org/r/387505 (owner: 10Legoktm) [01:53:50] thanks legoktm ill give it a go [02:14:05] Project selenium-QuickSurveys » chrome,beta,Linux,BrowserTests build #588: 04FAILURE in 1 min 5 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/588/ [02:18:46] 10Gitblit-Deprecate, 10Epic, 10MW-1.30-release-notes (WMF-deploy-2017-07-18_(1.30.0-wmf.10)), 10Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3722201 (10TerraCodes) Weird, GitHub's search shows a lot more, tho that could just be due to the se... 
[02:24:35] !log moved mwgate-npm jobs over to docker - https://lists.wikimedia.org/pipermail/wikitech-l/2017-October/089046.html [02:24:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [02:43:10] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T174362#3722240 (10matmarex) [02:43:43] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T174362#3559343 (10matmarex) [03:24:25] 10Continuous-Integration-Infrastructure (shipyard): Create "npm-browser" docker image with npm, xvfb, chromium, and firefox installed - https://phabricator.wikimedia.org/T179360#3722269 (10Legoktm) [03:55:34] (03PS1) 10Legoktm: Remove build-essential after installing jsduck [integration/config] - 10https://gerrit.wikimedia.org/r/387513 [04:00:50] Yippee, build fixed! [04:00:50] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #563: 09FIXED in 4 min 50 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/563/ [04:10:45] 10Continuous-Integration-Infrastructure: Pin jsduck version? - https://phabricator.wikimedia.org/T179362#3722299 (10Legoktm) [04:13:20] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [04:17:09] (03CR) 10Legoktm: [C: 032] "419MB -> 352MB." [integration/config] - 10https://gerrit.wikimedia.org/r/387513 (owner: 10Legoktm) [04:19:07] (03Merged) 10jenkins-bot: Remove build-essential after installing jsduck [integration/config] - 10https://gerrit.wikimedia.org/r/387513 (owner: 10Legoktm) [04:53:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [05:06:47] (03CR) 10Tim Starling: [C: 032] "Looks good. 
Removed Jenkins, Legoktm is planning on fixing the CI config later." [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [05:06:55] (03CR) 10jerkins-bot: [V: 04-1] Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [05:09:30] (03CR) 10Legoktm: [V: 032] Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [05:37:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [10.0] [05:54:54] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [05:55:18] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [05:56:16] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [06:12:48] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [06:17:25] (03PS1) 10Legoktm: Merge tag '1.3.0' into debian [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387518 [06:17:27] (03PS1) 10Legoktm: Bump Standards-Version to 4.1.1 [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387519 [06:17:29] (03PS1) 10Legoktm: Add build dependency upon libopencv-imgcodecs-dev [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387520 [06:17:31] (03PS1) 10Legoktm: uprightdiff (1.3.0-1) unstable; urgency=medium [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387521 [06:17:39] (03CR) 10jerkins-bot: [V: 04-1] Merge tag '1.3.0' into debian [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387518 (owner: 10Legoktm) [06:17:41] (03CR) 10jerkins-bot: [V: 04-1] Bump Standards-Version to 4.1.1 
[integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387519 (owner: 10Legoktm) [06:17:43] (03CR) 10jerkins-bot: [V: 04-1] Add build dependency upon libopencv-imgcodecs-dev [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387520 (owner: 10Legoktm) [06:18:09] (03CR) 10jerkins-bot: [V: 04-1] uprightdiff (1.3.0-1) unstable; urgency=medium [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387521 (owner: 10Legoktm) [06:18:16] ouch [06:20:06] (03CR) 10Legoktm: [V: 032 C: 032] Merge tag '1.3.0' into debian [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387518 (owner: 10Legoktm) [06:20:18] (03CR) 10Legoktm: [V: 032 C: 032] Bump Standards-Version to 4.1.1 [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387519 (owner: 10Legoktm) [06:20:26] (03CR) 10Legoktm: [V: 032 C: 032] Add build dependency upon libopencv-imgcodecs-dev [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387520 (owner: 10Legoktm) [06:20:37] (03CR) 10jerkins-bot: [V: 04-1] Add build dependency upon libopencv-imgcodecs-dev [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/387520 (owner: 10Legoktm) [06:29:53] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [06:30:19] RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0] [06:31:17] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [06:32:34] (03PS1) 10Legoktm: uprightdiff (1.3.0-1) unstable; urgency=medium [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/387523 [06:32:55] (03Abandoned) 10Legoktm: uprightdiff (1.3.0-1) unstable; urgency=medium [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/387523 (owner: 10Legoktm) [06:33:23] (03PS2) 10Legoktm: uprightdiff (1.3.0-1) unstable; urgency=medium [integration/uprightdiff] (debian) - 
10https://gerrit.wikimedia.org/r/387521 [06:51:24] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [06:57:49] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0] [07:31:25] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [07:34:55] PROBLEM - Free space - all mounts on integration-slave-jessie-1004 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1004.diskspace._srv.byte_percentfree (<10.00%) [08:53:39] 10Continuous-Integration-Infrastructure (shipyard), 10Operations: wikimedia-jessie & wikimedia-stretch docker images don't have deb-src set for apt.wikimedia.org - https://phabricator.wikimedia.org/T179354#3722450 (10hashar) package_builder set up some copy on write images for the distribution we support. When... [08:57:11] hashar: I have a failed publication of maven site that I dont understand (https://doc.wikimedia.org/search-highlighter/) the build seems to be successful, but the published directory is empty [08:58:47] if you have any idea... [09:02:02] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:03:57] 10Continuous-Integration-Infrastructure: Pin jsduck version? - https://phabricator.wikimedia.org/T179362#3722454 (10hashar) We used to to have the job running on Trusty and JSDuck was installed with a ruby package. Then on Jessie that is the job that installed it directly (gem install jsduck). Eventually I got i... [09:17:21] hashar: please disregard above comment, problem seems to be on my side [09:17:36] probably between chair and keyboard... I need to get a new chair... 
[09:45:57] 10Continuous-Integration-Infrastructure (shipyard): pip does not seem to cache upstream wheels properly - https://phabricator.wikimedia.org/T179366#3722492 (10hashar) [09:56:16] 10Release-Engineering-Team (Watching / External), 10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3722509 (10Marostegui) 05Open>03Resolved a:03Marostegui Crossposting from T178359#3... [09:56:19] gehel: good morning :) [09:56:28] hashar: hello! [09:57:01] gehel: yesterday Erik B. has made a patch for search/xgboost [09:57:10] which involves being able to choose a different parent pom :] [09:57:27] yep, I followed that one [09:57:29] in this case, the maven job is made to have mvn point to jvm-packages/pom.xml [09:57:50] with a cryptic definition: root-pom: '{obj:root_pom_var}' [09:58:00] which in jjb is a way to say "dont bother setting that value, use whatever default" [09:58:14] so maybe you would need to rely on that for one of the other repos [09:58:26] he also told me that search/xgboost is not ready yet for the "site" goal [09:59:02] that project is soo simple that publishing the maven site is probably not worth the effort [09:59:18] I might have a look at some point if I'm bored... [10:02:35] hashar: I'm not entirely sure what you mean by "so maybe you would need to rely on that for one of the other repos" [10:05:56] gehel: oh I was just pointing out that if one day you need to change the path to pom.xml, there is a way to do it ;) [10:06:14] hashar: Ok, I was wondering if I had to fix something now... [10:06:21] thanks for the info! 
I'll try to remember [10:06:37] but most probably I'll forget and I'll ask you again when I run into the issue :) [10:08:41] ;D [10:19:54] PROBLEM - Free space - all mounts on deployment-mediawiki04 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki04.diskspace.root.byte_percentfree (<10.00%) [10:27:24] 10Continuous-Integration-Infrastructure (shipyard): pip does not seem to cache upstream wheels properly - https://phabricator.wikimedia.org/T179366#3722561 (10hashar) And of course I can not reproduce locally bah. Maybe pip considers them outdated [10:33:00] 10Continuous-Integration-Infrastructure (shipyard): pip does not seem to cache upstream wheels properly - https://phabricator.wikimedia.org/T179366#3722566 (10hashar) After running the job with ZUUL_PIPELINE=postmerge, that triggered castor to save the cache. A next run show that pip managed to use the cached ge... [10:41:14] 10Continuous-Integration-Infrastructure (shipyard): pip does not seem to cache upstream wheels properly - https://phabricator.wikimedia.org/T179366#3722569 (10hashar) Would need to have pip install run with --verbose to get details about the cache. Locally that yields something like (abstract): ``` $ pip uninsta... [10:44:22] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [10:47:30] 10Continuous-Integration-Infrastructure (shipyard): pip does not seem to cache upstream wheels properly - https://phabricator.wikimedia.org/T179366#3722570 (10hashar) Once the pypi url expires: ``` Looking up "https://pypi.python.org/simple/monotonic/" in the cache Current age based on date: 635 Freshness... 
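Note on the pip cache output quoted above: the debug lines compare the cached response's current age (635 in the paste) against a freshness lifetime that is truncated in the log. A minimal sketch of that kind of check, assuming the lifetime comes from a `max-age` directive in the `Cache-Control` header; the helper names and the value 600 in the usage note are hypothetical and are not taken from pip or from the ticket:

```python
def parse_max_age(cache_control):
    """Return the max-age value in seconds from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(current_age, cache_control):
    """A cached response is fresh while its freshness lifetime exceeds its age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and max_age > current_age
```

With a hypothetical header of `max-age=600`, `is_fresh(635, "max-age=600")` is false, so the index page would be looked up again rather than served from the cache.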
[10:50:42] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/387459 (owner: 10Hashar) [10:54:40] 10Continuous-Integration-Infrastructure (shipyard): pip does not seem to cache upstream wheels properly - https://phabricator.wikimedia.org/T179366#3722592 (10hashar) 05Open>03Resolved a:03hashar I guess the cache was simply outdated somehow. [10:56:06] (03PS1) 10Hashar: Fails flake8 [integration/config] - 10https://gerrit.wikimedia.org/r/387549 [10:58:48] (03CR) 10jerkins-bot: [V: 04-1] Fails flake8 [integration/config] - 10https://gerrit.wikimedia.org/r/387549 (owner: 10Hashar) [10:59:18] (03Abandoned) 10Hashar: Fails flake8 [integration/config] - 10https://gerrit.wikimedia.org/r/387549 (owner: 10Hashar) [11:02:58] (03PS1) 10Hashar: Switch integration-config-tox to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387552 [11:07:11] (03PS1) 10Hashar: Fix venv name in integration-zuul-layoutdiff [integration/config] - 10https://gerrit.wikimedia.org/r/387554 [11:07:20] (03CR) 10Hashar: [C: 032] Switch integration-config-tox to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387552 (owner: 10Hashar) [11:08:31] (03Merged) 10jenkins-bot: Switch integration-config-tox to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387552 (owner: 10Hashar) [11:09:40] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/387554 (owner: 10Hashar) [11:12:37] (03CR) 10Hashar: [C: 032] Fix venv name in integration-zuul-layoutdiff [integration/config] - 10https://gerrit.wikimedia.org/r/387554 (owner: 10Hashar) [11:13:55] (03Merged) 10jenkins-bot: Fix venv name in integration-zuul-layoutdiff [integration/config] - 10https://gerrit.wikimedia.org/r/387554 (owner: 10Hashar) [11:19:17] (03PS1) 10Aude: Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/387556 [11:19:23] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [11:36:25] 
(03CR) 10Aude: [C: 032] Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/387556 (owner: 10Aude) [11:36:56] (03Merged) 10jenkins-bot: Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/387556 (owner: 10Aude) [12:22:46] Yippee, build fixed! [12:22:46] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #572: 09FIXED in 45 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/572/ [13:02:02] hasharLunch: thanks for all the phab work, if you could toss some subscribers on tickets from the openstack-browser project page we can probably get license to shut down a few of these out of hand [13:02:08] I bet the hhvm one could be shutdown [13:09:15] chasemp: good morning. Yes I have used watroles/openstack browser to find out admins for the affected projects [13:09:41] hashar: thanks man, not trying to nitpick your good deeds :) [13:10:03] chasemp: it is a nice utility :] kudos to all involved in writing it [13:10:10] that is like 99% bryan [13:10:17] but the cause of this tickets is that some labvirt seems to be overloaded CPU wise :( [13:10:41] last time Andrew rebalanced a few instances here and there [13:10:56] * chasemp nods [13:11:28] then I am not sure why some instances are always at 100% cpu such as the tools-webgrid-lighttpd-* [13:11:38] ah you already replied neat [13:13:52] I'm sort of convinced in a workload agnostic hosting provider situation that overcommit should be fairly conservative [13:21:46] chasemp: at least Andrew made new instances to be allocated on the least loaded labvirt :] [13:22:05] no_justification hi, looks like upstream are going to support openssh native https://gerrit-review.googlesource.com/#/c/gerrit/+/137790/ :) [13:22:07] and seems nova default to an overcommit of * 16 ( cpu_allocation_ratio = 16 ) [13:22:14] yes in theory our fancy scheduling should offset some of this [13:22:21] but my 
confidence in it is not absolute [13:22:33] but that is only for newly created instances though. It is not magically rebalancing instances :( [13:23:16] in theory we could have some compute nodes with a 1/1 ratio for those instances we know require all the cpu [13:23:36] but that is probably going to make it more complicated than it already is [13:24:19] though i doint think we want to use it that way due to security risks probaly. [13:25:43] hashar: no good solutions, only painful ones :) [13:26:08] chasemp: oh and we started migrating some of the CI flow toward Docker containers, though they are still running on labs instance :D [13:26:19] taht's cool and good [13:26:26] I am not sure how much of the flow we will be able to migrate by end of this year though [13:26:37] but that has started! [13:26:44] s/labs/cloud :) [13:26:47] yeah that's awesome [13:26:57] (03PS3) 10Hashar: tox-docker generic job [integration/config] - 10https://gerrit.wikimedia.org/r/387459 [13:26:57] that's a more portable mechanism too [13:28:07] and maybe later we will look at shifting the load out of the WMCS platform to some other hardware [13:28:11] I am not sure really [13:28:25] it might not be a good idea to have several k8s platforms to maintain [13:28:34] I think teh majority of the untethered burden is control plane atm [13:28:48] so moving that out of line w/ the openstack logistics is nice in and of itself [13:30:59] (03CR) 10Hashar: [C: 032] "Creates" [integration/config] - 10https://gerrit.wikimedia.org/r/387459 (owner: 10Hashar) [13:31:08] hashar: fyi andrew is on vacation for a bit, I'll talk to the team about at least doing some rebalancing in teh short term [13:31:15] but I'm unsure what we'll have capacity to change [13:31:18] just a note [13:31:27] as I will also be traveling at ten end of the week [13:31:58] no worries. 
I dont think there is much urgency [13:32:06] * chasemp nods [13:32:13] I just happened to notice the slowness and filled it somewhere to be acted on eventuall [13:32:13] y [13:32:18] (03Merged) 10jenkins-bot: tox-docker generic job [integration/config] - 10https://gerrit.wikimedia.org/r/387459 (owner: 10Hashar) [13:32:30] we can always use more justification for labvirts [13:32:44] because we have been handing out capacity like mad lately [13:32:47] which is good [13:32:54] but growth is always painful [13:33:07] well the lame board I created at https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning is more or less giving some overview [13:33:21] I am not even sure how/why I ended up doing it. Maybe because some jobs ended up randomly slow [13:34:11] on tools labs, there is probably a bunch of code that could use some profiling / tweaks. But that is a never ending fight [13:35:17] s/tool labs/Toolforge :) [13:35:19] yes [13:35:23] undoubtably [13:35:42] in any sane universe a team of 5 people are doing that work only [13:37:08] (03CR) 10Hashar: "check experimental" [tools/releng] - 10https://gerrit.wikimedia.org/r/200240 (https://phabricator.wikimedia.org/T94242) (owner: 10Hashar) [13:37:47] (03CR) 10jenkins-bot: Clarify license is CC0 [tools/releng] - 10https://gerrit.wikimedia.org/r/200240 (https://phabricator.wikimedia.org/T94242) (owner: 10Hashar) [13:40:09] (03PS1) 10Hashar: zuul template for tox-docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/387568 [13:41:23] (03PS1) 10Hashar: mediawiki/tools/releng to tox-docker [integration/config] - 10https://gerrit.wikimedia.org/r/387569 [13:42:07] (03CR) 10Hashar: [C: 032] "Noop in Zuul since the templates are not used yet." 
[integration/config] - 10https://gerrit.wikimedia.org/r/387568 (owner: 10Hashar) [13:43:26] (03Merged) 10jenkins-bot: zuul template for tox-docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/387568 (owner: 10Hashar) [13:44:10] (03CR) 10Hashar: [C: 032] mediawiki/tools/releng to tox-docker [integration/config] - 10https://gerrit.wikimedia.org/r/387569 (owner: 10Hashar) [13:45:22] (03CR) 10Hashar: "check experimental" [integration/dashboard] - 10https://gerrit.wikimedia.org/r/274504 (owner: 10Paladox) [13:45:28] (03Merged) 10jenkins-bot: mediawiki/tools/releng to tox-docker [integration/config] - 10https://gerrit.wikimedia.org/r/387569 (owner: 10Hashar) [13:45:42] (03CR) 10jenkins-bot: flake8: allow lambda expression assignment [integration/dashboard] - 10https://gerrit.wikimedia.org/r/274504 (owner: 10Paladox) [14:01:56] 10Continuous-Integration-Infrastructure (shipyard): tox-docker fails to install MySQL-python: EnvironmentError: mysql_config not found - https://phabricator.wikimedia.org/T179392#3723215 (10hashar) [14:06:08] (03PS1) 10Hashar: wikimedia/fundraising/tools requires mysql_config [integration/config] - 10https://gerrit.wikimedia.org/r/387576 (https://phabricator.wikimedia.org/T179392) [14:08:41] (03PS1) 10Hashar: UploadWizard tox to Docker and delete mwgate-tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/387577 [14:08:53] (03CR) 10Hashar: [C: 032] UploadWizard tox to Docker and delete mwgate-tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/387577 (owner: 10Hashar) [14:08:56] (03CR) 10Hashar: [C: 032] wikimedia/fundraising/tools requires mysql_config [integration/config] - 10https://gerrit.wikimedia.org/r/387576 (https://phabricator.wikimedia.org/T179392) (owner: 10Hashar) [14:10:34] (03PS1) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/387578 [14:10:45] (03CR) 10Hashar: "check experimental" 
[integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/387578 (owner: 10Hashar) [14:11:15] (03CR) 10Hashar: "check experimental" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/376236 (owner: 10Hashar) [14:11:47] (03CR) 10jenkins-bot: Jenkins job validation (DO NOT SUBMIT) [integration/jenkins] - 10https://gerrit.wikimedia.org/r/376236 (owner: 10Hashar) [14:11:56] (03Merged) 10jenkins-bot: wikimedia/fundraising/tools requires mysql_config [integration/config] - 10https://gerrit.wikimedia.org/r/387576 (https://phabricator.wikimedia.org/T179392) (owner: 10Hashar) [14:12:01] (03Merged) 10jenkins-bot: UploadWizard tox to Docker and delete mwgate-tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/387577 (owner: 10Hashar) [14:12:13] chasemp: I promise I will eventually print out a labs > foo reference chart and print it next to my screen :] [14:12:46] 00:00:02.236 rm: cannot remove ‘log/log/tox-0.log’: Permission denied grmblblblblbl [14:34:41] (03PS1) 10Hashar: Migrate more tox jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387582 [14:35:01] (03CR) 10Hashar: [C: 04-2] "-2 because:" [integration/config] - 10https://gerrit.wikimedia.org/r/387582 (owner: 10Hashar) [14:43:47] PROBLEM - Free space - all mounts on deployment-eventlog02 is CRITICAL: CRITICAL: deployment-prep.deployment-eventlog02.diskspace.root.byte_percentfree (<30.00%) [15:12:59] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Documentation for selenium-RelatedArticles-jessie - https://phabricator.wikimedia.org/T179406#3723640 (10zeljkofilipin) [15:13:29] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Documentation for selenium-RelatedArticles-jessie - https://phabricator.wikimedia.org/T179406#3723640 (10zeljkofilipin) p:05Triage>03Normal [15:14:20] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Documentation for selenium-RelatedArticles-jessie - https://phabricator.wikimedia.org/T179406#3723640 
(10zeljkofilipin) [15:14:22] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10Developer-Relations (Oct-Dec 2017), 10User-zeljkofilipin: Tech talk: Selenium tests in Node.js - https://phabricator.wikimedia.org/T171852#3723660 (10zeljkofilipin) [15:14:24] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3723658 (10zeljkofilipin) [15:15:05] well upstream just rewrote the whole layout for gerrit and merged it [15:15:12] * paladox checks to see what plugin breaks now [15:21:24] *sigh* its-phabricator is broken again on master /me looks for a fix [15:28:18] (03PS1) 10Hashar: Fix up tox log permissions with Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387590 [15:28:26] https://gerrit.wikimedia.org/r/387590 Fix up tox log permissions with Docker [15:30:12] (03CR) 10Hashar: [C: 032] Fix up tox log permissions with Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387590 (owner: 10Hashar) [15:31:33] (03Merged) 10jenkins-bot: Fix up tox log permissions with Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387590 (owner: 10Hashar) [15:33:06] fixed it in https://gerrit-review.googlesource.com/#/c/plugins/its-base/+/137851/ [15:33:07] :) [15:34:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%) [16:04:14] PROBLEM - Puppet errors on integration-slave-docker-1005 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [16:08:06] 10Continuous-Integration-Infrastructure (shipyard), 10Operations: wikimedia-jessie & wikimedia-stretch docker images don't have deb-src set for apt.wikimedia.org - https://phabricator.wikimedia.org/T179354#3723800 (10Legoktm) The real reason I was asking is that in our hhvm puppet class we run `apt-get build-d... 
[16:08:12] !log integration: sudo cumin --force 'name:docker' 'rm -fR /srv/jenkins-workspace/workspace/*tox-docker*' [16:08:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:09:52] OHHH [16:10:03] 00:00:35.932 configure: error: in `/tmp/pip-build-fmUOue/pycrypto': [16:10:03] 00:00:35.932 configure: error: no acceptable C compiler found in $PATH [16:10:03] :D [16:10:23] huh [16:10:36] unrelated :D [16:10:36] will look at that one later really [16:11:13] the perms looks right now [16:18:09] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3723835 (10hoo) [16:20:31] (03CR) 10Hashar: [C: 04-2] "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/387582 (owner: 10Hashar) [16:21:42] thcipriani: yeah so setgid + umask solved it THANK YOU! [16:21:52] \o/ [16:21:55] nice [16:22:02] Here’s a silly problem: I need to update the version of pip used inside a virtualenv. pip can’t upgrade the venv/bin/pip, for some reason, although easy_install can. However, we can’t download from production boxes and easy_install is unable to install from a wheel. [16:22:13] fun / not-fun [16:24:36] Maybe I need to distribute a “binary” .egg and easy_install that? [16:26:48] (03CR) 10Hashar: "check experimental" [integration/quibble] - 10https://gerrit.wikimedia.org/r/381436 (owner: 10Hashar) [16:27:16] (03CR) 10jenkins-bot: Fix formatting in README.md [integration/quibble] - 10https://gerrit.wikimedia.org/r/381436 (owner: 10Hashar) [16:27:49] awight: can you upgrade pip inside the virtualenv ? [16:28:27] hashar: I can, but only with easy_install. If I “pip install --upgrade pip”, I’m left with a broken bin/venv/pip. [16:28:32] madness. [16:28:41] awight: virtualenv foo && cd foo && . bin/activate && pip --version && pip install --upgrade pip [16:28:52] Exactly. 
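Note on the "setgid + umask" fix mentioned at 16:21:42 for the tox log permission error: the actual change is in https://gerrit.wikimedia.org/r/387590 and is not reproduced in this log. The following is only a rough sketch of the general idea, assuming a log directory that should stay group-writable; both function names are hypothetical:

```python
import os
import stat


def make_shared_dir(path, mode=0o2775):
    """Create `path` and request the setgid bit so new entries inherit its group."""
    os.makedirs(path, exist_ok=True)
    # makedirs is subject to the umask, so set the mode explicitly afterwards.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


def create_group_writable(path, umask=0o002):
    """Create an empty file under a relaxed umask and return its permission bits."""
    previous = os.umask(umask)
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
    finally:
        os.umask(previous)  # always restore the caller's umask
    return stat.S_IMODE(os.stat(path).st_mode)
```

The setgid bit on the directory keeps new files in the directory's group, and a umask of 002 leaves them group-writable, so another member of that group can later modify them.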
[16:29:01] ahh yeah [16:29:11] 10Beta-Cluster-Infrastructure, 10Deployments, 10Patch-For-Review: mediawiki::users::mwdeploy_pub_key hiera key should be purged - https://phabricator.wikimedia.org/T145495#3723874 (10demon) 05Open>03Resolved a:03demon [16:29:24] because virtualenv set the PATH to bin/ [16:29:28] and pip is copied in there [16:29:45] hashar: What happens is that venv/bin/pip is still version 1.5.6, but the lib has pip 9.0.1 modules, so bin/pip is left borken. [16:29:58] maybe [16:30:23] well on my setup [16:30:27] bin/pip is just a dummy wrapper [16:30:31] I’m only seeing two workarounds: include a pip-9.0.1.egg alongside our wheels (nasty), or reimage the boxes with stretch (too much overhead) [16:30:35] oho [16:30:44] it import pip from the virtualenv lib/python2.7/site-packages [16:31:01] my virtualenv installs pip 0.9.x [16:31:11] and if I do: pip install 'pip<0.9.0' [16:31:14] I get: [16:31:18] $ pip --version [16:31:18] pip 0.8.3 from /tmp/foo/lib/python2.7/site-packages (python 2.7) [16:31:28] so maybe you want to upgrade virtualenv [16:32:02] hashar: fwiw, https://phabricator.wikimedia.org/P6235 [16:32:43] It’s weirder than I thought, thanks for making me take another look. It actually breaks inside the lib itself! [16:33:10] (03CR) 10Hashar: "check experimental" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/376236 (owner: 10Hashar) [16:33:41] awight: hoo and that is python3 so you want to : pip3 install --upgrade pip [16:33:44] I think [16:34:11] (03CR) 10jenkins-bot: Jenkins job validation (DO NOT SUBMIT) [integration/jenkins] - 10https://gerrit.wikimedia.org/r/376236 (owner: 10Hashar) [16:34:30] It shouldn’t make a difference since my venv only has py3, but lemme see... [16:35:43] Yeah same deal. [16:35:54] :( [16:36:30] It’s failing on a python internal, though. [16:36:47] importlib._bootstrap which sounds private [16:37:07] lol > This module is NOT meant to be directly imported [16:37:09] naughty. 
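Note on the virtualenv pip mismatch discussed above: one way to see whether the `bin/pip` launcher and the installed pip library agree is to compare the output of `pip --version` with that of `python -m pip --version`. A minimal sketch that parses lines of the shape pasted at 16:31:18; the function names are hypothetical, and the regular expression assumes that output format:

```python
import re

# Matches e.g. "pip 0.8.3 from /tmp/foo/lib/python2.7/site-packages (python 2.7)"
_PIP_VERSION_RE = re.compile(r"^pip (\S+) from (.+) \(python ([\d.]+)\)$")


def parse_pip_version(line):
    """Split a `pip --version` line into version, install location and Python version."""
    match = _PIP_VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised pip --version output: %r" % line)
    version, location, python = match.groups()
    return {"version": version, "location": location, "python": python}


def same_pip(launcher_line, module_line):
    """True when both lines report the same pip version from the same location."""
    a = parse_pip_version(launcher_line)
    b = parse_pip_version(module_line)
    return (a["version"], a["location"]) == (b["version"], b["location"])
```

If the two lines disagree, the launcher script and the library in `site-packages` are out of sync, which is the situation described at 16:29:45.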
[16:37:10] who knows what kind of craziness is going on really :( [16:39:11] RECOVERY - Puppet errors on integration-slave-docker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [16:42:59] (03CR) 10Hashar: "check experimental" [tools/release] - 10https://gerrit.wikimedia.org/r/387556 (owner: 10Aude) [16:43:33] (03CR) 10jenkins-bot: Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/387556 (owner: 10Aude) [16:47:02] FYI, I’m starting some experiments where I toggle beta-labs ORES between two major versions to test migration sanity. Some errors will probably scroll through. [16:47:12] (03PS2) 10Hashar: Migrate more tox jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387582 [16:50:01] (03CR) 10Hashar: [C: 032] "Validated each of them" [integration/config] - 10https://gerrit.wikimedia.org/r/387582 (owner: 10Hashar) [16:51:31] (03Merged) 10jenkins-bot: Migrate more tox jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/387582 (owner: 10Hashar) [16:51:54] random CI question .. is there any good way to grab a 200M tgz and decompress it so it's available during testing? Specifically this is spark, a framework for data processing in hadoop that needs to be available to run the tests in search/MjoLniR repository (doesnt need a hadoop cluster, it can run single-node standalone). I'm thinking at a minimum i need to host that tarball somewhere [16:52:00] locally? [16:52:16] !log Migrated some tox jobs to Docker via https://gerrit.wikimedia.org/r/387582 [16:52:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:52:33] the actual tests would be running via tox, maybe it could be part of docker overlay fs? :) [16:53:39] ebernhardson: isn't Spark a dependency defined in the maven pom.xml ? [16:53:54] hashar: this is actually python code, via pyspark. 
[16:53:59] ahhh [16:54:05] hashar: it spins up a jvm and talks to it through sockets [16:55:04] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3724003 (10zeljkofilipin) [16:55:06] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10Developer-Relations (Oct-Dec 2017), 10User-zeljkofilipin: Tech talk: Selenium tests in Node.js - https://phabricator.wikimedia.org/T171852#3724001 (10zeljkofilipin) 05Open>03Resolved https://www.youtube.com/watch?v=Q7TT1Joze14 [16:55:20] Hitting something odd with scap, thcipriani. I’m testing ORES upgrade and rollback, but when I try to deploy the older revision (0ad…), I see the correct tree-ish all the way through, but the deployed version is still the new one. [16:55:33] ebernhardson: and pyspark is just the python code to interact with Spark isn't it ? [16:55:35] not a blocker cos I think I can use “-r HEAD” for now, but I thought you might want to know. [16:56:00] ebernhardson: I am migrating repositories to run tox on docker. So I guess we can maybe come up with a way to craft a more specific container that would have spark [16:56:38] hashar: mostly yes, pyspark is a library that allows interaction with spark. It's not installable via pip though, generally the suggested way is to just download a tarball with all the jars/python/etc stuff and use the builtin scripts to stand things up in place without doing a system level install [16:57:25] awight: huh, that's weird. You were trying to deploy with -r 0ad...? Or were you resetting the git repo and just running deploy? [16:57:45] an overlay with decompressed spark in /srv/spark or whatever would probably be perfect. [16:58:14] hmm [16:58:21] i suppose i'll create a ticket, i was originally thinking maybe it would just need to be hosted somewhere and i make a script the does curl ... 
| tar -xf , but if its moving to a docker a more specific container could be much better and waste less time per-run [16:58:26] yeah definitely a task :) [16:58:32] then I guess you can craft a container for CI [16:58:38] integration/config ./dockerfiles/ [16:58:45] thcipriani: Just doing a git checkout and running scap without “-r” anything. The creepy part is that the checked-out version is what is reported in the console output. Not a problem if the logic is set to never roll back without explicitly asking for it, but the reporting needs to jive. [16:59:34] ebernhardson: the image for tox is dockerfiles/tox/Dockerfile , so maybe extend that one with a curl; tar etc. [17:00:08] lol, I ran out of disk space on deployment-sca03.deployment-prep.eqiad.wmflabs. I’ll randomly remove some checkouts. [17:00:08] awight: yeah, scap doesn't really know it's an older revision so it should have Just Worked™. Thanks for the report, I can take a look. [17:00:14] hashar: perfect, thanks! [17:00:29] ebernhardson: but you need docker installed locally. I am unlikely to be able to give it a try this week though. But Tyler might have some time to introduce you to the CI docker build system [17:00:39] thcipriani: ooh interesting. Happy to contribute to a task if it’s helpful. [17:00:49] it is not too hard really. There is a dockerfiles/build.py files that takes as argument a directory holding the Dockerfile [17:00:50] PROBLEM - Free space - all mounts on deployment-sca03 is CRITICAL: CRITICAL: deployment-prep.deployment-sca03.diskspace._srv.byte_percentfree (<50.00%) [17:01:24] ebernhardson: you can try locally based on FROM wmfreleng/tox (they are on dockerhub) [17:01:55] "CI docker build system" tl;dr add a directory with a Dockerfile here: https://github.com/wikimedia/integration-config/tree/master/dockerfiles :P [17:02:11] ;]]]]]] [17:03:04] anyway I am out. 
Will check in later tonight [17:20:52] RECOVERY - Free space - all mounts on deployment-sca03 is OK: OK: All targets OK [17:26:20] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:47:39] no_justification lol i managed to increase its-phabricator, its-base size by 40mb+. Though i've fixed it now :). [17:47:54] it's now back down in size [17:48:04] lesson: never use PLUGIN_DEPS :) [17:48:22] it includes all the deps in gerrit core. [17:58:31] i do like the new status badge in polygerrit :) (by status badge i mean for the change) [17:58:34] 10Release-Engineering-Team (Kanban), 10Epic, 10RelEng FY201718 Q2 Goals: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3724204 (10Jrbranaa) Added first pass changes regarding code stewardship.... [17:59:24] 10Continuous-Integration-Config, 10Page-Previews: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724205 (10Legoktm) [18:00:32] (03PS1) 10Legoktm: Move mediawiki/extensions/Popups back to nodepool npm-test [integration/config] - 10https://gerrit.wikimedia.org/r/387624 (https://phabricator.wikimedia.org/T179425) [18:01:17] RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:03:05] (03CR) 10Legoktm: [C: 032] Move mediawiki/extensions/Popups back to nodepool npm-test [integration/config] - 10https://gerrit.wikimedia.org/r/387624 (https://phabricator.wikimedia.org/T179425) (owner: 10Legoktm) [18:03:14] 10Continuous-Integration-Config, 10Page-Previews, 10Patch-For-Review: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724246 (10pmiazga) @Jdlrobson @Jhernandez @phuedx: any idea? 
[18:04:44] (03Merged) 10jenkins-bot: Move mediawiki/extensions/Popups back to nodepool npm-test [integration/config] - 10https://gerrit.wikimedia.org/r/387624 (https://phabricator.wikimedia.org/T179425) (owner: 10Legoktm) [18:06:04] !log deployed https://gerrit.wikimedia.org/r/387624 [18:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:21:26] 10Continuous-Integration-Config, 10Page-Previews, 10Patch-For-Review: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724352 (10Jhernandez) It seems like QUnit has nothing to do with the problem. From the logs it seems like babel is trying to create a... [18:28:11] 10Continuous-Integration-Config, 10Page-Previews, 10Patch-For-Review: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724363 (10Legoktm) >>! In T179425#3724352, @Jhernandez wrote: > It seems like QUnit has nothing to do with the problem. > > From the... 
[18:33:25] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:33:57] 10Continuous-Integration-Config, 10MediaWiki-extensions-General, 10Patch-For-Review: Disable CI for Semantic Extensions that have gone to github - https://phabricator.wikimedia.org/T152835#3724380 (10Umherirrender) 05Open>03Resolved p:05Triage>03Normal [18:35:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [18:38:32] 10Continuous-Integration-Config, 10MediaWiki-extensions-General, 10Patch-For-Review: Disable CI for Semantic Extensions that have gone to github - https://phabricator.wikimedia.org/T152835#2861593 (10Umherirrender) Looks all done, if some extension was missing, please create a new task in #cleanup [18:39:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:53:19] If anyone wants to feel sorry for us, more git woes: > 18:51:32 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'cluster', 'promote', '--refresh-config'] on ores2005.codfw.wmnet returned [255]: Received disconnect from 10.192.32.173: 2: Too many authentication failures for invalid user deploy-service from 10.64.0.196 port 40092 ssh2 [19:02:14] awight: stop breaking stuff, will ya? 
:P [19:03:04] awight: I think in this instance you do need to set keyholder_key: deploy_service in scap.cfg [19:09:59] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:13:26] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0] [19:17:55] Project selenium-MinervaNeue » firefox,beta,Linux,BrowserTests build #181: 04FAILURE in 28 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/181/ [19:43:54] legoktm: good morning. I got some tox jobs migrated to docker as well :) [19:44:06] hi :) yay! [19:44:08] after a bunch of headhaches related to permissions and figuring out a way to restore the package cache properly [19:44:16] is it using castor? [19:44:22] yes!! [19:44:52] I introduced a few more macros for that [19:50:49] (03PS1) 10Hashar: More specific tox jobs for fundraising/tools [integration/config] - 10https://gerrit.wikimedia.org/r/387649 [19:51:00] (03PS1) 10Jforrester: [MediaViewer] Run jsduck in gate [integration/config] - 10https://gerrit.wikimedia.org/r/387650 [19:51:41] (03CR) 10Jforrester: "Repo issue fixed in I1aebfa88f." 
[integration/config] - 10https://gerrit.wikimedia.org/r/387650 (owner: 10Jforrester) [19:52:07] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/387578 (owner: 10Hashar) [20:02:41] https://github.com/babel/babel/issues/6653#issuecomment-340889166 "join us on slack" -> "self service registration on slack is closed because of spam, please request an invite on github" -.- [20:03:28] 10Continuous-Integration-Config, 10Page-Previews, 10Patch-For-Review: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724574 (10Legoktm) I've also filed an upstream request https://github.com/babel/babel/issues/6653 that they support the XDG basedir sp... [20:04:38] hashar chasemp it seems mysql likes to use alot of cpu :). Even if it's just installed but no one really using it. [20:04:43] by mysql i mean mariadb. [20:06:45] legoktm: muther $*@($(! [20:07:22] it's just sad :/ they're basically giving out invites over twitter DMs [20:07:37] (03PS1) 10Legoktm: Set BABEL_CACHE_PATH in npm image [integration/config] - 10https://gerrit.wikimedia.org/r/387653 (https://phabricator.wikimedia.org/T179425) [20:07:56] (03CR) 10Legoktm: "I haven't had time to test this yet." [integration/config] - 10https://gerrit.wikimedia.org/r/387653 (https://phabricator.wikimedia.org/T179425) (owner: 10Legoktm) [20:08:15] (03CR) 10Hashar: [C: 032] More specific tox jobs for fundraising/tools [integration/config] - 10https://gerrit.wikimedia.org/r/387649 (owner: 10Hashar) [20:11:01] (03Merged) 10jenkins-bot: More specific tox jobs for fundraising/tools [integration/config] - 10https://gerrit.wikimedia.org/r/387649 (owner: 10Hashar) [20:24:02] (03PS1) 10Hashar: Archive labs/migration-assistant [integration/config] - 10https://gerrit.wikimedia.org/r/387656 [20:27:18] legoktm: any clue whether labs/tools/gblrenamemon is still used? 
:) [20:29:30] (03PS1) 10EBernhardson: Run search/mjolnir pyspark tests via tox [integration/config] - 10https://gerrit.wikimedia.org/r/387658 [20:30:52] (03CR) 10EBernhardson: "Tested docker image using:" [integration/config] - 10https://gerrit.wikimedia.org/r/387658 (owner: 10EBernhardson) [20:31:22] hashar: i lost my scrollback, who else was i to ask about potentially reviewing? [20:34:40] (03PS1) 10Hashar: Migrate to tox jobs to Docker (labs ones) [integration/config] - 10https://gerrit.wikimedia.org/r/387659 [20:35:02] ebernhardson: thcipriani most probably :) [20:35:20] perfect, thanks! [20:36:10] ebernhardson: usually we have the jenkins job refer to a specific docker version. Running build.py --update-jjb should then take care of updating the files [20:36:39] (03CR) 10Hashar: [C: 032] "All tested" [integration/config] - 10https://gerrit.wikimedia.org/r/387659 (owner: 10Hashar) [20:36:41] (03CR) 10Hashar: [C: 032] Archive labs/migration-assistant [integration/config] - 10https://gerrit.wikimedia.org/r/387656 (owner: 10Hashar) [20:36:57] hashar: wouldn't it nede to be pushed somewhere though? I suppose i didn't check but assumed i don't randomly have rights to push new images [20:37:50] a few of us have the credentials to push to dockerhub wmfreleng/ namespace [20:39:39] (03Merged) 10jenkins-bot: Archive labs/migration-assistant [integration/config] - 10https://gerrit.wikimedia.org/r/387656 (owner: 10Hashar) [20:39:41] (03Merged) 10jenkins-bot: Migrate to tox jobs to Docker (labs ones) [integration/config] - 10https://gerrit.wikimedia.org/r/387659 (owner: 10Hashar) [20:41:48] Yippee, build fixed! [20:41:49] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #565: 09FIXED in 48 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/565/ [20:41:50] Yippee, build fixed! 
[20:41:51] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #565: 09FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/565/ [20:44:18] (03PS2) 10EBernhardson: Run search/mjolnir pyspark tests via tox [integration/config] - 10https://gerrit.wikimedia.org/r/387658 [20:46:57] ebernhardson: you are such a hacker :))) [20:47:08] (03CR) 10Hashar: Run search/mjolnir pyspark tests via tox (035 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/387658 (owner: 10EBernhardson) [20:47:15] ebernhardson: bah I did a review of PS 1 :( [20:47:34] hashar: yea i was reviewing and saw some silly things too. But i only fixed 2 things, so 5 means i have things to do still :) [20:48:11] well a couple of my comments are "OH YEAH" and "+1" [20:48:28] thcipriani: I remember something about this from last week, just checking, isn’t the keyholder_key thing a workaround and should already be the default? [20:48:37] :) [20:49:05] ebernhardson: and one is about s/mv/cp --recursive/ https://gerrit.wikimedia.org/r/#/c/387658/1/dockerfiles/tox-pyspark/run.sh [20:49:19] ebernhardson: which Tyler and I found out while you were crafting your patch [20:49:30] awight: unfortunately it is only the default in the next release. I didn't anticipate needing it in this release cycle, but maybe there's a new key in keyholder... [20:49:51] ahh. I'd love a way to reuse the existing run file, but i didn't see a straight forward way to add the extra environment variable and have is passthru [20:49:55] thcipriani: Cool, I’ll just leave a deprecation comment in my scap.cfg [20:49:58] would keep down duplication though [20:50:38] ebernhardson: which env var ? 
[20:51:20] hashar: the difference between the two files is basically adding the SPARK_HOME environment var, and adjusting TOX_TESTENV_PASSENV to include SPARK_HOME [20:51:53] hmm [20:52:11] yeah tox has been made to start with an empty environment [20:52:30] maybe the tox image can somehow be made to support injecting variables to tox [20:52:51] that might be as simple as crafting a tox.env file which would then be processed by the tox image run.sh [20:54:22] or we can have tox to pass all env variables with TOX_TESTENV_PASSENV='*' [20:55:20] thcipriani: Not sure I got that comment character correct. “#” ? [20:55:34] I made this for you :D. https://gerrit.wikimedia.org/r/#/c/387665/1 [20:55:51] hashar: hmm, a tox.env file seems plausible. lemme see if i can make that work [20:56:56] ebernhardson: but most probably we should let tox accept any files [20:56:59] err [20:57:04] any env variable [20:57:37] hmm, yea the environment here is already pretty restrictive, passing it along seems reasonable [21:05:18] (03PS1) 10Hashar: docker: pass all env variables to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387682 [21:05:36] ebernhardson: ^^ but I have not tested it [21:05:56] (03CR) 10Hashar: "For https://gerrit.wikimedia.org/r/#/c/387658/" [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [21:07:17] hashar: i'll test it, thanks! [21:19:07] I am getting some weird permission denied errors on php7-docker CI run: https://integration.wikimedia.org/ci/job/composer-package-php70-docker/80/console [21:19:10] known issue? [21:20:29] legoktm: ^ :( [21:20:37] SMalyshev: more or less. It is still a bit experimental [21:20:47] and there are a few issues with uid between the host and the containers [21:20:55] hashar: any workaround? 
[21:21:31] recheck, hope that the build happens on another host [21:21:35] will try to look at it [21:21:57] ok, got it, will try recheck for now [21:23:39] SMalyshev: and the task to create that job is https://phabricator.wikimedia.org/T144961 [21:24:37] the one that fails is composer-package-php70-docker [21:24:40] is that the same? [21:28:14] SMalyshev: yeah that is related [21:28:37] ok will be patient then [21:28:42] SMalyshev: it is probably just a leftover directory with wrong permissions [21:28:47] I am going to check them all / delete as needed [21:28:58] hopefully I can get 2 runs in a row where it doesn't fail :) [21:30:02] SMalyshev: your chance are exactly 3/5 that it is going to pass :) [21:32:08] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#3724772 (10hashar) ``` + rm -rf log + mkdir -m 2777 -p log + rm -rf src rm: cannot remove ‘src/vendor/phpspec/prophecy/README.md’: Per... [21:32:31] !log T144961 : sudo cumin --force 'name:docker' 'rm -fR /srv/jenkins-workspace/workspace/composer-package-php70-docker/*' [21:32:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:32:35] T144961: Create composer-php70 job - https://phabricator.wikimedia.org/T144961 [21:32:54] awight: lgtm, ; and # are comment chars, just uses standard python2 ConfigParser [21:33:14] thcipriani: ty! [21:34:23] !log T144961 : sudo cumin --force 'name:docker' 'rm -fR /srv/jenkins-workspace/workspace/composer-*php70*' [21:34:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:34:31] SMalyshev: should be good now [21:35:05] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#3724792 (10hashar) Should be good. 
Maybe we need a preflight check to ensure cache/log/src are all fine. [21:38:41] (03PS1) 10Hashar: docker: add dev dependencies to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387723 [21:40:39] thcipriani: keyholder_key tweak seems to be holding its ground, thanks again! [21:40:59] awight: glad to hear it. Sorry for all the trouble you've been running into :( [21:41:38] (03PS1) 10Hashar: dockerfile: add libmysqlclient-dev to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387728 (https://phabricator.wikimedia.org/T179392) [21:41:50] not at all—I only recently came to understand that I’m in the tech industry only because computers *never* do what I ask them to. [21:42:02] Happy to crash test scap any time :p [21:42:05] :D [21:43:30] thcipriani: lol: > 21:43:13 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'cluster', 'fetch', '--refresh-config'] on ores2003.codfw.wmnet returned [255]: Permission denied (publickey,keyboard-interactive). [21:43:36] gtg for now [21:54:32] hashar: thanks! [21:58:26] (03CR) 10Hashar: [C: 04-1] "wikimedia/fundraising/tools does try to connect to a mysql server but the tox image doesn't ship any :]" [integration/config] - 10https://gerrit.wikimedia.org/r/387728 (https://phabricator.wikimedia.org/T179392) (owner: 10Hashar) [22:01:42] Gerrit should support gpg keys without setting receive.enableSignedPush [22:02:55] SMalyshev: you are welcome and thanks for the report :] [22:09:26] (03CR) 10EBernhardson: Run search/mjolnir pyspark tests via tox (033 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/387658 (owner: 10EBernhardson) [22:09:55] (03PS3) 10EBernhardson: Run search/mjolnir pyspark tests via tox [integration/config] - 10https://gerrit.wikimedia.org/r/387658 [22:10:30] (03CR) 10EBernhardson: [C: 031] "tested with the followup search/mjolnir patch, works as expected." 
[integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:11:40] ebernhardson: awesome [22:14:31] !log docker push wmfreleng/tox:v2017.10.31.21.03 | for ebernhardson / https://gerrit.wikimedia.org/r/#/c/387682/ [22:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:15:59] (03CR) 10Hashar: [C: 032] "Thank you Erik." [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:16:34] ebernhardson: but I am surprised maven doesn't have a "tox" goal yet :] [22:18:46] (03CR) 10jerkins-bot: [V: 04-1] docker: pass all env variables to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:18:56] grblblbl [22:19:32] :) [22:20:13] I have pushed the image / update the job [22:20:22] and seems XDG_CACHE_HOME is NOT passed :/ [22:20:27] :S [22:20:41] SPARK_HOME was certainly passed [22:21:15] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Performance-Team, 10Availability (Multiple-active-datacenters): Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep - https://phabricator.wikimedia.org/T151466#3724889 (10aaron) So, running mcrouter via... 
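Note on the earlier scap.cfg exchange (keyholder_key, and ";" and "#" as comment characters): a minimal sketch using Python 3's configparser to show both prefixes being skipped as full-line comments. The log says scap uses the python2 ConfigParser, so the Python 3 module here is a stand-in; the [global] section name and the comment wording are assumptions for illustration, and only keyholder_key: deploy_service comes from the conversation.

```python
import configparser

# Hypothetical scap.cfg fragment; the section name is an assumption.
SAMPLE = """\
[global]
; Workaround: expected to become the default in the next scap release.
# Both ';' and '#' start a full-line comment.
keyholder_key: deploy_service
"""


def read_option(text, section, option):
    """Parse INI-style text and return one option's value."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get(section, option)


if __name__ == "__main__":
    print(read_option(SAMPLE, "global", "keyholder_key"))
```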
[22:22:04] hm [22:22:05] https://integration.wikimedia.org/ci/job/integration-config-tox-docker/95/artifact/log/zuul_tests-0.log/*view*/ [22:22:07] it is there :) [22:27:55] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Performance-Team, 10Availability (Multiple-active-datacenters): Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep - https://phabricator.wikimedia.org/T151466#3724922 (10aaron) [22:30:51] grbmbmbl tox [22:35:05] 10Continuous-Integration-Config, 10Page-Previews, 10Patch-For-Review, 10Readers-Web-Kanban-Board, 10Unplanned-Sprint-Work: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724950 (10Jdlrobson) Pulling in to sprint for visibility since it impacted th... [22:35:45] 10Continuous-Integration-Config, 10Page-Previews, 10Readers-Web-Backlog, 10Patch-For-Review, and 2 others: mediawiki/extensions/Popups runs qunit in normal "npm test" - https://phabricator.wikimedia.org/T179425#3724953 (10Jdlrobson) [22:38:02] 10Release-Engineering-Team, 10Librarization, 10MinervaNeue, 10MobileFrontend, 10Readers-Web-Backlog: Move MobileFrontend/Minerva's svg_check.sh script into a reusable, separate library - https://phabricator.wikimedia.org/T179361#3724966 (10Jdlrobson) I agree, this would be very useful. We considered this... 
[22:38:26] 10Release-Engineering-Team, 10Librarization, 10MinervaNeue, 10MobileFrontend, 10Readers-Web-Backlog: Move MobileFrontend/Minerva's svg_check.sh script into a reusable, separate library - https://phabricator.wikimedia.org/T179361#3724968 (10Jdlrobson) p:05Triage>03Normal [22:45:41] AHH [22:47:05] (03CR) 10Hashar: [C: 032] "We run tox 2.5.0" [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:48:15] (03CR) 10jerkins-bot: [V: 04-1] docker: pass all env variables to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:49:00] ebernhardson: seems to be an issue in tox 2.5.0, going to bump [22:50:32] yeahhhhh [22:53:34] !log docker push wmfreleng/tox:v2017.10.31.22.51 ( tox 2.6.0 https://gerrit.wikimedia.org/r/#/c/387682/ ) [22:53:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:54:16] (03PS2) 10Hashar: docker: pass all env variables to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387682 [22:56:04] (03CR) 10Hashar: [C: 032] "Had to bump tox to 2.6.0 to support a wildcard in TOX_TESTENV_PASSENV" [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:57:24] (03Merged) 10jenkins-bot: docker: pass all env variables to tox [integration/config] - 10https://gerrit.wikimedia.org/r/387682 (owner: 10Hashar) [22:57:59] "Ivy, Maven and Gradle each have their own dependency cache" [22:58:01] ahh [22:58:13] so Ivy is yet another package manager grbuubuee [23:04:37] hashar: sadly, yes. And ivy is built into spark deeply as the dependency manager ... it will just pull from archiva.wikimedia.org (internal network) so might not be a big deal [23:05:00] at least, for mjolnir it will pull from archiva, and shuld for anything else that uses it since ops requires that stuff running in our networks after merge to come from our networks [23:08:06] Yippee, build fixed! 
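Note on the TOX_TESTENV_PASSENV discussion above: a minimal sketch of how a wrapper could extend the list of variables tox passes into its test environments, either by naming them (SPARK_HOME) or with the "*" wildcard that, per the log, needed tox 2.6.0. The function name is hypothetical and this is not the actual run.sh from integration/config; it assumes the variable holds a space-separated list of names.

```python
import os


def tox_environment(extra_passenv, base_env=None):
    """Return a copy of the environment with names appended to TOX_TESTENV_PASSENV."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("TOX_TESTENV_PASSENV", "").split()
    # Keep names already listed and append only the ones that are missing.
    merged = existing + [name for name in extra_passenv if name not in existing]
    env["TOX_TESTENV_PASSENV"] = " ".join(merged)
    return env


if __name__ == "__main__":
    env = tox_environment(["SPARK_HOME"])
    print(env["TOX_TESTENV_PASSENV"])
```

A caller would then start tox with something like subprocess.call(["tox"], env=tox_environment(["SPARK_HOME"])), with SPARK_HOME itself already set in the environment.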
[23:08:06] Project selenium-MinervaNeue » firefox,beta,Linux,BrowserTests build #182: 09FIXED in 28 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/182/ [23:08:54] (03CR) 10Hashar: "That is all great but I will try to get it to run as user nobody for consistency and solely rely on /cache." [integration/config] - 10https://gerrit.wikimedia.org/r/387658 (owner: 10EBernhardson) [23:09:06] :) [23:09:52] ebernhardson: that is http://www.scala-sbt.org/ isn't it ? [23:10:17] hashar: well, kinda sorta. Spark is built with sbt. The ivy usage in spark though is a direct integration by calling Ivy classes [23:10:32] my previous city had folks working for Zegularity / Play Framework ( https://en.wikipedia.org/wiki/Play_Framework ) [23:10:55] so there is a class SparkSubmit, it reads some configuration variables and news up appropriate ivy classes and then uses them to grab dependencies as part of the setup process [23:11:07] do you think we could get ivy2 to write cache to /cache/ivy2 instead of $HOME/.ivy2 ? [23:11:25] i can pass an explicit cache directory as part of mjolnir/test/conftest.py in the spark_context method, i just have to define some os envrionment variable to expect it in [23:11:42] IVY_CACHE_DIR ? [23:11:54] sounds like a plan ? 
:) [23:12:04] the idea is to try to have all package managers to write down to /cache/ [23:12:16] (which is defined in our docker images via XDG_CACHE_HOME=/cache/ ) [23:12:30] ok, i can use XDG_CACHE_DIR i suppose [23:12:46] then when a build has been triggered by a +2 or after a merge, the whole /cache ends up being stored to some place, and the next build will have it restored [23:12:58] or yeah XDG_CACHE_HOME + ivy2 [23:13:05] or whatever other name really :] [23:13:22] the idea is that at https://gerrit.wikimedia.org/r/#/c/387658/3/dockerfiles/tox-pyspark/example-run.sh [23:13:33] you mount /cache/ivy2 to /home/testrunner/.ivy2 [23:13:45] it is probably fine but it is late and I am not so sure how well it is going to work :) [23:14:10] (also the tox-docker job just -v "$(pwd)/cache:/cache" [23:15:13] there is also a comment about "does not test python code with the jvm code". I am sure that is fixable [23:15:23] maybe by running tox in the maven job [23:15:46] we would probably need a dedicated job though [23:16:49] hashar: no worries, go to bed. [23:17:11] hashar: yea i'm sure i could find a way to build the jar and use it for testing, but tbh its the python code that changes, the jvm code is small and pretty much static [23:17:24] things only end up in the jvm code because pyspark doesn't have the right things available. try and avoid it [23:17:44] ah [23:17:54] so it is not ideal, but not a concern either ) [23:18:03] yea not particularly worried about it [23:20:18] ebernhardson: anyway thanks for the pom_parent hack :) i like it a lot [23:20:28] I am going to rest. 
Happy hacking [23:20:32] \o [23:35:19] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:49:34] 10Release-Engineering-Team, 10Librarization, 10MinervaNeue, 10MobileFrontend, 10Readers-Web-Backlog: Move MobileFrontend/Minerva's svg_check.sh script into a reusable, separate library - https://phabricator.wikimedia.org/T179361#3725175 (10Legoktm) You can just have two commands: * `svg-check check image...