[00:10:31] Beta-Cluster-Infrastructure, Multimedia, Thumbor, Multimedia-Team-Working-Board: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3431671 (ABorbaWMF) Looks good now on Beta. Tried a couple of articles and some different browsers.
[00:12:00] !log jenkins now configured and running at https://releases.wikimedia.org/ci/ (T164030) - but needs additional admin users and puppet is still disabled for temp hack fix
[00:12:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:12:05] T164030: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030
[00:12:11] Beta-Cluster-Infrastructure, Multimedia, Thumbor, Multimedia-Team-Working-Board: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3645146 (ABorbaWMF) Open>Resolved Forgot to resolve
[00:19:08] !log releases1001 - created user for "no_justification", dropped pass in home dir
[00:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:29:52] Release-Engineering-Team, Operations, vm-requests, Security-General: New ganeti VM for MW release pipeline work - https://phabricator.wikimedia.org/T163743#3645173 (Dzahn)
[00:29:54] RelEng-Archive-FY201718-Q1, Operations, Patch-For-Review, Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3645172 (Dzahn) Resolved>Open
[00:31:11] RelEng-Archive-FY201718-Q1, Operations, Patch-For-Review, Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3218909 (Dzahn) https://releases.wikimedia.org/ci/ is now usable. I went through the setup wizard, it said it was...
[00:31:51] RelEng-Archive-FY201718-Q1, Release-Engineering-Team (Kanban), Operations, Patch-For-Review, Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3645178 (Dzahn) a:Dzahn>demon
[01:16:25] legoktm: https://grafana.wikimedia.org/dashboard/db/ci-docker-jobs
[01:16:25] bed now
[01:16:48] * legoktm hugs addshore
[01:16:50] night!
[01:46:41] 10:20:25 [137.5MB/2.41s] Package etsy/phan is abandoned, you should avoid using it. Use phan/phan instead.
[02:16:06] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T172806#3645252 (Mattflaschen-WMF)
[04:03:34] (PS1) Legoktm: docker: Rename php images to php70 [integration/config] - https://gerrit.wikimedia.org/r/381377
[04:13:06] (PS2) Legoktm: docker: Rename php images to php70 [integration/config] - https://gerrit.wikimedia.org/r/381377
[04:28:14] (PS1) Legoktm: Add experimental "composer-package-php70-docker" job [integration/config] - https://gerrit.wikimedia.org/r/381378
[04:30:35] (CR) jerkins-bot: [V: -1] Add experimental "composer-package-php70-docker" job [integration/config] - https://gerrit.wikimedia.org/r/381378 (owner: Legoktm)
[04:34:46] (PS2) Legoktm: Add experimental "composer-package-php70-docker" job [integration/config] - https://gerrit.wikimedia.org/r/381378
[04:58:41] (PS1) Legoktm: Add base "npm" docker image [integration/config] - https://gerrit.wikimedia.org/r/381384
[04:59:24] addshore: ok, turns out npm was easier than I thought
[05:12:15] also, something really weird
[05:12:16] https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-docker/26/console
[05:12:20] 05:05:38 Generating optimized autoload files
[05:12:20] 05:07:05 > ComposerHookHandler::onPreUpdate
[05:12:27] then later
[05:12:32] 05:08:15 Generating optimized autoload files
[05:12:32] 05:10:22
............................................................ 60 / 2544 (2%)
[05:12:38] so it took 2 minutes for composer to generate the autoloader
[05:28:26] Release-Engineering-Team (Kanban), Release Pipeline: Install Blubber on contint1001 - https://phabricator.wikimedia.org/T175296#3645330 (Joe) It should be built on boron (or copper) and added to the main component IMO. I reviewed the packaging and it seems ok, although I think we might want to add a def...
[05:28:50] Release-Engineering-Team (Kanban), Operations, Release Pipeline, User-Joe: Install Blubber on contint1001 - https://phabricator.wikimedia.org/T175296#3645331 (Joe)
[05:31:37] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[05:34:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[05:41:41] unsurprisingly, once we start running multiple jobs on the same docker slave, things start slowing down
[05:47:26] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645351 (Legoktm)
[05:59:03] !log marking integration-slave-docker-1001 as offline - T177039
[05:59:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[05:59:07] T177039: mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039
[06:01:16] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645383 (Legoktm)
[06:06:45] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs
running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645395 (Legoktm) While not nearly as obvious, you can sometimes see operations-puppet job...
[06:11:37] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:14:52] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:16:00] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<10.00%)
[06:32:36] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:57:16] (PS3) Legoktm: Add experimental "composer-package-php70-docker" job [integration/config] - https://gerrit.wikimedia.org/r/381378
[06:57:18] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[06:57:18] (PS1) Legoktm: Use Debian stretch for php70 images [integration/config] - https://gerrit.wikimedia.org/r/381392
[07:06:05] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:06:27] Gerrit, Wikidata, Patch-For-Review, User-Ladsgroup, and 2 others: [Task] Move DataTypes repository from Github to gerrit - https://phabricator.wikimedia.org/T127292#3645416 (WMDE-leszek) @thiemowmde: regarding the description on the github mirror https://github.com/wikimedia/mediawiki-extensions-...
[07:12:38] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:33:40] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 3 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3645439 (thiemowmde)
[07:36:53] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 3 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3645456 (thiemowmde)
[07:37:24] Gerrit, Wikidata, Patch-For-Review, User-Ladsgroup, and 2 others: [Task] Move DataTypes repository from Github to gerrit - https://phabricator.wikimedia.org/T127292#3645458 (thiemowmde) I created T177042 and assigned @hashar.
[07:38:00] Release-Engineering-Team (Watching / External), Wikidata: Decide what to do with Wikibase JS-only libraries regarding the build/deployment of Wikidata code - https://phabricator.wikimedia.org/T174922#3645460 (WMDE-leszek) So we've decided to go with option 3 over the option 5 to avoid the issues @JeroenD...
[07:38:08] Release-Engineering-Team, Wikidata, Epic, User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3645462 (WMDE-leszek)
[07:38:12] Release-Engineering-Team (Watching / External), Wikidata: Decide what to do with Wikibase JS-only libraries regarding the build/deployment of Wikidata code - https://phabricator.wikimedia.org/T174922#3645461 (WMDE-leszek) Open>Resolved
[07:39:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[07:40:09] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 3 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3645463 (WMDE-leszek) the list seems to contain some irrelevant items, e.g. https://github.com/wikim...
[07:45:40] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 3 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3645469 (thiemowmde) These are not "irrelevant", just not in use at the moment. I would like to disab...
[07:48:10] Gerrit, Release-Engineering-Team (Backlog), Zuul: Update zuul to upstream master - https://phabricator.wikimedia.org/T158243#3645472 (Paladox) This https://github.com/openstack-infra/zuul/commit/01740bd14778cfc2632804522d4548fac57efff8 will require us to update our layout.yaml file but i doint think...
[08:04:59] Gerrit: jenkins-bot ran gate-and-submit and voted V+2, but did not submit (merge) the change - https://phabricator.wikimedia.org/T155558#3645484 (Paladox) This may be fixed with https://github.com/GerritCodeReview/gerrit/commit/29fd1f7364a96e6bc021d71240be9d390978ba88
[08:54:50] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[09:02:01] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:09:55] (PS1) Addshore: docker: quiet fetch in mediawiki-phpcs [integration/config] - https://gerrit.wikimedia.org/r/381413
[09:10:29] !log wm-ci-docker-push mediawiki-phpcs:v2017.09.29.09.08 & latest https://gerrit.wikimedia.org/r/381413
[09:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:10:40] (CR) Addshore: [C: 2] docker: quiet fetch in mediawiki-phpcs [integration/config] - https://gerrit.wikimedia.org/r/381413 (owner: Addshore)
[09:11:08] morning hashar
[09:11:25] I made another ticket :P https://phabricator.wikimedia.org/T177031
[09:14:25] (Merged) jenkins-bot: docker: quiet fetch in mediawiki-phpcs [integration/config] - https://gerrit.wikimedia.org/r/381413 (owner: Addshore)
[09:14:25] Continuous-Integration-Infrastructure (shipyard), User-Addshore: phan can fail to allocate memory on current docker hosts - https://phabricator.wikimedia.org/T177031#3645583 (Addshore)
[09:14:28] Continuous-Integration-Infrastructure, Release-Engineering-Team: Composer failed in Selenium job but job didn't stop - https://phabricator.wikimedia.org/T177047#3645584 (Aleksey_WMDE)
[09:14:33] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645597 (Addshore) So before I look into this too much I'll say
that I ran into my lintr j...
[09:27:04] addshore: yeah that is going to hit us :(
[09:27:21] specially the docker slaves allow up to two executors
[09:33:54] !log rebooting integration-slave-docker-1001
[09:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:36:03] Continuous-Integration-Infrastructure (shipyard), User-Addshore: phan can fail to allocate memory on current docker hosts - https://phabricator.wikimedia.org/T177031#3645685 (hashar) The docker slaves have two executors and are using the `c1.medium` flavor: | vCPUs | 2 | RAM | 2GB For lightweight jobs th...
[09:36:12] addshore: yeah we need large ones https://phabricator.wikimedia.org/T177031#3645685 :D
[09:36:22] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645686 (Addshore) So, as far as I can tell this is probably due to more memory being used...
[09:36:46] I asked for a c1.medium to save up some memory usage on the openstack infra. 2GB is usually enough for small jobs
[09:36:54] but anything stronger definitely requires moaar memory
[09:37:20] I am tempted to keep some small ships like now
[09:37:26] can I have a shot a provisioning 1 today? :) I mean, we could definitely tag the different slaves too and only allow smaller jobs to run in smaller places?
[09:37:33] and add larger ones with a specific label
[09:37:50] well the smaller jobs can surely roam among all the instances
[09:37:55] so you get eg:
[09:38:09] small instances label: DebianJessieDocker
[09:38:20] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[09:38:25] big instances: DebianJessieDocker BunchOfMemory
[09:38:39] and for the phan job one would: node: DebianJessieDocker && BunchOfMemory
[09:38:47] a light job would just: node: DebianJessieDocker
[09:38:50] and would run on any
[09:39:08] we can also apply to the docker slave the name of the openstack flavor. That might make it easier
[09:39:14] Release-Engineering-Team (Kanban), Operations, Release Pipeline, User-Joe: Install Blubber on contint1001 - https://phabricator.wikimedia.org/T175296#3645691 (akosiaris) >>! In T175296#3645330, @Joe wrote: > It should be built on boron (or copper) and added to the main component IMO. Either `mai...
[09:39:16] (eg: node: DebianJessieDocker && m1.xlarge )
[09:39:16] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645692 (Addshore) After a reboot of 1001: ``` addshore@integration-slave-docker-1001:~$...
[09:39:41] the provisionning should be straightforward: create an instance, apply whatever puppet role is needed, add to jenkins. Success!?
[09:39:41] it looks like there is also an issue with the slaves chewing up all the memory ^^
[09:39:54] hashar! indeed! but i havnt done that yet ;)
[09:40:43] addshore: ahhh is 1002 having the same mem issue?
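The labelling idea above (every docker slave carries DebianJessieDocker, the big ones additionally carry BunchOfMemory, and a heavy job asks for both with `&&`) can be sketched as a small set-membership check. This is only an illustration in Python: `node_matches`, `eligible`, and the node names `small` and `big` are hypothetical, and only `&&` conjunctions are modelled, not the full label expression syntax Jenkins accepts.

```python
def node_matches(node_labels, expression):
    """Return True if a node carrying node_labels satisfies an '&&'-only expression."""
    required = {part.strip() for part in expression.split("&&")}
    return required <= set(node_labels)


# Hypothetical inventory: label names come from the chat, node names are made up.
NODES = {
    "small": {"DebianJessieDocker"},
    "big": {"DebianJessieDocker", "BunchOfMemory"},
}


def eligible(expression, nodes=NODES):
    """List the nodes on which a job with this label expression could run."""
    return sorted(name for name, labels in nodes.items()
                  if node_matches(labels, expression))


if __name__ == "__main__":
    print(eligible("DebianJessieDocker"))
    print(eligible("DebianJessieDocker && BunchOfMemory"))
```

A light job that only asks for DebianJessieDocker stays eligible for every node, which is the "would run on any" behaviour described in the chat, while the phan job is confined to the larger instances.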
[09:40:53] !log marking integration-slave-docker-1001 as online - T177039
[09:40:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:40:57] T177039: mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039
[09:41:05] hashar: yeh, it looks like it has been building memory usage over the past 3 months
[09:41:12] and it looks like 1003 and 1004 would eventually do the same
[09:41:13] lets not reboot it :D
[09:42:37] yeh, I just rebooted 1001 to see if it did actually make memory usage go away. 1002 will probably start performing terribly in the next couple of days
[09:42:42] oh my
[09:43:03] running a job on 1001 now (that took 25 mins yesterday for phpcs) and looks like it will be done in 2 mins again https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-docker/52
[09:45:24] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645704 (Addshore) My rebuild of a job on 1001 that did that 25 mins yesterday (apparnetly...
[09:48:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:48:47] Continuous-Integration-Infrastructure (shipyard), Cloud-Services, Graphite: Grafana reports ALL docker mounts in a spammy way - https://phabricator.wikimedia.org/T177052#3645705 (Addshore)
[09:49:41] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645719 (hashar) On integration-slave-docker=1002 ``` free -m total use...
[09:49:55] addshore: for the memory usage.
Really I have no idea
[09:50:05] each time I get confused as to how to find out what is using mem
[09:50:51] yeh, I had a look and it didnt look like that much stuff was using memory :P
[09:52:13] Slab: 1387900 kB
[09:52:13] SReclaimable: 47348 kB
[09:52:14] SUnreclaim: 1340552 kB
[09:52:48] (ahh gotta find out the command to show the kernel slab allocations)
[09:53:06] !log addshore@integration-slave-docker-1001:~$ sudo docker ps --filter "status=exited" | grep 'months ago' | awk '{print $1}' | xargs --no-run-if-empty sudo docker rm
[09:53:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:53:16] slabtop!
[09:53:35] addshore: ah yeah the container are not deleted. Despite the docker run -rm
[09:53:40] !log addshore@integration-slave-docker-1001:~$ sudo docker ps --filter "status=exited" | grep 'weeks ago' | awk '{print $1}' | xargs --no-run-if-empty sudo docker rm
[09:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:53:50] I was looking into that yesterday. TLDR is that signal handling is messy
[09:54:00] hashar: these are all left over old old ones, this doesnt, maybe testing ones, containers are deleted for all the new jobs
[09:54:57] 1001 container list is now totally clear
[10:03:40] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645749 (hashar) ``` slabtop --sort c --once|head -n 15 Active / Total Objects (% used)...
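The three slab figures pasted above already tell most of the story: almost all of the slab memory is unreclaimable. A minimal Python sketch of that arithmetic follows; `slab_summary` is a hypothetical helper, and the sample text is just the three lines quoted in the chat, in the `Name: value kB` layout that `/proc/meminfo` uses.

```python
def slab_summary(meminfo_text):
    """Parse 'Name: value kB' lines and summarise kernel slab usage."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token after the colon is the numeric value (in kB).
            fields[name.strip()] = int(parts[0])
    slab = fields["Slab"]
    unreclaim = fields["SUnreclaim"]
    return {
        "slab_kb": slab,
        "reclaimable_kb": fields["SReclaimable"],
        "unreclaimable_kb": unreclaim,
        "unreclaimable_share": unreclaim / slab,
    }


# Values copied from the chat above.
SAMPLE = """Slab: 1387900 kB
SReclaimable: 47348 kB
SUnreclaim: 1340552 kB"""

if __name__ == "__main__":
    summary = slab_summary(SAMPLE)
    print(f"{summary['unreclaimable_share']:.1%} of slab memory is unreclaimable")
```

On a live Linux host the same function could be fed `open("/proc/meminfo").read()`; for the sample it shows that roughly 1.3 GB of a 2 GB instance sits in unreclaimable slab.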
[10:04:43] addshore: imho, that is a kernel memory leak of some sort
[10:05:20] I just reclaimed 6GB of disk spaces from 1001 too
[10:06:30] Continuous-Integration-Infrastructure (shipyard): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3645756 (hashar) Found via `/proc/meminfo` and the slab allocation: ``` Slab: 1...
[10:06:57] addshore: so for the slab memory issue. You can try pinging operations about it they would probably know what is going on
[10:08:41] okay! :)
[10:08:51] hashar: I dont like this new cumin thing, it has way too much output! :P
[10:09:03] how can I make all the progress bars and crap go away? :D
[10:09:20] hmm
[10:09:24] gotta hack it :)
[10:09:36] operations/software/cumin
[10:09:46] it uses some 3rd party library for the progress bar
[10:10:01] I was looking at it this morning, and there is way to disable the bar
[10:10:07] oooooh
[10:10:10] and maybe one can add a --no-progress
[10:10:38] it also has troubles resizing the bars when changing the terminal size (SIGWNCH is not used)
[10:10:54] but that is just all minor glitches :]
[10:11:16] no --no-progress, but there is a --config
[10:12:11] I am not sure it is implemented though
[10:12:49] TypeError: 'str' does not support the buffer interface
[10:12:52] bah
[10:26:21] Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-09-05 (1.30.0-wmf.17)), Patch-For-Review, User-zeljkofilipin: mwext-ruby-jessie Jenkins job runs all ruby task...
- https://phabricator.wikimedia.org/T164479#3645792
[10:27:50] Release-Engineering-Team (Watching / External), Wikidata: Decide what to do with Wikibase JS-only libraries regarding the build/deployment of Wikidata code - https://phabricator.wikimedia.org/T174922#3645793 (aude) we should check with @demon if git submodules will be okay with CI / deployment. While I...
[10:44:32] Release-Engineering-Team (Watching / External), Wikidata: Decide what to do with Wikibase JS-only libraries regarding the build/deployment of Wikidata code - https://phabricator.wikimedia.org/T174922#3645809 (WMDE-leszek) @demon commented on this in https://phabricator.wikimedia.org/T174922#3578222: >...
[10:44:46] Continuous-Integration-Infrastructure (shipyard), Operations: Update docker image docker-registry.wikimedia.org/wikimedia-jessie - https://phabricator.wikimedia.org/T177055#3645812 (hashar)
[11:10:15] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 4 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3645886 (Addshore) a:hashar>Addshore
[11:13:01] (PS1) Hashar: Shell out to zuul-cloner instead [integration/quibble] - https://gerrit.wikimedia.org/r/381424
[11:13:24] (CR) jerkins-bot: [V: -1] Shell out to zuul-cloner instead [integration/quibble] - https://gerrit.wikimedia.org/r/381424 (owner: Hashar)
[11:16:12] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 4 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3645900 (Addshore)
[11:16:14] (PS2) Hashar: Shell out to zuul-cloner instead [integration/quibble] - https://gerrit.wikimedia.org/r/381424
[11:16:27] Gerrit, Release-Engineering-Team (Kanban), DataTypes, DataValues, and 4 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors -
https://phabricator.wikimedia.org/T177042#3645439 (Addshore) Open>Resolved I had added the wikidata group to ALL of the repos listed in...
[11:18:06] Gerrit, Wikidata, Patch-For-Review, User-Ladsgroup, and 2 others: [Task] Move DataTypes repository from Github to gerrit - https://phabricator.wikimedia.org/T127292#3645906 (Addshore) >>! In T127292#3645458, @thiemowmde wrote: > I created T177042 and assigned @hashar. I have completed the above...
[11:21:24] hashar: is there any naming / numbering convention for the docker hosts? or should i just go for 1005?
[11:21:38] what is 1705 ?
[11:27:22] addshore: integration-slave--XXX
[11:27:27] where XXX increments
[11:27:35] ack, so whats the 17 one for? :P
[11:27:36] but 1705 seems to have been set for docker 17.05 :]
[11:27:42] aaaaah
[11:27:50] or maybe we had 700 slaves created in between ;D
[11:28:13] guess you can just use a different hundred for the big one that might represent the RAM 1201 1202
[11:28:17] or be creative :D
[11:29:08] maybe integration-slave-docker-1005-large integration-slave-docker-1006-xlarge
[11:29:22] I am not sure whether we need different hostname per amount of memory
[11:29:25] up to you
[11:29:41] it might be easier for our eyes :)_
[11:30:00] medium or large?
[11:30:40] medium has 4G so only one executor
[11:30:47] large 8G => 2 exec
[11:30:51] xlarge 16G => 4 exec
[11:31:04] (assuming the job ends up with 4G or so ram usage
[11:31:14] oh wait ci1.medium and m1.medium are different ci 2GB ram, m 4GB
[11:31:38] a fun thing as well is that docker run can enforce some memory/cpu limits
[11:31:43] yeah
[11:31:57] ci1.medium it is to save up a bit of RAM on openstack infra
[11:32:12] is it possible to have 1 slave with 2 executors with different tags? or are the tags per slave?
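The executor sizing quoted above (medium 4G gives one executor, large 8G gives two, xlarge 16G gives four, assuming a job needs about 4G) is a plain integer division. A hedged Python sketch: `executors_for` and `FLAVOR_RAM_GB` are hypothetical names, the RAM figures are the ones stated in the chat, and the floor of one executor for smaller instances is an assumption added here, not something the chat settles.

```python
# RAM per flavor in GB, as quoted in the chat for the m1.* flavors.
FLAVOR_RAM_GB = {"medium": 4, "large": 8, "xlarge": 16}


def executors_for(ram_gb, per_job_gb=4):
    """Number of concurrent executors if each job may use about per_job_gb of RAM.

    Assumption: an instance smaller than one job's budget still gets one executor.
    """
    return max(1, ram_gb // per_job_gb)


if __name__ == "__main__":
    for flavor, ram in FLAVOR_RAM_GB.items():
        print(flavor, ram, "GB ->", executors_for(ram), "executor(s)")
```

With a 2GB ci1.medium the formula still returns one executor, which is only reasonable for jobs far below the 4G budget.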
[11:33:29] per slave
[11:33:32] okay
[11:37:59] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 392 bytes in 0.004 second response time
[11:38:00] hashar: would you hate me for integration-slave-docker-2c4r-1005 ? :P
[11:39:18] might as well just use the openstack flavor names? m1-medium m1-large
[11:39:39] also when custom flavors are asked they come with names such as c1.m2.s80 (1 cpu, 2g ram, 80 disk)
[11:41:49] in theory can't the flavour names have their resources changed?
[11:44:49] I dont think that can easily be done
[11:45:39] integration-slave-docker-c1.m2.s40-1005 ? :P
[11:46:02] well, integration-slave-docker-c2.m4.s40-1005
[11:47:10] integration-slave-docker-c2m4s40-1005 OR integration-slave-docker-m1-medium-1005 :P woo, name bikesheding
[11:47:56] !!!
[11:48:00] also hmm
[11:48:14] a dot is probably not valid in a hostname, since that is used as a domain separator
[11:48:19] moaraaar bikeshed :]
[11:48:27] (and I might be wrong)
[11:48:28] yup, i took the dots out of the last message!
[11:49:16] I think being able to see the resources is nicer than having the flavour name, it gives you the ability to grep for all hosts that have 2gb ram / 4gb ram without having to know anything about flavours :)
[11:49:28] s40 or d40 though for disk space?
;)
[11:54:09] Release-Engineering-Team, Wikidata, Patch-For-Review, User-Addshore: Move config & loading logic out of Wikidata build and into mediawiki-config - https://phabricator.wikimedia.org/T176948#3645974 (aude)
[12:12:16] meh, editing the Hiera Config for the instance i created doesnt seem to be working
[12:16:07] Continuous-Integration-Infrastructure (shipyard), Cloud-Services, Graphite: Grafana reports ALL docker mounts in a spammy way - https://phabricator.wikimedia.org/T177052#3646024 (hashar) For {T1075} @fgiunchedi did a tweak to disable reporting diskI/O from partitions: ``` lang=ruby,name=modules/diam...
[12:16:59] Continuous-Integration-Infrastructure (shipyard), Cloud-Services, Graphite: Grafana reports ALL docker mounts in a spammy way - https://phabricator.wikimedia.org/T177052#3646030 (Addshore) That sounds like a good solution.
[12:17:08] Continuous-Integration-Infrastructure (shipyard), Cloud-Services, Graphite, User-Addshore: Grafana reports ALL docker mounts in a spammy way - https://phabricator.wikimedia.org/T177052#3646031 (Addshore)
[12:18:14] hashar: can you apply any hiera config to https://horizon.wikimedia.org/project/instances/2683f86d-3210-4f7e-ab83-524b94eeb553/ ? :(
[12:23:06] addshore: hmm wut ? :]
[12:23:16] instance is no more!
[12:23:22] sorry gotta brew yet another coffee
[12:32:18] hashar: I wantto copy the hiera config from integration-slave-docker-1001 to integration-slave-docker-c2-m4-d40-1005 so it has docker, but editing hiera doesnt seem to work...
[12:32:31] ohhhh
[12:32:46] f£££ hiera
[12:33:14] does docker-1001 actually have any hiera conf ??
[12:33:52] addshore: I think you just have to apply role::ci::slave::labs::docker
[12:34:05] there is another docker::version: present , but afaik that is no more needed
[12:34:20] there are a few patch on the ci puppet master to install docker-ce from jessie-wikimedia/component/ci
[12:34:30] or does it fails?
[12:35:20] *looks*
[12:35:21] addshore: yeah a single role should od
[12:35:32] and also puppet is broken due to https://phabricator.wikimedia.org/T152941
[12:35:43] I dont see role::ci::slave::labs::docker
[12:35:46] * hashar fix puppet
[12:35:51] and yeh, 1001 has some hiera conf, and no docker roles applied
[12:36:09] on the instance in "Other Classes" just add: role::ci::slave::labs::docker
[12:36:12] that should do it
[12:36:28] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid tag '- role::ci::slave::labs::docker' on node integration-slave-docker-c2-m4-d40-1005.integration.eqiad.wmflabs
[12:36:28] eek
[12:38:54] $ /var/lib/git/operations/puppet/utils/hiera_lookup --fqdn=integration-slave-docker-1001.integration.eqiad.wmflabs classes
[12:38:54] role::ci::slave::labs::docker
[12:38:59] $ /var/lib/git/operations/puppet/utils/hiera_lookup --fqdn=integration-slave-docker-c2-m4-d40-1005.integration.eqiad.wmflabs classes
[12:38:59] role::ci::slave::labs::docker
[12:39:04] (that is on the ci puppetmaster
[12:40:19] addshore: that works now. I just added the class in horizon "Other classes" sections
[12:40:29] okay! :D
[12:40:52] it is doing something somehow
[12:41:37] How can you make that role appear in the "project" list of roles ?
[12:41:42] done
[12:41:51] I have no idea
[12:42:39] maybe it will appear there when applied to at least 1 instance or something
[12:45:13] wtf horizon, it still said ci::slave::labs::docker applied false
[12:45:16] so da puppet is all happy at least
[12:45:59] yeh, and docker is on the host, just wtf horizon
[12:46:05] forget it
[12:46:10] the class got applied :]
[12:46:44] so, adding a slave is just done in the UI right?
[12:47:01] yes
[12:47:58] (CR) Hashar: "recheck" [integration/quibble] - https://gerrit.wikimedia.org/r/381424 (owner: Hashar)
[12:48:23] so docker and signal handling is a mess
[12:48:34] depends on whether you pass -i and/or -t to docker run
[12:48:41] and what your entry point is
[12:48:46] not to mention docker run --init
[12:53:12] hashar: hmm
[12:53:12] !log gerrit: marked labs/tools/grrrit archived
[12:53:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:53:20] (PS1) Hashar: labs/tools/grrrit is archived [integration/config] - https://gerrit.wikimedia.org/r/381434
[12:53:50] (PS7) Addshore: DNM mwext-php70-phan-docker job [integration/config] - https://gerrit.wikimedia.org/r/381271
[12:57:32] hashar: yay, my new node is up :D https://integration.wikimedia.org/ci/job/mwext-php70-phan-docker/9/console
[12:58:03] congratulations
[12:58:08] whoop
[12:58:11] bah
[12:58:15] zuul.Cloner:Creating repo mediawiki/core from upstream https://gerrit.wikimedia.org/r/p/mediawiki/core
[12:58:20] where is da cache?!!!!! :]
[12:58:25] yeh, that was my next question
[12:58:42] i tried searching through puppet (like you said) but couldnt find the location to add the cache
[12:59:01] (PS8) Addshore: DNM mwext-php70-phan-docker job [integration/config] - https://gerrit.wikimedia.org/r/381271
[13:00:34] !log github: created https://github.com/wikimedia/integration-quibble for gerrit replication
[13:00:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:00:50] addshore: https://github.com/wikimedia/integration-quibble/blob/master/README.md#caching
[13:00:53] though the formatting is ugly
[13:01:47] aah, okay, so nothing in puppet, i should just go and run that on hosts?
[13:03:11] (PS1) Hashar: Fix formatting in README.md [integration/quibble] - https://gerrit.wikimedia.org/r/381436
[13:03:26] addshore: well quibble has some doc about the cache
[13:03:42] ack
[13:03:45] I went with docker run -v "$(pwd)"/ref:/srv/git:ro
[13:03:48] did you see what I did in https://gerrit.wikimedia.org/r/#/c/381271/7/jjb/mediawiki-extensions.yaml for the phan job? :P
[13:03:57] and then in the conatainer: zuul-cloner --cache-dir /srv/git
[13:04:25] (CR) Hashar: [C: 2] Fix formatting in README.md [integration/quibble] - https://gerrit.wikimedia.org/r/381436 (owner: Hashar)
[13:04:55] (Merged) jenkins-bot: Fix formatting in README.md [integration/quibble] - https://gerrit.wikimedia.org/r/381436 (owner: Hashar)
[13:04:55] I'm looking forward to maybe being able to use this quibble tihng
[13:04:57] *thing
[13:07:39] (PS1) Addshore: docker: lintr-docker job use m4executor [integration/config] - https://gerrit.wikimedia.org/r/381437
[13:08:11] (CR) Addshore: [C: 2] docker: lintr-docker job use m4executor [integration/config] - https://gerrit.wikimedia.org/r/381437 (owner: Addshore)
[13:09:55] (Merged) jenkins-bot: docker: lintr-docker job use m4executor [integration/config] - https://gerrit.wikimedia.org/r/381437 (owner: Addshore)
[13:11:22] I was going to make an image called ci-code-builder or ci-source-fetcher that did everything like git fetch, zuul-cloner, composer installs, npm etc but I feel that quibble will do that?
[13:18:29] I am not sure :]
[13:18:53] for quibble the idea is to provide a stack that covers any mediawiki testing use case
[13:19:00] phpunit / composer tests / qunit whatever
[13:19:20] with an easy way to setup the environment (repos to clone, vendor vs composer install, postgres v mysql, etc
[13:19:33] I think I will first use it to replace the PHPUnit run
[13:24:56] addshore: anyway its kids time already!
[13:24:59] tata!
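The caching approach mentioned above (bind-mount a host directory of git mirrors read-only at /srv/git, then point zuul-cloner at it with --cache-dir) can be sketched by assembling the `docker run` arguments in Python. This is only an illustration: `docker_run_with_git_cache` is a hypothetical helper, `example/ci-image` is a placeholder image name, and the remaining zuul-cloner arguments a real job needs are deliberately left out.

```python
import os


def docker_run_with_git_cache(image, ref_dir, container_cache="/srv/git"):
    """Build docker run arguments that mount ref_dir read-only as the git cache."""
    host_dir = os.path.abspath(ref_dir)
    # host:container:ro is the read-only bind-mount form used in the chat.
    volume = f"{host_dir}:{container_cache}:ro"
    return ["docker", "run", "-v", volume, image]


def cloner_cache_args(container_cache="/srv/git"):
    """zuul-cloner fragment pointing at the mounted cache (other arguments omitted)."""
    return ["zuul-cloner", "--cache-dir", container_cache]


if __name__ == "__main__":
    # 'example/ci-image' is a placeholder, not an image named in the chat.
    print(" ".join(docker_run_with_git_cache("example/ci-image", "ref")))
    print(" ".join(cloner_cache_args()))
```

Mounting the cache read-only keeps a job from corrupting mirrors that other containers on the same host share.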
[14:09:58] !log maurelio@deployment-tin:~$ mwscript cleanupSpam.php --wiki=deploymentwiki *.loginpartner.org --delete ( testing T176206 / 7f842058602c ) [14:10:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:10:04] T176206: cleanupSpam.php: mark edits/actions as 'bot' - https://phabricator.wikimedia.org/T176206 [14:15:12] !log maurelio@deployment-tin:~$ mwscript cleanupSpam.php --wiki=deploymentwiki *.logininput.org ( testing w/o delete T176206 / 7f842058602c ) [14:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:15:16] T176206: cleanupSpam.php: mark edits/actions as 'bot' - https://phabricator.wikimedia.org/T176206 [14:20:08] (03PS9) 10Addshore: DNM mwext-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/381271 [14:22:00] (03PS10) 10Addshore: DNM mwext-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/381271 [14:30:43] (03PS11) 10Addshore: mwext-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/381271 [14:47:12] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Cloud-VPS, 10Nodepool, and 2 others: rabbitmq: Consume and log messages sent to notifications.error - https://phabricator.wikimedia.org/T175029#3646514 (10Andrew) p:05Triage>03Normal [14:54:30] (03PS1) 10Zfilipin: mwext-ruby-jessie job should run all Ruby tasks for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/381453 (https://phabricator.wikimedia.org/T164479) [14:58:27] 10Continuous-Integration-Config, 10Patch-For-Review: Whitelist Dvorapa on Zuul CI - https://phabricator.wikimedia.org/T176570#3646582 (10MarcoAurelio) @hashar et al. Guys I feel we're offending @Dvorapa here. If there are concerns on whitelisting the user, please voice them. Staying quite does not help the use... 
[15:07:07] (03CR) 10Zfilipin: [C: 032] mwext-ruby-jessie job should run all Ruby tasks for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/381453 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [15:08:13] (03Merged) 10jenkins-bot: mwext-ruby-jessie job should run all Ruby tasks for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/381453 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [15:10:08] !log Reloading Zuul to deploy 7f66813dc0842dadfdb74c9257582aed26f35d60 [15:10:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:15:07] thcipriani: looks like using the user "nobody" still has issues once the directories you want to delete etc go more than 1 level deep :/ [15:16:31] hrm [15:16:39] which test are you looking at? [15:20:15] (03PS1) 10Zfilipin: Update Ruby jobs for VisualEditor [integration/config] - 10https://gerrit.wikimedia.org/r/381460 (https://phabricator.wikimedia.org/T164479) [15:20:37] maybe guid bit on that directory would be helpful in this instance? [15:34:05] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%) [15:38:13] 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10Release-Engineering-Team (Kanban), 10MW-1.30-release-notes (WMF-deploy-2017-09-05 (1.30.0-wmf.17)), 10Patch-For-Review, 10User-zeljkofilipin: mwext-ruby-jessie Jenkins job runs all Ruby task... 
- https://phabricator.wikimedia.org/T164479#3646740 [15:44:02] (03PS1) 10Zfilipin: Deleted Ruby jobs for PdfHandler [integration/config] - 10https://gerrit.wikimedia.org/r/381466 (https://phabricator.wikimedia.org/T164479) [15:45:08] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3540725 (10WMDE-leszek) [15:45:38] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3646773 (10WMDE-leszek) [15:45:42] addshore: https://gist.github.com/thcipriani/97b45ceff0385b785de3c5ab2a276f33 ? [15:49:56] thcipriani: thanks, will look soon! [15:56:55] (03PS1) 10Zfilipin: Deleted Ruby jobs for Translate [integration/config] - 10https://gerrit.wikimedia.org/r/381468 (https://phabricator.wikimedia.org/T164479) [15:59:10] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Set scap.cfg's canary_dashboard_url to useful beta logstash url - https://phabricator.wikimedia.org/T168211#3646909 (10thcipriani) 05Open>03Resolved [16:02:16] 10Scap: Scap mediawiki canaries should prompt to continue - https://phabricator.wikimedia.org/T173146#3646914 (10thcipriani) [16:02:32] 10Scap: Scap MediaWiki canaries should prompt to continue - https://phabricator.wikimedia.org/T173146#3519703 (10thcipriani) [16:02:43] (03PS1) 10Zfilipin: Deleted Ruby jobs for TwnMainPage [integration/config] - 10https://gerrit.wikimedia.org/r/381471 (https://phabricator.wikimedia.org/T164479) [16:07:55] 10Scap: Scap sync and sync-file are too similar looking yet do very different things - https://phabricator.wikimedia.org/T174369#3646962 (10thcipriani) p:05Triage>03Normal We've been working towards the goal of making it possible to have `scap sync` just Do The Right Thing™ in the majority of cases (e.g., sy... 
[16:37:34] (03Abandoned) 10Niedzielski: Marvin: install NPM >= 5 [integration/config] - 10https://gerrit.wikimedia.org/r/375405 (owner: 10Niedzielski) [16:38:34] (03CR) 10Zfilipin: [C: 032] Update Ruby jobs for VisualEditor [integration/config] - 10https://gerrit.wikimedia.org/r/381460 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [16:41:12] (03Merged) 10jenkins-bot: Update Ruby jobs for VisualEditor [integration/config] - 10https://gerrit.wikimedia.org/r/381460 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [16:42:53] !log Reloading Zuul to deploy 09445b837a03a2ce906fe848aec8350f59ab5898 [16:42:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:56:42] hello addshore [16:57:50] Hi legoktm [16:58:01] I have stuff for you to review :) [16:58:11] Oooh, what? [16:58:33] https://gerrit.wikimedia.org/r/#/q/owner:%22Legoktm+%253Clegoktm%2540member.fsf.org%253E%22+project:integration/config+status:open [16:59:58] (03CR) 10Addshore: [C: 031] "not verified yet" [integration/config] - 10https://gerrit.wikimedia.org/r/381392 (owner: 10Legoktm) [17:00:27] (03CR) 10Addshore: "I am against naming the image php70, really we should just have a php image and it would have different tags" [integration/config] - 10https://gerrit.wikimedia.org/r/381377 (owner: 10Legoktm) [17:01:08] hmm [17:01:23] to have different tags you still need different dockerfiles right? [17:02:02] yes [17:02:14] our build script probably isnt setup in a nice way to do it either [17:02:26] we should really look at some of the templating things you can use to make docker images [17:02:38] legoktm: did you see https://gerrit.wikimedia.org/r/#/c/381271/ ? 
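Editor's sketch, not part of the log: the "one php image, different tags" idea, combined with the point that each tag still needs its own Dockerfile, can be illustrated by rendering several Dockerfiles from one template. This uses only the standard library's `string.Template` as a stand-in for a real templating tool. The image name `example/php`, the base image `example/ci-base`, the version list, and the package names are all hypothetical and not taken from integration/config.

```python
from string import Template

# One shared Dockerfile body; only the PHP package differs per tag.
DOCKERFILE_TEMPLATE = Template(
    "FROM $base_image\n"
    "RUN apt-get update && apt-get install -y $php_package\n"
)

# Hypothetical variants: tag -> values substituted into the template.
PHP_VARIANTS = {
    "7.0": {"base_image": "example/ci-base", "php_package": "php7.0-cli"},
    "7.1": {"base_image": "example/ci-base", "php_package": "php7.1-cli"},
}


def render_dockerfiles(image_name="example/php"):
    """Return a mapping of 'image:tag' to the rendered Dockerfile text."""
    return {
        f"{image_name}:{version}": DOCKERFILE_TEMPLATE.substitute(values)
        for version, values in PHP_VARIANTS.items()
    }


if __name__ == "__main__":
    for tag, dockerfile in render_dockerfiles().items():
        print(f"# {tag}")
        print(dockerfile)
```

A build script could then write each rendered text to disk and pass the matching `image:tag` to `docker build -t`, keeping a single image name while the variants live in one table.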
[17:03:02] made an image that can setup the source to test for mediawiki, and then i can just use the phan image to do the test [17:03:25] the image used for building has the jenkins-slace scripts, php, python, composer, git and zuul-cloner, and you just pass in whatever bash script you want to run [17:04:27] oh awesome [17:04:42] i think that /might/ be a nice direction to do [17:05:03] and if we make all of the builders that jenkins uses actually just shell scripts (which they can be) we can mount them in the image and then just run them [17:05:09] addshore: take a look at https://github.com/wikimedia/operations-docker-images-production-images I was talking to _joe_ last night and he showed me that [17:05:23] and the code being run for setup then can be exactly the same for the jessie jobs as the docker jobs [17:06:28] but the image i built basically takes whole RUN commands from the other images and puts them in the one dockerfile [17:08:09] not got any time to look at stuff tonight [17:08:13] I kind of feel like we're going in circles a bit [17:08:24] ok, it's basically a build script that uses jinja2 templates [17:19:48] (03PS1) 10Zfilipin: Deleted Ruby jobs for UniversalLanguageSelector [integration/config] - 10https://gerrit.wikimedia.org/r/381480 (https://phabricator.wikimedia.org/T164479) [17:20:33] (03CR) 10Nikerabbit: [C: 031] Deleted Ruby jobs for TwnMainPage [integration/config] - 10https://gerrit.wikimedia.org/r/381471 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [17:21:45] (03CR) 10Zfilipin: [C: 032] Deleted Ruby jobs for TwnMainPage [integration/config] - 10https://gerrit.wikimedia.org/r/381471 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [17:23:44] (03Merged) 10jenkins-bot: Deleted Ruby jobs for TwnMainPage [integration/config] - 10https://gerrit.wikimedia.org/r/381471 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [17:25:34] 10Gerrit, 10RelEng-Archive-FY201718-Q1, 10DataTypes, 10DataValues, 
and 4 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3647174 (10greg) [17:25:41] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10RelEng-Archive-FY201718-Q1, 10Operations-Software-Development, and 2 others: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3647175 (10greg) [17:25:43] 10RelEng-Archive-FY201718-Q1, 10Wikimedia-Mailing-lists: Create code health mailing list - https://phabricator.wikimedia.org/T170963#3647179 (10greg) [17:25:48] 10Beta-Cluster-Infrastructure, 10RelEng-Archive-FY201718-Q1, 10Patch-For-Review: Set scap.cfg's canary_dashboard_url to useful beta logstash url - https://phabricator.wikimedia.org/T168211#3647183 (10greg) [17:25:50] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10RelEng-Archive-FY201718-Q1: Uncaught exception: Could not open extension when trying to load tidy.so - https://phabricator.wikimedia.org/T168978#3647182 (10greg) [17:25:52] 10Continuous-Integration-Infrastructure, 10RelEng-Archive-FY201718-Q1, 10Documentation, 10WorkType-Maintenance, 10Zuul: Zuul can't merge two patches at once on integration/config.git because of fast-forward only Gerrit strategy - https://phabricator.wikimedia.org/T111426#3647190 (10greg) [17:25:54] 10Continuous-Integration-Infrastructure, 10RelEng-Archive-FY201718-Q1, 10Jenkins, 10Upstream: Launching Jenkins slave agent fails with "java.io.IOException: Unexpected termination of the channel" - https://phabricator.wikimedia.org/T91697#3647191 (10greg) [17:26:38] 10Release-Engineering-Team (Kanban), 10User-greg: End of Q1 grooming - https://phabricator.wikimedia.org/T176523#3647192 (10greg) And again: https://phabricator.wikimedia.org/daemon/bulk/view/1233/ [17:28:14] SWAT has a new illustration -- https://wikitech.wikimedia.org/wiki/SWAT_deploys#Humour [17:28:53] good work [17:29:21] ^ [17:29:22] solid. 
[17:30:12] 10RelEng-Archive-FY201718-Q1, 10DataTypes, 10DataValues, 10GitHub-Mirrors, and 4 others: [Task] Add Wikidata developers as admins to all relevant GitHub mirrors - https://phabricator.wikimedia.org/T177042#3647202 (10Legoktm) [17:34:06] !log Reloading Zuul to deploy 0e26c8697438a88fbdd62884dcb40e664ba98fa2 [17:34:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:58:52] bd808: Nice! [17:58:56] greg-g: Speaking of wikis being on fire.... [17:59:04] I have a request for a Friday deploy [17:59:13] T176884 [17:59:14] T176884: Icons missing throughout UI on Edge, IE 11 - https://phabricator.wikimedia.org/T176884 [18:00:09] (Thanks to Volker and Ed for identifying and fixing the problem, all I did was cause the problem) [18:03:29] just the revert there? [18:03:51] RoanKattouw: ^ [18:03:57] Two reverts but yes [18:04:06] kk [18:04:15] godspeed [18:04:17] Ed figured out that the commit couldn't be reverted by itself, the one that came after it had to be reverted too to avoid conflicts [18:04:20] Thanks [18:12:37] ARGH [18:12:41] https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-docker/67/console [18:12:45] CI is broken for wmf.1 [18:13:09] I'll file a task [18:14:46] 10Continuous-Integration-Config: CI (phpcs-docker) broken for wmf branches - https://phabricator.wikimedia.org/T177104#3647315 (10Catrope) [18:16:34] (03CR) 10Umherirrender: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/380790 (owner: 10Umherirrender) [18:24:13] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T172806#3647340 (10Catrope) [18:30:07] (03CR) 10jerkins-bot: [V: 04-1] Update test config [integration/config] - 10https://gerrit.wikimedia.org/r/380790 (owner: 10Umherirrender) [18:35:49] (03PS1) 10Thcipriani: Allow dockerfiles to be built from anywhere [integration/config] - 10https://gerrit.wikimedia.org/r/381487 [18:38:39] (03CR) 
10Legoktm: [C: 032] "Thanks, I ran into this yesterday too :)" [integration/config] - 10https://gerrit.wikimedia.org/r/381487 (owner: 10Thcipriani) [18:39:45] (03Merged) 10jenkins-bot: Allow dockerfiles to be built from anywhere [integration/config] - 10https://gerrit.wikimedia.org/r/381487 (owner: 10Thcipriani) [18:40:28] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (shipyard): CI (phpcs-docker) broken for wmf branches - https://phabricator.wikimedia.org/T177104#3647404 (10Legoktm) This is a bug in the new docker version - we shouldn't be checking out submodules for this job... [18:51:38] (03PS1) 10Thcipriani: Don't checkout submodules for phpcs-docker [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) [18:53:23] (03CR) 10Legoktm: [C: 031] "Thanks. I assume the image needs to be re-built before it can be merged?" [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) (owner: 10Thcipriani) [18:55:38] hashar hi, i belive there has been a fix for gerrit 2.13 in zuul https://github.com/openstack-infra/zuul/commit/01740bd14778cfc2632804522d4548fac57efff8 [18:56:20] (03CR) 10Nikerabbit: [C: 031] Deleted Ruby jobs for UniversalLanguageSelector [integration/config] - 10https://gerrit.wikimedia.org/r/381480 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [18:57:15] * paladox is redesgning the registration in polygerrit. [18:57:38] (03PS2) 10Thcipriani: Don't checkout submodules for phpcs-docker [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) [18:59:06] (03CR) 10Thcipriani: "> Thanks. 
I assume the image needs to be re-built before it can be" [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) (owner: 10Thcipriani) [19:00:22] (03CR) 10Hashar: "Maybe we can have a single DockerFile which would vary based on the PHP version. Eg:" [integration/config] - 10https://gerrit.wikimedia.org/r/381377 (owner: 10Legoktm) [19:01:47] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10RelEng-Archive-FY201718-Q1, 10Operations-Software-Development, and 2 others: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3647464 (10hashar) I have a made a short announce on th... [19:03:51] (03CR) 10Hashar: [C: 031] Don't checkout submodules for phpcs-docker [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) (owner: 10Thcipriani) [19:07:20] * legoktm tests new image [19:10:32] (03CR) 10Hashar: "bah sorry I am not paying attention when writing code :D" [integration/config] - 10https://gerrit.wikimedia.org/r/381487 (owner: 10Thcipriani) [19:19:35] thcipriani: hmm https://paste.fedoraproject.org/paste/4-rfugBOQdtDDO4msf2pDw/raw [19:20:45] oh, my image still has the git submodule line in it [19:20:58] is :latest not equal to the most recent dated tag? [19:21:54] ah, no, I just pushed the tagged image directly. I figured I'd overwrite latest once deployed. [19:22:07] tagged image does not have the submodule line, just confirmed locally [19:23:13] thcipriani: with the tagged image I'm getting https://paste.fedoraproject.org/paste/jL451NKaU5rF-G8YK4mh~A/raw [19:24:50] oh good. [19:25:55] (the first part about the `install` command not working can be ignored, I just created the directory manually for now.) [19:26:03] hrm, I wonder if there were some local changes where creates the src directory, I think that's all it's missing? 
[19:26:16] well and git [19:26:43] * thcipriani rebuilds all images [19:27:09] it's probably something out of date for me locally somewhere :( [19:29:17] I'm going afk for lunch, if you can get it to work with the example-run env I posted, I think it should be good to go :) [19:31:16] (03CR) 10Nikerabbit: [C: 031] Deleted Ruby jobs for Translate [integration/config] - 10https://gerrit.wikimedia.org/r/381468 (https://phabricator.wikimedia.org/T164479) (owner: 10Zfilipin) [19:31:44] legoktm: k, thanks, I think the problem is that the ci base image updated since I last built it :\ [19:33:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:36:47] or maybe all of the things have updated [19:40:49] I need to rebuild labvirt1016, so I need to migrate some VMs: integration-slave-jessie-1003, integration-slave-jessie-1004, integration-slave-jessie-php55 [19:41:02] Can I do that now? and/or is there anything I should do to prevent disruption when I do? [19:41:13] hashar or thcipriani ^ [19:42:24] uhhh, lemme mark those as offline so jenkins doesn't assign any builds there [19:43:27] perfect, thanks [19:43:33] let me know when you're ready [19:44:22] andrewbogott: ok, should be good to go [19:45:17] andrewbogott: I am going to nuke integration-slave-jessie-php55 [19:45:23] while at it :D [19:45:31] better yet :) [19:45:34] !log Deleting integration-slave-jessie-php55 [19:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:45:54] andrewbogott: can you kindly gracefully shut down the others ?
:) [19:46:16] whenever they come back, assuming they keep the same IP, Jenkins will pool them back just fine [19:47:21] hashar: 1003 is already mid-migration, but I'll do a 'halt' on 1004 [19:47:53] PROBLEM - Host integration-slave-jessie-php55 is DOWN: CRITICAL - Host Unreachable (10.68.17.183) [19:48:19] PROBLEM - Host integration-slave-jessie-1003 is DOWN: CRITICAL - Host Unreachable (10.68.17.164) [19:49:00] PROBLEM - Host integration-slave-jessie-1004 is DOWN: CRITICAL - Host Unreachable (10.68.21.22) [19:49:32] (03PS3) 10Thcipriani: Don't checkout submodules for phpcs-docker [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) [19:55:15] (03CR) 10Thcipriani: [C: 032] "example run working now, merging." [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) (owner: 10Thcipriani) [19:57:09] (03Merged) 10jenkins-bot: Don't checkout submodules for phpcs-docker [integration/config] - 10https://gerrit.wikimedia.org/r/381488 (https://phabricator.wikimedia.org/T177104) (owner: 10Thcipriani) [20:04:25] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T172806#3647561 (10Catrope) [20:08:12] RECOVERY - Host integration-slave-jessie-1003 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [20:13:37] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:14:13] so the question for the random audience [20:14:24] HOW DO I GPG SIGN A DOCKER IMAGE I AM GOING TO PUSH ? ????
[20:15:45] PROBLEM - Puppet errors on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [20:17:17] RECOVERY - Host integration-slave-jessie-1004 is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms [20:17:45] thcipriani: migration finished, you can repool at your leisure [20:18:46] andrewbogott: thanks, nodes back online :) [20:19:37] mind if I migrate deployment-kafka-jumbo-1 now? [20:19:45] hm, maybe that's not you [20:20:44] thcipriani: ? [20:20:46] RECOVERY - Puppet errors on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [20:21:06] andrewbogott: hrm, that one is probably in deployment-prep, but I have no idea what it is :( [20:21:52] andrewbogott: elukey ottomatta would know. I have no clue what it is [20:22:01] ok [20:28:27] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:38:04] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Cloud-VPS, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3647602 (10Andrew) [20:38:06] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Cloud-VPS, 10Nodepool, and 2 others: rabbitmq: Consume and log messages sent to notifications.error - https://phabricator.wikimedia.org/T175029#3647601 (10Andrew) 05Open>03Resolved [20:41:02] hashar: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/managing_containers/signing_container_images [20:41:43] Yippee, build fixed! [20:41:44] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #532: 09FIXED in 43 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/532/ [20:41:49] Yippee, build fixed! 
[20:41:49] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #532: 09FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/532/ [20:42:03] looks like you need to use Fedora/RHEL's "atomic" thing [20:58:45] !gerrit I3b55b0dbb00cedaa124bf6d38b43bf71875be83b [20:59:02] !help [20:59:18] (03CR) 10Hashar: [C: 032] labs/tools/grrrit is archived [integration/config] - 10https://gerrit.wikimedia.org/r/381434 (owner: 10Hashar) [21:00:02] legoktm: docker has some content trust system, though that is disabled by default and I am not quite sure how it actually validate who is who. It also still let one download an unsigned image [21:00:03] I trust: legoktm!.*@wikipedia/Legoktm (2admin), hashar!.*@mediawiki/hashar (2admin), .*@wikipedia/.* (2trusted), .*@wikimedia/.* (2trusted), .*@mediawiki/.* (2trusted), .* (2trusted), .*@wikimedia/zeljko-filipin-wmf (2admin), [21:00:03] @trusted [21:00:14] !!!! [21:00:15] !help [21:00:19] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.0.0 [libirc v. 1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features [21:00:19] @help [21:00:20] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.0.0 [libirc v. 
1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features [21:00:20] @help [21:00:28] @gerrit I3b55b0dbb00cedaa124bf6d38b43bf71875be83b [21:00:29] https://meta.wikimedia.org/wiki/Wm-bot [21:00:36] bd808: thank you :] [21:00:49] !gerrit is https://gerrit.wikimedia.org/r/#/c/$1 [21:00:50] Key was added [21:00:55] I thought I had it learned !gerrit as a search query at somepoint [21:00:57] (03Merged) 10jenkins-bot: labs/tools/grrrit is archived [integration/config] - 10https://gerrit.wikimedia.org/r/381434 (owner: 10Hashar) [21:01:00] !gerrit I3b55b0dbb00cedaa124bf6d38b43bf71875be83b [21:01:00] https://gerrit.wikimedia.org/r/#/c/I3b55b0dbb00cedaa124bf6d38b43bf71875be83b [21:01:28] oh, q [21:01:32] !del gerrit [21:01:38] !gerrit del [21:01:38] Successfully removed gerrit [21:01:43] !gerrit is https://gerrit.wikimedia.org/r/#/q/$1 [21:01:44] Key was added [21:01:47] !gerrit I3b55b0dbb00cedaa124bf6d38b43bf71875be83b [21:01:47] https://gerrit.wikimedia.org/r/#/q/I3b55b0dbb00cedaa124bf6d38b43bf71875be83b [21:01:57] hashar: there you go :) [21:02:09] so helpful! thanks! [21:07:31] legoktm: as I understood it, RedHat forked Docker or at least maintain a bunch of usefull patches on top of it [21:08:25] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:25] they have stuff that wraps around docker, not sure if they have patches that aren't upstream [21:10:02] ah thank you for the link. +1 on gpg :] [21:26:19] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review: CI (phpcs-docker) broken for wmf branches - https://phabricator.wikimedia.org/T177104#3647735 (10thcipriani) 05Open>03Resolved a:03thcipriani new phpcs image pushed and live [21:31:52] thcipriani: oh also. One day you showed me a cringe website to lint dockerfile :] [21:32:24] but I cant find it anymore. 
Afaik it looked like some comic strip with bright yellow and bold Terminal font [21:32:53] hrm....that doesn't ring a bell [21:33:07] I will find out eventually :] [21:35:00] you should share it, then :) [21:35:09] maybe it was not even for docker [21:41:43] http://docker.wtf/ <--- the first tip is the equivalent of curl | sudo bash :] [21:45:37] terrifying [21:45:51] is a real thing evidently: https://github.com/spotify/docker-gc [21:46:58] https://github.com/spotify/docker-gc#running-as-a-docker-container !!! [21:47:28] f you want it to run as a cron job, you can configure it now by creating a root-owned executable file /etc/cron.hourly/docker-gc with content /usr/sbin/docker-gc [21:47:31] yeah surely [21:47:31] heh, at least that mounts /etc as read only :) [21:47:52] yeah that is secure [21:47:54] ;]] [21:47:55] :P [21:49:48] well time for me to head to bed! [21:53:23] 10Continuous-Integration-Infrastructure, 10Technical-Debt: Jenkins should flag usage of deprecated features - https://phabricator.wikimedia.org/T53908#3647798 (10Aklapper) [22:26:45] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:31:36] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 35270 bytes in 1.358 second response time [22:46:09] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10Deployments: Add jobrunners to Scap canary process - https://phabricator.wikimedia.org/T172480#3647925 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or two (no promises though, th... [22:46:15] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10ORES, and 2 others: Support git-lfs files in gerrit - https://phabricator.wikimedia.org/T171758#3647927 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or two (no promis... 
[22:46:18] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10WorkType-NewFunctionality: Play elevator music while scap is running - https://phabricator.wikimedia.org/T170484#3647930 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or two (no p... [22:46:23] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10Patch-For-Review: Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938#3647933 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or tw... [22:46:27] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10Deployments, 10WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#2358196 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or two (... [22:46:30] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10scap2: Scap should touch symlinks when originals are touched - https://phabricator.wikimedia.org/T126306#3647938 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or two (no promises... [22:46:34] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2), 10scap2: scap3 should repack / pack-refs git repos under /srv/deployment - https://phabricator.wikimedia.org/T112509#3647941 (10greg) Adding our #releng-kanban project as we would like to work on this in the coming quarter or two (no...