[00:14:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:40:19] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:47:36] Continuous-Integration-Config, MediaWiki-Platform-Team, Patch-For-Review: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#3623350 (Legoktm)
[00:48:48] Continuous-Integration-Config, MediaWiki-Platform-Team, Patch-For-Review: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2615580 (Krinkle) Note that our core unit tests //are// currently passing on Travis CI:
[01:15:16] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:06:19] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[04:16:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:37:18] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[04:52:50] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[05:42:48] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:24:22] Gerrit, Release-Engineering-Team (Backlog), Patch-For-Review: Update gerrit to 2.14.4 - https://phabricator.wikimedia.org/T156120#3623604 (Paladox)
[06:42:46] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:43:40] Gerrit: Data loss in gerrit review interface due to bad design - https://phabricator.wikimedia.org/T174551#3623611 (TerraCodes)
[06:43:42] Gerrit: "add comment" feature doesn't allow you to write a comment while viewing the code or viewing the other comments - https://phabricator.wikimedia.org/T48777#3623614 (TerraCodes)
[06:45:57] (PS1) Legoktm: chmod +x dockerfiles/composer/prebuild.sh [integration/config] - https://gerrit.wikimedia.org/r/379473
[06:50:05] [INFO] BUILDING wmfreleng/mediawiki-phan:v2017.09.21.06.45
[06:50:05] Sending build context to Docker daemon 3.584 kB
[06:50:05] Step 1/9 : FROM wmfreleng/composer:latest as composer
[06:50:05] Error parsing reference: "wmfreleng/composer:latest as composer" is not a valid repository/tag: invalid reference format
[06:53:56] aha
[06:53:57] https://docs.docker.com/v1.13/engine/reference/builder/#/from
[06:54:01] my docker version is just too old
[06:54:44] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:57:18] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[06:57:34] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[06:58:00] https://bugzilla.redhat.com/show_bug.cgi?id=1493147
[06:59:31] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:00:25] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:00:59] PROBLEM - Puppet errors on deployment-kafka-jumbo-2 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:01:10] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:01:17] Gerrit: Data loss in gerrit review interface due to bad design - https://phabricator.wikimedia.org/T174551#3623637 (Nikerabbit) duplicate>Open Sorry I don't see how this would be a duplicate. I am speaking about a data loss that happens when you have already commented, not the inability to write a co...
[07:02:26] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:02:42] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:02:46] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:02:52] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:03:22] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:03:24] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:03:38] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:03:46] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:03:51] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:04:01] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:04:21] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:04:24] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:04:48] PROBLEM - Puppet errors on deployment-eventlog02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:05:27] PROBLEM - Puppet errors on deployment-pdf01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:05:30] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:07:08] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:08:22] https://github.com/docker/for-linux/issues/35 this is pretty silly
[07:09:13] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[07:09:19] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:10:21] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:10:27] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:10:33] PROBLEM - Puppet errors on deployment-memc07 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:10:45] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:11:03] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:11:08] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:12:40] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:12:54] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:13:12] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:13:22] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:13:28] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:13:40] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:13:54] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:15:43] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:15:51] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:16:13] and the docker version in their "edge" repo is broken, yay. I'm giving up for tonight :(
[07:17:13] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:17:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:17:35] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:18:27] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:18:29] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:18:51] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:20:40] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:20:50] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:21:20] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:22:20] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:22:22] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:22:34] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:22:56] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:23:27] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:23:27] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:23:33] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:24:15] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:24:35] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:25:47] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:25:51] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:25:53] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:26:56] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:27:45] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:30:49] (PS1) Legoktm: [WIP] Add dockerfile for 'mediawiki-core-phpcs' job [integration/config] - https://gerrit.wikimedia.org/r/379479
[07:31:33] (CR) Legoktm: "I haven't figured out how to test this since my docker is too old, but I just copied the mediawiki-phan one." [integration/config] - https://gerrit.wikimedia.org/r/379479 (owner: Legoktm)
[07:42:25] (CR) Hashar: [C: 2] "And a follow up fix is https://gerrit.wikimedia.org/r/#/c/379297/" [integration/config] - https://gerrit.wikimedia.org/r/379473 (owner: Legoktm)
[07:47:44] (Merged) jenkins-bot: chmod +x dockerfiles/composer/prebuild.sh [integration/config] - https://gerrit.wikimedia.org/r/379473 (owner: Legoktm)
[07:49:46] Release-Engineering-Team, Operations, Phabricator: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3623717 (Joe)
[07:49:55] Release-Engineering-Team, Operations, Phabricator: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3623729 (Joe) p:Triage>High
[07:52:01] (CR) Hashar: dockerfiles: allow prebuild.sh without executable bit (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/379297 (owner: Hashar)
[07:56:44] (PS2) Hashar: fab: git gc zuul repo on the servers [integration/config] - https://gerrit.wikimedia.org/r/378667
[07:57:09] (CR) Hashar: [C: 2] fab: git gc zuul repo on the servers [integration/config] - https://gerrit.wikimedia.org/r/378667 (owner: Hashar)
[07:58:43] (Merged) jenkins-bot: fab: git gc zuul repo on the servers [integration/config] - https://gerrit.wikimedia.org/r/378667 (owner: Hashar)
[08:06:17] Beta-Cluster-Infrastructure: Access to deployment-prep for sau226 - https://phabricator.wikimedia.org/T176213#3623739 (Sau226) I understand I did something wrong but on the other hand there is no evidence that is reasonably accessible on wiki that declares the user as a user for testing. I understand the pur...
[08:11:40] addshore: good morning. I think in the Dockerfile we can just: ARG DEBIAN_FRONTEND=noninteractive
[08:11:49] not sure whether that defines the env variable at run time though :D
[08:12:01] or whether ARG is inherited by child images
[08:22:24] (CR) Hashar: WIP (I dont know tox) tox Dockefile (4 comments) [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[08:23:25] (PS2) Hashar: WIP (I dont know tox) tox Dockefile [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[08:23:35] (CR) jerkins-bot: [V: -1] WIP (I dont know tox) tox Dockefile [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[08:24:11] (PS3) Hashar: WIP (I dont know tox) tox Dockefile [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[08:25:54] (CR) Hashar: [C: 1] "Cherry picked against tip of master to get rid of the dependent change that provides the phan image." [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[08:39:20] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:40:26] RECOVERY - Puppet errors on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:40:32] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:42:24] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:42:46] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:44:25] RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[08:44:49] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:45:27] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[08:45:33] RECOVERY - Puppet errors on deployment-memc07 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:47:07] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:47:37] RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:47:45] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:48:11] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:48:26] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:48:42] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:48:56] RECOVERY - Puppet errors on deployment-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:49:16] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:49:20] RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:50:26] RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[08:50:29] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[08:50:48] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:50:50] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:51:04] RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:51:08] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[08:51:29] hashar: have you looked at the docker plugin for Jenkins at all?
[08:52:02] addshore: more or less and shared some summary on a task
[08:52:03] I think I convinced myself a single docker image is the right way to go
[08:52:04] let me find it
[08:52:13] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:52:18] While sleeping last night
[08:52:20] Dan Duvall did try a docker plugin but ditched it
[08:52:35] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:52:53] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:52:56] And although we probably want most images to be nice and generic to the type of job, there is also no reason we can't have ones like the puppet one to keep it super slick in CI
[08:53:08] so dan dismissed it via https://phabricator.wikimedia.org/T150505
[08:53:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:53:29] RECOVERY - Puppet errors on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:53:38] but I can't find the writing I did about each of the plugins
[08:55:41] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:55:41] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:55:59] addshore: https://etherpad.wikimedia.org/p/nodepool-migration !!!
[08:56:09] I list a bunch of them
[08:56:15] TLDR my conclusion is:
[08:56:20] RECOVERY - Puppet errors on deployment-sca04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:56:27] Short term: "Docker Plugin" which might be easy to integrate and would be backward compatible with our job definitions.
[08:56:27] Middle term: rewrite jobs to use Pipeline syntax.
[08:56:27] Long term: infra to a K8S
[08:56:58] so eg https://wiki.jenkins.io/display/JENKINS/Docker+Plugin && https://docs.openstack.org/infra/jenkins-job-builder/properties.html?highlight=docker#properties.docker-container
[08:57:22] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:57:36] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:57:48] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[08:58:28] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:58:48] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:59:16] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:59:35] Pipeline! I'm very pro
[08:59:36] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:59:52] Do you mean JenkinsFile style of pipeline?
[09:00:47] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[09:00:53] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[09:00:55] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:01:59] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:02:03] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:02:22] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0]
[09:02:54] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:03:11] What is blubber? :)
[09:03:28] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:03:28] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:03:32] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:04:08] Continuous-Integration-Infrastructure, Release-Engineering-Team: Determine rough plan to migrate out of Nodepool - https://phabricator.wikimedia.org/T176394#3623793 (hashar)
[09:04:35] Continuous-Integration-Infrastructure, Release-Engineering-Team: Determine rough plan to migrate out of Nodepool - https://phabricator.wikimedia.org/T176394#3623807 (hashar) Open>Resolved a:hashar Was merely to copy paste from an etherpad.
[09:04:42] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:04:46] addshore: it is a middleware that abstracts a Dockerfile
[09:05:06] so developers would be able to define what they want in the test/production image using an abstract syntax (yaml iirc)
[09:05:16] Oooh, nice :)
[09:05:23] blubber then processes it and generates the Dockerfile for ya
[09:05:31] so that is a way to give some liberty to developers
[09:05:39] while still preventing them from adding random crap to the image :]
[09:05:58] https://phabricator.wikimedia.org/source/blubber/repository/master/ < addshore
[09:06:04] written in go and hosted on Differential
[09:06:13] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:06:44] so dev -> blubber -> dockerfile for each env -> docker registry -> deploy --> profit $$$$$
[09:07:07] with the goal of having ^^ chain to be fully automatic
[09:07:33] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:07:41] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
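
The "abstract syntax in, Dockerfile out" idea described above can be illustrated with a small Python sketch. This is not Blubber and does not use its configuration format, which is not shown in this log: the keys `base`, `apt` and `entrypoint`, the `ALLOWED_BASES` allowlist and the function name `render_dockerfile` are all hypothetical, chosen only to show how a restricted description can be validated and then rendered.

```python
import json
import re

# Hypothetical allowlist: developers may only build on these base images.
ALLOWED_BASES = {"debian:jessie", "debian:stretch"}

# Conservative pattern for Debian package names, so nothing else
# can be smuggled into the generated RUN line.
PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.-]*")


def render_dockerfile(spec):
    """Render a Dockerfile from a small, restricted description.

    `spec` is a dict with a required "base" key and optional "apt"
    (list of package names) and "entrypoint" (list of strings) keys.
    These keys are illustrative, not Blubber's real schema.
    """
    base = spec["base"]
    if base not in ALLOWED_BASES:
        raise ValueError(f"base image not allowed: {base}")

    packages = sorted(spec.get("apt", []))
    for name in packages:
        if not PACKAGE_NAME.fullmatch(name):
            raise ValueError(f"invalid package name: {name!r}")

    lines = [f"FROM {base}"]
    if packages:
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(packages)
        )
    entrypoint = spec.get("entrypoint")
    if entrypoint:
        # The exec form of ENTRYPOINT is a JSON array of strings.
        lines.append("ENTRYPOINT " + json.dumps(list(entrypoint)))
    return "\n".join(lines) + "\n"
```

Because the generator only understands a few keys and rejects anything outside the allowlist, developers keep some freedom over the image contents while arbitrary Dockerfile instructions stay out.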
[09:07:45] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:08:15] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:08:21] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:08:23] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:08:37] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:08:46] RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:08:50] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:09:00] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:09:34] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:10:58] RECOVERY - Puppet errors on deployment-kafka-jumbo-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:12:52] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:17:47] addshore: the jenkins pipelines, I am not sure how to migrate the jjb recipes to it :D
[09:17:59] mostly, how to keep the templating system which we rely on heavily
[09:18:35] meanwhile, I have amended/rebased your dockerfile for tox and that looks fine to me now
[09:18:51] https://gerrit.wikimedia.org/r/#/c/377337/ just gotta change the subject line I guess
[09:18:53] Cool, I'll try and take a look in a bit
[09:18:59] I have a kind of manic day today :)
[09:25:15] I will be off shortly, got a bunch of accounting to do this afternoon
[09:25:20] and drive (I hate that)
[09:56:00] Fetched 11.6 MB in 6s (1911 kB/s)
[09:56:04] I got apt-cacher-ng setup :)
[10:04:50] Release-Engineering-Team, Operations, Phabricator: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3623717 (Paladox) Also would like to add that it is failing to connect to the db too.
[10:09:04] (PS1) Hashar: dockerfiles: support for http_proxy [integration/config] - https://gerrit.wikimedia.org/r/379507
[10:10:26] (CR) Hashar: "That needs some careful testing I guess. That dramatically speed up the builds ;]" [integration/config] - https://gerrit.wikimedia.org/r/379507 (owner: Hashar)
[10:16:29] I am off!
[10:23:21] !log removed 6fdf6ee653 from deployment-prep's puppet master cherry picks (seemed an old version of https://gerrit.wikimedia.org/r/#/c/357985)
[10:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:27:07] PROBLEM - Puppet errors on integration-saltmaster is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[10:43:16] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:02:11] RECOVERY - Puppet errors on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[11:09:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[11:11:10] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[11:44:20] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:17:59] Gerrit, Release-Engineering-Team (Backlog), Patch-For-Review: Update gerrit to 2.14.4 - https://phabricator.wikimedia.org/T156120#3624129 (Paladox) Gerrit 2.14 should be stable by now :).
[12:40:19] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[12:55:53] Release-Engineering-Team, Wikidata, Epic, User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3624233 (aude)
[13:04:54] found an instance of mismatch where the whole revision is different
[13:05:06] oh, sorry releng, wrong channel
[13:15:16] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:36:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:46:38] Yippee, build fixed!
[13:46:38] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #529: FIXED in 2 min 36 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/529/
[14:17:17] Release-Engineering-Team, Operations, Phabricator: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3624488 (Dzahn) What alert was it? I don't think there is any Icinga monitoring for it yet and it wasn't even expected to be used, like the servic...
[14:28:06] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): Upgrade docker on integration-slave-docker-* - https://phabricator.wikimedia.org/T176267#3624512 (akosiaris)
[14:28:08] Release-Engineering-Team (Next), Release Pipeline: Define new Jenkins pipeline for container build phase - https://phabricator.wikimedia.org/T175297#3624513 (akosiaris)
[14:28:10] Release-Engineering-Team (Kanban), Operations, Release Pipeline, Patch-For-Review: Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3624510 (akosiaris) Open>Resolved And done. Resolving
[14:32:24] Release-Engineering-Team (Kanban), Operations, Release Pipeline, Patch-For-Review: Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3624518 (hashar) ``` contint1001:~$ apt-cache policy docker-ce docker-ce: Installed: 17.06.2~ce-0~debian Candidate: 17.06.2~c...
[14:47:22] (CR) Zoranzoki21: [C: 1] Whitelist Dvorapa on Zuul CI [integration/config] - https://gerrit.wikimedia.org/r/375765 (owner: MarcoAurelio)
[15:00:13] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): Upgrade docker on integration-slave-docker-* - https://phabricator.wikimedia.org/T176267#3624616 (hashar) So the slaves have `docker-engine` ``` $ apt-cache policy docker-engine docker-engine: Installed: 1.12.6-0~debian-jessie...
[15:24:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<11.11%)
[15:34:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<55.56%)
[16:06:35] Beta-Cluster-Infrastructure: Access to deployment-prep for sau226 - https://phabricator.wikimedia.org/T176213#3617488 (greg) Just to be clear, right now is your bot not running on the Beta Cluster, @Sau226 ? I don't see any recent edits or deletions from it, but I don't want it to start again.
[16:13:14] Continuous-Integration-Infrastructure (phase-out-trusty), Release-Engineering-Team (Kanban), Patch-For-Review: Package php modules for Zend 5.5 on Jessie - https://phabricator.wikimedia.org/T174972#3624839 (hashar) Update: php-defaults / php-redis got build with php5.5 and uploaded to component/ci....
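
The earlier `FROM wmfreleng/composer:latest as composer` failure and the two `apt-cache policy` outputs above come down to one comparison: multi-stage builds (`FROM ... AS name`) need Docker 17.05 or newer. A minimal Python sketch of that check follows. The function names are made up for illustration, and the parser only reads the leading `major.minor` of a package version string, so it assumes there is no Debian epoch prefix.

```python
import re

# Multi-stage builds ("FROM image AS name") were introduced in Docker 17.05.
MULTISTAGE_MIN = (17, 5)


def parse_docker_version(version):
    """Return (major, minor) from a string such as '17.06.2~ce-0~debian'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_multistage(version):
    """True if this Docker version understands 'FROM ... AS name'."""
    return parse_docker_version(version) >= MULTISTAGE_MIN


# The two versions quoted in the log: docker-engine on the slaves,
# docker-ce on contint1001.
for installed in ("1.12.6-0~debian-jessie", "17.06.2~ce-0~debian"):
    print(installed, supports_multistage(installed))
```

Comparing integer tuples rather than raw strings matters here: as plain text, "1.12.6" and "17.06.2" would be compared character by character, which gives the right answer only by accident.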
[16:13:53] Project selenium-QuickSurveys » chrome,beta,Linux,BrowserTests build #538: ABORTED in 4.1 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/538/
[16:14:07] Project selenium-QuickSurveys » chrome,beta,Linux,BrowserTests build #540: STILL FAILING in 7.8 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/540/
[16:16:28] Beta-Cluster-Infrastructure: Access to deployment-prep for sau226 - https://phabricator.wikimedia.org/T176213#3624854 (greg) And for the record, a page that you deleted (that then broke browser tests for developers, see T176290) had the text "Test is used by Selenium web driver". I would expect you to read...
[16:16:40] Release-Engineering-Team, Operations, Phabricator: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3624856 (Paladox) it does work, it seems. It is starting on port 22280 see /srv/phab/aphlict/config.json root@phabricator:/home/paladox# telnet l...
[16:18:35] zeljkof: I addressed two of the three tasks about issues with beta cluster and browser tests, but I haven't figured out https://phabricator.wikimedia.org/T176315 yet
[16:35:14] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3623717 (mmodell) Indeed, it shouldn't be enabled or alerting. Hmm.
[16:37:15] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3624933 (Paladox) i think systemd sent the alert, per recovery at 8:35am this morning [08:35:48] <+icinga-wm> RECOVERY - C...
[16:44:19] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3624976 (mmodell) Is there a way to have a systemd unit installed but not auto-started/monitored/expected? That'd be ideal f...
[16:44:46] is anyone else having an issue in gerrit where scrolling down a file diff makes the interface scroll up instead? Pasting the diffs I'm experiencing it on in case it's just those changes or something:
[16:44:51] https://gerrit.wikimedia.org/r/#/c/379227/2/sys/mediawiki-history-metrics.js
[16:44:54] https://gerrit.wikimedia.org/r/#/c/376235/10/oozie/mediawiki/history/reduced/mediawiki-stats-druid.md
[16:45:43] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:48:04] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3624986 (Paladox) @mmodell yep, i think it's because we are using base::service, so i guess lets just say for it to not run....
[16:48:30] twentyafterfour woohoo
[16:48:31] it works
[16:48:32] https://phab-01.wmflabs.org/notification/
[16:48:43] now it's disconnected
[16:48:44] hmm
[16:49:48] we can remove all trusty code from phabricator
[16:49:52] no longer needed.
[16:49:58] we can remove upstart too
[16:50:35] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 35629 bytes in 0.989 second response time
[16:50:54] paladox: yep
[16:51:26] * paladox wonders why it says starting up but then disconnects.
[17:01:02] ah i know why
[17:01:10] is it because the port has to be enabled
[17:01:15] for use outside of the instance
[17:14:16] Yippee, build fixed!
[17:14:17] Project selenium-QuickSurveys » chrome,beta,Linux,BrowserTests build #541: FIXED in 30 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/541/
[17:26:46] thcipriani: did you miss any firewall/iptable rules on memc07? Seems a bit different than memc06. (e.g. https://phabricator.wikimedia.org/T175418)
[17:27:24] AaronSchulz: hrm, lemme double check, I think I added the memcached security group...
[17:28:15] memc06 works with telnet from tin just fine on the mc port
[17:30:28] hrm, yeah, both have this in horizon: ALLOW 11211:11211/tcp
[17:31:13] but I am seeing the same thing with netcat
[17:31:30] Beta-Cluster-Infrastructure: Access to deployment-prep for sau226 - https://phabricator.wikimedia.org/T176213#3625187 (Legoktm) This is my last response on this ticket. >>! In T176213#3623739, @Sau226 wrote: > I understand I did something wrong but on the other hand there is no evidence that is reasonably a...
[17:33:24] and missing iptables rules on that box
[17:34:26] and puppet does nothing to change that. Alright, I'm just going to rebuild that box
[17:39:39] PROBLEM - Host deployment-memc07 is DOWN: CRITICAL - Host Unreachable (10.68.23.111)
[17:58:14] AaronSchulz: rebuilt the host. should be fixed. iptables looks right.
[18:04:03] RECOVERY - Host deployment-memc07 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms
[18:12:16] thcipriani: would you have time to look at https://gerrit.wikimedia.org/r/#/c/379479/ and see if it's going in the right direction?
my local version of docker is too old, so I haven't been able to test it
[18:13:26] legoktm: yeah, I can test it out and give it some feedback, was excited to see the review in my inbox :)
[18:13:31] :D
[19:18:56] (PS4) Addshore: WIP (I dont know tox) tox Dockefile [integration/config] - https://gerrit.wikimedia.org/r/377337
[19:31:24] oooh legoktm ill look too
[19:32:04] ca-certificates, I think we should make a base image that has that and maybe git in it
[19:32:58] --report-checkstyle=./log/checkstyle.xml, so with my testing yesterday and adding a user to the image such as phpcs, that wont be able to write to the log dir unless the dir is 777
[19:33:26] and then, if that user does write there, the jenkins user cant remove the log files for the next run outside of the container :/
[19:33:51] still not sure what the 'right' thing to do with users is, i hate hardcoding the jenkins user and ids in the image...
[19:34:04] is there a problem if we run everything as root?
[19:34:45] in the containers? so, ideally we shouldn't, and that probably then still has the same issue with removing logs etc after runs as they would then be owned by root
[19:35:18] so, for the docker run command we can pass in -u and then a user/group combo, so the files would have the correct permissions :)
[19:35:27] hm ok
[19:35:50] However, git then complains if you do -u jenkins-deploy and the user isnt in /etc/passwd
[19:36:21] I then tried -v \"//etc/passwd://etc/passwd\" but jenkins-deploy also isnt in the file on the host machine as it is just a ldap user not a machine user
[19:36:30] then i went to bed
[19:36:55] But I guess there is some reason git works on the host machine under the jenkins-deploy user so i guess there might be some other file that can be mounted that fixes that issue?
[19:37:20] oh gosh
[19:37:51] still, users can be a bit tricky and I was going to try and find some examples of other people doing similar things, do they just chmod things? chown things? use root?
bla bla bla....
[19:38:38] but yeh, I made the php, composer, mediawiki-phan images etc to try out the other approach i was thinking of for containers for CI, where commands are dockerized but the whole build process actually isnt
[19:38:49] and this user stuff was the last thing that tripped me up
[19:39:07] (PS5) Addshore: WIP DNM docker: alternate docker based phan job.. [integration/config] - https://gerrit.wikimedia.org/r/378535
[19:39:20] legoktm: ^^ that is my WIP job that is deployed as ....-phan-docker
[19:39:53] but, not convinced thats the right path now, but it was a fun experiment
[19:40:13] hmm
[19:40:19] I think the whole process should be dockerized
[19:40:37] well, it just seems a lot more straightforward if it was
[19:40:49] yup
[19:41:07] thats basically the conclusion i came to again after playing with that stuff yesterday
[19:41:53] although im still not sure about the fetching of the code being in the container
[19:42:08] ie. fetching of the code being tested
[19:46:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:53:48] are the zuul refs on gerrit directly? IE can I use zuul-cloner locally to work as in prod / prodci?
[19:54:57] no
[19:55:04] I think zuul-merger creates them
[19:55:12] ^
[19:55:36] they're in the zuul git repo on contint1001 iirc
[19:55:51] which is only accessible to labs
[19:56:06] for reasons I don't remember
[19:57:06] you could probably fake the zuul refs
[19:57:27] or just copy the git repo from contint1001 locally?
it doesn't have any private stuff in it
[19:57:33] yeh, thats the one reason im against having the zuul / fetching of code in the testing image
[19:57:38] as then we can't use them locally
[19:58:32] But I don't really see an issue of having 1 image / just the regular zuul-cloner fetching the code and then everything else happening in a testing image
[20:00:02] thcipriani: do you have any idea about the git /etc/passwd file issue with users? or any idea why the host machine can use git with the jenkins-deploy user without git complaining?
[20:01:35] addshore: can you show me what you're working on? Is this just a work around for hard-coding the jenkins-deploy uid? I'm fine hard-coding it as it likely won't change and we can just rebuild if it needs to...
[20:02:12] but again, if you hardcode the id then the images can't be used for testing on other machines / locally :/ which was what I was trying to get to
[20:02:36] maybe its just a dream I shouldnt be chasing
[20:03:06] ah, gotcha. Hrm. This is a tricky dream to chase :)
[20:03:22] so, things like this https://integration.wikimedia.org/ci/job/mwext-php70-phan-jessie-docker/22/console where I pass in the uid and group id and mount the passwd file but git still complains
[20:03:22] https://integration.wikimedia.org/ci/job/mwext-php70-phan-jessie-docker/22/console
[20:03:56] im thinking there must be another file I can mount or something for the ldap users... or i could mount a fake passwd file and it would work
[20:06:40] "If you need to use SSH protocol inside a container (e.g., git clone), you will need the password file."
from https://github.com/moby/moby/issues/22323, but that doesnt work as the host password file has no jenkins-deploy
[20:06:45] anyway, going around in circles again :D
[20:07:20] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:08:10] meanwhile I'm still trying to grok what's happening in this job :)
[20:08:43] hehe :)
[20:09:06] the phan job is essentially exactly the same as the previous phan job, just the composer, phan and zuul-cloner commands alias to docker run commands
[20:09:19] hmmmm, Re the ldap thing >> https://github.com/moby/moby/issues/7057
[20:10:36] I mean, would it be possible for puppet to add the jenkins-deploy user to the passwd file on the docker hosts (maybe), again i could just mount a fake passwd file, anyway, i think i'll just keep mulling it over this evening
[20:11:10] so fwiw, jenkins-deploy is not in /etc/passwd on any of the integration hosts, which is probably why mounting /etc/passwd doesn't work
[20:11:24] running zuul-cloner on the host and the rest in docker seems not too crazy fwiw
[20:11:26] * legoktm -> class
[20:11:38] legoktm: *agrees*
[20:12:09] sounds fine to me, just need to ensure job cleans up after itself
[20:12:35] could use /srv/git/[repo] as reference repo, clone inside the workspace, mount as volume, run tests
[20:13:12] but then getting the log files out is still problematic?
[20:15:03] yeah using sudo docker run --rm -it -u 2947:500 to launch a container gives me the prompt: I have no name!@1441aabe8a0f
[20:15:09] which is a little disturbing, really
[20:16:26] so, I think zuul-cloner to clone as it already does to $WORKSPACE/src and then the testing image to mount both /src and /log and run on the code in src while outputing to log
[20:16:28] so we only run phpcs on gate? seems like it would be more helpful to include that in the test jobs.
[20:17:02] phpcs runs as part of the composer tests most of the time too
[20:17:30] * twentyafterfour just +2'd a patch that failed gate for a formatting violation (missing a space before a closing paren)
[20:17:37] https://gerrit.wikimedia.org/r/#/c/368803/
[20:17:41] thcipriani: maybe at the end of the image / run the files in /log get chmod so jenkins-deploy can delete them after the container run?
[20:17:45] maybe just set up that way for skins?
[20:17:54] sorry to distract
[20:18:12] twentyafterfour: ahh yes maybe, the extensions mostly have a "composer test" which will include phpcs
[20:18:33] addshore: so one awful way to do what you're trying to do that seems to work... getent passwd jenkins-deploy > fakepasswd; docker run --rm -it -u 2947:500 -v $(pwd)/fakepasswd:/etc/passwd --entrypoint /bin/bash wmfreleng/zuul-cloner:v2017.09.20.13.44
[20:19:12] thcipriani: yeh, that starts looking uglier and uglier, but i guess we could wrap it in something pretty....
[20:19:19] :)
[20:19:24] put a dress on it :P
[20:19:52] lipstick for all of the pigs
[20:20:54] this is probably a sign we're attempting something weird in the first place
[20:21:09] the fact that we have to mount a fake /etc/passwd, I mean
[20:21:22] yup
[20:21:23] the end goal would be: take a local directory, run tests, output logs
[20:21:29] yup
[20:21:45] i had vague thoughts of a docker volume per jenkins job too, but that also sounds ugly
[20:22:05] with the caveats that we don't want to run as root, and jenkins-deploy needs read on the logs
[20:22:39] chmod 666 :P
[20:23:02] or 644 seems like it should be fine, I think...
[20:23:06] yeh
[20:23:55] I mean, to make the docker images 'really nice' we could have an env var you could set to the uid that should be able to read stuff? and then the docker image has sudo to chown? and can do that as part of the run?
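(Editor's sketch, not part of the channel log.) The fake /etc/passwd workaround quoted at [20:18:33] could be wrapped roughly as below. This is only a sketch under assumptions: the function names `write_fake_passwd` and `docker_cmd_with_passwd` are hypothetical, the `:ro` mount flag is an addition that was not in the chat, and the second function only prints the command rather than running docker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the fake-passwd idea from the chat.

# write_fake_passwd USER OUTFILE
# getent asks every configured NSS source (files, ldap, ...), so it can
# return an entry for an LDAP-only user that is absent from /etc/passwd.
write_fake_passwd() {
    getent passwd "$1" > "$2"
}

# docker_cmd_with_passwd UID GID PASSWD_FILE IMAGE
# Prints (does not execute) a docker run command that starts the container
# as UID:GID and mounts PASSWD_FILE over /etc/passwd, read-only (assumption).
# PASSWD_FILE should be an absolute path, as docker requires for bind mounts.
docker_cmd_with_passwd() {
    printf 'docker run --rm -u %s:%s -v %s:/etc/passwd:ro %s\n' "$1" "$2" "$3" "$4"
}
```

For example, `write_fake_passwd jenkins-deploy "$PWD/fakepasswd"` followed by `docker_cmd_with_passwd 2947 500 "$PWD/fakepasswd" wmfreleng/zuul-cloner:v2017.09.20.13.44` would print a command close to the one pasted in the channel, minus `-it` and `--entrypoint`.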
[20:23:59] endless messy possibilities :D
[20:24:09] :)
[20:24:31] I just still cant decide which is 'right' or even 'nicest'
[20:25:54] maybe to run the tests locally there's a docker container with jenkins in it
[20:26:13] (no bad ideas in brainstorming)
[20:27:54] maybe I should just stop thinking about that for now :P
[20:29:22] I have seen quite a few docker containers that rather than create a user just use the user "nobody"
[20:29:26] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3625569 (mmodell)
[20:29:31] which will exist on the host system and in the image
[20:30:00] and then maybe jenkins-deploy could actually have the access to chown from nobody to itself? maybe using a specific bash script through sudo?
[20:30:18] That could solve the whole user issue
[20:30:53] / only the ability to chown things in jenkins workspaces, which it should always own really anyway...
[20:32:05] then the images could spit out logs in whatever fashion they want, and there are no wmf specific users in the image, that also works for local systems (i guess) people just have to chown the logs / sudo to read them?
[20:35:23] hrm, the nobody user might work
[20:35:28] just playing around with it
[20:36:29] created a directory where anyone can write (umask 000), mounted in container, ran as nobody, put a file in there. Since the default umask is 022 jenkins-deploy can still read any file created by nobody
[20:37:02] oooooohhhhhh
[20:38:08] seems like an easier thing to script locally too
[20:39:41] yup
[20:39:44] oooooh :)
[20:39:54] I'll probably take another dive into stuff tomorrow then!
[20:40:22] awesome, hopefully that is a decent path forward :)
[20:41:31] yep, that fixes the issue i hit most recently :)
[20:42:00] Yippee, build fixed!
[20:42:00] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #524: FIXED in 59 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/524/
[20:42:04] Yippee, build fixed!
[20:42:04] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #524: FIXED in 1 min 3 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/524/
[20:44:40] right, bed
[20:47:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:38:17] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[23:43:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
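(Editor's sketch, not part of the channel log.) The "nobody" approach tried at [20:36:29] might be scripted along these lines. It is a minimal sketch, assuming a world-writable log directory (made here with `chmod 0777` rather than the `umask 000` used in the chat), the /src and /log mount points proposed at [20:16:26], and a `nobody` user present in the image. The function names are hypothetical and the docker command is only printed, not run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the run-as-nobody idea from the chat.

# prepare_log_dir DIR
# Creates DIR and makes it writable by any uid, so a container process
# running as nobody can drop log files into it.
prepare_log_dir() {
    mkdir -p "$1"
    chmod 0777 "$1"
}

# docker_cmd_as_nobody SRC_DIR LOG_DIR IMAGE
# Prints (does not execute) a docker run command that mounts the code at
# /src and the log directory at /log, running as the image's nobody user.
# With the usual umask of 022, files written there are world-readable,
# so the job user on the host can still read them afterwards.
docker_cmd_as_nobody() {
    printf 'docker run --rm -u nobody -v %s:/src -v %s:/log %s\n' "$1" "$2" "$3"
}
```

Whether the host user can also delete those files afterwards depends on the directory permissions, not the file owner, which is why the directory itself is left writable here.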