[00:14:41] RelEng-Archive-FY201718-Q1, Phabricator, User-Matthewrbowker: "No edit forms" when attempting to edit task - https://phabricator.wikimedia.org/T176769#3638294 (Matthewrbowker) >>! In T176769#3637531, @mmodell wrote: > Uhm, I locked down the form because people complained that the bug task type would...
[00:16:57] Yippee, build fixed!
[00:16:58] Project selenium-Flow » firefox,beta,Linux,BrowserTests build #526: FIXED in 57 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/526/
[00:47:17] Continuous-Integration-Infrastructure, Composer, Patch-For-Review: Upgrade integration/composer to 1.4.1 stable - https://phabricator.wikimedia.org/T125343#1985014 (Dzahn) Bump, this has been labeled "next" since February and there is a change sitting in Gerrit since March that is "Update composer to...
[00:48:22] Gerrit, Release-Engineering-Team (Next), Operations, Patch-For-Review: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3638321 (Dzahn) Well, that change above has "--enable-sshd" as option when it is a slave, and it wasn't merged.. and we were wonde...
[00:52:17] (Abandoned) Dzahn: Add assert-phpflavor.sh shell script [integration/jenkins] - https://gerrit.wikimedia.org/r/296060 (https://phabricator.wikimedia.org/T124572) (owner: Paladox)
[00:52:19] (Abandoned) Dzahn: Add mw-teardown-postgresql for postgres [integration/jenkins] - https://gerrit.wikimedia.org/r/316230 (https://phabricator.wikimedia.org/T22343) (owner: Paladox)
[04:56:26] (PS6) Umherirrender: Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790
[04:58:07] (CR) jerkins-bot: [V: -1] Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790 (owner: Umherirrender)
[05:05:47] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[05:26:20] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:13] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47000 bytes in 1.264 second response time
[05:35:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:28:21] Gerrit, Release-Engineering-Team (Next), Operations, Patch-For-Review: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3638503 (Paladox) Gerrit ssh won’t start because it carn’t connect to the mysql db. Since chad coulden’t get init to work.
[06:57:19] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[07:11:02] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Define new Jenkins pipeline for container build phase - https://phabricator.wikimedia.org/T175297#3638550 (hashar) On integration-slave-docker-1705 I changed the `docker.service` to use debug log level (in ExecStart, pass `-D`) a...
[07:17:41] Gerrit, Release-Engineering-Team, Operations: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3638554 (qoreqyas)
[07:28:50] Gerrit, Release-Engineering-Team, Operations: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3638603 (Paladox) Hi, this is a known problem with git-review. Try cloning over ssh please.
[07:29:07] Gerrit: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3638604 (Paladox)
[07:34:31] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Define new Jenkins pipeline for container build phase - https://phabricator.wikimedia.org/T175297#3638605 (hashar) Clarified with @joe labs instances are not allowed to interact with the docker registry. Nginx rejects them and th...
[08:05:10] Beta-Cluster-Infrastructure, Multimedia, Thumbor, Multimedia-Team-Working-Board: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3638662 (matthiasmullie) a:matthiasmullie This is now deployed properly (on beta)
[08:25:01] (PS9) Hashar: dockerfiles: config file + http_proxy support [integration/config] - https://gerrit.wikimedia.org/r/379507
[08:25:02] (PS1) Hashar: dockerfiles: create log dir in example-run.sh [integration/config] - https://gerrit.wikimedia.org/r/380925
[08:25:04] (PS1) Hashar: dockerfiles: change tox to user nobody [integration/config] - https://gerrit.wikimedia.org/r/380926
[08:30:32] morning hashar
[08:30:46] I threw up a test core phpcs job from legoktm's patch
[08:30:54] got to sort out the git cache though I think
[08:35:29] (PS1) Hashar: Experimental integration-config-tox-docker job [integration/config] - https://gerrit.wikimedia.org/r/380929
[08:39:19] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[08:39:34] (CR) Addshore: [C: 2] dockerfiles: create log dir in example-run.sh [integration/config] - https://gerrit.wikimedia.org/r/380925 (owner: Hashar)
[08:40:22] (CR) Addshore: [C: 2] dockerfiles: config file + http_proxy support [integration/config] - https://gerrit.wikimedia.org/r/379507 (owner: Hashar)
[08:40:37] (Merged) jenkins-bot: dockerfiles: create log dir in example-run.sh [integration/config] - https://gerrit.wikimedia.org/r/380925 (owner: Hashar)
[08:40:42] hashar: hey, can I have an empty repo named wikibase/wikiba.se-deploy (See https://phabricator.wikimedia.org/T171274#3625886) also mirrored in github and diffusion
[08:40:45] (CR) Addshore: [C: 1] dockerfiles: change tox to user nobody [integration/config] - https://gerrit.wikimedia.org/r/380926 (owner: Hashar)
[08:40:47] Should I make a phab card?
[08:44:23] (CR) Hashar: "https://integration.wikimedia.org/ci/job/integration-config-tox-docker/1/console" [integration/config] - https://gerrit.wikimedia.org/r/380926 (owner: Hashar)
[08:44:38] Amir1: yes please. I am overwhelmed
[08:44:58] oh, I hope I can help, is there any?
[08:45:26] Amir1: get a sword of Docker +5 :]
[08:45:58] addshore: with all the patches you and legoktm wrote, I managed to create a tox job in docker :]
[08:46:07] :D
[08:46:32] it looks like it fails? :D but runs!!
[08:46:55] how can we set up a git cache in /srv/git on the docker hosts then?
[08:47:10] hashar: that takes a little bit of time :D https://phabricator.wikimedia.org/T176841
[08:47:11] for the mediawiki-phpcs one I was going to try out running zuul on the host and then just mounting a src dir
[08:48:02] I have tried with having git bare repositories on the host under /srv/git
[08:48:15] then running docker -v /srv/git:/srv/git:ro
[08:48:30] and then: zuul-cloner --cache-dir /srv/git
[08:48:55] which under the hood causes zuul-cloner to invoke: git clone /srv/git/$ZUUL_PROJECT .
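The cache-assisted checkout discussed here runs in four steps: clone the bare repository from `--cache-dir`, add Gerrit as a remote and fetch from it, fetch `$ZUUL_REF` from `$ZUUL_URL`, then check out `$ZUUL_COMMIT`. The sketch below expresses those steps in Python as a list of git commands. It is not zuul-cloner's own code. The helper name `clone_plan`, the remote name `gerrit` and the example URLs are assumptions. The optional `shared` flag adds `git clone --shared`, which records the cache in `.git/objects/info/alternates` instead of hard-linking or copying objects, so it should keep working when `/srv/git` is mounted read-only.

```python
import os
from typing import Callable, List


def clone_plan(cache_dir: str, project: str, gerrit_url: str, zuul_url: str,
               zuul_ref: str, zuul_commit: str, dest: str = ".",
               shared: bool = False,
               exists: Callable[[str], bool] = os.path.isdir) -> List[List[str]]:
    """Return, in order, the git commands for a cache-assisted checkout (hypothetical helper)."""
    cached = os.path.join(cache_dir, project)
    plan: List[List[str]] = []
    if exists(cached):
        clone = ["git", "clone"]
        if shared:
            # Borrow objects through .git/objects/info/alternates instead of
            # hard-linking or copying them; works with a read-only cache.
            clone.append("--shared")
        plan.append(clone + [cached, dest])
        # Add Gerrit as a remote ("gerrit" is an assumed name) and refresh,
        # since the cache may lag behind the canonical branches.
        plan.append(["git", "-C", dest, "remote", "add", "gerrit",
                     f"{gerrit_url}/{project}"])
        plan.append(["git", "-C", dest, "fetch", "gerrit"])
    else:
        # No bare repository in the cache: clone straight from Gerrit.
        plan.append(["git", "clone", f"{gerrit_url}/{project}", dest])
    # Fetch the change prepared by the Zuul merger and check out its commit.
    plan.append(["git", "-C", dest, "fetch", f"{zuul_url}/{project}", zuul_ref])
    plan.append(["git", "-C", dest, "checkout", zuul_commit])
    return plan
```

To execute the plan, run each entry with `subprocess.run(cmd, check=True)`. One caveat applies to `--shared`: the checkout depends on objects that live in the cache, so it can break if those objects are pruned from the cache while the workspace still exists.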
[08:49:01] it then adds gerrit as a remote and fetch from it
[08:49:15] then reach out to $ZUUL_URL to fetch $ZUUL_REF and checkout $ZUUL_COMMIT :]
[08:49:23] the thing is
[08:49:38] on nodepool the workspace and /srv/git are on the same partition and belong to the same user
[08:49:42] so git clone uses hard links
[08:50:20] with the read-only volume, git clone is going to copy files. But maybe we can use the git alternates system though zuul-cloner does not support it
[08:50:49] (see git clone --shared )
[08:52:09] hmm okay
[08:52:37] hashar: see my comments on https://gerrit.wikimedia.org/r/#/c/379479/
[08:52:56] right now zuul is pointing to gerrit.wikimedia.org, isnt there somewhere closer and faster it can point at that has the same refs?
[08:53:03] though there is an env variable for git: `GIT_ALTERNATE_OBJECT_DIRECTORIES`:: ... list of git object directories which can be used to search for Git objects.
[08:53:37] I get confused again about the need of zuul-cloner vs a regular clone
[08:53:58] for single repos it doesnt matter
[08:54:11] you can just use the regular clone
[08:54:27] ack
[08:54:27] but when cloning core + extensions, it has some logic to map repositories names to directories
[08:54:52] and checkout the proper reference from zuul-merger (or when reference is missing checkout $ZUUL_BRANCH)
[08:55:00] so yeah, that is really when you gotta clone multiple repositories
[08:55:02] how about zuul cloner in https://gerrit.wikimedia.org/r/#/c/379479/11/jjb/macro-docker.yaml, does that have to use gerrit or can it use the contint server?
[08:55:21] Gerrit: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3638748 (Peachey88)
[08:55:49] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[08:56:11] (CR) Addshore: "So, as this job only needs core it would be faster to do a shallow git clone." [integration/config] - https://gerrit.wikimedia.org/r/379479 (owner: Legoktm)
[08:56:20] addshore: well it first clone from cache-dir if it has a bare repository for $ZUUL_PROJECT
[08:56:36] then it would refresh the copy from gerrit
[08:56:44] and finally fetch from $ZUUL_URL
[08:56:55] so that uses both
[08:57:22] one reason is the branches on the zuul-merger are wrong. They can be any arbitrary commit
[08:57:39] when Gerrit has the canonical branches references
[09:02:01] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:05:21] (CR) Hashar: Experimental Migrate 'mediawiki-core-phpcs' job to docker (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/379479 (owner: Legoktm)
[09:09:18] addshore: replied
[09:12:55] (CR) Addshore: Experimental Migrate 'mediawiki-core-phpcs' job to docker (2 comments) [integration/config] - https://gerrit.wikimedia.org/r/379479 (owner: Legoktm)
[09:13:01] hashar: awesome thanks! some great notes!
[09:13:12] I'll get that job finished off and deployed today I expect
[09:13:34] hashar: we need to have some icinga / whatever it is called checks on the docker hosts really
[09:14:24] see https://phabricator.wikimedia.org/T176747 and https://phabricator.wikimedia.org/T176623 should have something that watches the number of containers and the number of images
[09:14:44] containers should always be somewhere between 0 and the number of slots allowed by jenkins for jobs
[09:15:14] (PS4) Hashar: docker: base image for CI images [integration/config] - https://gerrit.wikimedia.org/r/378033
[09:15:18] some script to tidy up old images that are unused would also be good
[09:15:34] (CR) Hashar: [C: -1] "I have to migrate the other images to this new image." [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[09:16:03] addshore: do you mean running containers?
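The proposed Icinga check, in which running containers should stay between 0 and the number of Jenkins executor slots, could look roughly like this Nagios-style plugin sketch. The function name and its inputs are hypothetical. It assumes the container IDs come from `docker ps --quiet` and that whoever configures the check supplies the slot count. The exit codes follow the usual Nagios plugin convention: 0 for OK, 2 for CRITICAL and 3 for UNKNOWN.

```python
from typing import Iterable, Tuple

# Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_container_count(container_ids: Iterable[str],
                          executor_slots: int) -> Tuple[int, str]:
    """Return (exit code, status line) for the number of containers.

    container_ids: assumed to be `docker ps --quiet` output split into lines.
    executor_slots: Jenkins executors on this agent (supplied by the caller).
    """
    if executor_slots < 0:
        return UNKNOWN, f"UNKNOWN: invalid executor slot count {executor_slots}"
    # Ignore blank lines, e.g. a trailing newline in the docker output.
    count = sum(1 for cid in container_ids if cid.strip())
    if count > executor_slots:
        return CRITICAL, (f"CRITICAL: {count} containers running, "
                          f"only {executor_slots} executor slots")
    return OK, f"OK: {count}/{executor_slots} containers running"
```

A thin wrapper would print the message and exit with the returned code. It could feed the check with `subprocess.run(["docker", "ps", "--quiet"], capture_output=True, text=True).stdout.splitlines()`. To count idle (stopped) containers as well, use `docker ps --all --quiet`.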
[09:16:17] running containers and also just idle containers
[09:16:29] supposedly Jenkins kill them for us after a timeout
[09:16:42] yeh, see the ticket, apparently that doesnt happen in the real world
[09:17:14] and as a result that slave can get overwhelmed really easily, i could barely log into 1001 when i found the containers still running
[09:17:25] + some garbage collection such as: docker container prune && docker image prune
[09:17:38] eek
[09:17:51] OHHHH
[09:17:52] (CR) Addshore: docker: base image for CI images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[09:17:53] yeah
[09:17:56] hmm
[09:17:58] I dont know
[09:18:05] signal handling I guess
[09:18:25] Gerrit: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3638554 (Zoranzoki21) >>! In T176835#3638603, @Paladox wrote: > Hi, this is a known problem with git-review. Try cloning over ssh please. Yes. With ssh work normal.
[09:18:41] like, my mad ideas included, giving the docker container a name, that is something assigned by jenkins, and then we can just run docker container stop XXXX always after running stuff?
[09:18:42] (PS5) Hashar: docker: base image for CI images [integration/config] - https://gerrit.wikimedia.org/r/378033
[09:19:50] (CR) Addshore: docker: base image for CI images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[09:25:56] (CR) Hashar: docker: base image for CI images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[09:31:20] (CR) Hashar: [C: 1] "Should be good. I will port the other images one by one since they are not all so trivial." [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[09:31:50] addshore: sounds like a good idea. Then I would also love to find out why the containers are left running behind
[09:31:59] maybe because the signal is intercepted somehow
[09:33:30] (PS1) Hashar: (WIP) dockerfiles: migrate some images to ci-jessie [integration/config] - https://gerrit.wikimedia.org/r/380940
[09:33:48] addshore: so I think https://gerrit.wikimedia.org/r/#/c/378033/ can be merged :]
[09:33:52] that is very basic base image
[09:34:16] Yup, I'll try building it now and see how it goes for me
[09:34:24] Just restarting my laptop
[09:39:03] (CR) Addshore: docker: base image for CI images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[09:48:46] addshore: yeah the apt config comes from bootstrap-vz, the recipe is somewhere in puppet.git :]
[09:49:20] and potentially https://gerrit.wikimedia.org/r/#/c/380940/ would migrate some images to it ( wip , I havent build the image / checked them)
[09:50:39] kids && lunch time
[10:08:36] !log docker push docker.io/wmfreleng/ci-jessie:v2017.09.27.09.59 & latest (from https://gerrit.wikimedia.org/r/#/c/378033/5)
[10:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:08:59] (CR) Addshore: [C: 2] docker: base image for CI images [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[10:10:06] (Merged) jenkins-bot: docker: base image for CI images [integration/config] - https://gerrit.wikimedia.org/r/378033 (owner: Hashar)
[10:45:17] (PS2) Addshore: (WIP) dockerfiles: migrate some images to ci-jessie [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[10:46:39] (CR) Addshore: "All built, here are the differences." [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[10:49:59] (PS1) Matthias Mullie: Add 3D extension [tools/release] - https://gerrit.wikimedia.org/r/380949
[10:54:50] (PS3) Addshore: (WIP) dockerfiles: migrate some images to ci-jessie [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[10:55:07] (CR) Addshore: "wmfreleng/composer v2017.09.27.10.49 3952217c3da9 7 seconds ago 192MB" [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[10:57:14] (PS4) Addshore: dockerfiles: migrate some images to ci-jessie [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[10:58:03] (CR) Addshore: [C: 2] dockerfiles: migrate some images to ci-jessie [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[11:00:30] (Merged) jenkins-bot: dockerfiles: migrate some images to ci-jessie [integration/config] - https://gerrit.wikimedia.org/r/380940 (owner: Hashar)
[11:01:08] !log docker push docker.io/wmfreleng/tox:v2017.09.27.10.21 & latest (From https://gerrit.wikimedia.org/r/#/c/380940)
[11:01:08] !log docker push docker.io/wmfreleng/php:v2017.09.27.10.21 & latest (From https://gerrit.wikimedia.org/r/#/c/380940)
[11:01:09] !log docker push docker.io/wmfreleng/lintr:v2017.09.27.10.45 & latest (From https://gerrit.wikimedia.org/r/#/c/380940)
[11:01:09] !log docker push docker.io/wmfreleng/composer:v2017.09.27.10.49 & latest (From https://gerrit.wikimedia.org/r/#/c/380940)
[11:01:09] !log docker push docker.io/wmfreleng/php-mediawiki:v2017.09.27.10.51 & latest (From https://gerrit.wikimedia.org/r/#/c/380940)
[11:01:09] !log docker push docker.io/wmfreleng/mediawiki-phan:v2017.09.27.10.53 & latest (From https://gerrit.wikimedia.org/r/#/c/380940)
[11:01:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:01:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:01:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:01:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:01:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:02:23] It would be 'sweet' to get auto building working and then tag new images with the gerrit change number, so gerrit.380940 for example
[11:12:10] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:37:59] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 392 bytes in 0.002 second response time
[11:42:01] Gerrit: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3639094 (Aklapper) See T100987
[11:42:09] Gerrit: 404 for /changes/ with Gerrit / git-review - https://phabricator.wikimedia.org/T176835#3639097 (Aklapper)
[11:42:11] Gerrit, RelEng-Archive-FY201718-Q1, Upstream: "git review -d XXX" doesn't work for http gerrit - https://phabricator.wikimedia.org/T100987#1325879 (Aklapper)
[11:47:10] addshore: I was thinking of a way to safely build docker images in CI . No luck so far but I will probably experiment with kvm / qemu
[11:48:45] Well, a labs VM would do? ;)
[11:49:20] I thought about docker build --isolation=qemu , but that is rejected for docker build ( https://github.com/moby/moby/issues/29454 )
[11:49:27] they dont want to support isolation
[11:49:30] but
[11:49:43] I think kubernetes support running the containers under a container
[11:49:44] :D
[12:01:50] hah, containers in containers
[12:10:25] (PS10) Addshore: dockerfiles: config file + http_proxy support [integration/config] - https://gerrit.wikimedia.org/r/379507 (owner: Hashar)
[12:10:28] (CR) Addshore: [C: 2] dockerfiles: config file + http_proxy support [integration/config] - https://gerrit.wikimedia.org/r/379507 (owner: Hashar)
[12:12:20] (Merged) jenkins-bot: dockerfiles: config file + http_proxy support [integration/config] - https://gerrit.wikimedia.org/r/379507 (owner: Hashar)
[12:14:19] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:17:22] addshore: would you mind configuring your text editor to add a new line at end of files? :]
[12:17:34] hahahhahaaa :p
[12:20:02] (PS1) Addshore: Add .editorconfig for newlines at end of files [integration/config] - https://gerrit.wikimedia.org/r/380955
[12:20:03] hasharAway: ^^
[12:20:12] !!
[12:21:03] !!!
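One way to implement the idea of naming each container after the Jenkins build and always running `docker container stop` afterwards is sketched below. It rests on three assumptions: the name is derived from Jenkins' `BUILD_TAG` environment variable, the helper names are hypothetical, and a Jenkins abort reaches the wrapper as SIGTERM. Docker container names may only contain letters, digits, `_`, `.` and `-`, and must start with a letter or digit. Matrix job names like the ones in this log contain `/`, `=` and `,`, so the sketch replaces those characters. Python's default SIGTERM action terminates the process without running `finally` blocks, which fits the signal-handling suspicion raised above. The sketch therefore turns SIGTERM into `SystemExit` first.

```python
import re
import signal
import subprocess
from typing import List


def container_name(build_tag: str) -> str:
    """Turn a Jenkins BUILD_TAG into a valid Docker container name."""
    # Replace anything Docker rejects ('/', '=', ',' in matrix job names).
    name = re.sub(r"[^a-zA-Z0-9_.-]", "_", build_tag)
    if not re.match(r"[a-zA-Z0-9]", name):
        name = "ci" + name  # hypothetical prefix: names must start alphanumeric
    return name


def run_argv(name: str, image: str, command: List[str]) -> List[str]:
    # --rm removes the container once it has stopped.
    return ["docker", "run", "--rm", "--name", name, image, *command]


def _exit_on_sigterm(signum, frame):
    # Raise SystemExit so that finally blocks run (default SIGTERM does not).
    raise SystemExit(128 + signum)


def run_then_stop(build_tag: str, image: str, command: List[str]) -> int:
    """Run a job container and always try to stop it afterwards."""
    name = container_name(build_tag)
    signal.signal(signal.SIGTERM, _exit_on_sigterm)
    try:
        return subprocess.run(run_argv(name, image, command)).returncode
    finally:
        # The docker client may be gone while the container keeps running;
        # stopping by name is harmless if it already exited.
        subprocess.run(["docker", "container", "stop", name],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```

The periodic garbage collection suggested in channel would still be useful on top of this. For unattended runs, `docker container prune --force` and `docker image prune --force` skip the confirmation prompt.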
[12:21:16] (PS2) Hashar: dockerfiles: change tox to user nobody [integration/config] - https://gerrit.wikimedia.org/r/380926
[12:22:48] Gitblit-Deprecate, Diffusion: Redirect git.wikimedia.org HEAD URLs to Diffusion - https://phabricator.wikimedia.org/T141965#3639263 (Aklapper) p:Normal>Low
[12:24:38] (PS3) Addshore: Add all wikidata build exts to make-wmf-branch [tools/release] - https://gerrit.wikimedia.org/r/378771 (https://phabricator.wikimedia.org/T173940)
[12:27:07] !log docker push wmfreleng/tox:v2017.09.27.12.26 https://gerrit.wikimedia.org/r/#/c/380926/
[12:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:27:12] (CR) Hashar: [C: 2] dockerfiles: change tox to user nobody [integration/config] - https://gerrit.wikimedia.org/r/380926 (owner: Hashar)
[12:28:17] (Merged) jenkins-bot: dockerfiles: change tox to user nobody [integration/config] - https://gerrit.wikimedia.org/r/380926 (owner: Hashar)
[12:30:41] =]
[12:31:53] not sure why docker push insist on pushing all the intermediate layers
[12:32:00] but I guess there might be a reason
[12:32:37] (PS1) Addshore: Use lintr:v2017.09.27.10.45 in lintr-docker job [integration/config] - https://gerrit.wikimedia.org/r/380956
[12:32:52] (CR) Addshore: [C: 2] Use lintr:v2017.09.27.10.45 in lintr-docker job [integration/config] - https://gerrit.wikimedia.org/r/380956 (owner: Addshore)
[12:33:14] hasharAway: well, its because thats exactly how docker works :P you need all of the layers, otherwise you dont have a resulting image
[12:33:27] which is why it is important to keep the layers as small as possible
[12:33:46] still, it copies the whole layer when I would expect it to only push the delta
[12:33:50] but maybe I am confused
[12:34:07] maybe it is just badly worded? :P
[12:34:35] the "layer" is the "delta", for example the layers that add users etc are not pushing the previous 100000MB too, just the few KB / MB of the user creation
[12:34:59] (Merged) jenkins-bot: Use lintr:v2017.09.27.10.45 in lintr-docker job [integration/config] - https://gerrit.wikimedia.org/r/380956 (owner: Addshore)
[12:35:03] yeah yeah. I guess the tox image has two big layers (99MB and 83MB )
[12:36:03] (PS2) Hashar: Experimental integration-config-tox-docker job [integration/config] - https://gerrit.wikimedia.org/r/380929
[12:36:13] (PS4) Addshore: Add lintr-docker-non-voting [integration/config] - https://gerrit.wikimedia.org/r/380746 (https://phabricator.wikimedia.org/T176194)
[12:36:17] (CR) Addshore: [C: 2] Add lintr-docker-non-voting [integration/config] - https://gerrit.wikimedia.org/r/380746 (https://phabricator.wikimedia.org/T176194) (owner: Addshore)
[12:37:04] hmm
[12:37:16] bumping the tag version again and again is going to be tedious
[12:37:31] what about having the jenkins jobs to always point to a "stable" tag ?
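The tagging ideas from this exchange could be combined into one scheme: the timestamped `vYYYY.MM.DD.HH.MM` tags used in the pushes, a `gerrit.<change>` tag, and a moving `stable` tag for jobs to point at. The function below is a sketch of that combination. Both the function and the scheme are assumptions, not an agreed convention.

```python
from datetime import datetime
from typing import List, Optional


def image_tags(image: str, built_at: datetime, change: Optional[int] = None,
               promote: bool = False) -> List[str]:
    """Tags for one image build under a hypothetical tagging scheme."""
    # Timestamped tag in the vYYYY.MM.DD.HH.MM form seen in the pushes above.
    tags = [f"{image}:v{built_at:%Y.%m.%d.%H.%M}"]
    if change is not None:
        # Trace the image back to the Gerrit change that produced it.
        tags.append(f"{image}:gerrit.{change}")
    if promote:
        # Moving tag that Jenkins jobs would reference instead of a version.
        tags.append(f"{image}:stable")
    return tags
```

Each returned name would be applied with `docker tag` and pushed with `docker push`. A job pinned to `:stable` would only pick up a new image when someone deliberately moves that tag.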
[12:37:40] so then we can just push the tag maybe
[12:38:09] (Merged) jenkins-bot: Add lintr-docker-non-voting [integration/config] - https://gerrit.wikimedia.org/r/380746 (https://phabricator.wikimedia.org/T176194) (owner: Addshore)
[12:38:09] hasharAway: yeh, I think something like that would probably make sense
[12:38:51] !log Reloading Zuul to deploy - Add lintr-docker-non-voting [integration/config] - https://gerrit.wikimedia.org/r/380746
[12:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:57:23] (PS3) Hashar: Experimental integration-config-tox-docker job [integration/config] - https://gerrit.wikimedia.org/r/380929
[12:57:25] (PS1) Hashar: docker: always set XDG_CACHE_HOME=/cache [integration/config] - https://gerrit.wikimedia.org/r/380961
[12:57:27] (PS1) Hashar: docker: stop passing HOME [integration/config] - https://gerrit.wikimedia.org/r/380962
[12:58:52] addshore: and also gotta rebuild the images in the proper order (since some depends on others)
[12:59:11] not to say that I would love to see FROM foo:>=2.0 :]
[12:59:21] Yup
[12:59:37] Docker hub, the place where everyone just run from the tip of everything !
[12:59:56] Now the build script is in python it should be pretty easy to have 00_ci-jessie and 01-php etc
[13:01:10] another thing that annoys me
[13:01:22] if I want to build ci-jessie -> lintr
[13:01:27] and assuming I have no image
[13:01:35] it would download ci-jessie from docker registry
[13:01:42] when I would instead expect a failure
[13:02:03] all of that versioning is very confusing. But maybe I have too much expectations
[13:02:42] (PS4) Hashar: Experimental integration-config-tox-docker job [integration/config] - https://gerrit.wikimedia.org/r/380929
[13:02:44] (PS2) Hashar: docker: stop passing HOME [integration/config] - https://gerrit.wikimedia.org/r/380962
[13:10:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:15:57] (CR) Addshore: [C: 1] docker: always set XDG_CACHE_HOME=/cache [integration/config] - https://gerrit.wikimedia.org/r/380961 (owner: Hashar)
[13:16:45] (CR) Addshore: [C: -1] "This is still used by the operations-puppet job right now." [integration/config] - https://gerrit.wikimedia.org/r/380962 (owner: Hashar)
[13:16:49] https://integration.wikimedia.org/ci/job/integration-config-tox-docker/4/console
[13:16:50] bah
[13:16:58] The directory '/cache/pip' or its parent directory is not owned by the current user
[13:22:27] (CR) MarkTraceur: [C: 2] Add 3D extension [tools/release] - https://gerrit.wikimedia.org/r/380949 (owner: Matthias Mullie)
[13:23:06] (Merged) jenkins-bot: Add 3D extension [tools/release] - https://gerrit.wikimedia.org/r/380949 (owner: Matthias Mullie)
[13:28:03] (PS2) Hashar: docker: always set XDG_CACHE_HOME=/cache [integration/config] - https://gerrit.wikimedia.org/r/380961
[13:28:05] (PS5) Hashar: Experimental integration-config-tox-docker job [integration/config] - https://gerrit.wikimedia.org/r/380929
[13:28:07] (PS3) Hashar: docker: stop passing HOME [integration/config] - https://gerrit.wikimedia.org/r/380962
[13:28:27] fixed :D
[13:29:42] (CR) Hashar: docker: always set XDG_CACHE_HOME=/cache (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/380961 (owner: Hashar)
[13:40:20] (PS3) Hashar: docker: always set XDG_CACHE_HOME=/cache [integration/config] - https://gerrit.wikimedia.org/r/380961
[13:40:22] (PS6) Hashar: Experimental integration-config-tox-docker job [integration/config] - https://gerrit.wikimedia.org/r/380929
[13:40:24] (PS4) Hashar: docker: stop passing HOME [integration/config] - https://gerrit.wikimedia.org/r/380962
[13:40:56] (CR) Hashar: [C: -1] "Bah:" [integration/config] - https://gerrit.wikimedia.org/r/380929 (owner: Hashar)
[13:41:50] (CR) Hashar: "That seems to work for the tox image (for which I provide a job in https://gerrit.wikimedia.org/r/#/c/380929/ )." [integration/config] - https://gerrit.wikimedia.org/r/380961 (owner: Hashar)
[13:46:49] Yippee, build fixed!
[13:46:49] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #535: FIXED in 2 min 48 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/535/
[13:49:45] install --mode 777 --directory cache ; docker run --volume cache:/cache
[13:49:49] tis magic
[13:57:37] bah requests is not happy inside the tox container
[13:57:41] will debug that tonight
[13:57:45] away again &
[14:18:35] (PS1) EddieGP: Add Zoranzoki21 to CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/380989
[14:34:37] Yippee, build fixed!
[14:34:37] Project selenium-WikiLove » firefox,beta,Linux,BrowserTests build #529: FIXED in 2 min 36 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/529/
[15:03:08] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Operations-Software-Development, Patch-For-Review, Technical-Debt: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3639946 (Volans) @hashar want to do the honors of being t...
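Building images in dependency order could be derived from each Dockerfile's `FROM` line with a topological sort, instead of relying on numeric prefixes such as `00_ci-jessie` and `01-php`. This is a minimal sketch under three assumptions: Python 3.9+ (for `graphlib`), local images referenced as `wmfreleng/<dir>`, and an input mapping of image directory names to Dockerfile text. The image names in the usage come from this log, but their `FROM` lines are made up for illustration.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# First token after FROM on each line (ignores "AS name" suffixes).
FROM_RE = re.compile(r"^\s*FROM\s+(\S+)", re.IGNORECASE | re.MULTILINE)


def build_order(dockerfiles: Dict[str, str], prefix: str = "wmfreleng/") -> List[str]:
    """Return image names ordered so every local parent is built first.

    Parents that are not keys of ``dockerfiles`` (e.g. debian:jessie) are
    treated as external and ignored.
    """
    graph: Dict[str, Set[str]] = {}
    for name, text in dockerfiles.items():
        parents: Set[str] = set()
        for ref in FROM_RE.findall(text):
            repo = ref.split("@", 1)[0].split(":", 1)[0]  # drop digest and tag
            if repo.startswith(prefix) and repo[len(prefix):] in dockerfiles:
                parents.add(repo[len(prefix):])
        graph[name] = parents
    # static_order() yields each node after all of its predecessors and
    # raises graphlib.CycleError if the FROM lines form a cycle.
    return list(TopologicalSorter(graph).static_order())


files = {
    "composer": "FROM wmfreleng/php:latest\n",
    "ci-jessie": "FROM debian:jessie\n",
    "php": "FROM wmfreleng/ci-jessie:latest\nRUN true\n",
    "tox": "FROM wmfreleng/ci-jessie\n",
}
print(build_order(files))  # ci-jessie comes before php, php before composer
```

A build script could also stop with an error when a `wmfreleng/` parent is neither in the set nor available locally, instead of letting Docker pull it from the registry. That check is left out here.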
[15:14:46] 10Gerrit, 10Release-Engineering-Team (Next), 10Operations, 10Patch-For-Review: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3639956 (10Dzahn) No, i did not create a hole. I just think these are 2 unrelated issues. gerrit service doesnt start because of the... [15:34:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%) [15:36:38] 10Gerrit, 10Release-Engineering-Team (Next), 10Operations, 10Patch-For-Review: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3640045 (10Paladox) @Dzahn Ssh should start regardless of weather we specify that option or not since it is there just for consitenc... [15:47:58] (03CR) 10Zoranzoki21: [C: 031] "Thank you very much! :)" [integration/config] - 10https://gerrit.wikimedia.org/r/380989 (owner: 10EddieGP) [15:50:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [16:14:37] hasharAway: >> install --mode 777 --directory cache ; docker run --volume cache:/cache << oooh, thats a coooll idea [16:14:39] ooooooh [16:15:10] (03CR) 10Addshore: [C: 032] Add .editorconfig for newlines at end of files [integration/config] - 10https://gerrit.wikimedia.org/r/380955 (owner: 10Addshore) [16:15:43] (03CR) 10Thcipriani: make-wmf-branch: Stop amending commits for branches with submodules (031 comment) [tools/release] - 10https://gerrit.wikimedia.org/r/380885 (https://phabricator.wikimedia.org/T175324) (owner: 10Chad) [16:16:42] (03Merged) 10jenkins-bot: Add .editorconfig for newlines at end of files [integration/config] - 10https://gerrit.wikimedia.org/r/380955 (owner: 10Addshore) [16:21:46] (03PS1) 10Addshore: Revert "Low prio queue for libraryupdater" [integration/config] - 10https://gerrit.wikimedia.org/r/381024 [16:21:49] (03PS2) 10Addshore: Revert "Low prio queue for 
libraryupdater" [integration/config] - 10https://gerrit.wikimedia.org/r/381024 [16:21:53] (03CR) 10Addshore: [C: 032] Revert "Low prio queue for libraryupdater" [integration/config] - 10https://gerrit.wikimedia.org/r/381024 (owner: 10Addshore) [16:22:06] Amir1: looks like we need a bit more thought about the low prio stuff :( [16:22:13] it's not as simple as I first though [16:22:16] *thought [16:23:18] (03Merged) 10jenkins-bot: Revert "Low prio queue for libraryupdater" [integration/config] - 10https://gerrit.wikimedia.org/r/381024 (owner: 10Addshore) [16:38:29] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/381024 [16:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:40:44] 10MediaWiki-Codesniffer: Mediwiki CodeSniffer MediaWiki.Classes.UnusedUseStatement.UnusedUse does not recognize Annotations - https://phabricator.wikimedia.org/T176885#3640240 (10dbarratt) [16:53:23] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:00:12] 10Release-Engineering-Team (Kanban), 10Release Pipeline, 10Patch-For-Review: Define new Jenkins pipeline for container build phase - https://phabricator.wikimedia.org/T175297#3640301 (10dduvall) 05Open>03stalled Right on. Thanks for debugging that @hashar! I think we can continue to test the experimental... [17:00:27] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:09:22] (03Draft1) 10Hashar: docker: example with a cache volume [integration/config] - 10https://gerrit.wikimedia.org/r/380980 [17:09:49] (03CR) 10Hashar: [C: 04-1] "Just a random example to share a cache with host. Which is probably a bad idea." 
[integration/config] - 10https://gerrit.wikimedia.org/r/380980 (owner: 10Hashar) [17:14:04] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations-Software-Development, 10Technical-Debt: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3640375 (10hashar) >>! In T176314#3639946,... [17:33:22] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [17:40:27] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0] [17:57:52] i had a question ... https://gerrit.wikimedia.org/r/#/c/379931/ didn't make the cut y'day since we didn't get our act together in time .. is it possible to backport it and have it out this week? [17:58:51] subbu: that's what SWATs are for :) [17:59:07] okay then! :) [17:59:12] but, if it's not a bug fix (I see no associated task) then it's lower priority [18:01:28] oh, i forgot to tag the phab task ( T176363 ).. but, yes, it is not a bug fix, but it is a bug report for wiki content. can i spin it that way? ;-) [18:01:29] T176363: New high-priority Linter category for a subset of misnested tags whose behavior will change with a HTML5 parser - https://phabricator.wikimedia.org/T176363 [18:02:42] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T172806#3640529 (10Jdforrester-WMF) [18:03:05] meh, I won't block you, but we try to keep swats to bugfix backports only. New features/general improvements can wait the week :) [18:04:37] ty .. we really want to give editors all the heads up they can get .. i meant for it to go out this week .. since the parsoid code will deploy today / tomorrow as well. but, we merged it just a little too late. [18:05:12] will try to be more on top of it going forward. [18:06:43] subbu: BE PERFECT PLEASE! 
:P [18:06:59] aye aye sir! [18:18:43] Release-Engineering-Team (Kanban), Release Pipeline: Establish secure way of passing registry credentials from Jenkins to Docker - https://phabricator.wikimedia.org/T176896#3640558 (dduvall) [18:18:57] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Define new Jenkins pipeline for container build phase - https://phabricator.wikimedia.org/T175297#3589305 (dduvall) [18:19:00] Release-Engineering-Team (Kanban), Release Pipeline: Establish secure way of passing registry credentials from Jenkins to Docker - https://phabricator.wikimedia.org/T176896#3640576 (dduvall) [18:19:32] Release-Engineering-Team (Kanban), Release Pipeline: Establish secure way of passing registry credentials from Jenkins to Docker - https://phabricator.wikimedia.org/T176896#3640558 (dduvall) p:Triage>Normal [18:22:07] Release-Engineering-Team (Kanban), Release Pipeline: Establish secure way of passing registry credentials from Jenkins to Docker - https://phabricator.wikimedia.org/T176896#3640583 (dduvall) [18:25:00] Gitblit-Deprecate, Epic, MW-1.30-release-notes (WMF-deploy-2017-07-18_(1.30.0-wmf.10)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3640620 (TerraCodes) [18:36:56] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T172806#3640666 (Jdforrester-WMF) [18:52:09] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0] [19:39:19] !log deployment-prep: purging "ferm" on hosts that no more have it applied via puppet.
There were some old iptables rules left around blocking access [19:39:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:03:37] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations-Software-Development, and 2 others: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3640963 (hashar) It took me 14 minutes from th... [20:12:11] !log Deleted aptly.integration.eqiad.wmflabs and the https://integration-aptly.wmflabs.org/repo/ webproxy. They were for php5.5 packages on jessie, now available on apt.wm.o - T174972 [20:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:12:18] T174972: Package php modules for Zend 5.5 on Jessie - https://phabricator.wikimedia.org/T174972 [20:14:20] PROBLEM - Host aptly is DOWN: CRITICAL - Host Unreachable (10.68.16.60) [20:28:04] twentyafterfour: hey.. how can i get myself a phab user board?
[20:28:28] jdlrobson: Sure, I think that's still a thing ;) [20:29:13] twentyafterfour: i opened a phab ticket but it's got lost in the machine i think [20:32:39] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:32:49] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:32:51] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [20:32:51] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:32:53] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:33:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:33:04] jdlrobson: looking ... [20:33:31] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:33:33] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:33:33] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:33:37] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:33:45] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:33:47] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:33:47] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:33:54] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 20.00% of data 
above the critical threshold [0.0] [20:33:54] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:33:56] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:12] jdlrobson: https://phabricator.wikimedia.org/tag/user-jdlrobson/ [20:34:12] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:22] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:24] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:24] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:26] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:28] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:28] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:28] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:30] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:32] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:35] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:41] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:34:49] PROBLEM - Puppet errors on 
deployment-restbase02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:49] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:34:57] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:35:00] andre already created it [20:35:01] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:35:15] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:35:17] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:35:21] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:35:21] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:35:25] w00ttt thanks twentyafterfour [20:35:25] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [20:35:37] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:35:41] PROBLEM - Puppet errors on deployment-cumin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:35:43] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:35:47] PROBLEM - Puppet errors on deployment-eventlog02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:35:54] wtf shinken [20:36:01] Gerrit, Release-Engineering-Team (Backlog), Operations, Patch-For-Review: Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3641012 (Dzahn) gerrit2001 currently has a puppet error because
Letsencrypt cert request gets denied by LE due to hitting rate limits. This affects t... [20:36:02] PROBLEM - Puppet errors on cumin is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:36:41] who broke puppet? [20:37:00] PROBLEM - Puppet errors on deployment-mediawiki07 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:37:04] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:06] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:37:10] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:22] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:37:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:26] PROBLEM - Puppet errors on deployment-pdf01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [20:37:26] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:26] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:32] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:32] PROBLEM - Puppet errors on deployment-memc07 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:37:40] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:37:49] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:37:51] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: 
CRITICAL: 60.00% of data above the critical threshold [0.0] [20:38:09] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [20:38:09] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:38:13] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [20:38:21] PROBLEM - Puppet errors on deployment-videoscaler01 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [20:38:21] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [20:42:25] RECOVERY - Puppet errors on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:45:24] RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0] [20:45:24] PROBLEM - Host cumin is DOWN: CRITICAL - Host Unreachable (10.68.23.120) [20:45:46] RECOVERY - Puppet errors on deployment-cumin is OK: OK: Less than 1.00% above the threshold [0.0] [20:47:04] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:47:22] RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [20:47:26] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0] [20:47:33] RECOVERY - Puppet errors on deployment-memc07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:48:07] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:48:21] RECOVERY - Puppet errors on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:48:39] RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:48:45] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less 
than 1.00% above the threshold [0.0] [20:49:13] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:49:23] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [20:49:25] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:49:39] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:49:53] RECOVERY - Puppet errors on deployment-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:50:13] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:50:19] RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:52:04] RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:52:44] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:52:48] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:52:50] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:52:50] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:53:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [20:53:12] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:53:23] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:53:35] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:53:53] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:54:29] RECOVERY - Puppet errors on deployment-ms-be03 is OK: 
OK: Less than 1.00% above the threshold [0.0] [20:56:00] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations-Software-Development, and 2 others: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3641059 (hashar) a:hashar I have provision... [20:56:57] RECOVERY - Puppet errors on deployment-mediawiki07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:13] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:19] RECOVERY - Puppet errors on deployment-sca04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:25] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:40] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:48] RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:50] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:54] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:02] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:34] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:46] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:46] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:54] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:54] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:23] RECOVERY - Puppet errors on deployment-db03 is OK:
OK: Less than 1.00% above the threshold [0.0] [20:59:25] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:29] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:29] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:33] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:35] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:49] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:51] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:15] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:21] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:35] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:44] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:02:31] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:03:35] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:04:27] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:05:01] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:05:45] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:12:52] (PS1) Hashar: fab: migrate from salt to cumin [integration/config] -
https://gerrit.wikimedia.org/r/381129 (https://phabricator.wikimedia.org/T176314) [21:13:47] (CR) Hashar: "Salt will be gone by thursday :)" [integration/config] - https://gerrit.wikimedia.org/r/381129 (https://phabricator.wikimedia.org/T176314) (owner: Hashar) [21:22:19] (PS1) Hashar: fab: skip hosts when deploying slave scripts [integration/config] - https://gerrit.wikimedia.org/r/381131 [21:23:23] (CR) Hashar: [C: 1] "tada" [integration/config] - https://gerrit.wikimedia.org/r/381131 (owner: Hashar) [21:24:49] twentyafterfour: I probably broke puppet on the beta cluster sorry bout that :( [21:25:38] !log salt is being replaced by cumin instances being deployment-cumin and integration-cumin . Check this out: https://wikitech.wikimedia.org/wiki/Cumin ! [21:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:25:59] hashar: awesome (well, not re breakage, but re cumin :P ) [21:26:47] greg-g: v.olans created a nice step by step guide to setup a cumin master https://wikitech.wikimedia.org/wiki/Help:Cumin_master [21:27:15] https://wikitech.wikimedia.org/wiki/Cumin is quite nice as well [21:28:34] +1 [21:28:53] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations-Software-Development, and 2 others: Replace salt on integration and deployment-prep projects - https://phabricator.wikimedia.org/T176314#3641191 (hashar) I created deployment-cumin an... [21:32:22] +1 [21:39:36] Gerrit, RelEng-Archive-FY201718-Q1, Upstream: "git review -d XXX" doesn't work for http gerrit - https://phabricator.wikimedia.org/T100987#3641221 (hashar) At some point I should probably copy my previous comment T100987#3353888 to the wiki :-] [22:17:10] I'm deploying a small logging change to see if we screwed up and losing edits [22:21:17] oh good [22:38:18] * Sagan read "oh god" first.... [22:42:05] lol [22:43:31] WE'RE DOOMED!
[23:02:51] Beta-Cluster-Infrastructure, MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible: beta: Wikidata dispatchChanges.php causes a lot of "Wikimedia\Rdbms\DatabaseMysqlBase::unlock failed to release lock" - https://phabricator.wikimedia.org/T175109#3583091 (Krinkle) Still happ... [23:24:52] Beta-Cluster-Infrastructure, MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, Wikimedia-log-errors: beta: Wikidata dispatchChanges.php causes a lot of "Wikimedia\Rdbms\DatabaseMysqlBase::unlock failed to release lock... - https://phabricator.wikimedia.org/T175109#3641504