[01:32:27] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [04:06:50] Yippee, build fixed! [04:06:50] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #583: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/583/ [05:17:00] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<20.00%) [05:47:39] 10Beta-Cluster-Infrastructure, 10MediaWiki-Authentication-and-authorization, 10MediaWiki-extensions-CentralAuth, 10MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)): "Loss of session data" on Beta Cluster - https://phabricator.wikimedia.org/T172560#3773496 (10greg) >>! In T172560#3771432, @Anom... [06:43:43] PROBLEM - Puppet errors on deployment-cumin is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:02:00] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [07:18:42] RECOVERY - Puppet errors on deployment-cumin is OK: OK: Less than 1.00% above the threshold [0.0] [07:38:53] PROBLEM - Free space - all mounts on integration-slave-jessie-1004 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1004.diskspace._srv.byte_percentfree (<10.00%) [07:38:57] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<10.00%) [08:39:58] 10Release-Engineering-Team (Kanban), 10Wikimedia-Mailing-lists: qa-alerts admin password recovery - https://phabricator.wikimedia.org/T180933#3773664 (10hashar) [09:07:56] Project beta-scap-eqiad build #182841: 04FAILURE in 4 min 12 sec: 
https://integration.wikimedia.org/ci/job/beta-scap-eqiad/182841/ [09:16:51] Yippee, build fixed! [09:16:52] Project beta-scap-eqiad build #182842: 09FIXED in 3 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/182842/ [09:26:58] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10hashar) [09:27:16] 10Beta-Cluster-Infrastructure, 10Puppet, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3773727 (10hashar) [09:27:30] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10hashar) p:05Triage>03High [09:29:36] !log deployment-tin: apt-mark hold scap | the apt-repo on deployment-tin is out of date | T180935 [09:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:29:41] T180935: Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935 [09:38:29] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773742 (10hashar) deployment-cache-text04 and deployment-cache-upload04 are broken because hieradata for roles are not applied on labs T120165. They are app... 
[09:39:07] !log deployment-prep added missing key between_bytes_timeout to cache::app_def_be_opts for deployment-cache-text04 and deployment-cache-upload04 | T180935 [09:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:39:11] T180935: Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935 [09:40:06] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773745 (10hashar) [09:41:09] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10hashar) [09:45:02] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773771 (10hashar) [09:48:49] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:51:07] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:54:42] RECOVERY - Puppet staleness on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [3600.0] [09:56:10] RECOVERY - Puppet staleness on deployment-redis06 is OK: OK: Less than 1.00% above the threshold [3600.0] [09:58:11] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773792 (10hashar) [09:59:37] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10hashar) [10:05:40] !log deployment-phab : set hiera 'phabricator_cluster_search: []' trying to unblock puppet and soft rebooted the instance | T180935 [10:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:05:44] 
T180935: Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935 [10:06:59] !log nodepool: manually deleted left over instances ci-jessie-wikimedia-894187 and ci-jessie-wikimedia-894188 . Jenkins fails to ssh to it and they were left ready for 72 hours. [10:07:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:16:18] Yippee, build fixed! [10:16:19] Project beta-scap-eqiad build #182848: 09FIXED in 2 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/182848/ [10:38:30] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773862 (10hashar) [10:38:32] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10hashar) [11:47:45] 10Continuous-Integration-Config, 10MediaWiki-extensions-General, 10Google-Code-in-2017: Add phan to MediaWiki extensions and skins for static analysis [cloneable] - https://phabricator.wikimedia.org/T179554#3774013 (10Aklapper) [12:00:58] 10Continuous-Integration-Config, 10MediaWiki-extensions-General, 10Google-Code-in-2017: Add phan to MediaWiki extensions and skins for static analysis [cloneable] - https://phabricator.wikimedia.org/T179554#3774039 (10Aklapper) Let's go for two extensions/skins per GCI task. Imported: https://codein.withgoog... [12:37:36] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10MoritzMuehlenhoff) If deployment-mx is still in use/needed, it should be reimaged to jessie or stretch. 
[12:40:53] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3774140 (10MoritzMuehlenhoff) deployment-tin seems failing because scap is put on hold, since 3.7.3 is also on apt.wikimedia.org "apt-mark unhold scap" should... [12:47:27] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3774154 (10hashar) I have marked scap on hold to get the version from apt.wikimedia.org, else it tries to get an outdated version generated by CI (from deploy... [13:36:20] hashar hi for deployment-phab we just need to specify mysql for search instead of elasticsearch [13:39:16] something like [13:40:37] https://phabricator.wikimedia.org/P6353 [13:41:10] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10Paladox) for deployment-phab we could do https://phabricator.wikimedia.org/P6353 (syntax untested but should work) [14:56:57] (03PS3) 10Hashar: Mark pywikibot/compat as archived [integration/config] - 10https://gerrit.wikimedia.org/r/388054 (https://phabricator.wikimedia.org/T101214) [14:57:15] (03CR) 10Hashar: [C: 032] Mark pywikibot/compat as archived [integration/config] - 10https://gerrit.wikimedia.org/r/388054 (https://phabricator.wikimedia.org/T101214) (owner: 10Hashar) [14:58:18] (03Abandoned) 10Hashar: build.py: refactor dep tree to use 'scratch' [integration/config] - 10https://gerrit.wikimedia.org/r/384580 (owner: 10Hashar) [14:58:35] (03Merged) 10jenkins-bot: Mark pywikibot/compat as archived [integration/config] - 10https://gerrit.wikimedia.org/r/388054 (https://phabricator.wikimedia.org/T101214) (owner: 10Hashar) [15:00:43] (03PS2) 10Hashar: Pass env to docker run [integration/config] - 10https://gerrit.wikimedia.org/r/390432 
(https://phabricator.wikimedia.org/T177684) [15:02:07] 10Gerrit, 10Google-Code-in-2017: Add basic Gerrit support to git-repo - https://phabricator.wikimedia.org/T180962#3774570 (10jayvdb) [15:02:26] (03PS3) 10Hashar: Pass env to docker run [integration/config] - 10https://gerrit.wikimedia.org/r/390432 (https://phabricator.wikimedia.org/T177684) [15:02:53] (03CR) 10Hashar: [C: 032] "Rebased. I have made sure that there is no more any use of 'docker-zuul-env' and that all --env-file use <(/usr/bin/env)." [integration/config] - 10https://gerrit.wikimedia.org/r/390432 (https://phabricator.wikimedia.org/T177684) (owner: 10Hashar) [15:03:21] !log integration: pass all environment variables to the docker run commands | https://gerrit.wikimedia.org/r/#/c/390432/ | T177684 [15:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:03:25] T177684: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684 [15:03:51] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3774592 (10hashar) 05Open>03Resolved [15:04:08] (03Merged) 10jenkins-bot: Pass env to docker run [integration/config] - 10https://gerrit.wikimedia.org/r/390432 (https://phabricator.wikimedia.org/T177684) (owner: 10Hashar) [15:04:25] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3774594 (10hashar) We now pass the whole environment which includes `JENKINS_*` variables as... 
[15:05:54] 10Continuous-Integration-Infrastructure (shipyard): Investigate usage of Docker volume for CI - https://phabricator.wikimedia.org/T179742#3774600 (10hashar) p:05Triage>03Normal [15:13:58] RECOVERY - Free space - all mounts on integration-slave-jessie-1001 is OK: OK: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found) [15:20:37] (03PS1) 10Hashar: operations/dumps/statusapi add tox [integration/config] - 10https://gerrit.wikimedia.org/r/392427 (https://phabricator.wikimedia.org/T180328) [15:20:48] (03CR) 10Hashar: [C: 032] operations/dumps/statusapi add tox [integration/config] - 10https://gerrit.wikimedia.org/r/392427 (https://phabricator.wikimedia.org/T180328) (owner: 10Hashar) [15:21:45] (03Merged) 10jenkins-bot: operations/dumps/statusapi add tox [integration/config] - 10https://gerrit.wikimedia.org/r/392427 (https://phabricator.wikimedia.org/T180328) (owner: 10Hashar) [15:26:13] 10Continuous-Integration-Config, 10Dumps-Generation, 10Patch-For-Review: Add CI to all operations/dumps/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180328#3774615 (10hashar) [15:26:57] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Dumps-Generation: Add CI to all operations/dumps/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180328#3754037 (10hashar) [15:40:19] 10Continuous-Integration-Config, 10Tracking: Add CI to all Gerrit repositories - https://phabricator.wikimedia.org/T180317#3774656 (10hashar) [15:46:02] 10Gerrit, 10Google-Code-in-2017: Add basic Gerrit support to git-repo - https://phabricator.wikimedia.org/T180962#3774570 (10Paladox) I don't know how you're going to support this, as this will need to be done in Gerrit master upstream and we are on 2.13, yet there are 2.14 and 2.15.
[15:47:01] (03PS1) 10Hashar: Archive operations/puppet/mesos [integration/config] - 10https://gerrit.wikimedia.org/r/392434 [15:50:49] 10Release-Engineering-Team (Kanban), 10Wikimedia-Mailing-lists: qa-alerts admin password recovery - https://phabricator.wikimedia.org/T180933#3774687 (10RobH) 05Open>03Resolved a:03RobH Password reset, new password has been emailed to all admins of that list. [15:53:22] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:53:52] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:53:54] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:54:07] PROBLEM - Puppet errors on jenkinstest is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:54:45] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:55:14] (03PS2) 10Hashar: Archive some operations/* repositories [integration/config] - 10https://gerrit.wikimedia.org/r/392434 [15:55:15] (03PS1) 10Hashar: Add debian-glue to some Debian packages [integration/config] - 10https://gerrit.wikimedia.org/r/392439 (https://phabricator.wikimedia.org/T180330) [15:55:23] 10Continuous-Integration-Config, 10Operations, 10Patch-For-Review: Add CI to all operations/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180330#3774696 (10hashar) [15:55:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:55:47] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [15:57:38] !log gerrit: deleted operations/network-diagrams mostly empty and no changes. Created back in 2012. 
[15:57:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:58:04] PROBLEM - Puppet errors on integration-slave-docker-1002 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:58:10] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:58:23] 10Continuous-Integration-Config, 10Operations, 10Patch-For-Review: Add CI to all operations/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180330#3774712 (10hashar) [15:58:27] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [15:58:45] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:58:56] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:59:06] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/392439 (https://phabricator.wikimedia.org/T180330) (owner: 10Hashar) [15:59:20] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/392434 (owner: 10Hashar) [15:59:26] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:59:31] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:59:41] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:59:43] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:59:49] grmbmbmb [15:59:53] PROBLEM - Puppet errors on integration-cumin is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [16:00:36] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: 
CRITICAL: 60.00% of data above the critical threshold [0.0] [16:00:51] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [16:00:57] that is transient [16:01:13] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [16:01:39] (03CR) 10Hashar: [C: 032] Archive some operations/* repositories [integration/config] - 10https://gerrit.wikimedia.org/r/392434 (owner: 10Hashar) [16:01:55] (03CR) 10Hashar: [C: 032] Add debian-glue to some Debian packages [integration/config] - 10https://gerrit.wikimedia.org/r/392439 (https://phabricator.wikimedia.org/T180330) (owner: 10Hashar) [16:02:37] (03Merged) 10jenkins-bot: Archive some operations/* repositories [integration/config] - 10https://gerrit.wikimedia.org/r/392434 (owner: 10Hashar) [16:02:54] (03Merged) 10jenkins-bot: Add debian-glue to some Debian packages [integration/config] - 10https://gerrit.wikimedia.org/r/392439 (https://phabricator.wikimedia.org/T180330) (owner: 10Hashar) [16:03:02] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:03:14] PROBLEM - Puppet errors on integration-slave-docker-1005 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [16:03:49] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [16:03:53] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [16:04:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [16:04:26] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [16:04:44] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] 
[16:05:10] PROBLEM - Puppet errors on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:05:18] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [16:05:20] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [16:05:27] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [16:05:29] PROBLEM - Puppet errors on integration-publishing is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [16:05:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [16:05:41] PROBLEM - Puppet errors on saucelabs-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [16:06:01] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [16:06:19] PROBLEM - Puppet errors on saucelabs-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [16:10:09] 10Gerrit, 10Google-Code-in-2017: Add basic Gerrit support to git-repo - https://phabricator.wikimedia.org/T180962#3774784 (10Aklapper) @Paladox: I don't see how the last comment is relevant. "this" in "this will need to be done" remains vague. Please be specific. [16:14:09] 10Gerrit, 10Google-Code-in-2017: Add basic Gerrit support to git-repo - https://phabricator.wikimedia.org/T180962#3774815 (10Paladox) @Aklapper this task has the Google Code project. Which is suggesting that this is easy to implement and can run as soon as it is built. We are running gerrit 2.13 with plans to... 
[16:14:33] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [16:14:37] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [16:14:39] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:19:44] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (10thcipriani) >>! In T180935#3774154, @hashar wrote: > I have marked scap on hold to get the version from apt.wikimedia.org, else it tries to get an... [16:23:00] 10Gerrit, 10Google-Code-in-2017: Add basic Gerrit support to git-repo - https://phabricator.wikimedia.org/T180962#3774837 (10jayvdb) @Paladox, this is a git client tool, which interfaces with lots of different git servers. It can happily support them all. Please read tasks before you comment on them. [16:23:54] 10Gerrit, 10Google-Code-in-2017: Add basic Gerrit support to git-repo - https://phabricator.wikimedia.org/T180962#3774838 (10Paladox) Oh sorry. I'm confused: why was a task filed under the Gerrit tag then?
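The alerts above report a share of datapoints, for example "66.67% of data above the critical threshold [0.0]". A rough sketch of that arithmetic follows, assuming the check counts the datapoints in its window that are strictly above the threshold. The function name `percent_over`, the sample window and the strict comparison are assumptions for illustration, not the actual check's code.

```python
def percent_over(datapoints, threshold):
    """Return the percentage of non-null datapoints strictly above threshold."""
    valid = [p for p in datapoints if p is not None]
    if not valid:
        # Mirrors the "No valid datapoints found" wording seen in the log.
        raise ValueError("No valid datapoints found")
    over = sum(1 for p in valid if p > threshold)
    return 100.0 * over / len(valid)


# Hypothetical window: nine samples of a puppet failure counter, six non-zero.
window = [0, 1, 1, 0, 1, 1, 0, 1, 1]
print(f"{percent_over(window, 0.0):.2f}% of data above the critical threshold")
```

With nine samples, each failing sample moves the figure by about 11.11 points, which would be consistent with the 22.22%, 44.44%, 55.56% and 66.67% values in the alerts if the window holds nine datapoints.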
[16:30:32] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:32:28] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:07] RECOVERY - Puppet errors on integration-slave-docker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:25] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:29] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:45] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:52] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:56] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:34:06] RECOVERY - Puppet errors on jenkinstest is OK: OK: Less than 1.00% above the threshold [0.0] [16:34:44] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:35:36] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [16:35:47] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:36:11] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [16:37:21] 10Release-Engineering-Team (Kanban), 10Wikimedia-Mailing-lists: qa-alerts admin password recovery - https://phabricator.wikimedia.org/T180933#3774891 (10hashar) [16:37:42] 10Release-Engineering-Team (Kanban), 10Wikimedia-Mailing-lists: qa-alerts admin password recovery - https://phabricator.wikimedia.org/T180933#3773664 (10hashar) Thank you @RobH . 
I have added it to the release engineering pwstore :] [16:38:02] RECOVERY - Puppet errors on deployment-trending01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:38:13] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:38:13] RECOVERY - Puppet errors on integration-slave-docker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [16:38:47] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:38:57] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [16:38:59] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:04] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:23] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:28] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:40] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:44] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:46] RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:39:54] RECOVERY - Puppet errors on integration-cumin is OK: OK: Less than 1.00% above the threshold [0.0] [16:40:10] RECOVERY - Puppet errors on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [16:40:23] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:40:25] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0] [16:40:35] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:40:52] RECOVERY - 
Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:41:00] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:41:19] RECOVERY - Puppet errors on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:45:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [16:45:28] RECOVERY - Puppet errors on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0] [16:45:40] RECOVERY - Puppet errors on saucelabs-02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:46:52] no_justification woo i've managed to add support for logstash in gerrit https://gerrit-review.googlesource.com/#/c/gerrit/+/142850/ without using a custom file. [16:47:47] but for now we have to use a file [16:47:54] until i get upstream to merge that [16:48:01] and manage to backport it to a stable release [16:49:36] RECOVERY - Puppet errors on integration-slave-docker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [16:49:41] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:25:40] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3775122 (10debt) [17:32:00] 10Release-Engineering-Team (Watching / External), 10Operations, 10Release Pipeline: Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3775124 (10dduvall) Since we hadn't actually released 0.2.0 and there weren't any changes other than Debian package related ones, I moved the exist... 
[18:02:02] 10Beta-Cluster-Infrastructure, 10MediaWiki-Authentication-and-authorization, 10MediaWiki-extensions-CentralAuth, 10MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)): "Loss of session data" on Beta Cluster - https://phabricator.wikimedia.org/T172560#3775198 (10Anomie) One thing to try would be t... [18:06:21] 10Gerrit, 10Operations: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775204 (10Paladox) [18:09:36] 10Release-Engineering-Team (Watching / External), 10Electron-PDFs, 10Operations, 10Proton, and 4 others: How should we get Chromium for use in puppeteer? - https://phabricator.wikimedia.org/T178570#3775220 (10phuedx) a:03phuedx [18:12:08] 10Release-Engineering-Team, 10MinervaNeue, 10Readers-Web-Backlog: Many MinervaNeue browser tests are failing intermittently but often on Chrome and Firefox - https://phabricator.wikimedia.org/T180828#3775229 (10Jdlrobson) 05Open>03declined Sure enough, the tests have recovered over the weekend. I don't s... [18:35:55] twentyafterfour: mind if I merge https://gerrit.wikimedia.org/r/#/c/391969/ ? 
[18:39:23] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:48:27] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3775355 (10RobH) [20:09:46] (03PS1) 10Hashar: Revert "Pass env to docker run" [integration/config] - 10https://gerrit.wikimedia.org/r/392480 [20:09:51] (03PS2) 10Hashar: Revert "Pass env to docker run" [integration/config] - 10https://gerrit.wikimedia.org/r/392480 [20:09:57] (03CR) 10Hashar: [C: 032] Revert "Pass env to docker run" [integration/config] - 10https://gerrit.wikimedia.org/r/392480 (owner: 10Hashar) [20:10:29] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3775707 (10hashar) 05Resolved>03Open [20:12:41] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3775730 (10hashar) exec docker run --rm --env-file /dev/fd/63 --volume /srv/jenkins-workspa... 
[20:14:44] (03Merged) 10jenkins-bot: Revert "Pass env to docker run" [integration/config] - 10https://gerrit.wikimedia.org/r/392480 (owner: 10Hashar) [20:19:51] PROBLEM - Free space - all mounts on deployment-sca03 is CRITICAL: CRITICAL: deployment-prep.deployment-sca03.diskspace._srv.byte_percentfree (<40.00%) [20:22:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0] [20:23:44] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3775751 (10hashar) bash variables are not available to sub programs (eg /usr/bin/env) unless... [20:26:08] Project selenium-Wikibase-chrome » chrome,beta,Linux,DebianJessie && contintLabsSlave build #18: 04FAILURE in 39 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase-chrome/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=DebianJessie%20&&%20contintLabsSlave/18/ [20:32:49] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0] [20:41:06] 10Gerrit, 10Operations, 10Traffic, 10Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775800 (10demon) [20:54:47] (03PS1) 10Jack Phoenix: Whitelist ShoutWiki and Uncyclomedia email addresses [integration/config] - 10https://gerrit.wikimedia.org/r/392492 [21:02:00] hashar hi, quick question about https://gerrit.wikimedia.org/r/#/c/392489/ [21:02:11] the tests fail if I do include and if I do class [21:02:16] how do I fix that please? [21:02:49] it seems that the class error was recently introduced, as this https://github.com/wikimedia/puppet/commit/bec1b31267e6ff68d00173d13fcf0e4c958b285d was converting from an include to a class.
[21:04:30] ah never mind
[21:04:53] paladox: that is introducing a new violation
[21:04:58] to the wmf style guide
[21:05:02] so the tests fail
[21:05:09] yep just realised that patch moved it into a new file
[21:05:49] so yeah probably have to move all those apache:: things from gerrit module to the profile
[21:06:02] that should be a different change though :)
[21:06:20] ah yep thanks :)
[21:06:21] oh
[21:06:34] then you can rebase your http/2 patch https://gerrit.wikimedia.org/r/#/c/392489/ on top of it
[21:11:51] yep. thanks
[21:11:53] fixed now :)
[21:16:18] Beta-Cluster-Infrastructure, Performance-Team, Patch-For-Review: Make MediaWiki profiler in Beta match production - https://phabricator.wikimedia.org/T180766#3775898 (Gilles) a:Krinkle
[21:16:23] Beta-Cluster-Infrastructure, Performance-Team: Set up XHGui for Beta Cluster - https://phabricator.wikimedia.org/T180761#3775899 (Krinkle) p:Triage>Low
[21:16:41] Beta-Cluster-Infrastructure, Performance-Team, Patch-For-Review: Make MediaWiki profiler in Beta match production - https://phabricator.wikimedia.org/T180766#3775903 (Krinkle) p:Triage>High
[21:18:57] RECOVERY - Puppet errors on deployment-kafka-jumbo-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:20:25] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:23:08] (PS1) SamanthaNguyen: Add dependencies for social tool extensions [integration/config] - https://gerrit.wikimedia.org/r/392500
[21:28:28] (CR) Hashar: [C: 2] Add dependencies for social tool extensions [integration/config] - https://gerrit.wikimedia.org/r/392500 (owner: SamanthaNguyen)
[21:30:00] (Merged) jenkins-bot: Add dependencies for social tool extensions [integration/config] - https://gerrit.wikimedia.org/r/392500 (owner: SamanthaNguyen)
[21:30:24] (CR) Hashar: "Deployed. Thanks!" [integration/config] - https://gerrit.wikimedia.org/r/392500 (owner: SamanthaNguyen)
[21:34:58] PROBLEM - Puppet errors on deployment-kafka-jumbo-2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[21:36:26] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:44:18] Gerrit, Operations, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775978 (demon) Chatted with @bblack on IRC, couple of quick notes: * Good idea in general * Apache HTTP2 module hasn't been reviewed or used at WMF yet, so we should d...
[23:16:24] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:24:57] RECOVERY - Puppet errors on deployment-kafka-jumbo-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:25:07] RECOVERY - Puppet errors on deployment-netbox is OK: OK: Less than 1.00% above the threshold [0.0]
[23:27:31] Release-Engineering-Team, Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776331 (awight)
[23:27:59] Release-Engineering-Team (Watching / External), Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776344 (greg)
[23:28:48] Release-Engineering-Team (Watching / External), Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776331 (Halfak) I'm not sure why this issue didn't show up with https://ru.wikipedia.beta.wmflabs.org/w...
[23:29:06] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T178635#3776352 (Krinkle)
[23:30:57] PROBLEM - Puppet errors on deployment-kafka-jumbo-2 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:36:10] PROBLEM - Puppet errors on deployment-netbox is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[23:37:26] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:40:09] Release-Engineering-Team (Watching / External), Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776368 (awight)
[23:41:59] Release-Engineering-Team (Watching / External), Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776331 (awight)
[23:43:58] Release-Engineering-Team (Watching / External), Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776380 (awight)