[00:00:23] Yeah, I guess that means we need to have two menu items
[00:00:34] But not sure what a good link label would be
[00:00:48] Or maybe just linked from the main Coverage page
[00:00:52] without being in the main menu
[00:00:55] I was thinking of maybe a submenu just for coverage
[00:00:56] yep
[00:00:58] that ;)
[00:01:49] huh, the first two nav links on https://doc.wikimedia.org/cover/extensions/ are broken
[00:03:33] Project beta-code-update-eqiad build #188154: STILL FAILING in 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/188154/
[00:04:44] Release-Engineering-Team (Kanban), Operations, Release Pipeline: Package/upload service-checker for Debian stretch - https://phabricator.wikimedia.org/T184224#3876861 (dduvall) p:Triage>Normal
[00:09:50] legoktm: Right. The Page thing expects a single depth of pages
[00:10:06] so it allows linking from /foo/ to /bar/ as ../bar but doesn't deal with /foo/bar/ to /baz/
[00:10:34] The main reason this complexity exists is that I want to be able to easily test this thing locally without it having its own doc root, e.g. from localhost/dev/integration-docroot/---- etc.
[00:10:42] me too...
[00:10:54] maybe cover-extensions/ ?
[00:13:45] Yippee, build fixed!
[00:13:46] Project beta-code-update-eqiad build #188155: FIXED in 45 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/188155/
[00:26:15] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:06] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:35:04] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[00:36:53] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:57:46] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[00:59:46] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[01:02:05] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[01:02:47] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[01:05:00] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:07:48] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[01:07:55] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[01:17:59] MediaWiki-Codesniffer: Undefined index: scope_opener - https://phabricator.wikimedia.org/T184232#3877014 (Reedy)
[01:28:14] RECOVERY - Puppet staleness on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[01:34:34] PROBLEM - Free space - all mounts on deployment-mx is CRITICAL: CRITICAL: deployment-prep.deployment-mx.diskspace._var_log.byte_percentfree (<100.00%)
[01:34:46] RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:37:04] RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:37:44] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
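A sketch of the relative-link problem discussed at [00:09:50] above. The real helper is PHP in integration/docroot; this is just the idea in Python, with the function name and paths hypothetical. A hard-coded single `../` works only for one level of depth, which is why nested pages like /cover/extensions/ broke and the directory was later renamed to cover-extensions/:

```python
import posixpath

def relative_href(from_page: str, to_page: str) -> str:
    """Hypothetical helper: build a relative link between two doc pages,
    so the site also works from a sub-path such as
    localhost/dev/integration-docroot/ without its own doc root."""
    return posixpath.relpath(to_page, start=posixpath.dirname(from_page))

# One level of depth matches a hard-coded "../" prefix:
print(relative_href("/foo/index.html", "/bar/index.html"))      # ../bar/index.html
# ...but a nested page like /foo/bar/ needs two levels to reach /baz/:
print(relative_href("/foo/bar/index.html", "/baz/index.html"))  # ../../baz/index.html
```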
[01:37:45] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:37:54] Beta-Cluster-Infrastructure, Puppet, Tracking: deployment-phab completely broken - https://phabricator.wikimedia.org/T184233#3877039 (Krenair) p:Triage>Normal
[01:38:32] Beta-Cluster-Infrastructure, Puppet: deployment-phab completely broken - https://phabricator.wikimedia.org/T184233#3877039 (Krenair)
[01:39:07] RECOVERY - Puppet staleness on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [3600.0]
[01:39:40] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-cache-text04 due to varnishkafka issues - https://phabricator.wikimedia.org/T184234#3877051 (Krenair) p:Triage>Normal
[01:39:59] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:40:01] RECOVERY - Puppet staleness on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [3600.0]
[01:42:30] Beta-Cluster-Infrastructure, Analytics, Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877064 (Krenair) p:Triage>Normal
[01:43:03] Beta-Cluster-Infrastructure, Analytics, Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877064 (Krenair) ```krenair@deployment-kafka03:~$ sudo puppet agent -tv Warning: Setting configtimeout is deprecated. (at /usr/lib/ruby/vendor_ruby/pup...
[01:46:45] Beta-Cluster-Infrastructure, Analytics, Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877076 (Krenair) 2.7G /var/log/daemon.log 2.6G /var/log/daemon.log.1 221M /var/log/kafka/controller.log 257M /var/log/kafka/kafka-mirror-main-deployment-pr...
[01:47:47] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[01:47:55] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:49:19] hmph. zotero01 was having memory issues earlier. not anymore?
[01:51:12] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#3877079 (Krenair) p:Triage>Normal
[01:53:10] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-eventlogging04 due to missing repo on deployment-tin? - https://phabricator.wikimedia.org/T184238#3877100 (Krenair) p:Triage>Normal
[01:54:52] Beta-Cluster-Infrastructure, Analytics, Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877117 (Krenair) Repeat of T174742?
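The disk triage pasted at [01:46:45] above was done with du(1); a minimal Python sketch of the same idea, for reference (the root path matches the kafka03 case, the output format is an assumption):

```python
import os

def largest_files(root: str, top: int = 10) -> None:
    """Print the biggest files under a directory, mirroring the du-style
    listing used to find the oversized daemon.log on deployment-kafka03."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:  # file vanished or unreadable; skip it
                continue
    for size, path in sorted(sizes, reverse=True)[:top]:
        print(f"{size / 1024**3:5.1f}G  {path}")

largest_files("/var/log")
```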
[02:12:57] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3877128 (Krenair) p:Triage>Normal
[02:15:05] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-kafka-jump-[12] due to version of a package being missing - https://phabricator.wikimedia.org/T184240#3877141 (Krenair) p:Triage>Normal
[02:17:43] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-trending01 due to removal of role - https://phabricator.wikimedia.org/T184241#3877153 (Krenair) p:Triage>Normal
[02:21:43] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-netbox, looks like it thinks it's a prod box - https://phabricator.wikimedia.org/T184242#3877167 (Krenair) p:Triage>Normal
[02:23:03] Beta-Cluster-Infrastructure: various .beta.wmflabs.org domains use an invalid ssl certificate - https://phabricator.wikimedia.org/T182927#3877181 (Krenair) https://community.letsencrypt.org/t/staging-endpoint-for-acme-v2/49605
[02:29:57] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-redis0[12] due to systemd on trusty - https://phabricator.wikimedia.org/T184243#3877182 (Krenair) p:Triage>Normal
[02:31:00] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244#3877193 (Krenair) p:Triage>Normal
[02:48:07] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-cache-text04 due to varnishkafka issues - https://phabricator.wikimedia.org/T184234#3877214 (Krenair) hiera part: ```diff --git a/hieradata/labs/deployment-prep/host/deployment-cache-text04.yaml b/hieradata/labs/deployment-prep/host/deploym...
[03:07:38] Project mwext-phpunit-coverage-publish build #31: FAILURE in 7 min 38 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/31/
[03:44:54] Continuous-Integration-Config, MinusX, Google-Code-in-2017, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Add MinusX to MediaWiki extensions and PHP library repos - https://phabricator.wikimedia.org/T175794#3877249 (Ryan10145)
[04:49:42] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[05:14:44] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [10.0]
[05:15:18] (PS1) Legoktm: Safely handle incomplete clover.xml files [integration/docroot] - https://gerrit.wikimedia.org/r/402174
[05:16:40] (CR) Legoktm: [C: +2] Safely handle incomplete clover.xml files [integration/docroot] - https://gerrit.wikimedia.org/r/402174 (owner: Legoktm)
[05:17:02] (Merged) jenkins-bot: Safely handle incomplete clover.xml files [integration/docroot] - https://gerrit.wikimedia.org/r/402174 (owner: Legoktm)
[05:17:09] (CR) jenkins-bot: Safely handle incomplete clover.xml files [integration/docroot] - https://gerrit.wikimedia.org/r/402174 (owner: Legoktm)
[05:30:13] (PS1) Legoktm: Generate clover.xml files for tox-py27-coverage-publish [integration/config] - https://gerrit.wikimedia.org/r/402175 (https://phabricator.wikimedia.org/T179054)
[05:30:55] (CR) Legoktm: [C: +2] Generate clover.xml files for tox-py27-coverage-publish [integration/config] - https://gerrit.wikimedia.org/r/402175 (https://phabricator.wikimedia.org/T179054) (owner: Legoktm)
[05:32:11] (Merged) jenkins-bot: Generate clover.xml files for tox-py27-coverage-publish [integration/config] - https://gerrit.wikimedia.org/r/402175 (https://phabricator.wikimedia.org/T179054) (owner: Legoktm)
[05:34:05] Continuous-Integration-Infrastructure, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Patch-For-Review: Create pretty landing page at https://doc.wikimedia.org/cover/ - https://phabricator.wikimedia.org/T146970#3877367 (Legoktm)
[05:34:07] Continuous-Integration-Config, Wiki-Loves-Monuments-Database, Patch-For-Review: Generate clover.xml for labs/tools/heritage - https://phabricator.wikimedia.org/T179054#3877365 (Legoktm) Open>Resolved https://doc.wikimedia.org/cover/ shows labs-tools-heritage at 40%, which matches https://doc....
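The "Safely handle incomplete clover.xml files" change above (402174) is PHP in integration/docroot; here is a minimal Python sketch of the same defensive idea, since an aborted PHPUnit run can leave a truncated clover.xml behind. The function name is hypothetical, and using statement counts alone is a simplification of how the report weights coverage:

```python
import xml.etree.ElementTree as ET

def clover_percentage(path: str):
    """Return line coverage from a clover.xml, or None if the file is
    missing, truncated, or carries no metrics at all."""
    try:
        metrics = ET.parse(path).find("./project/metrics")
    except (OSError, ET.ParseError):
        return None  # missing file or half-written XML
    if metrics is None:
        return None  # parsed fine, but no <metrics> element
    total = int(metrics.get("statements", 0))
    covered = int(metrics.get("coveredstatements", 0))
    return 100.0 * covered / total if total else None
```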
[05:47:07] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<22.22%)
[06:42:24] Continuous-Integration-Infrastructure, Operations, Traffic: Lower varnish caching length on doc.wikimedia.org - https://phabricator.wikimedia.org/T184255#3877424 (Legoktm)
[07:07:09] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:17:09] (PS1) Legoktm: Rename to cover-extensions/ to avoid issues with subdirectories [integration/docroot] - https://gerrit.wikimedia.org/r/402184
[07:48:42] (PS1) Legoktm: Publish extension coverage to cover-extensions/ [integration/config] - https://gerrit.wikimedia.org/r/402186
[07:50:05] (PS2) Legoktm: Publish extension coverage to cover-extensions/ [integration/config] - https://gerrit.wikimedia.org/r/402186
[07:56:44] (CR) Legoktm: [C: +2] Rename to cover-extensions/ to avoid issues with subdirectories [integration/docroot] - https://gerrit.wikimedia.org/r/402184 (owner: Legoktm)
[07:56:58] (CR) Legoktm: [C: +2] Publish extension coverage to cover-extensions/ [integration/config] - https://gerrit.wikimedia.org/r/402186 (owner: Legoktm)
[07:57:10] (Merged) jenkins-bot: Rename to cover-extensions/ to avoid issues with subdirectories [integration/docroot] - https://gerrit.wikimedia.org/r/402184 (owner: Legoktm)
[07:57:16] (CR) jenkins-bot: Rename to cover-extensions/ to avoid issues with subdirectories [integration/docroot] - https://gerrit.wikimedia.org/r/402184 (owner: Legoktm)
[07:58:14] (Merged) jenkins-bot: Publish extension coverage to cover-extensions/ [integration/config] - https://gerrit.wikimedia.org/r/402186 (owner: Legoktm)
[07:59:42] (PS1) Legoktm: Install extension dependencies for coverage job [integration/config] - https://gerrit.wikimedia.org/r/402189
[08:00:11] (CR) Legoktm: [C: +2] Install extension dependencies for coverage job [integration/config] - https://gerrit.wikimedia.org/r/402189 (owner: Legoktm)
[08:01:20] (Merged) jenkins-bot: Install extension dependencies for coverage job [integration/config] - https://gerrit.wikimedia.org/r/402189 (owner: Legoktm)
[08:01:38] Yippee, build fixed!
[08:01:39] Project mwext-phpunit-coverage-publish build #32: FIXED in 3 min 1 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/32/
[08:14:02] Continuous-Integration-Config, Operations: tox 2.5.0 on phabricator-jessie-diffs fails with ERROR: Commands not specified - https://phabricator.wikimedia.org/T184060#3877476 (hashar) The revert commit for 2.7.0 https://github.com/tox-dev/tox/issues/454 which looks like a hack when one can achieve exactly...
[08:18:44] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Discovery, Wikimedia-Portals, and 2 others: Update Portals page on Beta to reflect head of master branch - https://phabricator.wikimedia.org/T181799#3877478 (hashar) Open>Resolved
[08:21:17] (CR) Hashar: [C: +2] "Ah indeed they are non voting :} Sorry for all the delay regarding the BlueSpice extensions." [integration/config] - https://gerrit.wikimedia.org/r/394578 (owner: Robert Vogel)
[08:21:24] (CR) jerkins-bot: [V: -1] Changed settings for BlueSpice-repos [integration/config] - https://gerrit.wikimedia.org/r/394578 (owner: Robert Vogel)
[08:25:35] (PS7) Hashar: Changed settings for BlueSpice-repos [integration/config] - https://gerrit.wikimedia.org/r/394578 (owner: Robert Vogel)
[08:26:05] (CR) Hashar: [C: +2] "Rebased. Some will probably fail but can be fixed later on :-}" [integration/config] - https://gerrit.wikimedia.org/r/394578 (owner: Robert Vogel)
[08:28:13] (Merged) jenkins-bot: Changed settings for BlueSpice-repos [integration/config] - https://gerrit.wikimedia.org/r/394578 (owner: Robert Vogel)
[08:33:21] (PS2) Hashar: Add BlueSpicePageAccess extension to zuul/layout.yaml [integration/config] - https://gerrit.wikimedia.org/r/401627 (https://phabricator.wikimedia.org/T183674) (owner: Divadsn)
[08:35:11] (PS4) Hashar: Add BlueSpiceNamespaceCSS extension to zuul/layout.yaml [integration/config] - https://gerrit.wikimedia.org/r/401628 (https://phabricator.wikimedia.org/T183674) (owner: Divadsn)
[08:36:33] (CR) Hashar: [C: +2] "Rebased and moved the definition to have the extension in alphabetical order." [integration/config] - https://gerrit.wikimedia.org/r/401627 (https://phabricator.wikimedia.org/T183674) (owner: Divadsn)
[08:36:37] (CR) Hashar: [C: +2] "Rebased and moved the definition to have the extension in alphabetical order." [integration/config] - https://gerrit.wikimedia.org/r/401628 (https://phabricator.wikimedia.org/T183674) (owner: Divadsn)
[08:37:02] Project mwext-phpunit-coverage-publish build #38: FAILURE in 34 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/38/
[08:37:44] (Merged) jenkins-bot: Add BlueSpicePageAccess extension to zuul/layout.yaml [integration/config] - https://gerrit.wikimedia.org/r/401627 (https://phabricator.wikimedia.org/T183674) (owner: Divadsn)
[08:38:06] (Merged) jenkins-bot: Add BlueSpiceNamespaceCSS extension to zuul/layout.yaml [integration/config] - https://gerrit.wikimedia.org/r/401628 (https://phabricator.wikimedia.org/T183674) (owner: Divadsn)
[08:39:48] ahhhhh
[08:39:52] not skins again :(
[08:40:09] Yippee, build fixed!
[08:40:09] Project mwext-phpunit-coverage-publish build #39: FIXED in 3 min 6 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/39/
[08:50:07] Continuous-Integration-Infrastructure, RemexHtml, Patch-For-Review: Figure out how to speed up RemexHtml coverage runs - https://phabricator.wikimedia.org/T179055#3877494 (Legoktm) ...and with PHP 7, it takes 7 minutes. Wonderful. https://integration.wikimedia.org/ci/job/remexhtml-phpunit-coverage-pu...
[08:53:27] Project mwext-phpunit-coverage-publish build #42: FAILURE in 5.6 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/42/
[08:54:50] ^ I'll fix the extension coverage job tomorrow
[08:55:11] Continuous-Integration-Config, Operations: tox 2.5.0 on phabricator-jessie-diffs fails with ERROR: Commands not specified - https://phabricator.wikimedia.org/T184060#3877497 (fgiunchedi) Open>Invalid Fair enough! Thanks @hashar!
[08:57:06] Yippee, build fixed!
[08:57:07] Project mwext-phpunit-coverage-publish build #43: FIXED in 3 min 38 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/43/
[09:12:48] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:17:41] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3877516 (mobrovac)
[09:17:43] Beta-Cluster-Infrastructure, Puppet, Services (done): Puppet broken on deployment-trending01 due to removal of role - https://phabricator.wikimedia.org/T184241#3877513 (mobrovac) Open>Resolved The instance has been deleted and its puppet prefix and web proxy cleaned up.
[09:20:38] PROBLEM - Host deployment-trending01 is DOWN: CRITICAL - Host Unreachable (10.68.18.186)
[10:29:55] PROBLEM - Puppet staleness on deployment-kafka03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[10:32:01] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:02:01] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:04:51] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-kafka-jump-[12] due to version of a package being missing - https://phabricator.wikimedia.org/T184240#3877141 (Paladox) Probably want to include the OS too, like jessie or stretch?
[11:07:06] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3877128 (Paladox) Maybe stretch is pointing to an o...
[11:10:48] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#3877079 (Paladox) I guess we can apply this https://github.com/wikimedia/mediawiki-vagrant/commit/ac6d19df598c75d97b635b026763ae7fd96f5970 fix at /...
[11:30:32] Continuous-Integration-Config, Wiki-Loves-Monuments-Database, Patch-For-Review: Generate clover.xml for labs/tools/heritage - https://phabricator.wikimedia.org/T179054#3877791 (Lokal_Profil) Thanks! Possibly a new task: could we also generate coverage for the PHP components (the API), and if so is it...
[13:07:30] Beta-Cluster-Infrastructure, Puppet: deployment-phab completely broken - https://phabricator.wikimedia.org/T184233#3877039 (Paladox) This is probably because puppet has been broken on this host for a long while now. It probably needs to be recreated or deleted. It's been disconnected from getting any changes...
[13:12:43] Release-Engineering-Team (Kanban), Scap, Wikimedia-Incident: Investigate deployment that caused high error-rate but wasn't prevented by Scap - https://phabricator.wikimedia.org/T183952#3878049 (zeljkofilipin) Scap did fail during deployment. Since the commit that caused the failure was already merged...
[13:12:52] Continuous-Integration-Infrastructure: integration.integration-slave-jessie-1001 disk space full - https://phabricator.wikimedia.org/T184269#3878052 (Paladox)
[14:10:43] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[14:19:39] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Flood of ORES errors at Beta Cluster - https://phabricator.wikimedia.org/T184276#3878311 (MarcoAurelio)
[14:23:26] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Flood of ORES errors at Beta Cluster - https://phabricator.wikimedia.org/T184276#3878331 (MarcoAurelio) https://logstash-beta.wmflabs.org/goto/3da590c69d2896cf4d4cd227616fcd29 is one of them, but you should check the...
[14:25:24] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Flood of ORES errors at Beta Cluster - https://phabricator.wikimedia.org/T184276#3878311 (awight) @MarcoAurelio Thanks for the report! Our celery worker died three days ago, probably due to out-of-memory. It's not t...
[14:26:55] !log restarted celery-ores-worker on deployment-sca03
[14:26:59] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Beta Cluster ORES celery worker dies - https://phabricator.wikimedia.org/T184276#3878342 (awight)
[14:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:28:12] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Beta Cluster ORES celery worker dies - https://phabricator.wikimedia.org/T184276#3878346 (MarcoAurelio) Dear @awight; thanks for your quick response. Yesterday @Krenair was discussing at -releng that there were a numb...
[14:31:14] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Beta Cluster ORES celery worker dies - https://phabricator.wikimedia.org/T184276#3878311 (Halfak) It looks like we might need more memory on sca03 (or whatever beta cluster node we're deploying to). Maybe it's time t...
[14:37:43] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Beta Cluster ORES celery worker dies - https://phabricator.wikimedia.org/T184276#3878361 (Halfak) Alternatively, we could also reduce the # of workers from 8 to 4. I think we could still handle beta-capacity with th...
[14:37:46] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Beta Cluster ORES celery worker dies - https://phabricator.wikimedia.org/T184276#3878362 (awight) Looking at /srv/log/ores/app.log, we've been down for at least 2 weeks. Any useful evidence has been rotated out of lo...
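Halfak's suggestion at [14:37:43] above, cutting the beta ORES Celery pool from 8 to 4 workers, would look roughly like this. A hypothetical inline sketch assuming Celery 4 setting names; the real ORES deployment drives this through its own configuration files, not inline Python:

```python
from celery import Celery

app = Celery("ores")
app.conf.worker_concurrency = 4          # was 8; halve to fit beta's memory
app.conf.worker_max_tasks_per_child = 100  # recycle workers to cap slow leaks
```

Fewer concurrent workers directly caps peak memory, which is the failure mode described above (the worker dying of out-of-memory and staying dead unnoticed).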
[15:00:11] Project mwext-phpunit-coverage-publish build #46: FAILURE in 11 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/46/
[15:12:32] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:15:03] PROBLEM - Puppet errors on deployment-redis06 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:21:54] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Move beta cluster ORES to its own machine - https://phabricator.wikimedia.org/T184282#3878447 (awight)
[15:27:44] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:33:42] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:37:54] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Move beta cluster ORES to its own machine - https://phabricator.wikimedia.org/T184282#3878472 (Halfak) FWIW, our staging machine for our CloudVPS install for ORES is 16GB and usually runs with 9.2GB free. It has 8 ce...
[15:41:02] MediaWiki-Codesniffer: Undefined index: scope_opener - https://phabricator.wikimedia.org/T184232#3878475 (Umherirrender)
[15:41:04] MediaWiki-Codesniffer: Undefined index: scope_opener in IfElseStructureSniff - https://phabricator.wikimedia.org/T183828#3878478 (Umherirrender)
[15:47:30] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:57:37] Yippee, build fixed!
[15:57:38] Project mwext-phpunit-coverage-publish build #47: FIXED in 31 min: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/47/
[16:04:56] RECOVERY - Puppet staleness on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[16:16:38] (PS1) Umherirrender: Add BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/402374 (https://phabricator.wikimedia.org/T130811)
[16:22:02] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3878553 (Paladox) See also https://groups.google.com/forum/m/#!topic/repo-discuss/rP3DdKXxHbI This problem may not have been fully fixed in 2.14 but...
[16:22:37] (PS2) Umherirrender: Add BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/402374 (https://phabricator.wikimedia.org/T130811)
[16:23:01] (CR) Umherirrender: "Removed BlueSpiceRSSFeeder, because it is an empty repo (waiting for initial commit)" [integration/config] - https://gerrit.wikimedia.org/r/402374 (https://phabricator.wikimedia.org/T130811) (owner: Umherirrender)
[16:32:05] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:12:57] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:47:57] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:10:52] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[18:39:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:45:51] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:53:38] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3878841 (mobrovac) Resolved>Open This is still an issue even though the params are now quoted: ```lines=10 18:50:25 Started deploy [mathoid/deploy@c9957c...
[18:54:29] PROBLEM - Free space - all mounts on deployment-kafka03 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka03.diskspace.root.byte_percentfree (<100.00%)
[18:55:28] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3878848 (mmodell) @mobrovac: beta is where proper testing takes place.
[18:56:31] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3878849 (mmodell) I'm working on reverting the problematic change. Just give me a few more minutes.
[18:58:20] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3878867 (mobrovac) >>! In T184176#3878848, @mmodell wrote: > @mobrovac: beta is where proper testing takes place. If you define //testing// as //not working even...
[19:10:56] (PS1) Umherirrender: Archive mediawiki/extensions/DataTypes [integration/config] - https://gerrit.wikimedia.org/r/402412
[19:31:28] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3878921 (mmodell) Ok, I've installed 3.7.5 on deployment-tin; this should resolve the issue with tagging.
[19:35:38] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:35:58] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:36:00] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:36:36] MediaWiki-Codesniffer: Undefined index: scope_opener in IfElseStructureSniff - https://phabricator.wikimedia.org/T183828#3878926 (Umherirrender) p:Triage>Normal a:Umherirrender
[19:37:10] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[19:37:39] (PS1) Umherirrender: Fix Undefined index: scope_opener in IfElseStructureSniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/402418 (https://phabricator.wikimedia.org/T183828)
[19:38:12] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[19:38:32] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[19:38:50] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:38:54] PROBLEM - Puppet errors on deployment-cpjobqueue is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:39:06] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3878942 (mmodell) well now I need to force the right scap version on all of beta hosts... I'm gonna try to figure out how to use cumin to do that.
[19:40:35] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:41:51] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[19:41:59] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[19:44:18] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:44:48] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:46:00] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[19:48:33] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[19:49:03] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:49:25] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:50:41] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:51:02] PROBLEM - Puppet errors on deployment-eventlog02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:57:51] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[19:59:29] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:00:05] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:00:35] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:01:56] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:03:04] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:09:09] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3879021 (mmodell) Ok I force-downgraded scap on deployment-prep with cumin, as follows: ``` sudo cumin 'O{project:deployment-prep}' 'dpkg-query --status scap && D...
[20:09:23] puppet errors should be fixed now
[20:09:44] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[20:10:10] PROBLEM - Host deployment-phab is DOWN: CRITICAL - Host Unreachable (10.68.17.67)
[20:10:34] shinken-wm: deployment-phab is deleted, so of course it's down
[20:13:27] twentyafterfour: it usually takes shinken 30m or so to grok a delete
[20:14:08] yeah, I just enjoy mocking the bots :-o
[20:14:40] they will have their revenge one day when they take my job :P
[20:16:52] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:18:29] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:21:00] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:22:00] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[20:23:33] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:24:02] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[20:24:17] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:24:50] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:29:29] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:30:41] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:31:03] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:30] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:34:30] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:35:08] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:36:58] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:37:52] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:04] RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:40:41] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:40:43] plenty more puppet bugs open if anyone is interested: https://phabricator.wikimedia.org/T132259
[20:40:59] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:40:59] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:28] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3773712 (Krenair) >>! In T180935#3872752, @hashar wrote: > As for puppet being broken on several instances, indeed we could use some new tasks. The reasons...
[20:42:10] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:43:12] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:43:48] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3879068 (mmodell)
[20:43:50] Beta-Cluster-Infrastructure, Puppet: deployment-phab completely broken - https://phabricator.wikimedia.org/T184233#3879065 (mmodell) Open>Resolved a:mmodell I deleted the instance
[20:43:56] RECOVERY - Puppet errors on deployment-cpjobqueue is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:30] cpjobqueue?
[20:44:52] * twentyafterfour shrugs
[20:47:01] change prop job queue
[20:47:29] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3879086 (Krenair) Nope, it just plain doesn't exist...
[20:48:47] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[20:52:12] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3879094 (demon) *eyeroll* It'll be fixed when they stop putting canonical data in a secondary index.
[20:53:11] Krenair: try apt-get update && apt-get install prometheus-nutcracker-exporter
[20:54:02] same error as expected
[20:54:17] what apt errors do you get when doing apt-get update, please?
[21:02:44] !log legoktm@contint1001:/srv/org/wikimedia/doc/cover$ sudo -u jenkins-slave rm -rf extensions
[21:02:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:10:28] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3879147 (Krenair) Dug into this a bit more with som...
[21:10:32] (PS1) Legoktm: Cleanup skins before setting up extension coverage job [integration/config] - https://gerrit.wikimedia.org/r/402428
[21:10:34] (PS1) Legoktm: Only generate coverage information for master [integration/config] - https://gerrit.wikimedia.org/r/402429
[21:10:43] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3879148 (Paladox) It's due to the experimental comp...
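The "Only generate coverage information for master" change uploaded at [21:10:34] (402429) lives in the Jenkins job definitions; a minimal sketch of the guard it implies, assuming Zuul v2's ZUUL_BRANCH environment variable and a hypothetical placement at the top of the job's build step:

```python
import os
import sys

# Skip publishing when the triggering change is on a feature branch,
# e.g. MobileFrontend's "specialpages" branch noted in the review below.
if os.environ.get("ZUUL_BRANCH", "master") != "master":
    print("Skipping coverage: only the master branch is published.")
    sys.exit(0)  # succeed without publishing anything
```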
[21:20:38] RECOVERY - Puppet errors on deployment-mediawiki07 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:31:16] (CR) Legoktm: [C: +2] Cleanup skins before setting up extension coverage job [integration/config] - https://gerrit.wikimedia.org/r/402428 (owner: Legoktm)
[21:31:29] (CR) Legoktm: [C: +2] "Noticed MobileFrontend's "specialpages" branch." [integration/config] - https://gerrit.wikimedia.org/r/402429 (owner: Legoktm)
[21:32:32] (Merged) jenkins-bot: Cleanup skins before setting up extension coverage job [integration/config] - https://gerrit.wikimedia.org/r/402428 (owner: Legoktm)
[21:32:39] (Merged) jenkins-bot: Only generate coverage information for master [integration/config] - https://gerrit.wikimedia.org/r/402429 (owner: Legoktm)
[21:40:52] (CR) Hashar: [C: +2] Archive mediawiki/extensions/DataTypes [integration/config] - https://gerrit.wikimedia.org/r/402412 (owner: Umherirrender)
[21:41:30] (CR) Hashar: [C: +2] Add BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/402374 (https://phabricator.wikimedia.org/T130811) (owner: Umherirrender)
[21:41:56] (Merged) jenkins-bot: Archive mediawiki/extensions/DataTypes [integration/config] - https://gerrit.wikimedia.org/r/402412 (owner: Umherirrender)
[21:42:38] (Merged) jenkins-bot: Add BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/402374 (https://phabricator.wikimedia.org/T130811) (owner: Umherirrender)
[22:08:00] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10fullscreenorgId=1
[22:09:39] jenkins still doesn't merge ores wheels: https://gerrit.wikimedia.org/r/#/c/401822/ I need to find out why
[22:11:00] is it in integration/zuul
[22:11:06] cannot use the noop template.
[22:11:43] https://github.com/wikimedia/integration-config/blob/c26fa8dab814f0075497293691cda267e90423e5/zuul/layout.yaml#L8030
[22:11:45] Amir1
[22:11:49] you cannot use noop
[22:11:53] it doesn't self-merge.
[22:12:23] well, I saw it in other repos and it worked just fine
[22:12:32] search for noop tests
[22:12:46] Yeh, it adds jenkins to the repo, but doesn't self-merge.
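For context on the noop exchange above: a rough sketch of what such a zuul/layout.yaml entry looks like (keys and structure from memory, and the repo name is taken from the discussion, so treat both as assumptions rather than a copy of the real file). The noop template only reports a trivial pass in the test pipeline, so Jenkins gets a vote on the change, but nothing runs in the gate-and-submit pipeline and Zuul never merges it:

```yaml
# Hypothetical zuul/layout.yaml fragment (Zuul v2 style)
projects:
  - name: research/ores/wheels
    template:
      - name: noop   # votes Verified, but has no gate-and-submit jobs
```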
[22:14:01] somehow
[22:14:12] is picked up when a change depends on a ton of other changes
[22:18:09] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10fullscreenorgId=1
[22:18:49] Amir1, paladox: yeah, that is transient
[22:19:00] yep
[22:19:20] Beta-Cluster-Infrastructure, Upstream: Non-existent wiki urls on beta cluster gives Unsafe/Insecure connection message - https://phabricator.wikimedia.org/T173469#3879271 (Krenair)
[22:19:22] Beta-Cluster-Infrastructure: various .beta.wmflabs.org domains use an invalid ssl certificate - https://phabricator.wikimedia.org/T182927#3879274 (Krenair)
[22:19:35] someone has sent a lot of dependent changes to Gerrit
[22:19:52] which triggers a lot of merge checks, and it takes a bit to process them
[22:19:57] yeh
[22:20:12] https://gerrit.wikimedia.org/r/#/q/project:mediawiki/services/parsoid+is:open :]
[22:20:40] it happens from time to time
[22:20:48] can be solved by adding a few more zuul-merger instances
[22:22:07] hashar: that will be fixed with https://github.com/openstack-infra/zuul/commit/773651ad7bf0fc6adba2357173ffb657d874478a
[22:22:52] paladox: na that is slightly different :D
[22:22:57] oh
[22:28:29] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3879304 (Krenair) Patch handl...
[22:28:59] Jan 05 22:23:25 deployment-videoscaler01 ferm[24188]: DNS query for 'deployment-prometheus01.deployment-prep.eqiad.wmflabs' failed: NXDOMAIN
[22:29:00] really.
[22:29:13] * paladox gets that error too
[22:29:19] but for different hosts
[22:30:00] wonder if it's the AAAA thing
[22:33:13] why do I feel like I've fought with ferm and its backing DNS library before
[22:33:50] oh yeah, here we go: https://phabricator.wikimedia.org/T153468
[22:34:05] RECOVERY - Puppet errors on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:34:33] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:35:39] RECOVERY - Puppet errors on deployment-redis06 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:39:56] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3879347 (Krenair) Actually th...
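The ferm NXDOMAIN at [22:28:59] above, and the "wonder if it's the AAAA thing" hunch that led to T153468, can be narrowed down by comparing A and AAAA lookups directly; a minimal diagnostic sketch (hostname taken from the log line, nothing else assumed):

```python
import socket

# Compare IPv4 (A) and IPv6 (AAAA) resolution for the host ferm choked on;
# a host with an A record but no AAAA record reproduces the symptom.
host = "deployment-prometheus01.deployment-prep.eqiad.wmflabs"
for family, record in ((socket.AF_INET, "A"), (socket.AF_INET6, "AAAA")):
    try:
        infos = socket.getaddrinfo(host, None, family)
        print(record, [info[4][0] for info in infos])
    except socket.gaierror as err:
        print(record, "lookup failed:", err)
```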
[23:05:15] !log restarted staged ores-wmflabs-deploy:8d252de
[23:05:25] Woops
[23:05:32] wrong channel :|
[23:07:01] Seems that stashbot has died, so maybe no one will know my mistake
[23:18:59] RECOVERY - Puppet errors on deployment-kafka-jumbo-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:21:00] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:22:05] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3879427 (Krenair)
[23:23:08] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2192864 (demon) Is this really best as a tracking task, or should we add it to the deployment-prep workboard column? The task by its nature is always gonna be...
[23:24:34] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3879432 (Krenair) It's fine with me if you want to move them all to a particular workboard column instead of a tracking task
[23:36:08] PROBLEM - Puppet errors on deployment-netbox is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:38:19] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10fullscreenorgId=1
[23:48:19] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10fullscreenorgId=1