[00:05:45] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[00:45:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:54:40] <hoo>	 Hi
[00:54:57] <hoo>	 does anyone know if (and how) one can trigger manual runs of a (browsertest) job?
[00:55:23] <hoo>	 nvm, got it
[02:13:11] <wmf-insecte>	 Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #146: 04FAILURE in 10 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/146/
[02:45:30] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[03:02:36] <Krenair>	 What is wrong with shinken...
[03:08:46] <wmf-insecte>	 Project selenium-Wikibase » firefox,test,Linux,contintLabsSlave && UbuntuTrusty build #100: 04FAILURE in 2 hr 13 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/100/
[03:10:29] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:18:39] <wmf-insecte>	 Yippee, build fixed!
[04:18:40] <wmf-insecte>	 Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #134: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/134/
[04:35:14] <legoktm>	 !log depooled integration-slave-jessie-1005 in jenkins so I can test puppet stuff on it
[04:35:22] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:59:56] <legoktm>	 !log added Krenair to integration project to help debug puppet stuff
[05:00:00] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[05:11:09] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1005 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:22:02] <legoktm>	 legoktm@integration-slave-jessie-1005:~$ PHP_BIN=php7.0 php --version
[05:22:02] <legoktm>	 PHP 7.0.10-1+0~20160829102714.10+jessie~1.gbpd58428 (cli) ( NTS )
[05:22:06] <legoktm>	 ta-daaaaa
[05:25:58] <legoktm>	 !log cherry-picked https://gerrit.wikimedia.org/r/#/c/308918/ onto integration-puppetmaster with a hack that has it only apply to integration-slave-jessie-1005
[05:26:03] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[05:26:10] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:41:39] <wikibugs>	 10Continuous-Integration-Infrastructure: Investigate installing php5.3 on trusty and/or debian instance - https://phabricator.wikimedia.org/T103786#2613662 (10Legoktm) Precise LTS support ends in April 2017. MediaWiki 1.23 goes EOL in May 2017 (last version to support 5.3). If the labs team is okay with us havin...
[05:49:29] <grrrit-wm>	 (03PS1) 10Legoktm: Get rid of manual PHP_BIN handling [integration/jenkins] - 10https://gerrit.wikimedia.org/r/308931 
[05:50:22] <grrrit-wm>	 (03PS2) 10Legoktm: Get rid of manual PHP_BIN handling [integration/jenkins] - 10https://gerrit.wikimedia.org/r/308931 
[05:54:01] <grrrit-wm>	 (03PS3) 10Legoktm: Publish Doxygen and jsduck documentation for Kartographer [integration/config] - 10https://gerrit.wikimedia.org/r/299697 (https://phabricator.wikimedia.org/T140657) (owner: 10MaxSem)
[05:54:03] <grrrit-wm>	 (03PS1) 10Legoktm: Add mwext-doxygen-publish generic job [integration/config] - 10https://gerrit.wikimedia.org/r/308932 
[05:55:26] <grrrit-wm>	 (03CR) 10Legoktm: [C: 032] Add mwext-doxygen-publish generic job [integration/config] - 10https://gerrit.wikimedia.org/r/308932 (owner: 10Legoktm)
[05:55:32] <grrrit-wm>	 (03CR) 10Legoktm: [C: 032] Publish Doxygen and jsduck documentation for Kartographer [integration/config] - 10https://gerrit.wikimedia.org/r/299697 (https://phabricator.wikimedia.org/T140657) (owner: 10MaxSem)
[05:56:08] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add mwext-doxygen-publish generic job [integration/config] - 10https://gerrit.wikimedia.org/r/308932 (owner: 10Legoktm)
[05:56:10] <grrrit-wm>	 (03Merged) 10jenkins-bot: Publish Doxygen and jsduck documentation for Kartographer [integration/config] - 10https://gerrit.wikimedia.org/r/299697 (https://phabricator.wikimedia.org/T140657) (owner: 10MaxSem)
[05:57:03] <legoktm>	 !log deploying https://gerrit.wikimedia.org/r/308932 https://gerrit.wikimedia.org/r/299697
[05:57:07] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[06:08:27] <grrrit-wm>	 (03PS1) 10Legoktm: doc: Fix alpha-sort of MediaWiki extensions [integration/docroot] - 10https://gerrit.wikimedia.org/r/308934 
[06:08:29] <grrrit-wm>	 (03PS1) 10Legoktm: doc: Add Kartographer extension [integration/docroot] - 10https://gerrit.wikimedia.org/r/308935 (https://phabricator.wikimedia.org/T140657) 
[06:09:13] <grrrit-wm>	 (03CR) 10Legoktm: [C: 032] doc: Fix alpha-sort of MediaWiki extensions [integration/docroot] - 10https://gerrit.wikimedia.org/r/308934 (owner: 10Legoktm)
[06:09:16] <grrrit-wm>	 (03CR) 10Legoktm: [C: 032] doc: Add Kartographer extension [integration/docroot] - 10https://gerrit.wikimedia.org/r/308935 (https://phabricator.wikimedia.org/T140657) (owner: 10Legoktm)
[06:09:31] <grrrit-wm>	 (03Merged) 10jenkins-bot: doc: Fix alpha-sort of MediaWiki extensions [integration/docroot] - 10https://gerrit.wikimedia.org/r/308934 (owner: 10Legoktm)
[06:09:36] <grrrit-wm>	 (03Merged) 10jenkins-bot: doc: Add Kartographer extension [integration/docroot] - 10https://gerrit.wikimedia.org/r/308935 (https://phabricator.wikimedia.org/T140657) (owner: 10Legoktm)
[07:23:15] <shinken-wm>	 PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:43:16] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2613749 (10Paladox) @legoktm hi, how would we manage to support php 5.6, and 7 on Jessie, or are we going to use php 7 instead of php 5.6?
[08:03:13] <shinken-wm>	 RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[08:53:39] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Puppet, 07Technical-Debt, 07Zuul: role::zuul::configuration should be replaced by  hiera - https://phabricator.wikimedia.org/T139527#2613906 (10hashar) p:05Triage>03Normal a:03hashar As part of migrating CI from gallium to a new host, I...
[08:55:50] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Puppet, 07Technical-Debt, 07Zuul: role::zuul::configuration should be replaced by  hiera - https://phabricator.wikimedia.org/T139527#2613923 (10Paladox) @hashar I can ajust but I doint know how I can run puppet without it deleting the /var/ww...
[09:25:58] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:35:14] <grrrit-wm>	 (03PS2) 10Hashar: Revert "Move `rake` jobs off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306723 
[09:35:16] <grrrit-wm>	 (03PS2) 10Hashar: Revert "rake: Fix bundle install path" [integration/config] - 10https://gerrit.wikimedia.org/r/306724 
[09:35:58] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Revert "Move `rake` jobs off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306723 (owner: 10Hashar)
[09:36:22] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Revert "rake: Fix bundle install path" [integration/config] - 10https://gerrit.wikimedia.org/r/306724 (owner: 10Hashar)
[09:38:11] <grrrit-wm>	 (03PS3) 10Hashar: Revert "Move `rake` jobs off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306723 
[09:38:13] <grrrit-wm>	 (03PS3) 10Hashar: Revert "rake: Fix bundle install path" [integration/config] - 10https://gerrit.wikimedia.org/r/306724 
[09:40:43] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Revert "Move `rake` jobs off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306723 (owner: 10Hashar)
[09:40:45] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Revert "rake: Fix bundle install path" [integration/config] - 10https://gerrit.wikimedia.org/r/306724 (owner: 10Hashar)
[09:41:22] <hashar>	 !log Moving rake jobs back to Nodepool ( T143938 ) with https://gerrit.wikimedia.org/r/#/c/306723/ and https://gerrit.wikimedia.org/r/#/c/306724/
[09:41:26] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:41:52] <grrrit-wm>	 (03Merged) 10jenkins-bot: Revert "Move `rake` jobs off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306723 (owner: 10Hashar)
[09:42:30] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Revert "rake: Fix bundle install path" [integration/config] - 10https://gerrit.wikimedia.org/r/306724 (owner: 10Hashar)
[09:42:50] <wikibugs>	 10Gerrit: Project access history links broken - https://phabricator.wikimedia.org/T120658#2614057 (10Paladox) 05Open>03declined I'm declining this based on the fact this is not going to work for the parent mediawiki project.
[09:44:00] <grrrit-wm>	 (03Merged) 10jenkins-bot: Revert "rake: Fix bundle install path" [integration/config] - 10https://gerrit.wikimedia.org/r/306724 (owner: 10Hashar)
[09:50:23] <wikibugs>	 10Continuous-Integration-Infrastructure, 07Nodepool, 13Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2614078 (10hashar) I have moved the rake and oojs-ui-rake jobs to Nodepool. Validated them by hitting `recheck` on a couple dummy changes:  | https://gerrit.wikimed...
[09:51:12] <wikibugs>	 10Continuous-Integration-Infrastructure, 07Nodepool, 13Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2614082 (10hashar)
[10:00:57] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:19:19] <grrrit-wm>	 (03PS2) 10Hashar: Revert "Move npm-node-4 off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306722 
[10:19:36] <grrrit-wm>	 (03PS3) 10Hashar: Revert "Move npm-node-4 off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306722 (https://phabricator.wikimedia.org/T143938) 
[10:25:57] <wikibugs>	 10Gerrit, 10grrrit-wm, 13Patch-For-Review: Merges of l10n updates by Jenkins should not be reported by grrrit-wm - https://phabricator.wikimedia.org/T93082#1128887 (10Paladox)
[10:37:13] <hashar>	 !log beta: cleaning up salt-keys on deployment-salt02 . Bunch of instances got deleted
[10:37:17] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:55:13] <hashar>	 !log integration: dropped PHP7 cherry pick from puppet master. https://gerrit.wikimedia.org/r/#/c/308918/ has been merged.   Pushing it to the fleet of permanent Jessie slaves. T144872
[10:55:17] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:03:09] <hashar>	 !log integration: cherry pick https://gerrit.wikimedia.org/r/#/c/308955/ "contint: prefer our bin/php alternative"  T144872
[11:03:13] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:04:49] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2614187 (10hashar) p:05Triage>03Normal
[11:41:53] <wikibugs>	 10Gerrit, 10grrrit-wm, 07Upstream: Patchsets created through web interface attributed to the wrong user - https://phabricator.wikimedia.org/T141329#2614232 (10Paladox) Yes.
[11:53:57] <hashar>	 !log Force refreshing Nodepool jessie snapshot to get PHP7 included T144872
[11:54:01] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:57:06] <hashar>	 bah
[11:57:07] <hashar>	 Error: Could not find dependency Apt::Repository[sury-php] for Package[php7.0-cli] at /puppet/modules/contint/manifests/packages/php.pp:44
[12:01:16] <wmf-insecte>	 Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #137: 04FAILURE in 15 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/137/
[12:09:02] <grrrit-wm>	 (03PS1) 10Hashar: dib: include contint::packages::apt [integration/config] - 10https://gerrit.wikimedia.org/r/308960 (https://phabricator.wikimedia.org/T144872) 
[12:09:46] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] dib: include contint::packages::apt [integration/config] - 10https://gerrit.wikimedia.org/r/308960 (https://phabricator.wikimedia.org/T144872) (owner: 10Hashar)
[12:10:28] <grrrit-wm>	 (03Merged) 10jenkins-bot: dib: include contint::packages::apt [integration/config] - 10https://gerrit.wikimedia.org/r/308960 (https://phabricator.wikimedia.org/T144872) (owner: 10Hashar)
[12:20:30] <grrrit-wm>	 (03PS1) 10Hashar: dib: add 'apt' class [integration/config] - 10https://gerrit.wikimedia.org/r/308962 
[12:21:00] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] dib: add 'apt' class [integration/config] - 10https://gerrit.wikimedia.org/r/308962 (owner: 10Hashar)
[12:21:58] <grrrit-wm>	 (03Merged) 10jenkins-bot: dib: add 'apt' class [integration/config] - 10https://gerrit.wikimedia.org/r/308962 (owner: 10Hashar)
[12:31:38] <grrrit-wm>	 (03PS1) 10Hashar: dib: apt-get update before php7 [integration/config] - 10https://gerrit.wikimedia.org/r/308963 
[12:31:57] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] dib: apt-get update before php7 [integration/config] - 10https://gerrit.wikimedia.org/r/308963 (owner: 10Hashar)
[12:32:58] <grrrit-wm>	 (03Merged) 10jenkins-bot: dib: apt-get update before php7 [integration/config] - 10https://gerrit.wikimedia.org/r/308963 (owner: 10Hashar)
[12:36:19] <phuedx>	 hey zeljkof, i may have found a bug in either the jenkins setup in mwext-mw-selenium or in mediawiki-ruby-api
[12:36:43] <phuedx>	 here's an example of related build failure: https://integration.wikimedia.org/ci/job/mwext-mw-selenium/9801/consoleFull
[12:37:30] <phuedx>	 Mediawiki::Client#get_wikitext is opening $MEDIAWIKI_URL/w/index.php
[12:38:00] <phuedx>	 but jenkins is setting $MEDIAWIKI_URL to ".../w/index.php/" anyway
[12:39:13] <zeljkof>	 phuedx: huh
[12:39:23] <zeljkof>	 can you create a task in phab, please?
[12:39:30] <phuedx>	 yup
[12:39:57] <phuedx>	 was wondering if i'd messed something up and you'd tell me in 2s flat ;)
[12:40:16] <zeljkof>	 phuedx: it's strange
[12:40:20] <zeljkof>	 did something change recently?
[12:40:38] <phuedx>	 i introduced the get_wikitext call
[12:40:48] <phuedx>	 in the change that's being tested
[12:41:48] <zeljkof>	 hm, I think we are using it elsewhere, it should not fail
[12:44:43] <grrrit-wm>	 (03PS1) 10Hashar: dib: swap puppet dep ordering [integration/config] - 10https://gerrit.wikimedia.org/r/308965 
[12:44:59] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] dib: swap puppet dep ordering [integration/config] - 10https://gerrit.wikimedia.org/r/308965 (owner: 10Hashar)
[12:45:32] <grrrit-wm>	 (03Merged) 10jenkins-bot: dib: swap puppet dep ordering [integration/config] - 10https://gerrit.wikimedia.org/r/308965 (owner: 10Hashar)
[12:47:14] <wikibugs>	 10Browser-Tests-Infrastructure, 07Jenkins, 07Ruby: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2614350 (10phuedx)
[12:47:35] <phuedx>	 ^ zeljkof: hopefully that's explanatory
[12:48:36] <wikibugs>	 10Browser-Tests-Infrastructure, 07Jenkins, 07Ruby: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2614375 (10zeljkofilipin) a:03zeljkofilipin
[12:48:47] <wikibugs>	 10Browser-Tests-Infrastructure, 07Jenkins, 07Ruby: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2614350 (10zeljkofilipin) p:05Triage>03Normal
[12:50:23] <wikibugs>	 10Browser-Tests-Infrastructure, 07Jenkins, 07Ruby, 15User-zeljkofilipin: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2614350 (10zeljkofilipin)
[12:52:35] <wikibugs>	 10Browser-Tests-Infrastructure, 07Jenkins, 07Ruby, 15User-zeljkofilipin: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2614385 (10phuedx)
[12:52:40] <zeljkof>	 phuedx: thanks for the task, I will take a look, but probably tomorrow, I am in the middle of something else now
[12:53:13] <wikibugs>	 10Browser-Tests-Infrastructure, 07Jenkins, 07Ruby, 15User-zeljkofilipin: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2614350 (10phuedx) ^ I mustn't disregard the possibility (strong likelihood) that I've done something wrong.
[12:53:17] <phuedx>	 zeljkof: no worries
[12:56:58] <grrrit-wm>	 (03PS1) 10Hashar: dib: an extra dependency for php7 class [integration/config] - 10https://gerrit.wikimedia.org/r/308969 
[13:02:37] <grrrit-wm>	 (03PS1) 10Hashar: dib: drop puppet chain, use serial execution [integration/config] - 10https://gerrit.wikimedia.org/r/308972 
[13:07:56] <grrrit-wm>	 (03PS1) 10Hashar: dib: require_package('apt-transport-https') [integration/config] - 10https://gerrit.wikimedia.org/r/308974 
[13:13:51] <hashar>	 !log Image ci-jessie-wikimedia-1473253681 in wmflabs-eqiad is ready  , has php7 packages. T144872
[13:13:55] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:13:58] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2614420 (10hashar) I had to a few other puppet tweak in integration/config to get the surly apt repo to be updated before the PHP7 package resources get realized. The Jessie Nodepo...
[13:28:58] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2614474 (10faidon) Yes, this is inline with what I've previously said and it sounds fine with me. This is really no...
[13:30:11] <wikibugs>	 10Continuous-Integration-Infrastructure: Install php7 and the php-ast extension so etsy/phan can be run from jenkins - https://phabricator.wikimedia.org/T132636#2614487 (10hashar)
[13:30:13] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2614486 (10hashar) 05Open>03Resolved
[13:38:27] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2614527 (10hashar)
[14:00:00] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2614563 (10hashar) The /usr/bin/php is wrong when provisioning images. The alternatives::install for php was run before the php7 packages which in turn overrides it.  Had to rebuil...
[15:17:53] <Krenair>	 hashar, hey
[15:18:00] <Krenair>	 what host did you build the nodepool image on?
[15:19:21] <hashar>	 Krenair: home machine
[15:19:26] <Krenair>	 ah
[15:19:36] <hashar>	 which is really just Jessie + libvirt-tools + diskimage-builder python software
[15:19:36] <hashar>	 iirc
[15:19:44] <hashar>	 I got one on labs at one point but it got deleted
[15:19:46] <Krenair>	 I just ask because I was looking at integration tasks and found T126613 
[15:19:58] <hashar>	 once the base image has been uploaded to labs,  we can get nodepool to refresh it 
[15:20:23] <hashar>	 with something like:   (get fix merged in puppet or integration/config)  then:  nodepool update wmflabs-eqiad  ci-jessie-wikimedia
[15:20:36] <hashar>	 that spawn an instance out of the image, run puppet and then snapshot the instance
[15:20:41] <hashar>	 the snapshot is then used to boot the instances being consumed
[15:20:45] <wikibugs>	 10Continuous-Integration-Infrastructure: Give @mobrovac access to CI instances - https://phabricator.wikimedia.org/T129880#2118616 (10Krenair) The blocked task is resolved... still want this @mobrovac?
[15:24:10] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2614974 (10hashar) Looping #netops .  We would need contint1001 to be moved to the public network with...
[15:24:36] <hashar>	 Krenair: yeah integration-dev was the instance
[15:24:38] <hoo>	 FYI: https://integration.wikimedia.org/ci/job/mwext-qunit-composer-jessie/1793/console
[15:24:50] <hoo>	 "The requested PHP extension ext-mbstring * is missing from your system. Install or enable PHP's mbstring extension." etc.
[15:26:14] <wikibugs>	 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: Rebuild integration-dev (instance to build images) - https://phabricator.wikimedia.org/T126613#2614994 (10hashar) 05Open>03declined No more needed.  To build an image one would need libvirt-tools and diskimage-builder then run the shell scrip...
[15:26:19] <hashar>	 Krenair: I have closed it. The doc is on http://wikitech.wikimedia.org/wiki/Nodepool
[15:27:12] <wmf-insecte>	 Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #150: 04FAILURE in 5 min 11 sec: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/150/
[15:27:15] <Krenair>	 ok
[15:27:36] <hashar>	 one less task thanks ! ;)
[15:29:42] <Krenair>	 yay
[15:35:28] <hashar>	 thcipriani: nodepool dead :(
[15:35:36] <hashar>	 can't reach the openstack api
[15:35:37] <thcipriani>	 wat!?
[15:35:39] <hashar>	 labnet1002.eqiad.wmnet  : no route to host
[15:35:44] <hashar>	 from labnodepool1001.eqiad.wmnet
[15:36:06] * hashar fills a task first
[15:36:42] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:37:02] <Krenair>	 hashar, what about labnet1001?
[15:37:32] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615093 (10hashar)
[15:37:34] <Krenair>	 pretty sure the active one is 1001
[15:38:13] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615108 (10hashar)
[15:39:02] <hashar>	 Krenair: well the keystone entry point seems to disagree / be out of date
[15:39:11] <Krenair>	 huh, ok
[15:39:57] <Krenair>	 hashar, I can ping it from inside labs...
[15:40:05] <Krenair>	 wonder why you can't from labnodepool
[15:40:08] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615093 (10hashar) Step to reproduce:  ``` ssh labnodepool1001.eqiad.wmnet  user@labnodepool1001:~$ become-nodepool   nodepool@labnodepool1001:~$ openst...
[15:40:32] <chasemp>	 andrewbogott: ^ keystone directory may still say labnet1002 fyi
[15:41:08] <andrewbogott>	 Yeah, I'm rebuilding it
[15:41:10] <andrewbogott>	 oh!
[15:41:19] <andrewbogott>	 yeah, I bet I broke the nova api, sorry.  One moment...
[15:41:28] <hashar>	 seems something does not gracefully failover
[15:43:50] <hashar>	 at least nodepool is not hammering the api as fast as it can.   It has a 60 seconds time out on api queries
[15:44:06] <hashar>	 seems to be enough to throttle nodepool requests
[15:45:02] <hashar>	 andrewbogott: it is back \O/
[15:45:10] <andrewbogott>	 good :)
[15:45:34] * andrewbogott updates the labnet failover docs
[15:45:47] <hashar>	 I am monitoring nodepool
[15:47:05] <shinken-wm>	 PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:48:12] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615165 (10hashar) 05Open>03Resolved a:03Andrew labnet1002 is in maintenance but the failover did not update Keystone.  The openstack CLI tool on...
[15:50:39] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Ensure /srv/deployment/integration/slave-scripts is latest master on deployment-tin - https://phabricator.wikimedia.org/T97324#2615186 (10Krenair)
[15:50:40] <legoktm>	 hashar: thank you for the follow up on php7 :)
[15:50:52] <hashar>	 legoktm: that gaves me a bunch of headaches :D
[15:50:58] <Krenair>	 legoktm, did you see my ping re integration-puppetmaster?
[15:51:08] <hashar>	 thanks to puppet, but php7 should be on both permanent and nodepool slaves now
[15:51:44] <legoktm>	 Krenair: yes, I assume we need to move it to jessie somehow?
[15:51:53] <Krenair>	 yes
[15:52:07] <Krenair>	 or just not-precise realy
[15:52:08] <Krenair>	 really*
[15:52:18] <hashar>	 bah nodepool is off
[15:52:28] <hashar>	 it has four instances flagged for deletion
[15:52:42] <hashar>	 but somehow the requests are made to the old entry labnet1001 entry point
[15:52:46] <hashar>	 so they fail
[15:58:15] <hashar>	 !log Restarting Nodepool . Lost state when labnet got moved T144945
[15:58:20] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:09:13] <hashar>	 !log Nodepool back in action. Had to manually delete some instances in labs 
[16:09:17] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:11:26] <wikibugs>	 10Continuous-Integration-Infrastructure: Move integration-puppetmaster off of precise (probably to jessie) - https://phabricator.wikimedia.org/T144951#2615279 (10Legoktm)
[16:11:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:11:45] <legoktm>	 Krenair: ^ is there a task we should block on or depend upon?
[16:22:05] <shinken-wm>	 RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:25:35] <Krenair>	 legoktm, asking
[16:29:06] <Krenair>	 legoktm, maybe not
[16:31:22] <wikibugs>	 10Continuous-Integration-Infrastructure: Install php7 and the php-ast extension so etsy/phan can be run from jenkins - https://phabricator.wikimedia.org/T132636#2615347 (10Legoktm) We have PHP7 now, the external repo we're using doesn't have php-ast yet, so we need to ask them to include it, or package/build it...
[16:47:11] <wikibugs>	 06Release-Engineering-Team, 10DBA, 10MediaWiki-Maintenance-scripts, 06Operations, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2615428 (10greg)
[16:47:23] <legoktm>	 hashar: is the process for getting puppet updates to nodepool images documented somewhere? I'd like to try it (have a new puppet patch to push)
[16:48:06] <hashar>	 legoktm: sure!  https://wikitech.wikimedia.org/wiki/Nodepool#Manually_generate_a_new_snapshot
[16:48:22] <hashar>	 which ask nodepool to create a new snapshot instances based on a stock image
[16:48:47] <hashar>	 it boot the stock image (ci-jessie-wikimedia),  run  setup_node.sh script into it (as root)
[16:48:59] <hashar>	 which really puppet apply  the puppet manifests in integration/config
[16:49:05] <hashar>	 then if all goes fine, take a snapshot of that instance
[16:49:13] <hashar>	 and then use that to boot new instances
[16:49:32] <legoktm>	 okay, my patch is https://gerrit.wikimedia.org/r/309039
[16:49:36] <hashar>	 gotta get the puppet patch merged in puppet.git first though (there is no cherry picking system)
[16:49:47] <legoktm>	 aha
[16:50:15] <hashar>	 with europe, ops are usually quite fast at merging them at least
[16:50:27] <hashar>	 and in case of emergency one can update the ciimage.pp  in integration/config
[16:50:29] <hashar>	 merge that
[16:50:36] <hashar>	 and refresh the imaget
[16:50:40] <hashar>	 then later upstream the bit from ciimage.pp to puppet.git
[16:51:21] <hashar>	 oh
[16:51:35] <hashar>	 legoktm: once the snapshot has been refreshed, the  instances that got spawned with the previous are still around
[16:51:43] <hashar>	 so gotta wait for them to be consumed or manually delete the old instances
[16:51:51] <hashar>	 via:  nodepool delete <some numeric id>
[16:52:06] <hashar>	 then nodepool will refill the pool with instances based on the last snapshot
[16:55:35] <hashar>	 I am off!
[16:55:55] <legoktm>	 o/
[17:06:14] * paladox watching apple event ;)
[17:06:18] <paladox>	 :)
[17:07:35] <wikibugs>	 10Continuous-Integration-Infrastructure: Investigate installing php5.3 on trusty and/or debian instance - https://phabricator.wikimedia.org/T103786#2615478 (10greg) >>! In T103786#2613662, @Legoktm wrote: > Precise LTS support ends in April 2017. MediaWiki 1.23 goes EOL in May 2017 (last version to support 5.3)....
[17:10:30] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2615494 (10Legoktm) >>! In T144872#2613749, @Paladox wrote: > @legoktm hi, how would we manage to support php 5.6, and 7 on Jessie, or are we going to use php 7 instead of php 5.6?...
[17:12:18] <wikibugs>	 10Continuous-Integration-Infrastructure: Install PHP5.5 on jessie CI instances - https://phabricator.wikimedia.org/T144959#2615500 (10Legoktm)
[17:12:26] <wikibugs>	 10Continuous-Integration-Infrastructure: Install PHP5.5 on jessie CI instances - https://phabricator.wikimedia.org/T144959#2615514 (10Legoktm) p:05Triage>03Low
[17:13:06] <paladox>	 legoktm we can probaly update integration/config now for php7
[17:13:28] <paladox>	 probaly create a basic composer test that uses php7? Just as a test to make sure things work
[17:14:54] <legoktm>	 hold on, I'm still filing tasks xD
[17:15:47] <paladox>	 legoktm Oh, so we carnt do that yet?, or just wait for you to finish filling tasks before doing it?
[17:15:48] <paladox>	 :)
[17:16:17] <wikibugs>	 10Continuous-Integration-Config: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#2615547 (10Legoktm)
[17:16:33] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2615561 (10Legoktm)
[17:17:12] <paladox>	 LOL, they brought mario to the app store :)
[17:17:47] <wikibugs>	 10Continuous-Integration-Config: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#2615547 (10Paladox) We can do this now :)
[17:19:15] <wikibugs>	 10Continuous-Integration-Config: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2615580 (10Legoktm)
[17:19:25] <wikibugs>	 10Continuous-Integration-Config: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2615592 (10Legoktm)
[17:19:27] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2615593 (10Legoktm)
[17:19:54] <legoktm>	 paladox: do you want to create the composer job? otherwise I can do it later today
[17:19:55] <wikibugs>	 10Continuous-Integration-Infrastructure: Support PHP 7 in CI infra - https://phabricator.wikimedia.org/T144872#2615612 (10Paladox)
[17:20:22] <wikibugs>	 10Continuous-Integration-Infrastructure: Tracking php7 support in ci - https://phabricator.wikimedia.org/T144964#2615613 (10Paladox)
[17:20:54] <wikibugs>	 10Continuous-Integration-Config: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2615629 (10Paladox)
[17:20:56] <wikibugs>	 10Continuous-Integration-Config: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#2615630 (10Paladox)
[17:21:01] <paladox>	 legoktm yeh i can do that
[17:21:51] <paladox>	 legoktm do i do in parameter_functions
[17:21:52] <paladox>	     if 'php7' in job.name:
[17:21:52] <paladox>	         params['PHP_BIN'] = 'php7'
[17:21:59] <paladox>	 legoktm also doint forget about php7.1
[17:22:08] <legoktm>	 let's name it "php70"
[17:22:10] <paladox>	 Ok
[17:22:21] <paladox>	 legoktm so do i do params['PHP_BIN'] = 'php70'
[17:22:23] <paladox>	 ?
[17:22:36] <legoktm>	 and PHP_BIN needs to be set to php7.0
[17:22:45] <paladox>	 ah
[17:22:46] <paladox>	 thanks
[17:25:00] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2615667 (10RobH) So I can handle the vlan move and reimage.  Just to confirm there is no data that is c...
[17:28:25] <robh>	 so hashar isnt about but can anyone else in release engineering confirm that conint1001 doesnt house data and i can reimage per https://phabricator.wikimedia.org/T140257
[17:28:27] <robh>	 ?
[17:28:55] <robh>	 im 99.99% its not but I dont want to just do it, im paranoid.
[17:28:58] <paladox>	 greg-g thcipriani ^^
[17:29:19] <paladox>	 ostriches ^^
[17:30:16] <robh>	 I think its fine, since it was supposed to be allocated as an emergency replacement but then not used.  I expect all the data is just OS and puppet data, easily lost.
[17:30:16] <thcipriani>	 robh: that is correct, nothing important on that box. It was rebuilt in an emergency and now we just need to reimage it.
[17:30:21] <robh>	 awesome
[17:30:27] <robh>	 I'll reimage it for you guys on the right vlan now
[17:30:29] <robh>	 thx!
[17:30:37] <thcipriani>	 awesome! Thank you!
[17:31:04] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2615745 (10RobH) a:03RobH checked in release engineering, its cool for me to reimage this now (after...
[17:31:06] <wikibugs>	 06Release-Engineering-Team, 06Operations, 15User-greg, 07Wikimedia-Incident: Improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2615747 (10greg)
[17:32:59] <wikibugs>	 06Release-Engineering-Team, 06Operations, 15User-greg, 07Wikimedia-Incident: Improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg)
[17:33:15] <greg-g>	 thanks much robh 
[17:34:35] <robh>	 is there a preferred paritiotning?
[17:34:47] <robh>	 or is the typical raid1 ext4 /srv mount fine?
[17:35:00] <robh>	 its now in there as ci-master.cfg, which is a new one to me.
[17:35:01] <grrrit-wm>	 (03PS1) 10Paladox: Add support for php7.0 [integration/config] - 10https://gerrit.wikimedia.org/r/309048 (https://phabricator.wikimedia.org/T144961) 
[17:35:09] <paladox>	 legoktm ^^
[17:35:15] <paladox>	 :)
[17:35:22] <grrrit-wm>	 (03PS2) 10Paladox: Add support for php7.0 [integration/config] - 10https://gerrit.wikimedia.org/r/309048 (https://phabricator.wikimedia.org/T144961) 
[17:35:55] <robh>	 i can leave it as the ci-master recipe, but its very non standard compared ot all the other recipes
[17:36:04] <robh>	 as it has a large /var, but perhaps needed here?
[17:36:42] <robh>	 8gb swap (eww, we typically dont use swap because we just throw in enough ram), 50gb /, then  100g /srv and 250g /var
[17:36:49] <paladox>	 thcipriani ^^
[17:36:50] <robh>	 is what its set for on ci-master.cfg
[17:37:51] <wikibugs>	 06Release-Engineering-Team, 15User-greg: Ping tasks in #wikimedia-incident without recent activity for follow-up near the end of FY1617Q1 - https://phabricator.wikimedia.org/T144973#2615805 (10greg)
[17:38:29] <wikibugs>	 06Release-Engineering-Team, 06Operations, 15User-greg, 07Wikimedia-Incident: Plan how to improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg)
[17:38:41] <robh>	 i'll likely leave it on the ci-master thing if there is no answer, assuming this will become said master since it replaces the existing ones.
[17:38:44] <wikibugs>	 06Release-Engineering-Team, 15User-greg: Ping tasks in #wikimedia-incident without recent activity for follow-up near the end of FY1617Q1 - https://phabricator.wikimedia.org/T144973#2615831 (10greg)
[17:38:46] <wikibugs>	 06Release-Engineering-Team, 06Operations, 15User-greg, 07Wikimedia-Incident: Plan how to improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg)
[17:38:53] <thcipriani>	 robh: I don't think that the ci-master.cfg is correct...looking at how gallium is partitioned and what you're describing
[17:39:12] <robh>	 ok, so basically gallium is my template for parititoning, that makes more sense =]
[17:39:19] <robh>	 cool, i rather elimiante odd partitioning
[17:39:29] <legoktm>	 paladox: looks good, will merge/deploy later
[17:39:40] <thcipriani>	 cool, yeah, I'm unclear where we used ci-master.cfg :\
[17:39:45] <wikibugs>	 06Release-Engineering-Team, 15User-greg: Ping tasks in #wikimedia-incident without recent activity for follow-up near the end of FY1617Q1 - https://phabricator.wikimedia.org/T144973#2615805 (10greg)
[17:39:47] <wikibugs>	 06Release-Engineering-Team, 06Operations, 15User-greg, 07Wikimedia-Incident: Plan how to improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg) 05Open>03Resolved a:03greg With the retitling and documentin...
[17:39:49] <paladox>	 legoktm ok thanks :)
[17:40:12] <robh>	 thcipriani: it seems it was only used for the emergency reinstall of contint1001 awhile back, so glad to eliminate it.
[17:58:49] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2615947 (10RobH)
[18:07:42] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:23:28] <wikibugs>	 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-extensions-Examples, 07Documentation, and 3 others: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#1512435 (10greg) https://www.mediawiki.org/wiki/Selenium...
[18:35:44] <chasemp>	 thcipriani: about?
[18:35:59] <thcipriani>	 chasemp: ish, doing a SWAT deploy now
[18:36:01] <chasemp>	 I think the jobs hashar reverted this morning have some long running components
[18:36:02] <chasemp>	 https://graphite.wikimedia.org/render/?width=2064&height=1100&_salt=1472565940.394&target=nodepool.job.*jessie*.runtime.mean&hideLegend=false&from=-2h&lineMode=connected
[18:36:14] <chasemp>	 k
[18:36:16] <chasemp>	 not urget
[18:36:17] <chasemp>	 just fyi https://graphite.wikimedia.org/render/?width=966&height=489&_salt=1471549160.573&target=cactiStyle(zuul.pipeline.gate-and-submit.label.ci*.wait_time.mean)&hideLegend=false&lineMode=connected&from=-6h
[18:36:30] <chasemp>	 I'm not entirely clear on what jobs were reverted in practical terms
[18:36:41] <chasemp>	 but I think that's them causing delay here atm
[18:42:00] <wikibugs>	 10Continuous-Integration-Infrastructure: Tracking php7 support in ci - https://phabricator.wikimedia.org/T144964#2615613 (10hashar) Please hold from adding jobs on all MediaWiki repos, and specially mediawiki/core. We do not have enough capacity to run them :-)  Though that would be fine in experimental.
[18:42:40] <wikibugs>	 10Continuous-Integration-Infrastructure: Tracking php7 support in ci - https://phabricator.wikimedia.org/T144964#2616200 (10Paladox) Ok, but doint we have two Jessie instances we could use?
[18:42:46] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:47:31] <grrrit-wm>	 (03CR) 10Hashar: "JJB def looks fine. We do not have enough capacity on labs now to have the php70 jobs triggered automatically so "experimental" sounds s" [integration/config] - 10https://gerrit.wikimedia.org/r/309048 (https://phabricator.wikimedia.org/T144961) (owner: 10Paladox)
[18:49:20] <thcipriani>	 chasemp: wowza, yeah, I'm not sure what jobs have all been reverted either. Could the maintenance where we couldn't communicate with openstack account for some of this? Or is the timing wrong?
[18:49:27] * thcipriani looks through backscroll
[18:49:42] <chasemp>	 I thikn this (really recent backlog) is way later
[18:50:30] <thcipriani>	 yeah, looks like
[18:50:38] <grrrit-wm>	 (03CR) 10Hashar: [C: 031] "Sounds sane. One less legacy thingy to deal with!" (031 comment) [integration/jenkins] - 10https://gerrit.wikimedia.org/r/308931 (owner: 10Legoktm)
[18:54:18] <thcipriani>	 hrm, I don't see any changes to config that happened around that time...
[18:54:33] <chasemp>	 thcipriani: no I think it was the reverts from tis morning lying in wait for actual work etc
[18:54:40] <chasemp>	 iiuc the jobs here https://graphite.wikimedia.org/render/?width=2064&height=1100&_salt=1472565940.394&target=nodepool.job.*jessie*.runtime.mean&hideLegend=false&from=-2h&lineMode=connected
[18:54:54] <chasemp>	 that are long running and recent are those
[18:55:03] <chasemp>	 https://phabricator.wikimedia.org/T143938#2614056
[18:55:33] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:58:21] <thcipriani>	 sounds true: more jobs, same pool available, wait times are longer
[18:59:04] <thcipriani>	 plus during SWAT there were a ton of patches I was stuck behind when I tried to recheck one. Waiting on rake-jessie, IIRC
[19:00:03] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops, 13Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2616279 (10hashar) I confirm the server content on contint1001.eqiad.wmnet can be...
[19:00:06] <chasemp>	 I bet that's what it was
[19:00:12] <chasemp>	 I ping you on huge backlog right as you are doing swat
[19:00:19] <chasemp>	 and neither of us immediately connects those dots
[19:00:22] <greg-g>	 heh
[19:00:24] <thcipriani>	 :D
[19:00:50] <chasemp>	 and I bet a lot of tha twork was tox and linting (moved over today)
[19:00:54] <chasemp>	 iiuc
[19:01:41] <grrrit-wm>	 (03PS3) 10Paladox: Add support for php7.0 [integration/config] - 10https://gerrit.wikimedia.org/r/309048 (https://phabricator.wikimedia.org/T144961) 
[19:01:47] <hashar>	 robh: if you are going to reimage contint1001.eqiad.wmnet, we will want to drop the roles in puppet before its moved to the new vlan/reimaged  https://phabricator.wikimedia.org/T140257
[19:02:08] <hashar>	 robh: else puppet might well spawn new Zuul and an empty Jenkins
[19:03:14] <hashar>	 chasemp: I have moved the rake jobs to Nodepool. It is only a few builds
[19:03:52] <chasemp>	 could be just jobs alrady reverted before today that aligned w/ the swat
[19:04:16] <hashar>	 been running with other meetings most of my afternoon. So I havent moved anything else.
[19:05:10] <hashar>	 Tyler mentionned to me the spike from yesterday. I haven't really looked into it
[19:05:13] <chasemp>	 sure I mean even last week or last few days, something not triggered until x happens
[19:05:18] <chasemp>	 where x is something from swat possibly
[19:05:42] <hashar>	 something triggering?
[19:05:54] <shinken-wm>	 PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:07:36] <thcipriani>	 just had a big spike in wait time for jobs. Lots of patches came in to the test queue all at the same time. Triggered big wait times.
[19:08:31] <thcipriani>	 <chasemp>: https://graphite.wikimedia.org/render/?width=966&height=489&_salt=1471549160.573&target=cactiStyle%28zuul.pipeline.gate-and-submit.label.ci*.wait_time.mean%29&hideLegend=false&lineMode=connected&from=-6h
[19:08:56] <chasemp>	 we were talking about what exactly those jobs are etc
[19:10:00] <hashar>	 ah in gate
[19:10:10] <hashar>	 the gate also has a window to limit the jobs it is triggering
[19:11:13] <chasemp>	 does that mean wait-times can be artificially high no matter how idle nodepool is?
[19:11:30] <hashar>	 for the gate-and-submit yeah
[19:11:40] <hashar>	 if there is like 100 patches that get a CR+2 and put in the queue
[19:12:00] <hashar>	 unless a handful of them actually get processed, the others are idling pending for the changes ahead in the queue to be complete
[19:12:45] <chasemp>	 where is the limit defined?
[19:14:47] <hashar>	 doc is http://docs.openstack.org/infra/zuul/zuul.html?highlight=window
[19:14:52] <hashar>	 we have  window-floor: 12
[19:15:01] <hashar>	 defined in integration/config.git /zuul/layout.yaml
[19:15:38] <hashar>	 that is rate limiting system to the number of jobs triggered by the pipeline
[19:15:42] <hashar>	 with a default of 20  
[19:15:47] <chasemp>	 so for long wait-time spikes how do we know which is the cuplrit?
[19:15:50] <chasemp>	 culprit even
[19:15:55] <chasemp>	 is there a metric for jobs blocked by this?
[19:16:00] <hashar>	 and window-floor is to have the minimum of jobs running to be 12
[19:16:04] <hashar>	 not that I know
[19:16:22] <hashar>	 (should I have thought about that one sorry :( )
[19:16:30] <hashar>	 let me check the sources
[19:17:00] <chasemp>	 at the mometn it seems like we are blind afa tuning nodepool if there is a secondary higher up mechanism inflating wait-times randomly
[19:17:21] <chasemp>	 or periodically anyway when it would be the most illustrative 
[19:18:43] <hashar>	 from a quick check to the sources, the "window" is not reported to statsd
[19:19:00] <hashar>	 but it is log.debug  and there are a few messages in gallium.wikimedia.org /var/log/zuul/debug.log
[19:20:02] <hashar>	 2016-09-07 18:18:14 <ChangeQueue gate-and-submit: mediawiki> window size increased to 22>
[19:20:14] <hashar>	 2016-09-07 18:18:15 <ChangeQueue gate-and-submit: mediawiki> window size decreased to 12
[19:20:33] <hashar>	 2016-09-07 18:31:26 <ChangeQueue gate-and-submit: mediawiki> window size increased to 13>
[19:22:34] <hashar>	 2016-09-07 18:18:15,150 INFO zuul.DependentPipelineManager: Reported change <Change 0x7fc033baa5d0 309059,2> status: all-succeeded: True, merged: False
[19:22:35] <hashar>	 2016-09-07 18:18:15,150 DEBUG zuul.DependentPipelineManager: Reported change <Change 0x7fc033baa5d0 309059,2> failed tests or failed to merge
[19:22:42] <hashar>	 that is what triggered the window decrease
[19:22:49] <hashar>	 (all of that from /var/log/zuul/debug.log
[19:23:24] <hashar>	 caused by https://gerrit.wikimedia.org/r/#/c/309059/  force merged
[19:23:58] <chasemp>	 why do we use this windowing behavior?
[19:24:00] <hashar>	 zuul had the tests completed,  then tried to  --submit the change
[19:24:09] <hashar>	 since the change got closed due to the forced merge,  it considered the change failed to merge
[19:24:30] <hashar>	 thus triggered a window decrease to the window-floor value
[19:24:43] <hashar>	 then cancels all the jobs, rebuild the queue and reprocess
[19:27:59] <robh>	 How do I give someone the 'advanced' option to create tasks in phab?
[19:28:05] <wikibugs>	 10Continuous-Integration-Infrastructure: Move integration-puppetmaster off of precise (probably to jessie) - https://phabricator.wikimedia.org/T144951#2615279 (10yuvipanda) we're considering not supporting precise puppetmasters that are self-hosted on labs soon, so yes please :)
[19:28:07] <robh>	 I need to give Pam (new buyer) the ability to create s4 tasks
[19:29:27] <chasemp>	 robh: spaces and advanced as kind of perpindicular to each other so to speak, you want to add this person to the relevant acl I think?
[19:29:30] <hashar>	 chasemp: the window flavor is always enabled
[19:29:49] <chasemp>	 hashar: it says it can be disabled
[19:29:50] <chasemp>	 A value of 0 disables rate limiting on the DependentPipelineManager. Default: 20.
[19:30:00] <hashar>	 oh
[19:30:33] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:30:58] <hashar>	 from the commit that added window-floor https://gerrit.wikimedia.org/r/#/c/199169/
[19:31:09] <hashar>	 that was to ensure that at least 12 jobs can run
[19:31:27] <chasemp>	 but I think window-floor is only relevant is windowing is enabled at all
[19:31:29] <chasemp>	 right?
[19:31:31] <hashar>	 else in case of multiple failures it would go down to 3 which is typically not enough (a mediawiki core job is 10-12 jobs iirc)
[19:31:38] <chasemp>	 if even
[19:36:15] <mobrovac>	 is beta having problems?
[19:37:05] <hashar>	 mobrovac: have you checked https://logstash-beta.wmflabs.org/  >  mediawiki-errors ?
[19:41:20] <wikibugs>	 10Continuous-Integration-Infrastructure: Move integration-puppetmaster off of precise (probably to jessie) - https://phabricator.wikimedia.org/T144951#2616415 (10Legoktm) Okay, I'll try and do this later in the week. Notes from IRC: * create new puppetmaster instance on jessie, with same puppet roles * copy over...
[19:45:52] <shinken-wm>	 RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:56:41] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:00:01] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-ORES, 15User-Ladsgroup: User contribs seems to be empty when ores enabled - https://phabricator.wikimedia.org/T144999#2616468 (10Ladsgroup)
[20:06:41] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T142855#2616509 (10Jdforrester-WMF)
[20:07:46] <matt_flaschen>	 mobrovac, we are investigating the missing text in #wikimedia-dev.
[20:08:00] <mobrovac>	 kk thnx matt_flaschen
[20:09:24] <matt_flaschen>	 hashar, I am trying to run sync-dir manually on Beta Cluster to test a revert, but it just hangs.
[20:09:26] <matt_flaschen>	 Any thoughts?
[20:09:27] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-ORES, 15User-Ladsgroup: User contribs seems to be empty when ores enabled - https://phabricator.wikimedia.org/T144999#2616541 (10Ladsgroup) I guess that's old data in wikidata and English Wikipedia in beta cluster.
[20:10:58] <hashar>	 matt_flaschen: no idea
[20:11:19] <hashar>	 matt_flaschen: remember that Jenkins update all repos every 10 minutes and then trigger a scap.  That will override your hack
[20:11:42] <hashar>	 so you will want to disable the job https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
[20:11:52] <hashar>	 (which does the update / reset --hard
[20:11:53] <matt_flaschen>	 hashar, well, if scap isn't working it's a moot point.
[20:12:00] <matt_flaschen>	 hashar, but you're right, thank you.
[20:12:04] <hashar>	 then you can trigger the scap either manually or via https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/
[20:12:26] <matt_flaschen>	 hashar, if I trigger it in the web UI, will it change which code is checked out?
[20:13:03] <hashar>	 it just runs "scap sync"
[20:13:11] <matt_flaschen>	 hashar, okay, cool, thank you.
[20:13:14] <hashar>	 you will get bunch of logs via the console https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/119020/console :D
[20:14:06] <wmf-insecte>	 Project beta-code-update-eqiad build #120474: 15ABORTED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/120474/
[20:14:52] <matt_flaschen>	 !Temporarily disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/ to test live revert of aa0f6ea
[20:14:55] <matt_flaschen>	 !log Temporarily disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/ to test live revert of aa0f6ea
[20:15:01] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:21:39] <matt_flaschen>	 hashar, it's hanging on there too, maybe the same place: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/119021/console
[20:22:23] <matt_flaschen>	 Never mind, just slow.
[20:26:04] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium): Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2616610 (10hashar)
[20:27:20] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 13Patch-For-Review: Migrate CI services from gallium to contint1001 - https://phabricator.wikimedia.org/T137358#2616631 (10hashar)
[20:27:23] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium): Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2364905 (10hashar) 05stalled>03Open contint1001 has been moved to the production public network with fqdn c...
[20:30:43] <hashar>	 !log Updated security group for contintcloud and integration labs project.  Allow ssh port 22 from contint1001.wikimedia.org (matching rules for gallium). T137323
[20:30:49] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:31:37] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:35:30] <hashar>	 !log Updated security group for deployment-prep labs project.  Allow ssh port 22 from contint1001.wikimedia.org (matching rules for gallium). T137323
[20:35:36] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:44:15] <matt_flaschen>	 !log Re-enabled beta-code-update-eqiad .
[20:44:20] <matt_flaschen>	 Thanks, hashar
[20:44:22] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:46:29] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:50:54] <Krinkle>	 greg-g: I've claimed https://phabricator.wikimedia.org/T141985
[20:53:01] <hashar>	 robh: noticed you got contint1001 a new vlan/ip and even got it reimaged!  impressive :)   (ping thcipriani )
[20:53:39] <hashar>	 thcipriani: new hostname is contint1001.wikimedia.org  with IP 208.80.154.17 . I have sent a few patches to puppet to adjust the conf
[20:53:48] <thcipriani>	 nice
[20:54:13] <robh>	 hashar: yep, signed hte puppet keys but not salt yet
[20:54:24] <robh>	 got distracted with other sutff, will do so now and its ready for you guys to take over
[20:54:43] <paladox>	 :)
[20:54:53] <hashar>	 robh: do not enabled puppet on it though
[20:55:06] <robh>	 oh, i already had and ran initial run
[20:55:10] <robh>	 it has nothing but base
[20:55:14] <hashar>	 oh
[20:55:16] <robh>	 as it has no site.pp entries, does it need to start over?
[20:55:19] <hashar>	 ah yeah site.pp do not match it
[20:55:23] <hashar>	 will update site.pp
[20:55:35] <robh>	 yeah, the old site.pp entry was for eqiad.wmnet
[20:55:46] <robh>	 so i assumed it was ok to run puppet and sign so you can implement as needed via site.pp
[20:56:00] <hashar>	 yeah definitely
[20:56:03] <hashar>	 you are way more careful than me!
[20:56:21] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops, 13Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2616802 (10RobH)
[20:57:12] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10hardware-requests, 10netops, 13Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2458291 (10RobH) a:05RobH>03hashar contint1001.wikimedia.org is online with p...
[20:57:18] <robh>	 its all yours now =]
[21:01:50] <hashar>	 robh: if you feel adventurous, the updated site.pp is https://gerrit.wikimedia.org/r/#/c/309069/
[21:02:07] <hashar>	 else will loop it with Europe ops tomorrow
[21:08:58] <robh>	 hashar: yeah rebasing and will merge and run no problem
[21:09:09] <hashar>	 robh: awesome. This way site.pp is clean
[21:09:48] <greg-g>	 Krinkle: ty sir
[21:12:49] <hashar>	 so  contint1001.timer.start()
[21:13:03] <paladox>	 :)
[21:14:25] <robh>	 waiting on auto verification
[21:14:27] <robh>	 =P
[21:14:42] <robh>	 CI is slow to approve the patch to start the process of making a new ci server.
[21:14:55] <hashar>	 yeah
[21:15:08] <hashar>	 https://integration.wikimedia.org/zuul/  there is a bunch of -trusty and -jessie jobs waiting for a node to be available
[21:15:20] <paladox>	 I guess we should probaly detangle all mw jobs.
[21:15:42] <hashar>	 Demand from gearman: ci-jessie-wikimedia: 6   ci-trusty-wikimedia: 4  (jobs pending)
[21:16:03] <shinken-wm>	 PROBLEM - Puppet run on deployment-conf03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:16:29] <hashar>	 it does not garbage collect fast enough
[21:18:20] <robh>	 its merged now
[21:19:37] <hashar>	 thcipriani: chasemp: the same issue has yesterday reproduced just now!
[21:20:01] <hashar>	 basically got a bunch of jobs running with pool being fully occupied
[21:20:20] <wikibugs>	 06Release-Engineering-Team, 10DBA, 10MediaWiki-Maintenance-scripts, 06Operations, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2616900 (10greg) I'm also a big +1 on having those long running maint sc...
[21:20:34] <hashar>	 with up to 7 request to delete servers (which free up the pool and allow to spawn new instance)
[21:20:56] <hashar>	 and at the given rate, that takes a while to reclaim all those slots
[21:21:18] <hashar>	 then eventually once the deleted nodes got freed, a surge of new ones are created
[21:21:28] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:22:46] <shinken-wm>	 PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[21:23:00] <hashar>	 so whenever the pool has consumed all nodes, it takes a good minute and a half to delete all the nodes
[21:23:12] <hashar>	 on labnodepool, a good view would be:  grep 'wmflabs-eqiad running task' /var/log/nodepool/debug.log
[21:24:54] <paladox>	 hashar i guess openstack never produce that since they have 1000+ instances, plus there list in zuul is long
[21:25:21] <paladox>	 so they doint really mind the bad performance, so maybe it is a bug openstack havent seen yet?
[21:26:13] <hashar>	 paladox: all queries made to the cloud provider are serialized in a queue (eg processed one by one) every  X seconds
[21:26:36] <paladox>	 yep
[21:26:46] <shinken-wm>	 RECOVERY - Host deployment-parsoid05 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms
[21:28:02] <thcipriani>	 that grep does give an interesting view of the log
[21:28:30] <hashar>	 the  queue: X   
[21:28:48] <hashar>	 is the number of any tasks in the queue (I thought it was for a given task / with shared queus)
[21:29:00] <thcipriani>	 yeah, listservertask, createservertask, deleteservertask
[21:29:02] <hashar>	 so  queue:  8   --> 8 tasks waiting,   can be anything
[21:29:17] <hashar>	 and I checked the code, the queue size is not sent to statsd
[21:31:08] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2616944 (10hashar) contint1001 now has the default set of rules from contint::firewall....
[21:31:45] <shinken-wm>	 PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120)
[21:32:00] <hashar>	 thcipriani: the ferm rules are now enabled on contint1001 . So gotta connect to it via the bastions  
[21:32:22] <paladox>	 :)
[21:32:30] <hashar>	 (if only ferm could set description to the iptables rules...)
[21:32:53] <paladox>	 (Question, will you be doing zuul and jenkins tomarror on contint101?)
[21:34:01] <thcipriani>	 a very long oneliner that could probably all be done with awk: grep 'wmflabs-eqiad running task' /var/log/nodepool/debug.log | cut -d ':' -f5 | tr -d ')' | awk '{sum+=$1} END { print "Average = ",sum/NR}'
[21:34:23] <thcipriani>	 seem to average around 3 tasks in the queue at any time.
[21:35:03] <greg-g>	 beautiful
[21:36:32] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2616971 (10hashar)
[21:37:50] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2364905 (10hashar)
[21:40:16] <hashar>	 thcipriani: feel free to puppetize the one-liner!    nodepool-queue.sh !
[21:42:01] <hashar>	 checked a few network flows https://phabricator.wikimedia.org/T137323
[21:42:07] <greg-g>	 +1, I love me a good one-line .sh (I have a few)
[21:42:21] <greg-g>	 (well, two lines, #! first)
[21:43:00] <hashar>	 I puppetized  the sudo command line to become the "nodepool" user
[21:43:03] <hashar>	 become-nodepool
[21:43:19] <hashar>	 inspired by toollabs, turns out to be a huge time saver
[21:43:29] <hashar>	 (I checked a few network flows https://phabricator.wikimedia.org/T137323 )
[21:43:37] <hashar>	 iridium I have no idea if I have access to it
[21:43:53] <hashar>	 some others are pending the puppet classes for jenkins and zuul which add the ferm rules
[21:43:58] <paladox>	 twentyafterfour has access to ^^ 
[21:44:30] <hashar>	 anyway sleep time
[21:44:42] <greg-g>	 hashar: you don't: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Access_list
[21:44:45] <greg-g>	 ;)
[21:44:52] <hashar>	 will check with ops tomorrow to enable backup and maybe load Jenkins data to contint1001
[21:45:06] <hashar>	 greg-g: nice table!
[21:45:14] <greg-g>	 hashar: copy/pasted from you!
[21:45:17] <greg-g>	 g'night!
[21:45:22] <hashar>	 o really
[21:45:30] <hashar>	 hashar: nice table!
[21:45:30] <hashar>	 hashar: thank you
[21:45:34] <greg-g>	 :)
[21:47:16] <hashar>	 sleep well folks!
[21:47:23] <Platonides>	 you too, hashar 
[21:47:31] <hashar>	 oh Platonides !!!
[21:48:03] * Platonides waves :)
[21:48:06] <hashar>	 Platonides: are you back / active again??? 
[21:49:15] <hashar>	 Platonides: when ever you are there, poke me so we can chat a bit :D  For now I really need sleep time sorry
[21:50:05] <Platonides>	 np
[21:54:05] <paladox>	 hashar we could probaly get your zuul change merged tomrror?
[21:54:09] <paladox>	 tommorror?
[21:54:28] <paladox>	 the one that cleans it up and supports heira
[21:55:49] <hashar>	 maybe
[21:55:52] <paladox>	 ok
[21:55:53] <hashar>	 gotta discuss them with ops
[21:55:56] <paladox>	 ok
[21:56:00] <hashar>	 cya
[21:56:00] <twentyafterfour>	 can I help?
[21:56:14] * twentyafterfour got pinged
[21:56:56] <paladox>	 twentyafterfour sorru, i did it when hashar was going through https://phabricator.wikimedia.org/T137323
[21:57:08] <paladox>	 and said he coulden test iridium since he didnt have access
[21:58:53] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2364905 (10mmodell) ``` twentyafterfour@iridium:~$ telnet 208.80.154.17 4730 Trying 208.8...
[21:58:58] <twentyafterfour>	 connection refused
[21:59:29] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2617146 (10mmodell)
[22:02:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[22:02:52] <wikibugs>	 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium) - https://phabricator.wikimedia.org/T137323#2617183 (10hashar) Yup the service is not enabled (that is the Gearman server embedded in...
[22:13:14] <legoktm>	 twentyafterfour: uhhh, your update submodules commit just got linked to a bunch of irrelevant tasks
[22:14:02] <legoktm>	 [15:11:23] <wikibugs> MediaWiki-General-or-Unknown: Change user namespace from usuário to wikipedista in portuguese wikipedia - https://phabricator.wikimedia.org/T11587#2617200 (mmodell) declined>Resolved
[22:19:46] <legoktm>	 twentyafterfour: ????? stop
[22:20:42] <legoktm>	 this happens every time...
[22:21:45] <paladox>	 Oh phabricators repo needs switching to notifiy off
[22:23:03] <legoktm>	 I assume he'll clean up his mess, but it would be nice to not create it in the first place, given how many times it's happened in the past.
[22:24:45] <twentyafterfour>	 legoktm: ?
[22:25:24] <paladox>	 twentyafterfour the upstream branch you pulled in linked to other tasks and closed them with status that shoulden be
[22:25:29] <paladox>	 see also -devtools
[22:25:50] <twentyafterfour>	 autoclose should be disabled
[22:27:07] <paladox>	 Yep, i think it may be phabricator deployment
[22:27:12] <paladox>	 twentyafterfour ^^
[22:27:24] <paladox>	 https://phabricator.wikimedia.org/diffusion/PHDEP/manage/actions/
[22:33:43] <twentyafterfour>	 freakin gerrit decided to quote the commit messages from submodule changes that got merged
[22:33:56] * twentyafterfour did not make this commit https://phabricator.wikimedia.org/rPHDEP81dc55d04fec633ae3958e9849a43429468208aa
[22:36:00] <paladox>	 LOL
[23:06:55] <shinken-wm>	 PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[23:15:42] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: doc.wikimedia.org should be running PHP 5.5+, not 5.3 -> demos etc. don't work - https://phabricator.wikimedia.org/T127504#2045488 (10Krinkle) Moving integration.wikimedia.org is harder due to the Zuul and Jenkins proxies.  {F4099754 size=ful...
[23:37:14] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T142855#2617850 (10Ladsgroup)
[23:46:54] <shinken-wm>	 RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:55:21] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T142855#2617920 (10Jdforrester-WMF)