[00:46:18] <wmf-insecte>	 Yippee, build fixed!
[00:46:19] <wmf-insecte>	 Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #3018: FIXED in 17 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/3018/
[02:38:48] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[03:22:21] <wmf-insecte>	 Project beta-scap-eqiad build #79555: FAILURE in 3 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/79555/
[03:26:24] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<11.11%)
[03:26:34] <shinken-wm>	 PROBLEM - Puppet failure on deployment-eventlogging03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[03:26:55] <wmf-insecte>	 Yippee, build fixed!
[03:26:56] <wmf-insecte>	 Project beta-scap-eqiad build #79556: FIXED in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/79556/
[03:28:47] <bd808>	 !log Freed 800M on deployment-bastion by running /home/bd808/cleanup-var-crap.sh
[03:28:50] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:41:27] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[03:51:39] <Krenair>	 krenair@deployment-poolcounter01:~$ df -h
[03:51:39] <Krenair>	 df: `/sys/kernel/debug': Function not implemented
[03:51:45] <Krenair>	 Filesystem                                                 Size  Used Avail Use% Mounted on
[03:51:46] <Krenair>	 what
[03:53:14] <bd808>	 Krenair: sounds like -- https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/1465180
[04:01:31] <shinken-wm>	 RECOVERY - Puppet failure on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:21:39] <shinken-wm>	 PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[05:24:26] <wmf-insecte>	 Project beta-scap-eqiad build #79568: FAILURE in 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/79568/
[05:30:58] <wmf-insecte>	 Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #611: FAILURE in 28 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/611/
[05:35:42] <shinken-wm>	 PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:36:17] <wmf-insecte>	 Yippee, build fixed!
[05:36:18] <wmf-insecte>	 Project beta-scap-eqiad build #79569: FIXED in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/79569/
[05:36:18] <shinken-wm>	 PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[05:36:19] <shinken-wm>	 PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:36:59] <shinken-wm>	 PROBLEM - Puppet failure on integration-puppetmaster is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:37:11] <shinken-wm>	 PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[05:37:11] <shinken-wm>	 PROBLEM - Puppet failure on deployment-sca02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[05:37:11] <shinken-wm>	 PROBLEM - Puppet failure on deployment-tin is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[05:37:11] <shinken-wm>	 PROBLEM - Puppet failure on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:37:53] <shinken-wm>	 PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[05:37:53] <shinken-wm>	 PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[05:38:03] <shinken-wm>	 PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[05:39:36] <shinken-wm>	 PROBLEM - Puppet failure on integration-slave-trusty-1023 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:40:43] <shinken-wm>	 PROBLEM - Puppet failure on integration-publisher is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [0.0]
[05:40:55] <shinken-wm>	 PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[05:40:57] <shinken-wm>	 PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0]
[05:41:09] <shinken-wm>	 PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:42:15] <wmf-insecte>	 Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #262: FAILURE in 26 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/262/
[06:02:11] <shinken-wm>	 RECOVERY - Puppet failure on integration-slave-trusty-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:05:46] <shinken-wm>	 RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[06:06:11] <shinken-wm>	 RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:06:25] <shinken-wm>	 RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:06:57] <shinken-wm>	 RECOVERY - Puppet failure on integration-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[06:07:05] <shinken-wm>	 RECOVERY - Puppet failure on deployment-kafka02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:07:05] <shinken-wm>	 RECOVERY - Puppet failure on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:07:05] <shinken-wm>	 RECOVERY - Puppet failure on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[06:07:59] <shinken-wm>	 RECOVERY - Puppet failure on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:07:59] <shinken-wm>	 RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:09:35] <shinken-wm>	 RECOVERY - Puppet failure on integration-slave-trusty-1023 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:10:45] <shinken-wm>	 RECOVERY - Puppet failure on integration-publisher is OK: OK: Less than 1.00% above the threshold [0.0]
[06:10:53] <shinken-wm>	 RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:10:55] <shinken-wm>	 RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:11:17] <shinken-wm>	 RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:13:02] <shinken-wm>	 RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[06:40:51] <wikibugs>	 Deployment-Systems, Scap3: scap3 dsh_target should check the scap directory of a repo as well as `/etc/dsh/groups` - https://phabricator.wikimedia.org/T119200#1824365 (mmodell)
[06:52:56] <wikibugs>	 Deployment-Systems, operations: l10nupdate user uid mismatch between tin and mira - https://phabricator.wikimedia.org/T119165#1824367 (mmodell) @bd808: so, reading the manpage for rsync, it seems that you have to specify --numeric-ids for this to even matter?  Otherwise rsync should be smart enough to rem...
[07:52:21] <wikibugs>	 Deployment-Systems, Scap3, Documentation, Patch-For-Review: Add documentation of the new scap3 features to the scap docs - https://phabricator.wikimedia.org/T112554#1824417 (mmodell) Open>Resolved https://doc.wikimedia.org/mw-tools-scap/scap3/index.html
[08:46:02] <wikibugs>	 MediaWiki-Codesniffer, Outreachy-Round-11: Outreachy proposal for : Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T115585#1824501 (01tonythomas) Open>declined Thank you for your proposal. Sadly, the Outreachy administration team made it strict that candidates with...
[08:46:04] <wikibugs>	 MediaWiki-Codesniffer, Possible-Tech-Projects, Outreachy-Round-11: Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T89682#1824505 (01tonythomas)
[09:07:22] <grrrit-wm>	 (PS1) Hashar: Add rake-jessie experimental to MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/254816
[09:07:32] <grrrit-wm>	 (CR) Hashar: [C: 2] Add rake-jessie experimental to MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/254816 (owner: Hashar)
[09:08:49] <grrrit-wm>	 (Merged) jenkins-bot: Add rake-jessie experimental to MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/254816 (owner: Hashar)
[09:22:22] <wikibugs>	 Browser-Tests, Continuous-Integration-Config, MediaWiki-extensions-ContentTranslation, Patch-For-Review, and 3 others: Add Rakefile to repositories with Ruby code - https://phabricator.wikimedia.org/T117993#1824528 (Nemo_bis)
[09:24:13] <grrrit-wm>	 (CR) Hashar: "To skip the rake-jessie branch on a specific repo/branch we should now be able to do:" [integration/config] - https://gerrit.wikimedia.org/r/253343 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[09:38:32] <wmf-insecte>	 Yippee, build fixed!
[09:38:32] <wmf-insecte>	 Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #682: FIXED in 1 min 30 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/682/
[10:17:43] <hashar>	 !log added Jcrespo (Jaime) to the beta cluster project as an admin + sudo rights
[10:17:47] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:41:41] <wikibugs>	 Beta-Cluster-Infrastructure, Database, WorkType-NewFunctionality: Send deployment-db1 deployment-db2 syslog to beta cluster logstash - https://phabricator.wikimedia.org/T119370#1824659 (hashar) NEW
[10:46:08] <wikibugs>	 Beta-Cluster-Infrastructure, Database, WorkType-NewFunctionality: Send deployment-db1 deployment-db2 syslog to beta cluster logstash - https://phabricator.wikimedia.org/T119370#1824686 (hashar) The configuration is around: ``` deployment-db1:~$ cat /etc/rsyslog.d/30-remote-syslog.conf  *.info;mail.none...
[10:49:11] <wikibugs>	 Beta-Cluster-Infrastructure, Database, WorkType-NewFunctionality: Send deployment-db1 deployment-db2 syslog to beta cluster logstash - https://phabricator.wikimedia.org/T119370#1824692 (hashar) p:Normal>Low wfLogDBError is already sent to logstash beta, which comes with stack traces. So syslog is...
[10:56:50] <wikibugs>	 Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-Database, Database, WorkType-NewFunctionality: Enable MariaDB/MySQL strict mode on CI slaves - https://phabricator.wikimedia.org/T119371#1824701 (hashar) NEW
[11:24:10] <wikibugs>	 Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1824748 (hashar) Clarified with @jcrespo.  We can just enable `performance_schema` just like for production (T99485).  The information will then be available in the be...
[11:27:15] <grrrit-wm>	 (CR) Zfilipin: "https://gerrit.wikimedia.org/r/#/c/252686/ is merged and rake-jessie job runs fine for operations/puppet:" [integration/config] - https://gerrit.wikimedia.org/r/252689 (https://phabricator.wikimedia.org/T110019) (owner: Zfilipin)
[13:58:56] <grrrit-wm>	 (PS1) Hashar: dib: make git mirrors belong to jenkins:jenkins [integration/config] - https://gerrit.wikimedia.org/r/254851
[13:59:15] <hashar>	 zeljkof: I am bumping the nodepool images to have /srv/git  to belong to jenkins user :-D
[13:59:21] <hashar>	 zeljkof: something I noticed this morning
[13:59:32] <zeljkof>	 hashar: +1 :)
[14:00:01] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] dib: make git mirrors belong to jenkins:jenkins [integration/config] - https://gerrit.wikimedia.org/r/254851 (owner: Hashar)
[14:01:36] <hashar>	 I should inject mediawiki/core as well
[14:02:08] <wikibugs>	 MediaWiki-Releasing, Developer-Relations, Wikimedia-Blog-Content, DevRel-November-2015, MW-1.26-release: Write blog post announcing MW 1.26 - https://phabricator.wikimedia.org/T112842#1824970 (Qgil) Thank you @greg! Quick question: are you planning to release on 2015-11-25 -- next Wednesday? In...
[14:02:23] <grrrit-wm>	 (PS2) Hashar: dib: make git mirrors belong to jenkins:jenkins [integration/config] - https://gerrit.wikimedia.org/r/254851
[14:07:07] <grrrit-wm>	 (PS1) Hashar: dib: mirror mediawiki/core inside images [integration/config] - https://gerrit.wikimedia.org/r/254852
[14:07:09] <grrrit-wm>	 (PS1) Hashar: dib: tweak 01-mirror-gerrit-repos feedback [integration/config] - https://gerrit.wikimedia.org/r/254853
[14:07:15] <hashar>	 Creating host cache /srv/dib/cache/git-repos/mediawiki/core.git
[14:07:15] <hashar>	 mkdir: created directory '/srv/dib/cache/git-repos/mediawiki'
[14:07:16] <hashar>	 Cloning into bare repository '/srv/dib/cache/git-repos/mediawiki/core.git'...
[14:08:54] <hashar>	 hacking while listening to french music https://www.youtube.com/watch?v=NwVA5zYfNWw
[14:17:25] <hashar>	 in live, they are literally surfing on top of the concert crowd https://youtu.be/B81iYZUov7Q?t=2m18s :D
[14:21:26] <grrrit-wm>	 (CR) Hashar: [C: 2] dib: mirror mediawiki/core inside images [integration/config] - https://gerrit.wikimedia.org/r/254852 (owner: Hashar)
[14:21:37] <grrrit-wm>	 (CR) Hashar: [C: 2] dib: tweak 01-mirror-gerrit-repos feedback [integration/config] - https://gerrit.wikimedia.org/r/254853 (owner: Hashar)
[14:21:45] <grrrit-wm>	 (CR) Hashar: [C: 2] dib: make git mirrors belong to jenkins:jenkins [integration/config] - https://gerrit.wikimedia.org/r/254851 (owner: Hashar)
[14:23:10] <hashar>	 !log pushing new disk image to labs for Nodepool
[14:23:13] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:23:20] <grrrit-wm>	 (Merged) jenkins-bot: dib: make git mirrors belong to jenkins:jenkins [integration/config] - https://gerrit.wikimedia.org/r/254851 (owner: Hashar)
[14:23:22] <grrrit-wm>	 (Merged) jenkins-bot: dib: mirror mediawiki/core inside images [integration/config] - https://gerrit.wikimedia.org/r/254852 (owner: Hashar)
[14:23:31] <wikibugs>	 Release-Engineering-Team, Testing-Initiative-2015, Browser-Tests-Infrastructure, MediaWiki-extensions-Examples, Documentation: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#1825017 (zeljkofilipin) a:dduvall>zel...
[14:23:48] <grrrit-wm>	 (Merged) jenkins-bot: dib: tweak 01-mirror-gerrit-repos feedback [integration/config] - https://gerrit.wikimedia.org/r/254853 (owner: Hashar)
[14:24:38] <hashar>	 !log refreshing nodepool snapshot
[14:24:41] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:27:49] <hashar>	 !log Image ci-jessie-wikimedia-1448288646 in wmflabs-eqiad is ready
[14:27:53] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:31:46] <hashar>	 !log deleted obsolete nodepool instances so nodepool replenishes the pool with the new image
[14:31:49] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:37:13] <grrrit-wm>	 (PS3) Zfilipin: Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/253343 (https://phabricator.wikimedia.org/T114860)
[14:38:05] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/253343 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[14:39:27] <hashar>	 bah
[14:42:44] <hashar>	 !log regenerating the nodepool images and snapshot. 51-git-mirror-ownership did not run because of missing executable bit 
[14:42:47] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
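The !log entry above attributes the skipped 51-git-mirror-ownership step to a missing executable bit. A minimal sketch of a pre-flight check for that class of problem, assuming the hook scripts are plain files in a single directory; the function name and the directory layout are hypothetical and not taken from integration/config:

```python
import os
import stat


def non_executable_scripts(directory):
    """Return the sorted names of regular files in `directory`
    that have no executable bit set for user, group, or other."""
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        if not os.stat(path).st_mode & any_exec:
            missing.append(name)
    return missing
```

Running such a check before building an image would flag a script like 51-git-mirror-ownership early instead of after a snapshot refresh.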
[15:02:16] <hashar>	 !log updating rake-jessie job to use cached repos under /srv/git (for nodepool)
[15:02:19] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:02:25] <hashar>	 zeljkof: ^^^
[15:02:26] <hashar>	 some mess
[15:02:52] <hashar>	 I am updating rake-jessie to clone the repo using /srv/git/$ZUUL_PROJECT.git as a reference
[15:02:55] <hashar>	 might speed it up
[15:04:20] <hashar>	 doesn't seem to show up https://integration.wikimedia.org/ci/job/rake-jessie/1252/consoleFull
[15:04:21] <hashar>	 :(
[15:04:24] <hashar>	 stupid git
[15:09:44] <hashar>	 time to migrate to zuul-cloner!
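The idea hashar describes above, cloning with /srv/git/$ZUUL_PROJECT.git as a reference, maps onto git's `--reference` option, which lets a clone borrow objects from a local repository instead of fetching them all over the network. A minimal sketch that only builds the command line, assuming a hypothetical `remote_base` URL and destination; the actual rake-jessie job definition is not shown in this log:

```python
import shlex


def reference_clone_command(project, remote_base, mirror_root="/srv/git", dest="."):
    """Build a `git clone` command that borrows objects from a local mirror.

    `project` plays the role of $ZUUL_PROJECT; `remote_base` and `dest`
    are illustrative assumptions, not values from the log.
    """
    mirror = f"{mirror_root}/{project}.git"
    return ["git", "clone", "--reference", mirror, f"{remote_base}/{project}", dest]


if __name__ == "__main__":
    cmd = reference_clone_command("mediawiki/core", "https://gerrit.example.org/r")
    print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting problems if a project name ever contains unusual characters.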
[15:14:04] <shinken-wm>	 PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:14:04] <shinken-wm>	 PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:14:09] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:17:17] <grrrit-wm>	 (PS1) Hashar: dib: add zuul to nodepool instances [integration/config] - https://gerrit.wikimedia.org/r/254871 (https://phabricator.wikimedia.org/T117223)
[15:17:31] <grrrit-wm>	 (CR) Hashar: [C: 2] dib: add zuul to nodepool instances [integration/config] - https://gerrit.wikimedia.org/r/254871 (https://phabricator.wikimedia.org/T117223) (owner: Hashar)
[15:18:33] <grrrit-wm>	 (Merged) jenkins-bot: dib: add zuul to nodepool instances [integration/config] - https://gerrit.wikimedia.org/r/254871 (https://phabricator.wikimedia.org/T117223) (owner: Hashar)
[15:18:35] <shinken-wm>	 RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30650 bytes in 0.642 second response time
[15:18:35] <shinken-wm>	 RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39322 bytes in 0.612 second response time
[15:18:59] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 38981 bytes in 0.493 second response time
[15:20:34] <grrrit-wm>	 (CR) Hashar: "Refreshing snapshot with:" [integration/config] - https://gerrit.wikimedia.org/r/254871 (https://phabricator.wikimedia.org/T117223) (owner: Hashar)
[15:25:21] <hashar>	 Image ci-jessie-wikimedia-1448292050 in wmflabs-eqiad is ready
[15:25:23] <hashar>	 !log Image ci-jessie-wikimedia-1448292050 in wmflabs-eqiad is ready
[15:25:26] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:49:35] <zeljkof>	 hashar: sorry, family emergency, have to go
[16:35:01] <wikibugs>	 Deployment-Systems, operations: l10nupdate user uid mismatch between tin and mira - https://phabricator.wikimedia.org/T119165#1825437 (bd808) I can assert that we are currently seeing uid preservation when rsyncing from tin to mira. Checking the ownership of `/srv/mediawiki-staging/php-1.27.0-wmf.7/cache/...
[17:13:22] <hashar>	 !log deleting old Nodepool snapshots. Current one is  ci-jessie-wikimedia-1448296278
[17:13:25] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[17:28:58] <shinken-wm>	 PROBLEM - Puppet failure on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:46:09] <wikibugs>	 Deployment-Systems, operations: l10nupdate user uid mismatch between tin and mira - https://phabricator.wikimedia.org/T119165#1825834 (mmodell) @bd808: So it sounds like having consistent UIDs is the better / easier solution, based on the complexity of getting name mapping to work?
[17:47:01] <wikibugs>	 Deployment-Systems, Scap3: Need a way to see config diffs in Scap - https://phabricator.wikimedia.org/T118206#1825842 (mmodell)
[17:51:10] <wikibugs>	 Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n, Wikimedia-log-errors: l10n-update not updating Vector and extensions - https://phabricator.wikimedia.org/T103879#1825877 (Elitre) >>! In T103879#1663136, @Nemo_bis wrote: > @supernino reports about 5ccf6668d4e0c17c not being sync'ed...
[17:58:20] <ostriches>	 I'm thinking for the ERROR/WARNING graph we should just use linear scaling instead of log.
[17:58:35] <ostriches>	 Since we actually care about the absolute numbers for these 2 metrics and not just the trend.
[17:58:59] <ostriches>	 That has it looking like https://grafana.wikimedia.org/dashboard/db/production-logging?panelId=14&fullscreen
[17:59:26] <ostriches>	 Nice thing about linear is if one metric goes kaboom, it's really really freaking obvious.
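ostriches's point about linear versus log scaling can be put in numbers. A small sketch with made-up values (a baseline of 100 errors and a spike to 10000, neither taken from the dashboard) comparing how tall the spike looks relative to the baseline on each axis:

```python
import math


def spike_ratio(baseline, spike):
    """Return (linear_ratio, log_ratio): how many times taller the spike
    is drawn than the baseline on a linear axis and on a log10 axis."""
    linear_ratio = spike / baseline
    log_ratio = math.log10(spike) / math.log10(baseline)
    return linear_ratio, log_ratio


if __name__ == "__main__":
    linear, log = spike_ratio(100, 10_000)
    print(f"linear: {linear:.0f}x taller, log10: {log:.0f}x taller")
```

With these hypothetical numbers the spike is drawn 100 times taller on a linear axis but only twice as tall on a log axis, which is why a metric going "kaboom" stands out so much more in the linear view.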
[18:02:28] <wikibugs>	 Continuous-Integration-Config, MW-1.26-release, Patch-For-Review: MediaWiki 1.26 bundled repo should be state of the art - https://phabricator.wikimedia.org/T115392#1723180 (hashar) Refreshed table. ConfirmEdit got composer/npm added.
[18:02:33] <shinken-wm>	 RECOVERY - Puppet failure on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:06:33] <legoktm>	 https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/21216/console uhhh what?
[18:37:23] <wikibugs>	 Continuous-Integration-Config, MW-1.26-release, Patch-For-Review: MediaWiki 1.26 bundled repo should be state of the art - https://phabricator.wikimedia.org/T115392#1826091 (demon) I don't think we'll get composer.json created & merged w/ tests and backported to REL1_26 by Wednesday. Suggest just tryin...
[19:05:26] <grrrit-wm>	 (PS1) Niedzielski: Move long running Android tests to periodic job [integration/config] - https://gerrit.wikimedia.org/r/254905 (https://phabricator.wikimedia.org/T118098)
[19:09:21] <grrrit-wm>	 (CR) Niedzielski: "@hashar, hello! Two questions:" [integration/config] - https://gerrit.wikimedia.org/r/254905 (https://phabricator.wikimedia.org/T118098) (owner: Niedzielski)
[19:30:13] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:35:05] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39005 bytes in 0.614 second response time
[19:47:19] <ostriches>	 twentyafterfour: git-lfs only speaks http(s). I think that makes it a non-contender.
[19:48:41] <chasemp>	 ostriches: what are you guys looking for? (just curious)
[19:49:06] <ostriches>	 Replacement for git-fat support for large binaries (mostly jars)
[19:49:27] <ostriches>	 I'll probably go with git-annex since it supports rsync already.
[19:49:39] <ostriches>	 And it's already got a debian package :)
[19:51:28] <chasemp>	 cool
[19:53:57] <twentyafterfour>	 ostriches: https isn't necessarily a bad thing... but I'm also not against git-annex
[20:01:01] <ostriches>	 twentyafterfour: The problem with git-lfs too is it requires a server implementation. The only canonical one is GitHub's, which is closed source. There's a couple others...none in Python
[20:01:46] <ostriches>	 (The repos on tin from which you fetch would have to be special)
[20:01:55] <twentyafterfour>	 yeah it doesn't really seem too mature yet
[20:11:58] <grrrit-wm>	 (PS1) Hashar: [Cite] add composer tests [integration/config] - https://gerrit.wikimedia.org/r/254910
[20:24:32] <hashar>	 ostriches: I started a bunch of composer test entry points for REL1_26 bundled stuff
[20:24:34] <hashar>	 else Jenkins will fail
[20:24:49] <hashar>	 we already trigger composer test on most of those repositories master branches apparently
[20:35:04] <ostriches>	 hashar: Sounds good :)
[20:38:02] <hashar>	 ostriches: should I just bump some REL1_26 branches ?
[20:38:07] <hashar>	 not sure whether I should do a merge commit
[20:38:10] <hashar>	 or force push
[20:38:16] <hashar>	 an example is Interwiki
[20:38:45] <hashar>	 it received l10n spam and basic tests/formatting changes
[20:39:27] <hashar>	 lame merge example: https://gerrit.wikimedia.org/r/254925 
[20:45:31] <hashar>	 ostriches: any clue ? :D
[20:45:50] <ostriches>	 Why not pull --rebase?
[20:46:16] <ostriches>	 Oh dur.
[20:46:17] <ostriches>	 Meh
[20:46:28] <hashar>	 I am merging master into REL1_26
[20:46:39] <hashar>	 but I can well just push an update of REL1_26
[20:46:52] <ostriches>	 I hate merge commits in gerrit.
[20:46:58] <hashar>	 who cares about Gerrit :D
[20:47:16] <hashar>	 the question is more: should I silently update the REL1_26
[20:47:28] <hashar>	 or make it clear it has been manually updated since we have cut the branch by crafting a merge commit
[20:47:36] <ostriches>	 Yeah do the merge commit.
[20:47:40] <ostriches>	 Force pushing is even worse.
[20:47:48] <hashar>	 I believe
[20:48:02] <hashar>	 wanna review them or should I just +2?
[20:56:12] <ostriches>	 go ahead, making lunch
[21:06:03] <grrrit-wm>	 (PS2) Hashar: composer tests for a few extensions [integration/config] - https://gerrit.wikimedia.org/r/254910
[21:06:25] <hashar>	 adding composer
[21:07:41] <grrrit-wm>	 (CR) Hashar: [C: 2] composer tests for a few extensions [integration/config] - https://gerrit.wikimedia.org/r/254910 (owner: Hashar)
[21:08:44] <grrrit-wm>	 (Merged) jenkins-bot: composer tests for a few extensions [integration/config] - https://gerrit.wikimedia.org/r/254910 (owner: Hashar)
[21:14:44] <grrrit-wm>	 (PS1) Arlolra: Increase parsoid npm test timeout [integration/config] - https://gerrit.wikimedia.org/r/254946
[21:15:21] <wikibugs>	 Deployment-Systems, Scap3: Scap3 needs a way to handle large binary file transport - https://phabricator.wikimedia.org/T119443#1826547 (thcipriani) NEW
[21:21:05] <wikibugs>	 Deployment-Systems, Scap3: Scap3 needs a way to handle large binary file transport - https://phabricator.wikimedia.org/T119443#1826575 (thcipriani) From our meeting, @demon specifically called out `.jar` files being important in this workflow.  Contenders are: * git-fat * git-lfs * git-annex  There have b...
[21:23:16] <wikibugs>	 Deployment-Systems, Scap3: Scap3 needs a way to handle large binary file transport - https://phabricator.wikimedia.org/T119443#1826579 (demon) >>! In T119443#1826575, @thcipriani wrote: > @demon mentioned in IRC that `git-lfs` is https-only and requires a special server implementation, which may not jive w...
[21:24:29] <ostriches>	 thcipriani: The only two implementations that seem general enough for us are in Java and Node.
[21:24:35] <ostriches>	 The rest seem rather use-case-specific.
[21:25:12] <ostriches>	 Requiring either as a dependency for scap rubs me real icky like
[21:25:26] <thcipriani>	 heh, agreed.
[21:26:07] <ostriches>	 git-annex on the other hand is really just client-side. You can fetch the objects from basically anywhere
[21:26:43] <ostriches>	 (on disk, rsync, etc etc etc)
[21:26:43] <ostriches>	 In that regard, it's much more like git-fat.
[21:26:43] <thcipriani>	 I use git-annex for all my raw photo files and it works really well for my personal use-case. I've had no real problems with it + the s3-gpg-backing. Works great.
[21:27:10] <ostriches>	 Plus it has debian packages.
[21:28:38] <ostriches>	 So just have to add it to our package deps and call it a day
[21:28:43] <wikibugs>	 Continuous-Integration-Config, MW-1.26-release, Patch-For-Review: MediaWiki 1.26 bundled repo should be state of the art - https://phabricator.wikimedia.org/T115392#1826587 (hashar)
[21:28:43] <ostriches>	 The hardest part I think is figuring out the workflow. Do you do the initial local/Gerrit/Phab work with git-annex too? Or is it something you handle on the deploy master on the fly?
[21:29:04] <ostriches>	 I think the latter.
[21:30:05] <ostriches>	 Also: do we make it "smart" and have it automatically annex binary files over N size?
[21:30:14] <ostriches>	 Or do we require explicit registration of binaries to track
[21:30:48] <thcipriani>	 well if it's going to be explicit then we'd have to make it part of the phab/gerrit/local workflow mostly, I'd think.
[21:31:02] <ostriches>	 Hmm true
[21:31:06] <thcipriani>	 developers would have to select which files to add to annex then.
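The "smart" option ostriches raises above, automatically annexing binary files over N size, comes down to a size scan of the working tree. A minimal sketch, assuming a plain byte threshold and skipping the `.git` directory; the function name and threshold are hypothetical, and nothing here is existing scap behaviour:

```python
import os


def annex_candidates(root, min_bytes):
    """Return sorted paths (relative to `root`) of regular files larger
    than `min_bytes`, ignoring symlinks and anything under .git."""
    candidates = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # already-annexed files typically appear as symlinks
            if os.path.getsize(path) > min_bytes:
                candidates.append(os.path.relpath(path, root))
    return sorted(candidates)
```

The resulting list could then be handed to `git annex add` on the deploy master, whereas the explicit-registration alternative thcipriani describes would put that choice in the developers' hands instead.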
[21:31:31] <thcipriani>	 also, what is the default transport mechanism for annex?
[21:31:40] <thcipriani>	 I mean for our use.
[21:31:46] <ostriches>	 I'd say rsync for us.
[21:31:57] <ostriches>	 We already have it everywhere, it's a known beast to deal with atm.
[21:32:30] <thcipriani>	 sure, have to expand the rsync server modules to cover /srv/deployment in that case.
[21:34:30] <thcipriani>	 hmm, this is a big ball of work :P
[21:34:53] <ostriches>	 Heh, now that I poke git-annex more, I wonder if `git annex sync` to keep co-masters sane would be better than rsync
[21:37:32] <thcipriani>	 sigh. To paraphrase master p: mo' masters mo' problems.
[21:37:47] <grrrit-wm>	 (PS1) Hashar: Add composer test to three REL1_26 skins [integration/config] - https://gerrit.wikimedia.org/r/255016
[21:39:39] <grrrit-wm>	 (CR) Cscott: [C: 1] Increase parsoid npm test timeout [integration/config] - https://gerrit.wikimedia.org/r/254946 (owner: Arlolra)
[21:39:41] <grrrit-wm>	 (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/255016 (owner: Hashar)
[21:40:07] <thcipriani>	 ostriches: have time to take on T119443? You probably seem like you've got a good start with it :P
[21:40:20] <ostriches>	 Not this week
[21:41:12] <thcipriani>	 kk. It's in the to triage column for now, we'll just leave it there until someone has time to circle back.
[21:42:57] <grrrit-wm>	 (CR) Hashar: [C: 2] Add composer test to three REL1_26 skins [integration/config] - https://gerrit.wikimedia.org/r/255016 (owner: Hashar)
[21:45:20] <grrrit-wm>	 (Merged) jenkins-bot: Add composer test to three REL1_26 skins [integration/config] - https://gerrit.wikimedia.org/r/255016 (owner: Hashar)
[21:53:29] <wikibugs>	 Continuous-Integration-Config, MW-1.26-release, Patch-For-Review: MediaWiki 1.26 bundled repo should be state of the art - https://phabricator.wikimedia.org/T115392#1826668 (hashar)
[21:54:15] <wikibugs>	 Continuous-Integration-Config, MW-1.26-release, Patch-For-Review: MediaWiki 1.26 bundled repo should be state of the art - https://phabricator.wikimedia.org/T115392#1723180 (hashar) I have added a bunch of composer entry points. Some are failing such as LocalizationUpdate https://gerrit.wikimedia.org/r...
[22:02:39] <wikibugs>	 Deployment-Systems, Scap3: Need a way to restart services without deploying via scap - https://phabricator.wikimedia.org/T119449#1826691 (thcipriani) NEW
[22:05:23] <greg-g>	 marxarelli: oh, drive by link drop again: https://office.wikimedia.org/wiki/Systems_guide_for_new_hires
[22:05:38] <greg-g>	 marxarelli: https://office.wikimedia.org/wiki/New_tech_employee_orientation
[22:06:33] <marxarelli>	 greg-g: danke!
[22:07:10] <greg-g>	 np :)
[22:07:42] <marxarelli>	 i don't think i've ever seen this page
[22:07:45] <marxarelli>	 :)
[22:09:14] <chasemp>	 that happened for me too don't feel bad :)
[22:10:56] <greg-g>	 marxarelli: wait, really? I could've sworn I sent that to you and mukunda and tyler.... maybe I didn't for you since officially robla was the hiring manager
[22:10:59] * greg-g shrugs
[22:11:43] <marxarelli>	 greg-g: yeah, you weren't my manager yet
[22:11:47] <twentyafterfour>	 ?
[22:11:52] <subbu>	 can someone review https://gerrit.wikimedia.org/r/#/c/254946/ and enable it .. we are getting a bit too many false failures because of the tight timeout after we merged some code that slows down our test runs.
[22:12:05] <marxarelli>	 it's entirely possible that robla sent it to me, but also likely that i dropped it due to overload
[22:12:15] * robla tries to figure out what he's being thrown under the bus for :-P
[22:12:29] <marxarelli>	 it's robla's fault
[22:12:31] <marxarelli>	 get him!
[22:12:35] <twentyafterfour>	 lol
[22:12:43] <thcipriani>	 greg-g: you might have sent that out, but it looks like something I might have skipped in the information overload of my first few days.
[22:14:37] <thcipriani>	 I definitely got the new tech employee thing.
[22:15:33] * greg-g forgot robla was in here
[22:15:35] * greg-g denies everything
[22:20:15] <chasemp>	 first few days are a blur :)
[22:21:55] <wikibugs>	 Deployment-Systems, Scap3: Need a way to restart services without deploying via scap - https://phabricator.wikimedia.org/T119449#1826747 (thcipriani) Unlike a normal `deploy` there is no recourse for deployers that do a restart via Scap3 that fails subsequent checks. If a service restart fails or a servic...
[22:25:28] <greg-g>	 hashar: btw, subbu's request up there, if you have time ^^
[22:25:49] <subbu>	 hashar said earlier that he was "sleeping" :)
[22:26:14] <subbu>	 "I am busy/ sleeping "
[22:26:37] <greg-g>	 nvm :)
[22:26:58] <subbu>	 but, don't let it stop anyone else who can do it. ;)
[22:29:28] <hashar>	 subbu: well actually you should be able to refresh the jobs using JJB
[22:29:36] <hashar>	 I would, but too late to monitor them :(
[22:29:44] <subbu>	 hashar, we do refresh them via recheck.
[22:29:46] <hashar>	 ultimately, we will want to speed up the tests!
[22:30:08] <subbu>	 but, that is not a good solution to keep rechecking whenever the tests take a bit too long to run.
[22:30:16] <subbu>	 hence the increased timeout.
[22:30:23] <grrrit-wm>	 (PS2) Hashar: Increase parsoid npm test timeout [integration/config] - https://gerrit.wikimedia.org/r/254946 (owner: Arlolra)
[22:30:44] <hashar>	 subbu: can the tests take advantage of parallelization?
[22:30:48] <hashar>	 just wondering
[22:31:00] <subbu>	 no, they run one after another. 
[22:31:25] <subbu>	 and our parser tests just slowed down by 1 minute .. so, that adds 2 minutes to our previous run time .. which cuts it quite close to the 10 min. limit we have.
[22:31:49] <hashar>	 !log Updating parsoidsvc-deploy-npm-* jobs  https://gerrit.wikimedia.org/r/254946
[22:31:53] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:31:59] <subbu>	 legoktm, ^
[22:32:02] <subbu>	 thanks.
[22:32:10] <hashar>	 !log Updating parsoidsvc-source-npm-* jobs  https://gerrit.wikimedia.org/r/254946
[22:32:13] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:32:25] <grrrit-wm>	 (CR) Hashar: [C: 2] "Updated all four jobs with JJB :-)" [integration/config] - https://gerrit.wikimedia.org/r/254946 (owner: Arlolra)
[22:32:43] <legoktm>	 and I'm still slower than sleeping hashar
[22:32:55] <subbu>	 ha ha ..
[22:32:58] <hashar>	 subbu: and the how-to doc for updating the jenkins jobs is at https://www.mediawiki.org/wiki/CI/JJB   :-D
[22:33:16] <hashar>	 legoktm: that is because you are doing some other things while I am only staring at this channel ! :D
[22:33:23] <subbu>	 thanks. will add a link to that on our wiki page.
[22:33:33] <hashar>	 subbu: and we will want to run them in parallel one day
[22:33:34] <grrrit-wm>	 (Merged) jenkins-bot: Increase parsoid npm test timeout [integration/config] - https://gerrit.wikimedia.org/r/254946 (owner: Arlolra)
[22:33:52] <hashar>	 subbu: assuming they don't have side effects, we could split them into four groups and consume four cpus :D
[22:34:16] <hashar>	 subbu: the ones from mediawiki/core are slow as hell, taking up to 6 minutes under Zend :-(((
[22:35:26] <subbu>	 hashar, yes, in parallel would be good.
[22:37:04] <wikibugs>	 Continuous-Integration-Config, MW-1.26-release, Patch-For-Review: MediaWiki 1.26 bundled repo should be state of the art - https://phabricator.wikimedia.org/T115392#1826794 (hashar) ConfirmEdit registration was added via T88047  https://gerrit.wikimedia.org/r/#/c/250277/ is the backport for REL1_26.  W...
[22:37:48] <hashar>	 subbu: and maybe some profiling can show low hanging fruit to speed them up :D
[22:37:50] <hashar>	 but anyway
[22:37:56] <hashar>	 I am supposed to sleep by now
[22:38:24] <greg-g>	 hashar: go sleep!
[22:38:27] <greg-g>	 g'night :)
[22:39:20] <hashar>	 thx *wave*
[22:44:06] <wikibugs>	 Deployment-Systems, Performance-Team, operations, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1826816 (thcipriani) Started talking about this at the deployment [[https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/2015-11-23#etcd...
[22:44:19] <wikibugs>	 Deployment-Systems, Scap3, Performance-Team, operations, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1826818 (thcipriani)
[22:46:43] <ebernhardson>	 anyone mind if i take the 3-4 (starts in 15min) deploy window to try again at enabling the ES labs replica writes?
[22:47:25] <ebernhardson>	 last time around the write load was too heavy and the disks couldn't keep up, this time enabling just enwiki and dewiki (whereas last time around enabled everything but enwiki and dewiki)
[22:48:32] <ostriches>	 I'm guessing it's open between now-swat?
[22:52:55] <greg-g>	 yeah, I said yes in -operations, but then spam
[22:59:18] <twentyafterfour>	 ostriches: still wanna land https://phabricator.wikimedia.org/D51 ?
[23:02:42] <ostriches>	 accepted.
[23:08:19] <wikibugs>	 Deployment-Systems, Scap3: Need a way to restart services without deploying via scap - https://phabricator.wikimedia.org/T119449#1826945 (dduvall) In thinking about this problem a bit more, I think it may be problematic to entirely split promote into promote/restart, mainly for the pooling reasons you men...
[23:43:46] <wikibugs>	 Beta-Cluster-Infrastructure, netops, operations, Database: Evaluate security concerns of logging beta cluster db queries on tendril - https://phabricator.wikimedia.org/T119461#1827042 (jcrespo) NEW
[23:54:53] <shinken-wm>	 PROBLEM - Puppet failure on pmcache is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]