[03:16:26] <wmf-insecte>	 Yippee, build fixed!
[03:16:26] <wmf-insecte>	 Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #762: FIXED in 26 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/762/
[03:40:45] <grrrit-wm>	 (PS1) Krinkle: Enable npm for mediawiki/extensions/ShortUrl [integration/config] - https://gerrit.wikimedia.org/r/249341
[03:40:56] <grrrit-wm>	 (CR) Krinkle: [C: 2] Enable npm for mediawiki/extensions/ShortUrl [integration/config] - https://gerrit.wikimedia.org/r/249341 (owner: Krinkle)
[03:41:59] <grrrit-wm>	 (Merged) jenkins-bot: Enable npm for mediawiki/extensions/ShortUrl [integration/config] - https://gerrit.wikimedia.org/r/249341 (owner: Krinkle)
[03:43:30] <Krinkle>	 !log Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/249341 (
[03:43:33] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:26:09] <wmf-insecte>	 Yippee, build fixed!
[04:26:09] <wmf-insecte>	 Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #605: FIXED in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/605/
[04:58:16] <wikibugs>	 Deployment-Systems, Scap3: Scap3 targets should use a config file rather than `key:value` arguments - https://phabricator.wikimedia.org/T116432#1760887 (mmodell) related: {D28}
[05:05:59] <twentyafterfour>	 so apparently scap isn't mirrored to github anymore?
[05:06:39] <twentyafterfour>	 nevermind I'm dumb. was looking at ancient revision in github.
[05:17:16] <shinken-wm>	 PROBLEM - Puppet failure on deployment-cache-parsoid04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:19:29] <grrrit-wm>	 (PS1) Legoktm: Remove jshint from ProofreadPage [integration/config] - https://gerrit.wikimedia.org/r/249349
[05:19:38] <grrrit-wm>	 (CR) Legoktm: [C: 2] Remove jshint from ProofreadPage [integration/config] - https://gerrit.wikimedia.org/r/249349 (owner: Legoktm)
[05:26:09] <grrrit-wm>	 (Merged) jenkins-bot: Remove jshint from ProofreadPage [integration/config] - https://gerrit.wikimedia.org/r/249349 (owner: Legoktm)
[05:26:42] <legoktm>	 !log deploying https://gerrit.wikimedia.org/r/249349
[05:26:45] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[06:30:03] <wmf-insecte>	 Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #788: FAILURE in 11 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/788/
[06:31:17] <shinken-wm>	 PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:36:11] <grrrit-wm>	 (CR) 20after4: [C: 1] Provide scap control server FQDN to proxy sync commands [tools/scap] - https://gerrit.wikimedia.org/r/247965 (https://phabricator.wikimedia.org/T104826) (owner: BryanDavis)
[07:11:21] <shinken-wm>	 RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:22:25] <wikibugs>	 Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1761092 (Luke081515) @Mww113: > (User rights log); 09:19 . . Luke081515 (Talk | contribs | block) changed group membership for Mww113@metawiki from (none) to administrator and confirmed user (per T116364)  but I...
[08:32:00] <wmf-insecte>	 Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #765: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/765/
[09:18:30] <mobrovac>	 hashar: 'round?
[09:18:39] <hashar>	 mobrovac: bonjour :-}
[09:18:48] <mobrovac>	 bonjour!
[09:18:49] <hashar>	 beside my net being crappy yeah I am around
[09:18:54] <mobrovac>	 haha
[09:18:55] <hashar>	 just finished the checkin with zeljko
[09:18:57] <hashar>	 what is up?
[09:19:11] <mobrovac>	 deployment-parsoidcache0x is what's up
[09:19:18] <mobrovac>	 they don't seem to work
[09:19:21] <mobrovac>	 can't even ssh in
[09:19:51] <hashar>	 ohh
[09:19:58] <mobrovac>	 so http://parsoid-beta.wmflabs.org gives me 503s
[09:20:08] <hashar>	 folks were playing with Parsoid yesterday evening
[09:20:15] <mobrovac>	 *sigh*
[09:20:46] <hashar>	 and the deployment-parsoidcache is still running Trusty :-(
[09:21:27] <mobrovac>	 yeah ideally we should switch it to jessie
[09:21:53] <hashar>	 deployment-parsoidcache02
[09:21:53] <hashar>	 ubuntu-14.04-trusty (deprecated 2014-10-03)
[09:22:00] <hashar>	 deployment-cache-parsoid04
[09:22:01] <hashar>	 debian-8.1-jessie
[09:22:10] <hashar>	 not sure which one is being used
[09:25:05] <hashar>	 <rant>why do I always have to fix stuff myself</rant>
[09:26:50] <hashar>	 mobrovac: parsoid-beta.wmflabs.org  points to the public IP  208.80.155.156
[09:26:58] <hashar>	 which does not show up on https://wikitech.wikimedia.org/wiki/Special:NovaAddress :/
[09:27:17] <hashar>	 ah that is the shared proxy
[09:27:21] <hashar>	 wtf
[09:27:38] <mobrovac>	 uf this morning it's messy wherever i turn
[09:27:39] <mobrovac>	 damn
[09:28:15] <hashar>	 so on beta the parsoid cache is not used apparently
[09:30:38] <shinken-wm>	 PROBLEM - Puppet failure on deployment-fluorine is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:31:17] <hashar>	 mobrovac: any idea where parsoid-beta.wmflabs.org entry is configured?
[09:31:29] <hashar>	 I mean what is the Parsoid/VE setting that points to parsoid-beta.wmflabs.org
[09:32:07] <wikibugs>	 Beta-Cluster-Infrastructure, Varnish: deployment-parsoidcache02 fails puppet: Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]: Could not evaluate: Could not find init script or up... - https://phabricator.wikimedia.org/T103660#1761154
[09:32:12] <hashar>	 my net is crap
[09:32:32] <mobrovac>	 parsoid-beta is supposed to point to parsoid's cache
[09:32:57] <mobrovac>	 ve and rb configs point to the internal IP of deployment-parsoidcache05 if i'm not mistaken
[09:33:36] <wikibugs>	 Beta-Cluster-Infrastructure, Varnish: deployment-parsoidcache02 fails puppet: Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]: Could not evaluate: Could not find init script or up... - https://phabricator.wikimedia.org/T103660#1761155
[09:38:08] <hashar>	 https://github.com/wikimedia/mediawiki-services-parsoid-deploy/blob/master/conf/wmf/betalabs.localsettings.js
[09:38:17] <hashar>	 parsoidConfig.parsoidCacheURI = 'http://10.68.16.145/'; // deployment-parsoidcache01.eqiad.wmflabs
[09:38:33] <hashar>	 that one is the Trusty instance
[09:39:05] <hashar>	 ah
[09:39:06] <hashar>	 wmf-config/CommonSettings-labs.php:     $wmgParsoidURL = 'http://10.68.16.145'; // deployment-parsoidcache02.eqiad
[09:39:29] <hashar>	 mobrovac: I have no idea what  parsoid-beta.wmflabs.org is for
[09:40:20] <hashar>	 mobrovac: VE has been made to hit RESTBase instead of Parsoid, hasn't it?
[09:40:32] <mobrovac>	 when i curl, i can see: X-Cache: deployment-parsoidcache02 miss (0), deployment-parsoidcache02 frontend miss (0)
[09:40:36] <mobrovac>	 hashar: yup
[09:41:24] <mobrovac>	 parsoid-beta is supposed to be used by outside processes
[09:41:32] <mobrovac>	 e.g. we use it for restbase testing in travis
[09:44:57] <hashar>	 so parsoid-beta is some entry point which is not related to beta cluster, is it?
[09:45:02] <hashar>	 else it should point to a deployment-parsoidcache instance 
[09:46:10] <hashar>	 varnish doesn't start on the Jessie cache (deployment-cache-parsoid04) 
[09:46:23] <mobrovac>	 uf it's a mess
[09:46:29] <mobrovac>	 it should point to parsoid in beta
[09:47:19] <hashar>	 and thanks to systemd I have very useful messages:
[09:47:20] <hashar>	    Failed to start varnish (Varnish HTTP Accelerator).
[09:47:27] <hashar>	    Unit varnish.service entered failed state.
[09:47:57] <grrrit-wm>	 (PS5) Florianschmidtwelzow: Output plain html entities [tools/release] - https://gerrit.wikimedia.org/r/231317
[09:48:09] <hashar>	 so for now, parsoid-beta.wmflabs.org points to something, but that is not in the deployment-prep labs project
[09:48:49] <hashar>	 for file /srv/vdb/varnish.main1 failed: No space left on device  !!
[09:48:54] <hashar>	 progress
[09:49:29] <hashar>	 f**** labs
[09:49:33] <hashar>	 found it
[09:49:41] <hashar>	  /srv/vdb only has 500 MB of disk
[09:49:42] <hashar>	 :-D
[09:51:40] <wikibugs>	 Beta-Cluster-Infrastructure, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761186 (hashar)
[09:53:36] <wikibugs>	 Beta-Cluster-Infrastructure, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1395617 (hashar) a:hashar Deleted deployment-cache-parsoid04 which is too small.  Created deployment-cache-parsoid05 a m1.medium or 40GB of disk.
[09:54:27] <hashar>	 !log beta: deleting deployment-cache-parsoid04 not enough disk space for /srv/  ( https://phabricator.wikimedia.org/T103660 )
[09:54:30] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:56:26] <grrrit-wm>	 (PS1) Phedenskog: Split WebPageTest jobs into two [integration/config] - https://gerrit.wikimedia.org/r/249363
[09:56:58] <hashar>	 mobrovac: I am going to  just nuke the current Parsoid cache running Trusty
[09:57:25] <shinken-wm>	 PROBLEM - Host deployment-cache-parsoid04 is DOWN: CRITICAL - Host Unreachable (10.68.19.197)
[09:57:34] <mobrovac>	 kk hashar, makes sense
[09:58:28] <wikibugs>	 Beta-Cluster-Infrastructure, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761212 (hashar) We should delete:  | deployment-parsoidcache02 | Trusty | 10.68.16.145  To be replaced with:  | deployment-cache-parsoid05 | Jessie | 10.68.20.102
[09:58:38] <hashar>	 !log Deleting  deployment-parsoidcache02 (Trusty) 10.68.16.145   to be replaced with deployment-cache-parsoid05 10.68.20.102 (Jessie)
[09:58:41] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:59:26] <hashar>	 mobrovac: would you mind preparing patches to replace 10.68.16.145  by 10.68.20.102   ?   Would need updates probably in all of operations/puppet.git  operations/mediawiki-config.git  and  mediawiki/services/parsoid/deploy.git
[09:59:36] <hashar>	 + RESTBase if needed
[09:59:43] <hashar>	 all patches can refer to T103660 -:)
[10:00:08] <mobrovac>	 hashar: that's the cache's IP?
[10:00:13] <hashar>	 yeah
[10:00:16] <mobrovac>	 kk
[10:00:19] <hashar>	 which I am rebuilding
[10:00:34] <hashar>	 we have been using the Trusty instance  ( 10.68.16.145 ) for ages
[10:00:41] <hashar>	 but varnish is outdated there
[10:00:49] <hashar>	 then an instance got created for Jessie
[10:00:52] <hashar>	 but never got completed
[10:01:00] <hashar>	 and did not have enough disk space
[10:01:57] <hashar>	 !log applying  role::cache::parsoid to deployment-cache-parsoid05
[10:02:01] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:03:34] <hashar>	 blah la running puppet a few times
[10:04:57] <hashar>	 root     16839     1  0 10:03 ?        00:00:00 python /usr/local/bin/varnishxcps --statsd-server=statsd.eqiad.wmnet
[10:04:57] <hashar>	 root     17097     1  1 10:03 ?        00:00:00 python /usr/local/bin/varnishrls --statsd-server=statsd.eqiad.wmnet
[10:04:59] <hashar>	 ....
[10:05:56] <mobrovac>	 lol
[10:06:42] <mobrovac>	 hashar: patch #1: https://gerrit.wikimedia.org/r/249366
[10:06:59] <mobrovac>	 (no refs of that IP in ops/puppet btw)
[10:07:12] <mobrovac>	 going to parsoid/deploy now
[10:08:39] <wikibugs>	 Beta-Cluster-Infrastructure, Analytics-Engineering, operations, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1761231 (hashar) NEW
[10:08:40] <hashar>	 moaar bugs
[10:09:21] <mobrovac>	 hashar: on 10.68.20.102 is varnish, right?
[10:09:49] <hashar>	 mobrovac: yeah   deployment-cache-parsoid05  10.68.20.102  
[10:09:52] <mobrovac>	 kk
[10:09:53] <hashar>	 varnish is up
[10:10:38] <hashar>	 I need a break  coffee/nature etc
[10:10:41] <hashar>	 net is crap here :-/
[10:12:04] <mobrovac>	 yeaaaah
[10:12:04] <mobrovac>	 me too
[10:13:13] <mobrovac>	 hashar: https://gerrit.wikimedia.org/r/249367 for parsoid/deploy
[10:17:28] <hashar>	 greaat
[10:17:50] <hashar>	 my net feels like it is year 1995 again
[10:25:11] <hashar>	 mobrovac: +2ed both
[10:25:37] <hashar>	 mobrovac: I have no idea whether that parsoid cache instance has any public URL
[10:26:29] <hashar>	 I have no idea how VisualEditor communicates with Parsoid nowadays
[10:26:50] <mobrovac>	 hashar: via restbase
[10:27:35] <hashar>	 does RESTBase use the  $wmgParsoidURL = 'http://10.68.20.102';  setting?
[10:28:45] <mobrovac>	 hashar: RB uses https://github.com/wikimedia/operations-puppet/blob/production/hieradata/labs/deployment-prep/common.yaml#L69
[10:28:54] <mobrovac>	 in beta
[10:29:57] <hashar>	 so many configuration systems :-}
[10:30:05] <hashar>	 so
[10:30:32] <hashar>	 a VE user hits restbase public entry point somehow (I guess some varnish cache) then restbase directly hits the Parsoid system
[10:30:39] <hashar>	 shouldn't it use the parsoid cache instead?
[10:30:54] <mobrovac>	 no, we need to talk to parsoid directly
[10:31:11] <mobrovac>	 RB is the new parsoid cache :)
[10:31:48] <wikibugs>	 Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1761285 (hashar)
[10:34:28] <wikibugs>	 Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1761291 (hashar) @jcrespo that is a follow up task after the beta cluster outage (T116447).   Dan mentioned the beta cluster databases do not log slow queries.  We thou...
[10:36:07] <hashar>	 ah
[10:36:20] <hashar>	 mobrovac: so does that mean we can eventually get rid of the parsoid cache soonish?
[10:36:49] <mobrovac>	 exactly hashar!
[10:37:08] <hashar>	 but we still have $wmgParsoidURL = 'http://10.68.20.102';    ..
[10:37:40] <hashar>	 so
[10:37:40] <wikibugs>	 Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1761295 (jcrespo) Are you committing time to this?  > We (RelEng) probably won't be able to commit any time to it right now --greg
[10:37:42] <hashar>	 beta got updated
[10:37:52] <hashar>	 I have no idea how to verify whether the parsoid cache behaves properly
[10:40:31] <hashar>	 Error loading data from server: HTTP 504.  :D
[10:41:16] <hashar>	 http://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User%3AHashar    Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
[10:41:41] <hashar>	 what a mess
[10:41:48] <hashar>	 I found that error via the console log
[10:42:01] <hashar>	 and I had to curl to get the actual error:
[10:42:15] <hashar>	 {"type":"https://restbase.org/errors/internal_http_error","method":"get","detail":"Error: connect ECONNREFUSED","uri":"http://deployment-parsoid05.deployment-prep.eqiad.wmflabs:8000/v2/en.wikipedia.beta.wmflabs.org/pagebundle/User%3AHashar/85930"}
[10:45:05] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761312 (hashar) Doing a VE edit on beta I get:       Error loading data from server: HTTP 504.  In the browser console there is:      http://en.wikipedia.beta...
[10:48:13] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761318 (hashar) The merge of https://gerrit.wikimedia.org/r/#/c/249367/ did trigger the Jenkins job `beta-parsoid-update-eqiad` with: ``` + sudo /etc/init.d/p...
[10:49:35] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761322 (hashar)
[10:55:56] <hashar>	 mobrovac: any clue where Parsoid logs are sent to ?  We used to have something like /var/lib/parsoid/parsoid.log
[10:56:18] <hashar>	 or is that solely relying on logstash now ?
[10:56:32] <mobrovac>	 i think it's logstash only
[10:56:36] * mobrovac checking
[10:57:08] <hashar>	 which is rather annoying when the service refuses to start :D
[10:57:35] <hashar>	 I wish we had some error output to be sent to  /var/log/upstart/parsoid.log
[10:57:36] <hashar>	 but I am ranting
[10:57:55] <mobrovac>	 hashar: according to the conf it's logging both to stdout and logstash
[10:58:10] <mobrovac>	 hashar: so for systemd, journalctl might help you see these
[10:58:32] <hashar>	 oh
[10:58:34] <hashar>	  /data/project/parsoid/parsoid.log
[10:59:41] <hashar>	 module.js:340
[10:59:41] <hashar>	     throw err;
[10:59:41] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761333 (hashar) Parsoid fails with: ``` module.js:340     throw err;           ^ Error: Cannot find module '/srv/deployment/parsoid/p...
[10:59:44] <hashar>	           ^
[10:59:44] <hashar>	 Error: Cannot find module '/srv/deployment/parsoid/parsoid/api/server.js'
[10:59:49] <hashar>	     at Function.Module._resolveFilename (module.js:338:15)
[10:59:49] <hashar>	     at Function.Module._load (module.js:280:25)
[10:59:49] <hashar>	     at Function.Module.runMain (module.js:497:10)
[10:59:49] <hashar>	     at startup (node.js:119:16)
[10:59:49] <hashar>	     at node.js:902:3
[11:02:14] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761342 (hashar) Open>stalled stalled / pending fixup of Parsoid.  Should fill another task for it.
[11:05:02] <wikibugs>	 Release-Engineering-Team, Scap3, Wikimedia-Developer-Summit-2016: Scap3: updates, upgrades, and challenges - https://phabricator.wikimedia.org/T114045#1761352 (Qgil) Hi @thcipriani, this proposal is focusing on a Summit session but there is no indication about topics that could be discussed here before...
[11:10:28] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761372 (hashar) NEW
[11:12:25] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761372 (hashar) Most probably caused by https://gerrit.wikimedia.org/r/#/c/249138/ //T115665: Reorg parsoid repo//  It renamed a bunch of files and the pupp...
[11:13:29] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761395 (hashar) Lot of files have been renamed in the parsoid source repo but the setting files to start the service have not been up...
[11:15:07] <hashar>	 mobrovac: I believe the Jessie cache is fine now
[11:15:22] <hashar>	 mobrovac: the root cause is Parsoid does not start anymore because a bunch of files got renamed in mediawiki/services/parsoid
[11:15:37] <hashar>	 mobrovac: I have filed a task for them to adjust the /etc/default/parsoid in puppet;
[11:15:43] <hashar>	 meanwhile VE/Parsoid will remain broken :-}
[11:16:33] <mobrovac>	 still getting 503s for curl -v parsoid-beta.wmflabs.org
[11:16:33] <mobrovac>	 :/
[11:29:46] <wikibugs>	 Continuous-Integration-Scaling, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761440 (hashar) Holy hell, how do you manage to install servers so fast ? :-}
[11:33:45] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761460 (mobrovac) `PARSOID_BASE_PATH` and `NODE_PATH` in `/etc/default/parsoid` are wrong.
[11:38:24] <wikibugs>	 Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit, Patch-For-Review, Technical-Debt: Disable Gerrit replication to production slaves - https://phabricator.wikimedia.org/T86661#1761470 (hashar)
[11:38:26] <wikibugs>	 Continuous-Integration-Scaling, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761469 (hashar)
[11:41:10] <wikibugs>	 Continuous-Integration-Scaling, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761472 (hashar) Need to get rid of the Gerrit replication ( T86661 )  We will need to make sure all slaves (gallium.wikimedia.org and instances in contintcloud and inte...
[12:24:46] <wikibugs>	 Continuous-Integration-Scaling, operations, Blocked-on-Operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1761540 (fgiunchedi) afaict our puppet hooks for jessie does include `thirdparty`  ```     package_builder::pbuilder...
[12:32:00] <grrrit-wm>	 (PS1) Hashar: multigit.sh: no more hardcode Zuul git URL [integration/jenkins] - https://gerrit.wikimedia.org/r/249387 (https://phabricator.wikimedia.org/T95046)
[12:32:11] <grrrit-wm>	 (CR) Hashar: [C: 2] multigit.sh: no more hardcode Zuul git URL [integration/jenkins] - https://gerrit.wikimedia.org/r/249387 (https://phabricator.wikimedia.org/T95046) (owner: Hashar)
[12:33:04] <grrrit-wm>	 (Merged) jenkins-bot: multigit.sh: no more hardcode Zuul git URL [integration/jenkins] - https://gerrit.wikimedia.org/r/249387 (https://phabricator.wikimedia.org/T95046) (owner: Hashar)
[12:49:25] <grrrit-wm>	 (CR) Hashar: [C: 2] build: Pass -s to phpcs for easier debugging [tools/codesniffer] - https://gerrit.wikimedia.org/r/249320 (owner: Legoktm)
[12:49:50] <grrrit-wm>	 (Merged) jenkins-bot: build: Pass -s to phpcs for easier debugging [tools/codesniffer] - https://gerrit.wikimedia.org/r/249320 (owner: Legoktm)
[13:31:05] <shinken-wm>	 PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:41:03] <shinken-wm>	 RECOVERY - Puppet failure on integration-slave-precise-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:43:38] <wikibugs>	 Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, Puppet, and 2 others: Puppetize npm/grunt manual setup - https://phabricator.wikimedia.org/T113903#1761698 (hashar) All good on permanent slaves.  When https://gerrit.wikimedia.org/r/#/c/244748/ is merged, we ca...
[13:44:17] <wikibugs>	 Continuous-Integration-Config: Provide a CI job to generate JS code coverage reports for extensions by using karma-coverage instead of just karma - https://phabricator.wikimedia.org/T116808#1761699 (hashar)
[13:44:53] <wikibugs>	 Browser-Tests: Passed Jenkins jobs should have links to Sauce Labs jobs - https://phabricator.wikimedia.org/T48890#1761701 (hashar)
[13:47:56] <wikibugs>	 Continuous-Integration-Infrastructure: phplint fails on paths containing a space - https://phabricator.wikimedia.org/T89380#1761715 (hashar) Open>Resolved a:hashar The PHP package `jakub-onderka/php-parallel-lint` is not affected.  Any repo suffering from that issue should be migrated to use the com...
[13:48:16] <wikibugs>	 Browser-Tests, Tracking: Move browser test alerts to responsible teams' channels from -releng - https://phabricator.wikimedia.org/T89375#1761718 (hashar)
[13:49:31] <wikibugs>	 Continuous-Integration-Infrastructure: /tmp/bundler* directories left behind on Jenkins slaves - https://phabricator.wikimedia.org/T84974#1761720 (hashar) Open>Resolved a:hashar This will be fixed by moving the ruby / bundler jobs to Nodepool instances i.e. with the `rake-jessie`.
[13:49:32] <wikibugs>	 Continuous-Integration-Infrastructure: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1761723 (hashar)
[13:50:03] <wikibugs>	 Continuous-Integration-Infrastructure: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#759885 (hashar)
[13:50:05] <wikibugs>	 Continuous-Integration-Infrastructure, Patch-For-Review: hhvm Jenkins job fill up /tmp with perf-*.map files - https://phabricator.wikimedia.org/T64788#1761725 (hashar) Open>declined a:hashar Wont fixing it for now. There is not much disk spaces issues any more and the issue will be definitely so...
[13:51:00] <wikibugs>	 Continuous-Integration-Infrastructure: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1761731 (hashar) Open>Resolved a:hashar This is almost no more an issue compared to what it used to be at the end of 2014 / beginning of 2015. The complete res...
[13:51:20] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761735 (SBisson) Similar problem with mw-vagrant. It is looking for `/vagrant/srv/parsoid/src/api/server.js`. It doesn't exist but there's `/vagrant/srv/par...
[13:51:22] <wikibugs>	 Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#1761736 (hashar)
[13:52:56] <wikibugs>	 Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#540581 (hashar) We have the `mwext-mw-selenium` job now which is meant to be run on patchset proposal.  To achieve this task we would ne...
[13:53:37] <wikibugs>	 Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#1761739 (hashar)
[13:54:36] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761741 (ssastry) I am going to add a symlink for now so these continue to work. Separately, we'll update puppet (for both beta and production) -- I need to...
[13:56:18] <wikibugs>	 Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, Puppet, and 2 others: Puppetize npm/grunt manual setup - https://phabricator.wikimedia.org/T113903#1761752 (hashar)
[13:56:18] <wikibugs>	 Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Tracking: tracking: Cleanup contint puppet manifests so they are easier to reuse / split slaves by functions - https://phabricator.wikimedia.org/T110864#1761751 (hashar)
[13:56:57] <wikibugs>	 Continuous-Integration-Infrastructure, Patch-For-Review: Migrate all debian-glue jobs to Jessie slaves - https://phabricator.wikimedia.org/T95545#1761756 (hashar) p:Normal>Low
[13:59:57] <wikibugs>	 Continuous-Integration-Infrastructure: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1761760 (hashar) Status as October 28th 2015 ``` $ ssh gallium.wikimedia.org ls -1 /srv/ssd/jenkins-slave/workspace doc-publish-sync integration-docroot-deploy mediawiki-vagrant-puppet-doc mwext-...
[14:04:57] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761771 (hashar) For puppet, `manifests/role/parsoid.pp` vary the config with: ``` lang=ruby class role::parsoid::beta {     ...     # For beta, override NOD...
[14:07:43] <wikibugs>	 Continuous-Integration-Infrastructure, Documentation: Document RuboCop workflow - https://phabricator.wikimedia.org/T1368#1761786 (hashar) Open>Resolved a:hashar We have moved to use `rake` as an entry point.  It is described on https://www.mediawiki.org/wiki/Continuous_integration/Entry_points#R...
[14:09:19] <wikibugs>	 Continuous-Integration-Infrastructure: Jenkins: Set up postmerge job to auto-deploy jenkins-job-builder configuration - https://phabricator.wikimedia.org/T49056#1761792 (hashar)
[14:09:20] <wikibugs>	 Continuous-Integration-Infrastructure: Jenkins: Set up job to validate jenkins-job-builder configuration - https://phabricator.wikimedia.org/T45140#1761794 (hashar)
[14:09:22] <wikibugs>	 Continuous-Integration-Infrastructure, operations: Install Jenkins Job Builder on gallium - https://phabricator.wikimedia.org/T45141#1761789 (hashar) Open>declined a:hashar For now on we deploy them manually.  Maybe we will get that moved to use scap3 and have it generate the jobs directly from t...
[14:10:13] <wikibugs>	 Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761796 (hashar)
[14:10:14] <wikibugs>	 Continuous-Integration-Infrastructure, Tracking: Zuul: scale merge operations (tracking) - https://phabricator.wikimedia.org/T70480#1761795 (hashar)
[14:14:33] <wikibugs>	 Continuous-Integration-Infrastructure, Jenkins, Upstream: Jenkins sporadically changes its UI language - https://phabricator.wikimedia.org/T90558#1761815 (hashar) Open>Resolved a:hashar I haven't heard of any complain since we upgraded Jenkins.  Assuming it got fixed somehow by upstream.
[14:14:49] <wikibugs>	 Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761818 (ssastry) https://gerrit.wikimedia.org/r/#/c/249392/ is the parsoid patch to add the symlink https://gerrit.wikimedia.org/r/#/c/249399/ is the puppet...
[14:15:16] <wikibugs>	 Continuous-Integration-Infrastructure, Zuul: Zuul status page should show the pipelines "window" value - https://phabricator.wikimedia.org/T93701#1761820 (hashar) p:Low>Lowest
[14:15:24] <wikibugs>	 Browser-Tests, Patch-For-Review: Auto retry failed browser tests to reduce false negatives - https://phabricator.wikimedia.org/T67773#1761821 (hashar)
[14:15:42] <wikibugs>	 Continuous-Integration-Infrastructure: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#1761823 (hashar) Open>stalled
[14:17:53] <wikibugs>	 Release-Engineering-Team: Nightly builds tested and used for production deployments - https://phabricator.wikimedia.org/T67126#1761824 (hashar)
[14:20:29] <wikibugs>	 Release-Engineering-Team, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1761834 (saper) Question: wouldn't that be possible to ship the certificate as a parameter to `$wgForeignXXXR...
[15:38:37] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:38] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:38] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:39] <shinken-wm>	 PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:40] <wmf-insecte>	 Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #672: 04FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/672/
[15:38:41] <shinken-wm>	 PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:42] <wmf-insecte>	 Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #306: 04FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/306/
[15:38:48] <hashar>	 andrewbogott: sorry about https://gerrit.wikimedia.org/r/#/c/249389/  looks like my message is a bit confusing
[15:38:48] <hashar>	 andrewbogott: the zuul_url does not refer to the zuul/gearman server  that is another parameter
[15:38:48] <hashar>	 the parameter is meant to be consumed by Jenkins job so they can retrieve the patch on the host that prepared the patch
[15:38:48] <hashar>	 so it needs to vary by host
[15:38:49] <hashar>	 !log rebooting deployment-parsoid05  . seems NFS is flappy
[15:38:50] <hashar>	 !log no matter, NFS is under maintenance
[15:38:51] <hashar>	 !sal
[15:38:51] <wm-bot>	 https://tools.wmflabs.org/sal/releng
[15:38:51] <hashar>	 ...
[15:38:51] <hashar>	 f****** bots
[15:38:52] <shinken-wm>	 RECOVERY - Parsoid on deployment-parsoid05 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 0.050 second response time
[15:38:53] <hashar>	 subbu: we even have an Icinga check for the beta cluster parsoid service ^^^ :-}
[15:38:53] <subbu>	 oh, great!
[15:38:53] <shinken-wm>	 PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:38:53] <subbu>	 i was wondering that earlier.
[15:38:53] <subbu>	 we could have caught this y'day if we had seen this.
[15:38:57] <hashar>	 bah
[15:38:57] <hashar>	 maybe related to NFS
[15:38:57] <hashar>	 Error: Command exceeded timeout
[15:38:57] <hashar>	 Error: /Stage[main]/Role::Labs::Instance/Exec[block-for-project-export]/returns: change from notrun to 0 failed: Command exceeded timeout
[15:38:57] <hashar>	 Notice: /Stage[main]/Role::Labs::Instance/Mount[/data/project]: Dependency Exec[block-for-project-export] has failures: true
[15:38:57] <hashar>	 Warning: /Stage[main]/Role::Labs::Instance/Mount[/data/project]: Skipping because of failed dependencies
[15:38:57] <hashar>	 yeah 
[15:38:57] <hashar>	 all "normal"
[15:38:57] <yurik>	 hi, git deploy sync is not working for graphoid - https://phabricator.wikimedia.org/T116920
[15:38:58] <yurik>	 hashar, ? :)
[15:38:58] <hashar>	 yurik: has it ever worked?
[15:38:58] <yurik>	 hashar, i think so
[15:38:58] <yurik>	 hashar, judging from the sca01 & sca02, their git repos are pointing to bastion, not gerrit or github
[15:38:58] <hashar>	 I have no idea how Trebuchet works with salt / minion targets and so on
[15:38:58] <yurik>	 thcipriani, ?
[15:38:58] <yurik>	 mobrovac, ^
[15:38:58] <hashar>	 I guess something got broken at some point
[15:38:58] <hashar>	 and the sca** hosts are no longer attached as minions
[15:38:58] <hashar>	 or at least not with the grain trebuchet is expecting
[15:38:58] <yurik>	 hashar, that's an excellent observation )))))
[15:38:58] <yurik>	 (i was referring to "something got broken at some point" :)
[15:38:58] <hashar>	 yeah
[15:38:58] <hashar>	 :-D
[15:38:58] <mobrovac>	 there was a problem like that in the past
[15:38:58] <mobrovac>	 with sca0x
[15:38:58] <thcipriani>	 It's weird that it says 0/0 minions. sca01 has the grain deploy_target:graphoid/deploy
[15:38:58] <mobrovac>	 thcipriani fixed it by purging the redis queue or some magic like that
[15:38:58] <thcipriani>	 yeah, that was a slightly different problem: it was saying only half the minions checked in, because the domain name changed.
[15:38:58] <thcipriani>	 so trebuchet thought we doubled the deploy size, but only half the domains were reachable.
[15:38:58] <thcipriani>	 er...hosts.
[15:38:59] <yurik>	 bd808, ^
[15:38:59] <hashar>	 hashar@deployment-bastion:~$ redis-cli SMEMBERS "deploy_target:graphoid/deploy"
[15:38:59] <hashar>	 (empty list or set)
[15:38:59] <hashar>	 thcipriani: ^^^redis breakage?
[15:38:59] <thcipriani>	 Could be. It could be that deployment "worked" but the redis returner broke.
[15:38:59] <andrewbogott>	 hashar: ah!  That makes sense
[15:39:01] <hashar>	 andrewbogott: so your concern applies to HEAD^ :-}
[15:39:01] <hashar>	 andrewbogott: and the patch actually stops assuming everything is on gallium  (zuul.eqiad.wmnet)
[15:39:02] <andrewbogott>	 yeah, that part made sense
[15:39:02] <grrrit-wm>	 (03PS1) 10Paladox: [GuidedTour] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/249414 
[15:39:02] <thcipriani>	 is nutcracker up and running?
[15:39:02] <hashar>	 andrewbogott: and I have another patch to add our shell access to scandium, which is on the labs-support network  ( https://gerrit.wikimedia.org/r/#/c/249380/ )
[15:39:02] <thcipriani>	 nevermind.
[15:39:02] <hashar>	 andrewbogott: though that one grants me root on the machine, which is quite useful for the service implementation
[15:39:03] <bd808>	 yurik: did you manually update sca01? It has 07a2b2f7689addce080aaa49db1b73481c019fea checked out
[15:39:03] <yurik>	 bd808, yes, i tried to get it resolved manually
[15:39:03] <yurik>	 didn't work either )
[15:39:03] <yurik>	 for some reason the service graphoid restart failed
[15:39:03] <grrrit-wm>	 (03PS1) 10Paladox: Add new jsduck template [integration/config] - 10https://gerrit.wikimedia.org/r/249416 
[15:39:03] <hashar>	 !log zuul-merger will now use ZUUL_URL=git://gallium.wikimedia.org instead of ZUUL_URL=git://zuul.eqiad.wmnet  ( https://gerrit.wikimedia.org/r/#/c/249389/ )
[15:39:04] <hashar>	 yurik: most of the time if you try to fix stuff manually that breaks the automatic system even more :-}
[15:39:04] * yurik hides in shame
[15:39:05] * yurik thinks automatic systems should fix problems automatically ))
[15:39:05] <bd808>	 yurik: worked just fine for me -- https://phabricator.wikimedia.org/T116920#1762053
[15:39:05] <hashar>	 yurik: for what it is worth Tyler managed to deploy RESTBase with scap3 last week
[15:39:05] <hashar>	 so potentially we will migrate graphoid to scap3 as well
[15:39:05] <wmf-insecte>	 Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #223: 04FAILURE in 2 min 6 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/223/
[15:39:06] <grrrit-wm>	 (03PS2) 10Paladox: [GuidedTour] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/249414 
[15:39:06] <yurik>	 bd808, i think i'm an idiot - i tried to deploy from bastion, and it "worked"
[15:39:06] <yurik>	 but showed 0/0
[15:39:06] <yurik>	 oh, wait, you did too.  never mind... .weird
[15:39:06] <hashar>	 !log beta: moved web proxy parsoid-beta.wmflabs.org to use http://deployment-cache-parsoid05.eqiad.wmflabs:80
[15:39:06] <hashar>	 !log beta: deleting old deployment-parsoidcache02 (trusty) replaced by deployment-cache-parsoid05 (Jessie)
[15:39:06] <shinken-wm>	 PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[15:39:06] <yurik>	 thanks everyone for helping!!  I restarted the graphoid service on both sca1 & 2, it works now!
[15:39:07] <wmf-insecte>	 Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #700: 04FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/700/
[15:39:09] <yurik>	 bd808, how long did it take you to get from 0/2 to 2/2 ?
[15:39:09] <yurik>	 i'm trying to sync now (without any changes), and it keeps staying at 0/2.   i even tried "retry"
[15:39:09] <bd808>	 yurik: not long. 1-2 minutes?
[15:39:09] <yurik>	 ok, will wait for a bit
[15:39:09] <bd808>	 actually it returned that at the first timeout in the tool
[15:39:09] <bd808>	 which is something like 30s as I recall
[15:39:09] <bd808>	 that paste is the whole session I did
[15:39:09] <yurik>	 ah, there it goes
[15:39:09] <yurik>	 thx, works
[15:39:11] <shinken-wm>	 RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 38772 bytes in 0.627 second response time
[15:39:13] <wikibugs>	 10Beta-Cluster-Infrastructure: Unable to trebuchet deploy Graphoid on deployment labs - https://phabricator.wikimedia.org/T116920#1762004 (10Yurik) 3NEW
[15:39:19] <wikibugs>	 10Beta-Cluster-Infrastructure: Unable to trebuchet deploy Graphoid on deployment labs - https://phabricator.wikimedia.org/T116920#1762014 (10Yurik)
[15:39:22] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Parsoid, 5Patch-For-Review, 7Varnish, 7WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1762016 (10hashar) Could use help from #parsoid people to verify whether the new Parsoid Varnish cache is properly working....
[15:39:53] <hashar>	 ostriches: I have migrated the last varnish cache on beta-cluster from Trusty to Jessie :-}
[15:40:29] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Parsoid, 5Patch-For-Review, 7Varnish, 7WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1762097 (10hashar)
[15:40:31] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Traffic, 6operations, 5Patch-For-Review: Upgrade beta-cluster caches to jessie - https://phabricator.wikimedia.org/T98758#1762095 (10hashar) 5Open>3Resolved {T103660} has finally been solved. That was the last Varnish cache still using Trusty.
[15:42:05] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 38446 bytes in 1.806 second response time
[15:43:36] <shinken-wm>	 RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30297 bytes in 0.535 second response time
[15:44:39] <ostriches>	 hashar: I had already built -04 for it
[15:44:48] <ostriches>	 I just hadn't pointed the IP yet because puppet was half-failing.
[15:45:06] <hashar>	 yeah puppet failed because it could not create the varnish cache files on /srv/vdb
[15:45:18] <hashar>	 because -04 was a m1.small 
[15:45:19] <hashar>	 and the extended /dev/vdb disk only had 500MBytes
[15:45:33] <hashar>	 so I created an m1.medium instance with 40GB disk (and  /dev/vdb with 20GB)
[15:45:36] <ostriches>	 Ahh
[15:45:39] <hashar>	 that fixed the varnish cache file creation
[15:45:45] <hashar>	 then looped with marko to fix up parsoid
[15:45:49] <hashar>	 and subbu
[15:46:05] <hashar>	 since they reorganized parsoid files in the source repo over night, that broke Parsoid startup
[15:46:07] <hashar>	 anyway
[15:46:10] <hashar>	 it is all fine now ;:-)
[15:48:16] <ostriches>	 Yay
[15:48:24] <ostriches>	 bd808: Nothing uses the 'scap-test' dsh group, right?
[15:48:52] <bd808>	 nope. That was a short lived experiment by me and ori to restart hhvm on sync
[15:49:21] <bd808>	 The outcome: badness
[15:49:34] <ostriches>	 I thought so.
[15:49:44] <bd808>	 we couldn't get pybal to depool as desired
[15:50:01] <bd808>	 The hope is to bring it back once pybal is using etcd
[15:50:18] <bd808>	 which has been a 6m project or so at this point
[15:57:21] <shinken-wm>	 RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:59:36] <wikibugs>	 6Release-Engineering-Team, 10Wikimedia-Developer-Summit-2016: [WIP] Code-review migration status/discussion - https://phabricator.wikimedia.org/T114320#1762149 (10mmodell)
[16:00:27] <hashar>	 !log for integration/zuul.git , created branch labs-tox-deployment  to be used to deploy Zuul with pip on labs instances
[16:00:30] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:00:35] <hashar>	 cause .deb packaging is a nightmare
[16:04:40] <wikibugs>	 10Deployment-Systems, 3Scap3: Ensure that git handles `git-update-server-info` automatically - https://phabricator.wikimedia.org/T116640#1762173 (10mmodell) p:5Triage>3Normal
[16:07:24] <wikibugs>	 10Deployment-Systems, 3Scap3: default lock file for scap3 should be repo-dependent - https://phabricator.wikimedia.org/T116208#1762186 (10mmodell) p:5Triage>3Normal
[16:40:45] <wikibugs>	 10Continuous-Integration-Config, 10Graphoid, 6Services: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1762331 (10hashar) Should be as easy as editing the `zuul/layout.yaml` file in `integration/config.git` and add: ``` - name: mediawiki/services/graphoid:   templa...
[16:42:33] <grrrit-wm>	 (03PS1) 10Hashar: zuul: reorder mediawiki/services/cxserver [integration/config] - 10https://gerrit.wikimedia.org/r/249438 
[16:43:47] <grrrit-wm>	 (03PS1) 10Hashar: [graphoid] experimental npm job [integration/config] - 10https://gerrit.wikimedia.org/r/249439 (https://phabricator.wikimedia.org/T106668) 
[16:44:02] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] zuul: reorder mediawiki/services/cxserver [integration/config] - 10https://gerrit.wikimedia.org/r/249438 (owner: 10Hashar)
[16:45:04] <grrrit-wm>	 (03Merged) 10jenkins-bot: zuul: reorder mediawiki/services/cxserver [integration/config] - 10https://gerrit.wikimedia.org/r/249438 (owner: 10Hashar)
[16:45:15] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] [graphoid] experimental npm job [integration/config] - 10https://gerrit.wikimedia.org/r/249439 (https://phabricator.wikimedia.org/T106668) (owner: 10Hashar)
[16:46:12] <grrrit-wm>	 (03Merged) 10jenkins-bot: [graphoid] experimental npm job [integration/config] - 10https://gerrit.wikimedia.org/r/249439 (https://phabricator.wikimedia.org/T106668) (owner: 10Hashar)
[16:49:54] <wikibugs>	 10Deployment-Systems, 6operations, 5Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1762373 (10Dzahn) tin got firewalling during this morning's swat deploy.  that meant tin and mira are now identical and we could merge hoo's change above to reflect...
[16:50:29] <wikibugs>	 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1762378 (10hashar) The CI patch I made let you trigger the Jenkins `npm` job on Trusty slaves by commenting in Gerrit: `check experimental`....
[16:50:33] <wikibugs>	 10Continuous-Integration-Config, 6operations, 5Patch-For-Review: Forbid quoted booleans in puppet manifests - https://phabricator.wikimedia.org/T113783#1762381 (10Andrew) 5Open>3Resolved
[17:11:11] <grrrit-wm>	 (03PS2) 10Krinkle: Split WebPageTest jobs into two [integration/config] - 10https://gerrit.wikimedia.org/r/249363 (owner: 10Phedenskog)
[17:13:43] <grrrit-wm>	 (03PS3) 10Krinkle: Split WebPageTest jobs into two [integration/config] - 10https://gerrit.wikimedia.org/r/249363 (owner: 10Phedenskog)
[17:14:22] <grrrit-wm>	 (03CR) 10Krinkle: [C: 032] "Deploying.." [integration/config] - 10https://gerrit.wikimedia.org/r/249363 (owner: 10Phedenskog)
[17:16:22] <grrrit-wm>	 (03Merged) 10jenkins-bot: Split WebPageTest jobs into two [integration/config] - 10https://gerrit.wikimedia.org/r/249363 (owner: 10Phedenskog)
[17:17:05] <shinken-wm>	 RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[17:18:21] <grrrit-wm>	 (03CR) 10Chad: [C: 032] Provide scap control server FQDN to proxy sync commands [tools/scap] - 10https://gerrit.wikimedia.org/r/247965 (https://phabricator.wikimedia.org/T104826) (owner: 10BryanDavis)
[17:19:08] <grrrit-wm>	 (03Merged) 10jenkins-bot: Provide scap control server FQDN to proxy sync commands [tools/scap] - 10https://gerrit.wikimedia.org/r/247965 (https://phabricator.wikimedia.org/T104826) (owner: 10BryanDavis)
[17:22:16] <shinken-wm>	 PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[17:25:44] <wikibugs>	 5Continuous-Integration-Scaling, 6operations, 10ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1762547 (10hashar)
[17:28:28] <wikibugs>	 5Continuous-Integration-Scaling, 6operations, 5Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1762568 (10hashar) We got shell access thanks to ops reviews!  Will now look at the network flows. Once happy we can apply the zuul::merger role and d...
[17:40:06] <grrrit-wm>	 (03CR) 10Krinkle: "Live at https://integration.wikimedia.org/ci/job/performance-webpagetest-wmf/ and https://integration.wikimedia.org/ci/job/performance-web" [integration/config] - 10https://gerrit.wikimedia.org/r/249363 (owner: 10Phedenskog)
[17:52:07] <wikibugs>	 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1762647 (10mobrovac) Graphoid requires some extra pkgs to be installed on the system, cf. [the dependencies](https://github.com/wikimedia/med...
[18:16:40] <wmf-insecte>	 Project beta-scap-eqiad build #76283: 04FAILURE in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/76283/
[18:34:41] <ostriches>	 Yeah I know ^
[18:34:45] <ostriches>	 It's actually mostly working
[19:02:54] <grrrit-wm>	 (03PS1) 10Legoktm: Add config for `base-convert` repository [integration/config] - 10https://gerrit.wikimedia.org/r/249476 
[19:03:37] <grrrit-wm>	 (03PS2) 10Legoktm: Add config for `base-convert` repository [integration/config] - 10https://gerrit.wikimedia.org/r/249476 
[19:03:48] <grrrit-wm>	 (03CR) 10Legoktm: [C: 032] Add config for `base-convert` repository [integration/config] - 10https://gerrit.wikimedia.org/r/249476 (owner: 10Legoktm)
[19:05:45] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add config for `base-convert` repository [integration/config] - 10https://gerrit.wikimedia.org/r/249476 (owner: 10Legoktm)
[19:06:10] <legoktm>	 !log deploying https://gerrit.wikimedia.org/r/249476
[19:06:13] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:18:51] <grrrit-wm>	 (03PS1) 10Paladox: [Git2Pages] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/249481 
[19:19:43] <grrrit-wm>	 (03PS2) 10Paladox: [Git2Pages] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/249481 
[19:20:21] <grrrit-wm>	 (03PS3) 10Paladox: [Git2Pages] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/249481 
[19:24:03] <wikibugs>	 10Beta-Cluster-Infrastructure, 6Analytics-Engineering, 7Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1762992 (10chasemp)
[19:24:58] <wikibugs>	 10Deployment-Systems, 6operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1763007 (10chasemp) p:5Triage>3Normal
[19:25:13] <wikibugs>	 10Deployment-Systems, 6Release-Engineering-Team, 6operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1759278 (10chasemp)
[19:25:31] <wikibugs>	 10Deployment-Systems, 6Release-Engineering-Team, 6operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1759278 (10chasemp) at #release-Engineering-Team please advise :)
[19:27:45] <wikibugs>	 5Continuous-Integration-Scaling, 6operations, 10ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1763049 (10chasemp) 5Open>3stalled Let's wait on this until we have a fully realized and migrated solution here just in case so we don't end up in a "oh wait...
[19:27:51] <wikibugs>	 5Continuous-Integration-Scaling, 6operations, 10ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1763052 (10chasemp) p:5Triage>3Lowest
[20:01:10] <wikibugs>	 10Beta-Cluster-Infrastructure, 6Analytics-Engineering, 7Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1763189 (10hashar) The reason is the role classes in `modules/role/manifests/cache/statsd.pp` all have: ``` statsd_server => 'statsd.e...
[20:09:46] <wikibugs>	 10Beta-Cluster-Infrastructure, 6Analytics-Engineering, 5Patch-For-Review, 7Varnish, 7WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1761231 (10hashar)
[20:12:52] <wikibugs>	 5Continuous-Integration-Scaling, 6operations, 10ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1763238 (10hashar) Good call. We never know :-)
[20:17:45] <wikibugs>	 10Deployment-Systems, 6Release-Engineering-Team, 6operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1763243 (10hashar) The original commit is from Nov 30, 2012. I think at one point the idea was to publish the state of the repos on the de...
[20:20:05] <wikibugs>	 10Deployment-Systems, 6Release-Engineering-Team, 6operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1763254 (10chasemp) 5Open>3Resolved a:3chasemp
[20:22:01] <wikibugs>	 10Beta-Cluster-Infrastructure, 7Blocked-on-RelEng, 6operations, 7HHVM, 5Patch-For-Review: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1763256 (10chasemp)
[20:25:08] <wikibugs>	 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1763277 (10hashar) I am npm illiterate, but what is that section in package.json: ```   "deploy": {     "target": "ubuntu",       "dependenci...
[20:28:31] <wikibugs>	 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1763281 (10hashar) We will need both the binaries and -dev packages dependencies to be shipped on the CI slave.  For production the packages...
[20:28:53] <wikibugs>	 10Continuous-Integration-Infrastructure, 10MediaWiki-Documentation, 7Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1763282 (10Tgr) 3NEW
[20:29:03] <wikibugs>	 10Continuous-Integration-Infrastructure, 10MediaWiki-Documentation, 7Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1763293 (10Tgr)
[20:29:34] <hashar>	 chasemp: looked a bit at scandium.  It is unreachable from the labs instance
[20:29:49] <chasemp>	 can you outline in the task what access labs instances need to it
[20:29:54] <hashar>	 I am not sure if that is a firewall in between labs-support and the actual instances
[20:29:55] <chasemp>	 i.e. what IP and port
[20:29:58] <chasemp>	 yes
[20:29:59] <chasemp>	 there is
[20:30:03] <hashar>	 or if the iptables is broken
[20:30:03] <hashar>	 ah
[20:30:23] <chasemp>	 I can help out get it sorted but please put what we need in writing on the task and I'll look this week
[20:30:32] <hashar>	 will update the wiki doc and file a subtask to the scandium one :-)
[20:30:40] <chasemp>	 nah just use the main task
[20:30:41] <chasemp>	 that's fine
[20:30:42] <hashar>	 yeah yeah filing a task was implied
[20:30:51] <hashar>	 just wanted to quickly check whether I missed something obvious
[20:30:58] <chasemp>	 no it's restricted for sure
[20:31:05] <chasemp>	 :)
[20:37:57] <wikibugs>	 10Continuous-Integration-Infrastructure, 10MediaWiki-Documentation, 7Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1763319 (10hashar) Creating a specific job for each use cases turned out to be a nightmare. So instead the Jenkins jobs have be...
[20:54:07] <wikibugs>	 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Write a migration plan for CI infra to the disposable VMs infrastructure - https://phabricator.wikimedia.org/T86172#1763392 (10hashar)
[20:54:09] <wikibugs>	 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Design the Jenkins isolation architecture - https://phabricator.wikimedia.org/T86171#1763388 (10hashar) 5Open>3Resolved Nodepool has been deployed.  The rest of the components are being slowly added as time allow. We have just starte...
[20:54:10] <wikibugs>	 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 10releng-201415-Q3, 10releng-201415-Q4, 7Epic: [EPIC] Run CI jobs in disposable VMs - https://phabricator.wikimedia.org/T47499#1763393 (10hashar)
[21:19:06] <hashar>	 chasemp: you were right this project is epic :-}
[21:26:24] <wikibugs>	 10Browser-Tests, 5Patch-For-Review: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1763570 (10Jdlrobson)
[21:26:37] <wikibugs>	 10Browser-Tests, 5Patch-For-Review, 3Reading Web Sprint 59 - Amsterdam and the hamsters: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1668078 (10Jdlrobson)
[21:39:41] <wikibugs>	 5Continuous-Integration-Scaling, 6operations: Allow network flow between labs instance and scandium - https://phabricator.wikimedia.org/T116975#1763623 (10hashar) 3NEW a:3hashar
[21:40:38] <hashar>	 chasemp: I have filed the task, hopefully it is not going to be controversial :-}
[21:40:58] <chasemp>	 kk
[22:10:08] <hashar>	 phabricator has badges! https://secure.phabricator.com/badges/ 
[22:10:55] <wikibugs>	 10Deployment-Systems, 3Scap3, 5Patch-For-Review: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1763840 (10mmodell)
[22:10:58] <chasemp>	 yes hotly debated :)
[22:11:07] <chasemp>	 in our isntall
[22:11:09] <chasemp>	 I guess
[22:13:34] <wikibugs>	 10Deployment-Systems, 3Scap3: Ensure that git handles `git-update-server-info` automatically - https://phabricator.wikimedia.org/T116640#1763854 (10mmodell) a:3demon
[22:13:36] <hashar>	 we would face the risk of being more social
[22:14:06] <mutante>	 we already have like _and_ dislike button :)
[22:14:14] <mutante>	 but give me badges
[22:14:30] <mutante>	 gamification works great for wikidata too
[22:15:10] <chasemp>	 badges are more like barnstars something something
[22:15:14] <chasemp>	 tokens are what you mean?
[22:15:21] <mutante>	 wants to create new tokens, the pterodactyl and cactus are getting old
[22:15:35] <mutante>	 badges look like achievements unlocked
[22:15:46] <mutante>	 that i get automatically based on stats, right
[22:16:55] <mutante>	 i'll take it all, and the memes.
[22:18:11] <chasemp>	 you laugh but I've been in a convo where someone wanted to know from legal if we were liable for copyrighted memes :)
[22:19:00] <hashar>	 i would love to award CI related badges to folks :-D
[22:19:05] <hashar>	 or "you fixed beta cluster" !
[22:19:36] <chasemp>	 go for it :)
[22:20:11] <hashar>	 I have my own little ascii art barn star
[22:20:19] <hashar>	 that i paste to Gerrit changes from time to time
[22:26:45] <mutante>	 chasemp: just as much as we are when people put them on commons ?:P
[22:26:54] <mutante>	 i mean, isn't that question already solved
[22:27:00] <mutante>	 for the main wiki sites
[22:27:26] <chasemp>	 don't expect legal shenanigans to make sense
[22:27:30] <mutante>	 right
[22:27:58] <mutante>	 actually...it should be on phab like on wikis
[22:28:04] <mutante>	 you can use the images from commons
[22:28:08] <mutante>	 you don't upload locally
[22:28:15] <mutante>	 and commons handles all these questions like they normally do
[22:28:16] <mutante>	 done
[22:42:03] <shinken-wm>	 RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[22:43:00] <Krenair>	 I thought hashar got rid of that?
[22:45:53] <wikibugs>	 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1764003 (10mobrovac) >>! In T106668#1763277, @hashar wrote: > I am npm illiterate, but what is that section in package.json:  This is #servic...
[23:09:34] <wikibugs>	 10Continuous-Integration-Infrastructure, 10MediaWiki-Documentation, 7Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1764227 (10Tgr) >>! In T116965#1763319, @hashar wrote: > Creating a specific job for each use cases turned out to be a nightma...
[23:34:38] <wikibugs>	 5Continuous-Integration-Scaling, 6operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1764315 (10chasemp)
[23:34:54] <wikibugs>	 5Continuous-Integration-Scaling, 6operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1433420 (10chasemp) >>! In T104967#1761540, @fgiunchedi wrote: > afaict our puppet hooks for jessie does include `thirdparty` >  > ``` >     pac...
[23:51:39] <wmf-insecte>	 Yippee, build fixed!
[23:51:40] <wmf-insecte>	 Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #831: 09FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/831/