[03:16:26] Yippee, build fixed!
[03:16:26] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #762: FIXED in 26 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/762/
[03:40:45] (PS1) Krinkle: Enable npm for mediawiki/extensions/ShortUrl [integration/config] - https://gerrit.wikimedia.org/r/249341
[03:40:56] (CR) Krinkle: [C: 2] Enable npm for mediawiki/extensions/ShortUrl [integration/config] - https://gerrit.wikimedia.org/r/249341 (owner: Krinkle)
[03:41:59] (Merged) jenkins-bot: Enable npm for mediawiki/extensions/ShortUrl [integration/config] - https://gerrit.wikimedia.org/r/249341 (owner: Krinkle)
[03:43:30] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/249341 (
[03:43:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:26:09] Yippee, build fixed!
[04:26:09] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #605: FIXED in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/605/
[04:58:16] Deployment-Systems, Scap3: Scap3 targets should use a config file rather than `key:value` arguments - https://phabricator.wikimedia.org/T116432#1760887 (mmodell) related: {D28}
[05:05:59] so apparently scap isn't mirrored to github anymore?
[05:06:39] nevermind I'm dumb. was looking at ancient revision in github.
[05:17:16] PROBLEM - Puppet failure on deployment-cache-parsoid04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:19:29] (PS1) Legoktm: Remove jshint from ProofreadPage [integration/config] - https://gerrit.wikimedia.org/r/249349
[05:19:38] (CR) Legoktm: [C: 2] Remove jshint from ProofreadPage [integration/config] - https://gerrit.wikimedia.org/r/249349 (owner: Legoktm)
[05:26:09] (Merged) jenkins-bot: Remove jshint from ProofreadPage [integration/config] - https://gerrit.wikimedia.org/r/249349 (owner: Legoktm)
[05:26:42] !log deploying https://gerrit.wikimedia.org/r/249349
[05:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[06:30:03] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #788: FAILURE in 11 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/788/
[06:31:17] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:36:11] (CR) 20after4: [C: 1] Provide scap control server FQDN to proxy sync commands [tools/scap] - https://gerrit.wikimedia.org/r/247965 (https://phabricator.wikimedia.org/T104826) (owner: BryanDavis)
[07:11:21] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:22:25] Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1761092 (Luke081515) @Mww113: > (User rights log); 09:19 . . Luke081515 (Talk | contribs | block) changed group membership for Mww113@metawiki from (none) to administrator and confirmed user (per T116364) but I...
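The Puppet failure alerts above report the share of recent datapoints that sit above a threshold of 0.0. As arithmetic, 2 failing runs out of 9 datapoints would give the 22.22% seen on deployment-memc03. A minimal sketch of that kind of evaluation follows. It assumes the check receives a window of failure counts, for instance from Graphite, and goes CRITICAL once a cutoff percentage is reached. The 1% cutoff is a hypothetical parameter mirroring the "Less than 1.00%" recovery text; it is not the actual Icinga configuration.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Share (in %) of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # Graphite gaps come back as None
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def evaluate(datapoints: Sequence[Optional[float]], threshold: float = 0.0,
             crit_percent: float = 1.0) -> str:
    """Return an Icinga-style status line; crit_percent is a hypothetical cutoff."""
    pct = percent_above(datapoints, threshold)
    if pct >= crit_percent:
        return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
    return f"OK: Less than {crit_percent:.2f}% above the threshold [{threshold}]"
```

A single failed Puppet run inside the window is therefore enough to raise an alert, which matches how quickly these checks flap on the beta cluster.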
[08:32:00] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #765: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/765/
[09:18:30] hashar: 'round?
[09:18:39] mobrovac: bonjour :-}
[09:18:48] bonjour!
[09:18:49] beside my net being crappy yeah I am around
[09:18:54] haha
[09:18:55] just finished the checkin with zeljko
[09:18:57] what is up?
[09:19:11] deployment-parsoidcache0x is what's up
[09:19:18] they don't seem to work
[09:19:21] can't even ssh in
[09:19:51] ohh
[09:19:58] so http://parsoid-beta.wmflabs.org gives me 503s
[09:20:08] folks were playing with Parsoid yesterday evening
[09:20:15] *sigh*
[09:20:46] and the deployment-parsoidcache is still running Trusty :-(
[09:21:27] yeah ideally we should switch it to jessie
[09:21:53] deployment-parsoidcache02
[09:21:53] ubuntu-14.04-trusty (deprecated 2014-10-03)
[09:22:00] deployment-cache-parsoid04
[09:22:01] debian-8.1-jessie
[09:22:10] not sure which one is being used
[09:25:05] why do I always have to fix stuff myself
[09:26:50] mobrovac: parsoid-beta.wmflabs.org points to the public IP 208.80.155.156
[09:26:58] which does not show up on https://wikitech.wikimedia.org/wiki/Special:NovaAddress :/
[09:27:17] ah that is the shared proxy
[09:27:21] wtf
[09:27:38] uf this morning it's messy wherever i turn
[09:27:39] damn
[09:28:15] so on beta the parsoid cache is not used apparently
[09:30:38] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:31:17] mobrovac: any idea where parsoid-beta.wmflabs.org entry is configured?
[09:31:29] I mean what is the Parsoid/VE setting that points to parsoid-beta.wmflabs.org
[09:32:07] Beta-Cluster-Infrastructure, Varnish: deployment-parsoidcache02 fails puppet: Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]: Could not evaluate: Could not find init script or up... - https://phabricator.wikimedia.org/T103660#1761154
[09:32:12] my net is crap
[09:32:32] parsoid-beta is supposed to point to parsoid's cache
[09:32:57] ve and rb configs point to the internal IP of deployment-parsoidcache05 if i'm not mistaken
[09:33:36] Beta-Cluster-Infrastructure, Varnish: deployment-parsoidcache02 fails puppet: Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]: Could not evaluate: Could not find init script or up... - https://phabricator.wikimedia.org/T103660#1761155
[09:38:08] https://github.com/wikimedia/mediawiki-services-parsoid-deploy/blob/master/conf/wmf/betalabs.localsettings.js
[09:38:17] parsoidConfig.parsoidCacheURI = 'http://10.68.16.145/'; // deployment-parsoidcache01.eqiad.wmflabs
[09:38:33] that one is the Trusty instance
[09:39:05] ah
[09:39:06] wmf-config/CommonSettings-labs.php: $wmgParsoidURL = 'http://10.68.16.145'; // deployment-parsoidcache02.eqiad
[09:39:29] mobrovac: I have no idea what parsoid-beta.wmflabs.org is for
[09:40:20] mobrovac: VE has been made to hit RESTBase instead of Parsoid, hasn't it?
[09:40:32] when i curl, i can see: X-Cache: deployment-parsoidcache02 miss (0), deployment-parsoidcache02 frontend miss (0)
[09:40:36] hashar: yup
[09:41:24] parsoid-beta is supposed to be used by outside processes
[09:41:32] e.g. we use it for restbase testing in travis
[09:44:57] so parsoid-beta is some entry point which is not related to beta cluster, is it?
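mobrovac's curl shows from the X-Cache header that requests were still going through the Trusty deployment-parsoidcache02. The parser below was written against the single sample in this log and could confirm which cache hosts served a response. Two parts of it are assumptions: the entry grammar, and the "backend" label for entries that carry no layer word.

```python
import re
from typing import List, NamedTuple, Set


class CacheHop(NamedTuple):
    host: str
    layer: str   # "frontend", or "backend" (assumed) when no layer word is present
    status: str  # e.g. "miss" or "hit"
    count: int   # the number in parentheses, kept as-is


# "<host> [frontend] <status> (<n>)", as seen in the logged header value
_HOP = re.compile(r"^(\S+)\s+(?:(frontend)\s+)?(\w+)\s+\((\d+)\)$")


def parse_x_cache(value: str) -> List[CacheHop]:
    """Split an X-Cache value into one CacheHop per comma-separated entry."""
    hops = []
    for part in value.split(","):
        m = _HOP.match(part.strip())
        if m is None:
            raise ValueError(f"unrecognised X-Cache entry: {part!r}")
        host, layer, status, count = m.groups()
        hops.append(CacheHop(host, layer or "backend", status, int(count)))
    return hops


def cache_hosts(value: str) -> Set[str]:
    """Distinct cache hosts that touched the response."""
    return {hop.host for hop in parse_x_cache(value)}
```

To use it, pass in the X-Cache value from `curl -sI`. Once the proxy is switched, the set should name the new cache host instead of deployment-parsoidcache02.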
[09:45:02] else it should point to a deployment-parsoidcache instance
[09:46:10] varnish doesn't start on the Jessie cache (deployment-cache-parsoid04)
[09:46:23] uf it's a mess
[09:46:29] it should point to parsoid in beta
[09:47:19] and thanks to systemd I have very useful messages:
[09:47:20] Failed to start varnish (Varnish HTTP Accelerator).
[09:47:27] Unit varnish.service entered failed state.
[09:47:57] (PS5) Florianschmidtwelzow: Output plain html entities [tools/release] - https://gerrit.wikimedia.org/r/231317
[09:48:09] so for now, parsoid-beta.wmflabs.org points to something, but that is not in the deployment-prep labs project
[09:48:49] for file /srv/vdb/varnish.main1 failed: No space left on device !!
[09:48:54] progress
[09:49:29] f**** labs
[09:49:33] found it
[09:49:41] /srv/vdb only has 500 MB of disk
[09:49:42] :-D
[09:51:40] Beta-Cluster-Infrastructure, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761186 (hashar)
[09:53:36] Beta-Cluster-Infrastructure, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1395617 (hashar) a:hashar Deleted deployment-cache-parsoid04 which is too small. Created deployment-cache-parsoid05 a m1.medium or 40GB of disk.
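Varnish died with "No space left on device" because /srv/vdb offered only about 500 MB. A pre-flight check could catch that before the service is started. The sketch below assumes varnish-style size suffixes, with k/M/G/T read as binary multiples. The path and size you pass in are whatever your storage spec uses; nothing here is taken from the puppet role.

```python
import shutil

# Binary multipliers for size suffixes (assumed varnish-style: k, M, G, T)
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(spec: str) -> int:
    """Parse sizes such as '500M' or '10G' into bytes; bare numbers are bytes."""
    spec = spec.strip()
    unit = spec[-1].lower()
    if unit in _UNITS:
        return int(float(spec[:-1]) * _UNITS[unit])
    return int(spec)


def storage_fits(path: str, requested: str) -> bool:
    """True if the filesystem holding `path` has at least `requested` bytes free."""
    return shutil.disk_usage(path).free >= parse_size(requested)
```

On the 500 MB volume described above, `storage_fits("/srv/vdb", "1G")` would have returned False. That is the condition the m1.medium replacement instance was created to avoid.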
[09:54:27] !log beta: deleting deployment-cache-parsoid04 not enough disk space for /srv/ ( https://phabricator.wikimedia.org/T103660 )
[09:54:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:56:26] (PS1) Phedenskog: Split WebPageTest jobs into two [integration/config] - https://gerrit.wikimedia.org/r/249363
[09:56:58] mobrovac: I am going to just nuke the current Parsoid cache running Trusty
[09:57:25] PROBLEM - Host deployment-cache-parsoid04 is DOWN: CRITICAL - Host Unreachable (10.68.19.197)
[09:57:34] kk hashar, makes sense
[09:58:28] Beta-Cluster-Infrastructure, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761212 (hashar) We should delete: | deployment-parsoidcache02 | Trusty | 10.68.16.145 To be replaced with: | deployment-cache-parsoid05 | Jessie | 10.68.20.102
[09:58:38] !log Deleting deployment-parsoidcache02 (Trusty) 10.68.16.145 to be replaced with deployment-cache-parsoid05 10.68.20.102 (Jessie)
[09:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:59:26] mobrovac: would you mind preparing patches to replace 10.68.16.145 by 10.68.20.102 ? Would need updates probably in all of operations/puppet.git operations/mediawiki-config.git and mediawiki/services/parsoid/deploy.git
[09:59:36] + RESTBase if needed
[09:59:43] all patches can refer to T103660 -:)
[10:00:08] hashar: that's the cache's IP?
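hashar asks for the Trusty cache IP to be swapped for the Jessie one across several repositories. The sketch below covers only the mechanical part. It assumes local checkouts of those repositories and guesses that the relevant files end in .php, .js or .yaml; that suffix list is a hypothetical choice. The match is anchored so that an address such as 10.68.16.1450 is left alone. The resulting diff still needs human review before it goes to Gerrit.

```python
import re
from pathlib import Path
from typing import Iterable, List

OLD_IP = "10.68.16.145"   # deployment-parsoidcache02 (Trusty), being deleted
NEW_IP = "10.68.20.102"   # deployment-cache-parsoid05 (Jessie)


def replace_ip(text: str, old: str = OLD_IP, new: str = NEW_IP) -> str:
    """Replace whole occurrences of `old`; digits/dots on either side block a match."""
    pattern = re.compile(r"(?<![\d.])" + re.escape(old) + r"(?!\d)")
    return pattern.sub(new, text)


def rewrite_tree(root: str, suffixes: Iterable[str] = (".php", ".js", ".yaml")) -> List[str]:
    """Rewrite matching files under `root` in place; return the paths changed."""
    wanted = tuple(suffixes)
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        original = path.read_text(encoding="utf-8")
        updated = replace_ip(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed
```

mobrovac later notes there were no references to the IP in ops/puppet at all, so in practice one of the checkouts would simply report no changes.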
[10:00:13] yeah
[10:00:16] kk
[10:00:19] which I am rebuilding
[10:00:34] we have been using the Trusty instance ( 10.68.16.145 ) for ages
[10:00:41] but varnish is outdated there
[10:00:49] then an instance got created for Jessie
[10:00:52] but never got completed
[10:01:00] and did not have enough disk space
[10:01:57] !log applying role::cache::parsoid to deployment-cache-parsoid05
[10:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:03:34] blah la running puppet a few times
[10:04:57] root 16839 1 0 10:03 ? 00:00:00 python /usr/local/bin/varnishxcps --statsd-server=statsd.eqiad.wmnet
[10:04:57] root 17097 1 1 10:03 ? 00:00:00 python /usr/local/bin/varnishrls --statsd-server=statsd.eqiad.wmnet
[10:04:59] ....
[10:05:56] lol
[10:06:42] hashar: patch #1: https://gerrit.wikimedia.org/r/249366
[10:06:59] (no refs of that IP in ops/puppet btw)
[10:07:12] going to parsoid/deploy now
[10:08:39] Beta-Cluster-Infrastructure, Analytics-Engineering, operations, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1761231 (hashar) NEW
[10:08:40] moaar bugs
[10:09:21] hashar: on 10.68.20.102 is varnish, right?
[10:09:49] mobrovac: yeah deployment-cache-parsoid05 10.68.20.102
[10:09:52] kk
[10:09:53] varnish is up
[10:10:38] I need a break coffee/nature etc
[10:10:41] net is crap here :-/
[10:12:04] yeaaaah
[10:12:04] me too
[10:13:13] hashar: https://gerrit.wikimedia.org/r/249367 for parsoid/deploy
[10:17:28] greaat
[10:17:50] my net feels like it is year 1995 again
[10:25:11] mobrovac: +2ed both
[10:25:37] mobrovac: I have no idea whether that parsoid cache instance has any public URL
[10:26:29] I have no idea how VisualEditor communicates with Parsoid nowadays
[10:26:50] hashar: via restbase
[10:27:35] does RESTBase use the $wmgParsoidURL = 'http://10.68.20.102'; setting?
[10:28:45] hashar: RB uses https://github.com/wikimedia/operations-puppet/blob/production/hieradata/labs/deployment-prep/common.yaml#L69
[10:28:54] in beta
[10:29:57] so many configuration systems :-}
[10:30:05] so
[10:30:32] a VE user hits restbase public entry point somehow (I guess some varnish cache) then restbase directly hits the Parsoid system
[10:30:39] shouldn't it use the parsoid cache instead?
[10:30:54] no, we need to talk to parsoid directly
[10:31:11] RB is the new parsoid cache :)
[10:31:48] Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1761285 (hashar)
[10:34:28] Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1761291 (hashar) @jcrespo that is a follow up task after the beta cluster outage (T116447). Dan mentioned the beta cluster databases do not log slow queries. We thou...
[10:36:07] ah
[10:36:20] mobrovac: so does that mean we can eventually get rid of the parsoid cache soonish?
[10:36:49] exactly hashar!
[10:37:08] but we still have $wmgParsoidURL = 'http://10.68.20.102'; ..
[10:37:40] so
[10:37:40] Beta-Cluster-Infrastructure, Database: Investigate slow query logging/digest for Beta Cluster - https://phabricator.wikimedia.org/T116793#1761295 (jcrespo) Are you committing time to this? > We (RelEng) probably won't be able to commit any time to it right now --greg
[10:37:42] beta got updated
[10:37:52] I have no idea how to verify whether the parsoid cache behaves properly
[10:40:31] Error loading data from server: HTTP 504. :D
[10:41:16] http://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User%3AHashar Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
[10:41:41] what a mess
[10:41:48] I found that error via the console log
[10:42:01] and I had to curl to get the actual error:
[10:42:15] {"type":"https://restbase.org/errors/internal_http_error","method":"get","detail":"Error: connect ECONNREFUSED","uri":"http://deployment-parsoid05.deployment-prep.eqiad.wmflabs:8000/v2/en.wikipedia.beta.wmflabs.org/pagebundle/User%3AHashar/85930"}
[10:45:05] Beta-Cluster-Infrastructure, Patch-For-Review, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761312 (hashar) Doing a VE edit on beta I get: Error loading data from server: HTTP 504. In the browser console there is: http://en.wikipedia.beta...
[10:48:13] Beta-Cluster-Infrastructure, Patch-For-Review, Varnish: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761318 (hashar) The merge of https://gerrit.wikimedia.org/r/#/c/249367/ did trigger the Jenkins job `beta-parsoid-update-eqiad` with: ``` + sudo /etc/init.d/p...
[10:49:35] Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761322 (hashar)
[10:55:56] mobrovac: any clue where Parsoid logs are sent to? We used to have something like /var/lib/parsoid/parsoid.log
[10:56:18] or is that solely relying on logstash now?
[10:56:32] i think it's logstash only
[10:56:36] * mobrovac checking
[10:57:08] which is rather annoying when the service refuses to start :D
[10:57:35] I wish we had some error output to be sent to /var/log/upstart/parsoid.log
[10:57:36] but I am ranting
[10:57:55] hashar: according to the conf it's logging both to stdout and logstash
[10:58:10] hashar: so for systemd, journalctl might help you see these
[10:58:32] oh
[10:58:34] /data/project/parsoid/parsoid.log
[10:59:41] module.js:340
[10:59:41] throw err;
[10:59:41] Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761333 (hashar) Parsoid fails with: ``` module.js:340 throw err; ^ Error: Cannot find module '/srv/deployment/parsoid/p...
[10:59:44] ^
[10:59:44] Error: Cannot find module '/srv/deployment/parsoid/parsoid/api/server.js'
[10:59:49] at Function.Module._resolveFilename (module.js:338:15)
[10:59:49] at Function.Module._load (module.js:280:25)
[10:59:49] at Function.Module.runMain (module.js:497:10)
[10:59:49] at startup (node.js:119:16)
[10:59:49] at node.js:902:3
[11:02:14] Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761342 (hashar) Open>stalled stalled / pending fixup of Parsoid. Should fill another task for it.
[11:05:02] Release-Engineering-Team, Scap3, Wikimedia-Developer-Summit-2016: Scap3: updates, upgrades, and challenges - https://phabricator.wikimedia.org/T114045#1761352 (Qgil) Hi @thcipriani, this proposal is focusing on a Summit session but there is no indication about topics that could be discussed here before...
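The browser only reported a bare 504. The RESTBase JSON body fetched with curl was more useful. Its uri field names the Parsoid backend, deployment-parsoid05 on port 8000, and its detail field says the connection was refused. That fits the module error above, which shows Parsoid not running at all. The sketch below pulls those fields out of such a body and assumes only the keys seen in the logged error.

```python
import json
from urllib.parse import urlsplit


def diagnose_restbase_error(body: str) -> dict:
    """Summarise a RESTBase JSON error body: what failed and which backend it tried."""
    err = json.loads(body)
    detail = err.get("detail", "")
    upstream = urlsplit(err.get("uri", ""))  # the backend request RESTBase made
    return {
        "type": err.get("type"),
        "detail": detail,
        "backend_host": upstream.hostname,
        "backend_port": upstream.port,
        # ECONNREFUSED: nothing listening on the backend port, i.e. service down
        "connection_refused": "ECONNREFUSED" in detail,
    }
```

A refused connection, as opposed to a timeout, points at the service process rather than at the network or the cache layer.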
[11:10:28] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761372 (hashar) NEW
[11:12:25] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761372 (hashar) Most probably caused by https://gerrit.wikimedia.org/r/#/c/249138/ //T115665: Reorg parsoid repo// It renamed a bunch of files and the pupp...
[11:13:29] Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1761395 (hashar) Lot of files have been renamed in the parsoid source repo but the setting files to start the service have not been up...
[11:15:07] mobrovac: I believe the Jessie cache is fine now
[11:15:22] mobrovac: the root cause is Parsoid does not start anymore because a bunch of files got renamed in mediawiki/services/parsoid
[11:15:37] mobrovac: I have filed a task for them to adjust the /etc/default/parsoid in puppet;
[11:15:43] meanwhile VE/Parsoid will remain broken :-}
[11:16:33] still getting 503s for curl -v parsoid-beta.wmflabs.org
[11:16:33] :/
[11:29:46] Continuous-Integration-Scaling, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761440 (hashar) Holy hell, how do you manage to install servers so fast ? :-}
[11:33:45] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761460 (mobrovac) `PARSOID_BASE_PATH` and `NODE_PATH` in `/etc/default/parsoid` are wrong.
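mobrovac's diagnosis is that `PARSOID_BASE_PATH` and `NODE_PATH` in `/etc/default/parsoid` point at locations that no longer match the reorganised repository. The rough checker below targets that class of problem. It assumes the file contains only simple KEY=value shell assignments, optionally prefixed with `export`, and that `NODE_PATH` is an os.pathsep-separated list. It reads the values without sourcing the file in a shell, so variable expansion is not handled.

```python
import os
import shlex
from typing import Dict, Iterable, List


def parse_defaults(text: str) -> Dict[str, str]:
    """Parse simple shell assignments (KEY=value, optional 'export', # comments)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        parts = shlex.split(value)  # strips shell quoting; keeps the first word only
        env[key.strip()] = parts[0] if parts else ""
    return env


def missing_paths(env: Dict[str, str],
                  keys: Iterable[str] = ("PARSOID_BASE_PATH", "NODE_PATH")) -> Dict[str, List[str]]:
    """Map each key to the entries of its value that do not exist on disk."""
    missing: Dict[str, List[str]] = {}
    for key in keys:
        for entry in env.get(key, "").split(os.pathsep):  # ':' on Linux
            if entry and not os.path.exists(entry):
                missing.setdefault(key, []).append(entry)
    return missing
```

An empty result only means the directories exist. The script the init file launches, such as the `api/server.js` named in the stack trace, could still be missing.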
[11:38:24] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit, Patch-For-Review, Technical-Debt: Disable Gerrit replication to production slaves - https://phabricator.wikimedia.org/T86661#1761470 (hashar)
[11:38:26] Continuous-Integration-Scaling, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761469 (hashar)
[11:41:10] Continuous-Integration-Scaling, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761472 (hashar) Need to get rid of the Gerrit replication ( T86661 ) We will need to make sure all slaves (gallium.wikimedia.org and instances in contintcloud and inte...
[12:24:46] Continuous-Integration-Scaling, operations, Blocked-on-Operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1761540 (fgiunchedi) afaict our puppet hooks for jessie does include `thirdparty` ``` package_builder::pbuilder...
[12:32:00] (PS1) Hashar: multigit.sh: no more hardcode Zuul git URL [integration/jenkins] - https://gerrit.wikimedia.org/r/249387 (https://phabricator.wikimedia.org/T95046)
[12:32:11] (CR) Hashar: [C: 2] multigit.sh: no more hardcode Zuul git URL [integration/jenkins] - https://gerrit.wikimedia.org/r/249387 (https://phabricator.wikimedia.org/T95046) (owner: Hashar)
[12:33:04] (Merged) jenkins-bot: multigit.sh: no more hardcode Zuul git URL [integration/jenkins] - https://gerrit.wikimedia.org/r/249387 (https://phabricator.wikimedia.org/T95046) (owner: Hashar)
[12:49:25] (CR) Hashar: [C: 2] build: Pass -s to phpcs for easier debugging [tools/codesniffer] - https://gerrit.wikimedia.org/r/249320 (owner: Legoktm)
[12:49:50] (Merged) jenkins-bot: build: Pass -s to phpcs for easier debugging [tools/codesniffer] - https://gerrit.wikimedia.org/r/249320 (owner: Legoktm)
[13:31:05] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:41:03] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:43:38] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, Puppet, and 2 others: Puppetize npm/grunt manual setup - https://phabricator.wikimedia.org/T113903#1761698 (hashar) All good on permanent slaves. When https://gerrit.wikimedia.org/r/#/c/244748/ is merged, we ca...
[13:44:17] Continuous-Integration-Config: Provide a CI job to generate JS code coverage reports for extensions by using karma-coverage instead of just karma - https://phabricator.wikimedia.org/T116808#1761699 (hashar)
[13:44:53] Browser-Tests: Passed Jenkins jobs should have links to Sauce Labs jobs - https://phabricator.wikimedia.org/T48890#1761701 (hashar)
[13:47:56] Continuous-Integration-Infrastructure: phplint fails on paths containing a space - https://phabricator.wikimedia.org/T89380#1761715 (hashar) Open>Resolved a:hashar The PHP package `jakub-onderka/php-parallel-lint` is not affected. Any repo suffering from that issue should be migrated to use the com...
[13:48:16] Browser-Tests, Tracking: Move browser test alerts to responsible teams' channels from -releng - https://phabricator.wikimedia.org/T89375#1761718 (hashar)
[13:49:31] Continuous-Integration-Infrastructure: /tmp/bundler* directories left behind on Jenkins slaves - https://phabricator.wikimedia.org/T84974#1761720 (hashar) Open>Resolved a:hashar This will be fixed by moving the ruby / bundler jobs to Nodepool instances i.e. with the `rake-jessie`.
[13:49:32] Continuous-Integration-Infrastructure: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1761723 (hashar)
[13:50:03] Continuous-Integration-Infrastructure: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#759885 (hashar)
[13:50:05] Continuous-Integration-Infrastructure, Patch-For-Review: hhvm Jenkins job fill up /tmp with perf-*.map files - https://phabricator.wikimedia.org/T64788#1761725 (hashar) Open>declined a:hashar Wont fixing it for now. There is not much disk spaces issues any more and the issue will be definitely so...
[13:51:00] Continuous-Integration-Infrastructure: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1761731 (hashar) Open>Resolved a:hashar This is almost no more an issue compared to what it used to be at the end of 2014 / beginning of 2015. The complete res...
[13:51:20] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761735 (SBisson) Similar problem with mw-vagrant. It is looking for `/vagrant/srv/parsoid/src/api/server.js`. It doesn't exist but there's `/vagrant/srv/par...
[13:51:22] Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#1761736 (hashar)
[13:52:56] Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#540581 (hashar) We have the `mwext-mw-selenium` job now which is meant to be run on patchset proposal. To achieve this task we would ne...
[13:53:37] Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#1761739 (hashar)
[13:54:36] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761741 (ssastry) I am going to add a symlink for now so these continue to work. Separately, we'll update puppet (for both beta and production) -- I need to...
[13:56:18] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, Puppet, and 2 others: Puppetize npm/grunt manual setup - https://phabricator.wikimedia.org/T113903#1761752 (hashar)
[13:56:18] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Tracking: tracking: Cleanup contint puppet manifests so they are easier to reuse / split slaves by functions - https://phabricator.wikimedia.org/T110864#1761751 (hashar)
[13:56:57] Continuous-Integration-Infrastructure, Patch-For-Review: Migrate all debian-glue jobs to Jessie slaves - https://phabricator.wikimedia.org/T95545#1761756 (hashar) p:Normal>Low
[13:59:57] Continuous-Integration-Infrastructure: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1761760 (hashar) Status as October 28th 2015 ``` $ ssh gallium.wikimedia.org ls -1 /srv/ssd/jenkins-slave/workspace doc-publish-sync integration-docroot-deploy mediawiki-vagrant-puppet-doc mwext-...
[14:04:57] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761771 (hashar) For puppet, `manifests/role/parsoid.pp` vary the config with: ``` lang=ruby class role::parsoid::beta { ... # For beta, override NOD...
[14:07:43] Continuous-Integration-Infrastructure, Documentation: Document RuboCop workflow - https://phabricator.wikimedia.org/T1368#1761786 (hashar) Open>Resolved a:hashar We have moved to use `rake` as an entry point. It is described on https://www.mediawiki.org/wiki/Continuous_integration/Entry_points#R...
[14:09:19] Continuous-Integration-Infrastructure: Jenkins: Set up postmerge job to auto-deploy jenkins-job-builder configuration - https://phabricator.wikimedia.org/T49056#1761792 (hashar)
[14:09:20] Continuous-Integration-Infrastructure: Jenkins: Set up job to validate jenkins-job-builder configuration - https://phabricator.wikimedia.org/T45140#1761794 (hashar)
[14:09:22] Continuous-Integration-Infrastructure, operations: Install Jenkins Job Builder on gallium - https://phabricator.wikimedia.org/T45141#1761789 (hashar) Open>declined a:hashar For now on we deploy them manually. Maybe we will get that moved to use scap3 and have it generate the jobs directly from t...
[14:10:13] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1761796 (hashar)
[14:10:14] Continuous-Integration-Infrastructure, Tracking: Zuul: scale merge operations (tracking) - https://phabricator.wikimedia.org/T70480#1761795 (hashar)
[14:14:33] Continuous-Integration-Infrastructure, Jenkins, Upstream: Jenkins sporadically changes its UI language - https://phabricator.wikimedia.org/T90558#1761815 (hashar) Open>Resolved a:hashar I haven't heard of any complain since we upgraded Jenkins. Assuming it got fixed somehow by upstream.
[14:14:49] Beta-Cluster-Infrastructure, Parsoid, WorkType-Maintenance: Parsoid refuses to start on beta cluster - https://phabricator.wikimedia.org/T116901#1761818 (ssastry) https://gerrit.wikimedia.org/r/#/c/249392/ is the parsoid patch to add the symlink https://gerrit.wikimedia.org/r/#/c/249399/ is the puppet...
[14:15:16] Continuous-Integration-Infrastructure, Zuul: Zuul status page should show the pipelines "window" value - https://phabricator.wikimedia.org/T93701#1761820 (hashar) p:Low>Lowest
[14:15:24] Browser-Tests, Patch-For-Review: Auto retry failed browser tests to reduce false negatives - https://phabricator.wikimedia.org/T67773#1761821 (hashar)
[14:15:42] Continuous-Integration-Infrastructure: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#1761823 (hashar) Open>stalled
[14:17:53] Release-Engineering-Team: Nightly builds tested and used for production deployments - https://phabricator.wikimedia.org/T67126#1761824 (hashar)
[14:20:29] Release-Engineering-Team, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1761834 (saper) Question: wouldn't that be possible to ship the certificate as a parameter to `$wgForeignXXXR...
[15:38:37] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:38] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:38] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:39] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:40] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #672: FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/672/
[15:38:41] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:38:42] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #306: FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/306/
[15:38:48] andrewbogott: sorry about https://gerrit.wikimedia.org/r/#/c/249389/ looks like my message is a bit confusing
[15:38:48] andrewbogott: the zuul_url does not refer to the zuul/gearman server; that is another parameter
[15:38:48] the parameter is meant to be consumed by Jenkins jobs so they can retrieve the patch from the host that prepared the patch
[15:38:48] so it needs to vary by host
[15:38:49] !log rebooting deployment-parsoid05 . seems NFS is flappy
[15:38:50] !log no matter, NFS is under maintenance
[15:38:51] !sal
[15:38:51] https://tools.wmflabs.org/sal/releng
[15:38:51] ...
[15:38:51] f****** bots
[15:38:52] RECOVERY - Parsoid on deployment-parsoid05 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 0.050 second response time
[15:38:53] subbu: we even have an Icinga check for the beta cluster parsoid service ^^^ :-}
[15:38:53] oh, great!
[15:38:53] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:38:53] i was wondering that earlier.
[15:38:53] we could have caught this y'day if we had seen this.
[15:38:57] bah
[15:38:57] maybe related to NFS
[15:38:57] Error: Command exceeded timeout
[15:38:57] Error: /Stage[main]/Role::Labs::Instance/Exec[block-for-project-export]/returns: change from notrun to 0 failed: Command exceeded timeout
[15:38:57] Notice: /Stage[main]/Role::Labs::Instance/Mount[/data/project]: Dependency Exec[block-for-project-export] has failures: true
[15:38:57] Warning: /Stage[main]/Role::Labs::Instance/Mount[/data/project]: Skipping because of failed dependencies
[15:38:57] yeah
[15:38:57] all "normal"
[15:38:57] hi, git deploy sync is not working for graphoid - https://phabricator.wikimedia.org/T116920
[15:38:58] hashar, ? :)
[15:38:58] yurik: has it ever worked?
[15:38:58] hashar, i think so
[15:38:58] hashar, judging from sca01 & sca02, their git repos are pointing to bastion, not gerrit or github
[15:38:58] I have no idea how Trebuchet works with salt / minion targets and so on
[15:38:58] thcipriani, ?
[15:38:58] mobrovac, ^
[15:38:58] I guess something got broken at some point
[15:38:58] and the sca** hosts are no longer attached as minions
[15:38:58] or at least not with the grain trebuchet is expecting
[15:38:58] hashar, that's an excellent observation )))))
[15:38:58] (i was referring to "something got broken at some point" :)
[15:38:58] yeah
[15:38:58] :-D
[15:38:58] there was a problem like that in the past
[15:38:58] with sca0x
[15:38:58] I was it's weird that it says 0/0 minions. sca01 has the grain deploy_target:graphoid/deploy
[15:38:58] thcipriani fixed it by purging the redis queue or some magic like that
[15:38:58] yeah, that was a slightly different problem: it was saying only half the minions checked in, because the domain name changed.
[15:38:58] so trebuchet thought we doubled the deploy size, but only half the domains were reachable.
[15:38:58] er...hosts.
[15:38:59] bd808, ^
[15:38:59] hashar@deployment-bastion:~$ redis-cli SMEMBERS "deploy_target:graphoid/deploy"
[15:38:59] (empty list or set)
[15:38:59] thcipriani: ^^^redis breakage?
[15:38:59] Could be. It could be that deployment "worked" but the redis returner broke.
[15:38:59] hashar: ah! That makes sense
[15:39:01] andrewbogott: so you concern apply to HEAD^ :-}
[15:39:01] andrewbogott: and the patch actually stop assuming everything is on gallium (zuul.eqiad.wmnet)
[15:39:02] yeah, that part made sense
[15:39:02] (PS1) Paladox: [GuidedTour] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/249414
[15:39:02] is nutcracker up and running?
[15:39:02] andrewbogott: and I get another patch to add our shell access to scandium , which is on labs-support network ( https://gerrit.wikimedia.org/r/#/c/249380/ )
[15:39:02] nevermind.
[15:39:02] andrewbogott: though that one grant me root on the machine which is quite useful for the service implementation
[15:39:03] yurik: did you manually update sca01? It has 07a2b2f7689addce080aaa49db1b73481c019fea checked out
[15:39:03] bd808, yes, i tried to get it resolved manually
[15:39:03] didn't work either )
[15:39:03] for some reason the service graphoid restart failed
[15:39:03] (PS1) Paladox: Add new jsduck template [integration/config] - https://gerrit.wikimedia.org/r/249416
[15:39:03] !log zuul-merger will now use ZUUL_URL=git://gallium.wikimedia.org instead of ZUUL_URL=git://zuul.eqiad.wmnet ( https://gerrit.wikimedia.org/r/#/c/249389/ )
[15:39:04] yurik: most of the time if you try to fix stuff manually that breaks the automatic system even more :-}
[15:39:04] * yurik hides in shame
[15:39:05] * yurik thinks automatic systems should fix problems automatically ))
[15:39:05] yurik: worked just fine for me -- https://phabricator.wikimedia.org/T116920#1762053
[15:39:05] yurik: for what it is worth Tyler managed to deploy RESTBase with scap3 last week
[15:39:05] so potentially we will migrate graphoid to scap3 as well
[15:39:05] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #223: FAILURE in 2 min 6 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/223/
[15:39:06] (PS2) Paladox: [GuidedTour] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/249414
[15:39:06] bd808, i think i'm an idiot - i tried to deploy from bastion, and it "worked"
[15:39:06] but showed 0/0
[15:39:06] oh, wait, you did too. never mind... .weird
[15:39:06] !log beta: moved web proxy parsoid-beta.wmflabs.org to use http://deployment-cache-parsoid05.eqiad.wmflabs:80
[15:39:06] !log beta: deleting old deployment-parsoidcache02 (trusty) replaced by deployment-cache-parsoid05 (Jessie)
[15:39:06] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[15:39:06] thanks everyone for helping!! I restarted the graphoid service on both sca1 & 2, it works now!
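The "0/0" and "only half the minions checked in" symptoms both come down to how the expected target set is computed. This is a hedged model, not Trebuchet's code. It assumes, based on the `redis-cli SMEMBERS "deploy_target:graphoid/deploy"` check above, that the expected targets are the members of that Redis set and that progress counts how many of them reported back. The function name and the minion IDs in the usage note are hypothetical.

```python
from typing import Iterable


def deploy_progress(registered: Iterable[str], reported: Iterable[str]) -> str:
    """Return an "N/M" progress string for a deploy (hypothetical model).

    registered: minion IDs found in the repo's deploy_target set
    reported:   minion IDs that returned a result for this deploy
    Only minions that are both registered and reporting count towards N.
    """
    expected = set(registered)
    done = expected & set(reported)
    return f"{len(done)}/{len(expected)}"
```

With an empty set the result is `0/0` no matter which hosts actually deployed. If both old and new FQDNs of sca01/sca02 remain registered after a domain rename, only the new names report, so the count sticks at `2/4`. That is the "doubled deploy size" described above.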
[15:39:07] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #700: FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/700/
[15:39:09] bd808, how long did it take you to get from 0/2 to 2/2 ?
[15:39:09] i'm trying to sync now (without any changes), and it keep staying at 0/2. i even tried "retry"
[15:39:09] yurik: not long. 1-2 minutes?
[15:39:09] ok, will wait for a bit
[15:39:09] actually it returned that at the first timeout in the tool
[15:39:09] which is something like 30s as I recall
[15:39:09] that paste is the whole session I did
[15:39:09] ah, there it goes
[15:39:09] thx, works
[15:39:11] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 38772 bytes in 0.627 second response time
[15:39:13] Beta-Cluster-Infrastructure: Unable to trebuchet deploy Graphoid on deployment labs - https://phabricator.wikimedia.org/T116920#1762004 (Yurik) NEW
[15:39:19] Beta-Cluster-Infrastructure: Unable to trebuchet deploy Graphoid on deployment labs - https://phabricator.wikimedia.org/T116920#1762014 (Yurik)
[15:39:22] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1762016 (hashar) Could use help from #parsoid people to verify whether the new Parsoid Varnish cache is properly working....
[15:39:53] ostriches: I have migrated the last varnish cache on beta-cluster from Trusty to Jessie :-}
[15:40:29] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review, Varnish, WorkType-Maintenance: Migrate Parsoid cache from Trusty to Jessie - https://phabricator.wikimedia.org/T103660#1762097 (hashar)
[15:40:31] Beta-Cluster-Infrastructure, Traffic, operations, Patch-For-Review: Upgrade beta-cluster caches to jessie - https://phabricator.wikimedia.org/T98758#1762095 (hashar) Open>Resolved {T103660} has finally been solved. That was the last Varnish cache still using Trusty.
[15:42:05] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 38446 bytes in 1.806 second response time
[15:43:36] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30297 bytes in 0.535 second response time
[15:44:39] hashar: I had already built -04 for it
[15:44:48] I just hadn't pointed the IP yet because puppet was half-failing.
[15:45:06] yeah puppet failed because it could not create the varnish cache files on /srv/vdb
[15:45:18] because -04 was a m1.small
[15:45:19] and the extended /dev/vdb disk only had 500MBytes
[15:45:33] so I created a m1.medium instance with 40GB disk (and /dev/vdb with 20GB)
[15:45:36] Ahh
[15:45:39] that fixed the varnish cache creation file
[15:45:45] then looped with marko to fix up parsoid
[15:45:49] and subbu
[15:46:05] since they reorganized parsoid files in the source repo over night, that broke Parsoid startup
[15:46:07] anyway
[15:46:10] it is all fine now ;:-)
[15:48:16] Yay
[15:48:24] bd808: Nothing uses the 'scap-test' dsh group, right?
[15:48:52] nope. That was a short lived experiment by me and ori to restart hhvm on sync
[15:49:21] The outcome: badness
[15:49:34] I thought so.
[15:49:44] we couldn't get pybal to depool as desired
[15:50:01] The hope is to bring it back once pybal is using etcd
[15:50:18] which has been a 6m project or so at this point
[15:57:21] RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:59:36] Release-Engineering-Team, Wikimedia-Developer-Summit-2016: [WIP] Code-review migration status/discussion - https://phabricator.wikimedia.org/T114320#1762149 (mmodell)
[16:00:27] !log for integration/zuul.git , created branch labs-tox-deployment to be used to deploy Zuul with pip on labs instances
[16:00:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:00:35] cause .deb packaging is a nightmare
[16:04:40] Deployment-Systems, Scap3: Ensure that git handles `git-update-server-info` automatically - https://phabricator.wikimedia.org/T116640#1762173 (mmodell) p:Triage>Normal
[16:07:24] Deployment-Systems, Scap3: default lock file for scap3 should be repo-dependent - https://phabricator.wikimedia.org/T116208#1762186 (mmodell) p:Triage>Normal
[16:40:45] Continuous-Integration-Config, Graphoid, Services: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1762331 (hashar) Should be as easy as editing the `zuul/layout.yaml` file in `integration/config.git` and add: ``` - name: mediawiki/services/graphoid: templa...
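The "Puppet failure" alerts on deployment-parsoid05 report the share of recent datapoints above a threshold, not a single value. The sketch below reproduces the message format seen in the log. It assumes a critical threshold of 0.0 and an alert once at least 1% of datapoints exceed it. Both numbers are inferred from the alert text, not from the check's real configuration, and the function name is hypothetical.

```python
from typing import Sequence


def threshold_status(datapoints: Sequence[float], threshold: float = 0.0,
                     min_percent: float = 1.0) -> str:
    """Classify a metric series the way the alert strings in this log read.

    Counts datapoints strictly above `threshold`; the result is CRITICAL if
    their share is at least `min_percent`, OK otherwise (assumed semantics).
    """
    if not datapoints:
        return "UNKNOWN: no datapoints"
    above = sum(1 for v in datapoints if v > threshold)
    percent = 100.0 * above / len(datapoints)
    if percent >= min_percent:
        return (f"CRITICAL: {percent:.2f}% of data above the critical "
                f"threshold [{threshold}]")
    return f"OK: Less than {min_percent:.2f}% above the threshold [{threshold}]"
```

Under these assumptions, one failed run out of three samples gives the earlier `33.33%` message, and a window of all-zero samples gives the `OK: Less than 1.00%` recovery.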
[16:42:33] (PS1) Hashar: zuul: reorder mediawiki/services/cxserver [integration/config] - https://gerrit.wikimedia.org/r/249438
[16:43:47] (PS1) Hashar: [graphoid] experimental npm job [integration/config] - https://gerrit.wikimedia.org/r/249439 (https://phabricator.wikimedia.org/T106668)
[16:44:02] (CR) Hashar: [C: 2] zuul: reorder mediawiki/services/cxserver [integration/config] - https://gerrit.wikimedia.org/r/249438 (owner: Hashar)
[16:45:04] (Merged) jenkins-bot: zuul: reorder mediawiki/services/cxserver [integration/config] - https://gerrit.wikimedia.org/r/249438 (owner: Hashar)
[16:45:15] (CR) Hashar: [C: 2] [graphoid] experimental npm job [integration/config] - https://gerrit.wikimedia.org/r/249439 (https://phabricator.wikimedia.org/T106668) (owner: Hashar)
[16:46:12] (Merged) jenkins-bot: [graphoid] experimental npm job [integration/config] - https://gerrit.wikimedia.org/r/249439 (https://phabricator.wikimedia.org/T106668) (owner: Hashar)
[16:49:54] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1762373 (Dzahn) tin got firewalling during this morning's swat deploy. that meant tin and mira are now identical and we could merge hoo's change above to reflect...
[16:50:29] Continuous-Integration-Config, Graphoid, Services, Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1762378 (hashar) The CI patch I made let you trigger the Jenkins `npm` job on Trusty slaves by commenting in Gerrit: `check experimental`....
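hashar's patch makes the graphoid `npm` job run only when someone comments `check experimental` on the Gerrit change. Zuul decides this by matching the comment text against a regular expression in its layout. The pattern below is a simplified, hypothetical stand-in for the real one in `zuul/layout.yaml` and only illustrates the idea. It matches a comment that contains a line reading exactly "check experimental", case-insensitively, with an optional trailing period.

```python
import re

# Hypothetical, simplified trigger pattern (not the one in zuul/layout.yaml):
# some line of the comment must read "check experimental".
EXPERIMENTAL_RE = re.compile(r"(?im)^\s*check experimental\.?\s*$")


def triggers_experimental(comment: str) -> bool:
    """Return True if a Gerrit comment should enqueue the experimental jobs."""
    return EXPERIMENTAL_RE.search(comment) is not None
```

Gerrit comment events usually begin with a "Patch Set N:" header line, which is why the pattern is multi-line rather than anchored to the start of the whole string.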
[16:50:33] Continuous-Integration-Config, operations, Patch-For-Review: Forbid quoted booleans in puppet manifests - https://phabricator.wikimedia.org/T113783#1762381 (Andrew) Open>Resolved
[17:11:11] (PS2) Krinkle: Split WebPageTest jobs into two [integration/config] - https://gerrit.wikimedia.org/r/249363 (owner: Phedenskog)
[17:13:43] (PS3) Krinkle: Split WebPageTest jobs into two [integration/config] - https://gerrit.wikimedia.org/r/249363 (owner: Phedenskog)
[17:14:22] (CR) Krinkle: [C: 2] "Deploying.." [integration/config] - https://gerrit.wikimedia.org/r/249363 (owner: Phedenskog)
[17:16:22] (Merged) jenkins-bot: Split WebPageTest jobs into two [integration/config] - https://gerrit.wikimedia.org/r/249363 (owner: Phedenskog)
[17:17:05] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[17:18:21] (CR) Chad: [C: 2] Provide scap control server FQDN to proxy sync commands [tools/scap] - https://gerrit.wikimedia.org/r/247965 (https://phabricator.wikimedia.org/T104826) (owner: BryanDavis)
[17:19:08] (Merged) jenkins-bot: Provide scap control server FQDN to proxy sync commands [tools/scap] - https://gerrit.wikimedia.org/r/247965 (https://phabricator.wikimedia.org/T104826) (owner: BryanDavis)
[17:22:16] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[17:25:44] Continuous-Integration-Scaling, operations, ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1762547 (hashar)
[17:28:28] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1762568 (hashar) We got shell access thanks to ops reviews! Will now look at the network flows. Once happy we can apply the zuul::merger role and d...
[17:40:06] (CR) Krinkle: "Live at https://integration.wikimedia.org/ci/job/performance-webpagetest-wmf/ and https://integration.wikimedia.org/ci/job/performance-web" [integration/config] - https://gerrit.wikimedia.org/r/249363 (owner: Phedenskog)
[17:52:07] Continuous-Integration-Config, Graphoid, Services, Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1762647 (mobrovac) Graphoid requires some extra pkgs to be installed on the system, cf. [the dependencies](https://github.com/wikimedia/med...
[18:16:40] Project beta-scap-eqiad build #76283: FAILURE in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/76283/
[18:34:41] Yeah I know ^
[18:34:45] It's actually mostly working
[19:02:54] (PS1) Legoktm: Add config for `base-convert` repository [integration/config] - https://gerrit.wikimedia.org/r/249476
[19:03:37] (PS2) Legoktm: Add config for `base-convert` repository [integration/config] - https://gerrit.wikimedia.org/r/249476
[19:03:48] (CR) Legoktm: [C: 2] Add config for `base-convert` repository [integration/config] - https://gerrit.wikimedia.org/r/249476 (owner: Legoktm)
[19:05:45] (Merged) jenkins-bot: Add config for `base-convert` repository [integration/config] - https://gerrit.wikimedia.org/r/249476 (owner: Legoktm)
[19:06:10] !log deploying https://gerrit.wikimedia.org/r/249476
[19:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:18:51] (PS1) Paladox: [Git2Pages] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/249481
[19:19:43] (PS2) Paladox: [Git2Pages] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/249481
[19:20:21] (PS3) Paladox: [Git2Pages] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/249481
[19:24:03] Beta-Cluster-Infrastructure, Analytics-Engineering, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1762992 (chasemp)
[19:24:58] Deployment-Systems, operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1763007 (chasemp) p:Triage>Normal
[19:25:13] Deployment-Systems, Release-Engineering-Team, operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1759278 (chasemp)
[19:25:31] Deployment-Systems, Release-Engineering-Team, operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1759278 (chasemp) at #release-Engineering-Team please advise :)
[19:27:45] Continuous-Integration-Scaling, operations, ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1763049 (chasemp) Open>stalled Let's wait on this until we have a fully realized and migrated solution here just in case so we don't end up in a "oh wait...
[19:27:51] Continuous-Integration-Scaling, operations, ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1763052 (chasemp) p:Triage>Lowest
[20:01:10] Beta-Cluster-Infrastructure, Analytics-Engineering, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1763189 (hashar) The reason is the role classes in `modules/role/manifests/cache/statsd.pp` all have: ``` statsd_server => 'statsd.e...
[20:09:46] Beta-Cluster-Infrastructure, Analytics-Engineering, Patch-For-Review, Varnish, WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1761231 (hashar)
[20:12:52] Continuous-Integration-Scaling, operations, ops-eqiad: Reclaim SSD from labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T116936#1763238 (hashar) Good call. We never know :-)
[20:17:45] Deployment-Systems, Release-Engineering-Team, operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1763243 (hashar) The original commit is from Nov 30, 2012. I think at one point the idea was to publish the state of the repos on the de...
[20:20:05] Deployment-Systems, Release-Engineering-Team, operations: Investigate whether mod_dav needs to stay enabled on tin/terbium - https://phabricator.wikimedia.org/T116823#1763254 (chasemp) Open>Resolved a:chasemp
[20:22:01] Beta-Cluster-Infrastructure, Blocked-on-RelEng, operations, HHVM, Patch-For-Review: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1763256 (chasemp)
[20:25:08] Continuous-Integration-Config, Graphoid, Services, Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1763277 (hashar) I am npm illiterate, but what is that section in package.json: ``` "deploy": { "target": "ubuntu", "dependenci...
[20:28:31] Continuous-Integration-Config, Graphoid, Services, Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1763281 (hashar) We will need both the binaries and -dev packages dependencies to be shipped on the CI slave. For production the packages...
[20:28:53] Continuous-Integration-Infrastructure, MediaWiki-Documentation, Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1763282 (Tgr) NEW
[20:29:03] Continuous-Integration-Infrastructure, MediaWiki-Documentation, Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1763293 (Tgr)
[20:29:34] chasemp: looked a bit at scandium. It is unreachable from the labs instance
[20:29:49] can you outline in the task what access labs instances need to it
[20:29:54] I am not sure if that is a firewall in between labs-support and the actual instances
[20:29:55] i.e. what IP and port
[20:29:58] yes
[20:29:59] there is
[20:30:03] or if the iptables is broken
[20:30:03] ah
[20:30:23] I can help out get it sorted but please put what we need in writing on the task and I'll look this week
[20:30:32] will update the wiki doc and fill a subtask to the scandium one :-)
[20:30:40] nah just use the main task
[20:30:41] that's fine
[20:30:42] yeah yeah filling a task was implied
[20:30:51] just wanted to quickly check whether I missed something obvious
[20:30:58] no it's restricted for sure
[20:31:05] :)
[20:37:57] Continuous-Integration-Infrastructure, MediaWiki-Documentation, Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1763319 (hashar) Creating a specific job for each use cases turned out to be a nightmare. So instead the Jenkins jobs have be...
[20:54:07] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling: Write a migration plan for CI infra to the disposable VMs infrastructure - https://phabricator.wikimedia.org/T86172#1763392 (hashar)
[20:54:09] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling: Design the Jenkins isolation architecture - https://phabricator.wikimedia.org/T86171#1763388 (hashar) Open>Resolved Nodepool has been deployed. The rest of the components are being slowly added as time allow. We have just starte...
[20:54:10] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, releng-201415-Q3, releng-201415-Q4, Epic: [EPIC] Run CI jobs in disposable VMs - https://phabricator.wikimedia.org/T47499#1763393 (hashar)
[21:19:06] chasemp: you were right this project is epic :-}
[21:26:24] Browser-Tests, Patch-For-Review: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1763570 (Jdlrobson)
[21:26:37] Browser-Tests, Patch-For-Review, Reading Web Sprint 59 - Amsterdam and the hamsters: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1668078 (Jdlrobson)
[21:39:41] Continuous-Integration-Scaling, operations: Allow network flow between labs instance and scandium - https://phabricator.wikimedia.org/T116975#1763623 (hashar) NEW a:hashar
[21:40:38] chasemp: I have filled the task, hopefully it is not going to be controversial :-}
[21:40:58] kk
[22:10:08] phabricator has badges! https://secure.phabricator.com/badges/
[22:10:55] Deployment-Systems, Scap3, Patch-For-Review: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1763840 (mmodell)
[22:10:58] yes hotly debated :)
[22:11:07] in our install
[22:11:09] I guess
[22:13:34] Deployment-Systems, Scap3: Ensure that git handles `git-update-server-info` automatically - https://phabricator.wikimedia.org/T116640#1763854 (mmodell) a:demon
[22:13:36] we would face the risk of being more social
[22:14:06] we already have like _and_ dislike button :)
[22:14:14] but give me badges
[22:14:30] gamification works great for wikidata too
[22:15:10] badges are more like barnstars something something
[22:15:14] tokens are what you mean?
[22:15:21] wants to create new tokens, the pterodactyl and cactus are getting old
[22:15:35] badges look like achievements unlocked
[22:15:46] that i get automatically based on stats, right
[22:16:55] i'll take it all, and the memes.
[22:18:11] you laugh but I've been in a convo where someone wanted to know from legal if were liable for copyrighted memes :)
[22:19:00] i would love to award CI related badges to folks :-D
[22:19:05] or "you fixed beta cluster" !
[22:19:36] go for it :)
[22:20:11] I have my own little ascii art barn star
[22:20:19] that i paste to Gerrit changes from time to time
[22:26:45] chasemp: just as much as we are when people put them on commons ?:P
[22:26:54] i mean, isn't that question already solved
[22:27:00] for the main wiki sites
[22:27:26] don't expect legal shenanigans to make sense
[22:27:30] right
[22:27:58] actually...it should be on phab like on wikis
[22:28:04] you can use the images from commons
[22:28:08] you dont upload locally
[22:28:15] and commons handles all these questions like they normally do
[22:28:16] done
[22:42:03] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[22:43:00] I thought hashar got rid of that?
[22:45:53] Continuous-Integration-Config, Graphoid, Services, Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#1764003 (mobrovac) >>! In T106668#1763277, @hashar wrote: > I am npm illiterate, but what is that section in package.json: This is #servic...
[23:09:34] Continuous-Integration-Infrastructure, MediaWiki-Documentation, Documentation: Create a Jenkins check to verify hooks.yaml formatting - https://phabricator.wikimedia.org/T116965#1764227 (Tgr) >>! In T116965#1763319, @hashar wrote: > Creating a specific job for each use cases turned out to be a nightma...
[23:34:38] Continuous-Integration-Scaling, operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1764315 (chasemp)
[23:34:54] Continuous-Integration-Scaling, operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1433420 (chasemp) >>! In T104967#1761540, @fgiunchedi wrote: > afaict our puppet hooks for jessie does include `thirdparty` > > ``` > pac...
[23:51:39] Yippee, build fixed!
[23:51:40] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #831: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/831/