[05:01:30] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:22] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 46476 bytes in 1.740 second response time [05:36:57] Gerrit, WMF-Legal, Privacy: Using Gerrit/git requires the email registered via wikitech and ends ups being voluntary disclosed (break of privacy?) - https://phabricator.wikimedia.org/T151529#3149746 (Peachey88) [06:14:05] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [06:14:53] PROBLEM - Puppet run on saucelabs-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [06:40:23] Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #319: FAILURE in 2 hr 0 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/319/ [06:49:03] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [06:49:52] RECOVERY - Puppet run on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0] [07:56:50] hashar: re https://phabricator.wikimedia.org/T125050 [07:57:03] Any chance to get this worked around in some way? [08:12:49] hashar: hello!
After https://gerrit.wikimedia.org/r/#/c/345810 deployment-jobrunner02 seems broken [08:13:30] 10Beta-Cluster-Infrastructure: Puppet at deployment-tin is not running - https://phabricator.wikimedia.org/T162016#3149883 (10Luke081515) [08:13:33] afaics it is a bit unclear if the deployment-prep config is best to be put in Hiera puppet or https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [08:13:37] 10Beta-Cluster-Infrastructure: Puppet at deployment-tin is not running - https://phabricator.wikimedia.org/T162016#3149883 (10Luke081515) p:05Triage>03High [08:14:19] theoretically mediawiki_session_redis_servers should become "sessions" under "redis::shards" [08:15:40] (also "redis::shards" under deployment-prep/common is not up to date with https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep) [08:15:51] not sure how we want to amend this inconsistency [08:15:57] will wait for your expert opinion :) [08:25:46] 10Browser-Tests-Infrastructure, 07Documentation, 07Easy: audit/update headers in files - https://phabricator.wikimedia.org/T69141#713980 (10zeljkofilipin) This is a rough list of files that has string `qa-browsertests` https://github.com/search?q=org%3Awikimedia+qa-browsertests&type=Code File headers shoul... 
[08:29:02] elukey: Giuseppe filled a task about how beta+hiera is a mess [08:29:39] elukey: there are too many sources: wikitech Hiera namespace, Horizon, /hieradata/labs/deployment-prep in puppet.git and the cherry picks [08:29:39] etc [08:29:53] there are probably a dozen place where one can mess with hiera settings :( [08:30:20] the stance is more or less that: [08:30:47] - some dislike horizon because it lacks history of actions [08:31:02] - puppet.git lacks access to non opsen [08:31:23] - cherry picks are not easy to find out (have to look on the puppet master) [08:31:32] - people end up using wikitech :D [08:31:35] but yeah that is all inconsitent [08:32:14] hoo: re Scribunto in gate, beside the short analysis I did last week, no I am not looking into it :\ [08:32:49] hashar: I see :/ Could we try switching away from LuaSandbox for now or something? [08:33:01] This makes all of our gate and submits fail [08:33:05] which is super annoying [08:33:09] hashar: any preferred way to fix "redis::shards" ? [08:34:22] elukey: looks like it is currently on https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep so probably easier to fix it there? [08:34:59] hashar: looks like it.. Mind if I add redis::shards sessions in there? [08:35:32] elukey: that is probably the easiest for now :-} [08:36:27] hoo: has anyone at least tried to reproduce and track the memory usage? [08:50:11] hashar: Not as far as I know, I don't have Lua Sandbox myself [08:54:09] RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [08:55:29] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [08:56:45] elukey: looks good now? ^^ [08:57:12] yep! 
:) [08:57:44] hashar: now second question :) [08:58:24] I'd like to live-hack jobrunner02 (/usr/src/hhvm/hphp/system/php/redis/Redis.php) to remove the QUIT command [08:58:37] and see if the RST go away [09:00:50] elukey: well I guess that Redis.php class is compiled/embedded in hhvm itself [09:01:00] RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [09:01:16] RECOVERY - Puppet run on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:01:35] hashar: what I want to try is to change the file and then restart hhvm [09:01:43] potentially one can hack the jobrunner PHP service [09:01:46] theoretically that php file should end up in the jit cache [09:01:51] copy paste the HHVM Redis class to something like: RedisDebug [09:02:03] and hack the src/RedisJobService.php to do : new RedisDebug(); [09:02:32] or maybe we can extend the built-in Redis [09:03:03] hashar: do you think that /usr/src/hhvm/hphp/system/php/redis/Redis.php is not re-evaluated when hhvm starts? [09:03:51] let me try a monkey patch [09:06:57] elukey: what is the bug # already? [09:07:21] https://github.com/facebook/hhvm/issues/7757 [09:07:35] ah you mean the phab task [09:07:36] ?? 
[09:07:52] T125735 [09:07:53] T125735: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735 [09:08:29] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [09:09:52] elukey: https://gerrit.wikimedia.org/r/#/c/346117/1/src/RedisJobService.php [09:10:01] is the absolutely most horrible code I can come up with [09:10:19] that switch the jobrunner service to use a new class RedisMonkeyPatched instead of Redis [09:10:21] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [09:10:25] which implements a different close() method [09:10:43] so maybe we can cherry pick that on jobrunner02 under /srv/deployment/mediawiki/xxxxx/something/yyy [09:10:46] restart the jobrunner service [09:10:49] and see what happen [09:10:58] (my guess is: my code will fatal out somehow) [09:11:55] zeljkof: ^^^ sorry been messing up with hhvm/redis etc :} [09:12:55] hashar: joining the hangout? or want to skip today? 
[09:15:00] hashar: I'll follow your lead, but didn't want to waste a lot of your time, just do a quick test :( [09:15:53] elukey: let me cherry pick the patch and restart the service [09:16:17] at some point I'll have to ship beers to you hashar [09:16:52] I avoid real life events nowadays [09:17:00] I get flooded by beers the minute I arrive at the venue [09:17:33] !log deployment-jobrunner02 : cherry picked a monkey patch for Redis::close() to prevent it from sending QUIT command ( https://gerrit.wikimedia.org/r/#/c/346117/ ) - T125735 [09:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:17:37] T125735: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735 [09:18:07] the jobrunner is a mess really [09:18:33] hashar hi, (not important can be read any time) But upstream have a patch for using the soy template https://gerrit-review.googlesource.com/#/c/100962/ i've tested it with my follow up https://gerrit-review.googlesource.com/#/c/101733/ [09:18:45] no configuration or changes needed for us either too :) [09:18:46] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3150082 (10WMDE-leszek) The problem reported in T161698 blocks merging any of Wikibase patches, so this is becoming more pressing for Wikidat... [09:18:55] it uses the canonical config [09:19:24] elukey: so assuming some jobs are being run, the jobrunner should no more send QUIT to redis [09:19:33] * elukey runs tcpdump [09:19:55] also there are alot of users (by lots i mean a few) aggree we should backport it to stable-2.14 [09:20:05] google employees too [09:20:36] paladox: the Soy template is to make polygerrit easier to hack ? 
[09:20:43] Yes [09:20:47] Well i think it is [09:21:21] This is on the issue page "Rather than serving the static file, serve index.html via a Soy template." [09:21:24] paladox: we can still cherry pick the patch when building our gerrit 2.14 package [09:22:22] RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [09:22:33] One problem though https://gerrit-review.googlesource.com/#/c/100962/ dosen't hook it up to any of the urls inside polygerrit's js yet. It only implements it so it can be done in a follow up. He is doing it step by step. But they are going to fix it on stable-2.14. [09:22:50] But with this patch https://gerrit-review.googlesource.com/#/c/101733/ that hooks it up. [09:24:18] hashar: I am running tcpdump -n -v 'tcp[tcpflags] & (tcp-rst) != 0 and (host 10.68.16.177 or host 10.68.16.231)' and I can definitely see a lot less RST going [09:24:34] (a lot less is horrible, just realized it) [09:24:44] a lot less hmm [09:24:45] so less ? [09:24:47] ;-} [09:24:51] hahaha yes [09:25:01] hashar i got feedback on https://github.com/jenkinsci/trilead-ssh2/pull/9#issuecomment-285813590 :) [09:25:15] but still see them, so I am wondering if we are indeed not sending the qui [09:25:18] *quit [09:25:28] I'll try to check capturnins some traffic [09:29:49] elukey: there is another service running [09:29:52] the ChronRunner or something [09:30:17] and the services hit MediaWiki locally on /rpc.php or something [09:30:24] jobchron? 
[09:30:30] yeah jobchron [09:30:33] and MediaWiki itself might well invoke redis for other things [09:30:54] class RedisJobChronService extends RedisJobService [09:31:21] so now the jobchron should use the same hack [09:31:54] elukey: I have restarted jobchron as well :D [09:32:13] so it should have the same code running now [09:32:36] hashar: confirmed that the RST on jobrunner02 were due to some QUIT commands [09:32:47] \O/ [09:34:06] well that means that we are still sending the QUITs :P [09:34:25] checking again [09:36:43] not seeing any RST :) [09:42:41] too soon, I can see the RST [09:42:56] that can be something else [09:43:24] I am re-running tcpdump to check traffic :) [09:44:31] looks like jobrunner uses deployment-redis01 10.68.16.177 [09:44:50] maybe the RST are for the other instance redis02 10.68.16.231 [09:47:34] or they are RST from some other service [09:48:50] nono I can see RST from jobrunner to 10.68.16.177 [09:48:57] and QUIT commands in wireshard [09:49:08] (the follow TCP stream option is life saving) [09:50:46] let's try to restart hhvm (doing it now) [09:52:58] :( [09:56:27] they have different caches though [09:56:43] jobrunner / jobchron are run from command line, I would expect the cache file to be /var/cache/hhvm/cli.hhbc.sq3 [09:56:56] while the hhvm service would be the fcgi.hhbc.sq3 file [09:57:09] we can always nuke them and restart all services [09:57:58] yep.. I brutally commented one line in /usr/src/hhvm/hphp/system/php/redis/Redis.php and restarted hhvm to see if it works [10:00:25] oh [10:00:43] I don't think that is used [10:03:02] yeah unsuccessfull attemp [10:03:07] *attempt [10:03:07] :) [10:03:08] 10Browser-Tests-Infrastructure, 10MediaWiki-General-or-Unknown, 07JavaScript, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3150118 (10zeljkofilipin) [10:03:23] hashar: next step brutally clear cache? 
[10:03:39] yeah [10:03:48] and restart everything :D [10:03:57] then the RST sent could be anything [10:07:42] Beta-Cluster-Infrastructure: Puppet at deployment-tin is not running - https://phabricator.wikimedia.org/T162016#3150123 (hashar) Open>Resolved a:elukey Catalog failed with: ``` Error: Could not retrieve catalog from remote server: Error 400 on SERVER: redis_servers is not a hash or array when ac... [10:23:47] hashar: can still see the RSTs and the QUIT commands via wireshark/tcpdump [10:24:07] still to redis01 ? [10:24:13] maybe that is some other things [10:24:24] or my patch is plain wrong :-} [10:26:06] yeah.. [10:29:35] elukey: on jobrunner02 I don't see RST packets via tshark -n -f 'host 10.68.16.177' [10:30:42] it takes a bit before getting them [10:32:13] I am using sudo tcpdump -n -v 'tcp[tcpflags] & (tcp-rst) != 0 and (host 10.68.16.177 or host 10.68.16.231)' [10:33:33] really rare though [10:34:09] why am I seeing the QUIT command in the tcpdump logs though? [10:34:15] grrr difficult mondays [10:34:37] 10.68.19.42.51200 > 10.68.16.177.6379: Flags [R], cksum 0xa30b (correct), seq 1225108097, win 0, length 0 [10:34:40] this is an example [10:38:27] at least there are no more QUIT sent [10:39:00] though I don't know whether they are reported by redis-cli MONITOR [10:40:13] elukey: most of the RST spam is gone isn't it ? [10:40:43] what would be ideal is to capture the whole TCP sequence that eventually contains a RST [10:40:52] so we can analyze what redis command got emitted in that session [10:41:24] yes this is what I have been doing, using "Follow tcp stream" in wireshark to check the data exchanged by jobrunner and redis..
[10:41:30] each time finding QUIT [10:43:48] ahhh [10:43:53] but now I checked random packets (following their stream) and QUIT is not there [10:44:02] so there might be a QUIT hiding somewhere :D [10:44:50] I haven't audited the whole code [11:07:07] hasharLaunch https://github.com/jenkinsci/trilead-ssh2/pull/13/files [11:07:15] suppor for edsa keys in trilead-ssh2 [11:07:16] :) [11:07:21] suppor = support [11:08:05] PROBLEM - Puppet run on integration-c1 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [11:37:31] 10Continuous-Integration-Infrastructure, 07Jenkins, 07Upstream, 07WorkType-NewFunctionality: Jenkins trilead-ssh2 doesn't support our MAC/KEX algorithms - https://phabricator.wikimedia.org/T103351#3150299 (10Paladox) @hashar even better see this https://github.com/jenkinsci/trilead-ssh2/pull/13 and https:/... [12:13:52] (03PS3) 10Hashar: Move fundraising dash from Node 0.10 to Node 6 [integration/config] - 10https://gerrit.wikimedia.org/r/345571 (https://phabricator.wikimedia.org/T99869) [12:16:36] (03CR) 10Hashar: [C: 032] Drop node-0.10 support [integration/config] - 10https://gerrit.wikimedia.org/r/345577 (https://phabricator.wikimedia.org/T161884) (owner: 10Hashar) [12:16:51] (03CR) 10Hashar: [C: 032] Move fundraising dash from Node 0.10 to Node 6 [integration/config] - 10https://gerrit.wikimedia.org/r/345571 (https://phabricator.wikimedia.org/T99869) (owner: 10Hashar) [12:17:02] (03PS3) 10Hashar: Drop node-0.10 support [integration/config] - 10https://gerrit.wikimedia.org/r/345577 (https://phabricator.wikimedia.org/T161884) [12:17:09] (03CR) 10Hashar: Drop node-0.10 support [integration/config] - 10https://gerrit.wikimedia.org/r/345577 (https://phabricator.wikimedia.org/T161884) (owner: 10Hashar) [12:18:20] (03Merged) 10jenkins-bot: Move fundraising dash from Node 0.10 to Node 6 [integration/config] - 10https://gerrit.wikimedia.org/r/345571 (https://phabricator.wikimedia.org/T99869) (owner: 10Hashar) [12:19:35] (03CR) 
10Hashar: [C: 032] Drop node-0.10 support [integration/config] - 10https://gerrit.wikimedia.org/r/345577 (https://phabricator.wikimedia.org/T161884) (owner: 10Hashar) [12:20:26] 10Continuous-Integration-Infrastructure (phase-out-trusty), 13Patch-For-Review: Migrate NodeJS Nodepool jobs from Trusty to Jessie - https://phabricator.wikimedia.org/T161884#3150434 (10hashar) 05Open>03Resolved a:03hashar [12:21:03] (03Merged) 10jenkins-bot: Drop node-0.10 support [integration/config] - 10https://gerrit.wikimedia.org/r/345577 (https://phabricator.wikimedia.org/T161884) (owner: 10Hashar) [12:29:24] (03PS1) 10Hashar: Drop skins/chameleon moved to github [integration/config] - 10https://gerrit.wikimedia.org/r/346137 [12:33:11] (03CR) 10Hashar: [C: 032] Drop skins/chameleon moved to github [integration/config] - 10https://gerrit.wikimedia.org/r/346137 (owner: 10Hashar) [12:34:39] (03Merged) 10jenkins-bot: Drop skins/chameleon moved to github [integration/config] - 10https://gerrit.wikimedia.org/r/346137 (owner: 10Hashar) [12:46:04] 10Continuous-Integration-Config, 06Discovery, 10Wikimedia-Portals, 03Discovery-Portal-Sprint: CI tests on wikimedia/portals repo: cache node_modules to save time - https://phabricator.wikimedia.org/T152386#3150486 (10hashar) [12:46:06] 10Continuous-Integration-Config: Castor: mediawiki-core-qunit-jessie node_modules cache ineffective - https://phabricator.wikimedia.org/T159591#3150485 (10hashar) [12:47:51] 10Continuous-Integration-Config: Castor: mediawiki-core-qunit-jessie node_modules cache ineffective - https://phabricator.wikimedia.org/T159591#3072606 (10hashar) I found a good candidate: wikimedia/portals @Jdrewniak filled T152386 to get node_modules cached. So I guess we can try to switch it to `npm prune &&... 
[12:49:33] 10Continuous-Integration-Config, 06Discovery, 10Wikimedia-Portals, 03Discovery-Portal-Sprint: CI tests on wikimedia/portals repo: cache node_modules to save time - https://phabricator.wikimedia.org/T152386#3150492 (10hashar) That came around on T159591 which is more generic. In short instead of doing: ```... [12:53:23] (03PS1) 10Hashar: jjb: rm unused {name}-npm-run-{script} [integration/config] - 10https://gerrit.wikimedia.org/r/346142 [12:57:54] (03CR) 10Hashar: [C: 032] "Noop :}" [integration/config] - 10https://gerrit.wikimedia.org/r/346142 (owner: 10Hashar) [12:59:07] (03Merged) 10jenkins-bot: jjb: rm unused {name}-npm-run-{script} [integration/config] - 10https://gerrit.wikimedia.org/r/346142 (owner: 10Hashar) [13:38:57] hashar woohoo i fixed almost all cases of fixing the links in https://gerrit-review.googlesource.com/#/c/101733/ :) tested all links locally and found none breaking. I even fixed the footer link :) [13:41:47] paladox: \O/ [13:41:58] Im deploying it to gerrit-new now :) [13:44:33] is it sad that i had to quote word for word wmf mission statement to prove a point xD [13:44:50] anyway hows jenkins holding up with precise being out of the picture [13:45:07] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:49:10] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [13:50:32] 06Release-Engineering-Team, 06Operations, 06Services, 05Goal, 07kubernetes: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3150656 (10akosiaris) [14:01:53] (03PS1) 10Hashar: Cache node_modules [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) [14:03:40] (03PS2) 10Hashar: Cache node_modules [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) [14:05:15] (03PS3) 10Hashar: Cache 
node_modules [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) [14:08:41] (03CR) 10Hashar: "We can probably give this a try by:" [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) (owner: 10Hashar) [14:12:33] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3150790 (10hoo) p:05Normal>03High [14:14:04] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3150798 (10Paladox) We should try and make scribuntu more performant i.e. performance wise. Or at least skip the tests that cause the tests t... [14:20:05] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [14:22:41] 06Release-Engineering-Team, 06Operations, 05Goal, 06Services (designing), and 2 others: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3150814 (10mobrovac) [14:27:50] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3150819 (10hoo) >>! In T125050#3150798, @Paladox wrote: > We should try and make scribuntu more performant i.e. performance wise. Or at least... 
[14:39:46] 10Continuous-Integration-Infrastructure (phase-out-trusty): Migrate PHP5.5 jobs from Trusty to Jessie - https://phabricator.wikimedia.org/T161882#3150861 (10hashar) [14:39:48] 10Continuous-Integration-Config, 13Patch-For-Review: Combine composer-php55 and composer-hhvm jobs - https://phabricator.wikimedia.org/T142457#3150862 (10hashar) [14:39:50] 10Continuous-Integration-Infrastructure (phase-out-trusty): Install PHP5.5 on jessie CI instances - https://phabricator.wikimedia.org/T144959#3150859 (10hashar) 05Open>03declined Unsurprisingly sury.org no more provides PHP 5.5 packages since it has reached end of life. https://www.patreon.com/posts/php-5-5-... [14:42:06] 10Continuous-Integration-Infrastructure (phase-out-trusty): Migrate PHP5.5 jobs from Trusty to Jessie - https://phabricator.wikimedia.org/T161882#3150866 (10hashar) sury.org no more provides PHP 5.5 packages since it has reached end of life and I closed the related sub task T144959. I am tempted to phase out PH... [15:00:50] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3150901 (10Anomie) >>! In T125050#3150798, @Paladox wrote: > We should try and make scribuntu more performant i.e. performance wise. Or at le... [16:17:40] RainbowSprinkles and mutante great news, upstream have created https://gerrit-review.googlesource.com/#/c/100962/ (using soy to create index.html for polygerrit) It only implements the backend. A follow up will fix the links. Anyways it should land on stable-2.14 hopefully soon as they aggree it should be on there. I did a follow up here https://gerrit-review.googlesource.com/#/c/101733/ for me to test and it indeeds works. I fixed all the [16:17:40] links i see no breakages. The footer link works in gwt and polygerrit now (for prefixed urls). [16:18:23] no rewrites needed. and no configuation changes needed too. 
Uses the canonical config in gerrit which we have already set. [16:29:35] 10Browser-Tests-Infrastructure, 10MediaWiki-General-or-Unknown, 07JavaScript, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3151313 (10zeljkofilipin) [18:23:33] 10Continuous-Integration-Config: Raise priority for operations-mw-config-composer-hhvm-jessie from the gate-and-submit pipeline - https://phabricator.wikimedia.org/T162076#3151697 (10Dereckson) [18:33:07] 10Continuous-Integration-Config: Raise priority for operations-mw-config-composer-hhvm-jessie from the gate-and-submit pipeline - https://phabricator.wikimedia.org/T162076#3151697 (10Paladox) It is already higher priority - name: operations/mediawiki-config check: - operations-mw-config-php55lint... [18:53:24] PROBLEM - Puppet run on integration-slave-docker-1000 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:12:35] hashar the polygerrit changes was merged in master and stable-2.14 :), now all there needs to be is a follow up to hook up the implementation. [19:21:34] paladox: when do we start using polygerrit [19:21:41] in prod [19:25:39] paladox: nice :) [19:26:40] Zppix: subscribe to the task https://phabricator.wikimedia.org/T156120 and eventually when we decide to plan the upgrade some activity will happen on that task [19:27:29] is there a way to unsubscribe from things that were subscribed to months ago [19:27:30] only [19:33:23] RECOVERY - Puppet run on integration-slave-docker-1000 is OK: OK: Less than 1.00% above the threshold [0.0] [19:37:10] yep also Zppix it will happen when ever we upgrade to gerrit 2.14 + when ever upstream finish doing the fixes. 
[19:40:10] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Raise priority for operations-mw-config-composer-hhvm-jessie from the gate-and-submit pipeline - https://phabricator.wikimedia.org/T162076#3152078 (10hashar) [20:00:10] 10Scap (Scap3-Adoption-Phase1), 10RESTBase, 06Services (doing), 15User-mobrovac: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3152129 (10mobrovac) a:03mobrovac [20:00:25] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Raise priority for operations-mw-config-composer-hhvm-jessie from the gate-and-submit pipeline - https://phabricator.wikimedia.org/T162076#3152132 (10hashar) Zuul keeps metrics for the various pipeline / project etc. I created a graph that represent... [20:03:28] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:19:28] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [20:37:02] !log jenkins: disabled/reenabled gearman plugin to unlock the beta cluster related jobs [20:37:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:39:03] !log Nodepool: holding instance ci-trusty-wikimedia-597386 in an attempt debug Wikibase/Scribunto memory usage exploding T125050 [20:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:39:06] T125050: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050 [20:43:16] !log Update mobileapps to fdd4e31 [20:43:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:02:00] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3152295 (10hashar) I manually rebuild mwext-testextension-php55-composer-trusty for Wikibase. 
Instructed nodepool to not delete the instance... [21:07:03] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3152305 (10hashar) Next thing. On Jenkins we have a monitoring system named JavaMelody which can be used to inspect an instance, specially th... [21:35:28] hashar i guess we can go all the way upto eddsa for the ssh key :) https://github.com/jenkinsci/trilead-ssh2/pull/12 [21:35:31] https://github.com/jenkinsci/trilead-ssh2/pull/13 [21:42:05] who knew i will be calling my mail oath soon http://www.theverge.com/2017/4/3/15166872/aol-verizon-oath-announced-merger-rebranding-new-name-logo [22:43:27] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [23:15:53] PROBLEM - Puppet run on saucelabs-03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:16:06] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [23:43:49] https://cloudplatform.googleblog.com/2017/03/how-release-canaries-can-save-your-bacon-CRE-life-lessons.html [23:50:50] RECOVERY - Puppet run on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0] [23:52:53] (03CR) 10Krinkle: Cache node_modules (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) (owner: 10Hashar) [23:53:06] (03CR) 10Krinkle: [C: 04-1] Cache node_modules [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) (owner: 10Hashar)