[00:00:45] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:09:35] bd808: okay, I'll take a look
[00:20:03] Project beta-update-databases-eqiad build #10871: STILL FAILING in 2.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10871/
[00:27:37] Beta-Cluster-Infrastructure, Revision-Scoring-As-A-Service: deployment-sca03 can't call puppetmaster - https://phabricator.wikimedia.org/T143958#2584624 (mmodell) Looks like that error is related to {bf86143048c9ef821f37f0bfe5e1d2b72ce6d4a5}, that modified the pick_initscript function which probably requ...
[00:28:47] !log restarted puppetmaster service on deployment-puppetmaster
[00:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[00:30:34] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review, User-Ladsgroup: Switch beta to use the proper wiki models for scoring (rather than "testwiki") - https://phabricator.wikimedia.org/T143567#2584704 (mmodell)
[00:30:37] Beta-Cluster-Infrastructure, Revision-Scoring-As-A-Service: deployment-sca03 can't call puppetmaster - https://phabricator.wikimedia.org/T143958#2584701 (mmodell) Open>Resolved a:mmodell After restarting the puppetmaster, a puppet run on `deployment-sca03` succeeded: `twentyafterfour@deploym...
[00:31:07] Amir1: fixed puppet on deployment-sca03
[00:38:40] Beta-Cluster-Infrastructure, Release-Engineering-Team: Beta puppetmaster cherry-pick process - https://phabricator.wikimedia.org/T135427#2584745 (mmodell) Thanks @mobrovac. I think we are in pretty good shape.
[00:45:38] Release-Engineering-Team (Long-Lived-Branches), Scap3: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2584759 (mmodell) p:Triage>High
[01:20:02] Project beta-update-databases-eqiad build #10872: STILL FAILING in 0.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10872/
[01:36:30] (CR) Krinkle: Use WebPageTest.org key on WMF WebPageTest to support relay server (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: Phedenskog)
[01:36:53] (PS2) Phedenskog: Use WebPageTest.org key on WMF WebPageTest to support relay server [integration/config] - https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964)
[01:37:02] (PS3) Krinkle: Use WebPageTest.org key on WMF WebPageTest to support relay server [integration/config] - https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: Phedenskog)
[01:59:15] Continuous-Integration-Config, Regression: integration-zuul-layoutdiff claims difference when there is none - https://phabricator.wikimedia.org/T143966#2584877 (Krinkle)
[02:00:14] (CR) Krinkle: [C: 2] "Recompiled 'performance-webpagetest-*' (2 jobs) and published to Jenkins."
[integration/config] - https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: Phedenskog)
[02:02:05] (Merged) jenkins-bot: Use WebPageTest.org key on WMF WebPageTest to support relay server [integration/config] - https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: Phedenskog)
[02:20:01] Project beta-update-databases-eqiad build #10873: STILL FAILING in 0.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10873/
[03:07:47] greg-g: heads-up https://phabricator.wikimedia.org/T143968 Not sure since when this started happening.
[03:20:01] Project beta-update-databases-eqiad build #10874: STILL FAILING in 0.86 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10874/
[03:56:23] Project beta-scap-eqiad build #117146: FAILURE in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117146/
[03:58:41] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,contintLabsSlave && UbuntuTrusty build #122: FAILURE in 2 min 40 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/122/
[04:01:53] 03:56:23 __main__.CheckServiceError: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search
[04:02:33] _search doesn't work anymore
[04:02:39] is that from scap?
[04:03:16] oh wait...
that should work
[04:03:34] :9200 is elasticsearch directly
[04:06:20] Project beta-scap-eqiad build #117147: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117147/
[04:15:06] PROBLEM - SSH on deployment-logstash2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:16:16] Project beta-scap-eqiad build #117148: STILL FAILING in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117148/
[04:18:47] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #122: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/122/
[04:20:02] Project beta-update-databases-eqiad build #10875: STILL FAILING in 0.91 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10875/
[04:26:25] Project beta-scap-eqiad build #117149: STILL FAILING in 1 min 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117149/
[04:36:16] Project beta-scap-eqiad build #117150: STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117150/
[04:46:16] Project beta-scap-eqiad build #117151: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117151/
[04:56:20] Project beta-scap-eqiad build #117152: STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117152/
[05:06:19] Project beta-scap-eqiad build #117153: STILL FAILING in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117153/
[05:16:16] Project beta-scap-eqiad build #117154: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117154/
[05:20:01] Project beta-update-databases-eqiad build #10876: STILL FAILING in 0.86 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10876/
[05:26:19]
Project beta-scap-eqiad build #117155: STILL FAILING in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117155/
[05:32:43] Beta-Cluster-Infrastructure: beta-scap-eqiad failing: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search - https://phabricator.wikimedia.org/T143973#2585024 (Legoktm)
[05:32:55] Beta-Cluster-Infrastructure: beta-scap-eqiad failing: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search - https://phabricator.wikimedia.org/T143973#2585036 (Legoktm) p:Triage>Unbreak!
[05:34:16] and also https://phabricator.wikimedia.org/T143974
[05:36:17] Project beta-scap-eqiad build #117156: STILL FAILING in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117156/
[05:46:17] Project beta-scap-eqiad build #117157: STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117157/
[05:56:20] Project beta-scap-eqiad build #117158: STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117158/
[06:06:16] Project beta-scap-eqiad build #117159: STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117159/
[06:16:14] Project beta-scap-eqiad build #117160: STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117160/
[06:20:01] Project beta-update-databases-eqiad build #10877: STILL FAILING in 0.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10877/
[06:26:20] Project beta-scap-eqiad build #117161: STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117161/
[06:29:57] RECOVERY - SSH on deployment-logstash2 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[06:36:22] Project beta-scap-eqiad build #117162: STILL FAILING in 1 min 48 sec:
https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117162/
[06:46:17] Project beta-scap-eqiad build #117163: STILL FAILING in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117163/
[06:56:26] Project beta-scap-eqiad build #117164: STILL FAILING in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117164/
[07:06:29] Project beta-scap-eqiad build #117165: STILL FAILING in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117165/
[07:16:21] Project beta-scap-eqiad build #117166: STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117166/
[07:20:01] Project beta-update-databases-eqiad build #10878: STILL FAILING in 0.73 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10878/
[07:26:25] Project beta-scap-eqiad build #117167: STILL FAILING in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117167/
[07:36:19] Project beta-scap-eqiad build #117168: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117168/
[07:46:19] Project beta-scap-eqiad build #117169: STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117169/
[07:54:16] !log cherry-picked 306839/1 into deployment-puppetmaster
[07:54:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[07:54:22] good morning
[07:56:22] Project beta-scap-eqiad build #117170: STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117170/
[08:00:48] !log beta-scap-eqiad failing investigating
[08:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:01:32] 00:01:38.036 07:56:12 Executing check 'Logstash Error rate for deployment-mediawiki01.deployment-prep.eqiad.wmflabs'
[08:01:32] bah
[08:01:40] Timeout on connection while downloading
deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search
[08:05:04] Beta-Cluster-Infrastructure, Deployment-Systems, Wikimedia-Logstash: scap on beta cluster does not run anymore due to logstash being down - https://phabricator.wikimedia.org/T143982#2585194 (hashar)
[08:05:09] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 587 bytes in 0.003 second response time
[08:06:17] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 589 bytes in 0.002 second response time
[08:06:23] Project beta-scap-eqiad build #117171: STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117171/
[08:06:28] Beta-Cluster-Infrastructure, Deployment-Systems, Wikimedia-Logstash: scap on beta cluster does not run anymore due to logstash being down - https://phabricator.wikimedia.org/T143982#2585208 (hashar) ``` Debian GNU/Linux 8 deployment-logstash2 ttyS0 deployment-logstash2 login: [2348521.020179] INFO:...
[08:06:47] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[08:07:00] !log rebooting deployment-logstash02 via Horizon. Kernel hang apparently T143982
[08:07:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:07:57] PROBLEM - Host deployment-logstash2 is DOWN: CRITICAL - Host Unreachable (10.68.16.147)
[08:10:28] !log deployment-logstash2 is back after a hard reboot. T143982
[08:10:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:10:46] Yippee, build fixed!
[08:10:47] Project beta-scap-eqiad build #117172: FIXED in 1 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117172/
[08:10:54] RECOVERY - Host deployment-logstash2 is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms
[08:10:58] !log beta-scap-eqiad job is back in operation. Was blocked on logstash not being reachable. T143982
[08:11:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:11:29] easy morning fix :D
[08:11:31] zeljkof: ^^^
[08:11:38] the usual beta cluster maintenance hehe
[08:11:47] Beta-Cluster-Infrastructure, Deployment-Systems, Wikimedia-Logstash: scap on beta cluster does not run anymore due to logstash being down - https://phabricator.wikimedia.org/T143982#2585234 (hashar) Open>Resolved a:hashar
[08:11:50] Beta-Cluster-Infrastructure, Deployment-Systems, Wikimedia-Logstash: scap on beta cluster does not run anymore due to logstash being down - https://phabricator.wikimedia.org/T143982#2585237 (hashar) p:Triage>Unbreak!
[08:12:04] hashar: good morning :D
[08:15:40] !log restart uwsgi-ores and celery-ores-worker in deployment-sca03 (T143567)
[08:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:20:02] Project beta-update-databases-eqiad build #10879: STILL FAILING in 1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10879/
[08:24:52] Beta-Cluster-Infrastructure: beta-scap-eqiad failing: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search - https://phabricator.wikimedia.org/T143973#2585024 (AlexMonk-WMF) It appears fine now - I did notice deployment-logstash2 go offline earlie...
[08:26:45] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:27:15] Beta-Cluster-Infrastructure: beta-scap-eqiad failing: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search - https://phabricator.wikimedia.org/T143973#2585269 (AlexMonk-WMF)
[08:27:17] Beta-Cluster-Infrastructure, Deployment-Systems, Wikimedia-Logstash: scap on beta cluster does not run anymore due to logstash being down - https://phabricator.wikimedia.org/T143982#2585272 (AlexMonk-WMF)
[08:29:29] hashar: hey, if you're around. I have this patch for beta cluster. Can I merge it on my own (I just got the rights) https://gerrit.wikimedia.org/r/#/c/306881/
[08:30:37] Amir1: if it is only for beta, yeah we get them reviewed/merged right away
[08:30:42] no need to wait for SWAT
[08:30:43] BUT
[08:30:53] one has to git fetch && git rebase on the production deployment server
[08:30:58] eg tin
[08:31:13] as to not confuse people that might have to do config change and would be left wondering what that patch is about :)
[08:31:41] Amir1: CR+2 want assistance to do the rebase on tin.eqiad.wmnet ?
[08:31:46] can pair it over hangout if you want
[08:31:59] I'm in library, can't do hangouts :(
[08:32:09] well it is quite easy
[08:32:11] https://wikitech.wikimedia.org/wiki/How_to_deploy_code
[08:32:12] ssh tin.eqiad.wmnet
[08:32:15] cd /srv/mediawiki-staging
[08:32:18] git fetch
[08:32:26] git log HEAD..HEAD@{u}
[08:32:33] should show a single patch, the one that got merged
[08:32:36] git rebase
[08:32:38] done
[08:32:42] okay
[08:32:44] no need to sync the -labs.php file to the fleet
[08:32:45] I do it now
[08:33:14] it is just that later when a production patch is sent, people doing "git log HEAD..HEAD@{u}" will not see your patch :D
[08:33:27] I see
[08:33:31] (most will understand that is a beta only change and thus safe to rebase onto)
[08:33:45] but some people may stop there and ask whether it is really safe / why it hasn't been deployed
[08:34:24] done now
[08:37:15] thanks hashar
[08:38:07] kudos :)
[08:48:49] Beta-Cluster-Infrastructure: beta-scap-eqiad failing: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search - https://phabricator.wikimedia.org/T143973#2585024 (hashar) Sorry I have noticed this task after I fixed the issue and filled a dupe T14398...
[08:54:38] Continuous-Integration-Config, Regression, Zuul: integration-zuul-layoutdiff claims difference when there is none - https://phabricator.wikimedia.org/T143966#2585293 (hashar) Indeed and that is an issue in #zuul. The objects are used to format the string but lack a string representation method so th...
[09:20:01] Project beta-update-databases-eqiad build #10880: STILL FAILING in 0.86 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10880/
[09:33:16] Beta-Cluster-Infrastructure, Scap3 (Scap3-Adoption-Phase1), scap, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2585345 (elukey) >>!
In T116206#2582429, @elukey wrote: > Thanks for reporting, this is my bad since analytics_hadoop_hosts is not in hiera lab...
[10:20:01] Project beta-update-databases-eqiad build #10881: STILL FAILING in 0.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10881/
[10:20:40] One question: per here https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings-labs.php#L476
[10:20:50] all models should be false except damaging
[10:21:18] https://www.irccloud.com/pastebin/qS0TuJru/
[10:21:35] ^ the config says all are true
[10:27:42] same goes for prod
[10:28:17] Continuous-Integration-Config, Regression, Zuul: integration-zuul-layoutdiff claims difference when there is none - https://phabricator.wikimedia.org/T143966#2585390 (hashar) a:hashar First pass at https://review.openstack.org/#/c/361064/
[10:40:44] Amir1: my guess would be that those specific settings are overridden later by a global
[10:40:50] depends on the load order
[10:40:55] and that is very scary / confusing
[10:40:58] I keep getting lost
[10:41:14] Yeah, I'm trying to find out what is the load order
[10:41:50] extension.json->CommonSettings.php->InitialiseSettings.php
[10:42:39] I think that should be it but I'm not sure
[10:53:05] what the hell
[10:53:13] that only applies in enwiki
[10:53:27] dewiki, fawiki are all okay both in prod and beta cluster
[10:57:22] extension.json is actually loaded last, but it is only merged with other configuration, not overriding it
[11:01:07] hmm
[11:11:17] Continuous-Integration-Infrastructure, Upstream, WorkType-Maintenance, Zuul: Zuul deadlocks if unknown repo has activity in Gerrit - https://phabricator.wikimedia.org/T128569#2585460 (Paladox) @hashar looking at https://github.com/openstack-infra/zuul/commit/a8b90b38094587bbd82ffa5e4028aef1dfd029...
[11:20:01] Project beta-update-databases-eqiad build #10882: STILL FAILING in 0.79 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10882/
[11:26:50] Continuous-Integration-Infrastructure, Upstream, WorkType-Maintenance, Zuul: Zuul deadlocks if unknown repo has activity in Gerrit - https://phabricator.wikimedia.org/T128569#2585482 (Paladox) Maybe because in zuul/source/gerrit.py line 321 it does def getGitUrl(self, project): return s...
[11:41:21] (CR) Zfilipin: [C: 2] Added Phabricator username for owners of Selenium jobs [integration/config] - https://gerrit.wikimedia.org/r/306648 (https://phabricator.wikimedia.org/T142409) (owner: Zfilipin)
[11:41:26] (PS3) Zfilipin: Added Phabricator username for owners of Selenium jobs [integration/config] - https://gerrit.wikimedia.org/r/306648 (https://phabricator.wikimedia.org/T142409)
[11:41:30] (CR) Zfilipin: Added Phabricator username for owners of Selenium jobs [integration/config] - https://gerrit.wikimedia.org/r/306648 (https://phabricator.wikimedia.org/T142409) (owner: Zfilipin)
[11:41:34] (CR) Zfilipin: [C: 2] Added Phabricator username for owners of Selenium jobs [integration/config] - https://gerrit.wikimedia.org/r/306648 (https://phabricator.wikimedia.org/T142409) (owner: Zfilipin)
[11:43:20] (Merged) jenkins-bot: Added Phabricator username for owners of Selenium jobs [integration/config] - https://gerrit.wikimedia.org/r/306648 (https://phabricator.wikimedia.org/T142409) (owner: Zfilipin)
[11:47:04] hashar hi, im wondering would this https://github.com/openstack-infra/zuul/commit/a8b90b38094587bbd82ffa5e4028aef1dfd02987 fix zuul for us
[11:47:09] https://phabricator.wikimedia.org/T128569
[11:47:10] ?
[11:48:08] looks like it could help :)
[11:48:20] Oh thanks :)
[11:48:42] Wondering could we backport it to test please?
when you have time :)
[11:50:21] (PS31) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:51:13] (PS32) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:51:18] :)
[11:53:24] not on a friday
[11:53:31] Ok
[11:53:53] What i mean is the patch, but not building or merging
[11:53:55] until you're ready
[11:53:58] :)
[11:54:02] please
[11:55:01] paladox: can you mention the commit on our task ?
[11:55:08] Ok i have
[11:55:22] (PS33) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:55:26] https://phabricator.wikimedia.org/T128569#2585460
[11:55:27] :)
[11:55:47] Continuous-Integration-Infrastructure, Operations, Zuul: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T140894#2585537 (hashar) I will rebuild Zuul to try a patch for T128569 ( https://github.com/openstack-infra/zuul/commit/a8b90b38094587bbd82ffa5e4028a...
[11:56:40] Oh thanks ^^
[11:56:43] brb
[12:01:57] Browser-Tests-Infrastructure, Release-Engineering-Team, Documentation, Patch-For-Review, User-zeljkofilipin: Document browser tests ownership (and what it means) on wiki - https://phabricator.wikimedia.org/T142409#2585570 (zeljkofilipin) Open>Resolved Resolving, please reopen if there...
[12:20:01] Project beta-update-databases-eqiad build #10883: STILL FAILING in 0.76 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10883/
[12:28:48] (CR) Paladox: [C: 1] "This test dose t take long to complete so +1" [integration/config] - https://gerrit.wikimedia.org/r/306723 (owner: Hashar)
[12:29:13] (CR) Paladox: "This test dose t take long to complete so +1" [integration/config] - https://gerrit.wikimedia.org/r/306722 (owner: Hashar)
[12:29:16] (CR) Paladox: [C: 1] Revert "Move npm-node-4 off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306722 (owner: Hashar)
[12:29:27] (CR) Paladox: [C: 1] "This test dose t take long to complete so +1" [integration/config] - https://gerrit.wikimedia.org/r/306725 (owner: Hashar)
[12:29:41] (CR) Paladox: [C: 1] "This test dose t take long to complete so +1" [integration/config] - https://gerrit.wikimedia.org/r/306724 (owner: Hashar)
[12:31:32] (CR) Paladox: "You can ignore certain folders or files with Zuul." [integration/config] - https://gerrit.wikimedia.org/r/306050 (https://phabricator.wikimedia.org/T143598) (owner: Awight)
[13:19:06] Continuous-Integration-Infrastructure, Nodepool: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2585790 (hashar) Looking at debug messages over four days: ``` $ grep 'wmflabs.*running task' /var/log/nodepool/debug.log*|cut -d\ -f9|cut -d\. -f3|sort|uniq -c|sort -rn 6266 ListS...
[13:19:24] Continuous-Integration-Infrastructure, Nodepool: Investigate use of Nodepool ListFloatingIPsTask - https://phabricator.wikimedia.org/T143943#2585791 (hashar) Looking at debug messages over four days: ``` $ grep 'wmflabs.*running task' /var/log/nodepool/debug.log*|cut -d\ -f9|cut -d\. -f3|sort|uniq -c|so...
[13:20:02] Project beta-update-databases-eqiad build #10884: STILL FAILING in 0.83 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10884/
[13:26:47] (PS1) Whym: Whitelist Dalba and Whym (for Pywikibot) [integration/config] - https://gerrit.wikimedia.org/r/306916
[13:27:46] (PS2) Whym: Whitelist Dalba and Whym (for Pywikibot) [integration/config] - https://gerrit.wikimedia.org/r/306916
[13:37:49] Continuous-Integration-Infrastructure, Nodepool: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2585833 (hashar)
[13:40:32] Continuous-Integration-Infrastructure, Nodepool: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2585836 (hashar) I looked at the patches this Friday morning and since I am out this evening with the week-end going, there was no chance for me to babysit the revert. After discussio...
[13:40:48] (PS34) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[13:57:24] (PS1) Lokal Profil: Add QUnit test to Wikispeech [integration/config] - https://gerrit.wikimedia.org/r/306923
[14:02:18] (PS2) Lokal Profil: Add QUnit test to Wikispeech [integration/config] - https://gerrit.wikimedia.org/r/306923
[14:04:45] Continuous-Integration-Infrastructure, Nodepool: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2585898 (chasemp) Metrics: https://grafana.wikimedia.org/dashboard/db/nodepool https://grafana.wikimedia.org/dashboard/db/releng-zuul?panelId=25&fullscreen https://grafana.wikimedi...
[14:06:44] PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120)
[14:20:02] Project beta-update-databases-eqiad build #10885: STILL FAILING in 0.92 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10885/
[14:26:51] Continuous-Integration-Infrastructure, Nodepool: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2586015 (hashar) @chasemp suggested to list out how many builds per timeframe a set of jobs does. Looking at Zuul metrics, we get for each pipeline the amount of jobs triggered. I cr...
[14:36:06] Continuous-Integration-Infrastructure, Operations, Zuul: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T140894#2586065 (Paladox) @hashar we should also bump the version to 2.5.0 per https://github.com/openstack-infra/zuul/releases/tag/2.5.0 Also we sho...
[14:37:02] Browser-Tests, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review: [Task] enable rubocop in Wikibase - https://phabricator.wikimedia.org/T131139#2586066 (Tobi_WMDE_SW) Open>Resolved a:Tobi_WMDE_SW
[14:37:16] Browser-Tests, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review: [Task] enable rubocop in Wikibase - https://phabricator.wikimedia.org/T131139#2156983 (Tobi_WMDE_SW)
[14:37:19] Browser-Tests, Continuous-Integration-Config, Wikidata, Patch-For-Review, User-zeljkofilipin: [Task] Move Wikidata browsertests into Wikibase repository - https://phabricator.wikimedia.org/T118727#2586069 (Tobi_WMDE_SW) Open>Resolved a:Tobi_WMDE_SW
[15:20:01] Project beta-update-databases-eqiad build #10886: STILL FAILING in 0.72 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10886/
[15:21:16] (PS35) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035
(https://phabricator.wikimedia.org/T139613)
[16:02:19] I am disappeared
[16:02:24] I am disappearing damn
[16:02:26] happy week-end
[16:20:02] Project beta-update-databases-eqiad build #10887: STILL FAILING in 0.79 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10887/
[16:20:08] ostriches hi, could you merge https://gerrit.wikimedia.org/r/#/c/306413/ please?
[16:30:29] paladox: That's puppet, no I can't merge it.
[16:32:31] ostriches Oh, is there somebody we could ask to merge it like andrewb please?
[17:20:02] Project beta-update-databases-eqiad build #10888: STILL FAILING in 0.98 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10888/
[17:29:51] Deployment-Systems, scap: handle logstash timeouts separately from spikes in errors reported by logstash - https://phabricator.wikimedia.org/T144033#2586774 (mmodell)
[18:03:05] is there a phab task to port keyholder to systemd yet?
[18:03:49] that's going to be a blocker to T143536
[18:08:51] I don't think that there is, actually
[18:10:17] I'm writing one, couldn't find anything in a search
[18:10:38] ack, thanks
[18:13:34] Deployment-Systems, Scap3, scap, Operations: Make keyholder work with systemd - https://phabricator.wikimedia.org/T144043#2587001 (bd808)
[18:13:55] o/ marxarelli
[18:14:15] hey bd808
[18:15:03] how's things in OAK?
[18:15:24] eh, bad day to ask
[18:15:31] :/
[18:15:47] * bd808 hugs marxarelli
[18:16:13] :)
[18:17:09] yesterday was ok though. got some python written
[18:17:45] nice.
I just got my django project live on prod yesterday -- https://toolsadmin.wikimedia.org/
[18:18:07] oh neat
[18:18:19] and now I have a growing pile of bugs to fix for it
[18:18:54] * bd808 continues to avoid working with MW (which is why he was hired in the first place)
[18:20:02] Project beta-update-databases-eqiad build #10889: STILL FAILING in 0.81 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10889/
[18:28:29] bd808: have you ever set up an unprivileged lxc container on jessie?
[18:29:39] i've been wanting to run some experiments with it for ci isolation but it still seems rather uncharted
[18:31:00] no, I haven't really. I've used mw-vagrant on sid and played with it a bit on jessie, but I don't think that ends up being an unprivileged container.
[18:31:13] hmm, yeah
[18:32:24] seems systemd is maybe the tricky bit, running a container that uses systemd that is
[18:33:24] could be. systemd is still all voodoo to me :/
[19:20:01] Project beta-update-databases-eqiad build #10890: STILL FAILING in 0.67 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10890/
[20:20:01] Project beta-update-databases-eqiad build #10891: STILL FAILING in 0.78 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10891/
[20:51:45] Continuous-Integration-Infrastructure, Puppet: Cant refresh Nodepool snapshot due to puppet: Could not find class passwords::puppet::database - https://phabricator.wikimedia.org/T143769#2587470 (hashar) Git bisect on a Jessie image yields 8cfbc62e5d1c8657fd394728d1cf4d75952c91f3 ``` commit 8cfbc62e5d1c8...
[20:53:27] Release-Engineering-Team, releng-201617-q1, User-greg: Make tech debt analysis working documents public/documented - https://phabricator.wikimedia.org/T144059#2587488 (greg)
[20:54:42] Continuous-Integration-Infrastructure, Puppet: Cant refresh Nodepool snapshot due to puppet: Could not find class passwords::puppet::database - https://phabricator.wikimedia.org/T143769#2587505 (hashar) Removing the include standard from the apache module fix it: ``` diff --git a/modules/apache/manifests...
[20:56:53] Continuous-Integration-Infrastructure, Puppet: Cant refresh Nodepool snapshot due to puppet: Could not find class passwords::puppet::database - https://phabricator.wikimedia.org/T143769#2587506 (hashar)
[21:11:15] Continuous-Integration-Infrastructure, Puppet: Cant refresh Nodepool snapshot due to puppet: Could not find class passwords::puppet::database - https://phabricator.wikimedia.org/T143769#2587537 (hashar) @akosiaris I could use some assistance / idea on this one. For Nodepool I am provisioning lightweight...
[21:20:02] Project beta-update-databases-eqiad build #10892: STILL FAILING in 0.89 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10892/
[21:37:34] Deployment-Systems, scap: handle logstash timeouts separately from spikes in errors reported by logstash - https://phabricator.wikimedia.org/T144033#2587559 (mmodell) a:hashar>None
[21:54:44] hashar: is 'noop' a special job defined in zuul?
[21:54:53] (PS1) Legoktm: Get labs/striker out of the mediawiki queue [integration/config] - https://gerrit.wikimedia.org/r/307025
[21:55:01] legoktm: yes it is entirely internal to zuul
[21:55:07] does not trigger a gearman function
[21:55:22] we use it for the few cases where a patch could get a CR+2 but does not trigger any jobs due to file filters
[21:55:34] still need at least a job for zuul to consider it a success
[21:55:37] right.
but it still causes queues to get merged into the mediawiki queue [21:55:54] (03PS2) 10Legoktm: Get labs/striker out of the mediawiki queue [integration/config] - 10https://gerrit.wikimedia.org/r/307025 [21:55:54] AND to define repos in zuul config until zuul is fixed (paladox found out upstream has sent a patch for that recently) [21:56:00] oh [21:56:09] yeah "noop" should not merge queues [21:56:15] - name: labs/striker/deploy [21:56:15] test: [21:56:15] - noop [21:56:15] gate-and-submit: [21:56:15] - noop [21:56:20] that is pulled into the MW queue [21:56:36] !!! [21:57:47] yeah :( [21:58:14] * bd808 hugs legoktm for working on this [22:01:54] (03CR) 10Hashar: [C: 031] "What Kunal said" [integration/config] - 10https://gerrit.wikimedia.org/r/307025 (owner: 10Legoktm) [22:02:00] legoktm: please do :) [22:02:26] I got distracted working on an upstream zuul patch [22:14:05] going asleep *wave* [22:20:01] Project beta-update-databases-eqiad build #10893: 04STILL FAILING in 0.92 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10893/ [22:26:32] https://review.openstack.org/#/c/361505/ [22:27:30] that was a nice distraction [22:27:51] (03CR) 10Legoktm: [C: 032] Get labs/striker out of the mediawiki queue [integration/config] - 10https://gerrit.wikimedia.org/r/307025 (owner: 10Legoktm) [22:28:51] (03Merged) 10jenkins-bot: Get labs/striker out of the mediawiki queue [integration/config] - 10https://gerrit.wikimedia.org/r/307025 (owner: 10Legoktm) [22:29:16] !log deploying https://gerrit.wikimedia.org/r/307025 [22:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:32:14] bd808: how terrible is having the other repos in the mw queue? [22:33:22] none of it is the end of the world. it's the noop that causes it then? 
[22:34:18] ah I see in your upstream patch
[22:34:21] fun stuff
[23:10:21] Project beta-scap-eqiad build #117262: FAILURE in 1 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117262/
[23:12:06] PROBLEM - Host deployment-poolcounter01 is DOWN: CRITICAL - Host Unreachable (10.68.19.181)
[23:16:32] Yippee, build fixed!
[23:16:32] Project beta-scap-eqiad build #117263: FIXED in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/117263/
[23:20:01] Project beta-update-databases-eqiad build #10894: STILL FAILING in 0.89 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10894/
[23:27:37] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[23:37:36] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:38:10] PROBLEM - Puppet run on deployment-pdf02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:38:37] Deployment-Systems, Scap3: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2587755 (thcipriani) Resolved>Open Hiya @fgiunchedi could I get you to upload `scap_3.2.4-1` to carbon? This has the fix for {T142990} which unblocks Parsoid(!)
[23:47:57] PROBLEM - Puppet run on deployment-pdfrender is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:48:09] RECOVERY - Puppet run on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:50:21] PROBLEM - Puppet run on deployment-mediawiki03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:53:48] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[23:54:54] (PS1) Paladox: Create job *-php55lint [integration/config] - https://gerrit.wikimedia.org/r/307033
[23:55:57] (CR) jenkins-bot: [V: -1] Create job *-php55lint [integration/config] - https://gerrit.wikimedia.org/r/307033 (owner: Paladox)
[23:58:14] (PS2) Paladox: Create job *-php55lint [integration/config] - https://gerrit.wikimedia.org/r/307033
[23:59:45] (CR) jenkins-bot: [V: -1] Create job *-php55lint [integration/config] - https://gerrit.wikimedia.org/r/307033 (owner: Paladox)