[00:16:45] 10Beta-Cluster-Infrastructure, 07Puppet, 07Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2241788 (10Krenair) [00:23:22] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [00:23:40] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:24:10] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [00:24:54] PROBLEM - Puppet run on deployment-db2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [00:25:09] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [00:25:15] PROBLEM - Puppet run on deployment-cxserver03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [00:26:57] PROBLEM - Puppet run on deployment-salt is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:27:55] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0] [00:28:53] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [00:34:53] RECOVERY - Puppet run on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0] [00:41:58] RECOVERY - Puppet run on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0] [00:43:38] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:53:21] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0] [00:56:01] RECOVERY - Puppet run on integration-slave-trusty-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [00:57:51] RECOVERY - Puppet run on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0] 
[01:00:03] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [01:00:17] RECOVERY - Puppet run on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0] [01:03:58] RECOVERY - Puppet run on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [01:24:45] Just went to logstash2, sentry2, and db1 to fix puppet.conf [01:24:58] they were missing the deployment-prep part of the fqdns, breaking pupper [01:25:00] puppet* [01:25:41] running puppet reveals a ton of older changes that I had assumed were active months ago [01:32:31] Okay, wtf? [01:32:39] krenair@deployment-eventlogging03:~$ cat /etc/ssh/userkeys/root [01:32:39] # laner [01:32:39] krenair@deployment-eventlogging03:~$ [01:32:55] puppet does nothing about this [01:42:26] woah... lots of instances like that [02:02:53] found the issue: https://gerrit.wikimedia.org/r/285519 [02:34:54] Yippee, build fixed! [02:34:55] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #924: 09FIXED in 1 min 53 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/924/ [03:01:33] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:12:47] Yippee, build fixed! [03:12:48] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #955: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/955/ [03:13:25] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1058: 04FAILURE in 31 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1058/ [03:19:19] zeljkof: Do you know what can be workaround if I have deleted .rvm directory in home? 
Everything is failing (especially mw-vagrant) [04:17:29] RECOVERY - Puppet run on integration-slave-trusty-1025 is OK: OK: Less than 1.00% above the threshold [0.0] [05:12:36] kart_: I have not used rvm in a year or two [05:13:30] Just install a recent version of Ruby (2.x) using a package manager (brew, apt, yum...) [06:38:50] zeljkof: oh :) no issue. [06:39:40] zeljkof: yep. wondering why mw-vagrant is still messed up. must be something else. [06:42:27] kart_: ping me in an hour, we can debug your vagrant [06:43:46] zeljkof: Sure. [06:51:03] Yippee, build fixed! [06:51:03] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #801: 09FIXED in 26 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/801/ [08:04:01] kart_: sorry for the delay, still on the phone, had to pick up a dog (long story), at my machine in an hour or so [08:30:15] zeljkof: no worries. [08:35:57] Project beta-scap-eqiad build #100113: 04FAILURE in 1 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100113/ [08:45:36] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [08:46:04] Project beta-scap-eqiad build #100114: 04STILL FAILING in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100114/ [08:56:03] Project beta-scap-eqiad build #100115: 04STILL FAILING in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100115/ [09:00:53] 06Release-Engineering-Team, 10Phabricator: Clean up tasks in archived #Staging Phabricator project - https://phabricator.wikimedia.org/T133529#2242244 (10hashar) 05Open>03Resolved a:03hashar What a massive cleanup! The left over is indeed tasks we want to eventually achieve at some point, we can keep #s... 
[09:05:57] Project beta-scap-eqiad build #100116: 04STILL FAILING in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100116/ [09:16:02] Project beta-scap-eqiad build #100117: 04STILL FAILING in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100117/ [09:26:02] Project beta-scap-eqiad build #100118: 04STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100118/ [09:35:57] Project beta-scap-eqiad build #100119: 04STILL FAILING in 1 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100119/ [09:39:06] 06Release-Engineering-Team, 10Phabricator: Clean up tasks in archived #Staging Phabricator project - https://phabricator.wikimedia.org/T133529#2242299 (10Aklapper) Thank you! [09:42:14] poor scap [09:43:02] andre__: that "clean up #staging tasks" task was definitely worth filing, thank you ( was https://phabricator.wikimedia.org/T133529#2242299 ) [09:43:32] kart_: if you still have trouble with vagrant, let me know [09:43:39] I am finally here :) [09:43:56] !log tmh01.deployment-prep.eqiad.wmflabs denies mwdeploy user breaking https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ [09:44:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [09:44:54] !sal [09:44:54] https://tools.wmflabs.org/sal/releng [09:45:46] hashar, thank you and Chad for cleaning up :) [09:45:58] Project beta-scap-eqiad build #100120: 04STILL FAILING in 1 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100120/ [09:46:13] I just need a more structured approach to find such archived Phab projects (currently it's pure luck). Working on that... 
[09:51:01] 10Beta-Cluster-Infrastructure: deployment-tmh01.deployment-prep.eqiad.wmflabs refuses mwdeploy ssh connection - https://phabricator.wikimedia.org/T133769#2242325 (10hashar) [09:51:19] 03Scap3, 10scap, 13Patch-For-Review: scap::target shouldn't allow users to redefine the user's key - https://phabricator.wikimedia.org/T132747#2209132 (10hashar) deployment-tmh01 refuses mwdeploy key now :( T133769 [09:55:58] Project beta-scap-eqiad build #100121: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100121/ [10:06:02] Project beta-scap-eqiad build #100122: 04STILL FAILING in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100122/ [10:16:03] Project beta-scap-eqiad build #100123: 04STILL FAILING in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100123/ [10:26:07] Project beta-scap-eqiad build #100124: 04STILL FAILING in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100124/ [10:30:18] Project beta-scap-eqiad build #100125: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100125/ [10:35:59] Project beta-scap-eqiad build #100126: 04STILL FAILING in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100126/ [10:38:40] andre__: [10:38:44] wrong ping [10:38:50] !log Rebuild Image ci-trusty-wikimedia-1461753210 in wmflabs-eqiad is ready [10:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [10:39:25] * andre__ hides again [10:42:07] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php composer (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2242435 (10hashar) **Status update** The Trusty image is provided as of... 
[10:42:56] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php composer (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2242438 (10hashar) [10:43:36] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php composer (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#1819085 (10hashar) We need our PHP_BIN wrapper script to be merged so it... [10:46:03] Project beta-scap-eqiad build #100127: 04STILL FAILING in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100127/ [10:56:03] Project beta-scap-eqiad build #100128: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100128/ [11:06:00] Project beta-scap-eqiad build #100129: 04STILL FAILING in 1 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100129/ [11:15:58] Project beta-scap-eqiad build #100130: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100130/ [11:16:11] 10Deployment-Systems, 06Release-Engineering-Team, 06Operations: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2132321 (10fgiunchedi) notes from looking into this with @Reedy on irc: * scap doesn't seem to know about `/var/lib/l10nupdate` but instead it drops cdb files... 
[11:16:29] 10Deployment-Systems, 06Release-Engineering-Team, 06Operations: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2242482 (10fgiunchedi) p:05Triage>03Normal [11:26:00] Project beta-scap-eqiad build #100131: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100131/ [11:35:57] Project beta-scap-eqiad build #100132: 04STILL FAILING in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100132/ [11:37:06] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review, 07Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2242535 (10hashar) deployment-cache-text04 had the issue again. So at f... [11:41:39] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review, 07Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2242540 (10hashar) p:05Triage>03Normal https://gerrit.wikimedia.org... 
[11:43:17] !log fixed puppet on deployment-cache-text04 T132689 [11:43:18] T132689: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689 [11:43:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:45:59] Project beta-scap-eqiad build #100133: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100133/ [11:55:59] Project beta-scap-eqiad build #100134: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100134/ [12:05:53] Project beta-scap-eqiad build #100135: 04STILL FAILING in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100135/ [12:16:04] Project beta-scap-eqiad build #100136: 04STILL FAILING in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100136/ [12:26:06] Project beta-scap-eqiad build #100137: 04STILL FAILING in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100137/ [12:30:46] 07Browser-Tests, 10Wikidata, 10Wikidata-Gadgets, 03Wikidata-Sprint-2016-04-26: smoke test Feature: Authority control gadget test fails - https://phabricator.wikimedia.org/T131144#2242623 (10Jonas) a:03inyono [12:31:17] 07Browser-Tests, 10Wikidata, 10Wikidata-Gadgets, 03Wikidata-Sprint-2016-04-26: smoke test Feature: Authority control gadget test fails - https://phabricator.wikimedia.org/T131144#2157126 (10Jonas) a:05inyono>03Jonas [12:31:36] 07Browser-Tests, 10Wikidata, 07Tracking: [tracking] make Wikidata browsertests non-flaky - https://phabricator.wikimedia.org/T92619#2242632 (10Jonas) [12:31:38] 07Browser-Tests, 10Wikidata, 10Wikidata-Gadgets, 03Wikidata-Sprint-2016-04-26: smoke test Feature: Authority control gadget test fails - https://phabricator.wikimedia.org/T131144#2157126 (10Jonas) 05Open>03Resolved [12:35:56] Project beta-scap-eqiad build #100138: 04STILL FAILING in 1 min 
9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100138/ [12:41:31] (03Draft2) 10Hashar: dib: only use 'main' Debian component [integration/config] - 10https://gerrit.wikimedia.org/r/270873 (https://phabricator.wikimedia.org/T120963) [12:42:09] (03Abandoned) 10Hashar: dib: only use 'main' Debian component [integration/config] - 10https://gerrit.wikimedia.org/r/270873 (https://phabricator.wikimedia.org/T120963) (owner: 10Hashar) [12:43:17] 05Continuous-Integration-Scaling, 13Patch-For-Review, 07Upstream: Nodepool instances have duplicate entry for jessie-backports/main - https://phabricator.wikimedia.org/T120963#2242673 (10hashar) 05Open>03Resolved Works fine now since dib 1.12.0 and setting `DIB_DEBIAN_COMPONENTS='main,contrib,non-free'` [12:45:33] stupid me [12:45:34] I lost a patch [12:46:00] Project beta-scap-eqiad build #100139: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100139/ [12:53:51] Yippee, build fixed! [12:53:51] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #1032: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/1032/ [12:54:44] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [12:55:57] Project beta-scap-eqiad build #100140: 04STILL FAILING in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100140/ [12:56:47] (03PS1) 10Hashar: Pass --cache-dir /srv/git to zuul-cloner [integration/config] - 10https://gerrit.wikimedia.org/r/285637 [13:03:29] (03PS1) 10JanZerebecki: Remove generic dependency for Wikibase on Echo. [integration/config] - 10https://gerrit.wikimedia.org/r/285638 (https://phabricator.wikimedia.org/T133774) [13:04:28] (03CR) 10JanZerebecki: [C: 032] Remove generic dependency for Wikibase on Echo. 
[integration/config] - 10https://gerrit.wikimedia.org/r/285638 (https://phabricator.wikimedia.org/T133774) (owner: 10JanZerebecki) [13:05:21] (03Merged) 10jenkins-bot: Remove generic dependency for Wikibase on Echo. [integration/config] - 10https://gerrit.wikimedia.org/r/285638 (https://phabricator.wikimedia.org/T133774) (owner: 10JanZerebecki) [13:05:58] Project beta-scap-eqiad build #100141: 04STILL FAILING in 1 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100141/ [13:13:06] !log reloading zuul for 81a1f1a..0993349 [13:13:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [13:16:00] Project beta-scap-eqiad build #100142: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100142/ [13:23:34] 07Browser-Tests, 10Wikidata, 10Wikidata-Gadgets, 03Wikidata-Sprint-2016-04-26: smoke test Feature: Authority control gadget test fails - https://phabricator.wikimedia.org/T131144#2242796 (10Tobi_WMDE_SW) To give some more context here, we've updated the gadget on beta to the current version and had to chan... 
[13:26:03] Project beta-scap-eqiad build #100143: 04STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100143/ [13:36:00] Project beta-scap-eqiad build #100144: 04STILL FAILING in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100144/ [13:45:59] Project beta-scap-eqiad build #100145: 04STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100145/ [13:48:42] (03PS1) 10Hashar: Composer jobs on Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) [13:48:58] (03PS2) 10Hashar: Composer jobs on Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) [13:49:00] (03PS2) 10Hashar: Pass --cache-dir /srv/git to zuul-cloner [integration/config] - 10https://gerrit.wikimedia.org/r/285637 [13:50:47] (03PS3) 10Hashar: Composer jobs on Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) [13:51:25] (03CR) 10Hashar: [C: 04-2] "Need to manually trigger them first." [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) (owner: 10Hashar) [13:54:43] oh for god sake [13:54:44] I am not paying attention [13:55:58] Project beta-scap-eqiad build #100146: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100146/ [13:59:53] 06Release-Engineering-Team, 06Operations: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2242850 (10fgiunchedi) p:05Triage>03Low [14:01:08] zeljkof: are you here? [14:01:17] Luke081515: yes! [14:01:57] !log Jenkins upgrading git client plugin 1.19.1. 
> 1.19.6 [14:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:02:36] zeljkof: I got a question concerning browsertests/selenium tests: When the selenium user is active atbeta for example, how do you store the password etc then? Is that public visible, or hidden? If hidden, it would be useful for me do know how, I need this way too ;) [14:03:01] Project browsertests-Wikidata-SmokeTests-linux-firefox build #182: 15ABORTED in 5 min 0 sec: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox/182/ [14:03:02] Project browsertests-Wikidata-WikidataTests-linux-firefox build #181: 15ABORTED in 5 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-firefox/181/ [14:03:28] ^^^ I have aborted them to restart Jenkins [14:04:24] Luke081515: do you have access to office wiki? [14:04:33] zeljkof: no [14:04:41] ok [14:04:47] the passwords are stored there :) [14:05:04] in short, passwords are stored in jenkins credential store [14:05:08] do you have access to that? [14:05:32] https://integration.wikimedia.org/ci/credential-store/ [14:06:07] zeljkof: No, (I want to do this at my onw jenkins instance), but how can the job read from this store, do you got a PHP line with code, that I can copy? ;) [14:06:19] ok, to make it more clear [14:06:32] (I don't need the password, but a way to do my tests at my instance the same way) [14:06:51] the passwords are hidden, they are available at jenkins credential store [14:06:53] and we have a backup at office wiki [14:07:13] are you using jenkins job builder to configure jobs? [14:07:19] Project beta-scap-eqiad build #100147: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100147/ [14:07:25] http://docs.openstack.org/infra/jenkins-job-builder/ [14:07:30] actually no. is there a way in php to do that? 
[14:07:52] !log Jenkins upgrading git plugin 2.4.1 > 2.4.4 [14:07:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:08:11] for reference, this is the credentials plugin https://wiki.jenkins-ci.org/display/JENKINS/Credentials+Plugin [14:08:17] Luke081515: ok, let's step back [14:08:24] what are you trying to do? :) [14:08:31] I need some context [14:08:38] is your jenkins instance public? [14:08:59] zeljkof: I wrote a PHP test for my bot program on my instance, the instance is not public, and I don't want to store the password for the testwiki in the test code ;) [14:09:06] !log Jenkins upgrading credential plugin 1.24 > 1.27 And Credentials binding plugin 1.6 > 1.7 [14:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:09:46] ok, sounds good [14:10:05] so, you have a private jenkins instance that runs just a few jobs? [14:10:25] yeah [14:10:43] !log restarting Jenkins [14:10:45] again :( [14:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:11:00] but the test code is open source, so a password there is very bad :-/ [14:11:30] passwords in code are always a bad thing, open source or not [14:11:37] (03CR) 10Hashar: [C: 032] Pass --cache-dir /srv/git to zuul-cloner [integration/config] - 10https://gerrit.wikimedia.org/r/285637 (owner: 10Hashar) [14:12:02] so, the easiest thing is to create an environment variable in jenkins and store the password there [14:12:43] ok, but then I need to start the php program with this variable, how can I do that with a shell command? 
[14:12:49] go to jenkins-url/configure [14:13:08] (03Merged) 10jenkins-bot: Pass --cache-dir /srv/git to zuul-cloner [integration/config] - 10https://gerrit.wikimedia.org/r/285637 (owner: 10Hashar) [14:13:15] all jobs will have access to this shell environment variable [14:13:31] not sure how to get it from PHP, but there is a way for sure [14:13:52] something like ENV["SECRET_PASSWORD"] [14:14:01] ok, thanks [14:14:20] http://php.net/manual/en/reserved.variables.environment.php [14:14:29] quick google search says this [14:14:37] ok, thanks you [14:14:39] try it, ping me if you get stuck [14:14:42] this solves my problem^^ [14:15:07] great [14:15:34] I've got 99 problems but env variable aint one ;) [14:15:57] Project beta-scap-eqiad build #100148: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100148/ [14:19:11] Luke081515: is there a reason you are not running the job on wmf jenkins? [14:20:20] zeljkof: It's not a mw extension etc, actually it's running at my localhost, so actually no need ;) [14:21:13] ok [14:21:17] just asking [14:26:01] Project beta-scap-eqiad build #100149: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100149/ [14:26:34] mediawiki/skins/VectorV2 SUCCESS https://integration.wikimedia.org/ci/job/composer-php55-trusty/5/ [14:26:34] :D [14:36:00] Project beta-scap-eqiad build #100150: 04STILL FAILING in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100150/ [14:46:02] Project beta-scap-eqiad build #100151: 04STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100151/ [14:46:24] Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 04FAILURE in 33 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [14:46:31] Project selenium-Math » 
firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 04FAILURE in 40 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [14:55:05] Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 09SUCCESS in 25 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [14:55:06] Project selenium-Math » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 09SUCCESS in 27 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [14:56:07] Project beta-scap-eqiad build #100152: 04STILL FAILING in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100152/ [14:56:59] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 05Testing-Initiative-2015, 10MediaWiki-extensions-Examples, and 2 others: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#2242996 (10zeljkofilipin) [14:57:15] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2242997 (10zeljkofilipin) [14:57:22] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2242998 (10fgiunchedi) p:05Triage>03Low [14:57:50] (03PS1) 10Hashar: Move composer job templates to php.yaml [integration/config] - 10https://gerrit.wikimedia.org/r/285651 [14:58:26] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 09SUCCESS in 3 min 45 sec: 
https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [15:00:19] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2218145 (10hashar) We have the `operations-puppet-typos` Jenkins job which reads from `/typos` but rely on `fgrep`: ``` fgrep -r --color=alw... [15:03:28] hashar: Hi [15:03:58] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2243012 (10hashar) There is also https://www.npmjs.com/package/grunt-tyops / https://github.com/jdforrester/grunt-tyops which is a Grunt ta... [15:04:28] (03CR) 10Hashar: [C: 032] "Noop" [integration/config] - 10https://gerrit.wikimedia.org/r/285651 (owner: 10Hashar) [15:04:46] paladox: hello [15:05:18] Can we add assert-phpflavor: to npm 4.3 [15:05:21] hashar ^^7 [15:05:23] please [15:05:41] as a tester [15:05:56] and revert if it fails [15:05:59] Project beta-scap-eqiad build #100153: 04STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100153/ [15:07:06] (03Merged) 10jenkins-bot: Move composer job templates to php.yaml [integration/config] - 10https://gerrit.wikimedia.org/r/285651 (owner: 10Hashar) [15:07:12] Project beta-scap-eqiad build #100154: 04STILL FAILING in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100154/ [15:07:18] paladox: there is no PHP on Nodepool yet [15:07:22] paladox: at least it is not ready [15:07:27] Oh ok [15:08:20] paladox: I have oojs on my radar and will eventually get it migrated [15:08:28] there is a bunch of puppet work that needs to happen first [15:08:33] Ok thanks. 
[15:08:45] most of the jobs got migrated at least :-} [15:09:03] and I am hoping to migrate a wide range of composer based jobs to Nodepool as well [15:10:08] Ok :) [15:16:02] Project beta-scap-eqiad build #100155: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100155/ [15:24:21] (03PS4) 10Hashar: Composer jobs on Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) [15:26:00] Project beta-scap-eqiad build #100156: 04STILL FAILING in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100156/ [15:27:53] Project beta-scap-eqiad build #100157: 04STILL FAILING in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100157/ [15:28:33] Yippee, build fixed! [15:28:33] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #897: 09FIXED in 32 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/897/ [15:31:14] 17:27:51 15:27:51 sudo -u mwdeploy -n -- /usr/bin/rsync -l deployment-tin.eqiad.wmflabs::common/wikiversions*.{json,php} /srv/mediawiki on deployment-tmh01.deployment-prep.eqiad.wmflabs returned [255]: Permission denied (publickey,keyboard-interactive). [15:31:19] meh, happened again [15:31:42] (03PS5) 10Hashar: Composer jobs on Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) [15:34:26] (03CR) 10Hashar: [C: 032] "Good enough for now as an experiment." 
[integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) (owner: 10Hashar) [15:35:43] (03Merged) 10jenkins-bot: Composer jobs on Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/285642 (https://phabricator.wikimedia.org/T119139) (owner: 10Hashar) [15:36:04] Project beta-scap-eqiad build #100158: 04STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100158/ [15:39:17] Luke081515: yeah I have filed that one earlier [15:39:26] ok [15:39:56] https://phabricator.wikimedia.org/T133769 [15:40:18] twentyafterfour: when you get around to it, ssh / mwdeploy is broken for deployment-tmh01 somehow. Got a bunch of traces on https://phabricator.wikimedia.org/T133769 [15:40:33] definitely puppet that changed /etc/ssh/userkeys/mwdeploy to some weird key [15:46:08] Project beta-scap-eqiad build #100159: 04STILL FAILING in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100159/ [15:47:11] hashar: looking [15:47:15] zeljkof: Do you know how I can use credentials from jenkins in shell commands, how can I set the environment variable? [15:47:35] twentyafterfour: tracked it down to 8:30am UTC which matches the puppet event [15:47:41] Luke081515: not sure what the question is [15:47:47] no clue why it has a deployment-salt ssh key [15:47:53] what do you have? what do you want to do? [15:48:03] maybe that is tmh01 lacking some role / not being a deployment target [15:48:29] zeljkof: I want to execute a php script with 'php ' while 'args' in this case should be the password from the credential store [15:48:36] but I don't have an idea how to load it [15:49:16] ok [15:49:25] where are you executing the script from? [15:49:34] jenkins job, from shell script? [15:50:31] I'm currently configuring the job in the jenkins GUI, and using the "execute shell" command there, where I want to use the php command [15:50:55] can the php script just read the env variable? 
[15:51:11] without it being provided when starting the script as an argument? [15:51:58] I actually don't have a way to use the credential as an env variable, that's the problem [15:53:08] ok, I am lost :)( [15:53:10] :) [15:54:28] zeljkof: Do the selenium tests read the data from credentials too? Maybe I can copy a part of the code ^^ [15:56:07] Project beta-scap-eqiad build #100160: 04STILL FAILING in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100160/ [15:58:12] Luke081515: sure [15:58:23] but the configuration is all over the place [15:58:28] let me see if there is a good example [15:58:37] is your code public? [15:58:47] it would help if I could see what you are doing [15:58:47] actually no [15:59:01] it is hard to debug in the dark [15:59:08] beacuse I'm actually writing the unit tests so they aren't finished yet [15:59:23] *because [15:59:41] https://gerrit.wikimedia.org/r/#/c/274136/42/jjb/job-templates-selenium.yaml,cm [16:00:18] lines 139+ show how the jenkins credential store and the shell script running in jenkins are connected [16:00:33] ah, ok [16:00:33] thanks [16:01:50] Luke081515: just push the code to gerrit [16:02:02] mark it WIP and add me as a reviewer [16:02:09] or push to github, gist, somewhere [16:02:20] it is hard to tell what you are doing when I do not have the code [16:03:41] zeljkof: Actually it's just a bot, which needs a password to log in via the mw API. 
Normally I read the password from a chmod 0600 config file, but that's not possible at jenkins, so I just need another way to give it the password [16:03:46] that's all currently :-/ [16:03:55] +1 on publishing code somewhere ;-} [16:04:03] I don't mind reviewing / helping as time allows [16:05:00] ah, I guess https://wiki.jenkins-ci.org/display/JENKINS/Credentials+Binding+Plugin would help me to use it as an env variable :) [16:05:02] getenv( 'BOT_USERNAME' ) or $config['username']; [16:05:03] :D [16:05:17] then we can get the username in the Jenkins credential storage [16:05:34] and have the Jenkins job fetch the credentials from the store and export them as env variables [16:05:54] potentially, the job could replace tokens in the config file with the appropriate values [16:06:09] Yippee, build fixed! [16:06:10] Project beta-scap-eqiad build #100161: FIXED in 1 min 19 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100161/ [16:06:43] ah, beta scap works again :D [16:06:48] twentyafterfour: well done [16:06:49] hopefully longer than the last time :D [16:07:00] I am really off now *wave* [16:07:25] Beta-Cluster-Infrastructure: deployment-tmh01.deployment-prep.eqiad.wmflabs refuses mwdeploy ssh connection - https://phabricator.wikimedia.org/T133769#2243219 (mmodell) I'm getting an odd error: Error: Could not find command 'for' Error: /Stage[main]/Mediawiki::Users/Ssh::Userkey[mwdeploy]/Exec[compile_ssh_... [16:07:30] bye [16:08:11] Beta-Cluster-Infrastructure: deployment-tmh01.deployment-prep.eqiad.wmflabs refuses mwdeploy ssh connection - https://phabricator.wikimedia.org/T133769#2243227 (mmodell) it's supposed to be a bash for loop. 
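The "Could not find command 'for'" error quoted above is what one would expect when a shell keyword is handed to a launcher that looks for a program of that name: `for` is shell syntax, not an executable on PATH. Presumably the Puppet exec was doing something similar. A minimal Python sketch of the same effect, assuming a POSIX system with `sh` available; the function names and the loop body are illustrative and do not come from the manifest:

```python
import subprocess

# A hypothetical loop, standing in for the one in the Puppet exec.
LOOP = "for key in a b c; do echo $key; done"


def run_directly(argv):
    """Run argv without a shell; return stdout, or None if argv[0] is not a program."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # No executable called argv[0] exists on PATH (e.g. the keyword `for`).
        return None
    return result.stdout


def run_in_shell(script, shell="sh"):
    """Pass the whole script to a shell, which understands keywords like `for`."""
    result = subprocess.run(
        [shell, "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Here `run_directly(LOOP.split())` returns `None` because nothing named `for` can be executed, while `run_in_shell(LOOP)` returns the three echoed lines.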
[16:08:39] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:10:07] Project selenium-Flow » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 09SUCCESS in 23 min: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [16:10:47] 10Beta-Cluster-Infrastructure: deployment-tmh01.deployment-prep.eqiad.wmflabs refuses mwdeploy ssh connection - https://phabricator.wikimedia.org/T133769#2243233 (10hashar) From I4086a12896e7e22004402dd0bc025896c037c746 for T132747 which is cherry picked on puppetmaster: ``` + exec { "compile_ssh_userkey... [16:11:07] twentyafterfour: I think you want to bash -c 'for ...' [16:11:20] anyway commute time and at least beta-scap-eqiad is fixed [16:11:42] Luke081515: sorry I keep breaking it [16:12:40] twentyafterfour: No matter, if it keeps stable now ^^ [16:12:46] *remains [16:13:25] Project selenium-Flow » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: 09SUCCESS in 26 min: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/ [16:18:57] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-MultimediaViewer, 10Reading-Web: NeedCaptcha (MediawikiApi::CreateAccountError) in MultimediaViewer browser tests - https://phabricator.wikimedia.org/T129472#2243288 (10MBinder_WMF) [16:19:00] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-MultimediaViewer, 10Reading-Web: A JSON text must at least contain two octets! 
(JSON::ParserError) in MultimediaViewer browser tests - https://phabricator.wikimedia.org/T129483#2243287 (10MBinder_WMF) [16:19:04] 10Browser-Tests-Infrastructure, 10MobileFrontend, 10Reading-Web: Net::ReadTimeout in MobileFrontend browser tests when visiting Watchlist page - https://phabricator.wikimedia.org/T129328#2243290 (10MBinder_WMF) [16:22:00] 07Browser-Tests, 10Browser-Tests-Infrastructure, 10MobileFrontend, 10Reading-Web, 15User-zeljkofilipin: Add helper to mediawiki_selenium for detecting if ResourceLoader module (JavaScript) has loaded - https://phabricator.wikimedia.org/T132753#2243396 (10MBinder_WMF) [16:23:09] 07Browser-Tests, 10MobileFrontend, 10Reading-Web, 07Technical-Debt: Refactor MobileFrontend tests (language, nearby) - https://phabricator.wikimedia.org/T109464#2243435 (10MBinder_WMF) [16:25:11] 07Browser-Tests, 10Gather, 10Reading-Web: Browser tests for notifications - https://phabricator.wikimedia.org/T120167#2243499 (10MBinder_WMF) [16:26:02] 07Browser-Tests, 10MediaWiki-extensions-MultimediaViewer, 10Reading-Web: Failure on "Download menu.Attribution area can be closed" browser test - https://phabricator.wikimedia.org/T92810#2243528 (10MBinder_WMF) [16:26:21] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-MultimediaViewer, 10Reading-Web: Automated screenshots - https://phabricator.wikimedia.org/T77634#2243536 (10MBinder_WMF) [16:28:18] 07Browser-Tests, 10MediaWiki-extensions-MultimediaViewer, 10Reading-Web: Failure in MMV browser tests - https://phabricator.wikimedia.org/T66249#2243599 (10MBinder_WMF) [16:43:38] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:58:36] Yippee, build fixed! 
[17:58:37] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #892: 09FIXED in 35 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/892/ [18:39:00] 10Deployment-Systems, 10scap, 06Mobile-Apps, 03Mobile-Content-Service, 10Scap3 (Scap3-Adoption-Phase1): Deploy mobileapps/deploy with scap3 - https://phabricator.wikimedia.org/T129147#2244491 (10Mholloway) [18:40:50] hello! is CI dead? https://gerrit.wikimedia.org/r/#/c/285656/ hasn't had any activitiy for ~15mins [18:43:14] thcipriani: ^ ping [18:43:19] YuviPanda: looking [18:43:45] oh boy. [18:45:12] YuviPanda: build failed [18:45:21] CI seems alive and well [18:45:23] I see movement :) [18:46:00] Project beta-scap-eqiad build #100177: 04FAILURE in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100177/ [18:46:40] yeah! [18:47:03] > 18:44:12 Warning: Unrecognised escape sequence '\;' in file /mnt/jenkins-workspace/workspace/pplint-HEAD/modules/wikistats/manifests/db.pp at line 22 [18:47:11] why is it -1ing my patch for that? [18:47:27] * twentyafterfour doesn't know [18:47:36] 18:41:29 Warning: You cannot collect without storeconfigs being set on line 80 in file /mnt/jenkins-workspace/workspace/pplint-HEAD/modules/ssh/manifests/server.pp [18:47:45] that's definitely unrelated to any of my code... [18:48:08] it did this previously as well, with a pep8 violation for random other code. I fixed that and merged it too, but now it's showing up someething else... [18:48:31] I have no clue how operations/puppet CI is configured, honestly [18:48:49] do you know who would? [18:48:54] hmmm also suspect that it took 18mins to run a job the generally takes 5 seconds. [18:48:59] yeah [18:49:21] all the other runs of that job are < 1-2min [18:50:55] happened here too, different CI node, same problem. https://gerrit.wikimedia.org/r/#/c/285654/ [18:51:35] wooo, not just me then! 
[18:51:55] ha, and totally different warnings [18:53:00] PROBLEM - Puppet run on deployment-mediawiki01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [18:54:32] must be a problem somewhere in /srv/deployment/integration/slave-scripts/bin/git-changed-in-head pp for some changes... [18:54:35] YuviPanda, I ran into this error too [18:55:27] YuviPanda, pep8 errors? https://gerrit.wikimedia.org/r/#/c/285659/ :) [18:56:04] Project beta-scap-eqiad build #100178: 04STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100178/ [18:56:47] Krenair: haha, I just merged a different fix :) [18:57:17] a 'recheck' fixed it [18:57:54] for a different problem? [18:58:08] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:59:08] Krenair: same type of problem but different errors yeah [18:59:24] Krenair: see PS6 [18:59:35] thcipriani: legoktm did something to fix it the last timet his happened I think [19:04:42] hi, whom should i bug about scap3 service depl? [19:05:13] i would like to deploy kartotherian in an hour [19:05:19] yurik: that's me or thcipriani [19:05:40] twentyafterfour, thx, could you walk me through the new process? [19:05:58] unless i should still use git deploy [19:06:02] Yippee, build fixed! [19:06:03] Project beta-scap-eqiad build #100179: 09FIXED in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100179/ [19:06:08] is kartotherian set up for scap3? it's gonna take more than an hour if you haven't set up scap3 yet [19:06:25] twentyafterfour, i haven't touched it on tin [19:06:31] because patches have to be merged in puppet to set it up [19:06:54] can i still deploy it with git deploy sync? 
[19:07:33] yeah it should work still [19:07:58] yurik: git deploy should still be functional until we get everyone migrated to scap3 [19:08:07] cool, thx [19:08:17] let me know when you want to work on migrating it [19:09:56] YuviPanda: it's strange, looking at the workspace on 1004 for your job. git show HEAD --name-only -- '*.pp' | wc -l == 1477 (which explained why it took so long to run) [19:11:35] yurik: if you want to migrate it shouldn't be too difficult, particularly if you're using the service-deploy user. Here's the overall process for services: https://wikitech.wikimedia.org/wiki/Services/Scap_Migration [19:11:58] thcipriani, lets migrate it after the deployment [19:12:23] okie doke. [19:13:20] thcipriani, changing data.yaml might take a week :) [19:14:19] :) The puppet patches, while not particularly onerous at this point, are probably what's going to take some time. [19:15:56] 10Deployment-Systems, 06Release-Engineering-Team, 03Scap3, 06Operations: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2244644 (10Dzahn) [19:16:09] hashar: can I just remove a workspace that is messed up on a jenkins node? Context: https://integration.wikimedia.org/ci/job/pplint-HEAD/17551/consoleFull git show HEAD -- '*.pp' show 1477 files changed even though this was the change: https://gerrit.wikimedia.org/r/#/c/285656/ [19:17:56] RECOVERY - Puppet run on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:20:45] thcipriani, will changes to puppets prevent me from doing git deploy sync? 
[19:21:44] thcipriani: yup [19:22:01] thcipriani: you can mass clear workspaces via integration-saltmaster [19:22:16] with something like: sudo salt -v '*' cmd.run 'rm -fR /mnt/jenkins-workspace/workspace/pplint-HEAD*' [19:22:21] or hook on the slave [19:22:25] debug a bit then wipe the dir [19:22:38] yurik: yeah, once you change to deployment => 'scap3', [19:22:58] you won't be able to do: git deploy {start,sync} [19:23:04] permissions issues will arise [19:23:14] but that job is wiping the workspace entirely on each run ... [19:25:19] hmmm, well, didn't wipe integration-slave-trusty-1004 [19:26:00] that -HEAD job look at the files changed in HEAD and only keep the one ending with .pp [19:26:06] /srv/deployment/integration/slave-scripts/bin/git-changed-in-head pp [19:26:15] then xargs does the magic: xargs -n1 -t puppet parser validate [19:27:02] yeah, problem on 1004 was that even though only 3 files had changed in head (or should have) it showed 1477 files. [19:27:27] https://integration.wikimedia.org/ci/job/pplint-HEAD/buildTimeTrend shows the last failures / time to run [19:27:48] the thing is Zuul merge the patch on tip of the branch [19:28:05] that is done on gallium and scandium in a working copy /srv/ssd/zuul/git/operations/puppet [19:28:12] so maybe that place is dirty / corrupted somehow [19:29:24] greg-g, I'm planning to deploy https://gerrit.wikimedia.org/r/#/c/282440/ later today (to use External Store on Beta Cluster testwiki). [19:29:24] what part of the job configuration shows that this workspace should get wiped after a build? [19:29:42] the one that took 20minutes all fetched their patch from scandium [19:29:48] so the working copy there must be corrupted [19:29:54] or zuul-merger as some kind of issue [19:30:21] thcipriani: the top of the console log shows: "Wiping out workspace first." 
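The pplint-HEAD flow described above (take the files changed in HEAD, keep those ending in .pp, then validate one file per `puppet parser validate` call, as `xargs -n1` does) can be sketched as follows. The log does not show the real `git-changed-in-head` script, so this is only an assumed equivalent of its filtering step, with hypothetical function names:

```python
def changed_with_extension(changed_paths, extension):
    """Keep only the changed paths that end with the given file extension."""
    suffix = "." + extension.lstrip(".")
    return [path for path in changed_paths if path.endswith(suffix)]


def validate_commands(manifest_paths):
    """Build one `puppet parser validate` argv per manifest, like `xargs -n1`."""
    return [["puppet", "parser", "validate", path] for path in manifest_paths]
```

With three .pp files changed this yields three commands. If the list of changed files wrongly contains 1477 manifests, it yields 1477 commands, which would match the long run times seen on the failing builds.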
[19:31:26] ah, I see in the job config, too: 'Wipe out workspace and force clone' [19:31:34] yeah [19:31:42] but that must be an issue on scandium [19:31:56] from a labs instance, you can try cloning from scandium + fetch ref + checkout [19:32:20] the parameters are available at https://integration.wikimedia.org/ci/job/pplint-HEAD/17551/parameters/ [19:32:27] or in the git debug log https://integration.wikimedia.org/ci/job/pplint-HEAD/17551/consoleFull [19:32:40] so [19:32:41] git://scandium.eqiad.wmnet/operations/puppet [19:32:44] git clone git://scandium.eqiad.wmnet/operations/puppet [19:33:12] eek one can even: git -c core.askpass=true fetch --tags --progress git://scandium.eqiad.wmnet/operations/puppet refs/zuul/production/Z6a3d43b3c73940f6ab37a4ba933c0ac4 && git checkout FETCH_HEAD [19:34:07] hmm, checkout seemed to work, although I am in a detached head state... [19:34:14] yup [19:34:20] and the patch looks fine [19:36:00] and if I look on scandium [19:36:18] git -C /srv/ssd/zuul/git/operations/puppet show --stat zuul/production/Z6a3d43b3c73940f6ab37a4ba933c0ac4 [19:36:21] that looks legit [19:36:50] maybe integration/jenkins.git /bin/git-changed-in-head has an issue :( [19:37:05] though it hasn't changed in a while [19:37:41] no, it's definitely doing things correctly. Check out what zuul set up on integration-slave-trusty-1004 in the pplint-HEAD workspace. [19:38:13] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [19:44:36] weirdly, the commit parent object doesn't exist in the repo in that workspace. Git log looks wacky: https://phabricator.wikimedia.org/P2964 [19:47:53] OHH [19:47:59] thcipriani: that is a shallow clone [19:48:36] it does git fetch --depth=1 [19:51:12] yeah reproduced [19:51:12] that indeed shows everything [19:52:52] so that would be why it's checking so many changes. But why only sometimes? 
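Why a shallow clone makes every file look changed can be shown with a toy model. This is a simplified sketch and not how git is implemented: commits and trees are plain dictionaries, and a parent that is absent from the object store (as past a `--depth=1` boundary) is treated as if the commit had no parent, so the comparison is made against an empty tree. All names and sample paths are made up for illustration:

```python
EMPTY_TREE = {}


def changed_paths(commit_id, commits, trees):
    """Paths whose content differs between a commit and its parent.

    commits maps commit id -> (parent id or None, tree id);
    trees maps tree id -> {path: blob id}.
    A parent missing from `commits` is compared as an empty tree.
    """
    parent_id, tree_id = commits[commit_id]
    new_tree = trees[tree_id]
    if parent_id in commits:
        old_tree = trees[commits[parent_id][1]]
    else:
        old_tree = EMPTY_TREE
    paths = set(old_tree) | set(new_tree)
    return sorted(p for p in paths if old_tree.get(p) != new_tree.get(p))


TREES = {
    "t1": {"a.pp": "blob1", "b.pp": "blob2", "c.pp": "blob3"},
    "t2": {"a.pp": "blob1", "b.pp": "blob2-edited", "c.pp": "blob3"},
}
FULL_CLONE = {"c1": (None, "t1"), "c2": ("c1", "t2")}
SHALLOW_CLONE = {"c2": ("c1", "t2")}  # parent c1 was never fetched
```

In the full clone, `changed_paths("c2", FULL_CLONE, TREES)` reports only `b.pp`. In the shallow one the same commit reports all three files, which is the 1477-files effect on a small scale.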
[19:53:42] that is a good question :( [19:53:48] I have upgrade the git plugin earlier today [19:53:51] might be a regression [19:54:53] rebuilding it shows the same behavior https://integration.wikimedia.org/ci/job/pplint-HEAD/17561/console [19:55:25] I have logged in !sal when I have upgraded the git and git client plugins [19:55:42] so I guess we could grab a console log from before that time , or just a random build from yesterday [19:55:44] and compare the git commands being used [19:55:55] and most probably revert the plugins upgrade [19:56:50] hmm so on the passing job: https://integration.wikimedia.org/ci/job/pplint-HEAD/17553/console it shows the remote as gallium. [19:57:41] RECOVERY - Puppet run on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:57] 05Gitblit-Deprecate, 10Diffusion, 13Patch-For-Review: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#2244723 (10Paladox) This will be added in https://phabricator.wikimedia.org/D217 [20:06:07] thcipriani: we should move that to a rake task :-} [20:06:13] or a rubygem [20:06:17] gem install puppet_lint_HEAD [20:06:24] rake puppet::lint::HEAD [20:06:26] \O/ [20:07:57] 05Gitblit-Deprecate, 10Diffusion, 13Patch-For-Review: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#2244753 (10Luke081515) [20:09:09] do I...do I want that? [20:11:31] There is over 75 thousond changes in mw core including open changes [20:13:37] !log updated OCG to version e39e06570083877d5498da577758cf8d162c1af4 [20:13:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:15:09] thcipriani: maybe not ;) [20:16:12] hashar: so, FWIW, the .git/refs/heads/production is not up-to-date on scandium. Looks like it's behind by 3 commits at this point. 
[20:16:23] for operations/puppet [20:19:35] yup [20:19:42] because some of the patches are also merged on gallium [20:20:03] so production branch is going to be lagged out until a change is elected to be merged on scandium [20:20:12] the zuul-merger process then git remote update from Gerrit [20:20:16] reset to origin/production [20:20:42] attempt merging the patch, on success craft a reference like /refs/zuul/production/Z102931023910239 [20:20:45] and send that ref + resulting commit back to Zuul scheduler [20:21:12] the Zuul scheduler then run the Gearman function with ZUUL_URL=git://scandium ZUUL_REF=/refs/zuul/production/Z102931023910239 ZUUL_COMMIT=whatever [20:21:19] which is then passed down to the Jenkins job as build parameters [20:21:26] parameters that are used in the Jenkins git plugin [20:22:12] gallium is Precise with git 1.7 while scandium is git 2.1.4 [20:22:27] and upgrade the git plugin might have introduced a change that take advantage of some new feature of git that ends up triggering the issue ( [20:23:36] the easiest way to confirm would be to rollback the git plugins [20:23:37] from https://integration.wikimedia.org/ci/pluginManager/installed [20:23:54] [Git client plugin] and [Git plugin] each have a nice blue "downgrade button" [20:23:57] then restart jenkins [20:24:07] retrigger the build known to fail and pointing at scandium [20:24:12] and see what happens [20:24:17] could the fact that scandium isn't up-to-date cause a problem when doing git show HEAD? Like if the commit's parent object doesn't exist in the cloned repo (as was the case with the failed workspaces)? [20:24:21] if the build pass with the old plugins -> regression in the plugins [20:24:32] if the build still fail -> whatever the f**** ?? ;-D [20:25:13] on scandium , it does fetch the patch from Gerrit as well as the history leading to it ( i.e. 
git fetch does not have --depth X [20:25:18] so the parent should be around [20:26:21] another possibility is that two pending changes got proposed [20:26:32] eh, but it isn't on at least one failed job: https://phabricator.wikimedia.org/P2965 [20:26:42] and the parent change (not merged in production) would not be fetched by the Jenkins job when doing git fetch --depth 1 [20:26:58] maybe git consider the parent as non existing since it is not in the refs/heads/ [20:27:23] yeah [20:28:45] but in that instance the parent had merged. anyway, this could be a red-herring. [20:29:12] probably [20:29:42] and in a shallow fetch (--depth 1) it is probably normal for the parent to not be around [20:29:52] yeah. [20:29:56] maybe the easiest is to migrate that job to nodepool [20:30:04] and instead of --depth 1 / shallow [20:30:14] just git clone from the local mirror [20:30:19] so the workspace would have a whole repo [20:30:37] (operations/puppet is shipped in Nodepool instances as a bare repository at /srv/git/operations/puppet.git ) [20:31:14] so we can: git clone /srv/git/operations/puppet.git (will create hardlinks, blazing fast) [20:31:21] then git fetch from scandium / checkout / done [20:31:30] been thinking about how to deal with that in JJB today [20:32:15] it is a fairly standalone job, if the goal is to move more things to nodepool. [20:33:34] yup [20:34:01] and to save consuming an instance for such a trivial task, maybe we can integrate the pp validation in another of the operations/puppet jobs [20:35:55] thcipriani: I would first downgrade the Jenkins git plugin to rule them out [20:36:06] wanna handle it ? https://integration.wikimedia.org/ci/pluginManager/installed :D [20:36:15] oh boy! [20:37:37] what's the protocol? Announce it in -opperations? then go for it? Should just take a few minutes, correct? [20:39:02] I first check whether anyone does deployment [20:39:12] which in Europe is ... 
nobody usually till 4pm local time ;} [20:39:18] yup, I think services is done, but I'll check. [20:39:32] but yeah a quick announcement in operations channel about Jenkins being restarted is a good thing [20:39:42] nowadays it only takes a couple minutes to come back [20:39:47] so that is not too much of a hassle [20:39:59] another thing I check is https://integration.wikimedia.org/zuul/ [20:40:16] cause I avoid aborting long running jobs in gate-and-submit [20:40:29] even if a job is aborted, Zuul will notice and reschedule it when jenkins is back [20:40:47] then https://integration.wikimedia.org/ci/ has a list of jobs running [20:41:12] the browsertests and some wmf-performance jobs are long running and would block restart until they are complete [20:41:15] I usually just kill them [20:41:26] they will run again later [20:43:36] thcipriani: and in case of emergency, people can still force merge in Gerrit and deploy ;-} [20:44:17] hashar: nodejs 6 has been released. [20:44:25] It is an LTS release. [20:44:34] https://github.com/nodejs/LTS#lts_schedule [20:44:41] https://nodejs.org/en/ [20:50:12] paladox: you want to poke #wikimedia-services about it I guess [20:50:31] they have most of their software written in NodeJs [20:51:29] thcipriani: so once the jobs complete, Jenkins should restart [20:51:48] you can tail : ssh -C gallium.wikimedia.org tail -F /var/log/jenkins/jenkins.log [20:52:01] * thcipriani does [20:52:35] and when impatient, you can abort/kill jobs in test pipeline [20:52:38] then folks will "recheck" [20:52:47] the jobs show back up in Gerrit as ABORTED [20:53:13] and since Jenkins now refuses to get new jobs, Zuul Gearman has them enqueued [20:53:25] so events pile up on https://integration.wikimedia.org/zuul/ [20:54:35] I have killed the jobs for https://gerrit.wikimedia.org/r/#/c/285722/ [20:54:51] I noticed. 
[20:55:06] and left a nice comment to the developer on his change at https://gerrit.wikimedia.org/r/#/c/285722/ [20:55:19] with 'recheck' so that will magically report back whenever Jenkins is back ;-) [20:55:42] when Jenkins starts [20:55:48] the plugins are enabled early on [20:55:55] and Gearman starts processing before the UI [20:56:04] so you might have https://integration.wikimedia.org/ci/ showing a 500 / timeout whatever [20:56:15] but jobs being processed on Zuul status page (and spamming jenkins.log as well) [20:56:39] well the log tail has certainly been crazy busy. [20:56:45] yeah [20:57:02] the Gearman plugin registers each of the jobs for each executor of each slave [20:57:12] and each event must generate a few lines by itself [20:57:15] so that piles up quickly ;} [20:57:23] I usually just Ctrl+C at that point [20:57:34] so https://integration.wikimedia.org/ci/pluginManager/installed shows the git plugins have been reverted [20:57:52] time to rebuild https://integration.wikimedia.org/ci/job/pplint-HEAD/17551/ which is known to fail [20:58:56] https://integration.wikimedia.org/ci/job/pplint-HEAD/17564/console [20:59:05] seems to be looking at everything again :( [20:59:14] \O/ [20:59:31] that rules out the git plugins imho [20:59:37] :D [20:59:57] !log thcipriani downgraded git plugins successfully (we wanted to rule out their upgrade for some weird issue) [21:00:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:00:07] there [21:00:12] thanks [21:00:48] thanks for walking me through that. good practice. /me makes notes. 
[21:00:50] * hashar awards a yellow star for thcipriani "Jenkins maint" on https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Skill_matrix [21:02:12] then I have no idea about the issue :( [21:02:26] might be a problem within git [21:02:44] with scandium serving with git 2.1.4 and the slave (trusty) using git v1.9 (iirc) [21:02:51] which would have some kind of weird side effect [21:03:42] I don't have good ideas really :( [21:04:34] so, just for my notes, you replied 'recheck' on that job before you cancelled it in jenkins? [21:04:37] I reproduced it by copy pasting the git command and I think I did that on scandium which is Jessie and has git client 2.1.4 [21:04:44] oh [21:04:51] no I cancelled the jobs in the Jenkins view [21:05:17] then confirmed in Zuul status page the change is gone / no jobs are left running [21:05:30] and headed to the gerrit change where jenkins voted -1 with a few ABORTED jobs [21:05:34] then commented recheck [21:05:45] that re-enqueues the job in jenkins as if the patchset had just been submitted [21:06:12] zuul adds it to 'test' pipeline [21:06:42] fires the gearman functions, which sit in the gearman queue until a worker takes them [21:06:49] gotcha, zuul didn't seem to pick this one back up: https://gerrit.wikimedia.org/r/#/c/285722/ [21:07:14] it must be busy doing something else [21:07:37] oh no [21:07:41] https://integration.wikimedia.org/zuul/ it shows up in "test" [21:07:50] oojs/ui jobs take a while [21:08:12] on the Zuul status page there are some timers [21:08:20] 0 min (in black) is the estimated time of accomplishment [21:08:42] 12 min is the amount of time the change has been ( known to --- stuck in ) Zuul [21:09:05] the ETA is a bit off though [21:09:27] Jenkins produces an ETA for each job and Zuul reports the max() one [21:09:33] but our jobs are shared by different repos [21:09:46] so "npm" job can take anywhere from a few seconds to several minutes [21:09:49] which messes up the ETA [21:10:06] (no clue how Jenkins 
compute it, but it is per job for sure) [21:10:22] had we had a oojs-ui-npm job, the ETA would be better [21:10:42] (flood of text sorry) [21:11:07] Sorry. :-) [21:11:31] When we switch OOUI over to nodepool, we'll start screwing up npm-nodepool-4.3 instead. [21:11:51] "We", he says, despite hashar doing all the work. [21:12:16] James_F: I challenge that statement :-} [21:12:41] James_F: you have done a large share of CI work and evangelization ! [21:13:01] once we got bunch of jobs to Nodepool I think I will look at deploying the Jenkins jobs via scap3 [21:13:14] and probably use the repo name as a prefix [21:13:28] so long term: oojs-ui-npm :-D [21:14:00] Sure. [21:15:18] James_F: been working on composer / PHP on Nodepool. That is a bit of a mess though but I got some experimental jobs for Zend 5.5 / Trusty ;-} [21:15:30] might migrate whatever npm job that relies on composer tomorrow [21:16:20] James_F: and I can't remember who suggested to use nvm (Node Version Manager) to simply handling of nodejs/npm versions [21:16:45] I might have? [21:18:03] that would not surprise me [21:18:16] there is even some puppet module! [21:18:50] hashar: This [21:18:50] https://gerrit.wikimedia.org/r/269370 [21:18:50] has been merged [21:18:58] Should we close this https://phabricator.wikimedia.org/T126211 [21:19:21] Please. [21:21:28] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #250: 04FAILURE in 5 min 27 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/250/ [21:27:37] paladox: yeah will take care of that tomorrow ;) [21:27:57] Ok [21:28:05] :) [21:28:36] will have to rebuild the nodepool images [21:28:46] and check that the php alternative works fine [21:28:50] fun ;-D [21:28:52] Oh ok. 
:) [21:28:54] Yep [21:29:12] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php composer (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2245025 (10hashar) a:03hashar [21:29:43] 05Continuous-Integration-Scaling, 06Operations, 07Nodepool, 07WorkType-NewFunctionality: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#2245031 (10hashar) a:05hashar>03None [21:30:12] well night time have a good day evening etc [21:33:29] 10Continuous-Integration-Infrastructure: pplint-HEAD fails for clones of puppet from scandium - https://phabricator.wikimedia.org/T133816#2245071 (10thcipriani) [21:37:04] 10Continuous-Integration-Infrastructure: pplint-HEAD fails for clones of puppet from scandium - https://phabricator.wikimedia.org/T133816#2245104 (10hashar) [21:37:44] that one looks like a Puzzle [21:38:00] err [21:38:01] jigsaw [21:38:12] well that stuff https://en.wikipedia.org/wiki/Jigsaw_puzzle [21:38:12] got ton of pieces [21:38:21] but you get to find them ;-) [22:09:03] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-GettingStarted, 06Operations: GettingStarted on Beta Cluster periodically loses its Redis index - https://phabricator.wikimedia.org/T100515#2245193 (10Mattflaschen) I reopened it since it did happen again: T94154#2066037 [22:18:28] nodepool jobs not running? [22:18:41] !log Deployed https://gerrit.wikimedia.org/r/#/c/282440/ to switch Beta Cluster to use External Store for new testwiki writes [22:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:21:37] legoktm: blerg. looks that way. 
[22:23:12] Exception: Timeout waiting for server 9577e9a7-35a9-4083-8257-787f0bff6222 deletion in wmflabs-eqiad [22:25:29] 10Continuous-Integration-Config, 10Fundraising-Backlog, 10Unplanned-Sprint-Work, 07FR-ActiveMQ, and 3 others: Run PHPUnit on PHP-Queue repo - https://phabricator.wikimedia.org/T133574#2245253 (10DStrine) [22:26:33] Worked. :) [22:26:58] I haven't really had to deal with the nodepool service. wonder if it's fine to just restart in this instance? [22:27:03] ^ andrewbogott ? [22:29:46] 10Beta-Cluster-Infrastructure, 10Staging, 10DBA, 03Collab-Archive-2015-2016, and 2 others: Use External Store on Beta Cluster - https://phabricator.wikimedia.org/T95871#2245311 (10Mattflaschen) Working fine. I did a couple tests at http://test.wikimedia.beta.wmflabs.org/wiki/Vorlage:Lk_%28Bibel%29 and htt... [22:35:29] !log restarting nodepool [22:35:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:36:11] !log I don't have permission to restart nodepool [22:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:36:20] thcipriani: sorry, went offline a bit. nodepool list shows all the jessie instances in a delete state [22:36:30] Yippee, build fixed! [22:36:30] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1032: 09FIXED in 25 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1032/ [22:36:32] yeah, I saw that. [22:36:44] you're not in contint-admins? [22:36:58] I found this ticket https://phabricator.wikimedia.org/T120668 there was a jenkins restart earlier, but 8888 looks up on gallium. [22:37:02] I thought I was. [22:37:31] I'm in the list according to data.yaml [22:37:47] do we know why jenkins is timing out? 
[22:37:53] ah, I did nodepool restart, not nodepool stop/start [22:38:08] !log stop/started nodepool [22:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:38:53] legoktm: Thanks, I was just about to bring that up [22:39:00] I'm not really sure that helped though [22:39:03] now it's throwing [22:39:04] ManagerStoppedException: Manager wmflabs-eqiad is no longer running [22:39:05] I was starting to wonder why my patch for SWAT hadn't merged yet [22:39:06] no, nodepool restart was just a guess. [22:39:51] 10Beta-Cluster-Infrastructure, 10Staging, 10DBA, 03Collab-Archive-2015-2016, and 2 others: Use External Store on Beta Cluster - https://phabricator.wikimedia.org/T95871#2245341 (10Mattflaschen) A few DB queries I used to verify (260 is http://test.wikimedia.beta.wmflabs.org/wiki/Vorlage:Lk_%28Bibel%29): `... [22:40:12] 2016-04-27 22:39:17,245 INFO nodepool.NodePool: Deleted jenkins node id: 85342 [22:40:14] and a bunch more now [22:41:00] nodepool list hasn't updated yet though [22:41:36] uh, let me try deleting one of them manually [22:41:37] !log Deployed https://gerrit.wikimedia.org/r/#/c/285765/ to enable External Store everywhere on Beta Cluster [22:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:42:05] !log nodepool delete 85342 [22:42:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:43:48] didn't do anything :( [22:45:02] I noticed. hmmmm. [22:46:24] did you try it in the foreground? Lemme try one. 
[22:47:48] lots of INFO urllib3.connectionpool: Starting new HTTP connection (1): labnet1002.eqiad.wmnet
[22:49:42] Beta-Cluster-Infrastructure, Staging, DBA, Collab-Archive-2015-2016, and 2 others: Use External Store on Beta Cluster - https://phabricator.wikimedia.org/T95871#2245384 (Mattflaschen) Tested enwiki at http://en.wikipedia.beta.wmflabs.org/wiki/UserMergeoujuvz and http://en.wikipedia.beta.wmflabs.o...
[22:51:05] well, I get a connection refused from netcat to port 80 on that box. So that could be part of the problem.
[22:51:40] sorry, my wifi died
[22:51:48] !log also ran openstack server delete ci-jessie-wikimedia-85342
[22:51:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:52:00] ^ didn't do anything afais
[22:52:28] thcipriani: you can't connect to labnet1002?
[22:53:15] labnodepool1001: nc -vz labnet1002.eqiad.wmnet -w1 80 == labnet1002.eqiad.wmnet [10.64.20.25] 80 (http) : Connection refused
[22:53:53] :/
[22:55:12] I'm going to log out of labnodepool1001.eqiad.wmnet because my wifi is really sketchy right now
[23:03:27] nodepool is Exception: Timeout waiting for server da75528f-399a-47ce-bafd-8d429b61ff40 deletion in wmflabs-eqiad again.
[23:04:07] seems like it can't talk to openstack.
[23:06:40] it has open connections to labnet1002 on 8774, which seems right, but not getting a response.
[23:08:31] Browser-Tests, MediaWiki-extensions-GettingStarted, Tracking: Cucumber tests for GettingStarted - https://phabricator.wikimedia.org/T92156#2245430 (Jdforrester-WMF) p:Normal>Low
[23:13:18] hmm
[23:16:46] the gearman job pool is all waiting no running, pretty much
[23:19:49] FWIW, it sounds a lot like this is happening: https://phabricator.wikimedia.org/T122731
[23:28:03] hm
[23:28:10] everything ready for the phabricator update?
[23:30:24] twentyafterfour: I created T133820 for the future
[23:30:24] T133820: Future Phabricator Update - https://phabricator.wikimedia.org/T133820
[23:31:05] Luke081515: trying to put out the nodepool fire, so not quite ready for phab update
[23:31:15] ok
[23:41:13] Gitblit-Deprecate, Diffusion, Patch-For-Review: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#2245487 (mmodell)
[23:42:02] Gitblit-Deprecate, Diffusion, Patch-For-Review: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#1049136 (mmodell)
[23:57:27] !log nodepool instances running again after an openstack rabbitmq restart by andrewbogott
[23:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master