[00:36:05] (PS5) Awight: Populate CRM test dbs [integration/jenkins] - https://gerrit.wikimedia.org/r/158554 [00:44:13] Quality-Assurance, Release-Engineering: Review environment abstraction layer for mediawiki_selenium - https://phabricator.wikimedia.org/T78356#843217 (dduvall) NEW a:dduvall [00:50:40] (PS1) Dduvall: Cache authenticated API client by url and user [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179362 [01:42:20] (PS6) Awight: Populate CRM test dbs [integration/jenkins] - https://gerrit.wikimedia.org/r/158554 [01:44:10] (PS1) Dduvall: Method for loading of default environment [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179375 [01:44:12] (PS1) Dduvall: Minor cruft removal [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179376 [01:44:32] (CR) jenkins-bot: [V: -1] Method for loading of default environment [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179375 (owner: Dduvall) [11:51:06] Release-Engineering: "gem build" should fail if there are _any_ warnings - https://phabricator.wikimedia.org/T1333#844106 (zeljkofilipin) [13:05:07] Release-Engineering: "gem build" should fail if there are _any_ warnings - https://phabricator.wikimedia.org/T1333#844187 (hashar) Although Gem does not emit an exit code on warning, that can be worked around by catching stderr and check whether anything has been output. See my previous comment: T1333#23449...
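Note on T1333: the workaround hashar describes above (treat any stderr output as a failure, because the build command still exits 0 on warnings) could look roughly like the sketch below. It is only an illustration in Python, not the script from the task; `run_strict` is a hypothetical helper name, and it assumes that warnings are written to stderr, as the comment suggests.

```python
import subprocess
import sys


def run_strict(cmd):
    """Run cmd (a list of arguments) and return (ok, stderr_text).

    ok is False when the exit code is non-zero OR when anything at all
    was written to stderr, even if the command itself reported success.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    stderr_text = result.stderr.strip()
    ok = result.returncode == 0 and stderr_text == ""
    return ok, stderr_text


def main(argv):
    """Hypothetical entry point: pass the command to check as arguments."""
    ok, stderr_text = run_strict(argv)
    if not ok:
        print("failing the build, stderr was:", stderr_text)
        return 1
    return 0
```

A CI job could then call something like `sys.exit(main(["gem", "build", "example.gemspec"]))`, where `example.gemspec` is a placeholder file name, so that a warning turns into a non-zero exit status.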
[14:16:51] (PS4) Stan: add MediaWiki standard rubocop config [selenium] - https://gerrit.wikimedia.org/r/176256 [14:16:53] (PS4) Stan: change block comments to # comments (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176260 [14:16:55] (PS5) Stan: change compact style module definition to nested (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176261 [14:16:57] (PS2) Stan: remove non applicable private (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176254 [14:16:59] (PS4) Stan: remove redundant selfs (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176340 [14:17:01] (PS4) Stan: rubocop line length fixes [selenium] - https://gerrit.wikimedia.org/r/176255 [14:17:03] (PS2) Stan: fix rubocop string literal offenses [selenium] - https://gerrit.wikimedia.org/r/176252 [14:17:05] (PS3) Stan: add basic documentation comments (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176275 [14:17:07] (PS4) Stan: use english special globals per rubocop [selenium] - https://gerrit.wikimedia.org/r/176342 [14:17:09] (PS2) Stan: mark unused argument (rubocop fix) [selenium] - https://gerrit.wikimedia.org/r/176253 [14:17:11] (PS3) Stan: remove extra blank lines (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176277 [14:17:13] (PS3) Stan: add empty lines (per rubocop style) [selenium] - https://gerrit.wikimedia.org/r/176276 [14:17:15] (PS3) Stan: use guard clause per rubocop [selenium] - https://gerrit.wikimedia.org/r/176279 [14:17:17] (PS3) Stan: disable cop in this instance as rvm requires no space [selenium] - https://gerrit.wikimedia.org/r/176282 [14:17:19] (PS4) Stan: disable global cop around globals that can't easily be removed [selenium] - https://gerrit.wikimedia.org/r/176344 [14:17:21] (PS1) Stan: regenerate rubocop todo [selenium] - https://gerrit.wikimedia.org/r/179460 [14:17:23] (PS1) Stan: remove comma after last item of 
hash (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179461 [14:17:25] (PS1) Stan: fullstop with method name (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179462 [14:17:27] (PS1) Stan: remove space inside empty braces (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179463 [14:17:29] (PS1) Stan: use &: style instead of block (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179464 [14:17:31] (CR) jenkins-bot: [V: -1] change block comments to # comments (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176260 (owner: Stan) [14:17:33] (CR) jenkins-bot: [V: -1] change compact style module definition to nested (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176261 (owner: Stan) [14:17:37] (CR) jenkins-bot: [V: -1] remove non applicable private (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176254 (owner: Stan) [14:17:39] (CR) jenkins-bot: [V: -1] remove redundant selfs (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176340 (owner: Stan) [14:17:41] (CR) jenkins-bot: [V: -1] rubocop line length fixes [selenium] - https://gerrit.wikimedia.org/r/176255 (owner: Stan) [14:17:43] (CR) jenkins-bot: [V: -1] fix rubocop string literal offenses [selenium] - https://gerrit.wikimedia.org/r/176252 (owner: Stan) [14:17:45] (CR) jenkins-bot: [V: -1] add basic documentation comments (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176275 (owner: Stan) [14:17:47] (CR) jenkins-bot: [V: -1] mark unused argument (rubocop fix) [selenium] - https://gerrit.wikimedia.org/r/176253 (owner: Stan) [14:17:50] (CR) jenkins-bot: [V: -1] remove extra blank lines (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176277 (owner: Stan) [14:17:51] (CR) jenkins-bot: [V: -1] add empty lines (per rubocop style) [selenium] - https://gerrit.wikimedia.org/r/176276 (owner: Stan) [14:17:53] (CR) 
jenkins-bot: [V: -1] use guard clause per rubocop [selenium] - https://gerrit.wikimedia.org/r/176279 (owner: Stan) [14:17:55] (CR) jenkins-bot: [V: -1] disable global cop around globals that can't easily be removed [selenium] - https://gerrit.wikimedia.org/r/176344 (owner: Stan) [14:17:57] (CR) jenkins-bot: [V: -1] remove comma after last item of hash (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179461 (owner: Stan) [14:17:59] (CR) jenkins-bot: [V: -1] fullstop with method name (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179462 (owner: Stan) [14:18:01] (CR) jenkins-bot: [V: -1] remove space inside empty braces (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179463 (owner: Stan) [14:18:03] (CR) jenkins-bot: [V: -1] use &: style instead of block (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/179464 (owner: Stan) [14:34:13] (CR) JanZerebecki: [C: 1] "We could craft jobs to run some or all of the jobs requiring composer in a variant that uses this updated repo, but I think other things m" [integration/composer] - https://gerrit.wikimedia.org/r/178550 (owner: Legoktm) [14:40:08] (Abandoned) Stan: remove nil comparison per rubocop [selenium] - https://gerrit.wikimedia.org/r/176283 (owner: Stan) [14:40:40] (Abandoned) Stan: remove extra blank line (per rubocop) [selenium] - https://gerrit.wikimedia.org/r/176278 (owner: Stan) [14:40:53] (Abandoned) Stan: correct bracket spacing per rubocop [selenium] - https://gerrit.wikimedia.org/r/176341 (owner: Stan) [14:41:09] (Abandoned) Stan: change delimiter for % per rubocop [selenium] - https://gerrit.wikimedia.org/r/176338 (owner: Stan) [14:41:22] (Abandoned) Stan: indentation fix [selenium] - https://gerrit.wikimedia.org/r/176281 (owner: Stan) [14:41:33] (Abandoned) Stan: update hash syntax per rubocop [selenium] - https://gerrit.wikimedia.org/r/176280 (owner: Stan) 
[14:41:51] (Abandoned) Stan: ignore cop because is part of module api [selenium] - https://gerrit.wikimedia.org/r/176257 (owner: Stan) [14:42:03] (Abandoned) Stan: rubocop and/or style fixes [selenium] - https://gerrit.wikimedia.org/r/176258 (owner: Stan) [14:42:16] (Abandoned) Stan: use % instead of %Q per rubocop [selenium] - https://gerrit.wikimedia.org/r/176259 (owner: Stan) [14:43:10] (PS2) Stan: regenerate rubocop todo [selenium] - https://gerrit.wikimedia.org/r/179460 [14:55:36] Project browsertests-Wikidata-SmokeTests-linux-firefox-sauce build #84: FAILURE in 38 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox-sauce/84/ [14:55:42] zeljkof_: https://gerrit.wikimedia.org/r/#/c/178903 [15:15:13] Continuous-Integration: Zuul WebApp no more provide status.json (exception raised) - https://phabricator.wikimedia.org/T78400#844314 (hashar) That is entirely my fault. Earlier this week I have reset Zuul to wmf-deploy-20141030-3 which has a wrong patch causing that exact error. Reverting again to wmf-deplo... [15:15:56] Continuous-Integration: Zuul WebApp no more provide status.json (exception raised) - https://phabricator.wikimedia.org/T78400#844315 (hashar) Open>Resolved a:hashar [15:29:47] Continuous-Integration: Bump Zuul supports for python-statsd 3.x - https://phabricator.wikimedia.org/T78402#844324 (hashar) NEW a:hashar [15:34:46] Quality-Assurance: rubocop fixes in mediawiki/selenium - https://phabricator.wikimedia.org/T75898#844333 (Stan3) hmmm, can't seem to get gerrit to base my changes on the env-abstraction-layer branch rather than master pushed to https://github.com/stan3/mediawiki-selenium/tree/T75898 in the mean time [15:36:11] manybubbles: nik!!!!!!!!!!!!!!!!! [15:36:54] manybubbles: iirc you are a rspec guru. 
If you have some spare time, you might want to glance at a patch I proposed on operations/puppet to let us do rspec unit testing https://gerrit.wikimedia.org/r/#/c/178810/ :D [15:39:52] hashar: k [15:40:33] hashar: I'm probably not the best with rspec but I can certainly review it [15:47:44] manybubbles: if you have some spare cycles, would be nice to get your view on it :] [15:58:23] (CR) Hashar: [C: -2] "They usually run in less than 2 minutes." [integration/config] - https://gerrit.wikimedia.org/r/179345 (owner: MaxSem) [16:02:35] shinken is actually correct it seems. http://en.wikipedia.beta.wmflabs.org/ not responding as of just now. [16:03:58] 503 after a long wait [16:10:52] and it's back [16:30:13] (PS4) Hashar: Support MediaWiki core under HHVM [integration/config] - https://gerrit.wikimedia.org/r/178862 [16:32:11] (CR) Hashar: "Fixed a few more things:" [integration/config] - https://gerrit.wikimedia.org/r/178862 (owner: Hashar) [16:36:32] Continuous-Integration, Release-Engineering: Zuul-cloner forgets to clear workspace - https://phabricator.wikimedia.org/T76304#844401 (hashar) I would use `npm prune` as a workaround for the npm job. I lack spare cycles to investigate the impact of having zuul-cloner to run `git clean -xqdf` (or `-ff`). [16:36:50] Continuous-Integration, Release-Engineering: Zuul-cloner forgets to clear workspace - https://phabricator.wikimedia.org/T76304#844402 (hashar) a:hashar>None [16:38:49] Continuous-Integration: [OPS] Jenkins: Slaves running Ubuntu Trusty should have hhvm installed - https://phabricator.wikimedia.org/T75356#844403 (hashar) I have applied the puppet patch https://gerrit.wikimedia.org/r/178806 on the continuous integration puppetmaster, so we have hhvm installed and more or les... [16:45:30] hi hashar do you know what is wrong when I run "jenkins-jobs test config/ -o output/" and nothing appears in output/ ? The output/directory remains empty. I have updated and re-installed jjb. 
[16:46:06] hashar: and I have the correct branch in config/ [16:46:34] YuviPanda: Any idea what this puppet failure is about? -- Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::labs::instance for i-00000194.eqiad.wmflabs on node i-00000194.eqiad.wmflab [16:46:36] Release-Engineering, Quality-Assurance: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#844410 (hashar) Poked our internal ops list to attract attention on the patch and have people to try it out. [16:46:50] bd808: yeah that is transient [16:47:02] bd808: the puppetmaster can't read something in the sqlite database it use [16:47:13] bd808: most probably because another puppetmaster process has a lock on the sqlite database [16:47:24] that happens often when you run puppet agent -tv in // on different hosts [16:47:32] *nod* [16:47:36] worked the second time [16:47:43] chrismcmahon: the config path is wrong: "config/" -> "config/jjb/" [16:48:01] chrismcmahon: whatever doc you used needs an update. The JJB files got moved to integration/config.git under /jjb/ directory [16:48:20] bd808: ideally the puppet master self would use a local mysql database ;D [16:48:35] or even more ideally we'd be masterless [16:48:46] grr Shinken reports everything is dead :( [16:48:53] but that has issues when we think we are hiding secrets I guess [16:48:56] bd808: using /data/project/puppet ? :] [16:49:51] Error: /Stage[main]/Ldap::Client::Utils/File[/usr/local/sbin/manage-nfs-volumes-daemon]: Could not evaluate: getaddrinfo: Temporary failure in name resolution [16:49:52] bah [16:50:03] thanks hashar [16:50:16] labs dns is flaky [16:51:32] yeah again [16:51:49] last time that was because of the beta cluster spamming it [16:52:32] greg-g: (and for all deployers) https://twitter.com/eightamrock/status/543409688784936961 :) [16:52:46] having issues with beta cluster responding still :/ [16:52:58] yeah. me too. 
[16:53:12] anybody looking at that or have an idea where to start? [16:53:13] kart_: hah! I like the numpad for entering the launch code [16:53:37] chrismcmahon: are you looking into the unresponsive beta cluster? [16:53:59] ping is working... [16:54:22] There was a ton of log spam in logstash about memcached failures a bit ago [16:54:23] beside DNS being flappy on labs? [16:54:30] greg-g: it came back afaik [16:54:30] stopped getting a lot of events at 16:40: https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default [16:54:48] chrismcmahon: 503 here [16:54:55] greg-g: OK. I [16:55:10] the memcached error like: Memcached error for key "simplewiki:lag_times:deployment-db1" on server "127.0.0.1:11212": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY [16:55:14] greg-g: OK. I've been working on pushing some Jenkins jobs this morning. [16:55:20] I have NEVER understood what it is nor how it is triggered [16:55:59] ask in -ops? [16:56:15] * bd808 is looking at host now [16:56:30] s/never understood/never wanted to spend time figuring out what the non obvious message mean/ :D [16:56:49] apparently logstash-beta no more receive events [16:56:54] yeah [16:56:57] at 16:40ish [16:57:37] twentyafterfour: awake? [16:57:41] curl localhost is hanging on deployment-mediawiki02 [16:57:51] I think apache is taking a long nap [16:57:52] wt... 
[16:58:03] 65 child processes [16:58:05] urllib2.URLError: [16:58:14] and there are some: 2014-12-12 14:27:29 deployment-jobrunner01 wikidatawiki: [fcc6214f] [no req] RedisException from line 48 of /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/RedisHandler.php: read error on connection [16:58:58] !log restarted apache2 on deployment-mediawiki02 [16:59:35] Logged the message, Master [16:59:39] fixed it I think [16:59:56] started getting events in logstash again at 16:54 [17:00:06] !log restarted apache2 on deployment-mediawiki01 [17:00:09] Logged the message, Master [17:00:51] apache had 65 children on both hosts. I haven't seen that since we were having fcgi problems with hhvm [17:01:27] So something hung up the hhvm processes I guess and the apache worker pool back up the the point of being unusable [17:01:49] :/ [17:01:57] I would guess related to the memcached errors [17:02:46] bd808: worth a quick summary to ops mailing list? I volunteer hashar or chrismcmahon :) [17:02:56] It's still sick/sad [17:03:03] grr [17:03:16] deployment-mediawiki01 is at 65 kids again [17:03:32] and so is 02 [17:03:39] something's fubar [17:05:11] ok, so, what changed? [17:05:14] redis server is down maybe? Dec 12 16:47:08 deployment-mediawiki02 hhvm: #012Warning: Failed connecting to redis server at 10.68.16.134: Connection timed out [17:06:24] redis seems to be alive on deployment-redis01 [17:07:04] * chrismcmahon yells at jjb "why u no update Jenkins you $#@$#^%!" [17:07:19] chrismcmahon: jenkins can wait for now [17:09:50] lots of fcgi errors. I'm going to restart hhvm [17:10:35] so it might be memcached related as well [17:10:54] but honestly I lost track of all the memcachedinfra changes that occurred over the last two years or so [17:11:02] nutcracker might be at fault, [17:11:54] Reedy: there? [17:12:12] all better [17:12:21] "all" and "better" ? 
:) [17:12:37] !log restarted hhvm on deployment-mediawiki0[12] and purged hhbc database [17:13:11] Logged the message, Master [17:13:29] so hhvm had gone bonkers and was giving apache bad fast cgi responses [17:13:33] \O/ [17:13:44] thanks Bryan! [17:14:07] I shot it in the head and cleaned up the existing jit bytecode cache while it was down for good measure [17:14:08] chrismcmahon: you might have an entry in the local jjb cache [17:14:22] chrismcmahon: jenkins-jobs --flush-cache [17:14:38] memecahced looks to still be fubar [17:14:59] hashar: that's probably it, checking... [17:17:55] bd808: yeah, that’s transient. [17:17:57] nope [17:18:24] YuviPanda: what is? [17:18:47] greg-g: uh, a particular error bd808 asked me about, not finding role::instance or somesuch [17:18:56] k [17:19:04] today’s shinken spam courtesy of a puppet change that broke puppet *everywhere*, including cluster :) [17:19:18] "great" [17:19:23] would’ve spammed here way too much, so I’ve kicked shinken off [17:20:51] bd808: hashar so the puppet cron was broken in a commit, a followup fixed it but this means that puppet won’t auto run until it’s forced to run (with an up to date ops/puppet) at least once [17:21:14] I guess one can use salt to run puppet everywhere ? 
[17:21:48] I am heading back home and be back on monday [17:21:56] promised wife and kids to be there "early" :D [17:26:38] * hashar waves [17:32:43] !log forcing puppet runs on deployment-mediawiki0[12]; hiera settings specific to beta were not applied on the hosts leading to all kinds of problems [17:32:47] Logged the message, Master [17:34:24] greg-g: This was the cause of a large number of problems -- https://phabricator.wikimedia.org/P153 [17:35:22] notice all the memecached servers point to prod instead of beta and syslog output not going to logstash [17:48:56] bd808: ugh [17:50:43] greg-g: So it looks like there was some transient problem talking to wikitech that led to our hiera overrides not being applied to the puppet run. There should have been multiple puppet runs in the time that things were borked though as far as I can tell. [17:51:32] this transient word keeps coming up [17:51:39] But there was a premature puppet cahnge committed that messed some things up so it may have just been a random timing thing [17:51:46] is yuvi looking into the failure of hier? [17:51:47] a [17:51:52] k [17:51:52] Quality-Assurance, Release-Engineering: role/phabricator.pp include a password class in the global puppet scope - https://phabricator.wikimedia.org/T78344#844493 (akosiaris) Indeed that include should not be outside a class. Best way to do this is to put all that it in a ::config class and include it in any o... [17:52:03] greg-g: didn't you know that most computer problems are non-deterministic? [17:52:23] Knuth proved that at some point before you were born. 
:) [17:52:27] :P [18:00:54] greg-g: I am now [18:06:22] Beta-Cluster: Beta servers can be badly misconfigured if mwyaml hiera backend fails - https://phabricator.wikimedia.org/T78408#844508 (bd808) NEW [18:06:45] twentyafterfour: see the last paste from bd.808 [18:07:07] and that phab task ^^ [18:07:08] looks like failed to apply the hiera bits we have for beta cluster [18:07:37] greg-g: no, I’m still looking into overall puppet failure in betalabs and trying to get that fixed [18:07:43] * greg-g nods [18:08:28] YuviPanda: things seem to be working on hosts where I have forced a manual run [18:08:46] Is the cron runner broken until manually run? [18:08:51] yeah, that’s what I’m doing, ran into another change that caused puppet failures for toollabs... [18:08:58] *nod* [18:09:03] Quality-Assurance, Release-Engineering: role/phabricator.pp include a password class in the global puppet scope - https://phabricator.wikimedia.org/T78344#844519 (chasemp) >>! In T78344#844493, @akosiaris wrote: > Best way to do this is to put all that it in a ::config class and include it in any other role c... [18:09:05] bd808: it is, and I’m forcing a run in all of labs, but not sure if that’ll hit deployment prep [18:09:09] since it has its own saltmaster [18:09:25] YuviPanda: It won't. The salt masters aren't chained [18:09:45] apparently there is some way to do that. Ryan mentioned it a long time ago [18:10:05] right [18:10:15] I can force via salt in beta if you'd like [18:10:18] bd808: if so, can you use salt to force a run on all deployment-prep hosts [18:10:20] yeah [18:10:29] will do [18:10:38] unless twentyafterfour wants to [18:12:58] twentyafterfour: want to learn how to do that ^ ? :) [18:20:55] greg-g: ok [18:21:21] bd808: already done it? [18:21:30] twentyafterfour: nope [18:21:38] all yours if you have time [18:21:43] ok [18:23:07] remember to batch, forgetting that is why virt1000 (and wikitech) was down for a few mins :) [18:23:18] crush! 
[18:23:31] YuviPanda: sample command? [18:23:58] salt '*' -b2 cmd.run 'puppet agent --test --verbose' ? [18:25:11] yeah [18:25:29] or -tv :) [18:26:17] I like having readable commands in my shell history (except when I'm using awk) [18:26:24] heh [18:26:25] alright [18:28:00] twentyafterfour: ^ [18:29:03] ok so force puppet run? [18:29:55] yup [18:30:32] and I run this on the deployment salt master? [18:32:41] twentyafterfour: yes [18:32:46] you need sudo [18:34:42] ohi mw-logstash-beta [18:35:15] !log Added puppet config to record !log messages in logstash [18:35:18] Logged the message, Master [18:36:40] Quality-Assurance, Release-Engineering: role/phabricator.pp include a password class in the global puppet scope - https://phabricator.wikimedia.org/T78344#844561 (hashar) >>! In T78344#844493, @akosiaris wrote: > Indeed that include should not be outside a class. Best way to do this is to put all that it in a... [18:37:04] Yippee, build fixed! [18:37:04] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #132: FIXED in 35 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/132/ [18:37:24] salt+puppet seems to be working [18:38:40] Could not find class webserver:::php5-mysql for i-000004df.eqiad.wmflabs on node i-000004df.eqiad.wmfla [18:39:44] Could not find class lvs::configuration for i-000005bf.eqiad.wmflabs [18:40:51] wtf build fixed? [18:41:07] that should be failing. I think y'all fixed beta labs better than you know. [18:42:00] twentyafterfour: The failure on I-000004df.eqiad.wmflabs is config drift. That's the hacked up deployment-sentry2 box [18:43:21] twentyafterfour: Maybe worth opening a task for. Reedy was going to play with that box at some point but I don't think he's gotten around to it [18:43:37] ok [18:43:42] so far just those two failures [18:44:02] Yippee, build fixed! 
[18:44:03] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #185: FIXED in 36 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/185/ [18:44:24] The other one is deployment-parsoidcache02. Maybe gwicke could be tricked into looking into it? [18:46:47] gwicke: oh gabriel :) :) see the above puppet failure on parsoidcache02 in beta cluster ^ (or subbu or whoever) [18:48:47] Release-Engineering: Puppet failure on deployment-sentry2 - https://phabricator.wikimedia.org/T78411#844586 (mmodell) NEW [18:49:56] bd808, greg-g: roan was working on that recently IIRC [18:49:58] salt seems frozen or something, no longer running puppet but not returning to the shell [18:50:17] he might know more about the details [18:56:51] Quality-Assurance, Release-Engineering: role/phabricator.pp include a password class in the global puppet scope - https://phabricator.wikimedia.org/T78344#844612 (akosiaris) Ah, OK, cause in my RSpec tests I test per module, avoiding this issue entirely. Granted I haven't gotten around to testing roles but mo... [19:03:56] gwicke: does that make sense for him to manage the parsoid machine instead of the.. parsoid team? [19:07:30] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #411: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/411/ [19:10:27] wb, shinken-wm [19:13:53] Yippee, build fixed! 
[19:13:54] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #340: FIXED in 44 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/340/ [19:18:08] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #350: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/350/ [19:19:06] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:25:24] reran puppet on parsoid-cache02 and it didn't fail this time [19:52:31] RECOVERY - Puppet failure on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:54:08] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0] [19:54:50] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:56:04] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [19:56:34] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:58:25] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:58:35] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:41] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:14:36] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:15:14] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:16:11] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #170: FAILURE in 1 hr 2 min: 
https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/170/ [20:17:32] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #278: FAILURE in 1 min 20 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/278/ [20:20:22] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #60: FAILURE in 2 min 49 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/60/ [20:20:46] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #357: FAILURE in 2 min 15 sec: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/357/ [20:21:14] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #379: FAILURE in 46 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/379/ [20:23:04] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #314: FAILURE in 2 min 17 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/314/ [20:37:39] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #222: FAILURE in 16 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/222/ [20:44:38] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:45:18] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:36:14] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [21:42:21] Beta-Cluster: Puppet failures 
on deployment-bastion - https://phabricator.wikimedia.org/T75520#845001 (hashar) hey operations, can some puppet guru help sort out a puppet error we have please? ``` Must pass trusted_group to Class[Keyholder] ``` On deployment-bastion.eqiad.wmflabs , the equivalent of tin on... [21:46:14] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:06:11] (PS5) Hashar: Support MediaWiki PHP tests under HHVM [integration/config] - https://gerrit.wikimedia.org/r/178862 [22:06:39] (CR) Hashar: "Split testextension jobs as well." [integration/config] - https://gerrit.wikimedia.org/r/178862 (owner: Hashar) [22:11:36] Krinkle: we broke Zuul statsd reporting earlier today :/ The graphs at the bottom of https://integration.wikimedia.org/zuul/ are stalled [22:11:46] Krinkle: will get it fixed monday. Have to patch Zuul [22:27:55] !log Creating 1300 Jenkins jobs to run extensions PHPUnit tests under either HHVM or Zend PHP flavors. [22:27:59] Logged the message, Master [22:31:00] 1300 Jenkins jobs? [22:32:46] (CR) Hashar: "Great thanks to have tested! Guess we can push this next week early on :]" [integration/composer] - https://gerrit.wikimedia.org/r/178550 (owner: Legoktm) [22:33:30] chrismcmahon: aim splitting the PHPUnit Mediawiki extensions jobs [22:33:38] hashar: What broke it? [22:33:57] ex: mwext-Cite-testextension will be removed in favor of: mwext-Cite-testextension-hhvm and b mwext-Cite-testextension-zend [22:34:34] Krinkle: Filippo needed to upgrade python-statsd [22:35:08] Krinkle: there is some minor API change that is used by Zuul scheduler. The Geard graph still works though [22:35:55] hashar: zuul is stuck [22:36:52] Yippee, build fixed! 
[22:36:52] Project beta-scap-eqiad build #33636: FIXED in 8 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33636/ [22:38:04] !log Fixed scap by deleting /srv/mediawiki/~tmp~ on deployment-rsync01 [22:38:07] Logged the message, Master [22:38:14] Krinkle: bah :( [22:38:41] hashar: What does zuul-cloner benefit us? What did we use before? [22:39:29] Krinkle: zuul-cloner has proper support to clone multiple repository [22:39:40] Krinkle: and it knows about Zuul ref to apply it on multiple repos [22:40:53] hashar: OK [22:40:55] Krinkle: when multiple repos share the same job and a change enter the gate-and-submit, Zuul craft a ref for each repo zuul-cloner lookup that ref name in each repo and check it out if needed, else fallback to the target branch [22:40:59] there is some doc at http://ci.openstack.org/zuul/gating.html#cross-projects-dependencies [22:41:53] it is merely a python port of an OpenStack shell script that uses that logic. Good news: upstream loves the python port and are switching to it ! [22:41:59] hashar: OK. But leaving random files behind from unmerged patch sets and unrelated branches is simply unacceptable. That's a foundational layer of integrity in CI I can not work with. I need to be able to trust the build to be clean. [22:42:30] for npm, I think 'npm purge' will work around it [22:42:37] No, it won't. [22:42:47] I already told you. Any .js file I create in apatch set will linger around forever. [22:42:52] And any file I delete, won't be really gone [22:43:08] so any wildcard selectors, and file discoveries et. will continue matching and breaking old builds [22:43:32] And also, leaving 1000s of clones on slaves forever (for each job and each @2, @3 workspace) doesn't scale. [22:43:46] disk is cheap [22:44:21] so if you send a patch that adds a new file do you mean the new file stay in the local workspace for the next build? [22:44:30] isn't git checkout supposed to drop it? [22:44:58] No? 
git does not and never did work that way. [22:45:15] It does if the file was part of the commit, yes. [22:45:24] But not when you rewind. [22:45:41] Just check out REL1_10, then REL1_21 and then master and then REL1_23 [22:45:46] you'll have a giant mess [22:46:19] Because .gitignore changes [22:46:48] hashar: The only reason we changed jenkins from workspace-wipe to git-preserve is because we could clean with git-clean. [22:47:17] I just tried on core to check out REL1_10 then REL1_21 and REL1_23 and my workspace is cleaer [22:47:19] clean [22:48:00] hashar: https://gist.github.com/Krinkle/2d8f0ae275dc5560c76a [22:48:03] anyway the issue is that mediawiki/core is fetched in /src/ and the extensions under /src/extensions [22:48:19] so if we ran git clean in /src/ that would drop the extensions [22:48:25] That is not clean [22:48:54] ahhhh [22:49:22] The directory 'tests/frontend/' exists in master, but not in the branch. [22:49:31] Checking out the other branch leaves the directory behind [22:49:42] I assume that was the case for you as well? [22:51:02] yup indeed [22:51:04] Anyway, build artefacts as well (temporary files, anything generated by grunt, cache, log files, uploads, local settings, submodules) [22:51:16] the long-term fix is disposable sandboxes [22:51:39] my main issue with adding git clean is that I have no idea of the impacts [22:51:44] hashar: OK. I thought the same (as mentioned on the phabricator task). It's still in the future, but do you have a plan for that yet? [22:52:05] With sandboxes we wouldn't have preserved workspaces obviously, which means no git-clone either. [22:52:20] if that's not a problem, then I assume we should be able to adopt that other mechanism now and also not preserve workspaces. [22:52:31] We'd just enable workspace-wipe again for all jobs and be done with it [22:52:40] no need to change zuul-cloner or add git-clean [23:22:23] Yippee, build fixed! 
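Note on the zuul-cloner behaviour described at 22:40:55: the per-repo lookup (use the Zuul-crafted ref where a repo has it, otherwise fall back to the target branch) can be sketched roughly as below. This is only an illustration of the logic as stated in the conversation, not zuul-cloner's actual code; the function names, the ref string and the set-based repo model are all hypothetical.

```python
from typing import Dict, Set


def pick_ref(repo_refs: Set[str], zuul_ref: str, target_branch: str) -> str:
    """Return the Zuul ref if this repo has it, else the target branch."""
    return zuul_ref if zuul_ref in repo_refs else target_branch


def plan_checkouts(
    repos: Dict[str, Set[str]], zuul_ref: str, target_branch: str
) -> Dict[str, str]:
    """Decide, for every repo sharing the job, what should be checked out."""
    return {
        name: pick_ref(refs, zuul_ref, target_branch)
        for name, refs in repos.items()
    }


if __name__ == "__main__":
    # Hypothetical ref name; real Zuul refs are crafted by the scheduler.
    zuul_ref = "refs/zuul/master/Zexample"
    # Hypothetical model: each repo is just the set of ref names it holds.
    repos = {
        "mediawiki/core": {"master", zuul_ref},     # change under test is here
        "mediawiki/extensions/Cite": {"master"},    # no Zuul ref for this repo
    }
    for repo, ref in plan_checkouts(repos, zuul_ref, "master").items():
        print(f"{repo}: checkout {ref}")
```

The sketch only decides which ref each repo ends up on. It says nothing about untracked or ignored files already sitting in a preserved workspace, which is the separate concern raised from 22:41:59 onward.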
[23:22:23] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #340: FIXED in 43 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/340/ [23:32:44] PROBLEM - Puppet staleness on deployment-salt is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0] [23:35:18] PROBLEM - Puppet staleness on deployment-stream is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0] [23:36:43] PROBLEM - Puppet staleness on deployment-elastic05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0] [23:42:57] PROBLEM - Puppet staleness on deployment-db2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [43200.0] [23:43:19] PROBLEM - Puppet staleness on deployment-redis02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0] [23:45:08] PROBLEM - Puppet staleness on deployment-cache-mobile03 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0] [23:47:58] PROBLEM - Puppet staleness on deployment-upload is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0] [23:49:14] PROBLEM - Puppet staleness on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0] [23:52:56] PROBLEM - Puppet staleness on deployment-sca01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0] [23:53:49] PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0] [23:56:55] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0] [23:58:10] Yippee, build fixed! 
[23:58:11] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #342: FIXED in 58 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/342/