[00:13:10] Wikimedia Labs / deployment-prep (beta): Issues connecting to https://meta.wikimedia.beta.wmflabs.org - https://bugzilla.wikimedia.org/73680 (Kunal Mehta (Legoktm)) NEW p:Unprio s:normal a:None $ curl "https://meta.wikimedia.beta.wmflabs.org" curl: (7) Failed connect to meta.wikimedia.beta....
[00:13:55] possibly unrelated to ^
[00:13:56] legoktm@deployment-bastion:~$ curl https://meta.wikimedia.beta.wmflabs.org
[00:13:56] curl: (7) couldn't connect to host
[00:14:03] I guess I have to use a different hostname to do that?
[00:16:08] Wikimedia Labs / deployment-prep (beta): beta labs no longer listens for HTTPS - https://bugzilla.wikimedia.org/68387#c9 (Bryan Davis) *** Bug 73680 has been marked as a duplicate of this bug. ***
[00:16:10] Wikimedia Labs / deployment-prep (beta): Issues connecting to https://meta.wikimedia.beta.wmflabs.org - https://bugzilla.wikimedia.org/73680#c1 (Bryan Davis) NEW>RESO/DUP https doesn't work in beta. *** This bug has been marked as a duplicate of bug 68387 ***
[00:43:53] Project browsertests-Echo-test2.wikipedia.org-linux-firefox-sauce build #177: FAILURE in 18 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-test2.wikipedia.org-linux-firefox-sauce/177/
[00:58:23] Yippee, build fixed!
[00:58:23] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #297: FIXED in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/297/
[01:15:36] Project beta-scap-eqiad build #30571: FAILURE in 1 min 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/30571/
[01:18:19] Yippee, build fixed!
[01:18:19] Project beta-scap-eqiad build #30572: FIXED in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/30572/
[01:31:07] Yippee, build fixed!
[01:31:07] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #299: FIXED in 45 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/299/
[02:13:20] Yippee, build fixed!
[02:13:20] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #143: FIXED in 42 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/143/
[02:47:14] (CR) Krinkle: "Undid those two earlier today. Debugging via https://gerrit.wikimedia.org/r/#/c/174738/ and others." [integration/config] - https://gerrit.wikimedia.org/r/173529 (https://bugzilla.wikimedia.org/72063) (owner: Krinkle)
[02:47:18] (CR) Krinkle: [C: -1] Migrate mediawiki/qunit jobs from production slaves to labs [integration/config] - https://gerrit.wikimedia.org/r/173529 (https://bugzilla.wikimedia.org/72063) (owner: Krinkle)
[02:56:32] Yippee, build fixed!
[02:56:33] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #299: FIXED in 35 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/299/
[05:59:25] Project browsertests-Wikidata-WikidataTests-linux-firefox-sauce build #48: SUCCESS in 2 hr 40 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/48/
[06:35:29] Project beta-scap-eqiad build #30601: FAILURE in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/30601/
[06:46:56] Yippee, build fixed!
[06:46:56] Project beta-scap-eqiad build #30602: FIXED in 2 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/30602/
[07:18:09] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #143: FAILURE in 1 hr 1 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/143/
[07:21:49] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #198: FAILURE in 14 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/198/
[07:53:06] Project browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce build #275: FAILURE in 2 min 48 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce/275/
[08:38:01] Yippee, build fixed!
[08:38:02] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #344: FIXED in 1 hr 16 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/344/
[09:21:49] Yippee, build fixed!
[09:21:50] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #379: FIXED in 1 hr 24 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/379/
[10:05:26] Yippee, build fixed!
[10:05:26] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #288: FIXED in 43 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/288/
[10:23:52] Yippee, build fixed!
[10:23:52] Project browsertests-Echo-test2.wikipedia.org-linux-firefox-sauce build #178: FIXED in 18 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-test2.wikipedia.org-linux-firefox-sauce/178/
[12:00:42] Project browsertests-MultimediaViewer-mediawiki.org-linux-firefox-sauce build #299: FAILURE in 6 min 35 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-mediawiki.org-linux-firefox-sauce/299/
[14:33:23] shinken-wm: Y U QUIET
[14:33:24] :(
[14:33:41] oh, hmm
[14:45:30] welcome, shinken-test
[14:48:04] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:49:19] haha!
[14:49:22] so that works
[14:50:15] !log deployment-restbase01 has some puppet error: Error 400 on SERVER: Must provide non empty value. on node i-00000727.eqiad.wmflabs . That is due to puppet pickle() function being given an empty variable
[14:50:20] Logged the message, Master
[14:50:23] YuviPanda: heello :-]
[14:50:42] hello hashar
[14:52:27] !log manually switching restbase01 puppet master from virt1000 to deployment-salt.eqiad.wmflabs
[14:52:29] Logged the message, Master
[14:54:31] * YuviPanda hopes shinken-wm will speak up today
[15:00:51] puppet certs seem terribly broken on beta :(
[15:01:52] !log deployment-salt cleaning certs with puppet cert clean
[15:01:54] Logged the message, Master
[15:06:47] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:07:34] woohoo
[15:12:37] YuviPanda: yeah I broke it i think
[15:12:54] 'tis ok
[15:13:00] I was happy the notification worked ;)
[15:13:07] I think I revoked the puppet master certificate
[15:13:08] :(
[15:13:18] [certificate revoked for /CN=i-0000015c.eqiad.wmflabs]
[15:14:26] Warning: Server hostname 'deployment-salt.eqiad.wmflabs' did not match server certificate; expected i-0000015c.eqiad.wmflabs
[15:14:27] nice
[15:19:41] !log I have revoked the
deployment-salt certificates. All puppet agents are thus broken!
[15:19:42] Logged the message, Master
[15:19:45] stupid me :/
[15:27:45] greg-g: chrismcmahon this avalanche of emails is not a test :)
[15:30:12] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:30:16] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:30:17] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:30:55] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:31:19] hmm
[15:31:24] works fine when run manually
[15:31:26] and not from init script
[15:31:27] YuviPanda: shinken is spamming me every 5 minutes
[15:31:34] with puppet failures :D
[15:31:45] oh, it should spam you every 60min per failure.
[15:31:52] I should probably increase it to every 6 hours or something
[15:31:58] but solution of course is to fix the failures :)
[15:32:07] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[15:32:11] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:32:24] hashar: I think it's spamming now because all the instances are broken. don't think you will get repeats
[15:32:58] YuviPanda: i do :)
[15:33:09] still working on fixing it though
[15:33:11] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:33:13] I think I got the master sorted outp
[15:33:14] out
[15:33:14] yeah
[15:33:24] no way to acknowledge it atm, which is bad.
[15:33:27] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:34:33] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:34:33] !log Regenerated puppet master certificate on deployment-salt. It needs to be named deployment-salt.eqiad.wmflabs not i-0000015c.eqiad.wmflabs. Puppet agent works on deployment-salt now.
[15:34:35] Logged the message, Master
[15:34:45] PROBLEM - Free space - all mounts on deployment-cache-upload02 is CRITICAL: CRITICAL: deployment-prep.deployment-cache-upload02.diskspace._srv_vdb.byte_percentfree.value (<100.00%)
[15:35:11] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:35:38] : Error 400 on SERVER: Could not find class ::beta::mwdeploy_sudo for i-0000044e.eqiad.wmflabs on node i-0000044e.eqiad.wmflabs
[15:35:39] pff
[15:35:41] that never stops
[15:35:49] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:36:00] let me kill the IRC bot for now
[15:36:08] na it is ok
[15:36:20] YuviPanda: I have fixed the cert on the puppet master
[15:36:25] so they should come back soon
[15:36:28] cool
[15:36:40] the irc bot wasn't running properly anyway
[15:36:57] I'm converting that into an upstart script now
[15:38:12] hashar: but, yay monitoring? :)
[15:39:43] YuviPanda: yeah that is a nice thing
[15:39:56] the issue now is figuring out how to report all those issues and get folks to fix them
[15:41:19] Error: Sysctl::Parameters[wikimedia base]: Could not evaluate: can't dup Symbol
[15:41:19] hehe
[15:46:29] omg shinken. I step away for 30 minutes...
;-)
[15:50:26] !log deployment-sca01 regenerating puppet CA for deployment-sca01
[15:50:28] Logged the message, Master
[15:50:50] chrismcmahon: most are related to beta cluster puppet master being crazy
[15:50:57] I broke it somehow, but it should be fine now
[15:57:51] !log fixed puppet cert on deployment-restbase01
[15:57:53] Logged the message, Master
[16:03:31] YuviPanda: seems shinken is out of date
[16:03:59] http://shinken.wmflabs.org/host/deployment-parsoid05 shows all green
[16:04:08] ah no cache issue
[16:04:12] it was showing on http://shinken.wmflabs.org/problems
[16:06:35] hashar: twentyafterfour will get to those puppet failures Real Soon Now(TM), post BZ/Phab migration
[16:06:59] also, what a way to come online to pings about beta failures (any user visible impact?)
[16:11:34] greg-g: I looked at some puppet failures, they are related to recent changes in the puppet manifests
[16:11:55] the recent burst of errors was due to me revoking the cert of the puppet master :D
[16:12:00] that caused all instances to fail
[16:13:00] hah
[16:13:06] hey, monitoring works then :)
[16:14:49] I will probably craft some command to easily retrieve the puppet agent reports from all instances
[16:14:55] that will ease tracking failure
[16:24:41] greg-g: yeah I'm close to declaring shinken stable. Just need to get IRC notifications working. :)
[16:24:54] Will then remove things from icinga
[16:25:51] yay!
[16:25:55] greg-g: still opposition to screams in -operations but I think it is important. Needs more poking etc
[16:26:07] Shinken is already more reliable than icinga.
[16:26:25] YuviPanda: opposition to any shinken in -ops or beta shinken?
[16:26:45] Beta
[16:26:59] * greg-g nods
[16:27:04] I haven't seen the arguments
[16:27:06] On that icinga in operations task/patch
[16:27:12] Mostly (more spam!)
* greg-g nods
[16:27:29] But out of sight out of mind etc
[16:29:57] hashar: You can see what's failing in puppet and why at https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/puppet%20runs
[16:30:19] bd808: oh man. I keep forgetting about it
[16:31:01] The part that is lame there is the use of the node ids rather than host names
[16:31:58] It would be nice to add a logstash filter that knows how to translate from i-000* to the dns name
[16:38:29] or we could change puppet to use the instance hostname /:D
[16:38:43] though it is probably not guaranteed to be unique
[16:40:28] We can make Wikitech enforce that
[16:40:33] If it doesn't already
[16:56:17] instance hostname is only temporally unique. You can't have 2 hosts with the same name at the same time, but you can reuse the name of a deleted host. That bit is probably why we use the unique ids instead for puppet and salt certs.
[16:58:12] Having an easy way to look up the mapping would be nice. twentyafterfour hacked a php script that works most of the time. I could probably clean it up for the edge cases and figure out how to plug it into logstash.
[16:59:42] curl https://wikitech.wikimedia.org/wiki/Nova_Resource:$NODE_ID -s | grep "Instance Name" --context=1 | tail -1
[17:06:01] !log deleted salt keys for deleted instances: i-00000289, i-0000028a, i-0000028b, i-0000028e, i-000002b7, i-000006ad
[17:06:04] Logged the message, Master
[17:09:37] oh man logstash is so much better than that shinken thing
[17:14:38] "just" have to figure out how to make logstash emit alarms now :d
[17:14:59] http://logstash.net/docs/1.4.2/outputs/irc
[17:15:09] http://logstash.net/docs/1.4.2/outputs/email
[17:15:18] http://logstash.net/docs/1.4.2/outputs/nagios
[17:15:31] moauhhhhhh
[17:16:48] if it could deduplicate the puppet messages per (hostname, status) that would be a killer
[17:19:43] Logstash itself won't help with that much as it's really a stream processor.
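bd808's curl one-liner above boils down to "fetch the Nova_Resource wiki page for the node id and scrape the Instance Name row". A minimal Python sketch of the parsing half — `instance_name` is a made-up helper name, and the regex is an assumption about how that wiki page happens to render, not a stable API:

```python
import re

def instance_name(html, node_id):
    """Map an i-000* node id to its instance name, given the HTML of
    https://wikitech.wikimedia.org/wiki/Nova_Resource:<node_id>."""
    # Assumed page layout: an "Instance Name" cell followed by the name cell.
    m = re.search(r'Instance Name.*?>([A-Za-z0-9_-]+)<', html, re.S)
    return m.group(1) if m else node_id  # fall back to the raw id

# Illustrative fragment of what such a page might contain:
sample = '<tr><td>Instance Name</td><td>deployment-salt</td></tr>'
print(instance_name(sample, 'i-0000015c'))  # deployment-salt
```

The fallback matters for the logstash use case: if the page layout changes or the lookup fails, the filter should keep the raw node id rather than drop the event.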
There is a filter that lets you look in the backing elasticsearch for things matching the current record but I worry that it wouldn't be very performant out of the box.
[17:20:12] Although there are things that could be done (hash the message and store that in elastic)
[17:20:34] But with a message hash you'd want to strip out things like timestamps from the data being hashed
[17:20:51] The snowball gets bigger and bigger
[17:20:55] maybe we can get puppet to report the state change
[17:21:19] instead of just success / failure have additional fields: new_failure || recovery
[17:21:30] so we would only output to irc the new failure / recovery messages
[17:21:34] and dismiss the success failure
[17:22:33] I don't know how customizable shinken is, but it might be enough to link to a search of logstash for the failing host from the shinken alert page
[17:22:44] yeah that would be nice
[17:23:13] +1 to having tools know about each other, even if "just" linking
[17:23:22] well, intelligent linking
[17:23:27] I can change that dashboard to just show failed run details really easily too
[17:23:42] I like it when tools link-bait each other
[17:23:47] * bd808 does that
[17:24:32] logstash totally just rickrolled sssshinken lolol
[17:25:14] there is http://logstash.net/docs/1.4.2/filters/throttle
[17:26:27] Yeah, that could be used to limit the rate of complaining
[17:27:22] though we would need to reset the throttle on state change
[17:27:23] bah
[17:28:28] One of the ops guys at $DAYJOB-1 is the maintainer of http://opennetadmin.com/ . He had it tweaks out to do nice cross linking of all our internal tools.
[17:28:34] *tweaked
[17:28:38] has anyone looked at puppet-dashboard? it does what shinken does but better IIRC
[17:28:52] and it's from the horse's mouth (puppet labs)
[17:28:59] The code behind ONA is a nightmare though
[17:29:13] But it's only for puppet right?
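The state-change idea hashar sketches above (emit `new_failure`/`recovery` transitions, stay quiet on repeats) fits in a few lines. A hedged sketch with made-up names — `report_run` is illustrative, not a puppet or logstash API:

```python
# Track the last known status per host; only report transitions, never repeats.
_last_status = {}

def report_run(host, status):
    """Return 'new_failure', 'recovery', or None when nothing changed."""
    prev = _last_status.get(host)
    _last_status[host] = status
    if prev == status:
        return None                 # repeat of the same state: stay quiet
    if status == 'failed':
        return 'new_failure'        # ok (or unknown) -> failed
    if prev == 'failed':
        return 'recovery'           # failed -> ok
    return None                     # e.g. the first-ever successful run

print(report_run('deployment-pdf01', 'failed'))   # new_failure
print(report_run('deployment-pdf01', 'failed'))   # None: repeat suppressed
print(report_run('deployment-pdf01', 'success'))  # recovery
```

This is also why the throttle filter mentioned below is an awkward fit on its own: throttling limits the rate of repeats but has no notion of "the state changed, reset and speak up now".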
[17:29:26] shinken is the new nagios
[17:29:37] bitches about anything
[17:29:39] oh
[17:29:53] well it bitches without providing any useful context
[17:30:16] heh. just like nagios :) FEATURE PARITY ACHIEVED
[17:30:35] puppet dashboard specifically deals with transitions from ok->failed puppet run->recovered
[17:31:10] it makes it easy to see a list of all failures, and it groups them together in intelligent ways
[17:31:28] like if a bunch of nodes have essentially the same failure it doesn't make you figure that out on your own
[17:31:58] *nod*
[17:32:32] We started playing with a log dedup tool but never did anything useful with it.
[17:32:36] * bd808 looks for url
[17:34:14] http://sentry-beta.wmflabs.org/
[17:35:57] The idea was that we'd add a logstash output to send things there. It uses a database and a django app to organize things and show dashboards of new events and their rates
[17:36:17] We were looking at it for hhvm failures initially but never really got it off the ground
[17:36:31] sentry looks cool
[17:37:22] we had a pretty awesome rough tool developed internally at dA that did that kind of stuff, and it was hella useful
[17:38:37] There is a 3rd party logstash plugin for the commercial version. It would be pretty easy to tweak for the locally hosted version I think.
[17:40:02] I used Micromuse Netcool which is exactly meant to process events
[17:40:14] but that is a telco software, proprietary and ... expensive :]
[17:54:13] hey on my local machine with puppet 3.7.3 the run status can be 'changed' 'failed' or 'unchanged'
[17:59:57] well i am off for real now *wave*
[18:56:58] greg-g: chrismcmahon so, shinken will complain every hour until the betalabs issues are fixed. I'm thinking of making that every 12 hours. Thoughts?
[18:59:12] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #143: FAILURE in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/143/
[18:59:29] YuviPanda: less is better. it's just noise for me right now
[19:00:17] yeah
[19:00:22] 12 hours, then
[19:00:30] of course, 'proper' fix is to... fix them
[19:00:38] might make it 24 hours, actually
[19:01:20] YuviPanda: 12 seems long, 1 seems short, 6?
[19:01:38] oh, I didn't see chris' response
[19:02:05] greg-g: it'll immediately notify of any *changes*, this is just repeated notifications for things already down
[19:02:13] word
[19:02:19] whatevs :)
[19:02:21] alright
[19:02:28] 24h, I think.
[19:02:31] we know it's broken, etc.
[19:02:38] really, antoine is the one to ask since he's the one to respond more, soon mukunda
[19:02:51] actually, twentyafterfour, can we add you to the shinken alerts for beta now? :) :)
[19:03:10] email alerts, that is
[19:03:25] yeah
[19:03:26] :D
[19:03:32] yeah
[19:04:23] I'm on it for *all* projects, so I've a *lot* of spam :)
[19:04:44] greg-g: sure
[19:04:50] YuviPanda: doit ^ :)
[19:04:54] cool
[19:05:09] twentyafterfour: mmodell@wikimedia.org right?
[19:05:15] or somewhere else?
[19:05:22] yes
[19:06:02] cool
[19:09:14] twentyafterfour: +1? https://gerrit.wikimedia.org/r/174999
[19:11:20] done
[19:11:37] greg-g: chrismcmalunch re-notification interval upped to 24h. so one reminder a day that things are broken.
[19:25:06] twentyafterfour: you'll probably get an alert email sometime within the next 24h.
do let me know when you get one, so I can be sure it's working ok
[19:33:09] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:34:09] Project beta-scap-eqiad build #30678: FAILURE in 30 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/30678/
[19:34:33] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:34:44] PROBLEM - Free space - all mounts on deployment-cache-upload02 is CRITICAL: CRITICAL: deployment-prep.deployment-cache-upload02.diskspace._srv_vdb.byte_percentfree.value (<100.00%)
[19:37:14] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:37:43] BAM SHINKEN WOOHOO
[19:39:53] "100.00% of data above the critical threshold " means ... nothing?
[19:40:12] indeed.
[19:40:16] I should work on the error messages
[19:40:38] it should probably have the threshold (1 error) and the number by which it is above (whatever number of errors there are)
[19:41:22] Yippee, build fixed!
[19:41:22] Project beta-scap-eqiad build #30682: FIXED in 2 min 20 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/30682/
[19:43:57] YuviPanda: file a.. er.. well, you can have Mukunda work on that kind of stuff as well :)
[19:44:05] but, no bugs/tasks until Monday!
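The 24-hour reminder discussed above maps onto the Nagios-style `notification_interval` knob that Shinken inherits, which counts in minutes by default (so 24h is 1440). A hedged fragment — the directive names are standard Nagios/Shinken object-config syntax, but the service shown and its other fields are illustrative, not the actual beta config:

```
define service {
    service_description    Puppet failure
    host_name              deployment-bastion
    notification_interval  1440   ; minutes: re-notify an unresolved problem once per 24h
    ; remaining directives of the service definition unchanged
}
```

Setting it to 0 would disable re-notification entirely, which is why a long-but-nonzero value fits the "one reminder a day" compromise.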
[19:44:09] yeah
[19:44:16] but this is core shinken stuff, so I might as well take 'em
[19:44:19] sure sure
[19:44:24] will file a bug anyway
[19:44:29] I wasn't sure if it was specific to beta hosts or not
[19:44:30] I'm experimenting with boards and stuff
[19:44:33] nah, it isn't
[19:48:03] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:21:39] (PS1) EBernhardson: Add EventLogging dependency to Flow qunit and phpunit [integration/config] - https://gerrit.wikimedia.org/r/175016
[20:22:03] (PS2) EBernhardson: Add EventLogging dependency to Flow qunit and phpunit [integration/config] - https://gerrit.wikimedia.org/r/175016
[20:24:54] (CR) EBernhardson: [C: +1] "Deployed to jenkins and tested against Ic011a61" [integration/config] - https://gerrit.wikimedia.org/r/175016 (owner: EBernhardson)
[20:51:08] Yippee, build fixed!
[20:51:09] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #199: FIXED in 15 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/199/
[20:57:25] Yippee, build fixed!
[20:57:26] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #144: FIXED in 1 hr 3 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/144/
[21:02:00] (CR) Hashar: "Replying to your comments inline. I am not amending the patch since a new patchset would make it hard to keep track of all comments, most " (7 comments) [integration/config] - https://gerrit.wikimedia.org/r/174410 (owner: Hashar)
[21:05:36] (CR) Hashar: [C: -1] "Some comments to ease review.
-1 so I remember to drop the php-composer-validate if we end up adding the command to the composer template" (5 comments) [integration/config] - https://gerrit.wikimedia.org/r/174411 (owner: Hashar)
[21:05:58] * hashar whistles
[21:10:23] (CR) Hashar: [C: +2] "You are making me happy :]" [integration/config] - https://gerrit.wikimedia.org/r/175016 (owner: EBernhardson)
[21:11:14] (CR) Hashar: "oh i forgot, I repushed the job while reviewing. I haven't caught your message stating you refreshed them already." [integration/config] - https://gerrit.wikimedia.org/r/175016 (owner: EBernhardson)
[21:14:07] (Merged) jenkins-bot: Add EventLogging dependency to Flow qunit and phpunit [integration/config] - https://gerrit.wikimedia.org/r/175016 (owner: EBernhardson)
[21:14:21] hi marxarelli I don't think Jenkins is honoring "recheck" any more, would you +2 this if you have a moment? https://gerrit.wikimedia.org/r/#/c/175035/
[21:14:50] see if we can get it merged and get more builds back to green(ish)
[21:16:45] chrismcmahon: blam-o
[21:17:29] thanks marxarelli, qunit tests have been flaky this week and it brings the whole train to a stop
[21:18:14] chrismcmahon: np
[21:18:16] * chrismcmahon wants to fix the Chrome builds and the test2 builds today
[21:18:48] gahh, failed again https://gerrit.wikimedia.org/r/#/c/175035/
[21:34:06] (PS1) Hashar: Capture npm-debug.log when available [integration/config] - https://gerrit.wikimedia.org/r/175112
[21:35:51] (CR) Hashar: "It is not going to always capture the $WORKSPACE/npm-debug.log file since npm is most probably run in sub directories :/" [integration/config] - https://gerrit.wikimedia.org/r/175112 (owner: Hashar)
[21:37:33] chrismcmahon: not honoring recheck ? :D
[21:37:36] that is a BUG !!!!!
[21:38:08] hashar: could be.
I managed to force it by review-0 then review-+2 again
[21:38:22] 🐛 <-- bug
[21:38:25] (I love unicode
[21:38:54] hashar: you can see the history of my recheck here: https://gerrit.wikimedia.org/r/#/c/175035/
[21:40:52] so
[21:41:03] the patch is proposed and the main tests are run
[21:41:12] cr+2 start the gate jobs
[21:41:16] which fails
[21:41:27] recheck did trigger the main test
[21:41:38] dan vote CR+2 which fails again
[21:41:54] I think it worked properly
[21:42:06] hashar: yes, recheck triggers the main test but the >merge< part never happens after recheck
[21:42:55] yeah
[21:42:59] openstack does it
[21:43:23] I haven't looked at how to configure Zuul to have recheck merge the change if it has a CR+2
[21:43:32] so one has to remove the CR+2 vote and re vote +2
[21:43:50] hmm, I thought recheck would do the merge if the 2nd test passed
[21:44:18] I could be mistaken
[21:53:48] (CR) Krinkle: "When a job using npm causes a failure, there should already be a sufficient level of verbosity in the console output. npm-debug.log is mos" [integration/config] - https://gerrit.wikimedia.org/r/175112 (owner: Hashar)
[21:54:24] hashar: shouldn't need a custom destination, it's always output as well.
[21:55:11] Krinkle: mobile team reported me https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-npm/60/console
[21:55:20] which has not much details
[21:55:49] hashar: It does
[21:55:50] 21:02:37 npm ERR!
Error: ENOENT, lstat '/mnt/jenkins-workspace/workspace/mwext-MobileFrontend-npm/node_modules/jscs/node_modules/xmlbuilder/node_modules/lodash-node/modern/internals/shimIsPlainObject.js'
[21:55:59] YuviPanda: i already got some alerts
[21:56:01] view full log
[21:56:06] hashar: npm-debug is not going to contain more details
[21:56:16] (CR) Hashar: "That was prompted to me by the mobile team: https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-npm/60/console" [integration/config] - https://gerrit.wikimedia.org/r/175112 (owner: Hashar)
[21:56:28] the effort to find, download, unpack and open is not going to be faster than viewing full console log.
[21:56:43] i wasn't sure whether it had some useful content
[21:56:48] I am.
[21:57:14] so I guess the patch is unneeded :]
[21:57:19] yay
[21:57:22] twentyafterfour: cool :)
[21:58:35] (Abandoned) Hashar: Capture npm-debug.log when available [integration/config] - https://gerrit.wikimedia.org/r/175112 (owner: Hashar)
[21:58:56] Krinkle: which leaves the mystery of why the build failed ( https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-npm/60/console )
[22:03:05] Yippee, build fixed!
[22:03:06] Project browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce build #276: FIXED in 3 min 55 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce/276/
[22:06:43] hashar: https://github.com/jscs-dev/node-jscs/issues/787
[22:06:48] hashar: npm has bugs with caching
[22:06:51] or something
[22:06:58] just try again
[22:07:09] no need to clear cache, as long as the local directory is cleared, which it is, right?
[22:07:18] we absolutely must not retain node_modules between jobs.
.git being preserved is enough trouble
[22:32:26] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #345: FAILURE in 1 hr 17 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/345/
[22:59:19] ryasmeen: are you working in the MobileFrontend much these days?
[23:17:59] (CR) Krinkle: [C: +2] mediawiki: Configure 'error' and 'exception' log groups [integration/jenkins] - https://gerrit.wikimedia.org/r/174810 (https://bugzilla.wikimedia.org/48002) (owner: Krinkle)
[23:18:04] (Merged) jenkins-bot: mediawiki: Configure 'error' and 'exception' log groups [integration/jenkins] - https://gerrit.wikimedia.org/r/174810 (https://bugzilla.wikimedia.org/48002) (owner: Krinkle)
[23:33:21] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #380: FAILURE in 1 hr 28 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/380/
[23:58:47] chrismcmahon: mobile VE yes
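The rule Krinkle states in the 22:07 exchange (never retain node_modules between jobs, since a stale tree is exactly what produces ENOENT/lstat npm errors like the one quoted above) is the kind of thing a pre-build step can enforce. A hypothetical sketch — `clean_workspace` is a made-up helper, not an actual Jenkins plugin or integration/config script:

```python
import os
import shutil
import tempfile

def clean_workspace(workspace):
    """Remove any node_modules left behind by a previous build.

    Returns True once the directory is gone (or was never there)."""
    target = os.path.join(workspace, 'node_modules')
    shutil.rmtree(target, ignore_errors=True)  # no-op if already absent
    return not os.path.isdir(target)

# Demo against a throwaway directory standing in for $WORKSPACE:
ws = tempfile.mkdtemp()
os.makedirs(os.path.join(ws, 'node_modules', 'jscs'))  # simulate leftovers
print(clean_workspace(ws))  # True
```

Running this unconditionally at job start is cheap insurance compared to debugging one corrupted npm tree per flaky build.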