[00:33:45] Wikimedia Labs / deployment-prep (beta): "404 file Not Found Error" when logging into betalabs - https://bugzilla.wikimedia.org/71806 (Greg Grossmeier) a:Sam Reed (reedy)
[01:00:39] 19:25 < bd808> Reedy is the new beta + logstash + other cool things master
[01:00:44] just repeating for good measure :)
[01:02:53] (PS1) Arlolra: Run npm test on parsoid [integration/zuul-config] - https://gerrit.wikimedia.org/r/165671
[01:53:43] Wikimedia Labs / deployment-prep (beta): "404 file Not Found Error" when logging into betalabs - https://bugzilla.wikimedia.org/71806#c7 (Sam Reed (reedy)) NEW>RESO/FIX Another example of why beta shouldn't have a diverged apache config from production. Docroot paths fixed in https://gerrit.wiki...
[01:57:04] what the.. that happened (sam closing that bug) a long time ago
[03:26:05] wikibugs is running about 3 hours late, it seems.
[03:56:13] Yippee, build fixed!
[03:56:13] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #219: FIXED in 12 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/219/
[04:05:28] Creating accounts on Beta enwiki is totally broken AFAICT: https://bugzilla.wikimedia.org/show_bug.cgi?id=71862
[04:09:39] superm401: just replied :)
[04:21:45] (PS1) EBernhardson: Add Echo dependency to Flow qunit [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165686
[04:34:42] (CR) EBernhardson: [C: 2] "Confirmed to fix the qunit job: https://integration.wikimedia.org/ci/job/mwext-Flow-qunit/2056/console" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165686 (owner: EBernhardson)
[04:36:24] (CR) Krinkle: Run npm test on parsoid (1 comment) [integration/zuul-config] - https://gerrit.wikimedia.org/r/165671 (owner: Arlolra)
[04:37:39] (CR) Krinkle: Run npm test on parsoid (1 comment) [integration/zuul-config] - https://gerrit.wikimedia.org/r/165671 (owner: Arlolra)
[04:37:52] (Merged) jenkins-bot: Add Echo dependency to Flow qunit [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165686 (owner: EBernhardson)
[04:47:02] Yippee, build fixed!
[04:47:02] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #109: FIXED in 10 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/109/
[05:08:34] Yippee, build fixed!
[05:08:35] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #144: FIXED in 1 min 9 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/144/
[05:12:31] Wikimedia Labs / deployment-prep (beta): Mobile redirect goes to wrong domain name on beta labs - https://bugzilla.wikimedia.org/71079#c6 (Greg Grossmeier) a:Sam Reed (reedy) Reedy: Another docroot issue? This and bug 70948, too.
[05:22:00] Yippee, build fixed!
[05:22:00] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #188: FIXED in 2 min 41 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/188/
[05:33:23] Project browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce build #187: FAILURE in 1 min 27 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce/187/
[06:08:01] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862 (Matthew Flaschen) NEW p:Unprio s:major a:None On Beta Labs enwiki, attempting to create an account gives (in the red error box): --- Ac...
[06:08:14] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c1 (Greg Grossmeier) p:Unprio>Normal Matt: Do you have access to the Beta Cluster? If not, you should. Then can you poke around the logs and se...
[06:08:29] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c2 (Matthew Flaschen) I think I have the necessary access. I'm not planning to take this (at least not tonight), though.
[06:08:43] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c3 (Matthew Flaschen) S Page said he was able to create an account, but it then redirected to the a page that said, "You are already logged in as Spa...
[06:09:15] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c4 (Greg Grossmeier) Mini rant: Those with production deploy privs should have access to the beta cluster (if not, I'll add you right now) and should...
[06:10:44] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c5 (Greg Grossmeier) (not complaining about you two specifically, Matthew/S, just something that's been on my mind for a while about making our test...
[08:12:44] Wikimedia / Continuous integration: [upstream] Jenkins: jobs created via JJB are not properly registered in Zuul Gearman server - https://bugzilla.wikimedia.org/63758#c3 (Antoine "hashar" Musso) I managed to describe how to reproduce the issue and confirmed it. Upstream author Khai Do happily followed...
[08:24:46] good morning
[08:33:32] guten morgen
[08:34:53] Tobi_WMDE_SW_NA: had a 1/1 yesterday with greg-g and I praised your work on migrating the browsertests :D
[08:35:07] hashar: haha
[08:35:09] ok
[08:35:11] thx
[08:35:43] with your great support it wasn't that hard.
[08:39:37] :)
[08:39:41] * greg-g can't sleep
[08:40:25] greg-g: tip : close that window and grab a book !! :D
[08:40:57] good idea
[08:40:58] * greg-g tries
[08:42:04] lets review the other pending changes
[08:43:19] Tobi_WMDE_SW_NA: are Jan Zerebecki and Addshore working for wmde?
[08:43:50] hashar: jzerebecki is working for WMDE now. addshore isn't anymore
[08:43:51] hashar: i am
[08:44:29] jzerebecki: great news :-]  I really liked your test related patches over the last years/months
[08:44:55] yes, its great :) . thank you.
[09:00:10] Yippee, build fixed!
[09:00:10] Project browsertests-VisualEditor-test2.wikipedia.org-windows_8-internet_explorer-sauce build #55: FIXED in 1 hr 16 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-test2.wikipedia.org-windows_8-internet_explorer-sauce/55/
[09:00:22] Yippee, build fixed!
[09:00:23] Project browsertests-VisualEditor-test2.wikipedia.org-linux-firefox-sauce build #232: FIXED in 1 hr 8 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-test2.wikipedia.org-linux-firefox-sauce/232/
[09:19:16] !log deleting deployment-cxserver01 (borked since virt1005 outage) creating deployment-cxserver02 to replace it {{bug|71783}}
[09:19:18] Logged the message, Master
[09:22:11] !log Renamed Jenkins slave deployment-cxserver01 to deployment-cxserver02 and updated IP. It is marked offline until the instance is ready and has the relevant puppet classes applied.
[09:22:13] Logged the message, Master
[09:34:59] !log migrating deployment-cxserver02 to beta cluster puppet and salt masters
[09:35:00] Logged the message, Master
[09:54:37] PROBLEM - BetaLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: deployment-prep.deployment-cxserver02.puppetagent.failed_events.value (30.00%)
[10:10:40] ^^^ https://bugzilla.wikimedia.org/show_bug.cgi?id=71873
[10:10:45] can't ACK it in icinga
[10:10:51] off for lunch
[11:45:55] Project beta-scap-eqiad build #24847: FAILURE in 1 min 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/24847/
[11:55:32] Yippee, build fixed!
[11:55:33] Project beta-scap-eqiad build #24848: FIXED in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/24848/
[12:14:07] zeljkof: did you manage to have a look at https://gerrit.wikimedia.org/r/#/c/165461/ yet?
[12:14:27] Tobi_WMDE_SW: sorry, not yet, will do today, in a meeting now
[12:14:37] a'right
[12:14:40] :)
[13:04:40] (CR) Manybubbles: [C: 1] "Looks good to me. Happy to +2 if everyone is happy with it." [ruby/api] - https://gerrit.wikimedia.org/r/159630 (https://bugzilla.wikimedia.org/70605) (owner: Damienkan)
[13:42:12] (PS2) Hashar: Remove '-qunit' from 'prepare-mediawiki-qunit' [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165321
[13:44:14] (PS2) Hashar: Avoid dupe code by using prepare-mediawiki macro [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165327
[13:45:45] (CR) Hashar: [C: 2] "noop" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165321 (owner: Hashar)
[13:49:05] (Merged) jenkins-bot: Remove '-qunit' from 'prepare-mediawiki-qunit' [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165321 (owner: Hashar)
[13:51:12] (PS3) Hashar: Avoid dupe code by using prepare-mediawiki macro [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165327
[13:52:29] (PS6) Tobias Gritschacher: Use templates for all Wikidata jobs [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165461
[13:54:43] (CR) Hashar: [C: 2] "Yum noop" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165327 (owner: Hashar)
[13:56:32] (CR) Hashar: [C: 1] mwext-VisualEditor-qunit: Use git clean "-ff" instead "-f" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165328 (owner: Krinkle)
[13:58:11] (Merged) jenkins-bot: Avoid dupe code by using prepare-mediawiki macro [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165327 (owner: Hashar)
[14:36:51] (CR) Zfilipin: "I will +2 it later today, if nobody does it before. Did not have the time to test on my machine yet." [ruby/api] - https://gerrit.wikimedia.org/r/159630 (https://bugzilla.wikimedia.org/70605) (owner: Damienkan)
[14:56:42] (PS4) Hashar: Revert "Workaround hphpize injecting some bad include path" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160447 (https://bugzilla.wikimedia.org/68944)
[14:59:46] (CR) Hashar: [C: 2] "$ jenkins-jobs --conf jenkins_jobs.ini update config/ '*hhvm-build'" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160447 (https://bugzilla.wikimedia.org/68944) (owner: Hashar)
[15:02:59] (Merged) jenkins-bot: Revert "Workaround hphpize injecting some bad include path" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160447 (https://bugzilla.wikimedia.org/68944) (owner: Hashar)
[15:31:46] (PS1) Hashar: Move layout to new /zuul/ subdirectory [integration/zuul-config] - https://gerrit.wikimedia.org/r/165745
[15:33:04] (CR) Hashar: [C: 2] Move layout to new /zuul/ subdirectory [integration/zuul-config] - https://gerrit.wikimedia.org/r/165745 (owner: Hashar)
[15:33:13] (Merged) jenkins-bot: Move layout to new /zuul/ subdirectory [integration/zuul-config] - https://gerrit.wikimedia.org/r/165745 (owner: Hashar)
[15:40:27] (PS1) Hashar: Zuul layout.yaml got moved to /zuul/ subdir [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165747
[15:46:29] (CR) Hashar: "Deployed" [integration/zuul-config] - https://gerrit.wikimedia.org/r/165745 (owner: Hashar)
[15:47:06] (CR) Hashar: [C: 2] "Jobs refreshed." [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165747 (owner: Hashar)
[15:50:50] (Restored) Hashar: Jenkins job validation (DO NOT SUBMIT)... [integration/zuul-config] - https://gerrit.wikimedia.org/r/60646 (owner: Hashar)
[15:50:52] (Merged) jenkins-bot: Zuul layout.yaml got moved to /zuul/ subdir [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165747 (owner: Hashar)
[15:50:54] (PS6) Hashar: Jenkins job validation (DO NOT SUBMIT)... [integration/zuul-config] - https://gerrit.wikimedia.org/r/60646
[15:51:26] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT)... [integration/zuul-config] - https://gerrit.wikimedia.org/r/60646 (owner: Hashar)
[16:06:46] PROBLEM - BetaLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: deployment-prep.deployment-cxserver02.puppetagent.failed_events.value (100.00%)
[16:07:59] Wikimedia / Continuous integration: hphpize inject $PWD used at time of building to HHVM_INCLUDE_DIRS - https://bugzilla.wikimedia.org/68944#c12 (Antoine "hashar" Musso) RESO/?>VERI $ jenkins-jobs --conf jenkins_jobs.ini update config/ '*hhvm-build' INFO:root:Updating jobs in ['config/'] (['*hhv...
[16:29:38] Reedy: can you take a look at this one before you do today's deploy? https://bugzilla.wikimedia.org/show_bug.cgi?id=71862
[16:53:50] bd808: any quick pointers on beta::puppetmaster::sync ?
[16:54:09] we got bit confused with beta::puppetmaster::sync and self::role::puppetmaster
[16:54:44] tonythomas: the sync class just adds a cron job to update the /var/lib/git/operations/puppet repo once an hour
[16:54:49] Which is handy
[16:55:18] So it assumes you have self::role::puppetmaster enabled too.
[16:55:34] and it should only be applied on the actually puppetmaster
[16:55:55] bd808: ok so -> it should be applied only after role::self::puppet ?
[16:56:45] after or at the same time. either should work
[16:56:48] I think
[16:58:30] Greg-g I will
[16:58:46] bd808: ok. will do that. We where trying to setup mx.beta.wmflabs.org :)
[16:59:31] tonythomas: Oh. It should not have it's own puppetmaster or chaos
[16:59:42] we already have a beta puppetmaster
[17:00:06] But you need to hack stuff for that instance right?
[17:00:31] bd808: yeah.
[17:00:39] currenty we dont have a role::mail::mx available
[17:00:42] My fear with having a "one of these things is not like the others" in beta is that it will rot and break
[17:00:46] so have to hack
[17:01:10] I'm not the keeper of beta but I am "a" keeper of beta :)
[17:01:29] bd808: that was not our intention though. Jeff was tight with getting the mx in sync wih puppet
[17:01:43] but it looks like the resolutions will take long time
[17:01:49] tonythomas: It's not so much that it's not available right, it's just that it conflict with existing roles that are applied across labs?
[17:01:51] hope you saw the thread in wikitech labs
[17:02:04] yeah. mail::sender
[17:02:11] I wonder if that can be fixed with hiera
[17:03:26] I don't have time to experiment so I'll step out of the way and let you guys do what you think you can do. But please make sure we have an open bug that tracks the variance
[17:03:28] thats what all pointed too -> but it looks like that would take time
[17:03:33] we just killed one though https://gerrit.wikimedia.org/r/#/c/165751/
[17:52:13] Reedy: any update on the login one?
[17:52:46] * greg-g has to be in transit for the next 15 minutes, brb :)
[17:57:15] Nope
[17:57:20] Only just back on my laptop
[18:01:33] First guess is that it's more likely to be beta itself
[18:04:29] I guess the fucktonne of memcached errors for beta aren't helping matters :)
[18:12:25] just restarted all the memcached boxes
[18:12:41] * greg-g nods
[18:14:10] from job runners too
[18:16:02] Wikimedia Labs / deployment-prep (beta): Requested 115.108.187.192.proxies.dnsbl.sorbs.net., not found in proxies.dnsbl.sorbs.net.. - https://bugzilla.wikimedia.org/71894 (Sam Reed (reedy)) NEW p:Unprio s:minor a:None Requested 115.108.187.192.proxies.dnsbl.sorbs.net., not found in proxies....
[18:16:19] Memcached error: Error connecting to 127.0.0.1:11211: Connection refused
[18:16:25] I guess that must be nutcracker then
[18:17:39] ugh
[18:17:59] Reedy: do you know much about that setup? need anyone else?
[18:19:23] The question is why it's trying 11211...
[18:19:23] listen: "127.0.0.1:11212"
[18:19:45] * greg-g shrugs
[18:19:52] then we get others
[18:19:53] Memcached error for key "enwiki:lag_times:deployment-db1" on server "127.0.0.1:11212": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY
[18:20:11] Ah
[18:20:20] Why is beta trying to use production memcached? :)
[18:20:31] servers:
[18:20:31] - "10.64.0.180:11211:1"
[18:20:31] - "10.64.0.181:11211:1"
[18:20:31] - "10.64.0.182:11211:1"
[18:20:38] At least on deployment-jobrunner01
[18:21:23] oh goodie :)
[18:21:33] wtf
[18:21:43] it's right on deployment-mediawiki01
[18:22:46] so yeah, I see 2 distinct different issues
[18:25:23] Reedy: btw, I added you (well, Tyler did it) to the default cc for beta cluster bugs :)
[18:26:22] I'm CC'd on bz bugs ;)
[18:26:46] oh, heh
[18:26:56] well then, I couldn't sneak that by you ;)
[18:29:32] greg-g: do we have any 'point person' to point to for 'hey, puppet in beta labs is failing'?
[18:29:36] right now it's the cxserver...
[18:29:45] antoine and reedy, really
[18:30:01] we don't have 100% timezone coverage
[18:30:10] me being up at 2am last night isn't normal
[18:30:37] (and yes, I explicitly didn't mention bryan)
[18:30:54] greg-g: heh, ok
[18:31:47] YuviPanda: other than you and bogott, any suggestions from ops for a quasi point person for puppet things?
[18:31:59] greg-g: coren perhaps.
[18:32:03] (and sorry for cc'ing you on the last few bugs :/ )
[18:33:43] greg-g: nah, 'tis ok :)
[18:37:18] - - "10.64.0.194:11211:1"
[18:37:19] - - "10.64.0.195:11211:1"
[18:37:19] + - "10.68.16.14:11211:1"
[18:37:19] + - "10.68.16.15:11211:1"
[18:37:19] Yay
[18:39:41] greg-g: Ok, so now the job runner is at least failing with the same memcached errors as the "apaches"
[18:39:41] :P
[18:39:50] getting there :)
[18:40:22] Just why is it trying to connect to 127.0.0.1:11211
[18:41:01] Which is the default for $wgMemCachedServers
[18:46:29] Wikimedia Labs / deployment-prep (beta): Requested 115.108.187.192.proxies.dnsbl.sorbs.net., not found in proxies.dnsbl.sorbs.net.. - https://bugzilla.wikimedia.org/71894#c1 (Yuvi Panda) Where's this coming from?
[18:47:30] Wikimedia Labs / deployment-prep (beta): Requested 115.108.187.192.proxies.dnsbl.sorbs.net., not found in proxies.dnsbl.sorbs.net.. - https://bugzilla.wikimedia.org/71894#c2 (Sam Reed (reedy)) I noticed it in logstash-beta
[18:48:57] I think https://gerrit.wikimedia.org/r/165778 has kerbed the accessing of the wrong memcached servers
[18:49:44] Wikimedia Labs / deployment-prep (beta): Requested 115.108.187.192.proxies.dnsbl.sorbs.net., not found in proxies.dnsbl.sorbs.net.. - https://bugzilla.wikimedia.org/71894#c3 (Sam Reed (reedy)) { "_index": "logstash-2014.10.09", "_type": "dnsblacklist", "_id": "CeX-WiVqTPerC4N8uRXoog", "_score":...
[18:50:30] curbed*
[18:50:40] SpecialCentralAutoLogin::getInlineScript: file not found: "/srv/mediawiki/php-master/extensions/CentralAuth/includes/specials/../modules/inline/anon-set.js"
[18:51:29] Just poked lego about it
[18:54:02] (CR) Arlolra: "Maybe https://gerrit.wikimedia.org/r/#/c/156693/ is the better patch to land." [integration/zuul-config] - https://gerrit.wikimedia.org/r/165671 (owner: Arlolra)
[18:54:57] greg-g: Oh look, deployment-videoscaler01 also uses production nutcracker...
[18:55:03] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #156: FAILURE in 37 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/156/
[18:55:26] Reedy: wth
[18:55:57] I guess I should move include ::role::beta::nutcracker into role::beta::common
[18:57:01] (CR) Cscott: "recheck" [integration/zuul-config] - https://gerrit.wikimedia.org/r/156693 (owner: Cscott)
[18:59:30] (CR) Cscott: [C: 1] Add macros for asserting node.js version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/161918 (owner: Hashar)
[18:59:36] (PS5) Cscott: Add macros for asserting node.js version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/161918 (owner: Hashar)
[18:59:56] (PS7) Cscott: parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:00:22] (CR) Cscott: [C: 1] parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:01:32] (CR) jenkins-bot: [V: -1] parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:01:38] (CR) Cscott: "Depends on https://gerrit.wikimedia.org/r/160589" [integration/zuul-config] - https://gerrit.wikimedia.org/r/156693 (owner: Cscott)
[19:02:49] bd808: https://gerrit.wikimedia.org/r/#/c/165770/4/manifests/role/beta.pp,unified Should I just move the nutcracker config to ::beta::common ?
[19:04:50] I suppose it wouldn't hurt anything. How do we add nutcracker in prod?
[19:04:58] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c6 (Sam Reed (reedy)) Can someone test this for me now?
[19:04:59] Not sure
[19:05:10] * bd808 was too lazy to look
[19:05:17] But numerous of the labs machines have nutcracker, all with production ip ranges
[19:05:46] role::mediawiki::common gives it
[19:05:47] Unless there's some sub role for machines that have mediawiki installed
[19:05:55] Oh....
[19:06:08] we just need to fix hiera for beta!
[19:06:18] servers => hiera('mediawiki_memcached_servers')
[19:06:21] "just"
[19:06:22] ;)
[19:06:26] heh
[19:06:39] It's just a yaml fille change
[19:06:45] (CR) Cscott: [C: 2] Add macros for asserting node.js version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/161918 (owner: Hashar)
[19:06:46] instead of a hack class
[19:07:25] !log updated scap to include 8183d94 (Fix "TypeError bufsize must be an integer")
[19:07:29] Logged the message, Master
[19:09:33] Reedy: We need to figure out how _joe_ wired per-project things for labs into hiera and then make a file like -- https://github.com/wikimedia/operations-puppet/blob/9ad61aa3c94169e4c5d376371766b2e6983bb46b/hieradata/eqiad.yaml#L1
[19:09:51] Ah
[19:09:55] That looks sane at least
[19:10:00] (Merged) jenkins-bot: Add macros for asserting node.js version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/161918 (owner: Hashar)
[19:11:03] Reedy: https://github.com/wikimedia/operations-puppet/blob/9ad61aa3c94169e4c5d376371766b2e6983bb46b/modules/puppetmaster/files/labs.hiera.yaml#L12
[19:11:25] So labs/deployment-prep.yaml I think
[19:12:13] Is this likely a recent breakage then?
[19:13:37] yeah I think its kind of new
[19:14:38] Joe is around so I guess we can ask where it needs to go
[19:15:18] (PS8) Cscott: parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:15:28] Wikimedia Labs / deployment-prep (beta): Requested 115.108.187.192.proxies.dnsbl.sorbs.net., not found in proxies.dnsbl.sorbs.net.. - https://bugzilla.wikimedia.org/71894#c4 (Antoine "hashar" Musso) On beta we have: # Attempt to auto block users using faulty servers # See also http://www.us.sorbs.net/...
[19:15:49] (CR) jenkins-bot: [V: -1] parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:16:08] (PS9) Cscott: parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:23:46] (CR) Jforrester: "Good to go?" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:30:15] (CR) Cscott: [C: 2] "Looks good to me! I'll deploy this." [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:33:41] (Merged) jenkins-bot: parsoidsvc: split npm job based on nodejs version [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/160589 (owner: Jforrester)
[19:39:51] cscott: James_F \O/
[19:39:58] Woo. :-)
[19:40:05] Now for the Zuul change too.
[19:40:21] James_F: I have tweaked a bit the zuul-config repo
[19:40:21] yeah, i'm rebasing that now.
[19:40:28] the layout.yaml file is now in a subdirectory /zuul/
[19:40:33] hashar: I saw.
[19:40:39] hashar: Are you going to merge the repos soon?
[19:40:45] not this week
[19:40:49] Sure. :-)
[19:40:54] Monday! It's a holiday.
[19:40:54] unfortunately git-rebase doesn't magically move my patch i don't think.
[19:40:59] Best possible time to break everything.
[19:40:59] yeah monday sounds good
[19:41:28] we had a bunch of changes to review/merge in and Tobi from WMDE was finishing the migration of their jobs to the wmf instance
[19:43:26] (PS7) Cscott: Swap in the new parsoidsvc-(source|deploy) jobs. [integration/zuul-config] - https://gerrit.wikimedia.org/r/156693
[19:43:57] ahhh
[19:44:01] hashar, James_F: there is only zuul: https://gerrit.wikimedia.org/r/156693
[19:44:37] jenkins likes it, even.
[19:44:48] https://integration.wikimedia.org/ci/job/integration-zuul-layoutdiff/1840/console
[19:44:55] !log added role::deployment::test to deployment-rsync01 and deployment-mediawiki03 for trebuchet testing
[19:44:55] the diff let you figure out what is changed (more or less)
[19:44:56] Logged the message, Master
[19:45:01] (CR) Cscott: [C: 1] "ok, the dependencies have been merged. let's do this thing!" [integration/zuul-config] - https://gerrit.wikimedia.org/r/156693 (owner: Cscott)
[19:45:56] (CR) Hashar: [C: 2] "Since I am obviously never going to carefully review this and I am probably overthinking the whole thing. Lets be bold, merge and amend la" [integration/zuul-config] - https://gerrit.wikimedia.org/r/156693 (owner: Cscott)
[19:46:02] la -> later
[19:46:04] (Merged) jenkins-bot: Swap in the new parsoidsvc-(source|deploy) jobs. [integration/zuul-config] - https://gerrit.wikimedia.org/r/156693 (owner: Cscott)
[19:46:16] hashar: great, merge & run. ;)
[19:46:16] cscott: do you have access on gallium to pull / reload ?
[19:47:30] let me see. i had jjb permissions, i think we gave me zuul perms at the same time.
[19:47:40] too late deployed :]
[19:47:51] i noticed that https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Deploy_configuration doesn't actually mention *which* machine you're supposed to log in to.
[19:47:57] ohh
[19:49:09] done
[19:49:18] eventually one day the deploy thing will use git-deploy
[19:49:54] (Abandoned) Arlolra: Run npm test on parsoid [integration/zuul-config] - https://gerrit.wikimedia.org/r/165671 (owner: Arlolra)
[19:50:51] hashar: Eventually.
[19:53:18] bd808: hieradata/labs/deployment-prep.yaml ?
[19:53:49] Reedy: yeah
[19:55:14] The following paths are ignored by one of your .gitignore files:
[19:55:14] hieradata/labs/deployment-prep.yaml
[19:55:15] Use -f if you really want to add them.
[19:55:15] fatal: no files added
[19:55:43] # Exclude all yaml files here from git, as they should be created on-demand for any instance.
[20:04:31] (PS1) Hashar: parsoid npm jobs were missing git commands [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165864
[20:04:35] cscott: ^^
[20:04:41] cscott: lets avoid spamming parsoid folks :]
[20:05:29] (PS1) Cscott: parsoidsvc: ensure sources are checked out before running npm test. [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165865
[20:05:29] I am refreshing parsoidsvc-source-npm-0.8
[20:06:02] (Abandoned) Hashar: parsoid npm jobs were missing git commands [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165864 (owner: Hashar)
[20:06:35] cscott: you can rebuild a job directly from Jenkins ex: https://integration.wikimedia.org/ci/job/parsoidsvc-source-npm-0.8
[20:07:57] bd808: Reckon I should just leave the heira stuff for joe tomorrow?
[20:08:09] hashar: oh, i didn't copy over concurrent:true in https://gerrit.wikimedia.org/r/165865
[20:08:38] Reedy: hmmm... yeah maybe
[20:08:59] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c7 (Matthew Flaschen) (In reply to Sam Reed (reedy) from comment #6) > Can someone test this for me now? Yeah, still the same symptoms for me. Any...
[20:08:59] (03PS2) 10Cscott: parsoidsvc: ensure sources are checked out before running npm test. [integration/jenkins-job-builder-config] - 10https://gerrit.wikimedia.org/r/165865 [20:09:06] Reedy: Or the idea is we just jam them into the puppetmaster? I don't like that [20:09:37] I'm really not sure [20:10:06] cscott: yeah by default Jenkins only allow one copy of a job to run [20:10:22] hashar: i updated the patch [20:10:35] cscott: go go go! [20:10:59] (03CR) 10Hashar: [C: 031] "Update the jobs, retrigger one and see what happens :]" [integration/jenkins-job-builder-config] - 10https://gerrit.wikimedia.org/r/165865 (owner: 10Cscott) [20:11:22] cscott: so what I usually do is refresh the job configuration from my machine. Then in the web interface I rebuild the last build and see what happens [20:11:33] cscott: amend the job config, refresh it, rebuild ... rinse repeat till happy [20:13:25] ok [20:15:09] cscott: feel free to +2 the change once you are happy with it [20:16:25] bed time for now [20:16:40] !log deployment-sca01 dead -- Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 [20:16:41] Logged the message, Master [20:16:45] bd808: https://rt.wikimedia.org/Ticket/Display.html?id=8613 filed for beta heira assigned to joe [20:17:22] !log rebooted deployment-sca01 via wikitech ui [20:17:24] Logged the message, Master [20:17:45] Reedy: LGTM thanks! [20:26:46] YuviPanda: How do I restart a Labs instance that has a kernel panic? Hitting reboot in wikitech didn't seem to do anything. [20:27:08] bd808: uh, that's the only way... [20:27:16] bd808: also, try refreshing after hitting reboot... [20:27:23] sometimes it does reboot, but thinks it didn't... [20:27:38] (03PS1) 10Cscott: Fix typo: parsoid/deploy jobs should be parsoidsvc-deploy-*. [integration/zuul-config] - 10https://gerrit.wikimedia.org/r/165866 [20:31:02] (03CR) 10Cscott: [C: 032] Fix typo: parsoid/deploy jobs should be parsoidsvc-deploy-*. 
[integration/zuul-config] - 10https://gerrit.wikimedia.org/r/165866 (owner: 10Cscott) [20:31:20] (03Merged) 10jenkins-bot: Fix typo: parsoid/deploy jobs should be parsoidsvc-deploy-*. [integration/zuul-config] - 10https://gerrit.wikimedia.org/r/165866 (owner: 10Cscott) [20:32:25] hashar: as it turns out, i can't log in to gallium. [20:32:51] cscott: :( [20:33:01] cscott: that is fixable via a puppet change [20:33:04] could you deploy https://gerrit.wikimedia.org/r/165866 for me? [20:33:04] will update [20:34:20] cscott: done [20:34:25] (03CR) 10Hashar: "deployed" [integration/zuul-config] - 10https://gerrit.wikimedia.org/r/165866 (owner: 10Cscott) [20:34:44] 3Wikimedia Labs / 3deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - 10https://bugzilla.wikimedia.org/71862#c8 (10Sam Reed (reedy)) Not sure straight off. Memcached was in somewhat of a mess on beta, which was spamming the hell out of the logs. I sorta presum... [20:35:48] Project browsertests-VisualEditor-test2.wikipedia.org-linux-chrome-sauce build #235: FAILURE in 58 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-test2.wikipedia.org-linux-chrome-sauce/235/ [20:37:10] What's up with these? [20:39:08] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #110: FAILURE in 10 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/110/ [20:56:03] (03CR) 10Cscott: [C: 032] parsoidsvc: ensure sources are checked out before running npm test. [integration/jenkins-job-builder-config] - 10https://gerrit.wikimedia.org/r/165865 (owner: 10Cscott) [20:56:19] Reedy: the browser test fails? [20:56:25] Indeed [20:56:54] no idea :/ [20:59:24] (03Merged) 10jenkins-bot: parsoidsvc: ensure sources are checked out before running npm test. 
[integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165865 (owner: Cscott)
[21:00:13] I'm not going down more rabbit holes tonight
[21:01:28] Reedy: yeah
[21:23:34] Yippee, build fixed!
[21:23:34] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #96: FIXED in 9 min 52 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/96/
[21:26:02] Yippee, build fixed!
[21:26:03] Project browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce build #188: FIXED in 1 min 42 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-test2.wikipedia.org-linux-firefox-sauce/188/
[21:45:11] well, at least we get some fixed tests :)
[21:51:34] Haha
[21:52:10] I think I'm going to try spend some time next week on this logstash aggregation stuff
[21:52:28] +1
[21:52:51] It'd have been nice if those memcached errors were surfaced
[21:52:56] To irc or similar
[21:53:08] Eg I didn't have to go look for them
[21:53:34] There were 2 or 3 distinct issues. But then thousands of each etc
[21:57:10] Haha, mobile site
[21:57:12] Shit happens —
[21:58:22] Reedy++
[21:58:52] Damon's comment in office just now too
[21:59:15] Discovery of bugs
[22:10:54] (PS1) Cscott: parsoidsvc: ensure that npm can find mocha in the $PATH. [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165904
[22:14:50] (CR) Cscott: "I'm not sure about the quoting -- I think maybe we should be using single quotes here, to ensure that the variables are not expanded prema" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165904 (owner: Cscott)
[22:15:55] (CR) Arlolra: [C: 1] parsoidsvc: ensure that npm can find mocha in the $PATH.
[integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165904 (owner: Cscott)
[22:35:37] marxarelli: good questions, btw, for damon
[22:37:09] (PS2) Cscott: parsoidsvc: ensure that npm can find mocha in the $PATH. [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165904
[22:37:14] greg-g: thx. there seems to be a lot of conversation lately about tooling (tdd and otherwise) and i'm curious who is thinking about the behavioral side of things
[22:39:13] greg-g: and on that note, i have a new tool in search of a problem :)
[22:39:21] wee!
[22:39:23] I love those!
[22:39:57] marxarelli: which tool?
[22:40:04] tc + iptables + sudo -g == run certain commands under certain network "conditions"
[22:40:38] e.g. sudo -g asia bundle exec cucumber ...
[22:40:39] :)
[22:40:47] sudo -g asia?
[22:41:02] man sudo's
[22:41:22] ah, each group has different iptables/tc settings?
[22:41:29] env variables or somesuch?
[22:41:48] * greg-g shouldn't care, just curious
[22:41:58] 1) use iptables to mark outgoing packets by guid; 2) use tc to shape the traffic; 3) run a command as whatever group
[22:42:21] s/guid/gid/
[22:42:39] * greg-g nods
[22:42:53] cool
[22:42:59] anyway, i'm going to package it up into a role
[22:43:00] let's find some nails
[22:43:04] :)
[22:43:13] haha, i couldn't resist trying it. too fun :)
[22:44:17] i'm trying it with the mmv perf tests just to see if it makes much of a difference
[22:44:34] * greg-g nods
[22:45:02] it probably will on a 128kbps connection with 300ms latency :)
[22:45:09] :)
[22:50:58] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c9 (Matthew Flaschen) If it helps, when I do the following, I get a different error: 1. Clear all Beta Labs cookies. 2. Go to http://en.wikipedia.be...
[22:51:22] (CR) Cscott: [C: 2] parsoidsvc: ensure that npm can find mocha in the $PATH.
[integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165904 (owner: Cscott)
[22:52:14] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c10 (Greg Grossmeier) (In reply to Matthew Flaschen from comment #9) > If it helps, when I do the following, I get a different error: > > 1. Clear a...
[22:53:19] greg-g, Reedy, I'm back on IRC if you want to talk about the above.
[22:53:23] Grepping for that error now.
[22:53:32] greg-g, can you reproduce, BTW?
[22:53:50] the initial test case, but not the one you just commented about
[22:54:42] (Merged) jenkins-bot: parsoidsvc: ensure that npm can find mocha in the $PATH. [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/165904 (owner: Cscott)
[23:10:14] Wikimedia Labs / deployment-prep (beta): "There was an unexpected error logging in" when creating accounts on Beta - https://bugzilla.wikimedia.org/71862#c11 (Matthew Flaschen) (In reply to Matthew Flaschen from comment #0) > There was an unexpected error logging in. This is 'nocookieslogin', with a m...
[23:12:36] greg-g, I'm not sure where to go from there. Are we able to use a debugger against Beta Labs somehow? Or are there logs I should check?
[23:13:08] https://logstash-beta.wmflabs.org/ and whatever else you want to look at on the machines
[23:18:15] greg-g, superm401: I've been running into something similar locally on my vagrant instance, but I never got around to debugging it.
[23:18:44] legoktm, I've found CentralAuth a little glitchy locally (maybe due to cache eviction), not sure if that's what you're seeing.
[23:23:07] greg-g, legoktm, I don't see anything in Kibana for username, password, or session ID (from the cookie). Do you know anywhere else logs might go that are not in Kibana?
[23:23:32] I don't really know how beta debugging is setup
[23:23:58] Project beta-code-update-eqiad build #27479: FAILURE in 57 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/27479/
[23:25:10] I typically just use the debugging toolbar with wfDebugLog statements, or $wgDebugLogFile = "php://stdout";
[23:27:30] logs are in /data/project/logs from any host (nfs)
[23:27:41] the best luck i've had debugging things in beta is booting up your own hhvm instance somewhere, attaching with the hhvm debug cli, and then sending specific requests to it. Its a giant pain to get going though, i should scriptify and document some day
[23:27:50] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #213: FAILURE in 6 min 29 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/213/
[23:27:57] but it needs to be inside the beta cluster so it gets all the beta configuration/etc.
[23:28:19] yuck
[23:28:36] but +1 if it works
[23:28:50] it turns out its better than merging var_dump's to master :)
[23:29:09] we have a hhvm box in there that you can direct your request too with a cookie
[23:29:23] we should publicize that
[23:29:39] ooh, that might be easier, but then i wonder about how that will work with shared state?
[23:29:48] the reason i boot up my own hhvm instance is because the cli debugger freezes the whole server
[23:30:01] ebernhardson: marxarelli could tell you how to do it. we set it up for a pen tester
[23:30:02] not just the request instance, but everything that hhvm daemon is doing
[23:30:12] ah right
[23:31:09] I have done sneaky things like stop the jenkins jobs and live hack before too ;)
[23:34:00] Yippee, build fixed!
[23:34:01] Project beta-code-update-eqiad build #27480: FIXED in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/27480/
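
Editor's note on the 22:40-22:45 exchange: the three steps described there (mark outgoing packets by gid with iptables, shape them with tc, run the command under that group with sudo -g) could look roughly like the sketch below. This is not the tool from the log, which was never shown. The interface name eth0, the mark value 10, the extra prio band 1:4 and the netem qdisc are all assumptions made for illustration; only the group name "asia", the cucumber command and the 128kbps / 300ms figures come from the conversation. The script only builds and prints the commands; it runs nothing.

```python
def shaping_commands(group="asia", dev="eth0", mark=10,
                     rate="128kbit", delay="300ms"):
    """Build (but do not run) the commands for steps 1 and 2.

    dev, mark and the band/handle numbers are illustrative assumptions.
    """
    band = "1:4"  # fourth prio band; the default priomap never selects it
    return [
        # 1) mark outgoing packets whose owning process runs with this gid
        ["iptables", "-t", "mangle", "-A", "OUTPUT",
         "-m", "owner", "--gid-owner", group,
         "-j", "MARK", "--set-mark", str(mark)],
        # 2a) root prio qdisc with one extra band reserved for shaped traffic
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:",
         "prio", "bands", "4"],
        # 2b) netem on that band adds the latency and the rate limit
        ["tc", "qdisc", "add", "dev", dev, "parent", band, "handle", "40:",
         "netem", "delay", delay, "rate", rate],
        # 2c) fw filter steers packets carrying the mark into that band
        ["tc", "filter", "add", "dev", dev, "parent", "1:0",
         "protocol", "ip", "prio", "1",
         "handle", str(mark), "fw", "flowid", band],
    ]


def run_as_group(group, command):
    """Step 3: prefix a command so it runs with the given primary group."""
    return ["sudo", "-g", group] + list(command)


if __name__ == "__main__":
    for argv in shaping_commands():
        print(" ".join(argv))
    print(" ".join(run_as_group("asia", ["bundle", "exec", "cucumber"])))
```

The printed setup commands would need root, the group would have to exist, and sudoers would have to allow the caller to use -g with it. Unmarked traffic stays in the default bands and is not slowed down.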