[01:09:58] Wikimedia Labs / tools: Audit security groups - https://bugzilla.wikimedia.org/60144#c2 (Tim Landscheidt) RESO/INV>REOP Eh, yes, irrespective of the DC location, we should still make sure that: a) all hosts have proper security groups assigned, and b) security groups really allow traffic they're...
[01:21:13] Wikimedia Labs / tools: Audit security groups - https://bugzilla.wikimedia.org/60144#c3 (Daniel Zahn) agree, just because we didn't already do it doesn't mean it's invalid :)
[02:12:28] Wikimedia Labs / wikistats: Make a stats table for 85 W3C wikis - https://bugzilla.wikimedia.org/41023#c11 (Daniel Zahn) Robih, check here: http://wikistats.wmflabs.org/display.php?t=w3c yea, still misses the Version column and outgoing links are broken, but wanted to let you know about progress it...
[02:21:16] I am setting a webserver in an instance in a project but the default size is periodically removed from /etc/apache2/sites-enabled dir
[02:21:34] does that have to do with the configuration of the instances? like a puppet issue or something?
[02:32:39] uh, yes, I have this in syslog Aug 28 01:12:57 wlmjurytool2014 puppet-agent[18659]: (/Stage[main]/Apache/File[/etc/apache2/sites-enabled/000-default.conf]/ensure) removed
[02:32:46] how can I tell it to keep that file?
[02:43:58] Wikimedia Labs / wikistats: Make a stats table for 85 W3C wikis - https://bugzilla.wikimedia.org/41023#c12 (Daniel Zahn) ASSI>RESO/FIX links in table: https://gerrit.wikimedia.org/r/#/c/156746/ cron for updates: https://gerrit.wikimedia.org/r/156747 the version update thing is all horrible and...
[09:21:25] !log deployment-prep resetting git repository in /data/project/apache/conf to point to the betaclusterbranch of operations/mediawiki-config.git discarded all local hacks in the process
[09:21:27] Logged the message, Master
[11:03:08] i was wondering if it might be possible to get node upgraded in tools labs for this project https://tools.wmflabs.org/anon/
[11:04:04] i've got it working now with a locally installed version of node i downloaded, but ideally i would get it working with the open grid engine, which probably involves upgrading node in tools labs?
[11:04:38] latest is v0.10.31 and we've got v0.8.2 installed
[11:05:03] maybe i should create a ticket for this instead of asking here?
[11:32:18] edsu: yes
[11:51:30] Wikimedia Labs / tools: upgrade node in tools labs - https://bugzilla.wikimedia.org/70120 (Ed Summers) NEW p:Unprio s:normal a:Marc A. Pelletier I'd like to get a more modern version of node installed in Tools Labs for the anon project. Right now we have v0.8.2 installed but v0.10.31 is ava...
[11:52:09] :-D
[13:05:46] Wikimedia Labs / tools: Add locking to jstart so simultaneously started jstart / jsub -once calls don't create duplicated tasks - https://bugzilla.wikimedia.org/60862#c4 (Krinkle) Happened again at tools.wmfdbbbot: qstat: 3343394 0.31914 dbbot-wm tools.wmfdbb r 08/21/2014 18:40:18 continuous@too...
[13:08:16] Wikimedia Labs / tools: upgrade node in tools labs - https://bugzilla.wikimedia.org/70120#c1 (Krinkle) This certificate error is a problem within node 0.8 / npm 1.1.x that was fixed in an update. While we should update to node 0.10, we probably will not do that for an individual page and instead have...
[13:10:28] Wikimedia Labs / tools: upgrade node in tools labs - https://bugzilla.wikimedia.org/70120#c2 (Ed Summers) So can we get an upgrade in tools labs to another version of 0.8.x?
[13:10:46] Wikimedia Labs / tools: upgrade node in tools labs - https://bugzilla.wikimedia.org/70120#c3 (Krinkle) See also http://blog.npmjs.org/post/78085451721/npms-self-signed-certificate-is-no-more In short: They used to have a self-signed certificate which was baked into nodejs. They got a proper certificat...
[13:10:46] Wikimedia Labs / tools: Tool Labs: Node.js and npm broken due to outdated certificate (install minor update to fix certificate) - https://bugzilla.wikimedia.org/70120 (Krinkle)
[13:11:16] Wikimedia Labs / tools: Tool Labs: Node.js and npm broken due to outdated certificate (install minor update to fix certificate) - https://bugzilla.wikimedia.org/70120#c4 (Krinkle) p:Unprio>High s:normal>critic Raising priority. npm is unusable without this.
[13:16:40] Krinkle: thanks for your help w/ this
[13:16:55] this = https://bugzilla.wikimedia.org/show_bug.cgi?id=70120 :)
[13:51:58] Wikimedia Labs / tools: Tool Labs: Node.js and npm broken due to outdated certificate (install minor update to fix certificate) - https://bugzilla.wikimedia.org/70120#c5 (Tim Landscheidt) *** Bug 69079 has been marked as a duplicate of this bug. ***
[13:51:58] Wikimedia Labs / tools: Upgrade node and npm on Tool Labs - https://bugzilla.wikimedia.org/69079#c2 (Tim Landscheidt) UNCO>RESO/DUP *** This bug has been marked as a duplicate of bug 70120 ***
[14:23:45] Wikimedia Labs / deployment-prep (beta): Commons beta cannot resolve URL with a colon in it - https://bugzilla.wikimedia.org/70124 (dan) NEW p:Unprio s:normal a:None Created attachment 16304 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16304&action=edit sample XML file for use wit...
[14:32:28] Wikimedia Labs / tools: NFS servers doesn't allow access for some tool maintainers to their tool directories - https://bugzilla.wikimedia.org/62038#c1 (Marc A. Pelletier) NEW>RESO/WOR We do use manage-gids and LDAP as the user backend; so that is was the issue.
That said, I've tested the failing...
[14:34:43] Wikimedia Labs / tools: tools.wmflabs.org rejects SNI with tools.wmflabs.org - https://bugzilla.wikimedia.org/63435#c1 (Marc A. Pelletier) NEW>RESO/WOR All four work for me. Perhaps this issue was only present during the transition to nginx? (There were proxies to proxies for roughly one month...
[14:51:54] Please could someone run two database queries for me on Commons
[14:52:13] 1: number of unique images in category:Wikimania 2014 and subcategories
[14:52:40] 2: number of unique users who have uploaded one or more files to category:Wikimania 2014 and subcategories
[14:55:31] Thryduulf: http://quarry.wmflabs.org/ may well be what you want. :-)
[14:56:16] thank you
[14:56:47] unlikely
[14:56:54] now, how to learn SQL...
[14:57:02] Thryduulf: your best bet is http://tools.wmflabs.org/catscan2/
[14:57:38] then work on the results table e.g. in libreoffice or UNIX commandline
[14:57:50] I've tried that but I can only get it to give me a list of all files in "Category:Photos by Chris McKenna", regardless of what settings I use
[14:58:49] already being answered at #wikimedia-commons
[15:29:43] Wikimedia Labs / tools: Rename revision_userindex to revision - https://bugzilla.wikimedia.org/66786#c14 (Marc A. Pelletier) NEW>RESO/WON After careful consideration, the necessity of having revision provide the same view as in production wins over the potential confusion of having some rows miss...
[15:32:58] Wikimedia Labs / tools: Clean up list of projects on Tool Labs home page - https://bugzilla.wikimedia.org/49937#c10 (Marc A. Pelletier) *** Bug 67259 has been marked as a duplicate of this bug. ***
[15:32:59] Wikimedia Labs / tools: Tools using Tomcat missing in tool list - https://bugzilla.wikimedia.org/67259#c1 (Marc A. Pelletier) NEW>RESO/DUP This will be fixed with the coming overhaul of the tool list. I'm marking this bug as duplicate of 49937 since the same fix will correct both.
*** This bug...
[15:41:28] Wikimedia Labs / tools: Clean up list of projects on Tool Labs home page and add Tomcat tools - https://bugzilla.wikimedia.org/49937 (Tim Landscheidt)
[16:13:22] !log deployment-prep merging a patch that renames 'labswiki' to 'deploymentwiki'
[16:13:25] Logged the message, dummy
[16:13:56] here we go.... \o/
[16:15:07] * ^d puts on his hard-hat
[16:15:10] andrewbogott: The scap for that change is running right now
[16:15:11] <^d> Fire in the hole!
[16:15:25] great, we'll see what breaks!
[16:15:31] And it's done
[16:15:48] http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page still loads...
[16:15:51] that's about all I know to check
[16:16:03] hmmm... yeah
[16:16:05] <^d> Well enwiki main page still shows up, so that's something.
[16:16:08] <^d> We didn't break prod.
[16:16:20] Did the db get copied already?
[16:16:27] <^d> Last night, right?
[16:16:28] bd808: yes, springle did it last night.
[16:16:34] oh sweet
[16:16:53] he says "Wasn't sure if dropping labswiki would break stuff before the patches are merged, so right now labswiki database still exists, but only as a view of real data in deploymentwiki."
[16:17:07] "Once the merges are done, someone can simply drop labswiki."
[16:17:44] `mwscript eval.php --wiki=deploymentwiki` works
[16:18:09] <^d> login's broken.
[16:18:18] <^d> Not Found
[16:18:18] <^d> The requested URL /wiki/Special:CentralLogin/start was not found on this server.
[16:18:18] <^d> Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
[16:19:08] http://login.wikimedia.beta.wmflabs.org/wiki/Special:Version is 404 too
[16:19:38] <^d> Hmm
[16:20:09] _joe changed apache stuff today too -- https://gerrit.wikimedia.org/r/#/c/156762/2
[16:20:33] login.
seems deleted
[16:21:19] <^d> sql loginwiki
[16:21:19] <^d> ERROR 1045 (28000): Access denied for user 'wikiadmin'@'10.68.16.58' (using password: YES)
[16:21:22] <^d> Seems...bad
[16:21:41] the sql script doesn't work in beta
[16:21:45] <^d> boo
[16:22:00] long standing bug
[16:22:26] <^d> $ mwscript eval.php --wiki=loginwiki
[16:22:27] <^d> > echo $wgServerName;
[16:22:27] <^d> login.wikimedia.beta.wmflabs.org
[16:22:45] <^d> So login wiki still exists and knows who *it* is. _joe_'s change seems more suspect now.
[16:23:06] Yeah. Trying to grok the change.
[16:23:55] * andrewbogott looks on with guilty expression
[16:23:56] loginwiki.conf is not included from the old repo
[16:24:26] <^d> That'll do it
[16:24:58] Should be easy to add... I'll see if I can fix
[16:25:42] bd808: aude is seeing wikidata issues in beta as well, e.g. http://wikidata.beta.wmflabs.org/
[16:25:47] Is that the same thing?
[16:25:57] It may be
[16:25:57] <^d> Very possibly.
[16:26:14] http://wikidata.beta.wmflabs.org/ is not configured
[16:26:18] yeah no wikidata in new apache code
[16:26:32] aude: Releated to https://gerrit.wikimedia.org/r/#/c/156762/2
[16:26:44] ok
[16:27:00] <^d> aude: Just like loginwiki. Still exists, all there. Apache conf's just borked.
[16:27:02] as long as it's a known issue, ok
[16:27:07] <^d> You can confirm by using eval.php
[16:27:27] sure
[16:30:38] _joe_ is rolling back his change, we found a bunch of stuff missing
[16:31:39] <_joe_> yes sorry it's a trivial problem
[16:31:48] <_joe_> but I will fix it tomorrow.
[16:31:51] Yeah, no worries
[16:32:35] all.conf -> main.conf -> site.conf -> (lots of conf)
[16:32:39] <_joe_> can you try now guys?
[16:32:42] <_joe_> it should be fixed
[16:33:10] <_joe_> and again, sorry.
I just missed a couple of include declaration
[16:33:27] varnish has cached the 404s
[16:33:30] I'll get that
[16:34:26] !log deployment-prep Restarted varnishes on deployment-cache-text02
[16:34:29] Logged the message, Master
[16:34:56] http://login.wikimedia.beta.wmflabs.org/wiki/Main_Page is still 404?
[16:35:03] <_joe_> bd808: mh
[16:35:08] <_joe_> let me check
[16:35:26] X-Cache:deployment-cache-text02 miss (0), deployment-cache-text02 frontend hit (4)
[16:35:33] which seems not right
[16:35:49] <_joe_> it is not
[16:35:51] <_joe_> let me see
[16:36:00] Getting better
[16:36:15] service varnish restart -- apparently doesn't clear cahce
[16:36:26] stop & start does
[16:36:47] <_joe_> wait
[16:37:15] <_joe_> but Im'm ion a meeting
[16:37:32] http://login.wikimedia.beta.wmflabs.org/wiki/Special:Version is back but bits is sad
[16:37:59] 404 cache there too
[16:38:01] <_joe_> well, the config is exactly as it was before I made the change now
[16:38:28] <^d> CentralAuth working again.
[16:38:53] <_joe_> sorry guys, this was really trivial :/
[16:39:08] <^d> It happens! Thanks for fixing :)
[16:41:18] aude: Is wikidata.beta better now?
[16:41:38] My browser is telling lies I think
[16:42:21] ah. that's better
[16:42:48] gah, do i have to proxy?
[16:42:56] hmmm... sometimes works and sometimes 404's for me
[16:43:10] <_joe_> bd808: mmmm
[16:43:24] proxy doesn't help
[16:43:24] <_joe_> maybe deployment-mediawiki01 is in the pool as well?
[16:43:27] it's not working for me
[16:43:35] _joe_: I think it is
[16:43:37] http://wikidata.beta.wmflabs.org/wiki/Special:Random
[16:43:40] <_joe_> bd808:
[16:43:49] <_joe_> ok one second
[16:45:34] login on commons works but on deployment not.
[16:45:54] <_joe_> bd808: I applied puppet on 01 as well, now
[16:46:30] better for me. aude, Steinsplitter ?
[16:46:41] yep
[16:46:52] <_joe_> finally
[16:47:14] w00t. Go back to your quarterly review _joe_ :)
[16:47:20] Incorrect password entered. Please try again.
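The rollback came down to a couple of missing Include declarations: loginwiki.conf (and the wikidata vhost) were no longer reachable from the all.conf -> main.conf -> site.conf chain. A minimal Python sketch of a check for that class of mistake follows. It assumes the config files are loaded into a dict of bare file name -> text and that Include targets can be matched by their last path component; globs, quoted paths and ServerRoot-relative resolution are deliberately ignored. The function names are hypothetical and not part of any Wikimedia tooling.

```python
import re

# Matches Apache "Include" / "IncludeOptional" lines and captures the target.
# Quoted paths and wildcards are not handled in this sketch.
INCLUDE_RE = re.compile(r"^\s*Include(?:Optional)?\s+(\S+)", re.IGNORECASE | re.MULTILINE)


def reachable_confs(files, root):
    """Return every file name reachable from `root` by following Include lines.

    `files` maps a bare file name to its text; only the last path component of
    each Include target is used to look it up (an assumption of this sketch).
    """
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen or name not in files:
            continue  # already visited, or a target we have no text for
        seen.add(name)
        for target in INCLUDE_RE.findall(files[name]):
            stack.append(target.rsplit("/", 1)[-1])
    return seen


def missing_vhosts(files, root, expected):
    """List expected vhost files that the Include chain never reaches."""
    return sorted(set(expected) - reachable_confs(files, root))


if __name__ == "__main__":
    # Hypothetical layout mirroring the all.conf -> main.conf -> site.conf chain.
    files = {
        "all.conf": "Include main.conf\n",
        "main.conf": "Include site.conf\n",
        "site.conf": "Include sites/wikidata.conf\n",
        "wikidata.conf": "",
        "loginwiki.conf": "",
    }
    print(missing_vhosts(files, "all.conf", ["loginwiki.conf", "wikidata.conf"]))
```

A check like this could run before a config change is merged, so a dropped Include shows up as a non-empty list instead of as 404s cached by varnish.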
[16:47:20] try to login on deployment.
[16:47:24] something with CA is broken.
[16:47:38] <_joe_> bd808: yes please :)
[16:47:50] Steinsplitter: Ok. That could be from another change andrewbogott pushed
[16:48:36] I'm in a meeting now but can step away if there's something I can do to help...
[16:48:51] I'll poke at it, ^d will help I bet
[16:49:31] !log deployment-prep Apache vhosts look good again
[16:49:34] Logged the message, Master
[16:49:54] !log deployment-prep CentralAuth looks broken on http://deployment.wikimedia.beta.wmflabs.org/
[16:49:57] Logged the message, Master
[16:51:14] <^d> Not working how?
[16:51:48] <^d> Able to login/logout from various wikis, including deploymentwiki
[16:52:17] ^d: the pw is stored in which db? i can login on commons but not on deployment. o_O
[16:52:42] <^d> I can login from deployment, commons, eswiki, enwiki
[16:52:45] <^d> Logout from all two
[16:52:50] <^d> *too
[16:53:24] hmmm... I can't login on deployment but I can on others
[16:53:30] <^d> 16:18, 28 August 2014 on deploymentwiki though. seems suspect.
[16:53:36] <^d> account registration time.
[16:53:45] ah. wiki renamed
[16:53:53] <^d> Ahhhh
[16:53:56] <^d> CA entries
[16:53:58] <^d> Need updating.
[16:53:59] yeah
[16:54:30] I think I can fix that
[16:54:59] <^d> $listOfReasonsWeNeverRenameWikis[] = 'CentralAuth records';
[16:56:14] ^d: The localuser table needs updated right?
[16:56:29] <^d> I think so.
[16:56:35] <^d> Lemme pull up the schema and check
[16:56:48] elect * from localuser where lu_wiki = 'labswiki'; -- 940 rows
[16:57:06] <^d> https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCentralAuth/HEAD/central-auth.sql
[16:58:20] <^d> Definitely localuser.lu_wiki.
[16:58:38] <^d> Probably globaluser.gu_home_db to be safe too.
[16:58:38] And localnames too I think
[16:58:49] <^d> localnames.ln_wiki
[16:59:01] update localuser set lu_wiki = 'deploymentwiki' where lu_wiki = 'labswiki';
[16:59:07] update localnames set lu_wiki = 'deploymentwiki' where lu_wiki = 'labswiki';
[16:59:25] I think that will do it?
[16:59:33] <^d> globaluser.gu_home_db?
[16:59:35] <^d> wikiset.ws_wikis need checking perhaps?
[16:59:52] <^d> renameuser_status.ru_wiki needs checking.
[17:00:04] <^d> and renameuser_queue.rq_wiki
[17:00:14] <^d> Think that's it
[17:00:19] renameuser_queue will be empty.
[17:00:36] I made that and it has no ui yet
[17:00:51] Ok. starting to really screw stuff up now :)
[17:05:59] Ok. I think maybe fixed now
[17:06:04] * bd808 tries to login
[17:07:29] w00t. Steinsplitter Can you login to http://deployment.wikimedia.beta.wmflabs.org/ now?
[17:08:19] bd808: nope
[17:08:45] boo. It worked for me after I logged out of en.beta
[17:10:10] bd808: logged out there, now works.
[17:10:31] ok. cool
[17:10:31] Not sure why the logout is needed
[17:10:46] <^d> Because CentralAuth
[17:11:11] PRobably related to cache
[17:11:25] redis somewhere has the old user objects in it
[17:11:37] which think they have labswiki for a home wiki
[17:12:00] this keeps updating rows -- update globaluser set gu_home_db = 'deploymentwiki' where gu_home_db = 'labswiki';
[17:12:33] !log deployment-prep Changed centralauth db to rename labswiki -> deploymentwiki
[17:12:35] Logged the message, Master
[17:14:43] Wikimedia Labs / deployment-prep (beta): hhvm creates core file in /tmp/ filling mediawiki02 labs instance root partition - https://bugzilla.wikimedia.org/69979#c10 (Greg Grossmeier) Still at 12 since over night. I wonder if the core dumps were caused by the relatively mega high load due to the automat...
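The CentralAuth fix amounts to rewriting every column that stores the old wiki database name. Here is a runnable sketch using Python's sqlite3, under some explicit assumptions. The three tables are minimal stand-ins: the real schema is in central-auth.sql and runs on MySQL. Only the columns settled on in the chat are covered: localuser.lu_wiki, localnames.ln_wiki and globaluser.gu_home_db. ^d names the localnames column `ln_wiki`, while the statement typed at 16:59:07 uses `lu_wiki`, so the sketch follows ^d. The wikiset.ws_wikis and renameuser_status.ru_wiki columns that ^d flagged for checking are left out.

```python
import sqlite3

# Columns holding a wiki database name that the chat settles on updating.
# wikiset.ws_wikis and renameuser_status.ru_wiki are NOT covered here.
WIKI_COLUMNS = [
    ("localuser", "lu_wiki"),
    ("localnames", "ln_wiki"),
    ("globaluser", "gu_home_db"),
]


def rename_wiki(conn, old, new):
    """Point every row that references `old` at `new`; return rows changed per column."""
    changed = {}
    with conn:  # one transaction: commit on success, roll back on any error
        for table, column in WIKI_COLUMNS:
            # Table/column names come from the fixed list above, never from input.
            cur = conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE {column} = ?", (new, old)
            )
            changed[f"{table}.{column}"] = cur.rowcount
    return changed


if __name__ == "__main__":
    # Minimal stand-in tables; the real CentralAuth schema has many more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE localuser (lu_name TEXT, lu_wiki TEXT)")
    conn.execute("CREATE TABLE localnames (ln_name TEXT, ln_wiki TEXT)")
    conn.execute("CREATE TABLE globaluser (gu_name TEXT, gu_home_db TEXT)")
    conn.executemany(
        "INSERT INTO localuser VALUES (?, ?)",
        [("Alice", "labswiki"), ("Bob", "labswiki"), ("Alice", "enwiki")],
    )
    conn.execute("INSERT INTO localnames VALUES ('Alice', 'labswiki')")
    conn.execute("INSERT INTO globaluser VALUES ('Alice', 'labswiki')")
    print(rename_wiki(conn, "labswiki", "deploymentwiki"))
```

Doing all the updates in one transaction keeps the rename all-or-nothing. The function does not touch cached user objects, which is consistent with the redis explanation for why users still had to log out once.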
[17:25:58] Wikimedia Labs / deployment-prep (beta): hhvm creates core file in /tmp/ filling mediawiki02 labs instance root partition - https://bugzilla.wikimedia.org/69979#c11 (Bryan Davis) (In reply to Greg Grossmeier from comment #10) > Still at 12 since over night. I wonder if the core dumps were caused by the...
[17:30:13] Wikimedia Labs / tools: Add locking to jstart so simultaneously started jstart / jsub -once calls don't create duplicated tasks - https://bugzilla.wikimedia.org/60862#c5 (Krinkle) And tools.ecmabot: 3343333 0.31965 ecmabot-wm tools.ecmabo r 08/21/2014 18:36:18 continuous@tools-exec-09.eqiad 1...
[17:31:28] Wikimedia Labs / tools: jsub should prevent starting duplicate jobs for -once tasks - https://bugzilla.wikimedia.org/60862 (Krinkle)
[17:31:58] Wikimedia Labs / tools: Tool Labs: jsub should prevent starting duplicate jobs for -once tasks - https://bugzilla.wikimedia.org/60862 (Krinkle)
[17:43:09] <^d> Do we not load StartProfiler in beta?
[17:58:12] ^d: there arn't any flags in the core code that loads it, its just if file exists include. I can verify there is no StartProfiler.php on deployment-mediawiki01
[17:58:26] so basically, looks like we dont
[17:58:41] * ebernhardson is 15 minutes late :P
[17:58:45] <^d> We symlink it from $IP/StartProfiler.php to ../wmf-config/StartProfiler.php on production
[17:58:50] <^d> Lacking the symlink in beta.
[17:58:51] <^d> Hmm
[18:03:25] bd808: I'm catching up post-meeting; is central auth still broken?
[18:04:26] andrewbogott: I think it is working, but people who were logged in pre-rename and want to login on development.beta need to logout via en.beta (or another wiki) before the login works
[18:04:35] I blame redis object caching
[18:04:59] ok. So other than that… not broken?
[18:26:29] AndyRussG needs to be added as an admin on the Globaleducation project. Is there a wiki place to request that, or do we just pester folks here?
https://wikitech.wikimedia.org/wiki/Nova_Resource:Globaleducation
[18:29:12] how can I tell puppet not to delete a file in /etc/apache2?
[18:29:24] in which subdirectory?
[18:29:26] sites-enabled?
[18:29:36] in particular, my enabled site (the symlink crated by a2ensite in /etc/apache2/sites-enabled) is being deleted in every puppet run
[18:29:43] if so, you don't; use sites-local instead. files in that directory get loaded too, but they aren't managed by puppet
[18:30:12] ori: oh, great, I'll do that
[18:30:12] thanks
[18:30:39] np
[19:21:45] can someone help me get User:Dduvall added to the deployment-prep project?
[19:22:01] I just get "Failed to add Dduvall to deployment-prep. "
[19:23:23] greg-g: i believe i'm already a member
[19:23:28] Wikimedia Labs / deployment-prep (beta): shell wrapper to connect to databases - https://bugzilla.wikimedia.org/45706#c6 (Bryan Davis) NEW>RESO/FIX Fixed via a local commit in the puppet repo on deployment-salt: $ git show --pretty 52fc928 commit 52fc92891d53267ff2ed82b917017c288cdebaa2 Au...
[19:23:37] bah!
[19:24:02] pre-asking for your username I searched for "dan" didn't see anything, and kept that assumption throughout, bad greg
[19:24:31] YuviPanda: Weather any better at your place?
[19:24:51] multichill: not too cold, yeah. I'm no longer wearing a bathrobe!
[19:25:56] When are you going back to your home oven?
[19:27:04] YuviPanda: It's still summer in India, right? ;-)
[19:27:20] "home oven"
[19:28:02] (PS1) Yuvipanda: Strip the 'operations' prefix as well from messages [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156877
[19:28:05] (PS1) Yuvipanda: Rename jenkins-bot if it does a -1 [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878
[19:28:05] multichill: oh, no, I'm in Glasgow ;)
[19:28:16] multichill: it kinda is, yeah.
I'm going back on 30 Sep
[19:28:47] mutante: The place where Yuvi lives is kinda hot in summer if I recall correctly ;-)
[19:29:04] (CR) Krinkle: [C: 1] Strip the 'operations' prefix as well from messages [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156877 (owner: Yuvipanda)
[19:29:21] (CR) Krinkle: "Why?" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:29:23] He's now in the home freezer :P
[19:29:32] (CR) Jackmcbarn: [C: -1] "I'm not a fan of this idea." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:29:38] says somebody who was "chill" in the nick:)
[19:29:46] (CR) Yuvipanda: [C: 2] Strip the 'operations' prefix as well from messages [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156877 (owner: Yuvipanda)
[19:29:49] (Merged) jenkins-bot: Strip the 'operations' prefix as well from messages [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156877 (owner: Yuvipanda)
[19:30:12] (CR) Andrew Bogott: [C: 1] Rename jenkins-bot if it does a -1 [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:31:21] (PS1) Krinkle: preprocess: Clean up arbitrary quoting of object key 'message' [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156880
[19:32:39] mutante: ^^ merged change to strip the 'operations/' prefix :) should deploy soon
[19:34:09] (CR) Yuvipanda: [C: 1] "This is grrrit-wm, and that's why :)" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:35:58] YuviPanda: cool, i guess? it never bothered me
[19:36:12] mutante: hmm, it was you or matanya who poked me about it a few days back...
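grrrit-wm itself is a Node.js bot, and its code does not appear in this log. The following Python sketch only illustrates the behaviour merged in change 156877, which shortens Gerrit project names before they are reported. Only the `operations/` prefix is confirmed here. Treating `mediawiki/` as the prefix that was "already" stripped is an assumption, and so is the function name.

```python
# Hypothetical prefixes; only "operations/" is confirmed by change 156877.
STRIP_PREFIXES = ("mediawiki/", "operations/")


def short_repo_name(repo):
    """Drop the first matching well-known prefix from a Gerrit project name."""
    for prefix in STRIP_PREFIXES:
        if repo.startswith(prefix):
            return repo[len(prefix):]
    return repo  # e.g. labs/tools/grrrit is reported unchanged
```

For example, `short_repo_name("operations/puppet")` gives `"puppet"`, while `labs/tools/grrrit` passes through untouched.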
[19:36:14] i'm not sure the other one about renaming the bot is a good idea
[19:36:15] might have been matanya
[19:36:19] tch tch
[19:36:23] YuviPanda: i think matanya :)
[19:42:24] (CR) Legoktm: [C: -1] "lol" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:46:25] (CR) MaxSem: [C: 1] "I find it very inspiring." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:46:40] * MaxSem bites legoktm
[19:46:58] MaxSem: +2!
[19:47:24] (CR) MaxSem: [C: 2] "Yuvi made me do it! :P" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:47:28] (Merged) jenkins-bot: Rename jenkins-bot if it does a -1 [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[19:48:33] !log tools.lolrrit-wm restart to deploy to latest master
[19:48:35] Logged the message, Master
[19:49:07] about what YuviPanda?
[19:49:25] matanya: oh, grrrit-wm stripping the operations/ prefix from repo names when reporting
[19:51:12] maybe, i'm getting senile
[19:51:39] heh
[19:59:32] matanya: "getting"?
[19:59:45] well, already...
[20:00:01] but no need to though it in my face :D
[20:00:05] *r
[20:37:44] (PS1) MarkTraceur: Add ImageMetrics to -multimedia [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156945
[20:38:46] (CR) MarkTraceur: [C: -1] "I think these are still useful to have in -dev, given that the Multimedia channel is not yet all that lively. A few more months and notifi" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/134154 (owner: Yuvipanda)
[20:41:41] legoktm: thoughts on https://github.com/wikimedia/analytics-quarry-web/blob/master/quarry/web/output.py#L15
[20:42:01] what specifically about it?
[20:42:12] legoktm: streaming CSV via flask
[20:42:22] legoktm: it's a hack, wonder if there's better ways to do it
[20:42:25] :o
[20:42:42] flask has pretty good support for streaming files
[20:42:52] legoktm: indeed, but I don't want to write then stream
[20:42:54] what if you gave it a StringIO ?
[20:43:02] would that work?
[20:43:37] legoktm: yeah, but how do I yield from a stringIO?
[20:43:44] hmm, I could readline
[20:44:02] legoktm: also with stringIO I'd have to clean out the data past my seek position
[20:57:02] (CR) Legoktm: "This doesn't seem to be working. I still see "jenkins-bot" in V-1 messages." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/156878 (owner: Yuvipanda)
[21:10:45] Wikimedia Labs / Infrastructure: WMFLabs: Ganglia down / needs reinstall - https://bugzilla.wikimedia.org/63362#c11 (Greg Grossmeier) (In reply to Antoine "hashar" Musso from comment #9) > Will poke at it again next week with Andrew Boggot. I would like us to > attempt to resize the instance to a bigge...
[21:14:15] Wikimedia Labs / Infrastructure: WMFLabs: Ganglia down / needs reinstall - https://bugzilla.wikimedia.org/63362#c12 (Yuvi Panda) Ganglia is dead, long live Graphite. We had a working graphite.wmflabs.org instance for a while, but the same problems that ganglia ran into we ran into with graphite. So w...
[21:15:00] Wikimedia Labs / Infrastructure: WMFLabs: Ganglia down / needs reinstall - https://bugzilla.wikimedia.org/63362 (Yuvi Panda) a:Antoine "hashar" Musso>Yuvi Panda
[21:15:13] greg-g: if you want to help with that, poking on the RT linked would be cool
[21:18:31] YuviPanda: is mark the only one who can make that network change?
[21:18:39] s/can/should/
[21:18:56] greg-g: I'm unsure. Coren tells me that 'before' we had paravoid also make these changes, but he's unavailable now...
[21:19:07] :(
[21:19:10] gage?
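One way to meet both objections raised in the CSV-streaming exchange (no write-then-stream, and no stale data left past the seek position) is to reuse a single StringIO. Each row is written with `csv.writer`, its text is handed back, and then `seek(0)` plus `truncate(0)` empty the buffer before the next row. This is a sketch under that approach, not quarry's actual output.py. The name `iter_csv` is hypothetical.

```python
import csv
import io


def iter_csv(header, rows):
    """Yield a CSV document one row at a time, reusing a single StringIO buffer."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default dialect: comma-separated, "\r\n" line endings

    def drain():
        text = buf.getvalue()
        buf.seek(0)
        buf.truncate(0)  # discard what was just handed out
        return text

    writer.writerow(header)
    yield drain()
    for row in rows:
        writer.writerow(row)
        yield drain()
```

In Flask the generator can be returned as the response body, e.g. `Response(iter_csv(header, rows), mimetype="text/csv")`, which matches the generator pattern in Flask's streaming documentation. The buffer never holds more than one row, and `rows` can be any iterable, such as a database cursor.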
[21:19:16] * greg-g asks
[21:22:05] YuviPanda: shoudl that bug be closed and/or retitled to something like "Setup Graphite for WMFLabs"?
[21:22:28] greg-g: retitling sounds good, ya
[21:22:31] we need ganglia imo
[21:22:41] ori: why
[21:23:01] actually, maybe not
[21:23:18] if I can autogenerate dashboards, I think we wouldn't need ganglia
[21:23:21] yes, forget it; i'm not actually going to claim that
[21:23:26] yeah
[21:23:39] and autogenerating should be trivial
[21:23:51] it's not even a requirement
[21:24:10] icinga + icinga+wm for beta labs is more important (i think that was up at one time, not sure if it still is)
[21:26:02] ori: I've realized we can do that quite easily as a stopgap in ways similar to how we have alerts for swift, eventlogging and mw 5xxs
[21:26:17] nod
[21:26:36] so, YuviPanda, what monitoring *do* we have for labs and/or beta cluster, that you know of?
[21:28:00] greg-g: zero, afaik.
[21:28:07] we used to have logstash.wmflabs
[21:28:12] greg-g: well, if all of labs goes down virtxxx comes up
[21:28:15] logging != monitoring
[21:28:17] err, alerts for virtxxx comes up
[21:28:23] that's about it, I think
[21:28:33] ori: yeah, I was just looking for anything, really
[21:28:34] greg-g: https://logstash-beta.wmflabs.org
[21:29:00] That url changed with the eqiad migration
[21:29:04] oh right
[21:29:10] there needs to be a puppet freshness check too
[21:29:10] and was always just beta
[21:29:31] ori: we can actually do a puppet freshness check via graphite on labs :)
[21:29:40] what graphite on labs?
[21:29:45] :P
[21:29:58] greg-g: http://graphite.wmflabs.org/ actually works most of the time
[21:30:01] Yes please for a puppet check; this is a poor substitute -- https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/puppet%20runs
[21:30:03] greg-g: except when it doesn't
[21:30:24] grr, Sorry, user gjg is not allowed to execute '/bin/cat /root/secrets.txt' as root on deployment-bastion.eqiad.wmflabs.
[21:30:55] why not?
I can fix
[21:30:55] greg-g: http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1409261446.902&target=deployment-prep.deployment-bastion.puppetagent.failed_events.value
[21:31:03] bd808: ^ puppet failure count
[21:31:11] on deployment-bastion
[21:31:16] so we can feed that into icinga
[21:31:19] (for now)
[21:31:22] bd808: not sure
[21:31:35] ori: ^
[21:32:47] !log deployment-prep Added "Greg Grossmeier" to UnderNDA sudoers group
[21:32:50] Logged the message, Master
[21:33:36] Hello everybody
[21:33:48] greg-g: {{done}} see also https://bugzilla.wikimedia.org/show_bug.cgi?id=69269
[21:33:48] I have one question...
[21:35:04] Why did Toolserver migrated to Wikimedia Tool Labs?
[21:35:10] ori: btw, the smoking gun for Sherif's security testing: http://ur1.ca/i2nxj
[21:35:28] Wikimedia Labs / Infrastructure: WMFLabs: Ganglia down / needs reinstall - https://bugzilla.wikimedia.org/63362#c13 (Antoine "hashar" Musso) For history purposes: Ganglia on labs dies because it is on a small instance. Andrew attempted a resize via nova but that definitely does not work. Since: - ga...
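The render URL for the deployment-bastion puppet failure count returns a PNG. Graphite's render API also accepts `format=json`, which is what a threshold check would read. The minimal Python sketch below builds such a URL for the same metric and extracts the newest non-null value from a response body. Fetching the URL is left to the caller (for example with `urllib.request.urlopen`). The one-hour window and the function names are assumptions.

```python
import json
from urllib.parse import urlencode

GRAPHITE = "http://graphite.wmflabs.org"


def render_url(target, since="-1h"):
    """Build a Graphite render API URL that returns JSON instead of a PNG."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{GRAPHITE}/render/?{query}"


def latest_value(render_json):
    """Return the newest non-null datapoint value of the first series, or None.

    Graphite's JSON output is a list of {"target": ..., "datapoints": [[value, ts], ...]}.
    """
    series = json.loads(render_json)
    if not series:
        return None
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return value
    return None
```

Graphite fills intervals without data with `null`, so the newest non-null point, not simply the last one, is the meaningful reading for an alert.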
[21:35:53] ori: well, one of the smoking guns, corroborated with the other data
[21:36:10] bd808: confirmed working
[21:36:18] greg-g: if you'd like, I can make graphite.wmflabs.org more stable in the meantime by essentially disabling sending toollabs data to it
[21:36:27] greg-g: (and Krinkle's cvn project data)
[21:36:33] I personally only care about beta cluster ;)
[21:36:54] YuviPanda: as in, I wouldn't mind if there was a BC-only suite of monitoring/logging tools
[21:37:05] (I'd actually very much welcome it, and pay many beers for it)
[21:37:25] greg-g: :) I can prioritize icinga alerts for bc over toollabs once labmon1001 comes up if you'd like
[21:37:34] YuviPanda: I have seen a graphite link in the backlog http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1409261673.21&from=00%3A00_20140814&until=23%3A59_20140828&target=deployment-prep.deployment-mediawiki01.cpu.total.user.value&target=deployment-prep.deployment-mediawiki02.cpu.total.user.value
[21:37:37] (and if you can pay me in appropriate alcohol whenever we meet)
[21:37:39] YuviPanda: that is awesome :-]
[21:37:49] hashar: it's been working for a while now, but gets overloaded now and then
[21:37:50] * hashar delivers beers at Yuvi door.
[21:38:04] given it is still a WIP, I find it acceptable
[21:38:12] better to have some flacky data than NO data at all :D
[21:38:15] hashar: :)
[21:38:25] nice
[21:38:55] YuviPanda: do you know whether we can query graphite for all "directories" under a directory ? :D
[21:39:04] hashar: you can
[21:39:23] YuviPanda: use case, would be to have some page to query all hosts under deployment-prep. hierarchy and dump the metrics.
But I guess that is what Giraffe is doing anyway
[21:39:33] hashar: yeah, that's my plan :)
[21:39:38] awesome
[21:40:05] if you get tools labs / and beta in there you will be cheered all over the planet
[21:40:25] hashar: graphite already has beta and toollabs :)
[21:40:33] hashar: although I'm removing toollabs for now
[21:42:40] YuviPanda: I have opted in "integration" which run the Jenkins slaves
[21:42:55] might help investigating some potential perf issues we have with browser tests
[21:43:03] hashar: I just removed it as well. graphite on labs can't really handle it, and in an effort to stabilize it I'm going to just have it run betacluster
[21:43:06] hashar: see -operations
[21:43:26] YuviPanda: works for me :]
[21:44:29] I am off. Have a good whatever
[21:46:47] YuviPanda: Ah, graphite back up would be nice
[21:47:07] Krinkle: yeah, awaiting a network config change
[21:56:10] Hi. What is the meaning of "extended" in X!'s edit counter? e.g. see
[21:58:49] !log deployment-prep Jenkins: deployment-bastion slave was no more processing jobs due to executors being stalled somehow. Marked the node offline and bring it back online to have the executors killed and recreated. Beta cluster is updating again (has been frozen for 2:30 hours).
[21:58:55] now I am sleeping
[22:02:55] ^Was that a human or a bot?
[22:03:05] a bot
[22:05:30] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141 (Greg Grossmeier) NEW p:Unprio s:normal a:None Let's get some icinga alerts so we know when things are going sideways in Beta Cluster.
[22:06:58] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141#c1 (Yuvi Panda) - No puppet run for more than 1h - Presence of any puppet failures What else?
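The two rules in comment #1 (no puppet run for more than 1h, presence of any puppet failures) can be written as one check per host over Graphite-style `[value, timestamp]` datapoints. The failure rule reads the `puppetagent.failed_events` series. The staleness rule needs a series that records when puppet last ran. Which metric that is remains an assumption here, because the URL in comment #3 is truncated. Treating missing run data as stale is also a design choice of this sketch, and `puppet_alerts` is a hypothetical name.

```python
import time

STALE_AFTER = 3600  # seconds: "no puppet run for more than 1h"


def puppet_alerts(failed_points, run_points, now=None):
    """Return alert messages for one host.

    Both arguments are Graphite-style [value, timestamp] lists. run_points is
    assumed to carry a non-null value for every interval in which puppet ran.
    """
    now = time.time() if now is None else now
    alerts = []
    failures = [v for v, _ in failed_points if v is not None]
    if failures and failures[-1] > 0:
        alerts.append(f"puppet failures: {failures[-1]:g}")
    runs = [t for v, t in run_points if v is not None]
    if not runs or now - runs[-1] > STALE_AFTER:
        alerts.append("no puppet run for more than 1h")
    return alerts
```

Passing `now` explicitly keeps the check deterministic for testing. In normal use it defaults to the current time.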
[22:09:43] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141#c2 (Greg Grossmeier) p:Unprio>High My first pass list (puppet fails on important vms): * deployment-prep.deployment-bastion.puppetagent.failed_ev...
[22:12:29] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141#c3 (Greg Grossmeier) (In reply to Yuvi Panda from comment #1) > - No puppet run for more than 1h http://graphite.wmflabs.org/render/?width=586&height=...
[22:30:19] greg-g: bad news on the icinga checks, I updated https://bugzilla.wikimedia.org/show_bug.cgi?id=70141
[22:30:28] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141#c4 (Yuvi Panda) I just realized that you can't hit Labs URLs from prod, and so we can't actually do this right now because of that :( Two options: 1....
[22:34:26] grrr, edit conflict ;)
[22:34:28] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141#c5 (Greg Grossmeier) * deployment-prep.deployment-mediawiki01.diskspace.root.byte_free.value < 2 gigs * deployment-prep.deployment-mediawiki02.diskspac...
[22:36:28] Wikimedia Labs / deployment-prep (beta): Determine first pass list of icinga-alerting data from graphite.wmflabs - https://bugzilla.wikimedia.org/70141#c6 (Greg Grossmeier) (In reply to Yuvi Panda from comment #4) > Two options: > 1. File an RT ticket to allow access to graphite.wmflabs.org from labmon...
[22:37:23] YuviPanda: The data flow is $INSTANCE => labmon1001 (aggregator) => graphite.wmflabs.org (web)?
[22:37:33] scfc_de: right now?
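Greg's lists in comments #2 and #5 reduce to (metric, comparison, threshold) triples. The Python sketch below keeps only the two targets whose full names appear in this log: the deployment-bastion failure count and the deployment-mediawiki01 root-disk free bytes. The mediawiki02 disk metric is truncated in the bot message and is left out rather than guessed. Reading "2 gigs" as 2 GiB is an assumption, and so is counting a missing value as failing. This is not the `monitor_graphite_threshold` check that icinga would use. It only shows the shape of the rules.

```python
import operator

GIB = 1024 ** 3

# (Graphite target, comparison meaning "alert", threshold)
# "2 gigs" is read as 2 GiB here, which is an assumption.
CHECKS = [
    ("deployment-prep.deployment-bastion.puppetagent.failed_events.value", operator.gt, 0),
    ("deployment-prep.deployment-mediawiki01.diskspace.root.byte_free.value", operator.lt, 2 * GIB),
]


def failing_checks(latest):
    """Map of target -> newest value (None if no data) in, list of failing targets out."""
    failing = []
    for target, is_bad, threshold in CHECKS:
        value = latest.get(target)
        if value is None or is_bad(value, threshold):  # no data counts as failing
            failing.append(target)
    return failing
```

Feeding it the newest values per target (for instance as extracted from the render API's JSON output) yields an empty list when everything is healthy.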
[22:37:48] scfc_de: right now it is just instance -> diamond-collector.eqiad.wmflabs :)
[22:38:11] scfc_de: two potential future workflows is: $instance -> labmon1001, then labsproxy <-> labmon1001 (for http)
[22:38:17] scfc_de: other is to just serve from labmon1001
[22:38:31] scfc_de: icinga would just use monitor_graphite_threshold running on labmon1001
[22:39:22] Serving directly from labmon1001 seems easier, but you probably thought about that a lot more :-).
[22:40:24] scfc_de: mostly then we'll open that up to the wide internet, need to handle ssl there, etc.
[23:34:04] Wikimedia Labs / deployment-prep (beta): Unable to log in to beta labs on iOS devices - https://bugzilla.wikimedia.org/70145 (Maryana Pinchuk) NEW p:Unprio s:normal a:None I've observed this for a couple weeks now at various times on multiple iOS devices (iPhone 5 and iPad 2). It may be aff...