[00:46:58] Ryan_Lane: When you say all content, do you mean all content or that which was reviewed as useful (there was tagging going on at some point)... all could just get argh... though I suppose it's incentive to review, clean and nuke from orbit where needed [00:47:06] all contet [00:47:10] *content [00:48:07] * Damianz notes to prepare the nukes [00:49:30] I suppose it sort of makes sense, since the next step really is to export the html as static offsite again... or replicate the db, though that's probably a bad idea sec wise and arg effort... there does seem to be a lot of crap on wikitech though heh [00:51:40] !log disabling puppet on nova-precise2 to test some apache changes [00:51:41] disabling is not a valid project. [00:51:47] !log openstack disabling puppet on nova-precise2 to test some apache changes [00:51:48] Logged the message, Master [00:52:34] <^demon> Damianz: Heh, so actually I've been hitting an openjdk6 bug. Turns out it's fixed in openjdk7. But my fix works for all versions, so we're going with that :) [00:52:40] <^demon> Rather than leaving openjdk6 broken ;-) [00:53:01] heh [00:54:39] See, that's the stuff I love when jenkins is like 'HEY, you just made a regression' [01:04:57] what's wrong with this stupid rule: RewriteRule ^/ids/s1/(.*)$ /wiki/Special:OpenIDIdentifier/$1 [PT] [01:04:58] ? [01:08:45] RewriteRule ^/ids/s1/(.*)$ /w/index.php?title=Special:OpenIDIdentifier/$1 [PT] [01:08:54] there we go [01:20:36] Ryan_Lane: Apache [01:20:45] Damianz: :) [01:22:52] I love graphs.... the world needs more graphs [01:23:06] Sooo much data in such a simple format [02:01:19] I think this means we need to move to Icinga in labs now (yay).... 
I need incentive to puppetize that pile of crap properly (or just tweak the prod config really) [03:19:37] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted was modified, changed by タチコマ robot link https://www.mediawiki.org/w/index.php?diff=652561 edit summary: [+0] Robot: Fixing double redirect to [[Wikimedia Labs/Tool Labs/Needed Toolserver features]] [03:19:47] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by タチコマ robot link https://www.mediawiki.org/w/index.php?diff=652562 edit summary: [+0] Robot: Fixing double redirect to [[Wikimedia Labs/Tool Labs/Needed Toolserver features]] [03:19:53] Change on mediawiki a page Wikimedia Labs/Toolserver features needed in Tool Labs was modified, changed by MZMcBride link https://www.mediawiki.org/w/index.php?diff=652563 edit summary: [+0] fixed redirect [08:02:35] andrewbogott_afk: Why are you in the cvn project? [08:03:57] hmn.. is not really in the group, outdated on wiki page [08:05:15] !log cvn cvn-apache2 is getting too small for all the cvn apps (wasn't supposed to run them in the first place, just apache). Created instance cvn-app1 with more cpu/memory. Moving over application pool later today. [08:05:16] Logged the message, Master [08:09:20] Ryan_Lane: Looks like everytime I go and so something on labs something somewhere is broken preventing me from doing it [08:09:35] Ryan_Lane: I can't ssh into the instance I created yesterday (cvn-app1.pmpta.wmflabs) [08:09:47] presumably my home directory isn't there yet [08:10:16] hm.. They do boot after creatio [08:10:19] ergh [08:10:23] tomorrow! [08:25:49] Coren think you will have some sort of process distribution up by this time next week? :P [10:58:04] Hi! The creation of labs instances is not working at the moment. Did I overlook an e-mail?
[11:00:53] 08:09 < Krinkle> Ryan_Lane: Looks like everytime I go and so something on labs something somewhere is broken preventing me from doing it [11:27:22] [bz] (NEW - created by: silke.meyer, priority: Unprioritized - normal) [Bug 45483] Make instance creation failures more verbose - https://bugzilla.wikimedia.org/show_bug.cgi?id=45483 [11:50:05] Silke_WMDE elaborate [11:50:55] petan: I click to get an instance, I get "failed to create instance". I try again and again. I get the same. [11:51:05] :) [11:55:20] petan: oh. ok. Correcting: Instance creation fails when not using the default security group but a different one. [11:55:33] (that worked earlier) [13:00:51] re [13:01:41] I deleted and recreated my security group. Still can't create instances belonging to this group. m( [13:39:42] addshore: Define "process distribution"? [13:40:35] whatever you were planning before ;p [13:48:20] addshore: Heh. I don't know about "this time next week", but that's what I'm working on atm on the new project. [13:51:48] well this time next week i'm going to have on hell of allot of processses to run ;p [13:52:21] see bnr1 ;p http://ganglia.wmflabs.org/latest/?r=day&cs=&ce=&c=bots&h=bots-bnr1&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS [14:07:29] addshore: Well, I can arrange for you to be a test case if you want. It'd certainly be useful for me, but if there are problems it might delay your work while I fix things. [14:08:57] The work wont be urgent, but there will be allot of it :) and if things to go really badly I can keep using bnr1 :) [14:10:49] addshore: Allright then, make sure you can enumerate your dependencies as soon as you can and I'll set things up so that you can be a guinea pig. :-) Are you already familiar with starting jobs on a grid? [14:15:01] I have used to ts system :) [14:17:07] Oh, we'll soon be able to submit jobs? [14:18:12] Darkdadaah: Wait, wait, wait!
It's not even in a "preproduction" stage yet, and I don't want too many people to start using the new project before its design gels. :-) [14:18:36] Coren: I won't do anything unless you say it's alright to do it, don't worry. [14:21:51] And I don't plan to do anything serious on labs unless I'm sure it's stable enough. [14:28:16] Coren: whens dbrep? ;p [14:28:38] addshore: "When it's ready." :-) [14:29:07] addshore: Asher is hard at work on that project. [14:29:18] [= [16:10:59] * Damianz merges Coren into a cookie [16:11:25] Hmmm. Cookie. [16:11:50] !log cvn Installing php5-cli and mono-complete on cvn-app1 [16:11:52] Logged the message, Master [16:12:10] !log cvn Depooling cvn-apache1 from application [16:12:11] Logged the message, Master [16:47:22] !nagios [16:47:23] http://208.80.153.210/nagios3 http://nagios.wmflabs.org/nagios3 [16:47:46] !icinga is http://208.80.153.210/icinga http://icinga.wmflabs.org/ [16:47:46] Key was added [16:47:50] !nagios del [16:47:50] Successfully removed nagios [16:47:54] !nagios alias icinga [16:47:54] Created new alias for this key [16:47:58] !nagios [16:47:58] http://208.80.153.210/icinga http://icinga.wmflabs.org/ [16:49:23] is it worth of renaming the channel? :P I don't think so [16:50:48] Everyone knows what 'a Nagios' is. It'll take some time before it's Icinga. And MAN that is one ugly name. [16:54:25] !log cvn Add cvn-app1 to application pool. [16:54:26] Logged the message, Master [16:55:26] !log cvn Pool configuration pushed to github: https://github.com/countervandalism/stillalive/blob/master/localSettings-cvn.json [16:55:27] Logged the message, Master [17:13:20] Ryan_Lane: good morning [17:15:22] petan: icinga is fully built with frames, no useful urls [17:15:25] that's a step back [17:15:33] modern UI, uh [17:17:53] Hi! Waaaah! Puppet! [17:18:35] It stopped working... http://pastebin.com/Kx3eWR3C [17:18:55] Why is that? 
[17:22:17] !log deployment-prep foreachwikiindblist /home/wikipedia/common/all-labs.dblist update.php --quick --quiet [17:22:19] Logged the message, Master [17:33:36] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=cvn&style=detail [17:33:39] broken links on labsconsole [17:34:35] Looks like the redirect isn't fully in effect yet [17:34:42] or only set up for the root [17:34:58] !log deployment-prep updating mediawiki-config 8d1aac9..10bda3a [17:35:00] Logged the message, Master [18:01:06] +2, +2, my kingdom* for a +2 (* kingdom may or may not exist; offer void where prohibited by law, or in nations with republican political systems) [18:01:41] * hashar gives a +2 to Coren [18:04:15] hashar: No you didn't. :-) [18:04:25] hashar: No kingdom for you! [18:04:28] :-( [18:04:49] You don't want it anyway, it's Belgium. :-) [18:04:58] ahahah [18:05:20] I'm a beer lover, I'd gladly trade my empire for your kingdom! [18:07:14] Meh, Belgian beer... apart from Blanche de Bruges, I prefer the German ones. :-) [18:27:04] Coren: link? (If you're still seeking review) [18:27:56] https://gerrit.wikimedia.org/r/#/c/51051/ and https://gerrit.wikimedia.org/r/#/c/50913/ [18:30:01] Ah, sorry, I was confused by the 'abandon' message yesterday :) [18:30:27] Krinkle: did your issue with cvn-app1 solve itself? [18:31:40] Ryan_Lane: Yes, apparently it took about 12 hours for it to be reachable [18:31:48] hm [18:31:54] I didn't do anything, it just worked at some point. [18:32:12] I see the nagios RECOVER messages about 9-10 hours after I created it [18:32:13] andrewbogott: Heh. You were confused by the message, think how confused /I/ was to do it. :-P [18:32:17] hm [18:32:23] indicating that it started for the first time [18:32:27] I wonder if puppet had failed to run a number of times [18:32:51] those were however also the first messages from nagios for that host [18:33:10] I'm looking at the log [18:33:26] k [18:33:58] did you reboot it?
[18:34:05] oh [18:34:12] nevermind. ignore me [18:34:29] Ryan_Lane: Wait, are you saying that isn't SOP? :-) [18:34:38] heh [18:34:51] it should take at most maybe 10 minutes [18:35:02] Ryan_Lane: I might have tried to reboot it from labsconsole. I think I did it 3 hours after I created it and couldn't ssh in [18:35:05] first puppet run finished: Feb 27 08:40:59 [18:35:28] which is almost 3 hours after creation [18:36:03] it looks like puppet got itself into some state where it thought it was running: Feb 27 05:58:05 cvn-app1 puppet-agent[4363]: Run of Puppet configuration client already in progress; skipping [18:36:07] and did for hours [19:18:57] Krinkle if icinga is a step back why we use it on production? [19:19:14] btw nagios is using frames too [19:19:32] I have no doubt that icinga is an improvement (seems to be better maintained) [19:19:33] :o [19:19:52] I don't see difference other than that it looks different [19:20:08] petan: nagios also uses frames, but it managed the url properly from what I remember [19:20:20] the two aren't mutually exclusive (frames and good urls) [19:20:23] but nagios released a new version few days ago that mean it's not dead [19:20:42] I'm just a bystander. [19:21:04] I don't really care I just wanted to align with production [19:21:20] I don't believe we are going to switch back to nagios there :P [19:21:28] it seems like exactly the same to me [19:21:37] just a different colors [19:21:45] btw latest version of nagios has similar colors [19:21:57] The fork was caused by the dev mindshare moving away from Nagios. [19:22:03] so main difference between icinga is actually it has different logo [19:22:28] https://www.icinga.org/faq/why-a-fork/ [19:22:30] Coren why they moved away from nagios? [19:22:54] vs. https://www.icinga.org/nagios/ [19:22:59] one vs. 
the other [19:23:20] " the Nagios software itself- was maintained by a single developer in the United States and was hence developed at a slower pace" -- lies [19:23:50] the latest release of nagios has several people mentioned in changelog who submitted patches [19:24:18] why in the world they didn't join that "one developer" instead of forking whole project [19:24:35] petan: They tried, apparently. [19:26:51] hmm [19:27:15] ok I think we keep icinga on labs :P [19:27:27] petan: btw, not sure whether I don't get labsconsole or whether this is a room for improvement, but why can't I go to manage an instance from https://labsconsole.wikimedia.org/wiki/Nova_Resource:Resourceloader2 or https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000001d7 [19:27:55] I have to go to https://labsconsole.wikimedia.org/wiki/Special:NovaInstance [19:28:04] e.g to reboot an instance [19:28:16] mhm... indeed [19:28:17] I should be ablt to do that from the project page or instance page, right ? [19:28:20] we should create a bug for that [19:28:31] I keep having to load that slow page, find the instance again and go on. [19:28:39] (since I'm in a lot of projects) [19:28:41] yes that is annoying [19:28:46] should be simple to fix [19:28:51] k [19:29:19] petan: e.g. 
the "Actions" column of Special:NovaInstance [19:29:42] perhaps add that to the NovaResource pages (either on project lever or instance level) [19:29:46] * Krinkle files a bug [19:30:19] I am wondering if this would be possible with some template changes, but I guess we need to change the extension :/ [19:30:36] that would be likely simpler [19:36:12] petan: Basically the "Instances for this project" section on the nova resource project page should be the same as the table for that project on Special:NovaInstance [19:36:23] It is already almost the same, only the Action column is missing [19:36:30] yes [19:36:37] https://bugzilla.wikimedia.org/show_bug.cgi?id=45513 [19:36:47] cool [19:37:35] petan: How do I unpause a vm that Ryan turned off for phpmyadmin (I want to delete phpmyadmin and get the vm back up) [19:37:39] "Failed to reboot instance." [19:37:47] https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&showmsg=setfilter / resourceloader2 [19:37:53] state= paused [19:38:02] I think it's possible only using the console [19:38:09] so ping paravoid or andrewbogott [19:38:12] :P [19:38:14] the vm is off, how I can I ssh into it? [19:38:19] paravoid: andrewbogott: :) [19:38:23] I mean console of nova [19:38:28] not ur vm [19:38:28] right [19:38:41] I don't have such access [19:38:41] In other words, *I* can't. [19:38:45] I think [19:38:46] okay [19:38:48] not sure [19:39:47] hi [19:40:03] Krinkle: Ryan_Lane may disagree, but I believe those VMs are presumed to have already been hacked. So it might be that we'll want to just mount their drives and let you salvage w/out actually restarting them. [19:40:54] Krinkle: Catching staff on IRC today will be tricky… I'd advise you to respond to the labs-l email and start a discussion about this. [19:41:28] "Failed to delete instance." [19:42:09] andrewbogott: can you just delete it? 
[19:43:05] the VM is compromised, needs to be deleted/reinstalled [19:43:16] there is no data to preserve, it can be deleted. I'm creating a new one as I speak. [19:43:32] I just want nagios to shut up about it being donw. [19:43:34] down* [19:43:46] as it pings "resourceloader2" (which I want) [19:44:22] and also to get the public IP back so I can assign it to the new instance [19:44:36] Freeing the IP should be simple. But yeah, I can delete it. [19:44:49] Krinkle, this is the 'resourceloader2' instance we're talking about? [19:44:53] I've associated the IP to the other, that was easy :) [19:45:05] andrewbogott: resourceloader2-apache.pmtpa.wmflabs to be specific, yes [19:45:15] I've just created rl2-apache2 [19:46:47] Speaking of which, I'll need a public IP and DNS record and I don't seem to be allowed to allocate new IPS [19:47:23] Coren: IIRC there is a quota in openstack [19:47:36] Coren: a project having a default quota of Zero public address :-D [19:47:59] also one could use the .instance-proxy.wmflabs.org trick [19:48:02] hashar: I don't think I have any openstack rights atm. In fact, I'm pretty sure I don't have any. :-) [19:48:12] hashar: This is meant for something a little more permanent. [19:48:28] hashar: And which needs to be controllable from within the project. [19:48:42] andrewbogott: btw, I don't see where I can assign a security group to a vm [19:48:56] Krinkle: You can only do it during VM creation [19:49:00] I just created a new vm replacing the one you're deleting, created a "web" policy allowing port 80, how do I give.. [19:49:06] Coren, project name? [19:49:08] andrewbogott: And it can't be changed? [19:49:13] andrewbogott: tools [19:49:17] Krinkle: Right, must be set on creation [19:49:36] andrewbogott: why? Feature not implemented yet, or restriction by nature? [19:50:16] andrewbogott: Danke. [19:50:55] andrewbogott: okay, I'll create a new one [19:51:06] andrewbogott: btw, do I give it 'web' or 'default' and 'web' ? 
[19:52:44] brb after dinner [19:55:42] Bah, sorry, network problems [19:55:54] Krinkle, I will try to clean up that instance later in the day -- too much happening just now :) [19:56:30] what do we need gridengine for? [20:09:41] paravoid: toolserver uses sun grid engine now… Coren is setting up grid engine in labs to make a smooth transition for toolserver users [20:10:16] paravoid: There are also reliability bonuses for long-running and continual processes (moving them vm-to-vm for free) [20:11:28] paravoid: Plus I already got users who will benefit from the scheduling; jobs that take a half-bazillion little job runs to complete can now be parallelized without crushing the process tables. :-) [20:17:04] Coren: I take it it won't be Sun GE, though, right? [20:17:37] would it be to make the syntax (qsub, cronsub etc.) compatible, or would it also actually be distributed over several "tools" project vms. [20:17:53] Krinkle: You're correct. The objective the Open Grid Scheduler (a fork), but right now Ubintu includes its immediate predecessor (which is also open source) [20:17:57] I suppose we'd want something like that anyway, might as well be that. [20:18:06] cool [20:18:48] And yes, command-compatible is an objective, though I'll also provide wrappers with reasonable defaults for those who have no need for the flexibility (most users, I'd expect) [20:18:55] Coren: btw, do you know what the status is on the proposal to merge "tools" and "bots"? [20:19:14] Krinkle: It's 'webtools' and 'bots'. The result is extant: 'tools' :-) [20:19:21] Great [20:20:15] Krinkle: Working on the setup now. I've already got a few requests from users to be on the bleeding edge and start trying it. [20:20:39] Coren: I'm not sure whether it is part of SGE, but toolserver had a command "qsub" which would take a bash command as argument (variadic arguments, like nohup for example). 
And it would execute it on one of the available hosts asynchronously (no other options needed) [20:21:00] it does take more options, but it works out of the box. [20:21:03] Krinkle: Yep, that's standard. [20:21:06] great [20:21:31] Krinkle: The wrappers are mostly for the 'run-this-thing-continuously' variety (bots and such) which want restart support and such. [20:21:38] (e.g. for memory, number of expected sql connections to wmf-replication, duration in seconds or infinite) [20:21:45] http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Design [20:21:47] ah, yeah, that'd be nice [21:05:54] Ryan_Lane: I don't see /etc/ssh/ssh_config under puppet control? [21:09:34] Coren: it's not [21:09:43] kk. Just checking [21:09:44] sshd_config is [21:13:48] * petan pokes [21:17:09] !log resourceloader2 Installing apache/mysql/php on gadgets-apache3 [21:17:10] Logged the message, Master [21:18:56] I'm getting a captcha for my edit on labsconsole that includes a link [21:20:05] and the editor is broken, It is ignoring what I entered and inserting an empty "{{Nova Project Documentation}}" template [21:22:31] Ryan_Lane: petan: https://bugzilla.wikimedia.org/show_bug.cgi?id=45519 [21:22:36] [bz] (NEW - created by: Krinkle, priority: Unprioritized - major) [Bug 45519] [Regression] Editing "Documentation" page for labs project not working - https://bugzilla.wikimedia.org/show_bug.cgi?id=45519 [21:22:54] the editor is broken? [21:23:06] arrr. I guess I'm just out of luck, but seriously, not once have I done something on labs without filing a bug or walking to ryan for fixing. [21:23:22] Of course while I'm not here, everything works :) [21:23:26] heh [21:23:47] ah [21:23:55] this is the editor when using semantic forms [21:24:10] captcha is just like on wikimedia projects [21:24:11] I tried editing raw, but I guess hijacks it somehow from a hook [21:24:24] bleh.
fucking semantic forms [21:24:27] let me update it [21:26:45] Ryan_Lane: It's not just the initial edit [21:26:46] https://labsconsole.wikimedia.org/w/index.php?title=Nova_Resource%3AResourceloader2%2FDocumentation&diff=12195&oldid=12194 [21:26:50] it'll actually clear the page [21:26:54] -_- [21:27:30] Krinkle: it looks like it's working to me now [21:27:45] Ryan_Lane: https://labsconsole.wikimedia.org/w/index.php?title=Nova_Resource%3AResourceloader2%2FDocumentation&diff=12194&oldid=12193 [21:27:48] I was able to add text too [21:27:54] the problem is when the captcha comes in [21:28:00] which is only for actua links [21:28:04] let me just disable the captcha [21:29:11] Krinkle: now try [21:29:17] no more captcha for external urls [21:29:49] that fixed it [21:30:09] Ryan_Lane: No, when I press submit now, instead of getting a captcha, I'm still at the formedit page [21:30:22] as if I didn't do anything [21:30:27] what are you typing into each box? [21:30:34] it's working perfectly fine for me [21:30:45] Ryan_Lane: http://cl.ly/image/0j0f0w00293l [21:30:54] I press submit there, I get back to exactly the same page [21:31:04] edit not saed [21:31:05] saved* [21:31:40] ugh. I see [21:31:42] Okay, now it saved it [21:31:47] oh [21:31:52] I think this is a race condition [21:32:00] and edit conflicts [21:32:16] right [21:32:40] the captcha issue is related [21:32:49] Ryan_Lane: Do Nova_Resource pages have some special caching that prevents editing /Documentation from it being purged? [21:33:00] I always purge it manually [21:33:07] well, it runs via cron [21:33:13] runJobs, that is [21:33:22] Right [21:34:19] it runs every minute, but that's pretty slow still [21:34:59] k [21:35:02] Krinkle: it sucks you keep hitting bugs [21:35:12] Yeah, well at least we're hitting them. [21:35:30] well, you're actually reporting them, so I can fix them ;) [21:35:38] nod [21:35:57] I should eliminate Semantic Forms from the documentation [21:39:36] Ryan_Lane: Ze ssh banner. 
She is evile! [21:39:42] why? [21:41:12] Ryan_Lane: The only way to suppress it (which is needed for many uninteractive uses) is to up LogLevel to ERROR or QUIET, which suppresses other important messages. [21:42:17] (I can cope in this particular case. I just felt like it was important to whine about it) [21:42:44] which do you prefer? lots of support requests, or having to change loglevel in some places? :) [21:43:29] Ryan_Lane: How about keeping it on for bastion-like boxen but allow turning it off for the rest? :-) [21:43:42] that's doable, yes [21:43:50] if you push that into puppet I'll approve it [21:44:08] Like, 'tools-login' I'd keep it on. 'tools-exec-01' not so much. :-) [21:44:54] Do we have a class for 'this box is meant to be used interactively'? [21:47:42] it can be a variable that's set [21:47:48] then we'd set it explicitly on the bastions [21:48:02] Hm. ssh_access_banner [21:59:52] <^demon> Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/51280/ is ready for the new deploy when you get a second. [22:10:16] Hi, just for my local work, if I have two pending, differently based gerrits https://gerrit.wikimedia.org/r/#/c/50604/ and https://gerrit.wikimedia.org/r/#/c/51278/ ... [22:10:36] ...what is the best way to merge them locally ? [22:13:49] Wikinaut: probably best to ask in #mediawiki [22:27:05] Ryan_Lane is there a reason why that wiki admin group can't change MediaWiki space pages? [22:27:31] someone wanted to fix sidebar few days ago and it wasn't possible with this set :/ [22:27:53] you're not a wiki admin [22:28:05] I mean that second group [22:28:06] and yes, because that would allow you to change css/js [22:28:20] hm... and how is that bad? 
[22:28:42] because that would allow you to own every admin [22:28:49] given that I am already global interface bit holder, I can change css js on any wikimedia project :o [22:29:06] mhm [22:29:08] right [22:29:11] it would let you delete every instance, or change people's ssh keys [22:29:12] etc [22:29:28] including folks that require two factor auth [22:29:35] mhm aha [22:30:16] there would be quite evidence if I attempted to do anything like that, but fair [22:30:39] yeah, I'm sure the other ops folks would be opposed to allowing it