[01:56:14] Ryan_Lane: ping? [01:57:21] sup? [01:57:22] so. [01:57:32] I just found something that's going to save me days of work :) [01:57:36] virt5 is emptied [01:57:38] I don't *need* to do the dns stuff right now [01:57:40] awesome [02:04:13] paravoid: so, what's up with the c_rehash revert? did you look at all yet? [02:04:25] I haven't yet no [02:04:34] * jeremyb could look a little but isn't really in a position to test i think? that needs a prod node? [02:04:42] I have several people waiting on me for reviews plus the VM migrations [02:04:55] (cause it only broke prod?) [02:05:08] probably tomorrow, right now I feel like just doing dumb work [02:05:16] (at this time...) [02:05:34] heh [02:06:02] * jeremyb pastes for reference: [02:06:03] 01 16:40:42 <@mark> (File[/etc/ssl/certs/star.wikibooks.org.pem] => Exec[c_rehash] => Class[Certificates::Base] => Install_certificate[star.wikibooks.org] => File[/etc/ssl/certs/star.wikibooks.org.pem]) [02:06:25] * jeremyb digs a little [02:06:47] Ryan_Lane: simple revision for you https://gerrit.wikimedia.org/r/#/c/17376/ [02:06:54] Fixes the annoying tabindex order on login at labs ;) [02:07:28] can't we just fix our auth code so that we don't have to do this in hacky ways? :) [02:07:49] this is a 30 second fix... :D [02:08:08] can't we just not offer a choice if there's no choice to make? [02:08:09] ;) [02:08:25] Reedy: well, yes, but it breaks things for third parties [02:08:25] but yeah, having to manually set this based on randomness is daft [02:08:33] not that we have any [02:08:34] but still [02:08:35] :) [02:08:54] merged [02:08:55] let's deploy [02:10:03] wow. I almost did something really really bad [02:10:18] Ryan_Lane: permission to reboot all of your VMs? incl. the ones in the bots project? [02:10:24] paravoid: go for it [02:11:05] Reedy: deployed [02:11:05] worked [02:11:17] Yay [02:11:25] I almost did a git pull in the parent directory [02:11:27] I was going to log a bug, but it would've taken longer to fill out the form [02:11:36] I ctrl-c'd out of it quickly enough :) [02:12:10] paravoid: what are you still doing awake? :) [02:12:17] heh [02:12:28] Not something you want to be doing when you don't want to have to fix it now [02:12:31] zzz have VMs zzz migrate [02:13:00] :D [02:13:18] go to sleep! heh [02:14:52] Reedy and me have just an hour time difference you know [02:14:53] hooray, we can stick anything we want into the metadata with the nova api! [02:15:05] and update it [02:15:17] yeah, I knew that [02:15:22] Aren't you +2 on us? [02:15:25] It's 03:15 here [02:15:31] ah, right [02:15:33] damn [02:15:59] nova api doesn't list the security groups that an instance is a member of, though :( [02:16:06] so I'm adding it to the metadata for now [02:17:57] I'm pondering whether I should just leave a script that nova-manage vm list | grep -v virt[6-8] | cold-migrate and go to sleep :P [02:18:34] might want to bring virt5 into the mix first [02:18:40] by removing the gluster mount [02:19:11] ? [02:19:17] virt5 is a cisco box [02:19:37] that's why I was saying we should evacuate it first :) [02:19:38] yeah [02:20:06] oh so you mean remove the gluster mount and start populating VMs on it [02:20:10] yeah [02:20:27] yeah, should better do that sooner than later if I want to keep the cluster balanced [02:20:30] also, right now, if someone creates an instance, it'll end up on virt5 [02:20:37] heh, right [02:20:45] can we drain virt1-5? [02:20:53] i.e. tell openstack to never allocate vms there? 
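
An aside on the "drain virt1-5" question just asked: one way to tell the scheduler to stop placing new VMs on a host is to mark its nova-compute service as disabled, then move the remaining instances off with the one-liner quoted earlier. This is only a sketch — the exact nova-manage syntax varies by release, and in the conversation that follows the simpler route wins (empty the hosts first, then just stop nova-compute).

# Possible drain procedure, not the one actually used here.
# Mark the old compute nodes as disabled so the scheduler skips them
# (check `nova-manage service --help` for the flag syntax on this release).
for host in virt1 virt2 virt3 virt4 virt5; do
    nova-manage service disable --host=$host --service=nova-compute
done

# Then cold-migrate whatever is still running on them, as discussed above.
nova-manage vm list | grep -v 'virt[6-8]' > /tmp/nova-vm-list
# ...feed the instance numbers from /tmp/nova-vm-list to cold-migrate...
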
[02:20:53] I'm fine with that [02:20:58] ah [02:21:07] well, when they are empty we can just disable nova-compute [02:21:13] I mean now [02:21:17] hm [02:21:23] I can't think of an easy way [02:21:39] :( [02:21:41] it's likely possible, but I'd need to find the way how [02:22:15] so, what's the process of removing the gluster mount? [02:22:26] I really don't want to read gluster docs right now :) [02:22:32] umount [02:22:47] and remove from fstab [02:22:51] eh? just that? [02:22:55] yeah [02:22:57] it's just a mount :) [02:23:07] it's not a replica? [02:23:10] nope [02:23:19] cool [02:23:21] virt1-4 are [02:23:30] and even on those, the mount isn't needed for the replica [02:23:51] gluster is easy. it's too bad it's so buggy [02:23:55] well yes, but if it had a replica I'd like to remove that too [02:23:59] * Ryan_Lane nods [02:24:09] we should wait until all nodes are done before doing so [02:24:24] otherwise we need to shrink the volume [02:25:35] and it needs to be done in pairs [02:25:37] umounted [02:25:55] sweet [02:26:11] we need to mount the local disk on /var/lib/nova/instances, now [02:26:33] I know I know [02:26:38] heh [02:26:40] I was just surprised umount worked :P [02:26:41] * Ryan_Lane nods [02:26:44] :D [02:28:20] gah, it's different than the others [02:28:26] :( [02:28:31] maybe we should rebuild it then [02:28:40] for now we can disable nova-compute [02:28:50] way ahead of you [02:28:51] ah [02:28:52] crap [02:28:54] I did that first [02:28:56] we can't rebuild it [02:28:58] heh [02:29:02] we can't? [02:29:03] why not? [02:29:06] ppa [02:29:10] fuuuuck [02:29:13] \o/ [02:29:17] never again [02:29:22] never again will I use a ppa [02:29:41] it's not too bad [02:29:55] * Ryan_Lane goes to a conference room with a whiteboard to write that 200 times [02:30:12] it just has a /dev/md1 for swap, which makes instance storage on /dev/md2 instead of /dev/md1 like the others [02:30:22] ah [02:30:23] yeah [02:30:24] otoh, we can live with three nodes for now and add it back when we upgrade [02:30:29] indeed [02:30:54] three nodes is way more than enough for now :) [02:31:09] just half a terrabyte of ram [02:31:48] yeah [02:31:55] more ram than the other hosts combined [02:33:56] Mem: 48386 48165 220 0 5 17915 [02:33:59] -/+ buffers/cache: 30244 18142 [02:34:01] Swap: 50159 11376 38783 [02:34:02] no wonder people were complaining [02:34:06] yep [02:36:18] cloudcracker.com is awesome [02:37:21] even has a rest api [02:38:43] ok. I'm off [02:38:45] * Ryan_Lane waves [02:38:59] go to sleep already. heh [07:39:11] hello [07:43:09] hi [07:43:17] !ping [07:43:17] pong [07:43:25] !log [07:43:30] :/ [07:44:56] ;-( [07:46:04] grmblbl apparently Gluster hasn't been updated :/ [07:54:11] petan: is there anything of interest on deployment-wmsearch or can we just get rid of it ? [07:54:46] hashar we really want to have a woring search there [07:55:05] sec [07:55:13] I must agree, but if we do so I guess we will want to start out using a fresh instance [07:55:19] + using Ubuntu Precise :-) [07:56:10] -bash: /usr/bin/groups: cannot execute binary file [07:56:10] Connection to deployment-wmsearch.pmtpa.wmflabs closed. [07:56:15] I guess it got corrupted [07:57:13] ok [07:57:21] !delete it [07:57:44] I should implement control interface for nova in this bot [07:57:54] so we can manage instances using !start !stop !delete !create [07:57:58] :D [07:58:00] !log deployment-prep {{bug|38748}} deleting unused/corrupted deployment-wmsearch instance. 
(had stuff like: -bash: /usr/bin/groups: cannot execute binary file. Connection to deployment-wmsearch.pmtpa.wmflabs closed.) [07:58:02] Logged the message, Master. [07:59:21] petan: Ryan wants to implements a MediaWiki API interface [07:59:37] this would need something more [07:59:39] petan: so he could start using more AJAX in the interface [07:59:46] you would need to auth using bot [07:59:57] but it would be cool [08:00:06] well the bot could get an auth token or a specific account [08:01:05] we could even have the bot act in the name of the user triggering the action [08:01:27] by using the irc cloak as an auth [08:04:50] heh [08:04:53] true [08:05:01] you could just set a cloak in your preferences in console [08:05:35] it would also log it to SAL by design [08:05:56] I should make a bug for that before I forget [08:07:02] we will make this channel an interactive shell :D [08:07:09] multiplayer bash [08:07:11] :P [08:07:22] I was actually thinking of a bot like that [08:07:39] it would be connected using ssh and it would send the output to channel and take input of users as commands to terminal [08:07:53] so that you could share the shell using irc [08:18:15] petan: #jenkins as a nice setup [08:18:33] that let them do most administrative tasks via the channel [08:18:40] for example forking a repo on github [08:18:47] triggering a jenkins build [08:18:51] or even a release :-) [08:19:00] and also interact with bugs [08:27:17] :) [08:27:45] irc is a nice terminal for multiple users :P [08:27:52] you just need to extend it a bit [08:51:04] Change on 12mediawiki a page Developer access was modified, changed by Kozuch link https://www.mediawiki.org/w/index.php?diff=568227 edit summary: [12:45:44] 08/02/2012 - 12:45:44 - Creating a home directory for rotsee at /export/keys/rotsee [12:46:46] 08/02/2012 - 12:46:46 - Updating keys for rotsee at /export/keys/rotsee [14:22:58] Change on 12mediawiki a page Developer access was modified, changed by Matma Rex link https://www.mediawiki.org/w/index.php?diff=568344 edit summary: [15:00:52] andrewbogott: ping? [15:03:20] howdy [15:04:56] hashar why my SUL on labs doesn't work anymore [15:05:48] petan|wk: I have no idea [15:12:12] petan|wk: try investigating it a bit more :-D [15:12:25] hashar how do I generate new one [15:12:30] I need my account to work [15:12:31] like are the pictures showing ? where do they link to ? what the link error output is and so on [15:12:44] error is wrong username or password [15:12:54] Login error [15:12:55] Incorrect password entered. Please try again. [15:12:56] that's i [15:13:30] I will need to change it in sql [15:13:30] SUL seems insane, not sure why we can't use oauth and stick to standards [15:13:42] what encryption it uese [15:13:44] uses [15:13:52] paravoid: hi. Did you get any inspiration for -dbdump / syslog-ng / rsyslog madness ? :-] [15:13:52] well [15:13:56] I have an idea [15:14:58] !log deployment-prep yeah we lost udp2log again! -dbdump : /etc/init.d/udp2log-mw restart [15:14:59] Logged the message, Master. [15:16:16] hashar: sec, looking at the outage [15:16:24] oh outage :( [15:16:27] paravoid: I'm out for breakfast at the moment; I'm going to head home and will catch up with you again in ~15 minutes. Hopefully the nagios smoke will have cleared by then… [15:16:36] andrewbogott_afk: yeah :) [15:16:40] I'll be here [15:17:43] hashar why gu_password is null for all users? 
[15:17:48] in centralauth [15:17:53] I have no idea what gu_password is [15:18:04] it's in global user table [15:18:07] where are passwords? [15:18:11] I need to reset mine [15:18:28] you could use eval.php maybe [15:18:40] what should I eval [15:18:53] mwscript eval.php --wiki=labswiki (can't remember the central auth wiki, but I guess it is labswiki) [15:18:54] $wgPasswordtableOfPetrb [15:18:57] :P [15:19:01] then input some mediawiki magic such as: [15:19:03] it's centralauth db [15:19:11] $u = User::newFromName( 'Petrb' ); [15:19:12] oh [15:19:18] $u->setPassword( 'foosecret' ); [15:19:18] you mean it's not in that db [15:19:19] ok [15:19:21] sec [15:19:22] $u->saveSettings(); [15:19:34] need to look at includes/User.php for the exact method names [15:20:53] wtf [15:21:03] there are no passwords in db :| [15:22:18] maybe that is not used :-) [15:22:21] I can login fine [15:22:23] and SUL work [15:22:32] (logged on commons beta and got logged on enwiki beta) [15:22:46] got it [15:24:09] wtf [15:24:12] it still doesn't work [15:24:21] I changed the password by hand... [15:24:22] pff [15:26:41] hashar can you give sysop to Petrb2 on enwiki [15:26:45] I need to try something [15:26:56] teh main acc doesn't work [15:27:01] I don't know why [15:27:11] I overwrote the gu_password by hand and it's still same [15:27:35] not a bureaucrat there :-D [15:27:40] you are steward [15:27:42] I am sure [15:27:45] I gave u steward [15:28:00] just go to meta wiki and do Petrb@enwiki] [15:28:03] just go to meta wiki and do Petrb@enwiki [15:28:12] it will let you change the rights there [15:29:13] pohhhhh [15:29:14] on http://deployment.wikimedia.beta.wmflabs.org/ [15:29:38] You do not have permission to edit user rights on other wikis. :-D [15:29:39] meta.wikimedia [15:29:45] o.o [15:29:50] we had to screw things [15:30:04] I don't have the user rights on meta.wikimedia.beta.wmflabs.org [15:30:05] or some steward did [15:30:14] you have global user rights [15:30:21] or you had before stewards put their nose in [15:30:28] ok I will do it using sql... [15:30:36] sorry ;D [15:41:41] paravoid: OK, back. Two questions: [15:41:53] 1) Are you trying to warn people before you migrate their instances? [15:42:00] 2) How should we divide up the pool? [15:42:33] I'm trying, although it's not required [15:42:43] if I see them online, I give them a heads-up [15:42:52] also, I'm thinking of special handling bots [15:43:27] as for the pool, we're on separate timezones, so we can work on it on separate times I guess :) [15:43:48] I did something like 40 yesterday, we have 115 left [15:43:51] it won't take that long [15:44:30] I'd start with nova-manage vm list |grep laner :) [15:44:47] That's easy enough… just let me know when you quit for the day and I'll take over. [15:44:55] feel free to start [15:44:57] I have a huuuuge backlog [15:45:00] ok :) [15:45:11] and I block other people, like hashar and j^ [15:45:29] regarding bots… special handling how? Just making sure the instance owners are on call during migration? [15:45:50] so far my plan was "do everything else, and leave them last" :) [15:45:54] :) [15:45:55] but yeah, that sounds sensible [15:45:58] ping their owners [15:46:03] also bastion [15:46:14] Does your script mostly succeed without comment, or are there frequent special cases? [15:46:19] since killing bastion would be bad(tm) [15:46:23] Oh yeah, bastion, yikes! 
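
For reference, the eval.php recipe hashar dictated above, gathered into one runnable invocation — eval.php evaluates each line of PHP it reads from stdin, so a heredoc works. The wiki name and password are placeholders, and this only resets the local password row; on a CentralAuth wiki the global credential (gu_password) is handled separately, which may be why editing it by hand did not behave as expected.

# One-shot local password reset via MediaWiki's eval.php
# (wiki name and new password below are placeholders).
mwscript eval.php --wiki=labswiki <<'EOF'
$u = User::newFromName( 'Petrb' );
$u->setPassword( 'new-secret-here' );
$u->saveSettings();
EOF
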
[15:46:37] it has no error handling /at all/ [15:46:48] (ryan wrote it mostly) [15:47:06] be careful, the first argument is 0000NNN, without the i- or the instance- [15:47:47] hashar I fixed ur groups too [15:47:55] so that you should be able to use meta to change right everywhere [15:48:23] and it doesn't even check if you gave a second argument [15:48:27] petan|wk: thanks [15:48:32] (and fails miserably) [15:49:51] paravoid: OK, and to make sure I'm totally clear on what's happening… the grep -v virt[6-8] is because virt6,7,8 are the new servers and virt1,2,3,4,5 the old ones. [15:50:06] yep [15:50:13] we're migrating from virt1-5 to virt6-8 [15:50:18] But all in pmtpa? [15:50:24] virt5 is already empty because we want to fill that up [15:50:25] yes [15:50:31] virt5-8 are Ciscos [15:50:43] 'k. [15:50:43] virt1-4 are smallish servers [15:50:48] Let's see what I can break! [15:51:08] virt1-4 has 48G of RAM, virt5-8 have 192GB [15:52:46] Hm… I fear that Ryan is actively coding on one or more of these instances. [15:52:57] ryan gave a go-ahead for all of his VMsw [15:53:04] I asked him yesterday [15:53:08] ok [15:54:24] paravoid: How are you picking $TOHOST? Round-robin? [15:54:40] yes [15:55:40] Doing any checking to make sure the instances survive the operation? [16:01:00] not really :) [16:01:02] "Failed to create instance as the host could not be added to LDAP." [16:01:10] what's wrong? [16:01:24] ldap works... [16:02:27] paravoid: And we're expecting labconsole to display the old (now wrong) hostname, right? [16:02:50] andrewbogott: but ldap will be correct? [16:03:27] jeremyb: Hm, no idea. [16:03:30] andrewbogott: yes, Ryan was telling me that you're involved into something that will fix this or soemthing? :) [16:03:45] jeremyb: I don't think LDAP keeps instance<->node relationships [16:03:52] paravoid: Yeah, although it will work best in folsom :( [16:05:58] paravoid: huh, i guess so [16:06:10] folsom sounds like distant future? [16:08:42] yeah [16:08:47] we're about to upgrade to essex [16:08:59] although I surely hope the essex->folsom transition will be easier [16:09:17] it seems so, considering we'll be using keystone and the openstack api [16:09:33] but who knows, maybe we'll other kind of transitions then, like the networking stuff [16:18:28] andrewbogott: can you display as a semantic ask result whether or not someone is a sysadmin/netadmin for a given project ? [16:18:49] andrewbogott: btw, nova-manage vm list is remarkably slow (there's a launchpad bug about it), so I usually do nova-manage vm list > nova-is-slow [16:18:53] and then grep through that [16:19:13] https://labsconsole.wikimedia.org/w/index.php?title=Special:Ask&q=%5B%5BResource+Type%3A%3Aproject%5D%5D%5B%5Bmember%3A%3AUser%3AAndrew+Bogott%5D%5D&p=format%3Dbroadtable%2Fheaders%3Dshow%2Fmainlabel%3D-2D%2Flink%3Dall%2Fsearchlabel%3Dprojects%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3F%0A%3FMember%0A%3FDescription%0A&eq=no [16:19:55] paravoid: btw, do you subscribe to all wnpp? [16:20:03] jeremyb: of course not, are you crazy? :) [16:20:08] I just read debian-devel [16:20:17] hah, ok, ok ;) [16:21:53] jeremyb: Should be possible but I'm not familiar with semantic search at all… [16:23:32] andrewbogott: well if you edit the project resource page it lists the users but not the type of relationship. 
(only that they are a member) [16:24:52] jeremyb: This page has roles… https://labsconsole.wikimedia.org/wiki/Special:NovaProject [16:25:19] andrewbogott: i know but that doesn't allow viewing roles for projects you're not a member of [16:25:25] But I suppose only for… yeah, what you said. [16:26:11] basically i wanted to see what [[special:novaproject]] would look like if i were user X [16:26:26] we used to have no filter at all and you could see all projects [16:30:29] paravoid: I get an instance lacking a DNS entry in pmtpa.wmflabs domain. Should I assign the bug to you ? :) https://bugzilla.wikimedia.org/show_bug.cgi?id=38846 [16:31:18] jeremyb: I agree that it's probably not useful to keep project roles secret from non-members. Probably worth logging a bug about it. [16:31:45] And, actually, I don't know if those roles are literally secret or just filtered to save space… that 'list projects' page is kind of a real-estate disaster in any case. needs work. [16:32:37] jeremyb: Sorry I'm not being very helpful :( [16:32:41] andrewbogott: it's nothing to do with space. it's all about perf [16:32:59] but in effect it then made some info secret [16:33:02] ish [16:33:49] andrewbogott: anyway, separate issue: that data should be exposed to semantic queries i think [16:34:05] I agree. [16:34:05] hashar: sigh. sure, although I'd have to ask Ryan [16:34:09] but I can be your tier-1 [16:34:14] paravoid: assigned :-] [16:34:39] paravoid: the instance creation just failed to create the DNS entry :-D I would fix it myself if I had the proper credentials :-] [16:38:11] paravoid: also apparently Gluster has not been updated on labs :/ [16:38:27] need to poke Ryan about it, he had a slot last night [16:38:40] yeah, he didn't keep me in the loop for that [16:39:12] Wednesday, August 1, 22:00-23:00 UTC (3pm-4pm PDT): Labs GlusterFS project storage upgrade [Ryan Lane] [16:39:38] maybe he faced a conflict with something else [16:39:55] * jeremyb wonders what this says: Οὔτε τι τῶν ἀνθρωπίνων ἄξιον ὂν μεγάλης σπουδῆς [16:40:05] it' [16:40:08] it's ancient greek [16:40:26] * jeremyb pulled it out of a chinese guy's mail footer [16:40:41] it's Plato [16:40:50] according to Google [16:41:04] "No human thing is of serious importance." [16:41:05] oh, google! of course i should have asked google [16:45:48] petan: d'you mind if nova-dev3 shuts down for a bit? [16:56:09] that's np [16:56:15] andrewbogott ok [16:58:35] petan|wk: I may actually just do all your non-bots instances in a lump, if that's OK with you. Will it ruin anyone's day to have the deployment-prep VMs reboot? [16:59:27] Just don't break them [16:59:28] :) [17:27:37] paravoid: Is it normal for migrations of big instances to take minutes rather than seconds? This last one is taking orders of magnitude longer than any of the earlier ones. [17:27:59] (And I can't immediately think of a good way to measure progress) [17:28:00] andrewbogott hold on [17:28:24] andrewbogott which instances we talk about now and what are you going to do with them? [17:29:00] nova-dev3 is an instance or not? are you talking about the nova compute node? [17:29:12] petan|wk: Right now I'm doing 000000d0. I'm just migrating them to faster hardware; it will just cause a few minute outage. [17:29:20] aah [17:29:25] nova-dev3 is a VM that claims to have been created by you. 
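
Pulling together the warnings about the migration script (no error handling, wants the bare 0000NNN number without the i- or instance- prefix, and nova-manage vm list is slow enough to be worth caching), a thin wrapper along these lines is the sort of thing one might put around it. cold-migrate is an internal script, so its interface — bare instance number followed by a target host — is assumed here from the conversation only.

#!/bin/bash
# Hypothetical wrapper around the internal cold-migrate script,
# based only on the interface described above.
set -e

if [ $# -ne 2 ]; then
    echo "usage: $0 <instance-id> <target-host>" >&2
    exit 1
fi

# Accept i-0000NNN or instance-0000NNN and strip the prefix,
# since cold-migrate wants only the bare number.
id="${1#i-}"
id="${id#instance-}"
target="$2"

# nova-manage vm list is remarkably slow, so cache it once and grep the cache.
cache=/tmp/nova-vm-list
[ -s "$cache" ] || nova-manage vm list > "$cache"

grep -q "$id" "$cache" || { echo "no such instance: $id" >&2; exit 1; }

cold-migrate "$id" "$target"
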
[17:29:36] hm, we should probably announce this in a mail next time [17:29:46] so that the people who run their bots know [17:30:00] I can tell you it's ok for me, but I don't know about the rest of people [17:30:21] I do not operate their bots, I don't even know how to bring them back [17:30:28] petan|wk: We're going to do bots last, and do some extra dilligence before migrating them, since they're the most likely to suffer from outages. [17:30:32] we really need to have a documentation base [17:30:39] ok [17:30:42] That is, 'instances in the bots project' [17:31:00] so are you going 000000000++ or randomly? [17:31:12] you said 00000d0 is now [17:31:47] bots-1 is 00a9 [17:32:04] so, I suppose you skipped it for now [17:32:39] Maybe bot authors should have init.d scripts and links in rc.d. [17:32:44] petan|wk: Not really doing them in numerical order, trying to do them per-project or per-owner so that I can notify people in IRC when possible. [17:32:46] petan|wk: we announced this last week [17:32:51] all my bots are on bots-1 and 2 so when wm-bot die, I know what's going on :) [17:33:08] Ryan_Lane ok (I knew about it so I didn't really check) [17:33:24] Ryan_Lane did you announce estimate times and instances that are affected? [17:33:44] all instances [17:33:47] ok [17:33:51] andrewbogott go ahead [17:33:55] and over the course of last week and this week [17:34:01] k [17:34:02] true [17:34:05] I remember [17:34:09] you said that last wekk [17:34:11] week [17:34:13] Ryan_Lane: hi :) have you rescheduled the glusterFS upgrade? [17:34:31] not yet. I'm actually worried that it does actually require downtime [17:34:37] we were on a pretty old beta [17:36:00] doh [17:36:14] that kind of block deployement on beta since it relies on git to get changes [17:36:29] though I can switch everything to be owned by root and requires user to do sudo git [17:36:29] I'm asking the gluster people now [17:37:11] great :) [17:38:44] going to play with my daughter, will be back later on :D [17:41:45] good news, we're past the point of needing downtime [17:53:39] ok. I'm going to reschedule the gluster upgrade for today [17:55:28] Bleh. Where's the docs about using the SAL for labs? [17:58:02] petan|wk: ^ [17:59:13] Reedy I don't know if we have any docs [17:59:15] !sal [17:59:15] https://labsconsole.wikimedia.org/wiki/Server_Admin_Log see it and you will know all you need [17:59:25] !logging [17:59:25] To log a message, use the following format: !log [17:59:31] Reedy that's all we have [17:59:38] The second one is what I wanted : [17:59:41] Thanks [17:59:45] yw [18:00:02] !sal is !logging [18:00:02] This key already exist - remove it, if you want to change it [18:00:11] ah [18:00:12] heh [18:00:36] Wikidata were asking [18:03:00] our documentation could use some love [18:03:40] WikiLovesDocuments?!? [18:03:44] Life could use some love too [18:04:17] hashar, petan|wk : I've now migrated all of deployment-prep. Y'all might want to give it a once-over and make sure things are up and running. [18:04:33] (One or two of the VMs might still be booting.) [18:05:01] https://labsconsole.wikimedia.org/w/index.php?title=Server_Admin_Log&diff=5331&oldid=2554 [18:05:02] preilly: you mean wikisource? [18:05:33] !log [18:05:44] preilly: as long as we can tie toolserver to it [18:05:46] ok [18:06:17] !log deployment-prep Migrated all instances to new hardware [18:06:20] Logged the message, Master. [18:15:54] andrewbogott: congratulations :-) [18:19:02] hashar: Does that mean it still works? 
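
On the "bot authors should have init.d scripts and links in rc.d" idea above, a minimal skeleton is enough for a bot to come back on its own after a reboot or migration. Everything here — the bot name, path and account — is made up for illustration.

#!/bin/sh
### BEGIN INIT INFO
# Provides:          mybot
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: example labs bot
### END INIT INFO
# /etc/init.d/mybot -- hypothetical wrapper so a bot restarts itself
# when an instance reboots (e.g. for a migration).

DAEMON=/data/project/mybot/run.sh   # placeholder path
PIDFILE=/var/run/mybot.pid
RUNAS=mybot                         # placeholder account

case "$1" in
  start)
    start-stop-daemon --start --background --make-pidfile \
        --pidfile "$PIDFILE" --chuid "$RUNAS" --exec "$DAEMON"
    ;;
  stop)
    start-stop-daemon --stop --pidfile "$PIDFILE" --retry 10
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
esac

# Enable at boot with:  update-rc.d mybot defaults
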
[18:19:07] oops [18:36:08] Ryan_Lane: Failed to allocate new public IP address ? [18:37:02] Jarry1250: there is a quota per project on the IPs [18:37:21] mutante: Yes, Ryan just fulfilled my request on that (I think) [18:37:45] Jarry1250: ok, that was the one for "translatesvg"? [18:38:08] Yup (kept accidentally emailing the list from the wrong address, grr) [18:40:33] Jarry1250: i just confirmed you have a quota of 1 on that project [18:41:28] mutante: None listed, still failed to allocate (I'm a netadmin) [18:41:49] Jarry1250: the project is that? or the instance? [18:42:34] Well I'm looking at Special:NovaAddress, I assume that's a list of instances (I have the same name for each) [18:42:53] Then I'm doing "Allocate IP" [18:43:00] Confirm -> fail. [18:46:33] * jeremyb watches Ryan_Lane running in circles [18:46:50] Jarry1250: gimme a sec [18:46:57] maybe I need to add IPs to the pool [18:47:08] yep [18:47:13] Ryan_Lane: No problem [18:47:52] Jarry1250: it'll work now [18:48:16] Thanks! Works like a charm. [18:49:52] great [18:51:07] Well, the IP assignment worked like a charm. Now to work out why it's timing out *potters around* [18:51:43] * Madman|busy is happy to putter away on his development wiki hidden from the outside world. :D [18:53:51] btw, I noticed there was a slight problem with the certs manifest in that it'd fail if the symlinks it attempted to create (for the cert fingerprint) already existed. [18:53:57] Jarry1250: if you are trying to ping that would be normal, ICMP isn't allowed (by default) [18:54:14] But it didn't seem to break anything depending on it, just my OCD, and it was easily fixed. :) [18:56:21] Change on 12mediawiki a page Developer access was modified, changed by Nkansahrexford link https://www.mediawiki.org/w/index.php?diff=568458 edit summary: [18:57:07] !log pediapress Migrated all instances to new hardware [18:57:08] Logged the message, Master. [19:04:14] mutante: No, actually, I just literally shoved it into my browser [19:04:26] It was working over proxy before (can't test right now) [19:04:59] Jarry1250: check "manage security groups", most likely you are lacking rules to allow 80 and 443 [19:05:18] Nope, allowing both [19:05:32] (over tcp) [19:06:02] ...sigh. [19:07:03] Jarry1250: hmm, the webserver might need a restart to listen on the new IP [19:07:26] mutante: Ah, okay. == reboot instance? [19:08:37] Jarry1250: /etc/init.d/apache restart [19:08:49] kk, will try that [19:09:08] doesn't that whinge at you that you should use service apache restart? [19:10:12] /usr/sbin/apachectl graceful :P [19:10:17] whichever works [19:11:17] reboot [19:11:17] :D [19:13:11] mutante: /usr/sbin/apachectl graceful seems to have run successfully (as root) -- no output -- but no change, still timing out [19:14:27] What interfaces/ips etc is apache listening on? [19:15:51] Reedy: According to ports.conf, 80, 443 [19:16:04] !log reportcard Migrated all instances to new hardware [19:16:05] Logged the message, Master. [19:16:15] Well, 443 conditional [19:17:06] And your VirtualHost entry? [19:17:13] ? [19:20:08] Reedy: That's supposed to be in ports.conf? [19:20:17] no, in sites-enabled [19:20:37] I'm guessing swift is supposed to have bound the ip etc to teh machine? [19:20:59] huh? 
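
While the web-server angle is being chased here, two quick checks usually settle whether apache is the problem at all (the actual fix turns out further down to be associating the floating IP, which the instance itself never sees because public addresses are NATed):

# Is apache actually listening on 80/443 on the instance?
sudo netstat -tlnp | grep -E ':(80|443) '

# Which VirtualHosts did it load, and from which files?
sudo apache2ctl -S

# Since the public IP is NATed, test from the instance itself first:
curl -sI http://localhost/ | head -1
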
[19:21:23] Reedy: Ah, right, then a load of other bits [19:21:29] (Inside that tag, I mean) [19:23:33] Reedy: http://pastebin.com/bt8Mpzir if it helps [19:25:20] jeremyb: we can either have a database service, or we can just extend puppet to install mysql with credentials [19:25:25] jeremyb: inside of a container [19:25:36] where the user doesn't have shell access [19:26:07] what is "database service"? [19:26:15] like a compute service [19:26:22] "create a database like this for me" [19:26:28] are you considering reddwarft? [19:26:32] reddwarf* [19:26:33] not anymore [19:26:48] now it's a generic PaaS [19:26:55] it doesn't offer us anything [19:26:59] huh [19:27:24] we could use PaaS for bot labs [19:27:35] I can think of easier ways to handle that [19:27:51] a sane deployment system is mostly what we need for bots [19:28:32] not just for irc bots or wiki bots [19:28:35] but for web apps [19:28:46] i was mostly thinking web apps [19:29:04] !log incubator Migrated all instances to new hardware [19:29:04] I don't think a PaaS will help much with that either [19:29:05] Logged the message, Master. [19:29:10] most PaaS lock you into a language [19:29:20] the ones that don't are just deployment systems ;) [19:29:56] well i thought most have a choice of langs but not unlimited options for langs [19:30:03] if they had enough choices... [19:30:25] they usually lock you into frameworks too [19:30:46] it would likely be a decent amount of work to integrate a PaaS [19:30:51] since it needs api access [19:31:05] and we need to finish some dev work to make api access doable [19:31:49] * Damianz PaaS's on Ryans SaaSness [19:31:55] heh [19:32:03] huh, never heard of this trystack thing before [19:32:27] Bots needs lots of puppet, chunks of mysql, a decent scheduler/job running thing and a half crazy queue system [19:32:38] jeremyb: it's interesting [19:32:43] it lets you program against the apis [19:33:50] !performance [19:34:38] !webscale [19:36:28] Ryan_Lane: anyway, not seeing where it says that it's morphed like that [19:36:33] into PaaS [19:36:51] jeremyb: they did a talk at the last summit [19:37:22] !log ganglia Migrated all instances to new hardware [19:37:24] Logged the message, Master. [19:37:29] and they never wrote it down? ;( [19:37:42] heh [19:37:50] that project has been flailing for ages now [19:40:05] Ryan_Lane: so what about prod replicas? that's separate from the DBaaS question? [19:40:13] separate, yes [19:40:18] likely just bare metal [19:40:26] when? ;-) [19:40:45] either 2 months or 4-6 months [19:40:48] Just give us the root details to the production boxes, promise not to break anything? :D [19:41:07] back in a bit [19:41:10] getting lunch [19:41:31] Ryan_Lane: i've been saying i'd get lunch for the last ~90 mins... [19:41:40] lunch? [19:41:42] * jeremyb shoudl do it [19:41:44] should* [19:41:44] I might head home for the day [19:49:01] <^demon> Ryan_Lane: I suddenly lost access to my gerrit instance. [19:49:18] <^demon> I can ssh to bastion but not to the instance. [19:49:32] <^demon> Back, odd. [19:50:13] ^demo: I'm migrating instances right now, maybe I just moved yours. [19:50:19] um… ^demon: that is [19:52:03] <^demon> Ok, everything seems fine :) [19:56:40] Odd, etherpad-lite just restarted for no apparent reason [20:01:15] marktraceur: That was me! Just migrated you to faster hardware. [20:01:34] Reedy: I take it there was nothing obvious in that earlier pastebin of mine? 
[20:02:05] the first line was the interesting one [20:02:32] Right you are :) And that was as it should be? [20:09:07] andrewbogott: Well gee! Thanks much! [20:09:59] marktraceur: Ryan sent a warning to the list that instances would be rebooting this week, but it's taking a long time to get through the list. Hope I didn't cause you to lose any work. [20:10:40] andrewbogott: No, I had just forgotten about the warning I guess [20:11:10] I guess I should expect the other one to go down relatively soon as well [20:11:58] 'relatively' => 'I touch it once every two weeks or so, it will be sooner than that' [20:12:44] Hmm, there's just no sign of my attempts to load the public IP in any log I can find [20:17:30] mutante: I also opened ICMP in my security group, but pings time out too. Any ideas? [20:21:33] Jarry1250: you didn't use default? [20:21:38] you should always use default [20:22:01] I really need to mod the interface to require it [20:22:20] there's situations where you wouldn't want to use it, but they are rare [20:22:57] Oh, the instance is in both [20:23:05] then ping and such should work [20:23:09] Jarry1250: oh [20:23:14] I just figured from mutante's earlier comment that the rules didn't inherit or something [20:23:18] did you associate the public Ip with the instance? [20:23:41] it won't show up on the system's logs [20:23:44] Yes, via Special:NovaAddress anyway [20:23:48] public IPs are mapped via NAT [20:23:55] the instance doesn't know about its own IP [20:25:38] kk, so what next to look at? [20:26:10] is the public IP not responding to ping? [20:26:31] paravoid, the wiki on the 'mwreview' instance doesn't seem to have survived the migration, and I don't know enough to usefully debug the issue. Are you awake enough to have a look? [20:27:09] Ryan_Lane: /facepalm [20:27:27] andrewbogott: was it one of the corrupted ones [20:27:28] ? [20:27:33] Jarry1250: ? [20:27:41] I'd added but not associated it [20:27:46] ah [20:27:47] *allocated [20:27:54] yeah. allocating just adds it to your project [20:27:55] Ryan_Lane: I don't know. I'm pretty sure that it was working properly before I migrated it. (Or, at least, the front page came up OK.) [20:28:02] associating lets you map it to a specificinstance [20:28:10] andrewbogott: what's the instance id? [20:28:53] I can walk you through my troubleshooting process as I do it [20:29:40] Ryan_Lane: Yeah, I was confused by my having the same identifier for each [20:29:48] * Ryan_Lane nods [20:29:50] Ryan_Lane: I-000002ae [20:29:52] It's probably best for newbies like me to differentiate [20:30:11] Jarry1250: which identifiers? [20:30:14] allocate and associate? [20:30:18] !addresses [20:30:18] https://labsconsole.wikimedia.org/wiki/Help:Addresses [20:30:29] those docs walk you through the process [20:30:31] kind of [20:30:39] andrewbogott: it isn't in the corrupted list [20:30:50] I'm going to go on virt0 and run: nova-manage vm list | grep 2ae [20:31:01] that tells me which host its on [20:31:07] virt8 [20:31:14] Ryan: No, instance and project names [20:31:20] Jarry1250: ahhh ok [20:31:43] andrewbogott: by tailing its console log, I can it that it's up [20:31:50] Ryan_Lane: The system came up just fine… it's possible this same problem would've appeared if I rebooted it in place. [20:31:51] it pings [20:31:57] oh [20:31:58] The problem is this: http://mwreview.wmflabs.org/wiki/index.php [20:32:01] did it just take a long time? 
[20:32:03] ah [20:32:12] database is down [20:32:24] weird [20:32:28] database shows as up [20:32:46] I started mysql already. And I don't see the file that should correspond to the named database. [20:33:11] ah. I see the problem [20:33:13] apparmor [20:33:23] Oh? Can you explain further? [20:33:35] weird [20:33:43] it seems my.cnf was reconfigured [20:33:46] same with appamor [20:33:49] apparmor [20:33:55] the data directory was on /mnt [20:33:59] now it's /var/lib/mysql [20:34:29] Is it because rebooting caused puppet to start working after having been off for ages? [20:34:45] (I think Erik tinkered with that system a bit since I last touched it, so not sure what its state was.) [20:35:44] probably, yeah [20:35:49] I made a puppet change for the role recently [20:35:52] I wonder if I broke it [20:55:54] Ryan_Lane: Are you still tinkering with I-000002ae or shall I try to fix it? [20:56:08] andrewbogott: I think it's likely due to a puppet change I made [20:56:13] do you mind investigating it? [20:56:20] nope, I'll look. [20:57:29] I think it's the datadir => $::mysql_datadir ? { stuff [20:57:36] it appears to not work properly [21:01:49] WTF [21:01:59] Why is sudo git pull in /var/lib/git/operations/puppet not working for me [21:02:04] It gives me a publickey error [21:02:46] Ryan_Lane, ^demon ----^^ [21:02:57] sec [21:02:58] <^demon> Oh what instance? gerrit? [21:03:03] Yes [21:03:08] It seems to be using a dedicated username [21:03:10] origin ssh://labs-puppet@gerrit.wikimedia.org:29418/operations/puppet.git (fetch) [21:03:15] RoanKattouw: try now [21:03:22] Nope [21:03:23] <^demon> Ah, gerrit instance isn't puppetized properly anyway. [21:03:28] <^demon> So puppet won't do you much good. [21:03:31] I know that, but [21:03:35] RoanKattouw: I don't see an attemp [21:03:38] *attempt [21:03:39] It has puppetmaster::self and the gerrit classes enabled [21:03:48] RoanKattouw: you sure you are using 29418? [21:03:52] hm [21:03:53] <^demon> Yes, but they fail horribly when you run puppetd -tv [21:03:53] So while Timo edits the CSS, Puppet will periodically destroy his work [21:04:07] it says so in the origin line. heh [21:04:17] oohhhhh [21:04:21] this is the instance [21:04:30] not production [21:04:33] <^demon> RoanKattouw: Look at the last time puppet ran. Puppet won't overwrite his work. [21:04:34] ignore me [21:04:53] lol [21:04:53] <^demon> The manifest is too broken for it to complete. [21:16:20] WHOA WHOA WHOA [21:16:24] LABS IS WORKING AND THE SITE IS NOT?? [21:17:29] dschoon: -_- [21:17:47] maybe we should redirect traffic to labs/ [21:17:48] ? [21:18:17] Ryan_Lane: You aren't using the XML interface for nova are you? (I'm not even sure what that is, but nova folks are talking about ripping it out.) [21:18:26] nope [21:18:27] json [21:18:39] andrewbogott: they support xml and json [21:18:42] json being the default [21:18:50] Right, I think that they may move to just json. [21:18:57] dschoon labs are powered by volunteers, partially ^^ :P [21:19:24] that makes them more stable :D [21:19:57] * Damianz kicks dschoon [21:20:58] "Wikimedia Labs: powered by volunteers, set on fire Ryan_Lane" [21:21:03] <3 [21:21:11] set on fire by? [21:21:16] heh [21:21:31] andrewbogott: yeah, they do. I have no problems with that :) [21:21:48] it's easier to only have one format that always works, rather than one that always works and another that never works [21:21:57] * andrewbogott nods [21:58:06] Ryan_Lane: did you get any informations from the Gluster team regarding the upgrade ? 
:-) [21:58:13] yes [21:58:14] I can do it [21:58:34] and plan on doing so right about now [22:01:49] Ryan_Lane++ thanks [22:02:39] Ryan_Lane: would be awesome :-) [22:02:49] hm. I don't see how to add a package into two distros with reprepro [22:03:17] chrismcmahon: Matthias did a nice job at fixing AFTv5 :-) [22:03:33] chrismcmahon: he pointed a missing dependency: ClickTracking was not installed :-D [22:04:25] hashar: yep, I caught that, thanks [22:04:39] chrismcmahon: I also need to write some goals for 2012 / 2013 :/ [22:04:48] hadn't have the occasion to think about it [22:05:07] will end up rushing a few ideas I get tomorrow. Will send them to Rob and CC you [22:05:28] sounds good [22:09:31] Ryan_Lane: what ever the end result is with Gluster, can you please mail me the result ? Would be great :-) [22:09:37] well.... [22:09:52] we're running into issues importing the package into reprepro [22:43:30] out for bed, good luck Ryan_Lane with Gluster :) [22:43:38] hashar: it's updating right now [22:53:15] Change on 12mediawiki a page Developer access was modified, changed by Burthsceh link https://www.mediawiki.org/w/index.php?diff=568543 edit summary: [23:32:45] hm. though keystone is kind of a pain in the ass in ways, it's also nice in some ways too. it keeps a service registry [23:32:57] so, you request the endpoints for all services and all regions [23:33:08] it'll give you back the service and all urls for contacting them [23:34:01] the application on the user side only needs to know the identity url, and it can get everything else from it [23:35:25] which means we can also completely shard per region. the only downside is that we'll need do to twice as many queries
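
Two of the loose ends above, sketched rather than prescribed. First, the reprepro question: the usual way to get one package into two distributions is to run the include once per codename, or include it into one and copy it across — repository path, codenames and filename here are placeholders.

reprepro -Vb /srv/apt includedeb lucid mypackage_1.0_amd64.deb
reprepro -Vb /srv/apt includedeb precise mypackage_1.0_amd64.deb
# or, after including into one distribution only:
reprepro -Vb /srv/apt copy precise lucid mypackage

Second, the keystone service registry described at the end: with the v2.0 API, a client that knows only the identity URL can request a token and read the full catalogue from the reply. Hostname, credentials and tenant below are placeholders.

# Request a token; the response carries access.serviceCatalog, a list of
# services each with per-region publicURL/internalURL/adminURL endpoints.
curl -s -H 'Content-Type: application/json' \
    -d '{"auth": {"tenantName": "someproject",
                  "passwordCredentials": {"username": "someuser",
                                          "password": "somepass"}}}' \
    http://keystone.example.org:5000/v2.0/tokens | python -mjson.tool
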