[00:01:28] and as for security rules?
[00:01:54] security groups are firewall rules
[00:02:09] I knew that
[00:02:10] basics of labs is that you can only use it for wikimedia related purposes
[00:06:28] So I make a rule with 80 as the port, and incoming connections will be accepted on it?
[00:07:10] (I'm sorry, I'm new to OpenStack and the like.)
[00:07:15] * Ryan_Lane nods
[00:07:19] CIDR should be 0.0.0.0/0
[00:07:25] protocol should be tcp
[00:07:30] from 80 to 80
[00:07:45] Thanks
[00:07:48] yw
[00:09:13] Ryan_Lane, where does nova-manage look for my admin password?
[00:09:28] nova-manage?
[00:09:33] umm
[00:09:43] I thought nova-manage doesn't take credentials at all
[00:09:57] I am trying to create a project. It's failing, and the logfile says 'INVALID_CREDENTIALS'
[00:10:05] ah
[00:10:10] #nova-manage project create --project demo --user andrew
[00:10:29] if it is talking to ldap, then it's the LDAP configuration in nova.conf that needs to be fixed
[00:13:14] ldap_password in nova.conf looks right to me. But, could be my ldap is still screwed up.
[00:13:31] can you do an ldap search against it manually using the credentials?
[00:15:37] PROBLEM host: simplewikt is DOWN address: simplewikt PING CRITICAL - Packet loss = 100%
[00:15:37] RECOVERY dpkg-check is now: OK on diablo-n-gluster diablo-n-gluster output: All packages OK
[00:20:08] hm... indeed, ldap is broken. "SASL(-4): no mechanism available"
[00:20:30] use -x
[00:20:36] which is simple, rather than SASL
[00:21:04] Ugh!
[00:21:16] Jews: ?
[00:21:26] Not talking to you
[00:21:31] Ah, ok, that gets me a normal 'Invalid credentials'
[00:21:53] Jews: why not? did I do something to offend you?
[00:21:58] No.
[00:21:59] * Ryan_Lane likes trolling
[00:22:08] lol :)
[00:23:14] instance is timing out
[00:23:35] stuck on pending... lol
[00:23:44] sometimes it takes a bit
[00:23:55] unless the system ran out of memory
[00:24:19] k
[00:24:52] hmm. I need to move more instances to virt0
[00:24:59] err. virt1
[00:25:41] well, virt2 is reporting only about 1GB of memory left
[00:26:08] which instance is showing as pending?
[00:26:20] oh
[00:26:24] you mean this page?
[00:26:29] !instance I-00000148
[00:26:29] https://labsconsole.wikimedia.org/wiki/Help:Instances
[00:26:33] yes
[00:26:33] bah
[00:26:43] that's because that attribute doesn't properly get updated
[00:26:54] @instance
[00:27:01] @search instance
[00:27:01] Results (found 4): instancelist, instance-json, access, instance,
[00:27:12] !resource I-00000148
[00:27:18] * Ryan_Lane rolls his eyes
[00:27:26] the instance list shows running
[00:28:06] Timed out for me some time ago
[00:28:27] still timing out
[00:28:38] did you reboot it?
[00:28:51] i'll try
[00:28:56] no. don't
[00:28:58] I was asking
[00:29:08] already did
[00:29:14] It might help
[00:29:41] yep. :O
[00:29:59] well, I still can't ping it
[00:30:07] did you remove rules from the default security group?
[00:30:10] I just deleted it
[00:30:16] I didn't, I added rules
[00:30:25] why delete it?
[00:31:03] what made it die all of a sudden?
[00:31:19] my reboot
[00:31:25] it came back up
[00:31:32] it wasn't replying to ping
[00:31:47] Meh, I'm confused.
[00:32:00] Ryan_Lane: I'm still jabbing at ldap but, meanwhile -- it occurs to me that I probably need some mysql stuff installed as well. Do you know which classes I need for nova?
[00:32:09] Ah, now I see
[00:32:10] so, before, when I asked if you had rebooted, had you *already* rebooted?
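The two fixes Ryan walks people through above translate directly into command-line form. A minimal sketch, assuming the novaclient of that era for the security-group rule, and placeholder DNs for the LDAP check (the real labs base DN never appears in the log):

```bash
# Open HTTP to the world in the default security group
# (protocol=tcp, from port 80 to 80, CIDR 0.0.0.0/0)
nova secgroup-add-rule default tcp 80 80 0.0.0.0/0

# Sanity-check the credentials nova.conf uses against LDAP directly,
# with simple (-x) rather than SASL binds; DN and base are placeholders
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=example,dc=org" -W \
  -b "dc=example,dc=org" "(uid=andrew)"
```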
[00:32:19] port 22 is not opened and all that fun stuff
[00:32:25] it should be
[00:32:28] fixing
[00:32:35] it's open....
[00:32:50] the default security group has port 22
[00:32:51] and icmp
[00:33:12] andrewbogott: see nova-production1's configuration
[00:33:17] I used a different ruleset
[00:33:24] 'k
[00:33:32] Now fixed
[00:33:39] Jews: *always* keep the default group checked
[00:33:44] Okay.
[00:33:49] * Jews forgot
[00:33:54] please read the docs :)
[00:33:55] !instance
[00:33:55] https://labsconsole.wikimedia.org/wiki/Help:Instances
[00:34:32] "In general" -- change it to "always" :P
[00:36:00] well, there's cases where you wouldn't want to
[00:36:20] but unless you know why you shouldn't, you should ;)
[00:36:37] back in a little bit
[00:44:56] PROBLEM host: simplewikt is DOWN address: simplewikt CRITICAL - Host Unreachable (simplewikt)
[00:49:56] Looks done now.
[00:50:48] !log simplewiki After some confusion, added firewall rules correctly and made new instance. Hope it works now...
[00:50:49] Logged the message, Master
[00:52:28] I'm ready for the IP now
[00:52:50] Also, https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000145 seems stray
[00:53:48] stray?
[00:53:50] what do you mean?
[00:54:15] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000145
[00:54:27] yeah. I don't understand what you mean by stray?
[00:55:17] Says pending and I can't SSH in, and my small instance is working now
[00:58:14] i think I'm figuring out gerrit a little bit…just figured out that it wanted me to interactively merge all of my previous master commits as part of a rebase just to merge from master into my branch
[00:58:20] now i want to merge my branch back to master
[00:58:38] but i'm afraid that it will try to do some kind of crazy rebase + review thing
[00:58:45] if I've already had my stuff reviewed in my branch
[00:58:48] and approved
[00:58:49] Ryan_Lane: Before you vanish for good... any ideas about what's going on with ldap on my instance? (I've twiddled so many puppet settings at this point that I'm tempted to start over from scratch.)
[00:58:52] how do I now put it back in master
[00:59:24] or. where is the appropriate place to ask this question?
[01:06:10] ottomata: This is not a terrible place to ask, but I don't know the answer, and it's after 5pm in California which means you may need to re-ask next week.
[01:06:43] Ryan_Lane, main point is that my simplewikt instance is ready to demo
[01:08:12] thanks andrewbogott
[01:10:46] RECOVERY host: simplewikt is UP address: simplewikt PING OK - Packet loss = 0%, RTA = 6.57 ms
[01:33:15] hello?
[02:39:06] RECOVERY Free ram is now: OK on bots-sql3 bots-sql3 output: OK: 20% free memory
[02:41:46] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory
[02:42:36] RECOVERY Current Users is now: OK on testing-ldap-build testing-ldap-build output: USERS OK - 0 users currently logged in
[02:44:06] RECOVERY Disk Space is now: OK on testing-ldap-build testing-ldap-build output: DISK OK
[02:44:06] RECOVERY dpkg-check is now: OK on testing-ldap-build testing-ldap-build output: All packages OK
[02:45:26] RECOVERY Free ram is now: OK on testing-ldap-build testing-ldap-build output: OK: 61% free memory
[02:45:36] RECOVERY Total Processes is now: OK on testing-ldap-build testing-ldap-build output: PROCS OK: 80 processes
[02:46:46] RECOVERY Current Load is now: OK on testing-ldap-build testing-ldap-build output: OK - load average: 0.01, 0.09, 0.08
[02:57:39] Ryan_Lane: Are you there?
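ottomata's gerrit question never gets answered in-channel. One plausible flow for getting an already-reviewed branch back into master under gerrit, assuming a remote named `gerrit` and a branch named `mybranch` (both hypothetical names):

```bash
# Rebase the branch onto current master so gerrit sees a clean series
git fetch gerrit
git checkout mybranch
git rebase gerrit/master

# Push the result into review for the master branch; with git-review
# installed, `git review master` does roughly the same thing
git push gerrit HEAD:refs/for/master
```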
[02:59:36] Hydriz: yes, but not for long
[02:59:50] Can I get access back?
[02:59:57] Or are you still on vacation :(
[03:00:22] got back recently
[03:12:06] PROBLEM Free ram is now: WARNING on bots-sql3 bots-sql3 output: Warning: 18% free memory
[08:40:52] Hello, I'm an admin of the French Wikisource and I would like to have write access to http://wikisource-dev.wmflabs.org in order to try MW extensions before asking for their installation on Wikisource. Zaran told me that I have to ask here. What should I do?
[14:47:32] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=commit;h=f6359eefead213987ca623553b5571d28ac8fe8a < Ryan's meeting with the legal department didn't go well.
[15:13:45] PROBLEM Current Load is now: CRITICAL on diablo diablo output: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:17:35] PROBLEM dpkg-check is now: CRITICAL on diablo diablo output: DPKG CRITICAL dpkg reports broken packages
[15:18:45] RECOVERY Current Load is now: OK on diablo diablo output: OK - load average: 0.31, 1.05, 0.85
[15:22:35] RECOVERY dpkg-check is now: OK on diablo diablo output: All packages OK
[16:03:48] PROBLEM Current Load is now: CRITICAL on gluster-devstack gluster-devstack output: Connection refused by host
[16:04:28] PROBLEM Current Users is now: CRITICAL on gluster-devstack gluster-devstack output: Connection refused by host
[16:05:08] PROBLEM Disk Space is now: CRITICAL on gluster-devstack gluster-devstack output: Connection refused by host
[16:05:48] PROBLEM Free ram is now: CRITICAL on gluster-devstack gluster-devstack output: Connection refused by host
[16:06:58] PROBLEM Total Processes is now: CRITICAL on gluster-devstack gluster-devstack output: CHECK_NRPE: Error - Could not complete SSL handshake.
[16:07:38] PROBLEM dpkg-check is now: CRITICAL on gluster-devstack gluster-devstack output: CHECK_NRPE: Error - Could not complete SSL handshake.
[16:54:28] RECOVERY Current Users is now: OK on gluster-devstack gluster-devstack output: USERS OK - 2 users currently logged in
[16:55:08] RECOVERY Disk Space is now: OK on gluster-devstack gluster-devstack output: DISK OK
[16:55:38] I really hope that Ryan is going to arrive
[16:55:48] RECOVERY Free ram is now: OK on gluster-devstack gluster-devstack output: OK: 67% free memory
[16:56:58] RECOVERY Total Processes is now: OK on gluster-devstack gluster-devstack output: PROCS OK: 134 processes
[16:58:48] RECOVERY Current Load is now: OK on gluster-devstack gluster-devstack output: OK - load average: 0.00, 0.10, 0.07
[17:01:00] * andrewbogott wishes that nagios didn't send messages to IRC because of my standard development cycle.
[17:22:07] Ryan_Lane: Didn't enjoy your meeting yesterday? https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=commit;h=f6359eefead213987ca623553b5571d28ac8fe8a
[17:22:26] heh
[17:22:34] nah. thought my laptop was stolen
[17:22:37] seems it wasn't
[17:23:33] Ah, that would have sucked
[17:26:52] ji
[17:26:53] hi
[17:26:59] :)
[17:27:29] howdy
[17:27:38] cool
[17:28:06] gonna write notes http://etherpad.wikimedia.org/LabsIrcConf
[17:28:09] from the conference
[17:28:14] * Ryan_Lane nods
[17:28:18] k
[17:29:12] Is it in half an hour or an hour and a half? Silly timezones
[17:32:42] huh
[17:35:54] I thought it was in 30 mins?
[17:37:46] yeah. it's in 30 mins
[17:37:56] it's 17:37 GMT right now
[17:56:23] I hope that people who do not already use wmf labs are going to join us too, at some point there should be some intro too
[17:58:16] I hope they're not all bot owners as 2/3 are raped daily :P
[17:58:26] really?
[17:58:41] I hope some "they" are here :)
[17:58:52] Well actually it might be fixed now as Beetstra was sorting her..(his?) bot.
[17:58:54] or it's gonna be a meeting of us 3
[18:00:09] *gong*
[18:00:14] :D
[18:00:15] this they (singular) is somewhat around (having dinner)
[18:00:18] I assume it did get posted to, say, wikitech?
[18:00:23] yes
[18:00:24] it was
[18:00:42] Mmm dinner, I wish, but that would take me like 40min, might be lazy and just order pizza.
[18:01:04] petan: so, this is your thing, want to start it off?
[18:01:09] ok we can discuss food later :)
[18:01:23] Why .. are bots more important than food?
[18:01:30] right, I never really did any conference, online or any other, but I guess yes
[18:01:31] * Ryan_Lane is hungry and wants breakfast ;)
[18:01:45] and I just ate supper...
[18:02:03] I think we can start talking through all the points we have in the pad
[18:02:26] So you're handing back to Ryan_Lane for 'tech specs'? :P
[18:02:27] the first part was meant to be some intro for newbies but I don't know if any are around
[18:02:38] link?
[18:02:44] http://etherpad.wikimedia.org/LabsIrcConf
[18:03:18] technical specs. right now we have 4 compute nodes, and one controller
[18:03:33] ok the compute nodes are "physical servers" right?
[18:03:43] the controller is as well
[18:03:52] what are the technical specs of these servers?
[18:04:07] I've noticed there is nothing about it on wikitech
[18:04:24] the compute nodes have 48GB memory, two processors with 6 cores each
[18:04:53] and 1.2 TB of storage per node
[18:05:04] the storage is in a raid1
[18:05:05] err
[18:05:07] raid10
[18:05:14] ok, so we can have around 24 instances on one server
[18:05:17] What does labs rely on wmf production for (apart from network), just apt?
[18:05:32] and the nodes share storage with each other using gluster
[18:05:37] basically just apt
[18:05:46] why is svn firewalled on labs?
[18:05:49] ssh one
[18:05:52] ssh
[18:06:07] we may open that back up, or may just keep it closed until the git transition
[18:06:20] all of labs is blocked from production for ssh
[18:06:47] it's to avoid hijacking of forwarded keys
[18:06:59] Is there still a rough plan to switch to gluster for project storage and move away from assigning public IPs directly, to have domain based forwarding on nginx/varnish/some nice proxy?
[18:07:15] both, ues
[18:07:17] *yes
[18:07:35] I have the gluster storage nodes installed, and they are running a glusterfs cluster that is peered
[18:07:43] ok, let's have a bit of explanation of the vm's we have; OpenStack supports moving instances from one node to another, right?
[18:07:48] I just need to write some code to make them share to the projects now
[18:07:55] :)
[18:08:14] work on a proxy solution hasn't started at all, but is planned
[18:08:21] petan: yes
[18:08:25] we can do live migrations
[18:08:31] I need to do some right now, in fact
[18:08:53] virt1 still doesn't have nearly as many instances as the others
[18:08:54] so in case one node needs to be restarted it's possible to move all instances to a running one
[18:09:01] yes, but it's painful
[18:09:13] current nova support for live migrations is kind of crap
[18:09:28] Can there not be logic that if we 'shutdown' a node via openstack it handles shifting all the vms off. Like XenServer does if you mark a server as broke
[18:09:34] you basically need to do one instance at a time, or the migrations fail in really shitty ways
[18:09:37] but in case there is an outage of some node, is it possible to quickly recover it?
[18:09:44] so there should not be long outages, taking days
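The peered glusterfs cluster Ryan mentions above would have been assembled with the standard gluster CLI. A sketch with made-up hostnames and brick paths:

```bash
# Run on one storage node; the others join the trusted pool
gluster peer probe labstore2
gluster peer probe labstore3
gluster peer status

# A replicated volume over two bricks -- the "raid-1 across the
# network" behaviour described later in the meeting
gluster volume create project-storage replica 2 \
  labstore1:/export/brick1 labstore2:/export/brick1
gluster volume start project-storage
```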
[18:09:50] Damianz: I could write a script for it, yes
[18:10:29] well, I'm not sure if it's possible to do cold migrations
[18:10:47] I'll need to ask the openstack people about it. I don't see it in the docs and I haven't dug into the code
[18:11:03] in that situation, right now, we'd need to fix the node
[18:11:08] for example if one instance is rebooted on a loaded node, is it possible to move it automatically to a node with low load?
[18:11:29] automatically? no
[18:11:33] ok
[18:11:35] it's a manual process
[18:11:44] so balancing load between nodes is manual
[18:11:46] this will be less problematic with the next set of hardware
[18:12:10] I don't have the specs of the new hardware, but they are kind of great :)
[18:12:19] 256GB of memory or so
[18:12:30] In relation to if it's broken - do we plan to have some form of labs<>production mailing list/jira instance etc (or do we have it now) whereby if say a node is physically down, rather than nagios moaning, someone from wmf ops will sort it. Not sure how to explain it, but rather than the TS model where admins manage everything, having the community/admins manage things is strange for certain things.
[18:12:38] memory is our limiting factor right now
[18:12:57] CPU utilization is very low, but we are overallocated on memory, and the hosts are swapping
[18:13:15] In relation to re-balancing/migration - once ganglia is live we probably should look at something whereby if one node has some very CPU-intense vms we live migrate them automagically.
[18:13:43] Damianz: labs is considered "quasi-production". we'll try to fix it quickly, but it may not be to the support level of production
[18:14:01] or do you mean things inside of labs?
[18:14:16] everything on the instances is community managed
[18:14:22] Well I think things inside say a project is down to the project manager/members which is community
[18:14:29] But the community can't fix say switches exploding.
[18:14:48] right. hardware and infrastructure that supports labs will be handled by ops
[18:15:16] I think there are many communication methods, either bugzilla or the mailing list we have
[18:15:33] indeed
[18:15:48] I believe that labs-l is watched by ops
[18:16:00] if something breaks with the hardware, we'll get alerted by production nagios
[18:16:09] another thing I wanted to sort out
[18:16:15] who all from ops is responsible for labs
[18:16:25] basically just me
[18:16:25] So say for a feature request - are we looking at the bug tracker or sending a review of a puppet diff for merging into production?
[18:16:39] we are hiring another position for labs
[18:16:52] ok, so in case there is a problem with labs, we need to wait for you?
[18:17:02] including requests for new accounts etc
[18:17:05] Damianz: I'd say both, usually
[18:17:21] I'm more thinking we have 2 groups of people - those who can write up manifests and they will get merged, and those who notice bugs and not be able to fix them (and those that ask ops for project issues..)
[18:17:22] anyone with the ability to make svn accounts can make labs accounts
[18:17:33] the dev process and labs are kind of mingled together
[18:18:15] also, if there is a problem with labs, and I'm not around, someone else on ops will likely take a look at it
[18:18:27] everyone works together on breakages, usually
[18:18:33] ok, so when we need to have an account or project created, we can basically ping anyone from ops? or certain people?
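Before the conversation moves on: the "one instance at a time" drain script Ryan offers to write above could look roughly like this. It is an untested sketch that assumes admin credentials in the environment and a nova setup where live migration actually works; all names are hypothetical:

```bash
#!/bin/bash
# Drain instances off a loaded compute node one at a time, since
# parallel live migrations were failing badly at the time.
# Usage: ./drain.sh <target-host> <instance-id>...
TARGET=$1; shift

for id in "$@"; do
    echo "live-migrating $id to $TARGET"
    nova live-migration "$id" "$TARGET"
    # Wait for the instance to leave the MIGRATING state before
    # touching the next one
    while nova show "$id" | grep -q MIGRATING; do
        sleep 10
    done
done
```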
[18:18:57] well, so far only a few have made projects and such, but they all have permissions
[18:19:22] We could look at something like AIAV or whatever it's called where account/project requests go and someone with access picks it up.
[18:19:27] we should really have some way of requesting project creations
[18:19:32] maybe on bz?
[18:19:44] I was also thinking of allowing anyone who is netadmin and sysadmin in a project to be able to create projects for others
[18:20:05] seems a good idea
[18:20:07] bz could work for it, yeah
[18:20:10] that would make sense but it's a bit of a security issue
[18:20:15] I think creating projects should be really easy
[18:20:23] well, actually not much of a security issue
[18:20:25] if u create 50 projects and 400 instances
[18:20:32] ah. true
[18:20:33] you get labs down
[18:20:34] You could easily limit projects/instances though
[18:20:42] right now projects are quota'd
[18:20:49] 10 instances per project
[18:20:49] petan, if they can make a vm in an existing project, what do they gain creating projects?
[18:20:59] Platonides: override quota
[18:21:01] the ability to create an unlimited number
[18:21:13] throttling
[18:21:39] well, it's something to think about
[18:21:44] Maybe have a voting system or something where 2 people who are netadmin in projects can create a new project... or have an account over a certain age.
[18:22:25] it could be permission based, and auto-confirm based, yeah
[18:22:35] Tbh though if it's something that forces us to use the wiki then I'd rather not :P
[18:22:47] I wouldn't want to do that with other things, but I see no reason to limit project creation to the web interface
[18:22:52] err
[18:22:54] *not to limit
[18:23:15] I don't see any realistic way of letting people create projects from the cli
[18:23:17] Yeah, that's more 'specialised' though - day to day stuff would be a pain to restrict ourselves to the web interface.
[18:23:49] yes, we're working on moving OpenStackManager specific things into openstack so that we can allow people to use the cli
[18:24:14] but, project creation isn't one we plan on changing
[18:24:28] we skipped the rules part :)
[18:24:36] Rules? how boring :P
[18:24:39] heh
[18:24:44] well, we do need to discuss that
[18:24:50] so, I had a meeting with legal yesterday
[18:25:03] people in general don't know how to name their instances so we sometimes end up with crazy ones, check nagios to see what all we have there
[18:25:03] we'll be writing up a terms of service, and a privacy policy soon
[18:25:19] so we could make naming conventions for instances
[18:25:19] petan: Could we group nagios by project? :)
[18:25:23] Damianz: yes
[18:25:29] people will be required to accept the terms of service to login
[18:25:30] will be working on that
[18:25:51] also, it's looking like we won't require people to identify
[18:25:54] ok, can you make a list of the most important parts of these terms?
[18:26:29] labs projects will be required to display the privacy policy and terms of use, and they must show a warning to users anywhere information can be collected
[18:26:47] also, private information should simply not be kept where possible
[18:27:06] ok
[18:27:23] so people can become check users on a test site, even if not identified to the foundation
[18:27:35] example
[18:27:48] hmm... that could be unexpected
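On the per-project quota Ryan mentions above (10 instances per project): in nova of that era, quotas were inspected and raised with nova-manage. The exact subcommand syntax varied between releases, so treat this as a sketch with a made-up project name:

```bash
# Show the current quotas for a project
nova-manage project quota testlabs

# Raise the instance quota for a project that needs more
nova-manage project quota testlabs instances 20
```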
[18:27:49] we'll likely need to remove public IPs from projects that don't properly display terms of use and privacy policy, or don't display warnings, until the project complies
[18:27:56] if the users logging in there aren't aware of that
[18:28:05] Will there be any project based terms or just labs?
[18:28:18] Platonides: it'll be displayed on the login page, in that situation
[18:28:36] Damianz: what do you mean?
[18:29:08] Ryan_Lane: is it possible to make an exception in case all members of a project are identified?
[18:29:17] so that there is no requirement to inform people etc
[18:29:27] I think we'd prefer to not have to identify people
[18:29:54] ok, but what if the members of a project chose to do that rather than changing the code of the sw to show the warnings etc
[18:29:57] it's an administrative burden
[18:29:59] For any projects that say require storing private information - will they be restricted with separate usage/terms stuff or not? For example (even though it shouldn't be dealing with live data) the donations stuff every year.
[18:31:03] so, access to any project that may hold production private data will require a signed confidentiality agreement with the foundation
[18:31:17] Makes sense
[18:31:46] access to those projects will require another role, so it won't technically be possible to log in, even if someone accidentally adds another person
[18:31:49] ok
[18:32:07] it also may be a separate hardware cluster, etc, etc
[18:32:18] we're still working out very sensitive data and labs
[18:32:44] can we get basic rules of what is allowed / disallowed and how people should name their instances :)
[18:32:45] That won't affect the general stuff though? Like deleted diffs and having access to the production dbs for say bots.
[18:33:05] probably the db should not contain private data
[18:33:07] like TS
[18:33:20] sure. of course, this is all a precursor to the terms of service
[18:33:27] 1. no hacking
[18:33:41] Ryan_Lane: *ehhem* cracking
[18:34:09] of course, authorized pen tests for things inside of labs may be OK, but only if an exception has been granted. meaning, you need to ask ops first.
[18:34:14] malicious hacking ;)
[18:34:31] 2. No proprietary software
[18:34:44] of course, it should also be possible to get an exception to that rule too
[18:35:01] there's some cases where we need proprietary software for testing. like db2 or oracle
[18:35:13] ok
[18:35:17] 3. No copyrighted data
[18:35:45] but even gpl sw has some kind of copyright, no?
[18:35:49] again, exceptions can be made, but people need to ask
[18:35:55] fair use is also ok
[18:36:06] right
[18:36:08] You mean copyright without decent access, like gpl'd stuff that someone holds the copyright for?
[18:36:12] sorry
[18:36:15] let me clarify :)
[18:36:27] only open source software, and open content licenses
[18:36:35] OSI approved open source licenses
[18:36:59] Awwww
[18:37:04] ok
[18:37:04] WTFPL isn't accepted by the OSI IIRC
[18:37:10] :P
[18:37:12] heh
[18:37:33] basically, we'd like to avoid getting sued by the RIAA or MPAA
[18:37:46] People get sued? I thought you just got raided by the FBI
[18:37:52] I have a feeling we didn't make friends with them with that whole SOPA thing ;)
[18:37:53] heh
[18:38:10] 4. No torrenting
[18:38:19] Downloading or?
[18:38:22] either
[18:38:25] :(
[18:38:28] no bittorrent
[18:38:37] even legal?
[18:38:42] Not even opensource stuff? Some things are only available for torrent download
[18:38:45] torrents of linux :)
[18:38:49] heh
[18:38:55] of course, there can be exceptions, but it'll require approval
[18:38:59] No offering of illegal content
[18:39:04] I wrote this
[18:39:16] well, that goes into 3.
[18:39:17] feel free to fix it
[18:39:24] ok I'll write it to the epad
[18:39:44] but yes, no illegal content, as defined within the United States. Legal may change all of this wording, btw ;)
[18:39:50] s/may/likely will/
[18:40:09] 4. No tor nodes
[18:40:11] Yeah I assume everything is 'as in the US' as the servers are over there and wmf is over there mainly.
[18:40:20] again, exceptions can be made to this policy
[18:40:25] Damianz: yep
[18:40:35] and we have no plans to have labs infrastructure outside of the US
[18:40:46] this is difficult enough, legally, without involving other countries
[18:41:26] 5. Only Wikimedia related work is allowed
[18:41:31] Any thoughts on proxies in general? (not say socks for accessing labs but web etc, more along the lines of tor).
[18:41:35] of course, that's pretty broad
[18:42:12] #4 was no tor nodes ;)
[18:42:20] or anything related to that
[18:42:28] Open web proxies etc I assume fall under tor
[18:42:36] no VPNs from labs to other networks, etc
[18:42:45] no open proxies. right
[18:43:44] if you guys could think of other things I should specifically mention, let me know
[18:43:53] I may have a bit of a question here .. I am storing 'external link additions' in SQL, linked to who added which external link in which diff on Wikimedia projects .. that may be sensitive data
[18:43:55] there's no reason only the WMF can draft these :)
[18:44:08] Beetstra: is it publicly available?
[18:44:13] 'Abuse of resources', which is very wide and could be nazi, but just in general if someone is taking the mickey
[18:44:23] meaning, is that data you are pulling from public sources?
[18:44:36] In principle, yes .. I am pulling it out of the live diffs
[18:45:28] 6. Actions deemed abuse can result in loss of privileges. This is at the discretion of the Labs staff and community.
[18:45:40] 7
[18:45:45] yay for catch-alls! :D
[18:45:53] Beetstra: then it isn't private
[18:45:55] :P
[18:46:10] I still want a wikimedia email :P
[18:46:18] yes. we're working on that
[18:46:31] yay
[18:46:32] It should be possible to do pen testing as there are a few (ehm) security holes
[18:46:33] I want it too :D
[18:46:34] :D
[18:46:34] things are harder when thinking multi-tenant :)
[18:46:59] jeblad: yes, as mentioned, there can be exceptions for pen testing, but it will require approval
[18:47:42] in other words, we need to know it's going on, otherwise it looks like malicious hacking
[18:47:42] OK
[18:48:02] general rules for instances? how they should be named, resources etc
[18:48:03] ok,..
[18:48:08] so, the wikimedia related work rule needs to be clarified some
[18:48:14] If you want to do that it should probably go on the mailing list for community/op approval
[18:48:37] working on mediawiki, even if not for Wikimedia related projects, is still Wikimedia related, for instance
[18:49:08] Yeah...
[18:49:10] so, if anyone is confused about whether their work is Wikimedia related or not, they should ask :)
[18:49:15] But then it could be a patch of something commercial
[18:49:19] Just as a side comment, I'm not sure that anyone would detect the more advanced penetrations.. Just my 5¢
[18:49:30] jeblad: I don't disagree
[18:49:31] s/of/or/
[18:49:43] which actually brings up something that I wanted to mention :)
[18:49:59] everyone should be self-policing their projects
[18:50:14] if you add someone to a project, you should ensure they are doing what they said they'd be doing
[18:50:20] will labs be limited to mw related stuff or can it be used more like testing interesting technologies?
[18:50:32] jeblad: not limited to mediawiki
[18:50:46] anything that could be used for wikimedia related work
[18:50:47] Interesting tech stuff might be related to wmf work for ops etc
[18:51:03] For example, I have some wild ideas about stats and need a mongodb .. oki
[18:51:12] yep. that's totally fine
[18:51:36] so, going back to self-policing projects...
[18:51:46] I do some policing, but I don't scale terribly well
[18:51:55] * Damianz makes Ryan_Lane multi threaded
[18:52:00] heh
[18:52:09] would it then be a vm and config left to the interested ones or would it be necessary to set up a complete clone?
[18:52:28] jeblad: what do you mean?
[18:52:51] how much must be done by ops at wmf
[18:52:57] nothing
[18:52:57] In regards to policing and going back to creating projects - who polices projects? ops, community or everyone?
[18:53:09] in labs, project members can do everything
[18:53:23] well, people with sysadmin and netadmin in a project can do everything
[18:53:26] excluding one thing
[18:53:36] they can't allocate public IP addresses
[18:53:46] since we are very short on them
[18:53:52] Damianz: everyone should
[18:53:54] * jeblad plans to shoot himself in both feet and then some more feet
[18:53:58] heh
[18:54:26] if someone sees something odd going on, send an email to security@wikimedia.org
[18:54:33] and we'll investigate further
[18:54:37] It might be interesting to have per project status and maybe some news of 'what's been going on in labs' or such.
[18:54:44] yeah, would be cool
[18:54:57] we have SMW on labs...
[18:55:19] so, we could add them there, and have it compile a monthly/weekly report
[18:55:20] + SAL
[18:55:29] that somehow reflects status heh
[18:55:34] yeah
[18:55:54] nice to see in human readable terms what people are doing, for the wider community
[18:56:09] so the idea of status updates is kind of cool
[18:56:17] not required, of course :)
[18:56:36] so, instance naming conventions....
[18:56:47] we should probably have some conventions :D
[18:56:55] I guess people somehow must hook up with each other. Is there any plan on how to accomplish that?
[18:56:57] right now it's basically a free-for-all
[18:57:14] jeblad: Yell for help and someone will do stuff
[18:57:21] people = those interested in a specific project
[18:57:22] jeblad: well, projects are documented with who's a member; basically, you just check out the project page and talk to some of the members
[18:57:26] !project bots
[18:57:26] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots
[18:57:32] ^^ see that page, for instance
[18:58:06] Reminds me I should change CBNG's mysql pass...
[18:58:08] I don't necessarily have an issue with the instance naming free-for-all, but there's a few cases where instances are really poorly named
[18:58:28] Damianz: did it get logged in the SAL or something?
[18:58:39] Then a last question and I shall shut up: how will this interact with toolserver.org?
[18:58:44] Nah it's just something really crappy and we have a bots phpmyadmin somewhere.
[18:58:57] Not important, time's up
[18:59:07] jeblad: as of now, it doesn't
[18:59:14] Not like I have it running mysqldumps on cron either :(
[18:59:21] an end-goal is a toolserver-like environment in labs
[18:59:24] Also is there any plan in regards to backups generally?
[18:59:31] so, it's a replacement, more than anything
[18:59:42] Damianz: nope. in general we won't backup most things
[19:00:02] instances will never be backed up. they should be puppetized
[19:00:06] Most stuff should be re-deployable from puppet but stuff like mysql I assume will fall to per project/member
[19:00:14] mysql servers will have backup
[19:00:25] we'll do that on the server side
[19:00:37] :)
[19:00:46] we plan on having hardware mysql that's manageable via a service
[19:01:03] Would be nice, vm disks are slow.
[19:01:16] yes. IO in virtualization sucks
[19:01:16] what kind of backup for sql
[19:01:19] online backups?
[19:01:23] or daily
[19:01:25] likely LVM snapshots
[19:01:28] ok
[19:01:36] though they'll probably be rotated often
[19:02:32] if project members want more long-lived backups, they can do dumps and store them in their gluster storage
[19:02:45] of course, that storage will be quota'd
[19:02:48] what if someone will test io intensive projects?
[19:02:54] Just thinking I'd rather not lose 2 months of data for a bot again...
[19:02:59] jeblad: what do you mean?
[19:03:07] Damianz: yeah, I'd also prefer that :)
[19:03:19] I dislike data-loss with a passion
[19:03:31] Mhm
[19:03:35] the gluster storage can act as a storage area for backups of a project
[19:03:44] but, ops isn't going to do it for you ;)
[19:03:51] server - client traffic like a gadget interacting with a mongodb
[19:04:22] jeblad: in general, instances that eat a lot of resources mostly only affect themselves
[19:04:38] though if there are too many of those on one host, it's problematic
[19:04:47] we'll move the instances around to spread the load
[19:05:00] I have a question/suggestion regarding instance naming. I've been annotating my instances with usage notes on the summary wiki (e.g. https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000131)
[19:05:16] My question is -- is that page actually persistent, or will my notes get clobbered at random times?
[19:05:24] clobbered, unfortunately
[19:05:35] I'll add sub-page support for instance notes
[19:05:37] OK, that's not a good place to store that info, then :)
[19:05:39] so that it won't be clobbered
[19:05:41] Cool.
[19:05:51] similar to how documentation is done on project pages
[19:06:41] anything else to talk about? :)
[19:06:52] documentation?
[19:06:56] I'm going to assume probably not right now, but do we have a way of deploying out the test cluster or are we looking to puppetize it properly when we split up the repo into project branches and re-do the merging stuff?
[19:07:21] we'd *very* much like to have it all puppetized fully
[19:07:24] yes
[19:07:40] because we plan on using it for automated infrastructure testing
[19:07:44] btw there is still a lot of stuff to talk about :)
[19:08:01] oh yeah. project branches
[19:08:14] the test branch is basically fucked beyond repair right now
[19:08:30] It would be nice to be able to have say Jenkins build a cluster, deploy out mw how it is in production, test stuff, then murder itself.
[19:08:43] Damianz: yeah, that's the goal
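On Ryan's "likely LVM snapshots" answer for MySQL backups above: the usual pattern is to hold a global read lock only long enough to snapshot the volume, then copy the data out of the snapshot at leisure. A sketch; volume-group and path names are made up, and it assumes the datadir lives on its own logical volume:

```bash
#!/bin/bash
# Snapshot-based MySQL backup. The FLUSH ... READ LOCK is held only
# while the snapshot is created, because the mysql client's "system"
# command runs while the session (and therefore the lock) is open.
mysql -u root <<'EOF'
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql
UNLOCK TABLES;
EOF

# Copy the consistent data out of the snapshot, then drop it
mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
tar czf /data/project/backups/mysql-$(date +%F).tar.gz -C /mnt/mysql-snap .
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap
```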
[19:09:02] that way we can test to ensure both ops changes, and major mediawiki changes, don't totally break things
[19:09:19] that would be an incredibly expensive test, so would likely not get run for most changes
[19:09:38] It would be good for testing major arch changes though
[19:09:43] yep
[19:10:12] so, with regards to the test branch...
[19:10:21] the way we are currently doing things sucks
[19:10:28] mhm
[19:10:42] I want to move us to per-project branches, where instances are configured to use that branch
[19:10:50] but also, where anyone can switch to another branch
[19:11:09] ideally, instances will usually run the test branch
[19:11:10] Having a centralised puppet server for labs with all branches there if needed?
[19:11:22] we'd like to get rid of the centralized puppet server
[19:11:34] and instead have the repo live on each instance
[19:12:00] How would that work for certain things like passwords/contacts which aren't in the repo but are linked when we merge production back into test?
[19:12:13] also, we'd like changes to project branches to be able to skip review
[19:12:38] Damianz: we're going to have to figure that out
[19:12:52] but we'll likely also just have the private repo on all instances too
[19:13:04] no reason we can't. everyone has access to it anyway
[19:13:23] (private labs repo, production one is only accessible to ops)
[19:13:34] s/,/./
[19:14:00] Yeah, we'd just need some op input to ensure production changes are replicated in labs so puppet doesn't freak out.
[19:14:19] well, puppet would be local
[19:14:27] so, when you ran it, you'd see it freak out
[19:14:43] if we see long-broken puppet in instances, we'll just switch the branch to test
[19:15:09] test will always be assumed to work
[19:15:22] and changes in test will be quickly cherry-picked to production
[19:15:42] I was more thinking if you stuck another var in private which the manifests referred to, it would need someone to add it to the private labs repo so puppet didn't break when the manifests were merged back - if we're trying to keep test mergeable with production.
[19:15:58] yep
[19:16:13] that'll be part of the testing process ;)
[19:17:20] petan: what's the next topic?
[19:17:22] let's move to tools?
[19:17:34] I mean the tool lab part
[19:17:47] there are no tools like on toolserver now because of the lack of a db
[19:18:22] but once we replicate the production db to labs so that people can access it, people will be able to create toolserver-like tools on labs
[19:18:24] Db access would be highly amazing... relying on the TS for stuff on labs just for out of date dbs is annoying.
[19:18:49] is it going to be managed like a project per tool
[19:19:04] because that would mean an instance per tool
[19:19:16] so a lot of resources while not needed
[19:19:26] DB access would be amazing...
[19:19:44] we'll have replicated database access
[19:19:59] we haven't planned it out terribly well, yet
[19:20:12] Ryan_Lane: Hopefully it won't be hours out of date or broken many times a month :P
[19:20:13] and yes, we're planning an instance per-tool
[19:20:31] I believe we'll be monitoring it like the rest of the infrastructure
[19:20:41] and when master switches happen, it'll be updated with everything else
[19:20:48] :))
[19:20:51] :D
[19:20:56] * hoo|away is in love with Ryan_Lane again...
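The "repo lives on each instance" model Ryan describes above is essentially masterless puppet: clone the branch you want and apply it locally. A sketch; the repo name (operations/puppet.git) appears elsewhere in the log, but the clone URL form, paths, and modulepath are assumptions:

```bash
# Check out the puppet repo on the instance itself
git clone https://gerrit.wikimedia.org/r/operations/puppet.git /srv/puppet
cd /srv/puppet
git checkout test   # or a per-project branch, once those exist

# Apply it locally instead of talking to a central puppetmaster
puppet apply --modulepath=/srv/puppet/modules /srv/puppet/manifests/site.pp
```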
[19:21:08] it's likely going to take a while for us to get to this
[19:21:31] way too much stuff to do right now :)
[19:21:32] petan: 'Shared instances' apart from the bots that have their own instances so don't crash the others from being a usage whore?
[19:21:47] it's not a requirement to have a shared instance
[19:21:50] Damianz: that's one of our reasons for wanting separate instances
[19:22:05] also, the hardware we are getting in will support thousands of instances
[19:22:13] but if a bot doesn't eat a lot of cpu / memory there is no need to have it on a separate instance
[19:22:23] like wm-bot
[19:22:39] bots can more easily be handled in a shared way
[19:22:46] tools are more difficult
[19:22:49] Tbf I'd like in tests to be able to trash and re-launch my bot instances as needed to test stuff, then merge into the labs 'production' stuff and have it updated by puppet.
[19:23:01] "the others from being a usage whore?" - /me stares at ClueBot :P
[19:23:02] we should have a scheduler which runs a bot on the instance with the lowest load
[19:23:09] +1
[19:23:20] SGE ?
[19:23:32] that may be excessive ;)
[19:23:35] but it's possible
[19:23:44] methecooldude: CB is going to get packages when I get bored so we can shift it about if needed.
[19:23:49] s/packages/packaged/
[19:24:26] I'd really like all bots to be packaged and puppetized
[19:24:28] In regards to bots - is the standard to run them under their own users or labs users? I'd say the former if we're going towards puppet.
[19:24:30] with init scripts and such
[19:24:43] their own system users
[19:24:52] if they are packaged, that's easy :)
[19:24:58] Yeah
[19:25:04] ok
[19:25:05] also, they should be installed on all bot-runners
[19:25:08] Like shifting cbng from supervisord to upstart would be cool
[19:25:27] then moving a bot is a matter of stopping it on one instance, and starting it on another
[19:25:30] It would be kinda nice if based on ganglia/nagios we could move or restart bots
[19:25:44] yeah. I could see that going badly, though ;)
[19:25:49] lol
[19:25:51] probably easier to move them manually
[19:26:18] we should move log to bots-1 btw
[19:26:30] because it's running on the overloaded bots-2 and crashing
[19:26:38] Have a server dead, oh these bots are on there -> start them on another instance wouldn't be too bad... we'd need a STONITH approach though
[19:26:39] !log test
[19:26:41] like now
[19:26:43] it's gone
[19:27:23] Damianz: delete the instance ;)
[19:27:32] if all of the instances are the same, why not?
[19:27:38] Which as another note - do we plan on supporting access keys to openstack whereby we can have scripts pull stats/do stuff without our user key?
[19:27:44] that's about as STONITH as you can get ;)
[19:27:48] :D
[19:27:52] DIE!
[19:28:01] yes. cli access is in the plans
[19:28:04] it's hard, though
[19:28:24] because openstack doesn't support puppet or dns (though thanks to andrewbogott it will in essex)
[19:28:35] * Damianz gives andrewbogott a cookie
[19:28:43] also, it doesn't support writing to mediawiki
[19:28:46] andrewbogott is working on all of these things
[19:28:58] as well as some other great things :)
[19:29:04] Actually, right now I'm eating lunch.
[19:29:08] Oh
[19:29:13] * Damianz takes the cookie off andrewbogott until he does work
[19:29:14] :D
[19:30:12] petan: hm. the bot is dead again? that's odd
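"Packaged ... with init scripts" for bots, as discussed above, could be done with a small upstart job, which also gives the respawn-on-crash behaviour the channel keeps wishing for. Bot name, user, and paths are hypothetical, and the su wrapper is used because older upstart versions (e.g. on lucid) lack a setuid stanza:

```bash
cat > /etc/init/examplebot.conf <<'EOF'
description "example labs bot"
start on runlevel [2345]
stop on runlevel [!2345]
# Restart the bot if it dies, but give up if it flaps
respawn
respawn limit 5 30
# Drop privileges to the bot's own system user
exec su -s /bin/sh -c 'exec /usr/bin/python /srv/examplebot/bot.py' examplebot
EOF

# Moving the bot then really is "stop it on one instance,
# start it on another"
start examplebot
status examplebot
```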
[19:30:30] I'd love to be able to restart it from labsconsole
[19:30:30] it's because it's running on a heavily loaded server
[19:30:33] yeah
[19:30:33] It seems to be randomly dying for the past couple of weeks.
[19:30:44] it should be moved to bots-1
[19:30:58] another reason to have a load balancer
[19:31:05] Ryan_Lane: I'd like to be able to see all bot/tool status centrally and cull as required, but that's more a project thing.
[19:31:09] I'll put some effort in to update the package (since I modified the code some, since then) and move it
[19:31:15] Damianz: nagios?
[19:31:27] we'll have ganglia soon
[19:31:29] petan: Are any bots monitored in nagios?
[19:31:35] I hope it will be
[19:31:37] Actual bots, I know mine aren't.
[19:31:56] if we added a class to nova, people could define their own services using the console
[19:32:01] labs console :)
[19:32:15] nagios parses stuff using smw from the wiki
[19:32:20] Hmmm
[19:32:43] so if there was a generic service where people would just pick and define the check, it would be possible
[19:32:54] That would be shiny
[19:33:16] also I want to make access to nagios use ldap
[19:33:23] * Ryan_Lane nods
[19:33:24] however I didn't find out how
[19:33:36] we'll have to talk about that
[19:33:40] ok
[19:33:56] we really need some form of web sso
[19:33:59] In regards to ldap, don't we have 2 ldap servers atm? As in one that does some stuff and one that does other stuff, rather than instances replicated.
[19:34:14] our ldap situation right now kind of sucks
[19:34:29] I need to rearchitect everything in production
[19:34:34] SSO for mw would be nice; being able to use wikipedia details reliably would be amazing for project stuff :(
[19:34:38] for labs we currently have one LDAP server
[19:35:26] we also only have one DNS server ;)
[19:35:38] Oh awesome
[19:35:41] when I bring things up in eqiad, it'll all be redundant
[19:36:38] building a robust infrastructure takes time :)
[19:36:44] there's a reason we're still in closed beta. heh
[19:37:06] ok so let's discuss the sql server now
[19:37:12] * Ryan_Lane nods
[19:37:22] so, we'll likely have two clusters
[19:37:35] one that is replicated from production, like toolserver
[19:37:46] and another that allows users to create/manage their own databases
[19:38:07] ok, is it going to be only one sql service, or would it be possible to have a server per project
[19:38:11] makes sense
[19:38:11] could they be joined
[19:38:13] ?
[19:38:17] so that people can have unique db's etc
[19:38:38] Is the plan to run MariaDB over MySQL?
[19:38:42] maria
[19:38:44] Platonides: hm. that may be difficult
[19:38:56] it was maria
[19:39:00] I'm not totally sure now
[19:39:20] I don't want people storing their ldap password on a filesystem
[19:39:35] and if we have LDAP auth, it's likely
[19:39:59] I think I'd really prefer a service that gives the user a password that is specific to the database, and can easily be changed
[19:40:09] Ryan_Lane, one big feature is to be able to join a user table with a cluster one
[19:40:12] Having something on labsconsole that lets you make a user wouldn't be that hard though.
[19:40:15] Platonides: ah
[19:40:24] Platonides: I'll keep that in mind, then
[19:40:45] Damianz: yeah, something along those lines would be nice
[19:41:00] of course, we need to ensure people can't delete/modify other people's databases...
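petan's "access to nagios using ldap" wish above is normally solved with Apache's mod_authnz_ldap in front of the nagios CGIs. A sketch with placeholder server and base DN (not the real labs values):

```bash
a2enmod ldap authnz_ldap

cat > /etc/apache2/conf.d/nagios-ldap <<'EOF'
<Location /nagios>
    AuthType Basic
    AuthName "Labs LDAP login"
    AuthBasicProvider ldap
    # Server and base DN are placeholders
    AuthLDAPURL "ldap://ldap.example.org/ou=people,dc=example,dc=org?uid"
    Require valid-user
</Location>
EOF

service apache2 reload
```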
[19:41:03] we also need quotas
[19:41:04] the toolserver does that with the -user servers of the cluster, where you can create dbs
[19:41:12] having this per-project would be ideal
[19:41:36] Platonides: is it per-request, or do all users have the ability to make as many dbs as possible?
[19:41:38] so there is no plan of having a sql server per project?
[19:41:50] like people could have their own databases and root access
[19:41:53] It would be kinda nice if you're a member of the project you can access all the dbs, but I'm not sure how that would work auth wise.
[19:42:19] Ryan_Lane, for each wmf cluster you can connect with dns like s1-rr or s1-user
[19:42:28] I can't imagine how we make more production-like wiki installations on a shared sql server
[19:42:33] depending if you want to use the user server or are just interested in the data
[19:42:39] because db names will conflict etc
[19:42:51] I think you can create new databases if they begin with your user name
[19:43:05] petan: as mentioned, having it per-project would be nice
[19:43:11] ah
[19:43:16] We could do something crazy like have a proxy in front of them and per project prefixes on dbs but that would be insane.
[19:43:25] :D
[19:43:31] well, we do plan on using tungsten
[19:43:32] see https://wiki.toolserver.org/view/Database_access#User_databases
[19:44:09] Forced prefixes are annoying
[19:44:25] what's the problem in requiring to configure dbs with the proper name on each instance?
[19:44:26] indeed
[19:44:27] easiest way to avoid name conflicts, though
[19:44:37] Platonides: performance
[19:44:54] you can't have a dbs on an instance because it's slow
[19:44:59] uh?
[19:45:01] I don't really like per-user databases...
[19:45:12] Also at some point we'd have to share the physical servers if labs gets busy..er.
[19:45:16] petan, I don't mean installing a mysqld on each instance
[19:45:19] ah ok
[19:45:22] in fact, I *really* don't like it. I'd much prefer per-project databases
[19:45:23] nvm
[19:45:33] I was talking about configuring on each instance/project which db it should connect to
[19:45:48] instead of doing ugly things with a proxy prefixing dbs
[19:45:54] ah. yeah
[19:45:58] that's possible
[19:46:20] anyway, if anyone wants to work on this before we get to it… :)
[19:46:26] if needed, the app could eg. extract the prefix from the uname, or something similar
[19:46:58] I am a bit lost now, if someone could insert it to the epad, that would be cool
[19:47:27] one question; I was under the impression that labs was testing, but am I correct that it is more like "the place for all weird stuff"?
[19:47:39] heh
[19:47:47] define weird
[19:47:50] I think with some projects we're going to not fit into that model though - they are probably better using hadoop or some nosql server if they want 200GB dbs though I guess.
[19:48:15] jeblad: it's a community maintained environment that is mostly geared around testing
[19:48:17] s/we're/are'nt'/
[19:48:20] Really can't type tonight gah
[19:48:38] Closed beta testing env running production stuff!
[19:48:47] jeblad: some aspects of it are quasi-production
[19:48:55] well, not production from the WMF's POV ;)
[19:49:26] ok, .. quasi-production from vm-servers.. :-]
[19:49:34] If it was production we could wake Ryan_Lane up at 2am to fix stuff :D
[19:49:39] :)
[19:49:48] :d
[19:49:49] err
[19:49:50] :D
[19:50:04] notice none of you have my cell phone number :)
[19:50:08] we need to implement sms nagios :)
[19:50:23] Phone? So last year, I'd just annoy you on g+ :P
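The toolserver-style prefix scheme Platonides links to above can be expressed as a single wildcard grant, so a project (or user) can create any number of databases under its own prefix without a proxy rewriting names. Account names and the password here are made up:

```bash
mysql -u root -p <<'EOF'
CREATE USER 'bots'@'%' IDENTIFIED BY 'not-a-real-password';
-- The escaped underscore keeps _ literal, so this matches bots_*
-- (e.g. bots_cluebot, bots_wmbot) and nothing else
GRANT ALL PRIVILEGES ON `bots\_%`.* TO 'bots'@'%';
EOF
```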
[19:50:25] I do get alerts if labs is done
[19:50:31] *down
[19:50:43] we'll test it at 2 am
[19:50:47] no thanks :)
[19:51:03] I like the production nagios alerts based on timezone
[19:51:05] * jeblad writes note about writing an app for wake up call to Ryan_Lane with a big red button for every member on "labs"
[19:51:18] Damianz: yeah, we added that recently
[19:51:25] we have enough people for that to work now
[19:51:49] Yeah... I need more ops people in the US for that to work, my entire team is UK based :(
[19:51:49] jeblad: I'll retaliate with a stab button :)
[19:52:22] ok, so what are the expected limits on the sql server
[19:52:28] I suppose certain queries will not be allowed
[19:52:42] + there will be timeouts for queries
[19:52:44] I unfortunately suck at mysql....
[19:52:57] MySQL is kinda dependent on what other commodity services we have though IMO.
[19:53:01] so hopefully someone else will be helping with this
[19:53:25] I haven't put a ton of thought into the mysql support
[19:53:39] ok
[19:54:08] we're hiring, though, so hopefully that person will have better knowledge of it. or maybe asher will help out.
[19:54:51] right, let's discuss gluster storage now
[19:55:02] Tungsten looks cool, can't say I've ever used it but I've got some MySQL replication stuff coming up in projects :D
[19:55:11] I don't really know how gluster works at all
[19:55:19] Damianz: yeah, we want it for the multiplexing and filtering support
[19:55:30] Yeah binlog filtering sucks ass.
[19:55:50] gluster is a distributed block-based filesystem
[19:56:12] ok so there will be a huge filesystem right?
[19:56:16] Yeah
[19:56:20] yes
[19:56:24] we have 4 nodes
[19:56:26] each project will have assigned storage there
[19:56:30] with some quota
[19:56:34] each node has 40TB of storage, roughly
[19:56:45] gluster acts like a raid-1 across the network
[19:56:50] ok
[19:56:51] 8D
[19:56:52] so, we have roughly 80TB of storage
[19:57:12] would it be possible to change quota on the fly?
[19:57:24] like this project needs 80gb+
[19:57:25] we don't have any software written for any of this yet :)
[19:57:39] right now I'm going to write a script like we have for home directories
[19:57:47] Are the nodes in a 1<>1 or peered together (so less like a standard raid1 config)?
[19:57:52] andrewbogott is writing a nova volume driver for it, though
[19:58:02] peered
[19:58:03] ok, is the performance of this gluster storage going to be better than the performance of the current fs
[19:58:04] on instances
[19:58:13] very likely, yes
[19:58:41] how is it going to be mounted on instances
[19:58:50] there will be like a shared mountpoint on all instances?
[19:58:54] the way things look right now, it's: ext3 -> qcow2 -> gluster -> xfs -> lvm -> raid10
[19:58:56] like an nfs server?
[19:59:17] with volume storage it'll be: gluster -> xfs -> lvm -> raid10
[19:59:26] yes
[19:59:36] it'll be automounted, and that config will be in LDAP
[19:59:45] so basically each project will have a big folder mounted on each instance
[19:59:50] yep
[19:59:59] is it possible to split it
[20:00:21] in which way?
[20:00:21] like to have 2 separate gluster storages for 1 project
[20:00:26] ah
[20:00:31] I assume the nova driver will connect out to all nodes so if we shoot one in the foot it won't take down all the instances.
[20:00:35] if you for instance fill one up, the other still has space
[20:00:37] ..well bots.
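Per-project quotas on the shared gluster volume, plus the mount the automounter would end up performing on each instance, look roughly like this. Volume, directory, and server names are assumed:

```bash
# Quota a project's directory on the shared volume
# ("would it be possible to change quota on the fly?" -- yes, this
# can be re-run with a new limit at any time)
gluster volume quota project-storage enable
gluster volume quota project-storage limit-usage /bots 80GB

# What the LDAP-driven automount boils down to on an instance
mount -t glusterfs labstore1:/project-storage /data/project
```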
[20:00:37] I kind of wanted the ability for projects to have x number of volumes
[20:00:43] where x is quota'd
[20:00:44] ok
[20:00:52] and each volume would be quota'd
[20:01:21] Damianz: well, the nova driver won't actually mount it on the instances
[20:01:28] it'll just make it available to the instances
[20:02:02] Ah
[20:02:03] automount will mount it
[20:02:16] Are you planning on keeping instance storage local or shifting it over?
[20:02:21] local
[20:02:28] :)
[20:02:29] I want them to be separate clusters
[20:02:37] also, it just makes a lot of sense :)
[20:02:45] Yeah
[20:05:52] ok, the last point is documentation heh
[20:05:59] I started Help:Contents on labs
[20:06:00] We have documentation?
[20:06:09] I think it would be cool to move it all to the Help space
[20:06:16] Damianz: some :)
[20:06:21] petan: agreed
[20:06:31] also to insert it into the Sidebar
[20:06:35] yep
[20:06:37] like a link to Contents
[20:06:51] it would be cool to have a link on each nova special page
[20:06:55] to a specific manual page
[20:07:42] ah. yeah, that would be useful
[20:07:46] petan: I changed the sidebar
[20:07:49] cool
[20:07:55] people can actually do this
[20:08:01] @search security
[20:08:01] Results (found 1): security,
[20:08:04] !security
[20:08:04] https://labsconsole.wikimedia.org/wiki/Security_Groups
[20:08:08] :)
[20:08:17] but having a link directly on a page would be faster
[20:08:20] I'd kinda like per project or, in the bots case, per bot docs.
[20:08:33] I meant documentation for labs now
[20:08:34] Ideally anyone in a project should be able to fix other people's stuff.
[20:08:39] but projects could have their own docs too
[20:08:43] Unfortunately, we need to limit sysop access on labsconsole
[20:08:56] so, any admin related actions need to go through ops
[20:09:19] or just create a new usergroup in the wiki, like with editinterface etc
[20:09:37] but I don't really think there is too much to do that requires a sysop
[20:09:47] writing documentation can be done by regular users :)
[20:09:52] https://bugzilla.wikimedia.org/show_bug.cgi?id=34500
[20:10:06] petan: ah. right
[20:10:15] I forget that's possible :)
[20:11:21] is there anything you guys hate about the openstackmanager interface? features you'd like added?
[20:11:34] I hate that it randomly decides people don't exist
[20:11:41] yeah. I hate that too
[20:11:50] I still haven't been able to track that bug down
[20:11:58] I think we should have an improved interface so it can handle like thousands of projects
[20:12:02] Also being able to list instances in other projects would be nice
[20:12:07] in future it will be hard to navigate through
[20:12:07] petan: yep. working on that now :)
[20:12:11] I can find them by searching for their pages but no list :(
[20:12:19] by default no projects will be shown
[20:12:30] we'll have a project filter where you can select which projects you'd like shown
[20:12:42] ok
[20:12:49] maybe better error messages heh
[20:12:57] with instructions on what is wrong
[20:12:57] yeah. the messages suck :(
[20:13:12] like people are coming here asking why they can't do this and that
[20:13:14] maybe I can pass through the openstack error messages, when they exist
[20:13:15] Also what's 'managed sudo policies' for?
[20:13:26] Damianz: it's an admin specific thing
[20:13:29] Error: you can't insert a new security rule because you aren't a netadmin in project "YAAY"
[20:13:29] we can manage sudo via LDAP
[20:13:40] instead of "Fail"
[20:13:44] ah
[20:13:55] well, I'm actually just removing those options when a user doesn't have the rights :)
[20:14:07] :)
[20:14:15] it's a lot more user-friendly
[20:14:29] Apart from listing instances in other projects and the horrid table in table in table project screen I can't think of much else.
[20:14:35] heh
[20:14:43] I'm also changing the project list
[20:14:46] to be headings
[20:14:51] projects can be categorized in future
[20:15:04] like a matrix of projects etc
[20:15:20] people could click through to find an interesting project to work on
[20:15:25] ah. hm. I wonder how I can implement that...
[20:15:27] There should be some docs about how to be involved
[20:15:33] jeblad: indeed
[20:15:38] there should be some docs :)
[20:15:40] +1
[20:15:46] :)
[20:15:47] documentation is hard :)
[20:15:56] Code is self documenting!
[20:16:02] I disagree :D
[20:16:03] meh
[20:16:14] Now it seems like it is only about what you want to achieve and that it is a community effort, but nothing about how to involve the community
[20:16:15] Damianz: you didn't see my code
[20:16:24] Well.. sometimes it's quicker to browse the code than read the docs :P
[20:16:46] for (x in blah Blah) etc (really I name variables blah sometimes)
[20:16:50] jeblad: It's kinda like you annoy Ryan_Lane then someone else then do stuff... organisation is lacking.
[20:17:04] yep
[20:17:16] I know one possible help
[20:17:24] liquid threads, for one
[20:17:32] and project access requests on project talk pages
[20:17:39] That would be nice
[20:17:40] people should know that Ryan isn't the only person who can help
[20:17:44] !ryan
[20:17:44] yes
[20:17:47] !Ryan
[20:17:47] man of the all answers ever
[20:17:48] A general 'I have no idea what I'm doing' page would be good too
[20:17:51] !Ryan del
[20:17:51] Successfully removed Ryan
[20:17:54] heh
[20:18:31] !Ryan is man of all answers ever (but there are others :))
[20:18:31] Key was added!
[20:18:39] there's a log way back where I explained in great detail how things work
[20:18:43] in this channel
[20:18:51] I should take that and turn it into documentation :)
[20:18:59] petan: Oh also could you move the un-foldered logs into the folders... it's confusing :P
[20:19:15] we should link to this channel's IRC logs from labsconsole too
[20:19:24] unfoldered logs are from time we had wm-bot
[20:19:27] * Damianz really should stop logging into apache1 and grep'ing petan's home dir for reference.
[20:19:29] * before we had
[20:19:45] Damianz: there is a project to make a search engine
[20:19:50] :D
[20:19:54] it's in wikimedia svn
[20:20:00] if you want to help, please make it :)
[20:20:04] Ooh shiny except I don't have a svn account :P
[20:20:12] that's a pain to ask for one
[20:20:30] well, you have a labs account, you just need to be added into an svn group
[20:20:38] it usually takes a few weeks and you have to pass a test that you are a proper programmer heh
[20:20:42] Though you might have given me an excuse to read how to ask for it.
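"Manage sudo via LDAP" above refers to sudo's sudoRole schema: policy entries live in the directory instead of /etc/sudoers, one subtree per project. A sketch of one permissive per-project role, with an entirely hypothetical DN layout:

```bash
ldapadd -x -D "cn=admin,dc=example,dc=org" -W <<'EOF'
dn: cn=default,ou=sudoers,cn=bots,ou=projects,dc=example,dc=org
objectClass: sudoRole
cn: default
sudoUser: ALL
sudoHost: ALL
sudoCommand: ALL
EOF
```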
[20:20:50] Meh
[20:20:53] petan: with the switch to git, that all goes away
[20:20:57] I hope
[20:21:05] it's review before merge
[20:21:15] Pretty sure all my wikipedia rights have been from annoying admins I know :P
[20:21:45] so it doesn't matter if someone has no programming skills, we'll see it in the code
[20:21:46] * petan thinks that the rollback bit is less destructive than sudo reboot
[20:21:53] Ryan_Lane: Can't the svn stuff that's turning into git just use git review?
[20:22:08] yes
[20:22:45] :)
[20:22:58] Damianz: ur grepping my home?
[20:23:01] wondering what is there
[20:23:08] the logs are in /mnt/public_html
[20:23:09] petan: Well, the public_html dir
[20:23:16] symlink :)
[20:23:19] true
[20:23:35] I was too lazy to fix a bot, so I fixed it using a link
[20:23:42] Lol
[20:23:51] We should patch Ryan_Lane's code to mount bind them :P
[20:25:50] Ryan_Lane: can you make an account for one huggle project coder
[20:25:56] for labs
[20:26:09] yeah. where's he at?
[20:26:16] I hope he comes soon
[20:26:28] I need to eat lunch :)
[20:26:37] is the meeting over?
[20:26:51] I guess yes, unless someone has more questions
[20:27:04] I will try to summarize it all
[20:27:42] I need food :D
[20:28:09] I think the next meeting needs some noobs, maybe once we're not in closed beta anymore.
[20:28:11] cool. good meeting guys.
[20:28:20] want to have one every month or so?
[20:28:27] it would be cool
[20:28:35] like, to have all the people online :)
[20:28:44] hello
[20:28:48] Ryan_Lane: mmovchin is that guy
[20:28:50] howdy
[20:28:55] mmovchin: you'll be working on huggle?
[20:29:08] !account-questions | mmovchin
[20:29:08] mmovchin: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your SVN account name, or your preferred shell account name, if you do not have SVN access. 3. Your preferred email address.
[20:29:15] hello Ryan
[20:29:20] yes, you're right
[20:29:21] he is actually a developer of huggle now, but would like to have access to our instance
[20:29:26] ah. cool
[20:29:53] Can 1 and 2 be the same?
[20:29:57] yes
[20:30:04] 1+2) mmovchin
[20:30:09] 3) michael@movchin.de
[20:30:10] Thank you
[20:30:26] Damianz: can you improve the docs part in the pad
[20:30:31] the project docs
[20:30:38] I don't know what your idea was
[20:31:13] I was thinking more per project, so for example how the bots project works, what runs on it etc., so if something is borked I can go "ah, it's probably this"
[20:31:28] that should be a subpage of the project I guess
[20:31:39] or maybe each project could have a page in main space
[20:31:49] !initial-login | mmovchin
[20:31:49] mmovchin: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in
[20:32:03] ok. I'm off to get food and such
[20:32:07] * Ryan_Lane waves
[20:32:15] ok :)
[20:32:47] thank you, ryan
[20:32:50] yw
[20:32:52] * Damianz waves at Ryan_Lane|food
[20:33:10] petan: See #huggle
[20:35:53] mmovchin: let me know if you need any help
[20:35:59] with access to the instance etc.
[20:36:12] 02/18/2012 - 20:36:11 - Creating a home directory for mmovchin at /export/home/bastion/mmovchin
[20:38:12] 02/18/2012 - 20:38:12 - Creating a home directory for mmovchin at /export/home/huggle/mmovchin
[20:39:03] petan: thanks
[20:39:12] 02/18/2012 - 20:39:11 - Updating keys for mmovchin
[20:39:49] mmovchin: what if we move to wikimedia bz
[20:39:58] rather than making our own one
[20:40:06] wikimedia bz?
[20:40:09] bz?
[20:40:09] bugzilla
[20:40:15] ah
[20:40:23] That would be great
[20:40:28] ok
[20:40:31] So we don't have to set up our own one
[20:40:35] yes
[20:40:42] Could you arrange that?
[20:40:49] :)
[20:40:54] Or is everyone free to request a project on bugzilla?
[20:40:56] you mean if I can set it up?
[20:41:00] yes
[20:41:05] on wm bz
[20:41:07] I can ask someone who can
[20:41:12] thank you :)
[20:41:26] It would be great if it could be ready by the meeting
[20:41:50] So I can move all the issues we have on Google Code to Bugzilla
[20:43:27] ok
[20:44:17] thanks
[20:44:24] FYI: Just sent out the mail
[22:12:00] can I get a public IP for my nova instance now
[22:30:30] Athlon: What instance is that?
[22:34:16] simplewikt
[22:36:06] Athlon: I don't have access to that, so I can't help you, sorry
[23:00:43] Athlon: Short answer, no
[23:00:47] Long answer, ask Ryan.
[23:01:19] Athlon: have you tested with a socks proxy to make sure it works?
[23:02:10] yes
[23:02:38] ok. gimme a little bit :)
[23:02:48] in the middle of something else
[23:28:03] what's your tool called?
[23:28:50] Athlon: ^^
[23:31:12] simplewikt
[23:33:08] simplewikt.tools.wmflabs.org
[23:34:03] hm. can't ping it
[23:35:29] heh
[23:35:35] it's outside of the subnet
[23:35:41] we're out of public IPs for now
[23:35:44] so, you'll need to wait.
[23:36:11] Ryan_Lane: You should just get an IPv6 tunnel on one of the instances and give out IPs for free :P
[23:36:28] I need to enable IPv6 support in nova
[23:38:14] well, that surely didn't take long
[23:38:18] seems I need to work on that proxy soon
[23:39:10] Assuming we only want to support http/https, maybe ftp, a varnish/nginx proxy wouldn't be that much work.
[23:39:32] yeah, I'd likely do an nginx proxy
[23:39:41] transparent. no caching
[23:40:45] We could probably do it with puppet and variables, but that could get really messy if you wanted hundreds in it I guess.
[23:40:55] nah. I'd prefer to make it like an openstack service
[23:41:30] hm. there's a load balancing service that already exists...
[23:41:41] I wonder if it can just do simple proxying
[23:41:54] http://wiki.openstack.org/Atlas-LB
[23:46:19] that doesn't seem appropriate. guess I'll need to write something
[23:47:32] Hmm, that could be interesting
[23:48:00] It would be nice if you could have a bunch of boxes set up with, like, corosync/pacemaker and make a cloudy lb/proxy thing.
[23:48:11] I hate pacemaker
[23:48:25] we use LVS for load balancing
[23:48:31] and BGP to fail over between them
[23:48:45] it's reliable and simple :)
[23:49:38] BGP is a funny protocol
[23:49:48] Kinda really simple, yet it can cause weird routing issues
[23:50:27] * Ryan_Lane nods
[23:50:39] Doing it in BGP in theory should have a lower risk factor, as switching IPs around/restarting services has a time between down and up... and it also leaves the capacity side on one live box.
[23:51:25] yep
[23:52:55] What do you use LVS for then? Doing the monitoring with a hook to update BGP, or a bunch of LVS instances above the squid boxes? I assume the squids then round-robin on the apache instances, or do they go back to routed IPs based on LVS too?
[23:53:05] * Damianz thinks he read about this ages ago on the wikitech blog somewhere
[23:53:10] BGP directs the traffic to the LVS server
[23:53:17] the LVS server directs it to the real servers
[23:53:26] the real servers return traffic to the client
[23:53:46] So you're using direct routing in LVS rather than NAT/proxying it.
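The three lines above describe LVS direct routing (DR): the VIP announced over BGP lands on the LVS box, which forwards each packet to a real server by rewriting only the destination MAC, so replies go from the real server straight back to the client. A sketch of the ipvsadm side with illustrative addresses; the BGP announcement of the VIP would come from a routing daemon on the same host, which isn't shown:

```python
# Sketch only: sets up an LVS direct-routing service via ipvsadm.
# Addresses are illustrative; this would run as root on the LVS host.
import subprocess

VIP = "203.0.113.10:80"                    # service IP announced via BGP
REAL_SERVERS = ["10.0.0.11", "10.0.0.12"]  # backend web servers

def ipvsadm(*args):
    subprocess.check_call(("ipvsadm",) + args)

# Create the virtual service with weighted-least-connection scheduling.
ipvsadm("-A", "-t", VIP, "-s", "wlc")

# Add each real server in gatewaying (-g) mode, i.e. direct routing:
# only the MAC is rewritten, so replies bypass the LVS box entirely.
for rs in REAL_SERVERS:
    ipvsadm("-a", "-t", VIP, "-r", rs, "-g", "-w", "10")
```

The usual LVS-DR caveat applies: each real server also needs the VIP configured on a non-ARPing interface (e.g. lo) so it accepts the forwarded traffic.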
[23:53:54] yep
[23:53:59] Cool
[23:58:29] !account-questions
[23:58:29] I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your SVN account name, or your preferred shell account name, if you do not have SVN access. 3. Your preferred email address.
[23:58:44] yes, I'm lazy enough to copy/paste this into an email
[23:59:17] You should just make wm-bot do the emailing :P
[23:59:33] heh
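Finally, on the proxy Ryan says he'll "need to write something" for earlier in the evening: the constraints he gives (nginx, transparent, no caching, one public name per instance) reduce to generating a vhost per hostname. A toy sketch of that generation step; the hostnames, backend IPs, and output path are invented, and a real service would presumably pull the mapping from nova or LDAP rather than a hard-coded dict:

```python
# Sketch only: emits one transparent (non-caching) nginx vhost per
# public hostname. All names, addresses, and paths are illustrative.
HOSTS = {
    "sometool.tools.example.org": "10.0.0.52",  # public name -> instance IP
}

VHOST = """\
server {{
    listen 80;
    server_name {name};
    location / {{
        proxy_pass http://{backend};
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }}
}}
"""

with open("/etc/nginx/conf.d/labs-proxy.conf", "w") as f:
    for name, backend in sorted(HOSTS.items()):
        f.write(VHOST.format(name=name, backend=backend))
# nginx would then need a reload to pick up the new vhosts.
```

Making it "like an openstack service" would mostly mean putting an API in front of that mapping; the nginx side can stay this simple precisely because it is a pure pass-through with no caching.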