[00:05:11] if you do that you'll be my hero [00:05:30] I wanted to fix it for ages but haven't found the time [00:06:31] so this is what turns out undefined <%= scope.lookupvar('puppetmaster::config::gitdir') %> [00:08:01] yeah [00:08:11] the whole cronjob should not be in labs [00:11:01] i wonder why it is on all instances when it's "puppetmaster server side scripts" [00:11:22] puppetmaster::self probably :) [00:11:29] ah, yeah [00:13:33] i guess we want to keep the other cron job though, the one that deletes old puppet reports [00:13:57] seems to make sense in labs as well.. i see report files on an instance with puppetmaster::self [00:20:57] 08/30/2012 - 00:20:57 - Created a home directory for rmoen in project(s): bastion,editor-engagement [00:21:09] Yay :) [00:26:01] 08/30/2012 - 00:26:01 - User rmoen may have been modified in LDAP or locally, updating key in project(s): bastion,editor-engagement [00:26:02] paravoid: yeah, i dont have a better idea than just adding another realm check [00:26:11] !change 21981 | paravoid [00:26:11] paravoid: https://gerrit.wikimedia.org/r/#q,21981,n,z [00:26:29] at least for now to stop the mails [00:26:52] add an "else ensure => absent" and I'm +2 :) [00:27:18] and remind me to buy you a beer or two in ~two weeks [00:31:15] paravoid: howdy [00:31:36] hi Ryan [00:31:40] I was looking for you before [00:39:56] paravoid: patch set 4 [00:40:25] hm, the bot is down probably... [00:40:45] Patch Set 4: Fails [00:41:12] ugh, never mind :p [00:42:24] too many semicolons.. i blame PHP :) [00:42:28] patch set 5 [00:42:54] paravoid: oh? [00:43:05] ah, chat from yesterday [00:45:11] mutante: hahm, there is prettier/more "puppet way" of doing that [01:01:08] paravoid: so, there's a bunch of small tasks we can do to help things out [01:01:10] in labs [01:01:22] we can switch to virtio as the network driver [01:01:34] we can switch the home directories to gluster [01:01:48] it's likely best to switch to virtio before switching the homedirs :) [01:02:02] paravoid: I talked to the piston cloud people today at vmworld [01:02:19] their commercial openstack distro uses ceph [01:02:30] and they are using boot from volume with it [01:02:58] they have a bunch of proprietary changes to openstack to make it work well when using boot from volume [01:03:01] do they have any users? :P [01:03:14] they won "best private cloud solution" at vmworld [01:03:20] at *vmworld* [01:03:37] mutante: see patchset 5 [01:03:57] so, yeah, they do have some customers. heh [01:04:03] isn't vmworld a vmware thing? [01:04:06] yes [01:04:11] which makes that funny [01:04:20] cloud people are crazy, we knew that [01:04:21] :P [01:04:28] an openstack distro won best private cloud solution at a vmware conference [01:04:36] yeah yeah I got that :) [01:04:46] it made me laugh :) [01:05:00] do we have any (better) news regarding ceph from andrew? [01:05:04] and/or asher? [01:05:05] kind of [01:05:14] except they use ssds [01:05:21] paravoid: oh yeah, gotcha, thanks [01:05:21] which isn't out of the question for us [01:05:27] the instance storage is fairly small [01:05:30] I mean, if it's going to be crappy as hell and use proprietary patches... [01:05:39] but you said boot from volume? 
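The realm-guarded fix discussed above (a realm check plus the "else ensure => absent" branch paravoid asks for) follows a standard Puppet pattern; a minimal sketch, with the resource name and command invented for illustration since the log never shows the actual manifest:

    if $::realm == 'production' {
        cron { 'puppetmaster-gitdir-sync':      # hypothetical resource name
            ensure  => present,
            command => '/usr/local/sbin/sync-gitdir',   # hypothetical script
            user    => 'root',
            minute  => '*/30',
        }
    } else {
        # removes the job from labs instances, which also stops the failure mail
        cron { 'puppetmaster-gitdir-sync':
            ensure => absent,
        }
    }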
[01:05:42] ceph isn't using patches [01:05:44] openstack is [01:05:52] I got that, but still [01:06:03] boot from volume works right now [01:06:06] slow + proprietary patches, that's already two minuses [01:06:08] the problem is live migration with it [01:06:16] we don't need the patches [01:06:35] I asked them about performance [01:07:11] if we're going the boot from volume way, I'd prefer if we'd imitate production rather than tell everyone to use /mnt or whatever for their data... [01:07:36] we don't need to have /mnt at all [01:07:50] I'd prefer that we don't have secondary storage on our instances at all [01:07:54] right [01:07:56] agreed [01:07:58] and that people mount volumes when they need them [01:08:30] we can't really do that until we have a volume service [01:08:37] so, okay, I'm *all* for switching away from local storage [01:08:42] well...... [01:08:52] what piston is doing is using the compute node storage [01:08:54] but don't we have other more urgent things to do? [01:08:55] like we are [01:08:57] yes [01:09:01] this is just ideas for the future [01:09:04] as Ceph nodes you mean? [01:09:07] yes [01:09:13] yeah, I've seen that before and I like it [01:09:39] it's good to think about this kind of stuff so that we know where we'd like to go when we have time to do so :) [01:09:42] but... [01:09:52] because if you split your clusters, you have a bunch of compute systems with no disk I/O at all and a bunch of storage systems with no CPU usage at all [01:09:56] if we're considering this, it may be good to replace the disks in eqiad with ssds [01:10:03] yep [01:10:06] replace them with SSDs? [01:10:06] agreed [01:10:13] replace 1TB disks with 100G SSDs? [01:10:22] instance storage can be fairly small [01:10:23] or is it 600G SAS? [01:10:33] 500GB SAS, I believe [01:10:39] there's no 500G SAS afaik [01:10:42] hm [01:10:44] it's 146/300/600 [01:11:22] I think using SSDs for the / of *all* labs instances is a bit of an overkill [01:11:49] well, it would be for the ceph storage [01:11:51] they're 300GB SAS [01:11:54] ah [01:12:07] if it's the only way for us to get decent performance out of ceph... [01:12:29] it would be good to test it [01:12:43] to see how much of a difference it would make, and what the cost difference would be [01:12:59] I'm not a big fan of throwing money at broken software to solve its problems :-) [01:13:00] of course, we don't really have much for hardware budget this year [01:13:11] it's not necessarily broken software [01:13:17] however, people do suggest using SSDs for the ceph *journal* [01:13:27] every write needs to do multiple writes across the network [01:13:28] if you use ceph rather than rados that is [01:15:27] the piston people said: don't use btrfs, and don't use cephfs [01:15:59] I wouldn't use btrfs [01:16:06] but asher said that cephfs was faster than rados [01:17:09] really? [01:17:21] I think so [01:17:24] the shared, nfs style block storage? [01:17:40] I've read the opposite [01:20:05] ignoring that whole thing, though....
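For reference, the SSD-journal setup mentioned above ("people do suggest using SSDs for the ceph *journal*") is plain ceph.conf configuration; a rough sketch, with placeholder device paths and sizes rather than anything sized for this cluster:

    [osd]
        ; data stays on the spinning SAS disks
        osd data = /var/lib/ceph/osd/ceph-$id
        ; journal goes on a small SSD partition so the double-write hits fast media
        osd journal = /dev/disk/by-partlabel/journal-$id
        osd journal size = 10240    ; in MB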
[01:20:12] I'd really like to kill off some of the smaller issues [01:20:17] we need to add a new network in pmtpa [01:20:26] we need to switch to virtio [01:20:36] we need to move the home dirs away from the nfs instance [01:21:00] we need to do network node per compute node (which means I need to apply that bgp patch) [01:21:17] the bgp patch needs exabgp moved to ubuntu, with the patch I sent in applied [01:21:28] the patch I sent into exabgp was accepted upstream [01:21:45] exabgp is already packaged in debian [01:22:20] yeah, those would be nice [01:22:35] add a new network in pmtpa -- also have ipv6 in labs finally [01:22:39] yes [01:22:46] the new network should be added with ipv6 [01:22:55] we need to figure out how to make it apply to the old network too [01:23:00] maybe we can test that in ewiad [01:23:01] eqiad [01:23:10] we need to set up eqiad... [01:23:13] you didn't say that :) [01:23:17] well, it's set up [01:23:24] it can be used for testing anyway [01:23:32] no I mean add it as a region [01:23:35] we can't allow users to access it yet [01:23:35] yeah [01:23:42] I'd like to tackle some of these smaller tasks first [01:23:49] sure, agreed [01:24:05] it'll mean eqiad comes up with these things fixed from the beginning [01:24:16] oh [01:24:21] how difficult is it to switch to virtio? [01:24:24] isn't it just a single setting? [01:24:26] we need to bring up the storage in eqiad too [01:24:34] we'll need to modify the libvirt configs [01:24:46] and change the openstack config so that new instances will get virtio [01:24:49] how about new VMs? [01:24:51] ah [01:24:58] it should be a fairly easy change [01:25:01] okay, I've said this before [01:25:10] but it's hard for me to keep track of all these things [01:25:15] yeah [01:25:17] I mean, we've talked about all of them [01:25:22] should we put these into bugzilla now? [01:25:23] but I keep forgetting some [01:25:36] heh yeah, that was going to be my suggestion [01:25:44] cool. let's do it [01:25:55] and I hate how we do both RT & bugzilla for infrastructure work :/ [01:26:07] but one problem at a time [01:26:08] labs should really only use bugzilla [01:26:15] well, what's "labs"? [01:26:18] Wikimedia Labs project [01:26:34] sorry, that wasn't a response [01:26:45] just mentioning which product to use for bug reports :) [01:26:55] most of this work is in "production" systems [01:26:59] yes [01:27:03] true [01:27:15] and I think Mark will hate you if you e.g. open a bugzilla for the BGP peerings :) [01:27:25] it can still be done by volunteers, though [01:27:39] hm? switch to virtio or add a network? [01:27:45] yes [01:27:51] absolutely [01:27:58] openstack is set up in labs [01:28:05] you can even launch vms inside of it [01:28:15] I don't think it can reasonably, but it's always nice to have tickets in public so that the community is aware [01:28:30] s/it/they/ [01:28:34] * Ryan_Lane nods [01:29:40] so, use bugzilla, or rt? :) [01:30:01] I hate the split in the systems [01:30:10] really we just need a public RT queue for labs [01:30:20] and for RT to use LDAP auth [01:30:27] can't agree more [01:30:33] then this problem goes away [01:30:36] can we do parent tickets with bz? [01:30:41] yep [01:30:49] they are tracking tickets [01:31:06] you use the "Depends on" field for that [01:31:16] right. sorry, not very experienced with bz [01:31:20] * Ryan_Lane nods [01:31:37] I've been a user in multiple projects but never had to do this sort of thing [01:31:41] hm.
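The virtio change sketched above ("modify the libvirt configs ... change the openstack config so that new instances will get virtio") touches two places. Roughly, and assuming a 2012-era nova whose flag name should be verified against the deployed release:

    <!-- in each existing instance's libvirt domain XML: add a model element -->
    <interface type='bridge'>
      <source bridge='br103'/>         <!-- placeholder bridge name -->
      <model type='virtio'/>           <!-- selects the paravirtualized NIC driver -->
    </interface>

    # in nova.conf, so newly created instances default to virtio NICs
    libvirt_use_virtio_for_bridges=true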
depends on may not really be right [01:31:57] let's look at a tracking ticket to see [01:32:21] found one [01:32:33] it's tracking depends on child and child blocks tracking [01:32:41] ah. right [01:32:45] that makes sense [01:33:12] heh [01:33:20] they have a tracking bug for tracking bugs [01:34:05] I really wish we could send irc bug notifications for labs into this channel [01:34:37] also, we probably need a "labsconsole" component [01:34:46] anyway, I'm opening the virtio bug now [01:35:48] ah ok [01:35:57] hey [01:35:59] /dev/vda1 on / type ext4 (rw) [01:36:01] I added the one about the network [01:36:04] ? [01:36:05] we already use virtio [01:36:09] how so? [01:36:11] for disks [01:36:17] not sure if that's the case for network too [01:36:21] right [01:36:25] network is not [01:36:31] disks are [01:36:38] ah [01:37:13] adding the ticket about moving away from nfs instance [01:37:24] I thought disks were not either [01:37:27] that's more important [01:37:49] the virtio network driver is *way* faster [01:37:52] think project storage [01:38:22] oh true [01:38:46] We already have home shares made for each project, byw [01:38:51] *btw [01:39:07] they are created and managed with the project storage [01:39:21] we just need to move the data and change the mounts [01:39:45] changing the mounts will likely be the hard part [01:39:48] oh. right.... [01:39:55] hm? [01:39:56] there's another thing I wanted to do too [01:40:51] rather than mounting each homedir [01:40:52] ok, I'm opening the tracking bug [01:40:55] mount all of home [01:41:04] as "Labs infrastructure work tracking bug" [01:41:09] then use pam to make home dirs [01:41:14] sounds good [01:41:16] yes! [01:41:20] pam_mkhomedir [01:41:22] yep [01:41:29] that's actually a blocker [01:41:31] also, passwordless sudo? [01:41:34] yes [01:41:35] that too [01:41:39] yay for opening bugs [01:41:46] :-) [01:42:03] if only I had more time to work on them :P [01:42:11] that's also my problem [01:43:00] is it possible to mount /home with automount? [01:43:06] * Ryan_Lane surely hopes so [01:43:17] I'm not totally sure it's possible [01:43:32] pam_mount [01:43:42] oh, you mean the whole of /home [01:43:45] yes [01:43:46] why automount? [01:43:53] fstab mounts are fucking evil [01:44:03] we can control autofs from ldap [01:44:10] you just said half of unix is evil :P [01:44:17] sorry [01:44:22] fstab nfs/gluster mounts are evil [01:44:56] it's more difficult to change, since we need to change it on every host [01:45:05] it's nice to have that config centralized [01:47:24] okay [01:47:29] open the bug about it [01:47:37] and I'm opening the network node bug [01:47:38] I have one opened for it [01:47:43] I added the network bug [01:47:48] ah [01:48:09] https://bugzilla.wikimedia.org/show_bug.cgi?id=39781 [01:48:20] ah sorry, I meant the network node [01:48:22] per compute node [01:48:25] ah [01:48:26] right [01:48:26] ok [01:48:36] let me link the ones I made [01:48:39] my bad [01:50:15] I'm adding passwordless sudo [01:52:13] https://bugzilla.wikimedia.org/showdependencytree.cgi?id=39784&hide_resolved=1 [01:52:16] \o/ [01:52:38] I'm adding "add eqiad as a production region" [01:52:43] cool [01:59:45] I think we ordered project storage in eqiad [01:59:48] in fact, I think it's there [02:02:03] paravoid: oh, I had an idea of how to replace the awful manage puppet groups interface [02:02:29] paravoid: add keywords as comments in the puppet manifests [02:02:37] for classes and variables [02:02:48] with group assignments for them.
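The "mount all of home, then use pam to make home dirs" plan above needs only a one-line PAM addition once /home is a single share, and the autofs map Ryan wants controlled from LDAP is a one-line master entry; a sketch with a placeholder LDAP DN:

    # /etc/pam.d/common-session -- create a user's homedir on first login
    session required    pam_mkhomedir.so skel=/etc/skel umask=0022

    # /etc/auto.master -- mount the whole of /home via autofs, map kept in LDAP
    /home  ldap:ou=auto.home,dc=example,dc=org  --timeout=600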
[02:03:03] then we can parse the manifests for them [02:03:14] it would work for per-project branches, too [02:03:26] whenever we have per-project branches [02:03:52] we'd need to come up with a syntax for it, though [02:11:12] paravoid: you should really go to sleep :) [02:11:16] I'm heading home [02:14:21] dammit [02:18:04] this view just 500'd: https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=editor-engagement&instanceid=i-000003dd [07:42:20] Damianz wtf is ur irc client [07:42:23] it's weird [08:12:41] !log deployment-prep petrb: /home/wikipedia/common/php git pull [08:12:43] Logged the message, Master [08:30:13] petan: kvirc? And yeah it's a little dodgy on osx [08:30:26] "Hey all, we will be down for about 20 minutes at 8pm PST to shine our unicorn horns. Sorry for the inconvenience." < labs totally needs to start announcing downtime like dropbox [08:30:39] heh [08:39:13] !log nagios changing puppet-FAIL command to `echo "Puppet has not run in the last 10 hours" && exit 2` from `/usr/share/nagios3/puppet_check.sh $HOSTADDRESS$` [08:39:14] Logged the message, Master [08:39:38] Damianz btw there are some things on nagios [08:39:49] I saw you tried to change the command for checks [08:39:55] there is a template file for parser [08:40:10] Yeah I saw, it cats in default then seds it =/ [08:40:11] if you wanted to change anything in config files, you need to change it there or it will be lost [08:40:28] you just change the template, enforce update [08:42:12] It's working for now apart from the hosts that haven't run the latest puppet yet, I'd rather take a clean instance and put everything in puppet and know it works before changing -main too much. [08:42:30] eh [08:42:32] noo [08:42:41] I would rather puppetize stuff in -main and keep it [08:42:51] changing hostname again... [08:42:53] bleh [08:43:01] we still haven't recovered from the last change [08:43:16] Oh I'd rather keep -main because changing that requires changing ips and names all over [08:43:25] But I'd rather try crazy stuff in -dev then have puppet apply it on -main [08:43:30] why we need to change IP because of puppet [08:43:39] nrpe [08:43:42] ok as long as you won't break the current nagios [08:43:44] I don't care [08:44:02] not that it wasn't broken already :P [08:44:12] It was...er... somewhat broken [08:44:15] :D [08:52:48] How do I have 36 emails from last night to this morning o.0 bleh [09:00:53] !log deployment-prep petrb: php multiversion/MWScript.php changePassword.php --wiki enwiki --user Petrb --password needed to change [09:00:54] Logged the message, Master
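Circling back to Ryan's manage-puppet-groups idea above (keywords as comments in the manifests): the syntax is still to be invented, but something as simple as a structured comment per class would already be greppable. A sketch, with the @labs-group tag made up for illustration:

    # in a manifest, tag a class like this:
    #     # @labs-group: webserver
    #     class webserver::apache { ... }
    # then extract "class group" pairs from the tree:
    grep -rnA1 '^# @labs-group:' manifests/ \
      | awk '/@labs-group:/ { group = $NF } /class / { print $2, group }'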
I can try to simplify [16:00:23] sumanah: I just ignore all of them [16:00:34] petan: there are going to be a lot of steps, that's unavoidable, but if I get told about what's confusing then I can at least try to list stuff out in a public wiki page [16:00:34] cause I don't have the resource / time to invest in tracking everything that happens ;-) [16:00:42] hashar: so, you don't know :) [16:00:53] petan: so you get Krinkle and Trevor reviewing it ? [16:00:58] sumanah it's hard to understand if it's scheduled or not. It was definitely approved by the wiki community [16:01:02] petan: and then we will "soon" land it in production? [16:01:07] which should make it scheduled [16:01:24] has either of you read the wiki page about this? how to write an ext for deployment? [16:01:33] hashar I have no idea, I never understood this process, which started almost 2 years ago [16:01:33] petan: well go ahead [16:01:48] petan: but please make sure to add your extension configuration in operations/mediawiki-config [16:01:51] and pass it via Gerrit [16:01:52] sumanah that page changed significantly over the years [16:02:01] ok [16:02:06] you can do your changes on -dbdump and submit them to Gerrit [16:02:08] Yes, I updated it to make it more accurate, petan [16:02:13] the remote is ssh://gerrit something [16:02:15] I think [16:02:17] hashar I don't know how [16:02:26] I still don't know these git tricks [16:02:27] so if you get a ssh key on -dbdump you should be able to commit and then push to gerrit [16:02:30] ohh [16:02:32] aha [16:02:43] if I uploaded an ssh key there, it wouldn't be really safe [16:03:02] create a new key / password protect it ? [16:03:05] hashar: please skim https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment [16:03:10] but yeah indeed will not be safe [16:03:17] hashar: if it is not clear or is inaccurate in some way, I want to know [16:03:23] hashar then someone sudo su and commit as me [16:03:25] :D [16:03:34] sumanah: I am not involved in extensions sorry :-( [16:03:42] petan: hehe [16:03:47] hashar: I know that, but still, your eyes would be useful [16:04:08] petan: so what you can do is add a live hack in the files that would include( …OnlineStatusBar.php ) [16:04:12] and put all your configuration there [16:04:19] that is probably the easiest thing to do [16:04:46] I will commit on my own server where git works [16:04:52] https://www.mediawiki.org/wiki/Review_queue is where the actual list of extensions awaiting review & revision lives, with links to the Bugzilla bugs, and then when something has finally been reviewed and approved for deployment AND there is community consensus for that first deployment, Reedy moves the extension to the "deployment queue" [16:05:39] when someone says "it's confusing" but doesn't tell me WHAT PART is confusing, then I can't improve the process or the documentation. So, tell me what part is confusing, please [16:06:17] I find the entire use of bz confusing :P [16:07:00] Damianz: ok. Tell me more.
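The "live hack" hashar describes is just an include plus settings appended to the wiki's configuration; a minimal sketch (the file placement varies — petan later calls his ext-wmflabs.php — and any real config variable names would come from the extension's documentation):

    // temporary live hack in the beta settings file
    require_once( "$IP/extensions/OnlineStatusBar/OnlineStatusBar.php" );
    // any $wg... overrides the extension needs go right after the include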
[16:07:27] sumanah: what is confusing me is the mass of information we have to handle :-) [16:07:43] I could spend a whole week just reading information and producing nothing [16:07:47] that is what I was ranting about [16:07:56] hashar: that is the case in all complicated systems, I think [16:08:01] not the Review_queue which is, I am sure of it, most probably a great queueing system [16:08:16] so I end up skipping a lot of information to preserve my brain / productivity [16:08:27] and ask the TL;DR team whenever I need information hehe [16:08:28] hashar: there is no way to get code deployed onto the 5th highest-traffic website on the internet without various steps, and describing those steps takes words. [16:08:45] yeah I fully agree [16:08:51] hashar: how is Engineering Community Team supposed to scale if everyone does that, though? [16:08:57] that is the point of docs [16:08:58] again, I am not complaining about having process / bureaucracy [16:09:05] they scale when individual conversations do not [16:09:27] it is just that we have so much stuff to handle that one human can not be aware of all processes much less of everything that happens in each process [16:09:43] as an example, I have close to no idea what the visual editor / mobile teams are doing :-D [16:09:50] hashar: do you at least skim the monthly report? [16:09:58] that has one-paragraph summaries of those [16:10:03] (k I read the monthly engineering report so i know about their job :p) [16:10:15] that is actually my main source of information [16:10:20] together with the Signpost [16:10:53] :) Signpost is useful, I am glad Harry does that weekly tech roundup [16:11:06] hashar I have it but I can't commit it [16:11:11] and Guillaume and the team leads are doing very useful things with project statuses & the monthly report [16:11:12] what is that command to push it [16:11:17] @search push [16:11:17] No results were found, remember, the bot is searching through content of keys and their names [16:11:27] @search config [16:11:27] No results were found, remember, the bot is searching through content of keys and their names [16:11:30] I from an idiot point of view have no idea who works on what, the wikitech list is a mash of complaining, ideas, someone apparently working on something x months ago etc. Bugzilla is partly ignored, not really informative about who is responsible, hard to find bugs at a glance with minimal overall stats etc. Then there's bz/talk pages/ops bug tracker/mailing lists which have duplicate, conflicting [16:11:38] @regsearch [Pp]ush [16:11:38] No results were found, remember, the bot is searching through content of keys and their names [16:11:41] info with no one co-ordinating it. And that's before breakfast [16:11:50] @search git [16:11:50] Results (Found 7): leslie's-reset, damianz's-reset, gitweb, account-questions, git, origin/test, git-puppet, [16:11:58] !origin/test [16:11:58] git checkout -b test origin/test [16:12:36] hashar you know? [16:12:47] I know there was some trick [16:12:56] but I forgot to save the command to push [16:13:45] lol [16:13:50] To ssh://petrb@gerrit.wikimedia.org:29418/operations/mediawiki-config.git [16:13:51] !
[remote rejected] master -> master (can not update the reference as a fast forward) [16:13:52] error: failed to push some refs to 'ssh://petrb@gerrit.wikimedia.org:29418/operations/mediawiki-config.git' [16:13:53] petanb@srv:~/mediawiki-config/wmf-config$ [16:14:37] Damianz: ok, so, if you're interested in investigating these issues to improve processes or help you understand this stuff better, I have time to talk about it. [16:14:44] hashar where are you :D [16:14:46] git pull [16:14:47] I need to go home [16:14:53] master is ahead of your branch [16:14:57] Damianz: when you say "who works on what" are you usually starting off wondering about a particular person, or a particular project? [16:15:11] Damianz what does it mean [16:15:24] Damianz speak svn or english pls [16:15:27] It means some ref in your branch on the remote is newer than the ref you're on [16:15:37] that's even less understandable :D [16:15:43] what do I need to do to fix it [16:15:58] If it's a local clone, git pull or fetch and merge [16:16:10] petanb@srv:~/mediawiki-config/wmf-config$ git pull [16:16:11] Already up-to-date. [16:16:14] Damianz LIES [16:16:28] meh [16:16:31] HELP [16:17:06] Damianz there was a trick to override it, some parameter after push [16:17:12] it never worked [16:17:19] petan: saper, in #mediawiki, is often helpful with this stuff [16:17:19] --force overrides it... but that's bad [16:17:26] nah [16:17:28] that's what that error means [16:17:28] that was another trick [16:17:49] last time it was jeremyb who talk me that secret [16:17:53] * told [16:18:14] git push origin/master (or w/e branch you're on)? [16:18:16] I know it's impossible to commit in wmf-config without using that [16:18:33] petanb@srv:~/mediawiki-config/wmf-config$ git push origin/master [16:18:34] fatal: 'origin/master' does not appear to be a git repository [16:18:35] fatal: The remote end hung up unexpectedly [16:18:46] hmm might be space separated [16:18:56] ok that produces same error as before [16:19:12] ! [remote rejected] master -> master (can not update the reference as a fast [16:19:16] Dunno then [16:19:35] hashar I will commit to pastebin, feel free to move that to git :D [16:19:42] Are you trying to push to a repo that needs reviewing? [16:19:57] I'll just assume it's a gerrit thing if it's not a git thing [16:21:06] petan: commit from your local machine instead [16:21:14] petan: and reread the git workflow article :-) [16:21:25] http://www.mediawiki.org/wiki/Git/Workflow [16:21:43] hashar I reread it like 10 times and still I find it incomplete [16:21:48] there is no git trick in there I need [16:21:55] there should be examples of pushing [16:22:02] like an example of how to push in mediawiki config [16:22:10] which is hardest I know [16:22:18] idiot proof example [16:22:27] like: [16:22:32] 1: open terminal [16:22:35] sumanah: I don't necessarily think it's bad documentation, just hard to find documentation. Most of the time I end up googling for a page rather than knowing where to go and find it. While we're always going to have lots of 'teams' because of the nature of what's being done I feel there's a lack of a general overview and/or goals laid out. [16:22:41] 2: cd in your directory where the repo is [16:22:53] 3: type magical command: SOME MAGIC I DON'T KNOW [16:22:56] 4: Have a beer [16:23:05] Damianz: I do want to know whether you're generally starting with "what is $person working on?" or "who is working on $task?"
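For the record, the rejected pushes above happen because updating refs/heads/master directly needs rights most contributors don't have; changes meant for review go to Gerrit's magic ref instead. From inside the clone:

    git commit -a -m "Enable OnlineStatusBar on beta"   # example commit message
    git push origin HEAD:refs/for/master                # opens a change in Gerrit
    # some Gerrit setups also take HEAD:refs/publish/master, the variant
    # jeremyb suggests to petan later in this log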
We generally suck at documentation for people who don't already know what they're doing because the people that write it know how to do it. Having the people forced to figure it out write it would provide better coverage. [16:23:43] The latter [16:23:43] hashar I am unable to commit from local machine as well [16:23:50] it just doesn't work [16:24:25] ok, so, Damianz, https://www.mediawiki.org/wiki/Wikimedia_Engineering/Project_documentation_howto is at least *how people are supposed to* be documenting what they're doing if they work at Wikimedia Foundation. And that includes "who's working on this project". [16:24:41] Not so interested in what a specific person is doing more than what people are doing generally. Like what ori-l's working on is interesting, I only know about it because he 'overheard' a conversation I was having with someone else that aligned with the same goals for a different purpose. [16:25:07] Damianz: how do you feel about https://www.mediawiki.org/wiki/Wikimedia_Engineering/2012-13_Goals ? [16:25:20] you know what... [16:25:24] hashar: http://test.wikipedia.org/wiki/User:Petrb/config_deploy [16:25:29] Damianz: or about hub pages, for example, like https://www.mediawiki.org/wiki/Wikimedia_Platform_Engineering ? [16:25:30] that's ext-wmflabs.php [16:25:45] I am not strong enough to push it [16:26:12] I will be back in 2 hours [16:26:15] bye petan [16:28:40] Damianz: guillom is the Wikimedia Foundation's person in charge of communications about engineering [16:29:05] Damianz: I asked him to come in [16:29:25] 1 - somewhat useful, though I still have to look at 3 pages, not 100% sure what defines features as surely mobile and platform relies on features and mobile runs on platform, wikipedia terminology is sometimes confusing to me though [16:29:40] (Damian is talking about the Goals doc, I think) [16:29:55] * guillom listens. [16:30:01] project overview page from the how to document stuff page [16:31:20] 2 (goals page) - I like that, seems somewhat signpost/mailing list/tech blog formatted, more useful in knowing what's happening but not totally who's doing it (from the idea of who to talk to about ideas/contributions), probably some wiki page/mailing list around if I knew where [16:33:58] 3 (hub pages) - much more useful for finding out current projects [16:34:31] Damianz: guillom is the person you have to thank for the hub pages! :-) [16:34:39] The main problem I'd have with them is I'd have no idea where to find them heh. Like I'd of thought they'd be on meta or wikitech not mediawiki [16:34:56] Oh also hi guillom [16:35:07] hello [16:35:36] Damianz: I know guillom and Ryan_Lane and others are interested in making wikitech.wikimedia.org (or some similar wiki) encompass Labs stuff, and generally make it into a better informational and working hub for non-MediaWiki engineering stufffff. [16:36:22] but that's, I believe, stalled and waiting on some information assessments regarding quality & freshness of data in some of those wikis -- I could be wrong [16:36:25] That would be useful - there is some good stuff on wikitech, there's also a lot of outdated and no longer useful info. [16:36:47] sumanah: you're right. [16:38:45] Damianz: So, is there information that you would expect to find on the hubs or the activity pages that you can't find?
[16:39:47] !beta updating all extensions and core to their latest master version [16:39:48] !log deployment-prep updating all extensions and core to their latest master version [16:39:50] Logged the message, Master [16:41:10] Mostly yes, though I do think there's too much of a spare betwean that and bugzilla still. Documentation is another matter for prod stuff which seems to vary in 'up-to-dateness'. [16:41:53] Damianz: what does "too much of a spare" mean? [16:42:03] err space and trying [16:42:07] typing* gah [16:42:25] oh ok :-) [16:42:39] (I've seen way worse typos, or even committed them) :) [16:43:35] Ok the avg user isn't going to go check a specific extension/project before raising a bug report, but bug reports are hardly easy to see per project without, from what I can see, writing a report to get them. For some things it's not possible but if there were like an 'easy pickings' list of project/bug work I'd probably do more outside of the things I'd know about, even if it's just floating an [16:43:41] opinion [16:44:37] Damianz: So, if we're talking about concrete improvements: would it help if we added a link to the relevant bugzilla search for all activity pages? [16:44:54] Which is where sometimes bz is used for features, other times it's on a talk page, other times the mailing lists and things get duplicated/spread out. So someone submits a bug for $x, oh we talked about that and someone mentioned it here style things are hard to keep track of. [16:45:29] so, Damianz, when someone shows up and says "I'd like to get started" with MediaWiki, I point them to https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker which links to https://www.mediawiki.org/wiki/Annoying_Little_Bug . That HOWTO is also linked to from mediawiki.org's front page. But I know mw.org is crowded, and I know that it's just MediaWiki, not other stuff like huggle or labs [16:45:30] Yeah, that would be helpful. Even if it just stops someone duplicating something that needs to be marked as such saving effort to be directed at other things. [16:45:38] guillom: good idea! [16:46:22] That's partly the problem that there's mediawiki and the wikipedia stuff, which are both separated and intertwined at the same time! [16:47:01] IIRC from when I read that, it's aimed at writing php for the wiki/frontend side of things with minimal focus on backend projects/misc stuff (could be wrong). [16:47:26] Damianz: petan started https://meta.wikimedia.org/wiki/Wikimedia_developer_hub partly for this reason, [16:47:51] * Damianz sees his bookmarks getting larger today heh [16:48:14] Damianz: so, I figure, for any given person wanting to know more stuff, there is kind of a pipeline or funnel or flow [16:48:33] a person like you is on the left, and the info they want is on the right, and I have to help connect the pipes so they get it [16:48:41] so part of it depends on the person's mind & habits [16:49:14] where would you expect to find the kind of information you want? what are your usual information sources? Do you read wikitech-l, do you read the weekly Signpost, do you read the monthly engineering report, do you read wikimedia-l [16:49:30] are you on Facebook/Twitter/Google Plus [16:49:31] etc [16:49:36] Yeah... I think the documentation is there, it's just somewhat hard to find sometimes. [16:50:41] I am totally serious about asking these survey-esque questions, btw, Damianz :) [16:50:52] I read wikitech posts if the subject is interesting, 40 posts of lua sucks, why are you using it after it goes out isn't so enthralling.
I didn't know signpost was weekly, thought it was monthly. I read the report/blog, didn't know about wikimedia-l, twitter is useful, g+ less so [16:51:48] Damianz: https://lists.wikimedia.org/mailman/listinfo/wikimediaannounce-l is a useful list for someone like you [16:52:03] Damianz: includes reminders when the signpost comes out [16:52:24] Damianz: low-traffic, links to reports when they come out [16:53:19] guillom: hope I'm not stepping on your toes with my interrogation here ;-) [16:53:38] Subscribed [16:54:52] He probably got bored of my blabbering and fell asleep in the watercooler :D [16:54:57] so, Damianz, within the monthly report/blog, did you notice that for each department and for each project that is summarized, there's a link? those links are to those activity and hub pages [16:55:46] I was eating dessert, and therefore not typing, but I am still reading :) [16:55:54] :) [16:56:33] Damianz: it's ok to say "no I did not notice or think of clicking that" because that is useful info too :) [16:58:00] Actually just realised I hadn't seen last months heh, I don't think I've ever clicked on the links no - assumed the paragraph or 2 was the info out there [16:59:07] ok, so, guillom, I hereby suggest that we experiment with more explicitly marking those links to more info, encouraging people to put stuff on their watchlist or something..... [16:59:39] sumanah: you mean the links to activity pages in the monthly report? [16:59:50] guillom: yeah, the links to activity & hub pages [17:00:10] (it is a suggestion and I am ok with being rebuffed) [17:00:45] sumanah: my gut feeling is that those reports are already very crowded, and that "I'm a blue link" should carry enough information for the reader to mean that more information is available. [17:01:29] Damianz: to go back to Bugzilla, since that is the topic that you mentioned initially: the fragmentation is annoying for sure, as is the information architecture of Bugzilla. And there are also teams using Mingle, Trello, and probably other things :/ [17:01:49] <3 Trello [17:01:53] Ah, we all dream of a single, unified system. [17:01:57] Damianz: I used to work at Fog Creek actually [17:01:58] Yeah the reports are somewhat crowded [17:02:11] Cool, shame they use nodejs though [17:02:24] Damianz: yeah, so you can see that it's difficult, trying to convey "there's more info about this if you want it" while not making the report TOTAL "too long didn't read" [17:02:43] Damianz: (you know the new Visual Editor/Parsoid combo uses node.js right?) [17:03:45] Nope, I keep hearing about the visual editor but have no idea about it. Will probably read somewhat later. 
[17:05:04] Damianz: https://blog.wikimedia.org/2012/06/21/help-us-shape-wikimedias-prototype-visual-editor/ has a link to the place where you can play & test, if you haven't already [17:06:37] ok, and you mentioned that one of the issues that frustrates you is that there's information scattered in a lot of places with no one coordinating [17:06:53] this is one reason that Wikimedia Foundation is starting to use "Product Managers" for this [17:07:13] That's somewhat better than I'd just imagined, wonder how hellish it is to use on mobile though (not that trying to edit on mobile isn't near impossible currently) [17:07:30] so, you see that for the visual editor, James Forrester reduces duplication and cross-references stuff whether it's on a wiki page or mailing list or BZ or whatever [17:08:03] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2012-13_Goals#Visual_Editor mentions that between now and the end of Sept the VE team is working on the mobile editing issue, Damianz [17:08:36] Yay, I hadn't got that far down the page [17:08:47] And yeah, people like that are generally useful [17:09:27] right. so, for certain projects, it's Oliver (Ironholds), for analytics it's drdee (Diederik van Liere), etc [17:10:22] Damianz: we're also asking volunteers to help out -- Jack Phoenix, for example, is product manager for https://www.mediawiki.org/wiki/Admin_tools_development [17:16:07] Hmm interesting [17:18:32] so, anyway, Damianz, thanks for sharing your experience with me & guillom [17:18:45] I hope this was useful for you, it was definitely useful for me [17:19:12] And feel free to send suggestions my way. [17:24:15] I'd say useful :) [17:24:51] :) [17:29:07] * Damianz goes to find some food [17:43:16] Change abandoned: Demon; "Abandoning since I78df2b89 was abandoned too." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6541 [18:31:52] petan: did you get your magic? [18:32:34] petan: maybe you needed `git push origin HEAD:refs/publish/master` [18:33:35] * Damianz looks at jeremyb's drugs [19:11:25] Hi Ryan, didn't see you sneak in there :P [19:11:32] howdy [19:12:12] I don't know if you saw the bug about sessions, but I can't replicate the behaviour now so I dunno if you made a change/it was related to jobs being stopped. Was somewhat strange at the time [19:16:49] eh? [19:16:56] bug number? [19:17:04] I really don't know what you mean [19:18:30] * Damianz finds it [19:18:46] https://bugzilla.wikimedia.org/show_bug.cgi?id=39792 [19:19:08] Was doing it yesterday/day before. I was just too lazy to appear on irc until after you went to bed [19:20:36] <^demon> Ryan_Lane: When you've got a minute, I'm having trouble getting labs gerrit to talk to LDAP. I'm wondering if I'm using the wrong settings. [19:24:41] ^demon: I keep meaning to ask - is it possible to change the navigation in gerrit? Currently it's rather hard to get to gitweb/the project path as it means going like admin -> projects -> project -> branches -> branch. Or is that a long term plan when the alternate browser has a plugin done? [19:25:00] <^demon> Yes, it's hard to change that. [19:25:07] <^demon> 2.5 is nicer though, I made those things easier to find [19:25:59] Yay, easier is good :) [20:30:25] chrismcmahon: I got the autoupdating script :-] [20:35:11] * hashar loves git [20:35:16] and the fast gerrit [20:38:13] thanks hashar! 
!log deployment-prep trying out {{gerrit|22116}} on deployment-integration (that is the beta auto updater) [20:38:59] Logged the message, Master [20:39:13] F¨¨¨¨¨¨¨K [20:39:15] aeazmel [20:39:18] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class ntp::client for i-0000034a.pmtpa.wmflabs at /etc/puppet/manifests/site.pp:57 on node i-0000034a.pmtpa.wmflabs [20:39:49] hashar: I go on vacation for a week myself in about 4 hours, could make the appropriate announcements on wikitech, etc. ? [20:39:55] could you [20:40:12] don't you want to make it on the monday when you come back from vacation? :-D [20:40:34] will hopefully have it polished up tomorrow [20:40:37] I'll be flying to San Francisco then, Mon. 10 Sep :) [20:40:39] but then will have to monitor it for a few [20:40:41] oh [20:40:45] I am landing in SF at noon [20:40:52] will hopefully be in the office on monday afternoon [20:40:59] * chrismcmahon has a very busy 2.5 weeks ahead [20:41:03] Damianz: I've had that session issue forever [20:41:04] so we might be able to write the announcement together :-] [20:41:10] Damianz: use the "remember me" option [20:41:12] ok! [20:41:18] no one seems to be able to figure out what's causing it [20:41:39] Damianz: maybe the mediawiki upgrade will help? [20:42:06] paravoid: Ryan_Lane: do you know if puppet needs a specific configuration to support modules? I got an instance unable to find the ntp::client class [20:42:15] and that uses puppetmaster::self of course [20:42:23] then you need to do a git pull [20:42:24] yes it does, but it's been merged to ::self already [20:42:32] Ryan_Lane: Hopefully [20:42:33] what ryan said [20:42:59] I know you had the getting logged out issue before, I'd never had the being logged out on only some pages before though [20:43:04] Ryan_Lane: remember how we hated the Ciscos? [20:43:10] well my local instance has a fresh update [20:43:12] well, only kind of [20:43:20] right [20:43:24] I've been reading that thread about the dells :) [20:43:29] yeah :) [20:43:54] * Damianz yawn [20:45:20] ok. food time [20:46:41] paravoid: my instance /var/lib/git/operations/puppet has the latest master but still can't find the module :D [20:46:45] any hint ? ;) [20:47:00] maybe I should restart puppet [20:47:06] oh, yeah, this won't work [20:47:23] because now it can't run to fix itself [20:47:31] \O/ [20:47:46] puppet: learn DIY please [20:48:12] ah so I got a module directory [20:48:17] looks like I need to symlink it [20:48:19] hashar: rm /etc/puppet/modules; ln -sf /var/lib/git/operations/puppet/modules /etc/puppet/modules [20:48:40] !!! [20:48:55] paravoid: you are my hero :) [20:49:05] paravoid is secretly catwoman [20:50:42] paravoid: FAQed https://labsconsole.wikimedia.org/w/index.php?title=Help:Self-hosted_puppetmaster&diff=5682&oldid=5077 [20:53:03] heh, now who's hero? :) [20:55:18] !beta applying beta::scripts to deployment-integration [20:55:18] !log deployment-prep applying beta::scripts to deployment-integration [20:55:20] Logged the message, Master [20:57:16] Can someone check out Nova Resource:I-000003e6 ? Did I do something wrong? [20:58:41] Wiki page looks ok - what problem are you having? [20:59:11] Instance state is marked error [21:00:33] Actually weird, it has no ip assigned according to the resource page... hmm has it worked since monday? [21:01:10] No i created the instance today. [21:01:30] Yeah, no Ip.
I'm unable to delete [21:01:37] So not sure what to do with it [21:01:40] Not sure then, maybe paravoid or Ryan_Lane can look at the logs [21:02:11] bah [21:02:14] Thanks for looking [21:02:18] puppet can't find my class :/ [21:12:41] gm [21:12:45] created it today? [21:12:55] likely a scheduler issue [21:13:14] I thought it was a problem with the puppet autoloader or something [21:13:25] trying something else [21:13:36] if the instance shows in an error state, it's a scheduler issue [21:13:50] hashar: what issue are you having? [21:14:08] I created a new misc::beta::scripts puppet class [21:14:12] in manifests/misc/beta.pp [21:14:13] https://gerrit.wikimedia.org/r/#/c/22116/ [21:14:20] added misc::beta::scripts to labs console puppet group [21:14:27] applied it to the deployment-integration instance [21:14:31] which runs puppet master self [21:14:42] I did the magic GIT_SSH=/var/lib/git/ssh git fetch origin refs/changes/16/22116/3 && git checkout -b 22116/3 FETCH_HEAD [21:14:45] the change is there [21:14:54] nice, puppet says err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class misc::beta::scripts for i-0000034a.pmtpa.wmflabs on node i-0000034a.pmtpa.wmflabs [21:16:21] rmoen: seems the scheduler hit some weird condition [21:16:22] restarted puppet / puppetmasterd [21:16:32] rmoen: delete/recreate the instance [21:17:01] all three compute services must have shown as unavailable when it went to schedule it [21:17:09] Ryan_Lane: issue fixed by reloading puppetmaster service :-) [21:17:20] ah. good [21:17:25] notice: /Stage[main]/Misc::Beta::Scripts/Service[wmf-beta-autoupdate]/ensure: ensure changed 'stopped' to 'running' [21:17:26] \O/ [21:17:29] I love my job [21:17:45] :o [21:17:52] 25903 ? SN 0:00 \_ git-merge -q Merge branch 'master' of https://gerrit.wikimedia.org/r/p/mediawiki/core HEAD 687fcf8b466cf947b724b77ada2222b81f6e7ad7 [21:17:53] yeahh [21:18:03] hashar: Heading towards beta CI? [21:18:08] yeah indeed [21:18:13] awesome [21:18:16] I wrote a small lame script [21:18:25] that git pulls everything every 180 seconds [21:19:10] I assume the pull is blocking so something that takes over 3min won't end up with 2 running then 3, 4, 5 etc? [21:20:02] grmblbl [21:20:07] I have no idea where upstart logs stuff [21:22:37] !log deployment-prep Deployed the automatic code updater on beta. It is running on deployment-integration, service is wmf-beta-autoupdate managed by puppet to always run. [21:22:39] Logged the message, Master [21:22:45] Ryan_Lane, tried to delete already. Fail message - The requested host does not exist. [21:22:57] hm [21:23:14] it doesn't say it succeeded in nova, but failed to delete host? [21:23:46] ugh [21:23:47] it doesn't [21:23:50] that's a bug [21:24:30] hm [21:24:44] Connects standard input to /dev/null. Standard output and standard error are connected to one end of a pseudo-terminal such that any job output is automatically logged to a file in directory /var/log/upstart/. This directory can be changed by specifying the --logdir command-line option. [21:24:45] blam [21:24:47] ah. it's just a bad error [21:24:52] looks like upstart does not really log :/ [21:24:55] error message [21:25:27] hm [21:25:33] this is still a bug, though [21:25:45] oooohhhhhhhhh [21:25:49] I see the issue [21:25:58] crap [21:26:05] deleting from the instance page is broken [21:26:16] since the instances don't have the openstack id [21:26:28] That seems likely [21:26:29] seems all of the actions are....
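The always-running updater above ("service is wmf-beta-autoupdate managed by puppet to always run") would be an upstart job on these instances; a rough sketch, with the script path and invocation guessed rather than copied from gerrit change 22116, and assuming an upstart new enough for "console log":

    # /etc/init/wmf-beta-autoupdate.conf
    description "Beta: pull mediawiki/core and extensions every 180 seconds"
    start on runlevel [2345]
    stop on runlevel [!2345]
    respawn                     # restart the pull loop if it dies
    console log                 # stdout/stderr land in /var/log/upstart/
    exec /usr/local/bin/wmf-beta-autoupdate     # hypothetical script path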
well, crap [21:26:46] I need to write a maintenance script to fix tis [21:26:47] *this [21:27:13] rmoen: I'm going to disable that menu for now [21:27:22] use "manage instances" to delete the instances for now [21:29:52] Ryan_Lane ok that time it worked [21:30:00] hm [21:30:18] Ryan_Lane, I'm curious what I had done to cause the error though. [21:30:18] yeah. I'm going to need to update all instance pages to include the new id [21:30:22] maintenance script it is [21:30:30] seems nova still has some bugs [21:30:34] this one is less likely to occur [21:30:44] but, the services have a state of up or down [21:30:46] I created an instance, and checked all three security boxes, named it and hit submit [21:31:02] if all of the compute services show as down, the scheduler says "there's no hosts to launch this instance on" [21:31:11] then it sets an error state [21:31:25] it would be nice if it just waited in the scheduling state until one of them came up [21:31:32] chrismcmahon: well I think the automatic updater is working now :-)) [21:31:35] Ahh I see. [21:39:27] Ryan_Lane: o.0 It really should have a timeout and until then it gets requeued, on timeout it errors [21:39:59] yep [21:40:09] I'm about to write an email to the list asking about this [21:40:24] ideally the fucking services shouldn't go down for no reason either [21:44:14] That too [21:44:51] oh well, at least it's fast and isn't throwing 500s now [21:50:41] I am off for now [21:50:51] bye hashar [22:38:46] Ryan_Lane, what do I need to do to get a DNS record and an IP address ? And do you accept bribes ? [22:44:56] rmoen: what do you need the IP for? i may be able to help you [22:45:40] mutante, so i can have manager type people browse to the instance [22:46:00] ok, which project is this ? [22:46:07] micro-design [22:46:40] !log micro-design raising IP quota to 1 [22:46:40] micro-design is not a valid project. [22:46:49] mutante, http://www.mediawiki.org/wiki/Micro_Design_Improvements is the project page [22:46:59] it's on editor-engagement [22:47:24] ah [22:47:44] !log editor-engagement raising IP quota to 1 [22:47:46] Logged the message, Master [22:50:00] rmoen: alright, if you go to "Manage Addresses" on labsconsole, you should be able to "Allocate IP" to the project, and after that, Associate IP to an instance [22:50:07] given that you are netadmin in your project [22:50:11] mutante, sweet ;) thanks [22:50:29] yw [22:50:53] mutante, hmm-> Failed to allocate new public IP address. [22:50:53] then you probably want to look at "Manage Security Groups" to allow access [22:54:12] rmoen: ugh, you already had a public address in that project?
[22:54:29] i see like 3 [22:54:30] mutante: That project already had like 4 public IPs [22:54:37] mutante, yeah there are 3 [22:54:38] Or 3, I guess [22:54:45] so then i did not really raise it when setting quota to 1 :p [22:54:57] mutante correct [22:55:04] !log editor-engagement raising IP quota to 4 [22:55:05] Logged the message, Master [22:55:22] Allocated new public IP address: 208.80.153.14 [22:55:42] rmoen: there you go [22:55:57] 08/30/2012 - 22:55:57 - Created a home directory for dzahn in project(s): editor-engagement [22:56:03] mutante, awesome [22:56:36] mutante, ty [22:57:49] rmoen: so now it is in your project but not assigned to an instance yet [22:58:15] !log editor-engagement allocated new public IP address: 208.80.153.14 [22:58:16] Logged the message, Master [22:59:15] ah ok, i see it is on micro-design now [23:00:31] and for the DNS name, if you just want something in wmflabs.org, go to "Add host name" in Manage Addresses [23:00:59] 08/30/2012 - 23:00:59 - User dzahn may have been modified in LDAP or locally, updating key in project(s): editor-engagement
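The "Allocate IP" / "Associate IP" buttons mutante walks rmoen through wrap OpenStack's floating-IP machinery; with CLI access and credentials for the project, the 2012-era novaclient equivalent would be roughly (the instance ID below is a placeholder):

    nova floating-ip-create                            # allocate an address to the project
    nova add-floating-ip i-000003e8 208.80.153.14      # attach it to an instance
    # plus a security group rule so "manager type people" can browse to it:
    nova secgroup-add-rule default tcp 80 80 0.0.0.0/0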