[00:19:17] !log cvn Installing python-twisted-words from apt on cvn-app1 (for cvnclerkbot.py)
[00:19:19] Logged the message, Master
[09:09:15] Coren: ping
[09:10:12] does SGE have any restrictions on threads?
[09:10:18] Fatal Python error: Couldn't create autoTLSkey mapping
[09:10:36] according to google that has something to do with not being able to create threads...
[10:38:25] !log deployment-prep rebooting both apache instances. They consume a ton of network, most probably related to Gluster
[10:38:28] Logged the message, Master
[10:40:50] !log deployment-prep rebooting jobrunner08 and bastion. High network use too.
[10:40:52] Logged the message, Master
[11:11:40] [bz] (NEW - created by: Željko Filipin, priority: Unprioritized - normal) [Bug 46649] Link to login page broken at en.m.wikipedia.beta.wmflabs.org - https://bugzilla.wikimedia.org/show_bug.cgi?id=46649
[11:53:05] !log deployment-prep restoring commonswiki on beta {{gerrit|56593}}.
[11:53:08] Logged the message, Master
[11:53:08] hashar (or anyone) - Care to help me figure out how to properly set up some instances for oauth testing? We want a master db, a slave db, and two apaches with memcached. Are there appropriate puppet groups to do any of that, or do I have to configure it manually?
[12:00:38] !log deployment-prep MobileFrontend should now let users log in again ({{bug|46649}}); the issue was most probably caused by the lack of commonswiki on beta.
[12:00:40] Logged the message, Master
[12:01:41] anomie: the apaches would run mediawiki, wouldn't they?
[12:02:10] anomie: if the OAuth code is in master, we can most probably set it up on beta
[12:02:28] that already comes with the varnish caches, mobile frontend, memcached
[12:02:34] and two databases, though there are no slaves
[12:03:00] hashar- Won't it go against the purpose of beta to be doing active development on the extension there?
[12:03:16] ahh
[12:04:49] anomie: I am not sure
[12:04:55] anomie: the idea is to keep beta stable enough
[12:05:25] so yeah, maybe you want to create your own mediawiki instance
[12:05:31] I am not sure you need master and slave db
[12:05:34] hashar- Yeah, we should probably continue with the separate instance setup.
[12:06:31] anomie: for mediawiki you can have a look at manifests/role/labsmediawiki.pp
[12:06:42] If I can do the master/slave thing, may as well, to be able to catch any issues that arise from that.
[12:06:48] maybe role::mediawiki-install-latest::labs
[12:07:09] I looked at that, but that seems to install mysql listening locally and mediawiki all on the same machine
[12:07:21] which loads modules/mediawiki_singlenode/manifests/init.pp
[12:07:35] well, that is good enough, isn't it?
[12:07:47] I mean, for a playground / demo
[12:08:02] that gets you memcached too
[12:08:31] I am not sure whether you really need to split the apaches/db across different instances
[12:08:48] Well, we want multiple apaches with shared memcached, since the auth stuff is likely going to be using that somewhat heavily.
[12:09:52] I would get some code running on a standalone instance
[12:10:03] once that seems to be more or less stable, get it on beta :D
[12:10:21] 1 apache hitting memcached is probably the same as 2 apaches hitting the same memcached, isn't it?
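A minimal sketch of the standalone setup hashar is suggesting, assuming the class he names (role::mediawiki-install-latest::labs, which loads modules/mediawiki_singlenode/manifests/init.pp) is applied through a plain node definition; the node name is invented for illustration:

    # Hypothetical node definition for a standalone OAuth test instance.
    # The role class is the one quoted in the conversation above.
    node 'oauth-dev.pmtpa.wmflabs' {
        # Single-node MediaWiki: Apache, a locally-listening MySQL, and
        # memcached all end up on this one instance.
        include role::mediawiki-install-latest::labs
    }

On Labs of that era the class would more likely be ticked on the instance's configure page than written into a node block by hand, but the effect is the same.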
[13:35:51] legoktm: It doesn't, but perhaps you ran out of memory?
[16:54:15] * ^demon spots a Ryan
[16:54:19] ;)
[16:55:11] when you -1 something and want me to merge it, please set it back to +1
[16:56:55] <^demon> +1'd all of them :)
[16:57:15] <^demon> And had jenkins recheck the failure.
[16:59:49] thanks :)
[17:01:50] this has a -1: https://gerrit.wikimedia.org/r/#/c/55259/1
[17:02:13] <^demon> hashar is being pedantic :p
[17:02:34] <^demon> It's the same thing we do 8 lines down for pushing to github.
[17:02:44] do whatever you want :-]
[17:02:51] heh
[17:02:52] I still think using an array is nicer and less error prone hehe
[17:03:11] ok. so, is this gerrit update actually ready to go out?
[17:03:18] <^demon> Yep.
[17:07:23] <^demon> This release finally contains the fetch/clone improvements. Once we run jgit's gc on our repos, we should notice a huge performance increase.
[17:08:32] andrewbogott:
[17:08:34] aea
[17:10:42] Ryan_Lane: I could use a merge from you, to stop puppet from generating the puppet documentation on each run. I have migrated that to a jenkins job, which would make puppet a bit faster on the contint server. I talked about it with andrew already .. https://gerrit.wikimedia.org/r/#/c/53958/
[17:11:12] the jenkins job does work: https://integration.wikimedia.org/ci/view/Operations/job/operations-puppet-doc/5/console
[17:15:35] ^demon: all merged in, and the package is in the repo
[17:15:47] <^demon> Awesome, thanks.
[17:17:06] yw
[17:17:21] hashar: done
[17:22:29] \O/
[17:23:01] and the merge triggered a new documentation generation!!!
[17:24:45] heh
[17:24:46] hey Ryan_Lane -- ping re the note I sent about Tom
[17:25:42] ah. I'll have to figure out when I'm free
[17:26:38] Thanks Ryan_Lane - it's also ok to say "argh no I don't have any time while in town" - Tom's a reasonable guy and will understand
[17:26:50] * Ryan_Lane nods
[17:27:06] How did last night's workshop go? I'm sorry I couldn't come
[17:27:46] it was good
[17:28:03] I believe we'll probably get one or two volunteers from it
[17:33:56] <^demon> Ryan_Lane: One minor cleanup, otherwise looks good: https://gerrit.wikimedia.org/r/#/c/56612/
[17:35:27] see, hashar was right! it would be less error prone!
[17:35:29] :)
[17:35:42] <^demon> lol.
[17:35:51] what actually changed here?
[17:36:03] whitespace?
[17:36:18] <^demon> Quotes from " -> '
[17:36:20] ah
[17:36:28] <^demon> ${name} was being evaluated by puppet to the class name.
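To spell out the bug ^demon describes: in Puppet, double-quoted strings interpolate variables, and inside a class ${name} resolves to the class's own name; single-quoted strings are passed through literally. A made-up stand-in, not the actual resource from change 56612:

    # Illustrative only; mirrors the " -> ' fix discussed above.
    class replication_demo {
        exec { 'push-mirror':
            # Broken: double quotes made Puppet expand ${name} at compile
            # time into the class name ('replication_demo'), not the git
            # variable it was meant to be:
            #   command => "/usr/bin/git push mirror ${name}",
            # Fixed: single quotes keep ${name} literal for whatever
            # consumes the string later:
            command => '/usr/bin/git push mirror ${name}',
            path    => ['/usr/bin', '/bin'],
        }
    }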
[17:36:52] done
[17:38:39] <^demon> submitted, merge pending my ass.
[17:42:05] :D
[17:42:33] <^demon> The hell is going on with that patch?
[17:51:31] Ten minutes till LevelUp office hour in #wikimedia-dev
[17:59:57] <^demon> Ryan_Lane: Ok, fixed those patches that got stuck in merge limbo.
[18:00:11] <^demon> Apparently upstream renamed a parameter to comment-added and I missed it.
[18:00:11] <^demon> https://gerrit.wikimedia.org/r/#/c/56616/
[18:00:16] <^demon> That should be the last of it.
[18:02:29] done
[18:03:06] <^demon> And live.
[18:07:39] hashar: Are you all sorted out, or did you still need something?
[18:07:46] ok, LevelUp meeting happening now in #wikimedia-dev
[19:10:16] dr0ptp4kt, do you have shell access to labs?
[19:10:35] MaxSem, yes
[19:10:49] then you should be ok
[19:11:16] MaxSem, to which host do i need to jump to from bastion?
[19:11:47] HOLY PREPOSITION BATMAN! sorry about that!
[19:12:36] deployment-bastion
[19:13:01] i'm in. thanks, MaxSem!
[20:36:42] How do I set up DNS entries for my labs instances, like how beta has en.wikipedia.beta.wmflabs.org?
[20:39:27] anomie: if you are a project admin there should be "Manage Addresses" in the sidebar
[20:40:06] mutante- I click that. Then "Allocate IP" seems the only thing to do. Then it tells me "Failed to allocate new public IP address."
[20:40:54] which project is this?
[20:40:58] oauth
[20:42:45] !log raising IP quota on project oauth to 1
[20:42:46] raising is not a valid project.
[20:42:47] :)
[20:43:01] anomie: there is a quota on it, changed from 0 to 1
[20:43:17] Thanks
[20:43:17] !log oauth raising IP quota to 1
[20:43:19] Logged the message, Master
[20:43:45] i hope there are still enough in the current pool, but there should be afaik
[20:57:30] petan: Ping
[20:58:50] petan: Could you make wm-bot in #mediawiki-visualeditor use the same brain as #wikimedia-dev etc.? Assuming it doesn't have anything special, it's easier to just make it use the same one instead of maintaining two
[20:59:06] for example, it had a wrong version of !g (using https://gerrit.wikimedia.org/r/$1 instead of https://gerrit.wikimedia.org/r/#q,$1,n,z)
[20:59:13] (causing hashes not to work)
[21:00:13] I know of the command @infobot-link but not sure in which direction that works, and which it destroys (if at all)
[21:00:25] and which brain it destroys*
[21:25:23] Krinkle|detached hi
[21:28:50] anybody able to give a bot test account the bot and reviewer flags on ee-prototype?
[21:29:12] Vacation9 no idea what you're talking about
[21:30:18] me and Fox are making a bot for aft5, and it would be nice to test it on http://ee-prototype.wmflabs.org/
[21:30:40] oh
[21:30:49] and for that we need reviewer
[21:30:52] @labs-resolve prototype
[21:30:52] I don't know this instance - aren't you looking for: I-0000013d (ee-prototype),
[21:31:00] @labs-info ee-prototype
[21:31:00] [Name ee-prototype doesn't exist but resolves to I-0000013d] I-0000013d is Nova Instance with name: ee-prototype, host: virt6, IP: 10.4.0.93 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: editor-engagement, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img
[21:31:19] @labs-project-users editor-engagement
[21:31:19] Following users are in this project (displaying 19 of 24 total): Bsitu, Halfak, Kaldari, Ryan Lane, Maryana, Matthias Mullie, Novaadmin, Ori.livneh, Raindrift, Spage, Swalling, Tychay, Werdna, Robmoen, Dzahn, Massaf, Mattflaschen, Lwelling, Legoktm,
[21:31:32] ok, any one of these people can help you :)
[21:31:46] okey dokey, thanks
[21:31:58] @labs-user Petrb
[21:31:58] Petrb is member of 15 projects: Bastion, Bots, Configtest, Deployment-prep, Deployment-prepbackup, Etherpad, Gareth, Huggle, Hugglewa, Nagios, Openstack, Search, Tools, Upload-wizard, Webtools,
[21:32:06] not me :)
[21:32:19] though I would be happy to
[21:32:51] mutante: you can check what's in the pool and how they are allocated using: nova-manage floating list
[21:33:09] ah Ryan_Lane could you help me with something
[21:33:40] I need a bot account on ee-prototype with bot and reviewer, would you be able to do this?
[21:36:52] [bz] (RESOLVED - created by: Antoine "hashar" Musso, priority: Normal - normal) [Bug 44424] wikiversions.cdb does not vary by realm - https://bugzilla.wikimedia.org/show_bug.cgi?id=44424
[21:39:13] [bz] (RESOLVED - created by: Sumana Harihareswara, priority: Normal - critical) [Bug 40991] squid001 gives "Zero Sized Reply" error on POST request - https://bugzilla.wikimedia.org/show_bug.cgi?id=40991
[21:39:42] [bz] (RESOLVED - created by: Antoine "hashar" Musso, priority: Normal - major) [Bug 41132] live hack in beta mediawiki-config (tracking) - https://bugzilla.wikimedia.org/show_bug.cgi?id=41132
[21:51:34] Vacation9: I don't have any permissions on that wiki
[21:52:40] Ryan_Lane: Okay
[22:02:42] [bz] (NEW - created by: silke.meyer, priority: High - normal) [Bug 45609] Input/Output errors in a /home directory - https://bugzilla.wikimedia.org/show_bug.cgi?id=45609
[22:12:41] Coren: you know, we could actually design and test the nfs crap using labs instances
[22:12:47] we don't need production machines for it
[22:12:56] it'll be a lot quicker
[22:14:02] we can't test moving service IPs and such, but we can do the rest
[22:50:46] andrewbogott: sooooo much code to review :)
[22:51:01] It's mostly the same as last time
[22:51:05] * Ryan_Lane nos
[22:51:07] *nods
[22:51:36] thank god gerrit lets you review between patchsets :)
[22:52:00] well, it only sort of allows you...
[22:52:09] The diff includes anything else that has changed in the codebase in the meantime
[22:52:19] Which in OSM is not a big deal but in an active project is pretty awful
[22:52:22] not if you push it in without a rebase
[22:52:56] Oh, hm. I assume that if it isn't rebased it won't merge. But I guess gerrit can do the rebase now
[22:53:22] or you rebase when you're totally done
[23:24:23] andrewbogott_afk: change looks good.
[23:30:55] Ryan_Lane: We *could*, but any test would make shit.
[23:31:06] shit (results)
[23:31:41] are we expecting different performance for different solutions?
[23:31:50] and are we really still considering the proxy much
[23:31:51] ?
[23:32:15] I know CT favors it, and will be convinced to go forward with a pure server by numbers rather than philosophy.
[23:33:38] the proxy is a pretty complicated solution and will almost definitely be slower
[23:33:58] it also still allows labs to slow down the netapp for production use
[23:34:29] lots of things in labs are already unnecessarily complicated
[23:34:57] if we can't directly use the netapp I'm in favor of just doing an NFS box
[23:35:44] Coren: so, I do think a proof of concept is necessary
[23:36:11] specifically, I want to proof-of-concept replication methods
[23:36:44] Oookay. Different project or do I just fire up a pair of happy fun instances in tools and use myself as guinea pig?
[23:36:52] I was really only recommending its use because we can start there immediately
[23:37:09] I note there is a lack of large storage options for instances though. ;-)
[23:37:15] I wouldn't recommend actually using a labs instance other than for testing things out :)
[23:39:33] So, I start playing with a few small instances and work out a decent replication scheme?
[23:39:39] yep
[23:39:50] so, in production, it looks like the labstore systems are set up as two raid 6s and a single disk for the os
[23:40:04] I think I'd like to use a raid 1 for the os in the new config
[23:40:14] How many bays?
[23:40:18] about 24 disks
[23:40:19] (total, I mean)
[23:40:20] I believe
[23:40:25] * Coren calculates.
[23:40:50] I'm fine continuing with two raid 6
[23:41:24] That'd leave 11 disks per raid 6; 9+2 or 8+2 and a spare?
[23:41:42] I think I'm using spares
[23:41:45] it's 32TB
[23:41:50] in the current config
[23:41:55] err
[23:41:57] 36
[23:42:08] 9+2 then, more likely
[23:42:22] And you have /a/ spare since you don't raid1 the OS disk.
[23:42:30] likely
[23:42:39] I'd need to reboot into raid utils to actually know
[23:42:51] I probably should have documented that
[23:43:01] Probably. :-) But the numbers add up
[23:43:49] well, I'm in labstore1001's mgmt console
[23:43:54] (that's eqiad)
[23:44:02] so, we can discuss some options
[23:44:32] Are there more than one controller in them frames?
[23:44:43] I don't remember
[23:44:50] probably, though, yes
[23:45:00] Because there's no particularly good reason to have two arrays if there isn't.
[23:45:13] it's a *lot* of disks for a single raid
[23:45:54] With good striping on the filesystem, that's not a big obstacle; but the cost isn't too bad to have two.
[23:47:04] the drives are 3TB
[23:47:20] how the fuck do I get into the raid config?
[23:47:24] ... the math doesn't add up to 36 TB then.
[23:47:32] the console jumbles up all of the stupid output :(
[23:47:39] maybe it's ctrl-e?
[23:47:58] Dell's?
[23:48:19] well, it's a dell. yeah
[23:48:24] but it depends on the controller
[23:48:52] I thought all PERCs used the same
[23:49:23] what's the escape sequence for it?
[23:49:37] I've seen ctrl-c, ctrl-e, and a number of others
[23:49:47] ctrl-r is this one
[23:49:48] it's LSI
[23:49:57] I only see a single controller
[23:50:04] PERC H700
[23:50:22] and it has 12 disks
[23:50:52] there must be another controller
[23:51:46] it's 2TB disks
[23:51:46] on this controller
[23:52:00] Ah. That's what I thought.
[23:52:34] so, a raid 1 with the first two disks
[23:53:33] Wait, that probably won't work -- if you have two controllers I very much doubt you can span an array across them.
[23:53:40] you can't
[23:53:51] you can do two raid 6, then LVM them
[23:54:14] Well yeah, but that won't help for the OS disk unless we soft raid1?
[23:54:42] you can do a raid 1 and a raid 6 on one controller
[23:54:46] then a raid 6 on the second
[23:54:53] and LVM the two raid 6s
[23:55:05] Incidentally, on a pure fileserver, I found that software raid over JBOD is /more/ efficient.
[23:55:22] that's an option, for sure
[23:56:07] There's a CPU cost, but on a fileserver we don't care so much and the kernel is better at understanding the filesystem layout to avoid write amplification.
[23:59:15] And we get to double the IO bandwidth by spanning both controllers.
[23:59:16] there's a complexity cost, too
[23:59:36] how would we not be spanning both controllers otherwise?
[23:59:43] ah. I see what you mean
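A hedged sketch of the layout being discussed, in the software-RAID-over-JBOD variant Coren favors: one RAID 6 per controller, glued together with LVM and striped so I/O spans both controllers. Every device name and disk count below is an assumption for illustration, not the real labstore layout:

    # Assumed disk layout: sda/sdb mirror the OS, sd[c-l] hang off
    # controller A, sd[m-x] off controller B.
    exec { 'raid6-controller-a':
        command  => 'mdadm --create /dev/md1 --run --level=6 --raid-devices=10 /dev/sd[c-l]',
        creates  => '/dev/md1',
        path     => ['/sbin', '/usr/sbin', '/bin'],
        provider => shell,  # shell provider so the /dev/sd[c-l] glob expands
    }
    exec { 'raid6-controller-b':
        command  => 'mdadm --create /dev/md2 --run --level=6 --raid-devices=12 /dev/sd[m-x]',
        creates  => '/dev/md2',
        path     => ['/sbin', '/usr/sbin', '/bin'],
        provider => shell,
    }
    exec { 'lvm-span-controllers':
        # -i 2 stripes the logical volume across both physical volumes,
        # i.e. across both controllers, which is the bandwidth point above.
        command  => 'pvcreate /dev/md1 /dev/md2 && vgcreate store /dev/md1 /dev/md2 && lvcreate -i 2 -l 100%FREE -n data store',
        creates  => '/dev/store/data',
        path     => ['/sbin', '/usr/sbin', '/bin'],
        provider => shell,
        require  => [Exec['raid6-controller-a'], Exec['raid6-controller-b']],
    }

The OS mirror can stay on the hardware controller as discussed; the JBOD question only affects the data disks.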