[00:01:05] ok. updated [00:01:21] looking [00:02:14] can't just say /vX/. need to actually split the doc apart per version [00:02:20] since the urls themselves can change too [00:03:28] {tenant_id} in openstack is actually a uuid, but we don't use uuids for projects [00:03:36] which actually means we can't rename projects :D [00:06:14] one plus of using hostnames for map id is that enforcing ownership by project is easier [00:07:11] looks good to me, Ryan_Lane [00:07:15] cool [00:07:23] you've a couple of extra / left [00:07:31] fixed em already :) [00:08:45] Ryan_Lane: heh, right. [00:09:10] Ryan_Lane: what if... I didn't use python, but used some other language? like... node? [00:09:16] for the API [00:09:31] I'm totally cool with using Python, since I know that is what most people are comfortable with [00:09:39] but JS has a lot of people comfortable with it too [00:09:46] python would be good [00:10:06] we're basically making an openstack api [00:10:19] the rest of openstack is python, so it's good to stick with the ecosystem's language [00:10:26] makes sense [00:10:44] oh, btw. use Apache2 license :) [00:11:04] Ryan_Lane: I was thinking this would be just a module in the puppet repo? [00:11:18] hm [00:11:20] Ryan_Lane: since the rest of the stuff is there, makes sense to keep this there too? [00:11:26] Ryan_Lane: but then I can't really +2 things to it, which is bad. [00:11:36] a separate repo would be good [00:11:39] hmm, yes. [00:11:45] dual licensed to Apache2 and WTFPL :P [00:11:46] we can deploy to it using git-deploy [00:11:56] heh [00:11:57] ooo! [00:12:03] would need virtualenv too [00:12:16] shouldn't be too hard. [00:12:17] dual licensing makes things murky ;) [00:12:35] but it's your code [00:12:35] sure, I'll just put WTFPL on it :P [00:12:43] that's an evil license [00:12:47] pfft. [00:13:25] Ryan_Lane: btw, where exactly does wikitech run from? [00:13:30] virt0 [00:13:34] ah, right. 
[00:13:36] not 'cluster' [00:13:39] yep [00:13:49] so we can easily firewall proxy-project-project-proxy and wikitech [00:13:57] so no SSL, etc. [00:14:00] fine, fine [00:14:04] SSL would be preferable :) [00:14:26] inside the cluster? [00:14:28] well, I guess that would require a real cert [00:14:34] yes [00:14:49] this would need to travel through the network node [00:14:51] uwsgi supports ssl out of the box if you can give it a cert. [00:15:07] well, we're going to need a cert for the webserver anyway ;) [00:15:13] Ryan_Lane: oh, yes. that is true. [00:15:56] Ryan_Lane: btw, tools will have its own instance. [00:16:06] of proxy. [00:16:08] not use the labs-wide one [00:16:14] why's that? [00:16:30] because we want to use... identd for auth :D [00:16:39] eh? [00:16:43] idea is [00:16:50] there's a different daemon that'll sit there. [00:17:04] and we write wrappers for jsub for different languages (node, uwsgi, go, etc) [00:17:17] that'll start the job on SGE, and then communicate the hostname/port to the proxy API [00:17:27] ah [00:17:30] Ryan_Lane: and then the proxy thing will call identd on the host to confirm who is the caller [00:17:30] that seems sane [00:17:50] Ryan_Lane: and the mappings will be to tools.wmflabs.org//*whatever* [00:17:54] Ryan_Lane: and not just domains. [00:17:54] so [00:18:07] is ident actually a sane way of doing that? [00:18:23] Ryan_Lane: of verifying that the person making this connection is the tool user? yes! [00:18:24] isn't that pretty spoofable? :) [00:18:29] Ryan_Lane: only if you've root [00:18:33] ah. ok [00:18:39] Ryan_Lane: and if you've root on toollabs, we're fucked anyway. [00:18:42] yep [00:18:57] [00:18:58] well, if that's the case, we don't really need the api or anything for tools [00:19:04] it can just write to redis directly [00:19:13] Ryan_Lane: ah, no. [00:19:16] no need for wikitech to handle it [00:19:22] that makes things easier :) [00:19:29] Ryan_Lane: yeah, no wikitech for toollabs.
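[Editor's aside: the identd-based ownership check YuviPanda describes above — the proxy asking the identd on the grid host who owns the registering connection — follows RFC 1413. A minimal sketch in Python; the daemon name, ports, and the `tools.<name>` user scheme here are assumptions, not taken from the log.]

```python
import socket

def parse_ident_reply(line):
    """Parse an RFC 1413 identd reply such as
    '6193, 23 : USERID : UNIX : tools.mytool' and return the
    reported user name, or None on an ERROR reply."""
    parts = [p.strip() for p in line.split(":", 3)]
    if len(parts) == 4 and parts[1] == "USERID":
        return parts[3]
    return None

def ident_lookup(host, server_port, client_port, timeout=5.0):
    """Ask the identd on `host` who owns the connection from
    client_port to server_port (hypothetical usage in the proxy API)."""
    with socket.create_connection((host, 113), timeout=timeout) as s:
        s.sendall(f"{server_port}, {client_port}\r\n".encode("ascii"))
        reply = s.makefile().readline()
    return parse_ident_reply(reply)
```

As noted in the discussion, this is only trustworthy because non-root users on the grid hosts cannot forge the identd's answer.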
[00:19:50] Ryan_Lane: you can't make toollabs write directly to redis, since then anyone can remap anyone else :) [00:20:01] Ryan_Lane: at least with wikitech you can put access controls on the wikitech side :) [00:20:25] Ryan_Lane: with that, yes - if you agree to put the data in the wikitech db, then the API layer is useless [00:20:31] well, I mean some other thing that Coren handles will handle tools' proxy [00:21:03] Coren|Away: won't have to do anything with uwsgi tyrant, and the current tools-webproxy can also go away :) [00:21:22] our apaches will be degraded to lowly php-execution-hosts, and bwahahah! :P [00:21:29] * Ryan_Lane wants mediawiki to do less and less in the future [00:21:41] yeah, a sentiment I agree with :) [00:21:53] at some point we may want to switch to using horizon rather than MW + OSM [00:21:59] * andrewbogott fell asleep, oops [00:22:07] andrewbogott: hah. I had that issue last week [00:22:11] But, yeah, I will work on packaging. [00:22:50] YuviPanda: the other plus of writing an API is that your service handles all the logic [00:23:12] andrewbogott: do I need to give more details about the package? [00:23:26] Ryan_Lane: oh yeah, +1. I can change how stuff is done on Redis side without worrying about wikitech [00:24:31] yep [00:24:42] as long as the API works the same way, the logic can change [00:25:11] and the logic is within the same language, rather than spread between them :) [00:25:13] YuviPanda, I think we have a git repo for the existing nginx builds… figuring I'll just use that to build the tip (plus whatever security patch.) Is there more to it than that? [00:26:18] andrewbogott: the lua module in even the latest nginx-extras is waaay outdated. [00:26:44] andrewbogott: and needs updating. same with nginx. I'd need at least 1.3 of nginx and 0.6 of lua - and preferably 1.5 of nginx and 0.8 of lua. [00:26:55] andrewbogott: so I guess you need to pull in new sources?
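[Editor's aside: the earlier point — "as long as the API works the same way, the logic can change" — can be sketched as a thin mapping layer over Redis. All names and the key layout below are invented for illustration; the real service would pass in an actual `redis.Redis()` client rather than the toy in-memory store shown here.]

```python
class RouteRegistry:
    """Sketch of the proxy's mapping layer: a tool name maps to a
    backend host:port. `store` is anything with hset/hget, e.g. a
    redis client in the real service; callers only see register()
    and lookup(), so the storage details can change freely."""

    KEY = "proxy:routes"  # hypothetical key name

    def __init__(self, store):
        self.store = store

    def register(self, tool, host, port):
        # e.g. requests to tools.wmflabs.org/<tool>/* go to host:port
        self.store.hset(self.KEY, tool, f"{host}:{port}")

    def lookup(self, tool):
        val = self.store.hget(self.KEY, tool)
        return val if val is None else str(val)

class DictStore:
    """Minimal in-memory stand-in for redis, for illustration only."""
    def __init__(self):
        self.h = {}
    def hset(self, key, field, value):
        self.h.setdefault(key, {})[field] = value
    def hget(self, key, field):
        return self.h.get(key, {}).get(field)
```

The access-control argument above lives in whatever calls `register()` — that is the part toollabs clients never touch directly.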
[00:29:18] Yeah, I meant 'the tip' of nginx-extras [00:33:22] andrewbogott: hmm, I think that is what I meant, yeah :) [00:33:29] andrewbogott: 'tip' -> from upstream? [00:33:37] yeah [00:33:40] right! [01:03:14] [01:03:38] Coren|Away: ping [01:04:13] Where's petan? [01:04:41] Anybody working at labs right now? [01:04:47] Crashed hours ago. [01:05:04] -.- [01:05:04] @seen petan [01:05:04] Technical_13: Last time I saw petan they were quitting the network with reason: Ping timeout: 264 seconds N/A at 8/27/2013 5:36:31 PM (7h28m33s ago) [01:05:15] @seen Coren|Away [01:05:15] CP678|Webchat: I have never seen Coren|Away [01:05:32] @seenrx Coren.* [01:05:32] Technical_13: Last time I saw Coren they were changing the nickname to Coren|Away and Coren|Away is still in the channel #wikimedia-operations at 8/27/2013 11:22:31 PM (1h43m1s ago) (multiple results were found: Coren|AFK, Coren|Sleep, |Coren|, Coren|Dinner, Coren|Lunch and 14 more results) [01:05:54] @seen Coren [01:05:54] CP678|Webchat: Last time I saw Coren they were changing the nickname to Coren|Away and Coren|Away is still in the channel #wikimedia-operations at 8/27/2013 11:22:31 PM (1h43m22s ago) [01:06:06] Lovely. [01:06:18] Perfect time for labs to crash. [01:06:26] What's up? [01:06:43] You in charge of xtools now? [01:06:44] I'm trying to monitor my spambot program. [01:07:00] For the most part yes. [01:07:22] CP678|Webchat: What's up? [01:07:22] Update the new page counter. :) [01:07:54] Technical_13: I'm busy. [01:08:10] It doesn't do all ns [01:08:19] Well... [01:08:37] It does if you mess with the url [01:09:17] Coren: labs is broken. [01:09:36] And I'm guessing that my bot is still in operation. [01:10:01] Which means, without any control, it might start mass tagging pages tomorrow morning. [01:10:56] Ah, webserver-01 ran out of ram again. xtools once more. [01:12:18] Yeah, X!'s editcounter is br0ked. [01:13:47] * Coren tries to find out what is wrong with it. [01:20:06] Hmmm.
Nope, something deeper is going on. [01:20:13] Coren: The edit counter and all of his tools ran just fine on toolserver. [01:24:00] yeah, grrrit-wm left! [01:30:47] just had my IRC bots go down [01:31:10] (on tools) [01:32:20] rschen7754: I'm on it. Should return shortly. [01:32:25] :D [01:32:27] thx [01:35:02] Yeah, there's definitely something wrong with the xtools atm; either it spams error messages or just hangs there consuming memory. [01:41:06] There are several hundred thousand errors in the log in the past few days; most being syntax errors. I'm thinking there was a change recently that broke something. [01:47:54] Coren: er, so what's breaking the rest of toollabs? [01:48:40] Yas again? [01:49:00] legoktm: I have no idea. A lot of stuff is running hard and it's not clear why. [01:49:39] There's a LOT of filesystem traffic. [01:49:51] &ping [01:49:54] @ping [01:50:06] Nope.. guess that is gone. [01:50:30] Looks like it's all coming from -exec-04 [01:50:36] and slowing everything else down. [01:51:09] back up now, or still down? [01:51:32] rschen7754: Still iffy. One of the exec nodes is crunching so hard it's causing pain elsewhere. [01:51:45] ok, so kicking would not be a good idea then :P [01:52:10] tools-exec-04 Load: 436.7% [01:52:14] * legoktm facepalms [01:52:17] :( [01:53:36] Yeah, -exec-04 is completely bust. [01:54:03] * Coren headdesks. [01:54:10] Which is why I just restarted -02. Of course. [01:54:14] * Coren is a moron. [01:54:33] At least the gridengine will just reschedule stuff. [01:54:37] Coren: are you using ? [01:54:37] Yeay gridengine. [01:55:03] !blink [01:56:36] Coren: ping [01:56:53] Pong [01:57:19] Coren can you tell me if a file on xtools was changed recently. I have no access at the moment.
[01:57:26] [21:55] !blink [01:57:26] [21:55] Technical_13: When you use , you're not just hurting yourself, but also hurting those around you [01:57:31] !blink del [01:57:31] Unable to find the specified key in db [01:57:39] ah, even better :D [01:57:46] CP678|Webchat: I'll be able to tell you in a minute; I'm making sure the compute nodes go back up. :-) [01:58:01] !YuviPanda del [01:58:02] Unable to find the specified key in db [01:58:30] xtools hasn't been causing problems like this before. [02:00:13] That first puppet run on reboot is soooo slow. [02:00:30] CP678|Webchat: Lemme check to see if I see dates that look askew [02:02:21] [28-Aug-2013 01:58:09 UTC] PHP Warning: syntax error, unexpected '=' in Unknown on line 33 [02:02:21] That's about the worst error message ever. in "Unknown"? [02:03:07] At least you know it's on line 33... [02:03:09] :/ [02:03:11] WTF is Unknown?!? [02:03:13] hey, better than 'Unknown error, in index.html line 0' [02:03:19] (PhoneGap once gave me that) [02:03:25] oooo. perhaps caused in the parse_ini_string(); so it'd come from the ini file.addimagepath [02:03:57] Coren: have you found any changes within the last 2 months. [02:04:12] CP678|Webchat: Looking now. [02:04:36] The dates on every file should be around 1.5 months old, approx. [02:05:01] YuviPanda: it could be worse, it could be: Syntax error at '}'; expected '}' [02:05:30] find takes forever: there are directories in there with 100K+ files! [02:06:20] I have a feeling JackPotte's been tampering with the files again. [02:06:33] Anything he touches, tends to break the code. :/ [02:06:40] Ryan_Lane: at least with that you can think 'damn, encodings!' or 'damn, indents!' [02:06:52] Ryan_Lane: *try* finding a js error in line 0 of index.html [02:07:06] :) [02:07:08] I'm seeing mostly config files changed Jun 13 [02:07:33] Can you tell which user changed them? [02:07:49] No, not really. [02:07:55] Oh wait. Is it a mass change? [02:08:37] Coren: ^ [02:08:54] No just a couple.
[02:09:16] Can you give me the file paths for those files? [02:10:03] public_html/articleinfo/data is really evil when you try to find; it forces 100K+ stat() calls over NFS [02:10:16] ... that should really go in a DB, not on the filesystem. [02:11:24] Coren: actually, that's a temp directory. The files should actually delete after a certain amount of time. [02:11:59] Does it hurt anything if I blow its contents away? [02:12:07] No [02:15:21] Coren: anything new? [02:15:25] Coren: is labs back up? [02:15:39] YuviPanda: Everything seems to be working atm [02:15:47] Coren: jobs weren't restarted? [02:16:36] YuviPanda: I see no continuous jobs in error mode, and lots of restarted ones. Which one were you worrying about? [02:16:49] Coren: lolrrit-wm [02:17:00] Coren: Ive got jobs that didnt restart [02:17:43] Ryan_Lane: I miss grrrit-wm. [02:18:21] Coren: it seems 'stuck'? [02:18:28] job 748654 [02:18:29] !jobs [02:18:37] Please do not restart my jobs. I beg you. [02:18:54] CP678|Webchat: I only see config files and templates changed. No code. [02:19:17] Coren: 748654 seems stuck in some sort of... deletion purgatory? [02:19:19] When was the latest change? [02:19:22] Coren: job 748654 is already in deletion [02:19:29] Coren: but that's been like that for a minute or so now [02:19:54] YuviPanda: Yeah, I haven't seen that state before. Gimme a sec to find the info for CP678|Webchat then I'm all yours. [02:20:03] Coren: alright. [02:20:15] Elsie: that 'job 748654 is already in deletion' is from grrrit-wm tho [02:21:16] Hm? [02:21:20] CP678|Webchat: Outside of Peachy, most recent I see is sitenotice.php on Jul 28 [02:21:53] Elsie: you said you missed it :) [02:22:05] I'd like to know what's going on in Gerrit. [02:22:12] Without grrrit-wm, I'm a lost child. [02:22:20] That was me. Hmm. Something seems to be going haywire. I set it to display all errors on the page, so I can be notified of them real quickly. 
[02:22:42] * CP678|Webchat puts Elsie up for adoption [02:22:52] Elsie: stalker [02:23:17] Coren: thanks. [02:23:30] Peachy has a lot of changes in August. [02:23:52] Coren: that would be the autoupdater at work. [02:24:11] Peachy updates itself. [02:24:19] YuviPanda: That's an odd state. "running but deleted"; looks like it's stuck trying to kill it. [02:24:29] Coren: can you put it to rest? [02:25:30] Coren: still zombie'd [02:25:38] Coren: actually one last question. Can you tell when xtools began to flood error messages? [02:26:14] The edit counter seems to be hanging up. :? [02:26:44] Great. I get to do some massive debugging tomorrow. If I can even find the time. [02:26:46] Hm. Whatever happened earlier seems to have made more than one compute node sick. [02:26:51] CP678|Webchat: Lemme check. [02:31:50] Actually, I see errors pop up in large bursts starting around 31 Jul [02:33:06] Coren: any luck in killing it? [02:33:15] And then a big bunch starting the 25 [02:33:21] (Aug) two days ago [02:33:49] YuviPanda: The grid seems to be mildly wedged. It should unstick as soon as -03 reports in (a few minutes) [02:33:57] Coren: okay [02:34:45] grrrrit-wm, otoh, seems to not have restarted at all for some reason. [02:36:20] Coren: err. I pushed https://github.com/cyberpower678/Peachy/commit/e16773bfcea03ffba42eee32bd3185a604977837 to Peachy on July 31. [02:37:02] That seems unlikely to be related. [02:37:54] Coren: so, I've been trying to kill it myself [02:37:55] no luck either [02:38:04] Other than that, I can't find a connection to these error surges. [02:38:07] tried again. nothin [02:38:26] YuviPanda: Yeah, I know; the scheduler is confused about the state of a node. [02:38:38] Well, I'm going to bed. Thanks for your help Coren. Good night everyone.
[02:38:39] alright, I'll wait some more :) [02:38:54] i'd want to get grrrit-wm back up before sleeping, but it is already 8:00 AM [02:39:22] The puppet runs on boot make the boxes take *forever* to startup [02:40:19] Ah, it's back up so now the delete went through okay. [02:42:43] Coren: yes :) [02:43:10] I really don't get what happened. Every single instance seems to be wonky to some degree. [02:46:41] [bz] (REOPENED - created by: MZMcBride, priority: Normal - normal) [Bug 36885] wmflabs.org does not resolve - https://bugzilla.wikimedia.org/show_bug.cgi?id=36885 [02:53:36] ... there seems to be a lock held on the page table of enwiki_p. [02:53:46] Perhaps that's what caused all those things to pile up and wedge. [03:06:33] Boom. Headshot. [03:07:08] oh come on. we did spray bullets elsewhere first :P [03:09:17] And X!'s edit counter lives again. Not entirely surprising that it'd wedge if it couldn't access the page table. [04:28:26] Hi all [04:30:31] I have Q [04:30:54] Is someone there to help me [04:32:03] Elph: ? [04:32:53] I want to get report from wikiquote [04:33:29] labs question: I'm working with an older MediaWiki labs instance, does anyone know where the admin password was stored prior to the orig/adminpass setup? [04:33:43] What changes should I do to wikipedia's reports code [04:33:45] ? [04:34:29] idk about wikiquote [04:34:55] Elph , you might want to ask on #mediawiki, I'm not sure how that's a labs question unless you have a labs instance where you're running the report [04:35:23] thanks [04:36:36] Elph, I'm not sure what you mean by "report", but there are dumps of the content, e.g. http://dumps.wikimedia.org/enwikiquote/20130827/ [04:36:53] or dbs [04:37:10] report means the result of a query :) [04:37:23] Elph: API?
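[Editor's aside: the replica-database answer that follows — connect to hosts like `enwikiquote.labsdb`, database `enwikiquote_p`, with credentials from `~/replica.my.cnf` — can be sketched as below. The credentials file is INI-style with a `[client]` section; the helper name is invented, and the commented connection call assumes a MySQL client library such as pymysql is available on the tools hosts.]

```python
import configparser

def replica_credentials(text):
    """Pull user/password out of the body of a replica.my.cnf-style
    file: the [client] section Tool Labs writes to each home dir."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return cp["client"]["user"], cp["client"]["password"]

# Hypothetical usage on a tools host (pymysql assumed installed):
#   import os, pymysql
#   with open(os.path.expanduser("~/replica.my.cnf")) as f:
#       user, pw = replica_credentials(f.read())
#   conn = pymysql.connect(host="enwikiquote.labsdb",
#                          db="enwikiquote_p", user=user, password=pw)
```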
[04:37:30] sql [04:37:37] hi [04:37:37] oh [04:37:43] need one help [04:38:13] it's simple query [04:38:24] on login to server [04:39:21] I am getting error like " The user key you've selected is not registered in the remote server" [04:39:36] i have updated the public key in openstack [04:40:39] Elph: connect to a server like "commonswiki.labsdb" (checking about wikiquote) [04:42:10] "enwikiquote.labsdb" [04:42:52] database "enwikiquote_p" [04:43:08] for ar [04:43:25] "arwikiquote.labsdb" ? [04:43:41] and username & password are stored in "~/replica.my.cnf" [04:43:58] db list: https://noc.wikimedia.org/conf/all.dblist [04:44:32] thanks a lot [04:44:55] yw [04:45:14] naveenpf: ? [04:49:20] !google xshell The user key you've selected is not registered in the remote server [04:49:20] https://www.google.com/search?q=xshell+The+user+key+you've+selected+is+not+registered+in+the+remote+server [04:53:34] @seen petan [04:53:35] zhuyifei1999: Last time I saw petan they were quitting the network with reason: Ping timeout: 264 seconds N/A at 8/27/2013 5:36:31 PM (11h17m4s ago) [04:53:46] @notify petan [04:53:46] I'll let you know when I see petan around here [06:28:43] hello is there a problem with bastion, or the instances ?
I cannot connect to openid-wiki and openid2-wiki [06:45:25] [bz] (NEW - created by: Chris McMahon, priority: High - enhancement) [Bug 53061] support Flow on beta cluster - https://bugzilla.wikimedia.org/show_bug.cgi?id=53061 [07:37:57] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 53457] setup a DB backed parser cache - https://bugzilla.wikimedia.org/show_bug.cgi?id=53457 [07:40:08] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 53458] adapt the MariaDB puppet manifests for beta - https://bugzilla.wikimedia.org/show_bug.cgi?id=53458 [07:40:09] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 53339] migrate beta databases to MariaDB - https://bugzilla.wikimedia.org/show_bug.cgi?id=53339 [12:34:09] Coren: What's up with job 744318? It's showing 0 CPU time and 0 memory usage, although it does seem to be running. And it's not the spaces-in-$0 problem this time (those are nbsp, heh). [12:34:25] anomie: Lemme see. [12:36:54] anomie: Would that be ArticleCreationGrapher? [12:37:26] Coren: No, ArticleCreationGrapher is 744309 [12:37:40] * Coren doesn't know how to tell those apart. :-) [12:38:02] Ah, the number (999) [12:38:28] * anomie stops typing out all the tasks since Coren found the number [12:39:18] hm. It's running fine indeed; I'm thinking the gridengine process that collects stats is easily confused because it's not picking those up [12:39:36] Perhaps the "[SourceUploader]" in the args? [12:40:07] Hasn't been a problem before though, as far as I know. But I'll see about testing it later. [12:41:28] anomie: As far as I can tell it really is just a collection problem; the process is running and happily chugging away at regular interval. [12:42:28] Coren: I see entries in the bot's logfile, so it is running (somewhere). But I see another oddity.
By submitting jobs that do "ps auxwww", I see AnomieBOT processes that don't seem to be attached to jobs: pids 1128, 1278, and 1673 on exec-03 and pids 2082 and 2083 on exec-05. [12:42:53] Actually, nevermind that [12:43:13] * anomie modifies script to delete old logfiles [12:43:34] I don't see either 2082 or 2083 [12:43:41] Oh. :-) [12:44:44] FYI, in case you wanted the numbers, 999 has 4:26 accumulated CPU times and just under 203MB of Vmem [12:50:03] anyone know if there is an open elasticsearch index that can be queried? mentioned in the search roadmap https://www.mediawiki.org/wiki/Roadmap#Search [12:51:38] edsu: not deployed to production yet. [12:51:57] edsu: should be interesting to get that on toollabs when it is deployed [12:53:40] i guess it's running somewhere in labs already? [12:54:03] edsu: hmm, it was, and then NFS was being hammered by it, and... I don't know what they did after that [12:54:10] edsu: might be running in beta labs, though [12:56:06] looks like there are some older solr images in the 'search' project https://wikitech.wikimedia.org/wiki/Nova_Resource:Search [12:58:21] edsu: ^d / ^demon and manybubbles are the ones who work on it, IIRC [13:00:45] YuviPanda: cheers [13:01:08] manybubbles: ping [13:12:09] :q [13:12:16] :D [13:17:41] manybubbles: following on from edsu's Elasticsearch questions, do you know if the new search API will allow more_like_this queries? [13:18:22] here! [13:19:10] on phone on moment [13:21:03] ok, off phone, gonna read chat log [13:22:16] edsu: the elasticsearch index isn't open to the internet but is open to our network. It exists in beta and for test2.wikipedia.org right now and will grow mediawiki.org today. [13:22:51] Coren: My job running on tools-exec-07 doesn't seem to be logging anything, when it should. Is NFS working right there? [13:24:13] fresco: the new search api will be mostly a clone of the old one. we wanted to get feature parity before diving into more, well, features.
more_like_this is currently unimplemented but I'd be happy to implement something when the bugs quiet down. [13:24:15] anomie good question [13:24:47] manybubbles: this is the query i'm using with a local elasticsearch index of wikipedia: https://gist.github.com/hubgit/6365895 [13:24:57] it is working but for some reason there is long IO wait [13:25:35] it's possibly a bit more cpu-intensive than a standard query, but it'd be nice to be able to run it against the source index instead of mirroring [13:28:29] fresco: hmmm - I'll have to talk to ^d about it but we're really not at the point where we are willing to expose elasticsearch for arbitrary queries outside of folks with production cluster access. [13:29:01] fresco: we might be able to come up with something but I can't promise it'll be as soon as you'd like [13:29:03] manybubbles: that's fair enough - most of that query could be turned into parameters for a "morelikethis" query, though [13:31:29] fresco: so my suggestion is to file a bug against cirrussearch and I'll look at it. also, do you plan on doing this against enwiki or others? [13:31:44] just enwiki, personally - thanks, i'll do that [13:32:33] I promise I'll at least look at the bug and respond this week. Maybe by the time we get rolled out to enwiki I'll have something for you. [13:47:41] manybubbles: does elastic search do replication of some sort? [13:47:49] manybubbles: perhaps when it is all done we can replicate it to toollabs :) [13:47:56] YubiPanda: so many sorts [13:48:13] manybubbles: you can type Yu to complete my name [13:48:21] or just put a rate-limiting proxy in front [13:48:36] YuviPanda: yay tab! [13:48:41] :D [13:49:07] YuviPanda: snapping it back to toollabs on a scheduled basis is a good idea. [13:49:09] guess you'd need to guard against DELETE [13:49:23] yes, I meant also limiting to certain queries etc. [13:49:47] paravoid: as edsu said you'd need to guard against delete and other things. 
Also I bet you can poison it with nasty nasty queries pretty easily. [13:50:08] readonly! [13:50:13] like how we have db replicas [13:50:25] essentially, if we wanted to expose it we'd have to be super careful [13:54:32] manybubbles: true, true. there was a good amount of work needed to expose the dbs [13:54:47] manybubbles: but I'm sure if we can expose the dbs, we can expose these too. eventually. at some point. ;) [13:54:56] *not* over the internet [13:55:03] but to toollabs, which is in-network [13:55:20] Yu: thus, why I don't think we'll have an answer as soon as we'd like. [14:03:30] manybubbles: sure, that's fine. [14:08:58] anomie: Nothing wrong with NFS that I can see. [14:10:03] anomie: What file did you expect its output to go to? [14:14:13] hey Coren :) [14:14:15] Coren: Output should be in /data/project/anomiebot/botlogs/bot-updater.err. Also it should have done a git pull in /data/project/anomiebot/bot, which isn't showing up on tools-login as having been done either. I also put data into /data/project/anomiebot/.anomiebot-data/AnomieBOT-updater.cmd that the process should have seen, logged an error message about, and then deleted. [14:29:51] anomie: The filesystem is working fine on -07. From what I see, bot-updater.pl is currently sleeping on a read() system call over a pipe. [14:30:51] @jb wikipedia/Jyothis [14:31:10] that used to be faster [14:31:15] @jb wikipedia/Jyothis [14:31:24] aha chanserv is lagging :) [14:31:24] anomie: Ah, waiting for output from 'git fetch -q' [14:31:44] o.o [14:31:53] @unjb wikipedia/Jyothis [14:32:02] @jb Jyothis [14:32:10] anomie: Want me to kill the stuck git? [14:32:10] here we go [14:32:25] Coren: Yeah. I wonder why git is stuck. [14:33:27] anomie: It looks like it's spinning waiting for something to happen on the filesystem. Perhaps waiting on a lock? [14:34:00] Coren: How long ago was that git process started? [14:34:15] 01:13 UTC.
[14:35:41] Nothing useful there, the git repo it's trying to pull from shouldn't have changed for weeks at that time. I did a push at around 13:05 UTC, but that's 12 hours after it hung up. [14:36:31] Killing it had it output "(U) Received unknown command 'ping'" in its death throes. [14:36:57] That's the expected error message from what I put in /data/project/anomiebot/.anomiebot-data/AnomieBOT-updater.cmd [14:37:32] Incidentally, ignore the "test" in the .log; that's how I checked the filesystem was fine. :-) [14:37:49] ok [14:38:48] From what I can see, bot-updated.pl is now back in some sleep loop waiting for something interesting to happen. [14:38:56] bot-updater* [14:39:01] So it really was just the stuck git [14:40:51] anomie: You might want to put a timeout on the git operation for robustness. [14:41:06] Coren: Yeah, it looks like you killed the stuck git, so it resumed its event loop, did a few successful git commands, re-execed itself, and now it's back in its normal event loop again. I suppose I'll just have to find a way to time out the git command in case it sticks again. [14:41:26] Otherwise, you're depending on the network, the git server, and a bazillion other things. [14:43:18] Well, in this case the "remote" git repo is on the local filesystem, in /data/project/anomiebot/AnomieBOT.git. [14:44:00] Odd. [14:44:30] Still good practice to guard against things like this. [14:44:35] Yeah. [14:46:14] IPC::Run should allow you to wrap the git call in a safe manner. [14:48:29] (Check out the "Timeouts and Timers" section of man IPC::Run, it even has a nice example) [15:16:05] Hi guys.
I have to ask: the "take" command is not working for me, I want to overtake the owner from my "luxo" to "guc" project: [15:16:06] luxo@tools-dev:/data/project/guc/public_html$ become guc [15:16:08] local-guc@tools-dev:~$ cd /data/project/guc/public_html [15:16:09] local-guc@tools-dev:~/public_html$ ls [15:16:11] index.php [15:16:12] local-guc@tools-dev:~/public_html$ take index.php [15:16:14] -bash: take: command not found [15:18:18] Coren: ^ [15:18:46] ... how odd. Lemme check. [15:20:00] Ha! Since take is only newly packaged, it didn't get deployed to -dev automatically. [15:20:04] Fixed. [15:20:22] Luxo: It's fixed; apparently you're the first one trying to use take from -dev. :-) [15:20:51] it works :) great, thanks [17:23:04] andrewbogott: any luck on the package? [17:24:12] YuviPanda, not yet, I was waiting for Ryan_Lane to explain how to use https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/nginx [17:24:23] ah, okay! :) [17:24:47] andrewbogott: I think he wanted this to be a local deb, rather than update that one, since updating that would mean updating production as well [17:26:54] what do you mean 'a local deb'? [17:27:54] andrewbogott: not sure. Ryan_Lane can probably explain. He gave a link to scfc_de which I don't have atm. [17:28:16] andrewbogott: sorry, I seem to have tuned out all the packaging talk between Ryan_Lane and scfc_de, so don't fully remember :| [17:28:23] * YuviPanda pokes Ryan_Lane with andrewbogott [17:28:45] andrewbogott: what specifically do you need to know? [17:28:58] that package only has the debian directory, right? [17:29:21] and modules/nginx-udplog [17:29:30] yeah. so, ignore that module [17:29:37] we don't need or want it [17:29:44] you can take a straight backport and use that [17:29:50] maybe from raring or such [17:30:02] and bring in a newer nginx version [17:30:08] Oh -- is that true in prod as well, or just for yuvi's purposes?
[17:30:19] just for yuvi's purposes [17:30:32] but we'll also likely get rid of that logging module in production, at some point too [17:31:37] ok… so if we don't need to build the package, what was scfc doing? [17:32:11] he was backporting a packaging from raring [17:32:27] but YuviPanda was saying he needed a newer version than was in raring [17:32:42] I need at least 1.3, preferably 1.5 [17:32:46] it's still necessary to build a backported package [17:32:48] let me see what version is in raring [17:33:17] Ah, because using the actual new package lands us in dependency hell? [17:33:32] it probably won't install [17:33:37] Ryan_Lane: yeah, raring has nginx 1.2.6-1ubuntu3 [17:33:38] because it depends on things that don't exist [17:33:42] which is too old [17:33:42] Then I'm back to my original question -- should I use that repo in gerrit to build the new package? And, if so, how? [17:33:52] so yes, Raring is too old. [17:33:54] seems nginx has an ubuntu ppa [17:34:13] https://launchpad.net/~nginx/+archive/development [17:34:28] that has 1.5.0 for precise, which is good enough [17:34:34] development version is 1.5.... [17:34:34] it doesn't have the lua module though [17:34:40] stable is 1.4 [17:35:18] Coren: ping [17:35:27] Ryan_Lane: hmm, alright, 1.4 then. [17:35:37] what's in 1.5 that you need? [17:35:41] legoktm: pong? [17:36:11] Ryan_Lane: no, I don't need 1.5. [17:36:15] Ryan_Lane: 1.4 is good enough. [17:36:16] ok [17:36:31] Ryan_Lane: the lua module just needs to be added [17:36:54] see pm [17:37:14] nginx 1.3 added websocket support, which is why I need at least that [17:41:02] andrewbogott: are... things clearer? or are you more confused now? :) [17:41:45] "should I use that repo in gerrit to build the new package? And, if so, how?" [17:42:43] Ryan_Lane: ^ [17:43:03] Coren: http://git.wikimedia.org/project/labs [17:43:08] what are ost of those? [17:43:10] *most [17:43:22] centralauth? maps? incubator? qmwbot? [17:43:30] shouldn't those be under tools?
[17:44:04] kirstentest should probably be deleted [17:44:07] since... itw as a tet [17:44:10] *it was a test [17:44:41] Ryan_Lane: I dunno; I didn't create those. Perhaps andre didn't notice the existence of a more appropriate project? [17:44:47] heh [17:45:13] I really need to add this support to wikitech [17:45:26] so that people can just click a button to get a repo, and it'll be named correctly [17:45:48] labs/ <-- that's what we should be going for [17:46:03] labs// [17:46:43] <^demon> If we supported that, plus mediawiki/extensions/ we'd handle 90% of the repo creation requests. [17:46:49] ^demon: yep [17:46:57] also, we can allow service groups to be the maintainers [17:47:10] so creating repos can be separate from service groups [17:47:31] really we could have maintainers be service groups or the project [17:47:58] <^demon> We'll do this with that extra couple of weeks of free time we have :) [17:48:02] hahaha [17:48:13] well, we wanted to do a sprint, right? [17:48:21] let's do a functional spec [17:48:25] and plan a sprint [17:48:53] this is definitely a missing feature we need [17:49:04] <^demon> I'm in the middle of two sprints already :) [17:49:07] I'd say let's use gerrit's rest api [17:49:16] ^demon: I wasn't saying we should do it now :) [17:49:20] <^demon> And supposedly not caring about gerrit much this quarter ;-) [17:49:33] I'm saying we should plan it on a certain date, for a week and bang out the code [17:49:40] and have the spec written first [17:49:43] <^demon> Sure :) [17:50:18] my irc communication skills are getting worse and worse. commas are just randomly appearing in my sentences [17:50:27] o,o,really? [17:50:31] :D [17:50:33] <^demon> I'm not sure, what you are, talking, about. [17:50:44] Ryan_Lane: you also missed andrewbogott's question about the debs :P [17:50:44] my girlfriend would look down on me [17:50:49] oh [17:51:06] andrewbogott: nah. don't use that repo. or if you do, use another branch [17:51:17] hm.
maybe do and use another branch :) [17:51:29] because the master branch is for production [17:51:41] maybe a tools branch? [17:53:44] That repo contains the how-to-make-a-deb stuff but not any actual nginx code. Presumably I need to build or checkout nginx and put something in debian/source... [17:53:50] Should that step be obvious somehow? [17:54:13] it's normal to not have the code [17:54:37] if you add the raring apt source repo to your instance's config [17:54:47] and remove all the other source repos [17:54:53] you can do: apt-get source nginx [17:55:01] and it'll pull in everything [17:55:04] we don't keep the source [17:55:06] [bz] (8NEW - created by: 2Antoine "hashar" Musso, priority: 4Highest - 6major) [Bug 48501] [OPS] beta: get SSL certificates - https://bugzilla.wikimedia.org/show_bug.cgi?id=48501 [17:55:08] Ooooh, I had no idea it worked like that. [17:55:15] just the debian directory and any custom code we add [17:55:29] I mean, I understand why it doesn't contain the source, I was just expecting it to have a README or something explaining usage. [17:55:47] !b 1 | andrewbogott [17:55:47] andrewbogott: https://bugzilla.wikimedia.org/1 [17:55:49] :) [17:55:50] But, ok, I think that part makes sense. Presumably I can put the lua module in modules/ as well? [17:56:50] Of course apt-get source nginx will only get me a version that we've already determined is too old... 
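[A minimal sketch of the "too old" check in the exchange above: raring's nginx is 1.2.6, but websocket support only landed in nginx 1.3, so any candidate package has to clear that bar. The helper name and the tuple comparison are illustrative, not from any real packaging tool.]

```python
# Hedged sketch: compare a Debian-style nginx version string against the
# 1.3 minimum mentioned in the chat. Only the upstream part (before the
# Debian revision "-...") is compared; the helper name is made up here.
def nginx_version_ok(version, minimum=(1, 3)):
    upstream = version.split("-")[0]
    parts = tuple(int(p) for p in upstream.split("."))
    return parts >= minimum

print(nginx_version_ok("1.2.6-1ubuntu3"))  # raring's package -> False
print(nginx_version_ok("1.4.0"))           # PPA stable -> True
```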
[17:57:12] andrewbogott: well, it's kind of a skeleton to work with [17:57:23] 'k [17:57:31] you can download the nginx source and replace what's there [17:57:43] and figure out how the patches need to be applied, if at all [17:57:58] adding in the lua module should be relatively straightforward [17:58:05] see how I add the udp logging module in ours [17:58:57] * andrewbogott nods [18:03:31] ^demon: https://wikitech.wikimedia.org/wiki/Projects#Gerrit_repo_creation_through_wikitech [18:03:52] I'll write up a basic functional spec and have you modify [18:03:59] <^demon> Cool [18:05:11] allowing service groups to maintain a repo is going to take changes in service group creation and in gerrit config [18:05:34] Hm, I installed nginx source but this still fails for lack of orig.tar [18:05:36] we'll need to change the basedn for groups to the root, so that it can find groups in ou=groups and in ou=projects [18:06:38] hm. I think projects are also groups. that could be problematic [18:06:57] heh [18:09:17] andrewbogott: have you read this? http://www.debian.org/doc/manuals/maint-guide/ [18:09:51] Ryan_Lane: Projects /are/ groups indeed. [18:10:09] they aren't posixgroups, though [18:10:19] all the rest of our groups are [18:10:29] service groups and groups in ou=groups [18:11:06] Wait, yes they are. 50062(project-bastion) [18:11:17] those are in ou=groups ;) [18:11:31] Ah! And not people. Point. [18:11:38] dn: cn=wordpress,ou=projects,dc=wikimedia,dc=org [18:11:38] objectClass: groupofnames [18:11:43] well, not projects [18:11:49] and service groups are under projects [18:12:17] I was worried about projects ou and groups ou both being usable, even though they are the same, but named differently [18:12:22] it's more of a confusion thing [18:12:27] Dunno what's planned for that two weeks in SF, but if you want to do the account renaming thing that'd be a great opportunity. [18:12:32] but we can limit the scope to posixgroup [18:12:56] hm.
though we do have some groups in ou=groups that may not be posixgroup [18:13:31] oh, good. none of the important ones are [18:13:45] in fact, only Directory Manager isn't a posix group [18:13:51] excellent [18:15:04] Coren: that would work for me [18:16:06] It's a fairly involved nightmare, tbh, and will likely need a lot of fiddling around in tools. Thankfully, there are few/no dependencies on what the group name prefix actually /is/ for the most part. [18:16:22] yep [18:28:10] ^demon: :( [18:28:19] <^demon> hm? [18:28:23] gerrit's rest api doesn't seem to return info about what a project's owner is [18:29:23] it'll let you set an owner, but doesn't return that information [18:29:33] awww, Gerrit [18:29:53] that's... less than helpful [18:30:14] Ryan_Lane: that sounds very typical of the gerrit REST api, yes :-) [18:30:55] Ryan_Lane: note that there is also an xml-rpc api, which has more options but is deprecated [18:31:16] <^demon> Don't use the xml-rpc api. [18:31:20] what I'd really like is a list of projects the group is an owner of [18:31:33] <^demon> Its functionality is being actively yanked out with a crowbar as it's replaced with rest. [18:32:09] Yeah. They deprecated the API before actually having a working alternative, which is a bit.. odd [18:32:43] But it's still better than screen-scraping, because that's *really* not going to work with Gerrit ;-) [18:33:04] <^demon> Well, they're deprecating as they replace it :) [18:33:11] I can't even find a way to get a project's members [18:33:12] err [18:33:13] <^demon> It was never *meant* to be a public API [18:33:16] a project's owners [18:33:23] ssh doesn't seem to show it either [18:33:28] <^demon> That info's in git :) [18:33:52] what's info in git? [18:34:15] did they put *that* in git notes/ [18:34:15] ?! 
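[For the REST-API crawling plan discussed above, one detail worth knowing: Gerrit prefixes every JSON response with `)]}'` to defend against cross-site script inclusion, so anything consuming it has to strip that line before parsing. A minimal sketch; the response body here is canned, a real call would be a GET on `/projects/` against the Gerrit host.]

```python
import json

def parse_gerrit_json(body):
    # Gerrit prepends ")]}'" on its own line to its REST JSON responses
    # as an XSSI defense; drop it before handing the rest to json.loads.
    if body.startswith(")]}'"):
        body = body.split("\n", 1)[1]
    return json.loads(body)

# Canned stand-in for a GET /projects/ response:
canned = ")]}'\n" + '{"labs/nginx": {"id": "labs%2Fnginx", "state": "ACTIVE"}}'
print(sorted(parse_gerrit_json(canned)))  # ['labs/nginx']
```

[As the chat notes, the project listing doesn't report owners; that data lives in each project's `refs/meta/config`, which is why crawling the repos themselves came up.]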
[18:34:43] hello I cannot connect to my instances openid-wiki and openid2-wiki [18:34:47] since some days [18:34:52] already rebooted [18:35:01] web access works but not PuTTY or WinSCP [18:35:10] please +++ assist [18:36:42] hello I cannot connect to my instances openid-wiki and openid2-wiki [18:39:53] ^demon: well, so it seems maybe we'll need to replicate the repos to wikitech? [18:41:05] either way I'll need to have a cron that crawls the repos for the info. fucking gerrit [18:41:37] Wikinaut: I'll need a bit more detail than this. What username are you trying with ssh, and what error(s) are you getting when you try? [18:41:52] username wikinaut [18:42:08] no errors but timeout. No connect to the instance [18:42:21] worked since one year [18:42:24] I have a feeling gluster is acting up [18:42:39] hm. I take that back [18:43:17] I can't ssh in as root either [18:43:31] I guess that could still be due to gluster [18:45:17] Ryan_Lane: Sometimes it still goes crunch even as root. [18:45:59] Ryan_Lane: You're on it then? I did my share of gluster kicking this week already. :-P [18:48:36] Coren, most accesses will be from loosely typed languages [18:48:44] I still think it's better to rename the col to server [18:49:29] Coren: well, I'm taking a look [18:50:10] Yeah, I know it's going to be loosely typed languages, which is why I expect that code will try to connect to "s$server.foo" instead of "$slice" and that will fail mysteriously rather than cause an error at the SQL level which is easy to figure out. [18:50:44] Or, worse, do things like $server == someint with the returned values.
[18:51:51] then it will fail [18:52:17] I would prefer to add if ($server doesn't have a '.') $server = "s$server"; [18:52:32] instead of parametrising the field name [18:55:05] Platonides: The /point/ of having the server name is that I don't want to have tools' code guess at the naming scheme or try to construct hostnames itself, so it's easy to change them as needed without breaking anything. Even the presence of a '.' isn't guaranteed. [18:55:12] * Coren ponders. [18:57:01] * Coren boggles at some queries. [18:57:02] * Platonides changes the above line to “I would prefer to add if (CLUSTER == "TOOLSERVER") $server = "s$server";” [18:57:05] 55h, really? [18:57:52] Wikinaut: should be fixed [18:58:05] one moment... [18:58:19] Well, it's still relatively recent enough that I suppose I can change the column name without breaking much. I'm really not comfortable about the idea of having a column with different semantics keep the same name though. Let's get a 3rd opinion first? :-) [18:58:32] confirmed: it now works. thx [18:58:42] uh... [18:59:02] 58 packages can be updated. [18:59:04] 2 updates are security updates. [18:59:19] shall I reboot? [19:01:07] Wikinaut: apt-get upgrade first [19:01:11] then if you'd like to reboot, sure [19:01:16] Ryan_Lane: Go opine on https://bugzilla.wikimedia.org/show_bug.cgi?id=48626 [19:01:21] it'll apply security updates automatically anyway [19:01:24] Ryan_Lane: I wants an opinion. :-) [19:01:30] Coren: sigh. I have like 3k mail on tools [19:01:38] and it's all garbage [19:01:48] this is why I hate local mail [19:02:07] Ryan_Lane: I never did apt-get upgrade . Is this documented somewhere ?
[19:02:09] Coren: I honestly have no opinion on that ;) [19:02:18] bash$ >/var/mail/laner [19:02:20] Wikinaut: it just upgrades packages [19:02:26] I do know the command as such [19:02:27] Wikinaut: to be honest, though, you can ignore it [19:02:31] ok [19:02:37] (likes to ignore that) [19:02:43] the system will automatically apply security updates [19:03:03] sometimes before I saw a "you need to reboot message" [19:03:08] then I did so [19:03:24] oh, yeah, in those cases, please do reboot [19:04:22] Coren: I have relatively no knowledge of mysql, so my opinion doesn't matter :) [19:05:03] oh. do you mean slice vs server? [19:05:15] * Ryan_Lane grumbles [19:05:17] we use slice everywhere [19:05:22] TS was stupid to use server [19:06:01] Ryan_Lane: Yeah, Platonides feels that keeping the name the same is better, I'm thinking that changing semantics without changing the name is asking for trouble. I can see his point (not needing to change the query). [19:06:48] (On TS, the column was an integer 1-7; I have the slice hostname instead) [19:09:41] Platonides: Perhaps I should just provide a view with the column named server for legacy purposes. [19:10:15] For that matter, I should probably make a view with /all/ the same columns for that purpose, just null those I don't support. [19:19:12] Ryan_Lane it is not a garbage... [19:19:22] petan: eh? [19:19:25] some of them tell you errors that can be fixed and such [19:19:30] local mail [19:19:39] except that I'm never going to see it [19:19:46] or logs from cron etc [19:20:07] you can always disable your mailbox or just dont read it :P [19:20:23] all of those mails have nothing to do with me [19:20:43] it was errors on the exec nodes and it was sent to my personal user [19:20:49] then dont read them... 
put some kind of /dev/null to your forward [19:20:57] it would be nice if they could be redirected per project (admins) [19:21:05] it is [19:21:08] well, I wouldn't mind emails that were actually useful [19:21:14] he is project admin, that is why he receive it [19:21:20] but random errors on the exec nodes (when I'm not running anything) isn't useful [19:21:25] ah. I see [19:21:46] only people who are in local-admin group receive these mails AFAIK [19:22:04] local-admin? why in the world are we using a service group for this? [19:22:18] we have a role in ldap that has the same membership ;) [19:22:24] because it is simple and it works :o I think... didnt really set it up myself [19:22:43] except now it's necessary to maintain the membership in two spots [19:23:04] when it's just a matter of changing the ldap query [19:23:08] I dont really know how it works, either it is local-admin or the project admins, anyway you are in both [19:24:23] on other hand having 2 groups allow you to be project admin and not receive mails in same time :P [19:27:14] that's not a great reason ;) [19:28:11] petan: completely unrelated, but did you notice we merged that motd file for bots project machines.. i know it's been months ago, heh, but you can use it now [19:29:08] is the bots project not dead yet? :) [19:30:04] petan: has everything not been moved to tools yet? I'd really like to get the space back from those vms :) [19:44:14] [bz] (8RESOLVED - created by: 2Tim Landscheidt, priority: 4Normal - 6normal) [Bug 48626] Provide wiki metadata in the databases similar to toolserver.wiki - https://bugzilla.wikimedia.org/show_bug.cgi?id=48626 [19:44:44] Ryan_Lane: what puppet version is used ? [19:44:53] 2.7.? [19:45:14] yes [19:45:47] what comes in the question mark number, Ryan_Lane ? [19:47:18] 2.7 [19:47:52] no subversion? [19:48:15] such as 2.7.14? [19:51:39] Coren, a view is fine for me [19:52:23] although if the column will be always NULL I'm not sure it's useful [19:53:33] server? 
It's going to be the hostname. Hence the caveat. [19:58:10] slice also contained a hostname :P [20:00:03] I know. I'm not giving any new data here, only a view to it that keeps the column name and order. [20:06:34] perhaps it's a problem with the views? [20:10:45] Platonides: No, the tables the views point to seem to have... vanished. [20:11:19] Not gotten empty, mind you. The databases are just gone(!) [20:13:15] :O [20:13:20] deleted frm files? [20:19:31] Platonides: Not sure what happened. I'll let Asher take a look at it before I touch anything to avoid screwing up with diagnostics. [20:28:44] did s7 replication ever begin? [20:31:37] rschen7754: S7 on toolserver? [20:31:49] on labs [20:32:21] Okay, was going to say. On ts has been broked for weeks. [20:40:20] rschen7754: It's been here since late May. [20:40:34] Coren: oh, ok… yeah i never heard about it [20:41:09] I'm pretty sure I sent an announcement on labs-l [20:41:48] But then again, that's the kind of thing I can forget. [21:22:13] Ryan_Lane: 2 questions to you. One general question re. wfDebug() code in extensions. Is there a general policy, can it stay in or should it be removed ? [21:24:19] Ryan_Lane: second question: you gave +1 to E:OpenID. So, from your point of view: can I merge now ? [21:25:47] it's fine [21:25:52] and encouraged [21:25:59] thx [21:26:14] and yeah, I +1'd so that it wouldn't get auto-merged [21:26:16] I invested one hour right now and performed test [21:26:19] if you're ready to merge, go for it [21:26:24] one moment [21:26:54] Ryan_Lane: I did not get it working _between_with OpenIDForcedProvider = one of the instances [21:27:01] I guess, it has to do [21:28:04] with the different urls [21:28:04] but I am not sure [21:28:04] do you understand, what I mean ? [21:28:45] On http://openid-wiki.instance-proxy.wmflabs.org/wiki/ I set $OpenIDForcedProvider = "http://openid-wiki2.instance-proxy.wmflabs.org/wiki/Special:OpenIDServer/id"; [21:30:02] but this does not work. 
I am not 100% sure, whether the first instance fetches from the seconds via curl - then the url must be changed to http://openid-wiki.pmtpa.wmflabs.org/wiki/Special:OpenIDServer/id [21:31:00] so I stopped. This is a separate problem, I think. In February, you set E:OpenID for a test on labsconsole.mediawiki.org or so. Do you remember ? [21:32:04] (one moment - back in five minutes - I have to restart my DSL router for other reasons) [21:42:34] Ryan_Lane: oops [21:43:48] Ryan_Lane: do I understand you correctly, that everything with E:OpenID works as expected [21:43:49] ? [21:44:27] (This is my view now, after your " they can, if you use a socks proxy" answer [21:44:29] ) [21:50:22] Wikinaut: I didn't test, I just code reviewed [21:50:25] I assumed you had tested it [21:50:30] yes [21:50:33] ! [21:50:33] There are multiple keys, refine your input: !log, $realm, $site, *, :), ?, {, access, account, account-questions, accountreq, add, addresses, addshore, afk, airport-centre, alert, amend, ask, awstats, bang, bastion, be, beta, bible, blehlogging, blueprint-dns, borg, bot, bots, botsdocs, broken, bug, bz, chmod, cmds, console, cookies, coren, Coren, credentials, cs, Cyberpower678, damianz, damianz's-reset, db, del, demon, deployment-beta-docs-1, deployment-prep, doc, docs, domain, dumb, enwp, epad, etherpad, evil, extension, failure, fff, filemoves.py, flow, FORTRAN, forwarding, gerrit, gerritsearch, gerrit-wm, ghsh, git, git-puppet, gitweb, google, group, grrrit-wm, hashar, help, helpmebot, hexmode, hodor, home, htmllogs, hyperon, info, initial-login, instance, instance-json, instancelist, instanceproject, ip, is, keys, labs, labsconf, labsconsole, labsconsole.wiki, labs-home-wm, labs-l, labs-morebots, labs-nagios-wm, labs-project, labstore3, labswiki, leslie's-reset, link, linux, load, load-all, log, logs, logsearch, mac, magic, mäh, mail, manage-projects, mediawiki-instance, meh, mobile-cache, monitor, morebots, msys-git, nagios, nagios.wmflabs.org, nagios-fix, nc, 
newgrp, newlabs, new-labsuser, new-ldapuser, night, nocloakonjoin, nova-resource, op_on_duty, openstack-manager, origin/test, os-change, osm-bug, paf, pageant, password, pastebin, pathconflict, perl, petan, petan..., petan-build, petan-forgot, ping, pl, po*of, pong, poof, port-forwarding, project-access, project-discuss, projects, proxy, puppet, puppetmaster::self, puppetmasterself, puppet-variables, putty, pxe, pypi, python, pythonguy, pythonwalkthrough, queue, quilt, ragesoss, rb, reboot, remove, replicateddb, report, requests, resource, revision, rights, rq, rt, rules, Ryan, Ryan_Lane, ryanland, sal, SAL, say, screenfix, search, searchlog, security, security-groups, seen, sexytime, shellrequests, single-node-mediawiki, snapshits, socks-proxy, ssh, sshkey, start, stats, status, Steinsplitter, StoneB, stucked, sudo, sudo-policies, sudo-policy, svn, t13, taskinfo, tdb, Technical_13, terminology, test, Thehelpfulone, todo, tooldocs, tools-admin, toolsbeta, tools-bug, tools-request, tools-web, trout, tunnel, tygs, unicorn, venue, vim, vmem, we, whatIwant, whitespace, whyismypackagegone:'(, wiki, wikitech, wikitech-putty, wikiversity-sandbox, windows, wl, wm-bot, wm-bot2, wm-bot3, wm-bot4, wmflabs, you, zhuyifei1999, [21:50:48] lol [21:50:49] when we get closer to deployment we'll do a bunch of testing [21:51:18] !bible [21:51:18] debian bible: http://www.debian.org/doc/manuals/maint-guide/index.en.html [21:51:24] ah, _that_ one:) [22:23:08] Ryan_Lane: bd808 here is noticing that vim takes a long time to start up because of gluster on /home, is there a way we can make it less painful? [22:23:28] Or are we doomed to choose between bland vimrc and long startup times [22:24:29] `time find .vim/bundle` takes 3.5s wall clock with 33 bundles [22:24:40] bd808: marktraceur use NFS instead? [22:24:43] marktraceur: once the NFS server is properly stable we'll switch everything to NFS [22:24:58] Coren|Dinner: how's that going, btw? 
:) [22:25:40] Yuvi|NoPower|grr, Ryan_Lane ta [22:36:32] Ryan_Lane: It hasn't been up to its old tricks in ages. The controller stall is trivially reproducible on 3; I haven't done the tests on 4 yet (was waiting to close a number of bugs and feature requests first, which had been a bit neglected since NFS went on the fritz) [22:36:55] Ryan_Lane: No stalls on the H700 or with the internal drives, at the very least. [22:38:22] ah, but we have no redundancy right now [22:39:13] Right. It's still a raid6 though, so worst case scenario is we move the drives from 3 to 4. Still a pain. [22:39:30] But I don't want to move back to the shelves until we clearly isolated and fixed the stalls. [23:02:02] Coren: so..... [23:02:05] http://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20pmtpa&h=virt2.pmtpa.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [23:02:08] there's a problem here :) [23:02:15] notice what it is? :) [23:03:20] dumps is eating 100M of that [23:04:06] Actually, doesn't look all that bad from here. What am I supposed to see? [23:04:17] we only have 3 bonded network ports [23:04:23] and they are 1Gb/s each [23:04:33] that's the network node [23:04:41] Ah, network I/O [23:05:14] Ooo. That's hitting pretty much 100% at regular intervals. [23:05:19] yep [23:05:32] I'm going to have to tell the dumps folks that they need to calm themselves a bit [23:05:47] Or throttle a bit at least. [23:05:50] tools is using 80M out and 10M in [23:06:40] the rest is just small amounts added up from the rest of the projects [23:07:29] NFS is necessarily a big chunk of that too. [23:11:46] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/81593/ [23:11:58] Ryan_Lane: Mostly FYI; I really wanted to stuff that somewhere in source control. [23:12:30] * Ryan_Lane nods [23:14:07] wonders how much of ./files/misc/scripts/ is actually operations/software [23:20:51] Ryan_Lane: You need to learn to not engage with trolls so much. :-) [23:21:27] Coren: what do you mean?
[23:22:15] As a rule, someone asking a clearly loaded/provocative question who refuses to clarify the context or relevance won't. :-) [23:26:59] Coren: well, the thread was 20 something mails deep