[00:05:08] Coren: lighttpd? :D [00:05:14] why lighttpd? :) [00:05:42] because it's really, really lightweight so there is little cost to starting a bunch of them. [00:05:57] Works very well too. [00:12:23] * Ryan_Lane shrugs [00:12:26] lighter than nginx? :) [00:12:48] (wouldn't surprise me) [00:12:54] sounds good, though [00:13:10] it's much better than running everyone on the exact same web server [00:13:19] Coren: you're running them all on different ports? [00:13:54] so, this uses the grid to schedule a lighttpd on a node? [00:13:56] Yes, sent as a job to the gridengine (on a dedicated queue). [00:14:07] how are you routing to it? [00:14:15] yuviproxy? [00:14:44] No, but it'll be adaptable for it. Right now, the startup script registers itself in a flat file the webproxy uses for its rewriterules. [00:14:51] ah, ok [00:15:01] yuviproxy should make that easier :) [00:15:04] It should be trivial to tweak that to register with the yuviproxy instead. [00:15:30] this sounds like a much saner approach [00:15:37] it's still possible for one to take down an entire node, though [00:15:47] maybe launch the lighttpd in a cgroup? [00:16:02] this is definitely an improvement, though :) [00:16:07] and more scalable [00:16:37] you could probably also make subqueues for more expensive ones [00:24:15] Coren: +1 good work :) [00:27:06] so, how can I get my exterior IP address then? [00:27:50] it turns out I will need to know what my outfacing ip address is because i must use it as part of the token/hash protocol that the xISBN API uses [00:28:08] sorry, to be clear this is for a tools project [00:34:18] notconfusing: is this a web tool or bot? [00:34:33] for bots there's no way to know what the public IP is [00:34:56] it seems you can hack it with checkip.dyndns.org and REs [00:35:05] * Ryan_Lane shudders [00:35:12] that's one way to do it, yeah [00:35:15] but it can change [00:35:22] so i [00:35:32] 'll just recheck ever so often [00:35:44] every time your bot launches [00:35:55] once your bot is running it's stable [00:36:01] when your bot starts it could be somewhere else [00:36:13] Ryan_Lane, thanks, that's good to know [00:36:22] perfect so it won't add to the runtime then [00:36:23] hm. I wonder if there's a way we can list the IP somewhere [00:58:33] Ryan_Lane: Couldn't they be found buried somewhere in LDAP? [01:08:45] Coren: only kind of [01:08:59] Coren: you'd need to know the public hostname [01:09:11] and if you had that, you could just do a reverse lookup in dns [01:09:15] err [01:09:18] a lookup in dns [01:09:21] Ah, you can't find it from looking at the association to the instance? [01:09:27] that isn't in ldap [01:09:30] that's in openstack [01:09:41] only dns and puppet is in ldap, [01:31:36] I'm getting a "Cannot allocate memory" error :/ [01:32:29] https://gist.github.com/theopolisme/c604cb45dac86c219163 [01:34:33] Segmentation fault... [01:34:39] Any ideas? [04:42:43] [bz] (NEW - created by: Krinkle, priority: Unprioritized - normal) [Bug 55547] tools: Disable the default no-op output buffer - https://bugzilla.wikimedia.org/show_bug.cgi?id=55547 [04:50:57] YuviPanda|sick: you there? [05:16:09] !log cvn Moving CVNBot14 from cvn@willow.toolserver.org to cvn-app1.wmflabs [05:16:10] Logged the message, Master [08:42:33] Hi all [08:42:39] i have a Q [08:43:14] Why do I face "recieving incomplete xml data.sleeping for x seconds" when I want to run my bot on labs?
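For readers following the exterior-IP thread above, here is a minimal sketch of the approach Ryan_Lane and notconfusing settle on: fetch checkip.dyndns.org once when the bot launches, pull the address out with a regular expression, and cache it for the life of the process (per the discussion, the address is stable while the bot runs but may differ on the next launch). Python 2 style, matching the default interpreter on the grid hosts of that era; the exact markup returned by checkip.dyndns.org is an assumption, and the xISBN token/hash step itself is not shown.

```python
# Sketch of the "checkip.dyndns.org and REs" idea from the discussion above.
# The response markup of checkip.dyndns.org is an assumption here.
import re
import urllib2

_CACHED_IP = None

def public_ip():
    """Return the outward-facing IP address, fetched once per bot run."""
    global _CACHED_IP
    if _CACHED_IP is None:
        page = urllib2.urlopen('http://checkip.dyndns.org/', timeout=10).read()
        match = re.search(r'(\d{1,3}(?:\.\d{1,3}){3})', page)
        if match is None:
            raise RuntimeError('could not parse an IP address from the page')
        _CACHED_IP = match.group(1)
    return _CACHED_IP

# The cached value can then feed whatever token/hash the xISBN API expects.
```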
[09:25:51] [bz] (RESOLVED - created by: bgwhite, priority: Immediate - critical) [Bug 55498] Webserver is down - https://bugzilla.wikimedia.org/show_bug.cgi?id=55498 [09:49:50] !log tools tools-webserver-01 is getting a 500 Internal Server Error again [09:49:54] Logged the message, Master [09:50:30] petan: around? [09:54:49] or Coren, although I feel it may be a little early :) [09:58:01] hello [09:58:24] how do I download a dump of zh-yuewiki? [09:59:08] qwebirc1035095: http://dumps.wikimedia.org/zh_yuewiki/ [09:59:37] I mean on labs [09:59:45] to my tool name space [10:01:59] qwebirc1035095: /public/datasets/public/zh_yuewiki [10:02:09] Thanks! [10:02:19] np [10:45:14] i [10:54:19] [bz] (NEW - created by: Magnus Manske, priority: Unprioritized - blocker) [Bug 55556] Extremely/unusually slow SQL on categorylinks table - https://bugzilla.wikimedia.org/show_bug.cgi?id=55556 [11:29:25] I am having problems sshing into some labs instances that I could ssh into before. [11:29:32] They are in the analytics project [11:29:49] So I can no longer log in to for example: kraken-namenode-standby [11:29:55] https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000862 [11:30:20] But I still can log into some other instances of the analytics project (e.g.: limn0) [11:31:00] The error message I get when trying to log in on kraken-namenode-standby is "Permission denied (publickey)." [11:46:37] Output of ssh -v [...] is at http://dpaste.com/1412029/plain/ [11:59:25] legoktm: You have queries on revision WHERE rev_user=; you realize that this will do full table scans every time (= very very long) unless you use revision_userindex instead? [12:34:47] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb [13:11:20] [bz] (RESOLVED - created by: Dirk Beetstra, priority: Highest - blocker) [Bug 54690] Tools-exec instances are not running the same software installs - https://bugzilla.wikimedia.org/show_bug.cgi?id=54690 [13:11:21] [bz] (RESOLVED - created by: bgwhite, priority: Immediate - critical) [Bug 55498] tools-webserver-01 is down - https://bugzilla.wikimedia.org/show_bug.cgi?id=55498 [13:12:49] [bz] (RESOLVED - created by: Liangent, priority: Normal - major) [Bug 54934] Wikimedia Labs database replication has seemingly stopped (s1 and s2?) - https://bugzilla.wikimedia.org/show_bug.cgi?id=54934 [13:14:54] [bz] (RESOLVED - created by: Robin Krahl, priority: Normal - normal) [Bug 54107] *_userindex tables need to be documented - https://bugzilla.wikimedia.org/show_bug.cgi?id=54107 [13:15:12] [bz] (RESOLVED - created by: Sumana Harihareswara, priority: Unprioritized - normal) [Bug 54700] Add "Labs seems slow, what do I do?" to Help doc - https://bugzilla.wikimedia.org/show_bug.cgi?id=54700 [13:17:01] [bz] (RESOLVED - created by: Liangent, priority: Unprioritized - normal) [Bug 54451] User databases on s3 were lost (in the outage?) - https://bugzilla.wikimedia.org/show_bug.cgi?id=54451 [13:18:51] [bz] (NEW - created by: Betacommand, priority: Unprioritized - normal) [Bug 54133] Web-servers do not treat .htaccess consistently - https://bugzilla.wikimedia.org/show_bug.cgi?id=54133 [13:20:42] Coren: https://bugzilla.wikimedia.org/show_bug.cgi?id=54451 - why can't they be backed up? [13:21:24] Because we don't do that, and most data *is* ephemeral. Users are, OTOH, welcome to dump any important data they want at interval.
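valhallasw's hint to legoktm about revision_userindex is worth spelling out, since it comes up often (Bug 54107 above is about documenting it). Roughly, the plain revision view on the Labs replicas blanks the user fields for suppressed revisions, which prevents the user index from being used, so a WHERE rev_user = ... filter scans the whole table; the _userindex variant keeps those columns indexable. A rough sketch, assuming the usual Tool Labs conventions (credentials in ~/replica.my.cnf, the enwiki_p database on enwiki.labsdb) and the MySQLdb driver:

```python
# Minimal sketch: filter revisions by user via revision_userindex instead of
# revision, so the replica can use the user index rather than scanning.
import os
import MySQLdb  # python-mysqldb, as commonly used on Tool Labs

conn = MySQLdb.connect(
    host='enwiki.labsdb',    # assumed replica host alias
    db='enwiki_p',
    read_default_file=os.path.expanduser('~/replica.my.cnf'),
)
cur = conn.cursor()
# Same WHERE clause shape as the one flagged above, different view.
cur.execute(
    'SELECT rev_id, rev_timestamp '
    'FROM revision_userindex '
    'WHERE rev_user = %s '
    'ORDER BY rev_timestamp DESC LIMIT 50',
    (12345,),  # placeholder user ID
)
for rev_id, rev_timestamp in cur.fetchall():
    print rev_id, rev_timestamp
```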
[13:23:41] [bz] (RESOLVED - created by: Jesse PW (Pathoschild), priority: Unprioritized - normal) [Bug 53668] Some replicated databases are missing tables - https://bugzilla.wikimedia.org/show_bug.cgi?id=53668 [13:24:23] Coren: is a mysqldump command in crontab good enough? [13:25:32] liangent: It should, although if you wanted to be /really/ cool and the data is of general value, pushing the result to git would be (a) extra redundantly delicious and (b) possibly useful to others as well. [13:28:12] Coren: is there a reason not to run lighttpd as the user? then the user could just use jkill to kill the server [13:28:33] Coren: and how is the server started? automatically when an HTTP request is made? [13:30:19] valhallasw: (a) There's a couple, mostly needed to prevent some attack vectors, and (b) it's started if you've got a configuration file (although, once that becomes the default, it will be 'if you have a config file or a non-empty public_html') [13:31:09] ah, OK [13:31:34] Coren: no. data there are my bot execution states [13:32:10] liangent: So then just a mysqldump at interval and you're all set. [13:32:59] Coren: but what caused the loss? [13:33:04] on s3 [13:33:48] liangent: The actual database went exploody; we had to rebuild it entirely. [13:34:21] Coren: including user databases? [13:34:28] liangent: It's the same DB. [13:34:49] s/DB/cluster in mysql parlance/ [13:35:20] In other words, the entire slice was broken. [13:35:39] Coren: ok [13:52:58] Yup. That new webservice thing is going to end up working fine. [13:58:32] [bz] (NEW - created by: Johannes Kroll (WMDE), priority: Unprioritized - normal) [Bug 55562] 'take' subfolder bug - https://bugzilla.wikimedia.org/show_bug.cgi?id=55562 [14:03:34] Coren: Hmm. How do I tell if my tool is actually being served from the new webgrid thing or if it's still being served by apache? [14:04:01] anomie: Easy way: the format of the access.log is subtly different. [14:04:38] Also, 'qstat -u root -q webgrid' will show you if your server is running. [14:05:10] Coren: Well, either it's extremely subtle or oauth-hello-world is still being served from apache even though I see a job for it for webgrid. [14:05:27] I shall check. :-) [14:06:09] anomie: Ah, no, it keeps failing to work and restarts in a loop. (It shouldn't do that) [14:06:24] No wait, I just lied. :-) [14:06:31] Coren, anomie: see also http://tools.wmflabs.org/?status under tools-webgrid-01 [14:06:58] which is slightly easier than remembering qstat's invocation in my experience [14:07:08] * valhallasw likes how ?status just automagically works [14:07:11] anomie: I'm seeing it as up and serving requests. [14:08:09] anomie: (!) [14:08:33] You started a server for *anomiebot* not oauth-hello-world [14:08:34] :-) [14:08:42] Coren: I started one for both ;) [14:09:08] Well, you have stuff in your error.log [14:09:24] Coren: I think I see the problem. https://tools.wmflabs.org/oauth-hello-world/foo.php goes to apache, while http://tools.wmflabs.org/oauth-hello-world/foo.php goes to lighttpd [14:09:46] Oh, DOH! [14:10:23] fix't [14:10:49] * Coren pretends that worked all along and he just didn't forget to enable it there. [14:14:56] Coren: Hmm. What's the replacement for $_SERVER['SCRIPT_NAME'] in lighttpd/FCGI? [14:15:33] ... I honestly don't know. [14:15:42] anomie: phpinfo() might help for that [14:16:54] valhallasw: Hmm. SCRIPT_NAME is coming through but without the "/oauth-hello-world/" prefix.
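Coren's backup advice to liangent (a periodic mysqldump, ideally with the result pushed to git) fits in one small script run from crontab. A sketch only: the database name, host, and dump directory below are placeholders, it assumes ~/backups is already a git checkout, and it reads credentials from ~/replica.my.cnf as is usual for tools:

```python
# Sketch: dump a tool's user database at interval and commit the dump to git,
# per the backup advice above. Host, database, and paths are placeholders.
import os
import subprocess

HOME = os.path.expanduser('~')
DB_HOST = 'tools-db'        # assumed host of the tool's user database
DB_NAME = 's12345__mybot'   # placeholder database name
REPO = os.path.join(HOME, 'backups')        # assumed to be a git checkout
DUMP_PATH = os.path.join(REPO, DB_NAME + '.sql')

def dump_and_commit():
    with open(DUMP_PATH, 'w') as out:
        subprocess.check_call(
            ['mysqldump',
             '--defaults-file=' + os.path.join(HOME, 'replica.my.cnf'),
             '-h', DB_HOST, DB_NAME],
            stdout=out)
    subprocess.check_call(['git', 'add', DUMP_PATH], cwd=REPO)
    subprocess.check_call(
        ['git', 'commit', '-m', 'periodic backup', '--allow-empty'], cwd=REPO)

if __name__ == '__main__':
    dump_and_commit()   # run this from crontab (or jsub) at whatever interval
```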
[14:17:04] anomie: That'd be in PATH_INFO [14:17:16] PATH_INFO has no value [14:17:20] Hm. [14:17:52] phpinfo() will show you everything the server throws at you; lemme try something. [14:19:32] Ah, oh, doh. That's unavoidable; unlike with apache the setup with lighttpd /does/ give you the documentroot. [14:19:48] * anomie suspects that http://tools.wmflabs.org/oauth-hello-world/foo.php is effectively being forwarded to something like http://tools-webgrid-01:98765/foo.php [14:20:11] It is. [14:20:16] * Coren wonders if there is a workaround. [14:20:27] Coren: you can add a header to the request [14:20:58] X-REQUEST-URL, or something like that. [14:21:37] Actually, now that I think of it, there is no overriding reason to not simply put the documentroot one level higher and keep the path the same. [14:21:56] It's not an issue because the proxy will hardcode the component anyways. [14:22:31] This way things will remain compatible regardless of the scheme in use. [14:22:41] * Coren does just that. [14:22:50] * anomie \o/ [14:23:09] Coren: btw, why doesn't the request just get forwarded to the web server? then it just receives the full 'http://tools.wmflabs.org/etc/' url [14:24:05] Ohwait. That won't work. [14:24:07] * Coren ponders. [14:25:31] Ah, yes, easily done after all. [14:28:36] * Coren restarts the webservers. [14:32:48] Hm. That didn't quite work. [14:37:40] Ah, typo. Restarting. [14:41:05] There we go. [14:42:04] That should have SCRIPT_NAME and SCRIPT_URL consistent with running under apache now. [14:42:10] * Coren needs to restart one last time. [14:43:02] anomie: ^^ does it work for you? [14:43:28] Coren: It looks like it does, yes [14:44:00] It needed a bit of alias trickery to keep the documentroot correct, but it should now work with consistent semantics to apache's [15:55:38] I got a race condition I'm not getting. :-( [15:58:51] gah. Has anyone been able to get mw.o's oauth working in python? [15:59:14] apparently it's nonstandard enough to bring the standard flask-oauth library to tears [16:38:53] [bz] (NEW - created by: MZMcBride, priority: Unprioritized - normal) [Bug 55567] gerrit-stats (gerrit-stats.wmflabs.org) appears to be broken - https://bugzilla.wikimedia.org/show_bug.cgi?id=55567 [16:41:30] Coren: Oh. I'll try and figure out which script is doing that... [16:41:56] Oh, it's working now. [16:42:10] NFS file locking semantics are... not useful for what I was trying to do. [17:02:35] [bz] (ASSIGNED - created by: Tim Landscheidt, priority: Unprioritized - normal) [Bug 52976] Provide user_slot resource in grid - https://bugzilla.wikimedia.org/show_bug.cgi?id=52976 [17:03:43] [bz] (RESOLVED - created by: MZMcBride, priority: Unprioritized - normal) [Bug 53640] "links" MariaDB database view is broken on Wikimedia Labs - https://bugzilla.wikimedia.org/show_bug.cgi?id=53640 [17:06:39] [bz] (NEW - created by: Tim Landscheidt, priority: Unprioritized - trivial) [Bug 48625] Provide namespace IDs and names in the databases similar to toolserver.namespace - https://bugzilla.wikimedia.org/show_bug.cgi?id=48625 [17:08:07] [bz] (RESOLVED - created by: Daniel Schwen, priority: Unprioritized - blocker) [Bug 52944] Please enable mod_fastcgi support - https://bugzilla.wikimedia.org/show_bug.cgi?id=52944 [17:11:29] [bz] (RESOLVED - created by: Tim Landscheidt, priority: Low - normal) [Bug 48105] "sudo chown ..."
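valhallasw's question about getting mediawiki.org's OAuth working from Python is answered later in the log (the patch uploader announced around 21:06 uses it). For anyone stuck at the same point, a hand-rolled handshake with requests-oauthlib looks roughly like the sketch below. The consumer key/secret are placeholders and the URLs assume the standard Special:OAuth endpoints; note also that MediaWiki's identify step returns a signed JWT rather than plain JSON, which is part of what trips up generic helpers such as flask-oauth.

```python
# Sketch of a manual OAuth 1.0a handshake against mediawiki.org with
# requests-oauthlib. Consumer key/secret are placeholders; the endpoint URLs
# assume the standard Special:OAuth paths.
from requests_oauthlib import OAuth1Session

CONSUMER_KEY = 'your-consumer-key'        # placeholder
CONSUMER_SECRET = 'your-consumer-secret'  # placeholder
MW_INDEX = 'https://www.mediawiki.org/w/index.php'

oauth = OAuth1Session(CONSUMER_KEY,
                      client_secret=CONSUMER_SECRET,
                      callback_uri='oob')

# 1. Fetch a request token.
oauth.fetch_request_token(MW_INDEX + '?title=Special:OAuth/initiate')

# 2. Send the user to the authorization page; they come back with a verifier.
print oauth.authorization_url(MW_INDEX + '?title=Special:OAuth/authorize')
verifier = raw_input('verifier code: ')

# 3. Trade request token + verifier for an access token.
tokens = oauth.fetch_access_token(MW_INDEX + '?title=Special:OAuth/token',
                                  verifier=verifier)
print tokens  # oauth_token / oauth_token_secret for signing API requests
```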
asks for password which doesn't exist - https://bugzilla.wikimedia.org/show_bug.cgi?id=48105 [17:37:28] Coren: If you have a minute, https://gerrit.wikimedia.org/r/#/c/89023/ [17:40:25] anomie: merged [17:40:36] Coren: Thanks! [18:11:20] * Coren considers giving users quotas. 100K should do it. [18:13:19] in which units? [18:15:11] bytes. :-) I'm just groaning because a rsync of the shared filesystem is... lots of stuff. [18:18:17] Ryan_Lane: Did you migrate projects or is this all switches to NFS the users made themselves? [18:18:46] themselves [18:19:04] I'm waiting for it to be stabel [18:19:07] *stable [18:20:44] Ryan_Lane: Looks like you won't have much work to do; I'm seeing actual homes for at least 123 projects [18:20:51] Issues on tools-login - no response [18:21:16] hedonil1: Planned maintenance in progress; this was announced on labs-l and is in the channel topic. :-) [18:21:58] ok then 8-) [18:28:16] What does "couple of hours" mean in detail 1, 2, 10 20? [18:28:36] Ryan_Lane: we're organizing an event in November targeting researchers / data hackers. What's the current processing time for new tool labs requests? [18:28:47] [bz] (RESOLVED - created by: Dereckson, priority: Unprioritized - normal) [Bug 53793] Users with a former SVN account not migrated can't create an account - https://bugzilla.wikimedia.org/show_bug.cgi?id=53793 [18:29:29] hedonil1: I'm hoping less than 1, but we might be stuck around 2. [18:29:56] DarTar: depends on who is online, really. About 4-5 people can approve requests... [18:30:15] DarTar: so if you've one of those people online.... Pretty instantaneous. [18:30:25] DarTar: If we have a heads' up and we expect it, pretty much instant. [18:30:43] Coren: Thanks. [18:30:44] k cool, I don't know how many external people are planning to attend, but we should start advertising soon [18:31:20] DarTar: Otherwise, the list tends to be look at about every day or so. [18:31:37] looked* [18:31:49] will give you the heads up once we make the announcement, thanks [18:31:58] (With occasional bouts of "but I thought someone else was doing it") :-) [18:32:14] Anybody care to explain? [18:33:04] !newlabs [18:33:04] This is labs. It's another version of toolserver. Because people wanted it just like toolserver, an effort was made to create an almost identical environment. Now users can enjoy replication, similar commands, and bear the burden of instabilities, just like Toolserver. [18:34:20] Coren: oh. wait. I think I had rsync'd them a while aho [18:34:22] *ago [18:34:28] because I was going to move everyone [18:34:34] then it started having instability issues [18:34:56] Ah, well at least it's going to be a fairly swift rsync since nothing would have changed. [18:35:01] yeah [18:35:07] that hasn't changed in ages [18:44:28] Coren, why is labs not working? [18:45:12] petan, ^ [18:45:29] did you read the mailing list? [18:45:31] Cyberpower678: scheduled maintenance [18:45:34] also /topic [18:45:49] I didn't get an email. [18:46:05] see labs-l [18:47:28] Oh wait. It's coming in now as well as a flood of other labs-l messages. WTF? [18:47:37] WTF? [18:47:43] >:( [18:47:53] It just keeps coming, [18:48:42] WTF is catfoot. [18:49:00] I've got zillions of messages about that. [18:49:22] That's not why I subscribed to the list, to be spammed like that. [18:49:54] * YuviPanda takes labs-l to arbcom [18:51:18] * Cyberpower678 backs YuviPanda and also takes Coren to ArbCom. *oh the irony [18:51:21] three, two, one, FIGHT! [18:51:26] that's not irony!
[18:52:03] YuviPanda, Coren used to be a member of ArbCom if I recall correctly. Now he's being taken to it. [18:52:08] hi labs people! [18:52:13] who can help qchris fix his labs stuff? [18:52:15] he can't access some nodes [18:52:15] Cyberpower678: :) I know [18:52:17] we aren't sure why [18:52:20] If that's not ironic [18:52:23] what is [18:52:24] ottomata: NFS maintenance now, perhaps that [18:52:30] its been a day or two? [18:52:43] oh [18:52:48] probably not :) [18:53:15] Ok so I'll try in a few hours when NFS maintenance is over. [18:53:17] Thanks. [18:53:31] Cyberpower678: https://en.wikipedia.org/wiki/File:Stop_Defacing_Signs.jpg is ironic. [18:54:17] There's a catch 22 here. [18:55:10] In order to tell people to stop defacing stop signs, you need to deface a stop sign. But defacing a stop sign is going against the sign. [18:56:20] * Cyberpower678 hacks YuviPanda  [18:56:33] * YuviPanda takes away Cyberpower678's node [18:56:48] * Cyberpower678 wasn't aware he had one. [18:58:00] * Coren looks at the not-fast rsync with a bit of concern. [18:58:06] * Cyberpower678 digitializes YuviPanda  [18:58:51] Digitilization complete. Digital YuviPanda is 4 PB big. [18:59:12] * Cyberpower678 creates 5 MB chunks and uploads YuviPanda to Wikipedia. [19:00:12] YuviPanda, what's it like there? [19:03:27] Coren, eta? [19:04:22] Cyberpower678: Unknown. There seems to have been tools that ran since last night that created gigs of output that now has to be rsynced while the filesystem is off. :-( [19:04:57] Umm... Would that be me? [19:05:12] I know some log files of mine are couple gigs big. [19:06:15] Coren, eta on the system being switched back on? [19:11:38] I've just answered that Cyberpower678 [19:11:41] beta labs is 503 right now? [19:12:07] chrismcmahon: As announced on labs-l some time ago, and today, and on the channel topic: planned maintenance. :-) [19:12:18] thanks Coren. I are slow. [19:17:59] While waiting: https://commons.wikimedia.org/w/index.php?title=File%3ABeethoven%2C_Sonata_No._8_in_C_Minor_Pathetique%2C_Op._13_-_II._Adagio_cantabile.ogg [19:19:24] Coren, I thought that was an eta on labs being operational again, not when the new hardware gets turned on to start resyncing. [19:23:40] Cyberpower678: One requires the other. [19:24:13] * Coren stares at rsync, hoping to scare it into being done faster. [19:25:23] * Cyberpower678 sends a stare through XChat to rsync alongside Coren's [19:25:38] >:| [19:25:41] >:| [19:25:43] >:| [19:26:48] The whole process is, unsurprisingly, IO-bound [19:26:51] * chrismcmahon cheers on rsync from Utah [19:27:07] go rsync go. go rsync go... [19:27:17] Coren, don't tell me you're pushing enter all the time. [19:28:05] Pikachu! I choose you!!. [19:28:14] Use thunderbolt on rsync [19:29:37] Hedonil sends digital german beer to strenghthen rsync's virility [19:29:48] my ssh to tools-dev is down, any known issue? [19:29:57] Danny_B: NFS maintenance. see labs-l [19:30:00] Danny_B, scrollback [19:30:22] sigh, again no notifications :-/ [19:30:35] Danny_B, but there were [19:31:02] did not get any [19:36:15] Connection closed by remote host when trying to ssh to tools-login? [19:36:50] * Krenair should pay more attention to his emails [19:37:12] Danny_B: The first notification was two weeks ago; then a reschedule for this week (the leak shuffled everyone's schedules), then a warning this morning before it started, then a notice when it started. [19:37:49] Danny_B: I'm trying hard, but you guys have *got* to read labs-l. 
[19:40:17] Coren: I actually just skim labs-l but istr reading it on Yet Another List. I just spaced out that it was RIGHT NOW. [19:40:58] You know, the delay is all the users' faults. If only they didn't keep doing things like writing to files, and all! [19:40:59] :-) [19:41:23] yeah, this job would be great if it weren't for the users. and the programmers. [19:48:25] chrismcmahon: and the ops people [19:53:48] Coren: last time we talked about this you guys promised you will be sending those server notifications (i don't know the proper term, sorry) - some message which pops up on you in your ssh window [19:55:37] Danny_B: That I *did* forget. Mea culpa. [19:56:24] Funnily enough, it's deployment-prep and not tools that has the heaviest catchup to do. [19:56:33] happens... np... just pls try to not forget again in future ;-) [19:57:51] Coren: deployment-prep stores a ton of useless crap in its filesystems [19:58:02] and I have doubts they ever garbage collect their repos [19:58:25] Ryan_Lane: The problem is that rsync doesn't have a "meh, not worth copying" discriminator module. :-) [19:58:30] yeah [19:58:32] I know :( [19:58:36] it totally should [20:01:11] andrewbogott: were you still working on the OpenStackManager interface to invisible unicorn? [20:01:42] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/88187/ [20:02:06] let me review. we can write a package for yuvi's api [20:02:15] ok [20:02:22] we need to start working on labs migration soon [20:02:27] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Highest - major) [Bug 48501] [OPS] beta: get SSL certificates - https://bugzilla.wikimedia.org/show_bug.cgi?id=48501 [20:03:29] hashar: so... there's a number of options on the table regarding beta [20:03:41] and none of them really require much ops work [20:03:55] hashar: how are the beta domains setup? [20:04:13] ..beta.wmflabs.org? [20:04:30] and commons.beta.wmflabs.org, wikidata.beta.wmflabs.org, etc ? [20:04:31] Ryan_Lane: e.g. en.wikipedia.beta.wmflabs.org [20:04:35] yes [20:04:39] Ryan_Lane: For future reference, it's not clear that rsync then rsync after for migrating is measurably faster than just doing a tar over ssh and copying the whole thing. [20:04:45] getting a cert is the easy thing [20:04:58] en.m.wikipedia.beta.wmflabs.org also. [20:05:08] It's too late now for me to restart this, but rsync wastes more time figuring out what to diff than it does actually copying stuff. :-( [20:05:25] yeah, so .beta.wmflabs.org [20:05:42] Coren: you have it actually doing diffs? [20:05:48] you aren't using the timestamp options? [20:06:09] yeah, that's going to be slow as shit :) [20:06:27] Ryan_Lane: we have mobile as well [20:06:30] Ryan_Lane: The problem with timestamps is that most of the big stuff that has to be sent is gigs and gigs of appended-to logs. [20:06:58] hashar: right, so, same url scheme ;) [20:07:05] Ryan_Lane: i should move invisible unicorn to gerrit [20:07:11] Ryan_Lane: and commons.wikimedia.beta.wmflabs.org [20:07:12] YuviPanda: yes, please :) [20:07:18] wonder where it should live [20:07:23] labs/invisible-unicorn? [20:07:29] Ryan_Lane: but yeah the aim is to mimic production as close as possible [20:07:29] wikidata.beta.wmflabs.org :) [20:07:37] Ryan_Lane: and thanks for your Sartoris puppet change :] [20:07:43] oh, fuck me, the url scheme is somewhat different?
[20:07:48] that needs to get fixed [20:08:03] no idea the details there [20:08:08] Ryan_Lane: the URL for bits I think is also different, not sure [20:08:16] yeah, that all needs to get fixed [20:08:36] Coren: whats the eta for labs? [20:08:56] everything should match the url scheme like this: s/.org/.beta.wmflabs.org/ [20:09:07] Betacommand: Uncertain. The copy is taking way longer than I was hoping. Judging by the disk space, it should be aboute 20-30 minutes. [20:09:23] But it has been anything but linear. :-( [20:09:37] At least we get working hardware this time. [20:09:46] Coren: http://xkcd.com/612/ [20:09:47] Ryan_Lane: iirc bits is at https://bits.beta.wmflabs.org [20:09:49] (BTW Ryan_Lane, no controller stalls no matter how hard I pushed it) [20:09:56] Coren: excellent [20:10:01] Coren: so, that controller is fucked [20:10:10] let's let it run for a while and ensure that's the case [20:10:27] Ryan_Lane: requested at https://www.mediawiki.org/wiki/Git/New_repositories/Requests [20:10:38] ^d: new repo? (scroll to end of https://www.mediawiki.org/wiki/Git/New_repositories/Requests) [20:10:44] Either that, or there is a driver regression with the newer kernels (I haven't taken any chances and did not enable thin volumes on labstore4) [20:10:56] But now that labstore3 will be free, we can experiment. [20:11:55] indeed [20:12:07] hashar: so, the systems that have the certs need to be controlled [20:12:17] I'd imagine that right now the ssl terminators are on the squids? [20:12:29] na its varnish doing the terminaison [20:12:37] everything is on varnish now? [20:12:41] yup :-] [20:12:44] excellent [20:12:48] worked with mark to migrate everything [20:12:52] ok, so we need to control access to the varnish boxes [20:12:59] he even tested text varnish using beta :-] [20:13:02] and we caught bugs! [20:13:05] nice [20:13:12] then got mobile to test it out as well iirc [20:13:16] there's a few options here: [20:13:40] 1. We disallow volunteers from having projectadmin in deployment-prep and we limit sudo access on those systems [20:13:52] 2. We require all volunteers in the project with projectadmin to have NDAs [20:14:15] 3. We move the varnish systems out of deployment-prep and into a locked down project [20:14:21] those are solutions, what are you attempting to prevent ? [20:14:31] volunteers getting access to the * beta certs [20:14:40] can't we just generate random certs using the Labs certificate authority ? [20:14:40] then add the CA public cert to the test browsers? [20:14:55] I suggested something similar [20:14:56] do we really have to use a star ? [20:15:02] twice. in the bug. [20:15:26] if you want to go that route, you can generate your own CA [20:15:36] we killed off the * wmflabs cert [20:16:10] hashar: if we don't get a real cert, then end-users will always get warning pages [20:16:26] so manual testing will be harder [20:16:33] hashar Ryan_Lane my concern is the group of "test browsers" for beta labs is far larger than just automated tests or users we know of. [20:16:44] chrismcmahon: eh? what do you mean? [20:16:51] you mean manual testers? 
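Ryan_Lane's target scheme ("s/.org/.beta.wmflabs.org/") is concrete enough to write down; here is a small sketch of the intended production-to-beta hostname mapping, using the hosts mentioned in this discussion (bits is one of the inconsistencies called out above, since per hashar it currently lives at bits.beta.wmflabs.org):

```python
# Sketch of the intended production -> beta cluster hostname mapping:
# replace the trailing ".org" with ".beta.wmflabs.org".
def beta_host(prod_host):
    assert prod_host.endswith('.org')
    return prod_host[:-len('.org')] + '.beta.wmflabs.org'

for host in ('en.wikipedia.org', 'en.m.wikipedia.org',
             'commons.wikimedia.org', 'bits.wikimedia.org'):
    print host, '->', beta_host(host)
# en.wikipedia.org      -> en.wikipedia.beta.wmflabs.org
# en.m.wikipedia.org    -> en.m.wikipedia.beta.wmflabs.org
# commons.wikimedia.org -> commons.wikimedia.beta.wmflabs.org
# bits.wikimedia.org    -> bits.wikimedia.beta.wmflabs.org (today: bits.beta.wmflabs.org)
```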
[20:17:09] Ryan_Lane: what you said about manual testers, or anyone who wants to use beta labs wikis casually [20:17:14] yeah [20:17:33] I'm not opposed to real certs, but if we have them, we need to protect them [20:18:16] so far no one has really made a plan of action, though all of the solutions have been mentioned in the bug [20:18:18] Ryan_Lane: part of the utility of beta labs is as a resource for existing and potential community, so the lower the barrier to entry, the better. [20:18:51] chrismcmahon: I'm not disagreeing. I'm just saying there's steps we need to do if we want to have real certs. [20:19:05] Ryan_Lane: yep [20:19:20] and whoever is going to be doing those steps needs to agree to one and start doing them [20:19:25] yep [20:19:27] we can get the certs in a day or so usually [20:19:57] i see what you mean now [20:20:14] so the idea would be to get real certificates for *.beta.wmflabs.org and *.wikimedia.beta.wmflabs.org etc [20:20:21] Ryan_Lane: at a casual glance, your solution #1 above looks like the least expensive and least disruptive [20:20:22] get them published on the varnish instances [20:20:34] make sure only people under NDA are project admin and know about it [20:20:40] then restrict sudo access on those boxes [20:20:52] sounds plausible to me [20:21:00] hashar: it's more than just *.wikimedia *.wikipedia [20:21:09] don't we also have wikibooks, wikinews, etc. ? [20:21:16] it'll be one large combined cert [20:21:19] just like production [20:21:46] solution #2 seems unworkable to me. solution #3 sounds like a hassle but maybe I'm mistaken [20:21:47] yeah hence 'etc' : [20:21:58] it can be a combo of #1 and #2 [20:22:07] or we could just ask tester to manually download Labs CA cert [20:22:10] #2 is what we require on tools [20:22:13] and self sign [20:22:19] hashar: we don't want people to install our CA [20:22:23] it's dangerous [20:22:32] then any web site they go to can be MITM'd with our CA [20:23:07] isn't our CA bound to wmflabs.org ? [20:23:36] CAs don't work that way [20:24:04] it's a hierarchical trust system [20:24:12] if you trust a CA, you trust everything it signs [20:25:45] * aude prefers combo of option #1 and #2 [20:29:24] Ryan_Lane: if you get NDA for project admins in tools, I guess beta should do the same [20:29:38] Ryan_Lane: much easier to handle since both projects will have the same policy [20:30:21] Ryan_Lane: looking at the list of projectadmin for beta, seems it is mostly wmf folks [20:30:28] aude: aren't you under NDA with wmf ? [20:30:36] hashar: we are not but should be [20:30:54] hashar: yeah, and there's actually relatively few volunteers in deployment-prep [20:31:03] it's important to not shut out volunteers from helping, at same time security is important [20:31:13] well, it's more than security [20:31:19] having some procedure for that seems reasonable and not a burden [20:31:21] Ryan_Lane: yeah [20:31:27] if we apply that policy, we can change the privacy policy of beta [20:31:32] to be the same as the sites [20:31:41] Ryan_Lane: would you mind taking the lead on that ? 
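As a side note to the CA discussion above: for scripted checks (as opposed to the browsers Ryan_Lane is worried about), a Labs-only CA would not have to be installed into any system or browser trust store; it can be trusted per request. A sketch with the requests library, where the bundle path and hostname are illustrative only:

```python
# Sketch: a scripted check can trust a project-specific CA bundle for just
# this request instead of installing the CA system-wide. Paths/hostnames are
# illustrative only.
import requests

LABS_CA_BUNDLE = '/path/to/labs-ca.pem'   # assumed location of the CA cert

resp = requests.get('https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page',
                    verify=LABS_CA_BUNDLE)  # only this call trusts that CA
resp.raise_for_status()
print resp.status_code
```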
It is well over my tech knowledge [20:31:42] like tools [20:31:56] hashar: this isn't a tech thing [20:32:03] Ryan_Lane: and I barely know CA / SSL stuff (like I thought sub CA could be restricted to a domain hehe) [20:32:14] hashar: we're going to get real certs [20:32:21] and robh will generate them [20:32:49] past that, you need to go through the projectadmin list and remove any volunteers [20:32:58] + clean out sudo [20:33:00] then you need to change the sudo policy to only allow specific people [20:33:02] yes [20:33:21] but then, until a few weeks ago I thought that granting projectadmin was giving sudo rights as well :D [20:34:54] heh [20:36:10] I guess we can still give sudo access to selected servers [20:36:18] seems NovaSudoer let us do that [20:38:03] Gaaaaah! So many minuscule files!!!! [20:38:17] it would be rare or never that i need it [20:38:28] but you never know if/when a need arises [20:40:51] Ryan_Lane: could you post a summary on https://bugzilla.wikimedia.org/show_bug.cgi?id=48501 please ? [20:41:22] aude: and you might want get in touch with legals @ wmf to get WMDE folks under NDA as welll [20:41:26] aude: that would be hlepful [20:42:07] Ryan_Lane: and I barely know CA / SSL stuff (like I thought sub CA could be restricted to a domain hehe) [20:42:12] yeah that's what I thought as well [20:42:16] why isn't that possible? [20:42:24] ... why does deployment-prep have thousands and thousants of thumbnails? [20:42:34] Coren: delete them all [20:42:40] Coren: it is not important [20:42:41] hashar: i'll be at the office in a week 1/2 [20:42:54] Coren: thumbs might never get purged on deployment-prep [20:42:55] can ask to push these things along [20:43:12] hashar: I can't, the filesystem is readonly now. [20:43:23] aude: you might want to email beforehand :-) this way when you show up you just have to sign :-D [20:43:27] Krenair: because that's not how X509 is designed :) [20:43:30] Coren: arch sorry :-( [20:43:43] I think I'm going to abort deployment-prep and leave them for later. [20:44:15] when you trust a CA, you don't trust it for a set of subdomains, but you trust any certificate it signs [20:44:22] hashar: sure [20:44:24] Coren: if that can help do :) [20:44:39] Coren: and whenever it is writable again, feel free to purge all thumbnails. [20:44:56] Coren: might want to fill a bug if they are not purged. Not sure we do in prod either [20:45:11] hashar: I'd rather someone who knows deployment-prep to do it though. [20:45:13] Krenair: this is why people complain very loudly about the chinese CAs that were added to most browsers recently [20:45:26] What chinese CAs? [20:45:35] the chinese CAs in question are corporations that are mostly PLA operated [20:45:44] wtf [20:45:51] CNNIC [20:46:04] https://bugzilla.mozilla.org/show_bug.cgi?id=542689 [20:47:03] recently, January 2010? :P [20:47:10] that gives them the ability to do legitimate MITM attacks [20:47:19] that is crazy [20:47:21] :-] [20:47:40] I'm always surprised when people don't know how SSL works :) [20:47:44] I guess I shouldn't be [20:48:09] the IP address could have a DNS entry listing the SSL fingerprint it serves :-] [20:48:26] it couldn't, actually [20:49:08] well, I guess it's more possible now: http://datatracker.ietf.org/wg/dane/ [20:49:12] but DANE is kind of a bad idea [20:49:22] Why isn't the SSL certificate system designed in such a way that certificates can be signed to only validate signing specific domains? 
[20:49:46] and dane still requires a hierarchal trust [20:50:00] Krenair: where would you buy certs from? [20:50:29] or how would browsers know what to trust? [20:50:45] companies which have certificates trusted to issue certificates for * [20:50:55] which brings you back to the same problem [20:51:02] not really [20:51:05] we'd still end up trusting some people [20:52:10] but you could get a certificate that can only sign for validation of specific domains [20:52:40] I guess that would be helpful for some things. for instance, it would be nice for the US government CAs to be included in browsers, but only valid for US government domains [20:52:44] * Coren skips deployment-prep. [20:52:52] exactly [20:53:31] they could be restricted to *.gov (or the opposite - I'd really love to see .gov, .edu and .mil go away or have their US-only restriction removed) [20:54:59] but yeah, not how the system works, so MITM's for everyone! [20:55:04] :) [20:55:38] Coren: got to head to bed [20:55:49] Coren: bon courage pour la suite! [20:55:54] hashar: Anyone on your side that can help with deployment-prep? [20:56:05] hashar: I'd rather not leave it down until tomorrow. [20:56:23] Coren: not sure what is left to do ? [20:56:31] Coren: all instances can be rebooted safely [20:56:38] Coren: though the SQL ones might be an issue [20:56:55] hashar: Allright then, I'll just reboot them once your copy is over and hope for the best. [20:57:18] Coren: I can check that things are at least mostly valid after reboot [20:57:22] Coren: yeah that should be fine. You might want to shutdown mysql properly on -sql and -sql02 [20:57:36] Coren: and do delete thumbnails if it can help [20:57:49] hashar: Surely, you didn't but their databases on NFS? [20:57:52] put* [20:57:56] I have idea [20:57:59] chrismcmahon: Excellent. Thanks. [20:58:02] I think it is on /dev/vdb [21:01:00] Coren: and I have a bunch of automated builds against deployment-prep kicking off in the middle of the night PDT. It's be nice to have deployment-prep up for those, and we can check a whole lot of stuff about beta after those run. [21:02:05] chrismcmahon: Once the NFS server is back to full operational status, my first order of business is to get you guys synced back up. I dunno how long this will take, but it's certainly today. [21:05:51] WHOOOOOOOOOOO [21:06:30] My patch uploader now speaks OAuth for authentication, and uses that to set the committers' identity -- https://gerrit.wikimedia.org/r/#/c/89110/ [21:06:40] Coren: bed for me. [21:06:45] Now if only I could deploy ;-) (no worries, I'll do it tomorrow) [21:06:46] will look at beta tomorrow [21:06:51] hashar: Good sleeps. [21:07:05] Coren: eventually leave some notes somewhere for european ops to be able to help if needed [21:07:14] but I am sure it will be fine :-] [21:07:24] valhallasw: excellent work :) [21:07:38] valhallasw: congratulations! [21:09:23] OK, so to do now: deploy, and document how I got this auth thing to work. [21:09:30] Probably in reverse order *grin* [21:18:57] Why the [bleep] [bleep]ing [bleep] is there a an install of MXE on tools, with the full source of Qt?! [21:19:30] Oh! And windows exe! [21:20:10] petan: ^^ [21:20:18] Srsly, dude? [21:20:49] petan: What in blazes do you need a Windows cross-compiler for? [21:37:58] is the maintenance still running? [21:38:12] i still can't ssh to tools dev [21:42:11] Coren, I'm guessing it's for Huggle? [21:42:39] Danny_B: Yeah; I didn't count on ~2.1 million files having been touched in the past 24h. 
[21:42:45] http://www.gossamer-threads.com/lists/wiki/wikitech/389495 [21:42:54] rsync doesn't like git pull. :-/ [21:43:42] Coren: any eta? no pushing, i just want to know if it's worth to stay awake or go sleep ;-) [21:44:08] Danny_B: It might be another hour still. [21:44:33] You're welcome to push hard on rsync, but I doubt it's going to work any faster for it. :-P [21:47:25] valhallasw, what's the use case for your oauth commit identity thing? [21:47:32] as i said no rush, just informational question [21:49:28] Krenair: people can submit patches to gerrit via his web tool [21:51:03] ah, ok [21:52:44] Krenair: it's basically two things: a) an oauth based identity tool, as openid is not functional at the moment, and b) a tool to upload patches [21:53:08] and I hope b) will lower the barrier-of-entry of gerrit some more [21:53:26] How will a web tool help? [21:54:01] Krenair: you don't need to use git-review, or git at all - you can just paste a patch file [21:54:32] I'd link you to it, but tool labs is down ;-) [21:54:45] so instead I'll link you to it tomorrow :-) [21:56:12] Danny_B: No, seriously. Anything helps. :-) [22:08:39] Hah. And I was worried that the switchover might be too quick and I'd have trouble with the ARP cache. [22:08:58] hahaha [22:09:25] * Ryan_Lane uses labs with his gluster filesystem [22:09:40] ;) [22:09:52] * Coren would like to see how quickly you'd be able to copy it all. :-) [22:09:56] is it back up? [22:10:02] Coren: that would take ages [22:10:03] ah [22:10:08] i guess no [22:10:30] So many small files! [22:35:20] Hey guys, is tools.wmflabs.org down? [22:36:32] TParis, ping Coren [22:40:21] TParis: Maintenance. Copying the filesystem to new hardware. [22:40:47] The rsync I did in advance should have kept that short. Didn't count on over 2M files being touched since yesterday. :-( [22:41:34] Coren Coren Coren - I'm disappointed. If you can't predict the unpredictable, well, I just don't know... [22:41:36] ;) [22:41:45] Okay, well I'll go mow the law while you work, thanks ;) [22:43:31] andrewbogott: you can actually test your patch ;) [22:43:37] andrewbogott: openstack project uses gluster [22:43:54] so does the bastion [22:45:06] Ryan_Lane: but proxy-dammit is on NFS [22:45:23] it tests against the proxy in use? [22:45:32] IIRC yeah :P [22:45:44] and shouldn't it still work, even without homedirs? [22:45:50] are you running it as your own user? [22:45:54] or as a system user? [22:45:56] Ryan_Lane: the proxy itself works [22:46:03] Ryan_Lane: but the api is just running in screen as me [22:46:04] so [22:46:11] stabby stabby stab [22:46:12] :) [22:46:13] Ryan_Lane: metrics.wmflabs.org works, for example. [22:46:29] Ryan_Lane: I am still not sure how to deploy it :P needs uwsgi [22:46:33] git deploy? [22:46:36] you know, we could run the api as a wsgi behind nginx or apache [22:46:45] indeed, that's how we should do it [22:46:47] no apache please [22:46:48] git-deploy inside of labs is.... difficult right now [22:46:53] I see [22:47:03] I could deploy via Puppet... :P [22:47:07] you could, yes [22:47:13] specify a hash [22:47:19] I'm not opposed to that for now [22:47:23] ideally we'd make a package [22:47:36] I don't know. This is a webapp, shouldn't be deployed as a package. [22:47:43] should be deployed as how we deploy other cod [22:47:44] e [22:47:45] why not? [22:47:54] that's how the other openstack services are deployed [22:48:17] why? 
[22:48:23] an API should be relatively stable [22:48:38] because it's easier to handle dependencies and such via packages [22:48:49] and they do proper code releases [22:49:17] and the apis can run standalone, or via wsgi behind nginx/apache [22:50:19] Ryan_Lane: packaging web apps always felt 'wrong' to me [22:50:29] Ryan_Lane: perhaps because I'm yet to see it done right [22:50:37] mw, wp, drupal... [22:50:41] it's a web service, not necessarily a web app [22:50:52] hmm, it's not even a 'web' service, just uses http [22:51:04] it just happens to use http to make an API available [22:51:12] right [22:51:25] so I'd be hard pressed to call it a web app :) [22:51:28] I've nothing against it, but it's going to be a while before I can figure out how to make it a package [22:51:43] I'm pretty sure we're going to do it for you [22:51:57] that'll be nice :D [22:52:05] I put in a gerrit request [22:52:09] a while ago [22:52:11] we need to finish this work up so that we can move onto migration [22:52:25] yeah [22:52:37] YuviPanda: Ah, we/you're blocked by waiting for gerrit things to happen? [22:52:38] this is going to be really helpful for migration, so I want to finish it first :) [22:52:50] the less floating IPs we have in use the easier things are [22:52:59] andrewbogott: one of the things, yeah :) [22:53:07] andrewbogott: to move invisible-unicorn into gerrit [22:53:12] andrewbogott: Ryan_Lane i requested labs/invisible-unicorn [22:53:16] rather than labs/tools [22:53:18] seemed ap [22:53:19] t [22:53:35] * Ryan_Lane nods [22:54:35] Ryan_Lane: i bet system packaged versions of flask *and* py redis are too old for it. [22:54:42] YuviPanda: OK, I can create that right now, except I don't know what to put in 'rights inherit from...' [22:54:55] hmm, i've no idea what that means either :| [22:55:01] andrewbogott: can you also do the import? [22:55:50] YuviPanda: I could, but it should be easy enough for you to do in just a minute... [22:55:54] btw, where is the proxy code hosted? [22:55:59] hmm, never worked for me :) [22:56:06] andrewbogott: github.com/yuvipanda/invisible-unicorn [22:56:13] andrewbogott: the *proxy* code itself is in operations/puppet [22:56:16] dynamicproxy module [22:56:29] ah, ok. [22:57:04] YuviPanda: can you now git clone https://gerrit.wikimedia.org/r/labs/invisible-unicorn, add files, and git review? [22:58:23] andrewbogott: rights inherit from a parent repo [22:58:30] which I'd imagine should be labs [22:58:36] if it's in labs/ [22:59:16] Hm, now that it's created I wonder if it's too late to change that [22:59:39] cloning [23:00:51] pushing [23:01:17] remote: You are not allowed to perform this operation. [23:01:20] remote: To push into this reference you need 'Push' rights. [23:01:23] remote: User: yuvipanda [23:01:26] andrewbogott: ^ [23:01:49] well, you should review rather than push... [23:01:56] I don't think anyone ever pushes directly to gerrit do they? [23:01:58] that's a lot of commits to review [23:01:58] sigh [23:01:59] ok [23:02:00] let me do that [23:02:00] Or does 'push' do the same as review? [23:02:05] Can't it just be one big commit? [23:02:10] andrewbogott: no!!!!! [23:02:13] that's terrible :P [23:02:23] andrewbogott: you need 'push' if you want to import repo with history [23:02:30] Hm, the project configuration GUI is surprisingly sparse [23:02:31] andrewbogott: so usually ^d sets it up so you push once and then you can't [23:02:35] ah, I see. [23:02:42] andrewbogott: heh, Gerrit's UI sucks. who would'v ethought :P [23:02:45] Hm. 
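On Ryan_Lane's point that the proxy API could "run as a wsgi behind nginx or apache" instead of in a screen session: all that takes on the application side is a WSGI entry point. A sketch only; the module and attribute names below are guesses about invisible-unicorn's layout, not its actual API:

```python
# wsgi.py -- minimal sketch of a WSGI entry point for the Flask-based proxy
# API, so it can run under uwsgi/nginx rather than in a screen session.
# The import below is a guess at invisible-unicorn's layout.
from invisible_unicorn import app as application  # hypothetical module/attr

if __name__ == '__main__':
    # Convenience for local testing only; uwsgi imports `application` itself.
    application.run(host='127.0.0.1', port=5000, debug=False)
```

Something like `uwsgi --socket 127.0.0.1:8081 --module wsgi` behind an nginx `uwsgi_pass` location would then serve it; whether that gets deployed from a Debian package or straight from puppet is exactly the question left open above.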
[23:02:48] <^d> YuviPanda: Basically you make an empty repo. [23:03:00] ^d, that part's done… [23:03:12] https://gerrit.wikimedia.org/r/#/admin/projects/labs/invisible-unicorn [23:03:17] <^d> The behavior is the same if you push to a branch that doesn't exist yet, it treats it as a branch creation. [23:03:20] <^d> So just push to master. [23:03:33] 04:31 YuviPanda: remote: You are not allowed to perform this operation. [23:03:39] 04:31 YuviPanda: remote: To push into this reference you need 'Push' rights. [23:03:42] 04:31 YuviPanda: remote: User: yuvipanda [23:03:49] Maybe by creating the initial empty commit I ruined something? [23:03:51] ^d: problem might be that there's an 'empty initial commit'? [23:03:52] yeah [23:04:01] <^d> Yeah. [23:04:07] right [23:04:08] dang [23:04:10] so review and manual +2 [23:04:16] <^d> So either you change permissions on the repo to allow direct pushing to the branch in question. [23:04:22] <^d> Or rebase on top of the empty commit. [23:04:32] How do I change the permissions? [23:04:39] ^d: i rebased on top of the empty commit [23:05:05] <^d> andrewbogott: When you have a repo open like the url you gave me earlier...click the "Access" tab. [23:05:20] Oh, those are tabs! [23:05:25] Now I see where the rest of the gui is [23:05:38] hehe [23:07:10] andrewbogott: found it? [23:07:13] Um, ok, it wants a group, do I need to create a group and add yuvi to it? [23:07:23] I should already be in a bunch of groups [23:07:49] Right, but I want to add just you, not everyone in labs [23:08:49] and 'Group Name' is just a freeform text field, no idea what to add [23:09:54] andrewbogott: hmm, I could just push the commits and you can merge them :D [23:09:56] it's not too many [23:10:05] inherits from YuviDirectPushOverrideGroup [23:10:15] just 16 [23:10:40] YuviPanda… assuming I can merge, let's do that. [23:10:44] But I also have to go in about 1 minute [23:10:57] AzaToth's quit message needs "Repeat ad nauseam." at the end. :-) [23:11:17] Just send me an email with the last patch in your patchset and I'll merge later on... [23:11:21] andrewbogott: whelp, that's not going to happen anytime soon :| [23:11:22] how many files left? [23:11:28] * Coren makes a note to tell whoever maintains checkwiki that they don't *need* to keep dumps of every wiki in their account. [23:11:32] andrewbogott: because I need to add a commit message for each of those :| [23:11:39] yikes [23:11:50] Well… maybe you can get ^d to give you permissions... [23:12:00] * YuviPanda pokes ^d [23:12:11] Danny_B: It looks about 90% done, size wise.