[00:16:17] Aaah!
[00:16:17] Your php dies during malloc
[00:16:17] up the memory limit in php cli config?
[00:16:17] Ryan_Lane: -1
[00:16:17] :P
[00:16:17] heh
[00:16:17] I think my process is just a bit too memory hungry :/
[00:16:17] * Coren wonders why the system doesn't go to swap though. Some peculiarity of the VM?
[00:16:17] addshore: Yeah, I was about to ask wth you could be doing that had that heavy a footprint. :-)
[00:16:17] preprocessing dump files :/
[00:16:17] addshore: In memory?
[00:16:17] this was my first plan :P
[00:16:17] and it has failed ;p
[00:16:17] addshore: You may have to... err... revise your algorithm. :-)
[00:16:17] yep :P
[00:16:17] I think the easiest way may just be to parse the dumps to sql and then dump them in a DB
[00:16:18] Ryan_Lane: You're going to be full of joy at the proxy config I got going for the tools project. I managed to cleanly isolate the tools /and/ sanitize away PII. :-)
[00:16:18] overwrite the db with each dump
[00:16:18] Coren: cool >)
[00:16:18] err
[00:16:18] :)
[00:16:18] this would be a lot easier with DB replication ;p *wink wink nudge nudge*
[00:16:18] maybe I should go and hunt for a pre-made dump parser rather than trying to write my own :/
[00:16:18] Also PHP is not renowned for its judicious memory management. :-)
[00:16:18] I know :P
[00:16:18] * Jasper_Deng wonders why PHP was selected for MediaWiki in the first place....
[00:16:18] Ryan_Lane: *cough* *cough* https://gerrit.wikimedia.org/r/#/c/51312/ *cough*
[00:16:18] Coren: you should get that cough looked at
[00:16:18] Jasper_Deng: It's reasonable for what it was doing then, it has a good base of people who know it, and it's manageable for most things.
[00:16:18] Jasper_Deng: It could have been worse. It might have been implemented with Java servlets. :-)
[00:16:19] Coren: hm
[00:16:19] I rarely use virtual resources
[00:16:19] now I need to look this up to properly review :(
[00:16:19] Ryan_Lane: Oh? That's pretty much an ideal use case.
[00:16:19] I'm not saying it's a problem
[00:16:19] Ryan_Lane: Heh. Don't worry about a rush, just as long as it doesn't fall off your todo. :-)
[00:16:19] * addshore thinks http://www.mediawiki.org/wiki/Manual:MWDumper may be his saviour
[00:16:19] so, how is this conditionally included?
[00:16:19] it's only realized if ssh_banner is set?
[00:16:19] Ryan_Lane: You include the ssh::bastion class. This will cause ssh to realize the file resource.
[00:16:19] * Ryan_Lane nods
[00:16:20] Ryan_Lane: The advantages of doing it this way are (a) ssh::bastion might include more "bastionny" stuff eventually, and (b) other classes may want to define the virt resource if they need a banner.
[00:16:20] Though of course you can only have one of those. I could have gone with the file-bits approach to construct a sshd_banner from a concat, but that seemed like heavy overkill to me. :-)
[00:16:20] merged all the way through
[00:16:20] yeah, this is a perfectly appropriate use of this
[00:16:20] kk. FYI: the banners will stay everywhere they are, but puppet will only replace them (or add them for new instances) if the ssh::bastion class is there. I'd have posted a change to put it in the openstack defaults, but I don't know how. :-)
[00:16:21] meh, just remove the banner
[00:16:21] there is no standard one
[00:16:21] Heh.
[00:16:21] * Coren might even have the grid working tonight.
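The ssh::bastion exchange above is the standard Puppet virtual-resource pattern: the ssh class declares the banner file virtually, and the file only becomes managed on nodes where another class realizes it. A minimal sketch of the pattern (class names, paths, and the tag are illustrative, not the actual Wikimedia manifests):

    class ssh {
      # Declared virtually (note the @): Puppet does not manage this
      # file until some class realizes the resource.
      @file { '/etc/ssh/sshd_banner':
        ensure => file,
        source => 'puppet:///modules/ssh/sshd_banner',
        tag    => 'ssh_banner',
      }
    }

    class ssh::bastion {
      include ssh
      # Collecting by tag realizes the virtual file, so only nodes
      # that include ssh::bastion get the banner enforced.
      File <| tag == 'ssh_banner' |>
    }

Nodes that include plain ssh leave any existing banner untouched, which matches the behaviour described: banners stay where they are unless ssh::bastion is applied.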
[00:16:21] that would be amazing xD
[00:16:22] Ryan_Lane: Feb 28 23:42:20 tools-exec-01 nslcd[1082]: [7cfb34] error writing to client: Broken pipe
[00:16:22] Something going on with LDAP?
[00:16:24] Ryan_Lane: Gluster goes boom.
[00:16:26] Coren: eh?
[00:16:26] Coren: what specifically is breaking?
[00:16:26] Ryan_Lane: No more gluster; nothing in the logs, every access gets stuck waiting on I/O
[00:16:26] on which instance, on which project?
[00:16:26] Ah. All instances of tools.
[00:16:26] Includes /public/keys
[00:16:27] Ah, except on -login, which I had just recently logged into.
[00:16:27] public keys isn't working?
[00:16:27] Hm, no, that seems to have been a symptom of the homes being broken.
[00:16:27] can you give me some instance names?
[00:16:27] bots-webproxy
[00:16:27] tools-webproxy*
[00:16:27] tools-master
[00:16:27] tools-exec-01
[00:16:27] trying to login to labs:
[00:16:27] tools-webserver-01
[00:16:27] Login error
[00:16:27] Labs uses cookies to log in users. You have cookies disabled. Please enable them and try again.
[00:16:27] ?
[00:16:27] Wikinaut: you need to use wikitech.wikimedia.org
[00:16:27] argh
[00:16:27] I wonder why that isn't redirecting.
[00:16:27] redirect
[00:16:27] !
[00:16:28] ===> http://www.w3.org/Provider/Style/URI.html <===
[00:16:28] There are multiple keys, refine your input: !log, $realm, $site, *, :), access, account, account-questions, accountreq, addresses, addshore, afk, alert, amend, ask, b, bang, bastion, beta, blehlogging, blueprint-dns, bot, botrestart, bots, botsdocs, broken, bug, bz, cmds, console, cookies, credentials, cs, damianz, damianz's-reset, db, del, demon, deployment-beta-docs-1, deployment-prep, docs, documentation, domain, epad, etherpad, extension, -f, forwarding, gerrit, gerritsearch, gerrit-wm, ghsh, git, git-branches, git-puppet, gitweb, google, group, hashar, help, hexmode, home, htmllogs, hyperon, icinga, info, initial-login, instance, instance-json, instancelist, instanceproject, keys, labs, labsconf, labsconsole, labsconsole.wiki, labs-home-wm, labs-morebots, labs-nagios-wm, labs-project, labswiki, leslie's-reset, link, linux, load, load-all, logs, mac, magic, mail, manage-projects, meh, mobile-cache, monitor, morebots, msys, msys-git, nagios.wmflabs.org, nagios-fix, newgrp, new-labsuser, new-ldapuser, nova-resource, op_on_duty, openstack-manager, origin/test, os-change, osm-bug, pageant, password, pastebin, pathconflict, petan, ping, pl, pong, port-forwarding, project-access, project-discuss, projects, puppet, puppetmaster::self, puppetmasterself, puppet-variables, putty, pxe, python, q1, queue, quilt, report, requests, resource, revision, rights, rt, Ryan, ryanland, sal, SAL, say, search, security, security-groups, sexytime, single-node-mediawiki, socks-proxy, ssh, sshkey, start, stucked, sudo, sudo-policies, sudo-policy, svn, terminology, test, Thehelpfulone, tunnel, unicorn, whatIwant, whitespace, wiki, wikitech, wikiversity-sandbox, windows, wl, wm-bot,
[00:16:28] We just reorganized our website to make it better.
[00:16:28] Wikinaut: yes, obviously.
[00:16:28] Do you really feel that the old URIs cannot be kept running? If so, you chose them very badly. Think of your new ones so that you will be able to keep them running after the next redesign.
[00:16:28] * Ryan_Lane sighs
[00:16:36] (It's not my day, today. But there's still the night. (C) Wikinaut)
[00:16:37] Coren: I can't even log in directly as root
[00:16:37] Coren: are you logged into these systems?
[00:16:37] Ryan_Lane: I have _one_ working (-login) but I dare not log off. :-)
[00:16:37] that one works fine for me
[00:16:37] you sure it's gluster?
[00:16:37] Ryan_Lane: Try a ls of /data/project
[00:16:37] what changes were made?
[00:16:37] or a df for that matter.
[00:16:37] that wouldn't cause a hang of login
[00:16:37] Ryan_Lane: No, but the homes would
[00:16:37] ah
[00:16:37] let me check something
[00:16:38] hm
[00:16:38] it should be shared properly
[00:16:38] public/keys works properly
[00:16:38] homedirs should as well
[00:16:38] data/project shouldn't cause hangs
[00:16:38] Everything was fine until ~20 minutes ago
[00:16:38] Maybe 30 by now
[00:16:38] * Ryan_Lane nods
[00:16:38] I'll check the servers
[00:16:38] * addshore thinks his mysql permissions are a wee bit messed up on bots.mysql-3 on the table addshore_dump, would be great if someone could poke my permissions in the right direction again :/
[00:16:38] Wait, public/keys works on -login, like homedirs, but we don't know that it's working on the other instances.
[00:16:38] And root wouldn't care about homedirs, only public/keys right?
[00:16:38] if public/keys is working anywhere, it's working everywhere
[00:16:38] hm. gluster volume seems to be non-responsive
[00:16:38] the command, that is
[00:16:38] can't go a week without dealing with gluster, it seems
[00:16:39] * Coren starts eyeing NFS with longing.
[00:16:39] I want to switch to the netapp pretty badly
[00:16:39] why don't we have an NFS export per project ?
[00:16:39] instead of gluster
[00:16:39] and besides NFS being evil?
[00:16:39] :-D
[00:16:39] hashar: hosted on what?
[00:16:39] glusterfuck?
[00:16:40] on the Gluster server?
[00:16:40] a single point of failure NFS server will fuck us equally as hard at some point
[00:16:40] that won't help us
[00:16:40] yeah probably
[00:16:40] though less likely than the crazy Gluster issues we are having
[00:16:40] Ryan_Lane: Could just export block devices to the projects and they deal with per-project or whatever else they need? At least any failure then becomes project-local.
[00:16:40] then we'll have a shit-ton of nfs servers
[00:16:40] and we'll deal with entire projects being broken because their nfs is down
[00:16:40] and people will use fstab
[00:16:40] and we'll spend all day fixing instances rather than a centralized solution
[00:16:41] Hm. Four quarters for a dollar; you'd trade one big problem for lots of little ones. I see your point, but I guess the question is how confident are you that you can beat gluster into shape?
[00:16:41] ... if you did something, that worked.
[00:16:42] I would rather have one project not having nfs instead of the full labs being awfully slow :-]
[00:17:06] I haven't seen gluster slowing things down in quite a while
[00:17:15] hashar: part of the slowness is due to the network driver we use
[00:17:21] we'll be solving that next week or so
[00:17:26] ahh good to know :-]
[00:18:05] Ryan_Lane: New data point: /data/project just unwedged on -login
[00:18:11] yep
[00:18:19] I just restarted the glusterd service on all of them
[00:19:09] How did you manage that without logging in? :-)
[00:19:43] I did this on labstore1-4
[00:19:47] Oh, on the gluster stores.
[00:40:54] __db.001: Applesoft BASIC program data
[00:40:54] I'm thinking... no. :-)
[01:09:58] Question for any bots people: Which instance do I do? O.O https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots#Instances_for_this_project
[01:11:08] pick one
[01:12:04] Seriously?
[01:12:48] well
[01:12:50] dont use an apache
[01:12:52] or an sql
[01:12:59] or something specific to a person
[01:13:05] like analytics or salebot
[01:13:06] but yeah
[01:13:33] * Riley just goes with bots-bnr1
[01:14:29] oh goodie
[01:14:36] that'll teach addshore manners.
[01:15:50] Hahah
[01:17:00] bots boner?
[01:22:11] I can haz gridz!
[01:38:44] Coren: you has gridz? :P
[01:39:03] legoktm: HAHA!
[01:39:10] :D
[01:39:40] Riley: whatcha running on there? :p
[01:40:01] addshore: IRC bots and my sandbox bot
[01:40:09] addshore: I haz a gridz!
[01:40:33] can I has gridz? ;p
[01:40:43] shame it cant have sql yet :/
[01:41:18] legoktm: what script were you talking about earlier to scan dumps? :P
[01:41:28] * Riley waits like an hour for pywikipedia to transfer over to labs
[01:41:29] give me like an hour?
[01:41:34] its xmlreader.py
[01:41:37] ill set it up for you
[01:41:43] I had one but put little or no memory management on it so I just chewed through 8GB in like 2 mins ;p
[01:41:50] Riley: you dont need to copy it there
[01:41:51] lolwut
[01:42:01] addshore: no?
[01:42:09] /data/project/pywikipedia
[01:42:13] bbl
[01:43:33] addshore: If you don't mind living on the bleeding edge, sure.
[01:43:43] Coren: I love the bleeding edge ;p
[01:44:15] What's your labs username?
[01:44:23] addshore [=
[01:44:24] Well, wikitech now. :-)
[01:44:45] addshore: You can haz gridz.
[01:44:47] also in regards to having sql with the gridz :P Could you not just setup a single sql instance? :O
[01:44:51] addshore: tools-login
[01:45:36] :D
[01:47:28] addshore: Do you need a tool directory and stuff set up? That requires manual handling atm.
[01:47:43] (Needed if you want to have a web interface)
[01:47:55] wont need a web interface :)
[01:48:20] Also needed for continuous jobs.
[01:48:35] mhhhm, go on then :P
[01:48:48] please to give me name of tool
[01:48:54] addbot :P
[01:53:42] addshore: project dir is in /data/project/addbot
[01:53:57] cheers :)
[01:55:11] addshore: only the tool can actually request continuous runs. You can sudo to it to do stuff.
[01:55:27] Also http://tools.wmflabs.org/addbot/
[01:55:36] [=
[01:55:37] (That's for free) :-)
[01:56:03] You'll probably need to log off and back on to be in the local-addbot group
[01:56:10] (the username for the tool is local-addbot)
[01:58:04] okay :)
[01:58:33] addshore: hint: sudo -iu local-addbot
[02:02:50] Ooh
[02:02:53] I want gridz!
[02:03:01] Coren: plz
[02:03:13] Heh.
[02:03:25] wikitech username?
[02:03:33] legoktm
[02:03:59] Beware: this is very barebones still. There's a webserver for tools, a compute grid, and not much else. :-)
[02:04:31] Sure.
[02:04:42] legoktm: Read all the above speech to addshore, apply to yourself, then answer the same questions for me. :-)
[02:05:04] * Coren copypastas said speech somewhere
[02:05:43] Yes I would like a web interface.
[02:05:55] If I make my tool name "legobot" can I still ssh as legoktm@blah?
[02:07:08] legoktm: Yes, you sudo to the tool once on -login at need
[02:07:16] legobot then?
[02:07:27] Sure
[02:07:50] * Coren makes a script to automate that. :-)
[02:11:12] Ah, poop, puppet is fighting with me.
[02:11:14] * addshore thinks Coren should add a mysql host ;p
[02:14:40] All done but the sudoers
[02:14:44] Hang thight
[02:14:48] tight*
[02:14:55] * legoktm hangs on for his life
[02:15:05] well legobot does.
[02:31:36] !Ryan_Lane
[02:32:05] log bot is dead?
[02:32:25] heh
[02:32:33] any reason for a ping? :)
[02:32:35] No, was about to poke you for a quick and simple puppet review. It's just about to go ping on -operations. :-)
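The xmlreader.py legoktm points to streams a dump one page at a time instead of slurping it into memory, which is what ate addshore's 8 GB earlier. A rough sketch, assuming the shared pywikipedia (compat) checkout at /data/project/pywikipedia; the attribute names are from memory, so verify them against the module:

    import sys
    sys.path.append('/data/project/pywikipedia')  # shared checkout on tools
    import xmlreader

    # XmlDump.parse() is a generator: one page at a time, flat memory
    # use no matter how large the dump is.
    dump = xmlreader.XmlDump('enwiki-latest-pages-articles.xml.bz2')
    for entry in dump.parse():
        if '{{Persondata' in entry.text:
            print entry.title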
[02:33:02] There it goes. :-)
[02:33:38] Coren: please use modules
[02:33:48] Even for one-offs?
[02:33:52] yes
[02:33:56] always modules
[02:34:06] no more spaghetti code manifests
[02:34:14] Oh. I shouldn't have looked at existing practice, huh? :-P
[02:34:21] Will amend in minutes.
[02:34:27] why are we using system sudo, rather than ldap?
[02:34:54] Ryan_Lane: Because the data is locally generated and ldap sudo doesn't allow %group ALL=(user), as far as I can tell.
[02:35:45] I'm confused by what you mean
[02:35:58] The entries I need are:
[02:36:14] %agroup ALL=(auser) NOPASSWD: ALL
[02:36:19] yeah
[02:36:21] that can go in ldap
[02:36:25] Oh!
[02:36:38] Hm. Then I need write access to ldap, don't I? :-)
[02:37:18] (and if it goes in ldap then it's %agroup ahost=(auser) instead) :-)
[02:37:27] when we add the interface to manage tools, it can add the sudo policy at the same time
[02:37:59] Hmmm. And in the meantime, then, what do I do? Because atm puppet crushes any changes to sudoers or sudoers.d
[02:38:26] So I can't do it locally.
[02:39:19] do it via ldap
[02:39:25] ah. crap
[02:39:30] it lists the groups specifically
[02:39:32] hm
[02:39:54] Coren: puppet doesn't wipe out changes in sudoers.d
[02:40:00] we don't recurse
[02:40:08] Yes it does. It's a managed dir. It purges any addition.
[02:40:11] temporarily you can do it that way
[02:40:48] [[citation needed]]
[02:41:14] * Coren is hunting for it.
[02:44:01] dafu? I can't find it now, but that file was rm'ed under me at least twice in the past two hours.
[02:44:15] * Coren is confused.
[02:44:31] Alright, chalk it up to... Because ALIENS!
[02:44:34] hahaha
[02:44:52] it shouldn't
[02:44:54] * Coren just ran a puppetd -tv just to be sure
[02:46:44] legoktm: So, it works as advertised after all. :-)
[02:46:54] Awesome!
[02:46:58] Is there documentation somewhere?
[02:47:06] Or do I just go crazy? :P
[02:47:11] http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help
[02:47:20] :-)
[02:47:41] So, no. But I'm on-channel, just ping me at need. Documentation is... ongoing. :-)
[02:48:12] legoktm: What probably interests you is /data/project/legobot
[02:48:17] Ok
[02:48:20] legoktm: there's a public_html in there.
[02:48:26] So whats the basic workflow?
[02:48:37] Do I just stick my tools in that directory?
[02:48:45] I probably need a cgi-bin
[02:48:52] legoktm: There's one too. :-)
[02:48:58] :)
[02:49:15] legoktm: Stuff in there is run by the user owning it, even from the webserver
[02:49:44] the webserver is a valid submit host, so you can qsub things from your CGIs
[02:49:55] ooh
[02:50:03] and the syntax for job submission is the same?
[02:50:34] Basically, though I expect the available queues are mostly different but that should be transparent to you.
[02:51:02] You can also qstat to xml if you want to do fancy things like show the status of jobs from the web interface.
[02:51:48] Your tool lives under http://tools.wmflabs.org/legobot/
[02:52:54] Remember that you should run your jobs as the tool user, and the cgis and php should be owned by it as well.
[02:53:02] You can run one-off jobs as yourself if you want, though.
[02:53:57] ping me for missing dependencies, I don't yet have most things installed.
[02:54:04] Coren: Could you add me (scfc/tool wikilint) as well to the project?
[02:59:20] scfc_de: {{done}}
[03:01:20] Thanks!
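The sudoers entry Coren needs, written out as a drop-in file; the file name and group/user names follow the local-* tool convention mentioned above and are illustrative:

    # /etc/sudoers.d/tool-addbot -- install with mode 0440 and check
    # syntax with `visudo -c` first; a broken sudoers file breaks sudo
    # for everyone.
    # Members of the tool's group may run anything as the tool account,
    # without a password:
    %local-addbot ALL = (local-addbot) NOPASSWD: ALL

The LDAP variant Ryan points to stores the same policy as a sudoRole entry, with sudoHost carrying the host restriction Coren notes ("%agroup ahost=(auser)").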
[03:01:41] Ah ok
[03:01:43] thanks
[03:11:38] Coren: I'd need libhtml-parser-perl libwww-perl liburi-perl libdbd-sqlite3-perl on -login and -webserver-01, and additionally libtool autoconf sqlite3 on -login. (And if apt-file isn't installed on -login, would be nice to install it there as well to look up packages.)
[03:12:19] * addshore reminds legoktm about his dumps ;p
[03:12:43] i just fought off a bunch of chinese spambots, finished CUing them and am exhausted
[03:12:50] in a few hours
[03:13:18] :P
[03:14:05] I never realized how complicated CUing is
[03:14:28] scfc_de: Kk. Gimme a minute
[03:15:31] No problem. The webserver doesn't serve /data/project/* yet, does it?
[03:21:00] scfc_de: Yes it does; yours would be at http://tools.wmflabs.org/wikilint/
[03:25:16] scfc_de: I'm presuming you want most (all) of that on the exec host as well. :-)
[03:25:59] exec = SGE? Not necessarily, the tool is just a CGI.
[03:26:20] scfc_de: Is it heavyweight?
[03:26:32] No, I'm just an autotools fan :-).
[03:26:51] scfc_de: Ah, ok. Cuz you can qsub stuff from the CGI for heavy duty stuff. :-)
[03:27:02] I've seen your ads :-).
[03:27:25] (I'll look into that for other things.)
[03:28:01] hint: "qsub -sync y" is a useful idiom when all you need is the extra oomph
[03:28:53] scfc_de: All your bits seem to be in place
[03:29:33] Ah, so that's a bit like "ssh $SOMEWHERECOZY"?
[03:30:07] scfc_de: Pretty much.
[03:30:43] Also neat: qmake :-)
[03:31:27] Hm. Not installed atm though.
[03:32:31] Well, if I ever were to compile SGE ... :-)
[03:33:04] -login is pretty fast, BTW. The DB conversion that's part of my Makefile went through in seconds, while on Webtools it took several minutes.
[03:33:18] (Even though the instances look the same.)
[03:35:00] Ooops. "make *-n* install" ... Let's time it again.
[03:36:07] (Okay, apparently not super fast when it actually does what it's supposed to do :-).)
[03:36:12] Heh.
[03:36:28] It's a small instance; the intent is that any significant work should be farmed out to the cluster.
[03:36:54] I'll probably add a -dev instance for compiles and stuff to leave -login snappy for interactive use.
[03:39:24] Although, tbh, I don't think "snappy" is a realistic objective with gluster.
[03:39:33] Don't know if it's worth the trouble (and users will be able to differentiate between "interactive compiles" and "interactive other uses").
[03:40:47] (Especially if SGE/OGS can be used easily.)
[03:44:55] scfc_de: Ah, a caveat: any cgi or php you run will use the uid of the file it's in.
[03:45:10] scfc_de: That usually makes things /simpler/, but you need to be aware. :-)
[03:47:43] Coren: Don't you use the standard config? I believe in this all files beneath ~$USER/public_html are either served as the user's account (if various conditions are fulfilled) or not at all.
[03:48:16] Coren: (Though not a problem, of course.)
[03:49:16] scfc_de: In practice that shouldn't be an issue; the $USER owns the tree and so has control over ownership of the files within -- but there are security concerns about trusting an entire tree to a set of credentials rather than the specific object.
[03:49:49] scfc_de: (I.e.: with the best practice of keeping all your tool files owned by the tools, there is no difference)
[03:51:42] Yep, but AFAIR chown is effectively limited to root, so you can't clean up after the fact, but have to plan beforehand.
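The "qsub -sync y" hint above is worth spelling out: the submitting shell blocks until the grid job finishes and inherits its exit status, which is how a CGI (the webserver is a submit host) or a Makefile can push one heavy step onto an exec node. A minimal sketch; the job name, log paths, and script are illustrative:

    #!/bin/sh
    # Run the expensive step on the grid rather than on -login;
    # -sync y waits for completion and propagates the job's exit code.
    qsub -sync y -N db-convert \
         -o "$HOME/logs/db-convert.out" -e "$HOME/logs/db-convert.err" \
         ./convert-db.sh
    echo "grid job finished with status $?"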
[03:52:23] scfc_de: Well, there is always the make-copy-remove-bad-one idiom
[03:53:01] cp broken broken~;mv broken~ broken
[03:53:37] And, no, I believe mod_suexec makes a point of not trusting the entire tree, but looks at each file and if it doesn't fit the criteria, it's not executed as either the owner of the file or the owner of the tree, so it's pretty hard to shoot yourself in the foot.
[03:54:09] The tool tree is sgid to the tool group, so the tool maintainers always retain control.
[03:56:38] As long as it is possible to execute scripts as the tool user, I can cope with the rest :-).
[03:57:46] Ah, but I'm not using mod_suexec. I'm using suphp which has some desirable properties for my setup. (faster, and also better handling of groups)
[03:58:57] Although I admittedly configured it a bit liberally; flexibility always means it's easier to shoot *yourself* in the foot. :-)
[04:00:49] Does it handle CGIs as well? :-)
[04:00:56] (Perl CGIs, that is.)
[04:00:57] scfc_de: Aye.
[04:01:18] Though, tbh, I didn't actually /test/ that. :-)
[04:01:33] Heh.
[04:01:42] It should. If it doesn't, I'll fix it.
[04:02:34] Oh, btw, make sure none of your stuff relies on living on any specific webserver; they may be moved around through webproxy as load demands.
[04:02:54] (all of them will be identical, and see the same filesystem, so that should have no effect on you)
[04:03:46] So, in short: Just like on Toolserver :-).
[04:04:09] Neither River nor Daniel were fools. :-)
[04:04:42] Some googling suggests that your assumption about suphp + Perl is true ...
[04:05:27] There is a plausible difference, though: your scripts will not see the actual IPs of the users. Not sure if that was the case on the TS (I only ran bots there)
[04:06:02] Probably; but I never used that.
[04:07:25] One good thing that's going to be deployed soon: Wikimedia as openid server.
[04:07:40] (soon as in "days" most likely)
[04:10:03] Yep, my estimate at https://bugzilla.wikimedia.org/show_bug.cgi?id=13631#c17 was only off by three months :-).
[11:52:23] !log bots added Hercule and Riley, trusted users
[11:52:26] Logged the message, Master
[11:52:34] Thanks
[11:53:56] Riley: Actually you were added by Vacation9.
[11:54:21] But he forgot to reply on talk page.
[11:54:32] * Riley thought there was a difference between being approved and being a "trusted user"
[11:55:44] No differences, I think. That's just a reason in log and wouldn't affect other things.
[13:39:22] * addshore wonders if he can modify something for the gridz :D
[14:30:32] Coren that system on tool labs you are working is going to be "production" or testing or both
[14:31:01] petan: It's the future production system.
[14:31:02] original proposal of bots project was consisting of two isolated areas, so that people can develop their bots on one and when ready, they update the "production" with latest version
[14:31:24] petan: But it's easily replicated.
[14:31:36] perhaps we could convert the current bots project structure to same as what you are making and use it for testing purposes
[14:31:46] once it's usable of course
[14:32:15] so that it's less pain to move bots from current bots into your new project in future
[14:32:17] petan: That sounds like a plausible scenario; but I'd make a different project rather than repurpose a current one as long as someone uses it.
[14:32:32] addshore: Whatcha need?
[14:32:58] need to make my scripts not use a DB ;p
[14:33:03] petan: That said, there are already a few craz^H^H^H^H courageous volunteers working the new infrastructure pattern. :-)
[14:33:26] link me to proposal
[14:34:18] !tl is http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Design
[14:34:18] Key was added
[14:34:31] !toollabs alias tl
[14:34:31] Created new alias for this key
[14:34:35] This you mean http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Design
[14:34:39] yes
[14:34:40] ... yeah.
[14:34:41] that's what I meant :)
[14:35:16] @labs-project-instances tool-labs
[14:35:16] I don't know this project, sorry, try browsing the list by hand, but I can guarantee there is no such project matching this name unless it has been created less than 34 seconds ago
[14:36:02] * addshore needs to make a script to join all RC irc feeds :/
[14:36:06] @labs-project-instances webtools
[14:36:06] Following instances are in this project: webtools-odie, node, webtools-login, webtools-apache-1, webtools-rr,
[14:36:16] Coren is that tool labs project?
[14:36:20] webtools?
[14:36:27] addshore: I'm going to make a not-the-replica-DB DB anyways; there are clearly use cases where it would be counterproductive to hold stuff on the replication db
[14:36:37] @labs-project-instances tools
[14:36:37] Following instances are in this project: tools-login, tools-puppet-test, tools-webproxy, tools-exec-01, tools-webserver-01, tools-master,
[14:36:45] awesome Coren :)
[14:36:57] Coren that's where the bots are going to be?
[14:37:17] petan: Eventually, yes. Right now, it's barebones functional but not ready for primetime.
[14:37:30] ok can you insert me to that project?
[14:37:37] petan: Sure.
[14:37:43] Petrb
[14:38:45] Gimme a minute and I'm all yours.
[14:56:03] petan: Back.
[14:56:21] petan: What's your labsconsole username?
[14:56:26] Petrb
[14:57:45] Please to read the pseudo-doc at http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help
[14:57:56] ok
[14:58:21] If you want a tool user (you almost certainly do) do tell me. :-)
[14:58:50] ok that's very pseudo
[14:59:13] yes if you think I do then I do :D
[14:59:48] It's not so much "docs" as "raw data I keep around for when I write the doc" :-)
[15:01:18] petan: Tool name?
[15:01:42] I have no tool to run there yet, I just wanted to help out and check out how this new system works
[15:02:13] only tools I have are either not suitable or I don't want to run them on unstable environment...
[15:02:25] Ah, cool.
[15:02:40] I can create an experimental playpen for you nonetheless
[15:03:01] right, what would be cool now would be a description of these boxes which are there
[15:03:16] tools-puppet-test etc
[15:03:39] Ignore -puppet-test. It's an artefact. :-)
[15:04:03] The scheme is simple: all human work is done by/on -login
[15:04:13] -master is the grid scheduler
[15:04:21] -exec-nn are the execution nodes (there is just one)
[15:04:29] mhm I love to have description of box in motd but that is something that wasn't much welcomed in labs
[15:04:54] -webserver-nn are the apaches serving the actual contents
[15:04:56] I did that on beta cluster but it was later overridden on most boxes
[15:05:20] -webproxy is, unsurprisingly enough, the proxy in front of the webservers that does the dispatch.
[15:05:36] tbh, nothing but -login has user serviceable parts inside. :-)
[15:06:07] okay but I suppose there are going to be some sysadmins too in future, or is that system going to be operated only by you until end of times?
[15:06:11] * its times
[15:06:33] if there were some, they would appreciate description of these boxes :P
[15:06:33] petan: There will be, but if you're a sysadmin you're unlikely to rely on motds to tell you what box does what. :-P
[15:06:40] ofc
[15:06:54] but it's useful for users who accidentally ssh to some and have no clue where they are
[15:07:13] petan: And yes, that will all be documented of course. Right now I'm in "do the stuff, collect notes" mode though.
[15:10:42] petan: Your playpen name 'petan' works for you?
[15:11:01] sec I am in office
[15:14:28] * Coren needs to switch desks. BBIAB.
[15:36:51] petan: I be back.
[15:44:25] has any of you had input/output errors when accessing a home directory? o_=
[15:44:30] yes
[15:44:35] Silke_WMDE that's gluster issue
[15:44:42] you should create a bz ticket
[15:44:42] ah
[15:44:48] ok
[15:45:02] thx
[15:45:03] include the name of files and project
[15:45:10] i will
[15:55:09] [bz] (NEW - created by: silke.meyer, priority: Unprioritized - normal) [Bug 45609] Input/Output errors in a /home directory - https://bugzilla.wikimedia.org/show_bug.cgi?id=45609
[15:58:44] * Coren would rip gluster off entirely if it were up to him.
[16:02:06] Coren, Ceph?
[16:03:09] MaxSem: It's an alternative, but I'm not convinced of the case for a DFS in the first place; though I can see some of the arguments in favor.
[16:03:36] project storage is convenient
[16:03:54] also, shared /home is awesome
[16:04:03] It is; I just don't see DFS as the best way to do it for the general case.
[16:04:18] though I've seen a wiki that ran off Gluster... ugh
[16:04:54] I'm particularly unimpressed by seeing glusterd keep half a CPU busy like it typically does.
[16:05:51] You know what my first reflex might have been? Coda.
[16:06:06] Coren, you haven't seen Gluster leaking memory - that was fun
[16:06:12] heh
[16:06:19] 7+ gb of ram eaten etc
[16:06:32] Ah, you're using the Dwarf Fortress definition of "fun". :-)
[16:09:32] Ryan was trying to prove to me that I can't configure MW to run fast, then I asked what does he think `time ls some/gluster/dir` should output?:)
[16:10:02] But, after like 4 days of officially working with a project having to rely on gluster, my opinion is "Hell, give me a network block device and lemme do the NFS dance on my own -- anything is better" :-)
[16:43:36] BBIAB
[16:58:09] Down with Gluster!
[16:58:21] petan: I eat ram ;p
[16:59:15] addshore: is that your favorite snack?
[16:59:20] :o
[16:59:31] Ryan_Lane I updated main page
[16:59:36] ye[
[16:59:38] *yep
[16:59:41] saw that
[16:59:43] ram ;p I tend to eat 8GB at a time ;p
[16:59:58] * Coren doesn't particularly enjoy having a process consume 60% of my CPU time to allow me to read a 64 byte file with just 3 seconds of delay.
[17:00:33] Coren: gluster?
[17:00:56] Ryan_Lane: Yeah. Frankly, at this point CIFS would look like an improvement.
[17:01:41] Coren indeed that's why I put all my bots on local storage no matter how complicated they are :P
[17:02:08] even ls of almost empty folder can take forever on gluster
[17:02:17] with automount :)
[17:02:52] addshore: You'll choke on that. That's why it's normally byte-sized. :-)
[17:03:00] Ryan_Lane how did you import the pages from wikitech? :P
[17:03:05] there is no log at all in RC feed
[17:03:16] importDump
[17:03:17] did you run maintenance/rebuild rc.php
[17:03:25] HAH, love it Coren !
[17:03:29] I did
[17:03:36] mhm ok
[17:03:46] maybe it doesn't write to RC log at all when u use that
[17:03:50] probably not
[17:04:38] because I know some pages were redlinked even when they exist, they became blue once you clicked them
[17:04:41] that was weird
[17:05:10] like most of these links on "production"
[17:05:42] Ryan_Lane: Can you just give me a nice block device? I promise I'll take good care of it. :-) I'll hold it, and hug it and love it forever. And I'll call it George.
[17:06:22] Coren: if you just want a block device that isn't shared between instances, use /mnt
[17:06:37] petan: cache
[17:14:02] Ryan_Lane: Well, I was planning on exporting it through NFS between my instances. Honestly, with the current Gluster performance and reliability, we'll not get anyone to want to move to the Labs.
[17:17:04] Coren: we'll be improving the gluster performance soon
[17:17:08] by changing the network driver
[17:17:23] also, we'll eventually switch away to the netapp
[17:17:39] we've had nfs instances before
[17:17:41] it's not fun
[17:17:49] it leads to wonderful cascading failures
[17:17:57] that are difficult to fix
[17:18:36] ok. need to head to work
[17:18:51] Ryan_Lane: Not sure what you mean by cascading failure in that context, but right now Gluster is definitely a blocker.
[17:18:56] Ryan_Lane: ttyl
[18:12:18] !log deployment-prep deleting -dbdump, migrated udp2log on -bastion
[18:12:20] Logged the message, Master
[18:24:17] [bz] (REOPENED - created by: Antoine "hashar" Musso, priority: High - major) [Bug 41104] glusterfs log files are not rotated - https://bugzilla.wikimedia.org/show_bug.cgi?id=41104
[18:35:36] !log deployment-prep Created deployment-search01 and deployment-searchidx01
[18:35:38] Logged the message, Master
[18:38:27] xyzram: I have created two boxes on labs to host the search system and index :-D
[18:38:44] xyzram: working on the puppet manifest right now ( puppet being the system that we use to configure a server)
[18:38:49] Oh, nice!
[18:38:59] Which ones ?
[18:55:23] Aw come /on/. 8s to serve a static file?!
[18:55:37] On a completely unloaded server.
[19:01:36] * Coren looks at Gluster. Looks at a flamethrower. Looks again at Gluster.
[19:01:46] press ?
[19:02:28] xyzram: I guess I will have to come to you to find out the configuration that is needed
[19:02:57] When I try to create a new instance on labs, I always get the error message "Failed to create instance".
[19:03:03] Is there something I am missing?
[19:03:08] https://twitter.com/hukl/status/307469987826761729
[19:03:21] apmon: might be a quota issue
[19:03:31] or maybe it is broken again :)
[19:04:06] How can I find out if it is a quota issue?
[19:04:25] I'd suggest pulling in notpeter as well
[19:05:04] yeah
[19:05:10] are you there next week ?
[19:06:08] hashar: I'll be online but physically in LA
[19:06:18] at least that is the same timezone :)
[19:06:25] right!
[19:06:51] do you know the history behind disabling wikis using a non-existent host ?
[19:07:08] i.e. search1000x
[19:07:19] nop
[19:07:26] i guess that is a hack
[19:08:12] hashar: Looks like it might have been a quota issue. After deleting an old one, I now managed to create a new one. Thanks
[19:08:57] apmon: Is that an OK solution? I can raise your quota by an instance, otherwise.
[19:10:13] It should be fine for the moment. But I might get back to you later?
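On the import exchange above: MediaWiki's importDump.php does not write recentchanges rows, so imported pages never show in the RC feed until the table is rebuilt, and the red/blue link oddity is the links tables lagging for the same reason. A sketch using the stock maintenance scripts, run from the wiki's MediaWiki root (rebuildrecentchanges.php is presumably the "rebuild rc" script petan means):

    php maintenance/importDump.php pages.xml        # import the XML dump
    php maintenance/rebuildrecentchanges.php        # repopulate recentchanges
    php maintenance/refreshLinks.php                # fix the links tables (stale redlinks)
    php maintenance/initSiteStats.php               # refresh page/edit counters after a big import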
[19:11:01] On the other hand, I suspect not all of the instances in the maps project are needed anymore, so potentially some of them can be cleaned up to make space
[19:11:44] * hashar digs in lucene doc at http://www.mediawiki.org/wiki/Extension:Lucene-search/2.0_docs#Global_configuration
[19:11:46] grmblblb
[19:12:52] at least we have doc
[19:16:13] xyzram: https://gerrit.wikimedia.org/r/#/c/51677/
[19:16:21] will apply that on the instance and find out what happens
[19:17:38] !log deployment-prep Applying puppetmaster::self to both search boxes
[19:17:40] Logged the message, Master
[19:22:16] Ryan_Lane, do you have enough spare cycles to fix this split-brain problem? https://bugzilla.wikimedia.org/show_bug.cgi?id=45609
[19:22:44] yep
[19:22:48] Although, huh, if it's in a git repo they should maybe just erase and start over
[19:22:54] Barring local patches
[19:23:10] depending on the number of split brained files it may be easy
[19:23:17] 'k thx
[19:25:27] * Damianz watches nfs jump up and slap Coren
[19:25:40] andrewbogott: Haaaai, how you finding salting?
[19:25:49] Also someone tell me to fix nagios snmp traps in about an hour
[19:25:58] Ryan_Lane: Stop trying to make my point. :-)
[19:26:30] Damianz: Um… it keeps getting pushed down the line :(
[19:26:35] :(
[19:26:58] But I've read some of the code, it looks reasonably easy to follow
[19:27:07] gluster needs to go
[19:27:27] If bastion is ever stable again I might just bash out some osm stuff and get something down for stuff... needs yummy python work first though cause php suuucks
[19:27:32] Ryan_Lane: We know this? :D
[19:28:05] Damianz: what's unstable with bastion?
[19:28:16] [14:01:36] Coren looks at Gluster. Looks at a flamethrower. Looks again at Gluster.
[19:28:20] You kept restarting gluster, so it killed my session
[19:28:29] Like the not last but time before the time before the time before gluster broke
[19:28:32] I only restarted it once, yesterday
[19:28:34] ah
[19:29:35] * Damianz mutters about rebooting not being a valid fix
[19:29:47] rebooting? I haven't done that in weeks
[19:29:59] you mean for gluster?
[19:30:04] Yeah :P
[19:30:22] I've restarted the glusterd daemon a couple times in the last two weeks, but I haven't rebooted in quite a while
[19:30:43] Also I should come troll in here more often, got a crapton of stuff to do but never seem to have time these days... lets fix that =D Cause we know you really love me, really :P
[19:30:50] :D
[19:30:52] well, you do help :)
[19:31:08] andrewbogott: morning :-] I noticed the gluster logs are not properly rotated on the clients. Had to reopen https://bugzilla.wikimedia.org/show_bug.cgi?id=41104 :-D
[19:31:23] oooh hashar!
[19:31:25] andrewbogott: seems the package glusterfs-common provides a log rotate file which does not include SIGHUP
[19:32:05] hashar: Zuul is awesomeness, why did we not have this before? :D
[19:32:39] hashar: I'm not sure the sighup is necessary
[19:32:54] A general question regarding gluster: Is it feasible to use gluster for replication across a couple of servers and then access the bricks directly for read only to avoid any performance issues?
[19:33:16] apmon: not really
[19:33:19] Ryan_Lane: I am not sure which one of the scripts does the rotation, but it surely does not rotate properly. The gluster client ends up still writing to the same inode :/
[19:33:25] Damianz: I have no idea :-]
[19:33:30] hashar: I guess it's necessary, then D
[19:33:32] err
[19:33:32] :D
[19:33:58] apmon: I tried that once, it ended badly
[19:34:18] :-( In what way?
[19:34:21] Since a stat kicks off replication or sends you to another node you can get weird content back
[19:35:01] It doesn't replicate on writes, but reads?
[19:35:34] notice: /Stage[main]/Accounts::Dab/Group[dab]/ensure: created
[19:35:37] pffff
[19:36:24] Well it replicates on writes ideally, but if you have a node split the client in theory can get redirected to a node with the 'newest' content and it will be replicated out or something (this was a while back). So effectively if you stat a file and another node has an up-to-date copy it will replicate to your out of date node, bypassing the client skips that so you can get split brain content.
[19:36:35] Though Ryan is really the expert on gluster split brains :D
[19:36:43] -_-
[19:36:52] they suck
[19:36:56] and are a pain in the ass to fix
[19:37:37] hashar: Yeah, I saw your bug, will investigate.
[19:37:49] andrewbogott: probably not that urgent :-]
[19:38:05] It definitely did not work w/out the sighup, first time I tested.
[19:38:20] OK. That would potentially still be ok for the application I had in mind.
[19:38:24] But gluster also provides a canned rotation tool which maybe we can switch to.
[19:38:34] :(
[19:38:44] Ryan_Lane: The frontpage doesn't have enough unicorns
[19:38:51] Damianz: heh
[19:39:11] weird having sal (prod) under a bit labs logo rofl
[19:39:26] heh
[19:39:36] well, I'd like to move to the strapping skin at some point
[19:39:41] which makes it look less awkward
[19:42:21] strapping is the theme used on ovirt, right?
[19:42:53] yeah. and on wiki.openstack.org
[19:43:30] I can never find anything on the openstack wiki heh
[19:43:41] yeah, it's kind of a pain
[19:44:55] it's slightly better with the switch to mediawiki
[19:52:43] bah trying to apply the lucene role manifests on an instance, puppet ends up trying to add all the production users :(
[19:52:50] stuff like:
[19:52:51] err: /Stage[main]/Accounts::Erik/Unixaccount[Erik Moeller]/User[erik]/comment: change from Eloquence to Erik Moeller failed: Could not set comment on user[erik]: Execution of '/usr/sbin/usermod -c Erik Moeller erik' returned 6: usermod: user 'erik' does not exist in /etc/passwd
[19:52:56] any idea?
[19:53:23] ahh
[19:53:26] that is added in the role
[19:53:27] damn
[19:54:47] Ryan_Lane: How disappointed would you be if I ripped out Gluster from tools and made a temporary NFS server to serve until it is suitable for primetime (or a good replacement is found)?
[19:55:32] found it
[19:55:58] I mean, mount points are nice that way. The users don't care what you mount there. :-)
[19:57:33] Once we get a good labs-wide system in place, I promise to migrate back to it. Deal? :-)
[20:00:03] Coren: I would be more than happy to beta test it :-]
[20:00:51] * MaxSem has deleted the instance he was logged into. quite entertaining;)
[20:07:12] I'm more like, fuck gluster, lets make everything modules then we can be sexy
[20:08:28] Oh, lame. I can't make an instance with lots of storage and light CPU and memory.
[20:14:27] Coren: Sure you can, annoy Ryan to make a custom template
[20:14:30] or like andrewbogott :D
[20:16:28] I know he's hoping for performance improvements with the new network drivers, but if the network stack can give the necessary 7 orders of magnitude of improvement, I want one for me too!
[20:17:08] Coren: are you in SF this week ?
[20:17:15] hashar: Nope.
[20:17:35] would you mind taking a flight there next week? I am sick of speaking english :-]
[20:18:01] Heh. That'd be hard to justify. :-)
[20:18:09] "Employee morale"?
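On bug 41104 above: the symptom hashar describes (the gluster client keeps writing to the rotated inode) is the classic missing-postrotate problem, and andrewbogott found the sighup necessary in testing. A sketch of the kind of drop-in that would fix it; the glob, schedule, and the daemons' HUP handling are assumptions to verify against the glusterfs-common package, not the shipped file:

    # /etc/logrotate.d/glusterfs-common (sketch)
    /var/log/glusterfs/*.log {
        weekly
        rotate 4
        compress
        missingok
        postrotate
            # Make running gluster processes reopen their logs; without
            # this they keep appending to the old, renamed inode.
            killall -HUP glusterfs glusterfsd 2>/dev/null || true
        endscript
    }

Where a daemon cannot safely be signalled, logrotate's copytruncate option is the usual fallback.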
[20:18:13] ;]]]]]]
[20:19:53] hashar: English > French though!
[20:20:51] Mécréant!
[20:21:48] Damianz: That's wrong, even lexicographically. "English" < "French" :-)
[20:22:44] Le français des oignons, nous prenons le thé. Le thé est bien meilleure que celle des oignons. impressionnant stéréotypes.
[20:23:53] Damianz: ... did you just try to translate "English as she is spoke"? :-)
[20:24:45] Broken english doesn't translate well into sentences really
[20:24:55] Pretty sure that says the french are onions rather than the french have onions
[20:24:59] * Damianz shrugs
[20:25:09] Can't everyone just speak english and use utc for simples
[20:27:01] Damianz: Well, if I try to translate what you wrote: "The french of onions, we take the tea. The tea [she] is much better that the onions'. Impressive stereotypes."
[20:27:40] Google translate sucks... impressive was awesome, there was no the and the of was have
[20:28:59] Google translate does a pretty good job of single word translations and simple sentences /to/ English. For the rest? Not so much. :-)
[20:29:50] At least I can sort of understand german, I totally suck at french
[20:32:57] we can fix it
[20:33:04] dès maintenant!
[20:36:36] hashar: Now you're just being mean. You *know* that 'muricans have a language disability. :-)
[20:37:42] Coren: I guess most of them don't even know there are other languages
[20:38:32] err: /Stage[main]/Mediawiki_new::Users::L10nupdate/File[/home/l10nupdate/.ssh/authorized_keys]: Could not evaluate: Could not read file /home/l10nupdate/.ssh/authorized_keys: Input/output error - /home/l10nupdate/.ssh/authorized_keys
[20:38:33] :(
[20:38:37] stupid l10nupdate
[20:39:30] !log deployment-prep Search boxes are now having {{gerrit|51677}} patchset 5 applied. Still have to figure out how Lucene works though
[20:39:32] Logged the message, Master
[20:39:32] lol
[20:40:01] i have no idea how lucene works
[20:40:07] nor about our infrastructure hehe
[20:40:51] Why lucene over like elasticsearch?
[20:41:36] Damianz: cause that is what we have in production
[20:41:43] and beta simply replicates it :-]
[20:41:58] * kteatime has a question concerning the English language...
[20:42:35] * kteatime is Silke_WMDE
[20:43:10] erm... off-topic... Is "awesome" a positive or a negative word? Or does it depend on the context?
[20:43:20] Depends if being sarcastic
[20:43:47] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 45617] update Precise image to get the latest patches - https://bugzilla.wikimedia.org/show_bug.cgi?id=45617
[20:44:28] Damianz: So "normally" it's positive, but you can put a different meaning into it
[20:44:37] kteatime: Right.
[20:44:40] Yeah
[20:44:58] And is it common to use it in a sarcastic way?
[20:45:01] or rarely?
[20:45:01] for god sake
[20:45:12] /a/ is not being ensured as a mount grmblblbl
[20:45:29] Dunno, I'm constantly sarcastic so context falls into a blur
[20:45:38] :)
[20:45:42] hashar: I thought that was local now?
[20:46:07] kteatime: Not that common, I'd say. If all by itself, more likely than when used in a sentence.
[20:46:40] kteatime: All by itself followed by a full stop is the most likely to be ironic, I think.
[20:46:41] Damianz: yeah that is local in production
[20:47:17] Coren: ok thanks
[20:49:22] end of English lesson :) CU you soon!
[21:07:22] MaxSem: heh. it usually sticks around for a little bit if you are logged in :)
[21:07:38] Coren: please don't use an nfs instance
[21:07:42] Ryan_Lane, yup. and then it simply hung:P
[21:07:53] :D
[21:07:53] Ryan_Lane: I won't, but you owe me. :-)
[21:08:06] Coren: well, I just discussed this in a meeting
[21:08:38] Ryan_Lane: And you came up with a Plan(tm)?
[21:08:48] not necessarily
[21:09:08] my idea is to switch to the netapp
[21:09:15] but we either need to move things off, or we need virtual filer
[21:09:57] mmmkay?
[21:11:43] What you are saying is clearly correct. I expect there is a follow up to it, though? :-)
[21:12:03] we'll be having followup
[21:12:50] Okay, so that fixes things at some point in the future. Do you have an ETA on that?
[21:13:30] I'd like it to happen soonish
[21:14:52] Because right now, with gluster, the whole thing is a non-starter.
[21:18:00] I'm not in disagreement
[21:18:20] switching out the network drivers will actually help a lot either way
[21:18:32] it makes gluster dramatically faster
[21:18:46] also, we should lengthen the mount time for autofs
[21:19:06] the issue you saw earlier was likely because the gluster mount wasn't there
[21:19:12] gluster takes ages to mount
[21:19:24] ... that'd make sense given the low traffic.
[21:19:48] But anytime you do something significant on the fs, the glusterd sucks up all the CPU too.
[21:19:56] it does, yes
[21:20:38] So, I'm taking that 'soonish' means 'a couple weeks' and not 'within a few months'? :-)
[21:21:15] I hope, but don't hold your breath
[21:21:30] want to push in a change to extend the mount time?
[21:21:37] I plan on changing the network drivers next week
[21:21:50] and we're going to bond some ports on the network node as well
[21:22:13] Okay, all of those things have the potential to help.
[21:22:21] that alone will speed up gluster by 3x or so, based on some short testing
[21:22:24] And yes, extending the mount time is likely to be beneficial.
[21:22:35] 3x? Just over the network drivers?
[21:22:47] What in blazes does that driver /do/?
[21:23:06] 1. the current driver is 100mb
[21:23:13] 2. the current driver sucks ;)
[21:23:26] virtio is much nicer
[21:24:52] all of these changes will help if we're on the netapp too, so it's not wasted effort
[21:24:57] Ah well. As long as I don't have more than 2-3 users and that their tools are CPU bound... :-)
[21:25:04] heh
[21:25:14] make instances with multiple cpus
[21:25:27] Ohcrap. I already have four users and they do I/O bound stuff. Fail. :-)
[21:25:46] :D
[21:25:47] Do we have a preferred mysql deployment pattern, BTW?
[21:26:00] not on virtualization ideally ;)
[21:26:11] we're supposed to provide a user database service this quarter
[21:26:48] You mean in addition to the replica or alongside it?
[21:26:53] alongside it
[21:27:30] hm, though I guess we're letting people write to the replicas....
[21:27:36] Not all use cases are amenable to this; some tools do serious crunching of data with no need to join against the replica and would be best kept on their own.
[21:29:35] Some tools just need a DB for light work that's independent as well (like UTRS and their ticket system)
[21:30:02] yeah
[21:30:08] we're discussing this right now
[21:30:14] I would have tended to keep those to a different DB than the principal cluster if at all possible.
[21:31:20] tbh, with the setup I now have, if I had a smallish mysql available some significant fraction of tools could already make a tentative move / adaptation trial run.
[21:32:42] I'm doing local users and groups with now with hacking scripts rather than LDAP (since I can't touch LDAP), but that's easily adaptable and not visible to the maintainers.
[21:33:05] s/with now with hacking/now with hacky/
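On the autofs change discussed above: lengthening the timeout keeps idle gluster volumes mounted instead of expiring them and paying the slow re-mount on the next access. A sketch of what the change amounts to (mount points and map names are illustrative, and on Labs it would be made through puppet rather than by hand):

    # /etc/auto.master -- raise the idle timeout from the usual 300s
    # default so automounted gluster shares stay mounted:
    /home  /etc/auto.home  --timeout=3600
    /data  /etc/auto.data  --timeout=3600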