[00:01:11] Unable to create and initialize directory '/home/ori'.
[00:02:42] Hm, I wonder if that means that homedirs don't work at all w/out sharing them
[00:02:50] Ryan_Lane, does that seem right?
[00:03:04] you'll learn to share or you'll get nothing!
[00:04:10] Anyway… I turned on shared homedirs… give it a few minutes and try again
[00:20:31] andrewbogott: wait, what project is this?
[00:20:38] oh
[00:20:51] yeah, unless you have shared homedirs things don't work
[00:49:41] andrewbogott: it works now; thanks
[01:25:04] !log account-creation-assistance SSH on accounts-database is acting up again, will reboot and investigate
[01:25:05] Logged the message, Master
[01:34:46] !log account-creation-assistance Looks like the problem was caused by puppet not having been run in days, so the ldap config was out of date (along with the puppet config)
[01:34:49] Logged the message, Master
[06:04:23] Ryan_Lane: forgot about me for shell? :)
[06:04:37] you can already do so
[06:05:42] hm
[06:05:44] I think
[06:05:45] one sec
[06:06:00] well where's the queue?
[06:06:08] i.e. how do i find people that are waiting?
[06:06:20] I take it back
[06:06:24] you couldn't ;)
[06:06:34] it'll be in your sidebar now
[06:06:50] aha
[06:06:57] <3 DynamicSidebar
[06:07:29] Ryan_Lane: so what's the policy like? just try to detect if they're human?
[06:07:44] give shell unless you have a strong reason not to
[06:07:51] i.e. these are automated reqs
[06:08:03] yeah
[06:11:06] Ryan_Lane: hi, got a moment to help me with the monthly stats for the monthly report?
[06:11:08] Wikimedia Labs now hosts missing projects and missing users; to date missing instances have been created.
[06:11:30] crap. I forgot to update that
[06:11:32] link?
[06:11:40] er, sorry, you know what I mean - the "missing" #. Guillaume's checklist says he just asks you, and I presume if there were some easier way he would say so
[06:11:43] https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/March
[06:11:48] yeah
[06:12:14] how's he doing?
[06:12:32] Last I heard, about the same :/
[06:13:28] ugh
[06:16:04] Ryan_Lane: so i'm a SMW noob...
[06:16:11] heh
[06:16:17] what's the diff between [[Shell Request User Name::{{{User Name|}}}]] and just {{{User Name|}}}
[06:16:21] ?
[06:16:31] in the shell request page?
[06:16:36] just click "modify rights"
[06:16:43] no
[06:16:49] i was modifying the template
[06:16:49] example?
[06:16:53] why?
[06:16:57] https://wikitech.wikimedia.org/w/index.php?title=Template:Shell_Access_Request&action=edit
[06:17:15] so it links to the user's user page and talk page and contribs. (or some subset of those)
[06:17:28] but first i wanted to even figure out how it works as is
[06:17:29] they likely won't have any
[06:17:36] right, i know :)
[06:17:40] [[Shell Request User Name::{{{User Name|}}}]] <— that's a semantic annotation
[06:17:50] Shell Request User Name is a property
[06:17:56] {{User Name}} is its value
[06:18:20] so, rendering is the same, just different so that the page's property's are right?
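As explained above, `[[Property::value]]` is a Semantic MediaWiki annotation: the page displays only the value, while the property/value pair is recorded for the page. The Python sketch below mimics that split for the simple `[[Property::value]]` form only. It is an illustration, not SMW's parser: it ignores other annotation variants and template expansion, and `split_annotations` and the value `Example` are hypothetical.

```python
import re

# Matches a simple SMW-style annotation: [[Property name::value]]
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]*)\]\]")


def split_annotations(wikitext: str) -> tuple[str, dict[str, list[str]]]:
    """Return (rendered_text, properties) for simple [[Prop::value]] annotations.

    The rendered text keeps only the value, which is why the annotated and the
    plain form display the same; the properties dict collects the pairs.
    """
    properties: dict[str, list[str]] = {}

    def record(match: re.Match) -> str:
        name, value = match.group(1).strip(), match.group(2).strip()
        properties.setdefault(name, []).append(value)
        return value  # only the value is shown on the page

    rendered = ANNOTATION.sub(record, wikitext)
    return rendered, properties


rendered, props = split_annotations("Requested by [[Shell Request User Name::Example]].")
print(rendered)  # Requested by Example.
print(props)     # {'Shell Request User Name': ['Example']}
```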
[06:18:21] properties*
[06:18:54] yep
[06:19:08] ok
[06:21:10] Thanks Ryan
[06:21:14] yw
[06:21:19] I'm updating the labs info now as well
[06:27:23] * jeremyb_ sleeps
[08:03:15] @labs-resolve tools
[08:03:15] I don't know this instance - aren't you are looking for: I-0000048c (glam-gwtoolset-puppet), I-00000515 (webtools-odie), I-000005c9 (webtools-login), I-000005ca (webtools-apache-1), I-000005cb (webtools-rr), I-000005f9 (tools-login), I-000005fb (tools-puppet-test), I-00000600 (tools-webproxy),
[08:03:35] @labs-resolve mysql
[08:03:35] I don't know this instance - aren't you are looking for: I-0000028b (tutorial-mysql), I-00000341 (wikibits-mysql), I-0000040c (wlm-mysql-master), I-00000497 (centralauth-mysql), I-000005c8 (opensim-mysql1),
[08:03:41] meh
[14:13:21] petan: Still around?
[14:13:27] yes
[14:13:31] hi
[14:13:33] Heya
[14:13:53] So, a quick lesson on my transition tools now that we have working service groups? :-)
[14:15:15] First, have you noticed the new section in "Manage Projects"?
[14:15:24] (Go look at tools)
[14:17:08] ok
[14:17:21] User can now create their own tools, and manage membership.
[14:17:27] where do I find it
[14:17:38] https://wikitech.wikimedia.org/wiki/Special:NovaProject
[14:17:38] oh I see
[14:17:40] NovaProject
[14:18:18] ok how does it work?
[14:18:19] Now that doesn't do _everything_ yet, I'll add a daemon that watches ldap to do the rest soon, but there is one extra step for us when someone creates a tool:
[14:18:38] I suppose that all members of that group can sudo to extra account of name of the services?
[14:18:40] * service
[14:18:44] petan: Right.
[14:18:54] What the OpenStack part does is:
[14:18:57] ok, so creation of service group creates a new account?
[14:19:01] * Coren nods.
[14:19:36] is it possible to change the parameters of that user, like $HOME or real name etc
[14:19:55] or, where is default $HOME /home?
[14:20:05] so same volume as user homes
[14:20:17] No real name; since it's a service user. The $home is made from a pattern that is per-project; check https://wikitech.wikimedia.org/w/index.php?title=Special:NovaProject&action=configureproject&projectname=tools
[14:20:28] ok
[14:20:28] In our case, the patter is /data/project/$tool
[14:20:50] that's a new thing, btw it should be Configure not configure :)
[14:20:59] because all other options are capital first char
[14:21:17] cool
[14:21:44] So, what openstack does is create the local-$tool user and groups, manages the sudoers, and the group's membership.
[14:22:16] Right now, we have one more step to do in the project to create the rest (that will be automated this week)
[14:22:29] So we /usr/local/sbin/addtool $tool
[14:23:21] aha, so creation of new tool consist of 1. creation of service group 2. execution of that script with parameter same as service group (with local- or without?)
[14:23:30] This (a) creates the tool home (b) adds the tool to the webserver and (c) creates a database
[14:23:46] btw every service group need to be prefixed with local- or that is done automatically
[14:23:53] Without local-; local- is added by openstack itself
[14:24:17] So when you create you give 'foo' as tool name, openstack creates local-foo
[14:24:22] ok, so without local in wikitech and without local on addtool as well?
[14:24:27] Right.
[14:25:07] ok, does it also put the service account to some group on GE / change some permissions etc
[14:25:09] addtool is going to go away, though, I'm writing a daemon that will watch for additions in LDAP and do it automatically once I'm done with NFS
[14:25:18] ok
[14:25:27] petan: Yes, it sets up the permissions as well.
[14:25:32] (addtool)
[14:25:42] well, there could be a "post-add" script on console
[14:25:44] wikitech
[14:25:51] like in Configuration
[14:26:03] which gets executed after creation of service group
[14:26:14] that would be more effective that daemon which is watching LDAP for changes
[14:26:17] * than
[14:26:28] as well as post-remove script
[14:27:13] That's actually part of the to-do list, but won't be here until salt is finished deploying.
[14:27:26] k
[14:27:39] But yeah, that's better than watching LDAP. :-)
[14:28:15] At least, now, tool maintainers can manage the membership of the tool's group.
[14:28:23] ok
[14:29:22] Every member of the group gets a sudo to the tool account, like I was doing locally.
[14:29:41] btw I don't like the way how it's implemented to wikitech, I think this should be on separate page just as domains, sudoers etc, this way it will mess up the table of each project that doesn't use this feature and I guess thee are going to be many of them :/
[14:30:15] petan: Andrew preferred it this way; he says that the concept of service accounts is going to be used often in other contexts.
[14:30:36] petan: They're doing local groups like we were instead atm.
[14:30:36] yes no doubt it is going to be used on /some/ projects
[14:31:05] but it adds a lot of white space to other :P
[14:31:14] well, I don't mind
[14:31:30] Discussion of the Wikitech UI is Andrew's domain; I agree that a good UI expert should be involved eventually. :-)
[14:31:48] BTW
[14:31:57] there is no service group for my bot
[14:32:07] maybe I could just create it to test it
[14:32:10] ... which brings me to the second point.
[14:32:21] Moving the existing accounts to the new scheme. :-)
[14:32:36] That's VERY hackish, and a bit delicate. What you do is:
[14:32:50] on tools-login: xfer $tool
[14:32:57] It'll eventually say "create the group"
[14:33:03] You create the group on wikitech
[14:33:15] Then hit enter on tools-login.
[14:33:23] It's ugly, and fragile.
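Putting together what Coren describes above (a tool named `foo` on wikitech becomes the `local-foo` user and group, the tools project's home pattern is `/data/project/$tool`, and a daemon is planned to watch LDAP for new service groups instead of running `addtool` by hand), the bookkeeping such a watcher needs can be sketched in Python. This is a hypothetical illustration under those assumptions: it works on group names already fetched from LDAP, talks to neither LDAP nor OpenStack, and performs none of `addtool`'s actual steps (home, webserver, database). `ServiceGroup`, `service_group_for` and `new_and_removed` are made-up names. With petan's post-add/post-remove hook instead, `service_group_for` would run once per event and no snapshot diff would be needed.

```python
from dataclasses import dataclass
from string import Template

PREFIX = "local-"  # added by OpenStack itself, per the discussion above


@dataclass(frozen=True)
class ServiceGroup:
    tool: str     # name given on wikitech and to addtool, without the prefix
    account: str  # user/group name created by OpenStack
    home: str     # derived from the project's home-directory pattern


def service_group_for(tool: str, home_pattern: str = "/data/project/$tool") -> ServiceGroup:
    if tool.startswith(PREFIX):
        # The tool name is given without the prefix; OpenStack adds it.
        raise ValueError(f"give the tool name without {PREFIX!r}: {tool!r}")
    home = Template(home_pattern).substitute(tool=tool)
    return ServiceGroup(tool=tool, account=PREFIX + tool, home=home)


def new_and_removed(before: set[str], after: set[str]) -> tuple[set[str], set[str]]:
    """Compare two snapshots of group names; return (added, removed) tool names."""
    def tools(groups: set[str]) -> set[str]:
        return {g[len(PREFIX):] for g in groups if g.startswith(PREFIX)}

    old, new = tools(before), tools(after)
    return new - old, old - new


# Hypothetical snapshots: one new service group appeared between two polls.
added, removed = new_and_removed({"local-afcbot"}, {"local-afcbot", "local-foo"})
for tool in sorted(added):
    print(service_group_for(tool))
# ServiceGroup(tool='foo', account='local-foo', home='/data/project/foo')
```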
[14:33:32] I will check that script :)
[14:33:33] But it only needs to be done once, and only for the old accounts.
[14:33:54] But it will break any job currently running from that account, so you need to talk to the maintainer to do it.
[14:34:03] I.e.: discuss it with yourself and try. :-)
[14:34:48] btw why /bin/bash ? :P I know it doesn't matter... but it's an old habit of me to make all bourne compatible scripts have /bin/sh so they run everywhere no matter if bash is there or not
[14:35:30] petan: Because I ofen use bashisms, and I want to be told if the script would break. :-)
[14:35:36] $[]
[14:35:42] $()
[14:36:08] also it's a good idea to check if cd /data/project so that in case gluster broke it doesn't execute the commands in WD, but that's a minor problem and given that it's not going to be executed by regular users it shouldn't matter
[14:36:18] * if cd /data/project was successful
[14:36:35] petan: True; but that script will go away as soon as the last tools are moved to the new schemes anyways.
[14:37:25] I'm working on the gluster-replacement this week. :-)
[14:39:04] last rant: you are loading tool="$1" which allows you to load something what is in $IFS (like tab or space) but later you are not putting quotes over $tool which could break a lot of stuff, but again, it's not going to be run by users, so who cares... now I go test it :D
[14:40:26] !log tools petrb: trying to convert afcbot to new service group local-afcbot
[14:40:29] Logged the message, Master
[14:40:46] so I am going to run sudo xfer afcbot?
[14:41:41] or xfer local-afcbot :)
[14:41:54] let me figure out from source
[14:41:56] No need to sudo, it alteady sudo at the apropriate points.
[14:42:04] without -tool
[14:42:06] aha
[14:42:07] ok
[14:42:15] xfer afcbot
[14:42:27] line 7: ongrid command not found
[14:42:33] o_O
[14:42:38] I think there is somthing in yout PATH that isn't in others
[14:42:56] Oh, fool. I left ongrid in my ~/bin
[14:42:57] :-)
[14:43:02] heh :)
[14:43:05] Moved.
[14:43:27] ok trying again
[14:43:35] chmod a+x too
[14:43:36] pls
[14:43:39] :P
[14:43:46] permission denied now
[14:43:49] 'group does not exist/user does not exist' error messages are okay.
[14:44:02] Did you forward your key?
[14:44:03] no it doesn't allow me to execute ongrid
[14:44:12] because I don't have +x on that
[14:44:31] ... because it was in my bin. :-) Fixed.
[14:44:43] are you sure? :D
[14:44:47] -rwxr-xr-x 1 marc wikidev 115 Mar 26 15:51 /usr/local/sbin/ongrid
[14:44:50] yay
[14:44:52] now it works
[14:45:24] aha so it requires key forwarding as well?
[14:45:27] mhm
[14:45:30] let me fix it
[14:45:31] BTW, 'ongrid' just sudo the command on every "run environment" host.
[14:45:48] that probably did some inconsistencies then
[14:45:59] because I was successful only on some hosts
[14:46:04] given that I didn't forward my ket
[14:46:05] key
[14:46:14] Ah.
[14:46:36] Hm. The only thing that will have broken is master and shadow. I'll remove your group manually there.
[14:47:00] ok so is it safe to start it again?
[14:47:07] I should have start it with -x
[14:47:14] to see what it has done so far
[14:47:39] No, it's kinda worthless to run it now. Just create your service account and chown your tool's home manually.
[14:47:58] it should be working, it's just userdel
[14:48:05] running userdel multiple times won't hurt
[14:48:26] petan: Yes, but you don't have a local user anymore so the $(id -u) won't be able to figure out the previous uid for the finds.
[14:48:51] are you sure? I killed the script after first host it couldn't connect to
[14:49:00] Ah! If you killed it, then it'll work.
[14:49:05] yes it works
[14:49:08] (Local is always done last)
[14:49:14] I don't usually continue on errors
[14:49:30] petan, let me see if I can exclude that column from projects w/out service groups. I agree it's a waste of space...
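petan's earlier review of the `xfer` script (14:36 and 14:39) raises two general points: check that `cd /data/project` actually succeeded before running further commands, and quote `$tool`, since an unquoted value containing a space or tab is split into several words. The Python sketch below illustrates both points. It is not the `xfer` script, whose source is not shown here; the `tool_home` helper and its naming rule are hypothetical, and `shlex.split` is used only to approximate how a POSIX shell would split the command line.

```python
import os
import re
import shlex

# Hypothetical naming rule for this sketch: letters, digits, '-' and '_' only.
TOOL_NAME = re.compile(r"[A-Za-z0-9_-]+")


def tool_home(base: str, tool: str) -> str:
    """Return the tool's directory under base, refusing unsafe input."""
    if not TOOL_NAME.fullmatch(tool):
        # Rejects names with spaces, tabs or slashes instead of splitting them.
        raise ValueError(f"invalid tool name: {tool!r}")
    if not os.path.isdir(base):
        # Equivalent of a failed `cd /data/project`: stop rather than carry on
        # in whatever the current directory happens to be.
        raise FileNotFoundError(f"base directory missing: {base}")
    return os.path.join(base, tool)


tool = "my tool"  # a value containing a space
unquoted = shlex.split("chown -R local-" + tool)
quoted = shlex.split("chown -R " + shlex.quote("local-" + tool))
print(unquoted)  # ['chown', '-R', 'local-my', 'tool']
print(quoted)    # ['chown', '-R', 'local-my tool']
```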
[14:49:38] ok
[14:50:39] Coren: finished and group created
[14:50:42] let's try if it works
[14:50:44] petan: Since the gid changed, you'll have to log off and back on again.
[14:51:09] ok
[14:51:17] petan: BTW, you know that tools-login.wmflabs.org works, right? :-)
[14:51:23] tools-login is a bastion.
[14:51:26] yup
[14:51:28] it works
[14:51:44] Coren yes I know but I would need to update my aliases... :P
[14:51:51] :-P
[14:51:53] I just type labs to terminal to ssh there
[14:53:07] yes it works and I can submit jobs :P
[14:53:24] whether they really get executed is another question
[14:54:30] Did you just try to qsub _from a job_?
[14:54:44] yes
[14:54:50] it seems to work at some point
[14:55:01] The exec hosts are not submit hosts.
[14:55:29] tools-login is submit host?
[14:55:30] oh that
[14:55:41] yes I did qsub on a script which I should have start local
[14:55:41] But on tools, what you're trying to do is easier done with
[14:55:43] localy
[14:55:49] that was a mistake
[14:55:50] jstart /data/project/afcbot/start_bot2.sh
[14:56:31] can I use relative path?
[14:56:48] Yes, but remember that the cwd is always your home.
[14:56:58] (Unless you override it with -cwd)
[15:00:32] Krenair, are you using nova-precise w to develop still?
[15:00:43] not at the moment
[15:01:26] But I do need at least one working wikitech-like wiki if I want to make OSM commits
[15:02:45] are you aka 'alex_monk'? w3 seems in use by him/her.
[15:05:25] Anyway, I will test using w3 for now
[15:15:20] andrewbogott, yes.
[15:15:40] Krenair: OK. I switched branches in w3 but left that branch intact.
[15:16:16] Was it that review/ branch? That's already in gerrit
[15:16:22] https://gerrit.wikimedia.org/r/#/c/53464/
[15:17:06] petan, https://gerrit.wikimedia.org/r/#/c/57513/
[15:17:37] …and you can see the behavior in action here: https://wikitech-test.wmflabs.org/wiki3/Special:NovaProject
[15:17:48] Although there's some bad data there
[15:20:16] k
[15:30:09] Coren is there any plan to convert existing local tools to shared service groups?
[15:30:17] or only at user request?
[15:30:51] I'm doing at user request for now, but I will convert all the leftover during the filesystem switch outage.
[15:31:02] ok
[15:54:27] @notify Ryan_Lane
[15:54:27] I will notify you, when I see Ryan_Lane around here
[15:58:25] andrewbogott: You have a minute?
[16:04:55] Coren -- sure. what's up?
[18:08:24] jesus. deployment-prepbackup-project has over 1.6m files?
[18:08:43] its shrink has been going for ages
[18:08:54] wtf is being put there?
[18:44:48] Ryan_Lane wut?
[18:44:52] let me check
[18:45:08] gluster is shrinkin git
[18:45:11] *shrinking it
[18:45:16] it scans and move files
[18:45:18] @labs-resolve backup
[18:45:18] I don't know this instance - aren't you are looking for: I-000000f8 (deployment-backup),
[18:45:25] it's at 1.7M scanned now
[18:45:35] no route to host
[18:45:45] where are these files?
[18:45:50] I am using only local storage there
[18:45:52] no gluster at all
[18:45:52] on project data
[18:45:59] that was never used there
[18:46:05] bah. so we can delete all of that?
[18:46:06] I doubt it was ever mounted
[18:46:15] idk maybe hashar used it?
[18:46:19] I was only using /mnt
[18:46:22] to backup db
[18:46:30] I don't think /data/project existed back then
[18:46:34] looks like its backing up mediawiki and related things
[18:46:42] to /data/project?
[18:46:48] wtf
[18:47:25] if I could see it...
[18:51:35] I can see it
[18:51:37] it's that
[18:51:52] ?
[18:51:57] that is what :D
[18:52:01] the mediawiki condif
[18:52:04] err
[18:52:08] mediawiki and config and such
[18:52:16] why there is so much of that
[18:52:17] with a ton of mediawiki releases
[18:52:31] 5 versions of mediawiki and all the git info
[18:52:37] hmm
[18:52:44] yes I think you can delete it
[18:53:30] deleting
[19:12:28] Coren: you tested that ssh change, right?
[19:12:58] Yeah, on tools-puppet-test. It behaves as expected if you don't have the variable set at all (i.e.: just adds the comments)
[19:13:06] ok
[19:23:04] Ryan_Lane is here some performance problem?
[19:23:12] please be specifiv
[19:23:15] *specific
[19:23:18] sql server is getting in troubles
[19:23:33] @labs-resolve bots-bsql01
[19:23:33] The bots-bsql01 resolves to instance I-0000061b with a fancy name bots-bsql01 and IP 10.4.0.226
[19:23:34] this one
[19:23:49] which host is it on?
[19:23:57] @labs-info bots-bsql01
[19:23:57] [Name bots-bsql01 doesn't exist but resolves to I-0000061b] I-0000061b is Nova Instance with name: bots-bsql01, host: virt9, IP: 10.4.0.226 of type: m1.xlarge, with number of CPUs: 8, RAM of this size: 16384M, member of project: bots, size of storage: 170 and with image ID: ubuntu-12.04-precise
[19:24:04] virt9
[19:24:22] http://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20pmtpa&h=virt9.pmtpa.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[19:24:23] yes
[19:24:31] something is eating up shitloads of CPU
[19:24:38] of course this server is :D
[19:24:48] in the last 20 minutes
[19:24:49] addshore spawned like 500 isntances of his bot
[19:24:52] look at the graph
[19:24:53] ddosing the mysql
[19:25:08] shhh :/
[19:25:15] itll be done soon
[19:25:19] 32465 libvirt- 20 0 21.4g 16g 7580 S 639 8.6 13548:48 kvm
[19:25:32] i-0000061b
[19:25:53] @labs-resolve 00061b
[19:25:53] I don't know this instance - aren't you are looking for: I-0000061b (bots-bsql01),
[19:25:53] well, this isn't an infrastructure problem, then ;)
[19:26:02] yes that's :D
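The bot's replies to `@labs-resolve` suggest a simple rule: an exact instance name resolves directly, and otherwise every instance whose name or ID contains the query is offered as a suggestion (which is why `tools` also matched `glam-gwtoolset-puppet` and `00061b` matched `I-0000061b`). The Python sketch below reproduces that observed behaviour on a few instances copied from the log. It is inferred from the output only, not taken from the bot's source, and `Instance` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    name: str
    ip: str = ""


# A few entries copied from the bot's replies in this log.
INSTANCES = [
    Instance("I-000000f8", "deployment-backup"),
    Instance("I-0000048c", "glam-gwtoolset-puppet"),
    Instance("I-000005f9", "tools-login"),
    Instance("I-0000061b", "bots-bsql01", "10.4.0.226"),
]


def resolve(query: str, instances: list[Instance]) -> str:
    # Exact name match: answer directly.
    for inst in instances:
        if inst.name == query:
            return (f"The {query} resolves to instance {inst.instance_id} "
                    f"with a fancy name {inst.name} and IP {inst.ip}")
    # Otherwise suggest every instance whose name or ID contains the query.
    q = query.lower()
    hits = [i for i in instances if q in i.name.lower() or q in i.instance_id.lower()]
    listing = "".join(f"{i.instance_id} ({i.name}), " for i in hits)
    return ("I don't know this instance - aren't you are looking for: " + listing).rstrip()


print(resolve("00061b", INSTANCES))
# I don't know this instance - aren't you are looking for: I-0000061b (bots-bsql01),
```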
[19:26:05] how is it?
[19:26:06] its just me ;p
[19:26:15] no I mean that's the server
[19:26:19] ah
[19:26:19] ok
[19:26:30] the server is just being slammed
[19:26:48] temporarily being slammed ;p
[19:27:16] * petan sends hordes of cats over addshore
[19:27:26] u'll see what is it like
[19:27:28] to be slammed
[19:27:35] slammed with cats? :P
[19:27:40] yeah :D
[19:28:04] haha, good to see bsql01 cpu is actually doing something for once ;p
[19:28:04] what is this? reddit?
[19:28:54] yes i was thinking that cpu is actually just pretending to exist and in fact it's just 1 cpu instead of 8
[19:28:55] :P
[19:29:41] same ^^
[19:31:17] it's using 6 cores
[19:31:31] minimum
[19:31:42] seeins at that it was using over 600% of the host cpu ;)
[19:32:13] at least these cpu's started to be useful for something else than heating the building
[19:33:17] Ryan_Lane, Coren, could one of you review https://gerrit.wikimedia.org/r/#/c/57513/1/special/SpecialNovaProject.php
[19:33:24] when parsoid runs its round-trip tests almost all of the hosts shoot up to 80% utilization or so
[19:33:35] And, subsequently, is now an OK time for me to make the latest sudo runas changes live?
[19:35:06] andrewbogott: sure
[19:36:00] andrewbogott: Was about to merge, and Ryan beat me to it.
[19:38:19] ok… everything is live now. Coren, let me know if the 'sudo as' thing works properly next time you create a service group
[19:38:36] andrewbogott: kk
[20:11:21] addshore
[20:11:28] helooo
[20:11:35] you should go in nagios
[20:11:50] and set pinging to sql
[20:11:51] :D
[20:12:11] you would actually see that load is over 30
[20:12:28] D:
[20:12:38] Its because I am doing something that the database is not designed to do ;p
[20:12:42] ok
[20:12:50] full scan? :D
[20:13:18] well you remember the later I tried to do a long time ago to avoid duplicate lang and article combinations? :P
[20:13:31] no
[20:13:32] :D
[20:13:38] my memory suck
[20:13:48] well, I ran out of time as it was saying ti would take days
[20:13:53] so i just ran without doing it
[20:14:01] aha
[20:14:10] That would have made this process use a hell of a lot less cpu intensive as currently I have having to check to see if a row exists before adding it
[20:14:21] whereasbefore I would have just added it and it would either work or not
[20:14:40] oh
[20:14:43] so that?
[20:14:45] :D
[20:14:52] ya :/
[20:14:56] thats whats taking its time
[20:15:00] * do that
[20:15:02] :D
[20:15:03] 1 process per wiki language
[20:15:04] not so
[20:15:21] started off with a few hundred, down to just 61 now :D
[20:18:26] :/
[20:19:36] 60!
[20:20:02] 60 what
[20:20:11] instances?
[20:20:17] processes left ot finish :P I am gessing these are for the larger wikis
[20:20:56] added an extra 100,000 rows so far :D
[20:45:20] petan: I have a graph you might like :)
[20:45:38] http://stath.at/sh/SmTjXyV4yTZNxsFu
[20:46:06] click overview and you can change the length of tiem the graph shows :)
[20:47:56] hah
[20:48:06] [=
[20:48:45] how does it work :o
[20:52:40] I have a cron that checks every min and sends it to the graph :)
[20:52:49] I onyl found StatHat today and I LOVE IT :)
[21:42:18] [bz] (NEW - created by: Ryan Lane, priority: Unprioritized - normal) [Bug 46907] Make instances accessible prior to full bootstrapping - https://bugzilla.wikimedia.org/show_bug.cgi?id=46907
[21:43:23] Change on mediawiki a page Wikimedia Labs/Instance creation improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=668841 edit summary: [+121] /* Bootstrapping */
[22:16:59] petan made it even nicer :) https://www.stathat.com/cards/ShfrTAqcj2hL
[22:20:26] petan brings all the boys to the yard
[23:07:47] !log cvn Setting up checkout of https://github.com/countervandalism/infrastructure in /cvn/externals
[23:07:49] Logged the message, Master
[23:12:11] !log cvn Set up symlinks for cvn-config. Bot in #cvn-wp-nl now uses the Dutch msgs file, #cvn-wp-es uses Spanish msgs file.
[23:12:13] Logged the message, Master
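addshore's point at 20:14 is that a unique key on the (language, article) pair would let the database reject duplicates itself, so each row could simply be inserted; without it, every insert has to be preceded by a lookup, which without a suitable index may mean scanning the table. The Python sketch below shows both patterns using the standard-library `sqlite3` module as a stand-in. The bot in the log runs against MySQL, where `INSERT IGNORE` plays the role of SQLite's `INSERT OR IGNORE`; the table and column names here are hypothetical.

```python
import sqlite3


def make_db(with_unique_key: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    unique = ", UNIQUE (lang, article)" if with_unique_key else ""
    conn.execute(f"CREATE TABLE pages (lang TEXT NOT NULL, article TEXT NOT NULL{unique})")
    return conn


def add_checked(conn: sqlite3.Connection, lang: str, article: str) -> bool:
    """Without a unique key: look the row up first, then insert (two queries)."""
    exists = conn.execute(
        "SELECT 1 FROM pages WHERE lang = ? AND article = ?", (lang, article)
    ).fetchone()
    if exists:
        return False
    conn.execute("INSERT INTO pages (lang, article) VALUES (?, ?)", (lang, article))
    return True


def add_with_key(conn: sqlite3.Connection, lang: str, article: str) -> bool:
    """With a unique key: just insert and let the database drop duplicates."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO pages (lang, article) VALUES (?, ?)", (lang, article)
    )
    return cur.rowcount == 1  # 0 when the row was ignored as a duplicate


rows = [("en", "Foo"), ("nl", "Foo"), ("en", "Foo")]
checked, keyed = make_db(False), make_db(True)
print([add_checked(checked, *r) for r in rows])  # [True, True, False]
print([add_with_key(keyed, *r) for r in rows])   # [True, True, False]
```

Both versions end up with the same rows, but the keyed one does a single statement per row. The check-then-insert version also needs care when several processes write to the same table, since another process can insert between the lookup and the insert; the unique key closes that gap.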