[00:21:51] \o/ [00:21:53] it's working! [01:12:39] andrewbogott_afk: Sorry, was afk. Poke me when you're back. [01:34:37] [bz] (RESOLVED - created by: testingwithfire, priority: Unprioritized - normal) [Bug 46962] HTTPS link is shown on toro new login form but HTTPS is not supported - https://bugzilla.wikimedia.org/show_bug.cgi?id=46962 [01:36:09] [bz] (RESOLVED - created by: Marc A. Pelletier, priority: Highest - enhancement) [Bug 45119] Add per-project service/role user accounts and groups - https://bugzilla.wikimedia.org/show_bug.cgi?id=45119 [02:51:39] [bz] (RESOLVED - created by: Ryan Lane, priority: Unprioritized - normal) [Bug 46907] Make instances accessible prior to full bootstrapping - https://bugzilla.wikimedia.org/show_bug.cgi?id=46907 [02:54:44] Change on mediawiki a page Wikimedia Labs/Instance creation improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671696 edit summary: [-567] [02:59:09] [bz] (NEW - created by: Ryan Lane, priority: Unprioritized - normal) [Bug 47067] Make wikitech an openid provider - https://bugzilla.wikimedia.org/show_bug.cgi?id=47067 [02:59:39] Change on mediawiki a page Wikimedia Labs/Account creation improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671697 edit summary: [+66] /* OpenID as a provider */ [03:01:29] Change on mediawiki a page Wikimedia Labs/Instance creation improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671698 edit summary: [+52] /* Bootstrapping */ [03:03:47] Ryan_Lane: BTW, you should start giving thoughts to other talks you want to give at Wikimania.
:-) [03:03:53] yep [03:04:01] submission date is coming up soon [03:05:04] Change on mediawiki a page Wikimedia Labs/Interface usability improvement project was modified, changed by 67.160.217.184 link https://www.mediawiki.org/w/index.php?diff=671699 edit summary: [-6] /* Skin */ [03:05:43] Change on mediawiki a page Wikimedia Labs/Stability improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671700 edit summary: [-137] /* Gluster */ [03:08:25] Change on mediawiki a page Wikimedia Labs/Communication improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671701 edit summary: [-174] /* Notifications */ [03:10:11] Change on mediawiki a page Wikimedia Labs/Communication improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671702 edit summary: [+112] [03:10:29] Change on mediawiki a page Wikimedia Labs/Communication improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=671703 edit summary: [+0] /* Notifications */ [06:12:31] [bz] (RESOLVED - created by: Krinkle, priority: Normal - enhancement) [Bug 40519] webtools: Setup up webtools.wmflabs.org - https://bugzilla.wikimedia.org/show_bug.cgi?id=40519 [06:39:30] !log deployment-prep search01 : restarted lucene-search-2 , was not listening on port 8123.
[06:39:34] Logged the message, Master [06:42:12] [bz] (ASSIGNED - created by: Chris McMahon, priority: Highest - normal) [Bug 46459] lucene-search-2 uses too much memory on labs - https://bugzilla.wikimedia.org/show_bug.cgi?id=46459 [06:45:22] !log deployment-prep searchidx01 : restarted lucene-search-2 might have been killed by OOM killer (see {{bug|46459}} [06:45:24] Logged the message, Master [10:09:02] Change on mediawiki a page Wikimedia Labs/Tool Labs/Needed Toolserver features was modified, changed by 84.175.78.38 link https://www.mediawiki.org/w/index.php?diff=671908 edit summary: [+114] /* Logs/Stats */ added comment from WMF legal [10:11:35] Change on mediawiki a page Wikimedia Labs/Tool Labs/Needed Toolserver features was modified, changed by 84.175.78.38 link https://www.mediawiki.org/w/index.php?diff=671910 edit summary: [+13] /* Logs/Stats */ minor detail [10:21:02] Change on mediawiki a page Wikimedia Labs/Tool Labs/Needed Toolserver features was modified, changed by 84.175.78.38 link https://www.mediawiki.org/w/index.php?diff=671911 edit summary: [-21] /* Logs/Stats */ [11:02:35] !log integration jenkins2 instance: upgrading Zuul / Jenkins (maybe Gerrit too) [11:02:37] Logged the message, Master [11:24:43] Change on mediawiki a page Wikimedia Labs/Tool Labs/Needed Toolserver features was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=671917 edit summary: [+30] /* Mail: to do */ [11:47:01] Change on mediawiki a page Wikimedia Labs/Tool Labs/Needed Toolserver features was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=671922 edit summary: [+36] /* Mail: to do */ [13:13:54] !log deployment-prep reran Jenkins job https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/ . Some git failures happened in /home/wikipedia/common . [13:13:56] Logged the message, Master [13:19:13] !log deployment-prep no pages being served.
Most probably a PHP Fatal error [13:19:15] Logged the message, Master [13:20:48] !log deployment-prep apt-get upgraded apache32 and apache33 . Note that apache is down on them. [13:20:50] Logged the message, Master [13:23:04] !log deployment-prep apache2: Syntax error on line 324 of /etc/apache2/apache2.conf: Syntax error on line 9 of /etc/apache2/wmf/all.conf: Could not open configuration file /etc/apache2/wmf/www.wikipedia.conf: No such file or directory [13:23:05] Logged the message, Master [13:23:08] !log wikiversity-sandbox Updated wikiversity-sandbox-frontend, no reboot was needed [13:23:09] Logged the message, Master [13:24:47] !log deployment-prep Gluster failure again /data/project/apache/conf/ has some files missing: www.wikipedia.conf en2.conf wikimedia.conf [13:24:49] Logged the message, Master [13:25:14] !log deployment-prep rebooting both apaches. [13:25:16] Logged the message, Master [13:26:45] !log deployment-prep Cluster is back up :-] [13:26:47] Logged the message, Master [13:31:03] !log deployment-prep rebooting deployment-bastion too : gluster issue [13:31:05] Logged the message, Master [13:32:57] !log deployment-prep switching udp2log on bastion: /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start (see {{bug|38995}} ) [13:32:59] Logged the message, Master [13:33:36] !log deployment-prep Restarted the database updating job https://integration.wikimedia.org/ci/job/beta-update-databases/374/ [13:33:38] Logged the message, Master [15:09:13] coren: So, since a tool could run on one of a number of servers, I presume we need to puppetize any additional package installs, right? [15:35:36] paravoid: Hi [15:36:02] hi [15:36:07] saw silke's mail [15:36:10] I'll reply in a bit [15:36:35] In coppenhagen, you had a look at the debian packaging for the osm stack [15:36:46] yeah, I need to put some work into this [15:36:53] You suggested to move the docs out of the debian directory, which I have done now. 
[15:37:04] Were there other suggestions you had to improve the packaging? [15:37:21] I don't remember :( [15:39:11] What is the best way forward? I can create new packages with the scripts I have for testing on labs [15:39:26] or perhaps you could have another look? [15:41:44] andrewbogott: yes ;) [15:42:17] I should... [15:42:47] petan, is there a base class that pulls in everything a tools node should have? [15:43:18] I don't think so, Coren|Sleep said he's going to work on it later [15:43:26] dang, ok. [15:43:49] Is there a stopgap approach, meanwhile? Or should I just find another task in the meantime? [15:44:40] Can you remember which set of scripts you used as the basis to look at things? The debian scripts in the upstream repository, the ones MaxSem created, or the ones I made for labs testing? [16:03:45] the scripts in the upstream repository have too much auto configuration and setup to be debian packaging compliant, because they were designed to be as simple as possible to setup [16:04:36] So I'll update the ones I made for labs, which removed some of that auto configuration (as that would be handled by puppet) and you can use those as a basis [16:49:52] [bz] (NEW - created by: Chris McMahon, priority: Unprioritized - major) [Bug 47080] missing Nostalgia etc. breaks beta cluster - https://bugzilla.wikimedia.org/show_bug.cgi?id=47080 [16:59:41] Coren, ping? [17:40:12] hey apmon [17:43:13] hi [17:57:04] apmon, which packages should I use for testing right now? [18:12:40] Coren: I'm only a few volumes away from being done shrinking everything [18:12:53] MaxSem: I was using the ones in the /data/project/repo repository for testing [18:13:17] but I haven't updated them in a bit. [18:13:55] Let me do that, and you should be able to use them automatically through the self hosted repo [18:14:27] How I can install cpan modules? 
[18:14:29] although, as the current labs puppet configuration forces the packages to be signed, I signed them with my own key [18:14:51] which you need to import into a new instance before being able to use the repository [18:15:09] Ryan_Lane: Yeay! *dances around his chair* [18:16:25] Hm, Ryan_Lane, is it possible to use class parameters in the labsconsole instance configuration? [18:16:34] nope [18:16:42] variables and classes only [18:16:56] puppet's ldap backend doesn't support parameterized classes [18:17:08] hmmm [18:17:09] k [18:17:36] how do the 'variables' work then? the doc says they are inserted as templates [18:17:42] are they available in the classes? [18:17:51] as local vars? [18:21:03] Ryan_Lane^ [18:21:20] eh? [18:21:27] they are global variables [18:21:30] assigned at the node level [18:21:42] they work just like normal variables ;) [18:21:53] so, if I had in labsconsole myvar set [18:21:58] i could access it as $::myvar in any class? [18:22:16] yes, but you should really only access it in a role class [18:22:27] if possibl [18:22:29] *possible [18:23:43] hmmm, that's cool [18:24:07] <^demon|sick> !log deployment-prep ran mergeMessageList.php for php-master wikis [18:24:09] Logged the message, Master [18:29:57] How I can install cpan modules? [18:30:24] hi [18:30:40] UA31_ where? [18:31:19] in tools proyect [18:31:43] hmm which modules do you need? [18:31:48] I can install them for you... [18:32:10] MediaWiki/Bot.pm [18:32:36] btw - do you need to install them system wide or is that just some local installation? [18:32:53] for running a bot [18:34:03] is it even needed to install these using cpan? I must say I have little experience with that, aren´t these python packages somewhere? [18:34:22] s/python/perl/ [18:35:29] ah perl [18:35:30] UA31_: You shouldn't install CPAN modules; we don't want to support two package management systems. [18:35:45] UA31_: If the module you need isn't debianized, I'll be happy to make it for you. 
[18:37:07] UA31_: Or you can do a local install if your module is very specialized and unlikely to be used by others. My Copyright bot uses Text::Align::WagnerFisher for instance which is -- unusual. :-) [18:37:23] cpan local is simples :D [18:37:46] I would prefer one package system as well [18:38:12] though cpan works well... [18:38:13] unless it's virtualenv'd pip because 1 global version of python modules is evil and I'll stab you [18:38:25] python is evil [18:38:33] s/evil/awesome/ [18:38:53] Explain local install [18:38:58] why does it eat so much system resources for doing simple tasks :P [18:39:03] Though saying that I might have to rewrite my lovley python in C/C++ since I don't want to compile python into an initrd [18:40:56] I wish gtk QT and wx merged to one trully working, universal and perfect thingie [18:42:04] so many people working on same thing, tripple efforts :/ miserable outcome [18:42:43] actually GTK isn´t that bad, but the number of bugs I am finding in it is killing me [18:43:14] I can´t even write apostrophes in windows :( [18:57:09] hello Coren, someone told me that you were able to treat the pending requests: https://wikitech.wikimedia.org/wiki/Nova_Resource_Talk:Bots & https://wikitech.wikimedia.org/wiki/Nova_Resource_Talk:Tools [18:57:36] JackPotte: Sure. Gimme a minute to look at those. [18:57:51] thanks a lot [18:59:11] * Coren adds those to his watchlist, he wasn't expecting requests there. :-) [19:30:31] JackPotte: You're all set. Do you want me to create a tool account for you on tools as well while you are here? (see https://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help) [19:31:22] Coren: yes please, I've got a pywikipedia bot, and an edit counter in French: http://toolserver.org/~jackpotte/sc [19:31:54] Do you have name you'd like for them? [19:31:57] names [19:32:17] Like, 'foobot' and 'barcounter' or whatever? [19:33:28] Coren: "JackBot" and "X!'s tools" [19:34:41] Hm. Alphanum only, I'm afraid. 
It's also case significant so you'll get annoyed with the caps. :-) Think 'username' not 'title'. [19:34:49] How about 'jackbot' and 'xtools'? [19:35:25] Coren: perfect [19:37:18] JackPotte: {{done}}. You can log in now. [19:37:39] Coren: thank you, I'm testing [19:41:14] JackPotte: Undocumented easter egg: If you put a .description file in your tools' homes, it'll show up at https://tools.wmflabs.org/ :-) [19:41:29] ok [19:42:55] oh [19:42:56] cool [19:43:16] Coren: just raw html? [19:43:37] legoktm: Few tags allowed,
basically [19:43:59] ok [19:49:43] Coren: So, regarding adding packages to the tools instances… should I just wait until you have a plan? [19:50:13] andrewbogott: No, in the meantime you can just ask me and I'll be glad to help. :-) [19:50:20] (Or petan) [19:50:39] ok! In that case I need… python-argparse, python-irclib, and adminbot :) [19:51:04] adminbot 1.7, which should be the latest package available for precise on Brewster. [19:52:26] No current or candidate version found for python-argparse [19:53:26] hm, ok. [19:53:47] pip only? [19:54:02] hi [19:54:38] Coren: how do i make the link work? [19:54:52] Coren, oh, I had to install it on lucid, I bet it's present by default on precise. [19:54:53] legoktm: You need to have an index.* in your public_html [19:54:58] ok [19:55:18] Coren, yep, it's already present on tools-login. [19:55:55] Coren you need help w anything? [19:56:00] andrewbogott: Ah, simpler that way. {{done}} [19:56:08] Yep, I see it. Thanks. [19:56:15] I can just start my jobs directly on tools-login? [19:57:05] petan: You know how you could help the incoming users a *lot*? Find some reasonably automated way to make .debs out of pip-only packages. :-) [19:57:11] * Coren is a python newbie. [19:57:16] hmm [19:57:36] for beginning you could make a list of pip packages that needs to get converted [19:57:46] once I convert some I might find a way how to automate it [19:57:55] petan: At this time, the big one is oursql. [19:58:12] Change on mediawiki a page Wikimedia Labs/Tool Labs/Notepad was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=672068 edit summary: [-19] update [19:58:13] is that a package name? [19:58:36] petan: Yes.
there is also a request for a more recent version of requests [19:58:36] pip install oursql [19:58:39] :D [19:58:40] ah [19:58:43] ok [19:59:02] these packages have confusing names [19:59:05] the requests that the package manager has is pre 1.0 which had a bunch of breaking changes [19:59:17] "my"sql --> "our"sql :P [19:59:27] andrewbogott: You /can/, but I would very much prefer you have it scheduled via jstart so that it can be migrated if a server goes down. [19:59:31] so package /requests/ and package /oursql/ [19:59:43] I can write apostrophes for some reason nor quotes [20:00:03] I found out itś not because of GTK but because this local windows they have some extra tool which messes the keyboard [20:00:46] petan: Right. The traditional resulting debian package name is 'python-x' [20:01:07] petan: You might want to look at the source deb for python-requests -- it's outdated, but should give you a good start [20:01:16] ok [20:02:10] where is to-do list [20:02:35] we should have one on etherpad, or at least on wiki [20:03:07] petan: http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/TODO [20:03:23] k [20:04:59] Change on mediawiki a page Wikimedia Labs/Tool Labs/TODO was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=672069 edit summary: [+73] Implementation details [20:05:47] Change on mediawiki a page Wikimedia Labs/Tool Labs/TODO was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=672070 edit summary: [-104] {{done}} [20:07:38] Change on mediawiki a page Wikimedia Labs/Tool Labs/TODO was modified, changed by Petrb link https://www.mediawiki.org/w/index.php?diff=672073 edit summary: [+290] pip [20:09:07] Coren I will convert these packages on bots because I don want to create mess on tools (production) [20:09:53] petan: It might even be best to create an instance especially for that particular function; I'd use it as well when I have to make debs out of upstream.
[20:10:16] Coren: Sorry, I meant, 'do I run jstart on tools-login or someplace else?' [20:10:31] andrewbogott: Ah, yes, you can just jstart from tools-login. [20:10:35] hmm [20:10:38] how to call it? [20:10:39] ok [20:10:41] tools-dev? [20:10:58] petan: Sounds fine. [20:12:08] andrewbogott why are service groups listed in project filter? [20:12:24] is that a way to hide them? [20:13:07] petan: I don't know! Do you mind filing a bug for me? [20:13:23] Coren should I create some security group or not? because it won be possible to change later - I tihnk it is not needed IMHO [20:13:40] andrewbogott ok [20:13:45] woo, Ryan_Lane: would love a review: [20:13:45] https://gerrit.wikimedia.org/r/#/c/58540/ [20:14:16] petan: It's a normal host; just leave it in default. [20:17:37] [bz] (NEW - created by: Peter Bena, priority: Unprioritized - normal) [Bug 47101] there is a strange stuff in project filter - https://bugzilla.wikimedia.org/show_bug.cgi?id=47101 [20:18:19] andrewbogott [20:18:21] http://bug-attachment.wikimedia.org/attachment.cgi?id=12073 [20:19:34] itś local-afcbot etc [20:20:07] and my laptop is bit overheated lol [20:23:37] !log tools created -dev instance for various purposes [20:23:38] Logged the message, Master [20:25:12] Coren, dammit I made my adminbot package such that the bot can only be run by root :( I'll need you to update the package shortly. [20:25:15] Coren how is motd being set in tools [20:25:52] petan: motd.tail by hand atm; it's in my draft puppet set of classes though. [20:26:03] ah so hand... [20:26:16] petan: Aka. Number 3 on my priority todo. :-) [20:26:22] :P [20:29:29] petan: I'm going to start the shrinking of bots-home and bots-project soon [20:29:39] what does it mean [20:29:46] are these 2 mounts going to be down? 
[20:29:55] when I do a commit each one may be temporarily unavailable for a few seconds [20:30:15] ok [20:30:53] !log tools petrb: installing some dev tools to -dev [20:30:55] Logged the message, Master [20:32:36] andrewbogott: ugh https://bugzilla.wikimedia.org/show_bug.cgi?id=47101 [20:32:37] heh [20:33:05] groupofnames... [20:33:13] so it shows up as a project [20:33:44] andrewbogott: easiest fix for that is to change the scope of the search, when searching for projects [20:33:46] from sub to one [20:45:36] Core, could you please create a tools group called 'morebots'? [20:49:14] andrewbogott: Yep. [20:49:55] andrewbogott: {{done}} You'll have to log out and back in to be part of the groups. [20:50:09] thanks! [20:52:43] andrewbogott: Belay that, I'm silly. :-) [20:53:16] andrewbogott: /now/ it'll work. [21:00:52] Coren: does labs have somewhere super sneaky to store passwords like /mnt/secure on bots? ;p [21:01:34] you mean for your bots' password? [21:01:35] nope [21:01:41] addshore: No sneaking necessary; only NDA'ed can be root, so unix permissions are good. [21:01:51] addshore's bot is in bots [21:01:54] not tools [21:02:03] addshore: Eventually, we'll gun for OAuth [21:02:23] Ryan_Lane: That's something he's currently working on. He's toolsificating. :-) [21:02:31] ah [21:02:42] :P [21:02:55] then, yeah, you're good in that project ;) [21:03:26] addshore: Canonically, you want your credentials to be owned by the tool account so that the maintainers have access to restart the bot. Maintainers = you until you change the userlist. :-) [21:03:27] not sure Im going to be able to migrate my current task really :P [21:04:12] addshore: If you can't, then I've got a failiure I need to fix. [21:04:25] Coren: haha, dont worry about me :) fix your stuff :P [21:04:59] No, you misunderstood. If you cannot sucessfuly transition to tools, then there is a bug in tools which needs fixing. [21:06:30] Failure is not an option. 
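The advice above about relying on unix permissions for bot credentials can be illustrated with a small sketch. It assumes a Unix host and that the script runs as the account that should own the file (for example the tool account); the file name and contents are placeholders, not anything from the log.

```python
import os
import stat
import tempfile


def write_private_file(path, text):
    """Write text to path so that only the owning account can read or write it."""
    # 0o600 = read/write for the owner, nothing for group or others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Enforce the mode even if the file already existed with looser bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(text)


# Demonstration in a throwaway directory; name and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    cred_path = os.path.join(tmp, "bot-credentials.cnf")
    write_private_file(cred_path, "user=examplebot\npassword=not-a-real-password\n")
    mode = stat.S_IMODE(os.stat(cred_path).st_mode)
    print(oct(mode))
```

Root can still read such a file regardless of its mode, which is why the discussion ties this approach to root access being limited to people under NDA.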
[21:06:42] do i understand that right that only tool maintainers can see the source code? [21:06:58] anyone in the tool's service group [21:07:09] and admins in the tools project (which requires an NDA) [21:07:25] sounds like a step back to me [21:07:31] how so? [21:07:44] the code should additionally be in git [21:07:49] giftpflanze: Nothing prevents you from setting it world readable or (even better) putting it in git [21:08:11] afaik in the bots project that's different [21:08:47] giftpflanze: Not really, you have to set it world readable as well; but in bots there are more people with root who can look regardless of permissions. [21:09:21] i'd rather like everything to be openly visible [21:09:28] hmm Coren can I connect directly from bots to tools instances? :/ [21:10:11] addshore like? [21:10:14] connect to what [21:10:18] dbs [21:10:21] no [21:10:25] :( [21:10:33] but you could connect from tools to bots [21:10:34] addshore: Depends on the security settings; but I would expect not. [21:10:41] ninja'ed by petan [21:10:43] petan: tools to bots works? :P [21:10:56] well, it doesn t but I can enable that [21:10:57] if so thats perfect xD [21:11:05] petan that would be amazing :) [21:11:11] ok [21:11:59] projects are only separated from each other by policy ;) [21:12:12] which projectadmins can manage [21:12:17] Coren: is exec2 actually configured in the tools grid currently? :/ it doesnt seem to have much load [21:13:38] addshore: It is, but by default SGE doesn't spread the load evenly; it only moves up the list when the previous host doesn't have "room". But I enforce hard limits, so you are garanteed half a CPU per slot and hard VM allocation, so there is little point in spreading the load. [21:14:36] addshore: The exec host have no overcommit allowed at all, so they cannot trash or starve. [21:14:47] :D [21:14:54] we will see ? ;p [21:14:59] :-) [21:15:12] Coren, this is all looking good… analytics-logbot in #wikimedia-analytics is no running in tools. 
I'm going to keep an eye on it for a few days before migrating other logbots. [21:15:22] *is now running in tools [21:15:46] andrewbogott: Sounds like a reasonable plan to me. How boring. :-) [21:16:37] addshore it works [21:16:39] andrewbogott: There is a planned outage tomorrow that shouldn't affect it, but you may not be able to log in yourself for a brief period. [21:16:43] addshore :P [21:16:55] addshore you can use mysql to bots-bsql01 from tools to bots [21:16:59] YAY! :D do I just need to connect to bots-bsql01 as usual? :) [21:17:10] yes, but you need your credentials [21:17:16] which are stored in /mnt/secure on bots [21:17:25] haha, my creds were never in my.cnf ;p [21:17:28] ok then [21:17:47] works :D [21:17:52] woop woop! [21:19:17] i wonder if this will all be as easy as I was hoping [21:19:25] dunno [21:19:31] but mysql should work ok [21:20:05] Coren: does tools need any special qsub params? [21:20:30] I would use jsub :P [21:20:59] Coren we should use that debian package to replace jsub so that we have manual pages :P [21:21:21] it´s built in my home [21:21:37] ... i cant find the tools help page again xD [21:21:43] addshore: Not as a rule, you want to watch your memory allocation since it's enforced strictly but otherwise the nodes are designed to be fungible and all have access to the same resources. Being explicit in your limits (if you know how to) may help with more optimal allocation - but will never be make/break [21:21:47] !toolsdocs [21:21:47] http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help [21:21:51] addshore [21:22:06] :D [21:22:23] petan: Yes. I'll install it shortly. [21:22:31] Now: food. [21:22:37] I can install it [21:22:43] petan: Go ahead. [21:22:58] !log tools petrb: installing jobutils.deb on login [21:23:00] Logged the message, Master [21:23:36] addshore [21:23:39] man jsub [21:23:40] :D [21:23:41] shrinking keys.... 
[21:23:47] hehe libgcc_s.so.1 must be installed for pthread_cancel to work [21:23:47] if anyone sees login issues, let me know [21:24:10] addshore ok [21:24:40] done shrinking keys [21:24:57] right i'll be back in an hour or 2 to take this further :) [21:25:27] addshore idea which package contains this? [21:25:31] libgcc is present [21:25:51] petan: no idea at all :D [21:26:03] you better find it out :D [21:26:06] I go sleep :P [21:26:12] itś 23:30 in here [21:26:14] shrinking bots volumes now [21:26:15] almost [21:26:24] heh good timing [21:26:24] petan: 22:26 here :P [21:26:31] petan: ;) [21:26:34] yes 23:26 [21:26:54] petan: might it be something to do with 64 /32bit packages? [21:27:13] after a speedy google >> http://forum.teamspeak.com/showthread.php/40875-libgcc_s-so-1-must-be-installed-for-pthread_cancel-to-work [21:27:18] bots and tools are 64bit I think both [21:27:25] yrd [21:27:28] yes [21:27:43] both project use same ubuntu version and similar kernel [21:28:01] not exactly same version, but 3.2x 64 [21:28:23] well, I just caused a short downtime for bots-home [21:28:25] but it's back up [21:28:28] starting the shrink again [21:28:39] I hate gluster [21:29:04] addshore that sounds too hard, is that something that works to you in bots project? [21:29:16] I mean, itś not hard but it seems too hard for something relatively simple [21:29:28] yep :P its the exact same script ^^ [21:29:37] I don think I need to alien some foreign packages just to get these [21:29:55] ok in that case we are missing things on tools we have on bots [21:31:05] ok. this shrink is going much better :) [21:31:12] http://askubuntu.com/questions/126625/libgcc-s-so-1-must-be-installed-for-pthread-cancel-to-work [21:31:14] no rush petan, go to bed ;p [21:31:54] Ryan_Lane: Any idea what might cause a symlink in /data to go crazy? 
[21:31:55] ---------T 2 krinkle svn 0 Apr 4 23:05 Console.msgs [21:32:13] That used to be a symlink (relative path to another directory/file within /data/project) [21:32:31] I re-created it now, but I'm worried. [21:32:56] https://lists.ubuntu.com/archives/foundations-bugs/2012-February/067594.html [21:33:01] addshore [21:33:04] hmm petan it works when i ssh to an exec instance and run it, just not through qsub [21:33:43] This error is fixed by adding /lib to /etc/ld.so.conf and running [21:33:44] ldconfig [21:33:50] Hm.. the target is also gone :O [21:33:53] Krinkle: which project? [21:33:56] ---------T 1 krinkle svn 0 Apr 4 23:05 CVNBot-msgs-es.msgs [21:33:56] ---------T 1 root root 0 Apr 10 04:55 CVNBot-msgs-nl.msgs [21:34:06] cvn-app1:/data/project/cvn/externals/infrastructure [21:34:08] Krinkle: likely due to a hung brick after a volume shrink [21:34:09] one sec [21:34:50] the symlink gone is one thing, but the target has gone as well. That's weird. Everything else is in tact as far as I can tell. [21:35:02] !log tools petrb: inserting /lib to /etc/ld.so.conf in order to fix the bug with gcc / ubuntu see irc logs (22:30 GMT) [21:35:04] permissions like that are always gluster failures [21:35:04] Logged the message, Master [21:35:36] Krinkle: is it better now? [21:35:44] addshore try now [21:35:54] Ryan_Lane: I don't see a difference [21:36:15] still permission denied on both [21:37:14] mhmm petan same again, but still works running directly on the instace [21:37:17] weird [21:37:18] no logs [21:37:54] Krinkle: it looks like you can restore them with git [21:38:00] I'd do that and ignore the gluster issues [21:38:13] yeah, I can. I will once you are satisfied with checking it out :) [21:38:15] they were likely split-brained before the shrink [21:38:25] and we're very soon moving away from this shitty filesystem [21:38:37] Always have backup route :) Glad I put that in version control last week. 
[21:38:39] like hopefully in a week or so [21:38:48] OK, I'l just restore it. [21:38:51] cool [21:40:14] !log cvn CVNBot-nl and CVNBot-es were down because it was unable to open i18n files. Files restored from git, bots automatically started by stillalive. [21:40:16] Logged the message, Master [21:40:27] addshore try now [21:41:32] same again :/ [21:41:46] damn [21:41:54] its got to be somethign to do with my use of qsub..? [21:42:56] !log tools petrb: reverting the change [21:42:58] Logged the message, Master [21:43:33] addshore I think it has something to do with these restrictions [21:43:49] in bug comments you can see that linux jail is causing these issues [21:44:03] I don really know how jobs are started but they are more restricted [21:45:34] happens if I use jsub also :P [21:47:48] mhhm [21:52:12] * addshore feels bsql01 may have vanished :/ [21:54:53] HAHA all of my bots grid tasks suddenly jumped into Eqw state :P [21:55:07] addshore: Where? [21:55:12] bots [21:55:48] That bodes poorly; E is never a good sign. [21:57:35] * addshore doesnt think there is an easy way to clear all jobs either :/ [21:59:11] You might want to qstat one or two of 'em to know exactly what the error was. [21:59:50] * addshore is not that much of a master of qstat [22:00:41] qstat -j jobnumber [22:01:18] 04/10/2013 21:43:12 [2178:12384]: error: can't open output file "/home/addshore/wd.del.o573844" [22:01:25] must have been the shrink :P [22:02:21] shouldn't be [22:02:23] I didn't commit it [22:02:29] hmm O_o [22:02:32] * Coren is sooo pleased Gluster is gtfo regardless. [22:03:04] though there's a possibility a brick is down [22:03:35] we'll see after the commit [22:03:39] Coren: do you know an easy way to qdel all of my errored tasks? :) [22:04:52] addshore: IIRC, you can qdel -u your_username [22:04:57] :D [22:05:04] but that qdels all, not just the errored ones. 
[22:05:17] thats fine, literally everything had errored :P [22:05:38] and my deploy script wont start any more if there are a certain number already in the queue (apparently even if they have errored) [22:06:42] hmm, a lot of them just went straight back to eqw [22:07:15] can't stat() "/home/addshore" as stdout_path: Invalid argument [22:07:46] addshore: That's no' a good thing. Looks like the exec node is ill. [22:08:13] bots remember, not tools ;p [22:08:51] addshore: still not a good thing. What instance did they die on? [22:08:58] * addshore checks [22:09:09] andrewbogott: Are you going to start more than one instance of adminlogbot eventually? [22:09:30] it doesnt list an instance [22:09:39] Coren: Yes… maybe four. [22:09:49] andrewbogott: You'll probably want to jstart them with a -N name then to distinguish them for jstart/jstop. [22:10:08] ok [22:10:43] addshore: even if you just 'qstat'? [22:11:05] qstat jsut lists things such as 574083 0.25000 wd.g.vec addshore Eqw 04/10/2013 22:05:18 [22:11:15] no queue for the eqw's listed [22:11:18] Ah; they never got that far. [22:11:25] :P [22:11:35] !keys [22:11:36] http://bots.wmflabs.org/~petrb/db/ list of infobot keys [22:13:55] Where did you start them from? [22:14:14] bots-gs [22:14:54] bots-gs is glusted up too. [22:14:59] :P [22:15:05] everything borked :P [22:15:32] I have to step out for about an hour or 2, is there any chance that if you fix it you could qdel everything fro my user 'addshore' ? :) [22:16:00] addshore: No problem; it's probably a hung brick. [22:16:11] cheers :) ttyl [22:16:15] Coren: on bots? [22:16:26] Ryan_Lane: Yeah, at least from bots-gs [22:16:40] once I commit I can kill the bricks and force start them [22:16:43] then we'll know [22:16:58] Well, you can't make things worse with a commit now. :-) [22:17:12] it's not finished shrinking [22:17:18] Ah. 
[22:17:19] so that would lose all the non-rebalanced files ;)
[22:17:44] * Coren adds this to the list of things to not try
[22:32:17] uuugghhh
[22:32:41] seems every page request to wikitech was causing an ldap lookup
[22:32:47] still is for logged-in users
[22:33:06] dear memcache
[22:41:49] I like the Han http://tvtropes.org/pmwiki/pmwiki.php/Main/LeaningOnTheFurniture pose.
[22:45:10] mt
[22:47:46] committing bots-home
[22:47:53] Coren: know which files to check there?
[22:48:00] there were 2 failed rebalances
[22:48:05] which were likely split-brained files
[22:48:33] Ryan_Lane: I know /home/addshore itself went gronk.
[22:48:44] But not more than that.
[22:48:46] the whole directory?
[22:48:57] can't stat() "/home/addshore" as stdout_path: Invalid argument
[22:49:21] EINVAL on stat() = bad news.
[22:50:01] it seems to be working now
[22:50:08] from bots-bnr1
[22:51:08] Yeah, it seems back. Odd. That was before the commit so in theory nothing should have been affected.
[22:52:25] andrewbogott: mind a review? https://gerrit.wikimedia.org/r/#/c/58623/
[22:53:34] that should reduce load on ldap and make browsing wikitech faster
[22:55:38] addshore: I've cleared the error state of your jobs. Things should be back to running.
[22:56:39] Coren: indeed
[22:56:50] well, it does move files around
[22:56:54] before the commit
[22:57:25] but all writes should be directed to bricks 1 and 2
[22:58:41] Ryan_Lane: I +2 it on code review; what's the usual process for verified when jenkins isn't around?
[22:58:54] pray?
[22:59:00] that didn't seem to fix the issue
[22:59:08] I need to add some debugging statements to see what's up
[22:59:19] well, it solved it for anons, but not for logged-in users
[22:59:45] Ah, Jenkins got to it after me.
[23:03:51] oh. duh.
[23:03:52] this did work
[23:04:14] but I'm caching at the wrong spot
[23:04:26] well, pulling from the cache at the wrong spot anyway
[23:05:08] because it still needs to get a user for that...
[23:05:15] and I'm caching by the DN.
[23:05:25] I need to cache by the username, and pull it from the hook
[23:06:14] Oh! Well, that'll still save you the group lookup; but yeah, caching by username is better.
[23:09:13] Ryan_Lane: Performance stats for hardcore benchmark:
[23:09:17] Ryan_Lane: I'm too late!
[23:09:19] labstore3:
[23:09:25] Operations performed: 12900 Read, 8600 Write, 27506 Other = 49006 Total
[23:09:25] Read 201.56Mb Written 134.38Mb Total transferred 335.94Mb (1.1198Mb/sec)
[23:09:25] 71.66 Requests/sec executed
[23:09:40] labstore1001:
[23:09:58] Operations performed: 22332 Read, 14888 Write, 47616 Other = 84836 Total
[23:09:58] Read 348.94Mb Written 232.62Mb Total transferred 581.56Mb (1.9384Mb/sec)
[23:09:58] 124.06 Requests/sec executed
[23:10:24] andrewbogott: not yet :)
[23:10:30] I'm pushing in a new patchset
[23:10:37] ok
[23:10:40] The difference is probably not really that high given labstore3 has some load.
[23:14:31] labstore3 has almost no load
[23:14:54] iostat shows 96% idle
[23:18:33] Ryan_Lane: >>> 0, but still. :-)
[23:19:02] So, yeah, we have an NFS server pattern.
[23:19:59] The annoying part is that there is no way to put that config in a partman recipe. It doesn't allow setting some of the alignment options when creating the volume groups. :-(
[23:21:53] Coren: heh
[23:22:10] documentation, then
[23:27:36] ah. shit
[23:27:44] +2 makes jenkins merge in this repo
[23:27:45] heh
[23:28:03] I was wondering why I kept getting rejected
[23:29:44] Coren: since you volunteered for the last one: https://gerrit.wikimedia.org/r/#/c/58630/2
[23:29:45] :)
[23:30:31] using OpenStackNovaUser() causes an ldap lookup
[23:30:44] Coren: cheers for that! :)
[23:30:45] so, I'm bypassing it by pulling the roles from memcache at that spot
[23:33:38] andrewbogott: https://gerrit.wikimedia.org/r/#/c/58615/1 <— looks good. did you get a chance to test it?
[23:34:06] Ryan_Lane: +2'ed.
[23:34:39] Ryan_Lane: Yeah, it's running on nova-precise2 now.
[23:34:42] Seems right to me.
[23:34:55] cool
[23:47:17] ok. that fixed that
[23:47:37] neither logged-in users nor anons are causing ldap lookups on page reads
[23:47:56] and the wiki is indeed faster now
[23:48:25] remarkably so, in fact
[23:49:57] time to try and run on tools again :p
[23:51:08] hmm :( no joy
[23:53:23] andrewbogott: I +2'd your change
[23:53:28] I'd imagine jenkins will merge
[23:53:30] yep, thanks
[23:53:32] addshore: problems with the filesystem>
[23:53:34] ?
[23:53:38] or something else?
[23:53:45] something else this time :)
[23:53:57] ah. ok. good
[23:54:04] ^^
[23:54:05] well, good that it isn't the filesystem still :)
[23:54:41] just have to work out what it is :P
[23:54:59] heh
[23:55:35] I sure have been on a performance and usability kick lately. I wonder what to work on next :D
[23:58:54] db? ;p
[23:59:06] other folks are working on that
[23:59:13] we're working on the filesystem stuff right now
[23:59:27] but it's a long boring process, so I'm doing other things while waiting :)