[00:06:12] Can someone help me add external-ip support to the 'Netflow' project?
[00:14:22] cajoel: I'm guessing no one has assigned you one?
[00:14:37] Coren: andrewbogott ^^
[00:15:18] * andrewbogott can do it
[00:15:25] although...
[00:15:52] cajoel, is it possible for you to tunnel for a few days? We're pretty close to having a reliable proxy system set up that would obviate the need for a public ip
[00:15:59] …presuming that it's just web access you need.
[00:16:48] (washing dishes, back in 5)
[00:26:51] andrew: the flow collections I'm doing will be taking feeds directly from production edge devices
[00:27:20] I'm game to wait for a proxy solution, but these are just UDP flows coming from prod routers
[00:27:34] does your proxy have UDP support?
[00:28:13] Reedy: I thought it was just an option configured per project.
[00:28:13] I'm having trouble adding myself to a POSIX group on a labs instance. 'usermod -a -G wikidev werdna' doesn't seem to do the trick. Are POSIX groups managed by some other freaky labs thing? :)
[00:29:49] cajoel: If you need UDP then I should allocate you a public IP. The only hitch is that they're all going to change in a few months.
[00:31:28] cajoel: OK, if you're project admin then you can set up an address here: https://wikitech.wikimedia.org/wiki/Special:NovaAddress
[00:31:33] If not then you need to wait for me to finish eating dinner
[00:32:53] I'm the admin, but it failed last time I tried
[00:32:55] trying again
[00:33:20] done
[00:33:21] thanks
[00:33:22] cajoel: Nope. Need an admin to provide the IP to the project. You can then allocate as you want
[00:33:24] enjoy dinner
[00:33:32] I'm an admin
[00:33:34] all set
[00:33:40] No, I mean labs admin
[00:33:43] oh
[00:33:44] s/admin/root/
[00:33:47] andrew did that
[00:33:52] bah
[00:33:57] so I'm good
[00:34:01] I have a notify on andrew
[00:34:14] lol
[00:34:27] werdna: Wikidev group on which project?
[00:34:35] editor-engagement
[00:35:18] how to apply public address to running instance?
[00:35:46] Configure seems to be puppet related only
[00:36:48] https://wikitech.wikimedia.org/wiki/Special:NovaAddress
[00:36:56] [Allocate IP]
[00:37:02] (found it -- thx)
[00:48:53] what is the approved upper bound of memory for a job
[00:49:53] i was going to run with 2GB but would like to request 3 if possible. I am worried that the job will get killed for requesting 3GB
[00:52:15] hey, I have shell access so I figured I was supposed to use bastion-restricted, but I can't log into it
[00:52:41] werdna: I think restricted is just for ops -- can you access bastion1?
[00:53:12] werdna: And, I'm not sure about the usergroup thing. Certainly users and groups are handled by ldap, but I wouldn't think that would exclude you from making local changes.
[00:54:06] auduwage: If there's nothing in the docs about it then probably best to run it past Coren in the morning.
[00:54:22] * werdna shrugs
[00:56:29] andrewbogott: I don't see anything in the docs.
[00:57:03] andrewbogott: I think this is probably why: werdna@ee-flow:~$ sudo grep werdna /etc/passwd
[00:57:04] werdna@ee-flow:~$
[00:58:49] well, users are stored in ldap. So you wouldn't have an entry in /etc/passwd
[00:58:57] right
[00:59:01] so how would I have groups
[00:59:12] groups are configured in /etc/passwd, no?
[00:59:45] ah no, /etc/group
[00:59:49] werdna@ee-flow:~$ cat /etc/group | grep wikidev
[00:59:50] werdna@ee-flow:~$
[00:59:51] very helpful
[01:00:05] werdna: you can maybe accomplish what you want via this interface: https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup
[01:00:28] It's not precisely designed for this, but it will create a user group in ldap and let you manage members.
[01:00:36] Unless this group already exists in your project?
[01:01:02] it does
[01:01:19] werdna@ee-flow:~$ groups spage
[01:01:19] spage : wikidev wmf project-bastion project-editor-engagement project-deployment-prep project-tools project-oauth
[01:01:47] yep, same thing happens on my instance in a different project.
[01:03:55] spage is in wikidev on production too.
[01:04:05] So, that's probably why :)
[01:05:37] ah
[01:05:40] aren't I?
[01:05:59] nope!
[01:06:00] andrew@fenari:~$ groups
[01:06:00] wikidev
[01:06:28] on fenari but not on bast1001. Hm...
[01:06:45] I am entirely unclear when things are system-local and when in ldap
[01:06:55] fenari is old, though… so maybe it's a leftover.
[01:07:04] ohh my username is 'andrew'
[01:07:34] this old thing :p
[01:07:58] Hm, and of course my username is different in labs and production
[01:08:22] This really seems like the kind of thing that I would know about :)
[01:08:33] :D
[01:09:36] andrew@bast1001:~$ groups
[01:09:37] wikidev
[01:09:46] but there my username is, again, 'andrew', not 'werdna'
[01:13:28] welp, I see group wikidev in ldap but it shows no members at all
[01:13:45] hmm
[01:13:56] but you said spage is in wikidev on production too?
[01:14:41] he is...
[01:15:00] I remain unclear how this fits together though. As far as I know, ldap doesn't have anything to do with users or groups in production
[01:15:01] ah, you mean on the servers
[01:15:26] Yeah, how I think this works is: users+groups managed by puppet in production, by ldap in labs.
[01:15:40] But that doesn't explain why there's a wikidev group in labs or how spage is a member.
[01:16:16] my theory would be that the puppet classes have a list of users to apply the wikidev group to
[01:16:27] and that those puppet classes also exist on labs machines
[01:16:57] That would surprise me, but…
[01:16:59] * andrewbogott looks
[01:19:29] werdna: Anyway, I think we should rewind. What is it you're trying to accomplish?
[01:19:50] andrewbogott: There's a directory in a project I'm on that is owned by the wikidev group
[01:20:04] I need to modify it. I could sudo, or I could chown it to project-editor-engagement
[01:20:07] but that doesn't help much
[01:20:21] because the next time somebody adds a file, it will just revert to ownership of 'wikidev'
[01:20:31] Does it need to be in the wikidev group? chowning sounds reasonable to me
[01:20:32] and then I'll be back at square one
[01:20:38] unless this system is supposed to be a replica of a production system
[01:20:49] well, the primary group of all my other coworkers is 'wikidev' (I assume this, because of the ownership)
[01:21:03] meaning that the next time somebody creates a file there, it will be owned by wikidev
[01:22:38] Hm, yes, so I see.
[01:22:52] Seems broken :)
[01:24:20] I don't have a good idea of a solution, though. I'd advise you do what you need to using sudo for now, and try to catch Ryan_Lane (the boss of ldap) tomorrow and see if this is somehow on purpose.
[01:25:03] I should clarify -- the labs group setup seems broken. But having that dir in labs owned by wikidev also seems broken.
[01:25:08] mmmm
[01:27:04] cheers for the help anyway
[01:57:44] werdna: If the /directory/ is SGID, any file created there will have the group of the directory and not the user's
[02:18:43] Hi all, if anyone is around I have a quick question
[02:19:02] What's the best way from within a server-side web script to correctly get the tool's home directory path?
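(Note on the SGID remark at 01:57:44.) A minimal sketch, assuming a POSIX system with Python available, of what the setgid bit on a directory does to the group of newly created files. The temporary directory, the "shared" and "example.txt" names and both helper functions are made up for the illustration; nothing here touches the real wikidev-owned directory, and the group change is skipped quietly if the system does not permit it.

```python
import os
import stat
import tempfile

def pick_group():
    """Prefer a supplementary group that differs from the effective GID, if any."""
    egid = os.getegid()
    for gid in os.getgroups():
        if gid != egid:
            return gid
    return egid  # fallback: the demo still runs, but cannot show a difference

def demo_sgid():
    """Return (directory gid, new file gid, whether the setgid bit is set)."""
    with tempfile.TemporaryDirectory() as parent:
        shared = os.path.join(parent, "shared")  # hypothetical directory name
        os.mkdir(shared)
        # Hand the directory to the chosen group first, then add the setgid bit
        # (roughly `chgrp <group> shared` followed by `chmod g+s shared`).
        try:
            os.chown(shared, -1, pick_group())
        except OSError:
            pass  # keep the default group if the change is not permitted
        mode = stat.S_IMODE(os.stat(shared).st_mode)
        os.chmod(shared, mode | stat.S_ISGID)

        new_file = os.path.join(shared, "example.txt")  # hypothetical file name
        with open(new_file, "w") as handle:
            handle.write("hello\n")

        dir_stat = os.stat(shared)
        file_stat = os.stat(new_file)
        return dir_stat.st_gid, file_stat.st_gid, bool(dir_stat.st_mode & stat.S_ISGID)

dir_gid, file_gid, sgid_set = demo_sgid()
print("setgid bit set on directory:", sgid_set)
print("new file has the directory's group:", dir_gid == file_gid)
```

If the account belongs to more than one group, the new file ends up in the directory's group rather than the creator's primary group, which is the behaviour described in the remark.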
[02:19:13] The HOME environment variable seems to be undefined
[02:21:41] I could grab __DIR__ and step up but that's kinda fragile
[02:25:08] I'm not sure why the web server isn't setting up $HOME correctly
[02:25:39] Assuming it runs as the tool user
[02:29:08] Maybe I could spawn sh -c 'echo ~' or something crazy like that
[02:31:21] Okay I used `bash -c 'echo ~'` and that worked in PHP
[02:32:58] errr Make that `bash -c 'echo -n ~'`
[02:33:33] Dcoetzee_: that is kind of terrifying
[02:33:41] I'm sure you could just use HOMEDIR
[02:33:56] Nope there are zero environment variables available containing my home directory, I checked
[02:34:09] No HOME, no HOMEDIR, etc.
[02:34:46] $_SERVER['HOME']
[02:34:50] Tried that
[02:34:53] It's blank
[02:35:00] says https://www.google.com.au/search?q=get+home+directory+from+php&oq=get+home+directory+from+php&aqs=chrome..69i57j69i60j69i61j69i59j69i61j69i60.2801j0j7&sourceid=chrome&espv=210&es_sm=91&ie=UTF-8
[02:35:55] I tried getenv, tried $_SERVER, did a print_r on $_SERVER and nothing useful is in it
[02:36:12] It seems like the environment isn't being set up right somehow
[02:36:42] Testing locally run in a web browser there's no HOME anything
[02:36:52] bar in other variable paths
[02:37:04] This is run on the server, in production environment
[02:37:11] And?
[02:37:11] And is server-side
[02:37:14] And?
[02:37:30] It's still apache running a PHP script on a server
[02:37:40] Yes?
[02:37:46] Not sure what your point is
[02:37:51] What was your point?
[02:38:02] My point is I have no easy way to get the home directory of the tool account from the PHP script
[02:38:20] Other than spawning bash and asking it
[02:38:36] Unless I missed something easier
[02:38:44] I just want to load replica.my.cnf
[02:38:47] And need its path
[02:38:56] I was just testing and confirming the behaviour you're seeing, and that
[02:39:00] Config file and hardcode the path?
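(Note on the missing HOME.) One possible way around it, sketched under assumptions: ask the system account database for the home directory of the user the script runs as, rather than relying on the environment. The sketch is in Python purely for illustration; PHP's POSIX extension offers posix_getpwuid() and posix_geteuid() for the same kind of lookup, but whether that extension is enabled on the Tool Labs web servers is not established in this log. replica.my.cnf is the file named in the discussion; both function names are hypothetical.

```python
import os
import pwd

def home_of_effective_user():
    """Return the home directory recorded in the account database for the effective UID."""
    # pwd wraps the C library's getpwuid(), so on typical Linux setups it can also
    # see accounts that live in LDAP, and it does not depend on HOME being set.
    return pwd.getpwuid(os.geteuid()).pw_dir

def replica_cnf_path():
    """Build the path to replica.my.cnf in that home directory (the file may not exist)."""
    return os.path.join(home_of_effective_user(), "replica.my.cnf")

print(replica_cnf_path())
```

Compared with spawning bash to expand ~, this avoids a subprocess and keeps working when the environment has been cleared, at the cost of assuming the effective user has an entry in the account database.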
[02:39:14] I'd rather spawn bash than hardcode it
[02:39:22] $homeDirBecausePHPSucks = '/home/foobar';
[02:39:23] Why?
[02:39:35] Because hardcoding requires reconfiguration every time it's deployed somewhere else?
[02:39:44] Same reason one always avoids hardcoding
[02:40:00] How often are you going to be deploying it somewhere else?
[02:40:30] I've done it once today already :-P
[02:40:57] I'll just stick with what I'm doing
[02:41:52] www-data:x:33:33:www-data:/var/www:/bin/sh
[02:42:01] /var/www
[02:42:12] The PHP script is run as the tool account, not as www-data
[02:42:22] Otherwise what I'm doing wouldn't work
[02:42:47] Also, it wouldn't be able to read replica.my.cnf
[02:42:52] Since it's 600
[02:44:11] Testing it...
[02:44:31] echo $HOME as www-data tells me /root
[02:45:39] * Reedy stabs PHP
[02:45:40] I'm guessing that when it switches from www-data to the tool account to run the PHP script it cleans out the environment but doesn't set HOME for the tool account.
[02:45:52] Although I have no idea where such things happen.
[02:47:08] You can get the current directory with this
[02:47:08] $currentDirectory = array_pop(explode("/", getcwd()))
[02:47:08] Just keep popping until you get to the home directory. For each pop, add a "../" before the relative addressing.
[02:47:08] It can be done by a simple loop.
[02:47:12] The internet is scary sometimes
[02:47:23] Heh
[02:47:54] That would actually work if I knew where to stop.
:-P
[02:47:56] so keep going back till you find home, then you want that element and the previous
[02:48:13] On Tool Labs of course the home directory is not /home/blah/
[02:50:17] Wooo my script is migrated :-)
[02:50:27] And working after I changed to using revision_userindex
[02:50:33] Easier fix than I thought
[02:50:56] I guess on toolserver everything was running in /home/user/public_html
[02:51:05] Yeah
[02:51:11] But still it's relative from where you are
[02:51:33] What I could do is always go one above public_html
[02:51:43] Which will work as long as the public_html directory is in the home directory and is called that
[02:51:50] But I like my solution better still
[02:52:17] Seems more sane than some potential workarounds at least
[02:54:56] The proper fix is really for the admins to fix it so HOME is set correctly while running PHP scripts, so I'll tag one later for that :-)
[03:01:46] I'm actually really impressed how easy migrating from Toolserver to Tool Labs is
[03:02:04] I only had to change like two lines
[08:15:17] !petan-build
[08:15:17] make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
[09:58:56] If i use z-cron for scheduling the tasks to run in toollabs, cron will run until the computer is ON or what?
[10:01:30] can anybody help?
[11:21:18] Why has my script recently repeatedly stopped? (Running with continuous flag.)
[12:11:18] a930913: if it exits with a non-0 exit code, the job will stop
[12:33:52] any lab ops around?
[12:36:26] hashar said we need a new entry for http://m.wikipedia.beta.wmflabs.org configured by the lab ops
[12:38:23] can it be set to point to the mobile varnish, the same as http://zero.wikipedia.beta.wmflabs.org
[12:39:02] 208.80.153.143
[12:40:14] yurik_: Coren / andrewbogott_afk would be able to add a new DNS domain in the deployment-prep project
[12:40:19] though they are not yet around ..
[12:42:58] yurik_: gotta eat something will be back
[12:44:32] hashar, bon appétit
[12:44:39] merci !
[13:46:36] Coren: do we have the equivalent of /mnt/userstore on labs?
[13:47:34] You mean a place where everyone can put stuff? /shared/
[13:50:59] Coren: I would assume that's where dumps are stored too, correct?
[13:52:04] Betacommand: /public/datasets/public/ actually; though I think someone put old dumps in /shared/
[13:54:46] Coren: can you set up a symlink system so I can define one file and always get the most current dump?
[13:55:16] TS had that and made things easy
[13:55:56] That's actually rsynced from the analytics box, any symlink I put in would get clobbered; I can look at having it in the source though.
[13:57:23] Coren: what about creating a public/dumps/ with symlinks pointing to the /dataset
[13:58:41] Coren or Betacommand - do you know who could add a new dns entry for beta cluster? ^^^
[14:00:12] Hm. I could probably trick something on the file server for that (/public is an automount point, no way to put files there). That said, the easiest solution would be for me to put something outside entirely. Lemme think about it for a minute for something that's going to be maintainable and non tools-specific.
[14:06:36] is there a node I can use for non-jsub scripts?
[14:08:08] Or can I just use the login server
[14:14:50] hashar: are you here? I have an issue on beta labs commons and I'm not sure how to fix it.
[14:16:10] chrismcmahon: yeah I am there
[14:18:14] hashar: from time to time I get an error because a page in beta commons is trying to load a really long URL "
[14:18:14] network log: [200 OK] GET http://bits.beta.wmflabs.org/commons.wikimedia.beta.wmflabs.org/load.php?debug=false&lang=en&modules=ext.centralNotice.bannerController...etc " but the javascript that comes back is not valid.
[14:18:57] so that is the resource loader of MediaWiki
[14:19:06] it assembles various js in a single document
[14:19:07] hashar: so I'm wondering if we can just turn off centralNotice.bannerController completely for all of beta labs, it is not necessary
[14:19:10] and we have a bunch of them
[14:19:25] how would you test central notice banner ? :D
[14:19:42] that link gives me:
[14:19:43] mw.loader.state({"ext.centralNotice.bannerController...etc":"missing"});
[14:19:48] /* cache key: commonswiki:resourceloader:filter:minify-js:7:83dbfe4a672d9ba8d241834c84f89278 */
[14:19:48] hashar: we've never tested a central notice banner before to my knowledge :-)
[14:20:07] hashar: the full link is at https://bugzilla.wikimedia.org/show_bug.cgi?id=56279
[14:21:08] hashar: I've already removed the Template that was causing the other error in that bug report, it should not have been in beta labs at all.
[14:21:57] hashar: now that we have the SSL things more or less stable, I'm trying to clean up some of the other errors in the beta labs wikis
[14:25:29] hashar: I have a strong feeling that the central notice code behind that URL is wrong or not current or otherwise not what is in production. I would rather remove it completely unless we absolutely have to maintain it for some reason. I just don't know how.
[14:26:10] yurik: You mean for the proxy?
[14:26:42] Betacommand: You really should use the grid for everything but brief interactive things.
[14:27:00] Betacommand: Do you have something that fails to work properly on the grid? (If so, that's a bug)
[14:27:23] Coren: Currently *.m.wikipedia.beta.wmflabs.org maps to the mobile varnish (as well as a number of other servers).
I need m.wikipedia and other non-language-specific domains also to map there
[14:27:54] Coren: No it's just a script that I'm one-offing and want to watch
[14:28:00] same as m.wikiquotes, m.wiktionary, and/or any other domains we have defined on beta cluster
[14:29:04] Betacommand: Simplest way to do this (even if it's a little ugly) is to "qrsh your_script"; that'll run it on a suitable node but still give you stdio.
[14:30:03] Coren: is that documented anywhere?
[14:31:14] Betacommand: It's part of the standard gridengine functionality. I'm not sure that I want to encourage it as a general use case because I *know* people are going to do this in screen routinely and it messes up resource allocation a bit. :-)
[14:31:35] chrismcmahon: javascript error: Uncaught SyntaxError: Unexpected end of input
[14:31:36] that is pretty bad :/
[14:31:45] chrismcmahon: will look at it in a few
[14:32:24] yurik_: I'm not all that familiar with that setup; Ryan or Andrew are your best bet but I'll make sure they share the howto with me. :-)
[14:32:24] hashar: that's what I'm seeing. we get that error intermittently upon trying to load certain pages, UploadWizard is an example
[14:32:58] hashar: ^ re dns
[14:33:04] chrismcmahon: got to find the root cause. Maybe a cache needs to be cleared
[14:33:27] Coren: thanks :)
[14:34:00] yurik_: so follow up with andrew. You need a wikipedia.beta.wmflabs.org domain so we can add in a 'm' host that would point to the mobile cache. Thus the m.wikipedia.beta.wmflabs.org would point to the mobile cache . Success!
[14:34:11] yurik_: or maybe it should be pointed to the text cache, no clue really.
[14:34:30] andrewbogott: ?
[14:35:01] * andrewbogott catches up
[14:36:04] yurik_: I'm surprised that that doesn't work already, but let me check...
[14:36:13] thx :)
[14:36:15] What do you mean by 'and other non-language-specific domains'?
[14:36:43] any site m.{{SITENAME}}.beta...
[14:37:03] hashar already got zero.wikipedia.beta....
working
[14:37:34] hashar: let me know if you figure out the root cause of that, it's something I would like to see fixed. I'll make a bugzilla ticket for it also.
[14:40:29] yurik_: is m.wikipedia.beta.wmflabs.org working like you want now?
[14:40:54] andrewbogott: http://m.wikipedia.beta.wmflabs.org/ try it :)
[14:41:15] 'like you want' is the question
[14:41:19] Obviously it does something vs. nothing :)
[14:41:33] it doesn't even DNS resolve :)
[14:41:50] For me it does. Let's give it a minute
[14:41:53] it should point to the same mobile varnish cache servers as all the other .m.wiki*
[14:42:49] andrewbogott: nslookup m.wikipedia.beta.wmflabs.org 8.8.8.8
[14:42:54] also fails
[14:43:09] (google dns)
[14:43:23] try logging into a labs machine and verifying from there.
[14:43:34] chrismcmahon: reviewing the URL reported, one is attempting to load MediaWiki::CollapsibleTemplates.js on commons http://commons.wikimedia.beta.wmflabs.org/wiki/MediaWiki:CollapsibleTemplates.js which does not exist
[14:44:33] hashar: I removed MediaWiki:CollapsibleTemplates.js this morning to match production. Our version had unprintable characters in the page
[14:44:39] andrewbogott: it works from bastion
[14:44:44] very confusing
[14:44:57] i wonder why it's not working from RU
[14:45:00] hashar: we should not be seeing the error from MediaWiki:CollapsibleTemplates.js any more. also MediaWiki:CollapsibleTemplates.css is gone
[14:45:01] DNS changes aren't instant, it'll take a while for them to apply locally.
[14:45:21] yes, but it works from bastion even when i use google's 8.8.8.8
[14:45:29] $ dig +short m.wikipedia.beta.wmflabs.org @labs-ns0.wikimedia.org
[14:45:30] 208.80.153.143
[14:45:33] works for me :-]
[14:46:00] that I can't explain
[14:46:20] yurik_: 8.8.8.8 is anycast, it points to a different dns resolver depending on your source IP / ISP / where in the world you are
[14:46:28] so… yurik_, I can't put a wildcard in the middle of an entry.
So if you have other domains you want set up I can do them but you'll have to specify them individually.
[14:46:34] yurik_: so the queries from labs, my machine or yours all end up at different resolvers.
[14:47:20] andrewbogott: i didn't mean a wildcard - only the same domains as set up in beta cluster
[14:47:29] not sure where that list is
[14:47:46] hashar might know?
[14:48:30] yurik_: you want all the m.s to point to that same box, right?
[14:48:37] correct
[14:48:46] same as zero.s
[14:48:54] IF they are defined
[14:48:58] actually, scratch zero
[14:49:07] only zero.wikipedia, which already works i think
[14:50:12] I actually don't see anything here for zero.
[14:50:40] I see… eventlogging.beta.wmflabs.org bits.beta.wmflabs.org upload.beta.wmflabs.org beta.wmflabs.org
[14:50:42] that's it
[14:51:28] andrewbogott: but it works somehow. Strange. Btw, an important one is m.meta.beta.wmflabs.org
[14:52:05] chrismcmahon: well something weird happens :-D looking at sauce labs job https://saucelabs.com/jobs/39f24d5767844936bb54b6c60a08507e
[14:52:09] There's also *.beta.wmflabs.org and *.m.wikipedia.beta.wmflabs.org
[14:52:13] presumably zero is caught by the wildcard.
[14:52:47] hmm, i guess there shouldn't be meta come to think of it - since it would be a new domain
[14:53:22] we might have to hack some other server to do zero-rating. Bleh
[14:53:29] can't wait to switch to IP-based filtering
[14:53:54] chrismcmahon: it is hard to diagnose, the javascript is truncated in sauce labs output https://saucelabs.com/jobs/39f24d5767844936bb54b6c60a08507e
[14:54:31] hashar: that wrong js seems to be Mediawiki:AjaxTranslation.js
[14:54:31] yurik_: So… nothing left to do?
[14:54:44] chrismcmahon: yup, hasn't been edited in two years
[14:54:59] hashar: let's just kill it
[14:55:09] chrismcmahon: that is not going to solve the problem
[14:55:09] hashar: after we check production :-)
[14:55:19] andrewbogott: well, it still doesn't resolve for me - either google or local, let's hope it gets resolved. Thanks for your help!
[14:55:32] np
[14:55:36] will try it later tonight
[14:55:48] hashar: how so not solve the problem?
[14:56:32] chrismcmahon: I looked at http://commons.wikimedia.beta.wmflabs.org/wiki/MediaWiki:AjaxTranslation.js and it is surely valid.
[14:57:13] chrismcmahon: the question is more: why does sauce labs think it is invalid. I guess it does not manage to fetch it entirely
[15:03:15] hashar: no, there should be something wrong with the js. In the case of CollapsibleTemplate, there were unprintable characters.
[15:03:27] hashar: I'm looking to see if I can figure out what the issue is
[15:03:42] chrismcmahon: tried a bit on my local machine, can't reproduce :-((((
[15:03:50] I can clear out the bits cache maybe
[15:05:37] hashar: the version we have is different than http://commons.wikimedia.org/w/index.php?title=MediaWiki:AjaxTranslation.js
[15:06:00] ohhh
[15:06:09] what do you have ?
[15:07:10] might be some cache being corrupted (either varnish or memcached)
[15:07:20] http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=MediaWiki:AjaxTranslation.js from 2010. prod is dated 2012
[15:07:21] if it is ever cached in memcached of course
[15:07:46] ahh
[15:07:47] yeah
[15:08:10] but still, the page on beta hasn't changed for the last 2 years
[15:08:25] hashar: do we even care about the Mediawiki: namespace on beta at all?
[15:09:11] ideally we would want to sync gadgets and mediawiki js & css
[15:09:33] and I think some volunteers are trying out their gadgets and/or js on beta
[15:09:45] the ajaxtranslation.js, we can probably phase it out
[15:09:52] not sure how it ends up being included by bits though
[15:11:07] Coren: Are the webservers OK, I'm getting reports of issues
[15:11:15] hashar: for now I blanked the AjaxTranslation.js page
[15:11:18] hashar:
[15:11:19] chrismcmahon: ah from http://commons.wikimedia.beta.wmflabs.org/wiki/MediaWiki:Common.js of course :-]
[15:11:41] Betacommand: I'm not seeing anything; lemme take a look at the logs.
[15:11:46] hashar: yikes
[15:12:09] chrismcmahon: yeah. And I can't help on that. I am not a js person nor do I know what the community is doing with all that javascript
[15:12:29] hashar: I don't think we can reasonably maintain all of that
[15:12:44] chrismcmahon: feel free to blank that Common.js page
[15:12:52] chrismcmahon: that should get rid of most of the stuff
[15:13:08] hashar: done
[15:13:10] thanks
[15:13:15] er stuff / gadgets & mediawiki: js
[15:13:59] retrigger the job maybe ?
[15:14:08] hashar: I've learned a lot this week about all the extra js and css we have. it's a little crazy.
[15:14:18] it is not crazy
[15:14:20] it is insane!
[15:14:23] :-]]]]]]]]]]]
[15:14:26] hahahahaha
[15:14:34] but this way the community is able to play with javascript
[15:14:49] some of those gadgets are VERY popular
[15:15:05] ie twinkle https://en.wikipedia.org/wiki/Wikipedia:Twinkle which assists in maintenance
[15:15:07] hashar: the funny part is that Firefox is very liberal and doesn't throw errors but Chrome and IEs do
[15:15:24] if it was only me, I would remove twinkle from the mediawiki namespace and make it an extension
[15:15:35] Betacommand: I'm seeing the "normal" number of errors in the logs, and only one source of 500s for any of yours (from a single IP).
[15:15:45] hashar: so anybody that only tests their js in Firefox thinks it
[15:15:52] thinks it is OK :-)
[15:16:48] Coren: I'm getting reports that my tools are giving a "can't start new thread" error message
[15:17:01] chrismcmahon: to give you an idea: https://en.wikipedia.org/wiki/MediaWiki:Gadget-Twinkle.js
[15:17:23] hashar: yep, seen that
[15:17:43] chrismcmahon: that is a community gadget. That is actually very similar to a mediawiki extension, each gadget is registered somewhere and people can enable them in their user preferences with a single click (iirc)
[15:18:23] chrismcmahon: and of course, we don't care about the gadgets. So whenever MediaWiki is updated, some gadgets are broken if we made some incompatible changes. none of the gadgets are tested :/
[15:18:24] Betacommand: I see it, they are 502s, and all from the /cgi-bin/sandbox URI. What I get when I try it is a pretty error stacktrace from python itself. You might be able to reproduce it with this: http://tools.wmflabs.org/betacommand-dev/cgi-bin/sandbox?page=Country_music
[15:18:40] hashar: yes, I've been working on some tests for the HotCat gadget and the ProveIt gadget recently. I messed up the config for ProveIt a lot
[15:18:42] That page is definitely output by your script; so it runs but fails.
[15:18:57] chrismcmahon: at least they are using git to develop twinkle, then a script syncs their git dir with the mediawiki page.
[15:19:19] chrismcmahon: what would be nice is to find a way to test them against core. I have no clue how we could do that though.
[15:19:52] chrismcmahon: right now, that would mean making them an extension, but the community would lose the power to update/deploy them which is a pity.
[15:20:53] hashar: I'm hoping that with common.js gone, I'll see a few more green builds for non-Firefox browsers.
[15:21:10] retrigger https://wmf.ci.cloudbees.com/job/browsertests-commons.wikimedia.beta.wmflabs.org-linux-chrome/ ?
:-]
[15:21:17] yep
[15:21:24] if that still fails I will clear out the bits cache
[15:23:11] huh. did we just lose DNS for beta labs?
[15:23:45] http://en.wikipedia.beta.wmflabs.org/
[15:23:52] Betacommand: I'm a little surprised at "reports" though; AFAICT there is exactly one user that has gotten that error in the past 48h, and always for the same URI. :-)
[15:24:52] hashar: brb
[15:25:29] Coren: any idea why python would be having issues creating threads?
[15:26:40] i thought python doesn't support threads at all
[15:26:45] only in native
[15:27:27] yurik python supports both multi-threading and multi-processing
[15:27:45] andrewbogott: we somehow lost the DNS entry for en.wikipedia.beta.wmflabs.org :-(
[15:27:46] since when? they had issues because they have one global execution lock
[15:28:46] yurik I've been using 2.6.x+ for several years without issue
[15:29:11] Betacommand: Hm; a quick glance at the webserver this runs on doesn't show it running out of resources, so it's probably not environmental. It works with http://tools.wmflabs.org/betacommand-dev/cgi-bin/sandbox?page=Ouimetoscope though so it seems to be data-dependent.
[15:29:16] * yurik_ needs a python refresher course
[15:29:18] hashar: OK… is that the only one lost?
[15:29:30] andrewbogott: not sure
[15:29:49] en.m.wikipedia.beta.wmflabs.org works (mobile equivalent)
[15:30:04] the other wildcard works, i.e. wikidata.beta.wmflabs.org
[15:30:22] Coren: I'll do some digging
[15:30:24] Yeah, but the other wildcard isn't competing with a hard-coded entry under the same domain...
[15:30:33] hashar, just in case, I'm going to delete and recreate that wildcard entry.
[15:30:41] works for me :-]
[15:30:50] Coren: are there per tool limits on the web servers?
[15:31:50] * Coren grumbles.
[15:32:07] Coren: ??
[15:32:18] Betacommand: No, but you might get better debugging ability if you switched to the new scheme: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[15:32:24] and that gets you FCGI for free.
[15:32:55] hashar, which box should *.wikipedia.beta.wmflabs.org point to?
[15:33:00] (I keep forgetting kvirc maps ^W to close window rather than WERASE)
[15:33:54] Betacommand: sec, are we talking about the same thing? CPython has a GIL - global interpreter lock, which only allows one python instruction to be executed at a time. There are tons of workarounds, but this remains a fundamental python principle
[15:34:49] yurik_: CPython ? I use the basic python package
[15:34:59] andrewbogott: deployment-cache-text1
[15:35:32] andrewbogott: but the 'm' host should point to the mobile one
[15:35:41] Betacommand: cpython is the main implementation, but there are others like pypy, ironpython, etc
[15:35:47] yep. Not sure how that'll get handled, but we will see
[15:36:20] yurik_: yeah, was looking that up
[15:37:17] hashar: is that instance sick? I can't ping
[15:37:58] deployment-cache-text1.pmtpa.wmflabs I am on it
[15:38:01] IIRC, threading is still subject to the GIL, but multiprocessing gets around it
[15:38:22] andrewbogott: ([a-z]+\.)?(m|zero)\.[a-z]+\.beta\.wmflabs\.org --> mobile, followed by ([a-z]+\.)?[a-z]+\.beta\.wmflabs\.org --> desktop :)
[15:39:21] MrZ-man: multiprocessing -- you mean spawning multiple programs from one? That's done by OS, so any lang supports it, right?
[15:39:31] so *.m.beta.wmflabs.org and *.zero.beta.wmflabs.org on deployment-cache-mobile01.pmtpa.wmflabs while *.beta.wmflabs.org is on deployment-cache-text1.pmtpa.wmflabs
[15:39:47] yurik it's more complex than that
[15:39:58] hashar: Looks like everything is working now -- for you?
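(Note on the routing rule at 15:38:22.) The rule can be sketched as an ordered pattern match: try the mobile pattern first and fall back to the desktop one. This is an illustration in Python only, not how the labs DNS or the varnish caches are actually configured; the classify() name and the "mobile"/"desktop" labels are assumptions, and the desktop pattern is written the way the surrounding discussion implies.

```python
import re

# Optional language prefix, then "m" or "zero", then the site name.
MOBILE = re.compile(r"([a-z]+\.)?(m|zero)\.[a-z]+\.beta\.wmflabs\.org")
# Optional language prefix, then the site name; only consulted after MOBILE fails.
DESKTOP = re.compile(r"([a-z]+\.)?[a-z]+\.beta\.wmflabs\.org")

def classify(hostname):
    """Return "mobile", "desktop", or None for a beta cluster hostname."""
    if MOBILE.fullmatch(hostname):
        return "mobile"
    if DESKTOP.fullmatch(hostname):
        return "desktop"
    return None

for host in (
    "en.m.wikipedia.beta.wmflabs.org",
    "m.wikipedia.beta.wmflabs.org",
    "zero.wikipedia.beta.wmflabs.org",
    "en.wikipedia.beta.wmflabs.org",
    "wikidata.beta.wmflabs.org",
):
    print(host, "->", classify(host))
```

The order matters: the desktop pattern on its own would also accept m.wikipedia.beta.wmflabs.org, which is why the mobile check has to come first.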
[15:40:02] basically, but Python has an API for it to make it easier
[15:40:31] right, but that API wraps OS interprocess communication systems
[15:40:55] just to make it easier, which is more important to python than to other langs because it lacks true multithreading
[15:41:54] andrewbogott: *.zero.wikipedia.beta.wmflabs.org points to the text, should be deployment-cache-mobile01
[15:42:06] andrewbogott: the *.m properly points to deployment-cache-mobile01.
[15:42:27] *.zero is not and was never set up.
[15:42:34] So it's just falling through to *.wikipedia.beta.wmflabs.org
[15:42:44] makes sense
[15:42:47] I assume that's acting the way it always has...
[15:42:57] I can change it if you would like :)
[15:42:58] yup
[15:43:02] please! :-]
[15:43:10] and yurik_ will owe you some beer
[15:43:16] andrewbogott: but (.*\.)?zero should point to mobile too
[15:43:29] 'k
[15:44:01] i will! and adam will be the guarantor while i'm somewhere in eastern europe
[15:46:50] hashar, working now?
[15:47:22] andrew did not get excited by beer...
[15:47:23] andrewbogott: I'm not seeing http://commons.wikipedia.beta.wmflabs.org/ or http://en.wikipedia.beta.wmflabs.org/ now
[15:47:32] andrewbogott: looks good to me :-]
[15:47:48] hehe, sounds like beta cluster is a house of cards ;)
[15:47:58] commons works for me
[15:48:09] and so does EN
[15:48:11] yurik_: so you should get the request hitting the mobile cache now and being relayed to the backend apaches
[15:48:20] chrismcmahonbrb: both of those pages load for me… so probably a dns caching thing.
We'll see if it breaks for me in 30 or starts working for you :) [15:48:28] yep, will start breaking it shortly :) [15:48:39] yurik_: you still have to set up the virtual hosts and write some mod_rewrite rules to point / to extract_mobile.php or whatever name you will give to your script [15:49:05] andrewbogott: I did my DNS queries against labs-ns0 so that should be fine for you now :-] [15:52:35] GET http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=MediaWiki:Catfood.js&action=raw&ctype=text/javascript Uncaught SyntaxError: Unexpected end of input Uncaught SyntaxError: Unexpected token ILLEGAL [15:52:48] hashar: "Catfood.js"? [15:52:51] * chrismcmahonbrb deletes [15:53:34] actually, /me would delete if beta commons would resolve for me. :-( [15:53:37] chrismcmahonbrb: worked for me [15:57:24] chrismcmahonbrb: nooo clue [15:59:22] hashar: when you mentioned /data/project/apache/conf, that's on the deployment-apache32 and 33? Or is there some common location mapped on all? [15:59:36] and why do we load all this stuff for a page in UploadWizard that contains only a single image and a "Next" button anyway? [16:04:34] yurik_: /data/project is a shared NFS export mounted on all instances [16:04:44] yurik_: I usually do all my hacking on deployment-bastion [16:05:11] hashar: but doesn't puppet overwrite it? [16:05:37] yurik_: nah, the apache confs used on beta are a copy made a while ago [16:05:44] yurik_: they are not maintained by puppet. [16:11:24] If I use z-cron to schedule tasks in toollabs, does it run only when the computer is ON? [16:29:48] chrismcmahonbrb: still? [16:30:00] andrewbogott: I think we're good now [16:30:04] cool [17:05:24] I was messing around and made two different tools, deskana and deskana2. Can someone delete deskana2? I'll never use it. [17:05:33] This thing: https://tools.wmflabs.org/deskana2/ [17:10:57] Krinkle|detached: I note that puppet doesn't run on integration-apache1. Does that mean that instance can be deleted?
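The hostname rule hashar described at 15:38:22 (mobile and zero names go to the mobile cache, everything else under beta.wmflabs.org goes to the text cache) can be sketched in Python as follows. This is an illustration, not the actual DNS or Varnish configuration: `cache_for` is a hypothetical helper, and the desktop pattern is an assumption about the intended rule, namely an optional language label followed by the project name.

```python
import re

# Mobile rule as quoted in the channel, anchored to the whole hostname.
MOBILE = re.compile(r"^([a-z]+\.)?(m|zero)\.[a-z]+\.beta\.wmflabs\.org$")
# Assumed desktop rule: optional language label, then the project name.
DESKTOP = re.compile(r"^([a-z]+\.)?[a-z]+\.beta\.wmflabs\.org$")


def cache_for(host):
    """Return the cache instance a beta hostname should resolve to, or None."""
    # Mobile must be tested first: "m.wikipedia.beta.wmflabs.org" also
    # matches the desktop pattern, with "m" read as a language label.
    if MOBILE.match(host):
        return "deployment-cache-mobile01"
    if DESKTOP.match(host):
        return "deployment-cache-text1"
    return None


if __name__ == "__main__":
    for name in ("en.m.wikipedia.beta.wmflabs.org",
                 "en.zero.wikipedia.beta.wmflabs.org",
                 "commons.wikimedia.beta.wmflabs.org"):
        print(name, cache_for(name))
```

The ordering matters for the same reason the unconfigured *.zero names fell through to the text cache: the broader wildcard matches anything the narrower one does not claim first. Note that `[a-z]+` does not match labels containing hyphens or digits, so a real rule would likely need a wider character class.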
[17:18:39] I'm having some strange results in some SQL queries; is there any report of *missing info*, or any expert who could give me a hand finding out if it's my problem? [17:40:04] OrenBochman: I have a question about the solr-ci instance. [17:40:16] specifically, I'm trying to get puppet to run there, and thwarted. [19:00:33] !log mobile rebased and updated puppet files on mobile-solr2 [19:00:36] Logged the message, dummy [19:07:51] MaxSem: can you currently access instance mobile-solr2? [19:08:35] andrewbogott: I don't know of any integration instances (I'm not using them anyway), hashar might know [19:09:36] andrewbogott, yep [19:10:24] MaxSem: Hm, I cannot. Wonder why... [19:10:31] I mean, I am currently logged in, updating puppet... [19:10:38] and I note that now I can no longer start a new session there. [19:10:43] You can ssh in? [19:11:04] yup - currently logged in, see slowdowns due to puppet run:P [19:11:34] just tried sshing again - still works [19:11:59] hm [19:12:25] is your account called andrew? [19:12:38] yes [19:12:44] well, or as root... [19:12:47] can't get in either way now [19:13:50] hmm, at least your home dir isn't broken, which was a popular reason for ppl being unable to log in before... [19:15:46] I don't especially need access, I just don't want to have broken anything, updating puppet [19:21:39] andrewbogott, puppet run is now over, can you log in? [19:21:53] the instance appears broken now btw [19:22:09] 2 failures during puppet run [19:25:59] dafuq? jetty is startable manually as root, but not from puppet [19:27:06] hi [19:27:09] coren ping [19:27:16] Deleted edits has been delayed, because labs is currently undergoing staff changes. Expected availability is mid-October. Please page Coren for more info. [19:27:24] it's now almost november [19:27:59] It is indeed. Track bug 49189 for news [19:29:19] Bug 49189 - New fields: ar_id, el_id. (edit) ? [19:30:10] Coren: [19:30:41] Yes, that's the dependency we are waiting on.
[19:43:22] MaxSem: It still won't let me in. [19:43:42] And… it looks to me like those services /are/ starting, they're just returning non-zero so puppet reports errors. [19:43:43] andrewbogott, meh. if you don't want to kno [19:43:59] w what's wrong for the future, we can delete it [19:44:07] Oh, hm... [19:44:13] Solr should be tested on betalabs now [19:44:18] Well, I want to update puppet on ~20 more instances, I'd hate to break every one. [19:44:28] heh [19:44:36] But if those instances are obsolete then I support cleanup, generally. [19:44:36] anything I can help you with? [19:44:39] Is that whole project obsolete? [19:44:44] no [19:44:54] MaxSem: I dunno, if you can log in and detect why I can't...? [19:45:05] where to look? [19:45:15] no idea [19:45:30] oh:P [19:46:51] Coren, I bet you know how to debug this :) [19:47:05] * Coren reads scrollback. [19:47:13] Oct 31 08:54:09 mobile-solr2 puppet-master[9379]: (Scope(Class[Ssh::Config])) Could not look up qualified variable 'ssh::bastion::ssh_banner'; class ssh::bastion has not been evaluated at /etc/puppet/manifests/ssh.pp:104 [19:47:24] not related? [19:47:37] huh, might be... [19:48:43] okay, found the moment you forced a puppet run... [19:48:43] MaxSem, is there actually a manifests/ssh.pp? [19:48:54] There shouldn't be [19:49:06] Is the stuff in /var/lib/git/operations/puppet synced to gerrit? [19:50:31] haaa [19:50:33] Oct 31 19:11:47 mobile-solr2 puppet-agent[5092]: (/Stage[main]/Passwords::Root/File[/etc/ssh/userkeys/root/.ssh/authorized_keys]) Could not evaluate: Connection refused - connect(2) Could not retrieve file metadata for puppet:///private/ssh/root-authorized-keys: Connection refused - connect(2) at /etc/puppet/private/modules/passwords/manifests/init.pp:37 [19:50:46] that must be it [19:51:09] in the puppet run? That shouldn't matter. [19:51:10] andrewbogott: I'm not sure I understand what bug you mean; that some services are failing to start on an instance? 
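andrewbogott's reading at 19:43:42 is that the services do come up, but their start command exits non-zero, so a caller that trusts exit codes reports a failure. A minimal Python sketch of that effect follows. It uses a hypothetical stand-in command (`FAKE_START`) rather than the real jetty or Solr init scripts, and `start_service` is an illustrative helper, not Puppet's actual service provider.

```python
import subprocess
import sys

# Hypothetical stand-in for an init script: it does its work, then exits 1.
FAKE_START = [sys.executable, "-c",
              "print('service started'); raise SystemExit(1)"]


def start_service(cmd):
    """Run a start command; judge success by the exit code alone."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout.strip()


if __name__ == "__main__":
    ok, output = start_service(FAKE_START)
    # The output shows the work happened, yet the caller sees a failure.
    print(ok, output)
```

If that is what happened here, the two reported failures would say more about the init scripts' exit status than about whether jetty is actually running.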
[19:51:34] Coren: I updated puppet on an instance and now I can't log in anymore, as though my key disappeared. Max can still connect. [19:51:44] o_O [19:51:47] What instance? [19:51:49] Coren, does your root key work on mobile-solr2.pmtpa.wmflabs? [19:52:16] Nope. [19:52:20] MaxSem: the 'private' stuff ought not affect login keys since keys should come from a shared mount. [19:52:30] Now… if private /is/ installing keys, then that's bad but could be the problem. [19:52:32] andrewbogott: That's true for normal users, not root keys. [19:52:37] True [19:52:42] Oh, hm. [19:52:44] andrewbogott: root keys come from private. [19:53:01] Coren, ok, true, I guess I'm fixated on my user key being broken. [19:53:05] Although the fact that both are broken is suspicious [19:53:43] anyway… MaxSem, currently puppet doesn't complain about that, right? That's a thing that I specifically fixed [19:53:55] (it had to do with private being rearranged and that instance being out of date) [19:53:56] yup [19:54:05] so... /var/lib/git/labs/private is from Jul 30 [19:54:15] root@mobile-solr2:/var/lib/git/labs/private# git pull [19:54:15] Permission denied (publickey). [19:54:15] fatal: The remote end hung up unexpectedly [19:54:31] GIT_SSH=/var/lib/git/ssh git pull --rebase [19:54:46] ah, I'm root [19:55:07] You need to be root and also do ^^ [19:56:03] root@mobile-solr2:/var/lib/git/labs/private# GIT_SSH=/var/lib/git/ssh git pull --rebase [19:56:04] Current branch master is up to date. [19:58:13] hmm, but labs/private hasn't changed since then... [19:58:27] Yeah, I think that's fine. [19:59:14] MaxSem: do I have a key in /public/keys? 
[19:59:46] Oct 31 19:46:31 mobile-solr2 sshd[11347]: Failed publickey for andrew from 10.4.0.85 port 46767 ssh2 [19:59:51] there should be two, 'andrew@AndrewMacbook-5.local' and 'andrew@devstack' [20:00:49] yup, see them [20:01:07] well [20:02:00] See, Coren, this is what I'm hoping you can help debug :) [20:02:18] what if I add you explicitly to the project? [20:02:25] Hm. Are you actually in the correct group to log in to that project? [20:02:27] or that would be cheating?:P [20:02:35] Yeah, what MaxSem said. :-) [20:02:51] MaxSem, I see myself as being in it... [20:02:53] do you not see me? [20:03:02] ssh access to instances is limited by group membership. [20:03:02] aren't roots supposed to be able to log in everywhere and pwn us mortals? [20:03:19] ah, you're at the bottom [20:03:21] MaxSem: With root keys, not with normal user keys (but we can add ourselves to any project) [20:03:33] MaxSem, yes, supposed to be. [20:03:53] MaxSem: Do an 'id andrew' and see if he's actually in the group? [20:04:31] groups={...},50214(project-mobile) [20:04:57] Let's try adding Coren and see if it works for him [20:04:59] * andrewbogott adds [20:06:10] I'm clearly not cool enough to log into that instance. What does the auth log look like?
[20:06:42] Oct 31 20:06:17 mobile-solr2 sshd[11951]: Failed publickey for marc from 10.4.0.220 port 54903 ssh2 [20:07:02] No, I mean auth.log [20:07:14] Coren: For context… I'm doing some janitorial work on a couple dozen instances… now worried that I'm going to break 'em all like I did this one [20:07:20] yep, what I'm quoting [20:07:32] Oct 31 20:06:17 mobile-solr2 sshd[11951]: Set /proc/self/oom_score_adj to 0 [20:07:32] Oct 31 20:06:17 mobile-solr2 sshd[11951]: Connection from 10.4.0.220 port 54903 [20:07:32] Oct 31 20:06:17 mobile-solr2 sshd[11951]: Failed publickey for marc from 10.4.0.220 port 54903 ssh2 [20:07:32] Oct 31 20:06:17 mobile-solr2 sshd[11951]: Failed publickey for marc from 10.4.0.220 port 54903 ssh2 [20:07:32] Oct 31 20:06:17 mobile-solr2 sshd[11951]: Connection closed by 10.4.0.220 [preauth] [20:07:39] MaxSem: Read back a bit; I'd need the. Ah, yes. [20:07:46] Huh. [20:07:56] ^^^ that's what corresponds to your latest attempt [20:18:21] so... [20:19:02] it's FUBAR?:P [20:19:24] I'm not getting it. Hm. One thing to check. [20:20:59] Meanwhile… I'm going to update mobile-solr3 and see if it breaks the same way. Woo! [20:21:40] * MaxSem looks for something heavy to throw at andrewbogott:p [20:21:45] GO AHEAD [20:22:02] Um… presuming that solr3 is also no longer useful? [20:22:37] yep - in any case I can recreate it quickly [20:22:41] additional datapoint, coren, can you access mobile-solr3? [20:23:49] andrewbogott: Yep. [20:24:22] * andrewbogott updates [20:28:04] Same thing! More-or-less successful puppet run (modulo service starting) and now I can't connect anymore. [20:29:01] Oh bleh. [20:29:09] I have to go to hold an interview. [20:29:54] okay, what should I look for there? [20:31:11] I'm sure whatever broke is the same on both instances. I can't imagine what, though [20:33:46] gdb sshd to death?:P
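When the same breakage shows up on more than one instance, it can help to tally the sshd rejections in auth.log rather than read them by eye. The following Python sketch assumes the line format quoted in the channel above; `failed_publickey_counts` is a hypothetical helper and `SAMPLE` simply reuses the lines pasted there.

```python
import re
from collections import Counter

# Matches sshd lines such as the ones quoted from mobile-solr2's auth.log.
FAILED = re.compile(
    r"sshd\[(?P<pid>\d+)\]: Failed publickey for (?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+) ssh2"
)

SAMPLE = """\
Oct 31 19:46:31 mobile-solr2 sshd[11347]: Failed publickey for andrew from 10.4.0.85 port 46767 ssh2
Oct 31 20:06:17 mobile-solr2 sshd[11951]: Connection from 10.4.0.220 port 54903
Oct 31 20:06:17 mobile-solr2 sshd[11951]: Failed publickey for marc from 10.4.0.220 port 54903 ssh2
Oct 31 20:06:17 mobile-solr2 sshd[11951]: Failed publickey for marc from 10.4.0.220 port 54903 ssh2
Oct 31 20:06:17 mobile-solr2 sshd[11951]: Connection closed by 10.4.0.220 [preauth]
"""


def failed_publickey_counts(text):
    """Count rejected public-key attempts per (user, source IP)."""
    counts = Counter()
    for line in text.splitlines():
        match = FAILED.search(line)
        if match:
            counts[(match.group("user"), match.group("ip"))] += 1
    return counts


if __name__ == "__main__":
    for (user, ip), n in failed_publickey_counts(SAMPLE).items():
        print(user, ip, n)
```

The tally only shows how many key offers were rejected and from where; it does not say why sshd rejected them, which is the open question in the discussion.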