[00:13:40] day six without a working dewiki database... I guess there's no news?
[00:16:08] apper: what's the latest issue? externallinks too small?
[00:16:50] because recentchanges definitely has recent stuff
[00:18:09] yes, replication works, but most of the revisions are missing
[00:18:42] page seems to be complete, but revision lacks at least two thirds of the revisions
[00:19:12] and 85% of the page_latest revisions
[00:19:35] apper: jeremyb: latest: 41,293,437 expected: 130,682,295
[00:20:08] and I only checked page, revision and recentchanges - I don't know the state of other tables
[00:20:30] but I think revision is essential for most tools (at least for mine)
[00:23:13] apper: Coren answered giftpflanze's question at 15:53 that Sean (DBA) will be on duty tomorrow (ahh today)
[00:23:40] right, but if he's where i think he is then it already is day there
[00:23:42] hedonil: ah, thanks
[00:25:25] jeremyb: so is there a way to contact him directly?
[00:25:38] well he's currently -afk...
[00:25:46] * jeremyb shall poke
[00:25:48] is there no emergency DBA or is the priority of this problem too low to get this fixed yesterday?
[00:26:04] I just want to know ;)
[00:26:25] jeremyb: yeah poke
[00:28:13] labs replication is honestly substantially more complicated than typical maria/mysql replication
[00:28:39] so, it (I guess) needs someone who knows how it works to fix it
[00:30:36] jeremyb: Heh, that's what DBAs are for
[00:31:10] jeremyb: maybe there's a manual for that :-D
[00:31:34] jeremyb: I understand that, but in all companies I know there are emergency plans for such things and people who get paid for being on standby duty.
I don't want to blame these people, I just want to know how this works at WMF
[00:32:37] I'm not sure labs comes with an SLA
[00:33:01] It certainly doesn't get the priority production does/would
[00:33:05] grrrrrrr, so much lag (on my shell)
[00:33:08] Reedy++
[00:33:18] Reedy: ah, okay
[00:33:43] hedonil: my point was that it can't just be fixed by any DBA. needs someone who already knows it. whether there's a manual, unsure
[00:34:19] jeremyb: sure. just kiddin'
[00:34:53] jeremyb: but - 6 days is a loooong period
[00:35:43] hedonil: right, but idk how broken it was or what's needed to fix it
[00:36:23] jeremyb: think status was FUBAR
[00:38:17] jeremyb: and that's a problem... the only thing a WMF employee did on the corresponding bugzilla bug was lower its priority... so no one knows what has to be done or who cares at the moment...
[00:40:17] * hedonil fiddles with voodoo to speed things up
[00:40:43] but now I know that a DBA called Sean will work on it today, that's great news
[00:41:23] apper: yeah, now we know his name :-D
[00:41:35] Go find Domas
[00:41:36] * Reedy grins
[00:41:54] apper: we make him an offer he can't refuse 8-)
[00:42:07] Can someone delete the spam https://wikitech.wikimedia.org/wiki/Special:RecentChanges ?
[00:42:36] PiRSquared: sure
[00:42:43] damn lag
[00:45:30] not sure how much to use autoblock
[00:45:44] https://wikitech.wikimedia.org/wiki/Special:Contributions/Larygloria0156
[00:48:49] Greetings. I can't log in to bastion. Any ideas?
[00:50:30] mathbot: you should try bastion2
[00:51:06] Thank you. I wish this was documented somewhere.
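For context, the gap apper quotes above (41,293,437 rows replicated against 130,682,295 expected) works out to the "two thirds missing" he describes; a quick back-of-the-envelope check using only the numbers from the chat:

```python
# Back-of-the-envelope check of the counts pasted above: how much of
# dewiki's revision table is missing from the labs replica.
replicated = 41_293_437   # rows present on the labs replica
expected = 130_682_295    # rows expected from production
missing = 1 - replicated / expected
print(f"{missing:.0%} of revisions missing")  # → 68% of revisions missing
```
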
[00:51:24] bastion looks pretty dead
[00:51:32] jeremyb: https://wikitech.wikimedia.org/wiki/Talk:Main_Page#Shell_access_for_spammers.3F.21
[00:51:39] PiRSquared: i saw
[00:51:48] https://wikitech.wikimedia.org/wiki/Special:ListUsers/Larygloria?limit=3
[00:59:05] hedonil: apper: so we got a bit of status: there's a process running to compare tables between the 2 boxes and fix places that are broken but that process is repeatedly losing its connection to the labs side. hoping to find out why that's happening today.
[00:59:28] jeremyb: thanks!
[01:03:29] so, it's 2 AM here, I'm going to sleep and hopefully the database will be fully working soon :)
[01:03:30] bye
[01:03:37] and thanks jeremy for the update
[01:03:39] night
[01:03:59] night ;)
[01:20:15] TParis, ping
[01:20:48] Hey
[01:24:40] I am having problems logging in. I followed all the steps at https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances, with no luck. I am able to connect to gerrit, using ssh USER@gerrit.wikimedia.org -p 29418
[01:25:00] I cannot connect to bastion as it is down. Bastion2 gives:
[01:25:25] ssh -Av USER@bastion2.wmflabs.org
[01:25:33] debug1: Authentications that can continue: publickey debug1: Next authentication method: publickey debug1: Offering public key: /home/oleg/.ssh/id_rsa debug1: Authentications that can continue: publickey debug1: Trying private key: /home/oleg/.ssh/id_dsa debug1: No more authentication methods to try. Permission denied (publickey).
[01:25:38] Any ideas? Thanks!
[01:26:16] mathbot: try -vvv not -Av
[01:27:12] OK.
Here's what I got:
[01:27:13] debug1: Authentications that can continue: publickey debug3: start over, passed a different list publickey debug3: preferred gssapi-keyex,gssapi-with-mic,publickey,keyboard-interactive,password debug3: authmethod_lookup publickey debug3: remaining preferred: keyboard-interactive,password debug3: authmethod_is_enabled publickey debug1: Next authentication method: publickey debug1: Offering public key: /home/oleg/.ssh/id_rsa d
[01:27:45] I just generated a fresh rsa key using ssh-keygen -t rsa. And no luck.
[01:28:13] don't do a fresh key. focus on the old key
[01:28:18] when did you add the old key?
[01:28:30] A few days ago
[01:28:49] OK, to gerrit just today, 20 minutes ago.
[01:28:58] ignore gerrit
[01:29:09] all we care about is [[special:preferences]]
[01:29:21] you have no authorized_keys at all atm
[01:29:28] but 2 keys in ldap
[01:29:38] presumably you have the same problem as LuaKT
[01:29:40] I added it a few days ago to the other place, that is, to labs
[01:30:05] my.
[01:30:35] Any suggestions? Miscommunication problems among the servers perhaps?
[01:31:02] for the record, ldap is:
[01:31:03] 2048 b7:75:ec:46:fa:01:e0:c9:06:d7:fe:ff:4c:21:2f:1d oleg.alexandrov at gmail.com-1 (RSA)
[01:31:07] 2048 55:2b:49:b7:8c:15:c1:3b:57:70:70:32:ba:65:8d:53 oleg.alexandrov at gmail.com-2 (RSA)
[01:31:42] we don't know what it is. the symptom is no new keys are being pushed from ldap to the authorized_keys file store at all
[01:33:48] hence i said not to make a new key. i knew it wouldn't be propagated :-]
[01:34:18] Well, it failed both with the old key and the new key. :) I think I will give up for now. Perhaps it will come back in a few days?
[01:34:51] But I had the old key for quite a few days, maybe even a week. Oh well. I'll try some other time. Thanks!
[01:35:27] well we could look when it first broke for LuaKT
[01:36:10] Cool.
[01:37:38] LuaKT: first came to ask about it not working 17 UTC on the 28th
[01:37:59] the most recently modified authorized_keys file is Nov 26 17:35 UTC
[01:38:06] so 2 days before he asked
[01:38:11] mathbot: ^
[01:38:17] yep
[01:39:25] Can my key be added by hand by you folks?
[01:40:57] mathbot: maybe... not by me. but some of the people that could help with that also could just fix the problem :)
[01:41:08] (there aren't many people who could place your key manually)
[01:41:27] Should I write an email to somebody? Or file a ticket?
[01:50:27] gah
[01:50:29] i was typing!
[01:53:30] setting 701 on a file should make it readable to no one right?
[01:53:36] it's a cgi script
[01:54:59] i set 700 and it still works fine
[01:56:55] 701 or 700?
[01:57:17] anyway, depends who the file owner is and who's reading it
[01:57:26] 700 is certainly not no one
[02:04:15] ohai notconfusing!
[02:04:18] legoktm: ?
[02:05:21] yes?
[02:05:28] hello
[02:05:47] jeremyb: well, i meant no one but my tool. 700 ended up working which is good enough for me
[02:07:45] notconfusing: just hi :)
[02:08:14] legoktm: i thought you were saying everyone could still read it
[02:08:22] when it was 701 someone else could
[02:15:11] sorry, is bastion down?
[02:15:18] i'm getting connection timeout
[02:15:48] yurik: yes. use bastion2
[02:17:15] jeremyb, thx
[02:37:25] PiRSquared: how are we doing on spam?
[02:37:37] * PiRSquared checks
[02:38:03] Seems mostly handled for now.
[02:38:21] except for the two spammers with shell access
[02:39:15] can't do much about that. they are blocked, right?
[02:39:32] yes
[02:39:50] jeremyb: you can't revoke shell access once it is granted?
[02:39:55] nope!
[02:40:05] your link doesn't do much now that i deleted the pages!
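For reference on the 700/701 exchange above, the permission bits can be decoded with Python's stat module. A minimal sketch (the temp file is just an illustration) showing that 701 is owner rwx, group nothing, world execute-only, which is not the same as "readable to no one":

```python
import os
import stat
import tempfile

# Decode what mode 701 actually grants. A CGI script run as its owner
# can always read an owner-readable file, regardless of the world bits.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o701)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                  # 0o701
print(bool(mode & stat.S_IRUSR))  # True: owner can read
print(bool(mode & stat.S_IROTH))  # False: others cannot read
print(bool(mode & stat.S_IXOTH))  # True: others keep the execute bit
os.remove(path)
```
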
[02:40:11] https://wikitech.wikimedia.org/wiki/Special:PrefixIndex/User:Larygloria
[02:40:20] jeremyb: https://wikitech.wikimedia.org/wiki/Special:ListUsers/Larygloria?limit=3 :P
[02:40:35] the ones with four digits
[02:40:39] both have shell
[02:40:40] yeah, all blocked
[02:41:28] At least they can't do any harm... right?
[02:43:36] they could probably still
[02:43:43] i.e. we should revoke shell
[02:44:42] i wonder if you can update your prefs when you're blocked. e.g. add an ssh key
[02:44:55] I think you can.
[02:46:22] well key updating is broken anyway :-]
[02:46:24] $ for u in larygloria{01,0144,0156}; do groups "$u"; done
[02:46:24] larygloria01 : wikidev
[02:46:24] larygloria0144 : wikidev project-bastion
[02:46:24] larygloria0156 : wikidev project-bastion
[02:46:41] so they have no keys set and they're not in any groups besides bastion
[02:47:01] so all we need to do is revoke shell / rm from bastion :)
[02:52:29] huh, block log makes me think something's wrong with autoblock
[02:53:03] Hopefully the spammers don't even know they were granted shell access
[02:53:20] but I might be underestimating them
[02:53:59] wtf. /me tests
[02:54:34] huh, a few users like this including rschen7754
[02:54:43] why does a custom sig make its way to ldap?
[02:55:26] huh?
[02:56:05] rschen7754: your sig on wikitech?
[02:56:22] or something
[02:56:29] yes, it's my default signature on all other wikis
[04:02:25] Coren: please !log in prod log when booting physical hardware? (or someone else voice an opinion)
[04:02:34] * jeremyb is referring to virt10
[04:04:00] (I think also mail to labs-l would have been warranted)
[04:34:02] does anyone here have experience with maintaining code which is shared between tools?
[04:34:30] stick it in a git repo?
[04:35:43] ln -s
[04:36:45] currently I have my shared code in /home/magog
[04:36:54] but that directory is hidden to the tools
[04:37:00] should I create a shared tool?
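The `ln -s` suggestion above amounts to exposing one shared directory inside each tool's own tree. A hedged sketch of the idea in Python (the paths here are stand-ins, not the real tool-labs layout):

```python
import os
import tempfile

# Stand-in directories; on tool labs these would be e.g. a "shared" tool's
# home and another tool's home (these paths are assumptions for the demo).
shared = tempfile.mkdtemp(prefix="shared-code-")
tool = tempfile.mkdtemp(prefix="some-tool-")

with open(os.path.join(shared, "common.py"), "w") as f:
    f.write("GREETING = 'hi'\n")

# The ln -s step: the tool now sees the shared code at a stable local path,
# and updates to the shared copy are visible to every tool that links it.
os.symlink(shared, os.path.join(tool, "shared"))
print(os.path.isfile(os.path.join(tool, "shared", "common.py")))  # True
```
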
[04:37:06] (@SigmaWP)
[04:37:17] Hmm
[04:37:24] Magog_the_Ogre: You could do what I did
[04:37:34] legoktm, I have never once used git before in my life, and I don't have the foggiest idea where I'd start with it
[04:37:41] Create a tool with your username and then dump anything useful there
[04:37:49] yup, I was thinking about that
[04:37:52] * SigmaWP named his tool "sigma"
[04:37:52] SigmaWP's idea is pretty good. just symlink the common code
[04:37:57] didn't know if I would break some sort of convention
[04:38:10] Well, nobody's chewed me up yet, so I think I'm doing fine
[04:38:37] silly question then
[04:38:49] how do I get my code into the tool directory? I don't think I can sftp into it.
[04:39:12] I just ssh into it and then cp everything
[04:39:28] why not?
[04:39:39] you can sftp
[04:40:11] but you shouldn't use sftp much
[04:41:36] ok thanks everyone
[04:41:59] jeremyb, I reiterate I've never used git before; I might as well just sftp into my home directory and then cp like SigmaWP said
[04:42:15] no, you should learn git :)
[04:42:58] it's almost 12 in the morning. *implying there isn't other time for me to get this done*
[04:43:56] oh, good TZ you have
[09:08:27] any gerrit people around? mediawiki/extensions/JsonConfig wants to be created :)
[09:36:17] Hello, I can't log in to bastion.wmflabs.org. Is anyone else having any problem? The error I get is: "Network error: Connection timed out"
[09:36:31] My internet is okay BTW.
[10:09:32] is there a way to remove all mails in the account?
[10:12:05] ok, found a way :)
[10:42:53] Did /public/pagecounts permanently move to /mnt/pagecounts?
[10:43:59] Tanvir: did you try bastion2.wmflabs.org?
[10:56:11] Just tried and it worked! Thanks a bunch, Whym!
[10:59:44] np
[11:20:05] Meh, http://boogs.wmflabs.org down? Cannot test.
:-/
[12:01:45] paravoid: if you happen to be around, instances on virt10 are apparently all in SHUTOFF state
[12:01:50] which includes bastion.wmflabs.org
[12:44:15] what's wrong with bastions?
[12:44:24] ~> ssh jkroll@sylvester.pmtpa.wmflabs
[12:44:24] ssh: connect to host bastion.wmflabs.org port 22: No route to host
[12:44:24] ssh_exchange_identification: Connection closed by remote host
[12:46:26] using tools-login.wmflabs.org as ProxyCommand does work. did bastion*.wmflabs.org get deprecated or something?
[12:49:40] JohannesK_WMDE: don't know any details, but I heard that you should use bastion2.wmflabs.org for the time being
[12:50:09] whym: ... ah. thanks.
[12:50:25] o.O
[13:36:40] JohannesK_WMDE: use bastion2.wmflabs.org
[13:36:52] bastion.wmflabs.org points to bastion1.wmflabs.org which is dead right now
[13:49:28] hashar: Thanks for adding that to my scrollback buffer. I just needed it.
[14:02:44] dh_python3: error: no such option: --shebang
[14:02:46] I am doomed
[14:02:49] manybubbles: you are welcome
[14:06:26] hashar: A reboot on the wikitech interface should suffice to wake them.
[14:07:36] Coren: oh, mind rebooting the bastion1 instance in the bastion project?
[14:08:00] I have no idea how to use Semantic MediaWiki to select all instances hosted on virt10 though
[14:08:47] Is that bit of data even exposed?
[14:08:59] on my https://wikitech.wikimedia.org/wiki/Special:NovaInstance
[14:09:17] whenever you click on an instance, it lists the virt box
[14:09:23] !log bastion bastion1 and bastion3 rebooted (virt10 failure fallout)
[14:09:33] that is how I found out that those bastion boxes were on virt10
[14:10:09] Ah, I didn't even know that. Makes sense that it'd be though.
[14:10:24] It should be possible to make an #ask to get that list.
[14:10:27] Coren: /public/pagecounts seems to have moved to /mnt/pagecounts at tools-login. Is this a stable setup, or is it going to change again?
[14:10:32] also filed a bug to make bastion.wmflabs.org a round-robin entry pointing to all bastions
[14:10:42] will have to have the ssh host key shared on all three instances though
[14:11:26] krd: It's a workaround; I /will/ keep a symlink back when I get a chance to reboot the tools project though.
[14:11:59] Can you also tell me at which intervals the files in pagecounts are updated?
[14:12:09] * Coren didn't notice that bastion1 was down since he uses iron or tools-login.
[14:12:27] krd: I /think/ it's a few times a week.
[14:13:14] hmm.... Can we have it each day, preferably at a fixed time?
[14:14:21] Hi Coren and everybody!
[14:14:26] oops, /mnt/pagecounts is nearly empty. hmm.
[14:14:32] Hello, Silke
[14:14:44] What's the status of the dewiki database?
[14:14:47] *sigh*
[14:16:30] I haven't been able to get a hold of our DBA for the past 3 days. :-(
[14:16:55] It's urgent enough in my priority list that once it gets to morning in Oz, I'll give him a phone call.
[14:17:41] (in ~7h)
[14:18:21] a dba is a... database admin???
[14:18:26] yes
[14:18:30] ok
[14:18:33] Coren: could apergos help with db stuff?
[14:19:33] hashar: Possibly, but the whole replication setup is sparsely documented. Sean is still in catch-up mode.
[14:20:01] Thanks for the update!
[14:20:23] Asher was a good guy, but the replication was a little rushed on the "document things dammit" side. :-)
[14:20:30] yeah
[14:20:39] that is known as "bug 1 - our documentation sucks"
[14:51:10] Coren, did you get my email?
[14:52:05] Cyberpower678: Probably, somewhere in my sea of email. Remind me the Subject: ?
[14:52:35] Coren, ever since the 503 issues were resolved, the internal links in xtools are broken.
[14:53:14] Ah, yes, that email. Need moar information. Give me a problematic URL?
[14:53:43] Coren, A link that points to https://tools.wmflabs.org/xtools/ec ends up going to http://tools-webgrid-01:4040/xtools/ec/
[14:54:04] Coren, https://tools.wmflabs.org/xtools/pcount/index.php?name=Cyberpower678&lang=en&wiki=wikipedia and then click Edit Counter
[14:55:15] Coren, what could be causing that?
[14:56:23] Don't know yet, but note how the correct https://tools.wmflabs.org/xtools/ec/ doesn't do that.
[14:56:47] Meaning?
[14:57:13] It works if not called as a link, but when typed directly into the browser?
[15:00:15] Coren, ^
[15:02:09] No, it means that it works when the final slash is there.
[15:02:36] (As it must)
[15:03:33] That's weird. It worked before without the slash. Isn't the web server supposed to correct for that?
[15:06:21] Coren, now the edit counter is slow.
[15:10:12] Cyberpower678: It is; and that's probably the cause of the problem. The web server will create a 302 for a URI with a missing slash, but right now it looks like it's doing it /after/ the rewrite for proxying.
[15:12:15] So is the problem on my end, or your end of the tool?
[15:15:20] Coren, ^
[15:15:50] Cyberpower678: Almost certainly at the proxy.
[15:16:20] Ok. Let me know when you figure out what broke.
[15:16:45] Coren, it's taking longer and longer to process a request.
[15:17:31] I can't help with 'longer and longer' unless you give me more detail about what happens that is longer. Database queries? Local processing? Transmission?
[15:20:09] Coren, sorry. When I click on the edit count link, it appears the proxy is taking longer to service the request. The edit counter itself takes only 1.5 seconds to execute but I'm waiting about 10 seconds to see it.
[15:22:58] Coren, now it's stuck.
[15:23:07] Works fine with https
[15:23:34] Coren, then you have a bug when servicing http requests.
[15:24:06] It's not at the network; once the counter data arrives, the rest gets downloaded within 70ms (including, btw, a 404 on an image the webpage tries to use)
[15:24:27] Cyberpower678: Of course there's a bug: HTTP is accepted.
[15:25:37] Getting my edits is pretty consistent at ~2.1 sec.
[15:25:46] Coren, so what do I need to do to fix this?
[15:26:17] I see nothing to fix; what's wrong with 2-3 secs?
[15:26:39] Because HTTPS works but HTTP times out.
[15:27:14] Coren, loaded in 45 seconds.
[15:27:34] Script itself "Executed in 1.22 second(s)."
[15:27:46] It's working fine for me but I note the webpage is trying to load images from toolserver.org!
[15:28:48] I know. That's going to be fixed at some point. The page displays before it attempts to load the image.
[15:29:31] I don't mind 2-3 seconds, but 45 is insanely long.
[15:29:44] I'm not seeing 45 seconds.
[15:29:48] And you're using http?
[15:29:55] Not https?
[15:30:04] Both.
[15:30:28] I'm timing it. It takes 45 seconds to display something.
[15:31:44] Interestingly enough, your script does /not/ do the same thing under HTTP and HTTPS; and is much more variable under HTTP (took 14s this time for me)
[15:32:12] And I think I see why: because of mixed-content prohibition, it doesn't /try/ to fiddle with toolserver.org under HTTPS.
[15:32:28] So you get the result instantly.
[15:32:47] Under HTTP, it tries to fiddle with stuff from toolserver.org and /those/ seem to time out.
[15:33:16] So I'm thinking your javascript is misbehaving.
[15:34:02] Coren, ok. Back to the proxy bug. Why are links without the / being redirected elsewhere?
[15:34:05] ?
[15:35:24] Cyberpower678: I'll look into it sometime today. They're not being redirected "elsewhere", they're just letting through an internal hostname. In the meantime, there is an easy fix: correct your links so that they are not missing the / anymore. :-)
[15:35:56] Coren, easy, but extremely tedious.
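What Coren describes above -- the backend's trailing-slash 302 leaking an internal hostname -- would need the proxy to rewrite the Location header before relaying it. A hedged sketch of that rewrite (the hostnames mirror the ones pasted in the chat; the real proxy code is not shown here, so this is only an illustration of the fix, not its implementation):

```python
from urllib.parse import urlsplit, urlunsplit

INTERNAL_PREFIX = "tools-webgrid"   # internal grid-node names (assumed)
PUBLIC_HOST = "tools.wmflabs.org"   # the public-facing hostname

def fix_location(location: str) -> str:
    """Rewrite a backend redirect so it points at the public host."""
    parts = urlsplit(location)
    if parts.hostname and parts.hostname.startswith(INTERNAL_PREFIX):
        parts = parts._replace(scheme="https", netloc=PUBLIC_HOST)
    return urlunsplit(parts)

# The broken redirect from the chat, and what the proxy should have sent:
print(fix_location("http://tools-webgrid-01:4040/xtools/ec/"))
# → https://tools.wmflabs.org/xtools/ec/
```
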
[15:36:19] It's a bug that the redirect doesn't work right (clearly) but that's a safety against broken URIs being typed manually -- your links should not be pointing to broken places.
[15:36:22] Something I don't have time to do today.
[15:36:29] https://tools.wmflabs.org/xtools/ec is an illegal, improper URL
[15:37:39] Ok.
[15:37:42] Not today.
[15:38:00] I'll see about the redirect during the day though; it's an annoying bug regardless.
[15:45:52] Coren: have you found out why public keys are not generated from ldap to labstore1.pmtpa.wmnet:/keys on /public/keys ?
[15:46:17] iirc that is not documented / unpuppetized and thus needs either some investigation or Ryan Lane to reply
[15:56:05] might be andrewbogott_afk :D
[15:56:51] andrewbogott_afk: you got rid of user accounts from labstore boxes ( https://gerrit.wikimedia.org/r/#/c/98030/ ) and that apparently causes the authorized keys of labs users to no longer be exported on labstore1:/keys https://bugzilla.wikimedia.org/show_bug.cgi?id=57751
[16:13:01] hashar: Andrew probably knows, too.
[16:13:35] hashar: But that changeset is almost certainly unrelated.
[16:18:05] we will see :-]
[16:18:36] well the cronjob sets the UID of files to the user id, must be failing with some "user not found" or something
[16:18:46] but yeah, andrew will find out :]
[16:26:00] hashar: I'm getting a slow start today but will look at that soon.
[16:26:13] Do you know of a specific account or key that was changed after the 28th?
[16:30:01] andrewbogott: jenkins-deploy
[16:30:33] and I think jenkins-slave as well :D
[16:31:00] Ah, right, that's in the bug
[16:31:10] andrewbogott: the files in /keys/ belong to the user, so potentially removing the user from labstore would cause the script that exports the keys to fail because it can't chown or something like that
[16:31:14] Did you create those the 'normal' way via wikitech account creation?
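hashar's chown hypothesis above is easy to illustrate: a per-user export job that looks up each owner's uid will die on the first username missing from the passwd database. A minimal sketch of that failure mode (the actual key-export script is unpuppetized and not shown in the chat, so this is a guess at the mechanism, not its real code; the missing username is hypothetical):

```python
import pwd

def uid_for(user: str) -> int:
    # pwd.getpwnam raises KeyError when the account no longer exists --
    # the kind of error a chown step hits after accounts are removed.
    return pwd.getpwnam(user).pw_uid

print(uid_for("root"))  # 0 on any standard Unix passwd database
try:
    uid_for("jenkins-deploy-gone")  # hypothetical removed account
except KeyError:
    print("user lookup failed; an unguarded export cron would abort here")
```
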
[16:31:26] should spurt some errors in the cron mails though
[16:31:33] jenkins-deploy I can't remember
[16:31:40] jenkins-slave, yeah, created via wikitech
[16:31:52] the credentials for both users are on fenari in /home/wikipedia/docs/labs-jenkins
[16:32:07] the users are used by the Jenkins master to connect and run jobs on labs instances
[16:32:30] had to add an ssh public key which eventually never got accepted since it is never published in /keys/jenkins**/.ssh/authorized_keys
[16:32:45] I am sure looking at the cronjob / cron mail will yield an obvious error message
[16:33:09] the question would be where the hell is that cron being run. And ideally it should be moved back into puppet :]
[16:33:39] I'll have to dig. Probably unrelated to the puppet patch you saw though, since that only affected mhoover.
[16:35:28] yeah, was just a guess
[16:35:34] andrewbogott: Another data point: end users changing their keys on wikitech no longer have their changes reflected on the shared filesystem.
[16:35:36] because that is a change made on the same day that impacts labs
[16:36:02] andrewbogott: So it's not just new users.
[16:36:14] hm.
[16:36:30] if you can add any status on https://bugzilla.wikimedia.org/show_bug.cgi?id=57751 that would be nice
[16:36:39] even if it is to say you have no clue or can't fix it right now :]
[16:37:06] I am off, will be back later on today
[16:38:38] Coren: hey, what's up with labs stuff? Specifically: the de db replication thingy and andre__ mentioned accessing his bugzilla test instance is intermittent (right andre?)
[16:38:56] We had one of the VM backends fail during the weekend.
[16:39:08] That's fixed but will have wreaked havoc with a number of projects.
[16:39:23] ugh, sucky
[16:39:30] The DB thing is separate (afaik, only dewiki is impacted) and will require Sean's gentle touch.
[16:39:55] I'm going to pounce on him the second it's daylight in Oz. :-)
[16:40:03] Coren, oh I see.
Because https://bugzilla.wikimedia.org/show_bug.cgi?id=57642 gets a bit of buzz :)
[16:41:01] Yes. I think we need to have Tim capture Sean and lock him in a room until there's enough documentation about replication for other opsen to help.
[16:41:34] It doesn't help that he had to hit the ground running from day one in a documentation vacuum himself. I'm impressed he hasn't yet run away screaming. :-)
[16:49:21] Tools web is extremely slow for me. anyone else?
[16:50:16] ireas: Try https, most people report it much faster.
[16:50:32] ireas: The old proxy is on its way out, to be replaced by something much nicer.
[16:50:44] Coren, ah, yes, works better now. Thanks! :)
[16:51:13] Coren: think you can review that stuff today? I wrote a replacement for portgrabber that's more generic and works for all tools on all nodes, but we can just make it work with lighty for now
[16:51:39] YuviPanda: Linky?
[16:51:58] YuviPanda: Does it still do the trick of holding the reservation exactly as long as the FD is open?
[16:52:06] Coren: exactly :D
[16:52:08] it does
[16:52:12] moment
[16:52:45] Coren: https://gerrit.wikimedia.org/r/#/c/98352/
[16:52:52] Coren: I'll add an upstart script for it later
[16:52:52] YuviPanda: What is your proxy doing when it doesn't have a map? Shuffle off to one of the -webserver-xx?
[16:53:05] Coren: yup. equal distribution on all three
[16:53:23] Excellent.
[16:53:35] Jenkins didn't like your stuff though.
[16:53:36] Coren: and the list of 'backup nodes' is in puppet as well
[16:53:45] Coren: yeah, that's a stupid stupid rule there
[16:53:46] 80col limit
[16:54:18] Coren: I'll fix that. that's the only reason it is complaining
[16:56:31] There's a couple of other whines about whitespace too.
[16:57:36] * Coren reads code now.
[16:59:36] Ah, now you grant ports from the proxy, and requests are done over TCP? How do you ensure that the port is really unused?
[17:00:05] Coren: so that's the tricky bit I need to figure out now.
I'm trying to figure out how to do that without a race condition
[17:00:18] since if the helper first probes, then starts, we can have something open up in the meantime
[17:00:28] Coren: but that'll crash the app we just opened, since it'll try to bind to that port
[17:00:35] so that sounds okay to me, but I wanted to run it by you
[17:00:56] That'd work iff you make sure that the next request will end up with a different port.
[17:01:12] Otherwise the webserver will restart, get the same port, and crash again.
[17:01:48] Coren: randomize+probe?
[17:02:04] probe by either connecting to it or parsing netproc?
[17:03:57] Well, in the current setup, I simply use a list of ports local to the host, so I can rely on it. You could do the same; the only downside is that the same port can't be used on two hosts, which may become an issue with a lot of webgrid nodes.
[17:04:30] Coren: yup, and it also means a random app can actually just open up a port, since not all port opening is under your full control
[17:04:54] Coren: and the new one means we don't need to have 'webgrid' nodes as such. They can just run on the general exec nodes
[17:05:26] YuviPanda: That's by design; I am /not/ letting people open random apps this way. :-)
[17:05:36] Coren: any reason?
[17:05:42] i mean
[17:05:44] what reason?
[17:05:47] gah, can't english
[17:05:50] what's the reason?
[17:05:59] YuviPanda: Privacy violations. No unproxied incoming connections.
[17:06:13] Coren: err, these are all proxied. no ip is passed back
[17:06:18] Oh, wait, you mean why no random app?
[17:06:32] as in, I want a node app running and proxying.
[17:07:08] Coren: or something of that sort. not just what lighty supports.
[17:07:28] Support issues mostly. I don't want people running random webservers. If your project is complex enough that it needs a setup where you set a webserver up, then you need to spin off to your own project.
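One standard way around the probe-then-bind race discussed above is to skip probing entirely: bind to port 0 and let the kernel atomically pick a port that is free at bind time, then keep the socket open to hold the reservation. This is a generic sketch of that technique, not the actual portgrabber/portgranter code under review:

```python
import socket

def grab_port() -> tuple[socket.socket, int]:
    # Bind to port 0: the kernel allocates an unused port atomically,
    # so there is no window between "checked it's free" and "bound it".
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    port = s.getsockname()[1]
    return s, port  # keep the socket open to hold the reservation

held, port = grab_port()
print(0 < port < 65536)  # True: an ephemeral port was reserved
held.close()
```

The caveat Coren raises still applies: if the helper closes the socket before the webserver binds the number it was given, the race reopens, so the clean variant passes the still-open file descriptor itself to the service.
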
[17:07:46] Coren: my problem was that the current setup completely kills any app that wants to use websockets
[17:08:15] or Go. Or Scala. Or anything that doesn't have first-rate support for CGI or fcgi or apache / lighty.
[17:08:39] Coren: just like how you have webserver start, we can have uwsgi start or go start or whatever
[17:09:18] Way outside the scope of tools.
[17:09:20] Coren: not right now. I just want to make lighty now
[17:09:32] Coren: what. using anything not PHP or Python is 'way out of scope of tools'?
[17:10:53] Coren: wanting to use those languages on tools was the only reason I've been working on this proxy at all
[17:11:22] Wait, what? Why are you under the impression that lighttpd is language-dependent?
[17:12:04] over anything that's not cgi or fcgi?
[17:12:16] Coren: you can't really use websockets over lighttpd either
[17:13:05] Coren: also I don't see why we can't have uwsgi or a node thing running instead of just lighttpd. They're just servers, and once we get the script done for lighttpd they'll be trivial to write
[17:13:46] YuviPanda: Yes, but then those will also have their own grid nodes for resource allocation, and startup scripts under our control.
[17:13:52] why
[17:13:58] startup scripts under our control sure
[17:14:01] why own grid?
[17:14:03] jsub -mem
[17:14:16] Having a node you can only run X on means resource allocation is trivial to calculate. -mem oversubscribes for worst-case scenarios.
[17:14:34] well, that's the case for general scripts now, isn't it? why can't a job also be a webserver?
[17:14:54] worst case we'll add more nodes, and having more general purpose nodes > more special purpose nodes
[17:14:57] Because that will oversubscribe by about 3 orders of magnitude.
[17:15:08] No. General purpose nodes are *expensive*
[17:15:17] is already the current case.
[17:15:17] Special purpose nodes are cheap.
[17:15:34] No it's not. Why do you think lighttpds run on a dedicated node atm?
[17:15:50] That way (and only that way) I can fold the lighttpd footprint to one instance.
[17:15:55] VMs in general are cheap. are you saying that we should not actually support websockets, go, node, etc because VMs aren't cheap?
[17:16:09] Coren: that's because lighttpd is heavy! a node webserver isn't that different from a node bot
[17:16:14] (of which we already have a few running)
[17:16:23] same for uwsgi, or go
[17:16:44] You're not getting me. When allocating on a general purpose node, you have to allocate for the worst-case vmem of every single process.
[17:16:46] the resource alloc difference between a php script and a lighttpd+php isn't the same as the one between a node server and a node bot
[17:17:05] Coren: err, if we have a 'go start', it'll accept -mem since it'll just pass it on to jsub
[17:17:08] When you are doing it on a special purpose node, you can discount the shared executable footprint --- 2-3 orders of magnitude.
[17:17:28] You're not listening to me.
[17:17:43] I am, I'm saying are we so short on resources? and also it's not different from running a bot
[17:17:57] We are, and it is *completely* different.
[17:18:06] Coren, is xtools using the new webserver?
[17:18:12] Cyberpower678: You tell me.
[17:18:18] I don't know.
[17:18:26] Coren, how do I find out?
[17:18:31] what do I have to do to convince you that supporting things that are not php or python is not a bad idea?
[17:18:38] Then probably not, unless you randomly do bits of configuration in your sleep.
[17:19:21] Coren, can you help me set up xtools to use the new webserver. Please don't link me to the instructions, because those didn't work for me.
[17:19:24] YuviPanda: Start by trying to make an argument about what I am talking about?
[17:20:07] your argument, as I understand it, is that we don't have enough resources to run them on the exec nodes because it doesn't account for shared memory?
[17:20:15] YuviPanda: The only place anyone said anything about not supporting things that are not php or python is, seemingly, in your imagination.
[17:20:33] well, there isn't anything that's not php or python running right now, is there?
[17:20:36] YuviPanda: Because I certainly said nothing of the sort.
[17:21:02] Coren: okay, can you please reiterate?
[17:21:12] YuviPanda: That assertion is also false. I know of a couple tcl things running, also some perl.
[17:21:26] YuviPanda: most people program in either python or php
[17:21:39] Betacommand: of course. and 99% of the population also love wikitext.
[17:21:53] Coren, can you help me set up xtools to use the new webserver. Please don't link me to the instructions, because those didn't work for me.
[17:22:21] Coren: going back, *why not*? if it eats up too many resources, we can always take it away
[17:22:28] Cyberpower678: Seems it's already on the new web http://tools.wmflabs.org/?status
[17:22:31] YuviPanda: Things that answer on a port via the proxy will work through webgrid nodes (possibly several different ones), so that resource allocation will be manageable. I.e.: lighttpd nodes, uwsgi nodes sound like a good idea, and so on.
[17:23:06] Cyberpower678: scroll to bottom
[17:23:11] YuviPanda: are you trying to be a smartass? I've seen and run some other stuff on labs
[17:23:13] hedonil, it's not loading
[17:23:27] Cyberpower678: it's slow
[17:23:50] Coren: okay. I still don't understand why they can't run on general nodes, but okay. better than nothing :)
[17:23:52] hedonil, still nothing
[17:23:57] maybe I'll understand some day.
[17:24:14] YuviPanda: We already know it does too much. I am *not* going to start oversubscribing resources just so people can run random web services. Right now, I support lighttpd. I will probably add a couple more as demand grows for them.
[17:24:26] Cyberpower678: takes some time. else qstat will show your httpd
[17:24:44] Coren: that's my other concern.
if I want to run a go-based service, I will have to find 'critical mass' for it [17:25:05] YuviPanda: I can run, without breaking a sweat, 120 lighttpd instances on one dedicated node. I could run ~12 on a general node. [17:25:40] 1708466 0.25944 httpd-xtoo local-xtools r 11/29/2013 15:27:38 webgrid@tools-webgrid-01.pmtpa [17:25:51] Coren, hedonil ^ [17:25:54] Coren: I'm not sure how much of a difference that'll be when you are running node or uwsgi, but I guess we can find out [17:25:58] eventually [17:26:01] YuviPanda: web nodes are for web stuff. exec nodes are for non-web stuff [17:26:29] Cyberpower678: yes, that's the xtools httpd [17:26:35] YuviPanda: https://wikitech.wikimedia.org/wiki/Special:FormEdit/New_Project_Request [17:26:43] hedonil, so is it running on the new webserver? [17:26:43] heh [17:26:57] Cyberpower678: yes [17:27:03] Cool. [17:27:20] YuviPanda: Tools is meant for a very specific use case: simple tools in a managed environment for people who don't want/don't care to do things like set webservers up. [17:27:58] uwsgi start app.py and go start app.go is what tools would be about, vs that link where I need to use puppet and set everything up myself [17:28:37] Cyberpower678: if you have any phpinfo() you can check that, too. e.g. https://tools.wmflabs.org/wikiviewstats/info.php (scroll to bottom) [17:28:46] YuviPanda: And I don't get your assertion that go is somehow special. Are you telling me you can use go in an fcgi setup? [17:29:02] Coren: fcgi is out if you want to run anything with websockets [17:29:35] Any idea why HTTP requests to gerrit.wikimedia.org sometimes 405 from labs? [17:30:05] 40*5*? [17:31:52] * brainwane plunges into putting her first tool on Tool Labs! [17:32:12] error: RPC failed; result=22, HTTP code = 405 [17:32:16] Coren: ^^ [17:32:32] YuviPanda: Yeah, it does. I don't see why that's an issue. Tools provide an environment where you can run things like bots, small web tools, some data analysis, and so on.
It's not meant to be an infrastructure for running arbitrary applications on the 'net. [17:33:04] hmm, websockets are arbitrary applications? [17:33:06] oh well [17:33:31] marktraceur: Huh. That's completely insane; that'd mean you're hitting some proxy that doesn't speak DAV or something. I wish we could see if the IP you're connecting to varies? [17:34:40] I doubt it [17:34:50] YuviPanda: Websockets is just a ridiculously fancy way of opening an arbitrary TCP stream. You could run an IRC or X11 server that way, or any arbitrary service. The only thing "web" about websockets is that the setup protocol starts similar enough to HTTP that it fails cleanly when talking to an httpd. [17:35:09] So yeah, websockets are arbitrary applications. [17:35:13] Coren: I don't think there's any way to, but this happens basically every other time I run git fetch [17:35:19] well, and the fact that nginx supports proxying it out of the box, while it doesn't for TCP [17:35:39] YuviPanda: Sure it can, with an HTTP CONNECT request. [17:35:49] Same difference, really. [17:36:04] Or rather, it happens the first time I try, then it usually works the second time [17:36:12] well, does that mean WS won't be ever supported on toollabs? [17:36:34] Coren, the key-updating thing is fixed… are there any other urgent-type things from the weekend that I should think about before returning to my 'normal' task list? [17:36:46] Coren, the new webserver is impressive. [17:36:53] It is super fast, [17:38:02] Probably not. For one, there is literally no demand for it. And even if there was, it has a number of interesting questions of scope and reasonableness to support. *Do* we want to allow arbitrary services running on tools, and are we ready to support the users in doing so? (IMO, the answer to both of those is no) [17:38:09] Cyberpower678: That was the intent. :-) [17:38:11] well, ok [17:38:15] I understand :) [17:38:36] The page history statistics tool seems to be happy. 
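Coren's description of websockets can be made concrete: a websocket connection opens with an ordinary HTTP/1.1 request carrying Upgrade headers (RFC 6455), which is exactly why it "fails cleanly when talking to an httpd". A sketch of the client side of that handshake (the host and path in any call are made-up examples):

```python
import base64
import os

def websocket_handshake(host, path):
    """Build the client's opening websocket handshake (RFC 6455).

    It is plain HTTP/1.1 with Upgrade headers, so a server that only
    speaks HTTP can reject it cleanly instead of corrupting state.
    """
    # The key is 16 random bytes, base64-encoded; a websocket-aware
    # server echoes a hash of it back in Sec-WebSocket-Accept.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
```

After the server's 101 response, the socket is just an arbitrary two-way TCP stream, which is Coren's point about being able to run IRC or X11 over it.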
[17:39:07] marktraceur: That is so effin' bizarre! Can you warn me the next time you're going to git stuff so I can tcpdump the chatter? [17:40:08] YuviPanda: OTOH, anyone willing to manage their project is perfectly welcome to, provided they otherwise follow the TOS. [17:40:18] Coren, I forget how to set up a tool to use the new webserver. How do I do it again? [17:40:21] Coren: Sure :) [17:40:30] Cyberpower678: As a rule, just 'webservice start' [17:41:31] YuviPanda: Remember that tools has to be a balancing act between uptime, quality of user support, and flexibility. [17:41:33] Cyberpower678: Another way to check if your tool is running on webgrid is to look at the "Server" header in the HTTP response when you access the tool. [17:41:34] Coren: alright! I wish I had known that a few months back, at least wouldn't have had my spirits up [17:41:58] anomie, qstat is already telling me. [17:42:05] Thanks though. [17:42:36] YuviPanda: I still don't get what your issue might be. If you want to play/experiment with websocket stuff, you still can and your proxy is no less useful for it. [17:42:52] YuviPanda: It's just that *tools* isn't the right project for it. [17:42:56] Coren: I want to play with it *without* having to manage all the rest of the infrastructure [17:43:03] Coren: I wanted to play with it in tools. [17:43:13] YuviPanda: In other words, you want /me/ to do it for you! :-P [17:43:15] that was *my* motivating use case for the entire proxy [17:43:26] Coren: no, I *offer* to do it for you - all the supporting work. [17:43:41] Coren, there's a bug. [17:43:58] Coren: but I understand if you don't want to support it [17:44:12] I used webservice start on scottytools and it appeared in qstat as qw and then disappeared. [17:44:16] Coren, ^ [17:44:20] Coren: I am just... a bit sad. [17:44:23] Cyberpower678: Technically, qstat tells you if there's a lighttpd instance running for your tool, while the Server header will tell you if it was actually used.
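anomie's Server-header check can be scripted rather than eyeballed: the per-tool lighttpd on webgrid announces itself differently from the shared Apache. A sketch over an already-parsed header mapping (the function name and sample header values are illustrative):

```python
def served_by_lighttpd(headers):
    """Return True if the response headers indicate the tool's own
    lighttpd answered (webgrid), rather than the shared Apache."""
    server = headers.get("Server", "")
    return server.lower().startswith("lighttpd")
```

This distinguishes the case qstat cannot: a lighttpd instance may exist for the tool while requests are still being served elsewhere.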
Although lighttpd existing and not being used would be a strange situation worthy of alerting Co-ren. [17:44:23] YuviPanda: FWIW, you can replicate the tools infrastructure by deploying the toollabs module too. [17:44:37] Cyberpower678: What does it say in your error.log? [17:45:26] YuviPanda: Well, gridengine isn't quite entirely puppetable, but I'd gladly give you a hand for the bits of necessary manual config. [17:45:32] 2013-12-02 17:41:43: (log.c.166) server started [17:45:32] 2013-12-02 17:41:43: (log.c.118) opening errorlog '/data/project/scottytools/access.log' failed: Permission denied [17:45:32] 2013-12-02 17:41:43: (server.c.938) Configuration of plugins failed. Going down. [17:45:34] Coren: :) [17:45:39] Coren: I don't want to run bots2 :P [17:46:03] Coren: again, I understand you don't want to support a large number of configurations. I really do. [17:46:09] Coren, ^^^^ [17:46:11] I'm not going to keep harping on this. [17:46:33] Cyberpower678: Check the file permissions on /data/project/scottytools/access.log [17:46:33] Cyberpower678: ... okay, you've just quoted it. Have you *read* it? It tells you exactly what is wrong. :-) [17:46:37] I'm still disappointed, and my biggest motivation for working on that proxy is also just gone. but oh well. [17:47:50] YuviPanda: Hey, feel free to bring it up on engineering-l. Maybe your use case is popular enough that I'll get more opsen to manage tools as a result and we can then support that. :-) [17:48:12] Coren, I think there may be a file ownership bug. When you create a new instance, the access.log is created with the owner id being 0 [17:48:18] Coren: heh :D why not just assign the 250 spare opsen we have to tools? :) [17:48:33] Cyberpower678: Indeed, because that's what is needed for apache. [17:48:49] Cyberpower678: 'take access.log' [17:48:59] Already done.
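The failure pasted above is an ownership problem: access.log was created owned by root (uid 0) for Apache's benefit, so the tool's lighttpd cannot open it, and 'take access.log' reassigns it. A sketch for spotting such files ahead of time (the helper name is made up):

```python
import os

def files_not_owned_by_me(paths):
    """Return the paths whose owner uid differs from the current user.

    A root-owned access.log is exactly what made lighttpd die above
    with "opening errorlog ... failed: Permission denied".
    """
    me = os.getuid()
    return [p for p in paths if os.stat(p).st_uid != me]
```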
[17:49:03] It's running now [17:51:20] Coren, when you have a moment please review https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/November#Technical_Operations and contribute? [17:51:38] (And, I wonder if at some point we want to split that into two sections, one for labs and one for tool-labs?) [17:52:26] andrewbogott: On it. [17:52:48] * YuviPanda goes off to prep for meetings and stuff. [17:53:26] I have successfully logged in to tools-login! Yay! [17:53:34] Coren: Not urgent, but I see task@tools-exec-01.pmtpa.wmflabs hasn't been accepting new jobs for several days now. [17:54:05] anomie: By design, I've disabled it to let it drain from jobs so I can reboot it; it's autofs is braindead. [17:54:25] its* [17:54:38] Coren: I thought that might be the case. Although considering some of the jobs on it now are continuous and have been running since October, how quickly will it drain? [17:55:09] I was planning on checking this morning to see if only continuous jobs were left; if that's the case I'll just force them to migrate. [17:55:24] brainwane: Welcome! [17:55:55] andrewbogott: The split might make sense as Tool labs "mainstreams". [17:56:27] brainwane: Are you back from your sabbatical? [17:56:42] (Or are you a different brainwane/incognito?) [17:56:49] ok, time to create a Tool account. Done! missing-from-wikipedia is here! yay. [17:57:23] andrewbogott: I am still on sabbatical! I am currently, as a volunteer, working on "Missing From Wikipedia" http://www.harihareswara.net/sumana/2013/11/13/0 and moving it into Tool Labs. [17:57:44] Ahah! Cool :) [17:59:38] Oh! I didn't know that brainwane nick. :-) [18:00:10] Coren: I haven't come into IRC much since Sept 27 :-) Hope you are doing well. [18:01:10] brainwane: Going crazy with the amount of work with the DC migration, but we'll survive it. [18:01:44] brainwane: Also, now on a req# so that's new. [18:01:55] nod [18:02:27] Coren, http is slow for every page. [18:02:34] https runs just fine.
[18:04:45] If I wanted to use someone else's tool but slightly modified, how would I go about that? [18:05:24] Would I need to create a service group on the site and copy the files? [18:05:38] btw, tools.wmflabs.org feels pretty slow - 72 seconds for a fresh load of that index page [18:05:59] Coren: do you want me to file a bug about that, or do you already know why it's happening/why it's gonna be an ephemeral problem? [18:06:44] brainwane: It's a design problem that'll go away as soon as the new proxy goes into place. Easy peasy workaround: use HTTPS. [18:07:07] Adamrc: That's the easiest way to do it, yes. [18:07:07] Interesting. I have HTTPS Everywhere turned on -- does it not have a rule for tools.wmflabs? [18:07:09] * brainwane investigates [18:08:27] Coren: would that mean the slightly edited tool would be listed on tools.wmflabs.org? Or is it private? [18:09:12] (ah, got it. There is no rule for any wmflabs.org site.) [18:09:13] Adamrc: It'll be listed with the others. [18:12:21] Coren: thanks, I also read that you can't delete tools. Is there no better way when I literally only need to change one line? Also would I need permission from the tool creator? [18:13:00] Recreating the same tool that I'm only going to use once seems kind of bad [18:13:31] Adamrc: End users can't delete tools, but admins can. It's protection against self-nuking, not a prohibition. :-) You wouldn't normally need any sort of permission because everything here is supposed to be suitably open source. [18:13:45] (And if it isn't, that needs fixin') [18:14:23] But perhaps even simpler is to ask the tool maintainer to support your variation? All the more valuable if it's a use case other people might want in the future. [18:15:23] Adamrc: But if it's a single run of the tool, you can probably do it from your home one time without issues. That won't work if it needs to run unattended, though, or has a web interface. [18:16:09] Coren: it's not really a variation.
I'm wanting to remove the limitation put in place by the creator to allow the query to run longer. I presume that's allowed? [18:16:47] Adamrc: Probably, but then there might be a good reason for the limit you might want to inquire about first. [18:16:58] Adamrc: But no, it's not against the rules. [18:17:24] Hmm, http://boogs.wmflabs.org/ still down. Is there anything I can do? :-/ [18:17:29] (yay, I now have a .description up at https://tools.wmflabs.org/ ! This feels like a Real Project now!) [18:17:57] andre__: is the VM refusing to restart? [18:17:58] Thanks Coren [18:18:42] so I am expected to restart myself? Alright, wasn't aware of that. So far I never had problems so I didn't care about restarting ever :) [18:19:32] andre__: Well, not so much "expected" as "you could have and it might have failed". I'll kick it for you if you want, it probably was on the downed box this weekend. [18:21:11] andre__: morning, what'ya restarting [18:21:40] andre__: What's the project name this lives in? [18:21:41] coren: has someone looked into the dewiki issues? [18:21:43] brainwane: i think Roan used to update that [18:21:43] mutante, heh, nothing yet. Could not reach boogs.wmflabs.org via browser [18:21:54] brainwane: re: httpseverywhere rules [18:22:02] Coren, https://wikitech.wikimedia.org/wiki/Nova_Resource:Bugzilla - I'll try later, need to grab some food now [18:22:11] giftpflanze: DBA lives in Oz, should be awake in a couple of hours. [18:22:31] andre__: kk, let me know if you need anything, i'm about to go to 'muzilla' and bbiaw [18:22:34] o.O [18:22:35] mutante: yeah, you're right [18:24:46] andre__: Yeah, boogs lives on the box that failed. Kicking it now. [18:32:51] (03PS1) 10Ori.livneh: Add MySQL credentials for Wikimania scholarship webapp [labs/private] - 10https://gerrit.wikimedia.org/r/98566 [18:34:27] (cc Coren) Is it ok to run "pip install" within my tools account to install stuff I need for my webapp? 
I presume we don't generally want people just installing random stuff off PyPI, but I noticed that pip is already something installed that I can run... [18:35:10] brainwane: file a bugzilla ticket for the install [18:35:14] brainwane: It's acceptable, but I recommend you request it being deployed globally instead (that's not a hard job); if it's useful to you it likely would be useful for others. [18:35:54] brainwane: In fact, using a virtualenv is pretty much the only way if you need an odd version different from the global one. [18:36:40] Coren: got it. OK. I checked by opening up a repl to see whether the stuff I wanted was already on the system and it was! [18:36:44] (Flask and requests) [18:38:36] petan, can you ping me when you have a moment to talk about nagios? [18:39:23] ok, https://bugzilla.wikimedia.org/show_bug.cgi?id=57863 filed to request the pep8 package. [18:44:19] Coren: see RT #6420 for a new pep8 package being requested [18:44:25] backport by hashar [18:44:31] links tickets [18:47:16] Coren, do you have some time to deal with the Java servlet thing? [18:47:49] Silke_WMDE: ping [18:47:57] pong [18:48:05] https://de.wikipedia.org/w/index.php?title=Wikipedia:Kurier&diff=125064431&oldid=125057865 where when? here? [18:49:25] the office hour? [18:49:30] brainwane: Commented on the bugzilla; I expect you only need it on the dev environment and not on the exec nodes? [18:49:47] no, that marc announced that dewiki will be repaired tomorrow [18:49:55] * brainwane commented back, Coren  [18:50:10] I don't know what the exec nodes are! so, you are probably right. [18:51:06] see y'all later. [18:52:00] giftpflanze: I'm looking for the source... hang on [18:52:14] I didn't say it'd be repaired tomorrow, I said I'll hop on the DBA's back today and won't let up until it is. :-) [18:52:41] :D [18:54:01] giftpflanze: here, today 15:14 - 15:20 [18:54:53] Coren: Oh. I translated it into "should be fixed tomorrow" [18:55:12] ireas: I will a bit later this afternoon? 
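brainwane's REPL check ("whether the stuff I wanted was already on the system") can be done non-interactively; a sketch (the function name is made up):

```python
import importlib.util

def missing_packages(names):
    """Return the importable-module names from `names` that are not
    available in the current environment -- the same check as trying
    `import flask` in a REPL, minus the interactive session."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

Anything this reports missing is a candidate for a global-install request per Coren's advice, or for a virtualenv if you need a version different from the global one.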
~2h from now? [18:56:06] Coren, afternoon? night! :D Well, I can’t promise I’ll be online then, but I’ll try, okay? [18:56:21] ireas: Oh, yeah, timezone fun. [18:56:25] :D [18:56:40] ireas: Otherwise, you can hit me tomorrow starting 14h UTC [18:56:53] Coren, okay, thanks! [18:59:32] Coren: I proposed to invite people to an office hour to collect feedback and proposals. Would you join? There's no date yet. [18:59:44] Silke_WMDE: I'll certainly be there. [18:59:58] Silke_WMDE: Invite me when you know the time. [19:00:16] ok, cool. [19:04:42] mutante: The debs aren't in ~hashar/debs anymore. :-( [19:06:40] Coren, what is going on here http://ganglia.wmflabs.org/latest/?c=tools&h=tools-login&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [19:07:23] Cyberpower678: I'm guessing I have to go do a round of 'kill-the-bot-that-shouldn't-be-running-here' [19:07:48] That explains why login hung up on me. [19:08:50] Coren, who's eating it all up? [19:09:43] The usual. Someone running a bot in screen. [19:10:01] Coren, didn't you say that people who ran it on login instead of the grid will have their access suspended? [19:10:20] Cyberpower678: I did. And they do; it's not the first time I have to do it. [19:14:16] Coren: hmm, i imagine he wanted to move away from noc.wm because it's tampa and put it elsewhere [19:14:38] Is the dewiki still broken Coren? [19:14:53] Coren: no wait https://noc.wikimedia.org/~hashar/debs/pep8_1.4.6/ [19:14:58] isn't that it? [19:15:06] Cyberpower678, yes, it is [19:15:59] mutante: Oh duh! I looked in his actual home, not public_html. :-) Nevermind me. :-) [19:16:10] Thought so. [19:17:32] Coren: aah, also see there is both /pep8_1.4.6-1.1 and /pep8_1.4.6 .. not sure which is right [19:17:44] because i just gave you the other link [19:18:00] just go to parent and see [19:18:48] -1.1 is newer. [19:18:55] Cyberpower678: yes it is!
and that's why we are at the ready - listening to some music instead: https://commons.wikimedia.org/wiki/File%3ARussian_Anthem_chorus.ogg [19:19:03] andrewbogott: thank you for the keys on labs :D [19:19:20] hashar: working now? [19:19:36] andrewbogott: yes it does [19:20:33] andrewbogott: such a pity the cron is not puppet managed, would be nice to add it in [19:20:51] hashar: ^ pep8 ..your package RT ticket linked to old BZ ticket where sumanah requested it to be installed globally.. [19:20:57] hashar: Yeah, since it's marked for death probably not worth spending time on. [19:21:43] andrewbogott: even without Gluster you will still have to generate the key files, won't you? [19:22:05] hashar, yeah, on NFS. Those bits /are/ puppetized though. [19:22:05] mutante: ah that is pep8_1.4.6-1.1 [19:22:14] andrewbogott: awesome :-] [19:22:40] mutante: the new minor debian version generates a python3-pep8 package which I eventually need, sorry for the confusion [19:23:30] hashar: toollabs is already using the new NFS keys I believe. [19:24:12] andrewbogott: if you need other candidates we can migrate integration / deployment-prep to it as well [19:24:22] both projects are candidates for migration to eqiad as well :] [19:24:35] hashar: no worries, i just wanted to make sure we can kill two birds (tickets) with one stone, since Coren was looking [19:24:44] and thanks for the backport [19:24:55] mutante: I am not sure whether pep8 should be installed globally [19:25:25] hashar, if by 'globally' you mean 'throughout toollabs' then I'd say it should. [19:25:27] that is a utility to verify python coding style. Might make sense on labs, but probably not in production. [19:25:34] Oh, yeah, not on prod. [19:25:42] * andrewbogott buts out [19:25:48] um... [19:25:49] * andrewbogott butts out [19:25:53] yeah on tool labs that will be fine.
Might want to add pyflakes as well [19:26:04] it does lame static code analysis [19:26:17] * hashar waves at andrewbogott [19:27:33] i agree i didn't know what "globally" was supposed to really cover [19:27:46] but of course one thinks about labs first for this kind of thing [19:28:31] out again, daughter crying [19:28:33] .. [19:30:02] mutante: In context, "globally" meant "for all tool labs projects" [19:30:57] Coren, with FastCGI, can we up the memory usage limit to 3 Gigs? [19:31:12] Coren: makes sense, so nobody ever mentioned prod.. yep [19:31:20] Cyberpower678: ! [19:32:16] The webgrid has so much available memory. [19:32:32] Cyberpower678: 3G of ram? For a web page? Goodness no! [19:32:44] Cyberpower678: Are you just *begging* for a DOS? [19:33:11] It'll allow Page History Statistics to process pages with greater numbers of revisions. [19:33:55] Cyberpower678: That needs to be queued and offlined; you are *not* doing something this heavy upon a simple unauthenticated web request. Just think of what will happen if a crawler hits that! [19:34:44] How about 1G? [19:35:09] Cyberpower678: If you're working with datasets that immense, you need to start rethinking the architecture; it most certainly is not appropriate for a web service. [19:36:26] Cyberpower678: Have a webservice to queue up requests and serve results and a single (on-grid) job to work on the queue, I'd say. [19:36:43] Hitting a web page shouldn't cause that kind of load, ever. [19:36:54] Ok. [19:37:02] curls [19:37:32] If you wanted it to be very cool, you'd cache results too. Extra points for doing cumulative. :-) [19:40:09] andre__: Boogs is back up, but I don't see a webserver running. Perhaps it doesn't fire automatically? [19:43:29] Coren: httpd already running [19:44:05] andre__: it's back [19:45:12] The UI on boogs looks /much/ less like programmer-designed UI. :-) [19:47:28] Coren: heh, yea, changes from boogs eventually end up on prod..
we're getting there step by step [19:47:57] we sorted out a bunch of old custom stuff first to get closer to upstream.. [19:48:45] about to merge "Better instructions on top of enter_bug.cgi, link to guided form" by Tholam [19:50:42] Cyberpower678: sup? Fatal error: Maximum execution time of 30 seconds exceeded in /data/project/xtools/public_html/articleinfo/base.php on line 129 [19:51:10] hedonil, link? I'm working on that tool right now. [19:51:41] Cyberpower678: https://tools.wmflabs.org/xtools/articleinfo/ Page: Berlin [19:52:06] Coren: you can get pyflakes in Tools Labs dev environment as well https://gerrit.wikimedia.org/r/#/c/98594/ :D [19:52:07] Loaded in .5 sec [19:52:22] Coren, I'm testing all those weird funky ideas on boogs.wmflabs.org so I hope that looking less like programmer-designed is a compliment :P [19:52:47] andre__: It is. "Never trust a dev to do graphics or UI." [19:53:00] heh [19:53:10] As a GUI dev I take no offence to that [19:53:39] andre__: https://bugzilla.wikimedia.org/enter_bug.cgi?product=Wikimedia see new message. deployed [19:53:43] Damianz: But then, you aren't "a dev". You're a "UX person". :-) [19:53:50] re https://bugzilla.wikimedia.org/show_bug.cgi?id=52696 [19:54:01] Cyberpower678: Article: Berlin, Wiki: de, start: 2013-08-21, end: 2013-11-01 [19:54:15] hedonil, can you give me an exact link? [19:54:18] mutante, nice nice. Thanks [19:54:27] Coren: Nah, I work on like render engine and backend feature stuff... and make the graphics team hate me with my mock ups of what the UI might look like :D [19:54:46] Cyberpower678: https://tools.wmflabs.org/xtools/articleinfo/index.php?article=Berlin&lang=en&wiki=wikipedia&begin=2013-08-01&end=2013-11-01 [19:55:42] Coren, Fatal error: Maximum execution time of 30 seconds exceeded [19:55:45] * Damianz thinks we need to get hashar a bit of ocd to re-align all those comments [19:56:02] Can we at least up that to 1 minute, Coren? [19:56:15] is the Bugzilla IRC bot gone?
[19:56:29] Cyberpower678: is the date format correct? [19:56:41] nevermind [19:56:54] It would return an error if it wasn't. PHP is very flexible with date inputs. [19:57:25] so flexible that you never know what date PHP ends up returning back to you [19:58:04] (03CR) 10BryanDavis: [C: 031] Add MySQL credentials for Wikimania scholarship webapp [labs/private] - 10https://gerrit.wikimedia.org/r/98566 (owner: 10Ori.livneh) [19:58:16] Cyberpower678: what are the defaults? without date parameter it seems to work fine. [19:58:45] Cyberpower678: maybe time for some input validation? [20:02:17] !jenkins mwext-Collection-OfflineContentGenerator-node_modules-jslint [20:02:17] https://integration.wikimedia.org/ci/job/mwext-Collection-OfflineContentGenerator-node_modules-jslint [20:03:22] hedonil, very peculiar [20:07:02] andre__: and the attachment warning is in prod as well now [20:13:45] Coren: do you know anything about ongoing 503 errors out of the varnish servers on beta labs, particularly for larger pages? [20:16:40] chrismcmahon: I've seen chatter on the mailing lists about it; it seems to be related to a kernel bug. [20:17:06] Faidon is the one with the know. [20:17:20] betalabs? [20:17:23] nope [20:17:35] that would be something labs-specific most probably [20:18:12] paravoid: yes, here is a recent one: https://saucelabs.com/jobs/6d65f1d28c7340c4bb03794010d83612 [20:18:16] I was expecting it'd be the same issue. [20:18:48] paravoid: is there a generic fix for this that beta does not have then? [20:18:53] But then again, the varnish I know best is the one you spread on wood. [20:18:54] no [20:19:13] I haven't done anything on betalabs for many months [20:23:29] (03CR) 10Ori.livneh: [C: 032 V: 032] Add MySQL credentials for Wikimania scholarship webapp [labs/private] - 10https://gerrit.wikimedia.org/r/98566 (owner: 10Ori.livneh) [20:24:15] paravoid: any idea about what we might do to stop those 503 errors from varnish on beta labs?
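hedonil's input-validation suggestion above is language-independent; xtools is PHP, but the idea, sketched here in Python with a made-up helper, is to parse the date parameter strictly and fall back to a default instead of letting a lenient parser guess:

```python
from datetime import datetime, date

def parse_date_param(value, default):
    """Strictly validate a user-supplied YYYY-MM-DD parameter,
    returning `default` for anything that does not parse."""
    if not value:
        return default
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return default
```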
[20:29:29] chrismcmahon: it might be a >30s page load [20:29:31] working on that [20:41:03] paravoid: I added you to https://bugzilla.wikimedia.org/show_bug.cgi?id=57249 fwiw [20:41:16] chrismcmahon: I just pushed https://gerrit.wikimedia.org/r/98612 [20:41:22] chrismcmahon: it might help [20:41:35] it's hard to know since I haven't debugged betalabs at all [20:50:08] Can projects receive mail on tool labs yet? [20:56:47] chrismcmahon: that change should be deployed everywhere in at most 10 minutes more or so [20:57:17] thanks paravoid I'll be watching to close the bug for it I hope [20:57:31] valhallasw: Once we get the new mail infrastructure as a consequence of the move to the primary DC (i.e.: ~Feb) [20:58:01] Coren: .......ok [20:58:23] valhallasw: Do you have an urgent need? I might be able to reshuffle order of a few things if so. [20:58:50] Coren: well, for the nlwikibots project on the toolserver we used the project mail to coordinate [20:59:10] Coren: but maybe it's easier to ask for a lists.wikimedia.org address for that [20:59:23] valhallasw: I was about to suggest just that; it's also easier to maintain. [20:59:55] Not necessarily. The project mailing list would be just whoever is part of that project [21:00:00] valhallasw: Incoming mail is still on the map since that's the easiest way for endusers to talk to maintainers. [21:00:21] valhallasw: If you have a mailing list, it'll be trivial to direct the tool email to it once that's ready. [21:00:35] I'm a bit disappointed the TS will almost be killed before mail is up and running, but that's not your fault. [21:00:48] Sure. [21:01:03] * valhallasw finds out who to bug for a mailing list [21:01:09] valhallasw: Thehelpfulone [21:02:15] andrewbogott: How do you feel about getting that LDAP mail thing ready sooner rather than later? [21:02:28] andrewbogott: (Aka: how does your workload look?) [21:02:38] valhallasw: just make it a bugzilla in the mailman component .. 
and it'll be handled [21:02:45] mutante: ok! [21:03:36] Coren: I don't immediately know what you mean by 'LDAP mail thing' -- but I can probably work on it sometime soon if it isn't enormously complicated. [21:04:37] andrewbogott: In re the long conversation we had with Mark and Ryan for mail to/from tools. Basically, it means having an email entry in LDAP for every service group and a UI to enter and validate it. [21:05:02] andrewbogott: So as to avoid the trickery with .forward. [21:05:17] Coren: Ah, I think I tuned out that conversation when it happened :) That doesn't sound too bad though. [21:05:32] You would be writing the part where the entry actually does something? [21:05:37] andrewbogott: Yep. [21:06:25] be back later [21:09:04] Coren, you tell me when you have some free time? [21:09:25] ireas: I have a little bit of time I can give you now. [21:09:57] :D [21:10:19] okay, so as far as I see, I need a dedicated Java application server anyway, don’t I? [21:10:41] hey #-labs I'm having trouble getting multicast from one of my nodes to another [21:11:12] manybubbles: I honestly don't know how well (if at all) multicast is supported by our current openstack networking layer. [21:11:30] Coren: it was working for a while..... [21:11:47] that is kind of a pain because that is how we use elasticsearch in production [21:11:55] ireas: Probably, the primary question I think we need to ask is whether you are best served by the tool labs or whether it'd make more sense to have your own project. [21:12:44] Coren, well, it’s only one single tool, five classes or so, not even developed by me … so I’d be glad if it could be done within the tools project [21:13:01] manybubbles: I could see many ways to work around the issue, if it came to that (GRE tunnels, anyone?); but I can look into it if you want tomorrow if it's still biting you. [21:13:47] Coren: I'll send you an email if I get stuck.
thanks [21:14:18] ireas: Have you tested it in an environment other than toolserver yet? If I had a clearer picture of what is needed, I can help you set it up. [21:15:16] Coren, no. but it does not need database access or anything, just a server that can host a web archive (WAR) [21:16:14] ireas: It might make sense to create a general-use tomcat then; while I don't think there are many of them I'm certain there is more than one javaish tool. [21:16:59] Coren, for me, that would be great [21:17:41] ireas: Tomcat is finnicky in multitenant setups though, and I do /not/ want to bring jboss in the picture. [21:17:54] * Coren ponders. [21:19:40] Coren: just extend the current NewWeb setup? [21:20:07] valhallasw: You mean by firing up a tomcat-per-tool? [21:20:11] Yep. [21:20:26] * Coren hears the cries of pain of the vm infrastructure. [21:20:34] Not exactly memory-friendly, but hey. [21:20:37] I suppose if we have relatively /few/ of them. [21:21:01] The advantage to this is that this should be relatively simple to implement and deploy. [21:21:28] That is, it's a simple variation on the webgrid theme. [21:21:52] Only it does tomcat rather than lighttpd. [21:23:50] It's a severe waste of resources, but OTOH there should be relatively few of them so the maintenance win might well offset it. [21:24:12] Coren: in my previous life I advocated for all services to embed their dependencies and just run on the command line/init script. in hte case of java this meant bundling jetty [21:24:18] Bringing in a shared tomcat setup would be like stabbing yourself [21:24:55] Damianz: Yeah, that's what I mean by "mainetance win". Not stabbing oneself is a win. [21:27:00] ireas: Sounds like I have a reasonable enough plan to move forward. Will you be in a position to test things for me with your app during the week? [21:27:50] Coren, yes, but if I encounter problems, I might have to contact the original author to fix it. but in general, yes. 
[21:28:36] ireas: Do you know if tomcat 7.0.26 does the trick? [21:31:42] Coren, I have not tested yet, but I see no reason why it should not work. (If it doesn’t, it would be my task to fix the tool’s code.) [21:57:59] In my app on Tool Labs, I'm trying to check "does an article with pagename foo exist on en.wiki". If I'm trying to check that on the replicated databases, is there a table I should be looking at with a preexisting index? what is the fastest way to do this? [21:58:09] "libroken" (kerberos), whenever that scrolls by in like apt upgrades .. i see "broken" [21:58:47] MariaDB [enwiki_p]> select page_id from page where page_title = 'MariaDB'; takes 10 seconds to give me a result - there must be a faster way [21:58:59] I don't really need the pageID, after all [21:59:53] andrewbogott: any ideas? [22:00:00] brainwane: you dont wanna use api? [22:00:10] mutante: I just want to do whatever is faster. [22:00:23] I thought I would test the API and the db methods to see which way is faster [22:00:29] brainwane: http://stackoverflow.com/questions/2439824/check-if-a-mediawiki-page-exists-python [22:00:35] sometimes I wanna look up, like, about 2000 nouns [22:00:40] i found that when i wanted to give you an API example [22:00:49] to check for DOESNOTEXIST in the xml [22:00:55] see answer 4 [22:00:57] right, I have that running [22:01:27] https://github.com/brainwane/missing-from-wikipedia/blob/master/webapp/missing.py is currently running with the API, checking for the "missing" attribute [22:01:37] not sure about an optimized query [22:01:43] mutante: do you think it'll be faster .... ah, ok [22:01:44] and you should expect that to be faster [22:01:55] dunno yet, comparing both was a good idea [22:02:03] sorry, Daniel, do you mean I should expect an optimized SQL query to be faster? (that's my intuition) [22:02:09] brainwane: I don't know much about the wiki databases… it should be possible via the DB but, no idea how it's structured :( [22:02:19] mlergh. 
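brainwane's API approach (checking for the "missing" attribute) comes down to reading the marker that an action=query request sets on nonexistent titles; the stackoverflow answer's DOESNOTEXIST trick is the XML-output equivalent. A sketch over a sample parsed JSON response (the response shape is the action API's titles-query format; the sample page data is illustrative):

```python
def missing_titles(api_response):
    """Given a parsed action=query API response, return the titles the
    API flagged as missing, i.e. pages that do not exist."""
    pages = api_response["query"]["pages"].values()
    return sorted(p["title"] for p in pages if "missing" in p)

# Illustrative response: nonexistent pages get negative keys and a
# "missing" marker; existing pages carry their real page id.
sample = {
    "query": {
        "pages": {
            "-1": {"title": "No such article", "missing": ""},
            "12345": {"title": "MariaDB"},
        }
    }
}
```

The API also accepts many titles per request (joined with "|"), which matters for bulk checks like the ~2000-noun batches mentioned above.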
Is Australia up yet? [22:02:29] brainwane: Will your tool also /write/ to the db or a live wiki? [22:02:52] no, no writing [22:03:00] (at least, no writing to a wiki or DB) [22:03:09] Ah, either one should be OK then. [22:03:12] * brainwane tries to remember sean's nick) [22:03:29] uh, sorry about the unbalanced paren [22:03:38] It's 9AM in Sydney… I don't know what hours he keeps though :) springle, you up? [22:04:10] hm, he has a bouncer in wikimedia-operations but says 'springle-away'. [22:04:11] brainwane: hey, try to use this sophisticated tool: https://tools.wmflabs.org/wikiviewstats/index.php?lang=en&page=MariaDB ; if it's not there, it doesn't exist ! I promise 8-) [22:04:23] brainwane: it might need an index to make it faster.. yes [22:04:40] but i don't know that much about the db structure either [22:05:15] hedonil: I am now looking at that site and trying to figure out how to use it :-) [22:05:46] brainwane: die index ist namespace, title. so simply add page_namespace=0 in where clause [22:06:25] there you go, index will make a huge difference [22:06:27] denglish, the index is ..... [22:06:38] Merlissimo: I enjoyed feeling as though I knew German! [22:07:16] * Merlissimo is talking german on telephone while writing in english here [22:07:41] THAT IS WAY FASTER. Danke! [22:08:14] brainwane: you also have pep8 now.. btw.. bug 57863 [22:08:38] Merlissimo: you ruined my efforts :P [22:09:03] Merlissimo, by the way: how to figure out? SHOW INDEX FROM page; does not work [22:09:26] thank you mutante! [22:10:02] ireas: show create table /view [22:10:06] i know it from toolserver and mediawiki installations. haven't looked at maria@labs [22:10:17] ah, thx hedonil! [22:10:27] ireas: show index from page from wikidb; ? [22:11:29] you can let mysql test your query with explain.
the result should contain the usable keys and indexes [22:13:17] Merlissimo: hmmmm [22:13:18] MariaDB [enwiki_p]> EXPLAIN select page_id from page where page_title = 'MariaDB'; [22:13:18] ERROR 1345 (HY000): EXPLAIN/SHOW can not be issued; lacking privileges for underlying table [22:13:26] :( [22:14:19] hedonil: aha! I understand. So, your site is good for checking whether an *individual* page exists! but I will be helping people do those checks in bulk [22:14:19] brainwane: are you on the _p table? [22:14:23] hedonil, hm, I don’t see the indexes there … anyway, if I once need to know it, I’ll ask one of you ;) [22:14:36] hedonil: I am indeed, per https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help [22:14:53] * hedonil checks that again [22:15:52] hi kaldari I met someone today who is super into Guarani [22:15:58] hedonil: but your tool wikiviewstats returns true although the page could be deleted or never existed [22:16:37] Coren: is it me or is the webserver really slow? [22:17:11] Betacommand, have you tried using HTTPS? [22:17:30] ireas: no [22:17:33] ireas: you have to issue that command on the xxwiki database - not xxwiki_p [22:17:38] Betacommand: it is faster right now using HTTPS [22:18:22] * valhallasw waves to brainwane  [22:18:27] hi valhallasw! [22:18:59] ireas: http://pastebin.com/xfyyL616 [22:19:02] brainwane: if you want to select from page, make sure to use page_title *and* page_namespace [22:19:03] Betacommand, see my question at 17.49 and Coren's response [22:19:14] (maybe someone else responded already, but I did not see that directly) [22:19:15] Coren: our old friend http bug is back [22:19:15] valhallasw: will do!
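The advice above — constrain page_namespace as well as page_title so MariaDB can use the composite (page_namespace, page_title) index instead of scanning every title — might look like this in a tool. This is a sketch under assumptions: pymysql is just one possible client library, and the host/credentials follow common Tool Labs conventions rather than anything verified here.

```python
# Sketch of the indexed existence check: filtering on both page_namespace and
# page_title lets the query use the composite (page_namespace, page_title)
# index. Names and connection details are illustrative.
EXISTS_SQL = (
    "SELECT EXISTS("
    "SELECT 1 FROM page "
    "WHERE page_namespace = %s AND page_title = %s)"
)

def normalize(title):
    """DB titles use underscores where the display title has spaces."""
    return title.strip().replace(" ", "_")

def page_exists(conn, title, namespace=0):
    """True if the page exists; conn is any DB-API connection to enwiki_p."""
    with conn.cursor() as cur:
        cur.execute(EXISTS_SQL, (namespace, normalize(title)))
        (exists,) = cur.fetchone()
    return bool(exists)

if __name__ == "__main__":
    import pymysql  # one option; any MySQL DB-API driver works the same way
    conn = pymysql.connect(host="enwiki.labsdb", db="enwiki_p",
                           read_default_file="~/replica.my.cnf")
    print(page_exists(conn, "MariaDB"))
```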
[22:19:29] hedonil, ah, great :) thx [22:19:45] brainwane: there is an index on (page_namespace, page_title) which means that it needs to know the namespace before looking up the page title [22:20:04] brainwane: basically, it has 15 big books of page titles, and needs to know in which one to look [22:20:11] right [22:20:12] got it [22:20:13] thanks! [22:21:40] brainwane: Do they use the Guaraní Wikipedia? [22:22:22] kaldari: this person is working on a project to help add content to Guarani Wikipedia, yes [22:22:55] that sounds cool [22:23:27] Now if only someone would start a Mayan Wikipedia [22:40:12] Merlissimo: That's true, but I only promised that it never existed if not there:-D .sales. [22:42:53] Merlissimo: But my new customer brainwane has gone anyway..:'( [22:43:23] hedonil: [22:43:26] ? [22:43:28] I'm here! [22:43:31] kaldari: after 2012? [22:43:34] hedonil: btw do you have some rewriting enabled that allows creating internal links on wikimedia wikis to your results? [22:45:12] brainwane: no why? nothing was invented. [[toollabs:toolname]] is it. do you mean that? [22:45:58] sth. like [[toollabs:wikiviewstats/dewiki/MariaDB]] [22:48:18] mutante: The beginning of the new b'ak'tun would be the perfect time to start it :) [22:48:54] kaldari: hehe, excellent:) [22:49:23] creating an external link is so much wikitext [{{fullurl:toollabs:wikiviewstats/index.php|lang=de&page=MariaDB}} wikiviewstats:MariaDB] [23:00:19] ok, yes, select exists (select page_id from page where page_title = 'MariaDBBBBBBBBBBBB' AND page_namespace=0); runs in like 0.2 seconds. Way faster than API [23:02:11] brainwane: yeah, the magic of the composite index! [23:03:12] riiiight, "composite" - I was trying to remember that word, hedonil! [23:03:44] brainwane: if you are doing bulk requests, create a temp table and join it with page. that will be much faster overall [23:04:03] brainwane: let's play a hymn for that!
https://commons.wikimedia.org/wiki/File%3ARussian_Anthem_chorus.ogg [23:05:46] hedonil: Does that particular song have a specific connection to composite indices? [23:06:28] brainwane: no. it's the only hymn I currently have in stock. [23:06:57] All right! [23:07:08] :-D [23:07:30] Merlissimo: do you use an ORM or do you just do stuff in raw SQL in your applications? [23:07:32] (or some other method?) [23:09:43] raw sql [23:10:04] nod [23:10:58] I was thinking of using SQLAlchemy. But it seemed like a lot of stuff to mess with. Merlissimo do you have an example app where you create a temp table & join it with an existing table (that's already in one of the MediaWiki DBs)? I learn better from examples sometime [23:11:01] sometimes* [23:11:41] I know about https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Steps_to_create_a_user_database_on_the_replica_servers [23:12:07] which programming language are you using? [23:12:26] (sometimes I will be just doing checks on, like, 1-20 pagenames, and sometimes it'll be like 2000. I may have a conditional to switch from one method to another based on # of items to check) [23:12:32] I'm using Python, Merlissimo [23:13:46] Hi, does anybody know how to install JSDuck in Tool Labs? https://github.com/senchalabs/jsduck , it uses RubyGems, I don't know how it works [23:16:28] brainwane: i think we should go in query [23:19:15] hedonil: can we make a deal: you add the possibility to link to sites using internal links and i raise your tool traffic by a factor of three? [23:20:10] Merlissimo: sounds like a deal [23:23:59] Merlissimo: to be honest - currently I'm just a bit lazy - my plans for this week are: release the first api and invite others to join the tool [23:24:30] hedonil: do you have some traffic statistics? so that i know the value i have to reach? [23:25:09] Thanks all!
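Merlissimo's bulk suggestion — load the candidate titles into a temporary table, then join it against page so one round trip answers all 2000 checks — could be sketched like this in Python. All table and function names here are hypothetical; and if TEMPORARY tables turn out not to be permitted on the replicas, a regular table in the tool's own user database (created per the Help page linked above) works the same way.

```python
# Sketch of the bulk check: one INSERT plus one LEFT JOIN instead of
# thousands of single-row queries. Names are illustrative, not taken from
# any existing tool.
def placeholders(n):
    """Build a VALUES list like (%s),(%s),... for n titles."""
    return ",".join(["(%s)"] * n)

CREATE_SQL = ("CREATE TEMPORARY TABLE candidates "
              "(title VARBINARY(255) NOT NULL PRIMARY KEY)")

# Rows where the join found no page row are the missing titles.
MISSING_SQL = ("SELECT c.title FROM candidates c "
               "LEFT JOIN page p ON p.page_namespace = 0 "
               "AND p.page_title = c.title "
               "WHERE p.page_id IS NULL")

def missing_in_bulk(conn, titles):
    """Return the subset of titles with no mainspace page."""
    titles = [t.strip().replace(" ", "_") for t in titles]
    with conn.cursor() as cur:
        cur.execute(CREATE_SQL)
        cur.execute("INSERT INTO candidates (title) VALUES " +
                    placeholders(len(titles)), titles)
        cur.execute(MISSING_SQL)
        return {row[0] for row in cur.fetchall()}
```

For the 1-20 pagename case, per-title queries (or the API) are fine; the join only starts to pay off as the batch grows.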
off for the night [23:28:33] Merlissimo: unfortunately not, since this tool http://tools.wmflabs.org/awstats/cgi-bin/awstats.pl?framename=mainright&output=urldetail no longer seems to show tools using the new web [23:30:38] Merlissimo: but we can go by httpd cpu time: http://tools.wmflabs.org/?status currently 30m40s in ~8 days [23:31:09] hedonil, by the way, maybe Intuition (an i18n system for Toolserver and Tool Labs) is something for you: https://tools.wmflabs.org/intuition/ [23:31:52] * hedonil looks into it [23:38:38] Merlissimo: looks pretty cool. but if I select for example greek, not all of the L10n strings are being translated. but yeah cool. [23:40:26] Merlissimo: right now I'm hiring mother-tongue translators, japanese and russian, waiting for a response [23:41:38] Merlissimo: the chinese, spanish and italian translators have just begun [23:42:49] Merlissimo: the swedish translator has already finished his job [23:44:32] adding vietnamese should be easy. there is a viwiki admin who seems to love translating everything. ;-) [23:46:18] Merlissimo: yeah, translate ALL the things! [23:56:30] Merlissimo: back to the intuition tool. yes, great approach. If tool developers would prepare their tools properly for internationalization, one could use some shared l10n resources. [23:58:26] hedonil, that’s one aspect. and: you don’t have to care about translations. and the users can set their preferred language once instead of once for each tool [23:59:06] hedonil, a Python port of Intuition is on my to-do list (/cc Krinkle – I did not forget the WP:PB nl translations! but I haven’t got the time! :D) [23:59:39] :) [23:59:52] ireas: :-D