[00:53:20] <^d> Friendly reminder: gerrit coming down in about 40mins for update. [04:09:17] Coren, ping again [04:10:12] hedonil1, ping [04:10:25] Cyberpower678: pong [04:10:47] hedonil1, can you help me figure out why xtools is now throwing a 502. [04:11:03] Cyberpower678: will try [04:11:33] Not long after removing the BOM and restarting the webserver, it started crashing. [04:11:54] I restarted it again, which fixed it for about 5 minutes and then it crashed again. [04:12:26] Cyberpower678: just called it, seems it doesn't respond right now [04:12:50] I can restart the webserver but that will only work for another 5 minutes. [04:14:47] Cyberpower678: What does the error log say? [04:15:12] I wish Coren was here right now. I'm pretty certain that a BOM does not cause it to work for 5 minutes and then go haywire. :p [04:16:11] scfc_de, I need to pastebin it. [04:18:08] scfc_de, http://pastebin.com/Zysaf3mP [04:18:49] Cyberpower678: my suggestion 1) webservice stop 2) keep it stopped, so that apache takes over - then test again [04:20:46] last line of pastebin: sockets disabled, connection limit reached [04:20:59] webservice shut down [04:21:07] 'k [04:21:12] No input file specified. [04:21:43] Cyberpower678: There's "sockets disabled, connection limit reached" => http://stackoverflow.com/questions/1728306/what-does-sockets-disabled-connection-limit-reached-mean => http://redmine.lighttpd.net/projects/1/wiki/Docs_Performance => "Increasing the ``server.max-fds`` limit will reduce the probability of this problem.". So it sounds like the tools get hammered and/or run too long. [04:22:14] ok. but with apache it says index.php not found [04:22:57] access.log has lots of requests from http://www.phonifier.com => some crawler. [04:23:11] DDOS [04:23:23] your index.php is missing an ?> [04:23:38] hedonil1, that shouldn't make a difference. 
[04:23:49] we try everything [04:23:52] And http://www.phonifier.com is a spammer site that redirects to a couple of page view sites. *argl*. [04:24:55] scfc_de, can you setup labs to deny those connections? [04:27:15] !log tools tools-webproxy: Blocked Phonifier [04:27:23] Logged the message, Master [04:27:27] Cyberpower678: Done. Can you "webservice start" again? [04:27:47] scfc_de, restarting [04:28:26] xtools is up for now. [04:29:08] works now, you were innocent. no BOM ;) [04:29:14] :D [04:29:28] * Cyberpower678 expects xtools to crash by tomorrow. :p [04:29:39] That looks quieter. [04:33:31] * Cyberpower678 wonders if he can get some sleep now. [04:33:44] Zzz [04:33:47] Z [04:33:57] Good night. [04:37:37] !BOM2 is Cyberpower678 was innocent [04:37:38] You are not authorized to perform this, sorry [04:37:53] ;) [04:37:59] !BOM2 is Cyberpower678 was innocent [04:37:59] Key was added [04:38:07] ha [04:38:10] scfc_de, sorry to bather you again. [04:38:19] scfc_de, http://tools.wmflabs.org/xtools/ec [04:38:27] webgrid is acting funny. [04:39:39] Cyberpower678: That's a bug; you need to call http://tools.wmflabs.org/xtools/ec/ instead. [04:39:59] scfc_de, is this bug going to be fixed? [04:40:56] Cyberpower678: Yes, but probably not soon (cf. https://bugzilla.wikimedia.org/show_bug.cgi?id=59926). [04:41:07] ok. [04:41:12] good night. [04:42:36] andrewbogott: I try to query SMW with the API, but it returns 500. Could you take a look at the logs if they show something obvious? (https://wikitech.wikimedia.org/w/api.php?action=askargs&conditions=%5B%5BCategory:Shell%20Access%20Requests%5D%5D%20%5B%5BIs%20Completed::No%5D%5D&printouts=?Shell%20Request%20User%20Name) [04:44:30] (Bigger picture: action=ask works, but apparently only allows one printout; for another application I need two and I think I need to use askargs for that.) 
[04:46:34] scfc_de: PHP Catchable fatal error: Argument 2 passed to SMWQueryProcessor::addThisPrintout() must be an array, null given, called in /srv/org/wikimedia/controller/wikis/slot0/extensions/SemanticMediaWiki/includes/api/ApiSMWQuery.php on line 50 and defined in /srv/org/wikimedia/controller/wikis/slot0/extensions/SemanticMediaWiki/includes/SMW_QueryProcessor.php on line 234 [04:47:08] andrewbogott: Okay, that sounds like an SMW bug to me, I'll report it. [04:47:19] Thanks! [04:47:26] ok. Keep in mind we're running an oldish version of smw [04:50:27] andrewbogott: I'll do. Another question: Precise's curl/wget/mono https://wikitech.wikimedia.org complain about the certificate not being verifiable. https://en.wikipedia.org/ works. Is that a known problem tracked somewhere? [04:51:05] what does en have to do with this question? [04:51:24] The certs on wikitech are being messed with but I'm not sure the current state. [04:51:44] The certificate for en.wikipedia.org is okay by curl/wget's standards, wikitech.wikimedia.org's not. [04:52:21] Is RobH the one to ask? [04:52:22] ah, sure. [04:52:27] Probably [04:52:48] There's an RT ticket… [04:52:51] * andrewbogott digs [04:53:23] He's online in -operations, I can take it there. [04:53:37] Looks like rob is in process of buying a proper cert. https://rt.wikimedia.org/Ticket/Display.html?id=6592 [04:53:59] No RT here :-). Any ETA? [04:55:27] dunno [04:55:36] I'll ask RobH. [04:55:57] Re SMW, I submitted the *exact* same bug more than half a year ago, and you're right, it was closed as INVALID (cf. https://bugzilla.wikimedia.org/50950). So I'll have to work around that until we upgrade to 1.9. [05:47:11] Mono sucks. Want to parse JSON, long try & error: Nada in Mono 2.10. Want to parse XML instead, long try & error: Possible in Mono 2.10 with one working solution and a dozen non-, but SMW still has the XML bug. YAML only with external library. *Argl*. 
[05:49:06] But: WDDX works around the SMW bug and can be parsed as XML. So let's see if this hell has an exit. [07:33:52] https://tools.wmflabs.org/ what is used to retrieve the list of contributors? I already know that tool description is at ~/.description [08:15:50] gry: ldap [08:18:30] Is wikitech wiki running on the same cluster as tools? [08:23:37] gry: I think it runs on a real server [08:27:51] gry: no but you can query ldap at some point from tools as well [08:39:02] it runs on virt0 [08:39:02] gry: wikitech is hosted on a production box, the same one that manages the labs cluster. [08:39:08] ^ [09:06:04] Thanks all. [11:54:06] Hello, since helpmebot is down at the moment, can we bring wm-bot to #wikipedia-en-help? [11:54:51] ...So that it'll be easier for us to link articles. [11:55:28] petan|wk: ^ ? [11:55:36] @help [11:55:36] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 1.20.2.1 my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features [11:56:35] https://meta.wikimedia.org/wiki/WM-Bot#Getting_bot_to_the_channel [11:56:53] Zhaofeng_Li: try this: [11:56:57] you can just join #wm-botconnect, where you can type @add #nameofchannelyouwant and it will join your channel and give you admin permissions for that channel. [11:57:16] ugh, "you can just join #wm-bot", i'll try this [12:18:58] petan|wk, eh, could you help us to enable the @link function on #wikipedia-en-help? That'd save us a lot of time pasting the links ourselves. [12:20:31] Coren, xtools isn't loading again. [12:20:52] Coren, and it's not because of a BOM. I swear. [12:21:10] It looks like something might be DDoSing it again. [12:23:52] addshore, are you able to look? [12:25:05] petan|wk, ping [12:25:23] This is why I ping people first. 
[12:28:43] Cyberpower678: i tried the @trusted command etc, just to find out about the ones on that channel [12:28:53] but it didn't talk to us over there [12:29:05] but, it was already on the channel, we did not make it join [12:29:11] mutante, ?? [12:31:33] Cyberpower678: the bug report would be: enable wm-bot on #wikipedia-en-help while it's already joined and sits in the channel, sorry if that was unrelated and should just be for petan [12:32:34] mutante, I'm not sure what you're talking about, but I really don't have the time right now. Something is DDoSing xtools, and no one that works for labs is here to address what is likely a crawler disbling it. As such, for now, I'm disabling xtools for the time being. [12:32:56] Anyone here going to FOSDEM this weekend? [12:34:20] Cyberpower678: just ignore me then, no worries, i was merely forwarding a request for somebody else from another channel [12:34:33] Damianz: yes [12:34:37] mutante, I don't maintain wm-bot [12:35:06] cool [12:35:10] i don't either, i tried to help, it's been a while now, at this point it doesnt matter [12:35:13] mutante: that was Zhaofeng_Li [12:35:20] also has other stuff to do [12:35:37] * yuvipanda pats mutante [12:35:51] they will survive until petan|wk is back, bug him for docs [12:36:04] hehe :P [12:36:10] mutante: bug/1 :P [12:36:15] mutante: going to FOSDEM? [12:36:23] mutante: aren't you already in europe? [12:36:27] reply: yes, bug resolved [12:36:37] yes, i am [12:36:46] so it's not far by train [12:36:57] the nice train, Thalys, _not_ DB [12:38:22] mutante: hah, nice! [13:01:57] anyone admin to assist me with getting putty setup up for labs. 
I keep getting the message "No supported authentication methods available (server sent, publickey, hostbased) [13:02:38] pageant is set and running, publickey is uploaded [13:12:45] mutante: what docs [13:13:05] okay, I have fiddled with space variations and copy and paste and now have an error that says "server unexpectedly closed network connection" [13:13:22] petan|wk: Help:Putty, Help:Access, ... [13:13:48] Help:Access to ToolLabs instances with PuTTY and WinSCP [13:15:46] with the last possibly too many connection attempts [13:16:41] sDrewth: sorry I don't understand what you want to say [13:16:53] "Help:Access to ToolLabs instances with PuTTY and WinSCP" [13:16:58] what does that sentence even mean? [13:17:11] [[Help:Access to ToolLabs instances with PuTTY and WinSCP]] [13:17:17] @link [13:17:17] https://wikitech.wikimedia.org/wiki/Help:Access_to_ToolLabs_instances_with_PuTTY_and_WinSCP [13:17:25] Zhaofeng_Li: petan|wk is now back , can you explain to him please [13:17:47] sDrewth: ok that's a nice page [13:17:49] mutante, petan helped us to enable @link [13:17:51] what am I supposed to do with that [13:18:00] great, thanks petan|wk [13:18:04] yw [13:18:54] petan|wk: in case I have exceeded by login attempts, what is the reset time? [13:19:18] login attempts to ssh? [13:19:27] I don't even think there are some [13:19:29] yes [13:19:35] you should be instantly able to reconnect? [13:19:43] okay [13:20:02] these limits are insane I hope we don't admins that would ever set them [13:20:22] it only gives ability to trolls to lock out accounts of others [13:21:50] oh well, I have something screwed up in my config [13:38:27] Coren, ping [13:39:14] Cyberpower678: busy pong. I have to give an interview shortly. [13:39:42] Coren, something is DDoSing xtools. Can you have a look. scfc_de blocked something yesterday. [13:40:18] Not now, but after yes. [13:42:47] Coren, thank you. :-) [15:57:04] Coren, how did the interview go? [16:34:34] Coren, still in the interview? 
[17:08:58] Coren, it's getting worse. [17:09:14] I'm seeing at 23 hits per minute. [17:09:18] *at least [17:09:30] I have shut down the target tools. [17:09:58] All coming from the same UA [17:10:55] !log deployment-prep added addshore and jhall to project so they can grep logs [17:10:57] Logged the message, Master [17:25:52] addshore, do you have access to block incoming connections to labs? [17:25:57] anomie, ^ [17:26:21] Cyberpower678: I don't [17:26:27] Damn, [17:26:38] There's appears to be a crawler on xtools. [17:26:54] Hammering it with 30 hits a minute. [17:28:48] anomie, someone is clogging -login again. [17:29:00] Takes 5-10 seconds to login. [17:30:33] Well I have to go. [18:18:27] !ping [18:18:27] !pong [18:30:28] pang! [18:32:00] there is a load of ~5.5 on tools-login [18:32:21] someone runs a lot of pywikipedias [18:33:51] * valhallasw hands Coren the banhammer [18:34:35] looks like a broken crontab to me [18:35:17] Most likely. I go check now. [18:35:54] Oh, FFS. I *told* him before to send that to the grid. [18:36:43] * yuvipanda wonders who it is again [18:39:14] Killed with prejudice, and crontab disabled. [18:39:55] Ah, and someone (else) is running a but in .NET, not less. [18:40:17] a but? [18:40:18] :P [18:41:03] * lbenedix1 wonders why people are not using the grid thingy... its soooooo easy to use [18:41:12] lbenedix1: because it's even easier not to use it [18:41:36] but your processes wont get killed there [18:41:55] Hm, that one is being run interactively; so no killy yet. [18:43:13] killall5 -9 [18:44:12] Nope, but someone was nice enough to run a bot in a screen that catches SIGTERM. SIGKILL it is. 
[19:00:59] (PS3) coren: Package toollabs: add webservice [labs/toollabs] - https://gerrit.wikimedia.org/r/102740 [19:01:27] (CR) coren: [C: 2] Package toollabs: add webservice [labs/toollabs] - https://gerrit.wikimedia.org/r/102740 (owner: coren) [19:01:37] (CR) coren: [V: 2] Package toollabs: add webservice [labs/toollabs] - https://gerrit.wikimedia.org/r/102740 (owner: coren) [19:47:34] !accessproblems is Do you try to access bastion.wmflabs.org or tools-login.wmflabs.org on Linux or Windows? Do you use the username listed as "Instance shell account name" on https://wikitech.wikimedia.org/wiki/Special:Preferences? Have you read https://wikitech.wikimedia.org/wiki/Help:Access? [19:47:35] You are not authorized to perform this, sorry [19:47:44] Coren, ping [19:47:46] Someone trustworthy around? [19:48:04] @trusted [19:48:04] I trust: .*@wikimedia/.* (trusted), .*@mediawiki/.* (trusted), .*@wikimedia/Ryan-lane (admin), .*@wikipedia/.* (trusted), .*@nightshade.toolserver.org (trusted), .*@wikimedia/Krinkle (admin), .*@[Ww]ikimedia/.* (trusted), .*@wikipedia/Cyberpower678 (admin), .*@wirenat2\.strw\.leidenuniv\.nl (trusted), .*@unaffiliated/valhallasw (trusted), .*@mediawiki/yuvipanda (admin), .*@wikipedia/Coren (admin), [19:48:12] * Coren is very much untrustworthy. I could kill your processes without warnings. :-) [19:48:18] Cyberpower678: BOM! [19:48:29] Coren, have you been able to look into the crawler on xtools? [19:48:39] Oh, so I just need to go through nightshade! :-) [19:49:06] !accessproblems is Do you try to access bastion.wmflabs.org or tools-login.wmflabs.org on Linux or Windows? Do you use the username listed as "Instance shell account name" on https://wikitech.wikimedia.org/wiki/Special:Preferences? Have you read https://wikitech.wikimedia.org/wiki/Help:Access? [19:49:07] Key was added [19:49:12] valhallasw: Thanks! [19:49:25] Cyberpower678: Not really. It's an evil bot that disobeys robots.txt? 
[19:49:48] Coren, can you block it. [19:50:03] It's making about 30 requests every minute. [19:50:05] Cyberpower678: I might; need moar details. [19:50:11] !accessproblems is If you have troubles connecting to bastion.wmflabs.org or tools-login.wmflabs.org, please check 1) if you use the username listed as "Instance shell account name" on https://wikitech.wikimedia.org/wiki/Special:Preferences and b) if https://wikitech.wikimedia.org/wiki/Help:Access helps. If that doesn't help, please ask your question, and add your operating system (linux/windows?) and shell username. [19:50:11] This key already exist - remove it, if you want to change it [19:50:21] !del accessproblems [19:50:21] If you want to remove a key, type !accessproblems del [19:50:24] ugh. [19:50:25] Coren, what details do you need? [19:50:29] !accessproblems del [19:50:29] Successfully removed accessproblems [19:50:31] !accessproblems is If you have troubles connecting to bastion.wmflabs.org or tools-login.wmflabs.org, please check 1) if you use the username listed as "Instance shell account name" on https://wikitech.wikimedia.org/wiki/Special:Preferences and b) if https://wikitech.wikimedia.org/wiki/Help:Access helps. If that doesn't help, please ask your question, and add your operating system (linux/windows?) and shell username. [19:50:31] Key was added [19:50:48] Cyberpower678: Ideally, the UA if its distinctive. Otherwise, tell me what URLs it's hitting so I can grab what I need from the logs. [19:51:40] !valhallasw is This user floods the channel by trying to have me memorize long statements. [19:51:40] Key was added [19:52:04] Coren, I can give you the UA. I consider pretty distinctive. [19:52:59] * Cyberpower678 downloads the access log. [20:02:49] Coren, bear with me here. The log is huge. [20:06:22] Cyberpower678: How about logging into Tools and looking at only the last section?! [20:06:56] scfc_de, how do I do that? 
[20:07:58] "become xtools" (I believe), "less access.log" (or whatever it's called), either the [End] key or ... "G" to scroll to the bottom, "q" to quit. [20:08:56] "tail" will print out the last lines of a file [20:09:28] tail -n [k] for some k will give you the last k lines [20:09:48] Coren, Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36 [20:10:01] One of many UAs [20:10:10] That are similar to each other. [20:10:23] Well, if you got many problematic UAs then there's little I can do on that front. What URI are they hitting? [20:12:46] Also, why? Chome/ Safari/? Heh. [20:13:16] Can somebody please kill whatever is running on -login [20:14:16] CPU's mostly idling, memory has more than half a gig free. [20:14:57] It's very slllooooooooooooow [20:15:03] for me. [20:15:06] Cyberpower678: Reasonably fast for me. [20:18:34] Coren, http://pastebin.com/7kLkvkZu [20:18:40] Hope that helps [20:18:49] Just look for the funny looking UAs. [20:22:12] Coren, does that help/ [20:22:53] Well, assuming your problem is with /pcount/, I'm on it now. BTW, here's another good reason to split your tools. [20:23:18] Coren, I knew you would say that. [20:26:06] Hm. Most of your load comes from one cloud provider; that's clearly not endusers. [20:27:17] Three hosters, actually. Someone's having fun with scraper bots I'm guessing. [20:27:41] EC2? [20:28:11] is there a mysql lib for python 3 on tool labs? [20:28:59] Coren, they're having fun in the wrong location. I disabled the affected tools, so those bots aren't scraping anything but an error message. [20:30:31] Cyberpower678: I blocked 3 IP ranges that count for about 90% of your traffic or so. [20:30:45] Ok. [20:31:32] Should I go ahead and re-enable the tools? [20:31:52] lbenedix1: python-mysql doesn't have a python 3 port afaik... [20:36:26] Coren, has it quieted down? [20:36:55] Cyberpower678: It's blocked at the network level, your tool(s) shouldn't see those anymore. 
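Editor's sketch: the exchange above comes down to reading the tail of access.log and working out which clients and user agents account for most of the traffic. A minimal Python sketch of that tally follows. It assumes the log uses the common "combined" layout (client address first, user agent as the last quoted field), which the log does not confirm; the function names `tail` and `summarize` are made up for illustration.

```python
import re
from collections import Counter, deque

# Assumed layout: ip host user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def tail(path, n=1000):
    """Return the last n lines of a file, similar to `tail -n`."""
    with open(path, errors="replace") as fh:
        return list(deque(fh, maxlen=n))

def summarize(lines, top=3):
    """Return the most frequent client IPs and user agents in the given lines."""
    ips, uas = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        ips[m.group("ip")] += 1
        uas[m.group("ua")] += 1
    return ips.most_common(top), uas.most_common(top)
```

Something like `summarize(tail("access.log", 5000))` would then show whether a handful of addresses dominate, which is the kind of evidence that supports blocking by IP range rather than by user agent.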
[20:39:21] Seems to be quiet [20:40:21] Core [20:40:23] Coren, [20:40:25] webservice restart [20:40:25] Retarting webservice....... [20:40:35] ^ :D [20:43:06] (Actually, /no/ tools should see those. I saw no reason to let them play around tools.wmflabs.org from cloud servers) [20:45:44] Coren, there's still activity. [20:46:12] Well, you'd /hope/ so, tools with no users are boring. :-) [20:46:38] No. I mean bot activity, [20:49:23] That's unavoidable; but I've blocked all the biggest offenders, so you should be okay. [20:49:49] Coren, can you block Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.59.8 (KHTML, like Gecko) Version/5.1.9 Safari/534.59.8 [20:49:54] I'm also seeing very little activity around /xtools/ so it can't be all that bad. [20:50:05] Wait no. [20:50:11] Don't do that. [20:50:21] I wouldn't have anyways; that's a normal UA. [20:50:33] Exactly [20:51:04] Also, I've only seen a half-dozen requests in the past several minutes, so I'd hardly think you have a problem. :-) [20:51:20] Coren, I meant Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36 [20:52:41] That looks like a normal enduser IP, and I see very few requests coming from there. [20:53:23] The UA is really funny though. [20:54:18] Meh. Some people have fake UA turned on to protect their privacy. I'm not going to block an IP just for that. :-) [20:54:22] Cyberpower678: http://stackoverflow.com/questions/4024230/strange-user-agent-with-google-chrome [20:55:46] Aha, so it'd be chrome lying "I am *ALL THE BROWSERS*" by default now. :-) [20:56:12] So yeah, UA is only useful when it doesn't lie (like polite bots do) [20:56:42] Coren: *Polite* bots obey robots.txt which should disallow all access :-). [20:58:01] scfc_de, thank you. Just another reason why google sucks. [20:58:05] IMO [20:59:46] Coren, I'm re-enabling the tools. [20:59:56] Things seem calm enough for me now. 
[21:01:04] Cyberpower678: There's nothing weird about that useragent string [21:01:09] All browsers have UAs like that [21:01:32] FastLizard4, Chrome safari? [21:01:46] Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20121026 Firefox/16.0 SeaMonkey/2.13.2 [21:01:54] There's a Firefox useragent I pulled off my website logs [21:02:38] Cyberpower678: It's the browser indicating all of its capabilities, not just its version [21:02:50] Cyberpower678: For the Chrome UA, it's saying: [21:03:41] I am Mozilla 5.0 compatible, running on Windows 7 64-bit, using the Apple Web Kit renderer version 537.36 (which is KHTML and Gecko-like), Chrome version 32.0.1700.76 equivalent to Safari version 537.36 [21:03:55] Note that the Apple Web Kit version number indicated is the same as the Safari version [21:04:04] csbot reports itself as "CorenSearchBot/1.7 en" [21:04:14] That's because bots don't have GUIs [21:04:21] They don't need to advertise things like what rendering engine they use [21:04:37] Because rendering engine doesn't mean diddly squat to a web crawler ;P [21:04:41] Meh, if you use the UA for picking features, you're doing it wrong anyways. [21:04:51] Coren: But this is how The Universe works [21:04:58] Like it or hate it, it's how things have developed [21:05:07] Nope. You should use actual feature tests. If you even look at the UA, you fail. [21:05:19] Coren: People should use strong passwords. [21:05:23] Doesn't mean they actually will [21:05:36] Likewise, browsers will probably have bloated UAs for the foreseeable future [21:05:55] Because there will always be those people out there who think that using the UA to do feature tests is proper [21:06:03] Don't most modern sites ignore UAs? It's just the SAPs of this world that do evil stuff like that. [21:06:26] valhallasw: Yes; but I'm trying to explain that there's nothing suspicious about the user-agent string Cyberpower678 noted [21:06:31] Sure. 
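Editor's sketch: the token-by-token reading of the Chrome UA given at 21:03:41 can be mirrored in a few lines of Python. This is only an illustration of how such a string breaks into "product/version" tokens and parenthesised comments; the function name `split_user_agent` is hypothetical, and real UA parsing has many more edge cases (nested parentheses, for one) that this ignores.

```python
import re

# Either a parenthesised comment or a whitespace-delimited product token.
TOKEN_RE = re.compile(r'\(([^)]*)\)|(\S+)')

def split_user_agent(ua):
    """Split a UA string into ("product", name, version) and ("comment", text) parts."""
    parts = []
    for comment, product in TOKEN_RE.findall(ua):
        if product:
            # "Chrome/32.0.1700.76" -> name "Chrome", version "32.0.1700.76"
            name, _, version = product.partition("/")
            parts.append(("product", name, version))
        else:
            parts.append(("comment", comment))
    return parts
```

Run on the Chrome string from the log, this yields Mozilla, AppleWebKit, Chrome and Safari as products, with the platform and "KHTML, like Gecko" as comments. It says nothing about whether the client is honest, which is the point made in the channel.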
[21:06:36] Coren my bot identifies itself as Peachy [21:06:37] It's just a standard (bloated) GUI browser UA [21:07:04] Heck, even Lynx has a nice bloated UA [21:07:06] Lynx/2.8.8dev.9 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/2.12.14 [21:07:20] So does YandexBot [21:07:22] Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) [21:08:15] Here's an IE UA [21:08:17] Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0) [21:08:37] Mozilla 4.0 Compatible, Microsoft Internet Explorer 8.0 on Windows Vista running the Trident 4.0 rendering engine [21:08:57] !FastLizard4 is UA expert [21:08:57] Key was added [21:09:20] Cyberpower678: It's what I get for being in the web admin field for five years now :P [21:09:34] Dealing with user agent strings for analytics is a pain in the ass [21:09:38] For this very reason [21:10:27] :D [21:10:33] Tools are back online [21:11:59] http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/ [21:12:09] For those of you who want the gory details about how our UA strings became so convoluted [21:22:46] hey, I am having trouble sshing into a labs vm (parsoid-spof in the visualeditor project) on its public IP [21:26:22] gwicke: What does "ssh -v" say? [21:27:31] scfc_de: https://gist.github.com/gwicke/d2129c70e1464824aab5 [21:27:58] I rebooted the VM without success [21:28:45] gwicke: That doesn't look complete?! ("debug1: Offering DSA public key: /home/gabriel/.ssh/id_dsa") [21:33:04] scfc_de, that's where it is hanging [21:33:54] I am able to log into bastion though [21:34:02] the login is slow, but succeeds [21:34:48] from bastion to parsoid-spof hangs too though [21:36:28] * Coren checks. [21:36:57] If I "ssh -v scfc@parsoid.wmflabs.org", it hangs at "debug1: Offering RSA public key: /home/tim/.ssh/id_rsa" (no access denied or such). So it looks like parsoid-spof is "broken". [21:37:26] Hi all. Anybody used with the job queue and Semantic Media Wiki? [21:37:32] 2 cents Coren will now say "Gluster" :-). 
[21:37:48] we had issues with gluster preventing logins before [21:37:54] home dir not accessible IIRC [21:37:57] scfc_de: Nope. The box is either hosed hard or thrashing. Can't even log in as root. [21:38:16] rebooting did nothing [21:38:18] gwicke: You say you've just rebooted it? [21:38:26] yes, about 30 minutes ago [21:38:28] I'm with the webplatform.org project, working after Ryan_Lane's work on the infrastructure. And my job queue is filling itself [21:38:48] renoirb: Are you looking for support for a Labs instance or in general? For the latter, #semantic-mediawiki is probably better. [21:39:21] gwicke: It might be gluster being dead enough to hang when mounting. Lemme go and kick it. [21:39:30] * Coren rarely loses a bet when blaming gluster. [21:39:33] Thanks scfc_de, I just came here because Ryan had previously suggested that I come over here for issues when he is not available. [21:39:39] i'll try there then. [21:39:40] * gwicke snickers [21:40:41] gwicke: What's the project name? [21:41:06] visualeditor [21:41:14] node parsoid-spof [21:43:36] * Coren is trying to kick the gluster volumes. [21:44:54] ... always a good sign when the glusterd needs to be killed -9 [21:45:52] gwicke: Box may need to be rebooted again though. [21:53:46] k [21:58:37] still hanging after hitting the reboot button ~5 minutes ago [23:25:31] can you folks manually reboot parsoid-spof in the VE project? [23:25:41] the reboot from the web interface does not seem to work [23:29:05] it appears that cronjobs may not be firing on the bingle tools-labs instance - anyone know what might be going on/available to help? [23:32:09] gwicke: Hm. I can forcibly kill the VM. [23:32:29] awjr: "bingle tools-labs instance" = ? 
[23:32:53] scfc_de: the bingle project running on tool labs [23:33:20] or perhaps more ontologically correct, the tool called bingle, running on tool labs [23:33:54] Coren, that would be great [23:34:07] we are still locked out of the vm [23:35:00] awjr: Please don't */1, even for testing. :-) Also, -N bingle with -once means that only one of the three can /ever/ start, which I expect isn't what you mean. :-) [23:35:15] ahem [23:35:18] sorry Coren :p [23:35:28] i took that out [23:35:50] looking through syslog it seems that cron has been firing [23:36:12] awjr: Normally, if cron fails, you'll get an email; have you checked if you have any? If the failure happens later, that'll end up in bingle.err [23:36:20] Coren: i checked, there was no mail [23:36:32] Coren: i think this has been going on for a while [23:36:39] But both tasks being named 'bingle' guarantees that at most one could work; you specifically ask for -once [23:37:10] i thought that meant that at most one could work at a time and the rest would be queued? [23:37:55] Coren: so it should likely be jsub -N bingle ? [23:38:43] awjr: Or name one bingle and the other bingle-analytics or somesuch. [23:38:48] i see [23:39:14] And if you don't use "-once", you should probably decrease the frequency :-). [23:39:17] Look at bingle.err [23:39:20] [Tue Jan 28 22:05:05 2014] there is a job named 'bingle' already active [23:40:18] awjr: And, indeed, I see you have three running jobs right now (two of them named bingle) so no new jobs with that name would ever be accepted. [23:40:28] ok that makes sense [23:40:42] thanks Coren; i updated the names - hopefully that should do the trick [23:40:46] awjr: Is your intent to start them and keep them running or do they in fact run at interval? 
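Editor's sketch: the "-once" behaviour described above (a submission is refused, not queued, while a job of the same name is still active) can be pictured with a toy model. This is not how jsub or the grid engine is implemented; `ToyGrid` and its methods are hypothetical names used only to show why two cron entries sharing the name "bingle" block each other.

```python
class ToyGrid:
    """Toy stand-in for a job queue; not the real jsub or grid engine."""

    def __init__(self):
        self.active = []  # names of jobs that are queued or running

    def submit(self, name, once=False):
        """Return True if the job is accepted, False if it is refused."""
        if once and name in self.active:
            # Mirrors the message seen in bingle.err:
            # "there is a job named 'bingle' already active"
            return False
        self.active.append(name)
        return True

    def finish(self, name):
        """Mark one job with this name as done, if any is active."""
        if name in self.active:
            self.active.remove(name)
```

In this model a second `submit("bingle", once=True)` is refused until the first finishes, while `submit("bingle-analytics", once=True)` is accepted straight away, which matches the renaming suggested at 23:38:43. It also shows why a job that never exits (as found shortly afterwards with qstat) blocks every later run of the same name.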
[23:40:55] they do in fact run at interval [23:41:15] and only one of the named jobs should run at once [23:44:01] Because you have one of each running since 01/22/2014 11:05:16 [23:44:23] @_@ [23:44:41] that doesnt sound right [23:44:51] it should only take at most a few minutes to run [23:45:04] is it possible to kill those Coren? [23:45:41] awjr: qstat: list your jobs. qdel: removes them. :-) [23:45:53] sweet [23:51:01] gwicke: Sorry, but that instance appears to be completely hosed. [23:51:21] gwicke: Ima try one last thing, but I think you're at the "destroy and recreate from puppet" stage. [23:51:21] hmm [23:51:45] we have rt test results on that instance's storage [23:52:08] Why not on project storage? [23:55:12] gwicke: w00t; either it's fixed or I managed to catch it at the right time, but I'm on it as root. [23:55:21] gwicke: So, try? [23:55:24] sweet [23:55:43] It required tearing it down entirely then restarting it at the kvm level. [23:55:53] Coren, project storage is on gluster [23:56:00] Dunno how it managed to wedge itself that hard. [23:56:04] so unusable for a 40g mysql db [23:56:15] even a git pull on a small repo often takes minutes [23:56:15] Oh, definitely. [23:56:21] I'm in too! [23:56:22] awesome [23:56:24] Also, DB on a VM is teh suxx0rs [23:56:36] yeah, we should get real hw [23:56:50] vms are also useless for catching perf regressions [23:57:46] I'm still not sure how that VM manage to wedge itself that hard, but meh. It's fixed. [23:58:31] thanks!