[00:02:24] Still I need [[.backslash.]]"
[00:40:09] !log deployment-prep disabled puppet on deployment-mediawiki{01,02} and enabled verbose apache logging
[00:40:11] Logged the message, Master
[00:40:19] bd808: fyi ^^
[00:40:25] this is to troubleshoot the issue brett is looking at
[00:40:27] ori: Cool. Thanks
[01:42:27] What's with the 84 minutes of replag?
[01:48:37] @replag
[02:12:50] would be nice if wikimedia had its own "pastebin" tool ;)
[02:34:34] why?
[02:34:42] http://paste.tstarling.com/ sort of qualifies
[02:38:10] jeremyb: https://tools.wmflabs.org/paste/
[02:38:20] comets: ^
[02:39:03] Somehow I don't think forever option is gonna last
[02:41:41] it used to..
[02:42:22] Dispenser: why, dispenser?
[02:44:34] Because we're stingy on storage
[02:44:56] heh
[02:44:57] And we like making things "look clean"
[02:47:13] So I went with img_name REGEXP "[\\\\\\\\]" as that can be unescaped 3 times and still match a single backslash
[02:48:06] or unescaped once and match the same thing
[05:32:01] !log deployment-prep depooled deployment-mediawiki02 to investigate HHVM lock-up by cherry-picking I7df8c5310 on beta.
[05:32:05] Logged the message, Master
[06:54:13] ori: hello
[06:54:27] ori: I guess you can take some extra time to debug hhvm on beta cluster
[06:54:43] hey hashar. it's back up, but limited to one server
[06:54:52] ah that might work as well :]
[06:55:09] brett is compiling a build with some debug calls
[06:55:19] once that's done we'll bring it back up so it'll be two again
[06:55:23] sorry about that :(
[06:55:42] you can give the angry mobs my home address (on officewiki) :P
[06:55:46] ori: we talked about broken beta during our week qa
[06:55:51] nobody is angry
[06:56:11] just slightly annoyed about beta being unstable because it is constantly changing
[06:56:15] that is not related to just hhvm
[06:56:29] puppet changes / software upgrades etc are causing similar issues on a daily basis
[06:58:06] ori: and I am heading to the shower :-]  Breakfast complete
[06:58:42] hashar: thanks for understanding
[06:58:47] enjoy your morning
[11:31:49] petan: do you know how many public IPs the tools exec hosts have?
[11:32:56] Nemo_bis 1 for each
[11:33:44] petan: thanks; that's what I thought, but the project page had empty IP column so I got confused :)
[11:34:22] It's all puppet's fault :)
[12:24:10] Tool Labs tools / Quentinv57's tools: SUL Info bug - https://bugzilla.wikimedia.org/69004 (Dan) UNCO p:Unprio s:major a:None Here is an error with some username: http://tools.wmflabs.org/quentinv57-tools/tools/sulinfo.php?username=%C8%9Aetcu+Mircea+Rare%C8%99 Your tool cannot provide result...
[12:45:56] heh, someone tried 'apt-get install arm-linux'
[13:26:53] !log changed inplace bt-hhvm on deployment-mediawiki01/02 to also copy the binary
[13:26:54] changed is not a valid project.
[13:27:12] right
[13:27:19] !log deployment-prep changed inplace bt-hhvm on deployment-mediawiki01/02 to also copy the binary
[13:27:21] Logged the message, Master
[13:32:57] godog: I think the script is in puppet
[13:33:23] hashar: yeah I found https://gerrit.wikimedia.org/r/#/c/150593/ but it isn't merged yet (updated)
[13:33:28] ah
[13:33:35] we also have our own puppetmaster on deployment-salt.eqiad.wmflabs
[13:33:48] though that Gerrit patch is not applied there
[13:34:22] Tool Labs tools / Quentinv57's tools: Username with diacritical symbol: "Your tool cannot provide results for this user." - https://bugzilla.wikimedia.org/69004 (Andre Klapper)
[13:38:00] springle, ping
[13:38:36] Coren_away, I'm getting some serious performance issue from the DBs.
[13:38:40] [0] => Array
[13:38:40] (
[13:38:40] [time] => 23.61
[13:38:40] [query] => SELECT rev_timestamp, page_title, page_namespace FROM revision_userindex JOIN page ON page_id = rev_page WHERE (`rev_user` = '14836860') AND `rev_timestamp` > 1 ORDER BY rev_timestamp ASC LIMIT 0,2695596;
[13:38:40] [result] => succeeded
[13:38:42] )
[13:45:51] bug fill it Cyberpower678 !
[14:12:16] !BOM
[14:12:16] Did Cyberpower678 use his crappy editor again?
[14:12:21] :p
[14:12:48] Fortunately no. I use a PHP IDE which is smart enough to not BOM it. :D
[14:50:51] !log deployment-prep Restarted stuck hhvm on deployment-mediawiki02; apache had 89 children waiting for a response
[14:50:57] Logged the message, Master
[15:02:15] !log deployment-prep Cleaned up puppet repo on deployment-salt; merge conflicts with local Ia463120 hack; reapplied depool of deployment-mediawiki01
[15:02:17] Logged the message, Master
[15:03:01] !log deployment-prep Updated cherry-pick of Iceb8f43
[15:03:03] Logged the message, Master
[16:16:06] !ping
[16:16:06] !pong
[16:40:52] Wikimedia Labs / deployment-prep (beta): Automate updating the puppet checkout - https://bugzilla.wikimedia.org/66683#c7 (Bryan Davis) Patch applied in beta via cherry-pick (how meta). The cron that was running a version of this script from my home dir has been disabled.
[17:18:08] Wikimedia Labs / Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://bugzilla.wikimedia.org/48501 (Bryan Davis)
[17:18:08] Wikimedia Labs / deployment-prep (beta): beta labs no longer listens for HTTPS - https://bugzilla.wikimedia.org/68387 (Bryan Davis)
[17:33:37] Wikimedia Labs / Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://bugzilla.wikimedia.org/48501#c94 (Bryan Davis) (In reply to Matthew Flaschen from comment #93) > If cost is the issue, did we consider setting up our own certificate > authority (chained to an existi...
[17:46:26] !ping
[17:46:26] !pong
[17:58:04] valhallasw`cloud: btw, the wikitech OAuth issue was a problem on wikitech's end, has been fixed now
[18:05:44] YuviPanda: yay! So what cool tool did you build?
[18:06:56] valhallasw`cloud: quarry.wmflabs.org :)
[18:07:07] valhallasw`cloud: it's tied to mw.org now, I'm wondering if I should switch it to wikitech or meta.
[18:08:06] YuviPanda: That's like tsreports but better? :p
[18:08:16] valhallasw`cloud: tsreports lets you run custom SQL?
[18:08:52] Wikimedia Labs / Infrastructure: "Bad gateway" - https://bugzilla.wikimedia.org/68457#c2 (Magnus Manske) NEW>RESO/FIX Changed timeout in library code, works now.
[18:09:41] YuviPanda: not custom, but it can do any sql
[18:10:05] valhallasw`cloud: right, but this is a different thing, no? Very much 'exploratory' coding for random researchers, etc
[18:11:27] Right, but it might make sense to integrate them - save query as report, or something like that
[18:12:03] valhallasw`cloud: hmm, does tsreport run them recurringly?
[18:13:04] It caches them depending on query duration
[18:13:40] First view after cache invalidation triggers a background query
[18:14:25] valhallasw`cloud: hmm, right. I still think these are different things - I'm planning on adding a schema viewer instead, for example
[18:14:31] valhallasw`cloud: and a 'fork' button
[18:17:31] 2796636 0.30000 goodarticl tools.legobo Eqw 07/30/2014 06:40:07 1
[18:17:38] what's the command to figure out what the error is again?
[18:18:49] qstat -j
[18:18:58] error reason 1: can't get password entry for user "tools.legobot". Either the user does not exist or NIS error!
[18:18:59] ???
[18:19:49] YuviPanda: ^ known issue?
[18:20:02] huh
[18:20:03] legoktm: no
[18:20:09] wtf is NIS?
[18:20:15] might be ldap
[18:20:19] legoktm: try becoming legobot?
[18:20:22] legoktm: wait, you already are
[18:20:49] quick question: i can ssh to bastion.wmflabs.org but i'm unable to ssh from there to wikistream-web which I have connected to before
[18:21:10] i'm getting a Permission denied (publickey). error ; any ideas?
[18:21:36] edsu: are you using a proxycommand? or ssh key forwarding?
[18:21:43] YuviPanda: I killed 3 broken jobs and left one in case you want to debug
[18:21:45] neither
[18:21:57] edsu: ah, right. you should, let me see if I can find you a link.
[18:22:01] YuviPanda: the jobs are from 2 days ago, so it might not matter. just had users complain on my talk page so...
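Editor's sketch (not part of the channel log): the job row pasted at 18:17:31 shows the state `Eqw`, and `qstat -j` is the command given for reading the error reason. Assuming the column order seen in that one pasted row (job ID, priority, name, user, state, date, time, slots), error-state jobs could be picked out of `qstat` output roughly like this. `SAMPLE` and `error_jobs` are hypothetical names introduced for illustration only.

```python
# Column layout is an assumption taken from the single row pasted in the log:
# job-ID  prior  name  user  state  submit-date  submit-time  slots
SAMPLE = "2796636 0.30000 goodarticl tools.legobo Eqw 07/30/2014 06:40:07 1"


def error_jobs(qstat_output):
    """Return (job_id, state) for every row whose state field contains 'E'."""
    found = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip headers, separators and blank lines: a job row starts with a numeric ID.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        job_id, state = fields[0], fields[4]
        if "E" in state:
            found.append((job_id, state))
    return found


for job_id, state in error_jobs(SAMPLE):
    # `qstat -j <job_id>` is the command quoted in the log for the error reason.
    print(f"{job_id} is in state {state}; inspect with: qstat -j {job_id}")
```

The sketch only reports candidates; whether to delete or resubmit them is left to the operator, as in the discussion.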
[18:22:04] !access
[18:22:04] https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[18:22:05] i'm just sshing to bastion, and then attempting to ssh to wikistream-web
[18:22:32] edsu: ^ see 'Accessing instances with ProxyCommand ssh option (recommended)' there
[18:22:40] edsu: indeed, you need to use either one of these methods anyway
[18:23:05] YuviPanda: ok, i haven't needed to do that in the past ; but i'll try it
[18:23:11] edsu: ok
[18:24:09] YuviPanda: i would've thought that i would have the same problem accessing wikistream-web if i am proxying through bastion though
[18:24:42] edsu: hmm, did your instance have a public IP? if so, that might have worked.
[18:24:54] I'm not fully sure what's happening, though
[18:26:11] YuviPanda: bastion.wmflabs.org is public, i ssh'd there, and then used to ssh on to wikistream-web
[18:26:33] edsu: in that case, I think you had setup ssh agent forwarding and then forgotten about it or something like that.
[18:26:39] YuviPanda: but i cannot do that anymore because wikistream-web doesn't like my public key
[18:26:59] edsu: hmm, how long ago did you last login to wikistream-web? was this before the eqiad migration?
[18:27:12] YuviPanda: probably 2 months ago
[18:27:19] YuviPanda: it was after the migration
[18:27:21] hmm, I'm not sure :|
[18:27:33] YuviPanda: because i had to get wikistream running again after the migration :)
[18:27:36] edsu: poke andrewbogott? he might be able to help
[18:27:38] edsu: :)
[18:28:01] thx, i'll follow these instructions and make sure they don't make the problem go away first
[18:29:27] edsu: :)
[18:30:17] yeah, same problem when proxying: Permission denied (publickey).
[18:30:57] andrewbogott: hey, YuviPanda said you might be able to help me with an ssh problem w/ wikistream-web instance?
[18:31:09] edsu: yes, I'm looking.
[18:31:44] edsu: it looks to me like that VM has locked up; do you mind if I reboot it?
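Editor's sketch (not part of the channel log): the ProxyCommand approach mentioned above tunnels the connection to an instance through the bastion instead of logging in to the bastion first. A rough illustration of the command line involved, assuming OpenSSH's `-W host:port` forwarding and the two host names that appear in the log; the exact options recommended on the wikitech Access page may differ, and `ssh_via_bastion` is a hypothetical helper name.

```python
import shlex


def ssh_via_bastion(target, bastion="bastion.wmflabs.org"):
    """Build an ssh argv that reaches `target` by tunnelling through `bastion`."""
    # -W host:port asks the ssh running against the bastion to forward its
    # stdin/stdout to the target; the outer ssh expands %h and %p to the
    # target host and port.
    proxy = f"ssh -W %h:%p {bastion}"
    return ["ssh", "-o", f"ProxyCommand={proxy}", target]


argv = ssh_via_bastion("wikistream-web")  # short host name as used in the log
print(shlex.join(argv))                   # shell-quoted form, ready to paste
```

With this setup the private key stays on the local machine, which is why it is usually preferred over agent forwarding. It would not have helped in this case, since the failure turned out to be on the instance itself.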
[18:31:54] Is it providing web services currently, or are they down?
[18:32:19] Hm, looks like it is
[18:33:40] it's up
[18:33:59] i don't mind at all ; but sshd on there seems to be responding at least
[18:35:42] edsu: try to connect now?
[18:37:22] edsu: I'm going to reboot for an unrelated issue. We'll see if it helps
[18:38:46] andrewbogott: same problem: https://gist.github.com/edsu/415f4fe15769b5c9dd00
[18:41:39] andrewbogott: and now the service is down too ... i guess it didn't restart and i need to log in to get it going again
[18:41:52] edsu: I'm still looking, some strange things happening here.
[18:42:11] andrewbogott: got it; if you are in the middle of something and want me to create a ticket somewhere just let me know
[19:04:38] !ping
[19:04:38] !pong
[19:09:08] edsu: that box should be fixed now… are you able to log in and restart your service?
[19:12:16] andrewbogott++ # thanks, i can ssh now yes
[19:12:26] andrewbogott: any idea what happened?
[19:12:43] cool. Yeah, the instance didn't know what its domain was, due to some leftover setting pre-migration.
[19:12:53] And, being confused about domain, it couldn't find out its instance ID
[19:12:57] which broke puppet
[19:13:00] gotcha
[19:13:00] which broke ldap
[19:13:21] nice detective work, sir -- i appreciate it
[19:13:42] sure thing. I'm hoping that there don't turn out to be 60 more VMs with this problem :)
[19:14:18] andrewbogott: can you increase quota for quarry project?
[19:15:49] YuviPanda: which quota?
[19:16:07] andrewbogott: hmm, I need at least enough for another 'large' instance, so at least CPU
[19:17:02] YuviPanda: did that do it?
[19:17:07] andrewbogott: looking
[19:18:16] andrewbogott: hmm, my instance 'quarry-test' has been stuck at 'deleting' for about... 2 days now
[19:20:08] Try again? :p
[19:20:23] andrewbogott: doing
[19:21:34] YuviPanda: I'll see what the commandline can accomplish
[19:21:48] andrewbogott: 'failed to create instance'
[19:21:48] ok
[19:22:52] deleting from the commandline seems to've worked better. Which, the command the webui runs should've been identical :(
[19:23:00] Anyway, should be lots of quota headroom now
[19:23:49] andrewbogott: yay, cool
[19:47:23] ori: beta seems to be holding up OK now, did you change something since yesterday?
[19:48:54] chrismcmahon: hobbled it to half-capacity :) i wish i could tell you that we figured out the underlying issue, but it simply hasn't manifested yet. i suspect it'll happen again. i am trying to watch it closely though to make sure i'm there to respond. and we have it as brett's (fb hhvm guy) top priority.
[19:49:28] ori: got it. last couple of days this has been about the time it starts to fall over
[19:50:19] chrismcmahon: sorry :/ it must have been frustrating for you. if it's any consolation, it's very good that we caught this in beta; i'm not sure we would have caught it elsewhere (other than prod, i mean).
[19:53:15] ori: actually, it did spark a conversation. I'd like to create a beta2 for hacking on cross-cutting stuff like search, hhvm, db hacking, other system-wide experiments. beta2 would still run master branch and be a target for tests, but config'd for global changes
[19:53:28] can anyone give me an update on the storage for dumps in Labs ??
[19:53:32] ori: /srv/wikidiff2/Wikidiff2.cpp:11:26: fatal error: thai/thailib.h: No such file or directory
[19:53:35] trying to build wikidiff2
[19:53:45] maybe I just need libthai or something
[19:53:46] * swtaarrs checks
[19:54:00] thanks swtaarrs !
[19:54:37] hmm
[19:54:46] GerardM-: I think the dumps are in the process of copying now. Coren is away, maybe andrewbogott knows?
[19:55:06] ori: installed libthai-dev, rebuilding
[19:55:19] swtaarrs: not all of hhvm, tho, right? just wikidiff2?
[19:55:22] yeah
[19:55:27] YuviPanda the fact that you expect this is already good news and really welcome
[19:55:27] whew
[19:55:31] :)
[19:57:32] YuviPanda: I don't know for sure, but, yeah, the copy started recently.
[19:57:40] GerardM-: ^
[19:57:53] :)
[19:57:55] GerardM-: I suspect that's going to go on for a while. I don't know if this will be resolved before wikimania
[19:58:28] ori: also, not so frustrating for me, except for fielding complaints from Language, Mobile, Flow, and sundry. I was surprised this week how many teams use beta labs for so many different purposes.
[19:58:37] What I care for is that the most recent dumps become available.. It allows us to take actions on what is in Wikidata
[21:30:25] swtaarrs: any luck?
[21:49:38] ori: I tried to restart with the debug server but it kept crashing
[21:49:41] trying to figure out why
[21:51:26] hmm I restored the original config and it's still crashing
[22:00:58] ori: does upstart log stderr anywhere?
[22:01:16] /var/log/upstart/hhvm.log
[22:03:15] aha
[22:03:16] thanks
[22:36:02] legoktm, YuviPanda: I saw a major fluke in LDAP lookups two days ago at 7:00Z (I assume YuviPanda got dozens or hundreds of mail as well). I didn't consider it grave then, but stuck jobs would prevent the same from being resubmitted for continuous jobs.
[22:36:21] scfc_de: yeah, I had to do that for a couple of other apps
[22:36:27] I think I just remove all "E" jobs.
[22:36:37] scfc_de: the stuck jobs were queued at 7:00, so that lines up properly
[22:36:47] I just killed all the jobs and it's working smoothly now
[22:41:04] !log tools Deleted all jobs in "E" state that were caused by an LDAP failure at ~ 2014-07-30 07:00Z ("can't get password entry for user [...]")
[22:41:06] Logged the message, Master
[23:00:52] Wikimedia Labs / deployment-prep (beta): beta labs no longer listens for HTTPS - https://bugzilla.wikimedia.org/68387#c8 (se4598) I'm pretty sure it has not/never worked the last month, b/c occasionally I still hit a old https-beta link from my history, which was never working after migration. This bu...
[23:01:07] Wikimedia Labs / deployment-prep (beta): enable SSL/https support again - https://bugzilla.wikimedia.org/63538 (se4598)
[23:12:27] petan: ping? can you make wm-bot join #wikimedia-research and log?
[23:12:34] wm-bot: @join #wikimedia-research
[23:12:34] Hi YuviPanda, there is some error, I am a stupid bot and I am not intelligent enough to hold a conversation with you :-)
[23:12:41] @join #wikimedia-research
[23:12:50] wm-bot: @add #wikimedia-research
[23:13:59] wm-bot: @help
[23:14:03] !help
[23:14:03] !documentation for labs !wm-bot for bot
[23:14:07] !wm-bot
[23:14:07] http://meta.wikimedia.org/wiki/WM-Bot
[23:26:23] YuviPanda: has http://ganglia.wmflabs.org/latest/ always given the "There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused" error?
[23:26:33] greg-g: indeed. ganglia is dead.
[23:26:56] greg-g: if you're looking for betalabs, graphite.wmflabs.org is a thing, but it's not considered 'production' yet. a new machine is on the way for it, and I'll be setting it up after wikimania
[23:26:57]