[14:56:41] Hey average, you around?
[14:57:38] yes Mr. Halfak
[14:57:48] how may I help you ?
[14:57:55] Do you have access to s5-analytics and would you be willing to try out a query for me?
[14:58:11] can you please write the full name of that machine ?
[14:58:32] :)
[14:58:49] The alias is s5-analytics-slave.eqiad.wmnet
[14:58:55] thanks, trying to log in now
[14:59:34] I get permission denied on that host
[14:59:52] OK. No worries.
[15:00:03] I'll try to get one of the DBAs. Thanks for trying. :)
[15:00:09] cool
[15:16:24] halfak: I might be able to help you with that query
[15:16:44] Oh yeah?
[15:16:52] halfak: do you still need it ?
[15:17:58] Sure. I just want *any* query to return results on the machine.
[15:18:04] average, halfak: Note that the s5 slave is currently lagging ~1/2 an hour. Be easy on it :-)
[15:18:28] OK that might be relevant to the problems I'm having.
[15:18:35] qchris, how do you check?
[15:19:23] Very dumb... as I lack real access... I check the recent changes for wikis that are expected to see lots of updates.
[15:20:02] so like "SELECT max(rc_timestamp) FROM recentchanges"?
[15:20:26] halfak: Yes.
[15:20:35] That worked reliably up to now.
[15:20:53] If that is too distant from now, I go check ganglia for the real lag.
[15:21:40] I am connected to the s5-analytics-slave mysql
[15:21:42] uhm
[15:21:53] Gotcha. That's my strategy too. It's clunky though. We should really have notifications sent to a mailing list or maybe to this channel when replag gets high.
[15:21:58] qchris: I can run queries with limit to avoid hitting it
[15:22:33] I'd love to get a ping the next time that I accidentally lock up replication.
[15:25:40] halfak: ping from whom? and ping once which threshold of replication lag is passed?
[15:30:31] Good question. Preferably a ping from a script throttled at once/hour or more. I'd like to start with something like a 10 minute replag and see how often we get pinged.
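A minimal sketch of the replag ping discussed above, assuming the lag is estimated from the result of "SELECT max(rc_timestamp) FROM recentchanges" and that rc_timestamp is a UTC string in MediaWiki's YYYYMMDDHHMMSS format. The names estimate_lag and ThrottledNotifier are hypothetical, and the 10 minute threshold and once-per-hour throttle are just the starting values suggested in the conversation. Running the query and delivering the ping (mail, IRC) are left out.

```python
from datetime import datetime, timedelta, timezone

# Assumption: rc_timestamp is a UTC string such as "20140101113000".
RC_TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def estimate_lag(max_rc_timestamp, now):
    """Estimate replication lag from the newest recentchanges timestamp.

    This is only a proxy: on a quiet wiki the newest edit may be old
    even when replication is healthy.
    """
    latest = datetime.strptime(max_rc_timestamp, RC_TIMESTAMP_FORMAT)
    latest = latest.replace(tzinfo=timezone.utc)
    # Clamp at zero in case of small clock differences between hosts.
    return max(now - latest, timedelta(0))


class ThrottledNotifier:
    """Decide whether to ping, at most once per min_interval (hypothetical helper)."""

    def __init__(self, threshold=timedelta(minutes=10),
                 min_interval=timedelta(hours=1)):
        self.threshold = threshold
        self.min_interval = min_interval
        self.last_sent = None

    def should_notify(self, lag, now):
        if lag < self.threshold:
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        # Record the send time so later checks are throttled.
        self.last_sent = now
        return True
```

A caller would run the query, pass the result and datetime.now(timezone.utc) to estimate_lag, and send a message only when should_notify returns True.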
[16:22:22] oook hi average, i'm making some coffee but can help
[16:40:28] great, Snaps
[16:40:31] null will work just fine
[17:44:45] ottomata: OKAY, I'll push it
[18:50:48] halfak: Sorry. I missed your response. I have a system that runs twice a day and checks for 10 minute replag.
[18:51:16] That was like 1 fail every two weeks. (Estimated)
[18:52:02] If you want something reliable ... Icinga comes to mind.
[18:52:28] Icinga does lag checks IIRC. Did you ask ops to include you in the Icinga alerts?
[19:56:41] qchris: I haven't heard of Icinga. Is there anything you could link me to?
[19:57:09] https://icinga.wikimedia.org/icinga
[19:57:22] But that requires authorization? ... mhmm
[19:57:43] Last time I checked ... it worked without IIRC
[19:58:15] Either ask in the ops channel, or ping ottomata when he comes online again.
[19:58:18] No worries. I'll go bug people in #-labs when I get a minute. Thanks!
[19:58:37] Ok.