[09:00:23] springle can i ask you a question?
[09:00:41] nuria: hi. sure
[09:01:09] Would you know if any recent changes have been done to the labs dbs (enwiki, dewiki)?
[09:01:31] yes, we've upgraded two of three boxes to mariadb 10
[09:01:31] We had some very slow queries that now seem to be going much faster (2 times as fast at least)
[09:01:37] ^
[09:02:02] we did some improvements on our end but the performance gains look too high
[09:02:48] Could the mariadb migration be the cause of way way faster queries?
[09:02:59] we are talking sometimes 2, 3 or 4 times as fast
[09:03:15] A sample query:
[09:03:51] http://pastebin.com/WMLDzM4a
[09:08:00] ^springle
[09:11:57] nuria: noted. momentarily distracted. brb
[09:12:03] k
[09:38:22] nuria: couple thoughts: mariadb 10 optimizer improves subquery performance, and the new instances have a *lot* more ram each
[09:38:30] aha
[09:38:32] also everything is on SSD
[09:38:48] "everything" as in even the archive table?
[09:39:07] if you're worried about the results, double check it on analytics-store i guess. it will be slower there (no SSD) but correct
[09:39:17] yes, all tables
[09:39:41] ok, when did the upgrade happen?
[09:40:29] labsdb1002 dewiki, commons, etc two weeks ago. labsdb1001 enwiki last week
[09:41:13] hmm, no, already lost track of time
[09:41:27] three and two weeks ago respectively
[09:48:28] many thanks springle
[10:21:03] (PS6) Nuria: Removing usage of celery chains from report scheduling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/150475 (https://bugzilla.wikimedia.org/68840) (owner: Milimetric)
[10:58:53] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c6 (nuria) >Parsing all zero.tsv* files i noticed a large number >of other strange items - highly broken URLs that still return miss/200 result. I think we are missing issues here. I will addr...
[10:58:56] qchris do you have time for a question?
[10:59:25] !ask
[10:59:43] Mhmm wm-bot2 does not help in this channel :-)
[10:59:48] nuria: Sure. Just ask.
[11:01:01] Remember you said yesterday the "orghttp" thing in the logs had happened before?
[11:01:18] They should "happen" all along.
[11:01:29] Ever since we've been capturing logs, shouldn't they?
[11:01:47] yes
[11:01:50] But yes. I remember when I said that.
[11:01:53] since ~may 2013
[11:01:56] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c7 (nuria) I think we are missing issues here -> "I think we are MIXing issues"
[11:02:08] in sampled /mobile/zero and api
[11:02:20] did not look at others cause i don't really know what those were
[11:02:48] Is there mention of that issue somewhere in wikitech? I looked: https://wikitech.wikimedia.org/wiki/Analytics/Zero_requests_stream
[11:02:54] but could not find it
[11:03:54] I am not sure whether or not it's mentioned in wikitech.
[11:04:02] I do not recall having seen it there.
[11:04:19] But I would not call it an issue, but rather a side effect of how we are logging.
[11:05:50] can you explain?
[11:06:11] or point me to docs that explain how we are logging to those files?
[11:06:46] Puppet are the docs
[11:06:59] Didn't you look up how those values get created?
[11:07:27] Let me find the relevant puppet part for you.
[11:09:56] https://git.wikimedia.org/blob/operations%2Fpuppet/production/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default#L9
[11:09:59] nuria ^
[11:10:03] looking
[11:11:32] !ask is Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time — both yours and everybody else's. :)
[11:11:32] You are not authorized to perform this, sorry
[11:12:01] !ask is Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time — both yours and everybody else's. :)
[11:12:01] Key was added
[11:12:08] !ask | qchris
[11:12:08] qchris: Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time — both yours and everybody else's. :)
[11:12:26] !ask is Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time---both yours and everybody else's. :)
[11:12:26] This key already exist - remove it, if you want to change it
[11:12:38] !ask del
[11:12:39] Successfully removed ask
[11:12:44] !ask is Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time---both yours and everybody else's. :)
[11:12:44] Key was added
[11:12:50] !ask | qchris
[11:12:50] qchris: Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time---both yours and everybody else's. :)
[11:14:40] At last wm-bot2 :-)
[11:15:14] qchris: I did see that as it is on wikitech too: https://wikitech.wikimedia.org/wiki/Cache_log_format
[11:15:43] let me look at it in more detail
[11:15:45] nuria: Yup.
[11:25:32] ok, so the %{Host}I%U%q
[11:26:09] is logging that way
[11:27:20] what i do not get is why it is not logged like that in every request
[11:28:50] (Assuming you mean lowercase "i" after "%{Host}")
[11:29:00] yes
[11:29:10] I do not understand ... every request (from frontend varnishes) gets logged using that.
[11:29:30] nginxs use a different thing, but for varnish ... that should be the thing.
[11:29:53] Can you point me to a request from varnishes that does not follow that configuration?
[11:30:49] sorry, what i mean
[11:31:52] is that not "all" requests have the issue; since varnish logs most requests, the "orghttp" should happen more often than not
[11:33:00] Did you check what "http://%{Host}i%U%q" means, and did you try to reproduce those values?
[11:34:00] So for example:
[11:34:12] Bug 69371 calls out the logged value
[11:34:18] http://en.m.wikipedia.orghttp://en.m.wikipedia.org/favicon.ico
[11:34:48] How does that distribute to %{Host}i, %U, and %q?
[11:35:29] So ... for which values of %{Host}i, %U, and %q would we log "http://en.m.wikipedia.orghttp://en.m.wikipedia.org/favicon.ico" ?
[11:36:08] Once you have that, you can try whether a request using those %{Host}i, %U, and %q really works on our infrastructure.
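A minimal sketch of the reconstruction exercise qchris poses here, assuming (as the conversation later suggests) that the client sent an absolute URI in the request line, so the %U/%q portion of "http://%{Host}i%U%q" carries a full URL rather than a bare path. The helper below is an editor's illustration, not WMF code:

```python
# Illustrative only: how the varnishncsa template "http://%{Host}i%U%q"
# can produce a doubled host. The "http://" prefix is a literal in the
# format string; %{Host}i is the Host header; %U%q is the request URI.
def render_log_url(host_header, request_uri):
    return "http://" + host_header + request_uri

# Ordinary request line "GET /favicon.ico HTTP/1.1":
print(render_log_url("en.m.wikipedia.org", "/favicon.ico"))
# -> http://en.m.wikipedia.org/favicon.ico

# Absolute-form request line "GET http://en.m.wikipedia.org/favicon.ico HTTP/1.1":
print(render_log_url("en.m.wikipedia.org",
                     "http://en.m.wikipedia.org/favicon.ico"))
# -> http://en.m.wikipedia.orghttp://en.m.wikipedia.org/favicon.ico
```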
[11:36:36] Then you have the answer as to why we really serve html, icos, ... for those requests
[11:36:50] Thereby, you explain bug 69371, and you also
[11:38:01] explain the follow-up question around http status codes and content-types.
[12:36:23] huh. Fell out of the chan :/
[12:37:37] * qchris grabs some duct tape and ties Ironholds to the channel :-P
[12:40:54] qchris, hah :D.
[12:41:30] two questions, actually, since you're here; 1, do you know where we store the IP ranges we accept SSL terminators from in Puppet? And, 2, do you know anything about Hadoop's TABLESAMPLE function?
[12:42:17] 2. No. ottomata mentioned it IIRC.
[12:42:41] 1. Don't we accept connections from all machines?
[12:43:43] oh, that's a pain.
[12:43:52] re 1.
[12:44:08] to be more specific: a user comes in with an HTTPS request, we route it through SSL, it comes out the other side.
[12:44:19] how do we distinguish that request from internally-generated traffic?
[12:44:35] That's easy ... we don't.
[12:44:37] :-)
[12:44:39] ...
[12:44:43] * Ironholds headdesks over and over.
[12:44:43] At least on the analytics part of it.
[12:44:49] oh, yes.
[12:44:51] That I know :D.
[12:45:09] I'm wondering if we have any way of doing it for operations purposes. I'm writing out the PVs email, and internally-generated traffic is kind of a lot of traffic.
[12:45:17] Mostly fundraising. Stupid fundraising, paying our salaries.
[12:45:55] Well ... you might want to check for the role::protoproxy::ssl
[12:46:09] Or check logs for hostnames starting in ssl
[12:47:32] Also ... the nginxes do logging in a slightly different format. You could use that too.
[12:47:40] (I mean to identify them)
[12:52:41] aha
[12:52:44] danke!
[12:52:50] and we have an ottomata :). Perfect timing!
[12:52:56] (CR) QChris: [C: -1] [WIP] Notify Icinga about done webrequest datasets (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[12:53:24] qchris, wait, aren't we excluding ssl.* from hadoop? IOW, is this a solved problem?
[12:53:46] ssl does not go into hadoop. right.
[12:54:47] Oh, Ironholds. I think ... in the last two lines, we used "ssl" for different things.
[12:55:05] Hadoop should not have the user's request that arrives at nginx.
[12:55:22] But Hadoop should have the request made by the nginx to the caches.
[12:55:26] perfect.
[12:55:35] So only one of the two.
[12:55:37] wait. hmn, possibly.
[12:55:55] okay. And the nginx -> cache request: will it have the user IP or the nginx IP?
[12:56:13] The nginx IP. The user IP is in the X-Forwarded-For.
[12:56:29] hrm
[12:56:31] * Ironholds headscratches.
[12:56:56] But X-Forwarded-For should be a column on the table. So you can get it.
[12:57:00] So in theory, "exclude all requests from WMF IP ranges except those with a non-null XFF..." should exclude internal traffic but include all HTTPS traffic?
[12:58:23] Mhmmm. Not sure if there are other services that forward here.
[12:58:42] yeah. Hurgh.
[12:58:47] Why can't the universe just be simple? ;p
[12:58:58] :-P
[12:59:05] we need to get them to add a new header to the http 1.1 spec
[12:59:11] Is this for an ad hoc query or for something production-like?
[12:59:14] X-Forwarded-Do-You-Care-About-This-Request.
[12:59:22] the latter; I'm kicking off the PV implementation question.
[12:59:30] I see :-)
[12:59:32] So most of the nitty-gritty is your business, but I want to make sure I'm not terribly off the mark.
[13:00:29] But keep in mind that we get real good requests from labs etc. So excluding all wmf IP ranges might not do the right thing.
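A sketch of the filter Ironholds proposes above, with made-up example ranges standing in for the real puppet-managed ones (the actual WMF/labs ranges are not shown here):

```python
# Editor's sketch, not production code. Hypothetical placeholder ranges.
import ipaddress

WMF_RANGES = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "208.80.152.0/22")]

def is_internal(client_ip, x_forwarded_for):
    """True for traffic we'd exclude as internally generated.

    A request from a WMF address *with* an X-Forwarded-For value is kept,
    since that is what SSL-terminated (HTTPS) user traffic looks like:
    the nginx IP is the client, the real user is in XFF. Per qchris'
    caveat, labs traffic is legitimate and would need carving out too.
    """
    ip = ipaddress.ip_address(client_ip)
    from_wmf = any(ip in net for net in WMF_RANGES)
    has_xff = x_forwarded_for not in (None, "", "-")
    return from_wmf and not has_xff
```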
[13:00:42] yep.
[13:00:47] Cool.
[13:00:51] But the crucial bit from an engineering POV is:
[13:01:01] we need something to parse the puppet IP ranges into something analytics code can understand.
[13:01:13] At the moment ErikZ is using static, manually-specified IP ranges that I have a lot of concerns about.
[13:01:19] :-)
[13:01:38] wellll, one day, who knows how soon, maybe sooner than you think, the ip ranges will be in a yaml file
[13:01:39] (on that example: cool. We identify labs ranges in the manifest. Okay, back to the email now.)
[13:01:47] ottomata, awesome!
[13:01:49] folks are starting to look into hiera now that we are on puppet 3
[13:01:56] \o/
[13:02:18] oh mah goodness so many emails!
[13:02:47] ottomata, one of them is very complimentary!
[13:02:52] it's the one that's the only one from me ;p.
[13:15:15] (CR) QChris: [WIP] Notify Icinga about done webrequest datasets (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[13:23:19] ottomata, so while you're around, are you familiar with the TABLESAMPLE function at all?
[13:23:27] I'm running into some oddities with it and not sure what's going on.
[13:23:44] (Specifically, I sample 1k rows, apply limiters, and get...200k results?!)
[13:35:54] Ironholds: I'm not really familiar with it, other than knowing it exists
[13:36:37] * Ironholds nods. Okay, will investigate further. Thanks :)
[13:36:41] I'll probably just mail the hive list.
[14:00:42] be right there..
[14:51:24] (PS1) Yuvipanda: Always specify parameters as tuples [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153606
[14:57:55] (CR) Yuvipanda: [C: 2] Always specify parameters as tuples [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153606 (owner: Yuvipanda)
[14:58:00] (Merged) jenkins-bot: Always specify parameters as tuples [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153606 (owner: Yuvipanda)
[15:06:34] milimetric, you let me know when you want to go over the session management
[15:07:17] k nuria, in five minutes?
[15:10:08] sure
[15:12:07] k, in the batcave
[15:12:23] (CR) Ottomata: "Yeah, ok, let's keep it simple for now and not add it as a sub workflow." (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[15:12:49] (PS6) Ottomata: [WIP] Notify Icinga about done webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[15:22:13] (PS1) Milimetric: Ensure wikimetrics session is always closed [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833)
[15:22:50] (CR) Milimetric: [C: -2] "nowhere near ready" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[15:35:53] Analytics / Wikistats: Add enciclopedia.us.es and ateneodecordoba.org URLs to stats - https://bugzilla.wikimedia.org/68398#c2 (Daniel Zahn) RESO/WON>REOP This request was about adding wikis to wikistats.wmflabs.org, not to be confused with stats.wikimedia.org. We do add non-WMF mediawikis to tha...
[16:26:30] (PS1) Yuvipanda: Move check_sql into QueryRevision model [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153620
[16:27:50] * yuvipanda waves at milimetric
[16:28:06] yurikR: hey! when do you plan on deploying the Limn extension? We need to rename it before that
[16:28:31] yuvipanda, probably in another week, although i could push it out tomorrow )
[16:28:38] what do you want to call it?
[16:28:45] yurikR: unsure. Visualization?
[16:28:55] Graphing?
[16:29:04] Eclypse
[16:29:13] Graphlypse?
[16:29:20] Viz?
[16:29:25] * yuvipanda likes Viz
[16:29:32] yurikR: will also need to be translatable :)
[16:29:33] Slime
[16:29:35] at least the tag
[16:29:46] which tag?
[16:30:00] you mean the ?
[16:30:06] do we actually allow that?
[16:30:17] not a good idea to localize HTML element names )
[16:30:19] yeah, parser tags are localized, no?
[16:30:24] these aren't html tags ;)
[16:30:36] close enough, really don't think we should localize these things :)
[16:30:53] imagine localizable computer language... (i have actually seen that)
[16:31:13] heh
[16:31:16] either way
[16:31:18] back to name :)
[16:31:25] Viz?
[16:31:42] zephir
[16:31:47] viz is too short
[16:32:51] bleh, zephir is a computer lang already )
[16:32:52] yurikR: lucem?
[16:33:01] yurikR: lux?
[16:33:02] showit
[16:33:11] yurikR: aura?
[16:33:19] yurikR: limn is greek for light
[16:33:57] are we sure we even want to rename it? :)
[16:34:05] yurikR: yes, most definitely
[16:34:14] yurikR: to prevent confusion with Limn, which runs all our current dashboards
[16:34:15] i think limn could be a container for all JS-based rendering
[16:34:26] WikiVega?
[16:34:28] plus the name has significant 'tear-my-hair-out' associations :)
[16:34:35] or just vega?
[16:34:44] that'll tie us down to vega forever :)
[16:34:52] we could add current limn as one of the rendering engines
[16:34:59] ewww no
[16:35:01] :P
[16:35:05] ...
[16:35:09] NOOOOOO
[16:35:12] :P
[16:35:22] yurikR: have you worked with Limn itself before?
[16:35:33] why not? it would add different modules based on the engine
[16:35:41] sure, but not Limn :)
[16:35:44] yeah - created data files
[16:35:47] :)
[16:35:51] yurikR: ah, that's why you don't feel the pain :)
[16:36:17] yuvipanda, for wiki, we should stick with "data-only" approaches
[16:36:20] yurikR: anyway, I suggest we make this undemocratic, and have milimetric pick someone (Toby? halfak?) to pick a name
[16:36:24] no JS code in wiki pages :)
[16:36:31] yurikR: oh, I completely agree. I'm just talking about the name now :)
[16:37:46] democracy? what's that? lets call it gamma
[16:37:52] oh, that's for music (:
[16:38:47] we could actually use "<graph>" as the tag
[16:39:07] sounds fine to me, graph could be the namespace too
[16:39:19] guys, why don't you go debate here: http://etherpad.wikimedia.org/p/naming-visualization
[16:39:24] qchris: have you ever seen oozie expecting a /lib directory to be in the application.path?
[16:39:24] it's an old discussion
[16:39:28] i think i've seen this before
[16:39:32] namespace is up to each wiki to decide - i'm not even sure they will want a NS
[16:39:33] yurikR / yuvipanda ^
[16:39:39] haha
[16:39:43] but I don't see why it is doing that for me right now
[16:39:44] not namespace number, but NS name
[16:39:53] k, gtg, but I'm ok with whatever you decide
[16:39:57] * qchris reads backscroll
[16:40:12] yurikR: yeah, let's call it graph?
[16:40:25] ottomata: Not sure what you are referring to.
[16:40:27] <graph> - the tag, what about the ext name?
[16:40:28] e.g.
[16:40:32] libpath [hdfs://qchris-master.eqiad.wmflabs:8020/wmf/refinery/2014-08-12T16.33.12Z--595089a-dirty/oozie/util/monitor/done_flag/lib] does not exist
[16:40:37] yurikR: GraphData?
[16:40:42] bleh
[16:40:48] data is evil
[16:40:53] yurikR: heh.
[16:41:00] yurikR: DeclareGraph?
[16:41:02] graphomaniac
[16:41:12] yurikR: graphatic?
[16:41:18] graphit
[16:41:21] ottomata: No clue. Works for me.
[16:41:35] graphite
[16:41:36] yurikR: too close to graphite :)
[16:41:40] yurikR: hah :)
[16:41:46] lets call it graphite :)
[16:41:47] ottomata: But I am working on the test cluster too ... maybe the two of us got in the way?
[16:41:50] OH, qchris!
[16:41:50] yurikR: :P
[16:41:51] unless we have an ext like that
[16:41:54] YOU ARE TESTING TOO
[16:42:05] yurikR: ya, but we have graphite elsewhere and that's super popular :)
[16:42:07] i kept being like "AH what are these old jobs running"
[16:42:11] i probably killed some of your jobs
[16:42:16] yurikR: we could pick another isomer of carbon
[16:42:22] ottomata, are you in NY?
[16:42:28] naw, in maryland right now
[16:42:41] ottomata: No worries about my jobs. But you uploaded some new patch sets, so I tested them :-)
[16:42:46] aye
[16:42:52] looks like they are working!
[16:43:13] yuvipanda, graphene? it's 2D
[16:43:46] yeah, lets do graphene, i like that thing :)
[16:43:48] yurikR: graphene is already a graphite dashboard :) we might even end up using that
[16:43:56] yurikR: https://github.com/jondot/graphene
[16:44:14] ottomata: Yes. And [...]-595089a-dirty looks like one of my deploys.
[16:44:18] that's fine - we are talking about the extension here, which could support multiple visual langs
[16:44:28] ottomata: I never uploaded 595089a to gerrit.
[16:44:36] <graph> the tag, Extension:Graphene
[16:44:48] ottomata: So I think we got in the way of each other.
[16:44:52] yurikR: https://en.wikipedia.org/wiki/Isotopes_of_carbon and https://en.wikipedia.org/wiki/Allotropes_of_carbon
[16:44:55] qchris: I had that error on mine as well
[16:44:59] so, you see the error too
[16:45:04] i was submitting individual workflows
[16:45:10] No. I do not see the error.
[16:45:11] yurikR: well, if we end up using Graphene dashboards in production, then we'll have graphene the graphite dashboard and graphene the extension.
[16:45:21] I mean ... the workflow path is still wrong.
[16:45:28] yuvipanda, fine with me )
[16:45:30] But next to that. :-)
[16:45:37] yuvipanda, it's just the ext name :)
[16:45:37] yurikR: I have to deal with both, so no :P
[16:45:37] Let me Submit the review.
[16:45:48] (CR) QChris: [WIP] Notify Icinga about done webrequest datasets (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[16:45:55] (CR) QChris: [WIP] Notify Icinga about done webrequest datasets (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[16:46:04] (CR) QChris: [C: -1] [WIP] Notify Icinga about done webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[16:46:39] yurikR: we can just call it Carbon :)
[16:46:58] yuvipanda, or diamond :)
[16:47:04] yurikR we also use diamond in prod :P
[16:47:14] https://github.com/BrightcoveOS/Diamond
[16:47:15] * yurikR slams his head into the wall
[16:47:16] also statsy
[16:47:33] yurikR: heh, just checked, we *also* use carbon in prod :P
[16:47:53] carbon only makes me think of emissions, bad connotation
[16:47:59] graphene is from the future
[16:48:57] otherwise we might call it the space elevator
[16:49:09] (not really)
[16:49:23] yurikR: alright, let's go with Graphene as ext name and tag
[16:49:30] yei!
[16:49:35] milimetric, objections?
[16:49:49] Graphene? What?!
[16:50:17] what, graphene is a 2d carbon, magic material for the future. Graph is the tag :)
[16:50:18] that doesn't really make sense... why graphene?
[16:50:30] milimetric: tag, Graphene extension name
[16:50:37] milimetric: although I'm ok with calling it Extension:Graph
[16:50:44] yeah... why not keep it simple
[16:50:50] +1 to Extension:Graph ;)
[16:50:54] sigh
[16:50:56] Graphene implies it has something to do with the Graphite project I think
[16:50:58] boring
[16:51:14] milimetric: http://jondot.github.io/graphene/ also exists
[16:51:33] github would have everything on the planet. Lets concentrate on MW
[16:51:56] yurikR: it's not about github, but the fact that it is a graphite dashboard, and we're currently evaluating different graphite dashboards
[16:52:09] (PS7) Ottomata: [WIP] Notify Icinga about done webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[16:52:18] yurikR: so if we end up picking Graphene, we'll have graphene.wikimedia.org that'll have the graphite dashboard and a Graphene extension, and that'll be super confusing
[16:52:28] it's the same domain (graph things!)
[16:52:57] milimetric: you should just be undemocratic and pick a name :)
[16:53:12] let's please not call it graphene....i second yuvipanda that is mega-confusing
[16:53:21] ok ok, extension graph (very close to Graf - the title of german nobility)
[16:53:37] ok ok,
[16:53:40] ext: graph
[16:53:47] <graph> tag
[16:53:49] Extension:Graph, provides <graph>
[16:53:51] lets wait with the NS
[16:54:07] NS can be declared much later
[16:54:10] cool. yurikR do you want to do the rename or should I?
[16:54:22] * yuvipanda is currently fighting with sqlalchemy
[16:54:24] yuvipanda, i'll let you do the honors :)
[16:54:28] yurikR: :) ok!
[16:54:42] btw, who can rename it in gerrit?
[16:54:48] qchris: you used to have access to vanadium, right?
[16:54:55] no. I never had.
[16:55:04] I filed an RT ticket. ... let me find it.
[16:55:06] yurikR: you can't. we'll get a new repo and push the current stuff into it
[16:55:22] ok
[16:55:23] ottomata: RT #8034
[16:55:23] ottomata, so are you in NY? would love to meet at co-space
[16:55:31] oh, qchris, that's what you were going to get me to do
[16:55:39] ottomata: Yes :-)
[16:56:11] yuvipanda, how about Graf? :)
[16:56:47] qchris: sorry about that -marker and _monitoring bit, i thought I had got those, but missed a couple
[16:57:03] No worries. Oozie complains, so we find them :-)
[16:57:48] yay I crashed hive!
[16:57:52] first time since the rebuild
[16:58:10] "GC collection limit hit" after searching 2 weeks of data for a single string. Reasonable complaint for the system to have.
[17:03:00] milimetric: also let me know when you've a few mins? have a SQLAlchemy question
[17:03:36] yuvipanda: sure
[17:03:50] milimetric: sure as in 'now' or you'll let me know? :)
[17:03:50] I'm debugging a crazy scoped_session issue now so I'm fresh on all things SA
[17:03:54] haha, Ironholds, that is a funny one
[17:03:54] right now man
[17:03:55] like
[17:03:55] milimetric: ah :)
[17:03:56] let's do it!
[17:03:57] :)
[17:04:07] the collector is tired of picking up your garbage over and over again
[17:04:10] milimetric: so, how do I have multiple relationships() to the same key?
[17:04:29] milimetric: QueryRevision has multiple 'runs', a one to many relationship, which is all the runs for a Revision
[17:04:40] milimetric: it also has a 'latest_run', which is a one to one relationship
[17:04:41] how are you doing things? normal sqlalchemy.orm?
[17:04:45] milimetric: yeah
[17:04:53] lemme check out the code (or should I look at a patch?)
[17:05:00] milimetric: sure, sending patch up, moment
[17:05:20] (PS1) Yuvipanda: [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626
[17:05:21] milimetric: ^
[17:05:32] that errors with InvalidRequestError: One or more mappers failed to initialize - can't proceed with initialization of other mappers. Original exception was: Could not determine join condition between parent/child tables on relationship QueryRevision.runs - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument,
[17:05:32] providing a list of those columns which should be counted as containing a foreign key reference to the parent table.
[17:07:17] got it yuvipanda, thinking
[17:10:00] ottomata, yeah, I've dated people with that complaint
[17:14:02] qchris: re your comment on https://gerrit.wikimedia.org/r/#/c/151963/1/manifests/role/analytics/refinery.pp
[17:14:06] milimetric: I asked in #sqlalchemy, nothing yet, sadly
[17:14:34] the _SUCCESS flag will not be created until at least 2 hours after the hourly directory is created, right?
[17:14:34] yuvipanda: I got sidetracked for a sec but solved my issue!
[17:14:43] so now I'm grabbing your patch, one sec
[17:14:46] milimetric: 100t
[17:14:47] err
[17:14:47] w00t
[17:14:48] ottomata: Right.
[17:16:05] hm, but this monitor workflow won't be launched until the _SUCCESS flag for the hour exists
[17:16:10] yuvipanda: how do I install / run this thing?
[17:16:24] which means we will get passive icinga OKs for the dataset that is 2 hours old
[17:16:47] and if _SUCCESS is not created (because of missing data or whatever), then the monitor workflow will not be submitted to oozie, right?
[17:17:03] ottomata: milimetric has taken up the review of qchris's backup patch
[17:17:12] (ok nuria, thanks)
[17:17:15] milimetric: ah, that's a somewhat involved process involving celery, redis, mysql, and an ssh tunnel :(
[17:17:25] so, even though the datasets we want to be alerted about are 2 hours old
[17:17:29] oh no, I just briefly skimmed it ottomata / nuria, you guys can look at it more deeply
[17:17:32] icinga will expect to receive OKs every hour
[17:17:38] ok yuvipanda, got it
[17:17:38] it isn't checking which dataset the OKs are about
[17:17:44] just that it gets OKs every hour
[17:18:00] hm, ok, so I should just set the freshness threshold to an hour then?
[17:18:04] perhaps 1.5 hours to be safe?
[17:18:07] ottomata: I think so.
[17:18:13] ottomata: But my comment is untested.
[17:18:20] aye, this is all untested!
[17:18:26] no way to really test icinga... :/
[17:18:37] Well ... I could set up icinga in labs.
[17:18:40] oof
[17:18:43] i guess so
[17:18:46] sounds annoying to me
[17:18:46] ok, i'm going to set it for 1.5 hours then
[17:19:00] 1.5 sounds good to me.
[17:19:14] yuvipanda: so a query_revision has many runs, and one of those runs is the "latest" run and you'd like to store references to both of those, right?
[17:19:20] milimetric: yes
[17:21:11] yuvipanda: the latest_run and latest_run_id are conflicting
[17:21:17] you just need latest_run
[17:21:19] milimetric: oh?
[17:21:22] if you want the id, it's latest_run.id
[17:21:30] milimetric: and how will it know to grab only the latest_run_id?
[17:21:43] milimetric: err, how will it know only to grab the latest QueryRun object, instead of an arbitrary one?
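For reference, a minimal sketch of the disambiguation that error message asks for, using an illustrative schema (not necessarily Quarry's actual column names): there are two foreign-key paths between the tables, and each relationship is told explicitly which one to use:

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class QueryRun(Base):
    __tablename__ = 'query_run'
    id = Column(Integer, primary_key=True)
    # each run belongs to one revision (FK path #1)
    query_rev_id = Column(Integer, ForeignKey('query_revision.id'))

class QueryRevision(Base):
    __tablename__ = 'query_revision'
    id = Column(Integer, primary_key=True)
    # FK path #2 between the same two tables -- this is what makes the
    # mapper demand an explicit 'foreign_keys' argument
    latest_run_id = Column(Integer,
                           ForeignKey('query_run.id', use_alter=True,
                                      name='fk_latest_run'))

    runs = relationship('QueryRun',
                        foreign_keys=[QueryRun.query_rev_id],
                        backref='query_revision')
    latest_run = relationship('QueryRun',
                              foreign_keys=[latest_run_id],
                              uselist=False,
                              # the tables point at each other, so let
                              # latest_run_id flush via a second UPDATE
                              post_update=True)
```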
[17:21:45] that's how the sqlalchemy mapper works
[17:22:07] well, wait, you're setting that right?
[17:22:34] like, when you run a QueryRun, you're setting query_revision_instance.latest_run = blah and then saving that, right?
[17:23:01] milimetric: yeah, I would
[17:23:15] latest_run = relationship('QueryRun', uselist=False) instead of both of those should work then
[17:23:35] milimetric: cool, trying out now!
[17:23:45] because you see, that's doing automatically what you're trying to do manually with the latest_run_id
[17:25:23] milimetric: oh, so will it automatically have some sort of field in the query_revision table that stores it?
[17:25:58] yes, I think by convention it'll be actually called "latest_run_id" but you can probably override that
[17:26:39] aaah, cool
[17:26:41] let me try it out
[17:27:46] but, btw, these relationships are optional, I did them all manually like "user_id = Column(Integer, ForeignKey('user.id'))"
[17:28:03] it's totally just a style thing
[17:28:37] milimetric: hmm, and that would give me .user as well?
[17:28:45] no, it wouldn't
[17:28:54] you'd have to be like .join(User)
[17:29:01] and that would know how to join
[17:29:32] but if you want .user, what you're doing is cool. I just like everything to look more like SQL so .join(User) feels nice to me
[17:32:05] qchris: are you testing that latest patchset?
[17:32:39] ottomata: Not on the test cluster. But now that the main thing is working, I am reviewing send_ok_to_icinga.sh
[17:32:55] (But locally. The cluster is all yours)
[17:33:19] k, awesome
[17:34:41] yuvipanda: do let me know if that's still not working
[17:34:51] I'll make a sample to make sure I'm not talking crap
[17:35:07] milimetric: will do!
[17:44:10] (CR) QChris: "Mostly nits." (9 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[17:46:45] (PS2) Yuvipanda: [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626
[17:46:51] (CR) jenkins-bot: [V: -1] [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda)
[17:46:53] milimetric: ^ it's not :(
[17:47:11] milimetric: I see no writes into latest_run_id at all, even though I'm assigning
[17:49:53] milimetric: I think I should follow your lead and not use relationship()
[17:50:05] it's really confusing, and is the reason I didn't use SQLAlchemy to start with
[18:07:09] qchris: re your 'what if send_nsca fails' comment
[18:07:13] won't set -e make the script exit?
[18:07:15] if that happens?
[18:07:40] There is no "set -e" currently.
[18:08:39] But if there was, it should make the script exit. yes.
[18:10:17] well there will be in the next patchset!~
[18:10:22] \o/
[18:10:38] if so, would you rather me do the icinga_pipe thing still?
[18:11:19] It's a matter of taste. I'd prefer to see the "echo -e ..." only once. But if you think otherwise, that's fine too.
[18:11:45] When doing the icinga_pipe, the relevant functionality is better bundled from my point of view.
[18:12:14] All the --dry-run handling is right in a single place (where it is needed)
[18:12:21] And also the echo would be in a single place.
[18:12:29] Otherwise, it's a bit mixed.
[18:12:36] But I am ok either way.
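The script under review (send_ok_to_icinga.sh) is shell; as a rough Python illustration of the passive-check submission being discussed, assuming the standard send_nsca stdin format of tab-separated host, service, status, message lines (the names below are hypothetical, not the refinery's actual ones):

```python
# Editor's sketch. send_nsca reads "host<TAB>service<TAB>status<TAB>message"
# lines on stdin; status 0 means OK. Icinga's freshness threshold (1.5h
# above) alerts when these OKs stop arriving.
import subprocess

def send_ok_to_icinga(monitored_host, service_description, message,
                      nsca_server, dry_run=False):
    line = "\t".join([monitored_host, service_description, "0", message]) + "\n"
    if dry_run:
        print(line, end="")  # --dry-run path: show the line, don't send it
        return
    # check=True raises on a non-zero exit, the rough analogue of set -e
    subprocess.run(["send_nsca", "-H", nsca_server],
                   input=line.encode(), check=True)
```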
[18:14:13] (PS3) Yuvipanda: [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626
[18:14:19] (CR) jenkins-bot: [V: -1] [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda)
[18:16:09] qchris: i think i prefer it without that, usually i'm with you, but the function would only be used once, which doesn't really buy us much in the case of DRYness
[18:16:12] yuvipanda: I agree the relationships are confusing, and I can set up a sample that does what you'd like, if you want
[18:16:32] another idea is to not store the "latest" at all, as that seems like premature optimization
[18:16:35] milimetric: sure, although I'm trying something else right now. see latest patch, it parses alright, and *seems* to work
[18:16:45] milimetric: it's not premature optimization as much as sanity while writing the code :)
[18:17:01] milimetric: otherwise 'get list of queries ordered by timestamp of latest queryrun' is just painful
[18:17:10] just .query(QueryRun).order_by(QueryRun.timestamp.desc()).first()
[18:17:30] ottomata: (It would DRY the echo)
[18:17:37] ottomata: But I am fine without it.
[18:17:46] no, I want to get *Query*s that are ordered by timestamp of their QueryRun
[18:17:50] ottomata: No icinga_pipe then :-)
[18:17:53] so that involves two joins and an order by and a limit
[18:17:55] oo, one more thing, what do you mean about $0 being not quoted?
[18:17:59] in the usage message?
[18:18:07] it's in double quotes in the whole message there...
[18:18:10] I had expected $(dirname "$0")
[18:18:22] I see yuvipanda, makes sense. Checking latest patch
[18:18:43] unquote, quote, ok
[18:19:38] (PS8) Ottomata: [WIP] Notify Icinga about done webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[18:19:41] qchris: ^
[18:20:14] ottomata: The quotes outside of the subshell are not used in the subshell.
[18:20:31] ottomata: Ok. Will look after I find time to eat something :-)
[18:26:34] milimetric: I think I've figured out the problem!
[18:26:40] let me verify, then will post new PS
[18:27:06] k, that whole primaryjoin thing is weird, I've never tried to join like you're joining though
[18:27:19] I always just write the sqlalchemy fully - it generates the same sql anyway
[18:27:43] milimetric: yeah, but it kept conflicting without the primaryjoin
[18:27:57] right, no, I'm sure it's needed
[18:28:14] it's like sqlalchemy going - hm, this isn't very typical
[18:28:19] yeah
[18:28:30] I guess you usually don't have multiple foreign keys
[18:28:40] although I remember doing this with Django ORM yeaaars ago and it was ok
[18:41:27] yuvipanda, hi, so are you creating it or should we file a request in http://www.mediawiki.org/wiki/Git/New_repositories/Requests
[18:42:06] yurikR: I usually poke people and stare at them till they do it, right now doing it in -labs
[18:45:16] (PS9) Ottomata: [WIP] Notify Icinga about done webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[18:46:16] ottomata, question for you - do you know what is the easy way to automate scp transfer between two labs instances? or some other mechanism?
[18:48:54] in different projects?
[18:50:36] is this for a one off, or for some regularly scheduled cron like thing?
[18:54:15] yurikR: import with history done :)
[18:54:28] yuvipanda, !!
[18:54:36] ottomata, cron job
[18:54:43] analytics
[18:55:03] for two different projects
[18:55:16] come to think of it, it might be better to store it on stat1002
[18:55:26] these are SMS logs
[18:55:39] hm, where do they come from?
[18:55:41] we shouldn't host them on labs
[18:55:45] aye, makes sense
[18:55:47] they come from our partner
[18:55:55] do they come in on a box in prod somewhere?
[18:56:05] they are on S3 cloud, uploaded by them
[18:56:08] (PS4) Yuvipanda: [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626
[18:56:15] (CR) jenkins-bot: [V: -1] [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda)
[18:56:15] i download it with a python script
[18:56:37] do some magic processing, and generate limn datafiles
[18:56:45] those datafiles should go to gp
[18:57:38] ottomata, ^
[18:58:25] hm, ha, welp, if they are already out there in S3...:p
[19:00:26] (PS5) Yuvipanda: [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626
[19:04:26] Would you mind if I give it a shot?
[19:04:54] Darn ... forgot to /msg YuviPanda|zzz on ^
[19:05:17] (CR) jenkins-bot: [V: -1] [WIP] Add latest_run to query revision [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda)
[19:11:00] YuviPanda|zzz: you're totally right, this is a weak point of sqlalchemy it seems
[19:11:15] I was playing around with it and there are a lot of errors until you get it to work
[19:11:31] alas, back to my simple explicit manual joins :)
[19:11:46] * milimetric pats self on back for not using relationship
[19:26:42] (CR) QChris: [C: -1] "LGTM. Let's wrap it up and merge!" (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[19:28:24] ottomata, so how should i set it up? I would like to move the logs (~4 GB) from my laptop
[19:28:41] who is maintaining stat1002?
[19:28:58] you have 4GB of logs on your laptop? :/
[19:29:25] Ironholds, i was generating analytics from them directly :(
[19:29:30] oh dear :(
[19:29:33] they are downloaded from S3 cloud
[19:29:37] Aha
[19:29:41] external partner
[19:29:47] oh, and of course s2 doesn't have a net connection. gotcha.
[19:29:58] s2?
[19:30:04] stat2, sorry
[19:30:10] qchris: i have been grepping the zero logs and can find no commonality for the patterns logging the host twice, other than all requests are http/mostly are 200/misses and all text/html
[19:30:15] Does stat3 have a sync directory for public datasets? If so you might want to try just hosting it there and not having to worry about it.
[19:30:19] it does, but I was writing the script to do all that
[19:30:29] aha
[19:30:37] anyway, I'll stop butting in :P
[19:30:43] and now the script to download and process is ready, so i'm looking to move it to stat1002
[19:31:13] hence - looking for someone to tell me where to put this stuff
[19:31:42] nuria: Does the bug not have examples of pictures being requested too... let me check again.
[19:32:00] there are pictures too
[19:32:01] ya
[19:32:12] but *.ico are the exception
[19:32:39] Ok. I was just thinking about your "all text/html" above
[19:34:17] it's like: 1 text/html
[19:34:17] 5 image/gif
[19:34:17] 47 application/javascript;
[19:34:17] 65 image/x-icon
[19:34:17] 2075 text/html;
[19:34:27] for 10 days in august
[19:35:09] it only happens on http, that is clear
[19:35:47] You mean http in contrast to https or in contrast to other protocols?
[19:36:19] oh qchris, forgot, i responded to your analytics1027 thing but forgot to submit the comment
[19:36:35] Also ... since you produced all those stats ... did you look at the varnishncsa config and how those lines get produced?
[19:36:40] (CR) Ottomata: [WIP] Notify Icinga about done webrequest datasets (7 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris)
[19:37:15] ottomata: Yes, I saw the new patch sets, and reviewed them.
[19:37:29] ottomata: Let's get the commit info in shape and merge.
[19:39:48] qchris: the faulty logging does seem to happen only in urls reported with 'http' to the logs, requests with https do not seem to have the issue
[19:42:07] That might be the case ... but let me again ask you whether you looked at the varnishncsa config and how those lines get produced?
[19:42:26] That might well give a really simple explanation as I said today in the morning.
[19:42:53] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c8 (nuria) Please take a look at our logging format on varnish: https://git.wikimedia.org/blob/operations%2Fpuppet/production/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default#L9 The interestin...
[19:43:42] qchris: at the "http://%{Host}i%U%q" you mean? yes i did look at it
[19:44:18] Coolio. So you know what the %{Host}i, %U, and %q are for the logs from the bug?
[19:44:49] from teh logs themselves or from elsewhere?
[19:44:54] *the
[19:45:15] From the logs, or from elsewhere.
[19:45:57] For example what is %U for a line logging "http://en.m.wikipedia.orghttp://en.m.wikipedia.org/favicon.ico"
[19:46:05] parsing {Host} %U %q from the log
[19:46:11] is easily done, did that too
[19:46:39] but other than the fact that requests mostly are for top (a.k.a http://en.m.wikipedia.org)
[19:47:21] i see no other pattern
[19:47:25] nuria: Let me rephrase that ...
[19:47:50] If for example %q wouldn't be "/favicon.ico" but the whole "http://en.m.wikipedia.org/favicon.ico"
[19:48:05] and %{Host}i would be "en.m.wikipedia.org"
[19:48:11] wouldn't that explain everything.
[19:48:30] And would that be a valid HTTP/1.1 request?
[19:48:40] ya of course but it does not explain why
[19:48:50] And if so, wouldn't that also explain why we are serving proper responses?
[19:49:38] yes, but it does not explain why it happens as clients seem to be very distinct
[19:49:52] will look at clients again but they were all over the place
[19:50:24] If they are proper, good requests that conform to the standard ... why do we need a reason for such requests?
[19:50:59] because it will be nice to know what's triggering them right?
[19:51:13] ?
[19:52:04] Sure ... it would also be nice to know what triggers requests that end in "foo" :-)
[19:54:41] ok, will check a bit more to see if i find anything in common in 'mobile' requests that follow the same pattern but otherwise i do not think we can do much else (let me know if you think about something)
[19:56:00] nuria: Sure, go ahead check more on the logs.
[19:56:14] I presented you an explanation that would explain the bug away.
[19:56:31] You could also just verify it. If it holds true,
[19:56:37] the bug can be closed.
[19:56:56] verify it how? making a faulty request?
[19:57:12] They are not faulty.
[19:57:19] They are good requests.
[19:57:53] haha, ok, let's call them not-so-pretty requests, but that would verify that a not-so-pretty request gets logged in that fashion
[19:57:56] Yes, make one, and see if you get a proper response from the caches.
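A sketch of the verification qchris is suggesting: issue a request whose request line carries an absolute URI (legal per HTTP/1.1) and confirm the caches serve it normally. The hostname is the one from the bug's example; the comment about the log rendering reflects the chat's hypothesis, not a confirmed trace:

```python
# Editor's sketch of the "make one and see" experiment.
import socket

req = (b"GET http://en.m.wikipedia.org/favicon.ico HTTP/1.1\r\n"
       b"Host: en.m.wikipedia.org\r\n"
       b"Connection: close\r\n\r\n")

with socket.create_connection(("en.m.wikipedia.org", 80)) as s:
    s.sendall(req)
    # expect a normal status line (e.g. 200 or a redirect), not an error
    print(s.recv(200).decode("latin-1"))

# Per the hypothesis above, varnishncsa would then render
# "http://%{Host}i%U%q" with the absolute URI in the %U/%q part, giving
# "http://en.m.wikipedia.orghttp://en.m.wikipedia.org/favicon.ico".
```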
[19:58:06] you do, ya
[19:58:22] i bet, but it does not tell me the cause of the bug
[19:58:46] The "cause" of the bug is our log format.
[19:59:13] but anyways, if we are not looking for that type of info I shall grep a bit more and close the bug
[19:59:42] As you say :-)
[20:00:00] I'll call it a day.
[20:00:05] Have fun and see you tomorrow.