[09:59:40] nuria: I toyed around a bit with wikimetrics on staging. Looks good to me so far. Do we have a test plan?
[10:01:06] no, there is no test plan. but we should have one going forward, in this case it would have helped us avoid the last-minute running around
[10:01:26] i already tested on vagrant and staging for over an hour and a half
[10:01:43] only found 1 issue
[10:01:51] that i have already sent to dan
[10:01:59] ok.
[14:37:45] will be back online in 40 min
[16:40:01] * DarTar waves
[17:16:28] (PS1) Csalvia: Fixed parse_username to handle Unicode [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/125752
[17:18:39] (CR) Milimetric: "Is it possible to write a test to prove that this solves the problem? As in, the test would fail before this and pass afterwards?" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/125752 (owner: Csalvia)
[17:30:51] hey Ironholds
[17:30:57] (and other analytics people!)
[17:31:18] hey yuvipanda
[17:31:19] is UA processing for EL a thing? or do we still need to log the UA manually for now?
[17:31:28] uh.
[17:31:35] the UA is included in the EL schemas, afaik
[17:31:42] I don't know if it's piped through the UA processor, but I assume not.
[17:31:47] like, the event wrapper schema? or the individual event schema?
[17:31:50] luckily there is a python port of the library we use
[17:31:54] oh, no, not talking about the UA processor at all.
[17:32:02] that I do not know. But I know most of those words, if that counts?
[17:32:04] Not all, but most.
[17:32:05] actually, forget I said UA. This is for the app so UA is irrelevant.
[17:32:12] yep. You guys mess my work up good.
[17:32:20] question is do I need to include in every schema a 'platform' that identifies android / iOS
[17:32:29] I...do not know.
[17:32:37] I think I should, in that case.
:)
[17:32:39] I can tell you it would be terribly useful if you included that in the app UAs ;p
[17:33:00] does EL store the UA of the request that made that log?
[17:33:19] to rephrase my question, when EL stores an event, does it also store the UA?
[17:33:52] I believe so.
[17:34:21] yes yuvipanda, EL currently stores User Agents for all events as defined by all schemas
[17:34:26] aha!
[17:34:28] cool
[17:34:32] that's the answer I was looking for. yay!
[17:34:32] it does so because the User Agent is captured as part of the capsule schema
[17:34:50] warning yuvipanda: this is not in compliance with our privacy policy
[17:34:54] milimetric: link to the capsule schema?
[17:35:08] so long term we will be looking to sanitize the UAs after 90 days
[17:35:13] looking for the link now
[17:35:35] milimetric: ah.
[17:35:39] yuvipanda: http://meta.wikimedia.org/wiki/Schema:EventCapsule
[17:36:03] hmm
[17:36:03] but, yuvipanda, the other part of the privacy policy compliance is that if you, for example, added UAs to your schema we would sanitize those as well after 90 days
[17:36:19] so regardless, you should not add another way of tracking the UA
[17:36:32] * yuvipanda regrets using the word 'UA'
[17:36:52] since it is *very* different from a browser UA
[17:36:54] you're just looking for stuff that can be parsed from the UA?
[17:36:56] and doesn't have most of the same issues
[17:36:58] yeah, right
[17:37:01] milimetric: so this is for the app
[17:37:07] right yuvipanda, i follow
[17:37:10] milimetric: and our UA is pretty much WikipediaMobile/ OS/
[17:37:18] the non-sensitive parts of the UA will most likely be kept
[17:37:27] but we are still deliberating on that
[17:37:31] indeed, is there anything sensitive at all in those?
[17:37:36] and we'll be implementing some kind of opt-in / opt-out mechanism
[17:38:04] hmm.
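The app UA shape mentioned at 17:37:10 (roughly `WikipediaMobile/<version> <OS>/<version>`) is simple enough to pull apart without a full UA-parsing library. A minimal sketch, assuming a hypothetical UA format and example strings; this is not the parser the team actually uses:

```python
import re

# Hypothetical shape of the app UAs discussed above; the exact format
# (and the example strings below) is an assumption for illustration.
APP_UA = re.compile(
    r"^(?P<app>WikipediaApp|WikipediaMobile)/(?P<version>[\d.]+)"
    r"\s*\((?P<os>[^;)]+)"
)

def parse_app_ua(ua):
    """Return (app, version, os) for an app-style UA, or None otherwise."""
    m = APP_UA.match(ua)
    if m is None:
        return None
    return m.group("app"), m.group("version"), m.group("os").strip()
```

For example, `parse_app_ua("WikipediaApp/2.0.1 (Android 4.4; Phone)")` yields `("WikipediaApp", "2.0.1", "Android 4.4")`, while a desktop browser UA yields `None`.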
[17:38:12] well, yuvipanda, whether there's something sensitive in UAs of any kind, even your very generic ones, is the subject of hot debate on the lists
[17:38:15] so I won't comment here
[17:38:36] I think I'll just stay out of the entire discussion, and have an extra field that simply says if the Android app is making the log or the iOS app.
[17:38:44] they are very literally two different codebases, so I think that is fine.
[17:38:55] but I believe we'll end up keeping the basics - browser, os, etc. unless people opt-out / if people opt-in
[17:39:45] well, yuvipanda, once we go down the opt-in / opt-out road, all event logging details will be subject to censorship based on users' individual decisions
[17:39:59] so if they want all PII to be stripped, we would have to strip that too
[17:40:07] what, which codebase they are using?
[17:40:32] either way, if that happens, then I am pretty sure apps won't be the only thing affected, so I'd rather deal with it at that point than now :)
[17:40:45] i don't know if that will count, but it might, it depends on whether or not it would add enough bits of identifying information to be dangerous
[17:41:00] well, now you don't have to deal with anything
[17:41:05] indeed :)
[17:41:10] because the userAgent field in the capsule has everything
[17:41:20] and will continue to have everything for a long time until we get this prioritized
[17:41:39] indeed :D
[17:58:59] (CR) Csalvia: "It is possible to write such a test - but we'd have to include the old broken version of the function around just for the test.
I think a " [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/125752 (owner: Csalvia)
[18:07:44] (PS1) Ottomata: Upping the number of camus mappers to match the number of kafka partitions we are importing [analytics/kraken] - https://gerrit.wikimedia.org/r/125763
[18:08:01] (CR) Ottomata: [C: 2 V: 2] Upping the number of camus mappers to match the number of kafka partitions we are importing [analytics/kraken] - https://gerrit.wikimedia.org/r/125763 (owner: Ottomata)
[18:19:48] (PS1) Ottomata: Updating kraken to latest [analytics/kraken/deploy] - https://gerrit.wikimedia.org/r/125765
[18:19:55] (CR) Ottomata: [C: 2 V: 2] Updating kraken to latest [analytics/kraken/deploy] - https://gerrit.wikimedia.org/r/125765 (owner: Ottomata)
[18:21:20] (PS2) Csalvia: Fixed parse_username to handle Unicode [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/125752
[18:23:21] thanks csalvia, testing and merging
[18:29:35] hold on... one more patch, sorry!
[18:29:41] simplifying parse_username even further
[18:30:35] (PS3) Csalvia: Fixed parse_username to handle Unicode [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/125752
[18:41:04] (CR) Milimetric: "I tested this in staging on the cohort that caused the bug in the first place. The new error is:" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/125752 (owner: Csalvia)
[18:49:03] yuvipanda, the WikipediaApp and WikimediaMobile UAs are you guys, right?
[18:50:39] Ironholds: WikipediaApp is the new one, WikipediaMobile (not Wikimedia) is the old app.
[18:50:48] Ironholds: so yes.
[18:50:48] danke
[18:50:56] good news; you guys seem to be trouncing third-party apps
[18:51:10] bad news; developers don't know how to write acceptable UAs
[18:52:59] heh
[18:53:12] WikipediaMobile UAs were not very consistent, but WikipediaApp should do better
[19:06:32] define 'not very consistent'?
[19:06:39] as long as they all contain WikipediaMobile, that's all I need.
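For context on the parse_username Unicode fix under review above: MediaWiki-style username normalization typically trims surrounding whitespace, turns underscores into spaces, and uppercases the first character, and a byte-oriented implementation of that breaks on non-ASCII names. A plausible Python 3 sketch of the idea, not the actual wikimetrics code:

```python
def parse_username(raw):
    """Normalize a raw username roughly the way MediaWiki canonicalizes
    titles: trim whitespace, underscores to spaces, uppercase the first
    character. Operating on Python 3 str (text, not bytes) is what keeps
    this safe for non-ASCII names.
    """
    name = raw.strip().replace("_", " ")
    if not name:
        return name
    return name[0].upper() + name[1:]
```

For example, `parse_username("  dan_andreescu ")` yields `"Dan andreescu"`, and a non-ASCII name like `"ötto"` yields `"Ötto"` rather than raising or mangling bytes.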
[19:06:55] Ironholds: Other than old versions I think they did.
[19:07:01] oy.
[19:07:05] hopefully nobody is using those
[19:07:26] yeah. they were on iOS too, where the upgrade paths are faster. plus very few people use our app on iOS anyway - it is terrible on iOS
[20:02:03] Hey milimetric. I'm starting to work on an example of "Productive new editor" code.
[20:02:14] Do you have some time to discuss the details with me?
[20:04:51] sure halfak, let's do it
[20:04:53] batcave?
[20:04:55] :)
[20:04:58] Yup.
[20:55:20] qchris: i'm thinking about moving the camus.webrequest.properties file out of the kraken repo, and into kraken-deploy
[20:55:22] whatcha think?
[20:55:39] ottomata: Ok for me.
[20:55:57] Does this mean kraken will get rid of configs in general?
[20:56:15] Like e.g.: oozie (in the long run)
[20:56:20] hmmmm
[20:56:22] i dunno
[20:56:28] hmmm
[20:56:37] Parts here, parts there would be confusing.
[20:56:41] yeah true
[20:56:47] But honestly ... I dunno.
[20:56:56] oozie sorta describes how jobs run and their flow
[20:57:00] chains them together
[20:57:10] but hm
[20:57:11] yeah
[20:57:12] ergh
[20:57:15] i'm not sure
[20:57:20] you know hm, i'm not going to do it now
[20:57:26] What?
[20:57:32] I didn't want to stop you.
[20:57:32] move the .properties
[20:57:35] naw
[20:57:38] i'm not sure if I want to either
[20:57:45] i'm just having trouble deploying kraken because submodule support in trebuchet isn't so good, it turns out
[20:57:48] i'm going to have to fix it
[20:57:57] I see.
[20:58:27] Will that be hard to fix?
[20:58:39] If so, let's move the properties file over.
[20:59:05] Just to see how it works ... and later fix it and move it back.
[20:59:30] hm, ok
[20:59:36] well, i mean
[20:59:39] no, let's not
[20:59:43] Ok.
[20:59:47] right now I only need that change made in one spot
[20:59:49] for it to be applied
[20:59:52] so i'm committing the changes
[20:59:57] and i will just apply them manually :/
[21:00:01] too
[21:00:04] :-))
[21:00:24] so, friday I turned on webrequest_bits camus import
[21:00:29] and the jobs are taking Forever to run
[21:00:33] Hahahaha.
[21:00:44] we have this set right now
[21:00:44] # Max minutes for each mapper to pull messages (-1 means no limit)
[21:00:44] kafka.max.pull.minutes.per.task=-1
[21:00:56] :-)
[21:01:01] which means that a single camus mapper will run as long as it can in order to consume as many messages from kafka as it can
[21:01:04] i'm thinking about setting that to an hour
[21:01:09] or maybe 59 minutes :)
[21:01:15] so that new jobs can be launched
[21:01:25] the huge bits backlog to import
[21:01:30] is keeping mobile logs from being imported
[21:01:35] i think this will be fine once bits catches up
[21:01:39] and there is only an hour to import
[21:01:43] What ... bits import is starving mobile import?
[21:01:52] yes, because they are consumed as a single job
[21:01:54] so
[21:01:59] Mhmm.
[21:02:06] since I turned this on on friday
[21:02:11] only two Camus jobs have been launched
[21:02:15] :-D
[21:02:20] and, if even a single mapper takes a long time to run
[21:02:23] the job will still exist
[21:02:26] Yes. True.
[21:02:29] and the camus wrapper script will not launch another job
[21:02:42] so, I'm thinking about just letting them run for no more than 59 minutes each
[21:03:05] that way the job will just end, and the new camus job's mappers will start where the last one left off
[21:03:23] IIRC it sometimes took >1 minute until the job really started to fetch data.
[21:03:37] So it might be worth adding some more margin there.
[21:04:52] 55 minutes?
[21:05:01] ITS A TRAP!
[21:05:05] Looks better to me :-)
[21:05:10] Hahaha
[21:08:38] appear.in is such a nice place for stalkers :-/
[21:13:44] haha
[21:13:45] yup
[21:13:50] ok ok, ummmm, what?
[21:13:55] we can jump back in there if you wanna
[21:14:14] i'm in there now qchris
[21:14:26] coming
[21:18:50] (PS1) Ottomata: Limiting each Camus mapper to a runtime of 55 minutes [analytics/kraken] - https://gerrit.wikimedia.org/r/125880
[21:19:07] (CR) Ottomata: [C: 2 V: 2] Limiting each Camus mapper to a runtime of 55 minutes [analytics/kraken] - https://gerrit.wikimedia.org/r/125880 (owner: Ottomata)
[21:20:45] ha, the camus logs are showing map completion of about 1% every 2 hours right now :p
[21:20:55] this job will be finished in 8 hours
[21:20:55] ha
[21:20:57] oh well
[21:20:59] i'll let it go
[21:21:03] Haha.
[21:21:06] hopefully tomorrow morning it'll be all better
[21:21:11] or at least
[21:21:15] running sorta normally
[21:21:38] If not tomorrow ... we have buffer for a whole week :-)
[21:21:44] We have up until Friday.
[21:23:59] haha, yup
[21:24:06] ok, time to go
[21:24:07] byeyeyyeye
[21:24:30] Bye.
[21:24:36] Enjoy your evening. Party!
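The Camus change discussed above amounts to a one-line tweak in camus.webrequest.properties. A sketch of the before/after state; the property name and its old value are quoted from the conversation, while the surrounding comments are editorial:

```properties
# Max minutes for each mapper to pull messages (-1 means no limit)
# Before: a single mapper could run indefinitely, so one slow partition
# (the webrequest_bits backlog) kept new Camus jobs - and therefore the
# mobile log import - from being launched.
# kafka.max.pull.minutes.per.task=-1

# After: cap each mapper at 55 minutes, leaving margin under the hourly
# launch cadence for the >1 minute of job startup overhead qchris noted.
kafka.max.pull.minutes.per.task=55
```

With the cap in place, each Camus run simply ends after 55 minutes and the next run's mappers resume from the Kafka offsets where the previous run left off.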