[00:06:43] madhuvishy: "In the end, though, just having a number that everyone can point to as an acceptable proxy of reality is more important than how accurate that number may be. " [00:09:43] nuria: yup, true [00:10:03] madhuvishy: also "A study published this year by a Web security company found that bots make up 56 percent of all traffic for larger websites," [00:10:22] Analytics-Backlog: Create a tool that can read the elements of a wiki template and call the API {kudu} - https://phabricator.wikimedia.org/T117290#1794893 (Milimetric) @madhuvishy says "I think you can already do this using mwparserfromhell" Thanks both! [00:38:33] madhuvishy: i have this other spreadsheet: https://docs.google.com/spreadsheets/d/1nslrLlWgxiSF7tQYcHY1_aoZFNGvcqWj5C0TFc8nJ_8/edit#gid=1796802020 [00:38:44] madhuvishy: but i think i am missing your main one [00:42:42] Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1794923 (yuvipanda) So I can confirm that it does get killed in 30min but just doesn't get reflected in the status. I'm investigating why [00:44:36] Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1794925 (yuvipanda) ``` Nov 10 00:21:23 quarry-runner-01 celery[29630]: Traceback (most recent call last): Nov 10 00:21:23 quarry-runner-01 celery[29630]: File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in... [00:48:03] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, and 2 others: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1794939 (bd808) [01:01:51] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#1794958 (kaldari) [01:02:29] Analytics, Community-Tech, MediaWiki-extensions-Gadgets, Tracking: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#240606 (kaldari) [01:27:30] nuria: what main one? 
[03:04:53] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0] [03:14:58] !log restarted eventlogging [03:15:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [03:45:02] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [09:26:51] (PS1) Christopher Johnson (WMDE): adds parameters to get_estimated_card_from_prop_predicate [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252169 [09:27:29] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds static path for tsv output [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252033 (owner: Christopher Johnson (WMDE)) [09:28:20] (PS2) Christopher Johnson (WMDE): adds parameters to get_estimated_card_from_prop_predicate [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252169 [09:29:51] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds parameters to get_estimated_card_from_prop_predicate [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252169 (owner: Christopher Johnson (WMDE)) [09:45:22] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015, User-greg: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1795257 (Aklapper) a:greg I think chasemp and I worked out the technical side of things in the last comments, so... [12:08:02] Analytics-Cluster, Analytics-Engineering: analytics1032 has / mounted ro - https://phabricator.wikimedia.org/T118175#1795496 (MoritzMuehlenhoff) There's plenty of oom-killer invocations on java processes in syslog. One of those suspiciously looks like the culprit (at Nov 7 17:10:50). (The oom killing h... 
[12:31:02] (PS4) DCausse: AvroBinaryMessageDecoder: added "ts" as a possible record for timestamp [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) [12:32:29] (CR) DCausse: [C: -1] "The avro schema is incorrect (ts is missing). Not sure what I've done, looks like I copied/pasted it from a bad reference." [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse) [12:42:58] (CR) DCausse: "I'm wondering whether the proper way to fix this issue would not be to update the schema and rename ts to timestamp in CirrusSearchRequestSet.avsc" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [12:44:58] (PS9) DCausse: Add initial oozie job for CirrusSearchRequestSet [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) [12:58:46] Analytics: Update reportcard.wmflabs.org with July-October data - https://phabricator.wikimedia.org/T116244#1795567 (Milimetric) @Tbayer, I've not received new data. Sorry I didn't see this task earlier. The process to get stuff in front of Analytics' eyes is confusing at the moment, we mostly lost use of... [12:59:02] Analytics-Backlog: Update reportcard.wmflabs.org with July-October data - https://phabricator.wikimedia.org/T116244#1795569 (Milimetric) p:Triage>Normal a:Milimetric>None [13:30:46] Analytics-Backlog: Update reportcard.wmflabs.org with July-October data - https://phabricator.wikimedia.org/T116244#1795623 (ezachte) Well I am mostly responsible for this. When the comScore unique visitors and page views counts dropped so suddenly that it raised serious doubt over the numbers, and our intern...
[13:37:57] Analytics-Backlog: Update reportcard.wmflabs.org with July-October data - https://phabricator.wikimedia.org/T116244#1795630 (ezachte) BTW I propose we don't update comScore trends (and tell so on the RC). Their sudden drop seems inconsistent with our internal numbers. comScore was asked to comment and they ag... [14:06:48] joal, hi [14:19:28] (PS1) Christopher Johnson (WMDE): add param qname to getclaims graphs [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252220 [14:20:38] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] add param qname to getclaims graphs [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252220 (owner: Christopher Johnson (WMDE)) [14:29:42] Analytics-Tech-community-metrics: Pull user profile data from wikitech.wikimedia.org and use it in community metrics - https://phabricator.wikimedia.org/T53050#1795781 (Qgil) I think it can be declined. It hasn't progressed in more than two years, and it doesn't look like it's going anywhere. [14:43:47] hi mforns, still on and off today, Lino's home [14:48:43] hey joal [14:48:54] ok [14:49:11] just wanted to let you know I read your email [14:49:25] k mforns, makes sense ? [14:49:34] yes, totally [14:49:46] but, wanted to ask what is the correct way to query for those chars? [14:50:03] hmmmm, actually I have no idea !!! [14:50:17] In cassandra I can tell, but now through restbase [14:50:26] I tried HIV%2FAIDS [14:50:34] but it doesn't work [14:51:04] ok [15:43:29] holaaaa [15:43:39] helloo [15:44:01] mforns: did we have tentative q3 goals anywhere? [15:44:14] Hi a-team, still baby-sitting Lino, will be partially here tonight [15:44:15] mforns: not q3 but next quarter that is [15:44:28] joal: sounds good, baby comes first [15:44:41] nuria, do you mean my personal goals? [15:45:17] mforns: no, team wise [15:45:37] nuria, oh, looking [15:45:46] np joal, hope the little guy feels better [15:46:46] cc milimetric , did we have anywhere our q3 goals?
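[Editor's note] The HIV%2FAIDS exchange above is about a title containing "/" in a REST API path. A minimal sketch of producing that encoding with the standard library (the endpoint URL here is illustrative; whether the service accepted the encoded form is a separate question — the chat suggests it did not at the time):

```python
# Percent-encode a page title containing "/" before placing it in a URL path.
# quote() leaves "/" alone by default, so safe="" is required here.
from urllib.parse import quote

title = "HIV/AIDS"
encoded = quote(title, safe="")
print(encoded)  # HIV%2FAIDS

# Illustrative (not authoritative) pageview API URL using the encoded title:
url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/all-agents/" + encoded + "/daily/2015110100/2015111000")
```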
[15:47:03] nuria, I don't remember having any q3 goals notes... [15:47:06] main doc only goes up to q3: https://docs.google.com/presentation/d/1wzm4ZppxiR5Y72Hkg5k2YMmH_2J01Hucb4yJb0yuNzY/edit#slide=id.ge41920073_17_0 [15:53:48] nuria: nope, no Q3 goals [15:54:52] milimetric: nuria i think this doc has some stuff - https://docs.google.com/document/d/1Rtp4vE1HCXfYzohKgat1A1J88Vb1rcw5GBAjDQQjaTk/edit [15:56:19] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1796023 (Ottomata) Sounds good. Shall I just find a time and set one up? [15:57:31] milimetric, mforns: ok, i have a meeting today where we might need to present some preliminary stuff, will add some things to: [15:57:40] https://docs.google.com/presentation/d/1wzm4ZppxiR5Y72Hkg5k2YMmH_2J01Hucb4yJb0yuNzY/edit#slide=id.ge828da815_0_0 [15:57:45] cc mforns milimetric [15:57:46] ok [15:58:52] mmm, nuria you wanna talk about it? I'm happy to help [15:59:20] my Q3 ideas would be like: continue wikistats work and Event Bus work [15:59:30] milimetric: i just wrote those two down [15:59:40] sweet [15:59:47] milimetric: also ops, as having the pageview API will increase our ops workload [16:00:02] milimetric: i gave a concrete example of eventbus mvp (cc ottomata ) [16:00:09] as in "edit stream comes to hadoop" [16:00:09] and Piwik if that ever happens, and a little bit of support for wikimetrics [16:00:26] let me write those two on a side note there [16:05:30] nuria, didn't we have a pageview-api retro scheduled for now? [16:05:38] I may be wrong [16:08:22] hi a-team... is there a postmortem on API happening now? I'm sitting alone in the hangout [16:08:36] kevinator, asking the same [16:09:03] are you in the hangout too? Is google not working? [16:10:07] postmortem? should I come? [16:10:10] (Hi all!)
[16:10:27] joal, ottomata , milimetric , mforns , kevinator , madhuvishy : moving pageview API retro [16:10:34] so joal can make it [16:10:39] ok [16:10:43] thx nuria [16:10:45] as today is not the best day for him [16:11:35] Analytics-Cluster, Analytics-Engineering: analytics1032 has / mounted ro - https://phabricator.wikimedia.org/T118175#1796068 (Ottomata) I saw the OOM killer too, but wasn't sure how that could cause the root partition to go into read only. [16:17:33] milimetric: for cassandra .. which are your permits? [16:17:39] "aqs-admins?" [16:19:26] nuria: where? [16:19:34] oh moving it [16:19:35] cool [16:19:36] nm [16:21:16] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1796119 (ezachte) Fixed foundation stats, which uses codes a bit differently: www.f is foundation desktop, m.f is foundation mob... [16:23:08] ottomata:, milimetric , madhuvishy , mforns , kevinator , joal: moving tasking half an hour later [16:23:20] ok [16:25:07] ottomata:, milimetric , madhuvishy , mforns , kevinator , joal: arg, sorry, we cannot as it will conflict with our wikistats meeting. I guess i will just join late [16:27:52] nuria: sure, also you can use a-team to ping us all - i think we all set it up as buzzword to get pinged [16:28:01] ooohh [16:28:39] yes, all of us set up a-team as a buzzword while you were on leave. We forgot to tell you to do that too :-) [16:30:42] ottomata: i did not do much on mvp front yesterday but i added couple tests, let me push those [16:38:09] ottomata: pushed couple tests, will continue working on testing [16:38:11] today [16:38:18] awesome [16:38:30] nuria: i'm working on some testing soon too, we should sync up [16:38:46] ottomata: ok, now?
[16:39:00] sure [16:39:28] omw to batcave [16:51:03] Analytics-Backlog: Create depot to have different API clients for pageview API - https://phabricator.wikimedia.org/T118190#1796223 (Nuria) [16:51:50] Analytics-Kanban: Understand the Perl code for Client OS Major Minor Version report {lama} - https://phabricator.wikimedia.org/T117246#1796227 (Nuria) a:Milimetric>Nuria [16:52:36] Analytics-Kanban: Spike: Understand how Wikistats Traffic reports are computed {lama} [8 pts] - https://phabricator.wikimedia.org/T114669#1796233 (Nuria) Open>Resolved [16:52:37] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 {lama} [13 pts] - https://phabricator.wikimedia.org/T112911#1796234 (Nuria) [16:52:53] Analytics-Kanban, Patch-For-Review: Analytics support for echo dashboard task {frog} [8 pts] - https://phabricator.wikimedia.org/T117220#1796235 (Nuria) Open>Resolved [16:54:23] Analytics-EventLogging, Analytics-Kanban, MediaWiki-extensions-WikimediaEvents, Patch-For-Review: Provide oneIn() sampling function as part of standard library - https://phabricator.wikimedia.org/T112603#1796255 (Nuria) Open>Resolved [16:54:55] dcausse: with yesterday's craziness i am running behind on CR-ing your code [16:55:03] dcausse: i have it in mind for today [16:55:10] nuria: thanks! :) [16:56:21] nuria: also I was wondering (concerning the ts <> timestamp problem) if we should not rename it to timestamp [16:56:51] dcausse: what was the issue there? was it a reserved word? [16:57:20] reading reviews I see that ottomata has some concerns with a field name "timestamp", but reading java code in refinery it looks like it's a convention to have a timestamp field [16:58:04] so I'm not sure what we should do :/ [16:58:48] dcausse: where?
[16:58:50] :) [16:59:17] my concerns are mostly conventional, we use dt for ISO8601 string timestamps [16:59:44] dcausse: whatever makes our life easier [16:59:59] ottomata: https://gerrit.wikimedia.org/r/#/c/246990/2/refinery-camus/src/main/avro/CirrusSearchRequestSet.avsc [17:00:21] nuria: on my side the easiest is not to re-deploy the avro schema :) [17:00:32] naw i mean in refinery [17:00:50] dcausse: if you already have it as timestamp, i'm ok with that, is it a problem for you? [17:01:12] ok, then it sounds like we do not need to change it and we can abandon the patch [17:01:52] ottomata: no we changed it to "ts" according to your comment, but refinery-camus expects "timestamp" which is why I've submitted: https://gerrit.wikimedia.org/r/#/c/251267/ [17:01:57] i prefer ts for ints to match dt convention, and to avoid possible name conflict, buuuuut, my preference is only slight, and if it is technically annoying, then just ignore [17:02:32] (sorry for the confusion btw) [17:02:36] cannot join standup cc milimetric give me a sec [17:02:44] ha, well [17:02:51] the best way to do it is make it configurable [17:02:52] e.g. [17:03:00] https://github.com/wikimedia/analytics-camus/blob/wmf/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/JsonStringMessageDecoder.java#L59 [17:03:47] dcausse: nuria, i think we should implement the timestamp properties like from JsonStringMessageDecoder in the AvroDecoder. [17:04:14] https://github.com/wikimedia/operations-puppet/blob/production/modules/camus/templates/webrequest.erb#L26 [17:05:00] ottomata: are camus imports run inside their own JVM? would it be possible to e.g. -Davro.timestamp.field="ts" and keep "timestamp" as a default? [17:05:22] dcausse: on standup, let's sync up later [17:09:02] dcausse: uhhh, no/yes? they are run as a mapreduce job on the cluster [17:09:09] but yes, you can -D...
to set them [17:09:19] but, we would just do it in the camus .properties file [17:09:27] in here [17:09:28] https://github.com/wikimedia/operations-puppet/blob/production/modules/camus/templates/mediawiki.erb [17:09:55] ha, the property is being set, but the Avro code doesn't use it [17:09:56] # use the dt field [17:09:56] camus.message.timestamp.field=timestamp [17:11:01] ottomata: thanks! will try to update my patch then [17:11:16] k cool, yeah, i think you can just copy/paste from the Json decoder [17:11:22] it'd be cool if that stuff was generic in camus somewhere, but meh? [17:11:25] not worth it for this change [17:11:43] dcausse: the code in jsondecoder is nice because it makes the timestamp format flexible [17:11:53] you can do string format, or int timestamp in seconds or milliseconds [17:21:17] Guys, do you give me a few minutes somewhere in tasking to review the doc about wikistats graphs, or is it not necessary ? [17:21:21] a-team --^ [17:22:00] joal, great idea [17:32:07] joal, milimetric , madhuvishy , mforns , kevinator: will catch up mid way on tasking meeting [17:32:18] aha [17:38:47] Analytics-Kanban: Pageview API Press release {slug} - https://phabricator.wikimedia.org/T117225#1796368 (Nuria) [17:54:10] hey mforns, I'm back! [17:54:21] (in batcave) [17:54:44] joal, so which was the UDF that handled arrays again? [17:55:11] nuria: sorry, yes, aqs-admins [17:55:27] (CR) OliverKeyes: "Nuria, thoughts on my enum comments? Not having this patch resolved is acting as a blocker and creating some technical debt in how our ana" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [17:56:29] Ironholds: which comment specifically? [17:56:39] Ironholds: you talking about explode? 
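[Editor's note] The decoder behaviour discussed above — read the timestamp field name from the `camus.message.timestamp.field` property and normalize second-resolution values to the millisecond resolution Camus uses internally — can be sketched as below. This is a hedged Python illustration, not the actual Java decoder; the property name follows the linked JsonStringMessageDecoder, and the sample record is made up.

```python
# Sketch: configurable timestamp field with seconds -> milliseconds
# normalization, mirroring the behaviour described in the chat.
def extract_timestamp_ms(record, props):
    """Read the configured timestamp field and return milliseconds."""
    field = props.get("camus.message.timestamp.field", "timestamp")
    value = record[field]
    # Heuristic: unix timestamps in seconds are ~10 digits, in ms ~13.
    if value < 10_000_000_000:
        value *= 1000
    return value

record = {"ts": 1447027200}  # seconds, as the jobs in the chat sent
props = {"camus.message.timestamp.field": "ts"}
print(extract_timestamp_ms(record, props))  # 1447027200000
```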
http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/ [17:57:06] milimetric, oh ok [17:57:11] Ironholds: i answered several here: https://gerrit.wikimedia.org/r/#/c/247601/5/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/ExternalSearch.java [17:57:46] Ironholds: with the idea of having a better system to manage the amount of regexes on that file [17:57:49] Ironholds: brb [17:59:16] oh, I thought I'd replied to those [17:59:29] milimetric, no, I'm talking about an example of an actual Java UDF that can handle arrays [17:59:44] I do not want to write the world's most complicated Hive subqueries and then have to plug them into every single query [18:00:14] (CR) OliverKeyes: "Okay, comment (which evidently didn't save) was:" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [18:10:57] Ironholds: UDF is not using arrays but maps IIRC, and it's mediacount stuff [18:12:57] thanks. The only UDF I can find that's media-related is MediaFileUrlParserUDF and I'll confess to not understanding how the hell it works [18:13:13] Right, that's the one I think :) [18:14:33] Facebook has some array-based ones that don't look terrible [18:14:36] I'll experiment with them [18:14:43] k Ironholds [18:19:22] Analytics-Backlog: Allow metrics to roll up results by user across projects {kudu} [5 pts] - https://phabricator.wikimedia.org/T117287#1796712 (Milimetric) [18:27:41] Ironholds: let me help you with your code changes for enums, the idea is to have fewer regex/string constant pairs and give those some structure, we can work on that later on today if you want [18:28:10] nuria, sure, I get that, but I don't get /why/ that is a goal. [18:28:22] We currently have 5 lines of code. The intent is to replace them with 20 lines that produce no additional functionality.
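[Editor's note] The explode()/LATERAL VIEW pattern from the linked post can be illustrated in plain Python: each row carrying an array is expanded into one output row per array element, joined with the row's other columns. The sample rows below are invented; in Hive this would be `SELECT page, tag FROM rows LATERAL VIEW explode(tags) t AS tag`.

```python
# Sketch of LATERAL VIEW explode() semantics: flatten array columns
# into one row per element, keeping the other columns.
rows = [
    {"page": "Main_Page", "tags": ["a", "b"]},
    {"page": "HIV/AIDS", "tags": ["c"]},
]

exploded = [
    {"page": row["page"], "tag": tag}
    for row in rows
    for tag in row["tags"]
]
print(exploded)
```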
[18:28:45] Ironholds: no, you have one string constant per regex [18:28:58] like /^bing/ goes with "BIG" [18:29:01] like /^bing/ goes with "BING" [18:29:01] yep [18:29:14] so with an enum those two become a tuple [18:29:31] okay, that makes sense. It still works out as a more complicated thing, but I can see some advantages for maintainability. [18:29:32] there is no additional code, rather structure, [18:29:41] Ironholds: let's work together on it [18:30:07] also makes those regex/string pairs "extractable" [18:30:20] and usable in other UDFs without extension of a class [18:31:02] cool! I'm down [18:34:34] I need to review dcausse's patch first [18:35:19] kk [18:35:25] dcausse: did you push your latest patch then and are you going to do dt/timestamp changes in puppet? [18:35:26] I need to work out how one manipulates a weird struct [18:35:43] nuria: I'm working on implementing something like https://github.com/wikimedia/analytics-camus/blob/wmf/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/JsonStringMessageDecoder.java#L59 [18:35:47] in refinery-camus [18:36:02] concerning the oozie job everything is done [18:36:36] dcausse: ok, i see, that change should be testable [18:36:52] will do my best :) [18:38:08] dcausse: thank you [18:39:21] dcausse: what about the camus cron job? did we update that one with "--check" option? [18:39:53] nuria: yes done by joal and deployed, _IMPORTED flag is created as expected [18:40:20] dcausse, nuria : yes, but timestamp being incorrect, flag is created on incorrect data [18:40:33] yes [18:41:25] do we have any other jobs that use AvroBinaryMessageDecoder or AvroJsonMessageDecoder ? [18:41:35] dcausse: active?
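[Editor's note] The enum idea nuria describes — each search engine becomes one (label, pattern) pair, so the regex/constant pairs are "extractable" and reusable outside a single UDF — would be a Java enum in refinery; the Python version below only illustrates the structure. The GOOGLE and YAHOO patterns are invented examples; only /^bing/ appears in the chat.

```python
# Sketch: regex/label pairs as enum members, with one classify() helper
# instead of parallel string constants and patterns.
import re
from enum import Enum

class SearchEngine(Enum):
    BING = r"^bing\."
    GOOGLE = r"^(www\.)?google\."   # hypothetical pattern
    YAHOO = r"^search\.yahoo\."     # hypothetical pattern

    @classmethod
    def classify(cls, host):
        for engine in cls:
            if re.search(engine.value, host):
                return engine.name
        return "NONE"

print(SearchEngine.classify("bing.com"))     # BING
print(SearchEngine.classify("example.org"))  # NONE
```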
no [18:41:39] ok [18:41:55] I was wondering about the defaults, looks like it was unix timestamp in milli sec [18:42:37] dcausse: yeah i think camus internally uses the timestamp as milliseconds [18:42:52] the json code i linked to converts to milliseconds if you aren't providing that [18:43:10] our jobs send timestamps in seconds, so this patch was necessary anyways :) [18:45:48] joal: who runs "refinery-drop-cirrus-searchrequest-set-partitions" [18:46:17] joal: are they run on a cron outside hdfs? [18:46:33] cc ottomata [18:47:09] yes cron. [18:56:08] actually a-team my wife just arrived, can join ! [18:56:12] :) [18:56:17] cool joal [18:56:29] pff, just in time :) [19:18:05] (CR) DCausse: [C: -1] "Will implement something similar to https://github.com/wikimedia/analytics-camus/blob/wmf/camus-kafka-coders/src/main/java/com/linkedin/ca" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [19:18:55] wikimedia/mediawiki-extensions-EventLogging#510 (wmf/1.27.0-wmf.6 - 1b0d4bc : Mukunda Modell): The build has errored.
[19:18:55] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/1b0d4bc35671 [19:18:55] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/90372650 [19:35:45] Analytics-Backlog: Create a set of celery tasks that can handle the global metric API input {kudu} - https://phabricator.wikimedia.org/T117288#1797007 (Milimetric) [19:36:38] Analytics-Backlog: Create celery chain or other organization that handles validation and computation {kudu} [8 pts] - https://phabricator.wikimedia.org/T118308#1797008 (Milimetric) NEW [19:38:40] Analytics-Backlog: Implement the logic of each node in the celery chain {kudu} [5 pts] - https://phabricator.wikimedia.org/T118309#1797023 (Milimetric) NEW [19:39:15] Analytics-Backlog: Expose the results of the global metric at a public link, that's available immediately for the API {kudu} - https://phabricator.wikimedia.org/T118310#1797031 (Milimetric) NEW [19:41:45] the meeting will be in the batcave, no? [19:41:50] now ish? [19:42:02] i guess [19:42:09] a-team ^ [19:42:17] in, but not for long :) [19:42:32] me too, wanna make it to office, eat etc [19:44:44] woops [19:45:06] nuria: ^ I don't think you're getting a-team yet, right? :) [19:47:43] hey, not keeping track of meeting changes, is this something I should go to? [19:48:05] no ottomata [19:48:08] just debrief [19:48:12] on wikistats [19:48:16] oh ok [19:48:19] cool danke [19:51:29] milimetric: no, i do not think so [19:51:39] nuria: join us on batcave? [19:51:49] madhuvishy: omw [19:58:52] Hi ottomata! Thanks for the links. So, I guess I want to set up a periodic job asking hive or kafka for query strings from all donate.wm.o hits from the past X hours/day where the qs includes recipient_id.
[20:00:01] hiyaaa [20:00:05] Looks like you set up something similar for us with the banner logs [20:00:20] yeah, ejegg, if you want historical stuff just for analysis, i'd use hive or spark [20:00:36] if you need more realtime stuff to feed back into an app (like some FR stuff does) then you can use kafka maybe with kafkatee [20:00:41] nope, this is something FR will want to consult more realtime [20:00:50] oh [20:00:51] ok, so kafka it is [20:00:55] well, i mean, hm. [20:00:59] not actually realtime, [20:00:59] what do you mean real time? [20:01:04] you say past x hours/day [20:01:31] yeah, just so they can iterate on the email wording after seeing how yesterday's mail fared [20:02:41] so, not realtime, then, right? [20:02:45] this is batch on historical data? [20:03:05] sure, if by 'historical' you mean the past day [20:03:10] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [20:03:42] yes [20:03:59] ejegg, at the moment, realtime would be anything that needs to happen more than within an hour [20:04:02] There's no specific timing requested on the ticket, but I'd imagine daily will be very useful [20:04:03] the FR stuff needs the last 15 minutes [20:04:08] from any given time [20:04:14] daily is easy then, you should go with hadoop stuff [20:04:17] not from kafka [20:04:19] ah right, the banner stuff is definitely more time-sensitive [20:04:37] if you do from kafka, you have to parse ALL(most) webrequest logs in realtime (with kafkatee, maybe) [20:04:49] if you go with hadoop, you can wait til a whole day of data is available, and then query it with hive (SQL) [20:05:00] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [20:05:04] Gotcha. [20:05:41] ejegg, are you in analytics-privatedata-users group (with access to stat1002?)
[20:05:55] I am not yet! [20:06:01] madhuvishy: do you have a link to your awesome hive tutorial slides? [20:06:17] ottomata: they are on commons, let me find link [20:06:29] ejegg: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Cluster_access [20:06:47] ottomata: https://commons.wikimedia.org/wiki/File:Introduction_to_Hive.pdf [20:06:48] thanks! I'll make that phab ticket now [20:06:57] thanks madhuvishy ! [20:08:12] ejegg: is this something you will want run daily indefinitely, or just run by hand for a while? [20:08:32] ottomata: i'll want to set up a daily job [20:09:14] ejegg: ok, then likely you'll want to learn oozie too, as you can schedule the jobs more reliably than just with cron, but one thing at a time, eh? [20:09:15] :) [20:09:36] our 3rd party email company can't give us click rates unless they send links through one of their domains, which all look a little suspicious [20:10:00] aha, I was about to ask if there was a recurring job framework. Will look into oozie [20:10:21] Thanks for all the help! [20:10:22] ejegg, wear eye protection, it is ugly [20:10:28] but powerful! [20:10:34] (oozie is XML configs...) [20:10:41] ooh, thrilling. [20:10:57] the main reason we like it better than cron, is you can launch jobs based on dataset presence [20:11:06] and datasets are flexible [20:11:08] so you can say things like [20:11:24] launch this job once each 48 hour period is available every 24 hours [20:11:27] things like that [20:11:30] for your use [20:11:34] you'll just say [20:11:48] launch this daily job once the data for day X is avail [20:12:01] and each 24 hour period will have a single job instance associated with it [20:12:55] If FR asks for fresher data than 1 day old, how recent / often is possible / reasonable to ask from Hive?
Pretty sure they won't need that, but just in case [20:13:37] an hour's data will usually be available within an hour or two of the current hour [20:13:39] k [20:16:24] Analytics-Backlog: Expose the results of the global metric at a public link, that's available immediately for the API {kudu} [8 pts] - https://phabricator.wikimedia.org/T118310#1797209 (Milimetric) [20:17:02] Analytics-Backlog: Create a set of celery tasks that can handle the global metric API input {kudu} [0 pts] - https://phabricator.wikimedia.org/T117288#1797212 (Milimetric) [20:17:37] Analytics-Backlog: Implement a simple public API to calculate global metrics {kudu} [0 pts] - https://phabricator.wikimedia.org/T117285#1797213 (Milimetric) [20:17:40] Analytics-Backlog: Create a tool that can read the elements of a wiki template and call the API {kudu} - https://phabricator.wikimedia.org/T117290#1797216 (Milimetric) p:High>Normal [20:20:19] Analytics-Backlog: Build a public form that can hit the new API {kudu} [8 pts] - https://phabricator.wikimedia.org/T117289#1797246 (Milimetric) [20:21:38] Analytics-Backlog, Analytics-EventLogging: EventLogging (MySQL?) Kafka consumer stops consuming after Kafka metadata change - https://phabricator.wikimedia.org/T118315#1797263 (Ottomata) NEW [20:22:42] Analytics-Backlog, Analytics-EventLogging: EventLogging (MySQL?) Kafka consumer stops consuming after Kafka metadata change - https://phabricator.wikimedia.org/T118315#1797285 (Ottomata) I've only noticed this happening for the MySQL consumer, but I'm not sure why it would be different than the others. [20:22:44] kevinator: can you come to batcave? [20:27:50] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0] [20:29:51] nuria: my wifi crashed again :( [20:30:01] milimetric: ok, we wait for you? [20:30:46] I'm restarting, hopefully it works. 
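[Editor's note] The filtering step ejegg describes — finding donate.wm.o hits whose query string includes recipient_id — would run as a Hive query over the webrequest table in production; the per-request extraction itself can be sketched with the standard library. The URI below is fabricated.

```python
# Sketch: pull recipient_id out of a donate.wikimedia.org request URI.
from urllib.parse import urlsplit, parse_qs

uri = "https://donate.wikimedia.org/?utm_source=email&recipient_id=12345"
qs = parse_qs(urlsplit(uri).query)
recipient = qs.get("recipient_id", [None])[0]
print(recipient)  # 12345
```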
Will let you know either way [20:31:11] milimetric: ok, we will wait a bit [20:31:20] nuria: do you remember what happened to the task for provisioning eventlog2001 as a cold spare? [20:31:23] did i drop the ball? [20:31:33] no , it is a spare [20:31:43] we have that box [20:31:47] it is not in site.pp, though [20:31:55] moritzm pinged me about that [20:32:01] we had it on may [20:32:08] it could be my fault, i may have forgotten to perform some step [20:32:17] do you remember what the task # was? [20:34:08] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [20:37:57] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [20:41:18] let me look [20:51:17] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [30.0] [20:53:59] cannot find task [20:54:24] but i can tell you we had that machine when otto and I moved EL a while back (spring) [20:55:11] ori, it exists [20:55:12] in codfw [20:55:25] eventlog2001.codfw.wmnet [20:55:31] yeah, but it's not in site.pp, hence the problem [20:55:39] ja, i think it was never used [20:55:49] even as a cold spare, it should be there [20:55:59] for key management, etc. [20:56:24] i mean, i don't know the history of it [20:56:29] i just remember looking for it once and finding it there [20:57:02] kk [20:57:07] i'll deal with it [21:03:39] milimetric: yt? still in meeting? [21:04:31] nuria: I was out for lunch... am back now [21:04:35] still in batcave [21:04:40] ? [21:04:49] yes, can you join? 
[21:05:01] omw [21:05:12] ottomata: yes, still in meetings [21:05:39] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [21:05:58] Analytics-Kanban: Pageviews per country wikistats report. Feed data from hive - https://phabricator.wikimedia.org/T118323#1797437 (Nuria) NEW [21:09:06] milimetric: k [21:14:02] Analytics-Kanban: Pageviews per country wikistats report. Feed data from hive - https://phabricator.wikimedia.org/T118323#1797479 (Nuria) a:Milimetric [21:17:20] ottomata: do you know anything about the aws alert? [21:17:26] *aqs [21:18:47] hello there #wikimedia-analytics , I'm new to the Discovery team, (front-end) and I'm having an issue with event-logging. Was wondering if anyone here could help me out. [21:18:51] gwicke: I can [21:19:32] I truncated the per-project table to backfill the data with the new format, and completely forgot about the special row for AQS monitoring [21:19:44] I'll ask milimetric if he can run the script inserting this value [21:19:53] gwicke: --^ ^-- milimetric [21:20:18] joal: kk [21:25:46] mforns: i forget... are we running weekly browser reports and monthly ones? or only weekly? [21:25:47] I'm making an event logging request like the one below, and it's not causing any errors, but it's also not populating any databases, does anyone know why this might be the case? [21:25:54] https://www.irccloud.com/pastebin/noJYabHy/ [21:25:58] nuria, only weekly [21:26:15] jan_drewniak: did you try testing in vagrant with devserver? [21:26:18] jan_drewniak: or beta labs? [21:26:26] jan_drewniak: and look at logs?
[21:26:51] jan_drewniak: https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/TestingOnBetaCluster [21:27:21] nuria: thanks for the tip, I'm new to all this so I haven't done any beta-testing yet [21:28:00] jan_drewniak: ok, please test on vagrant and beta labs and we can help you if you cannot figure it out (you might need to ask for permits to beta labs machine, as you would be needing sudo) [21:28:02] joal / gwicke: for future reference, the data is the same as the data inserted by this test with this WARNING on it: https://github.com/wikimedia/restbase/blob/master/test/features/pageviews/pageviews.js#L80 [21:28:06] I'll insert that now [21:28:18] jan_drewniak: https://www.mediawiki.org/wiki/Extension:EventLogging#Sanity_checking_your_setup [21:28:59] joal / gwicke: it's also in aqs1001.eqiad.wmnet:/home/milimetric/insert-fake-data-for-monitoring.cql [21:29:35] thx milimetric ! [21:31:32] Analytics-Backlog: Visualization of Browser data to substitute current reports on wikistats - https://phabricator.wikimedia.org/T118329#1797533 (Nuria) NEW [21:32:33] Analytics-Backlog: Run browser reports on hive monthly as well as weekly - https://phabricator.wikimedia.org/T118330#1797540 (Nuria) NEW [21:32:49] Analytics-Backlog: Run browser reports on hive monthly - https://phabricator.wikimedia.org/T118330#1797540 (Nuria) [21:33:37] Analytics-Backlog: Visualization of Browser data to substitute current reports on wikistats - https://phabricator.wikimedia.org/T118329#1797549 (Nuria) We should have a way to visualize weekly and monthly data. [21:35:44] Analytics-Backlog: Visualization of Browser data to substitute current reports on wikistats - https://phabricator.wikimedia.org/T118329#1797550 (Nuria) p:Triage>Normal [21:35:54] Analytics-Backlog: Run browser reports on hive monthly - https://phabricator.wikimedia.org/T118330#1797552 (Nuria) p:Triage>Normal [21:43:59] nuria: around? 
[21:44:16] i have 15 minutes until 1-1 with Kevin, we can catch up now if you have time [21:45:04] madhuvishy: better at 3:15 or so ? can you make it then? [21:45:16] nuria: yeah [21:53:07] (CR) Nuria: [C: 2] "I think we are ready to merge this patch, will wait for joal to confirm." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse) [21:55:50] Analytics, Fundraising-Backlog, Wikimedia-Fundraising: Public dashboards for CentralNotice and Fundraising - https://phabricator.wikimedia.org/T88744#1797621 (awight) [21:59:48] milimetric: got a sec? [22:31:26] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1797786 (Ottomata) Just made a calendar event for Tuesday at 10:30 PST. Happy to move it if some other time is better. [22:35:12] ejegg: why does the FR cluster need to read this? [22:36:17] ottomata: Oh, sorry, just someplace we can further process the files and insert them into the fr stats db [22:36:30] why do they need inserted in the fr stats db, that is my q [22:36:30] :) [22:37:20] is this data going to be fed back into an app? or is for analysis to inform future dev work on email stuff? [22:37:50] yeah, it's to let the non-tech FR team know which emails work best [22:38:10] aye ok, just not sure why it needs to go into a mysql db then [22:38:18] ottomata: how about making the schema repository URI configurable, with a template syntax for placeholders, and supporting both http and file schemes? so one invocation could set it to "https://meta.wikimedia.org/w/api.php?action=jsonschema&title=%(schema)s&revid=%(revid)s&formatversion=2", and another could set it to "file:///srv/schemas/jsonschema/%(schema)s.%(revid)s.json" ? [22:38:20] couldn't you just use hive and generate a report or something? 
[22:38:39] right, I guess we could digest the info up front and just give them some stats [22:38:51] ori, the way I have it now, is file schemas just take precedence [22:39:18] ejegg: that is probably better than figuring out how to put the data back into FR mysql [22:39:19] :) [22:39:26] ori [22:39:34] the local schemas are preloaded into schema_cache [22:39:37] that's it [22:39:45] Yeah, I was just picturing that as a step along the way [22:39:55] ottomata: it just seems more elegant -- instead of hard-coding either metawiki or the file path, you just use a URI to point to the schema repository, just as you do for readers/writers [22:40:02] OH [22:40:14] you mean on loading? for a particular schema? yeah, mostly. [22:40:25] on initialization [22:40:41] i had a low priority todo in there about merging file_get_schema and http_get_schema [22:40:46] to use http:// or file:// [22:40:50] and, originally i was doing something like that [22:41:01] looking up schema by name in files [22:41:28] buuuut, i removed that feature. mainly because there is a requirement that latest revision of schema will be used by default [22:41:44] and talk about how local files will only have one revision of schema present at any given time in repo [22:41:47] not sure how I feel about that [22:41:50] but i wanted to support it [22:42:06] so, i support not having revision in schema filename [22:42:32] i did have a schema_path format thing kinda like you just suggested in an earlier patch... [22:43:50] in the same way that there is @writes('mysql') and @reads('udp') there could be a @repo('file') and @repo('http') [22:44:18] and we could re-use the "construct a handler based on the URI scheme" logic from readers/writers [22:44:30] hm, that is configured how? on CLI for the various bin/s? [22:44:58] --schema-repo http:///meta... --schema-repo file:/// [22:44:58] ?
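[Editorial sketch of the idea ori and ottomata discuss above: a schema repository addressed by a URI template, with the handler chosen from the URI scheme the same way eventlogging's readers/writers are dispatched. The `--schema-repo` option name, the `%(schema)s`/`%(revid)s` placeholder keys, and the `@repo` registry are assumptions drawn from the conversation, not eventlogging's actual API; the code is Python 3 for brevity even though eventlogging at the time was Python 2.]

```python
import json
import urllib.parse
import urllib.request

# registry of schema-repo handlers keyed by URI scheme, mirroring the
# "construct a handler based on the URI scheme" pattern used for writers
SCHEMA_REPO_HANDLERS = {}

def repo(scheme):
    """Register a schema-repo handler for a URI scheme (cf. @writes/@reads)."""
    def decorator(fn):
        SCHEMA_REPO_HANDLERS[scheme] = fn
        return fn
    return decorator

@repo('file')
def file_get_schema(template, schema, revid):
    # fill the placeholders, then read the schema from the local filesystem
    path = urllib.parse.urlparse(template % {'schema': schema, 'revid': revid}).path
    with open(path) as f:
        return json.load(f)

@repo('https')
@repo('http')
def http_get_schema(template, schema, revid):
    # fill the placeholders, then fetch the schema over HTTP (e.g. from meta)
    url = template % {'schema': schema, 'revid': revid}
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def get_schema(repo_uri_template, schema, revid):
    """Dispatch to the handler registered for the template's URI scheme."""
    scheme = urllib.parse.urlparse(repo_uri_template).scheme
    return SCHEMA_REPO_HANDLERS[scheme](repo_uri_template, schema, revid)
```

One invocation could then pass `--schema-repo` as an HTTP template pointing at meta, and another a `file://` template, without either location being hard-coded.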
[22:45:09] hmmm [22:45:14] yeah that could work [22:45:20] with defaults if you don't provide [22:45:24] to meta [22:45:27] http://meta [22:45:29] ... [22:45:29] hm [22:45:30] (CR) DCausse: Add initial oozie job for CirrusSearchRequestSet (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse) [22:45:41] ori, could be cool. [22:46:08] you could even have [22:46:17] ori, if you get a sec and feel like having fun, have at it :) i might not get to it now [22:46:17] @repo('confluent://...') [22:46:21] heheheh [22:46:27] and it could have logic to translate from avro to json schema [22:46:33] * ori wasn't kidding! [22:46:51] well, i do think avro support for this will be nice eventually, but i don't think we need confluent if we are doing everything already [22:46:54] URI all the things! [22:47:07] am planning on building in serialization to the topic config [22:47:20] kk [22:47:21] and eventually handling producing avro json or avro binary [22:47:22] BUT [22:47:27] ONE DAY [22:47:27] :) [22:47:28] not now :) [22:47:29] hehehe [22:47:37] ori, now that I have your attention [22:47:37] wfm [22:47:39] heh [22:47:44] dan and I had a thought [22:47:50] of abstracting Event into a class [22:47:53] class Event(dict) [22:48:07] much of the work in this patch is just functions that abstract event meta data retrieval [22:48:19] schema_from_event(event) [22:48:19] etc. [22:48:23] might be nicer to be able to do [22:48:27] event.schema [22:48:44] or event.schema() [22:48:56] or event['schema'] with __getitem__ overloaded [22:48:57] dunno [22:49:13] i decided not to do it for this patch [22:49:16] but was an idea, what do you think? [22:49:56] could do.
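[Editorial sketch of the `class Event(dict)` idea ottomata floats above: a dict subclass exposing metadata through properties, so callers write `event.schema` instead of a free function like `schema_from_event(event)`. The metadata field layout (`meta`, `schema`, `revision`) is an illustrative assumption, not eventlogging's actual payload format.]

```python
class Event(dict):
    """An event payload that still behaves like a plain dict everywhere else."""

    @property
    def schema(self):
        # replaces a free function such as schema_from_event(event)
        return self['meta']['schema']

    @property
    def revision(self):
        return self['meta']['revision']


# callers keep ordinary dict access, but get attribute-style metadata access
raw = {'meta': {'schema': 'CentralNoticeImpression', 'revision': 7}, 'event': {}}
ev = Event(raw)
ev.schema      # metadata via attribute, no helper function needed
ev['event']    # plain dict access is unchanged
```

Because `Event` extends `dict`, existing code that treats events as dicts keeps working untouched; only callers that want the metadata accessors need to care.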
when i wrote this code initially i had just spent some time thinking about OOP and was under the influence of some rich hickey talks about how OOP is nice but it should not be reflexive default for all use-cases [22:50:08] and that there is a lot of value in limiting yourself to just a half-dozen or so basic data types [22:51:04] eventlogging is a respectably-sized python module that (i think/hope) is well-organized and well-factored without using classes [22:51:21] so i was happy with how that turned out, even though there were specific instances where i thought having methods would be nice [22:51:26] ja was thinking that too [22:51:44] python is funky though, since the way it does modules basically gives you some (globalish) OO stuff that you don't get in other langs [22:51:53] e.g. eventlogging.schema.schema_cache [22:51:56] yeah, namespaces are awesome [22:52:06] php is the same [22:52:11] (>=5.3) [22:52:16] it's been a while since I was a PHP guy :) [22:52:18] I am pre namespaces :) [22:52:22] \Foo\Bar() instead of Foo::Bar() [22:52:31] python uses one syntax for both, which is more elegant [22:52:43] ori, in the same vein, after reading that coroutine stuff [22:52:57] it doesn't really seem that different than using an object [22:52:57] but I think part of the problem with how OOP is typically done is that people just use classes as namespaces [22:53:02] yeah [22:53:03] but classes are really much more than that [22:53:11] if you only need a single instance of something, you don't need a class [22:53:13] the mediawiki codebase sticks everything in a class and it's kind of gross [22:53:14] yeah [22:53:26] we have lots of classes with only static methods [22:53:30] so, ja the way we are using coroutines here [22:53:40] could be done in a class too?
[22:53:46] and it would almost be the same [22:53:59] yes, but coroutines and functions are composable too :) [22:54:13] you don't get inheritance, but i'm not sure inheritance would be all that useful [22:54:29] class KafkaWriter(Writer): [22:54:29] def __init__(...): self.producer = ... [22:54:29] def send(event): [22:54:29] .... [22:54:55] each of the writers is just doing some setup, (init), and then calling (yield) [22:54:58] anyways i gotta run in a moment. the tl;dr from my perspective is that i can easily imagine that using classes would make sense and is the natural way to express in code some requirement [22:54:59] then it does stuff [22:55:16] aye ok, cool, i am not suggesting we change the writers/readers, those are awesome [22:55:20] but for Event, maybe. [22:55:21] but that should be weighed against the cost of introducing OOP to a module that survived well without them [22:55:22] just not now :) [22:55:26] heh [22:55:27] but i could be persuaded either way [22:55:29] well, you have it in some places [22:55:33] parse.py, jrm.py, etc. [22:55:35] but ja [22:55:53] oh yeah, fair point! [22:55:55] that's true [22:55:57] i forgot about those [22:55:59] i'm with you on avoiding OOP unless you really need an object [22:56:09] many times you don't [22:56:31] this is an awesome talk btw: http://www.infoq.com/presentations/Value-Values [22:56:32] event is starting to feel like it needs it, because I have several functions that all take an event as their first parameter [22:57:16] and replacing 'event' with 'self' would be better because..?
:) [22:57:28] cleaner for users of the event [22:57:48] they don't have to know they have to call a special function to get the schema [22:57:50] but not other data [22:57:54] if the class extends dict [22:57:57] we can abstract that from them [22:58:16] yeah, i think that's definitely valid [22:58:24] not sure i'm sold on it but i see the point [22:58:34] ja, anyway, like i said, i'm not doing that now anyway [22:58:48] cool, ori, i would love to do a walkthrough with you of this patch sometime in the next week, can I book 30 mins of your time somewhere? maybe later in the week, like friday? [22:58:59] sure. 30 might be too little [22:59:04] i'm sure you'll see a buncha things that would be slick [22:59:10] well i'll book 30 and we can keep talking if we feel like it [22:59:13] heh [22:59:16] sure! [22:59:18] ok! [22:59:21] thanks for working on this! [22:59:23] * ori runs [22:59:26] k laters! [23:16:40] nuria: batcave? [23:19:17] madhuvishy: yes [23:19:19] omw [23:41:33] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1798048 (Nemo_bis) Great! https://lists.wikimedia.org/pipermail/analytics/2015-November/004502.html I'll now update https://meta....
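[Editorial sketch contrasting the two writer styles ottomata and ori compare above: eventlogging's "some setup, then yield" coroutine writers versus the hypothetical `class KafkaWriter(Writer)` form. `FakeProducer` is a stand-in for a real producer; names and signatures are illustrative, not eventlogging's actual code.]

```python
class FakeProducer:
    """Stand-in for a real producer (e.g. a Kafka client)."""
    def __init__(self):
        self.sent = []

    def produce(self, event):
        self.sent.append(event)


def coroutine_writer(producer):
    # setup runs once, before the first yield -- the coroutine analogue
    # of __init__ in the class form below
    while True:
        event = yield          # each .send(event) delivers one event here
        producer.produce(event)


class ClassWriter:
    def __init__(self, producer):
        self.producer = producer   # same setup, held as instance state

    def send(self, event):
        self.producer.produce(event)


producer = FakeProducer()
writer = coroutine_writer(producer)
next(writer)                   # prime the coroutine before the first send
writer.send({'id': 1})         # coroutine style
ClassWriter(producer).send({'id': 2})   # class style
# both styles end up doing the same work: producer.sent holds both events
```

As the chat notes, the two are almost interchangeable for a single writer; the coroutine form trades inheritance for easy composition with other generator stages.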