[00:03:00] Krinkle: but, if you dint want to use eventlogging and use the url directly, it would be possible too [00:03:02] Hm.. I'm getting KeyError: 'kafka' when I run those lines of python [00:03:24] ah, can you paste trace? [00:04:04] https://gist.github.com/Krinkle/22e2101bff0b156276db#file-kafka-event-out [00:04:51] hmmm outdated eventlogging code may be [00:06:26] ori: Can you try on hafnium directly? Maybe the el module is newer there than on tin [00:06:41] yep [00:06:44] running https://gist.github.com/Krinkle/22e2101bff0b156276db#file-kafka-event-py basically [00:09:19] https://dpaste.de/bgSR/raw [00:10:43] hmmm even weirder. I'm not sure how all of this is setup, I'll find out from ottomata tomorrow [00:10:59] please don't turn off the zmq publisher until we have migrated everything, though [00:11:11] as Krinkle said, we rely on that data [00:11:31] ori: yeah, for sure [00:11:35] thanks [00:12:52] Krinkle: also - http://pykafka.readthedocs.org/en/latest/ is the kafka consumer library we use - in the getting started example, using our kafka and zookeeper hosts should be all you need to write your own consumer [00:13:06] I'll follow up tomorrow on this [00:13:11] Thanks [00:13:13] time to leave now [01:22:06] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1623386 (GWicke) For the REST API, we know that Google is fetching the HTML and data-parsoid for all edited pages as they... [07:36:42] Analytics-Dashiki, Analytics-Kanban, Browser-Support-Firefox: vital-signs doesn't display pageviews graph in Firefox 41, 42 {crow} [3 pts] - https://phabricator.wikimedia.org/T109693#1623735 (Nemo_bis) Yes, with the difference that sometimes the sidebar doesn't load any longer (I had to hard refresh a... 
[13:48:12] Analytics-Backlog: Add better regexp to agent_type bot filtering {hawk} - https://phabricator.wikimedia.org/T108343#1625233 (JAllemandou) [13:48:12] Analytics-Kanban: Change the agent_type UDF to have three possible outputs: spider, bot, user {hawk} [13 pts] - https://phabricator.wikimedia.org/T108598#1625234 (JAllemandou) [14:23:53] jelouuuu [14:23:58] hi! [14:33:01] joal: sorry about yesterday's patch , i realized i had not pushed my latest version [14:46:04] (PS1) Joal: Update bot filtering for webrequests. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 [14:46:28] Hey nuria [14:46:30] No bother :) [14:46:43] I have some review work for you as well ;) [14:54:13] (CR) OliverKeyes: Update bot filtering for webrequests. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 (owner: Joal) [14:59:17] getting this on Hue: "Your query has the following error(s): Could not connect to analytics1027.eqiad.wmnet:10000" [14:59:23] mforns: I'm sorry I forgot to check with you on this [14:59:26] limn1 [14:59:36] (Hive works fine) [14:59:41] you did the language reportcard but didn't update the head, right? [15:00:03] halfak: Not quite overnight, but we're up to 8GB of data so far. [15:00:15] milimetric, mmmm [15:00:19] HaeB: I'll poke otto, he's visiting :) [15:00:21] that's right [15:00:22] Should be done soon then, I suspect. [15:00:38] Maybe because we're taking two regex passes... [15:00:39] mforns: cool, that's ok, but I'll --amend the HEAD and add your change there [15:00:45] that way it's safe from stashing / etc. at least [15:00:45] milimetric, did this cause any problem? [15:00:48] nope [15:00:50] I can do that [15:01:22] mforns: already done [15:01:26] that's weird, hive works but hue says that? [15:01:26] I was messing with a different config [15:01:33] oh ok [15:01:44] thanks! 
[15:03:40] joal or otto: trying to query wmf_raw i get the json serde is not in path [15:04:04] do we need to pass it in when we start hive? [15:04:05] RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found) [15:04:23] nuria: long time I didn't do that [15:04:28] let me collect my thought [15:04:38] joal: ya same here, cause i never used it [15:05:00] nuria: ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar; [15:05:04] hmmm, HaeB try now [15:05:33] from refinery/oozie/webrequest/load/generate_sequence_statistics.hql [15:07:01] joal: k, added to docs: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries#FAQ [15:07:42] Analytics-Kanban: Make reportupdater support script execution - https://phabricator.wikimedia.org/T112109#1625628 (mforns) NEW a:mforns [15:09:11] thx nuria [15:13:57] Hey Ironholds [15:14:05] Read your comment on the regexp for spiders [15:14:25] This regexp comes in addition to ua-parser = 'Spider' [15:14:46] Does that lower a bit the number of ua you'd add to it, or not really ? [15:14:49] Ironholds: --^ [15:15:16] ottomata: works now, thanks! [15:16:05] joal, honestly not [15:16:15] (PS1) Mforns: Make reportupdater support execution of scripts [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109) [15:16:21] the spider regex in ua-parser is designed to look for just that - spiders - not automata used in crawling or scraping scripts [15:16:35] I'm thinking of things like Java/ or wget or libwww or... 
[15:16:48] (PS2) Mforns: [WIP] Make reportupdater support execution of scripts [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109) [15:17:28] Ironholds: some of those are covered already in the regexp [15:20:05] Analytics-Dashiki, Analytics-Kanban, Browser-Support-Firefox: vital-signs doesn't display pageviews graph in Firefox 41, 42 {crow} [3 pts] - https://phabricator.wikimedia.org/T109693#1625691 (Milimetric) @Nemo_bis - I deployed some caching config changes and fixed a bug in the build that was causing so... [15:21:03] some, da ;p [15:22:18] Ironhold: I can update the regexp if you give some patterns you think are worth :) [15:27:00] joal, thought I had in the phabricator ticket? I'll find it [15:27:37] Ironholds: c++ code, right ? [15:28:24] (CR) OliverKeyes: Update bot filtering for webrequests. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 (owner: Joal) [15:28:28] joal, indeedy. COmment left [15:28:34] (with the ones I'd pick out of that) [15:28:38] yup, found it [15:29:23] Ironholds: I'll update based on the code [15:29:33] joal, cool. And I'll leave a comment on an additional change ya can make [15:29:33] Ironholds: sorry for not having thouroughly read :S [15:31:58] joal, no problem! [15:32:03] (CR) OliverKeyes: Update bot filtering for webrequests. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 (owner: Joal) [15:32:19] okay, one additional suggestion left (you don't have to incorporate it but it has been a very useful heuristic ime) [15:34:35] nuria: standup? [15:42:49] hangout frozen for me :( [15:42:54] me too [15:43:24] sorry i have big time connection problems guys, look solved now [15:45:15] Analytics-Cluster, Analytics-Kanban: Decommission remaining old Hadoop Workers - https://phabricator.wikimedia.org/T112113#1625918 (Ottomata) NEW a:Ottomata [16:14:18] nuria: do you want us to talk now? 
[16:15:53] Hey Ironholds, about your last comment, the hyphen-only case is part of the regexp [16:16:18] Ironholds: Do you think we'd get a lot of perf increase using an if statement? [16:16:50] joal, oh, I missed it [16:16:53] lemme look again [16:17:07] at the end, hyphen only, and empty [16:17:51] ah yes [16:18:49] joal, would definitely move those out [16:19:20] also, checking for an empty UA is superfluous because that's impossible - "-" is what varnish sets if the UA header is empty [16:19:36] like, if we're getting empty UA fields that's not "the user did not set a UA" that's "something in varnish or kafka is broken" ;p [16:19:57] Oh right, didn't know that :) [16:19:59] thanks [16:20:02] so I'd say use the if-check, yeah. Beats the end of a complex multipart regexp [16:20:22] joal, is okay! If I am one thing it is "a hive of weird trivia about how our systems work and also how they don't and mostly they don't" [16:20:47] :) [16:21:05] I'm gonna play a bit with regexp vs if perf, just for fun :) [16:21:09] Ironholds: --^ [16:21:27] *thumbs up* [16:25:31] joal: here, whenever you are ready [16:25:38] batcave ? [16:25:42] sure [16:25:42] nuria --^ [16:30:02] (PS2) Nuria: [WIP] Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237274 [16:31:56] Analytics-Cluster, Analytics-Kanban: Decommission remaining old Hadoop Workers - https://phabricator.wikimedia.org/T112113#1626238 (ggellerman) [16:44:06] (CR) Nuria: "Per conversation with @joal we will keep the isPageview and UDFS reading the raw x-analytics header to make sure the code can handle raw d" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237274 (owner: Nuria) [16:46:11] Ironholds: We'll go for explicit test ;) [16:46:19] joal, hmn?
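Editor's sketch of the "explicit test before the regexp" idea discussed above. This is only an illustration in Python, not the refinery code (which lives in analytics/refinery/source and is not shown in this log). The pattern and the function name `looks_automated` are assumptions; the `Java/`, `wget` and `libwww` tokens are simply the examples named in the conversation.

```python
import re

# Illustrative pattern only: the real refinery regexp is not shown in the log.
# The three tokens are the examples mentioned in the discussion.
SCRIPT_UA_PATTERN = re.compile(r"(Java/|wget|libwww)", re.IGNORECASE)


def looks_automated(user_agent):
    """Return True if the user agent is missing or looks like a script."""
    # Explicit test first: per the discussion, varnish writes "-" when the
    # User-Agent header is empty, so a plain comparison covers that case
    # without going through the regexp engine.
    if user_agent == "-":
        return True
    # Everything else still goes through the (hypothetical) regexp.
    return SCRIPT_UA_PATTERN.search(user_agent) is not None
```

The design point is just ordering: the cheap equality check handles the hyphen-only case, and the regexp no longer needs a trailing alternative for it.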
[16:46:33] faster than regexp, really faster :) [16:47:00] I would have expected compiled regexp to handle that reasonably fast, but seems bnot [16:49:54] Analytics-EventLogging, Analytics-Kanban: {tick} Schema Audit - https://phabricator.wikimedia.org/T102224#1626309 (mforns) [16:49:54] Analytics-Backlog, Analytics-EventLogging: Make EventLogging code mark new tables for purging as default {tick} - https://phabricator.wikimedia.org/T106558#1626306 (mforns) Open>Invalid a:mforns As we decided together with Jaime Crespo to implement the auto-purging with white-lists instead of bla... [16:50:49] (PS2) Joal: Update bot filtering for webrequests. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 [16:51:29] (CR) Joal: "Cmments inlined: done :)" (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 (owner: Joal) [16:52:35] Analytics-Backlog: Put toghether an updated documentation on EventLogging - https://phabricator.wikimedia.org/T112124#1626321 (mforns) NEW [16:55:12] Analytics-Backlog: Put toghether an updated documentation on EventLogging {tick+oryx} - https://phabricator.wikimedia.org/T112124#1626333 (mforns) [16:56:11] Analytics-Backlog: Prepare lightning talk on EL audit - https://phabricator.wikimedia.org/T112126#1626342 (mforns) NEW a:mforns [16:58:11] (PS1) Joal: [WIP] Update agent_type in webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/237419 (https://phabricator.wikimedia.org/T108598) [17:06:04] Analytics-Backlog: Prepare lightning talk on EL audit {tick} - https://phabricator.wikimedia.org/T112126#1626421 (mforns) [17:06:08] Analytics-Backlog: Put toghether an updated documentation on EventLogging {tick} {oryx} - https://phabricator.wikimedia.org/T112124#1626423 (Milimetric) [17:07:04] Analytics-Backlog: Prepare lightning talk on EL audit {tick} - https://phabricator.wikimedia.org/T112126#1626427 (Milimetric) p:Triage>High [17:07:17] Analytics-Backlog: Put toghether an updated documentation on 
EventLogging {tick} {oryx} - https://phabricator.wikimedia.org/T112124#1626321 (Milimetric) p:Triage>High [17:08:47] Analytics-Backlog: Doc cleanup day 2.0 - https://phabricator.wikimedia.org/T112024#1626451 (Milimetric) p:Triage>Normal [17:09:25] Analytics-Backlog, Discovery: Present Discovery Metrics at Monthly Metrics meeting - https://phabricator.wikimedia.org/T109775#1626454 (Deskana) p:Triage>Normal [17:09:39] Analytics-Backlog, Discovery: Present Discovery Metrics at Monthly Metrics meeting - https://phabricator.wikimedia.org/T109775#1626455 (Deskana) Open>Resolved We did this a while ago. :-) [17:12:36] Analytics-Backlog, Analytics-Cluster: Spike replacing Camus with Gobblin - https://phabricator.wikimedia.org/T111409#1626468 (Milimetric) p:Triage>High [17:15:38] Analytics-Backlog, Analytics-Cluster: Spike replacing Camus with Gobblin {hawk} - https://phabricator.wikimedia.org/T111409#1626475 (Milimetric) [17:16:23] Analytics-Backlog, Analytics-Cluster: Create Kafka deployment checklist on wikitech {hawk} - https://phabricator.wikimedia.org/T111408#1626480 (Milimetric) [17:16:36] Analytics-Backlog, Analytics-Cluster: Create Kafka deployment checklist on wikitech {hawk} - https://phabricator.wikimedia.org/T111408#1603220 (Milimetric) p:Triage>High [17:20:16] (CR) OliverKeyes: [C: 1] Update bot filtering for webrequests. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237392 (owner: Joal) [17:20:25] Analytics-Backlog: Generalize cube building for hadoop - https://phabricator.wikimedia.org/T111202#1626489 (Milimetric) Open>declined a:Milimetric We don't strictly need this any more. The future is not certain, we have a method in Hive using GROUPING SETS, and we may change our mind. So for now, w... 
[17:22:15] (Abandoned) Nuria: Add isAppPreview to pageview definition [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND) [17:22:34] Analytics-Backlog: Backfill pageviews to exclude arbcom wikis {hawk} - https://phabricator.wikimedia.org/T110701#1626507 (Milimetric) [17:24:30] Analytics-Backlog: Backfill pageviews to exclude arbcom wikis {hawk} - https://phabricator.wikimedia.org/T110701#1626515 (JAllemandou) [17:24:31] Analytics-Kanban, Research-and-Data, Patch-For-Review: Backfill pageview data correcting space in title bug {hawk} [5 pts] - https://phabricator.wikimedia.org/T110614#1626516 (JAllemandou) [17:25:01] hey guys. Where can i find a list of our most visited pages on wiki? [17:25:07] (across projects) [17:26:17] hey jdlrobson [17:26:26] For the moment it's not easy to get :( [17:26:33] joal: doesn't matter if it's out of date [17:26:38] i just need to get a realistic sample [17:26:38] stats.grok.se has one [17:27:03] We are working to provide this kind of data soon (end of month if everything goes well) [17:27:12] jdlrobson: --^ [17:27:22] awesome :) [18:04:16] jdlrobson: out of date as in March 2014: http://stats.grok.se/en/top [18:04:39] jdlrobson: you can query pretty quickly for this kind of data in the wmf.pageview_hourly table in hive [18:04:56] if you need something cleaner / more recent [18:26:34] milimetric: good to know. I just want to know the top 20 visited pages so i can run some performance tests on them [18:26:44] or top 50 if i'm more ambitious [18:35:41] milimetric: Did you push neilpquinn's patch for https://edit-analysis.wmflabs.org/ ? Does it need fiddling? [18:36:34] James_F: it got merged :) https://gerrit.wikimedia.org/r/#/c/236237/ [18:37:06] neilpquinn: Yes, but the data in https://edit-analysis.wmflabs.org/compare/ stops dead on Tuesday, when the schema switch over. [18:37:22] I"ll look for errors [18:37:53] milimetric: Thanks! 
[18:39:29] neilpquinn: Also, do you want to reply to wikimedia-l talking about milimetric's and your work on https://edit-analysis.wmflabs.org/compare/ and plans for "more data"? :-) [18:42:29] joal: do you have data generated for the "tops" endpoint that you can just head -n 50 for jdlrobson's purposes? [18:42:44] jdlrobson: enwiki only? [18:42:52] milimetric: that would be fine [18:43:16] jdlrobson: do you care about mobile only or desktop only or both? [18:43:25] mobile only would be better [18:43:28] milimetric, jdlrobson I don't have the data at hand, but can easily generate it [18:43:31] but if it's too hard desktop is fine [18:43:59] context https://phabricator.wikimedia.org/T111198#1625966 [18:43:59] joal: if it's a pain I can do it, was just wondering if you already had it as you were testing your jobs [18:44:21] So top 50 articles (pageview only), mobile only, enwiki, yesterday ? [18:44:25] jdlrobson: --^ [18:44:30] yes [18:44:38] joal that would be great - better if during last week though [18:44:47] just gives us a larger sample [18:45:15] * milimetric is really excited for when this'll just be a restbase endpoint :) [18:45:28] * joal is even more than excited [18:45:48] jdlrobson: from beginning of september, ok ? [18:45:53] perfect! [18:46:21] joal: are you querying pageview_hourly directly or something else? [18:46:25] jdlrobson: last confirmation: users only, no bots (or as few as possible) [18:46:34] pageview_hourly [18:46:38] aham [18:46:38] nuria: --^ [18:46:44] anonymous no bots would be great [18:47:12] hm, what do you mean anonymous no bots? [18:47:31] For the moment we only remove bots that show themselves :) [18:48:48] okay scrap that then :) just anonymous is fine [18:50:11] mforns: you around? How long you workin? [18:50:34] milimetric, yes! for 1-2 more hours [18:50:59] mforns: wanna chat in batcave?
I'm not sure how to debug this seemingly stopped reportupdater [18:51:09] sure [18:51:12] omw [18:52:48] jdlrobson: how can I communicate that to you easily ? [18:54:39] madhuvishy: HMMM should we do mysql consumer? :D [18:54:48] i'm waiting for a thing to me merged in order to do hafnium ones [18:55:14] jdlrobson: does that work for you https://gist.github.com/jobar/53db2d87461cf3137d03 ? [18:55:25] Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1627094 (Ottomata) This is why eventlogging on hafnium is old: https://gerrit.wikimedia.org/r/#/c/237446/ [18:56:53] joal: if that's accurate that's perfect :) Thanks a bunch! [18:57:12] joal: whats your phabricator username? [18:57:37] jdlrobson: pageviews only, mobile app + mobile web, users only (no known bot, or almost), view_count since sept first :) [18:58:16] joal: so does that include logged in? [18:58:20] jdlrobson: I am JAllemandou [18:58:24] Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1627196 (Ottomata) Hey, I just looked at navtiming.py. Instead of using the generic eventlogging get_reader() function, it connects to a zmq port directly.... [18:59:25] jdlrobson: this includes any 'pageview' as defined here https://meta.wikimedia.org/wiki/Research:Page_view [18:59:43] We don't know if people having viewed pages are logged or not [18:59:59] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Deploy EventLogging on Kafka to eventlog1001 (aka production!) 
{stag} [8 pts] - https://phabricator.wikimedia.org/T106260#1627211 (Ottomata) [19:00:00] Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1627212 (Ottomata) [19:01:01] Hey madhuvishy [19:01:24] I'll finally manage to spend some time asking you about the top restbase endpoint :) [19:02:21] kevinator: when is the potential analytics offsite? [19:05:53] milimetric: ^? [19:06:05] the one in October? [19:06:16] I think October 20th was the date we casually thought about [19:10:06] danke [19:17:34] Hi hi [19:18:03] Sorry joal ottomata getting to office, we can irc though [19:18:14] hey madhuvishy [19:18:31] Hey :) [19:18:43] quick question: I'd like to change the top enpoint you created for restbase [19:18:53] How should we work that ? [19:19:03] Do you want to do it ? Do you prefer me to do it ? [19:19:27] madhuvishy: https://gerrit.wikimedia.org/r/#/c/237469/3 :) [19:19:38] joal: I can do it, don't know what should change though [19:19:46] I can explain :) [19:19:50] ottomata: yay :) [19:20:01] Analytics-General-or-Unknown, The-Wikipedia-Library: Category based-pageview collection for non-Article space, via Treeviews or similar - https://phabricator.wikimedia.org/T112157#1627421 (Sadads) NEW [19:20:09] Getting the ottomata madhuvishy ! Almost no more 0mq ! [19:20:16] madhuvishy: +1? [19:20:25] okay now we have to move the perf team's stuff. do you have the links to their repo? 
yeah reading the code [19:20:29] yeah, joal we are going to have an issue with the perf team consumers on hafnium [19:20:30] they don't use eventlogging code [19:20:34] but connect to zmq directly :( [19:20:41] Arfpfpfff :( [19:20:51] so, once we get our stuff all straight [19:20:58] i think i'll make a kafka -> zmq consumer [19:21:06] on eventlogging-valid-mixed [19:21:08] I guess you are gonna publish from kafka to 0mq on hafnium ;) [19:21:11] to maintain a little backwards compatibility [19:21:13] yeah [19:21:14] :) [19:21:15] :D [19:21:15] exactly [19:21:17] on hafnium [19:21:24] How wrong ;) [19:21:37] ottomata: i see you said 1000 ms instead of 10000 here [19:21:39] intentional? [19:21:41] That's really cool :) [19:21:42] ¯\_(ツ)_/¯ [19:21:45] yes [19:21:51] from convo with mforns [19:22:14] ottomata: okay cool [19:22:18] ? [19:22:29] mforns: commit offsets for mysql consumer more often [19:22:38] oh! ok! [19:22:59] ottomata: can they not consume directly from kafka? [19:23:29] (PS3) Nuria: [WIP] Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237274 [19:23:43] madhuvishy: they could, but their code uses zmq directly [19:23:46] so they'd have to modify code [19:23:50] not just the uri endpoing [19:23:52] endpoint [19:24:30] hmmm, but that should be easy no? they can use pykafka and consume, given the kafka and zookeeper hosts [19:24:31] ottomata: I'll need some help to set up the repos for uap-java and uap-core [19:24:42] better tomorrow (late for me today) [19:24:45] k [19:24:50] they could! [19:24:51] they can!
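Editor's sketch of what "use pykafka and consume, given the kafka and zookeeper hosts" might look like. It is a minimal illustration under several assumptions: the host strings are supplied by the caller (none are given in this log), the events on `eventlogging-valid-mixed` are assumed to be UTF-8 JSON, and the consumer group name and the function names are hypothetical. The pykafka calls used (`KafkaClient`, `client.topics[...]`, `get_balanced_consumer`) are the ones from that library's getting-started documentation; depending on the pykafka version, topic and group names may need to be `str` rather than `bytes`.

```python
import json


def decode_events(messages):
    """Yield parsed JSON events from an iterable of Kafka-style messages.

    Each message only needs a ``value`` attribute holding UTF-8 JSON bytes,
    which is what pykafka message objects expose.
    """
    for message in messages:
        if message is None or message.value is None:
            continue  # nothing to decode for this poll
        yield json.loads(message.value.decode("utf-8"))


def consume_eventlogging(kafka_hosts, zookeeper_hosts,
                         topic_name=b"eventlogging-valid-mixed",
                         group=b"example-consumer"):
    """Connect with pykafka and return a generator of decoded events.

    Needs a reachable Kafka/ZooKeeper cluster; both host strings are
    caller-supplied, and the group name is a placeholder.
    """
    from pykafka import KafkaClient  # third-party; imported lazily

    client = KafkaClient(hosts=kafka_hosts)
    topic = client.topics[topic_name]
    consumer = topic.get_balanced_consumer(
        consumer_group=group,
        zookeeper_connect=zookeeper_hosts,
        auto_commit_enable=True,
    )
    return decode_events(consumer)
```

Keeping the decoding step separate from the connection step means a consumer such as the webperf scripts could swap the ZMQ source for Kafka while leaving the per-event handling unchanged.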
[19:24:52] :) [19:24:55] ottomata: tomorrow your morning time [19:25:02] ottomata: https://github.com/wikimedia/operations-puppet/blob/production/modules/webperf/files/ve.py this one plugs into eventlogging.connect [19:25:31] this can be easily changed, but they ran into old eventlogging code when running it on hafnium [19:25:43] yeah [19:25:45] they are fine with changing if we give them an example of how to do it [19:25:47] madhuvishy: trying to fix that [19:26:00] madhuvishy: restarted mysql consumer on eventlog1001, looking good! [19:26:06] awesome [19:26:12] madhuvishy: this should fix [19:26:12] https://gerrit.wikimedia.org/r/#/c/237446/1/manifests/network.pp [19:26:17] waiting for alex to merge [19:26:36] okayy [19:28:16] madhuvishy: top endpoint currently has a 'timespan' field, making it different of the other two (granularity + start/end) [19:29:14] yeah [19:29:21] I think we should provide the same service for top than others [19:29:50] Do you agree with that ? [19:30:29] hmmm [19:30:43] i thought it was easier to say top for a month, day etc [19:31:07] Right [19:31:37] With timespan currently, you an only specify year/month/day, no actual 'value' for those, right ? [19:32:18] so it means we don't keep historical computation (so sad, having computed top, and not serving it !) [19:33:36] cool, madhuvishy i merged that. deploy worked this time [19:34:38] Aah right [19:34:45] Yeah joal that makes sense [19:35:03] So what you called timespan here seems in fact a granularity :) [19:35:04] ottomata: nice! 
[19:35:11] right [19:35:18] joal or ottomata : i am getting "org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: SEL_2" on cluster when trying to run [19:35:36] ottomata: offsite potentially 6 weeks from now (mid to end of october) [19:35:48] kevinator: I'm not there last week [19:36:41] joal or ottomata : https://gist.github.com/nuria/c496b41759713982e34a [19:37:28] https://www.irccloud.com/pastebin/LS9c94LY/ [19:40:02] but joal there should be a way to ask for latest? [19:40:19] madhuvishy: That's a good idea [19:40:42] hm [19:40:51] nuria: weird, never saw that [19:41:00] ya... [19:41:16] let me see if executing teh code in teh hive prompt i get the same thing [19:41:20] *the [19:41:26] I tried, same thing [19:41:28] nuria: --^ [19:41:30] also, timespans for top seem not too intuitive [19:41:44] joal: yaaa [19:41:46] if they said 2015/5 or sth that seems easier [19:41:56] Joal: something no working so hot [19:42:06] madhuvishy: having daily/latest would be great [19:42:23] but we could also accept daily/20150909 [19:43:59] madhuvishy: having top/en.wikipedia/desktop/2015 [19:44:05] madhuvishy: having top/en.wikipedia/desktop/2015/09 [19:44:13] would be perfect I think :) [19:44:26] madhuvishy: having top/en.wikipedia/desktop/2015/09/10 as well (until day) [19:49:07] nuria: , with you rhostly [19:49:09] shortly* [19:49:38] nuria: trying different setups, but something is wrong here [19:49:41] joal or ottomata : are refinery-core and refinery-hive loaded by default on the hive command line? [19:49:52] nuria: no [19:49:54] joal: as in you cannot query hive and use a udf right? [19:50:14] Analytics-EventLogging, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1627609 (Ottomata) Ah phoo, in order to consume from Kafka, Zookeeper ports on conf100x will have to be opened up to hafnium, which has a... 
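Editor's sketch of the path scheme proposed above for the top endpoint (`top/en.wikipedia/desktop/2015`, `.../2015/09`, `.../2015/09/10`). This only mirrors the examples typed in the conversation; it is not the RESTBase implementation, and the function name `top_path` and its parameters are hypothetical.

```python
def top_path(project, access, year, month=None, day=None):
    """Build a top-endpoint path at year, month or day granularity."""
    parts = ["top", project, access, "%04d" % year]
    if month is not None:
        parts.append("%02d" % month)
        if day is not None:
            parts.append("%02d" % day)
    elif day is not None:
        # A day without a month has no meaning in this scheme.
        raise ValueError("day requires month")
    return "/".join(parts)
```

With this shape the granularity is implied by how many date components are present, so historical tops stay addressable instead of only the latest computation.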
[19:50:20] no, not possible to use a UDF if the jar has not been imported and the function defined [19:50:27] joal: ah ok, so you need to load jars always even for the ones deployed to cluster [19:50:36] Except if you specifically define a conf for that [19:50:42] correct [19:51:04] i think on stat1002 they are loaded [19:51:06] automatically [19:51:07] <property> [19:51:08] <name>hive.aux.jars.path</name> [19:51:08] <description>The location of the plugin jars that contain implementations [19:51:08] of user defined functions and serdes. [19:51:08] </description> [19:51:08] <value>file:///usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar,file:///srv/deployment/analytics/refinery/artifacts/refinery-hive.jar</value> [19:51:08] </property> [19:51:49] hmmmm ! [19:51:57] Didn't know that ottomata [19:52:04] dunno if it works though [19:52:22] i think it did at one point, but i remember recently having to add the hcatalog jar manually anyway [19:52:35] madhuvishy: https://gerrit.wikimedia.org/r/#/c/237479/2 [19:53:03] ottomata: nah, i do not think that works now [19:53:34] ottomata, joal: let me try to load the default jars to see if the exception appears [19:53:45] nuria: seems that your problem comes from the fact that Hive now uses Kryo to serialize stuff [19:54:13] Reading your code, I find some variables (ObjectInspector) that are transient, some others not [19:54:45] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1627628 (kevinator) [19:54:45] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Make EventLogging monitoring and alerts based on Kafka metrics {stag} [8 pts] - https://phabricator.wikimedia.org/T106254#1627626 (kevinator) Open>Resolved [19:54:46] joal: ya comes from my code, just used the default jars and they work [19:54:59] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Prep work for Eventlogging on Kafka {stag} - https://phabricator.wikimedia.org/T102831#1627633 (kevinator) Open>Resolved [19:55:00] Analytics-EventLogging, Analytics-Kanban: {stag}
EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1359675 (kevinator) [19:55:15] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1359675 (kevinator) [19:55:16] Analytics-EventLogging, Analytics-Kanban: Send raw client side events to Kafka using varnishkafka instead of varnishncsa {stag} - https://phabricator.wikimedia.org/T106255#1627637 (kevinator) Open>Resolved [19:55:37] joal: niceee, but wait cause user_agent is very similar, will try that , need to leave for an hour for an interview [20:02:08] joal: sorry just joined the interview - i'll play with those endpoints today [20:02:26] np madhu, thanks for thinking about that :) [20:03:11] nuria: Can't find a helpful post [20:03:27] joal: in interview , will check back later/tomorrow [20:03:37] np, just wanted to let you know :) [20:06:45] ottomata: do we really need to do the backwards compatibility thing, or is it intermediary until they change their code? [20:06:57] intermediary [20:07:02] allows us to turn off more zmq now [20:07:13] ottomata: yeah cool [20:11:44] madhuvishy: https://gerrit.wikimedia.org/r/#/c/237486/ [20:24:20] I'm off for today guys :) [20:24:32] Have a good end of day a-team :) [20:24:44] ciao [20:25:49] good night joal! [20:28:16] laters!
[20:32:55] (PS1) Milimetric: TEMPORARY: hack out the large amount of wikis [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/237493 [20:33:08] (CR) Milimetric: [C: 2 V: 2] TEMPORARY: hack out the large amount of wikis [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/237493 (owner: Milimetric) [20:35:00] James_F|Away / neilpquinn: ok, so running over 6k queries was ok [20:35:11] but what happened was a lot of the little tiny wikis had no results [20:35:36] so the reportupdater didn't know whether there was a problem and it tries to re-run all the queries for all those wikis since April [20:35:54] so that made the number of queries HUGE, and the recent "union all" put it over the limit. [20:36:07] That meant it was taking over 1 day to run all the data [20:36:11] so we weren't seeing updates [20:36:43] Marcel and I hacked it here: https://gerrit.wikimedia.org/r/#/c/237493/ by limiting to the top 7 wikis plus the overall aggregate [20:36:55] that'll run tonight and should update the dashboard for those wikis. [20:37:05] We'll work on a better plan tomorrow [20:49:43] Analytics-EventLogging, Analytics-Kanban, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1628009 (Ottomata) [20:54:08] Analytics-EventLogging, Analytics-Kanban, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1628029 (Ottomata) The current state of things: I am running an eventlogging-forwarder on eventlog1001 that is cons... [20:55:18] madhuvishy: it is alllll runninnnn [20:55:24] 12 client side processors [20:55:29] all zmq off except for this forwarder to 8600 [20:58:47] varnishncsa for eventlogging off too [21:03:22] ottomata: yayyy [21:03:31] :D [21:04:24] BYEYYEYE [21:22:47] milimetric: Aha. Thank you so much. 
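Editor's sketch of the re-run problem described above: if a run that returns no rows leaves no trace, every empty (wiki, day) pair looks "missing" and is retried on each pass. The code below is not reportupdater's implementation, which is not shown in this log; it is a hypothetical illustration of one way to avoid the blow-up, by recording every completed run, including empty ones. All names (`date_range`, `pending_runs`, `recorded`) are made up for the example.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield each day from start (inclusive) to end (exclusive)."""
    day = start
    while day < end:
        yield day
        day += timedelta(days=1)


def pending_runs(wikis, start, end, recorded):
    """Return the (wiki, day) pairs that have no recorded run yet.

    `recorded` holds every pair that already ran, including runs that
    returned zero rows, so empty results are not retried on every pass.
    """
    expected = {(wiki, day) for wiki in wikis
                for day in date_range(start, end)}
    return sorted(expected - set(recorded))
```

Restricting `wikis` to a short list plus the overall aggregate, as in the temporary hack mentioned above, shrinks the same `expected` set from the other direction.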
[21:23:11] np James_F, I'll babysit it tomorrow to make sure it at least generates data for those main wikis and overall [21:23:17] +1 [21:23:20] Thanks. [21:23:31] But I think it was running like over 100k queries, so it makes sense it was dying [21:23:46] TBH a split for all, and for each of the top N wikis (list TBD) isn't a terrible outcome. [21:24:05] Analytics-Backlog, Team-Practices-This-Week: Get regular traffic reports on TPG pages - https://phabricator.wikimedia.org/T99815#1628266 (ggellerman) a:kevinator>JAufrecht [21:30:51] Analytics-Dashiki, Editing-Analysis, Editing-Department: Time selector on https://edit-analysis.wmflabs.org/compare/ is only followed (?) by the first and fifth elements - https://phabricator.wikimedia.org/T112183#1628303 (Jdforrester-WMF) NEW [21:31:46] James_F: let me know if you want to go forward just generating for a specific list instead of all the wikis. That's easier on the resources, helps save forests and all that :) [21:31:49] Analytics-Dashiki: WMF Dashiki instance should have reasonable URL - https://phabricator.wikimedia.org/T88390#1628317 (Jdforrester-WMF) Open>Resolved a:Jdforrester-WMF This is now moved to https://vital-signs.wmflabs.org/ [21:32:53] milimetric: Indeed. neilpquinn and I have idly chatted about such a list but not made a firm selection. Top 50 + Commons + Wikidata maybe? Eh. [21:33:49] sounds good to me James_F, a list is better than 850 [21:34:01] * James_F crunches some numbers. [21:34:18] milimetric: We can do that. I'll work on that. [21:36:30] * James_F sighs at OpenOffice Calc crashing six times in four minutes. :-( [21:40:39] Analytics-General-or-Unknown, The-Wikipedia-Library: Category based-pageview collection for non-Article space, via Treeviews or similar - https://phabricator.wikimedia.org/T112157#1628362 (Magnus) Both direct page view counts (e.g. treeview) and pages-with-images (baglama) are based on http://stats.grok.... 
[21:40:39] milimetric: Top 50 by active editors is "en, de, es, fr, ja, ru, it, pt, zh, pl, nl, ar, tr, fa, sv, ko, id, he, uk, cs, no, hu, fi, vi, ca, th, da, ro, el, bg, simple, sr, bn, hy, ur, hi, az, kn, hr, sk, eo, et, lt, sl, ta, tl, ms, be, gl, sh" FWIW. [22:18:54] (PS3) Mforns: Make reportupdater support execution of scripts [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109) [22:40:21] Analytics-Backlog, Research management, Research-and-Data: Pipeline for data-intensive applications from research to productization to integration - https://phabricator.wikimedia.org/T105815#1628679 (DarTar) [23:28:19] (PS1) Jforrester: success_by_user_type: Split the 5–99 cohort into 5–9 and 10–99 [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/237534