[02:43:10] Analytics-Backlog, Analytics-Dashiki, Editing-Analysis, VisualEditor, Patch-For-Review: Improve the edit analysis dashboard {lion} - https://phabricator.wikimedia.org/T104261#1640257 (Jdforrester-WMF)
[05:24:26] Analytics-Backlog, Discovery: Present Google Referrer Traffic analysis at Metrics meeting - https://phabricator.wikimedia.org/T109773#1640366 (Deskana) Open>declined Closing as declined to reflect reality per the above. I ended up presenting on overall strategy rather than this.
[07:16:20] Analytics-Tech-community-metrics, DevRel-October-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1640549 (Qgil) NEW
[07:17:05] Analytics-Tech-community-metrics, DevRel-September-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1640557 (Qgil) >>! In T100189#1637199, @Qgil wrote: > Before we had Wikia as well. Any @wikia.com contribut...
[07:19:36] Analytics-Tech-community-metrics, DevRel-September-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1640559 (Qgil) Even if Terry told us to show the data of the last 6 months, I think our curves are better understood with a 12 month cycle. No need to change...
[08:19:58] Analytics-Tech-community-metrics, DevRel-October-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1640750 (Dicortazar) a:Dicortazar
[09:39:18] Analytics-General-or-Unknown, Wikidata: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#1640858 (Lydia_Pintscher) I've just had another person ask for this. (3rd party researcher who wants to use it for a workshop) It'd be really great to get this published. Adam: Can...
[09:54:52] Analytics-Tech-community-metrics, DevRel-September-2015: Mysterious repository breakdown(s)/sorting order - https://phabricator.wikimedia.org/T103474#1640895 (Aklapper) Open>Resolved Thanks! Confirming that http://korma.wmflabs.org/browser/scr-repos.html is sorted by "Reviews submitted per repositor...
[10:16:50] Analytics-Tech-community-metrics, DevRel-October-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1640942 (Dicortazar) After checking affiliations, Korma was missing wikia-inc.com as Wikia people. I've updated the database using the sorting hat interface. S...
[10:27:17] (CR) JanZerebecki: "I don't have any permission here." [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238319 (owner: Christopher Johnson (WMDE))
[11:14:43] (PS1) Addshore: Modify access rules [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238413
[11:17:08] (PS2) Addshore: Modify access rules [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238413 (https://phabricator.wikimedia.org/T112628)
[14:11:41] Analytics-Tech-community-metrics, DevRel-October-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1641464 (Dicortazar) Page now contains info from Wikia. Closing this!
[14:11:56] Analytics-Tech-community-metrics, DevRel-October-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1641465 (Dicortazar) Open>Resolved
[14:28:59] a-team hi! who wants to give a second opinion on a ruby task submission?
[14:29:05] a-team^ :)
[14:29:19] Hey ottomata :)
[14:29:32] I am no ruby expert, but I can have a look if you want
[14:32:49] (CR) Nuria: "Code looks OK, I would test (via unit tests) :Search vs :search if both need to work." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236860 (owner: OliverKeyes)
[14:38:46] Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents: Provide oneIn() sampling function as part of standard library - https://phabricator.wikimedia.org/T112603#1641518 (Nuria) @Krinkle: Agreed. Core package should provide the sampling function; after CR, sampling in searchsuggest is correct: https://g...
[14:39:06] Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents: Provide oneIn() sampling function as part of standard library - https://phabricator.wikimedia.org/T112603#1641520 (Nuria)
[14:41:33] ottomata: i "anti-helped" yesterday
[14:41:56] haha, s'ok, joal is looking
[14:41:58] you helped nuria!
[14:42:09] i just need someone else to look and either agree or disagree with a pass/fail, that's all :)
[14:42:14] ottomata: ya, "negative productivity"
[14:42:18] pssshhh
[14:47:30] ottomata: ok, if nobody else grabs it i will
[14:49:39] nuria: currently trying
[14:50:13] joal: k
[14:55:12] ottomata, joal if you want I can look, too
[14:55:37] btw ottomata do you have a couple mins to discuss the documentation on Kafka?
[14:57:28] (PS4) Nuria: [WIP] Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237274
[14:59:05] mforns: yes, i'm heading to batcave to talk to joal, join us there and then let's chat
[14:59:13] ok
[15:25:13] Analytics-Tech-community-metrics, DevRel-October-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1640549 (Qgil) Thank you! Argh, I wonder which changeset is that single one from Wikia that is now distorting the whole graph...
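The oneIn() discussion above (T112603) concerns the EventLogging JavaScript client, but the idea is easy to sketch. Below is a minimal, hedged Python illustration only; the function name and the hashing scheme are my assumptions, not the actual mw.eventLog API. Hashing a stable per-session token makes the 1-in-N decision deterministic for that session, instead of flipping a fresh coin on every event:

```python
import hashlib

def one_in(n: int, token: str) -> bool:
    """Return True for roughly 1 out of every n distinct tokens.

    Hashing a per-session token (rather than calling random()) keeps the
    sampling decision stable: a given session is either fully in or fully
    out of the sample. Illustrative sketch, not the real client API.
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    # First 8 hex chars give an approximately uniform 32-bit integer.
    return int(digest[:8], 16) % n == 0

# Roughly 1/100 of 10,000 hypothetical session tokens should be sampled.
sampled = sum(one_in(100, f"session-{i}") for i in range(10_000))
```

The same call with the same token always returns the same answer, which is what makes sampled funnels (like the searchsuggest case mentioned above) internally consistent.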
[15:25:27] Analytics-Tech-community-metrics, DevRel-September-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1641695 (Qgil)
[15:27:38] Analytics-EventLogging: Use non-static calls in EventLogging - https://phabricator.wikimedia.org/T87468#1641705 (Krinkle) What properties or state would you need to change (outside tests)? I don't see any public supposed-to-be-mutable state or set-methods in EventLogging's client-side. Sounds like maybe you...
[15:28:27] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1641707 (kevinator)
[15:28:28] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Deploy EventLogging on Kafka to eventlog1001 (aka production!) {stag} [13 pts] - https://phabricator.wikimedia.org/T106260#1641706 (kevinator) Open>Resolved
[15:28:38] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1359675 (kevinator)
[15:28:52] Analytics-Kanban: Follow on security asks {tick} [3 pts] - https://phabricator.wikimedia.org/T109462#1641710 (kevinator) Open>Resolved
[15:29:08] Analytics-Kanban: Fix lagging report updater {lion} [5 pts] - https://phabricator.wikimedia.org/T112276#1641714 (kevinator) Open>Resolved
[15:29:23] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1641716 (kevinator) Open>Resolved
[15:30:00] Analytics-Dashiki, Analytics-Kanban, Browser-Support-Firefox, Patch-For-Review: vital-signs doesn't display pageviews graph in Firefox 41, 42 {crow} [3 pts] - https://phabricator.wikimedia.org/T109693#1641717 (kevinator) Open>Resolved
[15:30:44] Analytics-EventLogging: Support kafka in eventlogging client on tin.eqiad.wmnet - https://phabricator.wikimedia.org/T112660#1641722 (Krinkle) NEW
[15:30:46] Analytics-Tech-community-metrics, DevRel-October-2015: "Oldest open Gerrit changesets without code review" panel should filter WIP etc - https://phabricator.wikimedia.org/T112661#1641729 (Qgil) NEW
[15:31:09] ottomata: standupppp
[15:32:22] Analytics-Tech-community-metrics, DevRel-September-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1641738 (Dicortazar) https://gerrit.wikimedia.org/r/#/c/127172/ :)
[15:35:33] Analytics-Tech-community-metrics, DevRel-September-2015: Add Wikia as recognized organization in Korma - https://phabricator.wikimedia.org/T112621#1641743 (Dicortazar) Thanks for the token ;)
[15:48:10] (CR) Milimetric: "This patch was pretty close to merging when we left it off. I'm happy to help rebase and finish it if it's still useful. Otherwise maybe" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/126927 (owner: Awight)
[15:49:47] (CR) Milimetric: "poke - Mark, you ok with this? Need help?" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237134 (owner: MarkTraceur)
[15:50:02] milimetric: Pong
[15:50:10] milimetric: We're putting it on hold for a while
[15:50:18] k
[15:52:46] (CR) Alex Monk: [C: -1] "I notice that this already inherits from the top level wikidata project which is owned by https://gerrit.wikimedia.org/r/#/admin/groups/10" [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238413 (https://phabricator.wikimedia.org/T112628) (owner: Addshore)
[15:56:12] (CR) Addshore: "I honestly have no idea! :)" [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238413 (https://phabricator.wikimedia.org/T112628) (owner: Addshore)
[15:56:20] joal: (for later) so the query with a "limit" clause works fine:
[15:56:24] https://www.irccloud.com/pastebin/liMCcjCH/
[16:01:09] nuria: the query you ran has no reducer
[16:01:39] joal: select isPageview(uri_host, uri_path, uri_query, http_status, content_type, user_agent) from webrequest where year=2015 and month=09 and day=04 and hour=01 ;
[16:01:47] without limit it does NOT work though
[16:02:55] joal: does the fact that it has no limit change whether there is a reducer?
[16:02:58] hm, same error >
[16:03:00] ?
[16:03:06] shouldn't
[16:04:03] joal: do you get an error both ways (limit and w/o limit)?
[16:04:15] Didn't try without limits
[16:04:18] :)
[16:12:39] Analytics-Tech-community-metrics, DevRel-September-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1641892 (Aklapper) >>! In T107562#1640559, @Qgil wrote: > curves are better understood with a 12 month cycle. I agree. Done.
[16:19:00] Analytics-EventLogging: EventLogging client should log errors (e.g. url too long, or validation failure) - https://phabricator.wikimedia.org/T112592#1641917 (Krinkle) p:Triage>Normal a:Krinkle
[16:22:19] joal, want to batcave to look at flights?
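One plausible explanation for the limit/no-limit asymmetry nuria and joal are chasing above (an assumption on my part, not confirmed anywhere in the log): Hive can answer a bare SELECT with a LIMIT as a local "fetch task", never compiling or Kryo-serializing a distributed query plan, so a plan-serialization problem in a UDF only surfaces once the full scan forces a real MapReduce job. A sketch reusing the query from the log:

```sql
-- May run as a local fetch task (governed by hive.fetch.task.conversion):
-- no MR plan is built, so UDF plan serialization is never exercised.
select isPageview(uri_host, uri_path, uri_query, http_status, content_type, user_agent)
from webrequest
where year=2015 and month=09 and day=04 and hour=01
limit 10;

-- To push the limited query through the same plan-compilation path as the
-- unlimited one, disable fetch-task conversion first:
set hive.fetch.task.conversion=none;
```

If the Kryo error then reproduces even with LIMIT, that points at plan serialization rather than data volume.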
[16:22:26] sure
[16:22:31] cool
[16:22:39] in a minute (finishing dev summit registration)
[16:22:47] sure, will wait for you there
[16:23:19] (CR) OliverKeyes: "Da, luckily they don't; Search is always upper-case since in this case it's the Special page (I'd just go with Special:Search but Special " [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236860 (owner: OliverKeyes)
[16:48:52] joal: let me know if/when you have time
[17:02:29] nuria: currently with mforns waiting for kevin in batcave about plane tickets
[17:02:35] will talk after
[17:16:23] joal: k
[17:21:28] * milimetric lunch
[17:49:55] (CR) Legoktm: [C: 2 V: 2] Modify access rules [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238413 (https://phabricator.wikimedia.org/T112628) (owner: Addshore)
[17:50:17] (CR) Legoktm: "Please file a follow-up bug about standardizing the wikidata/* permissions" [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238413 (https://phabricator.wikimedia.org/T112628) (owner: Addshore)
[17:55:08] * addshore needs to talk with someone about the best graphing tool for stuff like >> https://docs.google.com/spreadsheets/d/15FjnFyuwgVV9nAlMQhz53ZWPrLXRfZL1xYgnWa5ym3M/edit?usp=sharing
[18:02:17] hello addshore yt?
[18:06:19] *waves* just popped out!
[18:09:41] addshore: where does your data come from?
[18:13:53] ottomata: what version of hive are we running?
[18:14:43] nuria: i think 1.1.10
[18:14:45] 1.1.0*
[18:16:39] ottomata: so this would be the branch on github? https://github.com/apache/hive/blob/branch-1.1.1/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
[18:19:12] Nuria: a mix of a hive query and a fancy grep on the api logs
[18:19:37] nuria: i think this: 1.1.0+cdh5.4.0+103-1.cdh5.4.0.p0.56~trusty-cdh5.4.0
[18:19:38] oops
[18:19:40] https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
[18:19:44] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1642491 (Aklapper) Now that T100189 is resolved (yay!) and we can clearly differentiate between individual/unknown and non...
[18:20:11] possibly some cloudera mods in there, but unlikely
[18:20:18] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1642494 (Aklapper)
[18:20:35] addshore: so the Excel spreadsheet is compiled by copying data by hand from those two sources?
[18:20:35] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1642496 (Aklapper) p:Normal>High
[18:21:11] ottomata: i am at a loss with the kryo stuff, even if the udf is empty (only method stubs) it fails
[18:21:56] addshore: is the data copied by hand into your spreadsheet?
[18:22:01] Nuria, indeed
[18:22:39] As I don't know the best graphing tool for it that's available currently... Or how to set up some level of automation with it nicely :)
[18:23:41] addshore: Well, it depends on what type of data it is and whether the process of updating is "automatable". For example you can look at: https://vital-signs.wmflabs.org/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=Pageviews
[18:25:01] addshore: the meat of the matter is that the data is precomputed and available as csvs in an endpoint: https://metrics.wmflabs.org/static/public/datafiles/Pageviews/eswiki.csv
[18:25:16] Well, the process of getting the data can be automated, and stuck in an SQL table or something
[18:26:21] I guess I can just dump it into csvs / tsvs, and then I was guessing limn?
[18:26:34] But I think I need to read some docs on how to tie it all together (:
[18:26:48] addshore: we do not use limn anymore,
[18:27:46] addshore: but you can take a look at dashiki
[18:28:13] addshore: that is just a shell that retrieves data from an endpoint and plots it, so your data has to be available via http
[18:28:28] addshore: do you have access to gerrit?
[18:28:44] Yup,
[18:29:08] Is dashiki documented somewhere for wikimedia things?
[18:29:10] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1642509 (awight) @ellery: Can you let us know what you think about the impact on statistics?
[18:29:38] addshore: repo is here: ssh://@gerrit.wikimedia.org:29418/analytics/dashiki
[18:30:43] addshore: you can look at the README to install; as i said it is just a client side shell, currently supports two types of data, and that might not be a fit for you, it has a wikicentric layout and a comparison one
[18:31:18] Cool, I will have a look when I get home!
[18:31:23] addshore: this is also dashiki
[18:31:49] addshore: but please note it is not a "generic" graph-anything tool, it works best for metrics attached to wikis
[18:32:06] joal: Yt?
[18:32:22] nuria: I'm adding a task to write more in-depth dashiki docs.
[18:32:35] Oohhhhh
[18:32:50] Is metrics.wmflabs dashiki then?
[18:32:56] And why the move away from limn?
[18:33:00] Analytics-Backlog: Write in-depth dashiki documentation - https://phabricator.wikimedia.org/T112685#1642517 (Milimetric) NEW
[18:33:38] Analytics-EventLogging: Use non-static calls in EventLogging - https://phabricator.wikimedia.org/T87468#1642524 (Tgr) Open>declined a:Tgr I don't think there are any right now, but that's because some tasks that should be solved in EL are pushed to clients (see bugs linked from the description). Th...
[18:34:37] hey nuria
[18:36:36] Joal you are my only hope
[18:36:43] :)
[18:36:52] joal: i have emptied the udf completely
[18:36:55] joal: like:
[18:37:33] https://www.irccloud.com/pastebin/Dv7FQFrx/
[18:37:43] so no code, right?
[18:37:49] and i still get the exception
[18:38:19] joal: a slightly different one: 'org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 100'
[18:38:42] rhm...
[18:39:32] Analytics-Cluster, Analytics-Kanban, operations: Fix llama user id - https://phabricator.wikimedia.org/T100678#1642547 (Ottomata)
[18:39:39] joal: no code at all in the udf, other than the return type; let me change that too
[18:39:41] If we are that far nuria, let's try something else: rename the class --> the class exists from other jars loaded in different hive sessions, but who knows :(
[18:42:59] joal: will do, let me test that
[18:43:16] But that's kinda wild
[18:43:19] joal: yaaa
[18:43:40] I'll try to package / execute as well
[18:48:09] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1642592 (awight) @Jgreen mentions that the new log pipeline will be updated in realtime, so we should reconside...
[18:49:42] milimetric: sorry i haven't worked on those servers today, we need alex!
[18:51:24] ottomata: if you tell me how you need him, I can try to bug him tomorrow morning
[18:52:03] joal i will comment on ticket, thanks
[18:52:09] ok :)
[18:54:33] ottomata: uh... I think I found something bad
[18:54:34] brain bounce with me?
[18:58:57] madhuvishy: you around?
[19:04:57] Analytics-Kanban: Bug: client IP is being hashed differently by the different parallel processors - https://phabricator.wikimedia.org/T112688#1642642 (Milimetric) NEW
[19:07:59] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1642655 (Addshore) Well, after speaking with nuria in #wikimedia-analytics it looks like limn isn't really used any more apparently? So maybe we won't move forward with this... bah, suc...
[19:09:27] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1642666 (Nuria) Wait, one thing is limn; the other (I know, confusing) is the limn-data repositories. Those are not tied to limn necessarily, they are just poorly named.
[19:09:31] addshore: hi
[19:09:56] nuria: there might be some confusion here
[19:10:10] I'm happy to set up another limn dashboard if that gets you guys what you need
[19:10:56] milimetric: (cc addshore) their data is in Excel, I think the name of the repo and the limn dashboard got mixed up?
[19:11:17] joal: https://phabricator.wikimedia.org/T111053#1642679
[19:11:39] milimetric: i think addshore is off for a bit, we can talk later.
[19:11:45] ok ottomata, will try to ping him tomorrow :)
[19:12:04] yeah, I am sorry for the confusion. I will try to clarify a bit in the ticket.
[19:12:23] milimetric: addshore's data is here: https://docs.google.com/spreadsheets/d/15FjnFyuwgVV9nAlMQhz53ZWPrLXRfZL1xYgnWa5ym3M/edit?usp=sharing
[19:12:37] milimetric: none of it comes from SQL
[19:12:49] milimetric: it's hive + greps on logs
[19:12:57] nuria: yeah, I saw that, but I think that's just one output, I think aude was asking about the same thing and he had a way of getting data via sql
[19:13:12] i'm curious if that's the same dashboard or something else
[19:13:41] milimetric: ah, ok, if that is the case things change, of course.
[19:14:33] milimetric: that would be a different set of data
[19:14:52] nuria: https://vital-signs.wmflabs.org is "shiny" right?
[19:15:08] ahemm... me no compredou
[19:15:48] milimetric: well, it looks like the dashboarding stuff at vital-signs.wmflabs.org may suit all the cases
[19:15:55] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1642692 (Milimetric) Hi everyone. I am sorry for the confusion. I'll try to clarify a bit: limn-<>-data repositories: this is good, please create this if you can or ask me...
[19:16:12] addshore: so I spoke to @aude before about a dashboard
[19:16:18] is this something else
[19:16:59] milimetric: well, the data, although slightly linked (the fact it's wikidata), is about very different things
[19:17:08] though it may also all fit in one dashboard, who knows
[19:17:15] nuria: shiny as in https://wikitech.wikimedia.org/wiki/Building_a_Shiny_Dashboard
[19:17:27] addshore: no, we don't support shiny
[19:17:39] but you're welcome to try and get that up, shiny's a good tool
[19:17:54] is vital-signs.wmflabs.org not shiny then? :P
[19:17:56] what we support are 1. limn dashboards and 2. dashiki dashboards
[19:17:58] *is still so confused* :D
[19:18:05] vital signs is a dashboard built using dashiki
[19:18:24] ahh okay, it just looks a bit like the dashboard used as an example on the shiny page!
[19:18:26] cool!
[19:18:37] addshore: yeah, that's just a style coincidence
[19:18:44] dashiki has multiple "layouts" which are configured into dashboards via wiki pages
[19:18:49] here are two examples of dashiki dashboards:
[19:18:58] https://vital-signs.wmflabs.org/ and https://edit-analysis.wmflabs.org/compare/
[19:19:21] by "layouts" we mean a certain way to combine metrics and "parameters" like wikis, dates, etc. to present some data
[19:19:52] addshore: so the three "layouts" we support right now are:
[19:20:07] 1. metrics-by-projects (dashiki, example is vital signs)
[19:20:27] 2. compare (dashiki, example is the comparison of Visual Editor to Wikitext data, good for things like A/B testing)
[19:20:58] 3. tabs of arbitrary graphs (limn supports this, an example is the mobile reportcard: http://mobile-reportcard.wmflabs.org/)
[19:21:13] addshore: which of those is good for your data?
[19:21:35] Personally I can see myself using 2 for the data I put in that spreadsheet today
[19:21:42] I believe aude may end up wanting something like 1
[19:22:04] and maybe also 3, hah
[19:22:52] addshore: so what would your two sides of the comparison be? The spreadsheet looks like it just has several metrics measured over time
[19:23:11] ahh, so is comparison simply a comparison of 2 things rather than n things?
[19:23:26] Analytics-Kanban: Bug: client IP is being hashed differently by the different parallel processors - https://phabricator.wikimedia.org/T112688#1642730 (Milimetric) p:Triage>High
[19:23:46] addshore: notice the three buttons on the top right of that dashboard
[19:24:01] in that case I would mostly be using 3
[19:24:04] the N things are the comparisons, the 3 buttons on top right are the things you're comparing between
[19:25:12] addshore: ok, for 3, you have to create one data file for each graph you want, and host it somewhere public. Then you just point a simple config at that public file and you have a dashboard. Here's the config for the mobile reportcard: https://github.com/wikimedia/analytics-limn-mobile-data/blob/master/dashboards/reportcard.json
[19:26:25] okay, and how can I get things into datasets.wikimedia.org/limn-public-data/wikidata/datafiles then for example?
[19:26:48] addshore: do you have access to stat1003 or stat1002? You must, if you're running hive, right?
[19:26:52] yup
[19:27:25] ok, then as long as you're sure they're ok to be shared publicly, from stat1002 you can put files into /a/aggregate-datasets and those will end up on datasets.wikimedia.org/aggregate-datasets after an rsync
[19:27:46] awesome! rsync is daily I guess?
[19:27:49] it's up to you to organize the cron that gets them there and make a nice folder hierarchy
[19:27:54] addshore: rsync is hourly
[19:28:00] hourly, awesome!
[19:28:24] addshore: ok, so do you have limn-wikidata-data yet? Or should I create it?
[19:28:24] so, as for file generation am I okay to simply run crons on stat1002 / 1003 then?
[19:28:40] milimetric: we do not have that repo yet, so yes it would be great if you could create it!
[19:30:13] addshore: cron on those boxes is ok. If you're running SQL or scripts through cron and generating timeseries, you may want to look at our cool script runner called "reportupdater". Let me know and I can show you examples
[19:30:17] nuria: Will try to debug tomorrow
[19:30:35] joal: k
[19:30:39] milimetric: tomorrow morning, talk about PV guard?
[19:30:49] reportupdater sounds like it would be useful!
[19:30:51] ?me is hungry !
[19:31:05] Have a nice end of day a-team
[19:31:09] Dinner time ;)
[19:31:17] Analytics-Backlog: Write in-depth dashiki documentation - https://phabricator.wikimedia.org/T112685#1642766 (Nuria) Started: https://wikitech.wikimedia.org/wiki/Analytics/Dashiki
[19:31:19] bon appétit joal
[19:31:22] 1002 always seems fairly loaded, 1003 has all the same stuff right? thus it might make more sense for me to do my things there?
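The file-per-graph flow milimetric describes (generate a data file via cron, let the hourly rsync publish it) can be sketched as follows. This is a hedged illustration: the two-column date/value TSV shape and the helper name are my assumptions, not a fixed Dashiki schema.

```python
import csv
import io
from datetime import date

def write_timeseries_tsv(rows, out):
    """Write (date, value) pairs as a headered TSV that a dashboard
    shell can fetch over HTTP.

    `rows` is an iterable of (datetime.date, number) pairs and `out` is
    any text file object; rows are sorted so the series is in date order.
    Column names here are illustrative, not a required schema.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "value"])
    for day, value in sorted(rows):
        writer.writerow([day.isoformat(), value])

# Example with made-up numbers, written to an in-memory buffer.
buf = io.StringIO()
write_timeseries_tsv([(date(2015, 9, 4), 123), (date(2015, 9, 3), 98)], buf)
tsv = buf.getvalue()
```

In the setup discussed above, a cron job on stat1002 would write such a file under /a/aggregate-datasets so the hourly rsync picks it up and publishes it on datasets.wikimedia.org.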
[19:31:25] Merci milimetric :)
[19:31:37] bye joal|night !
[19:31:57] heh, I love the message on https://www.mediawiki.org/wiki/Analytics/Limn Warning: Limn is slowly dying
[19:32:32] addshore: stat1003 doesn't have hive
[19:32:39] ahh okay!
[19:33:19] addshore: FYI, both 1002 and 1003 rsync to datasets.wikimedia.org
[19:33:36] but on 1002 you put files in /a/aggregate-datasets and on 1003 you put them in /a/limn-public-data
[19:33:57] (yes, we know that sucks, but we can't change it because people have a lot of files on there and it would break their workflows)
[19:34:02] addshore: https://gerrit.wikimedia.org/r/#/admin/projects/analytics/limn-wikidata-data
[19:34:17] addshore: please let me know if you don't have access to merge there, and I can add you or your gerrit group
[19:34:51] milimetric: It doesn't look like I do currently
[19:35:25] there is a "wikidata" group you could add
[19:38:48] addshore: done, check now
[19:39:07] looks good!
[19:39:41] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1642786 (Addshore) Open>Resolved a:Addshore https://gerrit.wikimedia.org/r/#/admin/projects/analytics/limn-wikidata-data has been created and wikidata has access to the group....
[19:39:55] right, I think I'll try and tackle some of this tomorrow then!
[19:40:22] It's all clear in my head now at least! :D
[19:40:37] milimetric: care to link me to a reportupdater example or two?
[19:40:49] addshore: good, we have started docs here: https://wikitech.wikimedia.org/wiki/Analytics/Dashiki
[19:40:56] addshore: contributions welcome
[19:41:18] nuria: epic!
[19:41:32] also is there any timeline on dashiki supporting arbitrary graphs similar to limn?
[19:41:43] addshore: reportupdater config: https://github.com/wikimedia/analytics-limn-edit-data/blob/master/edit/config.yaml#L55 and output: http://datasets.wikimedia.org/limn-public-data/metrics/
[19:42:13] addshore: in that example, notice that the name of each section in the reportupdater config is both the name of the SQL file that's executed and the folder under /metrics on datasets.wikimedia.org
[19:42:27] from there, the "submetric" is the editor parameter, and the name of the file is the name of the wiki
[19:42:37] addshore: i am -personally- not a big believer in arbitrary graphs, but that is a longer conversation
[19:42:39] reportupdater is also quite flexible but we haven't documented that much either
[19:42:59] addshore: we can tell you why: we do not want to repeat mistakes of the past
[19:43:23] addshore: it's simple to do in dashiki, but yeah, usually not what people want if they go through a few iterations, so we've been going away from that layout
[19:44:29] well interestingly layout 1 in cases does look very similar to what I want, but just not comparing wikis..
[19:45:15] specifically for the data in this spreadsheet I made today instead of choosing a project / wiki I would want to choose an api module (for example)
[19:47:03] but I guess I will see as dashiki develops and after I do some things in limn if I could simply use dashiki instead
[19:47:03] addshore: think about how people are going to browse your data; you can talk to a designer and new layouts can be built
[19:47:30] nuria: how are the layouts currently controlled?
[19:47:33] addshore: 1-off graphs are hardly useful, you end up having a lot of figures without the ability for an end user to find what they want
[19:48:05] addshore: we make them, as in the analytics team gets together with the design team and we design new layout scenarios
[19:48:12] https://github.com/wikimedia/analytics-dashiki/tree/master/src/layouts
[19:48:52] ottomata: you know what the kryo issue was
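The naming convention milimetric describes (section name doubles as the SQL file name and the output folder, with explode_by producing the submetric sub-folder and one file per wiki) can be sketched as a reportupdater config fragment. This is an illustrative sketch only; the key names are modeled loosely on the linked analytics-limn-edit-data config.yaml and may not match the real schema exactly:

```yaml
reports:
    # "daily_edits" is hypothetical; the section name is both the SQL file
    # that runs (daily_edits.sql) and the folder that appears under
    # /metrics on datasets.wikimedia.org.
    daily_edits:
        granularity: days
        starts: 2015-09-01
        explode_by:
            # the "submetric": one sub-folder per editor value
            editor: wikitext, visualeditor
            # one output file per wiki: enwiki.tsv, dewiki.tsv, ...
            wiki: enwiki, dewiki
```

Each (editor, wiki) combination would then get its own timeseries file, which matches the output layout visible at the /metrics URL above.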
[19:49:25] great! thanks all! I have a lot to be looking through now! [19:49:31] ottomata: class name [19:49:38] !? [19:49:54] ottomata: yesss, hive is getting confused somehow [19:50:08] howso? [19:50:12] if i rename UDF to testUDF it works (let me triple check) [19:50:18] ottomata: boy i wish i knew [19:50:18] wow [19:51:25] ottomata: way to loose two days of development, yay!! [19:51:58] addshore: the layouts are controlled by wiki articles, the vital signs one has a very simple config: https://meta.wikimedia.org/wiki/Config:VitalSigns [19:52:25] we could either generalize the metrics-by-project layout to "metrics-by-arbitrary-dimension" and add the "arbitrary-dimension" to the config [19:52:39] or we could just make another layout (probably makes sense since -by-project is a very common case) [19:52:56] milimetric: +1 to make a different layout [19:53:00] milimetric: yeh, I think either of those would cover my usecase [19:53:18] milimetric: in our case having a wiki specific layout makes a lot of sense [19:53:32] yup, I agree with that [19:53:40] yeah, I think when we make that easy and discoverable via documentation, a ton of people are gonna be doing it [19:54:10] but it makes less sense for dashiki as a general dashboarding tool. So maybe -by-project could be a dimension we add at build time... I donno :) [19:54:24] addshore: but any new layout should be designed with mocks 1st to make sure it solves teh use cases we think it does [19:55:11] milimetric: I think having a ready to go project/language layout for WMF is a must [19:55:29] indeed! I'll try and work out what graphs I need to show, aude needs to show and the rest of the team also needs to show and see what currentlys fits etc [19:55:32] milimetric: cause it is both projects and languages let's not forget that. addshore case has 1 dimension [19:56:05] indeed, 1 dimension, in the future that could possible change [19:56:57] yeah, for WMF I think dashiki is very solid right now. 
just needs docs [19:57:13] it looks lovely! [20:06:15] (PS1) Milimetric: Hack around bad Event Logging IP hash problem [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/238533 (https://phabricator.wikimedia.org/T112688) [20:07:11] (CR) Milimetric: [C: 2 V: 2] Hack around bad Event Logging IP hash problem [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/238533 (https://phabricator.wikimedia.org/T112688) (owner: Milimetric) [20:14:01] addshore: for my use case, i think dashiki can work [20:14:15] what i need as first step is to produce tsvs [20:14:21] I think with a tweak or 2 it may also work for my usecase [20:14:31] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1642878 (aude) \o/ [20:14:55] nuria, responded to your comment on https://gerrit.wikimedia.org/r/#/c/236860/ [20:14:57] addshore: great :) [20:15:01] aude: your stuff is all generated from db queries right? :) [20:15:11] i would like a mode to view it as a table or graph [20:15:15] addshore: it is! [20:15:31] I guess our stuff would be on separate dashboards afaik in terms of dashiki [20:15:42] * aude will start with 1 or a few wikis for usage tracking [20:15:45] (CR) Nuria: "Could we add a test for this and that would be it?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236860 (owner: OliverKeyes) [20:15:49] addshore / aude: different dashboards are no problem [20:15:50] proabably separate [20:16:08] yup, and then I guess some kind of home page for all of the wikidata dashboards [20:16:29] i want to keep track both from a metrics standpoint and to monitor usage to spot possible performance issues or things [20:16:48] sounds good :) [20:16:49] (CR) OliverKeyes: "What do you mean "that would be it"? Caps changes?" 
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/236860 (owner: OliverKeyes) [20:16:55] I can see these tsv files getting rather large ;) [20:17:04] possibly [20:17:05] aude, addshore maybe you want to talk a few minutes with a designer to see if you can display both datasets in a cohesive way [20:17:13] in a new dashboard? [20:17:37] addshore, aude: and new layouts can be built [20:17:39] perhaps, we also have other datasets that we will want on a dashboard [20:17:46] probably [20:18:03] the layouts themselves look quite easy to implement, simply pulling in various modules [20:18:12] addshore, aude (cc milimetric ) but we like to approach dashboards with some information architecture principles [20:18:19] nuria: once i get something to work at all, then would be happy to get some input [20:18:24] addshore, aude: so we do not end up on the limn [20:18:31] "forever scrolling scenario" [20:18:34] nuria: yay for avoiding limn if it is dying :P [20:18:52] aude: cause if you need an extra page to list your wikidata dashboards... mmm [20:19:02] aude: something could be better [20:19:02] for what i need, it's how many pages are using entities on each wikipedia [20:19:08] aude: I think I know how you want the usage tracking displayed so I may try and mock up how this all could look, with the stuff lydia also wants [20:19:18] so, one graph for enwiki, one for de, etc.
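As an aside on the on-wiki configs mentioned earlier (e.g. Config:VitalSigns), the sketch below shows the kind of structure such a config page might carry and a tiny sanity check one could run before publishing it. All keys, values, and the `check_config` helper are invented for illustration; the real schema lives on the wiki page itself.

```python
# Hypothetical Dashiki-style config: every name and value here is made up
# to illustrate the shape, not the actual schema.
example_config = {
    "defaultProjects": ["enwiki", "dewiki", "frwiki"],
    "defaultMetrics": ["Pageviews", "NewlyRegistered"],
    "metrics": [
        {"name": "Pageviews", "api": "wikimetrics"},
        {"name": "NewlyRegistered", "api": "wikimetrics"},
    ],
}

def check_config(config):
    """Return the default metrics that are not defined in the metrics list."""
    defined = {m["name"] for m in config.get("metrics", [])}
    return [m for m in config.get("defaultMetrics", []) if m not in defined]
```

An empty result from `check_config` means every default metric resolves to a defined one; anything returned would be a typo worth fixing before the dashboard tries to render it.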
[20:19:32] addshore, aude: but think about vital-signs [20:19:38] addshore: ok [20:19:43] you can browse 800 wikis on one screen [20:19:49] aude: I think yours would fit in the style vital signs is done [20:19:50] nuria: i want an overview, a total [20:20:11] already have tools.wmflabs.org/audetools/wikidata-stats/ to give you an ide [20:20:20] idea* [20:20:37] aude: sorry, i was trying to point out that thinking about your info architecture before thinking about graphs is the way to go [20:20:44] nuria: i see [20:21:05] aude: that page suffers from "forever scrolling" [20:21:10] * aude nods [20:21:23] aude: so a bit of design love and info architecture can help [20:21:23] it needs sorting, for one thing [20:21:28] :) [20:21:40] but that data could likely be easily shown using the per project dashiki things right now, I think [20:21:54] aude: some time with a designer will go a long way [20:21:59] nuria: ok [20:22:02] select the projects, and the metrics are total / sitelinks / labels etc [20:22:35] Analytics-Backlog, Analytics-Dashiki, Editing-Analysis, Editing-Department: Limit wikis on https://edit-analysis.wmflabs.org/compare/ to top 50 Wikipedias {lion} - https://phabricator.wikimedia.org/T112222#1629271 (Jdforrester-WMF) >>! In T112222#1629860, @bmansurov wrote: > How often will this list... [20:22:47] aude, addshore we love building layouts but we like to think about those to avoid past mistakes [20:23:13] ottomata: so for this problem with naming to have happened [20:24:00] ottomata: means that all classes that are loaded by hive cannot really be overridden by you adding a jar (which is fine) [20:24:07] nuria: solved then?
[20:24:15] joal|night: well you SOLVED it [20:24:20] and for dashiki as well, it is just the case we set up a labs project for our dashboard / dashboard, put dashiki there and then point it at the right stuff it seems [20:24:36] joal|night: it was the name of the class, so it was a classloading issue [20:24:44] nuria: naw, I emitted ideas, you tested and actually did the job :) [20:24:52] eh? [20:25:07] ottomata: ahem.. let me explain [20:25:20] ottomata: if you want to change PageviewDefinition.java [20:25:36] yeah nuria, we need to be very careful when testing new implementations of existing functions [20:25:55] ottomata: you cannot test it with your local copy of the jar that you have modified [20:26:26] ottomata: as hive has already loaded the PageviewDefinition class [20:26:28] nuria: actually you can, but need to use a different name for the UDF :) [20:26:28] OHHHH [20:26:39] aye, maybe we shouldn't auto load those? not sure. [20:26:44] ottomata: nah [20:26:52] ottomata: makes life easier for users [20:26:59] ottomata: we just have to be aware [20:27:04] aye [20:27:07] wow that is an annoying bug [20:27:20] annoying that it manifests as a kryo error [20:27:22] ottomata: indeed! [20:27:38] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1642942 (JKatzWMF) > I am currently computing the daily pageview / webrequest rate for august, to see if there could be big differences daily (I think it could, actually). Plea... [20:27:42] ottomata: i knowww, welcome to java friendly stack traces [20:28:29] addshore: we can host the dashboard for you on a common instance where we have all dashboards.
But you're welcome to host it if you like [20:29:56] * joal|night is gone for real ;) [20:32:42] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/wikidata/analytics/dashboard [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238317 (owner: Christopher Johnson (WMDE)) [20:35:35] (PS1) Christopher Johnson (WMDE): move data to subdirectory [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238539 [20:35:37] (PS1) Christopher Johnson (WMDE): adds content data and stacked bar plotter [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238540 [20:35:39] (PS1) Christopher Johnson (WMDE): update content data [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238541 [20:35:44] milimetric: on the common instance sounds like it makes sense! [20:36:28] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] first commit [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238316 (owner: Christopher Johnson (WMDE)) [20:36:59] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] move data to subdirectory [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238539 (owner: Christopher Johnson (WMDE)) [20:37:21] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds content data and stacked bar plotter [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238540 (owner: Christopher Johnson (WMDE)) [20:37:47] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] update content data [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238541 (owner: Christopher Johnson (WMDE)) [20:48:40] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1643003 (Christopher) FYI: I am working on the dashboards and have made some progress using the shiny-server. Check out the very preliminary prototype at http://wdm.wmflabs.org/wdm T... 
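The classloading pitfall nuria and ottomata untangled above (a modified PageviewDefinition jar being invisible because Hive had already loaded a class under that name) can be illustrated with a minimal Python analogy. This is not Hive code; it just mimics why re-supplying a definition under an already-cached name is silently ignored, while a fresh name (like the renamed testUDF) side-steps the cache.

```python
# Toy model of a class cache: once a name is loaded, a new definition
# under the same name is ignored; a new name picks up the new definition.
cache = {}

def load_class(name, definition):
    """Return the cached definition for name, inserting it if absent."""
    return cache.setdefault(name, definition)

v1 = load_class("PageviewDefinition", "old behaviour")   # first load wins
v2 = load_class("PageviewDefinition", "new behaviour")   # ignored: cached
v3 = load_class("TestPageviewDefinition", "new behaviour")  # new name wins
```

The practical takeaway from the log is the same: when testing a modified jar against a runtime that pre-loads your classes, register the changed code under a different name.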
[20:49:39] https://usercontent.irccloud-cdn.com/file/dOpiulWB/ [20:49:50] aude, an example of how the usage tracking could look? ^^ [20:52:22] possible [20:52:47] what i want to avoid is showing all ~280? wikipedias on one graph [20:53:07] yet want to find the ones with the most usage and be able to explore the data [20:53:41] ahhh [20:54:14] currently i am finding usages via the error logs :o [20:54:18] slow queries [20:54:30] hah! [20:54:43] that's how i found what russian and hungarian wikipedia are doing [20:56:49] aude: where is the code / query you currently used to pull the data out? [20:56:55] I saw you link to a repo the other day! [20:57:19] https://github.com/filbertkm/WikidataStats (it's not that nice yet) [20:57:34] doesn't deal with storing data over time yet [20:57:52] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1643028 (Tbayer) As mentioned [[https://phabricator.wikimedia.org/T108925#1563107 | above]], adding or removing these small projects does not resolve the discrepancies. We st... [20:58:40] aude: yeh, data over time would be cool to do asap ;) [20:58:56] yep, obviously and tsvs might work [20:59:44] yeh, although they would likely get big... That's why I was initially thinking of pulling my data out of places and then shoving it back in some mysql db before then generating tsvs from that [20:59:47] maybe + graphite [21:00:13] * aude once had graphs of change dispatching and used rrdtool for that [21:00:17] yeh, people still haven't told me enough about graphite ;) [21:00:26] :) [21:00:39] graphite is similar but better than rrdtool for this [21:01:20] yeh, I mean if this could also all be shoved into graphite, I love grafana as a tool [21:01:31] but then, that's not public right now anyway... [21:01:49] :/ [21:02:48] (CR) Nuria: "Sorry. Trying again. Could we add a unit test for the new path?"
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/236860 (owner: OliverKeyes) [21:10:21] addshore: thanks for taking the time to think about use cases a bit [21:10:37] (PS5) Nuria: Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237274 [21:12:45] nuria: yeh, I'm currently drafting a document with my ideas. Will share it with you if it comes to anything! (: [21:12:53] addshore: k [21:12:55] Time for some sleeping soon though! [21:13:15] * aude too [21:46:40] (CR) QChris: "I added wikidata to the Owners group of the /wikidata" [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238319 (owner: Christopher Johnson (WMDE)) [22:18:21] milimetric, so the seed IP patch; pushed? deployed? [22:18:44] ebernhardson and I were talking about the issues he was seeing and realised one possible source for "multiple UUIDs per IP" is "the IPs are actually the same person" ;p [22:20:18] Ironholds: yeah, right, good point. There is no patch yet. The fix is going to take some thinking [22:20:20] the token is generated per page load, so getting 12 tokens from 1 ip is fine. Getting 1 token from 12 IPs is a little odd :) but i think the mail sent to analytics about hashing origin IP addresses explains our problem [22:20:28] milimetric, gotcha! [22:20:32] bearloga, look what we found ;p [22:21:08] milimetric, what can discovery do to aid in the thinking and patching? [22:21:29] ebernhardson: yeah, I think you'll see 12 ips per token, definitely [22:22:43] Ironholds: um... we need to figure out a way to get all processor instances to read the same private salt, and re-read it at a determined interval [22:22:58] and these instances may be on different machines [22:23:22] I'm thinking something like a key server or a common file somewhere [22:23:41] makes sense. Do you need bodies for the patching or the thinking?
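On the Graphite idea addshore and aude floated earlier: Graphite's carbon daemon accepts metrics over its plaintext protocol, one "metric.path value timestamp" line per datapoint, typically on TCP port 2003. A rough sketch of feeding per-wiki usage counts into it follows; the host, metric prefix, and counts are invented for illustration.

```python
# Sketch: render {wiki: count} data as Graphite plaintext-protocol lines
# and ship them over TCP. Prefix, host, and numbers are hypothetical.
import socket
import time

def graphite_lines(counts, prefix="wikidata.usage", ts=None):
    """Render a dict of per-wiki counts into plaintext-protocol lines."""
    ts = int(ts if ts is not None else time.time())
    return "".join(
        "%s.%s %d %d\n" % (prefix, wiki, count, ts)
        for wiki, count in sorted(counts.items())
    )

def send_to_graphite(lines, host="graphite.example.org", port=2003):
    """Send rendered lines to a carbon daemon (assumed host and port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(lines.encode("utf-8"))
```

Compared to ever-growing TSVs, this pushes the storage-over-time problem onto Graphite's retention policies, which is roughly the rrdtool-but-better property mentioned in the chat.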
[22:23:44] ottomata and I were brain bouncing a bit [22:23:50] because ebernhardson was volunteered to fix our bug and if this is our bug... ;p [22:24:10] the sooner it gets patched the sooner we can A/B test. Oi, Deskana, read scrollback. [22:24:26] So that thread I read before *was* the issue. [22:24:27] Hah. [22:24:28] heh, cool, well any ideas welcome. I'm happy to help and answer questions [22:24:40] Clearly we need to never make performance improvements again. PROBLEM SOLVED [22:24:53] Deskana: I think the other two issues that Nuria and Gergo found were real as well [22:25:01] a compound problem [22:25:16] but yes, performance is 'da worst [22:25:35] Da. I am having to deal with a similar problem for a volunteer project. [22:25:45] distributed processes dependent on shared resources are gross [22:26:59] well, it's either this or be limited to 1000 events per second forever :) [22:27:44] random idea, could you read it from etcd ? [22:28:00] * ebernhardson just throws random ideas ... see what sticks :P [22:29:07] it would have to be readable from many machines, whichever are running eventlogging_processors. I think otto was mostly concerned with how to do this in puppet so that there would be no manual steps when setting up new processors [22:53:17] Analytics-Engineering, Performance-Team: Support support for "gauge" metric type in Statsv - https://phabricator.wikimedia.org/T112713#1643427 (Krinkle) NEW [22:53:31] Analytics-Engineering, Performance-Team: Support support for "gauge" metric type in Statsv - https://phabricator.wikimedia.org/T112713#1643437 (Krinkle) [22:56:15] https://grafana.wikimedia.org/#/dashboard/db/server-board [23:13:53] Is there a way to get to Eventlogging data from stat1002? Analytics/Data_access#Analytics_slaves says you can, but I'm not finding a way to get to the db password. 
[23:31:23] nuria: I've added the client_error metrics to https://grafana.wikimedia.org/#/dashboard/db/eventlogging [23:31:35] It's just starting to trickle in, gonna check back later :) (third graph down) [23:40:43] csteipp: mysql --defaults-extra-file=/etc/mysql/conf.d/analytics-research-client.cnf --host analytics-store.eqiad.wmnet [23:42:16] madhuvishy: Thanks! Docs were saying to use research-client.cnf, which looks to only be readable by root... [23:42:40] csteipp: yeah, if you logged in from stat1003 you can use that [23:42:52] * madhuvishy looks at docs [23:44:10] csteipp: yeah, it's a little misplaced. I'll check with andrew as to why this is now different, and update docs