[03:09:29] Analytics, Pageviews-API: Use proper domain names for pageviews API - https://phabricator.wikimedia.org/T127030#2030197 (Yurik) NEW [03:38:57] (PS2) Nuria: [WIP] Dashiki gets pageview data from pageview API [analytics/dashiki] - https://gerrit.wikimedia.org/r/270867 (https://phabricator.wikimedia.org/T124063) [06:23:46] Analytics, Pageviews-API: Pageviews's title parameter is impossible to generate from wiki markup - https://phabricator.wikimedia.org/T127034#2030266 (Yurik) NEW [10:58:22] (PS1) Joal: Add monthly top to cassandra and correct jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/270921 [11:00:25] !log Deploying Refinery from tin (trying) [11:04:48] !log Deploying refinery on hdfs from stat1002 [11:13:49] !log Launch last_access_uniques daily job [11:22:27] !log Launch last_access_uniques monthly job [11:46:25] deployments working again :D [11:46:33] It does elukey ! [11:46:41] Thanks for having looked into that yesterday :) [11:50:19] "looked" is a big word since I haven't done anything useful :D [11:50:44] anyhow, I am going to re-study the notes that I've taken in SF with you this afternoon about cluster maintenance [11:50:47] :D [11:50:51] Ah, I'm sure you did :) Positive waves are always important :) [13:07:40] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2030956 (jcrespo) a:jcrespo>None @Neil_P._Qui... [14:08:34] (PS1) Mforns: Improve the data format of the browser report [analytics/refinery] - https://gerrit.wikimedia.org/r/270955 (https://phabricator.wikimedia.org/T126282) [14:09:04] (CR) Mforns: [C: -1] "Still WIP" [analytics/refinery] - https://gerrit.wikimedia.org/r/270955 (https://phabricator.wikimedia.org/T126282) (owner: Mforns) [14:18:02] joal, do you have a minute for a Yarn question? 
[14:18:33] sure elukey [14:20:43] thanks! So I am watching the scheduler page, I have all the notes related to the queues meaning, etc.. [14:21:37] but I was wondering how to figure out the "status" of the cluster, for example how overloaded it is and if it can handle another job like the one that I have to do (dumps backfill) [14:21:55] it is an exercise to figure out how to read that page :) [14:22:56] because the 97.4% used overall usage probably is not the best indicator.. right? [14:22:57] elukey: True :) [14:23:42] elukey: Various thoughts as they come: High load on the cluster is good: we use available resource :) [14:24:03] However overload on the cluster is not good - How to identify ? [14:24:39] Usually a good indicator is the number of currently running jobs in production [14:26:31] mmmmm [14:27:13] We consider production here because default queue resources can be preempted by essential and production ones [14:27:29] Back to the code: essential queue === camus [14:27:36] to the CORE, sorry :) [14:27:59] production === hdfs or hdfs-discovery (production style jobs) [14:28:11] priority === user jobs that needs a bump [14:28:22] default === user jobs [14:28:39] descriptions here too: https://github.com/wikimedia/operations-puppet/blob/production/modules/role/templates/analytics_cluster/hadoop/fair-scheduler.xml.erb#L21 [14:28:41] good morning! 
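A rough sketch of how the queue hierarchy described above might look in a YARN fair-scheduler allocation file. The real configuration is the puppet template linked in the scrollback; the queue names below follow the chat, but the weights are invented purely for illustration:

```xml
<?xml version="1.0"?>
<!-- Illustrative only: mirrors the queue layout described in the channel,
     not the actual operations-puppet fair-scheduler.xml.erb. Weights are
     made-up example values. -->
<allocations>
  <queue name="essential">   <!-- camus (data ingestion) -->
    <weight>10</weight>
  </queue>
  <queue name="production">  <!-- hdfs / hdfs-discovery production-style jobs -->
    <weight>6</weight>
  </queue>
  <queue name="priority">    <!-- user jobs that need a bump -->
    <weight>2</weight>
  </queue>
  <queue name="default">     <!-- regular user jobs; can be preempted -->
    <weight>1</weight>
  </queue>
</allocations>
```

The preemption joal mentions is why counting running production jobs is a better overload signal than raw utilization: work in `default` can always be reclaimed, work in `essential` and `production` cannot.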
[14:28:42] :) [14:28:48] o/ [14:28:51] Hey ottomata :) [14:28:54] Thanks for the link [14:30:37] elukey: Then about single job resource evaluation, it depends on various factors: data size and format, operation complexity (number of steps and complexity of each) [14:32:34] all right I'll probably have to learn it in the field :D [14:32:53] elukey: Definitely there's some of that :) [14:39:08] joal: i had dinner with diederik last night, a long time ago he worked with dan and I on the analytics team [14:39:22] he's now working at shopify, and I told him about your CamusPartitionChecker, and he really wants to use it! [14:39:39] i told him I'd look into putting it into our fork of Camus instead of refinery maybe. [14:39:50] he also said that he thinks linkedin still accepts pull requests [14:42:03] cool ottomata :) [14:44:48] elukey: about relative data size, you can look here: https://wikitech.wikimedia.org/w/index.php?title=File:Pageview_@_Wikimedia_(WMF_Analytics_lightning_talk,_June_2015).pdf&page=6 [14:51:03] thanks! I'll probably need to chat with milimetric about https://phabricator.wikimedia.org/T126464 [14:51:05] to have a more precise idea [14:56:43] Hi elukey, precise idea about what? [15:00:47] hello! about what to do :) [15:06:18] Oh! It's straightforward, are you ready to work on it? We could do a hangout [15:06:29] elukey: ^ [15:12:00] sure! in 10 mins is it ok? [15:12:36] of course [15:22:41] milimetric: ready if you are [15:23:06] elukey: to the batcave! [15:25:12] Analytics, Analytics-Cluster: Upgrade to CDH 5.5 - https://phabricator.wikimedia.org/T119646#2031506 (Ottomata) For reference: http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_upgrade_5_to_latest.html [15:28:26] ottomata: HDFS metadata upgrade: should we plan for a downtime ? [15:29:16] Analytics-Cluster, Operations, hardware-requests: eqiad: New Hive / Oozie server node in eqiad Analytics VLAN - https://phabricator.wikimedia.org/T124945#2031518 (Ottomata) Bump! 
[15:29:28] joal: i'm reading the cdh upgrade docs now [15:29:33] k [15:29:43] we will plan for downtime, but i think there may not be a metadata upgrade in this version jump [15:29:58] going to do it in analytics labs first [15:30:05] then will set up official beta analytics cluster [15:30:07] and do the upgrade there [15:30:31] yeah makes sense :) [15:30:40] Thanks a lot for the beta cluster setup ! [15:37:07] joal: just perused the cdh upgrade docs, actually looks pretty straightforward. will be the simplest CDH upgrade i've done yet! :) [15:37:27] mostly just stop stuff, install new packages, upgrade any dbs (oozie, hive), start it all back up [15:37:39] great ottomata ! [15:37:48] (PS2) Mforns: Improve the data format of the browser report [analytics/refinery] - https://gerrit.wikimedia.org/r/270955 (https://phabricator.wikimedia.org/T126282) [15:37:59] IIRC from the fast scanning of the page you sent, there might not even be any DB update :) [15:38:18] (CR) Mforns: [C: -1] "Still WIP" [analytics/refinery] - https://gerrit.wikimedia.org/r/270955 (https://phabricator.wikimedia.org/T126282) (owner: Mforns) [15:38:42] Sounds good ottomata [15:54:16] Analytics-Tech-community-metrics, Developer-Relations, DevRel-March-2016: Play with Bitergia's Kabana UI (which might potential replace our current UI on korma.wmflabs.org) - https://phabricator.wikimedia.org/T127078#2031579 (Aklapper) NEW a:Aklapper [15:55:07] (CR) Nuria: "Could you please explain on commit message why are we changing the number of reducers?" [analytics/refinery] - https://gerrit.wikimedia.org/r/270921 (owner: Joal) [15:56:50] (CR) Nuria: [C: 2] "Looks good, should we merge this already?" 
[analytics/refinery] - https://gerrit.wikimedia.org/r/270789 (https://phabricator.wikimedia.org/T127000) (owner: Mforns) [15:58:14] Analytics, Analytics-Kanban: Back-fill pageviews data for dumps.wikimedia.org to May 2015 - https://phabricator.wikimedia.org/T126464#2031606 (elukey) Adding more info after a chat with Dan. The dumps are not visible yet in the /other folder but only in http://dumps.wikimedia.org/other/pageviews/ The 20... [15:58:38] (CR) Nuria: "Do we want to give the last pass to this code? the last set of comments are pretty minor." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga) [16:01:07] Analytics-Tech-community-metrics, Developer-Relations, DevRel-March-2016: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#2031612 (Aklapper) [16:05:35] wikimedia/mediawiki-extensions-EventLogging#531 (wmf/1.27.0-wmf.14 - b1bc172 : Antoine Musso): The build has errored. [16:05:35] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/b1bc172933d0 [16:05:35] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/109633830 [16:09:16] milimetric: updated https://phabricator.wikimedia.org/T126464, thanks! [16:10:27] Analytics, Analytics-Kanban: Back-fill pageviews data for dumps.wikimedia.org to May 2015 - https://phabricator.wikimedia.org/T126464#2031691 (Milimetric) This [WIP] patch [1] is the one that will add the pageviews dataset to dumps.wikimedia.org and bring some general sanity to the analytics data presente... [16:13:03] no clue what travis-ci complains about the CI jobs on https://gerrit.wikimedia.org/r/#/c/270977/ pass :D [16:13:38] oh the instance is missing etcd ... [16:26:23] Analytics: Make last access data public - https://phabricator.wikimedia.org/T126767#2031760 (Nuria) [16:31:47] nuria: standup? 
[16:33:55] Analytics-EventLogging, Analytics-Kanban: Some blacklist matching schemas are being consumed by Eventlogging {oryx} [5 pts] - https://phabricator.wikimedia.org/T126410#2031793 (Milimetric) [16:38:44] (CR) Nuria: [V: 2] Move location of pageview and projectview archives [analytics/refinery] - https://gerrit.wikimedia.org/r/270789 (https://phabricator.wikimedia.org/T127000) (owner: Mforns) [16:56:08] Analytics: Make beeline easier to use as a Hive client {hawk} - https://phabricator.wikimedia.org/T116123#2031924 (Ottomata) I think something else must be going on other than raising the heap limit. When I do `export HADOOP_HEAPSIZE=1111 && beeline`, the JVM process that is launched is ``` /usr/lib/jvm/j... [16:56:11] ottomata: Last line of [16:56:22] /usr/lib/hive/bin/hive-config.sh [16:58:54] ja makes sense joal [16:58:57] that's the default [16:59:10] setting $HADOOP_HEAPSIZE works just fine for me [16:59:21] ottomata: If we want to change the default for hive clis, maybe it's the place :) [17:00:59] yes [17:01:06] oh [17:01:12] we don't want to change the default joal [17:01:15] we want people to override [17:01:19] if they want to [17:01:22] if we wanted to change default [17:01:35] the proper thing to do would be to export that env into all shells by default, via /etc/profile.d [17:03:09] ok ottomata [17:03:17] I'm a dirty man :) [17:08:18] lunchtime! [17:09:00] Analytics-Kanban, Patch-For-Review: Move archive/pageview and archive/projectview into archive/pageview_aggregates {hawk} - https://phabricator.wikimedia.org/T127000#2032002 (Milimetric) Marcel, I know Erik consumes the geo files from that location, so we should talk to him. But what's the motivation for... [17:10:40] Analytics-Kanban, Reading-Admin, Patch-For-Review: Tabular layout on dashiki [8 pts] {lama} - https://phabricator.wikimedia.org/T118329#2032009 (Milimetric) I'm sorry, I should've kept this open as a tracking task and made sub-tasks. 
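The per-user override ottomata recommends above (rather than changing the default in hive-config.sh) is just an environment variable set before launching the client; a minimal sketch, with an arbitrary example value:

```shell
# Per-user override, as suggested above: raise the Hadoop/Hive client-side
# heap for this shell only, then launch beeline. 2048 (MB) is an arbitrary
# example value, not a recommendation from the channel.
export HADOOP_HEAPSIZE=2048
# beeline   # uncomment on an actual Hadoop client node
echo "HADOOP_HEAPSIZE=$HADOOP_HEAPSIZE"
```

Changing the default for everyone would instead mean exporting the variable from a file under /etc/profile.d, as noted in the conversation.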
This had two purposes - the broken down tabular layout by itself, and... [17:14:57] Analytics-Kanban, Patch-For-Review: Move archive/pageview and archive/projectview into archive/pageview_aggregates {hawk} - https://phabricator.wikimedia.org/T127000#2032034 (mforns) @Milimetric Yesterday we spent some time after standup discussing a good naming for the folders that will hold the "traffic... [17:16:10] (PS2) Joal: Add monthly top to cassandra and correct jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/270921 (https://phabricator.wikimedia.org/T120113) [17:23:06] (CR) Bearloga: "Hi! Yes, I'll respond/patch soon :) Have been busy. Thanks" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga) [17:31:42] Analytics-Kanban, Patch-For-Review: Move archive/pageview and archive/projectview into archive/pageview_aggregates {hawk} - https://phabricator.wikimedia.org/T127000#2032118 (Milimetric) hm... i kind of like how the current names mirror a little bit the organization in the analytics-refinery/oozie folder,... [17:36:25] joal: re: the zika research that you said you might be interested in [17:37:07] I'm currently looking into the timeline of the outbreak and trying to think about what geo-tagged data I could provide at what granularity etc. that would be useful to researchers. 
[17:37:31] I am happy to chat about that, or about anonymizing it properly once I figure out what I think would be useful [17:38:04] and obviously all of a-team is welcome to help with this if they agree it's a useful project, and all of a-team is also of course welcome to ignore me as well :) [18:03:21] madhuvishy: Please remember to log on SAL when you deploy auto increment to EL [18:03:32] nuria: sure [18:07:20] nuria: I'm deploying from tin for the first time - I get [18:07:22] https://www.irccloud.com/pastebin/AKbZWTsH/ [18:07:24] Analytics-Kanban, Patch-For-Review: Move archive/pageview and archive/projectview into archive/pageview_aggregates {hawk} - https://phabricator.wikimedia.org/T127000#2032296 (mforns) @Milimetric, @Ottomata and others: I agree with @Milimetric. And I think we should leave this for later, it's taking some t... [18:07:46] madhuvishy: you might not have permits [18:07:52] madhuvishy: do you have sudo? [18:07:56] nuria: no [18:08:03] do i need sudo in tin? [18:09:17] madhuvishy: I seem to remember that yes, we do, let me try to deploy myself [18:11:05] nuria: hey [18:11:14] ori: hola [18:11:19] nuria: PM :) [18:11:20] madhuvishy: i cannot even ssh to tin now [18:11:30] nuria: huh [18:11:46] ottomata: do we need sudo on tin to do git deploy? [18:11:52] no [18:12:10] nuria: tin has changed a bit, maybe you just have a stale ssh known host key? [18:12:14] also [18:12:14] try [18:12:15] sshing [18:12:16] ottomata: I ran it and it claims I'm missing git user.name and user.email config [18:12:19] ssh deployment.eqiad.wmnet [18:12:35] madhuvishy: then you need to config your user.name and user.email in your user's gitconfig :p [18:12:47] git config --add ... [18:12:53] ottomata: that works, let me update docs [18:13:13] nuria: ha, uh, eventlogging docs for clearing an ssh key? 
[18:13:14] ottomata: ah yes that's what i thought it was [18:13:19] that will happen for any server any time it is reinstalled [18:13:28] ottomata: nah, that ssh deployment works [18:13:49] nuria: I'm gonna try and deploy [18:13:54] madhuvishy: k [18:13:54] is that okay? [18:13:58] ok [18:15:31] madhuvishy: i just verified that i can ssh [18:15:48] ottomata: aah but i cannot git config --add because - error: could not lock config file .git/config: No such file or directory [18:16:04] hmmmm [18:16:05] ma be [18:16:18] ok [18:16:19] nvm [18:16:21] my bad [18:18:10] madhuvishy: k let me know [18:19:23] ottomata: should I do something apart from git pull to update config/schemas [18:19:32] it complains that the repo is dirty [18:20:00] looking [18:21:57] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2032369 (Neil_P._Quinn_WMF) @jcrespo, I'm don't tot... [18:23:41] ah madhuvishy i'm not sure here, but yes, I *think* you should do: git submodule update --init [18:23:51] it won't hurt either way [18:23:53] ok [18:23:54] and that is probably good to run [18:24:25] okay cool - trying git deploy sync [18:27:21] !log Deployed and restarted eventlogging with Auto-increment ID and Update config/schemas submodule changes (T124741 and T125135) [18:27:29] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] - https://phabricator.wikimedia.org/T125135#1979211 (Stashbot) {nav icon=file, name=Mentioned in SAL, href=https://tools.wmflabs.org/sal/log/AVLrVjYDW8txF7J0uTNN} [2016-02-16T18... 
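The first-time setup worked out in the scrollback above can be summarized as the following sketch. Names, email, and host are placeholders; `git deploy` is the deployment tooling installed on the deploy host, not a stock git subcommand:

```shell
# One-time setup before using git-deploy on the deploy host (sketch).
# 1) git needs an identity for the commits git-deploy creates -- this was
#    the "missing git user.name and user.email" error above:
git config --global user.name "Your Name"          # placeholder value
git config --global user.email "you@example.org"   # placeholder value

# 2) If ssh refuses to connect after a host reinstall, drop the stale
#    known_hosts entry (host name is an example):
#      ssh-keygen -R tin.eqiad.wmnet

# 3) Then, inside the repository checkout, refresh submodules and sync:
#      git submodule update --init
#      git deploy sync

git config --global user.name   # prints the configured name
# -> Your Name
```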
[18:28:41] nuria: ^^ [18:29:29] madhuvishy: k [18:29:47] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2032416 (jcrespo) With "before"? Do you see anythin... [18:31:04] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] - https://phabricator.wikimedia.org/T125135#2032421 (madhuvishy) @jcrespo The auto-increment id change went live on our side - new schemas or tables that get created should have... [18:32:09] Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] - https://phabricator.wikimedia.org/T125135#2032423 (Nuria) [18:32:28] Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] - https://phabricator.wikimedia.org/T125135#1979211 (Nuria) @jcrespo: Tagging this task with dba so it appears in your queue [18:32:48] madhuvishy: k just tagged task with DBA, we will leave it in ready to deploy as jaime's work needs to happen [18:32:57] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2032425 (Neil_P._Quinn_WMF) @jcrespo, oh, I see. I... [18:32:59] madhuvishy: did you check system was inserting correctly? [18:34:54] Hey milimetric, sorry for not answering, was away for a while [18:35:21] We can talk around data granularity and possible anonymization whenever you want :) [18:35:34] no problem! 
I'm having lunch, reading the Zika article to understand more context [18:35:40] For sure [18:35:57] Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] - https://phabricator.wikimedia.org/T125135#2032434 (jcrespo) a:madhuvishy>jcrespo Taking it from here, will analyze current state to see what changes are needed... [18:36:19] milimetric: I also thought some comments on the task were very interesting (how wikipedia, like google, get buzz-effect stats that might pollute the actual informative data) [18:37:43] madhuvishy: are you here? [18:37:59] madhuvishy: ready for your lightning talk? [18:39:21] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2032452 (jcrespo) > I delete everything forever. "... [18:40:07] kevinator: yes! sorry I haven't replied to those emails - I took yesterday off - will reply now :) [18:40:59] madhu, can you join us in the hangout or join us on the 5th floor? [18:41:02] kevinator: it's today? [18:41:12] yes, in 20 minutes [18:41:18] oh shit I really thought it was tomorrow [18:41:34] can you still present? [18:42:47] kevinator: Yes I think so - I'll join the hangout in 5 minutes - sorry [18:44:52] yay, thanks! [18:48:17] kevinator: I joined [18:48:22] can I go last? [18:50:25] madhuvishy: I can help you however needed, want me to go over slides? [18:51:12] nuria: just need to add november numbers [18:52:01] madhuvishy: want me to do that? 
[18:52:17] nuria: sure that would be great [18:52:56] madhuvishy: give me editing permits [18:53:02] madhuvishy: i'll have it done in a sec [18:53:30] nuria: done - slide 17 [18:54:08] nuria: can you also add a link to the nocookie work on slide 16 [18:55:47] madhuvishy: done [18:59:21] joal: yeah, true, but I'm thinking that wouldn't exist in the timeline before these diseases hit the news, so there's probably peace and quiet there. And then we should be able to pin news events on the timeline and attempt to clean their impact out of the baseline trend. If I had a lot of time I'd look into audio editing and how they remove specific kinds [18:59:21] of noise from tracks. [19:04:58] madhuvishy: done, take a look [19:08:05] madhuvishy: can add more to your liking [19:08:22] nuria: it looks great thank you so much [19:08:26] madhuvishy: np [19:17:51] madhuvishy: added link on slide #5 as i got some docs together about that [19:17:56] madhuvishy: just friday [19:17:58] https://meta.wikimedia.org/wiki/Research:Unique_Devices#Differences_between_unique_devices_and_unique_users [19:18:04] nuria: ya alright [19:18:14] madhuvishy: will not touch it no more [19:18:18] hands offf! [19:18:40] nuria: :) [19:18:48] talk in 2 minutes [19:24:47] (PS1) Joal: [WIP] Pageview Sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/271033 [19:25:02] Hey mforns, when you have time, if you want --^ [19:25:12] joal, sure! 
[19:25:23] I have it in my TODO [19:25:26] mforns: no rush, listening to madhu first ;) [19:25:32] yea [19:26:18] mforns: There is a file about checking Pageview sanitization --> You can read it if you want, but no need to review, it'll be removed :) [19:32:31] Analytics, Pageviews-API: Pageviews's title parameter is impossible to generate from wiki markup - https://phabricator.wikimedia.org/T127034#2032643 (Umherirrender) Try {{FULLPAGENAMEE:$2}} [19:34:33] phew [19:47:33] nuria: slides on commons here - https://commons.wikimedia.org/wiki/File:Unique_devices_lightning_talk.pdf [19:58:07] Good talk madhu :) [19:58:42] joal: thanks! [20:41:43] madhuvishy: all right, let's link them to docs [21:00:20] milimetric, you around? [21:00:29] hi mforns [21:00:30] course [21:01:06] hey, I was thinking, now that we'll have a hierarchy viz, I thought we could have just one report for both desktop and mobile, instead of 2 [21:01:18] and let the viz handle the access_method breakdown [21:01:26] what do you think? [21:02:45] Analytics, Pageviews-API: Pageviews's title parameter is impossible to generate from wiki markup - https://phabricator.wikimedia.org/T127034#2033026 (Milimetric) p:Triage>High [21:03:49] mforns: I'm not sure [21:04:18] if we make the viz do these things, for the tabular layout, then all the different charts have to do it too [21:04:20] (PS1) Ottomata: Put in place proper camus-wmf-0.1.0-wmf6.jar with correct shasum [analytics/refinery] - https://gerrit.wikimedia.org/r/271070 [21:04:31] like the dygraphs timeseries, table, etc. 
[21:04:38] (CR) Ottomata: [C: 2 V: 2] Put in place proper camus-wmf-0.1.0-wmf6.jar with correct shasum [analytics/refinery] - https://gerrit.wikimedia.org/r/271070 (owner: Ottomata) [21:04:54] milimetric, no no, I mean, having the access_method as the first level of the hierarchy [21:04:57] instead, I was thinking we could allow configuring more than one datasource on the dashboard config page [21:05:04] so instead of metric: ..., submetric: ... [21:05:13] milimetric, what do you mean by {{FULLPAGENAMEE:$2}} ? [21:05:41] we could have metrics: { name: request_breakdown, submetrics: [mobile_os, desktop_os, total_os] } [21:06:17] Analytics, Pageviews-API: Pageviews's title parameter is impossible to generate from wiki markup - https://phabricator.wikimedia.org/T127034#2033047 (Milimetric) agreed this is important, @Yurik, but we've got a big backlog [21:06:23] yurik: that wasn't me, it was someone else commenting [21:06:29] oh [21:06:34] i agree with you that the current way we handle titles is crazy tho [21:06:35] sorry, let me double check [21:06:51] we should've done better, we weren't thinking about the use cases that popped up [21:07:08] milimetric, yeah, keep with the standard, and it will work out ;) [21:07:16] invent your own, and people will complain [21:07:37] we didn't invent anything, we're in a totally different stack, so we have no access to the code that handles that right now [21:07:48] milimetric, what would { name: request_breakdown, submetrics: [mobile_os, desktop_os, total_os] } show? [21:08:06] milimetric, how hard would it be to do a string replace of "spaces->underscores" when you process a title? [21:08:10] in the api [21:08:33] mforns: so the job in that case would be responsible for generating multiple output files, which would be stored in request_breakdown/mobile_os/all.tsv, request_breakdown/desktop_os/all.tsv, etc. 
[21:09:03] and then the visualization would grab all the files and show all the columns from all the files [21:09:22] ugh but then we have to prefix the columns, and people might not want to show all of them by default [21:09:53] so we'd probably have to add: "enabled_columns: [a, b, c]" which would turn off the other columns in the legend [21:10:09] this is basically all the stuff we solved in limn [21:10:25] milimetric, wouldn't it be easier to have the first level of the hierarchy be the access_method? [21:10:39] in the hierarchy, yeah, but what about all the other visualizations? [21:10:53] and have the inner-most circle be the access_method breakdown, it could be zoomed in and out [21:11:03] my main point is that if we do something, we have to move all the visualizations forward together [21:11:08] ah [21:11:18] otherwise it gets really confusing from a config. point of view, like which files go where [21:11:47] yurik: it's not just that, should the first letter get capitalized? Should weird characters get escaped? [21:11:52] should blank spaces get removed? [21:12:15] milimetric, I thought only some visualizations will support hierarchical data [21:12:16] milimetric, none of the above -- all that can be done in wiki. Its only the space->underscore thats not possible [21:12:41] like hierarchy and tabular, no? [21:12:44] that's easy then, yurik [21:12:50] milimetric, go for it :) [21:12:53] we just had no way of knowing to do that [21:13:12] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Refactor analytics/cdh roles to use hiera, setup Analytics Cluster in beta labs. [21 pts] - https://phabricator.wikimedia.org/T109859#2033068 (Ottomata) Yeehaw! Done! deployment-analytics03 should be our main entrypoint for using hadoop clients (H... 
[21:13:13] hehe, normalization is a hard problem [21:13:26] milimetric, i will poke at it a bit more [21:13:37] mforns: I don't really know, but ideally any visualization should be able to show any data in whatever way makes sense [21:14:04] Analytics-Cluster, Analytics-Kanban: Upgrade to CDH 5.5 - https://phabricator.wikimedia.org/T119646#2033070 (Ottomata) [21:14:09] in the case of the hierarchy, we have a lot of data that can be pivoted for the visualizer you made, but we can also sum it up and plot it in a timeseries [21:14:19] milimetric, basic problem is that if the user supplies a page name, its their fault if its not working, but if we try to generate pagename ourselves, than its a problem. Example - show graph for the current article [21:14:43] by ourselves i mean a template without parameters [21:14:45] I know, yurik, I totally agree that's a problem [21:15:08] i'm experimenting with some other possibilities, but does not look good [21:15:22] mforns: I think desktop versus mobile is not really a good fit for hierarchical [21:15:27] the ultimate solution is to use lua, but than its a non-transferable [21:15:31] cause it is a different space [21:15:33] i'm getting other bug reports related to this and it's very confusing for me to think about what meets everyone's expectations around encoding / decoding / capitalizing / replacing [21:16:06] mforns: yeah, let's keep all these files separate. 
Keep the graphs stupid for now [21:16:10] graph one thing at a time [21:16:15] mobile on one graph, desktop on another [21:16:22] milimetric: +1 [21:16:32] with the tabs layout, people can just make a separate tab for desktop, and another for mobile [21:16:45] mforns: more so cause the dataset consider is different [21:16:47] and open two windows side by side on each tab [21:17:10] Analytics-Cluster, Analytics-Kanban: Upgrade analytics and beta project Analytics Clusters to CDH 5.5 [8 pts] - https://phabricator.wikimedia.org/T127115#2033076 (Ottomata) NEW a:Ottomata [21:17:18] nuria, milimetric, what we have today is actually mobile vs. desktop+mobile [21:17:25] essentially, I don't want the back-end to limit what the visualization is able to do [21:17:49] mforns: why not desktop+mobile in the same file vs. mobile and desktop in separate files? [21:18:11] milimetric, you mean 3 reports? [21:18:39] mforns: before we compute it yes, but after data is completely different. You cannot split the global report and get the mobile one [21:19:29] mforns: a hierarchy would be mobile> chrome> version [21:19:36] mforns: wait why did this split even come up? who said we should split out mobile? [21:19:51] maybe .. i... am ... getting .. confused [21:20:05] nuria, yes, mobile> chrome> version [21:20:05] milimetric: we have two reports 1) global browsers [21:20:14] guys - this is crazy, let's talk in the batcave, I can't follow anymore [21:20:15] 2) mobile only [21:20:17] :) [21:20:18] jaja [21:20:19] hehehe ok [21:20:26] ay ay [21:21:28] Analytics, Pageviews-API: Pageviews's title parameter is impossible to generate from wiki markup - https://phabricator.wikimedia.org/T127034#2033111 (Yurik) @umherirrender - that will generate a URL that does not escape slashes. `{{#if: {{{2|}}} | FULLPAGENAMEE:{{{2}}} | {{ARTICLEPAGENAME}} }}` with pa... [21:31:53] milimetric, yeah, it seems space is the only thing that stands in the way. 
Simply because in mediawiki there is no "db key" keyword. I cannot get a db key, i can only get a real page name + some encoding. And the weird encoding that converts spaces to '_' keeps the slashes, and the proper encoding that escapes slashes encodes spaces as %20 [21:32:50] so if you can just fix that, we can put that graph on every talk page ;) [21:33:01] huge impact right away :D [22:14:33] yurik: I want to do that, but right now there's a bug about how it treats url-encoded strings and it seems that restbase is not decoding properly or I messed something up in my handler [22:14:54] so I'll fix both of these together [22:14:58] but probably tomorrow [22:15:04] thanks! [22:19:35] a-team, see you tomorrow, good night! [22:50:29] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2033450 (Neil_P._Quinn_WMF) @jcrespo, I just found...
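The gap yurik describes (templates can emit a display-form title like "Page Name", but never the db-key form "Page_Name" that the pageviews API expects) could be bridged with a single replacement on the API side; a minimal sketch, with an illustrative example title:

```shell
# Minimal sketch of the normalization discussed above: convert the
# display form of a title (the only form wiki markup can produce) into
# the db-key form (spaces -> underscores). The title is an example.
title="Albert Einstein"
printf '%s\n' "$title" | sed 's/ /_/g'
# -> Albert_Einstein
```

This deliberately touches only spaces, per the conversation: capitalization, slashes, and percent-encoding are left exactly as the caller supplied them.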