[00:09:18] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Mobile PMs has reports on session-related metrics from Wikipedia Apps {hawk} - https://phabricator.wikimedia.org/T86535#1251145 (kevinator)
[00:14:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Mobile PMs has reports on session-related metrics from Wikipedia Apps {hawk} - https://phabricator.wikimedia.org/T86535#1251147 (kevinator) Documentation above has been added. it's wmfuuid. remaining work: - schedule task in oozie - store in file...
[05:03:04] Analytics-Cluster, operations, Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1251385 (awight) Excellent, thanks @Ottomata! Could you let us know the expected lag time for this new pipe? We would like if banner impression logs become av...
[05:03:24] Analytics, Analytics-Kanban, Wikimedia-Fundraising: Provide performant query access to banner show/hide numbers - https://phabricator.wikimedia.org/T90649#1251386 (awight)
[05:03:27] Analytics-Cluster, operations, Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1251387 (awight)
[08:22:11] Filed as https://phabricator.wikimedia.org/T97753
[11:49:33] hi, my name is stefano, and I am a researcher at the Oxford Internet Institute
[11:49:43] I have been using some of my free time to draft a very simple visual analytics tool for mid-scale Wikipedia analysis
[11:49:58] which uses the Wikipedia API to search for multiple pages in one language edition,
[11:50:13] and visualize those pages and the linked pages in another language edition, in a network diagram
[11:50:29] you can try the alpha version at http://sdesabbata.github.io/vis-a-wik/
[11:50:36] source code at https://github.com/sdesabbata/vis-a-wik
[11:50:55] I am looking for some feedback on functionality and usefulness, and any general comments
[11:51:40] any contribution is most welcome!
:) --- stefano.desabbata@oii.ox.ac.uk
[11:56:41] sdesabbata, hi! that looks awesome!
[11:57:09] thanks mforns :)
[11:58:12] sdesabbata, I'll bring that up in our daily meeting today (the rest of the team members are in US time zone, so probably not there yet)
[11:58:44] that's great, thanks
[11:59:19] the long-term aim is to enable editors to analyse wikipedia at a more general level than the single article
[11:59:49] hopefully, in a collaborative manner, through a collaborative visualization service
[12:00:08] sdesabbata, may I ask if you have already used it to draw any conclusions?
[12:00:51] sdesabbata, or have you done any study with it?
[12:01:21] mforns, no, I just finished the alpha version a couple of days ago
[12:02:43] mforns, I can say that there are several pages on Oxford that are present in English but not in Italian :)
[12:03:00] sdesabbata, I see, it is really cool work! I'll show it to the others later today
[12:03:05] mforns, which is probably not quite surprising
[12:04:07] mforns, hopefully, editors of a particular topic can find it useful, especially multilingual editors
[12:04:15] sdesabbata, I have to leave now for a couple of hours, got your email for further discussions
[12:04:22] sdesabbata, I see
[12:04:26] mforns, great, look forward to it
[12:04:40] sdesabbata, ok, thank you! bye
[12:04:46] mforns, bye
[14:00:50] Analytics-Cluster, operations, Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1251830 (Ottomata) It should be near realtime, maybe slightly more (less than a second) latency than udp2log. Unless, there is kafka broker downtime (which can...
[14:27:07] Analytics-Cluster, operations: Package kafkacat - https://phabricator.wikimedia.org/T97771#1251864 (Ottomata) NEW a:Ottomata
[14:44:43] Analytics-Kanban, Labs: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc.
{mole} - https://phabricator.wikimedia.org/T76075#1251890 (mforns) @coren, @milimetric What I observed back in February is that the revision tables in the wiki DBs (analytics-store) are rec...
[15:53:58] ottomata: Heya, let me know when you have a minute plizzzzzz
[16:06:24] hello yes joal?
[16:06:35] in a meeting, will be back
[16:06:39] k
[16:06:42] thx
[16:22:42] Analytics-Cluster: Setting up Ipython with Spark - https://phabricator.wikimedia.org/T92743#1252191 (Ottomata) Open>Resolved
[17:07:11] Analytics-Kanban, Analytics-Wikimetrics: Compact Wikimetrics' old report files {dove} - https://phabricator.wikimedia.org/T95756#1252336 (kevinator) p:Triage>Normal
[17:10:14] Analytics-Kanban, Analytics-Wikimetrics: Compact Wikimetrics' old report files {dove} - https://phabricator.wikimedia.org/T95756#1252354 (kevinator) a:mforns
[17:38:52] halfak: do you want to talk in the batcave?
[17:39:08] kevinator, sure
[17:39:51] be right there
[17:40:12] kk
[17:42:18] Analytics-EventLogging, operations, Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1252576 (Cmjohnson) Wiping
[17:55:06] kevinator, the code that forces the report failure is on staging, so you can see the rerun button working
[17:55:38] kevinator, however, the report will fail again every time
[18:22:02] ottomata: sorry for the delay
[18:22:06] Are you around ?
[18:24:55] ahhhh about to head to cafe!
[18:24:59] sorry joal, back in 20ish
[18:25:04] no prob :)
[18:48:28] Analytics-Cluster, Analytics-Kanban: Refactor webrequest refinement using Spark {hawk} - https://phabricator.wikimedia.org/T97828#1252987 (JAllemandou) NEW a:JAllemandou
[18:51:04] joal: back!
[18:51:11] what'sup?
[18:51:18] Yeah :)
[18:51:36] ottomata: got issues testing impala, and using spark 1.3.1 from your dir :(
[18:51:42] Nothing works for me tonight ;)
[18:53:40] all i have tried is impala-shell, really
[18:53:41] whatcha doin?
[18:53:47] Same
[18:53:53] oh
[18:53:59] query?
[18:54:05] impala-shell -i analytics1041.eqiad.wmnet
[18:54:08] yup
[18:54:09] ja
[18:54:12] then?
[18:54:16] Works for showing dbs
[18:54:21] but no computation
[18:54:29] different dbs, different errors !
[18:54:33] what query?
[18:54:48] what table?
[18:54:48] count(1) over an hour of text pageviews
[18:55:00] webrequest?
[18:55:05] yup
[18:55:11] so, I think we shouldn't use impala on that table, right?
[18:55:19] Absolutely right
[18:55:21] also, before impala is used, one should run compute stats
[18:55:21] ;
[18:55:28] t'was not the first thing I tried :)
[18:55:34] try on wmf.mediacounts;
[18:55:46] Didn't know that !
[18:56:08] I tried on mobile_app_uniques --> missing stats
[18:56:19] will run compute stats, then try again :)
[18:56:57] Still error
[18:57:06] AM_CANNOT_GET_NODES - AM 'null' error in getNodes()
[18:57:11] HM
[18:57:30] whaaauuu
[18:57:32] weird, hm. ok
[18:58:42] sorry :)
[18:59:13] About the spark 1.3.1, I was trying to read raw webrequests and got a JSONSerde error
[18:59:19] because not in classpath
[18:59:24] Where can I find it ?
[18:59:32] ottomata: --^
[18:59:46] Analytics-Cluster, operations: Package kafkacat - https://phabricator.wikimedia.org/T97771#1253085 (faidon) kafkacat is already packaged in Debian unstable. It will probably go in to jessie-backports soon. We'd just need to build it for the other distributions we support, if needed.
[18:59:54] hm, not sure, haven't really tried it with raw webrequest
[19:00:10] you looking for the hcatalog one that the table is defined with?
[19:00:52] maybe
[19:00:54] /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
[19:01:20] Will try with that :)
[19:02:02] Thx a lot !
[19:02:46] hmm, joal, try impala now. i restarted llama and it seems happier...
[19:02:48] not sure what happened
[19:04:05] Analytics-Cluster, operations: Package kafkacat - https://phabricator.wikimedia.org/T97771#1253116 (Ottomata) Oo! Didn't realize. Awesome!
[19:04:05] Seems to work now
[19:04:16] Other errors, but due to gz issues
[19:04:24] Analytics-Cluster, operations: Backport? and install kafkacat (on stat1002?) - https://phabricator.wikimedia.org/T97771#1253117 (Ottomata)
[19:04:39] yeah
[19:04:44] try your query on mediacounts
[19:04:47] that was my test
[19:04:49] yup
[19:05:42] nice
[19:06:02] I'll look into that gz issue ...
[19:06:05] Weird
[19:07:01] Now running the query on mobile_apps_uniques_monthly, seems dead
[19:07:09] ?
[19:07:15] Ah, finally got an error, but took a long time !
[19:07:46] joal, tunneling to your connected impalad can be useful
[19:07:52] you can see what queries are running there
[19:07:53] k
[19:07:55] ssh -N bast1001.wikimedia.org -L 25000:analytics1041.eqiad.wmnet:25000
[19:08:05] http://localhost:25000/queries
[19:08:35] oof something is wrong with the hdfs mount :/
[19:09:11] cool ui
[19:09:13] thx
[19:09:17] arrf :(
[19:10:03] ottomata: hdfs reads the gz file without issue
[19:10:17] do you need that table, or are you just playin? :)
[19:10:48] playing, but it's the kind of data we probably want to display using impala :(
[19:10:50] impala works best with parquet anyway, right?
[19:10:55] correct
[19:11:09] oh? i would think we would just compute pageviews, save in parquet, then query that with impala
[19:11:11] no?
[19:11:24] We'll have pageviews, right
[19:11:32] oof, the hdfs mount is messed up
[19:11:41] that is why pagecounts_all_sites isn't up to date on dumps
[19:11:42] errggh
[19:12:32] Need help ?
[19:14:08] i think i got it, weird.
[19:14:12] documenting... :)
[19:14:17] k
[19:14:59] Let me know when you are done
[19:16:51] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration
[19:16:53] oops
[19:17:12] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_.2Fmnt.2Fhdfs
[19:17:49] k, makes sense :)
[19:17:55] Thx for that !
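A minimal Python sketch of the tunnel mentioned at 19:07, assuming the bastion host, impalad host, and port quoted in the log. The helper names `tunnel_command` and `queries_url` are hypothetical, and the script only builds the command line and URL; it does not open a connection.

```python
import shlex

BASTION = "bast1001.wikimedia.org"     # bastion host quoted in the log
IMPALAD = "analytics1041.eqiad.wmnet"  # impalad host quoted in the log
PORT = 25000                           # port used in the log for the impalad web UI


def tunnel_command(bastion=BASTION, target=IMPALAD, port=PORT):
    """Return the ssh argv that forwards localhost:port to target:port.

    -N asks ssh not to run a remote command; -L sets up the local forward.
    """
    return ["ssh", "-N", bastion, "-L", f"{port}:{target}:{port}"]


def queries_url(port=PORT):
    """Return the local URL of the running-queries page behind the tunnel."""
    return f"http://localhost:{port}/queries"


if __name__ == "__main__":
    # Print the command to run in one terminal and the URL to open afterwards.
    print(shlex.join(tunnel_command()))
    print(queries_url())
```

Keeping the host names as parameters makes it easy to point the same tunnel at a different impalad, which matters because each impalad only lists the queries it coordinates.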
[19:18:28] Wanted to finish about impala usage : I think it would be cool to be able to query most of the small data (aggregates) we have
[19:19:12] aye, sure
[19:19:17] that's all :)
[19:19:23] should we save it all in parquet then? :p
[19:19:42] Why not :)
[19:20:09] If people can access easily and are happy, I don't mind !
[19:21:29] so far i like parquet very much, do it! :)
[19:22:26] huhuhu
[19:23:07] It will still depend : if people want reports as file they can read, then mayyyyybe text.gz is inevitable ...
[19:23:23] isn't that an archive job or something separate? don't really remember
[19:23:25] But for everything to be queried : Parquet rules
[19:23:55] I created a table on top of gz files to append
[19:24:29] Anyway ...
[19:24:44] Thx for your help !
[19:24:50] Will go for dinner
[19:24:54] laaater
[19:25:24] yuuup! have a good weekend!
[20:17:46] kevinator: declinable? https://phabricator.wikimedia.org/T85027
[20:18:08] Analytics, operations: analytics1013 crashed, investigate... - https://phabricator.wikimedia.org/T97380#1253354 (Ottomata) Open>Resolved
[20:18:27] Analytics-Cluster, Patch-For-Review: Upgrade Analytics Cluster to CDH 5.4.0 - https://phabricator.wikimedia.org/T97453#1253357 (Ottomata) This has been tested in labs and in vagrant, and an upgrade of the production cluster is scheduled for Monday.
[20:23:12] Analytics-Engineering, MediaWiki-API, Wikipedia-Android-App, Wikipedia-iOS-App: Add page_id and namespace to X-Analytics header in App / api requests - https://phabricator.wikimedia.org/T92875#1253363 (Ottomata) Status update? Its hard for me to follow all of the patches.
[20:26:08] Analytics-Cluster, Analytics-Kanban: Expand people's ability to use Hive/Cluster {hawk} - https://phabricator.wikimedia.org/T94903#1253378 (Ottomata) I'm not sure what this ticket is for. Can we close it?
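The advice given at 18:55 (run compute stats before querying a table with Impala, and test on wmf.mediacounts) can be sketched in Python as below. This assumes impala-shell's `-i` (impalad host) and `-q` (run one query) options; `impala_shell_argv` and `stats_then_count` are hypothetical helper names, and the `COUNT(1)` statement stands in for whatever query was actually run.

```python
IMPALAD = "analytics1041.eqiad.wmnet"  # impalad host quoted in the log


def impala_shell_argv(statement, host=IMPALAD):
    """Build an impala-shell invocation that runs a single statement."""
    return ["impala-shell", "-i", host, "-q", statement]


def stats_then_count(table, host=IMPALAD):
    """Return two invocations in order: COMPUTE STATS, then a row count.

    Computing statistics first gives the Impala planner table and column
    stats to work with before the query runs.
    """
    return [
        impala_shell_argv(f"COMPUTE STATS {table}", host),
        impala_shell_argv(f"SELECT COUNT(1) FROM {table}", host),
    ]


if __name__ == "__main__":
    # Only print the commands; running them needs a host with impala-shell
    # installed, e.g. via subprocess.run(argv, check=True).
    for argv in stats_then_count("wmf.mediacounts"):
        print(argv)
```

Building the argument list rather than a single shell string avoids quoting problems when the SQL statement contains spaces or quotes.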
[20:26:52] Analytics-Cluster, Patch-For-Review: Make oozie work with spark jobs - https://phabricator.wikimedia.org/T94596#1253388 (Ottomata) Open>declined oozie spark actions are available in CDH 5.4, to which we are upgrading on Monday.
[20:26:53] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Mobile PMs has reports on session-related metrics from Wikipedia Apps {hawk} - https://phabricator.wikimedia.org/T86535#1253390 (Ottomata)
[20:27:32] Analytics-EventLogging: Make eventlogging-reporter work with generic processor uris, not just 0mq sockets. - https://phabricator.wikimedia.org/T93415#1253392 (Ottomata)
[20:27:45] Analytics-EventLogging: Make eventlogging-reporter work with generic processor uris, not just 0mq sockets. - https://phabricator.wikimedia.org/T93415#1136542 (Ottomata)
[20:27:53] Analytics-EventLogging: Make eventlogging-reporter work with generic processor uris, not just 0mq sockets. - https://phabricator.wikimedia.org/T93415#1136542 (Ottomata) Joseph, adding you here so we can remember to do this when we switch eventlogging to do more kafka stuff
[20:29:28] Analytics, operations, Patch-For-Review: Hadoop logs on logstash are being really spammy - https://phabricator.wikimedia.org/T87206#1253399 (Ottomata) Open>Resolved
[20:31:23] Analytics: Explore usage of Spark with Python and streaming - https://phabricator.wikimedia.org/T76351#1253401 (Ottomata) Open>Resolved
[20:55:07] Analytics-EventLogging, Analytics-Kanban: Researchers access EventLogging logs to troubleshoot new experiments - https://phabricator.wikimedia.org/T85027#1253453 (kevinator)
[20:55:19] ottomata: yes, I think so...
let me put a comment in the task
[20:58:26] (CR) Catrope: [C: -1] Hide partial week data from charts (1 comment) [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/207534 (https://phabricator.wikimedia.org/T95249) (owner: Mooeypoo)
[20:58:44] Analytics-EventLogging, Analytics-Kanban: Researchers access EventLogging logs to troubleshoot new experiments - https://phabricator.wikimedia.org/T85027#1253457 (kevinator) Yes, I believe we can close this task. We have other task to handle the needs here: {T78355}
[20:59:03] Analytics-EventLogging, Analytics-Kanban: Researchers access EventLogging logs to troubleshoot new experiments - https://phabricator.wikimedia.org/T85027#1253459 (Ottomata) Open>declined
[21:59:14] bye all, have a good weekend
[21:59:16] :)
[22:01:19] ciao