[05:42:29] Analytics, Analytics-Kanban, Pageviews-API: Special characters showing up as question marks in /pageviews/top endpoint - https://phabricator.wikimedia.org/T145043#2647407 (Tbayer) >>! In T145043#2646987, @MusikAnimal wrote: > Judging by the editing activity and mobile vs desktop views, it seems many...
[08:13:36] (PS1) Addshore: Remove redundant cast to snak [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/311381
[08:16:03] (CR) WMDE-leszek: [C: 1] Remove redundant cast to snak [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/311381 (owner: Addshore)
[08:16:31] Hi elukey
[08:21:12] o/
[08:21:18] I am fixing a vk bug
[08:21:31] good way to start the week :D
[08:22:09] Yeah
[08:22:41] elukey: about upload cache, how do you think I should handle it?
[08:23:09] two solutions: kill current coord, restart a new one with changed config for threshold
[08:23:27] or, keep current one, start a new one only for failed period
[08:24:04] I'd say to start a new one only for the failed occurrences
[08:24:11] (PS5) Addshore: WikidataArticlePlaceholderMetrics also send search referral data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955)
[08:24:19] yesterday it was pretty bad but today the issue is not there anymore
[08:24:20] (PS6) Addshore: WikidataArticlePlaceholderMetrics also send search referral data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955)
[08:24:34] elukey: problem solved with cron change, right?
[08:25:02] it is working fine now, but the underlying issue with the upload cache is still wip :(
[08:25:10] right
[08:25:18] Ok will correct only failed instances
[08:25:22] super
[08:26:10] vk atm does the following: when the shm log handle is abandoned (like when varnish restarts) it stops gracefully, sending the pending data to kafka and then calling exit(0)
[08:26:23] systemd is set to restart vk only on failure
[08:26:41] so exit(0) prevents the restart
[08:26:49] and vk doesn't run
[08:27:00] this seems to happen right after the cron restart
[08:27:09] now I am trying to think if exit(0) is right or not
[08:27:29] the permanent fix should be https://phabricator.wikimedia.org/T138747
[08:45:02] (CR) Tobias Gritschacher: [C: 1] Remove redundant cast to snak [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/311381 (owner: Addshore)
[08:52:43] (CR) Tobias Gritschacher: [C: 2] Remove redundant cast to snak [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/311381 (owner: Addshore)
[08:55:06] (CR) Tobias Gritschacher: [V: 2] Remove redundant cast to snak [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/311381 (owner: Addshore)
[08:56:48] Analytics-Tech-community-metrics, Developer-Relations, Documentation: Create basic Kibana (dashboard) documentation for admins - https://phabricator.wikimedia.org/T145929#2647663 (Qgil)
[10:18:19] (CR) Joal: "@nuria: See task listed in commit message" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/311127 (https://phabricator.wikimedia.org/T121550) (owner: Joal)
[10:56:23] joal: if you don't have anything against it, I'd try to pool aqs1004 in LVS after lunch
[10:56:30] and observe metrics
[10:56:36] elukey: please go ahead :)
[10:56:53] elukey: Let me know when you do, I'll look at metrics as well ;)
[11:00:19] super
[11:00:29] oozie complained a lot this morning :(
[11:00:34] anything that I can do to help?
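The vk problem described above comes down to systemd's restart policy: a graceful exit(0) counts as a clean exit, so a unit that is restarted only on failure (presumably configured with Restart=on-failure) simply stays down. The Python sketch below models that decision for plain exit codes only. `should_restart` is a hypothetical helper, not a systemd API, and the policy table is a simplification of the Restart= semantics documented for systemd services (signals, timeouts and watchdog events are left out).

```python
# Simplified model of systemd's Restart= policies for a service whose main
# process exits normally with an exit code. Signals, timeouts and watchdog
# events are deliberately left out of this sketch.

# Exit code 0 is "clean" by default; SuccessExitStatus= can add more codes.
CLEAN_EXIT_CODES = frozenset({0})


def should_restart(policy: str, exit_code: int) -> bool:
    """Return True if systemd would restart the unit after `exit_code`."""
    clean = exit_code in CLEAN_EXIT_CODES
    if policy == "no":
        return False
    if policy == "always":
        return True
    if policy == "on-success":
        return clean
    if policy == "on-failure":
        return not clean
    if policy in ("on-abnormal", "on-abort", "on-watchdog"):
        # These only react to signals, timeouts or watchdog expiry,
        # never to a plain exit code (clean or not).
        return False
    raise ValueError(f"unknown Restart= policy: {policy!r}")


# vk's graceful shutdown on an abandoned log handle, under two policies:
for policy in ("on-failure", "always"):
    outcome = "restart" if should_restart(policy, 0) else "stays down"
    print(policy, "exit(0) ->", outcome)
```

In this model either exiting with a non-zero status when the log handle is abandoned or switching the unit to Restart=always would bring vk back; which one is "right" is exactly the open question in the discussion.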
[11:00:37] a bit
[11:00:44] I am still working on making vk a bit more resilient
[11:00:47] k
[11:00:49] but not sure if it will fix the problem
[11:01:06] I can't say
[11:01:46] we might want to permanently tolerate more errors if this keeps going
[11:01:53] but afaics it is all related to yesterday
[11:02:14] k
[11:02:35] today's jobs look ok, so you seem to be on point :)
[11:13:25] %3 VSLQ_Dispatch: Varnish Log abandoned or overrun.
[11:13:25] %3 VSLQ_Dispatch: Attempt to reconnect to the Varnish log..
[11:13:26] %3 VSLQ_Dispatch: Log reaquired!
[13:40:59] joal: ready to add aqs1004 to LVS if you are
[13:41:00] :)
[13:49:29] Analytics-Kanban: Missing pageviews dumps files for October 1 - 26 - https://phabricator.wikimedia.org/T146029#2648221 (Milimetric)
[13:49:53] joal, that task I just filed ^ is about missing pageview data
[13:49:57] (on dumps)
[13:50:14] someone noticed October 1 2015 -> October 26 2015 is missing
[13:50:15] k milimetric
[13:50:48] just checking with you in case you remember something weird from that period, around when we launched the API
[13:51:00] milimetric: can't recall really
[13:51:07] I'm guessing we just started filling in October, and backfilled the other months but didn't backfill early October
[13:51:33] milimetric: I think we launched dumps some time this year
[13:51:46] and must have forgotten some backfilling
[13:52:26] no, we launched it back then but didn't update the html pages. K, I'll run the correct oozie jobs as hdfs
[13:52:49] milimetric: hm
[13:53:14] milimetric: oozie will not be as easy I think, since dumping is part of pageview extraction process
[13:54:02] oh? I thought these were the jobs I wrote last year to just archive the pageview_hourly output
[13:54:09] I'll check
[13:57:56] Analytics-Wikistats: Ireland in Tagalog, Bengali and Urdu Wikipedia traffic breakdown - https://phabricator.wikimedia.org/T143254#2648275 (Bodhisattwa)
[14:00:24] I see, it's like half of the hourly workflow, namely this script: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/transform_pageview_to_legacy_format.hql
[14:01:01] so I'll manually run that, then put the files where they need to be for the rsync I guess
[14:05:05] milimetric: great, let me know if you need help
[14:18:28] Analytics-Cluster, Operations, Patch-For-Review: decom titanium - https://phabricator.wikimedia.org/T145666#2648309 (Ottomata) :O :) Thank you for everybody’s help on this!
[14:20:04] a-team: adding aqs1004 to live traffic pool :)
[14:20:28] woo hoo \o/
[14:20:32] yall rock :)
[14:20:50] :o
[14:22:30] all right it is taking traffic, all 200s atm :)
[14:23:05] argh I can see a lot of 400s
[14:23:36] uh oh
[14:24:36] ah no ok it seems good
[14:24:41] one of the 400s is https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/da.wikipedia/all-access/user/Invasionen_af_Italien/daily/20160001/20160031
[14:25:10] and https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/da.wikipedia/all-access/user/Invasionen_af_Italien/daily/20160001/20160031 returns invalid for me
[14:25:41] same thing for https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/it.wikipedia/all-access/user/Pittura_greca/daily/20160001/20160031
[14:25:51] ok
[14:26:26] I removed it from the pool, so I am going to do a quick check then re-add
[14:29:35] well it seems to be working fine, aqs100[123] have 10% of weight and aqs1004 has 5%
[14:33:46] (CR) Mforns: [C: 2 V: 2] "This is ready to be deployed, because the dependency is already merged. Nuria gave a thumbs up, and asked me to merge this. So will merge." [analytics/reportupdater] - https://gerrit.wikimedia.org/r/308977 (https://phabricator.wikimedia.org/T117538) (owner: Mforns)
[14:35:21] ok cool
[14:45:03] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2141764 (GWicke) Another option could be to integrate this into the REST API, without going through RB itself. There is already a /feed/ hierarchy that could be a decent fit. Alternatively,...
[14:45:26] so the weird thing that I noticed from server-board is the memory consumption
[14:46:32] used memory is a bit high
[14:46:43] 54 of 64 GB used
[14:46:48] I didn't expect that
[14:48:52] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2648403 (mobrovac) @GWicke what about users that want to receive events from multiple domains? Connection sharing isn't a real advantage here since we are talking about web sockets. On the o...
[14:48:52] but we have two cassandra instances :)
[15:00:40] a-team: standddupppp
[15:00:53] EEK!
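The 400s seen right after aqs1004 was pooled came from requests like the per-article ones quoted above, whose start and end dates (20160001 and 20160031) use month 00. Those are not real calendar dates, so a 400 is the expected answer, which is consistent with the conclusion that the new node was behaving correctly. The Python sketch below shows that check on the client side. `is_valid_day` and `per_article_url` are hypothetical helpers; the only API detail assumed is the URL layout visible in the quoted requests (project, access, agent, article, granularity, YYYYMMDD start and end).

```python
from datetime import datetime
from urllib.parse import quote

# URL prefix as it appears in the requests quoted in the log.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def is_valid_day(ts: str) -> bool:
    """True if `ts` is an 8-digit YYYYMMDD string naming a real calendar day."""
    if len(ts) != 8 or not ts.isdigit():
        return False
    try:
        datetime.strptime(ts, "%Y%m%d")  # rejects month 00, day 00, Feb 30, ...
    except ValueError:
        return False
    return True


def per_article_url(project: str, access: str, agent: str, article: str,
                    start: str, end: str, granularity: str = "daily") -> str:
    """Build a per-article URL, refusing dates that are not real days."""
    for ts in (start, end):
        if not is_valid_day(ts):
            raise ValueError(f"invalid date {ts!r}, expected YYYYMMDD")
    # Percent-encode the title so '/' or non-ASCII characters stay in one path segment.
    return "/".join([BASE, project, access, agent, quote(article, safe=""),
                     granularity, start, end])
```

With these helpers, `per_article_url("da.wikipedia", "all-access", "user", "Invasionen_af_Italien", "20160101", "20160131")` builds a well-formed request, while the 20160001/20160031 variant raises `ValueError` before anything is sent.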
[15:00:54] coming
[15:08:29] Analytics-Kanban, Patch-For-Review: Switch AQS to new cluster - https://phabricator.wikimedia.org/T144497#2648472 (Nuria)
[15:08:31] Analytics-Kanban, Patch-For-Review: Coalesce nulls to 0s in output - https://phabricator.wikimedia.org/T144521#2648471 (Nuria) Open>Resolved
[15:09:04] Analytics-Kanban, Patch-For-Review: Continue New AQS Loading - https://phabricator.wikimedia.org/T140866#2479035 (Nuria) Open>Resolved
[15:09:07] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2648475 (Nuria)
[15:09:15] Analytics-Kanban: Better identify varnish/vcl timeouts and document - https://phabricator.wikimedia.org/T138511#2648478 (Nuria) Open>Resolved
[15:09:27] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2648479 (Nuria) Open>Resolved
[15:10:50] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2648481 (Nuria)
[15:10:53] Analytics-Kanban, Continuous-Integration-Infrastructure, Differential, EventBus, Wikimedia-Stream: Run Kasocki tests in Jenkins via Differential commits - https://phabricator.wikimedia.org/T145140#2648480 (Nuria) Open>Resolved
[15:11:04] Analytics-Kanban, Patch-For-Review: Continue New AQS Loading - https://phabricator.wikimedia.org/T140866#2648484 (Nuria)
[15:11:06] Analytics-Kanban: Load top article data into new AQS cluster - https://phabricator.wikimedia.org/T145089#2648483 (Nuria) Open>Resolved
[15:11:23] Analytics-Kanban, Patch-For-Review: Switch AQS to new cluster - https://phabricator.wikimedia.org/T144497#2648486 (Nuria)
[15:11:25] Analytics-Kanban: Setup regular loading jobs new aqs cluster (per-article, top and unique devices) - https://phabricator.wikimedia.org/T145087#2648485 (Nuria) Open>Resolved
[15:11:38] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: https://yarn.wikimedia.org/cluster/scheduler should be behind ldap - https://phabricator.wikimedia.org/T116192#2648487 (Nuria) Open>Resolved
[15:18:22] hey joal. can I take 15 min of your time today? maybe some time later in the evening your time?
[15:18:39] leila: currently in meeting, will ping you after :)
[15:18:58] sounds good. I'll even send a calendar invite. feel free to move it around.
[15:21:57] Analytics-Kanban: Pageview hourly stores records that are not really pageviews and those end up on top endpoint? - https://phabricator.wikimedia.org/T145922#2645266 (JAllemandou) @nuria: Page title is not only extracted from uti_path, but sometimes from uri_query. For pageviews you can access extracted title...
[15:36:36] Analytics: Responses on pageview API should be lighter - https://phabricator.wikimedia.org/T145935#2645635 (Milimetric) check out GraphQL when thinking about this
[15:36:45] Analytics: Responses on pageview API should be lighter - https://phabricator.wikimedia.org/T145935#2648566 (Milimetric) p:Triage>Normal
[15:38:07] Analytics, Analytics-Dashiki: Enable HTTP2 for dashiki - https://phabricator.wikimedia.org/T145801#2648576 (Milimetric) Open>Invalid
[15:38:51] Analytics, Analytics-Dashiki: passport-mediawiki-oauth doesn't support callback parameter - https://phabricator.wikimedia.org/T145828#2642376 (Milimetric) used on https://github.com/jdlrobson/weekipedia
[15:40:11] Analytics: passport-mediawiki-oauth doesn't support callback parameter - https://phabricator.wikimedia.org/T145828#2648589 (Milimetric)
[15:40:25] Analytics: passport-mediawiki-oauth doesn't support callback parameter - https://phabricator.wikimedia.org/T145828#2642376 (Milimetric) p:Triage>Low
[15:44:28] Analytics-Kanban: Pageview hourly stores records that are not really pageviews and those end up on top endpoint? - https://phabricator.wikimedia.org/T145922#2648619 (Milimetric)
[15:44:30] Analytics, Pageviews-API, Wikimedia-General-or-Unknown: 404.php was most read article - https://phabricator.wikimedia.org/T145791#2648621 (Milimetric)
[15:47:11] Analytics: Project: migrate mysql eventlogging access to hadoop - https://phabricator.wikimedia.org/T145527#2648636 (Milimetric) p:Triage>Normal
[15:47:36] Analytics, Operations: Remove cronspam from stat1002 to root@ - https://phabricator.wikimedia.org/T145606#2635815 (Milimetric) p:Normal>Low
[15:47:51] Analytics-Kanban: Special characters showing up as question marks in /pageviews/top endpoint - https://phabricator.wikimedia.org/T145043#2648645 (Milimetric)
[15:48:35] Analytics: Get rid of Pageview-API -> Analytics auto-tagging - https://phabricator.wikimedia.org/T146042#2648650 (Milimetric)
[15:48:54] Analytics: Get rid of Pageview-API -> Analytics auto-tagging - https://phabricator.wikimedia.org/T146042#2648650 (Milimetric) p:Triage>High
[15:49:30] Analytics, Discovery-Analysis, Easy: [REQUEST] Extract search queries from HTTP_REFERER field for a Wikibook - https://phabricator.wikimedia.org/T144714#2648672 (Milimetric) p:Triage>Normal
[15:51:51] Analytics, EventBus, Services: Check eventbus Kafka cluster settings for reliability - https://phabricator.wikimedia.org/T144637#2648679 (Milimetric) p:Triage>Normal
[15:53:31] Analytics: Much more pageviews in Tagalog Wikipedia since mid-June 2016 - https://phabricator.wikimedia.org/T144635#2648694 (Milimetric) p:Triage>Normal
[15:53:55] Analytics, Analytics-Dashiki: Dashboards working on mobile - https://phabricator.wikimedia.org/T144299#2648704 (Milimetric) p:Triage>Normal
[15:56:15] Analytics: Add Analytics-Wikistats 2.0 phab project tag - https://phabricator.wikimedia.org/T146043#2648726 (Milimetric)
[15:57:39] Analytics, Labs, Labs-Infrastructure: Report page views for labs instances - https://phabricator.wikimedia.org/T103726#2648753 (Milimetric) This should be done with piwik on labs, now that we have more experience with it.
[15:59:26] Analytics, Analytics-EventLogging: Analytics Eng monitors for death of EL processes in syslog {oryx} - https://phabricator.wikimedia.org/T97296#2648763 (Milimetric) Open>Invalid we have good monitoring now, no longer relevant
[15:59:41] Analytics, Analytics-EventLogging: Analytics Eng monitors consumed EL events in Graphite (valid & write rate) {oryx} - https://phabricator.wikimedia.org/T97295#2648779 (Milimetric) Open>Invalid
[16:00:12] Analytics, Analytics-Cluster, Easy: Add better detection of wikipediaApp to user agent UDF {hawk} - https://phabricator.wikimedia.org/T96376#2648782 (Milimetric) Open>Resolved a:Milimetric This has been done for a while
[16:01:19] Analytics, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1195035 (Milimetric) I'm going to untag Analytics, quarry is a different approach, we're about to allow multi-database data access in a different way.
[16:01:27] Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#2648793 (Milimetric)
[16:03:42] Analytics, Analytics-Cluster: Report pageviews to the annual report - https://phabricator.wikimedia.org/T95573#1194832 (Milimetric) We're ready to do this with piwik whenever you're ready, @Heather. We can enable it the same way we did with Wikipedia 15, sorry I forgot about this more general use case.
[16:03:58] Analytics, Analytics-Cluster: Report pageviews to the annual report - https://phabricator.wikimedia.org/T95573#1194832 (Milimetric) p:Triage>Normal
[16:05:24] Analytics, Analytics-Cluster, Story: {story} Community downloads new pageview dumps - https://phabricator.wikimedia.org/T95257#2648812 (Milimetric) Open>Resolved a:Milimetric been done for a while
[16:17:27] leila: Hi !
[16:17:39] hi joal.
[16:17:44] is now a good time for you?
[16:17:59] can we do it in 40, or later?
[16:18:11] in 40 mins soun
[16:18:12] (in the middle of something, joal)
[16:18:17] np leila
[16:18:21] in 40 minutes
[16:19:28] just sent you an invitation, joal. for a bit later than that, please move it around if it doesn't work.
[16:19:30] thank you!
[16:20:02] hey! quick pointer, anyone, on how to do hive ql from ipython notebooks? thx in advance!
[16:21:47] milimetric: ottomata: nuria_: ^ ?
[16:23:07] AndyRussG: i do not think we have access to hive from ipython in prod, maybe we have a notebook running on 1002 but as an experiment, i think
[16:23:21] i've not heard of hive working in notebooks, but
[16:23:31] ellery has def done it with spark, which supports sql and hive tables
[16:23:56] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark#Spark_and_Ipython
[16:24:16] then you'd have a pyspark shell in a notebook
[16:24:20] and can do stuff like
[16:24:23] Analytics, GLAM-Tech, Pageviews-API: WMF pageview API (404 error) when requesting statitsics over around 1000 files on GLAMorgan - https://phabricator.wikimedia.org/T145197#2648942 (Nuria) @Mrjohncummings: we are talking pass each other,. Let me re-explain: the tool is running into several issues: 1...
[16:24:58] ottomata: nuria_: yeah! those are exactly the instructions I'm following...
[16:25:12] http://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables
[16:25:36] (switch example tab to Python)
[16:25:51] i *think* it should work
[16:25:54] no guarantees though :)
[16:26:01] AndyRussG: you will be writing spark code not hive code though
[16:26:16] AndyRussG: but again, it is a one-off so you will need to experiemnt
[16:26:20] *experiment
[16:26:33] ah hmmm
[16:26:41] yeah spark python
[16:26:43] but it integrates with hive
[16:26:50] so you can issue hive queries to hive tables
[16:26:54] and get them back as spark dataframes
[16:27:16] but def indeed a one off
[16:27:22] ellery does it all the time, so it should be possible
[16:27:28] yea exactly
[16:27:29] you can also skip the hive integration altogether
[16:27:42] but you'd need to then load the data from HDFS into spark RDDs yourself
[16:27:44] which isn't hard
[16:27:52] Here is one I did a while ago
[16:27:52] example with eventlogging data here:
[16:27:54] https://gist.github.com/AndrewGreen/7543e550d5d14ccc6e2399950e76f404
[16:28:10] https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Spark
[16:28:14] Looking now to pull data from pageview.hourly and webrequest
[16:28:28] you can do it with plain spark RDDs, or manually map the data into sql tables and load them
[16:28:54] right.... K
[16:29:33] yeah those are also instructions I've followed! So, this time yeah I wanted to try some Hive QL queries, and get the results in the ipython notebook
[16:29:37] oh btw note the bottom here:
[16:29:40] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark
[16:29:44] • If you want to use HiveContext in spark, you need to add the hive lib jars and hive-site.xml to spark (not done by default in our version):
[16:29:57] oh nice
[16:30:11] nice work with the rdd loading AndyRussG :)
[16:30:19] yeah, you can keep doing the same for pageview and webrequest
[16:30:23] the data is just in different locations
[16:30:27] you can see where in hive
[16:30:33] by doing show partitions webrequest
[16:30:36] (you'll get a lot!)
[16:30:37] or um
[16:30:49] (CR) Nuria: WikidataArticlePlaceholderMetrics also send search referral data (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955) (owner: Addshore)
[16:31:15] you might be able to just browse hdfs to find it
[16:31:21] it's all in /wmf/data/wmf
[16:31:24] somewhere
[16:31:42] but ja, if you want to use the hive tables (which already map to the data in hdfs), you should be able to
[16:32:03] launch pyspark in ipython like you have, but also with the hive options at the bottom of that Spark wikitech page
[16:32:13] then you should be able to do this stuff
[16:32:13] http://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables
[16:32:23] shouldn't need to do the create table stuff, the tables should be available
[16:32:27] so hopefully
[16:32:44] results = sqlContext.sql("SELECT ... FROM wmf.webrequest ...").collect()
[16:32:44] or whatever
[16:32:58] Analytics: Traffic Breakdown Report - Visiting Country per Wiki Trend {lama} - https://phabricator.wikimedia.org/T115609#2648983 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[16:34:21] ottomata: cool beans, thx!!!
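The pattern suggested above is: launch pyspark with the Hive options, build a HiveContext, then call `.sql(...).collect()` to get rows back in the notebook. Below is a minimal Python sketch of that pattern, written so the query-building part runs without a cluster. `partition_filter` and `run_hive_query` are hypothetical helpers, the example hour is arbitrary, and the assumption that wmf.webrequest is partitioned by webrequest_source/year/month/day/hour should be checked with `show partitions webrequest`.

```python
# Sketch: run a HiveQL query from a pyspark notebook and collect the rows.
# Assumes a Spark 1.5-era notebook where `sc` already exists and the Hive
# jars plus hive-site.xml were added as the wikitech Spark page describes.
# There the context would be created with:
#
#     from pyspark.sql import HiveContext
#     sqlContext = HiveContext(sc)
#
# On Spark 2.x, SparkSession.builder.enableHiveSupport().getOrCreate()
# plays the same role and also exposes .sql().


def partition_filter(source: str, year: int, month: int, day: int, hour: int) -> str:
    """WHERE-clause fragment selecting one hourly partition of wmf.webrequest.

    Assumes the table is partitioned by webrequest_source/year/month/day/hour;
    `show partitions webrequest` in Hive lists the actual partition keys.
    """
    return (f"webrequest_source = '{source}' AND year = {year} "
            f"AND month = {month} AND day = {day} AND hour = {hour}")


def run_hive_query(sql_context, query: str) -> list:
    """Run `query` through a HiveContext-like object and return all rows.

    .sql() returns a DataFrame; .collect() pulls every row to the driver,
    so filter on partitions and keep the result small.
    """
    return sql_context.sql(query).collect()


# Example query: top hosts for one (arbitrarily chosen) hour of text traffic.
query = (
    "SELECT uri_host, COUNT(*) AS requests FROM wmf.webrequest WHERE "
    + partition_filter("text", 2016, 9, 19, 14)
    + " GROUP BY uri_host ORDER BY requests DESC LIMIT 10"
)

# In the notebook: rows = run_hive_query(sqlContext, query)
```

Filtering on every partition column keeps Hive from scanning the whole table, which matters given how many partitions `show partitions webrequest` returns.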
[16:34:27] I had been looking at this: http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
[16:34:39] Maybe that's a more recent version than we have?
[16:37:05] dunno if the sc (SparkContext) available in the notebook is the same as the spark (spark session?) variable shown there
[16:37:45] Ah I see, the SparkSession is the new SparkContext... http://blog.madhukaraphatak.com/introduction-to-spark-two-part-1/
[16:38:27] yeah, AndyRussG we have 1.5.1 i think
[16:38:30] maybe 1.5.0
[16:38:58] K gotcha :)
[16:39:08] * AndyRussG stops looking for how to check version :)
[16:39:28] K I'll give it a try as u suggest ^ Thx again!!!!!
[17:15:54] a-team I am going offline, aqs1004 looks good
[17:16:07] Bye elukey !
[17:16:14] o/
[17:18:17] bye!
[17:20:40] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2649210 (Ottomata) @GWicke and I just had an interesting discussion about websockets/socket.io vs. other HTTP based streams, like SSE/EventSource. Now I'm not sure what is better. Reading...
[17:23:51] Analytics: Traffic Breakdown Report - Visiting Country per Wikipedia Language {lama} - https://phabricator.wikimedia.org/T115608#2649245 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:23:55] Analytics: Traffic Breakdown Report - Browsers from Visiting Country {lama} - https://phabricator.wikimedia.org/T115610#2649250 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:24:17] Analytics: Traffic Breakdown Report - Device by Site from Visiting Country {lama} - https://phabricator.wikimedia.org/T115612#2649253 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:24:37] Analytics: Traffic Breakdown Report - Browser Trend {lama} - https://phabricator.wikimedia.org/T115602#2649257 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:24:43] Analytics: Traffic Breakdown Report - Google Requests {lama} - https://phabricator.wikimedia.org/T115592#2649260 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:24:49] Analytics: Traffic Breakdown Report - Mime Type {lama} - https://phabricator.wikimedia.org/T115594#2649264 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:24:56] Analytics: Traffic Breakdown Report - Target Wiki {lama} - https://phabricator.wikimedia.org/T115595#2649268 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:25:01] Analytics: Traffic Breakdown Report - Crawlers {lama} - https://phabricator.wikimedia.org/T115596#2649271 (Milimetric) Open>Resolved a:Milimetric this was done as part of last year's traffic report migration
[17:31:51] Analytics-Kanban: Missing pageviews dumps files for October 1 - 26 - https://phabricator.wikimedia.org/T146029#2649294 (Milimetric) Ok, this is running, it might take a while though. Maybe a day or two? I'll close this when it's done. If you have access to Hue, you can follow progress here: (this is writ...
[17:46:30] Analytics, Research-and-Data, Research-collaborations, Research-management: Oozie job to extract data for WDQS research - https://phabricator.wikimedia.org/T146064#2649398 (leila)
[17:47:17] hi nuria_. I had a chat with joal about https://phabricator.wikimedia.org/T146064 . This is time-sensitive and it's relatively easy on his end to help us with.
[17:47:47] he suggested that I ping you here to let you know. Please let me know if I can help with something or more information is needed. thanks, nuria_ and team! :)
[17:57:03] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec-2016), Documentation: Create basic Kibana (dashboard) documentation for admins - https://phabricator.wikimedia.org/T145929#2649482 (Aklapper)
[18:09:17] Analytics, Research-and-Data-Backlog, Research-collaborations, Research-management: Oozie job to extract data for WDQS research - https://phabricator.wikimedia.org/T146064#2649543 (ggellerman)
[18:16:32] Analytics, MediaWiki-API, Reading-Infrastructure-Team: Add pageview stats to the action API - https://phabricator.wikimedia.org/T144865#2612941 (Nuria) @tgr: summing up our conversation. Adding batching to the API at the AQS level can be done but there is homework to do before doing so. Besides the...
[18:20:11] Analytics, Data-release: Wikipedia Clickstream dataset. Programmatic Access - https://phabricator.wikimedia.org/T134231#2649660 (ggellerman)
[18:20:39] Analytics, Data-release: Wikipedia Clickstream dataset. Programmatic Access - https://phabricator.wikimedia.org/T134231#2258736 (ggellerman) removing Research and Data backlog tag. @ellery and @DarTar are already subscribed
[18:21:35] Analytics: Percentage of users with DNT on - https://phabricator.wikimedia.org/T127571#2649672 (leila)
[18:37:22] yarhg left my power cord at a cafe...running back to get it :/
[18:43:30] lzia: reading.
[18:44:48] lzia: ah, real short, so what is the ask? Do you want us to work with nathaniel to set up that job?
[18:46:07] nuria_: joal offered to set up the job.
[18:47:38] lzia: I'd prefer that we work with nathaniel to teach him how to do it, so as to spread the knowledge and make it a bit more self-service; we have done that recently with some wikidata and discovery folks
[18:48:40] nuria_: I'm stepping into a meeting shortly, I'll message you in 45 min or so.
[18:48:43] lzia: in fact it seems that some of this data might already be available ...but not all
[18:49:25] lzia: ok, will look into the work we did with wikidata team. I *think* some of this data is already available. might be wrong though, lemme verify
[18:50:15] yergh raining, will go later
[19:25:29] rain's been so nice :)
[20:01:16] lzia: haven't looked at our current data for wikidata yet, doing that now.
[20:05:41] milimetric: it would be really great if the humidity goes down a bit. :D
[20:05:49] my hair can't take it anymore, milimetric. :P
[20:06:51] nuria_: Nathaniel pairing with joal sounds good to me. I'll ask him to see if he's available.
[20:07:34] lzia: ok, if he submits an initial CR we can work together to get the code into shape, let me give you urls
[20:09:32] milimetric: I want to see you and ottomata one of these days. Maybe we can work together or something? I'll be around until October 6. Ideally we pull ori into this, too. :P
[20:09:56] lzia: wait ... do we know sparq queries come in the url?
[20:10:08] lzia: wait ... do we know sparql queries come in the url?
[20:10:25] what do you mean, nuria_?
[20:11:27] lzia: "The first step will be to extract from the raw server logs individual query records that contain valid SPARQL queries".. wait let me do some tests
[20:11:59] nuria_: you don't need to spend time on the research page, unless you want to.
[20:12:04] lzia: is this project about this: https://query.wikidata.org/?
[20:12:07] we know that we need to extract based on two filters:
[20:12:10] uri_host='query.wikidata.org' AND webrequest_source='misc'
[20:12:17] yes, nuria_.
[20:14:21] lzia: ah, ok, and you guys took a look to see that we have all the data needed?
[20:14:30] yes, nuria_.
[20:15:14] lzia: ok, i see, everything is url encoded
[20:15:16] this should be a straightforward extraction, nuria_. it will make life easier if they work with a subset of the data, for everyone.
[20:18:12] lzia: aham
[21:02:44] lzia: you're here?!
[21:03:28] talk about humidity, I went away (thinking for like 5 minutes) to knead my bread and put it to rise again. It ended up being a one-hour high-tech fight due to how moist it is
[21:03:58] lzia: I thought you were staying with us when you came? We painted your room and everything :)
[21:28:49] Analytics-Kanban: Make top pages for WP:MED articles - https://phabricator.wikimedia.org/T139324#2650468 (Milimetric) Thanks Doc, yeah I'm looking to get some preliminary numbers by the end of the quarter and then to think about productionizing access to this kind of data in Q3 (starting January next year)....
[22:44:11] milimetric: it's complicated. I would have loved to
[22:44:15] but I still want to see you.
[22:44:27] and I promise to come back to use my room milimetric. ;)
[22:47:04] yeah, mini hackathon!
[22:55:21] I'm down for a mini hackathon, yall welcome to my place anytime (just give me like 2 hours notice so I can make it respectable :)
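The WDQS extraction discussed above starts from the two filters given (uri_host='query.wikidata.org' AND webrequest_source='misc'), and the observation that everything is URL-encoded means the SPARQL text has to be decoded before it is usable. Below is a minimal Python sketch of that decoding step, assuming the query arrives as a `query=` parameter in `uri_query`. `wdqs_query` and `extract_sparql` are hypothetical helpers, not the job tracked in T146064. The sketch only decodes; it does not check that the result is valid SPARQL, and requests without a `query` parameter are skipped.

```python
from urllib.parse import parse_qs

# The two filters given for the WDQS extraction.
WDQS_FILTER = "uri_host = 'query.wikidata.org' AND webrequest_source = 'misc'"


def wdqs_query(year: int, month: int, day: int) -> str:
    """HiveQL selecting one day of WDQS request URLs (hypothetical helper)."""
    return ("SELECT uri_path, uri_query FROM wmf.webrequest "
            f"WHERE {WDQS_FILTER} AND year = {year} AND month = {month} AND day = {day}")


def extract_sparql(uri_query: str):
    """Return the decoded SPARQL text carried in `uri_query`, or None.

    Assumes the query travels as a `query=` URL parameter, as in
    GET https://query.wikidata.org/sparql?query=..., and that uri_query
    keeps its leading '?'. parse_qs undoes the URL encoding
    (%XX escapes and '+' for spaces).
    """
    values = parse_qs(uri_query.lstrip("?")).get("query")
    return values[0] if values else None
```

For example, `extract_sparql("?query=SELECT%20%3Fitem%20WHERE%20%7B%7D")` gives back `SELECT ?item WHERE {}`, while a `uri_query` such as `?format=json` yields `None`.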