[00:28:53] Analytics, Pageviews-API, Easy, Google-Code-In-2016: Add monthly request stats per article title to pageview api - https://phabricator.wikimedia.org/T139934#2826121 (Aklapper)
[00:29:03] Analytics, Pageviews-API, Easy, Google-Code-In-2016: Add monthly request stats per article title to pageview api - https://phabricator.wikimedia.org/T139934#2447192 (Aklapper) Imported as https://codein.withgoogle.com/tasks/5734408199864320/
[08:48:51] hello people :)
[08:50:47] hello elukey :-)
[08:54:08] o/
[09:28:52] Heya elukey :)
[09:29:15] o/
[09:29:23] wassup elukey ?
[09:29:58] all good, slow monday.. and you?
[09:30:40] all good as well, heavy weekend since teaching a lot (again 3 hours this afternoon)
[09:30:58] elukey: history reconstruction is still causing me pain :(
[09:36:46] :(
[11:17:40] Analytics-Tech-community-metrics: Panel Gerrit-Delays gets Gateway timeout - https://phabricator.wikimedia.org/T151751#2826696 (Lcanasdiaz)
[11:18:10] Analytics-Tech-community-metrics: Panel Gerrit-Delays gets Gateway timeout - https://phabricator.wikimedia.org/T151751#2826696 (Lcanasdiaz) a:Luiscanasdiaz>Lcanasdiaz
[11:36:38] Gone teaching a-team, should be there at standup
[11:42:17] * elukey sings "Get up, stand up, stand up for your rights.."
[11:42:52] * elukey lunch!
[15:34:13] joal, yt?
[15:34:43] :]
[15:35:14] or ottomata? :P
[15:47:40] (PS2) Nuria: Adding self-identified bot to bot regex [analytics/refinery/source] - https://gerrit.wikimedia.org/r/323249 (https://phabricator.wikimedia.org/T150990)
[15:48:36] (PS3) Nuria: Adding self-identified bot to bot regex [analytics/refinery/source] - https://gerrit.wikimedia.org/r/323249 (https://phabricator.wikimedia.org/T150990)
[15:48:52] Yiii
[15:48:54] hi mforns
[15:49:02] hey ottomata!
[15:49:10] I figured it out :]
[15:49:32] chmod in hdfs wasn't doing what I wanted, but now it's ok
[15:50:05] aye ok
[15:50:06] (CR) Nuria: Adding self-identified bot to bot regex (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/323249 (https://phabricator.wikimedia.org/T150990) (owner: Nuria)
[16:00:52] joal: standdupp
[16:03:43] Analytics-Kanban, Interactive-Sprint, Patch-For-Review: Enhance Report Updater to be able to send data to graphite - https://phabricator.wikimedia.org/T150187#2827371 (Nuria)
[16:41:55] Analytics, Pageviews-API: No results for Special:BlankPage or Special:BlankPage/RTRC - https://phabricator.wikimedia.org/T151363#2815250 (Milimetric) Currently BlankPage is one of the special pages we don't count as pageviews explicitly: https://github.com/wikimedia/analytics-refinery-source/blob/master...
[16:46:09] Analytics-Cluster, Continuous-Integration-Config: Add CI job Oozie XML stylesheet validation for the analytics/refinery repository - https://phabricator.wikimedia.org/T147072#2827492 (MarcoAurelio) declined>Open p:Normal>Lowest
[16:47:46] Analytics, Analytics-Dashiki: refactor code using available.projects.json to use sitematrix - https://phabricator.wikimedia.org/T136120#2827503 (Milimetric)
[16:48:16] Analytics-Cluster, Continuous-Integration-Config: Add CI job Oozie XML stylesheet validation for the analytics/refinery repository - https://phabricator.wikimedia.org/T147072#2827504 (MarcoAurelio) I've reopened because I plan to investigate how to add this job in light of recent commits I've submitted to...
[16:50:41] Analytics-Dashiki, Analytics-Kanban: Improve initial load performance for dashiki dashboards - https://phabricator.wikimedia.org/T142395#2827528 (Milimetric)
[16:50:49] Analytics-Dashiki, Analytics-Kanban: Improve initial load performance for dashiki dashboards - https://phabricator.wikimedia.org/T142395#2533536 (Milimetric) a:Nuria
[16:52:28] Analytics, Monitoring, Operations: Switch jmxtrans from statsd to graphite line protocol - https://phabricator.wikimedia.org/T73322#2827542 (Milimetric) Open>declined Shifting focus to using Prometheus instead.
[16:56:09] Analytics, Easy: Pre-generate mysql ORM code for sqoop - https://phabricator.wikimedia.org/T143119#2827552 (Milimetric)
[16:57:37] Analytics, Analytics-Wikimetrics: delete useless wikimetrics.report or cohort records - https://phabricator.wikimedia.org/T120713#1859758 (Milimetric)
[16:59:11] Analytics-Kanban: Check if we can deprecate legacy TSVs production (same time as pagecounts?) - https://phabricator.wikimedia.org/T130729#2827565 (Milimetric)
[17:06:51] milimetric, do you need a review of the RU changes?
[17:09:34] a-team: I forgot to ask - a sales development representative of Cloudera has reached out to discuss our use case and how we use Cloudera software. He'd like to talk with me this week (probably to see if they can help in any way), should it be something to follow up on or should I decline?
[17:09:50] (he got my contact from ApacheCon)
[17:11:17] ottomata: when you have a min can you take a look at Pchelolo's comment here, I think it makes sense to keep dots if needed: https://gerrit.wikimedia.org/r/#/c/319671/16/lib/rdkafka-statsd.js@92
[17:11:59] (CR) Nuria: [C: 2 V: 2] Configuration for fi.wikivoyage.org [analytics/refinery] - https://gerrit.wikimedia.org/r/323699 (https://phabricator.wikimedia.org/T151570) (owner: MarcoAurelio)
[17:19:47] mforns: thanks, not yet, 'cause I don't think I have it working yet
[17:19:53] ok
[17:19:53] I'll ping you when I remove the [WIP]
[17:19:58] ok ok :]
[17:22:44] * elukey feels ignored :)
[17:22:51] Analytics, Pageviews-API: No results for Special:BlankPage or Special:BlankPage/RTRC - https://phabricator.wikimedia.org/T151363#2815250 (Nuria) Please see: https://meta.wikimedia.org/wiki/Research:Page_view#Are_Special:_pages_considered_pageviews.3F Best way (to date) to get approximate pageviews for S...
[17:32:06] hey nuria, do we do the wikistats2.0 thing?
[17:32:08] :]
[17:32:22] ah sorry, was talking to recruiting, 1 sec
[17:38:57] Analytics-Kanban, Operations, hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2827706 (RobH) a:mark>RobH
[17:46:05] Analytics-EventLogging, ArchCom-RfC, Discovery, Graphs, and 10 others: RFC: Use YAML instead of JSON for structured on-wiki content - https://phabricator.wikimedia.org/T147158#2827766 (MarkTraceur) I stumbled over this while looking at the Multimedia board, and apart from being baffled why our te...
[17:49:55] (PS1) Hjiang: added several new modified sql queries, completed wiki_dbs, and made the config_all [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/323861
[18:13:29] milimetric, mforns: Do you want to hear the pain of a wikihistory denormalizer?
[18:13:46] yes but in meeting now for the next 17 min.
[18:13:52] k :)
[18:14:26] joal, hi! yes, after Dan's meeting we have another one, but it's optional, don't know what milimetric prefers to do
[18:31:50] aiy :)
[18:31:53] brt mforns
[18:34:49] Analytics-Kanban, Operations, hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2827948 (Ottomata) BTW we should name the new box something other than stat1001. We can use an element name if one is available, and that makes sense. This box hosts severa...
[18:36:25] mforns: joal: I'm in the cave
[18:36:30] milimetric: joining !
[18:52:54] mforns milimetric, this look ok to you? +1 please and I will merge
[18:52:55] https://gerrit.wikimedia.org/r/#/c/322969/
[18:57:29] Analytics-Kanban, Operations, hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2828163 (elukey) >>! In T149911#2827948, @Ottomata wrote: > BTW we should name the new box something other than stat1001. We can use an element name if one is available, and...
[18:57:48] ottomata: that puppet change won't work until my changes to reportupdater are deployed
[18:57:56] so hold off on merging for now
[18:58:14] it needs a different job name anyway, I think
[18:58:17] I'll work it out and ping you
[19:01:52] ottomata, and also I need to add another job for editor engagement dashboards today or tomorrow
[19:15:18] ok cool
[19:28:18] Analytics-Kanban, EventBus, Operations, ops-codfw: rack/setup kafka2003 - https://phabricator.wikimedia.org/T150340#2828323 (Ottomata)
[19:38:56] milimetric: is it possible to add data cubes to pivot? like if I wanted to make maps tile usage available?
[19:39:32] bearloga: short answer is yes
[19:40:02] pivot is a front-end to Druid which is a big-data OLAP store. It might not be the best solution depending on what you really need to do with the data
[19:40:29] but it's fairly easy to load / update Druid with data from our cluster
[19:41:05] we have examples of how we do it with the pageviews data here: https://github.com/wikimedia/analytics-refinery/tree/78d1e5870d397632a6a3832ed4994a8c9593ab6c/oozie/pageview/druid
[19:41:10] bearloga: ^
[19:46:11] milimetric: oh, cool! we have a hive query that counts tiles served by various properties (https://datasets.wikimedia.org/aggregate-datasets/maps/tile_aggregates_no_automata.tsv) so it would be cool to add other info from webrequest (e.g. geolocation) and store that in druid and make it accessible as a data cube and then we, yuri et al. could use pivot's easy-to-use filter & split to quickly figure out that rise in particular type of tile usage is due to russia or whatever
[19:49:42] bearloga: yep. To support the argument for this cube you could write a phab task with the details. We can then help you with any infrastructure, which in this case might be just oozie jobs. The tricky part might be splitting off the small part of webrequest that you need. Because right now each of these types of cubes would need to iterate over the whole webrequest firehose of data. That's not scalable, so we need to talk alternatives.
[19:50:11] but we've given that some thought already, a phab task from you would be appreciated
[19:52:26] milimetric: will make a task! :) also, idk if that helps, but those requests are specifically from the webrequest_source = 'maps' partition (see https://github.com/wikimedia/wikimedia-discovery-golden/blob/master/maps/tiles.R#L16--L35) but your scalability concerns are still valid
[19:53:19] bearloga, that does help a lot, might be able to do a one-off then, as I think that partition is rather small and stored separately
[19:54:00] so an oozie job with a hive query provided by you may be all that's needed.
[19:54:31] ^_^
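A minimal HiveQL sketch of the kind of aggregation discussed above, assuming one day of the 'maps' webrequest partition and the usual webrequest fields (uri_path, geocoded_data, user_agent_map); the zoom-level parsing and the exact dimension set are illustrative assumptions, not the eventual oozie job:

    -- Daily tile counts split by a few candidate Druid dimensions.
    -- Assumes tile paths look like /<style>/<z>/<x>/<y>.png; adjust the regex to the real format.
    SELECT
        geocoded_data['country']                              AS country,
        user_agent_map['browser_family']                      AS browser_family,
        user_agent_map['os_family']                           AS os_family,
        regexp_extract(uri_path, '^/[^/]+/([0-9]{1,2})/', 1)  AS zoom_level,
        COUNT(*)                                              AS tile_requests
    FROM wmf.webrequest
    WHERE webrequest_source = 'maps'
      AND year = 2016 AND month = 11 AND day = 28
    GROUP BY
        geocoded_data['country'],
        user_agent_map['browser_family'],
        user_agent_map['os_family'],
        regexp_extract(uri_path, '^/[^/]+/([0-9]{1,2})/', 1);

The output of a query along these lines is what an oozie job would hand to Druid for indexing, and what Pivot would then expose for filter & split.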
[19:55:27] bearloga: if I may - one of the things to consider for druid is dimension cridnality
[19:55:46] cardinality sorry
[20:00:25] joal: if I'm using pageviews as a reference point, will it be OK if my data has the same dimension cardinality as pageviews or is pageviews pushing it already and I need to aim for substantially fewer dimensions?
[20:02:40] bearloga: It's not so much about the number of dimensions, but how many points in each
[20:05:19] elukey: I know you are having dinner, no need to look at this now but tears of happiness come to my eyes seeing our reqs/sec: https://grafana.wikimedia.org/dashboard/db/aqs-elukey
[20:06:45] bearloga: order of magnitude of 10k points in a dimension is big for our druid cluster
[20:12:56] joal: ah! that makes sense. when you say dimension... is project==en.wikipedia.org *a* dimension and project==fr.wikibooks.org another dimension and ua.browser_family==Chrome also another dimension?
[20:18:37] bearloga: project is 1 dimension, with cardinality ~900 (in sql, dimension cardinality is count distinct)
[20:22:04] joal: ah! Okie dokie, got it :) so it's a restriction on a per-dimension level and not, for example: |date|*|project|*|country|*|browser_family|*|os_family| < 10k
[20:22:32] indeed bearloga
[20:25:54] Thank you for the clarification! I think we'll be OK :) Save for the country & UA info that pageviews already has, the largest cardinality we'll have is the tile zoom level dimension which starts at 1 and goes all the way to 18 or 22 😃
[20:27:25] (In increments of 1, I should add :P)
[20:30:11] bearloga: sounds very much ok :)
[20:31:07] bearloga: only thing to mention: for long term pageview cubes, we dropped the ua-device-family part of the user agent, too big
[20:32:09] For short term it's ok, even if big - for long term, big cardinality dimensions become more problematic, so we dropped it (as well as city)
[20:32:29] a-team, bearloga, it's time for me to get some sleep :)
[20:32:35] See y'all tomorrow
[20:32:50] bye joal :]
[20:33:49] byye
[20:35:16] bearloga: also maybe this is obvious but the pageviews cubes are not at the article level. The number of different articles is an example of a dimension that didn't fit well in our relatively small druid cluster. For maps, I know they're not used on a ton of pages so that should be ok.
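A rough sketch of the "count distinct" check joal describes, to see whether a candidate dimension stays well under the ~10k distinct values that are comfortable for the druid cluster; the table and column names are the same assumptions as in the sketch above:

    -- Dimension cardinality == COUNT(DISTINCT ...) over a representative slice of data;
    -- repeat for each candidate dimension (browser_family, os_family, zoom level, ...).
    SELECT COUNT(DISTINCT geocoded_data['country']) AS country_cardinality
    FROM wmf.webrequest
    WHERE webrequest_source = 'maps'
      AND year = 2016 AND month = 11 AND day = 28;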
[21:18:57] Analytics-Kanban, Operations: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2828882 (RobH)
[21:20:06] Analytics-Kanban: Replace stat1001 - https://phabricator.wikimedia.org/T149438#2828913 (RobH)
[21:20:08] Analytics-Kanban, Operations: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2828882 (RobH)
[21:20:21] Analytics-Kanban: Replace stat1001 - https://phabricator.wikimedia.org/T149438#2752595 (RobH)
[21:20:23] Analytics-Kanban, Operations, hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2768740 (RobH) Open>Resolved
[21:36:28] (CR) Mforns: [C: -1] added several new modified sql queries, completed wiki_dbs, and made the config_all (9 comments) [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/323861 (owner: Hjiang)
[21:43:31] bye a-team, cya tomorrow!
[21:43:56] laters
[22:12:55] (CR) Milimetric: "Oops, I forgot to submit these comments, sorry." (8 comments) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/322365 (https://phabricator.wikimedia.org/T150187) (owner: MaxSem)
[22:15:10] (CR) Milimetric: "I initially didn't look at the example config, I'll update my approach to match." [analytics/reportupdater] - https://gerrit.wikimedia.org/r/322365 (https://phabricator.wikimedia.org/T150187) (owner: MaxSem)
[22:17:47] Analytics-Kanban, Operations: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2829198 (RobH)
[22:47:15] Analytics-Kanban, Operations: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2829242 (RobH)
[22:48:00] Analytics-Kanban, Operations: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2828882 (RobH) a:RobH>Ottomata Assigning this to @Ottomata for his review. The system has been installed and calls into puppet, it can have its info appended into site....
[23:31:23] Analytics, Discovery, Discovery-Analysis, Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2829353 (mpopov)
[23:33:14] milimetric yurik: per previous discussion: https://phabricator.wikimedia.org/T151832 :)
[23:35:57] bearloga, nice! I think the referer header is the most interesting, as it tells us who used the map
[23:36:17] the server portion should probably be enough
[23:38:51] yurik: with referer, we will only be able to have referer class ('external', 'internal', 'none', etc.) because there's a limit on how many distinct values are possible. to find out specific referers (like that pokemon go website), there will always be direct hive access to webrequest table :)
[23:39:20] bearloga, what's the limit on the values?
[23:40:42] yurik: joal mentioned that druid (where these counts would be stored) has a limit of 10K values (for dimensions, not counts)
[23:41:31] bearloga, 10k per each dimension?
[23:42:16] yurik: yup. 10k distinct values per dimension.
[23:42:16] bearloga, e.g., can we store each language, and each project type in it? e.g. en.wikipedia would be in "en", and in "wikipedia", vs "no language" + "external"
[23:44:21] bearloga, kul, in that case would be good to see which wiki project uses what, or to drill down into languages
[23:44:34] yurik: oh totally. pageviews data cube has project & language dimensions already, so that will be fine as long as we limit it to wikimedia sites.
[23:45:10] your solution sounds good!
[23:45:16] I'll edit the description
[23:51:26] Analytics, Discovery, Discovery-Analysis, Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2829406 (mpopov)
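A hypothetical sketch of the dimension split yurik suggests above: keep the coarse referer class and, for Wikimedia referers, derive a language and a project family from the referring host (e.g. en.wikipedia.org becomes "en" plus "wikipedia"). The referer_class / referer field names and the host parsing regexes are assumptions for illustration, not the final cube definition:

    -- Low-cardinality referer dimensions instead of the raw referer string.
    SELECT
        referer_class,
        regexp_extract(parse_url(referer, 'HOST'), '^([a-z0-9-]+)\\.', 1)             AS referer_language,
        regexp_extract(parse_url(referer, 'HOST'), '\\.(m\\.)?(wik[a-z]+)\\.org$', 2) AS referer_project_family,
        COUNT(*) AS tile_requests
    FROM wmf.webrequest
    WHERE webrequest_source = 'maps'
      AND year = 2016 AND month = 11 AND day = 28
    GROUP BY
        referer_class,
        regexp_extract(parse_url(referer, 'HOST'), '^([a-z0-9-]+)\\.', 1),
        regexp_extract(parse_url(referer, 'HOST'), '\\.(m\\.)?(wik[a-z]+)\\.org$', 2);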