[01:48:00] (CR) Awight: "Setting "WIP" again due to the hardcoded datacenter. Also, there's a complication that scores are different yet may be duplicated across " [analytics/refinery] - https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) (owner: Awight)
[03:52:39] Analytics, Analytics-Kanban, Chinese-Sites, Patch-For-Review: Add Chinese Wikiversity edit-related metrics to Wikistats2 - https://phabricator.wikimedia.org/T213290 (Shizhao)
[04:11:46] Analytics, Analytics-Kanban, Chinese-Sites, Patch-For-Review: Add Chinese Wikiversity edit-related metrics to Wikistats 2 - https://phabricator.wikimedia.org/T213290 (Ericliu1912)
[07:23:14] morninggg
[07:23:21] so it seems that camus is running fine
[07:23:22] goooood
[07:47:50] o/ elukey
[07:48:02] I watched yesterday night before going to bed ;)
[07:48:10] <3
[08:03:37] joal: the only bits missing are report updater and all the spark jobs
[08:04:19] elukey: "all the spark jobs" -> 2 jobs right?
[08:07:22] joal: I was convinced that there were more from puppet, refine, sanitization, etc..
[08:07:46] elukey: well I can only think of the 2 you mention: refine, and sanitization :D
[08:08:04] elukey: The discussion can be around the "etc" (as often)
[08:09:42] joal: do we have one single spark job for all the refinements? If so ok, I am ignorant about the spark part :(
[08:10:38] I checked the occurrences of profile::analytics::refinery::job::refine_job in puppet for example
[08:10:52] elukey: Indeed we do - Spark (or to be more precise, scala in spark driver) handles managing partition folders, checking success files, and run only the needed ones
[08:11:30] sure but why in puppet we have multiple profile::analytics::refinery::job::refine_job then?
[08:11:33] I count 5
[08:11:46] elukey: This actually is one part that makes me unhappy: This bit of refine is the 3rd 'scheduling' tool we have (oozie, reportupdater, and refine)
[08:11:56] * joal wants to invest in AirFlow !!!
[08:12:15] yes definitely
[08:12:26] elukey: I think it could be related to event-logging + event-bus
[08:13:50] for example, one cron to "migrate" is
[08:13:50] # Puppet Name: eventlogging_to_druid_navigationtiming_hourly
[08:13:51] 0 * * * * /usr/local/bin/eventlogging_to_druid_navigationtiming_hourly >> /var/log/refinery/eventlogging_to_druid_navigationtiming_hourly.log 2>&1
[08:14:01] and this one does a spark2-submit
[08:14:08] then there are more
[08:14:23] Ah elukey - I know the one I have forgotten: events-to-druid
[08:14:25] there are also all the RefineMonitor ones
[08:14:30] right
[08:14:32] Mwarf
[08:14:37] ok, 5 it is then ;)
[08:14:54] okok, my main concern is not breaking all at once :P
[08:15:10] I filed a puppet change to flip only one at a time (hopefully)
[08:15:10] elukey: Please break everything and suggest we fix using AirFlow ;)
[08:15:14] hahahahah
[08:16:10] when all this security madness is ended I promise that I'll work with you on it
[08:16:19] \o/ Many thanks :)
[09:17:45] Analytics, Analytics-Kanban: Add 'mediawiki_history_unchecked' dataset to oozie - https://phabricator.wikimedia.org/T213524 (JAllemandou) a:JAllemandou
[09:17:45] (PS1) Joal: Update mediawiki_history oozie job datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/483692 (https://phabricator.wikimedia.org/T213524)
[09:19:32] Analytics, Analytics-Kanban: Update big spark jobs conf with better settings - https://phabricator.wikimedia.org/T213525 (JAllemandou)
[09:20:26] (PS2) Joal: Update big spark jobs settings [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482661 (https://phabricator.wikimedia.org/T213525)
[09:24:05] Analytics, Analytics-Data-Quality, Analytics-Kanban, Contributors-Analysis, Product-Analytics: mediawiki_history missing page events - https://phabricator.wikimedia.org/T205594 (JAllemandou)
[10:32:20] (CR) Joal: [C: -1] "Comments inline. IMO one of the most important missing bit is the revision_score daily dataset, with flags only in dedicated daily folders" (23 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) (owner: Awight)
[12:51:11] * elukey lunch
[12:57:20] Analytics, Analytics-Wikimetrics: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (Nuria) @MaxSem Noted, still, is the best replacement that exists for a tool that sees very few use if any.
[13:00:57] Analytics, Analytics-Kanban, Chinese-Sites, Patch-For-Review: Add Chinese Wikiversity edit-related metrics to Wikistats 2 - https://phabricator.wikimedia.org/T213290 (Nuria) Nice, there should be metrics up to January 2019 when the February snapshot is computed.
[13:02:40] Analytics, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Nuria) p:Low→Normal a:fdans
[13:03:21] Analytics, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Nuria) Ok, moving to kanban and assigning to fdans as background work, e-mail will be sent to product-analytics...
[13:03:38] Analytics, Analytics-Kanban, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Nuria)
[13:13:47] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (Nuria) @Marostegui I am a bit lost, I thought jaime was talking about "prod"datab...
[13:58:20] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (jcrespo) @Nuria There are apparently 2 tools (or the same, reused), one on produc...
[14:00:02] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (Nuria) @jcrespo it seems we should be able to deploy (out of the box with a new...
[14:01:36] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (jcrespo) Even us roots have one for mysql administration: ` root@cumin1001:~$ my...
[14:08:00] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (jcrespo) >>! In T212386#4872811, @Nuria wrote: > @jcrespo it seems we should be...
[14:12:20] Analytics, Anti-Harassment, Product-Analytics: Add partial blocks to mediawiki history tables - https://phabricator.wikimedia.org/T211950 (Nuria) Super, super thanks to @nettrom_WMF for flagging this issue so we can incorporate changes to mw history
[14:36:02] Analytics, Contributors-Analysis, Product-Analytics, Epic, User-Elukey: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (elukey)
[14:43:09] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (elukey) >>! In T212386#4872834, @jcrespo wrote: >>>! In T212386#4872811, @Nuria...
[14:47:20] Analytics, Analytics-Kanban: Reportupdater queries jobs failing - https://phabricator.wikimedia.org/T213219 (Nuria) Open→Resolved
[14:47:41] Analytics, Analytics-Kanban: Reportupdater queries jobs failing - https://phabricator.wikimedia.org/T213219 (Nuria) Thanks to @milimetric for the fast turnaround here
[15:24:32] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (jcrespo) > Can we discuss about how to implement these? We use "standard" ports...
[15:31:06] fdans: I'm sorry, I'm lost in a deep hole of pings and notifications
[15:31:22] I think I need to dig myself out before I can do any more useful work
[15:31:41] milimetric: yeah no problem, i'm looking on my own right now :)
[15:32:05] fdans: not sure if you synced up with Jo yet, he thinks he found the double Id problem
[15:33:20] oh
[15:33:33] fdans: it's pretty explicitly set in processDeleteEvent: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/page/PageHistoryBuilder.scala#L206 (and a few more lines down)
[15:34:03] like, we're just setting both. I forgot to ask him if he was going to submit a patch for it, but if he's not, you can
[15:35:01] when I'm done with pings, I'll build a new jar and run reconstruction on my test sqoop from November. Then I can run the same thing with a patch for this issue, and we can compare
[15:36:41] milimetric: ok on one hand I feel useless, but on the other that's exactly the line I was just investigating, so I'm a lil happy
[15:37:00] fdans: welcome to working with Jo :)
[15:37:54] though, a thorough review of this code is not in any way a waste of time
[15:38:08] we need all of us to be very familiar with it
[15:38:36] milimetric: I agree and I'm learning a lot but god do I feel unproductive
[15:38:47] yep
[15:38:48] I need to get a wikistats task or something for my brain's sake
[15:39:02] yep, that's how I feel all the time
[15:39:19] who knows, maybe you'll grow to love CSS to :)
[15:39:21] *too
[15:40:19] milimetric: this is how I feel when dealing with CSS:
[15:40:20] https://www.youtube.com/watch?v=vGCIGEUB32M
[15:46:59] fdans: Aubrey Plaza's reaction is so great
[15:47:39] I'd rather deal with neverending poop markers than whatever happened in the logging table
[15:54:07] nuria: want to chat in cave? Or maybe even during your sync-up with John, this is pretty relevant
[16:25:02] (PS9) Milimetric: Join to new actor and comment tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476553 (https://phabricator.wikimedia.org/T210543)
[16:25:04] (PS7) Milimetric: Update mediawiki-history comment and actor joins [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[16:28:12] fdans: you got this: https://phabricator.wikimedia.org/T206894#4867372?
[16:29:30] milimetric: yep!
[16:30:23] Analytics, EventBus, Operations, Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (Ottomata) p:Triage→Normal
[16:48:50] Analytics, Analytics-EventLogging, Analytics-Kanban, MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (Milimetric) Maybe if you use `"manifest_version": 2` you have to change the format of the whole extension.json file, cou...
[16:52:37] Analytics, EventBus, serviceops, Services (watching): Datacenter aware configs for EventGate topic prefixes - https://phabricator.wikimedia.org/T213564 (Ottomata) p:Triage→Normal
[17:00:12] a-team: standdduppp
[17:00:32] ping fdans mforns joal
[17:00:49] Analytics, EventBus, serviceops, Services (watching): Datacenter aware configs for EventGate topic prefixes - https://phabricator.wikimedia.org/T213564 (Pchelolo) > can render the service-runner config.yaml template with values provided by it. We can also include it as an env variable into the c...
[17:02:10] trying to join...
[17:19:56] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov)
[17:22:52] Analytics, Research, Article-Recommendation: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov)
[17:28:46] Analytics, Research, Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (bmansurov) p:Triage→Normal
[17:29:13] Analytics, Research, Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (bmansurov)
[17:39:26] o/
[17:40:03] o/
[17:40:53] isaacj and I are writing the onboarding materials for Research and we have run into a question. We have a step as part of our onboarding that is called Analytics system walkthrough. Originally, Research imagined this as an in-person onboarding with 1-2 people from Analytics. I still find this useful. Is this something that we can count on you all for? If yes, can I ask one of you to help us write a short description what needs
[17:41:30] What I'm imagining is: reach out to person_x in Analytics to schedule a 1 hour meeting. You should expect to learn about x, y, and z in this meeting.
[17:42:54] yeah definitely, I think that it is reasonable, but better to see what the others of the team thinks about it :)
[17:43:22] ideally a new researchers should start reading our docs
[17:43:35] and then possibly come up with questions etc.. during the meeting?
[17:43:43] *researcher
[17:44:22] like starting from https://wikitech.wikimedia.org/wiki/Analytics
[17:44:32] --> https://wikitech.wikimedia.org/wiki/Analytics#Datasets
[17:44:46] --> https://wikitech.wikimedia.org/wiki/Analytics#Systems_-_Analytics/Systems
[17:44:49] leila: --^
[17:48:12] * leila reads
[17:49:23] elukey: thanks for the links. took a note. I'll wait to hear from others as well. milimetric, ottomata, and joal, ^, for whenever you have time.
[17:51:00] indeed i think it's great
[17:56:17] * elukey off
[17:58:06] (CR) Awight: Oozie jobs to produce ORES data (9 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) (owner: Awight)
[18:15:18] FWIW, I looked through the oozie code and test cases, and I'm mostly convinced that there's no wildcard facility.
[18:15:25] for uri-template
[18:28:17] ottomata: can you review https://office.wikimedia.org/wiki/Research/Onboarding#Analytics_systems_overview and update as needed?
[18:35:54] leila: sounds good! i'd say when someone is requesting a meeting, you should probably give them some pointers on who to reach out to
[18:35:59] any of us will do really :)
[18:36:01] any/all of us
[18:36:24] ottomata: who is the volunteer? :D
[18:36:56] me/joseph/dan?
[18:47:04] ottomata: ok. :)
[19:03:54] musikanimal: hello, yt?
[19:04:11] yo!
[19:04:32] musikanimal: i was thinking that for grant metrics on cloud labs you should add piwik so you get stats of usage
[19:05:04] musikanimal: sounds familiar? We have http://piwik.wikimedia.org
[19:05:19] as a solution to capture traffic from small sites
[19:06:01] musikanimal: you just need a beacon and it will be ready to go, you can look at wikistats and you will see the piwik beacon
[19:06:22] cool! we do have a ticket open about getting usage stats
[19:06:25] musikanimal: it will give you info as to what urls your users use/browsers etc
[19:06:37] musikanimal: so it is as simple as adding a beacon
[19:06:55] how do I login? LDAP credentials didn't work
[19:10:20] msg musikanimal
[19:27:14] Analytics, Analytics-Kanban, DBA, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui) @elukey as per our chat past week I would like to get at least one host up and ready and with data in plac...
[20:51:41] Analytics, Analytics-EventLogging, Analytics-Kanban, MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (srishakatux) @Milimetric I am working on a script for a gadget, and I've copy/pasted the code here: https://pastebin.com...
[20:53:17] Analytics, Scoring-platform-team: Investigate formal test framework for Oozie jobs - https://phabricator.wikimedia.org/T213496 (greg) removing #releng as this seems like just an investigation task, please do let us know what you find out!
[21:42:28] Analytics, Analytics-EventLogging, Analytics-Kanban, MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (Milimetric) @srishakatux I have good news and bad news. The good news is that I added this to my extension.json (with "...
[21:48:55] Analytics, Anti-Harassment, Product-Analytics: Distinguish between initial blocks and block modifications in the Mediawiki user history table - https://phabricator.wikimedia.org/T213583 (nettrom_WMF)
[21:50:44] Analytics, Anti-Harassment, Product-Analytics: Distinguish between types of block events in the Mediawiki user history table - https://phabricator.wikimedia.org/T213583 (nettrom_WMF)
[21:50:47] Analytics, Analytics-Data-Quality, Analytics-Kanban, Growth-Team, and 2 others: Add EditAttemptStep properties to the schema whitelist - https://phabricator.wikimedia.org/T208332 (mforns)
[22:10:27] Analytics, Anti-Harassment, Product-Analytics: Distinguish between types of block events in the Mediawiki user history table - https://phabricator.wikimedia.org/T213583 (TBolliger)
[22:13:39] Analytics, Anti-Harassment, Product-Analytics: Distinguish between types of block events in the Mediawiki user history table - https://phabricator.wikimedia.org/T213583 (TBolliger) That list looks comprehensive to me. Block modifications will be a catchall of all changes for partial blocks when they'...
[23:40:25] Analytics, Analytics-EventLogging, Analytics-Kanban, MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (srishakatux) @Milimetric Good news from my end too :D I am able to see the events in the log file. `vagrant git-update...