[00:16:43] 10Analytics, 10Contributors-Analysis, 10Product-Analytics: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10nettrom_WMF) During our check-in with @Nuria today, I briefly mentioned the current use case I have for getting data from MariaDB. Let me describ... [01:22:26] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Neil_P._Quinn_WMF) [05:31:04] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Nuria) @nettrom_WMF thanks for the notes, I have to say that this is the first time i heard of a survey being persisted to MW, the othe... [06:01:26] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Upgrade Superset to 0.28.1 - https://phabricator.wikimedia.org/T211605 (10Ottomata) You can see the changes I made to the wikimedia branch here: https://github.com/wikimedia/incubator-superset/commits/wikimedia [06:08:53] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10Nuria) [06:09:38] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10Nuria) Note to self, handy sql: select ts, unix_timestamp(ts) - unix_timestamp((lag(ts) over (order by ts))) as delta, uri_path,uri_query .... [06:33:31] 10Analytics: https://www.tracemyfile.com/ is a bot, UA: Mozilla/5.0 (compatible; tracemyfile/1.0) - https://phabricator.wikimedia.org/T212486 (10Nuria) [06:34:58] 10Analytics: https://www.tracemyfile.com/ is a bot, UA: Mozilla/5.0 (compatible; tracemyfile/1.0) - https://phabricator.wikimedia.org/T212486 (10Nuria) [06:53:12] helloooo [06:53:52] wow GC for the namenode looks amazing [06:54:02] only some ms for young gen [06:54:56] 10Analytics, 10Analytics-Kanban, 10Datasets-General-or-Unknown, 10Patch-For-Review: cron job rsyncing dumps webserver logs to stat1005 is broken - https://phabricator.wikimedia.org/T211330 (10elukey) [06:59:58] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Decommission old Hadoop worker nodes and add newer ones - https://phabricator.wikimedia.org/T209929 (10elukey) Last step is to follow https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Decommissioning and d... 
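A fuller sketch of the truncated note-to-self query above, computing the seconds between consecutive Page Preview beacons with a lag() window; the wmf.webrequest source table, the uri filters, and the partition predicate are assumptions, since the original query was cut off:

    -- Sketch: seconds elapsed between consecutive beacon requests.
    -- Table name, filters, and partition values are assumptions;
    -- only ts/uri_path/uri_query and the lag() delta come from the log.
    SELECT
      ts,
      unix_timestamp(ts) - unix_timestamp(lag(ts) OVER (ORDER BY ts)) AS delta,
      uri_path,
      uri_query
    FROM wmf.webrequest
    WHERE year = 2018 AND month = 12 AND day = 21
      AND uri_path = '/beacon/event'
      AND uri_query LIKE '%Popups%'   -- hypothetical schema filter
    ORDER BY ts;

Rows where delta is very small would be candidates for the "sent too fast" beacons under investigation in T212484.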
[07:03:06] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) [07:03:28] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) [07:03:42] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) [07:13:57] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's tables and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10elukey) p:05Triage→03High [07:18:51] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's tables and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) Crossposting my comment from: T210478#4794533 `ops` doesn't need to be migrated. I believ... [07:19:25] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) A couple of updates: * I created T212487 to review and decide what tables to migrate to dbstore100[3-5] among... [07:51:40] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10elukey) [09:17:19] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10elukey) `datasets` seems indeed not useful, I propose to just mysqldump it and save it in... [09:26:10] (03PS1) 10GoranSMilovanovic: Quick Fix - Update Engine [analytics/wmde/WiktionaryCognateDashboard] - 10https://gerrit.wikimedia.org/r/481134 [09:26:32] (03CR) 10GoranSMilovanovic: [V: 03+2 C: 03+2] Quick Fix - Update Engine [analytics/wmde/WiktionaryCognateDashboard] - 10https://gerrit.wikimedia.org/r/481134 (owner: 10GoranSMilovanovic) [09:32:55] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) >>! In T212487#4839932, @elukey wrote: > > `flowdb` replication from X1 was re... [09:34:25] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) >>! In T210478#4794536, @Banyek wrote: > If the tables mentioned in https://phabricator.wikimedia.org/T210... [09:35:07] * elukey brb [09:48:27] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Banyek) >>! In T210478#4839959, @Marostegui wrote: > ` > root@cumin1001:~# mysql.py -hdbstore1002 -e "show all slaves... 
[09:52:29] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) Looks good, one little question - would it be possible to add a staging-like db on every instance if needed in... [09:58:41] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Banyek) >>! In T210478#4839966, @elukey wrote: > Looks good, one little question - would it be possible to add a stagi... [10:02:36] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) Apart from what @Banyek commented, I have a further question: do you mean adding the current stagingdb to... [10:05:47] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Banyek) >>! In T210478#4839973, @Marostegui wrote: > Apart from what @Banyek commented, I have a further question: do... [10:05:55] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) Yes correct an empty staging-like db, and we could call it as we want. I am 99% sure that we will not need it,... [10:08:28] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) However, the original stagingdb is still needed, right? Maybe we should move that discussion to {T212487}? [10:09:09] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) Yeah staging is still needed, I'll comment in the subtask. [10:09:39] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10elukey) The `staging` database is still needed and it is sufficient to copy it to a single... [10:14:17] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) >>! In T212487#4839980, @elukey wrote: > The `staging` database is still neede... [10:16:22] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10elukey) >>! In T212487#4839988, @Marostegui wrote: >>>! In T212487#4839980, @elukey wrote:... [10:20:40] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Upgrade Superset to 0.28.1 - https://phabricator.wikimedia.org/T211605 (10elukey) Let's restart working on this after the holidays! Thanks :) [10:28:31] 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10Marostegui) p:05Triage→03Normal [11:12:54] going afk after lunch folks, will send e-scrum! 
[11:12:58] happy holidays :) [11:43:59] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10phuedx) The beacon should be being sent [[ https://github.com/wikimedia/mediawiki-extensions-Popups/blob/c7e742743bd816726aba133199f16179475b56d2/src/actions.js#L18-L21 | 1 second ]] after... [11:44:08] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10phuedx) (Emphasis on should) [13:54:58] happy holidays elukey, see you next year! [14:50:03] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Milimetric) First, I agree with @Nuria that ideally we would find one good solution for surveys and stick with it as a rule. EventLogg... [15:57:35] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10Nuria) I gotta say that 1 sec seems too small of a time, i bet in that case half of the beacons we have are accidental mouse overs that do not speak of content consumption which I thought... [16:09:36] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10phuedx) To be clear, that's 1 second after the page preview is shown, which occurs no less than 700 ms after the user hovers over the link, i.e. the user has to dwell on the link for >= 1.... [18:00:51] 10Analytics, 10DC-Ops, 10decommission, 10User-Elukey: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (10RobH) a:03RobH [18:30:27] I'm realizing that event.mediawiki_revision_score.performer.user_text can be suppressed after the fact, so I should avoid including it in ORES dump files--is that correct? [18:30:57] Interesting. We don't have such limitations in other dump files. [18:32:24] Are there any dumps of eventlogging tables? Probably not, I think this is NDA-walled. [18:40:20] 10Analytics, 10Readers-Web-Backlog (Tracking): [Bug] Many JSON decode ReadingDepth schema errors from wikiyy - https://phabricator.wikimedia.org/T212330 (10Jdlrobson) Can't we just whitelist this URI in EventLogging? [18:43:26] awight, what is NDA-walled? [18:43:50] I was thinking of the XML dumps. Those include rev_user_text. [18:44:27] !log Restarted Turnilo to clear a deleted test datasource [18:44:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:45:06] halfak: That's definitely redacted by suppression, and comes from the MW database which has been scrubbed. However, we would be reading an event stream which cannot be suppressed. [18:45:24] Well, not if the redaction comes after the data is written. [18:45:45] We could apply a suppression pass over the data whenever we cut a dump. [18:45:55] XML dumps will contain things that may be redacted in the future, right. But the next XML dump will not include that data. [18:45:59] But we'd not really be hiding much because someone could just listen to the stream and capture that data. [18:46:14] NDA covers access to eventlogging data. [18:46:14] I don't understand your last message. [18:46:24] What are you talking about re. eventlogging? [18:46:31] We're not looking at eventlogging data, are we? [18:46:36] These are public event streams. [18:46:56] We're looking at the event path I pasted above, yes [18:47:03] which "these" are public? [18:47:09] "event path"?
[18:47:16] event.mediawiki_revision_score.performer.user_text [18:47:26] That's not eventlogging. [18:47:35] That eventstream [18:47:36] change...prop [18:47:39] Right [18:47:39] so many brands [18:48:03] Eventlogging is the tracking of events that are not generally considered public (usually) using the "eventlogging"(TM) system. [18:48:20] "Click tracking" is another common term. [18:48:48] Right, so event.mediawiki_revision_score.performer.user_text is public data. [18:49:20] that's not quite the right way to look at this, AIUI. Revision text is also public data--until it's redacted. [18:49:29] It's still public data [18:49:38] The redaction isn't about public/private. [18:49:39] It is public data that we have to redact [18:49:42] that's why... [18:49:46] argh. [18:50:09] * awight reads backscroll to try to understand why we're talking about public/private [18:50:10] But it's an important distinction. I think rather than focusing on the revision_score event, I'd like to ask about the plain ol' revision event. [18:50:17] NDA [18:50:52] I think the point here is that we may have to omit username and page_title from our dump, unless we have a fancy way of redacting from Hadoop. [18:50:59] Someone could listen to our event stream quite easily to produce this dump without redaction and they would be perfectly within their rights as it is openly licensed. [18:51:24] I think we need a brand new policy for posting any event dumps. [18:51:25] There may be a ToS with clients, actually. [18:51:28] We haven't done that before. [18:51:46] Sure. Could be. I don't believe we have one. [18:52:20] * awight pulls bell to ask if a-team has insights here [18:52:38] reading [18:53:00] in meeting awight - will read later [18:53:47] thanks :). My question in a nutshell is, should we be redacting or omitting all username and page_title fields from our dumps, or is there an easy way to determine whether specific records should be redacted? [18:54:36] The application is currently an oozie job to transform the revision_score event stream into a format appropriate for making public dumps of ORES data. [18:56:14] ok, awight [18:56:23] page titles are never redacted anywhere for any reason as far as I know [18:57:16] user names are sometimes hidden from a revision via the rev_deleted flag (when bit flag & 4 I believe) [18:58:00] this happens when people query the labs replicas, via this view mechanism: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml#647 [18:58:06] (look at that, it was 4, yay memory) [18:58:10] :) [18:58:23] and dumps does something similar [18:58:51] but like Aaron pointed out, unless you want to re-generate these every time someone touches a rev_deleted, and then go hunt everyone who downloaded a copy and force them to re-download, it's out there [18:58:58] The main DB dump I believe pulls directly from rev_user_text, which has been redacted in-place. 
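A minimal sketch of the rev_deleted redaction described above, mirroring how the linked maintain-views.yaml blanks usernames when the DELETED_USER bit (4) is set; running this against an unqualified copy of the MediaWiki revision table is an assumption:

    -- Sketch: blank rev_user_text when the DELETED_USER bit (4) is set,
    -- as the labs replica views linked above do at query time.
    SELECT
      rev_id,
      CASE WHEN (rev_deleted & 4) = 0 THEN rev_user_text
           ELSE NULL
      END AS rev_user_text
    FROM revision;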
[18:59:18] ok, could be [18:59:56] so basically, I wouldn't worry too much about it, other than either reading from the cloud replicas or applying your own redaction via rev_deleted at the time you generate your dumps [19:00:38] my personal opinion is that there should be one real-time replica that's sanitized and dumped into separate places for batch or real-time use cases [19:01:07] and jobs using it could be re-triggered anytime based on new redaction information (either new rules or just new data) [19:01:26] but we're far away from getting that done, because it requires everyone to collaborate on a big project that nobody has time for [19:01:42] did I answer your question or do you still have doubts? [19:03:11] Okay, well since our consumer only wants the user info for convenience, I would agree with you and I'll just omit user_text. The consumer can download a current user_groups dump to correlate user_id with username, that seems like a simple approach. [19:04:06] sure, start with that and see if anyone gets upset [19:04:40] Thanks for pointing out that page_title is not redacted, I'm a bit surprised though cos you can put illegal etc. text into that. [19:04:54] shhhh WP:BEANS [19:05:12] * awight blushes at https://en.wikipedia.org/w/index.php?title=Special:ProtectedTitles [19:05:17] haha [19:05:19] :) I mean I guess it gets "redacted" when a page is deleted... you could check for that - whether it's in the archive table [19:06:09] aw! that Category would be really useful, too bad it's banned :) [19:07:06] milimetric: Does it seem performant to make these SQL queries from an Oozie job as I'm bulk-transforming eventstream records? [19:08:11] oh! cool, you're doing this in oozie, you could just left-join to the history table and check rev_deleted [19:08:56] it's probably ok performance wise, seems parallelizable enough, but only actual testing will tell, and I can help tweak the query if it's too slow [19:09:26] Great, very helpful pointers! [19:09:28] wait but SQL queries... you mean Hive queries or you're hitting MariaDB somehow? [19:10:27] err I'm making Hive queries so far. I'll have to look at the history table to understand what the suggestion is [19:12:59] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history#Schema [19:13:00] Okay, mediawiki_page_history and mediawiki_user_history will probably work perfectly, once I figure out how to get the most recently updated info. [19:13:13] fancy! [19:13:17] the field that you can check to see whether it's deleted is revision_is_deleted [19:13:43] eh, the page and user histories will be harder to join to from a particular revision id [19:14:09] the mediawiki_history table is basically a join of the revision to both of those [19:14:42] I have page_id and user_id already, happily. [19:14:48] so for a given rev_id, you can find the user name at the time of the revision (event_user_text_historical) or the name currently (event_user_text) [19:14:58] do you not have rev_id? 
[19:15:01] I do [19:15:19] But it won't answer the question, "is this user now blocked" [19:15:29] yeah, so if you need to know whether the revision is deleted, you'd want to join to history, the user and page tables will give you multiple join results and you'll have to filter by latest or do weird things [19:15:33] it will [19:15:39] you have event_user_blocks [19:15:47] or event_user_blocks_historical [19:16:05] the idea of that table is that it's a giant one-stop-shop [19:16:18] if it doesn't have something that someone needs, we add it [19:16:54] The issue I'm anticipating is that the denormalized mediawiki_history will have the user's blocks at the time the revision is created, but it would require a self-join of that same table by user_id in order to find the user's *current* block status. [19:17:17] awight: that's the difference between the * and *_historical fields [19:17:25] so event_user_blocks has the user's *current* blocks [19:17:35] and event_user_blocks_historical has the blocks at the time of that event (that revision in this case) [19:18:20] I thought Hadoop rows were inefficient to update--are you saying that the entire mediawiki_history table is rebuilt often? [19:18:46] monthly [19:19:18] sorry, should've led with that, is that too slow for your dumps? [19:19:22] Wonderful! Thanks for the explanations, this table will work then. [19:19:50] ok, cool, the way oozie works is there's a dataset defined for mediawiki_history, so you can generate your dumps when mediawiki_history is ready for that month [19:19:58] I should also warn you that this data is still in beta, it's very hard to fully vet it and we have known issues. I think it would still be useful in your case, just making sure you know [19:20:05] We're probably dumping on a monthly cadence, and respecting suppression at all is what I'm aiming for. [19:20:15] This is looking good. [19:20:34] ok, cool, yeah, this will work for you then, and it pulls from cloud anyway, so you get the benefit of any missing user names or whatever built-in [19:20:47] :) that is a bonus. [19:20:47] you don't even have to think about it, just use the user text as it's found there [19:21:04] yep, let me know if you have any questions [19:21:14] Oh, I will ;-) [19:21:35] Really appreciate all the quality time your team has offered! [19:26:01] hey we're happy this stuff is getting used, sometimes communicating at WMF is like screaming at a wall [19:27:22] A wall where you can leave graffiti! [19:28:13] Curious how I will know mediawiki_history.snapshot, to satisfy the partition predicate. [19:28:43] if you build this with oozie, that will come from oozie as a parameter, I'll paste you an example one sec [19:29:07] ah kk thanks [19:29:38] you'll have something like this https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediawiki/history/reduced/coordinator.xml#L103 [19:29:41] in your coordinator [19:30:05] (Meanwhile "show partitions" is fine for my development.) [19:30:19] and that will be passed by your workflow to your job: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediawiki/history/reduced/workflow.xml#L184 [19:30:37] yep, exactly, it's YYYY-MM format, one month behind [19:33:38] * awight stares at the smoking streambed left by my exploratory query. [19:46:11] halfak: I just realized, we don't have to deal with page title because a suppressed history would cause the revision to be deleted, and the scoring would error out. 
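Pulling the join discussion above together, a sketch of the kind of Hive query the oozie job could run; ores_scores is a hypothetical stand-in for the revision_score data, ${snapshot} is the YYYY-MM coordinator parameter mentioned above, and the wmf.mediawiki_history column names follow the linked schema (worth double-checking against the wikitech page):

    -- Sketch: redact usernames via mediawiki_history. ${snapshot} is the
    -- YYYY-MM value passed by the oozie coordinator, as described above.
    SELECT
      s.rev_id,
      CASE WHEN mh.revision_is_deleted THEN NULL
           ELSE mh.event_user_text          -- current, not historical, name
      END AS user_text,
      mh.page_title                         -- stays current across moves
    FROM ores_scores s                      -- hypothetical scores table
    LEFT JOIN wmf.mediawiki_history mh
      ON  mh.revision_id = s.rev_id
      AND mh.snapshot = '${snapshot}'
      AND mh.event_entity = 'revision'
      AND mh.event_type = 'create';

A LEFT JOIN keeps score rows even when the monthly snapshot lags behind the event stream; those rows surface with a NULL user_text rather than disappearing.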
[19:49:23] still, mediawiki_history is nice for title because we get the up-to-date page title even if it's been moved since scoring. [19:52:49] awight: something to ponder about titles: they're not affected by deletions only, but by moves (meeting finished, and backlog read) :) [19:53:13] A page can have a name at revision A time that is different from its current one [19:53:33] joal: yes! I was just musing, that mediawiki_history is nice because it gives us the new title. [19:53:55] awight: That's the whole point of that huge table - Trying to get [19:54:03] historical information in a useful way :) [19:54:36] Except for that note on move vs deletion for page, I think you know everything - Ah maybe some info on timing [19:54:45] joal: do you know off-hand whether mediawiki_history.user_text will be empty if the user is blocked by name, e.g. for a libelous name. [19:55:02] awight: Yes, it will [19:55:14] perfect, thanks :-) [19:55:23] awight: user_text will not be filled out for revision events, this is a bit of trickery [19:55:23] Oh maybe not, excuse me, answered too fast [19:55:33] so event_user_text is the name of the user performing the action [19:55:41] When we say blocked-by-name, we mean blocked by a block in the user table [19:55:42] user_text is the user (if any) that the action is being performed to [19:55:54] so user_text is only filled in when you have user rename or create events [19:55:57] I actually don't know awight [19:56:07] milimetric: thanks, yes event_user_text [19:56:14] I'm sure about users hidden at revision level, but not about the ones hidden at user level [19:56:26] Very interesting question :) [19:56:48] These interactions between suppression mechanisms are exactly why I'm thrilled to rely on MediaWiki and not to reinvent it. [19:57:07] I imagine that [19:57:26] awight: Can't answer now and will be gone soon, but we'll find you an answer :) [19:57:59] awight: Info on timing - The mediawiki_history table is usually released on the 8th of the month - So it's not early [19:58:02] joal: I think I'll start by checking event_user_blocks [19:58:24] awight: You might actually come up with the answer for me ;) [19:58:36] Timing can be loose for us--I'm planning to kick off my job as a dependent of mediawiki_history [19:58:42] joal: Will do! [19:58:52] awight: We hope to be able to release the dataset earlier in the month, but so far, it's not earlier than the 8th (and sometimes the 10th) [19:59:00] Cool [19:59:50] Well with those good words, I wish all my analytics and non-analytics fellows a very good end of year, and will take my leave for those few days :) [19:59:56] I would process immediately after the mediawiki_history job because that gives us the best chance of up-to-date redaction, rather than having a potential month with stale redaction. [20:00:06] o/ enjoy! [20:02:03] 10Analytics, 10Product-Analytics: Columns named "dt" in the Data Lake have different formats - https://phabricator.wikimedia.org/T212529 (10nettrom_WMF) [20:04:18] 10Analytics, 10Product-Analytics: Page Preview beacons being sent too fast? - https://phabricator.wikimedia.org/T212484 (10Nuria) I see, that does prevent accidental mouseovers. Now, I still do not see how we can report our page previews mouseovers as content consumption with 1 sec threshold. There cannot be c...
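For the open question above about users hidden at the user level, a small probe along these lines could help settle it; it assumes event_user_blocks is the current-blocks array from the linked schema, and the snapshot value is only an example:

    -- Probe sketch: are revisions by currently-blocked users still
    -- carrying a populated event_user_text? Snapshot is an example.
    SELECT
      event_user_text,
      event_user_blocks,           -- current blocks (array<string>)
      revision_is_deleted
    FROM wmf.mediawiki_history
    WHERE snapshot = '2018-11'
      AND event_entity = 'revision'
      AND event_type = 'create'
      AND size(event_user_blocks) > 0
    LIMIT 50;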
[22:44:23] 10Analytics, 10Analytics-Cluster, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic: Can't write from Spark to local FS - https://phabricator.wikimedia.org/T200609 (10GoranSMilovanovic) 05Open→03Resolved a:03GoranSMilovanovic @JAllemandou @Milimetric Thank you guys - as ever. The dataset is def... [22:55:17] !log Restarted Turnilo to clear a deleted test datasource [22:55:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [22:59:07] 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight) @JAllemandou I didn't have time to chase down the responsible code, but wanted to let you know that the user redactions look good empirical... [23:31:55] 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight) Since ORES scores are expensive to recalculate en masse, we only want to refresh scores when a new ORES model or model_version is released.... [23:55:15] (03PS13) 10Mforns: Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099)