[01:02:55] (03PS8) 10Awight: Schema for ORES scores [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) [01:02:57] (03PS4) 10Awight: [WIP] Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 [01:04:53] 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight) Updated patches should have working DDL and HQL scripts, but I still need to refine and smoke test the job definitions. Denormalized outpu... [01:08:03] (03PS5) 10Awight: [WIP] Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) [06:48:58] morning! [07:40:32] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (10srishakatux) @Milimetric Thanks! I'm now able to see the journal logs. But, still running into exactly the same error as... 
[08:12:17] mooorning [08:12:46] hola :) [09:08:56] Goat Morning [09:53:05] 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services, 10Core Platform Team Backlog (Watching / External): Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10Banyek) [09:53:09] 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (10Banyek) 05Open→03Resolved a:03Banyek I cleaned up the tables, so I close the ticket [11:12:59] ok so we are almost ready to flip the first camus job to systemd timer [11:13:07] everything is in code review [11:13:24] I chose to migrate only netflow as testing use case [11:13:31] if good I'll move all the others [11:13:43] and also Marcel's Hive2Druid stuff [11:14:34] ah and also sanitization [11:14:48] I'd need to move fast since we keep adding crons! :P [11:32:31] fdans: going afk for lunch + errand, if you want we can catch up with Superset when I am back [11:33:04] elukey: sounds good! [12:57:22] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (10Milimetric) I'm sorry, I focused on the devserver problems and completely missed the more obvious error you posted. Tha... [13:02:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (10Milimetric) And yes, file an access request with #sre-access-requests for analytics-privatedata-users, you can cc me in... [13:09:06] elukey: o/ [13:09:39] elukey: I'm a bit lost in the discussion in T172410 . Our team was generally fine with the original proposal you had in the description, but things may have changed recently? 
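[aside] The camus cron-to-systemd-timer migration mentioned above would, on a typical systemd host, consist of a timer/service unit pair roughly like the following. Unit names, the wrapper path, and the schedule are illustrative guesses, not the actual puppet-managed units:

```ini
# camus-netflow.timer -- hypothetical name; replaces a crontab entry
[Unit]
Description=Periodically launch the Camus netflow import job

[Timer]
# Run every 15 minutes, mirroring an assumed cron schedule
OnCalendar=*:0/15
# Spread start times a little after boot/reload
RandomizedDelaySec=60

[Install]
WantedBy=timers.target

# camus-netflow.service -- the one-shot unit the timer triggers
[Unit]
Description=Camus netflow import job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/camus-netflow-wrapper
```

Compared to cron, this gives per-job logs in the journal and `systemctl list-timers` visibility, which is presumably part of the motivation for the migration.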
[13:09:40] T172410: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 [13:47:31] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant: Code sample for extension.json is wrong - https://phabricator.wikimedia.org/T213285 (10Milimetric) p:05Triage→03Normal [13:54:22] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10SBisson) >>! In T212172#4849443, @Milimetric wrote: >>> The ultimate purpose of collecting this data is to personalize new users' exper... [13:55:29] leila: o/ [13:55:58] so there are two main points [13:56:28] 1) we need to move dbstore1002 to a 3 hosts solution, each one running multiple mysql instances that replicate a wiki section (like s1, s2, etc..) [13:56:52] together with the staging db and some others (I pinged you to verify one of them in a subtask) [13:57:05] this is basically what we have been discussing during these months, nothing changed [13:57:52] 2) eventually in the bright future we'd have only the Data Lake on Hadoop and nothing more, so the more use cases moved to Hadoop the better in the medium future [13:57:59] -- [13:59:15] what we are discussing in the task now is how to support some use cases from the data analysis world to avoid breaking people's daily workflows when we decommission dbstore1002 [13:59:54] so in theory you guys should be ok, we'll have of course to sync for the migration to the new hosts but nothing more [14:00:13] Not sure if this is clearer or not :( [14:00:25] elukey: great. this is clear. [14:00:37] * leila looks for the subtask that elukey mentions [14:03:34] elukey: if by subtask you mean T212487, I have already responded. 
[14:03:35] T212487: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 [14:07:02] leila: yep yep sorry I didn't mean that you didn't, it was only to add context :) [14:07:05] sorry [14:07:36] no worries. I'm catching up with emails and I may have missed it. happy that I'm not /that/ behind this one. ;) [14:08:16] :) [14:13:03] hey fdans I have a couple other things to catch up with this morning [14:13:19] I had to put out a couple fires yesterday [14:13:28] so let's do our next scheduled eye bleed tomorrow morning [14:13:51] !log shutdown all the hdfs datanode daemons on the decom nodes (analytics1028->41) [14:13:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:14:11] from the logs those daemons are only collecting deleted [14:14:14] *deletes [14:14:20] I'll start with a couple [14:29:00] ottomata: o/ [14:30:29] o/ [14:31:11] 10Analytics, 10MediaWiki-API, 10PageViewInfo, 10Pageviews-API: API Analytics - page views by country - https://phabricator.wikimedia.org/T213221 (10Anomie) [14:31:14] if you are caffeinated - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482767/ - I'd merge this and then restart namenodes to complete decom [14:36:09] ottomata: --^ [14:37:15] elukey: +1 [14:37:20] and last https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/483136/ [14:37:32] elukey: should we merge those superset patches? 
[14:37:58] ottomata: I didn't have time today but please do if you have, you are surely more knowledgeable than me and less prone to failures :D [14:38:09] there is a deployment server in labs + superset node [14:38:19] I can quickly deploy and let Fran test [14:39:10] i think if superset runs from the latest stuff there we should just merge [14:39:16] if we need more patches to fix bugs we can make them [14:39:46] sure [14:40:02] actually, i'm going to go ahead and merge the first 2, the last one that actually bumps the version we can wait to verify that ^^ [14:40:23] I am going to restart namenodes in the meantime [14:40:43] k! [14:40:49] (i'm also a little confused about the state of my patches...) [14:41:11] I messed up the last one with a rebase, sorry [14:42:29] elukey: ottomata helloooo I can test whatever if you want :) [14:45:44] just restarted an-master1002, the old nodes are gone [14:46:05] I am going to wait a bit, failover, restart namenode on an-master1001, wait a bit, failover again [14:46:12] and then clean up hosts.exclude [14:50:52] gr8 :) [14:57:05] (03PS2) 10Ottomata: Use wikimedia superset fork to build_wheels. @wikimedia branch currently at 0.26.3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481053 [14:57:24] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Use wikimedia superset fork to build_wheels.
@wikimedia branch currently at 0.26.3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481053 (owner: 10Ottomata) [14:58:30] ottomata: whenever you have time, sanity check for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/483136/1/manifests/site.pp [15:01:17] (03PS2) 10Ottomata: Update to build from wikimedia's superset fork [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481054 [15:02:03] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Update to build from wikimedia's superset fork [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481054 (owner: 10Ottomata) [15:06:04] elukey: looks right to me [15:06:22] thanks :) [15:08:40] (03PS5) 10Ottomata: Bump to superset version 0.26.3-wikimedia1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481056 [15:20:27] (03PS6) 10Ottomata: Bump to superset version 0.26.3-wikimedia1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481056 [15:20:47] ottomata: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482790/ \o/ [15:21:01] today I've worked a bit on adding the new stuff to camus [15:21:06] still wip but looking good [15:21:44] oh nice [15:22:19] eventually everything should be all timers and adding kerberos support should be easy [15:22:54] awesooome [15:23:37] elukey: how do you deploy superset in analytics labs? [15:23:40] i see the deployment-server [15:23:44] but no scap environments [15:23:52] i'm going to go ahead and deploy there so fdans can check [15:24:09] coool beans [15:24:12] there is a /srv/deployment/etc.. super set dir [15:24:35] yes [15:24:37] and then a superset.eqiad.wmflabs host (works but scap host list needs to be updated with it) [15:24:49] ahhh ok I got the scap environment thing now [15:24:49] ah you just manually edit? k [15:24:52] you mean the host list [15:24:56] ya [15:24:56] yeah sorry [15:25:08] np! [15:27:51] ok hadoop nodes officially decommed!
Going to add some notes to the admin docs [15:28:04] next step is to build the testing cluster [15:31:43] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Decommission old Hadoop worker nodes and add newer ones - https://phabricator.wikimedia.org/T209929 (10elukey) Nodes completely removed: * removed from the network topology and restarted namenodes * assigned role::spare:system and removed... [15:31:45] 10Analytics: Add Chinese Wikiversity edit-related metrics to Wikistats2 - https://phabricator.wikimedia.org/T213290 (10mforns) [15:32:19] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Decommission old Hadoop worker nodes and add newer ones - https://phabricator.wikimedia.org/T209929 (10elukey) As mentioned before these nodes will become a new testing cluster, more info in T212256 [15:32:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Decommission old Hadoop worker nodes and add newer ones - https://phabricator.wikimedia.org/T209929 (10elukey) [15:36:29] ok elukey this superset thing is not yet working....my change to build the static files i thought only ran during build, but now its trying to run on deploy too (when setting up the venv) [15:36:32] so i gotta figure that out [15:36:36] i'll work on that later today [15:37:05] super [15:37:22] at some point I hope that 0.29 goes out so we can go back to the previous release [15:37:29] (if it is stable of course) [15:38:24] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10mpopov) >>! In T212172#4853129, @chelsyx wrote: > Here're some use cases from my work for the iOS app team: > > - Of course, as @Neil_...
[15:44:57] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10mpopov) By the way, on ouR side, [[ https://github.com/wikimedia/wikimedia-disco... [16:17:23] hey a-team, any thing you want me to mention in SoS? [16:17:39] mforns: maybe the thing about user partial blocks [16:18:01] https://phabricator.wikimedia.org/T202781#4865947 [16:18:05] ottomata, you want me to flip the table? [16:18:46] right [16:18:55] mforns: you mean (╯°□°)╯︵ ┻━┻ ? [16:19:02] yea xD [16:19:05] haha [16:19:12] only if you do this after: [16:19:19] (•_•) [16:19:19] ( •_•)>⌐■-■ [16:19:20] (⌐■_■) [16:19:23] xDD [16:24:49] 10Analytics, 10Operations, 10ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (10jcrespo) [16:26:51] 10Analytics, 10Operations, 10ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (10jcrespo) I rebuilt db1082- we are no blocker for any maintenance on those servers, but we would prefer to stop mysql if there is a chance for the server to lose power, while it does not... [16:31:43] elukey: ops sync? 
[16:51:48] a-team I’m not sure I’ll make stand up, I’m out to get some meds for lauren who’s sick in bed [17:10:24] 10Analytics: Reportupdater should not fail if pid file is malformed - https://phabricator.wikimedia.org/T213308 (10Milimetric) p:05Triage→03High [17:11:59] 10Analytics: Reportupdater should alert if it fails over and over - https://phabricator.wikimedia.org/T213309 (10Milimetric) p:05Triage→03High [17:12:16] 10Analytics: Add Chinese Wikiversity edit-related metrics to Wikistats2 - https://phabricator.wikimedia.org/T213290 (10JAllemandou) It's not present in the wiki-list we sqoop: https://github.com/wikimedia/analytics-refinery/blob/master/static_data/mediawiki/grouped_wikis/labs_grouped_wikis.csv Providing a patch... [17:19:44] 10Analytics, 10Anti-Harassment, 10Product-Analytics: Add partial blocks to mediawiki history tables - https://phabricator.wikimedia.org/T211950 (10Milimetric) p:05Normal→03High [17:19:58] * elukey afk for a bit [17:22:21] (03PS1) 10Joal: Add zhwikiversity to the labs sqoop list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/483186 (https://phabricator.wikimedia.org/T213290) [17:22:29] milimetric: --^ if you want [17:23:08] (03CR) 10Joal: [V: 03+1] "Access tested on labsdb" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/483186 (https://phabricator.wikimedia.org/T213290) (owner: 10Joal) [17:23:18] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add zhwikiversity to the labs sqoop list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/483186 (https://phabricator.wikimedia.org/T213290) (owner: 10Joal) [17:23:35] Thanks milimetric :) [17:23:39] ty! 
[17:24:23] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add Chinese Wikiversity edit-related metrics to Wikistats2 - https://phabricator.wikimedia.org/T213290 (10JAllemandou) a:03JAllemandou [17:42:40] thanks for Chinese Wikiversity joal :] [17:44:49] 10Analytics, 10Anti-Harassment, 10Product-Analytics: Add partial blocks to mediawiki history tables - https://phabricator.wikimedia.org/T211950 (10dbarratt) [17:57:14] (03PS1) 10WMDE-Fisch: Add script to count user setting for disabled AdvancedSearch [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483194 (https://phabricator.wikimedia.org/T211090) [18:02:13] (03PS2) 10WMDE-Fisch: Add script to count user setting for disabled AdvancedSearch [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483194 (https://phabricator.wikimedia.org/T211090) [18:04:55] * elukey off! [18:19:16] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "Based on what I see and know I can't spot any mistake. But I don't feel like I know enough to be qualified to merge this." [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483194 (https://phabricator.wikimedia.org/T211090) (owner: 10WMDE-Fisch) [18:39:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (10Legoktm) [18:39:52] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant, 10Patch-For-Review: Code sample for extension.json is wrong - https://phabricator.wikimedia.org/T213285 (10Legoktm) 05Open→03Invalid It's correct, as long as your extension is using `manifest_version: 2` (https://www.medi... 
[18:40:03] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) [18:41:34] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) @elukey I have updated the original task, to add the last statuses of the curr... [18:55:24] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) [18:55:58] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) @elukey For those databases that we have decided, so far, to backup and archiv... [18:57:23] (03CR) 10Joal: [C: 04-1] "A bunch of comments, nothing major, but still can't go as-is." (037 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [18:57:27] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) I asked @chasemp about `fab_migration` and I think we need to have a final wor... 
[18:57:54] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) [18:59:01] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) [19:06:59] (03CR) 10Joal: [C: 04-1] "Small addition to one comment." (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [19:07:59] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641 (10srishakatux) > "EventLoggingSchemas": { > "CentralAuth": 5690875 > } By registering the schema in the... [19:11:46] hmmmm, EL sanitization's rotating salt seems not to be compatible with running sanitization in 2 steps... :( [19:14:56] mforns: oh y? [19:15:38] yes... at the end of quarter salt rotates, so 45 days before, second pass of sanitization also changes salt [19:16:01] and starts overwriting data and changing the old salt by the new salt [19:16:58] there's a backup of the salt that lives for 2 weeks now... [19:17:07] we could use that for the second pass maybe [19:17:43] something like: if exists backup, use it, otherwise, use the actual salt file [19:18:10] oh [19:18:22] hm [19:18:56] i think keeping the salt longer is ok? [19:19:03] we can just always keep the last quarter's salt? [19:19:09] is that bad?
[19:20:03] ottomata, keeping the old salt is, theoretically, the same as not hashing the marked fields for that period [19:20:13] so, yes, a bit bad [19:20:42] we're keeping it already for 2 extra weeks, to allow for backfilling in case of fireworks [19:21:16] maybe I can set up the second pass also after 2 weeks of initial sanitization [19:21:48] or maybe keep it for 3-4 weeks? I think one full quarter would be too much [19:22:05] and maybe 4 weeks too [19:23:05] ottomata, would it be possible in puppet to pass one path if exists, otherwise pass another path to the job? [19:24:14] bash hack [19:26:53] $(if [ -f "$old_salt_path" ]; then echo "$old_salt_path"; else echo "$new_salt_path"; fi) [19:43:12] mforns: hm not easily in puppet, but in a shell script yes [19:43:23] yea [19:43:29] we already deploy a wrapper for spark jobs... but we might need a custom one to do that [19:44:17] ottomata, so you think it deserves a specific wrapper? I can do that [19:44:29] mforns: not sure, would be nicer if it didn't... but [19:44:30] hm [19:44:35] yea [19:44:46] it's not going to be reused... [19:45:08] what's the 2 pass plan btw? to re-refine everything in bulk later? [19:45:21] what's the period? [19:45:24] for the second pass? [19:45:32] wait...1 week, sanitize last 4? [19:45:35] i don't remember [19:45:49] since=46days until=45days [19:45:52] (03CR) 10Awight: Schema for ORES scores (033 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [19:45:57] or sth like that [19:46:07] oh, so refine a full day from 45 days ago [19:46:08] i see. [19:46:13] yea [19:46:17] and the salt rotates every quarter? [19:46:20] yes [19:46:27] so we'd need the salt from 45 days ago to do that properly?
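[aside] To make the salt discussion above concrete: sanitization hashes identifying fields with a per-quarter salt, so a second pass run 45 days later can only reproduce (and safely overwrite) the first pass's output if it uses the same salt the first pass used. A minimal Python sketch; the salt values are made up and the exact hash construction (HMAC-SHA256) is an assumption, not the actual refinery implementation:

```python
import hashlib
import hmac

def sanitize_field(value: str, salt: bytes) -> str:
    # Hash a sensitive field with a salt; re-running sanitization
    # reproduces the same output only with the same salt.
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()

old_salt, new_salt = b"salt-2018-Q4", b"salt-2019-Q1"  # hypothetical values

first_pass = sanitize_field("session-abc", old_salt)
second_pass_right = sanitize_field("session-abc", old_salt)   # same salt: stable
second_pass_wrong = sanitize_field("session-abc", new_salt)   # rotated salt: diverges
```

If the second pass runs with the rotated salt, it silently rewrites already-sanitized data with inconsistent hashes, which is exactly the problem mforns describes.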
[19:46:38] yes, but that is super-easy [19:46:47] (03CR) 10Awight: Schema for ORES scores (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [19:46:47] already working, just have to change a number in puppet [19:46:59] right, it seems we'd want to keep it longer, is what you are saying? [19:47:03] just in case? [19:47:10] yes [19:47:34] keep the old salt for extra 45 days and use it in the second pass if present [19:47:50] if not present, use the regular salty [19:47:52] salt [19:48:00] joal: o/ I responded to the path question in CR, but might need more chatting because I think I'm failing to understand something here. [19:48:37] ottomata, can you pass a dict with the params to refine_job? [19:48:54] hm, mforns, maybe it'd be best to make the logic detect which salt to use based on time period? [19:48:57] so that I can pass the same dict (with overrides) to both refine_jobs? [19:49:02] the salt is passed into the job directly, right? its not discovered by job? [19:49:12] yes, passed [19:49:21] Hi awight :) [19:49:23] mforns: yes you can do that, we'd need to likely use merge() to merge the overrides onto the hashes [19:49:30] Reading your comments [19:49:32] mforns: hm. [19:49:40] its too bad the salt finding logic isn't something more like [19:50:59] date = 2018-12-01 [19:50:59] salt_file = ${path_to_salt}/${date}.salt [19:50:59] if !exists(salt_file) [19:50:59] salt_file = $path_to_salt/current.salt [19:51:00] ? [19:51:09] we could use: $(backup="foo"; if [ -f "$backup" ]; then echo "$backup"; else echo "bar"; fi) [19:51:30] that way the salt to use is based on the time period (assuming day only) [19:51:31] hm [19:51:41] then its way more flexible. [19:51:48] or even some index somewhere, that maps time periods to salt files [19:52:32] it seems fragile to just assume two salts, and use one if not the other.
also hard to test [19:52:51] better if the salt to use is predictable [19:53:21] mforns: i think when you ask me questions i make your life more complicated... :p [19:53:27] hehehe, no [19:54:09] awight: My bad about partition-ordering - If models/versions belong to wikis, then indeed let's use the order you specified (maybe not for the public one though :) [19:54:17] I was looking at what the filename of the salt was, checking if it already contains the date, but no [19:55:23] joal: oh good point! The way that table will be queried is only by wiki, and we'll probably only purge by snapshot [19:55:29] (03CR) 10Ottomata: Schema for ORES scores (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [19:55:38] oops--queries will be by (wiki, snapshot) [19:55:46] mforns: what is the salt named now? [19:55:50] just eventlogging.salt [19:55:50] ? [19:56:01] yes [19:56:08] awight: also, the snapshot is a group for all the joint-subfolders :) [19:56:12] /user/hdfs/eventlogging-sanitization-salt.txt [19:56:16] aye [19:56:22] and what is the backup called? [19:56:35] /user/hdfs/eventlogging-sanitization-salt.txt.old I think [19:56:37] lookin [19:56:39] right ok [19:56:43] something like that anyway [19:56:59] yeah, it sounds like we should just expect a list of salts and some way to figure out which ones should be used for which time period [19:57:09] just rotating one backup is a little fragile [19:57:15] joal: cool, so now I have /wmf/data/ores/revision/score_public/snapshot=2018-12/wiki=enwiki, if that sounds right to you? [19:57:18] better to include the date even on the current one [19:57:26] then the logic to find the proper salt will always be the same [19:57:32] might even be better to not fall back to the latest salt [19:57:37] but then... how does puppet know the name of the salt?
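[aside] The dated-salt-file pseudocode ottomata sketches above can be written as a small runnable function. The naming convention (`<date>.salt`, `current.salt`) is the hypothetical one from the chat, and this uses a local filesystem for illustration, whereas the real salt lives on HDFS at /user/hdfs/eventlogging-sanitization-salt.txt:

```python
import os

def find_salt_file(path_to_salt: str, date: str) -> str:
    # Prefer a salt file named after the time period being sanitized;
    # fall back to the current salt only if no dated file exists.
    salt_file = os.path.join(path_to_salt, f"{date}.salt")
    if not os.path.exists(salt_file):
        salt_file = os.path.join(path_to_salt, "current.salt")
    return salt_file
```

With dated salt files, the second sanitization pass deterministically resolves the same salt it used the first time, instead of guessing between a "current" file and a single ".old" backup.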
[19:57:45] maybe if the expected salt is not present, the job should just die [19:57:47] (open to a better name than "score_public", I want to say "score_with_context") [19:57:48] awight: yes thank you :) [19:58:04] refining with the wrong salt will lead to unexpected data, right? [19:58:09] yes [19:58:09] awight: I'll leave you with our naming champion ottomata ;) [19:58:18] mforns: puppet won't ... [19:58:22] something will have to [19:58:28] wrapper script...or logic in scala somewhere? [19:58:43] haha :) I would trust him with a nick like that [19:58:46] haha [19:59:31] awight: i'd advise that if possible, you should try and keep your database/table_name dirs flat [19:59:35] ottomata, OK, will think, going to eat sth, will be back in a bit! [19:59:45] that way it is easy to know exactly what table a file path belongs to [19:59:49] (03CR) 10Joal: [C: 04-1] Schema for ORES scores (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [19:59:57] so /wmf/data/ores is your base location path for all tables in the ores database [20:00:07] and then any tables in there should have directories named after the tables themselves [20:00:24] so if your table is ores_revision_score_public, the path would be /wmf/data/ores/ores_revision_score_public [20:00:30] ok mforns_brb [20:00:37] ottomata: thanks, will do [20:01:14] ottomata: Nice one - I've messed up long ago in some places on the cluster (pageview/hourly for instance), and now realize that it doesn't help [20:02:08] ottomata: is "ores_revision_score_archive" a consistent name to give a table which is essentially a copy of "ores_revision_score" but with mediawiki_history metadata included?
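[aside] The flat database/table layout ottomata recommends above can be sketched as a tiny path helper. This is illustrative only (real locations are declared in the Hive DDL); the partition keys shown are the ones from this discussion:

```python
def table_location(base: str, database: str, table: str, **partitions) -> str:
    # Flat layout: <base>/<database>/<table>/<key>=<value>/...
    # so any file path maps unambiguously back to its database and table.
    parts = [base.rstrip("/"), database, table]
    parts += [f"{key}={value}" for key, value in partitions.items()]
    return "/".join(parts)

# e.g. the ORES table discussed above (kwargs keep insertion order in py3.7+)
path = table_location("/wmf/data", "ores", "revision_score_public",
                      snapshot="2018-12", wiki="enwiki")
```

The point of the convention is the inverse mapping: given only a file path under /wmf/data, you can read off the database and table without consulting the metastore, which an extra nesting level (like the earlier `ores/revision/score_public` proposal) breaks.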
[20:02:23] (03CR) 10Ottomata: Schema for ORES scores (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [20:02:46] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10mpopov) Nevermind, per T170022#4800915 & T170022#4866564 I guess there's nobody actually managing Maps and RI is just doing maintenance... [20:03:17] awight: is this your _public table? [20:03:25] yes [20:03:42] It's for creating dumps. [20:04:12] awight: q, aren't all of these fields avail on the original mediawiki_revision_score table? [20:04:38] from the event? [20:04:43] ottomata: almost--the big catch is that I want to honor new suppressions. [20:04:46] or...am i confused [20:04:47] ah [20:05:05] that's just deleting/removing the appropriate row? [20:05:06] also, the schema is notably different for being normalized to one model per row [20:05:15] yes, schema different ya [20:05:16] ottomata: no, it's redacting the page_title and possibly user_text [20:05:21] ash [20:05:21] ah [20:05:29] so you need to use mw history to know that then i see [20:05:41] yeah, it's nasty [20:05:58] I'm open to any approach here, but this is all I've come up with so far. [20:06:24] awight: re: prediction, don't some models make a list of predictions? [20:06:33] yes [20:06:35] I actually view this as nice :) Being able to follow page-moves/user-renames is fun :) [20:06:47] you just gonna join with comma? [20:07:02] just noticed that you have prediction as a string [20:07:05] ottomata: oh sorry, not a list of predictions, but all models have a list of probabilities. current type is: [20:07:09] `probability` array<struct<name:string,value:double>> comment 'Predicted probability for each class.'
prediction will always be one string [20:07:20] awight: i think some have a possible list of predictions [20:07:29] otherwise we wouldn't have made predictions an array [20:07:34] at least, i think that's what aaron told us [20:07:38] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10Neil_P._Quinn_WMF) >>! In T212386#4862292, @jcrespo wrote: > There is already an... [20:07:42] hmm /me checks that [20:08:08] ottomata: ah that may be to deal with multiple models per row? [20:08:31] I'd expect it to be a map from model_name to prediction actually. [20:08:39] no, because scores itself is already an array [20:08:47] each revision has an array of scores [20:09:03] and each score has an array of probabilities, and an array of predictions (which is usually single entry, but not always) [20:09:27] gotcha: `scores` array<struct<model_name:string,model_version:string,prediction:array<string>,probability:array<struct<name:string,value:double>>>>, [20:09:42] yup [20:10:16] 10Analytics, 10Contributors-Analysis, 10Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (10Neil_P._Quinn_WMF) >>! In T206894#4698640, @Milimetric wrote: > Sure, no problem. It's probably a good idea to... [20:10:29] i don't remember when the prediction has multiple values though...but i'm pretty sure its possible. since the schema needs to be the same for every score, we needed to support it [20:10:32] ottomata: okay we do have a model that can return a list of predictions, thanks for the catch! [20:10:38] :) [20:10:57] as for naming.... i don't know! i don't think _archive is quite right... [20:11:14] _dump? [20:11:24] _with_context?
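[aside] A sketch of the "one model per row" normalization being discussed, using Python dicts in place of the Hive structs. Field names follow the schema fragments in the chat (`model_name`, `prediction`, `probability`); the model names and values are made up for illustration:

```python
# One event row carries an array of scores, one element per model.
event = {
    "rev_id": 12345,
    "scores": [
        {"model_name": "damaging", "prediction": ["true"],
         "probability": [{"name": "true", "value": 0.91},
                         {"name": "false", "value": 0.09}]},
        # prediction is an array because some models emit several classes
        {"model_name": "articletopic", "prediction": ["History", "Military"],
         "probability": [{"name": "History", "value": 0.72}]},
    ],
}

# The normalized ORES table denormalizes this to one row per
# (revision, model), which is what the dump/join use cases want.
rows = [
    {"rev_id": event["rev_id"],
     "model_name": score["model_name"],
     "prediction": score["prediction"],
     "probability": score["probability"]}
    for score in event["scores"]
]
```

Keeping `prediction` as an array in the exploded rows preserves the multi-prediction models that came up in the review.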
i kinda like _public better, but it would be nice if it was clearer about how ores_revision_score is a different schema than ores_revision_score_public [20:11:36] _dump makes sense if the format is dump-oriented (CSV for instance) [20:11:39] yeah [20:11:48] the only reason you'd use this table is for the dump right? [20:11:56] otherwise folks would just query the event? [20:11:58] event table* [20:11:59] ? [20:12:22] ottomata: possibly not actually - missing a lot of event data in the explode version [20:12:33] explode+d [20:12:35] ya [20:12:45] hm [20:13:09] and its not adding more than the event table has [20:13:09] ottomata: I think the ores_revision_score table will become the most valuable for joins, actually. events only have the contemporary models, but the ores_revision_score table will be backfilled with new models run against old revisions. [20:13:19] oh! nice. [20:13:32] _export [20:13:33] ? [20:13:48] careful--I'm happy to run with any name here :) [20:13:54] haha [20:14:08] I still like _public the best [20:14:19] we do that with some other tables I think? [20:14:37] ...do we? [20:14:53] My only hesitation is that it makes ores_revision_score seem implicitly private, whereas it's just lacking the metadata columns entirely [20:14:54] maybe we don't! [20:15:02] hm [20:15:03] true [20:15:11] _dump is ok? [20:15:22] The other option is to rename the core table to something else, so that the table to be used by users is revision_score? [20:15:24] maybe _export is better than _dump ? [20:15:50] heh awight... [20:15:58] IF things other than revisions will be scored in the future [20:16:03] ... [20:16:09] you could make the small score even more generic [20:16:15] ores_score [20:16:32] `id` bigint, [20:16:32] `entity` string (e.g.
revision), [20:16:59] join on id where ores_score.entity = 'mediawiki_revision' [20:16:59] :p [20:17:05] probably not a good idea^ [20:17:08] but it is AN idea :p [20:17:48] ottomata: I've been moving away from this sort of polymorphism for ORES data in the MediaWiki DB and API, fwiw. [20:17:57] ok ok [20:18:00] it's bad? [20:18:11] I think it introduces extra complexity in the long run [20:18:14] aye [20:18:16] you are probably right [20:18:19] ores.revision_score_raw [20:18:28] ores.revision_score [20:18:28] naw not raw [20:18:35] ok [20:18:37] :) [20:18:38] also, use cases are very distinct, there's never a workflow that will query both page scores and revision scores in the same query [20:18:52] ores_revision_score_composite [20:18:54] hmm, naw [20:19:07] it's not adding anything that the event table doesn't already have [20:19:14] ores_revision_score_context ? although the trick is that it's a snapshot of context [20:19:15] i dunno _export is fine... it's really only for exporting right? [20:19:19] yes ^ [20:19:28] also, if the DB is ores, do we need the ores prefix for the table awight ? [20:19:37] of course, someone might get the idea that it's a fun table to use directly in hadoop :) [20:19:39] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10Marostegui) I haven't found much on wikitech, so: ` marostegui@tools-bastion-03:...
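The generic `ores_score` idea floated above (one table keyed by `id` plus an `entity` discriminator column, queried as `join on id where ores_score.entity = 'mediawiki_revision'`) can be sketched to show the filtering cost it implies. All names below are illustrative; the idea was ultimately set aside in favor of dedicated per-entity tables.

```python
# Illustrative sketch of the polymorphic "ores_score" design discussed
# above: every lookup must filter on the entity type, which is the
# extra complexity the conversation decides against.
ores_score = [
    {"entity": "mediawiki_revision", "id": 101, "prediction": "damaging"},
    {"entity": "mediawiki_page", "id": 7, "prediction": "stub"},
]

# equivalent of: ... where ores_score.entity = 'mediawiki_revision'
revision_scores = [
    row for row in ores_score if row["entity"] == "mediawiki_revision"
]
assert [r["id"] for r in revision_scores] == [101]
```

With dedicated tables (e.g. `revision_score`, `page_score`), this filter disappears and each table can carry entity-specific columns.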
[20:19:45] awight: that's fine [20:19:47] joal: I'd love to drop [20:19:48] awight: that's my idea [20:20:01] joal: i'm fine with dropping the ores_ table prefix [20:20:02] so [20:20:23] awight: I think the data joined with mediawiki-history is actually more useful to others than the `raw` one [20:20:23] joal: I guess that's fine, but the tradeoff would be much older (1-30 days older) data [20:20:42] for many consumers, that'll probably be okay [20:21:03] awight: we're talking stats and trends - for real-time-ish, use events ;) [20:21:05] /wmf/data/ores/{revision_score, revision_score_error, revision_score_export} [20:21:19] heh [20:21:26] revision_score_history ? :[ [20:21:50] ottomata: kk [20:21:57] ottomata: the more I think of it, the more I'd like the 'export' table to be the base for others (in parquet etc - Because it contains metadata) [20:22:07] aye [20:22:14] joal that makes sense too, if that's the case _export is a bad name [20:22:18] :) [20:22:18] right [20:22:19] as is _dump [20:22:30] correct, and CSV format is wrong as well [20:22:31] revision_score_with_context? [20:22:48] revision_score_augmented? [20:22:51] mwarfv [20:22:53] hehe [20:23:01] Now I've really brought my bikeshed with me [20:23:26] this is more like where to put the doorknob on the bikeshed, not what color to paint :p [20:23:34] should we put the doorknob on the roof? [20:23:35] probably not! [20:23:59] :) let's make it big enough for electric cargo bikes [20:24:13] _public is still fine with me! [20:25:02] ottomata: or we reuse the webrequest approach using databases: ores_raw, ores [20:25:03] i think as long as there's docs about what the tables are, it is fine [20:25:09] +1 --^ [20:25:10] ok I'm going with that. It actually makes sense usage-wise, since the more normalized revision_score tables will often be in a semi-backfilled state. [20:25:19] joal: naw because raw as we mean it isn't what this is, [20:25:37] raw is event ...
[20:25:41] raw is more like unrefined input data, yes the _public table comes from somewhere else but [20:25:47] right [20:26:03] public it'll be :) [20:26:05] calling this raw would be like calling page_history raw but mediawiki_history refined [20:26:13] this is just a step in the pipeline, not raw [20:33:00] (03CR) 10Joal: [C: 04-1] Schema for ORES scores (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [20:33:15] awight: Just added a comment about format for the public table --^ [20:38:20] (03PS9) 10Awight: Schema for ORES scores [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) [20:38:33] ^ integrates our discussion so far [20:42:14] (03CR) 10Awight: Schema for ORES scores (033 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight) [20:42:40] Arf I might have been misleading in my comment awight - For the public table I didn't mean remove model/version from the partitions, but rather move snapshot to the top level (as you did) - I can think of use-cases where having all models/versions in the same files will be useful - But I can also see the advantage of having them split [20:43:58] Interesting, I was thinking that the rebuild will always be the entire set with no model/version distinction, but reconsidering, if this will be used for queries "model" might help a lot [20:44:23] maybe not "model version" since end-users will be agnostic, and I don't think we'll be purging by model or model version ever [20:48:04] awight: If most querying should be done by model and version, then it's probably useful to add the version partition - If most queries are about comparing versions inside a model, then not [20:48:33] I think I don't understand the "end-users will be agnostic" part of your sentence :) [20:48:54] I think no queries will filter
on version, in other words. Only one version (the latest) should be provided in any given snapshot. [20:50:29] Ah I had missed that [20:50:33] It's only included as a column for informational purposes, so a researcher can look at our errata later and say "nuts, I used enwiki-damaging-0.4.0 which had these known problems" [20:50:46] thanks for helping me think through this stuff! [20:51:51] That's great awight - I had a demo doing the same exact thing you do (joining events with history) for analytics purposes :) Thanks for making it happen! [20:51:52] (03PS10) 10Awight: Schema for ORES scores [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) [20:52:26] awight: For the export use-case, last-version for models makes sense - For analytics purposes, more makes sense :) [20:52:46] Being able to compare versions would be great I assume [20:53:27] hmm interesting. We do have version history for overall health statistics of each model version, but detailed scores might be neat also [20:53:41] I'll make a note about that... [20:54:02] awight: I'll show you my demo tomorrow if you wish (too late for me tonight) ;) [20:54:39] I can be a test audience, or let me know when you present to a larger group! [20:55:34] I showed halfak a while back - it hasn't evolved since then - mostly showing fun stats about models [20:56:14] ok - gone for tonight folks - See you tomorrow [20:56:17] o/ [20:56:38] Ooh. Pull me in for that demo too. I want to talk more about it. [20:56:39] o/ [20:56:44] good evening/night joal [20:57:19] halfak: Interesting point above about preserving scores from older model versions... [21:01:02] My general sense is that we shouldn't purge old scores if storage space isn't an issue. [21:01:23] I'd really like to make it easier for consumers to experiment with old models/old scores. [21:01:42] But I don't really see keeping old scores in hadoop as a good solution for that.
[21:03:15] They'll be nicely contained in directories, so it's easy to ignore them or monitor storage usage. [21:03:46] i.e. the partitioned data paths are like: /wmf/data/ores/revision_score/wiki=enwiki/model=damaging/model_version=0.0.1 [21:05:31] Makes sense. [21:13:49] either way, I'll plan the import jobs with the assumption that older model scores may or may not be present. [21:46:17] 10Analytics, 10Research, 10Wikidata: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (10bmansurov) [21:47:54] 10Analytics, 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10bmansurov) [21:48:47] 10Analytics, 10Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (10bmansurov)
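The Hive-style partition layout mentioned above (`/wmf/data/ores/revision_score/wiki=enwiki/model=damaging/model_version=0.0.1`) can be sketched with a small helper. The base path and partition keys come from the log; the helper function itself is illustrative, not refinery code.

```python
# Sketch of building a Hive-style partition directory path, where each
# partition is a key=value path segment under the table's base path.
def partition_path(base, **partitions):
    """Join a base path with key=value partition segments, in order."""
    segments = [f"{k}={v}" for k, v in partitions.items()]
    return "/".join([base.rstrip("/")] + segments)

path = partition_path(
    "/wmf/data/ores/revision_score",
    wiki="enwiki", model="damaging", model_version="0.0.1",
)
print(path)
# /wmf/data/ores/revision_score/wiki=enwiki/model=damaging/model_version=0.0.1
```

This is what makes old model versions easy to ignore or monitor: each one lives under its own `model_version=...` directory, so partition pruning skips them unless a query asks for them.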