[00:01:21] (PS9) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:02:25] (CR) jerkins-bot: [V: -1] Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[00:02:34] (PS10) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:03:06] (CR) jerkins-bot: [V: -1] Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[00:03:49] (CR) Zhuyifei1999: Add Execution time to recent queries table of Quarry (2 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[00:31:51] (PS11) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:34:51] Analytics, Product-Analytics, Readers-Web-Backlog: Popups schema is keeping page/session id longer than 90 days - https://phabricator.wikimedia.org/T207670 (Jdlrobson)
[00:36:02] (CR) Stibba: "@Zhuyifei1999 Should the second/seconds be adjusted depending if it's 1 second or not?" (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[01:26:54] (CR) Zhuyifei1999: [C: 1] Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[01:27:59] seems there is a hive query using 1217 cores and 2494464 MB RAM right now... is that why the cluster seems a bit sluggish?
[01:28:14] (CR) Zhuyifei1999: [C: 1] "Honestly, I'd just use seconds. It's almost impossible to have exactly 1.00 second." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[01:51:44] Analytics, Analytics-Data-Quality, Product-Analytics: mediawiki_history datasets have null user_text for IP edits - https://phabricator.wikimedia.org/T206883 (Neil_P._Quinn_WMF) declined>Open Hey @JAllemandou! Sorry to weigh in so late, but why can't we simply copy the `event_user_text_histo...
[02:49:20] neilpquinn: https://phabricator.wikimedia.org/T209536
[02:50:25] Analytics, Analytics-Kanban: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (Tbayer) Is this related to T206279 ?
[02:54:24] Analytics, Analytics-Kanban: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (Neil_P._Quinn_WMF) Thanks for working on this, @JAllemandou! I've just started suffering this on a lot of queries, including ones that previously worked fine. The error message is "OperationalError:...
[02:59:57] Analytics, Analytics-Kanban, Product-Analytics: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (Neil_P._Quinn_WMF)
[03:38:44] Analytics, Analytics-Data-Quality, Product-Analytics: mediawiki_history datasets have null user_text for IP edits - https://phabricator.wikimedia.org/T206883 (Milimetric) @Neil_P._Quinn_WMF I agree with that solution, seems fine to me. By the way, we might be having some trouble with user_text due t...
[03:50:24] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Milimetric) >>! In T209031#4752361, @Bstorm wrote: > I'm aiming to write tests for this script shortly because it is too compl...
[04:38:24] Analytics, Analytics-Kanban, New-Readers, Patch-For-Review: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) Looks like @Prtksxna resolved this, just closing the loop. @Nuria one question remains - it looks like the totals are updated on a delay. Prateek has looked...
[06:22:58] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Marostegui) >>! In T209549#4751000, @Milimetric wrote: > > @Marostegui do you have plans to add ipblocks_restrictions to filtered_tables.txt? If you have any patches along those lines,...
[06:27:39] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (Marostegui) Even know the name `dbstore1001` suggests otherwise....this host has nothing to do with the future usage for these dbstore1003-5 hosts, so it can...
[06:31:18] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (elukey) a:elukey>Cmjohnson >>! In T209620#4752864, @Marostegui wrote: > Even know the name `dbstore1001` suggests otherwise....this host has nothing t...
[06:38:07] PROBLEM - Hadoop NodeManager on analytics1029 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[06:54:03] so an1029 doesn't look good
[06:54:13] already a disk failed and a ton of errors related to the others
[06:54:33] since it is going to be decom, it might be wise to just remove it from the cluster now
[06:56:25] I forced a reboot to see if fsck solves any problem
[07:13:11] might be 3 disks with issues
[07:13:28] and IIRC we replaced a disk on an1029 not long ago
[07:29:25] RECOVERY - Hadoop NodeManager on analytics1029 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[07:32:19] all right let's see if it works
[07:32:32] one last chance before decomming it
[08:17:39] Analytics, Research: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (JAllemandou) Not sure when we'll be able to do that. However there is more recent dump available: `/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20181001` :)
[08:23:30] Analytics, Analytics-Kanban, Product-Analytics: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (JAllemandou) >>! In T209536#4752726, @Tbayer wrote: > Is this related to T206279 ? I don't think the root-cause is the same (one problem happens on HiveServer2, the other on...
[08:24:57] Analytics, Research, Wikidata: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (Addshore)
[08:27:27] Analytics, Analytics-Data-Quality, Product-Analytics: mediawiki_history datasets have null user_text for IP edits - https://phabricator.wikimedia.org/T206883 (JAllemandou) Hi @Neil_P._Quinn_WMF , While I understand the usage frustration, keeping the IPs in `event_user_text_historical` is for me a mat...
[08:30:01] Hi addshore - Would you by any chance be interested in T209655 ? :)
[08:30:01] T209655: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655
[08:30:11] Hi elukey - Thanks for giving an1029 a last chance
[08:33:11] the older workers are falling over :D
[08:33:30] joal: in it being done or doing it? ;)
[08:33:34] joal: still draft https://gerrit.wikimedia.org/r/474113
[08:33:40] elukey: disks only I'm assuming
[08:33:49] it seems so yes
[08:33:52] addshore: being done I assumed :)
[08:33:57] Yes :D
[08:33:59] :D
[08:34:16] If it means I can do queries on the wikidata data, such as those that we were doing some weeks / months ago
[08:34:22] or making them easier to do
[08:34:39] addshore: That's the idea, with some delay in data though
[08:36:18] yup, that sounds fine
[08:36:21] how much delay?
[08:36:32] elukey: This confirms what we knew about hadoop: mostly I/Os :)
[08:36:38] addshore: a few days I'd assume
[08:36:45] addshore: at the beginning of the month
[08:36:45] I'd basically love to kill off my script which currently trawls through the dump pulling data out
[08:36:48] yes a few days sounds great :)
[08:36:58] joal: would it be all wikidata content? not just statements?
[08:37:06] so, sitelinks and labels etc too?
[08:37:09] addshore: We'd get the thing monthly, so at the end of the month, there is more lag
[08:37:26] addshore: I use the wikidata-json dumps, so yes, everything :)
[08:37:34] yeh, all sounds great
[08:38:10] addshore: and maybe at some point I'll have a graph view of the thing, allowing for SPARQL-like querying for huge stuff :)
[08:38:52] yup, that would be great
[08:39:04] * joal loves wikidata :)
[08:39:06] I'm sure GoranSM would like that too :)
[09:14:33] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (Joe) So one scenario in which this can h...
[09:17:15] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (Joe) So while I think my theory is prett...
[10:35:17] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (elukey) Not sure if I am doing anything...
[10:58:47] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (elukey) ` elukey@kafka2002:~$ grep 10.64...
[13:26:37] Analytics, Analytics-EventLogging: Resurrect eventlogging_EventError logging to in logstash - https://phabricator.wikimedia.org/T205437 (phuedx) I'd actually be interested in picking this up as a 10% project to better understand our logging infrastructure and deployment of the various EL services. If tha...
[14:00:56] Analytics, Research, Wikidata: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (bmansurov) Thanks, @JAllemandou! The more recent dumps are very useful.
[14:06:37] hmmmmm joal after a lot of wailing and gnashing of teeth downgrading the version... in my local installation with cassandra 2.2, it's still behaving as expected: the field is added without problems
[14:07:20] and this is without even upgrading to the latest version of restbase-mod-cassandra
[14:10:42] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (mobrovac) Very interesting! Thank you fo...
[14:10:45] joal: here's what I think happened in beta: for some reason the aqs service stopped working, I didn't realise and kept restarting the malfunctioning service, but of course there's no update of the cassandra tables
[14:13:36] (PS2) Fdans: Add offset and underestimate to uniques table schema [analytics/aqs] - https://gerrit.wikimedia.org/r/473708 (https://phabricator.wikimedia.org/T167539)
[14:14:32] elukey: you think we could maybe merge this and unbreak beta? :D
[14:19:54] fdans: what do you mean?
[14:20:32] elukey: like, deploy the change "properly" in beta
[14:21:12] is it ready for prod or does it still need testing?
[14:21:48] because we can live with beta in that state for a bit
[14:22:44] elukey: in theory it's ready as verified by my local setup, but I'd like to push the change to master and deploy in beta from scap
[14:22:47] my main point is that if we need for some reason to deploy something urgent in AQS there will be changes that need to be reverted
[14:22:58] ahh to properly test
[14:23:00] sure sure
[14:23:02] makes sense then
[14:23:18] I have little context in that code so my +1 is meaningless :(
[14:23:25] let's wait for a review ok?
[14:24:13] okkkeeeeeeiiiii elukey I'll hold on! in the meantime I think Imma set up aqs for CI
[14:41:32] (CR) Joal: [C: 1] "LGTM ! Let's test :)" [analytics/aqs] - https://gerrit.wikimedia.org/r/473708 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[14:41:45] Hey fdans ! Sorry missed your ping
[14:41:51] thank youuuu joal
[14:41:57] fdans: changes look good, let's test !
[14:42:23] fdans: will you test in beta first, or do we go to prod straight? If prod, let's wait to monday :)
[14:42:40] yeaaaa elukey joal any of you want to help me fixing the mess I made in beta?
[14:43:05] joal: whatever you prefer, according to my tests the worst thing that can happen in prod is that the fields don't get created
[14:43:29] fdans: I hear that, but you know the rule :)
[14:43:37] yeayea
[14:45:13] fdans: http://www.commitstrip.com/en/2018/11/06/experience-is-a-candle/?setLocale=1
[14:47:19] joal: hehe, wasn't suggesting that we deploy now, just saying that this patch is pretty safe to go to prod when we deploy
[14:47:37] Sounds good fdans :)
[14:47:50] I just like pasting commit-strips :)
[14:47:55] yes please no prod deploy on a friday evening :)
[15:01:52] Gone for kids folks - Looks like, as on Wednesday, I'll miss standup :(
[15:01:56] Will confirm later
[15:10:03] ack!
[15:36:37] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (elukey) So I changed the MAC address of an-worker1080 to the 10G interface listed in the System Setup as "connected", and forced a PX...
[15:50:43] * elukey hates hardware
[15:58:21] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (elukey) I also tried to disable explicitly the integrated nic's boot option (and allow only the one from the 10G NIC) but didn't work...
[16:16:42] Analytics, Analytics-Kanban: Set up CI system on AQS - https://phabricator.wikimedia.org/T209711 (Nuria)
[17:02:37] ping joal
[17:10:33] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1079.eqiad.wmnet'...
[17:25:02] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (elukey) Chris solved the mistery! ` │ 17:22 ...
[17:29:17] Analytics, Analytics-Kanban, New-Readers, Patch-For-Review: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Nuria) Piwik data is updated once a day.
[17:40:40] * elukey off!
[17:43:23] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1079.eqiad.wmnet'] ` and were **ALL** successful.
[18:07:01] mforns_: my meeting was a no-show, do you want to work on changes together in batcave?
[18:07:21] mforns_: i have 30 minutes
[18:07:26] nuria, no prob, I got the change that broke the job
[18:07:40] it wasn't to EventLoggingSanitization, rather to Refine.scala
[18:07:43] mforns_: ah ok, let's go to bc and you can describe it
[18:07:56] and it was even me who deployed it on Oct 4th
[18:08:03] ok
[18:44:04] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (TBolliger) What functionality/possibilities would we get from adding a table to filtered_tables? Sorry for the question, and thank you in advance for your answer — this is my first time d...
[18:50:11] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Marostegui) None really. filtered_tables.txt is a file we use for sanitization to and put column triggers in place for those columns that should not be exposed on labs, so the trigger wil...
[18:50:47] (CR) Framawiki: Add Execution time to recent queries table of Quarry (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[19:01:53] (PS12) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[19:06:51] (CR) Framawiki: [C: 2] "Great!" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[19:07:36] (Merged) jenkins-bot: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[19:07:38] Quarry, Google-Code-in-2018, Patch-For-Review: Show execution time in the Recent Queries page - https://phabricator.wikimedia.org/T71264 (Framawiki) Open>Resolved a:Stibba
[19:13:44] Analytics, Analytics-Data-Quality, Product-Analytics: mediawiki_history datasets have null user_text for IP edits - https://phabricator.wikimedia.org/T206883 (Neil_P._Quinn_WMF) >>! In T206883#4753059, @JAllemandou wrote: > Hi @Neil_P._Quinn_WMF , > While I understand the usage frustration, keeping t...
[19:17:35] Analytics, Analytics-Kanban, Product-Analytics: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (Neil_P._Quinn_WMF) >>! In T209536#4753050, @JAllemandou wrote: >>>! In T209536#4752726, @Tbayer wrote: >> Is this related to T206279 ? > > I don't think the root-cause is the...
[19:20:05] Quarry: Create api health point for monitoring - https://phabricator.wikimedia.org/T205151 (Framawiki)
[19:23:43] Analytics, Analytics-Cluster: Upgrade Hive to ≥ 2.0 - https://phabricator.wikimedia.org/T203498 (Neil_P._Quinn_WMF) >>! In T203498#4745341, @elukey wrote: > Now I am wondering how much stability we'd sacrifice if we upgrade to Hive 2.11 like described above. It would be extremely useful to some people, b...
[19:59:24] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Nuria) I think we are mixing things here, scooping mediawiki tables to data lake on hadoop (private cluster) and having that data available in labs (public, sanitized data). @Tbollinger...
[21:20:10] Hi analytics eng people! It looks like EventLogging events have stopped appearing in MySQL in beta labs about 6 hours ago?
[21:20:39] select max(timestamp) from NavigationTiming_18435897; returns 2018-11-16 15:38:55 UTC, and for many other schemas it's similar or earlier
[21:22:41] a-team: --^^
[21:43:31] milimetric: did we ever work out what happened to passport-mediawiki-oauth?
[21:44:26] milimetric: i have a patch for it
[21:55:46] ^ cc tgr who might know?
[22:05:06] Hi, I'm looking for a pointer to the code or config that generates analytics dumps from hadoop, e.g. https://dumps.wikimedia.org/other/pageviews/readme.html
[22:13:17] jdlrobson: I wasn't even aware something happened to it
[22:13:31] it just disappeared from wikimedia repos one day
[22:13:38] tgr: it used to be on github under wikimedia
[22:14:20] also no github link on https://www.npmjs.com/package/passport-mediawiki-oauth
[22:14:42] Should I set one up based on the files in my node_modules ?
[22:15:14] i would need help with the publishing aspect though..
[22:18:08] RoanKattouw: hey I'm here, looking into this now
[22:18:14] I don't suppose GitHub has anything sane like a log where one could check?
[22:19:45] there's an audit log but no search and it has about one page per day :-/
[22:20:34] jdlrobson: do you know the exact repo name?
[22:21:08] nvm, it was passport-mediawiki
[22:24:02] wonderful, repo searches in the GitHub audit log fail if the repo was deleted
[22:27:15] jdlrobson: Chad apparently deleted it in July along with a couple other repos
[22:27:29] mm weird
[22:27:42] the audit log doesn't have anything useful like a summary so probably best bet is to ask him
[22:27:52] i guess he didn't realise it was being used?
[22:28:32] maybe https://github.com/milimetric/passport-mediawiki-oauth is the repo?
[22:28:50] ahh interesting..https://github.com/milimetric/passport-mediawiki-oauth/pull/1 is the bug I want to fix
[22:28:54] so i guess i tried to fix that already :)
[22:29:00] i guess this is on milimetric to work out what to do here!
[22:29:36] hi yall, sorry was off today but I’ll jump on and try to answer questions in a few minutes when I get home
[22:32:28] Analytics, Dumps-Generation, ORES, Scoring-platform-team, and 3 others: [Epic] Make ORES scores available in Hadoop and as a dump - https://phabricator.wikimedia.org/T209611 (awight)
[22:46:21] Analytics, ORES, Scoring-platform-team: Choose HDFS paths and partitioning for ORES scores - https://phabricator.wikimedia.org/T209731 (awight)
[23:24:08] Analytics, ORES, Scoring-platform-team: Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (awight)
[23:24:31] Analytics, ORES, Scoring-platform-team: Choose HDFS paths and partitioning for ORES scores - https://phabricator.wikimedia.org/T209731 (awight)
[23:27:49] Analytics, ORES, Scoring-platform-team: Include feature values in ORES changeprop stream - https://phabricator.wikimedia.org/T209734 (awight)
[23:28:43] Analytics, ORES, Scoring-platform-team: Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (awight)
[23:28:45] Analytics, ORES, Scoring-platform-team: Choose HDFS paths and partitioning for ORES scores - https://phabricator.wikimedia.org/T209731 (awight)
[23:33:35] Analytics, Dumps-Generation, ORES, Scoring-platform-team, and 3 others: [Epic] Make ORES scores available in Hadoop and as a dump - https://phabricator.wikimedia.org/T209611 (awight)
[23:39:30] Analytics, ORES, Scoring-platform-team: Backfill ORES Hadoop scores with historical data - https://phabricator.wikimedia.org/T209737 (awight)
[23:43:40] Analytics, ORES, Scoring-platform-team: Backfill ORES Hadoop scores with historical data - https://phabricator.wikimedia.org/T209737 (awight)
[23:47:17] Analytics, ORES, Scoring-platform-team: Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (Nuria) @awight FYI that events need to abide to a schema that can be persisted to sql: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines and tha...
[23:50:55] Analytics, ORES, Scoring-platform-team: Choose HDFS paths and partitioning for ORES scores - https://phabricator.wikimedia.org/T209731 (Nuria) It is worth looking at already existing event data, if we want to reuse the logic that reads events and persists those to hive partitions cannot be schema dep...
[23:52:04] RoanKattouw: beta labs has problems with disks getting full, let me see
[23:52:55] awight: analytics dumps from hadoop... you mean data files?
[23:53:26] awight: there are many data files generated by different pieces of code, an example:
[23:54:11] awight: this is the code that generates for example mediacounts file: https://github.com/wikimedia/analytics-refinery/tree/master/oozie/mediacounts
[23:54:57] nuria: Great, thanks! That's the repo I needed.
[23:55:08] awight: you probably want to look at oozie: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie
[23:57:09] Analytics, Dumps-Generation, ORES, Scoring-platform-team: Produce dump files for ORES scores - https://phabricator.wikimedia.org/T209739 (awight)
[23:58:20] Analytics, Dumps-Generation, ORES, Scoring-platform-team: Produce dump files for ORES scores - https://phabricator.wikimedia.org/T209739 (awight) Pointers from @Nuria: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie https://github.com/wikimedia/analytics-refinery/tree/master/o...
[23:58:43] Analytics, Dumps-Generation, ORES, Scoring-platform-team: Produce dump files for ORES scores - https://phabricator.wikimedia.org/T209739 (awight)