[00:40:01] did something change on stat1004? i can't ssh in any more .. [00:40:06] ssh stat1004.eqiad.wmnet < am i doing something stupid? [00:54:19] worked it out.. was using bast4001 not 4002 [01:22:37] jdlrobson: AH thanks I've been falling back to bast2001 cos too lazy to do ^ that homework. [07:12:48] morniiiing [07:14:21] o/ [07:39:11] query fdans [07:39:14] ufff [07:39:36] yeah Fran I think about you sometimes <3 [07:53:20] so the netflow cron has been migrated to timer! [08:26:10] Hi elukey :) [08:26:35] Quick question - YARN UI still says 14 decommissioned nodes - Is that expected? [08:28:07] joal: lemme try one thing [08:28:23] joal: better now? [08:28:40] nope :( [08:28:49] https://yarn.wikimedia.org/cluster/nodes/decommissioned [08:29:31] https://yarn.wikimedia.org/cluster/nodes doesn't show them though [08:29:44] maybe the decom list remains as historical list? [08:30:09] * joal doesn't know [08:30:31] In the scheduler view, decom nodes count says 14 [08:30:31] I just did a refreshNode command, the hosts.exclude doesn't mention them anymore [08:30:49] maybe a full restart is needed? [08:31:42] we can do it yes, although I don't think it is a massive problem now, we could wait for the next restart (for say jvm upgrades etc..) 
[08:31:45] buuut we can do it now [08:31:53] elukey: no no - Let's wait [08:32:17] elukey: I was just wondering if the exclude file or something else was still in decom mode - Looks like no :) [08:32:55] ack :) [08:33:06] to be super sure I have removed the yarn/hdfs packages from the old nodes [08:33:19] and they have role::spare::system now in puppet [08:33:29] so there is 0 chance, in theory, that they come back alive [08:33:53] I was pretty sure you had dead-shot them ;) [08:34:23] :D [08:43:46] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (10elukey) [08:58:30] It just came to mind that I have been working with you guys for 3 years! [08:58:52] Happy workday elukey :) [09:00:19] * joal loves my team-mates <3 [09:01:11] \o/ [09:07:50] 10Analytics, 10Patch-For-Review, 10User-Elukey: Varnishkafka error to investigate: Required feature not supported by broker - https://phabricator.wikimedia.org/T210939 (10elukey) 05Open→03Resolved The issue seems a one time only, and the new alarms have been very reliable over the past months. Closing th... [09:10:44] so the netflow camus timer seems to work fine, there is a minor nit to solve with logging but I filed a code review [09:11:11] once solved I'll start moving more camus jobs [09:11:17] is it ok? [09:11:35] \o/ yes elukey :) [09:11:39] super :) [09:37:15] (03PS6) 10Awight: [WIP] Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) [09:49:27] joal bonjouuuur, would you have any idea of why the latest wikis added to the sqoop list aren't being sqooped [09:49:57] in the november snapshot dan said that the cluster hadn't been deployed, but I know that's not the case for the latest one [09:55:58] Hi fdans - Have you double checked merge-dates vs deploys-dates for the past months?
[10:08:12] joal: hmmm, for some reason I thought I had merged this change waaay before I actually did [10:09:10] I don't think there were any deploys of the cluster between the merging of the change (Dec 3) and the beginning of january [10:10:10] and there was a deploy on the 7th, and I can see the whitelist with the latest wikis in prod in stat1007, so I think it's just me freaking out [10:10:14] sorry to bother you joal [10:11:00] np fdans :) And I confirm there has been no deploy of the refinery on the cluster in december [10:27:04] (03PS1) 10Fdans: Release 2.5.3 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/483379 [10:27:24] (03CR) 10Fdans: [V: 03+2 C: 03+2] Release 2.5.3 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/483379 (owner: 10Fdans) [10:29:00] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add new wikis to analytics - https://phabricator.wikimedia.org/T209822 (10fdans) Since the merge of this change in Dec 3 the only deploy of refinery was in Jan 7, so these wikis will appear on the January snapshot at the beginning of February. [10:34:31] joal: how do you feel about me replacing all the remaining camus job crons with timers? [10:36:47] elukey: ♪ ♫ ♬ Wow! I feel good! [10:40:46] joal: done :) [10:40:55] let's see if everything proceeds as expected [10:43:44] joal: [10:43:44] NEXT LEFT LAST PASSED UNIT ACTIVATES [10:43:48] Thu 2019-01-10 10:50:00 UTC 6min left Thu 2019-01-10 10:40:00 UTC 3min 24s ago camus-webrequest.timer camus-webrequest.service [10:44:03] looks lovely [10:44:35] last execution 10:40, next 10:50, left/elapsed time correct [11:08:58] from kafka graphs etc..
it looks fine [11:10:29] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey) [11:13:45] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey) [11:28:12] (03CR) 10Addshore: [C: 03+2] "Looks good to me" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483194 (https://phabricator.wikimedia.org/T211090) (owner: 10WMDE-Fisch) [11:28:17] (03PS1) 10Addshore: Add script to count user setting for disabled AdvancedSearch [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483386 (https://phabricator.wikimedia.org/T211090) [11:28:25] (03Merged) 10jenkins-bot: Add script to count user setting for disabled AdvancedSearch [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483194 (https://phabricator.wikimedia.org/T211090) (owner: 10WMDE-Fisch) [11:28:45] (03CR) 10Addshore: [C: 03+2] Add script to count user setting for disabled AdvancedSearch [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483386 (https://phabricator.wikimedia.org/T211090) (owner: 10Addshore) [11:29:10] (03Merged) 10jenkins-bot: Add script to count user setting for disabled AdvancedSearch [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/483386 (https://phabricator.wikimedia.org/T211090) (owner: 10Addshore) [11:53:27] joal: it seems to me that the new timers are working fine, lemme know if you see anything weird [12:40:25] going afk for lunch + errand [13:56:23] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (10jcrespo) [14:29:40] fdans: I haven't been able to do that refactor yet, do you want to pair on it instead of working on quality? 
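The `systemctl list-timers` output pasted above (LAST 10:40, NEXT 10:50, observed at 10:43:24) implies a fixed 10-minute cadence for the new camus-webrequest timer. A minimal sketch of how those LAST/NEXT columns relate to the current time, assuming a simple `OnCalendar=*:0/10`-style schedule (the actual unit definition is not shown in the log):

```python
from datetime import datetime, timedelta

def timer_window(now: datetime, period_minutes: int = 10):
    """For a fixed-cadence timer (e.g. OnCalendar=*:0/10), return the
    most recent fire time (LAST) and the upcoming one (NEXT)."""
    minutes_past = now.minute % period_minutes
    last = now.replace(second=0, microsecond=0) - timedelta(minutes=minutes_past)
    nxt = last + timedelta(minutes=period_minutes)
    return last, nxt

# Matching the pasted output: at 10:43:24, LAST is 10:40 and NEXT 10:50.
now = datetime(2019, 1, 10, 10, 43, 24)
last, nxt = timer_window(now)
```

This is only a model of the displayed columns; systemd itself also tracks `Persistent=` catch-up runs and accuracy windows that a modulo calculation ignores.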
[14:39:05] hey teammm [14:40:15] milimetric: sure! [14:41:02] fdans: ok, I'm gonna take a shower. Take a look at https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/480796/ [14:42:41] fdans: basically we're going to factor out a function that builds the Splits, SplitsList, does the broadcast, and so on, so we can use it for all three reconstruction jobs without repeating ourselves [14:44:22] oook [14:58:31] ok, ready fdans, going to jump in the cave [14:58:50] i'm in! [16:17:59] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) a:05Banyek→03elukey Assigning it to Luca, as he is coordinating this. [16:18:57] 10Analytics, 10Analytics-Kanban, 10DBA, 10User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (10Marostegui) a:05Banyek→03elukey Assigning this to @elukey so he can follow up what is... [16:21:41] (03PS6) 10Milimetric: Update mediawiki-history comment and actor joins [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: 10Joal) [16:21:44] (03PS7) 10Ottomata: Bump to superset version 0.26.3-wikimedia1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481056 [16:27:18] ok fdans elukey! 
superset fork deployed on superset.analytics.eqiad.wmflabs [16:27:31] elukey: the problem i was having is that by putting the github url in requirements.txt [16:27:43] the create_virtualenv on deploy step wasn't able to use the prebuilt superset wheel [16:27:56] so it was dling from github and rebuilding again (including the webpack step) [16:28:11] so, i just changed it to install everything in $wheels_dir/*.whl, which should be equivalent [16:28:36] it deployed fine and looks like its working there [16:31:57] niceee [16:39:30] ottomata: if you want we can stop superset, backup the db, deploy [16:39:38] after fran has tested [16:41:23] ya [16:44:19] actually elukey i have to run shortly after standup, maybe monday is best? [16:49:41] ack! [16:49:56] otherwise I can do it tomorrow with fran [16:55:34] elukey ottomata: I'm takin a look now :) [16:58:54] elukey ottomata periodicity pivot working correctly :) [16:59:01] \o/ [16:59:26] great! [17:03:31] gooood [17:26:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey) [17:48:21] * elukey off! [18:02:29] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (10akosiaris) >>! In T211247#4835806, @Pchelolo wrote: >> I assume they will be behind varnish s... [18:50:55] "ores_revision_score_historified" [18:51:10] Following this coinage, https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Revision_augmentation_and_denormalization [19:37:23] 10Analytics, 10Analytics-Wikimetrics: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (10MaxSem) Note that Event Metrics (the new name for Grant Metrics) doesn't have a lot of features that Wikimetrics has, and no plans to add them.
[19:54:12] Strange, I was trying to follow example coordinator code for /wmf/data/event/VirtualPageView/ but that table starts with year/month/day partitions that are easy to substitute with EL. My table /wmf/data/event/mediawiki_revision_score starts with a datacenter partition, which I don't think I can wildcard. [19:56:54] I'm wondering how to write that URI template. [19:57:57] awight: you can 'wildcard' partitions in hive queries by omitting them [19:58:17] Even in the URI template? [19:59:01] I'm trying to pick up the hourly mediawiki_revision_score directories (with a daily job) from a coordinator. [19:59:23] awight: Ah - you're building a datasets.xml file, right? [19:59:28] exactly [19:59:34] right [19:59:36] hm [20:00:01] I've never tested wildcards in oozie datasets path [20:00:44] In case it matters, I don't need the actual datacenter value because the .hql omits that partition like you suggested above... [20:03:43] awight: I can't think of a solution other than hard-coding the values (so baaaaaad) [20:03:50] hm [20:04:42] Maybe I'm fighting the paradigm? All I want is to ingest the 23-25 [sic!] hourly event files... [20:04:58] maybe we can drop the datacenter partition for event/mediawiki_revision_score? [20:05:50] It's an interesting column to have for analytics, but from my perspective I don't see the need to partition on datacenter [20:06:46] awight: dropping the partition at raw data level is complicated (not to say not feasible) - the events/ folder is an umbrella for plenty of event-types all following the same pattern [20:07:21] okay--but this only crossed my mind because /wmf/data/event/VirtualPageView is missing that partition... [20:07:49] and also seems to have one of the few jobs which does something equivalent to what I'm attempting.
[20:08:34] * awight doubts self and greps harder [20:08:38] awight: I think it's because VirtualPageView comes from event-logging system (no datacenter prefix), while ores-scores come from event-bus (with datacenter-prefix) [20:08:52] oh! now that's fun [20:09:07] awight: :) [20:10:12] awight: event-bus was originally built for production-oriented events, and needed "datacenter-aware" consumption IIRC [20:10:19] Speaking of impressive rabbitholes: https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html#a6.6.2._coord:hoursInDayint_n_EL_Function_for_Synchronous_Datasets [20:11:06] Indeed - oozie does a pretty good job at handling time :) [20:11:41] O/ [20:11:54] Anyway - Our problem here is to get a trigger once all files for a full day from any datacenter are available [20:12:29] that's what I'm hoping [20:12:44] FWIW, I see hints that wildcard might be allowed in an uri-template [20:13:03] That'd be really great - Not tested though [20:13:48] copy that, I'll give it a try [20:14:22] to handle an edge case, you might want to trigger once a full day in either dc is available, AND wait a bit. [20:14:29] in case of DC switchover, there will be overlap in both DCs [20:15:39] * awight is reminded of x-windows font specs [20:15:39] $ xterm -fn "-*-dejavu sans mono-medium-r-normal--*-80-*-*-*-*-iso10646-1" [20:16:35] not seeing a delay operator or attribute, off hand [20:17:33] #things-for-version-2 :) [20:24:58] Pretty cool: https://www.nature.com/articles/s42256-018-0002-3 [20:25:44] Gone for tonight - See y'all [20:30:07] lol "coord:endOfDays" [20:35:37] Most Oozie jobs seem to use coord:nominalTime to build date parameters to the workflow, but this doesn't make sense to me. nominalTime for a daily job will only be updated once per day, so how is the hour field incremented through 0-23? [20:38:18] I'll just go with it and cross fingers. [20:41:07] awight: i'd expect the hour field to just say 00:00 ? [20:41:17] or hour=0?
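Whether Oozie actually honors a `*` in a dataset uri-template is left untested in the exchange above, but the trigger condition being discussed — fire once all 24 hourly directories for a day exist in any datacenter — can be sketched with glob-style matching. The path layout below is an assumption based on the partitions mentioned in the log, not the exact HDFS layout:

```python
from fnmatch import fnmatch

# Hypothetical uri-template with a wildcarded datacenter partition.
TEMPLATE = ("/wmf/data/event/mediawiki_revision_score/"
            "datacenter=*/year={y}/month={m}/day={d}/hour={h}")

def day_is_complete(existing_paths, y, m, d):
    """True when every hour 0-23 is present in at least one datacenter."""
    for h in range(24):
        pattern = TEMPLATE.format(y=y, m=m, d=d, h=h)
        if not any(fnmatch(path, pattern) for path in existing_paths):
            return False
    return True

# 24 hourly partitions from a single datacenter: the day counts as complete.
paths = [
    f"/wmf/data/event/mediawiki_revision_score/"
    f"datacenter=eqiad/year=2019/month=1/day=9/hour={h}"
    for h in range(24)
]
```

Note this deliberately ignores the DC-switchover edge case raised at 20:14:22, where overlapping data in both datacenters would argue for an extra delay before triggering.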
but you might not have to use it? not sure. [20:43:24] ah thanks! that's right, I don't need to pass hour to the workflow. [20:44:13] But I think the same problem applies to "day", since this job is running some time the day after datasets are available. [20:44:25] s/problem/my own confusion/ [20:46:26] i think that's what nominalTime is for [20:46:35] it gives you the date for the time the job is for [20:49:27] The documentation makes it sound like it's the time at which the coordinator job runs, whereas what I suspect I want is the time matched by the uri template. For example, when backfilling from several days ago. [20:50:07] (03PS11) 10Awight: Schema for ORES scores [analytics/refinery] - 10https://gerrit.wikimedia.org/r/481025 (https://phabricator.wikimedia.org/T209732) [20:50:09] (03PS7) 10Awight: [WIP] Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) [20:50:22] on the bright side, my workflow.xml is behaving nicely [20:52:43] oh maybe i'm wrong, maybe nominal is that one. there is a time for the job tho [20:53:19] hm yeah i think nominal time is the time for the job [20:53:23] that's what we use in webrequest [20:53:24] e.g. [20:53:29] <property> [20:53:29] <name>year</name> [20:53:29] <value>${coord:formatTime(coord:nominalTime(), "y")}</value> [20:53:29] </property> [20:53:41] so that should be the time for the job itself, not the current time [20:53:56] I feel a bit that testing my coordinator.xml is going to be like jumping on a wild mustang. [20:54:26] yes I think it's the time the job was "supposed" to go off which is odd considering how we can set non-time-based preconditions [20:54:38] its not so bad! the oozie job CLI output lets you see what is scheduled [20:54:39] its also in hue [20:55:01] ya i'm not sure what nominal time would be if it wasn't based on something [20:55:03] I suppose if we're backfilling a missed job, this nominal time will be e.g. DAY - 2...
[20:55:16] when the coordinator action was created based on driver event [20:55:23] no [20:55:28] nominalTime will be the time of the backfill [20:55:30] let's see [20:55:47] harr, then I'm unsure again of how we can backfill using this coordinator.xml [20:56:03] very interesting world. Sorry I have all the questions... [20:56:55] check out [20:56:56] oozie job -info 0070368-181112144035577-oozie-oozi-C [20:57:03] run that somewhere [20:57:16] you'll see the list of coordinator workflows that were instantiated [20:57:29] you can see that this is a daily job [20:57:33] and nominal time is assigned for each day [20:57:42] 01-07, 01-08, etc. [20:57:50] when you backfill, you just rerun one of those job instances [20:57:55] and nominal time will always be set properly [20:58:01] oho! [20:58:02] e.g. if you ran 0070368-181112144035577-oozie-oozi-C@1 [20:58:08] nominal time will be 2019-01-07 00:00 GMT [20:58:51] also, nice, you can get more detailed workflow job status by using the workflow id [20:58:52] 0070686-181112144035577-oozie-oozi-W [20:58:57] oozie job -info 0070686-181112144035577-oozie-oozi-W [20:59:07] that'll show you the status of each of the individual actions in that workflow [21:02:49] awight: this is also visible in hue [21:02:50] e.g. 
[21:02:52] https://hue.wikimedia.org/oozie/list_oozie_coordinator/0070368-181112144035577-oozie-oozi-C/ [21:06:06] oo this is a good one for you awight [21:06:07] https://hue.wikimedia.org/oozie/list_oozie_coordinator/0029176-180510140726946-oozie-oozi-C/ [21:06:13] virtualpageview-druid-daily-coord [21:06:27] you can see that there are two materialized instances waiting for dependencies to be present [21:06:36] or are running [21:06:44] and a bunch of completed instances too [21:06:55] each instance is named by its nominal time [21:07:14] if you click on one of the instances, you can browse around and see the settings it ran with [21:07:40] including the workflow file, and the parameterized property values [21:07:51] that were provided to the workflow by the coordinator [21:10:43] That's a rad UI. [21:12:49] https://hue.wikimedia.org/oozie/list_oozie_coordinator/0073666-181112144035577-oozie-oozi-C/ [21:12:52] scary times [21:14:02] :) [21:14:05] oh dear--12 instances starting in 2009 [21:14:12] * awight eats children [21:14:19] start time! [21:14:19] :) [21:19:41] Those files are actually present, so I think either * wildcard doesn't work like that, or * something's wrong with the way I'm listing files, since they're being reported as a single concatenation with no separators. [21:19:58] ? there shouldn't be any 2009 files. [21:20:05] right? [21:20:12] but in any case, you should provide a start time [21:20:39] if you ever have to restart a coordinator (for xml update, or whatever), you'll need to tell it where to start from [21:20:48] because each coordinator only tracks its own completed workflows [21:20:50] if you kill one [21:21:04] you'll need to tell it where the last one left off, and start from there [21:27:06] terrifying--I used a start_time from 2018-12 and I think some of my jobs might be running [21:29:32] this one is! 
https://hue.wikimedia.org/oozie/list_oozie_workflow/0073715-181112144035577-oozie-oozi-W/?coordinator_job_id=0073676-181112144035577-oozie-oozi-C [21:29:48] some errors tho [21:29:55] looks like the send_error_email workflow worked! [21:30:02] so i dunno who your email is set to, but if you, check and see! [21:31:49] awight: https://hue.wikimedia.org/jobbrowser/jobs/job_1544022186674_133056/single_logs [21:31:56] Table not found 'mediawiki_revision_score' [21:31:56] o [21:32:00] you should use the fully qualified table name [21:32:01] event. [21:32:15] +1 I had "wmf.mediawiki_revision_score" in my .properties [21:32:21] aye :) [21:34:27] it's alive! [21:38:54] nice i see some successes! [21:39:27] 61MB per day for this normalized table, I'm happy with that. [21:39:36] and only 30s of CPU time, somehow. [21:40:09] I'm used to "smoking river bed"-magnitude cpu usage with hive queries so this is a pleasant surprise. [21:40:29] Now for the denormalizing workflow coordinator... [21:40:58] (03PS8) 10Awight: [WIP] Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) [21:42:10] I guess I'll leave these running in my personal database to burn-in, and we can kill and replace with the production database name once the patches are merged?
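As ottomata explains in the exchange above, each materialized coordinator action carries a fixed nominal time (start + n × frequency), and rerunning an instance for backfill keeps that original nominal time rather than the wall-clock time of the rerun. A toy model of that behavior, with a loose stand-in for `coord:formatTime` (the real one takes Java SimpleDateFormat patterns):

```python
from datetime import datetime, timedelta

def materialize(start, frequency_days, count):
    """Each action n gets nominal time start + n * frequency,
    regardless of when (or whether) it actually runs."""
    return [start + timedelta(days=n * frequency_days) for n in range(count)]

def format_time(nominal, pattern):
    # Loose analog of coord:formatTime(coord:nominalTime(), "y"|"M"|"d");
    # no zero-padding, unlike two-letter Java patterns such as "MM".
    return {"y": str(nominal.year),
            "M": str(nominal.month),
            "d": str(nominal.day)}[pattern]

# A daily coordinator started 2019-01-07: instance @1 is nominally
# 01-07, @2 is 01-08, ... even if a rerun (backfill) happens days later.
actions = materialize(datetime(2019, 1, 7), 1, 3)
```

This is why rerunning `...oozie-oozi-C@1` always sees 2019-01-07 00:00 as its nominal time: the time belongs to the instance, not to the execution.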
[21:44:02] yeah that sounds right [21:53:30] 10Analytics, 10Contributors-Analysis, 10Product-Analytics: Start refining all blacklisted EventLogging streams - https://phabricator.wikimedia.org/T212355 (10Neil_P._Quinn_WMF) [21:53:32] 10Analytics: Provide historical redirect flag in Data Lake edit data - https://phabricator.wikimedia.org/T161146 (10Neil_P._Quinn_WMF) [21:53:34] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Neil_P._Quinn_WMF) [22:14:16] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Neil_P._Quinn_WMF) [22:20:41] 10Analytics, 10Product-Analytics: Superset's rolling average feature results in error message - https://phabricator.wikimedia.org/T213488 (10Tbayer) [22:21:16] (03PS9) 10Awight: [WIP] Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) [22:21:29] ^ jobs apparently work [22:21:50] 10Analytics, 10Product-Analytics: Superset's rolling average feature results in error message - https://phabricator.wikimedia.org/T213488 (10Tbayer) [22:22:06] 10Analytics, 10Product-Analytics: Superset's rolling average feature results in error message - https://phabricator.wikimedia.org/T213488 (10Tbayer) [22:22:09] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Upgrade Superset to 0.28.1 - https://phabricator.wikimedia.org/T211605 (10Tbayer) [22:23:42] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Neil_P._Quinn_WMF) [22:24:13] (03PS10) 10Awight: Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) [22:24:25] very nice 
[22:25:12] 10Analytics, 10ORES, 10Scoring-platform-team (Current): Backfill ORES Hadoop scores with historical data - https://phabricator.wikimedia.org/T209737 (10awight) [22:28:56] 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight) I've left the working draft of the two coordinator jobs running in my user database. The hardcoded datacenter is probably a blocker, any s... [22:29:19] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10Neil_P._Quinn_WMF) [22:30:08] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10akosiaris) >>! In T208251#4704278, @mobrovac wrote: > Adding the security folks. > > I agree that code ge... [22:51:08] ottomata: Do we write tests for oozie jobs? [22:52:12] awight: no [22:52:36] :) /me skulks off into the sunset [22:52:37] not even sure how we'd do that...but if you figure out a way...maybe we should! [22:53:00] Well I have reams of "smoke test" code I've written now, and was hoping to plug it into something. [22:53:51] i.e. I create limited, sampled copies of wmf.mediawiki_history and event.mediawiki_revision_score in my personal db, then run jobs with -D... overriding table names. [22:54:25] If we started with a set of fixtures in those two upstream tables, it would be possible to predict the expected output table contents. [22:59:54] 10Analytics, 10Contributors-Analysis, 10Product-Analytics, 10Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (10nettrom_WMF) I was working on another ad-hoc analysis case a couple of days ago where I needed information about when a specific abuse... 
[23:04:32] 10Analytics, 10Release-Engineering-Team, 10Scoring-platform-team: Investigate formal test framework for Oozie jobs - https://phabricator.wikimedia.org/T213496 (10awight) [23:06:39] awight: no idea if this works? [23:06:39] https://oozie.apache.org/docs/3.3.2/ENG_MiniOozie.html [23:06:48] but it'd be nice to be able to run tests without being logged into cluster [23:07:25] I was stumbling across that link too, it looks pretty malnourished. [23:07:40] Supposedly you can run it locally, on the plus side. [23:09:04] 10Analytics, 10Release-Engineering-Team, 10Scoring-platform-team: Investigate formal test framework for Oozie jobs - https://phabricator.wikimedia.org/T213496 (10awight) One alternative is a JUnit class meant for workflow and coordinator testing: http://oozie.apache.org/docs/5.1.0/ENG_MiniOozie.html An exa...
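Short of a real framework like MiniOozie, the fixture idea awight describes — small sampled copies of the input tables whose expected output can be predicted — can at least be prototyped outside Hive. The function name and row shape below are hypothetical stand-ins for the actual HQL in the ORES patches, shown only to illustrate the fixture-to-expected-output pattern:

```python
def score_events_to_rows(events):
    """Toy stand-in for the Oozie-driven HQL: flatten raw score events
    into (wiki, rev_id, model, probability) rows. The real job runs in
    Hive with -D table-name overrides; this only demonstrates how a
    fixture set yields a predictable output table."""
    rows = []
    for e in events:
        for model, prob in e["scores"].items():
            rows.append((e["database"], e["rev_id"], model, prob))
    return sorted(rows)

# Hand-built fixtures standing in for sampled event.mediawiki_revision_score.
fixtures = [
    {"database": "enwiki", "rev_id": 101, "scores": {"damaging": 0.91}},
    {"database": "dewiki", "rev_id": 202,
     "scores": {"damaging": 0.05, "goodfaith": 0.98}},
]

rows = score_events_to_rows(fixtures)
```

A smoke test then just compares `rows` against the hand-computed expectation, which is exactly what a fixture-seeded personal-database run would do at cluster scale.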