[05:48:53] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:10:21] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:00:47] 10Analytics, 10Analytics-Kanban: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (10elukey)
[07:07:52] 10Analytics, 10Analytics-Kanban: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (10elukey)
[07:23:04] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10ayounsi) Thanks, it looks great! >>! In T254332#6649704, @mforns wrote: > Note that the last couple hours do not yet have t...
[07:31:26] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10elukey) @mforns I had a chat with Arzhel, the el-to-druid job is configured like this: ` --since $(date --date '-6hours' -u...
[07:36:19] goood morning
[07:37:13] joal: the hive patch seems working fine, no more oozie failures \o/
[07:41:49] Hi elukey
[07:42:13] elukey: I am misunderstanding something I think
[07:42:24] elukey: were there regular oozie failures before the patch?
[07:51:37] Dentist appointment - back in a bit
[07:53:07] joal: there were regular failures (basically for all hive2 actions) in oozie after enabling the db token store
[07:53:36] after the hive patch, all good
[07:53:59] I added the patch to the bigtop 1.4 package manually (after rebuilding them)
[07:54:19] and now I am building for bigtop 1.5, and I'll send a patch later on to ask upstream to include it before the release
[07:55:08] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (10Nikerabbit) I don't really know which one (if any) I should be in. IIRC I was added there for eventlogging access, which I still occasionally use.
[08:03:23] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (10elukey) >>! In T268801#6650551, @Nikerabbit wrote: > I don't really know which one (if any) I should be in. IIRC I was added there for eventlogging access, which I still...
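The PROBLEM/RECOVERY pair at 05:48/06:10 above refers to a systemd timer unit on an-launcher1002. A minimal troubleshooting sketch, assuming standard systemd tooling and using the unit name from the alert (whether the unit is driven by a separate .timer is an assumption):

```bash
# Show the unit state and the most recent logs for the failed run.
systemctl status produce_canary_events
journalctl -u produce_canary_events --since "2 hours ago"

# If the failure was transient, clear the failed state and re-run the unit so
# the monitoring check can recover (as it did at 06:10 above).
sudo systemctl reset-failed produce_canary_events
sudo systemctl start produce_canary_events
```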
[08:15:42] 10Analytics-Radar: Superset getting slower as usage increases - https://phabricator.wikimedia.org/T239130 (10Aklapper) a:05Nuria→03None Resetting assignee (inactive account)
[08:15:59] 10Analytics, 10Analytics-EventLogging, 10Research-Backlog: 20K events by a single user in the span of 20 mins - https://phabricator.wikimedia.org/T202539 (10Aklapper) a:05Nuria→03None Resetting assignee (inactive account)
[08:16:09] 10Analytics-Radar, 10Analytics-Wikistats, 10Internet-Archive: Feedback on Wikistats 2 new edits pages - https://phabricator.wikimedia.org/T210306 (10Aklapper) a:05Nuria→03None Resetting assignee (inactive account)
[08:16:14] 10Analytics, 10Data-release, 10Privacy Engineering, 10Research, 10Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (10Aklapper) a:05Nuria→03None Resetting assignee (inactive account)
[08:16:18] 10Analytics-Radar, 10Data-release, 10Privacy Engineering, 10Privacy: An expert panel to produce recommendations on open data sharing for public good - https://phabricator.wikimedia.org/T189339 (10Aklapper) a:05Nuria→03None Resetting assignee (inactive account)
[08:17:03] 10Analytics-Radar, 10Analytics-Wikistats, 10Internet-Archive: Feedback on Wikistats 2 new edits pages - https://phabricator.wikimedia.org/T210306 (10Aklapper) (Also, this task looks unactionable - a bunch of many different things; instead of dedicated tickets for each task.)
[08:24:23] the thing that worries me a bit is https://phabricator.wikimedia.org/T268733
[08:24:32] I am wondering if it is a new weird hive 2 feature
[08:36:56] 10Analytics: Alter table for navigation timing errors out in Hadoop test - https://phabricator.wikimedia.org/T268733 (10elukey)
[08:40:48] !log roll restart cassandra on aqs10* for openjdk upgrades
[08:40:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:41:55] 10Analytics-Clusters, 10WMDE-Analytics-Engineering, 10Patch-For-Review, 10User-GoranSMilovanovic: Downscale Wikidata-analysis pyspark scripts to analytics limits - https://phabricator.wikimedia.org/T268684 (10JAllemandou) @GoranSMilovanovic I looked at all the config files mentioned above, they look good :...
[08:43:12] Back!
[08:43:26] bonjour
[08:43:32] elukey: About T268733
[08:43:32] T268733: Alter table for navigation timing errors out in Hadoop test - https://phabricator.wikimedia.org/T268733
[08:44:08] elukey: Refine uses a weird strategy to communicate with Hive, so I'm not surprised by the error
[08:44:29] elukey: Is there a patch for the refine version you run with bigtop on hive2?
[08:44:52] joal: ah!!!!
[08:45:04] right the jdbc creds!
[08:45:13] but IIRC we source hive-site.xml no??
[08:45:29] elukey: we source hive-site I think - which one I can't tell
[08:45:59] the one where refine runs in theory, but in this case joal it seems more a DDL issue rather than a kerb one no?
[08:46:02] but mainly we use a JDBC connection with a dedicated hive-driver jar included in the refine-run command
[08:46:04] (trying to understand)
[08:46:24] elukey: I think the problem comes from driver version mismatch
[08:46:36] interesting
[08:46:52] elukey: all our spark jobs except for refine use Spark to connect to Hive - Th
[08:47:04] elukey: The version stuff is handled by Spark
[08:47:28] elukey: For refine, Spark refuses to let us run the DDL we want to over the hive tables
[08:47:52] yep and we open a jdbc conn and issue an explicit alter
[08:47:53] elukey: So we use a JDBC connection to hive, as it is less picky
[08:48:40] elukey: Now for this JDBC connection the hive-jdbc driver in use differs between versions
[08:49:54] ah right I see /srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar
[08:50:29] elukey: looking at the refine start command also helps (we add the jars manually to the spark job)
[08:51:53] joal: so I could try to pull a newer hive-jdbc from maven and replace it in the spark cmd line
[08:52:17] elukey: That would be a first good try
[08:52:28] perfect as always you are awesome
[08:52:29] will test
[08:53:15] (the cassandra oozie failure is me doing a restart of aqs, will re-run as soon as the cookbook finishes)
[08:53:37] ack elukey
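A minimal sketch of the jar swap discussed at 08:49-08:52: pulling a newer hive-jdbc driver and passing it to the Refine Spark job in place of the CDH 5.10 one. The driver version, jar paths, the spark2-submit wrapper and the Refine class/application jar names are assumptions for illustration; the real invocation carries many more options.

```bash
# Hypothetical newer driver, fetched from Maven beforehand (exact artifact
# coordinates and classifier are an assumption; any Hive 2.x JDBC driver jar
# would be referenced the same way).
NEW_HIVE_JDBC=/tmp/hive-jdbc-2.3.3-standalone.jar

# Reference it on the Refine command line via --jars instead of
# /srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar.
spark2-submit \
  --master yarn \
  --jars "${NEW_HIVE_JDBC}" \
  --class org.wikimedia.analytics.refinery.job.refine.Refine \
  /srv/deployment/analytics/refinery/artifacts/refinery-job.jar \
  "$@"   # the usual Refine options, unchanged
```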
[09:08:52] elukey: arh, nope :(
[09:08:56] https://pageviews.toolforge.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-20&pages=Diego_Maradona
[09:09:24] it works for me, I see a huge blue bar :D
[09:09:27] elukey: manual call to pageview-api works - I got data
[09:09:39] Ok great, must be caching on my side
[09:09:51] Thanks a lot elukey
[09:09:57] super thanks for checking :)
[09:10:06] better that we have done it and not the community asking us
[09:10:18] should we follow up to have a different caching time?
[09:11:05] I don't think so elukey - We've never been asked to change it - In most cases it's not problematic
[09:11:55] joal: not sure if we set any header, but if not varnish/ats uses a default of a day I think.. maybe controlling it specifically wouldn't be bad
[09:12:03] say 12 hours, or similar
[09:12:27] it would change a lot in these cases, since we wouldn't need to manually purge
[09:12:28] works for me elukey
[09:12:37] 12h seems fine
[09:12:38] I can open a task and then see what the team thinks about it
[09:12:52] also there is another aqs-related issue that I was thinking about
[09:13:15] the other day I stopped druid1005 and aqs started to alert like crazy
[09:13:39] that is not acceptable, one out of 5 druid nodes down shouldn't cause any trouble
[09:13:54] I agree
[09:14:10] elukey: let me triple check on something
[09:14:13] my theory is that aqs -> druid-lvs has a long timeout, so if a broker is stopped
[09:14:23] then 1/5th of connections for the edit api are piling up
[09:14:29] ahhhhh - possible
[09:14:49] but not sure where to check in the aqs code :(
[09:14:49] I'm going to check druid data redundancy policy for our datasets
[09:14:54] ack
[09:21:46] 10Analytics, 10Release-Engineering-Team (Development services): Unable to clone git repo from stat1008 - https://phabricator.wikimedia.org/T268290 (10hashar)
[09:21:54] elukey: I confirm data should be available - We have replication 2 by default
[09:25:22] elukey: it's been a long time since I last checked datasources in Druid-prod - I'm gonna make a change on the druid loading job, splitting segments more: currently we have 6 segments per month, and recent segments weigh 1G each (too much) - Will ask for 16 segments for each month (this will give us time before having to change)
[09:25:30] 10Analytics: AQS pageview default caching is one day - https://phabricator.wikimedia.org/T268809 (10elukey)
[09:25:54] super
[09:30:23] 10Analytics: AQS should be more resilient to druid nodes not available - https://phabricator.wikimedia.org/T268811 (10elukey)
[09:30:29] there you go
[09:30:42] joal: if you want to add your thoughts --^
[09:31:23] sure elukey
[09:32:22] <3
[09:37:37] 10Analytics, 10Analytics-Kanban: Make druid mediawiki-history-reduced segments smaller - https://phabricator.wikimedia.org/T268813 (10JAllemandou)
[09:37:55] 10Analytics, 10Analytics-Kanban: Make druid mediawiki-history-reduced segments smaller - https://phabricator.wikimedia.org/T268813 (10JAllemandou) a:03JAllemandou
[09:38:12] (03PS1) 10Joal: Update mediawiki-history-reduced druid loading (shards) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643689 (https://phabricator.wikimedia.org/T268813)
[09:43:12] (03CR) 10Elukey: [C: 03+1] Update mediawiki-history-reduced druid loading (shards) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643689 (https://phabricator.wikimedia.org/T268813) (owner: 10Joal)
[09:48:04] 10Analytics: AQS should be more resilient to druid nodes not available - https://phabricator.wikimedia.org/T268811 (10JAllemandou) @elukey your idea makes sense. I quickly looked at Hyperswitch and didn't find an obvious way to set a client call timeout. Let's ask @Pchelolo if he can enlighten us.
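For reference, the header check mentioned at 09:04-09:07 can be done with curl against the AQS URL from the log above; the grep filter and the exact set of caching headers returned by the edge are assumptions, included only for readability:

```bash
# Fetch only the response headers and look at the caching-related ones.
curl -s -D - -o /dev/null \
  'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Diego_Maradona/daily/2020110500/2020112500' \
  | grep -iE '^(cache-control|age|x-cache)'
```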
[10:05:38] gmodena: :(
[12:46:48] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10mforns) @ayounsi @elukey The reason of the gap is that the streaming job that is ingesting the data into Druid does not have...
[12:48:23] mforns: o/
[12:48:39] I am not super clear about the 4 hours thing :(
[12:58:37] elukey: I just checked using hdfs dfs -ls, that it takes about 1 hour to ingest netflow raw from kafka
[12:58:58] and then the data is refined and ready only 3 hours later
[12:59:34] at least that's what I can see by looking at hdfs dfs -ls /wmf/data/wmf/netflow/year=2020/month=11/day=26
[13:00:10] maybe the refine job is also too conservative in the since/until params
[13:01:09] elukey: yes, the default refine_job.pp time window config is: --since 28 --until 4
[13:01:12] so 4 hours from now
[13:02:08] if we changed that for netflow to say --until 2, then we could change the druid_load config to --until 3
[13:02:19] 10Analytics-Clusters, 10WMDE-Analytics-Engineering, 10Patch-For-Review, 10User-GoranSMilovanovic: Downscale Wikidata-analysis pyspark scripts to analytics limits - https://phabricator.wikimedia.org/T268684 (10GoranSMilovanovic) @joal Thank you. The `--num-executors` parameter is intentionally still there t...
[13:02:20] and reduce the gap by 2 hours...
[13:02:46] but I believe the real solution would be to add the new fields to the streaming job, it should be possible
[13:04:12] mforns: ah yes yes right, I was only puzzled by the 4 hours :)
[13:04:25] I agree that changing the streaming job is the right way
[13:04:48] I didn't follow the whole set of changes and I thought that the new fields were the result of refine or augmentation
[13:04:53] (so not in kafka)
[13:05:08] but if they are, let's also modify the streaming job now no?
[13:05:14] it should take 10 mins
[13:05:22] mforns: quick note: We would need a new streaming job (none exist now), sending augmented data to a new kafka topic, and have druid ingest from this new topic
[13:05:35] elukey: --^ as well sorry
[13:05:43] ah ok then this is indeed an issue
[13:06:34] joal: no streaming job???
[13:06:55] mforns: nope - Druid ingests straight from Kafka (as format is json)
[13:07:05] aah! I see
[13:07:55] ok, well this will be a bit more difficult, but yea!
[13:08:49] !log force umount/mount of all /mnt/hdfs mountpoints to pick up openjdk upgrades
[13:08:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:08:57] elukey: yea, new fields are not in kafka
[13:14:44] 10Analytics: AQS pageview default caching is one day - https://phabricator.wikimedia.org/T268809 (10JAllemandou) Thanks @elukey for this ticket. Given that the loading of the data usually finishes before 2AM (UTC) of the current day for the previous day, high-traffic pages would most probably have been requested...
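To make the --since 28 --until 4 discussion above concrete: both values are hours relative to the run time, so a run only refines data older than the --until cutoff, which is where the ingestion gap comes from. A small sketch of the window arithmetic; the exact semantics inside refine_job.pp are an assumption, this only illustrates the cutoffs:

```bash
SINCE_HOURS=28
UNTIL_HOURS=4   # lowering this to 2 would shrink the gap, as discussed above

# Hour-aligned window a run starting "now" (UTC) would consider.
echo "since: $(date -u --date "-${SINCE_HOURS} hours" +'%Y-%m-%dT%H:00:00')"
echo "until: $(date -u --date "-${UNTIL_HOURS} hours" +'%Y-%m-%dT%H:00:00')"
```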
[13:52:44] !log roll restart druid daemons on druid analytics to pick up new openjdk upgrades
[13:52:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:07:20] (03PS2) 10Joal: [WIP] Update sqoop adding tables and removing timestamps [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643029 (https://phabricator.wikimedia.org/T266077)
[14:12:43] (03PS3) 10Joal: [WIP] Update sqoop adding tables and removing timestamps [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643029 (https://phabricator.wikimedia.org/T266077)
[14:19:05] joal: as FYI, I am merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/643446
[14:19:39] awesome elukey -- thanks a lot
[14:40:50] 10Analytics, 10Patch-For-Review: Avro Deserializer logging set to DEBUG in pyspark lead to huge yarn stderr container files (causing disk usage alerts) - https://phabricator.wikimedia.org/T268376 (10elukey) 05Open→03Resolved a:03elukey We solved the problem, on stat1005 there was a wrong log4j root logge...
[14:55:37] joal: I am playing a bit with hive versions in hadoop test, but one thing that I noticed is that if I execute the alter in say beeline
[14:55:52] (so on a hive 2.3.3 client) it returns the same error
[14:56:25] if it was a problem with refine I'd have expected the alter to succeed in beeline
[14:56:28] does it make sense?
[15:00:20] 10Analytics, 10Product-Analytics: Configure superset cache - https://phabricator.wikimedia.org/T268784 (10elukey) The idea that I have is the following: 1) We move Superset to an-tool1010, with Turnilo 2) We deploy a on-host memcached instance, we have already a lot of puppet code + monitoring + metrics to re...
[15:01:05] going afk for some errands, be back later :)
[15:01:30] ack elukey - same for me let's talk post standup about refine
[15:35:38] helloo teamm
[15:35:50] joal: can you do the pairing on netflow in short?
[15:40:50] 10Analytics: [data quality alarms] Reduce the K to generate more reports - https://phabricator.wikimedia.org/T246682 (10ssingh) Hi! Following up on these tickets now; sorry for the delay. I think the current levels are reasonable and it's fine to leave them at the present value.
[15:45:02] 10Analytics: [data quality alarms] Reduce the K to generate more reports - https://phabricator.wikimedia.org/T246682 (10mforns) 05Open→03Resolved a:03mforns Great! Will close this task then.
[15:45:04] 10Analytics, 10Analytics-Kanban: Traffic anomaly alarms - https://phabricator.wikimedia.org/T267355 (10mforns)
[16:32:25] Hi mforns - I can do it now if you wish
[16:36:14] heya joal, yes, gimme 2 mins
[16:36:16] :]
[16:36:26] sure
[16:38:57] joal: ah! I just realized there will be a conflict in the name of the druid datasource
[16:39:35] when we move to the event db, HiveToDruid will automagically ingest into event_netflow as opposed to wmf_netflow...
[16:39:56] you know if there's a way to rename datasources in Druid?
[16:40:30] mforns: hm, I don't know an
[16:41:16] mforns: the only way I can think of is through reindexing based on existing data
[16:41:20] There might be a way though
[16:41:52] joal: or else, we can add a new argument to HiveToDruid to let us override the datasource name...
[16:42:06] but that would take a refinery-source deployment
[16:42:34] whatcha think?
[16:42:40] Being able to override the druid datasource seems like a good option to have
[16:42:54] in HiveToDruid I mean
[16:42:56] mforns: --^
[16:43:03] yea
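On the datasource naming question above: Druid has no in-place rename, so the practical options are re-indexing into a new name or overriding the name at ingestion time (what the HiveToDruid patch below adds). A quick way to see which datasource names already exist is the coordinator API; the host and port here are placeholders, not the real cluster endpoint:

```bash
# List datasource names known to the Druid coordinator (host/port assumed).
curl -s 'http://druid-coordinator.example.org:8081/druid/coordinator/v1/datasources'

# The ?simple flag adds basic segment counts/sizes per datasource.
curl -s 'http://druid-coordinator.example.org:8081/druid/coordinator/v1/datasources?simple'
```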
[16:47:38] mforns: I assume this means we'll do the db change after the needed deploy, right?
[16:47:48] yes
[16:47:50] elukey: nearby by any chance?
[16:52:29] nevermind elukey - I managed myself :)
[16:53:35] (03PS1) 10Mforns: Add datasource argument to HiveToDruid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/643748 (https://phabricator.wikimedia.org/T231339)
[16:54:11] joal: ^
[16:54:16] yessir
[16:57:07] (03CR) 10jerkins-bot: [V: 04-1] Add datasource argument to HiveToDruid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/643748 (https://phabricator.wikimedia.org/T231339) (owner: 10Mforns)
[16:57:12] I think we could deploy today, it'd be just the packaging to archiva
[16:57:24] and adding the links
[16:57:29] feasible mforns - We'd need approval from elukey though
[16:57:35] yea yea ofc
[16:57:50] uou -1
[16:59:31] (03PS2) 10Mforns: Add datasource argument to HiveToDruid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/643748 (https://phabricator.wikimedia.org/T231339)
[16:59:41] sorry joal I was in a meeting!
[16:59:56] np elukey :)
[17:01:47] klausman: standup if you wish :)
[17:08:12] (03PS4) 10Joal: Update sqoop adding tables and removing timestamps [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643029 (https://phabricator.wikimedia.org/T266077)
[17:09:02] (03CR) 10Joal: [V: 03+2] "Tested on some small wikis" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643029 (https://phabricator.wikimedia.org/T266077) (owner: 10Joal)
[17:26:24] 10Analytics: Alter table for navigation timing errors out in Hadoop test - https://phabricator.wikimedia.org/T268733 (10elukey)
[17:49:13] (03CR) 10Joal: [C: 03+1] "Ok for me - I'd have liked to use an option instead of empty-string as no-value default, but I'm not sure it can be easily done with the c" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/643748 (https://phabricator.wikimedia.org/T231339) (owner: 10Mforns)
[17:52:57] joal: refine completed :)
[17:53:03] \o/
[17:53:06] Let me check the data
[17:53:09] 10Analytics: Alter table for navigation timing errors out in Hadoop test - https://phabricator.wikimedia.org/T268733 (10elukey) I had a chat during post stand up today with Marcel and Joseph about the problem, and after some brain-bounce and research this new shiny parameter came up: ` hive.metastore.disallow.i...
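The hive.metastore.disallow.incompatible.col.type.changes parameter referenced in the T268733 comment above controls whether the metastore rejects column type changes it considers incompatible. A minimal sketch of exercising it from beeline; the JDBC URL, Kerberos principal and table/column names are hypothetical, and in a remote-metastore setup the property may need to go into the metastore's hive-site.xml rather than being set per session:

```bash
# Hypothetical: retry a failing ALTER with the compatibility check relaxed.
beeline \
  -u 'jdbc:hive2://analytics-hive.eqiad.wmnet:10000/default;principal=hive/analytics-hive.eqiad.wmnet@WIKIMEDIA' \
  --hiveconf hive.metastore.disallow.incompatible.col.type.changes=false \
  -e 'ALTER TABLE some_db.some_table CHANGE COLUMN some_col some_col BIGINT;'
```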
[17:53:58] elukey: I can't see a new partition :(
[17:54:38] joal, thanks for the review :] I think that ConfigHelper does support Option[String] params: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/scala/org/wikimedia/analytics/refinery/core/config/ConfigHelper.scala#L64
[17:55:11] mforns: if you don't mind, we can try it!
[17:55:17] mforns: otherwise it's ok :)
[17:55:26] joal: of course! trying
[17:55:48] mforns: I think it'll also be worth trying an execution with logs, just in case ;)
[17:56:05] yea yea
[17:56:18] elukey: could it be that there was no data?
[17:59:19] no idea
[17:59:32] lemme check
[18:01:15] previously failed refinement and does not have new data since the last refin etc..
[18:01:24] I have to run it with the flag to don't care :)
[18:01:26] right
[18:01:35] I imagined it could have been
[18:01:46] Thanks for checking that elukey - I was trying to do so as well
[18:02:02] * joal is slower than elukey - truth is stable
[18:02:45] yes sure
[18:02:54] you are SLOWER than me
[18:03:16] I don't buy it :D
[18:04:14] ok re-running with the --ignore_failure_flag=true flag
[18:04:29] Ack!
[18:15:01] elukey: refine succeeded
[18:15:04] checking data :)
[18:15:19] MOAR DATAZ!
[18:16:12] :D
[18:16:19] And importantly, queryable data!
[18:16:23] That's a win elukey :)
[18:17:42] \o/
[18:18:34] :]
[18:23:45] Ok - Leaving for tonight - see you folks :)
[18:24:19] byeeee
[18:27:39] (03PS1) 10Elukey: oozie: move all hive2 actions settings to analytics-hive.eqiad.wmnet [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643762 (https://phabricator.wikimedia.org/T268028)
[18:27:46] the moment has come! :D
[18:28:35] ah no it is wrong
[18:28:51] uff fixing
[18:29:14] 10Analytics, 10Product-Analytics, 10Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (10SBisson) a:05SBisson→03None
[18:30:52] (03PS2) 10Elukey: oozie: move all hive2 actions settings to analytics-hive.eqiad.wmnet [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643762 (https://phabricator.wikimedia.org/T268028)
[18:37:10] weird, in some files I get an extra newline at the end
[18:40:15] (03PS3) 10Elukey: oozie: move all hive2 actions settings to analytics-hive.eqiad.wmnet [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643762 (https://phabricator.wikimedia.org/T268028)
[18:42:59] all right ready to go :)
[18:51:12] also, I think that only oozie on bigtop supports multiple metastores
[18:51:14] sigh
[18:51:28] we'll need to test and see
[18:51:37] anyway, logging off, ttl!
[18:51:59] (03PS3) 10Mforns: Add datasource argument to HiveToDruid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/643748 (https://phabricator.wikimedia.org/T231339)
[18:56:02] (03CR) 10Mforns: [V: 03+2] "Tested it with real ingestion. Seems to work!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/643748 (https://phabricator.wikimedia.org/T231339) (owner: 10Mforns)
[20:22:55] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up automatic deletion/snitization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (10mforns) @ayounsi As discussed in earlier comments, we Analytics will migrate netflow data in HDFS from `/wmf/data/wmf/netflow` to `/wmf/data...
[20:44:24] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up automatic deletion/snitization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (10mforns) @jallemandou @elukey Here's a sketch of the migration plan, mostly a reference for myself! Please raise flags if something is missing...
[21:18:38] (03CR) 10Mforns: [C: 03+1] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643029 (https://phabricator.wikimedia.org/T266077) (owner: 10Joal)
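Two quick checks related to the 18:27-18:42 oozie patch work, as sketches only: the refinery checkout path and the assumption that hive2 actions reference jdbc:hive2:// URLs in their properties files are both guesses. The first finds files not yet pointing at analytics-hive.eqiad.wmnet, the second flags the stray trailing newlines mentioned at 18:37.

```bash
cd /srv/deployment/analytics/refinery/oozie   # path is an assumption

# Files referencing a hive2 JDBC URL but not the analytics-hive CNAME.
grep -rl 'jdbc:hive2://' . | xargs grep -L 'analytics-hive.eqiad.wmnet'

# Files whose last line is blank (the "extra newline at the end" case).
for f in $(grep -rl 'jdbc:hive2://' .); do
  [ -z "$(tail -n 1 "$f")" ] && echo "trailing blank line: $f"
done
```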