[02:45:26] <wikibugs>	 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES recent_score events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight)
[04:23:18] <wikibugs>	 (03PS11) 10Awight: Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732)
[04:24:32] <wikibugs>	 (03CR) 10Awight: "I ran into an interesting twist: we need to watch both datacenters' mediawiki_revision_score streams in case of a service switchover, but " (0314 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732) (owner: 10Awight)
[04:30:25] <wikibugs>	 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES recent_score events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight) Something tricky I ran into: success files aren't written for hours where there are zero changeprop events through codfw.  Maybe we ha...
[07:44:29] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmn...
[08:20:46] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1039.eqiad.wmnet'] `  a...
[08:21:28] <joal>	 Hi team - still not 100% today - I think it's a flu-ish stuff - Will read emails and connect every now and then but will not try to produce
[08:22:51] <elukey>	 rest joal!!
[08:42:47] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (10elukey)
[08:47:39] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (10elukey)
[09:21:55] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10fgiunchedi)
[09:24:17] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (10elukey)
[09:26:11] <wikibugs>	 (03CR) 10Mforns: Make saltrotate store salts with timestamps as file name. (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/484250 (https://phabricator.wikimedia.org/T212014) (owner: 10Mforns)
[09:32:35] <wikibugs>	 (03PS3) 10Mforns: Make saltrotate store salts with timestamps as file name. [analytics/refinery] - 10https://gerrit.wikimedia.org/r/484250 (https://phabricator.wikimedia.org/T212014)
[09:43:50] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10elukey) So the staging db is kinda problematic to clean up, since it is difficult to figure out owners and reach out to people. I have already started to ask to people to review/drop the old tables, but as preca...
[09:46:10] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10elukey) @Neil_P._Quinn_WMF @Milimetric @mforns @leila @nettrom_WMF @DarTar @Tbayer would you mind to review the tables in the description and see if anything is definitely not needed and can be dropped?
[09:48:28] <wikibugs>	 10Analytics, 10Analytics-Wikistats: [Wikistats v2] Default selection for (active) editors is confusing for inexperienced users - https://phabricator.wikimedia.org/T213800 (10Nemo_bis)
[10:04:54] <addshore>	 anyone around to help me try and locate the data for https://meta.wikimedia.org/wiki/Schema:WikibaseTermboxInteraction ?
[10:04:59] <addshore>	 not sure where it is getting lost :/
[10:06:42] <addshore>	 I'm seeing the JS code hit https://www.wikidata.org/beacon/event, and that looks correct
[10:07:40] <addshore>	 oh wait, i see it in hive now, wow, maybe this event just hasn't been triggered by real users
[10:07:41] <addshore>	 hah
[10:09:14] <elukey>	 good :)
[10:22:56] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) a:03elukey
[10:24:42] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) So as far as I can understand I'd need to grab the list of tables and produce a list of:  ` ALTER TABLE $table-name ENGINE=InnoDB; `  @Marostegui does replication need to be stopped w...
[10:28:57] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10Marostegui) No, I don't think you have to stop it. Keep in mind that you can also do: `alter table $SCHEMA.$TABLE engine=InnoDB`
[10:34:56] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) Thanks!  I have created `/home/elukey/aria_tables_alter.sql` on dbstore1002, if you can review them quickly as sanity check it would be great. Then I'd just execute mysql --skip-ssl <...
[10:36:45] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) Another question - should we back up the staging database just in case something goes wrong?
[10:39:04] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (10elukey)
[10:42:27] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10Marostegui) >>! In T213706#4880452, @elukey wrote: > Thanks! >  > I have created `/home/elukey/aria_tables_alter.sql` on dbstore1002, if you could review them quickly as sanity check it would...
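The ALTER TABLE approach discussed in T213706 above could be scripted roughly as follows. This is only a sketch: the function name and the example table names are hypothetical, and the real list of Aria tables would have to come from dbstore1002 itself (for instance from information_schema), which is not shown here.

```python
def alter_statements(tables, engine="InnoDB"):
    """Return one ALTER TABLE statement per (schema, table) pair.

    Identifiers are not quoted or escaped, so this assumes plain
    table names like the $SCHEMA.$TABLE form used in the task.
    """
    return [
        f"ALTER TABLE {schema}.{table} ENGINE={engine};"
        for schema, table in tables
    ]


# Hypothetical table names, for illustration only.
EXAMPLE_TABLES = [
    ("staging", "example_table_a"),
    ("staging", "example_table_b"),
]

if __name__ == "__main__":
    print("\n".join(alter_statements(EXAMPLE_TABLES)))
```

The output could be saved to a file and reviewed before being run, in the same spirit as the sanity check elukey asks for.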
[12:22:12] * elukey lunch!
[12:36:25] <addshore>	 joal: feeling any better?
[12:36:42] <addshore>	 oh no, I see your message now from earlier!
[12:36:45] <addshore>	 get well soon!
[12:50:41] <icinga-wm>	 PROBLEM - eventbus grafana alert on icinga1001 is CRITICAL: CRITICAL: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is alerting: EventBus POST Response Status alert.
[12:52:52] <elukey>	 mmmm
[12:52:56] <elukey>	 grafana alert?
[12:53:07] <icinga-wm>	 RECOVERY - eventbus grafana alert on icinga1001 is OK: OK: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is not alerting.
[12:54:48] <elukey>	 EventBus POST Response Status alert
[12:54:48] <elukey>	 NO DATA for 2 minutes
[12:54:52] <elukey>	 ah there you go
[12:55:57] <elukey>	 will ask Andrew
[13:08:06] <wikibugs>	 10Analytics: virtualpageview_hourly lacks data from December 17 on - https://phabricator.wikimedia.org/T213602 (10Tbayer) Great, thank you @Ottomata and everyone else for solving this so quickly!  >>! In T213602#4878976, @Nuria wrote: > Data is present now up to the 22nd.  >>! In T213602#4878990, @Nuria wrote: >...
[13:10:47] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) Let's wait for {T213706} to be done before we migrate the existing copy of `stagingdb` to any of the hosts.
[13:10:58] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10Marostegui)
[13:11:02] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui)
[13:42:35] <wikibugs>	 10Analytics: virtualpageview_hourly lacks data from December 17 on - https://phabricator.wikimedia.org/T213602 (10Nuria) It finished by midnite the December data, which  is the one you needed for the report.
[13:48:55] <wikibugs>	 10Analytics, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) p:05Normal→03High
[13:51:05] <wikibugs>	 10Analytics: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long - https://phabricator.wikimedia.org/T213716 (10elukey) As FYI I can see the following for Dec 17th in my inbox for analytics-alerts@:  ` OOZIE - SLA END_MISS (AppName=virtualpageview-hourly-coord, JobID...
[14:02:04] <nuria>	 fdans: moved systemd timers and a couple of other snippets to the new doc and "emptied" the old one
[14:04:00] <fdans>	 nuria: those bits should be turned into something actionable, I don't really know what to do with the systemd timers
[14:05:01] <nuria>	 fdans: we can rework that, but it is very useful: if a job fails you want to see logs, and the way to do that has changed a lot since we use systemd
[14:12:29] <elukey>	 nuria: in the docs the only bit missing is, as far as I can see, how to restart jobs
[14:12:32] <elukey>	 I can add it now
[14:12:53] <elukey>	 but it is basically issuing a start to the service unit
[14:14:11] <elukey>	 ah no it is mentioned, trying to highlight it
[14:14:32] <elukey>	 anyway, I'd suggest to play with them and see what are the doubts etc..
[14:16:04] <elukey>	 ah no just realized that it was fdans to discuss about timers
[14:16:15] <elukey>	 :)
[14:17:29] <wikibugs>	 10Analytics: Reportupdater should alert if it fails over and over - https://phabricator.wikimedia.org/T213309 (10elukey) We decided to try a simple systemd timer for the moment, that will alarm if report updater will run and return a non zero code. This is currently tracked in T172532
[14:28:36] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey)
[14:29:03] <elukey>	 fdans: you there?
[14:29:12] <fdans>	 hellooo
[14:29:14] <elukey>	 o/
[14:29:25] <elukey>	 if you have time we can deploy superset
[14:29:38] <fdans>	 elukey: I'm ready if you are
[14:29:50] <elukey>	 let's do it
[14:30:01] <elukey>	 so I am going to merge https://gerrit.wikimedia.org/r/#/c/analytics/superset/deploy/+/481056/ (please check that it is the good one)
[14:30:19] <elukey>	 stop superset, take a dump of the database, and finally deploy
[14:30:23] <elukey>	 how does it sound?
[14:35:21] <elukey>	 fdans: ?
[14:35:46] <fdans>	 haha I read it as "take a dump on the database"
[14:36:02] <elukey>	 ahahhahaha
[14:36:16] <fdans>	 the patch looks good to me
[14:36:24] <wikibugs>	 (03CR) 10Fdans: [C: 03+1] Bump to superset version 0.26.3-wikimedia1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481056 (owner: 10Ottomata)
[14:36:44] <elukey>	 fdans: can you +2 merge it and update the superset repo on deploy1001 while I take the mysql dump ?
[14:36:59] <elukey>	 !log stop superset to allow a clean mysqldump
[14:37:01] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:37:10] <fdans>	 elukey: on it
[14:38:23] <wikibugs>	 (03CR) 10Fdans: [V: 03+2 C: 03+2] Bump to superset version 0.26.3-wikimedia1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/481056 (owner: 10Ottomata)
[14:38:46] <elukey>	 fdans: feel free to deploy whenever you want
[14:39:50] <fdans>	 elukey: merged and pulled last version on repo
[14:40:06] <elukey>	 you should have perms to deploy right?
[14:40:15] <fdans>	 yeah, using scap?
[14:40:18] <elukey>	 yep
[14:40:21] <fdans>	 cool
[14:40:58] <fdans>	 !log deploying  superset 0.26.3-wikimedia1
[14:40:59] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:41:29] <fdans>	 elukey: done
[14:42:02] <elukey>	 fdans: goood! so in theory no db upgrade is needed
[14:42:29] <elukey>	 let's check the dashboards
[14:42:36] <elukey>	 and then if the issue has been fixed
[14:44:08] <wikibugs>	 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES recent_score events into Hadoop - https://phabricator.wikimedia.org/T209732 (10Ottomata) Yeah, this isn't the first time we've had this problem.  It isn't actually that easy to solve, because the Kafka consumer doesn't ad...
[14:48:38] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10fgiunchedi) Discovery records for kafka would come handy in the logging pipeline case too, namely during datacenter failover to move producers off a given datacen...
[14:49:49] <elukey>	 something is weird, it seems like we deployed 0.28 again
[14:49:50] <fdans>	 elukey: lol the issue is fixed by f-strings in the superset repo
[14:50:07] <elukey>	 yeah but that was the thing that we noticed after the deploy to 0.28
[14:50:24] <elukey>	 we are seeing the same errors?
[14:50:28] <elukey>	 how the hell is that possible?
[14:51:26] <fdans>	 elukey: hmmm, if that's the case the filter box should be all broken, lemme check
[14:51:47] <elukey>	 elukey@analytics-tool1003:/srv/deployment/analytics/superset/deploy$ ls artifacts/stretch/superset*
[14:51:50] <elukey>	 artifacts/stretch/superset-0.26.3_wikimedia1-py3-none-any.whl
[14:51:56] <elukey>	 this looks good
[14:52:05] <fdans>	 elukey: yep, filterbox is broken
[14:52:05] <fdans>	 https://superset.wikimedia.org/superset/dashboard/geowikiarchive/?preselect_filters=%7B%0A%20%20%2248%22%3A%20%7B%0A%20%20%20%20%22__from%22%3A%20%222018-03-01T00%3A00%3A00%22%2C%0A%20%20%20%20%22__to%22%3A%20%222018-04-01T00%3A00%3A00%22%0A%20%20%7D%0A%7D
[14:52:21] <fdans>	 it seems like we are indeed in 0.28
[14:53:07] <elukey>	 sigh
[14:53:14] <elukey>	 let's rollback fdans 
[14:53:41] <elukey>	 we need the staging environment
[14:53:46] <elukey>	 before any more deployment
[14:53:55] <fdans>	 elukey: can we rollback using scap?
[14:54:14] <fdans>	 or revert + scap deploy?
[14:54:34] <elukey>	 fdans: revert + scap deploy afaik
[14:55:05] <fdans>	 ok doing it elukey 
[14:55:19] <wikibugs>	 (03PS1) 10Fdans: Revert "Bump to superset version 0.26.3-wikimedia1" [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/484444
[14:55:44] <wikibugs>	 (03CR) 10Fdans: [V: 03+2 C: 03+2] Revert "Bump to superset version 0.26.3-wikimedia1" [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/484444 (owner: 10Fdans)
[14:56:30] <fdans>	 !log "rolling back to stable superset"
[14:56:31] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:58:26] <elukey>	 fdans: is the deploy ongoing or already finished?
[14:58:51] <fdans>	 elukey: still on promote and restart_service stage(s)
[14:59:14] <elukey>	 ah okok
[14:59:18] <elukey>	 so that explains the 502s
[15:00:03] <fdans>	 elukey: it does seem hard stuck there
[15:00:14] <fdans>	 it didn't take that long at all before
[15:00:34] <wikibugs>	 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES recent_score events into Hadoop - https://phabricator.wikimedia.org/T209732 (10Nuria) So i understand, the expectation here will be that files are written for all hours but empty for those of which there was no data?
[15:01:27] <fdans>	 elukey: failed
[15:02:02] <fdans>	 https://www.irccloud.com/pastebin/z2D2TojK/
[15:02:38] <elukey>	 that doesn't make any sense though
[15:03:17] <elukey>	 because that git repo is only present in the new version
[15:03:23] <elukey>	 ah no right
[15:03:30] <fdans>	 nono this is the revert
[15:03:30] <elukey>	 there are other two patches merged by andrew
[15:03:36] <elukey>	 before that
[15:04:27] <elukey>	 fdans: we can do something like that - create a branch on deploy1001 from the last commit from upstream
[15:04:39] <elukey>	 should be the one before andrew's merges
[15:04:43] <elukey>	 then deploy from that one
[15:04:57] <elukey>	 in the meantime we'll try to figure out how to proceed
[15:05:01] <elukey>	 how does it sound?
[15:05:47] <elukey>	 fdans: ?
[15:05:48] <fdans>	 elukey: not sure what you mean with the last commit from upstream
[15:06:29] <elukey>	 you are right, I meant the last "stable" commit from us
[15:06:46] <elukey>	 that should be fcc7058e90a8fc83eeaa012bd751af2a0f7f3fb0
[15:07:06] <elukey>	 after that there are 3 commits from andrew + your revert
[15:08:02] <elukey>	 otherwise I can do it
[15:08:04] <elukey>	 let me know
[15:08:19] <fdans>	 ok, let me see
[15:08:19] <elukey>	 or even better, bc?
[15:08:28] <elukey>	 probably more productive
[15:11:35] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10Joe) Sorry, I need some more specifics:  you want to make a dns query, and get as a response the "nearest" kafka cluster in the form of a list of hostnames/ports?...
[15:11:43] <ottomata>	 o/ lemme know if yall need help!
[15:13:35] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10Ottomata) No no for me, all I want is an alias for the list of Kafka brokers in a given Kafka cluster.  I don't need any DC failover stuff.  Perhaps discovery is...
[15:13:45] <elukey>	 ottomata: if you have time we are in bc!
[15:23:05] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10mforns) @elukey  Definitely the tables prefixed with mforns_ can be deleted.
[15:26:14] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10Joe) Might I suggest that you use a SRV dns record instead? It's more appropriate for enumerating members in a cluster. We use those for etcd discovery.
[15:29:35] <wikibugs>	 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 2 others: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 (10kchapman)
[15:33:15] <wikibugs>	 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 3 others: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 (10CCicalese_WMF) a:03Pchelolo
[16:15:41] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), and 3 others: Spin out a tiny EventLogging RL module for lightweight logging - https://phabricator.wikimedia.org/T187207 (10Milimetric) I just checked and I think we've exorcised any async or...
[16:21:16] <wikibugs>	 10Analytics: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long - https://phabricator.wikimedia.org/T213716 (10Nuria) a:03Nuria
[16:27:33] <wikibugs>	 10Analytics, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wire ORES recent_score events into Hadoop - https://phabricator.wikimedia.org/T209732 (10awight) >>! In T209732#4881126, @Ottomata wrote: > We could emit a single test event per hour into the topic in each dc... :)  That works for...
[16:33:24] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) Took a mysqldump of the staging database and moved it in two places: * on dbstore1002's /srv/elukey_backup * on stat1007's /home/elukey home dir (chmod root:root...
[17:06:23] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH) @fgiunchedi: Thanks for updating about the ms-be systems!  I see you added they can be gracefully powered down, can we just power them back up and ensure puppet runs post...
[17:07:31] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH) >>! In T213748#4881709, @RobH wrote: > @fgiunchedi: Thanks for updating about the ms-be systems!  I see you added they can be gracefully powered down, can we just power t...
[17:50:33] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10leila) @elukey Please delete tables that include leila and leizi in their name. And my apologies that I didn't clean up after myself. I will do better in the future.
[17:52:42] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10DarTar) @elukey same for all dartar_* tables, they can safely be removed.
[17:54:14] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10elukey) Thanks all!
[17:55:16] <elukey>	 milimetric: whenever you have time, can you check the milimetric_ prefixed tables --^ 
[17:55:19] <elukey>	 ?
[17:56:00] * elukey off!
[17:56:02] <elukey>	 o/
[17:58:10] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10Neil_P._Quinn_WMF) It looks like I only have a few staging tables because I've been cleaning up as I go, but I checked and dropped `ve_experiment_expanded` and `neilpquinn_VE_experiment_revs`
[18:01:08] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10diego) Hi! I'm not sure what is this, but for sure you can delete diego_tmp.  Thanks
[18:09:27] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10nettrom_WMF) I went ahead and deleted all tables starting with "nettrom_" except the four tables referenced in T190434#4085830.
[18:54:10] <awight>	 It's not clear to me whether it's safe to run concurrent inserts into the same table from Oozie...
[18:57:55] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) Tomorrow (Jan 15) we have a meeting with some SRE folks to revisit this.  We've got the cloud-analytics Hadoop...
[18:59:25] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) Some links:  - https://prestodb.io/docs/current/security/ldap.html - https://prestodb.io/docs/current/connector...
[19:01:19] <ottomata>	 hey awight, i see some oozie failure emails
[19:01:21] <ottomata>	 i think you don't get those
[19:01:30] <ottomata>	 you should add your email address to the list of emails that get alerts for that job
[19:02:04] <ottomata>	 OH
[19:02:08] <ottomata>	 its hardcoded!
[19:02:22] <ottomata>	 in send_error_email workflow (assuming you are using that)
[19:02:39] <ottomata>	 oh hmm maybe its not
[19:03:35] <ottomata>	 hm its not hardcoded but we never override it anywhere
[19:03:37] <ottomata>	 you probably should for this
[19:03:50] <ottomata>	 awight:  to answer your previous question
[19:03:56] <ottomata>	 as long as the inserts are into separate hive partitions
[19:04:00] <ottomata>	 it is fine to do it concurrently
[19:04:43] <awight>	 ottomata: thanks for the heads-up, I was fumbling the send_error_email overwrites, would be nicer if I could override or something...
[19:04:52] <awight>	 Hopefully it stops, the job is killed.
[19:05:08] <awight>	 hrm, definitely going to be the same partition, so I'll just set concurrency to 1
[19:05:22] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10Neil_P._Quinn_WMF) I also checked with @JKatzWMF and dropped `jkatz_foo`, `jkatz_foosss26`, and `editor_stats_JK_test`.
[19:09:27] <ottomata>	 if same partition it might be ok awight depends on how the insert is
[19:09:32] <ottomata>	 actually, i think it will be fine if you are not doing insert overwrite
[19:09:37] <ottomata>	 it'll just write new files
[19:09:39] <ottomata>	 in the same partition
[19:10:19] <ottomata>	 awight:  for email overriding, i think if you set something like error_alert_contact=analytics-alerts@wikimedia.org,awight@wikimedia.org
[19:10:26] <ottomata>	 you can then pass it to send_error_email workflow as the
[19:10:47] <ottomata>	 <to>${error_alert_contact}</to> email
[19:10:50] <ottomata>	 param*
[19:10:59] <ottomata>	 oh
[19:11:00] <ottomata>	 i guess
[19:11:14] <ottomata>	                 <property>
[19:11:14] <ottomata>	                     <name>to</name>
[19:11:14] <ottomata>	                     <value>${error_alert_contact}</value>
[19:11:14] <ottomata>	                 </property>
[19:11:19] <ottomata>	 something like that
[19:11:25] <awight>	 ottomata: yes!  great, thanks
[19:11:34] <awight>	 I was missing that parameter
[19:11:44] <awight>	 and apologies for the team spam...
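The email override ottomata describes above comes down to two pieces: a comma-separated contact list and a property that forwards it to the error-email workflow. A minimal sketch, assuming the sub-workflow really takes a parameter named "to" (ottomata only says "something like that"); the helper names here are hypothetical and not part of refinery:

```python
import xml.etree.ElementTree as ET


def contact_list(addresses):
    """Join addresses into the comma-separated value for error_alert_contact."""
    return ",".join(addresses)


def to_property(param="error_alert_contact"):
    """Render a <property> element that forwards ${param} as 'to'.

    The property name 'to' is taken from the chat above and is an
    assumption about what the send_error_email workflow expects.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "to"
    ET.SubElement(prop, "value").text = "${" + param + "}"
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(contact_list(["analytics-alerts@wikimedia.org", "awight@wikimedia.org"]))
    print(to_property())
```

The first value would go into the job properties and the rendered element into the configuration passed to the error-email sub-workflow.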
[19:58:31] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Patch-For-Review, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10Pchelolo) Once we get it we would need to update Change-Prop, JQ-Change-Prop, EventBus-service, event streams to use the new DNS record.
[20:20:51] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Patch-For-Review, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10akosiaris) >>! In T213561#4881255, @Joe wrote: > Might I suggest that you use a SRV dns record instead? It's more appropriate for enumeratin...
[20:21:56] <wikibugs>	 10Analytics, 10Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (10Milimetric) dropped all `milimetric_` tables, and gave up on my dreams of figuring out what exactly is going on with mediawiki's revision table.
[20:23:45] <wikibugs>	 10Analytics, 10EventBus, 10Operations, 10Patch-For-Review, 10Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (10Ottomata) Kafka doesn't support SRV.  Hence my Round Robin DNS patch.  After more discussion with @bblack, I think I've decided to abandon t...
[20:24:41] <wikibugs>	 10Analytics: Find out what happens to the old rows in the revision table - https://phabricator.wikimedia.org/T142535 (10Milimetric) I just dropped the data I mentioned in this task.  Since we've been sqooping from mediawiki, we have a version of this kind of data in the `wmf_raw` database, in the `mediawiki_revi...
[21:17:25] <wikibugs>	 (03PS12) 10Awight: Oozie jobs to produce ORES data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482753 (https://phabricator.wikimedia.org/T209732)
[23:18:20] <wikibugs>	 10Analytics, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Nuria) The first thing we need to do is to oozie-fy the data creation workflow that produces the files you would be loading into mysql (likely tsv), let...
[23:33:00] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10ayounsi) I think there is a distinction to make here when saying "prod", as it's made of several vlans/networks, especial...