[05:22:18] <wikibugs>	 (PS4) Legoktm: Update for Buster, refresh packaging [analytics/udplog] - https://gerrit.wikimedia.org/r/668451 (owner: Majavah)
[05:31:32] <wikibugs>	 (PS5) Legoktm: Update for Buster, refresh packaging [analytics/udplog] - https://gerrit.wikimedia.org/r/668451 (owner: Majavah)
[05:34:53] <wikibugs>	 (PS1) Legoktm: Delete gbp.conf, use default options [analytics/udplog] - https://gerrit.wikimedia.org/r/668597
[05:36:24] <wikibugs>	 (CR) Legoktm: [C: +2] "I added Mortiz's changelog entry for completeness and then dropped "debhelper" from Build-Depends since it's implied from debhelper-compat" [analytics/udplog] - https://gerrit.wikimedia.org/r/668451 (owner: Majavah)
[05:36:32] <wikibugs>	 (Merged) jenkins-bot: Update for Buster, refresh packaging [analytics/udplog] - https://gerrit.wikimedia.org/r/668451 (owner: Majavah)
[05:37:27] <wikibugs>	 (PS2) Legoktm: Delete d/gbp.conf and d/files, use default options [analytics/udplog] - https://gerrit.wikimedia.org/r/668597
[05:37:29] <wikibugs>	 (CR) Legoktm: [C: +2] Delete d/gbp.conf and d/files, use default options [analytics/udplog] - https://gerrit.wikimedia.org/r/668597 (owner: Legoktm)
[05:37:34] <wikibugs>	 (Merged) jenkins-bot: Delete d/gbp.conf and d/files, use default options [analytics/udplog] - https://gerrit.wikimedia.org/r/668597 (owner: Legoktm)
[06:09:57] <wikibugs>	 (PS1) Legoktm: Fix packaging [analytics/udplog] - https://gerrit.wikimedia.org/r/668599
[06:11:28] <wikibugs>	 (CR) Legoktm: [C: +2] Fix packaging [analytics/udplog] - https://gerrit.wikimedia.org/r/668599 (owner: Legoktm)
[06:11:36] <wikibugs>	 (Merged) jenkins-bot: Fix packaging [analytics/udplog] - https://gerrit.wikimedia.org/r/668599 (owner: Legoktm)
[06:19:50] <wikibugs>	 (CR) Legoktm: Update for Buster, refresh packaging (2 comments) [analytics/udplog] - https://gerrit.wikimedia.org/r/668451 (owner: Majavah)
[07:00:38] <wikibugs>	 (CR) Joal: [C: +2] "All good :) Thanks for the patches - merge when you wish" [analytics/refinery] - https://gerrit.wikimedia.org/r/664885 (https://phabricator.wikimedia.org/T273116) (owner: Mforns)
[07:00:53] <elukey>	 good morning 
[07:01:13] <elukey>	 !log stop hadoop daemons on analytics1066 - disk errors on /dev/sdb after reimage
[07:01:15] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:13:42] <elukey>	 !log add analytics1066 back with /dev/sdb removed
[07:13:45] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:18:33] <elukey>	 it was already not added, weird I don't see a task for a broken disk 
[07:21:27] <elukey>	 joal: bonjour :) ok if I run the systemd timer to drop the druid public datasource?
[07:21:32] <elukey>	 to see how it goes
[07:22:51] <elukey>	 !log drain + reimage analytics107[0-1] to debian buster
[07:22:52] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:42:59] <wikibugs>	 (PS2) Lex Nasser: Fix and optimize Hive query and change field names in properties file for top-per-country job [analytics/refinery] - https://gerrit.wikimedia.org/r/668236 (https://phabricator.wikimedia.org/T207171)
[07:45:12] <wikibugs>	 Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1070.eqiad.wmnet', 'analytics1071.eqiad.wmnet'] ` The log can be found in...
[08:18:10] <wikibugs>	 Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1071.eqiad.wmnet', 'analytics1070.eqiad.wmnet'] `  and were **ALL** successful.
[08:32:08] <elukey>	 !log drain + reimage an-worker107[8,9] to Debian Buster (one Journal node included)
[08:32:11] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:42:35] <wikibugs>	 Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1078.eqiad.wmnet', 'an-worker1079.eqiad.wmnet'] ` The log can be found in...
[09:45:16] <wikibugs>	 Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1079.eqiad.wmnet', 'an-worker1078.eqiad.wmnet'] `  and were **ALL** successful.
[09:46:42] <elukey>	 officially more hadoop worker nodes on buster than on stretch :)
[09:46:51] <elukey>	 41 vs 37
[09:53:24] <elukey>	 very weird, the namenode failedover
[09:54:37] <elukey>	 ah snap I think it spent too much time in GC
[09:55:24] <elukey>	 probably time to bump the heap size
[10:06:54] <elukey>	 !log failover HDFS Namenode from 1002 to 1001 (high GC pauses triggered the HDFS zkfc daemon on 1001 and the failover to 1002)
[10:06:55] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:19:37] <elukey>	 created https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659
[10:20:23] <elukey>	 !log force run of refinery-druid-drop-public-snapshots to check Druid public's performance
[10:20:26] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:24:36] <elukey>	 datasource dropped, so far no sign of troubles, metrics look good
[10:24:40] <elukey>	 wikistats is ok as well
[10:25:31] <elukey>	 joal: I think we did it!!!
[10:25:32] * elukey dances
[10:56:10] <wikibugs>	 Analytics, Analytics-Kanban, Patch-For-Review: Dropping data from druid takes down aqs hosts - part 2 - https://phabricator.wikimedia.org/T270173 (elukey) Forced a data drop on Druid public and nothing really happened, the problem seems gone!
[11:17:26] * elukey lunch!
[12:22:38] <joal>	 Hi elukey - sorry, I've been taking some disconnected time this morning - no problems at the datasource drop feels like a huge win :) You rock elukey :)
[13:11:08] <wikibugs>	 (CR) Joal: "Comment about comment :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/668236 (https://phabricator.wikimedia.org/T207171) (owner: Lex Nasser)
[13:23:04] <wikibugs>	 (CR) Mforns: [V: +2 C: +2] "Thanks for the reviewwww, Joal!" [analytics/refinery] - https://gerrit.wikimedia.org/r/664885 (https://phabricator.wikimedia.org/T273116) (owner: Mforns)
[13:31:00] <elukey>	 joal: when you have a moment - https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659/
[13:31:49] <elukey>	 the beast needs to be fed :D
[13:32:02] <joal>	 uhuh
[13:32:49] <joal>	 well - let's do it :)
[13:33:19] <joal>	 elukey: I also wish we work toward reducing file-numbers
[13:34:09] <elukey>	 joal: ah yes I agree :)
[13:36:31] <elukey>	 !log roll restart HDFS Namenodes for the Hadoop cluster to pick up new Xmx settings (https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659)
[13:36:32] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:00:18] <elukey>	 ok the failover from 1002 to 1001 didn't work, weird
[14:00:20] <elukey>	 the cookbook failed
[14:00:30] <joal>	 meh?
[14:03:23] <elukey>	 joal: second attempt worked
[14:05:56] <elukey>	 that is not super great, but I think that the first failed since an-master1001 became unhealthy when doing the failover
[14:06:20] <elukey>	 I waited a lot of minutes, maybe it was too soon, from now on the failovers might need more than 5 mins of wait time
[14:07:10] <elukey>	 ok I am going to wait 10/15 mins before restarting the NN on 1002
[14:07:13] <elukey>	 to complete the procedure
[14:20:03] <elukey>	 ok restarted 1002
[14:23:43] <elukey>	 ottomata: nice reduction with the R package removal! 
[14:24:05] <elukey>	 (sorry for the reviews, didn't have the time to review them in depth and I didn't want to slow you down :( )
[14:25:18] <ottomata>	 s'ok i'm mostly adding you as reviewers for reference and/or objections
[14:28:43] <elukey>	 ack, I feel super ignorant about the conda stuff, I'll have to review it sooner or later
[14:29:02] <elukey>	 when you have a moment next week I'd like to pick your brain on https://github.com/criteo/tf-yarn
[14:29:25] <elukey>	 it uses the cluster-pack/conda-pack thing, I am wondering if we could use it after we add GPU labels in yarn
[14:29:39] <elukey>	 (maybe adapting it to the conda work that you did)
[14:29:52] <elukey>	 I'll also ping Fabian
[14:30:19] <elukey>	 (I hate all those GPUs gathering dust :D)
[14:35:03] <ottomata>	 elukey:  ya i read a bunch of that code
[14:35:11] <ottomata>	 it is much more flexible and cool than what i wrote
[14:35:21] <ottomata>	 it is able to detect if local packages have changed and re-upload to yarn
[14:35:22] <ottomata>	 but
[14:35:30] <ottomata>	 it is missing some things we need (probably could do a pull request)
[14:35:44] <ottomata>	 and ultimately, for the conda pack stuff, it doesn't do much other than what I wrote
[14:52:03] <mforns>	 ottomata: hi! do you have a moment for a chat about session length?
[15:02:07] <elukey>	 ottomata: yep yep I asked since I was wondering if your code could fit in, and it seems so, good :)
[15:07:27] <elukey>	 !log drain + reimage analytics1073 and an-worker1086 to Debian Buster
[15:07:30] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:12:43] <milimetric>	 joal: is it just me or did hdfs dfs -cat / hdfs dfs -text used to work for parquet files and doesn't anymore?
[15:12:55] <joal>	 milimetric: nope, never worked :)
[15:13:06] <joal>	 milimetric: parquet data is not to be visible in text
[15:13:19] <milimetric>	 I could've sworn it did :)  ok, just me, I thought we had hooked something up to read it
[15:13:29] <milimetric>	 maybe that was avro
[15:19:27] <mforns>	 I'm back in fdans 
[15:20:03] <fdans>	 mforns: I think my internet is screwed
[15:20:30] <mforns>	 oops, ok, no worries
[15:35:47] <razzi>	 Hi team, g'day
[15:39:05] <razzi>	 !log rebalance kafka partitions for webrequest_upload partition 10
[15:39:08] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:39:32] <ottomata>	 yoohoo
[15:39:52] <ottomata>	 elukey:  razzi  yall doing labsdb today?
[15:40:16] <razzi>	 That's the plan!
[15:40:24] <razzi>	 I'm ready to start whenever
[15:49:12] <elukey>	 razzi: I am here to help/assist if you need :)
[15:49:43] <razzi>	 ok cool! I'll get started
[15:50:16] <razzi>	 Steps are at https://phabricator.wikimedia.org/T269211#6883946, here I go...
[15:50:32] <ottomata>	 :)
[15:50:44] <elukey>	 razzi: one nit - there is still a reference to "Analytics vlan" in the steps, remember that it should be cloud-etc..
[15:52:19] <razzi>	 Updated! Thanks elukey 
[15:52:41] <razzi>	 !log stop mariadb on labsdb1012
[15:52:43] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:53:24] <elukey>	 razzi: also ping people in #databases so they know :)
[15:54:06] <elukey>	 and if possible also !log in #operations for visibility
[15:55:37] <razzi>	 Ok, messaged to #wikimedia-databases, will log here and in operations
[15:56:44] <elukey>	 super
[16:00:18] <elukey>	 razzi: one qs - in netbox, is Device: labsdb1012 correct? Or does it need to be clouddb1021?
[16:01:28] <razzi>	 elukey: good question, clouddb1021 makes more sense I think since by then I'll have already renamed the dns name to clouddb1021
[16:01:50] <elukey>	 it is not clear in the docs but yeah clouddb looks more reasonable, I'll dig into it
[16:03:38] <elukey>	 yes yes I think we need to use clouddb1021 in there
[16:05:58] <razzi>	 ok cool, thanks for the catch
[16:08:35] <razzi>	 !log sudo cookbook sre.hosts.decommission labsdb1012.eqiad.wmnet -t T269211
[16:08:37] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:08:38] <stashbot>	 T269211: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211
[16:18:58] <elukey>	 nice cookbook completed :)
[16:19:44] <razzi>	 klausman: looks like you have a commit awaiting puppet-merge: ml-ctrl: Add dummy keys for ML k8s control plane
[16:20:10] <klausman>	 will merge in a New York minute
[16:20:11] <razzi>	 Seems like puppet-merge got smarter; it asked if I wanted to merge those, then asked if I wanted to merge mine, rather than making me merge all at once
[16:20:43] <klausman>	 Either is fine by me, just lmk :)
[16:21:15] <razzi>	 Maybe because those are in secrets module actually, and mine were public puppet
[16:21:28] <klausman>	 Yeah, otherwise it'd rewrite history
[16:28:48] <razzi>	 !log rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet
[16:28:50] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:30:35] <razzi>	 !log delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/
[16:30:39] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:35:16] <razzi>	 elukey: for the form at https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/, I only see the old device name labsdb1012, should I rename it to clouddb1021 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/? 
[16:40:15] <elukey>	 razzi: checking sorry
[16:44:03] <elukey>	 so https://netbox.wikimedia.org/search/?q=clouddb1021&obj_type= looks definitely strange, the parent is labsdb1012
[16:44:26] <razzi>	 hm ok
[16:46:37] <elukey>	 razzi: ah yes, see the "Edit the device page with the new name etc.."
[16:46:40] <elukey>	 https://netbox.wikimedia.org/dcim/devices/2078/
[16:46:53] <elukey>	 it is planned, but not clouddb1021
[16:46:58] <elukey>	 (still carrying the old name)
[16:47:04] <razzi>	 ok cool, missed that step!
[16:47:24] <elukey>	 in theory after this edit you should find it
[16:47:27] <elukey>	 (in the dropdown)
[16:47:41] <razzi>	 !log edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021
[16:47:43] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:48:12] <razzi>	 I may be spamming with all my !log-ing, but better to log too much than too little I think
[16:48:54] <joal>	 +1 razzi :)
[16:49:21] <elukey>	 razzi: yep yep! If you want you can just !log the macro operation in #operations (like !log rename blabla) and then be spammy in the task or in here
[16:49:43] <razzi>	 ok gotcha
[16:49:44] <elukey>	 the important bit in #operations is that people can see if an alert matches with some ongoing ops
[16:50:39] <cdanis>	 ottomata: hey, any chance you'll be able to take another pass over https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/667948 today?
[16:50:56] <ottomata>	 cdanis: yes can look at it today!  
[16:51:00] <ottomata>	 ty for reminder
[16:51:03] <cdanis>	 thank you!
[16:52:32] <razzi>	 !log run script at https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/
[16:52:36] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:54:44] <razzi>	 !log sudo cookbook sre.dns.netbox -t T269211 "Reimage and rename labsdb1012 to clouddb1021"
[16:54:53] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:54:53] <stashbot>	 T269211: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211
[16:56:23] <ottomata>	 merged cdanis 
[16:56:30] <cdanis>	 oh awesome
[16:56:39] <ottomata>	 u can do helmfile stuff?
[16:56:47] <cdanis>	 yep!
[17:04:39] <elukey>	 razzi: how is it going with the DNS? :)
[17:04:47] <elukey>	 ah already done, good
[17:05:06] <razzi>	 yep, now working on "insetup" puppet patch
[17:05:17] <wikibugs>	 (PS1) Phuedx: WIP: Add properties to UniversalLanguageSelector schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766)
[17:05:47] <elukey>	 perfect, let's see then if the reimage works :)
[17:07:40] <razzi>	 Yup!
[17:07:53] <razzi>	 !log sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new
[17:07:56] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:07:56] <stashbot>	 T269211: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211
[17:26:28] <elukey>	 razzi: how is the reimage going?
[17:26:51] <razzi>	 Seeing "Still waiting for reboot after 10.0 minutes", hopefully that's normal?
[17:27:10] <elukey>	 I am in the console and I don't see much ongoing
[17:27:24] <elukey>	 did it get to debian install?
[17:27:35] <elukey>	 or is it the reboot before it?
[17:28:48] <razzi>	 Here's the output thus far:
[17:28:48] <razzi>	 17:09:08 | clouddb1021.eqiad.wmnet | Removed from Puppet
[17:28:48] <razzi>	 17:09:08 | clouddb1021.eqiad.wmnet | WARNING: Unable to remove from Debmonitor, got: 404
[17:28:48] <razzi>	 17:09:08 | clouddb1021.eqiad.wmnet | Set Boot Device to pxe
[17:28:48] <razzi>	 17:09:09 | clouddb1021.eqiad.wmnet | Current power status is off, powering on
[17:28:49] <razzi>	 17:09:09 | clouddb1021.eqiad.wmnet | Chassis Power Control: Up/On
[17:28:49] <razzi>	 17:15:10 | clouddb1021.eqiad.wmnet | Still waiting for reboot after 5.0 minutes
[17:28:50] <razzi>	 17:22:40 | clouddb1021.eqiad.wmnet | Still waiting for reboot after 10.0 minutes
[17:29:01] <mforns>	 ottomata: hellooo, do you have some time (10 mins) to discuss session length?
[17:30:21] <elukey>	 razzi: strange 
[17:30:47] <ottomata>	 mforns:  yes now is perfect
[17:30:58] <mforns>	 ok! bc?
[17:31:01] <ottomata>	 k
[17:31:43] <elukey>	 razzi: tried to reboot it from the console (vsp -> power reset)
[17:32:12] <elukey>	 nope I don't see anything
[17:34:52] <elukey>	 it is strange since the mgmt interface seems working
[17:34:59] <wikibugs>	 (PS1) Eric Gardner: Update schema to 1.3.0 and add new "image" mediatype option [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668748
[17:35:34] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update schema to 1.3.0 and add new "image" mediatype option [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668748 (owner: Eric Gardner)
[17:36:57] <wikibugs>	 (PS2) Eric Gardner: Update schema to 1.3.0 and add new "image" mediatype option [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668748
[17:37:11] <elukey>	 razzi: when the decom script ran, did it kick off a homer script to update the switch config?
[17:37:24] <razzi>	 elukey: yes
[17:37:41] <elukey>	 razzi: and I guess it removed all configs right?
[17:39:23] <elukey>	 I see in your procedure that the homer stuff is updated at the bottom, but now I don't see anything on the switch related to clouddb1021
[17:40:12] <elukey>	 ah also big surprise, the new clouddb nodes are in the private vlan
[17:40:21] <elukey>	 not in the cloud one
[17:41:03] <elukey>	 ah snap the same as https://phabricator.wikimedia.org/T260441
[17:42:27] <elukey>	 razzi: ok so the situation is a bit complicated, we need to ping Brooke to ask what is the right VLAN, even if I suspect private
[17:42:52] <razzi>	 ok, I found the original homer output in case that's useful
[17:43:06] <elukey>	 in case, we'll need to remove the interfaces (except mgmt), re-run the script to provision but in private, and then run again netbox to update the dns
[17:43:45] <elukey>	 razzi: can you add that into a paste?
[17:47:03] <razzi>	 yep, one moment
[17:47:14] <elukey>	 asking Brooke in the meantime
[18:16:35] <razzi>	 !log delete non-mgmt interface for clouddb1021
[18:16:59] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:17:40] <razzi>	 !log re-run interface_automation.ProvisionServerNetwork with private vlan
[18:17:43] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:18:44] <razzi>	 !log sudo cookbook sre.dns.netbox -t T269211 "Move clouddb1021 to private vlan"
[18:18:48] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:18:48] <stashbot>	 T269211: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211
[18:30:57] <razzi>	 !log run again sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new
[18:31:00] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:31:00] <stashbot>	 T269211: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211
[18:53:30] * elukey afk! 
[18:53:35] <elukey>	 have a good weekend folks :)
[18:55:23] <razzi>	 Alright! Reimage worked, clouddb1021 is insetup. Going afk for lunch
[19:37:42] <wikibugs>	 (PS3) Lex Nasser: Fix and optimize Hive query and change field names in properties file for top-per-country job [analytics/refinery] - https://gerrit.wikimedia.org/r/668236 (https://phabricator.wikimedia.org/T207171)
[20:26:49] <wikibugs>	 (PS1) Ottomata: Migrate legacy EL schemas EditAttemptStep and VisualEditorFeatureUse [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668773 (https://phabricator.wikimedia.org/T267343)
[20:36:21] <ottomata>	 joal:  you around?
[21:39:50] <ottomata>	 fkaelin:  got a sec for a java world brain bounce?
[21:43:09] <gmodena>	 Is it ok to let spark jobs run over the weekend? I am testing a series of jobs that i'd expect to take a couple of days (total) to complete. I configured each job according to the "regular size" job spec. They'll run one at a time.
[21:43:26] <ottomata>	 ya sure sure
[21:43:36] <gmodena>	 ottomata awesome, thanks!
[21:43:40] <ottomata>	 although...i don't remember if there was an issue with kerberos tickets expiring anymore
[21:43:52] <ottomata>	 but there's def no harm in it
[21:44:07] <ottomata>	 gmodena: 'regular' meaning you are using wmfdata?
[21:44:34] <gmodena>	 as long as it does not cause issues for you, i can recover from a ticket expiring (I'll check in during the day)
[21:44:43] <ottomata>	 ok yeah no issues on the cluster
[21:45:01] <gmodena>	 ottomata yes, I use wmfdata's config to init SparkSession
[21:45:28] <ottomata>	 there was something in wmfdata about timing out sessions too, but maybe that only happens with the .run function
[21:45:33] <ottomata>	 can't recall atm
[21:45:41] <gmodena>	 ack
[21:46:16] <gmodena>	 i'm using only configs, so hopefully I'm good. And it's a test, no biggie if it fails :)
[21:46:23] <ottomata>	 k :)
[21:48:03] <gmodena>	 i recently discovered yarn.wikimedia.org
[21:48:10] <gmodena>	 <3 your systems.
[22:08:37] <ottomata>	 yay!