[00:08:34] FIRING: DiskSpace: Disk space thanos-be2006:9100:/ 1.746% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=thanos-be2006 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:18:34] RESOLVED: DiskSpace: Disk space thanos-be2006:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=thanos-be2006 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:19:00] FIRING: SystemdUnitFailed: prometheus-dpkg-success-textfile.service on thanos-be2006:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:54:00] RESOLVED: SystemdUnitFailed: prometheus-dpkg-success-textfile.service on thanos-be2006:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:56:51] Amir1, marostegui: when you have a second can we address https://phabricator.wikimedia.org/T422459#11832656 ?
[08:04:07] I'll check in a bit
[09:07:56] federico3: I don't see how that paste is related to that task.
[09:10:32] there was also a warning during the maintain-views run that then disappeared minutes ago and I was wondering if the issue during the auto schema run can then impact maintain-views - anyhow I can move the auto schema glitch to the auto schema task
[09:12:02] https://phabricator.wikimedia.org/T419635#11832765
[09:29:58] is it ok if I put section and role in each line in https://phabricator.wikimedia.org/T419961 ?
[09:56:17] federico3: works for me, but I think those are generated with a script Amir1 has somewhere so it may get deleted on the next run?
[09:58:17] I can try
[09:58:33] sure!
[10:37:27] I've created T423690 with some notes about the thanos disk-filling. I fear the answer is partman-fettling and reimage
[10:37:28] T423690: Thanos backends filling their root filesystems overnight - https://phabricator.wikimedia.org/T423690
[11:25:34] Amir1: can I start the schema change on s7 today?
[11:27:21] on a Friday?
[11:34:46] in the past we've been running schema changes
[12:41:33] federico3: yeah. I'm done on s7
[12:41:49] thanks
[12:57:49] Amir1: marostegui: do you have an estimated timeline for the s4/x4 split?
[12:58:14] Amir1: ^
[12:59:15] for the replicas we were planning to have x4 on clouddb102[45] and keep s4 on the current ones clouddb101[59]... but given 1019 is dead we're re-evaluating the plan
[12:59:23] T409557
[12:59:23] T409557: Productionize new clouddb* hosts (clouddb1022-1033) - https://phabricator.wikimedia.org/T409557
[13:01:14] Hi folks - gentle reminder to put any essential work updates for this week on our shared doc by 15:00 UTC (about 2 hours hence) please :) I know there's a bunch there already.
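(Aside on the clouddb replica question at 12:59, which continues below: a minimal sketch, using stock MariaDB commands rather than any WMF tooling, of how one might confirm which section master a clouddb replica is actually following before and after a move.)

    -- A minimal sketch, assuming a standard MariaDB replica; on MariaDB 10.5+
    -- "SHOW REPLICA STATUS" is an alias for the same statement.
    SHOW SLAVE STATUS\G
    -- Fields to compare against the intended layout:
    --   Master_Host            -- should point at the s4 (or, post-split, x4) source
    --   Seconds_Behind_Master  -- replication lag
    --   Replicate_Do_DB / Replicate_Wild_Do_Table  -- any section filtering in effect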
[13:02:03] dhinus: definitely before end of Q4
[13:02:34] Amir1: ack, so I think that will be _before_ we get the other clouddb hosts that we're waiting for
[13:02:50] But not sure exactly when
[13:04:49] marostegui: IIUC we need two copies of s4 in clouddbs, so that after the split one remains s4, and the other one can become x4
[13:05:18] dhinus: yes, but that's for later
[13:05:30] 1024 and 1025 were originally planned for x4
[13:05:45] But they are now s4 because x4 isn't a reality in puppet
[13:06:01] yes, my worry is that the split happens in prod let's say end of q4, we still don't have the new hosts... so 1024/5 will suddenly lose some tables
[13:06:35] dhinus: we aren't going to delete tables automatically
[13:06:47] true, but can they still replicate from prod?
[13:07:00] replicate what?
[13:07:22] the data that will be x4 in prod... how does it get to the replicas? can it go from x4 in prod -> s4 in clouddb?
[13:07:55] they will be moved to their x4 masters
[13:08:07] once moved tables will be deleted in s4, so they won't reach x4
[13:16:50] I'm not following... right now clouddb102[45] are replicating s4 from the sanitariums. after the split in prod, if we don't do anything I expect clouddb102[45] will keep replicating s4, but they won't replicate new data being written into x4
[13:18:18] dhinus: 1024 and 1025 are originally planned to be on x4, and thus, they'll be moved to x4 master, so they'll have x4 data
[13:18:51] but if we do that, we lose again the redundancy for s4... because s4 would be left only on 1015
[13:19:06] Well yes, of course, because clouddb1019 had HW issues
[13:19:11] But we don't have any other hosts
[13:19:13] unless we already have 1032 up and running, but it looks like it's taking longer than Q4?
[13:19:27] dhinus: yes that's the whole problem
[13:19:49] lag may happen again
[13:19:55] on bacula db
[13:20:23] marostegui: so you're saying we'll have to accept having no redundancy on s4 between the day of the split in prod, and the day we finally get 1032?
[13:20:33] dhinus: we have no more hardware
[13:21:58] I was wondering if we could temporarily have a host with 3 sections (e.g. s3, x3 and x4), until we get the new hardware...
[13:22:34] We may, for now let's address the current issue
[13:22:45] ok!
[13:53:34] o/, I want to clean up after myself in T422546 -- I created temporary tables for the (rejected) new ICU upgrade process. The `sql.php` user from deployment hosts (understandably) doesn't have DROP TABLE privileges. How should I do the cleanup?
[13:53:35] T422546: Clean up after the ICU 72 upgrade - https://phabricator.wikimedia.org/T422546
[13:53:59] (I don't necessarily have to drop tables on a Friday :D just wondering how to go about it)
[13:55:37] Raine: can I bash this? :P
[13:55:55] ihurbain: sure :D
[13:56:36] !bash (I don't necessarily have to drop tables on a Friday :D just wondering how to go about it)
[13:56:36] ihurbain: Stored quip at https://bash.toolforge.org/quip/XF-6m50B8tZ8Ohr0rtlA
[14:11:36] Raine: If you specify the wikis the tables are at and the name of the tables, I can do it for you next week
[14:12:25] marostegui: great, thanks, I'll plop it on the task and ping you
[14:12:35] you can assign the task to me if you want
[14:52:25] marostegui: can you please review this announcement? https://etherpad.wikimedia.org/p/FJfXKQLwuHX49XPPBRY7
[14:52:39] marostegui: assigned, thank you!
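(Aside on the 13:53 cleanup question: a hedged sketch of what the privileged cleanup might look like. The actual wiki databases and table names were left for T422546 and are not reproduced here; "testwiki" and "tmp_icu72_sortkeys" below are invented placeholders.)

    -- Sketch only: database and table names are hypothetical placeholders,
    -- not the real tables from T422546. Run as a privileged account, since
    -- the sql.php deployment user lacks DROP TABLE.
    USE testwiki;
    SHOW TABLES LIKE 'tmp\_icu72%';           -- confirm what exists before dropping
    DROP TABLE IF EXISTS tmp_icu72_sortkeys;  -- repeat per wiki/table listed on the task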
[14:53:07] dhinus: looks good, I'd add the task where clouddb1019 crashed
[14:53:10] Raine: thank you
[14:53:16] marostegui: good point let me add it
[14:53:43] dhinus: other than that, +1
[14:55:09] added, check now :)
[14:55:43] dhinus: looks good, thank you
[14:55:52] ack sending!
[15:00:13] (waiting for a +1 from wmcs as well, just in case...)