[07:31:53] Amir1: can I start the schema change in s8?
[09:02:27] federico3: go for it
[09:02:44] regarding the es one, I would need to look at our docs
[09:02:50] or previous cases
[09:02:55] x1 eqiad snapshot very_wrong_size 11 hours ago 307.9 GB -20.2 % The previous backup had a size of 385.8 GB, a change larger than 15.0%.
[09:02:55] dump very_wrong_size 1 day, 7 hours ago 96.8 GB +17.5 % The previous backup had a size of 82.4 GB, a change larger than 15.0%.
[09:03:50] I really think the increase is because of double compression. Is it possible to exclude a table from dump compression, since it's already compressed?
[09:04:52] Also ^_^
[09:04:53] s4 eqiad snapshot wrong_size 4 hours ago 1.8 TB -5.4 % The previous backup had a size of 1.9 TB, a change larger than 5.0%.
[09:04:58] (categorylinks)
[09:05:01] I grew the xfs partition manually for now - my question is whether we need to open a bug with a different team for the host provisioning, or whether we want to do the repartitioning on new hosts
[09:54:57] https://www.irccloud.com/pastebin/epMuFJtr/
[09:55:14] The script is deleting these thumbnail sizes (and it continues in both directions)
[10:00:34] uh?
[10:04:44] I restarted ferm but /var/lib/prometheus/node.d/check_ferm_active.prom is still stale, is there any other known workaround?
[10:05:31] It would be extremely special to have an exception like that
[10:07:27] uh?
[10:07:44] I am answering Amir
[10:10:11] jynus: noted, no big deal. I'm asking them to move it to ES anyway, it'll change
[10:13:25] I'm seeing nrpe2nodexp-ferm_active logging an error about permissions for /var/lib/prometheus/node.d - I'm not seeing bug reports for this in phab - has anyone seen this before?
[10:14:14] given that it is probably a new alert, I don't think it has been seen before
[10:16:42] you should mention it to Tiziano, I think: T384472
[10:16:42] T384472: Candidate nrpe checks for compatibility layer icinga/prometheus/alertmanager - https://phabricator.wikimedia.org/T384472
[10:21:33] it's probably a red herring, other "healthy" hosts are showing the same error but the stale file is not there
[10:22:19] thanks, I'll poke him
[10:27:13] Amir1: also, compressing on backup still gets a 42% reduction on backed-up size
[10:28:32] I think the increase could be because it has to be stored in hexadecimal for text dumping
[10:30:12] Aaah. That makes more sense
[10:30:42] I knew double compression increases the size but 20% is way too much for that
[10:31:00] no no, compression decreases the size
[10:31:00] This makes a lot more sense
[10:31:24] I am now wondering if what's stored in the rows is very inefficient
[10:31:35] if it is hexadecimal at the source
[10:31:58] It's gzipped content of the original
[10:32:17] I hope it's not turned into hex before storage
[10:33:02] I'll check
[10:47:14] so it has "rawdeflate," + base64-encoded content, with around a +1/3 overhead
[10:47:28] yup, I just checked
[10:50:38] still, it should be around half the size of the original, with some extra overhead
[10:51:19] 1383 bytes on db, 2673 bytes the original
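A minimal sketch (not the MediaWiki code itself) of the storage format described above: raw-deflated text, base64-encoded, with a "rawdeflate," prefix. The helper name and sample input are illustrative; the point is that base64 expands the compressed payload by roughly 4/3, which is consistent with the 1383-byte row versus the 2673-byte original quoted above.

```python
# Sketch of the "rawdeflate," + base64 format discussed above; illustrative only.
import base64
import zlib


def store_rawdeflate(text: bytes) -> bytes:
    # Raw DEFLATE stream (negative wbits = no zlib header or checksum).
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    deflated = co.compress(text) + co.flush()
    # base64 expands the compressed payload by ~4/3, plus the literal prefix.
    return b"rawdeflate," + base64.b64encode(deflated)


original = b"[[Example]] wikitext, repeated so it compresses reasonably well. " * 40
stored = store_rawdeflate(original)
print(len(original), len(stored))
# Compression roughly halves the text, then base64 adds back about a third.
```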
[11:54:59] federico3: would you mind running the checklist script again on this ticket? https://phabricator.wikimedia.org/T395241
[12:10:38] checklist script?
[12:17:52] Amir1: I updated the task with a few updated hosts
[12:18:25] thanks
[13:27:30] federico3: I don't know where you're up to with es2049, but it has an alert active against it for 'The /var/lib/prometheus/node.d/check_ferm_active.prom metrics file has not been updated in 5d 0h 0m 5s. Check processes responsible for updating the file on es2049:9100'
[13:30:51] Emperor: I've been talking to tappof, see https://phabricator.wikimedia.org/T403617 and https://phabricator.wikimedia.org/T403615 - the first seems to be affecting all hosts due to the puppet config creating the /var/lib/prometheus directory, yet only es2049 is also showing the stale file
[13:32:16] ah, cool. Worth silencing the alert for a bit with a link to one or other task, then?
[13:34:00] ok
[16:35:09] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1184544 this should help both for the new host and the existing ones, fleet-wide, but I would recommend rolling the change out incrementally, one host at a time. Amir1 any thoughts?
[18:32:43] would it be okay if I borrowed backup1012 to test a BIOS patch from Supermicro? the box is currently insetup
[19:39:41] given /backup is empty, I'm going to be bold and borrow it
[22:23:00] well I didn't finish my testing, so if you folks don't mind, I will use backup1012 tomorrow as well
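For reference, a minimal sketch of the staleness condition behind the check_ferm_active.prom alert on es2049 mentioned earlier, assuming it boils down to the metrics file's age exceeding a threshold. The file path and the 5-day figure come from the alert text; the check logic itself is an illustration, not the production Prometheus/Alertmanager rule.

```python
# Illustrative staleness check for a node_exporter textfile metric.
import os
import time

PROM_FILE = "/var/lib/prometheus/node.d/check_ferm_active.prom"
MAX_AGE_SECONDS = 5 * 24 * 3600  # the alert above fired after ~5 days without an update

age = time.time() - os.path.getmtime(PROM_FILE)
if age > MAX_AGE_SECONDS:
    print(f"{PROM_FILE} is stale: last updated {age / 3600:.1f} hours ago")
else:
    print(f"{PROM_FILE} is fresh: updated {age / 3600:.1f} hours ago")
```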