[00:06:37] PROBLEM - Host tools-redis-slave is DOWN: CRITICAL - Host Unreachable (10.68.17.130) [00:08:33] 10Tool-Labs: Configure web services in such a way that users don't have to (re)start it ever - https://phabricator.wikimedia.org/T94883#1176448 (10Ricordisamoa) Frankly, I don't know how a tool like https://tools.wmflabs.org/itsource/ could possibly fall out of memory. [00:16:40] RECOVERY - Host tools-redis-slave is UP: PING OK - Packet loss = 0%, RTA = 1.35 ms [00:17:59] 10Wikimedia-Labs-Infrastructure, 10Continuous-Integration, 3Continuous-Integration-Isolation: Support dedicating a specific virt node to a specific nova project - https://phabricator.wikimedia.org/T84989#1176488 (10Krinkle) [00:21:16] 10Tool-Labs: Configure web services in such a way that users don't have to (re)start it ever - https://phabricator.wikimedia.org/T94883#1176501 (10yuvipanda) That's what I meant by the false positive case - it might have, or it could be (one of a million other things - NFS dies, gridengine has problems, etc etc... 
[00:24:49] PROBLEM - Puppet failure on tools-redis-slave is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [00:29:13] 6Labs, 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4, 3ToolLabs-Q4-Sprint-1: Setup a redis slave for toollabs as backup / redundancy - https://phabricator.wikimedia.org/T91239#1176536 (10yuvipanda) I re-created the instance with appropriate lvm setup, is all good now \o/ [00:29:36] 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4, 3ToolLabs-Q4-Sprint-1: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1176543 (10yuvipanda) 5Open>3Resolved [00:29:51] RECOVERY - Puppet failure on tools-redis-slave is OK: OK: Less than 1.00% above the threshold [0.0] [00:57:08] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4, 7Tracking: Make sure that toollabs can function fully even with one virt* host fully down - https://phabricator.wikimedia.org/T90542#1176596 (10yuvipanda) [03:36:03] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make webservice / webservice2 write out a service manifest when used - https://phabricator.wikimedia.org/T94964#1176855 (10yuvipanda) 3NEW a:3yuvipanda [03:36:38] 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1176868 (10yuvipanda) [03:36:41] 6Labs, 10Tool-Labs: Make webservice2 write out a bigbrotherrc file - https://phabricator.wikimedia.org/T90574#1176863 (10yuvipanda) 5Open>3declined a:3yuvipanda Closed in favor of T94964 [03:37:06] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make webservice / webservice2 write out a service manifest when used - https://phabricator.wikimedia.org/T94964#1176855 (10yuvipanda) Note that in a distant future, all the actual logic for starting / stopping webservices should be in the service monitor, and calling the actual... 
[05:14:32] 10Tool-Labs: Configure web services in such a way that users don't have to (re)start it ever - https://phabricator.wikimedia.org/T94883#1176988 (10scfc) Prior to `bigbrother` being deployed, I thought about setting up continuous jobs (as a superset of web services) by using SGE's epilogue scripts to trigger a re... [06:26:11] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4, 3ToolLabs-Q4-Sprint-1: Setup a redis slave for toollabs as backup / redundancy - https://phabricator.wikimedia.org/T91239#1177086 (10Ricordisamoa) [06:35:05] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0] [07:05:10] RECOVERY - Puppet failure on tools-bastion-01 is OK: OK: Less than 1.00% above the threshold [0.0] [08:44:05] 10Tool-Labs: Bigbrother should ignore empty lines in .bigbrotherrc - https://phabricator.wikimedia.org/T94990#1177307 (10Krinkle) 3NEW [10:41:26] can someone tell me how I run a potentially long running sql statement on tool labs? [10:42:05] select count(r.rev_id) from revision r left join page p on r.rev_page = p.page_id where r.rev_timestamp between 20121029000000 and 20141029999999 and p.page_namespace = :ns; [11:06:42] The authenticity of host 'tools-login.eqiad.wmflabs ()' can't be established. ECDSA key fingerprint is 41:db:d9:4f:03:7e:14:20:a6:5b:23:5f:bf:85:42:38. [11:06:54] Not confirmed by https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tools-login.wmflabs.org [11:07:21] The authenticity of host 'tools-login.wmflabs.org (208.80.155.130)' can't be established. ECDSA key fingerprint is 80:37:58:71:84:99:54:e7:17:dd:c4:be:54:48:41:57. [11:07:26] Not confirmed either. [11:10:56] Sigh. 
https://wikitech.wikimedia.org/wiki/Help_talk:SSH_Fingerprints/tools-login.wmflabs.org#Outdated [11:50:44] 6Labs, 6operations, 5Patch-For-Review, 7Puppet, 7Regression: Puppet: "Package[gdb] is already declared in file modules/java/manifests/tools.pp" - https://phabricator.wikimedia.org/T94917#1177523 (10hashar) Out of the four precise instances Timo created, only one has the problem: integration-slave-precise... [12:09:26] 10Wikimedia-Labs-General, 10Continuous-Integration, 6operations: role::puppet::self broken on new labs instances - https://phabricator.wikimedia.org/T94834#1177565 (10hashar) [12:20:00] YuviPanda: Hi, Special:FormEdit seems to be broken https://wikitech.wikimedia.org/wiki/Special:FormEdit/Nova_Project_Documentation/Nova_Resource:Chasetest/Documentation [12:22:33] 6Labs, 10Wikimedia-Labs-Infrastructure: Include Base::Standard-packages in labs images - https://phabricator.wikimedia.org/T94995#1177590 (10hashar) 3NEW [12:25:15] 6Labs, 10Wikimedia-Labs-Infrastructure: Jessie labs instance puppet run attempt to remove non empty /etc/ssh/userkeys/admin/.ssh - https://phabricator.wikimedia.org/T94996#1177598 (10hashar) 3NEW [12:25:50] 6Labs, 10Wikimedia-Labs-Infrastructure: Jessie labs instance puppet run attempt to remove non empty /etc/ssh/userkeys/admin/.ssh - https://phabricator.wikimedia.org/T94996#1177598 (10hashar) To remove the puppet notices, one can: `rm -R /etc/ssh/userkeys/` [12:26:58] 10Wikimedia-Labs-General, 10Continuous-Integration, 6operations: role::puppet::self broken on new labs instances - https://phabricator.wikimedia.org/T94834#1177607 (10hashar) I had an instance suffering from the issue, I had to recreate it. I can confirm puppet runs just fine now. Thank you! [13:21:33] what does it mean when I get a "The server's host key does not match the one PuTTY has cached in the registry ... 
this means that either the server admin has changed the host key, or you have actually connected to another computer pretending to be the server" warning when trying to connect to toollabs? was anything changed? [13:22:44] from -ops :: PROBLEM - Incoming network saturation on labstore1001 is CRITICAL: CRITICAL: 10.34% of data above the critical threshold [100000000.0] [13:27:10] (hmm, was that a response to my question?) [13:29:23] pajz: the ssh fingerprint for tools-login has recently changed https://lists.wikimedia.org/pipermail/labs-l/2015-March/003516.html [13:33:30] ah, sorry -- thanks for the link. [14:55:19] 10Wikimedia-Labs-General: ganglia.wmflabs.org unreacheable - https://phabricator.wikimedia.org/T64729#1177952 (10hashar) 5Invalid>3declined Ganglia has been phased out of labs, please keep this Task closed. If there is some more cleanup work to handle just fill sub tasks :) Thx! [16:21:00] 10Wikimedia-Labs-General: ganglia.wmflabs.org unreacheable - https://phabricator.wikimedia.org/T64729#1178138 (10Dzahn) Just like with nagios.wmflabs, these monitoring projects in labs can have 2 different purposes. One would be something that monitors labs instances and services on them. I understand we don't... [16:45:04] Coren_away: I just woke up but there is a network alert for labstore [16:45:11] warning only atm... [16:45:31] Yes, that's something writing a lot of data from the 'net [16:45:39] (lab network) [16:46:32] Not particularly worrisome if it doesn't last too long. [16:46:34] Coren_away: hmm, no corresponding alert for out tho. [16:46:59] I don't know whether labnet has alerts. The metrics are high though. [16:47:02] hmm, we should rejigger our alerts then to only warn if it's at least somewhat lookableat [16:47:09] labnet doesn't... [16:47:23] "lookableat"? [16:47:58] Well, to me warning means "keep an eye on it" [16:48:06] Which I am keeping. [16:49:38] But it doesn't look like it's abating, and labnet shows a lot of outbound too. 
Some job is being very heavy atm; I normally will start trying to track it down if it lasts too long and/or begins to affect qos which doesn't seem to be the case right now. [16:50:08] (We have got to give some leeway for the occasional heavy processing job) [16:51:40] true [16:51:44] it's at critical now tho [16:52:04] Yeah, I see. It's on its way up rather than down. [16:52:38] ... or wait - it looks like just as I'm saying this... :-) [16:53:08] Nope. False alert. [16:53:12] * Coren_away tracks it down. [16:53:25] At least it's "normal" overload. :-P [16:56:30] tools.rubinbot2 running reflinks.py [16:57:17] No description of the bot, no hints of what it should be doing... [16:57:59] * Coren_away suspends the job. [16:59:44] Hmmm. [17:00:38] * Coren_away checks for subprocesses. [17:01:18] Coren_away: that tool is for adding titles to URLs [17:02:22] Ah-ha. I blamed the wrong tool. [17:02:27] Coren_away: [[User:DumZiBoT/refLinks]] is an old writeup of the tool [17:02:44] Hi all! I see that my parliament diagram tool is down, and that there was a massive number of diagrams created on 30 March. [17:02:56] * Coren_away resumes [17:03:09] Where can I see the access logs, and was my tool taken down because of overuse? [17:03:36] If I need to throttle it, I'm cool with that, but I'm not sure whether it was intentionally DOSed. [17:03:40] The culprit is one 'mass_worker' for tools.bub [17:03:41] parliamentdiagra: no it probably was just collateral damage from outages :) [17:03:52] Aaah, OK. [17:03:54] parliamentdiagra: I just started it back up and put in a bigbrother file [17:04:01] parliamentdiagra: verify it works? [17:04:07] "mass_worker" also indicates lots of work. [17:04:11] OK, will check. [17:04:48] Ah, it seems to be online again. [17:05:13] but my other question remains - how do I check my logs? [17:05:28] It seems bizarre that people would be creating 50 parliament diagrams in one minute. [17:05:42] That just makes no sense whatsoever. 
This was supposed to be a niche tool! [17:05:58] parliamentdiagra: The logs were collateral to that same filesystem switch, but I have a backup. I can make you a copy. [17:06:07] Thanks! [17:06:33] YuviPanda: Yep, found the heavy loading tool. Suspended the jobs. Are you working today or off? [17:06:55] Coren_away: working! I'm just still in bed tho. going to the office shortly... [17:07:05] Coren_away: was fixing IRCCloud so I have IRC on my phone [17:07:50] YuviPanda: Wanna do the honors of contacting the maintainer and telling him his jobs were suspended and why? tools.bub: 8 jobs suspended (most running something named mass_worker) [17:08:10] Coren_away: totes. [17:08:49] Coren_away: I'll just leave them a message on wikitech [17:08:53] Coren_away: thanks for looking into it! [17:09:47] Thanks, I now see access.log in my home directory. [17:09:50] From the metrics it looks like that single tool accounted for ~ 2/3 of the NFS traffic. :-) [17:10:21] done [17:10:40] parliamentdiagra: Sometimes, if you have links on a page that can cause things to be generated on the fly, stupid bots can end up crawling in a loop. [17:10:56] * Coren_away returns to his quasi-holiday. [17:11:04] Oh noh. [17:11:19] parliamentdiagra: Yeah, I've ended up banning some crawlers [17:11:22] I'll have to do some kind of throttling. [17:11:47] parliamentdiagra: or do some User agent checking, often that's enough [17:11:56] Ah, good idea. [17:14:32] Heck, people seem to be hotlinking to the files my tool creates, even though they're nuked by cron script every day. [17:34:40] Cyberpower678? :P [17:34:59] Yes? [17:36:42] addshore, so what's up? [17:36:49] you blocked me ;p [17:36:55] I did? [17:37:16] Whoops. [17:37:25] I meant to block the vandal account. :p [17:37:41] addshore, misclicked the wrong user. :op [17:37:46] :D [17:37:49] XD [17:37:53] I was like, wtf! [17:37:54] xD [17:44:01] andrewbogott, ugh, do we really need to put "Users of ToolLabs can ignore this email" on so many emails? 
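Editor's note on the crawler-loop advice at 17:10 to 17:11 (check the User-Agent, throttle): the log does not show what the tool maintainer ended up doing, so the following is only a minimal sketch of the two ideas under stated assumptions. The token list `CRAWLER_TOKENS`, the helper `looks_like_crawler`, the class `SlidingWindowThrottle`, and the choice to treat an empty User-Agent as a crawler are all hypothetical and not taken from the conversation.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical substrings; tune them against what access.log actually shows.
CRAWLER_TOKENS = ("bot", "crawler", "spider")


def looks_like_crawler(user_agent: Optional[str]) -> bool:
    """Crude User-Agent check: an empty header or a known token counts as a crawler."""
    ua = (user_agent or "").lower()
    return ua == "" or any(token in ua for token in CRAWLER_TOKENS)


class SlidingWindowThrottle:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        hits = self._hits[client]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler could refuse to generate a diagram when `looks_like_crawler(...)` is true, and answer with an error when `allow(client_ip)` returns false. State is kept in memory per process, so this only fits a single-process web service.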
[17:44:10] can we not move them all to a separate list or something? [17:44:36] Krenair: yeah, we can maybe split the list some time in the future. [18:05:09] I would like to monitor the use of my tool. What should I be doing? [18:05:29] How can I get notified if it's getting heavy use? [18:05:43] How can I know if it's taking up significant resources? [18:15:23] would like to merge config changes in labs/tools/wikibugs2 and labs/tools/grrrit but don't know if there are other steps to deploy [18:18:14] mutante: they both have wikitech pages with deploys [18:18:16] Err [18:18:17] details [18:19:12] parliamentdiagra: you shouldn't bother mostly. If it is taking up way too many resources one of us admins will kill it and leave you a message on wikitech [18:26:42] OK, thanks! [18:27:22] YuviPanda: "fab pull" etc. ok. but where are you running those commands? [18:27:42] deploying with phabricator surprised me a bit [18:28:03] 6Labs, 10hardware-requests, 6operations: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1044517 (10Andrew) These are now in DC but not yet actually at eqiad. I'm on clinic duty next week, but would like to start imaging them on the 13th. Thanks! [18:28:17] mutante: on your local check out I think [18:29:51] i'm confused then how that actually deploys it [18:38:28] JohnFLewis: path conflict :p:0 [18:38:39] mutante: hell [18:39:08] hold on [18:39:24] i'll do the grrrit-wm one [18:41:18] mutante: https://gerrit.wikimedia.org/r/#/c/201738/ if you're on bots ;) [18:43:06] (03PS2) 10Dzahn: feed #wmt to new channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/199330 (owner: 10John F. Lewis) [18:45:17] sorry, that's a different bot, just grrrit [18:47:38] JohnFLewis: a second one after the first one , hah [18:48:20] mutante: Hm? [18:48:32] the conflicts and stuff [18:48:40] Ah [18:50:50] (03PS3) 10Dzahn: feed #wmt to new channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/199330 (owner: 10John F. 
Lewis) [18:51:36] so it seems it was removed from config meanwhile [18:52:15] (03CR) 10Dzahn: [C: 032] feed #wmt to new channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/199330 (owner: 10John F. Lewis) [18:54:18] !log bots restarting grrrit-wm for config change [18:54:21] Logged the message, Master [18:55:42] JohnFLewis: 11:55 -!- grrrit-wm1 [~lolrrit@208.80.155.255] has joined ##wmt [19:39:12] (03PS1) 10John F. Lewis: web: add requests back to menu [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201826 [19:40:12] 6Labs: Renaming scheme for labs servers - https://phabricator.wikimedia.org/T95042#1178863 (10Andrew) 3NEW a:3Andrew [19:42:16] (03CR) 10Dzahn: [C: 031] web: add requests back to menu [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201826 (owner: 10John F. Lewis) [19:42:32] (03CR) 10John F. Lewis: [C: 032 V: 032] web: add requests back to menu [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201826 (owner: 10John F. Lewis) [19:44:46] 6Labs: Renaming scheme for labs servers - https://phabricator.wikimedia.org/T95042#1178873 (10Andrew) [19:56:38] hiya [19:56:39] 6Labs, 10Wikimedia-Labs-Infrastructure: Jessie labs instance puppet run attempt to remove non empty /etc/ssh/userkeys/admin/.ssh - https://phabricator.wikimedia.org/T94996#1178910 (10scfc) [19:56:40] i need help! [19:56:41] 6Labs: /etc/ssh/userkeys/ubuntu notices for every puppet run on labs instances - https://phabricator.wikimedia.org/T94866#1178911 (10scfc) [19:56:47] i enabled two factor auth for wikitech [19:56:50] now i have no idea how to log in [19:57:11] i can't find any docs about it either [19:57:28] hahaha [19:57:42] ottomata: did you generate the two factor auth tokens, etc? [19:57:45] yes [19:57:46] ottomata: and set it up for your phone? [19:57:46] i have them [19:57:49] phone.... [19:57:50] no? [19:58:01] ottomata: err, I mean, whatever factor you were using? GOogle Authenticator? [19:58:18] psshh, it never asked me? 
i just wrote down the tokens, and then was able to login somehow before [19:58:54] ah, ok [19:58:59] i was able to log in with one of the ones i hadn't used. [19:58:59] ok [19:59:05] phone... [19:59:10] ottomata: :) since you have root, you can reset it via https://wikitech.wikimedia.org/wiki/Password_reset [19:59:17] 6Labs: /etc/ssh/userkeys/ubuntu notices for every puppet run on labs instances - https://phabricator.wikimedia.org/T94866#1178924 (10scfc) These seem to be related to https://gerrit.wikimedia.org/r/#/c/183814/ & Co. where I thought that the issue was just transient for the migration from the old system to the ne... [19:59:18] (2FA section) [19:59:25] ottomata: and set it up for google authenticator [19:59:44] where do I do that? [19:59:52] i'm looking at preferences but don't see much [20:00:39] ottomata: under 'User profile' in preferences [20:03:42] ugh, i just disabled it :p [20:03:59] oh crap, i need it for instances now, don't I? [20:04:20] ottomata: for logging into instances? No [20:04:36] no, do manage them [20:04:37] to* [20:04:40] ah, yes you do :) [20:04:45] because you've cloudadmin [20:05:06] blargh ok [20:06:10] YuviPanda: i don't understand. where do I tell wikitech to text me? [20:06:21] do I really need to install a google app? [20:06:26] ottomata: it doesn't text you. you have google authenticator app and use that... [20:06:32] ottomata: it's open source, uses the TOTP/HOTP protocols [20:06:54] i have to take a picture of my screen to login? [20:07:11] hmm/ [20:07:12] ? [20:07:15] there's a barcode... [20:07:18] that wikitech should give you [20:07:18] that's a one time thing [20:07:22] yeah [20:07:25] ok... [20:07:27] later it just generates numbers for you that you type [20:07:56] they are time-based [20:09:02] huh. i see. [20:09:17] 6Labs: Renaming scheme for labs servers - https://phabricator.wikimedia.org/T95042#1178948 (10yuvipanda) Just commenting to say all of this makes sense :) [20:09:20] ok ok , kinda cool. 
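Editor's note on the "time-based" numbers mentioned at 20:07: the exchange names TOTP/HOTP but does not show how a code is derived. The sketch below follows the published algorithms (HOTP in RFC 4226, TOTP in RFC 6238) using only the Python standard library. It is an illustration, not the code wikitech or any authenticator app runs; the function names `hotp` and `totp` are chosen here, and the secret is passed as raw bytes, whereas the barcode shown at setup normally carries it base32-encoded.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of the 8-byte big-endian counter, then truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of the last byte selects a 4-byte window
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter set to the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)
```

Because the counter is derived from the clock, the phone and the server agree on the code without talking to each other, which is why the barcode scan is a one-time step. With the RFC 6238 test secret `b"12345678901234567890"`, `totp(secret, now=59, digits=8)` yields `94287082`, matching the RFC's SHA-1 test vector.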
[20:09:32] thanks YuviPanda [20:19:27] 10Wikimedia-Labs-General: ganglia.wmflabs.org unreacheable - https://phabricator.wikimedia.org/T64729#1178975 (10scfc) The problem with Icinga is that it depends on exported Puppet resources, and in Labs where these can be influenced by the project admins, this may lead to taking over the puppet master. IIRC th... [20:21:32] shinken-wm: wb :) [20:26:55] 6Labs: Renaming scheme for labs servers - https://phabricator.wikimedia.org/T95042#1178998 (10Andrew) As long as noone vetos, this info should be duplicated on https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions [20:32:31] 10Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1179017 (10scfc) ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) To: root@tools.wmflabs.org Date:... [20:32:57] 10Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1179018 (10scfc) ```From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) To: root@tools.wmflabs.org Date... [20:45:44] 6Labs, 6operations, 5Patch-For-Review, 7Puppet, 7Regression: Puppet: "Package[gdb] is already declared in file modules/java/manifests/tools.pp" - https://phabricator.wikimedia.org/T94917#1179105 (10Dzahn) merged. should be fixed now. wanna confirm? [20:57:39] 10Tool-Labs: Test how bigbrother reacts to user names not resolving and, if necessary, fix it - https://phabricator.wikimedia.org/T90410#1179153 (10scfc) I tested this on `toolsbeta-exec-01` by blocking `ldap` and `ldaps` in iptables and running `nscd -i passwd` to remove the cache: ``` scfc@toolsbeta-exec-01:~... 
[21:02:59] 10Tool-Labs: Test how bigbrother reacts to user names not resolving and, if necessary, fix it - https://phabricator.wikimedia.org/T90410#1179191 (10yuvipanda) I still think / hope to have completely gotten rid of bigbrother by end of the month :D [21:25:14] i have problems with accessing replicas from my instance. it claims no host, although i've done what's written in help [21:27:20] anybody can help me with that, please? [21:51:14] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4, 7Tracking: Replace bigbrother and ssh-cron-thingy with service manifests - https://phabricator.wikimedia.org/T90561#1179332 (10yuvipanda) [21:51:16] 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Make webservice / webservice2 write out a service manifest when used - https://phabricator.wikimedia.org/T94964#1179331 (10yuvipanda) 5Open>3Resolved [21:52:37] 10Tool-Labs: Test how bigbrother reacts to user names not resolving and, if necessary, fix it - https://phabricator.wikimedia.org/T90410#1179333 (10scfc) I hope that, too, but until then … :-) Especially, glancing over the code it isn't obvious to me //why// a failed look-up would block those users weeks after. [22:07:55] (03PS1) 10John F. Lewis: web: add mailing list list [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201853 [22:13:50] (03CR) 10John F. Lewis: [C: 032 V: 032] " CR +2" [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201853 (owner: 10John F. Lewis) [22:14:30] gwicke: argh, wtf. I messed up your patch (https://gerrit.wikimedia.org/r/#/c/191220/). Sorry, fixing. [22:16:12] ori: don't worry, I think it's all good [22:16:23] ori: my patch basically overwrote yours [22:16:54] oh [22:16:55] heh [22:17:00] :) [22:17:22] I just clobbered that, so you better re-apply yours. [22:17:28] Sorry. [22:28:42] (03CR) 10Tim Landscheidt: "Sorry, in my mental picture this was already merged. 
I have some reservations about using $HOME, because in the past we had to use "(getp" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/156608 (https://bugzilla.wikimedia.org/54054) (owner: 10coren) [22:40:32] (03CR) 10Tim Landscheidt: "$HOME is still unset in lighttpd CGIs:" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/156608 (https://bugzilla.wikimedia.org/54054) (owner: 10coren) [22:56:59] (03PS1) 10John F. Lewis: web: +2 for SPF [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201856 [22:57:55] (03CR) 10Alpha: [C: 032 V: 032] web: +2 for SPF [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/201856 (owner: 10John F. Lewis) [23:47:52] PROBLEM - Puppet failure on tools-exec-24 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]