[07:09:41] all the puppet warnings for elastic nodes will go away soon
[07:46:39] Are the ps1-d4 and d3 alerts intended to be there as a leftover from yesterday's issue?
[07:50:21] no idea, maybe XioNoX knows? --^
[08:01:48] no idea, maybe dcops knows?
[09:44:11] apergos: I have added the above alerts to the action items of the incident report and assigned them to willy
[10:10:05] thanks marostegui
[10:10:47] <_joe_> I just noticed many logstash hosts have icinga notifications disabled; is that intentional?
[10:14:20] if they are running the elastic stack 7 then yes afaik
[10:23:25] <_joe_> yes seems so
[10:23:32] <_joe_> I was asking as some are alerting on icinga
[10:29:37] does anyone know if normal logs will contain exact puppet exec commands?
[10:30:20] I am trying to debug a potential race condition
[10:31:18] _joe_: the logstash1012 one? I'll take a look
[10:31:19] I guess I can increase logging verbosity on a test host
[10:36:41] <_joe_> jynus: puppet help agent
[10:36:47] <_joe_> --debug is what you want
[10:36:56] <_joe_> it's easier done when running manually
[10:37:04] <_joe_> puppet agent -tv --debug
[10:39:06] nice, thank you, _joe_ that was super useful
[12:21:05] jclark-ctr: just to confirm this is happening today https://phabricator.wikimedia.org/T261453 ?
[12:22:05] @marostegui: yes is happening today I am on site doing prep work right now will start prolly in 30 minutes or so
[12:22:18] jclark-ctr: cool thank you!
[12:28:03] <_joe_> marostegui: no downtime is expected though
[12:41:14] is this related to the cookbook that went off just before, or coincidence?
[12:41:56] bblack: no, thankfully (the cookbook was me)
[12:42:20] if a downtime cookbook can create all this I'd be impressed :)
[12:43:03] (the target was a db host in eqiad, so not in the serving line for anything apart from labsdb)
[13:08:47] We will be postponing the pdu swap today, per email by Chris
[13:09:01] thanks
[13:09:02] take care cmjohnson1 :**
[13:24:45] klausman: ops@ is getting spammed by `prometheus-amd-rocm-stats` from stat1005
[13:25:00] I saw, I am working on a fix
[13:25:04] ack
[13:25:12] https://gerrit.wikimedia.org/r/c/operations/puppet/+/626150
[14:10:09] godog: I'm cleaning up after an especially messy puppet accident yesterday and seeing broken puppet on a bunch of things in the 'monitoring' project. I'm pretty sure they're unrelated to yesterday, but could you have a look and let me know? (And maybe tidy up if the fixes are obvious) pontoon-conf-01, pontoon-elastic7-01, pontoon-kafka-01, pontoon-logstash7-02
[14:10:10] thx
[14:15:22] andrewbogott: best way to request cloud project deletion? ticket?
[14:15:49] yeah, a ticket under the 'new project request' heading will get it noticed
[14:18:42] andrewbogott: 'pontoon' refers to a way of running a custom puppetmaster, so yeah, it's quite likely that it's all godog's fault
[14:19:09] :) I'll do my best to ignore them
[14:19:53] (https://wikitech.wikimedia.org/wiki/Puppet/Pontoon for reference)
[14:27:14] andrewbogott: ack, yeah some of those are expected-broken indeed
[14:27:21] * godog nudges kormat off the pontoon
[14:27:24] ok
[14:30:51] * kormat flounders
[14:56:20] marostegui: you wrote "Also labweb paged SRE instead of WMCS, expected?"; do you happen to know what pages fired in particular? Was it the host being down, or service-specific things?
[14:57:15] andrewbogott: from a quick look on victorops, it seems that it was host down
[14:57:26] 'k thanks
[15:23:41] if it was in row d then it was unreachable
[15:23:52] (yesterday)