[04:50:01] thanks Krinkle!
[05:46:18] I am going to restart x1 db master in 15 minutes
[08:28:41] anyone had the issue where '/var/lib/puppet/state/last_run_summary.yaml' was a directory itself? got it today on one node
[08:28:54] trying to find what caused it
[08:43:12] <_joe_> dcaro: I'd first grep auth.log :P
[08:43:29] xd
[09:33:10] what is the recommended way to introduce an icinga check for a virtual IP address on a server?
[09:33:21] (in puppet)
[09:33:38] <_joe_> arturo: what do you mean?
[09:34:09] I have a pair of servers with a VIP. I want icinga to check if that IP is up or down, by using ping
[09:34:25] I'm not very familiar with the monitoring codebase in ops/puppet.git
[09:34:30] <_joe_> so it's not on LVS?
[09:34:36] it is not on LVS
[09:35:03] <_joe_> ok then I have no idea, we never did floating IPs in production so we likely have nothing for it specifically
[09:35:50] fair
[09:35:54] for ceph stuff we use the host alert1001
[09:35:59] <_joe_> but you can probably use monitoring::host
[09:36:11] (so on alert1001 there's a few checks related to ceph cluster status)
[09:36:21] not sure if that's the way to go, but that's what is there
[09:36:32] (makes it hard to downtime xd)
[09:36:46] we also have a couple of servers with extra IP addresses, will check how we do there
[09:42:42] yeah introducing the hostname backed by the VIP with monitoring::host seems correct to me, like we do for lvs
[09:43:04] godog I just noticed `modules/icinga/manifests/monitor/toollabs.pp` I think that's the way to go, no?
[09:43:19] will write a patch
[09:44:13] arturo: yeah that's the idea
[10:02:04] godog: something similar to this? https://gerrit.wikimedia.org/r/c/operations/puppet/+/685379 I made up the check command parameter, will need to investigate more
[10:02:34] dcaro: ^^^ this might be the way to go for ceph too
[10:03:01] +1
[10:03:54] ceph does not have a floating ip though, so the default host ping check will fail, I have to dust off my nagios/icinga skills xd
[10:04:56] might be interesting adding there the DC also to avoid collisions (and to be able to have both)
[10:06:34] arturo: yeah pretty much that
[10:17:03] thanks!
[10:24:57] there is an alert for eqiad's port utilisation
[13:12:35] can someone please merge betacluster-only changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/684034 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/684088?
[13:17:22] fyi all https://debmonitor.wikimedia.org/ is currently unavailable, trying to test something but can revert quickly if anyone needs it (submissions via debmonitor-client still work)
[13:18:08] jbond42: ack, icinga might complain ;)
[13:18:41] ack
[13:39:34] something weird is going on with the reimage of db2129. it's getting an ip (the correct one) from dhcp, but it looks like it doesn't get any routes?
[13:39:53] it then falls back to asking for manual entry of the netmask & gateway ip
[13:47:51] downloading early_command works fine. that reconfigures the networking, and it stops working.
[13:48:20] is there anything on the dhcp logs?
[13:48:31] dhcp logs look ok
[13:48:39] correct mac and such?
[13:48:42] yeah
[13:49:46] marostegui: hiii there
[13:50:17] marostegui: you broke it: https://gerrit.wikimedia.org/r/c/operations/puppet/+/685316
[13:50:26] there's a `||` in there
[13:50:54] Ah crap
[13:50:55] Fixing!
[13:50:57] causing this error:
[13:50:59] `/var/lib/dpkg/info/network-preseed.postinst: eval: line 1: syntax error: unexpected "||" (expecting ")")`
[13:51:34] which i found by looking through 1500 lines of /var/log/syslog using `more` 😭
[13:52:31] kormat: https://gerrit.wikimedia.org/r/685447
[13:53:20] i've no idea _how_ this then causes the installer to declare that networking is broken, but d-i is.. well, d-i.
[14:07:26] progress - the install is now proceeding 🎉
[14:07:49] \o/
[14:09:34] volans: back to your questions re monitoring i think there are two issues. one, the underlying issue was a missing sni on the cert which should have been picked up by check_http. however it seems that check_http doesn't really care about the domain
[14:09:40] /usr/lib/nagios/plugins/check_http -p 7443 --ssl --sni -I 10.64.16.72 -H debmonitor.wikimedissa.org '/'
[14:09:44] ^^ gites ok
[14:09:48] *gives
[14:10:02] :/
[14:10:07] The other aspect of this is i think external monitoring
[14:10:11] yes
[14:10:14] that's what I meant
[14:10:37] external can be performed by icinga too if we can hit the same entry point of external traffic of course
[14:12:00] sure but what do we attach it to, are there current examples?
[14:13:22] dunno, but if I curl https://debmonitor.wikimedia.org/ from alert1001 I get a 301 to IDP and
[14:13:25] x-cache: cp1081 miss, cp1089 pass
[14:13:41] so it should hit most of the same path of external users
[14:14:08] (not network wise ofc, but that's more for a generic external monitoring)
[14:14:26] *HTTP 302
[14:16:24] yes i get that but what "icinga host" to attach it to, i think a similar question came up earlier from art.uro however that sounded slightly different and it sounded to me that we already had some testing for services on the caching cluster
[14:17:01] debmonitor.wikimedia.org
[14:17:05] that's obvious
[14:18:47] so then this would be the first service testing in this manner?
[14:18:53] no, we have a lot of them
[14:19:51] ok i see a few examples, i can add it
[14:19:53] look at the host list in icinga, there are many, both IPs directly or domains
[14:20:00] blog, policy, upload, etc...
[14:20:19] I don't recall why we didn't add it in the first place
[14:24:45] volans: https://gerrit.wikimedia.org/r/c/operations/puppet/+/685464
[14:26:44] ack, I'll wait PPC
[14:26:51] should be there
[14:27:14] *PCC ofc
[14:27:23] however it is a noop because of exported resources
[14:27:33] eh
[14:28:02] the icinga host and icinga service are both exported resources and therefore don't show up in a pcc catalogue diff
[14:29:39] hmm actually there is a hack so it should show up, CR is wrong
[14:30:19] sorry fixing a thing in prod, will look shortly
[14:54:20] jbond42: {done} sorry for the delay
[14:54:27] thanks for adding it! :)
[14:54:58] np thanks
[15:28:57] <_joe_> Majavah: sorry, I saw your first ping yesterday night and I assumed someone did
[15:31:22] _joe_: I requested that first time on Monday night, not yesterday :/
[15:31:42] <_joe_> Majavah: yeah I meant monday night, I didn't even check back then
[15:32:22] <_joe_> Majavah: merging, and sorry on behalf of us all
[15:38:38] thanks _joe_
[15:39:00] can you merge https://gerrit.wikimedia.org/r/684117 and https://gerrit.wikimedia.org/r/684120 too please?
[15:40:26] <_joe_> not sure I get the rationale beyond the latter change
[15:40:47] <_joe_> reading the task now
[15:41:15] <_joe_> oh it's already done, this is just moving hiera to ops/puppet
[17:51:26] How Tor created their new status page: https://blog.torproject.org/check-status-of-tor-services
[17:52:24] cc cdanis ^^^
[22:25:59] legoktm, Amir1: thank you for moving my toolhub-dev@ list. I gave you a shout out on the list for bringing me joy -- https://lists.wikimedia.org/hyperkitty/list/toolhub-dev@lists.wikimedia.org/thread/AMUUBKE45RZSH5CUJ4I53WMTCJNQA3I6/
[22:26:49] <3
[22:26:52] Thanks
[22:28:21] I hope moderating the mailing list becomes easier for you
[22:28:24] Search is possible.
[22:28:29] The security *cough*
[22:28:47] you can also compose mail from the web ui
[22:29:17] Amir1: a thought that maybe you have already had... are we going to have to fix all uses of the mail: and mailarchive: interwiki links at some point or will 'magic' happen for that?
[22:29:43] Kunal is on it
[22:29:50] excellent
[22:30:00] which basically means magic will happen
[22:30:29] https://phabricator.wikimedia.org/T280731
[22:31:18] Also the main page of mailing lists is already getting the redirect (e.g. https://lists.wikimedia.org/mailman/listinfo/lgbt)
[22:45:36] bd808: :))) the plan is basically that all existing URLs will redirect properly, and we can introduce new interwikis for hyperkitty/postorius links
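
Re the 14:09 check_http exchange: a rough Python sketch of a probe that does care about the domain, in case it is useful. It assumes the names on the cert are already available as a list of strings; `name_matches`, `cert_covers` and `probe` are made-up helper names, not anything from ops/puppet or the nagios plugins. `probe` relies on the default trust store, which is an assumption that may not hold for internally signed certs.

```python
import socket
import ssl


def name_matches(pattern: str, hostname: str) -> bool:
    """Compare one certificate name against a hostname.

    A leading '*' label is treated as a wildcard for exactly one label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Refuse overly broad wildcards such as '*.org' and empty first labels.
        return len(p_labels) > 2 and h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def cert_covers(cert_names, hostname: str) -> bool:
    """True if any name on the certificate covers the hostname."""
    return any(name_matches(name, hostname) for name in cert_names)


def probe(ip: str, port: int, hostname: str, timeout: float = 5.0) -> bool:
    """TLS handshake against ip:port, sending hostname as SNI.

    The default context verifies the chain and the hostname, so a cert
    that does not cover the hostname makes the handshake fail.
    """
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=hostname):
                return True
    except ssl.SSLCertVerificationError:
        return False


if __name__ == "__main__":
    names = ["debmonitor.wikimedia.org"]
    print(cert_covers(names, "debmonitor.wikimedia.org"))
    print(cert_covers(names, "debmonitor.wikimedissa.org"))
```

With a check along these lines, the mistyped domain from the 14:09:40 command would be reported as not covered instead of passing.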