[15:01:06] test
[15:02:30] channel is now logged :)
[15:04:50] fun, apparently I cannot deop myself...
[15:07:34] yay
[15:19:18] cdanis: are you backing up also other stuff from catchpoint or should I?
[15:19:30] I am going through stuff yeah
[15:19:44] ok ping me if you need a hand
[15:19:49] and thanks!
[15:39:15] followup from yesterday's missed team meeting, I've 3 questions/proposals for which I'd like your feedback (see the etherpad for more info):
[15:40:04] 1) any objection to failover Icinga back to icinga1001 next Mon. (or Tue.)? 13d uptime as of now
[15:40:45] 2) the Icinga meta-monitoring (monitoring Icinga itself from outside) should page also for the passive host or just send emails? (I've pros/cons for both)
[15:41:52] 3) agree on a plan to increase the Icinga meta-monitoring contact list, so far it is notifying only Chris and me. Proposal: this team first for ~1-2 weeks and then the whole user list. Thoughts?
[15:45:54] my takes on these questions (already talked with volans but repeating them here for clarity): 1) yes please 2) page also for the passive host, assuming it is a rare occurrence 3) SGTM
[15:47:10] +1 for 1)
[15:47:15] just email for 2)
[15:47:29] for 3), what does that entail?
[15:49:20] 1) yes 2) also just email 3) im fine with that however to clarify im guessing its still paging 24x7
[15:53:40] I'm also for 1) yes 2) email 3) notifying means sms in this context or email?
[15:55:34] 3) means SMS
[15:56:54] then +1 for 3) as well
[15:57:32] ack, yeah sounds good
[15:57:45] sorry forgot to mention, it has awake hours
[15:58:07] it notifies once every hour if there is an ongoing issue and once for the recovery
[15:58:23] the "ack" / disable notification is just commenting out the crontab :D
[15:58:41] (the every hour is ofc configurable)
[15:58:59] notify is SMS + email (sms for the alert, email with the details)
[15:59:29] and can be email only for the passive host depending on (2)
[16:00:33] my personal thought is that the passive host being down is a N+0 situation and page-worthy, but I won't push too hard on that
[16:01:11] at the same time it's an active/passive setup that requires manual failover
[16:03:42] yeah, and compared to our usual paging threshold it seems a little excessive to page on that
[16:05:31] to avoid false positives the script retries $tries times (default 3 times, sleep 10s between them)
[16:07:59] what is the probe? does it login and access some status page or so?
[16:09:37] it logs in and accesses https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=0
[16:09:40] and checks the values
[16:09:46] (some values)
[16:11:13] see the docstring in:
[16:11:14] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/external-monitoring/+/refs/heads/master/icinga/check_icinga.py
[16:11:16] for more info
[16:11:31] * volans doesn't know why gitiles doesn't highlight this file
[16:14:42] ack
[16:15:42] how do we downtime it the next time icinga1001 gets rebooted for a kernel update? (as we usually don't failover, but just accept a few mins of unavailability)
[16:17:45] a good question
[16:18:04] I'm updating the restart page
[16:18:20] doh, there is no icinga there
[16:18:23] moritzm: https://wikitech.wikimedia.org/wiki/Wikitech-static#Meta-monitoring
[16:18:27] commenting the crontab
[16:18:40] but I want to cross-link it on https://wikitech.wikimedia.org/wiki/Service_restarts
[16:18:43] give me a minute
[16:19:35] what are the pros and cons for paging on the passive host?
[16:20:38] volans: thx!
[16:21:51] {done}
[16:22:13] volans: catchpoint has no data export feature. guess how I've been 'archiving' our setup
[16:22:28] * volans hopes not screenshots :D
[16:22:41] close!
[16:22:49] save webpage as
[16:22:50] i have a directory tree of print-to-PDFs
[16:22:57] ahahah
[16:22:58] nice!
[16:23:04] volans: LGTM, thanks
[16:23:09] they are like screenshots that you can copy and paste text from ;)
[16:25:00] moritzm: fixed link to go to the specific section, and thanks for reminding me about it ;)
[16:26:07] :D
[16:28:21] an intruder :-P
[16:28:38] I love webrecorder.io for 'saving' websites
[16:28:58] hi jijiki
[16:29:06] welcome to the foundations jijiki, it's turtles all the way down
[16:30:14] * jijiki listens to The Turtles - Happy Together
[16:30:20] tx volans :)
[20:28:01] anyone have a mo to +1 some very minor fixes: https://gerrit.wikimedia.org/r/c/operations/puppet/+/496606 https://gerrit.wikimedia.org/r/c/operations/puppet/+/496580
[20:42:45] the second one isn’t loading for me
[20:50:32] erm https://gerrit.wikimedia.org/r/c/operations/software/netbox-deploy/+/496580 sorry :)
[20:50:46] also, thank you :D
[20:54:36] sure np
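The meta-monitoring behaviour described between 15:58 and 16:05 combines two parts. The check retries up to $tries times (default 3, sleeping 10s between attempts) before declaring failure. Notifications go out once per hour while the issue persists and once on recovery. The sketch below is a minimal, hypothetical Python illustration of that logic and is not the code of check_icinga.py. The names `check_with_retries` and `NotificationThrottle` are invented here. The probe is passed in as a callable instead of the real login plus extinfo.cgi check. State is kept in memory, whereas a cron-driven script would presumably need to persist it between runs. Awake hours and the SMS/email split are not modelled.

```python
import time
from typing import Callable, Optional


def check_with_retries(
    probe: Callable[[], bool],
    tries: int = 3,
    sleep_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: True as soon as one probe succeeds, False only if all tries fail."""
    for attempt in range(tries):
        if probe():
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < tries - 1:
            sleep(sleep_seconds)
    return False


class NotificationThrottle:
    """Hypothetical helper: alert at most once per interval while unhealthy, once on recovery."""

    def __init__(self, interval_seconds: float = 3600.0) -> None:
        self.interval = interval_seconds
        # Time of the last alert; None means there is no ongoing problem.
        self.last_alert: Optional[float] = None

    def decide(self, healthy: bool, now: float) -> Optional[str]:
        """Return "alert", "recovery", or None (stay quiet)."""
        if healthy:
            if self.last_alert is not None:
                self.last_alert = None
                return "recovery"
            return None
        if self.last_alert is None or now - self.last_alert >= self.interval:
            self.last_alert = now
            return "alert"
        return None
```

Separating the retry loop from the throttle keeps false-positive suppression (retries within one run) independent of notification frequency (across runs). This mirrors the chat's note that the hourly interval is configurable on its own.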