[13:59:02] Who is the keeper of ops@lists.wikimedia.org? aborrero@wikimedia.org (which no longer exists) is still subscribed and the listserv is trying /so hard/ to deliver to him
[13:59:28] matthieulec: I think you got that info recently ? ^
[14:06:10] when offboarding people I usually ping Keith, not sure if there's also a second admin
[14:06:12] if we only had an offboarding script handling such things :P
[14:06:14] Wanna file a Phab ticket and point to https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/openldap/files/offboard-user.py ?
[14:06:23] (if that's even in scope?)
[14:13:44] * andrewbogott creates T413883 but unsure how to tag
[14:14:03] if there's a generic mailman API to 1) check all lists a user is subscribed to and 2) unsubscribes them, that would be useful for all wikimedia.org addresses
[14:14:15] but let's avoid special case hacks like just ops@
[14:17:27] it's do-able with shell commands
[14:17:38] we have something similar for stewards
[14:17:50] basically it is "sync the list members with this text file"
[14:22:08] but in the overall bug picture bounces from defunct wikimedia.org addresses are not different than any other bounces from subscribers (like former uni addresses being disabled or free mail accounts having a full mailbox etc.), so looking into a generic mechanism to disable/remove defunct subscriptions in mailman seems like the better approach anyway
[14:22:14] s/bug/big
[14:25:32] mailman certainly has a bounce probe system (see https://phabricator.wikimedia.org/T369004), is that not working?
[14:31:01] andrewbogott: re: your original question. it does have a generic answer, the owner should always be reachable as -owner@lists.wikimedia.org regardless if you know the real email behind it
[14:31:46] oh, that makes sense. The flood of undeliverable emails has stopped so the issue may have solved itself, I'll follow up if I get more.
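Editor's note: the "sync the list members with this text file" idea mentioned at 14:17:50 comes down to a set difference between the current subscribers and the desired ones. The sketch below is only an illustration of that comparison. It does not call Mailman, and the function names (`read_member_file`, `plan_sync`) and the file format (one address per line, `#` comments) are assumptions, not the stewards tooling referred to in the log.

```python
def normalize(address):
    """Trim whitespace and lowercase so comparisons ignore case differences."""
    return address.strip().lower()


def read_member_file(text):
    """Parse a hypothetical member file: one address per line, '#' starts a comment."""
    members = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            members.add(normalize(line))
    return members


def plan_sync(current_members, desired_members):
    """Return (to_subscribe, to_unsubscribe) as sorted lists.

    current_members: addresses subscribed right now (however they were obtained).
    desired_members: addresses that should be subscribed, e.g. from the text file.
    """
    current = {normalize(a) for a in current_members if a.strip()}
    desired = {normalize(a) for a in desired_members if a.strip()}
    return sorted(desired - current), sorted(current - desired)


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    subscribed = ["alice@example.org", "former-staff@example.org"]
    wanted = read_member_file("alice@example.org\nbob@example.org  # new hire\n")
    add, remove = plan_sync(subscribed, wanted)
    print("subscribe:", add)
    print("unsubscribe:", remove)
```

Applying the resulting plan would still go through whatever interface the list server offers. That step is left out here on purpose.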
[15:46:36] claime sorry I'm late at the party, I emailed ops-owner@lists.wikimedia.org to get subscribed to ops, which is exactly what Daniel is advising
[18:30:28] looks like Puppet is broken on both alert* hosts?
[18:30:32] Error while evaluating a Method call:
[18:30:34] DNS lookup failed for wikitech-static.wikimedia.org Resolv::DNS::Resource::IN::AAAA
[18:30:36] …in /srv/puppet_code/environments/production/modules/profile/manifests/icinga/external_monitoring.pp, line: 17, column: 45.
[18:30:38] ah
[18:30:59] anyone from o11y around to figure out with andrewbogott what needs to happen there?
[18:31:53] * andrewbogott heads back to his desk
[18:32:18] I saw it yesterday and I think it's related to a patch merged yesterday.
[18:34:26] so wikitech-static no longer has a AAAA record. That breaks puppet?
[18:36:25] for external monitoring, yeah -- but iirc we don't do that from wikitech-static anymore?
[18:36:59] it is resolving the v6 there as well for the allow_from
[18:37:09] so that needs to be updated
[18:37:40] if it no longer has an AAAA record at all that is
[18:37:50] [$host.ipresolve(4), $host.ipresolve(6)].filter
[18:38:11] yeah, I'm just not sure whether the correct fix is to do that, or just drop wikitech-static.wm.o from profile::icinga::external_monitoring::monitoring_hosts, if it doesn't need to be in the allow_from anymore anyway
[18:39:11] I'm pretty sure we can just drop it
[18:40:26] I mean, the alerts in modules/icinga/manifests/monitor/wikitech_static.pp are unrelated right?
[18:40:45] `ipresolve()` can be replaced with `dnsquery::lookup($host, true)` which automatically takes care of hosts with and without AAAA records
[18:42:22] taavi:
[18:42:26] - [$host.ipresolve(4), $host.ipresolve(6)].filter |$val| { $val =~ NotUndef }
[18:42:26] + [dnsquery::lookup($host, true).filter |$val| { $val =~ NotUndef }
[18:42:27] ?
[18:42:40] you don't need that filter
[18:43:30] hm, also I misplaced a bracket
[18:44:24] so you mean just
[18:44:26] $_allow_from = $monitoring_hosts.map |Stdlib::Host $host| {
[18:44:27]   dnsquery::lookup($host, true)
[18:44:27] }.flatten
[18:44:52] lookup() returns a list with one or two items depending?
[18:49:17] * andrewbogott waiting for pcc confirmation
[18:54:32] of course the pcc diff is unhelpful because of puppet being broken for the 'before' case. But...
[18:55:05] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1223702 <- taavi, sukhe, want to go that route or just remove wts from external monitoring?
[18:57:23] ok, going to try it. thanks denisse
[18:57:24] The patch LGTM, let's see what other's say.
[18:57:34] rzl, want to chime in?
[18:58:57] andrewbogott: I think dropping it is fine but also dnsquery::lookup returns Array[Stdlib::IP::Address::Nosubnet] so your change should work
[18:59:29] ok, so using dnsquery::lookup is an improvement regardless
[18:59:58] no strong feelings! switching to lookup() sounds good to me either way
[18:59:59] yeah
[19:01:08] dropping wikitech-static there is also probably the right thing to do, but it's harmless, definitely not as important as fixing puppet on the alert hosts
[19:04:32] followup https://gerrit.wikimedia.org/r/c/operations/puppet/+/1223704
[19:06:17] cdanis: puppet should be fixed now; thanks for the ping
[19:10:24] Thanks y'all!!
[19:20:27] thanks!
[19:22:50] moritzm: T279023 / T351202 for the mailman api
[19:22:51] T279023: Expose mailman3 internal REST API inside Wikimedia production network - https://phabricator.wikimedia.org/T279023
[19:22:51] T351202: stewards1001 / stewards2001: automatically subscribe stewards to mailman lists (was: Enable API access for Mailman3) - https://phabricator.wikimedia.org/T351202
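Editor's note: the Puppet fix discussed between 18:40:45 and 18:59:29 replaces two per-family lookups, one of which fails hard when a host has no AAAA record, with a single lookup that returns whatever addresses exist. The Python sketch below is an analogy for that behaviour only. It is not how `dnsquery::lookup` is implemented, and `resolve_all` and `allow_from` are illustrative names.

```python
import socket


def resolve_all(host, resolver=socket.getaddrinfo):
    """Return the host's IPv4 and IPv6 addresses, skipping families with no record.

    A missing AAAA (or A) record raises socket.gaierror for that family only,
    so it is skipped instead of aborting the whole lookup.
    """
    addresses = []
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            infos = resolver(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no record for this family: not an error here
        for info in infos:
            address = info[4][0]  # sockaddr tuple, first element is the IP
            if address not in addresses:
                addresses.append(address)
    return addresses


def allow_from(hosts, resolver=socket.getaddrinfo):
    """Flatten the per-host address lists, like the `.map { ... }.flatten` in the log."""
    return [address for host in hosts for address in resolve_all(host, resolver)]


if __name__ == "__main__":
    # Needs working DNS; the output depends on the resolver and is not shown.
    print(allow_from(["localhost"]))
```

Note that a host that resolves in neither family simply contributes nothing to the list. Whether that should be a silent skip or a failure is a policy choice this sketch does not settle.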