[00:00:25] netops, Operations: cr4-ulsfo rebooted unexpectedly - https://phabricator.wikimedia.org/T221156 (ayounsi) > I have checked the core and the information, we did not find any PR related to this, please give us a few days to analyze the core.
[00:39:18] HTTPS, Traffic, Operations, Goal, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Krenair) >>! In T133548#5121135, @Krinkle wrote: > @Dzahn Assuming that with Let's Encrypt, HTTPS will work in mode...
[04:35:34] mutante: so... the email can be related to the old cert
[04:35:50] I've sent an email about it replying to the LE one when we got the 30 days alert
[04:36:55] we used to use a cert for icinga.wikimedia.org that got changed to one for icinga.wm.o + icinga1001.wm.o + icinga2001.wm.o
[04:38:57] so this is the culprit https://gerrit.wikimedia.org/r/c/operations/puppet/+/493071
[04:41:07] you can check it with openssl using a blackbox approach
[04:41:11] https://www.irccloud.com/pastebin/oCqkrdEV/
[04:43:16] or in the servers
[04:43:29] https://www.irccloud.com/pastebin/rvJQdhgu/
[04:45:42] regarding live & new pointing to the same directory, it makes sense. they should point to different directories during the renewal period if a staging time greater than 3600 seconds has been configured
[04:52:35] and regarding /etc/acme, those are vestiges of the old LE puppetization, nothing related to acme-chief
[06:54:33] Traffic, Operations, Patch-For-Review: Removal of If-Cached VCL support - https://phabricator.wikimedia.org/T220510 (ema) Open→Resolved
[09:42:16] netops, Operations: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (akosiaris) Just noting that at 10:41 UTC the circuit was still down per ` akosiaris@cr3-ulsfo> show interfaces descriptions | match 313592 xe-0/1/1 up down Transport: cr2-eqord:xe-0/1/1...
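Note on the 04:41 "blackbox" certificate check: the pastebin contents are not part of this log, so the following is only a minimal Python sketch of the same idea, not the openssl commands that were actually pasted. It connects to a host, reads the served certificate, and reports which expected names are missing from its subjectAltName list. The function names are hypothetical, matching is exact-string only (no wildcard handling), and expanding "wm.o" to "wikimedia.org" in the usage note is an assumption.

```python
import socket
import ssl


def fetch_peer_cert(host, port=443, timeout=5.0):
    """Connect to host:port over TLS and return the validated peer certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def dns_sans(cert):
    """Return the DNS names listed in the certificate's subjectAltName."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def missing_names(cert, expected):
    """Return the expected hostnames that are absent from the certificate's SANs.

    Exact string comparison only; wildcard entries are not expanded.
    """
    present = set(dns_sans(cert))
    return [name for name in expected if name not in present]
```

Usage would look roughly like `missing_names(fetch_peer_cert("icinga.wikimedia.org"), ["icinga.wikimedia.org", "icinga1001.wikimedia.org", "icinga2001.wikimedia.org"])`. An empty list would suggest the multi-name certificate is the one being served.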
[13:24:08] moritzm: cp1008 is saying funny things
[13:24:11] Warning: Downgrading to PSON for future requests
[13:24:19] Info: Unable to serialize catalog to json, retrying with pson
[13:24:25] (when running puppet, that is)
[13:25:22] is this related to what vgutierrez was mentioning earlier today (something something jessie C++)?
[13:29:55] I think so
[13:30:01] in buster instances we get a similar warning
[13:30:03] Warning: Downgrading to PSON for future requests
[14:18:17] yep I can confirm (from the analytics ones)
[14:44:24] netops, Operations: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (ayounsi) Got an email 1h ago saying the onsite crew was still splicing hard. > This is to inform you that splicing activity on the east side is still ongoing and we will keep you updated with work...
[15:23:52] Krenair: you're right regarding the script permissions.. E_COFFEE or something /o\
[15:24:02] Krenair: PS10 looks good, but needs to be rebased (jenkins is crying about it)
[15:26:42] vgutierrez: got it, thanks for the explanation. the "live & new pointing to the same" i figured was normal, i just wanted to confirm it's not looking unusual
[15:27:02] np :)
[15:28:07] fatal: bad object 0000000000000000000000000000000000000000
[15:28:07] sigh
[15:29:13] * Krenair is wondering if he should report this bug upstream to git
[15:32:42] Krenair: I think you should report it :)
[15:52:34] Traffic, Operations, decommission, ops-codfw: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (RobH) a:RobH
[15:55:11] Traffic, Operations, decommission, ops-codfw: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (ops-monitoring-bot) wmf-decommission-host was executed by robh for acamar.wikimedia.org and performed the following actions: - Revoked Puppet certificate - Removed from Pupp...
[15:55:39] Traffic, Operations, decommission, ops-codfw: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (ops-monitoring-bot) wmf-decommission-host was executed by robh for achernar.wikimedia.org and performed the following actions: - Revoked Puppet certificate - Removed from Pu...
[16:03:12] Traffic, Operations, decommission, ops-codfw: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (RobH) >>! In T198286#5014740, @MoritzMuehlenhoff wrote: > @RobH I'd like to use one of the hosts for some installer tests in the next weeks, can we hold decommissioning thes...
[16:09:29] Traffic, Operations, decommission, ops-codfw, Patch-For-Review: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (RobH)
[16:12:08] netops, Operations: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (ayounsi) Open→Resolved p:Triage→Low
[16:42:27] Good morning! Several VMs in the 'traffic' VPS project have broken puppet: traffic-recdns-anycast, traffic-upload-stretch, traffic-upload-varnish
[16:42:52] can someone have a stab at fixing or deleting those? I need to update some DNS things cloud-wide and VMs with broken puppet are likely to break forever.
[16:51:32] andrewbogott: traffic-recdns-anycast fixed
[16:51:51] thx
[16:51:52] ema: ^
[17:49:17] Traffic, Operations, decommission, ops-codfw, Patch-For-Review: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (RobH)
[17:50:33] Traffic, Operations, decommission, ops-codfw, Patch-For-Review: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (RobH) a:RobH→Papaul These are ready to have the disks securely erased and decomissioned.
[18:17:20] Traffic, DNS, Mail, Operations: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (herron) Indeed I'm able to produce a DKIM issue as well with wiki-mail. Here's an example (seen in headers of message triggered by account preferences change): ` dkim=invalid (public key: granula...
[19:07:26] Traffic, DNS, Mail, Operations, Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (colewhite) p:Triage→High a:colewhite
[19:08:01] Traffic, DNS, Mail, Operations, Patch-For-Review: Phabricator SPF record contains internal addressing for phab[12]001 - https://phabricator.wikimedia.org/T221288 (colewhite) p:Triage→Normal
[19:08:03] Traffic, DNS, Mail, Operations, Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (faidon) How did it work until now? Also, unrelatedly, we probably should use something stronger than a 1024-bit RSA key.
[19:29:29] Traffic, DNS, Mail, Operations, Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (herron) >>! In T221290#5123622, @faidon wrote: > How did it work until now? I wonder the same thing. Looking through old personal emails I have a message from wiki@wikimedia...
[19:48:20] Traffic, DNS, Mail, Operations, Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (faidon) It's been a while but if I recall correctly, the intention was to not allow (= not create a valid signature) emails that had e.g. From: person@wikipedia.org (where per...
[20:22:06] Traffic, DNS, Mail, Operations, Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (Krenair) >>! In T221290#5123696, @herron wrote: >>>! In T221290#5123622, @faidon wrote: >> How did it work until now? > > I wonder the same thing. Looking through old person...
[21:19:56] netops, Operations: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (Dzahn) ` Techs have completed splicing and are hands off. It may be necessary to reset your services locally at your equipment. We will now proceed to form an official RFO which we will share at a l...
[21:20:32] Traffic, DNS, Mail, Operations, Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (herron) That's interesting, based on the headers in T221290#5123805 it looks like this issue goes back as far as 2015. >>! In T221290#5123730, @faidon wrote: > It's been a wh...
[21:21:12] netops, Operations: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (Dzahn) Open→Resolved a:Dzahn ` 12:01 <+icinga-wm> RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0...
[21:51:30] Traffic, Operations, ops-eqiad, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293 (RobH)
[22:15:45] Traffic, Operations, ops-codfw, Patch-For-Review: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (RobH)
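Note on the 19:08 SPF task (T221288, "SPF record contains internal addressing"): the actual record is not quoted in this log, so the following is only a rough Python sketch of how one might spot that class of problem. It scans the `ip4:` and `ip6:` mechanisms of an SPF string and returns those that fall in non-public ranges. The function name is hypothetical and the sample record in the usage note is made up for illustration.

```python
import ipaddress


def private_spf_networks(spf_record):
    """Return the ip4:/ip6: networks in an SPF record that are not publicly routable.

    A malformed address raises ValueError from ipaddress.ip_network.
    """
    flagged = []
    for term in spf_record.split():
        # Drop an optional qualifier (+, -, ~, ?) before the mechanism name.
        mechanism = term.lstrip("+-~?")
        # Split on the first colon only, so IPv6 values keep their own colons.
        name, _, value = mechanism.partition(":")
        if name.lower() not in ("ip4", "ip6") or not value:
            continue
        network = ipaddress.ip_network(value, strict=False)
        if network.is_private:
            flagged.append(str(network))
    return flagged
```

For a hypothetical record such as `"v=spf1 ip4:10.64.0.5 ip4:8.8.8.8 -all"`, the function would flag only the `10.64.0.5/32` entry. Other mechanisms (`include:`, `a`, `mx`) are ignored here and would need DNS lookups to evaluate.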