[08:22:18] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi)
[08:59:09] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi)
[09:06:54] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi) Open→Resolved a: fgiunchedi This is complete! All netmon hosts are running Buster
[10:27:05] netops, Operations: OSPF metrics - https://phabricator.wikimedia.org/T200277 (ayounsi) New proposal! Change to T200277#6077728 is to use the following fields: * `metric` - keeps things more generic than `latency` * `state` - choice between `default`, `preferred`, `drained`
[12:37:51] <_joe_> hi, I have a dns resolution problem
[12:38:53] <_joe_> cp3052:~$ dig +short -x $(dig +short helm-charts.discovery.wmnet)
[12:38:55] <_joe_> chartmuseum2001.codfw.wmnet.
[12:39:10] <_joe_> while for instance, restbase correctly resolves to eqiad
[12:39:25] <_joe_> cp3052:~$ dig +short -x $(dig +short restbase.discovery.wmnet)
[12:39:27] <_joe_> restbase.svc.eqiad.wmnet.
[12:39:47] <_joe_> now both datacenters are pooled in discovery
[12:41:10] <_joe_> also
[12:41:12] <_joe_> dns3001:/etc/gdnsd$ cat /var/lib/gdnsd/discovery-helm-charts.state
[12:41:14] <_joe_> 10.64.48.26 => UP/300
[12:41:16] <_joe_> 10.192.48.159 => UP/300
[12:41:20] <_joe_> so I don't see anything wrong
[12:41:29] <_joe_> still it gets resolved to codfw
[12:42:57] <_joe_> bblack, ema any idea how that could be the case?
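The state file only says that both discovery addresses are up. It does not say which datacenter each address belongs to. Below is a minimal sketch that parses lines in the `ADDRESS => STATE/TTL` shape shown from `discovery-helm-charts.state` and tags each address with a datacenter. It rests on several assumptions. The private ranges 10.64.0.0/12 for eqiad and 10.192.0.0/12 for codfw are not stated in the log. Reading the number after the slash as a TTL is also an assumption. `DC_PREFIXES`, `datacenter_for`, and `parse_state_line` are hypothetical names.

```python
from __future__ import annotations

import ipaddress

# ASSUMPTION: per-datacenter private ranges; these prefixes do not come from the log.
DC_PREFIXES = {
    "eqiad": ipaddress.ip_network("10.64.0.0/12"),
    "codfw": ipaddress.ip_network("10.192.0.0/12"),
}


def datacenter_for(address: str) -> str | None:
    """Return the datacenter whose assumed prefix contains address, or None."""
    ip = ipaddress.ip_address(address)
    for dc, network in DC_PREFIXES.items():
        if ip in network:
            return dc
    return None


def parse_state_line(line: str) -> tuple[str, str, int]:
    """Split '10.64.48.26 => UP/300' into (address, state, ttl).

    ASSUMPTION: the number after '/' is read as a TTL in seconds.
    """
    address, _, status = line.partition("=>")
    state, _, ttl = status.strip().partition("/")
    return address.strip(), state, int(ttl)


# The two lines _joe_ pasted from dns3001.
state_file = """\
10.64.48.26 => UP/300
10.192.48.159 => UP/300
"""

for line in state_file.splitlines():
    address, state, ttl = parse_state_line(line)
    print(address, state, ttl, datacenter_for(address))
```

Under these assumptions, both entries are `UP` and each address falls in a different datacenter. The state file itself therefore looks healthy. What it does not show is which datacenter key gdnsd associates with each address.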
[12:45:05] <_joe_> ok I found the issue
[12:45:11] <_joe_> somehow this is inverted
[12:45:13] <_joe_> dcmap => {
[12:45:15] <_joe_> codfw => 10.64.48.26
[12:45:17] <_joe_> eqiad => 10.192.48.159
[12:45:19] <_joe_> }
[12:45:26] _joe_: seems inverte in the discovery resources
[12:45:29] yeah same
[12:45:35] conclusion I was getting to
[12:45:35] <_joe_> jayme: ^^
[12:45:44] ah, that makes sense :)
[12:46:46] <_joe_> how is that file generated? in operations/dns or in puppet?
[12:46:54] and indeed in ulsfo I get the eqiad IP
[12:47:03] _joe_: thats, switched channels again to not get the spoiler from here :)
[12:47:22] <_joe_> so this is a puppet generated file
[12:47:30] <_joe_> so I guess there is something wrong in some logic somewhere
[12:48:22] No, its just wrong in service.yaml
[12:48:28] <_joe_> jayme: yep
[12:48:35] <_joe_> that's where I just got
[12:48:49] Nice ;-) Thanks
[12:49:36] notice that the 'hostname' attributes are wrong too
[12:49:45] codfw:
[12:49:46] hostname: chartmuseum1001
[12:50:17] yep, fixing
[12:55:12] https://gerrit.wikimedia.org/r/c/operations/puppet/+/617455
[12:58:27] jayme: lgtm
[12:59:12] thanks
[13:14:59] lol
[13:15:29] at least I was consistent at swapping those 2
[13:18:45] <_joe_> haha indeed
[13:55:02] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Vgutierrez) @bd808 acme-chief 0.27 shipping your changes has been deployed in production. Please note that your change will be effective the next time acme-chief reis...
[15:42:19] vgutierrez: thanks for the merge and build! bstorm is upgrading our acme-chief now so we can try it out. :)
[15:42:40] as I said on the task, you need to trigger a reissue of the cert
[15:42:46] otherwise it won't be generated
[15:42:54] adding a spare SNI should suffice
[15:44:13] or I could delete the existing cert and trigger an acme-chief reload
[15:44:19] if that's ok with you
[15:44:33] (I'd recommend disabling puppet on the servers using that cert meanwhile)
[16:43:18] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Bstorm) It reissued, but I'm not seeing the new ec-prime256v1.chained.crt.key on the client, at least with what puppet grabbed. Digging a bit there.
[16:45:50] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Bstorm) Ah, I see, it's the "live" link vs. the "new" link.
[16:48:08] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Bstorm) That's strange. I only have rsa cers, not ecc ones from the latest run. `lang=shell-session root@paws-k8s-haproxy-1:/etc/acmecerts/paws# ls live ec-prime256v...
[16:51:56] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Bstorm) They have appeared! There's a delay on the ec-prime256 certs
[16:53:08] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Bstorm) And that updated the "live" link
[17:12:29] Traffic, CommRel-Specialists-Support, Editing-team, Fundraising-Backlog, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Elitre) Removing my team, I don't think there's anything for us here?
[17:12:52] Traffic, Editing-team, Fundraising-Backlog, Operations, and 8 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Elitre)
[17:13:39] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Bstorm) Open→Resolved It works!
[17:27:10] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH) a: RobH→BBlack Ok, this has now sat neglected for awhile. @bblack: Should I resume updating bios on these hosts in a rotating, one per cluster fash...
[17:27:16] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH) Stalled→Open
[17:27:19] Traffic, Operations: Servers freezing across the caching cluster - https://phabricator.wikimedia.org/T238305 (RobH)
[17:47:45] Acme-chief, Patch-For-Review: acme-chief: support for generating a concatenated cert/key file - https://phabricator.wikimedia.org/T255249 (Krenair) I think the keys are generated first and the certs appear when acme-chief has gone through the ACME API to get stuff signed by the CA
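T255249 asked acme-chief to also emit a single file holding the certificate chain plus the private key, such as the `ec-prime256v1.chained.crt.key` that Bstorm looked for under `/etc/acmecerts/paws/live`. Below is a minimal sketch of that idea. It is not acme-chief's implementation. It concatenates a chained-certificate file and a key file into one bundle, then lists which key types still lack a bundle. That mirrors Krenair's explanation: a key can exist before its certificate has been signed. The input names `<type>.chained.crt` and `<type>.key`, the `rsa-2048` key type, and the helpers `write_bundle` and `missing_bundles` are all hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def write_bundle(directory: Path, key_type: str) -> Path:
    """Concatenate <key_type>.chained.crt and <key_type>.key into <key_type>.chained.crt.key.

    ASSUMPTION: these input file names are illustrative, not acme-chief's layout.
    """
    chain = (directory / f"{key_type}.chained.crt").read_text()
    key = (directory / f"{key_type}.key").read_text()
    target = directory / f"{key_type}.chained.crt.key"
    # mkstemp creates the temp file readable only by its owner, which suits a private key.
    fd, tmp_name = tempfile.mkstemp(dir=directory, prefix=".bundle-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(chain if chain.endswith("\n") else chain + "\n")
            tmp.write(key)
        # Atomic rename: readers see either the old bundle or the complete new one.
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return target


def missing_bundles(directory: Path, key_types: list[str]) -> list[str]:
    """Return the key types that have no concatenated bundle in directory yet."""
    return [kt for kt in key_types if not (directory / f"{kt}.chained.crt.key").exists()]


with tempfile.TemporaryDirectory() as d:
    live = Path(d)
    # Placeholder contents stand in for real PEM data.
    (live / "rsa-2048.chained.crt").write_text("RSA CERT CHAIN\n")
    (live / "rsa-2048.key").write_text("RSA KEY\n")
    # Only the ECDSA key exists so far; its certificate has not been signed yet.
    (live / "ec-prime256v1.key").write_text("EC KEY\n")
    write_bundle(live, "rsa-2048")
    print(missing_bundles(live, ["rsa-2048", "ec-prime256v1"]))  # ['ec-prime256v1']
```

The bundle is written through a temporary file and `os.replace`. That way a consumer, such as the haproxy host in the log, never reads a half-written certificate/key file.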