[04:54:14] Traffic, SRE: provision more machines for eqsin caches - https://phabricator.wikimedia.org/T275046 (BBlack) These are just about ready and running correct puppetization, but **don't** pool these yet. I think they may have some bad BIOS settings or something, at least related to power mgmt. cpufreq keep...
[12:56:02] from T275046 it looks like setting up new hardware is never plug & play
[12:56:02] T275046: provision more machines for eqsin caches - https://phabricator.wikimedia.org/T275046
[13:14:56] vgutierrez: how can I ensure I hit a specific cp box, and is it safe to restart ATS? trying to do some testing for https://phabricator.wikimedia.org/T281673
[13:20:14] or bblack (if not too early :S)
[13:22:06] what do you mean by "safe to restart"?
[13:23:02] bblack: some background: I'm trying to add a new intermediate CA from the new PKI server that we can use to issue certificates, which will be used between ATS -> origin server
[13:23:25] I added the new intermediate cert today and have applied a server cert to the debmonitor backend connection
[13:24:08] running curl https://debmonitor.discovery.wmnet:7443/packages/ceph-common from a cp server works fine with no issues (i.e. all certs are correctly in ca-certificates)
[13:24:24] however, hitting a cp server I get a 502 (you should too)
[13:24:39] I had spoken with ema about this a bit, which resulted in https://phabricator.wikimedia.org/T281673
[13:24:46] ok
[13:24:49] and from that it seems like things should work
[13:25:04] so....
[13:25:10] however, I still get an error, so I wondered if ATS needed to be restarted to pick up the new intermediate I pushed out this morning
[13:25:34] (this would still not be the correct solution but wanted to dig into it a bit)
[13:25:36] is the ATS config for the CA a single value?
[13:25:47] because if so, there's no way to "transition" sanely
[13:26:18] (unless we can make something else work, like invent a new temporary CA that signs the old and new CAs and use that?)
[13:26:18] from https://phabricator.wikimedia.org/T281673 it is currently configured to specify the puppet CA, however it is also set to just log failures
[13:26:57] my long-term plan was to distribute a CA bundle containing the new certs and the puppet CA cert
[13:27:15] however, I wanted to first test the assumptions documented in the above task
[13:27:55] if it accepts a bundle
[13:29:16] yes, the documentation is unclear, however I first wanted to test
[13:29:18] "The first setting, proxy.config.ssl.client.verify.server=2, results in logging an error if certificate validation fails."
[13:31:57] yeah.. it accepts a bundle but the current working theory is that, regarding server certificate validation, all the CAs trusted by OpenSSL are allowed
[13:32:20] we've tested setting the bundle against the snakeoil cert and it still allows a connection against the applayer
[13:32:27] so far from ideal
[15:05:54] hi traffic friends, will one of you have time to peek at https://gerrit.wikimedia.org/r/c/operations/puppet/+/679341 soonish?
[16:31:36] cdanis: I stared at it as best I could, and it didn't blink!
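The 13:31 working theory (a configured bundle does not stop OpenSSL's default trust store from being used) can be cross-checked from the client side, independently of ATS, by verifying the origin against a context that trusts *only* a given bundle. This is a minimal Python sketch, assuming Python 3.7+ and a PEM bundle on disk. The helper names are hypothetical and are not part of ATS or any Wikimedia tooling. The key point is that `ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)` loads no system CAs, unlike `ssl.create_default_context()` called with no arguments. As a result, a snakeoil or otherwise unlisted issuer should fail verification here instead of passing.

```python
import socket
import ssl
from typing import Optional


def strict_client_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that trusts ONLY the given PEM bundle.

    PROTOCOL_TLS_CLIENT turns on certificate and hostname verification by
    default and, unlike ssl.create_default_context() with no arguments,
    does not load the system/OpenSSL default CA store.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_bundle is not None:
        # e.g. a bundle holding the puppet CA plus the new intermediate(s)
        ctx.load_verify_locations(cafile=ca_bundle)
    return ctx


def verifies_against(host: str, port: int, ctx: ssl.SSLContext,
                     timeout: float = 5.0) -> bool:
    """Return True if the TLS handshake to host:port verifies with ctx.

    Only certificate-verification failures map to False; other errors
    (DNS, connection refused, timeouts) propagate to the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

Example usage, with a hypothetical bundle path: `verifies_against("debmonitor.discovery.wmnet", 7443, strict_client_context("/tmp/candidate-bundle.pem"))`. If ATS still connects to an origin that this check rejects, that supports the theory that ATS is falling back to the default store. By default, OpenSSL expects the chain to end at a self-signed root that is present in the bundle. Trusting an intermediate on its own requires partial-chain verification (`ssl.VERIFY_X509_PARTIAL_CHAIN`, Python 3.10+).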
[16:31:51] ahaha
[16:31:53] thanks :)
[16:32:43] there are maybe some nomenclature issues and reliability issues, but (a) they're beyond the scope of that patch + (b) non-critical, I think
[16:34:38] ("a" being: lumping a generic definition of public cloud ranges under "abuse_nets" implies a negative opinion of them on our part, when there might be not-so-negative uses of that list + (b) all the uses of the abuse_nets subsets create weird invisible dependencies on certain keys existing for the VCL to compile at all, but VCL not-compiling is fairly easy to deal with after we notice it and doesn't
[16:34:44] break traffic)
[16:43:58] agreed on all counts :)
[17:51:54] Traffic, SRE: provision more machines for eqsin caches - https://phabricator.wikimedia.org/T275046 (BBlack) I checked the BIOS/iDRAC settings on cp5013 against https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/Dell_Documentation#Initial_System_Setup (+ the one custom setting we use on t...
[18:05:43] Traffic, Analytics, SRE, Patch-For-Review: Add Traffic's notion of "from public cloud" to Analytics webrequest data - https://phabricator.wikimedia.org/T279380 (CDanis)
[18:06:40] Traffic, Analytics, SRE, Patch-For-Review: Add Traffic's notion of "from public cloud" to Analytics webrequest data - https://phabricator.wikimedia.org/T279380 (CDanis) @fdans @JAllemandou New map entry should be ready for Analytics to set up in Turnilo :)
[18:34:26] Traffic, SRE: provision more machines for eqsin caches - https://phabricator.wikimedia.org/T275046 (BBlack) The others were in the same state. All are fixed and rebooted now, icinga downtimes are removed, netbox status is set to `Active`, and confctl weights are set correctly, but the `pooled` attribute...
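Point (b) at 16:34 notes that the VCL only compiles if certain abuse_nets keys exist. One way to make that implicit dependency explicit is to check for those keys before templating. This is a minimal Python sketch under stated assumptions. It assumes the data is a plain mapping from key to a list of CIDRs and that the list of keys the VCL references is supplied separately. The key names used in the test (`public_cloud`, `known_scrapers`) and the helper names are hypothetical. They are not taken from the actual puppet data or patch 679341.

```python
from typing import Iterable, List, Mapping, Sequence


def missing_keys(abuse_nets: Mapping[str, Sequence[str]],
                 referenced: Iterable[str]) -> List[str]:
    """Return the keys the VCL references that abuse_nets lacks, sorted."""
    return sorted(set(referenced) - set(abuse_nets))


def check_abuse_nets(abuse_nets: Mapping[str, Sequence[str]],
                     referenced: Iterable[str]) -> None:
    """Raise before rendering, rather than failing at VCL compile time."""
    missing = missing_keys(abuse_nets, referenced)
    if missing:
        raise KeyError(f"abuse_nets is missing keys referenced by VCL: {missing}")
```

The design choice here is to fail early, with an error that names the missing key. This matters less for traffic, since a non-compiling VCL does not break it, than for making the failure obvious at the point where the data changes.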
[19:03:37] Traffic, SRE, Patch-For-Review: cp_upload @ eqsin cascading failures, February 2021 - https://phabricator.wikimedia.org/T274888 (BBlack)
[19:03:49] Traffic, SRE: provision more machines for eqsin caches - https://phabricator.wikimedia.org/T275046 (BBlack) Open→Resolved a:BBlack These are all pooled now and slowly filling their caches. Optimistically closing this task for now!
[19:08:23] bblack: \o/