[06:31:52] netops, Operations: connectivity issues between several hosts on asw2-b-eqiad - https://phabricator.wikimedia.org/T201039 (Marostegui) The following hosts (aside from the ones above) will need to be downtimed too: db1117, db2042 and db2078 (they replicate from db1072 and db1073) db2037 (replicates from d...
[07:21:13] elukey: I'm around if you want to merge the patches for matomo
[07:21:33] * elukey sends wikilove to ema
[07:22:29] ema: going to do the last checks, should be ready in 10 mins
[07:35:11] ema: I think that we can start adding the new backend
[07:35:20] elukey: +1
[07:35:34] merging
[07:36:51] ema: merged :)
[07:38:34] shall we run puppet on cache text?
[07:39:13] elukey: yes, please go ahead
[07:42:00] I am doing sudo cumin 'A:cp-text' 'run-puppet-agent' -b 4
[07:44:15] elukey: looks good, double-checked on cp1075
[07:45:30] as in, be_matomo1001_eqiad_wmnet has been defined properly, you can proceed with the next patch
[07:47:02] ack, I'll do it as soon as cumin completes the current run
[07:47:26] (in the meantime I am moving the database from bohrium to matomo1001
[07:57:30] ema: proceeding with the second patch now
[07:57:32] k
[08:03:30] I can see access logs on matomo1001 :)
[08:03:43] perfect!
[08:04:03] confirmed on cp1079:
[08:04:10] varnishlog -b -q 'BereqHeader:Host ~ "piwik"'
[08:04:12] [...]
[08:04:18] - BackendReuse 26 vcl-84966c56-06c0-45c0-b082-dca2e7904ace.be_matomo1001_eqiad_wmnet
[08:05:23] going to check that nothing explodes during the next couple of hours and then I'll merge the third patch for the clean up
[08:05:32] sounds good
[09:00:49] ema: merging the cleanup patch ok
[09:00:50] ?
[09:05:27] elukey: +1
[09:19:03] done! Thanks for the help!
[10:06:29] volans, vgutierrez: why are we returning None for lack of a common name now?
[10:06:31] it would've triggered an exception before and I think an exception still makes sense/
[10:06:33] ?
[10:07:19] ack
[10:07:24] that works too for me
[10:08:29] I don't understand why it was changed
[10:08:43] was it just that the "name_value, =" syntax was too obscure?
[10:10:12] it would also fail according to the documented API if the call returns an empty list
[10:10:22] not enough values to unpack
[10:10:25] yeah which is probably a good thing to fail on
[10:10:37] a cert without a CN?
[10:10:38] yes but with a proper exception no common name found
[10:10:44] not "not enough values to unpack" :)
[10:10:44] but maybe with a better exception that "Not enough values"
[10:10:45] :)
[10:10:58] I suppose
[10:11:10] the other option is to do:
[10:12:02] try:
[10:12:13] return self.certificate.subject.get_attributes_for_oid(NameOID.COMMON_NAME)[0].value
[10:12:28] except (relevant exceptions) as e:
[10:12:41] raise CustomException('no CN found') from e
[10:12:57] yup, but that doesn't take into account len() > 1
[10:13:18] indeed, but is that possible from the API?
[10:13:21] it's not clear to me
[10:13:31] maybe is already forbidden by the library
[10:14:31] get_attributes_for_oid() could return multiple of course, dunno if there is in place any kind of restriction for NameOID.COMMON_NAME
[10:15:32] is it even valid for an x509 cert to not have a CN?
[10:18:24] we shouldn't trust too much the certificate file, as it is input provided by a third-party
[10:19:47] Krenair: BTW, you consider necessary to log warnings telling between the several certificate types (rsa-2048 / EC)?
[10:20:42] I don't mind logs being too verbose
[10:20:47] logs being not verbose enough causes problems later
[10:21:38] ok.. then I guess we should change the other loglines produced within _get_certificate_status() to be consistent
[10:27:28] volans, vgutierrez: alright so are we all okay with this now?
[10:30:05] I think so
[10:32:38] Krenair: meanwhile... what's blocking us on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/441991 ?
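Editor's sketch for the common-name exchange at 10:08-10:14 above: one way to cover all three cases raised there (no CN, exactly one CN, more than one CN) with a descriptive exception instead of "not enough values to unpack". This is only an illustration under stated assumptions, not the code that was merged. `CommonNameError` and `common_name` are hypothetical names (the chat only says `CustomException`), and the `attributes` argument stands in for the list returned by the `get_attributes_for_oid(NameOID.COMMON_NAME)` call quoted in the log, so the example runs without a real certificate.

```python
from types import SimpleNamespace


class CommonNameError(Exception):
    """Hypothetical exception; the chat only calls it 'CustomException'."""


def common_name(attributes):
    """Return the single CN value from a list of subject attributes.

    `attributes` is assumed to be a list of objects exposing `.value`,
    such as the result of
    certificate.subject.get_attributes_for_oid(NameOID.COMMON_NAME).
    """
    if not attributes:
        # Replaces the obscure unpacking error with an explicit message.
        raise CommonNameError('no CN found')
    if len(attributes) > 1:
        # The len() > 1 case raised at 10:12:57.
        raise CommonNameError(
            'expected exactly one CN, found {}'.format(len(attributes)))
    return attributes[0].value


# Stand-in attribute object; a real caller would pass the library's result.
print(common_name([SimpleNamespace(value='example.org')]))
```

Checking the length explicitly, rather than catching an IndexError around `[0]`, keeps the "zero" and "many" cases symmetric; whether the library can ever return more than one CN is left open, as it was in the chat.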
[10:33:04] ugh puppet reviews
[10:33:08] well
[10:33:18] there's my CR-1
[10:35:11] the profile::certcentral::client class in there is just an example and we don't actually use any clients from the initial commit
[10:35:20] but I suppose that's a good thing
[10:35:57] btw: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/441991/36/modules/certcentral/manifests/cert.pp
[10:36:11] do we need to offer the user the ability to check the certificate type?
[10:36:20] or we just provide all of them?
[10:37:00] s/check/choose/g
[10:37:01] sorry :)
[10:37:30] no could just provide all of them
[10:37:34] guess that solves that
[10:38:33] but we'd need to refactor a little bit the file resources dropping the certificates to loop through all the cert types
[10:38:49] yes
[11:02:33] volans, want to do the honours with https://gerrit.wikimedia.org/r/#/c/operations/software/certcentral/+/460382/ ?
[11:04:34] Krenair: do you want to handle that change in the puppet CR?
[11:04:38] or may I do it?
[11:04:49] I was going to but you can if you like
[11:04:54] please go ahead :)
[11:14:06] vgutierrez, alright well that's done
[11:14:49] once the subject change certcentral commit goes in I'll want to build the package, update the cherry-picks of the puppet commits and give it another test
[11:15:13] ack
[11:15:33] let's hope we can do all of that today
[11:16:12] vgutierrez, the other open certcentral commits...
[11:16:22] do you think we need any of them for initial prod deployment?
[11:16:44] nah, it's not blocking us
[11:16:56] even we could go without the subject change commit
[11:23:44] sorry was doing other stuff, going for lunch now, I'll check it right after it. Or if you're in a hurry feel free to bypass me ;)
[11:24:55] no hurry
[13:53:23] Krenair: did you make somehow a request for a ganneti VM for certcentral?
[14:32:12] bblack: I'm checking the form for requesting ganneti VMs
[14:32:30] I guess that we need 1 in eqiad, 1 in codfw
[14:32:57] private networking should be enough (assuming that it will get proxied internet access)
[14:33:29] in terms of connectivity to the rest of our network.. it should have the same as a puppet master right?
[14:33:41] cause it will be acting as a puppet file server
[14:34:32] all good questions! :)
[14:37:20] and of course it should be able to reach via 22/tcp (ssh) the authdns servers :)
[14:38:57] "in terms of connectivity" - does that mean "what vlan its on", or "firewall rules"?
[14:39:57] but in general my answers are:
[14:40:20] 1) Yes, private vlan, not public, use proxy for outbound
[14:40:44] the form says... "Networking Requirements: , "
[14:40:52] 2) Firewall-wise, I'm not even sure. Obv it will need inbound on various ports from the rest of prod networks, some of those might include fileserver stuff, etc.
[14:41:17] firewall wise shouldn't matter for the VM request AFAIK
[14:41:29] it's mostly an issue in the puppetization itself
[14:49:44] right, hopefully!
[14:50:17] I wouldn't worry too much about the "specific networking access needed" bit then, just it needs to be fully on the normal private vlans somewhere
[14:51:35] regarding CPU and memory.. dunno if we have some kind of standard VMs size to pick the smallest one :)
[14:51:42] makevm asks this stuff?
[14:52:02] e.g. "specific networking access needed"
[14:52:16] nope
[14:52:20] CPU/mem, eh, it shouldn't be too much right?
I don't know what the standard minimum is
[14:52:38] I was looking at: https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM
[14:53:01] somebody pointed me to https://wikitech.wikimedia.org/wiki/SRE_Team_requests#Virtual_machine_requests_(Production)
[14:53:17] and I was looking to the form in phabricator linked there
[14:53:32] I think that link is more for non-SREs requesting an SRE make a VM for them
[14:53:43] for makevm's sake, it only matters if we need a public IP or not
[14:53:49] who would then go look at my link and do those things
[14:54:05] and the row in case we need several instances in the same DC
[14:54:10] otherwise I guess it can be any of them
[14:54:17] yeah whatever row has room
[14:54:24] 1x per DC
[14:55:39] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (Nuria) Let's please update docs: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest
[15:05:16] Krenair: did you make somehow a request for a ganneti VM for certcentral?
[15:05:17] no
[15:05:25] ack :)
[15:05:35] ops likely wouldn't accept a VM request from me
[15:08:19] wouldn't know what the CPU, memory, disk space etc. should be - it won't need much but I don't know what the normal lowest for prod is
[15:12:56] volans, how's the certcentral change?
[15:20:51] done
[15:24:37] I'm in class now but I'll test stuff later
[15:24:43] ack
[15:33:58] Krenair: something like netmon instances feels right, 1 vCPU, 2Gb of RAM and 20Gb of disk
[15:35:05] vgutierrez: see also debmonitor/puppetboard for a comparison
[15:36:27] pretty similar... but 10Gb of disk
[15:39:50] we shouldn't have big storage requirements, just a few logs
[15:40:56] up to you I guess
[15:41:02] it can also be expanded later if needed
[16:31:29] bblack, vgutierrez: have you seen or have thoughts on https://github.com/AnalogJ/lexicon ?
[16:31:37] funny how it explicitly calls out LE ;)
[16:33:25] looks pretty cool, hadn't seen it
[16:33:46] would be nice to have some built-in integration (or at least example script) for certcentral->lexicon for others to use.
[16:34:14] nod
[16:36:18] whatever others there may be heh
[16:37:14] I feel like the answer to these kinds of questions for ~90% of the world now is basically: I don't care, I'm just going to hand the keys for my example.com domain to $cloudprovider and let them sort it out. Hopefully there's a checkbox on their UI for "provision TLS automagically".
[16:47:34] they don't yet list openstack designate
[16:48:40] might be a nice to have along with our gdnsd script (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/459809/5/modules/certcentral/files/dns-sync.py) and the yet-to-be-written designate one
[16:49:30] they do standard DDNS
[16:49:46] they don't have many local servers listed in general, mostly cloud APIs or whatever
[16:50:09] they do have Knot
[16:50:19] I would think openstack designate would like to be counted as a cloud API :)
[16:50:37] well, in a different sense
[16:50:48] the others they list are actual entities, not the software the entities happen to run
[16:52:36] yeah well all the major cloud providers run their own proprietary stuff
[22:28:52] bblack, XioNoX: should we repool ulsfo even without redundancy restored (or rather, with a tunnel being our redundancy)?
[22:29:50] I leave that call to bblack :)
[22:30:20] We now have a librenms alert if traffic start using the tunnel, so we can depool fast-ish if that happen
[22:40:51] zayo is still trying to draft proper SOFs, which we'll then have to take to Finance and Legal etc., and after we sign they'll have to execute
[22:40:58] I'd say the ETA for all that is 1 week at best
[22:41:05] probably more