[09:35:51] vgutierrez: hello!
[09:36:04] morning onimisionipe
[09:36:20] what can I do for you in this lovely Monday?
[09:37:05] Good morning vgutierrez !
[09:38:16] I took another approach with that nginx patch
[09:38:28] https://gerrit.wikimedia.org/r/c/operations/puppet/+/491972
[09:38:45] sure.. I'll check it after (non)-breaking icinga
[09:38:52] so I noticed there's support for lua already, a flag
[09:39:09] so I used it and provide support for any version of nginx
[09:39:12] thanks!
[09:50:41] I'm done with icinga.. let me check that CR onimisionipe
[09:50:54] alright. cool!
[10:02:17] onimisionipe: I've replied in the CR, it needs some work IMHO
[10:02:34] sorry for being a PITA /o\
[10:04:36] vgutierrez: No p!. Thanks! :)
[10:35:38] HTTPS, Traffic, Operations: Make sure that services available for NDA-only users are using strong TLS ciphersuites - https://phabricator.wikimedia.org/T217002 (Vgutierrez)
[10:35:50] HTTPS, Traffic, Operations: Make sure that services available for NDA-only users are using strong TLS ciphersuites - https://phabricator.wikimedia.org/T217002 (Vgutierrez) p:Triage→Normal
[12:09:26] Acme-chief, Patch-For-Review: Rename the Certcentral project to Acme-chief - https://phabricator.wikimedia.org/T207389 (Vgutierrez)
[12:10:13] Acme-chief, Patch-For-Review: Rename the Certcentral project to Acme-chief - https://phabricator.wikimedia.org/T207389 (Vgutierrez) At this point all the former clients of certcentral are using the new acme-chief code & servers to fetch their certificates
[12:10:47] \o/
[12:13:48] \o/
[12:14:07] \o/
[13:50:19] hello!
[13:50:21] again
[13:50:24] :)
[13:51:27] I need help with generating SSL certificate for cloudelastic.svc.eqiad.wmnet
[13:54:39] vgutierrez: ^ :)
[13:57:46] right.. so far we are using the puppet CA for that kind of stuff IIRC
[14:00:18] OK
[14:01:27] I noticed something with the way Guillaume setup relforge.svc.eqiad.wmnet( Guillaume is on holiday), I noticed there's no DNS record for relforge.svc.eqiad.wmnet but there's a cert.
[14:01:46] I tried ICMP from relforge host and no record found
[14:02:26] yeah there doesn't need to be a DNS record for such a cert to exist
[14:02:32] I'm guessing its a dead DNS with valid cert cos cirrus.pp requires TLS for elasticsearch.
[14:02:41] Krenair: Thanks!
[14:03:01] can I get something like that for cloudelastic.svc.eqiad.wmnet?
[14:03:16] I'm not sure it's good practice but
[14:03:29] theoretically you can generate a cert for any name and sign it with the puppet ca
[14:03:35] Ok
[14:03:54] I don't have access on any of the puppet masters
[14:04:02] so I need help :)
[14:04:04] oh
[14:04:07] you probably can't do it then
[14:04:11] was going to say I think it's https://wikitech.wikimedia.org/wiki/Cergen
[14:05:03] yea I can't
[14:06:15] Krenair: can you help?
[14:06:54] no
[14:07:58] vgutierrez might be able to
[14:09:07] Ok. thanks!
[14:09:19] onimisionipe: why not have a cert per cloudelastic machine instead of this fake svc hostname?
[14:09:45] presumably he actually wants to be able to connect to any of them using a single name?
[14:09:56] we don't have lvs there
[14:10:01] dcausse: that might be a good idea
[14:10:17] huh, why is it in svc.eqiad.wmnet?
[14:10:34] I'm not sure why relforge was setup like that, perhaps we thought that lvs was ok but then we forgot to clean this up?
[14:10:45] Krenair: Its a fake name
[14:10:51] for cloudelastic
[14:11:23] dcausse: so we can have certs for cloudelastic100[1-4].eqiad.wmnet
[14:11:41] onimisionipe: I'll get back to you after lunch :)
[14:12:18] vgutierrez: No p! Thanks!
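Editorial note on the 14:02:26 remark that a certificate can exist for a name with no DNS record: certificate name checking is separate from name resolution. The sketch below is only an illustration using Python's standard `ssl` module; the function names, the address `192.0.2.10`, the name `service.example.internal`, and the CA bundle path are hypothetical and do not come from the log.

```python
import socket
import ssl


def build_tls_context(ca_file=None):
    """Return a client context that verifies the chain and the hostname."""
    # create_default_context() enables CERT_REQUIRED and check_hostname.
    return ssl.create_default_context(cafile=ca_file)


def fetch_peer_cert(ip, port, cert_name, ca_file=None, timeout=5.0):
    """Connect to ip:port and validate the certificate against cert_name.

    cert_name is never resolved through DNS here: the TCP connection goes
    straight to `ip`, and `cert_name` is used only for SNI and for matching
    the names listed in the server certificate.
    """
    context = build_tls_context(ca_file)
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=cert_name) as tls:
            return tls.getpeercert()


# Hypothetical usage (placeholder address, name, and CA bundle path):
# cert = fetch_peer_cert("192.0.2.10", 9243, "service.example.internal",
#                        ca_file="/path/to/internal-ca.pem")
```

The handshake fails with `ssl.SSLCertVerificationError` if the certificate does not list `cert_name` or is not signed by a CA in the trust store, regardless of whether the name has any DNS record.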
[14:13:00] cloudelastic100[1-4].wikimedia.org if I read site.pp correctly
[14:13:21] dcausse: but looking at puppet again, cert name works on per cluster basis
[14:15:06] onimisionipe, dcausse: for the simple use case utils/create_ecdsa_cert inside the puppet module is more than enough. In all cases it requires sudo on the puppetmasters to add the private key into the private repo
[14:16:15] s/puppet module/puppet repo/
[14:16:25] volans: thanks! but neither mat nor myself can do this :(
[14:16:49] do you have a task?
[14:17:08] volans: https://phabricator.wikimedia.org/T214921
[14:18:51] dcausse: seems we might have to go the relforge-like way in terms of cert or we create ddns for cloudelastic with all the cloudelastic nodes
[14:19:43] ah right guillaume is on vacation this week, sorry had forgot that
[14:20:06] and there's no point creating ddns or discovery if we are not using lvs
[14:20:30] if you don't find anyone before ping me tomorrow and I can probably have a look, I'm totally lacking context to give proper suggestions as of now ;)
[14:20:55] volans: alright. thanks!
[14:21:36] dcausse: I think profile's cirrus.pp needs some refactoring to make TLS optional
[14:21:37] both names seem to work for relforge:
[14:21:45] curl https://relforge1002.eqiad.wmnet:9243/ --resolve relforge1002.eqiad.wmnet:9243:10.64.37.21
[14:21:47] curl https://relforge.svc.eqiad.wmnet:9243/ --resolve relforge.svc.eqiad.wmnet:9243:10.64.37.21
[14:22:30] I think we need to make sure that https://cloudelastic1001.wikimedia.org:9243/ will work
[14:22:45] dcausse: from where are you doing that pls?
[14:23:17] onimisionipe: any node that can access relforge, so you can do this from relforge100[12] or mwmaint1002
[14:24:46] `curl https://relforge.svc.eqiad.wmnet:9243` does not work from relforge1002
[14:25:23] onimisionipe: with --resolve
[14:26:25] Ok
[14:27:00] so we need to make sure https://cloudelastic1001.wikimedia.org:9243/ work. It should after vgutierrez helps generate the cert
[14:28:06] dcausse: so we generating cloudelastic.svc.wikimedia.org or cloudelastic.wikimedia.org?
[14:29:27] onimisionipe: I have no clue I prefer valentin to ponder, this cert thing is way out my depth
[14:29:52] me too :)
[14:33:51] hmmm a few minutes ago you were discussing cloudelastic.svc.eqiad.wmnet
[14:34:04] now we're talking about something in wikimedia.org
[14:36:37] vgutierrez: yes machines are cloudelastic100[1-4].wikimedia.org
[14:37:57] so which kind of traffic is going to be handled by those? cause usually user-faced https traffic is handled by our cp nodes
[14:38:18] and those are in charge of terminating the TLS sessions
[14:38:19] onimisionipe: why is it wikimedia.org?
[14:38:40] yeah.. why those require public IPs?
[14:38:45] and they aren't .eqiad.wmnet?
[14:39:10] I'm not saying that the current setup is wrong, I just lack context and therefore I've questions :)
[14:39:44] vgutierrez: it's a service that needs to be accessible from toolslabs and cloud VPS
[14:40:33] but not meant to be exposed publicly
[14:41:15] yep
[14:41:22] so.. at some point in T194186 they went from .eqiad.wmnet to .wikimedia.org
[14:41:22] T194186: rack/setup/install cloudelastic100[1-4].eqiad.wmnet systems - https://phabricator.wikimedia.org/T194186
[14:43:04] right.. here (https://phabricator.wikimedia.org/T194186#4239822) chasemp suggests that the servers should sit in the public vlan, hence the public IPs and the .wikimedia.org domain
[14:43:18] yes
[14:44:06] public + firewall
[14:44:15] of course
[14:44:27] so what's going to do the traffic balancing before those 4 nodes?
[14:44:28] hm.. so now we have a name problem I suppose?
[14:44:43] s/before/in front of/
[14:44:58] vgutierrez: client themselves are usually able to roundrobin (most elastic clients support this)
[14:45:06] yes!
[14:45:09] you pass the array of hosts
[14:45:19] sounds like you do not want a svc hostname then
[14:45:39] if that's all taken care of by clients behaving themselves
[14:45:42] then you are going to configure 4 hosts and not one, I don't see the need for a svc hostname as Krenair is pointing out
[14:45:59] though exposing to labs, don't know if that's a particularly safe assumption - also don't know if it matters or not
[14:46:05] svc hostname is 'I think' a limitation of the current code
[14:46:13] puppet code
[14:46:32] cloudelastic.wikimedia.org?
[14:46:35] so you'll have one svc record pointing to the 4 IPs?
[14:46:37] at least they'll be dedicated to labs so any problems only affect other labs users
[14:47:01] I'm not even sure they need to point to any IP for now
[14:47:14] well... that's how DNS work
[14:47:20] if we are going the relforge-like way
[14:47:34] vgutierrez: nobody will do curl cloudelastic.svc.wikimedia.org
[14:47:48] right
[14:47:49] relforge.svc.eqiad.wmnet does not point to any IP.. but the cert is valid
[14:47:55] we need something similar
[14:48:15] clients can access elastic via cloudelastic100[1-4].wikimedia.org
[14:49:18] so.. in the relforge scenario, what does the translation between relforge.svc.eqiad.wmnet and the actual servers behind that name?
[14:49:36] so why is a single shared cert needed at all?
[14:49:36] vgutierrez: relforge100[1-2].eqiad.wmnet
[14:49:51] vgutierrez, yeah but he means what does that transformation exactly
[14:49:53] uh
[14:49:56] dcausse, ^
[14:49:59] indeed, that's what I mean
[14:50:10] rather than what is the result of the translation
[14:50:15] cause the client is connecting to relforge.svc.eqiad.wmnet, so something needs to translate that to the real servers
[14:50:18] Krenair: perhaps it's not needed that it's a valid, it's needed inside a var in the puppet code
[14:50:22] so
[14:50:27] "valid cert" I mean
[14:50:27] fix the puppet code to not make poor assumptions?
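Editorial note on the 14:44:58 and 14:45:09 messages about clients round-robining over an array of hosts: the idea can be sketched in a few lines of Python. This is a toy illustration, not the API of any real Elasticsearch client; the class `RoundRobinHosts` and its method are hypothetical, while the host names and port 9243 are the ones mentioned in the log. Real clients typically add failure detection and retries on top of this.

```python
import itertools


class RoundRobinHosts:
    """Hypothetical helper that cycles through a fixed list of hosts."""

    def __init__(self, hosts, port=9243):
        if not hosts:
            raise ValueError("at least one host is required")
        self._port = port
        # itertools.cycle repeats the host list endlessly, in order.
        self._cycle = itertools.cycle(list(hosts))

    def next_url(self, path="/"):
        """Return the URL on the next host in the rotation."""
        host = next(self._cycle)
        return f"https://{host}:{self._port}{path}"


# The four per-host names discussed in the log, no shared svc hostname.
hosts = [f"cloudelastic100{n}.wikimedia.org" for n in range(1, 5)]
pool = RoundRobinHosts(hosts)
urls = [pool.next_url() for _ in range(5)]
```

With this approach each request is made against a real host name, so a certificate per host is enough and no shared service name has to appear in any certificate.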
[14:50:48] this will have to be done for sure
[14:50:53] yea
[14:50:56] it will
[14:51:08] so... puppet code aside
[14:51:43] perhaps nginx validates the cert name with its servername, perhaps it's the reason?
[14:51:45] if the clients are internal ones and they already trust our PuppetCA, the puppet CA issued certificate for each cloudelastic100[1-4].wikimedia.org should suffice
[14:52:01] why would nginx require a svc hostname?
[14:52:23] oh
[14:52:24] point
[14:52:30] Cloud VPS hosts will not trust the prod puppet CA
[14:52:39] if even being internal ones they don't trust our Puppet CA, the easiest way should deploy a certificate for the required wikimedia.org hostname using acme-chief
[14:52:46] in fact
[14:53:03] if these are being given public hostnames, IPs etc.
[14:53:07] you may want LE
[14:53:16] that's what I just suggested :)
[14:53:34] right
[14:54:03] and uh, by 'hosts' above I mean instances
[14:54:15] the hypervisors would but that's not what your clients will be
[14:55:22] LE?
[14:55:27] Let's Encrypt
[14:55:49] ok
[14:56:22] this route will require actually creating a real DNS name?
[14:56:34] no, don't think so
[14:56:36] so to sum: we'd need "official" certs for cloudelastic100[1-4].wikimedia.org, and we forget about this svc hostname
[14:56:45] yeah
[14:56:47] something like https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/acme_chief.yaml#L61-L68
[14:56:59] TXT records would get created at the name during the LE verification process but I don't think any A record would be necessary, the way we do it
[14:57:06] that's how it's done for mx[12]001.wikimedia.org
[14:57:29] Ok then
[14:58:18] let's do it!
[14:58:29] hopefully your hosts which are cloud-facing are still able to talk to the acme-chief servers internally
[14:58:53] well.. they're public hosts if they're sitting in public vlans with public IPs
[15:00:04] been a long while since I was on a prod machine, especially one with a public IP, but I vaguely recall those hosts being able to talk to private prod IPs?
[15:01:57] one working example is librenms/netbox
[15:02:11] those two services are being handled by acme-chief in terms of TLS certificates
[15:02:28] and they're sitting in servers with public IPs netmon1002.wikimedia.org and netmon2001.wikimedia.org
[15:02:29] ok
[15:02:36] nice
[15:08:40] HTTPS, Traffic, Operations, Patch-For-Review: Make sure that services available for NDA-only users are using strong TLS ciphersuites - https://phabricator.wikimedia.org/T217002 (Vgutierrez)
[15:13:19] I realised that probably most certs we've got will be for client servers in this situation
[15:13:21] with public IPs
[15:13:23] only difference is your hosts are cloud-facing and I don't know what crazy firewall rules that implies
[16:54:41] HTTPS, Traffic, Operations, Patch-For-Review: Make sure that services available for NDA-only users are using strong TLS ciphersuites - https://phabricator.wikimedia.org/T217002 (Vgutierrez)
[17:48:28] Acme-chief, Patch-For-Review: Rename the Certcentral project to Acme-chief - https://phabricator.wikimedia.org/T207389 (Vgutierrez)
[17:53:00] netops, Operations, ops-eqiad: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (Cmjohnson) @ayounsi I want to do all the server moves on Thursday this week. Can you ask the service owners to have everything depooled. I will get started at 1500 UTC. The server move will t...
[17:54:33] https://www.zdnet.com/article/surveillance-firm-asks-mozilla-to-be-included-in-firefoxs-certificate-whitelist/
[17:58:08] netops, Operations, ops-eqiad: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (fgiunchedi)
[17:58:45] Acme-chief, Patch-For-Review: Rename the Certcentral project to Acme-chief - https://phabricator.wikimedia.org/T207389 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by vgutierrez@cumin1001 for hosts: `certcentral2001.codfw.wmnet,certcentral1001.eqiad.wmnet` - certcentral2001.codfw.wmnet...
[18:06:14] Acme-chief, Patch-For-Review: Rename the Certcentral project to Acme-chief - https://phabricator.wikimedia.org/T207389 (Vgutierrez)
[18:32:19] Traffic, Elasticsearch, Operations, Discovery-Search (Current work), Patch-For-Review: Enable nginx prometheus metrics for all elastic nodes - https://phabricator.wikimedia.org/T216681 (Mathew.onipe)
[19:12:38] Traffic, Operations, ops-eqsin: cp5007 correctable mem errors - https://phabricator.wikimedia.org/T216716 (RobH) As of 2019-02-25 @ 19:12 there are no memory errors logged post dimm slot swap.
[19:12:41] Traffic, Operations, ops-eqsin: cp5006 correctable mem errors - https://phabricator.wikimedia.org/T216717 (RobH) As of 2019-02-25 @ 19:12 there are no memory errors logged post dimm slot swap.
[19:46:24] robh: we should keep XioNoX around SG
[19:46:54] I suspect he may not appreciate it after about a month.
[19:46:58] the servers are happier with him around
[19:49:06] i hesitate to use words like totalitarian or authoritarian
[19:49:10] but singapore seems like one of the two
[19:49:31] but i wouldnt wanna live there due to that ;D
[19:49:59] the former is harsh though cuz you can be singaporian and leave without issue
[19:50:07] so meh.
[19:50:20] but the food... no matter what anyone can likely spend a month there just eating and happy ;D
[19:51:17] netops, Operations, ops-eqiad, ops-eqsin, Patch-For-Review: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (ayounsi)
[19:51:26] oh, im wrong
[19:51:29] wikipedia itself tells me
[19:51:29] https://en.wikipedia.org/wiki/Democracy_Index
[19:51:38] its merely flawed demoncracy, much like united states ;]
[19:51:43] * robh eats his words
[19:52:38] democracy even, that typo seems like a pun and it wasnt. ;D
[19:54:43] Singapore is too warm and humid for me
[19:55:02] it reminded me of florida ;D
[19:55:07] in terms of weather
[19:55:23] XioNoX: did you go back and get minced meat noodles?
[19:55:28] SG is a tricky case. their democracy is much more-deeply flawed than the US. But their authoritative regime happens to mostly offer tradeoffs that most people there like.
[19:55:30] or was your trip wasted?!?
[19:55:38] it kind of works out nice in practice, but in theory it's pretty awful :)
[19:55:48] robh: they didn't have it anymore :(
[19:55:59] but yeah I went and had something similar
[19:56:19] XioNoX: you sir have my deepest sympathies and condolences.
[19:56:22] ;D
[19:56:24] haha
[19:56:38] had some amazing indian food though
[20:17:55] Traffic, ExternalGuidance, Operations, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (BBlack) The VCL looks good, please give us some notice (~24h would be ideal?) on when you n...
[22:11:10] Traffic, Wikimedia-Apache-configuration, Operations, VisualEditor: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200/Loading failed for the