[00:05:39] netops, Cloud-VPS, Operations, cloud-services-team (Kanban): dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour - https://phabricator.wikimedia.org/T174596 (bd808)
[00:51:03] netops, Operations, fundraising-tech-ops: Qualys scans causing problematic pfw logspam - https://phabricator.wikimedia.org/T206431 (cwdent)
[02:54:46] HTTPS, Traffic, Operations, fundraising-tech-ops: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Liuxinyu970226) @krenair please, no more DV certs, that's the reason why jawiki, ugwiki, wuuwiki, zhwiki, zh-yuewiki and zhwikinews are SNI RSTed by...
[03:00:12] HTTPS, Traffic, Operations, Upstream: Enable ESNI support on Wikimedia servers - https://phabricator.wikimedia.org/T205378 (Liuxinyu970226)
[03:00:24] Traffic, Operations, Patch-For-Review: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (Liuxinyu970226)
[11:13:31] Traffic, Operations, vm-requests, Patch-For-Review: Create VMs for certcentral hosts - https://phabricator.wikimedia.org/T206308 (Vgutierrez) Open>Resolved VMs delivered, added in puppet as spare systems till certcentral puppetization is ready to go
[11:13:32] Traffic, Operations, Goal, Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (Vgutierrez)
[12:13:44] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Dereckson) >>! In T191183#4647075, @Krinkle wrote: > Gerrit wants 100x100px square thumbnails. The 100x100 size isn't what currently happens in the repository: ``` $ identify *....
[12:49:26] paravoid, hi
[12:56:23] netops, Operations: connectivity issues between several hosts on asw2-b-eqiad - https://phabricator.wikimedia.org/T201039 (ema) cp1081 and cp1079, both on asw2-b-eqiad, are also having IPv6 connectivity issues with lvs1001: ``` 12:53:09 ema@lvs1001.wikimedia.org:~ $ curl http://localhost:9090/pools/tex...
[13:23:03] vgutierrez, so what are we going to do about regr.json?
[13:24:28] vgutierrez, those files... don't contain anything private do they?
[13:24:48] nope AFAIK
[13:25:00] actually certbot publish them for their test environments
[13:25:09] my testing ones are just like {"uri": "https://acme-v02.api.letsencrypt.org/acme/acct/$INTEGER_HERE", "body": {}}
[13:25:17] i.e: https://github.com/certbot/certbot/blob/master/tests/letstest/testdata/sample-config/accounts/acme-staging.api.letsencrypt.org/directory/48d6b9e8d767eccf7e4d877d6ffa81e3/regr.json
[13:25:47] but ours has an empty body for some reason
[13:26:12] you sure that key part of the body isn't supposed to be private?
[13:27:01] ours don't have the private key embedded
[13:27:04] as you just said
[13:27:21] I picked a pretty bad example
[13:27:26] they're releasing the private key as well
[13:27:28] https://github.com/certbot/certbot/tree/master/tests/letstest/testdata/sample-config/accounts/acme-staging.api.letsencrypt.org/directory/48d6b9e8d767eccf7e4d877d6ffa81e3
[13:27:44] presumably theirs is purely garbage test data?
[13:27:52] I think so
[13:28:03] okiay
[13:28:04] okay*
[13:28:18] so we just need to deploy regr.json from puppet and private_key.pem from puppet-private
[13:28:37] ack.. I'm going to generate right now the one for the ACME v2 staging API
[13:32:27] to put on the prod servers?
[13:33:29] we can roll out the first tests against the staging API as bblack suggested
[13:34:04] and then switch the ACME account
[13:34:33] ok
[13:34:54] do you want to fix the puppet commit to handle accounts or shall I?
[13:35:18] go for it please
[13:39:45] hmmm the account id that we use it's completely safe to publish right?
[13:40:02] cause it's basically the MD5 of the public key --> self.account_id = hashlib.md5(self.key.public_pem).hexdigest()
[13:40:08] so no harm there
[13:43:27] Traffic, Operations: Provide a Let's Encrypt ACME v2 staging environment account - https://phabricator.wikimedia.org/T206461 (Vgutierrez) p:Triage>Normal
[13:44:36] ema: just restarted confd on cp1* and cp3* to pick up the new SRV records, all good afaics
[13:44:48] k
[13:52:18] vgutierrez, https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/441991/39..40/modules/certcentral/manifests/central.pp
[13:56:53] vgutierrez, I think https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/459809/6/modules/profile/manifests/authdns/certcentral_target.pp also needs a ferm::service ?
[14:09:39] Krenair: for 22/tcp? indeed
[14:20:57] okay
[14:20:59] well we can come back to that after the first commit is done
[14:21:14] hmmm I think it's my first time uploading a new secret, https://gerrit.wikimedia.org/r/#/c/labs/private/+/465171/
[14:21:36] that directory layout makes sense to you Krenair, ema?
[14:21:57] yes I think so
[14:22:07] assuming 6e01c693ed6e9d9a6b5930923ecef104 will be the prod account ID
[14:22:21] yup, in the staging environment :)
[14:23:47] ok
[14:30:48] vgutierrez: lgtm!
[14:39:02] Traffic, Operations, Performance-Team, Patch-For-Review, Wikimedia-Incident: Collect Backend-Timing in Prometheus - https://phabricator.wikimedia.org/T131894 (ema) We now expose the following metrics to Prometheus: ``` 14:35:31 ema@cp2004.codfw.wmnet:~ $ curl -s http://localhost:3904/metrics...
[14:50:46] _joe_: can we now re-enable puppet on lvs1005?
[14:51:05] <_joe_> ema: yes, you'll also need to restart pybal though
[14:51:19] <_joe_> else you'll get an alert for etcd
[14:52:16] ah, and pybal isn't running on lvs1005's buddy, lvs1002
[14:52:30] <_joe_> that's why I disabled puppet there. I would say you should probably do it when you reenable pybal on lvs1002
[14:52:47] that makes sense, yes
[14:56:25] Krenair: so.. should I commit the regr.json now, or should I wait till the current puppet CR gets merged?
[14:56:58] Traffic, Operations, Patch-For-Review: Provide a Let's Encrypt ACME v2 staging environment account - https://phabricator.wikimedia.org/T206461 (Vgutierrez) private key committed into our private repo.
[14:57:19] Traffic, Operations: Provide a Let's Encrypt ACME v2 staging environment account - https://phabricator.wikimedia.org/T206461 (Vgutierrez)
[15:18:22] Traffic, Operations, Patch-For-Review: Traffic Server - Prometheus integration - https://phabricator.wikimedia.org/T202381 (ema)
[15:18:43] Traffic, Operations, Patch-For-Review: Traffic Server - Prometheus integration - https://phabricator.wikimedia.org/T202381 (ema) Open>Resolved
[15:42:23] Traffic, Operations, Performance-Team, Patch-For-Review, Wikimedia-Incident: Collect Backend-Timing in Prometheus - https://phabricator.wikimedia.org/T131894 (Gilles) https://grafana.wikimedia.org/dashboard/db/apache-backend-timing getting something started there...
[16:27:00] vgutierrez, amend it into the commit as hiera data
[16:27:02] or merge first and make a new commit for that
[16:27:04] either way
[16:27:49] netops, Operations, Patch-For-Review: connectivity issues between several hosts on asw2-b-eqiad - https://phabricator.wikimedia.org/T201039 (ayounsi) Working with JTAC on this. Here is a tcpdump capture of a neighbor solicitation packet being sent from lvs1002: ``` lvs1002:~$ sudo tcpdump -p -i eth1...
[16:45:32] Traffic, Analytics, Analytics-Wikistats, Operations, Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281 (mforns) @BBlack ping, bumping this up
[16:57:08] * elukey off!
[16:58:40] netops, Operations, Patch-For-Review: connectivity issues between several hosts on asw2-b-eqiad - https://phabricator.wikimedia.org/T201039 (ayounsi) Followed up with JTAC, we can see the NS packets making it into the fabric: ``` # run show firewall Filter: v6-ns-lvs1002-ge-6/0/46.0-i...
[18:24:45] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Krinkle) >>! In T191183#4648126, @Paladox wrote: > [..] using phabricator would not work seeing as the name of the file does not match the users username. Can you explain what you...
[19:04:08] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Paladox) @Krinkle would you know how to build this? Seeing as you can link your mediawiki profile or your ldap account, you could have a different name.
[19:06:29] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Krinkle) >>! In T191183#4650347, @Paladox wrote: > @Krinkle would you know how to build this? Seeing as you can link your mediawiki profile or your ldap account, you could have a d...
[19:13:56] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Paladox) Ok, but would you know how to write this cgi script?
[19:44:04] netops, Operations, Patch-For-Review: connectivity issues between several hosts on asw2-b-eqiad - https://phabricator.wikimedia.org/T201039 (ayounsi) Temporarily disable IGMP snooping on the interfaces to narrow down the issue. ```lang=diff [edit protocols igmp-snooping vlan all] + interface ge-6...
[20:10:35] netops, Operations, Patch-For-Review: connectivity issues between several hosts on asw2-b-eqiad - https://phabricator.wikimedia.org/T201039 (ayounsi) >>! In T201039#4649384, @ema wrote: > cp1081 and cp1079, both on asw2-b-eqiad, are also having IPv6 connectivity issues with lvs1001: > I can ping th...
[21:01:54] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Krinkle) Yes, I think anyone involved around this task could do it. It's not a question of how. The question is, what do we want for the user experience, and would it be worth it...
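
Note on the 13:40:02 message: the account ID discussed there is described as the MD5 hex digest of the account's public key in PEM form. A minimal sketch of that derivation follows. Only the `hashlib.md5(...).hexdigest()` expression comes from the log; the function name `account_id_from_public_pem` and the placeholder PEM bytes are hypothetical and are not taken from the certcentral code.

```python
import hashlib


def account_id_from_public_pem(public_pem: bytes) -> str:
    """Return the MD5 hex digest of a PEM-encoded public key.

    Mirrors the expression quoted in the log:
        hashlib.md5(self.key.public_pem).hexdigest()
    The result is derived only from public material, which is why
    publishing it was considered harmless in the discussion.
    """
    return hashlib.md5(public_pem).hexdigest()


# Hypothetical placeholder, NOT a real key: any bytes work for illustration.
EXAMPLE_PEM = b"-----BEGIN PUBLIC KEY-----\nPLACEHOLDER\n-----END PUBLIC KEY-----\n"

account_id = account_id_from_public_pem(EXAMPLE_PEM)
print(account_id)  # a 32-character lowercase hex string
```

The ID is deterministic for a given key, so it should be usable as a stable directory name, which appears consistent with the hex-named account directories linked in the log.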