[08:19:31] serviceops, DC-Ops, ops-eqiad, SRE-OnFire, Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9891677 (kamila) Open→Resolved New NICs seem to be happy including overnight network testing out of sheer paranoia,...
[09:46:33] godog: opinion on enabling statsd for mw-parsoid?
[09:46:43] (today I mean)
[10:34:07] claime: +1 SGTM
[10:34:39] cool, on it then
[10:37:11] If it works well, I have also prepared all the patches for the other namespaces, so we should be able to turn them on Monday
[10:45:39] godog: ok so I'm seeing the statsd-exporter pods up for mw-parsoid
[10:45:52] but it'd seem like it's not working well
[10:46:26] statsd_exporter_lines_total 854
[10:46:28] statsd_exporter_loaded_mappings 0
[10:46:30] statsd_exporter_sample_errors_total{reason="malformed_component"} 732
[10:46:32] statsd_exporter_sample_errors_total{reason="malformed_line"} 122
[10:47:42] mhh ok
[10:47:55] checking
[10:48:25] Ah there's something, the port 9102 responds but not 9125, shows down
[10:48:32] I may be missing a rule somewhere
[10:49:24] Ah no that's the same for mw-debug
[10:50:45] yeah I see at least statsd-exporter metrics for mw-parsoid namespace
[10:51:01] I'm looking at this https://w.wiki/APHo
[10:51:18] so defo metrics are getting scraped
[10:52:02] ah wait, mediawiki wasn't restarted
[10:52:08] I bet that's it
[10:52:33] let me roll-restart the deployment
[10:52:51] kk, other than that and the errors I think everything is in order
[10:55:22] mediawiki_stats_buffered_total{app="statsd-exporter", instance="10.67.154.49:9102", job="k8s-pods", kubernetes_namespace="mw-parsoid", kubernetes_pod_name="statsd-exporter-prometheus-7965d94d49-h2qj2", pod_template_hash="7965d94d49", release="prometheus", routed_via="prometheus"}
[10:55:24] there we go
[10:55:57] sweet
[10:56:19] going to lunch shortly, though ping me here if needed
[10:58:00] yeah looks like it works correctly
[10:58:26] Now to find out why the deployment wasn't restarted
[11:27:54] serviceops: Migrate etcd::tlsproxy Nginx certs and etcd itself to PKI - https://phabricator.wikimedia.org/T352245#9891989 (MoritzMuehlenhoff) It seems there's two parts to this migration: The etcd internal cert which will move to PKI along with the v3 migration, but there's also the cert used for the TLS ter...
[13:18:09] serviceops, Data-Platform-SRE, Wikidata, Wikidata-Query-Service, wmde-wikidata-tech: Request permission to create 4 kafka topics in kafka-main - https://phabricator.wikimedia.org/T367510 (dcausse) NEW
[13:20:38] serviceops, Data-Platform-SRE, Wikidata, Wikidata-Query-Service, wmde-wikidata-tech: Request permission to create 4 kafka topics in kafka-main (WDQS graph split) - https://phabricator.wikimedia.org/T367510#9892397 (dcausse)
[18:09:07] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9894098 (Papaul) @kamila 2003 is ready for.
[18:37:34] serviceops: Alerting on under-scaled deployments - https://phabricator.wikimedia.org/T366932#9894216 (Scott_French) Alert works: Deployment zotero-production in zotero at eqiad has persistently unavailable replicas - Deployment zotero-production in namespace zotero at site eqiad has had non-zero unavailable...
[21:35:44] serviceops: Migrate etcd::tlsproxy Nginx certs and etcd itself to PKI - https://phabricator.wikimedia.org/T352245#9894693 (Scott_French) Thanks, @MoritzMuehlenhoff - you're absolutely right that we could decouple these. The TLS proxy will go away with the v3 migration, since its primary use case will be abs...
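
Side note on the counters pasted between 10:46:26 and 10:46:32: the two error counts (732 + 122) add up to the 854 lines received, and loaded mappings were 0 at that point. A minimal sketch of that arithmetic follows, assuming the four samples are taken exactly as pasted and that comparing sample errors against lines one-to-one is meaningful (a single statsd line can carry more than one sample, so this is only a rough check). The helper names `parse_metrics` and `error_summary` are hypothetical and not part of any tooling mentioned in the channel.

```python
import re

# Counter values as pasted in the channel at 10:46 (copied from the log).
SAMPLE = """\
statsd_exporter_lines_total 854
statsd_exporter_loaded_mappings 0
statsd_exporter_sample_errors_total{reason="malformed_component"} 732
statsd_exporter_sample_errors_total{reason="malformed_line"} 122
"""

# Metric name, optional {labels}, then the sample value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse simple Prometheus text-format samples into (name, labels, value)."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def error_summary(samples):
    """Return (lines_total, errors_by_reason, errors / lines)."""
    total = sum(v for n, _, v in samples if n == "statsd_exporter_lines_total")
    errors = {
        labels.get("reason", "unknown"): value
        for name, labels, value in samples
        if name == "statsd_exporter_sample_errors_total"
    }
    failed = sum(errors.values())
    ratio = failed / total if total else 0.0
    return total, errors, ratio


if __name__ == "__main__":
    total, errors, ratio = error_summary(parse_metrics(SAMPLE))
    print(f"lines={total:.0f} errors={sum(errors.values()):.0f} ratio={ratio:.2f}")
```

Run against the pasted values, the ratio comes out at 1.0, which fits the later observation that MediaWiki had not yet been restarted when those counters were read.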