[00:08:58] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2045 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2045 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[00:46:52] netops, Traffic, Infrastructure-Foundations, SRE, Wikimedia-Incident: 503 Service Unavailable No server is available to handle this request. - https://phabricator.wikimedia.org/T418392#11652803 (AlexisJazz) Open→Resolved
[02:03:58] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2044 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2044 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[02:43:43] FIRING: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:48:43] FIRING: [31x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:53:43] FIRING: [31x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:58:43] RESOLVED: [31x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[03:40:56] netops, Infrastructure-Foundations, SRE: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11652966 (Papaul) @ayounsi The new diagram is now on Wikitech. For the Wikimedia Amsterdam DCs, IP layer do you want for us to update it or just delete it.
[03:59:26] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11652987 (BrokenImages1234)
[04:01:34] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11652989 (BrokenImages1234) No idea how any of this helps, but I edited it. The issue is non-reproducible, it occurs randomly.
[04:08:58] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2045 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2045 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[05:00:52] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, SRE: EQSIN:New switch setup/configuration - https://phabricator.wikimedia.org/T418439 (Papaul) NEW
[06:03:58] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2044 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2044 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[06:20:41] netops, Infrastructure-Foundations, SRE: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11653051 (ayounsi) Updating them would be great. Thanks
[07:14:03] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11653134 (Aklapper) > Browser: Tor, Firefox, Brave Please provide web browser version(s) that you tested. > See that some of the images are broken What is shown in the Network tab of your browser's developer tools in this case?
[08:08:58] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2045 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2045 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[08:30:43] FIRING: [23x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:35:43] FIRING: [47x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:45:43] RESOLVED: [47x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[09:56:50] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11653480 (BrokenImages1234) >Please provide web browser version(s) that you tested. Have just tested it again in Tor v15.0.7. Hard-refreshed the page three times in a row, and it was always different images that failed to load...
[10:03:58] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2044 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2044 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[10:13:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2044 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2044 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[10:13:43] RESOLVED: [3x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp2043 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[14:55:00] Traffic: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11654762 (Vgutierrez)
[15:40:47] * Assigning GUIDs to configuration objects: The new guid directive available in frontend, backend, and listen sections lets you assign a unique identifier to that section. The server directive also gained a guid argument. For now, the main use is for persisting stats after a reload, since only stats associated with objects having a GUID can be restored.
[15:41:01] haproxy 3.0 feature 👀
[15:41:50] yeah.. persistent stats are pretty cool but need some support from our side
[15:47:44] native json logging support also
[15:52:20] yeah.. I already mentioned that to fabfur
[15:52:37] native JSON logging would save a chunk of work to haproxykafka
[15:52:55] let haproxykafka do its fair share of work! :)
[15:52:58] (joking)
[15:53:13] yeah.. the kafka bits :)
[16:16:43] FIRING: [22x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[16:21:43] FIRING: [47x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[16:41:43] FIRING: [47x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[16:46:43] RESOLVED: [47x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[17:12:52] Wikimedia-Apache-configuration, ServiceOps new, SRE, Wikibase GraphQL, and 2 others: Create a rewrite for the GraphQL endpoint on wikidata.org - https://phabricator.wikimedia.org/T417026#11655662 (Blake)
[18:12:57] cdanis: OK to enable puppet on cp3070?
[18:13:14] oops!
[18:13:16] sorry, yes
[18:13:19] ok! doing so
[19:16:20] Traffic, DC-Ops: Network drop errors with new codfw cp hosts - https://phabricator.wikimedia.org/T418527 (BCornwall) NEW
[19:18:15] Traffic, DC-Ops, ops-codfw: Network drop errors with new codfw cp hosts - https://phabricator.wikimedia.org/T418527#11656209 (ssingh)
[19:21:48] Traffic, DC-Ops, ops-codfw: Network drop errors with new codfw cp hosts - https://phabricator.wikimedia.org/T418527#11656222 (BCornwall)
[19:22:01] Traffic, DC-Ops, ops-codfw: Network drop errors with new codfw cp hosts - https://phabricator.wikimedia.org/T418527#11656224 (BCornwall)
[19:31:37] Traffic, DC-Ops, ops-codfw, SRE: Network drop errors with new codfw cp hosts - https://phabricator.wikimedia.org/T418527#11656259 (BCornwall) FWIW `perf` has the majority of dropped packets as `QUEUE_PURGE` (and `NOT_SPECIFIED`)
[20:09:48] Traffic, DC-Ops, ops-codfw, SRE: Network drop errors with new codfw cp hosts - https://phabricator.wikimedia.org/T418527#11656360 (BCornwall)
[20:33:07] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656449 (BCornwall)
[20:38:54] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656457 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2046.codfw.wmnet with OS trixie
[20:40:06] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11656458 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2047.codfw.wmnet with OS trixie
[20:51:52] Traffic: Redirect techblog.wikimedia.org to diff.wikimedia.org - https://phabricator.wikimedia.org/T417940#11656500 (TBurmeister)
[22:06:20] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11656688 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2047.codfw.wmnet with OS trixie completed: - cp2047 (**PASS**) - Removed from Puppet and PuppetDB if present and d...
[22:09:12] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656708 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2046.codfw.wmnet with OS trixie completed: - cp2046 (**PASS**) - Removed from Puppet and PuppetDB if present and del...
[22:09:54] \o/
[22:13:50] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656717 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2048.codfw.wmnet with OS trixie
[22:14:06] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656718 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2049.codfw.wmnet with OS trixie
[22:14:13] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656719 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2050.codfw.wmnet with OS trixie
[22:14:20] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656720 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2051.codfw.wmnet with OS trixie
[22:15:07] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656721 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2052.codfw.wmnet with OS trixie
[22:15:08] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656722 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2053.codfw.wmnet with OS trixie
[22:15:55] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656723 (CDobbins)
[22:48:47] Traffic, API Platform, MediaWiki-User-login-and-signup, MediaWiki-Platform-Team (Q3 Kanban Board), and 2 others: Login with `action=login` and bot password does not create a JWT session cookie - https://phabricator.wikimedia.org/T415007#11656760 (Tgr)
[22:49:17] Traffic, API Platform, MediaWiki-User-login-and-signup, MediaWiki-Platform-Team (Q3 Kanban Board), and 2 others: Login with `action=login` and bot password does not create a JWT session cookie - https://phabricator.wikimedia.org/T415007#11656762 (Tgr) Deployed to production. Follow-up task (not...
[22:52:40] FIRING: VarnishPrometheusExporterDown: Varnish Exporter on instance cp2051:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[22:52:43] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp2051 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=codfw&var-instance=cp2051 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[22:54:58] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656766 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2048.codfw.wmnet with OS trixie completed: - cp2048 (**PASS**) - Removed from Puppet and PuppetDB if present and delete...
[22:57:07] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656767 (BCornwall)
[22:58:29] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656770 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2052.codfw.wmnet with OS trixie completed: - cp2052 (**PASS**) - Removed from Puppet and PuppetDB if present and delete...
[22:59:03] Traffic, API Platform, MediaWiki-User-login-and-signup, MediaWiki-Platform-Team (Q3 Kanban Board), and 2 others: Login with `action=login` and bot password does not create a JWT session cookie - https://phabricator.wikimedia.org/T415007#11656777 (Tgr) As a nice side effect, this tanked the "Sessi...
[23:04:30] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656797 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2053.codfw.wmnet with OS trixie completed: - cp2053 (**PASS**) - Removed from Puppet and PuppetDB if present and delete...
[23:09:20] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656800 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2050.codfw.wmnet with OS trixie completed: - cp2050 (**PASS**) - Removed from Puppet and PuppetDB if present and delete...
[23:10:48] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656801 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2049.codfw.wmnet with OS trixie completed: - cp2049 (**PASS**) - Removed from Puppet and PuppetDB if present and delete...
[23:14:38] Traffic: Reimage cp20[43-58] to Trixie - https://phabricator.wikimedia.org/T418161#11656814 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2051.codfw.wmnet with OS trixie completed: - cp2051 (**PASS**) - Removed from Puppet and PuppetDB if present and delete...