[06:11:20] !incidents
[06:11:20] 6820 (ACKED) wmf - metamonitoring - thanos - notified - vip is now DOWN
[06:11:21] 6819 (RESOLVED) Manual (paged) by LSobanski (lsobanski@wikimedia.org): Another test page
[06:11:21] 6818 (RESOLVED) Manual (paged) by LSobanski (lsobanski@wikimedia.org): Test page
[06:11:21] 6815 (RESOLVED) [25x] ProbeDown sre (ip4 probes/service eqiad)
[06:11:21] 6817 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[06:11:21] 6814 (RESOLVED) wmf - metamonitoring - prometheus - notified - vip is now DOWN
[06:11:21] 6816 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[06:11:22] 6813 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[06:11:22] 6810 (RESOLVED) ProbeDown sre (10.2.1.24 ip4 thumbor:8800 probes/service http_thumbor_ip4 codfw)
[06:11:23] 6812 (RESOLVED) HaproxyUnavailable cache_upload global sre (thanos-rule)
[06:11:23] 6811 (RESOLVED) VarnishUnavailable global sre (varnish-upload thanos-rule)
[15:32:44] hi folks, for visibility: we'll be repooling eqiad shortly after the staff meeting - we should be done prior to the end of the late UTC infra window
[15:33:49] gl jasmine_!
[15:37:34] thanks, jasmine_!
[15:37:44] federico3: topranks: FYI ^
[15:37:48] thanks!
[15:38:00] thanks
[15:38:08] ok thanks!
[16:24:29] jasmine_: good luck on the repool
[16:24:31] * claime dips out
[16:24:34] :D
[16:26:11] :D
[16:26:16] gl jasmine_!
[16:41:11] ty folks! repooling eqiad for CDN traffic now
[16:43:20] and now, we wait :)
[16:57:23] :D
[16:57:53] looks like things have stabilized nicely between codfw and eqiad CDN traffic
[16:57:58] nice!
[16:58:17] yep, purely on the network side traffic looks healthy after the shift
[16:58:32] * swfrench-wmf thumbs up
[16:59:25] https://grafana.wikimedia.org/goto/QhXkPY3NR
[16:59:35] sweet, repooling services in eqiad shortly
[17:00:36] thanks, topranks - and that bump in eqiad <> codfw transport should go away once we complete the services repool
[17:18:13] FYI, we're seeing an initial bump in 5xx errors for mw-web in eqiad, currently investigating
[17:38:43] services have been repooled in eqiad, 5xx errors ^ have stabilized, but currently monitoring
[17:39:45] thanks, jasmine_! nicely done :)
[17:54:19] 🚀
[17:54:22] nicely done :)
[17:54:29] out of curiosity, was there any pattern to the 5xx?
[17:55:44] cdanis: transient overload of es6 and es7 on repool, which triggered circuit-breaking
[17:55:51] ah makes sense! thanks
[17:55:52] lasted ~ 5m
[17:56:07] I'll put a quick summary in the switchover task in a bit
[18:06:00] ty! much appreciated~
[18:07:21] and thanks for reverse shadowing swfrench-wmf, hope you get some rest before your on-call stretch later