[01:51:09] Out of curiosity: https://twitter.com/john_overholt/status/1276276247602044933 describes a situation where, for a couple of people, one version of the article is visible logged in and another one logged out. Neither a hard refresh nor purging the page seems to help. Is this a normal issue? Something we need to do something about?
[02:00:56] thanks JohanJ, taking a look
[02:11:19] sigh
[02:12:01] that's not a happy noise
[02:30:57] indeed
[02:32:10] OOI, do we have monitoring (or can we add it?) to detect cases like that automagically
[02:32:31] Not purging for a few hours isn't a major issue, over a week definitely is one though
[02:36:51] 10Traffic, 10Operations: several purgeds badly backlogged (> 10 days) - https://phabricator.wikimedia.org/T256444 (10CDanis)
[02:37:08] Reedy: we have monitoring, but not alerting
[02:37:54] (stupid distinction, sorry ;)
[02:49:11] 10Traffic, 10Operations, 10User-notice: several purgeds badly backlogged (> 10 days) - https://phabricator.wikimedia.org/T256444 (10Johan)
[02:55:57] Heh
[02:56:04] Sod’s law ;)
[03:01:46] 10Traffic, 10Operations, 10User-notice: several purgeds badly backlogged (> 10 days) - https://phabricator.wikimedia.org/T256444 (10CDanis)
[03:30:48] 10Traffic, 10Operations, 10User-notice: monitoring & alerting for purged - https://phabricator.wikimedia.org/T256446 (10CDanis)
[05:13:33] 10Traffic, 10Operations, 10User-notice: several purgeds badly backlogged (> 10 days) - https://phabricator.wikimedia.org/T256444 (10CDanis)
[07:01:33] 10Traffic, 10Varnish, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 3 others: Special:HideBanners is not really cacheable - https://phabricator.wikimedia.org/T256447 (10tstarling)
[07:43:43] 10Traffic, 10Operations, 10ops-eqsin: cp5006 multiple alerts (and SSH flapping) - https://phabricator.wikimedia.org/T256449 (10JMeybohm)
[07:50:42] 10Traffic, 10Operations, 10ops-eqsin: cp5006 multiple alerts (and SSH flapping) - https://phabricator.wikimedia.org/T256449 (10Volans) Host is back up, the console output during boot was all borked `��fx怘�怘�xx��x�x....` but the kernel boot logs were normally readable. Maybe there is some misconfiguration in...
[09:47:14] 10Traffic, 10Operations, 10User-notice: several purgeds badly backlogged (> 10 days) - https://phabricator.wikimedia.org/T256444 (10ema) p:05Triage→03High
[09:47:43] 10Traffic, 10Operations, 10User-notice: monitoring & alerting for purged - https://phabricator.wikimedia.org/T256446 (10ema) p:05Triage→03Medium
[10:00:39] hello folks
[10:00:50] hello elukey
[10:01:00] I'd like to move archiva.wikimedia.org to a new backend, but some LE is involved
[10:01:04] I created https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/607989/
[10:01:17] let me know if it makes sense or if it is horrible when you have a moment
[10:01:40] elukey: while you're here, can you look into https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver ?
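The 02:32:10 question (detecting stalled purges automatically) and the 02:37:08 answer ("monitoring, but not alerting") come down to evaluating a PromQL expression like the one ema quotes later at 10:21:25. Below is a minimal, hypothetical sketch of such a check using the standard Prometheus Go client; the Prometheus address is a placeholder, and in practice this would be expressed as an alerting rule (the subject of T256446) rather than a standalone program.

```go
// Sketch: find purged instances that have stopped receiving purge events.
// The metric and labels come from the query quoted at 10:21:25 below;
// the Prometheus address is a placeholder.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
	"github.com/prometheus/common/model"
)

const stalledPurgedQuery = `rate(purged_events_received_total{cluster="cache_text", topic="eqiad.resource-purge"}[5m]) == 0`

func main() {
	client, err := api.NewClient(api.Config{Address: "http://prometheus.example.org:9090"}) // placeholder
	if err != nil {
		log.Fatalf("creating Prometheus client: %v", err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Instant query: every series returned is a purged instance whose
	// 5-minute event rate is exactly zero, i.e. a stalled consumer.
	result, warnings, err := promAPI.Query(ctx, stalledPurgedQuery, time.Now())
	if err != nil {
		log.Fatalf("querying Prometheus: %v", err)
	}
	if len(warnings) > 0 {
		log.Printf("warnings: %v", warnings)
	}

	vector, ok := result.(model.Vector)
	if !ok {
		log.Fatalf("unexpected result type %T", result)
	}
	for _, sample := range vector {
		fmt.Printf("stalled purged instance: %s\n", sample.Metric["instance"])
	}
}
```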
[10:01:58] it's alerting and I'm currently otherwise occupied
[10:02:57] ah lovely
[10:20:08] elukey: you're likely gonna get a review on Monday (just added valentin as a reviewer, he's off today)
[10:21:25] 10Traffic, 10Operations, 10User-notice: several purgeds badly backlogged (> 10 days) - https://phabricator.wikimedia.org/T256444 (10ema) I have identified the misbehaving purged instances with `rate(purged_events_received_total{cluster="cache_text", topic="eqiad.resource-purge"}[5m]) == 0` and restarted them...
[10:23:20] ema: ack thanks
[10:25:25] 10Traffic, 10Operations: monitoring & alerting for purged - https://phabricator.wikimedia.org/T256446 (10Johan)
[10:25:29] 10Traffic, 10Discovery, 10Operations, 10Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan 20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (10Adithyak1997)
[10:29:35] 10Traffic, 10Operations: Make atsmtail-backend.service depend on fifo-log-demux - https://phabricator.wikimedia.org/T256467 (10ema)
[10:29:55] 10Traffic, 10Operations: Make atsmtail-backend.service depend on fifo-log-demux - https://phabricator.wikimedia.org/T256467 (10ema) p:05Triage→03Low
[10:30:12] 10Traffic, 10Operations, 10ops-eqsin: cp5006 multiple alerts (and SSH flapping) - https://phabricator.wikimedia.org/T256449 (10ema) 05Open→03Resolved a:03ema The host looks fine, closing for now.
[10:36:01] ema: need to go now, but the rise in traffic seemed to be a temporary spike of search-related requests hitting the api-appservers
[10:36:07] all recovered
[10:36:19] https://grafana.wikimedia.org/d/myRmf1Pik/varnish-aggregate-client-status-codes?orgId=1&var-site=ulsfo&var-cache_type=varnish-text&var-cache_type=varnish-upload&var-status_type=1&var-status_type=2&var-status_type=3&var-status_type=4&var-status_type=5&var-method=GET&var-method=HEAD&var-method=POST&from=now-3h&to=now
[10:36:40] fairly large request spike in ulsfo, most requests rate limited at the frontend layer
[10:37:05] elukey: is that what you're talking about? ^
[10:37:55] yes exactly, from 9:55~10 UTC
[10:38:09] ack, thanks!
[10:42:55] 10Wikimedia-Apache-configuration: Clean up redirects.conf/redirects.dat (remove en2.wikipedia.org, etc.) - https://phabricator.wikimedia.org/T105981 (10Lami1111) Ping ip address Host dev msv
[13:19:55] 10Traffic, 10Operations: purged crashes with "fatal error: concurrent map read and map write" - https://phabricator.wikimedia.org/T256479 (10ema)
[13:29:33] 10Traffic, 10Operations, 10Patch-For-Review: purged crashes with "fatal error: concurrent map read and map write" - https://phabricator.wikimedia.org/T256479 (10ema) p:05Triage→03Medium
[14:02:54] XioNoX: hi! We've got CenturyLink, Zayo, and Telia maintenances planned for June 30, FYI. The Zayo and Telia windows overlap time-wise
[14:03:13] great, which links?
[14:04:55] zayo: /OGYX/120003/ZYO
[14:05:00] telia: IC-313592
[14:09:46] eqiad/codfw, eqdfw/ulsfo
[14:10:02] so that's not so bad
[14:17:49] excellent
[14:28:34] indeed!
[17:07:11] 10Traffic, 10Android-app-Bugs, 10Operations, 10Parsoid, and 6 others: Right-to-Left directionality problem with refs - https://phabricator.wikimedia.org/T251983 (10bearND) 05Open→03Resolved a:03bearND Ok, looks like enough time has passed for the old, cached version to be evicted and the newly deploye...
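T256479 above reports purged aborting with `fatal error: concurrent map read and map write`. That error is the Go runtime detecting unsynchronized concurrent access to a plain map. Without knowing purged's internals, here is a generic sketch of the failure class and the usual fix (guarding the map with a `sync.RWMutex`); the type and field names are hypothetical.

```go
// Generic illustration of the crash class in T256479, not purged's code.
// Plain Go maps are not safe for concurrent use: one goroutine reading
// while another writes makes the runtime abort with exactly
// "fatal error: concurrent map read and map write". The standard fix is
// to guard the map with a sync.RWMutex (or to use sync.Map for very
// read-heavy workloads).
package main

import (
	"fmt"
	"sync"
)

// stats is a hypothetical shared counter table.
type stats struct {
	mu       sync.RWMutex
	counters map[string]uint64
}

func (s *stats) Inc(key string) {
	s.mu.Lock() // exclusive lock for writes
	defer s.mu.Unlock()
	s.counters[key]++
}

func (s *stats) Get(key string) uint64 {
	s.mu.RLock() // shared lock for reads
	defer s.mu.RUnlock()
	return s.counters[key]
}

func main() {
	s := &stats{counters: make(map[string]uint64)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s.Inc("events")
				_ = s.Get("events")
			}
		}()
	}
	wg.Wait()
	fmt.Println("events:", s.Get("events")) // 8000 with the lock; a crash or a race without it
}
```

Removing the mutex calls and running the same program (ideally with `go run -race`) reproduces the reported failure mode.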
[17:38:14] 10Traffic, 10Operations, 10Patch-For-Review: monitoring & alerting for purged - https://phabricator.wikimedia.org/T256446 (10Nemo_bis)
[17:38:24] 10Traffic, 10Operations, 10serviceops, 10Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Nemo_bis)
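For context on T133821 ("Make CDN purges reliable") and the purged backlog discussed throughout this log: each event on a topic like eqiad.resource-purge ultimately has to become a cache invalidation on a CDN node. The sketch below shows one generic way to express that step as an HTTP PURGE request in Go; the cache address, the target URL, and the use of the PURGE method are illustrative assumptions, not purged's actual implementation.

```go
// Generic sketch of a single CDN purge: send an HTTP PURGE for a given
// URL to a cache node. Cache address and URL are placeholders; PURGE is
// a conventional, VCL-defined method on Varnish-style caches.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func sendPurge(client *http.Client, cacheAddr, targetURL string) error {
	req, err := http.NewRequest("PURGE", targetURL, nil)
	if err != nil {
		return err
	}
	req.Host = req.URL.Host  // keep the original Host header for the cache lookup...
	req.URL.Scheme = "http"  // ...while connecting to the cache node directly
	req.URL.Host = cacheAddr

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("purge of %s failed: %s", targetURL, resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Placeholder cache node and URL.
	if err := sendPurge(client, "127.0.0.1:80", "https://en.wikipedia.org/wiki/Example"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("purge sent")
}
```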