[03:03:30] Traffic, CommRel-Specialists-Support, Core Platform Team, Editing-team, and 10 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Krinkle)
[07:51:08] Traffic, Operations: varnish 5.1.3 frontend child restarted - https://phabricator.wikimedia.org/T185968 (ema) Resolved→Open The issue occurred again on cp4025. Reopening. ` Mar 14 15:51:49 cp4025 varnishd[20511]: Child (20592) not responding to CLI, killed it. Mar 14 15:51:49 cp4025 varnishd[205...
[07:58:43] Traffic, Operations: OOM killer killed varnihsd cache-main on cp3053 - https://phabricator.wikimedia.org/T247195 (ema)
[07:58:46] Traffic, Operations: varnish 5.1.3 frontend child restarted - https://phabricator.wikimedia.org/T185968 (ema)
[09:59:11] Traffic, Operations: varnish 5.1.3 frontend child restarted - https://phabricator.wikimedia.org/T185968 (ema) The OOM killer intervened due to "Normal" (non-DMA) free memory on NUMA node 0 going below min (1380412 < 1387544): ` [Sat Mar 14 15:51:23 2020] Node 0 Normal free:1380412kB min:1387544kB low:17...
[10:13:59] Traffic, Operations: varnish 5.1.3 frontend child restarted - https://phabricator.wikimedia.org/T185968 (ema) Also worth mentioning that in the specific case of cp4025, the trouble was caused by a sudden [[https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&var-server=cp4025&var-datas...
[11:26:22] netops, Analytics, DC-Ops, Operations: kafka-jumbo1006 and stat1005 network issues - https://phabricator.wikimedia.org/T247561 (elukey) I had a chat with Arzhel today and we didn't find a lot. From his perspective, it seems that something in the middle between the switch and stat1005 is not worki...
[12:26:40] Traffic, CommRel-Specialists-Support, Core Platform Team, Editing-team, and 10 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Esanders) >>! In T120085#5545232, @Krinkle wrote: > So the question is whether it would be a problem if...
[13:19:40] netops, Operations, Patch-For-Review, User-Elukey: can aggregated netflow data include the router it was sampled from? - https://phabricator.wikimedia.org/T246186 (ayounsi) >>! In T246186#5960144, @elukey wrote: > If the cardinality of the three new dimensions are not too big we could definitely...
[15:36:04] Traffic, Commons, MediaWiki-File-management, Multimedia, and 7 others: Picture from Commons not found from Singapore - https://phabricator.wikimedia.org/T231086 (Sarahmarie1981)
[15:56:23] Traffic, Operations, observability, User-fgiunchedi: Per-backend ATS Prometheus metrics - https://phabricator.wikimedia.org/T227668 (ema) Open→Resolved a:ema Metrics added a while ago, closing!
[16:11:07] post meeting question: when should i resume bios updates due to wonky traffic conditions. (No need to answer now just asking while I thought of it, can wait post meeting.)
[18:22:28] Traffic, Discovery, Operations, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Dvorapa) Any news? From possible solutions like T238751, T240442, T245144 and @Ladsgroup's T247459? La...