[09:18:02] Traffic, MediaWiki-Cache, Operations: Duplicate CdnCacheUpdate on subsequent edits - https://phabricator.wikimedia.org/T145643#2822732 (hashar)
[09:19:40] Traffic, MediaWiki-Cache, Operations: Duplicate CdnCacheUpdate on subsequent edits - https://phabricator.wikimedia.org/T145643#2822737 (hashar) I guess #traffic is sufficient. I have filed this task as potential material to look at high rate of cache purges which apparently is or will be an issue. N...
[09:20:54] Krenair: oh good point! I'll do that later today
[09:29:57] Traffic, MediaWiki-Cache, Operations: Duplicate CdnCacheUpdate on subsequent edits - https://phabricator.wikimedia.org/T145643#2822747 (hashar) Looks like the CdnPurgeJob are intentionally NOT deduplicated! ``` name=includes/jobqueue/jobs/CdnPurgeJob.php, lang=php class CdnPurgeJob extends Job {...
[09:33:01] Traffic, Operations: Varnishkafka and related VSM daemons seeing abandoned VSM logs - https://phabricator.wikimedia.org/T151563#2822762 (elukey) I checked some hosts showing the same behavior as cp1055 and the type of request that causes the assert failure is always the same: ``` /w/api.php?action=quer...
[13:27:00] Traffic, Operations: Varnishkafka and related VSM daemons seeing abandoned VSM logs - https://phabricator.wikimedia.org/T151563#2823275 (elukey) p:Triage>Normal
[13:27:37] ema: --^ not sure if this one should be high or not, but normal seems good enough
[13:32:21] elukey: yeah normal is fine
[14:48:56] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#2823419 (faidon) This has been opened with JTAC as case [[ https://casemanager.juniper.net/casemanager/#/cmdetails/2016-1125-0413 | 2016-1125-0413 ]].
[15:52:22] Traffic, Operations: Varnishkafka seeing abandoned VSM logs - https://phabricator.wikimedia.org/T151563#2823500 (elukey)
[15:58:22] Traffic, Operations: varnishlog daemons seeing Log overrun constantly - https://phabricator.wikimedia.org/T151643#2823504 (elukey)
[16:09:21] Traffic, Operations: varnishlog daemons seeing Log overrun constantly - https://phabricator.wikimedia.org/T151643#2823522 (elukey) p:Triage>High
[16:19:33] Traffic, Operations: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643#2823540 (elukey)
[17:48:33] ok so elukey and I have been working a bit on T151643 (on pinkunicorn)
[17:48:33] T151643: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643
[17:49:16] we've tried bumping vsl_space from 80M to 240M and that doesn't really seem to help much
[17:50:08] further increases of vsl_space are bound by /var/lib/varnish which is 512M as of now
[17:50:29] but yeah anyways that doesn't really seem to change much, even with vsl_space set to 512M
[17:50:48] what does help a lot is filtering for non-PURGE requests in the VSL query
[17:50:57] ('q', 'ReqMethod ne "PURGE"')
[17:51:25] which is clearly not the way to go for varnishreqstats given that we do care about PURGEs there
[17:51:38] but it's probably a good idea for the other scripts
[17:53:08] uh, maybe we should have specified -l as well?
[17:53:13] -l
[17:53:13] Specifies size of shmlog file. vsl is the space for the VSL records [80M] and
[17:53:16] vsm is the space for stats counters [1M]. Scaling suffixes like 'K' and 'M'
[17:53:19] can be used up to (G)igabytes. Default is 81 Megabytes.
[17:54:24] weird, how does it relate to vsl_space?
[17:55:03] let's just set it and see what happens
[18:01:02] there are still a few overruns but less than before
[18:01:53] 512?
[18:01:58] y
[18:02:15] -p vsl_space=512M -l 512M,1M
[18:08:06] one easy thing that we could do is to add the ReqMethod ne PURGE to varnishxcache and varnishxcps to solve part of the problem
[18:08:23] and then figure out what to do with reqstats
[18:15:39] yeah
[19:44:42] Traffic, Operations, Patch-For-Review, Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2823701 (fgiunchedi) The error should have been fixed upstream by https://github.com/jonnenauha/prometheus_varnish_ex...
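
For reference, the two mitigations discussed for T151643 (filtering PURGE requests out of the VSL query, and raising the shared-memory log size) can be sketched as below. This is only an illustration: `vsl_args` and `varnishd_shm_args` are hypothetical helper names, not part of python-varnishapi or of any of the scripts mentioned, and the `('n', 'frontend')` option in the usage note is an assumed example. The `('q', ...)` tuple and the `-p vsl_space=... -l ...` flags are the ones quoted in the conversation.

```python
# Sketch only: these helpers are hypothetical and not part of python-varnishapi.

# VSL query quoted in the log; it drops PURGE requests before they reach the consumer.
PURGE_FILTER = 'ReqMethod ne "PURGE"'


def vsl_args(base_args, count_purges):
    """Return a list of VSL option tuples.

    The non-PURGE query is appended unless the consumer needs to count
    PURGEs (as varnishreqstats does, per the discussion).
    """
    args = list(base_args)  # copy so the caller's list is not modified
    if not count_purges:
        args.append(('q', PURGE_FILTER))
    return args


def varnishd_shm_args(vsl_space, vsm_space):
    """Build the varnishd flags tried in the log, e.g. -p vsl_space=512M -l 512M,1M."""
    return ['-p', 'vsl_space=' + vsl_space, '-l', vsl_space + ',' + vsm_space]
```

With these, a script that does not care about PURGEs would call `vsl_args([('n', 'frontend')], count_purges=False)`, while a PURGE-counting script would pass `count_purges=True` and keep its options unchanged. `varnishd_shm_args('512M', '1M')` reproduces the flag set from the 18:02:15 message.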