[09:28:51] 10Traffic, 10Operations, 10Performance-Team (Radar): Add profiling for Varnish and VCL - https://phabricator.wikimedia.org/T175710 (10Krinkle)
[10:16:40] jbond42: hi! puppet is still disabled on boron, not sure if that's intended?
[10:18:56] ema: thanks it should be enabled again, will do now
[10:19:25] ack, thanks
[10:49:25] 10Traffic, 10Operations, 10Performance-Team (Radar): Add profiling for Varnish and VCL - https://phabricator.wikimedia.org/T175710 (10Krinkle) Maybe something to discuss with Traffic and possibly collaborate on in a future quarter.
[14:37:26] 10Traffic, 10Wikimedia-Apache-configuration, 10DNS, 10Matrix, 10Operations: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (10Tgr) @fsero this was feedback from modular.im support (and the modular.im config panel indeed checks for the .well-k...
[14:38:13] 10Traffic, 10Wikimedia-Apache-configuration, 10DNS, 10Matrix, 10Operations: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (10Tgr)
[14:43:47] https://phabricator.wikimedia.org/T222078#5235321
[14:44:00] it wasn't traffic-tagged, so not so visible over here in the SRE sphere
[14:44:50] but research finished up a pretty thorough analysis showing that, as expected, there was readership growth in Asia following the SG edge deployment which can't be accounted for by other things like the general internet population growth over time in the region.
[14:45:24] it's pretty interesting to read through how they analyze all that :)
[14:46:35] The money quote in it all is:
[14:46:37] What we observe is that the increase in connectivity alone does not explain the lift in unique device count in countries whose traffic is now served via Singapore data-center. Eliminating internet connectivity as a cause, and given that there are no other hypothesis about other causes for the lift, we can more confidently say the data suggests that the switch has been responsible for sustained un
[14:46:44] ique device increase in countries served by Singapore data-center.
[14:47:39] that's awesome news
[14:47:39] \o/
[15:23:34] XioNoX, bblack: hey there, two vendors in SG (NTT & Tata) have both invoiced us with significant burst charges -- wondering what's up with that
[15:25:34] 10Traffic, 10Operations, 10Patch-For-Review: Rate limit requests to cache_upload - https://phabricator.wikimedia.org/T224884 (10ema) 05Open→03Resolved
[15:45:55] I'll have a look later, seems like the LibreNMS alert didn't trigger
[15:46:16] would be great to have 3 transits there to even out the load :)
[16:05:11] 3 usable ones anyways
[16:45:53] 10netops, 10Analytics, 10Operations: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10fdans) p:05Triage→03High
[19:37:28] I am receiving some reports of slowdowns/errors on upload on esams, can someone check?
[19:38:04] this seems suspicious: https://grafana.wikimedia.org/d/000000478/varnish-mailbox-lag?orgId=1&var-datasource=esams%20prometheus%2Fops&var-cache_type=upload&var-server=All&from=1559763471639&to=1559849871641
[19:42:06] that looks fine, usually hundreds is not a concern
[19:42:28] the text cluster in esams looks suspicious though
[19:42:30] https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&var-datasource=esams%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend
[19:42:43] yeah, it may be text, not upload
[19:42:49] looking at the backend-backend connection count graph
[19:42:51] css failed for me
[19:42:54] once
[19:43:04] it is not any one varnish in esams though
[19:43:08] which is strange
[19:44:30] css loading is failing for me all the time
[19:45:46] that
[19:45:47] to the point that bypassing the cache to a further dc feels faster
[19:45:52] that is odd, there are not a huge number of 50x
[19:46:01] could be network
[19:46:06] that would not generate errors
[19:47:22] could also be my network or a transport
[19:50:07] so there definitely are a higher-than-baseline number of 503s being generated by esams text varnishes
[19:50:15] since 16:00 UTC
[19:50:29] but.. it is something like 10 rps total
[19:50:51] https://logstash.wikimedia.org/goto/67bb49bd0b8dc9ea03133c0d685c52c1
[19:50:56] not too much stands out here except cp3041
[22:18:48] 10Traffic, 10Operations, 10User-notice: Rate limit requests in violation of User-Agent policy more aggressively - https://phabricator.wikimedia.org/T224891 (10Quiddity) TechNews: I've [[https://meta.wikimedia.org/w/index.php?title=Tech/News/2019/24&diff=19140176&oldid=19140169&diffmode=source|added it to the...
[22:28:12] 10Traffic, 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Deprecate python varnish cachestats - https://phabricator.wikimedia.org/T184942 (10colewhite) Latest dashboard audit: 'varnish\..+\.backends' * "Media" * "API frontend summary" * "Experimental - backend 5xx" * "Maps performances" *...
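For context on the two rate-limiting tasks mentioned above (T224884, T224891): the snippet below is a minimal sketch of what request throttling at the Varnish layer can look like using the vsthrottle vmod from varnish-modules. It is an illustration only, not the VCL actually deployed; the key prefix, request limit, and time window are placeholder values chosen for the example.

    vcl 4.0;
    import vsthrottle;

    sub vcl_recv {
        # Deny a client once it exceeds 500 requests in a sliding 10-second
        # window. client.identity defaults to the client IP address; the
        # "rl:" prefix and the 500/10s threshold are illustrative values.
        if (vsthrottle.is_denied("rl:" + client.identity, 500, 10s)) {
            return (synth(429, "Too Many Requests"));
        }
    }

In practice the key would typically be scoped more narrowly (e.g. per User-Agent class or per request path) so that well-behaved clients are not throttled alongside abusive ones.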