[01:11:59] Varnish, Performance-Team, Reading-Web-Backlog, Patch-For-Review: Vary mobile HTML by connection speed - https://phabricator.wikimedia.org/T119798#2849551 (Krinkle) Open>declined T119797 was resolved by removing srcset instead of using qlow, and for all mobile views (which we already frag...
[09:26:38] Traffic, Analytics-Kanban, Operations, Patch-For-Review: Ganglia varnishkafka python module crashing repeatedly - https://phabricator.wikimedia.org/T152093#2849995 (elukey) Open>Resolved
[12:11:32] Traffic, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2850204 (Ciencia_Al_Poder) >>! In T66214#2827486, @GWicke wrote: > Since the need for explicit control should be rare, I think using the Accept header o...
[14:29:39] elukey: we've forgotten to get rid of the old varnish and varnishkafka experimental packages from carbon
[14:31:12] doing that now, luckily I don't have to re-read reprepro(1) for the nth time in this case
[14:32:22] also, I don't think we need sources.list files for experimental on cache hosts any longer
[14:33:00] +1, thanks ema
[14:59:16] I'm currently expanding my caching simulator for WMF traffic and I was wondering: where can I find how many Varnish instances are pooled for text, upload, maps, etc.? Do these numbers change only due to code redeployment, or are there other reasons they might get changed?
[15:05:22] _joe_: I think you once pointed me to an externally accessible http endpoint listing the hosts currently pooled, right?
[15:05:45] can't find it in my irc logs at the moment
[15:05:49] <_joe_> ema: 1 sec
[15:05:53] sure
[15:09:20] oh, that probably was https://config-master.wikimedia.org/conftool/
[15:10:46] Traffic, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2850639 (Fjalapeno) Not sure if it is helpful to examine, but here is a commercial image service API: https://docs.imgix.com/setup/serving-images https...
[15:12:12] <_joe_> that's the easiest one, yes
[15:16:46] thanks
[15:16:50] is there a way to access historical info?
[15:17:35] not as far as I'm aware, no
[15:18:09] thanks, that's very helpful already
[15:18:26] as to the other question: do you change the number of pooled instances often?
[15:18:33] and when do you change that number?
[15:18:55] (is there some kind of load balancing between text and upload going on, or something like that?)
[15:18:55] not really, we do depool/repool hosts for operational reasons (varnish upgrade and what not)
[15:19:15] alright, so I'll just assume it's static for the simulator
[15:19:19] thanks a lot!
[15:19:25] I think it's a fair assumption, yes
[15:19:39] and no load balancing between text and upload
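Since counting pooled hosts comes up again later in the log, a minimal sketch of polling that endpoint follows; only the base URL comes from the chat, while the per-cluster dump name and the "pooled: yes" field format are assumptions that should be checked against the real output.

    # Sketch only: snapshot pooled-host counts from config-master.
    # The base URL is from the chat above; the dump file name and the
    # "pooled: yes" line format are guesses -- inspect the index first.
    import urllib.request

    BASE = "https://config-master.wikimedia.org/conftool/"

    def fetch(url):
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")

    if __name__ == "__main__":
        index = fetch(BASE)
        print(index)  # shows which per-datacenter/per-cluster dumps exist
        # Hypothetical second step, once the real file names/format are known:
        # dump = fetch(BASE + "eqiad.yaml")                      # placeholder name
        # print("pooled entries:", dump.count("pooled: yes"))    # placeholder format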
[15:21:19] the simulator is awesome btw :)
[15:23:48] thanks :) - there's a lot more to come (I have two other students working on it, and I'm teaching a course where people will write extensions)
[15:24:08] is it available somewhere? Really curious :(
[15:24:14] ergh :)
[15:26:18] very early release code is here: https://github.com/dasebe/webcachesim
[15:26:33] but it's fairly incomplete at the moment
[15:26:41] elukey: and some results here https://phabricator.wikimedia.org/T144187#2664359 (and subsequent comments from dberger)
[15:28:16] nice thank you :)
[15:31:49] <_joe_> dberger: nice indeed
[15:32:18] <_joe_> dberger: the best way to access historical info is from the etcd daily backups
[15:32:25] <_joe_> not very easy, tbh
[15:33:34] it is probably not that important though, we don't change the number of hosts often enough
[15:34:21] <_joe_> ema: what might be significant is the restarts/depools of cache backends
[15:34:37] <_joe_> well, it can be interesting to correlate
[15:34:46] yes
[15:34:48] I was interested in correlating as well :)
[15:35:36] <_joe_> dberger: no, we honestly didn't think it was so interesting, but you could check the numbers yourself from now on
[15:36:19] <_joe_> ema, bblack: different topic: I'd like to add our nginx TLS termination classes on the appservers
[15:36:35] <_joe_> to allow cross-dc encrypted requests
[15:36:57] <_joe_> do you see any reason why that would be a bad idea?
[15:38:16] cross-dc requests from whom?
[15:39:03] <_joe_> parsoid => mediawiki
[15:39:07] <_joe_> directly to the LB
[15:39:14] <_joe_> not going through the cache layers
[15:39:20] <_joe_> because that doesn't make sense
[15:40:14] right
[15:40:42] <_joe_> so I made it so that parsoid can do those requests correctly in https://gerrit.wikimedia.org/r/#/c/325550/
[15:40:54] <_joe_> next I'd need to add TLS to the appservers
[15:41:03] <_joe_> I could go on and add it to apache
[15:41:18] <_joe_> or, I can just proxy via nginx to localhost
[15:41:29] <_joe_> if it's much faster
[15:41:37] you're gonna need bblack for a proper answer to your question, I can't think of a reason why it would be a bad idea :)
[15:41:46] <_joe_> well I have one
[15:41:57] <_joe_> it's one more moving part on the appserver
[15:42:02] <_joe_> nginx => apache => hhvm
[15:42:18] <_joe_> but honestly we have the same moving part to serve traffic to the public
[15:43:36] well, traffic-wise it's not another moving part as we would still hit apache directly
[15:44:00] <_joe_> yeah I meant we have nginx proxying locally to varnish there
[15:44:12] and we all like that so much
[15:44:37] <_joe_> I'll take a look at our puppet classes :)
[15:45:10] _joe_: thx ;)
[15:51:23] oh it looks like we don't have the varnish backends on https://config-master.wikimedia.org/conftool
[15:51:27] only nginx and varnish-fe
[15:51:52] varnish-fe would be the Hot Object Cache in proper terminology, as I've recently discovered
[15:51:53] <_joe_> ema: yes, because that's for the load-balancer config
[15:52:05] right
[15:54:11] ema: but isn't there almost a one-to-one relation between the number of pooled Hot Object Caches and backends?
[15:54:26] ema: or what do you mean by backend?
[15:54:34] backend would be the disk-based cache
[15:54:59] there is a 1-1 relation indeed, but we do restart and wipe the disk-based caches weekly
[15:55:14] so perhaps you would be interested in seeing when that happens too
[15:56:23] ema: oh, right, yes definitely
[15:59:03] https://phabricator.wikimedia.org/T145661 this is the reason for the weekly restarts
[16:03:34] Traffic, MediaWiki-ResourceLoader, Operations, Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#2850851 (Krinkle) >>! In T105657#2613260, @Krinkle wrote: > Or maybe we can bump the startup module exp...
[16:07:05] Traffic, Analytics, Operations: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2850856 (Gilles)
[16:32:29] dberger: I'm going through http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-120.pdf and the 0.1 difference between stock varnish and nginx OHR is not immediately clear to me
[16:32:38] both in the US and the HK trace
[16:34:10] did you try to figure out why default nginx achieves better OHRs than varnish?
[16:37:36] it's nginx with a tuned n-request admission filter
[16:37:43] it's not stock nginx
[16:38:22] we have an updated version of the paper in the meanwhile, additional experiments, better explanations
[16:38:52] I can probably upload that somewhere as the "official" version will only be presented at USENIX NSDI in March
[16:39:24] that would be great, yes please :)
[16:39:35] I'm happy to discuss this all another day ... just for now I've got to run to a meeting
[16:39:47] have fun!
[16:40:45] ok so the .1 difference can perhaps be explained by the nginx admission filter, while varnish has no admission policy by default
[16:40:50] yes
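To make the admission-filter point above concrete: a minimal sketch (not webcachesim or the paper's code) of an N-request admission filter in front of an LRU cache. An object is only admitted once it has been requested N times, which keeps one-hit wonders out of the cache; with n_admit=1 it degenerates to plain admit-everything LRU, which is roughly the baseline stock Varnish is being compared against.

    from collections import OrderedDict, defaultdict

    class FilteredLRU:
        """LRU cache guarded by an N-request admission filter."""

        def __init__(self, capacity_bytes, n_admit=2):
            self.capacity = capacity_bytes
            self.n_admit = n_admit        # admit an object on its n-th request
            self.cache = OrderedDict()    # object id -> size, kept in LRU order
            self.used = 0
            self.seen = defaultdict(int)  # request counts for unadmitted objects

        def request(self, obj_id, size):
            """Return True on a cache hit, False on a miss."""
            if obj_id in self.cache:
                self.cache.move_to_end(obj_id)   # refresh recency
                return True
            self.seen[obj_id] += 1
            if self.seen[obj_id] >= self.n_admit and size <= self.capacity:
                while self.used + size > self.capacity:   # evict LRU victims
                    _, victim_size = self.cache.popitem(last=False)
                    self.used -= victim_size
                self.cache[obj_id] = size
                self.used += size
                del self.seen[obj_id]
            return False

In a real deployment the per-object request counts would live in a bounded structure (e.g. a counting Bloom filter) rather than an unbounded dict; the sketch is only meant to show why an admission policy by itself can move the object hit ratio.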
[16:55:41] _joe_: I think for now, nginx makes more sense. You can *probably* use the tlsproxy class pretty much as it is today, and it avoids bringing more complexity into the appserver apache config and we know it does certain things "right"
[16:55:56] _joe_: (and since it's internal, we can use "high" TLSv1.2-only)
[16:56:16] <_joe_> bblack: code review incoming in a few :P
[18:34:30] Traffic, Operations, Patch-For-Review, Prometheus-metrics-monitoring: Port vhtcpd statistics from ganglia to prometheus - https://phabricator.wikimedia.org/T147429#2851328 (fgiunchedi) Open>Resolved a: fgiunchedi This is deployed, I've updated https://grafana.wikimedia.org/dashboard/db...
[23:52:59] Traffic, Operations, Patch-For-Review, Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2852882 (fgiunchedi) I took another look at the cause of UUID/VCL churn, concentrating for now only on the backend va...
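On the appserver TLS discussion above (15:36 onwards and 16:55): the "terminate TLS in nginx, proxy to the local Apache/HHVM" shape being proposed looks roughly like the following. This is only an illustrative sketch, not the output of the actual tlsproxy puppet class; the server name, certificate paths and cipher string are placeholders.

    # Illustrative nginx server block; all names and paths are placeholders.
    server {
        listen 443 ssl;
        server_name appserver.example.wmnet;                     # placeholder

        ssl_certificate     /etc/ssl/localcerts/appserver.crt;   # placeholder
        ssl_certificate_key /etc/ssl/private/appserver.key;      # placeholder
        ssl_protocols       TLSv1.2;       # internal-only traffic, so TLSv1.2-only
        ssl_ciphers         HIGH:!aNULL:!MD5;                    # placeholder

        location / {
            proxy_pass http://127.0.0.1:80;   # hand off to the local Apache -> HHVM
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto https;
        }
    }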