[11:12:18] bblack: re: exp policy size-based cutoff, couldn't we move the whole exp logic after cluster_fe_backend_response, thus avoiding splitting the code between wm_common_backend_response and vcl_backend_response (fe)?
[11:18:32] after calling cluster_fe_backend_response, that is
[11:22:05] uhm, but cluster_fe_backend_response returns
[11:35:19] yeah I'm confused as to where exactly to put the size-based hfp cutoff
[11:36:05] not wm_common_backend_response as we want it on frontends only
[11:36:42] probably not in cluster_fe_backend_response as we don't want to copy-paste it in multiple VCL files
[11:42:05] ema: instead of retracting service IPs from bgp announcements, we should probably raise MED instead eh
[13:21:35] mark: oh, so play with MED values to promote/demote IPs, so to speak
[13:24:59] yes
[13:25:05] instead of retracting the ip entirely
[13:25:10] making the host not eligible to be used at all
[13:25:13] just make it less attractive
[13:25:21] i guess there could be situations where we want to retract the ip entirely
[13:25:27] but that should probably wait for the FSM stuff
[13:25:37] now I was just thinking of raising the MED while the depool threshold is in effect
[13:25:48] and maybe until init completes, dunno
[13:32:21] mark: what happens if the host with higher MED is unreachable? Do packets in that case get routed to the other hosts?
[13:32:25] I imagine so
[13:32:51] so the router simply picks the router with the lowest MED (all other things being equal) at all times
[13:33:02] so the higher MED one isn't even used
[13:33:15] and it will soon drop its bgp due to timeout or whatever
[13:33:23] oh right, lowest
[13:33:24] even better would be BFD, i might add that some day
[13:33:32] yeah see it as 'distance'
[13:33:53] brb
[13:34:36] HTTP Immutable Responses - https://tools.ietf.org/html/rfc8246
[13:47:43] sorry, diaper duties...
[13:47:51] ema: and I was thinking
[13:48:04] after this works we should also add prometheus metrics for 'active med' per service
[13:48:26] and besides the obvious benefits of that, it would also allow e.g. grafana to work out at all times what the active master is
[13:48:37] (the lvs instance with the lowest med per service ip)
[13:48:48] nice, yes
[13:48:55] and then you could have dashboards which only show the metrics of the active lvs without the clutter of the backups
[13:49:05] which are irrelevant for most purposes/people
[13:50:09] and possibly even a scriptable way to figure out who the master is!
[13:50:19] yes
[13:50:24] "master"
[13:50:31] the active one anyway
[13:50:34] right
[13:51:28] but, multiple services share the same ip
[13:51:38] so one service with a higher MED would probably affect all the others
[13:52:01] (e.g. port 80 vs 443)
[14:33:53] mark: one idea that we had floated before was to set the MED as the sum of the weights of the pooled realservers
[14:34:14] mm
[14:34:19] so that the LVS with connectivity to the most realservers wins
[14:34:26] but in case of complex outages, this could get a bit messy
[14:34:30] flapping traffic etc.
[14:34:33] yes
[14:34:55] especially with ips shared by multiple services
[14:38:36] it could also flap when depooling a realserver, based on which pybal updates the config from etcd first, right?
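
A minimal sketch of the idea discussed above (keep announcing the service IP but raise the MED while the depool threshold is in effect, so the routers prefer another LVS host); this is not PyBal code, and the function name, constants, and threshold semantics are illustrative assumptions:

```python
# Hypothetical sketch, not actual PyBal code: choose the MED to attach to
# the BGP announcement for a service IP, raising it (making the host less
# attractive) instead of withdrawing the route when too few realservers
# are pooled. BASE_MED, RAISED_MED and the threshold are made-up values.

BASE_MED = 0
RAISED_MED = 100  # higher MED = less preferred, but route stays announced


def advertised_med(pooled_weight_sum, total_weight_sum, depool_threshold=0.5):
    """Return the MED to advertise for a service IP.

    pooled_weight_sum:  sum of weights of realservers currently pooled
    total_weight_sum:   sum of weights of all configured realservers
    depool_threshold:   fraction below which the depool threshold kicks in
    """
    if total_weight_sum == 0:
        return RAISED_MED
    pooled_fraction = pooled_weight_sum / total_weight_sum
    # While the depool threshold is in effect, demote this LVS host by
    # raising the MED; routers pick the peer announcing the same prefix
    # with the lowest MED, all other attributes being equal.
    if pooled_fraction < depool_threshold:
        return RAISED_MED
    return BASE_MED


if __name__ == "__main__":
    print(advertised_med(pooled_weight_sum=3, total_weight_sum=10))  # -> 100
    print(advertised_med(pooled_weight_sum=8, total_weight_sum=10))  # -> 0
```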
[14:44:09] volans: i can't parse that
[14:46:09] mark: when doing a normal depool of a realserver, different LVSes pick up that depool from etcd at slightly different times, so the change in MED could make it flap each time we pool/depool a realserver IIUIC
[14:46:26] yes
[14:46:37] could mitigate that I guess with a delay
[14:48:36] or, instead of the sum of weights, using a % of the reachable realservers
[14:50:26] and some delta threshold (switch only if > 15% diff)
[14:50:46] to account for the change of percentages when adding/removing servers
[15:22:43] i'm inclined to just go with a static raised value
[15:22:48] only on depool threshold
[16:05:19] bblack: I was looking into 200 responses with CL:0 to try to get rid of the vcl workaround for T144257
[16:05:19] T144257: Certain images failing to load in ulsfo - https://phabricator.wikimedia.org/T144257
[16:06:34] bblack: and noticed that we do generate 200s with CL:0 for healthchecks, it might be nicer to include some context info? https://gerrit.wikimedia.org/r/#/c/393251/
[16:08:03] as an alternative we could return 204s I guess, the response body being empty, but then we'd have to update the pybal checks ugh
[16:11:14] not that the healthcheck responses have anything to do with T144257 or its workaround, it just came to mind while poking around :)
[16:11:15] T144257: Certain images failing to load in ulsfo - https://phabricator.wikimedia.org/T144257
[16:16:42] there are a few such responses on upload btw http://bit.ly/2iMvIRw
[21:41:00] 10Traffic, 10Operations, 10Performance-Team: load.php requests taking multiple minutes - https://phabricator.wikimedia.org/T181315#3786217 (10Tgr)
[21:57:29] 10Traffic, 10Operations, 10Performance-Team: load.php requests taking multiple minutes - https://phabricator.wikimedia.org/T181315#3786261 (10Tgr)
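
A small sketch of the anti-flapping variant suggested above at 14:48-14:50 (derive the MED from the percentage of reachable realservers, but only switch when the percentage moves by more than a 15% delta); the class, mapping from percentage to MED, and numbers are illustrative assumptions, not PyBal's implementation:

```python
# Hypothetical sketch of the delta-threshold idea: each LVS recomputes its
# MED from the % of reachable realservers, but ignores small changes so a
# single pool/depool picked up at slightly different times by each LVS
# doesn't flip which host has the lowest MED. All names are illustrative.

DELTA_THRESHOLD = 15.0  # percentage points


class MedTracker:
    def __init__(self):
        self.last_pct = None  # last percentage that actually changed the MED
        self.current_med = 0

    def update(self, reachable, configured):
        """Recompute the advertised MED from the reachable-realserver %."""
        pct = 100.0 * reachable / configured if configured else 0.0
        if self.last_pct is None or abs(pct - self.last_pct) > DELTA_THRESHOLD:
            # Fewer reachable realservers -> higher MED (less attractive).
            self.current_med = int(100 - pct)
            self.last_pct = pct
        return self.current_med


if __name__ == "__main__":
    tracker = MedTracker()
    print(tracker.update(10, 10))  # 100% reachable -> MED 0
    print(tracker.update(9, 10))   # only a 10-point change: MED stays 0
    print(tracker.update(6, 10))   # 40-point drop since last switch -> MED 40
```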