[14:42:28] ema: any chance you could prioritize the review of my outstanding pybal patches?
[14:42:47] it's rather painful to continue working on it until they're merged with gerrit workflow :/
[14:46:25] mark: sure :)
[14:46:36] i'm extending the unit tests also btw
[14:46:41] nice
[14:49:37] mark: what does the "Nevermind" here mean? https://gerrit.wikimedia.org/r/#/c/354723/4/pybal/bgp.py@2536 Does prefixesAdded == 0 happen when the update message is already "full" so to say? If so we could mention that
[14:49:53] other than that, that patchset looks good to me
[14:50:14] yes, exactly
[14:50:30] i guess it's not clear enough
[14:50:32] let me add
[14:50:37] thanks
[14:55:31] thanks! :)
[15:01:23] bah now it conflicts with your file move change
[15:01:39] yeah that needs a manual rebase
[15:01:50] let me do that
[15:01:55] thanks
[15:08:43] a couple things on my mind re: pybal/lvs future features (aside from ones already documented / well-known), in case it has any impact on ongoing stuff...
[15:10:08] 1) It might be nice to ensure that all monitoring (well, other than runcommand?) happens over the actual LVS direct links, I guess by having a configurable IP TTL for monitoring packets that defaults to 1? Otherwise LVS could lose its direct connection to a VLAN and thus be unable to route the traffic, but the monitor could still succeed via the core routers?
[15:10:42] my old idea for that was to have a check for it
[15:10:51] that also tests the service ip bound to loopback
[15:11:02] so an icmp ping to service ip with correct realserver mac
[15:12:51] I assume there's a setsockopt or something we could use to just set the TTL on the tcp/udp monitoring so that they'd just fail if they were routed
[15:14:30] 2) Once we've got the IP advertising split out (separate updates, separate configurable MEDs), it might be nice to also have the ability (I think we'd default it on?) to stop advertising a service IP to the routers when you reach the "too many down" point (or a separate limit, in case we want it different than the normal depool threshold)
[15:15:24] if the above two are combined, we get the nice feature that if, say, the active LVS for a service loses its link to a whole row in eqiad which causes monitoring loss to too many servers for a service, it will stop advertising and let the other LVS that still has good links to all the rows take over
[15:15:25] yes, i was already thinking about that
[15:15:34] 2) is easy enough to do now
[15:17:56] oh and...
[15:18:27] 3) More than one BGP peer to advertise to, so that each pybal can advertise directly to both core routers, instead of having the routers learn about pybals from each other
[15:18:34] yep, can do that too
[15:19:21] (then loss of 1/2 core routers doesn't have to imply failing out the LVS machines that were talking to that router)
[15:20:39] mmh gerrit isn't happy about the rebase:
[15:20:40] ! [remote rejected] HEAD -> refs/for/master (change https://gerrit.wikimedia.org/r/354686 closed)
[15:21:10] (and indeed https://gerrit.wikimedia.org/r/354686 is closed, we've merged it!)
[15:27:00] why does it try to push that one then
[15:27:08] different commit?
[15:27:14] did you do a fetch? :)
[15:27:35] yeah, I did: git fetch https://gerrit.wikimedia.org/r/operations/debs/pybal refs/changes/46/354746/4 && git checkout FETCH_HEAD
[15:27:50] and git rebase origin/master
[15:28:11] after fixing the conflicts, git push gerrit HEAD:refs/for/master
[15:29:12] why doesn't a straight git fetch origin work?
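A minimal sketch of the setsockopt idea from 15:12 above, assuming a plain Python TCP check rather than pybal's actual monitor code: setting IP_TTL to 1 means the probe is dropped at the first router hop, so it can only succeed over a direct (same-VLAN) link. The function name and arguments are hypothetical.

import socket

def ttl1_tcp_check(realserver_ip, port, timeout=2.0):
    """TCP connect check that cannot survive being routed: with TTL=1 the SYN
    is discarded by the first router, so the check only passes when the
    realserver is directly reachable."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 1)  # configurable default of 1, per the idea above
    s.settimeout(timeout)
    try:
        s.connect((realserver_ip, port))
        return True
    except OSError:
        return False
    finally:
        s.close()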
[15:32:32] is origin/master up to date in your working copy?
[15:33:03] yup
[15:33:03] * master 8ab77f6 [origin/master] Add GPLv2 license header to bgp.py
[15:34:19] does rebase still leave an old version of 354686 outstanding maybe?
[15:34:27] git status might know
[15:35:03] ema: check the git history in the branch, it happened to me to end up with the "same" commit twice with two different SHAs, with another commit in the middle
[15:35:32] oh yeah that's what happened
[15:35:48] bblack: re: pybal monitoring lo, the TTL would only ensure the path followed, not that the backend host indeed has the service IP on the lo interface though
[15:37:31] indeed
[15:37:42] so a specific monitor that tries an icmp ping to the service ip with the realserver mac
[15:37:47] (a raw packet so to speak)
[15:37:49] that would do it
[15:38:19] Traffic, Operations, fundraising-tech-ops: Fix nits in HTTPS/HSTS configs in externally-hosted fundraising domains - https://phabricator.wikimedia.org/T137161#3284389 (Jgreen) >>! In T137161#3277960, @BBlack wrote: > @Jgreen - re: civicrm, it needs to emit the HSTS header on **all** HTTPS responses....
[15:39:46] as ema knows I played with it 2y ago for fun during holidays (IPv4 only, both TCP and "simple" HTTP, no TLS), in case you want it ;)
[15:40:34] why not icmp though?
[15:40:51] you tried to integrate it into proxyfetch or something?
[15:41:28] ema: anyway, I was having similar shenanigans when trying to base on your outstanding changes and quickly got tired of that ;)
[15:41:41] to also ensure that the service on the host is listening on that IP, and not only on the host IP for example
[15:41:51] volans: can do that with icmp too?
[15:41:52] mark: heh, fun times
[15:42:13] volans: oh, port open
[15:42:15] you're right
[15:42:25] yes, that's what I meant :D
[15:42:43] mark: https://gerrit.wikimedia.org/r/#/c/354746/ looks good now I think
[15:43:00] I did it in C because in the long term I might have tried to make it a plugin for keepalived for my italian friends, but that never happened :)
[15:43:42] gracias!
[15:43:56] i think twisted has some classes for raw packet generation too
[15:44:08] technically it can be used as an external check, or rewritten in python with scapy
[15:46:59] * volans bbiab
[15:52:09] also, icmp isn't what functionally matters, it's just another (helpful) artificial sort of test
[15:52:48] it doesn't _also_ bother the service though ;)
[15:52:58] bothering the service is good :)
[15:54:16] I don't know how deep you'd have to dig in C (and then in python for the same interfaces), but there's probably a way to set up the (e.g. TCP) monitoring traffic to use the real service destination IP and the realserver's mac and TTL=1
[15:54:28] which is about as close as you can get to monitoring what you're actually relying on
[15:56:11] https://gerrit.wikimedia.org/r/#/q/project:operations/debs/pybal+branch:vipping
[15:56:18] that didn't work
[15:57:59] when I did it I had to manually set the mac address at layer 2 to the backend server, but maybe there are smarter ways ;)
[15:58:30] it's been 10 years anyway, needs new investigation ;)
[16:15:56] Traffic, Operations: Refactor pybal/LVS config for shared failover - https://phabricator.wikimedia.org/T165765#3276305 (ema) p:Triage>Normal
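A rough scapy sketch of the raw-packet check being described above: an ICMP echo to the service IP, sent at layer 2 with the realserver's MAC and TTL=1 so it cannot take a routed path. The interface, MAC and IP are placeholders, and a TCP SYN to the service port could be substituted to also verify the service is listening on that IP, as volans suggests.

from scapy.all import Ether, IP, ICMP, srp1

def ping_service_ip_via_realserver(service_ip, realserver_mac, iface, timeout=1):
    # Layer-2 send: the destination MAC is the realserver's, so the probe only
    # succeeds if that realserver is directly reachable *and* answers for the
    # service IP itself (i.e. it has the IP bound on lo).
    pkt = Ether(dst=realserver_mac) / IP(dst=service_ip, ttl=1) / ICMP()
    reply = srp1(pkt, iface=iface, timeout=timeout, verbose=False)
    return reply is not None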
[19:57:49] BBR stuff: on cp1065 things look fairly sane. there's some throttling happening in the fq qdisc per-flow, but I think we kinda expect that (throttling some of our fastest users to avoid starving other connections, hence "fair")
[19:58:31] but on cp1074 (upload), I'm also seeing some slowly-increasing counters for flows_plimit, which means the 100 packet queue limit per-flow is being exceeded and causing "artificial" dropped packets
[19:58:46] which may mean that we need some fq parameter tuning, at least for a case like upload...
[19:59:44] I'm still looking into related things, but it may or may not be a reason to hold off turning it on more broadly, come up with a plan for tuning up fq params a bit first and try again tomorrow, we'll see
[20:04:59] actually even on cp1065, there's small occasional "dropped" from the overall queue limit too
[20:05:36] and a bit more of course on cp1074, but neither's overall dropped rate is very big statistically...
[21:11:20] I went ahead and reverted (the changes and the reverts are in SAL)
[21:11:43] I can't figure out yet how to sanely change the fq parameters for the fq schedulers that are within the mq multiqueues
[21:12:03] all related tc things are poorly documented and work awkwardly different under multiqueue
[21:13:41] so basically, BBR turn-on aborted for now. I don't want to risk the fq drops causing a problem. we need a sane way to raise the per-queue and per-flow limits in the fq parameters underneath multiqueue (mq)
[21:14:14] I'll look more today and/or early tomorrow (on a non-prod machine) to see if I can figure out the issues with configuring them
[21:16:03] ema: if you see this and want to look for an answer before I find one: cp4021 is a good testbed (not in prod service, has bnx2x hardware with multiqueue on like the rest of the machines, etc). "tc qdisc show dev eth0" there will show the root mq scheduler and the many fq queues that are attached. the question is how to change the "limit" and "flow_limit" params of those sanely without breaking
[21:16:09] multiqueue (it creates all those subqueues using the global default qdisc with its default parameters...)
[21:18:56] I got close to an answer (at least on the CLI rather than puppetized), you can replace all the fq's one by one and use new params
[21:19:11] but I'm not comfortable yet that that approach doesn't mess something up with mq's specialness
[21:19:40] (as the default fq's all share handle "0:", and the replacement ones get unique ids like "8004:", "8005:", etc...)
[21:20:17] that method was like:
[21:20:20] tc qdisc replace dev eth0 parent 8001:1 handle 0: fq flow_limit 200
[21:21:18] ^ replaces what was the shared handle 0: underneath 8001:1 (one of the queues) with a new auto-generated distinct handle (e.g. "8004:"), with fq installed with the customized parameter (in that case, flow_limit 200)
[21:21:51] to reset everything back to defaults using the current default qdisc for all the subqueues, no matter what you've messed up, you can do:
[21:21:58] tc qdisc del dev eth0 root
[21:22:09] and it will delete everything and re-create the root mq and so-on...
[21:23:22] I'm clearly missing some understanding of some magic about why the many fq's underneath mq share the handle number "0:"
[21:24:43] (and a side note: a much easier way to deal with this problem would be if the kmod tc-fq had module parameters to change the defaults. the defaults are compiled in statically. patching that might be an option if nothing else works sanely)
[21:39:16] https://groups.google.com/forum/#!msg/bbr-dev/b-7NW-ACb5U/sE4Kz8ESAgAJ
[21:40:09] ^ this asks a similar question, from someone facing the exact same problem. I couldn't find it in google searches before while I was messing with everything, of course now I find it :P
[21:42:09] they seem to go with a non-default setup, where you iteratively create all the sub-fqs with the desired parameters one by one. So... it can be done and we know how to do it sanely. puppetizing that would be tricky though, it really has to be a script. something like interface-rps (or perhaps, since the mq+fq madness is induced by interface-rps -related things anyways, perhaps we can just add it
[21:42:15] there...) I'll look more later...
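A rough sketch of the kind of script being discussed, assuming the "replace each sub-fq one by one" approach from the bbr-dev thread: it parses "tc qdisc show" output to find the mq sub-classes and re-installs fq with tuned limits under each. The output parsing, parameter values and interface are assumptions, not a tested interface-rps change.

import re
import subprocess

def retune_fq_under_mq(dev="eth0", limit=10000, flow_limit=200):
    # Each mq sub-queue shows up as "qdisc fq <handle> dev eth0 parent 8001:N ...",
    # one per hardware tx queue (exact output format assumed here).
    out = subprocess.check_output(["tc", "qdisc", "show", "dev", dev]).decode()
    parents = re.findall(r"qdisc fq \S+ dev \S+ parent (\S+)", out)
    for parent in parents:
        # Mirror the manual command above: replace the default fq under this
        # sub-queue with one carrying our limit/flow_limit parameters.
        subprocess.check_call([
            "tc", "qdisc", "replace", "dev", dev,
            "parent", parent, "handle", "0:",
            "fq", "limit", str(limit), "flow_limit", str(flow_limit),
        ])

As noted above, "tc qdisc del dev eth0 root" still resets everything back to the defaults if this makes a mess.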