[02:53:12] Domains, Traffic: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780124 (KATMAKROFAN)
[03:04:12] Domains, Traffic, Operations: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780141 (Legoktm) Open>declined If/when https://meta.wikimedia.org/wiki/Wikidirectory is approved (or it looks likely to be approved) then it would make sense to buy the domain. At this time fi...
[03:08:52] Domains, Traffic, Operations: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780145 (KATMAKROFAN) 19 support, 2 oppose is not "unlikely to be approved".
[03:15:21] Domains, Traffic, Operations: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780146 (Legoktm) Please compare that to https://meta.wikimedia.org/wiki/Wikivoyage/Archive/2012-11-16#People_interested and https://meta.wikimedia.org/wiki/Requests_for_comment/Travel_Guide
[03:16:06] Domains, Traffic, Operations: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780124 (Dzahn) I suggest contacting the [[ https://meta.wikimedia.org/wiki/Legal | Legal ]] team directly about this.
[09:42:04] Traffic, Operations, ops-ulsfo: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3780516 (BBlack)
[09:42:06] Traffic, Operations, Patch-For-Review: rack/setup/install lvs400[567].ulsfo.wmnet - https://phabricator.wikimedia.org/T178436#3780513 (BBlack) Open>Resolved a:BBlack These are fully in-service now
[09:43:28] Traffic, Operations, ops-ulsfo: decommission lvs400[1-4].ulsfo.wmnet - https://phabricator.wikimedia.org/T178535#3780533 (BBlack) These are now non-primary, but still active as backups for now. Will switch to spare role and remove from router configs post-Thanksgiving and then real decom can start.
[10:12:07] so what does `blacklist modulename` actually do?
[10:12:45] the explanation in modprobe.d(5) could just as well be written in Klingon
[10:13:00] lol, yeah
[10:13:46] I made those patches because, while I was hopping around lvs400[567] the other day, I kept accidentally typing "iptables -vnL" instead of "ipvsadm -Ln"
[10:14:07] and even a readonly iptables command loads the modules, which I'm pretty sure can screw up traffic efficiency even with zero rules :P
[10:14:37] and then I thought about blacklisting, but when I tried it manually using the method puppet would use, it didn't do anything
[10:14:46] (about my current problem with "iptables" invocation)
[10:16:55] ema: welcome back, I've something for you when you're done with email/backlogs ;)
[10:17:24] volans: I hope it's a gift!
[10:17:37] sure! it's a Puppet gift ;)
[10:17:54] :)
[10:18:27] so we've added to varnish::instance the Icinga "Varnish child restarted" check
[10:18:45] the current semantics of blacklisting a kmod are limited to preventing auto-loading (e.g. by opening an AF_SCTP socket and then auto-loading the sctp kernel modules), it doesn't prevent a root user explicitly loading the module via insmod or modprobe
[10:19:03] unfortunately that one is a define, and it's called twice per varnish host (FE+BE)
[10:19:19] generating duplicate service checks in the Icinga config
[10:20:13] so either they should have the instance "type" in the description to make them unique per host, having 2 separate checks (meh)
[10:20:47] or find some other magic place to include the check if it has to be only one per host and not per varnish instance ;)
[10:21:09] volans: it sounds like having frontend/backend in the description would be a good solution
[10:21:21] or was it a trick question? :)
[10:23:21] lol, no trick, just that now you have one check only, that checks only one value of $inst :D so yes, you probably want 2 checks
[10:23:29] unless you want to check both of them in a single check somehow
[10:25:50] yeah it makes sense to have two checks I think
[10:26:29] ok I'll send a patch
[10:27:05] thank you for spotting this!
[10:27:31] just luck, I was doing a sanity check of my changes in icinga and ran the config check ;)
[10:28:43] moritzm: can you think of any case in which we'd want `blacklist modulename` but not `install modulename /bin/true`?
[10:28:51] moritzm: I gather from the (horrible) debian modprobe.d blacklist docs that the prevention of auto-loading doesn't kick in until update-initramfs -u and maybe a reboot? but not sure.
[10:29:05] and then yeah there's the install->true hack to prevent other sources of loading
[10:29:26] which is also documented here https://wiki.debian.org/KernelModuleBlacklisting
[10:29:31] I don't know if the iptables->x_tables case is true autoloading, or the iptables CLI is doing an explicit modprobe or something
[10:32:51] there's a couple of issues overlaying each other (which makes all that more confusing, it's quite a mess):
[10:34:02] [in any case, user expectation would be "configured blacklist of module X means it can never be loaded by any means, starting right now", which clearly isn't true]
[10:34:11] the example given in https://wiki.debian.org/KernelModuleBlacklisting about ipv6 is indeed a special case, you can't blacklist network-related kernel aliases for autoloading, Debian disables a few bizarre ones for security reasons (e.g. Acorn and ROSE) and that needed to be done via explicit patches
[10:35:07] to add even more stupidity on top, the Linux network maintainers refused to merge the patches for whatever reasons
[10:37:13] update-initramfs should only be needed for blacklisting kernel modules which change some machine state during early boot, for a booted system they should all be loaded from /lib/modules AFAICT
[10:38:13] as for "can you think of any case in which we'd want `blacklist modulename` but not `install modulename /bin/true`"; it's probably fine for all the modules we're currently blacklisting
[10:38:26] which is for
[10:38:42] - preventing hardware errors by broken modules (like some of the perf stuff on older Dell)
[10:39:31] - preventing privilege escalation by abusing some unused/buggy kernel code
[10:40:15] if we really need to load such a module (e.g. to check whether a hardware error still exists after a firmware update) we can just as well scp the module to the host and use insmod
[10:42:37] yeah, just straced a dummy modprobe and it's all going towards /lib/modules
[10:43:14] also blacklisting in the initrd would solve cases where e.g. some perf module is autoprobed/loaded early on and then changes some ACPI table setting or so
[10:44:14] will look over https://gerrit.wikimedia.org/r/#/c/392644/ later on
[10:55:23] ema: I've added https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1 to the dashboard, feel free to modify/move it as needed
[10:56:32] good idea!
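For reference, the two modprobe.d directives discussed above (`blacklist modulename` and the `install modulename /bin/true` hack) can be sketched as generated config text. This is a minimal Python sketch, not what the Puppet patch actually does: the function name `render_modprobe_conf`, its `block_explicit_load` parameter, and the choice of `x_tables` as the example module are illustrative assumptions.

```python
def render_modprobe_conf(module: str, block_explicit_load: bool = True) -> str:
    """Return modprobe.d config text for one kernel module (hypothetical helper)."""
    lines = [
        # Only stops alias-based auto-loading; an explicit modprobe or insmod
        # by root still loads the module.
        f"blacklist {module}",
    ]
    if block_explicit_load:
        # Makes `modprobe <module>` run /bin/true instead of loading the module.
        # insmod with an explicit path to the module file still bypasses this.
        lines.append(f"install {module} /bin/true")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example module name is an assumption taken from the iptables discussion.
    print(render_modprobe_conf("x_tables"), end="")
```

The output would go into a file under /etc/modprobe.d/; whether update-initramfs -u is also needed depends on whether the module is loaded during early boot, as noted in the discussion.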
[10:57:06] it's on the bottom-ish right :D
[10:58:32] volans: lgtm and to pcc
[10:58:41] great, merging
[11:03:49] ema: there you go: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=cp2002
[11:04:13] \o/
[11:04:16] both checks, each with the link to grafana (the folder icon, I know it's horrible, there is a patch to change it) ;)
[15:01:28] ema: I was thinking (for that next patch) something along the lines of: (1) put the HFM behavior w/ 67s in there unconditionally for exp (2) Add the cutoff like it is currently in upload-frontend under nhw (but I guess in common VCL, and not-hfp-ing CL-less responses, just CL>x ones) and (3) remember not to switch to exp on v4 clusters (doesn't seem worth preventing this with conditionals)
[15:25:31] Traffic, Gerrit, Operations, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775204 (MoritzMuehlenhoff) >>! In T180978#3777053, @elukey wrote: > 2) Since the experimental tag has been removed only recently I strongly suggest to use a recent ver...
[17:08:51] Traffic, Gerrit, Operations, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3781833 (demon) p:Triage>Lowest gerrit2001 is running stretch, but we haven't reimaged the master cobalt yet (cf T176774). Given that, plus the fact that this...
[17:09:16] Traffic, Gerrit, Operations, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3781837 (demon)
[19:29:19] Traffic, ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782495 (awight) @hoo Wondering if you wrote an incident report, that I can add to with an explanation of ORES's involvement?
[19:38:14] Traffic, ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782508 (BBlack) No, we never made an incident rep on this one, and I don't think it would be fair at this time to implicate...
[19:40:59] Traffic, ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782516 (awight) @BBlack Thanks for the detailed notes! All I was going to add was my understanding of how Ext:ORES has the...
[19:42:15] Traffic, ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715229 (Zoranzoki21) Does it made problem with high sleep times in pywiki?
[19:45:56] Traffic, ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782522 (demon) >>! In T179156#3782516, @awight wrote: > @BBlack Thanks for the detailed notes! All I was going to add was m...
[20:21:34] Traffic, Gerrit, Operations, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3782577 (Dzahn) >>! In T180978#3781833, @demon wrote: >> I'm proposing we lower the priority on this and let another service (preferably one with less depending on it)...
[20:53:31] Traffic, Operations, ops-ulsfo: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3782634 (BBlack)
[20:53:34] Traffic, Operations, ops-ulsfo, Patch-For-Review: rack/setup/install cp40(29|3[012]).ulsfo.wmnet - https://phabricator.wikimedia.org/T178423#3782633 (BBlack) Open>Resolved
[21:01:54] Traffic, Operations, ops-ulsfo: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3782641 (BBlack) Recapping where we're at on all things here, because even I get lost sometimes: Of the old hosts being decommed, the only one still in live use are: * bast4001 (blocking on bast40...