[12:46:36] Traffic, Operations, Pybal, Patch-For-Review: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815#3687153 (ema) Open>Resolved a:ema PyBal upgraded to 1.14.2 on all LVS hosts.
[12:46:57] Traffic, Operations, Pybal, Patch-For-Review: RunCommandMonitoringProtocol throws an exception if runcommand.arguments is not specified - https://phabricator.wikimedia.org/T178149#3687156 (ema) Open>Resolved a:ema PyBal upgraded to 1.14.2 on all LVS hosts.
[14:06:24] bblack, moritzm: https://gerrit.wikimedia.org/r/384520
[14:07:16] unfortunately there's no SystemCallFilter on jessie
[14:09:28] it's actually documented in the man page but silently ignored (I found out the hard way and then asked Moritz who said "yes of course!") :)
[14:10:42] we could enable it in a +wmf systemd build (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760299 for the backstory), but then we'd need to rebuild future systemd updates in jessie as well, and time seems better spent moving cp* to stretch
[14:11:10] I'll have a look at the patch later or tomorrow morning
[14:12:07] ema: have you tested shm/reload stuff under those settings?
[14:12:22] moritzm: yeah let's wait for stretch before enabling seccomp filtering
[14:12:58] I guess by shm I mean varnishlog and friends
[14:13:44] bblack: hi! So, VCL reloads work fine. varnishlog/ncsa too I think
[14:13:52] let me double-check (settings enabled on pinkunicorn)
[14:14:31] yup, confirmed
[14:20:48] BTW, ReadWritePaths might also be an option (limiting to /srv and wherever else varnishd might write to)
[14:22:07] moritzm: I don't think that's available on jessie; at least it's not documented
[14:23:42] ah, it seems to have been renamed in 231:
[14:23:57] The InaccessableDirectories=, ReadOnlyDirectories= and
[14:24:00] ReadWriteDirectories= unit file settings have been renamed to
[14:24:00] InaccessablePaths=, ReadOnlyPaths= and ReadWritePaths=
[14:24:11] ah!
[14:24:46] however, how they'll handle backwards compat remains to be seen...
[14:26:40] what is this "compatibility" you speak of? all systems are laptops running Fedora updated to the bleeding edge nightly
[14:28:06] gh
[15:41:31] I was about to carry on with v5 upgrades in ulsfo-misc
[15:41:41] then I realised there's no such thing :)
[15:42:23] codfw next then
[15:52:48] :)
[15:54:44] in light of the continuing complaint about commons delete->purge failures, and our lack of ability to find a solution so far
[15:55:19] I've been updating vhtcpd to make sure we can notice any possible failure at that level, and so that it has its own delay mechanisms to work around cache layer layers
[15:55:23] err, cache layer races
[15:55:50] nice
[15:56:03] so we'll be able to have the local vhtcpd daemon delay by configurable amounts (e.g. more delay as we branch upwards from the primary DC, and a fixed be-to-fe delay)
[15:56:27] it's not a perfect solution, but if layer races are a factor, it should make them statistically unnoticeable
[15:56:51] (which they would normally be anyways, except perhaps for hot objects, which the complaints are about)
[15:57:33] I've seen examples of the failures where it's the backend-most cache that failed to purge though. So I don't really believe this solves the problem. It's probably a [swift/mw]-vs-[backendmost-cache] sort of race, if any kind of race
[15:57:56] but maybe a small delay at the backend-most caches would paper over that anyways, who knows. it's due diligence to make sure we've tried everything we can on our end within reason.
[15:59:07] I'm still obsessing my way through QA-checking all the new code, but I think it's already in good shape, I've been testing it on cp1008 over the weekend pretty heavily
[15:59:38] if you'd like to comb through it and look for stupidity: https://gerrit.wikimedia.org/r/#/projects/operations/software/varnish/vhtcpd,dashboards/default
[16:00:15] (but honestly, these diffs are not very amenable to easy review... it's C, and a lot changed, and I squished commits during the process :/)
[16:02:52] bblack: vhtcpd's Makefile says "how to build gdnsd" :)
[16:03:33] hah, yeah a lot was copied a long time ago for boilerplate
[16:04:09] wow the changelog is pretty long!
[16:04:29] yes, and a couple of the changes basically rewrite a couple of the major source files, so....
[16:04:42] it is what it is, sometimes you just have to rush through things :P
[16:04:55] fixed makefile comment :)
[16:09:48] looks awesome, I'll finish up misc_codfw and give a closer look later
[16:10:18] the stats output stuff changed completely too, so we'll have to fix up the prometheus stuff
[16:10:34] but we have no grafana dashboards currently I don't think?
[16:23:21] yeah I don't think we have any at the moment
[16:23:29] so this is weird, look:
[16:23:32] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=2&fullscreen&orgId=1&var-site=codfw&var-cache_type=misc&var-status_type=1&var-status_type=2&var-status_type=3&var-status_type=4&var-status_type=5
[16:23:45] we've got a ton of 301s in misc_codfw
[16:24:02] they seem to be:
[16:24:03] [16/Oct/2017:16:23:48 +0000] "GET http://stream.wikimedia.org/socket.io/1/ HTTP/1.1" 301 0 "-" "Java/1.8.0_60-ea"
[16:24:24] however, they don't show up in nginx stats
[16:24:44] eg: https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?orgId=1&var-server=cp2018&var-datasource=codfw%20prometheus%2Fops&panelId=65&fullscreen
[16:25:51] oh sure, they don't show up on nginx stats because the client is hitting port 80 (varnish) :)
[16:27:27] right, so someone's failing to use https in their initial URL
[16:27:31] and the IP is, surprise surprise, GCE
[16:27:33] I'm guessing it's just one client or something?
[16:28:35] in any case, the huge volume of 301s there doesn't seem to be new
[16:28:43] it's mostly that Java UA, from a couple different IPs (both GCE)
[16:39:16] related to the above about varnish port 80. we have some old pending work to stop doing that and move port 80 to nginx too
[16:39:47] it's blocked on the noncanonicals redirect service thing, which I think Krenair has some patches for that I've failed to look at in depth yet
[16:39:59] I still want to get that done this quarter somehow, from the back burner or whatever
[16:54:43] oh yeah
[16:55:08] bblack: https://gerrit.wikimedia.org/r/#/c/382873/7/src/Makefile.am is it OK that -lev and friends are gone?
[16:59:53] gotta run! see you
[17:18:19] ema: yeah, the AC_CHECK_LIB() stuff in https://gerrit.wikimedia.org/r/#/c/382873/7/configure.ac adds them back the "right" way, the old entries in Makefile.am were manual hacks
[19:07:15] Traffic, Operations, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3688555 (Pigsonthewing) Sad to see this claim made in 'Tech News' today: > If you use Internet Explorer 8 on Windows XP you can...
[19:15:08] Traffic, Operations, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3688584 (Johan) @Pigsonthewing Yes, Tech News deals in simplification, for a number of reasons (non-native speakers without access...
[19:16:36] Traffic, Citoid, Operations, RESTBase, and 5 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#3688587 (mobrovac)
[21:02:28] Traffic, Operations, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3688928 (Pigsonthewing) > simplification... at the cost of precision > "you" instead of "you or someone with administrator acces...