[08:14:55] volans: there's a new dstat-varnishstat patchset available for your reviewing pleasure :) [12:57:10] alright we can now use dstat --varnishstat to monitor varnish counters + expiry lag (b_exl) [12:57:35] and f_exl for backend and frontend respectively [12:57:40] awesome [12:57:48] so back on the geoip inline C stuff [12:58:00] what didn't work with a trivial conversion? was it just the Vmod_func_header stuff? [12:58:10] yes [12:59:06] Vmod_func_header seems to be gone essentially, and I haven't found out how to do that in v4 after a quick web search [13:00:06] try s/Vmod_Func_header/Vmod_header_Func/ ? [13:00:48] (I found that from some git digging on v3 vs v4, but I haven't looked deeply either) [13:01:24] let's try [13:02:07] probably some args changes too, at least the normal s/sp/ctx/ [13:02:08] bblack: that was it :) [13:02:18] cool [13:02:46] yeah well so unless other surprises come up I guess the inline-C part is done as well [13:04:52] so let's confirm noop-ish on eqiad+esams text nodes in pcc and then push it through and we can work from there cleaning up anything further [13:05:32] probably this time around, we upgrade cp1008 and use it for a lot of the early work? 
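[editor's note: the rename ema and bblack converge on above is mechanical enough to sketch as a substitution — a hedged sketch only; the real forward-port likely needs more than these two replacements, as bblack notes about other argument changes, and the `append(sp, hdr)` call here is a made-up example, not the actual geoip code:]

```shell
# Varnish 3 generated a function table named "Vmod_Func_header" for the
# header vmod; Varnish 4 flips it to "Vmod_header_Func", and inline-C call
# sites take a context pointer ("ctx") where v3 passed the session ("sp").
echo 'Vmod_Func_header.append(sp, hdr);' \
  | sed -e 's/Vmod_Func_header/Vmod_header_Func/' -e 's/(sp,/(ctx,/'
# -> Vmod_header_Func.append(ctx, hdr);
```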
[13:05:58] yup [13:09:39] I'm still staring at better ways to factor out the role::cache classes [13:09:47] so much refactoring to do, but none of it's trivial heh [13:10:05] we probably want the weekly backend restarts and some of the rest of the LRU mitigation on all v4 in the general case [13:11:02] I don't think text will end up needing the storage split stuff (I hope not, because I don't think we commonly get Content-Length there anyways) [13:11:27] but at least giving it weekly restarts and the lru/nuke parameter limit changes will help avoid an eventual issue there [13:13:03] it doesn't seem like the new puppet coding standards have a ton of impact initially [13:13:19] in theory all our roles become profiles, and then we get 4x new roles just for the 4 clusters [13:13:27] (which are simple roles that invoke a single profile) [13:14:04] ema, bblack: In about 10 minutes you should both have a new version of my thesis. I improved a lot. Still some things I have to do (Table Labels, Related Work, Graphs on right page etc.) but the writing should be ok hopefully. Again I would love to get your feedback, you two are pretty much the only people who can evaluate if I explained the caching correctly and so on. So pretty please :) [13:14:09] it still doesn't seem to offer a solution for the complex-middle problem or whatever you want to call it [13:14:25] Snorri: will look, thanks :) [13:14:55] in an ideal world, modules/varnish is much simpler than its current form, and simply encapsulates running varnish instances in a generic non-WMF way [13:15:35] and then Something does a very very complex job of aggregating up that + tlsproxy into the basic caching-proxy node with lots of configuration parameters [13:15:53] and then the profiles/roles consume that and give it parameters taken from explicit hiera calls, or something [13:16:31] Oh and when I'm done I want to spend some of my free time learning vcl and so on.
So hopefully I can soon not only try to understand the overall concepts but the implementation and maybe even contribute in some way! At least that's the plan. [13:16:49] the new puppet standard argues for Something being a profile in the sense that it's what's mixing up the modules/varnish + modules/tlsproxy magic [13:17:20] but it also argues against Something being a profile, because we need Something to have lots of parameters that aren't looked up in hiera until we get to a cluster-specific profile... [13:17:27] I think? [13:19:33] _joe_ probably knows :) [13:20:31] probably, but he also tends to be pretty busy, so I try to get as far as I can first before I ask :) [13:20:46] maybe I'm thinking about things wrong with where the hiera() calls land [13:21:19] (probably) [13:21:38] ignoring that part of the problem, we still have 4x "module aggregating profiles" in the form of cache::text, cache::upload, etc.... [13:22:00] which share a *ton* of common complexity which only exists at the aggregate level, not at the wmf-unspecific modules/software level [13:22:13] and it doesn't sound like profiles are meant to inherit or anything [13:22:46] they can require, with no params [13:23:45] or the alternative is we try to factor it out where there's only one "profile", cache::node or whatever [13:24:08] and everything that differentiates the 4x clusters is driven by hieradata lookups within (not sure how possible that is, but it's probably doable) [13:24:28] and then we have the 4x roles, all of which use that profile, and the per-cluster params are in the role-specific hieradata [13:25:15] but that one profile, if it were one file/class, would be huge and insanely complex.
it would be nice if it could be a full-fledged module somewhere, just in terms of splitting over many files, and having those files contained and isolated well in the directory structure [13:27:12] maybe the argument here is that the best rule to break is the one about modules/ [13:27:40] that we can do some wmf-specific encapsulation in modules/cacheproxy or whatever, to reduce the complexity burden at the profile level [13:27:56] (and modules/cacheproxy gets to break two rules: it's wmf-specific, and it uses modules/varnish + modules/tlsproxy + etc) [13:29:26] _joe_: low priority backlog to read if you get bored ^ :) [13:29:57] <_joe_> bblack: ack :) [13:32:34] <_joe_> bblack: I should really look again at the cache roles, but I guess you could have a basic common profile, which is roughly role::cache::base IIRC [13:33:14] it's poorly factored, "base" isn't really a base, it's more like a collection of common things that don't need much specific parameterization, but it's used via inheritance [13:33:26] <_joe_> ugh [13:33:37] <_joe_> ok, we have to get rid of inheritance, probably [13:33:45] <_joe_> so the idea could be something like: [13:33:48] nevermind the current structure [13:34:00] it's better just to think in terms of fundamental complexity and parameterization, etc [13:34:50] <_joe_> things that really belong to managing varnish (like varnish::instance) in the module; then have one base profile (maybe separated for varnish 3/4?
I dunno) that sets up a basic varnish installation [13:35:12] <_joe_> and then one profile for each of the different types of caches which installs the specific vcl [13:35:21] <_joe_> and anything else you might want to change [13:36:16] <_joe_> since all parameters you might want to configure are explicitly looked up in hiera, it's also easy to have some config to be different between role::cache::text and role::cache::upload [13:36:49] <_joe_> but I remember that the story was much more complex than what I just said :P [13:38:33] so the way I see it is this: [13:38:45] (in terms of problems) [13:39:14] 1) The current "modules/varnish" actually contains a ton of wmf-specific things. not just hieradata lookups and datafiles, but also functionality [13:39:43] 2) Even if we ignore that, look at all the manifests under modules/role/manifests/cache/ (and their corresponding templates, etc)... it's a ton of stuff as well [13:40:21] 3) And then once all that complexity is brought together, the 4x cache roles differ in substantial ways (which I think could be entirely hieradata-driven, but I'm not 100% sure till I try) [13:40:57] <_joe_> yes, that's not one of the modules I'd start to refactor; it's a ton of work [13:40:57] even modules/tlsproxy really isn't all that pure, but it's closer.
it's still only used by the cp hosts though, because it's fairly specific to them [13:41:32] if we made modules/tlsproxy and modules/varnish "purer" where they're more isolated and non-wmf-specific and reusable, a bunch more complexity piles up at what's the current modules/role/manifests/cache/ level [13:41:36] <_joe_> I don't care much for "purity" as for "no hiera calls and no classes included from other modules" rule [13:42:18] and if we shoe-horn this into the proposed model, probably the best fit is we end up with an insanely-complex "profile" class that needs like 40 separate included classes or manifest files within it, etc [13:42:42] (if we're going for purity) [13:43:41] so my best current thought above is we make an explicitly impure modules/cacheproxy, which isolates things in directory tree terms and tries to handle a lot of the impure wmf-specific complexity of bringing together our wmf-specific configuration of modules/varnish+modules/tlsproxy for the cache proxies [13:44:01] and maybe improve the purity of modules/varnish and modules/tlsproxy in the process, shifting the impure things towards modules/cacheproxy [13:44:31] <_joe_> which will need to be configured, I guess [13:44:33] and then the profile that uses modules/cacheproxy can be considerably less complex, and mostly deal with pulling in cluster-specific hieradata to parameterize the Class instantiations [13:45:00] (I'd hope we can still keep config/hieradata out of modules/cacheproxy) [13:45:02] <_joe_> well, I would've called cacheproxy a profile, but it's purely naming at this point [13:45:19] it does make sense logically as a profile [13:45:44] <_joe_> because you could then have the role classes include the profile::cacheproxy and the specific profile::upload [13:46:01] it's just it would end up being a way-too-complex profile to be in an inter-mixed directory structure with a crapload of files polluting in dirs like modules/profile/manifests/cache/ + modules/profile/templates/cache/ +
... [13:46:27] <_joe_> well, at this point it's mostly naming [13:46:40] <_joe_> I'm ok with both :) [13:46:41] I think, probably upload/text/misc/maps are just "roles", and they all include the same profile [13:46:56] their job is just to let hieradata lookups differ based on role, for the hiera() calls in the singular profile [13:47:27] <_joe_> I would in general try to avoid the single-profile-for-all pattern though, it can make it overcomplex. [13:47:37] most of the complexity is shared, though [13:47:46] <_joe_> in that case :) [13:48:31] the diff between the 4x clusters is mostly data. there's a few functional bits like "do we include cache::kafka::eventlogging on this cluster?", but those can still be a hieradata boolean enclosing the functional stanza [13:48:37] and those are relatively rare [13:50:41] is there any rule about avoiding namespace "collisions" between profiles, roles, and modules? I know they wouldn't collide inside puppet since profiles and roles are under a subpath, but it seems probably-confusing on some level [13:51:00] modules/cacheproxy + modules/profile/manifests/cacheproxy.pp [13:51:15] <_joe_> I don't think it would [13:51:28] <_joe_> I mean we already have a ton of module foo => role::foo [13:51:30] I guess it's also unconfusing in that it shows the obvious relationship [13:54:26] I guess, even in the new standards, a role could still have role-specific templates/ and files/, and use them as parameters to ::profile::foo?
[14:03:15] <_joe_> uhm I'm not sure that would be the idea, no [14:03:56] well, a file or a template-file is still just a piece of data clusters can differ on when invoking otherwise-shared profiles/modules [14:04:11] <_joe_> yes, you're right, I just found a few applications [14:04:13] <_joe_> it makes sense [14:04:30] I'm not sure how it looks in puppet terms to try to use them as params though [14:05:00] <_joe_> $file_with_ca = hiera('something') [14:05:05] I mean I guess 'foo => template(....)' referencing modules/role/templates/blah.erb, and the called class can then use it in a file statement [14:05:21] <_joe_> file { $ca_path: source => $file_with_ca, } [14:05:54] I guess [14:06:01] <_joe_> where $file_with_ca will be 'puppet:///modules/role/something/bla.pem' [14:06:14] it seems odd to use hiera that way, but it's the same basic thing [14:06:23] <_joe_> it's not odd, per se [14:06:41] <_joe_> you're just defining parameters, after all [14:07:03] <_joe_> surely better than using those weird variables like $caller_module_name [14:07:15] e.g. 
hieradata/role/common/cache/text.pp would have file_with_ca => 'puppet:///modules/role/cache/text/bla.pem' [14:07:26] <_joe_> yes [14:07:43] and manifests/role/cache/text.pp would have (in the params of ::profile::foo): foo => hiera('file_with_ca') [14:08:17] why not just have manifests/role/cache/text.pp have (in the params of ::profile::foo): foo => 'puppet:///modules/role/cache/text/bla.pem' [14:08:30] <_joe_> no, role/cache/text.pp will have just "include ::profile::foo", hieradata/role/common/cache/text.yaml will have that [14:08:33] it just seems like an unnecessary indirection [14:08:48] oh right, I forget that the profile isn't the parameterized thing [14:08:55] <_joe_> yes :) [14:09:01] <_joe_> there is not a lot of indirection [14:09:32] <_joe_> you go to a role, see which profiles are defined, see its hieradata, and you should be immediately able to know how the profiles are configured [14:09:50] <_joe_> honestly I think the LVS module/role should be refactored first [14:10:03] <_joe_> but that's one case where I'd expect map() to be available [14:10:16] <_joe_> I suspect that will make our lives much, much easier there. [14:10:20] yeah [14:10:32] <_joe_> I'd wait, not expect [14:10:37] I'm not fond of where it's at today. I think some of the hiera-ization we've done for LVS-related things is actually a step backwards [14:11:00] IPs end up defined in two places, for instance [14:11:10] <_joe_> it has been invaluable though to get the same info in other parts of the code [14:11:16] <_joe_> the lvs::configuration thing [14:11:35] <_joe_> are they? didn't we use yaml references to avoid double-definitions?
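[editor's note: the convention _joe_ settles on at 14:08 might look roughly like this — hypothetical class names and file paths, a sketch of the role/profile/hieradata split rather than actual repo content:]

```puppet
# modules/role/manifests/cache/text.pp -- roles are just profile includes
class role::cache::text {
    include ::profile::cacheproxy
}

# modules/profile/manifests/cacheproxy.pp -- the profile does the explicit
# hiera() lookups for its parameters
class profile::cacheproxy (
    $file_with_ca = hiera('profile::cacheproxy::file_with_ca'),
) {
    file { '/etc/ssl/localcerts/ca.pem':
        source => $file_with_ca,
    }
}
```

hieradata/role/common/cache/text.yaml would then carry the cluster-specific value, e.g. `profile::cacheproxy::file_with_ca: 'puppet:///modules/role/cache/text/bla.pem'`, so the role file itself stays a one-line include.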
[14:11:52] no, I mean across the configuration of the LVS service itself (for pybal) and the actual service nodes [14:12:08] we have all the IPs for the pybal view of things in hieradata/common/lvs/configuration.yaml [14:12:38] and then we have things like: bblack@alaxel:~/repos/puppet$ grep lvs:: hieradata/role/eqiad/elasticsearch/cirrus.yaml [14:12:41] lvs::realserver::realserver_ips: "10.2.2.30" [14:12:57] before, realserver_ips were always indirect references into the same base LVS dataset of addresses [14:13:15] that pattern of duplication is common in the repo now [14:13:24] <_joe_> no they weren't [14:13:30] <_joe_> not for lvs::realserver [14:13:40] well, in the cases I was looking at they always were [14:13:42] <_joe_> those were all carved by hand in the classes [14:13:55] <_joe_> and yes, that's surely an error [14:14:04] bblack@alaxel:~/repos/puppet$ grep lvs:: hieradata/role/eqiad/elasticsearch/cirrus.yaml [14:14:07] lvs::realserver::realserver_ips: "10.2.2.30" [14:14:09] oops wrong paste [14:14:10] fucking vim [14:14:12] <_joe_> eheh [14:14:25] <_joe_> that's the original sin, yes [14:14:31] the cache nodes still do: [14:14:32] class { 'lvs::realserver': [14:14:32] realserver_ips => $lvs::configuration::service_ips['text'][$::site] [14:14:59] I'm not sure why so many other services are now double-defining the IP, but it's an area that's hard to deal with cleanly in most solutions [14:15:20] <_joe_> bblack: because it was double-defined originally, and stupidly [14:15:33] ok [14:15:35] <_joe_> I'm pretty sure it was the case for mediawiki [14:16:05] maybe it has to do with wanting to not instantiate lvs::realserver the same way? [14:16:09] I don't know [14:16:51] <_joe_> well, this is one of the things I could've fixed in the swift profile, for example.
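[editor's note: on _joe_'s yaml-references question — anchors can deduplicate within a single hieradata file, but not across files. A hypothetical snippet:]

```yaml
# hypothetical hieradata fragment: define the service IP once as an anchor,
# then reference it, instead of repeating the literal "10.2.2.30"
cirrus_lvs_ip: &cirrus_ip "10.2.2.30"
lvs::realserver::realserver_ips: *cirrus_ip
```

YAML anchors are scoped to one document, though, so they can't bridge the duplication bblack describes between hieradata/common/lvs/configuration.yaml and the per-role files.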
[14:17:07] class { 'lvs::realserver': [14:17:07] realserver_ips => $lvs::configuration::service_ips['text'][$::site] [14:17:10] gha [14:17:20] ok, I'm not pasting things to IRC anymore until I figure out what's gone wrong [14:17:42] I think it's the fault of a change to vim's behavior, but I can't be certain, it could be my terminal too, or some underlying X-related bits [14:18:08] something in debian-unstable changed unstably, and now the normal X middle-button select/paste stuff can't get data in or out of text in vim [14:18:20] but works fine on normal terminal output in the same terminal window, outside of vim [14:18:28] something is trying to be too smart for its own good :P [14:32:34] ema: what about the FIXME on req.backend==foo ? we can't compare req.backend_hint the same way? [14:32:59] in the long run there will be better ways to factor that, we'll have to do it [14:34:28] maybe using vmod_var in vcl_recv to make a determination about which appservice the request is using, and then using simpler conditionals on that variable throughout the rest of the VCL [14:41:16] it would require some heavier refactoring, though [15:12:34] bblack: mmh no, req.backend_hint == appservers results in Backend not found: 'appservers' [15:20:23] using vmod_var seems like a good approach, but I'd keep that separate from the current forward-port patch [15:23:05] well for the purpose at hand, we could use a local throwaway header too [15:23:31] set req.http.X-backend-is-mediawiki = 0|1, and use it below and then unset it [15:23:50] well we really don't need the 0 value, undefined is good enough [15:23:56] yeah [15:24:40] probably can just od that in general, not just for v4. 
it works in v3 too and it's a bit cleaner [15:24:45] s/od/do/ [15:47:12] bblack: CR updated, it's not a noop anymore of course :) [17:31:01] 10Traffic, 06Operations: Standardize varnish applayer backend definitions - https://phabricator.wikimedia.org/T147844#2709511 (10BBlack) [17:31:04] 10Traffic, 06Operations, 13Patch-For-Review: Use hostnames (not IPs) in deployment-prep varnish app_directors - https://phabricator.wikimedia.org/T147848#2709509 (10BBlack) 05Open>03Resolved a:03BBlack [17:39:06] 10Traffic, 06Operations: Standardize varnish applayer backend definitions - https://phabricator.wikimedia.org/T147844#2709551 (10BBlack) [17:39:10] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 3 others: Move wdqs to an LVS service - https://phabricator.wikimedia.org/T132457#2709549 (10BBlack) 05Open>03Resolved a:03BBlack [17:53:24] 10Traffic, 06Operations: SSL certificate for policy.wikimedia.org - https://phabricator.wikimedia.org/T110197#2709767 (10RobH) [17:54:51] 10Traffic, 06Operations, 13Patch-For-Review: Move pybal_config to an LVS service - https://phabricator.wikimedia.org/T147847#2709790 (10BBlack) See patch above. I think we can skip over the LVS bit here, until some later date when we have more than one backend defined per-datacenter. We also can't/shouldn'... [18:25:26] 10Traffic, 06Operations, 13Patch-For-Review: Move rcstream to an LVS service - https://phabricator.wikimedia.org/T147845#2709924 (10BBlack) Hmmm, now that I'm really looking at the changes, it's pretty clear we have a bigger problem here than with most services. For some reason I can't remember, rcstream wa... 
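[editor's note: the throwaway-header idea from 15:23 might look something like this in VCL — a hedged sketch with assumed names (`cluster_fe_recv` and the `^/w/` match are illustrative, and it assumes `appservers` is a directors-vmod director in v4), not the actual text-cluster logic:]

```vcl
sub cluster_fe_recv {
    # Record the routing decision once, at the point where the backend
    # is chosen...
    if (req.url ~ "^/w/") {
        set req.backend_hint = appservers.backend();
        set req.http.X-Backend-Is-Mediawiki = "1";  # undefined means "no"
    }

    # ...then branch on the header instead of comparing req.backend_hint,
    # which fails with "Backend not found" in v4.
    if (req.http.X-Backend-Is-Mediawiki) {
        # MediaWiki-specific handling goes here
    }

    # don't leak the internal flag onward
    unset req.http.X-Backend-Is-Mediawiki;
}
```

The same pattern works under v3 with `req.backend`, which is why it can be applied before the v4 forward-port rather than only within it.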
[20:27:19] 10Traffic, 10Analytics, 06Operations: The WMF-Last-Access Set-Cokkie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710558 (10bd808) [20:27:34] 10Traffic, 10Analytics, 06Operations: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710572 (10bd808) [20:40:48] 10Traffic, 10Analytics, 06Operations: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710588 (10bd808) I've found some blog post from 2012 that discuss IE6, 7 & 8 not supporting `Max-Age` and suggesting s... [20:47:12] 10Traffic, 10Analytics, 06Operations: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710558 (10BBlack) We use expires in our `CP` cookies as well (which track connection properties for HTTP/2 stats), so... [20:48:35] 10Traffic, 06Operations: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684468 (10BBlack) [20:48:39] 10Traffic, 10Analytics, 06Operations: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710602 (10BBlack) [21:23:23] 10Traffic, 06Operations: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#2710681 (10Legoktm) [22:24:01] 10Traffic, 10MediaWiki-extensions-CentralNotice, 06Operations: Varnish-triggered CN campaign about browser security - https://phabricator.wikimedia.org/T144194#2591355 (10demon) >>! In T144194#2606379, @Legoktm wrote: > FWIW, MediaWiki has a built-in sitenotice (https://www.mediawiki.org/wiki/Manual:Interfac... 
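[editor's note: for reference, the two cookie-expiry syntaxes under discussion in T147967 — illustrative values, not the actual production headers:]

```
# Netscape-era form (the one IE6/7/8 understand):
Set-Cookie: WMF-Last-Access=11-Oct-2016; Path=/; HttpOnly; Expires=Sat, 12 Nov 2016 12:00:00 GMT

# RFC-style Max-Age equivalent (32 days = 2764800 seconds; ignored by old IE):
Set-Cookie: WMF-Last-Access=11-Oct-2016; Path=/; HttpOnly; Max-Age=2764800
```

When both attributes are present, clients that understand Max-Age are required to prefer it over Expires, which is why many sites send both.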
[22:31:46] 10Traffic, 10Mobile-Content-Service, 06Operations, 10RESTBase, 06Services: Varnish not purging RESTBase URIs - https://phabricator.wikimedia.org/T127370#2711040 (10GWicke) 05Open>03Resolved a:03GWicke Resolving this in favor of T127387. [22:32:36] 10Traffic, 10Mobile-Content-Service, 06Operations, 10RESTBase, and 2 others: Split slash decoding from general percent normalization in Varnish VCL - https://phabricator.wikimedia.org/T127387#2711047 (10GWicke) [22:52:59] 10Traffic, 10MediaWiki-extensions-CentralNotice, 06Operations: Varnish-triggered CN campaign about browser security - https://phabricator.wikimedia.org/T144194#2711135 (10Legoktm) Well, if varnish doesn't cache those requests couldn't we just UA sniff in MW config and conditionally set the sitenotice based o... [23:23:05] 10Traffic, 06Operations, 06Services, 07discovery-system, 05services-tooling: Figure out an etcd deploy strategy that includes multi DC failure scenarios. - https://phabricator.wikimedia.org/T98165#2711361 (10GWicke) [23:34:42] 10Traffic, 06Operations, 10RESTBase, 06Services, 07Service-Architecture: Proxying new services through RESTBase - https://phabricator.wikimedia.org/T96688#2711407 (10GWicke) 05Open>03Resolved a:03GWicke Yeah, there isn't much useful life in this one left. Closing.