[00:06:31] 10Wikimedia-Apache-configuration, 10Wikimedia-Site-requests, 06Wikisource, 07Community-consensus-needed, 13Patch-For-Review: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2229611 (10Deskana) Thanks for pinging me, @Billinghurst. I'd be happy to...
[10:15:36] 10Traffic, 06Operations, 13Patch-For-Review: confctl: improve/upgrade --tags/--find - https://phabricator.wikimedia.org/T128199#2230144 (10Joe) GET: ``` $ confctl --config conf.yaml select 'dc=codfw,cluster=api_appserver,name=mw20(7[6-9]|[8-9][0-9]).*' get {"mw2076.codfw.wmnet": {"pooled": "yes", "weight": 2...
[13:52:46] 10netops, 06Operations: HTCP purges flood across CODFW - https://phabricator.wikimedia.org/T133387#2230372 (10akosiaris)
[15:50:47] 10Traffic, 10Analytics, 10DNS, 06Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2230693 (10Nuria) This is what I think needs to be done here to resolve this ticket as soon as possible: (cc-ing @BBlack and @Ottomata for confirmation) - host analytics.wikimedia.org...
[16:06:07] 10Traffic, 06Analytics-Kanban, 10DNS, 06Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2230731 (10Nuria)
[16:15:20] 10netops, 06Operations, 13Patch-For-Review: block labs IPs from sending data to prod ganglia - https://phabricator.wikimedia.org/T115330#1721704 (10fgiunchedi) I think to properly fix this we'd need PRODUCTION_NETWORKS from https://gerrit.wikimedia.org/r/#/c/260926/ though https://gerrit.wikimedia.org/r/#/c/...
[16:55:53] 10Traffic, 06Operations, 06Performance-Team, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2230898 (10BBlack) Packaging patches updated to 1.9.15-1+wmf1 (which is still in branch wmf-1.9.14-1, as we're still based on that release from debian upstream unstable/testing, and th...
[17:18:14] 10Traffic, 06Operations, 06Performance-Team, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2230942 (10ori) >>! In T96848#2147772, @BBlack wrote: > [...] but using @ema's new systemtap stuff (which is way better than the sniffer-based solution) [...] Is this code published a...
[17:30:34] bblack: Hi! ottomata and I were wondering how to put up a static page for datasets.w.o, metrics.w.o and stats.w.o during the upcoming stat1001.eqiad.wmnet re-image that I'll do next week (that host serves those sites).
[17:31:09] elukey: do you mean in the cache_misc sense, or in the local apache config on stat1001 sense?
[17:31:28] bblack: sorry, I was still writing, too quick :)
[17:32:17] bblack, heh, the cache_misc sense, because the host will be offline
[17:32:18] so basically I created a very basic role to create a simple apache vhost serving the static page, and the idea was to put it on another host (under cache::misc)
[17:32:35] oh I see
[17:32:39] but we were wondering if we could use an error document in varnish
[17:32:40] directly
[17:32:55] let's say, the backend fetch fails and we serve the content
[17:33:14] like an Apache ErrorDocument for mod_proxy errors
[17:33:27] well, we have an error document already for all the caches
[17:33:39] it's just not specific to your outage
[17:34:13] I feel like an idiot because I didn't think about this possibility
[17:34:48] I guess the generic error page should probably be enough
[17:34:51] I'm trying to think if we have an easy way to demo it heh
[17:35:04] let's bring down a service!
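(A minimal sketch of the idea discussed above: instead of proxying to a placeholder vhost, synthesize the error directly in Varnish so the standard error page is served and no backend fetch happens. The hostname, message, and placement in vcl_recv are illustrative assumptions, not the actual cache_misc VCL.)

    sub vcl_recv {
        # Site under maintenance: short-circuit with a synthetic 503 so the
        # standard Varnish error page is served and no connect timeout is hit.
        if (req.http.Host == "stats.wikimedia.org") {
            # Varnish 4 syntax; on Varnish 3 this would be: error 503 "...";
            return (synth(503, "Service under maintenance"));
        }
    }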
[17:35:49] well, exactly
[17:36:11] plus we don't really use healthchecks for applayer backends, so I think it will wait for the whole connect timeout if the down service doesn't immediately refuse the connection
[17:36:22] (or if it doesn't fail fast for other reasons, like "no route to host")
[17:36:50] ja bblack, we were wondering if we could make a generic maintenance thing for miscs maybe
[17:36:54] maybe even as a parameter somehow in puppet
[17:37:16] maintenance_sites => ['stats.wm.org', 'datasets.wm.org', ...] whatever
[17:37:37] that would be rendered to varnish config, and if a req comes in matching those, they'd get the maintenance page
[17:37:54] well, we *can* do anything, the question mark is exactly what we want and how much effort/complexity it costs
[17:37:58] heh, ja
[17:38:00] indeed
[17:38:01] i mean
[17:38:26] we can simply return 503 for the service and let the standard error page take over, pretty cheaply I imagine
[17:38:38] (and avoid the possibility of timeouts, or actual requests to the backend)
[17:39:40] bblack: do you mean adding a VCL rule to return a 503 for $domain_under_maintenance and relying on the error document that's already set?
[17:40:13] for that kind of model, we'd probably add a flag to app_directors in role::cache::misc (per-backend would be easier than per-frontend: you can turn the flag on for all of stat1001, but not for the three sites individually)
[17:40:35] and have the VCL template iterate the app_directors and do the "if req.http.hostname == XXX then error 503" logic
[17:42:23] I like the idea, ottomata?
[17:42:39] hm, that sounds fine ja!
[17:42:41] this is totally different from saying "ah yeah of course the change is trivial"
[17:42:52] might be useful to be able to do it per site... buuuut meh, this is good enough for our use case now!
[17:43:16] yep definitely, and as bblack was saying we can refine it later on if we have more use cases
[17:43:27] cool +1 i like
[17:43:49] bblack how should we proceed? Open a phab task and then work together on it?
[17:44:22] yeah something like that. you drive, I'll advise, I guess :)
[17:44:29] elukey: i'm looking at the puppet now, looks fairly simple
[17:44:54] app_directors are rendered in misc-backend.inc.vcl.erb, ja?
[17:45:14] yeah, the data part is easy. add an optional maint => true flag there per stanza
[17:45:15] just gotta modify the erb block to look for this maintenance flag in the config when iterating
[17:46:04] so you probably only need to modify misc-backend.inc.vcl.erb to support this, and then the role::cache::misc $app_directors declaration to set it
[17:46:35] 'stat1001' => {
[17:46:35] 'maintenance' => true
[17:46:37] or something
[17:46:58] yeah, and then in the VCL template, kinda where you see e.g.:
[17:46:59] if dir.key?('req_host')
[17:47:25] put another conditional at the top, so that if the maint key is set, the only statement pushed inside the if(req.http.host) block is an error synthesis
[17:47:32] much like the 404 synth below it, but 503
[17:48:27] I guess it would be: if dir.key?('maintenance') ... do the error_synth version of things, else { the existing if @varnish4/else pair }
[17:49:01] something like that anyways
[17:49:39] bblack: kinda got it, I'll do it next week and probably ask you or ema questions :)
[17:50:21] I wanted to start messing around with VCL code and this seems like a good opportunity
[17:50:42] ema's still out next week I think, but yeah
[17:51:19] elukey: cool! didn't you want to reinstall on monday though?
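(A hedged sketch of the change bblack outlines above. The exact template structure, the 'req_host'/'maintenance' key names, and the stanza contents are assumptions pieced together from the discussion, not the real operations/puppet code: a director flagged as under maintenance gets a 503 synthesis instead of the normal backend selection.)

    # In role::cache::misc, the data side might look roughly like:
    #   'stat1001' => {
    #     'backend'     => 'stat1001.eqiad.wmnet',
    #     'req_host'    => 'stats.wikimedia.org',
    #     'maintenance' => true,
    #   },
    #
    # And in misc-backend.inc.vcl.erb, the template side, roughly:
    <% @app_directors.each do |name, dir| -%>
    <% if dir.key?('req_host') -%>
        if (req.http.host == "<%= dir['req_host'] %>") {
    <% if dir.key?('maintenance') -%>
            # Site is down for maintenance: synthesize a 503, much like the
            # existing 404 synth, and let the standard error page take over.
    <% if @varnish4 -%>
            return (synth(503, "Service under maintenance"));
    <% else -%>
            error 503 "Service under maintenance";
    <% end -%>
    <% else -%>
            # ... existing backend-selection logic (the @varnish4/else pair) unchanged ...
    <% end -%>
        }
    <% end -%>
    <% end -%>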
[17:51:40] monday's going to be crazy in general
[17:51:53] ottomata: I am out on Monday (Liberation Day in Italy)
[17:52:12] I imagine developers have tons of stacked-up changes from this week's freeze to push out and break things faster than normal, and so do I heh
[17:52:13] so I'll work on it on Tuesday.. and schedule the downtime on Thursday probably
[17:52:58] <_joe_> bblack: ever the optimist heh?
[17:53:02] elukey: cool, sounds good!
[17:53:08] <_joe_> but to be honest, I had the same thought this morning
[17:53:37] <_joe_> I imagined flock of developers itching in withdrawal from a week of deployments freeze
[17:53:45] <_joe_> *flocks
[17:54:32] <_joe_> but yeah, I have a couple of nuclear-level dangerous "noop" changes that have been waiting since last friday
[17:54:34] my big things that were on hold this week are: changing the mobile hostnames to use the desktop IP address, and deploying a proper multi-dc cache_maps cluster on the old mobile hardware
[17:54:55] <_joe_> mine are: remove all references to zend from all of the appservers' apache configs
[17:55:10] sounds so simple, but "apache configs" lol
[17:55:33] <_joe_> https://gerrit.wikimedia.org/r/#/c/281418/
[17:55:46] <_joe_> +187,-574
[17:56:19] yeah
[18:02:47] bblack: logging off, thanks for the info! _joe_ enjoy the long weekend, you deserve some time off :)
[18:05:21] <_joe_> indeed
[18:05:51] cya!
[23:04:11] 10Traffic, 06Operations, 06Performance-Team, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2231730 (10BBlack) >>! In T96848#2230942, @ori wrote: >>>! In T96848#2147772, @BBlack wrote: >> [...] but using @ema's new systemtap stuff (which is way better than the sniffer-based s...
[23:46:46] 10Traffic, 06Operations, 06Performance-Team, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2231788 (10ori) >>! In T96848#2231730, @BBlack wrote: >>>! In T96848#2230942, @ori wrote: >>>>! In T96848#2147772, @BBlack wrote: >>> [...] but using @ema's new systemtap stuff (which...