[00:35:32] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (faidon) Thanks @JobSnijders, appreciate the feedback very much :) Our goal is to reject all invalids everywhere indeed, just progressively so. Separate validator instances per PoP would be ideal I think, but more so for red...
[05:34:27] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes - https://phabricator.wikimedia.org/T226589 (ema) p:Triage→Normal
[06:03:58] Traffic, Operations, Performance-Team, Performance: Study performance impact of disabling TCP selective acknowledgments - https://phabricator.wikimedia.org/T225998 (ema) >>! In T225998#5284077, @Gilles wrote: > Remember that x-cache headers are read from right to left. More details here: https:/...
[06:33:19] moritzm, jbond42|away: hi! any update on T222356? Can I help speeding up the deployment of a fixed facter?
[06:33:19] T222356: facter3: Unable to parse routing table - https://phabricator.wikimedia.org/T222356
[06:33:45] it's everywhere except the puppetmasters and puppetdb
[06:34:03] ah, no. different bug
[06:36:02] last update by upstream was that they're running the patch through their various OS instances six days ago, guess our CI isn't the only slow one :-)
[07:23:34] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp5003.eqsin.wmnet'] ` The log can be found in `...
[08:17:10] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5003.eqsin.wmnet'] ` and were **ALL** successful.
[08:34:06] the Selective ACK perf diff would be a good blogpost story
[08:41:52] yep!
[09:23:43] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp5004.eqsin.wmnet'] ` The log can be found in `...
[09:27:10] part of the blog post needs to describe how moritzm and I were testing disabling SACK during a break from meetings at the SRE offsite by using hotel wifi to simulate "shit connection"
[09:28:49] afaict that was a very real simulation
[09:42:49] hi ema, looks like your comment on the github issues has triggered them to act. I think once the patch has been merged we could potentially backport it to a package we build ourselves instead of waiting for their release cycle (which I'm still trying to understand), but will double check with moritzm
[09:43:36] yeah, that sounds good, let's apply/backport the patch once they've merged it
[09:45:16] ack sounds good
[09:46:41] wonderful
[10:12:48] Traffic, Wikimedia-Apache-configuration, DNS, Matrix, and 3 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Joe) a:Joe
[10:15:54] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5004.eqsin.wmnet'] ` and were **ALL** successful.
[10:19:03] Traffic, Wikimedia-Apache-configuration, DNS, Matrix, and 3 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Joe) Open→Resolved Using curl I can confirm the header is now added. I fear you might need to force-reload in...
[10:41:16] I've documented URI path normalization on wikitech, feel free to improve/expand: https://wikitech.wikimedia.org/wiki/URI_Path_Normalization
[11:44:38] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (tramm) Open→Stalled p:Normal→High Any news? Anyone able to help with this?
[11:58:55] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (jcrespo) This is blocked on @CRoslof or someone else from legal. Last thing he said: > but we are still evaluating the implications of doing so
[12:30:12] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp5005.eqsin.wmnet'] ` The log can be found in `...
[12:40:51] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp5005.eqsin.wmnet'] ` The log can be found in `...
[12:41:04] mmh the first reimage of cp5005 failed, retrying
[12:41:36] * volans would not be automatically nerd-sniped unless explicitly asked so :-P
[12:42:01] * ema is not shooting
[12:48:26] turning it off and on again seems to have done the trick
[13:28:27] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5005.eqsin.wmnet'] ` and were **ALL** successful.
[13:57:38] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in codfw - https://phabricator.wikimedia.org/T226637 (ema)
[13:57:45] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in codfw - https://phabricator.wikimedia.org/T226637 (ema) p:Triage→Normal
[13:58:02] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ema)
[13:58:09] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ema) p:Triage→Normal
[13:59:19] Is cp1008 an active proxy? Does code destined for the cp hosts need to be jessie-compatible?
[14:00:54] shdubsh: it is a test host that should be replaced by a new one with a contemporary debian release at a certain point (ideally soon)
[14:01:13] shdubsh: so I'd say that no, code for cp hosts need not be jessie compatible
[14:01:22] what do you have in mind?
[14:01:44] ah, and no cp1008 does not serve any real user traffic
[14:02:24] I'm fishing for things to consider when packaging varnishkafka exporter and its dependencies.
[14:03:32] It came up in review that we should target compat 10 and debhelper>=10 given that jessie support isn't required
[14:03:42] thanks ema :)
[14:03:46] yw!
[14:19:50] Traffic, Operations, Patch-For-Review: Investigate esams text varnish backend fetch failures - https://phabricator.wikimedia.org/T226375 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp3043.esams.wmnet'] ` The log can be found in `/var/log/wm...
[14:30:58] XioNoX: noting now so I don't forget later after whatever's going on with librenms: I started digging through the anycast_healthchecker review this morning.
Looks good overall, but found an issue with a type name to fix, I think: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/397723/
[14:44:02] ok, thx! will reply
[14:54:26] Traffic, Operations, Patch-For-Review: Investigate esams text varnish backend fetch failures - https://phabricator.wikimedia.org/T226375 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp3043.esams.wmnet'] ` and were **ALL** successful.
[15:10:16] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp5006.eqsin.wmnet'] ` The log can be found in `...
[15:23:22] Traffic, Analytics, Analytics-EventLogging, Operations, Performance-Team (Radar): Increase EventLogging limit from 2K to 5K - https://phabricator.wikimedia.org/T208282 (Ottomata) Open→Declined Modern Event Platform's EventGate will support larger events in POST bodies.
[15:39:44] I've added the option to filter by method here: https://grafana.wikimedia.org/d/000000464/prometheus-varnish-aggregate-client-status-code?orgId=1
[15:40:51] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (ayounsi) > I'd say to deploy the two policies to all routers, even if unused (because e.g. they're not peering routers) - after initial testing that is. Yup, that's the plan, to have all routers similar. > Maybe deploy it on...
[15:41:53] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (JobSnijders) Try the following: ` 'members 0x4300:0.0.0.0:2' ` This is a documentation bug on juniper's website. It has been reported to them already.
[15:58:03] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (MelchiorAelmans) Thanks @JobSnijders for bringing this to my attention.
I've raised this with the documentation team and also added a comment to the SR. Should be fixed soon. Indeed this should be configured as: set policy-...
[16:02:59] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5006.eqsin.wmnet'] ` and were **ALL** successful.
[16:06:39] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes - https://phabricator.wikimedia.org/T226589 (ema)
[16:06:41] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqsin - https://phabricator.wikimedia.org/T226477 (ema) Open→Resolved Done.
[16:49:57] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (ayounsi) Thanks for the quick replies, it passes a commit check, will push the following shortly. `lang=diff [edit policy-options policy-statement BGP_sanitize_in then] community delete AS14907:ALL { ... } + commun...
[16:53:18] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (JobSnijders) These are non-transitive extended communities. They can not cross an EBGP boundary, the deletion in `policy-statement BGP_sanitize_in` is perhaps superfluous.
[18:32:37] XioNoX: re: anycast_healthchecker stuff, maybe we try it first with puppet disabled on the recdnses and just do one first, etc.
[18:32:59] for sure, yeah
[18:33:25] cumin has 'A:dns-rec' to catch them all (the esams ones still have non-standard names)
[18:34:12] but I think it's ready for that, whenever (today or tomorrow, depending on what you've got going on)
[18:35:11] and then we can battle-test it a bit to see that health responds appropriately to intentional problems at least, and then use it for a few cp nodes or whatever.
[18:39:44] There is also a more recent upstream version of anycast_healthchecker, nothing that we strictly need, but if we want to upgrade the packages, I'll need help
[18:39:58] https://github.com/unixsurfer/anycast_healthchecker/releases/tag/0.9.0
[18:42:14] yeah I'd avoid it for now unless we know of a bug it fixes
[18:42:26] seems like a lot of delta and thus high probability of new bugs heh
[18:43:00] eh, major version
[18:44:00] well they're in the 0.x.y phase still, which I think even semver makes an exception for
[18:44:09] (so there's no guarantees at all really)
[18:44:30] https://semver.org/#spec-item-4
[18:45:06] ah didn't know that doc!
[18:45:22] not every project adheres to semver rigidly, or even knows or cares about it
[18:45:40] but it's a good default bet for interpreting x.y.z -style version numbers these days, at least loosely
[18:47:18] bblack: tomorrow seems better to deploy the CR as it's getting late here, ping me when you get online :)
[18:47:33] ok works for me :)
[18:50:11] Traffic, Operations: rack/setup/install ganeti400[123] - https://phabricator.wikimedia.org/T226444 (BBlack) a:akosiaris→None I don't think anyone's 100% sure how we're handling this project, but probably Traffic will figure out the setup for these and ask Alex if we need help. We probably won't...