[06:26:38] Traffic, Operations: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (Marostegui) Thanks for the clarification. My thoughts were that we upgraded also BIOS. Let's start with that indeed.
[06:45:26] https://lists.gt.net/nanog/users/208333 - "Wikipedia drops support for old Android smartphones; mandates TLSv1.2 to read"
[06:57:18] "Why does access to Wikipedia need to go over https?" I'm very surprised to see comments like that on NANOG...
[07:33:32] that guy really has time to kill doesn't he
[07:41:52] Traffic, Operations: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (ema) >>! In T238305#5784511, @Papaul wrote: > sometimes when the IDRAC version is not up to date we might not see and log at system crash Interesting! > so i think let us start by getting all tho...
[08:58:36] netops, Operations: Routinator RSYNC errors - https://phabricator.wikimedia.org/T240817 (ayounsi) Open→Stalled p:Normal→Low
[09:11:22] netops, Operations: Upgrade routinator to 0.6.4 - https://phabricator.wikimedia.org/T242197 (ayounsi) p:Triage→Low
[09:23:28] Traffic, Operations: Docker registry needs cache to vary on Accept header value - https://phabricator.wikimedia.org/T242200 (Joe)
[09:24:22] <_joe_> hi traffic people, I need advice on how to solve ^^
[09:33:28] morning _joe_
[09:35:47] <_joe_> hi vgutierrez
[09:47:08] _joe_: vgutierrez: I've been researching renaming an LVS service (in the interest of helping with the migration to TLS for eventgate-analytics). I've got https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/562767/ and PCC seems to be ok. Lemme know what you think
[09:47:20] ack
[09:47:22] checking
[09:48:21] it seems pretty doable and with 0 changes.
Even on the LVS hosts, it's just a comment that changes in the pybal config
[09:48:44] I was afraid we would have way more assumptions
[09:49:31] hmmm
[09:49:45] it doesn't trigger a change of the name of the config section on pybal?
[09:50:08] from [eventgate-analytics_31192] to [eventgate-analytics-http_31192]?
[09:51:01] yeah, my bad for calling it a comment
[09:51:13] but it's not actionable as far as ipvsadm goes, right?
[09:51:33] nope, from ipvsadm PoV is a NOOP
[09:52:15] the driver for this btw is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/559167/5/hieradata/common/lvs/configuration.yaml,unified
[09:52:44] we could also argue that we could have both the TLS and the non-TLS versions but I am not sure I see much value in that
[09:53:18] and if there isn't much value in that, ottomata's comment about the canonical name being eventgate-analytics makes sense
[09:53:56] I don't see a point in keeping alive the non-TLS endpoints
[09:54:57] neither do I
[09:57:59] and the low-traffic LVS could benefit from getting rid of some services
[09:58:14] so yeah.. I'd kill the non TLS version O:)
[09:59:41] cool. I'll rebase ottomata's change then on my own, having both for a while and making the TLS one the canonical one naming wise.
[13:06:50] don't forget, when removing a service, in addition to the usual pybal restart, you also have to manually remove the ipvs entry
[13:07:12] (as in, "ipvsadm -D -t 192.0.2.1:1234" or whatever it was)
[13:16:00] Traffic, Operations: Docker registry needs cache to vary on Accept header value - https://phabricator.wikimedia.org/T242200 (BBlack) So long as the registry's responses do all the standards-based things correctly (they contain `Vary: Accept`, and the matching `Accept` values also match the `Content-Type`...
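Editor's illustration of the `Vary: Accept` point in the T242200 comment above: a minimal sketch, not taken from the ticket and not how Varnish or ATS implement it. It assumes a toy in-memory cache; `VaryCache`, `fetch`, the `origin` callable and the media types in the usage note are hypothetical names chosen for this example. The idea shown is only the standard HTTP caching rule that a response carrying `Vary: Accept` may be reused for a later request only when that request's `Accept` value matches the one it was stored under.

```python
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
Response = Tuple[Headers, bytes]
Origin = Callable[[str, Headers], Response]


class VaryCache:
    """Toy cache (hypothetical) that keeps one variant per value of the Vary'd headers."""

    def __init__(self) -> None:
        # url -> lower-cased request header names listed in the last Vary seen
        self._vary: Dict[str, Tuple[str, ...]] = {}
        # (url, values of those request headers) -> cached response
        self._store: Dict[Tuple[str, Tuple[str, ...]], Response] = {}

    def _key(self, url: str, req: Headers) -> Tuple[str, Tuple[str, ...]]:
        names = self._vary.get(url, ())
        return (url, tuple(req.get(name, "") for name in names))

    def fetch(self, url: str, req_headers: Headers, origin: Origin) -> Response:
        # Header names are case-insensitive, so normalise them first.
        req = {k.lower(): v for k, v in req_headers.items()}
        key = self._key(url, req)
        if key in self._store:
            return self._store[key]  # hit: same URL and same Vary'd header values

        # Miss: ask the origin, then remember which headers it varies on.
        resp_headers, body = origin(url, req)
        vary = next((v for k, v in resp_headers.items() if k.lower() == "vary"), "")
        # Simplification: "Vary: *" (never reusable) is not handled here.
        self._vary[url] = tuple(sorted(h.strip().lower() for h in vary.split(",") if h.strip()))
        self._store[self._key(url, req)] = (resp_headers, body)
        return resp_headers, body
```

With an origin that answers `Vary: Accept` and echoes the requested type as `Content-Type`, two requests for the same URL with `Accept: application/vnd.example.v1+json` and `Accept: application/vnd.example.v2+json` (made-up types) are stored as separate variants, and a repeat of the first is served from the cache without contacting the origin again.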
[16:09:12] Traffic, Operations, Performance Issue: Current performance issues - https://phabricator.wikimedia.org/T242228 (Gestumblindi)
[16:28:56] Traffic, Operations, Performance Issue: Current performance issues - https://phabricator.wikimedia.org/T242228 (Joe) Open→Resolved a:Joe Hi, thanks for your report! We were already aware of the issues, and were at work to solve them. Everything should be fine now though.
[16:29:53] Traffic, Operations, Performance Issue: Current performance issues - https://phabricator.wikimedia.org/T242228 (Joe) An incident report will be published later on wikitech at https://wikitech.wikimedia.org/wiki/Incident_documentation
[18:00:40] Traffic, Operations, ops-eqsin: rack/setup/install ps[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T242250 (RobH) p:Triage→Normal
[18:42:44] XioNoX: are we going to deploy ping offload in esams?
[18:43:14] was looking through wmf_netflow for other reasons and realized that text-lb.esams gets something like 60k PPS of ICMP
[18:43:56] (and I guess now that we have ganeti there, we could)
[18:46:09] Traffic, netops, Operations, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (CDanis) Resolved→Open boldly re-opening this, now that the POPs have Ganeti clusters available. Today I learned that text-lb.esams receives something like 60k+ PP...
[20:50:22] Traffic, Discovery, Operations, Wikidata, Wikidata-Query-Service: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Mstyles)
[22:25:49] Traffic, Discovery, Operations, Wikidata, Wikidata-Query-Service: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Mstyles) from inside any of the WDQS machines ( 'wdqs1004.eqiad.wmnet','wdqs1005.eqiad.wmnet', 'wdqs1006.eqiad.wmnet','wdqs1007.eqi...
[22:26:02] bblack: I think this is a traffic issue, could you take a look please: https://phabricator.wikimedia.org/T237165
[22:29:57] maryum: that curl command works for me from wdqs1004
[22:30:14] at least i get some HTML
[22:30:28] that html is not right and it's not the same as the other curl responses @mutante
[22:30:42] if you try the other curls you see how it's completely different and it shouldn't be
[22:30:47] ah, i was thinking like "nothing happens" and checking iptables
[22:30:50] ack