[00:19:07] netops, Operations: Standardize cross confederation BGP policies - https://phabricator.wikimedia.org/T227808 (ayounsi) Open→Resolved This is done and pushed to all the sites.
[00:19:10] netops, Operations: Cleanup confed BGP peerings and policies - https://phabricator.wikimedia.org/T167841 (ayounsi)
[10:31:33] Traffic, Operations: Remove X-Wikimedia-Security-Audit VCL support - https://phabricator.wikimedia.org/T229320 (ema)
[10:31:42] Traffic, Operations: Remove X-Wikimedia-Security-Audit VCL support - https://phabricator.wikimedia.org/T229320 (ema) p:Triage→Normal
[10:37:11] Traffic, Operations: Consider removing X-Wikimedia-Security-Audit VCL support - https://phabricator.wikimedia.org/T229320 (ema)
[11:04:12] Traffic, Operations, Security-Team: Consider removing X-Wikimedia-Security-Audit VCL support - https://phabricator.wikimedia.org/T229320 (Peachey88)
[13:34:00] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: TLS certificates for Analytics origin servers - https://phabricator.wikimedia.org/T227860 (elukey)
[13:35:45] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: TLS certificates for Analytics origin servers - https://phabricator.wikimedia.org/T227860 (elukey) Stalled→Open
[16:07:56] bblack: response from cloud was basically they decided to not go the haproxy route and are using a dedicated openstack solution. Can we then move forward with LVS for cloudelastic or should something more be considered?
[16:16:33] ebernhardson: probably move forward with LVS, I'm re-reviewing the pair of patches now, thanks for the ping!
[16:18:11] thanks
[16:33:25] ebernhardson: added some -1 nitpicking to both patches
[16:35:41] bblack: thanks!
i can address those
[17:16:52] bblack: I'm going to depool ulsfo in anticipation of the routers upgrade - https://gerrit.wikimedia.org/r/526487
[17:19:14] XioNoX: ok, give me a sec before you do the actual router work
[17:20:52] XioNoX: are we expecting total network loss, or expecting one router to keep working at a time and maintain some transport+transit?
[17:20:54] bblack: I should start it in about 40min, will ping you before starting
[17:21:40] bblack: one after the other, so there will be blips during routing convergence, but nothing more
[17:21:52] XioNoX: ok, I'm done with my stuff there now anyways
[17:28:14] bblack: this drop in codfw is surprising - https://grafana.wikimedia.org/d/000000343/load-balancers?orgId=1&from=now-3h&to=now&panelId=7&fullscreen
[17:29:49] it's like the "LVS connections" graphs have been switched between codfw and ulsfo
[17:33:10] what does "LVS connections" track anyways?
[17:33:24] all connections to all defined ipvs services, as shown by ipvs stats?
[17:33:34] (and does connections include UDP?)
[17:34:38] no idea :)
[17:36:13] it's node_ipvs_backend_connections_active
[17:36:44] and they do seem to come from the correct codfw-vs-ulsfo prometheus/ops sources
[17:37:50] in any case, the UDP DNS stuff doesn't ever show "active" conns in LVS due to one-packet-scheduler, so it can't be the DNS change I don't think
[17:38:08] plus it affected the text LVS too
[17:38:09] hmmm
[17:41:23] XioNoX: yeah something seems off
[17:41:38] the HTTP stats show a decent-looking and normal shift from ulsfo->codfw
[17:41:47] https://grafana.wikimedia.org/d/000000343/load-balancers?orgId=1&from=now-1h&to=now
[17:41:57] ^ but if you look at the whole set of graphs there for LVS... it's weird
[17:42:11] the ulsfo graphs show the expected dropoffs, but the codfw ones are strange
[17:42:34] there's a dropout on codfw "conns" before recovering and rising a bit, but the PPS graph never rises enough in codfw to account for the PPS loss in ulsfo, etc?
[17:43:19] ulso ht1 and ht2 incoming PPS dropped from ~40Kpps -> ~4Kpps
[17:43:22] *ulsfo
[17:43:45] yet the codfw PPS graphs have barely moved from the ~4Kpps they were at before
[17:45:01] XioNoX: nevermind, assume the Real World is all fine
[17:45:29] I've found at least one wrong grafana definition (the data being shown is not what's labeled. That faulty PPS graph isn't showing PPS at all)
[17:47:44] I'm gonna run through all the panels on that LVS dashboard and fix them up a bit
[17:53:51] sounds good, thanks!
[17:57:43] was just that one that was "wrong"
[17:57:46] fixed now!
[18:00:48] cool, thx!