[11:45:50] netops, Infrastructure-Foundations, SRE: Extend sre.network.configure-switch-interfaces cookbook to add sflow and qos config - https://phabricator.wikimedia.org/T379549 (cmooney) NEW p:Triage→Low
[11:48:07] netops, Infrastructure-Foundations, SRE: Extend sre.network.configure-switch-interfaces cookbook to add sflow and qos config - https://phabricator.wikimedia.org/T379549#10309316 (cmooney) a:cmooney
[12:14:02] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 3 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10309416 (Jclark-ctr) @cmooney thanks for the list I have populated both new switches up to port 27 wit...
[13:08:25] Traffic, conftool, Epic: Deleting ipblocks when there are uncommitted changes causes failures - https://phabricator.wikimedia.org/T378435#10309504 (Joe) Open→Resolved
[13:08:41] Traffic, conftool, Epic: Deprecate sync, add apply command to requestctl - https://phabricator.wikimedia.org/T376877#10309506 (Joe) In progress→Resolved
[13:26:29] Traffic, conftool, Epic: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules - https://phabricator.wikimedia.org/T377699#10309540 (Volans) For 3. an //EASY// way to link the superset dashboard from a given requestctl rule is...
[13:27:48] Traffic, conftool, Epic: Coexistence of the requestctl CLI tool and of the web interface - https://phabricator.wikimedia.org/T374723#10309546 (Joe) In progress→Resolved
[13:29:46] Traffic, conftool, Epic: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules - https://phabricator.wikimedia.org/T377699#10309551 (Joe)
[13:36:39] Traffic, conftool, Sustainability (Incident Followup): Make it easier to create a new requestctl object - https://phabricator.wikimedia.org/T310009#10309582 (Volans) With the new requestctl web UI I think it would be very useful if the current requestctl generator ( `https://superset.wikimedia.org/re...
[14:44:40] FIRING: [3x] VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:49:40] FIRING: [7x] VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:54:40] FIRING: [7x] VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:59:40] FIRING: [10x] VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:09:32] hello folks!
[15:09:40] FIRING: [9x] VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:09:45] is it a good time later on to merge and deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087422 ?
[15:09:53] so pybal restart etc..
[15:11:03] elukey: go for it :)
[15:11:13] not that you need it but let us know if we can help
[15:11:22] okok I'll start in a bit, and I'll log what I am going to do for validation :)
[15:11:35] sukhe: if you have a min would you review the last PS of the patch?
[15:11:41] sure
[15:11:42] looking
[15:11:44] <3
[15:19:40] RESOLVED: [4x] VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:34:52] Traffic, conftool, Epic: Link to superset dashboard in requestctl web UI - https://phabricator.wikimedia.org/T379567 (Joe) NEW
[15:43:59] elukey: +1ed but I think you saw it
[15:45:27] sukhe: yes yes thanks! I was reviewing something on the maps node atm, after reading https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service I am wondering if I should use "service_setup" as state before lvs_setup
[15:45:56] profile::lvs::realserver::pools on the maps nodes seems to list only 'karthotherian'
[15:46:06] and not 'kartotherian-ssl' for example
[15:46:20] yes, ideally :)
[15:46:30] I guess it works since the IP is the same
[15:46:34] sigh
[15:46:38] okok lemme amend the patch
[15:47:25] all right patch updated
[15:48:39] elukey: since you brought it up, I do think there is value in following these steps exactly and separating out the patches fwiw
[15:48:49] that's the path of least resistance
[15:49:10] meaning for example it's fine to put the conftool service data in this patch
[15:49:33] but in general I have observed in many of these that having distinct patches and following the order exactly alleviates a lot of the pain
[15:49:48] I can break them down no problem
[15:50:02] no, it's fine but I was just remarking since you mentioned the service_setup thing
[15:50:14] since you already have existing things defined, no issues from me
[15:50:32] have you pooled it and set the weights?
[15:50:48] (you can do it later but there is value doing it now to actually check the results, if you want)
[15:51:07] not yet, I was waiting for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087423 to be merged for the conftool data
[15:51:25] sorry https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087422
[15:51:58] ah yeah, so this is actually the first step in that sense. no worries at all, let's carry it forward
[15:52:05] super, merging!
[15:59:13] sukhe: pooled the hosts, then I'd say https://gerrit.wikimedia.org/r/c/operations/puppet/+/1089817 right ?
[15:59:45] and follow https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers
[15:59:47] yep. and then the restarts (disable pybal before proceeding with a single host to catch any errors)
[16:00:00] yep yep the usual scary part
[16:00:28] don't worry, you can blame me (we usually do fabfur but he is AFK)
[16:00:43] we usually blame Riccardo :D
[16:04:22] running puppet on eqiad lvses
[16:05:43] and now proceeding with secondary lvses
[16:06:21] elukey: wait please, checking an alert
[16:08:01] elukey: you restart pybal on 1020 and 1019?
[16:08:11] if it is 1019 I haven't restarted pybal on it
[16:09:00] sukhe: I've done 1020 and checked, now I am doing 1019 if you are ok
[16:09:09] elukey: ok so you just did 1020. can you log that please, timestamps come in handy
[16:09:12] elukey: checking 1019
[16:09:16] then we can do 1020
[16:09:21] sorry, checking 1020, then 1019
[16:09:32] lol
[16:09:59] done 1019
[16:10:37] and the new endpoint works in eqiad
[16:10:45] looks good
[16:10:47] +1 for 1019
[16:10:57] already done
[16:11:04] ok
[16:11:23] I tested curl "https://kartotherian.svc.eqiad.wmnet:6543/#4/40.75/-73.96" -i
[16:11:53] yes, curl localhost:9090/pools/kartotherian-k8s-ssl_6543 looks good as well
[16:12:08] In https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers there is a step to run puppet on all lvses for a dc, then restart pybal on secondary and then primary
[16:12:10] give me a second to go through the 1019 alerts
[16:12:14] and then we can proceed
[16:12:25] ok forced a recheck, worked
[16:12:29] do we prefer to amend and do runpuppet on secondary only, then primaries?
[16:12:34] super
[16:13:03] elukey: where is that text sorry?
[16:13:15] https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers point 2
[16:13:36] > Enable and run Puppet on the first data center you wish to effect the change on (such as eqiad here):
[16:13:57] 11:12:30 < elukey> do we prefer to amend and do runpuppet on secondary only, then primaries?
[16:14:08] no, since Puppet does not restart Pybal, it's fine in that sense
[16:14:22] +1 for moving on with codfw
[16:15:42] okok
[16:16:11] running puppet
[16:17:43] 2014 (secondary) restarted
[16:18:35] looks good!
[16:18:40] ipvsadm looks good, proceeding with the primary
[16:19:25] all works!
[16:19:27] thanks sukhe <3
[16:20:32] you did the hard work :)
[16:20:54] tomorrow I'll ping again to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087423
[16:21:06] and then we should be hopefully done, ready for k8s :)
[16:21:32] cool. and I am guessing lvs_setup -> production after that
[16:21:38] * elukey nods
[17:13:29] Traffic, conftool, Epic, Patch-For-Review: Link to superset dashboard in requestctl web UI - https://phabricator.wikimedia.org/T379567#10310231 (Joe) Open→In progress p:Triage→High
[17:15:44] Traffic, Infrastructure-Foundations, SRE: NEL: don't alert on domains we don't control - https://phabricator.wikimedia.org/T349807#10310243 (jcrespo) This happened again for another proxy/domain: https://logstash.wikimedia.org/goto/fdbf6830d7a58fccbb40681028ac5bdd
[17:24:18] netops, Infrastructure-Foundations, SRE: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#10310260 (Reedy)
[17:31:31] Traffic, conftool, Sustainability (Incident Followup): Make it easier to create a new requestctl object - https://phabricator.wikimedia.org/T310009#10310274 (Joe) The way we could do this is something as follows: * Add the CORS headers to superset to allow making authenticated requests from requestct...
[17:32:17] Traffic, conftool, Sustainability (Incident Followup): Make it easier to create a new requestctl object - https://phabricator.wikimedia.org/T310009#10310284 (Joe) As an alternative, which I might actually prefer, all the process would remain server-side if we can grant access to the superset api via...
[17:55:26] Traffic, conftool, Epic, Patch-For-Review: Link to superset dashboard in requestctl web UI - https://phabricator.wikimedia.org/T379567#10310320 (ops-monitoring-bot) Deployed hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - T379567
[18:05:22] Traffic, conftool, Epic, Patch-For-Review: Link to superset dashboard in requestctl web UI - https://phabricator.wikimedia.org/T379567#10310334 (Joe) In progress→Resolved
[19:04:35] netops, Infrastructure-Foundations, SRE: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10310435 (cmooney)
[19:04:48] netops, Infrastructure-Foundations, SRE: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10310436 (cmooney) a:cmooney