[06:37:01] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10878331 (ayounsi)
[06:40:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5018 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[06:45:00] FIRING: [7x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[06:50:00] RESOLVED: [5x] PurgedHighEventLag: High event process lag with purged on cp5020:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[09:16:01] netops, Infrastructure-Foundations, SRE: Homer: stop using the 'section' macro in jinja templates - https://phabricator.wikimedia.org/T395555#10878693 (Volans) No objection from my side. I can look if there are other alternative options in addition to those mentioned, but I'm not sure there is any.
[09:17:25] Traffic, SRE: Move ncredir7003 into service and decom ncredir7002 - https://phabricator.wikimedia.org/T395796#10878694 (MoritzMuehlenhoff)
[09:37:25] FIRING: SystemdUnitFailed: wmfuniq-experiment-fetcher.service on cp7001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:57:25] RESOLVED: SystemdUnitFailed: wmfuniq-experiment-fetcher.service on cp7001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:21:37] o/ I have a small trafficserver change for yet another restbase migration, moving an API over for something that looks somewhat like group1. Any objections to me rolling this out? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1149624
[14:24:01] hnowlan: none, we have no pending changes, go ahead :)
[14:33:33] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10879725 (Jhancock.wm) @cmooney I'm gonna reply to Jorge's email about boxes and pickup instructions. Not trying to rush, but...
[14:46:52] thanks!
[16:21:25] inflatador: ryankemper: sorry, I need time to review the patches and I was busy with other work. I will do that today and perhaps let's aim for a tomorrow deploy?
[16:22:00] I have one more meeting today and while we can deploy in theory, it's going to be late again (for me) and I don't want to do that :P
[16:40:45] sukhe sure, feel free to reschedule at a better time
[16:42:03] thanks, I will go over the patches and reach out.
[16:42:13] np, thanks again for taking the time
[16:45:16] sounds good!
[18:14:19] Traffic: GeoDNS: consider sending CN to eqsin - https://phabricator.wikimedia.org/T378744#10881220 (CDobbins) We have a pipeline (currently a Jupyter [[ https://gitlab.wikimedia.org/cdobbins/geo-maps-scripts/-/blob/588318a029fbf9009a475c30e3a012d1bd702f1d/pipelinev1.ipynb | Notebook ]]) that ranks each DC fo...
[18:21:14] Traffic, DC-Ops, ops-eqiad, SRE: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10881244 (BCornwall)
[19:16:46] inflatador: ryankemper: please correct me but https://phabricator.wikimedia.org/T143553 and the CRs linked in https://phabricator.wikimedia.org/T143553#10861215 is what we are aiming for, correct?
[19:17:27] I am not CCed on all and that's fine, I don't mean that cynically but just to make sure that I know the full context
[19:17:54] sukhe 👀
[19:18:51] just making sure I understand it right this time :]
[19:19:07] no problem at all, thanks for taking the time
[19:20:00] TL;DR being the introduction of search-psi,omega, which currently exist but through a single name (differing ports). but you want to split those up into these two
[19:20:23] is the above missing something? because I recall from our last conversation that there was more.
[19:21:39] wasn't there a third one? search-chi?
[19:22:11] The main goal is to be able to switch traffic between DCs without pushing a mwconfig patch. So we're trying to set up DNS discovery for the first time on the search clusters
[19:22:34] and yes, it gets confusing. 'search-chi' is the main cluster, also known simply as 'search'
[19:22:50] ah
[19:23:01] we left it as simply 'search' because there is no LVS pool for 'search-chi' and that seems like it would've been a lot more work
[19:23:17] that's fine, whatever you are comfortable with
[19:23:46] ok cool, thanks. the missing bit was the search-chi thing, which in hindsight, you may have already mentioned but I don't recall because we were in the middle of the deploy last time :)
[19:24:21] Yeah, sorry about this and thanks for bringing it back up, should help next time ;)
[19:24:57] cool. will stamp soon and we can decide a deployment time.
[19:24:58] thanks
[19:27:05] (review)
[19:38:47] Traffic, Regression, xLab: Cookie “WMF-Uniq” has been rejected because it is in a cross-site context - https://phabricator.wikimedia.org/T395958 (Krinkle) NEW
[19:39:17] Traffic, Infrastructure-Foundations, Regression, xLab: Cookie “WMF-Uniq” has been rejected because it is in a cross-site context - https://phabricator.wikimedia.org/T395958#10881504 (Krinkle)
[19:41:14] left some comments, let's discuss in the CRs and take it from there.
[19:42:44] ACK, reading thru now
[19:48:15] happy to set up a time for tomorrow to go over this fwiw, fully possible I missed something
[19:52:02] Traffic, Infrastructure-Foundations, Regression, xLab: Cookie “WMF-Uniq” has been rejected because it is in a cross-site context - https://phabricator.wikimedia.org/T395958#10881537 (Krinkle)
[19:52:43] sukhe: see my response in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151308
[19:53:55] oh, I was struggling to figure out direct link to comment on gerrit but I remembered where the button is, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151308/comments/d55b2873_f83b26c5 specifically
[20:07:38] will respond later, I stepped away for a bit. thanks!
[23:23:30] Traffic, Infrastructure-Foundations, Regression, xLab: Cookie “WMF-Uniq” has been rejected because it is in a cross-site context - https://phabricator.wikimedia.org/T395958#10882246 (Krinkle)
[23:24:35] Traffic, Infrastructure-Foundations, Regression, xLab: Cookie “WMF-Uniq” has been rejected because it is in a cross-site context - https://phabricator.wikimedia.org/T395958#10882247 (Krinkle)