[07:12:46] Traffic, Upstream: HAProxy 2.6.12 segfaults - https://phabricator.wikimedia.org/T334448 (Vgutierrez) Open→Resolved Thanks to @BCornwall for taking care of the final deployment of HAProxy 2.6.13 cluster wide
[09:13:02] hello folks! I'd need to set up a new LVS service in https://gerrit.wikimedia.org/r/c/operations/puppet/+/918409, lemme know if you have 5 mins to assist :)
[09:38:03] * vgutierrez looking
[09:41:08] elukey: looks sane enough, the LVS you are looking for are lvs2010 (secondary) and lvs2009 (primary)
[09:41:41] (running pcc ATM)
[09:52:39] vgutierrez: ack thanks!
[09:57:29] vgutierrez: I think I'll merge and restart this afternoon, would it be ok for traffic? I'll ping here again before starting :)
[09:58:04] elukey: please coordinate with sukhe as he is working on setting up the new LVS boxes in codfw
[09:59:02] vgutierrez: ack perfect, thanks a lot!
[10:11:10] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (cmooney)
[10:42:19] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (cmooney) @aborrero @dcaro @Andrew I think we are in a position to look at doing this again? I've updated the list of servers...
[10:51:13] vgutierrez: there are some eqsin hosts in decommissioning in Netbox but still in puppetdb, see https://netbox.wikimedia.org/extras/reports/results/4557563/ what's their expected status?
[11:07:41] volans: these were supposed to be scheduled for SSD/HDD shredding and then recycled
[11:07:51] I can pick robh today to check on that
[11:08:14] elukey: hth!
[11:08:15] did the decommissioning cookbook fail?
it's supposed to delete them from puppetdb and then shut them off
[11:08:59] volans: doesn't seem to have failed, no https://phabricator.wikimedia.org/T323830#8443000
[11:09:15] [5013, as an example]
[11:10:50] IIRC robh was working on them so I can follow up (working as in to shred the disks)
[11:18:01] ack, power status is off, but it would be weird to be a race that was hit on all those hosts
[11:18:12] so I'd like to know what failed
[12:48:32] netops, Infrastructure-Foundations, SRE, cloud-services-team: cloudgw: review security policy for edge network - https://phabricator.wikimedia.org/T336368 (aborrero)
[12:50:03] netops, Infrastructure-Foundations, SRE, cloud-services-team: cloudgw: review security policy for edge network - https://phabricator.wikimedia.org/T336368 (aborrero)
[13:10:45] (HAProxyRestarted) firing: HAProxy server restarted on cloudlb2001-dev:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cloudlb2001-dev&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[13:11:23] hmm that's pinging here but should cloud folks
[13:11:28] *should be for
[13:30:45] netops, Infrastructure-Foundations, SRE, cloud-services-team: cloudgw: review security policy for edge network - https://phabricator.wikimedia.org/T336368 (cmooney) @aborrero my apologies I messed up the vlan list for cloudgw2002. Should be ok now. ` cmooney@cloudsw1-b1-codfw> show arp no-resol...
[13:36:59] netops, Infrastructure-Foundations, SRE, cloud-services-team: cloudgw: review security policy for edge network - https://phabricator.wikimedia.org/T336368 (cmooney) @aborrero re-reading the description it sounds like there may be some other issues? Let me know if there is anything specific, the...
[13:40:32] sukhe: o/
[13:41:13] lemme know if it is ok to add the new LVS vip in codfw or if better to wait
[14:03:53] elukey: hi!
[14:04:07] I am provisioning a new host in 25 mins or so
[14:04:38] if you can wait that will be great but I think feel free to go for it too if you'd like
[14:05:33] sukhe: nono please even tomorrow, I am in a meeting so go ahead :)
[14:07:13] ok! thanks
[14:07:23] happy to do tomorrow
[15:02:29] sukhe: ok so if I don't hear from you anything I'll target the pybal restarts for codfw lvses tomorrow, otherwise ping me and I'll avoid doing that. Would it be ok?
[15:02:54] elukey: completely fine! thanks
[15:37:40] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[15:42:43] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[15:43:12] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[15:44:13] Traffic, WMF-Legal, Patch-For-Review, Performance-Team (Radar), Privacy: Add no-transform to Cache-Control header - https://phabricator.wikimedia.org/T218618 (BCornwall) I got in contact with @Maryana and they found an email thread on the subject last March. From Mr. Perry: > Yandex has resp...
[15:47:56] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[15:49:11] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[16:05:45] (HAProxyRestarted) resolved: HAProxy server restarted on cloudlb2001-dev:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cloudlb2001-dev&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[16:15:36] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[16:20:09] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[16:25:44] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[16:27:47] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[16:31:48] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[16:36:18] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[16:51:04] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[16:54:48] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[17:24:01] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[17:59:42] Traffic, DC-Ops, Infrastructure-Foundations: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (ssingh)
[18:07:56] Traffic, DC-Ops, Infrastructure-Foundations: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (ssingh) The other LVS hosts in codfw seem to be accessible from cumin2002, just not lvs2012 for some reason. ` sukhe@cumin2002:~$ ping -4 lvs201...
[18:58:51] Traffic, DC-Ops, Infrastructure-Foundations, SRE: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (Volans) I tried to ping from `lvs2012` few hosts in row C and all fails, so I think is the connection with the row C that is misconfigur...
[19:13:12] Traffic, RESTBase-API, Documentation: I am hitting a rate limit on REST API endpoint - https://phabricator.wikimedia.org/T307610 (BCornwall) Great, thanks. @BBlack What is your opinion of [[ #8838820 | the above comment ]]?
[19:15:30] Traffic, Infrastructure-Foundations, SRE, ops-codfw: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (ssingh) >>! In T336428#8842727, @Volans wrote: > I tried to ping from `lvs2012` few hosts in row C and all fails, so I think is the c...
[19:21:03] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (BCornwall) Open→In progress
[19:28:08] Traffic, DNS: Central and South American countries in geo-maps - https://phabricator.wikimedia.org/T301605 (BCornwall) @bblack: Is it like this on purpose?
[19:30:08] Traffic: Frequent server errors (503 and 502), happened several times in the last 2 days - https://phabricator.wikimedia.org/T297544 (BCornwall) Stalled→Invalid As there's been no response, I'm going to close this ticket.
Please, do re-open if this is still occurring as we'd love to fix it!
[20:17:43] Wikimedia-Apache-configuration, SRE, serviceops-radar: catch-all apache vhost on the cluster should return 404 for non-existing sites - https://phabricator.wikimedia.org/T137176 (Dzahn)
[20:51:30] Traffic, Infrastructure-Foundations, SRE, ops-codfw: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (cmooney) Definitely odd. I can ping fine with v6 (default) from cumin2002 as of now: ` cmooney@cumin2002:~$ ping lvs2012 PING lvs201...
[21:04:35] HTTPS, Traffic, SRE, serviceops, Abstract Wikipedia team (Phase λ – Launch): Get new edge & internal HTTPS certificates expanded to add wikifunctions.org and *.wikifunctions.org - https://phabricator.wikimedia.org/T313227 (BCornwall) In progress→Resolved Resolving since it appears to...