[00:40:42] Heya traffic folks, I already echo'd this to brandon earlier today, but the work is now ongoing in eqsin
[00:40:58] we're having to move power cables around a bit, but jin did this before with no downtime and replacing entire pdus
[00:41:18] now we're just shifting things to make room for a router without cables blocking, but just echoing its happening and im logged into each cp host as it happens
[04:29:58] and now all work there is done, no incidents or outages
[08:03:36] Traffic, DNS, Operations, netbox: Netbox DNS change not effective in gdns - https://phabricator.wikimedia.org/T255748 (ayounsi) p:Triage→High
[08:07:49] Traffic, DNS, Operations, netbox: Netbox DNS change not effective in gdns - https://phabricator.wikimedia.org/T255748 (ayounsi) I then deployed https://gerrit.wikimedia.org/r/c/operations/dns/+/606385 and the `sudo -s authdns-update` fixed instantly: ` bast5001:~$ host cr3-eqsin.mgmt.eqsin.wmnet...
[10:44:18] Traffic, DNS, Operations, netbox: Netbox DNS change not effective in gdns - https://phabricator.wikimedia.org/T255748 (Volans) Glad it solved for now, that's what I thought could be happening. I tried an authdns update but with no changes it exit earlier. I guess the way the cookbook calls the gdn...
[11:51:39] Traffic, Operations: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams - https://phabricator.wikimedia.org/T255368 (ema) Open→Resolved a:ema >>! In T255368#6231244, @ema wrote: > what we need to do is (1) ensure the origins don't send Transfer-Encoding on 304 responses...
[12:09:55] netops, Operations, ops-eqsin: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (ayounsi) p:Triage→Medium
[12:11:19] netops, Operations, ops-eqsin: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (ayounsi)
[12:11:22] netops, DC-Ops, Operations, ops-eqsin: (Need By: TBD) rack/setup/install cr3-eqsin.wikimedia.org - https://phabricator.wikimedia.org/T253246 (ayounsi)
[12:33:23] hmmm we've lost cp5005.mgmt... is that expected XioNoX?
[12:33:29] (4hours ago already)
[12:33:38] vgutierrez: nop
[12:33:54] oh sorry
[12:33:55] it seems to be back
[12:34:01] but my icinga.wm.o/alerts tab is frozen
[12:34:02] :/
[12:34:29] in -operations we have the recovery at 12:08 UTC
[12:34:42] no pb!
[12:47:00] netops, Operations, ops-eqsin, Patch-For-Review: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (ayounsi)
[12:48:14] netops, Operations, ops-eqsin, Patch-For-Review: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (ayounsi)
[13:31:42] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) Bump
[14:36:05] netops, Operations, ops-codfw: codfw: rack/setup new srx300 (mr1) - https://phabricator.wikimedia.org/T255577 (Papaul) @ayounsi the configuration on mr1 is done. Can you take a look and see if i missed anything. The temp root password is the same as the mgmt password.
[15:21:16] netops, Operations, ops-eqsin, Patch-For-Review: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (RobH) It isn't clear to me if this is for Jin (DreamICC) or for Equinix remote hands. If it is Jin, and I'll be supervising them, I prefer we never do work in eqsin on Monday, a...
[15:21:43] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Reedy) Minor issue, I going to https://wikiworkshop.org/2019 (and other older sites, rather than one wi...
[15:24:08] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Reedy) ` curl -I -L https://wikiworkshop.org/2019 HTTP/2 301 date: Thu, 18 Jun 2020 15:17:14 GMT serve...
[15:26:28] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Reedy) https://github.com/wikimedia/puppet/blob/58ac95353aca3f0925017407ddebc2d397cd9f2f/modules/profil...
[15:31:28] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Vgutierrez) that's interesting.. why we don't have HSTS headers for wikiworkshop.org?
[15:43:06] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (ema) >>! In T242767#6207395, @Ottomata wrote: > Hm, I'm pretty sure the connection is terminated even when t...
[15:51:46] vgutierrez: Should the HSTS header be in the apache config or in wikimedia-frontend.vcl.erb?
[15:52:02] for the canonical domains varnish does it
[15:52:14] I don't know if bblack intentionally skipped wikiworkshop
[15:52:25] >// Same regex as above in https_recv_redirect
[15:52:26] I'll ping him about that on Monday
[15:52:31] lol, there's no regex in https_recv_redirect
[15:53:06] yeah.. we used to have one AFAIK :)
[15:53:46] !bug 1
[15:53:59] We definitely have similar regexes elsewhere
[15:54:24] netops, Operations, ops-codfw: (Need by: End of July-2020 ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (Papaul) @wiki_willy in C8 we have two msw (fmsw-c8 and msw-c8) which msw are we going to replace since we have only 1 new mgmt switch per rack.
[15:55:19] https://gerrit.wikimedia.org/r/606457 removes said comment
[16:16:40] netops, Operations, ops-eqsin, Patch-For-Review: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (ayounsi) Discussed over IRC. I don't have any strong preferences on the two questions above. It's a tradeoff between DC traffic, work quality and working hours for both remote han...
[16:26:22] netops, Operations, ops-codfw: (Need by: End of July-2020 ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (wiki_willy) @Papaul - if we're short by one msw from upgrading everything, then I would say to not upgrade the most recent msw that you have at codfw. And...
[16:34:58] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) Just parking a crazy idea I just had, mostly irrelevant to this ticket. > Large downloads are very...
[17:09:00] netops, Operations, ops-eqsin, Patch-For-Review: cr3-eqsin to production - https://phabricator.wikimedia.org/T255766 (RobH) I've created a google doc, since Jin doesn't use phabricator, outlining all the steps above: https://docs.google.com/document/d/1s2_ALpvDT9xTGihYE8BIXo41dSR9T11sZNUFs1mMrFM...