[07:05:57] netops, Operations: Zayo link eqiad-codfw (OGYX/120003//ZYO) down - TTN-0004110251 - https://phabricator.wikimedia.org/T253610 (ayounsi) Current status as of 30min ago: > Zayo Technician has cleaned Fibers connections. Zayo is waiting for the circuit to take Errors before changing out the Card. We curren...
[07:14:24] Traffic, Operations, serviceops, Patch-For-Review: Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (ayounsi) ACKing the alerts again with that task as comment.
[08:02:17] Traffic, Wikimedia-Apache-configuration, Operations, Wikimedia-Site-requests: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648 (Nintendofan885) >>! In T249648#6050626, @Bugreporter wrote: > Note sco.wiktionary.org/wiki/ and sco...
[09:03:38] Traffic, netops, Operations: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (ayounsi) p:Triage→Low
[09:17:02] Traffic, netops, Operations, Patch-For-Review: Anycast: consistent routers->servers routing - https://phabricator.wikimedia.org/T253666 (ayounsi) > Option B via MEDs sounds like a good path forward for now, though! https://gerrit.wikimedia.org/r/598836 has been tested and is ready to be merged....
[09:17:38] Traffic, netops, Operations, Patch-For-Review: Anycast: consistent routers->servers routing - https://phabricator.wikimedia.org/T253666 (ayounsi)
[09:17:40] Traffic, netops, Operations, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (ayounsi)
[09:17:50] Traffic, netops, Operations: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (ayounsi)
[09:17:55] Traffic, netops, Operations, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (ayounsi)
[09:29:55] Traffic, netops, Operations: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (ayounsi) pmtud send the packets to the broadcast MAC address, which mean it only works within the same subnet. While we have hosts on different subnets (rows) in the core DCs. How...
[12:56:00] netops, Operations: scrape ripe atlas data for a few anchors at other large networks - https://phabricator.wikimedia.org/T252890 (jbond) > I think I'm leaning towards a few stable anchors in similar geographic locations to our PoPs. Maybe also a few root servers as well even though they're less apples-to...
[13:35:49] XioNoX: so re: T253610 -- the latest update from Zayo says that they've had their link be clean of errors for 45 minutes but ... the circuit still shows down for us?
[13:35:49] T253610: Zayo link eqiad-codfw (OGYX/120003//ZYO) down - TTN-0004110251 - https://phabricator.wikimedia.org/T253610
[13:36:14] I'm worried that whatever they're troubleshooting is a different problem than the one we're experiencing
[13:36:14] looking
[13:37:30] I'm bouncing it on the codfw side, eqiad side is up
[13:37:34] ah
[13:38:36] cdanis: interesting, we're getting good light
[13:38:41] but no link
[13:38:44] that is how it's manifested before yeah
[13:38:50] ah ok!
[13:38:55] all the past times, light levels unchanged, but no packets
[13:39:08] one side shows interface down, other side shows loss of OSPF
[13:39:34] I'm guessing it's because the local loops aren't the broken part, but rather some part of their DWDM in the middle
[13:39:45] then yeah, need to follow up with them, do you want to do it? Or I can no pb
[13:39:51] I can do it
[13:40:24] I think if you send them enough emails, they will invite you to their christmas party
[13:41:53] the problem is that this year they do it on zoom so you don't even get free beer
[13:42:38] then why bother?
[13:42:44] no point
[17:23:52] vgutierrez: We're talking about expanding our use of acme-chief in wmcs (for example, using it for the top level *.wmflabs.org and *.wmcloud.org certs). One of the items on my to-do list there is "make sure that the maintainers of acme-chief know that we're relying on it and reluctantly agree to take our use cases into account in future updates"
[17:23:58] Is that a realistic thing to want?
[17:25:49] what are your use cases? :)
[17:27:02] don't take me wrong, I'm pretty happy to see acme-chief being used across the organization
[17:28:23] I think the use cases are currently subsets of your use cases. One change we might make is running with a single active host rather than an active and a passive.
[17:28:59] that's a puppetization issue more than a acme-chief matter
[17:29:11] And it will integrate with a slightly different type of puppetmaster (using the ::standalone class). As far as I know that's not a significant difference.
[17:29:41] vgutierrez: Good point. I think changes in prod puppet classes are the thing that's most likely to break our cloud-vps deployments. I expect the code itself is pretty generic.
[17:30:00] yup
[17:30:09] we already have different profiles
[17:30:13] ok
[17:30:14] for cloud and production
[17:30:44] So, mostly this is just to serve as 1) warning that you are about to have 'external' users, and 2) request that you loop us in if any breaking changes are in the works.
[17:31:08] Are you also the main point of contact for the associated puppet pieces, or is that someone else?
[17:31:13] yeah, that makes sense
[17:31:38] yup that would be me AFAIK
[17:32:06] Krenair worked on the cloud profile and he is familiar as he also collaborated on the production one
[17:32:49] yep, I already warned Krenair that I'd be leaning on him for help with setting this up :)
[17:33:20] thank you! I'll let you know what wrinkles we hit.
[17:35:01] ack
[17:52:28] We might need a little tweak in the acme-chief puppet module beyond just the profile
[18:36:20] netops, Operations, Patch-For-Review: intermittent brief data dropouts for esams netflow data - https://phabricator.wikimedia.org/T253128 (CDanis) One more of these today: `May 27 18:33:01 netflow3001 nfacctd[28442]: INFO ( default_kafka/kafka ): [/etc/pmacct/librdkafka.conf] Reading librdkafka glob...
[18:44:25] netops, Operations: Zayo link eqiad-codfw (OGYX/120003//ZYO) down - TTN-0004110251 - https://phabricator.wikimedia.org/T253610 (ayounsi) As of 16min ago: > Zayo has opened a case against your service. TTN-0004116026 > We are investigating a possible interruption, which may be impacting your service, and...
[20:16:26] Domains, Traffic, DNS, Operations: Create diff.wikimedia.org subdomain - https://phabricator.wikimedia.org/T253807 (CKoerner_WMF)
[20:54:43] Acme-chief, cloud-services-team (Kanban): tools/toolsbeta: improve acme-chief integration - https://phabricator.wikimedia.org/T252762 (Krenair) Declined→Open
[23:57:26] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (bmansurov) @JFishback_WMF, issues mentioned at T251732#6158467 have been addressed.