[06:06:21] BlankEclair: incident comms are on discord btw
[06:06:32] oh okay
[06:06:47] BlankEclair: the relay has crapped it in an outage
[06:06:53] woo yay
[06:07:04] meanwhile i'm busy debugging an extension i'm making
[06:07:13] Swift is also dead
[06:09:26] dns may have crapped itself?
[06:09:28] > [04/09/2024 15:46] PROBLEM - os161 PowerDNS Recursor on os161 is CRITICAL: No response from DNS ::1
[06:18:51] BlankEclair: yup
[06:28:56] BlankEclair: I'm fairly sure it's internal dns only
[06:29:01] Void is around and looking
[06:52:28] @orduin something just happened on bots171
[06:52:37] The password I meant
[06:52:38] yeah, I rebooted it
[06:52:49] @orduin can you reboot every down server in icinga?
[06:53:03] Showing puppet, ntp or PowerDNS errors
[06:53:15] sure, don't really have any better ideas
[06:53:59] @orduin can we try and see if we can grab any logs too?
[06:56:22] yeah will do
[06:58:04] https://issue-tracker.miraheze.org/T12538 for the outage
[06:58:12] Subtask for you not knowing the password
[06:58:33] I will be arriving at work soon so going offline
[07:02:43] @orduin mw162 is definitely still offline
[07:02:59] And garylog161
[07:03:04] yeah, doesn't seem to be working
[07:03:12] That's no fun
[07:03:34] @orduin any indication from the ones that did work on why it died
[07:04:46] @orduin icinga + phab's DB just went offline
[07:05:03] PROBLEM - db182 PowerDNS Recursor on db182 is CRITICAL: Domain 'wikitide.net' was not found by the server
[07:05:38] Although phab is still up
[07:05:40] [1/5] Not clear, just seeing:
[07:05:41] [2/5] ```
[07:05:41] [3/5] Sep 4 05:42:49 swiftproxy171 pdns-recursor[608]: msg="Failed to update . records" error="Too much time waiting for .|NS, timeouts: 5, throttles: 0, queries: 6, 7508msec" subsystem="housekeeping" level="0" prio="Error" tid="0" ts="1725428569.875" exception="ImmediateServFailException"
[07:05:41] [4/5] Sep 4 05:42:49 swiftproxy171 pdns-recursor[608]: msg="Failed to update . records" subsystem="housekeeping" level="0" prio="Warning" tid="0" ts="1725428569.875" rcode="-1"
[07:05:42] [5/5] ```
[07:09:19] could be a network problem then
[08:33:50] Is there a problem with https://issue-tracker.miraheze.org/ (Can Not Connect to MySQL)?
[08:40:53] robkam: it fixed itself
[08:58:00] Everything is randomly breaking. Blame dns.
[09:02:18] > [04/09/2024 18:17] PROBLEM - Host db182 is DOWN: CRITICAL - Host Unreachable (10.0.18.103)
[09:02:22] maybe not dns >.<
[10:50:34] [1/2] I can't log into "new" wiki, i.e. where CA hasn't attached me yet, because of VPN, this is something new
[10:50:34] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1280842437425762374/IMG_20240904_134923.jpg?ex=66d98cf9&is=66d83b79&hm=1e01b65808121d1efbd1b12a40179550368eddb490d8908aa9f2a55b4805febe&
[10:51:24] it's not new
[10:51:48] @theoneandonlylegroom https://meta.miraheze.org/wiki/Tech:Noticeboard#GlobalBlocking_affecting_account_autocreation
[10:52:08] ah
[10:52:11] it's a change in the GlobalBlocking extension in 1.42
[10:52:11] my bad
[10:52:52] that went under my radar also
[16:12:51] @bluemoon0332 is our status still things maybe breaking possibly network possibly dns
[16:13:41] custom domain DNS is still definitely having troubles.
[16:14:09] we're through the worst of it now
[16:14:36] I've been talking with FiberState about this
[16:14:51] @bluemoon0332 have we identified cause then?
[16:15:12] they're investigating, but we do have a theory regarding possible cause
[16:15:20] What is the theory
[16:15:39] Maybe better discussion for internal until theory is confirmed?
[16:16:37] @bluemoon0332 can dm if needed but tbh there's no reason unless there's security aspects to keep it quiet
[16:17:06] I would like to know what the cause is though
[16:17:28] it's very early still, but they suspect this was a "temporary upstream issue"
[16:17:39] Okay
[16:17:54] Ah, racoons stealing wiring, classic.
🤣
[16:18:03] it might
[16:18:19] @notaracham is there anything specific with custom domains?
[16:18:31] Cause this morning it was mostly impacting internal services tbh
[16:18:46] https://cdn.discordapp.com/attachments/1006789349498699827/1280925031119061043/image.png?ex=66d9d9e5&is=66d88865&hm=80d3e36ac2afdaf9e75000c61b56f5ccde1dc4f262d6f163dac36a0bd0cd523e&
[16:18:55] I'll try flushing local DNS cache to be prudent
[16:18:55] oh
[16:19:19] that is quite possibly a different error altogether
[16:19:23] oh ye
[16:19:24] that is
[16:19:35] @notaracham can I dm you?
[16:19:40] i forgot about you
[16:19:43] as time passed, IPv6 connectivity was being lost on one cloud server and some VMs
[16:20:04] Heh, sure. My workday has begun so i'll be less available for the next couple hours
[16:20:18] in fact, as I was writing a comment on the phorge task about PowerDNS, phorge171 had lost IPv6 connectivity entirely
[16:21:21] it all happened right when I powered on the computer too...
[16:21:35] you think it was a signal that maybe I shouldn't use the computer?
[16:22:21] yes
[16:22:33] it feels like a going outside and not using computers day tbh
[16:22:47] I'm gonna do just that today actually
[16:23:14] we once had a cloudflare outage when fossbots was behind cf for like half an hour and my status update was, ye nothing we can do. It's sunny. Go outside instead.
[16:28:09] I can’t access non-mediawiki services unless I proxy/vpn to a different country
[16:32:35] weird
[16:32:49] that points to something ISP specific in the US though
[20:49:17] Currently going through every vm and if I can't ping it, I restart it
[20:49:28] whack-a-mole
[20:49:43] how long do we wanna wait before officially pronouncing the issue resolved
[20:49:44] cloud16 done
[20:50:03] i have no power whatsoever, but maybe a few hours?
[20:50:21] that sounds reasonable
[20:50:28] * MacFan4000 watches the icinga alerts vanish one by one
[20:50:41] and hey I don't officially have tech power either lol
[20:50:54] if it's looking good maybe that tentative give it a try announce, then confirm after a good period?
[20:50:55] dont mean opinions arent helpful
[20:51:09] id say only ping after a bit
[20:51:12] @orduin
[20:51:12] just going based on what happened earlier
[20:51:25] can you get info from FS bout what happened
[20:51:36] istg if its something stupid like a loose cable
[20:52:03] Dunno, we were told "Please try now, this should be resolved."
[20:52:16] can you ask?
[20:52:22] prob useful for the PM
[20:52:27] or is it locked
[20:57:05] cool
[20:57:06] well
[20:57:11] im gonna log off for now
[20:57:23] may pop on IRC later
[21:00:43] Thanks @orduin
[21:19:15] I'll keep an eye on things, I've asked FS for any hints on what happened and if there's anything we can do on our end.
[21:21:40] Cool