[05:10:14] <_joe_> cdanis: that's wrong, we do run rsyslog in every mw pod
[05:11:17] <_joe_> it listens on a local TCP port for messages coming from MediaWiki. We've considered running daemonsets instead
[13:22:54] thanks _joe_, the additional context is trying to capture more debug output from msmtp, https://phabricator.wikimedia.org/T383047#11075860
[13:23:59] <_joe_> jhathaway: so you want to send its logs to syslog?
[13:24:03] <_joe_> or what specifically?
[13:25:16] it writes its error messages to stderr, but php-fpm sends those to /dev/null, so I was thinking of adding a wrapper in combination with logger to send them to a remote rsyslog, i.e. the one in the pod
[13:26:21] <_joe_> that could work, but you'll need to add some specific config to that rsyslog
[13:27:05] I couldn't pretend to be a MediaWiki log?
[13:27:22] <_joe_> please no :)
[13:27:28] advice taken
[13:27:39] <_joe_> there's a series of reasons why that's not great
[13:27:52] <_joe_> first, that logs from mw have their own topics and filters IIRC
[13:29:04] got it, it just sounded simpler at first pass, though I do love RainerScript
[13:36:17] _joe_: ahh, I misread the chart and didn't do my usual thing of just looking at kubectl directly 😅 re: daemonset, in theory the otel collector should have all the functionality we'd need
[13:56:37] crossposting from Slack: urandom and I are looking to present at https://2025.texaslinuxfest.org/ in October. Are there any previous SRE presentations y'all would recommend to borrow from/take inspiration from/etc.?
[14:02:31] jhathaway or moritzm, could I get a quick look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1177403? It's the last blocker (I think) to us building Trixie images for cloud-vps.
[14:12:38] yup
[14:27:53] andrewbogott: will these images use the Puppet 8 agent from Trixie?
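[Editor's note] The wrapper-plus-logger idea from the 13:25 exchange could be sketched roughly as below. This is a sketch under assumptions, not the actual pod setup: the wrapper filename, the msmtp path, the tag, and the TCP port 10514 are all hypothetical, and the pod rsyslog would still need a matching input and routing rule, per _joe_'s comment.

```shell
# Hypothetical sketch: generate an msmtp wrapper script that redirects
# stderr (which php-fpm would otherwise send to /dev/null) to the
# pod-local rsyslog via logger(1) over TCP. Port 10514 is an assumption.
cat > msmtp-wrapper <<'EOF'
#!/bin/bash
# Process substitution feeds msmtp's stderr into logger, which forwards
# it over TCP to the rsyslog running in the same pod.
exec /usr/bin/msmtp "$@" \
  2> >(logger --tcp --server 127.0.0.1 --port 10514 \
              --tag msmtp --priority mail.err)
EOF
chmod +x msmtp-wrapper
```

logger(1)'s `--tcp`/`--server`/`--port` flags come from util-linux; the receiving rsyslog would need its imtcp input listening on the chosen port.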
for baremetal hosts we'll eventually use a forward port of the Puppet 7 agent (available in component/puppet7 for trixie-wikimedia), so that's also an option for cloud VPS
[14:28:42] but it's also perfectly fine to use 8 from the get-go; the component is mostly used to not block Trixie migrations until all legacy facts are ported away across the roles
[14:30:18] moritzm: they will use 8 for the moment, but the bootstrap code does an upgrade with --allow-downgrades, so if/when you pin things then new VMs will switch to 7
[14:34:56] moritzm: I just created T401694 for openjdk-11 on Bookworm. It's low-priority, but if I can help LMK. I can reprepro with the worst of 'em :)
[14:34:57] T401694: Make openjdk-11 available in our Debian Bookworm repos - https://phabricator.wikimedia.org/T401694
[14:37:45] andrewbogott: ah, that makes sense
[14:37:59] inflatador_: I'll look into it next week
[14:38:03] thanks moritzm
[14:39:03] moritzm: ACK, thanks. I imagine you are pretty busy with Trixie ;P
[15:04:19] !log tcpdump dhcp traffic capture on cloudnet1005 and cloudnet1006 - T400223
[15:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:23] T400223: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223
[20:59:24] heads-up that I've depooled eqiad search for troubleshooting T400160.
We plan to repool within the next hour or so. cc: ebernhardson ryankemper
[20:59:25] T400160: Investigate eqiad cluster quorum failure issues - https://phabricator.wikimedia.org/T400160
[21:12:58] !incidents
[21:12:59] 6591 (UNACKED) ATSBackendErrorsHigh cache_text sre (mw-web-ro.discovery.wmnet eqsin)
[21:12:59] 6584 (RESOLVED) db2161 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:00] 6583 (RESOLVED) db2154 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:00] 6589 (RESOLVED) db2163 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:00] 6588 (RESOLVED) db2152 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:00] 6587 (RESOLVED) db2166 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:00] 6585 (RESOLVED) db2164 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:02] 6590 (RESOLVED) db2181 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:02] 6586 (RESOLVED) db2167 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:02] 6580 (RESOLVED) db2164 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:02] 6574 (RESOLVED) db2167 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:03] 6579 (RESOLVED) db2154 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:03] 6578 (RESOLVED) db2161 (paged)/MariaDB Replica Lag: s8 (paged)
[21:13:04] !ack 6591
[21:13:04] 6576 (RESOLVED) db2163 (paged)/MariaDB Replica Lag: s8 (paged)
[21:55:11] Following up from above, we've repooled eqiad search
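[Editor's note] The "pin things then new VMs will switch to 7" step from the 14:30 exchange could be expressed as an apt pin like the following. This is a hedged sketch: the preferences filename, the `puppet*` package glob, and the exact suite/component field values are assumptions inferred from the component/puppet7 mention, not the actual repo metadata.

```
# Hypothetical /etc/apt/preferences.d/puppet7
# Prefer the forward-ported Puppet 7 agent from component/puppet7,
# even if it means downgrading from the Trixie default of Puppet 8
# (hence the bootstrap's apt-get install --allow-downgrades).
Package: puppet*
Pin: release a=trixie-wikimedia, c=component/puppet7
Pin-Priority: 1001
```

A priority above 1000 allows apt to select the pinned version even when a newer version is installed, which matches the downgrade-on-bootstrap behavior described above.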