[00:54:26] netops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (Jclark-ctr)
[00:55:41] netops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (Jclark-ctr) a:Jclark-ctr→Cmjohnson switches racked in c8 and d5 added to Netbox
[07:43:39] Traffic, Operations, Performance-Team (Radar): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (Gilles) [[ https://grafana.wikimedia.org/d/M7xQ_BeWk/response-time-by-host?orgId=1 | The dashboard ]] won't open for me now, it's stuck on a spinner: {F...
[07:50:38] netops, Operations, ops-eqord: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (ayounsi) Remote hands replaced the optics yesterday but the link is still down. Lights are correct. Emailed Telia 12h ago with: ` Remote hands replaced the optic, we're still seeing...
[08:30:20] Traffic, Operations, Patch-For-Review: ATS: Add the ability to check if origin server responses can be cached and their lifetime to the Lua plugin - https://phabricator.wikimedia.org/T251537 (ema) The PR adding a function to the TS API for getting maxage [[ https://github.com/apache/trafficserver/pul...
[09:37:37] netops, Operations, observability: LibreNMS monitoring glitch caused paging - https://phabricator.wikimedia.org/T252630 (ayounsi) p:Triage→Medium
[09:39:42] netops, Operations: Upgrade Junos on asw2-esams - https://phabricator.wikimedia.org/T252631 (ayounsi) p:Triage→Low
[09:40:16] netops, Operations, observability: LibreNMS monitoring glitch caused paging - https://phabricator.wikimedia.org/T252630 (ayounsi)
[12:26:52] Traffic, Analytics, Operations, User-jbond: Fix geoip updaters for new MaxMind hashed keys by 2019-08-15 - https://phabricator.wikimedia.org/T228533 (Marostegui) It's been around 10 months since the last update, anything pending here?
[12:38:47] Traffic, Operations: cp5012 fails to boot after reimage: junk in compressed archive unpacking initramfs - https://phabricator.wikimedia.org/T237360 (Marostegui) @ema is this still valid?
[12:44:23] XioNoX: another weird thing that happened last night was we had some packet loss towards eqsin (enough to cause icinga to alert about SNMP scraping failures for the router network interfaces check)
[12:45:17] cdanis: didn't check timeline, but looks like the primary transport to eqsin failed
[12:45:26] from librenms emails
[12:45:30] ah
[12:45:35] around 03:55?
[12:45:37] https://smokeping.wikimedia.org/?displaymode=n;start=2020-05-13%2003:44;end=2020-05-13%2004:20;target=eqsin.Core.cr1-eqsin
[12:46:26] cdanis: Time elapsed: 3m 39s Timestamp: 2020-05-13 04:12:01
[12:47:03] so I think it's related
[12:47:04] so seems it took a while to fail over then
[12:47:54] https://cas-icinga.wikimedia.org/cgi-bin/icinga/history.cgi?host=cr1-eqsin&service=Router+interfaces
[12:48:23] the timestamp I shared was the recovery it seems
[12:50:18] cdanis: yeah without digging too much, and with LibreNMS 5min granularity that sounds about right
[12:50:42] Traffic, Commons, MediaWiki-File-management, Operations, Thumbor: 500, Internal Server Error on Commons for images at specified size - https://phabricator.wikimedia.org/T250211 (Marostegui) Open→Resolved This works now. I am going to consider this resolved, maybe it was a one time thi...
[12:51:04] I guess, packet loss has to get bad enough for BFD / OSPF to stop working, but of course other things can notice or be affected 'first'
[12:53:10] Traffic, Operations: Servers freezing across the caching cluster - https://phabricator.wikimedia.org/T238305 (Marostegui)
[12:57:19] Traffic, Operations: Backport iproute2 4.x from debian testing -> our jessie - https://phabricator.wikimedia.org/T138591 (Marostegui) Open→Declined Declining per T138591#3853953
[12:58:09] Traffic, Operations, SRE-swift-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648 (Marostegui) Open→Resolved There is no way we can debug this anymore after 4 years :)
[13:19:16] netops, Operations, ops-eqord: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (CDanis) At 07:50 UTC Telia responded stating that this was due to a planned maintenance PWIC110129, despite the circuit having been down for days already. The maintenance was schedul...
[13:22:26] XioNoX: what's the best way to bounce an interface?
[13:23:21] cdanis: `set interface X disabled` `commit confirmed 2` `rollback 1` `commit`
[13:23:39] Traffic, Operations: cp5012 fails to boot after reimage: junk in compressed archive unpacking initramfs - https://phabricator.wikimedia.org/T237360 (ema) Open→Resolved a:ema >>! In T237360#6133331, @Marostegui wrote: > @ema is this still valid? We haven't found the cause, but there's certain...
[13:23:48] XioNoX: ok, will do
[13:24:25] cdanis: thanks! I saw the emails, was working on something else
[13:26:04] np, happy to learn things
[13:31:09] netops, Operations, ops-eqord: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (CDanis) Replied to Telia: > Thanks, interfaces have been bounced on both ends. > > Here's the light levels on the Chicago side: > > cdanis@cr2-eqord> show interfaces diagnostics...
[14:43:47] Traffic, Operations, SRE-swift-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648 (Aklapper) Resolved→Declined Translates to declined to me as nothing was actively resolved :)
[14:50:16] Traffic, Operations: cp5012 fails to boot after reimage: junk in compressed archive unpacking initramfs - https://phabricator.wikimedia.org/T237360 (ema) Resolved→Declined
[15:01:07] netops, Operations, observability, User-fgiunchedi: Upgrade LibreNMS to 1.63 - https://phabricator.wikimedia.org/T251222 (fgiunchedi) Open→Resolved a:fgiunchedi Upgraded!
[15:02:51] netops, Operations, observability, User-fgiunchedi: Upgrade LibreNMS to 1.63 - https://phabricator.wikimedia.org/T251222 (jcrespo) Resolved→Open ` [14:57] PROBLEM - MariaDB Slave SQL: m1 on db2078 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1118, Errmsg...
[15:05:55] netops, Operations, observability, User-fgiunchedi: Upgrade LibreNMS to 1.63 - https://phabricator.wikimedia.org/T251222 (jcrespo) ` Error 'Row size too large. The maximum row size for the used table type, not counting BLOBs, is 8126. This includes storage overhead, check the manual. You have to...
[15:39:00] netops, Operations, observability, User-fgiunchedi: Upgrade LibreNMS to 1.63 - https://phabricator.wikimedia.org/T251222 (fgiunchedi) Open→Resolved Resolving as @jcrespo fixed the issue and will be following up with a separate task
[18:09:40] Traffic, Operations, Performance-Team (Radar): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (dpifke) Weird. It was working yesterday (I verified that new data was appearing with the correct labels), but is now hanging for me as well. I'll inves...
[18:16:19] Traffic, Operations, decommission, ops-eqiad, Patch-For-Review: Decommission lvs1007-1012 - https://phabricator.wikimedia.org/T208586 (Cmjohnson) Open→Resolved lvs10[10-12] still had network port description. removed a few old lvs links in wmnet file. resolving task. Removed all se...
[18:16:21] netops, Operations, decommission, ops-eqiad: Decommission asw-c-eqiad - https://phabricator.wikimedia.org/T208734 (Cmjohnson)
[18:16:23] netops, Operations, ops-eqiad, Patch-For-Review: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (Cmjohnson)
[18:21:14] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (Cmjohnson) Open→Resolved no sign of any entries for these servers, they have already been sold.
[18:21:18] netops, Operations, ops-eqiad, Patch-For-Review: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (Cmjohnson)
[18:21:21] netops, Operations, decommission, ops-eqiad: Decommission asw-c-eqiad - https://phabricator.wikimedia.org/T208734 (Cmjohnson)