[03:47:56] Traffic, Operations, Patch-For-Review, Puppet, and 3 others: Deprecate `base::service_unit` in puppet - https://phabricator.wikimedia.org/T194724 (bd808)
[07:22:13] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) > I checked the log messages and could see “I2c slave read back errors” on this FPC: > May 31 20:00:08 re0.cr1-codfw chassisd[4838]: CHASSISD_I2CS_READBACK_ERROR: Readback error from I2C slave forFPC 0 ([...
[08:39:32] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) a:ayounsi→Papaul JTAC doesn't want to RMA it without a restart/reseat. Restart didn't help. @papaul can you re-seat cr1-codfw:fpc0 asap?
[09:00:58] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) a:Papaul→ayounsi Opened remote hands request instead T254136.
[11:50:50] netops, Operations, ops-esams: Amsterdam maintenance (June 2020) - https://phabricator.wikimedia.org/T254021 (ayounsi) Scheduling it for this Wednesday June 3rd at 6am UTC, 2h window for a 1h work.
[13:19:21] XioNoX: kafkatee / rpkicounter looks kinda broken on netflow3001
[13:19:27] May 31 18:00:51 netflow3001 kafkatee[22321]: OSError: [Errno 98] Address already in use
[13:19:37] still lots of spam for that
[13:20:06] https://gerrit.wikimedia.org/r/c/operations/puppet/+/585434
[13:20:21] ahah okay got it
[13:20:26] :)
[13:20:37] are you going to uninstall on those hosts by hand?
[13:21:27] cdanis: it's only 5 hosts so yes, but I can be convinced otherwise
[13:21:46] nah up to you
[13:22:42] I was digging through logs trying to figure out what happened here https://w.wiki/SVn
[13:25:36] curious
[13:25:41] exactly 30min
[13:25:46] yeah it's strange
[13:26:03] ah wait
[13:26:06] it's _all_ exporter_ips
[13:26:29] yeah okay, so this feels like a druid problem or something
[13:28:08] ok
[13:34:37] Traffic, Operations, observability, Performance-Team (Radar), Sustainability (Incident Prevention): Document and/or improve navigation of the various HTTP frontend Grafana dashboards - https://phabricator.wikimedia.org/T253655 (fgiunchedi) Indeed, thanks for filing this @Krinkle ! My two cent...
[13:52:17] Traffic, Analytics, Operations: missing wmf_netflow data, 18:30-19:00 May 31 - https://phabricator.wikimedia.org/T254161 (CDanis)
[14:37:57] bblack: in case you're around by any chance, do you see any blocker to merge https://gerrit.wikimedia.org/r/c/operations/dns/+/585545 ?
[14:38:09] (the diff with PS2 that you reviewed is just a rebase and the commit message)
[14:39:03] volans: I haven't looked at it in a little while, but I trust my former +1, and all things are currently nominal with the dns clusters
[14:39:08] so yeah, got for it
[14:39:46] great, thanks, will go "carefully" :)
[14:44:50] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (JFishback_WMF)
[14:47:03] bblack: FYI {done} deployment went ok, I'm testing the records
[14:48:15] awesome
[14:54:44] all resolution I've done looks good AFAICT
[16:43:02] Traffic, DC-Ops, Operations: Fix recdns config on various hardware devices - https://phabricator.wikimedia.org/T254178 (BBlack)
[17:59:26] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) The linecard went through those states: `0 Present Testing` `0 Offline ---Unresponsive---` `0 Present Absent` `0 Offline ---Unresponsive---` And seems to be flapping betw...
[18:03:09] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) Followed up with Juniper and requested a RMA.
[18:15:17] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) @RobH to be ahead of Juniper, they will need the following to issue the RMA and know where to ship the part. Note that the task is public, feel free to make it SRE only or email me the info. Linecard replacem...
[18:43:50] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (RobH) https://netbox.wikimedia.org/dcim/sites/esams/ point of contact is just 'iron mountain shipping' and the generic contact number. That has all the info I think you are requesting? We will need to open in inbo...
[18:49:36] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (RobH) IRC update: codfw not esams, heh.. https://netbox.wikimedia.org/dcim/sites/codfw/ I would list the generic info for their NOC though, which I'll append to that entry now.