[06:02:34] <wikibugs>	 Traffic, Operations: cp3048 down, mgmt console not reachable - https://phabricator.wikimedia.org/T171145#3455383 (elukey)
[06:02:40] <elukey>	 ema: ---^
[07:22:06] <wikibugs>	 Traffic, Operations: cp3048 down, mgmt console not reachable - https://phabricator.wikimedia.org/T171145#3455383 (MoritzMuehlenhoff) I had the same symptom with oxygen a few days ago and a "racadm racreset" fixed the mgmt for me.
[07:32:18] <wikibugs>	 Traffic, Operations: cp3048 down, mgmt console not reachable - https://phabricator.wikimedia.org/T171145#3455451 (elukey) From ipmitool sel I got a lot of these:  ```   7b | 07/20/2017 | 01:06:48 | Processor #0x0d | Transition to Non-recoverable | Asserted   7c | 07/20/2017 | 01:06:49 | Unknown #0x28 |...
[08:15:37] <ema>	 elukey: hey :)
[08:16:15] <ema>	 elukey: so cp3048 did come back up after power-cycling I see
[08:17:14] <ema>	 _joe_: thanks for taking care of the depool
[08:17:59] <_joe_>	 np
[08:20:09] <elukey>	 ema: yep! Weird but this morning me/Daniel weren't able to connect to mgmt
[08:21:21] <ema>	 elukey: so mgmt wasn't reachable at first, then later on it was?
[08:23:34] <elukey>	 this is my understanding but it was early in the morning so it might have been some PEBCAK
[08:23:55] <moritzm>	 the sshd on the mgmt is old, so it runs into the "slow DH group" problem from https://phabricator.wikimedia.org/T171041
[08:24:23] <moritzm>	 I think it was working fine all the time, but simply timed out with openssh > 7 as the client
[08:25:27] <ema>	 oh interesting
[08:58:36] <wikibugs>	 Traffic, Operations: cp3048 down, mgmt console not reachable - https://phabricator.wikimedia.org/T171145#3455570 (ema) Open>Resolved a:ema So as @MoritzMuehlenhoff mentioned on IRC the mgmt issues might have been due to T171041.  The host is back online and looks fine at the moment so I've re...
[09:24:11] <wikibugs>	 Traffic, Operations, Reading-Admin, Reading-Community-Engagement: TEST: redirect small portion of unauthenticated desktop users to mobile web - https://phabricator.wikimedia.org/T117826#3455702 (fgiunchedi)
[11:22:20] <paravoid>	 ema: we're getting a lot of alerts from UnitedLayers' Icinga about a PDU failing
[11:22:33] <paravoid>	 it may be that their network is flaky, or that the PDU is indeed flaky and rebooting or something
[11:23:00] <paravoid>	 if it's the latter which I doubt, we may be seeing equipment of ours losing half their power
[11:23:03] <paravoid>	 jfyi :)
[11:28:57] <ema>	 paravoid: thanks for the heads up!
[12:22:31] <volans>	 paravoid: last time I asked rob about this he told me that it seems their check is flaky and the PDUs were fine, but worth checking again
[12:22:46] <volans>	 it was a couple of months ago I think
[13:15:57] <paravoid>	 ema: cp3039 has a weird OCSP warning
[13:21:07] <ema>	 paravoid: mmh, globalsign-2016-rsa-unified.ocsp is in fact one day old
[13:24:28] <ema>	 Jul 20 10:52:42 cp3039 update-ocsp-all: Error querying OCSP responder
[13:24:29] <ema>	 Jul 20 10:52:42 cp3039 update-ocsp-all: 140171424560784:error:27076072:OCSP routines:PARSE_HTTP_LINE1:server response error:ocsp_ht.c:314:Code=524,Reason=Unassigned
[13:24:31] <ema>	 Jul 20 10:52:42 cp3039 update-ocsp-all: 
[13:24:33] <ema>	 Jul 20 10:52:42 cp3039 update-ocsp-all: OCSP update failed for /etc/update-ocsp.d/globalsign-2016-rsa-unified.conf
[13:24:42] <ema>	 I've tried running update-ocsp-all manually and it did work fine
[13:28:14] <volans>	 ema: I remember chatting with brandon about this; basically my understanding is that we run it once per day, and if it fails icinga will warn for 1 day, and if it runs fine the next day the alarm goes away
[13:28:49] <volans>	 I've proposed maybe doing a single retry, after sleeping a bit, in the script that fetches it when it fails
[13:29:04] <volans>	 to avoid those false positives for a single failure
[13:30:14] <ema>	 +1
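The single-retry idea volans describes above could look roughly like the following. This is only a minimal sketch and not the actual update-ocsp-all code: the helper names `run_command` and `run_with_single_retry` are hypothetical, the 60-second delay is an arbitrary illustrative value, and it assumes the fetch command exits non-zero when the OCSP responder returns an error.

```python
import subprocess
import time


def run_command(cmd):
    """Run cmd (a list of strings); return True if it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def run_with_single_retry(attempt, retry_delay=60.0, sleep=time.sleep):
    """Call attempt(); if it fails, sleep once and call it one more time.

    attempt is any zero-argument callable returning True on success.
    retry_delay (seconds) is an illustrative value, not taken from the log.
    sleep is injectable so the wait can be skipped in tests.
    """
    if attempt():
        return True
    # First attempt failed: wait a bit, then try exactly once more.
    sleep(retry_delay)
    return attempt()
```

Under those assumptions, a daily job might call `run_with_single_retry(lambda: run_command(["update-ocsp-all"]))` and only report a failure if both attempts fail, so a single transient responder error (like the Code=524 above) would not leave a day-long Icinga warning.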
[13:38:53] <ema>	 mmh cp1050 is stuck at 'Initializing firmware interfaces...'
[13:40:58] <wikibugs>	 netops, Operations, monitoring: Evaluate LibreNMS' Graphite backend - https://phabricator.wikimedia.org/T171167#3456248 (faidon)
[13:42:45] <ema>	 yes, I am trying to turn it off and on again :)
[13:51:21] <wikibugs>	 netops, Operations, monitoring, User-fgiunchedi: Evaluate LibreNMS' Graphite backend - https://phabricator.wikimedia.org/T171167#3456276 (fgiunchedi)
[13:54:29] <wikibugs>	 Traffic, Commons, Operations, Thumbor, media-storage: ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION - https://phabricator.wikimedia.org/T170605#3456281 (Jeff_G) Another new symptom using the same browsers as in the original description: https://upload.wikimedia.org/wikipedia/commons/thumb...
[13:55:55] <wikibugs>	 Traffic, Operations, ops-eqiad: cp1050 apparently stuck while "Initializing firmware interfaces..." - https://phabricator.wikimedia.org/T171168#3456283 (ema)
[13:56:11] <wikibugs>	 Traffic, Operations, ops-eqiad: cp1050 apparently stuck while "Initializing firmware interfaces..." - https://phabricator.wikimedia.org/T171168#3456296 (ema) p:Triage>Normal
[13:56:36] <wikibugs>	 Traffic, Operations, ops-eqiad: Degraded RAID on cp1008 - https://phabricator.wikimedia.org/T171028#3456297 (ema) @Cmjohnson please replace the disk (sda) whenever you've got the chance!
[14:05:18] <wikibugs>	 Traffic, Operations: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442#3456328 (ema) Forwarding-only caching resolvers would help with issues such as T171048 and T151643.
[14:29:12] <wikibugs>	 Traffic, Commons, Operations, Thumbor, media-storage: ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION - https://phabricator.wikimedia.org/T170605#3456387 (Aklapper) No such problems in Firefox 54 or Chromium 59 on a Linux desktop. Issue seems to be browser / platform specific?
[15:40:13] <wikibugs>	 netops, Operations, ops-eqiad: Replace cr1/2-eqiad air filters - https://phabricator.wikimedia.org/T170138#3456762 (Cmjohnson) Open>Resolved a:Cmjohnson done
[22:59:54] <wikibugs>	 Traffic, Operations, Performance-Team, TemplateStyles, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#3458687 (Etonkovidova)