[00:14:33] HTTPS, Traffic, Discovery, Operations, and 2 others: compile number of http uses for http://www.wikidata.org/entity - https://phabricator.wikimedia.org/T154017#3035189 (Krinkle)
[11:40:20] Traffic, Analytics-Kanban, Operations, User-Elukey: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3036155 (Milimetric) yes, definitely has been going on for a while because we never really looked at it. We assumed the numbers piwik reported made sense because...
[14:39:44] bblack: there's a couple of varnish expiry mailbox alerts btw, not sure what to do about them tho
[15:02:18] godog: they're kind of experimental so far
[15:05:43] https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?var-server=cp1072&var-datasource=eqiad%20prometheus%2Fops&from=now-6h&to=now&panelId=21&fullscreen
[15:05:48] ^ that shows what's being alerted on
[15:06:02] but we're still trying to find optimal values for alerting
[15:06:29] expiry mailbox lag itself isn't technically a problem, but it's the reason we restart backends once a week (because otherwise eventually it runs away), and can cause 503s when it gets too bad
[15:08:02] cp1074 is about a day away from its natural restart, cp1072 closer to 12h
[15:09:10] ah, interesting, looks like the same pattern of increasing -> back down -> increasing repeated on feb 8th too
[15:09:14] so far no 503 impact from those two that are alerting
[15:09:49] in https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes?var-site=eqiad&var-cache_type=upload&var-status_type=5&from=now-6h&to=now
[15:10:08] I'll keep an eye out and ack those
[15:13:16] neat, thanks !
[18:23:19] netops, Labs, Operations: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#3037010 (faidon) Open>Resolved a:faidon The "Sanity Checks Failed" log messages continue to happen sporadically but we haven't had a switch failure in over 3 weeks no...
[21:45:56] netops, Operations, ops-ulsfo: lvs4002 power supply failure - https://phabricator.wikimedia.org/T151273#3037470 (RobH) a:RobH>BBlack I'm assigning this task to Brandon for followup. In IRC, we discussed that he would likely fail ulsfo over to a 3 lvs system setup. I'm not sure if there is a...