[02:25:28] Traffic, DNS, Operations, Patch-For-Review: mgmt hosts that exist but don't resolve to an IP - https://phabricator.wikimedia.org/T149875#2797909 (Dzahn)
[02:28:07] Traffic, DNS, Operations, Patch-For-Review: mgmt hosts that exist but don't resolve to an IP - https://phabricator.wikimedia.org/T149875#2797911 (Dzahn) Open>Resolved all done. ran getmgmtips rejects.txt stays empty.
[07:09:42] Traffic, MediaWiki-General-or-Unknown, Operations: Failure to save recent changes - https://phabricator.wikimedia.org/T150503#2798106 (Marostegui) Open>stalled p:Unbreak!>High
[09:29:29] is there a way to test the 6k use case reported in --^
[09:46:31] elukey: yeah, I guess. I've edited my sandbox with >6k of text and no repro.
[09:47:45] ema: your sandbox? (/me ignorant)
[09:48:25] elukey: https://en.wikipedia.org/wiki/Help:My_sandbox :)
[09:48:42] ah nice!
[09:49:25] I also checked the apache2 logs and didn't find any major trend ongoing (at least not similar to last week)
[12:37:08] Wikimedia-Apache-configuration, Operations: catch-all apache vhost on the cluster should return 404 for non-existing sites - https://phabricator.wikimedia.org/T137176#2798770 (elukey) ``` elukey@mw1099:~$ sudo apachectl -S VirtualHost configuration: 127.0.0.1:80 localhost (/etc/apache2/conf-ena...
[14:55:10] bblack: any ideas on what to do about the hundreds of alerts we have for SSL certs?
[14:55:24] Displaying Result 1 - 208 of 208 Matching Services
[14:55:43] the vast majority of them are two alerts per cp*, for unified ecdsa/rsa
[15:01:06] paravoid: if you want to ack them (removing the tick from Sticky Acknowledgement to let them alarm again on critical) you can search for "HTTPS Unified" in Icinga and do select all ;)
[15:06:13] meanwhile, 13 hosts in text to go (varnish+kernel upgrade).
In parallel, I've started with upload
[15:22:59] yeah we'll just have to ack them
[15:23:09] the alerts are legit :)
[15:23:22] I'll go clean it up if someone hasn't already
[15:25:44] [done]
[16:34:29] Traffic, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2799548 (Fjalapeno) I was reading over some of the Strawman API was wondering, is the response going to specify the file type? I couldn't quite tell....
[16:40:20] HTTPS, Traffic, Monitoring, Operations, Patch-For-Review: adjust ssl certificate montioring to differentiate between standard and LE certificates. - https://phabricator.wikimedia.org/T144293#2799556 (AlexMonk-WMF) Open>Resolved a:AlexMonk-WMF
[16:42:29] cache_text done
[16:43:03] Traffic, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2799572 (Anomie) >>! In T66214#2799548, @Fjalapeno wrote: > I was reading over some of the Strawman API was wondering, is the response going to specify...
[16:43:48] \o/
[16:44:05] more interviews starting up soon, I won't be very responsive off and on today
[16:44:16] have fun! :)
[17:57:47] stopping the reboots to investigate what happened to cp3039
[17:57:56] (see T150879)
[17:57:56] T150879: Cannot connect to cp3039.mgmt.esams.wmnet:22 - https://phabricator.wikimedia.org/T150879
[18:33:44] wikibugs wtf :P
[18:33:56] ema: I'll look at 3039 when I get free later in a few hours
[18:38:24] apparently the reboots reset the ack on the cert warnings too heh
[18:39:35] I did a sticky ack this time. it's not like we're going to forget in this case.
[18:49:55] what's wtf about wikibugs?
[18:50:24] it wasn't here pasting ticket updates, which has been a problem the past few days
[18:50:32] I do see it was working for a while in between above, though
[19:15:22] bblack: thanks.
Yeah, setting the downtimes seems to reset acks unfortunately
[19:16:59] it happened also to cp3009's acks yesterday
[19:18:00] carrying on with the remaining upload hosts
[19:51:39] alright I'm calling it a day, we've got 10 upload hosts to go. Happy to take care of them tomorrow :)
[20:10:15] Traffic, Operations, media-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648#2800392 (fgiunchedi) @JoeWalsh is there a timeline for 5.3.0 ? We're still seeing significant traffic for 0px requests
[20:51:49] Wikimedia-Apache-configuration: Unit tests for apache config/rewrites - https://phabricator.wikimedia.org/T57857#595854 (elukey) Idea worth to discuss imo: setting `LogLevel rewrite:trace8` in the apache logs (only for testing) is really useful: ``` elukey@deployment-mediawiki06:~$ curl http://localhost --h...
[21:28:00] Wikimedia-Apache-configuration: Unit tests for apache config/rewrites - https://phabricator.wikimedia.org/T57857#595854 (hashar) `operations/apache-config.git` had a test suite written by Tim. Based on curl it was exercising the apache conf quite nicely. You can find it in origin/HEAD^ in /test/ . Maybe we...
[21:30:34] Traffic, Operations, Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2800666 (fgiunchedi) I've captured `varnishstat -j` over the course of 1/2 day on `cp4001` and it seems the uuid is the backend "identity"...
[23:52:34] Traffic, Operations, Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2787072 (BBlack) Yeah the UUID in there is actually from the VCL. Every time we change VCL, it's recompiled and the output is given a UUID...
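T149875 (02:28) was closed after running `getmgmtips` and confirming that `rejects.txt` stayed empty. That script is not shown in the log, so the following is only a hedged sketch of the same kind of check, not the actual tool: resolve every mgmt hostname and collect the ones that fail into a rejects list. The function names are hypothetical; the resolver is injectable so the logic can be exercised without real DNS.

```python
import socket


def find_unresolvable(hostnames, resolve=socket.gethostbyname):
    """Split hostnames into {name: ip} for those that resolve and a rejects list.

    `resolve` defaults to socket.gethostbyname but can be swapped out
    (hypothetical design choice) so the check is testable offline.
    """
    resolved, rejects = {}, []
    for name in hostnames:
        try:
            resolved[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            rejects.append(name)
    return resolved, rejects


def write_rejects(rejects, path="rejects.txt"):
    """Write one unresolvable hostname per line; an empty file means all resolved."""
    with open(path, "w") as fh:
        for name in rejects:
            fh.write(name + "\n")
```

An empty rejects list (and therefore an empty `rejects.txt`) corresponds to the "all done" state reported in the ticket.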
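The SSL cert alerts (14:55 to 15:25, and again at 18:38 after the reboots) were acknowledged in bulk through the Icinga web UI. As a hedged sketch, the same bulk ack could be generated as Icinga 1.x / Nagios `ACKNOWLEDGE_SVC_PROBLEM` external commands. There, the `sticky` field is what the "Sticky Acknowledgement" checkbox controls: `2` keeps the ack until the service returns to OK, and anything else drops it on the next state change (for example WARNING to CRITICAL). The host names and service descriptions are hypothetical placeholders, because the log only says to search for "HTTPS Unified".

```python
import time

# <sticky> field of ACKNOWLEDGE_SVC_PROBLEM:
# 2 = ack survives state changes until the service is OK again,
# 0 = ack is removed on any state change (so it alarms again on CRITICAL).
STICKY = 2
NON_STICKY = 0


def ack_svc_command(host, service, author, comment, sticky=False,
                    notify=False, persistent=True, now=None):
    """Build one external-command line for acknowledging a service problem."""
    for field in (host, service, author):
        if ";" in field:
            raise ValueError(f"';' is the field separator: {field!r}")
    ts = int(time.time() if now is None else now)
    fields = [
        "ACKNOWLEDGE_SVC_PROBLEM",
        host,
        service,
        str(STICKY if sticky else NON_STICKY),
        "1" if notify else "0",      # send an acknowledgement notification?
        "1" if persistent else "0",  # keep the ack comment across restarts?
        author,
        comment,
    ]
    return f"[{ts}] " + ";".join(fields)


def bulk_ack(hosts, services, author, comment, **kwargs):
    """One command per (host, service) pair, like 'select all' in the UI."""
    return [ack_svc_command(h, s, author, comment, **kwargs)
            for h in hosts for s in services]
```

The resulting lines would be appended to the external command file named by `command_file` in the Icinga 1.x main configuration. This does not address the 19:15 observation that scheduling downtimes appeared to reset existing acks.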
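T57857 (20:51 and 21:28) discusses unit tests for the apache config and rewrites. hashar notes that `operations/apache-config.git` once had a curl-based suite by Tim, and T137176 (12:37) asks for the catch-all vhost to return 404 for non-existing sites. The following is a hedged Python stand-in for that style of test, not the historical suite. It sends each request with an explicit `Host:` header, does not follow redirects, and compares the status and `Location` header against expectations. The `Case` type and all expected values are hypothetical.

```python
import http.client
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    host: str                       # value sent in the Host: header
    path: str
    status: int                     # expected HTTP status code
    location: Optional[str] = None  # expected Location header for redirects


def fetch(case, server="localhost", port=80, timeout=5):
    """Send one GET to the local apache; http.client never follows redirects."""
    conn = http.client.HTTPConnection(server, port, timeout=timeout)
    try:
        conn.request("GET", case.path, headers={"Host": case.host})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def check(case, status, location):
    """Return mismatch messages for one case (empty list means it passed)."""
    problems = []
    if status != case.status:
        problems.append(f"{case.host}{case.path}: status {status} != {case.status}")
    if case.location is not None and location != case.location:
        problems.append(
            f"{case.host}{case.path}: Location {location!r} != {case.location!r}")
    return problems


def run(cases, **kw):
    """Fetch every case and collect all mismatches."""
    failures = []
    for case in cases:
        failures += check(case, *fetch(case, **kw))
    return failures


# Hypothetical case for the T137176 goal: unknown sites should get a 404.
CASES = [Case("nonexistent.example.org", "/", 404)]
```

While such cases run on a test host, the `LogLevel rewrite:trace8` setting elukey suggests would show which rewrite rules fired for each request.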
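On T150479 (21:30 and 23:52), the UUID seen in `varnishstat -j` counter names is explained as coming from the VCL, which gets a new UUID every time it is recompiled, so metric names change on each VCL reload. One hedged way for an exporter to keep names stable is to strip that UUID before exporting. This sketch assumes the flat Varnish 4.x `varnishstat -j` layout, in which each counter name maps to an object with a `value` field. It also assumes, hypothetically, backend keys shaped like `VBE.vcl-<uuid>.<backend>.<counter>`. It is not how varnish_exporter actually handles the problem.

```python
import re

# Canonical 8-4-4-4-12 hex UUID anywhere in a counter name.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def stable_name(counter):
    """Replace the per-compile VCL UUID with a fixed placeholder."""
    return UUID_RE.sub("<vcl>", counter)


def normalize(stats):
    """Re-key a varnishstat -j style dict by UUID-free counter names.

    If two VCL generations expose the same backend counter (e.g. right
    after a reload), both values are kept in a list instead of one
    silently overwriting the other.
    """
    out = {}
    for name, entry in stats.items():
        if not isinstance(entry, dict) or "value" not in entry:
            continue  # skip non-counter fields such as "timestamp"
        out.setdefault(stable_name(name), []).append(entry["value"])
    return out
```

The input dict would come from `json.loads` on the `varnishstat -j` output. Whether to sum, keep, or drop values from older VCL generations is a policy decision that this sketch deliberately leaves open.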