[07:06:24] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now on cp1065 - https://phabricator.wikimedia.org/T146451#2698695 (doctaxon) Resolved>Open Hi, I think, a restart is needed again, there are too much 503 errors on several proxy servers like cp1053. A reasonable bot...
[07:09:54] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now on cp1065 - https://phabricator.wikimedia.org/T146451#2698701 (doctaxon) If those errors occur again and again, a technically check of these proxies has to be done, I suppose.
[07:52:54] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now on cp1065 - https://phabricator.wikimedia.org/T146451#2698809 (doctaxon) Firing with traffic (different API URLs) the error report occurs about every 1.5 minutes (!) (Sorry, but what is an unbreak now! error report, if...
[09:06:58] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now on cp1065 - https://phabricator.wikimedia.org/T146451#2698915 (Joe) All the restarts finished right now, the cluster should be in a much better shape now.
[09:45:49] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now on cp1065 - https://phabricator.wikimedia.org/T146451#2698986 (Joe) Open>Resolved
[09:59:31] <_joe_> ema: can I assign T147480 to you? I am done with the rest of the cluster, I think.
[09:59:31] T147480: Upgrade conftool to 0.3.1 - https://phabricator.wikimedia.org/T147480
[09:59:50] _joe_: sure, go ahead
[10:00:13] I've tested some basic actions yesterday on puppetmaster1001 and confctl seemed to work fine
[10:00:58] Traffic, Operations, User-Joe, discovery-system: Upgrade conftool to 0.3.1 - https://phabricator.wikimedia.org/T147480#2698995 (Joe) a:Joe>ema
[12:31:28] bblack: https://gerrit.wikimedia.org/r/#/c/314658/ to remove upload v3 compat
[12:31:39] and then yes, we can close the task :)
[12:37:15] nice :)
[12:37:53] cleanup always feels good
[14:18:42] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now - https://phabricator.wikimedia.org/T146451#2699434 (BBlack)
[14:22:05] Traffic, Labs, Operations, Tool-Labs: repeated 503 errors for 90 minutes now - https://phabricator.wikimedia.org/T146451#2661551 (BBlack) (took the cache host out of the title to prevent confusion in future Phab searches for problems on specific cache hosts, since it didn't turn out to be relevant).
[14:41:33] Traffic, Operations, media-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648#2699497 (BBlack)
[14:45:24] Traffic, Operations, media-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648#2699515 (BBlack)
[14:46:25] Traffic, Operations, media-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648#2699516 (fgiunchedi) Could be related to thumbor deployment, partial deployment was enabled on Sept 7th with https://gerrit.wikimedia.org/r/#/c/308746/ on Sept 13th on small wikis h...
[15:07:40] Traffic, Operations, media-storage: Unexplained increase in thumbnail 500s - https://phabricator.wikimedia.org/T147648#2699558 (fgiunchedi) Another data point, scrolling now the list of 500s I see a good chunk with size '0px' coming from `user_agent": "Wikipedia/942 CFNetwork/808.0.2 Darwin/16.0.0"`...
[15:14:09] godog: that CFNetwork/Darwin part implies iOS app I think
[15:14:13] not sure about Wikipedia/942
[15:14:30] revision?
[15:14:36] yeah I wonder whose app that is
[15:15:05] yeah I was looking if there's a release schedule for our own app, without luck so far
[15:15:45] sep 14th
[15:15:49] v5.2.0
[15:17:14] is that what our UA string even looks like?
[15:17:18] and Aug 15th 5.0.6 to store, 5.1.0 to beta
[15:17:31] I'm just checking the date of the releases from email ;)
[15:19:40] I was looking at the code and found a versionedUserAgent but uses WikipediaApp/%@
[15:23:07] heh so yeah possibly not our app, but yeah all Wikipedia/942 afaics
[15:24:00] might be a different app? how many do we have? I checked few repos on wikimedia org on github but not found much
[15:25:12] there's third party apps too
[15:25:23] community-maintained commons app that used to be ours
[15:26:13] although the time was kinda aligned with our releases
[15:26:23] is missing the Sep. 23rd one only
[15:26:47] * volans brb
[15:27:41] side issue is that 0px requests shouldn't 500 of course, 411 maybe? :P
[15:27:56] 400 regardless
[16:30:03] Traffic, Wikimedia-Apache-configuration, Operations, Patch-For-Review: Sometimes apache error 503s redirect to /503.html and this redirect gets cached - https://phabricator.wikimedia.org/T109226#2699697 (elukey) >>! In T109226#2696811, @BBlack wrote: > On your repro attempts: I think the original...
[16:33:55] Traffic, Wikimedia-Apache-configuration, Operations, Patch-For-Review: Sometimes apache error 503s redirect to /503.html and this redirect gets cached - https://phabricator.wikimedia.org/T109226#2699698 (BBlack) hmm I'm pretty sure we were able to repro reliably at one point in the past, but I'd...
[16:35:17] Traffic, Wikimedia-Apache-configuration, Operations, Patch-For-Review: Sometimes apache error 503s redirect to /503.html and this redirect gets cached - https://phabricator.wikimedia.org/T109226#2699700 (BBlack) Maybe there are different ways in which `HHVM is busted`, and having hhvm be down isn...