[03:08:22] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Pruem)
[03:24:56] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (CDanis) My guess is that the beginning of this problem correlates with the beginning of the fetch...
[04:03:59] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Pruem) This correlates to the previous appearance of the problem in early June, see [[https://graf...
[06:58:15] netops, Operations, Patch-For-Review: check_ospf.py fails on mr1-eqsin - https://phabricator.wikimedia.org/T225905 (ayounsi) > (timeout is not respected) I couldn't reproduce that bug. And it seems to be respected on all my tries now. CR above should solve the issue.
[07:42:25] XioNoX: checking the source code of snimpy.manager it looks like the default timeout is 5 seconds
[07:44:20] on the Manager class itself, the default value is None https://github.com/vincentbernat/snimpy/blob/master/snimpy/manager.py#L231
[07:45:42] vgutierrez: ok!
setting it to 30 seems to solve the issue
[07:45:59] 5s should be enough in theory, but it looks like it's not
[07:46:16] but when it builds the session object, it only sets the timeout if it is not None
[07:46:17] https://github.com/vincentbernat/snimpy/blob/master/snimpy/manager.py#L278-L285
[07:46:26] and the session object has a 5-second default timeout
[07:46:26] `real 0m8.541s`
[07:48:07] well, that's for the whole script, with several snimpy calls
[07:48:57] but manually running it with and without the `timeout=30` shows that it works
[07:50:37] ok
[08:11:25] netops, Operations: Investigate cr2-eqord's disconnection from the rest of the network - https://phabricator.wikimedia.org/T224535 (ayounsi) Open→Resolved Opened T226158 for the tunnel. Everything else here is done.
[08:33:47] netops, Operations, Patch-For-Review: check_ospf.py fails on mr1-eqsin - https://phabricator.wikimedia.org/T225905 (ayounsi) Open→Resolved a:ayounsi Check is now green! https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=mr1-eqsin&service=OSPF+status
[09:10:44] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (ema) Some additional observations: - We're currently running with TCP SACK disabled (T225998) - T...
[09:15:53] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Tomybrz)
[09:19:40] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Tgr) Not limited to dewiki / Germany (unsurprisingly), there have been a bunch of reports from hu...
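The timeout behaviour described in the 07:42-07:48 exchange (a manager-level default of None, a session-level default of 5 seconds, and an override applied only when a value is passed) can be sketched roughly as below. This is an illustrative model only, not snimpy's actual code: `Session`, `build_session` and `DEFAULT_SESSION_TIMEOUT` are hypothetical names, and the 5-second figure is taken from the chat rather than checked against the library.

```python
from typing import Optional

# Assumption: 5 seconds is the session default quoted in the chat.
DEFAULT_SESSION_TIMEOUT = 5.0


class Session:
    """Hypothetical stand-in for an SNMP session object (not snimpy's class)."""

    def __init__(self) -> None:
        # The session carries its own default timeout.
        self.timeout: float = DEFAULT_SESSION_TIMEOUT


def build_session(timeout: Optional[float] = None) -> Session:
    """Build a session the way the chat describes it.

    The caller-facing default is None, and the session timeout is only
    overridden when the caller passes an explicit value.
    """
    session = Session()
    if timeout is not None:
        session.timeout = timeout
    return session


if __name__ == "__main__":
    # No argument: the session default is kept.
    print(build_session().timeout)
    # Explicit argument, as in the check_ospf.py fix: the passed value is used.
    print(build_session(timeout=30).timeout)
```

Under this model, leaving the argument out does not mean "no timeout"; it means the session default applies, which would be consistent with the check only passing once `timeout=30` was passed explicitly.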
[09:40:34] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (ArielGlenn) I saw many slow en wiki page loads yesterday, including missing skins. (Logged in user...
[10:16:27] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Yann) https://commons.wikimedia.org/wiki/Commons:Village_pump#Timouts
[10:39:28] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Aklapper) @Yann: There is nothing new in that thread? Feel free to add `{{tracked|Txxxxxx}}` on-wi...
[11:02:54] Traffic, Operations, Patch-For-Review: Make cp1099 the new pinkunicorn - https://phabricator.wikimedia.org/T202966 (MoritzMuehlenhoff) Or maybe use one of cp1071-cp1074, the servers which were used for the original ATS tests? These were bought in 2015 and are currently unused.
[12:43:33] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Bouzinac) Interested in the solving of this pb. What I can say is that a common template (https://...
[13:29:13] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Gestumblindi) Seems to be better. Solved by the rebooting mentioned by @ema ? I don't experience t...
[14:03:34] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Wurgl) Agree to @Gestumblindi ! I just tried and everything is less than 3 seconds.
[14:22:09] Traffic, Wikimedia-Apache-configuration, DNS, Matrix, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Joe) @tgr just to be sure, you just want the url https://wikimedia.org/.well_known to be served from a static file?
[14:34:17] Traffic, Operations, docker-pkg, serviceops: Getting registry metadata from a public client fails on our registry - https://phabricator.wikimedia.org/T220085 (fsero) works for me ` >>> import docker >>> client = docker.from_env(version='auto') >>> print(client.images.get_registry_data('quay.io...
[14:40:04] Traffic, Operations, Performance-Team, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (PM3) +1, everything running smoothly now, including API queries.
[14:40:53] Traffic, Operations, Performance-Team, Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (PM3)
[14:41:51] Traffic, Wikimedia-Apache-configuration, DNS, Matrix, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Tgr) >>! In T223835#5271300, @Joe wrote: > @tgr just to be sure, you just want the url https://wikimedia.org/.well_kno...
[15:28:37] Traffic, Operations, docker-pkg, serviceops: Getting registry metadata from a public client fails on our registry - https://phabricator.wikimedia.org/T220085 (Joe) >>!
In T220085#5271335, @fsero wrote: > works for me using python 2.7 and docker==3.7.2 > > ` >>>> import docker >>>> client = dock...
[16:30:14] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH) Ok, in checking, EQIAD seems to enter its PEAK usage around 20:00 GMT (so about a half an hour from now at 10:00 Pacific.) I'll pull both the 'show chassis power' on cr1 an...
[16:52:11] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH) commands to run: juniper: show chassis power dell (via idrac): racadm getsensorinfo
[17:06:17] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH) Power data: Power drawn live @ approximately 10:00-10:10 AM Pacific: cr1-eqiad: System: Zone 0: Capacity: 4100 W (maximum 4100 W) Allocated power:...
[17:14:58] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH) so for the QFX5100 (thanks @papaul) the command is: ` show chassis environment pem ` The 10G switches are PEM 2/4/7, so I'll just include them all: asw2-b-eqiad: ` FPC...
[18:31:56] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH)
[18:33:34] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH)
[18:36:23] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH) Updated from irc chat and @bblack. Peak eqiad time is actually 01:30 GMT (18:30 Pacific). For total Linux hosts: 9x lvs/misc/ganeti type nodes, 16x cache nodes. Basically...