[03:49:35] Traffic, Operations, Proton, Reading-Infrastructure-Team-Backlog: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (Tgr)
[09:06:32] netops, Operations: migrate netinsights from rhenium to sulfur - https://phabricator.wikimedia.org/T212011 (elukey)
[10:34:14] logged in on enwiki I'm seeing an issue I spotted a while ago recurring
[10:34:23] startup module request stuck for 2 minutes
[10:34:33] with Chrome ultimately bailing with "Failed to load resource: net::ERR_SPDY_PROTOCOL_ERROR"
[10:35:06] other wikis I tried aren't having the issue
[10:35:23] actually, based on the endless spinner I'm getting, seems to happen logged out too
[10:36:13] yep, net::ERR_SPDY_PROTOCOL_ERROR 200 in an incognito window as well, still for the startup module
[10:40:32] it's fine now
[11:17:14] Traffic, Operations, Patch-For-Review: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (Vgutierrez) So, after setting http2_max_field_size to 8k, you can properly fetch the same URLs over HTTP 1.1 and HTTP2. If the max length is exceeded over HTTP 1.1, a 414 error...
[11:59:27] lvs2002 has a failed raid battery, FYI
[12:33:01] Traffic, Operations, ops-codfw: lvs2002: raid battery failure - https://phabricator.wikimedia.org/T213417 (ema)
[12:33:08] Traffic, Operations, ops-codfw: lvs2002: raid battery failure - https://phabricator.wikimedia.org/T213417 (ema) p:Triage→Normal
[12:33:47] jynus: task opened, thank you
[12:34:37] this is up to you, but what I do is acking the alert and commenting the ticket number there
[12:35:19] yep I usually do that too :)
[12:36:14] I am saying because I can see reasons not to do it- for example, I am not sure it realerts if the text changes or goes to critical
[12:36:31] with so many different checks on the same entry
[13:46:06] we have some replacements for lvs2001-6 in some state of progress too
[13:46:59] https://phabricator.wikimedia.org/T196560
[14:12:07] sadly those are blocked by T203194
[14:12:07] T203194: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194
[14:12:21] the bnxt_en issue also affects those new lvs boxes :(
[14:12:56] Traffic, Operations, Patch-For-Review: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (Anomie) Firefox 64 and Chromium 72 also react to the "enhance your calm" as a dropped connection rather than as a 414. Firefox gives me an empty page, while Chromium is a bit m...
[14:14:23] sigh... Chromium and firefox error handling regarding HTTP2 sucks big time apparently
[15:43:38] Wikimedia-Apache-configuration, Operations, VisualEditor: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (matmarex)
[16:02:20] Wikimedia-Apache-configuration, Operations, VisualEditor: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (Elitre) Is T212575 related?
[16:07:40] have the traffic or performance teams asked for HAR files to diagnose such things in the past?
[16:40:45] Traffic, Operations, ops-eqiad, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (BBlack) I suspect our bug is fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=73f21c653f930f438d53eed29b5e4c65c8a0f906 which i...
[16:45:05] Wikimedia-Apache-configuration, Operations, VisualEditor: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (CDanis) If anyone can manage to reproduce this consistently, even for a few minutes, capturing a Chrome NetLog using chrom...
[16:45:50] Traffic, Operations, Proton, Reading-Infrastructure-Team-Backlog: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (Jhernandez)
[16:49:32] Traffic, Operations, ops-eqiad, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (MoritzMuehlenhoff) >>! In T203194#4870205, @BBlack wrote: > I suspect our bug is fixed by: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm...
[16:55:03] Traffic, Operations, ops-eqiad, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (BBlack) Actually, it is already in the 4.9.y LTS/stable branch, here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=b2be15b...
[16:56:06] Traffic, Operations, ops-eqiad, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (Vgutierrez) yeah, it's included as part of 4.9.134: https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.9.134
[16:56:29] bblack: you beat me on finding it by 1 minute
[16:58:22] Traffic, Operations, ops-eqiad, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (MoritzMuehlenhoff) Even better, then we can simply get the 4.9.144-1 kernel from stretch-proposed-updates and test whether that is the correct fix
[17:04:31] Traffic, Operations, ops-eqiad, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (BBlack) Yeah. It's hard to "prove" whether we have this bug fixed other than running a supposed fix on the bnxt_en cp10 fleet for a while as a statistical test, but p...
[17:53:52] Traffic, Wikimedia-Apache-configuration, Operations, VisualEditor: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (herron) p:Triage→Normal
[22:18:32] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (Legoktm) HTTP 429 is rate limiting... https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429 Since these are calls to ap...
[22:18:56] are API rate limits documented somewhere?
[22:24:21] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (herron) p:Triage→Normal
[22:24:48] Traffic, Operations, Proton, Reading-Infrastructure-Team-Backlog: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (herron) p:Triage→Normal
[22:26:04] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (bd808) >>! In T213475#4871260, @Chicocvenancio wrote: > On a different note, is this impossible to be done from the dumps? As...
[22:28:20] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (Automactic) Hi, I am the dev of Zimfarm (the system automating the scrape process). I can run the scraper at home successfully...
[22:30:42] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (chasemp) Could a change to coming from a 172 address have effected ratelimit whitelisting?
[22:42:36] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (bd808) The 429 response is definitely a rate limit on on the Wikimedia side. It is not obvious to me by looking at the upstream...
[22:42:40] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (akosiaris) >>! In T213475#4871480, @chasemp wrote: > Could a change to coming from a 172 address have effected ratelimit whitel...
[22:44:57] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (akosiaris) > Yes. The VCL code that performs the rate limiting is in modules/varnish/templates/text-frontend.inc.vcl.erb and in...
[22:46:36] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (BBlack) Right. I'm not up to speed on where all related changes are, but from VCL's point of view its definition of `wikimedia...
[23:05:36] Traffic, Cloud-VPS, Operations, serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (akosiaris) >>! In T213475#4871518, @BBlack wrote: > Right. I'm not up to speed on where all related changes are, but from VCL'...
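HTTP 429 means "Too Many Requests", and the server may add a Retry-After header saying how long to wait, either as a number of seconds or as an HTTP-date. The fix discussed in T213475 is on the Wikimedia side (rate-limit whitelisting in the VCL). On the client side, a scraper could back off on 429 instead of failing outright. The sketch below is a minimal illustration of that idea using only the Python standard library. It was not proposed in the ticket. The function names, retry count, and delays are hypothetical. Wikimedia asks API clients to send a descriptive User-Agent, so the sketch takes one as a parameter.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default):
    """Seconds to wait according to a Retry-After header value.

    The header may carry delta-seconds ("120") or an HTTP-date. Returns
    `default` when the header is absent or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Hypothetical fallback: exponential backoff, base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(url, user_agent, max_retries=5, base=1.0, cap=60.0,
                       sleep=time.sleep):
    """GET `url`, retrying only on HTTP 429; other errors propagate."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Prefer the server's Retry-After; otherwise use our own backoff.
            fallback = backoff_delay(attempt, base, cap)
            sleep(parse_retry_after(err.headers.get("Retry-After"), fallback))
```

The sketch honors Retry-After in full rather than capping it. If the server asks for a longer pause, retrying sooner would most likely just produce another 429. The `sleep` parameter is injectable so the retry loop can be exercised without real waiting.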