[11:15:50] Traffic, Analytics-Kanban, Operations, Wikipedia-iOS-App-Backlog, and 2 others: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3052693 (elukey) >>! In T154558#3043971, @JoeWalsh wrote: > @Milimetric this UA is from the iOS app. In testing locally, I didn't see...
[11:31:05] Traffic, Analytics-Kanban, Operations, Wikipedia-iOS-App-Backlog, and 2 others: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3052728 (elukey) More numbers about number of requests landing to piwik/apache/bohrium and failed ones (503s). The following numbers...
[11:51:41] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962020 (Nemo_bis) > a range of issues It would be useful to document what aspects were considered beyond network, legal and cost. For instance, was environmental impact considered (cf. http://www.greenpea...
[12:01:36] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3052793 (tomasz) Given that most commercial data centers in Singapore seem to be based on 0% renewable energy, I would really like to know whether environmental concerns were taken into consideration when this...
[12:13:11] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3052807 (Gnom1) Hi, where can we have a good discussion about the need to choose a datacenter that runs on renewable energy? I suppose that this bug is not the ideal location. Thanks for any pointers!
[13:06:04] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3052899 (BBlack) This probably isn't the ideal location, but I can speak to the issue here since it's obvious that some will come looking here for that answer. The TL;DR is that environmental consideration...
[13:15:16] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3052915 (Gnom1) Thank you for this information, Brandon. While your points are understandable, this does not mean that we should not try to find a vendor that uses renewable energy for their servers. So aga...
[13:33:06] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3052927 (BBlack) I think you can discuss that anywhere you like (within reason!). To clarify re: your language above: we deploy our own server hardware as opposed to using virtual hosting, so the environme...
[14:02:23] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3052998 (Gnom1) Thank you for your reply, Brandon. Maybe I should clarify my question: Where can //Wikipedians// have a discussion //with you and your team// about //running Wikipedia's servers on renewable...
[14:24:48] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3053045 (BBlack) I really don't mean to be overly facile here, but if you're interested in having a discussion, we can have that at any usual public discussion venue. The wikitech mailing list might be a g...
[14:29:34] Traffic, Analytics-Kanban, Operations, Wikipedia-iOS-App-Backlog, and 2 others: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3053061 (Milimetric) It seems to me you can close this task and open up a new one to investigate Varnish / Apache problems (as those a...
[15:00:24] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3053131 (Gnom1) Oh, I've already tried [[ https://lists.wikimedia.org/pipermail/wikitech-l/2016-March/085128.html | writing to wikitech-l ]], which did not lead anywhere. I also asked to be added to ops-l,...
[15:27:09] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3053150 (BBlack) I think the wikitech discussion seems like it was, in fact, a good discussion of the issue. So if your goal is discussion, I don't see the issue here. The metawiki page does contain a lot...
[15:38:02] I've added a graph for varnish sessions closed with RX_TIMEOUT
[15:38:07] https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=26&fullscreen&from=now-3h&to=now&var-server=cp3007&var-datasource=esams%20prometheus%2Fops
[15:38:40] temporarily bumped timeout_idle on misc as an experiment (5s -> 60s) to see if the piwik errors are somehow related
[15:38:56] \o/
[15:39:13] (only on the backends as the issue seems to mostly affect them)
[15:42:36] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3053179 (Gnom1) The goal is to //have Wikipedia's servers run on renewable energy//. It's as simple as that. In Europe, this is a no-brainer, while I understand that it is not so much in the U.S. But Google...
[15:46:35] well that seems to have helped reducing MAIN.sc_rx_timeout significantly
[15:46:52] https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=26&fullscreen&from=1487945990782&to=1487951199254&var-server=cp3007&var-datasource=esams%20prometheus%2Fops
[15:48:14] at least a good thing :)
[15:48:28] is the change applied to the whole misc cluster?
[15:48:38] yep, all misc varnish-be
[15:49:13] tailing 5xx.log to see if anything changes
[15:51:52] last 5xx I see was at 15:36:26
[15:52:39] and I bumped timeout_idle at 15:35ish, so it looks like it
[15:53:10] \o/
[15:53:51] we actually might need this on other clusters too!
[15:54:10] so the theory that reaped connections might cause EOF to read() is sort of true?
[15:54:42] or better, connections closed due to timeout_idle
[15:58:10] needs more investigation, but for varnish-be the client is either a varnish-fe or another varnish-be (timeout_idle is about client connections)
[15:58:20] and we do use the default idle_send_timeout (60s)
[15:59:11] so yeah our idle_send_timeout should be lower than timeout_idle on the backends or we'll keep on getting RX_TIMEOUTs
[15:59:33] now, how that specifically relates to piwik errors I'm not sure yet
[16:02:09] but I imagine there could be requests running while we close the session?
[16:13:58] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3053235 (BBlack) >>! In T156029#3053179, @Gnom1 wrote: > The goal is to //have Wikipedia's servers run on renewable energy//. It's as simple as that. I don't think that's a realistic goal anytime soon. >...
[16:17:31] yeah I think so, causing the EOF while writing/reading
[16:21:38] ema: can we make the settings permanent on misc?
[16:23:14] elukey: it's not going to be overwritten by puppet so I'd be tempted to leave it as it is for the weekend and find the proper timeout_idle/idle_send_timeout settings for all clusters next week
[16:24:49] I've been playing with CLI barcharts hehe, look at this:
[16:24:55] egrep 'cp3010|cp3007' /srv/log/webrequest/5xx.json | grep '"http_status":"503"' | jq '.dt' | sed 's/[0-9]:[0-9][0-9]"//' | sed 's/"//'| python ~ema/bar_chart.py
[16:25:15] ack, I am fine with this
[16:25:19] thanks :)
[16:27:34] thank you!
[17:38:07] Traffic, Operations, ops-eqiad: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#3053422 (faidon) a:BBlack>Cmjohnson
[18:17:43] Traffic, Operations, ops-eqiad: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#3053517 (Cmjohnson) @faidon, I will swap out the sfp+ ...that is the most typical culprit. Do we need to schedule downtime? or can I do anytime?
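For readers following the 16:24:55 one-liner: it keeps 503 entries for cp3010/cp3007, pulls the `dt` timestamp, and trims it to a 10-minute bucket before charting. The contents of `~ema/bar_chart.py` are not shown in the log, so the following is only a rough sketch of the same idea in one script. The function names are hypothetical, and it assumes each log line is a JSON object whose `dt` looks like `2017-02-24T15:36:26` and whose `http_status` is the string "503".

```python
import json
from collections import Counter


def bucket_503s(lines, hosts=("cp3007", "cp3010")):
    """Count 503s per 10-minute bucket for lines mentioning the given hosts."""
    counts = Counter()
    for line in lines:
        # Plain substring match on the raw line, like the egrep stage.
        if not any(host in line for host in hosts):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore malformed lines
        if not isinstance(record, dict):
            continue
        if str(record.get("http_status")) != "503":
            continue
        dt = str(record.get("dt", ""))
        # Assumed format: "2017-02-24T15:36:26" -> "2017-02-24T15:3"
        # (same truncation as the sed stage: drop the last minute digit and seconds).
        counts[dt[:15]] += 1
    return counts


def render_bars(counts, width=40):
    """Return one text row per bucket, scaled so the busiest bucket is `width` wide."""
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for bucket in sorted(counts):
        n = counts[bucket]
        bar = "#" * max(1, round(n * width / peak))
        rows.append(f"{bucket}0 {n:>5} {bar}")
    return rows


if __name__ == "__main__":
    import sys

    # Usage (path taken from the log): python this_script.py < /srv/log/webrequest/5xx.json
    for row in render_bars(bucket_503s(sys.stdin)):
        print(row)
```

Doing the filtering in Python avoids depending on the exact byte layout that the grep and sed stages match, at the cost of parsing every candidate line as JSON.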
[18:22:01] Wikimedia-Apache-configuration: Create 2030.wikimedia.org redirect to Meta portal - https://phabricator.wikimedia.org/T158981#3053534 (gpaumier)
[18:38:03] Traffic, Operations, ops-eqiad: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#3053568 (faidon) I believe @ema has depooled it, so any time should be OK.
[19:10:05] Traffic, Operations, ops-eqiad: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#3053683 (ema) @Cmjohnson yes the system is indeed depooled. Please go ahead whenever it is convenient for you. Thanks!