[00:37:45] Traffic, Commons, Multimedia, Operations, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3361808 (Poyekhali) >>! In T167400#3361750, @Platonides wrote: >>>! In T167400#3356482, @Poyekhali wrote: >> Unless Commons have a lot...
[03:40:27] HTTPS, Traffic, Operations, Wikimedia-Blog: Change automatic shortlink in blog theme - https://phabricator.wikimedia.org/T165511#3268418 (Tbayer) Cool - I assume this has had enough eyes; merging and submitting now. (BTW, for later internal reference, that was [[https://wordpressvip.zendesk.com/h...
[06:33:17] HTTPS, Traffic, Operations, Wikimedia-Blog: Change automatic shortlink in blog theme - https://phabricator.wikimedia.org/T165511#3362021 (Tbayer) This has been deployed. Per a quick look, the shortlink in Volker's example above has been fixed (`
Wikimedia-Apache-configuration, Wikidata, Wikimedia-Site-requests, Patch-For-Review, and 3 others: [RFC] should wikidata.org/entity/Q12345 do content negotiation, instead of redirecting to wikidata.org/wiki/Special:EntityData/Q36661 first? - https://phabricator.wikimedia.org/T119536#3362326 (danie...
[12:37:37] HTTPS, Traffic, Operations, Wikimedia-Blog: Change automatic shortlink in blog theme - https://phabricator.wikimedia.org/T165511#3363110 (ema)
[14:14:38] XioNoX: hey, are you about to start with T167274?
[14:14:38] T167274: codfw row D switch upgrade - https://phabricator.wikimedia.org/T167274
[14:15:06] ema: yeah, could you review https://gerrit.wikimedia.org/r/#/c/360352/ ?
[14:16:46] XioNoX: yeah we should also route around codfw in hieradata/role/common/cache/{text,misc,upload}.yaml
[14:17:47] (just noticed the ticket now)
[14:19:55] also in general it would be better to do this during the EU morning when the US DCs are less busy I guess :)
[14:22:07] yeah, I'm on the pacific time zone right now, so EU morning would be middle of my night
[14:23:36] is it good for https://gerrit.wikimedia.org/r/#/c/360352/ ?
[14:24:03] and https://gerrit.wikimedia.org/r/360357 ?
[14:26:25] XioNoX: +1ed
[14:27:56] XioNoX: I'll force a puppet run on ulsfo caches to pick up the change
[14:28:21] ema: thx, puppet-merge done
[14:29:39] and dns updated as well
[14:31:56] XioNoX: ok, puppet run in progress
[14:35:54] done
[14:37:20] XioNoX: codfw traffic starting to decrease https://grafana.wikimedia.org/dashboard/db/load-balancers?panelId=8&fullscreen&orgId=1&from=now-3h&to=now
[14:37:48] great!
[14:38:30] and expiry mailbox lag in eqiad starting to increase heh
[14:48:58] ema: normal that lvs2002 is not decreasing anymore?
[14:50:35] mmh
[14:50:48] so this looks good: https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes?panelId=7&fullscreen&orgId=1&var-site=codfw&var-cache_type=All&var-status_type=1&var-status_type=2&var-status_type=3&var-status_type=4&var-status_type=5&from=now-3h&to=now
[14:51:59] the LVS graph should keep on decreasing though
[14:56:56] would it be possible that one name server didn't det updated?
[14:57:01] get*
[14:58:01] Traffic, ArchCom-RfC, Commons, MediaWiki-File-management, and 12 others: Use content hash based image / thumb URLs - https://phabricator.wikimedia.org/T149847#3363634 (GWicke)
[14:58:45] XioNoX: what puzzles me is that there's essentially no request going through codfw at the moment
[14:59:18] possible ddos tentative?
[15:01:29] ema: would you say it's safe to proceed with the switch upgrade?
[15:02:21] XioNoX: hang on a sec please
[15:05:58] sure
[15:07:57] XioNoX: so yeah, I've double-checked that except for the usual clients with broken DNS resolvers there's essentially no requests hitting codfw text/upload/misc frontend/backend
[15:08:11] XioNoX: go ahead
[15:08:30] alright, thx!
[15:12:42] I believe there must be something wrong with the LVS graph above, ipvsadm -L on lvs2002 looks sane
[15:14:41] godog: https://grafana.wikimedia.org/dashboard/db/load-balancers?panelId=7&fullscreen&orgId=1&from=now-3h&to=now&edit still shows a high connection rate for lvs2002, contrary to what ipvsadm -L on the system shows
[15:15:02] godog: rate(node_ipvs_connections_total[5m])
[15:17:03] ema: interesting! I'll take a look
[15:17:09] doh, "node_ipvs_connections_total The total number of connections made."
[15:18:20] heh, maybe pybal healthchecks ?
[15:19:06] there is user traffic in there afaics, dns rec
[15:21:33] godog: dang, yeah
[15:26:27] it would be useful to split the graph by service :)
[15:28:03] indeed, there's active/inactive connections per-backend
[15:35:06] https://blogs.dropbox.com/tech/2017/06/evolution-of-dropboxs-edge-network/
[17:03:16] netops, Operations, Patch-For-Review: codfw row D switch upgrade - https://phabricator.wikimedia.org/T167274#3364158 (ayounsi) Upgrade done. Took a bit longer than expected ~1h45min. But process was smooth. Full logs on P5597
[17:08:22] XioNoX: I've run puppet on the codfw cp hosts where it failed, https://gerrit.wikimedia.org/r/#/c/360381/ prepped
[17:10:04] ema: failed?
[17:10:37] Domains, HTTPS, Traffic, Operations, Wikimedia-Site-requests: SSL error for https://wikispecies.org/ - https://phabricator.wikimedia.org/T164868#3249552 (Dzahn) I think it might be worth this _one and only_ exception to add this domain to the main cert. Of course we don't want to do that with...
[17:11:37] XioNoX: yeah, transient puppetfails (see eg: ms-be2025 now)
[17:15:21] doesn't seem to be v6 related (RAs are being received) v6 routes are there
[17:16:05] ema: how can I reproduce?
[17:16:52] XioNoX: running puppet now on ms-be2025, don't worry about that :)
[17:48:07] XioNoX: user traffic coming fine through codfw hosts
[17:48:12] great!
[17:49:05] netops, Operations, Patch-For-Review: codfw row D switch upgrade - https://phabricator.wikimedia.org/T167274#3364565 (ayounsi) Open>Resolved
[17:50:48] XioNoX: I'll point ulsfo back to codfw in a bit
[17:55:02] okay, let me know if there is anything I should do
[18:14:57] XioNoX: nope, all good :)
[18:26:02] Traffic, Discovery, Operations, Wikidata, and 2 others: runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. - https://phabricator.wikimedia.org/T168019#3364719 (Lisp.hippie) Everything is running smoothly on our end. Thanks @ema and @Smalyshev !
[19:01:14] Traffic, ArchCom-RfC, Commons, MediaWiki-File-management, and 15 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#3364959 (GWicke) >>! In T66214#3256693, @Tgr wrote: > We'll also need a way to display old versions of images. Clients can encounter old versions withou...
[19:12:35] HTTPS, Traffic, Operations, LDAP: Update certificates on productions replicas of corp.wikimedia.org LDAP - https://phabricator.wikimedia.org/T168460#3365036 (Framawiki)
[19:14:36] HTTPS, Traffic, Operations, LDAP: Update certificates on productions replicas of corp.wikimedia.org LDAP - https://phabricator.wikimedia.org/T168460#3365053 (RobH) This is for ldap use, not https, not sure #traffic or #https or #traffic belong.
[19:26:46] Traffic, netops, Operations: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462#3365105 (ayounsi)
[19:37:23] Traffic, netops, Operations: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462#3365153 (ayounsi)
[19:48:12] Traffic, Analytics, Operations: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#3365183 (GWicke) >>! In T118365#3349563, @Nuria wrote: >>which matches metrics end points explicitly limited at 100/s per client IP. > > mmm... looking at pageview API dashb...
[19:55:10] Traffic, netops, Operations: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462#3365193 (ayounsi)
[21:34:47] Traffic, netops, Operations: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462#3365472 (ayounsi)
[23:51:05] Traffic, HyperSwitch, Operations, RESTBase-API, Services (next): Respect host header in RESTBase, and redirect /rest_v1 to /rest_v1/ - https://phabricator.wikimedia.org/T167972#3351561 (Pchelolo) When I'm trying to implement this internally within #restbase I hit into T168481 so that one sho...