[08:01:26] morning
[08:20:59] sup
[08:21:03] o/ joakino
[08:22:56] 👋
[14:09:21] joakino: o/
[15:09:42] jhobs i created https://phabricator.wikimedia.org/T138085 for the deployment piece i'm presuming to be worked for next sprint. in practice the config patch could be created today, but t138085 is where the real points would get captured. as we discussed yesterday, the additional mfe logic should be incorporated into https://phabricator.wikimedia.org/T127250
[15:11:06] ^ bmansurov phuedx joakino i marked t138085 as a one pointer (one liner-ish swat), but lemme know if you think we should discuss further.
[15:13:10] looks good
[16:03:39] dr0ptp4kt: thanks Adam!
[16:07:57] jhobs: have time to review the search overlay patch again?
[16:09:04] bmansurov: yes, although it may have to wait until after lunch
[16:09:17] jhobs: thanks, that'll do it
[17:49:37] bearND: niedzielski: man, was the 5/9 hotfix release our first production release since 3/23? :(
[17:55:25] mdholloway: that sounds right
[18:46:55] bearND: niedzielski: lol, the sessions schema changed when we deployed reading lists
[18:46:59] another mystery solved
[18:47:15] * mdholloway plays sad trombone sound
[18:47:25] :)
[18:50:54] the latency increase issue might still be out there but that explains the apparent drop for sure
[18:51:05] *sessions drop
[18:58:36] now how am i going to update that spreadsheet :|
[19:11:28] mdholloway: ping
[19:11:44] hello gwicke
[19:12:06] oh, you figured out a part of the mystery already
[19:12:08] gwicke: just figured out one silly query error
[19:12:10] yep
[19:12:17] I was about to ask about the OkHTTP issue
[19:12:40] the bug says "can't log in or edit", but you said earlier that HTTPS was used no matter what
[19:13:02] so, wouldn't the OkHTTP issue affect all connections?
[19:13:42] this is re https://phabricator.wikimedia.org/T134817
[19:14:17] gwicke: i think it affected all connections, but we were only crashing on edit and login attempts
[19:14:44] gwicke: scratch that, don't remember if it was a crash
[19:15:14] nope, just a connection error
[19:15:18] maybe it only affected POSTs?
[19:15:33] that seems quite possible
[19:15:35] since it seems to be related to reading the request body
[19:16:05] right
[19:16:34] I still wonder if the move to http2 could have something to do with the apparent slow-down
[19:17:03] something is still a little off on the client side since logging shows that we're still falling back to http/1.1 even when i remove the shim that forces it, even after the nginx fix is in place
[19:17:22] i haven't had the chance to dig into it
[19:17:42] the fall-back could cost some round-trips
[19:23:18] https://github.com/square/okhttp/issues/1305 has some discussion that sounds related
[19:26:28] gwicke: fwiw, the latency numbers from the table for the new sessions schema look quite a bit better
[19:26:47] nginx blog post on npn vs. alpn: https://www.nginx.com/blog/supporting-http2-google-chrome-users/
[19:27:32] mdholloway: oh, are they more in line with the pre-June 1st values?
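
The exchange at [19:17:03]–[19:23:18] is about OkHttp's protocol negotiation: a client-side "shim" that pins connections to HTTP/1.1, and logging that shows connections still falling back to HTTP/1.1 even with the shim removed. Below is a minimal sketch, assuming OkHttp 3.x, of the two client setups being compared plus an interceptor that reports which protocol was actually negotiated; the class name, the interceptor, and the request URL are illustrative and not taken from the Wikipedia Android app.

```java
import java.io.IOException;
import java.util.Collections;

import okhttp3.Interceptor;
import okhttp3.OkHttpClient;
import okhttp3.Protocol;
import okhttp3.Request;
import okhttp3.Response;

public final class ProtocolDebugClient {
    // Logs the protocol negotiated for each response (e.g. http/1.1 or h2),
    // which is how a fallback to HTTP/1.1 would show up in client logging.
    private static final Interceptor PROTOCOL_LOGGER = new Interceptor() {
        @Override public Response intercept(Chain chain) throws IOException {
            Response response = chain.proceed(chain.request());
            System.out.println("negotiated protocol: " + response.protocol());
            return response;
        }
    };

    // The kind of shim described above: the client is pinned to HTTP/1.1,
    // so OkHttp never attempts an HTTP/2 upgrade.
    public static OkHttpClient http11Only() {
        return new OkHttpClient.Builder()
                .protocols(Collections.singletonList(Protocol.HTTP_1_1))
                .addNetworkInterceptor(PROTOCOL_LOGGER)
                .build();
    }

    // With the shim removed: OkHttp may negotiate HTTP/2 via ALPN, falling
    // back to HTTP/1.1 when the server or the TLS stack doesn't support it.
    public static OkHttpClient defaultProtocols() {
        return new OkHttpClient.Builder()
                .addNetworkInterceptor(PROTOCOL_LOGGER)
                .build();
    }

    public static void main(String[] args) throws IOException {
        // Illustrative request against a public endpoint.
        Request request = new Request.Builder()
                .url("https://en.wikipedia.org/api/rest_v1/page/summary/Dog")
                .build();
        Response response = defaultProtocols().newCall(request).execute();
        System.out.println("status: " + response.code());
        response.close();
    }
}
```

Note that OkHttp requires HTTP/1.1 to remain in the protocol list, so pinning to HTTP/1.1 only (as above) is allowed, while pinning to HTTP/2 only is not.
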
[19:28:06] gwicke: yes, quite a bit better in fact
[19:28:52] gwicke: there seems to be a correlation between low latency and speed to update, which i guess isn't unexpected
[19:29:36] yeah, heavy users are more likely to be on faster connections
[19:31:03] gwicke: i started a new tab: enwiki prod app since 2016-05-31
[19:31:19] (first day with data for the new schema)
[19:31:54] wow, that's a big difference
[19:32:01] down to 4xx ms
[19:33:40] oh, that was remaining -- but still, lead is down quite a bit
[19:34:00] which of these are using rb, and which the php api?
[19:39:46] gwicke: i still have it broken into two main sections: mw latency on left vs. rb latency on right -- just have two columns (old schema and new schema) for each single column in the last sheet
[19:40:11] gwicke: luckily it seems that the transition to the new schema is happening pretty rapidly
[19:40:19] ah, missed the rb one on the right
[19:40:42] tough on a laptop :)
[19:41:33] but yeah, lead section numbers in the 700s and remaining in the low 400s is great!
[19:44:55] indeed
[19:47:48] I saw that you are using the PHP API when images are disabled, and users disabling images are probably on slower connections
[19:48:33] so, if there are a lot of users with images disabled on slow connections, then this might over-state the speed-up slightly
[19:49:34] gwicke: true, though only a very small number of users have images disabled, 0.42% based on pageview data last time we looked
[19:50:52] that's not a lot, indeed
[19:51:34] coincidentally, exactly 1% of the 42% still on the RB API
[19:51:53] err, still on the PHP API
[19:53:02] :)
[19:53:05] is the expectation that the number of sessions using the new schema (new app) hitting the PHP API is going to drop soon, once those users open the app again & the config is refreshed?
[19:54:38] i would have expected them to have switched to RB long ago :(
[19:54:48] we have yet to figure out what's behind the slow rollout
[19:55:51] at least it's continuing gradually
[19:56:05] and it looks like we don't have a real slow-down after all
[19:56:49] yes, looks like the latency issue is resolved
[19:57:35] do you mind summarizing this on the task?
[19:58:21] gwicke: sure, i'll summarize and close out
[19:58:37] mdholloway: thank you!
[19:58:49] gwicke: thanks for all the help investigating such a silly error!
[19:59:21] gwicke: we learned some interesting data, at least
[20:00:25] indeed, and we gained some confidence in the results by checking possible issues
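
The "coincidence" noted at [19:51:34] is just the arithmetic relating the two figures quoted in the exchange (0.42% of users with images disabled, 42% of sessions still on the PHP API):

    1% of 42% = 0.01 x 0.42 = 0.0042 = 0.42%
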