[20:44:42] Krinkle: I thought forcing a request to the primary DC automatically sets POST-like DB profiler expectations, but that doesn't seem to be the case: https://phabricator.wikimedia.org/T388165#10611377 [20:45:11] I wonder if there's a nice way of handling this [20:48:19] it's the same issue we had with Special:UserLogin/return faking a form POST - AuthManager::autoCreateUser() silences the DB profiler in a scope, but there are all kinds of followup writes happening outside of the scope due to deferreds [20:48:33] tgr_: A few aspects: 1) for sizable portion of traffic the primary DC is also their nearest secondary DC, 2) There route traffic to the primary DC if you recently made writes (ChronologyProtector, UseDC=master), those are regular GETs and should not be permitted writes. They're just there so that we can find the stored CP token and pick/waitfor an appropiate replica. [20:48:39] I wonder if deferreds should just inherit an active silencing scope [20:48:55] The handful of cases where unusual routing happens, to ease local testing and CI, should indeed convey this to the TransactionProfiler. [20:49:08] I'd say that ATS/multi-dc.lua at WMF approximates that, not the other way around. [20:49:24] Afaik auto-creation already has a tolerance for it. [20:49:42] I think the ATS rules are fine now, I just wanted to cut down on log noise [20:49:43] But yeah, there's stuff being queued indirectly there. I noticed those in the logs when I wrote my reply on that switchover test task [20:50:46] AaronSchulz and I have gone back and forth on this over the years, on how much pressure to put on writes in deferred updates. It's important that their volume is low, and generally speaking it should be 0 when you just browse and are testing things whether locally or mwdebug. [20:50:51] But yeah there's cases where we don;t. [20:51:09] I suppose we could make the trxProfiler state sticky and change it for the whole request instead of only a limited scope. [20:51:20] and then give up on some hypothetical coverage for the rest of a pageview where autocreation happened. [20:51:23] that seems like a reasonable trade off [20:51:57] that's what we did for the UserLogin case the other day right? [20:52:25] yeah, we just set the expectations to the ones normally used for POST [20:52:26] tgr_: btw, did you mean to imply requestst that do autocreation are routed to the primary DC? [20:53:33] it's a more clear-cut case because the login /return URL is really trying to be a POST, it's only a GET for compatibility with browser redirect behavior [20:54:05] for autocreation on GET, enforcing no writes does make for the rest of the request [20:54:16] but yeah not much lost if that doesn't happen [20:54:51] I was thinking maybe deferreds and transaction idle callbacks could just copy the profiler expectations when they are scheduled [20:55:34] what I mean is, if it were the other way around (as you thought) and ATS is signalling to MW to change transactinprofiler for primary DC requests... then autocreation would still warn because those are GETs with no information on them that autocreation might happen, they're in the secondary DC (and tolerated because they are rare and silently fail / progressively enhance) [20:55:36] the whole thing is not really related to primary DC routing, it's just how I noticed it because there are a lot of autocreations during autologin [20:55:49] ah I see, yeah, that makes sense. [20:55:50] some way to say "treat this like a POST for the whole request (including postsend)" would be nice. Maybe when we have cdn route centralauth requests to the primary DC it could set a header that does that. [20:56:12] AaronSchulz: right now that requires the caller to queue a deferred update it seems, which is not great. [20:56:49] https://gerrit.wikimedia.org/g/mediawiki/core/+/9c37b2dc9dbf78199280befbd80d9c1a1b4a433e/includes/actions/RollbackAction.php#296 [20:56:50] https://gerrit.wikimedia.org/g/mediawiki/core/+/9c37b2dc9dbf78199280befbd80d9c1a1b4a433e/includes/specialpage/AuthManagerSpecialPage.php#233 [20:57:26] Krinkle: are these for requests that CDN knows to route to the main DC via regex or something else? [20:58:15] AaronSchulz: we're talking about auto-creation on GET in secondary DC pageviews (and probably a few on requests routed to the primary DC as well, but they're the same to MW). [20:58:23] This happens in setup.php / AuthManager and is silenced correctly [20:58:38] But, account creation result in variosu defered updates e.g. Echo resetting stuff and what not, which happen outside that silencing [20:58:42] so that means we still emit a bunch of warnings [20:59:13] in general, autocreation can't be routed to the primary, it's very unpredictable [20:59:50] If we combined POST/PostSend-POST into one concept (and have MW tell TransactionProfiler which phase we're in), then we could have redefineExpectations() accept a key like 'POST' and taht'd be that. [21:00:01] AuthManagerSpecialPage could but currently isn't (since the special page name is localized, but we could generate it in canonical form specially for this purpose if we wanted) [21:00:05] no idea about rollback [21:00:10] instead of the caller obtaining $wgTrxProfilerLimits and baby sitting it [21:00:39] (This was tgr_ idea.) [21:01:52] even if we route them to the primary DC, MW would still need to make its own mind up about what the expectations should be. that way it works locally and in CI, and mwdebug etc. Not dependent on CDN routing itself. [21:02:02] my problem with autocreation was, that conceptually it's not true that the whole request is POST-like (unlike rollback etc). Ideally the changed expectations would only apply to the autocreate method and the deferreds scheduled within that method. [21:02:11] Right. Ideally we'd minimize the stuff extensions are doing there. I don't like saying "well one thing has to write, so anything else can now", though it would be nice for truly related updates to be silenced as well. [21:02:13] but probably not worth overcomplicating it [21:02:38] Oh I see, you mean to carry the state over to the deferred, hm.. yeah. that would be more semantically correct, but yeah also more complex. [21:02:55] but, yeah...hard to think of something simple. [21:03:13] ref T348901 [21:03:13] T348901: Decide on owner and contact for "Cross-DC database query" alert - https://phabricator.wikimedia.org/T348901 [21:03:30] I think a simple approach, combined with actually alerting on this, could suffice. [21:03:47] the biggest enemy to telemetry is false positives and alert fatigue. [21:04:37] so making sure we keep logstash clean (err on side of silence) but have counters/alerts that can't be silenced, should help to keep things in balance. [21:18:52] Krinkle: having the profiler know the stage would simplify things indeed. Callers wouldn't have to fiddle with the exact expectations, just what verb to treat the request as. The entrypoint code would handle advancing the stages. You could still have POST and POST-postsend config. [21:23:29] So autoCreateUser() could do something like Profiler::instance()->getTransactionProfiler()->redefineRequestVerb( 'POST' ) [21:23:56] Also, it would be nice as it's own service and not a Profiler field [21:24:41] It used to include manual profilerSection() info in the logs in some cases, but none of that logic is there afaik [21:26:37] Even if we were to add something like that back I assume it would use the span code and not SectionProfiler. So glueing it to Profiler is anachronistic. [21:31:39] ack, but that's a larger project with pros and cons. It's not obvious to me that we want a service-bound profiler. Similar to webrequest, telemetry, logging, and deferreds; these are process-bound, not wiki-bound. The test I use in my mind is: If you were to create a service container for another wiki in the same process, do you need a separate instance, and can such instance be reasonably configured and work correctly? [21:32:04] In this case, it seems transactionprofiler is more process bound in that regard. so a singleeton seems appropiate. Where to get it and how to inject it.. less sure about htat.