[01:28:42] does MediaWiki sample statsd metrics like ->increment() ?
[01:30:00] Krinkle: ^ I assume you'd know
[01:30:58] I'm looking at https://grafana.wikimedia.org/d/ZIvCK9EMz/globalwatchlist?orgId=1&from=now-30d&to=now and also don't understand why there are fractional numbers there
[01:31:23] it's just `$this->statsdDataFactory->increment( 'globalwatchlist.load_special_page' );`
[01:37:24] legoktm: no, we do not. not anywhere in core or prod
[01:38:08] a good graph will use .rate or .sample_rate and not any of the other (unreliable) fields. https://wikitech.wikimedia.org/wiki/Graphite#Extended_properties
[01:38:34] the one caveat with rate is that it is per second, whereas we buffer out from statsd to graphite once a minute
[01:38:38] so fractional is expected
[01:38:57] 1 data point in a given minute will be a 1/60 rate
[01:39:17] use scale(60) if you prefer per-minute numbers (which will generally be whole numbers)
[01:39:43] (also document in the axis label what the unit is, etc.)
[01:42:06] I mostly just want to know the total count, not too fussed about the exact timing
[01:42:10] * legoktm reads the link
[01:51:58] Krinkle: so on https://grafana-rw.wikimedia.org/d/ZIvCK9EMz/globalwatchlist?viewPanel=4&orgId=1&from=now-7d&to=now&forceLogin=true I updated it to use .rate + scale(3600) and labeled it as "loads per hour", is that accurate?
[01:52:50] legoktm: hm.. not exactly, this is giving you the per-minute average of how many there would be per hour if the rate were constant
[01:53:18] given the per-minute resolution, that might be confusing
[01:54:30] the default for this kind of metric is to plot it as .rate labelled per second, or .rate | scale(60) and labelled per minute
[01:54:57] scaling it further tends to be confusing I think, since it's still a data point per minute, just inflated with a different label
[01:55:06] hm
[01:55:59] okay, switched it to scale(60) / per minute
[01:56:54] I'd also set a Y-min of 0, use bars, and treat null as zero
[01:57:21] since here nulls/absent are essentially zero; it's just that we don't push any data (in prometheus, the pull would get real zeros)
[01:57:44] right now it is cutting off the bottom of the graph, since "no data" will be zero
[01:58:42] lgtm :)
[01:59:15] on the "total" panel, for example, when you hover somewhere in the large empty space, the tooltip shows the closest data from a few days earlier
[01:59:34] although that may be fine there
[01:59:53] but you'd want to avoid that with the more regular rate stuff, and actually show zero etc.
[10:26:55] addshore: to reply to your question from yesterday: I don't even know which API. All I have is the screenshot produced by selenium. That's why I'm asking.
[11:10:21] duesen: can you link me to the code patch again? i'll take a look
[12:47:08] addshore: the patch is here: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/670570
[12:47:08] I suppose the issue will be easy to find once I know what API modules are involved and why.
[17:51:05] Does anyone know how long it took Wikimedia to get a PhotoDNA key?
[18:07:15] RhinosF1: Cindy might, but she seems to have vanished from irc into Slack chatrooms :/
[18:09:32] bd808: is it supposed to take like over 6 months?
[18:10:12] the task was written over a year ago, so yeah, maybe (T247977)
[18:10:13] T247977: Implement Hash Checking of Media Files - https://phabricator.wikimedia.org/T247977
[18:14:13] RhinosF1: I found https://phabricator.wikimedia.org/T246206#5932214 which I think confirms that Cindy is the person who may be able to answer your question.
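To make the rate arithmetic from the 01:38–01:39 messages concrete: statsd flushes to Graphite once a minute, but the .rate field is per second, so a single increment() in a minute shows up as a fraction. The sketch below is only a worked illustration of that math, with made-up variable names; it is not actual MediaWiki or Graphite code.

```php
<?php
// One increment() in a one-minute flush window, expressed as a per-second rate.
$eventsThisMinute = 1;

$ratePerSecond = $eventsThisMinute / 60;  // ~0.0167: the fractional value on the graph
$perMinute     = $ratePerSecond * 60;     // scale(60): whole per-minute counts again
$perHour       = $ratePerSecond * 3600;   // scale(3600): per-minute average of an hourly pace

printf( "rate=%.4f/s, %.0f/min, %.0f/hr\n", $ratePerSecond, $perMinute, $perHour );
```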
[18:14:45] bd808: email best?
[18:15:07] probably, yeah
[18:15:40] she's also on irc, but not in this channel
[18:15:57] she's in wikimedia-cpt though
[18:16:33] my /whowas CindyCicaleseWMF gave no results, but awesome!
[18:16:58] heh
[19:30:46] TimStarling: AaronSchulz: I'm seeing some JobQueue jobs failing due to "The critical section …rdbms… timed out after 180 seconds". I can't tell if this is intentional or not, since in wmf-config we set higher limits for jobs and POST.
[19:31:15] it seems there is a separate limit, wgCriticalSectionTimeLimit, which defaults to 180 and is unmodified.
[19:31:46] naively, I was thinking these would have the same limit as the main one, just interrupted later instead of immediately
[19:32:47] there are also transactionprofiler limits for queries, which is yet another thing
[21:33:17] duesen: https://github.com/wikimedia/Wikibase/blob/master/client/data-bridge/src/data-access/ApiPageEditPermissionErrorsRepository.ts#L41-L49
[21:33:52] it does that api call twice, once for the page on the client, and once for the entity page on the repo (I believe)
[21:51:33] I talked with AaronSchulz about maybe making $wgCriticalSectionTimeLimit infinite by default, but we didn't quite get to the end of that conversation
[21:51:52] I think it would be fine to do that, but Aaron seemed noncommittal
[22:03:04] addshore: thanks for digging that up!
[22:09:23] the script will get killed by something at some point eventually... I guess the idea was to at least give MW a chance to handle it and shut down. Maybe it can be raised for the job queue, though.
[22:11:04] Krinkle: which jobs?
[22:36:56] AaronSchulz: I think they were upload jobs, but I think that's an aside.
[22:37:08] actually, thinking about it some more, I don't understand why it exists at all as a configurable time limit
[22:37:25] I assume under no circumstances do we interrupt a critical section, right?
[22:37:35] so what does it actually control?
[22:38:54] I mean, why would we want a generic, indiscriminate time limit specifically for use around "critical sections", different from the general execution timeout (neither of which is process-killing anyway, so shutdown should be fine either way)?
[22:45:10] I can imagine a use case for wanting a non-global time limit over a closure, but in my mind those (theoretical) use cases would be non-critical, e.g. where you'd want it to stop early. For example, invoking the parser for an interface message with up to ~1 second of wall time allotted. I'm trying to think of when you'd want a sequence to run uninterrupted but then end with an exception if it took longer than a certain amount of time (that'd be easier to implement on your own as well with startTime-endTime, without this library).
[23:27:09] looking at selenium flakiness -- there's a task open since 2019 about the test that's failing
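As a footnote to the 22:45 message: the "implement it on your own with startTime-endTime" alternative could look roughly like the sketch below. runTimedSection is a hypothetical helper, not an existing MediaWiki API; it just demonstrates the semantics being questioned there, namely running a section uninterrupted and only failing after the fact if it overran its budget.

```php
<?php
// Hypothetical sketch of the startTime/endTime alternative mentioned at 22:45:
// never interrupt the section mid-flight; throw only afterwards if it exceeded
// its time limit (the behavior implied for $wgCriticalSectionTimeLimit).
function runTimedSection( callable $fn, float $limitSeconds ) {
	$start = microtime( true );
	$result = $fn(); // runs to completion, uninterrupted
	$elapsed = microtime( true ) - $start;
	if ( $elapsed > $limitSeconds ) {
		throw new RuntimeException(
			"Section took {$elapsed}s, exceeding the {$limitSeconds}s limit"
		);
	}
	return $result;
}

// e.g. runTimedSection( static function () { /* critical work */ }, 180.0 );
```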