[16:41:42] YuviPanda, all the workers are still alive!
[16:41:45] \o/
[16:42:02] halfak: wooooooooooooooo
[16:42:06] I think I'll stop pinging you now about that unless something bad happens.
[16:43:03] halfak: nah feel free to. I love pings etc
[16:43:11] :D
[16:43:20] Now to figure out why I slept for 11h and still feel like crap
[17:08:47] Me too. It's a feel-like-crap day.
[17:09:04] That's the only explanation I have.
[19:10:26] yuvipanda: Finally got my head into the ORES extension.
[19:10:32] I'm changing the schema, see https://gerrit.wikimedia.org/r/238825
[19:10:46] Woooooo
[19:10:48] Awesome
[19:10:49] The point is mostly to hold results from multiple models
[19:10:53] legoktm: ^
[19:11:11] woohoo
[19:11:12] awight: ah I see. Yes that will simplify things in the future
[19:11:12] yay
[19:11:29] awight: do you want to just rebase it into my change?
[19:11:37] er, merge?
[19:11:38] legoktm: Thanks for kicking that off! It's great working with an Extension Registration-based repo
[19:11:44] :)
[19:12:03] sure, I'll hold off for a while though, wanna break up my changes into smaller patches and so on
[19:12:24] legoktm & awight, will the cache handle invalidation?
[19:12:38] what kind of invalidation?
[19:12:52] We release updated models. In that case, old scores should be forgotten and re-generated.
[19:13:36] how often do we expect that to happen? the "cache" is an sql table, so we could have a maintenance script re-generate values
[19:14:02] legoktm, once/monthish
[19:14:41] Sometimes we accidentally deploy a broken model briefly and need to clear out those bad scores.
[19:14:45] That happened last week.
[19:14:48] For 5 hours.
[19:19:50] halfak: are models versioned?
[19:20:24] Yes.
[19:20:28] halfak: thanks! yeah, a model version column then
[19:20:32] We don't publish that version, but we can.
[19:20:58] I think we should store the version, and then have a script we can run to say "regenerate model foo version bar"
[19:21:11] Is that the only way the cache should be invalidated, or is it possible that we'll need to purge cache without updating the version?
[19:21:46] awight, right now, I'm using the version as the primary mechanism for cache invalidation -- even if the model didn't substantially change.
[19:22:04] legoktm: Good point. Since we don't need to cache multiple model versions, it doesn't need to be a column on all classification rows.
[19:22:22] halfak: OK, perfect. Thanks for seeing things from the tech perspective :)
[19:23:14] Sure. :)
[19:24:51] It's nice to not have to explain the value in putting "_v1" at the end of everything
[19:27:04] So, now, how should we report the version of the model?
[19:27:42] I'm thinking that it would be valuable to get the version with every request. It could also be valuable to return the model if you ask for /scores///
[19:27:46] yep
[19:27:54] the former, at least
[19:28:06] That also solves some synchronization issues if we're hitting a cluster.
[19:28:46] Right now, we get "prediction" and "probability" as fields in the "score"
[19:28:54] I'd like to add a "version" field.
[19:29:10] Or maybe a "meta" field that would contain versions.
[19:29:37] either way, version or nested metadata that includes version
[19:30:47] I'm leaning towards just version, and a second API to get extended metadata about the model provenance
[19:32:37] halfak: Do you see any problem in having results from different model revisions in the db while we wait for the cache to refresh?
[19:34:23] awight, nope
[19:34:29] That's how the redis cache works right now.
[19:34:57] Given a request, the system looks up the current version and then checks to see if we have a score for that version cached. If not, generate a new score.
[19:35:08] The mode of operation is the same if there's an old score or no score at all.
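The version-keyed lookup described in the 19:34:57 message can be sketched roughly as follows. This is a minimal illustration and not the ORES code: the class name, method names, and the in-memory dict standing in for redis are all assumptions, and the "version" field on the score is the addition proposed at 19:28:54 rather than something the service is confirmed to return.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (model name, model version, revision id).
CacheKey = Tuple[str, str, int]


class VersionedScoreCache:
    """Serve a cached score only if it was made by the current model version."""

    def __init__(self, score_fn: Callable[[str, int], dict]) -> None:
        self._score_fn = score_fn                # computes a fresh score
        self._versions: Dict[str, str] = {}      # model name -> current version
        self._store: Dict[CacheKey, dict] = {}   # in-memory stand-in for redis

    def set_version(self, model: str, version: str) -> None:
        # Releasing a model only changes the version; nothing is deleted here.
        self._versions[model] = version

    def get_score(self, model: str, rev_id: int) -> dict:
        version = self._versions[model]
        key = (model, version, rev_id)
        if key not in self._store:
            # A score stored under an older version is treated exactly like
            # a missing score: it is ignored and a new one is generated.
            score = dict(self._score_fn(model, rev_id))
            score["version"] = version           # proposed field, see 19:28:54
            self._store[key] = score
        return self._store[key]
```

Because the version is part of the key, bumping it is enough to invalidate every earlier score for that model, which matches the "version as the primary mechanism for cache invalidation" approach mentioned at 19:21:46.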
[19:35:21] We don't delete old scores -- just let redis LRU them
[19:35:35] If there is an old score, we serve it and then rescore in the background?
[19:35:53] Nope. Just ignore the old score. Pretend it doesn't exist.
[19:36:04] *but* we don't have to operate that way in the extension
[19:36:54] What's the benefit of caching the old score, then?
[19:37:09] Basically none. We just don't actively delete.
[19:37:15] wat
[19:37:43] With redis, I'd rather have it batch delete the keys that haven't been used in a long time, than to do a bunch of individual deletions.
[19:38:03] AFAICT, redis in LRU mode will do periodic batch deletes of old keys.
[19:38:06] definitely. I'm just grokking why have redis at all
[19:38:20] Oh! So we don't need to regenerate the score.
[19:38:36] Ah! When you say "old score" you mean "same version, generated previously"
[19:39:27] Yeah. Then we just serve the score. If the versions match, we assume that generating the score again will have the same result.
[19:39:41] riiight. okay thanks, that makes perfect sense
[19:40:18] Phew, sorry I was listening to the research showcase with one ear
[19:40:29] lol same
[19:40:34] :)
[19:40:42] Relaying questions too!
[20:29:30] * awight desperately wants to comment extension.json
[20:35:02] Magic '@'!
[20:36:58] Can't... use that file to add a scalar global
[20:38:31] halfak|Lunch: ugh, apparently the last date for IEG committee nominations was monday...
[20:38:37] I've done it today, let's see how bad that was
[20:38:40] I hope it gets through
[20:39:46] * awight mutters to self. Ah, you can add scalars, but under the "config" key
[20:39:56] awight: legoktm knows
[20:40:41] Everything
[20:40:53] That's secretly why I'm rambling in this channel ;)
[20:41:18] Haha
[21:08:12] No comments in JSON makes me very sad.
[21:08:30] It was one of the primary things that pushed me towards exploring YAML as a config file option.
[21:10:09] All json is also yaml by default
[21:10:15] So that is great as well
[21:11:44] Yeah... Maybe. I like things to be strict.
[21:11:56] But it does make migrating from JSON to YAML easier.
[21:12:06] Yeah
[21:12:20] halfak: if you are using a yaml parser and throw it json it will just read it
[21:12:29] Don't think you can even turn that off
[21:13:06] Yeah. That's right.
[21:16:06] Turns out, we have a pseudostandard in extension.json, at least, that keys starting with "@" are ignored.
[21:17:32] I don't know how I feel about that
[21:17:58] ^ summarizes all of mw development anyway
[21:28:22] awight: why do you want comments?
[21:28:54] legoktm: Just a little inline documentation for new config variables. Where should that go instead?
[21:30:12] awight: README or on the wiki
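The "@"-key convention mentioned at 21:16:06 amounts to using ignored keys as comments in a format that has none. A loader honoring it might look something like the sketch below. The function name, the sample keys, and the choice to strip "@" keys at every nesting level are assumptions made for illustration; the log does not say how MediaWiki's own loader handles them.

```python
import json
from typing import Any


def strip_at_keys(value: Any) -> Any:
    """Recursively drop object keys that start with "@" (assumed behavior)."""
    if isinstance(value, dict):
        return {
            key: strip_at_keys(item)
            for key, item in value.items()
            if not key.startswith("@")
        }
    if isinstance(value, list):
        return [strip_at_keys(item) for item in value]
    return value


# Hypothetical file contents; the key names are invented for this example.
raw = """
{
    "@note": "Comment-like key that the loader drops",
    "config": {
        "@ExampleSetting": "Describes the hypothetical setting below",
        "ExampleSetting": 5
    }
}
"""

settings = strip_at_keys(json.loads(raw))
print(settings)
```

The file stays valid JSON, so any standard parser can read it, and only the consumer needs to know that "@" keys carry documentation rather than configuration.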