[21:20:02] o/ Amir1, if you're around, I have an easy review for you. https://github.com/wiki-ai/editquality/pull/69
[21:20:14] Yup
[21:20:19] working on wikidata stuff
[21:20:24] Gotcha. No rush
[21:20:24] but I can spare a minute
[21:21:33] halfak: why do we have even rev ids for enwiktionary
[21:21:49] Because the 200k sample was *way too big*
[21:21:54] I needed to cut it in half
[21:22:06] And this is a way to do so fairly and deterministically.
[21:22:08] :D
[21:24:25] :)))
[21:24:29] Do shuffle dude
[21:25:24] Not deterministic. I would have had to check in the file.
[21:26:21] hmm, still I don't have faith in numbers
[21:26:22] :D
[21:26:27] it's not that bad
[21:26:39] but I don't like it
[21:41:37] Amir1, yeah, either that or we (1) have a totally separate query or (2) we check in the dataset
[21:41:45] Why no faith in #s?
[21:42:03] They might not be totally random
[21:49:28] Amir1, they are auto incremented.
[21:49:42] Under what conditions would non-uniformity occur?
[21:50:16] someone makes too many edits I guess
[21:50:59] But how would that be a problem?
[21:51:27] I've been in this debate before. https://meta.wikimedia.org/wiki/Research_talk:VisualEditor%27s_effect_on_newly_registered_editors/June_2013_study#Observational_study_with_haphazard_non-random_assignment_to_VE_or_Wikitext
[21:51:58] I ran a ton of analysis to show that the strategy does not produce a bias.
[21:52:06] Round robin sampling is pretty common
[21:52:07] Interesting
[21:52:55] I will read it
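
For readers following the sampling argument, here is a minimal Python sketch of the two options discussed: keeping only even revision IDs, and shuffling. It is an illustration, not the code from the pull request. The function names `keep_even_rev_ids` and `seeded_half_sample` and the example IDs are hypothetical, and the seeded variant is an assumption added for contrast; it was not proposed in the conversation.

```python
import random


def keep_even_rev_ids(rev_ids):
    """Halve a sample by keeping only even revision IDs.

    The result depends only on the IDs themselves, so anyone who reruns
    the filter on the same input gets the same subset without needing a
    checked-in file.
    """
    return [rev_id for rev_id in rev_ids if rev_id % 2 == 0]


def seeded_half_sample(rev_ids, seed=0):
    """Hypothetical alternative: a random half drawn with a fixed seed.

    Reproducible only if the seed is recorded and the input is put in a
    canonical order first, which is why it sorts before sampling.
    """
    rng = random.Random(seed)
    ordered = sorted(rev_ids)
    return sorted(rng.sample(ordered, len(ordered) // 2))


if __name__ == "__main__":
    # Made-up, auto-incremented revision IDs for illustration only.
    rev_ids = list(range(1000, 1020))

    even_half = keep_even_rev_ids(rev_ids)
    print(len(rev_ids), len(even_half))

    # Both strategies give the same answer when repeated.
    assert keep_even_rev_ids(rev_ids) == even_half
    assert seeded_half_sample(rev_ids, seed=42) == seeded_half_sample(rev_ids, seed=42)
```

Because revision IDs are auto-incremented, the parity of an ID alternates with each saved revision regardless of who made the edit, which is the basis for treating the even-ID filter as a fair split. An unseeded shuffle would return a different half on every run, which matches the objection raised at 21:25:24.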