[13:15:01] \o
[13:15:18] pulled some additional numbers...pondering. If i didn't completely get this wrong (maybe, it's for some reason awkward) the decline in ac submit of 0.15% is ~4,200 fewer submits per day, and the ac success increase of 0.43% represents 12,000 additional successes each day
[14:11:11] which is very naively just taking the mean page views per day and multiplying by the percentages
[14:11:23] page views w/ac
[14:52:48] * ebernhardson ponders how to label wikis with their language type, alphabetic vs syllabic vs ideographic
[16:04:24] interesting...if i split the entire dataset by user edit bucket, users with 1000+ edits saw a 0.2% increase in submits, and a 0.5% increase in successes.
[16:05:38] oh, actually i misread. submits were up for 1000+ edits, but successes were up for 100-999 edits; 1000+ edits didn't reach statistical significance for successes
[16:11:38] break, back in 15-20
[16:26:34] back
[17:31:27] ebernhardson: just saw your question about writing system type. I could take maybe half an hour and give you a 98% or better accurate list based on the Wikipedia for each language if you really want it. (I *think* it'd be 100% accurate, but I know better than to promise that.)
[17:41:53] I'm making some progress on the opensearch cluster helm chart, ref https://github.com/opensearch-project/opensearch-k8s-operator/tree/main/charts/opensearch-cluster . The bad (?) news is we're gonna have to figure out opensearch security
[17:52:42] Trey314159: well, i'm not sure if it matters or not :P I was pondering different ways to slice the data that might be clearer, based on the idea fuzziness is less-fuzzy in alphabetic languages
[17:52:54] but, would we really vary the configuration? i dunno, probably not
[17:55:32] Yeah, that's what I figured you wanted to look at. If you want me to do it, I'm game. Using the Site Matrix as a starting point makes it tractable, I think.
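The back-of-envelope math above can be sketched directly; note the ~2.8M daily page-view figure is an assumption backed out from 4,200 / 0.0015, not a number stated in the chat:

```python
# Naive deltas as described above: mean daily page views times the percentage changes.
# daily_views is an assumption, implied by 4,200 / 0.0015 (~2.8M AC page views/day).
daily_views = 4_200 / 0.0015
fewer_submits = daily_views * 0.0015   # 0.15% decline in AC submits
more_successes = daily_views * 0.0043  # 0.43% increase in AC successes
print(round(fewer_submits), round(more_successes))  # 4200 12040
```

The ~12,000 figure in the chat is just this product rounded down.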
[17:57:49] Trey314159: yea you get a list of languages, it's just the tedium of classifying them all. if that's not too hard, i can run the analysis
[17:58:22] Sure.. I'll see how far I get in half an hour...
[17:58:56] i'm also tempted to drop the per-wiki table...even a null test shows ~20 wikis with significant effects.
[17:59:03] it's just misleading
[17:59:17] or move it to an appendix with caveats
[18:35:38] ~20/~350 is ~6%... so about what you'd expect at 95% confidence.
[18:36:43] I got through 220 wikis in half an hour. I originally had thought an hour but decided the Latin ones would be easy. they aren't hard, exactly, but I gotta check for language converters! Anyway, an hourish should do it... I'll report back in another half hour why that's not quite enough time, I'm sure!
[18:43:58] Trey314159: in theory (i don't know the theory, i just read about it this week :P) the multiple tests correction i applied is supposed to control the false discovery rate, so 5% of the discovered things should be false positives, rather than 5% of the source
[18:44:01] maybe :P
[19:16:09] Maybe, indeed. That might be too much statistics for a Friday afternoon.
[19:16:39] I've identified all the scripts for all the wikis, but I still need to categorize a few of them.. almost done
[19:24:57] hmm, annoyingly matplotlib will print x-axis precision up to 10 digits if it's available...and you can't cap it without using a custom formatter :(
[19:25:07] will just have to live with one graph having far too many digits :P
[19:47:20] ebernhardson: I sent you the writing system spreadsheet. I tried to group similarish types (like alphabets (Latin) and abjads (Arabic)) under Super-type, if you want to use fewer categories. Super-type is also the predominant type if there is one. Some I hedged with "(mixed)" in parens. Some are just "mixed" because there's no getting around it.
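The FDR point above (5% of *discoveries* expected false, not 5% of all tests) is what the Benjamini–Hochberg procedure controls; the chat doesn't say which correction was actually applied, so this is an illustrative sketch rather than the analysis code:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Flag each p-value as a discovery (True) while controlling the FDR at alpha.

    BH procedure: sort p-values ascending, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject hypotheses with ranks 1..k.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank  # largest rank passing its threshold so far
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            rejected[i] = True
    return rejected

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [True, True, True, False]
```

Under this procedure, the expected fraction of the flagged wikis that are false positives is at most alpha, which matches the "5% of the discovered things" reading above.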
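The custom-formatter workaround mentioned for the x-axis digits looks roughly like this; the ticker classes are matplotlib's own, but the 4-decimal cap is an arbitrary choice for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

fig, ax = plt.subplots()
ax.plot([0.123456789, 0.123456999], [0, 1])
# Cap x-axis tick labels at 4 decimal places instead of full float precision.
ax.xaxis.set_major_formatter(StrMethodFormatter("{x:.4f}"))
fig.savefig("capped_ticks.png")
```

`FuncFormatter` works the same way if the label needs logic beyond a format string.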
[19:47:20] Both types "A/B" and "B/A" exist because they match the writing system order (which is alphabetical when there are two)... you will still have to clean it up a bit to use it, I think.. but it's 94.3% of the way there.
[19:47:40] Trey314159: thanks!
[19:48:08] Trey314159: do your codes match the wiki domains? I know some aren't real language codes
[19:48:37] otherwise i can reverse them with sitematrix to the wikis' almost-language codes
[19:48:44] Yeah, they should. I started with the Site Matrix as the first two columns.
[19:48:53] (in my analysis i have the en.wikipedia part of en.wikipedia.org, rather than the real language)
[19:49:02] I deleted most of the closed wikis, but not all.
[19:49:05] kk
[19:50:32] Trey314159: the report on people.wikimedia.org has been updated, hopefully this is the final report (unless alphabetic shows something interesting, maybe).
[19:52:18] mostly it's more words, and more numbers in the words, but the graphs are about the same. other than dropping short queries
[19:52:31] and i guess all the appendix tables
[19:52:36] Cool. I will take a look!
[20:09:51] Trey314159: sadly, the submit rate still declines, limited to alphabetic. AC success rate is insane though: 61.1% in alphabetic languages, 32% in logographic (mixed), and 14.6% in unlisted
[20:10:26] the submit rate change doesn't meet statistical significance in anything other than alphabetic
[20:10:59] that 14.6% hurts a bit
[20:12:10] unlisted are these: '-', 'commons', 'foundation', 'incubator', 'meta', 'species', 'outreach', 'wikimania', 'wikitech', 'test', 'ua', 'login', 'bd', 'mx', 'dk', 'api', 'beta', 'test2', 'test-commons', 'nyc'
[20:12:46] '-' is mediawiki, wikidata, wikisource, wikifunctions
[20:27:17] ahh, so the weirdos. makes more sense
[20:30:14] pulled the numbers looking at everything, not just statistically significant changes.
[20:30:14] Suggests places we could improve things :P see https://phabricator.wikimedia.org/F65674307 and https://phabricator.wikimedia.org/F65674314
[20:30:36] syllabic languages got a 22.8% success rate, logographic 18.7%
[20:31:21] oh actually the second one is still missing some rows...sec
[20:32:19] https://phabricator.wikimedia.org/F65674321 should be the right one
[20:36:51] * ebernhardson separately wonders if i should be learning about and applying Bayesian analysis...but leave that for another report :P
[20:38:39] still looking, but F65674321 has an infinity (divide by 0) in it.. which looks like a possible error
[20:39:15] OTOH, it's 22 observations, so maybe not
[20:40:10] i think it's because the control has a 100% submit rate
[20:40:53] lift is `(test - control) / (1 - control)`, so i'm dividing by zero
[20:50:32] Ha.. that's the other one. I saw the inf% change because control had a 0% success rate. Those *really* small samples are kinda useless.
[20:51:58] The real value is the relative success rates (control or test) across language types.
[20:52:16] ahh, i totally missed that one. Yea, probably the same problem though. The normal tables are limited to only include rows with at least 0.1% of the total number of rows, so that filters all that out
[20:52:50] i used to have it at 1k, but bumped up to 0.1% to remove more things that are probably noise
[20:53:25] Yeah.. I replied in email, too... but everything makes sense and the report looks great!
[20:53:59] i'll be out next week, so i guess the test just keeps running, but it should be a simple config change to swap it over when ready
[20:58:08] have a good vacation! I'm off for the weekend!
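The divide-by-zero in the lift formula quoted above can be guarded explicitly; `relative_lift` is a hypothetical helper name, just sketching the formula as stated in the chat:

```python
import math

def relative_lift(test, control):
    """Lift as quoted above: (test - control) / (1 - control).

    When the control rate is already 100%, lift is undefined; return NaN
    rather than letting the division produce an infinity in the tables.
    """
    if control >= 1.0:
        return math.nan
    return (test - control) / (1 - control)

# e.g. moving from a 50% to a 75% rate closes half the remaining gap:
print(relative_lift(0.75, 0.50))  # 0.5
```

NaN rows can then be dropped or footnoted, which plays the same role as the 0.1%-of-rows filter mentioned above for the tiny-sample cases.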