[13:15:01] \o
[13:15:18] pulled some additional numbers...pondering. If i didn't completely get this wrong (maybe, it's for some reason awkward) the decline in ac submit of 0.15% is ~4,200 fewer submits per day, and the ac success increase of 0.43% represents 12,000 additional successes each day
[14:11:11] which is very naively just taking the mean page views per day and multiplying by the percentages
[14:11:23] page views w/ac
[14:52:48] * ebernhardson ponders how to label wikis with their language type, alphabetic vs syllabic vs ideographic
[16:04:24] interesting...if i split the entire dataset by user edit bucket, users with 1000+ edits saw a 0.2% increase in submits, and a 0.5% increase in successes.
[16:05:38] oh, actually i misread. submits were up for 1000+ edits, but successes were up for 100-999 edits; 1000+ edits didn't reach statistical significance for successes
[16:11:38] break, back in 15-20
[16:26:34] back
[17:31:27] ebernhardson: just saw your question about writing system type. I could take maybe half an hour and give you a 98% or better accurate list based on the Wikipedia for each language if you really want it. (I *think* it'd be 100% accurate, but I know better than to promise that.)
[17:41:53] I'm making some progress on the opensearch cluster helm chart, ref https://github.com/opensearch-project/opensearch-k8s-operator/tree/main/charts/opensearch-cluster . The bad (?) news is we're gonna have to figure out opensearch security
[17:52:42] Trey314159: well, i'm not sure if it matters or not :P I was pondering different ways to slice the data that might be clearer, based on the idea fuzziness is less-fuzzy in alphabetic languages
[17:52:54] but, would we really vary the configuration? i dunno, probably not
[17:55:32] Yeah, that's what I figured you wanted to look at. If you want me to do it, I'm game. Using the Site Matrix as a starting point makes it tractable, I think.
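The back-of-envelope math above can be sketched directly; note the ~2.8M daily page-view figure is an assumption backed out from 4,200 / 0.0015, not a number stated in the chat:

```python
# Naive deltas as described above: mean daily page views times the percentage changes.
# daily_views is an assumption, implied by 4,200 / 0.0015 (~2.8M AC page views/day).
daily_views = 4_200 / 0.0015
fewer_submits = daily_views * 0.0015   # 0.15% decline in AC submits
more_successes = daily_views * 0.0043  # 0.43% increase in AC successes
print(round(fewer_submits), round(more_successes))  # 4200 12040
```

The ~12,000 figure in the chat is just this product rounded down.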
[17:57:49] Trey314159: yea you get a list of languages, it's just the tedium of classifying them all. if that's not too hard, i can run the analysis
[17:58:22] Sure.. I'll see how far I get in half an hour...
[17:58:56] i'm also tempted to drop the per-wiki table...even a null test shows ~20 wikis with significant effects.
[17:59:03] it's just misleading
[17:59:17] or move it to an appendix with caveats
[18:35:38] ~20/~350 is ~6%... so about what you'd expect at 95% confidence.
[18:36:43] I got through 220 wikis in half an hour. I originally had thought an hour but decided the Latin ones would be easy. they aren't hard, exactly, but I gotta check for language converters! Anyway, an hourish should do it... I'll report back in another half hour why that's not quite enough time, I'm sure!
[18:43:58] Trey314159: in theory (i don't know the theory, i just read about it this week :P) the multiple tests correction i applied is supposed to control the false discovery rate, so 5% of the discovered things should be false positives, rather than 5% of the source
[18:44:01] maybe :P
[19:16:09] Maybe, indeed. That might be too much statistics for a Friday afternoon.
[19:16:39] I've identified all the scripts for all the wikis, but I still need to categorize a few of them.. almost done
[19:24:57] hmm, annoyingly matplotlib will print x-axis precision up to 10 digits if it's available...and you can't cap it without using a custom formatter :(
[19:25:07] will just have to live with one graph having far too many digits :P
[19:47:20] ebernhardson: I sent you the writing system spreadsheet. I tried to group similarish types (like alphabets (Latin) and abjads (Arabic)) under Super-type, if you want to use fewer categories. Super-type is also the predominant type if there is one. Some I hedged with "(mixed)" in parens. Some are just "mixed" because there's no getting around it.
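The FDR point above (5% of *discoveries* expected false, not 5% of all tests) is what the Benjamini–Hochberg procedure controls; the chat doesn't say which correction was actually applied, so this is an illustrative sketch rather than the analysis code:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Flag each p-value as a discovery (True) while controlling the FDR at alpha.

    BH procedure: sort p-values ascending, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject hypotheses with ranks 1..k.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank  # largest rank passing its threshold so far
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            rejected[i] = True
    return rejected

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [True, True, True, False]
```

Under this procedure, the expected fraction of the flagged wikis that are false positives is at most alpha, which matches the "5% of the discovered things" reading above.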
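The custom-formatter workaround mentioned for the x-axis digits looks roughly like this; the ticker classes are matplotlib's own, but the 4-decimal cap is an arbitrary choice for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

fig, ax = plt.subplots()
ax.plot([0.123456789, 0.123456999], [0, 1])
# Cap x-axis tick labels at 4 decimal places instead of full float precision.
ax.xaxis.set_major_formatter(StrMethodFormatter("{x:.4f}"))
fig.savefig("capped_ticks.png")
```

`FuncFormatter` works the same way if the label needs logic beyond a format string.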
[19:47:20] Both types "A/B" and "B/A" exist because they match the writing system order (which is alphabetical when there are two)... you will still have to clean it up a bit to use it, I think.. but it's 94.3% of the way there.
[19:47:40] Trey314159: thanks!
[19:48:08] Trey314159: do your codes match the wiki domains? I know some aren't real language codes
[19:48:37] otherwise i can reverse them with sitematrix to the wikis' almost-language codes
[19:48:44] Yeah, they should. I started with the Site Matrix as the first two columns.
[19:48:53] (in my analysis i have the en.wikipedia part of en.wikipedia.org, rather than the real language)
[19:49:02] I deleted most of the closed wikis, but not all.
[19:49:05] kk
[19:50:32] Trey314159: the report on people.wikimedia.org has been updated, hopefully this is the final report (unless alphabetic shows something interesting, maybe).
[19:52:18] mostly it's more words, and more numbers in the words, but the graphs are about the same. other than dropping short queries
[19:52:31] and i guess all the appendix tables
[19:52:36] Cool. I will take a look!
[20:09:51] Trey314159: sadly, the submit rate still declines, limited to alphabetic. AC success rate is insane though: 61.1% in alphabetic languages, 32% in logographic (mixed), and 14.6% in unlisted
[20:10:26] the submit rate change doesn't meet statistical significance in anything other than alphabetic
[20:10:59] that 14.6% hurts a bit
[20:12:10] unlisted are these: '-', 'commons', 'foundation', 'incubator', 'meta', 'species', 'outreach', 'wikimania', 'wikitech', 'test', 'ua', 'login', 'bd', 'mx', 'dk', 'api', 'beta', 'test2', 'test-commons', 'nyc'
[20:12:46] '-' is mediawiki, wikidata, wikisource, wikifunctions
[20:27:17] ahh, so the weirdos. makes more sense
[20:30:14] pulled the numbers looking at everything, not just statistically significant changes.
[20:30:14] Suggests places we could improve things :P see https://phabricator.wikimedia.org/F65674307 and https://phabricator.wikimedia.org/F65674314
[20:30:36] syllabic languages got a 22.8% success rate, logographic 18.7%
[20:31:21] oh actually the second one is still missing some rows...sec
[20:32:19] https://phabricator.wikimedia.org/F65674321 should be the right one
[20:36:51] * ebernhardson separately wonders if i should be learning about and applying Bayesian analysis...but leave that for another report :P
[20:38:39] still looking, but F65674321 has an infinity (divide by 0) in it.. which looks like a possible error
[20:39:15] OTOH, it's 22 observations, so maybe not
[20:40:10] i think it's because the control has a 100% submit rate
[20:40:53] lift is `(test - control) / (1 - control)`, so i'm dividing by zero
[20:50:32] Ha.. that's the other one. I saw the inf% change because control had a 0% success rate. Those *really* small samples are kinda useless.
[20:51:58] The real value is the relative success rates (control or test) across language types.
[20:52:16] ahh, i totally missed that one. Yea, probably the same problem though. The normal tables are limited to only include rows with at least 0.1% of the total number of rows, so that filters all that out
[20:52:50] i used to have it at 1k, but bumped up to 0.1% to remove more things that are probably noise
[20:53:25] Yeah.. I replied in email, too... but everything makes sense and the report looks great!
[20:53:59] i'll be out next week, so i guess the test just keeps running, but it should be a simple config change to swap it over when ready
[20:58:08] have a good vacation! I'm off for the weekend!
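The divide-by-zero in the lift formula quoted above can be guarded explicitly; `relative_lift` is a hypothetical helper name, just sketching the formula as stated in the chat:

```python
import math

def relative_lift(test, control):
    """Lift as quoted above: (test - control) / (1 - control).

    When the control rate is already 100%, lift is undefined; return NaN
    rather than letting the division produce an infinity in the tables.
    """
    if control >= 1.0:
        return math.nan
    return (test - control) / (1 - control)

# e.g. moving from a 50% to a 75% rate closes half the remaining gap:
print(relative_lift(0.75, 0.50))  # 0.5
```

NaN rows can then be dropped or footnoted, which plays the same role as the 0.1%-of-rows filter mentioned above for the tiny-sample cases.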