[13:40:05] o/ halfak
[13:40:09] hey, I've got some news
[13:40:14] tell me when you're around
[14:27:03] halfak: hey, around?
[14:53:07] o/ sorry, I'm in a meeting. Done in 8 mins.
[15:10:59] Hey Amir1
[15:11:01] just got done
[15:25:52] I just got your ping halfak
[15:26:01] * Amir1 curses at Konversation
[15:27:03] I asked Eran and he told me that he will make an announcement in he.wp
[15:27:05] re. getting pywikibase updated?
[15:27:06] very soon
[15:27:08] Yay!
[15:27:18] I also fixed the pywikibase bug
[15:27:20] Did you direct him to the announcement I wrote on the ORES talk page?
[15:27:24] and released a new version
[15:27:31] halfak: yes
[15:27:56] Cool. Copying that table might be a pain. I welcome Eran to take liberties he sees as appropriate.
[15:28:01] right now I'm focusing on finding what's wrong with water (Q283)
[15:28:08] Also, great on the new pywikibase.
[15:28:15] Yeah. That's a weird on.
[15:28:25] *one
[15:28:42] I wanted to talk to you about bias detection
[15:28:54] you wanted me to talk about it when you're online
[15:29:12] Oh yes.
[15:29:54] I don't know the results of sigclust about clusters
[15:29:58] on our data
[15:30:13] is it not satisfactory halfak?
[15:30:40] I want to comment on the literature part
[15:31:22] Oh it is. It's just that we weren't sure what to do next with the clusters.
[15:31:32] oh, I also found another strange item: https://www.wikidata.org/w/index.php?title=Q37628&action=history
[15:31:36] Mila Kunis
[15:32:07] Can you give me the results so I can see them
[15:32:11] maybe I can check
[15:33:18] Oh sure. Actually this would be a good opportunity for you to use the sigclust library when you are working to clean it up.
[15:33:47] See https://github.com/aetilley/sigclust/blob/master/enwiki_data/clustering_tests.ipynb
[15:36:24] Sure :)
[15:37:03] See also https://github.com/aetilley/sigclust/blob/master/enwiki_data/similarity.py, which does the similarity comparison using Jaccard Index.
[15:37:15] It definitely needs some cleanup.
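Note on the 15:37:03 message: the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below only illustrates that definition. It is not the code in similarity.py; the function name, the empty-set convention, and the example sets are all assumptions made for illustration.

```python
def jaccard_index(a, b):
    """Return |A intersect B| / |A union B| for two iterables treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical cluster memberships, written as sets of made-up identifiers.
cluster_one = {"a", "b", "c", "d"}
cluster_two = {"c", "d", "e"}

# 2 shared members out of 5 distinct members overall.
print(jaccard_index(cluster_one, cluster_two))  # 0.4
```

A value of 1.0 means the two sets are identical and 0.0 means they share nothing.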
[15:37:19] :)
[15:41:22] OK :)
[15:57:15] halfak: I have trouble understanding https://github.com/aetilley/sigclust/blob/master/enwiki_data/clustering_tests.ipynb
[15:57:26] Would you help a little bit
[15:57:46] Sure. What are the points of confusion?
[15:58:31] "# cluster indices <= the cluster index blah blah"
[15:58:50] # cluster indices means number of clusters?
[16:01:52] Sorry. Those are the simulated cluster indices.
[16:03:37] When we simulate 100 times and the cluster index of 30 of those simulations is <= the cluster index of the real data, then we'd say that there's a 30% chance that our real data is not clustered.
[16:11:40] I see
[17:04:49] wiki-ai/wb-vandalism#84 (pywikibase - 178e0cb : amir): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/91635541
[18:24:02] wiki-ai/wb-vandalism#86 (scale - 191a080 : amir): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/91650553
[18:32:49] halfak: Hey, I got to go
[18:33:00] please check PRs in wb-vandalism
[18:33:04] bye
[18:33:27] Will do Amir1
[18:34:26] Both look good to me.
[18:34:31] I'll run some tests later today
[18:49:51] and I'm back
[18:54:11] o/
[18:54:31] * halfak is building a new revert model for wikidata right now :)
[18:55:02] we need to extract features again halfak
[18:55:14] * halfak already did that ;)
[18:55:24] from dumps?
[18:55:29] Nope. API.
[18:55:45] I just took your balanced set, chopped off the rev_id and ran the API extractor ;)
[18:56:05] amazing
[18:56:35] \o/
[18:56:36] .824 AUC
[18:56:39] \o/
[18:56:43] So, better :)
[18:56:53] it's better
[18:57:00] goot :D
[18:57:19] goot = good + woot?
[18:57:28] good in german
[18:57:39] Ahh!
[18:57:44] :D
[18:59:14] I'll get a pr with the new model in a couple minutes.
[19:00:26] I should redo the mapper for ores-contrib today
[19:07:25] Amir1, screwed something up. Rebuilding the model.
[19:10:21] halfak: Can I help?
[19:10:41] Na. Just forgot to increment the version number.
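Note on the 16:03:37 message: the comparison described there is the fraction of simulated cluster indices that are less than or equal to the cluster index of the real data. The sketch below computes only that fraction. It does not use the sigclust library, and the function name and all of the numbers are made up for illustration. Strictly speaking, the result is usually read as an empirical p-value under the "no clusters" null hypothesis rather than as the probability that the real data is unclustered.

```python
def empirical_p_value(real_index, simulated_indices):
    """Fraction of simulated cluster indices that are <= the real one."""
    simulated = list(simulated_indices)
    if not simulated:
        raise ValueError("need at least one simulated cluster index")
    at_or_below = sum(1 for value in simulated if value <= real_index)
    return at_or_below / len(simulated)


# Made-up values: 100 simulations, 30 of which fall at or below the real index.
real_cluster_index = 0.5
simulated_cluster_indices = [0.4] * 30 + [0.6] * 70

print(empirical_p_value(real_cluster_index, simulated_cluster_indices))  # 0.3
```

With 30 of 100 simulations at or below the real value, the result is 0.3, matching the 30% figure in the chat.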
[19:10:52] Easier to just rebuild the model than to modify the old one.
[19:11:56] okay. will AUC improve?
[19:12:49] Probably not. Will vary slightly in a random direction.
[19:13:01] 0.825
[19:13:11] okay
[19:13:25] I expected better performance
[19:13:47] Me too. Then again, we could have some bad training data too.
[19:13:54] Gotta get Wikilabels up and running.
[19:14:39] * YuviPanda reminds halfak to send a stern letter to the bug about backups
[19:14:50] diff building will be a challenge for wikidata
[19:15:40] Amir1, https://github.com/wiki-ai/wb-vandalism/pull/18
[19:15:54] * halfak accepts YuviPanda's reminder
[19:17:02] merged halfak
[19:18:04] We'll be able to update the model in ORES on the next deploy. :)
[19:18:08] wiki-ai/wb-vandalism#92 (new_model - ca52dab : halfak): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/91660496
[19:18:33] Oh! I should check the model against water.
[19:18:34] One sec.
[19:19:13] Wikidata search is awful!
[19:20:22] Amir1, looks like we're scoring edits to water with a bit less extreme scores.
[19:20:56] :))))
[19:21:23] We're still scoring highly, but not at the 99-100% level.
[19:22:11] Last 5 edits: 0.79, 0.94, 0.86, 0.81, 0.92
[19:22:25] that's better
[19:22:32] but we still need to work on them
[19:22:53] Compared to 0.98, 1.00, 1.00, 0.98, 0.99
[19:22:55] ^ old model
[19:23:34] okay
[19:23:46] I'll spend some more time to see what I can do
[19:28:52] Amir1, we should pull in badwords & informals.
[19:30:36] We should look at our reverted edits qualitatively too to see if they have any weirdnesses.
[19:30:50] See https://github.com/wiki-ai/wb-vandalism/blob/master/datasets/wikidata.sampled_revision.20k_balanced_2015.tsv
[19:40:28] sure halfak
[19:40:34] first in my to-do list
[19:40:55] Amir1, would it help if I upload a dataset that includes the reverted labels too?
[19:41:14] definitely
[19:41:55] https://github.com/wiki-ai/wb-vandalism/pull/19
[20:43:28] halfak two more UI + Forms translations for you
[20:43:35] estonian and italian
[20:43:42] working on german as we speak
[20:43:58] White_Cat, I noted that "bot" is missing from itwiki's list of groups
[20:44:01] Is that purposeful?
[20:44:01] vietnamese is unresponsive but I think my new post may get their attention
[20:44:03] no
[20:44:14] people typically don't list that because I do
[20:44:21] we trust sysop and bot
[20:44:28] Oh are there others?
[20:44:30] Gotcha.
[20:44:31] and hence they often don't list the two
[20:44:48] the ones on the card are all that I was given
[20:44:56] I am on low battery so I may vanish briefly
[20:44:59] tramifiying
[20:46:43] hah tram broke down
[20:46:49] busifying
[20:48:31] german is somewhat there
[20:49:32] I love the recommendation service.
[20:49:42] The recommendation API and WikiProjects are going to befriend each other.
[20:51:14] They will go to Applebee's every Friday and share appetizers. This will be the nature of their friendship.
[21:06:59] harej, \o/
[21:07:21] I really wish that I could get ellery and Leila to hang out here.
[21:07:27] And check their code into wiki-ai
[21:07:31] * halfak grumbles
[23:46:12] * aetilley lurks
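Note on the 19:22:11 and 19:22:53 messages: the two sets of scores quoted for the last 5 edits to water can be summarized by their means. The sketch below uses only the ten numbers given in the chat; the variable names are illustrative. It assumes nothing about whether the two lists are in the same edit order, since a mean does not depend on ordering.

```python
from statistics import mean

# Scores quoted in the chat for the last 5 edits to water (Q283).
new_model_scores = [0.79, 0.94, 0.86, 0.81, 0.92]
old_model_scores = [0.98, 1.00, 1.00, 0.98, 0.99]

new_mean = mean(new_model_scores)
old_mean = mean(old_model_scores)

print(f"old model mean: {old_mean:.3f}")             # 0.990
print(f"new model mean: {new_mean:.3f}")             # 0.864
print(f"mean drop:      {old_mean - new_mean:.3f}")  # 0.126
```

Five edits is a very small sample, so this only restates the observation in the chat that the new model's scores for these edits are less extreme.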