[14:10:58] Okay, I want to work on stuff
[14:11:07] let's see what we do have here
[14:11:21] 1- revscoring regex opt for other langs
[14:12:04] 2- ores user agent analysis
[14:12:37] 3- work on kian a little bit
[14:12:45] 4- read some papers
[14:14:34] oh and 5- wikilabels performance issues
[14:14:51] (maybe building a knockover test)
[14:28:56] o/ Amir1
[14:29:01] Just got done with first meeting
[14:29:07] hey halfak
[14:29:08] :)
[14:29:18] right now I'm working on other langs
[14:29:20] Amir1, I want to look into mwparserfromhell performance this morning.
[14:29:22] kk
[14:29:33] great
[14:38:26] let's see if tests pass
[14:38:27] https://github.com/wiki-ai/revscoring/pull/270
[14:41:10] wiki-ai/revscoring#699 (regex_opt2 - c54d5dd : Amir Sarabadani): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/127534081
[14:44:02] Amir1, what is "?:"?
[14:44:16] non capturing group
[14:44:29] there is no difference between (foo) and (?:foo)
[14:44:42] except it doesn't capture it as a group
[14:44:43] Gotcha. But does it give a performance boost?
[14:44:46] Oh!
[14:44:48] (and it's faster)
[14:44:50] That works.
[14:44:56] :)
[14:45:03] but not very much
[14:45:12] that's why I didn't try it on en
[14:45:23] I ran some and it was on the order of 0.5%
[14:45:35] Looks like the build failed
[14:45:50] In Spanish
[14:45:54] "mecagoenlaleche"
[14:47:49] because we removed "\w" from the start of the regexes
[14:47:55] and it can't catch them
[14:48:02] let me investigate deeper
[14:49:25] yeah
[14:50:52] Yeah... it's a weird one. I'd be OK with removing the test case or catching it explicitly.
[14:50:59] In the end, I don't think it'll matter that much.
[14:51:34] no, we just need to delete the "me" at the start of the word
[14:51:55] halfak: I didn't touch the Persian
[14:51:58] it is a huge mess
[14:52:11] complicated regex, too complicated
[14:52:24] but since there aren't many requests from fa.wp
[14:52:30] I hope you would be okay with that
[14:53:47] Amir1, no worries there.
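The exchange above about `(?:foo)` versus `(foo)`, and about the Spanish test word failing once the leading `\w` was dropped, can be illustrated with a minimal sketch. The word list and the `cagoen` stem below are hypothetical stand-ins, not the actual revscoring patterns, and the timing loop is only a rough way to compare the two forms on your own machine; it makes no claim about the size of any speed-up.

```python
import re
import timeit

# Hypothetical word list for illustration; not the revscoring badwords lists.
words = ["foo", "bar", "baz"]

capturing = re.compile(r"\b(" + "|".join(words) + r")\b")
non_capturing = re.compile(r"\b(?:" + "|".join(words) + r")\b")

text = "foo spam bar eggs baz " * 1000

# Both forms match exactly the same spans of text.
assert [m.group(0) for m in capturing.finditer(text)] == [
    m.group(0) for m in non_capturing.finditer(text)
]

# Only the capturing form records a group.
assert capturing.groups == 1
assert non_capturing.groups == 0

# Rough timing comparison; results vary by machine and Python version.
for name, pattern in (("capturing", capturing), ("non-capturing", non_capturing)):
    seconds = timeit.timeit(lambda: pattern.findall(text), number=50)
    print(f"{name}: {seconds:.3f}s")

# Hypothetical stem showing why dropping a leading \w* changes what matches.
with_prefix = re.compile(r"\b\w*cagoen\w*\b")
without_prefix = re.compile(r"\bcagoen\w*\b")

assert with_prefix.search("mecagoenlaleche") is not None
# Without the leading \w*, the "me" prefix blocks the word boundary.
assert without_prefix.search("mecagoenlaleche") is None
# Removing "me" from the test word lets the shorter pattern match again.
assert without_prefix.search("cagoenlaleche") is not None
```

The last three assertions mirror the two fixes floated in the chat: either keep a prefix wildcard in the pattern, or adjust the test word so it starts at the stem.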
[14:54:52] the writer of those regexes is someone I know, he is famous for doing things that are way out of performance issues
[14:55:11] lol You could take a simplifying pass at some point.
[14:55:16] Maybe we should file a task?
[14:55:24] like he has a script that shows misspelled words, it takes about 1 min to process a medium article
[14:55:37] sure
[14:55:39] why not
[14:55:55] we should have a task for these regex performance boosts too
[14:57:48] still failing
[14:57:53] some other word
[14:58:25] +1 I'll make both tasks.
[15:00:15] Revision-Scoring-As-A-Service, Revision-Scoring-As-A-Service-Backlog, revscoring: Apply regex performance optimizations to badwords/informals detection - https://phabricator.wikimedia.org/T134267#2260256 (Halfak)
[15:00:24] Revision-Scoring-As-A-Service, revscoring: Apply regex performance optimizations to badwords/informals detection - https://phabricator.wikimedia.org/T134267#2260270 (Halfak)
[15:02:02] Revision-Scoring-As-A-Service-Backlog, revscoring: Simplify and optimize persian regular expressions - https://phabricator.wikimedia.org/T134268#2260274 (Halfak)
[15:02:26] Revision-Scoring-As-A-Service, revscoring: Apply regex performance optimizations to badwords/informals detection - https://phabricator.wikimedia.org/T134267#2260288 (Halfak)
[15:02:30] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for wikidatawiki - https://phabricator.wikimedia.org/T130301#2260289 (Halfak)
[15:02:39] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for wikidatawiki - https://phabricator.wikimedia.org/T130301#2131957 (Halfak) a:Ladsgroup
[15:03:51] SPAM COMING
[15:04:09] Revision-Scoring-As-A-Service, ORES, rsaas-editquality: Deploy updates to ORES - https://phabricator.wikimedia.org/T134174#2260293 (Halfak) Open>Resolved
[15:04:11] Revision-Scoring-As-A-Service, Wikidata, rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2260294 (Halfak) Open>Resolved
[15:04:13] Revision-Scoring-As-A-Service, wikilabels: Deploy updates for Wikilabels - https://phabricator.wikimedia.org/T134032#2260295 (Halfak) Open>Resolved
[15:04:15] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test 'damaging' and 'goodfaith' model for nlwiki - https://phabricator.wikimedia.org/T133563#2260297 (Halfak) Open>Resolved
[15:04:17] Revision-Scoring-As-A-Service, rsaas-editquality: ScoredRevisions flags everything on Wikidata - https://phabricator.wikimedia.org/T133903#2260296 (Halfak) Open>Resolved
[15:04:19] Revision-Scoring-As-A-Service, ORES: Deploy updates for ORES - https://phabricator.wikimedia.org/T133558#2260299 (Halfak) Open>Resolved
[15:04:21] Revision-Scoring-As-A-Service, wikilabels: i18n for API errors in wikilabels - https://phabricator.wikimedia.org/T133561#2260298 (Halfak) Open>Resolved
[15:04:23] Revision-Scoring-As-A-Service, wikilabels: Review staging protocol for Wikilabels - https://phabricator.wikimedia.org/T133557#2260300 (Halfak) Open>Resolved
[15:04:25] Revision-Scoring-As-A-Service, rsaas-editquality, wikilabels: Complete wikidatawiki edit quality campaign - https://phabricator.wikimedia.org/T130274#2260302 (Halfak) Open>Resolved
[15:04:27] Revision-Scoring-As-A-Service, rsaas-editquality, Epic: [Epic] Edit quality models (damaging/goodfaith) - https://phabricator.wikimedia.org/T130213#2260303 (Halfak)
[15:04:29] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for wikidatawiki - https://phabricator.wikimedia.org/T130301#2131957 (Halfak)
[15:04:32] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for wikidatawiki - https://phabricator.wikimedia.org/T130301#2260301 (Halfak) Open>Resolved
[15:04:32] :))))
[15:04:34] Revision-Scoring-As-A-Service, wikilabels: WikiLabels doesn't handle well revdeleted edits - https://phabricator.wikimedia.org/T130234#2260305 (Halfak) Open>Resolved
[15:04:35] Revision-Scoring-As-A-Service: Report of work since last report - https://phabricator.wikimedia.org/T128958#2260306 (Halfak) Open>Resolved
[15:04:37] Revision-Scoring-As-A-Service, ORES: Set up graphite dashboard for ores - https://phabricator.wikimedia.org/T127594#2260307 (Halfak) Open>Resolved
[15:04:42] Revision-Scoring-As-A-Service, wikilabels: Develop way to load campaigns into the wikilabels - https://phabricator.wikimedia.org/T102336#2260308 (Halfak) Open>Resolved
[15:04:46] The only type of spam I like
[15:10:10] Revision-Scoring-As-A-Service, revscoring: Apply regex performance optimizations to badwords/informals detection - https://phabricator.wikimedia.org/T134267#2260317 (Halfak) See * https://github.com/wiki-ai/revscoring/pull/269 * https://github.com/wiki-ai/revscoring/pull/270
[15:27:58] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for wikidatawiki - https://phabricator.wikimedia.org/T130301#2260340 (Halfak) Open>Resolved
[15:28:00] Revision-Scoring-As-A-Service, rsaas-editquality, Epic: [Epic] Edit quality models (damaging/goodfaith) - https://phabricator.wikimedia.org/T130213#2260341 (Halfak)
[15:28:45] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for nlwiki - https://phabricator.wikimedia.org/T130290#2260347 (Halfak) a:Ladsgroup
[15:29:36] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for ruwiki - https://phabricator.wikimedia.org/T130293#2260349 (Halfak)
[15:30:04] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for ruwiki - https://phabricator.wikimedia.org/T130293#2131822 (Halfak)
[15:30:08] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for nlwiki - https://phabricator.wikimedia.org/T130290#2131757 (Halfak)
[15:30:20] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for ruwiki - https://phabricator.wikimedia.org/T130293#2131822 (Halfak) a:Ladsgroup
[15:30:47] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Deploy edit quality models for ukwiki - https://phabricator.wikimedia.org/T130294#2260371 (Halfak)
[15:30:49] Revision-Scoring-As-A-Service, rsaas-editquality, Epic: [Epic] Edit quality models (damaging/goodfaith) - https://phabricator.wikimedia.org/T130213#2260372 (Halfak)
[15:30:51] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for ruwiki - https://phabricator.wikimedia.org/T130293#2131822 (Halfak) Open>Resolved
[15:31:02] Revision-Scoring-As-A-Service, rsaas-editquality, Epic: [Epic] Edit quality models (damaging/goodfaith) - https://phabricator.wikimedia.org/T130213#2129835 (Halfak)
[15:31:04] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for nlwiki - https://phabricator.wikimedia.org/T130290#2260373 (Halfak) Open>Resolved
[16:16:04] installing dictionaries so I can test revscoring locally
[16:16:48] brb dog time
[16:27:21] I think this one will fix things
[16:27:27] I've got to go
[16:27:31] be back soon
[16:30:35] wiki-ai/revscoring#703 (regex_opt2 - 3ef9bf3 : Amir Sarabadani): The build was fixed. https://travis-ci.org/wiki-ai/revscoring/builds/127564423
[16:55:23] Revision-Scoring-As-A-Service, revscoring: Tamil language utilities - https://phabricator.wikimedia.org/T134105#2260801 (Shanmugamp7) Sure, i will do it by this weekend
[17:15:57] halfak: https://github.com/wiki-ai/revscoring/pull/270/files
[17:16:23] back!
[17:16:28] Got lunch too
[17:19:08] :)
[17:19:10] awesome
[17:19:12] JohanJ just brought up doing something for the Wikimania hackathon
[17:19:23] cool
[17:19:23] I asked him to join us here, but I thought we could brainstorm.
[17:19:34] By then, we'll have the ORES extension deployed to at least a couple of wikis.
[17:19:40] yeah
[17:20:01] wikilabels for more wikis
[17:20:34] +1
[17:20:45] It would be cool if we could have a booth or something like that.
[17:21:07] We could demo ORES and then help people sit down to do translations and vet BWDS lists.
[17:21:12] since that's only me
[17:21:15] that would be hard
[17:21:21] * halfak imagines a booth with two chairs next to it.
[17:21:36] I can sit in WMDE and ask people
[17:21:38] Yeah, but I can probably get you support. JohanJ is WMF CL and he might be able to help us.
[17:21:41] +1 for WMDE :)
[17:21:55] yeah,
[17:22:40] first we need to determine how we can support more languages
[17:22:50] 1- new bwds lists
[17:22:58] 2- new wikilabels
[17:23:23] 3- label edits until we get damaging
[17:23:57] We might want to start pre-emptively building bwds lists for the big wikis.
[17:24:21] we did that last year I guess
[17:24:36] but we used all of them
[17:24:43] we can do another round
[17:26:03] let's determine what languages it would be good to have
[17:26:38] We could just start at the top of wikipedia.org
[17:26:58] https://www.wikipedia.org/
[17:27:39] soooo
[17:27:42] war, ceb
[17:27:46] :D
[17:27:49] Maybe we skip the botpedias
[17:27:51] Yeah
[17:29:46] halfak: thanks for merging
[17:30:03] :) Thanks for your work on it :D
[18:33:05] Biking to the University. Back in ~40 mins
[18:34:03] halfak: I've got the list ready
[18:34:20] Of next BWDS runs?
[18:34:34] yup
[18:34:45] shared the document with you
[18:34:49] Great!
[18:34:56] I'm AFK for a little while now.
[18:34:59] Will check it later.
[18:35:03] Awesome
[18:35:11] I'll do some other stuff in the meantime
[19:34:09] DarTar: hey, I've got some analysis for you
[19:34:10] https://gist.github.com/Ladsgroup/0a496085be965395b07610341ec31007
[19:34:21] 297 distinct user agents
[19:34:33] during 6 days, and only in one of our three nodes
[19:48:39] O/
[19:48:51] I forgot my laptop charger 😣
[19:49:00] So I'm on my phone
[19:52:02] halfak|Mobile: o/
[20:07:38] halfak|Mobile: tell me when you're around
[20:07:40] https://gist.github.com/Ladsgroup/0a496085be965395b07610341ec31007
[21:11:47] Hey Amir1
[21:11:54] What is this gist?
[21:12:15] it's an analysis of user agents
[21:12:34] in the last six days we had about 2M requests
[21:12:41] 1.5M precaching
[21:13:08] (that's only one of our three nodes)
[21:13:13] web-03
[21:13:38] we had 297 distinct user agents
[21:13:42] halfak|Mobile: ^
[21:13:42] Woah! That's a lot
[21:14:07] So ~500k requests by 300 users
[21:14:14] Assuming no overlap
[21:14:25] yeah, DarTar asked about it in the last meeting
[21:14:35] and that's only the log in web-03
[21:17:20] oh halfak|Mobile I think I made a mistake
[21:17:41] 1.5 is actually us doing analysis (?)
[21:17:52] I'm nost sure
[21:17:54] *not
[21:18:24] no, I'm dumb
[21:18:27] I haven't been doing analysis
[21:18:27] it's precaching
[21:18:31] Using ORES
[21:19:10] precaching doesn't use a very good user agent
[21:19:14] but anyway
[21:19:23] it's 1.5M precaching
[21:21:18] That's a pretty good trade-off I think
[21:21:22] 3:1
[21:21:38] \o/
[21:21:48] Especially since we cache all the models
[21:21:57] Even reverted when damaging is available
[21:22:59] when someone requests ores it's being logged too
[21:23:11] whether it's cached or not
[21:23:48] (I'm not sure about varnish cache though but I highly doubt we do have a varnish/squid cache)
[22:22:02] o/
[23:45:06] o/
[23:45:10] no one is here
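The user agent tally discussed above (distinct agents, precaching versus other traffic, and the 3:1 ratio) can be sketched roughly as follows. The record shape, the agent names, and the `summarize` helper are all hypothetical, since the chat does not show the web-03 log format or the script behind the gist. Only the round figures of about 2M total requests and 1.5M precaching requests come from the conversation.

```python
from collections import Counter

# Hypothetical records: (user_agent, is_precache). The real web-03 log
# format is not shown in the chat, so this shape is an assumption.
sample_records = [
    ("precache-client", True),
    ("precache-client", True),
    ("precache-client", True),
    ("tool-a/1.0", False),
    ("tool-b/2.3", False),
    ("tool-a/1.0", False),
]


def summarize(records):
    """Count distinct user agents and split precache from other requests."""
    agents = Counter(agent for agent, _ in records)
    precache = sum(1 for _, is_precache in records if is_precache)
    external = len(records) - precache
    return {
        "distinct_agents": len(agents),
        "precache": precache,
        "external": external,
    }


summary = summarize(sample_records)
print(summary)

# Round figures quoted in the chat for six days on one node.
total_requests = 2_000_000
precache_requests = 1_500_000
external_requests = total_requests - precache_requests
print(external_requests)                      # 500000
print(precache_requests / external_requests)  # 3.0
```

With those round figures, the remaining ~500k requests are what the chat attributes to the roughly 300 distinct agents, which gives the 3:1 precaching-to-other ratio mentioned.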