[14:34:45] o/
[14:34:49] Hey folks
[14:34:53] o/ Amir1
[14:35:05] hey halfak
[14:35:14] I have gooood news for you
[14:35:17] In my morning, I hope to get a few more language assets pulled into revscoring and start building models for those wikis
[14:35:22] Oooh :)
[14:35:26] and no, it's not prod
[14:35:28] :(
[14:35:36] https://phabricator.wikimedia.org/T131666
[14:35:51] first is that, I trained reverted for Russian
[14:35:58] for the first time it was GB \o/
[14:35:59] :D
[14:36:29] lol
[14:36:30] great!
[14:36:36] xgboost was running really slow?
[14:36:54] It's totally reasonable that you disabled it since xgboost tends to get similar results to GradientBoosting.
[14:36:55] yup
[14:37:01] But, it *should* be faster
[14:37:07] most of them timed out
[14:37:09] I think we need to explicitly tell it to parallelize.
[14:37:25] Yeah... that's broken. Either way, this result is great :)
[14:37:52] anddd
[14:37:53] https://phabricator.wikimedia.org/T102347
[14:38:01] https://phabricator.wikimedia.org/T130773
[14:38:39] I wasn't sure we should define a logger for the db object in wikilabels
[14:38:58] but it's a common practice in mediawiki so I thought we could give it a try
[14:39:52] I need to go very soon
[14:40:05] but I can work on the Japanese once I get the chance
[14:40:22] = in one hour
[14:40:22] Error: Command “in” not recognized. Please review and correct what you’ve written.
[14:41:11] bad robot
[14:41:21] = shuddup
[14:41:21] Error: Command “shuddup” not recognized. Please review and correct what you’ve written.
[14:41:24] :D
[14:41:45] Cool! Great to see progress in these areas.
[14:42:00] I'll focus on languages today, but I'll aim to review your progress on these tomorrow.
[14:43:00] reviewing the wikilabels?
[14:43:11] Or Russian?
[14:43:46] https://github.com/wiki-ai/editquality/pull/23
[14:44:00] after that I want to work on this: https://phabricator.wikimedia.org/T105521
[14:44:04] Reviewing wikilabels
[14:44:08] not very sure though
[14:44:15] awesome
[14:44:27] Abandoning tasks in wikilabels should be relatively easy.
[14:44:36] yeah
[14:44:41] We actually have an API endpoint for abandoning a workset. :)
[14:44:46] There's just nothing in the UI for that yet
[14:44:59] awesome
[14:45:36] btw. the wikilabels patch, especially the clientside one, is very simple
[14:45:59] I think we may want to discuss alternative logging strategies a bit.
[14:46:12] kk
[14:46:38] I learned a lot about wikilabels
[14:46:53] (how it works, the system, etc.)
[14:46:56] I need to go
[14:47:00] be back very soon
[14:47:01] o/
[14:48:29] Will be around for a few hours
[14:48:31] o/
[15:07:07] aspell-hi works on Jessie 8.3
[15:07:23] So I think that this is just something broken with the package for Trusty and Precise.
[15:29:43] Got a new deb and it works.
[15:30:07] However, we may struggle to have the aspell package work in travis
[15:47:45] hmm
[15:48:08] halfak: is there any eta for releasing a new version for ubuntu?
[15:48:16] Nope
[15:48:21] They may never do that
[15:48:28] Not sure
[15:54:57] halfak: without installing ores, services work
[15:55:12] if they don't I need to add them in scap config
[15:55:16] but they do
[15:55:27] (I tested several times)
[15:55:43] Amir1, I don't understand
[15:56:10] https://github.com/wiki-ai/ores-wikimedia-config/commit/3e23e6e4ffe1a03e1e2819053e0a4ec599d82c01#commitcomment-17036600
[15:56:16] I'm talking about this
[15:56:45] Yeah. Services should work without installing ores.
[15:57:10] that's the whole plan
[15:57:14] right?
[15:57:28] So... hmmm... We could also run the precached utility from the config/submodules/ores/ directory as "./utility precached".
[15:57:38] Well, we do need precaching
[15:58:27] that's what I suggested to him
[15:58:48] do what uwsgi and celery do, run from the working directory
[15:59:44] (we can run it from /srv/ores/config easily)
[15:59:57] no need to run it from /config/submodules/ores/
[16:00:07] (I think it's a symlink)
[16:01:44] Amir1, "./utility" is in submodules/ores
[16:01:54] hmm
[16:03:41] Oh! I think I've got it.
[16:03:54] python submodules/ores/utility -h works
[16:04:08] so we can do "python submodules/ores/utility precached ..."
[16:04:38] yeah
[16:05:03] lots of ways to do it without needing to install
[16:51:41] Amir1, can you give me an example of a signed Arabic word
[16:51:43] ?
[16:53:23] and Persian
[16:55:22] halfak: sure
[16:55:35] "َِ
[16:55:43] it's really hard to see
[16:55:51] ُ
[16:56:01] ^ is that a whole word?
[16:56:13] no, it's a vowel
[16:56:26] sounds like "o"
[16:56:35] I need a whole word -- preferably with a vowel in the middle-ish for testing
[16:56:42] we don't write them mostly
[16:56:44] sure
[16:56:51] When my test fails, it'll split the word
[16:57:01] When it succeeds it will keep it whole
[16:57:01] مُنیر
[16:57:07] Awesome
[16:58:50] Cool! It breaks the test. Now to fix :)
[16:59:55] awesome
[17:01:30] amazingly even the re library doesn't work with them very well
[17:01:50] so we have the regex library that you need to install
[17:05:37] Arg this is very frustrating
[17:08:30] Amir1, this is a dumb question maybe. Is Persian written with Arabic symbols?
[17:08:41] * halfak is looking through unicode ranges now
[17:08:48] actually that's pretty smart
[17:09:09] Persian uses Arabic symbols
[17:09:19] kk :)
[17:09:39] but it extends them.
for example, Arabic doesn't have a letter for "p"; they change it to "b"
[17:09:45] but Persian does
[17:10:07] so Persian has a letter that Arabic doesn't
[17:10:32] and there are several letters that Arabic has for which Persian uses another form (another unicode character)
[17:10:45] fa: ی, ar: ي
[17:10:50] note the dots
[17:13:01] Gotcha.
[17:17:19] OK. I can successfully differentiate the word you gave me.
[17:17:30] Now to try some sentences from arwiki and fawiki :)
[17:21:04] try he
[17:21:15] he will be next
[17:21:16] I know Hebrew has them too
[17:25:03] halfak: extracting features for Japanese returns an error since we haven't defined any dictionaries for ja in revscoring
[17:25:15] is it intentional?
[17:25:42] Amir1, yeah. Can't use those features for ja
[17:25:52] E.g. Turkish doesn't have a dictionary (I think)
[17:26:06] ok
[17:26:15] thanks
[17:28:35] ok, started the feature extractor
[17:28:49] It might get done very soon
[17:33:02] afk for a while
[17:53:17] Amir1, when you get back, it would be great to have you check my work re. Persian & Arabic here https://github.com/halfak/deltas/pull/9
[17:57:56] I'm somewhat confused about the editquality repo and revscoring: what part does editquality play in revscoring?
[18:55:32] halfak: looks good :)
[19:05:51] Thanks Amir1 :)
[19:05:56] BTW, I just finished https://github.com/wiki-ai/revscoring/pull/261
[19:06:26] It took a lot. Hindi is *very poorly* supported by Python's re library
[19:06:39] But I figured out a good generalization for how we look for word boundaries.
[19:07:07] I've gotta run. Will pick things up again tomorrow.
[19:07:26] But tomorrow I'm going to focus on getting a blog article written about the analysis of anon bias in our old models.
[19:07:51] FYI: https://rpubs.com/avner/ores
[19:08:21] o/
[19:08:30] codezee, !
[19:08:34] Sorry I missed your question.
[19:08:39] revscoring is a core framework.
[19:08:47] editquality is an implementation of that framework
[19:08:52] revscoring is like Flask
[19:09:02] and editquality would be like a Flask app
[19:09:07] OK now I go.
[19:09:08] o/
[19:09:42] halfak: thanks for that info, I'll follow up tomorrow or when you have time :)
[19:25:57] * Amir1 is looking
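
The word-splitting problem discussed between 16:56 and 19:06 can be reproduced with a minimal sketch. With Python's built-in `re`, `\w` does not match combining vowel signs (Unicode category Mn), so a vocalized word such as مُنیر is cut in two at the damma. The helper names below (`naive_words`, `is_word_char`, `mark_aware_words`) are hypothetical and written only for illustration; this is not the approach taken in the deltas or revscoring pull requests linked above, whose actual fix is not shown in this log. It simply treats combining marks as part of a word.

```python
import re
import unicodedata

# The test word from the log, spelled out by code point:
# meem, damma (combining vowel sign), noon, Farsi yeh, reh.
WORD = "\u0645\u064F\u0646\u06CC\u0631"


def naive_words(text):
    """Tokenize with the built-in re module; \\w skips combining marks."""
    return re.findall(r"\w+", text)


def is_word_char(ch):
    """Hypothetical helper: alphanumerics, underscore, and combining marks.

    Mn = nonspacing mark (e.g. Arabic damma), Mc = spacing combining mark
    (e.g. many Devanagari vowel signs).
    """
    return ch.isalnum() or ch == "_" or unicodedata.category(ch) in ("Mn", "Mc")


def mark_aware_words(text):
    """Group consecutive word characters, keeping combining marks attached."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(naive_words(WORD))       # the word is split at the vowel sign
    print(mark_aware_words(WORD))  # the word is kept whole
```

The third-party `regex` package mentioned at 17:01:50 is another route; the sketch above sticks to the standard library so it runs without extra installs.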