[20:36:27] leila, we meeting now?
[20:36:42] riiiight. got distracted, will be there in 2 min halfak
[20:36:46] kk
[23:31:39] halfak, awight .. fyi ... scott has been exploring a fst framework for language variants .. in case that might also be useful for your ores usecase .. http://www.aclweb.org/anthology/E/E09/E09-2008.pdf claims perf over say flex .. not that you are using flex, but i remember seeing tons of regexps in your code.
[23:33:18] it is primarily targeted at converters / transformations .. but, anyway, fyi in case it is useful to you .. https://github.com/mhulden/foma/blob/master/foma/docs/simpleintro.md
[23:37:04] subbu: That looks fun, and almost certainly useful. We’re just now looking at issues with PCFG memory usage ballooning during parsing.
[23:37:31] look at the table on page 4 of the pdf.
[23:39:41] https://gerrit.wikimedia.org/r/#/c/423197/ is the parsoid patch for using that formalism for language variants .. it is still work in progress, but quite far along at this point.
[23:45:56] O_O it’s so complex…. just the code generation in https://gerrit.wikimedia.org/r/#/c/423197/5/tools/build-langconv-fst.js is blowing my mind
[23:46:48] awight, see my note to reviewers that clarifies why that exists.
[23:47:11] strictly speaking, that part isn't required and we could use the C binary that comes with the foma package .. but, that means introducing a binary dependency in our deploy.
[23:47:48] so, he built a runner for the FST that takes as input the compressed json format that he generates from foma's att format.
[23:48:24] but, neither is required. you can just write a ".foma" file, and generate a ".att" file via the foma package and then run it with foma's runner (in C).
[23:48:43] so, don't get distracted by that complexity.
[23:48:49] :)
[23:49:28] alright .. signing off now. ttyl.
[23:57:09] Thanks for the reading, formalisms and declarative data are always fun.
We recently switched from directly writing Makefile rules, to generating them from configuration. Never looking back. (once our edge cases are resolved :p)