[00:26:06] Ironholds: still around?
[08:39:02] is everything ok with revscoring? https://en.wikipedia.org/wiki/Special:RecentChanges looks very red currently when viewed with the ScoredRevisions.js gadget...
[10:12:59] Identifying the subset of revisions of a page which was imported from another wiki is not trivial.
[14:14:26] ToAruShiroiNeko_, halfak : ^
[14:14:57] ...it seems that revscoring has suddenly developed a very negative opinion about english wp, coloring almost everything red ;)
[14:21:13] HaeB: perhaps the algo read Fram's rant.
[14:35:29] Nemo_bis: that must be it ;) (i read it too, it actually inspired me to check some upcoming dyk hooks and correct a bogus one - but as a human i have better input weighting overall ;)
[14:54:24] hmm?
[14:54:30] hello
[14:55:01] yeah it looks unusual
[14:55:18] we are aware of it, halfak was working on the model IIRC
[14:57:50] HaeB it probably will be fixed later today
[14:57:56] we have a hack session
[14:59:28] cool thanks
[15:41:30] Hey folks. Sorry for the trouble. I'm looking into it.
[15:42:25] * halfak curses a bit
[15:43:46] * halfak waits for scipy installs on ores-compute
[15:49:05] * halfak whistles at all of the red in ScoredRevisions
[15:49:50] I think that this will help us not make this mistake in the future: https://github.com/wiki-ai/revscoring/issues/180
[18:48:07] HaeB, FYI, looks like we're back
[18:49:54] See https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service#ORES_revert_models_super_negative
[18:50:02] for details on what happened.
[21:15:58] YuviPanda, around now
[21:17:35] Ironholds: had / have questions about tsv things
[21:17:36] Specifically
[21:17:36] I want a schema
[21:17:36] For tsv
[21:17:36] With each column having a data type, name, and whether it can be null or not
[21:17:45] oh, cool!
[21:18:06] and you want to define it on a per-file basis or you want a way of deterministically working it out when reading things in?
[21:18:10] Ironholds: and am wondering if such a thing already exists
[21:18:16] Both
[21:18:30] the former I'm not aware of, the latter absolutely; fread and readr are great examples
[21:18:46] Latter sounds nicer since it is automatic
[21:18:47] My ultimate goal is to ease importing tsv files into mysql for querying via quarry
[21:18:49] basically the approach they take is to let you specify a schema in advance. If you don't it makes a best-guess attempt.
[21:18:51] And that requires a mysql schema
[21:19:03] so what this looks like is reading in the first 30 rows and attempting to cast the values in each column
[21:19:12] from highest complexity to lowest
[21:19:35] so first you see if it can be cast as an int, then a numeric, then a date, then a character, and whichever one first works, that's the field type.
[21:20:04] Yeah so I'm building a small script that does that but does that through the entire dataset
[21:20:14] Widening from int to float to date to string
[21:20:28] Ironholds: how would you represent nulls?
[21:20:50] halfak's dataset I'm playing with has the word NULL
[21:22:04] YuviPanda, what's the storage system?
[21:22:14] and recommend not doing the entire dataset. Optimisation, baby!
[21:22:14] Ironholds: tsv?
[21:22:35] YuviPanda, I meant what you're turning it into!
[21:22:42] Ironholds: ah myswl
[21:22:46] Mysql
[21:23:02] which has a NULL type? ;p
[21:23:20] Yes but the tsv
[21:23:23] Also has nulls
[21:23:33] I am figuring out how it should be represented in the tsv
[21:28:41] I'm confused
[21:28:49] are you writing from MySQL to TSV or TSV to MySQL?
[21:29:00] I normally see TSV NULLs represented with Null or NA
[21:34:13] YuviPanda, that's a good question. In MySQL TSV style, the word "NULL" means NULL.
[21:34:20] I'm not sure how you would represent the word "NULL"
[21:34:45] halfak: yeah
[21:35:20] Ha
[21:35:20] https://gist.github.com/halfak/ff6866e91f13ee4f196f
[21:35:30] It's just 100% unhandled.
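(Editor's note: the schema-guessing approach discussed above, trying int, then float, then date, then string per column and widening as more rows are read, might look roughly like the sketch below. It is not the script mentioned in the chat. The names `classify`, `widen`, `infer_schema`, `create_table_sql`, the `YYYY-MM-DD` date format, the MySQL type mapping, and the rule that any mix other than int/float falls back to string are all assumptions made for illustration. The literal word NULL is treated as a missing value, so a genuine string "NULL" cannot be told apart from it, which is the ambiguity raised in the chat.)

```python
import csv
import io
from datetime import datetime

NULL_TOKEN = "NULL"  # assumption: the literal word NULL marks a missing value

# Assumed mapping from inferred type names to MySQL column types.
MYSQL_TYPES = {"int": "BIGINT", "float": "DOUBLE", "date": "DATE", "string": "TEXT"}


def classify(value):
    """Return the first type, tried in the order int, float, date, string,
    that can parse a single non-null value."""
    # Note: Python's int()/float() accept forms such as "1_000" or "nan"
    # that a database may reject; a stricter parser may be preferable.
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")  # assumed date format
        return "date"
    except ValueError:
        return "string"


def widen(current, new):
    """Combine the type seen so far with the type of a new value."""
    if current is None or current == new:
        return new
    if {current, new} == {"int", "float"}:
        return "float"
    return "string"  # any other mix is only safely stored as text


def infer_schema(tsv_file, max_rows=None):
    """Return a list of (name, type, nullable) tuples, one per column.

    max_rows=None scans the whole file; max_rows=30 would sample only the
    first 30 data rows, trading accuracy for speed.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    names = next(reader)  # first line is assumed to be a header
    types = [None] * len(names)
    nullable = [False] * len(names)
    for row_number, row in enumerate(reader):
        if max_rows is not None and row_number >= max_rows:
            break
        for i, value in enumerate(row[:len(names)]):
            if value == NULL_TOKEN:
                nullable[i] = True
            else:
                types[i] = widen(types[i], classify(value))
    # A column with no non-null values defaults to string.
    return [(name, types[i] or "string", nullable[i]) for i, name in enumerate(names)]


def create_table_sql(table, schema):
    """Render the inferred schema as a MySQL CREATE TABLE statement."""
    columns = ",\n".join(
        "  `{}` {} {}".format(name, MYSQL_TYPES[kind], "NULL" if null_ok else "NOT NULL")
        for name, kind, null_ok in schema
    )
    return "CREATE TABLE `{}` (\n{}\n);".format(table, columns)


sample = (
    "rev_id\tscore\tday\tcomment\n"
    "1\t0.5\t2015-08-01\tfix\n"
    "2\t1\tNULL\tNULL\n"
)
schema = infer_schema(io.StringIO(sample))
print(create_table_sql("revisions", schema))
```

(Scanning the whole file never guesses a type that is too narrow. Passing `max_rows=30` follows the sampling suggestion instead, but risks a too-narrow type if later rows differ from the sample.)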
[21:40:39] heh
[21:45:47] halfak: I need to convert quarry to python3 as well
[21:46:03] halfak: so how do you feel about git submodules?
[21:46:34] halfak: I'm thinking of making revscoring a submodule of ores
[21:46:34] What problem would you like to solve with them?
[21:46:43] Yeah. That I think makes sense.
[21:46:47] I'd like to tightly couple ores to a particular version of revscoring
[21:46:54] cool
[21:47:11] let me prepare commits
[21:50:54] halfak: so this 'bundles' revscoring with ores
[21:51:10] halfak: I think that'll allow us to get rid of the revscoring line from requirements.txt
[21:51:20] halfak: shouldn't affect people using it from pip afaict
[21:51:23] halfak|Mobile: ^
[21:51:40] Cool.
[21:51:48] Yeah. Will be |Mobile for the next couple of hours
[21:52:00] halfak: ok. you should try out IRCCloud etc
[21:52:55] halfak: I'm also thinking of making ores a submodule of ores-wikimedia-config