[00:02:01] aetilley I suggest you get better soon. :) [00:02:09] being sick is unpleasant :p [00:02:26] This was your motivational input of the day! [00:03:22] halfak would it not be easier to generate models with a timestamp by default [00:03:26] including wiki name etc [00:04:02] ToAruShiroiNeko, we do associate timestamp with the model [00:04:12] But the point is to make it easy, right? [00:05:14] yes [00:05:21] but I dont think we have a structured model name [00:05:25] its whatever the user types [00:05:39] it may be made to display the timestamp and wiki to distinguish [00:06:05] also the keyword (revert, good faith, damaging) [00:06:22] How would the model builder know what it is predicting? [00:06:34] it knows it from the input [00:06:44] So, the output filename is part of the input. [00:06:45] a keyword could be passed as a parameter [00:07:13] input typically mentions the wiki or its api right? [00:07:13] But regardless, I'm not sure what problem this would solve. [00:07:26] each file would be unique and structured [00:07:47] based on that structure it would be put on the relevant directory where the code is run or etc [00:08:03] Yes. We already do this. [00:08:29] I thought we did not have a directory structure atm [00:08:48] Well, we don't have a public one, but we do have our repositories. [00:08:57] hmmm [00:09:01] The repositories have directories for models. [00:09:07] yes [00:09:41] I was imagining that we'd copy the model and data files out to a file server somewhere so that someone doesn't need to revert the repos back to a past state in order to work with an old model. [00:09:44] Or a current one. [00:09:55] Because our deployments are always way behind dev. 
[00:10:01] yes [00:10:49] I would put them in an oldmodel directory for emphasis [00:10:59] "you are using an outdated model" [00:11:28] most recent one would not be with date or anything so that bots and etc would use the most recent one, we do this already [00:12:12] ores/model/foo would be most recent [00:12:31] ores/old/model/foo would be older [00:12:44] "old" you wouldn't put a date on it? [00:12:56] Maybe associate the datasets that we used to train it. [00:13:34] I dont know, I think having the word old there to emphasise may be a good idea [00:14:30] it wouldnt fix anything nor would it break anything [00:14:50] but it would be that extra emphasis 3rd party users would perhaps like [00:15:18] Regretfully I'll need to head out now. [00:15:38] It's getting late and I have some chores before bed. I'll be on early tomorrow if you want to keep iterating on thoughts. [00:18:14] Here's my giant WIP pr: https://github.com/wiki-ai/revscoring/pull/231 [00:18:23] Not much to talk about yet, but it's starting to come together. [00:18:26] Worked on it all weekend. [00:18:49] OK. I'm off. [00:18:49] o/ [00:31:13] ok [00:31:22] see you [01:12:54] I'm not sure which machine hosts ores-staging, is it res-misc-01.ores.eqiad.wmflabs ? [01:13:01] *o [01:13:53] nope [01:14:08] ores-staging-01 [01:33:40] (PS1) Awight: [DO NOT MERGE] Force 'testwiki' ID [extensions/ORES] - https://gerrit.wikimedia.org/r/258916 [02:17:28] awight: hey, around? [02:19:34] o/ Amir1 [02:19:55] o/ halfak :) [02:19:59] http://ores.wmflabs.org/scores/wikidatawiki/ [02:20:09] I'm trying to make this extension work [02:20:15] amazing [02:20:32] Also http://ores.wmflabs.org/scores/testwiki/ [02:20:33] have you done some tests halfak ? [02:20:45] yeah, I read your comments at the card [02:21:17] Nope. Just got all this together. No tests other than checking the last 5 in Q283.
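A minimal sketch of the naming idea discussed between 00:03 and 00:14 above: build a structured model filename from the wiki, the prediction keyword, and a timestamp, keep the current model at an undated path, and move dated copies under an "old" directory. The `ModelFile` class, the `.model` suffix, the timestamp format, and the example date are all assumptions made for illustration; only the `ores/model/...` and `ores/old/model/...` prefixes come from the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModelFile:
    """Hypothetical description of one trained model file."""
    wiki: str          # e.g. "enwiki"
    keyword: str       # what the model predicts, e.g. "reverted" or "damaging"
    trained: datetime  # when the model was built

    def filename(self) -> str:
        # Dated name: unique per wiki, keyword, and build time.
        stamp = self.trained.strftime("%Y%m%dT%H%M%SZ")
        return f"{self.wiki}.{self.keyword}.{stamp}.model"

    def path(self, current: bool) -> PurePosixPath:
        # The most recent model carries no date, so bots can always
        # point at the same path; older models keep their date.
        if current:
            return PurePosixPath("ores/model") / f"{self.wiki}.{self.keyword}.model"
        return PurePosixPath("ores/old/model") / self.filename()


# Illustrative values only.
example = ModelFile("enwiki", "damaging",
                    datetime(2015, 12, 14, 0, 3, 22, tzinfo=timezone.utc))
current_path = str(example.path(current=True))
old_path = str(example.path(current=False))
```

With a scheme like this, a third-party user can tell from the path alone whether they are looking at an outdated model.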
[02:22:43] halfak: https://www.wikidata.org/wiki/Wikidata:ORES/Report_mistakes [02:22:55] These cases are really good examples to test [02:23:06] everything looks good [02:23:45] We need to write an extension that makes the ORES score appear on any link to a diff. [02:24:05] * Write that into the ORES extension. [02:24:23] yeah [02:24:29] I thought about it too [02:24:45] That would make looking at these mistakes much easier. [02:26:07] There can be a ton of feature requests around the extension [02:26:11] that's the front-end [02:26:51] Yeah. We could maybe find someone who would have fun with it. :) [02:27:49] We could do an IEG and pull in a front-end developer. [02:28:10] Rather, try to find someone who might like to do that and support them. [02:29:02] mgalloway just volunteered to take a look at doing some design work. [02:29:33] If we could identify a few key functionalities that MediaWiki needs to really take advantage of ORES, we could prioritize them. [02:30:15] We should find more developers to help [02:30:31] +1 [02:30:58] I've some ideas about who [02:31:12] I'll share it with you later [02:31:16] I've got a discussion ready for us at the dev. summit. [02:31:20] Sounds good. [02:31:30] Probably good to try to pull them into a discussion ahead of time :) [02:32:50] yeah [02:33:04] I wish I was there, I could find some people to help [02:33:06] :( [02:34:13] No worries. 95% of this is online except for 1 hour. [02:34:34] Our job to make sure that the conversation lives past that one hour. [02:48:22] halfak: do you want to be added to the project? so you can test it? [02:48:44] ORES Extension? [02:49:05] Or the tools project? [02:49:39] the tools project [02:49:49] I don't have access to the extension [02:50:33] Gotcha. Sure. I might just end up using vagrant though. [02:50:46] I've already got the VM ready to go. [02:53:06] VM doesn't work in tools [02:53:15] I use it in my localhost [02:53:24] (no diff.
both doesn't work) [07:06:53] (PS2) Awight: Add an config variable to override the wiki ID [extensions/ORES] - https://gerrit.wikimedia.org/r/258916 (https://phabricator.wikimedia.org/T112856) [07:07:50] (CR) jenkins-bot: [V: -1] Add an config variable to override the wiki ID [extensions/ORES] - https://gerrit.wikimedia.org/r/258916 (https://phabricator.wikimedia.org/T112856) (owner: Awight) [07:11:00] (CR) Awight: Switch from the "reverted" to "damaging" model (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/257851 (https://phabricator.wikimedia.org/T112856) (owner: Awight) [07:48:00] (PS3) Awight: Add an config variable to override the wiki ID [extensions/ORES] - https://gerrit.wikimedia.org/r/258916 (https://phabricator.wikimedia.org/T112856) [08:42:29] (PS1) Awight: Update field name in API response [extensions/ORES] - https://gerrit.wikimedia.org/r/258937 [16:19:35] o/ leila [16:19:35] :D [16:20:02] Did I already do an intro for you to people in this channel? [16:20:02] hello halfak. :D [16:20:16] no, I /think/ it's my first time here. [16:20:31] I can also just gradually grasp the information. :D [16:20:54] OK. So you know Amir1 and YuviPanda. ToAruShiroiNeko & aetilley are working on the revscoring IEG grant. [16:21:14] awight & legoktm have been working on the ORES extension [16:21:28] bmansuro_, has been working on the Wiki labels extension. [16:21:30] o/ leila [16:21:51] o/ halfak [16:21:53] hi Amir1. [16:21:55] halfak: https://phabricator.wikimedia.org/T118039 [16:21:56] o/ team [16:22:02] jenelizabeth, usually has some fun thoughts about machine learning in the abstract to share (usually an ANN framing) [16:22:21] khoobi leila jan? [16:22:33] Vinh has been working on improving our model fitness. he should have some technical reports out in a few weeks. [16:22:34] Doing well, Amir1. Listening carefully to halfak's intro. [16:22:47] violetto, has been looking at what design work we need. [16:23:56] Oh!
And Helder used to be working on IEG, but he's too busy these days. Still he contributes a lot of portuguese processing code and does reviewing work. [16:24:12] I think that's everyone I can introduce. [16:24:25] Let me introduce you to madhuvishy, halfak. ;-) [16:24:42] thanks, halfak. Now I know much more. :-) [16:24:51] Ahh! Forgot about madhu. Then again, we haven't roped her into much work recently. [16:25:03] For everyone else, leila is another research scientist on the Wikimedia team. [16:25:11] She's working on article recommendations. [16:25:23] For which harej was recently interested. :) [16:26:02] correct; am also interested in annotating edits [16:26:10] for the type of edit they are [16:26:13] hi harej. :-) [16:26:29] harej, awesome. You've seen our ongoing project re. edit typing, right? [16:26:46] happy to chat about article rec whenever you want. it's easiest starting Friday morning PST, harej. [16:26:47] So I asked you about this and you pointed me to some WP:Labels discussion. Is this a different project? [16:27:11] Nope. Same project. [16:27:25] We're just finishing up the form that people will use to do the labeling. [16:27:31] Good, good. [16:27:35] What are the types you had in mind? [16:28:06] We have a mix of semantic and syntactic categories. [16:29:13] You can see them here: https://github.com/wiki-ai/wikilabels/blob/semantic_form/forms/available/edit_type.yaml [16:29:22] "Meanings" == semantic category [16:29:34] Then syntactic changes are expressed as object/action pairs. [16:29:58] We'll be running a pilot soon where we'll provide a big text field along with the structured categories so that people can note what is missing/difficult [16:30:22] Then we'll update the categories and run it again with a big dataset and a small text field for notes. [16:32:08] also, I am sure you have considered that people can do multiple things in one edit? [16:32:51] Yup. 
[16:33:18] See our mock https://commons.wikimedia.org/wiki/File:Wikilabels.Edit_types_v2_(mockup).svg [16:33:22] harej, ^ [16:33:23] So I am going to fuck with your system and do three separate things all in the same edit. Will your AI differentiate the three things within the same edit? [16:33:55] The idea is that you would be able to add a set of cards for all semantic meanings that are relevant and add the object/action pairs within the semantic meaning cards. [16:34:20] harej, yes. Though, we haven't settled on a modeling strategy yet. [16:34:48] That's fancy. [16:34:49] Right now, my primary concern is gathering a dataset that accurately represents how Wikipedians think about the meaning of work. [16:35:03] We'll essentially be doing a usability study during the pilot. [16:35:11] It could be confusing and too complex. We'll see. [16:35:19] You *are* asking people to do a lot. [16:35:26] Indeed. [16:36:07] leila, forgot to also say that guillom has been helping us with french language stuff like article quality features and badwords/informals for damage detection. [16:36:35] guillom, I've been welcoming leila to the channel and giving an overview of what people here are up to re. AI in wikimedia. [16:36:48] * guillom mostly idles here :) [16:37:06] halfak: we're making progress about ORES test system [16:37:14] Amir1, great! [16:37:33] I'm hoping to get a new deploy out for ORES today that will fix itwiki's model. [16:37:39] I could make some changes to the test model if you like [16:37:52] there are a few patches there [16:38:08] it shouldn't be necessary [16:38:14] OK cool. 
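A minimal sketch of the labeling structure described between 16:28 and 16:33 above: one edit can carry several semantic "meaning" cards, and each card holds its own object/action pairs, which is how a single edit that does three separate things would be recorded. The class names and the example meanings, objects, and actions are placeholders invented for illustration; the real categories live in the `edit_type.yaml` form linked above and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MeaningCard:
    """One semantic meaning plus the syntactic changes that express it."""
    meaning: str
    # Each syntactic change is an (object, action) pair.
    changes: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class EditLabel:
    """Hypothetical label record for a single revision."""
    rev_id: int
    cards: List[MeaningCard] = field(default_factory=list)
    notes: str = ""  # free-text field for what is missing or difficult

    def meanings(self) -> List[str]:
        return [card.meaning for card in self.cards]


# One edit that does three separate things (placeholder category names).
label = EditLabel(
    rev_id=12345,
    cards=[
        MeaningCard("copyedit", [("sentence", "modify")]),
        MeaningCard("add citation", [("reference", "insert")]),
        MeaningCard("vandalism cleanup", [("paragraph", "remove")]),
    ],
    notes="hard to tell whether the removal was cleanup or a content dispute",
)
```

Keeping the pairs inside each card, rather than in one flat list per edit, preserves which syntactic change belongs to which meaning.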
[16:38:21] because we are configuring it to work either way [16:38:38] focus on more important jobs, We'll take care of this [16:39:00] but I need YuviPanda's halp about logging errors of mediawiki [16:39:31] YuviPanda: if you can hear me please take a look at https://phabricator.wikimedia.org/T118039 [19:03:41] o/ violetto [19:03:58] hello halfak [19:04:00] Was hoping to catch up with you re. design of wikilabels and the ores extensions. [19:04:07] And make sure you had what you needed. [19:04:52] yea, i was too optimistic about last week. [19:05:08] let me find time today to send some questions over to you [19:05:26] No worries. I figure you are super busy. :) Sounds good. [19:05:28] * YuviPanda provides appropriate doses of gloom and doom to violetto [19:05:37] NOO [19:05:43] Don't listen to YuviPanda [19:05:46] :P [19:06:04] haha! [19:06:17] lucky you i don't listen to pandas [19:06:49] how do people get out of blankets when it's really cold? [19:06:50] * YuviPanda wonders [19:07:12] quiddity: used to be PandaMonium [19:07:14] You promise yourself that there is a hot shower waiting for you. [19:07:24] PandaMonium: except the shower broke today... [19:07:27] so I've no hot shower [19:07:32] Oh. Then don't get out of bed. [19:07:39] right [19:07:40] That's a good enough reason to call in sick. [19:07:42] ;) [19:07:44] well [19:07:48] I can work while still under blankets [19:07:54] my laptop is also under my blanket now [19:08:00] and wifi passes through blankets [19:08:10] So, suggestion #2, get dressed and get right back into the bed to warm up. [19:08:16] Showers are for people who have hot water. [19:08:23] * YuviPanda nods [19:19:25] wiki-ai/revscoring#406 (features_commons - 90dc3c4 : halfak): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/96810682 [19:19:47] Shuddup travis. [19:20:58] wiki-ai/revscoring#407 (fix_italian - c4c72ba : Aaron Halfaker): The build passed.
https://travis-ci.org/wiki-ai/revscoring/builds/96811073 [19:21:14] SEE [19:21:17] SEE what happens [19:21:20] stupid travis [19:22:08] wiki-ai/revscoring#408 (fix_italian - 7362004 : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/96811284 [19:22:13] arg [19:46:10] THE BUILD PASSED AND FROM HENCEFORTH THIS DAY SHALL BE CELEBRATED AS THE DAY THE BUILD PASSED [19:51:23] :) [19:51:45] I wonder if there is a way to tell travis to shut up unless master fails. [20:38:36] * quiddity is wearing a sweater, toque, scarf, and has a blanket wrapped around his midriff. Toasty! [20:42:44] it's in the 60s here today [20:42:53] 66 in fact [20:43:00] that's 19 C [20:43:15] a bit cold [21:04:31] Well, it should be much colder here! [21:07:29] first time using irc(yikes I know) testing to see if a message gets through. Hello world. [21:09:00] hi utilitarianexe [21:09:22] wooo it works thanks for the response [21:11:05] now off to find a way to write some useful code for wikipedia :-) [22:44:54] this is an odd first-channel, utilitarianexe :) [23:06:37] o/ utilitarianexe [23:08:11] oooo Platonides is here [23:08:13] nice [23:25:42] haha yea pretty weird halfak directed me here for finding wikipedia projects to work on [23:26:37] Aha! You're one of people who emailed me. You get credit for coming to visit the channel! Most people are like "psssh IRC? WHAT YEAR IS IT!?" [23:26:39] oh also @halfak your projects are super cool but thinking I am going to start with a simpler wiki bot project to get my feet wet [23:26:49] Sounds very reasonable. [23:27:04] lol [23:27:05] Maybe you'd be interested in testing out some of YuviPanda's work towards making bots way easier. 
[23:27:42] that sounds cool to try [23:27:49] halfak: we're gonna send out this announcement soon: https://etherpad.wikimedia.org/p/paws-announce [23:28:54] hello Yuvi [23:28:56] YuviPanda, kicking ass and taking names [23:29:01] I have been lurking here for some weeks now [23:29:28] since I found out about this new cabal^W project [23:29:35] there is no cabal, etc [23:30:27] :) [23:30:28] login worked [23:30:50] utilitarianexe: https://www.mediawiki.org/wiki/Manual:Pywikibot/PAWS_walk-through has more detailed info :D [23:30:52] YuviPanda: are you going to throttle that paws thing? [23:30:57] Platonides: in what sense? [23:31:07] number of edits [23:31:12] Platonides: the default pwb throttling applies, I guess [23:31:15] nothing outside of that [23:31:42] so when someone not properly speaking regex starts running it, doesn't make a too big mess ;) [23:32:02] they can do the same thing on toollabs / their home computer already :) [23:34:24] utilitarianexe: let me know how it goes for you :) [23:34:31] (am also in the #pywikibot channel) [23:34:45] halfak: I'm going to work on a more researcher oriented version of this at some point in the next few weeks [23:35:17] YuviPanda: they can [23:35:36] but at least they need to be able to install python! [23:35:37] will do [23:35:40] Ooh. So, it would need access to data and a nice way to host resulting datasets -- like quarry [23:35:43] YuviPanda, ^ [23:36:11] Platonides: yeah, but that's just 'anti-vandalism by obfuscated wikitext' :) [23:36:31] halfak: yes, https://phabricator.wikimedia.org/T119859 [23:36:33] :D [23:37:55] +1 for a /public folder [23:38:08] I know [23:38:21] In the future, we might have the potential to write a special file format for inclusion into quarry. [23:38:36] Rather so that it can be queryable in quarry.
[23:38:44] but it kinda works [23:38:44] I suppose we could just provide a DB connection [23:38:47] halfak: the more I think about it, the more I think that something like this should supplant quarry [23:39:00] halfak: https://github.com/catherinedevlin/ipython-sql [23:39:03] We'll still want a nice, simple UI for direct querying. [23:39:07] hmm [23:39:09] true [23:39:19] That's a killer use-case. [23:39:23] Very democratizing. [23:39:29] the simple UI? yeah [23:39:30] But I hear you. [23:39:42] I've been mulling over this a fair bit [23:39:46] What you are building is bigger and encompassing of quarry's functionality. [23:39:48] esp. since jupyter / ipython has an actual upstream [23:39:54] ':) [23:40:09] I do agree it needs to be simple enough [23:40:13] and quarry is far simpler than notebooks [23:40:16] butttttttt [23:40:17] wait let me find it [23:41:44] halfak: https://github.com/oreillymedia/thebe [23:41:59] halfak: https://oreillymedia.github.io/thebe/examples/matplotlib.html [23:42:02] as an example [23:42:08] except with the SQL magic [23:43:50] halfak: notice the fact that it *can* be interactive as well [23:45:04] Interesting [23:45:31] but doesn't necessarily have to be [23:45:46] so think of an authoring environment where you can mix wikitext with python with sql [23:48:54] Platonides: true :) I suppose 'we will see what happens and revoke / kick people if they do bad things' [23:51:54] YuviPanda, you are saying wonderful things to me. [23:52:19] * halfak gets a worklog [23:52:23] So, like this: https://meta.wikimedia.org/wiki/Research_talk:VisualEditor's_effect_on_newly_registered_editors/Work_log/2015-09-30 [23:52:28] halfak: I can probably give you a 'mix markdown with python with sql' already [23:52:40] halfak: exactly like that [23:52:44] :DDDD [23:52:54] halfak: just fixing up the sql magic and making it work for us will get us to there [23:52:58] except you're writing markdown [23:52:59] which is ok [23:53:06] Totally fine.
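A minimal sketch of the "mix python with sql" workflow discussed between 23:45 and 23:53 above, written as plain Python so it runs outside a notebook. The standard-library `sqlite3` module with an in-memory database stands in for the SQL magic and for whatever database connection a real notebook would be given; the `run_sql` helper, the `revision` table, and its rows are made up for illustration and are not the actual schema or setup.

```python
import sqlite3


def run_sql(conn, query, params=()):
    """Run a query and return rows as dicts, roughly what a SQL cell hands back."""
    cursor = conn.execute(query, params)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database with a made-up table (assumption, not a real replica).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER, rev_user TEXT)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Alice")],
)

# "SQL cell": aggregate in the database.
rows = run_sql(
    conn,
    "SELECT rev_user, COUNT(*) AS edits FROM revision "
    "GROUP BY rev_user ORDER BY rev_user",
)

# "Python cell": keep working with the result in ordinary Python.
most_active = max(rows, key=lambda row: row["edits"])
```

In a work log, the prose around these two steps would be written in markdown cells, with the query and the follow-up analysis kept next to the text that explains them.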
[23:53:44] halfak: if i give you a notebook install in prod you can already do that there [23:57:38] YuviPanda, what would that take? [23:58:08] I suppose there would need to be a serious security review -- or a way to publish notebooks from the prod instance. [23:58:14] halfak: the analytics team saying 'yeah, we need this' :) [23:58:19] The latter sounds fine to me. [23:58:20] halfak: oh yeah, definitely the latter [23:58:36] halfak: we put all of jupyter behind LDAP restricted to people currently in research [23:58:42] +1 [23:58:47] halfak: this is the same issue we had with quarry in prod.