[00:22:35] HaeB, See http://paws-public.wmflabs.org/paws-public/EpochFail/projects/headings/extract_headings.ipynb
[00:23:02] but do remember the public URLs will break at some point :D
[00:28:58] * halfak runs away
[00:29:02] have a good one folks!
[00:29:03] o/
[15:09:51] o/ sabya
[15:09:58] How did the staging setup go?
[16:17:25] o/ halfak
[16:18:14] getting internal server error. debugging. http://ores-experimental.wmflabs.org/
[16:18:38] sorry was AFK
[16:20:11] sabya, you'll want to use journalctl to look at the logs.
[16:20:14] I'll get you a command.
[16:20:37] ok
[16:21:06] sabya, "sudo journalctl -u uwsgi-ores-web | less"
[16:25:58] aha. lots of interesting stuff.
[16:27:01] leila: the survey should be turned off now
[16:27:43] schana: thanks. checking (wrong channel at first ;-)
[16:28:32] I assume there'll be some stragglers until the flow stops
[16:28:46] yup, schana.
[16:29:24] yeah, cool schana. I don't see any more responses. we're good. thanks.
[16:29:33] no problem :D
[16:29:52] sabya, I'm not seeing anything obvious suggesting what the issue is.
[16:29:57] Gotta run dog outside for a bit
[16:30:00] Will be back to help debug
[16:30:20] module mw is missing I think
[16:55:01] sabya, shouldn't need "mw"
[16:55:05] Something weird is going on.
[16:55:38] syslog says no python app
[16:55:49] Yeah. I'm looking into that now.
[16:55:58] fab -H localhost initialize_staging_server -- should this be enough?
[16:56:08] https://etherpad.wikimedia.org/p/setting_up_ORES_lab_cluster
[16:56:16] i recorded my steps there
[16:57:59] sabya, OK. I think I might have found the problem.
[16:58:14] Looking in `wikiclass`, which may have a broken old dependency on `mw`
[16:58:35] Yup. It does.
[16:58:39] Time to kill that.
[16:58:51] In the short term, I'll fix the current install
[17:00:00] https://github.com/wiki-ai/wikiclass/issues/15
[17:00:08] ok
[17:00:56] OK. Restarting web servcie.
[17:01:00] *service
[17:01:05] ok
[17:01:29] \o/
[17:02:50] in the ui, do i need to type the revision ids? is there a way to import the recent revids from enwiki?
[17:03:29] sabya, right now, the UI is pretty basic, but there are some other UIs that read the "RCStream"
[17:03:49] E.g. http://tools.wmflabs.org/raun/
[17:04:07] http://tools.wmflabs.org/raun/?language=en&project=wikipedia&userlang=en
[17:04:18] ^ Pulls changes and scores for English Wikipedia
[17:04:38] Note the little flame indicator with a percentage next to it.
[17:04:51] Huggle actually uses ORES too these days.
[17:05:23] ok.
[17:05:42] i had listed some questions in the etherpad. could you check?
[17:06:02] at the bottom of https://etherpad.wikimedia.org/p/setting_up_ORES_lab_cluster
[17:12:00] sabya, some answers in the etherpad
[17:12:42] halfak: yes, read them. btw, I tried running ores from the ui for a current revid from enwiki
[17:12:45] throws error.
[17:13:03] revid = 34026812
[17:13:29] the error is 'SVC' object has no attribute '_dual_coef_'
[17:14:14] sabya, yeah. I'm seeing that too.
[17:14:28] http://ores-experimental.wmflabs.org/scores/enwiki/damaging/34026812/
[17:14:31] For the raw error
[17:14:46] I'm guessing we have a version issue with sklearn
[17:15:41] yeah. Wrong version of sklearn, it looks like.
[17:15:45] I wonder how that happened.
[17:15:47] * halfak digs
[17:16:46] Ahh. I see the problem. Looks like `editquality` specifies a min version number for `revscoring` but we need a max in there.
[17:17:10] Version matching will get a lot easier soon. For now, I'll fix this, but it's going to take a little while to compile.
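The sklearn/revscoring mismatch above is the sort of thing that can be confirmed directly from the staging virtualenv before digging further. A minimal diagnostic sketch (not part of the ORES tooling; the package names are just the ones discussed here):

```python
# Hypothetical check: print the versions that are actually importable in the
# environment, to compare against what the requirements files expect.
import importlib

for name in ("sklearn", "revscoring"):
    try:
        module = importlib.import_module(name)
        print(name, getattr(module, "__version__", "unknown"))
    except ImportError:
        print(name, "is not installed")
```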
[17:17:32] Sorry for the trouble. Nice to find some of these issues.
[17:17:48] We're in a versioning transition and that is making things interesting.
[17:18:05] sklearn 0.15.2 --> 0.17.1 and revscoring 0.7.11 --> 1.0.1
[17:18:06] Soon
[17:18:23] i see
[17:20:33] sabya, building the right version of sklearn now. Should take ~30 minutes.
[17:22:13] * halfak works on completing the versioning transition.
[17:22:56] halfak: ok. where did you see that editquality is missing a max version?
[17:23:18] https://github.com/wiki-ai/editquality/blob/master/requirements.txt#L4
[17:23:38] Should read "revscoring >= 0.7.11, < 0.7.999"
[17:24:00] I.e. give me the last version of 0.7.x
[17:25:31] gotcha
[17:38:07] halfak: added a 4th question in the etherpad.
[17:46:14] sabya, answered.
[17:46:54] halfak: thanks
[17:47:38] o/ schana
[17:47:45] hey halfak
[17:47:56] Just wanted to check in on your availability. Amir1 is looking to pick up some of the tasks I have assigned to you.
[17:48:12] If you won't be able to work on them today, then we'll re-assign to Amir1
[17:48:24] that would be fine with me
[17:49:01] schana, how long do you estimate your work with recsys blockers will continue?
[17:52:32] halfak: I'm hoping to be finished this week if things go smoothly
[17:53:23] schana, OK. Keep me up to date. :)
[17:53:37] will do
[17:53:56] lunch!
[18:22:34] hi everyone
[18:23:30] is somebody involved in this project online?
[18:23:51] https://meta.wikimedia.org/wiki/Research:Automated_classification_of_edit_types
[18:26:23] hjfocs: that's probably halAFK
[18:26:27] but he's away for lunch
[18:27:08] alright, thanks yuvipanda
[18:28:43] Yep, that's halfak's project
[18:33:59] it's noted. no luck with the time zones, as I'll go for dinner in a few minutes :-)
[18:34:28] hjfocs: you can leave messages here (and maybe contact info) and he will be able to read them :)
[18:38:36] sounds good. I just wanted to know the implementation status of the classifier
[18:39:48] as I'll soon start the development of a classifier for the StrepHit IEG project:
[18:40:04] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
[18:41:39] could you please tell halAFK to drop me an e-mail at fossati@spaziodati.eu ?
[18:42:01] I have to go now, thanks!
[18:49:27] I see the messages from hjfocs
[18:50:23] halfak: I think the sklearn build is done, because your shell user is not running anything, so I restarted uwsgi. The "internal server error" started appearing again.
[18:52:02] kk sabya, checking
[18:53:32] Weird. We're getting an error for an old version of yamlconf too.
[18:53:37] That one doesn't make any sense.
[18:55:22] hmm...
[18:55:58] Yeah. I just confirmed that the line for yamlconf >= 0.1.0 in ores-wikimedia-config/requirements.txt is right.
[18:56:04] Not sure why the old version got installed.
[18:56:11] But I just installed the right version.
[18:56:22] sabya, I'm taking on getting this version-change stuff worked out.
[18:56:32] Hopefully you'll be the last to experience the mess :)
[18:57:39] hope so. how long before the new version is rolled out?
[18:58:13] sabya, just working out the last details of the deployment right now.
[18:58:43] We were hoping to switch away from SVC to RandomForest and GradientBoosting models since we get more fitness from them.
[18:59:00] But it turns out that they also substantially change the probability estimates.
[18:59:07] And users of ORES will need to make changes.
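As a rough illustration of why that switch matters to API consumers, here is a toy sketch on synthetic data (not the actual ORES models or features): SVC and GradientBoosting produce differently shaped probability estimates for the same inputs, so a tool that hard-codes a threshold against the old scores would need to re-tune it.

```python
# Toy comparison on synthetic data: probability estimates shift when the
# classifier family changes, even if both models are reasonably accurate.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

svc = SVC(probability=True, random_state=0).fit(X, y)
gbc = GradientBoostingClassifier(random_state=0).fit(X, y)

# The same rows get noticeably different "positive" probabilities, which is
# why fixed thresholds in downstream tools would need re-tuning.
print("SVC:", svc.predict_proba(X[:5])[:, 1])
print("GBC:", gbc.predict_proba(X[:5])[:, 1])
```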
[18:59:26] We don't want to spring that on people, so we've been working on strategies to make sure that, if we do this now, we'll never surprise anyone again.
[18:59:42] E.g. https://phabricator.wikimedia.org/T129230
[19:00:28] I've been working on building up our evaluation metrics so that users (other tool developers) can ask ORES where thresholds should be placed.
[19:02:11] Hmm, looks like our workers aren't running.
[19:02:14] why not use versioning in the ORES API? so that users of ORES are not taken by surprise. they'd have the option to move to new API versions.
[19:02:21] yes.
[19:02:40] sabya, we'll be refining models -- making them more fit and accurate.
[19:02:58] Old models will be hard to maintain. We'll need to re-train them at every iteration.
[19:03:14] But it's a good proposal.
[19:03:26] Would love to see a solid description of *how* we could manage that.
[19:03:37] Regretfully, I haven't seen anything that suggested it might be easy.
[19:04:20] * halfak restarts celery workers
[19:04:28] ok.
[19:04:54] \o?
[19:04:59] \o/
[19:05:07] http://ores-experimental.wmflabs.org/scores/enwiki/damaging/34026812/
[19:05:09] BAM
[19:05:13] It works!
[19:05:16] It's alive!
[19:05:18] :D
[19:06:03] yay, :-)
[19:06:50] what do you think should be the next step for me?
[19:09:36] sabya, so, that depends on what your interests are. One direction to take is system robustness. We could try to figure out how to remove all SPOFs (single points of failure).
[19:09:47] E.g. get rid of redis or use twemproxy.
[19:10:24] Another alternative is to look into engineering some other aspects of the system. E.g. how do we make it so that no users are ever surprised by a new model being deployed?
[19:10:50] Right now, we have ~30 models in production. All of those models are sitting in memory in both uwsgi and celery workers.
[19:11:08] I'm not sure how much memory we are using now, but at some point this might become intractable.
[19:11:24] *but* we could also rely on swap and an LRU. Most models will not be used that often.
[19:13:41] hmm.
[19:14:56] sabya, I'm hoping to help you find something that will be fun and interesting for you, and we'll figure out how to integrate that into other work.
[19:15:07] No need to make big decisions right now.
[19:15:16] Can always change your mind.
[19:16:06] sure. so let me start with some small tasks. just to get a grip on things.
[19:16:34] Cool.
[19:16:37] * halfak looks at phab
[19:17:39] One that might be simple is https://phabricator.wikimedia.org/T106638
[19:17:55] We have a "precached" utility in ORES
[19:18:08] It listens to the RCStream and requests scores for all the edits as they come through.
[19:18:20] This helps make sure they are waiting in the cache when someone needs them.
[19:18:29] Right now, I'm running it from a screen.
[19:18:42] We should have a nice way to run it as a service on a prod machine.
[19:18:52] e.g. one of the web nodes.
[19:19:13] This would require some work with puppet and debian-style services.
[19:20:22] Another task that could be interesting is https://phabricator.wikimedia.org/T128087
[19:20:48] HashingVectorization is a fun strategy for extracting signal from text. We haven't even looked into it yet, but it seems promising.
[19:21:25] I've been told that the way this blows up the feature space isn't really *too* problematic when using a Tree Boosting strategy.
[19:21:51] But I'm not even sure what would really be involved in trying to make a hash'd feature set work.
[19:24:42] i think they are good ones to start with.
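For the HashingVectorization task (T128087), the core idea can be sketched in a few lines. This is only a toy illustration with made-up documents and labels, not anything from the revscoring/editquality feature pipeline:

```python
# Toy sketch: hash text into a fixed-width feature space and feed it to a
# tree-boosting model (placeholder documents and labels).
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import HashingVectorizer

texts = ["good faith copyedit", "page blanking vandalism",
         "added a sourced paragraph", "inserted profanity"]
labels = [0, 1, 0, 1]  # made-up damaging / not-damaging labels

# 2**12 hashed features keeps the toy small; real work would tune this.
vectorizer = HashingVectorizer(n_features=2 ** 12)
X = vectorizer.transform(texts)

# Densified here only to sidestep sparse-input support differences across
# sklearn versions; the feature width stays fixed regardless of vocabulary size.
model = GradientBoostingClassifier(random_state=0)
model.fit(X.toarray(), labels)
print(model.predict_proba(X.toarray())[:, 1])
```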
[19:28:20] Will be happy to talk more and give examples for each.
[19:32:02] this is where I should see all tasks/ideas, right? https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service/
[19:33:56] are we maintaining "assigned to" properly? so I can skip the ones which are already being worked on by someone.
[19:34:01] halfak: ^
[19:34:37] sabya, yes. We are roughly maintaining that, but if an assigned task is in paused, feel free to ping and ask if it's being worked on.
[19:34:52] Usually when something is in the paused column there's already been some work done.
[19:37:28] got it. i'll go through phab tasks tomorrow and talk to you more about it. thanks for all the help.
[19:39:50] going to bed
[20:33:50] Bah. Missed saying goodbye to sabya
[20:49:13] * halfak editconflicts with Guillom on the Events page
[20:49:14] ;)
[20:49:37] Was also adding past CSCW, so it wasn't worth negotiating the conflict
[21:30:09] halfak: heh :)
[21:30:29] There are more to add to the "past" section, but my priority was the "upcoming" one.
[21:30:40] I'll probably bring this up at the next RG meeting
[21:31:49] * guillom goes back afk.
[21:32:59] guillom, have you seen wikiCFP?
[21:33:08] It would be great if we could have a custom list of conferences from that.
[21:33:29] I didn't know about it, no
[21:33:34] ( http://wikicfp.com/cfp/ I assume?)
[21:34:13] will check later
[21:34:32] Yup!
[21:40:49] yuvipanda, halfak: possible I could get some help from one of you tomorrow or Friday with an issue I'm having with HostBot OAuth integration?
[21:41:05] J-Mo, yeah. Got a summary of the issue?
[21:41:41] can't import requests_oauthlib when I run the Teahouse invites script via JSUB
[21:42:05] probably I've got an incompatible version of Python in my venv
[21:54:04] J-Mo, Oh yeah. I remember talking to you about that.
[21:54:16] I think that yuvipanda will likely have better insights than me there :/
[21:54:48] o/ harej
[21:54:52] np. I'll ping him separately. he's staying at my house in a few weeks, so I can probably squeeze a favor out of him. thanks halfak
[21:55:28] harej, I'm looking to set up a wikibase for experimenting with dataset metadata. Was wondering if you had a guide you worked from for setting up librarybase.
[21:56:29] J-Mo: you probably need to pass '-l release=trusty' but also use a virtualenv if you can etc
[21:56:34] but yeah, in transit etc
[21:59:30] halfak: it's really not hard. Just follow the instructions on MediaWiki.org or wherever they are. Set up Wikibase with Composer and all the dependencies will be installed for you.
[22:15:52] halfak: if I can do it, you, a professionally trained computer scientist, can too
[22:27:55] halfak: in terms of 'good news', I finally found the bug that was causing (WMF) accounts to fail :D
[22:27:57] and I also have a fix!
[22:28:28] madhuvishy: ^ :D
[22:51:12] harej, yeah. Just trying to make sure that the process of setting up Wikibase is as fast as possible. Do you have a link to "the instructions" you worked from for your labs instance?
[22:51:30] yuvipanda, \o/ almost forgot the bug existed since I've been using EpochFail.
[22:51:34] Not that I can remember
[22:55:13] harej, ok thanks anyway.
[22:55:14] :)
[23:11:06] halfak: :D
[23:20:47] * halfak has started to get used to loading up PAWS and checking on his stuff.
[23:54:34] I just achieved 0.295 ROC-AUC
[23:54:35] lol
[23:54:37] wat
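For context on that last exchange: ROC-AUC ranges from 0 to 1 with 0.5 meaning chance, so 0.295 implies the scores are anti-correlated with the labels; flipping them would give roughly 0.705. A throwaway sketch with made-up numbers:

```python
# Made-up scores: an AUC below 0.5 just means the score ordering is inverted.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]  # confidently wrong scores

print(roc_auc_score(y_true, scores))                    # 0.0 for this toy data
print(roc_auc_score(y_true, [1 - s for s in scores]))   # flipping gives 1.0
```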