[08:16:24] Revision-Scoring-As-A-Service-Backlog, ORES, Operations: Set up monitoring for ORES redis database - https://phabricator.wikimedia.org/T155482#2944388 (akosiaris)
[08:45:27] Revision-Scoring-As-A-Service-Backlog, Operations: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#2944445 (Joe)
[16:23:32] Revision-Scoring-As-A-Service-Backlog, DBA, MediaWiki-General-or-Unknown, ORES: Fatal exception of type "DBQueryError" on sorting ORES contributions - https://phabricator.wikimedia.org/T155500#2945272 (Samtar)
[19:01:58] o/
[19:36:39] hey halfak -- 2200 UTC today works for me. Thank you!
[19:36:49] Great!
[20:53:11] Revision-Scoring-As-A-Service-Backlog, Research Ideas: General image classifier for commons - https://phabricator.wikimedia.org/T155538#2946598 (Halfak)
[20:54:18] Revision-Scoring-As-A-Service-Backlog, Research Ideas: General image classifier for commons - https://phabricator.wikimedia.org/T155538#2946613 (Halfak)
[20:56:07] Revision-Scoring-As-A-Service-Backlog: WikiProject recommender service - https://phabricator.wikimedia.org/T155539#2946618 (Halfak)
[20:56:13] Revision-Scoring-As-A-Service-Backlog, Research Ideas: WikiProject recommender service - https://phabricator.wikimedia.org/T155539#2946631 (Halfak)
[21:02:14] Revision-Scoring-As-A-Service, Research Ideas: Article importance prediction model - https://phabricator.wikimedia.org/T155541#2946658 (Halfak)
[21:02:35] Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-draftquality, Research-and-Data-2016-17-Q2: [Epic] Build draft quality model (spam, vandalism, attack, or OK) - https://phabricator.wikimedia.org/T148038#2946671 (Halfak)
[21:02:41] Revision-Scoring-As-A-Service, Research Ideas: [Epic] Article importance prediction model - https://phabricator.wikimedia.org/T155541#2946672 (Halfak)
[21:05:03] Revision-Scoring-As-A-Service-Backlog, Research Ideas: Newcomer coaching AI - https://phabricator.wikimedia.org/T155542#2946675 (Halfak)
[21:48:01] Revision-Scoring-As-A-Service-Backlog, Discovery-Search, Research Ideas: Use implicit feedback to improve search rankings - https://phabricator.wikimedia.org/T155559#2946985 (Halfak)
[21:55:32] Revision-Scoring-As-A-Service-Backlog, Discovery-Search, Research Ideas: Use implicit feedback to improve search rankings - https://phabricator.wikimedia.org/T155559#2947023 (Halfak)
[21:57:41] Revision-Scoring-As-A-Service-Backlog, Wikidata: Linked fact checker - https://phabricator.wikimedia.org/T155560#2947029 (Halfak)
[22:01:34] Revision-Scoring-As-A-Service-Backlog: Notability detector - https://phabricator.wikimedia.org/T155561#2947059 (Halfak)
[22:01:46] Revision-Scoring-As-A-Service-Backlog: Notability detector - https://phabricator.wikimedia.org/T155561#2947072 (Halfak)
[22:01:59] Revision-Scoring-As-A-Service-Backlog: Notability detector - https://phabricator.wikimedia.org/T155561#2947059 (Halfak)
[22:05:03] Revision-Scoring-As-A-Service-Backlog: Linkify: A tool for recommending links between articles - https://phabricator.wikimedia.org/T155563#2947098 (Halfak)
[22:05:16] Revision-Scoring-As-A-Service-Backlog: Linkify: A tool for recommending links between articles - https://phabricator.wikimedia.org/T155563#2947111 (Halfak) This might be {{done}}. See http://tools.wmflabs.org/navlink-recommendation/
[22:05:16] How cool, wikibugs!
[22:06:25] lol
[22:07:27] Revision-Scoring-As-A-Service-Backlog: Linkify: A tool for recommending links between articles - https://phabricator.wikimedia.org/T155563#2947098 (Halfak) Note that this idea was brought up as part of the AI wishlist brainstorming session at the Wikimedia Developer summit. I'm filing the task even though I...
[22:07:39] Revision-Scoring-As-A-Service-Backlog, Research Ideas: Linkify: A tool for recommending links between articles - https://phabricator.wikimedia.org/T155563#2947137 (Halfak)
[22:11:06] Revision-Scoring-As-A-Service-Backlog, revscoring: Generate PCFG sentence models - https://phabricator.wikimedia.org/T148037#2947167 (Halfak)
[22:11:12] Revision-Scoring-As-A-Service, revscoring: Generate PCFG sentence models - https://phabricator.wikimedia.org/T148037#2712808 (Halfak)
[22:11:19] Revision-Scoring-As-A-Service, rsaas-articlequality: Extract features for deleted page (draft quality model) - https://phabricator.wikimedia.org/T148581#2947169 (Halfak)
[22:11:38] Revision-Scoring-As-A-Service, revscoring: Generate PCFG sentence models - https://phabricator.wikimedia.org/T148037#2712808 (Halfak) a:Halfak
[22:11:40] Revision-Scoring-As-A-Service, rsaas-articlequality: Extract features for deleted page (draft quality model) - https://phabricator.wikimedia.org/T148581#2726858 (Halfak) a:Halfak
[22:13:40] o/ jonas_agx_
[22:13:46] hey halfak -- can we talk now?
[22:13:57] Yup! :)
[22:14:19] Sorry for making you wait, my connection was not working
[22:14:21] So, my goal today is to get you to the point where you can run the profiler on feature extraction for a wiki.
[22:14:26] No worries.
[22:14:31] Great!
[22:14:36] Want to focus on ptwiki?
[22:15:09] damaging, reverted, and goodfaith all use the same features, so we don't need to worry about differences there.
[22:15:16] We can focus on enwiki if it is simpler
[22:15:33] Meh. All the same to me.
[22:16:23] Enwiki, i'm not sure if I have all dictionaries
[22:16:51] Sure.
[22:16:59] I just installed aspell and enchant
[22:17:03] OK. So. What operating system are you using?
[22:17:29] We'll need myspell-en-au, myspell-en-us, myspell-en-gb
[22:17:34] I'm on manjaro (it's an arch linux cousin) right now
[22:18:04] above I gave names for deb packages I'm familiar with getting through apt.
[22:18:08] Will that help you find what you need?
[22:18:29] It helps
[22:18:37] I'm searching for it now
[22:18:45] kk
[22:20:56] I can't find a package with that name, would hunspell do the trick?
[22:21:07] It'll work for demonstration purposes.
[22:21:26] Revision-Scoring-As-A-Service-Backlog, Research Ideas: Linkify: A tool for recommending links between articles - https://phabricator.wikimedia.org/T155563#2947265 (leila) @Halfak yeah, I think https://arxiv.org/abs/1512.07258 is basically what you're looking for (as you linked in Description). The blocke...
[22:21:37] Hmmm.. maybe it would be easier to just have you work from one of our compute boxes
[22:21:57] jonas_agx_, do you have a Wikimedia Labs account?
[22:22:06] E.g. for logging into tool labs?
[22:22:25] Cool, I do. It's probably jonas_agx
[22:23:19] I'm confirming right now
[22:23:46] kk
[22:28:01] I have an account but can't find the username
[22:28:17] So they don't let me create a new one with the same email
[22:29:26] Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-draftquality, Research-and-Data-2016-17-Q2: [Epic] Build draft quality model (spam, vandalism, attack, or OK) - https://phabricator.wikimedia.org/T148038#2947336 (Halfak) FYI: https://github.com/wiki-ai/draftquality/blob/master/models/en...
[22:31:37] Regretfully, I can't just simply inspect the database.
[22:31:41] I'm looking into other ways.
[22:31:51] I used tool labs awhile ago
[22:32:14] jonas_agx_, want to join me in #wikimedia-labs
[22:32:15] ?
[22:32:30] Okay
[22:41:39] OK. I'm just running puppet to make sure it's added you to the machine and then I'll help you log in.
[22:42:17] Okay, I'm adding my new ssh-key to my account
[22:42:41] Let me know when you have finished that. I'll need to run puppet again
[22:42:48] Done
[22:44:54] OK. You'll need to make sure that the following ssh command picks up the right SSH key
[22:45:14] Also that it uses the labs bastion. Hopefully this'll just work. If not, we'll debug
[22:45:24] "ssh ores-compute-01.eqiad.wmflabs"
[22:45:43] jonas_agx_, ^
[22:46:33] okay
[22:47:51] "ssh: Could not resolve hostname ores-compute-01.eqiad.wmflabs: Name or service not known"
[22:48:27] halfak: ^
[22:49:16] Check this out https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29
[22:49:21] And update your .ssh/config
[22:50:36] okay
[22:53:48] done! I'm in
[22:53:55] \o/
[22:54:04] Thank you!!!
[22:54:10] OK. So I want to quickly help you set up an environment to work in.
[22:54:17] Great!
[22:54:35] OK if I help you set up yours like mine and then you can make modifications later?
[22:54:54] Yes
[22:56:09] Starting at ~
[22:56:10] $ mkdir venv
[22:56:14] $ cd venv
[22:56:24] $ virtualenv 3.4 -p $(which python3) --system-site-packages
[22:56:44] done
[22:56:57] $ source ~/venv/3.4/bin/activate
[22:57:21] Following
[22:57:25] $ mkdir projects
[22:57:25] $ cd projects
[22:57:45] hmmm what's your github username?
[22:58:49] jonasagx
[22:59:09] $ git clone https://jonasagx@github.com/wiki-ai/revscoring
[22:59:39] $ cd revscoring
[22:59:40] $ pip install -r requirements.txt
[23:02:49] jonas_agx_, I expect this will take a long time.
[23:03:06] If it fails, try to "pip install numpy" and then try the "pip install -r requirements.txt" again
[23:03:40] Okay
[23:03:49] For reference of the commands I gave you, see https://gist.github.com/halfak/1a0a9e0c754e12f2686a2eaa8046667d
[23:04:25] The installation seems stuck in compilation
[23:04:33] Thanks for the gist
[23:04:34] That's normal
[23:04:48] Takes a while
[23:06:44] While it is installing the dependencies, could you explain to me how I can contribute?
[23:08:01] Ahh yes. So once we have revscoring working for you, I'll show you how to run the profiler during feature extraction.
[23:08:16] Then we can look at one of the slowest parts of feature extraction and talk about making it faster.
[23:08:40] Cool!
[23:16:33] I'll let you know when it is ready
[23:16:50] kk
[23:26:14] Revision-Scoring-As-A-Service, rsaas-draftquality: Deploy draftquality models to WMFLabs - https://phabricator.wikimedia.org/T155576#2947549 (Halfak)
[23:26:18] Revision-Scoring-As-A-Service, rsaas-draftquality: Deploy draftquality models to WMFLabs - https://phabricator.wikimedia.org/T155576#2947562 (Halfak) https://github.com/wiki-ai/ores-wmflabs-deploy/pull/72
[23:26:25] Revision-Scoring-As-A-Service, rsaas-draftquality: Deploy draftquality models to WMFLabs - https://phabricator.wikimedia.org/T155576#2947563 (Halfak) a:Halfak
[23:28:14] Revision-Scoring-As-A-Service, revscoring: Sentence bank for personal attacks - https://phabricator.wikimedia.org/T148035#2947583 (Halfak)
[23:28:21] Revision-Scoring-As-A-Service, revscoring: Sentence bank for vandalism - https://phabricator.wikimedia.org/T148034#2947584 (Halfak)
[23:28:33] Revision-Scoring-As-A-Service, revscoring: Sentence bank for Featured Articles - https://phabricator.wikimedia.org/T148033#2947585 (Halfak)
[23:28:45] Revision-Scoring-As-A-Service, revscoring: Sentence bank for spam - https://phabricator.wikimedia.org/T148032#2947586 (Halfak)
[23:30:26] Revision-Scoring-As-A-Service, Wikilabels: Deploy labeling campaign for ptwiki in 2016 - https://phabricator.wikimedia.org/T155276#2947589 (Halfak) I've got a new sample https://quarry.wmflabs.org/query/15468. I've autolabeled it and I'm just getting ready to deploy.
[23:30:28] All requirements installed
[23:30:33] halfak: ^
[23:30:46] Great.
[23:30:50] So, now...
[23:30:59] $ pip install revscoring==1.3.4
[23:31:03] It shouldn't need to do much
[23:31:21] Done
[23:32:11] $ cd ..
[23:32:15] $ git clone https://jonasagx@github.com/wiki-ai/editquality
[23:32:22] $ cd editquality
[23:34:02] Okay
[23:34:16] Should I install its requirements too?
[23:34:42] Nope. :)
[23:34:46] Next two commands
[23:34:47] $ make datasets/enwiki.labeled_revisions.20k_2015.json
[23:34:47] $ cat datasets/enwiki.labeled_revisions.20k_2015.json | shuf -n 1000 | revscoring extract editquality.feature_lists.enwiki.damaging --host https://en.wikipedia.org --profile enwiki.damaging.profile.md > /dev/null
[23:35:10] * halfak crosses fingers.
[23:35:15] That last one was shot from the hip!
[23:37:47] It didn't work
[23:38:25] Can you paste the error?
[23:38:25] I think it needs nltk data
[23:38:53] "ImportError: Could not load stopwords for revscoring.languages.english. You may need to install the nltk 'stopwords' corpora. See http://www.nltk.org/data.html"
[23:39:08] Aha!
[23:39:10] OK.
[23:39:18] "python -m nltk.downloader stopwords" would fix it?
[23:39:38] $ python -m nltk.downloader stopwords
[23:39:40] Yup
[23:40:19] Okay I will run the second command again
[23:40:57] "Traceback (most recent call last):
[23:40:57] File "/home/agx/venev/3.4/bin/revscoring", line 11, in <module>
[23:40:57] sys.exit(main())
[23:40:58] File "/home/agx/venev/3.4/lib/python3.4/site-packages/revscoring/revscoring.py", line 53, in main
[23:41:00] module.main(sys.argv[2:])
[23:41:04] File "/home/agx/venev/3.4/lib/python3.4/site-packages/revscoring/utilities/extract.py", line 110, in main
[23:41:07] profile_f, verbose, debug)
[23:41:09] File "/home/agx/venev/3.4/lib/python3.4/site-packages/revscoring/utilities/extract.py", line 150, in run
[23:41:12] write_profile(profile_f, dependents, profile, batch_size)
[23:41:15] File "/home/agx/venev/3.4/lib/python3.4/site-packages/revscoring/utilities/extract.py", line 234, in write_profile
[23:41:18] ('min_time', round(min(profile['per_batch_duration']), 3)),
[23:41:21] ValueError: min() arg is an empty sequence"
[23:41:25]
[23:42:11] $ wc datasets/enwiki.labeled_revisions.20k_2015.json
[23:42:15] ^ What's the output?
[23:42:40] the file is empty
[23:42:44] 0 0 0 datasets/enwiki.labeled_revisions.20k_2015.json
[23:43:14] rm datasets/enwiki.labeled_revisions.20k_2015.json
[23:43:16] make datasets/enwiki.labeled_revisions.20k_2015.json
[23:43:23] remove it and then make it again
[23:44:04] Error
[23:44:07] "ImportError: No module named 'mwreverts'
[23:44:08] Could not load utility autolabel.
[23:44:08] Makefile:304: recipe for target 'datasets/enwiki.labeled_revisions.20k_2015.json' failed
[23:44:08] make: *** [datasets/enwiki.labeled_revisions.20k_2015.json] Error 1
[23:44:12] "
[23:44:35] Oh! Of course editquality does have some requirements we need.
[23:44:43] $ pip install -r requirements.txt
[23:45:13] okay
[23:48:14] There is a package missing called para
[23:48:31] even after i install the requirements
[23:48:39] installed*
[23:49:24] "ImportError: No module named 'para'
[23:49:24] Could not load utility autolabel.
[23:49:25] Makefile:304: recipe for target 'datasets/enwiki.labeled_revisions.20k_2015.json' failed
[23:49:25] make: *** [datasets/enwiki.labeled_revisions.20k_2015.json] Error 1
[23:49:28] "
[23:56:08] I added "para==0.0.5" to requirements
[23:58:09] It's working now
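
Editor's note: both failures in this session (the empty dataset file that led to "min() arg is an empty sequence", and the missing 'mwreverts' and 'para' modules) could probably have been caught before running the extraction. What follows is only a rough sketch of such a pre-flight check, not part of revscoring or editquality. The function names are hypothetical, and the module list is simply taken from the errors pasted above.

```python
import importlib.util
import os
import sys


def missing_modules(names):
    """Return the top-level module names that are not importable here."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_nonempty_file(path):
    """True only if path is a regular file holding at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def preflight(dataset_path, modules):
    """Collect readable problems; an empty list means nothing was found."""
    problems = ["missing module: " + name for name in missing_modules(modules)]
    if not is_nonempty_file(dataset_path):
        problems.append("dataset missing or empty: " + dataset_path)
    return problems


if __name__ == "__main__":
    # Path and module names are the ones that appear in the log above.
    dataset = "datasets/enwiki.labeled_revisions.20k_2015.json"
    for problem in preflight(dataset, ["revscoring", "mwreverts", "para"]):
        print(problem, file=sys.stderr)
```

Run from the editquality checkout with the virtualenv active, this would print one line per problem and nothing when the dataset and modules look usable. It does not check the nltk stopwords corpus, which still needs the downloader command shown in the log.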