[03:20:00] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2491101 (Sabya) Open>Resolved For the record, the IRC log is here: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-ai/20160710.txt
[13:35:33] Revision-Scoring-As-A-Service-Backlog, ORES: Announce deployment of wp10 models to ruwiki community - https://phabricator.wikimedia.org/T138623#2491889 (Halfak) @putnik, sorry for the delay. I was on vacation. See https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service#Article_quality_mod...
[13:36:36] o/
[13:36:41] halfak: o/
[13:36:42] heyyyyy
[13:36:45] Hey!
[13:38:17] I have done some stuff this week but I didn't wanted to get you involved in work while you were in vacation so I wrote report on a daily basis and put it in google docs (contains some NDA stuff :D) for when you're back
[13:38:28] halfak: I just share it with you
[13:40:00] TLDR: plwiki had some issues, got fixed, wikidata failed jobs got fixed (and ores extension has its own dashboard now) also precaching got much more robust now
[13:41:58] Nice. I wouldn't have received the messages anyway. I was away-from-phone for 6 days :)
[13:42:23] I hope you had fun :)
[13:43:02] I did. It was nice to go be outside for a long time.
[13:43:26] I'll have a lot of email to catch up on, so I'll need to take a lighter load than usual this week.
[13:45:17] I saw your efforts on testing out the refactor PR. I can help with that today.
[13:47:47] How will using codfw work as a canary node when redis is unavailable there?
[13:50:35] halfak: wrt using the codfw as canary, We can only test web part there, since even redis would be there, the canary node sends the process to the broker (redis) and even redis works there, the process goes to another node
[13:51:23] "and even redis works there"?
[13:53:44] Just finished the report. It's awesome. Thanks for writing it up.
[13:56:53] halfak: *even if redis works there
[13:56:54] :D
[13:58:25] OK. This is good, but we won't be at 100% canary if we don't have a redis node there.
[13:58:32] I wonder when akosiaris gets back.
[13:59:05] he is back, I saw his notes in phab (for other project)
[13:59:17] the real canary is beta
[13:59:30] which everything works there
[14:03:26] halfak: just today
[14:03:32] \o/
[14:03:37] still catching up on a ton of things
[14:03:52] akosiaris, when you settle in, I want to talk about new machines.
[14:04:00] how are you guys ?
[14:04:08] halfak: sure. new machines about what ?
[14:04:13] Seems we're getting to big for sharing scb1001/2 and we need a redis node in codfw.
[14:04:20] There are tasks we can dig into later.
[14:04:38] *too big
[14:04:41] * halfak feels shame
[14:04:46] yeah the latter is probably easy. The former, I 'll have to take a look
[14:05:28] by big ? that thread about memory ? where ORES was incorrectly blamed for mobileapps failing ?
[14:05:40] akosiaris, related, yes.
[14:05:54] We're working on the memory issues, but we're likely to keep growing as we add new prediction models.
[14:06:56] Still, definitely worth discussing what staying on scb 1001/2 looks like.
[14:07:06] memory leaks aside (which should be fixed if any exist), yeah given the growth, increased memory usage is to be expected
[14:07:41] yup
[14:10:56] I'm still skeptical of a memory leak -- especially since processes often reduce their memory footprint.
[14:11:20] But will keep looking
[14:13:24] Amir1, good job getting back to pine re. article quality. It's always nice when we can say "yes, we already support that" :)
[14:13:34] :D
[14:13:38] thanks
[14:19:04] akosiaris: https://lists.wikimedia.org/pipermail/ai/2016-July/000041.html I listed everything we did to reduce the memory pressure. The memory usage now is stable but a little tight https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=scb1001.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=mem_report&c=Service+Cluster+B+eqiad
[14:19:16] schana, Just reading your notes from SOS.
[14:19:41] I'd appreciate it if you don't refer to the issue with ORES and memory usage as a "leak" until we can confirm that this is what it is.
[14:19:51] It seems problematic for us to socialize that idea.
[14:20:17] No big deal though.
[14:20:27] After all, it could be a leak.
[14:22:46] Some pics from the camping trip I just got back from: https://www.flickr.com/photos/jaredvolkl/sets/72157671509782285
[14:24:51] I mean the issue with celeryd which it seems to me a memory leak in celeryd (not our issue)
[14:25:08] Fair point.
[14:40:10] nice place halfak!
[14:42:02] Was a lot of work. We'd usually travel 19 - 30 km per day paddling canoe and carrying 40 kg packs & canoes on "portages" ranging from 50 meters to 1 km.
[14:42:18] You get in shape for it quickly.
[14:42:38] The first few days are the hardest. By the end, it feels like everything is lighter and you can paddle for forever.
[14:47:35] Yeah, I should try that some time soon
[14:47:52] oh I forgot, we did in Wikimania :D
[14:48:21] lol yeah. Honestly, I find a lot of similarities between international travel and travel-camping.
[14:48:36] You try to pack light. You end up walking long distances.
[14:48:40] every morning when I got from my apartment to the hacking space I was already exhausted :D
[14:48:44] You sleep in uncomfortable places.
[14:49:05] You make sure you get enough sleep and calories so that you can make it through the next day.
[16:46:41] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES, Wikimania-Hackathon-2016: Make hidenondamaging=1 faster - https://phabricator.wikimedia.org/T138444#2492623 (Halfak)
[16:48:30] Revision-Scoring-As-A-Service, revscoring: Tamil language utilities - https://phabricator.wikimedia.org/T134105#2492628 (Halfak) a:schana
[16:50:09] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES: CI tests for the ORES extension - https://phabricator.wikimedia.org/T140455#2492631 (Ladsgroup) a:Ladsgroup
[16:51:45] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Wikimedia-Site-requests, Beta-Feature: Deploy ORES review tool in English Wikipedia - https://phabricator.wikimedia.org/T140003#2492638 (Ladsgroup) a:Ladsgroup
[16:52:01] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Wikimedia-Site-requests, Beta-Feature: Deploy ORES review tool in Polish Wikipedia - https://phabricator.wikimedia.org/T140005#2492640 (Ladsgroup) a:Ladsgroup
[16:52:27] Revision-Scoring-As-A-Service, ORES: Have a metric for direct requests to ORES not scoring ones - https://phabricator.wikimedia.org/T139446#2492650 (Ladsgroup) a:Ladsgroup
[16:53:00] Revision-Scoring-As-A-Service, ORES: ORES production blog post - https://phabricator.wikimedia.org/T141275#2492654 (Halfak)
[16:53:03] Revision-Scoring-As-A-Service, Research-and-Data, Research-outreach, Epic, Research-and-Data-2017-Q1: [Epic] Write a comprehensive story on ORES (covering productization and research reports) - https://phabricator.wikimedia.org/T140429#2492670 (Halfak)
[16:53:05] Revision-Scoring-As-A-Service, ORES: ORES production blog post - https://phabricator.wikimedia.org/T141275#2492669 (Halfak)
[16:53:07] Revision-Scoring-As-A-Service, ORES: Have a metric for direct requests to ORES not scoring ones - https://phabricator.wikimedia.org/T139446#2432568 (Ladsgroup) I will do this after getting the refactor merged, otherwise It'll probably cause some conflicts.
[16:53:27] halfak: https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service/ here's my tasks for this week
[16:53:30] what do you think?
[16:54:09] Amir1, looks excellent!
[16:54:21] And ambitious.
[16:54:32] schana, have you been doing any work inside of MediaWiki yet?
[16:55:52] Revision-Scoring-As-A-Service-Backlog, Wikilabels: [Spec] CI tests for Wikilabels - https://phabricator.wikimedia.org/T137625#2492689 (Halfak)
[16:56:07] :D
[16:56:18] schana, if you have experience there, I think it would be good if Amir can help you find an introductory task for working in the ORES extension.
[16:56:29] I spend half of my week trying to learn writing CI tests in mediawiki
[16:56:30] If not, then it would be cool to have you pick up https://phabricator.wikimedia.org/T137625
[16:56:38] As I expect your experience will help.
[16:57:39] Amir1, I'm going to grab lunch and then come back to the config on ores-experiment.
[16:57:45] It should be good timing for your travels.
[16:57:52] yup
[16:57:54] o/
[18:18:59] halfak: I'm back
[18:19:08] Hey! Me too.
[18:19:13] Looking at ores-experimental now
[18:36:18] Amir1, looks like maybe puppet isn't right on ores-experimental
[18:36:33] realpath() of /etc/uwsgi/apps-enabled/ores-web.ini failed: No such file or directory
[18:36:45] halfak: given the number of instances we have I think that's quite possible
[18:36:59] OK checking on that
[18:37:06] I will log in and check
[18:37:13] I need answer someone first
[18:40:24] running the puppet agent atm
[18:41:03] halfak: puppet looks happy
[18:41:08] Hmm
[18:41:26] where do you get this error?
[18:41:29] Looks like the config file is called "ores.ini"
[18:41:45] When running: sudo service uwsgi-ores-web restart
[18:43:49] okay, I see it
[18:46:32] halfak: I'm checking to see what's wrong, but stupid wikitech
[18:46:38] It might take some time
[18:48:12] Looks like the config re puppet is fine.
[18:49:22] the puppet codes are fine
[18:53:24] halfak: /etc/uwsgi/apps-enabled/ores.ini is there
[18:53:32] and uwsgi is working
[18:53:43] Agreed. Not sure why it is looking for another file.
[18:54:00] in fact it's up for six days
[18:54:10] sudo service uwsgi-ores status
[18:54:20] ladsgroup@ores-experiment:~$ sudo service uwsgi-ores status
[18:54:27] are you in this instance?
[18:54:35] halfak: ^
[18:55:14] Ahh! I was looking for ores-web or uwsgi-ores-web
[18:55:30] So I restarted uwsgi-ores-web since ores-web was not present
[18:55:39] * halfak restarts uwsgi-ores
[18:56:18] oh, okay
[19:03:31] halfak: I was thinking we might find scores for non-main ns edits in wikidata, since the failed jobs got retried 30 times and ORES is an AI. it might gets upset and throws something as score to shut up the extension
[19:08:32] legoktm: hey, if you're around I have two patches you need to take a look if you have some time:
[19:08:41] 1- https://gerrit.wikimedia.org/r/#/c/300807/ super trivial
[19:08:59] 2- https://gerrit.wikimedia.org/r/#/c/299743/ I got this tested in mw-revscoring.wmflabs.org
[19:12:37] lol Amir1 ;)
[19:14:56] :ي
[19:14:57] :D
[19:15:04] back to writing CI tests
[19:16:57] http://ores-experiment.wmflabs.org/ is up
[19:17:18] http://ores-experiment.wmflabs.org/v2/scores/enwiki/reverted/32534574?features
[19:17:20] works
[19:22:07] I think I should update precached so that it groups requests by model.
[19:25:56] halfak: https://twitter.com/sempf/status/514473420277694465?lang=en
[19:26:09] we should act like this on ores-experiment for the next couple of days
[19:26:30] (also please make a PR against ores-wmflabs-config and ores in gerrit)
[19:26:53] I didn't actually change anything
[19:27:46] have you deployed your changes on ores into it?
[19:28:14] I changed no code or config
[19:28:25] Was just trying to replicate the error you discussed.
[19:28:50] Ha! Got an error.
[19:29:07] the error happens when you add your ores refactor to the codebase
[19:34:26] Amir1, OK. Cool I think I figured out the issue with the PR
[19:35:20] nice, yes :)
[19:40:00] oh, before merging the refactor I want to have a deployment into prod
[19:40:22] of all old things we did, timeout, footer, etc.
[19:44:30] We can merge and hold the submodule back
[19:45:15] yeah I know, I just wanted to get these stuff be live, they have been there long enough
[19:45:42] halfak: but we can just go until the commit before the last
[19:45:51] it's okay either ways
[19:48:12] Yeah. I want this deployed to ores.wmflabs.org for a little while
[19:48:43] hello halfak
[19:49:00] welcome back from them vacations
[19:49:13] Thanks!
[19:50:23] do you think you can throw me a few statistics for the final report?
[19:51:57] I think you may have them from various presentations
[19:56:34] Hmm... Can't replicate the error on my machine.
[20:05:41] ToAruShiroiNeko, sorry I missed the Q. Not sure what exactly you are looking for.
[20:05:54] Maybe a count of the number of languages supported?
[20:07:44] well
[20:07:47] we can do many things
[20:07:56] https://grafana.wikimedia.org/dashboard/db/ores
[20:08:05] least amount of effort for you preferably :)
[20:08:06] we have this but we don't have total requests
[20:08:09] https://meta.wikimedia.org/wiki/Grants:IEG/Revision_scoring_as_a_service/Renewal/Final#Outcomes
[20:08:16] What are the results of your project?
[20:08:21] https://grafana.wikimedia.org/dashboard/db/ores-extension
[20:08:24] languages supported, number of hand coded entries etc
[20:08:40] number of wikis we have wiki labels
[20:08:41] ToAruShiroiNeko, will be hard to gather this for the end of the grant period.
[20:08:55] I dont see why we would need to restrict it to that
[20:09:04] any extra data wont be shunned
[20:09:12] assuming you are ok with that
[20:10:00] (oh, btw halfak if you zoom out enough the ores-extension dashboard you can even see when nlwiki had issue too. We had 400 error per min!)
[20:10:31] ToAruShiroiNeko, no concerns here
[20:11:14] I will have to be honest
[20:11:19] I am not in the best of spirits lately
[20:18:21] afk
[20:29:07] Amir1, https://gerrit.wikimedia.org/r/300991
[20:36:02] * halfak syncs up versions of revscoring and deltas between local and ores-experiment
[20:47:41] halfak: I'm back
[20:51:15] It's the new version of gerrit, I'm trying to understand how it works
[20:53:37] halfak: done
[20:56:54] Thanks. Got pulled into a meeting so I still haven't tested
[20:58:55] (CR) Legoktm: [C: 2] Better way to handle errors in Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/299743 (https://phabricator.wikimedia.org/T137880) (owner: Ladsgroup)
[20:59:56] (Merged) jenkins-bot: Better way to handle errors in Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/299743 (https://phabricator.wikimedia.org/T137880) (owner: Ladsgroup)
[21:00:53] thanks legoktm :)
[21:57:14] Revision-Scoring-As-A-Service-Backlog, Research-and-Data, Research-outreach: Organize a technical workshop for ORES - https://phabricator.wikimedia.org/T141310#2494013 (DarTar)
[22:19:01] Amir1, how were you viewing the error you saw on ores-experimental?
[22:19:19] I'm trying to journalctl -u uwsgi-ores and seeing no errors in the log.
[22:19:33] halfak: /var/log in syslog
[22:19:42] or deamon.log
[22:30:07] Weird. I'm still not seeing the errors.
[22:30:22] I'm not seeing the requests come in at all.
[22:30:30] Maybe I'll trying using wget on the machine itself.
[22:33:11] Amir1, sorry to bug you again, but I'm stuck. This is really weird. I've never ran into this issue before.
[22:33:30] I can see the log of the server starting up.
[22:33:33] no worries, that's why I'm here
[22:33:36] But I can't see any log items after that.
[22:33:46] Oh! There goes a puppet run :)
[22:34:24] halfak: one thing I do usually in these cases, is to stop the services and run it manually using /srv/ores/venv/bin/python ores_celery.py
[22:34:33] and uwsgi in two files
[22:34:35] Hmm... Could work. Will try
[22:34:38] *tabs
[22:34:53] but you need to change the port from 5000 to 8081 in the file
[22:35:29] and then I make the requests in ores-experiment.wmflabs.org
[22:38:00] okay, I just wrote my first successful CI test in mediawiki
[22:38:05] \o/
[22:38:08] OK that works.
[22:44:38] well... that was obscure as a hell
[22:44:46] I think I have it fixed.
[22:44:52] It only happens when a score is in the cache. :)
[22:47:51] http://ores-experiment.wmflabs.org/v2/scores/enwiki/?models=damaging|goodfaith&revids=729078817
[22:47:53] Amir1, ^
[22:47:55] Works now
[22:48:13] it's awesome we found beforehand
[22:48:27] +1 would have been very frustrating if not found.
[22:49:25] okay, I thought you want to introduce v3 for changing place of revision and model
[22:51:54] halfak: I think we need to change our config repos too. Is that correct?
[22:52:01] Yes
[22:52:09] hmm
[22:52:13] Amir1, I didn't do v3, but it wouldn't be hard.
[22:52:33] no worries, I just thought this way
[22:52:56] All of the hard infra for v3 is ready.
[22:53:19] It will enable you to do feature injection in all paths.
[22:53:26] halfak: once it got deployed, remind me to fix precaching
[22:53:36] Will do. Also, precaching will change.
[22:53:47] We'll want to have precaching group models in its requests.
[22:54:05] That will take advantage of the changes I made for multi-model scoring
[22:54:11] no, I'm talking about the refactor because right now we senddifferent requests for each model
[22:54:18] * halfak is looking forward to reducing precaching load by a factor of 3-4.
[22:55:02] halfak: btw. this is one of the things I've been in the past couple of weeks. https://gerrit.wikimedia.org/r/#/c/299284/
[22:55:41] the tests that I wrote for this is the first step (I used it as an excuse to learn writing tests for mediawiki so I write it for the extension)
[22:55:43] \o/ Awesome. I had a researcher who was looking for a good way to track entity usage.
[22:56:09] the next step is a gui (probably I put it in action=info in GUI)
[22:56:32] https://en.wikipedia.org/wiki/Barak_Obama?action=info
[22:56:35] here
[22:56:44] nice lgbt flag in site notice of enwp
[22:57:32] https://en.wikipedia.org/wiki/Barack%20Obama?action=info
[22:58:00] Number of page watchers 2,940
[23:00:14] Wooo! Running precaching on ores-experimental. We're barely using 15% CPU :)
[23:01:09] Oh wait. I was doing something wrong.
[23:01:10] Retesting
[23:03:17] OK more like 40-70% CPU
[23:14:50] OK time to run to puppy training classes. See yall tomorrow
[23:14:50] o/
[23:22:33] o/
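
Note on the precaching discussion (22:53 to 22:54): the idea of grouping models into one request, instead of sending a separate request per model, can be sketched roughly as below. This is only an illustration built from the URL shape pasted at 22:47:51. The helper names are hypothetical, joining several rev IDs with "|" is an assumption, and the actual precaching code is not shown in this log.

```python
# Rough sketch, not the real precached code. Helper names are hypothetical.
# The URL layout mirrors the example pasted in the log at 22:47:51.


def build_grouped_score_url(base, wiki, models, rev_ids):
    """Build one v2 scores URL that asks for several models at once."""
    model_param = "|".join(models)
    # Assumption: multiple rev IDs are also separated by "|".
    revid_param = "|".join(str(rev_id) for rev_id in rev_ids)
    return "{0}/v2/scores/{1}/?models={2}&revids={3}".format(
        base, wiki, model_param, revid_param)


def build_per_model_urls(base, wiki, models, rev_ids):
    """Build one URL per model, i.e. the ungrouped pattern described at 22:54:11."""
    return [build_grouped_score_url(base, wiki, [model], rev_ids)
            for model in models]


if __name__ == "__main__":
    base = "http://ores-experiment.wmflabs.org"
    models = ["damaging", "goodfaith"]
    print(build_grouped_score_url(base, "enwiki", models, [729078817]))
    print(len(build_per_model_urls(base, "enwiki", models, [729078817])))
```

With N models per wiki, the grouped form sends one request per revision where the per-model form sends N, which is in line with the "factor of 3-4" reduction hoped for at 22:54:18 if a wiki has three or four models.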