[06:33:05] (CR) Thiemo Mättig (WMDE): [C: +1] Integrate with Special:Contributions (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[12:33:29] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete eswiki edit quality campaign - https://phabricator.wikimedia.org/T131963#2184357 (Johan) I posted a new [[ https://es.wikipedia.org/wiki/Wikipedia:Café/Archivo/Miscelánea/Actual#ORES | reminder ]].
[12:35:14] (PS15) Ladsgroup: Integrate with Special:Contributions [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[12:55:38] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete eswiki edit quality campaign - https://phabricator.wikimedia.org/T131963#2539728 (Johan) I've encouraged anyone who's interested to get in touch with you – a few more edit reviews are nice, but what would really help here...
[13:11:44] (PS16) Ladsgroup: Integrate with Special:Contributions [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[13:19:33] (PS17) Ladsgroup: Integrate with Special:Contributions [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[13:30:10] o/
[13:30:26] halfak: o/
[13:30:34] mw-revscoring.wmflabs.org/wiki/Special:Contributions/Someone
[13:30:46] (log in and enable ores beta feature)
[13:31:08] * halfak does
[13:31:26] Nice! Looks great with the full highlighted row
[13:32:48] Okay, I think I don't remember the yesterday discussion. Do you think we should hold on deploying to prod until you run experiment on precaching?
[13:34:27] halfak: ^
[13:35:20] Amir1, been running over night.
We could try a deploy today, but I'm in uber-meetings mode starting in 1.5 hours so I'd need you to monitor and stick around for a while.
[13:35:44] sure
[13:36:13] Is it in grafana or graphite
[13:36:20] It looks like our failure rate went up to 35% at one point :/
[13:36:28] Both?
[13:36:29] https://grafana-labs-admin.wikimedia.org/dashboard/db/ores-labs
[13:36:41] okay
[13:37:17] God damn. I hate that grafana sometimes can't connect to graphite
[13:39:56] by sometimes do you mean always?
[13:40:00] specially in labs
[13:41:46] Yes. :(
[13:51:13] Hmm... Any idea where the logs for precached end up?
[13:51:32] Amir1, ^
[13:51:38] syslog
[13:53:18] Amir1, were getting some weird errors.
[13:53:33] E.g. in precaching this URL returns "RevisionNotFound" https://ores.wmflabs.org/v2/scores/enwiki/?models=reverted|damaging|goodfaith&revids=733806957
[13:53:37] But now it returns a score.
[13:53:45] * halfak looks into the deletion logs.
[13:54:19] I saw things like this in ores extension too
[13:54:30] my wild guess is that the api acts crazy
[13:54:33] Do you think that maybe the API
[13:54:35] yeah...
[13:54:38] because it does sometimes
[13:56:20] And this guy timed out https://ores.wmflabs.org/v2/scores/dewiki/?models=reverted&revids=156874993
[13:56:25] Looks like it might timeout again
[13:56:30] Yup
[13:56:50] http://de.wikipedia.org/wiki/?diff=156874993
[13:57:01] * halfak waits while is browser locks up
[13:57:02] lol
[13:57:15] Wholey moley
[13:57:24] Holey moley?
[13:57:39] Whole lee Mole lee
[13:57:45] (Big page)
[13:58:30] oh yeah
[13:59:38] So, it looks like sometimes, the API is crazy
[14:04:04] halfak: https://phabricator.wikimedia.org/T141169#2500645
[14:04:13] these tips helped me a lot
[14:05:20] scaleToSeconds
[14:05:24] <3 that function
[14:11:19] Looks like our scoring speed is slow on labs
[14:11:29] I would expect the average score to be generated in 0.5 secs
[14:11:34] but we're at 1.2 secs.
[14:11:51] Oh crap. Are you editing too?
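The scaleToSeconds tip mentioned above refers to a Graphite render function that normalizes per-interval datapoints to a per-N-seconds rate. A rough Python sketch of the arithmetic it performs (the helper name and the fixed retention step are illustrative assumptions, not the actual Graphite implementation):

```python
# Toy re-implementation of the arithmetic behind Graphite's
# scaleToSeconds(series, N): each datapoint, collected over a fixed
# retention step, is rescaled to a value per `to_seconds` seconds.
def scale_to_seconds(values, step_seconds, to_seconds=1):
    """Rescale per-interval counts to per-`to_seconds` rates."""
    return [v * to_seconds / step_seconds for v in values]

# e.g. request counts gathered at 60-second resolution,
# converted to requests per second:
per_minute = [120, 60, 180]
per_second = scale_to_seconds(per_minute, step_seconds=60)
print(per_second)  # [2.0, 1.0, 3.0]
```

This is handy for dashboards like the ores-labs one above, where raw counters depend on the collection interval and only per-second rates are comparable across panels.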
[14:11:53] Amir1, ^
[14:12:00] We're overwriting each other's changes
[14:12:17] I did some very small changes
[14:12:21] nothing important
[14:12:56] kk
[14:13:00] They have been overwritten :(
[14:13:04] Want to try again?
[14:13:06] I'll duck out
[14:13:10] Make sure to refresh first
[14:13:17] nope, i'll do it later
[14:14:26] I'm going to look into the scoring speed issue quickly.
[14:27:25] Revision-Scoring-As-A-Service, ORES: Extrapolate memory usage per worker forward 2 years - https://phabricator.wikimedia.org/T142046#2539886 (akosiaris) Assuming in about 2 years most models and wikis have been added, 2.1GB of memory per worker is a pretty good guideline. I suppose the number of workers...
[14:39:18] OK. I've confirmed that the cache-sharing between model evaluations is working as expected.
[14:41:00] o/ akosiaris. Thanks for the comment. Would you start us a new task for building up the hardware request?
[14:41:26] halfak: I am actually writing it right now
[14:41:32] COol!
[14:41:33] :)
[14:41:42] Happy to see this moving forward.
[14:41:59] I hear that budget is going to be an issue. I'll be pushing on that with Dario & Wes.
[14:54:59] yeah, I was afraid of that
[14:59:56] halfak: https://phabricator.wikimedia.org/T142578#2539946
[15:00:01] it's probably gonna take a while
[15:00:17] especially if budget is going to be an issue
[15:08:45] splitting web and worker nodes will cost us some trouble in puppet config but that's not something we should be worried about now
[15:13:50] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete eswiki edit quality campaign - https://phabricator.wikimedia.org/T131963#2539980 (Halfak) Indeed. One carrot that I'd like to dangle is the [ORES review tool](https://www.mediawiki.org/wiki/ORES_review_tool). We can only...
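The 2.1 GB-per-worker guideline from T142046 turns the hardware-request sizing into simple division; a back-of-the-envelope sketch (the node RAM and OS/web-node reserve here are made-up numbers for illustration, not figures from the actual request):

```python
import math

# Hypothetical capacity estimate: how many celery workers fit on one
# node, given the ~2.1 GB/worker guideline discussed in T142046.
GB_PER_WORKER = 2.1   # projected memory per worker (from the task)
NODE_RAM_GB = 64      # assumed node size (illustrative)
RESERVED_GB = 8       # assumed headroom for OS, web service, etc.

workers = math.floor((NODE_RAM_GB - RESERVED_GB) / GB_PER_WORKER)
print(workers)  # 26
```

The same division, run against the real node sizes, is presumably what feeds the worker counts in the T142578 hardware request.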
[15:14:28] Amir1, rather, it will require us to fix the puppet config so it's not crazy ;)
[15:17:06] :D
[18:28:53] halfak: I just added you to an event, half an hour from now for deploying ores
[18:29:41] Amir1, OK. If I'm running backup, that should work for me.
[18:29:59] okay
[18:30:00] great
[18:50:36] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Puppet: Move vagrant role to use ores in production - https://phabricator.wikimedia.org/T142618#2541130 (Ladsgroup)
[19:00:39] o/ Amir1
[19:00:44] Do we have a window now?
[19:00:46] hey!
[19:00:49] yup
[19:00:54] Great. I'm standing by.
[19:01:26] halfak: oh, my bad, It is in one hour from now
[19:01:32] facepalm
[19:01:34] Uh oh.
[19:01:46] * Amir1 punching myself in the face
[19:01:50] That can work, but I'll have to drive across the city right now.
[19:01:50] :D
[19:02:00] Only softly for effect. No black eyes ;)
[19:02:17] * halfak works toward getting packed up.
[19:03:32] :D
[19:03:42] I can't even if I wanted to
[19:07:05] Revision-Scoring-As-A-Service, ORES: [Investigate] Periodic redis related errors in wmflabs - https://phabricator.wikimedia.org/T141946#2541181 (-jem-) @Ladsgroup: Probably because I used the first URL I found when searching, and since then until now I hadn't realized that was a better choice. I have cha...
[19:29:24] OK walking out the door.
Might not be there right when you start Amir1, but it should be soon
[19:29:41] Don't worry
[19:29:44] I've got it
[19:42:42] (CR) Ladsgroup: [C: +2] Integrate with Special:Contributions [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[19:43:45] (Merged) jenkins-bot: Integrate with Special:Contributions [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[20:27:52] Revision-Scoring-As-A-Service, ORES: Make scb1002.eqiad.wmnet the canary node for ORES - https://phabricator.wikimedia.org/T142630#2541503 (Ladsgroup)
[21:18:10] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy edit quality models for plwiki - https://phabricator.wikimedia.org/T130292#2541738 (Ladsgroup) a: Ladsgroup
[21:59:25] Revision-Scoring-As-A-Service, ORES, Puppet: Change CP to do several models at once. - https://phabricator.wikimedia.org/T142360#2541943 (Ladsgroup) https://github.com/wikimedia/change-propagation/pull/78
[22:13:54] Amir1, so, our total memory usage went from 45.4GB to 26.2GB on scb1001
[22:14:00] Probably the same on scb1002
[22:14:08] So... I don't know why memory pressure didn't fall
[22:14:31] If I'm right, we should be able to start up ~10 more workers per node
[22:14:32] hmm
[22:14:40] Maybe 8 to be safe
[22:14:58] I can do that
[22:15:15] It looks like we have 24 celery workers now, right?
[22:15:18] but first, let's investigate this
[22:15:22] halfak: yes
[22:15:34] we'll go to 32 very soon
[22:15:40] Weird. So, I'm guessing that something came out of swap to take up the available RAM.
[22:17:23] Hmm...
We're apparently using 160% of RAM right now
[22:17:38] That can't be right
[22:17:57] (this is what I get if I add up the MEM% column)
[22:18:01] I generally work from RES
[22:18:27] "RES stands for the resident size, which is an accurate representation of how much actual physical memory a process is consuming."
[22:18:33] --http://mugurel.sumanariu.ro/linux/the-difference-among-virt-res-and-shr-in-top-output/
[22:18:35] interesting
[22:19:20] Yeah. And re. RES, we dropped almost 19GB
[22:19:39] Bah. I need to run.
[22:20:01] I think our next step should be to consult some ops specialist about our observed change in memory usage and the state of the machine
[22:20:09] E.g. akosiaris
[22:20:23] Maybe he'd know right away what's going on and whether or not this is a victory
[22:20:26] yes
[22:20:29] Anyway, I'm out of here
[22:20:31] Have a good one!
[22:20:33] o/
[22:20:36] have fun
[22:20:37] o/
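One plausible reason the summed MEM% column exceeds 100%: top's per-process RES includes pages that forked celery workers share copy-on-write with their parent (model data, for instance), so the same physical pages get counted once per worker. A toy illustration with made-up numbers, not measurements from scb1001:

```python
# Why per-process RSS sums can exceed physical RAM: each of 24 forked
# workers *reports* the same resident set, but a chunk of it is data
# shared copy-on-write with the parent, so it lives in RAM only once.
WORKERS = 24
RSS_PER_WORKER_GB = 1.8   # what top shows per process (illustrative)
SHARED_GB = 1.2           # CoW-shared portion of that RSS (illustrative)

naive_total = WORKERS * RSS_PER_WORKER_GB
real_total = WORKERS * (RSS_PER_WORKER_GB - SHARED_GB) + SHARED_GB

print(round(naive_total, 1))  # 43.2 -- what summing MEM%/RES suggests
print(round(real_total, 1))   # 15.6 -- actual physical footprint
```

Under this reading, a big drop in summed RES after improving cache-sharing would not show up one-for-one in machine-level memory pressure, which matches the confusion above; an ops specialist looking at smaps/PSS figures could confirm or refute it.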