[01:47:32] <wikibugs>	 10Scoring-platform-team, 10revscoring, 10Chinese-Sites, 10artificial-intelligence: Chinese language utilities - https://phabricator.wikimedia.org/T109366#3811368 (10Shizhao) chinese badwords: https://github.com/pychen0918/bad-words-chinese chinese informal words: https://resources.allsetlearning.com/chines...
[01:59:23] <wikibugs>	 10Scoring-platform-team, 10ChangeProp, 10ORES, 10Services (doing): Change ORES rules to send all events to new "/precache" endpoint - https://phabricator.wikimedia.org/T158437#3811372 (10Pchelolo) @Ladsgroup Actually, I don't think it works. Trying to POST with curl to ORES I get 500 and the following stac...
[02:33:01] <wikibugs>	 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10ORES, 10Documentation: Elaborate documentation on how to deploy ORES to a new wiki - https://phabricator.wikimedia.org/T182054#3811391 (10awight)
[02:33:57] <wikibugs>	 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10ORES, 10Documentation: Elaborate documentation on how to deploy ORES to a new wiki - https://phabricator.wikimedia.org/T182054#3811405 (10awight)
[02:34:01] <wikibugs>	 10Scoring-platform-team, 10Bad-Words-Detection-System, 10revscoring, 10Patch-For-Review, 10artificial-intelligence: Experiment with using English Wikipedia models on Simple English - https://phabricator.wikimedia.org/T181848#3811404 (10awight)
[02:34:03] <wikibugs>	 10Scoring-platform-team, 10Bad-Words-Detection-System, 10revscoring, 10Patch-For-Review, 10artificial-intelligence: Experiment with using English Wikipedia models on Simple English - https://phabricator.wikimedia.org/T181848#3804284 (10awight)
[02:46:18] <wikibugs>	 10Scoring-platform-team, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3811413 (10awight) https://github.com/wiki-ai/ores/pull/238
[02:47:59] <travis-ci>	 wiki-ai/ores#879 (extras - 94722ee : Adam Roses Wight): The build passed. https://travis-ci.org/wiki-ai/ores/builds/311648006
[02:55:47] <wikibugs>	 10Scoring-platform-team, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3811420 (10awight) After chatting with @Tgr in IRC, I'm thinking that we should refactor our vagrant role to work on the ores-prod-deploy repo.  This gives us the best approximation of production...
[02:57:06] <wikibugs>	 10Scoring-platform-team, 10JADE, 10MediaWiki-Vagrant: Vagrant role for JADE - https://phabricator.wikimedia.org/T182055#3811422 (10awight)
[02:57:26] <wikibugs>	 10Scoring-platform-team, 10MediaWiki-Vagrant, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3811434 (10awight)
[07:59:58] <refeed[m]>	 Hi, I got a psycopg2.ProgrammingError, when trying to fetch user with campaigns data in http://localhost:8080/users/555755/?campaigns , seems like wikilabels in wmflabs also has the same problem (http://labels.wmflabs.org/users/100/?campaigns).
[08:00:30] <refeed[m]>	 This is the traceback: https://dpaste.de/tA18#L
[08:05:13] <refeed[m]>	 btw, may I send my patch for this :D ?
[13:54:53] <awight>	 halfak: Amir1: lmk what you think about this idea, https://phabricator.wikimedia.org/T181850#3811420
[14:19:06] <wikibugs>	 10Scoring-platform-team, 10Bad-Words-Detection-System, 10revscoring, 10Patch-For-Review, 10artificial-intelligence: Experiment with using English Wikipedia models on Simple English - https://phabricator.wikimedia.org/T181848#3812701 (10awight) Ran into an issue:  This test change, https://simple.wikipedi...
[14:36:17] <wikibugs>	 10Scoring-platform-team, 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10monitoring: Make an ORES service log dashboard for logstash-beta - https://phabricator.wikimedia.org/T182005#3812742 (10awight) 05Open>03Resolved a:03awight It needs refinement, but here's a rough pass which just shows al...
[14:58:40] <awight>	 o/
[14:59:41] <halfak>	 o/
[15:03:48] * halfak reads vagrant proposal
[15:04:11] <halfak>	 ores-prod-deploy uses a lot of memory for a dev machine. 
[15:05:08] <halfak>	 We can use 99-local.yaml to limit the set of models to a very small set. 
[15:05:42] <halfak>	 awight, what is meant by "It would be nice if each submodule could be split into its own role,"
[15:05:53] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Wikimedia-Incident: Document Nov 28-29 ORES outage - https://phabricator.wikimedia.org/T182101#3812850 (10awight)
[15:06:30] <awight>	 halfak: That sounds like a good compromise, and the developer can selectively reenable models as needed.
[15:06:36] <halfak>	 +1 
[15:07:06] <halfak>	 You can easily do that with: https://phabricator.wikimedia.org/source/ores-deploy/browse/master/config/00-main.yaml;6baed71ef9a02264660f8637fb2313c9c560f71d$39
[15:07:10] <halfak>	 Just select the wikis you want. 
[15:08:07] <awight>	 That last bit is esoteric, I was just musing about how the model repos are orthogonal and it would be nice to be able to just install and develop on draftquality without the other stuff.  Your config hack solves that well enough.
[15:09:22] <wikibugs>	 10Scoring-platform-team, 10Bad-Words-Detection-System, 10revscoring, 10Patch-For-Review, 10artificial-intelligence: Experiment with using English Wikipedia models on Simple English - https://phabricator.wikimedia.org/T181848#3812875 (10Halfak) https://simple.wikipedia.org/w/index.php?diff=3266888 doesn't...
[15:09:40] <awight>	 You agree with the general idea though, that we should make mw-vagrant much more productiony?
[15:15:11] <wikibugs>	 10Scoring-platform-team, 10ChangeProp, 10ORES, 10Services (doing): Change ORES rules to send all events to new "/precache" endpoint - https://phabricator.wikimedia.org/T158437#3812912 (10Halfak) Woops!  I see the problem.  I'm looking into it.
[15:22:57] <andre__>	 Amir1, Hey! Could you reply to https://phabricator.wikimedia.org/T171083#3790710 when you find some time? Because I'd love to see that as some GCI tasks :) TIA!
[15:24:28] <wikibugs>	 10Scoring-platform-team, 10Bad-Words-Detection-System, 10revscoring, 10Patch-For-Review, 10artificial-intelligence: Experiment with using English Wikipedia models on Simple English - https://phabricator.wikimedia.org/T181848#3812972 (10awight) Aha, thanks!  On to the next puzzle.  All four thresholds wer...
[15:29:34] <wikibugs>	 10Scoring-platform-team, 10MediaWiki-extensions-ORES: Cached thresholds should be purged when model version is incremented - https://phabricator.wikimedia.org/T182111#3813051 (10awight)
[15:35:39] <awight>	 I don’t really like the last week thing in our “current work” etherpad, but it’s easy to not look at so whatevs
[15:36:29] <halfak>	 What's the problem?
[15:36:41] <halfak>	 What don't you like about it? 
[15:37:22] <awight>	 I like the idea of a finite page is all.  We could shuffle over to an /Archive maybe?
[15:37:36] <halfak>	 "Finite"?
[15:38:00] <awight>	 yah like one that represents just the now and immediate future, rather than all of history
[15:38:14] <halfak>	 I'm confused about "last week" thing in our current work etherpad. 
[15:38:14] <awight>	 Also, I guess I used the wrong assumptions at first, cos I’ve been deleting items which especially makes my “past weeks” useless
[15:38:30] <halfak>	 Where do you see "last week"?
[15:38:32] <awight>	 starting at line 83
[15:38:49] <halfak>	 Right.  You don't like the history? 
[15:38:56] <halfak>	 Does it cause it to load up slow? 
[15:38:57] <awight>	 It’s really petty of me and I’m fine with however other people like to work...
[15:39:22] <halfak>	 Makes sense to me to keep adding to the top and archive when it causes a problem (usually around 2-3k lines)
[15:39:33] <awight>	 sure but I like to have my list get shorter, rather than struck-through stuff lingering
[15:40:16] <awight>	 It’s probably easier to explain by voice.
[15:40:22] <halfak>	 awight, it's your list 
[15:40:32] <halfak>	 delete that shit
[15:40:40] <awight>	 But for example, my line 53 has some finished items and one outstanding item
[15:40:52] <awight>	 it doesn’t work to move the struck-through pieces to last week or anything
[15:40:53] <halfak>	 Right.  
[15:41:12] <halfak>	 Oh.  So this isn't about keeping stuff from last week. 
[15:41:19] <halfak>	 It's about not deleting stuff from this week?
[15:41:19] <awight>	 so the options are just * delete old crap or * leave it struck-through until entire tasks are done
[15:41:38] <awight>	 no worries—as long as no one else cares, I’ll maintain however works for me
[15:41:59] <awight>	 :) but then “last week” is actually a snapshot of the beginning of this week!
[15:42:03] <halfak>	 awight, I've been removing struck out stuff that was covered in last week. 
[15:42:09] <halfak>	 awight, right!
[15:42:09] <awight>	 meh
[15:42:31] <halfak>	 e.g. multiweek items get smaller if they carry over to the next week. 
[15:42:39] <awight>	 haha okay on to business.  Hey this TODO format finally got me to respond to some of your JADE talk topics!
[15:42:44] <halfak>	 I feel like it's useful to just communicate what I'm working on this week. 
[15:42:51] <halfak>	 :) 
[15:43:01] <awight>	 +1 just holler if my deletionism gets to be confusing
[15:45:03] <wikibugs>	 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10ORES, 10Documentation: Elaborate documentation on how to deploy ORES to a new wiki - https://phabricator.wikimedia.org/T182054#3813174 (10awight)
[15:50:32] <wikibugs>	 10Scoring-platform-team, 10ChangeProp, 10ORES, 10Services (doing): Change ORES rules to send all events to new "/precache" endpoint - https://phabricator.wikimedia.org/T158437#3813178 (10Halfak) https://github.com/wiki-ai/ores/pull/239
[15:52:00] <wikibugs>	 10Scoring-platform-team (Current), 10Wikidata, 10Patch-For-Review, 10User-Ladsgroup, 10Wikidata-Sprint-2017-11-22: ORES thresholds for Wikidata is too strict - https://phabricator.wikimedia.org/T180450#3813186 (10thiemowmde) p:05Triage>03Low
[15:52:02] <wikibugs>	 10Scoring-platform-team, 10Operations, 10Wikimedia-Logstash, 10monitoring, 10Wikimedia-Incident: Send celery and wsgi service logs to logstash - https://phabricator.wikimedia.org/T181630#3813187 (10awight) Celery is now logging verbosely to /srv/log/ores/app.log, please wire that into logstash.
[15:52:42] <travis-ci>	 wiki-ai/ores#881 (fix_precache - a569d6d : halfak): The build passed. https://travis-ci.org/wiki-ai/ores/builds/311927045
[15:52:47] <halfak>	 \o/
[15:53:00] <halfak>	 I should add a pre-cache route to the ore API test. 
[15:55:59] <awight>	 +1 for coverage
[15:56:15] <awight>	 We should probably set aside a day for increasing test coverage together.
[16:02:58] <awight>	 FYI, I’m running a stress test on the new cluster for fun.
[16:03:27] <halfak>	 Cool
[16:03:45] <halfak>	 Amir1, I'm working on a test case for that.  Will submit a follow-up
[16:04:14] <Adotchar>	 Halfak
[16:04:18] <Adotchar>	 Can you block IP’s
[16:04:29] <halfak>	 Not directly. 
[16:04:33] <Adotchar>	 Okay.
[16:11:10] <Amir1>	 halfak: hey, cool
[16:19:32] <halfak>	 Damn rounding errors
[16:23:41] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3813376 (10awight)
[16:31:52] <halfak>	 Amir1, https://github.com/wiki-ai/ores/pull/240
[16:38:41] <halfak>	 passed testing :) 
[16:38:47] <halfak>	 back in a bit
[16:48:31] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3813464 (10awight) Running a low-ish test at 1,200 req/min, https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=151249012992...
[16:57:16] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3813527 (10awight) I got this shred of stack trace from `service celery-ores-worker status -l`, but can't get at anything more with m...
[16:57:56] <awight>	 halfak: fyi I can reproduce the nodes dying during a low-level stress test, but can’t find *any* diagnostics about what’s happening.
[16:59:24] <halfak>	 With celery 4?
[16:59:25] <awight>	 Good news is that each server currently has a ceiling of about 700 req/min (if they weren’t dying), which would add up to about 6k req/min, and our highest traffic ever is 4k req/min
[16:59:28] <awight>	 no, celery 3.
[16:59:45] <awight>	 This is the same behavior we were seeing on production, BTW.
[16:59:50] <halfak>	 Not really
[16:59:57] <awight>	 I just don’t know if it was OOM or what.
[16:59:59] <halfak>	 We haven't seen celery nodes dying in a long time. 
[17:00:11] <awight>	 sure we have, with OOM
[17:00:13] <awight>	 just last week.
[17:00:20] <halfak>	 Right.  That was weird. 
[17:00:21] <halfak>	 And new
[17:01:10] <awight>	 https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=1511887901770&to=1511923909519&panelId=3&fullscreen
[17:02:06] <awight>	 I don’t think that’s true.  Here’s an example from April, and I can find more: https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=1492515342917&to=1492883503801&panelId=3&fullscreen
[17:02:38] <awight>	 https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=1491790074080&to=1492086112601&panelId=3&fullscreen
[17:03:02] <awight>	 https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=1497964572262&to=1498219041327&panelId=3&fullscreen
[17:03:11] <awight>	 It seems to be a thing.
[17:04:16] <halfak>	 New to me then, I guess. 
[17:04:26] <halfak>	 Check the syslog for OOM
[17:05:06] <awight>	 I think it’s OOM, which we can see with this nice new graphs: https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=1512492553537&to=1512492965200&panelId=24&fullscreen&refresh=10s
[17:05:29] <halfak>	 Ahh yup
[17:05:43] <awight>	 I’ll ask to tune down the workers?
[17:05:53] <awight>	 Not sure how to make a good guess though.
[17:06:15] <halfak>	 Weren't we not even close to OOM before?
[17:07:00] <Amir1>	 halfak: sorry, was afk for dinner
[17:07:04] <Amir1>	 {{merged}}
[17:07:05] <AsimovBot>	 10[3] 04https://meta.wikimedia.org/wiki/Template:merged
[17:07:45] <awight>	 halfak: Check this out, https://phabricator.wikimedia.org/T181544
[17:07:50] <awight>	 The formula was flipped.
[17:07:57] <halfak>	 Ahh
[17:07:58] <awight>	 total - free = available
[17:08:12] <halfak>	 That formula isn't flipped. 
[17:08:21] <awight>	 total - free?
[17:08:31] <halfak>	 diffseries(total, free) == used
[17:08:45] <awight>	 They were labeled “available memory"
[17:08:54] <halfak>	 Ahh
[17:08:55] <awight>	 and we had no way of seeing the memory ceiling
[17:09:07] <awight>	 now they graph just "free"
[17:10:07] <halfak>	 Makes sense. 
[17:11:08] <halfak>	 Thanks Amir1!  Just saw your merge :) 
[17:15:57] <Amir1>	 halfak: yw :)
[17:42:22] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3813675 (10awight) I'm pretty sure it's just an OOM, still it would be nice to be able to read more logs.  The available memory graph...
[17:42:33] <awight>	 tl;dr, To estimate the number of "grown" workers we can support: 57.5GB free / (72MB idle + 400MB working) = 125 workers. Slightly conservatively, let's try 100 workers.
[17:47:48] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3813703 (10Halfak) This is probably not a memory leak.  Workers increase in memory because they variable amounts of data for performi...
[17:48:32] <awight>	 halfak: ^ thanks for weighing in there.  I just realized I can pop over to real production to get the real memory use.
[17:50:07] <awight>	 great, it’s less than my estimate.
[17:50:12] <halfak>	 :) 
[17:50:24] <awight>	 The mode is ~1.27GB
[17:51:32] <awight>	 The growth makes quite a big difference, even using that number, actual memory use goes up by about 4x for each worker.
[17:51:46] <awight>	 we should patch `top` while we’re at it...
[17:55:02] <awight>	 https://www.selenic.com/smem/
[17:55:44] <Zppix>	 O/
[18:01:59] <halfak>	 OMG meetings done. 
[18:02:05] <halfak>	 NOO 
[18:02:07] <halfak>	 Another meeting
[18:02:10] <halfak>	 A 1.5 hour meeting
[18:02:11] <halfak>	 why
[18:02:13] <halfak>	 why?
[18:02:16] <halfak>	 :'( 
[18:08:46] <Zppix>	 halfak: if youll take my exams for my ccna certification ill go to your meeting :P
[18:16:40] <paladox>	 what is ccna?
[18:18:39] <Zppix>	 Cisco certified network admin
[18:27:04] <bearloga>	 last night I came across this video of a junior-now-senior-in-HS giving a presentation on teaching a tensorflow-built deep neural net to play chess https://www.youtube.com/watch?v=bJfqn4Ysvsk and, like, wow…kids these days
[18:30:52] <refeed[m]>	 wow
[18:31:26] <Zppix>	 Wow
[19:40:39] <awight>	 halfak: You think it’s acceptable to holler in -operations to ask for a merge of https://gerrit.wikimedia.org/r/#/c/395579/ ?
[19:40:47] <awight>	 I’m unclear on ops ettiquete
[19:42:25] <awight>	 Can’t mess with mutante’s IRC highlighting patterns
[19:43:35] <mutante>	 done :)
[19:44:22] <mutante>	 that being said.. in theory i want us to never need the "beg on IRC" thing because everybody just reacts to being added on Gerrit :)
[19:44:40] <mutante>	 but i also know that's not always realistic
[19:45:22] <mutante>	 i do check the gerrit incoming queue in web ui and see that in web ui (not email)
[19:45:38] <awight>	 mutante: In Alex’s defense, he’s in UTC+1 or something, so I was only fishing cos he can’t possibly be working still.
[19:46:02] <awight>	 hehe but you can be on my gerrit mailing list
[19:46:27] <mutante>	 heh! ok.  And yes, Alex is really good at it
[19:46:47] <mutante>	 (noticing gerrit)
[20:02:21] <wikibugs>	 10Scoring-platform-team (Current), 10Wikimedia-Incident: How can we test all the wiki/page combinations that can be affected by ORES? - https://phabricator.wikimedia.org/T181830#3814130 (10greg)
[20:02:36] <halfak>	 o/
[20:02:42] <halfak>	 Sorry was AFK for lunch and forgot to say
[20:03:14] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Scap, 10Release-Engineering-Team (Next): scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3814132 (10greg)
[20:05:15] <wikibugs>	 10Scoring-platform-team (Current), 10GitHub-Mirrors, 10ORES: Disconnect scoring repos to stop mirroring from GitHub - https://phabricator.wikimedia.org/T181851#3814134 (10greg) really UBN!? And is it done done?  (Also, please be careful of what tags are dragged along when creating sub-tasks.)
[20:06:42] <wikibugs>	 10Scoring-platform-team, 10Gerrit, 10ORES, 10Operations, and 2 others: Plan migration of ORES repos to git-lfs - https://phabricator.wikimedia.org/T181678#3814151 (10greg)
[20:06:54] <wikibugs>	 10Scoring-platform-team, 10Documentation, 10Wikimedia-Incident: Document ORES architecture from a robustness perspective - https://phabricator.wikimedia.org/T181831#3814153 (10greg)
[20:06:57] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Scap, 10Release-Engineering-Team (Watching / External): ORES should use a git large file plugin for storing serialized binaries - https://phabricator.wikimedia.org/T171619#3814156 (10Halfak)
[20:06:59] <wikibugs>	 10Scoring-platform-team (Current), 10GitHub-Mirrors, 10ORES: Disconnect scoring repos to stop mirroring from GitHub - https://phabricator.wikimedia.org/T181851#3814154 (10Halfak) 05Open>03Resolved I think UBN was reasonable for this one.  We realized that, should we have an issue in prod, our hands would...
[20:07:04] <wikibugs>	 10Scoring-platform-team (Current), 10Wikimedia-Incident: Improvements to ORES deployment documentation and process - https://phabricator.wikimedia.org/T181183#3814157 (10greg)
[20:07:08] <awight>	 halfak: \p/ https://grafana-admin.wikimedia.org/dashboard/db/ores?orgId=1&panelId=3&fullscreen&from=1512503519630&to=1512504359630
[20:07:34] <wikibugs>	 10Scoring-platform-team (Current), 10GitHub-Mirrors, 10ORES: Disconnect scoring repos to stop mirroring from GitHub - https://phabricator.wikimedia.org/T181851#3814159 (10greg) Cool, thanks!
[20:08:35] <halfak>	 awight, looking.  Is the success having workers stay online. 
[20:09:30] <awight>	 yes!
[20:09:55] <awight>	 argh doesn’t look good, actually.
[20:10:02] <awight>	 oh it does
[20:10:04] <awight>	 weird stuff
[20:10:32] <halfak>	 Nice.  Still overloaded :/  But that's a nice high number to be overloaded on
[20:10:55] <awight>	 Looks like we can increase the # of workers again
[20:11:55] <awight>	 4.6k req/min!
[20:12:09] <halfak>	 ^ *scores*
[20:12:18] <halfak>	 That's important because cached stuff is cheap :) 
[20:12:30] <halfak>	 And if we expect 60-80% cache hits
[20:13:32] <halfak>	 30GB available per worker node O.O
[20:13:55] <awight>	 So I found a long-term average of 522 req/min
[20:14:11] <awight>	 and peak of 4k for a day but that never happened again
[20:14:30] <halfak>	 awight, that might have caused an overload
[20:14:40] <awight>	 good—I was intentionally overshooting
[20:14:53] <halfak>	 Depending on when it happened, we know that our overload metrics were borked for a bit. 
[20:14:56] <awight>	 it went away, though!
[20:18:31] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3814202 (10awight) Downsizing my estimate.  With 100 workers/box, we're hovering around 28GB free, so (57.5GB - 28GB) / 100 workers =...
[20:19:02] <halfak>	 awight, did we ever push our max queue size on the ores* boxes?
[20:19:12] <halfak>	 Our max queue size is what engages the overload state. 
[20:19:24] <halfak>	 Old queue was 100 score_requests
[20:19:31] <awight>	 It’s currently set to 200
[20:19:38] <halfak>	 OK good. 
[20:19:55] <awight>	 We really don’t need to increase workers, but I’ll do it for fun.
[20:20:45] <halfak>	 New queue should be roughly (95% score time/10 seconds)*worker_processes
[20:21:12] <halfak>	 Woops that shoulds have been 10 seconds / 95% score time
[20:21:26] <halfak>	 so (4 * worker_processes)
[20:21:49] <halfak>	 Theoretically, the queue can fully clear in 10 seconds 95% of the time. 
[20:22:40] <awight>	 ok I’ll blindly set that in this patch
[20:23:03] <awight>	 150 workers, backlog of 600
[20:23:22] <awight>	 https://gerrit.wikimedia.org/r/#/c/395608/
[20:25:43] <halfak>	 +1
[20:25:46] <awight>	 increasing the stress x5 to see what happens
[20:26:14] <awight>	 The waves are strange
[20:27:23] <awight>	 Can we install more memory btw?
[20:27:34] <awight>	 the CPUs stay pretty cold
[20:28:06] <halfak>	 awight, regretfully not.  These machines are to become standard kubernetes nodes eventually. 
[20:28:28] <halfak>	 Maybe that means eventually we'll be able to build a more flexible container that can use lots of ram but little CPU. 
[20:28:31] <awight>	 ah k
[20:29:33] <awight>	 We might be able to lower our memory usage, too.
[20:29:37] <awight>	 It’s not the models.
[20:30:23] <awight>	 Just in-flight stuff, that adds up too much.  Shouldn’t need an average of 300MB eh
[20:30:32] <awight>	 maybe there’s other memory we can warm up before forking, too!
[20:31:19] <awight>	 This is strange—the much higher stress hasn’t caused any overload yet, but should have.
[20:31:25] <awight>	 it’s at —delay 0.001
[20:31:48] <awight>	 That’s 60k per minute, but only 3.7k are being processed.
[20:32:36] <halfak>	 hmm... maybe there's a limitation to how fast requests can be produced using the strategy I have. 
[20:32:52] <halfak>	 awight, try two parallel stress tests ;) 
[20:33:23] <awight>	 I have to run for now, back in a few hours.  I’ll turn this off, feel free to try it if you’re inspired!
[20:34:12] <halfak>	 kk thanks awight!
[20:51:32] <halfak>	 I'm going to go run outside and get the snow shoveled before it turns into perma-ice. 
[20:55:38] <wikibugs>	 10Scoring-platform-team: MediaWiki API query `rvprop=oresscores` does not always return `draftquality` - https://phabricator.wikimedia.org/T182146#3814293 (10Samtar)
[20:56:08] <wikibugs>	 10Scoring-platform-team, 10ORES: MediaWiki API query `rvprop=oresscores` does not always return `draftquality` - https://phabricator.wikimedia.org/T182146#3814306 (10Samtar)
[20:56:37] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3814308 (10akosiaris) >>! In T169246#3813527, @awight wrote: > I got this shred of stack trace from `service celery-ores-worker statu...
[20:57:08] <wikibugs>	 10Scoring-platform-team, 10ORES: MediaWiki API query `rvprop=oresscores` does not always return `draftquality` - https://phabricator.wikimedia.org/T182146#3814293 (10Samtar)
[21:00:09] <TheresNoTime>	 Wotcha clever people - is anyone around who knows a thing or two about ORES? Bonus points if that extends to API usage a la T182146
[21:00:10] <stashbot>	 T182146: MediaWiki API query `rvprop=oresscores` does not always return `draftquality` - https://phabricator.wikimedia.org/T182146
[21:23:58] <halAFK>	 o/ TheresNoTime 
[21:24:13] <halfak>	 Had to step away to shovel the snow while there's still a little daylight
[21:30:45] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3814381 (10Halfak) Looks like that was before worker count was changed, right?   The last stress test started at 20:08.
[21:36:25] <TheresNoTime>	 Hi halfak, "lucky" to have snow! :)
[21:38:30] <halfak>	 heh sort of. 
[21:38:37] <halfak>	 TheresNoTime, so what's the question about ORE?
[21:39:25] <TheresNoTime>	 Summed up in T182146, but tl;dr is `rvprop=oresscores` is only sometimes returning draftquality (https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Main_page&formatversion=2&rvprop=oresscores vs https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Wikipedia&formatversion=2&rvprop=oresscores)
[21:39:26] <stashbot>	 T182146: MediaWiki API query `rvprop=oresscores` does not always return `draftquality` - https://phabricator.wikimedia.org/T182146
[21:39:28] <Zppix>	 halfak: where can i pull the results of the wikilabels campaign for simplewiki (when there is more progress)
[21:39:42] <halfak>	 results?
[21:39:52] <Zppix>	 Err data
[21:40:27] <halfak>	 bg
[21:40:39] <halfak>	 Um... you don't need to put it anywhere 
[21:40:40] <Zppix>	 Bg?
[21:41:12] <Zppix>	 halfak: i do actually i want to use that data for something else
[21:43:37] <halfak>	 Oh well you can download it right from labels.wmflabs.org when it is done. :) 
[21:48:09] <Zppix>	 Cool
[21:48:20] <TheresNoTime>	 halfak: any ideas as to why draftquality is only sometimes being returned? :-)
[21:49:21] <halfak>	 TheresNoTime, the revision was probably saved before we enabled draftquality
[21:50:57] <TheresNoTime>	 ah.. when was that enabled?
[21:52:22] <halfak>	 A few days ago. 
[21:52:23] <TheresNoTime>	 Hmm, I was getting drafts from https://en.wikipedia.org/wiki/Category:AfC_pending_submissions_by_age/Very_old, which probably pre-date draftquality
[21:52:35] <TheresNoTime>	 *definitely pre-date draftquality
[21:52:35] <halfak>	 TheresNoTime, if you want to access ORES directly, that might be better
[21:52:46] <halfak>	 see https://ores.wikimedia.org
[21:53:16] <halfak>	 E.g. https://ores.wikimedia.org/v3/scores/enwiki/813854746/draftquality
[21:55:02] <TheresNoTime>	 So, just to I'm clear, and this is probably over-simplified - `rvprop=oresscores` returns "pre-scored" values (and thus old revisions won't have draftquality scores), whereas `/v3/scores/enwiki/` is scoring live?
[21:55:07] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3814414 (10Halfak) It seems that the fastest we can send requests from one machine is about 4.5k/min or 75/sec.  I tried running two...
[21:55:29] <halfak>	 TheresNoTime, right. 
[21:55:34] <TheresNoTime>	 I'll definitely use https://ores.wikimedia.org then :)
[21:55:39] <halfak>	 OK :)  
[21:55:49] <halfak>	 Make sure you don't make more that 2 parallel requests at a time. 
[21:56:04] <halfak>	 And if you're making a tool, add it to the list of ORES tools :) 
[21:56:13] <Zppix>	 ^ why only 2?
[21:56:22] <halfak>	 https://www.mediawiki.org/wiki/ORES/Applications
[21:56:30] <halfak>	 Zppix, because you don't want to overload ORES. 
[21:56:30] <wikibugs>	 10Scoring-platform-team, 10ORES: MediaWiki API query `rvprop=oresscores` does not always return `draftquality` - https://phabricator.wikimedia.org/T182146#3814417 (10Samtar) 05Open>03Invalid This only occurs on revisions prior to `draftquality` being enabled, so is expected behaviour - oops!
[21:56:40] <halfak>	 api.php says to make only one request at a time. 
[21:56:44] <halfak>	 We can hndle two in parallel. 
[21:56:58] <Zppix>	 Says who maybe i wanna overload it xD
[21:57:33] * halfak bans zppix from ORES
[21:57:42] <TheresNoTime>	 halfak: plan is to scan through all the old AfC drafts and get a list of very badly scoring drafts - so it'll be one-by-one, but quite a few requests
[21:57:47] <TheresNoTime>	 they'll probably be batched though
[21:58:04] <TheresNoTime>	 will that cause any problems?
[21:58:12] <halfak>	 TheresNoTime, cool.  If you send us requests of 50 revid batches, that'll be the fastest
[21:58:23] <TheresNoTime>	 Sounds good :)
[21:58:25] <halfak>	 We have a python utility designed to help you if you want to use it. 
[21:58:44] <halfak>	 It's kind of poorly documented ATM :\
[21:59:19] <TheresNoTime>	 Oh, link?
[22:00:04] <halfak>	 https://github.com/wiki-ai/ores/blob/master/ores/api.py#L20
[22:00:50] <halfak>	 You'll need to be using python3, pip install ores (might fail.  See https://github.com/wiki-ai/revscoring#installation)
[22:00:57] <halfak>	 The from ores.api import Session
[22:01:25] <TheresNoTime>	 Only the best installation instructions include a "might fail" :-P
[22:01:55] <halfak>	 Heh.  But the link to what you might be missing as far as dependencies. 
[22:02:06] <halfak>	 Regretfully, pip can't install a bunch of unix libraries for you :/
[22:03:17] <TheresNoTime>	 Well thank you for the clarification ref the /other/ API call!
[22:03:32] <halfak>	 No problem TheresNoTime.  :) 
[22:18:45] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3814494 (10Halfak) OK nevermind.  It seems like we have another limit.  I ran a stress test on ores1001 and ores1002.  Here's what I...