[01:11:45] Revision-Scoring-As-A-Service, revscoring: Implement sentences datascources & experiment with normalization. - https://phabricator.wikimedia.org/T148867#2743671 (Halfak) OK. So I've been looking at getting clean sentences out of Wikipedia articles. I get a lot of nonsense around tables. It would be ni...
[06:25:54] Revision-Scoring-As-A-Service: Review ORES Grafana metrics - https://phabricator.wikimedia.org/T149015#2743907 (awight) Open>Resolved a:awight The new graphs look great!
[06:27:23] Revision-Scoring-As-A-Service: Review ORES Grafana metrics - https://phabricator.wikimedia.org/T149015#2743922 (awight) @Halfak You might want to review the TODOs at [[ https://www.mediawiki.org/wiki/ORES/Metrics | mw:ORES/Metrics ]] as well.
[08:13:20] halfak: seen this? https://freedom-to-tinker.com/2016/08/24/language-necessarily-contains-human-biases-and-so-will-machines-trained-on-language-corpora/
[14:59:23] Revision-Scoring-As-A-Service, ORES: Implement datasources_extracted metric - https://phabricator.wikimedia.org/T149199#2745036 (Halfak)
[14:59:31] Revision-Scoring-As-A-Service, ORES: Implement datasources_extracted metric - https://phabricator.wikimedia.org/T149199#2745051 (Halfak) https://github.com/wiki-ai/ores/pull/176
[15:00:32] Revision-Scoring-As-A-Service: Review ORES Grafana metrics - https://phabricator.wikimedia.org/T149015#2739711 (Halfak) {T149199}
[15:16:27] Revision-Scoring-As-A-Service: Review ORES Grafana metrics - https://phabricator.wikimedia.org/T149015#2745111 (Halfak) Also {{done}} for reviewing those todos.
[15:45:54] Revision-Scoring-As-A-Service-Backlog, ORES: Meta ORES: UI for reviewing how ORES classifies you and your stuff - https://phabricator.wikimedia.org/T148700#2745208 (Capt_Swing) @halfak this is probably part of the plan already, but wanted to lobby for it if it isn't: there are two use cases for refutatio...
[15:48:08] Revision-Scoring-As-A-Service-Backlog, ORES: Meta ORES: UI for reviewing how ORES classifies you and your stuff - https://phabricator.wikimedia.org/T148700#2745213 (Halfak) Agreed. I'd include edits (or pages or users) you are concerned about in "your stuff". I'd like to pursue this, but right now, it...
[15:50:16] Revision-Scoring-As-A-Service-Backlog, ORES: Meta ORES: UI for reviewing how ORES classifies you and your stuff - https://phabricator.wikimedia.org/T148700#2745219 (Capt_Swing) Would non-engineering support be helpful?
[15:52:21] Revision-Scoring-As-A-Service-Backlog, ORES: Meta ORES: UI for reviewing how ORES classifies you and your stuff - https://phabricator.wikimedia.org/T148700#2745221 (Halfak) Good Q. Yes! I'd like to get this type of system well documented before-hand. We might even start imagining where we'd put a proo...
[16:38:17] (PS8) Anomie: Action API integration for ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/313831 (https://phabricator.wikimedia.org/T143614)
[16:38:57] (CR) Anomie: "I'm hoping the watchlist patch will be merged soon enough to avoid having to split things here." [extensions/ORES] - https://gerrit.wikimedia.org/r/313831 (https://phabricator.wikimedia.org/T143614) (owner: Anomie)
[16:41:04] halfak: tiny question--did you intend to have both bars & lines? https://grafana.wikimedia.org/dashboard/db/ores?panelId=11&fullscreen&from=1477420624120&to=1477420972342
[16:42:50] And, a bigger question, I guess: celery worker threads seem to take about 1GB of resident memory, but only last a few seconds of CPU time. That seems like a lot of overhead.
[16:45:21] meh--just convinced myself that they last a few minutes of wall time, so whatever about the initialization cost.
[16:45:44] lol woops
[16:45:55] That looks pretty silly, huh?
[16:46:08] a bit ;)
[16:46:36] adamwight, yeah, we restart celery threads periodically because of some weird internal memory issues.
[16:46:49] But they'll score at least 100 times before restarting
[16:46:50] good idea
[16:46:52] Might be 1000
[16:47:41] Removed the bars from that plot
[16:47:50] btw, are we happy with the 1s median response time?
[16:47:56] I was going to try to profile a bit, but if that sounds like reasonable performance, I won't bother
[16:48:35] adamwight, it's not what I want, but it's pretty good.
[16:48:38] confirmed :)
[16:48:51] We have a profiler in `revscoring extract` that you can use to explore.
[16:49:21] It's hard to just use python's profile to make sense of things. Our profiler will tell you the time to solve each dependency.
[16:49:53] It works nicely for finding out how long it takes to get text (IO) and how long to process that text (CPU) for the two "datasources" that do that
[16:50:55] ok great
[16:50:58] While I'm causing ADHD, I was wondering... I made a request to the API, then didn't find my score in the redis cache...
[16:55:06] https://ores.wikimedia.org/v2/scores/enwiki/damaging/745065890/ should -> ores:enwiki:damaging:745065890:* AIUI
[16:55:38] meanwhile, my browser reports X-Cache:"cp1061 miss, cp2012 miss, cp4002 miss, cp4001 hit/1"
[16:55:39] which makes me feel funny.
[16:56:44] adamwight, need to include the version number of the model in the redis key
[16:56:46] Should we be investigating whether we're going through frontend cache by mistake?
[16:57:12] ores:enwiki:damaging:0.1.2:745065890
[16:57:18] yah I thought the wildcard at the end would cover that, e.g. "keys ores:enwiki:damaging:745065890:*"
[16:57:34] Ohh... does the version come after the ID?
[16:57:41] I thought
[16:57:51] Oh! It does!
[16:57:52] * adamwight reads code
[16:58:20] https://github.com/wiki-ai/ores/blob/master/ores/score_caches/redis.py#L42
[16:58:36] adamwight, are you connected to the right redis port?
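(Side note on the key layout settled above: the version comes after the revision ID, so a trailing wildcard does cover it. A minimal sketch of that, using only string formatting and glob matching rather than a live redis connection. The helper names `score_cache_key` and `score_cache_pattern` are made up for illustration and are not taken from the ORES code; the `0.1.2` version is the one quoted in the chat.)

```python
from fnmatch import fnmatchcase


def score_cache_key(wiki, model, rev_id, version):
    """Build a key in the layout ores:<wiki>:<model>:<rev_id>:<version>.

    Hypothetical helper; the layout follows the example keys in the chat.
    """
    return "ores:{0}:{1}:{2}:{3}".format(wiki, model, rev_id, version)


def score_cache_pattern(wiki, model, rev_id):
    """Glob that matches a revision's key whatever the model version is."""
    return "ores:{0}:{1}:{2}:*".format(wiki, model, rev_id)


key = score_cache_key("enwiki", "damaging", 745065890, "0.1.2")
pattern = score_cache_pattern("enwiki", "damaging", 745065890)

# The ordering first guessed in the chat (version before the revision ID).
wrong_order = "ores:enwiki:damaging:0.1.2:745065890"

print(key)
print(fnmatchcase(key, pattern))          # trailing wildcard covers the version
print(fnmatchcase(wrong_order, pattern))  # the guessed ordering would not match
```

Here `fnmatchcase` stands in for redis glob matching, which treats `*` the same way for keys like these.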
[16:58:51] we have a redis for celery and a redis for caching
[16:59:32] I believe: :6380 for the score cache
[16:59:35] I see lots of other keys using "randomkey"
[16:59:41] e.g. ores:viwiki:reverted:25399141:0.1.2
[17:00:22] It's... really strange though that my revision doesn't appear in the score cache.
[17:00:30] Agreed.
[17:00:34] The TTL on those entries is 15 years or something
[17:00:53] It uses an LRU strategy to stay within memory
[17:01:04] so even if it had been calculated and was being served from mystery varnish, there should still be a key
[17:01:07] oh. that might explain it then
[17:01:14] & would also confirm that we're being varnished.
[17:01:43] Hm... we shouldn't be being varnished, but it could be that we've not deployed the cache-avoiding headers yet
[17:01:43] Which is a problem cos it bypasses model version invalidation
[17:02:01] * adamwight is busy making problems to solve :)
[17:02:21] adamwight, right
[17:02:22] I don't think we should have to use anti-caching headers cos we aren't supposed to be included in the varnish "misc" config
[17:02:27] Looks like we need a deploy
[17:02:35] We have anti-caching headers merged.
[17:02:51] I've been working on a revscoring upgrade that required regenerating all of the models.
[17:02:56] So that's due
[17:03:10] I should be able to send that to WMFLabs today or tomorrow.
[17:03:38] Amir1 is currently looking into why a recent deploy that fixes some logging issues failed.
[17:03:42] So we're blocked on that.
[17:04:28] Example use of @nocache: https://github.com/wiki-ai/ores/blob/master/ores/wsgi/routes/v2/scores.py#L16
[17:04:34] kk that would be a good time to test this varnish theory
[17:04:36] although, I was going to simply delete the cached score from redis
[17:04:51] https://github.com/wiki-ai/ores/blob/master/ores/wsgi/util.py#L89
[17:05:15] O_o
[17:05:53] We would verify those headers by making a curl request from the labs server, maybe
[17:06:13] Oh!
You don't think they'd get passed through?
[17:07:09] I know the headers would be slightly changed, maybe those would pass through but I wouldn't trust it
[17:09:09] Seems like they should pass through to such down secondary caches -- like the browser's
[17:11:38] I'm going to go grab lunch. I'll read scrollback when I get back. Also, I think I'll summarize what's needed in order to get an ORES deploy done.
[17:11:44] until then,
[17:11:49] There are directives for the browser, and also for varnish
[17:29:58] halfak|Lunch: fwiw, it does look like our cache-control header should be disabling varnish caching: https://github.com/wikimedia/operations-puppet/blob/production/modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb#L331
[17:35:18] It's getting creepy. https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Misc%2520Web%2520caching%2520cluster%2520ulsfo&tab=m&vn=&hide-hf=false
[17:35:24] cp4001: down.
[17:35:34] * adamwight hard refreshes
[17:35:52] X-Cache:"cp1061 miss, cp2012 miss, cp4002 miss, cp4001 hit/2"
[17:56:05] Oh. oh
[17:56:26] I must be on the wrong boxes?
[17:56:38] Is this the labs or the production project? https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores
[18:02:19] Revision-Scoring-As-A-Service: Spike: Is Varnish caching ORES responses? - https://phabricator.wikimedia.org/T149223#2745755 (awight)
[18:03:50] Revision-Scoring-As-A-Service: Spike: Is Varnish caching ORES responses? - https://phabricator.wikimedia.org/T149223#2745773 (awight) I think I must be looking at staging boxes. Can someone help me access the production cluster?
[18:05:38] hmm varnish + nginx of ores caching
[18:06:01] ToAruShiroiNeko: hi!
[18:06:06] greetings
[18:06:19] yeah it's causing me a raised eyebrow
[18:06:39] most of the creepy is probably explained by accidentally probing the staging boxes
[18:07:06] unfortunately I have to run for a few hours...
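(Side note on reading the X-Cache values quoted above: a rough sketch of how one might split such a header into per-host entries and check whether any cache layer reported a hit. `parse_x_cache` and `served_from_cache` are hypothetical helpers. Treating the number after `hit/` as a hit counter is an assumption, based only on it going from 1 to 2 after the hard refresh.)

```python
def parse_x_cache(header):
    """Split an X-Cache value into (host, status, count) tuples.

    Hypothetical helper. The count is the number after "hit/", assumed
    to be a hit counter, or None when the entry carries no number.
    """
    entries = []
    for part in header.split(","):
        host, _, status = part.strip().partition(" ")
        status, _, count = status.partition("/")
        entries.append((host, status, int(count) if count else None))
    return entries


def served_from_cache(header):
    """True if any cache host in the header reported a hit."""
    return any(status == "hit" for _, status, _ in parse_x_cache(header))


first = "cp1061 miss, cp2012 miss, cp4002 miss, cp4001 hit/1"
second = "cp1061 miss, cp2012 miss, cp4002 miss, cp4001 hit/2"

print(parse_x_cache(first))
print(served_from_cache(first), served_from_cache(second))
```

A header where every host says `miss` would make `served_from_cache` return False, which is what one would hope to see for ORES responses if the anti-caching headers are being honored.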
[18:29:25] I'm back, but I got hammered by messages so I'm working through them now
[20:24:57] Revision-Scoring-As-A-Service, rsaas-articlequality, Research-and-Data-2016-17-Q1, User-Ladsgroup: Generate recent article quality scores for English Wikipedia - https://phabricator.wikimedia.org/T135684#2746284 (Halfak) (145058 + 33858) / (27118 + 5820) = 5.43 X as many GA+ articles
[22:42:04] Revision-Scoring-As-A-Service, Wikilabels: Load new stratified edit_type campaign - https://phabricator.wikimedia.org/T149256#2746772 (Halfak)
[22:42:12] Revision-Scoring-As-A-Service, Wikilabels: Load new stratified edit_type campaign - https://phabricator.wikimedia.org/T149256#2746791 (Halfak) https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_edit_types/Work_log/2016-10-26
[22:57:34] Revision-Scoring-As-A-Service, revscoring: Implement sentences datascources & experiment with normalization. - https://phabricator.wikimedia.org/T148867#2746866 (Halfak) OK. I've managed to build a simple sub-parser for handling tables. Essentially, I'll parse an entire table into one giant "sentence"....
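(Side note on the "whole table as one giant sentence" idea: the sketch below is one possible reading of it and is not the revscoring sub-parser, which is not shown in this log. It assumes non-nested wikitext tables delimited by `{|` and `|}`, and a naive split on sentence-ending punctuation for everything else. All names are made up for illustration.)

```python
import re

# Assumption: tables are non-nested and delimited by "{|" ... "|}".
TABLE_RE = re.compile(r"\{\|.*?\|\}", re.DOTALL)
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def _split_prose(text):
    """Split plain prose into stripped, non-empty sentences."""
    return [s.strip() for s in SENTENCE_END_RE.split(text) if s.strip()]


def split_sentences(wikitext):
    """Return sentences, keeping each table as a single 'sentence'."""
    sentences = []
    pos = 0
    for match in TABLE_RE.finditer(wikitext):
        # Prose before the table is split normally.
        sentences.extend(_split_prose(wikitext[pos:match.start()]))
        # The table itself is kept whole, punctuation and all.
        sentences.append(match.group(0))
        pos = match.end()
    sentences.extend(_split_prose(wikitext[pos:]))
    return sentences


sample = "First one. Second one.\n{|\n| a. b\n|-\n| c\n|}\nAfter table."
for sentence in split_sentences(sample):
    print(repr(sentence))
```

Keeping the table whole means the period inside the cell `a. b` never produces a stray sentence fragment, which seems to be the kind of "nonsense around tables" the earlier task comment describes.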