[16:52:15] halfak: is there a guideline on where to build data pipelines like the one we're looking at? E.g. are there particular database servers or something?
[16:52:57] I’m not aiming to design it ready for production, but having something that’s not half bad would be nice.
[16:54:45] Nettrom, you should ask that question in -analytics :)
[16:54:58] joal and milimetric will have opinions.
[16:55:00] halfak: alright, will do, thanks! :)
[16:55:06] Oh, they are here.
[16:55:10] So you can ask them here too :D
[16:55:40] I think we just did? :D
[16:57:35] reminder for joal and milimetric: we're looking at extracting the "organic" link graph from wikitext and aggregating "pageview rates" and other general stats from request logs.
[16:58:56] so far I’ve figured out that getting pageview rates most likely means running some queries on Hive, and I’m thinking that the results go into some kind of intermediary table
[17:29:50] Hi Nettrom and halfak -- I'll be in a meeting soon, will catch up with you after
[20:58:00] halfak: "working on api.php abuse" from scrum of scrums... know what that entails? Some of my tests against ORES just started breaking.
[20:58:25] ragesoss, not you. I just did a deployment and something is up.
[20:59:04] ah, I see, following along in -ai...
[21:10:33] ragesoss, can you try what you're doing against ores.wmflabs.org and tell me how it goes?
[21:16:40] halfak: tried a few times, and the tests were passing each time. failure is intermittent on production. Still slower than usual on wmflabs though, I think.
[21:17:04] Yeah. Also seeing intermittent here.
[21:17:16] The problem is only in codfw -- not in eqiad
[21:17:28] (different datacenters -- should be same code and config)
[21:17:31] still working on it.
[21:18:14] that data center naming convention is so terrible.
[21:20:12] {
[21:20:12]   "scores": {
[21:20:12]     "enwiki": {
[21:20:12]       "wp10": {
[21:20:12]         "scores": {
[21:20:14]           "675892696": {
[21:20:16]             "error": {
[21:20:18]               "message": "Timed out after 15 seconds.",
[21:20:20]               "type": "TimeoutError"
[21:20:23]             }
[21:20:25]           }
[21:20:27]         },
[21:20:29]         "version": "0.5.0"
[21:20:31]       }
[21:20:34]     }
[21:20:36]   }
[21:20:38] }
[21:20:42] interesting place to hide the error message!
[21:21:14] hadn't seen one before, but I wouldn't have expected so much nesting.
[21:21:39] Yeah. It takes the place of the score.
[21:21:50] I'll have to add some checks for that.
[21:21:51] Because you can score multiple revisions and only some error.
[21:23:16] ragesoss: https://4.bp.blogspot.com/-wTejjZemfws/Vm3fv97XotI/AAAAAAAACe0/LwWQmvjyMwo/s1600/haduuken.jpg
[21:23:45] :-)
[21:30:32] oh, I see, it was just a quirk of my test forcing a float value that made it come up as 0.0 rather than nil when the timeout errors occurred.
[21:30:48] I was very confused about how my code was resulting in scores of 0.0.
[21:56:13] ragesoss, looks like we're in the clear now.
[21:56:25] Switched traffic to point back at eqiad and everything is happy again.
[21:56:49] * halfak writes up an incident report.
[21:59:56] sweet, thanks! it broke some tests, but looks like the production dashboard suffered no ill effects. so that's nice to know.
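
The "checks for that" mentioned above could look roughly like the sketch below. It assumes only the response layout pasted in the log, where an "error" object sits where a revision's score would otherwise be. The function name `split_scores` is made up for illustration, and the second revision in the sample is a hypothetical placeholder, since the log does not show what a successful entry looks like.

```python
from typing import Any, Dict, Tuple


def split_scores(
    response: Dict[str, Any], wiki: str, model: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate successful entries from errored ones for one wiki/model.

    Hypothetical helper: it relies only on the nesting shown in the log,
    i.e. response["scores"][wiki][model]["scores"][rev_id].
    """
    entries = response["scores"][wiki][model]["scores"]
    ok: Dict[str, Any] = {}
    failed: Dict[str, Any] = {}
    for rev_id, entry in entries.items():
        # An "error" object takes the place of the score for that revision.
        if isinstance(entry, dict) and "error" in entry:
            failed[rev_id] = entry["error"]
        else:
            ok[rev_id] = entry
    return ok, failed


# Sample built from the response pasted in the log. Revision "123" is a
# hypothetical placeholder; its payload is NOT a real score format.
sample = {
    "scores": {
        "enwiki": {
            "wp10": {
                "scores": {
                    "675892696": {
                        "error": {
                            "message": "Timed out after 15 seconds.",
                            "type": "TimeoutError",
                        }
                    },
                    "123": {"placeholder": "successful payload not shown in log"},
                },
                "version": "0.5.0",
            }
        }
    }
}

ok, failed = split_scores(sample, "enwiki", "wp10")
for rev_id, err in failed.items():
    print(f"revision {rev_id} failed: {err['type']}: {err['message']}")
```

Keeping errored revisions in a separate mapping, rather than coercing every entry to a number, avoids the situation described above where a timeout silently turned into a score of 0.0.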