[00:56:26] o/ YuviPanda [00:56:52] halfak: doiiit [00:56:53] :) [00:57:31] I made a mistrake in mwapi. Need to upload a new version to pypu [00:57:32] https://github.com/mediawiki-utilities/python-mwapi/pull/18 [00:57:39] :D [00:58:47] We're up on staging [00:58:48] http://ores-staging.wmflabs.org/scores/enwiki/reverted/4567894/ [00:59:14] I randomly get this from the API: [00:59:16] "[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1769)" [00:59:32] If you just retry, it works [01:00:22] That's strange [01:00:42] halfak: in where? [01:00:47] In ores? [01:00:50] Yeah. [01:00:53] mwapi [01:01:20] So we might see a couple of those in the logs. [01:01:29] But they only happened right after startup [01:01:33] Ok [01:01:38] halfak: file a bug? [01:01:51] Yeah good call. [01:05:55] https://github.com/CodethinkLabs/sandboxlib/issues/7 [01:06:57] halfak: also quarry back again [01:08:09] halfak: and your datasets are there [01:08:23] Oh yeah! I saw you said that. [01:08:42] Will check it out as soon as I finish off this deploy. [01:09:31] So I've been running the precached against this for a while. It seems that each worker will raise this error once. [01:10:26] I think we should move forward. [01:10:48] So, the deploy is going to be a fun time again. [01:11:19] YuviPanda, ^ what do you think? [01:12:29] fyi, scikit-learn is kicking a bit more of my butt. But I'm sure I can finish the package over the weekend, is that blowing any schedules? [01:13:09] Now I'm back to some gfortran .so linking issue, which I swear I solved in another one of these backports [01:13:12] awight, hey dude. No problem. :) [01:13:28] Your last comment in the thread on phab suggested we change versions. [01:13:30] :D no more grant [01:13:31] I'm OK with that. [01:14:11] cool. For triaging purposes, I'd say it saves us a day of screwing around with the packaging [01:14:17] Woot! [01:14:21] Let's do it. [01:14:42] k. Hopefully there's a new animated gif or something to make the upgrade pain worthwhile [01:18:24] YuviPanda, that's the wrong dataset! But it's still a good one. [01:18:30] Woops. Sorry about that [01:18:55] I meant to link this one! http://datasets.wikimedia.org/public-datasets/enwiki/article_quality/article_period_quality_scores.tsv.bz2 [01:24:35] Oh well. The combination of those two datasets is very powerful [01:24:41] But they should switch names [01:25:15] The dataset you have loaded should be article_stats_enwiki_july2015 [01:26:43] halfak: I se [01:26:44] e [01:26:57] halfak: I'm writing a process for getting new datasources in it [01:27:01] halfak: https://etherpad.wikimedia.org/p/add-dataset-to-quarry [01:27:16] What do you think about moving forward with the ORES deploy? [01:27:19] halfak: the manual part now is making the mysql schema [01:27:28] halfak: do it! [01:30:37] Like clockwork [01:31:03] NoooO! [01:31:08] The web nodes failed [01:31:47] YuviPanda, http://pastebin.com/BziciKaT [01:32:07] ,,, [01:32:08] wtf is that [01:32:12] halfak: is that an outage now? [01:32:32] Yes [01:32:35] No [01:32:37] Yay! [01:32:47] Just slow [01:33:05] too slow? [01:33:21] OK for now. [01:33:25] I shut down precaching [01:33:38] scoring takes 3 seconds when it should take 1 [01:33:48] So. WTF is this? [01:33:57] ... [01:33:58] not usre [01:34:00] *sure [01:34:02] looking [01:34:04] Worked fine on the celery nodes [01:34:10] halfak: when did this fail? [01:34:13] halfak: fab? [01:34:17] halfak: on pip upgrade? [01:34:28] yea. I'll get the full error. [01:34:44] https://gist.github.com/halfak/c9ce7de9e6b85bd58c86 [01:35:35] halfak: so web-02 is unaffected? [01:35:40] It seems [01:35:49] It seems 01 is OK too [01:35:59] They are perfectly happy hitting the new celery nodes :S [01:36:08] If not a bit slow [01:36:23] halfak: restarted uwsgi on -02, since it was down earlier from the timeotu deadlock [01:36:36] kk [01:37:07] restarts take forever [01:37:09] another thing we need to fix [01:37:21] yeah. They don't on staging [01:37:52] I think it might just be connections slowly draining [01:38:00] and it restarting 'gracefully' [01:38:17] Is it up again? [01:38:20] so 1. stop accepting new connections, 2. wait for old connections to die, 3. restar [01:38:21] t [01:38:27] or it could be sometime totally else [01:38:31] I want to try to install ores 0.5.0 on -02 [01:39:08] It is still starting [01:39:19] halfak: let's not do that atm without seeing why it went wrong on -01 [01:39:24] also fuck this network [01:39:25] wtf [01:39:41] kk [01:40:56] Yeah. it's the venv. [01:41:01] I can't install anything with pip [01:41:09] halfak: I'm depooling -01 and repooling -02 [01:41:11] (-02 works) [01:41:13] kk [01:42:22] halfak: done [01:42:35] halfak: we can delete that venv and re-initialize it [01:42:44] https://github.com/pypa/pip/issues/2679 [01:42:51] Looks like it could be a corrupt pyc [01:43:38] hmm [01:43:42] should we delete just the pycs? [01:43:50] or all of it? [01:43:53] I think so. [01:45:23] halfak: for former or latter? [01:45:34] Oh... Let's do all of it. [01:45:39] Esp. since we are de-pooled [01:45:45] No big deal to wait. [01:45:46] halfak: yesah [01:46:00] halfak: so we rm -rf /srv, and do initialize_server [01:46:10] will do [01:46:21] halfak: want me to do it or? [01:46:28] Oh.. I'm there. [01:46:31] I feel like I should do this. [01:46:44] So I can get comfortable working in your absence :) [01:46:48] halfak: ok! +1 [01:46:50] halfak: I"ll be here [01:48:53] YuviPanda, https://gist.github.com/halfak/746c69da1074623b06a3 [01:49:06] Does that dir get created in provisioning? [01:49:25] halfak: oh, sorry forgot - after rm -rf do a 'sudo puppet agent -tv' [01:49:56] We're going to have a minor problem. [01:50:01] redis is an optional dependency [01:50:09] so we need to install it manually [01:50:19] .. [01:50:19] Or rather, I suppose we can add it to requirements.txt [01:50:21] that's bad [01:50:25] In order-wikimedia-config [01:50:32] *ores-wikimedia-config [01:50:53] I continue to maintain we should have a 'canonical' setup and not need a support matrix :) [01:50:57] halfak: but we can just do it manualy now [01:51:09] (sorry just occurred to me. I was debugging with aetilley today and only just worked out the implications now.) [01:51:16] right [01:51:30] we should switch to debs soooon [01:51:38] halfak: anyway, I'm ok doing it manually this time. [01:53:11] * halfak stages that quick [01:53:33] Cool. Without a blip in staging [01:53:50] ok [01:54:08] did you force a puppet ron on -01? (sudo puppet agent -tv) [01:54:12] it might've already happened [01:54:13] Yup [01:54:24] initializing now [01:54:31] Will be compiling scipy shortly [01:54:40] ahh that [01:54:41] Here we go [01:56:52] Scipy -- why oh why? Do you prevent my fly.. [01:56:56] code from being deployed [01:57:00] ranges :P [01:57:16] What's the version installed on the nodes? [01:57:38] * YuviPanda continues maintaining that ranges will cause nothing but pain [01:57:39] halfak: 0.14.0 [01:57:41] is what is [01:58:15] We never fixed the range. https://github.com/wiki-ai/revscoring/blob/master/requirements.txt#L13 [01:58:51] halfak: https://github.com/wiki-ai/revscoring/commit/78c898501ca73b2366d356cffa13483dee61782e [01:59:43] Where did this commit come from? [01:59:45] halfak: haha https://github.com/wiki-ai/revscoring/pull/186 [01:59:49] halfak: I made it durnig the outage last time [02:00:04] Weird, why did it get overwritten? [02:00:10] didn't get merged [02:00:12] it was on a merge [02:00:14] Oh! It's a pull [02:00:16] :P [02:01:24] yay [02:01:24] :D [02:04:22] * halfak compiles sklearn [02:04:29] :( [02:05:24] I guess we have to compile that one for now. [02:05:33] At least on initialize [02:05:46] * halfak takes the opportunity to kill the restart crons on the workers [02:06:32] halfak: yeah [02:12:05] OK. -01 should be online. [02:12:45] Looks like we're upgrading -02 [02:12:47] I didn't mean to do that. [02:12:53] -01 install went great [02:13:39] Yeah. canceled that [02:14:25] halfak: dartar and company want a discussion on librarybase. how does this discussion happen and where? [02:14:39] Yeah. It didn't get redis still. [02:14:58] harej, good Q. No idea. But I can ping DarTar when I next see him and ask him what he wants. [02:15:01] Or you could :) [02:17:02] OK. Clean run works well. [02:17:20] No more compiling crap! [02:17:30] Now I just wait for uwsgi to restart [02:17:44] harej, I can talk to you about what I want with librarybase [02:17:59] I wan't to build a metadata extraction strategy into mwcites. [02:18:10] *extract --> fetch [02:19:39] web-01 is in good shape [02:19:56] I'm going to pool it and then try web-02 [02:22:32] halfak: wooo [02:22:57] How do you run fab deploy_web against just one host? [02:23:16] I've been trying "fab deploy_web --hosts=ores-web-02.eqiad.wmflabs [02:23:23] But it will still start with -01 [02:25:03] halfak such that you retrieve the meta data once and then use Librarybase as a store? [02:25:12] Yes [02:27:32] AND we' [02:27:33] re done [02:27:34] ! [02:27:36] It works [02:28:00] harej, I want to prime librarybase like no one has primed a datastore before. [02:28:10] And then I want to talk to you about tracking citations. [02:28:20] halfak: wooo [02:28:22] Tay [02:28:23] Yay [02:30:55] OK. [02:30:59] I need to go run away. [02:31:06] I'll check back in about an hour [02:31:14] Thanks for the hand YuviPanda :) [02:31:20] harej, have a good night! [02:32:30] * harej will probably still be online in an hour [05:07:04] * halfak is late [05:07:26] And we're still online [05:07:59] hello halfak! [05:08:18] Just on for a couple minutes to make sure ORES is in a good state [05:08:55] Looks like Flower is in a weird state. [05:09:09] wisconsin? [05:09:13] Can't load JS and CSS or somethine [05:09:15] *g [05:09:23] Wisconsin's not that weird. [05:09:32] But I feel obligated to dislike it [05:10:16] OK. Looks like we're in a good state. [05:10:24] 8.14 million scores processed. [05:10:30] I'm off. Have a good one! [05:10:48] o/ [12:04:12] wisconsin? [14:41:45] What is difficult when building a wikimedia ai? I am nearly finished with the wikimedia json parser and plan to build an AI based on keyword tagging, pageranking, nlp with simple bayesian probabilistics and then using a neural network with feedforward training and rnn processing - am I speaking black here? [15:44:13] o/ pressure679 [15:44:43] So, flexibility in feature extraction is difficult. Working real time is difficult when you need to do computationally intensive things like diffing. [15:45:22] It's somewhat difficult to get all of the moving pieces together so that you can distribute cpu use. [15:45:57] So, why did you decide to build a json parser? [15:46:39] It sounds like you are planning to build something very complicated. Do you know that you need that level of complication in order to serve your goals. [15:51:34] hello halfak [15:51:41] Hey ToAruShiroiNeko_ [15:51:47] Digging into this right away [15:51:55] [2015-09-12, 16:15:04] <+HaeB> ...it seems that revscoring has suddenly developed a very negative opinion about english wp, coloring almost everything red ;) [15:52:04] Then I'm going to pick up that model_info proposal [15:52:38] morning all [15:52:42] o/ aetilley [15:52:43] or hello [15:53:03] halfak: Reason for json is just lighter to parse than html, and yes I do realize this is a big project which is going to take more than a few days or weeks. - and to the extration feature - I think a pattern matching on the most used pages would be good, and ofc first parse the pages with titles or maybe keywords with the ones the user wants. [15:53:56] pressure679, json parsers are common in the wild. That's why I wonder why you'd build your own. [15:54:00] What will your AI do? [15:56:10] * halfak rebuilds the enwiki reverted predictor [15:56:19] Looks like all of the models are affected. [15:56:30] ToAruShiroiNeko_, It could be an issues in one of our feature extractors [15:56:42] We'll find out if this model builds OK. [15:56:48] ok [15:56:54] I don't think anything changed in our model code. [15:57:41] Luckily, YuviPanda and I just cleaned up our deployment pattern so I can deploy new models as soon as we have this worked out. [15:58:40] halfak: My dream is it to interprete data from wiki pages into Bloom's Taxonomy, but right now I think that is far fetched as that needs abstraction on a ontological level. [15:58:53] *an ontological level. [15:59:15] So what is the now-goal? [16:00:26] halfak: Just create a client interface for easing educational purposes based on my experience of it. - kind of a q&a bot with some smart scripting features. [16:01:24] Have you seen the lit. around semantic relatedness and other intelligent analysis techniques on top of Wikipedia data? [16:03:07] OK. It looks like model training isn't a problem. [16:03:13] So I think it is in feature extraction [16:05:21] halfak: Not really, but thanks for the reference. I think I will go /away for now. [16:05:43] pressure679, one more thing if you have a sec [16:05:59] Check out http://shilad.github.io/wikibrain/ [16:06:04] It sounds like what you are working on. [16:06:13] Shilad is a regular in this channel :) [16:06:29] I hope to see you around soon :) [16:07:05] - oh btw, ofc word stemming is an priority in mapping keywords from wiki pages, and also nlp, but right now my language of choice doesn't have an optimal nltk for the purpose, so maybe a basic noun parser from basic adj/pro/verb exclusion will be used. [16:08:07] What lang? [16:08:46] stemming isnt always good [16:09:02] to stem or not to stem is uite a philosophical discussion [16:12:48] Ah I'm glad you all were able to find each other [16:13:12] Looks like user_age got borked. [16:26:54] is ores borked? https://en.wikipedia.org/w/index.php?title=Giant_panda&action=history every single edit is showing up as red [16:27:02] Yeah. We're working on it. [16:27:04] :/ [16:27:11] Sorry legoktm [16:27:19] ok :) [16:27:34] See https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service [16:27:36] for updates [16:27:45] I'm just about to post about a likely issue I found [16:28:13] * legoktm watchlists [16:29:08] * halfak posts a table of feature values that changed with the last update. [16:29:18] ToAruShiroiNeko_, PR for revscoring incoming [16:29:27] If you can merge that, I'll propagate the versions and try again. [16:29:32] One sec though [16:29:41] ok [16:31:15] ToAruShiroiNeko_, https://github.com/wiki-ai/revscoring/pull/187 [16:31:37] we need to hold a party for the 200th [16:31:47] 200th what? [16:31:50] Oh! PR [16:31:50] pr [16:31:58] :) [16:32:09] so problem was a typo? [16:32:26] Yes [16:32:35] A typo that does not return an error. [16:32:41] Silent failure is sneaky death [16:33:24] done [16:33:25] ya [16:33:40] my biggest blunder was when I forgot a ; after an if() [16:33:42] no errors [16:34:00] halfak: do you log API warnings? [16:34:17] Not warnings, but errors. Should log warnings. [16:34:32] In fact, I think we should throw an error for warnings, but that's another problem [16:34:47] Depends on the warning of course, I guess. [16:35:03] The warning "you used the wrong fucking parameter" should be an error IMO. [16:35:17] We can fix that on our side. [16:35:58] Parsoid also learned a similar lesson a while back about logging warnings [16:36:00] with perhaps a slightly more polite warning :) [16:36:21] legoktm, We should probably write that into mwapi? [16:36:42] logger.warning("You used the wrong *&%$(@& parameter") [16:36:46] lol [16:37:51] * halfak installs goddamn scypi [16:37:59] Seriously [16:38:15] logger.warning("I'm going install scipy again. :P") [16:39:11] legoktm, https://github.com/mediawiki-utilities/python-mwapi/issues/19 [16:39:39] woot [16:39:45] Well... Might as well fix that while we are waiting for scipy [16:39:51] we == halfak [16:40:09] and thanks for fixing MediaWiki* :) [16:40:39] legoktm, indeed. Nice to have you note the issue. [16:40:57] I'll be going through all the mw I maintain to clean them up and announce in the near future. [16:41:04] I'll make sure that gets on the checklist [16:41:22] legoktm, can you point me to a schema for API warnings? [16:41:49] https://en.wikipedia.org/w/api.php?action=query&ucusers=Foo&list=users [16:41:53] Here I suppose https://www.mediawiki.org/wiki/API:Errors_and_warnings [16:42:11] yeah... can there be multiple fields here? [16:42:18] "warnings": {"modulethatcausedit": {"*":"text that may contain multiple warnings jumbled up"}} [16:42:27] are you using formatversion=2? [16:42:28] The docs say I should expect a and field [16:42:30] https://www.mediawiki.org/wiki/API:Errors_and_warnings [16:42:34] Oh [16:43:09] doc['warnings']['main']['warnings'] [16:43:40] 'main' is because the warning is thrown by the 'main' module [16:43:54] Gotcha. [16:43:55] you'll want to iterate over all keys in the first 'warnings' dict [16:44:10] doc['warnings'][]['warning'] [16:44:15] kk [16:44:16] yep [16:44:25] Can I expect each value to contain a dict with the one key? [16:45:29] yes [16:45:45] 'warning' is a concatenated string of all the warnings that happened in the request [16:45:58] there's a bug somewhere to make it structured and machine readable [16:46:11] Gotcha Thanks. :) [16:46:33] Do you know how I could generate a complex warning? [16:47:21] https://en.wikipedia.org/w/api.php?action=query&ucusers=Foo&list=users|foobar&formatversion=2 [16:47:40] halfak so we can build a model for several languages [16:48:10] legoktm, where can I read up on how formatversion changes things? Or is it just for warnings? [16:48:53] we can build a german (geeez) and dutch models [16:52:10] legoktm, https://github.com/mediawiki-utilities/python-mwapi/pull/20 [16:52:16] Any feedback you have would be appreciated [16:53:16] * halfak submits new versions to pypi [16:54:55] halfak: https://www.mediawiki.org/wiki/API:JSON_version_2#Using_the_new_JSON_results_format [16:55:40] Gotcha. Would like to make the leap some point soon. [16:55:56] It would be great to have a list of all the fields that are affected. [16:56:05] But I can run a shit-ton of queries to get at that. [16:56:28] I commented [16:56:30] bbl! [16:56:39] o/ [16:59:13] OK. This is weird. [16:59:21] ToAruShiroiNeko_, looks like I introduced a new error. :) [17:00:27] well that is progress [17:00:30] better from no error [17:03:36] ToAruShiroiNeko_, https://github.com/wiki-ai/revscoring/pull/188 [17:04:55] huh [17:05:05] it gave an error but it looks like its merged [17:05:25] Gave an error? [17:05:35] github did [17:05:39] Not merged: https://github.com/wiki-ai/revscoring/pull/188 [17:05:39] asking me to refresh [17:05:53] Oh! I just posted a comment with the error it fixes. [17:06:00] ah [17:06:17] really merged this time [17:07:33] just a quick check, did anyone else noticed a high volume of false positives for "reverted" scores recently? [17:07:34] https://en.wikipedia.org/w/index.php?diff=680704671#ScoredRevisions.js [17:08:33] Helder, yeah. See https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service [17:08:45] We're working on it frantically [17:08:48] :) [17:09:13] Luckily, I don't think it has anything to do with the model itself. Just the feature extractor. [17:09:14] SO I [17:09:16] thanks! [17:09:27] So I'm fixing the feature extractor. We should be able to solve this within the hour. [17:12:26] Yay! Feature sets match now [17:12:27] :) [17:12:34] Now, time to try a deploy [17:12:39] First, to staging [17:13:36] And for some reason we're building numpy again [17:13:41] FFS [17:15:43] c'mon scipy [17:15:48] compile better [17:23:13] Imma fix that later today [17:23:19] And stop using pip [17:23:19] :D [17:23:23] \o/ [17:23:27] At least for deploys [17:23:34] If not all the time [17:23:45] Yeah [17:24:10] I'm going to use git submodukes for ores and revscoring [17:24:37] For the dev environment? [17:24:59] Or are you talking about with ores-wikimedia-config [17:25:03] ? [17:25:36] Crap. We need to blow out cache. [17:26:27] I can specify a key prefix. [17:26:43] Would you do that hot on redis YuviPanda? Or should we just increment the model version? [17:26:59] If we do that, we need to deploy new models, but it would invalidate old cache [17:28:34]