[00:56:26] o/ YuviPanda [00:56:52] halfak: doiiit [00:56:53] :) [00:57:31] I made a mistake in mwapi. Need to upload a new version to PyPI [00:57:32] https://github.com/mediawiki-utilities/python-mwapi/pull/18 [00:57:39] :D [00:58:47] We're up on staging [00:58:48] http://ores-staging.wmflabs.org/scores/enwiki/reverted/4567894/ [00:59:14] I randomly get this from the API: [00:59:16] "[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1769)" [00:59:32] If you just retry, it works [01:00:22] That's strange [01:00:42] halfak: in where? [01:00:47] In ores? [01:00:50] Yeah. [01:00:53] mwapi [01:01:20] So we might see a couple of those in the logs. [01:01:29] But they only happened right after startup [01:01:33] Ok [01:01:38] halfak: file a bug? [01:01:51] Yeah, good call. [01:05:55] https://github.com/CodethinkLabs/sandboxlib/issues/7 [01:06:57] halfak: also quarry back again [01:08:09] halfak: and your datasets are there [01:08:23] Oh yeah! I saw you said that. [01:08:42] Will check it out as soon as I finish off this deploy. [01:09:31] So I've been running the precached against this for a while. It seems that each worker will raise this error once. [01:10:26] I think we should move forward. [01:10:48] So, the deploy is going to be a fun time again. [01:11:19] YuviPanda, ^ what do you think? [01:12:29] fyi, scikit-learn is kicking a bit more of my butt. But I'm sure I can finish the package over the weekend, is that blowing any schedules? [01:13:09] Now I'm back to some gfortran .so linking issue, which I swear I solved in another one of these backports [01:13:12] awight, hey dude. No problem. :) [01:13:28] Your last comment in the thread on phab suggested we change versions. [01:13:30] :D no more grant [01:13:31] I'm OK with that. [01:14:11] cool. For triaging purposes, I'd say it saves us a day of screwing around with the packaging [01:14:17] Woot! [01:14:21] Let's do it. [01:14:42] k. Hopefully there's a new animated gif or something to make the upgrade pain worthwhile [01:18:24] YuviPanda, that's the wrong dataset! But it's still a good one. [01:18:30] Whoops. Sorry about that [01:18:55] I meant to link this one! http://datasets.wikimedia.org/public-datasets/enwiki/article_quality/article_period_quality_scores.tsv.bz2 [01:24:35] Oh well. The combination of those two datasets is very powerful [01:24:41] But they should switch names [01:25:15] The dataset you have loaded should be article_stats_enwiki_july2015 [01:26:43] halfak: I see [01:26:57] halfak: I'm writing a process for getting new datasources in it [01:27:01] halfak: https://etherpad.wikimedia.org/p/add-dataset-to-quarry [01:27:16] What do you think about moving forward with the ORES deploy? [01:27:19] halfak: the manual part now is making the mysql schema [01:27:28] halfak: do it! [01:30:37] Like clockwork [01:31:03] NoooO! [01:31:08] The web nodes failed [01:31:47] YuviPanda, http://pastebin.com/BziciKaT [01:32:07] ... [01:32:08] wtf is that [01:32:12] halfak: is that an outage now? [01:32:32] Yes [01:32:35] No [01:32:37] Yay! [01:32:47] Just slow [01:33:05] too slow? [01:33:21] OK for now. [01:33:25] I shut down precaching [01:33:38] scoring takes 3 seconds when it should take 1 [01:33:48] So. WTF is this? [01:33:57] ... [01:33:58] not sure [01:34:02] looking [01:34:04] Worked fine on the celery nodes [01:34:10] halfak: when did this fail? [01:34:13] halfak: fab? [01:34:17] halfak: on pip upgrade? [01:34:28] Yeah. I'll get the full error.
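
A minimal sketch of the "just retry and it works" workaround described above, as a wrapper around an mwapi session. The exception class and retry count are assumptions, not anything mwapi does itself:

    # Hedged sketch: retry a request a couple of times when the transient
    # DECRYPTION_FAILED_OR_BAD_RECORD_MAC error surfaces right after startup.
    # Assumes the error bubbles up as requests.exceptions.SSLError.
    import mwapi
    import requests.exceptions

    def get_with_retry(session, attempts=3, **params):
        for attempt in range(attempts):
            try:
                return session.get(**params)
            except requests.exceptions.SSLError:
                if attempt == attempts - 1:
                    raise  # still failing after the last retry; give up

    session = mwapi.Session("https://en.wikipedia.org")
    doc = get_with_retry(session, action="query", meta="siteinfo")
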
[01:34:44] https://gist.github.com/halfak/c9ce7de9e6b85bd58c86 [01:35:35] halfak: so web-02 is unaffected? [01:35:40] It seems [01:35:49] It seems 01 is OK too [01:35:59] They are perfectly happy hitting the new celery nodes :S [01:36:08] If not a bit slow [01:36:23] halfak: restarted uwsgi on -02, since it was down earlier from the timeout deadlock [01:36:36] kk [01:37:07] restarts take forever [01:37:09] another thing we need to fix [01:37:21] yeah. They don't on staging [01:37:52] I think it might just be connections slowly draining [01:38:00] and it restarting 'gracefully' [01:38:17] Is it up again? [01:38:20] so 1. stop accepting new connections, 2. wait for old connections to die, 3. restart [01:38:27] or it could be something totally else [01:38:31] I want to try to install ores 0.5.0 on -02 [01:39:08] It is still starting [01:39:19] halfak: let's not do that atm without seeing why it went wrong on -01 [01:39:24] also fuck this network [01:39:25] wtf [01:39:41] kk [01:40:56] Yeah. It's the venv. [01:41:01] I can't install anything with pip [01:41:09] halfak: I'm depooling -01 and repooling -02 [01:41:11] (-02 works) [01:41:13] kk [01:42:22] halfak: done [01:42:35] halfak: we can delete that venv and re-initialize it [01:42:44] https://github.com/pypa/pip/issues/2679 [01:42:51] Looks like it could be a corrupt pyc [01:43:38] hmm [01:43:42] should we delete just the pycs? [01:43:50] or all of it? [01:43:53] I think so. [01:45:23] halfak: for the former or the latter? [01:45:34] Oh... Let's do all of it. [01:45:39] Esp. since we are de-pooled [01:45:45] No big deal to wait. [01:45:46] halfak: yeah [01:46:00] halfak: so we rm -rf /srv, and do initialize_server [01:46:10] will do [01:46:21] halfak: want me to do it or? [01:46:28] Oh.. I'm there. [01:46:31] I feel like I should do this. [01:46:44] So I can get comfortable working in your absence :) [01:46:48] halfak: ok! +1 [01:46:50] halfak: I'll be here [01:48:53] YuviPanda, https://gist.github.com/halfak/746c69da1074623b06a3 [01:49:06] Does that dir get created in provisioning? [01:49:25] halfak: oh, sorry forgot - after rm -rf do a 'sudo puppet agent -tv' [01:49:56] We're going to have a minor problem. [01:50:01] redis is an optional dependency [01:50:09] so we need to install it manually [01:50:19] .. [01:50:19] Or rather, I suppose we can add it to requirements.txt [01:50:21] that's bad [01:50:25] In ores-wikimedia-config [01:50:53] I continue to maintain we should have a 'canonical' setup and not need a support matrix :) [01:50:57] halfak: but we can just do it manually now [01:51:09] (sorry, just occurred to me. I was debugging with aetilley today and only just worked out the implications now.) [01:51:16] right [01:51:30] we should switch to debs soooon [01:51:38] halfak: anyway, I'm ok doing it manually this time. [01:53:11] * halfak stages that quick [01:53:33] Cool. Without a blip in staging [01:53:50] ok [01:54:08] did you force a puppet run on -01? (sudo puppet agent -tv) [01:54:12] it might've already happened [01:54:13] Yup [01:54:24] initializing now [01:54:31] Will be compiling scipy shortly [01:54:40] ahh that [01:54:41] Here we go [01:56:52] Scipy -- why oh why? Do you prevent my fly.. [01:56:56] code from being deployed [01:57:00] ranges :P [01:57:16] What's the version installed on the nodes? [01:57:38] * YuviPanda continues maintaining that ranges will cause nothing but pain [01:57:39] halfak: 0.14.0 [01:57:41] is what it is [01:58:15] We never fixed the range.
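
Before the range discussion continues below, a minimal sketch of the corrupt-.pyc cleanup floated above (pypa/pip#2679). The /srv root comes from the chat; everything else here is an assumption:

    # Hedged sketch: remove compiled bytecode under the deploy root so pip
    # stops tripping over a corrupt .pyc; harmless, Python regenerates them.
    import os

    def remove_pycs(root="/srv"):
        removed = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(".pyc"):
                    os.remove(os.path.join(dirpath, name))
                    removed += 1
        return removed
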
https://github.com/wiki-ai/revscoring/blob/master/requirements.txt#L13 [01:58:51] halfak: https://github.com/wiki-ai/revscoring/commit/78c898501ca73b2366d356cffa13483dee61782e [01:59:43] Where did this commit come from? [01:59:45] halfak: haha https://github.com/wiki-ai/revscoring/pull/186 [01:59:49] halfak: I made it during the outage last time [02:00:04] Weird, why did it get overwritten? [02:00:10] didn't get merged [02:00:12] it was on a merge [02:00:14] Oh! It's a pull [02:00:16] :P [02:01:24] yay [02:01:24] :D [02:04:22] * halfak compiles sklearn [02:04:29] :( [02:05:24] I guess we have to compile that one for now. [02:05:33] At least on initialize [02:05:46] * halfak takes the opportunity to kill the restart crons on the workers [02:06:32] halfak: yeah [02:12:05] OK. -01 should be online. [02:12:45] Looks like we're upgrading -02 [02:12:47] I didn't mean to do that. [02:12:53] -01 install went great [02:13:39] Yeah. canceled that [02:14:25] halfak: dartar and company want a discussion on librarybase. how does this discussion happen and where? [02:14:39] Yeah. It didn't get redis still. [02:14:58] harej, good Q. No idea. But I can ping DarTar when I next see him and ask him what he wants. [02:15:01] Or you could :) [02:17:02] OK. Clean run works well. [02:17:20] No more compiling crap! [02:17:30] Now I just wait for uwsgi to restart [02:17:44] harej, I can talk to you about what I want with librarybase [02:17:59] I want to build a metadata fetching strategy into mwcites. [02:19:39] web-01 is in good shape [02:19:56] I'm going to pool it and then try web-02 [02:22:32] halfak: wooo [02:22:57] How do you run fab deploy_web against just one host? [02:23:16] I've been trying "fab deploy_web --hosts=ores-web-02.eqiad.wmflabs" [02:23:23] But it will still start with -01 [02:25:03] halfak: such that you retrieve the metadata once and then use Librarybase as a store? [02:25:12] Yes [02:27:32] AND we're done! [02:27:36] It works [02:28:00] harej, I want to prime librarybase like no one has primed a datastore before. [02:28:10] And then I want to talk to you about tracking citations. [02:28:20] halfak: wooo [02:28:23] Yay [02:30:55] OK. [02:30:59] I need to go run away. [02:31:06] I'll check back in about an hour [02:31:14] Thanks for the hand YuviPanda :) [02:31:20] harej, have a good night! [02:32:30] * harej will probably still be online in an hour [05:07:04] * halfak is late [05:07:26] And we're still online [05:07:59] hello halfak! [05:08:18] Just on for a couple minutes to make sure ORES is in a good state [05:08:55] Looks like Flower is in a weird state. [05:09:09] wisconsin? [05:09:13] Can't load JS and CSS or something [05:09:23] Wisconsin's not that weird. [05:09:32] But I feel obligated to dislike it [05:10:16] OK. Looks like we're in a good state. [05:10:24] 8.14 million scores processed. [05:10:30] I'm off. Have a good one! [05:10:48] o/ [12:04:12] wisconsin? [14:41:45] What is difficult when building a wikimedia ai? I am nearly finished with the wikimedia json parser and plan to build an AI based on keyword tagging, pageranking, nlp with simple bayesian probabilistics and then using a neural network with feedforward training and rnn processing - am I speaking black here? [15:44:13] o/ pressure679 [15:44:43] So, flexibility in feature extraction is difficult. Working real time is difficult when you need to do computationally intensive things like diffing.
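
To make the diffing point concrete, a toy sketch with the stdlib (revscoring's real extractors are more involved; this is only an illustration of the per-edit cost):

    # Toy sketch: token-level diff of two revisions. SequenceMatcher is
    # roughly quadratic in the token counts, which is why doing this for
    # every incoming edit in real time gets expensive.
    import difflib

    def diff_ops(old_text, new_text):
        old, new = old_text.split(), new_text.split()
        matcher = difflib.SequenceMatcher(None, old, new)
        return [op for op in matcher.get_opcodes() if op[0] != "equal"]

    print(diff_ops("the quick brown fox", "the slow brown fox jumps"))
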
[15:45:22] It's somewhat difficult to get all of the moving pieces together so that you can distribute cpu use. [15:45:57] So, why did you decide to build a json parser? [15:46:39] It sounds like you are planning to build something very complicated. Do you know that you need that level of complication in order to serve your goals? [15:51:34] hello halfak [15:51:41] Hey ToAruShiroiNeko_ [15:51:47] Digging into this right away [15:51:55] [2015-09-12, 16:15:04] <+HaeB> ...it seems that revscoring has suddenly developed a very negative opinion about english wp, coloring almost everything red ;) [15:52:04] Then I'm going to pick up that model_info proposal [15:52:38] morning all [15:52:42] o/ aetilley [15:52:43] or hello [15:53:03] halfak: Reason for json is just that it's lighter to parse than html, and yes I do realize this is a big project which is going to take more than a few days or weeks. - and on the extraction feature - I think pattern matching on the most used pages would be good, and ofc first parse the pages with titles or maybe keywords matching the ones the user wants. [15:53:56] pressure679, json parsers are common in the wild. That's why I wonder why you'd build your own. [15:54:00] What will your AI do? [15:56:10] * halfak rebuilds the enwiki reverted predictor [15:56:19] Looks like all of the models are affected. [15:56:30] ToAruShiroiNeko_, It could be an issue in one of our feature extractors [15:56:42] We'll find out if this model builds OK. [15:56:48] ok [15:56:54] I don't think anything changed in our model code. [15:57:41] Luckily, YuviPanda and I just cleaned up our deployment pattern so I can deploy new models as soon as we have this worked out. [15:58:40] halfak: My dream is to interpret data from wiki pages into Bloom's Taxonomy, but right now I think that is far-fetched as that needs abstraction on an ontological level. [15:59:15] So what is the now-goal? [16:00:26] halfak: Just create a client interface for easing educational purposes based on my experience of it. - kind of a q&a bot with some smart scripting features. [16:01:24] Have you seen the lit. around semantic relatedness and other intelligent analysis techniques on top of Wikipedia data? [16:03:07] OK. It looks like model training isn't a problem. [16:03:13] So I think it is in feature extraction [16:05:21] halfak: Not really, but thanks for the reference. I think I will go /away for now. [16:05:43] pressure679, one more thing if you have a sec [16:05:59] Check out http://shilad.github.io/wikibrain/ [16:06:04] It sounds like what you are working on. [16:06:13] Shilad is a regular in this channel :) [16:06:29] I hope to see you around soon :) [16:07:05] - oh btw, ofc word stemming is a priority in mapping keywords from wiki pages, and also nlp, but right now my language of choice doesn't have an optimal nltk for the purpose, so maybe a basic noun parser from basic adj/pro/verb exclusion will be used. [16:08:07] What lang? [16:08:46] stemming isn't always good [16:09:02] to stem or not to stem is quite a philosophical discussion [16:12:48] Ah, I'm glad you all were able to find each other [16:13:12] Looks like user_age got borked. [16:26:54] is ores borked? https://en.wikipedia.org/w/index.php?title=Giant_panda&action=history every single edit is showing up as red [16:27:02] Yeah. We're working on it. [16:27:04] :/ [16:27:11] Sorry legoktm [16:27:19] ok :) [16:27:34] See https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service [16:27:36] for updates [16:27:45] I'm just about to post about a likely issue I found [16:28:13] * legoktm watchlists [16:29:08] * halfak posts a table of feature values that changed with the last update. [16:29:18] ToAruShiroiNeko_, PR for revscoring incoming [16:29:27] If you can merge that, I'll propagate the versions and try again. [16:29:32] One sec though [16:29:41] ok [16:31:15] ToAruShiroiNeko_, https://github.com/wiki-ai/revscoring/pull/187 [16:31:37] we need to hold a party for the 200th [16:31:47] 200th what? [16:31:50] Oh! PR [16:31:50] pr [16:31:58] :) [16:32:09] so the problem was a typo? [16:32:26] Yes [16:32:35] A typo that does not return an error. [16:32:41] Silent failure is sneaky death [16:33:24] done [16:33:25] ya [16:33:40] my biggest blunder was when I forgot a ; after an if() [16:33:42] no errors [16:34:00] halfak: do you log API warnings? [16:34:17] Not warnings, but errors. Should log warnings. [16:34:32] In fact, I think we should throw an error for warnings, but that's another problem [16:34:47] Depends on the warning of course, I guess. [16:35:03] The warning "you used the wrong fucking parameter" should be an error IMO. [16:35:17] We can fix that on our side. [16:35:58] Parsoid also learned a similar lesson a while back about logging warnings [16:36:00] with perhaps a slightly more polite warning :) [16:36:21] legoktm, we should probably write that into mwapi? [16:36:42] logger.warning("You used the wrong *&%$(@& parameter") [16:36:46] lol [16:37:51] * halfak installs goddamn scipy [16:37:59] Seriously [16:38:15] logger.warning("I'm going to install scipy again. :P") [16:39:11] legoktm, https://github.com/mediawiki-utilities/python-mwapi/issues/19 [16:39:39] woot [16:39:45] Well... Might as well fix that while we are waiting for scipy [16:39:51] we == halfak [16:40:09] and thanks for fixing MediaWiki* :) [16:40:39] legoktm, indeed. Nice to have you note the issue. [16:40:57] I'll be going through all the mw* libraries I maintain to clean them up and announce in the near future. [16:41:04] I'll make sure that gets on the checklist [16:41:22] legoktm, can you point me to a schema for API warnings? [16:41:49] https://en.wikipedia.org/w/api.php?action=query&ucusers=Foo&list=users [16:41:53] Here I suppose https://www.mediawiki.org/wiki/API:Errors_and_warnings [16:42:11] yeah... can there be multiple fields here? [16:42:18] "warnings": {"modulethatcausedit": {"*":"text that may contain multiple warnings jumbled up"}} [16:42:27] are you using formatversion=2? [16:42:28] The docs say I should expect a <code> and a <text> field [16:42:30] https://www.mediawiki.org/wiki/API:Errors_and_warnings [16:42:34] Oh [16:43:09] doc['warnings']['main']['warnings'] [16:43:40] 'main' is because the warning is thrown by the 'main' module [16:43:54] Gotcha. [16:43:55] you'll want to iterate over all keys in the first 'warnings' dict [16:44:10] doc['warnings'][<module>]['warning'] [16:44:15] kk [16:44:16] yep [16:44:25] Can I expect each value to contain a dict with the one key? [16:45:29] yes [16:45:45] 'warning' is a concatenated string of all the warnings that happened in the request [16:45:58] there's a bug somewhere to make it structured and machine readable [16:46:11] Gotcha. Thanks. :) [16:46:33] Do you know how I could generate a complex warning?
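
Based on the structure sketched in the exchange above, a hedged example of surfacing those warnings in client code (handles both the old '*' key and the formatversion=2 'warnings' key; this is an illustration, not actual mwapi code):

    # Hedged sketch: log every API warning found in a response document.
    import logging

    logger = logging.getLogger("mwapi")

    def log_warnings(doc):
        for module, warning in doc.get("warnings", {}).items():
            # old format uses "*"; formatversion=2 uses "warnings"
            text = warning.get("warnings") or warning.get("*", "")
            logger.warning("API warning from %r: %s", module, text)
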
[16:27:04] :/ [16:27:11] Sorry legoktm [16:27:19] ok :) [16:27:34] See https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service [16:27:36] for updates [16:27:45] I'm just about to post about a likely issue I found [16:28:13] * legoktm watchlists [16:29:08] * halfak posts a table of feature values that changed with the last update. [16:29:18] ToAruShiroiNeko_, PR for revscoring incoming [16:29:27] If you can merge that, I'll propagate the versions and try again. [16:29:32] One sec though [16:29:41] ok [16:31:15] ToAruShiroiNeko_, https://github.com/wiki-ai/revscoring/pull/187 [16:31:37] we need to hold a party for the 200th [16:31:47] 200th what? [16:31:50] Oh! PR [16:31:50] pr [16:31:58] :) [16:32:09] so problem was a typo? [16:32:26] Yes [16:32:35] A typo that does not return an error. [16:32:41] Silent failure is sneaky death [16:33:24] done [16:33:25] ya [16:33:40] my biggest blunder was when I forgot a ; after an if() [16:33:42] no errors [16:34:00] halfak: do you log API warnings? [16:34:17] Not warnings, but errors. Should log warnings. [16:34:32] In fact, I think we should throw an error for warnings, but that's another problem [16:34:47] Depends on the warning of course, I guess. [16:35:03] The warning "you used the wrong fucking parameter" should be an error IMO. [16:35:17] We can fix that on our side. [16:35:58] Parsoid also learned a similar lesson a while back about logging warnings [16:36:00] with perhaps a slightly more polite warning :) [16:36:21] legoktm, We should probably write that into mwapi? [16:36:42] logger.warning("You used the wrong *&%$(@& parameter") [16:36:46] lol [16:37:51] * halfak installs goddamn scypi [16:37:59] Seriously [16:38:15] logger.warning("I'm going install scipy again. :P") [16:39:11] legoktm, https://github.com/mediawiki-utilities/python-mwapi/issues/19 [16:39:39] woot [16:39:45] Well... Might as well fix that while we are waiting for scipy [16:39:51] we == halfak [16:40:09] and thanks for fixing MediaWiki* :) [16:40:39] legoktm, indeed. Nice to have you note the issue. [16:40:57] I'll be going through all the mw I maintain to clean them up and announce in the near future. [16:41:04] I'll make sure that gets on the checklist [16:41:22] legoktm, can you point me to a schema for API warnings? [16:41:49] https://en.wikipedia.org/w/api.php?action=query&ucusers=Foo&list=users [16:41:53] Here I suppose https://www.mediawiki.org/wiki/API:Errors_and_warnings [16:42:11] yeah... can there be multiple fields here? [16:42:18] "warnings": {"modulethatcausedit": {"*":"text that may contain multiple warnings jumbled up"}} [16:42:27] are you using formatversion=2? [16:42:28] The docs say I should expect a and field [16:42:30] https://www.mediawiki.org/wiki/API:Errors_and_warnings [16:42:34] Oh [16:43:09] doc['warnings']['main']['warnings'] [16:43:40] 'main' is because the warning is thrown by the 'main' module [16:43:54] Gotcha. [16:43:55] you'll want to iterate over all keys in the first 'warnings' dict [16:44:10] doc['warnings'][]['warning'] [16:44:15] kk [16:44:16] yep [16:44:25] Can I expect each value to contain a dict with the one key? [16:45:29] yes [16:45:45] 'warning' is a concatenated string of all the warnings that happened in the request [16:45:58] there's a bug somewhere to make it structured and machine readable [16:46:11] Gotcha Thanks. :) [16:46:33] Do you know how I could generate a complex warning? 
[16:47:21] https://en.wikipedia.org/w/api.php?action=query&ucusers=Foo&list=users|foobar&formatversion=2 [16:47:40] halfak so we can build a model for several languages [16:48:10] legoktm, where can I read up on how formatversion changes things? Or is it just for warnings? [16:48:53] we can build a german (geeez) and dutch models [16:52:10] legoktm, https://github.com/mediawiki-utilities/python-mwapi/pull/20 [16:52:16] Any feedback you have would be appreciated [16:53:16] * halfak submits new versions to pypi [16:54:55] halfak: https://www.mediawiki.org/wiki/API:JSON_version_2#Using_the_new_JSON_results_format [16:55:40] Gotcha. Would like to make the leap some point soon. [16:55:56] It would be great to have a list of all the fields that are affected. [16:56:05] But I can run a shit-ton of queries to get at that. [16:56:28] I commented [16:56:30] bbl! [16:56:39] o/ [16:59:13] OK. This is weird. [16:59:21] ToAruShiroiNeko_, looks like I introduced a new error. :) [17:00:27] well that is progress [17:00:30] better from no error [17:03:36] ToAruShiroiNeko_, https://github.com/wiki-ai/revscoring/pull/188 [17:04:55] huh [17:05:05] it gave an error but it looks like its merged [17:05:25] Gave an error? [17:05:35] github did [17:05:39] Not merged: https://github.com/wiki-ai/revscoring/pull/188 [17:05:39] asking me to refresh [17:05:53] Oh! I just posted a comment with the error it fixes. [17:06:00] ah [17:06:17] really merged this time [17:07:33] just a quick check, did anyone else noticed a high volume of false positives for "reverted" scores recently? [17:07:34] https://en.wikipedia.org/w/index.php?diff=680704671#ScoredRevisions.js [17:08:33] Helder, yeah. See https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service [17:08:45] We're working on it frantically [17:08:48] :) [17:09:13] Luckily, I don't think it has anything to do with the model itself. Just the feature extractor. [17:09:14] SO I [17:09:16] thanks! [17:09:27] So I'm fixing the feature extractor. We should be able to solve this within the hour. [17:12:26] Yay! Feature sets match now [17:12:27] :) [17:12:34] Now, time to try a deploy [17:12:39] First, to staging [17:13:36] And for some reason we're building numpy again [17:13:41] FFS [17:15:43] c'mon scipy [17:15:48] compile better [17:23:13] Imma fix that later today [17:23:19] And stop using pip [17:23:19] :D [17:23:23] \o/ [17:23:27] At least for deploys [17:23:34] If not all the time [17:23:45] Yeah [17:24:10] I'm going to use git submodukes for ores and revscoring [17:24:37] For the dev environment? [17:24:59] Or are you talking about with ores-wikimedia-config [17:25:03] ? [17:25:36] Crap. We need to blow out cache. [17:26:27] I can specify a key prefix. [17:26:43] Would you do that hot on redis YuviPanda? Or should we just increment the model version? [17:26:59] If we do that, we need to deploy new models, but it would invalidate old cache [17:28:34] Looks like deleting keys by prefix in redis is a no-go. [17:28:39] I'll update the model versions. [17:34:06] halfak OH [17:34:09] I remember now [17:34:18] you wanted to talk about how to handle deleted versions [17:42:33] Hmm... Version does not seem to be working for cache invalidation [17:42:40] Since when was this? [17:45:37] Yup. Version changed. This is a cache issue. [17:45:39] God damn [17:53:41] Hmm... Cache is working in my local dev. [17:53:44] Something is a bit weird. [17:55:37] YuviPanda, I think it is the lb [17:55:43] Is it caching? [17:57:12] I take it back. 
[17:57:16] Can't be nginx [17:59:22] Nope. It's in my use of celery. [17:59:37] YuviPanda, do you know a good way to blow away celery's cache of results? [17:59:45] Basically I want to clear the result backend [18:02:59] I think we need to do this every time we deploy new code [18:03:50] hi all, anyone here familiar with lambda calculus? [18:06:55] aetilley probably is. [18:06:59] ... when he comes back. [18:09:10] Arg! This is driving me nuts. [18:09:17] I'm seriously considering just blowing cache apart. [18:09:47] But it would be unsafe to do that while the service is running. [18:10:10] I don't understand why celery is behaving this way. [18:10:37] celery *does* get the version number as part of its key [18:10:43] Something else is being the cache here. [18:14:11] Well... we do work for new revisions. So I think I'm going to deploy this. [18:15:19] Looks like we're building numpy and scipy again. [18:15:40] I can't believe that this is still an issue after hardcoding the versions. [18:16:49] pip isn't using the local env's packages. [18:18:06] Oh. Looks like the workers were on 0.16.0. [18:18:12] And we just hard-coded to 0.14.0 [18:18:17] "Because hardcoding is better [18:18:52] Why are the workers on 0.16 [18:18:59] Because we used to be on ranges [18:19:06] Oh [18:19:12] Wait [18:19:21] Is the Debian package on 16 [18:19:25] Nope [18:19:27] Or is that the pip [18:19:27] 0.14.0 [18:19:29] (Am out [18:19:31] Aaah [18:19:34] That makes sense [18:19:44] BTW, we're having a weird caching issue. [18:19:46] I'll be at a laptop in a couple hours [18:19:47] I can't figure it out. [18:19:51] Oh OK [18:19:52] * ToAruShiroiNeko_ is fascinated by the amount of pain numpy and scipy brings [18:19:58] Flushdb kills all of redis data [18:20:11] Lb does no caching atm [18:20:11] Safe to do while the celery is using it as a backend? [18:21:57] Ah [18:22:02] You will lose jobs [18:22:09] You can clear things with a prefix [18:22:15] With some Lua [18:22:18] What's celery's prefix? [18:22:24] Been trying to work that out. [18:22:27] There is a stack overflow anser [18:22:28] Ah [18:22:34] Do keys * [18:22:40] And see? [18:22:45] Yeah... that's going to return a billion things [18:22:48] 8.5 million [18:22:51] Actually [18:23:26] Still compiling on worker-01 [18:23:39] * halfak suppresses his frustration [18:24:38] YuviPanda, how would you increase logging in staging? [18:25:45] I want to get debug from uwsgi [18:28:21] WHAT! [18:28:34] I just updated the script, increased the log level and restarted and it worked [18:28:36] WTF [18:28:43] WTFW WTWFWTWFWTFWTW [18:29:23] I just updated ores_wsgi.py and restarted uwsgi on staging to get this. [18:29:35] So, maybe the restart on uwsgi is insufficient? [18:29:39] Needs more reboot? [18:29:49] * halfak sighs [18:29:57] Compiling on worker-02 now [18:32:09] OK. It looks like there's no "uwsgi-ores-web" on staging [18:32:20] so 'sudo service uwsgi-ores-web restart' does nothing [18:32:32] But 'sudo uwsgi restart' works great [18:32:47] *'sudo service uwsgi restart' [18:34:09] Either way, I think our deploy is good. So long as I can make sure that the web nodes get a proper restart. [18:36:32] halfak: OK. I'll be available in a few hours - 2 or 3? [18:37:20] No worries. I think I've got this now that I know it isn't a caching issue, but the new code not getting fully deployed on staging. [18:37:34] But it would be cool if you could swing back around then. [18:37:48] I'll give you a status update in ~30 minutes by PM. 
[18:37:54] That you can read later [18:38:08] Yeah ok [18:38:08] I'm at a friend's place so.. [18:38:17] Unavailable much until I leave [18:38:45] No worried. Thanks for checking in. [18:38:49] :) [18:38:58] Sorry for all the pings [18:44:27] https://github.com/wiki-ai/ores-wikimedia-config/issues/29 [18:44:43] Workers are done [18:44:48] * halfak wipes brow [18:45:15] And the web nodes are ready (because we already did the version switch there) [18:45:20] It looks like we are in a good state. [18:45:58] Workers are happily working [18:47:52] AND WE'RE BACK! [18:55:01] halfak: didn't realize there was an outage [18:55:02] :( [18:55:20] Not really an outage so much as a shitty scoring period [18:55:25] I guess that is like an outage [18:56:47] o/ Shilad [18:57:22] We had pressure679 in here earlier talking about content modeling [18:57:24] Halfak, Hi! [18:57:33] If he/she pings you, know that I sent 'em your way. [18:57:38] :) [18:57:40] Awesome. Thanks! [19:03:47] halfak so [19:27:42] halfak I am adding language features to french, dutch and german btw [19:27:52] but my lack of regex skill kind of limits me [19:29:13] ToAruShiroiNeko_, sounds like this is a good time to develop the skill. [19:29:49] The nice thing about regex is that it usually laid out plainly. You don't have to follow references. [19:30:01] You really only need to know a few basic operators. [19:30:20] E.g. repetition: * + {min,max} [19:30:28] groups: () [19:30:33] classes: [] [19:30:41] special classes: \s \d \w \b [19:30:53] Once you have those, that's like 99.9% of regexes in the wild. [19:44:01] hmm [19:44:41] is r"foutre" a valid regex? [19:44:59] granted it is a match for one thing [19:47:22] Yup [19:47:26] ToAruShiroiNeko_, ^ [19:58:22] okay [19:58:37] what I will do is I will commit one match regexxed version first [19:58:46] and then try playing with regex a little [19:58:47] Sure. [19:58:55] Look at how I set up my test files [19:59:13] Do yours a similar away. That will make it easy to test your regex as you go. [19:59:17] I am working on revscoring/language/ [19:59:21] Yu[ [19:59:23] Yup [19:59:26] adding bad word lists etc [19:59:28] see revscoring/languages/tests/ [19:59:32] yeah [19:59:40] at some point I will create those too :) [19:59:45] bad words are a apin [19:59:51] pain [19:59:56] see https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/tests/test_english.py [20:00:10] Notice how I have lots of variants of the curses [20:00:35] But only one or two regexes for the same general curse: https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/english.py#L29 [20:02:23] halfak: in your code, what does "doc" stand for? (e.g. https://github.com/mediawiki-utilities/python-mwapi/blob/master/mwapi/session.py#L130) [20:02:51] JSON document [20:03:02] ah [20:03:10] YuviPanda, asked about that too. Do you find it counter-intuitive? [20:03:20] I usually like to do things like: page_doc and rev_doc [20:03:24] I could do query_doc [20:03:28] Or response_doc [20:03:48] * YuviPanda still finds it counter intuitive and prefers response / resp [20:03:49] I've never heard anyone say "JSON document" before [20:03:56] Yeah me neither [20:04:03] (I'm at a bank now) [20:04:18] https://en.wikipedia.org/wiki/JSON [20:04:25] "JSON documents can be encoded in UTF-8, UTF-16 or UTF-32 ..." [20:04:27] I associate "response" with the network request, and use something like "data" for the dict [20:04:35] "... a JSON document may consist entirely of any possible JSON typed value." [20:04:40] Yea. 
[20:04:47] Seems to me that data is even less specific though [20:04:51] halfak: so far I've only ever heard you say it :P [20:05:03] its not an abbreviation though :P [20:05:05] Well I didn't write the article :P [20:05:15] Oh! Fair point [20:05:18] document is a long word [20:05:22] halfak sure, I did this before ;) [20:05:29] and you're going to get sub-fields out of it [20:05:36] doc['query']['pages'] [20:05:37] also, is python-mwapi py3 only? [20:05:38] ^ e.g. [20:05:42] so I want it to be short. [20:05:45] regex is new but I did create language files before :P [20:05:47] legoktm, recent change, yeah [20:07:51] I just copied english and am putting in german and dutch curse words [20:08:59] https://github.com/mediawiki-utilities/python-mwapi/pull/21 [20:10:37] legoktm, overwrites my IP [20:10:39] :P [20:11:03] if you google your IP, python-mwapi is pretty high in the result list :P [20:11:22] * legoktm runs away to play some ingress [20:11:39] Meh. My Wikipedia article has my birth date and birth city so... [20:11:47] o/ legoktm [20:15:36] halfak I want the regex to be a seperate pull reuest as I will spend some time playing with it [20:15:48] Thats OK. [20:16:04] Just so long as the basic regexes you have in place catch the full set of badwords. [20:16:28] Say, could you also go through the Turkish badwords list and label it like I did the Spanish, Portuguese, French and Indonesian? [20:17:07] https://github.com/wiki-ai/revscoring/pull/189 [20:17:33] with turkish I have a problem with our system [20:17:40] ? [20:17:54] I would like to brainstorm the approach [20:18:10] What's the problem? [20:18:10] so curse words arent remotely important since they are filtered to death [20:18:28] Well, that doesn't matter for the Turkish *language* [20:18:38] It matters for the application of the language to features for trwiki [20:18:51] the problem is when someone adds in text replacing ç with ch and ş with sh [20:19:24] vast majority of the reverts we had have had this [20:19:35] woha [20:19:52] did I do a past progresive participble whatchamacall it tense? [20:21:03] Same thing as "that that" [20:21:14] Did you know that that thing is there? [20:21:34] All of the reverts we have had have had this [20:21:39] You dropped the "have" :P [20:23:04] O_O [20:23:10] * ToAruShiroiNeko_ hides under table [20:24:37] English, the sense is what it makes! [20:28:48] English, the sense you can believe in [20:34:41] well pull reuest is there for your review [20:40:33] ToAruShiroiNeko_, some notes [20:41:10] v==notes are noted [20:41:49] There's no way this test should pass. [20:42:00] as in? because it is in english? [20:42:11] There's no "myspell-de" [20:42:15] Yes [20:42:19] oh [20:42:22] And it's looking for misspelled dutch words [20:43:12] given how large the langugage this is surprising [20:43:28] I'll look into it tomorow [20:43:36] kk [20:49:07] * halfak runs away too [20:49:08] o/