[00:24:34] YuviPanda: found the bug... pybuild is case sensitive to the package name, it needs to match the deb name and not the python module name.
[00:25:19] Anyway... so what's the acceptance criteria for these puppies? There's a branch we can use to build debs, but it's not upstreamed.
[00:26:08] I've got a showstopper bug and a PR in place to fix it. If anyone wants to help me get new models staged tonight, check out https://github.com/wiki-ai/revscoring/pull/158
[00:26:34] halfak: reading...
[00:26:37] awight, I implemented the RegexLanguage class. I know what you are talking about re. composing. I'd like to address that later if it's OK.
[00:27:28] No worries--it's just barely itching, we can wait until the affected area spreads :)
[00:28:30] this change is great
[00:28:53] Can you briefly explain the bug this patch is fixing?
[00:31:12] Yeah. So it had to do with re-writing the module. It's the same problem we have when I tried to use function decorators around pickled things. If you set a name in a module to one thing and then change that later, pickle gets confused.
[00:32:06] So, pickle is all like, "I heard there was a function called revscoring.languages.english.is_badword_process" and our language utility is like "I've never heard of 'is_badword_process'. I just have 'is_badword'."
[00:32:07] OK. Yeah, using the modules as de facto classes was unnerving...
[00:33:16] merged
[00:38:56] Thanks awight !
[00:39:13] I should probably run the code when reviewing...
[00:39:45] I have python in pieces on cinder blocks at the moment, though, working on this packaging.
[00:41:30] awight, I know what you mean.
[00:41:38] FWIW, tests pass :)
[00:47:26] BAM: https://github.com/wiki-ai/revscoring/releases/tag/v0.4.9
[00:55:12] * halfak grumbles.
[00:55:17] There's always one more thing.
[00:55:33] Really, I'm cleaning out some cobwebs where we haven't been walking recently.
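Editor's sketch of the pickle problem explained at 00:31 and 00:32. This is not the code from the PR; `toy_language` and the toy `is_badword_process` body are hypothetical stand-ins. It only illustrates that pickle records a function by module name plus attribute name, so loading fails once that name is gone from the module.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a language module; not a real revscoring module.
toy_language = types.ModuleType("toy_language")
sys.modules["toy_language"] = toy_language


def is_badword_process(token):
    """Toy check used only to have something to pickle."""
    return token.lower() in {"darn"}


# Pretend the function was defined inside toy_language under this name.
is_badword_process.__module__ = "toy_language"
is_badword_process.__qualname__ = "is_badword_process"
toy_language.is_badword_process = is_badword_process

# pickle stores only the module name and attribute name, not the code.
payload = pickle.dumps(is_badword_process)

# Later the module is rewritten so the function lives under another name.
toy_language.is_badword = toy_language.is_badword_process
del toy_language.is_badword_process


def try_load(data):
    """Return the unpickled object, or the error raised while loading."""
    try:
        return pickle.loads(data)
    except AttributeError as error:
        return error


outcome = try_load(payload)
print(type(outcome).__name__, outcome)
```

Keeping the name the pickle recorded available in the module (or not renaming after the fact) is what makes old pickles loadable again.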
[00:56:10] I know what u mean--I live in an old fruit packing shed, disguised as a small house... the spiders have a long memory, though.
[00:56:29] I probably sweep the ceilings more than I sweep the floor!
[00:56:54] * halfak imagines spiders raining down on awight as he sweeps
[00:58:08] on the bright side, my kid has no fear of them whatsoever. She charged right over and sticks her nose in their faces... calls it "doing an experiment"
[00:58:42] lol! Just so long as they aren't dangerous! No harm then :)
[00:58:57] Fast!
[00:58:59] Thanks
[01:00:34] * halfak installs right from the repo this time rather than cutting a version.
[01:01:00] Sure. NOW it all works -- when I get cautious
[01:02:44] Anyway, I'll let it conclude with building the models this time before I cut a new version.
[01:02:51] And on that note, it's time to hang out with Jenny.
[01:02:56] See you folks tomorrow!
[01:02:58] o/
[01:14:38] hmm
[01:14:48] I had a spider building a web on the tram the other day
[01:14:57] buggers do not miss a second to do this
[01:15:09] on a moving tram...
[01:16:36] Good thing about reference frames seeming like they're not moving!
[13:56:22] * halfak looks around for a YuviPanda
[13:56:44] I've got a PR that has been waiting for a while that I think you could review in 30 seconds.
[13:56:45] https://github.com/wiki-ai/ores/pull/75
[13:56:58] * halfak is trying to clean up his open tasks.
[16:18:10] Hey um, I got the attention of the socketIO-client author, and he's asking about why we use the 0.9 protocol.
[16:18:13] https://github.com/invisibleroads/socketIO-client/pull/88#issuecomment-128394500
[16:18:42] I could just point him to our requirements.txt, but I was hoping to add a few words about why we're still on the old protocol. Does it have to do with server compatibility?
[16:18:51] awight: because the 1.0 protocol didn't exist when the server was written, I think
[16:18:58] and 0.9 and 1.0 are incompatible
[16:19:01] iirc
[16:19:10] thanks! You know what server we're using?
[16:19:28] gevent-socketio, perhaps?
[16:20:25] hmm
[16:20:31] "import socketio"
[16:20:34] that doesn't help =p
[16:21:13] but probably gevent-socketio
[16:22:45] yes, http://packages.ubuntu.com/trusty/python-socketio which is gevent-socketio
[16:23:04] awesome, thanks for tracking that dependency down.
[16:23:37] Well, the upstream author is suggesting that the socketIO-client should be rewritten to support both 0.9 and 1.0
[16:24:26] valhallasw`cloud: sorry, which repo is that in?
[16:24:36] https://github.com/wikimedia/rcstream/blob/master/rcstream/rcstream
[16:24:42] I'm wondering whether we could simply run the node server
[16:24:42] k
[16:24:53] aww
[16:25:06] but given that there's clients running s.io 0.9, I'm not sure how we can upgrade without breaking those
[16:25:50] Cool, I can give a solid reply now, that it would be useful to us to have both protocols supported in the client...
[16:30:08] https://phabricator.wikimedia.org/T68232
[17:23:07] awight: yeah, the plan is to re-write rcstream into nodejs which will switch to 1.0
[18:03:25] oh! thanks for the note, yeah I see the task now: https://phabricator.wikimedia.org/T86803
[18:03:37] Timo says he already wrote a rough pass at it...
[20:43:38] madhuvishy: I'll be there about 15mins late sorry
[20:43:57] okay. halfak ^
[20:45:46] At lyft
[21:09:02] o/
[21:09:28] YuviPanda: madhuvishy will be back at my desk in 2 min
[21:12:33] :)
[21:13:52] madhuvishy: let's start anyway?
[21:13:55] Ha. Looks like YuviPanda is later than I am.
[21:13:57] madhuvishy: I'll be in the office in like 10mins
[21:14:14] madhuvishy: am creating wikilabels project and adding you as admin
[21:14:22] Add me too!
[21:14:29] ok
[21:14:30] YuviPanda: okay
[21:15:14] madhuvishy: added. can you create an instance? I guess medium would be fine
[21:16:04] YuviPanda: okay
[21:16:16] 1 minute - going up to 5th floor. Come there?
[21:16:26] madhuvishy: ok
[21:16:32] halfak: I added you too
[21:16:53] YuviPanda, anything I can do to help in the meantime?
[21:17:17] halfak: not atm I guess :D do you know if there are any active campaigns running atm / people using the site?
[21:17:28] halfak: we'll need a few minutes downtime at the last minute when we switch DNS
[21:17:40] YuviPanda, I'll scope that out quick.
[21:17:59] halfak: ok. You can tail logs on the current host I guess if you want
[21:18:08] BTW, the DB has not been copied. Currently the DB is only on labels.eqiad.wmflabs
[21:19:02] halfak: yes that's one of the steps
[21:19:08] steps are:
[21:19:12] 1. provision new instance
[21:19:17] 2. put it on labels-new.wmflabs.org
[21:19:19] 3. test to see if it's good
[21:19:32] 4. do a DB dump from current place, import in new place
[21:19:38] 5. switchover DNS
[21:19:43] madhuvishy: ^
[21:20:10] OK. Looks like no one has been active for about 3 hours.
[21:20:27] cool
[21:20:32] so we can tolerate 5-10mins of downtime I guess
[21:20:35] * halfak goes to label some revisions while he waits.
[21:20:40] YuviPanda, definitely.
[21:22:19] okay YuviPanda back and creating instance
[21:22:30] madhuvishy: ok. call it wikilabels-01? always use sequences :D
[21:22:34] I'm almost at the office
[21:22:48] okay come up to 5th
[21:22:49] madhuvishy: you also need to add the role to the 'manage puppet groups' on the sidebar so it shows up in the configure page
[21:22:51] madhuvishy: ok
[21:30:06] madhuvishy: am here now
[21:32:57] halfak: we don't need video for this do we/
[21:32:59] ?
[21:33:02] I think we can do this just on IRC
[21:33:30] Yeah. I don't think so. We'll see if we feel like we're slow and voice would help.
[21:33:37] ok
[21:33:52] madhuvishy: halfak we should log actions in the labs channel
[21:34:00] format is : !log message
[21:34:02] madhuvishy: ^
[21:41:15] YuviPanda, will try to keep that in mind. So, next time I do a deploy or maintenance, I'll want to log it.
[21:41:50] facepalm for pr 163
[21:42:02] RIP PEP8
[21:42:05] halfak: ok!
[21:42:09] lol
[21:42:21] I just responded. it looks like reza1615 is behind master.
[21:42:46] yes, we talk in Telegram
[21:42:57] he said he wants to fix it
[21:44:28] Great.
[21:44:45] Amir1, I struggle a lot to format rtl in my editor. Any protips?
[21:45:02] I think it's not actually rtl, but the modification chars.
[21:45:41] ctrl+shift+x sometimes helps (ff only)
[21:46:35] Firefox? You write python in firefox?
[21:46:47] no of course not
[21:46:50] :P
[21:46:53] I use sublime
[21:47:02] sublime handles the chars OK?
[21:47:05] (I was like no no no no)
[21:47:21] not very good but acceptable
[21:47:32] gedit works good too
[21:47:44] Oh yeah. I should try gedit.
[21:47:47] oh by the way, result for Persian bad words (my code) wasn't good, I realized that my Tokenizer for Persian wasn't complete I re-run it and the result is really good now :)
[21:48:21] Cool! Are you using a similar tokenization style as we are within revscoring?
[21:48:28] Or do we need to update that too?
[21:48:40] No
[21:49:40] and there is a thing, Persian language doesn't have latin characters but I think we should add them too, since bad words can be expressed in latin too
[21:50:18] vim master race etc
[21:50:20] :P
[21:50:41] Amir1, I agree. We should think about how to express curses from other languages.
[21:51:09] Maybe dictionaries and badwords should be a flexible feature.
[21:51:25] E.g. english_words / words in persian might be quite predictive.
[21:51:37] we need a meta bad word too
[21:51:45] like yolo came up everywhere
[21:51:48] lol
[21:51:59] also lol :D
[21:52:05] Brain virus that only affects trolls
[21:53:50] one of the bad words in Persian is "member of Basij" :)))))
[21:55:26] What is Basij?
[21:56:11] A branch of Army which works in enforcing Sharia in Iran
[21:56:39] people join as volunteers but It has benefits for them
[21:57:47] Ha. I see. I'm guessing this branch of the Army is unpopular enough for it to be an insult to suggest someone is a member of it?
[21:58:09] very unpopular
[21:58:22] yup
[21:58:59] some people are proud to be member of it anyway
[21:59:36] halfak: haha we lost the db passwords by accidentally deleting them lol
[22:00:32] :))))
[22:00:37] lol wut?
[22:01:33] So... you can just root onto the machine and change the root passwd, right?
[22:01:40] *DB password
[22:09:00] Amir1, it looks like there's a bunch of unbalanced parens here: https://github.com/wiki-ai/revscoring/pull/163
[22:09:02] Am I reading this wrong?
[22:09:12] Maybe there's a persian char that looks like ( or )?
[22:09:43] no there is not a persian char
[22:09:59] probably this user didn't check it very carefull
[22:10:04] *carefully
[22:11:59] Gotcha. Could you take a look and confirm or deny my concerns?
[22:12:09] sure
[22:12:15] Honestly, these regexes look *WAY* too long
[22:13:18] halfak: yeah we did that
[22:18:11] halfak: are you using any module called 'encoding'?
[22:18:35] I don't think so. Is it part of standard lib?
[22:19:02] Yeah.. It doesn't look like it. Why do you ask?
[22:19:16] Could be a secondary dependency on something we *do* use.
[22:23:46] halfak: we were getting an error but that's sorted
[22:27:53] Cool! In next meeting, but feel free to pull me out if you need to. It's extremely boring ;)
[22:30:38] halfak: what are the db credentials in the old labs instance
[22:32:17] PM'd
[22:59:19] halfak: migrated db also. can you verify things work?
[23:01:17] hey guys!
[23:01:45] Sorry. Laptop issues. Will need to reboot.
[23:02:14] wazaap!
[23:03:43] Just a quick introduction: Alchimista is a user from ptwiki who maintains some of our bots, and he is experimenting with revscoring
[23:04:15] Helder: lets let halfak return :P
[23:04:24] yep! :-)
[23:04:34] Sorry of here
[23:04:40] Sort!
[23:04:59] I'm going to have to run shortly
[23:05:07] ToAruShiroiNeko: hey! can you help us test wikilabels?
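Editor's sketch relating to the unbalanced parentheses raised at 22:09 about PR 163. This is an assumed review aid, not something the participants say they ran: compiling each pattern with the standard `re` module surfaces unbalanced groups before a long regex lands. `find_broken_patterns` and the sample patterns are hypothetical.

```python
import re


def find_broken_patterns(patterns):
    """Return (pattern, error message) pairs for patterns that fail to compile."""
    broken = []
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as error:
            # Unbalanced "(" or ")" is reported here by the re module.
            broken.append((pattern, str(error)))
    return broken


# Made-up sample patterns, not taken from the pull request.
samples = [r"(foo|bar)", r"(foo|bar", r"foo)"]
for pattern, message in find_broken_patterns(samples):
    print(repr(pattern), "->", message)
```

A check like this only proves the patterns parse; it says nothing about whether they match the intended words.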
[23:05:11] halfak|Mobile: do you know who else can?
[23:05:33] sure
[23:06:21] i'm developing a bot, but with heuristics, and it would be useful to use revscore to check the bot's output, since i'm doing it manually, there's no streaming for now, right?
[23:06:23] Hey folks.
[23:06:25] Back to normal. What do I need to test?
[23:08:24] Alchimista you can request 500 scores at once if you like
[23:08:37] I wouldn't recommend that.
[23:08:44] I wouldnt either
[23:08:45] I'd recommend 50 at a time.
[23:08:52] halfak|Mobile: labels-new.wmflabs.org
[23:09:02] halfak: we moved the data as well so need to check data / functionality
[23:09:04] ToAruShiroiNeko: ^
[23:09:21] But yes, Alchimista there's no stream of scores, but you can listen to rcstream and query our service.
[23:09:25] On it, let me start up the virtual machine and get this party rolling
[23:09:27] OK YuviPanda
[23:10:19] This is a good URL for checking: http://labels-new.wmflabs.org/campaigns/enwiki/4/?campaign=stats
[23:10:28] It looks good to me. These are the stats I expect.
[23:10:30] YuviPanda: I'm trying my bot to get stopwords for languages but it gives me permission error
[23:10:30] PermissionError: [Errno 1] Operation not permitted: '/public/dumps/public/scowiki/20150702/scowiki-20150702-pages-meta-history.xml.bz2'
[23:10:33] ?
[23:10:50] halfak: ok, can you verify and sign off
[23:10:55] I'll try to load up the gadget and do the OAuth maneuver.
[23:10:59] for now it's more interesting the query, to have an idea how the bot is working, but 50 scores at once seems pretty nice
[23:11:34] and it happens in grid engine
[23:11:35] and the datasets, are public?
[23:11:36] halfak: want us to switch it over now?
[23:11:39] Alchimista, do you mean
[23:11:44] YuviPanda, not quite
[23:11:45] http://ores.wmflabs.org/scores/ptwiki/?models=reverted&revids=41338880|41339445|123456|78910
[23:11:46] ?
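Editor's sketch of the batching advice above (no score stream, ask for roughly 50 revisions per request). The URL shape is copied from the example in the log and may not match the current service; `batches` and `score_urls` are hypothetical helper names, and the sketch only builds URLs, it does not send any request.

```python
# URL pattern taken from the example URL quoted in the log (assumed still valid).
ORES_SCORES_URL = "http://ores.wmflabs.org/scores/{wiki}/?models={model}&revids={revids}"


def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_urls(wiki, model, rev_ids, batch_size=50):
    """Yield one scores URL per batch, with revision ids joined by '|'."""
    for batch in batches(list(rev_ids), batch_size):
        revids = "|".join(str(rev_id) for rev_id in batch)
        yield ORES_SCORES_URL.format(wiki=wiki, model=model, revids=revids)


for url in score_urls("ptwiki", "reverted", [41338880, 41339445, 123456, 78910]):
    print(url)
```

A bot listening to rcstream could collect revision ids as they arrive and flush them through `score_urls` once a batch fills up.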
[23:11:47] halfak: ok
[23:12:09] * halfak logs into his test account and points it at this new server
[23:12:09] halfak: btw. I'm working to use bad word detection system to get stop words, using idf
[23:12:11] :)
[23:12:49] Alchimista, the probabilities returned by that URL are computed when they are requested (or loaded from cache)
[23:12:49] :D!
[23:12:51] Helder: yap
[23:13:07] Good idea Amir1
[23:13:37] :)
[23:13:47] so, I don't think there is a dataset with e.g. rows such as | (rev) 123456 | (score) 0.123 |
[23:13:51] Alchimista, ^
[23:13:53] YuviPanda, getting an error with https://labels-new.wmflabs.org/gadget/loader.js
[23:13:59] is that what you were looking for?
[23:14:08] So something is up.
[23:14:18] Helder: the dataset for the model training
[23:14:22] That path loads a Flask template
[23:14:28] It could be that it can't find the templates.
[23:15:11] I think halfak saved on Labs a copy of the tsv files which contains the revids of each page used for training the models
[23:15:22] That is right.
[23:15:23] Alchimista, ^
[23:15:34] halfak:
[23:15:38] https://www.irccloud.com/pastebin/8HWhcHty/
[23:15:42] halfak: is that dataset public? it would be interesting to use it
[23:16:19] * halfak checks config
[23:16:33] It's not hosted, but there's nothing private about it.
[23:16:57] Sure enough, it appears there's some missing things from the config.
[23:17:26] Can try fixing in master quick. OK madhuvishy ?
[23:17:43] halfak: okayy
[23:17:57] okay YuviPanda I have my virtual machine up and running
[23:18:03] halfak: in the config repo?
[23:18:19] Yup
[23:18:28] Done
[23:19:00] It's going to behave weird on anything but labels.wmflabs.org, but that's going to need to be OK.
[23:20:27] I'll need to figure out how to get this from the request object in flask later.
[23:20:38] halfak: there's a Host HTTP header
[23:20:52] Yeah. So it should be an attribute of 'request'
[23:21:01] * halfak files bug
[23:21:02] yup
[23:21:25] halfak: works now
[23:21:27] https://github.com/wiki-ai/wikilabels/issues/77
[23:22:39] https://labels-new.wmflabs.org/gadget/loader.js loads now
[23:22:42] OK. It looks like this is working.
[23:22:50] * halfak checks the last submitted label.
[23:22:55] And workset created,
[23:23:32] No labels since I submitted mine.
[23:24:25] Yeah. So if you load a snapshot right now, it should be good.
[23:24:48] halfak: we loaded a snapshot few minutes ago
[23:24:48] No activity in the last hour.
[23:24:54] What time UTC?
[23:25:22] Last activity was 2015-08-06 22:26:46.462613
[23:25:38] halfak: 2015-08-06T22:58:36.507Z
[23:25:38] Yeah... Actually that should be good.
[23:25:46] https://stashbot.wmflabs.org/#/dashboard/elasticsearch/SAL is the log btw
[23:26:08] OK. Let's flip the switch!
[23:26:26] can others see https://stashbot.wmflabs.org/#dashboard/temp/AU8FVh036snAnmqnLHKx
[23:26:33] I'll run a quick test after it is flipped
[23:26:37] Do you want me to do it?
[23:26:37] ok
[23:27:23] * halfak waits for wikitech
[23:27:55] Umm.
[23:28:06] It looks like labels.wmflabs.org is already pointing to it!
[23:28:24] * halfak checks the revscoring proxy to see if it still exists
[23:28:25] halfak: yes, see -labs for logs from madhuvis.hy.
[23:28:37] she already flipped it a few secs ago
[23:28:42] Oh!
[23:28:45] halfak: flipped :)
[23:28:48] :D
[23:28:51] OK. Tests again
[23:30:32] And we're good!
[23:30:34] Thanks guys!
[23:30:46] YuviPanda & madhuvishy: <3 <3 <3 <3
[23:30:49] halfak: \o/
[23:31:08] halfak: :D
[23:31:21] halfak: so I guess you need to start buying madhuvishy drinks too
[23:31:29] I've got to run right now. I'll need to get used to the fab script later. I'll make sure I test some staging tomorrow.
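Editor's sketch of the Host header idea from 23:20 (wikilabels issue 77). In Flask the value is exposed as `request.host`, and `request.host_url` includes the scheme. To stay dependency-free, the sketch below reads the same information from a plain WSGI environ; `base_url_from_environ` is a hypothetical helper, not wikilabels code.

```python
def base_url_from_environ(environ):
    """Build 'scheme://host' from a WSGI environ dictionary.

    HTTP_HOST carries the client's Host header; SERVER_NAME is the
    fallback that WSGI requires servers to provide.
    """
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST") or environ["SERVER_NAME"]
    return "{0}://{1}".format(scheme, host)


# Hand-built environ for illustration; a real server fills this in.
example_environ = {
    "wsgi.url_scheme": "https",
    "HTTP_HOST": "labels-new.wmflabs.org",
    "SERVER_NAME": "localhost",
}
print(base_url_from_environ(example_environ))
```

Deriving the base URL per request would let the same config serve both labels.wmflabs.org and labels-new.wmflabs.org, though a proxy in front of the app may rewrite the Host header, which is an untested assumption here.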
[23:31:41] halfak: we don't have a staging server yet, will need to set it up at some point
[23:31:43] madhuvishy, >:) so much Absynth
[23:31:47] halfak: I will move the staging instance to this project soon
[23:31:49] hah
[23:32:01] Gotcha. fabric still points to the old servers
[23:32:08] i've never done absinthe
[23:32:08] halfak: pfft spelt wrong :P absinthe ;)
[23:32:10] Anyway gotta run.
[23:32:20] and unfortunately it doesn't have any hallucinogens despite what pop culture tells you
[23:32:21] Abs-synth
[23:32:22] ya i'll fix the fab thing too
[23:32:26] madhuvishy: flaming absinthe shots are the best
[23:32:36] * halfak runs away before he gets in more trouble
[23:32:37] o/
[23:32:42] i've never done shots
[23:32:47] bye halfak!
[23:32:55] o/ Alchimista, it was nice to meet you.
[23:33:01] I hope you'll stick around.
[23:33:05] Talk to you later!
[23:33:17] bye halfak, yap, i'll be around