[17:04:30] hey halfak, I injured my hand really badly today in soccer (don't get me wrong, I had to take a soccer course :| I'm more of a fan of swimming and biking). Anyway, I probably broke some bones. I'll do my best to work but I might not be at my best speed
[17:04:49] No worries. Heal first. We'll work out the details later.
[17:04:59] Don't make your injury last longer.
[17:05:35] using some fixing materials, it only aches now
[17:05:48] if I take a painkiller, I sleep
[17:06:10] so I can work, but not very much
[17:06:16] * halfak has broken his wrist ~10 times.
[17:06:22] wow
[17:06:30] it's my wrist too
[17:06:40] strange that it's soccer
[17:06:40] My recommendation is ice + acetaminophen & immobilization.
[17:07:20] I did ice + ibuprofen and immobilization
[17:08:24] you injured your wrist because of biking?
[17:12:14] * YuviPanda has never broken bones
[17:12:18] and shall like to keep it that way
[17:12:34] Amir1: take care of your hand first!
[17:12:54] thanks YuviPanda :)
[18:32:51] Amir1, biking, boxing & sledding.
[18:34:53] halfak: wow, my bf is also a boxer
[18:35:40] I never got my license, but I had a lot of fun sparring in college.
[18:35:48] Now, my old gear gets neglected.
[19:32:04] wiki-ai/wb-vandalism#100 (new_reverteds - 5cfd5d9 : Amir Sarabadani): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/92782600
[19:34:15] wiki-ai/wb-vandalism#101 (new_reverteds - b567fa2 : halfak): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/92782857
[19:52:27] wiki-ai/revscoring#329 (table_format - ea512df : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/92786232
[19:58:55] halfak: around?
[19:59:02] Yup
[19:59:17] page.is_content_namespace
[19:59:26] is it necessary to add?
[19:59:48] It's not, but I figured that would be important since we're going to try to process talk pages like wikidata items :S
[20:00:26] I can take it out if you think that is important.
[20:00:33] It won't require an additional request.
[20:01:06] there is nothing important
[20:01:47] but because all of our training set is ns=0 edits
[20:01:54] it won't really matter
[20:02:17] Oh! I didn't know about that.
[20:02:27] I suspected that we just errored on any other namespace anyway.
[20:02:34] Since we'd try to read the content as JSON.
[20:02:51] yeah, we can build a pure revscoring model for them
[20:03:26] even though it doesn't really matter, there is no important vandalism in those namespaces
[20:03:42] +1
[20:03:54] BTW, this most recent model seems to perform way better on water.
[20:04:12] Also, our new "probably damaging revert" detection strategy filters out half the reverts!
[20:04:13] awesome
[20:04:24] So we might want to adapt your dump parser.
[20:04:45] ah an unlicensed boxer eh?
[20:04:50] awesome
[20:05:07] I will remember not to outbox you :p
[20:05:08] fight club, first rule: don't talk about it
[20:05:14] aha yes
[20:05:28] second rule is also recursive to the first one, right?
[20:05:28] :D We used training gloves though. So no one got hit that hard.
[20:05:44] halfak D:
[20:05:49] * halfak imagines recursive rules.
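A minimal sketch of the ns=0 filtering discussed above (around 20:01), assuming a hypothetical list of revision metadata dicts; the is_content_namespace function here is only a stand-in for the revscoring page.is_content_namespace feature mentioned at 19:59, not its real implementation.

    # Hypothetical example: keep only main-namespace (ns=0) edits when
    # sampling revisions for the training set, mirroring the "all of our
    # training set is ns=0 edits" point above.
    def is_content_namespace(rev):
        """Stand-in for the revscoring `page.is_content_namespace` feature."""
        return rev.get("ns", 0) == 0

    revisions = [
        {"rev_id": 1001, "ns": 0},   # article edit -- kept
        {"rev_id": 1002, "ns": 1},   # talk page edit -- dropped
    ]
    training_sample = [rev for rev in revisions if is_content_namespace(rev)]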
[20:06:06] White_Cat: yes
[20:06:08] That'd be "The first rule of fightclub is: Fightclub rules"
[20:06:11] basically an average wikipedia copyright policy
[20:06:35] The first rule of fightclub: go to Fightclub rules;
[20:06:46] goto probably instead of go to
[20:06:59] halfak: i will do the adapt, just give me some instructions
[20:07:12] Rest wrist :P
[20:07:40] void Fightclub_rules() { Fightclub_rules(); }
[20:09:39] okay, i'll do it only with my left hand :D
[20:10:16] Amir1, OK. So I'm not sure what you mean by "the adapt".
[20:10:16] i will also merge the pr if you take that out
[20:10:20] :)
[20:10:24] Oh sure. Let me do that now.
[20:10:28] I have gotten used to the soldiers with fully automatic rifles in the streets :/
[20:10:47] halfak Also, our new "probably damaging revert" detection strategy filters out half the reverts!
[20:10:48] halfak So we might want to adapt your dump parser.
[20:11:08] or maybe I got it wrong
[20:12:13] Oh yeah! So.
[20:12:17] * halfak gets the change
[20:12:46] thanks
[20:12:52] Amir1, see https://github.com/wiki-ai/editquality/commit/5cc0170df2f29bdbc06e804d15902daa6cea7f6b
[20:13:15] So, important note. The mwreverts.api.check() method returns a tuple of three things:
[20:13:22] 1. information about reverting other edits
[20:13:22] 2. information about being reverted
[20:13:37] 3. information about being reverted back to.
[20:13:58] An edit can "revert", "be reverted" or "be reverted back to in some future edit"
[20:14:28] The primary change in this commit was to include this was_reverted_to_by_someone_else condition.
[20:14:46] as a revscoring-in-prod update: we've decided we'll need to use separate redis servers for revscoring (to keep the impact on mw's redis servers minimal)
[20:14:52] alex is filing a procurement ticket
[20:14:57] So, if your edit was reverted, but someone other than you reverts back to the version you saved, you're good.
[20:15:11] and we've a structure agreed upon (two nodes doing both web and worker duty as one cluster)
[20:15:35] YuviPanda, we have a problem when our worker nodes can use the CPU that our web nodes use.
[20:15:43] We could lower the priority of the worker nodes.
[20:15:56] *nose --> processes
[20:16:00] *nodes
[20:16:02] ugh
[20:16:08] I wonder if in practice it won't make much of a difference in our case
[20:16:20] since we'll also be sharing CPU with other services (mobileapp and more in the future)
[20:16:28] and the machines have 64 'real' cores each...
[20:16:32] YuviPanda, I did a lot of testing on this. This is a very important concern.
[20:16:47] Our whole system breaks down if the web workers become our bottleneck.
[20:16:56] In testing, they did when the CPU was loaded down.
[20:16:59] halfak: okay, thanks :}
[20:17:51] halfak: can't we tune that by just setting the number of web and worker processes?
[20:17:55] err
[20:17:57] *tuning
[20:18:12] Not if they are both sharing the same CPUs
[20:18:25] Unless, I guess, we set so few workers that we can't fully utilize.
[20:18:33] I've invited alex here
[20:18:45] what do you mean by 'sharing'
[20:19:02] well, when the same CPUs run both the web and celery workers, the celery workers seem to win
[20:19:15] if we've access to 64 and (hypothetically) say 32 to web and 32 to workers, then the kernel will figure the rest out
[20:19:16] So rather than celery's queue being the bottleneck, the web worker queue becomes the bottleneck.
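A rough sketch of the mwreverts.api.check() call described around 20:13. It assumes an mwapi.Session and the three-item return value named in the chat; the revision ID is made up, and the exact keyword arguments of check() are not confirmed here, so treat this as illustration rather than the editquality code (see the linked commit for the real usage).

    # Sketch only: rev_id is hypothetical; consult the editquality commit
    # linked above for how this is actually wired up.
    import mwapi
    import mwreverts.api

    session = mwapi.Session("https://en.wikipedia.org",
                            user_agent="revert-check demo")
    rev_id = 123456789  # hypothetical revision ID

    reverting, reverted, reverted_to = mwreverts.api.check(session, rev_id)
    # 1. `reverting`: did this edit revert other edits?
    # 2. `reverted`: was this edit itself reverted?
    # 3. `reverted_to`: was this edit's version restored by a later edit?

    # The "probably damaging revert" idea from the chat, very roughly:
    # the edit was reverted and nobody later restored its version.
    probably_damaging = reverted is not None and reverted_to is None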
[20:19:22] goddamn laggy IRCaaaarrrgghsafhsahashg
[20:19:51] halfak: yeah, but that's because they're sharing CPUs in the sense of we're overprovisioning them (4 cores, >4 (worker count + web count))
[20:20:02] and in production we'll have enough CPUs that we won't need to overprovision
[20:20:14] YuviPanda, we should run ~4 celery workers per CPU.
[20:20:17] unless i'm misunderstanding (which is totally possible since you've spent way more time thinking of this than I have)
[20:20:22] Oh... then we'll only utilize at ~25%.
[20:20:44] I think the numbers will change since these are not VM cores, so we need to experiment
[20:21:10] but I think we shouldn't worry about utilization but more about latency and pressure, I guess
[20:21:15] It would be great if we could say, "Celery, you get 30 CPUs, so you have 120 workers. Uwsgi, you get 2 CPUs. Start up as many workers as you need."
[20:21:30] remember these are also shared machines, so there's other code running
[20:22:02] So, in testing, we don't share resources, but in prod, we do?
[20:22:05] I think the way to look at it should be 'we want latency to be X and this is the tuning we should do to keep latency at X for the foreseeable future'
[20:22:06] That seems... backwards.
[20:22:08] well
[20:22:17] in testing you share resources with all other labs vms :)
[20:22:30] in production you share resources with other services more explicitly
[20:22:48] Yeah. Fair enough. But will we suddenly lose half our capacity?
[20:22:51] no
[20:22:59] What enforces that?
[20:23:02] oh
[20:23:04] wait
[20:23:07] 'suddenly' as in?
[20:23:16] because the other resource is taking up too much of it?
[20:23:19] Something else starts competing with us.
[20:23:23] Yeah
[20:23:37] right, so we don't actually have any mechanism to do that yet outside of just 'let us have enough memory and cores to make sure that even if we do lose half capacity it is all ok'
[20:23:44] * YuviPanda waves at akosiaris
[20:24:12] OK. So utilization is always non-optimal then? Even at load?
[20:24:18] we can use cgroups to do guaranteed memory and CPU core limits if needed
[20:24:27] yes
[20:24:37] OK. Well that's fine I guess.
[20:24:40] our utilization super sucks everywhere. app servers are like at 10-15% utilization all the time
[20:24:51] I'm pretty sure it will work so long as our web workers don't get blocked on CPU.
[20:24:52] and in general we aim to not have more than 50% util on everything
[20:25:01] close to 50% util means 'add more resources'
[20:25:22] YuviPanda, yeah +1 for that. I just want to have 100% *available*
[20:25:29] oh yeah totally
[20:25:47] So if we suddenly get a bunch of requests (someone is doing an offline analysis) we can use what we've got and start looking for better ways to support it.
[20:25:50] I don't think we'll run into availability problems due to hardware contention, mostly because there's just enough hardware
[20:26:11] I think the 'real' hardware cores will be powerful enough for us to handle it
[20:26:37] Well, I'm not worried about power. I'm worried about demand.
[20:26:40] there's no direct comparison but one of these machines in prod is definitely way more powerful than a similarly specc'd labs VM
[20:26:41] Imagine infinite demand.
[20:26:57] How do we make sure our backpressure mechanism works while we use 100% of available resources?
[20:27:08] well, you never want to use 100% of available resources :)
[20:27:11] Currently, we do that by having the web nodes work independently of the celery nodes.
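A minimal sketch of the kind of backpressure check being discussed (the web layer refuses work when the Celery queue is too deep). It assumes Celery's default Redis broker layout, where pending tasks sit in a Redis list named "celery"; the Flask endpoint, threshold, and host are made up for illustration and are not the actual ORES implementation.

    # Hypothetical backpressure guard for a scoring endpoint: if the Celery
    # queue in Redis is already deep, return 503 instead of enqueueing more.
    import redis
    from flask import Flask, jsonify

    app = Flask(__name__)
    broker = redis.StrictRedis(host="localhost", port=6379)
    MAX_QUEUED_TASKS = 100  # made-up threshold

    @app.route("/scores/<int:rev_id>")
    def score(rev_id):
        queued = broker.llen("celery")  # default Celery queue name
        if queued > MAX_QUEUED_TASKS:
            return jsonify({"error": "overloaded, try again later"}), 503
        # ... otherwise enqueue the scoring task as usual ...
        return jsonify({"status": "queued", "rev_id": rev_id})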
[20:27:15] you want to use at most 50% of available resources
[20:27:21] YuviPanda, I don't want to. I want to be able to.
[20:27:28] so you deal with infinite demand by scaling horizontally
[20:27:43] YuviPanda, but we're talking about running everything in one/two boxes.
[20:27:46] 'oh this is utilizing too much - so we either move it to its own machines, or add more machines to this cluster'
[20:27:54] * YuviPanda curses IRC
[20:28:03] YuviPanda, but I want our backpressure system to work.
[20:28:05] halfak: yes, so two boxes + separate boxes for redis
[20:28:42] So either we don't let celery fully utilize, or we enter a situation where celery can use so much CPU that it crushes our backpressure system.
[20:28:57] fucking lag aaaagghh. I'm seeing your messages only several seconds after you send them
[20:29:10] * halfak wonders how you know about the time delay :P
[20:29:17] I totally agree halfak
[20:29:33] because I see the same delay for my messages too
[20:29:36] and it's infuckingfuriating
[20:29:49] Why is priority an issue here?
[20:29:58] as in?
[20:30:02] It seems to me that we can just run uwsgi at a higher priority and we're set
[20:30:09] priority of what?
[20:30:13] CPU
[20:30:31] I guess I just thought 'oh yeah, let us get it on those systems and play with all the tuning and decide then' :)
[20:30:49] so I totally agree, I just think we've very little information to make decisions about what we'll exactly want to do at this point in time
[20:31:03] so we need to set it up and load it and then mess around.
[20:31:04] YuviPanda, OK. That's fair. I'm just flagging a problem that we're running into on our single-machine staging env. now.
[20:31:10] yeah, totally.
[20:31:32] but remember it's a single machine, overprovisioned in a not-great way, and wouldn't translate directly at all to this scenario
[20:31:44] we should definitely be on the lookout for it
[20:31:58] but not put premature limits in place before measuring, I guess?
[20:32:08] and we have nice ways to measure
[20:32:10] and load test :D
[20:32:41] Sure... It's just that I took a lot of measurements and theoretically, the number and power of CPUs will play no role.
[20:32:45] Regardless, you are right.
[20:32:51] We should test when we get there.
[20:33:46] I will also say that we should even aim for *not* using 100% of everything - since that means you can't get on the machine and do stuff :) backpressure is to make sure we don't end up using 100% I guess.
[20:33:53] but step 1 is let's get it on the machinesssss
[20:34:00] (and set up varnish and lvs and deployment)
[20:34:02] oh god
[20:34:04] deployment
[20:34:08] akosiaris: this is using fabric for deployment now. what do we do?
[20:34:46] * YuviPanda doesn't want to touch trebuchet with a pole of any length
[20:34:47] YuviPanda: either scap3, which is basically similar to fabric, or trebuchet
[20:34:48] scap3?
[20:34:53] yeah, let's scap3
[20:34:56] and stay out of trebuchet
[20:34:59] configuration is the same for both btw
[20:35:04] * YuviPanda nods
[20:35:07] * halfak sides with YuviPanda's judgement
[20:35:12] but yeah, let's go with scap3
[20:35:17] sounds way more futureproof
[20:35:28] need to talk to releng though for that, as well as godog
[20:35:28] o/ akosiaris. Thanks for your help :)
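One way to act on the "run uwsgi at a higher priority" idea from around 20:30 would be to renice the Celery worker processes downward so the web workers win CPU contention on shared cores. A minimal sketch, assuming a standard Celery app with a Redis broker; the app name, broker URL, and nice value are arbitrary, and this is not how the production setup was actually configured.

    # Sketch: renice each Celery worker child so uwsgi effectively gets
    # higher CPU priority when both share the same cores.
    import os
    from celery import Celery
    from celery.signals import worker_process_init

    app = Celery("ores_tasks", broker="redis://localhost:6379/0")

    @worker_process_init.connect
    def lower_worker_priority(**kwargs):
        # A positive increment lowers the scheduling priority of this
        # worker process relative to the web (uwsgi) processes.
        os.nice(10)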
[20:35:39] I have 0 experience with scap3 yet as it is just out of the oven
[20:35:54] halfak: just going my job :-)
[20:35:59] s/going/doing/
[20:36:05] akosiaris: we also have a fun uwsgi problem where restarts take far longer than they should
[20:36:07] sorry, 10:30 pm here
[20:36:13] hm ? why ?
[20:36:19] haven't looked yet
[20:36:25] I thought it was doing a graceful restart, but it might not be
[20:36:34] probably some request/task still being completed ?
[20:36:37] need to dig at some point
[20:36:41] no, it's not doing that I think
[20:36:46] since we saw some requests fail during restart
[20:37:16] aaaagh, so many things to do, so little time
[20:37:24] * YuviPanda sets hair on fire and runs around like a chicken
[20:37:25] that's always the case
[20:37:29] lol
[20:37:57] akosiaris: it's been great to work with halfak though - he's done most of the things I thought I was going to do :D
[20:38:00] it's pretty nice
[20:38:00] do we have somewhere in phabricator what remains tbd ?
[20:38:11] akosiaris: yeah, there was a tracking ticket somewhere
[20:38:14] let me find it
[20:38:37] I got 2 actionables for me, which are to get the redis boxes and gbp-ize python-sklearn
[20:38:41] I'm game for whatever work I can do. I'm happy picking up new tech for this since I'll be maintaining it anyway.
[20:38:59] akosiaris: https://phabricator.wikimedia.org/T106867
[20:39:13] akosiaris: yeah, for sklearn we need to create a new task
[20:39:32] yeah, I'll do both tomorrow
[20:39:38] akosiaris: development also happens on github, so we'll need gerrit mirrors
[20:39:47] ok cool
[20:39:57] hmm not autosynced I suppose, right ?
[20:40:11] halfak: will you just be pushing to gerrit as well ?
[20:40:35] we can do that...
[20:40:45] so just push to gerrit before deployment
[20:40:47] or ? I sense a better idea in the air
[20:40:58] I was going to suggest automatically mirroring into gerrit :D
[20:41:04] but I don't know the security implications of that
[20:41:18] can github do that ?
[20:41:53] no
[20:42:00] akosiaris: what do the restbase people do?
[20:42:32] I think push to gerrit
[20:42:59] I guess we can do that too
[20:43:07] just direct push
[20:43:11] so you never deal with the UI
[20:43:13] It's just going to be like pushing to another remote, right?
[20:43:17] yeah
[20:43:22] Yes please. No gerrit UI. <3
[20:43:40] ok
[20:43:51] akosiaris: halfak I'll create the repos in gerrit and we can test pushing to them
[20:44:01] tbh, I doubt there is much sense in all that, but let's keep the status quo for now
[20:44:04] Cool. Thanks YuviPanda
[20:44:12] akosiaris: instead of directly deploying from github?
[20:44:15] yes
[20:44:29] We have some submodules. Will those need to move too?
[20:44:42] I think there was like a talked-to-death discussion 3 years ago about all that and the move to gerrit
[20:44:47] but I wasn't here back then
[20:44:48] akosiaris: while I agree there's not much direct sense in that from a technical pov, I feel it'll be controversial in some vague sense, and I guess there's enough controversy we *have* to deal with already anyway :D
[20:44:59] YuviPanda: exactly
[20:45:08] halfak: I suppose so :-(
[20:45:10] how many ?
[20:45:25] also please tell me no recursive submodules
[20:45:33] not sure scap3 supports that
[20:45:34] ores-wikimedia-config -> ores -> revscoring
[20:45:35] wiki-ai/ores, wiki-ai/revscoring, wiki-ai/wikiclass, wiki-ai/editquality
[20:45:51] akosiaris, there is a recursive one, but it doesn't need to be.
[20:45:58] Just happens to be.
[20:46:30] ores-wikimedia-config --> ((ores --> revscoring), wikiclass, editquality, wb-vandalism)
[20:46:31] so I ask because trebuchet definitely does not support that. I think it was noted for scap3 and it was supposed to support it, but I honestly don't know
[20:46:35] Forgot wb-vandalism last time
[20:47:01] We don't have to stick with the submodule pattern. Honestly, I'm not too attached.
[20:47:06] so all these are extra code that's unrelated to all the stuff that is debian packaged ?
[20:47:42] akosiaris, yeah. not debian packaged. Those repos are under my control.
[20:47:56] oh I think submodules are gonna be fine, it's just the recursive ones that I am wondering about
[20:48:03] but I will contact godog and ask about it
[20:48:05] we could also go 'fuck all this' and just bundle everything :)
[20:48:17] Yeah. We don't need that. I end up making a deep symlink to pull 'revscoring' up anyway.
[20:48:26] It makes sense that they are separate as python packages/libraries/documentation.
[20:48:38] halfak: maybe we can get rid of that recursive one now...
[20:48:45] it's not actually recursive, just nested
[20:48:47] nested!
[20:48:48] We'll also likely be adding new submodules as we go.
[20:48:58] Each submodule represents a new type of prediction problem.
[20:49:06] Yes. Nested.
[20:49:13] Oh god, recursive lol
[20:50:52] so, anything else ?
[20:51:23] akosiaris: we need a long term solution for deb packaging
[20:51:26] hmm
[20:51:29] maybe not :D
[20:52:08] I should just gbp them all and show halfak how to update them, I guess
[20:52:31] YuviPanda: it might be easier than you think.. we've got some pretty good architecture these days around these things
[20:52:44] like clean build environments in labs and production
[20:52:57] it's mostly some version bumping most of the time
[20:53:02] akosiaris: hmm I'm mostly copy-pasting debian/ dirs around, which I find problematic I guess
[20:53:04] initial packaging
[20:53:52] * YuviPanda should stop whining and JDI
[20:53:57] lol
[20:54:08] ok I'll take that as my cue to go to sleep
[20:54:20] I'll start working on these tomorrow and let you guys know
[20:54:26] akosiaris: \o/
[20:57:31] Thanks again akosiaris.
[20:57:40] YuviPanda, I'll need you to help me know what things I should go learn.
[20:58:20] * YuviPanda thinks
[20:59:07] halfak: so I think our blockers now are: remove nested submodule, package new dependencies (mwapi and stuff?), move puppet to use debs instead of pip, and use scap3 instead of fabric
[20:59:28] (3) is dependent on (2)
[20:59:42] Which one is (3)?
[20:59:44] (4) is dependent on the releng team (and I think we should let akosiari.s handle that)
[20:59:55] 3 is 'move puppet to use debs'
[20:59:57] Oh, I think I see.
[21:00:06] So, I can clean up the submodules.
[21:00:09] No prob.
[21:00:22] I guess I should also learn some scap3?
[21:00:24] we should also maybe write down our submodule policy somewhere
[21:00:37] as to 'which things go as submodules vs libraries'
[21:00:46] Sure. Totally.
[21:00:48] we already decided that I think, but not sure if we wrote it down
[21:01:34] Right now the distinction is really clear. If it's one of our primary libraries (ores, revscoring) or a library that uses the revscoring framework to build models (editquality, wikiclass, wb-vandalism), it's a submodule.
[21:01:52] This is really good for the applications of the revscoring framework because we can check in models.
[21:02:03] And then they appear in the submodules :)
[21:02:05] right. so we should write it down and amend it in the future if we need to change it
[21:02:07] yeah
[21:04:40] * halfak just deployed wikilabels campaigns to etwiki, jawiki, dewiki and itwiki.
[21:04:47] nice
[21:04:51] \o/
[21:05:53] halfak: I can also show you a bit of debian packaging (or specifically: debian packaging for python modules that have no compilation, which fits all our things)
[21:06:55] Cool. A gist that shows the commands step-by-step would be great.
[21:07:00] Or any markdown file really.
[21:07:12] I guess an on-wiki guide would be nice too.
[21:07:24] We should start building up the wikitech stuff for ORES.
[21:07:30] I've been focusing on the meta wiki stuff.
[21:08:45] halfak: so the problem so far with debs is that there are like a million guides on the internet :D
[21:08:53] I guess we should have one specific to ourselves too
[21:09:13] Maybe.. if you can point me to one that'll work great for our python packages, then cool.
[21:09:14] +1
[21:09:28] I couldn't really find one when I was looking for it
[21:09:44] we need to use 'pybuilder' and 'git buildpackage' but a lot of the guides I found are... too detailed
[21:12:14] halfak: ok so our action items are: I do some debian stuff, you do some de-nesting stuff
[21:13:31] Yeah
[21:13:56] YuviPanda, https://phabricator.wikimedia.org/T119435#1826519
[21:18:35] halfak: I also sent a pre-emptive anti-inflammatory on that bug
[21:18:48] let's see if that works
[21:19:22] halfak: so what should we call this in gerrit? we need a namespace, so to speak
[21:19:26] could be research/ or ai/
[21:19:49] hmmm
[21:20:17] can be multiple levels too
[21:20:19] research/ai
[21:20:53] we'll need to push all of those things through and find some way to deal with submodules
[21:21:06] Let's drop the ai.
[21:21:24] or do "wiki-ai" for parody with github
[21:21:37] but that would be redundant.
[21:21:37] parity?
[21:21:41] Yeah
[21:21:42] That
[21:22:41] nah, let's not do wiki-ai...
[21:22:46] k
[21:22:48] research/?
[21:22:50] yeah
[21:22:52] ok
[21:22:54] so
[21:22:58] research/ores-wikimedia-config
[21:23:00] research/ores
[21:23:03] brb dog needs attention.
[21:23:03] research/revscoring
[21:23:04] +1
[21:23:09] kk I'll create them all later
[21:45:56] YuviPanda, sorry to run away.
[21:46:06] Didn't want to find a pile of something somewhere.
[21:46:09] np I need to run away now :)
[21:46:16] heh, better dog poop than cat puke
[21:46:25] * halfak wants neither
[21:46:25] o/