[08:05:05] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3819169 (10akosiaris) >>! In T169246#3817883, @awight wrote: > Point well taken. What if we temporarily depool some of the servers f... [08:13:50] 10Scoring-platform-team, 10ORES, 10Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3819202 (10akosiaris) removing uWSGI from the tests is very easy. Just submit directly to the celery queue the jobs/min you 'd like and see if the scores proces... [11:27:41] (03PS1) 10Ladsgroup: Fix name of class in docs [extensions/ORES] - 10https://gerrit.wikimedia.org/r/395978 [11:29:12] (03PS2) 10Ladsgroup: Join decomposition of ores_model table queries [extensions/ORES] - 10https://gerrit.wikimedia.org/r/395811 (https://phabricator.wikimedia.org/T181334) [14:18:14] (03CR) 10Umherirrender: Fix name of class in docs (031 comment) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/395978 (owner: 10Ladsgroup) [14:22:26] Good morning! [14:22:35] awight: hows you? [14:22:57] (03CR) 10Thiemo Mättig (WMDE): [C: 032] Fix name of class in docs (031 comment) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/395978 (owner: 10Ladsgroup) [14:23:09] Good morning! [14:23:14] refeed[m]: o/ [14:23:34] *activity* - is there a phab task for adding ORES scores to AbuseFilter? [14:24:01] I dont think so [14:24:22] Our workboard is at scoring-platform-team feel free to check there TheresNoTime [14:24:40] (03Merged) 10jenkins-bot: Fix name of class in docs [extensions/ORES] - 10https://gerrit.wikimedia.org/r/395978 (owner: 10Ladsgroup) [14:26:52] It's already night here tbh ww [14:34:05] TheresNoTime: hi! 
There are some interesting comments about AbuseFilter integration on https://phabricator.wikimedia.org/T123178 [15:27:40] 10Scoring-platform-team, 10ORES, 10Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3820270 (10Halfak) We do have graphite logging in celery workers every time a score is processed. I'm not sure how that helps us in this situation. We're curr... [15:28:22] halfak: I thought akosiaris’s suggestion was great, to inject straight into the celery queue. [15:28:37] awight, I don't know. That sounds painful and weird. [15:28:42] hehe [15:28:45] Does celery *just* put it at the end of the queue? [15:29:02] Or do we need to write a new stress tester to connect to celery and submit the jobs directly. [15:29:23] How do we make sure we send the data in the same way that uwsgi does? [15:29:38] I think the latter, we would need to tap into our code to inject celery [15:29:48] +1 that increasing uwsgi workers is also a perfectly decent approach [15:29:54] Why don't we just use the uwsgi client to send it to the celery queue? It seems perfectly suited to the task of being a stress tester. [15:29:56] what does uwsgi have to do with the code that does the job insertion ? [15:30:10] uwsgi is just an app server.. it's the ores code anyway that does the job insertion [15:30:19] * halfak sighs [15:30:35] from the sigh I guess I might be wrong ? [15:30:36] ores client that lives in uwsgi == "uwsgi client" [15:30:54] What would be nice is to find a way to add instrumentation that can diagnose this bottleneck [15:31:06] We'd need to instrument the uwsgi queue [15:31:15] We have instrumentation of the # of active web workers. [15:31:18] We already have some metrics coming out of that [15:31:30] yeah, and # of requests served [15:31:48] btw I am not saying that uwsgi is not the bottleneck here, it might very well be.
And given the ~1s request time it's entirely plausible [15:32:04] now if we only could drop the request time from ~1s ... [15:32:13] The web worker count instrumentation seems to be broken for ores* [15:32:21] ooh? [15:32:32] ~1s request time? [15:32:47] https://grafana.wikimedia.org/dashboard/db/ores?panelId=13&fullscreen&orgId=1&from=now-24h&to=now-1m [15:32:47] The mean 1.17s to process a score? [15:33:00] yup, that's what I got from the graphs... ~1s [15:33:07] I don't see how that is relevant [15:33:22] what does the uwsgi worker do in that ~1s ? [15:33:28] I think it’s relevant because uWSGI needs to maintain an open socket for that time [15:33:40] even if it’s not working, it has a limited number of parallel requests it can handle [15:34:10] awight, right. But we can't make it much faster and regardless we need to handle even longer requests. [15:34:28] Right! That's why we need to bump up the number of parallel requests that uwsgi can handle. [15:34:41] ok, just curious here. with just 1 worker, how many req/s would we handle ? [15:34:51] what kind of worker? [15:34:59] right, sorry about that. uwsgi process [15:35:20] We could handle 1 / 1.1 reqs/sec [15:35:28] 1 request per 1.1 seconds [15:35:48] And it's not really requests [15:35:50] Mind. [15:35:57] ? [15:36:00] When we have a cache hit, it's fast and cheap. [15:36:09] These are processed scores we are talking about. [15:36:14] A cache miss. [15:36:14] ah yes, but that's irrelevant for what I am about to say [15:36:20] And it's a request for a *score* [15:36:27] We get other requests for model info and stuff. [15:36:42] My guesstimate is that we should increase the uWSGI pool by x5, simple because we’re at 20% CPU. [15:36:47] So we should use the language of scores/sec [15:36:48] so in the case of a cache miss for a scoring, the uwsgi worker is blocked for the entirety of 1.1 secs, doing nothing else, right ? [15:37:02] awight, why would we increase by 5x? 
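[Editor's note: the direct-to-celery stress-test idea floated above could be prototyped along these lines. This is a hypothetical sketch — the task arguments and rate are illustrative, and the `submit` callable is pluggable so the pacing logic runs without a broker; against a real deployment it would be something like Celery's `app.send_task(...)`, which needs a live broker.]

```python
import time

def inject_jobs(submit, make_args, total_jobs, jobs_per_min, sleep=time.sleep):
    """Submit `total_jobs` scoring jobs at roughly `jobs_per_min`,
    bypassing the web tier entirely. `submit` is whatever enqueues
    one job -- with Celery this would be app.send_task(...); here it
    is injected so the rate control can be exercised standalone."""
    interval = 60.0 / jobs_per_min
    for i in range(total_jobs):
        submit(make_args(i))
        sleep(interval)
    return total_jobs

# Exercise the injector with a stub queue and a no-op sleep.
queue = []
sent = inject_jobs(queue.append,
                   lambda i: ("enwiki", "damaging", 1000 + i),
                   total_jobs=5, jobs_per_min=6000,
                   sleep=lambda s: None)
```

As akosiaris notes, the open question is whether jobs injected this way look the same as jobs enqueued by the ores code running under uwsgi.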
[15:37:19] It’s looking like our celery worker count will have available workers even when the CPUs are maxed-out, so that won’t be a limiting factor. [15:37:23] akosiaris, not entirely, no [15:37:29] halfak: because that would give us 100% CPU utilization. [15:37:29] The web worker does some stuff. [15:37:43] awight, why not just have enough web workers to feed the celery workers? [15:37:48] awight: you DO NOT WANT 100% utilization [15:37:49] Like I proposed. [15:38:01] akosiaris: +1 — what’s the correct target? [15:38:14] I’m only suggesting this for stress testing, not a production level. [15:38:48] Yes. So this is an old conversation -- that we need to have slightly more web workers than celery workers in order to saturate the capacity we have on the celery side. [15:38:59] I'd be OK with having twice as many web workers too. [15:39:04] production wise, the correct target is what allows you to serve your guesstimation of incoming traffic + a % for a buffer against spikes [15:39:05] But more than that is unnecessary [15:39:21] which is not that issue to guesstimate ;-) [15:39:26] Right. akosiaris speaks to a mixture of uwsgi and celery workers. [15:39:26] s/issue/easy/ [15:39:42] Basically we need 1:1 uwsgi and celery for whatever capacity meets akosiaris' constraints. [15:39:54] Between 1:1 and 2:1 is fine. [15:39:56] akosiaris: fwiw, we have an average traffic level of 500 req/min over the past year. [15:40:18] so ~9 r/s (rounding up) [15:40:30] The largest recorded spike was in February, when it hit 4.5k req/min for a day or two, but I think that was being throttled by the uWSGI bottleneck so we don’t know how high a real spike might be. [15:40:47] awight, not possible [15:40:53] We had more web workers than celery workers. [15:41:05] halfak: I don’t think it’s relevant to match uWSGI workers to celery workers, because the current number of celery workers can oversaturate the CPUs. [15:41:21] Oh then we should cut down celery workers. 
[15:41:51] awight, can or *will* if they were all active at the same time? [15:41:52] halfak: ok either way, the peak has a flat plateau, which suggest that something was throttling the incoming requests at a steady level, and we don’t know what the real demand would have been. [15:42:06] awight, did we overload? [15:42:07] awight: probably less [15:42:10] Because we've overloaded with less. [15:42:12] +1 that my napkin indicates that we’ll end up turning down the celery worker count [15:42:44] We can't operate effectively without cranking up uwsgi worker count regardless. [15:42:45] halfak: no but remember, overload conditions seem to be caused by OOM [15:42:55] halfak: +1! [15:42:56] What? [15:42:57] No [15:43:03] I mean yeah if it kills workers. [15:43:20] I found that our nodes were dying intermittently forever now [15:43:23] Overload is just when celery's queue gets too big -- there are too many incoming requests for the celery pool to handle. [15:43:39] Right. If a node dies, the celery pool is smaller all of a sudden. [15:44:31] yes but it’s looking like our celery pool can actually handle all the requests we receive—this is the scenario you’ve been worried about, that we’re silently overloading at the uWSGI entry point [15:45:47] awight, we got overloads when google was hammering us. [15:45:58] If my estimate was right, then 50 workers x 4 machines should be able to handle almost 9k req / min [15:46:03] https://grafana-admin.wikimedia.org/dashboard/db/ores?orgId=1&from=1484268570210&to=1486633765710 [15:46:08] halfak: Can you … ah [15:46:27] yeah that’s the window I was thinking of [15:46:37] awight, *scores/min [15:46:42] ty [15:46:57] I don’t see any hint of the workers dying [15:47:02] right [15:47:24] Because we didn't need workers to die to get overloaded [15:47:48] I believe we might have bumped worker count after this point. 
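[Editor's note: the back-of-the-napkin throughput math in this exchange — one blocked web worker per in-flight score, ~1.1 s per score — reduces to a one-liner. The numbers below are the ones quoted in the chat; the formula gives a slightly more optimistic ceiling than halfak's "almost 9k req/min" because it ignores all overhead and cache-hit traffic.]

```python
def scores_per_min(workers_per_host, hosts, mean_score_time_s):
    """Upper bound on processed scores/min when each worker is fully
    blocked for the duration of one score (no queueing, no cache hits)."""
    return workers_per_host * hosts * 60.0 / mean_score_time_s

# One uwsgi process at ~1.1 s/score: "1 request per 1.1 seconds".
single = scores_per_min(1, 1, 1.1)    # ~54.5 scores/min

# halfak's estimate: 50 workers x 4 machines.
cluster = scores_per_min(50, 4, 1.1)  # ~10,900 scores/min ceiling
```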
[15:47:57] k, that demonstrates that the uWSGI / celery ratio was good then [15:48:11] We didn't have changeprop response speed tracking at the time it seems. [15:50:03] Is it possible to instrument average celery worker idle time, I wonder? [15:50:19] I’d think that celery would support some of that natively. [15:52:24] In the meantime, we should probably try the obvious solution of bringing out uwsgi workers up to spec. [15:52:53] fwiw, on Feb 5 we increased the celery worker count 40 -> 45 [15:54:07] I'm looking for the right way to change uwsgi workers_per_core on ores* specifically. [15:54:30] also fyi, workers_per_core was == 2 in Feb [15:54:48] https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/ores/stresstest.yaml [15:54:56] halfak: search operations-puppet for workers_per_core [15:54:57] Does that seem like the right place? [15:55:18] Right awight. I don't want to change it for scb* [15:55:28] yes, but I have to admit I never understood why we have workers_per_core and not just plain "workers" [15:55:39] akosiaris, I'm OK with "workers". [15:55:49] That’s the right place for our stress boxes, yes [15:56:10] yeah, overall I think we should not be tuning too much the software to the hardware [15:56:23] kubernetes is coming and that way is not going to work [15:56:34] ores::web::workers_per_core [15:56:43] akosiaris: What’s the alternative? [15:56:53] In celery we just set the number of workers [15:57:13] profile::ores::celery::workers vs profile::ores::web::workers_per_core [15:57:25] akosiaris: ah, I see what you’re saying. [15:57:50] honestly I don’t understand when we add “profile::"- [15:57:50] in kubernetes land ? 
just decide how many req/s you want to serve, then define the quantum execution unit and allow kubernetes to autoscale [15:58:07] akosiaris, +1 [15:58:12] But we're not there now [15:58:15] akosiaris: halfak: oh hey, $processes = $::processorcount * $workers_per_core [15:58:21] right [15:58:24] so we’re free to specify $processes directly instead. [15:58:35] nope, but keep it in mind it's coming [15:58:41] awight, not sure that helps anything [15:58:44] akosiaris, next year [15:58:55] next quarter is mathoid, the quarter after that I would like to propose ORES [15:59:00] awight, setting $processes that is [15:59:11] akosiaris, sure I'm happy to do that. [15:59:34] This quarter, we're trying to deploy to our new cluster without breaking our current cluster :) [15:59:37] Sorry to say, a tidal wave of meeting is headed my way. [15:59:46] lol [16:00:04] I'm going to work on a workers_per_core patch for stresstest.yaml [16:03:24] * halfak adds a task for refactoring ORES puppet stuff for kubernetes. [16:04:33] 10Scoring-platform-team, 10ORES, 10Operations: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331#3820428 (10Halfak) [16:05:49] 10Scoring-platform-team, 10ORES, 10Operations: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331#3820428 (10awight) One thing that @akosiaris pointed out, we'll want to replace this puppet formula: > $processes = $::processorcount * $workers_per_core and specify the num... 
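[Editor's note: the puppet formula quoted in the task comment, `$processes = $::processorcount * $workers_per_core`, and the proposed replacement (specify the count directly, decoupled from the hardware) can be sketched as follows. This mirrors the logic under discussion, not the actual puppet code.]

```python
def uwsgi_processes(processor_count, workers_per_core=None, explicit=None):
    """An explicitly configured worker count wins; otherwise fall back
    to the hardware-derived cores * workers_per_core formula."""
    if explicit is not None:
        return explicit
    return processor_count * workers_per_core

# workers_per_core == 2 on a 24-core scb host -> the 48 seen in production
legacy = uwsgi_processes(24, workers_per_core=2)
# decoupled from the hardware, as suggested for the kubernetes future
fixed = uwsgi_processes(24, explicit=230)
```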
[16:06:28] awight, damn wrong task :P [16:06:31] I'm working on that one [16:06:37] lol [16:06:42] 10Scoring-platform-team, 10ORES: Refactor ORES puppet for Kubernetes - https://phabricator.wikimedia.org/T182332#3820450 (10Halfak) [16:06:44] There ^ [16:07:14] note the point in the description where I thought "Oh crap just post it and edit later" [16:08:13] 10Scoring-platform-team, 10ORES: Refactor ORES puppet for Kubernetes - https://phabricator.wikimedia.org/T182332#3820450 (10Halfak) [16:12:23] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3820532 (10Halfak) In this case, it's advanced smoke testing for the cluster. I'm hesitant to deploy in production until we've thoro... [16:27:10] halfak: what would be the multiclass equivalent of counting class here - https://github.com/wiki-ai/revscoring/blob/master/revscoring/scoring/statistics/classification/counts.py#L23 [16:27:43] i'm not able to extend "self['predictions'][label][predicted] += 1" to the multiclass case [16:31:33] codezee, oh damn. [16:31:38] For label in label [16:31:45] :) [16:31:49] * halfak thinks. [16:32:29] I wonder if we should change the default for "label" to be a list of labels and multilabel would be the only case where "labels" contains more than one label. [16:32:34] This is a painful generalization. [16:32:41] halfak: i thought abt that but given [A,B] in predicted and [A,B,C] in true, does it make sense to increment each of [A][A], [A][B], [A][C] [16:32:57] no [16:33:08] good point. [16:34:18] Counts only seems to make sense in the single-label case as it is defined. [16:34:23] * halfak thinks more [16:35:05] Actually, wait. [16:35:14] You could just leave it as is and it will make a HUGE table. [16:35:22] Pairs of sets. [16:36:03] increment [(A,B,C)][(A,B)] [16:36:08] I don't know if I like that idea. [16:36:20] It doesn't convey what I really want. 
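[Editor's note: the usual way out of the pair-of-sets explosion discussed above is to binarize per label — each label gets its own small TP/FP/FN/TN tally, in the spirit of sklearn's `multilabel_confusion_matrix`. A minimal sketch:]

```python
from collections import Counter

def per_label_counts(labels, pairs):
    """Per-label binarized confusion counts for multilabel data:
    instead of one giant table keyed by (true-set, predicted-set),
    keep an independent TP/FP/FN/TN tally for each label."""
    counts = {label: Counter() for label in labels}
    for true_set, pred_set in pairs:
        for label in labels:
            in_true, in_pred = label in true_set, label in pred_set
            if in_true and in_pred:
                counts[label]["tp"] += 1
            elif in_pred:
                counts[label]["fp"] += 1
            elif in_true:
                counts[label]["fn"] += 1
            else:
                counts[label]["tn"] += 1
    return counts

# codezee's example: true labels {A, B, C}, predicted {A, B}
c = per_label_counts(["A", "B", "C"], [({"A", "B", "C"}, {"A", "B"})])
```

This loses the cross-label correlations a full joint table would capture, which is the trade-off being weighed in the chat.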
[16:37:24] yeah, that hardly seems useful [16:39:18] awight, when you said we can change $processes directly, I was mis-reading the code. I see what you mean now. This could work :) [16:39:31] * halfak works that into his patch [16:39:39] patch for review already [16:39:51] :| I've been wasting my time. [16:39:53] lol [16:40:10] That's why I'd announced I was working on it. [16:40:14] Oh well. [16:40:17] I need to relocate, this connection is too slow to even upload [16:40:23] edit conflicts happen [16:40:34] I thought you were just bumping the number [16:41:06] On the down side, this patch affects production. [16:41:08] i see now why sklearn doesn't support confusion matrix for multi-label :/ [16:41:15] and… decreases workers perhaps [16:41:18] Working out the right way to do it without affecting scb* [16:41:26] * awight taps fingers waiting for the upload [16:41:27] haha codezee [16:41:35] You can have this if it’s useful… and if it ever uploads [16:41:40] Was just going to suggest you look around online to see what other people do. [16:42:16] https://gerrit.wikimedia.org/r/396055 [16:42:17] All yours [16:42:51] Icinga died ill tell -operations [16:43:08] halfak: different numbers of CPUs on scb1001-2, vs scb1003-4 I see [16:43:43] 24 vs 32 CPUs [16:44:10] So a really cautious patch would set different numbers for each machine. [16:44:20] How about we just do 1:1? [16:44:58] (that’s what I’ve done in that patch) [16:45:26] it also makes sense cos celery_workers is that way already. Are we concerned that this will OOM? [16:51:20] * awight throws knife I used to kill icinga-wm off a short bridge [16:54:38] awight: ^ i had it fixed [17:01:10] relocating to avoid extradition…. back in 10 [17:11:13] halfak: You want to own that patch or should I fix it up? [17:12:38] awight, I'm working on my own. [17:12:46] It looks like we had different ideas. [17:13:04] ok [17:13:28] eisenhaus335/wikilabels#2 (master - a0bcb0a : eisenhaus335): The build has errored. 
https://travis-ci.org/eisenhaus335/wikilabels/builds/313065894 [17:13:56] akosiaris: Is there some magic Puppet glue I’m overlooking, or is profile::ores::celery::workers a typo for profile::ores::web::celery_workers ? [17:14:17] halfak: want to share the WIP? [17:14:55] akosiaris: It… seems to work which scares me. [17:15:24] nvm, I see the explicit hiera call, ./modules/profile/manifests/ores/web.pp: $celery_workers = hiera('profile::ores::celery::workers', 45), [17:15:26] awight, documenting math [17:15:45] we shouldn't have that in web.pp [17:15:50] akosiaris: Why is it like that? I thought direct hiera calls were evil? [17:15:59] Not in profile apparently [17:16:01] ? [17:16:06] O_o [17:16:39] halfak: Looks like you’re right, grep -rw hiera modules/profile/ | wc -l => 1361 [17:17:44] I hate puppet, and actively wish for something better to eclipse it. [17:17:58] Probably all of ops feels the same way [17:19:44] I’m going to try to pay attention to JADE like I threatened on the calendar... [17:20:14] 10Scoring-platform-team, 10ORES, 10Patch-For-Review, 10Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3820695 (10Halfak) OK so, I think we actually need to bump the worker count to `celery_workers` + `queue_size`. Since `queue_size` is 600... [17:20:27] awight, ^ [17:20:33] https://gerrit.wikimedia.org/r/396064 [17:20:49] I'm looking into a better way to overwrite $processes directly. [17:21:15] Will submit that as a followup if I can get it. [17:21:46] halfak: I would just mash our patches together. [17:21:55] set $uwsgi_workers to 64, so scb1003-4 are unharmed, then specify $uwsgi_workers directly for scb1001 and scb1002 [17:22:15] c.f. 
hieradata/hosts/scb1001.yaml [17:23:20] 64 * 9 isn't going to cut it for stresstest [17:23:28] rather ores* [17:23:48] Sure, specifiy as 2200 / 9 or something in the stresstest.yaml [17:24:53] +1 for putting all the ducks in a row, so we don’t have to ask for multiple ops CRs [17:25:54] halfak: whats exactly the function of population_rates parameter? [17:26:00] stresstest workers + queue_size = 750 fwiw [17:26:13] oh hmm that’s queue_size over all machines [17:26:19] right [17:26:26] so 150 + (600 / 9) [17:26:33] 220 [17:27:51] 600 + 1 [17:28:08] Note that it's good to have more than that because we get requests that do not ask for a score. [17:28:36] * awight tiptoes away [17:29:42] * Zppix drags awight back [17:31:48] * awight throws icinga-wm off another precipice and Zppix must decide whose life to save [17:32:19] maybe 150 + (600 / 9) + 50 fudge [17:32:20] Neither [17:32:20] icinga-wm: is outdated anyway [17:32:25] 266 [17:32:57] Maybe just 10 fudge since non-scoring requests are so fast. [17:33:20] ~230 [17:34:24] halfak: factor in cached reqs [17:34:40] Zppix, same story [17:36:01] I wonder if we move celery to k8s or something if we could get more bang for our buck [17:37:16] Zppix, that's the plan [17:37:30] Oh sweet [17:48:59] awight, I agree re. merging. I think that your change is good though I don't know where the "45" number came from. [17:49:16] I see 48 on scb1* nodes [17:49:18] hehe 1:1 with celery. But it turns out that’s not a number we use [17:49:24] and 32 on scb2* nodes. [17:49:35] Oh I'm talking about current production. [17:49:41] I’m happy to rebase and tweak it, shall I? [17:49:50] I'm thinking it's safe to just set everything to 48 [17:49:56] For current scb nodes [17:50:02] and then have something in stresstest for ores* nodes. [17:50:20] sure, let’s do that. [17:50:34] The only thing bothering me is that memory is right up against the wall already. [17:50:56] Was looking at that. 
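[Editor's note: the sizing arithmetic being worked out above — per-host web workers ≈ celery workers + this host's share of the global queue + a small fudge for cheap non-scoring requests — as a sketch, using the stresstest numbers quoted in the chat:]

```python
import math

def web_workers_per_host(celery_workers, queue_size, hosts, fudge=10):
    """Per-host uwsgi worker target so the web tier can keep the
    celery pool saturated: local celery workers, plus this host's
    share of the global queue, plus slack for non-scoring requests."""
    return celery_workers + math.ceil(queue_size / hosts) + fudge

# 150 stresstest celery workers, 600 queue slots, 9 ores* hosts
target = web_workers_per_host(150, 600, 9)  # 227, the ~230 ballpark above
```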
We have fewer CPUs per host in scb2* [17:51:03] BUT the same amount of memory [17:51:17] So theoretically it would be safe to bump the web worker count. [17:52:16] RES is 800M for these, and I doubt they’re even close to that, cos of the copy-on-write thing. [17:52:47] actual memory per celery worker is 310M according to my napkin, and those do much more data stuff. [17:53:05] right [17:53:19] We'd be going up from 32 workers to 48 workers. [17:53:47] here, lemme just rebase and tweak [17:53:51] so it’s safe [17:54:39] awight, https://gerrit.wikimedia.org/r/396064 [17:55:14] Woops. I have a mistake in there ^_^ [17:55:57] No patchset uploaded. [17:56:03] *new [17:56:51] Mind if I do? [17:57:13] do what [17:57:21] nvm [17:57:43] I was going to tweak it to not change production [17:57:53] but it looks like you’re still hacking [17:58:05] Na. How would we fully not change production? [17:58:27] In this case it shouldn't change scb1*, but it will have an effect on scb2* [17:58:58] It is very annoying that scb is different between CODFW and EQIAD. [17:59:01] I would set the default to 32, then add entries to hosts/scb100* to set to current values. [17:59:08] awight: direct hiera calls are not evil, implicit ones are [17:59:33] akosiaris: Like, class params? [17:59:34] and any kind of hiera call outside of a profile is [17:59:38] https://gerrit.wikimedia.org/r/#/c/396064/4/modules/profile/manifests/ores/web.pp is an example of explicit? 
[17:59:42] is evil that is [17:59:51] yes, that's the good call [17:59:58] cool I think I see what you mean then [18:00:34] * akosiaris searches search for the wmf doc on this [18:00:46] there you go https://wikitech.wikimedia.org/wiki/Puppet_coding#Organization [18:00:52] that's WMF guidelines on these things [18:00:56] akosiaris: I think future computer historians are going to look back on puppet and go “OMFG” [18:01:19] I definitely agree [18:01:27] awight, I agree with setting up specific hiera for scb1* [18:01:32] You want to do it or should I? [18:01:52] anyway, got to run.. send reviews my way and I 'll review :D [18:01:56] :) [18:02:05] halfak: Happy to. One moment, please [18:02:46] OK I'll leave it to you. [18:02:56] kk thanks akosiaris [18:04:00] halfak: I won’t do it this time, but FYI my favorite way to co-author stuff like this is to do a followup patch in real-time, and the first author can cherry-pick -p in as desired. [18:04:58] halfak: I see a few of the existing lines are violating “explicit hiera calls with no fallback value” [18:05:27] awight, not sure what you mean there. [18:05:38] AIUI, this is about the “, 48)” default value [18:05:49] So that's good or bad? [18:05:59] Looks like that is a "fallback value" [18:06:16] not gonna change that though. [18:06:41] I think the style guide is saying that fallback value is bad [18:06:42] Is that good or bad [18:07:09] Both [18:07:17] But it’s not clear to me where the defaults should go, so I’ll leave it alone. [18:07:21] Ahh yeah. I agree. There are two fallbacks in place. [18:07:27] In the module itself [18:07:29] Check it out [18:07:31] At the top [18:07:38] the fallback in the module will never come into play though [18:07:56] modules/ores/manifests/web.pp [18:08:03] Why not? [18:08:58] I think we only um instantiate (donno what it’s called in puppetville) those modules from the profile code. 
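[Editor's note: the precedence question being debated — explicit hiera value vs the profile's `hiera(key, fallback)` argument vs the module's own parameter default — behaves like a layered lookup. A hypothetical sketch of that resolution order, with the values from the chat:]

```python
def lookup(key, hiera, profile_fallback=None, module_default=None):
    """A host/role hiera value wins; else the profile's fallback
    argument to hiera(); else the module's parameter default. When
    the profile always supplies a fallback, the module default can
    never win -- which is why the fallback in
    modules/ores/manifests/web.pp 'never comes into play' as long
    as the profile is what instantiates the module."""
    if key in hiera:
        return hiera[key]
    if profile_fallback is not None:
        return profile_fallback
    return module_default

# scb1001 pins the value in hieradata/hosts, so hiera wins:
v1 = lookup("profile::ores::web::workers",
            {"profile::ores::web::workers": 48},
            profile_fallback=32, module_default=16)
# a host with no hiera entry falls back to the profile's value:
v2 = lookup("profile::ores::web::workers", {},
            profile_fallback=32, module_default=16)
```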
[18:10:50] There’s no hiera.yaml, so I have no clue how to reason about which config files are read for each node. [18:11:15] awight, but you changed stuff in modules/ores/manifests/web.pp before [18:11:25] Why did you change it there if you don't think it matters? [18:11:27] sure I’m just following the trail [18:11:51] I can’t blaze new trails cos I don’t know what’s up [18:12:47] Ahh yeah. I'm in the same boat. Gonna rely on ako* to help us out in review. [18:13:14] I think we should drop the fallback in profile, keep it in the module and set the hiera in all of the scb1* [18:13:16] And we're done. [18:13:32] Then a follow-up can clean up all of the other fallbacks in profile. [18:14:07] awight, ^ [18:14:07] Other way around maybe but ok [18:14:11] ? [18:14:17] What would be the other way around? [18:14:24] If the profile always instantiates the module, then the fallback in the module is never used. [18:14:35] I don't think that is the case. [18:14:54] ummm [18:14:55] I think if hiera doesn't have anything (and we don't do the fallback in profile) then the module's value will be used. [18:15:03] ok I’m keeping both fallbacks cos this is spooky [18:15:08] I’ll comment in the commit message. [18:15:10] kk [18:15:38] That's fine with me. Again, can be cleaned up in a follow-up [18:19:01] Pushed. [18:24:19] +1'd [18:24:23] I'm running off to lunch [18:24:30] back for post morten in ~30 mins [18:24:34] *mortem [18:24:37] o/ Nettrom [18:24:38] :D [18:24:51] Typo reminded me to say "Hi" but now I'm running away for lunch [18:24:55] have a good lunch, halfak :) [18:53:19] akosiaris: your one of the Datacenter team people right? [18:55:46] dc ops? no he's not [18:56:34] (also bear in mind it's 9 pm our tz so he's hopefully away) [18:57:41] back [18:57:49] In the Post Mortem call [18:57:59] (not started yet) [18:58:23] o/ awight [18:58:29] have you ever done one of these before. 
[18:58:35] nope [18:58:38] I just realized I have no idea what they are actually like [18:58:46] I guess we let Greg drive it? [18:58:53] Only in fundraising, which is unique I’m sure [18:59:10] I actually didn’t realize this was a standard thing. [19:04:29] (03CR) 10Catrope: [C: 04-1] Join decomposition of ores_model table queries (033 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/395811 (https://phabricator.wikimedia.org/T181334) (owner: 10Ladsgroup) [19:09:17] Ugh anyone an networking expert here? [19:36:19] hey, do you know the status of ores200*.codfw.wmnet? [19:36:33] they seem to exist but not in use yet? [19:36:48] are they going to use role(ores::stresstest) maybe? [19:37:28] i just need all nodes to have _a_ role, if in doubt i will give it "role(test)" for now, just cant be without any role as it seems now [19:38:29] found it, status: stalled ok https://phabricator.wikimedia.org/T165170 [19:38:59] but the reason to stall it was "while https://phabricator.wikimedia.org/T169246 is ongoing" and that ticket is resolved [19:39:10] so might be unstalled [19:40:42] 10Scoring-platform-team, 10ORES, 10Operations, 10Patch-For-Review: rack/setup/install ores2001-2009 - https://phabricator.wikimedia.org/T165170#3821059 (10Dzahn) Is this unstalled now? The reason was while T168246 is ongoing but that ticket is resolved. Is it really resolved though? [19:41:41] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3414945 (10Dzahn) Is the stress test over? Then T165170 is probably unstalled now. Is it not over yet? Then maybe this ticket shou... 
[19:45:15] *note, reopen stress task [19:45:37] ^ lol [19:46:08] awight: ive never heard of a dev to want to do that but ok [20:02:45] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3821108 (10awight) @Dzahn sorry--we decided to test some more, to overcome a suspiciously low performance ceiling. I'll make the fol... [20:05:45] 10Scoring-platform-team, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3821137 (10awight) [20:05:47] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3821132 (10awight) 05Resolved>03Open Reopening until we finish with {T182249}. [20:15:02] 10Scoring-platform-team, 10ORES, 10Patch-For-Review: Refactor ORES puppet for Kubernetes - https://phabricator.wikimedia.org/T182332#3821154 (10Halfak) Right now, it seems like we want to have one uwsgi worker per celery worker because a uwsgi worker will block while a celery worker generates a score. We'll... [20:16:41] 10Scoring-platform-team, 10ORES, 10Patch-For-Review: Refactor ORES puppet for Kubernetes - https://phabricator.wikimedia.org/T182332#3821157 (10Halfak) [20:17:27] 10Scoring-platform-team, 10ORES, 10Performance: Profile ORES code memory use - https://phabricator.wikimedia.org/T182350#3821158 (10awight) [20:22:04] back in 30 minutes [20:52:59] 10Scoring-platform-team, 10MediaWiki-extensions-ORES: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3821279 (10awight) [21:31:45] awight, I have growing skepticism about using Flow for JADE comments [21:49:16] It comes down to, are we providing a way to spell, I think. [21:49:16] slowdown: ^ want to jump into this chat? 
[21:49:21] * slowdown reads [21:49:40] Do we have to provide admin suppression tools just in case abusers figure out how to spell something using the inputs other than free-form text. [21:49:51] e.g. a string of first letters of articles. [21:51:21] I would say yes, because that will happen [21:51:22] slowdown: We’re using Risker’s guidelines, https://en.wikipedia.org/wiki/User:Risker/Risker's_checklist_for_content-creation_extensions [21:52:34] I've seen abuse pushed through in similar form in the old Education Extension, and it didn't have suppression built in so the abuse had to be manually redacted from the database [21:52:55] * awight mops beads of sweat [21:53:33] Risker defines “content” as: Any material added, removed, altered, revised, edited, deleted, or otherwise manipulated by a registered or unregistered user using any user interface that creates a change to any aspect of the Wikimedia project. [21:53:37] that would be us. [21:54:05] awight, I think you're getting hung up on suppression [21:54:18] When I think we need to support basic curation [21:54:19] halfak: well, thanks for thinking of this. It’s a whole lot of architectural and roadmap change, though. [21:54:25] and suppression is a small part of that [21:54:32] I don't think so awight [21:54:34] I still like Flow because it gives us real discussions [21:54:39] we knew we needed mediawiki integration [21:54:51] We've been discussing Risker's articles [21:55:14] I like Flow too -- for discussions [21:55:14] We should not re-implement discussion systems [21:55:28] But comments != discussions or posts. [21:55:38] halfak: Not sure what you mean wrt suppression—we’ve been designing for editing judgments all along. [21:55:51] And the benefits we get for cramming comments into "discussion" don't add up to me. [21:56:01] awight, right but judgements must show up in recentchanges/logging/etc. [21:56:09] Or curation can't really happen [21:56:11] yes that’s new news [21:56:14] Oh and watchlists. 
[21:56:23] Not to me. Sorry we didn't discuss this before. [21:56:26] IMO we were only going to allow the author to “edit” their own judgment. [21:56:28] I've had it in mind the whole time. [21:57:01] awight, not sure what you mean by that. [21:57:09] no worries, I’m just trying to communicate how dramatically this altered my own understanding [21:57:11] People are going to switch preference bits. [21:57:25] I'm struggling to imagine your previous understanding. [21:57:39] exactly, that people can edit their own judgments, although it’s really adding more judgments and deprecating the old ones. [21:57:58] Sounds like you are talking about "endorsements" [21:58:38] no, people changing their mind about a judgment. [21:58:38] that is deprecating an old judgment and adding a new one. [21:58:51] I still don’t like "endorsements" [21:59:50] awight, hmm. Do you know what a !vote is? [22:00:00] Here’s the crux of my problem, > What is an endorsement without a judgment, if not a discussion thread [22:00:00] from https://www.mediawiki.org/wiki/Topic:Tzw0uv2bucrdprm4 [22:00:15] And endorsement doesn't exist without a judgement [22:00:18] No [22:00:34] A !vote is a common wiki pattern for discussion subjective decisions. [22:01:17] E.g. "'''support''' I've seen proposals like this work in the past" [22:01:17] or "'''oppose''' this proposal would break something else" [22:01:32] Eventually consensus happens (or not!) and it's recorded. [22:01:34] I’d prefer that we replace “endorsement” with the author just providing their own judgment. [22:01:50] awight, right but you're not discussing the common behavior here. [22:01:55] Surely from a user [22:01:56] (on another topic, I just found email notification for “structured discussions") [22:02:17] Please no more topics [22:02:28] I’m happy to learn and adapt to on-wiki behavior [22:02:56] But support/oppose discussions are in response to a proposal. 
[22:03:05] In this case, you’re saying they are in response to a judgment and comment. [22:04:28] Well, from a user's point of view, when they show up and provide the first !vote, they are creating a judgement and providing their endorsement of that judgement. [22:04:47] It would likely be one action for any user of JADE. [22:04:59] But if the judgement already exists, they'd just be endorsing the old judgement. [22:05:22] judgement might be "delete", "merge", or "keep" in a deletion discussion. [22:05:26] IMO, this only works if we separate judgments by schema, i.e. voters are endorsing only “damaging: true” and not “{damaging: true, goodfaith: true}" [22:05:47] Right. I think so too. [22:05:51] It’s not right to have a yes/no vote on something that you can respond to with multiple dimensions [22:06:20] huh? [22:06:20] That's not the problem I see there. [22:06:31] but separating the judgment is nasty [22:06:46] What does that have to do with endorsements? [22:06:55] I think that problem is orthogonal. [22:07:24] You’re making me question it, but I’ve been thinking that (judgement_schema_A, judgment_schema_B, comment) is a single decision if it happens in one session. [22:07:34] It’s all information that goes together [22:07:44] Just to the user. [22:07:57] especially the comment+judgment [22:08:12] now we’re talking about an interface in which we would have to get a new comment for each judgment-schema [22:08:16] But from a schema point of view, what would we do when we add a new schema? [22:08:23] hm? [22:09:02] Well, we could combine and just multiply the comment across the set. [22:09:06] Well, if you have a comment "looks bad" with {"damaging": true} [22:09:06] then later we add "goodfaith" [22:09:08] What then? [22:09:20] Do you rewrite that judgement to include a "null" goodfaith judgement? [22:09:20] There are lots of schemas available, we only present some, and the user is free to use all or none of the available ones [22:09:37] Right.
Or copy their summary to all of 'em like in a commons upload. [22:10:10] I wish I had a more rigorous framework for this [22:10:42] I don’t quite get what you mean bu [22:10:44] by [22:10:47] argh [22:10:51] by “add goodfaith” [22:11:02] like, the user wants to add one more piece of data to their judgment? [22:11:11] or the developers deploy a new judgment schema? [22:11:54] Hey whats the api link for querying ores from enwiki? [22:12:18] https://ores.wikimedia.org/v3/enwiki [22:12:32] K [22:12:43] awight, add goodfaith as in imagine we deploy and realize we want to add a "goodfaith" question to "edits" [22:12:46] Zppix: go to https://ores.wikimedia.org/v3/ for docs [22:13:30] nevermind "goodfaith" -- let's say we have that. And we want to add a new question called "spam". [22:13:31] halfak: oh no problem, schemas are filled in for a judgment using a join table [22:13:44] Right. But if a group has a single comment. [22:14:02] Then the old comments don't account for spam. [22:14:06] And spam would be newly added to the group. [22:14:15] I think it makes more sense that all judgements have their own comments. [22:14:23] That's what's in the current version of our schema. [22:14:24] for exactly the reason you’re describing [22:14:24] that’s a new schema version, and old judgments use the old schema [22:14:25] I think that’s important cos we need to know exactly what questions we asked. [22:14:30] e.g. if we reverse the order of questions, that’s very significant to the data [22:14:38] It's not a new schema. It's an additional schema. [22:14:55] We won't set the question order. [22:14:55] cool. [22:15:03] E.g. in huggle, they may only be concerned with "damaging" [22:15:09] lemme look at the schema code cos it was “correct” in the etherpad [22:15:11] And so users will only answer that question. [22:15:29] IMO the wiki page is the source of truth [22:15:34] +1 ok that’s fine. [22:15:41] well maybe not [22:15:52] but I see your point that we can’t dictate what tools do. 
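For reference on the ORES API pointer above, here is a minimal sketch of parsing a v3 scores response. The payload below is invented for illustration (the revision ID and probabilities are not real data); the request shape in the comment follows the v3 scores route linked in the chat.

```python
import json

# Abridged example of an ORES v3 scores response. Values are invented
# for illustration; a real request would look something like
# GET https://ores.wikimedia.org/v3/scores/enwiki/?models=damaging&revids=123456
sample = json.loads("""
{
  "enwiki": {
    "scores": {
      "123456": {
        "damaging": {
          "score": {
            "prediction": false,
            "probability": {"false": 0.93, "true": 0.07}
          }
        }
      }
    }
  }
}
""")

def get_score(response, wiki, rev_id, model):
    """Pull one model's score for one revision out of a v3 response."""
    return response[wiki]["scores"][str(rev_id)][model]["score"]

score = get_score(sample, "enwiki", 123456, "damaging")
print(score["prediction"], score["probability"]["true"])
```

The `get_score` helper is hypothetical, just to show where the prediction and probabilities sit in the nested response.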
[22:16:19] And we shouldn't! [22:16:19] I think you're thinking of the wikilabels use case. [22:16:22] Where we want the cleanest judgements we can get. [22:16:26] we need to collect the tool user-agent [22:16:32] I’ve been imagining that schemas come with a reference message and help text we use to ask the question. Probably points to an onwiki message. [22:16:47] awight, agreed re tool/source. that's not in the schema and should be. [22:17:03] +1 for each schema including a message. [22:17:38] E.g. "damaging" -- "Did the edit cause damage? Should it be reverted?" [22:18:04] And "goodfaith" -- "Does it appear that the editor who saved this edit was trying to contribute productively." [22:18:12] We have these strings in the wikilabels i18n [22:18:15] oh dang—more than we thought: name of the label, help text describing what we want to know and guidelines, and text for each label if it’s binary or multiple-choice. [22:18:29] great [22:18:29] Again in the wikilabels i18n [22:18:36] So we can copy that over [22:18:39] I might have some lag [22:18:43] What about “Yes” / “No” or “Good faith” vs “Bad faith" [22:19:05] Moving closer to the router... [22:19:08] see wikilabels? [22:19:42] (working on it) [22:19:58] yeah we need messages for each binary choice. [22:19:59] So, I'm concerned that we started talking about using Flow for comments and how we would enable curation and now we're talking about i18n for labels. [22:20:03] as part of the schema. [22:20:44] I’m not concerned :p. there’s a lot we need to tighten up [22:20:45] we have 6 or so Talk:JADE threads [22:21:29] I do not believe that messages (i18n) should be part of any schema. [22:21:51] But they should be related to the schema. [22:21:51] We should not cut a new version of the schema for every update in translation. [22:22:07] k schema_definition is a blob so we’re good there. [22:22:16] noo it would be message keys. 
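The i18n point above, schemas carrying message keys rather than translated strings so translations can change without cutting a new schema version, could look roughly like this sketch. All key names, message text, and the toy message store are invented for illustration:

```python
# Hypothetical judgment schema definition that stores i18n message
# *keys* (not translated strings). Every identifier here is invented.
DAMAGING_SCHEMA = {
    "name": "damaging",
    "type": "boolean",
    # Keys resolved against the wiki's message store at render time:
    "label_key": "jade-damaging-label",
    "help_key": "jade-damaging-help",
    "option_keys": {True: "jade-damaging-yes", False: "jade-damaging-no"},
}

# Toy in-memory stand-in for the real i18n layer.
MESSAGES = {
    "en": {
        "jade-damaging-label": "Damaging",
        "jade-damaging-help": "Did the edit cause damage? Should it be reverted?",
        "jade-damaging-yes": "Yes",
        "jade-damaging-no": "No",
    },
}

def render_question(schema, lang="en"):
    """Resolve a schema's message keys into display strings."""
    msgs = MESSAGES[lang]
    return msgs[schema["label_key"]], msgs[schema["help_key"]]

label, help_text = render_question(DAMAGING_SCHEMA)
```

Because only the keys live in the schema, updating a translation (or adding a language) never touches `schema_definition`, which matches the "noo it would be message keys" conclusion below.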
[22:22:21] Right [22:22:25] agreed [22:22:40] We also need to collect what language the user’s interface was in [22:22:46] in case there are problems with the i18n [22:23:29] okay, judgment_score is already a join table as I was hoping [22:23:29] so we’re free to add or update schemas [22:24:04] I’m noting stuff on our sync etherpad for lack of a more obvious place [22:24:12] judgement_score? [22:24:43] How about the wiki page and discussions on /Implementation [22:24:55] https://github.com/wiki-ai/jade/blob/master/schema.sql#L107 [22:25:01] k [22:25:35] You guys dont mind if i do a quick restart of icinga2-wm do you? [22:43:26] awight: you missed your chance the netsplit already started :P [22:44:33] halfak: Back to the initial topic… Is it unwiki of us to say that the author of a judgment owns it and other people can’t edit that? [22:44:57] Do we have to let admins edit rather than just suppress? [22:45:02] awight, well I think a judgement can only change by being set through consensus. [22:45:20] I think just suppress. [22:45:21] I don't think a judgement should have an author [22:45:30] But there is the person who first endorsed it. [22:45:40] you mean the rank? [22:45:53] I'm not sure rank makes sense here. but it could [22:46:09] ooh I can edit your Flow comments, for an example of extreme wikiness. [22:46:10] It's rank in Wikidata but boolean everywhere else. [22:46:35] awight, right. I don't think that makes sense for comments. [22:46:51] ew, Flow :/ [22:46:57] I was planning on rank having the same meaning as in Wikidata [22:46:59] when you change your judgment, the old one is rank=deprecated. [22:47:02] when consensus elevates a judgment to subjective truth, it has rank=preferred [22:47:03] oh [22:47:05] +1 [22:47:23] * awight squints at TheresNoTime [22:47:29] * halfak does too [22:47:50] big fans? [22:48:07] Nope. I do like discussion systems that anyone can figure out. Ones that look like the rest of the internet.
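The Wikidata-style rank semantics agreed above (changing your mind deprecates the old judgment and adds a new one; consensus can promote one to preferred) could be modeled roughly like this. Class, field, and rank names are invented for illustration:

```python
from dataclasses import dataclass, field

# Hedged sketch of the rank lifecycle discussed above. Nothing here is
# the real JADE schema; it only demonstrates deprecate-and-replace.
@dataclass
class Judgment:
    data: dict                 # e.g. {"damaging": True}
    rank: str = "normal"       # "preferred" | "normal" | "deprecated"

@dataclass
class JudgmentSet:
    judgments: list = field(default_factory=list)

    def revise(self, new_data):
        """A user changes their mind: deprecate the live judgment,
        then append a replacement (old rows are kept, never edited)."""
        for j in self.judgments:
            if j.rank == "normal":
                j.rank = "deprecated"
        self.judgments.append(Judgment(new_data))

    def current(self):
        """Everything not deprecated, i.e. what curation acts on."""
        return [j for j in self.judgments if j.rank != "deprecated"]

js = JudgmentSet()
js.revise({"damaging": True})
js.revise({"damaging": False})   # changed their mind
```

Keeping deprecated rows around (rather than editing in place) is what makes the "author can't really edit, only supersede" position workable alongside recentchanges-style curation.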
[22:48:17] However, I don't think it's a great idea to convert enwiki. [22:48:24] \o/ [22:48:33] But for new spaces, it can certainly make sense to just have a decent discussion system. [22:48:37] E.g. in JADE :) [22:48:59] then again, I'm still very much wikitext editor over VE. I imagine Flow appeals to the VE folks [22:49:02] haha, halfak found TheresNoTime’s compromise button [22:49:10] I'm a wikitext guy too. [22:49:15] :) [22:49:22] I love VE except when it hangs and erases an hour of my work, which I have to recover using screenshots. [22:49:42] and to those who like VE, well, no one is perfect :-) [22:49:43] Though sometimes (sometimes) I'll use VE to edit a large table. [22:49:43] Otherwise, I just want wikitext :) [22:52:02] Oh, also, not sure if you both saw my words of thanks the other day - the awesomeness which is ORES helped me create https://en.wikipedia.org/wiki/User:There%27sNoTime/AfC_very_old_draft_scores [22:52:39] Which will hopefully go towards finding quick-decline drafts, and free up the enwiki AfC backlog \o/ [22:52:53] Hey all -- currently working on this and awight and I are discussing whether it should be moved over to MediaWiki: https://wikitech.wikimedia.org/wiki/ORES/New_model_checklist#Step_9:_Deploy_the_new_model [22:53:19] TheresNoTime, I did see that. I didn't have much time to click through. Does it look like the draftquality model is proving useful? [22:53:29] https://wikitech.wikimedia.org/wiki/ORES/New_model_checklist (the whole page actually) :-) [22:53:57] srrodlund, I think it's wikitech material. [22:54:29] Seems wikimedia specific to me. [22:54:57] Then again, I don't think it would be weird to have this at MediaWiki.org [22:54:57] halfak: mostly, some are less "spam" and more "generally not notable", but so far AfC reviewers agree that a prediction of "spam" normally means they would decline the draft [22:55:14] Oh good. That's useful to know TheresNoTime [22:55:24] It's trained based on CSD deletion decisions. 
[22:55:54] So maybe people flag things "Spam" for non-notability often. [22:55:54] Trained on G11s for "spam" iirc? [22:56:03] ^ was just typing that :D [22:56:53] But on the whole, useful, and likely to become a bot [22:56:58] I'd love to see a short summary of what you and your collaborators are seeing with the model's predictions somewhere so that I can reference it when making improvements. [22:57:19] And maybe when advocating for more resources. [22:59:06] Definitely will do! [23:21:18] Arg. I still haven't made it to look at codezee's code. I'll do that tomorrow. [23:21:44] Have a good night or whatever is going on in your timezone, folks! [23:21:44] o/