[14:08:31] o/ akosiaris. I was just looking at the switchover email.
[14:08:57] I'm wondering what we plan to do with ORES. It seems like we should actually make sure CODFW ores is working as expected.
[14:09:42] halfak: https://phabricator.wikimedia.org/T159615
[14:09:53] the changes I've uploaded should take care of that
[14:10:06] Gotcha.
[14:10:14] as far as end-to-end tests go, they are already running
[14:10:23] same state as in eqiad
[14:10:25] OH wow. I have a response to mobrovac that I never sent. eek.
[14:11:56] yeah took me a while to answer as well
[14:12:40] 10Revision-Scoring-As-A-Service-Backlog, 10ORES, 06Operations, 13Patch-For-Review, and 2 others: [spec] Active-active setup for ORES across datacenters (eqiad, codfw) - https://phabricator.wikimedia.org/T159615#3163940 (10Halfak) Hey folks. I just realized that I had the following message sitting in ph...
[14:14:03] akosiaris, will your work just have requests split between datacenters?
[14:14:13] duplicated
[14:14:17] I forgot where we landed on having a redis proxy for cache.
[14:14:28] Oh! Gotcha. Just for precached though, right?
[14:14:32] yes
[14:15:06] which should be good enough to get the redis cache warmed up
[14:15:21] so that when the traffic is switched over we don't have a catastrophic meltdown
[14:15:33] well no, ORES has that back-pressure system, so that can't happen
[14:15:50] but we would not be serving requests, which would be just as bad
[14:16:14] Right. Agreed.
[14:16:31] So we're not quite getting to active-active with this so much as non-catastrophic failover :)
[14:18:29] well, depends. after those changes are merged, we should be able to serve requests from both DCs without problems, right?
[14:19:01] which is kind of active/active. maybe the cache is not shared between the 2 DCs, but maybe that's enough?
[14:19:11] we should discuss this more thoroughly I guess
[14:19:50] akosiaris, I think sharing the cache would help a lot.
People often score the same set of revisions multiple times.
[14:19:59] I do this a lot for research analysis.
[14:20:25] They won't be recent revisions. Just the same batch of ~10-100k revisions that I want to work with.
[14:20:27] yes, but you don't expect to end up in different DCs
[14:21:03] as in, the per-DC cache has this extra benefit of being geographically distributed (minus the precaching)
[14:21:19] It's really useful to have ORES respond mega-fast when I work on the dataset for the second time.
[14:21:52] yes, won't that happen anyway?
[14:22:10] I mean hit the same DC so get that result you just computed
[14:22:13] ?
[14:22:14] It'll happen half the time.
[14:22:19] why?
[14:22:27] Assuming a 50/50 split between datacenters.
[14:22:33] don't assume that
[14:22:34] And separate caches.
[14:22:36] oh
[14:22:40] Why not :)
[14:22:43] it's geographically distributed
[14:22:49] Oh!
[14:22:49] requests go to the closest DC
[14:22:52] Interesting
[14:23:06] Yeah... that'd mitigate this concern.
[14:24:16] in an active/active scenario (when we get there), you can expect the western US for example to hit CODFW and the eastern US to hit EQIAD
[14:24:30] europe would hit eqiad, australia would hit codfw
[14:24:42] and so on
[14:27:31] and if ORES results became cacheable at higher levels (https://phabricator.wikimedia.org/T137962 is probably relevant), Europe and Asia would probably hit ESAMS and Australia ULSFO
[14:27:42] there's a nice blog post about that on our blog from a few years ago
[14:27:46] lemme find the link
[14:28:04] https://blog.wikimedia.org/2014/07/09/how-ripe-atlas-helped-wikipedia-users/
[14:28:06] there you go
[14:28:34] Gotcha. That'll be interesting to monitor. :S
[14:29:11] we control that btw. We try to mostly have our configuration match the state of that map
[14:29:39] and keep in mind a southeast asia DC is coming up
[14:31:41] We'll need to keep those in mind for CapEx for FY2019.
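[Editor's note: the check-cache-then-compute and precache-warming behavior discussed above can be sketched roughly as follows. All names here (`ScoreCache`, `score_revision`) and the cache key layout are hypothetical illustrations, not the actual ORES implementation.]

```python
# Sketch of the per-DC cache pattern discussed above (hypothetical names;
# the real ORES service keys its Redis cache differently).

class ScoreCache:
    """In-memory stand-in for a per-datacenter Redis cache."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(wiki, model, model_version, rev_id):
        # A score is fixed for a given (wiki, model, model version, revision),
        # so that tuple makes a natural cache key.
        return '{0}:{1}:{2}:{3}'.format(wiki, model, model_version, rev_id)

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def score_revision(cache, wiki, model, model_version, rev_id, compute):
    """Return a cached score if present; otherwise compute and cache it.

    Precaching is just calling this for recent revisions ahead of user
    traffic, so a standby DC's cache is already warm at switchover time.
    """
    key = cache.key(wiki, model, model_version, rev_id)
    cached = cache.get(key)
    if cached is not None:
        return cached
    score = compute(rev_id)  # the expensive feature-extraction + model step
    cache.set(key, score)
    return score
```

With separate caches per DC, a revision already scored in eqiad is still a cache miss in codfw, which is exactly the trade-off being weighed against geographic routing here.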
[14:31:50] it's gonna be cache only
[14:31:59] so no worries
[14:32:03] Oh I see. So we wouldn't have an ORES cluster there?
[14:32:07] nope
[14:32:15] gotcha. I suppose that makes sense.
[14:32:16] neither mediawiki, nor anything else
[14:32:31] only caches
[14:33:09] but getting ORES responses cacheable (some of them at least) would help users over there a bit
[14:34:08] akosiaris, yeah. That's a complex problem because of our performance dynamics. We could always have more hardware and eat the extra IO and CPU. :)
[14:34:20] We might need to multiply by 4 or 5 though :|
[15:22:31] 10Revision-Scoring-As-A-Service-Backlog, 10ORES, 06Operations, 13Patch-For-Review, and 2 others: [spec] Active-active setup for ORES across datacenters (eqiad, codfw) - https://phabricator.wikimedia.org/T159615#3164081 (10mobrovac) What is the status of {T148714}? Has it been deployed and tested? If so, w...
[15:31:50] 06Revision-Scoring-As-A-Service, 10Analytics, 10ChangeProp, 10EventBus, and 3 others: Create generalized "precache" endpoint for ORES - https://phabricator.wikimedia.org/T148714#3164091 (10Halfak) In T159615, @mobrovac asked about this. This change is now deployed. We should be ready to receive changepr...
[15:43:48] 06Revision-Scoring-As-A-Service, 10Analytics, 10ChangeProp, 10EventBus, and 3 others: Create generalized "precache" endpoint for ORES - https://phabricator.wikimedia.org/T148714#3164103 (10mobrovac) Thanks @Halfak for the info! I've just taken a look at the PR and am wondering why you opted to consume the...
[18:52:09] halfak: are you going to online tomorrow?
[18:53:48] to be online*
[19:27:59] Yeah. I'll be online around 1500 UTC for a bit.
[19:29:49] glorian_wd, ^
[19:30:55] halfak: oh? for a bit? so you will not online as long as usual?
[19:31:09] not going to online*
[19:31:11] Tomorrow is a Saturday ;P
[19:31:26] halfak: yeah, i meant, as usual Saturday :D
[19:31:38] I'm usually around for 4 hours or so on Saturdays.
[19:32:04] oh I see.
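[Editor's note: making ORES responses cacheable at the CDN layer, as floated above (T137962), mostly comes down to emitting the right HTTP caching headers. A minimal sketch, assuming a score for a fixed model version never changes; the function name and header values are illustrative, not what ORES actually sends.]

```python
import hashlib

def score_cache_headers(wiki, model, model_version, rev_id, max_age=86400):
    """Headers that would let edge caches (e.g. Varnish at ESAMS/ULSFO)
    answer repeat score requests without reaching an ORES cluster.

    Assumes a score is deterministic for a given model version, so the
    response can be treated as immutable for max_age seconds.
    """
    # A stable ETag lets caches revalidate cheaply with If-None-Match.
    etag = hashlib.sha1(
        '{0}:{1}:{2}:{3}'.format(wiki, model, model_version, rev_id)
        .encode('utf-8')
    ).hexdigest()
    return {
        'Cache-Control': 'public, max-age={0}'.format(max_age),
        'ETag': '"{0}"'.format(etag),
        'Content-Type': 'application/json',
    }
```

The catch, as noted above, is that only some responses qualify: a score computed with an old model version must not be served after the model is upgraded, which is why the model version belongs in the cache identity.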
Ok, regardless, I will try to give you the 5k sample by tonight
[19:32:24] halfak: so perhaps, today or tomorrow, you can feed them to labels.wmflabs.org
[19:32:32] I'm working on the code now
[19:33:14] glorian_wd, we might have a bit of a problem with just working from your sample.
[19:33:24] halfak: what problem?
[19:33:33] I'd like to have your code for filtering out the undesired pages in wikiclass, or maybe a new repo specifically for itemquality.
[19:33:46] and then have the process of getting the items and filtering encoded in a makefile.
[19:34:05] E.g. see https://github.com/wiki-ai/wikiclass/blob/master/Makefile
[19:34:12] This is the makefile for our article quality models.
[19:34:29] Here are the utilities that we use: https://github.com/wiki-ai/wikiclass/tree/master/wikiclass/utilities
[19:36:36] * glorian_wd trying to understand what halfak means
[19:36:59] Where do I find the utility you wrote to filter out unwanted items?
[19:39:49] hmm
[19:40:29] halfak: firstly, are the utilities here: https://github.com/wiki-ai/wikiclass/tree/master/wikiclass/utilities, used for pre-processing the sample?
[19:40:58] yes
[19:42:46] halfak: OK, so I must write a new script under that utilities directory?
[19:43:26] Initially, I thought I only needed to send you 5k item_ids which are not unwanted_pages
[19:46:07] We need to have a well documented process for arriving at that 5k
[19:46:19] Sorry this totally wasn't clear. I've been too busy to think clearly about it.
[19:46:25] And make sure you knew what I expected.
[19:47:49] halfak: I still don't understand what I should do to get at that 5k
[19:48:15] by using those utilities that you've just explained
[19:48:17] Write a query that gets a sample of items. Write a script (in python) that filters the unwanted pages out.
[19:48:41] I'm changing locations. I'll be offline for 30 mins while I bike to a coffee shop
[19:49:03] halfak: Ok.
let me know when you are return
[19:49:11] kk
[19:49:13] have returned*
[20:01:43] 10Revision-Scoring-As-A-Service-Backlog: DRAFT: Use rate limiting for ORES Action API score retrieval - https://phabricator.wikimedia.org/T162484#3164775 (10dr0ptp4kt)
[20:02:29] 10Revision-Scoring-As-A-Service-Backlog: DRAFT: Use rate limiting for ORES Action API score retrieval - https://phabricator.wikimedia.org/T162484#3164791 (10dr0ptp4kt)
[20:03:44] 10Revision-Scoring-As-A-Service-Backlog: DRAFT: Use rate limiting for ORES Action API score retrieval - https://phabricator.wikimedia.org/T162484#3164775 (10dr0ptp4kt)
[21:06:39] o/ glorian_wd
[21:06:55] So yeah. I think the best next step is to show me your code for filtering out the undesirable items
[21:07:43] halfak: but I still don't get how to reach that next step
[21:08:03] for instance, querying for 5k data. Do you mean I should have 5k data first before filtering them?
[21:08:22] don't understand*
[21:09:32] Based on your reply above, I assumed I should write a query to get a 5k sample. After I got that 5k sample, I would eliminate unwanted items from it
[21:11:55] Maybe 6k data first and then filter?
[21:14:00] What proportion of items are unwanted?
[21:14:02] halfak: yeah sorry
[21:14:37] halfak: proportion? in the last pilot, there were around 27 unwanted items from 250 original samples
[21:14:50] however, keep in mind that 27 does not include redirect items
[21:15:13] I just realized today that identifying redirect items cannot be done simply by looking at "instance of"
[21:15:36] redirect items have a different structure. But it should be easy to identify them using the api
[21:19:23] I am thinking that it is not a good strategy to extract a 6k sample and eliminate unwanted items from it. Perhaps it is a better strategy to extract 2000 items from each stratum, and eliminate unwanted items from those.
That way, we can ensure that each stratum has no unwanted items
[21:20:02] Sure
[21:20:07] Sounds like a good idea to me.
[21:25:21] halfak: Actually, I am 90% done with that using a script other than Python.
[21:25:36] But if you require me to do that using Python, I can do that as well.
[21:25:42] I will do it tomorro
[21:25:45] tomorrow*
[21:26:07] halfak: after I code a Python script for that, I should add it into the utilities GitHub repo, right?
[21:27:23] It'll need to work like a utility in that repo. If you can get together a python script, I'll do code review to get it merged.
[21:27:37] It might be easier if you just produce the script.
[21:27:44] I highly recommend using the mwapi library.
[21:27:54] mwbase might be handy too.
[21:28:26] Are the filters that we are worried about only `redirect` and a few `instance-of` values?
[21:29:49] Yeah
[21:30:02] halfak: there're 4 "instance of" values and a redirect page
[21:30:12] Cool.
[21:30:25] The instance-of should be configurable through the command-line
[21:31:06] halfak: what're the mwbase and mwapi libraries?
[21:31:11] are those Python libraries?
[21:31:27] and what do you mean that the instance-of could be configurable through the command line?
[21:31:28] ?
[21:52:27] The utility should take a list of Qids to exclude from a command-line parameter
[21:52:34] yes those are python libraries.
[22:00:35] halfak: oh, do you mean passing unwanted items' Q-IDs via an argument?
[22:03:43] yes
[22:04:23] I can work on this with you tomorrow.
[22:04:27] But I have to run for now.
[22:05:31] glorian_wd, ^
[22:05:32] o/
[22:06:17] halfak: see you tomorrow!
[22:06:18] o/
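[Editor's note: a sketch of the filtering utility being discussed, with the excluded Qids taken from the command line as halfak suggests. The filtering predicate is kept pure so it can be fed entity JSON fetched with mwapi's `wbgetentities` request (or from a dump). The CLI shape, the `redirects` field check, and the one-entity-per-line input format are assumptions for illustration, not a finished implementation.]

```python
import argparse
import json
import sys

def is_unwanted(entity, excluded_classes):
    """Return True if a Wikibase entity dict is a redirect or an
    instance (P31) of one of the excluded classes.

    `entity` is the per-item JSON structure of the kind returned by the
    wbgetentities API module; redirect items carry a 'redirects' field
    rather than normal claim content, which is why "instance of" alone
    can't catch them.
    """
    if 'redirects' in entity:
        return True
    for claim in entity.get('claims', {}).get('P31', []):
        datavalue = claim['mainsnak'].get('datavalue')
        if datavalue and datavalue['value']['id'] in excluded_classes:
            return True
    return False


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Filter unwanted items out of a sample.")
    parser.add_argument(
        '--exclude', action='append', default=[],
        help="a Qid class to exclude (repeatable), e.g. --exclude=Q4167410")
    args = parser.parse_args(argv)
    excluded = set(args.exclude)
    # One JSON entity per input line; print the ids that survive filtering.
    for line in sys.stdin:
        entity = json.loads(line)
        if not is_unwanted(entity, excluded):
            print(entity['id'])

if __name__ == '__main__':
    main()
```

Usage would look something like `python filter_unwanted_items.py --exclude=Q4167410 --exclude=Q13406463 < entities.ndjson > wanted_ids.txt` (the Qids here are hypothetical examples of excluded classes), which keeps the process reproducible and makefile-encodable as requested earlier in the conversation.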