[07:05:13] YuviPanda: hey, around?
[11:17:09] bmansuro_: please split the bad words list into three lists
[11:17:30] Amir1: ok, should I also add more bad words?
[11:17:41] that would be great
[11:17:48] ok will do
[11:20:33] thanks :)
[15:21:28] o/
[15:51:46] o/ halfak
[15:51:52] Hey Amir1
[15:51:59] thanks for getting to uzwiki so quick!
[15:52:15] yw :)
[15:52:20] Happy to do something
[15:52:31] Sorry I'm late on the balanced extractor. I've been working odd hours trying to get this refactoring done. I made some good progress last night though.
[15:53:23] awesome
[15:53:33] I wish I could help more
[15:54:12] any thoughts here?
[15:54:13] https://phabricator.wikimedia.org/T121005
[15:55:07] I'll read it very soon
[15:55:11] OK.
[15:55:24] I don't want to skim the task, it sounds important
[15:55:38] I'll do it in an hour though :)
[15:55:47] BTW, I'll be AFK for most of Friday since it's a major holiday and my family won't like it if I'm working.
[15:55:59] (also sounds good re. reading and sharing thoughts)
[15:56:17] yeah same here but for other reasons
[15:56:36] I also need to study (it's exam time), spend some time with family, I'll leave to visit my parents tomorrow but just for one day
[15:56:59] merry Christmas btw :)
[15:57:11] \o/ Merry Christmas!
[15:57:37] So, take all the time you need. Just make sure I know what's up. :)
[15:57:49] re. studies and holiday.
[15:57:54] sure
[15:57:56] or whatever, really :)
[15:58:04] tomorrow I'll take a day off
[15:59:05] after that I will be working harder since I'll need to take some days off early January for exams
[15:59:18] (almost at the same time as the Dev Summit)
[15:59:30] so it won't be a big miss
[15:59:35] Well that's sort of convenient I suppose.
[15:59:48] BTW, did you see that the Wikimania submission deadline is Jan 7th this year?
[16:01:11] scholarship or session
[16:01:14] ?
[16:06:07] Amir1, session
[16:06:14] usually they are in April.
[16:06:22] So, this is very out of the ordinary
[16:06:36] I only heard about it through word of mouth
[16:06:56] I didn't see an announcement, but then again, I have announcement blinders :S
[16:07:03] hmmm
[16:07:36] I applied for scholarship but I didn't determine anything for a seesion
[16:07:40] *session
[16:07:49] halfak: any ideas for a session?
[16:08:05] Lessons learned building machine learning models for Wikidata
[16:08:22] You could talk about Kian and ORES work
[16:08:37] I think it would be very interesting to talk about the progression toward higher fitness models.
[16:08:58] On an ethical note, I think it's /good/ that we educate people about the capabilities and limitations of these models.
[16:43:01] o/ violetto
[16:45:17] halfak: sure
[16:45:20] great idea
[16:45:34] I'll submit a session ASAP
[16:53:39] I just asked violetto to take a look at https://upload.wikimedia.org/wikipedia/commons/7/7a/Revision_Scoring_as_a_Service_logo.svg and maybe help us make some improvements.
[16:54:42] She asked what we were going for in the old logo and I remember discussing (1) wikimedia colors and (2) aiming to capture something mysterious -- like ML algorithms
[16:54:53] Somehow "alchemy" seems to fit with "ores".
[16:55:43] the wikimedia colors are stupid
[16:55:50] :D
[16:58:05] :))))
[17:00:28] haha but it's nice that people try to use the same colors even though it's not their favorite
[17:01:18] it'll be nice to hear what you guys think ores is and bounce around ideas for the logo
[17:02:08] So, I think it is important that we map out some conceptual boundaries.
[17:02:36] 'ores' -- The webservice that hosts prediction models. This is the output of the larger 'revision scoring' project.
[17:03:12] 'revscoring' -- The python library that helps us build and test models. This library provides a framework for encoding modeling problems.
[17:03:54] 'wikilabels' -- The input (human judgement) for training scorer models.
[17:04:22] We could be talking about a logo for the entire 'Revision scoring' project.
[17:04:40] OR we could be talking about a logo for all 'Wikimedia AI projects'
[17:04:56] more logos? o.O
[17:05:08] Helder, more updates to current logos
[17:10:25] i recommend the CBS logo
[17:12:16] i think wikimedia AI projects would be good, you can do something like WM AI projects logo • Revision scoring
[17:12:32] going out for lunch, but I can still see replies on my phone, brb!
[17:12:59] Not sure what you mean by "WM AI projects logo • Revision scoring"
[17:13:12] Would that be like a general AI logo with the words "revision scoring" beneath it?
[17:13:18] Or something like that?
[17:31:18] Amir1,
[17:31:23] looking at "claims_added"
[17:31:31] it looks like we have two ways of working this out.
[17:31:51] one is to do a dict_diff on pywikibase.ItemPage.claims
[17:32:21] the other is to dig into the claims and look for claims within each "p_number"
[17:32:48] It seems these are two different things, but we call them the same thing and, incidentally, overwrite our feature name when doing so.
[17:32:55] I'd like to differentiate them with a good name.
[17:33:52] So, I'd like a good name for the pair of (item, property) and the triple of (item, property, value)
[17:34:44] sorry I was afk
[17:34:46] reading
[17:36:40] halfak: can you give me a link, I have trouble understanding this
[17:36:43] brb for dinner
[17:40:08] I'm just going to call these statements for now and we'll see how that goes.
[18:16:37] okay I'm back
[18:17:28] halfak: the system of claims is complicated in wikidata, references are way more complicated but
[18:17:41] understanding this is a little bit hard
[18:17:49] we have something like this
[18:18:10] item.claims['P31'] = [Claim1, Claim2]
[18:18:27] which means P31 can have several values
[18:18:45] So, maybe we should call each item in item.claims a "property"?
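The two ways of working out "claims_added" mentioned above can be sketched roughly as follows. This is only an illustration, assuming that item.claims behaves like a plain dict mapping a property ID such as 'P31' to a list of values. The function names `diff_properties` and `diff_claims` are hypothetical and are not the revscoring or pywikibase API.

```python
def diff_properties(old_claims, new_claims):
    """Key-level diff: which property IDs were added, removed, or kept.

    Hypothetical helper; it only compares the dict keys.
    """
    old_keys, new_keys = set(old_claims), set(new_claims)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "intersection": sorted(old_keys & new_keys),
    }


def diff_claims(old_claims, new_claims):
    """Value-level diff: (property, value) pairs present only in the new dict.

    Hypothetical helper; it looks inside each property's list of values.
    """
    added = []
    for prop, values in new_claims.items():
        old_values = old_claims.get(prop, [])
        added.extend((prop, value) for value in values if value not in old_values)
    return added


# Toy revisions: P31 gains a second value and P21 is a new property.
old = {"P31": ["Q5"]}
new = {"P31": ["Q5", "Q1"], "P21": ["Q6581097"]}

property_level = diff_properties(old, new)
claim_level = diff_claims(old, new)
```

In this toy example the key-level diff reports only P21 as added, because P31 already existed, while the value-level diff also picks up the new value under P31. That gap is why the two computations probably deserve different names.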
[18:18:50] so a dict diffing is not sufficient
[18:19:04] An item property has a list of claims.
[18:19:48] I should look into their data model to give you a better name
[18:19:56] maybe they have something like this
[18:20:51] but I think they call it Statement (statement = P31:Q1, Q2 and claim= P31:Q5) so each statement can have two claims
[18:21:01] two or more obviously
[18:21:11] but give me a minute to dig deeper
[18:25:51] No worries. I think this'll work.
[18:26:10] So, the wording I have now is that the dict_diff of item.claims produces a "property_diff"
[18:26:21] and that you need to dig into that to get a "claims_diff"
[18:27:39] exactly halfak
[18:28:03] Cool. Working this out. :)
[18:28:11] * halfak reduces lines of code and adds functionality.
[18:28:59] https://www.wikidata.org/wiki/Wikidata:Glossary
[18:29:35] based on this a claim is just P31:Q5 and a statement is P31:Q5 and possible references and qualifiers for that claim
[18:29:57] so basically we have no name for our collection of P## values
[18:30:48] I think that calling it item.claims is fine.
[18:30:58] It's just weird that in diffing it, we get a properties diff.
[18:31:06] But now that this is clear I can implement it. :)
[18:33:25] awesome
[19:31:59] halfak: btw don't touch references, it's a mess we have something like this claims['P31'] = [Claim1,Claim2] Claim1.references = [{'snaks':{'P143':[Claim1, Claim2]}}, {'snaks':...
[19:32:21] Amir1, yeah. that was one of the methods that I mostly just left alone.
[19:32:35] I'll need to ask for your help in verifying that our test cases are actually right.
[19:33:20] Right now, I'm just assuming they are and working on recording the outputs we get.
[19:33:35] So that if we change/break something, we'll notice.
[19:34:15] okay :)
[20:49:56] halfak: yup yup. if I understand correctly kinda like this:
[20:49:57] https://s-media-cache-ak0.pinimg.com/736x/eb/47/04/eb47045681e4c90ce16787ca2d5cd2e7.jpg
[20:50:23] same logo consistently but different cities
[20:50:31] Oooh. I like it.
[20:50:55] https://s-media-cache-ak0.pinimg.com/736x/b7/5c/9b/b75c9bb33253502f33dc19f5272076d2.jpg
[20:51:21] https://s-media-cache-ak0.pinimg.com/736x/83/e4/ea/83e4ea81d9bc65117c4cbc846e1971c6.jpg
[20:57:56] those other two aren't really related to my suggestion, more like inspiration of a play of the same logo for different purposes
[21:24:57] Seems like this will be very useful as we expand our sub-projects
[21:27:31] * YuviPanda waves at halfak and violetto and everyone else toooo
[21:27:37] o/ YuviPanda
[21:31:51] * halfak looks at the clock ticking on his time to finish this refactor.
[21:46:14] akosiaris: are the ores redises setup yet?
[21:46:26] YuviPanda: not even close
[21:46:37] haha
[21:46:39] ofc
[21:46:52] akosiaris: I rejigged the role last week, we're using multi-tenant redis now
[21:46:57] one for cache and one for queue
[21:47:55] ok
[21:48:03] it's almost definitely going for next year btw
[21:48:09] unless something magic happens and I get the time
[21:48:11] akosiaris: yeah I'm sure.
[21:48:21] halfak: ^
[21:49:04] YuviPanda, just so I understand, this is the bare-meta future ORES prod, right?
[21:49:08] *metal
[21:49:37] halfak: yes
[21:50:25] Cool. How far out is the machine that will host the workers?
[21:52:08] halfak: the machine is already there
[21:52:17] just needs a bit of work to finish up puppetizing and setting up etc
[21:52:30] Oh! So redis machine is the last bit?
[21:52:49] How are you thinking about robustness in the case of a hardware failure?
[21:52:54] halfak: we'll have two
[21:52:59] halfak: and be able to switchover
[21:54:36] Same deal with workers?
[21:55:01] halfak: yeah
[21:55:12] :)
[21:55:20] halfak: so we'll have two machines and we haven't settled on an appropriate way to do this
[21:55:38] halfak: my last thought process was that we'll have both of them active with web and workers on each, but same redis.
[21:56:15] YuviPanda, yeah, we still need to figure out how to make sure that, under load, the workers run out of CPU before wsgi.
[21:56:22] I think a simple CPU priority will do that
[21:56:25] Or we can set limits.
[21:56:39] I think we won't run into that problem on these machines, but we'll know once we actually load test
[21:56:47] It's really important that uwsgi can still function under full worker load.
[21:57:01] YuviPanda, essentially, then why even have backpressure?
[21:57:16] Our backpressure system breaks when uwsgi gets too overloaded to contact the workers/redis/
[21:57:30] But yeah. +1 for tests.
[21:57:30] I'm not disagreeing with you halfak :) Just trying to not think of it before we actually have the machines :)
[21:57:44] and the particular characteristics there.
[21:57:45] Fair point. :)
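
The backpressure idea discussed above can be illustrated with a small in-process sketch. It is not the ORES implementation: the class names `BoundedScoreQueue` and `QueueFull` and the `max_depth` parameter are made up for illustration, and a plain deque stands in for the redis-backed queue. The point it tries to show is that the web front end only needs to do a cheap depth check to refuse new work, which is why it has to stay responsive while the workers are saturated.

```python
import collections


class QueueFull(Exception):
    """Raised when the queue refuses new work (hypothetical name)."""


class BoundedScoreQueue:
    """Toy stand-in for a work queue shared by the web front end and workers."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.pending = collections.deque()

    def submit(self, rev_id):
        # Web side: reject immediately instead of letting the backlog grow.
        if len(self.pending) >= self.max_depth:
            raise QueueFull("too many pending scoring requests")
        self.pending.append(rev_id)

    def take(self):
        # Worker side: pull the oldest pending request, or None if idle.
        return self.pending.popleft() if self.pending else None
```

A front end built this way would turn `QueueFull` into an "overloaded, try again later" response. The CPU-priority suggestion could be approximated on Unix by having each worker process call `os.nice()` with a positive increment at startup, so that the web processes are scheduled first. Whether that is enough on the real machines is exactly what the load tests would need to show.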