[16:52:06] mornin' halfak. do you have 5 min to chat?
[16:54:28] Hey leila, just finishing up an email.
[16:54:36] Batcave in a couple minutes?
[16:55:15] sure halfak. thanks.
[17:23:49] hey leila
[17:24:28] as usual, I started early this morning and I won’t make it in time to the office for our 9:30
[17:24:58] we could do this remotely or if I come to the office we could push it to later in the day
[17:25:21] I have a slight preference for the latter or I’ll be stuck at home for another hour
[17:25:58] does that work for you? I don’t think there’s anything particularly urgent to discuss this morning other than WikiGrok next steps
[17:31:01] leila, we meeting? I'm not sure what the invite updates are about ;p
[17:31:22] I'm finishing up another meeting Ironholds.
[17:31:32] okie-dokes!
[17:33:08] halfak, you got a recommended candied bacon recipe? ;p
[17:34:01] I should write mine down.
[17:34:52] (1) slice bacon into tiny strips (1-2 cm wide, 2-3 cm long)
[17:35:13] (2) fry in pan until water/grease mixture begins to pool
[17:36:10] (3) add brown sugar liberally and stir -- if there is still water left in the mixture, the sugar will partially dissolve and stick to the bacon. If it doesn't dissolve, you are too late and there's only grease left.
[17:36:34] (4) Combine with scrambled eggs and serve with buttered toast.
[17:36:44] sweet!
[17:36:52] thank you! And yes, you should write it up on your site
[17:36:59] next to your feels on the, what is it. Beta distribution?
[17:37:02] I have photos.
[17:37:07] :)
[17:48:42] leila, when you've got a minute, I'd like to introduce you to khitron. khitron is working on an analysis of interwiki article links that are recorded in WikiData.
[17:49:04] I figured you'd have some pro-tips about extracting data using the WDQ service.
[17:50:11] halfak, I'm ready. :-)
[17:50:28] Hello, leila.
[17:50:42] hi khitron. nice meeting you.
[17:51:03] \o/
[17:51:07] Me too.
And sorry for my bad English
[17:51:16] so, do you have a project page khitron?
[17:51:27] so far so good. :-)
[17:52:05] I have no idea what you are talking about with "project page"
[17:52:22] lemme link you to one, khitron
[17:52:59] :-)
[17:53:00] we usually document the projects that we're working on in Research meta. This is an example of a project page that is in fact related to your work: https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage
[17:53:54] so what question(s) are you trying to answer khitron?
[17:57:03] hi livnetata. Do you by any chance document your research for interlanguage links somewhere public?
[17:57:37] Hi Leila
[17:57:39] I wanted to link khitron to it and I realized I don't know where the documentation exists livnetata
[17:57:48] I need some connection between wikipedia and wikidata. For example, I'm trying to create online a list of all wikipedia articles with human instance in wikidata but without defaultsort property. As I understand now, it's impossible, because I must read wikipedia using sql and wikidata using url queries. And I can't interleave between them.
[17:58:52] I'm working on documenting everything but it is not yet organized...
[17:59:04] np, livnetata. looking forward to it. :-)
[17:59:48] Thanks. I think the mail I sent to analytics is part of it.
[18:00:13] khitron, through API queries, you can do things like: give me all the items in WikiData that are such and such and have a corresponding page in a specific Wikipedia language. Would that address your question?
[18:00:56] And regarding translations, I have a list of ideas here https://www.mediawiki.org/wiki/Wikipedia_article_translation_metrics/How_to_detect_translated_articles but the code for it is not public (or close to done) yet.
[18:01:11] yes, livnetata, and that just got me more excited. At some point during this quarter (next two months) we should touch base.
Some of the work you're doing is closely related to another body of work I'm working on with Bob West to identify what's missing from Wikipedia.
[18:01:26] ah! thanks livnetata.
[18:01:54] khitron, I need to go to a meeting. Feel free to continue over email, or we can chat when I'm back in 2 hours?
[18:02:07] Possibly. I should see it and make some experiments. Don't know if it will work, leila
[18:02:15] khitron, did you go over all the wikidata tables?
[18:02:44] Thanks. I'll try this leila, and we'll talk another time when you're here
[18:03:06] What do you mean by "all" livnetata?
[18:03:18] And hello to you too.
[18:03:49] hi :)
[18:04:50] wikidata has special tables for their data
[18:05:21] But I don't know if you can query them as you need. If you can, that answers your question.
[18:05:55] khitron: something like this may help http://wdq.wmflabs.org/api?q=claim[31:tree[5][][279]]ANDlink[enwiki] which gives all the items in WikiData that are instances of human (looking into all classes, sub-classes, sub-sub-classes, etc. of human using the tree) and have a page in enwiki.
[18:05:59] ciao khitron.
[18:06:17] Bye leila
[18:06:32] Anyway, I think Leila figured it out.
[18:06:36] It's a broken link for me, livnetata
[18:08:23] for me too
[18:08:25] give me a sec
[18:11:41] my bad khitron, livnetata. try this one: http://wdq.wmflabs.org/api?q=claim%5B31:%28tree%5B5%5D%5B%5D%5B279%5D%29%5D%20AND%20link%5Benwiki%5D&props=* it should give you all the properties of all instances of human in WikiData that have an enwiki page.
[18:11:48] (it takes a while for it to run)
[18:12:35] Nice
[18:17:47] 504 gateway :-). And anyway, if it's only "have an enwiki page", that means all the information is in wikidata. As far as I know, defaultsort can't be recognized in wikidata
[19:03:19] I believe there is no more. Thank you very much for your help, halfak, leila, livnetata and good bye :-)
[19:29:31] what's the technical term for wikidata's content?
[19:29:35] Like, Q100. What's "Q"?
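(Editor's aside on the two WDQ links posted at 18:05 and 18:11: the second link is the percent-encoded form of a query with parentheses around the tree clause and spaces around AND. A minimal sketch of building that URL follows. The helper name `build_wdq_url` is made up for illustration, and the guess that missing encoding is what broke the first link is an assumption, not something the log confirms. The sketch only builds the string and does not call the service, which may not respond.)

```python
from urllib.parse import quote

# Endpoint exactly as posted in the log; availability is not guaranteed.
WDQ_ENDPOINT = "http://wdq.wmflabs.org/api"


def build_wdq_url(query: str, props: str = "*") -> str:
    """Percent-encode a WDQ query string and attach it to the endpoint.

    Hypothetical helper: brackets, parentheses and spaces are encoded,
    while ':' is left as-is to match the working link from the log.
    """
    encoded = quote(query, safe=":")
    return f"{WDQ_ENDPOINT}?q={encoded}&props={props}"


# Instances of human (P31 = Q5, following subclass-of P279) with an enwiki page.
HUMANS_ON_ENWIKI = "claim[31:(tree[5][][279])] AND link[enwiki]"

url = build_wdq_url(HUMANS_ON_ENWIKI)
print(url)
```

(The printed URL should be the same string as the 18:11 link, so the readable query can be edited and re-encoded rather than patched by hand.)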
[19:29:40] item
[19:29:47] ta!
[19:29:48] each thing is an item. items have properties. those are P
[19:29:53] thanks! :)
[19:29:55] mind your P's and Q's
[19:31:52] DarTar, so when are we meeting?
[19:32:08] hey
[19:32:19] I didn’t know if you were around today,
[19:33:08] Leila only has a short window today in the early afternoon but my schedule is insane, I want to move this to tomorrow
[19:33:30] DarTar, okay. And yeah, I'm not on holiday, so...around.
[19:33:51] I'm sort of not sure what I'm working on at the moment so I'm doing PV refinement stuff in the meantime
[19:41:11] Ironholds: I am syncing up with kevinator and grace on tasks that connect RD and Dev, I added two tasks on Trello that I think you can help with (priority is lower than prepping for the UC discussion and following up with Nuria on session data)
[19:41:19] see bottom of Staged swimlane
[19:41:51] DarTar, well, I don't really have any prep for UCs, that I know of?
[19:41:52] kevinator: you can crosslink them
[19:41:54] https://trello.com/c/iYWhCrlK/636-mid-commons-page-views-in-webstatscollector-drop-precipitously-in-2015
[19:41:57] and I'm working with Kevin/Nuria already
[19:42:03] https://trello.com/c/Xvp93j8p/637-high-data-qa-of-legacy-vs-new-pv-via-udf
[19:42:12] DarTar, Ironholds: We will not get to session data right away i do not think
[19:42:39] nuria, Ironholds: I just added kevinator to that thread, he was not copied
[19:42:55] DarTar: as pageview work in oozie (on which i'm working with ananth) has higher priority
[19:43:43] DarTar: but once we get there (after apps monthly report) I will make sure to sync up with Ironholds just like we did for daily apps uniques
[19:43:57] nuria: awesome, thanks
[19:45:12] Ironholds: if you have time later I can tell you more about the data QA card
[19:45:22] prep for UCs, grace is sending out a reminder
[19:46:34] I don't understand what "prep for UCs" means.
[19:46:38] as said above ;p
[19:46:45] preparation for universities of california
[19:47:33] Ironholds: the UC discussion we have tomorrow morning
[19:47:55] * DarTar waves at harej
[19:48:04] DarTar, I know what the /discussion is/
[19:48:05] hi dario
[19:48:11] I do not know what /preparing for it/ looks like
[19:48:14] what am I meant to prepare?
[19:48:24] show up with an opinion? Read 300 pages of explanatory docs on k-anonymity?
[19:49:00] Ironholds: grace sent out a reminder
[19:49:41] yeah, I'm reading it
[19:49:45] DarTar: do we have somewhere that document for k-anonymity that los alamos people sent?
[19:49:52] DarTar: for my life i cannot find it
[19:49:57] " the choice of the mechanism should be reached by a product owner after consulting stakeholders in the use-cases for uniques
[19:49:57] and privacy advocates"
[19:50:06] nuria: yes, hang on
[19:50:07] what's the difference between this and every thread we've ever had ever?
[19:50:59] nuria: https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
[19:51:39] DarTar: ok, thank you. Not so relevant for the uniques discussion but I wanted to understand their proposal better
[20:43:47] Ironholds: hello!
[20:46:05] halfak: as a gentle reminder: https://meta.wikimedia.org/wiki/Research:WikiProjects_and_Subject_Area_Activity_(English_Wikipedia)
[20:46:42] Harej, did you want me to look at the methods section?
[20:46:50] I think that was what it was
[20:46:58] What's a longitudinal factor?
[20:46:59] I am also interested in information about your quality heuristics!
[20:47:24] longitudinal factor == https://en.wikipedia.org/wiki/Censoring_(statistics)
[20:47:52] the longitudinal factors that affect wikiprojects mostly have to do with how some wikiprojects were active years ago even if they are not active now; differing levels of activity throughout a project's life.
To keep everything even from a time scale perspective I am just doing things from July 1 to December 31
[20:48:39] I'm not sure this will help. Many WikiProjects will be in different lifecycle stages between July 1 and Dec. 31
[20:49:02] Might we try to control for the project init date?
[20:49:15] Or maybe from the first time the project reached a certain level of activity?
[20:49:16] Right; we're only interested in the projects that were *recently* active. We're not interested in a broad understanding of project activity.
[20:49:22] kk
[20:49:35] For this study I am particularly focused on the present.
[20:49:52] Or the very recent past, as it were.
[20:50:09] You state a hypothesis more broadly than these temporal bounds "hypothesis that the average number of edits per WikiProject talk page thread (as computed from talk page edit summaries) positively correlates with the number of subject-area article edits"
[20:51:02] Is there a way to state that in a way that is constrained to the temporal bounds I'm applying to the study?
[20:51:37] Not sure. Might need to operationalize "recent".
[20:51:42] ottomata, yo!
[20:51:42] What would such a correlation mean?
[20:51:51] Or rather, why would you hypothesize it?
[20:52:50] See an example of hypothesis rationales here: https://meta.wikimedia.org/wiki/Research:Asking_anonymous_editors_to_register/Study_1
[20:53:13] Such a correlation might mean that WikiProjects play an active role in the development of articles by being a part of the article development workflow. Note that I don't actually consider this to be the case in most scenarios but it is a hypothesis I wish to test
[20:53:44] In any case, I have to go to another meeting. Can you email your notes and questions to jamesmhare@gmail.com?
[20:53:56] Sure. So, hypothesis is that active wikiprojects will correlate with active subject areas.
[20:54:10] I would propose month-to-month measures of edit activity and WP activity.
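(Editor's aside: one way to read the month-to-month proposal is to compute, for each month, a correlation across WikiProjects between talk-page activity and subject-area article edits, then see whether the sign holds from month to month. The sketch below assumes Pearson correlation and a dict-of-dicts input shape; neither is specified in the discussion, and the function names and all numbers are made up for illustration.)

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def monthly_correlations(talk_activity, article_edits):
    """Correlate the two measures across projects, one month at a time.

    Both arguments map month label -> {project name -> count}. Only months
    and projects present in both inputs are used.
    """
    result = {}
    for month in sorted(talk_activity.keys() & article_edits.keys()):
        projects = sorted(talk_activity[month].keys() & article_edits[month].keys())
        xs = [talk_activity[month][p] for p in projects]
        ys = [article_edits[month][p] for p in projects]
        result[month] = pearson(xs, ys)
    return result


# Made-up counts for three hypothetical projects over two months.
talk = {
    "month-1": {"A": 1, "B": 2, "C": 3},
    "month-2": {"A": 3, "B": 2, "C": 1},
}
edits = {
    "month-1": {"A": 10, "B": 20, "C": 30},
    "month-2": {"A": 10, "B": 20, "C": 30},
}
by_month = monthly_correlations(talk, edits)
print(by_month)
```

(A correlation that stays positive across every month in the window would be the "continued correlation over time"; one that flips sign, as in the toy data, would not.)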
[20:54:45] You could still use the last N months (e.g. July - Dec)
[20:54:54] But you would look for a continued correlation over time.
[20:54:57] Sure.
[20:55:00] Anyways, talk to you later!
[20:55:08] Oh yea. See ya!
[21:06:01] Ironholds: yo! 1. FYI, we deployed a new refinery version! your changes are available as refinery 0.0.4
[21:06:30] 2. what's the status on this? https://gerrit.wikimedia.org/r/#/c/185377/
[21:06:35] you are going to rewrite it, or something, right?
[21:09:19] ottomata, I saw! and yep, will do :). Just trying to unbork my head.
[21:09:23] #2 is on today's to-do
[21:09:43] cool danke
[21:09:59] Ironholds: i'll be out for the next week, so I guess poke nuria to get reviews faster
[21:11:13] Ironholds: at your service, sir
[21:15:17] nuria, yay! Thankya :)
[21:15:29] for what it's worth I will hopefully go on holiday soon too, but I can at least get an initial patch in :D
[22:33:17] Ironholds: YARRGHGHVH
[22:33:20] we have talked about this before, I know.
[22:33:21] but
[22:33:23] ori is asking me
[22:33:25] :
[22:33:29] namespace? or namespace_id
[22:33:32] in x-analytics header?
[22:34:32] ottomata, ID, ideally
[22:34:35] namespace name is too variable
[22:34:36] ID!
[22:34:38] plz
[22:34:41] languages!
[22:34:44] plus, sometimes it's not actually in the URL because they've used an alias
[22:34:51] and namespace alias retrieval is SO MUCH NOT FUN
[22:34:53] so, ID :D
[22:35:02] +9001
[22:35:14] haha
[22:35:15] ok
[22:35:21] i thought we talked about this in a big thread a while ago
[22:35:25] and decided to keep things consistent
[22:35:29] consistent?
[22:35:35] halfak: e.g. you and I talked about it for revision schema
[22:35:40] Oh! Naming-wise?
[22:35:41] consistent with mw dbs
[22:35:42] yes
[22:36:05] i mean, i much prefer namespace_id
[22:36:06] In the RevisionDocument schema, we ended up sticking with page.namespace to match the DB's page_namespace
[22:36:11] yes
[22:37:47] Hmm...
Consistency doesn't work if we don't keep it up, but then again, I don't feel very strongly here.
[23:04:30] hey halfak, sure: anything we can do to solve the conflict, although I don’t know what the new time is
[23:05:09] 4:30 PM on my cal.
[23:05:17] It looks like the event was moved successfully.
[23:05:34] I was able to snag Isidore
[23:08:13] ewulczyn, thanks for the model tips. I'm just trying now to test the model on an unbalanced set.
[23:14:10] Hello. I am back from Oakland.
[23:20:50] ewulczyn, sure enough, I've still got my AUC with an unbalanced test set. This hack just might work.
[23:25:17] halfak: got it, but I see you only moved this week, was that intentional?
[23:25:37] btw I’m done with other stuff if you have a moment to review the PMID dataset
[23:26:02] DarTar, only moved this week because of the temporary strategy meeting conflict.
[23:26:08] now is a good time for PMIDs
[23:26:11] ok cool
[23:26:18] I think all I need is access to the figshare thingie
[23:26:45] yeah, I realized that adding a collaborator doesn’t automatically make the entry collaboratively editable, booo
[23:26:56] Silly closed-source minded world
[23:26:59] let me try and see if there’s a way to fix this
[23:29:09] halfak: Mark Hahnel is online on skype, I pinged him in case he’s at his screen
[23:29:48] I know we can figure this out.
[23:30:01] Oliver and I had a similar setup on figshare for uploading the session datasets.
[23:30:13] BTW, we should discuss making a blog post about that again. :)
[23:31:32] halfak: I see figshare has “projects” as collaborative objects
[23:31:43] not sure if that’s what you guys used
[23:32:44] DarTar, it was this thing: http://figshare.com/articles/Activity_Sessions_datasets/1291033
[23:32:48] And all its sub-things.
[23:33:45] hmm interesting, that looks exactly like a regular dataset release
[23:33:54] ah, Mark is responding! :)
[23:35:55] Goddamn.
I wrote a description for each of the datasets and now I can't even figure out how to read them.
[23:41:25] DarTar, any luck
[23:41:26] ?
[23:43:10] yes, just finished talking to him
[23:43:27] so, tl;dr is we can use “projects”
[23:43:34] like the one I just created and invited you to
[23:43:59] they work as private workspaces that (I understand) let people collaborate on any resource attached to them
[23:44:08] before publication
[23:44:21] after publication it’s still possible to add new resources or modify the metadata
[23:44:40] although the project itself cannot be made public, it really is a private collaboration space for now
[23:45:30] mark also says that adding more resources to a dataset will retain the same DOI and add timestamps, proper versioning will happen later this year
[23:45:53] so that should do it for now, I invited you to a project called “test” and moved the fileset inside it
[23:46:46] halfak: can you see it?
[23:47:28] * halfak navigates
[23:47:57] Well... I don't see an invite.
[23:48:12] hmm
[23:48:21] Did you invite me: http://figshare.com/authors/halfak/96516
[23:48:30] Or some other halfak?
[23:48:39] that halfak, let me try again
[23:48:55] you’re marked as invited
[23:49:05] and I have an option to send the invite again
[23:49:11] Oh! I got it. Came to a different email than I expected.
[23:49:12] maybe there’s some lag
[23:49:15] k cool
[23:49:32] so if you go to Projects > test
[23:49:39] you should be able to see the fileset
[23:50:03] and clicking on the cogwheel icon on the right should give you full access to the metadata and files
[23:50:33] Got it
[23:50:45] lmk if it doesn’t work
[23:51:22] brb
[23:52:19] It doesn't seem like I can edit anything.
[23:52:31] I can "download" and "preview"
[23:53:17] We have the wrong license too. Should be CC0
[23:53:52] halfak: quick hangout?
[23:54:16] kk
[23:54:20] call when ready
[23:54:28] Or wait. Batcave!