[11:29:19] Hi everyone!
[11:29:45] I have a question related to the data I am curating in my research.
[11:30:49] If I collect usernames in the data, should I sign an ethics declaration or include something special in my research?
[11:31:12] I mean both on the research wiki page and in the places where the data will be stored or used.
[14:38:54] Generally, for offline analysis, we don't do anything special for data that is public. That said, you might consider obfuscating usernames in your results, depending on what you're studying.
[14:38:57] ivanhercaz, ^
[14:39:31] Could you link to your research page?
[14:40:31] Thank you, halfak! I am not going to publish a corpus of usernames or anything like that, but I am thinking about identifying the most common users in the scope.
[14:40:51] Of course, the research page is https://meta.wikimedia.org/wiki/Research:History_and_cultural_heritage_of_the_Canary_Islands_in_the_Wikimedia_projects
[14:41:07] I have to improve it a bit.
[14:42:10] Aha! It looks like you plan to run a bot. Do I understand that right?
[14:43:07] It looks like the "bot" isn't really making any edits -- just extracting data.
[14:43:26] Yes, more or less, a semiautomatic tool so I don't have to go article by article.
[14:43:53] No, no, I am not going to edit anything. The tool extracts the data and then I am going to analyze it.
[14:46:03] Any comment or suggestion from you, halfak, or anyone else is very welcome!
[14:47:46] Aha! OK if the bot is primarily extracting data, I don't see a big ethical concern. Most of the time with ethical concerns I ask "who might be harmed". A lot of the time, people want to contact Wikipedians en masse with survey requests or to make changes to the wiki itself -- which has the potential to be disruptive and waste people's time. Here it looks like the only possible disruption is if you end up discovering something enbarrassing about an individual contributor and publishing that associated with their name.
[14:47:58] What kind of insights are you hoping to gain?
[14:48:48] *embarrassing
[14:49:20] The main insight I want to gain is the state of the cultural heritage and history of the Canary Islands in the Wikimedia projects (mainly in Spanish Wikipedia).
[14:49:44] It sounds like you're mostly analyzing the quality and completeness of articles then. Is that right?
[14:50:57] All the data is about pages, but I am just worried about one other question I am also interested in: are the articles worked on by many users, or is a "top" group of users creating/editing them?
[14:51:25] So I am wondering whether I should mention the usernames or just publish the numbers without mentioning users.
[14:51:48] Of course, it isn't something that should harm anyone, but I prefer to ask and confirm the possible policies about it.
[14:52:13] And yes, it is about analysis: first the quantity, then the quality and completeness.
[14:54:33] Oh! I thought I had listed the data I want to gather on the research page, but I see now that I didn't add it.
[14:54:44] I will improve the research page in the next few days.
[14:57:07] Nice. I don't see a big risk. But you might make the "top N" editors list plan explicit in the writeup and post that in a main forum on Spanish Wikipedia for people to look at and comment on. I can't imagine any serious concerns, but it is good practice to share your research proposals with people and you might get some people interested in your work.
[14:57:37] Nice! Very good idea!
[14:57:45] Thank you very much, halfak!
[14:59:09] It sounds like a fun project. I think it's very respectable that you are concerned about this sort of stuff in advance and reaching out. I hope your project goes well. Please let us know when you have results. We run a monthly research showcase and I think your work might be a good candidate for presentation :)
[15:03:25] I am really grateful for your words, halfak! It would be a pleasure for me to present my work in the monthly research showcase. And yes, I will notify everyone when I have the results.
[20:08:33] isaacj, what would you call what a Wikidata Q-id represents? A concept? A thing?
[20:11:10] Also, I was wondering if you have a dataset that contains WikiProject directory mappings across various language Wikipedias.