[08:38:05] Hi everyone, our monthly office hour (with Analytics) will start on the hour, in about 20 minutes.
[08:39:05] * RhinosF1 online
[09:01:20] hello!
[09:01:23] Hey everyone, welcome to office hours for the next hour.
[09:03:10] Research and Analytics will be around to try to answer and discuss any questions or ideas you might have.
[09:03:19] If you would like to ask a question or discuss a particular topic, simply type in the chat. We will do our best to answer as quickly as possible (I will try to relay questions to specific members of our teams), but everyone else, please feel free to chime in as you see fit.
[09:05:14] Does the office hour start at 11 or 12 Berlin time?
[09:05:41] 11 Berlin time (so now)
[09:05:54] Good morning - thx.
[09:07:14] What is the scope of discussions here?
[09:07:29] Seppl2013: sorry if there was confusion about the starting time. We changed the starting time for this meeting; we are experimenting with alternating time zones to allow more people to join.
[09:10:43] Recently I joined https://www.wikidata.org/wiki/Wikidata:WikiProject_Events and I'd like to ask some questions in this context.
[09:10:43] There is no fixed agenda. We wanted to make ourselves more available to volunteers, researchers, etc., and discuss any questions about our teams and our projects or, more importantly, questions about your projects or ideas that we can support you with during the office hours.
[09:11:33] For example, questions about how to use a specific dataset.
[09:12:06] One question for me is what the notability criteria would be for entering conferences and conference series into Wikidata.
[09:12:09] So the focus is on questions related to the research and analytics parts of projects.
[09:14:58] Seppl2013: notability criteria for Wikidata are probably best discussed with the Wikidata community. They also have dedicated office hours: https://www.wikidata.org/wiki/Wikidata:Events#Office_hours
[09:16:45] Hello!
[09:16:47] Does anybody here speak Portuguese?
[09:17:03] Bixo: a little :)
[09:17:11] :)
[09:17:33] One of the project goals of WikiProject Events is "Interconnect Wikidata items for individual conference presentations with a main Wikidata item for the conference itself". I am currently working on an approach to extract the relevant information from proceedings titles that are already available.
[09:17:54] Regarding Wikipedia, is there any related research being done?
[09:18:16] I would also need to make a GIF of the versions of an article. Is there any tool that does this?
[09:18:50] Seppl2013: how do you extract the information?
[09:20:08] @mgerlach - I wrote a parser that understands proceedings titles like "Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA" and extracts {'enum': 'Thirty-First', 'prefix': 'AAAI', 'event': 'Conference', 'topic': 'Artificial Intelligence', 'month': 'February', 'daterange': '4 - 9', 'year': '2017', 'city': 'San Francisco', 'province': 'California', 'country': 'USA'}, which could then be used to find the corresponding conference.
[09:20:50] Especially finding the conference series would be nice.
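Seppl2013's parser itself is not shown in the log. As a minimal sketch, assuming titles follow exactly the comma-separated layout of the AAAI example above (real proceedings titles vary far more, so a practical parser would need many patterns), one regular expression with named groups can produce the same fields. `TITLE_PATTERN` and `parse_proceedings_title` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern covering only the layout of the AAAI example:
# "Proceedings of the <ordinal> <ACRONYM> <event> on <topic>, <Month> <d-d>, <year>, <city>, <province>, <country>"
TITLE_PATTERN = re.compile(
    r"Proceedings of the (?P<enum>[\w-]+) "
    r"(?P<prefix>[A-Z]+) "
    r"(?P<event>Conference|Workshop|Symposium) on (?P<topic>[^,]+), "
    r"(?P<month>[A-Z][a-z]+) (?P<daterange>\d{1,2}\s*-\s*\d{1,2}), "
    r"(?P<year>\d{4}), "
    r"(?P<city>[^,]+), (?P<province>[^,]+), (?P<country>[^,]+)$"
)


def parse_proceedings_title(title: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if the title does not fit the pattern."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Normalise "4-9" to "4 - 9", the form shown in the chat output.
    fields["daterange"] = re.sub(r"\s*-\s*", " - ", fields["daterange"])
    return fields
```

Titles that do not match return `None`, which gives a natural list of cases to inspect by hand before trying any automatic linking.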
[09:21:29] Unfortunately, there are currently only a few entries available; see https://tools.wmflabs.org/hay/vizquery/#PREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20wikibase%3A%20%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0APREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0APREFIX%20bd%3A%20%3Chttp%3A%2F%2Fwww.bigdata.com%2Frdf%23%3E%0ASELECT%20DISTINCT%20%3Fitem%20%3FitemLabel%20%3FitemDescription%20(SAMPLE(%3Fimage)%20AS%20%3Fimage)%20%3Fsitelink%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ27785883.%0A%20%20OPTIONAL%20%7B%20%3Fitem%20wdt%3AP18%20%3Fimage.%20%7D%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Fsitelink%20schema%3Aabout%20%3Fitem%3B%0A%20%20%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fde.wikipedia.org%2F%3E.%0A%20%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%2Cfr%2Ces%2Cde%2Cru%2Cit%2Cnl%2Cja%2Czh%2Cpl%2Ccs%22.%20%7D%0A%7D%0AGROUP%20BY%20%3Fitem%20%3FitemLabel%20%3FitemDescription%20%3Fsitelink
[09:21:44] Bixo: sorry, I'm not sure I fully understand the question. What research are you looking for?
[09:22:21] Research about Wikipedia, in general...
[09:22:23] And
[09:23:58] Bixo: research on Wikimedia projects is generally documented on Meta: https://meta.wikimedia.org/wiki/Research:Index
[09:24:07] I'm doing this, but on the Portuguese Wikipedia (wiki-pt). I need a tool to capture a frame of every modification of an article on the wiki.
[09:24:26] This is a long list. Happy to point you to more specific projects.
[09:25:24] Bixo: so you want to take a picture (snapshot) of the displayed article?
[09:25:34] yeah
[09:25:42] @Bixo - Wikipedia is based on the MediaWiki software, so you can use the MediaWiki API to extract any information you need; see https://www.mediawiki.org/wiki/API:Main_page/pt
[09:25:47] but to make a timeline
[09:26:25] Seppl2013: the link with the query does not work for me. Are there any specific challenges you run into when trying to extract the data?
[09:30:14] Bixo: do you need the picture, or rather the information contained in the page? And by timeline, do you mean the same article over time?
[09:30:46] @mgerlach - "conference proceedings series" (Q27785883) is the entity, which currently has only some 120 entries. As part of the ConfIDent project (https://www.tib.eu/en/research-development/project-overview/project-summary/confident), I'd like to add more conferences and series based on existing data. One challenge is that proceedings titles are sometimes ambiguous, since syntactic elements like dots are missing, or the character set being used is strange or broken.
[09:31:39] mgerlach: I need the picture: the development of the page for the same article over time.
[09:31:59] mgerlach: .*
[09:32:30] miriam_: do you have an idea about getting snapshots of pictures of articles?
[09:33:56] Seppl2013: Could you please paste the URL-decoded versions of your SPARQL queries here so that I can have a better look at them?
[09:34:40] mgerlach: The page: https://pt.wikipedia.org/wiki/Princ%C3%ADpio_da_incerteza_de_Heisenberg
[09:35:21] Bixo: so it is a single page you are interested in?
[09:35:48] @Bixo https://www.mediawiki.org/wiki/API:Revisions will give you all revisions of an article.
[09:35:49] Yes, but I need a tool to do this.
[09:36:30] @Bixo https://pt.wikipedia.org/w/index.php?title=Princ%C3%ADpio_da_incerteza_de_Heisenberg&action=history has all the revisions.
[09:36:49] Seppl2013: owww, I will try this. Thx
[09:37:38] @GoranSM http://wiki.bitplan.com/index.php/SPARQL#Conference_Series has the query:
    # WF 2020-06-07
    SELECT ?item ?itemLabel WHERE {
      # scientific conference series (Q47258130)
      ?item wdt:P31 wd:Q47258130.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
[09:37:40] Seppl2013: But the history can't present a timeline.
[09:38:42] Seppl2013: if it is only small differences in the title names, you could match them using something like the Levenshtein distance: https://en.wikipedia.org/wiki/Levenshtein_distance
[09:39:11] @Bixo - you need a few lines of code to do that in your favorite programming language. At https://stackoverflow.com/questions/tagged/mediawiki-api you'll find a Q&A forum with details on how to use the API.
[09:39:38] @Bixo if you post your question on Stack Overflow in detail, I'll happily answer there.
[09:40:15] Seppl2013: OK, the query runs and returns results on WDQS. Now, what do you need help with? Matching the extracted data, as you have described, with the query results (the conferences)?
[09:40:32] Seppl2013: Ok, Thx :)
[09:42:22] @GoranSM @mgerlach - in the process, there will be a need for mass insertions and fixes. I intend to do these using a Python API and would love to get pointers to similar approaches. Before I do a mass edit of hundreds or thousands of entries, I'd like to try out the approach with a few dozen entries.
[09:43:14] Bixo: this paper looks at the visual position of links in Wikipedia articles: https://dl.acm.org/doi/abs/10.1145/2872518.2889388; perhaps it contains some description of the methodology.
[09:43:34] Seppl2013: Are you trying to match (a) the paper title with (b) the conference title only, and then infer whether the paper was presented at the respective conference?
[09:46:07] mgerlach: Thx.
[09:46:44] @GoranSM - the title contains metadata like the field, acronym, title and ordinal of a conference. So if I find, e.g., the "Fifth European Conference on WikiData Projects, Berlin 2020", I might search for the First to Fourth European Conference on WikiData Projects. If I find at least three conference entries, I might decide that this is a proper conference series. I am interested in the possible cross-checks, e.g. checking that the conference topic is valid and whether any other metadata is already available that supports the notability decisions to be made.
[09:54:20] @mgerlach - thank you for the pointer to https://en.wikipedia.org/wiki/Levenshtein_distance. Interestingly, the spelling of the content does not seem to be much of a problem. I tried a dictionary approach, and with a dictionary of only some 700 words I could already verify 65% of the content of the 16,000 proceedings "scholarly articles" on which I tried the approach. E.g., for the field, the top results were Health: 591, Clinical: 416, Medical: 380, Cancer: 367, Medicine: 336. For matches that are this common, I'd assume it is safe to link, e.g., the proceedings to the field. But which entity should I link to for Cancer, for example? https://de.wikipedia.org/wiki/Cancer needs to be dereferenced...
[09:58:34] Seppl2013: so the problem is to link the proceedings title to a field? (sorry, still confused)
[09:59:02] Seppl2013: I would say that you are doing something very useful and interesting here, but in order to help you, I would really need to understand the research plan and the approach taken precisely. Do you have any documentation on this project anywhere - Wiki, GitHub...?
[10:00:47] @GoranSM - the documentation will go public in a few days, so I could get back to this during the next office hours. Where would be a good place to post the documentation link, besides this IRC channel and maybe the Telegram chat?
[10:01:39] Seppl2013: the query above gives all items that are scientific conference series. Is the task to match a given proceedings title you extract to one of the existing conference items?
[10:03:35] And yes, please keep us updated during any of the following office hours.
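mgerlach's Levenshtein suggestion (09:38:42) can be sketched in a few lines. This is a minimal Python version of the standard dynamic-programming edit distance, plus a hypothetical `best_match` helper that accepts a candidate title only within an assumed threshold of 3 edits. The threshold and the lowercasing are illustrative choices, not something agreed in the chat.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def best_match(title: str, candidates: Iterable[str], max_distance: int = 3) -> Optional[str]:
    """Hypothetical helper: closest candidate (case-insensitive), or None if
    no candidate is within max_distance edits."""
    scored = [(levenshtein(title.lower(), c.lower()), c) for c in candidates]
    if not scored:
        return None
    distance, candidate = min(scored)
    return candidate if distance <= max_distance else None
```

An absolute threshold suits short, near-identical titles. For long titles, a threshold relative to title length may be more robust.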
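The conference-series query posted at 09:37:38 can also be run from Python instead of the WDQS web UI, which may help with the batch work Seppl2013 describes. The sketch below makes several assumptions. It targets the public endpoint https://query.wikidata.org/sparql and parses the standard SPARQL 1.1 JSON results format. It replaces "[AUTO_LANGUAGE]" with "en", on the assumption that this placeholder is only filled in by the query UI. It also leaves the User-Agent string to the caller, since Wikimedia's User-Agent policy asks for a descriptive one. `run_query` needs network access; the other two helpers do not.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The query from the chat, with the UI-only language placeholder replaced by "en".
CONFERENCE_SERIES_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q47258130.  # scientific conference series
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_query_url(query: str) -> str:
    """GET URL asking WDQS for JSON results."""
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def bindings_to_rows(result: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    variables = result["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in result["results"]["bindings"]
    ]


def run_query(query: str, user_agent: str) -> List[Dict[str, str]]:
    """Run a query against WDQS (network call)."""
    request = urllib.request.Request(
        build_query_url(query),
        headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return bindings_to_rows(json.load(response))
```

Keeping URL building and result parsing separate from the network call makes it easy to test the parsing offline before running larger queries.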
[10:03:40] Seppl2013: I am only informally involved here, trying to help, while @mgerlach should know the best way to get in touch with the WMF Research team about your project once the documentation is ready. As for me, just ping me at goran.milovanovic_ext@wikimedia.de.
[10:03:51] @mgerlach - matching to existing entries will only rarely give a positive result; in that case, a link from the proceedings to the conference would IMHO be useful. If the conference is not there, a new entry might be necessary, depending on the notability of the conference and the availability of data. In the ConfIDent project, we intend to eventually bring together metadata from different sources and add PIDs to each conference. For the time being, we'd simply like to try out our approaches.
[10:04:09] Seppl2013: my question is at https://stackoverflow.com/questions/62552467/a-time-line-from-article-of-the-wikipedia
[10:06:50] Seppl2013: so if you expect few matches, the problem is whether to add a new entry, or where to link it? What are the different options for linking?
[10:07:45] Seppl2013: but since we are out of time, feel free to reach out to me via email at mgerlach@wikimedia.org or give an update during the next office hours.
[10:08:01] thanks everyone for joining the discussion today
[10:08:05] Hello and goodbye, everyone! I joined very late, so I did not talk, but it was interesting to follow. See you next month!
[10:08:11] @GoranSM - I have invited you to our wiki, which is not public yet. I'll send you a personal e-mail with a link to the current state of affairs.
[10:09:47] @Seppl2013 - thx for the help, talk to you next time. @mgerlach - I am also adding you to our wiki.
[10:09:47] Seppl2013: I will do my best to help. The problem sounds really interesting.
[10:10:21] agree with GoranSM
[10:10:23] mgerlach: @everyone: See you around.
[10:10:38] thanks, and see you around
[10:11:03] Thanks, and see you around.
[17:20:12] Thanks, Martin, for encouraging me to join this channel!
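For Bixo's timeline question (09:35:48 onwards, and the Stack Overflow post at 10:04:09), the API:Revisions module linked in the chat can list every revision of the article, oldest first. Each revision can then be opened through its `oldid` URL. Turning those pages into images for a GIF would still need a separate screenshot tool, which this sketch does not cover. This is a minimal Python sketch, assuming the pt.wikipedia.org API endpoint and `formatversion=2` JSON output (where `query.pages` is a list). The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional, Tuple

API_URL = "https://pt.wikipedia.org/w/api.php"


def revision_params(title: str, cont: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Query parameters for one page of prop=revisions results."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": "max",
        "rvdir": "newer",  # oldest revision first, which suits a timeline
        "format": "json",
        "formatversion": "2",
    }
    if cont:
        params.update(cont)  # continuation values returned by the previous response
    return params


def revisions_from_response(data: dict) -> List[Tuple[str, int]]:
    """(timestamp, revid) pairs from a formatversion=2 response."""
    return [
        (rev["timestamp"], rev["revid"])
        for page in data["query"]["pages"]
        for rev in page.get("revisions", [])
    ]


def oldid_url(revid: int, wiki: str = "https://pt.wikipedia.org") -> str:
    """URL that displays the article exactly as it was at this revision."""
    return f"{wiki}/w/index.php?oldid={revid}"


def fetch_all_revisions(title: str, user_agent: str) -> List[Tuple[str, int]]:
    """Follow API continuation until all revisions are collected (network calls)."""
    revisions: List[Tuple[str, int]] = []
    cont: Optional[Dict[str, str]] = None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(revision_params(title, cont))
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=60) as response:
            data = json.load(response)
        revisions.extend(revisions_from_response(data))
        if "continue" not in data:
            return revisions
        cont = data["continue"]
```

For the page in question, the title is "Princípio da incerteza de Heisenberg". Feeding each `oldid_url` to a screenshot tool, in order, would give the frames for a timeline.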
[17:20:21] always interesting to hear what this team is up to
[19:34:20] Hey djellel_! Would you be able to attend the Research Group meeting tomorrow?
[19:34:43] I have a collaborator who is working on detecting undisclosed paid editors, and I'd like you two to share notes WRT sock puppet detection.
[19:35:16] o/ eljohnny! Welcome! As you can see, this channel is bursty. Sometimes we go days without discussion; sometimes the channel is full.
[19:35:40] Questions, thoughts, ideas -- all are welcome.
[19:36:35] yes, I can attend
[19:47:24] halfak: and, sure, we can discuss that tomorrow during the RG meeting
[19:47:53] Awesome! See you there :)