[07:55:31] hey schana. my flight will arrive at 3:30
[07:55:43] you may have left by then, but if you haven't, I'll find you. :D
[14:52:29] _o/
[15:13:15] o/ :)
[15:13:55] hey halfak
[15:14:04] Good morning guillom :)
[15:14:10] Are you traveling to wikimania?
[15:14:14] *Italy
[15:15:01] Nope, I was in Europe for a few weeks and staying for Wikimania as well would have meant being away from home for a time that I deemed too long.
[15:15:27] Instead I'm enjoying a very quiet, almost empty office, and a week without meetings!
[15:17:15] +1 to no meetings! Regretfully I still have a few with non-wikimedia folks.
[18:09:03] halfak: I want a relatively scalable way to start from the citation and derive the DOI if it exists. I'm creating a bunch of items on journal articles but I don't have the DOI handy. I could do lookups by hand but that would take forever.
[18:09:58] harej, I have working code from the hackathon. I can get it to you.
[18:10:24] Can it take free-text citations? Is there any way to check against false positives?
[18:12:48] Oh wait. I was misunderstanding. But I still have an answer to your question. One sec.
[18:13:59] harej, http://www.crossref.org/08downloads/search_tools.pdf
[18:14:21] currently I am using http://search.crossref.org/dois?sort=score&q=Passive%20monitoring%20of%20fluctuating%20concentrations%20using%20weak%20sorbents.
[18:14:33] but while I can exercise my human discretion there, that doesn't scale well
[18:14:57] unless I can generally expect that the highest ranked one is the one I want?
[18:19:29] I see. there were some people in berlin who had a system they argued worked better.
[18:19:35] Maybe they'd have some insights.
[18:19:38] * halfak checks notes doc.
[18:20:09] https://etherpad.wikimedia.org/p/WikiCiteRoom121
[18:20:13] See line 96
[18:20:15] "Marin"
[18:20:30] Bilbo.openeditionlab.org
[18:20:42] harej, ^
[18:21:08] bilbo baggins
[18:24:10] I don't know how this works?
[18:24:38] It doesn't seem to do anything?
[18:27:53] harej, I agree. Does nothing useful
[18:27:56] I might have been mistaken
[18:28:14] harej, what's the hit rate if you use crossref's lookup and just choose the first result?
[18:28:34] "hit rate" as in success rate? I will want to do a randomized test for that.
[18:28:46] Yeah.
[18:28:57] Also would be good to know what an acceptable success rate is
[18:30:06] Uh, high. Very high.
[18:30:22] The goal is to derive the DOI so that I can plug it into Source Metadata and fill out Wikidata entries with high quality metadata.
[18:30:38] It's no good to have a bunch of high quality metadata if it's the wrong data.
[18:30:58] And we are dealing with a magnitude in the tens of thousands.
[18:33:27] How doable is 99% accuracy? 99.9%?
[18:35:53] Also, I think it would help if we set a minimum relevancy score. Such that if the highest ranking result is some pitiful score like a 1.3, we just say there's no match.
[18:42:33] Hmm... Not sure I can comment on what is do-able
[18:47:54] I'm doing a really cool thing: extending a government database via Wikidata
[18:50:54] Oh nice
[18:50:59] Finally, the data comes out of Wikidata :)
[18:51:58] The data won't flow from Wikidata to NIOSHTIC so much as Wikidata will replace NIOSHTIC.
[18:52:06] Not that NIOSHTIC is going anywhere.
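
A minimal sketch of the lookup flow discussed above, assuming Python and the search.crossref.org/dois endpoint quoted in the log: query with the free-text citation, sort by score, take the top hit, and treat anything below a minimum relevancy score as no match. The JSON field names ("doi", "score") and the cutoff value are assumptions for illustration, not confirmed details of the API.

    # Sketch: free-text citation -> best-matching DOI, with a minimum-score cutoff
    # as harej suggests at 18:35:53. Endpoint and parameters come from the log;
    # the response field names and the threshold are illustrative assumptions.
    import requests

    SEARCH_URL = "http://search.crossref.org/dois"
    MIN_SCORE = 2.0  # illustrative cutoff; a "pitiful score like a 1.3" is rejected

    def lookup_doi(citation_text):
        """Return the best-matching DOI for a free-text citation, or None."""
        response = requests.get(
            SEARCH_URL,
            params={"sort": "score", "q": citation_text},
            timeout=30,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        top = results[0]  # highest-ranked result when sorted by score
        if top.get("score", 0) < MIN_SCORE:
            return None  # no confident match
        return top.get("doi")

    print(lookup_doi(
        "Passive monitoring of fluctuating concentrations using weak sorbents"
    ))

To estimate the hit rate asked about at 18:28:14, the same function could be run over a random sample of citations and the returned DOIs checked by hand against the originals.
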
[18:53:34] Gotcha
[18:54:55] https://www.wikidata.org/wiki/Q24706982 << here is an example of an entry where I want to work backwards and find the DOI from what little metadata I have
[18:55:12] so that I can get the richer metadata from Crossref by way of Source Metadata
[18:55:18] ( https://tools.wmflabs.org/sourcemd )
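
Once a DOI is in hand, the "richer metadata from Crossref" step mentioned at 18:55:12 amounts to fetching the work record for that DOI. The sketch below queries CrossRef's public REST API directly rather than the Source Metadata tool, whose internals are not shown in this log; the field names used are assumptions based on CrossRef's documented work schema.

    # Sketch: DOI -> richer metadata via CrossRef's public REST API.
    # This stands in for the Source Metadata tool step; field names are
    # assumptions based on CrossRef's work schema.
    import requests

    def fetch_crossref_metadata(doi):
        """Fetch and summarize the CrossRef work record for a DOI."""
        response = requests.get("https://api.crossref.org/works/" + doi, timeout=30)
        response.raise_for_status()
        work = response.json()["message"]
        return {
            "title": (work.get("title") or [None])[0],
            "journal": (work.get("container-title") or [None])[0],
            "issued": work.get("issued", {}).get("date-parts"),
            "authors": [
                " ".join(filter(None, (a.get("given"), a.get("family"))))
                for a in work.get("author", [])
            ],
        }

The output of a step like this is what would then be mapped onto Wikidata statements for items such as Q24706982.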