AN API FOR AUTHOR NAME DISAMBIGUATION

Dudhbhate, Gauravi Uday

Open Access

Author:: Dudhbhate, Gauravi Uday
Graduate Program:: Computer Science and Engineering
Degree:: Master of Science
Document Type:: Master Thesis
Date of Defense:: June 26, 2017
Committee Members:: Dr. Lee Giles, Thesis Advisor/Co-Advisor
Keywords:: API
Disambiguation
Author Name Disambiguation
Machine Learning
scholarly big data
Web service
information extraction
Random Forest
Clustering
Abstract:: In digital libraries, an author's name can be ambiguous when one name has multiple variations, when multiple authors share the same name, or when the data was entered incorrectly or extracted incorrectly by automated software. When such ambiguity persists in a digital library, it inconveniences users: in the absence of author name disambiguation techniques, users must manually sort through the search results to find the scholarly documents or articles written by a particular author. A great deal of research on author name disambiguation is underway, and current techniques achieve roughly 90-95 percent accuracy in returning the articles written by a particular author; however, these algorithms are slow to process queries and return results. Furthermore, although such algorithms exist, there are very few, if any, end-to-end services that provide a web-accessible platform where a user can submit a query and obtain the disambiguated articles with a single click. In this thesis, we propose a hierarchical approach that achieves lower query latency while maintaining the same accuracy of 90-95 percent. We further propose an end-to-end service that exposes the algorithm through an API, making it easy to use and improving user satisfaction. We show that our hierarchical method outperforms the most accurate random forest approach. Finally, we provide a comparative analysis of the query latency and the classification accuracy of the two methods.

Tools