Scholar Name Disambiguation via Collective Clustering
Open Access
- Author:
- Luo, Dongsheng
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- October 13, 2020
- Committee Members:
- Xiang Zhang, Thesis Advisor/Co-Advisor
- Suhang Wang, Committee Member
- Dongwon Lee, Committee Member
- Mary Beth Rosson, Program Head/Chair
- Keywords:
- Name Disambiguation
- Collective Clustering
- Information Network
- Abstract:
- Scholar name disambiguation remains a hard, unsolved problem that complicates bibliography data analytics. Most existing methods disambiguate names separately, tackling one name at a time, and neglect the fact that disambiguating one name affects the others. Further, bibliography data typically offers only limited information; for example, only basic paper and citation information is available in DBLP. In this thesis, we propose a collective approach to name disambiguation that takes the connections among different ambiguous names into consideration. We reformulate bibliography data as a heterogeneous multipartite network that initially treats each author reference as a unique author entity, so that the disambiguation results for one name propagate to the other names in the network. To further address the sparsity caused by the limited available information, we also introduce word-word and venue-venue similarities, and we finally measure author similarity by combining similarities from four perspectives. Using real-life data, we experimentally demonstrate that our approach is both effective and efficient.