EFFECTIVE SOLUTIONS FOR NAME LINKAGE AND THEIR APPLICATIONS
Open Access
Author:
Elmacioglu, Ergin
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
June 03, 2008
Committee Members:
Dongwon Lee, Committee Chair/Co-Chair
Piotr Berman, Committee Member
Wang Chien Lee, Committee Member
Patrick M Reed, Committee Member
Keywords:
record linkage, data cleaning, entity resolution
Abstract:
Names (e.g., the names of persons or movies) are among the most commonly chosen identifiers for entities. However, since names are often ambiguous and not unique, confusion inevitably occurs. In particular, when a variety of names are used for the same real-world entity, detecting all variants and consolidating them into a single canonical entity is a significant problem, known as the record linkage or entity resolution problem. To solve this problem effectively, we first propose a novel approach that advocates the use of the Web as a source of collective knowledge, especially for cases where current approaches fail due to incompleteness of data. Second, we mine semantic knowledge hidden in the entity context and use it with the existing approaches to improve performance further. Finally, we address the mixed type of linkage problem, where contents of different entities are mixed in the same pool. Our goal is to group different contents into different clusters by focusing on extraction of the most relevant input pieces. We also illustrate the use of the proposed techniques in various real-world applications.