Acknowledgments in Scientific Documents: Extraction, Storage, Search, and Social Network

Open Access
Author:
Khabsa, Madian
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
April 16, 2012
Committee Members:
  • C Lee Giles, Thesis Advisor
  • Wang Chien Lee, Thesis Advisor
  • Raj Acharya, Thesis Advisor
Keywords:
  • Acknowledgments
  • Information Extraction
  • Entity Resolution
  • Search Engines
  • Digital Libraries
Abstract:
Acknowledgments are widely used in scientific articles to express gratitude and credit collaborators. Despite suggestions that indexing acknowledgments would yield interesting insights, to the best of our knowledge no system currently tracks and indexes them. In this thesis we introduce AckSeer, a search engine and repository that automatically extracts and stores acknowledgments from digital libraries. AckSeer is a fully automated system that scans items in digital libraries, including conference papers, journals, and books, extracts their acknowledgment sections, and identifies the entities mentioned within. We describe the architecture of AckSeer and discuss the extraction algorithms, which achieve an F1 measure above 83%. We use multiple Named Entity Recognition (NER) tools and propose a method for merging the output of the different recognizers. The resulting entities are stored in a database and then added to the AckSeer index along with the metadata of the containing paper or book, which makes the entities searchable. We build AckSeer on top of the documents in the CiteSeerX digital library, yielding more than 500,000 acknowledgments and more than 4 million mentioned entities. After building the acknowledgment repository, we construct an acknowledgment graph and study the relationships between the entities therein. The social networks of authors and publications have been well studied in the literature, with nearly all of their network properties examined. However, to the best of our knowledge, the social graph of acknowledgments has never been investigated.
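The abstract does not say how the outputs of the different recognizers are merged, so the following is only a minimal sketch of one plausible approach: a simple majority vote over normalized entities, followed by the standard F1 computation. The function names, the `min_votes` threshold, the entity labels, and the sample data are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def merge_entities(outputs, min_votes=2):
    """Keep each (text, label) pair reported by at least min_votes recognizers.

    Assumption: majority voting stands in for the merging method, which the
    abstract mentions but does not describe.
    """
    votes = Counter()
    for entities in outputs:
        # Count each recognizer once per entity, ignoring case and outer spaces.
        votes.update({(text.strip().lower(), label) for text, label in entities})
    return {entity for entity, count in votes.items() if count >= min_votes}


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of entities."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three recognizers for one acknowledgment section.
recognizer_outputs = [
    [("National Science Foundation", "ORG"), ("John Smith", "PER")],
    [("national science foundation", "ORG"), ("Smith", "PER")],
    [("National Science Foundation", "ORG"), ("John Smith", "PER")],
]

# Hypothetical hand-labeled entities for the same section.
gold = {
    ("national science foundation", "ORG"),
    ("john smith", "PER"),
    ("example university", "ORG"),
}

merged = merge_entities(recognizer_outputs)
score = f1_score(merged, gold)
```

In this toy example the vote discards the partial match "Smith", which only one recognizer reported, and the merged set misses one gold entity, so precision is perfect while recall is not. A real system would likely also need to reconcile overlapping spans and conflicting labels, which this sketch ignores.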