1. DESIGN AND IMPLEMENTATION OF A MULTI-STAGE PIPELINE FOR LARGE SCALE EXTRACTING, CLUSTERING AND INGESTION OF ACADEMIC DOCUMENTS FOR CITESEERX
   Access: Restricted (Penn State Only)
   Author: Angadi, Manoj Kumar
   Graduate Program: Computer Science and Engineering (MS)
   Keywords: CiteSeer; Extraction; Clustering; Ingestion; LSH; BM25; NGX; Elasticsearch; Python; Pipeline; EIS; Citationindex; PDF; Document; Server
   Committee Members: Chitaranjan Das, Program Head/Chair; C Lee Giles, Thesis Advisor/Co-Advisor; Bhuvan Urgaonkar, Committee Member
2. Large Scale Author Name Disambiguation in Scholarly Databases
   Access: Open Access
   Author: Menon, Arjun
   Graduate Program: Computer Science
   Keywords: Author Name Disambiguation; Machine Learning; Scholarly Database; Clustering; Distributed System
   File: Arjun_Menon_MS_Thesis.pdf
   Committee Members: C Lee Giles, Thesis Advisor/Co-Advisor; Bhuvan Urgaonkar, Committee Member; Chitaranjan Das, Program Head/Chair