1. DESIGN AND IMPLEMENTATION OF A MULTI-STAGE PIPELINE FOR LARGE SCALE EXTRACTING, CLUSTERING AND INGESTION OF ACADEMIC DOCUMENTS FOR CITESEERX
Restricted (Penn State Only)
Author: Angadi, Manoj Kumar
Graduate Program: Computer Science and Engineering (MS)
Keywords: CiteSeer; Extraction; Clustering; Ingestion; LSH; BM25; NGX; Elasticsearch; Python; Pipeline; EIS; Citationindex; PDF; Document; Server
Committee Members: Chitaranjan Das, Program Head/Chair; C Lee Giles, Thesis Advisor/Co-Advisor; Bhuvan Urgaonkar, Committee Member