1. Overcoming the bottleneck of extracting and indexing hundreds of millions of academic papers to support a scholarly big data service: a case study of CiteSeerX
   Access: Open Access
   Author: Keesara, Sai Raghav
   Graduate Program: Computer Science and Engineering
   Keywords: Information Retrieval; Information Extraction; Digital Libraries; Search Engine; Scalability; Academic Libraries; Elasticsearch
   File: Keesara_Thesis_May2021.pdf
   Committee Members: C Lee Giles (Thesis Advisor/Co-Advisor); Bhuvan Urgaonkar (Committee Member); Jian Wu (Special Signatory); Chitaranjan Das (Program Head/Chair)
2. DESIGN AND IMPLEMENTATION OF A MULTI-STAGE PIPELINE FOR LARGE SCALE EXTRACTING, CLUSTERING AND INGESTION OF ACADEMIC DOCUMENTS FOR CITESEERX
   Access: Restricted (Penn State Only)
   Author: Angadi, Manoj Kumar
   Graduate Program: Computer Science and Engineering
   Keywords: CiteSeer; Extraction; Clustering; Ingestion; LSH; BM25; NGX; Elasticsearch; Python; Pipeline; EIS; Citation index; PDF; Document Server
   Committee Members: Chitaranjan Das (Program Head/Chair); C Lee Giles (Thesis Advisor/Co-Advisor); Bhuvan Urgaonkar (Committee Member)