An Assessment of Reproducibility of Social and Behavioral Science Papers Using Supervised Learning Models
Open Access
Author:
Nivargi, Rajal
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
June 22, 2021
Committee Members:
Chitaranjan Das, Program Head/Chair; C Lee Giles, Thesis Advisor/Co-Advisor; Sarah Michele Rajtmajer, Committee Member; Jian Wu, Special Signatory; Rui Zhang, Committee Member
Keywords:
Reproducibility; Replication; Social and Behavioral Sciences
Abstract:
In the last decade, there has been growing discussion of a "reproducibility crisis" and a "replication crisis" in the medical, life, and behavioral sciences. This thesis focuses on research claims in the social and behavioral sciences (SBS). We assess how well supervised machine learning models can predict the reproducibility of SBS papers. We use a feature extraction framework to retrieve five categories of features: bibliometric, venue, and author features, obtained from public APIs or open-source machine learning libraries with customized parsers; statistical features, obtained by recognizing patterns in the body text; and semantic features, obtained from public APIs or natural language processing models. These features are analyzed with several feature selection methods, namely pairwise correlations, mutual information, and ANOVA F-values, and we study their importance for predicting a set of human-assessed ground-truth labels for the SBS papers. We identify the top features from these feature selection methods by comparing the performance of 10 supervised machine learning models.
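The three feature selection criteria named in the abstract (correlation, mutual information, and ANOVA F-value) can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation. It assumes binary 0/1 reproducibility labels, numeric feature columns, and equal-width binning (4 bins) before estimating mutual information; the feature names `informative` and `noise` and their values are hypothetical toy data. Libraries such as scikit-learn offer `f_classif` and `mutual_info_classif` for the same purpose, though the latter uses a nearest-neighbour estimator rather than binning.

```python
import math
from collections import Counter

# Sketch of three filter-style feature scores. Assumptions (not from the
# thesis): binary 0/1 labels, numeric features, equal-width binning for MI.

def pearson(x, y):
    """Pearson correlation between a feature column and the labels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)  # constant column -> 0

def anova_f(x, y):
    """One-way ANOVA F-value of feature x grouped by class label y."""
    groups = {}
    for a, label in zip(x, y):
        groups.setdefault(label, []).append(a)
    n, k = len(x), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = sum(x) / n
    means = {lbl: sum(g) / len(g) for lbl, g in groups.items()}
    ss_between = sum(len(g) * (means[lbl] - grand) ** 2 for lbl, g in groups.items())
    ss_within = sum((a - means[lbl]) ** 2 for lbl, g in groups.items() for a in g)
    if ss_within == 0:
        return math.inf  # classes perfectly separated with no within-class spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def discretize(x, bins=4):
    """Map values to equal-width bin indices 0..bins-1."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((a - lo) / width), bins - 1) for a in x]

def mutual_information(x, y, bins=4):
    """Plug-in mutual information estimate (in nats) after binning x."""
    xd = discretize(x, bins)
    n = len(y)
    joint, px, py = Counter(zip(xd, y)), Counter(xd), Counter(y)
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )

def rank_features(features, labels):
    """Score every feature under each criterion and rank them (best first)."""
    scores = {
        name: {
            "abs_corr": abs(pearson(values, labels)),
            "anova_f": anova_f(values, labels),
            "mutual_info": mutual_information(values, labels),
        }
        for name, values in features.items()
    }
    rankings = {
        metric: sorted(scores, key=lambda f: scores[f][metric], reverse=True)
        for metric in ("abs_corr", "anova_f", "mutual_info")
    }
    return scores, rankings

# Hypothetical toy data: 8 papers, label 1 = reproducible (assumed encoding).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "informative": [1, 2, 1, 2, 8, 9, 8, 9],  # separates the two classes
    "noise": [5, 1, 5, 1, 5, 1, 5, 1],        # same distribution in both classes
}
scores, rankings = rank_features(features, labels)
for metric, order in rankings.items():
    print(metric, order)
```

In this toy case the `noise` column scores zero under all three criteria, while `informative` ranks first under each. With real features the three rankings can disagree, which is one reason to compare the selected subsets by downstream model performance.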