Understanding and Predicting Retractions of Published Works
Open Access
Author:
Modukuri, Sai Ajay
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
June 24, 2021
Committee Members:
Chitaranjan Das, Program Head/Chair
C. Lee Giles, Thesis Advisor/Co-Advisor
Sarah Rajtmajer, Thesis Advisor/Co-Advisor
Jian Wu, Special Signatory
Rui Zhang, Committee Member
Keywords:
Machine Learning
Reproducibility
Psychology
Abstract:
Recent increases in the number of retractions of published papers reflect heightened attention to, and increased scrutiny of, the scientific process, motivated in part by the replication crisis. These trends motivate computational tools for understanding and assessing the scholarly record. Here, we sketch the landscape of retracted papers in the Retraction Watch database, a collection of 19k records of published scholarly articles that have been retracted for various reasons (e.g., plagiarism, data error). Using metadata as well as features derived from the full text of a subset of retracted papers in the social and behavioral sciences, we develop a random forest classifier that predicts retraction in new samples with 73% accuracy and an F1-score of 71%. We believe this study to be the first of its kind to demonstrate the utility of machine learning as a tool for the assessment of retracted work.
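
To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and F1-score are computed for a binary retracted/not-retracted prediction task. The function name `accuracy_and_f1` and the toy labels are hypothetical and chosen for illustration only; they are not the thesis's code, data, or results.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class of interest (here assumed to mean "retracted").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Accuracy: fraction of all samples labeled correctly.
    accuracy = correct / len(pairs)

    # F1: harmonic mean of precision and recall, written in terms of counts.
    # Defined as 0.0 when there are no positive labels or predictions at all.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Toy labels (hypothetical): 1 = retracted, 0 = not retracted.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives and focuses on the retracted class, which is why the two numbers can differ when the classes are imbalanced or the errors are asymmetric.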