Automated and Semi-Automated Assessment of STEM Reasoning Questions
Open Access
- Author:
- Li, Zhaohui
- Graduate Program:
- Computer Science and Engineering (PHD)
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 18, 2024
- Committee Members:
- Chitaranjan Das, Program Head/Chair
Dongwon Lee, Outside Unit & Field Member
Huijuan Xu, Major Field Member
Rui Zhang, Major Field Member
Rebecca Passonneau, Chair & Dissertation Advisor
- Keywords:
- Automated Assessment
Machine Learning
Selective Prediction
Human-in-the-Loop
NLP
Education
Relation Networks
Contrastive Learning
- Abstract:
- Language supports all human activity, including education, where reliance on natural language processing can facilitate instructional methods that support better learning. This thesis addresses a challenge faced by STEM educators in balancing the ease of grading selected response (SR) questions, such as multiple choice, against the educational benefits of constructed response (CR) questions, which require students to articulate their own reasoning. Assessment of CR questions is labor-intensive. This thesis develops novel NLP methods to support the use of CR questions in STEM education for formative assessment, where students receive feedback during instruction. Development of automated methods to assess CR questions confronts three equally important challenges. One challenge is data insufficiency: existing datasets, and datasets we create with collaborators in education research, involve a high degree of human effort and quality control, and thus are limited in number, size, and diversity. A second challenge pertains to problem formulation: standard NLP classifiers are inadequate to capture the alternative relational structures of the diverse question formats preferred in educational settings, and do not address informative feedback. A third challenge involves the tradeoff between the reliability of automated methods and the high cost of expert human effort in manual assessment. This thesis addresses the challenges of limited datasets and diverse CR question formats through collaboration with STEM education researchers to create new datasets, and through the use of relational neural networks (RNs) for assessment. Three new datasets were created for research in secondary and post-secondary physics and statistics education. RNs are highly efficient learners, thus suitable for small datasets, and can flexibly model diverse CR formats. Our first model, SFRN, achieved an 8-11% improvement over previous work.
Our second model, AsRRN, replicates the relational structure of CR questions that have multiple questions per scenario. It also investigates the use of contrastive learning to learn distinct representations for the same correctness class corresponding to different kinds of feedback. It outperformed all baselines, including state-of-the-art large language models. To balance the high reliability of expert assessment with the ease of automated assessment, this thesis proposes two new methods for machine learning of selective prediction (SP) policies, which determine when to trust a learned classifier's decision and when to defer to a human expert. To our knowledge, no other work jointly trains a classifier and an SP policy. In summary, this thesis contributes significantly to automated assessment in STEM education. It seeks to enhance student learning, provide flexible assessment methods across diverse educational contexts, and lessen educators' assessment workload, effectively merging NLP advances with the real-world necessities of education and education research.
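The thesis's contribution is a jointly trained classifier and SP policy; as a much simpler illustration of the selective-prediction idea the abstract describes, the sketch below applies a fixed confidence threshold to a classifier's softmax outputs, auto-grading high-confidence responses and deferring the rest to a human grader. The function name, threshold value, and toy probabilities are all illustrative assumptions, not the method from the thesis.

```python
import numpy as np

def selective_predict(probs, threshold=0.8):
    """Threshold-based selective prediction (illustrative, not the thesis's
    jointly trained policy): accept the classifier's argmax label when its
    top softmax probability clears the threshold, otherwise defer."""
    probs = np.asarray(probs)
    labels = probs.argmax(axis=1)          # predicted correctness class
    defer = probs.max(axis=1) < threshold  # low confidence -> human grader
    return labels, defer

# Toy softmax outputs over three hypothetical correctness classes
# (e.g. correct / partially correct / incorrect) for three responses.
probs = [
    [0.95, 0.03, 0.02],  # confident -> auto-grade
    [0.50, 0.30, 0.20],  # uncertain -> defer to expert
    [0.10, 0.85, 0.05],  # confident -> auto-grade
]
labels, defer = selective_predict(probs, threshold=0.8)
```

A learned SP policy replaces the fixed threshold with a trained deferral decision, but the accept-or-defer interface is the same.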