Balancing Generative and Supervised AI: The Trade-off Between LLMs and PLMs in Formative Assessment

Open Access
- Author:
- Wei, Yuchen
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- February 13, 2025
- Committee Members:
- Rebecca Jane Passonneau, Thesis Advisor/Co-Advisor
Wenpeng Yin, Committee Member
Chitaranjan Das, Program Head/Chair
Rui Zhang, Committee Member
- Keywords:
- Formative Assessment
Automated Answer Assessment
Large Language Model
Pretrained Language Model
- Abstract:
- Automated formative assessment aims to provide scalable, efficient, and objective evaluation of student responses. This thesis investigates the trade-offs between generative approaches that leverage large language models (LLMs) and supervised methods based on pre-trained language models (PLMs) for assessing answers to reasoning questions in STEM fields. Evaluating on multiple STEM assessment datasets, including SemEval-2013 Task 7, ISTUDIO, ASAP, and CLASSIFIES, this work compares the performance of fine-tuned PLMs trained on human-annotated data against LLMs applied through in-context learning with both examples and concept-based rubrics. The study also introduces data synthesis methods that generate training data for answer assessment tasks: LLMs are used to generate and re-annotate synthetic training samples, with diversity-enhancing strategies that include case generation and randomly selected target word counts. The synthetic data enable a lightweight PLM to distill the capabilities of more complex LLMs with little or no human annotation effort. Experimental results demonstrate that while supervised PLMs achieve high accuracy given large amounts of human-labeled data, LLM-based approaches provide a competitive, cost-effective alternative, particularly when augmented with rubric-guided assessment. In addition, explanatory feedback generated by LLMs improves interpretability, providing students with clear, rubric-aligned rationales for their scores.
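
To make the two prompting steps in the abstract concrete, the sketch below shows how rubric-guided in-context assessment and diversity-driven answer synthesis could be prompted. It is a minimal illustration under stated assumptions, not the thesis's actual implementation: the rubric items, label set, prompt wording, word-count range, and the `call_llm` stub are all hypothetical placeholders for exposition.

```python
import random

# Hypothetical concept-based rubric for one question; the thesis's actual
# rubrics are not reproduced here.
RUBRIC = [
    "Identifies the relevant statistical concept",
    "Applies the concept correctly to the given scenario",
    "States a conclusion consistent with the reasoning",
]

LABELS = ["correct", "partially_correct", "incorrect"]  # assumed label set


def build_assessment_prompt(question, reference_answer, student_answer, rubric):
    """Rubric-guided in-context assessment prompt (zero-shot variant)."""
    rubric_text = "\n".join(f"- {item}" for item in rubric)
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Rubric:\n{rubric_text}\n"
        f"Student answer: {student_answer}\n"
        "For each rubric item, state whether it is satisfied, then give an "
        f"overall label from {LABELS} and a short rubric-aligned rationale."
    )


def build_synthesis_prompt(question, rubric, target_label):
    """Prompt for generating one synthetic student answer.

    Diversity strategies from the abstract: a concrete case is generated per
    sample, and a target word count is drawn at random so synthetic answers
    vary in framing and length.
    """
    target_words = random.randint(20, 80)  # assumed range, not from the thesis
    rubric_text = "\n".join(f"- {item}" for item in rubric)
    return (
        f"Question: {question}\n"
        f"Rubric:\n{rubric_text}\n"
        f"Write a plausible student answer that would be labeled "
        f"'{target_label}', in roughly {target_words} words. "
        "Invent a concrete case or scenario for the answer to reason about."
    )


def call_llm(prompt):
    """Stub for whatever LLM client is in use; replace with a real call."""
    raise NotImplementedError


if __name__ == "__main__":
    q = "Explain whether a larger sample reduces sampling variability."
    prompt = build_synthesis_prompt(q, RUBRIC, random.choice(LABELS))
    print(prompt)  # inspect the prompt; pass it to call_llm() in a real pipeline
```

In a full pipeline of this shape, synthesized answers would be re-annotated by the LLM via `build_assessment_prompt` and the resulting (answer, label) pairs used to fine-tune the lightweight PLM.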