DEVELOPMENT AND EVALUATION OF AUTHENTIC RUBRIC

Open Access
- Author:
- Pun, Wik Hung
- Graduate Program:
- Educational Psychology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- March 01, 2017
- Committee Members:
- Hoi K. Suen, Dissertation Advisor/Co-Advisor
Hoi K. Suen, Committee Chair/Co-Chair
Pui-Wa Lei, Committee Member
Rayne A. Sperling, Committee Member
Dennis K.J. Lin, Outside Member
- Keywords:
- Comparative Judgment
Performance Assessment
Evaluation
Essays Grading
- Abstract:
- Systematic evaluation is a key component of reliable and valid performance assessment. Scoring rubrics are often used to achieve this goal by promoting inter-rater agreement in evaluating performances (Johnson, Penny, & Gordon, 2009). More recently, some scholars (e.g., Goffin, Gellatly, Paunonen, Jackson, & Meyer, 1996; Goffin & Olson, 2011; Pollitt, 2004, 2012) have advocated the use of comparative judgment, a scoring procedure adapted from Thurstone's (1927a) Law of Comparative Judgment, to replace scoring rubrics in performance assessment. It has been argued that the comparative judgment method holds multiple advantages over scoring rubrics. Empirically, some studies have shown that evaluations elicited via comparative judgment methods carried higher criterion-related validity evidence (Goffin et al., 1996; Goffin, Jelley, Powell, & Johnston, 2009; McMahon & Jones, 2014; Olson, Goffin, & Haynes, 2007; Shah, Bradley, Parekh, Wainwright, & Ramchandran, 2013). However, comparative judgment in its original implementation has major drawbacks, including a laborious evaluation process, an inability to discern the absolute quality of a performance, and an inability to communicate evaluation standards to examinees effectively. In this study, a new implementation of the comparative judgment method, termed the authentic rubric, was proposed. The authentic rubric replaces the scoring categories of a conventional rubric with expert-evaluated performances and asks raters to compare each performance to be evaluated against these anchors. Twenty-two raters were recruited to evaluate 100 argumentative essays using either a holistic rubric or the proposed authentic rubric. The authentic rubrics were constructed by selecting essays evaluated by two professional raters. Five hypotheses related to user experience, psychometric properties of the evaluations, and efficiency of the evaluation process were proposed and tested.
Among the five hypotheses, only one was confirmed: raters who used the authentic rubric found the evaluation experience more enjoyable. Nonetheless, examination of the data showed that authentic rubric evaluations had marginally higher reliability and criterion-related validity. Post-hoc analysis revealed that a larger sample size may be needed to reach a statistically significant conclusion. Implications of the study findings and future areas of research are discussed.
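For readers unfamiliar with the scaling model behind comparative judgment, Thurstone's Law of Comparative Judgment (Case V) converts pairwise preference proportions into interval-scale quality estimates. The sketch below uses hypothetical win proportions for four essays; it illustrates the general Case V solution, not the study's actual data or analysis.

```python
# Thurstone's Law of Comparative Judgment, Case V: a minimal sketch.
# All data below are hypothetical and for illustration only.
import numpy as np
from statistics import NormalDist

# P[i, j] = proportion of judges who preferred essay i over essay j
# (diagonal fixed at 0.5; P[j, i] = 1 - P[i, j]).
P = np.array([
    [0.5, 0.7, 0.8, 0.9],
    [0.3, 0.5, 0.6, 0.8],
    [0.2, 0.4, 0.5, 0.7],
    [0.1, 0.2, 0.3, 0.5],
])

# Under Case V, the scale separation s_i - s_j equals the standard-normal
# deviate of the observed win proportion: Phi^{-1}(P[i, j]).
inv_cdf = NormalDist().inv_cdf
Z = np.array([[inv_cdf(p) for p in row] for row in P])

# Case V least-squares solution: each essay's scale value is the row mean
# of Z; the scores are centered at zero by construction.
scale = Z.mean(axis=1)
print(scale)  # essay 0 scores highest, essay 3 lowest
```

Note that the resulting scale values are purely relative (they sum to zero), which illustrates the drawback noted above: classical comparative judgment orders performances but cannot, by itself, convey their absolute quality.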