Oral Performance Scoring Using Generalizability Theory and Many-Facet Rasch Measurement: A Comparison Study

Open Access
Alkahtani, Saif Fahad
Graduate Program:
Educational Psychology
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
June 21, 2012
Committee Members:
  • Jonna Marie Kulikowich, Dissertation Advisor
  • Jonna Marie Kulikowich, Committee Chair
  • Hoi Kin Suen, Committee Member
  • Pui Wa Lei, Committee Member
  • Edgar Paul Yoder, Committee Member
Keywords:
  • Measurement
  • MFRM
  • Generalizability Theory
  • IRT
  • Analytic scoring
  • Holistic scoring
  • Oral performance
Abstract:
The principal aim of this study was to better guide Quranic recitation appraisal practice by applying Generalizability (G) theory and the Many-Facet Rasch Measurement (MFRM) model to assess the dependability and fit of two proposed rubrics. Recitations of 93 students were rated holistically and analytically by three independent raters on their implementation of Quranic rules and reading proficiency. Although the correlation between holistic and analytic raw scores was high, suggesting that, on average, the rank ordering of students was consistent across the two scoring rubrics, a paired-sample t-test revealed a statistically significant difference between their means. Furthermore, individual and overall MFRM comparisons of the holistic and analytic rubrics showed that the analytic rubric was associated with better individual and overall fit statistics for all measurement facets. Likewise, G-theory analysis showed that the analytic rubric yielded smaller measurement errors and higher dependability coefficients (i.e., G coefficients and D indices). The introduction of analytic rubrics may have helped raters evaluate students’ recitations more consistently and brought them to a common understanding of the scoring scales. These findings lend support to introducing analytic scoring into Quranic assessment practice, as it guides raters to rate consistently and similarly.
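As a minimal illustration of the two raw-score comparisons described above, the sketch below computes a Pearson correlation (rank-ordering consistency) and a paired-sample t statistic (mean difference) for holistic versus analytic ratings. The data are invented for illustration only; they are not the study's 93-student dataset, and the study's actual analyses (MFRM fit statistics, G coefficients, D indices) are not reproduced here.

```python
import math

# Hypothetical ratings for eight examinees (NOT the dissertation's data):
# analytic scores track holistic scores closely (high correlation) but are
# uniformly a bit higher (significant mean difference), mirroring the
# pattern the abstract reports.
holistic = [3.0, 4.0, 2.5, 3.5, 4.5, 2.0, 3.0, 4.0]
analytic = [3.4, 4.3, 2.9, 3.9, 4.8, 2.5, 3.3, 4.4]

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Paired-sample t statistic (df = n - 1) for matched score lists."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))

r = pearson_r(holistic, analytic)
t = paired_t(holistic, analytic)
print(f"r = {r:.3f}, t = {t:.3f}")
```

With data patterned this way, r is close to 1 while |t| is large, which is exactly the situation the abstract describes: consistent rank ordering across rubrics alongside a statistically significant mean difference.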