Performance of Popular Item Response Theory Dimensionality Assessment Methods Under Several Nonstandard and Suboptimal Conditions

Open Access
- Author:
- Hochstedt, Kirsten
- Graduate Program:
- Educational Psychology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 03, 2020
- Committee Members:
- Pui-Wa Lei, Dissertation Advisor/Co-Advisor
Pui-Wa Lei, Committee Chair/Co-Chair
Jonna Marie Kulikowich, Committee Member
Sarah Elizabeth Zappe, Committee Member
Mosuk Chow, Outside Member
David Lee, Program Head/Chair
- Keywords:
- item response theory
latent trait nonnormality
dimensionality
unidimensionality
DIMTEST
NOHARM
- Abstract:
- Assessing test dimensionality is a fundamental part of the evaluation of a test and a prerequisite for using item response theory models. The purpose of this study was to investigate how popular dimensionality assessment methods perform under some nonstandard and commonly encountered suboptimal conditions when the items are dichotomously scored. To this end, sample size (250, 500, 1,000, and 2,000 examinees), test length (20, 40, and 60 items), latent trait distribution (normal, symmetric/leptokurtic, uniform/platykurtic, and asymmetric/leptokurtic), guessing specification (correct/no guessing, correct/guessing, and incorrect/guessing), and inter-trait correlation (.3, .5, and .7) were manipulated in a simulation study. The dimensionality assessment methods examined include two nonparametric item pair conditional covariance-based essential dimensionality assessments as implemented in DIMTEST and four goodness-of-fit indices based on a parametric nonlinear factor analytic method as performed in NOHARM. Each method was first evaluated on its incorrect rejection rate of unidimensionality (Type I error rate). The methods that performed well on the Type I error rate were then evaluated on their correct rejection rates of unidimensionality (power). The results indicate that the nonparametric method DIMTEST performed best overall. DIMTEST is recommended particularly when the sample size is large, the test length is medium or long, and there is no guessing behavior, even when the inter-trait correlation is high. Based on the power analysis, the NOHARM-based approximate chi-square statistic (ACHI) might be preferred over DIMTEST when the test length is short, particularly if guessing behavior is present. There was a tendency for DIMTEST and ACHI to maintain high power when the latent trait distribution was uniform (skewness = 0; kurtosis = 1.7).
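The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code: the three-parameter logistic (3PL) response function, the item parameter ranges, the normal latent trait, the full crossing of factors, and all function names (`p_correct`, `simulate_unidimensional`, `rejection_rate`) are assumptions made for illustration. Only the factor levels are taken from the abstract, and DIMTEST and NOHARM themselves are not reproduced here.

```python
import itertools
import math
import random

# Factor levels as listed in the abstract.
SAMPLE_SIZES = (250, 500, 1000, 2000)
TEST_LENGTHS = (20, 40, 60)
TRAIT_DISTRIBUTIONS = (
    "normal",
    "symmetric/leptokurtic",
    "uniform/platykurtic",
    "asymmetric/leptokurtic",
)
GUESSING_SPECS = ("correct/no guessing", "correct/guessing", "incorrect/guessing")
INTER_TRAIT_CORRELATIONS = (0.3, 0.5, 0.7)

# Assumption: a fully crossed grid. The abstract does not say whether every
# combination was run (the correlation only matters for multidimensional data).
DESIGN_GRID = list(
    itertools.product(
        SAMPLE_SIZES,
        TEST_LENGTHS,
        TRAIT_DISTRIBUTIONS,
        GUESSING_SPECS,
        INTER_TRAIT_CORRELATIONS,
    )
)


def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed generating model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_unidimensional(n_examinees, n_items, guessing=0.0, seed=0):
    """Generate a 0/1 response matrix from one normally distributed trait.

    Item discriminations and difficulties are drawn from hypothetical ranges.
    """
    rng = random.Random(seed)
    items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n_items)]
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        row = [
            1 if rng.random() < p_correct(theta, a, b, guessing) else 0
            for a, b in items
        ]
        data.append(row)
    return data


def rejection_rate(p_values, alpha=0.05):
    """Proportion of replications that reject unidimensionality.

    On unidimensional data this estimates the Type I error rate;
    on multidimensional data it estimates power.
    """
    return sum(p < alpha for p in p_values) / len(p_values)
```

In such a setup, each generated data set would be passed to a dimensionality test, and the resulting p-values from the replications within a condition would be summarized with `rejection_rate`.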