DEALING WITH TESTLET-STRUCTURED DATA: EFFECT OF SAMPLE SIZE ON IRT MODEL SELECTION
Open Access
- Author:
- Li, Xinyue
- Graduate Program:
- Educational Psychology
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 19, 2019
- Committee Members:
- Pui-Wa Lei, Thesis Advisor/Co-Advisor
- Keywords:
- item response theory
- unidimensional model
- bifactor model
- model selection
- testlet data
- Abstract:
- When educational assessments are composed of testlets that violate the local independence assumption of unidimensional Item Response Theory (IRT) models, the theoretically true model would be the bifactor IRT model. Bifactor models account for the local dependency within a testlet while focusing on a single primary factor. However, highly parameterized multidimensional IRT models (including the bifactor model) require larger sample sizes to obtain stable and accurate parameter estimates. Researchers therefore face a model selection problem between the simpler unidimensional IRT (UIRT) models and the highly parameterized multidimensional IRT models, especially when sample size is limited. The purpose of this study is to examine whether item and person parameter estimates produced by the simpler unidimensional 3PL model are comparable to those produced by the bifactor 3PL model when test data come from testlets, under different sample size conditions. We fitted the bifactor model to 13 GSRT (Gray Silent Reading Tests) testlets in a pseudo-population (N=3865) to obtain the true parameter values; next, we generated eight research conditions (4 sample sizes x 2 models) and compared the resulting parameter estimates with the true values.
Results show that: (a) item parameter estimation becomes more stable and more accurate as sample size increases for both models, except for the guessing parameter; (b) UIRT models yield more stable item and person parameter estimates than bifactor IRT models over replications, except for the guessing parameter and the intercept parameter at N=250, while bifactor IRT models produce more accurate item and person parameter estimates than UIRT models in most research conditions, except for the person parameter at N=250, where UIRT models produce person parameter estimates that are both more stable and more accurate; (c) bifactor models require more estimation time than unidimensional models and are more likely to encounter convergence problems, especially with large sample sizes.
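
The two models and the stability/accuracy comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code. The abstract does not give the exact parameterization or the evaluation metrics, so the sketch assumes a slope-intercept logistic 3PL, independent standard normal factors, the standard deviation over replications as the stability measure, and RMSE against the true values as the accuracy measure. All function names, the testlet layout, and the parameter values are hypothetical.

```python
import numpy as np


def prob_correct(theta_g, theta_s, a_g, a_s, d, c):
    """Bifactor 3PL in slope-intercept form (assumed parameterization).

    a_g, a_s: slopes on the primary and testlet-specific factors;
    d: intercept; c: guessing parameter.
    Setting a_s = 0 reduces this to the unidimensional 3PL.
    """
    logit = a_g * theta_g + a_s * theta_s + d
    return c + (1.0 - c) / (1.0 + np.exp(-logit))


def simulate_responses(rng, n_persons, testlet_of_item, a_g, a_s, d, c):
    """Draw 0/1 responses for n_persons examinees under the bifactor 3PL."""
    n_testlets = int(testlet_of_item.max()) + 1
    theta_g = rng.standard_normal(n_persons)                # primary factor
    theta_s = rng.standard_normal((n_persons, n_testlets))  # testlet factors
    # Each item loads on the primary factor and on its own testlet factor.
    p = prob_correct(theta_g[:, None], theta_s[:, testlet_of_item],
                     a_g, a_s, d, c)
    return (rng.random(p.shape) < p).astype(int)


def summarize(estimates, truth):
    """Summarize estimates of shape (replications, parameters) against truth."""
    bias = estimates.mean(axis=0) - truth
    rmse = np.sqrt(((estimates - truth) ** 2).mean(axis=0))  # accuracy
    sd = estimates.std(axis=0)                               # stability
    return bias, rmse, sd


# Illustrative values only: 3 hypothetical testlets with 4 items each.
rng = np.random.default_rng(0)
testlet_of_item = np.repeat(np.arange(3), 4)
n_items = testlet_of_item.size
responses = simulate_responses(
    rng, 250, testlet_of_item,
    a_g=np.full(n_items, 1.2), a_s=np.full(n_items, 0.6),
    d=np.zeros(n_items), c=np.full(n_items, 0.2),
)
```

In a full study, each replication's response matrix would be passed to an IRT estimation routine for both models, and the resulting estimates stacked row by row before calling `summarize`. The estimation step is omitted here because the abstract does not name the software used.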