Statistical Aspects Of Computerized Adaptive Testing
Open Access
- Author:
- Sie, Haskell
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 09, 2014
- Committee Members:
- James Landis Rosenberger, Dissertation Advisor/Co-Advisor
- Pui Wa Lei, Committee Member
- Mosuk Chow, Committee Member
- Hoi Kin Suen, Committee Member
- Keywords:
- Item response theory
- multidimensional model
- likelihood function
- Abstract:
- In the past several decades, Computerized Adaptive Testing (CAT) has received much attention in educational and psychological research because of its efficiency in achieving the goal of assessment, whether that goal is to estimate the latent trait of test takers with high precision or to accurately classify them into one of several latent classes. In the latter case, the adaptive nature of CAT is used in educational testing to make inferences about the location of examinees' latent ability relative to one or more pre-specified cut-off points along the ability continuum. When there is only one cut-off point and two proficiency groups, this type of CAT is commonly referred to as Adaptive Mastery Testing (AMT). A well-known approach in AMT is to combine the Sequential Probability Ratio Test (SPRT) stopping rule with item selection that maximizes Fisher information at the mastery threshold. In the first part of this dissertation, a new approach is proposed in which a time limit is defined for the test and examinees' response times are considered in both item selection and test termination. Item selection is performed by maximizing Fisher information per time unit, rather than Fisher information itself. The test is terminated once the SPRT makes a classification decision, the time limit is exceeded, or no remaining item has a high enough probability of being answered before the time limit. In a simulation study, the new procedure showed a substantial reduction in average testing time while slightly improving classification accuracy compared to the original method. In addition, the new procedure reduced the percentage of examinees who exceeded the time limit. Another well-known stopping rule in AMT is to terminate the assessment once the examinee's two-sided ability confidence interval lies entirely above or below the cut score.
The second part of this dissertation proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision would remain the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood estimate (MLE) of ability. In two simulation studies, the new procedures showed a substantial reduction in average test length (ATL) while maintaining classification accuracy similar to that of the original stopping rule based on the ability confidence interval. In the last part of this dissertation, generalization to multidimensional CAT (MCAT) is examined. Research has shown that MCAT improves the precision of both subscores and overall scores compared to its unidimensional counterpart. Several studies have investigated the performance of MCAT in recovering examinees’ multiple abilities under different item selection methods. None of these studies, however, considered an item pool containing a mixture of multiple-choice (MC) and constructed-response (CR) items. With many assessments currently containing such a mixture of item types that measure more than one trait, there is an obvious need to understand how different item selection methods choose different types of items depending on their dimensional loadings (simple-structured versus complex-structured) and location of maximum information. In a simulation study, the performance of five MCAT item selection methods was compared using an item pool consisting of a mixture of MC and CR items for mixed-format assessments. Ability recovery as well as item preferences of each method (simple- versus complex-structured items and location of maximum information) were examined.
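The stopping and selection rules named in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and several details are assumptions made only for illustration: a two-parameter logistic (2PL) response model, a fixed expected response time per item (the hypothetical `time` field), an SPRT indifference region of half-width `delta` around the cut score, and a simplified time check that skips items whose expected time exceeds the time remaining, in place of the probability-based check the abstract describes. All function and field names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(pool, used, cut, time_left):
    """Return the index of the unused item with the most Fisher information
    per expected time unit at the cut score, or None if no item fits.

    Simplification: an item "fits" when its expected time is at most the
    time remaining.
    """
    candidates = [
        i for i, item in enumerate(pool)
        if i not in used and item["time"] <= time_left
    ]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda i: fisher_info(cut, pool[i]["a"], pool[i]["b"]) / pool[i]["time"],
    )


def sprt_decision(responses, items, cut, delta, alpha, beta):
    """Wald's SPRT of theta = cut - delta against theta = cut + delta.

    responses holds 1 (correct) or 0 (incorrect) for each administered item.
    """
    log_lr = 0.0
    for u, item in zip(responses, items):
        p_hi = p_correct(cut + delta, item["a"], item["b"])
        p_lo = p_correct(cut - delta, item["a"], item["b"])
        if u == 1:
            log_lr += math.log(p_hi / p_lo)
        else:
            log_lr += math.log((1.0 - p_hi) / (1.0 - p_lo))
    upper = math.log((1.0 - beta) / alpha)   # classify as master at or above
    lower = math.log(beta / (1.0 - alpha))   # classify as non-master at or below
    if log_lr >= upper:
        return "master"
    if log_lr <= lower:
        return "non-master"
    return "continue"


def ci_decision(theta_hat, se, cut, z=1.96):
    """Stop once the two-sided confidence interval excludes the cut score."""
    if theta_hat - z * se > cut:
        return "master"
    if theta_hat + z * se < cut:
        return "non-master"
    return "continue"
```

In this sketch, an item with high discrimination but a long expected response time can lose to a less informative but faster item, which is the trade-off that selecting by information per time unit is meant to capture. Curtailment and stochastic curtailment are not shown, since they depend on details the abstract does not give.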