Previous studies in CAT have indicated that item calibration error can bias the estimates of person ability and their standard errors, resulting in classification errors. This research investigated the influence of item calibration error on test length and classification accuracy across various calibration sample sizes and termination criteria. The authors manipulated 4 calibration sample sizes (500, 1000, 2500, and infinite) and 2 termination criteria, conditional standard error (CSE) and ability confidence interval (ACI). They also studied 2 model conditions (2PML and 3PML) to understand the interactions among these factors. The findings showed that test length was sensitive to the magnitude of item calibration error under the CSE condition, but not under the ACI condition. A small calibration sample reduced classification accuracy when the CSE termination rule was used, but this pattern was not found when the ACI termination rule was used. In addition, the ACI rule was sensitive to the cut location, whereas the CSE rule was sensitive only to more extreme cut values.
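For readers less familiar with the two termination criteria, the sketch below illustrates how they are commonly defined; it is not taken from the manuscript. It assumes a logistic 2PL/3PL model without the scaling constant D, a CSE threshold of 0.3, and a 95% interval for the ACI rule. All function names, parameter values, and thresholds are hypothetical.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """3PL response probability; c = 0 reduces it to the 2PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c=0.0):
    """Fisher information of one item at theta under the 3PL."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def standard_error(theta_hat, items):
    """Standard error of theta_hat from the summed information of administered items."""
    info = sum(item_information(theta_hat, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(info)

def stop_cse(theta_hat, items, se_threshold=0.3):
    """CSE rule (assumed threshold): stop once the standard error is small enough."""
    return standard_error(theta_hat, items) <= se_threshold

def stop_aci(theta_hat, items, cut, z=1.96):
    """ACI rule (assumed 95% interval): stop once the interval excludes the cut score."""
    se = standard_error(theta_hat, items)
    return not (theta_hat - z * se <= cut <= theta_hat + z * se)

# Hypothetical administered items as (a, b, c) tuples: identical 2PL items.
items_20 = [(1.0, 0.0, 0.0)] * 20
items_50 = [(1.0, 0.0, 0.0)] * 50

cse_after_20 = stop_cse(0.0, items_20)           # precision not yet reached
cse_after_50 = stop_cse(0.0, items_50)           # precision reached
aci_far_cut = stop_aci(1.0, items_20, cut=-0.5)  # cut lies outside the interval
aci_near_cut = stop_aci(1.0, items_20, cut=0.5)  # cut lies inside the interval
```

Under these assumptions the CSE rule depends only on the information accumulated from the (possibly miscalibrated) item parameters, while the ACI rule also depends on how far the cut score lies from the current ability estimate.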
This study is valuable. First, it enhances our knowledge of the influence of item calibration error and its relation to test length. The present study followed the methods of van der Linden and Glas (2000) but shifted the focus from fixed-length CAT to variable-length CAT. Combined with previous studies of fixed-length CAT, the present study provides evidence on the possibility of administering fewer items to reduce participants’ burden. It offers insights for the application of variable-length CAT in clinical research and practice, which typically aim to use fewer items to obtain the most information about patients’ physical and mental health. Second, this study provides evidence on the interactions among termination criteria, calibration sample sizes, and model conditions by investigating all 16 possible conditions. Also, the results are presented in a logical order.
As the authors mention in this study, exposure control and content balancing work effectively to reduce item calibration error. Therefore, future studies could replicate this study with exposure control and content balancing added and examine the differences. The influence of termination rules and calibration sample sizes on item calibration error or test length might then be reduced or disappear.