35 Characterizing sources of uncertainty in item response theory scale scores (Presented by Sherry)

Wayne's comments

by CHEN Chia Wen

This paper proposes a multiple imputation (MI) based approach to estimating the person latent trait that accounts for the uncertainty in the calibrated item parameters. Traditional approaches, such as MAP and EAP, score persons using only the point estimates of the item parameters. This leads to underestimated standard errors of measurement, which in turn can cause premature termination in variable-length CAT. The MI approach not only takes item parameter uncertainty into account but can also be applied directly to the output of available IRT software programs. Furthermore, the paper demonstrates that the approach can be applied to different IRT models (2PL, 3PL, and PCM) and scoring methods (EAP and summed-score EAP). The approach is a kind of plausible value method: it draws M plausible sets of item parameters from the multivariate normal distribution of the item parameter estimates. The posterior distribution of theta is computed under each sampled parameter set, yielding M different theta hats. Finally, the theta hats are combined by averaging, and the variance of the combined theta hat is obtained. The study also proposes an index that measures the rate of increase in variance when the MI approach is used. The results showed that the effect is larger for shorter tests and for smaller calibration sample sizes.
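The MI steps summarized above can be sketched as follows for a 2PL model. This is a minimal illustration, not the paper's implementation. It assumes the calibration output is a flat vector `[a1, b1, a2, b2, ...]` with its error covariance matrix (a hypothetical layout), a N(0, 1) prior, and Rubin's combining rules (the combined score is the mean of the M theta hats; total variance = mean within-draw posterior variance + (1 + 1/M) × between-draw variance). The `pct_increase` value is only an illustrative percent increase over point-estimate scoring and may not match the index defined in the paper.

```python
import math
import random


def eap_2pl(responses, a, b, n_quad=61, lo=-4.0, hi=4.0):
    """EAP estimate and posterior variance of theta under a 2PL model,
    using a N(0, 1) prior on an equally spaced quadrature grid."""
    step = (hi - lo) / (n_quad - 1)
    nodes = [lo + k * step for k in range(n_quad)]
    weights = []
    for t in nodes:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for u, aj, bj in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-aj * (t - bj)))
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(nodes, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, weights)) / total
    return mean, var


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov assumed positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def mi_eap(responses, est, cov, M=50, seed=1):
    """Score one response pattern with M item-parameter draws.

    est: hypothetical flat layout [a1, b1, a2, b2, ...]; cov: its error
    covariance matrix. Draws come from N(est, cov); results are combined
    with Rubin's rules (an assumption about the combining step)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    thetas, post_vars = [], []
    for _ in range(M):
        z = [rng.gauss(0.0, 1.0) for _ in est]
        draw = [m + sum(L[i][k] * z[k] for k in range(i + 1))
                for i, m in enumerate(est)]
        theta, var = eap_2pl(responses, draw[0::2], draw[1::2])
        thetas.append(theta)
        post_vars.append(var)
    theta_bar = sum(thetas) / M                 # combined score
    within = sum(post_vars) / M                 # mean posterior variance
    between = sum((t - theta_bar) ** 2 for t in thetas) / (M - 1)
    total = within + (1.0 + 1.0 / M) * between  # total variance
    _, naive_var = eap_2pl(responses, est[0::2], est[1::2])
    return {"theta": theta_bar, "within": within, "between": between,
            "total": total, "naive_var": naive_var,
            # illustrative % increase over point-estimate scoring
            "pct_increase": 100.0 * (total / naive_var - 1.0)}


if __name__ == "__main__":
    est = [1.2, -1.0, 0.8, 0.0, 1.5, 1.0]  # hypothetical (a, b) for 3 items
    cov = [[0.04 if i == j else 0.0 for j in range(6)] for i in range(6)]
    print(mi_eap([1, 1, 0], est, cov, M=100))
```

A diagonal covariance is used in the example only for brevity; `cholesky` also handles correlated a and b estimates, which is what a calibration program would normally report.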

1) Intuitively, this approach is similar to MCMC, which the paper also mentions. As the author notes, MCMC is very time-consuming, whereas the MI approach is relatively efficient. Therefore, I think the study should compare MI with MCMC in terms of the resulting estimates.

2) I am curious about the summed-score EAP. How much does it differ from the traditional (response-pattern) EAP under a one-parameter model?
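One way to explore this question numerically is sketched below. Summed-score EAP conditions on the number-correct score rather than the full response pattern, and the likelihood of each summed score can be computed with the Lord-Wingersky recursion. Under a 1PL model with a common discrimination, the summed score is a sufficient statistic for theta, so with the same prior the pattern EAP and the summed-score EAP should coincide; under a 2PL with unequal discriminations, patterns with the same score get different pattern EAPs and the summed-score EAP is a weighted average of them. Dichotomous items, a N(0, 1) prior, and the item parameters in the example are all assumptions for illustration.

```python
import math

NODES = [-4.0 + 0.1 * k for k in range(81)]      # quadrature grid
PRIOR = [math.exp(-0.5 * t * t) for t in NODES]  # unnormalized N(0, 1)


def prob(t, a, b):
    """2PL probability of a correct response (1PL when all a are equal)."""
    return 1.0 / (1.0 + math.exp(-a * (t - b)))


def summed_score_likelihoods(t, a, b):
    """Lord-Wingersky recursion: P(summed score = s | theta = t), s = 0..n."""
    lik = [1.0]
    for aj, bj in zip(a, b):
        p = prob(t, aj, bj)
        new = [0.0] * (len(lik) + 1)
        for s, v in enumerate(lik):
            new[s] += v * (1.0 - p)  # item answered incorrectly
            new[s + 1] += v * p      # item answered correctly
        lik = new
    return lik


def posterior_mean(weights):
    return sum(t * w for t, w in zip(NODES, weights)) / sum(weights)


def summed_score_eap(score, a, b):
    """EAP of theta given only the summed score."""
    return posterior_mean([pr * summed_score_likelihoods(t, a, b)[score]
                           for t, pr in zip(NODES, PRIOR)])


def pattern_eap(responses, a, b):
    """Traditional EAP of theta given the full response pattern."""
    weights = []
    for t, pr in zip(NODES, PRIOR):
        like = 1.0
        for u, aj, bj in zip(responses, a, b):
            p = prob(t, aj, bj)
            like *= p if u == 1 else 1.0 - p
        weights.append(pr * like)
    return posterior_mean(weights)


if __name__ == "__main__":
    b = [-1.0, 0.0, 1.0]                          # hypothetical difficulties
    patterns = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # all have summed score 1
    for name, a in (("1PL", [1.0, 1.0, 1.0]), ("2PL", [0.5, 1.0, 2.0])):
        print(name, [round(pattern_eap(u, a, b), 3) for u in patterns],
              round(summed_score_eap(1, a, b), 3))
```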

3) I have a possible future simulation study in mind. We could set true item parameters for a CAT pool and use them to generate responses, while still calibrating the item parameters on a sample from the population. In the first condition, the point estimates from this calibration are treated as the true item parameters and traditional EAP is used in the CAT. In the second condition, the MI approach is used in the CAT. We could then compare the precision of the ability estimates and the test lengths under a variable-length stopping rule.
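The simulation design in point 3 could be prototyped roughly as below. Everything here is a hypothetical setup, not a specification from the paper: a 2PL pool with uniform discriminations and normal difficulties; calibration error mimicked by adding independent normal noise (SD `calib_sd`) to the true parameters instead of running an actual calibration; MI draws taken from independent normals around the estimates (a diagonal stand-in for the full covariance matrix); maximum-information item selection under the estimated parameters; and a stopping rule of posterior SD at most `se_stop` or `max_len` items. Both conditions use the same examinees and the same seed for the response generator.

```python
import math
import random

NODES = [-4.0 + 0.1 * k for k in range(81)]      # quadrature grid
PRIOR = [math.exp(-0.5 * t * t) for t in NODES]  # unnormalized N(0, 1)


def prob(t, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (t - b)))


def info(t, a, b):
    """2PL Fisher information at theta = t."""
    p = prob(t, a, b)
    return a * a * p * (1.0 - p)


def eap(resp, items):
    """EAP estimate and posterior SD given responses to items [(a, b), ...]."""
    w = []
    for t, prior in zip(NODES, PRIOR):
        like = 1.0
        for u, (a, b) in zip(resp, items):
            p = prob(t, a, b)
            like *= p if u else 1.0 - p
        w.append(prior * like)
    tot = sum(w)
    m = sum(t * x for t, x in zip(NODES, w)) / tot
    v = sum((t - m) ** 2 * x for t, x in zip(NODES, w)) / tot
    return m, math.sqrt(v)


def mi_eap(resp, draws):
    """Average EAPs over item-parameter draws; SE = sqrt(W + (1 + 1/M) B)."""
    ests = [eap(resp, d) for d in draws]
    M = len(ests)
    mbar = sum(m for m, _ in ests) / M
    within = sum(s * s for _, s in ests) / M
    between = sum((m - mbar) ** 2 for m, _ in ests) / (M - 1)
    return mbar, math.sqrt(within + (1.0 + 1.0 / M) * between)


def run_cat(theta, true_pool, est_pool, draws, rng, se_stop=0.3, max_len=30):
    """Variable-length CAT: responses come from true_pool, selection uses
    est_pool, scoring is point-estimate EAP (draws=None) or MI-EAP."""
    used, resp, t_hat, se = [], [], 0.0, float("inf")
    while se > se_stop and len(used) < min(max_len, len(est_pool)):
        free = [k for k in range(len(est_pool)) if k not in used]
        j = max(free, key=lambda k: info(t_hat, *est_pool[k]))
        used.append(j)
        resp.append(1 if rng.random() < prob(theta, *true_pool[j]) else 0)
        if draws is None:
            t_hat, se = eap(resp, [est_pool[k] for k in used])
        else:
            t_hat, se = mi_eap(resp, [[d[k] for k in used] for d in draws])
    return t_hat, len(used)


def simulate(n_items=60, n_people=10, M=5, calib_sd=0.15, seed=7):
    """Return {condition: (RMSE of theta hat, mean test length)}."""
    rng = random.Random(seed)
    true_pool = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0))
                 for _ in range(n_items)]
    # Hypothetical stand-in for calibration error: perturb the true values.
    est_pool = [(a + rng.gauss(0, calib_sd), b + rng.gauss(0, calib_sd))
                for a, b in true_pool]
    # Hypothetical stand-in for draws from the estimates' sampling distribution.
    draws = [[(a + rng.gauss(0, calib_sd), b + rng.gauss(0, calib_sd))
              for a, b in est_pool] for _ in range(M)]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    out = {}
    for label, d in (("point", None), ("mi", draws)):
        resp_rng = random.Random(seed + 1)  # same response seed per condition
        sq_err, lengths = [], []
        for theta in thetas:
            t_hat, length = run_cat(theta, true_pool, est_pool, d, resp_rng)
            sq_err.append((t_hat - theta) ** 2)
            lengths.append(length)
        out[label] = (math.sqrt(sum(sq_err) / n_people),
                      sum(lengths) / n_people)
    return out
```

Calling `simulate()` returns the RMSE and mean test length for each condition. The defaults are kept small so the sketch runs quickly; a real study would need many more examinees, more draws, replications, and an actual calibration step before any comparison is interpreted.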