This study proposes a method for improving the accuracy of person-fit analysis using lz. The method takes test unreliability into account when estimating ability and constructs the null distribution of each lz through resampling. Several traditional and recent PFS (Person-Fit Statistic) methods and the modified method proposed here were assessed in a series of simulations that evaluated type I error and the detection rate for three types of aberrance: cheating, lack of motivation, and speeding. The results show that the new method, CTAD (correct person estimates and adjust the null distribution), controlled type I error much better than all the other methods and had reasonably good power.
Main idea of the new method
EAP (expected a posteriori) estimation has been shown to have better statistical properties, namely smaller bias and standard error. However, estimates obtained from a Bayesian approach tend to regress toward the prior mean when the test is short. The amount of shrinkage is a function of the unreliability of the test, so the author proposed modifying the person estimate by dividing it by the test reliability (equation 5).
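The correction described above can be sketched as follows. This is only a minimal illustration, assuming a Rasch model, a standard normal prior, and a reliability value that is simply passed in as a number. The paper's reliability formula (equation 5) is not reproduced here, and the function names eap_rasch and correct_for_shrinkage are hypothetical.

```python
import numpy as np

def eap_rasch(responses, b, n_quad=61):
    """EAP ability estimate under the Rasch model with a N(0, 1) prior.

    Assumption: fixed, equally spaced quadrature nodes on [-4, 4].
    """
    theta = np.linspace(-4.0, 4.0, n_quad)            # quadrature nodes
    prior = np.exp(-0.5 * theta ** 2)                 # unnormalised N(0, 1) density
    # Probability of a correct response at every node (rows) and item (columns).
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    # Likelihood of the observed response pattern at every node.
    like = np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)
    post = like * prior
    return float(np.sum(theta * post) / np.sum(post))  # posterior mean

def correct_for_shrinkage(eap, reliability):
    """Undo shrinkage toward the prior mean (0) by dividing by reliability.

    Assumption: `reliability` is a number in (0, 1] supplied by the caller;
    how the paper computes it (equation 5) is not shown here.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return eap / reliability

b = np.array([-1.0, 0.0, 1.0, 0.0])       # illustrative item difficulties
eap = eap_rasch(np.array([1, 1, 1, 1]), b)
corrected = correct_for_shrinkage(eap, reliability=0.8)
```

Because the reliability is at most 1, the corrected estimate is always at least as far from the prior mean as the original EAP, which is the intended reversal of the shrinkage.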
Comments, Questions and Future Study
1) As the author mentioned, a limitation of this paper is that the power of the different PFS methods cannot be compared because their type I error rates differ. Because the author used true item parameters and the null distribution could not be fully corrected to a standard normal distribution, we suggest simulating responses from the true item parameters for examinees without any aberrant behavior, so that the data follow a specific model, for example the Rasch model or the 3PLM. The author can then find the cut point from this simulated null distribution for a specified nominal alpha, so that the detection rates of the different PFS methods can be compared at the same baseline type I error.
2) I do not really see the point of steps (8) and (9) on page 163.
3) The proposed method tries to correct the EAP estimates by taking the unreliability of the test into account, so the corrected ability estimate equals the original EAP divided by the test reliability. Doing so would inflate the bias of the EAP and, as we concluded in last week's lab meeting, it would inflate the SE as well.
4) Actually, the power is definitely affected by the precision of the person estimate. That is why many researchers have tried to obtain more accurate estimates through modified estimators. However, a modified person estimate is still affected by dirty responses when examinees behave aberrantly. In my opinion, a modified method is a good way to improve type I error control or increase detection power. A more proactive approach is to exclude dirty data as far as possible when estimating person ability. We therefore propose a purification procedure that identifies items on which the response misfits the model and then estimates person ability from the remaining clean items. The purification study is ongoing.
5) The author mentioned that another limitation of the study is that known item parameters were used to estimate person ability. In reality, however, well-calibrated item parameters are not always available. A future study is therefore needed to assess PFS performance with estimated item parameters. We are also conducting an ongoing study that applies the purification procedure to this scenario.
6) It is quite weird that using the true person abilities did not give the best type I error control and power. If the true ability does not work for detection, why do we need to modify the person estimate?
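The suggestion in comment 1) can be sketched as follows. This is only a rough illustration, assuming the Rasch model, abilities drawn from a standard normal distribution, and lz computed at the true abilities. In an actual comparison each PFS method would be computed with its own ability estimate on the same simulated null data. The function names lz and empirical_cutoff are hypothetical.

```python
import numpy as np

def lz(responses, p):
    """Standardised log-likelihood person-fit statistic lz.

    `responses` holds 0/1 item scores and `p` the model probabilities of a
    correct response; the last axis indexes items.
    """
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p), axis=-1)
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p), axis=-1)
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2, axis=-1)
    return (l0 - expected) / np.sqrt(variance)

def empirical_cutoff(b, n_persons=5000, alpha=0.05, seed=0):
    """Cut point of lz from simulated model-fitting (non-aberrant) examinees."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)                     # assumed ability distribution
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))   # Rasch probabilities
    u = (rng.random(p.shape) < p).astype(int)                  # responses that follow the model
    null_lz = lz(u, p)                                         # null distribution of lz
    # Low lz indicates misfit, so the cut point is the lower alpha quantile.
    return float(np.quantile(null_lz, alpha))

b = np.linspace(-2.0, 2.0, 20)     # illustrative item difficulties
cutoff = empirical_cutoff(b)
```

An examinee would then be flagged when the observed lz falls below cutoff. Repeating this for each PFS method gives every method the same empirical type I error rate on model-fitting data, so their detection rates can be compared directly.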