Big idea of the paper:
Random guessing on multiple-choice items affects item parameter estimates and person measurement. One option is to include a guessing parameter in the model, as the 3PL model does. However, there are many settings in which the Rasch model is applied to multiple-choice items; in these cases, how can the presence of random guessing be assessed?
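For concreteness, here is a minimal sketch of the two item response functions being contrasted (the standard Rasch and 3PL formulas; the discrimination a and pseudo-guessing c values are illustrative only, not taken from the paper):

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model: P(X=1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def three_pl_prob(theta, b, a=1.0, c=0.25):
    """3PL model: P(X=1) = c + (1 - c) * logistic(a * (theta - b)).
    The pseudo-guessing parameter c gives a nonzero lower asymptote,
    e.g. c = 0.25 for a four-option multiple-choice item."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# A low-ability examinee on a hard item: the Rasch probability is tiny,
# while the 3PL probability stays near the guessing floor.
print(rasch_prob(-2.0, 2.0))     # ~0.018
print(three_pl_prob(-2.0, 2.0))  # ~0.263
```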
The paper applies the idea behind Andersen's theorem through two procedures. The first analysis estimates item difficulties from the whole sample, which includes responses affected by random guessing. The second analysis (the tailored analysis) estimates the difficulties from the subset of responses least likely to be affected by guessing; in this paper only one such subset is used. The significance of the difference between the two sets of estimates can then be tested, as sketched below.
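As a rough sketch of how that comparison could be operationalized (this is not necessarily the exact statistic used in the paper, and the numbers below are invented):

```python
import numpy as np
from scipy import stats

def difficulty_shift_test(b_full, se_full, b_tailored, se_tailored):
    """Rough z-test for a shift in one item's difficulty between the
    whole-sample analysis and the tailored (subsample) analysis.
    Treats the two estimates as independent, which is only approximate
    because the tailored subsample is nested within the full sample."""
    z = (b_tailored - b_full) / np.sqrt(se_full**2 + se_tailored**2)
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p

# Hypothetical values: a hard item looks easier in the full sample
# (because of guessing) than in the tailored analysis.
print(difficulty_shift_test(b_full=1.4, se_full=0.08,
                            b_tailored=1.9, se_tailored=0.12))
```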
It was argued that low-ability persons are more likely to guess randomly than high-ability persons, so the simulation model treated guessing as a function of relative proficiency. An integer exponent was applied to the term (1 - P), where P is the probability of a correct response. The exponent was set to a small value (e.g., 1) for low-ability persons and a large value (e.g., 15) for high-ability persons, so that the impact of guessing on high-ability persons is reduced (because (1 - P) is below 1, a larger exponent drives the guessing term toward zero).
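Based on this description, the generating model might look something like the following; the exact functional form, chance level c, and the cut on relative proficiency are assumptions made for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def guessing_prob(theta, b, c=0.25, y_low=1, y_high=15, prof_cut=0.0):
    """Assumed generating model: P*(X=1) = P + c * (1 - P)**y,
    where P is the Rasch probability and the exponent y depends on
    relative proficiency (theta - b). A low-proficiency response gets a
    small exponent, so guessing contributes roughly c * (1 - P); a
    high-proficiency response gets a large exponent, so (1 - P)**y
    shrinks toward 0 and guessing barely changes the probability."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    y = y_low if (theta - b) < prof_cut else y_high
    return p + c * (1.0 - p) ** y

def simulate_response(theta, b):
    """Generate one dichotomous response under the assumed model."""
    return int(rng.random() < guessing_prob(theta, b))
```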
It was also argued that guessing is a function of item difficulty: guessing is more likely on more difficult items and less likely on easy items (which is why the six easiest items were anchored in the anchored analysis).
In the simulation study, after the responses were generated, they were analyzed with three approaches: (1) the first analysis, which uses the whole sample; (2) the tailored analysis, in which responses falling below the cutpoint (those most plausibly affected by guessing) are converted to missing (see the sketch below); and (3) the anchored analysis, in which the easiest items are anchored.
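A minimal sketch of what the tailoring step in approach (2) might look like, assuming person and item estimates are available from the first analysis; the cutpoint value and the NaN-as-missing convention are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def tailor_responses(X, theta_hat, b_hat, cutpoint=0.3):
    """Tailoring step: after a first (whole-sample) Rasch calibration,
    set to missing any response for which the estimated probability of
    success is below the cutpoint, i.e. the responses most plausibly
    reflecting random guessing.
    X: persons-by-items 0/1 matrix; theta_hat, b_hat: first-analysis estimates."""
    p_hat = 1.0 / (1.0 + np.exp(-(theta_hat[:, None] - b_hat[None, :])))
    X_tailored = X.astype(float)
    X_tailored[p_hat < cutpoint] = np.nan  # treated as missing when re-estimating
    return X_tailored
```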
The ICCs from the first analysis showed evidence of guessing in the data. The estimates from the tailored analysis were found to be very close to the true item parameters. The estimates from the anchored analysis systematically underestimated the difficulty of the harder items (because, in the presence of guessing, the more difficult items are mistakenly made to look easier). A simulation example with no guessing confirmed the accuracy of the tailored analysis.
The empirical study compared the item estimates from the tailored and anchored analyses. Items that were most difficult in the tailored analysis appeared less difficult in the anchored analysis (as in the simulations). This was expected, since the empirical data were shown to contain random guessing on the more difficult items, and the tailored analysis takes that guessing into account while the anchored analysis does not. The SD of the item difficulty estimates in the anchored analysis was smaller than in the tailored analysis, which was also expected from Andersen's theorem (the SD from the first analysis should be the smallest).
Questions:
1. As a practitioner, how should we decide between using the tailored analysis and using the 3PL model to assess the presence of random guessing, given that the 3PL model often fits the data better than the Rasch model?
2. The tailored analysis could identify the "majority of items whose responses were affected by guessing, particularly the most difficult items". However, what about the Type I error rate and power of the analysis? What factors affect the performance of the tailored analysis?
3. The cutpoint for the tailored analysis decides which part of the data is converted to missing. However, the computation of the cutpoint seems arbitrary: why is y set to 15 rather than 10 or some other value? Also, how should we correct for the inflated Type I error rate when many significance tests (one per item) are carried out?