ARC Laboratory Sharing: Xuelan's review

Big idea of the paper:

Random guessing on multiple-choice items affects both the item parameter estimates and person measurement. One option is to include a guessing parameter in the model, as the 3PL model does. However, there are many cases where the Rasch model is applied to multiple-choice items. In these cases, how can the presence of random guessing be assessed?

The paper applies Andersen’s theorem through two analyses. The first analysis estimates the item difficulties from the whole sample, which includes responses affected by random guessing. The second analysis (called the tailored analysis) estimates the item difficulties from the subsample that is least likely to be affected by guessing. In this paper, only one subsample is used for the tailored analysis. The significance of the difference between the two sets of estimates can then be tested.
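As a rough illustration of that significance test, here is a minimal Python sketch. It assumes the usual reading of Andersen's theorem, namely that the variance of the difference between a subsample estimate and the whole-sample estimate equals the difference of their variances. The function name and the example numbers are made up for illustration and are not taken from the paper.

```python
import math

def andersen_z(d_sub, se_sub, d_full, se_full):
    """z statistic for one item, comparing a subsample estimate with the
    whole-sample estimate (hypothetical helper, not the paper's code).

    Assumption: var(d_sub - d_full) = se_sub**2 - se_full**2.
    """
    var_diff = se_sub ** 2 - se_full ** 2
    if var_diff <= 0:
        # The subsample estimate should be the less precise one;
        # if it is not, the statistic is undefined.
        return float("nan")
    return (d_sub - d_full) / math.sqrt(var_diff)

# Made-up example: an item looks harder in the tailored (subsample) analysis.
z = andersen_z(d_sub=1.5, se_sub=0.20, d_full=1.1, se_full=0.12)
print(round(z, 2))
```

A large positive z would suggest that the item is harder in the subsample than in the whole sample, which is the pattern guessing is expected to produce.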

It was argued that low-ability persons are more likely to guess randomly than high-ability persons. Hence, the model treats guessing as a function of relative proficiency. To do this, an integer exponent is applied to the term (1-P), where P is the probability of answering correctly. The value was set to be small (e.g., 1) for low-ability persons and large (e.g., 15) for high-ability persons, so that the impact of guessing for high-ability persons is reduced.
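To see how such an exponent behaves, here is a small Python sketch. The form P + c(1-P)^y, the value c = 0.25 and the function names are my assumptions for illustration; they are not necessarily the paper's exact generating model.

```python
import math

def rasch_p(beta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def p_with_guessing(beta, delta, c=0.25, y=15):
    """Assumed form: Rasch probability plus a guessing term c * (1 - P)**y.

    c and y are illustrative values, not estimates from any data set.
    """
    p = rasch_p(beta, delta)
    return p + c * (1.0 - p) ** y

# Compare a person far below the item with a person far above it.
for beta in (-3.0, 2.0):
    print(beta, rasch_p(beta, 0.0), p_with_guessing(beta, 0.0))
```

With y = 1 this assumed form reduces to c + (1-c)P, the familiar 3PL shape. With a large y the guessing term only matters when P is small, that is, when the person is well below the item.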

It was also argued that guessing is a function of item difficulty. That is, guessing is more likely on the more difficult items and less likely on the easy items (that's why the 6 easiest items were anchored in the anchored analysis).

In the simulation study, after the responses were generated, they were analysed using three approaches: (1) the first analysis, which uses the whole sample; (2) the tailored analysis, which converts the part of the data tested as significant to missing; and (3) the anchored analysis, in which the easiest items are anchored.
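The tailoring step can be sketched roughly as follows. This assumes that a response is set to missing when its Rasch probability of success, computed from estimates obtained in the first analysis, falls below a cutoff. The function name, the cutoff default and the toy data are illustrative only.

```python
import math

def tailor(responses, abilities, difficulties, cutoff=0.3):
    """Return a copy of the response matrix with likely-guessed cells
    replaced by None (hypothetical helper, not the paper's procedure).

    responses:    list of rows, one row of 0/1 scores per person
    abilities:    one ability estimate per person (assumed from a first analysis)
    difficulties: one difficulty estimate per item (assumed from a first analysis)
    """
    tailored = []
    for row, beta in zip(responses, abilities):
        new_row = []
        for x, delta in zip(row, difficulties):
            p = 1.0 / (1.0 + math.exp(-(beta - delta)))
            # Keep the response only if the item is not too hard for this person.
            new_row.append(x if p >= cutoff else None)
        tailored.append(new_row)
    return tailored

# Toy data: one person of ability 0.0 answering three items.
print(tailor([[1, 0, 1]], [0.0], [-1.0, 0.5, 2.0]))
```

In this toy case only the response to the hardest item is treated as missing, because the person is 2 logits below it.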

The ICCs from the first analysis showed guessing in the data. The estimates from the tailored analysis were very close to the true item parameters. The estimates from the anchored analysis systematically underestimated the difficulty of the more difficult items (because of the guessing, the more difficult items were mistakenly taken to be easier). The simulation example with no guessing confirmed the accuracy of the tailored analysis.

The empirical study compared the item estimates from the tailored and anchored analyses. The more difficult items in the tailored analysis were found to be not as difficult in the anchored analysis (as in the simulations). This was expected, since the empirical data had been shown to contain random guessing on the more difficult items, and the tailored analysis takes that guessing into account while the anchored analysis does not. The SD of the item difficulty estimates in the anchored analysis was smaller than in the tailored analysis. This was also expected from Andersen’s theorem (the SD from the first analysis will be the smallest).

Questions:

1. As practitioners, how can we decide whether to use the tailored analysis or the 3PL model to assess the presence of random guessing, given that the 3PL often fits the data better than the Rasch model?

2. The tailored analysis could identify the ‘majority of items whose responses were affected by guessing, particularly the most difficult items’. However, what about the Type I error and power of the analysis? What factors affect the performance of the tailored analysis?

3. The cutpoint for the tailored analysis decides which part of the data is converted to missing. However, the choice of the cutpoint seems arbitrary. Why is y 15, and not 10 or some other value? How should the inflated probability of Type I error be corrected when many significance tests (as many tests as there are items) are carried out?
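For the last point, one common option is a Bonferroni correction, sketched below in Python. This only illustrates the general technique and is not something the paper reports doing; the number of items (20) and alpha (0.05) are assumed values.

```python
from statistics import NormalDist

def bonferroni_critical_z(alpha, n_tests):
    """Two-sided critical z value when alpha is split evenly over n_tests."""
    per_test_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)

# Assumed values: one test per item, 20 items, family-wise alpha of 0.05.
print(bonferroni_critical_z(0.05, 1))   # the usual single-test value
print(bonferroni_critical_z(0.05, 20))  # stricter value after correction
```

The corrected critical value is noticeably larger than the single-test one, so fewer items would be flagged. Bonferroni is conservative, so this trades power for Type I error control.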

Re: Xuelan's review

by QIU Xuelan - Tuesday, 8 November 2011, 4:25 PM

Comments are welcome!

Re: Xuelan's review

by ZHONG Xiaoling - Thursday, 10 November 2011, 11:31 AM

(I may be badly wrong, so please take my answers as reference only.)

1. I guess the purpose of this paper does not include choosing between the Rasch and 3PL models. However, it did mention in the second paragraph (on page 2) that the Rasch model has some desirable properties. One desirable property is that the item difficulty estimates can be obtained independently of the persons’ distribution. Therefore, there are situations where the Rasch model is desirable. For example, in adaptive testing where item banks are used, requiring only item difficulties keeps things simple. Back to Xuelan’s question: I guess that you can always use the 3PL to fit the data if there are reasons to believe that guessing exists, unless you have a strong preference for the Rasch model, such as in adaptive testing.

2. By “Type I error and power of the analysis”, do you mean the Type I error and power of the significance test (which is based on Andersen’s theorem) of the difference between the tailored and anchored analyses (as in the last column of Table 1)? I believe additional simulation studies need to be conducted to answer this question. The factors that affect the performance of this significance test can also be studied through simulation. This could be a follow-up study to this paper.

3. I agree that the choice of cutoff point is arbitrary. The authors explained why they chose 0.3 as the cutoff point and y=15 (on pages 12 and 13). y=15 was chosen to emulate the real ARPM data. For other real data, the choice may be different. Choosing the cutoff point of 0.3 (in fact, a number smaller than 0.27) is to make sure that β − δ < −1.0. That is, the person’s proficiency is more than 1.0 logits below the item difficulty. I am not sure whether the Type I error needs to be corrected with Bonferroni (or other corrections) in this situation. It seems to me that they are simply Z tests which are independent of each other.
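As a quick check of the link between the probability cutoff and the logit distance, assuming the standard dichotomous Rasch model (the symbols β for person proficiency and δ for item difficulty are as above):

```latex
P = \frac{e^{\beta-\delta}}{1+e^{\beta-\delta}}
\quad\Longleftrightarrow\quad
\beta-\delta = \ln\frac{P}{1-P}

\beta-\delta = -1.0 \;\Rightarrow\; P = \frac{e^{-1}}{1+e^{-1}} = \frac{1}{1+e} \approx 0.269

P = 0.3 \;\Rightarrow\; \beta-\delta = \ln\frac{0.3}{0.7} \approx -0.847
```

So a probability just under 0.27 corresponds to a person at least 1.0 logits below the item, while a cutoff of exactly 0.3 corresponds to a slightly smaller distance of about 0.85 logits.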

Re: Xuelan's review

by HUANG Sheng Yun - Wednesday, 9 November 2011, 11:31 AM

Hi Xuelan,

As for your first question, I think the reason the author wanted to assess random guessing is that he wanted to use the Rasch model. The Rasch model does not account for guessing, so he used the form c+c(1-p)^y to generate data. However, guessing is built into the 3PLM, and the guessing there is not random guessing; it is related to item properties. I don't think the 3PLM can be used here for assessing random guessing.

Re: Xuelan's review

by LIU CHEN WEI - Thursday, 10 November 2011, 6:08 PM

for
1. The author only considered that persons with lower ability would show more guessing behavior than others. That's why the author devised the modified 3PLM to simulate "guessing" responses. We can see this in Figure 2. The development of tailored testing and related methods may be appropriate only for the Rasch model because of its statistically invariant parameter estimates. That's what the author explained with Figure 1. By the way, the 3PLM is notorious for its parameter estimation (i.e., it is hard to estimate the c parameter accurately). I think the modified 3PLM would suffer from the same difficulty.

2. Perhaps this requires further Monte Carlo simulation studies? That is not considered in this study.

3. The author adopted y=15 as the simulation condition. I think it is hard to decide the value of y in an empirical analysis, and the data may be insufficient to estimate it.

01 Rasch Guessing (Presented by Jacob)