Xuelan's readings and review

Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff’s Delta Plot

by QIU Xuelan

1. Delta Plot

Angoff’s Delta plot method is useful for detecting DIF with small samples. It consists of four steps: (1) compute the proportion of correct responses to each item in the reference group and in the focal group; (2) transform each proportion p to a continuous scale through the standard normal distribution, z = Φ⁻¹(1 − p), so that harder items get larger z scores; (3) transform each z score into a Delta score by the linear relationship Δ = 4z + 13, to avoid negative values; (4) plot the paired Delta scores of the reference and focal groups against each other. When there are no DIF items, the points form an elongated ellipse, while DIF items depart from it. The DIF statistic is the perpendicular distance between an item's Delta point and the major axis of the ellipse. An item whose distance is large in absolute value (e.g., greater than 1.5) departs from the major axis and is therefore flagged as DIF.
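
The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes z = Φ⁻¹(1 − p), uses the usual major-axis slope formula, and the proportions and the 1.5 cutoff are hypothetical example values.

```python
from math import sqrt
from statistics import NormalDist, mean


def delta_scores(p_correct):
    """Steps 2-3: proportion correct p -> z = Phi^{-1}(1 - p) -> Delta = 4z + 13."""
    inv_cdf = NormalDist().inv_cdf
    return [4 * inv_cdf(1 - p) + 13 for p in p_correct]


def major_axis(x, y):
    """Intercept a and slope b of the major axis of the (x, y) scatter."""
    n, mx, my = len(x), mean(x), mean(y)
    sx2 = sum((u - mx) ** 2 for u in x) / (n - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    b = (sy2 - sx2 + sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b


def delta_plot(p_ref, p_foc, cutoff=1.5):
    """Step 4 and the DIF statistic: signed perpendicular distance to the major axis."""
    x, y = delta_scores(p_ref), delta_scores(p_foc)  # x: reference, y: focal
    a, b = major_axis(x, y)
    dist = [(b * u + a - v) / sqrt(b ** 2 + 1) for u, v in zip(x, y)]
    return dist, [abs(d) > cutoff for d in dist]


# Hypothetical proportions correct for 10 items; item 4 (index 3) is much
# harder for the focal group than the other items.
p_ref = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45]
p_foc = [0.85, 0.80, 0.75, 0.40, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40]

dist, flags = delta_plot(p_ref, p_foc)
print([round(d, 2) for d in dist])
print(flags)
```

With these values only the fourth item crosses the 1.5 cutoff (its distance is about −2). The negative sign means its focal-group Delta lies above the axis, i.e., the item is relatively harder for the focal group.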

2. Modified Delta Plot

It was argued that the usual DIF threshold (e.g., 1.5 or 2.0) is fixed and chosen arbitrarily. In fact, the threshold may depend on all Delta points (including the DIF item's own point) and on the significance level. In the authors' previous study, the Delta points were assumed to follow a bivariate normal distribution, so that the perpendicular distances, being linear combinations of the two Delta scores, follow a (univariate) normal distribution. An item is then flagged as DIF when its perpendicular distance exceeds, in absolute value, the critical value for a given significance level (e.g., 0.05). The authors argued that the modified Delta plot has several advantages, including good control of the Type I error and higher power with small samples.
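
A minimal sketch of such a normality-based threshold, under an assumption of ours: if the Delta points are bivariate normal, the perpendicular distance D to the major axis has mean 0 and variance (b²s_x² + s_y² − 2b·s_xy)/(b² + 1), estimated from the sample, and an item is flagged when |D| > z_{1−α/2} times that standard deviation. The exact formula in the authors' paper may differ in detail; the function names and proportions are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def delta_scores(p_correct):
    """Angoff's Delta: 4 * Phi^{-1}(1 - p) + 13."""
    inv_cdf = NormalDist().inv_cdf
    return [4 * inv_cdf(1 - p) + 13 for p in p_correct]


def fit_axis(x, y):
    """Major axis (a, b) and the sample variance of the perpendicular distances."""
    n, mx, my = len(x), mean(x), mean(y)
    sx2 = sum((u - mx) ** 2 for u in x) / (n - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    b = (sy2 - sx2 + sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # Variance of (b*x - y) / sqrt(b^2 + 1); clipped at 0 against rounding.
    var_d = max(0.0, (b ** 2 * sx2 + sy2 - 2 * b * sxy) / (b ** 2 + 1))
    return my - b * mx, b, var_d


def modified_delta_plot(p_ref, p_foc, alpha=0.05):
    """Flag items whose |distance| exceeds the assumed normal critical value."""
    x, y = delta_scores(p_ref), delta_scores(p_foc)
    a, b, var_d = fit_axis(x, y)
    dist = [(b * u + a - v) / sqrt(b ** 2 + 1) for u, v in zip(x, y)]
    threshold = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(var_d)
    return dist, threshold, [abs(d) > threshold for d in dist]


# Hypothetical proportions correct for 10 items; item 4 (index 3) carries DIF.
p_ref = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45]
p_foc = [0.85, 0.80, 0.75, 0.40, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40]
dist, threshold, flags = modified_delta_plot(p_ref, p_foc)
print(round(threshold, 2), flags)
```

Here the data-driven threshold comes out at roughly 1.4, below the conventional 1.5, and it moves with the spread of the Delta points themselves rather than being fixed in advance.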

3. Item Purification

One main purpose of the study is to investigate the influence of item purification on the modified Delta plot method. Briefly, the perpendicular distances are re-estimated iteratively, each time removing the currently flagged DIF items from the computation of the major axis. A critical question is how the threshold should be updated during the iterations. One option is to keep the threshold fixed at its initial value throughout the process (IPP1). A second option is to update only part of the threshold, namely its slope parameter (IPP2). The third option is to update all the parameters at each iteration (IPP3).
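
The three procedures can be sketched as one loop that refits the major axis on the items not currently flagged; they differ only in how the threshold is recomputed. This is a minimal sketch under assumptions of ours: the threshold takes the normal-based form z times the SD of the perpendicular distances, and IPP2 is read as updating only the slope b inside that formula while the variances and covariance still come from all items. The function `purify` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def delta_scores(p_correct):
    inv_cdf = NormalDist().inv_cdf
    return [4 * inv_cdf(1 - p) + 13 for p in p_correct]


def moments(x, y):
    """Means, variances and covariance of the Delta scores (divisor n - 1)."""
    n, mx, my = len(x), mean(x), mean(y)
    sx2 = sum((u - mx) ** 2 for u in x) / (n - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    return mx, my, sx2, sy2, sxy


def slope(sx2, sy2, sxy):
    """Slope of the major axis."""
    return (sy2 - sx2 + sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


def threshold(b, sx2, sy2, sxy, z):
    """Normal critical value times the SD of the perpendicular distances."""
    return z * sqrt(max(0.0, (b ** 2 * sx2 + sy2 - 2 * b * sxy) / (b ** 2 + 1)))


def purify(p_ref, p_foc, method="IPP1", alpha=0.05, max_iter=20):
    x, y = delta_scores(p_ref), delta_scores(p_foc)
    n, z = len(x), NormalDist().inv_cdf(1 - alpha / 2)
    _, _, sx2_all, sy2_all, sxy_all = moments(x, y)
    thr0 = threshold(slope(sx2_all, sy2_all, sxy_all), sx2_all, sy2_all, sxy_all, z)
    flagged = [False] * n  # first pass = plain modified Delta plot
    for _ in range(max_iter):
        keep = [i for i in range(n) if not flagged[i]]
        if len(keep) < 3:  # too few items left to fit an axis
            break
        mx, my, sx2, sy2, sxy = moments([x[i] for i in keep], [y[i] for i in keep])
        b = slope(sx2, sy2, sxy)
        a = my - b * mx
        if method == "IPP1":    # threshold frozen at its initial value
            thr = thr0
        elif method == "IPP2":  # only the slope is updated (assumed reading)
            thr = threshold(b, sx2_all, sy2_all, sxy_all, z)
        else:                   # "IPP3": everything recomputed from kept items
            thr = threshold(b, sx2, sy2, sxy, z)
        new = [abs((b * u + a - v) / sqrt(b ** 2 + 1)) > thr for u, v in zip(x, y)]
        if new == flagged:      # stop when two successive flag sets agree
            break
        flagged = new
    return flagged


p_ref = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45]
p_foc = [0.85, 0.80, 0.75, 0.40, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40]
for method in ("IPP1", "IPP2", "IPP3"):
    print(method, [i for i, f in enumerate(purify(p_ref, p_foc, method)) if f])
```

Every item, flagged or not, is re-tested against the axis fitted on the kept items, so an item flagged early can be released later.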

4. Simulations

In the simulations, sample size, test length, impact, percentage of DIF items, DIF size and DIF method were manipulated. The modified Delta plot without purification and with each of the three purification procedures (IPP1, IPP2 and IPP3) was compared both when DIF items were absent and when they were present. Data were generated under a two-parameter IRT model.
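
To make the simulation design concrete, here is a minimal sketch of generating responses under a two-parameter model (the logistic form is our assumption) with uniform DIF and impact. All settings below (item parameter distributions, a difficulty shift of 0.6 on two DIF items, a focal-group mean ability of −0.5, 500 examinees per group) are hypothetical and not taken from the paper.

```python
import math
import random


def p_correct_2pl(theta, a, b):
    """2PL item response function: P(X = 1 | theta) = 1 / (1 + exp(-a (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_group(n, a_params, b_params, mean_theta, rng):
    """Simulate 0/1 responses for n examinees with theta ~ N(mean_theta, 1)."""
    responses = []
    for _ in range(n):
        theta = rng.gauss(mean_theta, 1.0)
        responses.append([int(rng.random() < p_correct_2pl(theta, a, b))
                          for a, b in zip(a_params, b_params)])
    return responses


def proportions_correct(responses):
    """Step 1 of the Delta plot: proportion correct per item."""
    n = len(responses)
    return [sum(item) / n for item in zip(*responses)]


rng = random.Random(2013)
n_items, n_per_group = 20, 500
a_ref = [rng.uniform(0.8, 2.0) for _ in range(n_items)]  # hypothetical
b_ref = [rng.gauss(0.0, 1.0) for _ in range(n_items)]    # hypothetical
dif_items = {0, 1}   # hypothetical: 10% DIF items
# Uniform DIF: the DIF items are harder (larger b) in the focal group.
b_foc = [b + 0.6 if i in dif_items else b for i, b in enumerate(b_ref)]
impact = -0.5        # hypothetical: focal group's mean ability

p_ref = proportions_correct(simulate_group(n_per_group, a_ref, b_ref, 0.0, rng))
p_foc = proportions_correct(simulate_group(n_per_group, a_ref, b_foc, impact, rng))
print([round(p, 2) for p in p_ref])
print([round(p, 2) for p in p_foc])
```

Impact lowers every focal proportion, whereas DIF shifts only the DIF items, which is why the Delta plot looks at departures from the common axis rather than at raw differences.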

IPP3 was found to have an inflated Type I error in the absence of DIF (explained in the Discussion below). The modified Delta plot and the other two purification procedures (IPP1 and IPP2) have similar Type I error rates, close to the nominal significance level.

Similar findings were obtained in the presence of DIF. IPP3 has the largest power, but this is meaningless given its inflated Type I error. The remaining three methods perform similarly.
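
For reference, a minimal sketch of how empirical Type I error and power are usually computed at the item level in such simulations: the Type I error is the proportion of DIF-free items that get flagged, and power is the proportion of DIF items that get flagged, pooled over replications. The function `error_rates` and the toy flags are hypothetical; the paper's exact aggregation may differ.

```python
def error_rates(flag_sets, dif_items, n_items):
    """Empirical Type I error and power pooled over replications.

    flag_sets: one set of flagged item indices per replication.
    dif_items: set of indices of the truly DIF items.
    """
    reps = len(flag_sets)
    n_null = n_items - len(dif_items)
    false_pos = sum(len(flags - dif_items) for flags in flag_sets)
    true_pos = sum(len(flags & dif_items) for flags in flag_sets)
    type1 = false_pos / (n_null * reps) if n_null else float("nan")
    power = true_pos / (len(dif_items) * reps) if dif_items else float("nan")
    return type1, power


# Hypothetical flags from 3 replications of a 10-item test where item 0 has DIF.
type1, power = error_rates([{0, 5}, {0}, set()], dif_items={0}, n_items=10)
print(type1, power)  # type1 = 1/27 (about 0.037), power = 2/3
```

Comparing power across methods is only meaningful when their Type I error rates are comparable, which is why IPP3's higher power is discounted above.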

5. Discussion

The reasons why item purification does not improve the Delta plot are discussed. The DIF threshold in the modified Delta plot depends on the sample of Delta scores itself rather than on an underlying statistical distribution. Thus, for IPP3, the variances and covariance of the Delta scores, and hence the threshold, change at every iteration. Since the removed items are the points farthest from the major axis, the remaining points lie closer to it and the threshold shrinks, so the method risks flagging more and more items as DIF, which inflates the Type I error. The modified Delta plot without purification, on the other hand, already keeps the Type I error stable (though conservative), so item purification has no room to improve it.
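
A small numerical check of this argument: compute the threshold once from all Delta points, then once more after dropping the flagged item and refitting everything on the remaining items (one IPP3-style step). The normal-based threshold formula is our assumption, as are the hypothetical proportions.

```python
from math import sqrt
from statistics import NormalDist, mean

Z = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05


def delta_scores(p_correct):
    inv_cdf = NormalDist().inv_cdf
    return [4 * inv_cdf(1 - p) + 13 for p in p_correct]


def axis_and_threshold(x, y):
    """Major axis (a, b) and normal-based threshold, both computed from these points only."""
    n, mx, my = len(x), mean(x), mean(y)
    sx2 = sum((u - mx) ** 2 for u in x) / (n - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    b = (sy2 - sx2 + sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    var_d = max(0.0, (b ** 2 * sx2 + sy2 - 2 * b * sxy) / (b ** 2 + 1))
    return my - b * mx, b, Z * sqrt(var_d)


def flag(x, y, a, b, thr):
    return [abs((b * u + a - v) / sqrt(b ** 2 + 1)) > thr for u, v in zip(x, y)]


p_ref = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45]
p_foc = [0.85, 0.80, 0.75, 0.40, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40]
x, y = delta_scores(p_ref), delta_scores(p_foc)

a0, b0, thr_before = axis_and_threshold(x, y)
flags_before = flag(x, y, a0, b0, thr_before)

# One IPP3-style step: refit the axis AND the threshold on the unflagged items.
keep = [i for i, f in enumerate(flags_before) if not f]
a1, b1, thr_after = axis_and_threshold([x[i] for i in keep], [y[i] for i in keep])
flags_after = flag(x, y, a1, b1, thr_after)

print(round(thr_before, 2), round(thr_after, 2))
print(sum(flags_before), sum(flags_after))
```

A flagged item's squared distance exceeds the variance of the distances (the threshold is about 1.96 standard deviations), so dropping it and refitting lowers the estimated variance. Under IPP3 the threshold therefore falls at this step, which is the mechanism by which further items can end up flagged.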

Questions and Comments:

1. The modified Delta plot method assumes that the Delta points follow a bivariate normal distribution. In the simulations, the items were generated from a standard normal distribution, so the method seems to perform well. However, tests often contain fewer than 100 items, and the normality assumption does not always hold. How does the method perform when the normality assumption is violated?

2. According to many previous studies, purification plays a significant role when the test contains more than 20% DIF items. In the simulations, the percentage of DIF items did not exceed 20%. It would be interesting to investigate whether item purification is more useful in that setting.

3. The authors compare the modified Delta plot method to the robust method from their previous study (Magis & De Boeck, 2012). Interestingly, I ran simulations to investigate whether item purification is necessary for the robust method and found that item purification does not improve its DIF detection (a conclusion similar to the present study!). The reasons are not yet clear to me.