Sherry's readings and review

Empirical Selection of Anchors for Testing of Differential Item Functioning

by ZHONG Xiaoling

This paper proposes a rank-based strategy for empirically selecting anchor items. The steps are:

a) Test all items for DIF with the IRT likelihood-ratio test (IRT-LRT), using all other items as anchors;

b) Compute the ratio of the LR statistic to the number of free parameters, f, for each item;

c) Rank order the items based on the LR/f ratio;

d) Designate the g items with the smallest LR/f ratios as anchors.
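The rank-based selection in steps a)–d) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the helper `select_anchors` and the per-item LR statistics and free-parameter counts are hypothetical, assuming the IRT-LRT statistics have already been computed for each item.

```python
# Sketch of the rank-based anchor selection (steps a-d). The LR statistics
# and free-parameter counts below are hypothetical placeholders; in practice
# they would come from fitting IRT models with all other items as anchors.

def select_anchors(lr_stats, free_params, g):
    """Return indices of the g items with the smallest LR/f ratios."""
    ratios = [(lr / f, i) for i, (lr, f) in enumerate(zip(lr_stats, free_params))]
    ratios.sort()                      # rank items by LR/f, smallest first
    return [i for _, i in ratios[:g]]  # designate the g smallest as anchors

# Hypothetical statistics for five mixed-type items (hence different f).
lr = [1.2, 15.8, 0.9, 7.4, 2.1]   # IRT-LRT statistic per item
f = [2, 2, 5, 5, 2]               # free parameters tested per item

anchors = select_anchors(lr, f, g=2)
print(anchors)  # indices of the two items with the smallest LR/f
```

Dividing by f before ranking is what makes items with different numbers of free parameters comparable on one scale.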

A simulation study is carried out to

1) Evaluate the rank-based strategy in terms of frequency of selecting a set of group-invariant anchors, and

2) Compare test results from the all-other-items (AOI) anchor strategy and the proposed rank-based strategy.

Test length varies from 10 to 40, the percentage of DIF items varies from 0% to 80%, and the DIF is either uniform or non-uniform. Generated responses are five-category ordinal data.
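The crossed simulation design can be enumerated as a condition grid. A minimal sketch follows; note that only the ranges are stated above, so the intermediate factor levels (test length 20, DIF percentages between 0% and 80%) are assumptions for illustration.

```python
# Illustrative condition grid for the crossed simulation design.
# Only the endpoints of each range are given in the summary; the
# intermediate levels used here are assumed, not taken from the paper.
from itertools import product

test_lengths = [10, 20, 40]          # test length varies from 10 to 40
dif_percents = [0, 20, 40, 60, 80]   # percentage of DIF items, 0%-80%
dif_types = ["uniform", "non-uniform"]

conditions = list(product(test_lengths, dif_percents, dif_types))
print(len(conditions))  # fully crossed: 3 * 5 * 2 = 30 conditions
```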

Results show that a clean anchor item was selected with very high probability (over 70%) by the rank-based method, even when the proportion of DIF items reached 80%. However, when more anchor items must be selected, the chance of obtaining a completely clean anchor set can drop as low as 20%. Under all conditions, the rank-based method outperforms the AOI strategy for testing DIF.

Using a single anchor minimizes the chance of contamination, but in their study it led to less stable results and lower power.

One contribution of the proposed rank-based strategy is that the inclusion of the f parameter accommodates mixed-type items (e.g., items with different numbers of response options, or items following different IRT models).

Adapting this strategy to a factor analysis context yields the following steps:

a) Constrain all factor loadings to be equal and calculate the likelihood

b) Relax the factor loading of one indicator at a time and calculate the likelihood;

c) Obtain the LR test statistic and the difference in degrees of freedom, f;

d) Rank order the indicators based on the LR/f ratio;

e) Constrain the g variables with the smallest LR/f ratios to be equal, and let the others be free.
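The factor-analysis adaptation in steps a)–e) reduces to computing LR/f from pairs of fitted models and ranking. The sketch below is illustrative only: the log-likelihoods, degrees of freedom, and indicator names are hypothetical; in practice they would come from fitting constrained and partially relaxed multi-group CFA models.

```python
# Sketch of steps a-e in the factor-analysis adaptation. All fit values
# below are hypothetical placeholders, not results from a real model.

def lr_over_f(ll_constrained, ll_relaxed, df_constrained, df_relaxed):
    """LR statistic per degree of freedom freed when relaxing loadings."""
    lr = 2.0 * (ll_relaxed - ll_constrained)   # likelihood-ratio statistic
    f = df_constrained - df_relaxed            # df difference (usually f = 1)
    return lr / f

# Hypothetical fits: fully constrained model vs. models relaxing
# one indicator's loading at a time (each relaxation frees one df).
ll_full = -1250.0
relaxed = {"x1": -1249.7, "x2": -1244.1, "x3": -1249.9}

ratios = {ind: lr_over_f(ll_full, ll, df_constrained=10, df_relaxed=9)
          for ind, ll in relaxed.items()}
anchors = sorted(ratios, key=ratios.get)[:2]  # g = 2 smallest LR/f
print(anchors)  # the two most invariant indicators, kept constrained
```

With f = 1 per indicator, ranking on LR/f is the same as ranking on LR itself, which is why the strategy coincides with the modification-index approach in that case.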

Usually f = 1, and the strategy becomes essentially the same as the modification index method of Yoon and Millsap (2007). Of course, a purification procedure can be implemented to further improve this strategy.

Although in their study a single anchor item produced lower power than multiple anchor items, previous simulation studies (Wang, 2004; Wang & Yeh, 2003) show that results can be very accurate and power adequate as long as the sample size is large (e.g., 1,000 per group).

Results in this paper showed that power tended to be greater with non-uniform DIF than with uniform DIF when clean items were used as anchors, and that the number of correct models was usually greater with non-uniform DIF. The author attributed these findings to the simulation methodology, but I suspect they may be due to a cancellation effect of non-uniform DIF.