Sandy's readings and review

Topic 2: DIF


by HUANG Sheng Yun

Application of the DIF-free-then-DIF Strategy to the Logistic Regression Method in Assessment of Differential Item Functioning (Thesis of Mr. Chan)

Traditional methods of assessing DIF are not based on purely clean items; they actually depend on unclean items, so DIF detection may be confounded when these dirty items are included in the total score. The main contribution of the thesis is to apply the DIF-free-then-DIF strategy to the logistic regression method of DIF detection. The strategy is first to find one or several pure items to serve as anchor items, and then to assess DIF using the pure items found in the first stage. The new method improves the performance of DIF detection over the conventional approach. A series of simulations showed that the Type I error rate was well controlled when the strategy was adopted, particularly when the DIF was large, and that power was also reasonably good compared with other methods.

Main idea of the new method

1) DIF-free-then-DIF (first round)

The first round is a purification procedure proposed to find pure items. Set the first item as the anchor item, then compute the difference in -2 log-likelihood between two logistic regression models for every other item. In the same way, set the second, third, …, and the very last item as the anchor in turn and compute the statistics again. Finally, sum the -2 log-likelihood differences for each item, and designate the items with the smallest sums as DIF-free items (anchor items). A sketch of this scan is given below.
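Here is a minimal sketch of the first-round scan, assuming `responses` is an n_persons × n_items 0/1 matrix and `group` codes membership as 0/1. Using the anchor item's own score as the matching variable is my reading of the procedure, not necessarily the thesis's exact specification.

```python
import numpy as np
import statsmodels.api as sm

def dif_free_scan(responses, group):
    """Sum, for each item, the LR statistics obtained with every other
    item serving as the anchor; small sums suggest DIF-free items."""
    n_persons, n_items = responses.shape
    sums = np.zeros(n_items)
    for a in range(n_items):                   # each item takes a turn as anchor
        score = responses[:, a].astype(float)  # matching score from the anchor
        for j in range(n_items):
            if j == a:
                continue
            y = responses[:, j]
            X_red = sm.add_constant(score)                             # score only
            X_full = sm.add_constant(np.column_stack([score, group]))  # + group
            llf_red = sm.Logit(y, X_red).fit(disp=0).llf
            llf_full = sm.Logit(y, X_full).fit(disp=0).llf
            sums[j] += -2.0 * (llf_red - llf_full)  # difference in -2 log-likelihood
    return sums  # items with the smallest sums become the anchor candidates
```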

2) DIF-free-then-DIF (second round)

In the second round, the total score is computed only from the DIF-free items (also denoted as anchor items here). Then compute the -2 log-likelihood of two logistic regression models: the full model, which includes both the total score and group membership, and the reduced model, which includes only the total score. Whether DIF exists can then be assessed by hypothesis testing: when the test is significant, the null hypothesis is rejected (the coefficient of group membership is not equal to 0), which means the item shows DIF. A sketch of this likelihood-ratio test follows.
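A minimal sketch of the second-round test for a single studied item, assuming `anchor_set` holds the indices of the DIF-free items from round one. The 1-df chi-square reference for the group coefficient is standard for the logistic regression method; the rest of the setup is my assumption.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def dif_test(responses, group, anchor_set, item):
    """Likelihood-ratio test of uniform DIF for one item, matching on
    the total score over the anchor items only."""
    y = responses[:, item]
    total = responses[:, anchor_set].sum(axis=1).astype(float)
    X_red = sm.add_constant(total)                             # reduced model
    X_full = sm.add_constant(np.column_stack([total, group]))  # full model
    llf_red = sm.Logit(y, X_red).fit(disp=0).llf
    llf_full = sm.Logit(y, X_full).fit(disp=0).llf
    lr_stat = -2.0 * (llf_red - llf_full)
    p_value = chi2.sf(lr_stat, df=1)   # one extra parameter: group membership
    return lr_stat, p_value            # significant => the item shows DIF
```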

Comments, Questions and Future Study

1) Conventionally, DIF items have been detected with methods that do not exclude dirty items from the model. However, the performance of DIF detection may be degraded by these unclean items. The present study proposes the creative and constructive idea of finding pure items first, which resolves the problem mentioned above.

2) The fundamental idea of the paper sheds light on how to adjust the purification procedure used in my ongoing study. In my study, we adopted purification to obtain more accurate ability estimates and item parameters. However, the original method could not be shown to improve the detection of examinees with pre-knowledge. That method is also based on flagging bad items or bad persons and then discarding them to obtain more accurate estimates. Perhaps we can try the opposite approach and find pure items and persons first; I think it would be a promising way to improve item parameter recovery.

Reference

Wang, W.-C. (2008). Assessment of differential item functioning. Journal of Applied Measurement, 9, 387-408.

In reply to HUANG Sheng Yun

Re: Topic 2: DIF

by HUANG Sheng Yun

Assessment of Differential Item Functioning

The present paper is an overview of IRT-based DIF methods. The author first describes the difference between DIF and misfit, then simulates a data set under a DIF condition and applies a fit statistic to check whether the DIF item is flagged; the result shows that the fit statistic is not good enough to detect the DIF item. In the second section, the author briefly introduces three methods that have been widely used to build a common metric across groups: the equal-mean-difficulty (EMD) method, the all-other-item (AOI) method, and the constant-item (CI) method, explaining the assumptions and procedure of each. EMD works well only if the test has no DIF items or if the focal and reference groups are affected by DIF to the same degree. AOI likewise rests on the strong assumption that the test has no DIF items or that the studied item is the only DIF item in the test. The CI method designates a set of items as anchors; once DIF-free items can be found, it works quite well. In practice, the assumptions of EMD and AOI are too severe to meet, and even though CI is more flexible, how to find DIF-free items remains a vital issue to be solved.

The author then conducts a simulation to demonstrate the performance of the three methods, and the results confirm the hypotheses and limitations stated above. Moreover, the author proposes the DIF-free-then-DIF strategy, which includes two parts: 1) apply CI iteratively to find DIF-free items to serve as the anchor set (according to Wang et al. (2007), a set of four DIF-free items is enough to detect DIF); 2) apply CI to detect DIF. The final point I took from the paper is that if the amount of DIF exceeds a cut-off (0.5 logit or an odds ratio of 2.72), the item needs to be removed from the test. A small sketch contrasting the EMD and CI ways of linking the group metrics follows.
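To make the contrast between the linking assumptions concrete, here is a minimal sketch, under my own illustrative setup, of how EMD and CI would place two groups' Rasch difficulty estimates on a common metric; `b_ref` and `b_foc` are difficulty vectors estimated separately in each group, and `anchor_set` indexes items assumed to be DIF-free.

```python
import numpy as np

def emd_link(b_ref, b_foc):
    # EMD: force the mean difficulty to be equal across groups; the link
    # is distorted unless the test is DIF-free or the DIF amounts cancel.
    return b_foc - (b_foc.mean() - b_ref.mean())

def ci_link(b_ref, b_foc, anchor_set):
    # CI: equate the metrics using only the designated anchor items, so
    # the link stays clean whenever the anchors are truly DIF-free.
    shift = b_foc[anchor_set].mean() - b_ref[anchor_set].mean()
    return b_foc - shift
```

After linking, the difference between a studied item's linked focal difficulty and its reference difficulty estimates that item's DIF amount.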

Comments, Questions and Future Study

1) In my opinion, the methods introduced here are IRT-based, because they do not involve the total score. However, the paper illustrates how to model DIF through linear modeling under different models, such as the Rasch model for dichotomous items, the PCM for polytomous items, the facets model, and the testlet model, so the procedure also seems somewhat similar to non-IRT-based approaches. I do not yet have enough knowledge to figure out exactly what the procedure is.

2) DIF items and misfit items are totally different: DIF means that examinees with the same ability have different expected probabilities of answering an item correctly if they belong to different groups (e.g., gender or race). However, DIF items are removed from the test only if they raise fairness concerns across groups, whereas misfit items have to be removed in any situation. Given this, I think we should first examine which items misfit and delete them, and then examine which items show DIF if the test involves group comparisons.

In reply to HUANG Sheng Yun

Re: Topic 2: DIF

by HUANG Sheng Yun

DIF Trees: Using Classification Trees to Detect Differential Item Functioning

This paper proposes a decision tree (classification tree) method for detecting uniform DIF and compares it with two conventional methods, Mantel-Haenszel (MH) and logistic regression (LR). The MH algorithm is based on a three-way contingency table of frequencies for the reference and focal groups across all score levels, from which a common odds ratio can be obtained for each item. With the LR method, uniform DIF is indicated when the main-effect coefficient of group is significant, and nonuniform DIF when the coefficient of the interaction term is significant. The tree method in the present study adopts the deviance of Venables and Ripley (2002). A series of simulations compared all of the aforementioned methods. The results indicate that all three methods have high power when the sample size is large; the tree method has the best power and a Type I error rate identical to that of the other two. A limitation noted in the paper is that a principled way to choose the splitting criterion of the tree method is still needed. A sketch of the MH common odds ratio appears below.
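Below is a minimal sketch of the MH common odds ratio for a single item, stratified by total score. The formula is the standard Mantel-Haenszel estimator; the data layout (`responses`, `group`) is my assumption.

```python
import numpy as np

def mh_common_odds_ratio(responses, group, item):
    """Mantel-Haenszel common odds ratio across score levels;
    group: 0 = reference, 1 = focal."""
    total = responses.sum(axis=1)       # matching variable: total score
    num = den = 0.0
    for k in np.unique(total):          # one 2x2 table per score level
        at_k = total == k
        n_k = at_k.sum()
        right = responses[at_k, item] == 1
        ref = group[at_k] == 0
        A = np.sum(right & ref)         # reference group, correct
        B = np.sum(~right & ref)        # reference group, incorrect
        C = np.sum(right & ~ref)        # focal group, correct
        D = np.sum(~right & ~ref)       # focal group, incorrect
        num += A * D / n_k
        den += B * C / n_k
    return num / den   # far from 1 (log far from 0) indicates uniform DIF
```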