The study extends a bifactor MIRT model for testlets, in which the random effect for ability is the primary dimension and the random effect for the testlet is the secondary dimension, to DIF detection. Specifically, an additional term representing the DIF effect is incorporated into the model.
It was argued that the proposed DIF detection model differs from the previous study in two ways: (1) DIF magnitudes in this study were estimated under the assumption that the average DIF magnitude is zero (that is, the Equal-Mean-Difficulty, or EMD, method), whereas Wang, Bradlow, Wainer and Muller (2008) estimated the DIF magnitude using all other items in the test as anchor items (that is, the All-Other-Item method); and (2) the impact effect was considered and could be estimated.
However, the EMD method is correct only when there are no DIF items at all, or when some DIF items favor the reference group while others favor the focal group so that the DIF effects cancel out. In this study, the DIF items in three testlets were simulated to have positive DIF, whereas the DIF items in the other three testlets had negative DIF. In this situation, which is favorable to the EMD assumption, the bifactor MIRT DIF model could produce better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model. In practice, if the DIF effects do not cancel out, the EMD method could be problematic.
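To make this concern concrete, the following is a minimal sketch of why the EMD constraint matters. It uses a simplified Rasch-type parameterization chosen for illustration only; the symbols (\theta_n, b_i, \delta_i, \mu_F, G_n) are assumptions and are not taken from the manuscript, whose model also includes discrimination parameters and testlet effects.

```latex
% Illustrative notation (assumed, not the manuscript's):
%   G_n = 1 for focal-group examinees, 0 for reference-group examinees
%   \delta_i = DIF magnitude of item i, \mu_F = impact (focal-group mean ability)
\[
  \operatorname{logit} P(Y_{ni}=1) = \theta_n - b_i - \delta_i G_n ,
  \qquad \theta_n \mid G_n = 1 \sim N(\mu_F, \sigma_F^2).
\]
% For any constant c, shifting all DIF parameters and the impact together
% leaves the focal-group likelihood unchanged:
\[
  (\theta_n - c) - b_i - (\delta_i - c) = \theta_n - b_i - \delta_i .
\]
% The EMD constraint \sum_i \delta_i = 0 resolves this indeterminacy by
% setting c = \bar{\delta}, so the quantities actually estimated are
\[
  \delta_i^{\mathrm{EMD}} = \delta_i - \bar{\delta},
  \qquad
  \mu_F^{\mathrm{EMD}} = \mu_F - \bar{\delta},
  \qquad
  \bar{\delta} = \frac{1}{I}\sum_{i=1}^{I} \delta_i .
\]
```

Under these assumptions, the EMD estimates recover the true DIF magnitudes and the true impact only when \bar{\delta} = 0. When the true DIF effects do not cancel, every DIF estimate is shifted by \bar{\delta} and the same amount is absorbed into the impact estimate, so DIF-free items may be flagged and the impact may be misestimated.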
Wang and Wilson (2005a, 2005b) proposed a DIF detection model by extending their Rasch testlet response model. It was argued that since the model in this study was based on a 2PL testlet response theory model, it could not be compared with Wang and Wilson’s model (2005a, 2005b). But if item discrimination were fixed at 1.00 across items, could the two models be compared?
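The reasoning behind this question can be sketched as follows, using a commonly written form of the 2PL testlet model. The notation (a_i, b_i, \gamma_{n d(i)}) is assumed for illustration and may differ from the parameterization in the manuscript, in particular if the bifactor model places a separate discrimination on the testlet dimension.

```latex
% Illustrative notation (assumed): d(i) is the testlet containing item i,
% \gamma_{n d(i)} is the testlet effect of person n on that testlet.
\[
  \operatorname{logit} P(Y_{ni}=1) = a_i\bigl(\theta_n - b_i - \gamma_{n d(i)}\bigr).
\]
% Fixing a_i = 1 for all items gives a Rasch-type testlet model:
\[
  \operatorname{logit} P(Y_{ni}=1) = \theta_n - b_i - \gamma_{n d(i)} .
\]
% The sign attached to \gamma is a convention, because the testlet effect
% is assumed to be symmetrically distributed around zero.
```

If the constrained model takes this form, it has the same structure as a Rasch testlet response model, which suggests that a comparison with Wang and Wilson’s DIF model would be feasible under that constraint.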