Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models
Because the MIMIC model was proposed to detect uniform DIF only, the author proposed, as an intuitive extension, the MIMIC-interaction model to detect both nonuniform and uniform DIF. In the present paper, a series of simulations was conducted on IRT-LR-DIF, MIMIC, and MIMIC-interaction. The results show that MIMIC-interaction has the best hit rate; however, its false positive rate was inflated. An empirical data set from the Loss of Control Scale was also analyzed with MIMIC-interaction. In that analysis, p values were adjusted with the Benjamini-Hochberg (BH) correction to avoid an inflated false positive rate.
Comments, Questions, and Future Study:
1) Because the results show that MIMIC-interaction has an inflated false positive rate, its hit rate cannot be fairly compared with that of methods whose false positive rates are well controlled, so the claim that the MIMIC-interaction model has the best hit rate is not well supported. The author could apply the BH correction in the simulations and then compare the new results with IRT-LR-DIF and MIMIC. In my opinion, the BH correction would have been better applied in the preceding simulation studies as well, rather than only in the analysis of the empirical example.
2) In some conditions, IRT-LR-DIF has the best performance, and the author attributes this to inaccurate estimation of bF. The author also attributes the false positive rate inflation to the software. So when IRT-LR-DIF works well, the explanation is a biased bF estimate, but when MIMIC-interaction shows an inflated false positive rate, the explanation is a software problem. This reads as an attempt to promote MIMIC-interaction as a procedure without any weakness of its own. I am not sure whether this impression comes from my misunderstanding of the paper, but I agree with the author's suggestion that we need to look for a more appropriate LMS to evaluate MIMIC-interaction.
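To make the suggestion in point 1) concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment as it could be applied to the item-level DIF p values within a simulation replication. This is only an illustration, not the author's code: the function names `bh_adjust` and `flag_dif`, the 0.05 threshold, and the example p values are all hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the original item order."""
    m = len(p_values)
    # Item indices sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def flag_dif(p_values, alpha=0.05):
    """Flag an item as showing DIF when its BH-adjusted p value is at most alpha."""
    return [p <= alpha for p in bh_adjust(p_values)]


# Hypothetical p values for four studied items (not taken from the paper).
example_p = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(example_p))
print(flag_dif(example_p))
```

Running the same adjustment on the p values from all three methods in every replication would put their hit rates on a more comparable footing, since each method's false positive rate would then be controlled in the same way.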