Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
Jihye Kim and T. C. Oshima
The main goal of this study is to investigate how adjustment procedures for multiple testing affect differential item functioning (DIF) detection.
Four DIF detection methods: the Mantel–Haenszel (MH) method (Holland & Thayer, 1988), the logistic regression (LR) procedure (Swaminathan & Rogers, 1990), the Differential Functioning of Items and Tests (DFIT) framework (Raju, van der Linden, & Fleer, 1995), and Lord's chi-square test
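To make the first method concrete, here is a minimal sketch of the MH statistic for one dichotomous item, assuming examinees are stratified by a matching score (for example, total test score) and labeled "R" (reference) or "F" (focal). The function name and input layout are hypothetical. The code follows the standard continuity-corrected MH chi-square and the MH common odds ratio, not necessarily the exact settings used in the study.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(item, total, group):
    """MH DIF statistics for one dichotomous item (hypothetical helper).

    item  : 0/1 responses to the studied item
    total : matching score for each examinee (defines the strata)
    group : "R" (reference) or "F" (focal) for each examinee
    Returns (MH chi-square with continuity correction, common odds ratio, p-value).
    """
    # counts[k] = [A, B, C, D]: reference correct, reference incorrect,
    #                           focal correct, focal incorrect
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for x, k, g in zip(item, total, group):
        idx = (0 if x == 1 else 1) + (0 if g == "R" else 2)
        counts[k][idx] += 1

    sum_a = sum_ea = sum_var = 0.0
    num_or = den_or = 0.0
    for a, b, c, d in counts.values():
        t = a + b + c + d
        if t < 2:
            continue  # a stratum with fewer than 2 examinees carries no information
        n_r, n_f = a + b, c + d   # group sizes within the stratum
        m1, m0 = a + c, b + d     # numbers correct / incorrect within the stratum
        sum_a += a
        sum_ea += n_r * m1 / t                              # E(A_k) under no DIF
        sum_var += n_r * n_f * m1 * m0 / (t * t * (t - 1))  # Var(A_k) under no DIF
        num_or += a * d / t
        den_or += b * c / t

    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var if sum_var > 0 else 0.0
    alpha_mh = num_or / den_or if den_or > 0 else float("inf")
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, alpha_mh, p_value
```

An odds ratio near 1 indicates no DIF; values far from 1 indicate that, at matched ability, one group has higher odds of answering correctly. The resulting p-values, one per item, are what a multiple testing adjustment would then be applied to.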
Three adjustment procedures: the Bonferroni correction, Holm's procedure, and the Benjamini–Hochberg (BH) false discovery rate (FDR) procedure
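The three adjustment procedures can be sketched as functions that take one p-value per item and return which items are flagged. This is a minimal sketch of the textbook definitions; the significance level of 0.05 is an assumption, since the notes do not state the level used.

```python
def bonferroni(pvals, alpha=0.05):
    # Flag an item if its p-value is at most alpha / m
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    # Step-down: compare the r-th smallest p-value (r = 1..m) with alpha / (m - r + 1)
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank = 0, 1, ..., m - 1
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    # Step-up: find the largest k with p_(k) <= k * alpha / m, then flag the k smallest
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

For example, with p-values [0.005, 0.011, 0.02, 0.5], Bonferroni flags the first two items, while Holm and BH both flag the first three. Bonferroni and Holm control the familywise error rate, Holm being uniformly at least as powerful; BH controls the expected proportion of false discoveries among flagged items and is typically the least conservative of the three.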
Sample sizes (reference/focal): 1000/1000 and 500/500
DIF items/test length: 3/20 and 6/40
Type I error rates and power were computed under each condition.
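A common way to compute these two quantities in a DIF simulation is sketched below: the Type I error rate is the proportion of non-DIF items that are flagged, and power is the proportion of DIF items that are flagged, pooled over replications. This pooling is an assumption; the study may have aggregated differently (for example, per item or per replication). The function name is hypothetical.

```python
def type1_error_and_power(flags, dif_items):
    """flags: one list of booleans per replication (True = item flagged as DIF)
    dif_items: set of indices of items simulated with DIF"""
    false_pos = n_null = true_pos = n_dif = 0
    for rep in flags:
        for j, flagged in enumerate(rep):
            if j in dif_items:
                n_dif += 1
                true_pos += flagged   # True counts as 1
            else:
                n_null += 1
                false_pos += flagged
    type1 = false_pos / n_null if n_null else float("nan")
    power = true_pos / n_dif if n_dif else float("nan")
    return type1, power
```

With 3 DIF items out of 20, for instance, each replication contributes 17 items to the Type I error rate and 3 to power.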
The results show that MH and LR benefited from Holm's or the BH adjustment procedure at all test lengths and sample sizes considered, whereas the IRT-based procedures (DFIT and Lord's chi-square test) did not benefit from the adjustment procedures, because their Type I error rates were not inflated under the conditions in this study.
Comments:
The simulation study uses an IRT model to generate the data, which may explain why no Type I error inflation was observed for the IRT-based procedures. If the data were generated under a 3PL model and DIF were then analyzed with a 2PL or Rasch model, the adjustment procedures might show a noticeable effect.
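The data-generation step of this proposed check can be sketched as follows: responses are generated under a 3PL model, with uniform DIF introduced by shifting the focal group's difficulty on the DIF items. Everything here is hypothetical: the scaling constant D = 1.7, standard normal abilities, the item parameters, and the size of the shift are illustrative assumptions. The subsequent 2PL or Rasch DIF analysis is not shown.

```python
import math
import random

def p_3pl(theta, a, b, c, D=1.7):
    # 3PL item response function; D = 1.7 is a conventional scaling constant (assumed)
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def simulate_responses(n, items, rng):
    # Abilities drawn from N(0, 1) (an assumption); items is a list of (a, b, c) tuples
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [[1 if rng.random() < p_3pl(t, a, b, c) else 0 for (a, b, c) in items]
            for t in thetas]

def simulate_dif_data(n_ref, n_focal, items, dif_items, b_shift, seed=0):
    # Uniform DIF: items in dif_items are harder by b_shift for the focal group only
    rng = random.Random(seed)
    focal_items = [(a, b + b_shift, c) if j in dif_items else (a, b, c)
                   for j, (a, b, c) in enumerate(items)]
    ref = simulate_responses(n_ref, items, rng)
    focal = simulate_responses(n_focal, focal_items, rng)
    return ref, focal
```

For example, `simulate_dif_data(500, 500, [(1.0, 0.0, 0.2)] * 20, {0, 1, 2}, 0.5)` mimics the 500/500, 3/20 condition with hypothetical parameters. Fitting a 2PL or Rasch model to such data ignores the lower asymptote c, which is the model misspecification the comment suggests could inflate Type I errors.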