A Comparison Between Some Generalized Mantel-Haenszel Statistics for Detecting DIF in Data Simulated Under the Graded Response Model
Ángel M. Fidalgo and Dave Bartram
Applied Psychological Measurement 2010
Main idea:
The purpose of the study is to compare the relative efficacy of the generalized Mantel-Haenszel (GMH) statistic (Mantel & Haenszel, 1959) and the Mantel test (Mantel, 1963) for detecting a large number of differential item functioning (DIF) patterns.
The differences between them are as follows:
1 For polytomous data, the GMH treats the response categories as nominal data, whereas the Mantel test considers the ordinal nature of the response categories in polytomous items.
2 Under the GMH test, the alternative hypothesis (H1) states that the distribution of the response variable differs across groups in an unspecified pattern. Under the Mantel test, H1 states that the mean responses differ across groups.
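To make the contrast between the two hypotheses concrete, here is a minimal sketch of both statistics for K stratified 2 x J tables (strata = matching score levels). It uses the standard textbook formulas, not the GMHDIF program; the function name mantel_and_gmh, the array layout, and the example counts are assumptions made for illustration.

```python
import numpy as np

def mantel_and_gmh(tables, scores=None):
    """Return (Mantel chi-square, GMH chi-square) for K stratified 2 x J tables.

    tables: array of shape (K, 2, J); row 0 = reference group, row 1 = focal group.
    scores: category scores used by the Mantel test (default: integers 0..J-1).
    Assumes every response category is observed in at least one stratum.
    """
    tables = np.asarray(tables, dtype=float)
    n_cat = tables.shape[2]
    y = np.arange(n_cat, dtype=float) if scores is None else np.asarray(scores, dtype=float)

    diff_mean, var_mean = 0.0, 0.0              # Mantel: one mean-score contrast
    diff_gen = np.zeros(n_cat - 1)              # GMH: J-1 cell-count contrasts
    cov_gen = np.zeros((n_cat - 1, n_cat - 1))

    for t in tables:
        n = t.sum()
        if n < 2:
            continue                            # stratum carries no information
        n_ref, n_foc = t[0].sum(), t[1].sum()
        col = t.sum(axis=0)                     # category totals in this stratum
        c = n_ref * n_foc / (n**2 * (n - 1))    # hypergeometric variance factor

        # Mantel test: observed minus expected sum of scores in the focal group.
        diff_mean += y @ t[1] - n_foc * (y @ col) / n
        var_mean += c * (n * ((y**2) @ col) - (y @ col) ** 2)

        # GMH test: observed minus expected counts in the first J-1 categories
        # (reference row; the sign does not matter for the quadratic form).
        diff_gen += (t[0] - n_ref * col / n)[:-1]
        cov_gen += (c * (n * np.diag(col) - np.outer(col, col)))[:-1, :-1]

    mantel = diff_mean**2 / var_mean                       # df = 1
    gmh = diff_gen @ np.linalg.solve(cov_gen, diff_gen)    # df = J - 1
    return mantel, gmh

# Hypothetical single stratum: both groups have the same mean score (1.0),
# but the focal group avoids the middle category.
balanced = [[[10, 30, 10],
             [25, 0, 25]]]
mantel_stat, gmh_stat = mantel_and_gmh(balanced)
print(mantel_stat, gmh_stat)
```

In this made-up table the Mantel statistic is zero because the group means coincide, while the GMH statistic is large (about 42.4 on 2 degrees of freedom) because the response distributions differ. For dichotomous items (J = 2) the two statistics coincide.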
A simulation study was conducted with the following conditions:
2 test lengths: 10 items, 20 items
2 latent trait mean difference conditions: N(0, 1), N(–1, 1)
8 DIF patterns:
1 constant DIF pattern
1 balanced DIF pattern
3 low-unbalanced DIF patterns
3 high-unbalanced DIF patterns
2 DIF magnitudes: 0.25, 0.40
Sample size: 500
1,000 data sets were generated using the GAUSS program.
The GMHDIF program was used to analyze the simulated data sets.
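The data-generation step can be sketched as follows. This is not the authors' GAUSS code. It assumes Samejima's graded response model with a logistic link, a four-category item, and hypothetical item parameters. It also assumes that constant DIF shifts every threshold by the same amount and that balanced DIF shifts the lowest and highest thresholds in opposite directions; these notes do not give the paper's exact parameterization. Group sizes of 500 each are likewise an assumption.

```python
import numpy as np

def simulate_grm(theta, a, b, rng):
    """Draw one graded-response item score (0..J-1) per examinee.

    theta: abilities, shape (N,); a: discrimination; b: increasing thresholds, shape (J-1,).
    """
    theta = np.asarray(theta, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    # Cumulative probabilities P(X >= j | theta) for j = 1..J-1.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # One uniform draw per examinee; the score is the number of thresholds passed.
    u = rng.random((theta.shape[0], 1))
    return (u < p_star).sum(axis=1)

rng = np.random.default_rng(2010)

a_item = 1.5                              # hypothetical discrimination
b_ref = np.array([-1.0, 0.0, 1.0])        # hypothetical reference-group thresholds
dif = 0.40                                # one of the two DIF magnitudes in the design

b_constant = b_ref + dif                           # assumed constant pattern
b_balanced = b_ref + np.array([dif, 0.0, -dif])    # assumed balanced pattern

theta_ref = rng.normal(0.0, 1.0, size=500)    # reference group, N(0, 1)
theta_foc = rng.normal(-1.0, 1.0, size=500)   # focal group, N(-1, 1) condition

x_ref = simulate_grm(theta_ref, a_item, b_ref, rng)
x_foc = simulate_grm(theta_foc, a_item, b_constant, rng)
```

Under these assumptions, the constant pattern lowers the focal group's expected item score at every ability level, whereas the balanced pattern changes the shape of the score distribution while leaving the expected score at theta = 0 unchanged for this symmetric item.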
Results:
1 The type of score used to compute the Mantel test influences its capacity for detecting different patterns of DIF. When the simulated DIF pattern is low-unbalanced or balanced, the use of rank scores, compared with the customary integer scores, maintains excellent control of Type I error.
For the detection of constant DIF, the two scoring systems offer similar results.
2 Both statistics displayed good control over their Type I error rates for both the null and non-null conditions.
3 The larger the shifts in means between groups involved in the DIF pattern, the more powerful the Mantel test is relative to the GMH test, because the Mantel test is directed specifically at mean differences. This is the case for the constant pattern. On the other hand, when the mean differences are minimal, the GMH test is more powerful than the Mantel test.
Finally, the authors recommend using the GMH test, because it can detect more complex patterns of association and also avoids having to justify a potentially problematic choice of scores or to continue with the arbitrary use of integer scores.
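The choice between integer and rank scores mentioned in Result 1 can be illustrated with a short sketch. It assumes that rank scores are midranks computed from the category totals of a stratum; whether the paper or the GMHDIF program computes them in exactly this way is an assumption, and rescaled variants such as ridits also exist. The helper name midrank_scores and the category totals are hypothetical.

```python
import numpy as np

def midrank_scores(col_totals):
    """Midrank score for each ordered response category, given its total count."""
    col = np.asarray(col_totals, dtype=float)
    below = np.cumsum(col) - col          # number of responses in lower categories
    return below + (col + 1.0) / 2.0      # average rank of the tied responses

integer_scores = np.arange(3)                  # customary scores 0, 1, 2
rank_scores = midrank_scores([35, 30, 35])     # hypothetical category totals
print(integer_scores, rank_scores)
```

For these totals the midranks are 18, 50.5, and 83. Unlike integer scores, the spacing between rank scores depends on how many responses fall in each category, so the scores are determined by the data instead of being chosen by the analyst.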
Comments:
1 In my opinion, the DIF patterns are somewhat complex and not clearly explained.
2 The sample size is fixed at 500. Future studies could consider additional sample-size conditions.
3 The structure of the article is very clear and easy to follow.