The purpose of this paper was to examine the performance of three statistics for assessing goodness-of-fit for the Rasch model: R1, R2, and M2. Simulations were conducted to compare their accuracy under correct model specification, their accuracy and power to reject a 2-PL model under model misspecification, and their power to reject a 3-PL model under correct model specification. The authors also provided two empirical examples comparing the fit indices, using the LSAT 7 data and the Chilean Mathematical Proficiency data. It was concluded that the three statistics perform better than a chi-square test does when the sample size is small, and that M2 outperforms the other two in accuracy and power under both correct model specification and misspecification. However, findings from the empirical data supported the use of chi-square tests, not the three statistics, although the authors argued that the data are sparse.
Comments:
1) In Section 5.3, it was stated that “the proposed asymptotic procedure approximates fairly well the empirical rejection rates of R1, R2, and M2, but not of R1*.” However, the values of R1* in Table 3 and Table 4 were close to those of R1, even though they were not as high as the values yielded by the M2 method. I am curious why the authors made such a statement.
2) Most goodness-of-fit indices (e.g., RMSEA) provide a cut-off value for judging a model as good or bad, but information regarding how badly or how well the model fits the data has not been well documented in the literature. A fit index that can tell users how far a tested model deviates from a well-fitting model would be appreciated, particularly when no alternative models are available in research. This might be a good topic for future studies.