The purpose of this paper is to demonstrate that the bifactor model handles vertical scaling with construct shift better than the unidimensional IRT model or the testlet model. Under vertical scaling with construct shift, it is assumed that the meaning of the latent trait changes across grade groups. The bifactor model therefore uses grade-specific dimensions to represent the group-specific latent traits. In addition, the test is still intended to assess one overall construct, which the bifactor model represents as the general dimension. For example, a mathematics assessment for Grades 3, 4, and 5 may cover different content at each grade, such as Algebra for Grade 3, Geometry for Grade 4, and Calculus for Grade 5. Whatever linking design is used, it is intuitive to model the different content areas with different abilities. This is reasonable because a student who does well in Geometry might do poorly in Algebra. However, the purpose of the assessment is to measure mathematical ability, so we still assume a general mathematical ability that spans the three grades.
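To make the structure concrete, here is a minimal sketch of how a bifactor item response probability could be computed. It assumes a compensatory logistic form with a slope-intercept parameterization; the paper's own Function 1 may be parameterized differently, and the function name, argument names, and numeric values below are hypothetical, chosen only for illustration.

```python
import math


def bifactor_prob(theta_general, theta_specific, a_general, a_specific, d):
    """Probability of a correct response under an assumed logistic bifactor model.

    theta_general  : general ability shared across grades (e.g. mathematics)
    theta_specific : grade-specific ability (e.g. Geometry for Grade 4)
    a_general      : discrimination on the general dimension
    a_specific     : discrimination on the grade-specific dimension
    d              : item intercept (hypothetical slope-intercept form)
    """
    logit = a_general * theta_general + a_specific * theta_specific + d
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical student: above average in general ability,
# below average on the grade-specific dimension.
p_bifactor = bifactor_prob(0.5, -1.0, a_general=1.2, a_specific=0.8, d=0.0)

# Setting a_specific to 0 removes the grade-specific dimension,
# which reduces the model to a unidimensional 2PL-style model.
p_unidim = bifactor_prob(0.5, -1.0, a_general=1.2, a_specific=0.0, d=0.0)

print(round(p_bifactor, 4), round(p_unidim, 4))
```

In this sketch the weak grade-specific ability pulls the predicted probability below what the general ability alone would suggest, which is the kind of difference a unidimensional model cannot represent.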
In this paper, the author conducted a simulation study to compare the parameter recovery of the bifactor model with that of the unidimensional IRT model. He also manipulated the sample size, the degree of construct shift (the variance of the specific ability), and the percentage of common items to observe the influence of these factors. In addition, real data were analyzed to compare the models using fit indices.
1) A problem in this simulation study is that the item discrimination parameters were fixed to their true values during estimation. I think the author could simply drop the discrimination parameter on the specific theta in Function 1 to avoid this objection.
2) I am curious why the percentage of common items had no effect in the simulation study. The author states that Simon (2008) found a result consistent with this study. I would have expected this factor to affect the precision of the general ability estimates.
3) ANOVA was used to test the effects of the three factors on the bias, RMSE, and SE of all parameter estimates. These dependent variables violate the normality assumption of ANOVA. This point may not be very important, though, and it is a creative way to examine these indices.
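For reference on the last comment, the three recovery indices can be sketched as follows. This is only an illustration under common definitions (bias as mean estimate minus true value, SE as the standard deviation of the estimates across replications, using the population form); the paper may define them differently, and the function name and example values are hypothetical.

```python
import math


def recovery_summary(estimates, true_value):
    """Return (bias, rmse, se) for one parameter across replications.

    estimates  : list of estimates of the same parameter, one per replication
    true_value : generating (true) value of that parameter
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    # RMSE measures total error around the true value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    # SE measures spread around the mean estimate (population form, divisor n).
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
    return bias, rmse, se


# Hypothetical estimates from three replications of an item with true value 1.0.
bias, rmse, se = recovery_summary([0.9, 1.1, 1.3], true_value=1.0)
print(bias, rmse, se)
```

Under these definitions RMSE and SE can never be negative, so their distributions across conditions are bounded below at zero, which is one reason the normality assumption is doubtful for them.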