This paper examines in detail the problem of modifying an initial model that does not fit the sample data well. Simulation studies are carried out to demonstrate how this data-driven procedure capitalizes on chance.
In applications of covariance structure modeling (and of structural equation modeling more generally), researchers typically fit an initial model, derived from theory or previous research, to the sample data. When the initial model does not fit the sample well, they tend to modify it according to one of the modification indices that most SEM software provides.
However, the final model obtained by modifying the initial model according to one specific sample may differ from the final model obtained from another sample. Moreover, the final model, although it fits the sample at hand well, may not fit other samples well. The reason is that model modifications that substantially improve the model fit may merely fit chance characteristics of the sample, rather than represent aspects of the model that generalize to other samples and to the population. This is the so-called “problem of capitalization on chance”.
Two studies are carried out to investigate the stability and cross-validity of model modifications. The initial model is fairly complex: 21 manifest variables that follow an SEM with four or seven factors. Cross-validation indices based on the discrepancy function F(S, Sigma) are used. In the single-sample case, model modifications improve the model fit under all conditions, according to every index.
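For concreteness, the discrepancy function and a two-sample cross-validation index can be written out as follows. This is a sketch under assumptions: the text does not say which discrepancy function is used, so the maximum likelihood form is assumed here, and the symbols p, S_v, \hat{\Sigma}_c and the label CVI are notation introduced only for illustration.

```latex
F_{\mathrm{ML}}(S, \Sigma) = \ln|\Sigma| - \ln|S| + \operatorname{tr}\!\left(S\Sigma^{-1}\right) - p,
\qquad
\mathrm{CVI} = F\!\left(S_v, \hat{\Sigma}_c\right)
```

Here p is the number of manifest variables, S_v is the covariance matrix of a validation sample, and \hat{\Sigma}_c is the model-implied covariance matrix estimated from a separate calibration sample. A small CVI indicates that a model fitted to one sample also reproduces the covariance structure of the other.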
The single-sample improvement is not surprising, because each successive modified model is chosen to give the maximum decrease in the indices. In the two-sample case, however, the two samples give inconsistent results under several conditions, especially with medium or small sample sizes. Moreover, the selection of modified models is highly subject to sampling fluctuations. This may be evidence of the problem of capitalization on chance.
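The contrast between single-sample and two-sample results can be illustrated with a deliberately small toy simulation. It is not the design of the studies described here: it assumes 8 uncorrelated standard normal variables, an initial model with a diagonal covariance matrix (which is therefore correct in the population), the ML discrepancy function, and a single modification step that frees the covariance with the largest absolute sample correlation as a stand-in for a modification index. All function names and settings are hypothetical.

```python
import numpy as np


def f_ml(S, Sigma):
    """ML discrepancy F(S, Sigma) = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(np.linalg.solve(Sigma, S)) - p


def fit_initial(S):
    """Initial model (assumed): all variables uncorrelated, so Sigma is diagonal."""
    return np.diag(np.diag(S))


def fit_modified(S):
    """Free the one covariance with the largest absolute sample correlation.

    With a single freed pair the ML estimate simply copies that element of S.
    Returns the fitted matrix and the index pair that was freed.
    """
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)
    np.fill_diagonal(R, 0.0)
    i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
    Sigma = np.diag(np.diag(S))
    Sigma[i, j] = Sigma[j, i] = S[i, j]
    return Sigma, (int(min(i, j)), int(max(i, j)))


def simulate(n=50, p=8, reps=200, seed=0):
    """Draw calibration/validation sample pairs and summarize the modification."""
    rng = np.random.default_rng(seed)
    improved_single = worse_cross = same_pair = 0
    for _ in range(reps):
        # Population covariance is the identity, so every freed covariance is noise.
        cal = np.cov(rng.standard_normal((n, p)), rowvar=False)
        val = np.cov(rng.standard_normal((n, p)), rowvar=False)
        init = fit_initial(cal)
        mod, pair_cal = fit_modified(cal)
        _, pair_val = fit_modified(val)
        # Single-sample fit: evaluated on the sample used for the modification.
        improved_single += int(f_ml(cal, mod) <= f_ml(cal, init))
        # Two-sample fit: calibration estimates evaluated on the validation sample.
        worse_cross += int(f_ml(val, mod) > f_ml(val, init))
        # Stability: do both samples suggest the same modification?
        same_pair += int(pair_cal == pair_val)
    return {
        "improved_single": improved_single / reps,
        "worse_cross": worse_cross / reps,
        "same_pair": same_pair / reps,
    }


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the modification can never worsen the fit to the sample that suggested it, while the cross-validated discrepancy and the agreement between samples show how much of that improvement is due to chance. Changing n gives a rough sense of the role of sample size.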
In the context of confirmatory analysis, the problem of capitalization on chance is critical for model development. The concern is relevant to the vast majority of studies in which model modification is applied. In the context of exploratory analysis, however, the value of model modification procedures outweighs the problem of capitalization on chance, because researchers must first construct a rough model when little is known about the structure of relationships among the variables and then explore variations of that model. In these situations, the initial model has to be improved by modifications based on the sample.
This paper also shows that the model modification process should not be purely mechanical. It should be based on theory, and each modification should be interpretable at a conceptual level.