The estimation of the IRT reliability coefficient and its lower
and upper bounds, with comparisons to CTT reliability statistics
Seonghoon Kim • Leonard S. Feldt
Asia Pacific Educ. Rev. (2010) 11:179–188
Purpose of the study
1) to investigate the mathematical characteristics of the test reliability coefficient as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient
2) to examine relative performances of the IRT reliability statistics and two classical test theory (CTT) reliability statistics (Cronbach’s alpha and Feldt–Gilmer congeneric coefficients) under various testing conditions
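Of the two CTT statistics compared, Cronbach's alpha is straightforward to compute from raw item scores. A minimal sketch, using made-up data (the Feldt–Gilmer congeneric coefficient requires a more involved computation and is not shown):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an examinees-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 examinees, 4 dichotomously scored items
x = [[1, 1, 0, 1],
     [0, 0, 0, 1],
     [1, 1, 1, 1],
     [1, 0, 1, 0],
     [0, 0, 0, 0]]
print(round(cronbach_alpha(x), 3))  # alpha for this toy data set
```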
Two studies were conducted to investigate theoretical characteristics of the IRT item- and test-level reliability statistics and to examine relative performances of the IRT and CTT test reliability coefficients using real data.
The first study concerned dichotomously scored multiple-choice (MC) item tests and used the three-parameter logistic (3PL) model to manipulate the degrees of discrimination, difficulty, and guessing for the MC items. The results suggest that, to enhance test score reliability, test developers should use MC items that have high discrimination, are affected little by guessing, and have difficulty matched to the average ability level of the target examinee group.
The second study used the 3PL model and the generalized partial credit (GPC) model (Muraki 1992) to deal with mixed-format tests that contained both MC items and polytomously scored constructed-response (CR) items.
The results show that the alpha coefficient is slightly less than the Feldt–Gilmer coefficient for all the 3PL tests and the Science GPC test, but substantially less than the Feldt–Gilmer coefficient for the other (GPC and 3PL+GPC) tests.
The results show that
(1) the IRT reliability coefficient was higher than the CTT reliability statistics;
(2) the IRT reliability coefficient was closer to the Feldt–Gilmer coefficient than to the Cronbach’s alpha coefficient;
(3) the alpha coefficient was close to the lower bound of the IRT reliability.
The test reliability coefficient is defined as the ratio of true-score variance to observed-score variance, which
is equivalent to the squared correlation between the true score T and the observed score X. From the perspective of nonlinear regression, this ratio can be interpreted as the correlation ratio of test score X on ability θ.
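The definitions above can be written compactly; with T the true score, X the observed score, E the error, and θ ability:

```latex
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2}
\;=\; \rho_{TX}^2
\;=\; \eta_{X \mid \theta}^2,
\qquad \sigma_X^2 \;=\; \sigma_T^2 + \sigma_E^2 .
```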
The local independence assumption in IRT replaces the uncorrelated errors assumption in CTT.
This paper considers two approaches to computing the IRT reliability.
The true score variance approach
It is less efficient, because it involves a computationally intensive iterative algorithm to obtain the conditional frequency distribution of test scores
It also allows one to compute the IRT-based item score reliability coefficient.
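The idea can be sketched as follows, assuming hypothetical 3PL item parameters and a standard normal ability distribution approximated on a quadrature grid: the conditional true score τ(θ) is the sum of the item response probabilities, σ_T² is the variance of τ(θ) over the ability distribution, and (by local independence) the conditional error variance is Σ Pᵢ(θ)(1 − Pᵢ(θ)).

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL item response function (scaling constant D = 1.7)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

# Hypothetical 3PL parameters (a, b, c) for a 10-item MC test
a = np.full(10, 1.0)
b = np.linspace(-2, 2, 10)
c = np.full(10, 0.2)

# Crude quadrature over a N(0, 1) ability distribution
theta = np.linspace(-4, 4, 81)
w = np.exp(-theta**2 / 2)
w /= w.sum()

P = p3pl(theta[:, None], a, b, c)        # quadrature points x items
tau = P.sum(axis=1)                      # conditional true score tau(theta)
var_true = np.sum(w * tau**2) - np.sum(w * tau)**2
err = (P * (1 - P)).sum(axis=1)          # conditional error variance (local independence)
var_obs = var_true + np.sum(w * err)     # sigma_X^2 = sigma_T^2 + E[sigma_E^2 | theta]
rho = var_true / var_obs                 # IRT test reliability coefficient
print(round(rho, 3))
```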
The observed score variance approach
It is more appealing to practitioners, because it deals directly with the observed score distribution
The observed variance is simple to compute
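For dichotomous items, the model-implied observed-score distribution can be built with the Lord–Wingersky recursion and the reliability obtained from the observed variance. A sketch, reusing the same hypothetical item parameters and quadrature as above; under the model it agrees with the true-score-variance computation, since σ_X² = σ_T² + E[σ²_E | θ]:

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL item response function (scaling constant D = 1.7)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def score_dist(p):
    """Lord-Wingersky recursion: P(X = x | theta) from item probabilities p."""
    dist = np.array([1.0])
    for pi in p:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - pi)   # item answered incorrectly
        new[1:] += dist * pi          # item answered correctly
        dist = new
    return dist

# Hypothetical 3PL parameters and N(0, 1) quadrature
a = np.full(10, 1.0)
b = np.linspace(-2, 2, 10)
c = np.full(10, 0.2)
theta = np.linspace(-4, 4, 81)
w = np.exp(-theta**2 / 2)
w /= w.sum()

marg = np.zeros(11)                   # marginal observed-score distribution
err = 0.0                             # expected conditional error variance
for t, wt in zip(theta, w):
    p = p3pl(t, a, b, c)
    marg += wt * score_dist(p)
    err += wt * np.sum(p * (1 - p))

x = np.arange(11)
var_obs = np.sum(marg * x**2) - np.sum(marg * x)**2
rho = 1 - err / var_obs               # reliability from the observed variance
print(round(rho, 3))
```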
The paper is useful for understanding the causes and effects underlying IRT reliability. There are still some formulas that I need to digest.