18 Item selection in multidimensional computerized adaptive testing (Presented by Nicky)

Xiaoxue's review


ITEM SELECTION IN MULTIDIMENSIONAL COMPUTERIZED ADAPTIVE TESTING—GAINING INFORMATION FROM DIFFERENT ANGLES

CHUN WANG AND HUA-HUA CHANG

This study conducts a rigorous investigation of the relationships among four promising item selection methods: D-optimality, the KL information index, continuous entropy, and mutual information.

Two simulation studies were carried out to compare the performance of the four methods.

The first study focuses on evaluating the estimation accuracy of each method and comparing the item selection overlap among the methods.

Item pool: 450 items

Model: M2PL

a1j, a2j: U(0, 1.3)

bj: U(−1.3, 1.3)

θ01, θ02 = (−2.0, −1.6, . . . , 2.0), 1000 simulations

Test length: 25
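To make the first study's setup concrete, here is a minimal sketch of generating such an item pool and one examinee's responses. The logistic form P(X = 1 | θ) = 1 / (1 + exp(−(a′θ − b))) and all names below are my assumptions, not code from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Study-1 item pool: 450 M2PL items (parameter ranges from the review above)
n_items = 450
a = rng.uniform(0.0, 1.3, size=(n_items, 2))   # a1j, a2j ~ U(0, 1.3)
b = rng.uniform(-1.3, 1.3, size=n_items)       # bj ~ U(-1.3, 1.3)

def p_correct(theta, a, b):
    # M2PL: probability of a correct response to every item at ability theta
    return 1.0 / (1.0 + np.exp(-(a @ theta - b)))

# One simulated examinee at one grid point of the true ability vector
theta_true = np.array([-2.0, -1.6])
responses = (rng.random(n_items) < p_correct(theta_true, a, b)).astype(int)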

The second study focuses on a special case in which the item bank is unbalanced.

a1j: U(0, 1)

a2j: U(0, 2)

θ01, θ02 = (−1.0, 0, 1.0)

The estimation accuracy of each method is measured by the mean squared error (MSE) and bias for each element of θ. For conciseness, the Euclidean distance between the estimated and true ability vectors is used as a global index of psychometric precision.
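Reading the global index as the average Euclidean distance (AED) between estimated and true ability vectors, these quantities could be computed along the following lines (a sketch; the array shapes are my assumptions):

import numpy as np

def accuracy_indices(theta_hat, theta_true):
    # theta_hat: R x 2 array of final estimates over R replications;
    # theta_true: the fixed true ability vector for one grid point
    bias = theta_hat.mean(axis=0) - theta_true           # bias per dimension
    mse = ((theta_hat - theta_true) ** 2).mean(axis=0)   # MSE per dimension
    aed = np.linalg.norm(theta_hat - theta_true, axis=1).mean()  # global index
    return bias, mse, aed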

The simulation results showed that mutual information not only improved the overall estimation accuracy but also yielded the smallest conditional mean squared error over most regions of the ability space.

 

MCAT

Sequentially select the items

Recursively estimate a p-dimensional ability vector

Item selection rules

Maximizing information about the location of an examinee in the θ-coordinate system

Minimizing the error in the estimation of the location

Minimization of the confidence interval of the ability estimates

D-optimality criterion:

Maximizing the determinant of the Fisher information matrix, which is equivalent to minimizing the volume of the confidence ellipsoid of the ability estimate in MCAT
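For the M2PL, the Fisher information matrix of item j is I_j(θ) = P_j(θ)(1 − P_j(θ)) a_j a_j′, so the D-optimality rule can be sketched as below (my implementation, assuming the same logistic form as earlier):

import numpy as np

def item_info(theta, a_j, b_j):
    # 2 x 2 Fisher information of one M2PL item: P * (1 - P) * a a'
    p = 1.0 / (1.0 + np.exp(-(a_j @ theta - b_j)))
    return p * (1.0 - p) * np.outer(a_j, a_j)

def d_optimal_item(theta_hat, a, b, administered, test_info):
    # Pick the unused item that maximizes det of the updated test information,
    # i.e., the item that shrinks the confidence ellipsoid around theta_hat most
    best_j, best_det = None, -np.inf
    for j in range(len(b)):
        if j in administered:
            continue
        det_j = np.linalg.det(test_info + item_info(theta_hat, a[j], b[j]))
        if det_j > best_det:
            best_j, best_det = j, det_j
    return best_j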

Kullback–Leibler (KL) information

Global information or a global profile of the discrimination power of an item;

A KL index is the integral of the KL information over a region that contains the current interim estimate θ̂.

Entropy:

Measuring uncertainty of the distribution of a random variable
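For reference, the continuous (differential) entropy of a posterior π over the ability space Θ is

H(\pi) = -\int_{\Theta} \pi(\theta) \log \pi(\theta) \, d\theta

so the CEM rule, as I understand it, selects the item that minimizes the expected entropy of the updated posterior.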

Maximum KL distance between two subsequent posteriors (KLP):

The KLP method utilizes the KL information to measure the distance between the current and new posterior distributions of the estimate of θ, and by maximizing such distance it yields the largest change in the posterior distribution.

KL information with Bayesian update (KLB) method:

KLB switches the positions of the current and new posterior distributions when calculating the KL distance
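A grid-based toy illustration of the direction swap (my sketch; the only thing taken from the review is that KLB reverses the two arguments of the KL distance):

import numpy as np

def kl_divergence(p, q, cell_area):
    # KL(p || q) for two posterior densities tabulated on the same theta grid
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * cell_area

# Toy 2-D grid posteriors standing in for the current and updated posterior
g = np.linspace(-4.0, 4.0, 81)
t1, t2 = np.meshgrid(g, g)
cell = (g[1] - g[0]) ** 2
post_cur = np.exp(-(t1 ** 2 + t2 ** 2) / 2)
post_cur /= post_cur.sum() * cell
post_new = np.exp(-((t1 - 0.5) ** 2 + t2 ** 2) / 2)
post_new /= post_new.sum() * cell

klp_term = kl_divergence(post_new, post_cur, cell)  # distance in one direction
klb_term = kl_divergence(post_cur, post_new, cell)  # KLB swaps the arguments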

Mutual information method:

Maximizing the mutual information between the current posterior and predictive response distributions on the candidate item.
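On a grid, the mutual information between the current posterior of θ and the 0/1 response on a candidate item can be computed as below (a sketch under the same grid conventions as above):

import numpy as np

def mutual_information(post, p_item, cell_area):
    # post: posterior density of theta on a grid; p_item: P(X = 1 | theta)
    # for one candidate item, evaluated on the same grid
    mi = 0.0
    for f in (p_item, 1.0 - p_item):          # f(x | theta) for x = 1 and x = 0
        f_bar = np.sum(f * post) * cell_area  # predictive probability of x
        mi += np.sum(post * f * np.log(f / f_bar)) * cell_area
    return mi

# MUI rule: administer the candidate item with the largest mutual information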

The advantage shared by MUI, CEM, KLB, and KLP is that they utilize only the examinee's response pattern, without relying on the interim point estimate, which makes them flexible. D-optimality and CEM, on the other hand, are designed to directly reduce the uncertainty of the ability estimates through item selection.

Well, I have to say I am always confused by so many methods.

In the results, the author concludes that MUI has a relatively larger region of low AED values, followed by D-optimality and CEM. KI, however, produces higher AED over a larger range of the ability space. How can they tell the regions for D-optimality and CEM apart when the regions are so irregular? Just with the naked eye? Or are there other indices that can be used for comparison?

 

In the note section, the author shows that some numbers have large absolute values when one ability dimension is high and the other is low. This is attributed to the MLE estimation method. Why not change the estimation method?