09 IRT for forced-choice items (presented by Chenwei)

LIU CHEN WEI -

Forced-choice questionnaires can avoid acquiescence bias, random error arising from subjective judgments, and the halo effect, all of which occur with rating scales. However, traditional scoring methods for forced-choice questionnaires yield ipsative data that do not allow comparisons between persons. The authors propose a Thurstonian IRT model to handle this kind of multiple-task ipsative data. The Thurstonian model assumes that each item elicits a latent utility, the result of a discriminal process, and the latent utilities are then modeled to compare items. Here, however, the interest is in estimating each person's latent traits, so the Thurstonian model is extended to include multiple latent-trait dimensions. It can estimate the locations of the latent traits while also handling multiple-task comparisons. After developing the Thurstonian IRT model and its identification, the authors derive the item characteristic function and the information function for item pairs, which give insight into the model's implications. Limited-information estimation methods, readily available in Mplus, were chosen for parameter estimation.
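The response process described above can be sketched numerically. The following is a minimal simulation of one forced-choice pair under the Thurstonian model: each item elicits a latent utility (mean part plus loading times trait plus a normal discriminal error), and the binary outcome records which utility is larger. All parameter values (intercepts, loadings, trait correlation) are made up for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons = 10000

# Two traits with an assumed correlation of 0.3.
corr = np.array([[1.0, 0.3], [0.3, 1.0]])
theta = rng.multivariate_normal([0.0, 0.0], corr, size=n_persons)

# Item i loads on trait 1, item k on trait 2 (both positively keyed);
# intercepts and loadings are hypothetical.
mu_i, lam_i = 0.2, 1.0
mu_k, lam_k = -0.1, 0.8

# Latent utilities: mean part plus a standard-normal discriminal error.
t_i = mu_i + lam_i * theta[:, 0] + rng.normal(0, 1, n_persons)
t_k = mu_k + lam_k * theta[:, 1] + rng.normal(0, 1, n_persons)

# Binary outcome of the comparison: 1 if item i is preferred over item k.
y = (t_i > t_k).astype(int)
print("P(i preferred over k) =", y.mean())
```

Collecting such binary outcomes over many pairs is what turns the ipsative choices into data an IRT model can work with.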

Simulation 1 reported parameter recovery for the simplest Thurstonian IRT model. Table 2 showed that the item parameters were recovered well only in the mixed-keyed-item conditions, and recovery improved as the number of items per trait increased. For the latent-trait estimates, Table 3 showed results similar to those for the item parameters: with mixed-keyed items and a longer test, the reliability of the trait estimates was higher. Note that in short tests the empirical reliability somewhat underestimates the actual reliability because of the influence of the prior used in MAP estimation. For the goodness-of-fit tests, the conditions with mixed-keyed items and a long test fared best, although the test tends to over-reject the model.
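The underestimation of empirical reliability in short tests can be illustrated with a toy shrinkage calculation (not from the paper): under a standard-normal prior, the MAP score is the noisy score multiplied by a shrinkage factor, and the more measurement error (the shorter the test), the more the estimates are pulled toward the prior mean. The error standard deviations below are arbitrary stand-ins for a long and a short test.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(0, 1, 50000)              # true traits, N(0, 1)

for err_sd in (0.4, 1.0):                    # assumed error levels: long vs short test
    ml = theta + rng.normal(0, err_sd, theta.size)   # unshrunken (ML-like) score
    shrink = 1.0 / (1.0 + err_sd**2)                 # posterior-mean weight, N(0,1) prior
    map_score = shrink * ml                          # MAP score pulled toward 0
    print(f"error sd {err_sd}: var(MAP) = {map_score.var():.2f}, "
          f"corr(MAP, theta)^2 = {np.corrcoef(map_score, theta)[0, 1]**2:.2f}")
```

The compressed variance of the MAP scores in the noisier condition is what drags the empirical reliability below the actual reliability.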

Simulation 2 generalized the design to five dimensions and to blocks of pairs, triplets, and quads. For the goodness-of-fit test, the rejection rate was slightly higher than expected when the block size was two, and much higher when the block size was three or four. The item parameters were recovered well whenever mixed-keyed items were used, regardless of block size. Test reliability was slightly overestimated with block sizes of three or four, and the opposite held with a block size of two.

The selected real data were analyzed twice, once in rating-scale form and once in forced-choice (ipsative) form. The model fit for the MRSM was quite poor, while the forced-choice form achieved relatively better model-data fit. The trait correlations from the two forms were similar except for one pair (Agreeableness and Openness). The correlations between the MAP estimates from the rating-scale and forced-choice forms were high, but the rating-scale form yielded higher reliability than the forced-choice form. Note that the overall reliability was lower than in the simulations.

In sum, blocks built from pairs with mixed keying directions are recommended for the forced-choice IRT model. If the two items in a pair are keyed in the same direction, obtaining more information requires a larger difference between their slope parameters and a larger number of traits. For a pair of positively keyed items, results improve as the correlation between the two traits becomes more negative; when many traits are involved, a lower average correlation is suggested. As for block size, a larger block yields more binary responses, which provide more information for estimation, but it also increases the respondent's cognitive load.
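The trade-off in the last point comes from simple counting: a block of n items yields n*(n-1)/2 pairwise binary outcomes. A quick check:

```python
from itertools import combinations

# Each unordered pair of items in a block produces one binary outcome.
for block_size in (2, 3, 4):
    pairs = list(combinations(range(block_size), 2))
    print(f"block of {block_size} items -> {len(pairs)} binary outcomes")
```

So moving from pairs to quads multiplies the binary responses per block by six, at the cost of asking the respondent to rank four statements at once.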

Qs:

1. The discussion in Simulation 1 is hard to understand. What does "in the direction taken from an angle of about 45° toward" mean?

In my opinion, this becomes clear when we refer to formulas (19) and (20). The larger (20) is, the larger the information is, since the derivative of the response function in (20) is squared in (19). Ignoring the minus sign, (20) is largest when the linear predictor is near zero; for a pair of positively keyed items (i.e., both slope parameters positive) with equal loadings and a zero threshold, this happens where λ1θ1 − λ2θ2 = 0, i.e., along the 45° line where the two weighted traits are equal. Thus, whether information is gained depends on the relative position of the two latent traits.
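The point can be checked numerically. Below is a sketch, in my own notation and with assumed parameter values, of the pair information discussed above: the response function is P = Phi(z) with z = (lam1*th1 - lam2*th2 - gamma)/s, and the Fisher information about th1 from one binary outcome is (dP/dth1)^2 / (P*(1-P)), as in (19)-(20).

```python
import math

lam1 = lam2 = 1.0        # equal positive loadings (assumed)
gamma = 0.0              # threshold (assumed)
s = math.sqrt(2.0)       # scale of the utility-error difference (assumed)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def pair_info(th1, th2):
    """Fisher information about th1 from one binary pair outcome."""
    z = (lam1 * th1 - lam2 * th2 - gamma) / s
    p = Phi(z)
    dp = phi(z) * lam1 / s       # derivative of the response function w.r.t. th1
    return dp * dp / (p * (1.0 - p))

# Information peaks on the line lam1*th1 - lam2*th2 = gamma (z = 0)
# and vanishes as the comparison becomes one-sided.
print(pair_info(0.0, 0.0), pair_info(2.0, -2.0))
```

Evaluating on a grid of (th1, th2) shows a ridge of information along that 45° line and near-zero information far away from it, matching the geometric description in the paper.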

2. In the real-data analysis, how do we choose or decide whether an item is positively or negatively keyed? Is it judged from previous results (i.e., where did the slope parameters come from?) or from the wording of the item?

3. Similarly, should we also determine which latent trait an item measures based on previous results, so that the items can be arranged into blocks appropriately?

4. Is prior information necessary for estimating the latent traits under the forced-choice model? If ML estimation of the trait scores were used, would the results be more accurate than MAP (apart from the cases that cannot be estimated)?

5. It is suggested that the local dependence among binary outcomes can be ignored in MAP estimation; is this really a minor issue when many traits are involved in the test?

6. It seems that a test involving multiple traits helps us discriminate between persons under the forced-choice model. If all the important factors are considered carefully, is the obstacle that ipsative data pose for between-person comparisons solved perfectly?