Sandy's readings and review

Topic 1: Person Fit

HUANG Sheng Yun -
Replies: 4

This study proposes a method for improving the accuracy of person-fit analysis with lz. The method takes test unreliability into account when estimating ability and constructs the null distribution of each lz through resampling. Several traditional and recent person-fit statistics (PFS) and the modified method proposed here were assessed in a series of simulations that evaluated type I error and detection rates for three types of aberrance: cheating, lack of motivation, and speeding. The results show that the new method, CTAD (correct person estimates and adjust the null distribution), controlled type I error much better than all the other methods and had reasonably good power.

Main idea of the new method

The EAP (expected a posteriori) estimator has been shown to have good statistical properties, namely smaller bias and standard error. However, estimates obtained from a Bayesian approach tend to regress toward the prior mean when the test is short. The amount of shrinkage is a function of the unreliability of the test, so the author proposed correcting the person estimate by dividing it by the test reliability (equation 5).
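To make the shrinkage and its correction concrete, here is a minimal sketch in Python. It assumes a Rasch model, a standard normal prior with mean 0, and a reliability value supplied by the user. The exact form of the paper's equation 5 is not reproduced here, and the function names are my own.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap_estimate(responses, b, n_nodes=61):
    """EAP ability estimate on a fixed grid with a N(0, 1) prior."""
    nodes = np.linspace(-4.0, 4.0, n_nodes)
    prior = np.exp(-0.5 * nodes ** 2)           # unnormalized N(0, 1) density
    p = rasch_prob(nodes[:, None], b[None, :])  # shape: (nodes, items)
    like = np.prod(np.where(responses == 1, p, 1.0 - p), axis=1)
    post = like * prior
    post /= post.sum()
    return float(np.sum(nodes * post))

def corrected_eap(theta_eap, reliability):
    """Stretch the EAP away from the prior mean (assumed to be 0) by
    dividing by the test reliability, as described in the review."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return theta_eap / reliability

b = np.zeros(10)                          # hypothetical item difficulties
x = np.array([1] * 8 + [0] * 2)           # hypothetical response pattern
theta_eap = eap_estimate(x, b)
theta_corrected = corrected_eap(theta_eap, reliability=0.8)
```

With a short test the EAP lies between the prior mean and the maximum-likelihood value, and the division moves it back outward. How the reliability itself is obtained is left open here.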

Comments, Questions and Future Study

1) As the author mentioned, a limitation of this paper is that the power of the different PFS methods cannot be compared because their type I error rates differ. Because the author used true item parameters and the null distribution could not be fully corrected to a standard normal distribution, we suggest simulating responses from the true item parameters for examinees without any aberrant behavior, so that the data follow a specific model, for example the Rasch model or the 3PLM. The author could then find the cut point from this null distribution at a given nominal alpha, so that the detection rates of the different PFS methods can be compared at the same type I error baseline.
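The suggestion in 1) can be sketched roughly as follows. This is only an illustration under my own assumptions: the Rasch model, abilities drawn from N(0, 1), lz evaluated at the true ability, and a lower-tail cut point. In a real comparison, each PFS would be computed with the same ability estimator used in the study.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def lz_statistic(x, p):
    """Standardized log-likelihood person-fit statistic (row-wise)."""
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p), axis=-1)
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p), axis=-1)
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2, axis=-1)
    return (l0 - mean) / np.sqrt(var)

def empirical_cutoff(b, n_persons=5000, alpha=0.05, seed=0):
    """Simulate model-fitting examinees and return the alpha quantile
    of their lz values as the empirical cut point."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)       # non-aberrant examinees
    p = rasch_prob(theta[:, None], b[None, :])   # shape: (persons, items)
    x = (rng.random(p.shape) < p).astype(float)  # model-consistent responses
    lz_null = lz_statistic(x, p)
    return float(np.quantile(lz_null, alpha)), lz_null

b = np.linspace(-2.0, 2.0, 30)                   # hypothetical difficulties
cutoff, lz_null = empirical_cutoff(b)
# Examinees with lz below `cutoff` would be flagged as misfitting.
```

Repeating this for every PFS gives each statistic its own cut point at the same nominal alpha, so the detection rates are then comparable.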

2) I do not really understand the point of steps (8) and (9) on page 163.

3) The proposed method tries to correct the EAP estimates by taking the unreliability of the test into account: the corrected ability estimate equals the original EAP divided by the test reliability. Doing so would inflate the bias of the EAP and, as we concluded in last week's lab meeting, it would also inflate the SE.

4) Power is certainly affected by the precision of the person estimate, which is why many researchers have tried to obtain more accurate estimates through modified estimators. However, a modified person estimate is still affected by contaminated responses when examinees behave aberrantly. In my opinion, modifying the estimate is a good way to improve type I error control or increase detection power, but a more direct approach is to exclude contaminated data as far as possible when estimating person ability. We therefore propose a purification procedure that picks out the items on which the response misfits the model and then estimates person ability from the remaining clean items. The purification study is ongoing.

5) The author mentioned that another limitation of the study is that known item parameters are used to estimate person ability. In reality, however, well-calibrated item parameters are not always available. A future study is therefore needed to assess PFS performance when item parameters are estimated. We are also conducting an ongoing study that applies the purification procedure to this scenario.

6) It is quite odd that using the true person parameters did not give the best type I error control and power. If the true ability does not work well for detection, why do we need to modify the person estimate?

Re: Topic 1: Person Fit

HUANG Sheng Yun -

Detecting Response Styles and Faking in Personality and Organizational Assessments by Mixed Rasch Models

This chapter gives an overview of applying mixed Rasch models to personality and organizational assessments. In contrast to ability assessment, respondents may know the “correct” answer, or may give a particular response because they want to follow the socially desirable answer even if it does not reflect the type of person they really are. In such situations, clinical psychologists and employers cannot make good diagnoses or placement decisions for patients and employees. The authors introduce mixed Rasch models to deal with these problems and show the information and advantages these models can provide. Several empirical data sets were fitted with IRT models, and model fit was compared using the AIC and CAIC indices. The results revealed that response styles, faking, and structural differences in item parameters can be interpreted more meaningfully with mixed Rasch models than with conventional models.
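As a reminder of how such a comparison works, here is a small sketch. It assumes the indices meant are AIC and CAIC in their usual forms. The log-likelihoods and parameter counts below are hypothetical numbers for illustration, not results from the chapter.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * log_lik + 2.0 * n_params

def caic(log_lik, n_params, n_obs):
    """Consistent AIC: -2 lnL + k (ln N + 1)."""
    return -2.0 * log_lik + n_params * (math.log(n_obs) + 1.0)

def best_model(fits, n_obs, criterion="caic"):
    """Return the name of the model with the smallest criterion value,
    together with all criterion values."""
    scores = {}
    for name, (log_lik, k) in fits.items():
        if criterion == "caic":
            scores[name] = caic(log_lik, k, n_obs)
        else:
            scores[name] = aic(log_lik, k)
    return min(scores, key=scores.get), scores

# Hypothetical fits: model name -> (log-likelihood, number of parameters).
fits = {
    "one-class Rasch": (-5230.0, 21),
    "two-class mixed Rasch": (-5100.0, 43),
}
chosen, scores = best_model(fits, n_obs=500)
```

CAIC penalizes extra parameters more heavily than AIC when the sample is large, so it is the more conservative index when deciding how many latent classes to keep.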

Main idea of the new method

Empirical data on dichotomous and polytomous items were examined with mixed Rasch models and HYBRID models. For the dichotomous items, respondents were separated into two subgroups by response pattern: yea-sayers and nay-sayers. In these empirical data, however, it was not easy to find response-style classes because all participants had decided for themselves to take the questionnaire. For polytomous items, the authors reviewed past studies that examined unmotivated personality data, measurement invariance, faking, and response styles in anger expression. The results illustrated that mixed models can lead to more informative conclusions; for example, item thresholds were disordered for respondents who were faking.

Comments, Questions and Future Study

1) As Prof. Wang commented last week, if a mixed Rasch analysis only shows that there are two classes, a high-ability class and a low-ability class, the analysis is of little use because the same information is available from each individual's ability estimate. This kind of result appears easily on tests with dichotomous items. Attributing yea-sayers and nay-sayers to two different classes has the same confounding with the latent trait. It would be much more meaningful if the subgroups were independent of the latent trait, such as faking versus non-faking.

2) This model can be used to detect faking response patterns in future studies.

3) The authors mentioned that this kind of study is similar to the DIF concept, so we could apply mixed Rasch models to detect DIF. In my opinion, the unknown group membership can be distinguished accurately only when there are many DIF items. That is tricky, because we do not want many DIF items in real data. I am therefore unsure how to apply mixed models to DIF detection.

Re: Topic 1: Person Fit

HUANG Sheng Yun -
The Effect of Person Misfit on Classification Decisions
This paper investigated whether applying person-fit statistics improves the rate of correct classification decisions. A series of simulations was conducted to illustrate the effect of person misfit on classification decisions. Several independent variables were manipulated: cut-point for the decision, test length, person-fit statistic, type of aberrant behavior, estimation method, and sample size. The dependent variable was the correct decision rate in the normal group and in the misfit group. The results showed that misfitting responses affect item and person estimation and that the cut-point clearly affects the correct decision rate, particularly when the cut-point was set to -1 or 1.

Comments, Questions and Future Study
1) Aberrant behaviors such as guessing and cheating are the main ones investigated. However, researchers in person-fit studies generate such aberrant responses with different manipulations, so it is hard to compare results across studies on the same basis, and the suggestions and results hold only for the study at hand. I think we need to review the cognitive psychology literature on guessing and cheating behaviors and then try to design more reasonable simulation scenarios. Studies that adopt the same design framework could then be compared.
2) The authors used person fit to improve the correct decision rate in classification testing. I think we can extend the study to computerized classification testing. However, because the two kinds of testing use different algorithms, choosing which person-fit statistic to use would be a problem.
3) Misfitting responses affect item and person estimation, so person-fit statistics may in turn be affected by biased parameter estimates. We could add a purification procedure to the study to improve estimation, which should reduce the effect of misfit on estimation.

Re: Topic 1: Person Fit

HUANG Sheng Yun -

Testing Person Fit in Cognitive Diagnosis

This paper developed likelihood ratio tests to detect two types of aberrant responses under cognitive diagnosis models (CDMs). Two CDMs were introduced: the DINA model, the simplest one, which describes responses as resulting from person attributes; and the RUM, which models responses with more than two probabilities, in contrast to the DINA model. The authors conducted three simulations. The first evaluated the marginal and joint likelihood ratio tests under true item parameters, estimated item parameters, and item parameters perturbed within their 95% confidence intervals. The other two investigated two types of aberrant response: spuriously high scores that arise when multiple strategies are available, and spuriously low scores caused by a partly missing Q-matrix. Finally, a real data set was analyzed with the DINA and RUM models. The results illustrated that the proposed methods performed adequately in detecting persons with aberrant responses.

Main ideas of the present paper

  • DINA model (eq.1 and 2)
  • RUM (eq.1 and 3)
  • Aberrant response or not (eq.4) 
  • Joint likelihood ratio testing (eq.5 and 7)
  • Marginal likelihood ratio testing (eq.6 and 8)
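For readers less familiar with CDMs, the DINA response probability can be sketched as below. This uses the standard DINA formulation with slip and guessing parameters. The paper's own equations are not reproduced, and the variable names and example values are mine.

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """P(correct) under the DINA model.

    alpha: (persons, attributes) 0/1 attribute mastery patterns
    q:     (items, attributes) 0/1 Q-matrix
    slip, guess: (items,) slip and guessing parameters
    """
    # eta[i, j] = 1 when person i masters every attribute item j requires
    eta = np.all(alpha[:, None, :] >= q[None, :, :], axis=2).astype(float)
    # Masters answer correctly unless they slip; non-masters must guess
    return (1.0 - slip) ** eta * guess ** (1.0 - eta)

# Hypothetical example: 2 persons, 2 items, 2 attributes
alpha = np.array([[1, 1],
                  [1, 0]])
q = np.array([[1, 0],
              [1, 1]])
slip = np.array([0.10, 0.20])
guess = np.array([0.20, 0.25])
p = dina_prob(alpha, q, slip, guess)
```

Each item therefore has only two response probabilities, 1 - slip or guess, which is the sense in which the DINA model is the simplest of the two models discussed.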

Comments, Questions and Future Study

1) Conventionally, person-fit statistics have been used with IRT models that have continuous latent traits. The authors drew on the main ideas of past research to develop suitable methods for CDMs and found that the methods performed quite well.

2) The first simulation showed that performance was best when the true item parameters were known. This hints that more accurately estimated attributes may help improve the detection of aberrant responses. The iterative strategy we adopted under the conventional IRT model may also be beneficial under CDMs.

Re: Topic 1: Person Fit

HUANG Sheng Yun -

A Didactic Presentation of Snijders’ lz* Index of Person Fit with Emphasis on Response Model Selection and Ability Estimation

The conventional lz index has been popular and widely adopted in person-fit studies. However, the index follows the standard normal distribution only when it is computed with the true ability. In practice we only have an ability estimated from the collected data, so lz does not satisfy the standard normal assumption, and the pre-specified nominal level may not be achieved. Snijders’ lz* was developed to resolve this problem, but it is still not commonly used. The main reason is that the procedure is not easy to follow, so the authors give a didactic presentation of Snijders’ lz* with a more concrete description. Snijders’ lz* follows the standard normal distribution when an estimated ability is used to compute the index, provided the ability estimation method satisfies equation 14 of the paper. In my view, it is clearly better than the conventional index when item parameters are known. In reality, however, item and person parameters are estimated simultaneously. The parameters may therefore be affected by noise from aberrant behaviors, so Snijders’ lz* would no longer follow the standard normal distribution.
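For reference, the conventional lz for n dichotomous items is usually written as follows. The notation is chosen here and may not match the paper's: x_i is the scored response to item i and P_i(θ) is the model probability of a correct response. Snijders’ correction terms are not reproduced.

```latex
\begin{align}
l_0 &= \sum_{i=1}^{n} \Bigl[ x_i \ln P_i(\theta) + (1 - x_i) \ln\bigl(1 - P_i(\theta)\bigr) \Bigr] \\
E(l_0) &= \sum_{i=1}^{n} \Bigl[ P_i(\theta) \ln P_i(\theta) + \bigl(1 - P_i(\theta)\bigr) \ln\bigl(1 - P_i(\theta)\bigr) \Bigr] \\
\operatorname{Var}(l_0) &= \sum_{i=1}^{n} P_i(\theta) \bigl(1 - P_i(\theta)\bigr) \left[ \ln \frac{P_i(\theta)}{1 - P_i(\theta)} \right]^{2} \\
l_z &= \frac{l_0 - E(l_0)}{\sqrt{\operatorname{Var}(l_0)}}
\end{align}
```

The mean and variance above are taken at a fixed θ. Replacing θ with an estimate changes the sampling distribution of l_0 but not these two terms, which is why lz is no longer standard normal in that case.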