This study incorporated the SDT model into the HRM, making the model more flexible in describing rater behavior. The SDT-HRM assumes a latent variable for an item's true categories, separates rater effects from item effects, and characterizes raters through parameters such as perceptual discrimination and decision criteria. Compared with Patz's rater model, a major advantage of the SDT-HRM is that it can accommodate a variety of rater effects, such as central tendency and a preference for a restricted range of response categories.
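To fix ideas, the rater-level structure can be sketched as follows (the notation here is assumed, following the general latent class SDT formulation rather than the paper's exact parameterization). For rater $r$ scoring an essay whose latent category is $\eta$, the cumulative response probabilities take the form

P(Y_r \le k \mid \eta) = F(c_{rk} - d_r \eta), \qquad k = 1, \ldots, K - 1,

where $d_r$ captures the rater's ability to perceive the latent categories, $c_{r1} < \cdots < c_{r,K-1}$ are the rater's decision criteria, and $F$ is a logistic or normal distribution function. Rater effects then show up as patterns in these parameters: criteria pulled toward the middle of the scale yield a central tendency, and criteria placed very close together make the categories between them rarely used, producing a preference for a limited set of response categories.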
The authors argued that IRT rater models make an inappropriate assumption about the relation between raters' scores and examinee proficiency. In their view, what a rater provides is information about the quality of the essay the examinee produced, which in turn provides information about the examinee's proficiency. Which underlying mechanism actually holds remains an open question. The HRM is a powerful way to represent the relation between raters' scores and examinee proficiency, but it is also possible that raters provide direct information about a person's proficiency, just not through a simple linear relationship.
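The contrast between the two mechanisms can be written out (again with assumed notation, as a sketch rather than the paper's equations). An IRT rater model of the facets type links the rating directly to proficiency,

\mathrm{logit}\, P(Y_r \ge k \mid \theta) = \theta - b - \rho_r - c_k,

whereas the HRM inserts the latent essay quality $\eta$ between them, so that the rating depends on $\theta$ only through $\eta$:

P(Y_r = k \mid \theta) = \sum_{\eta} P(Y_r = k \mid \eta)\, P(\eta \mid \theta).

Under the second mechanism, the rating is conditionally independent of proficiency given essay quality, which is precisely the assumption that the direct IRT rater models do not make.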
What does it mean to say that "the raters were less able to detect the correct categories for the second item"? Do item characteristics affect the raters' ability to detect the latent categories, or are the two independent of each other?