In brief, this study investigated whether subject matter expert (SME) ratings could replace marginal maximum likelihood (MML) estimates. The work is framed within an IRT model for unidimensional pairwise-preference items: in the Zinnes-Griggs (ZG) model, each item consists of two statements, each with its own location parameter, and a respondent's choice depends on which of the two statement locations is closer to his or her own location. As with the more familiar IRT models, a CAT can then be built by implementing the ZG model.
1. As shown in Figure 1, the ZG model always yields a monotonic item response function for any item pair with different location parameters; however, it cannot fully account for the complexity of human behavior. For example, suppose a thoroughly stingy man is asked to buy his wife a designer bag, either LV or Hermès. According to the ZG model, he has a very high probability of choosing the cheaper one, perhaps LV. In my view, however, the distances between his location on the proficiency scale and the two options are nearly identical, so he has a nearly 50-50 chance of choosing either one.
2. Parameter estimates derived from MML are located on a mathematically meaningful scale. Do the SME ratings share this property? If not, why was the identical measurement model still applied? Without sufficient justification, I cannot accept the rationale of this article.
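
To make the concern in comment 1 concrete, here is a minimal numerical sketch; it is not the authors' implementation. It assumes the closed-form ZG item response function as it is commonly written in the literature, P(s preferred to t | theta) = 1 - Phi(a) - Phi(b) + 2 Phi(a) Phi(b), with a = (2 theta - mu_s - mu_t) / sqrt(3) and b = mu_s - mu_t. This formula, the function names, and the statement locations (2.0 and 2.5) are my assumptions and should be checked against the manuscript's own equation. The sketch evaluates the predicted preference for respondents at several locations, including one far below both statements.

```python
import math


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def zg_prob(theta: float, mu_s: float, mu_t: float) -> float:
    """Probability of preferring statement s to statement t.

    Assumed closed form of the ZG model (not taken from the manuscript):
        P = 1 - Phi(a) - Phi(b) + 2 * Phi(a) * Phi(b)
        a = (2 * theta - mu_s - mu_t) / sqrt(3)
        b = mu_s - mu_t
    """
    a = (2.0 * theta - mu_s - mu_t) / math.sqrt(3.0)
    b = mu_s - mu_t
    return 1.0 - phi(a) - phi(b) + 2.0 * phi(a) * phi(b)


if __name__ == "__main__":
    # Hypothetical statement locations: s is the "cheaper" option,
    # t the "more expensive" one, both high on the scale.
    mu_s, mu_t = 2.0, 2.5
    for theta in (-6.0, -3.0, 0.0, 2.25, 3.0, 6.0):
        p = zg_prob(theta, mu_s, mu_t)
        print(f"theta = {theta:5.2f}  P(s over t) = {p:.4f}")
```

If the assumed form matches the one in the manuscript, the printed values show how the prediction for a respondent far from both statements varies with the separation between the two statement locations; rerunning with other values of mu_s and mu_t makes this dependence visible.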