The present study proposes an interesting idea: combining a signal detection theory (SDT) model with a hierarchical rater model to account for the nested structure of the ratings (raters nested within items) and for rater effects in the estimation of person ability. The authors argue that Facets-type models ignore the fact that raters are nested within items, which leads to underestimation of the standard errors of the proficiency estimates. They also challenge the use of maximum likelihood estimation, which can cause boundary problems.
Here are my comments:
1) It is not clear why the authors chose the Dirichlet distribution as the prior; a brief justification of this choice would be helpful.
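To make the question concrete, here is a minimal sketch (my notation, not necessarily the authors' parameterization) of the usual motivation for a Dirichlet prior on category probabilities: it is conjugate to the multinomial, so the full conditional remains a Dirichlet,
\[
\boldsymbol{\pi} \sim \mathrm{Dirichlet}(\alpha_1,\ldots,\alpha_K), \qquad
\mathbf{n}\mid\boldsymbol{\pi} \sim \mathrm{Multinomial}(n,\boldsymbol{\pi})
\;\;\Rightarrow\;\;
\boldsymbol{\pi}\mid\mathbf{n} \sim \mathrm{Dirichlet}(\alpha_1+n_1,\ldots,\alpha_K+n_K).
\]
If conjugacy (or keeping the category probabilities away from the boundary, e.g., by setting all \(\alpha_k > 1\)) was the motivation, stating this explicitly, along with the chosen hyperparameter values, would strengthen the paper.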
2) The HRM-SDT seems to assume that each rater's criteria remain fixed from the beginning to the end of grading. In reality, however, raters tend to vary their ratings more at the start and become less variable as they grade more and more essays. How would the HRM-SDT accommodate such changes over the course of grading, rather than treating the rating criteria as constant? One way to pose the question is sketched below.
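Purely as an illustration of the question (my notation, offered as a hypothetical extension rather than the authors' model): if rater \(r\) applies criteria \(c_{rk}\) to a latent perception of essay quality, the criteria or the rater's perceptual variability could be allowed to depend on the position \(t\) of the essay in that rater's grading sequence, e.g.,
\[
c_{rk}(t) = c_{rk} + \delta_r\, g(t)
\qquad\text{or}\qquad
\sigma_r^2(t) = \sigma_r^2 \exp(-\lambda_r t),
\]
where \(g(t)\) is a decreasing function of grading position, \(\delta_r\) captures rater-specific criterion drift, and \(\lambda_r\) captures shrinking variability; \(\delta_r = \lambda_r = 0\) recovers the constant-criteria case. A comment on whether such an extension is feasible, or on why constant criteria are a reasonable approximation for these data, would address this concern.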
3) Table 1 shows that, for CR item 1, the HRM yielded a narrower range and less variation in the item step parameter estimates than the HRM-SDT, whereas for CR item 2 the HRM yielded a wider range of item step parameter estimates. I am curious how the authors would interpret these findings.