This study mainly examined how different rounding rules affect cut-scores. The results showed that rounding to the nearest whole number severely affects the cut-scores, although the effect was mitigated when rounding was done at the cluster level. The other two rounding rules, nearest 0.05 and nearest two decimal places, recovered the intended cut-scores fairly well. It was also pointed out that having more test items near the intended cut-scores would reduce bias. However, since different panelists have different intended cut-scores, it is hard to design a set of items that is sufficient to accurately estimate all panelists' intended cut-scores. What I wonder is whether this kind of research is really important for standard setting. I think the results are reasonable and could be expected even without a simulation study: rounding to the nearest whole number loses more information than the other two rules, which consequently affects the precision of parameter estimation.
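The information loss under the three rules can be made concrete with a small sketch. The judgment value below is invented for illustration (assuming, as later in the discussion, judgments on a polytomous item score scale):

```python
j = 3.6473  # a hypothetical panelist judgment for one item

whole = round(j)                      # nearest whole number
nearest_005 = round(j / 0.05) * 0.05  # nearest 0.05
two_dec = round(j, 2)                 # nearest two decimal places

# Rounding error introduced by each rule
print(abs(whole - j), abs(nearest_005 - j), abs(two_dec - j))
```

The nearest-whole rule perturbs this judgment by about 0.35, while the other two rules perturb it by less than 0.005, which is consistent with the pattern of results the study reports.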
Based on the results of this study, when using the Angoff method it is advisable to ask the panelists to provide judgments to two decimal places. However, I think it is not easy for panelists to meaningfully control such fine-grained differences between judgments, such as 3.65 versus 3.85. Would the internal consistency of an individual panelist be affected as a result? And would this variance affect the estimated cut-scores?
In this study, the item parameters were treated as known, without error. What about taking the measurement error of the item parameter estimates into account?
Yes, we could investigate whether the three rules are biased once an error term is included. Reckase (2006) described a simulation method that incorporates such an error term and evaluated its effect: when a rater gives a rating for each item, an error term following a Beta distribution is added to that rating. With the error term included, the simulation can be replicated many more times to evaluate the rounding rules.
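A minimal sketch of such a replication study follows. It is not the study's (or Reckase's) actual design: the item count, score range, Beta concentration parameter `a`, and number of replications are all invented for illustration. Noisy ratings are drawn from a Beta distribution whose mean equals the true rating, each rounding rule is applied, and the bias of the resulting cut-score is accumulated over replications:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 30 polytomous items scored 0-6; the "true" ratings
# implied by one panelist's intended cut-score are drawn once and fixed.
n_items, max_score = 30, 6
true_ratings = rng.uniform(1.0, 5.0, size=n_items)
intended_cut = true_ratings.sum()

def noisy_ratings(true, a=8.0):
    """Draw ratings from a Beta distribution (rescaled to 0..max_score)
    whose mean equals the true rating; a controls the error variance."""
    m = true / max_score           # mean on the (0, 1) scale
    b = a * (1 - m) / m            # solve mean a / (a + b) = m for b
    return rng.beta(a, b, size=true.shape) * max_score

rules = {
    "whole": lambda r: np.round(r),                # nearest whole number
    "0.05":  lambda r: np.round(r / 0.05) * 0.05,  # nearest 0.05
    "0.01":  lambda r: np.round(r, 2),             # nearest two decimals
}

n_reps = 2000
bias = {name: [] for name in rules}
for _ in range(n_reps):
    ratings = noisy_ratings(true_ratings)
    for name, rule in rules.items():
        # Cut-score = sum of (rounded) item ratings; bias vs. intended cut
        bias[name].append(rule(ratings).sum() - intended_cut)

for name in rules:
    print(f"{name:>5}: mean bias = {np.mean(bias[name]):+.3f}")
```

Running many replications like this would let us compare the mean bias and variability of the three rules under rater error, rather than relying on a single error-free pass.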