This is a really unfamiliar topic to me. The conclusion that theta-based and total-score-based classification produced only a small difference in classification rates is surprising. It seems reasonable for the 1PL model, where the total score is a sufficient statistic for ability. However, the similar results obtained when 2PL and 3PL data were simulated seem unreasonable to me.
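To make the sufficiency point concrete, here is a small illustrative sketch, not taken from the paper. It compares two response patterns with the same total score on a hypothetical 4-item test (the difficulties `b` and discriminations `a_2pl` are made-up values) and finds the maximum-likelihood theta for each by bisection on the score function. Under the 1PL the two estimates coincide, because the likelihood depends on the responses only through the total score. Under the 2PL they differ, since the sufficient statistic becomes the discrimination-weighted sum of correct responses; the 3PL has no simple sufficient statistic at all.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response; a = 1 for every item gives the 1PL (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, a_params, b_params, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximum-likelihood ability estimate via bisection on the score function.

    Assumes a mixed response pattern (not all 0s or all 1s) so a finite MLE exists.
    """
    def score(theta):
        # Derivative of the log-likelihood: sum_i a_i * (x_i - P_i(theta))
        return sum(a * (x - p_correct(theta, a, b))
                   for x, a, b in zip(responses, a_params, b_params))
    # The score is decreasing in theta: positive at lo, negative at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical item parameters (illustration only)
b = [-1.0, -0.5, 0.5, 1.0]
a_1pl = [1.0, 1.0, 1.0, 1.0]
a_2pl = [0.5, 1.0, 1.5, 2.0]

# Same total score (2), different items answered correctly
pattern_easy = [1, 1, 0, 0]
pattern_hard = [0, 0, 1, 1]

for label, a in [("1PL", a_1pl), ("2PL", a_2pl)]:
    t_easy = mle_theta(pattern_easy, a, b)
    t_hard = mle_theta(pattern_hard, a, b)
    print(f"{label}: theta(easy items) = {t_easy:.3f}, theta(hard items) = {t_hard:.3f}")
```

This only shows that 2PL theta estimates can separate examinees who share a total score; whether that changes classification rates much would still depend on how many such examinees fall near the cut score.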
While reading the paper, an interesting idea came to me: since the goal of classification is to detect which category an examinee belongs to, could we use a signal detection theory approach to achieve it? Unfortunately, I don't yet know how to do that.