1. Two comments concern the iterative procedure. First, given the description that “only the item associated with the largest chi-square difference in the first run is considered biased”, does this mean that at least one item will be flagged as biased even when the test actually contains no DIF items? Second, the iterative procedure appears to yield an irreversible result: once a non-DIF item is mistakenly flagged, it cannot be reclassified into the correct group in later iterations. In this respect, the iterative procedure employed here appears to differ from the purification procedure commonly used for detecting DIF items.
2. Although either of the proposed methods kept the false positive rate within a relatively acceptable range, the rate was still uniformly higher than the nominal level.
3. Among the simulations, a special case was considered in which only one item in a short test was manipulated to be biased. With the proposed methods, could the false positive rate become uncontrolled when a longer test contains a high proportion of biased items?