Four item selection methods, D-optimality, Kullback-Leibler (KL) information, the continuous entropy method (CEM), and mutual information (MUI), are presented and compared in a simulation study. D-optimality is commonly used in multidimensional adaptive testing, but it can produce large estimation error early in the test. KL information was developed to overcome this problem: it is a global information measure, in contrast to the local Fisher information. The continuous entropy method uses the differential entropy of the expected posterior, so it makes sense to select the item with the smallest expected entropy. Mutual information is a special case of KL divergence: it tests whether two random variables x and y are independent, and if y carries part of the information about x, then observing y tells us something about x. The KL index was developed to select the optimal item in this sense. Note that KL has the property of preferring items that are highly discriminating on multiple dimensions. KL vs. D-optimality: in the multidimensional case they are not equivalent, even as test length goes to infinity. KLP selects the item expected to produce the largest update of the posterior distribution; it is conceptually like information gain, in that it measures the gain in moving from the current posterior to the new posterior.
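As a concrete illustration of the "global information" idea, here is a minimal sketch of a KL item selection index for a unidimensional 2PL item. The item parameters, the window width `delta`, and the bank are all hypothetical choices for illustration, not the paper's exact formulation: the index integrates the KL divergence between the item's response distribution at the current estimate and at nearby trait values, and the item with the largest index is selected.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response (hypothetical item parameters a, b)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def kl_index(theta_hat, a, b, delta=1.0, n=201):
    """Global KL index: integrate the KL divergence between the item's response
    distribution at theta_hat and at nearby theta over a window of width 2*delta."""
    thetas = np.linspace(theta_hat - delta, theta_hat + delta, n)
    p0 = p_correct(theta_hat, a, b)   # response distribution at the current estimate
    p = p_correct(thetas, a, b)       # response distributions at alternative theta
    kl = p0 * np.log(p0 / p) + (1.0 - p0) * np.log((1.0 - p0) / (1.0 - p))
    return kl.mean() * 2.0 * delta    # Riemann-style integral over the window

# Hypothetical item bank of (a, b) pairs; select the item with the largest KL index.
bank = [(0.8, 0.0), (1.6, 0.2), (1.2, -1.5)]
best = max(range(len(bank)), key=lambda j: kl_index(0.0, *bank[j]))
```

Because the KL divergence grows with the item's discrimination near the current estimate, this sketch also shows why KL-based selection favors highly discriminating items.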
Finally, simulation studies compared the performance of KI, D-optimality, CEM, and MUI. MUI performed best among these methods. Note that KLP was not included.
1. It is not intuitive why KL information should reduce estimation error. KL divergence is also known as relative entropy, so in what sense can we call it "information"? Why should we select the candidate item with the largest KL information, and why does that work?
2. In formula (12), the pi is a distribution, not a single probability. It has long seemed strange to me that substituting (12) into (11) yields (13): by the definition of entropy, the term in (11) should be a probability function, so I do not think (13) should be called an "entropy".
3. Why is (13) called the differential entropy? What "difference" does it represent?
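One fact that helps with questions 2 and 3: "differential" refers to the continuous (density-based) analogue of Shannon entropy, not to any difference, and unlike discrete entropy it can be negative. A minimal numerical check on a Gaussian density (the grid and the helper names are my own, used only for this demonstration):

```python
import numpy as np

def differential_entropy(pdf, xs):
    """Numerical differential entropy -sum f(x) ln f(x) dx on a uniform grid."""
    dx = xs[1] - xs[0]
    f = np.maximum(pdf(xs), 1e-300)   # avoid log(0) where the density underflows
    return float(-np.sum(f * np.log(f)) * dx)

def normal_pdf(mu, sigma):
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

xs = np.linspace(-10.0, 10.0, 20001)
h_wide = differential_entropy(normal_pdf(0.0, 1.0), xs)    # close to 0.5*ln(2*pi*e)
h_narrow = differential_entropy(normal_pdf(0.0, 0.1), xs)  # negative for a narrow density
```

The closed form for a Gaussian is 0.5*ln(2*pi*e*sigma^2), so a sufficiently concentrated posterior has negative differential entropy; this is also why selecting the item with the smallest expected posterior entropy corresponds to concentrating the posterior.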
4. KLP vs. KLB: they use different KL divergences; the difference is that the old and new posterior distributions are swapped. Since KL divergence is asymmetric, the two item selection indices are generally different. KLP was not included in the simulation study, so it remains open whether KLP is superior to KLB. What is the conceptual relationship between the two indices?
5. All of these methods are computationally intensive, especially in high dimensions. Interestingly, however, they do not consider the provisional trait estimate at all when selecting the next item.