1. How can we verify the quality of a newly assembled test? How can we evaluate whether the generated/assembled test fully satisfies the preset goals without introducing other noise?
2. The test was assembled under a specific model. After the test is administered to examinees, does this mean that only that model (the assumed true model) can be used to analyze the response data?
3. Automated test generation techniques can be used in areas where the knowledge has a clean structure. The generation tool needs to be heterogeneous in nature; otherwise it may always produce items with similar structure and expression, causing a problem similar to item exposure. The paper stated that some combinations of radicals and incidentals may result in invalid items. How, then, can the program judge whether an item is "valid" or "invalid"?
4. What is the difference between (a) automatic test assembly techniques and (b) the item selection procedure in CAT?
1. The rule-based method encodes all the preset goals as programmed constraints, so an assembled test satisfies them by construction and no extra noise is introduced (see the first sketch below).
2. Yes, only the true model, i.e., the model the test was assembled under, should be used to analyze the response data (see the second sketch below).
3. It is indeed a problem. One way for the program to judge validity is through explicit rules over the radical/incidental combinations (see the third sketch below).
4. I think automatic test assembly is essentially CAT with many constraints: ATA assembles the whole form at once subject to all constraints, while CAT's item selection picks one item at a time based on the examinee's current ability estimate (see the fourth sketch below).
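To make answer 1 concrete, here is a minimal sketch of rule-based assembly written as a 0/1 integer program, assuming the PuLP library. The item pool, the 2PL information objective, and the two constraints (test length, content balance) are synthetic placeholders, not taken from the paper.

```python
# Minimal automated test assembly (ATA) sketch as a 0/1 integer program.
# Assumes the PuLP library (pip install pulp); the item pool, content
# areas, and targets below are synthetic, not real data.
import math
import random

from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

random.seed(0)
POOL_SIZE, TEST_LENGTH, THETA = 100, 20, 0.0

# Synthetic 2PL item pool: discrimination a, difficulty b, content area.
pool = [{"a": random.uniform(0.5, 2.0),
         "b": random.uniform(-2.0, 2.0),
         "area": random.choice(["algebra", "geometry"])}
        for _ in range(POOL_SIZE)]

def info_2pl(a, b, theta):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

prob = LpProblem("ata", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(POOL_SIZE)]

# Objective: maximize total test information at the target ability.
prob += lpSum(info_2pl(it["a"], it["b"], THETA) * x[i]
              for i, it in enumerate(pool))

# Preset goals expressed as hard constraints ("rules"):
prob += lpSum(x) == TEST_LENGTH                      # fixed test length
prob += lpSum(x[i] for i, it in enumerate(pool)      # content balance
              if it["area"] == "algebra") >= 8

prob.solve()
selected = [i for i in range(POOL_SIZE) if x[i].value() > 0.5]
print("selected items:", selected)
```

Because the preset goals enter as hard constraints, any feasible solution satisfies them exactly, so verifying the assembled test largely reduces to checking the solver status and the feasibility of the constraint set.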
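For answer 2, a minimal sketch of what "analyzing the response data with the true model" looks like: maximum-likelihood estimation of ability under the same 2PL model that generated the (synthetic) responses. All parameters here are invented for illustration.

```python
# Sketch for answer 2: score responses under the same 2PL model that the
# test was assembled with.  Item parameters and responses are synthetic.
import math
import random

random.seed(1)
items = [(random.uniform(0.5, 2.0), random.uniform(-2.0, 2.0))
         for _ in range(20)]  # (a, b) pairs of the assembled test
true_theta = 0.7

def p_2pl(a, b, theta):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Simulate one examinee's 0/1 responses under the true model.
responses = [1 if random.random() < p_2pl(a, b, true_theta) else 0
             for a, b in items]

def log_lik(theta):
    ll = 0.0
    for (a, b), u in zip(items, responses):
        p = p_2pl(a, b, theta)
        ll += math.log(p) if u else math.log(1.0 - p)
    return ll

# Crude grid-search MLE of theta; a Newton step would do the same job.
grid = [g / 100.0 for g in range(-400, 401)]
theta_hat = max(grid, key=log_lik)
print(f"true theta = {true_theta}, MLE = {theta_hat:.2f}")
```

Fitting a different model to the same data would change the likelihood above, which is how the "only the true model" claim could be stress-tested in practice.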
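For answer 3, a toy sketch of template-based item generation: "radicals" change the structure (and difficulty) of the item, "incidentals" change only surface features, and a hand-written rule rejects combinations that would produce invalid items. The template and the validity rule are invented, not from the paper.

```python
# Toy automatic item generation: radicals control structure, incidentals
# control surface features, and an explicit rule screens out invalid
# radical/incidental combinations.  Everything here is invented.
from itertools import product
import random

random.seed(2)

radicals = [  # structural features: operation and operand magnitude
    {"op": "+", "digits": 1},
    {"op": "+", "digits": 2},
    {"op": "-", "digits": 1},
    {"op": "-", "digits": 2},
]
incidentals = [  # surface features: names and objects in the story
    {"name": "Alice", "object": "apples"},
    {"name": "Bob", "object": "marbles"},
]

def is_valid(radical, a, b):
    """Rule-based validity check: forbid negative answers for
    subtraction items, which would be invalid for the target grade."""
    return not (radical["op"] == "-" and a - b < 0)

def render(radical, incidental, a, b):
    verb = "gets" if radical["op"] == "+" else "gives away"
    return (f"{incidental['name']} has {a} {incidental['object']} and "
            f"{verb} {b}. How many {incidental['object']} now?")

items = []
for radical, incidental in product(radicals, incidentals):
    hi = 10 ** radical["digits"] - 1
    a, b = random.randint(0, hi), random.randint(0, hi)
    if is_valid(radical, a, b):  # screen invalid combinations
        items.append(render(radical, incidental, a, b))

print(f"{len(items)} of {len(radicals) * len(incidentals)} items valid")
print("example:", items[0])
```

The screening rule only catches invalidity that someone anticipated and wrote down, which is exactly why the question remains a real problem for richer item models.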
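For answer 4, a minimal sketch of the CAT side of the comparison: items are selected one at a time to maximize Fisher information at the current ability estimate, in contrast to the one-shot constrained optimization in the first sketch. The pool, examinee, and fixed-length stopping rule are all synthetic.

```python
# Sketch for answer 4: sequential CAT item selection (maximum information
# at the current ability estimate), in contrast with the one-shot
# constrained optimization of ATA in the first sketch.  Synthetic data.
import math
import random

random.seed(3)
pool = [(random.uniform(0.5, 2.0), random.uniform(-2.0, 2.0))
        for _ in range(100)]
true_theta, test_length = 0.5, 10

def p_2pl(a, b, theta):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(a, b, theta):
    p = p_2pl(a, b, theta)
    return a * a * p * (1.0 - p)

def mle(administered, responses):
    """Grid-search MLE of theta from the responses so far."""
    def ll(theta):
        return sum(math.log(p_2pl(a, b, theta)) if u
                   else math.log(1.0 - p_2pl(a, b, theta))
                   for (a, b), u in zip(administered, responses))
    return max((g / 50.0 for g in range(-200, 201)), key=ll)

theta_hat, administered, responses, used = 0.0, [], [], set()
for _ in range(test_length):
    # Adaptive step: pick the unused item most informative at theta_hat.
    i = max((j for j in range(len(pool)) if j not in used),
            key=lambda j: info(*pool[j], theta_hat))
    used.add(i)
    a, b = pool[i]
    u = 1 if random.random() < p_2pl(a, b, true_theta) else 0
    administered.append((a, b))
    responses.append(u)
    theta_hat = mle(administered, responses)

print(f"final theta estimate: {theta_hat:.2f} (true {true_theta})")
```

Adding content-balance and exposure rules to this greedy loop is what pushes CAT item selection toward the fully constrained ATA formulation in the first sketch.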