Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
Magnus Stensmo, Terrence J. Sejnowski
Probability models can be used to predict outcomes and compensate for missing data, but even a perfect model cannot be used to make decisions unless the utilities of the outcomes, or preferences between them, are also provided. This arises in many real-world problems, such as medical diagnosis, where the cost of a test as well as the expected improvement in the outcome must be considered. Relatively little work has been done on learning the utilities of outcomes for optimal decision making. In this paper, we show how temporal-difference reinforcement learning (TD(λ)) can be used to determine decision theoretic utilities within the context of a mixture model and apply this new approach to a problem in medical diagnosis. TD(λ) learning of utilities reduces the number of tests that have to be done to achieve the same level of performance compared with the probability model alone, which results in significant cost savings and increased efficiency.
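The abstract refers to TD(λ) learning of utilities. As a point of reference, the following is a minimal sketch of the standard TD(λ) update with eligibility traces, assuming a simple tabular state representation; the paper itself learns utilities within a mixture model, and the function name, episode format, and parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def td_lambda_utilities(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.9):
    """Learn utility (value) estimates U(s) with TD(lambda) and
    accumulating eligibility traces.

    Each episode is a list of (state, reward) pairs, e.g. a sequence of
    diagnostic test results ending in an outcome with an associated
    cost or benefit (hypothetical encoding for illustration)."""
    U = np.zeros(n_states)               # utility estimate per state
    for episode in episodes:
        e = np.zeros(n_states)           # eligibility traces, reset each episode
        for t in range(len(episode) - 1):
            s, _ = episode[t]
            s_next, r = episode[t + 1]
            delta = r + gamma * U[s_next] - U[s]   # TD error
            e[s] += 1.0                            # accumulate trace for current state
            U += alpha * delta * e                 # update all recently visited states
            e *= gamma * lam                       # decay traces
    return U
```

The eligibility traces let credit for a final outcome propagate back to earlier states (e.g. earlier test decisions) within an episode, which is what allows utilities to be learned from delayed outcomes rather than immediate feedback alone.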