Part of Advances in Neural Information Processing Systems 11 (NIPS 1998)
Nuno Vasconcelos, Andrew Lippman
The hierarchical representation of data has various applications in do(cid:173) mains such as data mining, machine vision, or information retrieval. In this paper we introduce an extension of the Expectation-Maximization (EM) algorithm that learns mixture hierarchies in a computationally ef(cid:173) ficient manner. Efficiency is achieved by progressing in a bottom-up fashion, i.e. by clustering the mixture components of a given level in the hierarchy to obtain those of the level above. This cl ustering requires onl y knowledge of the mixture parameters, there being no need to resort to intermediate samples. In addition to practical applications, the algorithm allows a new interpretation of EM that makes clear the relationship with non-parametric kernel-based estimation methods, provides explicit con(cid:173) trol over the trade-off between the bias and variance of EM estimates, and offers new insights about the behavior of deterministic annealing methods commonly used with EM to escape local minima of the likelihood.