Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Thomas Briegel, Volker Tresp
We replace the commonly used Gaussian noise model in nonlinear regression by a more flexible noise model based on the Student-t-distribution. The degrees of freedom of the t-distribution can be chosen such that as special cases either the Gaussian distribution or the Cauchy distribution is realized. The latter is commonly used in robust regression. Since the t-distribution can be interpreted as an infinite mixture of Gaussians, parameters and hyperparameters such as the degrees of freedom of the t-distribution can be learned from the data based on an EM learning algorithm. We show that modeling using the t-distribution leads to improved predictors on real-world data sets. In particular, if outliers are present, the t-distribution is superior to the Gaussian noise model. In effect, by adapting the degrees of freedom, the system can "learn" to distinguish between outliers and non-outliers. Especially for online learning tasks, one is interested in avoiding inappropriate weight changes due to measurement outliers to maintain stable online learning capability. We show experimentally that using the t-distribution as a noise model leads to stable online learning algorithms and outperforms state-of-the-art online learning methods like the extended Kalman filter algorithm.
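The two ideas in the abstract, the scale-mixture interpretation of the t-distribution and the resulting EM-style downweighting of outliers, can be illustrated with a minimal sketch. The parameter values and the helper `em_weights` below are hypothetical, chosen only for illustration; the weight formula is the standard E-step expectation for t-distributed noise, not necessarily the exact update used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scale-mixture view: if eps ~ N(0, sigma^2) and u ~ Gamma(nu/2, rate=nu/2),
# then y = eps / sqrt(u) follows a Student-t distribution with nu degrees
# of freedom. (Illustrative values: nu=3, sigma=1.)
nu, sigma, n = 3.0, 1.0, 100_000
u = rng.gamma(nu / 2.0, 2.0 / nu, size=n)   # NumPy uses shape/scale, scale = 1/rate
eps = rng.normal(0.0, sigma, size=n)
y = eps / np.sqrt(u)                        # samples from a t_nu noise model

def em_weights(residuals, nu, sigma):
    # E-step weight E[u | y]: large residuals (outliers) receive small
    # weights, so they barely influence the weighted M-step update.
    return (nu + 1.0) / (nu + (residuals / sigma) ** 2)

# An inlier (residual 0) is fully trusted; a gross outlier is nearly ignored.
print(em_weights(np.array([0.0, 1.0, 10.0]), nu=nu, sigma=sigma))
```

As nu grows, the weights approach 1 for all residuals and the Gaussian model is recovered; nu = 1 gives the Cauchy case, which matches the special cases stated above.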