Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
Kazumi Saito, Ryohei Nakano
This paper compares three penalty terms with respect to the efficiency of supervised learning, by using first- and second-order learning algorithms. Our experiments showed that for a reasonably adequate penalty factor, the combination of the squared penalty term and the second-order learning algorithm drastically improves the convergence performance more than 20 times over the other combinations, at the same time bringing about a better generalization performance.