Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)

*Holger Schwenk, Maurice Milgram*

When training neural networks by the classical backpropagation algorithm the whole problem to learn must be expressed by a set of inputs and desired outputs. However, we often have high-level knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the classification should be invariant under a set of transformations like rotation or translation. We propose a new modular classification system based on several autoassociative multilayer perceptrons which allows the efficient incorporation of such knowledge. Results are reported on the NIST database of upper case handwritten letters and compared to other approaches to the invariance problem.

1 INCORPORATION OF EXPLICIT KNOWLEDGE

The aim of supervised learning is to learn a mapping between the input and the output space from a set of example pairs (input, desired output). The classical implementation in the domain of neural networks is the backpropagation algorithm. If this learning set is sufficiently representative of the underlying data distributions, one hopes that after learning, the system is able to generalize correctly to other inputs of the same distribution.

It would be better to have more powerful techniques for incorporating knowledge into the learning process than the choice of a set of examples. The use of additional knowledge is often limited to the feature extraction module. Besides simple operations like (size) normalization, there are more sophisticated approaches such as Zernike moments in the domain of optical character recognition (OCR). We do not investigate this possibility in this paper; all classifiers discussed work directly on almost unpreprocessed data (pixels).

In the context of OCR, interest focuses on the invariance of the classifier under a number of given transformations (translation, rotation, ...) of the data to be classified. In principle, a neural network could extract those properties from a large enough learning set, but this is very hard to learn and will probably take a lot of time. In recent years, two main approaches to this invariance problem have been proposed: tangent-prop and tangent-distance. An indirect incorporation can be achieved by boosting (Drucker, Schapire and Simard, 1993).

In this paper we briefly discuss these approaches and present a new classification system that allows the efficient incorporation of transformation invariances.

1.1 TANGENT PROPAGATION

The principle of tangent-prop is to specify, besides the desired outputs, also desired changes $j_\mu$ of the output vector when transforming the net input $x$ by the transformations $t_\mu$ (Simard, Victorri, LeCun and Denker, 1992). For this, let us define a transformation of pattern $p$ as $t(p, \alpha) : P \to P$, where $P$ is the space of all patterns and $\alpha$ a parameter. Such transformations are in general highly nonlinear operations in the pixel space $P$, and their analytical expressions are seldom known. It is therefore favorable to use a first order approximation:
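The expansion itself is not reproduced in this excerpt. Assuming the usual first-order Taylor expansion around a zero parameter value (the parameter is written $\alpha$ here), it would take the form $t(p, \alpha) \approx p + \alpha\, t_p$, where the tangent vector $t_p$ is the derivative of $t(p, \alpha)$ with respect to $\alpha$ at $\alpha = 0$. The following NumPy sketch illustrates this idea numerically and is not the authors' implementation: the Gaussian-blob pattern, the choice of horizontal translation as the transformation, the finite-difference step, and all function names are assumptions made for illustration.

```python
import numpy as np

def make_pattern(shift=0.0, size=16, sigma=2.0):
    """Hypothetical smooth pattern: a Gaussian blob on a size x size pixel
    grid, displaced horizontally by `shift` pixels (illustrative only)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    center = (size - 1) / 2.0
    return np.exp(-((xs - center - shift) ** 2 + (ys - center) ** 2)
                  / (2.0 * sigma ** 2))

def translate(alpha):
    """Assumed transformation t(p, alpha): horizontal translation by alpha.
    The pattern p is fixed implicitly as make_pattern(0.0)."""
    return make_pattern(shift=alpha)

def tangent_vector(transform, eps=1e-3):
    """Estimate t_p = d t(p, alpha) / d alpha at alpha = 0 with a central
    finite difference, since an analytical expression is rarely available."""
    return (transform(eps) - transform(-eps)) / (2.0 * eps)

p = translate(0.0)               # untransformed pattern
t_p = tangent_vector(translate)  # tangent vector in pixel space

def linear_error(alpha):
    """Euclidean distance between the true transformed pattern and its
    first-order approximation p + alpha * t_p."""
    return np.linalg.norm(translate(alpha) - (p + alpha * t_p))

err_small = linear_error(0.1)    # small displacement
err_large = linear_error(1.0)    # one full pixel
```

In this sketch the approximation error grows roughly quadratically with the parameter, which is why a linearization of this kind is only trusted for small transformations.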
