{"title": "A Topographic Product for the Optimization of Self-Organizing Feature Maps", "book": "Advances in Neural Information Processing Systems", "page_first": 1141, "page_last": 1147, "abstract": "", "full_text": "A Topographic Product for the Optimization of Self-Organizing Feature Maps \n\nHans-Ulrich Bauer, Klaus Pawelzik, Theo Geisel \n\nInstitut für theoretische Physik and SFB Nichtlineare Dynamik \nUniversität Frankfurt \nRobert-Mayer-Str. 8-10 \nW-6000 Frankfurt 11 \nFed. Rep. of Germany \nemail: bauer@asgard.physik.uni-frankfurt.dbp \n\nAbstract \n\nOptimizing the performance of self-organizing feature maps like the Kohonen map involves the choice of the output space topology. We present a topographic product which measures the preservation of neighborhood relations, as a criterion to optimize the output space topology of the map with regard to the global dimensionality DA as well as to the dimensions in the individual directions. We test the topographic product method not only on synthetic mapping examples, but also on speech data. In the latter application our method suggests an output space dimensionality of DA = 3, in coincidence with recent recognition results on the same data set. \n\n1 INTRODUCTION \n\nSelf-organizing feature maps like the Kohonen map (Kohonen, 1989, Ritter et al., 1990) not only provide a plausible explanation for the formation of maps in brains, e.g. in the visual system (Obermayer et al., 1990), but have also been applied to problems like vector quantization or robot arm control (Martinetz et al., 1990). The underlying organizing principle is the preservation of neighborhood relations. For this principle to lead to a most useful map, the topological structure of the output space must roughly fit the structure of the input data. However, in technical applications this structure is often not a priori known. 
For this reason several attempts have been made to modify the Kohonen algorithm such that not only the weights, but also the output space topology itself is adapted during learning (Kangas et al., 1990, Martinetz et al., 1991). \n\nOur contribution is also concerned with optimal output space topologies, but we follow a different approach, which avoids a possibly complicated structure of the output space. First we describe a quantitative measure for the preservation of neighborhood relations in maps, the topographic product P. The topographic product had been invented under the name of \"wavering product\" in nonlinear dynamics in order to optimize the embeddings of chaotic attractors (Liebert et al., 1991). P = 0 indicates a perfect match of the topologies. P < 0 (P > 0) indicates a folding of the output space into the input space (or vice versa), which can be caused by a too small (resp. too large) output space dimensionality. The topographic product can be computed for any self-organizing feature map, without regard to its specific learning rule. Since judging the degree of twisting and folding by visually inspecting a plot of the map is the only other way of \"measuring\" the preservation of neighborhoods, the topographic product is particularly helpful if the input space dimensionality of the map exceeds DA = 3 and the map can no longer be visualized. Therefore the derivation of the topographic product is already of value by itself. \n\nIn the second part of the paper we demonstrate the use of the topographic product by two examples. The first example deals with maps from a 2D input space with a nonflat stimulus distribution onto rectangles of different aspect ratios, the second example with the map of 19D speech data onto output spaces of different dimensionality. In both cases we show how the output space topology can be optimized using our method. 
\n\n2 DERIVATION OF THE TOPOGRAPHIC PRODUCT \n\n2.1 KOHONEN ALGORITHM \n\nIn order to introduce the notation necessary to derive the topographic product, we very briefly recall the Kohonen algorithm. It describes a map from an input space V into an output space A. Each node j in A has a weight vector wj associated with it, which points into V. A stimulus v is mapped onto that node i in the output space which minimizes the input space distance dV(wi, v): \n\ndV(wi, v) = min_j dV(wj, v). (1) \n\nDuring a learning step, a random stimulus is chosen in the input space and mapped onto an output node i according to Eq. 1. Then all weights wj are shifted towards v, with the amount of shift for each weight vector being determined by a neighborhood function h_i,j: \n\nwj -> wj + eps h_i,j (v - wj). (2) \n\n(dA(j, i) measures distances in the output space.) h_i,j effectively restricts the nodes participating in the learning step to nodes in the vicinity of i. A typical choice for the neighborhood function is \n\nh_i,j = exp(-dA(j, i)^2 / (2 sigma^2)). (3) \n\nIn this way the neighborhood relations in the output space are enforced in the input space, and the output space topology becomes of crucial importance. Finally it should be mentioned that the learning step size eps as well as the width sigma of the neighborhood function are decreased during the learning for the algorithm to converge to an equilibrium state. A typical choice is an exponential decrease. For a detailed discussion of the convergence properties of the algorithm, see (Ritter et al., 1988). \n\n2.2 TOPOGRAPHIC PRODUCT \n\nAfter the learning phase, the topographic product is computed as follows. For each output space node j, the nearest neighbor ordering in input space and output space is computed (n_k^A(j) denotes the k-th nearest neighbor of j in A, n_k^V(j) the one in V). 
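As an illustrative aside (not part of the original presentation), a single learning step of the Kohonen algorithm and the nearest neighbor orderings just introduced can be sketched in pure Python; the function names and the brute-force distance search are our own choices:

```python
import math

def kohonen_step(weights, positions, v, eps, sigma):
    """One learning step of the Kohonen algorithm (Eqs. 1-3).

    weights   -- list of weight vectors w_j, living in the input space V
    positions -- list of node coordinates j, living in the output space A
    v         -- stimulus drawn from V
    """
    # Eq. (1): the winner i minimizes the input space distance dV(w_i, v).
    i = min(range(len(weights)), key=lambda j: math.dist(weights[j], v))
    # Eqs. (2), (3): shift every w_j towards v, weighted by the Gaussian
    # neighborhood function h_{i,j} of the output space distance dA(j, i).
    for j, w in enumerate(weights):
        h = math.exp(-math.dist(positions[j], positions[i]) ** 2
                     / (2.0 * sigma ** 2))
        weights[j] = [wc + eps * h * (vc - wc) for wc, vc in zip(w, v)]
    return i

def nn_orderings(weights, positions, j):
    """Nearest neighbor orderings of node j: the k-th entries of the two
    returned lists are n_k^A(j) (output space) and n_k^V(j) (input space)."""
    others = [m for m in range(len(weights)) if m != j]
    n_A = sorted(others, key=lambda m: math.dist(positions[m], positions[j]))
    n_V = sorted(others, key=lambda m: math.dist(weights[m], weights[j]))
    return n_A, n_V
```

The stable sort breaks distance ties by node index; maps with coincident weight vectors would need an explicit tie-breaking rule.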
Using these quantities, we define the ratios \n\nQ1(j, k) = dV(wj, w_{n_k^A(j)}) / dV(wj, w_{n_k^V(j)}), (4) \n\nQ2(j, k) = dA(j, n_k^A(j)) / dA(j, n_k^V(j)). (5) \n\nOne has Q1(j, k) = Q2(j, k) = 1 only if the k-th nearest neighbors in V and A coincide. Any deviation of the nearest neighbor orderings will result in values for Q1,2 deviating from 1. However, not all differences in the nearest neighbor orderings in V and A are necessarily induced by neighborhood violations. Some can be due to locally varying magnification factors of the map, which in turn are induced by spatially varying stimulus densities in V. To cancel out the latter effects, we define the products \n\nP1(j, k) = (prod_{l=1..k} Q1(j, l))^(1/k), (6) \n\nP2(j, k) = (prod_{l=1..k} Q2(j, l))^(1/k). (7) \n\nFor these the relations \n\nP1(j, k) >= 1, P2(j, k) <= 1 \n\nhold. Large deviations of P1 (resp. P2) from the value 1 indicate neighborhood violations when looking from the output space into the input space (resp. from the input space into the output space). In order to get a symmetric overall measure, we further multiply P1 and P2 and find \n\nP3(j, k) = (prod_{l=1..k} Q1(j, l) Q2(j, l))^(1/(2k)). (8) \n\nFurther averaging over all nodes and neighborhood orders finally yields the topographic product \n\nP = 1/(N(N-1)) sum_{j=1..N} sum_{k=1..N-1} log(P3(j, k)). (9) \n\nThe possible values for P are to be interpreted as follows: \n\nP < 0: output space dimension DA too low, \nP = 0: output space dimension DA o.k., \nP > 0: output space dimension DA too high. \n\nThese formulas suffice to understand how the product is to be computed. A more detailed explanation of the rationale behind each individual step of the derivation can be found in a forthcoming publication (Bauer et al., 1991). \n\n3 EXAMPLES \n\nWe conclude the paper with two examples which exemplify how the method works. 
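For readers who wish to apply the criterion to their own maps, the computation of the topographic product can be sketched in a few lines of pure Python (a minimal sketch with a brute-force neighbor search; the function name and data layout are our own, and coincident weight vectors are assumed not to occur, since they would make a distance ratio undefined):

```python
import math

def topographic_product(weights, positions):
    """Topographic product P of Eqs. (4)-(9).

    weights   -- list of weight vectors w_j (input space V)
    positions -- list of node coordinates (output space A)
    P < 0 suggests DA too low, P ~ 0 a good fit, P > 0 DA too high.
    """
    N = len(weights)
    total = 0.0
    for j in range(N):
        others = [m for m in range(N) if m != j]
        # Nearest neighbor orderings n_k^A(j) and n_k^V(j) (Sec. 2.2).
        n_A = sorted(others, key=lambda m: math.dist(positions[m], positions[j]))
        n_V = sorted(others, key=lambda m: math.dist(weights[m], weights[j]))
        log_prod = 0.0  # running sum of log(Q1 * Q2) over l = 1..k
        for k in range(1, N):
            # Eqs. (4), (5): distance ratios for the k-th neighbors.
            q1 = (math.dist(weights[j], weights[n_A[k - 1]])
                  / math.dist(weights[j], weights[n_V[k - 1]]))
            q2 = (math.dist(positions[j], positions[n_A[k - 1]])
                  / math.dist(positions[j], positions[n_V[k - 1]]))
            log_prod += math.log(q1) + math.log(q2)
            # Eq. (8): log P3(j, k) is the cumulative sum divided by 2k.
            total += log_prod / (2.0 * k)
    # Eq. (9): average over all nodes j and neighborhood orders k.
    return total / (N * (N - 1))
```

For a map that preserves neighborhoods perfectly, all ratios equal 1 and P = 0; folding the output space into the input space (or vice versa) drives P away from 0 in the directions listed above.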
\n\n3.1 ILLUSTRATIVE EXAMPLE \n\nThe first example deals with the mapping from a 2D input space onto rectangles of different aspect ratios. The stimulus distribution is flat in one direction, Gaussian shaped in the other (Fig. 1a). The example demonstrates two aspects of our method at once. First it shows that the method works fine with maps resulting from nonflat stimulus distributions. These induce spatially varying areal magnification factors of the map, which in turn lead to twists in the neighborhood ordering between input space and output space. Compensation for such twists was the purpose of the multiplication in Eqs. (6) and (7). \n\nTable 1: Topographic product P for the map from a square input space with a Gaussian stimulus distribution in one direction, onto rectangles with different aspect ratios. The values for P are averaged over 8 networks each. The 43x6 output space matches the input data best, since its topographic product is smallest. \n\nN | aspect ratio | P \n256x1 | 256 | -0.04400 \n128x2 | 64 | -0.03099 \n64x4 | 16 | -0.00721 \n43x6 | 7.17 | 0.00127 \n32x8 | 4 | 0.00224 \n21x12 | 1.75 | 0.01335 \n16x16 | 1 | 0.02666 \n\nFigure 1: Self-organizing feature maps of a Gaussian shaped (a) 2-dimensional stimulus distribution onto output spaces with 128x2 (b), 43x6 (c) and 16x16 (d) output nodes. The 43x6 output space preserves neighborhood relations best. 
\n\nSecondly, the method can not only be used to optimize the overall output space dimensionality, but also the individual dimensions in the different directions (i.e. the different aspect ratios). If the rectangles are too long, the resulting map is folded like a Peano curve (Fig. 1b), and neighborhood relations are severely violated perpendicular to the long side of the rectangle. If the aspect ratio fits, the map has a regular look (Fig. 1c), and neighborhoods are preserved. The zig-zag form at the outer boundary of the rectangle does not correspond to neighborhood violations. If the rectangle approaches a square, the output space is somewhat squashed into the input space, again violating neighborhood relations (Fig. 1d). The topographic product P coincides with this intuitive evaluation (Tab. 1) and picks the 43x6 net as the most neighborhood preserving one. \n\n3.2 APPLICATION EXAMPLE \n\nIn our second example speech data is mapped onto output spaces of various dimensionality. The data represent utterances of the ten German digits, given as 19-dimensional acoustical feature vectors (Gramß et al., 1990). The P-values for the different maps are given in Tab. 2. For both the speaker-dependent as well as the speaker-independent case the method distinguishes the maps with DA = 3 as most neighborhood preserving. Several points are interesting about these results. First of all, the suggested output space dimensionality exceeds the widely used DA = 2. Secondly, the method does not generally judge larger output space dimensions as more neighborhood preserving, but puts an upper bound on DA. The data seem to occupy a submanifold of the input space which is distinctly lower than four dimensional. Furthermore we see that the transition from one to several speakers does not change the value of DA which is optimal under neighborhood considerations. 
This contradicts the expectation that the additional interspeaker variance in the data occupies a full additional dimension. \n\nTable 2: Topographic product P for maps from speech feature vectors in a 19D input space onto output spaces of different dimensionality DA. \n\nDA | N | P (speaker-dependent) | P (speaker-independent) \n1 | 256 | -0.156 | -0.229 \n2 | 16x16 | -0.028 | -0.036 \n3 | 7x6x6 | 0.019 | 0.007 \n4 | 4x4x4x4 | 0.037 | 0.034 \n\nWhat do these results mean for speech recognition? Let us suppose that several utterances of the same word lead to close-by feature vector sequences in the input space. If the mapping was not neighborhood preserving, one should expect the trajectories in the output space to be separated considerably. If a speech recognition system compares these output space trajectories with reference trajectories corresponding to reference utterances of the words, the probability of misclassification rises. So one should expect that a word recognition system with a Kohonen-map preprocessor and a subsequent trajectory classifier should perform better if the neighborhoods in the map are preserved. \n\nThe results of a recent speech recognition experiment coincide with these heuristic expectations (Brandt et al., 1991). The experiment was based on the same data set, made use of a Kohonen feature map as a preprocessor, and of a dynamic time-warping algorithm as a sequence classifier. The recognition performance of this hybrid system turned out to be better by about 7% for a 3D map, compared to a 2D map with a comparable number of nodes (0.795 vs. 0.725 recognition rate). \n\nAcknowledgements \n\nThis work was supported by the Deutsche Forschungsgemeinschaft through SFB 185 \"Nichtlineare Dynamik\", TP A10. \n\nReferences \n\nH.-U. Bauer, K. 
Pawelzik, Quantifying the Neighborhood Preservation of Self-Organizing Feature Maps, submitted to IEEE TNN (1991). \n\nW.D. Brandt, H. Behme, H.W. Strube, Bildung von Merkmalen zur Spracherkennung mittels Phonotopischer Karten, Fortschritte der Akustik - Proc. of DAGA 91 (DPG GmbH, Bad Honnef), 1057 (1991). \n\nT. Gramß, H.W. Strube, Recognition of Isolated Words Based on Psychoacoustics and Neurobiology, Speech Comm. 9, 35 (1990). \n\nJ.A. Kangas, T.K. Kohonen, J.T. Laaksonen, Variants of Self-Organizing Maps, IEEE Trans. Neur. Net. 1, 93 (1990). \n\nT. Kohonen, Self-Organization and Associative Memory, 3rd Ed., Springer (1989). \n\nW. Liebert, K. Pawelzik, H.G. Schuster, Optimal Embeddings of Chaotic Attractors from Topological Considerations, Europhysics Lett. 14, 521 (1991). \n\nT. Martinetz, H. Ritter, K. Schulten, Three-Dimensional Neural Net for Learning Visuomotor Coordination of a Robot Arm, IEEE Trans. Neur. Net. 1, 131 (1990). \n\nT. Martinetz, K. Schulten, A \"Neural-Gas\" Network Learns Topologies, Proc. ICANN 91 Helsinki, ed. Kohonen et al., North-Holland, I-397 (1991). \n\nK. Obermayer, H. Ritter, K. Schulten, A Principle for the Formation of the Spatial Structure of Cortical Feature Maps, Proc. Nat. Acad. Sci. USA 87, 8345 (1990). \n\nH. Ritter, K. Schulten, Convergence Properties of Kohonen's Topology Conserving Maps: Fluctuations, Stability and Dimension Selection, Biol. Cyb. 60, 59-71 (1988). \n\nH. Ritter, T. Martinetz, K. Schulten, Neuronale Netze, Addison Wesley (1990). \n", "award": [], "sourceid": 508, "authors": [{"given_name": "Hans-Ulrich", "family_name": "Bauer", "institution": null}, {"given_name": "Klaus", "family_name": "Pawelzik", "institution": null}, {"given_name": "Theo", "family_name": "Geisel", "institution": null}]}