{"title": "Investment Learning with Hierarchical PSOMs", "book": "Advances in Neural Information Processing Systems", "page_first": 570, "page_last": 576, "abstract": "", "full_text": "Investment Learning with Hierarchical PSOMs \n\nJ\u00f6rg Walter and Helge Ritter \n\nDepartment of Information Science \n\nUniversity of Bielefeld, D-33615 Bielefeld, Germany \n\nEmail: {walter,helge}@techfak.uni-bielefeld.de \n\nAbstract \n\nWe propose a hierarchical scheme for rapid learning of context dependent \"skills\" that is based on the recently introduced \"Parameterized Self-Organizing Map\" (\"PSOM\"). The underlying idea is to first invest some learning effort to specialize the system into a rapid learner for a more restricted range of contexts. \n\nThe specialization is carried out by a prior \"investment learning stage\", during which the system acquires a set of basis mappings or \"skills\" for a set of prototypical contexts. Adaptation of a \"skill\" to a new context can then be achieved by interpolating in the space of the basis mappings and thus can be extremely rapid. \n\nWe demonstrate the potential of this approach for the task of a 3D visuo-motor map for a Puma robot and two cameras. This includes the forward and backward robot kinematics in 3D end effector coordinates, the 2D+2D retina coordinates and also the 6D joint angles. After the investment phase the transformation can be learned for a new camera set-up with a single observation. \n\n1  Introduction \n\nMost current applications of neural network learning algorithms suffer from a large number of required training examples. This may not be a problem when data are abundant, but in many application domains, for example in robotics, training examples are costly and the benefits of learning can only be exploited when significant progress can be made within a very small number of learning examples. 
\n\nIn the present contribution, we propose in section 3 a hierarchically structured learning approach which can be applied to many learning tasks that require system identification from a limited set of observations. The idea builds on the recently introduced \"Parameterized Self-Organizing Maps\" (\"PSOMs\"), whose strength is learning maps from a very small number of training examples [8, 10, 11]. \nIn [8], the feasibility of the approach was demonstrated in the domain of robotics, including the learning of the inverse kinematics transform of a full 6-degree-of-freedom (DOF) Puma robot. In [10], two improvements were introduced, both of which achieve a significant increase in mapping accuracy and computational efficiency. In the next section, we give a short summary of the PSOM algorithm; it is described in more detail in [11], which also presents applications in the domain of visual learning. \n\n2  The PSOM Algorithm \n\nA Parameterized Self-Organizing Map is a parametrized, $m$-dimensional hyper-surface $M = \\{ w(s) \\in X \\subseteq \\mathbb{R}^d \\mid s \\in S \\subseteq \\mathbb{R}^m \\}$ that is embedded in some higher-dimensional vector space $X$. $M$ is used in a very similar way as the standard discrete self-organizing map: given a distance measure $\\mathrm{dist}(x, x')$ and an input vector $x$, a best-match location $s^*(x)$ is determined by minimizing \n\n$$s^* := \\arg\\min_{s \\in S} \\mathrm{dist}(x, w(s)) \\qquad (1)$$ \n\nThe associated \"best-match vector\" $w(s^*)$ provides the best approximation of input $x$ in the manifold $M$. If we require $\\mathrm{dist}(\\cdot)$ to vary only in a subspace $X^{in}$ of $X$ (i.e., $\\mathrm{dist}(x, x') = \\mathrm{dist}(Px, Px')$, where the diagonal matrix $P$ projects into $X^{in}$), $s^*(x)$ actually will only depend on $Px$. 
The projection $(1-P)\\,w(s^*(x)) \\in X^{out}$ of $w(s^*(x))$, which lies in the orthogonal subspace $X^{out}$, can be viewed as a (non-linear) associative completion of a fragmentary input $x$ of which only the part $Px$ is reliable. It is this associative mapping that we will exploit in applications of the PSOM. \n\n$M$ is constructed as a manifold that passes through a given set $D$ of data examples (Fig. 1 depicts the situation schematically). To this end, we assign to each data sample a point $a \\in S$ and denote the associated data sample by $w_a$. The set $A$ of the assigned parameter values $a$ should provide a good discrete \"model\" of the topology of our data set (Fig. 1 right). The assignment between data vectors and points $a$ must be made in a topology-preserving fashion to ensure good interpolation by the manifold $M$ that is obtained by the following steps. \n\nFigure 1: Best-match $s^*$ and associative completion $w(s^*(x))$ of input $x_1, x_2$ ($Px$) given in the input subspace $X^{in}$. Here, in this simple case, the $m = 1$ dimensional manifold $M$ is constructed to pass through four data vectors (square marked). The left side shows the $d = 3$ dimensional embedding space $X = X^{in} \\times X^{out}$ and the right side depicts the best-match parameter $s^*(x)$ in the parameter manifold $S$ together with the \"hyper-lattice\" $A$ of parameter values (indicated by white squares) belonging to the data vectors. \n\nFor each point $a \\in A$, we construct a \"basis function\" $H(\\cdot, a; A)$ or, simplified (see footnote 1), $H(\\cdot, a) : S \\to \\mathbb{R}$ that obeys (i) $H(a_i, a_j) = 1$ for $i = j$ and vanishes at all other points of $A$, $i \\neq j$ (orthonormality condition), and (ii) $\\sum_{a \\in A} H(s, a) = 1$ for all $s$ (\"partition of unity\" condition). 
We will mainly be concerned with the case of $A$ being an $m$-dimensional rectangular hyper-lattice; in this case, the functions $H(\\cdot, a)$ can be constructed as products of Lagrange interpolation polynomials, see [11]. Then, \n\n$$w(s) = \\sum_{a \\in A} H(s, a)\\, w_a \\qquad (2)$$ \n\ndefines a manifold $M$ that passes through all data examples. Minimizing $\\mathrm{dist}(\\cdot)$ in Eq. 1 can be done by some iterative procedure, such as gradient descent or, preferably, the Levenberg-Marquardt algorithm [11]. This makes $M$ into the attractor manifold of a (discrete time) dynamical system. Since $M$ contains the data set $D$, any at least $m$-dimensional \"fragment\" of a data example $x = w \\in D$ will be attracted to the correct completion $w$. Inputs $x \\notin D$ will be attracted to some approximating manifold point. \n\nThis approach is in many ways the continuous analog of the standard discrete self-organizing map. Particularly attractive features are (i) that the construction of the map manifold is direct from a small set of training vectors, without any need for time-consuming adaptation sequences, (ii) the capability of associative completion, which allows one to freely redefine variables as inputs or outputs (by changing $\\mathrm{dist}(\\cdot)$ on demand; e.g., one can reverse the mapping direction), and (iii) the possibility of having attractor manifolds instead of just attractor points. \n\n3  Hierarchical PSOMs: Structuring Learning \n\nRapid learning requires that the structure of the learner is well matched to its task. However, if one does not want to pre-structure the learner by hand, learning again seems to be the only way to achieve the necessary pre-structuring. 
This leads to the idea of structuring learning itself and motivates splitting learning into two stages: \n(i) The earlier stage is considered as an \"investment stage\" that may be slow and that may require a larger number of examples. It has the task to pre-structure the system in such a way that in the later stage, \n(ii) the now specialized system can learn fast and with extremely few examples. \n\nTo be concrete, we consider specialized mappings or \"skills\", which are dependent on the state of the system or system environment. Pre-structuring the system is achieved by learning a set of basis mappings, each in a prototypical system context or environment state (\"investment phase\"). This imposes a strong need for an efficient learning tool, efficient in particular with respect to the number of required training data points. \n\nPSOM networks appear as a very attractive solution: Fig. 2 shows a hierarchical arrangement of two PSOMs. The task of mapping from input to output spaces is learned, and performed, by the \"Transformation-PSOM\" (\"T-PSOM\"). \n\nFootnote 1: In contrast to kernel methods, the basis functions may depend on the relative position to all other knots. However, we drop in our notation the dependency $H(s, a) = H(s, a; A)$ on the latter. \n\nFigure 2: The transforming \"T-PSOM\" maps between input and output spaces (changing direction on demand). In a particular environmental context, the correct transformation is learned and encoded in the internal parameter or weight set $w$. Together with a characteristic environment observation $u_{ref}$, the weight set $w$ is employed as a training vector for the second level \"Meta-PSOM\". After learning a structured set of mappings, the Meta-PSOM is able to generalize the mapping for a new environment. When encountering any change, the environment observation $u_{ref}$ gives input to the Meta-PSOM and determines the new weight set $w$ for the basis T-PSOM. \n\nDuring the first learning stage, the investment learning phase, the T-PSOM is used to learn a set of basis mappings $T_j : X_1 \\leftrightarrow X_2$ or context dependent \"skills\", each of which gets encoded as an internal parameter or \"weight\" set $w_j$. The second level PSOM (\"Meta-PSOM\") is responsible for learning the association between the weight sets $w_j$ of the first level T-PSOM and their situational contexts. \nThe system context is characterized by a suitable environment observation, denoted $u_{ref}$, see Fig. 2. \nThe context situations are chosen such that the associated basis mappings already capture a significant amount of the underlying model structure, while still being sufficiently general to capture the variations with respect to which system environment identification is desired. \nFor the training of the second level Meta-PSOM, each constructed T-PSOM weight set $w_j$ serves together with its associated environment observation $u_{ref,j}$ as a high-dimensional training data vector. \n\nRapid learning is the return on invested effort in the longer pre-training phase. As a result, the task of learning the \"skill\" associated with an unknown system context now takes the form of an immediate Meta-PSOM $\\to$ T-PSOM mapping: the Meta-PSOM maps the new system context observation $u_{ref,new}$ into the parameter set $w_{new}$ for the T-PSOM. Equipped with $w_{new}$, the T-PSOM provides the desired mapping $T_{new}$. 
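\n\nAs a hedged illustration (in notation assumed here, not taken verbatim from the text), the rapid-learning step can be written by applying Eqs. 1 and 2 to the Meta-PSOM, whose training vectors are the pairs $(u_{ref,j}, w_j)$ attached to knots $a_j$. The labels $A_{Meta}$, $H_{Meta}$ and $s^*_{Meta}$ are hypothetical names, and a Euclidean distance restricted to the context components is assumed for $\\mathrm{dist}(\\cdot)$: \n\n$$s^*_{Meta} = \\arg\\min_{s} \\Big\\| u_{ref,new} - \\sum_{a_j \\in A_{Meta}} H_{Meta}(s, a_j)\\, u_{ref,j} \\Big\\|, \\qquad w_{new} = \\sum_{a_j \\in A_{Meta}} H_{Meta}(s^*_{Meta}, a_j)\\, w_j$$ \n\nUnder this reading, $w_{new}$ is a weighted combination of the stored weight sets. If $u_{ref,new}$ coincides with a stored $u_{ref,j}$ (and the stored context observations are distinct), the knot $a_j$ is a best match and, by the orthonormality condition (i), $w_{new} = w_j$. 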
\n\n4  Rapid Learning of a Stereo Visuo-motor Map \n\nIn the following, we demonstrate the potential of the investment learning approach with the task of fast learning of 3D visuo-motor maps for a robot manipulator seen by a pair of movable cameras. Thus, in this demonstration, each situated context is given by a particular camera arrangement, and the associated \"skill\" is the mapping between camera and robot coordinates. \nThe Puma robot is positioned behind a table and the entire scene is displayed on two windows on a computer monitor. By mouse-pointing, a user can, for example, select on the monitor one point and the position on a line appearing in the other window, to indicate a good position for the robot end effector, see Fig. 3. This requires computing the transformation $T$ between pixel coordinates $u = (u^L, u^R)$ on the monitor images and corresponding world coordinates $x$ in the robot reference frame or, alternatively, the corresponding six robot joint angles $\\theta$ (6 DOF). Here we demonstrate an integrated solution, offering both solutions with the same network. \n\nFigure 3: Rapid learning of the 3D visuo-motor coordination for two cameras. The basis T-PSOM ($m = 3$) is capable of mapping to and from three coordinate systems: Cartesian robot world coordinates, the robot joint angles (6-DOF), and the location of the end-effector in coordinates of the two camera retinas. Since the left and right camera can be relocated independently, the weight set of the T-PSOM is split, and parts $w^L$, $w^R$ are learned in two separate Meta-PSOMs (\"L\" and \"R\"). \n\nThe T-PSOM learns each individual basis mapping $T_j$ by visiting a rectangular grid set of end effector positions $\\xi_i$ (here a 3\u00d73\u00d73 grid in $x$ of size 40 \u00d7 40 \u00d7 30 cm$^3$) jointly with the joint angle tuple $\\theta_i$ and the location in camera retina coordinates (2D in each camera) $u_i^L$, $u_i^R$. Thus the training vectors $w_{a_i}$ for the construction of the T-PSOM are the tuples $(x_i, \\theta_i, u_i^L, u_i^R)$. \nHowever, each $T_j$ solves the mapping task only for the current camera arrangement, for which $T_j$ was learned. Thus there is not yet any particular advantage over other, specialized methods for camera calibration [1]. The important point is that we will now employ the Meta-PSOM to interpolate in the space of the mappings $\\{T_j\\}$. \n\nTo keep the number of prototype mappings manageable, we reduce some DOFs of the cameras by calling for fixed focal length, camera tripod height, and twist joint. To constrain the elevation and azimuth viewing angle, we require one land mark $\\xi_{fix}$ to remain visible in a constant image position. This leaves two free parameters per camera, which can now be determined by one extra observation of a chosen auxiliary world reference point $\\xi_{ref}$. We denote the camera image coordinates of $\\xi_{ref}$ by $u_{ref} = (u^L_{ref}, u^R_{ref})$. By reuse of the cameras as \"environment sensor\", $u_{ref}$ now implicitly encodes the two camera positions. \n\nIn the investment pre-training phase, nine mappings $T_j$ are learned by the T-PSOM, each camera visiting a 3 \u00d7 3 grid, sharing the set of visited robot positions $\\xi_i$. As Fig. 2 suggests, normally the entire weight set $w$ serves as part of the training vector to the Meta-PSOM. Here the problem becomes factorized since the left and right camera change tripod place independently: the weight set of the T-PSOM is split, and the two parts can be learned in separate Meta-PSOMs. Each training vector $w_{a_j}$ for the left camera Meta-PSOM consists of the context observation $u^L_{ref}$ and the T-PSOM weight set part $w^L = (u^L_1, \\ldots, u^L_{27})$ (analogously for the right camera Meta-PSOM). 
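\n\nFor orientation, the sizes implied by this set-up can be tallied as follows (a back-of-the-envelope count derived only from the grid sizes and coordinate dimensions stated above; the symbols $|A_T|$ and $\\dim$ are labels assumed here): \n\n$$|A_T| = 3 \\cdot 3 \\cdot 3 = 27, \\qquad \\dim w^L = 27 \\cdot 2 = 54, \\qquad \\dim (u^L_{ref}, w^L) = 2 + 54 = 56$$ \n\nEach T-PSOM training vector thus has $3 + 6 + 2 + 2 = 13$ components (Cartesian position, joint angles, and two retina locations), while each of the two Meta-PSOMs would be constructed from $3 \\cdot 3 = 9$ training vectors of dimension 56, one per camera position on its 3 \u00d7 3 grid. 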
\n\nThis enables, in the following phase, the rapid learning for new, unknown camera places. On the basis of one single observation $u_{ref}$, the desired transformation $T$ is constructed. As visualized in Fig. 3, $u_{ref}$ serves as the input to the second level Meta-PSOMs. Their outputs are interpolations between previously learned weight sets and they project directly into the weight set of the basis level T-PSOM. \n\nThe resulting T-PSOM can map in various directions. This is achieved by specifying a suitable distance function $\\mathrm{dist}(\\cdot)$ via the projection matrix $P$, e.g.: \n\n$$x(u) = F^{u \\mapsto x}_{\\text{T-PSOM}}(u;\\, w^L(u^L_{ref}), w^R(u^R_{ref})) \\qquad (3)$$ \n$$\\theta(u) = F^{u \\mapsto \\theta}_{\\text{T-PSOM}}(u;\\, w^L(u^L_{ref}), w^R(u^R_{ref})) \\qquad (4)$$ \n$$u(x) = F^{x \\mapsto u}_{\\text{T-PSOM}}(x;\\, w^L(u^L_{ref}), w^R(u^R_{ref})) \\qquad (5)$$ \n$$w^L(u^L_{ref}) = F^{u \\mapsto w}_{\\text{Meta-PSOM},L}(u^L_{ref};\\, \\Omega_L); \\quad \\text{analogously } w^R(u^R_{ref}) \\qquad (6)$$ \n\nTable 1 shows experimental results averaged over 100 random locations $\\xi$ (from within the range of the training set) seen in 10 different camera set-ups, from within the 3 \u00d7 3 square grid of the training positions, located in a normal distance of about 125 cm (center to work space center, 1 m$^2$, total range of about 55-210 cm), covering a disparity angle range of 25\u00b0-150\u00b0. For identification of the positions $\\xi$ in image coordinates, a tiny light source was installed at the manipulator tip and a simple procedure automated the finding of $u$ with about \u00b10.8 pixel accuracy. For the achieved precision it is important to share the same set of robot positions $\\xi_i$, and that the sets are topologically ordered, here as a 3\u00d73\u00d73 goal position grid ($i$) and two 3 \u00d7 3 camera location ($j$) grids. 
\n\nMapping Direction | Direct trained T-PSOM (mean deviation, NRMS) | T-PSOM with Meta-PSOM (mean deviation, NRMS) \npixel $u \\mapsto x_{robot}$ $\\Rightarrow$ Cartesian error $\\Delta x$ | 1.4 mm, 0.008 | 4.4 mm, 0.025 \nCartesian $x \\mapsto u$ $\\Rightarrow$ pixel error | 1.2 pix, 0.010 | 3.3 pix, 0.025 \npixel $u \\mapsto \\theta_{robot}$ $\\Rightarrow$ Cartesian error $\\Delta x$ | 3.8 mm, 0.023 | 5.4 mm, 0.030 \n\nTable 1: Mean Euclidean deviation (mm or pixel) and normalized root mean square error (NRMS) for 1000 points total in comparison of a directly trained T-PSOM and the described hierarchical Meta-PSOM network, in the rapid learning mode after one single observation. \n\n5  Discussion and Conclusion \n\nA crucial question is how to structure systems such that learning can be efficient. In the present paper, we demonstrated a hierarchical approach that is motivated by a decomposition of the learning phase into two different stages: A longer, initial learning phase \"invests\" effort into a gradual and domain-specific specialization of the system. This investment learning does not yet produce the final solution, but instead pre-structures the system such that the subsequent final specialization to a particular solution (within the chosen domain) can be achieved extremely rapidly. \n\nTo implement this approach, we used a hierarchical architecture of mappings. While in principle various kinds of network types could be used for these mappings, a practically feasible solution must be based on a network type that allows constructing the required basis mappings from a rather small number of training examples. In addition, since we use interpolation in weight space, similar mappings should give rise to similar weight sets to make interpolation meaningful. PSOMs meet these requirements very well, since they allow a direct non-iterative construction of smooth mappings from rather small data sets.  
They achieve this by generalizing the discrete self-organizing map [3, 9] into a continuous map manifold such that interpolation for new data points can benefit from topology information that is not available to most other methods. \n\nWhile PSOMs resemble local models [4, 5, 6] in that there is no interference between different training points, their use of an orthogonal set of basis functions to construct the map manifold puts them in an intermediate position between the extremes of local and of fully distributed models. \n\nA further very useful property in the present context is the ability of PSOMs to work as an attractor network with a continuous attractor manifold. Thus a PSOM needs no fixed designation of variables as inputs and outputs; instead the projection matrix $P$ can be used to freely partition the full set of variables into input and output values. Values of the latter are obtained by a process of associative completion. \n\nTechnically, the investment learning phase is realized by learning a set of prototypical basis mappings represented as weight sets of a T-PSOM that attempt to cover the range of tasks in the given domain. The capability for subsequent rapid specialization within the domain is then provided by an additional mapping that maps a situational context into a suitable combination of the previously learned prototypical basis mappings. The construction of this mapping again is solved with a PSOM (\"Meta\"-PSOM) that interpolates in the space of prototypical basis mappings that were constructed during the \"investment phase\". \n\nWe demonstrated the potential of this approach with the task of 3D visuo-motor mapping, learnable with a single observation after repositioning a pair of cameras. 
\nThe achieved accuracy of 4.4 mm after learning by a single observation compares very well with the distance range 0.5-2.1 m of traversed positions. As further data becomes available, the T-PSOM can certainly be fine-tuned to improve the performance to the level of the directly trained T-PSOM. \n\nThe presented arrangement of a basis T-PSOM and two Meta-PSOMs demonstrates further the possibility to split hierarchical learning in independently changing domain sets. When the number of involved free context parameters is growing, this factorization is increasingly crucial to keep the number of pre-trained prototype mappings manageable. \n\nReferences \n\n[1] K. Fu, R. Gonzalez and C. Lee. Robotics: Control, Sensing, Vision, and Intelligence. McGraw-Hill, 1987. \n\n[2] F. Girosi and T. Poggio. Networks and the best approximation property. Biol. Cybern., 63(3):169-176, 1990. \n\n[3] T. Kohonen. Self-Organization and Associative Memory. Springer, Heidelberg, 1984. \n\n[4] J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1:281-294, 1989. \n\n[5] S. Omohundro. Bumptrees for efficient function, constraint, and classification learning. In NIPS*3, pages 693-699. Morgan Kaufmann Publishers, 1991. \n\n[6] J. Platt. A resource-allocating network for function interpolation. Neural Computation, 3:213-255, 1991. \n\n[7] M. Powell. Radial basis functions for multivariable interpolation: A review, pages 143-167. Clarendon Press, Oxford, 1987. \n\n[8] H. Ritter. Parametrized self-organizing maps. In S. Gielen and B. Kappen, editors, ICANN'93-Proceedings, Amsterdam, pages 568-575. Springer Verlag, Berlin, 1993. \n\n[9] H. Ritter, T. Martinetz, and K. Schulten. Neural Computation and Self-organizing Maps. Addison Wesley, 1992. \n\n[10] J. Walter and H. Ritter.  
Local PSOMs and Chebyshev PSOMs - improving the parametrised self-organizing maps. In Proc. ICANN, Paris, volume 1, pages 95-102, October 1995. \n\n[11] J. Walter and H. Ritter. Rapid learning with parametrized self-organizing maps. Neurocomputing, Special Issue, (in press), 1996. \n", "award": [], "sourceid": 1141, "authors": [{"given_name": "J\u00f6rg", "family_name": "Walter", "institution": null}, {"given_name": "Helge", "family_name": "Ritter", "institution": null}]}