{"title": "A Framework for Non-rigid Matching and Correspondence", "book": "Advances in Neural Information Processing Systems", "page_first": 795, "page_last": 801, "abstract": null, "full_text": "A Framework for Non-rigid Matching and Correspondence\n\nSuguna Pappu, Steven Gold, and Anand Rangarajan^1\nDepartments of Diagnostic Radiology and Computer Science\nand the Yale Neuroengineering and Neuroscience Center\nYale University, New Haven, CT 06520-8285\n\nAbstract\n\nMatching feature point sets lies at the core of many approaches to\nobject recognition. We present a framework for non-rigid matching\nthat begins with a skeleton module, affine point matching,\nand then integrates multiple features to improve correspondence\nand develops an object representation based on spatial regions to\nmodel local transformations. The algorithm for feature matching\niteratively updates the transformation parameters and the correspondence\nsolution, each in turn. The affine mapping is solved in\nclosed form, which permits its use for data of any dimension. The\ncorrespondence is set via a method for two-way constraint satisfaction,\ncalled softassign, which has recently emerged from the neural\nnetwork/statistical physics realm. The complexity of the non-rigid\nmatching algorithm with multiple features is the same as that of\nthe affine point matching algorithm. Results for synthetic and real-world\ndata are provided for point sets in 2D and 3D, and for 2D\ndata with multiple types of features and parts.\n\n1 Introduction\n\nA basic problem of object recognition is that of matching: how to associate sensory\ndata with the representation of a known object. This entails finding a transformation\nthat maps the features of the object model onto the image, while establishing a\ncorrespondence between the spatial features.\n
However, a tractable class of transformation,\ne.g., affine, may not be sufficient if the object is non-rigid or has relatively\nindependent parts. If there is noise or occlusion, spatial information alone may\nnot be adequate to determine the correct correspondence. In our previous work in\nspatial point matching [1], the 2D affine transformation was decomposed into its\nphysical component elements, which does not generalize easily to 3D, and so only\na rigid 3D transformation was considered.\n\n^1 E-mail address of authors: lastname-firstname@cs.yale.edu\n\nWe present a framework for non-rigid matching that begins with solving the basic\naffine point matching problem. The algorithm iteratively updates the affine parameters\nand correspondence in turn, each as a function of the other. The affine\ntransformation is solved in closed form, which lends tremendous flexibility: the\nformulation can be used in 2D or 3D. The correspondence is solved by using a\nsoftassign [1] procedure, in which the two-way assignment constraints are solved\nwithout penalty functions. The accuracy of the correspondence is improved by the\nintegration of multiple features. A method for non-rigid parameter estimation is\ndeveloped, based on the assumption of a well-articulated model with distinct regions,\neach of which may move in an affine fashion, or can be approximated as\nsuch. Umeyama [3] has done work on parameterized parts using an exponential\ntime tree search technique, and Wakahara [4] on local affine transforms, but neither\nintegrates multiple features nor explicitly considers the non-rigid matching case\nwhile expressing a one-to-one correspondence between points.\n\n2 Affine Point Matching\n\nThe affine point matching problem is formulated as an optimization problem for\ndetermining the correspondence and affine transformation between feature points.\n
\nGiven two sets of data points $x_j \\in \\mathbb{R}^{n-1}$, $n = 3, 4, \\ldots$, $j = 1, \\ldots, J$, the image,\nand $y_k \\in \\mathbb{R}^{n-1}$, $k = 1, \\ldots, K$, the model, find the correspondence and\nassociated affine transformation that best maps a subset of the image points onto a\nsubset of the model point set. These point sets are expressed in homogeneous coordinates,\n$X_j = (1, x_j)$, $Y_k = (1, y_k)$. $\\{a_{ij}\\} = A \\in \\mathbb{R}^{n \\times n}$ is the affine transformation\nmatrix. Note that $a_{1j} = 0 \\; \\forall j$ because of the homogeneous coordinates. Define the\nmatch variable $M_{jk}$, where $M_{jk} \\in [0, 1]$. For a given match matrix $\\{M_{jk}\\}$, transformation\n$A$, and $I$, an identity matrix of dimension $n$, $\\sum_{j,k} M_{jk} \\|X_j - (A + I) Y_k\\|^2$\nexpresses the similarity between the point sets (smaller values indicate a closer match).\nThe term $-\\alpha \\sum_{j,k} M_{jk}$, with parameter $\\alpha > 0$, is appended to this to encourage\nmatches (else $M_{jk} = 0 \\; \\forall j, k$ minimizes the function). To limit the range of\ntransformations, the terms of the affine matrix are regularized via a term\n$\\lambda \\, \\mathrm{tr}(A^T A)$ in the objective function, with parameter $\\lambda$, where $\\mathrm{tr}(\\cdot)$\ndenotes the trace of the matrix. Physically, $X_j$ may fully\nmatch to one $Y_k$, partially match to several, or may not match to any point. A similar\nconstraint holds for $Y_k$. These are expressed as the constraints in the following\noptimization problem:\n\n$$\\min_{A, M} \\sum_{j,k} M_{jk} \\|X_j - (A + I) Y_k\\|^2 + \\lambda \\, \\mathrm{tr}(A^T A) - \\alpha \\sum_{j,k} M_{jk} \\qquad (1)$$\n\ns.t. $\\sum_j M_{jk} \\le 1, \\; \\forall k$, $\\sum_k M_{jk} \\le 1, \\; \\forall j$, and $M_{jk} \\ge 0$.\n\nTo begin, slack variables $M_{j,K+1}$ and $M_{J+1,k}$ are introduced so that the inequality\nconstraints can be transformed into equality constraints: $\\sum_{j=1}^{J+1} M_{jk} = 1, \\; \\forall k$\nand $\\sum_{k=1}^{K+1} M_{jk} = 1, \\; \\forall j$. $M_{j,K+1} = 1$ indicates that $X_j$ does not match to\nany point in $Y_k$. An equivalent unconstrained optimization problem to (1) is derived\nby relaxing the constraints via Lagrange parameters $\\mu_j$, $\\nu_k$, and introducing\nan $x \\log x$ barrier function, indexed by a parameter $\\beta$. A similar technique was used\n
The energy function used is: \n\nmin max LMjkllXj - (A+ J)Yk112 + Atr(AT A) - a LMjk + LJLj(L Mjk -1) \n\nA,M ~,v . \n\n),k \n\nJ \n\nK+1 \n\n. \n),k \n\n. \n) \n1 J+1 K+1 \n\nk=l \n\nK \n\nJ+1 \n\n+ LlIk(LMjk -1) + (j L L Mjk(1ogMjk -1) \n\nk \n\nj=l \n\nj=l k=l \n\nThis is to be minimized with respect to the match variables and affine parameters \nwhile satisfying the constraints via Lagrange parameters. Using the recently devel(cid:173)\noped soft assign technique, we satisfy the constraints explicitly. When A is fixed, \nwe have an assignment problem. Following the development in [1], the assignment \nconstraints are satisfied using soft assign , a technique for satisfying two-way (as(cid:173)\nsignment) constraints without a penalty term that is analogous to softmax which \nenforces a one-way constraint. First, the match variables are initialized: \n\nThis is followed by repeated row-column normalization of the match variables until \na stopping criterion is reached: \n\n(2) \n\nM)\"k = Mjk \n\n'\"\"' \nL--j' Mj'k \n\nMjk \nthen M j k = '\"\"' M \n\nL--k' \n\njk' \n\n(3) \n\nWhen the correspondence between the two point sets is fixed, A can be solved in \nclosed form, by holding M fixed in the objective function, and differentiating and \nsolving for A: \n\nA = A*(M) = (L Mjk(Xj Y[ - YkY{\u00bb(L MjkYkY[ + AI)-l \n\n(4) \n\nj,k \n\nj,k \n\nThe algorithm is summarized as: \n\n1. INITIALIZE: Variables: A = 0, M = 0 \n\nParameters: .Binitial, .Bupdate, .Bfinal T = Inner loop iterations, A \n\n2. ITERATE: Do T times for a fixed value of .B \n\nSoftassign: Re-initialize M*(A) and then (Eq. 2) until ilM small \nA*(M) updated (Eq. 4) \n\n3. UPDATE: While.B < .Bfinal, .B ~.B * .Bupdate, Return to 2. \n\nThe complexity of the algorithm is O(J K). Starting with small .Binitial permits \nmany partial correspondences in the initial solution for M. As.B increases the \ncorrespondence becomes more refined. 
For large $\\beta_{final}$, $M$ approaches a permutation\nmatrix (adjusting appropriately for the slack variables).\n\n3 Nonrigid Feature Matching: Affine Quilts\n\nRecognition of an object requires many different types of information working in\nconcert. Spatial information alone may not be sufficient for representation, especially\nin the presence of noise. Additionally, the affine transformation is limited by\nits inability to handle local variation in an object, due to the object's non-rigidity\nor to the relatively independent movement of its parts, e.g., in human movement.\n\nThe optimization problem (1) easily generalizes to integrate multiple invariant features.\nA representation with multiple features has a spatial component indicating\nthe location of a feature element. At that location, there may be invariant geometric\ncharacteristics, e.g., this point belongs on a curve, or non-geometric invariant\nfeatures such as color and texture. Let $X_{jr}$ be the value of feature $r$ associated\nwith point $X_j$. The location of point $X_j$ is the null feature. There are $R$ features\nassociated with each point $X_j$ and $Y_k$. Note that the match variable remains the\nsame. The new objective function is identical to the original objective function,\n(1), appended by the term $\\sum_{j,k,r} M_{jk} w_r (X_{jr} - Y_{kr})^2$. The $(X_{jr} - Y_{kr})^2$ quantity\ncaptures the similarity between invariant types of features, with $w_r$ a weighting\nfactor for feature $r$. Non-invariant features are not considered. In this way,\nthe point matching algorithm is modified only in the re-initialization of $M(A)$:\n\n$$M_{jk} = \\exp\\Big(-\\beta \\Big(\\|X_j - (I + A) Y_k\\|^2 + \\sum_r w_r (X_{jr} - Y_{kr})^2 - \\alpha\\Big)\\Big)$$\n\nThe rest of the\nalgorithm remains unchanged.\n\nDecomposition of spatial transformations motivates classification of the $B$ individual\nregions of an object and use of a \"quilt\" of local affine transformations.\n
In the\nmultiple affine scenario, membership to a region is known on the well-articulated\nmodel, but not on the image set. It is assumed that all points that are members\nof one region undergo the same affine transformation. The model changes by the\naddition of one subscript to the affine matrix, $A_{b(k)}$, where $b(k)$ is an operator that\nindicates which transformation operates on point $k$. In the algorithm, during the\n$A(M)$ update, instead of a single update, $B$ updates are done. Denote\n$K(b) = \\{k \\mid b(k) = b\\}$, i.e., all the points that are within region $b$. Then in the affine update,\n\n$$A_b = A_b(M) = \\Big(\\sum_{j, k \\in K(b)} M_{jk} (X_j Y_k^T - Y_k Y_k^T)\\Big) \\Big(\\sum_{j, k \\in K(b)} M_{jk} Y_k Y_k^T + \\lambda_b I\\Big)^{-1}$$\n\nHowever, the theoretical complexity does not change, since the $B$ updates still only\nrequire summing over the points.\n\n4 Experimental Results: Hand Drawn and Synthetic\n\nMatching point sets of 50 points each takes around 20 seconds on an SGI\nworkstation with an R4400 processor. This is true for points in 2D, 3D, and with\nextra features. This can be improved, at some cost in accuracy, by adopting a\nlooser schedule for the parameter $\\beta$ or by changing the stopping criterion.\n\nIn the hand drawn examples, the contours of the images are drawn, discretized, and\nthen expressed as a set of points in the plane. In Figure (1), the contours of the\nboy's face were drawn in two different positions, and a subset of the points was\nextracted to make up the point sets. In each set this was approximately 250 points.\nNote that even with the change in mood in the two pictures, the corresponding\nparts of the face are found.\n\nFigure 1: Correspondence with simple point features\n\nHowever, in Figure (2) spatial information alone is\ninsufficient. Although the rotation of the head is not a true affine transformation, it\n
Each photo \nis outlined, generating approximately 225 points in each face. A point on a contour \n\nFigure 2: Correspondence with multiple features \n\nhas associated with it a feature marker indicating the incident textures. For a \nhuman face, we use a binary 4-vector, with a 1 in position r if feature r is present. \nSpecifically, we have used a vector with elements [skin, hair, lip, eye]. For example, \na point on the line marking the mouth segment the lip from the skin has a feature \nvector [1,0,1,0]. Perceptual organization of the face motivates this type of feature \nmarking scheme. The correspondence is depicted in Figure (2) for a small subset of \nmatches. \n\nNext, we demonstrate how the multiple affine works in recovering the correct corre(cid:173)\nspondence and transformation. The points associated with the standing figure have \na marker indicating its part membership. There are six parts in this figure: head, \ntorso, each arm and each leg. The correspondence is shown in Figure (3). \n\nFor synthetic data, all 2D and 3D single part experiments used this protocol: The \nmodel set was generated uniformly on a unit square. A random affine matrix \nis generated, whose parameters, aij are chosen uniformly on a certain interval, \nwhich is used to generate the image set. Then, Pd image points are deleted, and \nGaussian noise, N(O, u) is added. Finally, spurious points, Ps are added. For \nthe multiple feature scenario, the elements of the feature vector are randomly \nmislabelled with probability, Pr , to represent distortion. For these experiments, \n50 model points were generated, and aij are uniform on an interval of length \n1.5. u E {0.01, 0.02, ... , 0.08}. Point deletions and spurious additions range from \n0% to 50% of the image points. The random feature noise associated with non(cid:173)\nspatial features has a probability of Pr = 0.05. 
The error measure we use is\n$e_a = c \\sum_{i,j} |a_{ij} - \\hat{a}_{ij}|$, where $c = \\frac{3}{\\#\\text{parameters} \\times \\text{interval length}}$. $a_{ij}$ and $\\hat{a}_{ij}$ are the correct\nparameter and the computed value, respectively. The constant term $c$ normalizes\nthe measure so that the error equals 1 in the case that the $a_{ij}$ and $\\hat{a}_{ij}$ are chosen at\nrandom on this interval. The factor 3 in the numerator of this formula follows since\n$E|x - y| = \\frac{1}{3}$ when $x$ and $y$ are chosen randomly on the unit interval, and we want\nto normalize the error. The parameters used in all experiments were: $\\beta_{initial} = 0.091$,\n$\\beta_{final} = 100$, $\\beta_{update} = 1.075$, and $T = 4$.\n\nFigure 3: Articulated Matching: Figure with six parts\n\nIn the multiple affine experiments, the model has four regions (24 parameters). Points corresponding to part 1 were\ncentered at (0.5, 0.5), and generated randomly with a diameter of 1.0. For the image\nset, an affine transformation was applied with a translation diameter of 0.5, i.e., for\n$a_{21}$ and $a_{31}$, while the remaining four parameters have a diameter of 1. Points corresponding\nto regions 2, 3, and 4 were centered at (-0.5, 0.5), (-0.5, -0.5), and (0.5, -0.5), with\nmodel points and transformations generated in a similar fashion.\n
120 points were\ngenerated for the model point set, divided equally among the four parts. Image\npoints were deleted with equal probability from each region. Spurious points were\nnot explicitly added, since the overlapping of parts provides implicit spurious points.\n\nResults for the 2D and 3D (simple point) experiments are in Figure (4). Each data\npoint represents 500 runs for a different randomly generated affine transformation.\nIn all experiments, note that the error for small amounts of noise is approximately\nequal to that when there is no noise. We performed similar experiments for point\nsets that are 3-dimensional (12 parameters), but without any feature information.\nFor the experiments with features, shown in Figure (5), we used $R = 4$ features, and\n$w_r = 0.2 \\; \\forall r$. Each data point represents 500 runs. As expected, the inclusion of\nfeature information reduces the error, especially for large $\\sigma$. Additionally, Figure\n(5) details synthetic results for experiments with multiple affines (2D). Each data\npoint represents 70 runs.\n\nFigure 4: Synthetic Experiments: 2D and 3D. Panels: 2D Results, 3D Results. Horizontal axis: Standard deviation: Jitter (0.02 to 0.08). Legend: -. : $p_d = 0\\%$, $p_s = 0\\%$; + : $p_d = 10\\%$, $p_s = 10\\%$; x : $p_d = 30\\%$, $p_s = 10\\%$; o : $p_d = 50\\%$, $p_s = 10\\%$.\n\nFigure 5: Synthetic Experiments: Multiple features and parts. Panels: 4 Features, 4 Parts. Horizontal axis: Standard deviation: Jitter (0.02 to 0.08). Legend (4 Features): .- : $p_d = 0\\%$, $p_s = 0\\%$; * : $p_d = 10\\%$, $p_s = 10\\%$; . : $p_d = 30\\%$, $p_s = 10\\%$; -- : $p_d = 50\\%$, $p_s = 10\\%$. Legend (4 Parts): o : $p_d = 10\\%$, $p_s = 0\\%$; x : $p_d = 25\\%$, $p_s = 0\\%$; - : $p_d = 40\\%$, $p_s = 0\\%$.\n\n5 Conclusion\n\nWe have developed an affine point matching module, robust in the presence of noise\nand able to accommodate data of any dimension. The module forms the basis for\na non-rigid feature matching scheme in which multiple types of features interact to\nestablish correspondence. Modeling an object in terms of its spatial regions and\nthen using multiple affines to capture local transformations results in a tractable\nmethod for non-rigid matching. This non-rigid matching framework arising out of\nneural computation is widely applicable in object recognition.\n\nAcknowledgements: Our thanks to Eric Mjolsness for many interesting discussions\nrelated to the present work.\n\nReferences\n\n[1] S. Gold, C. P. Lu, A. Rangarajan, S. Pappu, and E. Mjolsness. New algorithms\nfor 2D and 3D point matching: Pose estimation and correspondence.\nIn G. Tesauro, D. Touretzky, and J. Alspector, editors, Advances in Neural\nInformation Processing Systems, volume 7, San Francisco, CA, 1995. Morgan\nKaufmann Publishers.\n\n[2] J. Kosowsky and A. Yuille. The invisible hand algorithm: Solving the assignment\nproblem with statistical physics. Neural Networks, 7:477-490, 1994.\n\n[3] S. Umeyama. Parameterized point pattern matching and its application to\nrecognition of object families. IEEE Trans.\n
on Pattern Analysis and Machine\nIntelligence, 15:136-144, 1993.\n\n[4] T. Wakahara. Shape matching using LAT and its application to handwritten\nnumeral recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence,\n16:618-629, 1994.\n", "award": [], "sourceid": 1118, "authors": [{"given_name": "Suguna", "family_name": "Pappu", "institution": null}, {"given_name": "Steven", "family_name": "Gold", "institution": null}, {"given_name": "Anand", "family_name": "Rangarajan", "institution": null}]}