{"title": "Solving the multi-way matching problem by permutation synchronization", "book": "Advances in Neural Information Processing Systems", "page_first": 1860, "page_last": 1868, "abstract": "The problem of matching not just two, but m different sets of objects to each other arises in a variety of contexts, including finding the correspondence between feature points across multiple images in computer vision. At present it is usually solved by matching the sets pairwise, in series. In contrast, we propose a new method, permutation synchronization, which finds all the matchings jointly, in one shot, via a relaxation to eigenvector decomposition. The resulting algorithm is both computationally efficient, and, as we demonstrate with theoretical arguments as well as experimental results, much more stable to noise than previous methods.", "full_text": "Solving the multi-way matching problem by permutation synchronization\n\nDeepti Pachauri,\u2020 Risi Kondor\u00a7 and Vikas Singh\u2021\u2020\n\n\u2020Dept. of Computer Sciences, University of Wisconsin\u2013Madison\n\u2021Dept. of Biostatistics & Medical Informatics, University of Wisconsin\u2013Madison\n\u00a7Dept. of Computer Science and Dept. of Statistics, The University of Chicago\n\npachauri@cs.wisc.edu risi@uchicago.edu vsingh@biostat.wisc.edu\n\nAbstract\n\nThe problem of matching not just two, but m different sets of objects to each other arises in many contexts, including finding the correspondence between feature points across multiple images in computer vision. At present it is usually solved by matching the sets pairwise, in series. In contrast, we propose a new method, Permutation Synchronization, which finds all the matchings jointly, in one shot, via a relaxation to eigenvector decomposition. 
The resulting algorithm is both computationally efficient, and, as we demonstrate with theoretical arguments as well as experimental results, much more stable to noise than previous methods.\n\n1 Introduction\n\nFinding the correct bijection between two sets of objects X = {x_1, x_2, ..., x_n} and X\u2032 = {x\u2032_1, x\u2032_2, ..., x\u2032_n} is a fundamental problem in computer science, arising in a wide range of contexts [1]. In this paper, we consider its generalization to matching not just two, but m different sets X_1, X_2, ..., X_m. Our primary motivation and running example is the classic problem of matching landmarks (feature points) across many images of the same object in computer vision, which is a key ingredient of image registration [2], recognition [3, 4], stereo [5], shape matching [6, 7], and structure from motion (SFM) [8, 9]. However, our approach is fully general and equally applicable to problems such as matching multiple graphs [10, 11].\n\nPresently, multi-matching is usually solved sequentially, by first finding a putative permutation \u03c4_{12} matching X_1 to X_2, then a permutation \u03c4_{23} matching X_2 to X_3, and so on, up to \u03c4_{m\u22121,m}. While one can conceive of various strategies for optimizing this process, the fact remains that when the data are noisy, a single error in the sequence will typically create a large number of erroneous pairwise matches [12, 13, 14]. In contrast, in this paper we describe a new method, Permutation Synchronization, that estimates the entire matrix (\u03c4_{ji})_{i,j=1}^m of assignments jointly, in a single shot, and is therefore much more robust to noise.\n\nFor consistency, the recovered matchings must satisfy \u03c4_{kj} \u03c4_{ji} = \u03c4_{ki}. 
While finding an optimal matrix of permutations satisfying these relations is, in general, combinatorially hard, we show that for the most natural choice of loss function the problem has a natural relaxation to just finding the n leading eigenvectors of the cost matrix. In addition to vastly reducing the computational cost, using recent results from random matrix theory, we show that the eigenvectors are very effective at aggregating information from all (m choose 2) pairwise matches, and therefore make the algorithm surprisingly robust to noise. Our experiments show that in landmark matching problems Permutation Synchronization can recover the correct correspondence between landmarks across a large number of images with small error, even when a significant fraction of the pairwise matches are incorrect.\n\nThe term \u201csynchronization\u201d is inspired by the recent celebrated work of Singer et al. on a similar problem involving finding the right rotations (rather than matchings) between electron microscopic images [15, 16, 17]. Historically, multi-matching has received relatively little attention. However, independently of, and concurrently with the present work, Huang and Guibas [18] have recently proposed a semidefinite programming based solution, which parallels our approach, and in problems involving occlusion might perform even better.\n\n2 Synchronizing permutations\n\nConsider a collection of m sets X_1, X_2, ..., X_m of n objects each, X_i = {x_1^i, x_2^i, ..., x_n^i}, such that for each pair (X_i, X_j), each x_p^i in X_i has a natural counterpart x_q^j in X_j. For example, in computer vision, given m images of the same scene taken from different viewpoints, x_1^i, x_2^i, ..., x_n^i might be n visual landmarks detected in image i, while x_1^j, x_2^j, ..., x_n^j are n landmarks detected in image j, in which case x_p^i \u223c x_q^j signifies that x_p^i and x_q^j correspond to the same physical feature.\n\nSince the correspondence between X_i and X_j is a bijection, one can write it as x_p^i \u223c x_{\u03c4_{ji}(p)}^j for some permutation \u03c4_{ji} : {1, 2, ..., n} \u2192 {1, 2, ..., n}. Key to our approach to solving multi-matching is that with respect to the natural definition of multiplication, (\u03c4\u2032\u03c4)(i) := \u03c4\u2032(\u03c4(i)), the n! possible permutations of {1, 2, ..., n} form a group, called the symmetric group of degree n, denoted S_n.\n\nWe say that the system of correspondences between X_1, X_2, ..., X_m is consistent if x_p^i \u223c x_q^j and x_q^j \u223c x_r^k together imply that x_p^i \u223c x_r^k. In terms of permutations this is equivalent to requiring that the array (\u03c4_{ij})_{i,j=1}^m satisfy\n\n\u03c4_{kj} \u03c4_{ji} = \u03c4_{ki} \u2200 i, j, k. (1)\n\nAlternatively, given some reference ordering of x_1, x_2, ..., x_n, we can think of each X_i as realizing its own permutation \u03c3_i (in the sense of x_\u2113 \u223c x_{\u03c3_i(\u2113)}^i), and then \u03c4_{ji} becomes\n\n\u03c4_{ji} = \u03c3_j \u03c3_i^{\u22121}. (2)\n\nThe existence of permutations \u03c3_1, \u03c3_2, ..., \u03c3_m satisfying (2) is equivalent to requiring that (\u03c4_{ji})_{i,j=1}^m satisfy (1). Thus, assuming consistency, solving the multi-matching problem reduces to finding just m different permutations, rather than O(m^2). However, the \u03c3_i\u2019s are of course not directly observable. 
Rather, in a typical application we have some tentative (noisy) \u03c4\u0303_{ji} matchings which we must synchronize into the form (2) by finding the underlying \u03c3_1, ..., \u03c3_m.\n\nGiven (\u03c4\u0303_{ji})_{i,j=1}^m and some appropriate distance metric d between permutations, we formalize Permutation Synchronization as the combinatorial optimization problem\n\nminimize_{\u03c3_1, \u03c3_2, ..., \u03c3_m \u2208 S_n} \u2211_{i,j=1}^m d(\u03c3_j \u03c3_i^{\u22121}, \u03c4\u0303_{ji}). (3)\n\nThe computational cost of solving (3) depends critically on the form of the distance metric d. In this paper we limit ourselves to the simplest choice\n\nd(\u03c3, \u03c4) = n \u2212 \u27e8P(\u03c3), P(\u03c4)\u27e9, (4)\n\nwhere P(\u03c3) \u2208 R^{n\u00d7n} are the usual permutation matrices\n\n[P(\u03c3)]_{q,p} := 1 if \u03c3(p) = q, and 0 otherwise,\n\nand \u27e8A, B\u27e9 is the matrix inner product \u27e8A, B\u27e9 := tr(A^\u22a4 B) = \u2211_{p,q=1}^n A_{p,q} B_{p,q}.\n\nThe distance (4) simply counts the number of objects assigned differently by \u03c3 and \u03c4. Furthermore, it allows us to rewrite (3) as maximize_{\u03c3_1, \u03c3_2, ..., \u03c3_m} \u2211_{i,j=1}^m \u27e8P(\u03c3_j \u03c3_i^{\u22121}), P(\u03c4\u0303_{ji})\u27e9, suggesting the generalization\n\nmaximize_{\u03c3_1, \u03c3_2, ..., \u03c3_m} \u2211_{i,j=1}^m \u27e8P(\u03c3_j \u03c3_i^{\u22121}), T_{ji}\u27e9, (5)\n\nwhere the T_{ji}\u2019s can now be any matrices, subject to T_{ji}^\u22a4 = T_{ij}. Intuitively, each T_{ji} is an objective matrix, the (q, p) element of which captures the utility of matching x_p^i in X_i to x_q^j in X_j. This generalization is very useful when the assignments of the different x_p^i\u2019s have different confidences. For example, in the landmark matching case, if, due to occlusion or for some other reason, the counterpart of x_p^i is not present in X_j, then we can simply set [T_{ji}]_{q,p} = 0 for all q.\n\n2.1 Representations and eigenvectors\n\nThe generalized Permutation Synchronization problem (5) can also be written as\n\nmaximize_{\u03c3_1, \u03c3_2, ..., \u03c3_m} \u27e8P, T\u27e9, (6)\n\nwhere P and T are the mn \u00d7 mn block matrices\n\nP = [ P(\u03c3_1 \u03c3_1^{\u22121}) ... P(\u03c3_1 \u03c3_m^{\u22121}) ; ... ; P(\u03c3_m \u03c3_1^{\u22121}) ... P(\u03c3_m \u03c3_m^{\u22121}) ] and T = [ T_{11} ... T_{1m} ; ... ; T_{m1} ... T_{mm} ]. (7)\n\nA matrix valued function \u03c1 : S_n \u2192 C^{d\u00d7d} is said to be a representation of the symmetric group if \u03c1(\u03c3_2) \u03c1(\u03c3_1) = \u03c1(\u03c3_2 \u03c3_1) for any pair of permutations \u03c3_1, \u03c3_2 \u2208 S_n. Clearly, the map P is a representation of S_n (actually, the so-called defining representation), since P(\u03c3_2 \u03c3_1) = P(\u03c3_2) P(\u03c3_1). Moreover, it is a so-called orthogonal representation, because each P(\u03c3) is real and P(\u03c3^{\u22121}) = P(\u03c3)^\u22a4. Our fundamental observation is that this implies that the block matrix P in (7) has a very special form.\n\nProposition 1. The synchronization matrix P is of rank n and is of the form P = U \u00b7 U^\u22a4, where\n\nU = [ P(\u03c3_1) ; ... ; P(\u03c3_m) ].\n\nProof. From P being a representation of S_n,\n\nP = [ P(\u03c3_1) P(\u03c3_1)^\u22a4 ... P(\u03c3_1) P(\u03c3_m)^\u22a4 ; ... ; P(\u03c3_m) P(\u03c3_1)^\u22a4 ... P(\u03c3_m) P(\u03c3_m)^\u22a4 ], (8)\n\nimplying P = U \u00b7 U^\u22a4. Since U has n columns, rank(P) is at most n. This rank is achieved because P(\u03c3_1) is an orthogonal matrix, therefore it has linearly independent columns, and consequently the columns of U cannot be linearly dependent. \u25a0\n\nCorollary 1. Letting [P(\u03c3_i)]_p denote the p\u2019th column of P(\u03c3_i), the normalized columns of U,\n\nu_\u2113 = (1/\u221am) [ [P(\u03c3_1)]_\u2113 ; ... ; [P(\u03c3_m)]_\u2113 ], \u2113 = 1, ..., n, (9)\n\nare mutually orthogonal unit eigenvectors of P with the same eigenvalue m, and together span the row/column space of P.\n\nProof. The columns of U are orthogonal because the columns of each constituent P(\u03c3_i) are orthogonal. The normalization follows from each column of P(\u03c3_i) having norm 1. The rest follows by Proposition 1. \u25a0\n\n2.2 An easy relaxation\n\nSolving (6) is computationally difficult, because it involves searching the combinatorial space of a combination of m permutations. However, Proposition 1 and its corollary suggest relaxing it to\n\nmaximize_{P \u2208 M_n^m} \u27e8P, T\u27e9, (10)\n\nwhere M_n^m is the set of mn\u2013dimensional rank n symmetric matrices whose non-zero eigenvalues are m. This is now just a generalized Rayleigh problem, the solution of which is simply\n\nP = m \u2211_{\u2113=1}^n v_\u2113 v_\u2113^\u22a4, (11)\n\nwhere v_1, v_2, ..., v_n are the n leading normalized eigenvectors of T. Equivalently, P = U \u00b7 U^\u22a4, where\n\nU = \u221am [ v_1 v_2 ... v_n ]. (12)\n\nThus, in contrast to the original combinatorial problem, (10) can be solved by just finding the n leading eigenvectors of T.\n\nOf course, from P we must still recover the individual permutations \u03c3_1, \u03c3_2, ..., \u03c3_m. 
However, as long as P is relatively close in form to (7), this is quite a simple and stable process. One way to do it is to let each \u03c3_i be the permutation that best matches the (i, 1) block of P in the linear assignment sense,\n\n\u03c3_i = arg max_{\u03c3 \u2208 S_n} \u27e8P(\u03c3), [P]_{i,1}\u27e9,\n\nwhich is solved in O(n^3) time by the Kuhn\u2013Munkres algorithm [19]\u00b9, and then set \u03c4_{ji} = \u03c3_j \u03c3_i^{\u22121}, which will then satisfy the consistency relations. The pseudocode of the full algorithm is given in Algorithm 1.\n\nAlgorithm 1 Permutation Synchronization\nInput: the objective matrix T\nCompute the n leading eigenvectors (v_1, v_2, ..., v_n) of T and set U = \u221am [v_1, v_2, ..., v_n]\nfor i = 1 to m do\n  P_{i1} = U_{(i\u22121)n+1:in, 1:n} U_{1:n, 1:n}^\u22a4\n  \u03c3_i = arg max_{\u03c3 \u2208 S_n} \u27e8P_{i1}, \u03c3\u27e9 [Kuhn-Munkres]\nend for\nfor each (i, j) do\n  \u03c4_{ji} = \u03c3_j \u03c3_i^{\u22121}\nend for\nOutput: the matrix (\u03c4_{ji})_{i,j=1}^m of globally consistent matchings\n\n3 Analysis of the relaxed algorithm\n\nLet us now investigate under what conditions we can expect the relaxation (10) to work well, in particular, in what cases we can expect the recovered matchings to be exact.\n\nIn the absence of noise, i.e., when T_{ji} = P(\u03c4\u0303_{ji}) for some array (\u03c4\u0303_{ji})_{j,i} of permutations that already satisfy the consistency relations (1), T will have precisely the same structure as described by Proposition 1 for P. In particular, it will have n mutually orthogonal eigenvectors\n\nv_\u2113 = (1/\u221am) [ [P(\u03c3\u0303_1)]_\u2113 ; ... ; [P(\u03c3\u0303_m)]_\u2113 ], \u2113 = 1, ..., n (13)\n\nwith the same eigenvalue m. 
Due to the n\u2013fold degeneracy, however, the matrix of eigenvectors (12) is only defined up to multiplication by an arbitrary rotation matrix O on the right, which means that instead of the \u201ccorrect\u201d U (whose columns are (13)), the eigenvector decomposition of T may return any U\u2032 = U O. Fortunately, when forming the product\n\nP = U\u2032 \u00b7 U\u2032^\u22a4 = U O O^\u22a4 U^\u22a4 = U \u00b7 U^\u22a4\n\nthis rotation cancels, confirming that our algorithm recovers P = T, and hence the matchings \u03c4_{ji} = \u03c4\u0303_{ji}, with no error.\n\nOf course, rather than the case when the solution is handed to us from the start, we are more interested in how the algorithm performs in situations when either the T_{ji} blocks are not permutation matrices, or they are not synchronized. To this end, we set\n\nT = T_0 + N, (14)\n\nwhere T_0 is the correct \u201cground truth\u201d synchronization matrix, while N is a symmetric perturbation matrix with entries drawn independently from a zero-mean normal distribution with variance \u03b7^2.\n\nIn general, to find the permutation best aligned with a given n \u00d7 n matrix T, the Kuhn\u2013Munkres algorithm solves for \u03c4\u0302 = arg max_{\u03c4 \u2208 S_n} \u27e8P(\u03c4), T\u27e9 = arg max_{\u03c4 \u2208 S_n} (vec(P(\u03c4)) \u00b7 vec(T)). Therefore, writing T = P(\u03c4_0) + \u03f5, where P(\u03c4_0) is the \u201cground truth\u201d, while \u03f5 is an error term, it is guaranteed to return the correct permutation as long as\n\n\u2225vec(\u03f5)\u2225 < min_{\u03c4\u2032 \u2208 S_n \u2216 {\u03c4_0}} \u2225vec(\u03c4_0) \u2212 vec(\u03c4\u2032)\u2225 / 2.\n\nBy the symmetry of S_n, the right hand side is the same for any \u03c4_0, so w.l.o.g. we can set \u03c4_0 = e (the identity), and find that the minimum is achieved when \u03c4\u2032 is just a transposition, e.g., the permutation that swaps 1 with 2 and leaves 3, 4, ..., n in place. The corresponding permutation matrix differs from the identity in exactly 4 entries, therefore a sufficient condition for correct reconstruction is that \u2225\u03f5\u2225_Frob = \u27e8\u03f5, \u03f5\u27e9^{1/2} = \u2225vec(\u03f5)\u2225 < (1/2)\u221a4 = 1. As n grows, \u2225\u03f5\u2225_Frob becomes tightly concentrated around \u03b7n, so the condition for recovering the correct permutation is \u03b7 < 1/n.\n\n\u00b9 Note that we could equally well have matched the \u03c3_i\u2019s to any other column of blocks, since they are only defined relative to an arbitrary reference permutation: if, for any fixed \u03c3_0, each \u03c3_i is redefined as \u03c3_i \u03c3_0, the predicted relative permutations \u03c4_{ji} = \u03c3_j \u03c3_0 (\u03c3_i \u03c3_0)^{\u22121} = \u03c3_j \u03c3_i^{\u22121} stay the same.\n\nFigure 1: Singular value histogram of T under the noise model where each \u03c4\u0303_{ji} with probability p = {0.10, 0.25, 0.85} is replaced by a random permutation (m = 100, n = 30). Note that apart from the extra peak at zero, the distribution of the stochastic eigenvalues is very similar to the semicircular distribution for Gaussian noise. As long as the small cluster of deterministic eigenvalues is clearly separated from the noise, Permutation Synchronization is feasible.\n\nPermutation Synchronization can achieve a lower error, especially in the large m regime, because the eigenvectors aggregate information from all the T_{ji} matrices, and tend to be very stable to perturbations. In general, perturbations of the form (14) exhibit a characteristic phase transition. As long as the largest eigenvalue of the random matrix N falls below a given multiple of the smallest non-zero eigenvalue of T_0, adding N will have very little effect on the eigenvectors of T. 
On the other hand, when the noise exceeds this limit, the spectra get fully mixed, and it becomes impossible to recover T_0 from T to any precision at all.\n\nIf N is a symmetric matrix with independent N(0, \u03b7^2) entries, as nm \u2192 \u221e, its spectrum will tend to Wigner\u2019s famous semicircle distribution supported on the interval (\u22122\u03b7(nm)^{1/2}, 2\u03b7(nm)^{1/2}), and with probability one the largest eigenvalue will approach 2\u03b7(nm)^{1/2} [20, 21]. In contrast, the non-zero eigenvalues of T_0 scale with m, which guarantees that for large enough m the two spectra will be nicely separated and Permutation Synchronization will have very low error. While much harder to analyze analytically, empirical evidence suggests that this type of phase transition behavior is characteristic of any reasonable noise model, for example the one in which we take each block of T and with some probability p replace it with a random permutation matrix (Figure 1).\n\nTo derive more quantitative results, we consider the case where N is a so-called (symmetric) Gaussian Wigner matrix, which has independent N(0, \u03b7^2) entries on its diagonal, and N(0, \u03b7^2/2) entries everywhere else. It has recently been proved that for this type of matrix the phase transition occurs at \u03bb_min^det / \u03bb_max^stochastic = 1/2, so to recover T_0 to any accuracy at all we must have \u03b7 < (m/n)^{1/2} [22].\n\nBelow this limit, to quantify the actual expected error, we write each leading normalized eigenvector v_1, v_2, ..., v_n of T as v_i = v_i^* + v_i^\u22a5, where v_i^* is the projection of v_i to the space U_0 spanned by the non-zero eigenvectors v_1^0, v_2^0, ..., v_n^0 of T_0. By Theorem 2.2 of [22] as nm \u2192 \u221e,\n\n\u2225v_i^*\u2225^2 \u2192 1 \u2212 \u03b7^2 n/m (a.s.) and \u2225v_i^\u22a5\u2225^2 \u2192 \u03b7^2 n/m (a.s.). (15)\n\nIt is easy to see that \u27e8v_i^\u22a5, v_j^\u22a5\u27e9 \u2192 0 (a.s.), which implies \u27e8v_i^*, v_j^*\u27e9 = \u27e8v_i, v_j\u27e9 \u2212 \u27e8v_i^\u22a5, v_j^\u22a5\u27e9 \u2192 0 (a.s.), so, setting \u03bb = (1 \u2212 \u03b7^2 n/m)^{\u22121/2}, the normalized vectors \u03bbv_1^*, ..., \u03bbv_n^* almost surely tend to an orthonormal basis for U_0. Thus, U = \u221am [v_1, ..., v_n] is related to the \u201ctrue\u201d U_0 = \u221am [v_1^0, ..., v_n^0] by\n\n\u03bbU \u2192 U_0 O + \u03bbE\u2032 = (U_0 + \u03bbE) O (a.s.),\n\nwhere O is some rotation and each column of the noise matrices E and E\u2032 has norm \u03b7(n/m)^{1/2}.\n\nSince multiplying U on the right by an orthogonal matrix does not affect P, and the Kuhn\u2013Munkres algorithm is invariant to scaling by a constant, this equation tells us that (almost surely) the effect of (14) is equivalent to setting U = U_0 + \u03bbE. In terms of the individual P_{ji} blocks of P = U U^\u22a4, neglecting second order terms,\n\nP_{ji} = (U_j^0 + \u03bbE_j)(U_i^0 + \u03bbE_i)^\u22a4 \u2248 P(\u03c4_{ji}) + \u03bb U_j^0 E_i^\u22a4 + \u03bb E_j U_i^{0\u22a4},\n\nwhere \u03c4_{ji} is the ground truth matching and U_i^0 and E_i denote the appropriate n \u00d7 n submatrices of U_0 and E. Conjecturing that in the limit E_i and E_j follow rotationally invariant distributions, almost surely\n\nlim \u2225U_j^0 E_i^\u22a4 + E_j U_i^{0\u22a4}\u2225_Frob = lim \u2225E_i + E_j\u2225_Frob \u2264 2\u03b7n/m.\n\nThus, plugging in to our earlier result for the error tolerance of the Kuhn\u2013Munkres algorithm, Permutation Synchronization will correctly recover \u03c4_{ji} with probability one provided 2\u03bb\u03b7n/m < 1, or, equivalently,\n\n\u03b7^2 < (m/n) / (1 + 4(m/n)^{\u22121}).\n\nThis is much better than our \u03b7 < 1/n result for the naive algorithm, and remarkably only slightly stricter than the condition \u03b7 < (m/n)^{1/2} for recovering the eigenvectors with any accuracy at all. Of course, these results are asymptotic (in the sense of nm \u2192 \u221e), and strictly speaking only apply to additive Gaussian Wigner noise. However, as Figure 2 shows, in practice, even when the noise is in the form of corrupting entire permutations and nm is relatively small, qualitatively our algorithm exhibits the correct behavior, and for large enough m Permutation Synchronization does indeed recover all (\u03c4_{ji})_{j,i=1}^m with no error even when the vast majority of the entries in T are incorrect.\n\nFigure 2: The fraction of (\u03c3_i)_{i=1}^m permutations that are incorrect when reconstructed by Permutation Synchronization from an array (\u03c4\u0303_{ji})_{j,i=1}^m, in which each entry, with probability p is replaced by a random permutation. The plots show the mean and standard deviation of errors over 20 runs as a function of p for m = 10 (red), m = 50 (blue) and m = 100 (green). (Left) n = 10. (Center) n = 25. (Right) n = 30.\n\n4 Experiments\n\nSince computer vision is one of the areas where improving the accuracy of multi-matching problems is the most pressing, our experiments focused on this domain. For more details of our results, please see the extended version of the paper available on the project website.\n\nStereo Matching. 
As a proof of principle, we considered the task of aligning landmarks in 2D images of the same object taken from different viewpoints in the CMU house (m = 111 frames of a video sequence of a toy house with n = 30 hand labeled landmark points in each frame) and CMU hotel (m = 101 frames of a video sequence of a toy hotel, n = 30 hand labeled landmark points in each frame) datasets. The baseline method is to compute (\u03c4\u0303_{ji})_{i,j=1}^m by solving (m choose 2) independent linear assignment problems based on matching landmarks by their shape context features [23]. Our method takes the same pairwise matches and synchronizes them with the eigenvector based procedure. Figure 3 shows that this clearly outperforms the baseline, which tends to degrade progressively as the number of images increases. This is due to the fact that the appearance (or descriptors) of keypoints differ considerably for large offset pairs (which is likely when the image set is large), leading to many false matches. In contrast, our method improves as the size of the image set increases. While simple, this experiment demonstrates the utility of Permutation Synchronization for multi-view stereo matching, showing that instead of heuristically propagating local pairwise matches, it can find a much more accurate globally consistent matching at little additional cost.\n\nFigure 3: (a) Normalized error as m increases on the House dataset. Permutation Synchronization (blue) vs. the pairwise Kuhn-Munkres baseline (red). (b-c) Matches found for a representative image pair. (Green circles) landmarks, (green lines) ground truth, (red lines) found matches. (b) Pairwise linear assignment, (c) Permutation Synchronization. Note that less visible green is good.\n\nFigure 4: Matches for a representative image pair from the Building (top) and Books (bottom) datasets. (Green circles) landmark points, (green lines) ground truth matchings, (red lines) found matches. (Left) Pairwise linear assignment, (right) Permutation Synchronization. Note that less visible green is better (right).\n\nRepetitive Structures. Next, we considered a dataset with severe geometric ambiguities due to repetitive structures. There is some consensus in the community that even sophisticated features (like SIFT) yield unsatisfactory results in this scenario, and deriving a good initial matching for structure from motion is problematic (see [24] and references therein). Our evaluations included 16 images from the Building dataset [24]. We identified 25 \u201csimilar looking\u201d landmark points in the scene, and hand annotated them across all images. Many landmarks were occluded due to the camera angle. Qualitative results for pairwise matching and Permutation Synchronization are shown in Fig 4 (top). We highlight two important observations. First, our method resolved geometrical ambiguities by enforcing mutual consistency efficiently. Second, Permutation Synchronization robustly handles occlusion: landmark points that are occluded in one image are seamlessly assigned to null nodes in the other (see the set of unassigned points in the rightmost image in Fig 4 (top)) thanks to evidence derived from the large number of additional images in the dataset. In contrast, pairwise matching struggles with occlusion in the presence of similar looking landmarks (and feature descriptors). For n = 25 and m = 16, the error from the baseline method (Pairwise Linear Assignment) was 0.74. Permutation Synchronization decreased this by 10 percentage points, to 0.64. 
The Books dataset (Fig 4, bottom) contains m = 20 images of multiple books on an \u201cL\u201d shaped study table [24], and suffers geometrical ambiguities similar to the above with severe occlusion. Here we identified n = 34 landmark points, many of which were occluded in most images. The error from the baseline method was 0.92, and Permutation Synchronization decreased this by 22 percentage points, to 0.70 (see extended version of the paper).\n\nKeypoint matching with nominal user supervision. Our final experiment deals with matching problems where keypoints in each image preserve a common structure. In the literature, this is usually tackled as a graph matching problem, with the keypoints defining the vertices, and their structural relationships being encoded by the edges of the graph. Ideally, one wants to solve the problem for all images at once but most practical solutions operate on image (or graph) pairs. Note that in terms of difficulty, this problem is quite distinct from those discussed above. In stereo, the same object is imaged and what varies from one view to the other is the field of view, scale, or pose. In contrast, in keypoint matching, the background is not controlled and even sophisticated descriptors may go wrong. Recent solutions often leverage supervision to make the problem tractable [25, 26]. Instead of learning parameters [25, 27], we utilize supervision directly to provide the correct matches on a small subset of randomly picked image pairs (e.g., via a crowdsourced platform like Mechanical Turk). We hope to exploit this \u2018ground-truth\u2019 to significantly boost accuracy via Permutation Synchronization. For our experiments, we used the baseline method output to set up our objective matrix T but with a fixed \u201csupervision probability\u201d, we replaced the T_{ji} block by the correct permutation matrix, and ran Permutation Synchronization. 
We considered the \u201cBikes\u201d sub-class from the Caltech 256 dataset, which contains multiple images of common objects with varying backdrops, and chose to match images in the \u201ctouring bike\u201d class. Our analysis included 28 out of 110 images in this dataset that were taken \u201cside-on\u201d. The SUSAN corner detector was used to identify landmarks in each image. Further, we identified 6 interest points in each image that correspond to the frame of the bicycle. We modeled the matching cost for an image pair as the shape distance between interest points in the pair. As before, the baseline was pairwise linear assignment. For a fixed degree of supervision, we randomly selected image pairs for supervision and estimated matchings for the rest of the image pairs. We performed 50 runs for each degree of supervision. Mean error and standard deviation are shown in Fig 5 as supervision increases. Fig 6 demonstrates qualitative results by our method (right) and pairwise linear assignment (left).\n\nFigure 5: Normalized error as the degree of supervision varies. Baseline method PLA (red) and Permutation Synchronization (blue).\n\n5 Conclusions\n\nEstimating the correct matching between two sets from noisy similarity data, such as the visual feature based similarity matrices that arise in computer vision, is an error-prone process. However, when we have not just two, but m different sets, the consistency conditions between the (m choose 2) pairwise matchings severely constrain the solution. Our eigenvector decomposition based algorithm, Permutation Synchronization, exploits this fact and pools information from all pairwise similarity matrices to jointly estimate a globally consistent array of matchings in a single shot. Theoretical results suggest that this approach is so robust that no matter how high the noise level is, for large enough m the error is almost surely going to be zero. Experimental results confirm that in a range of computer vision tasks from stereo to keypoint matching in dissimilar images, the method does indeed significantly improve performance (especially when m is large, as expected in video), and can get around problems such as occlusion that a pairwise strategy cannot handle. In future work we plan to compare our method to [18] (which was published after the present paper was submitted), as well as investigate using the graph connection Laplacian [28].\n\nAcknowledgments\n\nWe thank Amit Singer for invaluable comments and for drawing our attention to [18]. This work was supported in part by NSF\u20131320344 and by funding from the University of Wisconsin Graduate School.\n\nFigure 6: A representative triplet from the \u201cTouring bike\u201d dataset. (Yellow circle) Interest points in each image. (Green lines) Ground truth matching for image pairs (left-center) and (center-right). (Red lines) Matches for the image pairs: (left) supervision=0.1, (right) supervision=0.5.\n\nReferences\n[1] R. E. Burkard, M. Dell\u2019Amico, and S. Martello. Assignment problems. SIAM, 2009.\n[2] D. Shen and C. Davatzikos. Hammer: hierarchical attribute matching mechanism for elastic registration. TMI, IEEE, 21, 2002.\n[3] K. Duan, D. Parikh, D. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, 2012.\n[4] M.F. Demirci, A. Shokoufandeh, Y. Keselman, L. Bretzner, and S. Dickinson. Object recognition as many-to-many feature matching. IJCV, 69, 2006.\n[5] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S.M. Seitz. Multi-view stereo for community photo collections. In ICCV, 2007.\n[6] A.C. Berg, T.L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In CVPR, 2005.\n[7] J. Petterson, T. Caetano, J. McAuley, and J. Yu. Exponential family graph matching and ranking. NIPS, 2009.\n[8] S. 
Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S.M. Seitz, and R. Szeliski. Building Rome in a day. Communications of the ACM, 54, 2011.\n[9] I. Simon, N. Snavely, and S.M. Seitz. Scene summarization for online image collections. In ICCV, 2007.\n[10] P.A. Pevzner. Multiple alignment, communication cost, and graph matching. SIAM JAM, 52, 1992.\n[11] S. Lacoste-Julien, B. Taskar, D. Klein, and M.I. Jordan. Word alignment via quadratic assignment. In Proc. HLT-NAACL, 2006.\n[12] A.J. Smola, S.V.N. Vishwanathan, and Q. Le. Bundle methods for machine learning. NIPS, 20, 2008.\n[13] I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun, and Y. Singer. Large margin methods for structured and interdependent output variables. JMLR, 6, 2006.\n[14] M. Volkovs and R. Zemel. Efficient sampling for bipartite matching problems. In NIPS, 2012.\n[15] A. Singer and Y. Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM Journal on Imaging Sciences, 4(2):543\u2013572, 2011.\n[16] R. Hadani and A. Singer. Representation theoretic patterns in three dimensional cryo-electron microscopy I: the intrinsic reconstitution algorithm. Annals of Mathematics, 174(2):1219\u20131241, 2011.\n[17] R. Hadani and A. Singer. Representation theoretic patterns in three-dimensional cryo-electron microscopy II: the class averaging problem. Foundations of Computational Mathematics, 11(5):589\u2013616, 2011.\n[18] Qi-Xing Huang and Leonidas Guibas. Consistent shape maps via semidefinite programming. Computer Graphics Forum, 32(5):177\u2013186, 2013.\n[19] H.W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2, 1955.\n[20] E.P. Wigner. On the distribution of the roots of certain symmetric matrices. Ann. Math, 67, 1958.\n[21] Z. F\u00fcredi and J. Koml\u00f3s. The eigenvalues of random symmetric matrices. 
Combinatorica, 1, 1981.\n[22] F. Benaych-Georges and R.R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494\u2013521, 2011.\n[23] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509\u2013522, 2002.\n[24] R. Roberts, S. Sinha, R. Szeliski, and D. Steedly. Structure from motion for scenes with large duplicate structures. In CVPR, 2011.\n[25] T.S. Caetano, J.J. McAuley, L. Cheng, Q.V. Le, and A.J. Smola. Learning graph matching. PAMI, 31(6):1048\u20131058, 2009.\n[26] M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and map inference. In NIPS, 2009.\n[27] T. Jebara, J. Wang, and S.F. Chang. Graph construction and b-matching for semi-supervised learning. In ICML, 2009.\n[28] A. S. Bandeira, A. Singer, and D. A. Spielman. A Cheeger inequality for the graph connection Laplacian. SIAM Journal on Matrix Analysis and Applications, 34(4):1611\u20131630, 2013.\n", "award": [], "sourceid": 939, "authors": [{"given_name": "Deepti", "family_name": "Pachauri", "institution": "UW-Madison"}, {"given_name": "Risi", "family_name": "Kondor", "institution": "University of Chicago"}, {"given_name": "Vikas", "family_name": "Singh", "institution": "UW-Madison"}]}