{"title": "Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching", "book": "Advances in Neural Information Processing Systems", "page_first": 127, "page_last": 135, "abstract": "Graph matching is a challenging problem with very important applications in a wide range of fields, from image and video analysis to biological and biomedical problems. We propose a robust graph matching algorithm inspired in sparsity-related techniques. We cast the problem, resembling group or collaborative sparsity formulations, as a non-smooth convex optimization problem that can be efficiently solved using augmented Lagrangian techniques. The method can deal with weighted or unweighted graphs, as well as multimodal data, where different graphs represent different types of data. The proposed approach is also naturally integrated with collaborative graph inference techniques, solving general network inference problems where the observed variables, possibly coming from different modalities, are not in correspondence. The algorithm is tested and compared with state-of-the-art graph matching techniques in both synthetic and real graphs. 
We also present results on multimodal graphs and applications to collaborative inference of brain connectivity from alignment-free functional magnetic resonance imaging (fMRI) data.", "full_text": "Robust Multimodal Graph Matching:
Sparse Coding Meets Graph Matching

Marcelo Fiori, Universidad de la República, Uruguay, mfiori@fing.edu.uy
Pablo Sprechmann, Duke University, Durham, NC 27708, pablo.sprechmann@duke.edu
Joshua Vogelstein, Duke University, Durham, NC 27708, jovo@math.duke.edu
Pablo Musé, Universidad de la República, Uruguay, pmuse@fing.edu.uy
Guillermo Sapiro, Duke University, Durham, NC 27708, guillermo.sapiro@duke.edu

Abstract

Graph matching is a challenging problem with very important applications in a wide range of fields, from image and video analysis to biological and biomedical problems. We propose a robust graph matching algorithm inspired by sparsity-related techniques. We cast the problem, resembling group or collaborative sparsity formulations, as a non-smooth convex optimization problem that can be efficiently solved using augmented Lagrangian techniques. The method can deal with weighted or unweighted graphs, as well as multimodal data, where different graphs represent different types of data. The proposed approach is also naturally integrated with collaborative graph inference techniques, solving general network inference problems where the observed variables, possibly coming from different modalities, are not in correspondence. The algorithm is tested and compared with state-of-the-art graph matching techniques on both synthetic and real graphs. We also present results on multimodal graphs and applications to collaborative inference of brain connectivity from alignment-free functional magnetic resonance imaging (fMRI) data. 
The code is publicly available.

1 Introduction

Problems related to graph isomorphisms have been an important and enjoyable challenge for the scientific community for a long time. The graph isomorphism problem itself consists in determining whether two given graphs are isomorphic or not, that is, whether there exists an edge-preserving bijection between the vertex sets of the graphs. This problem is also very interesting from the computational complexity point of view, since its complexity level is still unsolved: it is one of the few problems in NP not yet classified as either P or NP-complete (Conte et al., 2004). The graph isomorphism problem is contained in the (harder) graph matching problem, which consists in finding the exact isomorphism between two graphs. Graph matching is therefore a very challenging problem which has several applications, e.g., in the pattern recognition and computer vision areas. In this paper we address the problem of (potentially multimodal) graph matching when the graphs are not exactly isomorphic. This is by far the most common scenario in real applications, since the graphs to be compared are the result of a measuring or description process, which is naturally affected by noise.
Given two graphs GA and GB with p vertices, which we will characterize in terms of their p × p adjacency matrices A and B, the graph matching problem consists in finding a correspondence between the nodes of GA and GB minimizing some matching error. In terms of the adjacency matrices, this corresponds to finding a matrix P in the set of permutation matrices P, such that it minimizes some distance between A and PBP^T. A common choice is the Frobenius norm ||A - PBP^T||_F^2, where ||M||_F^2 = Σ_ij M_ij^2. 
The graph matching problem can then be stated as

min_{P∈P} ||A - PBP^T||_F^2 = min_{P∈P} ||AP - PB||_F^2 .    (1)

The combinatorial nature of the permutation search makes this problem NP in general, although polynomial algorithms have been developed for a few special types of graphs, like trees or planar graphs for example (Conte et al., 2004).
There are several and diverse techniques addressing the graph matching problem, including spectral methods (Umeyama, 1988) and problem relaxations (Zaslavskiy et al., 2009; Vogelstein et al., 2012; Almohamad & Duffuaa, 1993). A good review of the most common approaches can be found in Conte et al. (2004). In this paper we focus on relaxation techniques for solving an approximate version of the problem. Perhaps the simplest one is to relax the feasible set (the permutation matrices) to its convex hull, the set of doubly stochastic matrices D, which consists of the matrices with non-negative entries whose rows and columns each sum to one: D = {M ∈ R^{p×p} : M_ij ≥ 0, M1 = 1, M^T 1 = 1}, 1 being the p-dimensional vector of ones. The relaxed version of the problem is

P̂ = arg min_{P∈D} ||AP - PB||_F^2 ,

which is a convex problem, though the result is a doubly stochastic matrix instead of a permutation. The final node correspondence is obtained as the closest permutation matrix to P̂: P* = arg min_{P∈P} ||P - P̂||_F^2, which is a linear assignment problem that can be solved in O(p^3) by the Hungarian algorithm (Kuhn, 1955). However, this last step lacks any guarantee about the graph matching problem itself. This approach will be referred to as QCP, for quadratic convex problem.
One of the newest approximate methods is the PATH algorithm by Zaslavskiy et al. (2009), which combines this convex relaxation with a concave relaxation. 
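The rounding step of the QCP pipeline above, finding the permutation closest to a doubly stochastic P̂, is a linear assignment problem; a minimal sketch of that step (an illustration, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def nearest_permutation(P_hat):
    """Round a doubly stochastic matrix to the closest permutation matrix.

    Minimizing ||P - P_hat||_F^2 over permutations is equivalent to
    maximizing <P, P_hat>, i.e., a linear assignment problem.
    """
    row, col = linear_sum_assignment(-P_hat)  # negate cost to maximize total weight
    P = np.zeros_like(P_hat)
    P[row, col] = 1.0
    return P
```

Since ||P - P̂||_F^2 = ||P||_F^2 + ||P̂||_F^2 - 2⟨P, P̂⟩ and ||P||_F^2 is constant over permutations, maximizing ⟨P, P̂⟩ suffices, which is exactly what the Hungarian-type solver does.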
Another new technique is the FAQ method by Vogelstein et al. (2012), which solves a relaxed version of the Quadratic Assignment Problem. We compare the method proposed here with all of these techniques in the experimental section.
The main contributions of this work are two-fold. First, we propose a new and versatile formulation for the graph matching problem which is more robust to noise and can naturally manage multimodal data. The technique, which we call GLAG for Group lasso graph matching, is inspired by recent work on sparse modeling, and in particular group and collaborative sparse coding. We present several experimental evaluations to back up these claims. Second, the proposed formulation fits very naturally into the alignment-free collaborative network inference problem, where we collaboratively exploit non-aligned (possibly multimodal) data to infer the underlying common network, an application never addressed before to the best of our knowledge. We assess this with experiments using real fMRI data.
The rest of this paper is organized as follows. In Section 2 we present the proposed graph matching formulation, and we show how to solve the optimization problem in Section 3. The joint collaborative network and permutation learning application is described in Section 4. Experimental results are presented in Section 5, and we conclude in Section 6.

2 Graph matching formulation

We consider the problem of matching two graphs that are not necessarily perfectly isomorphic. We assume the following model: suppose we have a noise-free graph characterized by an adjacency matrix T. We then want to match two graphs with adjacency matrices A = T + O_A and B = P_o^T T P_o + O_B, where O_A and O_B have a sparse number of non-zero elements of arbitrary magnitude. 
This realistic model is often used in experimental settings, e.g., (Zaslavskiy et al., 2009). In this context, the QCP formulation tends to find a doubly stochastic matrix P which minimizes the "average error" between AP and PB. However, these spurious mismatching edges can be thought of as outliers, so we would rather have a metric promoting that AP and PB share the same active set (non-zero entries representing edges), with the exception of some sparse entries. This can be formulated in terms of the group Lasso penalization (Yuan & Lin, 2006). In short, the group Lasso takes a set of groups of coefficients and promotes that only some of these groups are active, while the others remain zero. Moreover, the usual behavior is that when a group is active, all the coefficients in the group are non-zero. In this particular graph matching application, we form p^2 groups, one per matrix entry (i, j), each one consisting of the 2-dimensional vector ((AP)_ij, (PB)_ij). The proposed cost function is then the sum of the l2 norms of the groups:

f(P) = Σ_ij || ((AP)_ij, (PB)_ij) ||_2 .    (2)

Ideally we would like to solve the graph matching problem by finding the minimum of f over the set of permutation matrices P. Of course this formulation is still computationally intractable, so we solve the relaxed version, replacing P by its convex hull D, resulting in the convex problem

P̃ = arg min_{P∈D} f(P) .    (3)

As with the Frobenius formulation, the final step simply finds the closest permutation matrix to P̃.
Let us analyze the case where A and B are the adjacency matrices of two isomorphic undirected unweighted graphs with e edges and no self-loops. 
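For concreteness, the group-Lasso cost (2) is straightforward to evaluate; a minimal numpy sketch (illustrative only, not the authors' released code):

```python
import numpy as np

def glag_cost(A, B, P):
    """Group-Lasso graph matching cost f(P) from Eq. (2).

    Each of the p^2 groups is the 2-vector ((AP)_ij, (PB)_ij);
    f(P) sums the Euclidean norms of these groups over all entries (i, j).
    """
    AP = A @ P
    PB = P @ B
    return np.sqrt(AP ** 2 + PB ** 2).sum()
```

Note that when AP = PB entrywise, every group norm collapses to √2 |(AP)_ij|, so the cost depends only on the shared support, not on a comparison of edge magnitudes between the two graphs.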
Since the graphs are isomorphic, there exists a permutation matrix Po such that A = Po B Po^T.

Lemma 1. Under the conditions stated above, the minimum value of the optimization problem (3) is 2√2 e, and it is reached by Po, although the solution is not unique in general. Moreover, any solution P of problem (3) satisfies AP = PB.

Proof: Let (a_k) denote all the p^2 entries of AP, and (b_k) all the entries of PB. Then f(P) can be re-written as f(P) = Σ_k sqrt(a_k^2 + b_k^2). Observing that sqrt(a^2 + b^2) ≥ (√2/2)(a + b), we have

f(P) = Σ_k sqrt(a_k^2 + b_k^2) ≥ (√2/2) Σ_k (a_k + b_k) .    (4)

Now, since P is doubly stochastic, the sum of all the entries of AP is equal to the sum of all the entries of A, which is two times the number of edges. Therefore Σ_k a_k = Σ_k b_k = 2e and f(P) ≥ 2√2 e.
The equality in (4) holds if and only if a_k = b_k for all k, which means that AP = PB. In particular, this is true for the permutation Po, which completes the proof of all the statements. □
This Lemma shows that the fact that the weights in A and B are not compared in magnitude does not affect the matching performance when the two graphs are isomorphic and have equal weights. On the other hand, this property plays a fundamental role when moving away from that setting. Indeed, since the group Lasso tends to set complete groups to zero, and the actual value of the non-zero coefficients is less important, very dissimilar coefficients can be grouped together if that results in fewer active groups. This is even more evident when using the l∞ norm instead of the l2 norm for the groups, and the optimization remains very similar to the one presented below. 
Moreover, the formulation remains valid when the two graphs come from different modalities, a fundamental property when, for example, addressing alignment-free collaborative graph inference as presented in Section 4 (the elegance with which this graph matching formulation fits into that problem will be further stressed there). In contrast, the Frobenius-based approaches mentioned in the introduction are very susceptible to differences in edge magnitudes and not appropriate for multimodal matching.[1]

[1] If both graphs are binary and we limit ourselves to permutation matrices (for which there are no algorithms known to find the solution in polynomial time), then the minimizers of (2) and (1) are the same (Vince Lyzinski, personal communication).

3 Optimization

The proposed minimization problem (3) is convex but non-differentiable. Here we use an efficient variant of the Alternating Direction Method of Multipliers (ADMM) (Bertsekas & Tsitsiklis, 1989). The idea is to write the optimization problem as an equivalent artificially constrained problem, using two new variables α, β ∈ R^{p×p}:

min_{P∈D} Σ_ij || (α_ij, β_ij) ||_2   s.t.  α = AP, β = PB.    (5)

The ADMM method generates a sequence which converges to the minimum of the augmented Lagrangian of the problem:

L(P, α, β, U, V) = Σ_ij || (α_ij, β_ij) ||_2 + (c/2) ||α - AP + U||^2 + (c/2) ||β - PB + V||^2 ,

where U and V are related to the Lagrange multipliers and c is a fixed constant.
The decoupling produced by the new artificial variables allows their values to be updated one at a time, minimizing the augmented Lagrangian L. 
We first update the pair (α, β) while keeping (P, U, V) fixed; then we minimize over P; and finally we update U and V, as described next in Algorithm 1.

Input: Adjacency matrices A, B, c > 0.
Output: Permutation matrix P*.
Initialize U = 0, V = 0, P = (1/p) 1 1^T
while stopping criterion is not satisfied do
    (α_{t+1}, β_{t+1}) = arg min_{α,β} Σ_ij ||(α_ij, β_ij)||_2 + (c/2) ||α - A P_t + U_t||_F^2 + (c/2) ||β - P_t B + V_t||_F^2
    P_{t+1} = arg min_{P∈D} (1/2) ||α_{t+1} - AP + U_t||_F^2 + (1/2) ||β_{t+1} - PB + V_t||_F^2
    U_{t+1} = U_t + α_{t+1} - A P_{t+1}
    V_{t+1} = V_t + β_{t+1} - P_{t+1} B
end
Algorithm 1: Robust graph matching algorithm; the final output is P* = arg min_{Q∈P} ||Q - P||_F^2. See text for implementation details of each step.

The first subproblem is decomposable into p^2 scalar problems (one for each matrix entry),

min_{α_ij, β_ij} ||(α_ij, β_ij)||_2 + (c/2) (α_ij - (A P_t)_ij + U_ij^t)^2 + (c/2) (β_ij - (P_t B)_ij + V_ij^t)^2 .

From the optimality conditions on the subgradient of this subproblem, it can be seen that it can be solved in closed form by means of the well-known vector soft-thresholding operator (Yuan & Lin, 2006): S_v(b, λ) = [1 - λ/||b||_2]_+ b.
The second subproblem is a minimization of a convex differentiable function over a convex set, so general solvers can be chosen for this task. For instance, a projected gradient descent method can be used. However, this would require computing several projections onto D per iteration, which is one of the computationally most expensive steps. 
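Stepping back to the first subproblem: its closed-form solution applies S_v to each 2-dimensional group. A minimal sketch of the operator (illustrative, not the authors' released code):

```python
import numpy as np

def vector_soft_threshold(b, lam):
    """Group soft-thresholding S_v(b, lam) = [1 - lam/||b||_2]_+ b.

    Shrinks the vector b toward zero, and returns exactly zero when
    ||b||_2 <= lam: the whole group is switched off at once, which is
    the mechanism behind the group-Lasso support selection.
    """
    norm = np.linalg.norm(b)
    if norm <= lam:
        return np.zeros_like(b)
    return (1.0 - lam / norm) * b
```

In the ADMM iteration this is applied independently to each of the p^2 pairs (α_ij, β_ij), which is why the (α, β) update is so cheap.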
Nevertheless, for the second subproblem we can choose to solve a linearized version of the problem while keeping the convergence guarantees of the algorithm (Lin et al., 2011). In this case, the linear approximation of the first term is:

(1/2) ||α_{t+1} - AP + U_t||_F^2 ≈ (1/2) ||α_{t+1} - A P_k + U_t||_F^2 + ⟨g_k, P - P_k⟩ + (1/(2τ)) ||P - P_k||_F^2 ,

where g_k = -A^T (α_{t+1} + U_t - A P_k) is the gradient of the linearized term, ⟨·,·⟩ is the usual inner product of matrices, and τ is any constant such that τ < 1/ρ(A^T A), with ρ(·) being the spectral norm. The second term can be linearized analogously, so the minimization of the second step becomes

min_{P∈D} (1/2) ||P - C1||_F^2 + (1/2) ||P - C2||_F^2 ,

with the fixed matrices C1 = P_k + τ A^T (α_{t+1} + U_t - A P_k) and C2 = P_k + τ (β_{t+1} + V_t - P_k B) B^T, which is simply the projection of the matrix (1/2)(C1 + C2) onto D.
Summarizing, each iteration consists of p^2 vector thresholdings when solving for (α, β), one projection onto D when solving for P, and two matrix multiplications for the update of U and V. The code is publicly available at www.fing.edu.uy/~mfiori.

4 Application to joint graph inference of non-pre-aligned data

Estimating the inverse covariance matrix is a very active field of research, and in particular the inference of the support of this matrix, since the non-zero entries carry information about the conditional dependence between variables. In numerous applications this matrix is known to be sparse, and in this regard the graphical Lasso has proven to be a good estimator for the inverse covariance matrix (Yuan & Lin, 2007; Fiori et al., 2012), also for non-Gaussian data (Loh & Wainwright, 2012). 
Assume that we have a p-dimensional multivariate normally distributed variable X ~ N(0, Σ); let X ∈ R^{k×p} be a data matrix containing k independent observations of X, and S its empirical covariance matrix. The graphical Lasso estimator for Σ^{-1} is the matrix Θ which solves the optimization problem

min_{Θ≻0} tr(SΘ) - log det Θ + λ Σ_ij |Θ_ij| ,    (6)

which corresponds to the maximum likelihood estimator for Σ^{-1} with an l1 regularization.
Collaborative network inference has gained a lot of attention in recent years (Chiquet et al., 2011), especially with fMRI data, e.g., (Varoquaux et al., 2010). This problem consists of estimating two (or more) matrices Σ_A^{-1} and Σ_B^{-1} from data matrices X_A and X_B as above, with the additional prior information that the inverse covariance matrices share the same support. The joint estimation of Θ_A and Θ_B is performed by solving

min_{Θ_A≻0, Θ_B≻0} tr(S_A Θ_A) - log det Θ_A + tr(S_B Θ_B) - log det Θ_B + λ Σ_ij || ((Θ_A)_ij, (Θ_B)_ij) ||_2 ,    (7)

where the first four terms correspond to the maximum likelihood estimators for Θ_A, Θ_B, and the last term is the group Lasso penalty which promotes that Θ_A and Θ_B have the same active set. This formulation relies on the limiting underlying assumption that the variables in both datasets (the columns of X_A and X_B) are in correspondence, i.e., the graphs determined by the adjacency matrices Θ_A and Θ_B are aligned. However, this is in general not the case in practice. 
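For reference, the penalized likelihood in (6) can be evaluated directly; a minimal numpy sketch (an illustrative helper, not part of the paper's pipeline):

```python
import numpy as np

def graphical_lasso_objective(S, Theta, lam):
    """Graphical Lasso objective of Eq. (6):
    tr(S Theta) - log det Theta + lam * sum_ij |Theta_ij|.

    Assumes Theta is symmetric positive definite, so log det is well defined.
    """
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return np.trace(S @ Theta) - logdet + lam * np.abs(Theta).sum()
```

Solvers for (6) minimize this function over positive definite matrices; the l1 term is what drives entries of Θ, and hence edges of the inferred network, exactly to zero.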
Motivated by the formulation presented in Section 2, we propose to overcome this limitation by incorporating a permutation matrix into the optimization problem and jointly learning it during the estimation process. The proposed optimization problem is then given by

min_{Θ_A≻0, Θ_B≻0, P∈P} tr(S_A Θ_A) - log det Θ_A + tr(S_B Θ_B) - log det Θ_B + λ Σ_ij || ((Θ_A P)_ij, (P Θ_B)_ij) ||_2 .    (8)

Even after the relaxation of the constraint P ∈ P to P ∈ D, the joint minimization of (8) over (Θ_A, Θ_B) and P is a non-convex problem. However, it is convex when minimized only over (Θ_A, Θ_B) or over P, leaving the other fixed. Problem (8) can then be minimized using a block-coordinate descent type of approach, iteratively minimizing over (Θ_A, Θ_B) and P.
The first subproblem (solving (8) with P fixed) is a very simple variant of (7), which can be solved very efficiently by means of iterative thresholding algorithms (Fiori et al., 2013). In the second subproblem, since (Θ_A, Θ_B) are fixed, the only term to minimize is the last one, which corresponds to the graph matching formulation presented in Section 2.

5 Experimental results

We now present the performance of our algorithm and compare it with the most recent techniques in several scenarios including synthetic and real graphs, multimodal data, and fMRI experiments. In the cases where there is a "ground truth," the performance is measured in terms of the matching error, defined as ||A_o - P B_o P^T||_F^2, where P is the obtained permutation matrix and (A_o, B_o) are the original adjacency matrices.

5.1 Graph matching: Synthetic graphs

We focus here on the traditional graph matching problem for undirected weighted graphs, both with and without noise. 
More precisely, let A_o be the adjacency matrix of a random weighted graph and B_o a permuted version of it, generated with a random permutation matrix P_o, i.e., B_o = P_o^T A_o P_o. We then add a certain number N of random edges to A_o with the same weight distribution as the original weights, and another N random edges to B_o, and from these noisy versions we try to recover the original matching (or any matching between A_o and B_o, since it may not be unique).
We show the results using three different techniques for the generation of the graphs: the Erdős-Rényi model (Erdős & Rényi, 1959), the model by Barabási & Albert (1999) for scale-free graphs, and graphs with a given degree distribution generated with the BTER algorithm (Seshadhri et al., 2012). These models are representative of a wide range of real-world graphs (Newman, 2010). In the case of the BTER algorithm, the degree distribution was generated according to a geometric law, that is: Prob(degree = t) = (1 - e^{-µ}) e^{-µt}.
We compared the performance of our algorithm with the technique by Zaslavskiy et al. (2009) (referred to as PATH), the FAQ method described in Vogelstein et al. (2012), and the QCP approach.

Figure 1 shows the matching error as a function of the noise level for graphs with p = 100 nodes (top row), and for p = 150 nodes (bottom row). The number of edges varies between 200 and 400 for graphs with 100 nodes, and between 300 and 600 for graphs with 150 nodes, depending on the model. The performance is averaged over 100 runs. 
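The noisy-permutation benchmark just described can be sketched as follows; this is a minimal illustration with an Erdős-Rényi-style generator, and the edge-addition loop is a simplified stand-in for the authors' exact protocol:

```python
import numpy as np

def make_noisy_pair(p=100, edge_prob=0.05, n_noise=10, seed=0):
    """Build a random weighted graph A, a permuted copy B = Po^T A Po,
    then add up to n_noise spurious random edges independently to each."""
    rng = np.random.default_rng(seed)

    def random_graph():
        # upper-triangular weights on a Bernoulli(edge_prob) support, symmetrized
        W = np.triu(rng.random((p, p)) * (rng.random((p, p)) < edge_prob), 1)
        return W + W.T

    A = random_graph()
    Po = np.eye(p)[rng.permutation(p)]
    B = Po.T @ A @ Po

    def add_noise(M):
        M = M.copy()
        for _ in range(n_noise):
            i, j = rng.integers(0, p, size=2)
            if i != j and M[i, j] == 0:  # only add new off-diagonal edges
                M[i, j] = M[j, i] = rng.random()
        return M

    return add_noise(A), add_noise(B), Po
```

The matching error reported in the experiments then compares the recovered permutation against the clean pair (A_o, B_o), not the noisy one.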
Figure 1 shows that our method is more stable, and consistently outperforms the other methods (considered state-of-the-art), especially for noise levels in the low range (for large noise levels, it is not clear what a "true" matching is, and in addition the sparsity hypothesis is no longer valid).

Figure 1: Matching error for synthetic graphs with p = 100 nodes (top row) and p = 150 nodes (bottom row): (a), (d) Erdős-Rényi graphs; (b), (e) scale-free graphs; (c), (f) BTER graphs. In solid black our proposed GLAG algorithm, in long-dashed blue the PATH algorithm, in short-dashed red the FAQ method, and in dotted black the QCP.

5.2 Graph matching: Real graphs

We now present experiments similar to those in the previous section, but with real graphs. We use the C. elegans connectome. Caenorhabditis elegans is an extensively studied roundworm, whose somatic nervous system consists of 279 neurons that make synapses with other neurons. 
The two types of connections (chemical and electrical) between these 279 neurons have been mapped (Varshney et al., 2011), and their corresponding adjacency matrices, A_c and A_e, are publicly available.
We match both the chemical and the electrical connection graphs against noisy, artificially permuted versions of them. The permuted graphs are constructed following the same procedure used in Section 5.1 for synthetic graphs. The weights of the added noise follow the same distribution as the original weights. The results are shown in Figure 2. These results suggest that, among the prior art, the PATH algorithm is more suitable for the electrical connection network, while the FAQ algorithm works better for the chemical one. Our method outperforms both of them for both types of connections.

5.3 Multimodal graph matching

One of the advantages of the proposed approach is its capability to deal with multimodal data. As discussed in Section 2, the group Lasso type of penalty promotes the supports of AP and PB to be identical, almost independently of the actual values of the entries. This makes it possible to match weighted graphs where the weights may follow completely different probability distributions. This is commonly the case when dealing with multimodal data: when a network is measured using significantly different modalities, one expects the underlying connections to be the same, but no relation can be assumed between the actual weights of these connections. This is the case, for example, even for fMRI data measured with different instruments. 
In what follows, we evaluate the performance of the proposed method in two examples of multimodal graph matching.

Figure 2: Matching error for the C. elegans connectome, averaged over 50 runs: (a) electrical connection graph; (b) chemical connection graph. In solid black our proposed GLAG algorithm, in long-dashed blue the PATH algorithm, and in short-dashed red the FAQ method. Note that in the chemical connection graph, the matching error of our algorithm is zero up to noise levels of ≈ 50.

We first generate an auxiliary binary random graph A_b and a permuted version B_b = P_o^T A_b P_o. Then, we assign weights to the graphs according to distributions p_A and p_B (to be specified for each experiment), thus obtaining the weighted graphs A and B. We then add noise consisting of spurious weighted edges following the same distribution as the original graphs (i.e., p_A for A and p_B for B). Finally, we run all four graph matching methods to recover the permutation. The matching error is measured on the unweighted graphs as ||A_b - P B_b P^T||_F. Note that while this metric might not be appropriate for the optimization stage when considering multimodal data, it is appropriate for the actual error evaluation, measuring mismatches. Comparing with the original permutation matrix may not be very informative, since there is no guarantee that the matrix is unique, even for the original noise-free data.
Figures 3(a) and 3(b) show the comparison when the weights in both graphs are Gaussian distributed, but with different means and variances. Figures 3(c) and 3(d) show the performance when the weights of A are Gaussian distributed and those of B follow a uniform distribution. 
See captions for details. These results confirm the intuition described above, showing that our method is more suitable for multimodal graphs, especially in the low range of noise.

Figure 3: Matching error for multimodal graphs with p = 100 nodes: (a), (c) Erdős-Rényi graphs; (b), (d) scale-free graphs. In (a) and (b), weights in A are N(1, 0.4) and weights in B are N(4, 1). In (c) and (d), weights in A are N(1, 0.4) and weights in B are uniform in [1, 2]. In solid black our proposed GLAG algorithm, in long-dashed blue the PATH algorithm, in short-dashed red the FAQ method, and in dotted black the QCP.

5.4 Collaborative inference

In this last experiment, we illustrate the application of the permuted collaborative graph inference presented in Section 4 with real resting-state fMRI data, publicly available (Nooner, 2012). We consider here test-retest studies, that is, the same subject undergoing resting-state fMRI in two different sessions separated by a break. Each session consists of almost 10 minutes of data, acquired with a sampling period of 0.645 s, producing about 900 samples per study. 
The CC200 atlas (Craddock et al., 2012) was used to extract the time-series for the ≈ 200 regions of interest (ROIs), resulting in two data matrices X_A, X_B ∈ R^{900×200}, corresponding to test and retest respectively.
To illustrate the potential of the proposed framework, we show that using only part of the data in X_A and part of the data in a permuted version of X_B, we are able to infer a connectivity matrix almost as accurately as when using the whole data. Working with permuted data is very important in this application in order to handle possible misalignments to the atlas.
Since there is no ground truth for the connectivity and, as mentioned before, the collaborative setting (7) has already been proven successful, we take as ground truth the result of the collaborative inference using the empirical covariance matrices of X_A and X_B, denoted by S_A and S_B. The result of this collaborative inference procedure is the pair of inverse covariance matrices Θ_GT^A and Θ_GT^B. In short, the gold standard built for this experiment (obtained with the entire data) is found by solving

min_{Θ_A≻0, Θ_B≻0} tr(S_A Θ_A) - log det Θ_A + tr(S_B Θ_B) - log det Θ_B + λ Σ_ij || ((Θ_A)_ij, (Θ_B)_ij) ||_2 .

Now, let X_H^A be the first 550 samples of X_A, and X_H^B the first 550 samples of X_B, which correspond to a little less than 6 minutes of study. We compute the empirical covariance matrices S_H^A and S_H^B of these data matrices, and we artificially permute the second one: S̃_H^B = P_o^T S_H^B P_o. With these two matrices S_H^A and S̃_H^B we run the algorithm described in Section 4, which alternately computes the inverse covariance matrices Θ_H^A and Θ_H^B and the matching P between them.
We compare this approach against the computation of the inverse covariance matrix using only one of the studies. 
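The artificial permutation applied to the second covariance above is consistent with reordering the columns (variables) of the data matrix: if the columns of X_B are permuted by P_o, the empirical covariance becomes P_o^T S_B P_o. A quick numerical check of this identity (illustrative only, with random data in place of the fMRI matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 900, 20
X = rng.standard_normal((k, p))        # data matrix: k samples, p variables
Po = np.eye(p)[rng.permutation(p)]     # a random permutation matrix

S = np.cov(X, rowvar=False)            # empirical covariance of X (columns = variables)
S_perm = np.cov(X @ Po, rowvar=False)  # covariance of the column-permuted data

# Permuting the variables conjugates the covariance by Po.
assert np.allclose(S_perm, Po.T @ S @ Po)
```

This is why permuting S_H^B simulates a misalignment of the ROIs without having to touch the raw time-series.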
Let \u0398A\n\nH and the matching P between them.\n\nH and \u0398B\n\nH and \u02dcS\n\ns be the results of the graphical Lasso (6) using SA and SB:\ntr(SK\u0398) \u2212 log det \u0398 + \u03bb\n\nfor K = {A, B}.\n\n|\u0398ij| ,\n\ns and \u0398B\n\n(cid:88)\n\no SB\n\n\u0398K\n\ns = argmin\n\n\u0398(cid:31)0\n\nGT \u2212\nThis experiment is repeated for 5 subjects in the database. The errors ||\u0398A\nH||F are shown in Figure 4. The errors for \u0398B are very similar. Using less than 6 minutes of each\n\u0398A\nstudy, with the variables not pre-aligned, the permuted collaborative inference procedure proposed\nin Section 4 outperforms the classical graphical Lasso using the full 10 minutes of study.\n\ns ||F and ||\u0398A\n\nGT \u2212 \u0398A\n\ni,j\n\n)\n3\n\u2212\n\n0\n1\n\u00d7\n\n(\n\nr\no\nr\nr\nE\n\n6\n\n5\n\n4\n\n3\n\n2\n\n1\n\n0\n\n1\n\n2\n\n3\n\nSubject\n\n4\n\n5\n\nGT \u2212 \u0398A\n\n||\u0398A\nGT \u2212 \u0398A\n\nFigure 4:\nInverse covariance matrix\nestimation for fMRI data.\nIn blue,\nerror using one complete 10 minutes\ns ||F .\nstudy:\nIn red, er-\nror ||\u0398A\nH||F with collabora-\ntive inference using about 6 minutes\nof each study, but solving for the un-\nknown node permutations at the same\ntime.\n\n6 Conclusions\nWe have presented a new formulation for the graph matching problem, and proposed an optimization\nalgorithm for minimizing the corresponding cost function. The reported results show its suitability\nfor the graph matching problem of weighted graphs, outperforming previous state-of-the-art meth-\nods, both in synthetic and real graphs. Since in the problem formulation the weights of the graphs\nare not compared explicitly, the method can deal with multimodal data, outperforming the other\ncompared methods. 
In addition, the proposed formulation naturally fits into the pre-alignment-free collaborative network inference framework, where the permutation is estimated together with the underlying common network, with promising preliminary results in applications with real data.

Acknowledgements: Work partially supported by ONR, NGA, NSF, ARO, AFOSR, and ANII.

References
Almohamad, H. and Duffuaa, S. A linear programming approach for the weighted graph matching problem. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(5):522-525, 1993.
Barabási, A. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999.
Bertsekas, D. and Tsitsiklis, J. Parallel and Distributed Computation: Numerical Methods. Prentice Hall, 1989.
Chiquet, J., Grandvalet, Y., and Ambroise, C. Inferring multiple graphical structures. Statistics and Computing, 21(4):537-553, 2011.
Conte, D., Foggia, P., Sansone, C., and Vento, M. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(03):265-298, 2004.
Craddock, R.C., James, G.A., Holtzheimer, P.E., Hu, X.P., and Mayberg, H.S. A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping, 33(8):1914-1928, 2012.
Erdős, P. and Rényi, A. On random graphs, I. Publicationes Mathematicae, 6:290-297, 1959.
Fiori, M., Musé, P., and Sapiro, G. Topology constraints in graphical models. In Advances in Neural Information Processing Systems 25, pp. 800-808, 2012.
Fiori, M., Musé, P., Hariri, A., and Sapiro, G. Multimodal graphical models via group lasso. Signal Processing with Adaptive Sparse Structured Representations, 2013.
Kuhn, H. W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.
Lin, Z., Liu, R., and Su, Z. Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in Neural Information Processing Systems 24, pp. 612-620, 2011.
Loh, P. and Wainwright, M. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In Advances in Neural Information Processing Systems 25, pp. 2096-2104, 2012.
Newman, M. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.
Nooner, K. et al. The NKI-Rockland sample: A model for accelerating the pace of discovery science in psychiatry. Frontiers in Neuroscience, 6(152), 2012.
Seshadhri, C., Kolda, T.G., and Pinar, A. Community structure and scale-free collections of Erdős-Rényi graphs. Physical Review E, 85(5):056109, 2012.
Umeyama, S. An eigendecomposition approach to weighted graph matching problems. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 10(5):695-703, 1988.
Varoquaux, G., Gramfort, A., Poline, J.B., and Thirion, B. Brain covariance selection: better individual functional connectivity models using population prior. In Advances in Neural Information Processing Systems 23, pp. 2334-2342, 2010.
Varshney, L., Chen, B., Paniagua, E., Hall, D., and Chklovskii, D. Structural properties of the Caenorhabditis elegans neuronal network. PLoS Computational Biology, 7(2):e1001066, 2011.
Vogelstein, J.T., Conroy, J.M., Podrazik, L.J., Kratzer, S.G., Harley, E.T., Fishkind, D.E., Vogelstein, R.J., and Priebe, C.E. Fast approximate quadratic programming for large (brain) graph matching. arXiv:1112.5507, 2012.
Yuan, M. and Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49-67, 2006.
Yuan, M. and Lin, Y.
Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19-35, February 2007.
Zaslavskiy, M., Bach, F., and Vert, J.P. A path following algorithm for the graph matching problem. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(12):2227-2242, 2009.