{"title": "The Lov\u00e1sz \u03d1 function, SVMs and finding large dense subgraphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1160, "page_last": 1168, "abstract": "The Lov\u00e1sz $\\theta$ function of a graph is a fundamental tool in combinatorial optimization and approximation algorithms. Computing $\\theta$ involves solving an SDP and is extremely expensive even for moderately sized graphs. In this paper we establish that the Lov\u00e1sz $\\theta$ function is equivalent to a kernel learning problem related to the one-class SVM. This connection opens up many opportunities for bridging graph-theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM-$\\theta$ graphs, on which the Lov\u00e1sz $\\theta$ function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques to solve algorithmic problems in large graphs, e.g. identifying a planted clique of size $\\Theta(\\sqrt{n})$ in a random graph $G(n,\\frac{1}{2})$. A classic approach to this problem involves computing the $\\theta$ function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM-$\\theta$ graph, and as a consequence an SVM-based approach easily identifies the clique in large graphs and is competitive with the state of the art. Further, we introduce the notion of a ''common orthogonal labelling'', which extends the notion of an ''orthogonal labelling'' of a single graph (used in defining the $\\theta$ function) to multiple graphs. The problem of finding the optimal common orthogonal labelling is cast as a Multiple Kernel Learning problem and is used to identify a large common dense region in multiple graphs. 
The proposed algorithm achieves an order of magnitude better scalability than the state of the art.", "full_text": "The Lov\u00e1sz \u03d1 function, SVMs and finding large dense subgraphs\n\nVinay Jethava \u2217\n\nComputer Science & Engineering Department,\n\nChalmers University of Technology\n\n412 96, Goteborg, SWEDEN\njethava@chalmers.se\n\nAnders Martinsson\n\nDepartment of Mathematics,\n\nChalmers University of Technology\n\n412 96, Goteborg, SWEDEN\n\nandemar@student.chalmers.se\n\nChiranjib Bhattacharyya\n\nDepartment of CSA,\n\nIndian Institute of Science\nBangalore, 560012, INDIA\n\nchiru@csa.iisc.ernet.in\n\nDevdatt Dubhashi\n\nComputer Science & Engineering Department,\n\nChalmers University of Technology\n\n412 96, Goteborg, SWEDEN\ndubhashi@chalmers.se\n\nAbstract\n\nThe Lov\u00e1sz \u03d1 function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lov\u00e1sz \u03d1 function is equivalent to a kernel learning problem related to the one-class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM-\u03d1 graphs, on which the Lov\u00e1sz \u03d1 function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size \u0398(\u221an) in a random graph G(n, 1/2). A classic approach for this problem involves computing the \u03d1 function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM-\u03d1 graph. 
As a consequence an SVM-based approach easily identifies the clique in large graphs and is competitive with the state of the art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel Learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude better scalability than state-of-the-art methods.\n\n1 Introduction\n\nThe Lov\u00e1sz \u03d1 function [19] plays a fundamental role in modern combinatorial optimization and in various approximation algorithms on graphs; indeed, Goemans was led to say \"It seems all roads lead to \u03d1\" [10]. The function is an instance of semidefinite programming (SDP), and hence computing it is an extremely demanding task even for moderately sized graphs. In this paper we establish that the \u03d1 function is equivalent to solving a kernel learning problem in the one-class SVM setting. This surprising connection opens up many opportunities which can benefit both graph theory and machine learning. In this paper we exploit this novel connection to show an interesting application of the SVM setup for identifying large dense subgraphs. More specifically, we make the following contributions.\n\n\u2217Relevant code and datasets can be found on http://www.cse.chalmers.se/~jethava/svm-theta.html\n\n1.1 Contributions:\n\n1. We give a new SDP characterization of the Lov\u00e1sz \u03d1 function,\n\n$$\\min_{K \\in \\mathcal{K}(G)} \\omega(K) = \\vartheta(G)$$\n\nwhere \u03c9(K) is computed by solving a one-class SVM. The matrix K is a kernel matrix associated with any orthogonal labelling of G. This is discussed in Section 2.\n2. 
Using an easy-to-compute orthogonal labelling we show that there exist graphs, which we call SVM-\u03d1 graphs, on which the Lov\u00e1sz \u03d1 function can be well approximated by solving a one-class SVM. This is discussed in Section 3.\n3. The problem of finding a large common dense subgraph in multiple graphs arises in a variety of domains including Biology, Internet, Social Sciences [18]. Existing state-of-the-art methods [14] are enumerative in nature and have complexity exponential in the size of the subgraph. We introduce the notion of common orthogonal labelling, which can be used to develop a formulation that is close in spirit to a Multiple Kernel Learning based formulation. Our results on the well-known DIMACS benchmark dataset show that it can identify large common dense subgraphs in a wide variety of settings, beyond the reach of state-of-the-art methods. This is discussed in Section 4.\n4. Lastly, in Section 5, we show that the famous planted clique problem can be easily solved for large graphs by solving a one-class SVM. Many problems of interest in the area of machine learning can be reduced to the problem of detecting a planted clique, e.g. detecting correlations [1, section 4.6], correlation clustering [21] etc. The planted clique problem consists of identifying a large clique in a random graph. There is an elegant approach for identifying the planted clique by computing the Lov\u00e1sz \u03d1 function [8]; however, it is not practical for large graphs as it requires solving an SDP. We show that the graph associated with the planted clique problem is an SVM-\u03d1 graph, paving the way for identifying the clique by solving a one-class SVM. Apart from the method based on computing the \u03d1 function, there are other methods for planted clique identification which do not require solving an SDP [2, 7, 24]. 
Our result is also competitive with the state-of-the-art non-SDP based approaches [24].\n\nNotation We denote the Euclidean norm by $\\|\\cdot\\|$ and the infinity norm by $\\|\\cdot\\|_\\infty$. Let $S^{d-1} = \\{u \\in \\mathbb{R}^d \\mid \\|u\\| = 1\\}$ denote a d dimensional sphere. Let $S_n$ denote the set of $n \\times n$ square symmetric matrices and $S_n^+$ denote the set of $n \\times n$ square symmetric positive semidefinite matrices. For any $A \\in S_n$ we denote the eigenvalues by $\\lambda_1(A) \\ge \\ldots \\ge \\lambda_n(A)$. diag(r) will denote a diagonal matrix with diagonal entries defined by the components of r. We denote the one-class SVM objective function by\n\n$$\\omega(K) = \\max_{\\alpha_i \\ge 0,\\ i=1,\\ldots,n} \\underbrace{\\left( 2\\sum_{i=1}^{n} \\alpha_i - \\sum_{i,j=1}^{n} \\alpha_i \\alpha_j K_{ij} \\right)}_{f(\\alpha;K)} \\qquad (1)$$\n\nwhere $K \\in S_n^+$. Let G = (V, E) be a graph on vertices V = {1, . . . , n} and edge set E. Let $A \\in S_n$ denote the adjacency matrix of G, where $A_{ij} = 1$ if edge (i, j) \u2208 E, and 0 otherwise. An eigenvalue of graph G would mean an eigenvalue of the adjacency matrix of G. Let $\\bar{G}$ denote the complement graph of G. The adjacency matrix of $\\bar{G}$ is $\\bar{A} = ee^\\top - I - A$, where $e = [1, 1, \\ldots, 1]^\\top$ is a vector of length n containing all 1\u2019s, and I denotes the identity matrix. Let $G_S = (S, E_S)$ denote the subgraph induced by S \u2286 V in graph G; its density is given by $\\gamma(G_S) = |E_S| / \\binom{|S|}{2}$. Let $N_i(G) = \\{j \\in V : (i, j) \\in E\\}$ denote the set of neighbours of vertex i in graph G, and let the degree of node i be $d_i(G) = |N_i(G)|$. An independent set in G (a clique in $\\bar{G}$) is a subset of vertices S \u2286 V for which no (every) pair of vertices has an edge in G (in $\\bar{G}$). 
The notation is standard, e.g. see [3].\n\n2 Lov\u00e1sz \u03d1 function and Kernel learning\n\nConsider the problem of embedding a graph G = (V, E) on a d dimensional unit sphere $S^{d-1}$. The study of this problem was initiated in [19], which introduced the idea of orthogonal labelling: An orthogonal labelling of graph G = (V, E) with |V| = n is a matrix $U = [u_1, \\ldots, u_n] \\in \\mathbb{R}^{d \\times n}$ such that $u_i^\\top u_j = 0$ whenever (i, j) \u2209 E and $u_i \\in S^{d-1}$ \u2200 i = 1, . . . , n.\n\nAn orthogonal labelling defines an embedding of a graph on a d dimensional unit sphere: for every vertex i there is a vector $u_i$ on the unit sphere, and for every (i, j) \u2209 E the vectors $u_i$ and $u_j$ are orthogonal. Using the notion of orthogonal labellings, [19] defined a function, famously known as the Lov\u00e1sz \u03d1 function, which upper bounds the size of the maximum independent set. More specifically,\n\nfor any graph G : ALPHA(G) \u2264 \u03d1(G),\n\nwhere ALPHA(G) is the size of the largest independent set. Finding large independent sets is a fundamental problem in algorithm design and analysis, and computing ALPHA(G) is a classic NP-hard problem which is very hard even to approximate [11]. However, the Lov\u00e1sz function \u03d1(G) gives a tractable upper bound, and since then the Lov\u00e1sz \u03d1 function has been extensively used in solving a variety of algorithmic problems, e.g. [6]. It may be useful to recall the definition of the Lov\u00e1sz \u03d1 function. Denote the set of all possible orthogonal labellings of G by $\\mathrm{Lab}(G) = \\{U = [u_1, \\ldots, u_n] \\mid u_i \\in S^{d-1},\\ u_i^\\top u_j = 0\\ \\forall (i, j) \\notin E\\}$.\n\n$$\\vartheta(G) = \\min_{U \\in \\mathrm{Lab}(G)} \\min_{c \\in S^{d-1}} \\max_i \\frac{1}{(c^\\top u_i)^2} \\qquad (2)$$\n\nThere exist several other equivalent definitions of \u03d1; for a comprehensive discussion see [16]. However, computation of the Lov\u00e1sz \u03d1 function is not practical even for moderately sized graphs, as it requires solving a semidefinite program on a matrix which is of the size of the graph. In the following theorem, we show that there exist connections between the \u03d1 function and the SVM formulation.\n\nTheorem 2.1. For an undirected graph G = (V, E) with |V| = n, let $\\mathcal{K}(G) := \\{K \\in S_n^+ \\mid K_{ii} = 1,\\ i \\in [n],\\ K_{ij} = 0,\\ (i, j) \\notin E\\}$. Then $\\vartheta(G) = \\min_{K \\in \\mathcal{K}(G)} \\omega(K)$.\n\nProof. We begin by noting that any $K \\in \\mathcal{K}(G)$ is positive semidefinite and hence there exists $U \\in \\mathbb{R}^{d \\times n}$ such that $K = U^\\top U$. Note that $K_{ij} = u_i^\\top u_j$ where $u_i$ is a column of U. Hence by inspection it is clear that the columns of U define an orthogonal labelling on G, i.e. U \u2208 Lab(G). Using a similar argument we can show that for any U \u2208 Lab(G), the matrix $K = U^\\top U$ is an element of $\\mathcal{K}(G)$. The set of valid kernel matrices $\\mathcal{K}(G)$ is thus equivalent to Lab(G). Note that if U is a labelling then $U\\,\\mathrm{diag}(\\varepsilon)$ is also an orthogonal labelling for any $\\varepsilon^\\top = [\\varepsilon_1, \\ldots, \\varepsilon_n]$, $\\varepsilon_i = \\pm 1$, i = 1, . . . , n. It thus suffices to consider only those labellings for which $c^\\top u_i \\ge 0$ \u2200 i = 1, . . . , n holds. For a fixed c one can write $\\max_i \\frac{1}{(c^\\top u_i)^2} = \\min_t t^2$ subject to $\\frac{1}{c^\\top u_i} \\le t$ \u2200 i. This is true because the minimum over t is attained at $\\max_i \\frac{1}{c^\\top u_i}$. Setting w = 2tc yields the following relation: $\\min_{c \\in S^{d-1}} \\max_i \\frac{1}{(c^\\top u_i)^2} = \\min_{w \\in \\mathbb{R}^d} \\frac{\\|w\\|^2}{4}$ with constraints $w^\\top u_i \\ge 2$. This establishes that for a labelling U, the optimal c is obtained by solving a one-class SVM. 
Application of strong duality immediately leads to the claim $\\min_{c \\in S^{d-1}} \\max_i \\frac{1}{(c^\\top u_i)^2} = \\omega(K)$ where $K = U^\\top U$ and \u03c9(K) is defined in (1). As there is a correspondence between each element of Lab(G) and $\\mathcal{K}(G)$, minimization of \u03c9(K) over $\\mathcal{K}(G)$ is equivalent to computing the \u03d1(G) function.\n\nThis is a significant result which establishes connections between two well studied formulations, namely the \u03d1 function and the SVM formulation. An important consequence of Theorem 2.1 is an easily computable upper bound on \u03d1(G), namely that for any graph G\n\n$$\\mathrm{ALPHA}(G) \\le \\vartheta(G) \\le \\omega(K) \\quad \\forall K \\in \\mathcal{K}(G) \\qquad (3)$$\n\nSince computing \u03c9(K) is a convex quadratic program, it is indeed a computationally efficient alternative to the \u03d1 function. In fact we will show that there exist families of graphs for which \u03d1(G) can be approximated to within a constant factor by \u03c9(K) for suitable K. Theorem 2.1 is closely related to the following result proved in [20].\n\nTheorem 2.2. [20] For a graph G = (V, E) with |V| = n, let $C \\in S_n$ be a matrix with $C_{ij} = 0$ whenever (i, j) \u2209 E. Then,\n\n$$\\vartheta(G) = \\min_C v(G, C), \\qquad v(G, C) = \\max_{x \\ge 0} \\left( 2x^\\top e - x^\\top \\left( \\frac{C}{-\\lambda_n(C)} + I \\right) x \\right)$$\n\nProof. See [20].\n\nNote that for any feasible C the matrix $I + \\frac{C}{-\\lambda_n(C)} \\in \\mathcal{K}(G)$. Theorem 2.1 is a restatement of Theorem 2.2, but has the additional advantage that the stated optimization problem can be solved as an SDP. The optimization problem $\\min_C v(G, C)$ with constraints on C is not an SDP. If we fix C = A, the adjacency matrix, we obtain a very interesting orthogonal labelling, which we will refer to as the LS labelling, introduced in [20]. 
Indeed there exists a family of graphs, called Q graphs, for which the LS labelling yields the interesting result ALPHA(G) = v(G, A), see [20]. Thus on a Q graph one does not need to solve an SDP, but can solve a one-class SVM, which has obvious computational benefits. Inspired by this result, in the remaining part of the paper we study this labelling more closely. As a labelling is completely defined by the associated kernel matrix, we refer to the following kernel as the LS labelling,\n\n$$K = \\frac{A}{\\rho} + I \\quad \\text{where } \\rho \\ge -\\lambda_n(A). \\qquad (4)$$\n\n3 SVM-\u03d1 graphs: Graphs where \u03d1 function can be approximated by SVM\n\nWe now introduce a class of graphs on which the \u03d1 function can be well approximated by \u03c9(K) for K defined by (4). In the spirit of approximation algorithms we define:\n\nDefinition 3.1. A graph G is an SVM-\u03d1 graph if \u03c9(K) \u2264 (1 + O(1))\u03d1(G) where K is an LS labelling.\n\nSuch classes of graphs are interesting because on them one can approximate the Lov\u00e1sz \u03d1 function by solving an SVM instead of an SDP, which in turn can be extremely useful in the design and analysis of approximation algorithms. We will demonstrate two examples of SVM-\u03d1 graphs, namely (a) the Erd\u0151s\u2013R\u00e9nyi random graph G(n, 1/2) and (b) a planted variation. Here the relaxation \u03c9(K) could be used in place of \u03d1(G), resulting in algorithms with the same quality guarantees but with faster running time; in particular, this will allow the algorithms to be scaled to large graphs.\n\nThe classical Erd\u0151s\u2013R\u00e9nyi random graph G(n, 1/2) has n vertices and each edge (i, j) is present independently with probability 1/2. We list a few facts about G(n, 1/2) that will be used repeatedly.\n\nFact 3.1. 
For G(n, 1/2),\n\n\u2022 With probability 1 \u2212 O(1/n), the degree of each vertex is in the range $n/2 \\pm O(\\sqrt{n \\log n})$.\n\u2022 With probability $1 - e^{-n^c}$ for some c > 0, the maximum eigenvalue is n/2 \u00b1 o(n) and the minimum eigenvalue is in the range $[-\\sqrt{n}, \\sqrt{n}]$ [9].\n\nTheorem 3.1. Let $\\varepsilon > \\sqrt{2} - 1$. For G = G(n, 1/2), with probability 1 \u2212 O(1/n), \u03c9(K) \u2264 (1 + \u03b5)\u03d1(G) where K is defined in (4) with $\\rho = \\frac{1+\\varepsilon}{\\sqrt{2}}\\sqrt{n}$.\n\nProof. We begin by considering the case $\\rho = (1 + \\frac{\\delta}{2})\\sqrt{n}$. By Fact 3.1, for all choices of \u03b4 > 0 the minimum eigenvalue of $\\frac{1}{\\rho}A + I$ is, almost surely, greater than 0, which implies that f(\u03b1; K) (see (1)) is strongly concave. For such functions the KKT conditions are necessary and sufficient for optimality. The KKT conditions for G(n, 1/2) are given by the following equation\n\n$$\\alpha_i + \\frac{1}{\\rho}\\sum_{(i,j) \\in E} A_{ij}\\alpha_j = 1 + \\mu_i, \\quad \\mu_i\\alpha_i = 0, \\quad \\mu_i \\ge 0 \\qquad (5)$$\n\nAs A is random we begin by analyzing the case for the expectation of A. Let $E(A) = \\frac{1}{2}(ee^\\top - I)$ be the expectation of A. For the given choice of \u03c1, the matrix $\\tilde{K} = \\frac{E(A)}{\\rho} + I$ is positive definite. More importantly, $f(\\alpha; \\tilde{K})$ is again strongly concave and attains its maximum at a KKT point. By direct verification, $\\hat\\alpha = \\hat\\beta e$ where $\\hat\\beta = \\frac{2\\rho}{n-1+2\\rho}$ satisfies\n\n$$\\alpha + \\frac{1}{\\rho}E(A)\\alpha = e. \\qquad (6)$$\n\nThus $\\hat\\alpha$ is the KKT point for the problem,\n\n$$\\bar{f} = \\max_{\\alpha \\ge 0} f(\\alpha; \\tilde{K}) = 2\\sum_{i=1}^{n}\\hat\\alpha_i - \\hat\\alpha^\\top\\left(\\frac{E(A)}{\\rho} + I\\right)\\hat\\alpha = n\\hat\\beta \\qquad (7)$$\n\nwith the optimal objective function value $\\bar{f}$. By the choice $\\rho = (1 + \\frac{\\delta}{2})\\sqrt{n}$ we can write $\\hat\\beta = 2\\rho/n + O(1/n)$. 
Using the fact about degrees of vertices in G(n, 1/2), we know that\n\n$$a_i^\\top e = \\frac{n-1}{2} + \\Delta_i \\quad \\text{with } |\\Delta_i| \\le \\sqrt{n \\log n} \\qquad (8)$$\n\nwhere $a_i^\\top$ is the ith row of the adjacency matrix A. As a consequence we note that\n\n$$\\left| \\hat\\alpha_i + \\frac{1}{\\rho}\\sum_j A_{ij}\\hat\\alpha_j - 1 \\right| = \\frac{\\hat\\beta}{\\rho}|\\Delta_i| \\qquad (9)$$\n\nRecalling the definition of f and using the above equation along with (8) gives\n\n$$|f(\\hat\\alpha; K) - \\bar{f}| \\le n\\frac{\\hat\\beta^2}{\\rho}\\sqrt{n \\log n} \\qquad (10)$$\n\nAs noted before, the function f(\u03b1; K) is strongly concave with $\\nabla^2_\\alpha f(\\alpha; K) \\preceq -\\frac{\\delta}{2+\\delta} I$ for all feasible \u03b1. Recalling a useful result from convex optimization, see Lemma 3.1, we obtain\n\n$$\\omega(K) - f(\\hat\\alpha; K) \\le \\left(1 + \\frac{1}{\\delta}\\right)\\|\\nabla f(\\hat\\alpha; K)\\|^2 \\qquad (11)$$\n\nObserving that $\\nabla f(\\alpha; K) = 2(e - \\alpha - \\frac{A}{\\rho}\\alpha)$ and using the relation between $\\|\\cdot\\|_\\infty$ and the 2-norm along with (9) and (8) gives $\\|\\nabla f(\\hat\\alpha; K)\\| \\le \\sqrt{n}\\,\\|\\nabla f(\\hat\\alpha; K)\\|_\\infty \\le 2n\\frac{\\hat\\beta}{\\rho}\\sqrt{\\log n}$. Plugging this estimate into (11) and using equation (10) we obtain $\\omega(K) \\le \\bar{f} + O(\\log n) = (2 + \\delta)\\sqrt{n} + O(\\log n)$. The second equality follows by plugging the value of $\\hat\\beta$ into (7). It is well known [6] that $\\vartheta(G) = \\sqrt{2}\\sqrt{n}$ for G(n, 1/2) with high probability. 
One concludes that $\\omega(K) \\le \\frac{2+\\delta}{\\sqrt{2}}\\vartheta(G) + o(\\sqrt{n})$, and the theorem follows by the choice of \u03b4.\n\nDiscussion: Theorem 3.1 establishes that instead of an SDP one can solve an SVM to approximate the \u03d1 function on G(n, 1/2). Although it is well known that ALPHA(G(n, 1/2)) = 2 log n whp, there is no known polynomial time algorithm for computing the maximum independent set. [6] gives an approximation algorithm that finds an independent set in G(n, p) and runs in expected polynomial time, via a computation of \u03d1(G(n, p)), which also applies to p = 1/2. The \u03d1 function also serves as a guarantee of the approximation, which other algorithms, e.g. a simple greedy algorithm, cannot give. Theorem 3.1 allows us to obtain similar guarantees but without the computational overhead of solving an SDP. Apart from finding independent sets, computing \u03d1(G(n, p)) is also used as a subroutine in colorability [6], and here again one can use the SVM based approach to approximate the \u03d1 function.\n\nSimilar arguments also show that other families of graphs, such as the 11 families of pseudo-random graphs described in [17], are SVM-\u03d1 graphs.\n\nLemma 3.1. [4] A function $g : C \\subset \\mathbb{R}^d \\to \\mathbb{R}$ is said to be strongly concave over C if there exists t > 0 such that $\\nabla^2 g(x) \\preceq -tI$ \u2200 x \u2208 C. For such functions one can show that if $p^* = \\max_{x \\in C} g(x) < \\infty$ then\n\n$$\\forall x \\in C: \\quad p^* - g(x) \\le \\frac{1}{2t}\\|\\nabla g(x)\\|^2$$\n\n4 Dense common subgraph detection\n\nThe problem of finding a large dense subgraph in multiple graphs has many applications [23, 22, 18]. We introduce the notion of common orthogonal labelling, and show that it is indeed possible to recover dense regions in large graphs by solving an MKL problem. 
This constitutes significant progress with respect to state of the art enumerative methods [14].\n\nProblem definition Let $\\mathcal{G} = \\{G^{(1)}, \\ldots, G^{(M)}\\}$ be a set of simple, undirected graphs $G^{(m)} = (V, E^{(m)})$ defined on vertex set V = {1, . . . , n}. Find a common subgraph which is dense in all the graphs.\n\nMost algorithms which attempt the problem of finding a dense region are enumerative in nature and hence do not scale well to finding large cliques. [14] first studied a related problem of finding all possible common subgraphs, for a given choice of parameters $\\{\\gamma^{(1)}, \\ldots, \\gamma^{(M)}\\}$, which are at least $\\gamma^{(i)}$ dense in $G^{(i)}$. In the worst case, the algorithm performs depth first search over the space of $\\binom{n}{n_T}$ possible cliques of size $n_T$. This has $\\Theta(\\binom{n}{n_T})$ space and time complexity, which makes it impractical for moderately large $n_T$. For example, finding quasicliques of size 60 requires 8 hours (see Section 6).\n\nIn the remainder of this section, we focus on finding a large common sparse subgraph in a given collection of graphs, with the observation that this is equivalent to finding a large common dense subgraph in the set of complement graphs. To this end we introduce the following definition.\n\nDefinition 4.1. Given simple unweighted graphs $G^{(m)} = (V, E^{(m)})$, m = 1, . . . , M, on a common vertex set V with |V| = n, the common orthogonal labelling on all the graphs is given by $u_i \\in S^{d-1}$ such that $u_i^\\top u_j = 0$ if $(i, j) \\notin E^{(m)}$ \u2200 m = 1, . . . , M.\u00b9\n\nFollowing the arguments of Section 2 it is immediate that the size of the largest common independent set is upper bounded by $\\min_{K \\in \\mathcal{L}} \\omega(K)$ where $\\mathcal{L} = \\{K \\in S_n^+ : K_{ii} = 1\\ \\forall i \\in [n],\\ K_{ij} = 0 \\text{ whenever } (i, j) \\notin E^{(m)}\\ \\forall m = 1, \\ldots, M\\}$. We wish to exploit this fact in identifying large common sparse regions in general graphs. 
Unfortunately this problem is an SDP and will not scale well to large graphs. Taking a cue from the MKL literature we pose a restricted version of the problem, namely\n\n$$\\min_{K = \\sum_{m=1}^{M} \\delta_m K^{(m)},\\ \\delta_m \\ge 0,\\ \\sum_{m=1}^{M} \\delta_m = 1} \\omega(K) \\qquad (12)$$\n\nwhere $K^{(m)}$ is an orthogonal labelling of $G^{(m)}$. Direct verification shows that any feasible K is also a common orthogonal labelling. Using the fact that $\\forall x \\in \\mathbb{R}^M$, $\\min_{p_m \\ge 0,\\ \\sum_{m=1}^{M} p_m = 1} p^\\top x = \\min_m x_m = \\max\\{t \\mid x_m \\ge t\\ \\forall m = 1, \\ldots, M\\}$, one can recast the optimization problem in (12) as follows\n\n$$\\max_{t \\in \\mathbb{R},\\ \\alpha_i \\ge 0} t \\quad \\text{s.t. } f(\\alpha; K^{(m)}) \\ge t\\ \\ \\forall m = 1, \\ldots, M \\qquad (13)$$\n\nwhere $K^{(m)}$ is the LS labelling for $G^{(m)}$, \u2200 m = 1, . . . , M. The above optimization can be readily solved by state of the art MKL solvers. This result allows us to build a parameter free common sparse subgraph (CSS) algorithm, shown in Figure 1, having the following advantages: it provides a theoretical bound on subgraph density (Claim 4.1 below); and it requires no parameters from the user beyond the set of graphs $G^{(1)}, \\ldots, G^{(M)}$.\n\nLet $\\alpha^*$ be the optimal solution in (13), and let $SV = \\{i : \\alpha^*_i > 0\\}$ and $S_1 = \\{i : \\alpha^*_i = 1\\}$ with cardinalities $n_{sv} = |SV|$ and $n_1 = |S_1|$ respectively. Let $\\bar\\alpha^{(m)}_{i,S} = \\frac{\\sum_{j \\in N_i(G^{(m)}_S)} \\alpha^*_j}{d_i(G^{(m)}_S)}$ denote the average of the support vector coefficients in the neighbourhood $N_i(G^{(m)}_S)$ of vertex i in the induced subgraph $G^{(m)}_S$, having degree $d_i(G^{(m)}_S) = |N_i(G^{(m)}_S)|$. We define $\\bar\\alpha^{(m)}_{\\min,S} = \\min_{i \\in S} \\bar\\alpha^{(m)}_{i,S}$ and\n\n$$T^{(m)} = \\left\\{ i \\in SV : d_i(G^{(m)}_{SV}) < \\frac{(1 - c)\\rho^{(m)}}{\\bar\\alpha^{(m)}_{\\min,SV}} \\right\\} \\quad \\text{where } c = \\min_{i \\in SV} \\alpha^*_i \\qquad (14)$$\n\nFigure 1: Algorithm for finding common sparse subgraph: $T = \\mathrm{CSS}(G^{(1)}, \\ldots, G^{(M)})$\n  $\\alpha^*$ = Use MKL solvers to solve eqn. (13)\n  $T = \\cap_m T^{(m)}$ {eqn. (14)}\n  Return T\n\nClaim 4.1. Let T \u2286 V be computed as in Algorithm 1. The subgraph $G^{(m)}_T$ induced by T in graph $G^{(m)}$ has density at most $\\gamma^{(m)}$, where $\\gamma^{(m)} = \\frac{(1-c)\\rho^{(m)}}{\\bar\\alpha^{(m)}_{\\min,SV}(n_T - 1)}$.\n\nProof. 
(Sketch) At optimality, $t = \\sum_{i=1}^{n} \\alpha^*_i$. This allows us to write $0 \\le \\sum_{i \\in S} \\alpha^*_i (2 - \\alpha^*_i - \\sum_{j \\ne i} K^{(m)}_{ij}\\alpha^*_j) - t$ as $0 \\le \\sum_{i \\in T} (1 - c - \\frac{d_i(G^{(m)}_T)}{\\rho^{(m)}}\\bar\\alpha^{(m)}_{\\min,SV})$. Dividing by $\\binom{n_T}{2}$ completes the proof.\n\n\u00b9This is equivalent to defining an orthogonal labelling on the Union graph of $G^{(1)}, \\ldots, G^{(M)}$.\n\n5 Finding Planted Cliques in G(n, 1/2) graphs\n\nFinding large cliques or independent sets is a computationally difficult problem even in random graphs. While it is known that the size of the largest clique or independent set in G(n, 1/2) is 2 log n with high probability, there is no known efficient algorithm to find a clique of size significantly larger than log n; even a cryptographic application was suggested based on this (see the discussion and references in the introduction of [8]).\n\nHidden planted clique A random graph G(n, 1/2) is chosen first and then a clique of size k is introduced on the first k vertices 1, . . . , k. The problem is to identify the clique.\n\n[8] showed that if $k = \\Omega(\\sqrt{n})$, then the hidden clique can be discovered in polynomial time by computing the Lov\u00e1sz \u03d1 function. 
There are other approaches [2, 7, 24] which do not require computing the \u03d1 function.\n\nWe consider the (equivalent) complement model $\\bar{G}(n, 1/2, k)$ where an independent set is planted on the set of k vertices. We show that in the regime $k = \\Omega(\\sqrt{n})$, $\\bar{G}(n, 1/2, k)$ is an SVM-\u03d1 graph. We will further demonstrate that as a consequence one can identify the hidden independent set with high probability by solving an SVM. The following is the main result of the section.\n\nTheorem 5.1. For $G = \\bar{G}(n, 1/2, k)$ and $k = 2t\\sqrt{n}$ for large enough constant t \u2265 1, with K as in (4) and $\\rho = \\sqrt{n} + k/2$,\n\n$$\\omega(K) = 2(t + 1)\\sqrt{n} + O(\\log n) = \\left(1 + \\frac{1}{t} + o(1)\\right)\\vartheta(G)$$\n\nwith probability at least 1 \u2212 O(1/n).\n\nProof. The proof is analogous to that of Theorem 3.1. Note that $|\\lambda_n(G)| \\le \\sqrt{n} + k/2$. First we consider the expected case where all vertices outside the planted part S are adjacent to k/2 vertices in S and (n \u2212 k)/2 vertices outside S, and all vertices in the planted part have degree (n \u2212 k)/2. We check that $\\alpha_i = 2(t + 1)/\\sqrt{n}$ for i \u2209 S and $\\alpha_i = 2(t + 1)^2/\\sqrt{n}$ for i \u2208 S satisfy the KKT conditions with an error of $O(1/\\sqrt{n})$. Now apply Chernoff bounds to conclude that with high probability, all vertices in S have degree $(n - k)/2 \\pm \\sqrt{(n - k)\\log(n - k)}$ and those outside S are adjacent to $k/2 \\pm \\sqrt{k \\log k}$ vertices in S and to $(n - k)/2 \\pm \\sqrt{(n - k)\\log(n - k)}$ vertices outside S. Now we check that the same solution satisfies the KKT conditions of $\\bar{G}(n, 1/2, k)$ with an error of $\\varepsilon = O\\left(\\sqrt{\\frac{\\log n}{n}}\\right)$. Using the same arguments as in the proof of Theorem 3.1, we conclude that $\\omega(K) \\le 2(t + 1)\\sqrt{n} + O(\\log n)$. Since $\\vartheta(G) = 2t\\sqrt{n}$ for this case [8], the result follows.\n\n
The above theorem suggests that the planted independent set can be recovered by taking the top k values in the optimal solution. In the experimental section we will discuss the performance of this recovery algorithm. The runtime of this algorithm is one call to an SVM solver, which is considerably cheaper than the SDP option. Indeed the algorithm due to [8] requires computation of the \u03d1 function. The current best known algorithm for \u03d1 computation has an $O(n^5 \\log n)$ [5] run time complexity. In contrast the proposed approach needs to solve an SVM and hence scales well to large graphs. Our approach is competitive with the state of the art [24] as it gives the same high probability guarantees and has the same running time, $O(n^2)$. Here we have assumed that we are working with an SVM solver which has a time complexity of $O(n^2)$ [13].\n\n6 Experimental evaluation\n\nComparison with exhaustive approach [14] We generate synthetic m = 3 random graphs over n vertices with average density \u03b4 = 0.2, and having a single (common) quasi-clique of size $k = 2\\sqrt{n}$ with density \u03b3 = 0.95 in all the three graphs. This is similar to the synthetic graphs generated in the original paper [see 14, Section 6.1.2]. We note that both our MKL-based approach and the exhaustive search in [14] recover the quasi-clique. However, the time requirements are drastically different. All experiments were conducted on a computer with 16 GB RAM and an Intel X3470 quad-core processor running at 2.93 GHz. Three values of k, namely k = 50, 60 and 100, were used. It is interesting to note that CROCHET [14] took 2 hours and 9 hours for k = 50 and k = 60 sized cliques and failed to find a clique of size 100. 
The corresponding numbers for MKL are 47.5,54.8\nand 137.6 seconds respectively.\n\n7\n\n\fCommon dense subgraph detection We evaluate our algorithm for \ufb01nding large dense regions on\nthe DIMACS Challenge graphs 2 [15], which is a comprehensive benchmark for testing of clique\n\ufb01nding and related algorithms. For the families of dense graphs (brock, san, sanr), we focus on\n\ufb01nding large dense region in the complement of the original graphs.\nWe run Algorithm 1 using SimpleMkl3 to \ufb01nd large common dense subgraph. In order to evalu-\nate the performance of our algorithm, we compute \u00afa = maxm a(m) and a = minm a(m) where\na(m) = \u03b3(G(m)\nT )/\u03b3(G(m)) is relative density of induced subgraph (compared to original graph den-\nsity); and nT /N is relative size of induced subgraph compared to original graph size. We want\na high value of nT /N; while a should not be lower than 1. Table 1 shows evaluation of Algo-\nrithm 1 on DIMACS dataset. We note that our algorithm \ufb01nds a large subgraph (large nT /N)\nwith higher density compared to original graph in all of DIMACS graph classes making it suit-\nable for \ufb01nding large dense regions in multiple graphs. In all cases the size of the subgraph, nT\nwas more than 100. The MKL experiments reported in Table 1 took less than 1 minute (for each\ngraph family); while the algorithm in [14] aborts after several hours due to memory constraints.\n\nPlanted clique recovery We generate 100\nrandom graphs based on planted clique\n\u221a\nmodel G(n, 1/2, k) where n = 30000 and\nn for each choice\nhidden clique size k = 2t\nof t. We evaluate the recovery algorithm\ndiscussed in Section 4.2. The SVM prob-\nlem is solved using Libsvm4. 
For t \u2265 2 we\nfind perfect recovery of the clique on all the\ngraphs, which is in agreement with Theorem\n5.1.\nIt is worth noting that the approach takes 10\nminutes to recover the clique in this graph of\n30000 vertices, which is far beyond the scope\nof SDP-based procedures.\n\nGraph family  N  M  n_T/N  \u0101  a\u0332\nc-fat200  200  3  0.50  2.12  0.99\nc-fat500  500  4  0.31  3.57  1.01\nbrock200\u2021  200  4  0.41  1.36  0.99\nbrock400\u2021  400  4  0.50  1.15  1.05\nbrock800\u2021  800  4  0.50  1.08  1.01\np_hat300  300  3  0.53  1.53  1.15\np_hat500  500  3  0.48  1.55  1.17\np_hat700  700  3  0.45  1.58  1.18\np_hat1000  1000  3  0.43  1.60  1.19\np_hat1500  1500  3  0.38  1.63  1.20\nsan200\u2021  200  5  0.50  1.51  1.08\nsan400\u2021  400  3  0.42  1.19  1.02\nsanr200\u2021  200  2  0.39  1.86  1.04\nsanr400\u2021  400  2  0.43  1.20  1.02\n\nTable 1: Common dense subgraph recovery on multiple\ngraphs in the DIMACS dataset. Here \u0101 and a\u0332 denote\nthe maximum and minimum relative density of the\ninduced subgraph (relative to the density of the original\ngraph), and n_T/N is the relative size of the induced\nsubgraph compared to the original graph size.\n\n7 Conclusion\n\nIn this paper we have established that the\nLov\u00e1sz \u03d1 function, well studied in graph theory,\ncan be linked to the one-class SVM formulation.\nThis link allows us to design scalable\nalgorithms for computationally difficult\nproblems. In particular, we have demonstrated\nthat finding a common dense region\nin multiple graphs can be solved as an MKL problem, while finding a large planted clique can be\nsolved by a one-class SVM.\n\nAcknowledgements\n\nCB is grateful to the Department of CSE, Chalmers University of Technology for their hospitality and\nwas supported by grants from ICT and Transport Areas of Advance, Chalmers University. 
VJ and\nDD were supported by SSF grant Data Driven Secure Business Intelligence.\n\n2 ftp://dimacs.rutgers.edu/pub/challenge/graph/benchmarks/clique/\n3 http://asi.insa-rouen.fr/enseignants/~arakotom/code/mklindex.html\n4 http://www.csie.ntu.edu.tw/~cjlin/libsvm/\n\nReferences\n[1] Louigi Addario-Berry, Nicolas Broutin, G\u00e1bor Lugosi, and Luc Devroye. Combinatorial testing problems. Annals of Statistics, 38:3063\u20133092, 2010.\n[2] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms, pages 457\u2013466, 1998.\n[3] B. Bollob\u00e1s. Modern graph theory, volume 184. Springer Verlag, 1998.\n[4] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.\n[5] T.-H. Hubert Chan, Kevin L. Chang, and Rajiv Raman. An SDP primal-dual algorithm for approximating the Lov\u00e1sz-theta function. In ISIT, 2009.\n[6] Amin Coja-Oghlan and Anusch Taraz. Exact and approximative algorithms for coloring G(n, p). Random Struct. Algorithms, 24(3):259\u2013278, 2004.\n[7] U. Feige and D. Ron. Finding hidden cliques in linear time. In AofA10, 2010.\n[8] Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semi-random graph. Random Struct. Algorithms, 16:195\u2013208, March 2000.\n[9] Z. F\u00fcredi and J. Koml\u00f3s. The eigenvalues of random symmetric matrices. Combinatorica, 1:233\u2013241, 1981.\n[10] Michel X. Goemans. Semidefinite programming in combinatorial optimization. Math. Program., 79:143\u2013161, 1997.\n[11] J. H\u00e5stad. Clique is hard to approximate within n^(1\u2212\u03b5). Acta Mathematica, 182(1):105\u2013142, 1999.\n[12] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1990.\n[13] Don R. Hush, Patrick Kelly, Clint Scovel, and Ingo Steinwart. 
QP algorithms with guaranteed accuracy and run time for support vector machines. Journal of Machine Learning Research, 7:733\u2013769, 2006.\n[14] D. Jiang and J. Pei. Mining frequent cross-graph quasi-cliques. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(4):16, 2009.\n[15] D.S. Johnson and M.A. Trick. Cliques, coloring, and satisfiability: second DIMACS implementation challenge, October 11-13, 1993, volume 26. Amer Mathematical Society, 1996.\n[16] Donald Knuth. The sandwich theorem. Electronic Journal of Combinatorics, 1(A1), 1994.\n[17] Michael Krivelevich and Benny Sudakov. Pseudo-random graphs. In More Sets, Graphs and Numbers, volume 15 of Bolyai Society Mathematical Studies, pages 199\u2013262. Springer Berlin Heidelberg, 2006.\n[18] V.E. Lee, N. Ruan, R. Jin, and C. Aggarwal. A survey of algorithms for dense subgraph discovery. Managing and Mining Graph Data, pages 303\u2013336, 2010.\n[19] L. Lov\u00e1sz. On the Shannon capacity of a graph. Information Theory, IEEE Transactions on, 25(1):1\u20137, 1979.\n[20] C.J. Luz and A. Schrijver. A convex quadratic characterization of the Lov\u00e1sz theta number. SIAM Journal on Discrete Mathematics, 19(2):382\u2013387, 2006.\n[21] Claire Mathieu and Warren Schudy. Correlation clustering with noisy input. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA \u201910, pages 712\u2013728, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics.\n[22] P. Pardalos and S. Rebennack. Computational challenges with cliques, quasi-cliques and clique partitions in graphs. Experimental Algorithms, pages 13\u201322, 2010.\n[23] V. Spirin and L.A. Mirny. Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences, 100(21):12123, 2003.\n[24] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. 
In ANALCO11, 2011.", "award": [], "sourceid": 563, "authors": [{"given_name": "Vinay", "family_name": "Jethava", "institution": null}, {"given_name": "Anders", "family_name": "Martinsson", "institution": null}, {"given_name": "Chiranjib", "family_name": "Bhattacharyya", "institution": null}, {"given_name": "Devdatt", "family_name": "Dubhashi", "institution": null}]}