{"title": "Graph Clustering: Block-models and model free results", "book": "Advances in Neural Information Processing Systems", "page_first": 2478, "page_last": 2486, "abstract": "Clustering graphs under the Stochastic Block Model (SBM) and extensions are well studied. Guarantees of correctness exist under the assumption that the data is sampled from a model. In this paper, we propose a framework, in which we obtain \"correctness\" guarantees without assuming the data comes from a model. The guarantees we obtain depend instead on the statistics of the data that can be checked. We also show that this framework ties in with the existing model-based framework, and that we can exploit results in model-based recovery, as well as strengthen the results existing in that area of research.", "full_text": "Graph Clustering: Block-models and model free\n\nresults\n\nYali Wan\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195-4322, USA\nyaliwan@washington.edu\n\nMarina Meil\u02d8a\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195-4322, USA\nmmp@stat.washington.edu\n\nAbstract\n\nClustering graphs under the Stochastic Block Model (SBM) and extensions are\nwell studied. Guarantees of correctness exist under the assumption that the data\nis sampled from a model. In this paper, we propose a framework, in which we\nobtain \u201ccorrectness\u201d guarantees without assuming the data comes from a model.\nThe guarantees we obtain depend instead on the statistics of the data that can be\nchecked. We also show that this framework ties in with the existing model-based\nframework, and that we can exploit results in model-based recovery, as well as\nstrengthen the results existing in that area of research.\n\n1\n\nIntroduction: a framework for clustering with guarantees without model\nassumptions\n\nIn the last few years, model-based clustering in networks has witnessed spectacular progress. 
At the center of attention are the so-called block-models: the Stochastic Block Model (SBM), Degree-Corrected SBM (DC-SBM) and Preference Frame Model (PFM). The understanding of these models has advanced, especially regarding the conditions under which recovery of the true clustering is possible with small or no error. The algorithms for recovery with guarantees have also been improved. However, the impact of the above results is limited by the assumption that the observed data comes from the model.\nThis paper proposes a framework to provide theoretical guarantees for the results of model-based clustering algorithms, without making any assumption about the data generating process. To describe the idea, we need some notation. Assume that a graph G on n nodes is observed. A model-based algorithm clusters G, and outputs clustering C and parameters M(G,C).\nThe framework is as follows: if M(G,C) fits the data G well, then we shall prove that any other clustering C' of G that also fits G well will be a small perturbation of C. If this holds, then C with model parameters M(G,C) can be said to capture the data structure in a meaningful way.\nWe exemplify our approach by obtaining model-free guarantees for the SBM and PFM models. Moreover, we show that model-free and model-based results are intimately connected.\n\n2 Background: graphs, clusterings and block models\nGraphs, degrees, Laplacian, and clustering Let G be a graph on n nodes, described by its adjacency matrix \u02c6A. Define \u02c6di = \u2211_{j=1}^n \u02c6Aij the degree of node i, and \u02c6D = diag{ \u02c6di} the diagonal matrix of the node degrees. The (normalized) Laplacian of G is defined as^1 \u02c6L = \u02c6D^{-1/2} \u02c6A \u02c6D^{-1/2}. 
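For concreteness, the degrees and normalized Laplacian defined above can be computed in a few lines of numpy. This sketch is ours, not part of the paper; the function name is our own choice, and it assumes a symmetric adjacency matrix with no isolated nodes (all \u02c6di > 0):

```python
import numpy as np

def normalized_laplacian(A):
    """L = D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A.

    Assumes every node has positive degree (no isolated nodes).
    """
    d = A.sum(axis=1)                # degrees d_i = sum_j A_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)    # diagonal of D^{-1/2}
    # elementwise: L_ij = A_ij / sqrt(d_i * d_j)
    return A * np.outer(d_inv_sqrt, d_inv_sqrt)

# Example: a 4-cycle; all degrees equal 2, so L = A / 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
```

With this sign convention (as the footnote notes, this is I minus the usual normalized Laplacian), the top eigenvalue of \u02c6L is \u02c6\u03bb1 = 1, with eigenvector \u02c6D^{1/2}1.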
^1 Rigorously speaking, the normalized graph Laplacian is I \u2212 \u02c6L [10].\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fBy extension, we define the degree matrix D and the Laplacian L associated to any matrix A \u2208 R^{n\u00d7n} with Aij = Aji \u2265 0 in a similar way.\nLet C be a partitioning (clustering) of the nodes of G into K clusters. We use the shorthand notation i \u2208 k for \u201cnode i belongs to cluster k\u201d. We will represent C by its n \u00d7 K indicator matrix Z, defined by\n\nZik = 1 if i \u2208 k, 0 otherwise, for i = 1, . . . , n, k = 1, . . . , K. (1)\n\nNote that Z^T Z = diag{nk} with nk counting the number of nodes in cluster k, and Z^T \u02c6A Z = [nkl]_{k,l=1}^K with nkl counting the edges in G between clusters k and l. Moreover, for two indicator matrices Z, Z' for clusterings C, C', (Z^T Z')kk' counts the number of points in the intersection of cluster k of C with cluster k' of C', and (Z^T \u02c6D Z')kk' computes \u2211_{i\u2208k\u2229k'} \u02c6di, the volume of the same intersection.\n\u201cBlock models\u201d for random graphs (SBM, DC-SBM, PFM) This family of models contains Stochastic Block Models (SBM) [1, 18], Degree-Corrected SBM (DC-SBM) [17] and Preference Frame Models (PFM) [20]. Under each of these model families, a graph G with adjacency matrix \u02c6A over n nodes is generated by sampling its edges independently following the law \u02c6Aij \u223c Bernoulli(Aij), for all i > j. The symmetric matrix A = [Aij] describing the graph is the edge probability matrix. The three model families differ in the constraints they put on an acceptable A. Let C\u2217 be a clustering. 
The entries of A are defined w.r.t. C\u2217 as follows (and we say that A is compatible with C\u2217).\n\nSBM : Aij = Bkl whenever i \u2208 k, j \u2208 l, with B = [Bkl] \u2208 R^{K\u00d7K} symmetric and non-negative.\nDC-SBM : Aij = wi wj Bkl whenever i \u2208 k, j \u2208 l, with B as above and w1, . . . , wn non-negative weights associated with the graph nodes.\nPFM : A satisfies D = diag(A1), D^{-1} A Z = Z R, where 1 denotes the vector of all ones, Z is the indicator matrix of C\u2217, and R is a stochastic matrix (R1 = 1, Rkl \u2265 0); the details are in [20].\n\nWhile perhaps not immediately obvious, the SBM is a subclass of the DC-SBM, and the latter a subclass of the PFM. Another common feature of block-models, one that will be significant throughout this work, is that for all three, Spectral Clustering algorithms [15] have been proved to work well for estimating C\u2217.\n\n3 Main theorem: blueprint and results for PFM, SBM\n\nLet M be a model class, such as SBM, DC-SBM, PFM, and denote by M(G,C) \u2208 M a model that is compatible with C and is fitted in some way to graph G (we do not assume in general that this fit is optimal).\nTheorem 1 (Generic Theorem) We say that clustering C fits G well w.r.t. M iff M(G,C) is \u201cclose to\u201d G. If C fits G well w.r.t. M, then (subject to other technical conditions) any other clustering C' which also fits G well is close to C, i.e. dist(C,C') is small.\nIn what follows, we will instantiate this Generic Theorem, and the concepts therein; in particular the following will be formally defined. (1) Model construction, i.e. an algorithm to fit a model in M to (G,C). (2) A goodness of fit measure between M(G,C) and the data G. This is necessary since we want our results to be computable in practice. (3) A distance between clusterings. We adopt the widely used Misclassification Error (or Hamming) distance defined below.\nThe Misclassification Error (ME) distance between two clusterings C, C' over the same set of n points is\n\ndist(C,C') = 1 \u2212 (1/n) max_{\u03c0\u2208SK} \u2211_k \u2211_{i\u2208k\u2229\u03c0(k)} 1, (2)\n\nwhere \u03c0 ranges over all permutations of K elements SK, and \u03c0(k) indexes a cluster in C'. If the points are weighted by their degrees, a natural measure on the node set, the Weighted ME (wME)\n\n2\n\n\fdistance is\n\ndist_{\u02c6d}(C,C') = 1 \u2212 (1/\u2211_{i=1}^n \u02c6di) max_{\u03c0\u2208SK} \u2211_k \u2211_{i\u2208k\u2229\u03c0(k)} \u02c6di. (3)\n\nIn the above, \u2211_{i\u2208k\u2229k'} \u02c6di represents the total weight of the set of points assigned to cluster k by C and to cluster k' (or \u03c0(k)) by C'. Note that in the indicator matrix representation of clusterings, this is the (k, k') element of the matrix Z^T \u02c6D Z' \u2208 R^{K\u00d7K}. 
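The ME and wME distances above can be computed from the confusion (intersection-weight) matrix Z^T \u02c6D Z'. The following sketch is ours, not from the paper; it maximizes over the K! permutations by brute force, so it is only practical for small K (a maximum-weight matching would scale better):

```python
import numpy as np
from itertools import permutations

def wme_distance(labels1, labels2, weights=None):
    """Weighted misclassification-error distance between two clusterings.

    With unit weights this is the plain ME distance (2); with weights d_i
    it is the weighted ME (wME) distance (3).
    """
    labels1, labels2 = np.asarray(labels1), np.asarray(labels2)
    n = len(labels1)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    K = int(max(labels1.max(), labels2.max())) + 1
    # conf[k, k'] = total weight of points in cluster k of C and k' of C'
    conf = np.zeros((K, K))
    for i in range(n):
        conf[labels1[i], labels2[i]] += w[i]
    # maximize the matched weight over all cluster permutations pi
    best = max(sum(conf[k, pi[k]] for k in range(K))
               for pi in permutations(range(K)))
    return 1.0 - best / w.sum()
```

For instance, two clusterings that differ only by a relabeling of the clusters are at distance 0.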
While dist is more popular, we believe dist_{\u02c6d} is more natural, especially when node degrees are dissimilar, as \u02c6d can be seen as a natural measure on the set of nodes, and dist_{\u02c6d} is equivalent to the earth-mover\u2019s distance.\n\n3.1 Main result for PFM\nConstructing a model Given a graph G and a clustering C of its nodes, we wish to construct a PFM compatible with C, so that its Laplacian L satisfies that || \u02c6L \u2212 L|| is small.\nLet the spectral decomposition of \u02c6L be\n\n\u02c6L = [ \u02c6Y \u02c6Ylow] diag( \u02c6\u039b, \u02c6\u039blow) [ \u02c6Y \u02c6Ylow]^T = \u02c6Y \u02c6\u039b \u02c6Y^T + \u02c6Ylow \u02c6\u039blow \u02c6Ylow^T, (4)\n\nwhere \u02c6Y \u2208 R^{n\u00d7K}, \u02c6Ylow \u2208 R^{n\u00d7(n\u2212K)}, \u02c6\u039b = diag(\u02c6\u03bb1, . . . , \u02c6\u03bbK), \u02c6\u039blow = diag(\u02c6\u03bbK+1, . . . , \u02c6\u03bbn). To ensure that the matrices \u02c6Y, \u02c6Ylow are uniquely defined, we assume throughout the paper that \u02c6L's K-th eigengap, i.e. |\u02c6\u03bbK| \u2212 |\u02c6\u03bbK+1|, is non-zero.\nAssumption 1 The eigenvalues of \u02c6L satisfy \u02c6\u03bb1 = 1 \u2265 |\u02c6\u03bb2| \u2265 . . . \u2265 |\u02c6\u03bbK| > |\u02c6\u03bbK+1| \u2265 . . . \u2265 |\u02c6\u03bbn|.\nDenote by R(M) the subspace spanned by the columns of any matrix M, and by || \u00b7 || the Euclidean (spectral) norm.\n\nPFM Estimation Algorithm\nInput Graph G with \u02c6A, \u02c6D, \u02c6L, \u02c6Y, \u02c6\u039b, clustering C with indicator matrix Z.\nOutput (A, L) = PFM(G,C)\n\n1. Construct an orthogonal matrix derived from Z:\n\nYZ = \u02c6D^{1/2} Z C^{-1/2}, with C = Z^T \u02c6D Z the column normalization of Z. (5)\n\nNote Ckk = \u2211_{i\u2208k} \u02c6di is the volume of cluster k.\n\n2. 
Project YZ on \u02c6Y and perform a Singular Value Decomposition:\n\nF = YZ^T \u02c6Y = U \u03a3 V^T. (6)\n\n3. Change basis in R(YZ) to align with \u02c6Y:\n\nY = YZ U V^T; complete Y to an orthonormal basis [Y B] of R^n. (7)\n\n4. Construct the Laplacian L and edge probability matrix A:\n\nL = Y \u02c6\u039b Y^T + (B B^T) \u02c6L (B B^T), A = \u02c6D^{1/2} L \u02c6D^{1/2}. (8)\n\nProposition 2 Let G, \u02c6A, \u02c6D, \u02c6L, \u02c6Y, \u02c6\u039b and Z be defined as above, and (A, L) = PFM(G,C). Then,\n\n1. \u02c6D and L, or A, define a PFM with degrees \u02c6d1:n.\n2. The columns of Y are eigenvectors of L with eigenvalues \u02c6\u03bb1:K.\n3. \u02c6D^{1/2}1 is an eigenvector of both L and \u02c6L with eigenvalue \u02c6\u03bb1 = 1.\n\nThe proof is relegated to the Supplement, as are all the omitted proofs.\nPFM(G,C) is an estimator for the PFM parameters given the clustering. It is evidently not the Maximum Likelihood estimator, but we can show that it is consistent in the following sense.\n\n3\n\n\fProposition 3 (Informal) Assume that G is sampled from a PFM with parameters D\u2217, L\u2217 and compatible with C\u2217, and let L = PFM(G,C\u2217). Then, under standard recovery conditions for PFM (e.g. [20]), ||L\u2217 \u2212 L|| = o(1) w.r.t. n.\nAssumption 2 (Goodness of fit for PFM) || \u02c6L \u2212 L|| \u2264 \u03b5.\nPFM(G,C) instantiates M(G,C), and Assumption 2 instantiates the goodness of fit measure. It remains to prove an instance of Generic Theorem 1 for these choices.\n\nTheorem 4 (Main Result (PFM)) Let G be a graph with \u02c6d1:n, \u02c6D, \u02c6L, \u02c6\u03bb1:n as defined, and \u02c6L satisfy Assumption 1. Let C, C' be two clusterings with K clusters, and L, L' be their corresponding Laplacians, defined as in (8) and satisfying Assumption 2, respectively. Set \u03b4 = 4(K\u22121)\u03b5^2 / (|\u02c6\u03bbK| \u2212 |\u02c6\u03bbK+1|)^2 and \u03b40 = mink Ckk / maxk Ckk with C defined as in (5), where k indexes the clusters of C. Then, whenever \u03b4 \u2264 \u03b40,\n\ndist_{\u02c6d}(C,C') \u2264 (maxk Ckk / \u2211_k Ckk) \u03b4, (9)\n\nwith dist_{\u02c6d} being the weighted ME distance (3).\n\nIn the remainder of this section we outline the proof steps, while the partial results of Propositions 5, 6, 7 are proved in the Supplement. First, we apply the perturbation bound called the Sinus Theorem of Davis and Kahan, in the form presented in Chapter V of [19].\n\nProposition 5 Let \u02c6Y, \u02c6\u03bb1:n, Y be defined as usual. If Assumptions 1 and 2 hold, then\n\n|| diag(sin \u03b81:K( \u02c6Y, Y)) || \u2264 \u03b5 / (|\u02c6\u03bbK| \u2212 |\u02c6\u03bbK+1|) = \u03b5', (10)\n\nwhere \u03b81:K are the canonical (or principal) angles between R( \u02c6Y) and R(Y) (see e.g. [8]).\nThe next step concerns the closeness of Y, \u02c6Y in Frobenius norm. Since Proposition 5 bounds the sinuses of the canonical angles, we exploit the fact that the cosines of the same angles are the singular values of F = YZ^T \u02c6Y of (6).\n\nProposition 6 Let M = Y Y^T, \u02c6M = \u02c6Y \u02c6Y^T and F, \u03b5' as above. Assumptions 1 and 2 imply that\n\n1. ||F||_F^2 = trace(M \u02c6M^T) \u2265 K \u2212 (K\u22121)\u03b5'^2.\n2. ||M \u2212 \u02c6M||_F^2 \u2264 2(K\u22121)\u03b5'^2.\n\nNow we show that all clusterings which satisfy Proposition 6 must be close to each other in the weighted ME distance. For this, we first need an intermediate result. Assume we have two clusterings C, C', with K clusters, for which we construct YZ, Y, L, M, respectively Y'Z, Y', L', M' as above. Then, the subspaces spanned by Y and Y' will be close.\n\nProposition 7 Let \u02c6L satisfy Assumption 1 and let C, C' represent two clusterings for which L, L' satisfy Assumption 2. 
Then, ||YZ^T Y'Z||_F^2 \u2265 K \u2212 4(K\u22121)\u03b5'^2 = K \u2212 \u03b4.\nThe main result now follows from Proposition 7 and Theorem 9 of [13], as shown in the Supplement. This proof approach is different from the existing perturbation bounds for clustering, which all use counting arguments. The result of [13] is a local equivalence, which bounds the error we need in terms of \u03b4 defined above (\u201clocal\u201d meaning the result only holds for small \u03b4).\n\n4\n\n\f3.2 Main Theorem for SBM\n\nIn this section, we offer an instantiation of Generic Theorem 1 for the case of the SBM. As before, we start with a model estimator, which in this case is the Maximum Likelihood estimator.\n\nSBM Estimation Algorithm\nInput Graph with \u02c6A, clustering C with indicator matrix Z.\nOutput A = SBM(G,C)\n\n1. Construct an orthogonal matrix derived from Z: YZ = Z C^{-1/2} with C = Z^T Z.\n2. Estimate the edge probabilities: B = C^{-1} Z^T \u02c6A Z C^{-1}.\n3. Construct A from B by A = Z B Z^T.\n\nProposition 8 Let \u02dcB = C^{1/2} B C^{1/2} and denote the eigenvalues of \u02dcB, ordered by decreasing magnitude, by \u03bb1:K. Let the spectral decomposition of \u02dcB be \u02dcB = U \u039b U^T, with U an orthogonal matrix and \u039b = diag(\u03bb1:K). Then\n\n1. A is a SBM.\n\n2. \u03bb1:K are the K principal eigenvalues of A. The remaining eigenvalues of A are zero.\n\n3. A = Y \u039b Y^T where Y = YZ U.\n\nAssumption 3 (Eigengap) B is non-singular (or, equivalently, |\u03bbK| > 0).\nAssumption 4 (Goodness of fit for SBM) || \u02c6A \u2212 A|| \u2264 \u03b5.\nWith the model (SBM), estimator, and goodness of fit defined, we are ready for the main result.\nTheorem 9 (Main Result (SBM)) Let G be a graph with adjacency matrix \u02c6A, and \u02c6\u03bbA_K the K-th singular value of \u02c6A. 
Let C, C' be two clusterings with K clusters, satisfying Assumptions 3 and 4. Set \u03b4 = 4K\u03b5^2 / |\u02c6\u03bbA_K|^2 and \u03b40 = mink nk / maxk nk, where k indexes the clusters of C. Then, whenever \u03b4 \u2264 \u03b40, dist(C,C') \u2264 \u03b4 maxk nk / n, where dist represents the ME distance (2).\nNote that the eigengap of \u02c6A at \u02c6\u03bbA_K is not bounded above, and neither is \u03b5. Since the SBM is less flexible than the PFM, we expect that for the same data G, Theorem 9 will be more restrictive than Theorem 4.\n\n4 The results in perspective\n\n4.1 Cluster validation\n\nTheorems like 4, 9 can provide model-free guarantees for clustering. We exemplify this procedure in the experimental Section 6, using standard spectral clustering as described in e.g. [18, 17, 15]. What is essential is that all the quantities such as \u03b5 and \u03b4 are computable from the data.\nMoreover, if Y is available, then the bound in Theorem 4 can be improved.\n\nProposition 10 Theorem 4 holds when \u03b4 is replaced by \u03b4Y = K \u2212 <\u02c6M, M>_F + (K\u22121)(\u03b5')^2 + 2\u221a(2(K\u22121)) \u03b5' ||\u02c6M \u2212 M||_F, with \u03b5' = \u03b5/(|\u02c6\u03bbK| \u2212 |\u02c6\u03bbK+1|) and M, \u02c6M defined in Proposition 6.\n\n4.2 Using existing model-based recovery theorems to prove model-free guarantees\n\nWe exemplify this by using (the proof of) Theorem 3 of [20] to prove the following.\n\nTheorem 11 (Alternative result based on [20] for PFM) Under the same conditions as in Theorem 4, dist_{\u02c6d}(C,C') \u2264 \u03b4WM, with \u03b4WM = 128 K\u03b5^2 / (|\u02c6\u03bbK| \u2212 |\u02c6\u03bbK+1|)^2.\n\n5\n\n\fIt follows, too, that with the techniques in this paper, the error bound in [20] can be improved by a factor of 128.\nSimilarly, if we use the results of [18] we obtain an alternative model-free guarantee for the SBM.\nAssumption 5 (Alternative goodness of fit for SBM) || \u02c6L^2 \u2212 L^2||_F \u2264 \u03b5, where \u02c6L, L are the Laplacians of \u02c6A and A = SBM(G,C) respectively.\nTheorem 12 (Alternative result based on [18] for SBM) Under the same conditions as in Theorem 9, except for replacing Assumption 4 with 5, dist(C,C') \u2264 \u03b4RCY with \u03b4RCY = 16 (maxk nk / n) \u03b5^2 / |\u02c6\u03bbK|^4.\nA problem with this result is that Assumption 5 is much stronger than 4 (being in Frobenius norm).\nThe more recent results of [17] (with unspecified constants) in conjunction with our original Assumptions 3, 4, and the assumption that all clusters have equal sizes, give a bound of O(K\u03b5^2/\u02c6\u03bb_K^2) for the SBM; hence our model-free Theorem 9 matches this more restrictive model-based theorem.\n\n4.3 Sanity checks and Extensions\nIt can be easily verified that if indeed G is sampled from a SBM, or PFM, then for large enough n, and large enough model eigengap, Assumptions 1 and 2 (or 3 and 4) will hold.\nSome immediate extensions and variations of Theorems 4, 9 are possible. For example, one could replace the spectral norm by the Frobenius norm in Assumptions 2 and 4, which would simplify some of the proofs. However, using the Frobenius norm would be a much stronger assumption [18].\nTheorem 4 holds not just for simple graphs, but in the more general case when \u02c6A is a weighted graph (i.e. a similarity matrix). The theorems can be extended to cover the case when C' is a clustering that is \u03b1-worse than C, i.e. when ||L' \u2212 \u02c6L|| \u2265 ||L \u2212 \u02c6L||(1 \u2212 \u03b1).\n4.4 Clusterability and resilience\nOur Theorems also imply the stability of a clustering to perturbations of the graph G. Indeed, let \u02c6L' be the Laplacian of G', a perturbation of G. 
If || \u02c6L' \u2212 \u02c6L|| \u2264 \u03b5, then || \u02c6L' \u2212 L|| \u2264 2\u03b5, and (1) G' is well fitted by a PFM whenever G is, and (2) C is \u03b4-stable w.r.t. G', hence C is what some authors [9] call resilient.\nA graph G is clusterable when G can be fitted well by some clustering C\u2217. Much work [4, 7] has been devoted to showing that clusterability implies that finding a C close to C\u2217 is computationally efficient. Such results can be obtained in our framework, by exploiting existing recovery theorems such as [18, 17, 20], which give recovery guarantees for Spectral Clustering under the assumption of sampling from the model. For this, we can simply replace the model assumption with the assumption that there is a C\u2217 for which L (or A) satisfies Assumptions 1 and 2 (or 3 and 4).\n5 Related work\n\nTo our knowledge, there is no work of the type of Theorem 1 in the literature on SBM, DC-SBM, PFM. The closest work is by [6], which guarantees approximate recovery assuming G is close to a DC-SBM.\nSpectral clustering is also used for loss-based clustering in (weighted) graphs and some stability results exist in this context. Even though they measure clustering quality by different criteria, so that the \u03b5 values are not comparable, we review them here. The recent paper of [16], Theorem 1.2, states that if the K-way Cheeger constant of G is \u03c1(K) \u2264 (1 \u2212 \u02c6\u03bbK+1)/(cK^3) then the clustering error^2 dist_{\u02c6d}(C,Copt) \u2264 C/c = \u03b4PSZ. In the current proof, the constant C = 2 \u00d7 10^5; moreover, \u03c1(K) cannot be computed tractably. In [14], the bound \u03b4MSX depends on \u03b5MSX, the Normalized Cut scaled by the eigengap. 
Since both bounds refer to the result of spectral clustering, we can compare the relationship between \u03b4MSX and \u03b5MSX; for [14], this is \u03b4MSX = 2\u03b5MSX [1 \u2212 \u03b5MSX/(K\u22121)],\n\n^2 The result is stronger, bounding the perturbation of each cluster individually by \u03b4PSZ, but it also includes a factor larger than 1, bounding the error of the K-means algorithm.\n\n6\n\n\fwhich is about K \u2212 1 times larger than \u03b4 when \u03b5 = \u03b5MSX. In [5], dist(C,C') is defined in terms of ||YZ \u2212 Y'Z||_F^2, and the loss is (closely related to) || \u02c6A \u2212 SBM(G,C)||_F^2. The bound does not take into account the eigengap, that is, the stability of the subspace \u02c6Y itself.\nBootstrap for validating a clustering C was studied in [11] (see also references therein for earlier work). In [3] the idea is to introduce a statistic, and large deviation bounds for it, conditioned on sampling from a SBM (with covariates) and on a given C.\n\n6 Experimental evaluation\n\nExperiment Setup Given G, we obtain a clustering C0 by spectral clustering [15]. Then we calculate clustering C by perturbing C0 with gradually increasing noise. For each C, we construct the PFM(G,C) and SBM(G,C) models, and further compute \u03b5, \u03b4 and \u03b40. If \u03b4 \u2264 \u03b40, C is guaranteed to be stable by the theorems. In the remainder of this section, we describe the data generating process for the simulated datasets and the results we obtained.\n\nPFM Datasets We generate from a PFM model with K = 5, n = 10000, \u03bb1:K = (1, 0.875, 0.75, 0.625, 0.5), eigengap 0.48, n1:K = (2000, 2000, 2000, 2000, 2000). The stochastic matrix R and its stationary distribution \u03c1 are shown below. 
We sample an adjacency matrix \u02c6A from A (heatmaps of A and \u02c6A not reproduced here).\n\n\u03c1 = (0.25, 0.12, 0.17, 0.18, 0.28)\n\nR =\n[ .79 .02 .06 .03 .10 ]\n[ .03 .71 .23 .00 .02 ]\n[ .09 .16 .69 .00 .06 ]\n[ .04 .00 .00 .80 .16 ]\n[ .10 .01 .03 .11 .76 ]\n\nPerturbed PFM Datasets A is obtained from the previous model by perturbing its principal subspace (details in the Supplement). Then we sample \u02c6A from A.\n\nLancichinetti-Fortunato-Radicchi (LFR) simulated matrix [12] The LFR benchmark graphs are widely used for community detection algorithms, due to heterogeneity in the distribution of node degree and community size. An LFR matrix is simulated with n = 10000, K = 4, nk = (2467, 2416, 2427, 2690) and \u00b5 = 0.2, where \u00b5 is the mixing parameter indicating the fraction of edges shared between a node and the other nodes from outside its community.\n\nPolitical Blogs Dataset A directed network \u02dcA of hyperlinks between weblogs on US politics, compiled from online directories by Adamic and Glance [2], where each blog is assigned a political leaning, liberal or conservative, based on its blog content. The network contains 1490 blogs. After erasing the disconnected nodes, n = 983. We study \u02c6A = (\u02dcA^T \u02dcA)^3, which is a smoothed undirected graph. For \u02dcA^T \u02dcA we find no guarantees.\n\nThe first two data sets are expected to fit the PFM well, but not the SBM, while the LFR data is expected to be a good fit for a SBM. Since all bounds can be computed on weighted graphs as well, we have run the experiments also on the edge probability matrices A used to generate the PFM and perturbed PFM graphs.\nThe results of these experiments are summarized in Figure 1. For all of the experiments, the clustering C is ensured to be stable by Theorem 4 as the unweighted error grows, up to a breaking point beyond which the assumptions of the theorem fail. 
In particular, C0 is always stable in the PFM framework.\n\n7\n\n\fComparing \u03b4 from Theorem 9 to that from Theorem 4, we find that Theorem 9 (guarantees for SBM) is much harder to satisfy. All \u03b4 values from Theorem 9 are above 1, and not shown.^3 In particular, for the SBM model class, C cannot be proved stable even for the LFR data.\nNote that part of the reason why, with the PFM model, very little difference from the clustering C0 can be tolerated for a clustering to be stable is that the large eigengap makes PFM(G,C) differ from PFM(G,C0) even for very small perturbations. By comparing the bounds for \u02c6A with the bounds for the \u201cweighted graphs\u201d A, we can evaluate that the sampling noise on \u03b4 is approximately equal to that of the clustering perturbation. Of course, the sampling noise varies with n, decreasing for larger graphs. Moreover, from the Political Blogs data, we see that \u201csmoothing\u201d a graph, e.g. by taking powers of its adjacency matrix, has a stability-inducing effect.\n\nFigure 1: Quantities \u03b5, \u03b4, \u03b40 from Theorem 4 plotted vs dist(C, C0) for various datasets: \u02c6A denotes a simple graph, while A denotes a weighted graph (i.e. a non-negative matrix). For the Political Blogs: Truth means C0 is the true clustering of [2], spectral means C0 is obtained from spectral clustering. For SBM, \u03b4 is always greater than \u03b40.\n\n7 Discussion\n\nThis paper makes several contributions. At a high level, it poses the problem of model-free validation in the area of community detection in networks. The stability paradigm is not entirely new, but using it explicitly with model-based clustering (instead of cost-based) is. 
So is \u201cturning around\u201d the model-based recovery theorems to be used in a model-free framework.\nAll quantities in our theorems are computable from the data and the clustering C, i.e. they do not contain undetermined constants, and do not depend on parameters that are not available. As with distribution-free results in general, making fewer assumptions allows for less confidence in the conclusions, and the results are not always informative. Sometimes this should be so, e.g. when the data does not fit the model well. But it is also possible that the fit is good, yet not good enough to satisfy the conditions of the theorems as they are currently formulated. This happens with the SBM bounds, and we believe tighter bounds are possible for this model. It would be particularly interesting to study the non-spectral, sharp thresholds of [1] from the point of view of model-free recovery. A complementary problem is to obtain negative guarantees (i.e. that C is not unique up to perturbations).\nAt the technical level, we obtain several different and model-specific stability results, which bound the perturbation of a clustering by the perturbation of a model. They can be used both in model-free and in existing or future model-based recovery guarantees, as we have shown in Section 3 and in the experiments. The proof techniques that lead to these results are actually simpler, more direct, and more elementary than the ones found in previous papers.\n\n^3 We also computed \u03b4RCY but the bounds were not informative.\n\n8\n\n\fReferences\n[1] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms. arXiv preprint arXiv:1503.00609, 2015.\n\n[2] Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pages 36\u201343. 
ACM, 2005.\n\n[3] Edoardo M. Airoldi, David S. Choi, and Patrick J. Wolfe. Confidence sets for network structure. Technical Report arXiv:1105.6245, 2011.\n\n[4] Pranjal Awasthi. Clustering under stability assumptions. In Encyclopedia of Algorithms, pages 331\u2013335. 2016.\n\n[5] Francis Bach and Michael I. Jordan. Learning spectral clustering with applications to speech separation. Journal of Machine Learning Research, 7:1963\u20132001, 2006.\n\n[6] Maria-Florina Balcan, Christian Borgs, Mark Braverman, Jennifer Chayes, and Shang-Hua Teng. Finding endogenously formed communities. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 767\u2013783. SIAM, 2013.\n\n[7] Shai Ben-David. Computational feasibility of clustering under clusterability assumptions. CoRR, abs/1501.00437, 2015.\n\n[8] Rajendra Bhatia. Matrix Analysis, volume 169. Springer Science & Business Media, 2013.\n\n[9] Yonatan Bilu and Nathan Linial. Are stable instances easy? CoRR, abs/0906.3162, 2009.\n\n[10] Fan R. K. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.\n\n[11] Brian Karrer, Elizaveta Levina, and M. E. J. Newman. Robustness of community structure in networks. Physical Review E, 77:046119, Apr 2008.\n\n[12] Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4):046110, 2008.\n\n[13] Marina Meil\u0103. Local equivalence of distances between clusterings \u2013 a geometric perspective. Machine Learning, 86(3):369\u2013389, 2012.\n\n[14] Marina Meil\u0103, Susan Shortreed, and Liang Xu. Regularized spectral learning. In Robert Cowell and Zoubin Ghahramani, editors, Proceedings of the Artificial Intelligence and Statistics Workshop (AISTATS 05), 2005.\n\n[15] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. 
Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.\n\n[16] Richard Peng, He Sun, and Luca Zanetti. Partitioning well-clustered graphs with k-means and heat kernel. In Proceedings of the Annual Conference on Learning Theory (COLT), pages 1423\u20131455, 2015.\n\n[17] Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems, pages 3120\u20133128, 2013.\n\n[18] Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, pages 1878\u20131915, 2011.\n\n[19] Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory, volume 175. Academic Press, New York, 1990.\n\n[20] Yali Wan and Marina Meila. A class of network models recoverable by spectral clustering. In Daniel Lee and Masashi Sugiyama, editors, Advances in Neural Information Processing Systems (NIPS), 2015.\n\n9\n\n\f", "award": [], "sourceid": 1297, "authors": [{"given_name": "Yali", "family_name": "Wan", "institution": "University of Washington"}, {"given_name": "Marina", "family_name": "Meila", "institution": "University of Washington"}]}