{"title": "Edge-exchangeable graphs and sparsity", "book": "Advances in Neural Information Processing Systems", "page_first": 4249, "page_last": 4257, "abstract": "Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox (2015), models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.", "full_text": "Edge-exchangeable graphs and sparsity\n\nDiana Cai\n\nDept. of Statistics, U. Chicago\n\nChicago, IL 60637\n\ndcai@uchicago.edu\n\nTrevor Campbell\n\nCSAIL, MIT\n\nCambridge, MA 02139\n\ntdjc@mit.edu\n\nTamara Broderick\n\nCSAIL, MIT\n\nCambridge, MA 02139\n\ntbroderick@csail.mit.edu\n\nAbstract\n\nMany popular network models rely on the assumption of (vertex) exchangeability,\nin which the distribution of the graph is invariant to relabelings of the vertices.\nHowever, the Aldous-Hoover theorem guarantees that these graphs are dense or\nempty with probability one, whereas many real-world graphs are sparse. 
We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; in contrast to the pioneering work of Caron and Fox [12], models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.\n\n1 Introduction\n\nIn recent years, network data have appeared in a growing number of applications, such as online social networks, biological networks, and networks representing communication patterns. As a result, there is growing interest in developing models for such data and studying their properties. Crucially, individual network data sets also continue to increase in size; we typically assume that the number of vertices is unbounded as time progresses. We say a graph sequence is dense if the number of edges grows quadratically in the number of vertices, and sparse if the number of edges grows sub-quadratically in the number of vertices. Sparse graph sequences are more representative of real-world graph behavior. However, many popular network models (see, e.g., Lloyd et al. [19] for an extensive list) share the undesirable scaling property that they yield dense sequences of graphs with probability one. The poor scaling properties of these models can be traced back to a seemingly innocent assumption: that the vertices in the model are exchangeable, that is, any finite permutation of the rows and columns of the graph adjacency matrix does not change the distribution of the graph. 
Under this assumption, the Aldous-Hoover theorem [1, 16] implies that such models generate dense or empty graphs with probability one [20].\n\nThis fundamental model misspecification motivates the development of new models that can achieve sparsity. One recent focus has been on models in which an additional parameter is employed to uniformly decrease the probabilities of edges as the network grows (e.g., Bollob\u00e1s et al. [3], Borgs et al. [4, 5], Wolfe and Olhede [24]). While these models allow sparse graph sequences, the sequences are no longer projective. In projective sequences, vertices and edges are added to a graph as the sequence progresses; in the models above, by contrast, there is generally no strict subgraph relationship between earlier and later graphs in the sequence. Projectivity is natural in streaming modeling. For instance, we may wish to capture new users joining a social network and new connections being made among existing users, or new employees joining a company and new communications between existing employees.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nCaron and Fox [12] pioneered work on sparse, projective graph sequences. Instead of the vertex exchangeability that yields the Aldous-Hoover theorem, they consider a notion of graph exchangeability based on the idea of independent increments of subordinators [18], explored in depth by Veitch and Roy [22]. However, since this Kallenberg-style exchangeability introduces a new countable infinity of latent vertices at every step in the graph sequence, its generative mechanism seems particularly suited to the non-stationary domain. By contrast, we are interested here in stationary models that grow in complexity with the size of the data set. 
Consider classic Bayesian nonparametric models such as the Chinese restaurant process (CRP) and Indian buffet process (IBP); these engender growth by using a single infinite latent collection of parameters to generate a finite but growing set of instantiated parameters. Similarly, we propose a framework that uses a single infinite latent collection of vertices to generate a finite but growing set of vertices that participate in edges and thereby in the network. We believe our framework will be a useful component in more complex, non-stationary graphical models, just as the CRP and IBP are often combined with hidden Markov models or other explicit non-stationary mechanisms. Additionally, Kallenberg exchangeability is intimately tied to continuous-valued labels of the vertices, whereas here we are interested in characterizing the graph sequence based solely on its topology.\n\nIn this work, we introduce a new form of exchangeability, distinct from both vertex exchangeability and Kallenberg exchangeability. In particular, we say that a graph sequence is edge exchangeable if the distribution of any graph in the sequence is invariant to the order in which edges arrive, rather than the order of the vertices. We will demonstrate that edge exchangeability admits a large family of sparse, projective graph sequences.\n\nIn the remainder of the paper, we start by defining dense and sparse graph sequences rigorously. We review vertex exchangeability before introducing our new notion of edge exchangeability in Section 2, which we also contrast with Kallenberg exchangeability in more detail in Section 4. We define a family of models, which we call graph frequency models, based on random measures in Section 3. We use these models to show that edge-exchangeable models can yield sparse, projective graph sequences via theoretical analysis in Section 5 and via simulations in Section 6. 
Along the way, we highlight other benefits of the edge exchangeability and graph frequency model frameworks.\n\n2 Exchangeability in graphs: old and new\n\nLet $(G_n)_n := G_1, G_2, \\ldots$ be a sequence of graphs, where each graph $G_n = (V_n, E_n)$ consists of a (finite) set of vertices $V_n$ and a (finite) multiset of edges $E_n$. Each edge $e \\in E_n$ is a set of two vertices in $V_n$. We assume the sequence is projective (or growing), so that $V_n \\subseteq V_{n+1}$ and $E_n \\subseteq E_{n+1}$. Consider, e.g., a social network with more users joining the network and making new connections with existing users. We say that a graph sequence is dense if $|E_n| = \\Omega(|V_n|^2)$, i.e., the number of edges is asymptotically lower bounded by $c \\cdot |V_n|^2$ for some constant $c > 0$. Conversely, a sequence is sparse if $|E_n| = o(|V_n|^2)$, i.e., the number of edges is asymptotically upper bounded by $c \\cdot |V_n|^2$ for every constant $c > 0$. In what follows, we consider random graph sequences, and we focus on the case where $|V_n| \\to \\infty$ almost surely.\n\n2.1 Vertex-exchangeable graph sequences\n\nIf the number of vertices in the graph sequence grows to infinity, the graphs in the sequence can be thought of as subgraphs of an \u201cinfinite\u201d graph with infinitely many vertices and a correspondingly infinite adjacency matrix. Traditionally, exchangeability in random graphs is defined as the invariance of the distribution of any finite submatrix of this adjacency matrix (corresponding to any finite collection of vertices) under finite permutation. Equivalently, we can express this form of exchangeability, which we henceforth call vertex exchangeability, by considering a random sequence of graphs $(G_n)_n$ with $V_n = [n]$, where $[n] := \\{1, \\ldots, n\\}$. In this case, only the edge sequence is random. Let $\\pi$ be any permutation of the integers $[n]$. If $e = \\{v, w\\}$, let $\\pi(e) := \\{\\pi(v), \\pi(w)\\}$. If $E_n = \\{e_1, \\ldots, e_m\\}$, let $\\pi(E_n) := \\{\\pi(e_1), \\ldots, \\pi(e_m)\\}$.\n\nDefinition 2.1. Consider the random graph sequence $(G_n)_n$, where $G_n$ has vertices $V_n = [n]$ and edges $E_n$. $(G_n)_n$ is (infinitely) vertex exchangeable if for every $n \\in \\mathbb{N}$ and for every permutation $\\pi$ of the vertices $[n]$, $G_n \\overset{d}{=} \\tilde{G}_n$, where $\\tilde{G}_n$ has vertices $[n]$ and edges $\\pi(E_n)$.\n\nFigure 1: Upper, left four: Step-augmented graph sequence from Ex. 2.2. At each step $n$, the step value is always at least the maximum vertex index. Upper, right two: Two graphs with the same probability under vertex exchangeability. Lower, left four: Step-augmented graph sequence from Ex. 2.3. Lower, right two: Two graphs with the same probability under edge exchangeability.\n\nA great many popular models for graphs are vertex exchangeable; see Appendix B and Lloyd et al. [19] for a list. However, it follows from the Aldous-Hoover theorem [1, 16] that any vertex-exchangeable graph is a mixture of sampling procedures from graphons. Further, any graph sampled from a graphon is almost surely dense or empty [20]. Thus, vertex-exchangeable random graph models are misspecified models for sparse network datasets, as they generate dense (or empty) graphs.\n\n2.2 Edge-exchangeable graph sequences\n\nVertex-exchangeable sequences have distributions invariant to the order of vertex arrival. We introduce edge-exchangeable graph sequences, which will instead be invariant to the order of edge arrival. As before, we let $G_n = (V_n, E_n)$ be the $n$th graph in the sequence. Here, though, we consider only active vertices, that is, vertices that are connected via some edge. 
That lets us define $V_n$ as a function of $E_n$; namely, $V_n$ is the union of the vertices in $E_n$. Note that a graph that has sub-quadratic growth in the number of edges as a function of the number of active vertices will necessarily have sub-quadratic growth in the number of edges as a function of the number of all vertices, so we obtain strictly stronger results by considering active vertices. In this case, the graph $G_n$ is completely defined by its edge set $E_n$.\n\nAs above, we suppose that $E_n \\subseteq E_{n+1}$. We can emphasize this projectivity property by augmenting each edge with the step on which it is added to the sequence. Let $E'_n$ be a collection of tuples, in which the first element is the edge and the second element is the step (i.e., index) on which the edge is added: $E'_n = \\{(e_1, s_1), \\ldots, (e_m, s_m)\\}$. We can then define a step-augmented graph sequence $(E'_n)_n = (E'_1, E'_2, \\ldots)$ as a sequence of step-augmented edge sets. Note that there is a bijection between the step-augmented graph sequence and the original graph sequence.\n\nExample 2.2. In the setup for vertex exchangeability, we assumed $V_n = [n]$ and every edge is introduced as soon as both of its vertices are introduced. In this case, the step of any edge in the step-augmented graph is the maximum vertex value. For example, in Figure 1, we have\n\n$$E'_1 = \\emptyset, \\qquad E'_2 = E'_3 = \\{(\\{1, 2\\}, 2)\\}, \\qquad E'_4 = \\{(\\{1, 2\\}, 2), (\\{1, 4\\}, 4), (\\{2, 4\\}, 4), (\\{3, 4\\}, 4)\\}.$$\n\nIn general step-augmented graphs, though, the step need not equal the max vertex, as we see next. \u25a0\n\nExample 2.3. 
Suppose we have a graph given by the edge sequence (see Figure 1):\n\n$$E_1 = E_2 = \\{\\{2, 5\\}, \\{5, 5\\}\\}, \\qquad E_3 = E_2 \\cup \\{\\{2, 5\\}\\}, \\qquad E_4 = E_3 \\cup \\{\\{1, 6\\}\\}.$$\n\nThe step-augmented graph $E'_4$ is $\\{(\\{2, 5\\}, 1), (\\{5, 5\\}, 1), (\\{2, 5\\}, 3), (\\{1, 6\\}, 4)\\}$. \u25a0\n\nRoughly, a random graph sequence is edge exchangeable if its distribution is invariant to finite permutations of the steps. Let $\\pi$ be a permutation of the integers $[n]$. For a step-augmented edge set $E'_n = \\{(e_1, s_1), \\ldots, (e_m, s_m)\\}$, let $\\pi(E'_n) = \\{(e_1, \\pi(s_1)), \\ldots, (e_m, \\pi(s_m))\\}$.\n\nDefinition 2.4. Consider the random graph sequence $(G_n)_n$, where $G_n$ has step-augmented edges $E'_n$ and $V_n$ are the active vertices of $E_n$. $(G_n)_n$ is (infinitely) edge exchangeable if for every $n \\in \\mathbb{N}$ and for every permutation $\\pi$ of the steps $[n]$, $G_n \\overset{d}{=} \\tilde{G}_n$, where $\\tilde{G}_n$ has step-augmented edges $\\pi(E'_n)$ and associated active vertices.\n\nSee Figure 1 for visualizations of both vertex exchangeability and edge exchangeability. It remains to show that there are non-trivial models that are edge exchangeable (Section 3) and that edge-exchangeable models admit sparse graphs (Section 5).\n\n3 Graph frequency models\n\nWe next demonstrate that a wide class of models, which we call graph frequency models, exhibit edge exchangeability. Consider a latent infinity of vertices indexed by the positive integers $\\mathbb{N} = \\{1, 2, \\ldots\\}$, along with an infinity of edge labels $(\\theta_{\\{i,j\\}})$, each in a set $\\Theta$, and positive edge rates (or frequencies) $(w_{\\{i,j\\}})$ in $\\mathbb{R}_+$. We allow both the $(\\theta_{\\{i,j\\}})$ and $(w_{\\{i,j\\}})$ to be random, though this is not mandatory. For instance, we might choose $\\theta_{\\{i,j\\}} = (i, j)$ for $i \\le j$, and $\\Theta = \\mathbb{R}^2$. Alternatively, the $\\theta_{\\{i,j\\}}$ could be drawn iid from a continuous distribution such as $\\mathrm{Unif}[0, 1]$. 
For any choice of $(\\theta_{\\{i,j\\}})$ and $(w_{\\{i,j\\}})$,\n\n$$W := \\sum_{\\{i,j\\}:\\, i,j \\in \\mathbb{N}} w_{\\{i,j\\}} \\, \\delta_{\\theta_{\\{i,j\\}}} \\qquad (1)$$\n\nis a measure on $\\Theta$. Moreover, it is a discrete measure since it is purely atomic. If either $(\\theta_{\\{i,j\\}})$ or $(w_{\\{i,j\\}})$ (or both) are random, $W$ is a discrete random measure on $\\Theta$ since it is a random, discrete-measure-valued element. Given the edge rates (or frequencies) $(w_{\\{i,j\\}})$ in $W$, we next show some natural ways to construct edge-exchangeable graphs.\n\nSingle edge per step. If the rates are normalized such that $\\sum_{\\{i,j\\}:\\, i,j \\in \\mathbb{N}} w_{\\{i,j\\}} = 1$, then $(w_{\\{i,j\\}})$ is a distribution over all possible vertex pairs; in other words, $W$ is a probability measure. We can form an edge-exchangeable graph sequence by first drawing values for $(w_{\\{i,j\\}})$ and $(\\theta_{\\{i,j\\}})$ and setting $E_0 = \\emptyset$. We recursively set $E_{n+1} = E_n \\cup \\{e\\}$, where $e$ is an edge $\\{i, j\\}$ chosen from the distribution $(w_{\\{i,j\\}})$. This construction introduces a single edge in the graph at each step, although it may duplicate an edge that already exists. Therefore, this technique generates multigraphs one edge at a time. Since the edge at every step is drawn conditionally iid given $W$, we have an edge-exchangeable graph.\n\nMultiple edges per step. Alternatively, the rates $(w_{\\{i,j\\}})$ may not be normalized, in which case $W$ may not be a probability measure. Let $f(m \\mid w)$ be a distribution over non-negative integers $m$ given some rate $w \\in \\mathbb{R}_+$. We again initialize our sequence by drawing $(w_{\\{i,j\\}})$ and $(\\theta_{\\{i,j\\}})$ and setting $E_0 = \\emptyset$. Then, recursively, on the $n$th step, start by setting $F = \\emptyset$. For every possible edge $e = \\{i, j\\}$, draw the multiplicity of $e$ in this step as $m_e \\overset{\\text{ind}}{\\sim} f(\\cdot \\mid w_e)$ and add $m_e$ copies of $e$ to $F$. Finally, set $E_{n+1} = E_n \\cup F$. 
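The two constructions above can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code. A finite, hypothetical list of vertex pairs stands in for the infinite collection in Equation (1), and the function names are made up for illustration. For the multiple-edges-per-step case, we assume f(m | w) is Poisson with mean w, because the text leaves f general. Each returned edge is paired with the step on which it arrives, which matches the step-augmented edge sets of Section 2.

```python
import numpy as np

# Minimal sketch: a finite list of vertex pairs stands in for the infinite
# collection in Equation (1). Names and rates are hypothetical.


def single_edge_per_step(pairs, probs, n_steps, rng):
    '''One edge per step, drawn iid from normalized rates (a probability vector).'''
    idx = rng.choice(len(pairs), size=n_steps, p=probs)
    # Step-augmented edges: (edge, step on which it was added), steps 1..n_steps.
    return [(pairs[k], step) for step, k in enumerate(idx, start=1)]


def multiple_edges_per_step(pairs, rates, n_steps, rng):
    '''Every pair gets a multiplicity m_e ~ f(. | w_e) at every step.

    Assumption: f is Poisson with mean w_e; the text leaves f general.
    '''
    rates = np.asarray(rates, dtype=float)
    edges = []
    for step in range(1, n_steps + 1):
        m = rng.poisson(rates)  # one multiplicity per candidate edge
        for k in np.flatnonzero(m):
            edges.extend([(pairs[k], step)] * int(m[k]))  # m_e copies of edge e
    return edges


def binary_restriction(step_edges):
    '''Distinct edges only: multiplicities and step labels are dropped.'''
    return {edge for edge, _ in step_edges}


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    pairs = [(1, 2), (1, 3), (2, 3), (3, 4)]
    probs = np.array([0.4, 0.3, 0.2, 0.1])
    print(single_edge_per_step(pairs, probs, 8, rng))
    multi = multiple_edges_per_step(pairs, 2 * probs, 5, rng)
    print(len(multi), sorted(binary_restriction(multi)))
```

Every step is drawn from the same fixed rates, so the per-step draws are conditionally iid given the rates. The text relies on this property to argue edge exchangeability. The binary restriction simply discards multiplicities and step labels.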
This technique potentially introduces multiple edges in each step, in which edges themselves may have multiplicity greater than one and may be duplicates of edges that already exist in the graph. Therefore, this technique generates multigraphs, multiple edges at a time. If we restrict $f$ and $W$ such that finitely many edges are added on every step almost surely, we have an edge-exchangeable graph, as the edges in each step are drawn conditionally iid given $W$.\n\nGiven a sequence of edge sets $E_0, E_1, \\ldots$ constructed via either of the above methods, we can form a binary graph sequence $\\bar{E}_0, \\bar{E}_1, \\ldots$ by setting $\\bar{E}_i$ to have the same edges as $E_i$ except with multiplicity 1. Although this binary graph is not itself edge exchangeable, it inherits many of the properties (such as sparsity, as shown in Section 5) of the underlying edge-exchangeable multigraph.\n\nThe choice of the distribution on the measure $W$ has a strong influence on the properties of the resulting edge-exchangeable graph sampled via one of the above methods. For example, one choice is to set $w_{\\{i,j\\}} = w_i w_j$, where the $(w_i)_i$ are a countable infinity of random values generated according to a Poisson point process (PPP). We say that $(w_i)_i$ is distributed according to a Poisson point process parameterized by rate measure $\\nu$, written $(w_i)_i \\sim \\mathrm{PPP}(\\nu)$, if (a) $\\#\\{i : w_i \\in A\\} \\sim \\mathrm{Poisson}(\\nu(A))$ for any set $A$ with finite measure $\\nu(A)$ and (b) the counts $\\#\\{i : w_i \\in A_j\\}$ are independent random variables across any finite collection of disjoint sets $(A_j)_{j=1}^{J}$. In Section 5 we examine a particular example of this graph frequency model and demonstrate that sparsity is possible in edge-exchangeable graphs.\n\n(a) Graph frequency model (fixed $y$, $n$ steps)\n\n(b) Caron\u2013Fox, PPP on $[0, y] \\times [0, y]$ (1 step, $y$ grows)\n\nFigure 2: A comparison of a graph frequency model (Section 3 and Equation (2)) and the generative model of Caron and Fox [12]. 
Any interval $[0, y]$ contains a countably infinite number of atoms with a nonzero weight in the random measure; a draw from the random measure is plotted at the top (and repeated on the right side). Each atom corresponds to a latent vertex. Each point $(\\theta_i, \\theta_j)$ corresponds to a latent edge. Darker point colors on the left indicate greater edge multiplicities. On the left, more latent edges are instantiated as more steps $n$ are taken. On the right, the edges within $[0, y]^2$ are fixed, but more edges are instantiated as $y$ grows.\n\n4 Related work and connection to nonparametric Bayes\n\nGiven a unique label $\\theta_i$ for each vertex $i \\in \\mathbb{N}$, and denoting by $g_{ij} = g_{ji}$ the number of undirected edges between vertices $i$ and $j$, the graph itself can be represented as the discrete random measure $G = \\sum_{i,j} g_{ij} \\delta_{(\\theta_i, \\theta_j)}$ on $\\mathbb{R}_+^2$. A different notion of exchangeability for graphs than the ones in Section 2 can be phrased for such atomic random measures: a point process $G$ on $\\mathbb{R}_+^2$ is (jointly) exchangeable if, for all finite permutations $\\pi$ of $\\mathbb{N}$ and all $h > 0$,\n\n$$\\big(G(A_i \\times A_j)\\big)_{(i,j) \\in \\mathbb{N}^2} \\overset{d}{=} \\big(G(A_{\\pi(i)} \\times A_{\\pi(j)})\\big)_{(i,j) \\in \\mathbb{N}^2}, \\qquad \\text{where } A_i := [h \\cdot (i-1), h \\cdot i].$$\n\nThis form of exchangeability, which we refer to as Kallenberg exchangeability, can intuitively be viewed as invariance of the graph distribution to relabeling of the vertices, which are now embedded in $\\mathbb{R}_+^2$. As such it is analogous to vertex exchangeability, but for discrete random measures [12, Sec. 4.1]. Exchangeability for random measures was introduced by Aldous [2], and a representation theorem was given by Kallenberg [17, 18, Ch. 9]. The use of Kallenberg exchangeability for modeling graphs was first proposed by Caron and Fox [12], and then characterized in greater generality by Veitch and Roy [22] and Borgs et al. [6]. 
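To make the definition concrete, the sketch below computes the array of block counts G(A_i × A_j) from a finite list of atoms and then applies a permutation to the block indices. It only builds the two arrays that the definition compares. Kallenberg exchangeability itself is a statement about their joint distributions, which a single realization cannot check. The function names and the finite window of n_blocks blocks are illustrative assumptions. A point on a shared block boundary is assigned to the upper block; for continuous labels this happens with probability zero.

```python
import numpy as np

# Minimal sketch: finite list of atoms (theta_i, theta_j) with weights g_ij.
# Function names and the finite window of n_blocks blocks are hypothetical.


def block_counts(points, weights, h, n_blocks):
    '''Array of G(A_i x A_j) for A_i = [h(i-1), h i], i = 1..n_blocks (0-based here).'''
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    idx = np.floor(pts / h).astype(int)       # block index of each coordinate
    keep = np.all(idx < n_blocks, axis=1)     # drop atoms outside the window
    counts = np.zeros((n_blocks, n_blocks))
    np.add.at(counts, (idx[keep, 0], idx[keep, 1]), w[keep])  # sum g_ij per block
    return counts


def permute_blocks(counts, perm):
    '''Array whose (i, j) entry is G(A_perm[i] x A_perm[j]).'''
    perm = np.asarray(perm)
    return counts[np.ix_(perm, perm)]


if __name__ == '__main__':
    # Symmetric atoms: each undirected edge appears at (theta_i, theta_j) and (theta_j, theta_i).
    atoms = [(0.5, 1.5), (1.5, 0.5), (2.2, 2.7), (2.7, 2.2)]
    counts = block_counts(atoms, [2, 2, 1, 1], h=1.0, n_blocks=3)
    print(counts)
    print(permute_blocks(counts, [2, 0, 1]))
```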
Edge exchangeability is distinct from Kallenberg exchangeability, as shown by the following example.\n\nExample 4.1 (Edge exchangeable but not Kallenberg exchangeable). Consider the graph frequency model developed in Section 3, with $w_{\\{i,j\\}} = (ij)^{-2}$ and $\\theta_{\\{i,j\\}} = \\{i, j\\}$. Since the edges at each step are drawn iid given $W$, the graph sequence is edge exchangeable. However, the corresponding graph measure $G = \\sum_{i,j} n_{ij} \\delta_{(i,j)}$ (where $n_{ij} = n_{ji} \\sim \\mathrm{Binom}(N, (ij)^{-2})$) is not Kallenberg exchangeable, since the probability of generating edge $\\{i, j\\}$ is directly related to the positions $(i, j)$ and $(j, i)$ in $\\mathbb{R}_+^2$ of the corresponding atoms in $G$ (in particular, the probability is decreasing in $ij$). \u25a0\n\nOur graph frequency model is reminiscent of the Caron and Fox [12] generative model, but has a number of key differences. At a high level, this earlier model generates a weight measure $W = \\sum_{i,j} w_{ij} \\delta_{(\\theta_i, \\theta_j)}$ (Caron and Fox [12] used, in particular, the outer product of a completely random measure), and the graph measure $G$ is constructed by sampling $g_{ij}$ once given $w_{ij}$ for each pair $i, j$. To create a finite graph, the graph measure $G$ is restricted to the subset $[0, y] \\times [0, y] \\subset \\mathbb{R}_+^2$ for $0 < y < \\infty$; to create a projective growing graph sequence, the value of $y$ is increased. By contrast, in the analogous graph frequency model of the present work, $y$ is fixed, and we grow the network by repeatedly sampling the number of edges $g_{ij}$ between vertices $i$ and $j$ and summing the result. Thus, in the Caron and Fox [12] model, a latent infinity of vertices (only finitely many of which are active) are added to the network each time $y$ increases. In our graph frequency model, there is a single collection of latent vertices, which are all gradually activated by increasing the number of samples that generate edges between the vertices. 
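The activation behavior described above can be illustrated with a fixed, finite set of latent vertices. The rates never change, and increasing the number of steps only turns more latent vertices into active ones. This is a sketch under three assumptions. First, the rates are hypothetical values chosen by hand, not a draw from any process in the paper. Second, the latent collection is truncated to finitely many vertices. Third, each pair {i, j} receives an edge at each step with probability w_i w_j, which is the Bernoulli choice that Section 5 makes.

```python
import numpy as np


def active_vertex_counts(weights, n_steps, rng):
    '''Number of active latent vertices after each step n = 1..n_steps.

    The latent vertex set (one entry per weight) is fixed in advance; each step adds
    edge {i, j}, i < j, with probability w_i * w_j, so vertices are only ever activated.
    '''
    w = np.asarray(weights, dtype=float)
    iu, ju = np.triu_indices(len(w), k=1)   # all unordered pairs i < j, no self-loops
    p = w[iu] * w[ju]
    active = np.zeros(len(w), dtype=bool)
    counts = []
    for _ in range(n_steps):
        hit = rng.random(p.size) < p        # one Bernoulli(w_i w_j) draw per pair
        active[iu[hit]] = True
        active[ju[hit]] = True
        counts.append(int(active.sum()))
    return counts


if __name__ == '__main__':
    rng = np.random.default_rng(1)
    # Hypothetical fixed rates in (0, 1); not drawn from any specific process.
    weights = 0.9 * np.arange(1, 201, dtype=float) ** -0.8
    counts = active_vertex_counts(weights, 100, rng)
    print(counts[0], counts[9], counts[99])
```

The active set is only ever extended, so the returned counts are non-decreasing, which reflects the projectivity of the sequence.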
See Figure 2 for an illustration.\n\nIncreasing $n$ in the graph frequency model has the interpretation of both (a) time passing and (b) new individuals joining a network because they have formed a connection that was not previously there. In particular, only latent individuals that will eventually join the network are considered. This behavior is analogous to the well-known behavior of other nonparametric Bayesian models, such as the Chinese restaurant process (CRP). In this analogy, the Dirichlet process (DP) corresponds to our graph frequency model, and the clusters instantiated by the CRP correspond to the vertices that are active after $n$ steps. In the DP, only latent clusters that will eventually appear in the data are modeled. Since the graph frequency setting is stationary like the DP/CRP, it may be more straightforward to develop approximate Bayesian inference algorithms, e.g., via truncation [11].\n\nEdge exchangeability first appeared in work by Crane and Dempsey [13, 14], Williamson [23], Broderick and Cai [7, 8], and Cai and Broderick [10]. Broderick and Cai [7, 8] established the notion of edge exchangeability used here and provided characterizations via exchangeable partitions and feature allocations, as in Appendix C. Broderick and Cai [7] and Cai and Broderick [10] developed a frequency model based on weights $(w_i)_i$ generated from a Poisson process and studied several types of power laws in the model. Crane and Dempsey [13] established a similar notion of edge exchangeability in the context of a larger statistical modeling framework. Crane and Dempsey [13, 14] provided sparsity and power law results for the case where the weights $(w_i)_i$ are generated from a Pitman-Yor process, along with simulations of power law degree distributions. 
Williamson [23] described a similar notion of edge exchangeability and developed an edge-exchangeable model in which the weights $(w_i)_i$ are generated from a Dirichlet process, together with a mixture model extension and an efficient Bayesian inference procedure. In work concurrent to the present paper, Crane and Dempsey [15] re-examined edge exchangeability, provided a representation theorem, and studied sparsity and power laws for the same model based on Pitman-Yor weights. By contrast, we obtain sparsity results here across all Poisson point process-based graph frequency models of the form in Equation (2) below, and we use a specific three-parameter beta process rate measure only for simulations in Section 6.\n\n5 Sparsity in Poisson process graph frequency models\n\nWe now demonstrate that, unlike vertex exchangeability, edge exchangeability allows for sparsity in random graph sequences. We develop a class of sparse, edge-exchangeable multigraph sequences via the Poisson point process construction introduced in Section 3, along with their binary restrictions.\n\nModel. Let $W$ be a Poisson process on $[0, 1]$ with a nonatomic, $\\sigma$-finite rate measure $\\nu$ satisfying $\\nu([0, 1]) = \\infty$ and $\\int_0^1 w \\, \\nu(dw) < \\infty$. These two conditions on $\\nu$ guarantee that $W$ is a countably infinite collection of rates in $[0, 1]$ and that $\\sum_{w \\in W} w < \\infty$ almost surely. We can use $W$ to construct the set of rates: $w_{\\{i,j\\}} = w_i w_j$ if $i \\ne j$, and $w_{\\{i,i\\}} = 0$. The edge labels $\\theta_{\\{i,j\\}}$ are unimportant in characterizing sparsity, and so can be ignored.\n\nTo use the multiple-edges-per-step graph frequency model from Section 3, we let $f(\\cdot \\mid w)$ be Bernoulli with probability $w$. Since edge $\\{i, j\\}$ is added in each step with probability $w_i w_j$, its multiplicity $M_{\\{i,j\\}}$ after $n$ steps has a binomial distribution with parameters $n$ and $w_i w_j$. Note that self-loops are avoided by setting $w_{\\{i,i\\}} = 0$. 
Therefore, the graph after $n$ steps is described by:\n\n$$W \\sim \\mathrm{PPP}(\\nu), \\qquad M_{\\{i,j\\}} \\overset{\\text{ind}}{\\sim} \\mathrm{Binom}(n, w_i w_j) \\quad \\text{for } i < j \\in \\mathbb{N}. \\qquad (2)$$\n\nAs mentioned earlier, this generative model yields an edge-exchangeable graph, with edge multiset $E_n$ containing $\\{i, j\\}$ with multiplicity $M_{\\{i,j\\}}$, and active vertices $V_n = \\{i : \\sum_j M_{\\{i,j\\}} > 0\\}$. Although this model generates multigraphs, it can be modified to sample a binary graph $(\\bar{V}_n, \\bar{E}_n)$ by setting $\\bar{V}_n = V_n$ and $\\bar{E}_n$ to the set of edges $\\{i, j\\}$ such that $\\{i, j\\}$ has multiplicity $\\geq 1$ in $E_n$. We can express the number of vertices and edges, in the multi- and binary graphs respectively, as\n\n$$|\\bar{V}_n| = |V_n| = \\sum_i \\mathbf{1}\\Big(\\sum_{j \\ne i} M_{\\{i,j\\}} > 0\\Big), \\qquad |E_n| = \\frac{1}{2} \\sum_{i \\ne j} M_{\\{i,j\\}}, \\qquad |\\bar{E}_n| = \\frac{1}{2} \\sum_{i \\ne j} \\mathbf{1}\\big(M_{\\{i,j\\}} > 0\\big).$$\n\nMoments. Recall that a sequence of graphs is considered sparse if $|E_n| = o(|V_n|^2)$. Thus, sparsity in the present setting is an asymptotic property of a random graph sequence. Rather than consider the asymptotics of the (dependent) random sequences $|E_n|$ and $|V_n|$ in concert, Lemma 5.1 allows us to consider the asymptotics of their first moments, which are deterministic sequences and can be analyzed separately. We use $\\sim$ to denote asymptotic equivalence, i.e., $a_n \\sim b_n \\iff \\lim_{n \\to \\infty} a_n / b_n = 1$. For details on our asymptotic notation and proofs for this section, see Appendix D.\n\nLemma 5.1. The number of vertices and edges for both the multi- and binary graphs satisfy\n\n$$|\\bar{V}_n| = |V_n| \\overset{a.s.}{\\sim} \\mathbb{E}(|V_n|), \\qquad |E_n| \\overset{a.s.}{\\sim} \\mathbb{E}(|E_n|), \\qquad |\\bar{E}_n| \\overset{a.s.}{\\sim} \\mathbb{E}(|\\bar{E}_n|), \\qquad n \\to \\infty.$$\n\nThus, we can examine the asymptotic behavior of the random numbers of edges and vertices by examining the asymptotic behavior of their expectations, which are provided by Lemma 5.2.\n\nLemma 5.2. 
The expected numbers of vertices and edges for the multi- and binary graphs are\n\n$$\\mathbb{E}(|\\bar{V}_n|) = \\mathbb{E}(|V_n|) = \\int \\Big[1 - \\exp\\Big(-\\int \\big(1 - (1 - wv)^n\\big) \\, \\nu(dv)\\Big)\\Big] \\, \\nu(dw),$$\n\n$$\\mathbb{E}(|E_n|) = \\frac{n}{2} \\iint wv \\, \\nu(dw) \\, \\nu(dv), \\qquad \\mathbb{E}(|\\bar{E}_n|) = \\frac{1}{2} \\iint \\big(1 - (1 - wv)^n\\big) \\, \\nu(dw) \\, \\nu(dv).$$\n\nSparsity. We are now equipped to characterize the sparsity of this random graph sequence:\n\nTheorem 5.3. Suppose $\\nu$ has a regularly varying tail, i.e., there exist $\\alpha \\in (0, 1)$ and $\\ell : \\mathbb{R}_+ \\to \\mathbb{R}_+$ such that\n\n$$\\int_x^1 \\nu(dw) \\sim x^{-\\alpha} \\ell(x^{-1}), \\quad x \\to 0, \\qquad \\text{and} \\qquad \\forall c > 0, \\ \\lim_{x \\to \\infty} \\frac{\\ell(cx)}{\\ell(x)} = 1.$$\n\nThen as $n \\to \\infty$,\n\n$$|V_n| \\overset{a.s.}{=} \\Theta\\big(n^{\\alpha} \\ell(n)\\big), \\qquad |E_n| \\overset{a.s.}{=} \\Theta(n), \\qquad |\\bar{E}_n| \\overset{a.s.}{=} O\\Big(\\ell(n^{1/2}) \\min\\big(n^{\\frac{1+\\alpha}{2}}, \\ell(n) \\, n^{\\frac{3\\alpha}{2}}\\big)\\Big).$$\n\nTheorem 5.3 implies that the multigraph is sparse when $\\alpha \\in (1/2, 1)$ (since $|E_n| / |V_n|^2 = \\Theta(n^{1 - 2\\alpha} \\ell(n)^{-2})$), and that the restriction to the binary graph is sparse for any $\\alpha \\in (0, 1)$. See Remark D.7 for a discussion. 
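The sparsity claims follow from comparing exponents. As a worked check, the sketch below treats the slowly varying factor ℓ as a constant. This is an assumption that drops the ℓ terms from Theorem 5.3. The sketch then tests whether the edge exponent is below twice the vertex exponent. For the binary graph the theorem gives only an upper bound, so a positive answer there is a sufficient condition for sparsity. The function names are hypothetical.

```python
def theorem_5_3_exponents(alpha):
    '''Power-of-n exponents in Theorem 5.3, assuming the slowly varying ell is constant.'''
    if not 0 < alpha < 1:
        raise ValueError('alpha must lie in (0, 1)')
    vertices = alpha                                      # |V_n| = Theta(n^alpha)
    multi_edges = 1.0                                     # |E_n| = Theta(n)
    binary_edges = min((1 + alpha) / 2, 3 * alpha / 2)    # exponent of the O(.) bound
    return vertices, multi_edges, binary_edges


def is_sparse(edge_exponent, vertex_exponent):
    '''For pure powers, |E_n| = o(|V_n|^2) iff edge_exponent < 2 * vertex_exponent.'''
    return edge_exponent < 2 * vertex_exponent


if __name__ == '__main__':
    for alpha in (0.3, 0.5, 0.6, 0.9):
        v, e, eb = theorem_5_3_exponents(alpha)
        print(alpha, 'multigraph sparse:', is_sparse(e, v),
              'binary sparse:', is_sparse(eb, v))
```

For α = 0.3 this reports a dense multigraph but a sparse binary graph, since min((1 + α)/2, 3α/2) = 0.45 < 0.6 = 2α. At α = 0.5 the multigraph exponents tie (1 = 2α). This is why the multigraph statement is restricted to α ∈ (1/2, 1).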
Thus, edge-exchangeable random graph sequences allow for a wide range of sparse and dense behavior.\n\n6 Simulations\n\nIn this section, we explore the behavior of graphs generated by the model from Section 5 via simulation, with the primary goal of empirically demonstrating that the model produces sparse graphs. We consider the case when the Poisson process generating the weights in Equation (2) has the rate measure of a three-parameter beta process (3-BP) on $(0, 1)$ [9, 21]:\n\n$$\\nu(dw) = \\gamma \\frac{\\Gamma(1 + \\beta)}{\\Gamma(1 - \\alpha) \\Gamma(\\alpha + \\beta)} w^{-1-\\alpha} (1 - w)^{\\alpha + \\beta - 1} \\, dw, \\qquad (3)$$\n\nwith mass $\\gamma > 0$, concentration $\\beta > 0$, and discount $\\alpha \\in (0, 1)$. In order for the 3-BP to have finite total mass $\\sum_j w_j < \\infty$, we require that $\\beta > -\\alpha$. We draw realizations of the weights from a 3-BP$(\\gamma, \\beta, \\alpha)$ according to the stick-breaking representation given by Broderick, Jordan, and Pitman [9]. That is, the $w_i$ are the atom weights of the measure $W$ for\n\n$$W = \\sum_{i=1}^{\\infty} \\sum_{j=1}^{C_i} V_{i,j}^{(i)} \\prod_{\\ell=1}^{i-1} \\big(1 - V_{i,j}^{(\\ell)}\\big) \\delta_{\\psi_{i,j}}, \\qquad C_i \\overset{iid}{\\sim} \\mathrm{Pois}(\\gamma), \\qquad V_{i,j}^{(\\ell)} \\overset{ind}{\\sim} \\mathrm{Beta}(1 - \\alpha, \\beta + \\ell \\alpha), \\qquad \\psi_{i,j} \\overset{iid}{\\sim} B_0,$$\n\nand any continuous (i.e., non-atomic) choice of distribution $B_0$.\n\nSince simulating an infinite number of atoms is not possible, we truncate the outer summation in $i$ to 2000 rounds, resulting in $\\sum_{i=1}^{2000} C_i$ weights. The parameters of the beta process were fixed to $\\gamma = 3$ and $\\beta = 1$, as they do not influence the sparsity of the resulting graph frequency model, and we varied the discount parameter $\\alpha$. Given a single draw $W$ (at some specific discount $\\alpha$), we then simulated the edges of the graph, where the number of Bernoulli draws $N$ varied between 50 and 2000.\n\n(a) Multigraph edges vs. active vertices\n\n(b) Binary graph edges vs. active vertices\n\nFigure 3: Data simulated from a graph frequency model with weights generated according to a 3-BP. Colors represent different random draws. The dashed line has a slope of 2.\n\nFigure 3a shows how the number of edges varies with the total number of active vertices for the multigraph, with different colors representing different random seeds. To check whether the generated graph was sparse, we estimated the growth exponent as the slope of the data points on a log-log scale. In all plots, the black dashed line has slope 2. In the multigraph, we found that for the discount parameter settings $\\alpha = 0.6, 0.7$, the slopes were below 2; for $\\alpha = 0, 0.3$, the slopes were greater than 2. This corresponds to our theoretical results: for $\\alpha < 0.5$ the multigraph is dense with slope greater than 2, and for $\\alpha > 0.5$ the multigraph is sparse with slope less than 2. Furthermore, the sparse graphs exhibit power law relationships between the number of edges and vertices, i.e., $|E_N| \\overset{a.s.}{\\sim} c \\, |V_N|^b$ as $N \\to \\infty$, where $b \\in (1, 2)$, as suggested by the linear relationship between the quantities on a log-log scale in the plots. Note that the binary graph necessarily has no more edges than the multigraph, so this plot suggests that the binary graph frequency model can also capture sparsity. Figure 3b confirms this observation; it shows how the number of edges varies with the number of active vertices for the binary graph. In this case, across $\\alpha \\in (0, 1)$, we observe slopes that are less than 2. 
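The simulation procedure can be sketched end to end in three stages. First, draw truncated 3-BP weights by stick-breaking. Second, grow the multiplicities M_{i,j} ~ Binom(N, w_i w_j) over an increasing list of N. Third, fit the log-log slope of edges against active vertices. This is a minimal sketch, not the code used for Figure 3. The function names are hypothetical, and the truncation at 200 rounds in the demo is an assumption chosen for speed (the text uses 2000), so slopes from such small truncations are only indicative. Multiplicities are incremented by Binom(N - N_prev, w_i w_j) draws so that the graphs for successive N are nested. The atom locations ψ_{i,j} are not needed for counting and are omitted.

```python
import numpy as np

# Minimal sketch of the Section 6 pipeline; function names are hypothetical and
# the truncation level is an assumption (the text uses 2000 rounds).


def three_bp_weights(gamma, beta, alpha, rounds, rng):
    '''Truncated stick-breaking draw of 3-BP atom weights (rounds i = 1..rounds).'''
    weights = []
    for i in range(1, rounds + 1):
        for _ in range(rng.poisson(gamma)):                # C_i ~ Pois(gamma)
            ells = np.arange(1, i + 1)
            v = rng.beta(1 - alpha, beta + ells * alpha)   # V^(l) ~ Beta(1 - alpha, beta + l alpha)
            weights.append(v[-1] * np.prod(1.0 - v[:-1]))  # V^(i) * prod_{l < i} (1 - V^(l))
    return np.array(weights)


def edge_vertex_counts(weights, steps, rng):
    '''(N, active vertices, multigraph edges, binary edges) for increasing N.'''
    iu, ju = np.triu_indices(len(weights), k=1)            # pairs i < j, no self-loops
    p = weights[iu] * weights[ju]
    m = np.zeros(p.size, dtype=np.int64)
    prev, rows = 0, []
    for n in steps:                                        # steps must be increasing
        m += rng.binomial(n - prev, p)                     # Bernoulli draws for steps prev+1..n
        prev = n
        present = m > 0
        n_active = np.union1d(iu[present], ju[present]).size
        rows.append((n, n_active, int(m.sum()), int(present.sum())))
    return rows


def loglog_slope(x, y):
    '''Least-squares slope of log(y) against log(x).'''
    slope, _ = np.polyfit(np.log(x), np.log(y), 1)
    return float(slope)


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    w = three_bp_weights(gamma=3.0, beta=1.0, alpha=0.6, rounds=200, rng=rng)
    rows = edge_vertex_counts(w, [50, 100, 200, 500, 1000, 2000], rng)
    _, v, e_multi, e_bin = map(np.array, zip(*rows))
    print('multigraph slope:', loglog_slope(v, e_multi))
    print('binary slope:', loglog_slope(v, e_bin))
```

Running the demo for several values of α and comparing the fitted slopes with 2 mirrors the check described in the text. Slopes below 2 are consistent with sparsity.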
This agrees with our theory from Section 5, which states that the binary graph is sparse for any \u03b1 \u2208 (0, 1).\n\n7 Conclusions\n\nWe have proposed an alternative form of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We have demonstrated that edge-exchangeable graph sequences, unlike traditional vertex-exchangeable sequences, can be sparse by developing a class of edge-exchangeable graph frequency models that provably exhibit sparsity. Simulations using edge frequencies drawn according to a three-parameter beta process con\ufb01rm our theoretical results regarding sparsity. Our results suggest that a variety of future directions would be fruitful, including theoretically characterizing different types of power laws within graph frequency models, characterizing the use of truncation within graph frequency models as a means for approximate Bayesian inference in graphs, and understanding the full range of distributions over sparse, edge-exchangeable graph sequences.\n\nAcknowledgments\n\nWe would like to thank Bailey Fosdick and Tyler McCormick for helpful conversations.\n\nReferences\n[1] D. J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581\u2013598, 1981.\n\n[2] D. J. Aldous. Exchangeability and related topics. In \u00c9cole d\u2019\u00e9t\u00e9 de probabilit\u00e9s de Saint-Flour, XIII\u20141983, volume 1117 of Lecture Notes in Math., pages 1\u2013198. Springer, Berlin, 1985.\n\n[3] B. Bollob\u00e1s, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Structures Algorithms, 31(1):3\u2013122, 2007.\n\n[4] C. Borgs, J. T. Chayes, H. Cohn, and Y. Zhao. An L^p theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. arXiv e-print 1401.2906, 2014.\n\n[5] C. Borgs, J.
T. Chayes, H. Cohn, and S. Ganguly. Consistent nonparametric estimation for heavy-tailed sparse graphs. arXiv e-print 1401.1137, 2015.\n\n[6] C. Borgs, J. T. Chayes, H. Cohn, and N. Holden. Sparse exchangeable graphs and their limits via graphon processes. arXiv e-print 1601.07134, 2016.\n\n[7] T. Broderick and D. Cai. Edge-exchangeable graphs, sparsity, and power laws. In NIPS 2015 Workshop on Bayesian Nonparametrics: The Next Generation, 2015.\n\n[8] T. Broderick and D. Cai. Edge-exchangeable graphs and sparsity. In NIPS 2015 Workshop on Networks in the Social and Informational Sciences, 2015.\n\n[9] T. Broderick, M. I. Jordan, and J. Pitman. Beta processes, stick-breaking and power laws. Bayesian Analysis, 7(2):439\u2013475, 2012.\n\n[10] D. Cai and T. Broderick. Completely random measures for modeling power laws in sparse graphs. In NIPS 2015 Workshop on Networks in the Social and Informational Sciences, 2015.\n\n[11] T. Campbell, J. Huggins, J. How, and T. Broderick. Truncated random measures. arXiv e-print 1603.00861, 2016.\n\n[12] F. Caron and E. Fox. Sparse graphs using exchangeable random measures. arXiv e-print 1401.1137v3, 2015.\n\n[13] H. Crane and W. Dempsey. A framework for statistical network modeling. arXiv e-print 1509.08185, 2015.\n\n[14] H. Crane and W. Dempsey. Atypical scaling behavior persists in real world interaction networks. arXiv e-print 1509.08184, 2015.\n\n[15] H. Crane and W. Dempsey. Edge exchangeable models for network data. arXiv e-print 1603.04571, 2016.\n\n[16] D. N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 1979.\n\n[17] O. Kallenberg. Exchangeable random measures in the plane. Journal of Theoretical Probability, 3(1):81\u2013136, 1990.\n\n[18] O. Kallenberg. Probabilistic symmetries and invariance principles. Probability and its Applications. Springer, New York, 2005.\n\n[19] J. R. Lloyd, P. Orbanz, Z.
Ghahramani, and D. M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In NIPS 25, 2012.\n\n[20] P. Orbanz and D. M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):437\u2013461, 2015.\n\n[21] Y. W. Teh and D. G\u00f6r\u00fcr. Indian buffet processes with power-law behavior. In NIPS 22, 2009.\n\n[22] V. Veitch and D. M. Roy. The class of random graphs arising from exchangeable random measures. arXiv e-print 1512.03099, 2015.\n\n[23] S. Williamson. Nonparametric network models for link prediction. Journal of Machine Learning Research, 17:1\u201321, 2016.\n\n[24] P. J. Wolfe and S. C. Olhede. Nonparametric graphon estimation. arXiv e-print 1309.5936, 2013.", "award": [], "sourceid": 2111, "authors": [{"given_name": "Diana", "family_name": "Cai", "institution": "University of Chicago"}, {"given_name": "Trevor", "family_name": "Campbell", "institution": "MIT"}, {"given_name": "Tamara", "family_name": "Broderick", "institution": "MIT"}]}