{"title": "Scalable Influence Estimation in Continuous-Time Diffusion Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 3147, "page_last": 3155, "abstract": "If a piece of information is released from a media site, can it spread, in 1 month, to a million web pages? This influence estimation problem is very challenging since both the time-sensitive nature of the problem and the issue of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with $|\\Vcal|$ nodes and $|\\Ecal|$ edges to an accuracy of $\\epsilon$ using  $n=O(1/\\epsilon^2)$ randomizations and up to logarithmic factors $O(n|\\Ecal|+n|\\Vcal|)$ computations. When used as a subroutine in a greedy influence maximization algorithm, our proposed method is guaranteed to find a set of nodes with an influence of at least $(1 - 1/e)\\operatorname{OPT} - 2\\epsilon$, where $\\operatorname{OPT}$ is the optimal value. Experiments on both synthetic and real-world data show that the proposed method can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence.", "full_text": "Scalable In\ufb02uence Estimation in\n\nContinuous-Time Diffusion Networks\n\nNan Du\u2217\n\nLe Song\u2217\n\nManuel Gomez-Rodriguez\u2020\n\nHongyuan Zha\u2217\n\nGeorgia Institute of Technology\u2217\n\nMPI for Intelligent Systems\u2020\n\ndunan@gatech.edu\n\nlsong@cc.gatech.edu\n\nmanuelgr@tue.mpg.de\n\nzha@cc.gatech.edu\n\nAbstract\n\nIf a piece of information is released from a media site, can we predict whether\nit may spread to one million web pages, in a month ? 
This influence estimation\nproblem is very challenging since both the time-sensitive nature of the task and\nthe requirement of scalability need to be addressed simultaneously. In this paper,\nwe propose a randomized algorithm for influence estimation in continuous-time\ndiffusion networks. Our algorithm can estimate the influence of every node in\na network with |V| nodes and |E| edges to an accuracy of \u03b5 using n = O(1/\u03b5^2)\nrandomizations and, up to logarithmic factors, O(n|E| + n|V|) computations. When\nused as a subroutine in a greedy influence maximization approach, our proposed\nalgorithm is guaranteed to find a set of C nodes with an influence of at least\n(1 \u2212 1/e) OPT \u2212 2C\u03b5, where OPT is the optimal value. Experiments on both\nsynthetic and real-world data show that the proposed algorithm can easily scale\nup to networks of millions of nodes while significantly improving over the previous\nstate of the art in terms of the accuracy of the estimated influence and the quality\nof the selected nodes in maximizing the influence.\n\n1 Introduction\n\nMotivated by applications in viral marketing [1], researchers have been studying the influence\nmaximization problem: find a set of nodes whose initial adoptions of a certain idea or product can trigger,\nin a time window, the largest expected number of follow-ups. For this purpose, it is essential to\naccurately and efficiently estimate the number of follow-ups of an arbitrary set of source nodes within\nthe given time window. This is a challenging problem because we first need to accurately model the\ntiming information in cascade data and then design a scalable algorithm to deal with large real-world\nnetworks. Most previous work in the literature tackled the influence estimation and maximization\nproblems for an infinite time window [1, 2, 3, 4, 5, 6]. 
However, in most cases, influence must be\nestimated or maximized up to a given time, i.e., a finite time window must be considered [7]. For\nexample, a marketer would like to have her advertisement viewed by a million people in one month,\nrather than in one hundred years. Such a time-sensitive requirement renders algorithms which\nonly consider static information, such as network topologies, inappropriate in this context.\nA sequence of recent work has argued that modeling cascade data and information diffusion using\ncontinuous-time diffusion networks can provide significantly more accurate models than\ndiscrete-time models [8, 9, 10, 11, 12, 13, 14, 15]. There is a twofold rationale behind this modeling choice.\nFirst, since follow-ups occur asynchronously, continuous variables seem more appropriate to\nrepresent them. Artificially discretizing the time axis into bins introduces additional tuning parameters,\nlike the bin size, which are not easy to choose optimally. Second, discrete time models can only\ndescribe transmission times which obey an exponential density, and hence can be too restricted\nto capture the rich temporal dynamics in the data. Extensive experimental comparisons on both\nsynthetic and real world data showed that continuous-time models yield significant improvement\nin settings such as recovering hidden diffusion network structures from cascade data [8, 10] and\npredicting the timings of future events [9, 14].\nHowever, estimating and maximizing influence based on continuous-time diffusion models also\nentails many challenges. First, the influence estimation problem in this setting is a difficult graphical\nmodel inference problem, i.e., computing the marginal density of continuous variables in loopy\ngraphical models. The exact answer can be computed only for very special cases. For example,\nGomez-Rodriguez et al. 
[7] have shown that the problem can be solved exactly when the\ntransmission functions are exponential densities, by using continuous time Markov processes theory.\nHowever, the computational complexity of such an approach, in general, scales exponentially with the\nsize and density of the network. Moreover, extending the approach to deal with arbitrary\ntransmission functions would require additional nontrivial approximations which would further increase\nthe computational complexity. Second, it is unclear how to scale up influence estimation and\nmaximization algorithms based on continuous-time diffusion models to millions of nodes. Especially in\nthe maximization case, even a naive sampling algorithm for approximate inference is not scalable:\nn sampling rounds need to be carried out for each node to estimate the influence, which results in an\noverall O(n|V||E|) algorithm. Thus, our goal is to design a scalable algorithm which can perform\ninfluence estimation and maximization in this regime of networks with millions of nodes.\nIn particular, we propose CONTINEST (Continuous-Time Influence Estimation), a scalable\nrandomized algorithm for influence estimation in a continuous-time diffusion network with heterogeneous\nedge transmission functions. The key idea of the algorithm is to view the problem from the\nperspective of graphical model inference, and reduce the problem to a neighborhood estimation problem\nin graphs. Our algorithm can estimate the influence of every node in a network with |V| nodes and\n|E| edges to an accuracy of \u03b5 using n = O(1/\u03b5^2) randomizations and, up to logarithmic factors,\nO(n|E| + n|V|) computations. When used as a subroutine in a greedy influence maximization\nalgorithm, our proposed algorithm is guaranteed to find a set of C nodes with an influence of at least\n(1 \u2212 1/e) OPT \u2212 2C\u03b5, where OPT is the optimal value. 
Finally, we validate CONTINEST on both\nin\ufb02uence estimation and maximization problems over large synthetic and real world datasets. In\nterms of in\ufb02uence estimation, CONTINEST is much closer to the true in\ufb02uence and much faster\nthan other state-of-the-art methods. With respect to the in\ufb02uence maximization, CONTINEST al-\nlows us to \ufb01nd a set of sources with greater in\ufb02uence than other state-of-the-art methods.\n\n2 Continuous-Time Diffusion Networks\nFirst, we revisit the continuous-time generative model for cascade data in social networks introduced\nin [10]. The model associates each edge j \u2192 i with a transmission function, fji(\u03c4ji), a density over\ntime, in contrast to previous discrete-time models which associate each edge with a \ufb01xed infection\nprobability [1]. Moreover, it also differs from discrete-time models in the sense that events in a\ncascade are not generated iteratively in rounds, but event timings are sampled directly from the\ntransmission function in the continuous-time model.\nContinuous-Time Independent Cascade Model. Given a directed contact network, G = (V,E),\nwe use a continuous-time independent cascade model for modeling a diffusion process [10]. The\nprocess begins with a set of infected source nodes, A, initially adopting certain contagion (idea,\nmeme or product) at time zero. The contagion is transmitted from the sources along their out-going\nedges to their direct neighbors. Each transmission through an edge entails a random spreading time,\n\u03c4, drawn from a density over time, fji(\u03c4 ). We assume transmission times are independent and\npossibly distributed differently across edges. Then, the infected neighbors transmit the contagion\nto their respective neighbors, and the process continues. We assume that an infected node remains\ninfected for the entire diffusion process. 
Thus, if a node i is infected by multiple neighbors, only the\nneighbor that first infects node i will be the true parent. As a result, although the contact network\ncan be an arbitrary directed network, each cascade (a vector of event timing information from the\nspread of a contagion) induces a Directed Acyclic Graph (DAG).\nHeterogeneous Transmission Functions. Formally, the transmission function f_ji(t_i | t_j) for\ndirected edge j \u2192 i is the conditional density of node i getting infected at time t_i given that node j\nwas infected at time t_j. We assume it is shift invariant: f_ji(t_i | t_j) = f_ji(\u03c4_ji), where \u03c4_ji := t_i \u2212 t_j,\nand nonnegative: f_ji(\u03c4_ji) = 0 if \u03c4_ji < 0. Both parametric transmission functions, such as the\nexponential and Rayleigh function [10, 16], and nonparametric function [8] can be used and estimated\nfrom cascade data (see Appendix A for more details).\nShortest-Path property. The independent cascade model has a useful property we will use later:\ngiven a sample of transmission times of all edges, the time t_i taken to infect a node i is the length\nof the shortest path in G from the sources to node i, where the edge weights correspond to the\nassociated transmission times.\n3 Graphical Model Perspectives for Continuous-Time Diffusion Networks\nThe continuous-time independent cascade model is essentially a directed graphical model for a set of\ndependent random variables, the infection times t_i of the nodes, where the conditional independence\nstructure is supported on the contact network G (see Appendix B for more details). 
More formally,\nthe joint density of {t_i}_{i\u2208V} can be expressed as\n\n$$p(\\{t_i\\}_{i \\in \\mathcal{V}}) = \\prod_{i \\in \\mathcal{V}} p(t_i \\mid \\{t_j\\}_{j \\in \\pi_i}), \\qquad (1)$$\n\nwhere \u03c0_i denotes the set of parents of node i in a cascade-induced DAG, and p(t_i | {t_j}_{j\u2208\u03c0_i}) is the\nconditional density of the infection time t_i at node i given the infection times of its parents.\nInstead of directly modeling the infection times t_i, we can focus on the set of mutually independent\nrandom transmission times \u03c4_ji = t_i \u2212 t_j. Interestingly, by switching from a node-centric view to an\nedge-centric view, we obtain a fully factorized joint density of the set of transmission times\n\n$$p(\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}) = \\prod_{(j,i) \\in \\mathcal{E}} f_{ji}(\\tau_{ji}). \\qquad (2)$$\n\nBased on the Shortest-Path property of the independent cascade model, each variable t_i can be\nviewed as a transformation from the collection of variables {\u03c4_ji}_{(j,i)\u2208E}.\nMore specifically, let Q_i be the collection of directed paths in G from the source nodes to node i,\nwhere each path q \u2208 Q_i contains a sequence of directed edges (j, l). Assuming all source nodes are\ninfected at time zero, we obtain variable t_i via\n\n$$t_i = g_i(\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}) := \\min_{q \\in \\mathcal{Q}_i} \\sum_{(j,l) \\in q} \\tau_{jl}, \\qquad (3)$$\n\nwhere the transformation g_i(\u00b7) is the value of the shortest-path minimization. As a special case, we\ncan now compute the probability of node i infected before T using a set of independent variables:\n\n$$\\Pr\\{t_i \\le T\\} = \\Pr\\{g_i(\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}) \\le T\\}. \\qquad (4)$$\n\nThe significance of the relation is that it allows us to transform a problem involving a sequence of\ndependent variables {t_i}_{i\u2208V} to one with independent variables {\u03c4_ji}_{(j,i)\u2208E}. 
Furthermore, the two\nperspectives are connected via the shortest path algorithm in a weighted directed graph, a standard\nwell-studied operation in graph analysis.\n4 Influence Estimation Problem in Continuous-Time Diffusion Networks\nIntuitively, given a time window, the wider the spread of infection, the more influential the set of\nsources. We adopt the definition of influence as the average number of infected nodes given a set of\nsource nodes and a time window, as in previous work [7]. More formally, consider a set of C source\nnodes A \u2286 V which gets infected at time zero; then, given a time window T, a node i is infected in\nthe time window if t_i \u2264 T. The expected number of infected nodes (or the influence) given the set\nof transmission functions {f_ji}_{(j,i)\u2208E} can be computed as\n\n$$\\sigma(\\mathcal{A}, T) = \\mathbb{E}\\Big[\\sum_{i \\in \\mathcal{V}} \\mathbb{I}\\{t_i \\le T\\}\\Big] = \\sum_{i \\in \\mathcal{V}} \\mathbb{E}\\big[\\mathbb{I}\\{t_i \\le T\\}\\big] = \\sum_{i \\in \\mathcal{V}} \\Pr\\{t_i \\le T\\}, \\qquad (5)$$\n\nwhere I{\u00b7} is the indicator function and the expectation is taken over the set of dependent\nvariables {t_i}_{i\u2208V}.\nEssentially, the influence estimation problem in Eq. (5) is an inference problem for graphical models,\nwhere the probability of event t_i \u2264 T given sources in A can be obtained by summing out the\npossible configurations of the other variables {t_j}_{j\u2260i}. That is\n\n$$\\Pr\\{t_i \\le T\\} = \\int_0^{\\infty} \\cdots \\int_{t_i = 0}^{T} \\cdots \\int_0^{\\infty} \\Big(\\prod_{j \\in \\mathcal{V}} p(t_j \\mid \\{t_l\\}_{l \\in \\pi_j})\\Big) \\Big(\\prod_{j \\in \\mathcal{V}} dt_j\\Big), \\qquad (6)$$\n\nwhich is, in general, a very challenging problem. First, the corresponding directed graphical models\ncan contain nodes with high in-degree and high out-degree. For example, in Twitter, a user can\nfollow dozens of other users, and another user can have hundreds of \u201cfollowees\u201d. 
The tree-width\ncorresponding to this directed graphical model can be very high, and we need to perform integration\nfor functions involving many continuous variables. Second, the integral in general cannot be\nevaluated analytically for heterogeneous transmission functions, which means that we need to resort to\nnumerical integration by discretizing the domain [0, \u221e). If we use N levels of discretization for each\nvariable, we would need to enumerate O(N^{|\u03c0_i|}) entries, exponential in the number of parents.\nOnly in very special cases can one derive the closed-form equation for computing Pr{t_i \u2264 T} [7].\nHowever, without further heuristic approximation, the computational complexity of the algorithm\nis exponential in the size and density of the network. The intrinsic complexity of the problem\nentails the utilization of approximation algorithms, such as mean field algorithms or message passing\nalgorithms. We will design an efficient randomized (or sampling) algorithm in the next section.\n5 Efficient Influence Estimation in Continuous-Time Diffusion Networks\nOur first key observation is that we can transform the influence estimation problem in Eq. (5) into a\nproblem with independent variables. Using the relation in Eq. (4), we have\n\n$$\\sigma(\\mathcal{A}, T) = \\sum_{i \\in \\mathcal{V}} \\Pr\\{g_i(\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}) \\le T\\} = \\mathbb{E}\\Big[\\sum_{i \\in \\mathcal{V}} \\mathbb{I}\\{g_i(\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}) \\le T\\}\\Big], \\qquad (7)$$\n\nwhere the expectation is with respect to the set of independent variables {\u03c4_ji}_{(j,i)\u2208E}. This equivalent\nformulation suggests a naive sampling (NS) algorithm for approximating \u03c3(A, T): draw n samples\nof {\u03c4_ji}_{(j,i)\u2208E}, run a shortest path algorithm for each sample, and finally average the results (see\nAppendix C for more details). 
However, this naive sampling approach has a computational\ncomplexity of O(nC|V||E| + nC|V|^2 log |V|) due to the repeated calling of the shortest path algorithm.\nThis is quadratic in the network size, and hence not scalable to millions of nodes.\nOur second key observation is that for each sample {\u03c4_ji}_{(j,i)\u2208E}, we are only interested in the\nneighborhood size of the source nodes, i.e., the summation \u2211_{i\u2208V} I{\u00b7} in Eq. (7), rather than in the\nindividual shortest paths. Fortunately, the neighborhood size estimation problem has been studied\nin the theoretical computer science literature. Here, we adapt a very efficient randomized algorithm\nby Cohen [17] to our influence estimation problem. This randomized algorithm has a computational\ncomplexity of O(|E| log |V| + |V| log^2 |V|) and it estimates the neighborhood sizes for all possible\nsingle source node locations. Since it needs to run once for each sample of {\u03c4_ji}_{(j,i)\u2208E}, we obtain\nan overall influence estimation algorithm with O(n|E| log |V| + n|V| log^2 |V|) computation, nearly\nlinear in network size. Next we will revisit Cohen\u2019s algorithm for neighborhood estimation.\n5.1 Randomized Algorithm for Single-Source Neighborhood Size Estimation\nGiven a fixed set of edge transmission times {\u03c4_ji}_{(j,i)\u2208E} and a source node s, infected at time 0, the\nneighborhood N(s, T) of a source node s given a time window T is the set of nodes within distance\nT from s, i.e.,\n\n$$\\mathcal{N}(s, T) = \\big\\{ i \\,\\big|\\, g_i(\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}) \\le T,\\ i \\in \\mathcal{V} \\big\\}. \\qquad (8)$$\n\nInstead of estimating N(s, T) directly, the algorithm will assign an exponentially distributed\nrandom label r_i to each network node i. Then, it makes use of the fact that the minimum of a set of\nexponential random variables {r_i}_{i\u2208N(s,T)} will also be an exponential random variable, but with its\nparameter equal to the number of variables. 
That is, if each r_i \u223c exp(\u2212r_i), then the smallest label\nwithin distance T from source s, r_* := min_{i\u2208N(s,T)} r_i, will be distributed as r_* \u223c exp{\u2212|N(s, T)| r_*}.\nSuppose we randomize over the labeling m times, and obtain m such least labels, {r_*^u}_{u=1}^m. Then\nthe neighborhood size can be estimated as\n\n$$|\\mathcal{N}(s, T)| \\approx \\frac{m - 1}{\\sum_{u=1}^{m} r_*^u}, \\qquad (9)$$\n\nwhich is shown to be an unbiased estimator of |N(s, T)| [17]. This is an interesting relation since\nit allows us to transform the counting problem in (8) to a problem of finding the minimum random\nlabel r_*. The key question is whether we can compute the least label r_* efficiently, given random\nlabels {r_i}_{i\u2208V} and any source node s.\nCohen [17] designed a modified Dijkstra\u2019s algorithm (Algorithm 1) to construct a data structure\nr_*(s), called least label list, for each node s to support such a query. Essentially, the algorithm starts\nwith the node i with the smallest label r_i, and then it traverses in breadth-first search fashion along\nthe reverse direction of the graph edges to find all reachable nodes. For each reachable node s, the\ndistance d_* between i and s, and r_i are added to the end of r_*(s). Then the algorithm moves to the\nnode i' with the second smallest label r_i', and similarly finds all reachable nodes. For each reachable\nnode s, the algorithm will compare the current distance d_* between i' and s with the last recorded\ndistance in r_*(s). If the current distance is smaller, then the current d_* and r_i' are added to the end\nof r_*(s). Then the algorithm moves to the node with the third smallest label and so on. The algorithm\nis summarized in Algorithm 1 in Appendix D.\nAlgorithm 1 returns a list r_*(s) per node s \u2208 V, which contains information about distance to the\nsmallest reachable labels from s. 
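As a quick numerical check of the estimator in Eq. (9), the following Python sketch draws unit-rate exponential labels for a neighborhood of known size and averages (m - 1) divided by the sum of the least labels over many repetitions. The names `estimate_size` and `average_estimate` and the parameters `true_size`, `m` and `trials` are illustrative assumptions, not part of the original algorithm description.

```python
import random

def estimate_size(true_size, m, rng):
    # Draw m independent labelings; in each one, every node of the
    # neighborhood gets a unit-rate exponential label and we keep the minimum.
    least_labels = [
        min(rng.expovariate(1.0) for _ in range(true_size))
        for _ in range(m)
    ]
    # Eq. (9): (m - 1) divided by the sum of the m least labels.
    return (m - 1) / sum(least_labels)

def average_estimate(true_size, m, trials, seed=0):
    # Average many independent estimates to illustrate unbiasedness.
    rng = random.Random(seed)
    total = sum(estimate_size(true_size, m, rng) for _ in range(trials))
    return total / trials
```

For instance, `average_estimate(50, 5, 4000)` should land close to 50, while any single call to `estimate_size` with m = 5 can be far off, which is why CONTINEST averages over many samples.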
In particular, each list contains pairs of distance and random labels,\n(d, r), and these pairs are ordered as\n\n$$\\infty > d_{(1)} > d_{(2)} > \\ldots > d_{(|r_*(s)|)} = 0, \\qquad (10)$$\n\n$$r_{(1)} < r_{(2)} < \\ldots < r_{(|r_*(s)|)}, \\qquad (11)$$\n\nwhere {\u00b7}_(l) denotes the l-th element in the list (see Appendix D for an example). If we want to\nquery the smallest reachable random label r_* for a given source s and a time T, we only need to\nperform a binary search on the list for node s:\n\n$$r_* = r_{(l)}, \\quad \\text{where } d_{(l-1)} > T \\ge d_{(l)}. \\qquad (12)$$\n\nFinally, to estimate |N(s, T)|, we generate m i.i.d. collections of random labels, run Algorithm 1\non each collection, and obtain m values {r_*^u}_{u=1}^m, which we use in Eq. (9) to estimate |N(s, T)|.\nThe computational complexity of Algorithm 1 is O(|E| log |V| + |V| log^2 |V|), with the expected size\nof each r_*(s) being O(log |V|). Then the expected time for querying r_* is O(log log |V|) using\nbinary search. Since we need to generate m sets of random labels and run Algorithm 1 m times, the\noverall computational complexity for estimating the single-source neighborhood size for all s \u2208 V\nis O(m|E| log |V| + m|V| log^2 |V| + m|V| log log |V|). For large scale networks, and when m \u226a\nmin{|V|, |E|}, this randomized algorithm can be much more efficient than approaches based on\ndirectly calculating the shortest paths.\n\n5.2 Constructing Estimation for Multiple-Source Neighborhood Size\nWhen we have a set of sources, A, its neighborhood is the union of the neighborhoods of its\nconstituent sources\n\n$$\\mathcal{N}(\\mathcal{A}, T) = \\bigcup_{i \\in \\mathcal{A}} \\mathcal{N}(i, T). \\qquad (13)$$\n\nThis is true because each source independently infects its downstream nodes. Furthermore, to\ncalculate the least label list r_* corresponding to N(A, T), we can simply reuse the least label list r_*(i)\nof each individual source i \u2208 A. 
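The least label lists, the binary-search query of Eq. (12), and their reuse for a set of sources can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pseudocode of Algorithm 1 in Appendix D: the function names, the edge format `(j, i, tau)` and the use of `heapq` for a pruned Dijkstra search on the reversed graph are all choices made for this example.

```python
import heapq

def least_label_lists(n_nodes, edges, labels):
    # edges: list of (j, i, tau) for a directed edge j -> i with time tau.
    rev = [[] for _ in range(n_nodes)]
    for j, i, tau in edges:
        rev[i].append((j, tau))  # reversed graph
    lists = [[] for _ in range(n_nodes)]
    # Visit nodes by increasing label, so labels in each list increase.
    for src in sorted(range(n_nodes), key=lambda v: labels[v]):
        heap = [(0.0, src)]
        done = set()
        while heap:
            d, s = heapq.heappop(heap)
            if s in done:
                continue
            done.add(s)
            # Prune: an earlier (smaller) label is already at least as close.
            if lists[s] and d >= lists[s][-1][0]:
                continue
            lists[s].append((d, labels[src]))
            for j, tau in rev[s]:
                if j not in done:
                    heapq.heappush(heap, (d + tau, j))
    return lists

def query_least_label(lst, T):
    # Distances decrease along the list; return the label of the first
    # entry whose distance is at most T (Eq. (12)). Assumes T >= 0.
    lo, hi = 0, len(lst)
    while lo < hi:
        mid = (lo + hi) // 2
        if lst[mid][0] <= T:
            hi = mid
        else:
            lo = mid + 1
    return lst[lo][1]

def least_label_for_sources(lists, sources, T):
    # Least label over the union of the single-source neighborhoods.
    return min(query_least_label(lists[s], T) for s in sources)
```

On the toy graph with edges 0 -> 1 (time 1), 1 -> 2 (time 2), 0 -> 2 (time 5), 2 -> 3 (time 1) and labels 0.9, 0.5, 0.1, 0.7, the list for node 0 is (3, 0.1), (1, 0.5), (0, 0.9), so a query with T = 1 returns 0.5 and a query with T = 3 returns 0.1.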
More formally,\n\n$$r_* = \\min_{i \\in \\mathcal{A}} \\min_{j \\in \\mathcal{N}(i, T)} r_j, \\qquad (14)$$\n\nwhere the inner minimization can be carried out by querying r_*(i). Similarly, after we obtain m\nsamples of r_*, we can estimate |N(A, T)| using Eq. (9). Importantly, very little additional work is\nneeded when we want to calculate r_* for a set of sources A, and we can reuse work done for a single\nsource. This is very different from a naive sampling approach where the sampling process needs to\nbe done completely anew if we increase the source set. In contrast, using the randomized algorithm,\nonly an additional minimization over |A| numbers is needed.\n\n5.3 Overall Algorithm\nSo far, we have achieved efficient neighborhood size estimation of |N(A, T)| with respect to a\ngiven set of transmission times {\u03c4_ji}_{(j,i)\u2208E}. Next, we will estimate the influence by averaging over\nmultiple sets of samples for {\u03c4_ji}_{(j,i)\u2208E}. More specifically, the relation from (7)\n\n$$\\sigma(\\mathcal{A}, T) = \\mathbb{E}_{\\{\\tau_{ji}\\}_{(j,i) \\in \\mathcal{E}}}\\big[|\\mathcal{N}(\\mathcal{A}, T)|\\big] = \\mathbb{E}_{\\{\\tau_{ji}\\}} \\mathbb{E}_{\\{r^1, \\ldots, r^m\\} \\mid \\{\\tau_{ji}\\}}\\Big[\\frac{m - 1}{\\sum_{u=1}^{m} r_*^u}\\Big], \\qquad (15)$$\n\nsuggests the following overall algorithm\n\nContinuous-Time Influence Estimation (CONTINEST):\n1. Sample n sets of random transmission times {\u03c4^l_ji}_{(j,i)\u2208E} \u223c \u220f_{(j,i)\u2208E} f_ji(\u03c4_ji)\n2. Given a set of {\u03c4^l_ji}_{(j,i)\u2208E}, sample m sets of random labels {r^u_i}_{i\u2208V} \u223c \u220f_{i\u2208V} exp(\u2212r_i)\n3. Estimate \u03c3(A, T) by sample averages \u03c3(A, T) \u2248 (1/n) \u2211_{l=1}^{n} ((m \u2212 1) / \u2211_{u_l=1}^{m} r_*^{u_l})\n\nImportantly, the number of random labels, m, does not need to be very large. 
Since the estimator\nfor |N(A, T)| is unbiased [17], essentially the outer loop of averaging over n samples of random\ntransmission times further reduces the variance of the estimator at a rate of O(1/n). In practice,\nwe can use a very small m (e.g., 5 or 10) and still achieve good results, which is also confirmed\nby our later experiments. Compared to [2], the novel application of Cohen\u2019s algorithm arises for\nestimating influence for multiple sources, which drastically reduces the computation by cleverly\nusing the least-label list from a single source. Moreover, we have the following theoretical guarantee\n(see Appendix E for proof)\nTheorem 1 Draw the following number of samples for the set of random transmission times\n\n$$n \\ge \\frac{C \\Lambda}{\\epsilon^2} \\log\\Big(\\frac{2 |\\mathcal{V}|}{\\delta}\\Big), \\qquad (16)$$\n\nwhere \u039b := max_{A:|A|\u2264C} 2\u03c3(A, T)^2/(m \u2212 2) + 2Var(|N(A, T)|)(m \u2212 1)/(m \u2212 2) + 2a\u03b5/3 and\n|N(A, T)| \u2264 a, and for each set of random transmission times, draw m sets of random labels. Then\n|\u03c3\u0302(A, T) \u2212 \u03c3(A, T)| \u2264 \u03b5 uniformly for all A with |A| \u2264 C, with probability at least 1 \u2212 \u03b4.\n\nThe theorem indicates that the minimum number of samples, n, needed to achieve a certain accuracy\nis related to the actual size of the influence \u03c3(A, T), and the variance of the neighborhood size\n|N(A, T)| over the random draw of samples. The number of random labels, m, drawn in the inner\nloop of the algorithm will monotonically decrease the dependency of n on \u03c3(A, T). It suffices to\ndraw a small number of random labels, as long as the value of \u03c3(A, T)^2/(m \u2212 2) matches that\nof Var(|N(A, T)|). 
Another implication is that influence at a larger time window T is harder to\nestimate, since \u03c3(A, T) will generally be larger and hence require more random labels.\n\n6 Influence Maximization\n\nOnce we know how to estimate the influence \u03c3(A, T) for any A \u2286 V and time window T efficiently,\nwe can use it to find the optimal set of C source nodes A* \u2286 V such that the expected number\nof infected nodes in G is maximized at T. That is, we seek to solve\n\n$$\\mathcal{A}^* = \\operatorname{argmax}_{|\\mathcal{A}| \\le C} \\sigma(\\mathcal{A}, T), \\qquad (17)$$\n\nwhere set A is the variable. The above optimization problem is NP-hard in general. By construction,\n\u03c3(A, T) is a non-negative, monotonic nondecreasing function in the set of source nodes, and it can\nbe shown that \u03c3(A, T) satisfies a diminishing returns property called submodularity [7].\nA well-known approximation algorithm to maximize monotonic submodular functions is the greedy\nalgorithm. It adds nodes to the source node set A sequentially. In step k, it adds the node i which\nmaximizes the marginal gain \u03c3(A_{k\u22121} \u222a {i}; T) \u2212 \u03c3(A_{k\u22121}; T). The greedy algorithm finds a source\nnode set which achieves at least a constant fraction (1 \u2212 1/e) of the optimal [18]. Moreover, lazy\nevaluation [5] can be employed to reduce the required number of marginal gains per iteration. By\nusing our influence estimation algorithm in each iteration of the greedy algorithm, we gain the\nfollowing additional benefits:\nFirst, at each iteration k, we do not need to rerun the full influence estimation algorithm (section 5.2).\nWe just need to store the least label list r_*(i) for each node i \u2208 V computed for a single source,\nwhich requires expected storage size of O(|V| log |V|) overall.\nSecond, our influence estimation algorithm can be easily parallelized. 
Its two nested sampling loops\ncan be parallelized in a straightforward way since the variables are independent of each other.\nHowever, in practice, we use a small number of random labels, and m \u226a n. Thus we only need to\nparallelize the sampling for the set of random transmission times {\u03c4_ji}. The storage of the least\nelement lists can also be distributed.\nHowever, by using our randomized algorithm for influence estimation, we also introduce a sampling\nerror to the greedy algorithm due to the approximation of the influence \u03c3(A, T). Fortunately, the\ngreedy algorithm is tolerant to such sampling noise, and a well-known result provides a guarantee\nfor this case (following an argument in [19, Th. 7.9]):\nTheorem 2 Suppose the influence \u03c3(A, T) for all A with |A| \u2264 C is estimated uniformly with\nerror \u03b5 and confidence 1 \u2212 \u03b4. Then the greedy algorithm returns a set of sources \u00c2 such that\n\u03c3(\u00c2, T) \u2265 (1 \u2212 1/e) OPT \u2212 2C\u03b5 with probability at least 1 \u2212 \u03b4.\n\n[Figure 1 panels: (a) Influence vs. time; (b) Error vs. #samples; (c) Error vs. #labels]\n\nFigure 1: For core-periphery networks with 1,024 nodes and 2,048 edges, (a) estimated influence for\nincreasing time window T, and (b) fixing T = 10, relative error for increasing number of samples with 5 random\nlabels, and (c) for increasing number of random labels with 10,000 random samples.\n7 Experiments\n\nWe evaluate the accuracy of the estimated influence given by CONTINEST and investigate the\nperformance of influence maximization on synthetic and real networks. We show that our approach\nsignificantly outperforms the state-of-the-art methods in terms of both speed and solution quality.\nSynthetic network generation. 
We generate three types of Kronecker networks [20]: (i)\ncore-periphery networks (parameter matrix: [0.9 0.5; 0.5 0.3]), which mimic the information\ndiffusion traces in real world networks [21], (ii) random networks ([0.5 0.5; 0.5 0.5]), typically\nused in physics and graph theory [22] and (iii) hierarchical networks ([0.9 0.1; 0.1 0.9]) [10].\nNext, we assign a pairwise transmission function for every directed edge in each type of\nnetwork and set its parameters at random. In our experiments, we use the Weibull distribution [16],\n\n$$f(t; \\alpha, \\beta) = \\frac{\\beta}{\\alpha} \\Big(\\frac{t}{\\alpha}\\Big)^{\\beta - 1} e^{-(t/\\alpha)^{\\beta}}, \\quad t \\ge 0,$$\n\nwhere \u03b1 > 0 is a scale parameter and \u03b2 > 0 is a shape\nparameter. The Weibull distribution (Wbl) has often been used to model lifetime events in survival\nanalysis, providing more flexibility than an exponential distribution [16]. We choose \u03b1 and \u03b2 from 0\nto 10 uniformly at random for each edge in order to have heterogeneous temporal dynamics. Finally,\nfor each type of Kronecker network, we generate 10 sample networks, each of which has different\n\u03b1 and \u03b2 chosen for every edge.\nAccuracy of the estimated influence. To the best of our knowledge, there is no analytical\nsolution to the influence estimation given Weibull transmission functions. Therefore, we compare\nCONTINEST with the Naive Sampling (NS) approach (see Appendix C) by considering the highest degree\nnode in a network as the source, and draw 1,000,000 samples for NS to obtain near ground truth.\nFigure 1(a) compares CONTINEST with the ground truth provided by NS at different time windows\nT, from 0.1 to 10 in core-periphery networks. For CONTINEST, we generate up to 10,000\nrandom samples (or sets of random waiting times), and 5 random labels in the inner loop. 
In all three\nnetworks, the estimation provided by CONTINEST fits the ground truth accurately, and the relative\nerror decreases quickly as we increase the number of samples and labels (Figures 1(b) and 1(c)). 
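The Naive Sampling (NS) baseline used above as near ground truth can be sketched in Python as follows: sample one transmission time per edge, compute infection times by multi-source Dijkstra, count nodes infected by time T, and average. This is only an illustrative sketch; the function names, the edge format `(j, i, sampler)` and the helper `weibull_sampler` are assumptions made for this example and are not taken from Appendix C.

```python
import heapq
import random

def infection_times(n_nodes, adj, sources):
    # Multi-source Dijkstra: t_i is the shortest-path length from the sources.
    times = [float('inf')] * n_nodes
    heap = []
    for s in sources:
        times[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        t, j = heapq.heappop(heap)
        if t > times[j]:
            continue  # stale heap entry
        for i, tau in adj[j]:
            if t + tau < times[i]:
                times[i] = t + tau
                heapq.heappush(heap, (times[i], i))
    return times

def weibull_sampler(alpha, beta):
    # random.weibullvariate takes the scale alpha, then the shape beta.
    return lambda rng: rng.weibullvariate(alpha, beta)

def naive_influence(n_nodes, edges, sources, T, n_samples, seed=0):
    # edges: list of (j, i, sampler); sampler(rng) draws one time for j -> i.
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        adj = [[] for _ in range(n_nodes)]
        for j, i, sampler in edges:
            adj[j].append((i, sampler(rng)))
        times = infection_times(n_nodes, adj, sources)
        total += sum(1 for t in times if t <= T)
    return total / n_samples
```

With constant unit transmission times on the chain 0 -> 1 -> 2, `naive_influence(3, edges, [0], 1.5, 10)` returns 2.0, since only nodes 0 and 1 are reached by time 1.5. Each extra source set requires a fresh round of sampling and shortest-path computations, which is the cost CONTINEST avoids.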
For\n10,000 random samples with 5 random labels, the relative error is smaller than 0.01 (see Appendix F\nfor additional results on the random and hierarchical networks).\nScalability. We compare CONTINEST to the state-of-the-art method INFLUMAX [7] and the Naive\nSampling (NS) method in terms of run time for continuous-time influence estimation and maximization.\nFor CONTINEST, we draw 10,000 samples in the outer loop, each having 5 random labels\nin the inner loop. For NS, we also draw 10,000 samples. The first two experiments are carried out\non a single 2.4GHz processor. First, we compare the run time of selecting an increasing number of sources\n(from 1 to 10) on small core-periphery networks (Figure 2(a)). When the number of selected sources\nis 1, the different algorithms essentially spend their time estimating the influence of each node. CONTINEST\noutperforms the other methods by an order of magnitude, and for more than one source it can\nefficiently reuse the computations made for estimating the influence of individual nodes. Dashed lines mean that\na method did not finish in 24 hours, in which case the estimated run time is plotted. Next, we compare the run\ntime for selecting 10 sources on core-periphery networks of 128 nodes with increasing densities (or\nnumbers of edges) (Figure 2(b)). Again, INFLUMAX and NS are an order of magnitude slower due to\ntheir respective exponential and quadratic computational complexity in network density. In contrast,\nthe run time of CONTINEST increases only slightly with increasing density, since its computational\ncomplexity is linear in the number of edges (see Appendix F for additional results on the\nrandom and hierarchical networks). Finally, we evaluate the speed on large core-periphery networks,\nranging from 100 to 1,000,000 nodes with density 1.5, in Figure 2(c). 
We report the parallel run time\nonly for CONTINEST and NS (both are implemented with MPI, running on 192 cores at 2.4GHz) since\nINFLUMAX is not scalable. In contrast to NS, the run time of CONTINEST increases linearly\nwith the network size, and it can easily scale up to one million nodes.\n\n[Figure 1 panels: (a) influence vs. T for NS and ConTinEst; (b) relative error vs. #samples; (c) relative error vs. #labels]\n\n(a) Run time vs. #sources (b) Run time vs. network density (c) Run time vs. #nodes\n\nFigure 2: For core-periphery networks with T = 10, run time for (a) selecting an increasing number of sources\nin networks of 128 nodes and 320 edges; (b) selecting 10 sources in networks of 128 nodes with increasing\ndensity; and (c) selecting 10 sources with increasing network size from 100 to 1,000,000 nodes, fixing the density at 1.5.\n\n(a) Influence estimation error (b) Influence vs. #sources (c) Influence vs. time\n\nFigure 3: In the MemeTracker dataset, (a) comparison of the accuracy of the estimated influence in terms of\nmean absolute error, (b) comparison of the influence of the selected nodes, fixing the observation window\nT = 5 and varying the number of sources, and (c) comparison of the influence of the selected nodes, fixing\nthe number of sources to 50 and varying the time window.\n\nReal-world data. We first quantify how well each method can estimate the true influence in a\nreal-world dataset. Then, we evaluate the solution quality of the selected sources for influence maximization.\nWe use the MemeTracker dataset [23], which has 10,967 hyperlink cascades among 600\nmedia sites. We repeatedly split all cascades into an 80% training set and a 20% test set at random,\nfive times. On each training set, we learn the continuous-time model using NETRATE [10] with\nexponential transmission functions. For the discrete-time models, we learn the infection probabilities\nusing [24] for IC, SP1M and PMIA. 
Similarly, for LT, we follow the methodology of [1]. Let C(u)\nbe the set of all cascades in which u was the source node. Based on C(u), the total number of distinct\nnodes infected before T quantifies the real influence of node u up to time T. In Figure 3(a), we\nreport the Mean Absolute Error (MAE) between the real and the estimated influence. Clearly, CONTINEST\nperforms the best statistically. Because the length of real cascades empirically conforms\nto a power-law distribution in which most cascades are very short (2-4 nodes), the gap in estimation\nerror is relatively small. However, we emphasize that such an accuracy improvement is critical\nfor maximizing long-term influence, since the estimation error for individual nodes accumulates along the\nspreading paths. Hence, any consistent improvement in influence estimation can lead to a significant\nimprovement in the overall influence estimation and maximization task. This is further confirmed\nby Figures 3(b) and 3(c), where we evaluate the influence of the selected nodes in the same spirit as\ninfluence estimation: the true influence is calculated as the total number of distinct nodes infected\nbefore T based on C(u) of the selected nodes. The selected sources given by CONTINEST achieve\nthe best performance as we vary the number of selected sources and the observation time window.\n8 Conclusions\nWe propose a randomized influence estimation algorithm for continuous-time diffusion networks,\nwhich can scale up to networks of millions of nodes while significantly improving over the previous\nstate of the art in terms of the accuracy of the estimated influence and the quality of the selected nodes\nin maximizing the influence. 
In future work, it will be interesting to apply the current algorithm\nto other tasks such as influence minimization and manipulation, and to design scalable algorithms for\ncontinuous-time models other than the independent cascade model.\nAcknowledgement: Our work is supported by NSF/NIH BIGDATA 1R01GM108341-01, NSF\nIIS1116886, NSF IIS1218749, NSFC 61129001, a DARPA Xdata grant and Raytheon Faculty Fellowship\nof Gatech.\n\nReferences\n[1] David Kempe, Jon Kleinberg, and \u00c9va Tardos. Maximizing the spread of influence through a\nsocial network. In KDD, pages 137\u2013146, 2003.\n\n[2] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks.\nIn KDD, pages 199\u2013208, 2009.\n\n[3] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks\nunder the linear threshold model. In ICDM, pages 88\u201397, 2010.\n\n[4] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. A data-based approach to social\ninfluence maximization. Proc. VLDB Endow., 5, 2011.\n\n[5] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne M. VanBriesen,\nand Natalie S. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420\u2013429,\n2007.\n\n[6] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing.\nIn KDD, pages 61\u201370, 2002.\n\n[7] Manuel Gomez-Rodriguez and Bernhard Sch\u00f6lkopf. Influence maximization in continuous\ntime diffusion networks. In ICML, 2012.\n\n[8] Nan Du, Le Song, Alexander J. Smola, and Ming Yuan. Learning networks of heterogeneous\ninfluence. In NIPS, 2012.\n\n[9] Nan Du, Le Song, Hyenkyun Woo, and Hongyuan Zha. Uncover topic-sensitive information\ndiffusion networks. In AISTATS, 2013.\n\n[10] Manuel Gomez-Rodriguez, David Balduzzi, and Bernhard Sch\u00f6lkopf. Uncovering the temporal\ndynamics of diffusion networks. In ICML, pages 561\u2013568, 2011.\n\n[11] Manuel Gomez-Rodriguez, Jure Leskovec, and Bernhard Sch\u00f6lkopf. Structure and Dynamics\nof Information Pathways in On-line Media. In WSDM, 2013.\n\n[12] Ke Zhou, Le Song, and Hongyuan Zha. Learning social infectivity in sparse low-rank networks\nusing multi-dimensional Hawkes processes. In Artificial Intelligence and Statistics (AISTATS),\n2013.\n\n[13] Ke Zhou, Hongyuan Zha, and Le Song. Learning triggering kernels for multi-dimensional\nHawkes processes. In International Conference on Machine Learning (ICML), 2013.\n\n[14] Manuel Gomez-Rodriguez, Jure Leskovec, and Bernhard Sch\u00f6lkopf. Modeling information\npropagation with survival theory. In ICML, 2013.\n\n[15] Shuanghong Yang and Hongyuan Zha. Mixture of mutually exciting processes for viral diffusion.\nIn International Conference on Machine Learning (ICML), 2013.\n\n[16] Jerald F. Lawless. Statistical Models and Methods for Lifetime Data. Wiley-Interscience, 2002.\n\n[17] Edith Cohen. Size-estimation framework with applications to transitive closure and reachability.\nJournal of Computer and System Sciences, 55(3):441\u2013453, 1997.\n\n[18] GL Nemhauser, LA Wolsey, and ML Fisher. An analysis of approximations for maximizing\nsubmodular set functions. Mathematical Programming, 14(1), 1978.\n\n[19] Andreas Krause. Ph.D. Thesis. CMU, 2008.\n\n[20] Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos, and Zoubin\nGhahramani. 
Kronecker graphs: An approach to modeling networks. JMLR, 11, 2010.\n\n[21] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion\nand influence. In KDD, 2010.\n\n[22] David Easley and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly\nConnected World. Cambridge University Press, 2010.\n\n[23] Jure Leskovec, Lars Backstrom, and Jon M. Kleinberg. Meme-tracking and the dynamics of\nthe news cycle. In KDD, 2009.\n\n[24] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In\nSIGMETRICS/PERFORMANCE, pages 211\u2013222. ACM, 2012.\n\n[25] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral\nmarketing in large-scale social networks. In KDD, pages 1029\u20131038, 2010.\n", "award": [], "sourceid": 1446, "authors": [{"given_name": "Nan", "family_name": "Du", "institution": "Georgia Tech"}, {"given_name": "Le", "family_name": "Song", "institution": "Georgia Tech"}, {"given_name": "Manuel", "family_name": "Gomez Rodriguez", "institution": "MPI for Intelligent Systems"}, {"given_name": "Hongyuan", "family_name": "Zha", "institution": "Georgia Tech"}]}