{"title": "Infinite Hidden Semi-Markov Modulated Interaction Point Process", "book": "Advances in Neural Information Processing Systems", "page_first": 3900, "page_last": 3908, "abstract": "The correlation between events is ubiquitous and important for temporal events modelling. In many cases, the correlation exists between not only events' emitted observations, but also their arrival times. State space models (e.g., hidden Markov model) and stochastic interaction point process models (e.g., Hawkes process) have been studied extensively yet separately for the two types of correlations in the past. In this paper, we propose a Bayesian nonparametric approach that considers both types of correlations via unifying and generalizing hidden semi-Markov model and interaction point process model. The proposed approach can simultaneously model both the observations and arrival times of temporal events, and determine the number of latent states from data. A Metropolis-within-particle-Gibbs sampler with ancestor resampling is developed for efficient posterior inference. The approach is tested on both synthetic and real-world data with promising outcomes.", "full_text": "In\ufb01nite Hidden Semi-Markov Modulated Interaction\n\nPoint Process\n\nPeng Lin\u00a7\u2020, Bang Zhang\u00a7, Ting Guo\u00a7, Yang Wang\u00a7, Fang Chen\u00a7\n\n\u00a7Data61 CSIRO, Australian Technology Park, 13 Garden Street, Eveleigh NSW 2015, Australia\n\u2020School of Computer Science and Engineering, The University of New South Wales, Australia\n\n{peng.lin, bang.zhang, ting.guo, yang.wang, fang.chen}@data61.csiro.au\n\nAbstract\n\nThe correlation between events is ubiquitous and important for temporal events\nmodelling. In many cases, the correlation exists between not only events\u2019 emitted\nobservations, but also their arrival times. 
State space models (e.g., hidden Markov\nmodel) and stochastic interaction point process models (e.g., Hawkes process)\nhave been studied extensively yet separately for the two types of correlations\nin the past. In this paper, we propose a Bayesian nonparametric approach that\nconsiders both types of correlations by unifying and generalizing the hidden\nsemi-Markov model and interaction point process model. The proposed approach can\nsimultaneously model both the observations and arrival times of temporal events,\nand automatically determine the number of latent states from data. A\nMetropolis-within-particle-Gibbs sampler with ancestor resampling is developed for efficient\nposterior inference. The approach is tested on both synthetic and real-world data\nwith promising outcomes.\n\n1 Introduction\n\nTemporal event modelling is a classic machine learning problem that has drawn enormous research\nattention for decades. It has wide applications in many areas, such as financial modelling, social\nevent analysis, and seismological and epidemiological forecasting. An event is often associated with\nan arrival time and an observation, e.g., a scalar or vector. For example, a trading event in a financial\nmarket has a trading time and a trading price. A message in a social network has a posting time and a\nsequence of words. A main task of temporal event modelling is to capture the underlying event\ncorrelation and use it to make predictions for future events\u2019 observations and/or arrival times.\nThe correlation between events\u2019 observations can be readily found in many real-world cases in which\nan event\u2019s observation is influenced by its predecessors\u2019 observations. For example, the price of a\ntrading event is impacted by former trading prices. The content of a new social message is affected\nby the contents of the previous messages. 
The state space model (SSM), e.g., the hidden Markov model\n(HMM) [16], is one of the most prevalent frameworks that consider such correlation. It models the\ncorrelation via latent state dependency. Each event in the HMM is associated with a latent state\nthat can emit an observation. Given the most recent state, a latent state is independent of all\nearlier states (Markovianity). Hence, a future event observation can be predicted based on the observed events and\nthe inferred mechanisms of emission and transition.\nDespite its popularity, the HMM lacks the flexibility to model event arrival times. It only allows\nfixed inter-arrival times. The duration of a type of state follows a geometric distribution parameterized by its\nself-transition probability due to the strict Markovian constraint. The hidden\nsemi-Markov model (HSMM) [14, 21] was developed to allow non-geometric state durations. It\nextends the HMM by allowing the underlying state transition process to be a semi-Markov chain\nwith a variable duration time for each state. In addition to the HMM components, the HSMM models\nthe duration of a state as a random variable, and a state can emit a sequence of observations.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nThe HSMM allows the flexibility of variable inter-arrival times, but it does not consider the correlation\nbetween events\u2019 arrival times. In many real-world applications, one event can trigger the occurrences of\nothers in the near future. For instance, earthquakes and epidemics are diffusible events, i.e., one can\ncause the occurrences of others. Trading events in financial markets arrive in clusters. Information\npropagation in social networks shows contagious and clustering characteristics. All these events\nexhibit interaction characteristics in terms of arrival times. The likelihood of an event\u2019s arrival time is\naffected by the previous events\u2019 arrival times. 
The stochastic interaction point process (IPP), e.g., the Hawkes\nprocess [6], is a widely adopted framework for capturing such arrival time correlation. It models the\ncorrelation via a conditional intensity function that depicts the event intensity depending on all the\nprevious events\u2019 arrival times. However, unlike the SSMs, it lacks the capability of modelling events\u2019\nlatent states and their interactions.\nIt is clearly desirable in real-world applications to have both arrival time correlation and observation\ncorrelation considered in a unified manner so that we can estimate both when and how events will\nappear. Inspired by the merits of SSMs and IPPs, we propose a novel Bayesian nonparametric\napproach that unifies and generalizes SSMs and IPPs via a latent semi-Markov state chain with\na countably infinite number of states. The latent states govern both the observation emission and the\nnew event triggering mechanism. An efficient sampling method is developed within the framework of\nparticle Markov chain Monte Carlo (PMCMC) [1] for the posterior inference of the proposed model.\nWe first review closely related techniques in Section 2, and describe the proposed model\nin Section 3. Then Section 4 presents the inference algorithm. In Section 5, we show the results of\nthe empirical studies on both synthetic and real-world data. Conclusions are drawn in Section 6.\n\n2 Preliminaries\n\nIn this section, we review the techniques that are closely related to the proposed method, namely the\nhidden (semi-)Markov model, its Bayesian nonparametric extension and the Hawkes process.\n\n2.1 Hidden (Semi-)Markov Model\n\nThe HMM [16] is one of the most popular approaches for temporal event modelling. It utilizes a\nsequence of latent states with the Markov property to model the dynamics of temporal events. 
Each\nevent in the HMM is associated with a latent state that determines the event\u2019s observation via an\nemission probability distribution. Given its most recent predecessor\u2019s state, the state of an event is\nindependent of all earlier states (i.e., Markovianity) and follows a transition probability distribution. The HMM\nconsists of 4 components: (1) an initial state probability distribution, (2) a finite latent state space,\n(3) a state transition matrix, and (4) an emission probability distribution. As a result, the inference\nfor the HMM involves inferring (1) the initial state probability distribution, (2) the sequence of the\nlatent states, (3) the state transition matrix and (4) the emission probability distribution.\nThe HMM has proven to be an excellent general framework for modelling sequential data, but it has two\nsignificant drawbacks: (1) The durations of events (or the inter-arrival times between events) are\nfixed to a common value. The state duration distributions are restricted to a geometric form. Such a\nsetting lacks the flexibility needed for real-world applications. (2) The size of the latent state space in the\nHMM must be set a priori instead of being learned from data.\nThe hidden semi-Markov model (HSMM) [14, 21] is a popular extension to the HMM, which tries to\nmitigate the first drawback of the HMM. It allows latent states to have variable durations, thereby\nforming a semi-Markov chain. It reduces to the HMM when durations follow a geometric distribution.\nIn addition to the 4 components of the HMM, the HSMM has a state duration probability distribution. As\na result, the inference procedure for the HSMM also involves the inference of the duration probability\ndistribution. 
It is worth noting that the interaction between events in terms of event arrival time is\nneglected by both the HMM and the HSMM.\n\n2.2 Hierarchical Dirichlet Process Prior for State Transition\n\nRecent developments in Bayesian nonparametrics help address the second drawback of the HMM.\nHere, we briefly review the Hierarchical Dirichlet Process HMM (HDP-HMM). Let (\u0398,B) be a\nmeasurable space and G0 be a probability measure on it. A Dirichlet process (DP) is a distribution\nover random probability measures G on the measurable space (\u0398,B) such that, for any finite measurable partition\n$(A_1, \\ldots, A_r)$ of \u0398, the random vector $(G(A_1), \\ldots, G(A_r))$ follows a finite Dirichlet distribution\nparameterized by $(\\alpha_0 G_0(A_1), \\ldots, \\alpha_0 G_0(A_r))$, where \u03b10 is a positive real number.\nThe HDP is defined based on the DP for modelling grouped data. It is a distribution over a collection of\nrandom probability measures over the measurable space (\u0398,B). Each of these random probability\nmeasures Gk is associated with a group. A global random probability measure G0, distributed as\na DP with concentration parameter \u03b3 and base probability measure H, is used as their mean measure.\nBecause the HMM can be treated as a set of mixture models in a dynamic manner, each of which\ncorresponds to a value of the current state, the HDP becomes a natural choice as the prior over the\nstate transitions [2, 18]. The generative HDP-HMM model can be summarized as:\n\n$$\\beta \\mid \\gamma \\sim \\mathrm{GEM}(\\gamma), \\quad \\pi_k \\mid \\alpha_0, \\beta \\sim \\mathrm{DP}(\\alpha_0, \\beta), \\quad \\theta_k \\mid \\lambda, H \\sim H(\\lambda),$$\n$$s_n \\mid s_{n-1}, (\\pi_k)_{k=1}^{\\infty} \\sim \\pi_{s_{n-1}}, \\quad y_n \\mid s_n, (\\theta_k)_{k=1}^{\\infty} \\sim F(\\theta_{s_n}). \\tag{1}$$\n\nGEM denotes the stick-breaking process. The variable sequence sn indicates the latent state sequence.\nyn represents the observation. 
The HDP acts as a prior over the infinite transition matrices. Each\n\u03c0k is a draw from a DP and describes the transition distribution from state k. The probability measures\nfrom which the \u03c0k\u2019s are drawn are parameterized by the same discrete base measure \u03b2. \u03b8 parameterizes\nthe emission distribution F . Usually H is set to be conjugate to F , which simplifies inference. \u03b3 controls\nthe degree of concentration of the base measure \u03b2. \u03b10 governs the variability of the prior\nmean measure across the rows of the transition matrix.\nBecause the HDP prior doesn\u2019t distinguish self-transitions from transitions to other states, it is\nprone to unnecessarily frequent state switching and to creating redundant states. Thus, [5] proposed a\nsticky HDP-HMM that adds a self-transition bias parameter to the state transition measure,\n$\\pi_k \\sim \\mathrm{DP}(\\alpha_0 + \\kappa, (\\alpha_0 \\beta + \\kappa \\delta_k)/(\\alpha_0 + \\kappa))$, where \u03ba controls the stickiness of the transition matrix.\n\n2.3 Hawkes Process\n\nThe stochastic point process [3] is a rich family of models designed for tackling a variety of\ntemporal event modelling problems. A stochastic point process can be defined via its conditional\nintensity function, which provides an equivalent representation as a counting process for temporal events.\nGiven N (t) denoting the number of events that occurred in the time interval [0, t) and \u03c4t indicating the\narrival times of the temporal events before t, the intensity for a time point t conditioned on the arrival\ntimes of all the previous events is defined as:\n\n$$\\lambda(t \\mid \\tau_t) = \\lim_{\\Delta t \\to 0} \\frac{\\mathbb{E}[N(t + \\Delta t) - N(t) \\mid \\tau_t]}{\\Delta t}. \\tag{2}$$\n\nIt is worth noting that we do not consider edge effects in this paper; that is, we assume no events exist before time\n0. A variety of point processes have been developed with distinct functional forms of intensity for\nvarious modelling purposes. 
The interaction point process (IPP) [4] considers point interactions with an\nintensity function dependent on historical events. The Hawkes process [7, 6] is one of the most popular\nand flexible IPPs. Its conditional intensity has the following functional form:\n\n$$\\lambda(t) = \\mu(t) + \\sum_{t_n < t} \\psi_n(t - t_n). \\tag{3}$$\n\nFor notational simplicity, we use \u03bb(t) to represent the intensity function conditioned on the previous points \u03c4t.\nThe function \u00b5(t) is a non-negative background intensity function, which is\noften set to a positive real number. The function $\\psi_n(\\Delta t)$ represents the triggering kernel of event\ntn. It is a decay function defined on [0,\u221e) depicting the decaying influence of the event on triggering new events.\nA typical decay function is in exponential form, i.e., $\\lambda(t) = \\mu + \\sum_{t_n < t} \\alpha' \\cdot \\exp(-\\beta'(t - t_n))$. As\ndiscussed in [7, 10], because the superposition of several Poisson processes is also a Poisson process,\nthe Hawkes process can be considered as a conditional Poisson process that is constituted by combining\na background Poisson process \u00b5(t) and a set of triggered Poisson processes with intensity $\\psi_n(t - t_n)$.\n\nFigure 1: (1) An intuitive illustration of the iHSMM-IPP model. Every event in the iHSMM-IPP\nmodel is associated with a latent state s, an arrival time t and an observable value y. The colours of\npoints indicate latent states. The blue curve shows the event intensity. The top part of the figure illustrates\nthe IPP component of the iHSMM-IPP model and the bottom part illustrates the HSMM component.\nThe two components are integrated together via a countably infinite semi-Markov latent state chain.\n(2) Graphical model of the iHSMM-IPP model. 
The top part shows the HDP-HMM.\n\n3 Infinite Hidden Semi-Markov Modulated Interaction Point Process (iHSMM-IPP)\n\nInspired by the merits of SSMs and IPPs, we propose an infinite hidden semi-Markov modulated\ninteraction point process model (iHSMM-IPP). It is a Bayesian nonparametric stochastic point\nprocess with a latent semi-Markov state chain determining both event emission probabilities and\nevent triggering kernels. An intuitive illustration is given in Fig. 1 (1). Each temporal event in the\niHSMM-IPP is represented by a stochastic point and each point is associated with a hidden discrete\nstate {si} that determines the event emission and triggering mechanism. As in SSMs\nand IPPs, the event emission probabilities guide the generation of event observations {yi} and the\nevent triggering kernels influence the occurrence times {ti} of events. The hidden state depends only\non the most recent event\u2019s state. The latent state space is countably infinite under the HDP\nprior.\nThe model can be formally defined as follows, and its corresponding graphical model is given in\nFig. 1 (2).\n\n$$\\beta \\mid \\gamma \\sim \\mathrm{GEM}(\\gamma), \\quad \\pi_k \\mid \\alpha_0, \\beta \\sim \\mathrm{DP}(\\alpha_0, \\beta), \\quad \\theta_k \\mid \\eta, H \\sim H(\\eta), \\quad \\rho_k \\mid \\chi, H' \\sim H'(\\chi),$$\n$$s_n \\mid s_{n-1}, (\\pi_k)_{k=1}^{\\infty} \\sim \\pi_{s_{n-1}}, \\quad t_n \\mid \\cdot \\sim \\mathrm{PP}\\Big(\\mu + \\sum_{i=1}^{n-1} \\psi_{\\rho_{s_i}}(t - t_i)\\Big), \\quad y_n \\mid s_n, (\\theta_k)_{k=1}^{\\infty} \\sim F(\\theta_{s_n}). \\tag{4}$$\n\nWe use $\\psi_{\\rho_{s_i}}(\\cdot)$ to denote the triggering kernel parameterized by $\\rho_{s_i}$, which is indexed by the latent\nstate $s_i$. We use $\\psi_{s_i}(\\cdot)$ instead of $\\psi_{\\rho_{s_i}}(\\cdot)$ for the remainder of the paper for the sake of notational\nsimplicity. The iHSMM-IPP is a generative model that can be used for generating a series of events\nwith arrival times and emitted observations. 
The arrival time tn is drawn from a Poisson process.\nWe do not consider edge effects in this work. Therefore, the first event\u2019s arrival time, t1, is drawn\nfrom a homogeneous Poisson process parameterized by a hyper-parameter \u00b5. For n > 1, tn is\ndrawn from an inhomogeneous Poisson process whose conditional intensity function is defined as\n$\\mu + \\sum_{i=1}^{n-1} \\psi_{s_i}(t - t_i)$. As defined before, $\\psi_{s_i}(\\cdot)$ indicates the triggering kernel of a former point\ni whose latent state is $s_i$. The state of the point sn is drawn following the guidance of the HDP\nprior as in the HDP-HMM. The emitted observation yn is generated from the emission probability\ndistribution F (\u00b7) parameterized by \u03b8sn , which is determined by the state sn.\n\n4 Posterior Inference for the iHSMM-IPP\n\nIn this section, we describe the inference method for the proposed iHSMM-IPP model. Despite its\nflexibility, the proposed iHSMM-IPP model faces three challenges for efficient posterior inference:\n(1) the strongly correlated nature of its temporal dynamics, (2) the non-Markovianity introduced by the event\ntriggering mechanism, and (3) the infinite dimensional state transition. The traditional sampling methods\nfor high dimensional probability distributions, e.g., MCMC and sequential Monte Carlo (SMC), are\nunreliable when highly correlated variables are updated independently, which can be the case for the\niHSMM-IPP model. So we develop the inference algorithm within the framework of particle MCMC\n(PMCMC), a family of inferential methods recently developed in [1]. The key idea of PMCMC is to\nuse SMC to construct a proposal kernel for an MCMC sampler. It not only improves over traditional\nMCMC methods but also makes Bayesian inference feasible for a large class of statistical models. For\ntackling the non-Markovianity, an ancestor resampling scheme [13] is incorporated into our inference\nalgorithm. 
Like existing forward-backward sampling methods, ancestor resampling uses backward\nsampling to improve the mixing of PMCMC. However, it achieves the same effect in a single forward\nsweep instead of using separate forward and backward sweeps. More importantly, it provides an\neffective way of sampling for non-Markovian SSMs.\nGiven a sequence of N events, $\\{y_n, t_n\\}_{n=1}^{N}$, the inference algorithm needs to sample the hidden\nstate sequence, $\\{s_n\\}_{n=1}^{N}$, emission distribution parameters \u03b81:K, background event intensity \u00b5,\ntriggering kernel parameters, \u03c81:K (we omit \u03c1 and use \u03c81:K instead of \u03c8\u03c11:K for notational simplicity\nas before), transition matrix, \u03c01:K, and the HDP parameters (\u03b10, \u03b3, \u03ba, \u03b2). We use K to represent\nthe number of active states and \u2126 to indicate the set of variables excluding the latent state sequence,\ni.e., \u2126 = {\u03b10, \u03b2, \u03b3, \u03ba, \u00b5, \u03b81:K, \u03c81:K, \u03c01:K}. Only major variables are listed, and \u2126 may also include\nother variables, such as the probability of the initial latent state. At a high level, all the variables are\nupdated iteratively using a particle Gibbs (PG) sampler. A conditional SMC is performed as a\nproposal kernel for updating the latent state sequence in each PG iteration. An ancestor resampling\nscheme is adopted in the conditional SMC for handling the non-Markovianity caused by the triggering\nmechanism. Metropolis sampling is used in each PG iteration to update the background event intensity \u00b5\nand the triggering kernel parameters \u03c81:K. The remaining variables in \u2126 can readily be sampled by following\nthe schemes in [5, 18]. The proposal distribution q\u2126(\u00b7) in the conditional SMC can be set by\nfollowing [19]. 
The PG sampler is given in the following:\n\nStep 1: Initialization, i = 0, set $\\Omega(0)$, $s_{1:N}(0)$, $B_{1:N}(0)$.\nStep 2: For iteration $i \\geq 1$:\n(a) Sample $\\Omega(i) \\sim p\\{\\cdot \\mid y_{1:N}, t_{1:N}, s_{1:N}(i-1)\\}$.\n(b) Run a conditional SMC algorithm targeting $p_{\\Omega(i)}(s_{1:N} \\mid y_{1:N}, t_{1:N})$, conditional on $s_{1:N}(i-1)$ and $B_{1:N}(i-1)$.\n(c) Sample $s_{1:N}(i) \\sim \\hat{p}_{\\Omega(i)}(\\cdot \\mid y_{1:N}, t_{1:N})$.\n\nWe use $B_{1:N}$ to represent the ancestral lineage of the prespecified state path $s_{1:N}$ and $\\hat{p}_{\\Omega(i)}(\\cdot \\mid y_{1:N}, t_{1:N})$ to\nrepresent the particle approximation of $p_{\\Omega(i)}(\\cdot \\mid y_{1:N}, t_{1:N})$. The details of the conditional SMC algorithm\nare given in the following. It is worth noting that the conditioned latent state path is only updated via\nthe ancestor resampling.\n\nStep 1: Let $s_{1:N} = \\{s_1^{B_1}, s_2^{B_2}, \\ldots, s_N^{B_N}\\}$ denote the path that is associated with the ancestral lineage $B_{1:N}$.\nStep 2: For n = 1:\n(a) For $j \\neq B_1$, sample $s_1^j \\sim q_\\Omega(\\cdot \\mid y_1)$, $j \\in [1, \\ldots, J]$. (J denotes the number of particles.)\n(b) Compute the weights $w_1(s_1^j) = p(s_1^j) F(y_1 \\mid s_1^j) / q_\\Omega(s_1^j \\mid y_1)$ and normalize them as\n$W_1^j = w_1(s_1^j) / \\sum_{m=1}^{J} w_1(s_1^m)$. (We use $p(s_1^j)$ to represent the probability of the\ninitial latent state and $q_\\Omega(s_1^j \\mid y_1)$ to represent the proposal distribution conditional on\nthe variable set \u2126.)\nStep 3: For $n = 2, \\ldots, N$:\n(a) For $j \\neq B_n$, sample the ancestor index of particle j: $a_{n-1}^j \\sim \\mathrm{Cat}(\\cdot \\mid W_{n-1}^{1:J})$.\n(b) For $j \\neq B_n$, sample $s_n^j \\sim q_\\Omega(\\cdot \\mid y_n, s_{n-1}^{a_{n-1}^j})$. 
If $s_n^j = K + 1$ then create a new state using\nthe stick-breaking construction for the HDP:\n(i) Sample a new transition probability $\\pi_{K+1} \\sim \\mathrm{Dir}(\\alpha_0 \\beta)$.\n(ii) Use the stick-breaking construction to expand $\\beta \\leftarrow [\\beta, \\beta_{K+1}]$:\n\n$$\\beta'_{K+1} \\sim \\mathrm{Beta}(1, \\gamma), \\quad \\beta_{K+1} = \\beta'_{K+1} \\prod_{l=1}^{K} (1 - \\beta'_l).$$\n\n(iii) Expand the transition probability vectors $\\pi_k$ to include transitions to state K + 1 via\nthe HDP stick-breaking construction: $\\pi_k \\leftarrow [\\pi_{k,1}, \\ldots, \\pi_{k,K+1}]$, $\\forall k \\in [1, K+1]$, where\n\n$$\\pi'_{k,K+1} \\sim \\mathrm{Beta}\\Big(\\alpha_0 \\beta_{K+1}, \\alpha_0 \\Big(1 - \\sum_{l=1}^{K+1} \\beta_l\\Big)\\Big), \\quad \\pi_{k,K+1} = \\pi'_{k,K+1} \\prod_{l=1}^{K} (1 - \\pi'_{k,l}).$$\n\n(iv) Sample the parameters for a new emission probability and triggering kernel: $\\theta_{K+1} \\sim H$ and $\\psi_{K+1} \\sim H'$.\n\n(d) Perform ancestor resampling for the conditioned state path. Compute the ancestor\nweights $\\tilde{w}_{n-1|N}^{p,j}$ via Eq. 7 and Eq. 8 and resample $a_n^{B_n}$ as $p(a_n^{B_n} = j) \\propto \\tilde{w}_{n-1|N}^{p,j}$.\n(e) Compute and normalize the particle weights:\n\n$$w_n(s_n^j) = \\frac{\\pi(s_n^j \\mid s_{n-1}^{a_{n-1}^j}) F(y_n \\mid s_n^j)}{q_\\Omega(s_n^j \\mid s_{n-1}^{a_{n-1}^j}, y_n)}, \\quad W_n(s_n^j) = \\frac{w_n(s_n^j)}{\\sum_{m=1}^{J} w_n(s_n^m)}.$$\n\n4.1 Metropolis Sampling for Background Intensity and Triggering Kernel\n\nFor the inference of the background intensity \u00b5 and the parameters of the triggering kernels \u03c8k in step\n2 (a) of the PG sampler, Metropolis sampling is used. 
As described in [3], the conditional likelihood\nof the occurrences of a sequence of events in an IPP can be expressed as:\n\n$$L \\triangleq p(t_{1:N} \\mid \\mu, \\psi_{1:K}) = \\Big(\\prod_{n=1}^{N} \\lambda(t_n)\\Big) \\exp\\Big(-\\int_0^T \\lambda(t) \\, dt\\Big). \\tag{5}$$\n\nWe describe the Metropolis update for \u03c8k; a similar update can be derived for \u00b5. A normal\ndistribution with the current value of \u03c8k as its mean is used as the proposal distribution. The proposed\ncandidate $\\psi_k^*$ is accepted with probability $A(\\psi_k^*, \\psi_k) = \\min\\big(1, \\hat{p}(\\psi_k^*)/\\hat{p}(\\psi_k)\\big)$. The ratio can be\ncomputed as:\n\n$$\\frac{\\hat{p}(\\psi_k^*)}{\\hat{p}(\\psi_k)} = \\frac{p(\\psi_k^*)}{p(\\psi_k)} \\cdot \\frac{p(t_{1:N} \\mid \\psi_k^*, \\mathrm{rest})}{p(t_{1:N} \\mid \\psi_k, \\mathrm{rest})} = \\frac{p(\\psi_k^*)}{p(\\psi_k)} \\cdot \\Big(\\prod_{n=1}^{N} \\frac{\\mu(t_n) + \\sum_{u<n} \\psi_{s_u}^*(t_n - t_u)}{\\mu(t_n) + \\sum_{u<n} \\psi_{s_u}(t_n - t_u)}\\Big) \\cdot \\exp\\Big(\\sum_{u \\in [1,N]} \\big(\\Psi_{s_u}(T - t_u) - \\Psi_{s_u}^*(T - t_u)\\big)\\Big). \\tag{6}$$\n\nWe use \u03a8(\u00b7) to represent the cumulative (integrated) function of the kernel \u03c8(\u00b7), i.e., $\\Psi(x) = \\int_0^x \\psi(u) \\, du$. We use\n$\\psi_{s_u}^*(\\cdot)$ to represent the u-th event\u2019s triggering kernel candidate if $s_u = k$. It remains the current\ntriggering kernel otherwise. [0, T ] indicates the time period of the N events.\n\n4.2 Truncated Ancestor Resampling for Non-Markovianity\n\nTruncated ancestor resampling [13] is used for tackling the non-Markovianity caused by the triggering\nmechanism of the proposed model. The ancestor weight can be computed as:\n\n$$\\tilde{w}_{n|N}^{p,j} = w_n^j \\, \\frac{\\gamma_{n+p}(\\{s_{1:n}^j, s'_{n+1:n+p}\\})}{\\gamma_n(s_{1:n}^j)}, \\tag{7}$$\n\n$$\\frac{\\gamma_{n+p}(\\{s_{1:n}^j, s'_{n+1:n+p}\\})}{\\gamma_n(s_{1:n}^j)} = \\frac{p(s_{1:n+p}, y_{1:n+p}, t_{1:n+p})}{p(s_{1:n}, y_{1:n}, t_{1:n})} = \\frac{L(t_{1:n+p})}{L(t_{1:n})} \\cdot \\prod_{i=n+1}^{n+p} F(y_i \\mid s_i) \\pi(s_i \\mid s_{i-1}). \\tag{8}$$\n\nFor notational simplicity, we use $w_n^j$ to represent $w_n(s_n^j)$. 
In general, n + p needs to reach the last\nevent in the sequence. However, due to the computational cost and the decaying influence of past events\nin the proposed iHSMM-IPP, it is practical and feasible to use only a small number of events as an\napproximation instead of using all the remaining events in Eq. 8.\n\nFigure 2: (1) Normalized Hamming distance errors for synthetic data. (2) Cleaned energy consumption\nreadings of the REDD data set. (3) Estimated states by the proposed iHSMM-IPP model.\n\n5 Empirical Study\n\nIn the following experiments, we demonstrate the performance of the proposed inference algorithm\nand show the applications of the proposed iHSMM-IPP model in real-world settings.\n\n5.1 Synthetic Data\n\nAs in [20, 5, 19], we generate the synthetic data of 1000 events via a 4-state Gaussian emission HMM\nwith a self-transition probability of 0.75 and the remaining probability mass uniformly distributed\nover the other 3 states. The means of emission are set to \u22122.0, \u22120.5, 1.0 and 4.0 with a deviation of\n0.5. The occurrence times of events are generated via the Hawkes process with 4 different triggering\nkernels, each of which corresponds to an HMM state. The background intensity is set to 0.6 and\nthe triggering kernels take the exponential form $\\lambda(t) = 0.6 + \\sum_{t_n < t} \\alpha' \\cdot \\exp(-\\beta'(t - t_n))$, with\n{0.1, 0.9},{0.5, 0.9},{0.1, 0.6},{0.5, 0.6} as the {\u03b1\u2032, \u03b2\u2032} parameter pairs of the kernels. 
A thinning\nprocess [15] (a point process variant of rejection sampling) is used to generate the event times of the Hawkes\nprocess.\nWe compared the proposed approach with 4 related methods to demonstrate the performance of the proposed iHSMM-IPP model\nand inference algorithm: particle Gibbs sampler for sticky HDP-HMM [19], weak-limit sampler for\nHDP-HSMM [8], Metropolis-within-Gibbs sampler for marked Hawkes process [17] and variational\ninference for marked Hawkes process [11]. The normalized Hamming distance error is used to\nmeasure the performance of the estimated state sequences. The Diff distance used in [22] (i.e.,\n$\\frac{\\int (\\tilde{\\psi}(t) - \\psi(t))^2 \\, dt}{\\int (\\psi(t))^2 \\, dt}$, where \u03c8(t) and $\\tilde{\\psi}(t)$ represent the true and estimated kernels respectively) is adopted for\nmeasuring the performance of the estimated triggering kernels. The estimated ones are greedily\nmatched to minimize their distances from the ground truth.\nThe average results of the normalized Hamming distance errors are shown in Fig. 2 (1) and the\nDiff distance errors are shown in the second column of Table 1. The results show that the proposed\ninference method can not only quickly converge to an accurate estimation of the latent state sequence\nbut also recover the underlying triggering kernels well. Its clear advantage over the compared SSMs\nand marked Hawkes processes is due to its consideration of both occurrence times and emitted\nobservations for the inference.\n\n5.2 Understanding Energy Consumption Behaviours of Households\n\nIn this section, we use energy consumption data from the Reference Energy Disaggregation Dataset\n(REDD) [9] to demonstrate the application of the proposed model. The data set was collected via\nsmart meters recording detailed appliance-level electricity consumption information for individual\nhouses. The data set was collected with the intention to understand household energy usage patterns\nand make recommendations for efficient consumption. 
The 1 Hz low frequency REDD data is used\nand downsampled to 1 reading per minute, covering one day of energy consumption. Very low and very high\nconsumption readings are removed from the reading sequence. Fig. 2 (2) shows the cleaned reading\nsequence. Colours indicate appliance types and readings are in Watts.\nThe appliance types are modelled as latent states in the proposed iHSMM-IPP model. The readings\nare the emitted observations of states governed by Gaussian distributions. The relationship between\nthe usages of different appliances is modelled via the state transition matrix. The triggering kernels\nof states in the model depict the influences of appliances on triggering subsequent energy\nconsumption, e.g., the usage of a washing machine triggers the subsequent energy usage of a dryer. As in\nthe first experiment, the exponential form of the triggering kernel is adopted and independent exponential priors with\nhyper-parameter 0.01 are used for the kernel parameters (\u03b1\u2032, \u03b2\u2032).\n\nMethod | Synthetic Diff | REDD Hamming | REDD LogLik | Pipe Hamming | Pipe LogLik | Pipe MSE Failures | Pipe MSE Hours\niHSMM-IPP | 0.36 | 0.30 | \u2212120.11 | 0.39 | \u2212677 | 82.8 | 28.6\nM-MHawkes | 0.55 | 0.63 | \u2212173.36 | 0.64 | \u22121035 | 142.2 | 80.2\nVI-MHawkes | 0.62 | 0.76 | \u2212193.62 | 0.78 | \u22121200 | 166.7 | 93.7\nHDP-HSMM | - | 0.42 | \u2212147.52 | 0.52 | \u2212850 | 103.8 | 42.3\nS-HDP-HMM | - | 0.55 | \u2212163.28 | 0.59 | \u2212993 | 128.5 | 55.9\n\nTable 1: Results on Synthetic, REDD and Pipe data sets.\n\nThe 4 methods used in the first experiment are compared with the proposed model. The average\nresults of the normalized Hamming distance errors and the log likelihoods are shown in the third and\nfourth columns of Table 1. The proposed model outperforms the other methods due to the fact that it\nhas the flexibility to capture the interaction between the usages of different appliances. 
Other models\nmainly rely on the emitted observations, i.e., the readings, to infer the types of appliances.\n\n5.3 Understanding Infrastructure Failure Behaviours and Impacts\n\nDrinking water pipe networks are valuable infrastructure assets. Their failures (e.g., pipe bursts and\nleaks) can cause tremendous social and economic costs. Hence, it is of significant importance to\nunderstand the behaviours of pipe failures (i.e., occurrence time, failure type, labour hours for repair).\nIn particular, the relationship between the types of two consecutive failures, the triggering effect of a\nfailure on the intensity of future failures and the labour hours taken for a certain type of failure can\nprovide not only insights but also guidance for making informed maintenance strategies.\nIn this experiment, a sequence of 1600 failures of 10 different failure types that occurred in the same zone\nwithin 15 years [12] is used for testing the performance of the proposed iHSMM-IPP model.\nFailure types are modelled as latent states. Labour hours for repair are emissions of states, which are\nmodelled by Gaussian distributions. It is well observed in industry that pipe failures occur in clusters,\ni.e., certain types of failures can cause high failure risk in the near future. Such behaviours are modelled\nvia the triggering kernels of states.\nAs in the first experiment, we compare the proposed iHSMM-IPP model with 4 related methods.\nThe sequence is divided into two parts, 90% and 10%. The first part of the sequence is used for\ntraining the models. The normalized Hamming distance errors and log likelihoods are used for measuring\nthe performance on the first part. Then the models are used for predicting the remaining 10% of\nthe sequence. The predicted total number of failures and total labour hours for each failure type\nare compared with the ground truth by using mean square error. The results are shown in the last four\ncolumns of Table 1. 
It can be seen that the proposed iHSMM-IPP achieves the best performance\nfor both the estimation on the first part of the sequence and the prediction on the second part of the\nsequence. Its superiority comes from the fact that it makes good use of both the observed labour hours and\noccurrence times, while the others only consider part of the observed information or have limitations on\nmodel flexibility.\n\n6 Conclusion\n\nIn this work, we proposed a new Bayesian nonparametric stochastic point process model, namely the\ninfinite hidden semi-Markov modulated interaction point process model. It models both the emitted\nobservations and the arrival times of temporal events to capture the underlying event correlation. A\nMetropolis-within-particle-Gibbs sampler with truncated ancestor resampling is developed for the\nposterior inference of the proposed model. The effectiveness of the sampler is shown on a synthetic\ndata set in comparison with 4 related state-of-the-art methods. The superiority of the proposed\nmodel over the compared methods is also demonstrated in two real-world applications, i.e., household\nenergy consumption modelling and infrastructure failure modelling.\n\nReferences\n[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269\u2013342, 2010.\n[2] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In NIPS, pages 577\u2013584, 2001.\n[3] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes: volume II: general theory and structure. Springer Science & Business Media, 2007.\n[4] P. J. Diggle, T. Fiksel, P. Grabarnik, Y. Ogata, D. Stoyan, and M. Tanemura. On parameter estimation for pairwise interaction point processes. International Statistical Review, pages 99\u2013117, 1994.\n[5] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. 
An HDP-HMM for systems with state persistence. In Proceedings of the 25th International Conference on Machine Learning, pages 312\u2013319. ACM, 2008.\n[6] A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83\u201390, 1971.\n[7] A. G. Hawkes and D. Oakes. A cluster process representation of a self-exciting process. Journal of Applied Probability, pages 493\u2013503, 1974.\n[8] M. J. Johnson and A. Willsky. The hierarchical Dirichlet process hidden semi-Markov model. arXiv preprint arXiv:1203.3485, 2012.\n[9] J. Z. Kolter and M. J. Johnson. REDD: A public data set for energy disaggregation research. In Proceedings of the SustKDD Workshop on Data Mining Applications in Sustainability, 2011.\n[10] J. Kingman. On doubly stochastic Poisson processes. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 60, pages 923\u2013930. Cambridge Univ Press, 1964.\n[11] L. Li and H. Zha. Energy usage behavior modeling in energy disaggregation via marked Hawkes process. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 672\u2013678, 2015.\n[12] P. Lin, B. Zhang, T. Guo, Y. Wang, and F. Chen. Interaction point processes via infinite branching model. In The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.\n[13] F. Lindsten, M. I. Jordan, and T. B. Sch\u00f6n. Particle Gibbs with ancestor sampling. The Journal of Machine Learning Research, 15(1):2145\u20132184, 2014.\n[14] K. P. Murphy. Hidden semi-Markov models (HSMMs). Unpublished notes, 2, 2002.\n[15] Y. Ogata. On Lewis\u2019 simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23\u201331, 1981.\n[16] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257\u2013286, 1989.\n[17] J. G. Rasmussen. 
Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, 15(3):623\u2013642, 2013.\n[18] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 2006.\n[19] N. Tripuraneni, S. Gu, H. Ge, and Z. Ghahramani. Particle Gibbs for infinite hidden Markov models. In Advances in Neural Information Processing Systems, pages 2386\u20132394, 2015.\n[20] J. Van Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088\u20131095. ACM, 2008.\n[21] S.-Z. Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215\u2013243, 2010.\n[22] K. Zhou, H. Zha, and L. Song. Learning triggering kernels for multi-dimensional Hawkes processes. In ICML, pages 1301\u20131309, 2013.\n", "award": [], "sourceid": 1935, "authors": [{"given_name": "matt", "family_name": "zhang", "institution": "Nicta"}, {"given_name": "Peng", "family_name": "Lin", "institution": "Data61"}, {"given_name": "Peng", "family_name": "Lin", "institution": "Data61"}, {"given_name": "Ting", "family_name": "Guo", "institution": "Data61"}, {"given_name": "Yang", "family_name": "Wang", "institution": "Data61"}, {"given_name": "Yang", "family_name": "Wang", "institution": "Data61"}, {"given_name": "Fang", "family_name": "Chen", "institution": "Data61"}]}