Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data Modelling

Yaqiong Li, Xuhui Fan, Ling Chen, Bin Li, Zheng Yu, Scott A. Sisson

Introduction

Dynamic data is a common feature in many real-world applications, including relational data analysis Mucha et al. (2010); Phan and Airoldi (2015); Yang and Koeppl (2018) for learning time-varying node interactions, and text modelling Guo et al. (2018); Schein et al. (2019) for exploring topic evolution. Modelling dynamic data has become a vibrant research topic, with popular techniques ranging from non-Bayesian methods, such as Collaborative Filtering with Temporal Dynamics (SVD++) Koren (2009), to Bayesian deep probabilistic frameworks such as Deep Poisson-Gamma Dynamical Systems (DPGDS) Guo et al. (2018). The main advantage of Bayesian deep probabilistic frameworks is the flexible model design and the strong modelling performance. However, most of these frameworks are static so that they cannot account for the evolution of relationships over time. It would be highly beneficial if the frameworks can be extended to the dynamic setting to enjoy the modelling advantages.

The Dirichlet Belief Network (DirBN) Zhao et al. (2018) has been proposed recently as a promising deep probabilistic framework for learning interpretable deep latent structures. To date, the DirBN has mainly been used in two applications: (1) topic structure learning Zhao et al. (2018), where latent representations are used to model the word distribution for topics; and (2) relational models Fan et al. (2019a), where latent representations model the nodes’ membership distribution over communities. By constructing a deep architecture for latent distributions, the DirBN can model high-order dependence between topic-word distributions (in topic models) and nodes’ membership distributions (in relational models).

In this work, we propose a Recurrent Dirichlet Belief Network (Recurrent-DBN) to explore the complex latent structures in dynamic relational data. In addition to constructing an interpretable deep architecture for the data within individual time steps, we also study the temporal dependence in the dynamic relational data through (layer-to-layer) connections crossing consecutive time steps. Consequently, our Recurrent-DBN can describe long-term temporal dependence (i.e., the dependence between the current variables and those in the previous several time steps), improving over the one-order Markov structures that usually describe the dependence between the current variables and those in the previous one time step only.

For model inference, we further develop an efficient Gibbs sampling algorithm. Besides upward propagating latent counts as done by DirBN, we also introduce a backward step to propagate the counts from the current time step to the previous time steps. Our experiments on real-world dynamic relational data show significant advantages of the Recurrent-DBN over the state-of-the-art models in tasks of interpretable latent structure discovery and link prediction. Similar to DirBN that can be considered as a self-contained module Zhao et al. (2018), our Recurrent-DBN could be flexibly adapted to account for dynamic data other than evolving relational data, such as time-varying counts and dynamic drifting text data.

We summarise this paper’s main merits as follows:

Recurrent structures are designed to model long term temporal dependence. Also, interpretable and organised latent structures are well explored;

An efficient Gibbs sampling method is devised that first upward-backward propagates latent counts and then downward-forward samples variable;

Significantly improved model performance in real-world dynamic relational models compared to the state-of-the-art, including better link prediction performance and enhanced interpretable latent structure visualisation.

Background information of DirBN

We first give a brief review of the DirBN model. In general, the DirBN constructs a multi-stochastic layered architecture to represent interpretable latent distributions for objects. We describe it within the relational data setting for illustrative purposes. Given a binary observed linkage matrix R{0,1}N×N\boldsymbol{R}\in\{0,1\}^{N\times N} for NN nodes, where RijR_{ij} denotes whether node ii has a relation to node jj, the DirBN constructs an LL-layer and KK-length community membership distribution πi={πi(l)}l=1L{\boldsymbol{\pi}_{i}=}\{\boldsymbol{\pi}_{i}^{(l)}\}_{l=1}^{L} for each node ii. The generative process for the membership distributions {πi(l)}l=1L\{\boldsymbol{\pi}_{i}^{(l)}\}_{l=1}^{L}, as well as the observed matrix R\boldsymbol{R}, can be briefly described as:

βii(l1)Gam(c,1d),i,i=1,,N\beta_{i^{\prime}i}^{(l-1)}\sim\text{Gam}(c,\frac{1}{d}),\forall i,i^{\prime}=1,\ldots,N

πi(l)Dirichlet(α1×K1(l=1)+iβii(l1)πi(l1))\boldsymbol{\pi}_{i}^{(l)}\sim\text{Dirichlet}(\boldsymbol{\alpha}^{1\times K}\boldsymbol{1}(l=1)+\sum_{i^{\prime}}\beta_{i^{\prime}i}^{(l-1)}\boldsymbol{\pi}_{i^{\prime}}^{(l-1)})

XiMultinomial(M;πi(L)),i=1,,N\boldsymbol{X}_{i}\sim\text{Multinomial}(M;\boldsymbol{\pi}_{i}^{(L)}),\forall i=1,\ldots,N;

RijBernoulli(f(Xi,Xj)),i,i=1,,NR_{ij}\sim\text{Bernoulli}\left(f(\boldsymbol{X}_{i},\boldsymbol{X}_{j})\right),\forall i,i^{\prime}=1,\ldots,N;

where α1×K\boldsymbol{\alpha}^{1\times K} is a concentration parameter generating the membership distribution in the 11st layer, βii(l1)\beta_{i^{\prime}i}^{(l-1)} represents the information propagation coefficient from node ii^{\prime} to node ii in the (l1)(l-1)th layer, cc and dd are the hyper-parameters generating these propagation coefficients, Xi\boldsymbol{X}_{i} is the latent count information for node ii and MM is the sum of these counts, and f(Xi,Xj)f(\boldsymbol{X}_{i},\boldsymbol{X}_{j}) represents the probabilistic function mapping a pair of membership distributions to a linkage probability. A larger value of βii(l1)\beta_{i^{\prime}i}^{(l-1)} indicates higher influence of πi(l1)\boldsymbol{\pi}_{i^{\prime}}^{(l-1)} on the generation of πi(l)\boldsymbol{\pi}_{i}^{(l)}. Therefore, βii(l1)\beta_{i^{\prime}i}^{(l-1)} is set to if node ii^{\prime} is not connected to node ii in the observed data R\boldsymbol{R}.

It is difficult to directly implement efficient Gibbs sampling for the DirBN because the prior and posterior distributions of the membership distributions πi(l)\boldsymbol{\pi}_{i}^{(l)} are not conjugate. To address this issue, a strategy of first upward propagating latent counts and then downward sampling variables has been developed in Zhao et al. (2018). Given the count information Xi\boldsymbol{X}_{i} for node ii, the DirBN upward propagates Xi\boldsymbol{X}_{i} to all the nodes in the (L1)(L-1)th layer through a Chinese Restaurant Table (CRT) distribution. Each node in the (L1)(L-1)th layer collects these propagated counts and uses their sum as its latent count Xi(L1)\boldsymbol{X}_{i}^{(L-1)} in the (L1)(L-1)th layer. This procedure is repeated until the counts have been assigned to all layers. Thus, conjugate constructions can be created for each variable and thereby used to construct efficient Gibbs samplers.

Recurrent-DBN for dynamic relational data modeling

To handle dynamic relational data, we attach an index tt to variables to denote the corresponding time step. Thus, the observed dynamic relational data can be described as R{0,1}N×N×T\boldsymbol{R}\in\{0,1\}^{N\times N\times T} for NN nodes at TT time steps, where Rij,tR_{ij,t} denotes whether node ii has relation to node jj at the ttth time step. Each matrix {R,t}t{0,1}N×N\{\boldsymbol{R}_{-,t}\}_{t}\in\{0,1\}^{N\times N} can be either asymmetric (directional) or symmetric (non-directional) and we do not consider self-linkages {Rii,t}i,t\{R_{ii,t}\}_{i,t}.

In the Recurrent-DBN, we assume the time-dependent membership distribution of a node ii in the ll-th layer at time step tt, πi,t(l)\boldsymbol{\pi}_{i,t}^{(l)}, follows a Dirichlet distribution. Its generative process can be described as below, with the propagation of πi,t(l)\boldsymbol{\pi}_{i,t}^{(l)} illustrated in Fig. 1 (Left). For notation convenience, any parameters with index are set to zero. It is noted that we have already used the observed data into the generative process.

\beta_{i^{\prime}i,t}^{(l-1)}\left\{\begin{array}[]{ll}=0,&\text{ if }R_{i^{\prime}i,t}=0;\\ \sim\text{Gam}(c_{c}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}=i;\\ \sim\text{Gam}(c_{u}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}\neq i.\end{array}\right.

\gamma_{i^{\prime}i,t-1}^{(l)}\left\{\begin{array}[]{ll}=0,&\text{ if }R_{i^{\prime}i,t-1}=0;\\ \sim\text{Gam}(c_{c}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}=i;\\ \sim\text{Gam}(c_{u}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}\neq i.\end{array}\right.

Calculate concentration parameter ψi,t(l)\boldsymbol{\psi}_{i,t}^{(l)}:

πi,t(l)Dirichlet(α1×K1(t=1,l=1)+ψi,t(l))\boldsymbol{\pi}_{i,t}^{(l)}\sim\text{Dirichlet}(\boldsymbol{\alpha}^{1\times K}\boldsymbol{1}(t=1,l=1)+\boldsymbol{\psi}_{i,t}^{(l)}).

We restrict the two nodes to have information propagated only if they are observed with positive relationship (step (a).i and (a).ii). This can reduce the computational cost of calculating β,t(l),γ,t(l)\boldsymbol{\beta}_{-,t}^{(l)},\boldsymbol{\gamma}_{-,t}^{(l)} from O(N2)\mathcal{O}(N^{2}) to the scale of the number of positive relationships. Also, it encourages connected nodes to have more similar membership distributions and larger dependencies between each other.

The concentration parameter ψi,t(l)\boldsymbol{\psi}_{i,t}^{(l)} for generating πi,t(l)\boldsymbol{\pi}_{i,t}^{(l)} comprises two parts: the information propagated from all other nodes’ latent representations in the (l1)(l-1)-th layer at time tt, iβii,t(l1)πi,t(l1)\sum_{i^{\prime}}\beta_{i^{\prime}i,t}^{(l-1)}\boldsymbol{\pi}_{i^{\prime},t}^{(l-1)}, and those in the ll-th layer at time (t1)(t-1), iγii,t1(l)πi,t1(l)\sum_{i^{\prime}}\gamma_{i^{\prime}i,t-1}^{(l)}\boldsymbol{\pi}_{i^{\prime},t-1}^{(l)}. In other words, ψi,t(l)\boldsymbol{\psi}_{i,t}^{(l)} is a linear sum of all the previous-layers’ information at the same time step and all the previous-time steps’ information in the same layer. When the coefficients β\boldsymbol{\beta} dominate over γ\boldsymbol{\gamma}, the hierarchical structure plays a more important role. Otherwise, the temporal dependence has higher influence.

2 Application to dynamic relational data

After generating the membership distributions {πi,t(l)}\{\boldsymbol{\pi}_{i,t}^{(l)}\}, we use the Bernoulli-Poisson link function Dunson and Herring (2005); Zhou (2015); Fan et al. (2019a) to generate the relational data at each time step:

Λk1k2Gamma(λ1,λ0),k1,k2\Lambda_{k_{1}k_{2}}\sim\text{Gamma}(\lambda_{1},\lambda_{0}),\forall k_{1},k_{2}

Mi,tPoisson(M),i,tM_{i,t}\sim\text{Poisson}(M),\forall i,t

Xi,tMultinomial(Mi,t;πi,t(L)),i,t\boldsymbol{X}_{i,t}\sim\text{Multinomial}(M_{i,t};\boldsymbol{\pi}_{i,t}^{(L)}),\forall i,t;

Cij,k1k2,tPoisson(Xi,k1,tΛk1k2Xj,k2,t),k1,k2C_{ij,k_{1}k_{2},t}\sim\text{Poisson}(X_{i,k_{1},t}\Lambda_{k_{1}k_{2}}X_{j,k_{2},t}),\forall k_{1},k_{2}

Rij,t=1(k1,k2Cij,k1k2,t>0)R_{ij,t}={\boldsymbol{1}(\sum_{k_{1},k_{2}}C_{ij,k_{1}k_{2},t}>0)},

where Λk1k2\Lambda_{k_{1}k_{2}} is a community compatibility parameter such that a larger value of Λk1k2\Lambda_{k_{1}k_{2}} indicates a larger possibility of generating the links between communities k1k_{1} and k2k_{2}, λ1,λ0,M\lambda_{1},\lambda_{0},M are hyper-parameters, Mi,tM_{i,t} is a scaling parameter for generating the related counting information for node ii at time tt, and Cij,k1k2,tC_{ij,k_{1}k_{2},t} is a community-to-community latent integer for linkage RijR_{ij} at time tt.

Through the Multinomial distributions with πi,t(L)\boldsymbol{\pi}_{i,t}^{(L)} as event probabilities, Xi,t\boldsymbol{X}_{i,t} can be regarded as an estimator of πi,t(L)\boldsymbol{\pi}_{i,t}^{(L)}. Since the sum MiPoisson(M)M_{i}\sim{\text{Poisson}(M)}, according to the Poisson-Multinomial equivalence, each Xi,k,tX_{i,k,t} is equivalently distributed as Xi,k,tPoisson(Mπi,k,t(L))X_{i,k,t}\sim{\text{Poisson}(M{\pi}_{i,k,t}^{(L)})}. Therefore, both the prior distribution for generating Xi,k,tX_{i,k,t} and the likelihood based on Xi,k,tX_{i,k,t} are Poisson distributions. We may form feasible categorical distribution on its posterior inference. This trick is inspired by the recent advances in data augmentation and marginalisation techniques Fan et al. (2019a), which allows us to implement posterior sampling for Xi,k,tX_{i,k,t} efficiently.

The counts Xi,t\boldsymbol{X}_{i,t} lead to the generation of the K×KK\times K integer matrix Cij,t\boldsymbol{C}_{ij,t}. Based on the Bernoulli-Poisson link function Dunson and Herring (2005); Zhou (2015), the observed Rij,tR_{ij,t} is mapped to the latent Poisson count random variable matrix Cij,t\boldsymbol{C}_{ij,t}. It is shown in Fan et al. (2019a) that {Cij,k1k2,t}k1,k2=0\{C_{ij,k_{1}k_{2},t}\}_{k_{1},k_{2}}=0 if Rij,t=0R_{ij,t}=0. That is, only the non-zero links are involved during the inference for Cij,k1k2,t\boldsymbol{C}_{ij,k_{1}k_{2},t}, which largely reduces the computational complexity, especially for large and sparse dynamic relational data.

Recurrent structure. Before describing the inference of Recurrent-DBN, we discuss the characteristic of the recurrent structure of our model. Instead of using the one-order Markov property to describe the temporal dependence (assuming the state at time tt depends on the states at time t1t-1 only), which is adopted by most probabilistic dynamic models, the deep structure of the Recurrent-DBN allows the latent variables at time tt depend on those at time steps from t1t-1 to tLt-L. For example, by using the law of total expectations, we can have the expectation of the latent count X,t\boldsymbol{X}_{-,t} in a 22-layered Recurrent-DBN as (We use the notation - to denote a related parameter or variable hereafter):

3 Inference

The joint distribution of the latent variables is expressed as:

While the DirBN only has upward-propagation for the latent counts and downward-sampling for the latent variables, for the Recurrent-DBN we develop an upward-backward propagation and forward-downward Gibbs sampling algorithm for count propagation and latent variable sampling. Posterior simulation for the Recurrent-DBN involves two key steps in each sampling iteration: (1) propagating the counts X\boldsymbol{X} upward and backward to the upper layers and previous time steps via a latent count variable m\boldsymbol{m}; (2) forward and downward sampling π,β,γ\boldsymbol{\pi},\boldsymbol{\beta},\boldsymbol{\gamma} given the propagated latent counts m\boldsymbol{m}. Full updates for the other variables are similar to those in Fan et al. (2019a).

Figure 1 (right) illustrates the upward-backward propagation of counts X\boldsymbol{X} to the latent count variable m\boldsymbol{m} at each hidden layers. Generally speaking, for i,i=1,,N,l=1,,L,t=1,,T,k=1,,Ki,i^{\prime}=1,\ldots,N,l=1,\ldots,L,t=1,\ldots,T,k=1,\ldots,K, the latent variable ψ\boldsymbol{\psi} is generated as Eq. (1). mi,k,t(l)m_{i,k,t}^{(l)} refers to the latent counts for the node ii in layer ll at time tt for the kk-th community. By integrating the mi,k,t(l)m_{i,k,t}^{(l)}, the likelihood term of ψi,t(l)\boldsymbol{\psi}_{i,t}^{(l)} can be calculated as:

By introducing the auxiliary variables qi,t(l)q_{i,t}^{(l)} and yi,k,t(l)y_{i,k,t}^{(l)}, the likelihood term of ψi,t(l)\boldsymbol{\psi}_{i,t}^{(l)} can be further augmented as:

where the qi,t(l)q_{i,t}^{(l)} and yi,k,t(l)y_{i,k,t}^{(l)} can be generated as:

Consequently, yi,k,t(l)y_{i,k,t}^{(l)} can be considered as the ‘derived latent counts’ for node ii derived from the latent counts mi,k,t(l)m_{i,k,t}^{(l)}. Each yi,k,t(l)y_{i,k,t}^{(l)} can then be upward and backward distributed based on the probabilities of ψi,k,t(l)\psi_{i,k,t}^{(l)} as follows:

Here, the yi,k,t(l)y_{i,k,t}^{(l)} is divided into two parts: one is delivered to each ii^{\prime} at time tt of layer l1l-1 ((Zi1,k,t(l1),,ZiN,k,t(l1))(Z_{i1,k,t}^{(l-1)},\ldots,Z_{iN,k,t}^{(l-1)})), and the other to each ii^{\prime} at time t1t-1 of layer ll (Ai1,t1,k(l),,Ai1,t1,k(l))A_{i1,t-1,k}^{(l)},\ldots,A_{i1,t-1,k}^{(l)})). We denote them as Zi,k,t(l1)\boldsymbol{Z}_{i-,k,t}^{(l-1)} and Ai,t1,k(l)\boldsymbol{A}_{i-,t-1,k}^{(l)} respectively. The latent counts of lower layers and previous time steps can thus be calculated respectively as:

Let mi,T(L)=Xi,T\boldsymbol{m}_{i,T}^{(L)}=\boldsymbol{X}_{i,T}, for t=T1,,2,i,i=1,,Nt=T-1,\ldots,2,i,i^{\prime}=1,\ldots,N, the specification in terms of layer LL is as follows,

To summarize, upward and backward propagation derives yi,t(l)\boldsymbol{y}_{i,t}^{(l)} from the latent counts mi,t(l)\boldsymbol{m}_{i,t}^{(l)}. Then, yi,t(l)\boldsymbol{y}_{i,t}^{(l)} is distributed to all ii^{\prime} at time tt of layer l1l-1 and time t1t-1 of layer ll respectively as Zi,k,t(l1)\boldsymbol{Z}_{i-,k,t}^{(l-1)} and Ai,t1,k(l)\boldsymbol{A}_{i-,t-1,k}^{(l)}. Lastly, Zi,k,t(l1)\boldsymbol{Z}_{-i,k,t}^{(l-1)} and Ai,t1,k(l)\boldsymbol{A}_{-i,t-1,k}^{(l)} contribute to the generation of mi,t(l1)\boldsymbol{m}_{i,t}^{(l-1)} and mi,t1(l)\boldsymbol{m}_{i,t-1}^{(l)} respectively. By repeating this process through layers and crossing time steps, we propagate the X\boldsymbol{X} to the m(l)\boldsymbol{m}^{(l)} upward and backward sequentially.

3.2 Forward-Downward Sampling Latent Variables

The generated ψ,q,m(l),(Z,A)\boldsymbol{\psi},\boldsymbol{q},\boldsymbol{m}^{(l)},(\boldsymbol{Z},\boldsymbol{A}) can enable to form closed Gibbs sampling algorithm for the following variables:

After obtaining the latent counts mi,t(l)\boldsymbol{m}_{i,t}^{(l)} for each layer and each time step, the posterior inference of πi,t(l)\boldsymbol{\pi}_{i,t}^{(l)} can be proceeded as:

The likelihood term of βii,t(l){\beta}_{i^{\prime}i,t}^{(l)} can be represented as:

The prior of βii,t(l),βii,t(l)\beta_{i^{\prime}i,t}^{(l)},\beta_{i^{\prime}i,t}^{(l)} is Gam(γi(l),1c(l))\text{Gam}(\gamma^{(l)}_{i},\frac{1}{c^{(l)}}). Their posterior distribution is

Related Work

Several Bayesian deep probabilistic frameworks have been proposed to capture the temporal dependence in dynamic data Gan et al. (2015); Gong (2017); Henao et al. (2015). The Deep Dynamic Sigmoid Belief Network Gan et al. (2015) sequentially stacks models of sigmoid belief networks and uses the binary-valued hidden variables to depict the log-range dynamic dependence. The Deep Dynamic Poisson Factor Analysis (DDPFA) Gong (2017) incorporates the Recurrent Neural Networks (RNN) into the Poisson Factor Analysis (PFA) to depict the long-range dynamic dependence. However, in DDPFA, the parameters in RNN and the latent variables in PFA are optimized separately. Poisson Gamma Dynamic Systems (PGDS) Schein et al. (2016) are developed to model the counting data through a “shallow” modelling strategy. Dynamic-PGDS (DPGDS) Guo et al. (2018) is probably the closest work to our approach. Compared with DPGDS, our Recurrent-DBN differs in three aspects: (1) our Recurrent-DBN generates normalized latent representations and thus provides more interpretable structures; (2) the count information is propagated in a different way; (3) our Recurrent-DBN is devised in the setting of relational modelling, while DPGDS is for the topic modelling setting.

For modelling dynamic network data, many of the existing works are “shallow” probabilistic modelling. The dynamic Tensorial Mixed Membership Stochastic Block model (T-MBM) Tarrés-Deulofeu et al. (2019) and the Fragmentation Coagulation Based MMSB (fcMMSB) Yu and Fan (2020) combine the notable mixed-membership stochastic block model with a dynamic setting. The Bayesian Poisson Tensor Factorization (BPTF) Schein et al. (2015) and the Dependent Relational Gamma Process model (DRGPM) Yang and Koeppl (2018) are the representative works that use Poisson matrix factorization techniques to address dynamic counting data. There are also some models using the collaborative filtering techniques such as SVD++. Some methods are not developed for dynamic network data originally, but they have later been applied to the dynamic scenario, such as structure-based models like Common Neighbor (CN) Newman (2001), and network embedding models, including Scalable Multiplex Network Embedding (MNE) Zhang et al. (2018) and DeepWalk Perozzi et al. (2014). It is noted that there is a recent trend in using the graphon theory Lloyd et al. (2012); Orbanz and Roy (2014) to model the network data Fan et al. (2016, 2018b, 2018a, 2019b, 2020).

Experiments

We evaluate the performance of our proposed Recurrent-DBN on five real-world data sets, by comparing with nine baseline methods: Mixed Membership Stochastic Block model (MMSB) Airoldi et al. (2008), T-MBM, fcMMSB, BPTF, DRGPM, SVD++, CN, MNE and DeepWalk. Except MMSB, all of the other eight baseline models are implemented with the released code. For MMSB, we use Gibbs sampling for the inference of all variables.

The real-world relational data sets used in this paper are: Coleman Coleman (1964), Mining Reality Eagle and Pentland (2006), Hypertext Isella et al. (2011), Infectious Isella et al. (2011) and Student Net Fan et al. (2014). The summarized statistics are detailed in Table 2. For the hyper-parameters, we specify MGamma(N,1)M\sim\text{Gamma}(N,1) for all data sets, {cc(l),cu(l)}l,d,dc\{c_{c}^{(l)},c_{u}^{(l)}\}_{l},d,d_{c} and Λk1,k2\Lambda_{k1,k2} are all given Gamma(1,1)\text{Gamma}(1,1) priors and L=3L=3. For MMSB, we set the membership distribution according to Dirichlet(11×K)\text{Dirichlet}(\boldsymbol{1}^{1\times K}).

2 Link prediction

For link prediction, we randomly extract a proportion of 10%10\% of relational data entries (either links or non-links) at each time step as the test set. The remaining 90%90\% is used for training. The test relational data are not used to construct the information propagation matrix (i.e., we set {βii,t(l)}l,t=0\{\beta_{i^{\prime}i,t}^{(l)}\}_{l,t}=0 if RiiR_{i^{\prime}i} is the testing data). We estimate the posterior mean of ek1,k2Xi,k1,tΛk1k2Xj,k2,te^{-{\sum_{k_{1},k_{2}}X_{i,k_{1},t}\Lambda_{k_{1}k_{2}}X_{j,k_{2},t}}} as the linkage probability for each test data. These linkage probabilities are then used to calculate two evaluation metrics: the area under the curve of the receiver operating characteristic (AUC) and the precision-recall (precision). Higher values of AUC and precision indicate better model performance.

The detail results are shown in Table 1. We report the average evaluation results for each model over 1616 runs. Each run uses 30003000 MCMC iterations with the first 15001500 discarded as burn-in. Overall, Recurrent-DBN outperforms the baseline models for both metrics on almost all data sets. As might be expected, the value of AUC and precision increase with higher model complexity of Recurrent-DBN (i.e., larger values of KK). For the other methods, fcMMSB is competitive with DRGPM and outperforms the other baselines. However, they all perform worse than Recurrent-DBN, especially for data sets with large numbers of NN or TT. We can see that the Recurrent-DBN has clear advantages in learning dynamic relational data, thanks to the deep hierarchical structure and recurrent long-term temporal dependence modelling.

3 Latent variable visualization

To gain further insights, we visualize the latent variables in Figure 2. It can be observed from the top part that: (1) for the same time step, the membership distributions change gradually with the increase of layers; (2) the membership distributions share some similarities for consecutive time steps and the similarities slowly shift along with the time. For example, the left bottom area of {πi=1:30(3)}\{\boldsymbol{\pi}_{i=1:30}^{(3)}\} seems to have 33 different patterns: time steps t=1t=1, t=24t=2\sim 4, and t=510t=5\sim 10. The bottom part of Figure 2 visualizes propagation coefficients. It is reasonable to see the values of β\overline{\boldsymbol{\beta}} in the first layer and γ\overline{\boldsymbol{\gamma}} in the first several time steps are small, since less information is propagated in these cases. The values become larger when more information is propagated. Also, the layer-wise propagation seems to have a larger influence than the cross-time propagation, with an average value of β/γ=1.21.4\overline{\boldsymbol{\beta}}/\overline{\boldsymbol{\gamma}}=1.2\sim 1.4.

Conclusion

We have presented a probabilistic deep hierarchical structure named Recurrent Dirichlet Belief Networks (Recurrent-DBN) for learning dynamic relational data. Through Recurrent-DBN, the evolution of the latent structure is characterized by both the cross-layer and the cross-time dependencies. We also develop an upward-backward–forward-downward information propagation to enable efficient Gibbs sampling for all variables. The experimental results on a variety of real data sets demonstrate the excellent predictive performance of our model, and the inferred latent structure provides a rich interpretation for both hierarchical and dynamic information propagation. Our Recurrent-DBN can be applied to tasks like dynamic topic models Guo et al. (2018); Zhao et al. (2018)) and dynamic collaborative filtering. We keep these potential applications as the future work.

Acknowledgements

This work is partly supported by ARC Discovery Project DP180100966. Yaqiong Li is a recipient of UTS Research Excellence Scholarship. Xuhui Fan and Scott A. Sisson are supported by the Australian Research Council through the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS, CE140100049). Bin Li is supported by Shanghai Municipal Science & Technology Commission (16JC1420401) and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.

References