Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data Modelling
Yaqiong Li, Xuhui Fan, Ling Chen, Bin Li, Zheng Yu, Scott A. Sisson
Introduction
Dynamic data is a common feature in many real-world applications, including relational data analysis Mucha et al. (2010); Phan and Airoldi (2015); Yang and Koeppl (2018) for learning time-varying node interactions, and text modelling Guo et al. (2018); Schein et al. (2019) for exploring topic evolution. Modelling dynamic data has become a vibrant research topic, with popular techniques ranging from non-Bayesian methods, such as Collaborative Filtering with Temporal Dynamics (SVD++) Koren (2009), to Bayesian deep probabilistic frameworks such as Deep Poisson-Gamma Dynamical Systems (DPGDS) Guo et al. (2018). The main advantage of Bayesian deep probabilistic frameworks is the flexible model design and the strong modelling performance. However, most of these frameworks are static so that they cannot account for the evolution of relationships over time. It would be highly beneficial if the frameworks can be extended to the dynamic setting to enjoy the modelling advantages.
The Dirichlet Belief Network (DirBN) Zhao et al. (2018) has been proposed recently as a promising deep probabilistic framework for learning interpretable deep latent structures. To date, the DirBN has mainly been used in two applications: (1) topic structure learning Zhao et al. (2018), where latent representations are used to model the word distribution for topics; and (2) relational models Fan et al. (2019a), where latent representations model the nodes’ membership distribution over communities. By constructing a deep architecture for latent distributions, the DirBN can model high-order dependence between topic-word distributions (in topic models) and nodes’ membership distributions (in relational models).
In this work, we propose a Recurrent Dirichlet Belief Network (Recurrent-DBN) to explore the complex latent structures in dynamic relational data. In addition to constructing an interpretable deep architecture for the data within individual time steps, we also study the temporal dependence in the dynamic relational data through (layer-to-layer) connections crossing consecutive time steps. Consequently, our Recurrent-DBN can describe long-term temporal dependence (i.e., the dependence between the current variables and those in the previous several time steps), improving over the one-order Markov structures that usually describe the dependence between the current variables and those in the previous one time step only.
For model inference, we further develop an efficient Gibbs sampling algorithm. Besides upward propagating latent counts as done by DirBN, we also introduce a backward step to propagate the counts from the current time step to the previous time steps. Our experiments on real-world dynamic relational data show significant advantages of the Recurrent-DBN over the state-of-the-art models in tasks of interpretable latent structure discovery and link prediction. Similar to DirBN that can be considered as a self-contained module Zhao et al. (2018), our Recurrent-DBN could be flexibly adapted to account for dynamic data other than evolving relational data, such as time-varying counts and dynamic drifting text data.
We summarise this paper’s main merits as follows:
Recurrent structures are designed to model long term temporal dependence. Also, interpretable and organised latent structures are well explored;
An efficient Gibbs sampling method is devised that first upward-backward propagates latent counts and then downward-forward samples variable;
Significantly improved model performance in real-world dynamic relational models compared to the state-of-the-art, including better link prediction performance and enhanced interpretable latent structure visualisation.
Background information of DirBN
We first give a brief review of the DirBN model. In general, the DirBN constructs a multi-stochastic layered architecture to represent interpretable latent distributions for objects. We describe it within the relational data setting for illustrative purposes. Given a binary observed linkage matrix for nodes, where denotes whether node has a relation to node , the DirBN constructs an -layer and -length community membership distribution for each node . The generative process for the membership distributions , as well as the observed matrix , can be briefly described as:
;
;
where is a concentration parameter generating the membership distribution in the st layer, represents the information propagation coefficient from node to node in the th layer, and are the hyper-parameters generating these propagation coefficients, is the latent count information for node and is the sum of these counts, and represents the probabilistic function mapping a pair of membership distributions to a linkage probability. A larger value of indicates higher influence of on the generation of . Therefore, is set to if node is not connected to node in the observed data .
It is difficult to directly implement efficient Gibbs sampling for the DirBN because the prior and posterior distributions of the membership distributions are not conjugate. To address this issue, a strategy of first upward propagating latent counts and then downward sampling variables has been developed in Zhao et al. (2018). Given the count information for node , the DirBN upward propagates to all the nodes in the th layer through a Chinese Restaurant Table (CRT) distribution. Each node in the th layer collects these propagated counts and uses their sum as its latent count in the th layer. This procedure is repeated until the counts have been assigned to all layers. Thus, conjugate constructions can be created for each variable and thereby used to construct efficient Gibbs samplers.
Recurrent-DBN for dynamic relational data modeling
To handle dynamic relational data, we attach an index to variables to denote the corresponding time step. Thus, the observed dynamic relational data can be described as for nodes at time steps, where denotes whether node has relation to node at the th time step. Each matrix can be either asymmetric (directional) or symmetric (non-directional) and we do not consider self-linkages .
In the Recurrent-DBN, we assume the time-dependent membership distribution of a node in the -th layer at time step , , follows a Dirichlet distribution. Its generative process can be described as below, with the propagation of illustrated in Fig. 1 (Left). For notation convenience, any parameters with index are set to zero. It is noted that we have already used the observed data into the generative process.
\beta_{i^{\prime}i,t}^{(l-1)}\left\{\begin{array}[]{ll}=0,&\text{ if }R_{i^{\prime}i,t}=0;\\ \sim\text{Gam}(c_{c}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}=i;\\ \sim\text{Gam}(c_{u}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}\neq i.\end{array}\right.
\gamma_{i^{\prime}i,t-1}^{(l)}\left\{\begin{array}[]{ll}=0,&\text{ if }R_{i^{\prime}i,t-1}=0;\\ \sim\text{Gam}(c_{c}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}=i;\\ \sim\text{Gam}(c_{u}^{(l)},\frac{1}{d_{c}}),&\text{ if }i^{\prime}\neq i.\end{array}\right.
Calculate concentration parameter :
.
We restrict the two nodes to have information propagated only if they are observed with positive relationship (step (a).i and (a).ii). This can reduce the computational cost of calculating from to the scale of the number of positive relationships. Also, it encourages connected nodes to have more similar membership distributions and larger dependencies between each other.
The concentration parameter for generating comprises two parts: the information propagated from all other nodes’ latent representations in the -th layer at time , , and those in the -th layer at time , . In other words, is a linear sum of all the previous-layers’ information at the same time step and all the previous-time steps’ information in the same layer. When the coefficients dominate over , the hierarchical structure plays a more important role. Otherwise, the temporal dependence has higher influence.
2 Application to dynamic relational data
After generating the membership distributions , we use the Bernoulli-Poisson link function Dunson and Herring (2005); Zhou (2015); Fan et al. (2019a) to generate the relational data at each time step:
;
,
where is a community compatibility parameter such that a larger value of indicates a larger possibility of generating the links between communities and , are hyper-parameters, is a scaling parameter for generating the related counting information for node at time , and is a community-to-community latent integer for linkage at time .
Through the Multinomial distributions with as event probabilities, can be regarded as an estimator of . Since the sum , according to the Poisson-Multinomial equivalence, each is equivalently distributed as . Therefore, both the prior distribution for generating and the likelihood based on are Poisson distributions. We may form feasible categorical distribution on its posterior inference. This trick is inspired by the recent advances in data augmentation and marginalisation techniques Fan et al. (2019a), which allows us to implement posterior sampling for efficiently.
The counts lead to the generation of the integer matrix . Based on the Bernoulli-Poisson link function Dunson and Herring (2005); Zhou (2015), the observed is mapped to the latent Poisson count random variable matrix . It is shown in Fan et al. (2019a) that if . That is, only the non-zero links are involved during the inference for , which largely reduces the computational complexity, especially for large and sparse dynamic relational data.
Recurrent structure. Before describing the inference of Recurrent-DBN, we discuss the characteristic of the recurrent structure of our model. Instead of using the one-order Markov property to describe the temporal dependence (assuming the state at time depends on the states at time only), which is adopted by most probabilistic dynamic models, the deep structure of the Recurrent-DBN allows the latent variables at time depend on those at time steps from to . For example, by using the law of total expectations, we can have the expectation of the latent count in a -layered Recurrent-DBN as (We use the notation to denote a related parameter or variable hereafter):
3 Inference
The joint distribution of the latent variables is expressed as:
While the DirBN only has upward-propagation for the latent counts and downward-sampling for the latent variables, for the Recurrent-DBN we develop an upward-backward propagation and forward-downward Gibbs sampling algorithm for count propagation and latent variable sampling. Posterior simulation for the Recurrent-DBN involves two key steps in each sampling iteration: (1) propagating the counts upward and backward to the upper layers and previous time steps via a latent count variable ; (2) forward and downward sampling given the propagated latent counts . Full updates for the other variables are similar to those in Fan et al. (2019a).
Figure 1 (right) illustrates the upward-backward propagation of counts to the latent count variable at each hidden layers. Generally speaking, for , the latent variable is generated as Eq. (1). refers to the latent counts for the node in layer at time for the -th community. By integrating the , the likelihood term of can be calculated as:
By introducing the auxiliary variables and , the likelihood term of can be further augmented as:
where the and can be generated as:
Consequently, can be considered as the ‘derived latent counts’ for node derived from the latent counts . Each can then be upward and backward distributed based on the probabilities of as follows:
Here, the is divided into two parts: one is delivered to each at time of layer (), and the other to each at time of layer (). We denote them as and respectively. The latent counts of lower layers and previous time steps can thus be calculated respectively as:
Let , for , the specification in terms of layer is as follows,
To summarize, upward and backward propagation derives from the latent counts . Then, is distributed to all at time of layer and time of layer respectively as and . Lastly, and contribute to the generation of and respectively. By repeating this process through layers and crossing time steps, we propagate the to the upward and backward sequentially.
3.2 Forward-Downward Sampling Latent Variables
The generated can enable to form closed Gibbs sampling algorithm for the following variables:
After obtaining the latent counts for each layer and each time step, the posterior inference of can be proceeded as:
The likelihood term of can be represented as:
The prior of is . Their posterior distribution is
Related Work
Several Bayesian deep probabilistic frameworks have been proposed to capture the temporal dependence in dynamic data Gan et al. (2015); Gong (2017); Henao et al. (2015). The Deep Dynamic Sigmoid Belief Network Gan et al. (2015) sequentially stacks models of sigmoid belief networks and uses the binary-valued hidden variables to depict the log-range dynamic dependence. The Deep Dynamic Poisson Factor Analysis (DDPFA) Gong (2017) incorporates the Recurrent Neural Networks (RNN) into the Poisson Factor Analysis (PFA) to depict the long-range dynamic dependence. However, in DDPFA, the parameters in RNN and the latent variables in PFA are optimized separately. Poisson Gamma Dynamic Systems (PGDS) Schein et al. (2016) are developed to model the counting data through a “shallow” modelling strategy. Dynamic-PGDS (DPGDS) Guo et al. (2018) is probably the closest work to our approach. Compared with DPGDS, our Recurrent-DBN differs in three aspects: (1) our Recurrent-DBN generates normalized latent representations and thus provides more interpretable structures; (2) the count information is propagated in a different way; (3) our Recurrent-DBN is devised in the setting of relational modelling, while DPGDS is for the topic modelling setting.
For modelling dynamic network data, many of the existing works are “shallow” probabilistic modelling. The dynamic Tensorial Mixed Membership Stochastic Block model (T-MBM) Tarrés-Deulofeu et al. (2019) and the Fragmentation Coagulation Based MMSB (fcMMSB) Yu and Fan (2020) combine the notable mixed-membership stochastic block model with a dynamic setting. The Bayesian Poisson Tensor Factorization (BPTF) Schein et al. (2015) and the Dependent Relational Gamma Process model (DRGPM) Yang and Koeppl (2018) are the representative works that use Poisson matrix factorization techniques to address dynamic counting data. There are also some models using the collaborative filtering techniques such as SVD++. Some methods are not developed for dynamic network data originally, but they have later been applied to the dynamic scenario, such as structure-based models like Common Neighbor (CN) Newman (2001), and network embedding models, including Scalable Multiplex Network Embedding (MNE) Zhang et al. (2018) and DeepWalk Perozzi et al. (2014). It is noted that there is a recent trend in using the graphon theory Lloyd et al. (2012); Orbanz and Roy (2014) to model the network data Fan et al. (2016, 2018b, 2018a, 2019b, 2020).
Experiments
We evaluate the performance of our proposed Recurrent-DBN on five real-world data sets, by comparing with nine baseline methods: Mixed Membership Stochastic Block model (MMSB) Airoldi et al. (2008), T-MBM, fcMMSB, BPTF, DRGPM, SVD++, CN, MNE and DeepWalk. Except MMSB, all of the other eight baseline models are implemented with the released code. For MMSB, we use Gibbs sampling for the inference of all variables.
The real-world relational data sets used in this paper are: Coleman Coleman (1964), Mining Reality Eagle and Pentland (2006), Hypertext Isella et al. (2011), Infectious Isella et al. (2011) and Student Net Fan et al. (2014). The summarized statistics are detailed in Table 2. For the hyper-parameters, we specify for all data sets, and are all given priors and . For MMSB, we set the membership distribution according to .
2 Link prediction
For link prediction, we randomly extract a proportion of of relational data entries (either links or non-links) at each time step as the test set. The remaining is used for training. The test relational data are not used to construct the information propagation matrix (i.e., we set if is the testing data). We estimate the posterior mean of as the linkage probability for each test data. These linkage probabilities are then used to calculate two evaluation metrics: the area under the curve of the receiver operating characteristic (AUC) and the precision-recall (precision). Higher values of AUC and precision indicate better model performance.
The detail results are shown in Table 1. We report the average evaluation results for each model over runs. Each run uses MCMC iterations with the first discarded as burn-in. Overall, Recurrent-DBN outperforms the baseline models for both metrics on almost all data sets. As might be expected, the value of AUC and precision increase with higher model complexity of Recurrent-DBN (i.e., larger values of ). For the other methods, fcMMSB is competitive with DRGPM and outperforms the other baselines. However, they all perform worse than Recurrent-DBN, especially for data sets with large numbers of or . We can see that the Recurrent-DBN has clear advantages in learning dynamic relational data, thanks to the deep hierarchical structure and recurrent long-term temporal dependence modelling.
3 Latent variable visualization
To gain further insights, we visualize the latent variables in Figure 2. It can be observed from the top part that: (1) for the same time step, the membership distributions change gradually with the increase of layers; (2) the membership distributions share some similarities for consecutive time steps and the similarities slowly shift along with the time. For example, the left bottom area of seems to have different patterns: time steps , , and . The bottom part of Figure 2 visualizes propagation coefficients. It is reasonable to see the values of in the first layer and in the first several time steps are small, since less information is propagated in these cases. The values become larger when more information is propagated. Also, the layer-wise propagation seems to have a larger influence than the cross-time propagation, with an average value of .
Conclusion
We have presented a probabilistic deep hierarchical structure named Recurrent Dirichlet Belief Networks (Recurrent-DBN) for learning dynamic relational data. Through Recurrent-DBN, the evolution of the latent structure is characterized by both the cross-layer and the cross-time dependencies. We also develop an upward-backward–forward-downward information propagation to enable efficient Gibbs sampling for all variables. The experimental results on a variety of real data sets demonstrate the excellent predictive performance of our model, and the inferred latent structure provides a rich interpretation for both hierarchical and dynamic information propagation. Our Recurrent-DBN can be applied to tasks like dynamic topic models Guo et al. (2018); Zhao et al. (2018)) and dynamic collaborative filtering. We keep these potential applications as the future work.
Acknowledgements
This work is partly supported by ARC Discovery Project DP180100966. Yaqiong Li is a recipient of UTS Research Excellence Scholarship. Xuhui Fan and Scott A. Sisson are supported by the Australian Research Council through the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS, CE140100049). Bin Li is supported by Shanghai Municipal Science & Technology Commission (16JC1420401) and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.