Graph U-Nets
Hongyang Gao, Shuiwang Ji
Introduction
Convolutional neural networks (CNNs) (LeCun et al., 2012) have demonstrated great capability in various challenging artificial intelligence tasks, especially in fields of computer vision (He et al., 2017; Huang et al., 2017) and natural language processing (Bahdanau et al., 2015). One common property behind these tasks is that both images and texts have grid-like structures. Elements on feature maps have locality and order information, which enables the application of convolutional operations (Defferrard et al., 2016).
In practice, many real-world data can be naturally represented as graphs such as social and biological networks. Due to the great success of CNNs on grid-like data, applying them on graph data (Gori et al., 2005; Scarselli et al., 2009) is particularly appealing. Recently, there have been many attempts to extend convolutions to graph data (GNNs) (Kipf & Welling, 2017; Veličković et al., 2017; Gao et al., 2018). One common use of convolutions on graphs is to compute node representations (Hamilton et al., 2017; Ying et al., 2018). With learned node representations, we can perform various tasks on graphs such as node classification and link prediction.
Images can be considered as special cases of graphs, in which nodes lie on regular 2D lattices. It is this special structure that enables the use of convolution and pooling operations on images. Based on this relationship, node classification and embedding tasks have a natural correspondence with pixel-wise prediction tasks such as image segmentation (Noh et al., 2015; Gao & Ji, 2017; Jégou et al., 2017). In particular, both tasks aim to make predictions for each input unit, corresponding to a pixel on images or a node in graphs. In the computer vision field, pixel-wise prediction tasks have achieved major advances recently. Encoder-decoder architectures like the U-Net (Ronneberger et al., 2015) are state-of-the-art methods for these tasks. It is thus highly interesting to develop U-Net-like architectures for graph data. In addition to convolutions, pooling and up-sampling operations are essential building blocks in these architectures. However, extending these operations to graph data is highly challenging. Unlike grid-like data such as images and texts, nodes in graphs have no spatial locality and order information as required by regular pooling operations.
To bridge the above gap, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. Based on these two operations, we propose U-Net-like architectures for graph data. The gPool operation samples some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. As an inverse operation of gPool, we propose a corresponding graph unpooling (gUnpool) operation, which restores the graph to its original structure with the help of locations of nodes selected in the corresponding gPool layer. Based on the gPool and gUnpool layers, we develop graph U-Nets, which allow high-level feature encoding and decoding for network embedding. Experimental results on node classification and graph classification tasks demonstrate the effectiveness of our proposed methods as compared to previous methods.
Related Work
Recently, there has been a rich line of research on graph neural networks (Gilmer et al., 2017). Inspired by the first order graph Laplacian methods, (Kipf & Welling, 2017) proposed graph convolutional networks (GCNs), which achieved promising performance on graph node classification tasks. The layer-wise forward-propagation operation of GCNs is defined as:
In addition to convolution, some studies tried to extend pooling operations to graphs. (Defferrard et al., 2016) proposed to use binary tree indexing for graph coarsening, which fixes indices of nodes before applying 1-D pooling operations. (Simonovsky & Komodakis, 2017) used deterministic graph clustering algorithm to determine pooling patterns. (Ying et al., 2018) used an assignment matrix to achieve pooling by assigning nodes to different clusters of the next layer.
Graph U-Nets
In this section, we introduce the graph pooling (gPool) layer and graph unpooling (gUnpool) layer. Based on these two new layers, we develop the graph U-Nets for node classification tasks.
Pooling layers play important roles in CNNs on grid-like data. They can reduce sizes of feature maps and enlarge receptive fields, thereby giving rise to better generalization and performance (Yu & Koltun, 2016). On grid-like data such as images, feature maps are partitioned into non-overlapping rectangles, on which non-linear down-sampling functions like maximum are applied. In addition to local pooling, global pooling layers (Zhao et al., 2015a) perform down-sampling operations on all input units, thereby reducing each feature map to a single number. In contrast, -max pooling layers (Kalchbrenner et al., 2014) select the -largest units out of each feature map.
However, we cannot directly apply these pooling operations to graphs. In particular, there is no locality information among nodes in graphs. Thus the partition operation is not applicable on graphs. The global pooling operation will reduce all nodes to one single node, which restricts the flexibility of networks. The -max pooling operation outputs the -largest units that may come from different nodes in graphs, resulting in inconsistency in the connectivity of selected nodes.
In this section, we propose the graph pooling (gPool) layer to enable down-sampling on graph data. In this layer, we adaptively select a subset of nodes to form a new but smaller graph. To this end, we employ a trainable projection vector . By projecting all node features to 1D, we can perform -max pooling for node selection. Since the selection is based on 1D footprint of each node, the connectivity in the new graph is consistent across nodes. Given a node with its feature vector , the scalar projection of on is . Here, measures how much information of node can be retained when projected onto the direction of . By sampling nodes, we wish to preserve as much information as possible from the original graph. To achieve this, we select nodes with the largest scalar projection values on to form a new graph.
Notably, the gate operation makes the projection vector trainable by back-propagation (LeCun et al., 2012). Without the gate operation, the projection vector produces discrete outputs, which makes it not trainable by back-propagation. Figure 1 provides an illustration of our proposed graph pooling layer. Compared to pooling operations used in grid-like data, our graph pooling layer employs extra training parameters in projection vector . We will show that the extra parameters are negligible but can boost performance.
2 Graph Unpooling Layer
Up-sampling operations are important for encoder-decoder networks such as U-Net. The encoders of networks usually employ pooling operations to reduce feature map size and increase receptive field. While in decoders, feature maps need to be up-sampled to restore their original resolutions. On grid-like data like images, there are several up-sampling operations such as the deconvolution (Isola et al., 2017; Zhao et al., 2015b) and unpooling layers (Long et al., 2015). However, such operations are not currently available on graph data.
To enable up-sampling operations on graph data, we propose the graph unpooling (gUnpool) layer, which performs the inverse operation of the gPool layer and restores the graph into its original structure. To achieve this, we record the locations of nodes selected in the corresponding gPool layer and use this information to place nodes back to their original positions in the graph. Formally, we propose the layer-wise propagation rule of graph unpooling layer as
3 Graph U-Nets Architecture
It is well-known that encoder-decoder networks like U-Net achieve promising performance on pixel-wise prediction tasks, since they can encode and decode high-level features while maintaining local spatial information. Similar to pixel-wise prediction tasks (Gong et al., 2014; Ronneberger et al., 2015), node classification tasks aim to make a prediction for each input unit. Based on our proposed gPool and gUnpool layers, we propose our graph U-Nets (g-U-Nets) architecture for node classification tasks.
In our graph U-Nets (g-U-Nets), we first apply a graph embedding layer to convert nodes into low-dimensional representations, since original inputs of some dataset like Cora (Sen et al., 2008) usually have very high-dimensional feature vectors. After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node’s first-order information. In the decoder part, we stack the same number of decoding blocks as in the encoder part. Each decoder block is composed of a gUnpool layer and a GCN layer. The gUnpool layer restores the graph into its higher resolution structure, and the GCN layer aggregates information from the neighborhood. There are skip-connections between corresponding blocks of encoder and decoder layers, which transmit spatial information to decoders for better performance. The skip-connection can be either feature map addition or concatenation. Finally, we employ a GCN layer for final predictions before the softmax function. Figure 3 provides an illustration of a sample g-U-Nets with two blocks in encoder and decoder. Notably, there is a GCN layer before each gPool layer, thereby enabling gPool layers to capture the topological information in graphs implicitly.
4 Graph Connectivity Augmentation via Graph Power
5 Improved GCN Layer
In Eq. 1, the adjacency matrix before normalization is computed as in which a self-loop is added to each node in the graph. When performing information aggregation, the same weight is given to node’s own feature vector and its neighboring nodes. In this work, we wish to give a higher weight to node’s own feature vector, since its own feature should be more important for prediction. To this end, we change the calculation to by imposing larger weights on self loops in the graph, which is common in graph processing. All experiments in this work use this modified version of GCN layer for better performance.
Experimental Study
In this section, we evaluate our gPool and gUnpool layers based on the g-U-Nets proposed in Section 3.3. We compare our networks with previous state-of-the-art models on node classification and graph classification tasks. Experimental results show that our methods achieve new state-of-the-art results in terms of node classification accuracy and graph classification accuracy. Some ablation studies are performed to examine the contributions of the proposed gPool layer, gUnpool layer, and graph connectivity augmentation to performance improvements. We conduct studies on the relationship between network depth and node classification performance. We investigate if additional parameters involved in gPool layers can increase the risk of over-fitting.
In experiments, we evaluate our networks on node classification tasks under transductive learning settings and graph classification tasks under inductive learning settings.
Under transductive learning settings, unlabeled data are accessible for training, which enables the network to learn about the graph structure. To be specific, only part of nodes are labeled while labels of other nodes in the same graph remain unknown. We employ three benchmark datasets for this setting; those are Cora, Citeseer, and Pubmed (Kipf & Welling, 2017), which are summarized in Table 1. These datasets are citation networks, with each node and each edge representing a document and a citation, respectively. The feature vector of each node is the bag-of-word representation whose dimension is determined by the dictionary size. We follow the same experimental settings in (Kipf & Welling, 2017). For each class, there are 20 nodes for training, 500 nodes for validation, and 1000 nodes for testing.
Under inductive learning settings, testing data are not available during training, which means the training process does not use graph structures of testing data. We evaluate our methods on relatively large graph datasets selected from common benchmarks used in graph classification tasks (Ying et al., 2018; Niepert et al., 2016; Zhang et al., 2018). We use protein datasets including D&D (Dobson & Doig, 2003) and PROTEINS (Borgwardt et al., 2005), the scientific collaboration dataset COLLAB (Yanardag & Vishwanathan, 2015). These data are summarized in Table 2.
2 Experimental Setup
We describe the experimental setup for both transductive and inductive learning settings. For transductive learning tasks, we employ our proposed g-U-Nets proposed in Section 3.3. Since nodes in the three datasets are associated with high-dimensional features, we employ a GCN layer to reduce them into low-dimensional representations. In the encoder part, we stack four blocks, each of which consists of a gPool layer and a GCN layer. We sample 2000, 1000, 500, 200 nodes in the four gPool layers, respectively. Correspondingly, the decoder part also contains four blocks. Each decoder block is composed of a gUnpool layer and a GCN layer. We use addition operation in skip connections between blocks of encoder and decoder parts. Finally, we apply a GCN layer for final prediction. For all layers in the model, we use identity activation function (Gao et al., 2018) after each GCN layer. To avoid over-fitting, we apply regularization on weights with . Dropout (Srivastava et al., 2014) is applied to both adjacency matrices and feature matrices with keep rates of 0.8 and 0.08, respectively.
For inductive learning tasks, we follow the same experimental setups in (Zhang et al., 2018) using our g-U-Nets architecture as described in transductive learning settings for feature extraction. Since the sizes of graphs vary in graph classification tasks, we sample proportions of nodes in four gPool layers; those are 90%, 70%, 60%, and 50%, respectively. The dropout keep rate imposed on feature matrices is 0.3.
3 Performance Study
Under transductive learning settings, we compare our proposed g-U-Nets with other state-of-the-art models in terms of node classification accuracy. We report node classification accuracies on datasets Cora, Citeseer, and Pubmed, and the results are summarized in Table 3. We can observe from the results that our g-U-Nets achieves consistently better performance than other networks. For baseline values listed for node classification tasks, they are the state-of-the-art on these datasets. Our proposed model is composed of GCN, gPool, and gUnpool layers without involving more advanced graph convolution layers like GAT. When compared to GCN directly, our g-U-Nets significantly improves performance on all three datasets by margins of 2.9%, 2.9%, and 0.6%, respectively. Note that the only difference between our g-U-Nets and GCN is the use of encoder-decoder architecture containing gPool and gUnpool layers. These results demonstrate the effectiveness of g-U-Nets in network embedding.
Under inductive learning settings, we compared our methods with other state-of-the-art models on graph classification tasks with datasets D&D, PROTEINS, and COLLAB, and the results are summarized in Table 4. We can observe from the results that our proposed gPool method outperforms DiffPool (Ying et al., 2018) by margins of 1.79% and 1.43% on the D&D and PROTEINS datasets. Notably, the result obtained by DiffPool-DET on COLLAB is significantly higher than all other methods and the other two DiffPool models. On all three datasets, our model outperforms baseline models including DiffPool. In addition, DiffPool claimed that their training utilized auxiliary task of link prediction to stabilize model performance, which indicates the instability of DiffPool model. But in our experiments, we only use graph labels for training without any auxiliary tasks to stabilize training.
4 Ablation Study of gPool and gUnpool layers
Although GCNs have been reported to have worse performance when the network goes deeper (Kipf & Welling, 2017), it may also be argued that the performance improvement over GCN in Table 3 is due to the use of a deeper network architecture. In this section, we investigate the contributions of gPool and gUnpool layers to the performance of g-U-Nets. We conduct experiments by removing all gPool and gUnpool layers from our g-U-Nets, leading to a network with only GCN layers with skip connections. Table 5 provides the comparison results between g-U-Nets with and without gPool or gUnpool layers. The results show that g-U-Nets have better performance over g-U-Nets without gPool or gUnpool layers by margins of 2.3%, 1.6% and 0.5% on Cora, Citeseer, and Pubmed datasets, respectively. These results demonstrate the contributions of gPool and gUnpool layers to performance improvement. When considering the difference between the two models in terms of architecture, g-U-Nets enable higher level feature encoding, thereby resulting in better generalization and performance.
5 Graph Connectivity Augmentation Study
In the above experiments, we employ gPool layers with graph connectivity augmentation by using the graph power in Section 3.4. Here, we conduct experiments on node classification tasks to investigate the benefits of graph connectivity augmentation based on g-U-Nets. We remove the graph connectivity augmentation from gPool layers while keeping other settings the same for fairness of comparisons. Table 6 provides comparison results between g-U-Nets with and without graph connectivity augmentation. The results show that the absence of graph connectivity augmentation will cause consistent performance degradation on all of three datasets. This demonstrates that graph connectivity augmentation via graph power can help with the graph connectivity and information transfer among nodes in sampled graphs.
6 Network Depth Study of Graph U-Nets
Since the network depth in terms of the number of blocks in encoder and decoder parts is an important hyper-parameter in the g-U-Nets, we conduct experiments to investigate the relationship between network depth and performance in terms of node classification accuracy. We use different network depths on node classification tasks and report the classification accuracies. The results are summarized in Table 7. We can observe from the results that the performance improves as network goes deeper until a depth of 4. The over-fitting problem happens in deeper networks and prevents networks from improving when the depth goes beyond that. In image segmentation, U-Net models with depth 3 or 4 are commonly used (Badrinarayanan et al., 2017; Çiçek et al., 2016), which is consistent with our choice in experiments. This indicates the capacity of gPool and gUnpool layers in receptive field enlargement and high-level feature encoding even working with very shallow networks.
7 Parameter Study of Graph Pooling Layers
Since our proposed gPool layer involves extra parameters, we compute the number of additional parameters based on our g-U-Nets. The comparison results between g-U-Nets with and without gPool or gUnpool layers on dataset Cora are summarized in Table 8. From the results, we can observe that gPool layers in U-Net model only adds 0.12% additional parameters but can promote the performance by a margin of 2.3%. We believe this negligible increase of extra parameters will not increase the risk of over-fitting. Compared to g-U-Nets without gPool or gUnpool layers, the encoder-decoder architecture with our gPool and gUnpool layers yields significant performance improvement.
Conclusion
In this work, we propose novel gPool and gUnpool layers in g-U-Nets networks for network embedding. The gPool layer implements the regular global -max pooling operation on graph data. It samples a subset of important nodes to enable high-level feature encoding and receptive field enlargement. By employing a trainable projection vector, gPool layers sample nodes based on their scalar projection values. Furthermore, we propose the gUnpool layer which applies unpooling operations on graph data. By using the position information of nodes in the original graph, gUnpool layer performs the inverse operation of the corresponding gPool layer and restores the original graph structure. Based on our gPool and gUnpool layers, we propose the graph U-Nets (g-U-Nets) architecture which uses a similar encoder-decoder architecture as regular U-Net on image data. Experimental results demonstrate that our g-U-Nets achieve performance improvements as compared to other GNNs on transductive learning tasks. To avoid the isolated node problem that may exist in sampled graphs, we employ the graph power to improve graph connectivity. Ablation studies indicate the contributions of our graph connectivity augmentation approach.
Acknowledgments
This work was supported in part by National Science Foundation grants IIS-1908166 and IIS-1908198.