# NN Modules (TensorFlow)

We welcome your contributions! If you want a model to be implemented in DGL as an NN module, please create an issue titled “[Feature Request] NN Module XXXModel”.

If you want to contribute an NN module, please create a pull request titled “[NN] XXXModel in tensorflow NN Modules”, and our team members will review it.

## Conv Layers

TensorFlow NN conv modules.

### GraphConv

class dgl.nn.tensorflow.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply graph convolution over an input signal.

Graph convolution was introduced in GCN and can be described as follows:

$h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})$

where $$\mathcal{N}(i)$$ is the neighbor set of node $$i$$. $$c_{ij}$$ is equal to the product of the square root of node degrees: $$\sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$$. $$\sigma$$ is an activation function.

The model parameters are initialized as in the original implementation where the weight $$W^{(l)}$$ is initialized using Glorot uniform initialization and the bias is initialized to be zero.

Notes

Zero in-degree nodes lead to an invalid normalizer. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by:

>>> g = ... # some DGLGraph
>>> g.add_edges(g.nodes(), g.nodes())  # add a self-loop to every node

Parameters:

- **in_feats** (int) – Input feature size.
- **out_feats** (int) – Output feature size.
- **norm** (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, which applies the $$c_{ij}$$ from the paper.
- **weight** (bool, optional) – If True, apply a linear layer. Otherwise, aggregate the messages without a weight matrix.
- **bias** (bool, optional) – If True, adds a learnable bias to the output. Default: True.
- **activation** (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
Attributes:

- **weight** (tf.Tensor) – The learnable weight tensor.
- **bias** (tf.Tensor) – The learnable bias tensor.
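
To make the propagation rule concrete, here is a dense NumPy sketch of the update (illustrative only: the actual layer uses sparse message passing, and `graph_conv` is a made-up helper name, not DGL API):

```python
import numpy as np

def graph_conv(A, H, W, b):
    """Dense sketch of h_i' = b + sum_j (1 / c_ij) h_j W with
    c_ij = sqrt(deg(i)) * sqrt(deg(j)), i.e. norm='both'."""
    deg = A.sum(axis=1)                       # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(deg)           # assumes no zero-degree nodes
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return A_norm @ H @ W + b

# Fully connected 3-node graph with self-loops: every degree is 3.
A = np.ones((3, 3))
H = np.eye(3)                 # one-hot input features
W = np.ones((3, 2))           # toy weight matrix, out_feats = 2
out = graph_conv(A, H, W, b=np.zeros(2))  # every entry equals 1.0
```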

### RelGraphConv

class dgl.nn.tensorflow.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=False, dropout=0.0)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Relational graph convolution layer.

Relational graph convolution was introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as follows:

$h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})$

where $$\mathcal{N}^r(i)$$ is the neighbor set of node $$i$$ w.r.t. relation $$r$$. $$c_{i,r}$$ is the normalizer equal to $$|\mathcal{N}^r(i)|$$. $$\sigma$$ is an activation function. $$W_0$$ is the self-loop weight.

The basis regularization decomposes $$W_r$$ by:

$W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}$

where $$B$$ is the number of bases.

The block-diagonal-decomposition regularization decomposes $$W_r$$ into $$B$$ block-diagonal matrices. We refer to $$B$$ as the number of bases.

Parameters:

- **in_feat** (int) – Input feature size.
- **out_feat** (int) – Output feature size.
- **num_rels** (int) – Number of relations.
- **regularizer** (str) – Which weight regularizer to use: “basis” or “bdd”.
- **num_bases** (int, optional) – Number of bases. If None, use the number of relations. Default: None.
- **bias** (bool, optional) – True if bias is added. Default: True.
- **activation** (callable, optional) – Activation function. Default: None.
- **self_loop** (bool, optional) – True to include self-loop messages. Default: False.
- **dropout** (float, optional) – Dropout rate. Default: 0.0.
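
The basis decomposition above can be sketched directly in NumPy (variable names are for exposition only, not the layer's internals):

```python
import numpy as np

# W_r = sum_b a_rb * V_b: each relation weight is a learned
# combination of B shared bases.
num_rels, B, in_feat, out_feat = 4, 2, 3, 3
rng = np.random.default_rng(0)
V = rng.standard_normal((B, in_feat, out_feat))  # shared bases V_b
a = rng.standard_normal((num_rels, B))           # coefficients a_rb

# Compute all relation-specific weights at once: shape (num_rels, in_feat, out_feat).
W = np.einsum('rb,bio->rio', a, V)
```

With `num_bases` much smaller than `num_rels`, this sharing is what keeps the parameter count from growing linearly in the number of relations.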

### GATConv

class dgl.nn.tensorflow.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply Graph Attention Network over an input signal.

$h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} h_j^{(l)}$

where $$\alpha_{ij}$$ is the attention score between node $$i$$ and node $$j$$:

\begin{align}\begin{aligned}\alpha_{ij}^{l} & = \mathrm{softmax_i} (e_{ij}^{l})\\e_{ij}^{l} & = \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}\end{align}
Parameters:

- **in_feats** (int, or pair of ints) – Input feature size. If the layer is to be applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
- **out_feats** (int) – Output feature size.
- **num_heads** (int) – Number of heads in multi-head attention.
- **feat_drop** (float, optional) – Dropout rate on features. Default: 0.
- **attn_drop** (float, optional) – Dropout rate on attention weights. Default: 0.
- **negative_slope** (float, optional) – LeakyReLU angle of negative slope. Default: 0.2.
- **residual** (bool, optional) – If True, use residual connection. Default: False.
- **activation** (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
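
A minimal NumPy sketch of the single-head attention score for one node and its neighbors (the helper names are made up; the layer itself batches this computation over the whole graph):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention(h_i, neigh, W, a):
    """alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j]))."""
    z_i = W @ h_i
    e = np.array([leaky_relu(a @ np.concatenate([z_i, W @ h_j]))
                  for h_j in neigh])
    e -= e.max()                              # numerically stable softmax
    return np.exp(e) / np.exp(e).sum()

h_i = np.array([1.0, 0.0])
neigh = [np.array([0.0, 1.0]), np.array([1.0, 1.0])]
alpha = attention(h_i, neigh, W=np.eye(2), a=np.ones(4))  # sums to 1
```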

### SAGEConv

class dgl.nn.tensorflow.conv.SAGEConv(in_feats, out_feats, aggregator_type, feat_drop=0.0, bias=True, norm=None, activation=None)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.

\begin{align}\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} & = \mathrm{aggregate} \left(\{h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} & = \sigma \left(W \cdot \mathrm{concat} (h_{i}^{l}, h_{\mathcal{N}(i)}^{l+1}) + b \right)\\h_{i}^{(l+1)} & = \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}\end{align}
Parameters:

- **in_feats** (int, or pair of ints) – Input feature size. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value. If the aggregator type is gcn, the feature sizes of source and destination nodes are required to be the same.
- **out_feats** (int) – Output feature size.
- **aggregator_type** (str) – Aggregator type to use (mean, gcn, pool, lstm).
- **feat_drop** (float) – Dropout rate on features. Default: 0.
- **bias** (bool) – If True, adds a learnable bias to the output. Default: True.
- **norm** (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
- **activation** (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
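
For the mean aggregator, the two update equations reduce to the following NumPy sketch (names are illustrative; activation and normalization are omitted):

```python
import numpy as np

def sage_mean_update(h_i, neigh_feats, W, b):
    """h_i' = W . concat(h_i, mean_j h_j) + b."""
    h_neigh = neigh_feats.mean(axis=0)        # aggregate step
    return W @ np.concatenate([h_i, h_neigh]) + b

h_i = np.array([1.0, 2.0])
neigh = np.array([[2.0, 0.0], [0.0, 2.0]])    # two neighbors
W = np.ones((1, 4))                           # toy weight, out_feats = 1
out = sage_mean_update(h_i, neigh, W, b=np.zeros(1))  # [5.0]
```

Note that `W` acts on the concatenation, so its second dimension is twice the input feature size.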

### SGConv

class dgl.nn.tensorflow.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Simplifying Graph Convolution layer from paper Simplifying Graph Convolutional Networks.

$H^{l+1} = (\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2})^K H^{l} \Theta^{l}$
Parameters:

- **in_feats** (int) – Number of input features.
- **out_feats** (int) – Number of output features.
- **k** (int) – Number of hops $$K$$. Default: 1.
- **cached** (bool) – If True, the module caches $$(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}})^K X\Theta$$ at the first forward call. This parameter should only be set to True in a transductive learning setting.
- **bias** (bool) – If True, adds a learnable bias to the output. Default: True.
- **norm** (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
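
The K-hop propagation can be sketched densely in NumPy (illustrative only; `A_hat` is assumed to already contain self-loops):

```python
import numpy as np

def sg_conv(A_hat, X, Theta, k):
    """H' = (D^-1/2 A_hat D^-1/2)^K X Theta."""
    d = A_hat.sum(axis=1)
    S = A_hat / np.sqrt(np.outer(d, d))       # symmetric normalization
    H = X
    for _ in range(k):                        # K feature-smoothing hops
        H = S @ H
    return H @ Theta                          # single linear transform

A_hat = np.ones((3, 3))                       # 3-clique with self-loops
out = sg_conv(A_hat, np.eye(3), np.ones((3, 1)), k=2)  # all ones
```

Because the propagation has no nonlinearity between hops, the `cached` option can store the propagated features once and reuse them on every forward pass.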

### APPNPConv

class dgl.nn.tensorflow.conv.APPNPConv(k, alpha, edge_drop=0.0)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Approximate Personalized Propagation of Neural Predictions layer from paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.

\begin{align}\begin{aligned}H^{0} & = X\\H^{t+1} & = (1-\alpha)\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{t}\right) + \alpha H^{0}\end{aligned}\end{align}
Parameters:

- **k** (int) – Number of iterations $$K$$.
- **alpha** (float) – The teleport probability $$\alpha$$.
- **edge_drop** (float, optional) – Dropout rate on edges that controls the messages received by each node. Default: 0.
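
The propagation scheme is a simple fixed-point iteration; a NumPy sketch (`S` stands in for the normalized adjacency $$\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}$$, and the names are illustrative):

```python
import numpy as np

def appnp(S, X, k, alpha):
    """H^{t+1} = (1 - alpha) * S @ H^t + alpha * H^0, with H^0 = X."""
    H = X
    for _ in range(k):
        H = (1 - alpha) * (S @ H) + alpha * X  # propagate, then teleport
    return H

X = np.array([[1.0, 2.0], [3.0, 4.0]])
# With S = I, the teleport term keeps H fixed at X for any k.
out = appnp(np.eye(2), X, k=5, alpha=0.1)
```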

### GINConv

class dgl.nn.tensorflow.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Graph Isomorphism Network layer from paper How Powerful are Graph Neural Networks?.

$h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)$
Parameters:

- **apply_func** (callable activation function/layer or None) – If not None, apply this function to the updated node features; this is the $$f_\Theta$$ in the formula.
- **aggregator_type** (str) – Aggregator type to use (sum, max or mean).
- **init_eps** (float, optional) – Initial $$\epsilon$$ value. Default: 0.
- **learn_eps** (bool, optional) – If True, $$\epsilon$$ is a learnable parameter. Default: False.
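
The node update is nearly a one-liner in NumPy (sum aggregator; `apply_func` defaults to the identity here purely for illustration):

```python
import numpy as np

def gin_update(h_i, neigh_feats, eps, apply_func=lambda x: x):
    """h_i' = f_Theta((1 + eps) * h_i + sum_j h_j)."""
    return apply_func((1 + eps) * h_i + neigh_feats.sum(axis=0))

h_i = np.array([1.0, 1.0])
neigh = np.array([[1.0, 0.0], [0.0, 1.0]])
out = gin_update(h_i, neigh, eps=0.5)         # [2.5, 2.5]
```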

## Global Pooling Layers

TensorFlow modules for graph global pooling.

### SumPooling

class dgl.nn.tensorflow.glob.SumPooling[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply sum pooling over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k$
call(graph, feat)[source]

Compute sum pooling.

Parameters:

- **graph** (DGLGraph) – The graph.
- **feat** (tf.Tensor) – The input feature with shape $$(N, *)$$, where $$N$$ is the number of nodes in the graph.

Returns: tf.Tensor – The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.
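
For a batched graph, sum pooling amounts to a per-graph scatter-add; a NumPy sketch (the `graph_ids` mapping is an illustrative stand-in for the batch structure DGL tracks internally):

```python
import numpy as np

def sum_pool(feat, graph_ids, num_graphs):
    """r^(i) = sum of the rows of feat belonging to graph i."""
    out = np.zeros((num_graphs, feat.shape[1]))
    np.add.at(out, graph_ids, feat)           # scatter-add per graph
    return out

feat = np.array([[1.0], [2.0], [3.0]])        # 3 nodes, 1 feature
out = sum_pool(feat, np.array([0, 0, 1]), num_graphs=2)  # [[3.], [3.]]
```

AvgPooling and MaxPooling follow the same pattern with a per-graph mean or max in place of the sum.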

### AvgPooling

class dgl.nn.tensorflow.glob.AvgPooling[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply average pooling over the nodes in the graph.

$r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k$
call(graph, feat)[source]

Compute average pooling.

Parameters:

- **graph** (DGLGraph) – The graph.
- **feat** (tf.Tensor) – The input feature with shape $$(N, *)$$, where $$N$$ is the number of nodes in the graph.

Returns: tf.Tensor – The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

### MaxPooling

class dgl.nn.tensorflow.glob.MaxPooling[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply max pooling over the nodes in the graph.

$r^{(i)} = \max_{k=1}^{N_i}\left( x^{(i)}_k \right)$
call(graph, feat)[source]

Compute max pooling.

Parameters:

- **graph** (DGLGraph) – The graph.
- **feat** (tf.Tensor) – The input feature with shape $$(N, *)$$, where $$N$$ is the number of nodes in the graph.

Returns: tf.Tensor – The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

### SortPooling

class dgl.nn.tensorflow.glob.SortPooling(k)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in the graph.

Parameters:

- **k** (int) – The number of nodes to hold for each graph.
call(graph, feat)[source]

Compute sort pooling.

Parameters:

- **graph** (DGLGraph) – The graph.
- **feat** (tf.Tensor) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: tf.Tensor – The output feature with shape $$(B, k * D)$$, where $$B$$ refers to the batch size.

### GlobalAttentionPooling

class dgl.nn.tensorflow.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)$
Parameters:

- **gate_nn** (tf.layers.Layer) – A neural network that computes attention scores for each feature.
- **feat_nn** (tf.layers.Layer, optional) – A neural network applied to each feature before combining them with attention scores.
call(graph, feat)[source]

Compute global attention pooling.

Parameters:

- **graph** (DGLGraph) – The graph.
- **feat** (tf.Tensor) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: tf.Tensor – The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.
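
For a single graph, the readout is a softmax-weighted sum of node features; a NumPy sketch with $$f_{feat}$$ taken as the identity (names are illustrative):

```python
import numpy as np

def global_attn_pool(feat, gate_scores):
    """r = sum_k softmax(gate_scores)_k * feat_k."""
    s = gate_scores - gate_scores.max()       # stable softmax over nodes
    w = np.exp(s) / np.exp(s).sum()
    return (w[:, None] * feat).sum(axis=0)

feat = np.array([[0.0, 2.0], [2.0, 0.0]])
# Equal gate scores reduce the readout to a plain mean: [1.0, 1.0].
r = global_attn_pool(feat, np.zeros(2))
```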

## Utility Modules

### Edge Softmax

TensorFlow modules for graph-related softmax.

dgl.nn.tensorflow.softmax.edge_softmax(graph, logits, eids='__ALL__')[source]

Compute edge softmax: for each node, normalize the logits on its incoming edges so that they sum to one. By default (eids='__ALL__'), the softmax is computed over all edges in the graph.
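
The semantics can be sketched in NumPy by grouping edges by destination node and normalizing each group (illustrative only; the DGL op works on sparse graph structures and runs in TensorFlow):

```python
import numpy as np

def edge_softmax(dst, logits):
    """For each destination node, softmax the logits of its in-edges."""
    out = np.empty_like(logits)
    for v in np.unique(dst):
        mask = dst == v
        e = logits[mask] - logits[mask].max() # stable softmax per node
        out[mask] = np.exp(e) / np.exp(e).sum()
    return out

dst = np.array([0, 0, 1])                     # edge destination ids
alpha = edge_softmax(dst, np.array([1.0, 1.0, 5.0]))  # [0.5, 0.5, 1.0]
```

This is the normalization GATConv applies to its attention logits.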