# NN Modules (TensorFlow)

## Conv Layers

TensorFlow modules for graph convolution.

### GraphConv

class dgl.nn.tensorflow.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None, allow_zero_in_degree=False)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Graph convolution was introduced in GCN and mathematically is defined as follows:

$h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})$

where $$\mathcal{N}(i)$$ is the set of neighbors of node $$i$$, $$c_{ij}$$ is the product of the square root of node degrees (i.e., $$c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$$), and $$\sigma$$ is an activation function.
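To make the normalization concrete, the following NumPy sketch evaluates the formula above on a small undirected path graph. It illustrates the math only and is not the DGL implementation; the dense adjacency matrix `A`, the identity weight `W`, and the zero bias `b` are assumptions chosen so the numbers are easy to check, and the activation is taken to be the identity.

```python
import numpy as np

# Undirected path graph 0 - 1 - 2 as a dense adjacency matrix (assumed toy input).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                         # |N(i)| for every node
inv_c = 1.0 / np.sqrt(np.outer(deg, deg))   # 1 / c_ij = 1 / (sqrt|N(i)| * sqrt|N(j)|)

H = np.array([[1.0], [2.0], [3.0]])         # one feature per node
W = np.eye(1)                               # identity weight (assumption)
b = np.zeros(1)                             # zero bias (assumption)

# h_i' = b + sum_{j in N(i)} h_j W / c_ij, with sigma taken as the identity
out = (A * inv_c) @ H @ W + b
print(out.ravel())
```

Node 1 has two neighbors with features 1 and 3, each of degree 1, so it receives (1 + 3)/√2 ≈ 2.83; the two end nodes each receive 2/√2 ≈ 1.41.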

Parameters
• in_feats (int) – Input feature size; i.e., the number of dimensions of $$h_j^{(l)}$$.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the $$c_{ij}$$ in the paper is applied.

• weight (bool, optional) – If True, apply a linear layer. Otherwise, aggregate the messages without a weight matrix.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

weight

The learnable weight tensor.

Type

tf.Variable

bias

The learnable bias tensor.

Type

tf.Variable

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to an empty input. A common way to avoid this is to add a self-loop to each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be determined for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.

Examples

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import GraphConv

>>> # Case 1: Homogeneous graph
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     g = dgl.add_self_loop(g)  # node 5 has no in-edges otherwise
...     feat = tf.ones((6, 10))
...     conv = GraphConv(10, 2, norm='both', weight=True, bias=True)
...     res = conv(g, feat)
>>> print(res)
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[ 0.6208475 , -0.4896223 ],
[ 0.68356586, -0.5390842 ],
[ 0.6208475 , -0.4896223 ],
[ 0.7859846 , -0.61985517],
[ 0.8251371 , -0.65073216],
[ 0.48335412, -0.38119012]], dtype=float32)>
>>> # allow_zero_in_degree example
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     conv = GraphConv(10, 2, norm='both', weight=True, bias=True, allow_zero_in_degree=True)
...     res = conv(g, feat)
>>> print(res)
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[ 0.6208475 , -0.4896223 ],
[ 0.68356586, -0.5390842 ],
[ 0.6208475 , -0.4896223 ],
[ 0.7859846 , -0.61985517],
[ 0.8251371 , -0.65073216],
[ 0., 0.]], dtype=float32)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> with tf.device("CPU:0"):
...     g = dgl.bipartite((u, v))
...     u_fea = tf.convert_to_tensor(np.random.rand(2, 5))
...     v_fea = tf.convert_to_tensor(np.random.rand(4, 5))
...     conv = GraphConv(5, 2, norm='both', weight=True, bias=True)
...     res = conv(g, (u_fea, v_fea))
>>> res
<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
array([[ 1.3607183, -0.1636453],
[ 1.6665325, -0.2004239],
[ 2.1405895, -0.2574358],
[ 1.3607183, -0.1636453]], dtype=float32)>


### RelGraphConv

class dgl.nn.tensorflow.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=True, low_mem=False, dropout=0.0, layer_norm=False)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Relational graph convolution layer.

Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as below:

$h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})$

where $$\mathcal{N}^r(i)$$ is the neighbor set of node $$i$$ w.r.t. relation $$r$$. $$c_{i,r}$$ is the normalizer equal to $$|\mathcal{N}^r(i)|$$. $$\sigma$$ is an activation function. $$W_0$$ is the self-loop weight.

The basis regularization decomposes $$W_r$$ by:

$W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}$

where $$B$$ is the number of bases, $$V_b^{(l)}$$ are linearly combined with coefficients $$a_{rb}^{(l)}$$.

The block-diagonal-decomposition regularization decomposes $$W_r$$ into $$B$$ block-diagonal matrices. We refer to $$B$$ as the number of bases.

The block regularization decomposes $$W_r$$ by:

$W_r^{(l)} = \oplus_{b=1}^B Q_{rb}^{(l)}$

where $$B$$ is the number of bases, $$Q_{rb}^{(l)}$$ are block bases with shape $$R^{(d^{(l+1)}/B)*(d^{l}/B)}$$.
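The two decompositions can be sketched in NumPy as follows. This is only an illustration of the formulas above, not the layer's implementation; the sizes, the random initial values, and the variable names (`V`, `a`, `Q`, `W_basis`, `W_bdd`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rels, num_bases = 3, 2
in_feat, out_feat = 4, 2

# Basis decomposition: W_r = sum_b a_rb * V_b
V = rng.normal(size=(num_bases, out_feat, in_feat))   # shared bases V_b
a = rng.normal(size=(num_rels, num_bases))            # per-relation coefficients a_rb
W_basis = np.einsum('rb,boi->roi', a, V)              # shape (num_rels, out_feat, in_feat)

# Block-diagonal decomposition: W_r is the direct sum of B small blocks Q_rb,
# each of shape (out_feat / B, in_feat / B).
bo, bi = out_feat // num_bases, in_feat // num_bases
Q = rng.normal(size=(num_rels, num_bases, bo, bi))
W_bdd = np.zeros((num_rels, out_feat, in_feat))
for r in range(num_rels):
    for blk in range(num_bases):
        W_bdd[r, blk * bo:(blk + 1) * bo, blk * bi:(blk + 1) * bi] = Q[r, blk]

# Number of free parameters in each scheme
n_full = num_rels * out_feat * in_feat
n_basis = V.size + a.size
n_bdd = Q.size
print(n_full, n_basis, n_bdd)
```

Both schemes reduce the number of free parameters relative to one dense matrix per relation; the saving from the basis decomposition grows with the number of relations because the bases are shared.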

Parameters
• in_feat (int) – Input feature size; i.e., the number of dimensions of $$h_j^{(l)}$$.

• out_feat (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• num_rels (int) – Number of relations.

• regularizer (str) – Which weight regularizer to use: “basis” or “bdd”. “basis” is short for basis-decomposition; “bdd” is short for block-diagonal-decomposition.

• num_bases (int, optional) – Number of bases. If None, the number of relations is used. Default: None.

• bias (bool, optional) – True if bias is added. Default: True.

• activation (callable, optional) – Activation function. Default: None.

• self_loop (bool, optional) – True to include the self-loop message. Default: True.

• low_mem (bool, optional) – True to use the low-memory implementation of the relation message passing function. This option trades speed for memory consumption and slows down the forward and backward passes. Turn it on when you encounter an OOM problem during training or evaluation. Default: False.

• dropout (float, optional) – Dropout rate. Default: 0.0.

• layer_norm (bool, optional) – True to add layer norm. Default: False.

Examples

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import RelGraphConv
>>>
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     feat = tf.ones((6, 10))
...     conv = RelGraphConv(10, 2, 3, regularizer='basis', num_bases=2)
...     etype = tf.convert_to_tensor(np.array([0,1,2,0,1,2]).astype(np.int64))
...     res = conv(g, feat, etype)
...     res
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[-0.02938664,  1.7932655 ],
[ 0.1146394 ,  0.48319   ],
[-0.02938664,  1.7932655 ],
[ 1.2054908 , -0.26098895],
[ 0.1146394 ,  0.48319   ],
[ 0.75915515,  1.1454091 ]], dtype=float32)>

>>> # One-hot input
>>> with tf.device("CPU:0"):
...     one_hot_feat = tf.convert_to_tensor(np.array([0,1,2,3,4,5]).astype(np.int64))
...     res = conv(g, one_hot_feat, etype)
...     res
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[-0.24205256, -0.7922753 ],
[ 0.62085056,  0.4893622 ],
[-0.9484881 , -0.26546806],
[-0.2163915 , -0.12585883],
[-0.14293689,  0.77483284],
[ 0.091169  , -0.06761569]], dtype=float32)>


### GATConv

class dgl.nn.tensorflow.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None, allow_zero_in_degree=False)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply Graph Attention Network over an input signal.

$h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{ij} W^{(l)} h_j^{(l)}$

where $$\alpha_{ij}$$ is the attention score between node $$i$$ and node $$j$$:

\begin{align}\begin{aligned}\alpha_{ij}^{l} &= \mathrm{softmax_i} (e_{ij}^{l})\\e_{ij}^{l} &= \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}\end{align}
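As an illustration of the attention equations, here is a single-head NumPy sketch. It is a minimal sketch under stated assumptions, not the layer's implementation: the graph is given as a dense boolean matrix `A` where `A[i, j]` marks `j` as a neighbor of `i` (self-loops included so that every row is non-empty), features are stored as rows so the projection is written `H @ W`, and all values are random placeholders.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    return np.where(x > 0, x, negative_slope * x)

rng = np.random.default_rng(0)

# A[i, j] is True when node j is in N(i); self-loops keep every row non-empty.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=bool)

H = rng.normal(size=(3, 4))      # input node features (rows)
W = rng.normal(size=(4, 2))      # shared projection
a = rng.normal(size=(4,))        # attention vector over [W h_i || W h_j]

Z = H @ W                        # projected features, one row per node
# a^T [z_i || z_j] splits into a[:2] . z_i + a[2:] . z_j
e = leaky_relu((Z @ a[:2])[:, None] + (Z @ a[2:])[None, :])

# Softmax over the neighbors j of each node i; non-neighbors get zero weight.
e = np.where(A, e, -np.inf)
alpha = np.exp(e - e.max(axis=1, keepdims=True))
alpha = alpha / alpha.sum(axis=1, keepdims=True)

out = alpha @ Z                  # h_i' = sum_j alpha_ij * W h_j
print(alpha.round(3))
```

Each row of `alpha` sums to one, and entries for non-neighbors are exactly zero. A multi-head layer repeats this computation with independent `W` and `a` per head, which is why the layer's output has an extra head dimension.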
Parameters
• in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of $$h_i^{(l)}$$. GATConv can be applied to homogeneous graphs and unidirectional bipartite graphs. If the layer is applied to a unidirectional bipartite graph, in_feats specifies the input feature sizes of the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• num_heads (int) – Number of heads in multi-head attention.

• feat_drop (float, optional) – Dropout rate on features. Default: 0.

• attn_drop (float, optional) – Dropout rate on attention weights. Default: 0.

• negative_slope (float, optional) – LeakyReLU angle of negative slope. Default: 0.2.

• residual (bool, optional) – If True, use a residual connection. Default: False.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to an empty input. A common way to avoid this is to add a self-loop to each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be determined for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.

Examples

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import GATConv
>>>
>>> # Case 1: Homogeneous graph
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     g = dgl.add_self_loop(g)  # node 5 has no in-edges otherwise
...     feat = tf.ones((6, 10))
...     gatconv = GATConv(10, 2, num_heads=3)
...     res = gatconv(g, feat)
...     res
<tf.Tensor: shape=(6, 3, 2), dtype=float32, numpy=
array([[[ 0.75311995, -1.8093625 ],
[-0.12128812, -0.78072834],
[-0.49870574, -0.15074375]],
[[ 0.75311995, -1.8093625 ],
[-0.12128812, -0.78072834],
[-0.49870574, -0.15074375]],
[[ 0.75311995, -1.8093625 ],
[-0.12128812, -0.78072834],
[-0.49870574, -0.15074375]],
[[ 0.75311995, -1.8093626 ],
[-0.12128813, -0.78072834],
[-0.49870574, -0.15074375]],
[[ 0.75311995, -1.8093625 ],
[-0.12128812, -0.78072834],
[-0.49870574, -0.15074375]],
[[ 0.75311995, -1.8093625 ],
[-0.12128812, -0.78072834],
[-0.49870574, -0.15074375]]], dtype=float32)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> with tf.device("CPU:0"):
...     u_feat = tf.convert_to_tensor(np.random.rand(2, 5))
...     v_feat = tf.convert_to_tensor(np.random.rand(4, 10))
...     gatconv = GATConv((5,10), 2, 3)
...     res = gatconv(g, (u_feat, v_feat))
...     res
<tf.Tensor: shape=(4, 3, 2), dtype=float32, numpy=
array([[[-0.89649093, -0.74841046],
[ 0.5088224 ,  0.10908248],
[ 0.55670375, -0.6811229 ]],
[[-0.7905004 , -0.1457274 ],
[ 0.2248168 ,  0.93014705],
[ 0.12816726, -0.4093595 ]],
[[-0.85875374, -0.53382933],
[ 0.36841977,  0.51498866],
[ 0.31893706, -0.5303393 ]],
[[-0.89649093, -0.74841046],
[ 0.5088224 ,  0.10908248],
[ 0.55670375, -0.6811229 ]]], dtype=float32)>


### SAGEConv

class dgl.nn.tensorflow.conv.SAGEConv(in_feats, out_feats, aggregator_type, feat_drop=0.0, bias=True, norm=None, activation=None)

Bases: tensorflow.python.keras.engine.base_layer.Layer

GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.

\begin{align}\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} &= \mathrm{aggregate} \left(\{h_{j}^{(l)}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} &= \sigma \left(W \cdot \mathrm{concat} (h_{i}^{(l)}, h_{\mathcal{N}(i)}^{(l+1)}) \right)\\h_{i}^{(l+1)} &= \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}\end{align}
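The three steps can be traced with a small NumPy sketch using the mean aggregator. This is an illustration of the equations only; the layer's actual implementation may organize the weights differently. The toy adjacency matrix, the choice of ReLU for $$\sigma$$, and L2 normalization for norm are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A[i, j] = 1 if there is an edge j -> i (assumed toy graph; every node has a neighbor).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

H = rng.normal(size=(3, 4))       # input features, one row per node
W = rng.normal(size=(8, 2))       # maps concat(h_i, h_N(i)) of size 2 * 4 to 2 outputs

# Step 1: aggregate neighbor features (mean aggregator).
h_neigh = (A @ H) / A.sum(axis=1, keepdims=True)

# Step 2: concatenate with the node's own features, project, apply sigma (ReLU here).
h_cat = np.concatenate([H, h_neigh], axis=1)
h_new = np.maximum(h_cat @ W, 0.0)

# Step 3: normalize each row (L2 here), guarding against division by zero.
norms = np.linalg.norm(h_new, axis=1, keepdims=True)
h_out = h_new / np.maximum(norms, 1e-12)
print(h_out.shape)
```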
Parameters
• in_feats (int, or pair of ints) –

Input feature size; i.e., the number of dimensions of $$h_i^{(l)}$$.

SAGEConv can be applied to homogeneous graphs and unidirectional bipartite graphs. If the layer is applied to a unidirectional bipartite graph, in_feats specifies the input feature sizes of the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.

If the aggregator type is gcn, the feature sizes of the source and destination nodes are required to be the same.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• feat_drop (float) – Dropout rate on features. Default: 0.

• aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).

• bias (bool) – If True, adds a learnable bias to the output. Default: True.

• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features. Default: None.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

Examples

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import SAGEConv
>>>
>>> # Case 1: Homogeneous graph
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     feat = tf.ones((6, 10))
...     conv = SAGEConv(10, 2, 'pool')
...     res = conv(g, feat)
...     res
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[-3.6633523 , -0.90711546],
[-3.6633523 , -0.90711546],
[-3.6633523 , -0.90711546],
[-3.6633523 , -0.90711546],
[-3.6633523 , -0.90711546],
[-3.6633523 , -0.90711546]], dtype=float32)>

>>> # Case 2: Unidirectional bipartite graph
>>> with tf.device("CPU:0"):
...     u = [0, 1, 0, 0, 1]
...     v = [0, 1, 2, 3, 2]
...     g = dgl.bipartite((u, v))
...     u_fea = tf.convert_to_tensor(np.random.rand(2, 5))
...     # destination features must match in_feats[1] = 10
...     v_fea = tf.convert_to_tensor(np.random.rand(4, 10))
...     conv = SAGEConv((5, 10), 2, 'mean')
...     res = conv(g, (u_fea, v_fea))
...     res
<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
array([[-0.59453356, -0.4055441 ],
[-0.47459763, -0.717764  ],
[ 0.3221837 , -0.29876417],
[-0.63356155,  0.09390211]], dtype=float32)>


### ChebConv

class dgl.nn.tensorflow.conv.ChebConv(in_feats, out_feats, k, activation=<function relu>, bias=True)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.

\begin{align}\begin{aligned}h_i^{l+1} &= \sum_{k=0}^{K-1} W^{k, l}z_i^{k, l}\\Z^{0, l} &= H^{l}\\Z^{1, l} &= \tilde{L} \cdot H^{l}\\Z^{k, l} &= 2 \cdot \tilde{L} \cdot Z^{k-1, l} - Z^{k-2, l}\\\tilde{L} &= 2\left(I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\right)/\lambda_{max} - I\end{aligned}\end{align}

where $$\tilde{A} = A + I$$ and $$W$$ is a learnable weight.
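The recurrence can be checked numerically with a short NumPy sketch. It follows the equations as written above on an assumed toy graph and is not the layer's implementation; `lambda_max` is computed here from the eigenvalues of the normalized Laplacian, and the weights are random placeholders.

```python
import numpy as np

# Undirected path graph 0 - 1 - 2 (assumed toy input).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
n = A.shape[0]
I = np.eye(n)

# Rescaled Laplacian: L_tilde = 2 (I - D^-1/2 A_tilde D^-1/2) / lambda_max - I
A_tilde = A + I
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
L = I - D_inv_sqrt @ A_tilde @ D_inv_sqrt
lambda_max = np.linalg.eigvalsh(L).max()
L_tilde = 2.0 * L / lambda_max - I

rng = np.random.default_rng(0)
K, in_feats, out_feats = 3, 4, 2
H = rng.normal(size=(n, in_feats))
W = rng.normal(size=(K, in_feats, out_feats))   # one weight matrix per order k

# Chebyshev recurrence: Z^0 = H, Z^1 = L_tilde H, Z^k = 2 L_tilde Z^{k-1} - Z^{k-2}
Z = [H, L_tilde @ H]
for k in range(2, K):
    Z.append(2.0 * L_tilde @ Z[k - 1] - Z[k - 2])

out = sum(Z[k] @ W[k] for k in range(K))
print(out.shape)
```

The recurrence avoids forming powers of the Laplacian explicitly: each additional order costs one more multiplication by `L_tilde`.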

Parameters
• in_feats (int) – Dimension of input features; i.e., the number of dimensions of $$h_i^{(l)}$$.

• out_feats (int) – Dimension of output features $$h_i^{(l+1)}$$.

• k (int) – Chebyshev filter size $$K$$.

• activation (function, optional) – Activation function. Default: ReLU.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

Example

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import ChebConv
>>>
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     feat = tf.ones((6, 10))
...     conv = ChebConv(10, 2, 2)
...     res = conv(g, feat)
...     res
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[ 0.6163, -0.1809],
[ 0.6163, -0.1809],
[ 0.6163, -0.1809],
[ 0.9698, -1.5053],
[ 0.3664,  0.7556],
[-0.2370,  3.0164]], dtype=float32)>


### SGConv

class dgl.nn.tensorflow.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None, allow_zero_in_degree=False)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Simplifying Graph Convolution layer from paper Simplifying Graph Convolutional Networks.

$H^{K} = (\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2})^K X \Theta$

where $$\tilde{A}$$ is $$A$$ + $$I$$. Thus the graph input is expected to have self-loop edges added.
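A small NumPy sketch of the propagation rule follows. It is meant only to illustrate the formula; the triangle graph, the feature matrix `X`, and the weight `Theta` are assumed toy values chosen so the result can be verified by hand.

```python
import numpy as np

# Triangle graph; adding self-loops gives A_tilde = A + I (all ones here).
A = np.ones((3, 3)) - np.eye(3)
A_tilde = A + np.eye(3)

d = A_tilde.sum(axis=1)
S = A_tilde / np.sqrt(np.outer(d, d))     # D^-1/2 A_tilde D^-1/2

K = 2
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])
Theta = np.array([[1.0],
                  [2.0]])

# H^K = S^K X Theta: K rounds of feature propagation followed by one linear map.
H_K = np.linalg.matrix_power(S, K) @ X @ Theta
print(H_K.ravel())
```

On this fully connected graph every entry of `S` is 1/3, so propagation replaces each row of `X` with the mean row `[1, 1]`, and every node ends up with 1·1 + 1·2 = 3. Because the propagated features do not depend on any nonlinearity between hops, they can be computed once and reused, which is what the `cached` option exploits.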

Parameters
• in_feats (int) – Number of input features; i.e., the number of dimensions of $$X$$.

• out_feats (int) – Number of output features; i.e., the number of dimensions of $$H^{K}$$.

• k (int) – Number of hops $$K$$. Default: 1.

• cached (bool) –

If True, the module caches

$(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}})^K X\Theta$

at the first forward call. This parameter should only be set to True in a transductive learning setting. Default: False.

• bias (bool) – If True, adds a learnable bias to the output. Default: True.

• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features. Default: None.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to an empty input. A common way to avoid this is to add a self-loop to each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be determined for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.

Example

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import SGConv
>>>
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     g = dgl.add_self_loop(g)  # SGConv expects self-loop edges
...     feat = tf.ones((6, 10))
...     conv = SGConv(10, 2, k=2, cached=True)
...     res = conv(g, feat)
...     res
<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[0.61023676, 0.5246612 ],
[0.61023676, 0.5246612 ],
[0.61023676, 0.5246612 ],
[0.8697353 , 0.7477695 ],
[0.60570633, 0.520766  ],
[0.6102368 , 0.52466124]], dtype=float32)>


### APPNPConv

class dgl.nn.tensorflow.conv.APPNPConv(k, alpha, edge_drop=0.0)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Approximate Personalized Propagation of Neural Predictions layer from paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.

\begin{align}\begin{aligned}H^{0} & = X\\H^{t+1} & = (1-\alpha)\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{t}\right) + \alpha H^{0}\end{aligned}\end{align}
Parameters
• k (int) – Number of iterations $$K$$.

• alpha (float) – The teleport probability $$\alpha$$.

• edge_drop (float, optional) – Dropout rate on edges that controls the messages received by each node. Default: 0.
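The iteration can be illustrated with a plain NumPy sketch. This is not the layer's implementation: it ignores `edge_drop`, works on a dense matrix, and assumes that $$\hat{A}$$ denotes the adjacency matrix with self-loops added, which is the convention of the cited paper. The helper name `appnp_propagate` and the toy graph are hypothetical.

```python
import numpy as np

def appnp_propagate(S, X, k, alpha):
    """Run k steps of H <- (1 - alpha) * S @ H + alpha * X, starting from H = X."""
    H = X
    for _ in range(k):
        H = (1.0 - alpha) * (S @ H) + alpha * X
    return H

# Undirected path graph 0 - 1 - 2 (assumed toy input).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                       # assumption: A_hat = A + I
d = A_hat.sum(axis=1)
S = A_hat / np.sqrt(np.outer(d, d))         # D_hat^-1/2 A_hat D_hat^-1/2

X = np.array([[1.0], [0.0], [0.0]])         # a signal concentrated on node 0
H1 = appnp_propagate(S, X, k=1, alpha=0.1)
H10 = appnp_propagate(S, X, k=10, alpha=0.1)
print(H1.ravel(), H10.ravel())
```

A larger `alpha` keeps the result closer to the input `X`; with `alpha = 1` the input is returned unchanged regardless of `k`.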

### GINConv

class dgl.nn.tensorflow.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Graph Isomorphism Network layer from paper How Powerful are Graph Neural Networks?.

$h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)$
Parameters
• apply_func (callable activation function/layer or None) – If not None, apply this function to the updated node features; this is the $$f_\Theta$$ in the formula.

• aggregator_type (str) – Aggregator type to use (sum, max or mean).

• init_eps (float, optional) – Initial $$\epsilon$$ value. Default: 0.

• learn_eps (bool, optional) – If True, $$\epsilon$$ will be a learnable parameter. Default: False.

Example

>>> import dgl
>>> import numpy as np
>>> import tensorflow as tf
>>> from dgl.nn import GINConv
>>>
>>> with tf.device("CPU:0"):
...     g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
...     feat = tf.ones((6, 10))
...     lin = tf.keras.layers.Dense(10)
...     conv = GINConv(lin, 'max')
...     res = conv(g, feat)
...     res
<tf.Tensor: shape=(6, 10), dtype=float32, numpy=
array([[-0.1090256 ,  1.9050574 , -0.30704725, -1.995831  , -0.36399186,
1.10414   ,  2.4885745 , -0.35387516,  1.3568261 ,  1.7267858 ],
[-0.1090256 ,  1.9050574 , -0.30704725, -1.995831  , -0.36399186,
1.10414   ,  2.4885745 , -0.35387516,  1.3568261 ,  1.7267858 ],
[-0.1090256 ,  1.9050574 , -0.30704725, -1.995831  , -0.36399186,
1.10414   ,  2.4885745 , -0.35387516,  1.3568261 ,  1.7267858 ],
[-0.1090256 ,  1.9050574 , -0.30704725, -1.995831  , -0.36399186,
1.10414   ,  2.4885745 , -0.35387516,  1.3568261 ,  1.7267858 ],
[-0.1090256 ,  1.9050574 , -0.30704725, -1.995831  , -0.36399186,
1.10414   ,  2.4885745 , -0.35387516,  1.3568261 ,  1.7267858 ],
[-0.0545128 ,  0.9525287 , -0.15352362, -0.9979155 , -0.18199593,
0.55207   ,  1.2442873 , -0.17693758,  0.67841303,  0.8633929 ]],
dtype=float32)>


## Global Pooling Layers

TensorFlow modules for graph global pooling.

### SumPooling

class dgl.nn.tensorflow.glob.SumPooling

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply sum pooling over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k$
call(graph, feat)

Compute sum pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (tf.Tensor) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

tf.Tensor

### AvgPooling

class dgl.nn.tensorflow.glob.AvgPooling

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply average pooling over the nodes in the graph.

$r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k$
call(graph, feat)

Compute average pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (tf.Tensor) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

tf.Tensor

### MaxPooling

class dgl.nn.tensorflow.glob.MaxPooling

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply max pooling over the nodes in the graph.

$r^{(i)} = \max_{k=1}^{N_i}\left( x^{(i)}_k \right)$
call(graph, feat)

Compute max pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (tf.Tensor) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

tf.Tensor
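The sum, average, and max readouts all reduce the $$(N, *)$$ node feature matrix of a batched graph to one row per graph. The NumPy sketch below illustrates this under the assumption that the nodes of each graph are stored contiguously and that every graph has at least one node; `batch_num_nodes` is a hypothetical array of per-graph node counts, and the code is not the library's implementation.

```python
import numpy as np

# Five nodes in total: graph 0 owns the first 2 rows, graph 1 the last 3 (assumed layout).
feat = np.arange(10, dtype=float).reshape(5, 2)
batch_num_nodes = np.array([2, 3])

# Start offset of each graph's block of rows.
offsets = np.concatenate([[0], np.cumsum(batch_num_nodes)[:-1]])

sum_pool = np.add.reduceat(feat, offsets, axis=0)        # shape (B, D)
avg_pool = sum_pool / batch_num_nodes[:, None]           # divide by N_i
max_pool = np.maximum.reduceat(feat, offsets, axis=0)    # elementwise max per graph

print(sum_pool, avg_pool, max_pool, sep="\n")
```

Here $$N = 5$$ and $$B = 2$$, so each readout has shape `(2, 2)`.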

### SortPooling

class dgl.nn.tensorflow.glob.SortPooling(k)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in the graph.

Parameters

k (int) – The number of nodes to hold for each graph.

call(graph, feat)

Compute sort pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (tf.Tensor) – The input feature with shape $$(N, D)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, k * D)$$, where $$B$$ refers to the batch size.

Return type

tf.Tensor
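The $$(B, k * D)$$ output shape can be understood from the following NumPy sketch. It assumes that nodes are ranked in descending order of their last feature channel, as in the cited paper, and that graphs with fewer than $$k$$ nodes are zero-padded; the library's implementation may differ in details. The helper `sort_pool` and its `batch_num_nodes` argument are hypothetical.

```python
import numpy as np

def sort_pool(feat, batch_num_nodes, k):
    """Keep the top-k nodes of each graph (ranked by last channel) and flatten them."""
    B, D = len(batch_num_nodes), feat.shape[1]
    out = np.zeros((B, k, D), dtype=feat.dtype)   # zero padding for small graphs
    start = 0
    for b, n in enumerate(batch_num_nodes):
        x = feat[start:start + n]
        order = np.argsort(-x[:, -1], kind="stable")[:k]   # descending by last channel
        out[b, :len(order)] = x[order]
        start += n
    return out.reshape(B, k * D)

# Graph 0 has three nodes, graph 1 has one node (assumed toy batch).
feat = np.array([[1.0, 5.0],
                 [2.0, 9.0],
                 [3.0, 1.0],
                 [4.0, 4.0]])
pooled = sort_pool(feat, [3, 1], k=2)
print(pooled)
```

For graph 0 the nodes with last-channel values 9 and 5 are kept, in that order; graph 1 has a single node, so its second slot is padded with zeros.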

### GlobalAttentionPooling

class dgl.nn.tensorflow.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)$
Parameters
• gate_nn (tf.keras.layers.Layer) – A neural network that computes an attention score for each node feature.

• feat_nn (tf.keras.layers.Layer, optional) – A neural network applied to each node feature before it is combined with the attention scores.

call(graph, feat)

Compute global attention pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (tf.Tensor) – The input feature with shape $$(N, D)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

tf.Tensor
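The attention readout can be illustrated with a NumPy sketch in which the gate and feature networks are plain Python callables. This is a sketch under assumptions rather than the library's implementation: nodes of each graph are assumed to be stored contiguously, and `global_attention_pool`, `batch_num_nodes`, and the example gate functions are hypothetical.

```python
import numpy as np

def global_attention_pool(feat, batch_num_nodes, gate_fn, feat_fn=None):
    """Per graph: softmax the gate scores over its nodes, then take the weighted sum."""
    out = []
    start = 0
    for n in batch_num_nodes:
        x = feat[start:start + n]
        score = gate_fn(x)                                  # shape (n, 1)
        score = score - score.max(axis=0, keepdims=True)    # numerical stability
        w = np.exp(score)
        w = w / w.sum(axis=0, keepdims=True)                # softmax over the graph's nodes
        v = x if feat_fn is None else feat_fn(x)
        out.append((w * v).sum(axis=0))
        start += n
    return np.stack(out)

feat = np.arange(10, dtype=float).reshape(5, 2)
batch_num_nodes = [2, 3]

# A constant gate gives uniform attention, which reduces to average pooling.
uniform = global_attention_pool(feat, batch_num_nodes, lambda x: np.zeros((len(x), 1)))

# A very sharp gate concentrates almost all weight on one node per graph.
sharp = global_attention_pool(feat, batch_num_nodes, lambda x: 100.0 * x[:, :1])
print(uniform, sharp, sep="\n")
```

The two extremes show how the gate interpolates between average pooling and selecting a single node.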

## Heterogeneous Graph Convolution Module

### HeteroGraphConv

class dgl.nn.tensorflow.HeteroGraphConv(mods, aggregate='sum')

Bases: tensorflow.python.keras.engine.base_layer.Layer

A generic module for computing convolution on heterogeneous graphs.

The heterograph convolution applies sub-modules to their associated relation graphs; each sub-module reads features from the source nodes and writes updated features to the destination nodes. If multiple relations share the same destination node type, their results are aggregated by the specified method.

If the relation graph has no edge, the corresponding module will not be called.

Examples

Create a heterograph with three types of relations and nodes.

>>> import dgl
>>> g = dgl.heterograph({
...     ('user', 'follows', 'user') : edges1,
...     ('user', 'plays', 'game') : edges2,
...     ('store', 'sells', 'game')  : edges3})


Create a HeteroGraphConv that applies different convolution modules to different relations. Note that the modules for 'follows' and 'plays' do not share weights.

>>> import dgl.nn.tensorflow as dglnn
>>> conv = dglnn.HeteroGraphConv({
...     'follows' : dglnn.GraphConv(...),
...     'plays' : dglnn.GraphConv(...),
...     'sells' : dglnn.SAGEConv(...)},
...     aggregate='sum')


Call forward with some 'user' features. This computes new features for both 'user' and 'game' nodes.

>>> import tensorflow as tf
>>> h1 = {'user' : tf.random.normal((g.number_of_nodes('user'), 5))}
>>> h2 = conv(g, h1)
>>> print(h2.keys())
dict_keys(['user', 'game'])


Call forward with both 'user' and 'store' features. Because both the 'plays' and 'sells' relations will update the 'game' features, their results are aggregated by the specified method (i.e., summation here).

>>> f1 = {'user' : ..., 'store' : ...}
>>> f2 = conv(g, f1)
>>> print(f2.keys())
dict_keys(['user', 'game'])


Call forward with some 'store' features. This only computes new features for 'game' nodes.

>>> g1 = {'store' : ...}
>>> g2 = conv(g, g1)
>>> print(g2.keys())
dict_keys(['game'])


Calling forward with a pair of inputs is also allowed; each submodule is then invoked with a pair of inputs as well.

>>> x_src = {'user' : ..., 'store' : ...}
>>> x_dst = {'user' : ..., 'game' : ...}
>>> y_dst = conv(g, (x_src, x_dst))
>>> print(y_dst.keys())
dict_keys(['user', 'game'])

Parameters
• mods (dict[str, tf.keras.layers.Layer]) – Modules associated with each edge type. The forward function of each module must take a DGLHeteroGraph object as its first argument; its second argument is either a tensor representing the node features or a pair of tensors representing the source and destination node features.

• aggregate (str, callable, optional) –

Method for aggregating node features generated by different relations. Allowed string values are ‘sum’, ‘max’, ‘min’, ‘mean’, ‘stack’. The ‘stack’ aggregation is performed along the second dimension, whose order is deterministic. Users can also customize the aggregator by providing a callable instance. For example, aggregation by summation is equivalent to the following:

def my_agg_func(tensors, dsttype):
    # tensors: a list of tensors to aggregate
    # dsttype: string name of the destination node type for which the
    #          aggregation is performed
    stacked = tf.stack(tensors, axis=0)
    return tf.reduce_sum(stacked, axis=0)


mods

Modules associated with each edge type.

Type

dict[str, tf.keras.layers.Layer]

call(g, inputs, mod_args=None, mod_kwargs=None)

Forward computation

Invoke the forward function with each module and aggregate their results.

Parameters
• g (DGLHeteroGraph) – Graph data.

• inputs (dict[str, Tensor] or pair of dict[str, Tensor]) – Input node features.

• mod_args (dict[str, tuple[any]], optional) – Extra positional arguments for the sub-modules.

• mod_kwargs (dict[str, dict[str, any]], optional) – Extra keyword arguments for the sub-modules.

Returns

Output representations for every type of node.

Return type

dict[str, Tensor]
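The dispatch-and-aggregate behavior described in this section can be sketched without DGL. In the toy version below each relation's module is replaced by multiplication with a dense adjacency matrix, and the helper `hetero_conv`, the relation list, and the matrices are hypothetical stand-ins; the real module passes relation graphs to DGL layers and additionally skips relation graphs that have no edges.

```python
import numpy as np

def hetero_conv(relations, mods, inputs, aggregate=lambda outs: np.sum(outs, axis=0)):
    """Apply one module per relation, then aggregate results per destination type."""
    per_dst = {}
    for src_type, etype, dst_type in relations:
        if src_type not in inputs:
            continue                   # no source features: this relation is skipped
        per_dst.setdefault(dst_type, []).append(mods[etype](inputs[src_type]))
    return {dst_type: aggregate(outs) for dst_type, outs in per_dst.items()}

relations = [('user', 'follows', 'user'),
             ('user', 'plays', 'game'),
             ('store', 'sells', 'game')]

# Dense (num_dst, num_src) adjacency per relation: 2 users, 3 games, 1 store (toy sizes).
adj = {'follows': np.array([[0.0, 1.0], [1.0, 0.0]]),
       'plays': np.ones((3, 2)),
       'sells': np.ones((3, 1))}
mods = {name: (lambda h, A=A: A @ h) for name, A in adj.items()}

user_feat = np.ones((2, 5))
store_feat = np.ones((1, 5))

out_user_only = hetero_conv(relations, mods, {'user': user_feat})
out_both = hetero_conv(relations, mods, {'user': user_feat, 'store': store_feat})
out_store_only = hetero_conv(relations, mods, {'store': store_feat})
print(sorted(out_user_only), sorted(out_both), sorted(out_store_only))
```

As in the examples above, 'user' features alone update both 'user' and 'game' nodes, 'store' features alone update only 'game' nodes, and when both are given the 'plays' and 'sells' results for 'game' are summed.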