NN Modules (MXNet)
Conv Layers
MXNet modules for graph convolutions.
GraphConv

class dgl.nn.mxnet.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None, allow_zero_in_degree=False)
Bases: mxnet.gluon.block.Block
Graph convolution was introduced in GCN and is mathematically defined as follows:
\[h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})\]
where \(\mathcal{N}(i)\) is the set of neighbors of node \(i\), \(c_{ij}\) is the product of the square roots of the node degrees (i.e., \(c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}\)), and \(\sigma\) is an activation function.
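The normalizer \(c_{ij}\) can be traced by hand on a small graph. The following is a minimal sketch in plain Python (no DGL or MXNet required), assuming scalar node features equal to 1, an identity weight and a zero bias. For a directed graph it reads \(|\mathcal{N}(j)|\) as the out-degree of the sender and \(|\mathcal{N}(i)|\) as the in-degree of the receiver; that reading is an assumption of this sketch, not something stated in the formula.

```python
import math

# Edge list of a small directed graph, plus one self-loop per node.
src = [0, 1, 2, 3, 2, 5] + list(range(6))
dst = [1, 2, 3, 4, 0, 3] + list(range(6))
num_nodes = 6

# Out-degree counts edges leaving a node, in-degree counts edges entering it.
out_deg = [0] * num_nodes
in_deg = [0] * num_nodes
for u, v in zip(src, dst):
    out_deg[u] += 1
    in_deg[v] += 1

# Scalar feature h_j = 1 for every node; W and b are left out.
h = [1.0] * num_nodes

# Symmetric normalization: each message j -> i is scaled by 1 / c_ij with
# c_ij = sqrt(in_deg[i]) * sqrt(out_deg[j]) (directed reading, an assumption).
out = [0.0] * num_nodes
for j, i in zip(src, dst):
    c_ij = math.sqrt(in_deg[i]) * math.sqrt(out_deg[j])
    out[i] += h[j] / c_ij

print([round(x, 4) for x in out])
```

Nodes with more incoming edges do not simply accumulate larger sums, because every message is damped by the degrees at both ends of its edge.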
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
norm (str, optional) – How to apply the normalizer. Can be one of the following values:
  right, to divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages.
  none, where no normalization is applied.
  both (default), where the messages are scaled with \(1/c_{ji}\) above, equivalent to symmetric normalization.
  left, to divide the messages sent out from each node by its out-degree, equivalent to random walk normalization.
weight (bool, optional) – If True, apply a linear layer. Otherwise, aggregate the messages without a weight matrix.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.

weight
The learnable weight tensor.
 Type
mxnet.gluon.parameter.Parameter

bias
The learnable bias tensor.
 Type
mxnet.gluon.parameter.Parameter
Note
Zero-in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the nodes with zero in-degree when using the output of the conv layer.
Examples
>>> import dgl
>>> import mxnet as mx
>>> from mxnet import gluon
>>> import numpy as np
>>> from dgl.nn import GraphConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> conv = GraphConv(10, 2, norm='both', weight=True, bias=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> print(res)
[[1.0209361 0.22472616]
 [1.1240715 0.24742813]
 [1.0209361 0.22472616]
 [1.2924911 0.28450024]
 [1.3568745 0.29867214]
 [0.7948386 0.17495811]]
<NDArray 6x2 @cpu(0)>

>>> # allow_zero_in_degree example
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> conv = GraphConv(10, 2, norm='both', weight=True, bias=True, allow_zero_in_degree=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> print(res)
[[1.0209361 0.22472616]
 [1.1240715 0.24742813]
 [1.0209361 0.22472616]
 [1.2924911 0.28450024]
 [1.3568745 0.29867214]
 [0. 0.]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 5)
>>> conv = GraphConv(5, 2, norm='both', weight=True, bias=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea))
>>> res
[[ 0.26967263 0.308129 ]
 [ 0.05143356 0.11355402]
 [ 0.22705637 0.1375853 ]
 [ 0.26967263 0.308129 ]]
<NDArray 4x2 @cpu(0)>

forward(graph, feat, weight=None)
Compute graph convolution.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray or pair of mxnet.NDArray) – If a single tensor is given, it represents the input feature of shape \((N, D_{in})\) where \(D_{in}\) is the size of the input feature and \(N\) is the number of nodes. If a pair of tensors is given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\). Note that in the special case of graph convolutional networks, if a pair of tensors is given, the latter element will not participate in computation.
weight (mxnet.NDArray, optional) – Optional external weight tensor.
 Returns
The output feature
 Return type
mxnet.NDArray
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
Note
Input shape: \((N, *, \text{in_feats})\) where * means any number of additional dimensions, \(N\) is the number of nodes.
Output shape: \((N, *, \text{out_feats})\) where all but the last dimension are the same shape as the input.
Weight shape: \((\text{in_feats}, \text{out_feats})\).
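The zero-in-degree condition that GraphConv checks for can be reproduced from an edge list alone. The sketch below uses a hypothetical helper, zero_in_degree_nodes, which is not part of DGL, to list the nodes that receive no edge and to show that adding one self-loop per node removes them.

```python
def zero_in_degree_nodes(num_nodes, dst):
    """Return the IDs of nodes that receive no edge (hypothetical helper)."""
    in_deg = [0] * num_nodes
    for v in dst:
        in_deg[v] += 1
    return [n for n, d in enumerate(in_deg) if d == 0]

# Destination IDs of the example graph used above, without self-loops.
dst = [1, 2, 3, 4, 0, 3]
isolated = zero_in_degree_nodes(6, dst)

# After adding one self-loop per node, every node has an incoming edge.
dst_with_loops = dst + list(range(6))
fixed = zero_in_degree_nodes(6, dst_with_loops)

print(isolated, fixed)
```

In the example graph only node 5 is affected, which is the row that comes out as zeros in the allow_zero_in_degree example.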
RelGraphConv

class dgl.nn.mxnet.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=True, low_mem=False, dropout=0.0, layer_norm=False)
Bases: mxnet.gluon.block.Block
Relational graph convolution layer.
Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as below:
\[h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})\]
where \(\mathcal{N}^r(i)\) is the neighbor set of node \(i\) w.r.t. relation \(r\), \(c_{i,r}\) is the normalizer equal to \(|\mathcal{N}^r(i)|\), \(\sigma\) is an activation function, and \(W_0\) is the self-loop weight.
The basis regularization decomposes \(W_r\) by:
\[W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}\]
where \(B\) is the number of bases and \(V_b^{(l)}\) are linearly combined with coefficients \(a_{rb}^{(l)}\).
The block-diagonal-decomposition regularization decomposes \(W_r\) into \(B\) block-diagonal matrices. We refer to \(B\) as the number of bases.
The block regularization decomposes \(W_r\) by:
\[W_r^{(l)} = \oplus_{b=1}^B Q_{rb}^{(l)}\]
where \(B\) is the number of bases and \(Q_{rb}^{(l)}\) are block bases with shape \(R^{(d^{(l+1)}/B)*(d^{l}/B)}\).
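The two decompositions can be made concrete with tiny matrices. The sketch below is plain Python with hypothetical helper names (basis_weight, block_diag) and made-up values. It builds one relation weight as a linear combination of shared bases and another as a direct sum of blocks.

```python
def basis_weight(coeffs, bases):
    """W_r = sum_b a_rb * V_b for a single relation r."""
    rows, cols = len(bases[0]), len(bases[0][0])
    w = [[0.0] * cols for _ in range(rows)]
    for a, v in zip(coeffs, bases):
        for i in range(rows):
            for j in range(cols):
                w[i][j] += a * v[i][j]
    return w


def block_diag(blocks):
    """Direct sum: place the blocks along the diagonal, zeros elsewhere."""
    rows = sum(len(q) for q in blocks)
    cols = sum(len(q[0]) for q in blocks)
    w = [[0.0] * cols for _ in range(rows)]
    r0 = c0 = 0
    for q in blocks:
        for i, row in enumerate(q):
            for j, val in enumerate(row):
                w[r0 + i][c0 + j] = val
        r0 += len(q)
        c0 += len(q[0])
    return w


# B = 2 bases shared by all relations (illustrative values).
V = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
a_r = [2.0, 3.0]                      # coefficients of one relation
W_basis = basis_weight(a_r, V)

# B = 2 blocks of size 1x1 give a 2x2 block-diagonal weight.
W_bdd = block_diag([[[1.0]], [[2.0]]])

print(W_basis, W_bdd)
```

With the basis form, each relation only stores \(B\) coefficients on top of the shared bases; with the block form, every entry outside the diagonal blocks is fixed to zero.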
 Parameters
in_feat (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feat (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
num_rels (int) – Number of relations.
regularizer (str) – Which weight regularizer to use, “basis” or “bdd”. “basis” is short for basis-decomposition. “bdd” is short for block-diagonal-decomposition.
num_bases (int, optional) – Number of bases. If None, use the number of relations. Default: None.
bias (bool, optional) – True if bias is added. Default: True.
activation (callable, optional) – Activation function. Default: None.
self_loop (bool, optional) – True to include the self-loop message. Default: True.
low_mem (bool, optional) – True to use a low-memory implementation of the relation message passing function. This option trades speed for lower memory consumption and will slow down the forward and backward passes. Turn it on when you encounter OOM problems during training or evaluation. Default: False.
dropout (float, optional) – Dropout rate. Default: 0.0.
layer_norm (bool, optional) – Add layer norm. Default: False.
Examples
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import RelGraphConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = RelGraphConv(10, 2, 3, regularizer='basis', num_bases=2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> etype = mx.nd.array(np.array([0,1,2,0,1,2]).astype(np.int64))
>>> res = conv(g, feat, etype)
>>> res
[[ 0.561324 0.33745846]
 [ 0.61585337 0.09992217]
 [ 0.561324 0.33745846]
 [0.01557937 0.01227859]
 [ 0.61585337 0.09992217]
 [ 0.056508 0.00307822]]
<NDArray 6x2 @cpu(0)>

forward(g, x, etypes, norm=None)
Forward computation.
 Parameters
g (DGLGraph) – The graph.
x (mx.ndarray.NDArray) – Input node features. Could be either
  a \((|V|, D)\) dense tensor, or
  a \((|V|,)\) int64 vector representing the categorical value of each node. The input is then treated as a one-hot encoded feature.
etypes (mx.ndarray.NDArray) – Edge type tensor. Shape: \((|E|,)\).
norm (mx.ndarray.NDArray) – Optional edge normalizer tensor. Shape: \((|E|, 1)\).
 Returns
New node features.
 Return type
mx.ndarray.NDArray
TAGConv

class dgl.nn.mxnet.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)
Bases: mxnet.gluon.block.Block
Topology Adaptive Graph Convolutional layer from paper Topology Adaptive Graph Convolutional Networks.
\[H^{K} = {\sum}_{k=0}^K (D^{-1/2} A D^{-1/2})^{k} X {\Theta}_{k},\]
where \(A\) denotes the adjacency matrix, \(D_{ii} = \sum_{j=0} A_{ij}\) its diagonal degree matrix, and \({\Theta}_{k}\) denotes the linear weights used to sum the results of different hops together.
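The sum over hops can be followed on a three-node path graph. This is a minimal sketch in plain Python, assuming scalar node features and scalar weights \(\Theta_k\); the helper names normalized_adjacency and tag_conv are illustrative and not part of DGL, and clamping zero degrees to 1 is an assumption made to avoid division by zero.

```python
import math


def normalized_adjacency(num_nodes, src, dst):
    """Dense D^{-1/2} A D^{-1/2}, with A[i][j] = 1 for an edge j -> i."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for j, i in zip(src, dst):
        a[i][j] = 1.0
    # D_ii is the row sum of A; zero degrees are clamped to 1 (assumption).
    deg = [max(sum(row), 1.0) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def tag_conv(a_hat, x, thetas):
    """sum_k theta_k * (a_hat^k x) for scalar features x."""
    out = [0.0] * len(x)
    hop = list(x)                                   # a_hat^0 x
    for theta in thetas:
        out = [o + theta * h for o, h in zip(out, hop)]
        # Move one hop further: hop <- a_hat @ hop.
        hop = [sum(a_ij * h_j for a_ij, h_j in zip(row, hop)) for row in a_hat]
    return out


# Undirected path 0 - 1 - 2, stored as edges in both directions.
src = [0, 1, 1, 2]
dst = [1, 0, 2, 1]
a_hat = normalized_adjacency(3, src, dst)
out = tag_conv(a_hat, [1.0, 0.0, 0.0], thetas=[1.0, 1.0, 1.0])   # K = 2

print([round(v, 4) for v in out])
```

Node 2 is two hops away from the only non-zero input, so it receives a contribution only from the \(k = 2\) term.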
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(X\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(H^{K}\).
k (int, optional) – Number of hops \(K\). Default: 2.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

lin
The learnable linear module.
 Type
mxnet.gluon.nn.Dense
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import TAGConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = TAGConv(10, 2, k=2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[0.86147034 0.10089529]
 [0.86147034 0.10089529]
 [0.86147034 0.10089529]
 [0.9707841 0.0360311 ]
 [0.6716844 0.02247889]
 [ 0.32964635 0.7669234 ]]
<NDArray 6x2 @cpu(0)>

forward(graph, feat)
Compute topology adaptive graph convolution.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
GATConv

class dgl.nn.mxnet.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None, allow_zero_in_degree=False)
Bases: mxnet.gluon.block.Block
Apply Graph Attention Network over an input signal.
\[h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} h_j^{(l)}\]
where \(\alpha_{ij}\) is the attention score between node \(i\) and node \(j\):
\[ \begin{align}\begin{aligned}\alpha_{ij}^{l} &= \mathrm{softmax_i} (e_{ij}^{l})\\e_{ij}^{l} &= \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}\end{align} \]
 Parameters
in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\). GATConv can be applied on homogeneous graphs and unidirectional bipartite graphs. If the layer is to be applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
num_heads (int) – Number of heads in multi-head attention.
feat_drop (float, optional) – Dropout rate on features. Default: 0.
attn_drop (float, optional) – Dropout rate on attention weights. Default: 0.
negative_slope (float, optional) – LeakyReLU angle of negative slope. Default: 0.2.
residual (bool, optional) – If True, use a residual connection. Default: False.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the nodes with zero in-degree when using the output of the conv layer.
Examples
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import GATConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> gatconv = GATConv(10, 2, num_heads=3)
>>> gatconv.initialize(ctx=mx.cpu(0))
>>> res = gatconv(g, feat)
>>> res
[[[ 0.32368395 0.10501936]
  [ 1.0839728 0.92690575]
  [0.54581136 0.84279203]]
 [[ 0.32368395 0.10501936]
  [ 1.0839728 0.92690575]
  [0.54581136 0.84279203]]
 [[ 0.32368395 0.10501936]
  [ 1.0839728 0.92690575]
  [0.54581136 0.84279203]]
 [[ 0.32368395 0.10501937]
  [ 1.0839728 0.9269058 ]
  [0.5458114 0.8427921 ]]
 [[ 0.32368395 0.10501936]
  [ 1.0839728 0.92690575]
  [0.54581136 0.84279203]]
 [[ 0.32368395 0.10501936]
  [ 1.0839728 0.92690575]
  [0.54581136 0.84279203]]]
<NDArray 6x3x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_feat = mx.nd.random.randn(2, 5)
>>> v_feat = mx.nd.random.randn(4, 10)
>>> gatconv = GATConv((5,10), 2, 3)
>>> gatconv.initialize(ctx=mx.cpu(0))
>>> res = gatconv(g, (u_feat, v_feat))
>>> res
[[[1.01624 1.8138596 ]
  [ 1.2322129 0.8410206 ]
  [1.9325689 1.3824553 ]]
 [[ 0.9915016 1.6564168 ]
  [0.32610354 0.42505783]
  [ 1.5278397 0.92114615]]
 [[0.32592064 0.62067866]
  [ 0.6162219 0.3405491 ]
  [1.356375 0.9988818 ]]
 [[1.01624 1.8138596 ]
  [ 1.2322129 0.8410206 ]
  [1.9325689 1.3824553 ]]]
<NDArray 4x3x2 @cpu(0)>

forward(graph, feat, get_attention=False)
Compute graph attention network layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray or pair of mxnet.NDArray) – If a mxnet.NDArray is given, the input feature of shape \((N, *, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, *, D_{in_{src}})\) and \((N_{out}, *, D_{in_{dst}})\).
get_attention (bool, optional) – Whether to return the attention values. Default to False.
 Returns
mxnet.NDArray – The output feature of shape \((N, *, H, D_{out})\) where \(H\) is the number of heads, and \(D_{out}\) is the size of the output feature.
mxnet.NDArray, optional – The attention values of shape \((E, *, H, 1)\), where \(E\) is the number of edges. This is returned only when get_attention is True.
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
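The attention coefficients \(\alpha_{ij}\) defined for GATConv are a softmax over the incoming edges of each destination node. The sketch below is plain Python with made-up raw scores standing in for \(\vec{a}^T [W h_i \| W h_j]\); the helper names leaky_relu and edge_softmax are illustrative, not DGL functions.

```python
import math


def leaky_relu(x, negative_slope=0.2):
    return x if x >= 0 else negative_slope * x


def edge_softmax(dst, scores):
    """Normalize edge scores over the incoming edges of each destination."""
    # Subtract the per-destination maximum first for numerical stability.
    max_per_dst = {}
    for i, e in zip(dst, scores):
        max_per_dst[i] = max(max_per_dst.get(i, e), e)
    exp = [math.exp(e - max_per_dst[i]) for i, e in zip(dst, scores)]
    total = {}
    for i, x in zip(dst, exp):
        total[i] = total.get(i, 0.0) + x
    return [x / total[i] for i, x in zip(dst, exp)]


# Edges j -> i; the raw scores are illustrative values only.
src = [0, 1, 2, 2]
dst = [2, 2, 0, 1]
raw = [1.0, -1.0, 0.5, 0.5]

e = [leaky_relu(s) for s in raw]
alpha = edge_softmax(dst, e)

print([round(a, 4) for a in alpha])
```

A destination with a single incoming edge always assigns it a weight of 1, whatever the raw score is.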
EdgeConv

class dgl.nn.mxnet.conv.EdgeConv(in_feat, out_feat, batch_norm=False, allow_zero_in_degree=False)
Bases: mxnet.gluon.block.Block
EdgeConv layer.
Introduced in “Dynamic Graph CNN for Learning on Point Clouds”. Can be described as follows:
\[h_i^{(l+1)} = \max_{j \in \mathcal{N}(i)} ( \Theta \cdot (h_j^{(l)} - h_i^{(l)}) + \Phi \cdot h_i^{(l)})\]
where \(\mathcal{N}(i)\) is the set of neighbors of \(i\), and \(\Theta\) and \(\Phi\) are linear layers.
Note
The original formulation includes a ReLU inside the maximum operator. This is equivalent to first applying a maximum operator then applying the ReLU.
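The equivalence holds because ReLU is monotonically non-decreasing, so it commutes with a maximum. A small check in plain Python, using made-up message values for one node with three neighbors and two feature channels:

```python
def relu(x):
    return max(x, 0.0)


# Hypothetical pre-activation messages arriving at one node from three
# neighbors, each with two feature channels.
messages = [[-1.0, 2.0], [0.5, -3.0], [-0.2, 1.5]]

# ReLU inside the maximum: activate each message, then take the channel-wise max.
relu_then_max = [max(relu(m[c]) for m in messages) for c in range(2)]

# ReLU outside the maximum: take the channel-wise max, then activate once.
max_then_relu = [relu(max(m[c] for m in messages)) for c in range(2)]

print(relu_then_max, max_then_relu)
```

Applying the activation once after the reduction touches one value per node and channel instead of one per edge and channel.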
 Parameters
in_feat (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feat (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
batch_norm (bool) – Whether to include batch normalization on messages. Default: False.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the nodes with zero in-degree when using the output of the conv layer.
Examples
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import EdgeConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> conv = EdgeConv(10, 2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[1.0517545 0.8091326]
 [1.0517545 0.8091326]
 [1.0517545 0.8091326]
 [1.0517545 0.8091326]
 [1.0517545 0.8091326]
 [1.0517545 0.8091326]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 5)
>>> conv = EdgeConv(5, 2, 3)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea))
>>> res
[[3.4617817 0.84700686]
 [ 1.3170856 1.5731761 ]
 [2.0761423 0.56653017]
 [1.015364 0.78919804]]
<NDArray 4x2 @cpu(0)>

forward(g, h)
Forward computation.
 Parameters
g (DGLGraph) – The graph.
h (mxnet.NDArray or pair of mxnet.NDArray) – Input features of shape \((N, D)\) where \(N\) is the number of nodes and \(D\) is the number of feature dimensions.
If a pair of mxnet.NDArray is given, the graph must be a uni-bipartite graph with only one edge type, and the two tensors must have the same dimensionality on all except the first axis.
 Returns
New node features.
 Return type
mxnet.NDArray
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
SAGEConv

class dgl.nn.mxnet.conv.SAGEConv(in_feats, out_feats, aggregator_type='mean', feat_drop=0.0, bias=True, norm=None, activation=None)
Bases: mxnet.gluon.block.Block
GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.
\[ \begin{align}\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} &= \mathrm{aggregate} \left(\{h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} &= \sigma \left(W \cdot \mathrm{concat} (h_{i}^{l}, h_{\mathcal{N}(i)}^{l+1}) \right)\\h_{i}^{(l+1)} &= \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}\end{align} \]
 Parameters
in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\). SAGEConv can be applied on homogeneous graphs and unidirectional bipartite graphs. If the layer is applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value. If the aggregator type is gcn, the feature sizes of source and destination nodes are required to be the same.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
feat_drop (float) – Dropout rate on features. Default: 0.
aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).
bias (bool) – If True, adds a learnable bias to the output. Default: True.
norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
Examples
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import SAGEConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> conv = SAGEConv(10, 2, 'pool')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 0.32144994 0.8729614 ]
 [ 0.32144994 0.8729614 ]
 [ 0.32144994 0.8729614 ]
 [ 0.32144994 0.8729614 ]
 [ 0.32144994 0.8729614 ]
 [ 0.32144994 0.8729614 ]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 10)
>>> conv = SAGEConv((5, 10), 2, 'pool')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea))
>>> res
[[0.60524774 0.7196473 ]
 [ 0.8832787 0.5928619 ]
 [1.8245722 1.159798 ]
 [1.0509381 2.2239418 ]]
<NDArray 4x2 @cpu(0)>

forward(graph, feat)
Compute GraphSAGE layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray or pair of mxnet.NDArray) – If a single tensor is given, it represents the input feature of shape \((N, D_{in})\) where \(D_{in}\) is the size of the input feature and \(N\) is the number of nodes. If a pair of tensors is given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
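The aggregate-then-concatenate structure of the GraphSAGE update can be sketched with a mean aggregator in plain Python. This is an illustration under stated assumptions: no activation, no normalization, no bias, and a zero neighborhood vector for nodes without incoming edges. The function sage_mean and its inputs are hypothetical, not the SAGEConv implementation.

```python
def sage_mean(num_nodes, src, dst, h, w):
    """One GraphSAGE step with a mean aggregator.

    h: list of per-node feature vectors; w: weight matrix of shape
    (out_feats, 2 * in_feats) applied to concat(h_i, h_N(i)).
    """
    dim = len(h[0])
    summed = [[0.0] * dim for _ in range(num_nodes)]
    count = [0] * num_nodes
    for j, i in zip(src, dst):
        count[i] += 1
        for c in range(dim):
            summed[i][c] += h[j][c]
    out = []
    for i in range(num_nodes):
        # Nodes without incoming edges get a zero neighborhood vector (assumption).
        h_neigh = [s / count[i] if count[i] else 0.0 for s in summed[i]]
        z = h[i] + h_neigh                      # list concatenation: concat(h_i, h_N(i))
        out.append([sum(wk * zk for wk, zk in zip(row, z)) for row in w])
    return out


h = [[1.0], [2.0], [4.0]]
w = [[1.0, 1.0]]                                # output = h_i + mean of in-neighbors
res = sage_mean(3, src=[0, 1, 2], dst=[2, 2, 0], h=h, w=w)

print(res)
```

Because the node's own feature and the neighborhood summary occupy separate halves of the concatenated vector, the weight matrix can treat them differently.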
SGConv

class dgl.nn.mxnet.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None, allow_zero_in_degree=False)
Bases: mxnet.gluon.block.Block
Simplifying Graph Convolution layer from paper Simplifying Graph Convolutional Networks.
\[H^{K} = (\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2})^K X \Theta\]
where \(\tilde{A}\) is \(A\) + \(I\). Thus the input graph is expected to have self-loop edges added.
 Parameters
in_feats (int) – Number of input features; i.e., the number of dimensions of \(X\).
out_feats (int) – Number of output features; i.e., the number of dimensions of \(H^{K}\).
k (int) – Number of hops \(K\). Default: 1.
cached (bool) – If True, the module caches
\[(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}})^K X\Theta\]
at the first forward call. This parameter should only be set to True in a transductive learning setting.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the nodes with zero in-degree when using the output of the conv layer.
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import SGConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> conv = SGConv(10, 2, k=2, cached=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 2.264404 0.26684892]
 [ 2.264404 0.26684892]
 [ 2.264404 0.26684892]
 [ 3.2273252 0.3803246 ]
 [ 2.247593 0.2648679 ]
 [ 2.2644043 0.26684904]]
<NDArray 6x2 @cpu(0)>

forward(graph, feat)
Compute Simplifying Graph Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
Note
If cached is set to True, feat and graph should not change during training, or you will get wrong results.
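Why a changed input gives wrong results under caching can be seen with a toy stand-in. The class below is hypothetical and is not how SGConv is implemented; it only mimics the behavior of computing a propagated result once and reusing it on every later call.

```python
class CachedPropagation:
    """Hypothetical stand-in for a layer that caches its propagated input."""

    def __init__(self, propagate):
        self._propagate = propagate
        self._cache = None

    def __call__(self, feat):
        # The expensive propagation runs only on the first call.
        if self._cache is None:
            self._cache = self._propagate(feat)
        return self._cache


# Toy "propagation": every node receives the sum of all node features.
layer = CachedPropagation(lambda feat: [sum(feat)] * len(feat))

first = layer([1.0, 2.0, 3.0])
second = layer([10.0, 20.0, 30.0])   # new input, but the cached result is returned

print(first, second)
```

The second call silently returns the result computed for the first input, which is why caching only suits a transductive setting where graph and features stay fixed.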
APPNPConv

class dgl.nn.mxnet.conv.APPNPConv(k, alpha, edge_drop=0.0)
Bases: mxnet.gluon.block.Block
Approximate Personalized Propagation of Neural Predictions layer from paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.
\[ \begin{align}\begin{aligned}H^{0} &= X\\H^{l+1} &= (1-\alpha)\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{l}\right) + \alpha H^{0}\end{aligned}\end{align} \]
where \(\tilde{A}\) is \(A\) + \(I\).
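 Parameters
k (int) – Number of propagation iterations.
alpha (float) – The teleport probability \(\alpha\).
edge_drop (float, optional) – Dropout rate on edges, which controls the messages received by each node. Default: 0.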
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import APPNPConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = APPNPConv(k=3, alpha=0.5)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ]
 [1.0303301 1.0303301 1.0303301 1.0303301 1.0303301 1.0303301 1.0303301 1.0303301 1.0303301 1.0303301 ]
 [0.86427665 0.86427665 0.86427665 0.86427665 0.86427665 0.86427665 0.86427665 0.86427665 0.86427665 0.86427665]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ]]
<NDArray 6x10 @cpu(0)>

forward(graph, feat)
Compute APPNP layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mx.NDArray) – The input feature of shape \((N, *)\). \(N\) is the number of nodes, and \(*\) could be of any shape.
 Returns
The output feature of shape \((N, *)\) where \(*\) should be the same as input shape.
 Return type
mx.NDArray
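The APPNP propagation rule can be traced by hand for one feature column of the example above (k=3, alpha=0.5, all-ones input). This sketch is plain Python and rests on two assumptions that are not stated in the formula: the graph is used as given, without adding self-loops, and both sides of the normalizer use in-degrees clamped to at least 1. These choices were made because they reproduce the printed example values; they should not be read as a specification of the layer.

```python
import math

src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
num_nodes, k, alpha = 6, 3, 0.5

# Assumption: normalize with in-degrees clamped to at least 1, no self-loops added.
in_deg = [0] * num_nodes
for v in dst:
    in_deg[v] += 1
norm = [1.0 / math.sqrt(max(d, 1)) for d in in_deg]

h0 = [1.0] * num_nodes            # one feature column of the all-ones input
h = list(h0)
for _ in range(k):
    agg = [0.0] * num_nodes
    for j, i in zip(src, dst):
        agg[i] += norm[j] * h[j]  # scale by the source side of the normalizer
    # Scale by the destination side, then mix with the initial features.
    h = [(1 - alpha) * norm[i] * agg[i] + alpha * h0[i] for i in range(num_nodes)]

print([round(v, 7) for v in h])
```

Node 5 has no incoming edge, so its aggregated term is zero at every step and only the \(\alpha H^{0}\) share of its initial feature remains.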
GINConv

class dgl.nn.mxnet.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)
Bases: mxnet.gluon.block.Block
Graph Isomorphism Network layer from paper How Powerful are Graph Neural Networks?.
\[h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)\]
 Parameters
apply_func (callable activation function/layer or None) – If not None, apply this function to the updated node feature; this is the \(f_\Theta\) in the formula.
aggregator_type (str) – Aggregator type to use (sum, max or mean).
init_eps (float, optional) – Initial \(\epsilon\) value. Default: 0.
learn_eps (bool, optional) – If True, \(\epsilon\) will be a learnable parameter. Default: False.
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import GINConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> lin = gluon.nn.Dense(10)
>>> lin.initialize(ctx=mx.cpu(0))
>>> conv = GINConv(lin, 'max')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 0.44832918 0.05283341 0.20823681 0.16020004 0.37311912 0.03372726 0.05716725 0.20730163 0.14121324 0.46083626]
 [ 0.44832918 0.05283341 0.20823681 0.16020004 0.37311912 0.03372726 0.05716725 0.20730163 0.14121324 0.46083626]
 [ 0.44832918 0.05283341 0.20823681 0.16020004 0.37311912 0.03372726 0.05716725 0.20730163 0.14121324 0.46083626]
 [ 0.44832918 0.05283341 0.20823681 0.16020004 0.37311912 0.03372726 0.05716725 0.20730163 0.14121324 0.46083626]
 [ 0.44832918 0.05283341 0.20823681 0.16020004 0.37311912 0.03372726 0.05716725 0.20730163 0.14121324 0.46083626]
 [ 0.22416459 0.0264167 0.10411841 0.08010002 0.18655956 0.01686363 0.02858362 0.10365082 0.07060662 0.23041813]]
<NDArray 6x10 @cpu(0)>

forward(graph, feat)
Compute Graph Isomorphism Network layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray or a pair of mxnet.NDArray) – If a mxnet.NDArray is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is the size of the input feature and \(N\) is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, D_{in})\) and \((N_{out}, D_{in})\). If apply_func is not None, \(D_{in}\) should fit the input dimensionality requirement of apply_func.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output dimensionality of apply_func. If apply_func is None, \(D_{out}\) should be the same as the input dimensionality.
 Return type
mxnet.NDArray
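The quantity that GINConv passes to apply_func can be computed by hand for the example graph. This is a minimal sketch in plain Python for scalar node features with the max aggregator; the helper gin_pre_activation is hypothetical, and treating a node with no in-neighbors as aggregating to 0 is an assumption of the sketch.

```python
def gin_pre_activation(num_nodes, src, dst, h, eps=0.0):
    """(1 + eps) * h_i + max over in-neighbors, for scalar node features."""
    neigh_max = [None] * num_nodes
    for j, i in zip(src, dst):
        neigh_max[i] = h[j] if neigh_max[i] is None else max(neigh_max[i], h[j])
    # Assumption: a node with no in-neighbors aggregates to 0.
    return [(1 + eps) * h[i] + (0.0 if neigh_max[i] is None else neigh_max[i])
            for i in range(num_nodes)]


src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
pre = gin_pre_activation(6, src, dst, [1.0] * 6)

print(pre)
```

Node 5 ends up with half the value of the other nodes, which matches the ratio between the last row and the other rows of the example output when apply_func is a linear layer without an offset.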
GatedGraphConv

class dgl.nn.mxnet.conv.GatedGraphConv(in_feats, out_feats, n_steps, n_etypes, bias=True)
Bases: mxnet.gluon.block.Block
Gated Graph Convolution layer from paper Gated Graph Sequence Neural Networks.
\[ \begin{align}\begin{aligned}h_{i}^{0} &= [ x_i \| \mathbf{0} ]\\a_{i}^{t} &= \sum_{j\in\mathcal{N}(i)} W_{e_{ij}} h_{j}^{t}\\h_{i}^{t+1} &= \mathrm{GRU}(a_{i}^{t}, h_{i}^{t})\end{aligned}\end{align} \]
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(x_i\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(t+1)}\).
n_steps (int) – Number of recurrent steps; i.e., the \(t\) in the above formula.
n_etypes (int) – Number of edge types.
bias (bool) – If True, adds a learnable bias to the output. Default: True. Can only be set to True in MXNet.
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import GatedGraphConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = GatedGraphConv(10, 10, 2, 3)
>>> conv.initialize(ctx=mx.cpu(0))
>>> etype = mx.nd.array([0,1,2,0,1,2])
>>> res = conv(g, feat, etype)
>>> res
[[0.24378185 0.17402579 0.2644723 0.2740628 0.14041871 0.32523093 0.2703067 0.18234392 0.32777587 0.30957845]
 [0.17872348 0.28878236 0.2509409 0.20139427 0.3355541 0.22643831 0.2690711 0.22341749 0.27995753 0.21575949]
 [0.23911178 0.16696918 0.26120248 0.27397877 0.13745922 0.3223175 0.27561218 0.18071817 0.3251124 0.30608907]
 [0.25242943 0.3098581 0.25249368 0.27968448 0.24624602 0.12270881 0.335147 0.31550157 0.19065917 0.21087633]
 [0.17503153 0.29523152 0.2474858 0.20848347 0.3526433 0.23443702 0.24741334 0.21986549 0.28935105 0.21859099]
 [0.2159364 0.26942077 0.23083271 0.28329757 0.24758333 0.24230732 0.23958017 0.23430146 0.26431587 0.27001363]]
<NDArray 6x10 @cpu(0)>

forward
(graph, feat, etypes)[source]¶ Compute Gated Graph Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(N\) is the number of nodes of the graph and \(D_{in}\) is the input feature size.
etypes (mxnet.NDArray) – The edge type tensor of shape \((E,)\) where \(E\) is the number of edges of the graph.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
 Return type
mxnet.NDArray
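The first two equations of GatedGraphConv can be illustrated without any framework. The following is a minimal sketch in plain Python, not the library's implementation: the helper names (pad_features, matvec, aggregate) and the toy weights are hypothetical, and the GRU update of the third equation is deliberately left out. The zero padding in the first equation implies that in_feats does not exceed out_feats.

```python
def pad_features(x, out_feats):
    """h_i^0 = [x_i || 0]: right-pad every input vector with zeros up to out_feats."""
    return [row + [0.0] * (out_feats - len(row)) for row in x]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_rc * v_c for w_rc, v_c in zip(w_row, v)) for w_row in w]


def aggregate(src, dst, etypes, h, weights):
    """a_i = sum over incoming edges (j -> i) of W_{e_ij} h_j."""
    a = [[0.0] * len(h[0]) for _ in h]
    for j, i, t in zip(src, dst, etypes):
        msg = matvec(weights[t], h[j])  # the edge type selects the weight matrix
        a[i] = [acc + m for acc, m in zip(a[i], msg)]
    return a


# Toy graph (hypothetical): edge 0 -> 1 has type 0, edge 2 -> 1 has type 1.
src, dst, etypes = [0, 2], [1, 1], [0, 1]
x = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]          # in_feats = 2
h0 = pad_features(x, 3)                            # out_feats = 3
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doubling = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
a = aggregate(src, dst, etypes, h0, [identity, doubling])
```

Node 1 receives one message through each edge type, so its aggregate is the sum of the two transformed neighbor vectors; in the layer, this aggregate and the previous hidden state would then be passed to the GRU.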
GMMConv¶

class
dgl.nn.mxnet.conv.
GMMConv
(in_feats, out_feats, dim, n_kernels, aggregator_type='sum', residual=False, bias=True, allow_zero_in_degree=False)[source]¶ Bases:
mxnet.gluon.block.Block
The Gaussian Mixture Model Convolution layer from Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs.
\[ \begin{align}\begin{aligned}u_{ij} &= f(x_i, x_j), x_j \in \mathcal{N}(i)\\w_k(u) &= \exp\left(-\frac{1}{2}(u-\mu_k)^T \Sigma_k^{-1} (u - \mu_k)\right)\\h_i^{l+1} &= \mathrm{aggregate}\left(\left\{\frac{1}{K} \sum_{k}^{K} w_k(u_{ij}), \forall j\in \mathcal{N}(i)\right\}\right)\end{aligned}\end{align} \]
where \(u\) denotes the pseudo-coordinates between a vertex and one of its neighbors, computed using function \(f\), and \(\Sigma_k^{-1}\) and \(\mu_k\) are learnable parameters representing the inverse covariance matrix and mean vector of a Gaussian kernel.
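To make the kernel weight \(w_k(u)\) concrete, here is a minimal plain-Python sketch. It assumes a diagonal \(\Sigma_k^{-1}\), which is a simplification chosen for this illustration; the function name and the toy values are hypothetical.

```python
import math


def gaussian_kernel_weight(u, mu, inv_sigma_diag):
    """w_k(u) = exp(-1/2 (u - mu_k)^T Sigma_k^{-1} (u - mu_k)), diagonal Sigma_k^{-1}."""
    quad = sum(s * (ui - mi) ** 2 for ui, mi, s in zip(u, mu, inv_sigma_diag))
    return math.exp(-0.5 * quad)


u = [1.0, 1.0, 1.0]                        # pseudo-coordinate of one edge, dim = 3
kernels = [
    ([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]),    # mean equals u, so the weight is 1
    ([0.0, 1.0, 1.0], [2.0, 1.0, 1.0]),    # one coordinate is off by 1, scaled by 2
]
weights = [gaussian_kernel_weight(u, mu, inv) for mu, inv in kernels]
mixture = sum(weights) / len(weights)      # the 1/K sum over kernels in the formula
```

A kernel whose mean coincides with the pseudo-coordinate contributes a weight of 1, and the weight decays as the coordinate moves away from the mean.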
 Parameters
in_feats (int) – Number of input features; i.e., the number of dimensions of \(x_i\).
out_feats (int) – Number of output features; i.e., the number of dimensions of \(h_i^{(l+1)}\).
dim (int) – Dimensionality of pseudo-coordinate; i.e., the number of dimensions of \(u_{ij}\).
n_kernels (int) – Number of kernels \(K\).
aggregator_type (str) – Aggregator type (sum, mean, max). Default: sum.
residual (bool) – If True, use residual connection inside this layer. Default: False.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.
Examples
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import GMMConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> conv = GMMConv(10, 2, 3, 2, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> pseudo = mx.nd.ones((12, 3))
>>> res = conv(g, feat, pseudo)
>>> res
[[0.05083769 0.1567954 ]
 [0.05083769 0.1567954 ]
 [0.05083769 0.1567954 ]
 [0.05083769 0.1567954 ]
 [0.05083769 0.1567954 ]
 [0.05083769 0.1567954 ]]
<NDArray 6x2 @cpu(0)>
>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 10)
>>> pseudo = mx.nd.ones((5, 3))
>>> conv = GMMConv((5, 10), 2, 3, 2, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea), pseudo)
>>> res
[[0.1005067 0.09494358]
 [0.0023314 0.07597432]
 [0.05141905 0.08545895]
 [0.1005067 0.09494358]]
<NDArray 4x2 @cpu(0)>

forward
(graph, feat, pseudo)[source]¶ Compute Gaussian Mixture Model Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – If a single tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of tensors are given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
pseudo (mxnet.NDArray) – The pseudo coordinate tensor of shape \((E, D_{u})\) where \(E\) is the number of edges of the graph and \(D_{u}\) is the dimensionality of pseudo coordinate.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
 Return type
mxnet.NDArray
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
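The zero-in-degree condition can be checked from the edge lists alone. The sketch below is a plain-Python illustration with hypothetical helper names, not the check the library performs internally; it uses the same edge lists as the homogeneous example for this layer.

```python
def zero_in_degree_nodes(num_nodes, dst):
    """Return the IDs of nodes that never appear as an edge destination."""
    in_deg = [0] * num_nodes
    for v in dst:
        in_deg[v] += 1
    return [v for v, d in enumerate(in_deg) if d == 0]


def with_self_loops(num_nodes, src, dst):
    """Append one edge v -> v for every node, mirroring the effect of add_self_loop."""
    loops = list(range(num_nodes))
    return src + loops, dst + loops


src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
before = zero_in_degree_nodes(6, dst)        # node 5 receives no message
src2, dst2 = with_self_loops(6, src, dst)
after = zero_in_degree_nodes(6, dst2)        # empty once self-loops are added
```

After adding self-loops the graph has 12 edges, which is why the homogeneous example passes a pseudo-coordinate tensor with 12 rows.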
ChebConv¶

class
dgl.nn.mxnet.conv.
ChebConv
(in_feats, out_feats, k, bias=True)[source]¶ Bases:
mxnet.gluon.block.Block
Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
\[ \begin{align}\begin{aligned}h_i^{l+1} &= \sum_{k=0}^{K-1} W^{k, l}z_i^{k, l}\\Z^{0, l} &= H^{l}\\Z^{1, l} &= \tilde{L} \cdot H^{l}\\Z^{k, l} &= 2 \cdot \tilde{L} \cdot Z^{k-1, l} - Z^{k-2, l}\\\tilde{L} &= 2\left(I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\right)/\lambda_{max} - I\end{aligned}\end{align} \]
where \(\tilde{A}\) is \(A\) + \(I\) and \(W\) is a learnable weight.
 Parameters
in_feats (int) – Dimension of input features; i.e., the number of dimensions of \(h_i^{(l)}\).
out_feats (int) – Dimension of output features \(h_i^{(l+1)}\).
k (int) – Chebyshev filter size \(K\).
activation (function, optional) – Activation function. Default: ReLu.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import ChebConv
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = ChebConv(10, 2, 2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 0.832592 0.738757 ]
 [ 0.832592 0.738757 ]
 [ 0.832592 0.738757 ]
 [ 0.43377423 1.0455742 ]
 [ 1.1145986 0.5218046 ]
 [ 1.7954229 0.00196505]]
<NDArray 6x2 @cpu(0)>

forward
(graph, feat, lambda_max=None)[source]¶ Compute ChebNet layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
lambda_max (list or tensor or None, optional) – A list or tensor of length \(B\) that stores the largest eigenvalue of the normalized Laplacian of each individual graph in graph, where \(B\) is the batch size of the input graph. Default: None. If None, this method uses a default value of 2. One can use dgl.laplacian_lambda_max() to compute this value.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
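The Chebyshev recurrence in the ChebConv formula can be sketched in plain Python. This is an illustration only: the rescaled Laplacian \(\tilde{L}\) is assumed to be given as a dense list of rows, the helper names are hypothetical, and the learnable weights \(W^{k, l}\) are omitted.

```python
def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def chebyshev_basis(l_tilde, h, k):
    """Return [Z^0, ..., Z^{k-1}] with Z^0 = H, Z^1 = L~ H, Z^k = 2 L~ Z^{k-1} - Z^{k-2}."""
    zs = [h]
    if k > 1:
        zs.append(matmul(l_tilde, h))
    for _ in range(2, k):
        lz = matmul(l_tilde, zs[-1])
        zs.append([[2 * x - y for x, y in zip(row_lz, row_prev)]
                   for row_lz, row_prev in zip(lz, zs[-2])])
    return zs


# A diagonal L~ makes each row follow the scalar Chebyshev polynomials T_k(x).
l_tilde = [[0.5, 0.0], [0.0, -1.0]]
h = [[1.0], [1.0]]
zs = chebyshev_basis(l_tilde, h, 3)   # T_2(x) = 2 x^2 - 1
```

With a diagonal \(\tilde{L}\), the third basis term reproduces \(T_2(0.5) = -0.5\) and \(T_2(-1) = 1\), which is a quick sanity check on the recurrence.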
AGNNConv¶

class
dgl.nn.mxnet.conv.
AGNNConv
(init_beta=1.0, learn_beta=True, allow_zero_in_degree=False)[source]¶ Bases:
mxnet.gluon.block.Block
Attention-based Graph Neural Network layer from paper Attention-based Graph Neural Network for Semi-Supervised Learning.
\[H^{l+1} = P H^{l}\]where \(P\) is computed as:
\[P_{ij} = \mathrm{softmax}_i ( \beta \cdot \cos(h_i^l, h_j^l))\]where \(\beta\) is a single scalar parameter.
 Parameters
init_beta (float, optional) – The \(\beta\) in the formula, a single scalar parameter.
learn_beta (bool, optional) – If True, \(\beta\) will be a learnable parameter.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.
Example
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import AGNNConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> conv = AGNNConv()
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
<NDArray 6x10 @cpu(0)>

forward
(graph, feat)[source]¶ Compute AGNN layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature of shape \((N, *)\), where \(N\) is the number of nodes and \(*\) could be of any shape. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, *)\) and \((N_{out}, *)\); the \(*\) in the latter tensor must equal that of the former.
 Returns
The output feature of shape \((N, *)\) where \(*\) should be the same as input shape.
 Return type
mxnet.NDArray
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
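The attention propagation of AGNNConv can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: features are plain lists of floats with non-zero norm, attention is normalized over each node's in-neighbors, \(\beta\) is a fixed scalar, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def agnn_layer(src, dst, h, beta=1.0):
    """For each node i, average in-neighbor features with softmax(beta * cos(h_i, h_j))."""
    out = []
    for i in range(len(h)):
        nbrs = [j for j, d in zip(src, dst) if d == i]
        scores = [math.exp(beta * cosine(h[i], h[j])) for j in nbrs]
        total = sum(scores)
        row = [0.0] * len(h[i])
        for j, s in zip(nbrs, scores):
            row = [r + (s / total) * x for r, x in zip(row, h[j])]
        out.append(row)   # stays all-zero when node i has no in-neighbor
    return out


# Same edges as the example for this layer, with self-loops appended.
src = [0, 1, 2, 3, 2, 5] + list(range(6))
dst = [1, 2, 3, 4, 0, 3] + list(range(6))
ones_out = agnn_layer(src, dst, [[1.0] * 10 for _ in range(6)])

# Two-node case: node 0 attends to itself and to node 1; node 1 has no in-edge.
small_out = agnn_layer([0, 1], [0, 0], [[1.0, 0.0], [0.0, 1.0]])
```

With identical input rows, every attention-weighted average returns that same row, which is consistent with the all-ones output shown in the example for this layer.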
NNConv¶

class
dgl.nn.mxnet.conv.
NNConv
(in_feats, out_feats, edge_func, aggregator_type, residual=False, bias=True)[source]¶ Bases:
mxnet.gluon.block.Block
Graph Convolution layer introduced in Neural Message Passing for Quantum Chemistry.
\[h_{i}^{l+1} = h_{i}^{l} + \mathrm{aggregate}\left(\left\{ f_\Theta (e_{ij}) \cdot h_j^{l}, j\in \mathcal{N}(i) \right\}\right)\]where \(e_{ij}\) is the edge feature, \(f_\Theta\) is a function with learnable parameters.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\). NNConv can be applied on a homogeneous graph and a unidirectional bipartite graph. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
edge_func (callable activation function/layer) – Maps each edge feature to a vector of shape (in_feats * out_feats) as weight to compute messages. This is the \(f_\Theta\) in the formula.
aggregator_type (str) – Aggregator type to use (sum, mean or max).
residual (bool, optional) – If True, use residual connection. Default: False.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
Examples
>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import NNConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = mx.nd.ones((6, 10))
>>> lin = gluon.nn.Dense(20)
>>> lin.initialize(ctx=mx.cpu(0))
>>> def edge_func(efeat):
...     return lin(efeat)
>>> efeat = mx.nd.ones((12, 5))
>>> conv = NNConv(10, 2, edge_func, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat, efeat)
>>> res
[[0.39946803 0.32098457]
 [0.39946803 0.32098457]
 [0.39946803 0.32098457]
 [0.39946803 0.32098457]
 [0.39946803 0.32098457]
 [0.39946803 0.32098457]]
<NDArray 6x2 @cpu(0)>
>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_feat = mx.nd.random.randn(2, 10)
>>> v_feat = mx.nd.random.randn(4, 10)
>>> conv = NNConv(10, 2, edge_func, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> efeat = mx.nd.ones((5, 5))
>>> res = conv(g, (u_feat, v_feat), efeat)
>>> res
[[ 0.24425688 0.3238042 ]
 [0.11651017 0.01738572]
 [ 0.06387337 0.15320925]
 [ 0.24425688 0.3238042 ]]
<NDArray 4x2 @cpu(0)>

forward
(graph, feat, efeat)[source]¶ Compute MPNN Graph Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray or pair of mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(N\) is the number of nodes of the graph and \(D_{in}\) is the input feature size.
efeat (mxnet.NDArray) – The edge feature of shape \((E, *)\), where \(E\) is the number of edges; it should fit the input shape requirement of edge_func.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
 Return type
mxnet.NDArray
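The role of edge_func in NNConv can be illustrated with a small plain-Python sketch. It assumes that the flat vector produced for each edge is reshaped row by row into an (in_feats, out_feats) matrix and then applied to the source feature; this layout, the helper names, and the toy data are assumptions of the sketch, and the residual connection and bias are omitted.

```python
def reshape_weight(flat, in_feats, out_feats):
    """Turn a flat vector of length in_feats * out_feats into in_feats rows."""
    return [flat[r * out_feats:(r + 1) * out_feats] for r in range(in_feats)]


def nnconv_mean(src, dst, h, edge_weights, in_feats, out_feats, num_dst):
    """Mean-aggregate the messages h_j W_e over the incoming edges of every node."""
    sums = [[0.0] * out_feats for _ in range(num_dst)]
    counts = [0] * num_dst
    for j, i, flat in zip(src, dst, edge_weights):
        w = reshape_weight(flat, in_feats, out_feats)
        msg = [sum(h[j][r] * w[r][c] for r in range(in_feats)) for c in range(out_feats)]
        sums[i] = [s + m for s, m in zip(sums[i], msg)]
        counts[i] += 1
    # Nodes without incoming edges keep an all-zero row.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


# Two edges into node 1; each edge carries its own weight vector, standing in
# for the output of edge_func on that edge's feature.
h = [[1.0, 2.0], [3.0, 4.0]]
edge_weights = [[1.0, 0.0, 0.0, 0.0, 1.0, 1.0],
                [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]]
out = nnconv_mean([0, 1], [1, 1], h, edge_weights, 2, 3, 2)
```

This also shows why the Dense layer in the example has 20 units: with in_feats = 10 and out_feats = 2, each edge needs 10 * 2 weights.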
Dense Conv Layers¶
DenseGraphConv¶

class
dgl.nn.mxnet.conv.
DenseGraphConv
(in_feats, out_feats, norm='both', bias=True, activation=None)[source]¶ Bases:
mxnet.gluon.block.Block
Graph Convolutional Network layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying graph convolution on dense graphs.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the \(c_{ij}\) in the paper is applied.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
Notes
Zero-in-degree nodes will lead to all-zero output. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by setting the diagonal of the adjacency matrix to 1.
See also

forward
(adj, feat)[source]¶ Compute (Dense) Graph Convolution layer.
 Parameters
adj (mxnet.NDArray) – The adjacency matrix of the graph to apply Graph Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape \((N_{out}, N_{in})\); when applied to a homogeneous graph, adj should be of shape \((N, N)\). In both cases, a row represents a destination node while a column represents a source node.
feat (mxnet.NDArray) – The input feature.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
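The ‘both’ normalization on a dense adjacency matrix can be sketched in plain Python. This is a hedged illustration rather than the layer's code: it follows the row-is-destination, column-is-source convention stated above, clamps degrees to at least 1 (an assumption of this sketch), and leaves out the weight matrix, bias and activation.

```python
def dense_graph_conv_both(adj, feat):
    """Aggregate source features with each edge scaled by 1 / sqrt(in_deg_i * out_deg_j)."""
    n_dst, n_src = len(adj), len(adj[0])
    # Column sums give the out-degree of each source, row sums the in-degree of each destination.
    out_deg = [max(sum(adj[i][j] for i in range(n_dst)), 1) for j in range(n_src)]
    in_deg = [max(sum(adj[i]), 1) for i in range(n_dst)]
    out = []
    for i in range(n_dst):
        row = [0.0] * len(feat[0])
        for j in range(n_src):
            if adj[i][j]:
                scale = adj[i][j] / (in_deg[i] ** 0.5 * out_deg[j] ** 0.5)
                row = [r + scale * x for r, x in zip(row, feat[j])]
        out.append(row)
    return out


# Two nodes with self-loops (diagonal set to 1) and one edge 1 -> 0.
adj = [[1, 1],
       [0, 1]]
feat = [[2.0], [4.0]]
out = dense_graph_conv_both(adj, feat)
```

Here node 0 has in-degree 2 and node 1 has out-degree 2, so the message along the edge 1 -> 0 is scaled by 1/2, while the self-loop messages are scaled by \(1/\sqrt{2}\).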
DenseSAGEConv¶

class
dgl.nn.mxnet.conv.
DenseSAGEConv
(in_feats, out_feats, feat_drop=0.0, bias=True, norm=None, activation=None)[source]¶ Bases:
mxnet.gluon.block.Block
GraphSAGE layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying GraphSAGE on dense graphs.
Note that we only support gcn aggregator in DenseSAGEConv.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
feat_drop (float, optional) – Dropout rate on features. Default: 0.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
See also

forward
(adj, feat)[source]¶ Compute (Dense) Graph SAGE layer.
 Parameters
adj (mxnet.NDArray) – The adjacency matrix of the graph to apply SAGE Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape \((N_{out}, N_{in})\); when applied to a homogeneous graph, adj should be of shape \((N, N)\). In both cases, a row represents a destination node while a column represents a source node.
feat (mxnet.NDArray or a pair of mxnet.NDArray) – If a mxnet.NDArray is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, D_{in})\) and \((N_{out}, D_{in})\).
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
DenseChebConv¶

class
dgl.nn.mxnet.conv.
DenseChebConv
(in_feats, out_feats, k, bias=True)[source]¶ Bases:
mxnet.gluon.block.Block
Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
We recommend using this module when applying ChebConv on dense graphs.
 Parameters
in_feats (int) – Dimension of input features \(h_i^{(l)}\).
out_feats (int) – Dimension of output features \(h_i^{(l+1)}\).
k (int) – Chebyshev filter size.
activation (function, optional) – Activation function, default is ReLu.
bias (bool, optional) – If True, adds a learnable bias to the output. Default:
True
.
See also

forward
(adj, feat, lambda_max=None)[source]¶ Compute (Dense) Chebyshev Spectral Graph Convolution layer.
 Parameters
adj (mxnet.NDArray) – The adjacency matrix of the graph to apply Graph Convolution on, should be of shape \((N, N)\), where a row represents the destination and a column represents the source.
feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
lambda_max (float or None, optional) – A float value indicating the largest eigenvalue of the given graph. Default: None.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
mxnet.NDArray
Global Pooling Layers¶
MXNet modules for graph global pooling.
SumPooling¶

class
dgl.nn.mxnet.glob.
SumPooling
[source]¶ Bases:
mxnet.gluon.block.Block
Apply sum pooling over the nodes in the graph.
\[r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k\]
forward
(graph, feat)[source]¶ Compute sum pooling.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
 Return type
mxnet.NDArray

AvgPooling¶

class
dgl.nn.mxnet.glob.
AvgPooling
[source]¶ Bases:
mxnet.gluon.block.Block
Apply average pooling over the nodes in the graph.
\[r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k\]
forward
(graph, feat)[source]¶ Compute average pooling.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
 Return type
mxnet.NDArray

MaxPooling¶

class
dgl.nn.mxnet.glob.
MaxPooling
[source]¶ Bases:
mxnet.gluon.block.Block
Apply max pooling over the nodes in the graph.
\[r^{(i)} = \max_{k=1}^{N_i} \left( x^{(i)}_k \right)\]
forward
(graph, feat)[source]¶ Compute max pooling.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
 Return type
mxnet.NDArray
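The three pooling formulas above (sum, average, max) differ only in the reduction applied to each graph's block of rows. A minimal plain-Python sketch, assuming the nodes of a batched graph are stored graph by graph and that the per-graph node counts are known (the function and variable names are hypothetical):

```python
def segment_pool(feat, batch_num_nodes, reduce_fn):
    """Reduce the node rows of each graph in a batch column by column."""
    out, start = [], 0
    for n in batch_num_nodes:
        rows = feat[start:start + n]                       # nodes of one graph
        out.append([reduce_fn(col) for col in zip(*rows)])
        start += n
    return out


feat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N = 3 nodes in total
sizes = [2, 1]                                # B = 2 graphs with 2 and 1 nodes

sum_pooled = segment_pool(feat, sizes, sum)
avg_pooled = segment_pool(feat, sizes, lambda col: sum(col) / len(col))
max_pooled = segment_pool(feat, sizes, max)
```

Each result has one row per graph, matching the \((B, *)\) output shape described for these layers.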

SortPooling¶

class
dgl.nn.mxnet.glob.
SortPooling
(k)[source]¶ Bases:
mxnet.gluon.block.Block
Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in the graph.
 Parameters
k (int) – The number of nodes to hold for each graph.

forward
(graph, feat)[source]¶ Compute sort pooling.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, k * D)\), where \(B\) refers to the batch size.
 Return type
mxnet.NDArray
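The \((B, k * D)\) output shape follows from keeping k node rows per graph and flattening them. The sketch below shows this for a single graph in plain Python. It assumes nodes are ranked by their last feature channel in descending order and that graphs with fewer than k nodes are zero-padded; tie-breaking and any further preprocessing done by the actual layer are not reproduced, and the function name is hypothetical.

```python
def sort_pool(rows, k):
    """Keep the k rows with the largest last channel, zero-pad, and flatten to k * D values."""
    d = len(rows[0])
    ranked = sorted(rows, key=lambda r: r[-1], reverse=True)[:k]
    ranked += [[0.0] * d] * (k - len(ranked))   # pad when the graph has fewer than k nodes
    return [x for row in ranked for x in row]


rows = [[1.0, 0.2], [2.0, 0.9], [3.0, 0.5]]   # one graph with 3 nodes, D = 2
pooled = sort_pool(rows, 2)                   # top-2 nodes by last channel
padded = sort_pool(rows, 4)                   # more slots than nodes
```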
GlobalAttentionPooling¶

class
dgl.nn.mxnet.glob.
GlobalAttentionPooling
(gate_nn, feat_nn=None)[source]¶ Bases:
mxnet.gluon.block.Block
Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.
\[r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)\]
 Parameters
gate_nn (gluon.nn.Block) – A neural network that computes attention scores for each feature.
feat_nn (gluon.nn.Block, optional) – A neural network applied to each feature before combining them with attention scores.

forward
(graph, feat)[source]¶ Compute global attention pooling.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, D)\), where \(B\) refers to the batch size.
 Return type
mxnet.NDArray
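The readout formula of GlobalAttentionPooling can be sketched for a single graph in plain Python. In this illustration the gate scores are given directly as one scalar per node, standing in for the output of gate_nn, and feat_nn is taken to be the identity; both are assumptions of the sketch, as is the function name.

```python
import math


def global_attention_pool(feats, gate_scores):
    """r = sum_k softmax(gate)_k * x_k over the nodes of one graph."""
    m = max(gate_scores)                              # subtract the max for stability
    exps = [math.exp(s - m) for s in gate_scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    return [sum(a * row[c] for a, row in zip(alphas, feats))
            for c in range(len(feats[0]))]


feats = [[1.0, 2.0], [3.0, 4.0]]
uniform = global_attention_pool(feats, [0.0, 0.0])    # equal scores: plain average
peaked = global_attention_pool(feats, [100.0, 0.0])   # one node dominates the readout
```

Equal gate scores reduce the readout to average pooling, while a strongly peaked gate makes the readout approach the features of a single node.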
Set2Set¶

class
dgl.nn.mxnet.glob.
Set2Set
(input_dim, n_iters, n_layers)[source]¶ Bases:
mxnet.gluon.block.Block
Apply Set2Set (Order Matters: Sequence to sequence for sets) over the nodes in the graph.
For each individual graph in the batch, set2set computes
\[ \begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align} \]for this graph.
 Parameters

forward
(graph, feat)[source]¶ Compute set2set pooling.
 Parameters
graph (DGLGraph) – The graph.
feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, D)\), where \(B\) refers to the batch size.
 Return type
mxnet.NDArray
Heterogeneous Graph Convolution Module¶
HeteroGraphConv¶

class
dgl.nn.mxnet.
HeteroGraphConv
(mods, aggregate='sum')[source]¶ Bases:
mxnet.gluon.block.Block
A generic module for computing convolution on heterogeneous graphs.
The heterograph convolution applies submodules on their associating relation graphs, which reads the features from source nodes and writes the updated ones to destination nodes. If multiple relations have the same destination node types, their results are aggregated by the specified method. If the relation graph has no edge, the corresponding module will not be called.
Pseudocode:
outputs = {nty : [] for nty in g.dsttypes}
# Apply submodules on their associating relation graphs in parallel
for relation in g.canonical_etypes:
    stype, etype, dtype = relation
    dstdata = relation_submodule(g[relation], ...)
    outputs[dtype].append(dstdata)

# Aggregate the results for each destination node type
rsts = {}
for ntype, ntype_outputs in outputs.items():
    if len(ntype_outputs) != 0:
        rsts[ntype] = aggregate(ntype_outputs)
return rsts
Examples
Create a heterograph with three types of relations and nodes.
>>> import dgl
>>> g = dgl.heterograph({
...     ('user', 'follows', 'user') : edges1,
...     ('user', 'plays', 'game') : edges2,
...     ('store', 'sells', 'game') : edges3})
Create a HeteroGraphConv that applies different convolution modules to different relations. Note that the modules for 'follows' and 'plays' do not share weights.
>>> import dgl.nn.mxnet as dglnn
>>> conv = dglnn.HeteroGraphConv({
...     'follows' : dglnn.GraphConv(...),
...     'plays' : dglnn.GraphConv(...),
...     'sells' : dglnn.SAGEConv(...)},
...     aggregate='sum')
Call forward with some 'user' features. This computes new features for both 'user' and 'game' nodes.
>>> import mxnet.ndarray as nd
>>> h1 = {'user' : nd.random.randn(g.number_of_nodes('user'), 5)}
>>> h2 = conv(g, h1)
>>> print(h2.keys())
dict_keys(['user', 'game'])
Call forward with both 'user' and 'store' features. Because both the 'plays' and 'sells' relations will update the 'game' features, their results are aggregated by the specified method (i.e., summation here).
>>> f1 = {'user' : ..., 'store' : ...}
>>> f2 = conv(g, f1)
>>> print(f2.keys())
dict_keys(['user', 'game'])
Call forward with some 'store' features. This only computes new features for 'game' nodes.
>>> g1 = {'store' : ...}
>>> g2 = conv(g, g1)
>>> print(g2.keys())
dict_keys(['game'])
Calling forward with a pair of inputs is allowed, and each submodule will also be invoked with a pair of inputs.
>>> x_src = {'user' : ..., 'store' : ...}
>>> x_dst = {'user' : ..., 'game' : ...}
>>> y_dst = conv(g, (x_src, x_dst))
>>> print(y_dst.keys())
dict_keys(['user', 'game'])
 Parameters
mods (dict[str, nn.Module]) – Modules associated with every edge type. The forward function of each module must take a DGLHeteroGraph object as its first argument, and its second argument is either a tensor object representing the node features or a pair of tensor objects representing the source and destination node features.
aggregate (str, callable, optional) –
Method for aggregating node features generated by different relations. Allowed string values are ‘sum’, ‘max’, ‘min’, ‘mean’, ‘stack’. The ‘stack’ aggregation is performed along the second dimension, whose order is deterministic. Users can also customize the aggregator by providing a callable instance. For example, aggregation by summation is equivalent to the following:
def my_agg_func(tensors, dsttype):
    # tensors: a list of tensors to aggregate
    # dsttype: string name of the destination node type for which the
    #          aggregation is performed
    stacked = mx.nd.stack(*tensors, axis=0)
    return mx.nd.sum(stacked, axis=0)
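The grouping-then-aggregating behavior described in the pseudocode can be tried out without a graph library. The following plain-Python sketch is only an illustration: per-relation outputs are represented as short lists of floats, and the helper names (hetero_aggregate, elementwise_sum) are hypothetical.

```python
def hetero_aggregate(relation_outputs, aggregate):
    """Group per-relation outputs by destination node type, then aggregate each group."""
    outputs = {}
    for dsttype, out in relation_outputs:
        outputs.setdefault(dsttype, []).append(out)
    return {dsttype: aggregate(outs) for dsttype, outs in outputs.items()}


def elementwise_sum(tensors):
    """Sum a list of equally sized vectors element by element."""
    return [sum(vals) for vals in zip(*tensors)]


# 'plays' and 'sells' both write to 'game' nodes; 'follows' writes to 'user' nodes.
relation_outputs = [
    ('user', [1.0, 1.0]),     # stand-in for the output of the 'follows' module
    ('game', [1.0, 2.0]),     # stand-in for the output of the 'plays' module
    ('game', [10.0, 20.0]),   # stand-in for the output of the 'sells' module
]
rsts = hetero_aggregate(relation_outputs, elementwise_sum)
```

Destination types that receive no relation output do not appear in the result, which matches the examples above where only some node types are returned.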

forward
(g, inputs, mod_args=None, mod_kwargs=None)[source]¶ Forward computation
Invoke the forward function with each module and aggregate their results.
 Parameters
g (DGLHeteroGraph) – Graph data.
inputs (dict[str, Tensor] or pair of dict[str, Tensor]) – Input node features.
mod_args (dict[str, tuple[any]], optional) – Extra positional arguments for the submodules.
mod_kwargs (dict[str, dict[str, any]], optional) – Extra keyword arguments for the submodules.
 Returns
Output representations for every type of node.
 Return type
Utility Modules¶
Sequential¶

class
dgl.nn.mxnet.utils.
Sequential
(prefix=None, params=None)[source]¶ Bases:
mxnet.gluon.nn.basic_layers.Sequential
A sequential container for stacking graph neural network blocks.
We support two modes: sequentially apply GNN blocks on the same graph or a list of given graphs. In the second case, the number of graphs equals the number of blocks inside this container.
Examples
Mode 1: sequentially apply GNN modules on the same graph
>>> import dgl
>>> from mxnet import nd
>>> from mxnet.gluon import nn
>>> import dgl.function as fn
>>> from dgl.nn.mxnet import Sequential
>>> class ExampleLayer(nn.Block):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def forward(self, graph, n_feat, e_feat):
...         with graph.local_scope():
...             graph.ndata['h'] = n_feat
...             graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
...             n_feat += graph.ndata['h']
...             graph.apply_edges(fn.u_add_v('h', 'h', 'e'))
...             e_feat += graph.edata['e']
...             return n_feat, e_feat
>>>
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> net = Sequential()
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.initialize()
>>> n_feat = nd.random.randn(3, 4)
>>> e_feat = nd.random.randn(9, 4)
>>> net(g, n_feat, e_feat)
(
[[ 12.412863 99.61184 21.472883 57.625923 ]
 [ 10.08097 100.68611 20.627377 60.13458 ]
 [ 11.7912245 101.80654 22.427956 58.32772 ]]
<NDArray 3x4 @cpu(0)>,
[[ 21.818504 198.12076 42.72387 115.147736]
 [ 23.070837 195.49811 43.42292 116.17203 ]
 [ 24.330334 197.10927 42.40048 118.06538 ]
 [ 21.907919 199.11469 42.1187 115.35658 ]
 [ 22.849625 198.79213 43.866085 113.65381 ]
 [ 20.926125 198.116 42.64334 114.246704]
 [ 23.003159 197.06662 41.796425 117.14977 ]
 [ 21.391375 198.3348 41.428078 116.30361 ]
 [ 21.291483 200.0701 40.8239 118.07314 ]]
<NDArray 9x4 @cpu(0)>)
Mode 2: sequentially apply GNN modules on different graphs
>>> import dgl
>>> from mxnet import nd
>>> from mxnet.gluon import nn
>>> import dgl.function as fn
>>> import networkx as nx
>>> from dgl.nn.mxnet import Sequential
>>> class ExampleLayer(nn.Block):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def forward(self, graph, n_feat):
...         with graph.local_scope():
...             graph.ndata['h'] = n_feat
...             graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
...             n_feat += graph.ndata['h']
...             # Halve the node count so the output fits the next, smaller graph
...             return n_feat.reshape(graph.number_of_nodes() // 2, 2, -1).sum(1)
>>>
>>> g1 = dgl.DGLGraph(nx.erdos_renyi_graph(32, 0.05))
>>> g2 = dgl.DGLGraph(nx.erdos_renyi_graph(16, 0.2))
>>> g3 = dgl.DGLGraph(nx.erdos_renyi_graph(8, 0.8))
>>> net = Sequential()
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.initialize()
>>> n_feat = nd.random.randn(32, 4)
>>> net([g1, g2, g3], n_feat)
[[101.289566 22.584694 89.25348 151.6447 ]
 [130.74239 49.494812 120.250854 199.81546 ]
 [112.32089 50.036713 116.13266 190.38638 ]
 [119.23065 26.78553 111.11185 166.08322 ]]
<NDArray 4x4 @cpu(0)>
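The two modes differ only in which graph each block receives. The following plain-Python sketch illustrates that dispatch with ordinary callables in place of GNN blocks; apply_sequential is a hypothetical helper written for this illustration, not the container's actual forward method.

```python
def apply_sequential(blocks, graph, *feats):
    """Apply blocks in order, on one shared graph or on one graph per block."""
    graphs = graph if isinstance(graph, list) else [graph] * len(blocks)
    for block, g in zip(blocks, graphs):
        out = block(g, *feats)
        # A block may return a single value or a tuple that is fed to the next block.
        feats = out if isinstance(out, tuple) else (out,)
    return feats[0] if len(feats) == 1 else feats


def add_graph(g, x):
    """Stand-in block: the 'graph' is just a number that is added to the feature."""
    return x + g


def bump_both(g, n, e):
    """Stand-in block with two features, mirroring the (n_feat, e_feat) case."""
    return n + 1, e + 2


mode1 = apply_sequential([add_graph] * 3, 5, 0)              # same graph three times
mode2 = apply_sequential([add_graph] * 3, [1, 10, 100], 0)   # one graph per block
pair = apply_sequential([bump_both] * 3, None, 0, 0)         # tuple outputs are unpacked
```

In the second mode the list of graphs is paired with the blocks one to one, which is why the number of graphs must equal the number of blocks in the container.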