# NN Modules (MXNet)

## Conv Layers

MXNet modules for graph convolutions.

### GraphConv

class dgl.nn.mxnet.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None, allow_zero_in_degree=False)

Bases: mxnet.gluon.block.Block

Graph convolution was introduced in GCN and mathematically is defined as follows:

$h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})$

where $$\mathcal{N}(i)$$ is the set of neighbors of node $$i$$, $$c_{ij}$$ is the product of the square root of node degrees (i.e., $$c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$$), and $$\sigma$$ is an activation function.
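The normalization in the formula can be traced by hand on a tiny graph. The following is a minimal pure-Python sketch, not the library's implementation: `graph_conv_scalar` is a hypothetical helper, features are scalars, the weight `w` and bias `b` stand in for $$W^{(l)}$$ and $$b^{(l)}$$, the activation is the identity, and the graph is assumed undirected (both directions of every edge are listed).

```python
import math

def graph_conv_scalar(edges, num_nodes, h, w=1.0, b=0.0):
    """Symmetric-normalized aggregation for scalar node features.

    edges: list of (src, dst) pairs; both directions of every undirected
    edge must be present, so len(neighbors[i]) equals |N(i)|.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        neighbors[dst].append(src)
    out = []
    for i in range(num_nodes):
        total = 0.0
        for j in neighbors[i]:
            # c_ij = sqrt(|N(i)|) * sqrt(|N(j)|)
            c_ij = math.sqrt(len(neighbors[i])) * math.sqrt(len(neighbors[j]))
            total += h[j] * w / c_ij
        out.append(b + total)
    return out

# Path graph 0 - 1 - 2 with a self-loop on every node
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 0), (1, 1), (2, 2)]
print(graph_conv_scalar(edges, 3, [1.0, 1.0, 1.0]))
```

The two end nodes receive the same value because they have the same neighborhood size, while the middle node, with three neighbors including itself, receives a larger sum.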

Parameters
• in_feats (int) – Input feature size; i.e., the number of dimensions of $$h_j^{(l)}$$.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• norm (str, optional) – How to apply the normalizer. If ‘right’, the aggregated messages are divided by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default: ‘both’, where the $$c_{ij}$$ from the paper is applied.

• weight (bool, optional) – If True, apply a linear layer. Otherwise, aggregate the messages without a weight matrix.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError when it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

weight

The learnable weight tensor.

Type

mxnet.gluon.parameter.Parameter

bias

The learnable bias tensor.

Type

mxnet.gluon.parameter.Parameter

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common way to avoid this is to add a self-loop for each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop does not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.
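To see which nodes would trigger the check, it can help to count in-degrees directly from the edge lists. The sketch below is plain Python and only illustrates the idea; `zero_in_degree_nodes` and `with_self_loops` are hypothetical helpers, not DGL functions.

```python
def zero_in_degree_nodes(src, dst, num_nodes):
    """Return the IDs of nodes that are never the destination of an edge."""
    in_degree = [0] * num_nodes
    for v in dst:
        in_degree[v] += 1
    return [n for n, d in enumerate(in_degree) if d == 0]

def with_self_loops(src, dst, num_nodes):
    """Return new edge lists with one self-loop appended per node."""
    nodes = list(range(num_nodes))
    return src + nodes, dst + nodes

# Edge lists of the homogeneous graph used in the examples on this page
src, dst = [0, 1, 2, 3, 2, 5], [1, 2, 3, 4, 0, 3]
print(zero_in_degree_nodes(src, dst, 6))     # [5]
src2, dst2 = with_self_loops(src, dst, 6)
print(zero_in_degree_nodes(src2, dst2, 6))   # []
```

Node 5 only sends messages and never receives one, which is why it is the node affected by allow_zero_in_degree in the examples.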

Examples

>>> import dgl
>>> import mxnet as mx
>>> from mxnet import gluon
>>> import numpy as np
>>> from dgl.nn import GraphConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # node 5 has no incoming edge; see the Note above
>>> feat = mx.nd.ones((6, 10))
>>> conv = GraphConv(10, 2, norm='both', weight=True, bias=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> print(res)
[[1.0209361  0.22472616]
[1.1240715  0.24742813]
[1.0209361  0.22472616]
[1.2924911  0.28450024]
[1.3568745  0.29867214]
[0.7948386  0.17495811]]
<NDArray 6x2 @cpu(0)>

>>> # allow_zero_in_degree example
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> conv = GraphConv(10, 2, norm='both', weight=True, bias=True, allow_zero_in_degree=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> print(res)
[[1.0209361  0.22472616]
[1.1240715  0.24742813]
[1.0209361  0.22472616]
[1.2924911  0.28450024]
[1.3568745  0.29867214]
[0.  0.]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 5)
>>> conv = GraphConv(5, 2, norm='both', weight=True, bias=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea))
>>> res
[[ 0.26967263  0.308129  ]
[ 0.05143356 -0.11355402]
[ 0.22705637  0.1375853 ]
[ 0.26967263  0.308129  ]]
<NDArray 4x2 @cpu(0)>

forward(graph, feat, weight=None)

Compute graph convolution.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray or pair of mxnet.NDArray) –

If a single tensor is given, it represents the input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the input feature size and $$N$$ is the number of nodes. If a pair of tensors is given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.

Note that in the special case of graph convolutional networks, if a pair of tensors is given, the latter element will not participate in computation.

• weight (mxnet.NDArray, optional) – Optional external weight tensor.

Returns

The output feature

Return type

mxnet.NDArray

Raises

DGLError – Raised if the input graph contains 0-in-degree nodes, since no message is passed to those nodes and their output is invalid. The check can be disabled by setting the allow_zero_in_degree parameter to True.

Note

• Input shape: $$(N, *, \text{in_feats})$$ where * means any number of additional dimensions, $$N$$ is the number of nodes.

• Output shape: $$(N, *, \text{out_feats})$$ where all but the last dimension are the same shape as the input.

• Weight shape: $$(\text{in_feats}, \text{out_feats})$$.

### RelGraphConv

class dgl.nn.mxnet.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=True, low_mem=False, dropout=0.0, layer_norm=False)

Bases: mxnet.gluon.block.Block

Relational graph convolution layer.

Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as below:

$h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})$

where $$\mathcal{N}^r(i)$$ is the neighbor set of node $$i$$ w.r.t. relation $$r$$. $$c_{i,r}$$ is the normalizer equal to $$|\mathcal{N}^r(i)|$$. $$\sigma$$ is an activation function. $$W_0$$ is the self-loop weight.

The basis regularization decomposes $$W_r$$ by:

$W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}$

where $$B$$ is the number of bases, $$V_b^{(l)}$$ are linearly combined with coefficients $$a_{rb}^{(l)}$$.

The block-diagonal-decomposition regularization decomposes $$W_r$$ into $$B$$ block-diagonal matrices. We refer to $$B$$ as the number of bases.

The block regularization decomposes $$W_r$$ by:

$W_r^{(l)} = \oplus_{b=1}^B Q_{rb}^{(l)}$

where $$B$$ is the number of bases, $$Q_{rb}^{(l)}$$ are block bases with shape $$\mathbb{R}^{(d^{(l+1)}/B)\times(d^{(l)}/B)}$$.
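The two decompositions can be illustrated with nested Python lists. This is a minimal sketch under simplifying assumptions: the helper names `basis_weight` and `block_diag_weight` are hypothetical, the matrices are tiny, and nothing here reflects how the layer stores its parameters.

```python
def basis_weight(coeffs, bases):
    """W_r = sum_b a_rb * V_b for one relation r.

    coeffs: list of B floats (a_r1 .. a_rB)
    bases:  list of B matrices (lists of rows), all of the same shape
    """
    rows, cols = len(bases[0]), len(bases[0][0])
    return [[sum(a * V[i][j] for a, V in zip(coeffs, bases))
             for j in range(cols)] for i in range(rows)]

def block_diag_weight(blocks):
    """W_r = direct sum of the blocks Q_r1 .. Q_rB (zeros off the diagonal)."""
    total_rows = sum(len(Q) for Q in blocks)
    total_cols = sum(len(Q[0]) for Q in blocks)
    W = [[0.0] * total_cols for _ in range(total_rows)]
    r0 = c0 = 0
    for Q in blocks:
        for i, row in enumerate(Q):
            for j, value in enumerate(row):
                W[r0 + i][c0 + j] = value
        r0 += len(Q)
        c0 += len(Q[0])
    return W

V = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]   # B = 2 shared bases
print(basis_weight([2.0, 3.0], V))            # [[2.0, 3.0], [3.0, 2.0]]
print(block_diag_weight([[[1.0]], [[2.0]]]))  # [[1.0, 0.0], [0.0, 2.0]]
```

With the basis form, relations share the matrices $$V_b$$ and differ only in their coefficients; with the block-diagonal form, each relation keeps its own small blocks and everything off the diagonal is zero.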

Parameters
• in_feat (int) – Input feature size; i.e., the number of dimensions of $$h_j^{(l)}$$.

• out_feat (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• num_rels (int) – Number of relations.

• regularizer (str) – Which weight regularizer to use: “basis” or “bdd”. “basis” is short for basis-decomposition. “bdd” is short for block-diagonal-decomposition.

• num_bases (int, optional) – Number of bases. If None, the number of relations is used. Default: None.

• bias (bool, optional) – True if bias is added. Default: True.

• activation (callable, optional) – Activation function. Default: None.

• self_loop (bool, optional) – True to include self loop message. Default: True.

• low_mem (bool, optional) – True to use a low-memory implementation of the relation message passing function. This option trades speed for lower memory consumption and slows down the forward and backward passes. Turn it on when you encounter out-of-memory (OOM) problems during training or evaluation. Default: False.

• dropout (float, optional) – Dropout rate. Default: 0.0

• layer_norm (bool, optional) – True to add layer normalization. Default: False

Examples

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import RelGraphConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = RelGraphConv(10, 2, 3, regularizer='basis', num_bases=2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> etype = mx.nd.array(np.array([0,1,2,0,1,2]).astype(np.int64))
>>> res = conv(g, feat, etype)
>>> res
[[ 0.561324    0.33745846]
[ 0.61585337  0.09992217]
[ 0.561324    0.33745846]
[-0.01557937  0.01227859]
[ 0.61585337  0.09992217]
[ 0.056508   -0.00307822]]
<NDArray 6x2 @cpu(0)>

forward(g, x, etypes, norm=None)

Forward computation

Parameters
• g (DGLGraph) – The graph.

• x (mx.ndarray.NDArray) –

Input node features. Could be either

• $$(|V|, D)$$ dense tensor

• $$(|V|,)$$ int64 vector, representing the categorical value of each node. The input is then treated as a one-hot encoded feature.

• etypes (mx.ndarray.NDArray) – Edge type tensor. Shape: $$(|E|,)$$

• norm (mx.ndarray.NDArray) – Optional edge normalizer tensor. Shape: $$(|E|, 1)$$.

Returns

New node features.

Return type

mx.ndarray.NDArray

### TAGConv

class dgl.nn.mxnet.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)

Bases: mxnet.gluon.block.Block

Topology Adaptive Graph Convolutional layer from paper Topology Adaptive Graph Convolutional Networks.

$H^{K} = {\sum}_{k=0}^K (D^{-1/2} A D^{-1/2})^{k} X {\Theta}_{k},$

where $$A$$ denotes the adjacency matrix, $$D_{ii} = \sum_{j=0} A_{ij}$$ its diagonal degree matrix, $${\Theta}_{k}$$ denotes the linear weights to sum the results of different hops together.
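The sum over hops can be written out for scalar node features. The following is a minimal pure-Python sketch of the formula, not the layer's implementation: `tag_conv_scalar` is a hypothetical helper, the adjacency matrix is dense, each $${\Theta}_{k}$$ is reduced to one scalar, and isolated nodes are given a zero normalizer as an assumption of the sketch.

```python
import math

def tag_conv_scalar(adj, x, thetas):
    """Sum_{k=0..K} (D^-1/2 A D^-1/2)^k x * theta_k for scalar node features.

    adj: dense adjacency matrix as a list of rows; thetas: K + 1 scalars.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Guard against isolated nodes (an assumption of this sketch)
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    norm_adj = [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
                for i in range(n)]
    hop = list(x)                      # (norm_adj)^0 x
    out = [thetas[0] * v for v in hop]
    for theta in thetas[1:]:
        # Advance one hop, then add this hop's weighted contribution
        hop = [sum(norm_adj[i][j] * hop[j] for j in range(n)) for i in range(n)]
        out = [o + theta * v for o, v in zip(out, hop)]
    return out

# Triangle graph: every node has degree 2
adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(tag_conv_scalar(adj, [1.0, 1.0, 1.0], [0.5, 0.25, 0.25]))
```

On the triangle with constant input, every hop leaves the features unchanged, so the result is approximately the sum of the three weights at each node.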

Parameters
• in_feats (int) – Input feature size; i.e., the number of dimensions of $$X$$.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$H^{K}$$.

• k (int, optional) – Number of hops $$K$$. Default: 2.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

lin

The learnable weight of the linear transformation.

Type

mxnet.gluon.parameter.Parameter

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import TAGConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = TAGConv(10, 2, k=2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[-0.86147034  0.10089529]
[-0.86147034  0.10089529]
[-0.86147034  0.10089529]
[-0.9707841   0.0360311 ]
[-0.6716844   0.02247889]
[ 0.32964635 -0.7669234 ]]
<NDArray 6x2 @cpu(0)>

forward(graph, feat)

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the input feature size and $$N$$ is the number of nodes.

Returns

The output feature of shape $$(N, D_{out})$$, where $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray

### GATConv

class dgl.nn.mxnet.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None, allow_zero_in_degree=False)

Bases: mxnet.gluon.block.Block

Apply Graph Attention Network over an input signal.

$h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} h_j^{(l)}$

where $$\alpha_{ij}$$ is the attention score between node $$i$$ and node $$j$$:

$\begin{aligned}\alpha_{ij}^{l} &= \mathrm{softmax_i} (e_{ij}^{l})\\ e_{ij}^{l} &= \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}$

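The attention computation for a single destination node can be sketched in plain Python. This is an illustration under simplifying assumptions, not the layer's code: `gat_node_update` is a hypothetical helper, there is one head, the projected features $$W h$$ are scalars, and the attention vector is split into two scalars `a_left` and `a_right`.

```python
import math

def leaky_relu(x, negative_slope=0.2):
    return x if x >= 0 else negative_slope * x

def attention_weights(scores):
    """Softmax over the unnormalized scores e_ij of one destination node i."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_node_update(wh_i, wh_neighbors, a_left, a_right):
    """One attention head for node i with scalar projected features.

    a^T [W h_i || W h_j] is written as a_left * wh_i + a_right * wh_j.
    Returns the updated feature of node i and the attention weights.
    """
    scores = [leaky_relu(a_left * wh_i + a_right * wh_j) for wh_j in wh_neighbors]
    alphas = attention_weights(scores)
    h_new = sum(alpha * wh_j for alpha, wh_j in zip(alphas, wh_neighbors))
    return h_new, alphas

h_new, alphas = gat_node_update(1.0, [1.0, 2.0, -1.0], a_left=0.5, a_right=1.0)
print(alphas, h_new)
```

The weights sum to one over the neighbors of the node, and the neighbor with the largest score receives the largest weight.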
Parameters
• in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of $$h_i^{(l)}$$. GATConv can be applied to homogeneous graphs and unidirectional bipartite graphs. If the layer is applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• num_heads (int) – Number of attention heads.

• feat_drop (float, optional) – Dropout rate on features. Default: 0.

• attn_drop (float, optional) – Dropout rate on attention weights. Default: 0.

• negative_slope (float, optional) – Negative slope of the LeakyReLU. Default: 0.2.

• residual (bool, optional) – If True, use a residual connection. Default: False.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError when it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common way to avoid this is to add a self-loop for each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop does not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.

Examples

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import GATConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # node 5 has no incoming edge; see the Note above
>>> feat = mx.nd.ones((6, 10))
>>> gatconv = GATConv(10, 2, num_heads=3)
>>> gatconv.initialize(ctx=mx.cpu(0))
>>> res = gatconv(g, feat)
>>> res
[[[ 0.32368395 -0.10501936]
[ 1.0839728   0.92690575]
[-0.54581136 -0.84279203]]
[[ 0.32368395 -0.10501936]
[ 1.0839728   0.92690575]
[-0.54581136 -0.84279203]]
[[ 0.32368395 -0.10501936]
[ 1.0839728   0.92690575]
[-0.54581136 -0.84279203]]
[[ 0.32368395 -0.10501937]
[ 1.0839728   0.9269058 ]
[-0.5458114  -0.8427921 ]]
[[ 0.32368395 -0.10501936]
[ 1.0839728   0.92690575]
[-0.54581136 -0.84279203]]
[[ 0.32368395 -0.10501936]
[ 1.0839728   0.92690575]
[-0.54581136 -0.84279203]]]
<NDArray 6x3x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_feat = mx.nd.random.randn(2, 5)
>>> v_feat = mx.nd.random.randn(4, 10)
>>> gatconv = GATConv((5,10), 2, 3)
>>> gatconv.initialize(ctx=mx.cpu(0))
>>> res = gatconv(g, (u_feat, v_feat))
>>> res
[[[-1.01624     1.8138596 ]
[ 1.2322129  -0.8410206 ]
[-1.9325689   1.3824553 ]]
[[ 0.9915016  -1.6564168 ]
[-0.32610354  0.42505783]
[ 1.5278397  -0.92114615]]
[[-0.32592064  0.62067866]
[ 0.6162219  -0.3405491 ]
[-1.356375    0.9988818 ]]
[[-1.01624     1.8138596 ]
[ 1.2322129  -0.8410206 ]
[-1.9325689   1.3824553 ]]]
<NDArray 4x3x2 @cpu(0)>

forward(graph, feat)

Compute graph attention network layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray or pair of mxnet.NDArray) – If a single mxnet.NDArray is given, it is the input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the input feature size and $$N$$ is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.

Returns

The output feature of shape $$(N, H, D_{out})$$, where $$H$$ is the number of heads and $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray

Raises

DGLError – Raised if the input graph contains 0-in-degree nodes, since no message is passed to those nodes and their output is invalid. The check can be disabled by setting the allow_zero_in_degree parameter to True.

### EdgeConv

class dgl.nn.mxnet.conv.EdgeConv(in_feat, out_feat, batch_norm=False, allow_zero_in_degree=False)

Bases: mxnet.gluon.block.Block

EdgeConv layer.

Introduced in “Dynamic Graph CNN for Learning on Point Clouds”. Can be described as follows:

$h_i^{(l+1)} = \max_{j \in \mathcal{N}(i)} \mathrm{ReLU}( \Theta \cdot (h_j^{(l)} - h_i^{(l)}) + \Phi \cdot h_i^{(l)})$

where $$\mathcal{N}(i)$$ is the set of neighbors of node $$i$$. $$\Theta$$ and $$\Phi$$ are linear layers.
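The per-edge message and the max reduction can be traced with scalar features. The sketch below is a plain-Python illustration of the formula only: `edge_conv_scalar` is a hypothetical helper, `theta` and `phi` are scalars standing in for the two linear layers, and nodes without incoming edges are left as `None` rather than given a value.

```python
def edge_conv_scalar(edges, h, theta, phi):
    """h_i' = max_j ReLU(theta * (h_j - h_i) + phi * h_i) for scalar features.

    edges: list of (j, i) pairs meaning a message flows from node j to node i.
    """
    out = [None] * len(h)
    for j, i in edges:
        message = max(0.0, theta * (h[j] - h[i]) + phi * h[i])   # ReLU
        out[i] = message if out[i] is None else max(out[i], message)
    return out

edges = [(0, 1), (2, 1), (1, 0)]
print(edge_conv_scalar(edges, [1.0, 2.0, 4.0], theta=1.0, phi=0.5))
# [1.5, 3.0, None]
```

Node 1 receives two messages (0.0 from node 0 and 3.0 from node 2) and keeps the larger one; node 2 receives none, which is the situation the allow_zero_in_degree check guards against.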

Parameters
• in_feat (int) – Input feature size; i.e., the number of dimensions of $$h_j^{(l)}$$.

• out_feat (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• batch_norm (bool) – Whether to include batch normalization on messages. Default: False.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError when it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common way to avoid this is to add a self-loop for each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop does not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.

Examples

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import EdgeConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # node 5 has no incoming edge; see the Note above
>>> feat = mx.nd.ones((6, 10))
>>> conv = EdgeConv(10, 2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[1.0517545 0.8091326]
[1.0517545 0.8091326]
[1.0517545 0.8091326]
[1.0517545 0.8091326]
[1.0517545 0.8091326]
[1.0517545 0.8091326]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 5)
>>> conv = EdgeConv(5, 2, batch_norm=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea))
>>> res
[[-3.4617817   0.84700686]
[ 1.3170856  -1.5731761 ]
[-2.0761423   0.56653017]
[-1.015364    0.78919804]]
<NDArray 4x2 @cpu(0)>

forward(g, h)

Forward computation

Parameters
• g (DGLGraph) – The graph.

• h (mxnet.NDArray or pair of mxnet.NDArray) –

Input node features of shape $$(N, D)$$, where $$N$$ is the number of nodes and $$D$$ is the number of feature dimensions.

If a pair of mxnet.NDArray is given, the graph must be a uni-bipartite graph with only one edge type, and the two tensors must have the same dimensionality on all except the first axis.

Returns

New node features.

Return type

mxnet.NDArray

Raises

DGLError – Raised if the input graph contains 0-in-degree nodes, since no message is passed to those nodes and their output is invalid. The check can be disabled by setting the allow_zero_in_degree parameter to True.

### SAGEConv

class dgl.nn.mxnet.conv.SAGEConv(in_feats, out_feats, aggregator_type='mean', feat_drop=0.0, bias=True, norm=None, activation=None)

Bases: mxnet.gluon.block.Block

GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.

$\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} &= \mathrm{aggregate} \left(\{h_{j}^{(l)}, \forall j \in \mathcal{N}(i) \}\right)\\ h_{i}^{(l+1)} &= \sigma \left(W \cdot \mathrm{concat} (h_{i}^{(l)}, h_{\mathcal{N}(i)}^{(l+1)}) \right)\\ h_{i}^{(l+1)} &= \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}$

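The first two steps of the update can be followed with scalar features and a mean aggregator. This is a minimal plain-Python sketch, not the layer's implementation: `sage_mean_scalar` is a hypothetical helper, the product of $$W$$ with the concatenation is split into two scalar weights, the activation and the optional norm are the identity, and an empty neighborhood aggregates to 0 as an assumption of the sketch.

```python
def sage_mean_scalar(edges, h, w_self, w_neigh):
    """GraphSAGE update with a mean aggregator for scalar features.

    W . concat(h_i, h_N(i)) is written as w_self * h_i + w_neigh * h_N(i).
    """
    incoming = [[] for _ in h]
    for src, dst in edges:
        incoming[dst].append(h[src])
    out = []
    for i, msgs in enumerate(incoming):
        # Mean of the neighbor features; 0.0 for an empty neighborhood (assumption)
        h_neigh = sum(msgs) / len(msgs) if msgs else 0.0
        out.append(w_self * h[i] + w_neigh * h_neigh)
    return out

edges = [(0, 2), (1, 2), (2, 0)]
print(sage_mean_scalar(edges, [1.0, 3.0, 5.0], w_self=1.0, w_neigh=2.0))
# [11.0, 3.0, 9.0]
```

Node 2 averages the features of nodes 0 and 1 before combining the result with its own feature, while node 1 has no incoming edge and keeps only its own contribution.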
Parameters
• in_feats (int, or pair of ints) –

Input feature size; i.e., the number of dimensions of $$h_i^{(l)}$$.

SAGEConv can be applied to homogeneous graphs and unidirectional bipartite graphs. If the layer is applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.

If the aggregator type is gcn, the feature sizes of the source and destination nodes must be the same.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• feat_drop (float) – Dropout rate on features, default: 0.

• aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).

• bias (bool) – If True, adds a learnable bias to the output. Default: True.

• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

Examples

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import SAGEConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = SAGEConv(10, 2, 'pool')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 0.32144994 -0.8729614 ]
[ 0.32144994 -0.8729614 ]
[ 0.32144994 -0.8729614 ]
[ 0.32144994 -0.8729614 ]
[ 0.32144994 -0.8729614 ]
[ 0.32144994 -0.8729614 ]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 10)
>>> conv = SAGEConv((5, 10), 2, 'pool')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea))
>>> res
[[-0.60524774  0.7196473 ]
[ 0.8832787  -0.5928619 ]
[-1.8245722   1.159798  ]
[-1.0509381   2.2239418 ]]
<NDArray 4x2 @cpu(0)>

forward(graph, feat)

Compute GraphSAGE layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray or pair of mxnet.NDArray) – If a single tensor is given, it represents the input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the input feature size and $$N$$ is the number of nodes. If a pair of tensors is given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.

Returns

The output feature of shape $$(N, D_{out})$$, where $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray

### SGConv

class dgl.nn.mxnet.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None, allow_zero_in_degree=False)

Bases: mxnet.gluon.block.Block

Simplifying Graph Convolution layer from paper Simplifying Graph Convolutional Networks.

$H^{K} = (\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2})^K X \Theta$

where $$\tilde{A}$$ is $$A$$ + $$I$$. Thus the graph input is expected to have self-loop edges added.
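The repeated propagation followed by a single linear map can be sketched with scalar features. The code below is a plain-Python illustration of the formula, not the layer's implementation: `sgc_propagate` and `sg_conv_scalar` are hypothetical helpers, the adjacency matrix is dense, $$\Theta$$ is a scalar, and the self-loops are added inside the helper to form $$\tilde{A}$$.

```python
import math

def sgc_propagate(adj, x, k):
    """Apply (D~^-1/2 A~ D~^-1/2)^k to scalar node features x, with A~ = A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    s = [[inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    for _ in range(k):
        x = [sum(s[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

def sg_conv_scalar(adj, x, k, theta):
    """Propagate k hops first, then apply the (scalar) weight once."""
    return [theta * v for v in sgc_propagate(adj, x, k)]

adj = [[0, 1], [1, 0]]   # two nodes joined by one undirected edge
print(sg_conv_scalar(adj, [1.0, 3.0], k=2, theta=2.0))
```

Because the propagation has no learnable parameters, its result depends only on the graph and the input features, which is what makes caching it possible when neither changes.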

Parameters
• in_feats (int) – Number of input features; i.e., the number of dimensions of $$X$$.

• out_feats (int) – Number of output features; i.e., the number of dimensions of $$H^{K}$$.

• k (int) – Number of hops $$K$$. Default: 1.

• cached (bool) –

If True, the module would cache

$(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}})^K X\Theta$

at the first forward call. This parameter should only be set to True in Transductive Learning setting.

• bias (bool) – If True, adds a learnable bias to the output. Default: True.

• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features. Default: None.

• allow_zero_in_degree (bool, optional) – If the graph contains 0-in-degree nodes, the output for those nodes is invalid because no message is passed to them. This can silently degrade model performance in some applications, so the module raises a DGLError when it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and leaves the handling to the user. Default: False.

Note

Zero in-degree nodes lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common way to avoid this is to add a self-loop for each node if the graph is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop does not work for some graphs, for example heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the conv layer.

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import SGConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # node 5 has no incoming edge; see the Note above
>>> feat = mx.nd.ones((6, 10))
>>> conv = SGConv(10, 2, k=2, cached=True)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 2.264404   -0.26684892]
[ 2.264404   -0.26684892]
[ 2.264404   -0.26684892]
[ 3.2273252  -0.3803246 ]
[ 2.247593   -0.2648679 ]
[ 2.2644043  -0.26684904]]
<NDArray 6x2 @cpu(0)>

forward(graph, feat)

Compute Simplifying Graph Convolution layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the input feature size and $$N$$ is the number of nodes.

Returns

The output feature of shape $$(N, D_{out})$$, where $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray

Raises

DGLError – Raised if the input graph contains 0-in-degree nodes, since no message is passed to those nodes and their output is invalid. The check can be disabled by setting the allow_zero_in_degree parameter to True.

Note

If cached is set to True, feat and graph should not change during training, or you will get wrong results.

### APPNPConv

class dgl.nn.mxnet.conv.APPNPConv(k, alpha, edge_drop=0.0)

Bases: mxnet.gluon.block.Block

Approximate Personalized Propagation of Neural Predictions layer from paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.

$\begin{aligned}H^{0} &= X\\ H^{l+1} &= (1-\alpha)\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{l}\right) + \alpha H^{0}\end{aligned}$

where $$\tilde{A}$$ is $$A$$ + $$I$$.
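The iteration can be traced in plain Python with scalar features. The sketch below is an illustration, not the layer's code: `appnp_scalar` is a hypothetical helper, and its normalization details (scaling by in-degrees on both sides, treating an in-degree of 0 as 1, and adding no self-loops beyond the edges passed in) are assumptions. They were chosen because, on the graph from the Example for this layer with `k=3` and `alpha=0.5`, they give values that agree with the printed output to about six decimal places.

```python
import math

def appnp_scalar(src, dst, num_nodes, x, k, alpha):
    """k steps of H <- (1 - alpha) * A_hat H + alpha * H0 for scalar features.

    Assumptions of this sketch: A_hat scales each message by
    1 / sqrt(in_deg(src) * in_deg(dst)), in-degrees of 0 are treated as 1,
    and only the edges passed in are used.
    """
    in_deg = [0] * num_nodes
    for v in dst:
        in_deg[v] += 1
    norm = [1.0 / math.sqrt(max(d, 1)) for d in in_deg]
    h = list(x)
    for _ in range(k):
        agg = [0.0] * num_nodes
        for u, v in zip(src, dst):
            agg[v] += h[u] * norm[u]          # message scaled on the source side
        # Scale on the destination side, then mix with the initial features H0
        h = [(1 - alpha) * agg[i] * norm[i] + alpha * x[i] for i in range(num_nodes)]
    return h

src, dst = [0, 1, 2, 3, 2, 5], [1, 2, 3, 4, 0, 3]
print(appnp_scalar(src, dst, 6, [1.0] * 6, k=3, alpha=0.5))
```

Node 5 has no incoming edge, so only the teleport term $$\alpha H^{0}$$ contributes and its value settles at 0.5.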

Parameters
• k (int) – The number of iterations $$K$$.

• alpha (float) – The teleport probability $$\alpha$$.

• edge_drop (float, optional) – The dropout rate on edges that controls the messages received by each node. Default: 0.

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import APPNPConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = APPNPConv(k=3, alpha=0.5)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[1.         1.         1.         1.         1.         1.
1.         1.         1.         1.        ]
[1.         1.         1.         1.         1.         1.
1.         1.         1.         1.        ]
[1.         1.         1.         1.         1.         1.
1.         1.         1.         1.        ]
[1.0303301  1.0303301  1.0303301  1.0303301  1.0303301  1.0303301
1.0303301  1.0303301  1.0303301  1.0303301 ]
[0.86427665 0.86427665 0.86427665 0.86427665 0.86427665 0.86427665
0.86427665 0.86427665 0.86427665 0.86427665]
[0.5        0.5        0.5        0.5        0.5        0.5
0.5        0.5        0.5        0.5       ]]
<NDArray 6x10 @cpu(0)>

forward(graph, feat)

Compute APPNP layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mx.NDArray) – The input feature of shape $$(N, *)$$. $$N$$ is the number of nodes, and $$*$$ could be of any shape.

Returns

The output feature of shape $$(N, *)$$ where $$*$$ should be the same as input shape.

Return type

mx.NDArray

### GINConv

class dgl.nn.mxnet.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)

Bases: mxnet.gluon.block.Block

Graph Isomorphism Network layer from paper How Powerful are Graph Neural Networks?.

$h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)$
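The update can be followed with scalar features. This is a minimal plain-Python sketch of the formula rather than the layer's implementation: `gin_scalar` is a hypothetical helper, and treating the aggregate of an empty neighborhood as 0 is an assumption of the sketch.

```python
def gin_scalar(edges, h, eps=0.0, aggregator="sum", apply_func=None):
    """(1 + eps) * h_i + aggregate({h_j}), followed by an optional apply_func.

    Scalar features; nodes with no incoming edge aggregate to 0 (assumption).
    """
    incoming = [[] for _ in h]
    for src, dst in edges:
        incoming[dst].append(h[src])
    reducers = {"sum": sum, "max": max, "mean": lambda m: sum(m) / len(m)}
    reduce_fn = reducers[aggregator]
    out = []
    for i, msgs in enumerate(incoming):
        agg = reduce_fn(msgs) if msgs else 0.0
        value = (1 + eps) * h[i] + agg
        out.append(apply_func(value) if apply_func is not None else value)
    return out

# Same edge lists as the graph used in the examples on this page
edges = list(zip([0, 1, 2, 3, 2, 5], [1, 2, 3, 4, 0, 3]))
print(gin_scalar(edges, [1.0] * 6, aggregator="max"))
# [2.0, 2.0, 2.0, 2.0, 2.0, 1.0]
```

The last node has no incoming edge and ends up with half the value of the others before `apply_func` is applied, the same ratio that is visible between the last row and the other rows of the Example output for this layer.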
Parameters
• apply_func (callable activation function/layer or None) – If not None, apply this function to the updated node feature, the $$f_\Theta$$ in the formula.

• aggregator_type (str) – Aggregator type to use (sum, max or mean).

• init_eps (float, optional) – Initial $$\epsilon$$ value, default: 0.

• learn_eps (bool, optional) – If True, $$\epsilon$$ will be a learnable parameter. Default: False.

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import GINConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> lin = gluon.nn.Dense(10)
>>> lin.initialize(ctx=mx.cpu(0))
>>> conv = GINConv(lin, 'max')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 0.44832918 -0.05283341  0.20823681  0.16020004  0.37311912 -0.03372726
-0.05716725 -0.20730163  0.14121324  0.46083626]
[ 0.44832918 -0.05283341  0.20823681  0.16020004  0.37311912 -0.03372726
-0.05716725 -0.20730163  0.14121324  0.46083626]
[ 0.44832918 -0.05283341  0.20823681  0.16020004  0.37311912 -0.03372726
-0.05716725 -0.20730163  0.14121324  0.46083626]
[ 0.44832918 -0.05283341  0.20823681  0.16020004  0.37311912 -0.03372726
-0.05716725 -0.20730163  0.14121324  0.46083626]
[ 0.44832918 -0.05283341  0.20823681  0.16020004  0.37311912 -0.03372726
-0.05716725 -0.20730163  0.14121324  0.46083626]
[ 0.22416459 -0.0264167   0.10411841  0.08010002  0.18655956 -0.01686363
-0.02858362 -0.10365082  0.07060662  0.23041813]]
<NDArray 6x10 @cpu(0)>

forward(graph, feat)

Compute Graph Isomorphism Network layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray or a pair of mxnet.NDArray) – If a single mxnet.NDArray is given, it is the input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the input feature size and $$N$$ is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape $$(N_{in}, D_{in})$$ and $$(N_{out}, D_{in})$$. If apply_func is not None, $$D_{in}$$ should fit the input dimensionality requirement of apply_func.

Returns

The output feature of shape $$(N, D_{out})$$, where $$D_{out}$$ is the output dimensionality of apply_func. If apply_func is None, $$D_{out}$$ is the same as the input dimensionality.

Return type

mxnet.NDArray

### GatedGraphConv

class dgl.nn.mxnet.conv.GatedGraphConv(in_feats, out_feats, n_steps, n_etypes, bias=True)

Bases: mxnet.gluon.block.Block

Gated Graph Convolution layer from paper Gated Graph Sequence Neural Networks.

$\begin{aligned}h_{i}^{0} &= [ x_i \| \mathbf{0} ]\\ a_{i}^{t} &= \sum_{j\in\mathcal{N}(i)} W_{e_{ij}} h_{j}^{t}\\ h_{i}^{t+1} &= \mathrm{GRU}(a_{i}^{t}, h_{i}^{t})\end{aligned}$

Parameters
• in_feats (int) – Input feature size; i.e., the number of dimensions of $$x_i$$.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(t+1)}$$.

• n_steps (int) – Number of recurrent steps; i.e., the $$t$$ in the above formula.

• n_etypes (int) – Number of edge types.

• bias (bool) – If True, adds a learnable bias to the output. Default: True. Can only be set to True in MXNet.

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import GatedGraphConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = GatedGraphConv(10, 10, 2, 3)
>>> conv.initialize(ctx=mx.cpu(0))
>>> etype = mx.nd.array([0,1,2,0,1,2])
>>> res = conv(g, feat, etype)
>>> res
[[0.24378185 0.17402579 0.2644723  0.2740628  0.14041871 0.32523093
0.2703067  0.18234392 0.32777587 0.30957845]
[0.17872348 0.28878236 0.2509409  0.20139427 0.3355541  0.22643831
0.2690711  0.22341749 0.27995753 0.21575949]
[0.23911178 0.16696918 0.26120248 0.27397877 0.13745922 0.3223175
0.27561218 0.18071817 0.3251124  0.30608907]
[0.25242943 0.3098581  0.25249368 0.27968448 0.24624602 0.12270881
0.335147   0.31550157 0.19065917 0.21087633]
[0.17503153 0.29523152 0.2474858  0.20848347 0.3526433  0.23443702
0.24741334 0.21986549 0.28935105 0.21859099]
[0.2159364  0.26942077 0.23083271 0.28329757 0.24758333 0.24230732
0.23958017 0.23430146 0.26431587 0.27001363]]
<NDArray 6x10 @cpu(0)>

forward(graph, feat, etypes)[source]

Compute Gated Graph Convolution layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$ where $$N$$ is the number of nodes of the graph and $$D_{in}$$ is the input feature size.

• etypes (mxnet.NDArray) – The edge type tensor of shape $$(E,)$$ where $$E$$ is the number of edges of the graph.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray
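
The first two equations above (zero-padding the input and aggregating with one weight matrix per edge type) can be illustrated with a small pure-Python sketch. It does not use DGL or MXNet; the helper names `pad_features`, `matvec`, and `aggregate_by_etype` and the toy weights are hypothetical, and the GRU update is left out.

```python
def pad_features(x, out_feats):
    """h_i^0 = [x_i || 0]: right-pad each node feature with zeros."""
    return [row + [0.0] * (out_feats - len(row)) for row in x]


def matvec(w, h):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w_rc * h_c for w_rc, h_c in zip(w_row, h)) for w_row in w]


def aggregate_by_etype(src, dst, etypes, h, weights):
    """a_i = sum over incoming edges (j -> i) of W_{e_ij} h_j."""
    a = [[0.0] * len(h[0]) for _ in h]
    for j, i, e in zip(src, dst, etypes):
        msg = matvec(weights[e], h[j])
        a[i] = [a_c + m_c for a_c, m_c in zip(a[i], msg)]
    return a


# Toy graph: edges 0 -> 2 (type 0) and 1 -> 2 (type 1).
src, dst, etypes = [0, 1], [2, 2], [0, 1]
h0 = pad_features([[1.0], [2.0], [3.0]], out_feats=2)
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # type 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # type 1: scale by 2
]
a = aggregate_by_etype(src, dst, etypes, h0, weights)
```

Only node 2 has incoming edges here, so it is the only node with a non-zero aggregated message.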

### GMMConv¶

class dgl.nn.mxnet.conv.GMMConv(in_feats, out_feats, dim, n_kernels, aggregator_type='sum', residual=False, bias=True, allow_zero_in_degree=False)[source]

Bases: mxnet.gluon.block.Block

The Gaussian Mixture Model Convolution layer from Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs.

\begin{align}\begin{aligned}u_{ij} &= f(x_i, x_j), x_j \in \mathcal{N}(i)\\w_k(u) &= \exp\left(-\frac{1}{2}(u-\mu_k)^T \Sigma_k^{-1} (u - \mu_k)\right)\\h_i^{l+1} &= \mathrm{aggregate}\left(\left\{\frac{1}{K} \sum_{k}^{K} w_k(u_{ij}), \forall j\in \mathcal{N}(i)\right\}\right)\end{aligned}\end{align}

where $$u$$ denotes the pseudo-coordinates between a vertex and one of its neighbors, computed using function $$f$$, and $$\Sigma_k^{-1}$$ and $$\mu_k$$ are learnable parameters representing the inverse covariance matrix and mean vector of a Gaussian kernel.

Parameters
• in_feats (int) – Number of input features; i.e., the number of dimensions of $$x_i$$.

• out_feats (int) – Number of output features; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• dim (int) – Dimensionality of pseudo-coordinate; i.e., the number of dimensions of $$u_{ij}$$.

• n_kernels (int) – Number of kernels $$K$$.

• aggregator_type (str) – Aggregator type (sum, mean, max). Default: sum.

• residual (bool) – If True, use residual connection inside this layer. Default: False.

• bias (bool) – If True, adds a learnable bias to the output. Default: True.

• allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, output for those nodes will be invalid since no message will be passed to those nodes. This is harmful for some applications causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in input graph. By setting True, it will suppress the check and let the users handle it by themselves. Default: False.

Note

Zero in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the convolution.

Examples

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import GMMConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # adds 6 self-loops, giving the 12 edges that pseudo expects
>>> feat = mx.nd.ones((6, 10))
>>> conv = GMMConv(10, 2, 3, 2, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> pseudo = mx.nd.ones((12, 3))
>>> res = conv(g, feat, pseudo)
>>> res
[[-0.05083769 -0.1567954 ]
[-0.05083769 -0.1567954 ]
[-0.05083769 -0.1567954 ]
[-0.05083769 -0.1567954 ]
[-0.05083769 -0.1567954 ]
[-0.05083769 -0.1567954 ]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_fea = mx.nd.random.randn(2, 5)
>>> v_fea = mx.nd.random.randn(4, 10)
>>> pseudo = mx.nd.ones((5, 3))
>>> conv = GMMConv((5, 10), 2, 3, 2, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, (u_fea, v_fea), pseudo)
>>> res
[[-0.1005067  -0.09494358]
[-0.0023314  -0.07597432]
[-0.05141905 -0.08545895]
[-0.1005067  -0.09494358]]
<NDArray 4x2 @cpu(0)>

forward(graph, feat, pseudo)[source]

Compute Gaussian Mixture Model Convolution layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – If a single tensor is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is size of input feature, $$N$$ is the number of nodes. If a pair of tensors are given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.

• pseudo (mxnet.NDArray) – The pseudo coordinate tensor of shape $$(E, D_{u})$$ where $$E$$ is the number of edges of the graph and $$D_{u}$$ is the dimensionality of pseudo coordinate.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray

Raises

DGLError – If there are 0-in-degree nodes in the input graph, it will raise DGLError since no message will be passed to those nodes. This will cause invalid output. The error can be ignored by setting allow_zero_in_degree parameter to True.
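
The kernel weight $$w_k(u)$$ in the formula above can be evaluated by hand for intuition. The sketch below is plain Python and assumes a diagonal $$\Sigma_k^{-1}$$, passed as a vector of inverse variances; that restriction and the function name `gaussian_kernel_weight` are assumptions made for illustration, not a statement about the layer's internals.

```python
import math


def gaussian_kernel_weight(u, mu, inv_sigma_diag):
    """w_k(u) = exp(-1/2 (u - mu)^T Sigma^{-1} (u - mu)) for diagonal Sigma^{-1}."""
    quad = sum(s * (u_d - m_d) ** 2 for u_d, m_d, s in zip(u, mu, inv_sigma_diag))
    return math.exp(-0.5 * quad)


# A pseudo-coordinate equal to the kernel mean gives the maximum weight, 1.
w_at_mean = gaussian_kernel_weight([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
# One unit away along a single axis gives exp(-0.5).
w_off_mean = gaussian_kernel_weight([2.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
```

The weight decays as the pseudo-coordinate moves away from the kernel mean, so each kernel responds to one region of the pseudo-coordinate space.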

### ChebConv¶

class dgl.nn.mxnet.conv.ChebConv(in_feats, out_feats, k, bias=True)[source]

Bases: mxnet.gluon.block.Block

Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.

\begin{align}\begin{aligned}h_i^{l+1} &= \sum_{k=0}^{K-1} W^{k, l}z_i^{k, l}\\Z^{0, l} &= H^{l}\\Z^{1, l} &= \tilde{L} \cdot H^{l}\\Z^{k, l} &= 2 \cdot \tilde{L} \cdot Z^{k-1, l} - Z^{k-2, l}\\\tilde{L} &= 2\left(I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\right)/\lambda_{max} - I\end{aligned}\end{align}

where $$\tilde{A}$$ is $$A$$ + $$I$$, $$W$$ is learnable weight.

Parameters
• in_feats (int) – Dimension of input features; i.e, the number of dimensions of $$h_i^{(l)}$$.

• out_feats (int) – Dimension of output features $$h_i^{(l+1)}$$.

• k (int) – Chebyshev filter size $$K$$.

• activation (function, optional) – Activation function. Default ReLu.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import ChebConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = mx.nd.ones((6, 10))
>>> conv = ChebConv(10, 2, 2)
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[ 0.832592   -0.738757  ]
[ 0.832592   -0.738757  ]
[ 0.832592   -0.738757  ]
[ 0.43377423 -1.0455742 ]
[ 1.1145986  -0.5218046 ]
[ 1.7954229   0.00196505]]
<NDArray 6x2 @cpu(0)>

forward(graph, feat, lambda_max=None)[source]

Compute ChebNet layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is size of input feature, $$N$$ is the number of nodes.

• lambda_max (list or tensor or None, optional) – A list (or tensor) of length $$B$$ that stores the largest eigenvalue of the normalized Laplacian of each individual graph in graph, where $$B$$ is the batch size of the input graph. Default: None. If None, this method computes the list by calling dgl.laplacian_lambda_max.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is size of output feature.

Return type

mxnet.NDArray
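
The recurrence for $$Z^{k, l}$$ given in the ChebConv formula can be traced with a short pure-Python sketch. It assumes a rescaled Laplacian $$\tilde{L}$$ is already available as a list of rows; `matmul` and `chebyshev_basis` are hypothetical helper names, and the learnable weights $$W^{k, l}$$ are omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(a_row, b_col))
             for b_col in zip(*b)] for a_row in a]


def chebyshev_basis(l_tilde, h, k):
    """Return [Z^0, ..., Z^{k-1}] using Z^k = 2 L~ Z^{k-1} - Z^{k-2}."""
    z = [h]
    if k > 1:
        z.append(matmul(l_tilde, h))
    for _ in range(2, k):
        lz = matmul(l_tilde, z[-1])
        z.append([[2.0 * lz_rc - z2_rc for lz_rc, z2_rc in zip(lz_row, z2_row)]
                  for lz_row, z2_row in zip(lz, z[-2])])
    return z


# With a diagonal L~ the recurrence reduces to scalar Chebyshev polynomials:
# T_0(x) = 1, T_1(x) = x, T_2(x) = 2x^2 - 1.
l_tilde = [[0.5, 0.0], [0.0, -1.0]]
h = [[1.0], [1.0]]
z0, z1, z2 = chebyshev_basis(l_tilde, h, 3)
```

Each additional term needs one more multiplication by $$\tilde{L}$$, so a filter of size $$K$$ mixes information from nodes up to $$K-1$$ hops away.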

### AGNNConv¶

class dgl.nn.mxnet.conv.AGNNConv(init_beta=1.0, learn_beta=True, allow_zero_in_degree=False)[source]

Bases: mxnet.gluon.block.Block

Attention-based Graph Neural Network layer from paper Attention-based Graph Neural Network for Semi-Supervised Learning.

$H^{l+1} = P H^{l}$

where $$P$$ is computed as:

$P_{ij} = \mathrm{softmax}_i ( \beta \cdot \cos(h_i^l, h_j^l))$

where $$\beta$$ is a single scalar parameter.

Parameters
• init_beta (float, optional) – The $$\beta$$ in the formula, a single scalar parameter.

• learn_beta (bool, optional) – If True, $$\beta$$ will be learnable parameter.

• allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, output for those nodes will be invalid since no message will be passed to those nodes. This is harmful for some applications causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in input graph. By setting True, it will suppress the check and let the users handle it by themselves. Default: False.

Note

Zero in-degree nodes will lead to invalid output values. Because no message is passed to those nodes, the aggregation function is applied to empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:

>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)

Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the convolution.

Example

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from dgl.nn import AGNNConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # node 5 would otherwise have zero in-degree
>>> feat = mx.nd.ones((6, 10))
>>> conv = AGNNConv()
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat)
>>> res
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
<NDArray 6x10 @cpu(0)>

forward(graph, feat)[source]

Compute AGNN layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature of shape $$(N, *)$$ where $$N$$ is the number of nodes and $$*$$ could be of any shape. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape $$(N_{in}, *)$$ and $$(N_{out}, *)$$; the $$*$$ in the latter tensor must equal that of the former.

Returns

The output feature of shape $$(N, *)$$ where $$*$$ should be the same as input shape.

Return type

mxnet.NDArray

Raises

DGLError – If there are 0-in-degree nodes in the input graph, it will raise DGLError since no message will be passed to those nodes. This will cause invalid output. The error can be ignored by setting allow_zero_in_degree parameter to True.
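
To make the propagation matrix $$P$$ concrete, here is a minimal pure-Python sketch of one AGNN step on an edge list. It is independent of DGL and MXNet; `cosine` and `agnn_propagate` are hypothetical names, and the softmax is taken over the incoming edges of each node.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def agnn_propagate(src, dst, h, beta=1.0):
    """H' = P H, with P_ij a softmax over the incoming edges (j -> i) of node i."""
    out = []
    for i in range(len(h)):
        neighbors = [j for j, d in zip(src, dst) if d == i]
        scores = [math.exp(beta * cosine(h[i], h[j])) for j in neighbors]
        total = sum(scores)
        row = [0.0] * len(h[i])
        for j, s in zip(neighbors, scores):
            row = [r + (s / total) * h_jc for r, h_jc in zip(row, h[j])]
        out.append(row)
    return out


# Two nodes with self-loops and an edge in each direction.
src, dst = [0, 1, 0, 1], [0, 1, 1, 0]
h = [[1.0, 0.0], [0.0, 1.0]]
out = agnn_propagate(src, dst, h)
```

Because each row of $$P$$ sums to one, identical input features are left unchanged, which is why the all-ones input in the example above returns all ones.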

### NNConv¶

class dgl.nn.mxnet.conv.NNConv(in_feats, out_feats, edge_func, aggregator_type, residual=False, bias=True)[source]

Bases: mxnet.gluon.block.Block

Graph Convolution layer introduced in Neural Message Passing for Quantum Chemistry.

$h_{i}^{l+1} = h_{i}^{l} + \mathrm{aggregate}\left(\left\{ f_\Theta (e_{ij}) \cdot h_j^{l}, j\in \mathcal{N}(i) \right\}\right)$

where $$e_{ij}$$ is the edge feature, $$f_\Theta$$ is a function with learnable parameters.

Parameters
• in_feats (int) – Input feature size; i.e., the number of dimensions of $$h_j^{(l)}$$. NNConv can be applied on homogeneous graphs and unidirectional bipartite graphs. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• edge_func (callable activation function/layer) – Maps each edge feature to a vector of shape (in_feats * out_feats), which is used as the weight to compute messages. This is the $$f_\Theta$$ in the formula.

• aggregator_type (str) – Aggregator type to use (sum, mean or max).

• residual (bool, optional) – If True, use residual connection. Default: False.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

Examples

>>> import dgl
>>> import numpy as np
>>> import mxnet as mx
>>> from mxnet import gluon
>>> from dgl.nn import NNConv
>>>
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)  # 6 + 6 = 12 edges, matching efeat below
>>> feat = mx.nd.ones((6, 10))
>>> lin = gluon.nn.Dense(20)
>>> lin.initialize(ctx=mx.cpu(0))
>>> def edge_func(efeat):
...     return lin(efeat)
>>> efeat = mx.nd.ones((12, 5))
>>> conv = NNConv(10, 2, edge_func, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> res = conv(g, feat, efeat)
>>> res
[[0.39946803 0.32098457]
[0.39946803 0.32098457]
[0.39946803 0.32098457]
[0.39946803 0.32098457]
[0.39946803 0.32098457]
[0.39946803 0.32098457]]
<NDArray 6x2 @cpu(0)>

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_feat = mx.nd.random.randn(2, 10)
>>> v_feat = mx.nd.random.randn(4, 10)
>>> conv = NNConv(10, 2, edge_func, 'mean')
>>> conv.initialize(ctx=mx.cpu(0))
>>> efeat = mx.nd.ones((5, 5))
>>> res = conv(g, (u_feat, v_feat), efeat)
>>> res
[[ 0.24425688  0.3238042 ]
[-0.11651017 -0.01738572]
[ 0.06387337  0.15320925]
[ 0.24425688  0.3238042 ]]
<NDArray 4x2 @cpu(0)>

forward(graph, feat, efeat)[source]

Compute MPNN Graph Convolution layer.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray or pair of mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$ where $$N$$ is the number of nodes of the graph and $$D_{in}$$ is the input feature size.

• efeat (mxnet.NDArray) – The edge feature of shape $$(E, *)$$, where $$E$$ is the number of edges of the graph; it should fit the input shape requirement of edge_func.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output feature size.

Return type

mxnet.NDArray
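
The role of edge_func in the formula can be shown with a small pure-Python sketch: each edge feature is mapped to a flat vector of length in_feats * out_feats, reshaped to a matrix, and multiplied with the source node feature. The row-major reshape, the helper `nnconv_messages`, and `toy_edge_func` are assumptions made for illustration; aggregation, residual, and bias are omitted.

```python
def nnconv_messages(src, h, efeat, edge_func, in_feats, out_feats):
    """Message on edge j -> i: h_j times the (in_feats x out_feats) matrix f(e_ij)."""
    messages = []
    for j, e in zip(src, efeat):
        flat = edge_func(e)  # length in_feats * out_feats
        w = [flat[r * out_feats:(r + 1) * out_feats] for r in range(in_feats)]
        messages.append([sum(h[j][r] * w[r][c] for r in range(in_feats))
                         for c in range(out_feats)])
    return messages


def toy_edge_func(e):
    """Hypothetical f_Theta: scales a fixed 2x3 weight by the scalar edge feature."""
    base = [1.0, 0.0, 1.0,
            0.0, 1.0, 1.0]
    return [e[0] * b for b in base]


src = [0, 1]
h = [[1.0, 2.0], [3.0, 4.0]]
efeat = [[1.0], [2.0]]
msgs = nnconv_messages(src, h, efeat, toy_edge_func, in_feats=2, out_feats=3)
```

This is why the `Dense(20)` layer in the example above pairs with `NNConv(10, 2, ...)`: 10 input features times 2 output features gives 20 weights per edge.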

## Dense Conv Layers¶

### DenseGraphConv¶

class dgl.nn.mxnet.conv.DenseGraphConv(in_feats, out_feats, norm='both', bias=True, activation=None)[source]

Bases: mxnet.gluon.block.Block

Graph Convolutional Network layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying graph convolution on dense graphs.

Parameters
• in_feats (int) – Input feature size; i.e, the number of dimensions of $$h_j^{(l)}$$.

• out_feats (int) – Output feature size; i.e., the number of dimensions of $$h_i^{(l+1)}$$.

• norm (str, optional) – How to apply the normalizer. If is ‘right’, divide the aggregated messages by each node’s in-degrees, which is equivalent to averaging the received messages. If is ‘none’, no normalization is applied. Default is ‘both’, where the $$c_{ij}$$ in the paper is applied.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

Notes

Zero in-degree nodes will lead to all-zero output. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by setting the diagonal of the adjacency matrix to be 1.

See also

GraphConv

forward(adj, feat)[source]

Compute (Dense) Graph Convolution layer.

Parameters
• adj (mxnet.NDArray) – The adjacency matrix of the graph to apply Graph Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape $$(N_{out}, N_{in})$$; when applied to a homogeneous graph, adj should be of shape $$(N, N)$$. In both cases, a row represents a destination node while a column represents a source node.

• feat (mxnet.NDArray) – The input feature.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is size of output feature.

Return type

mxnet.NDArray
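
As a rough illustration of the adjacency convention (rows are destinations, columns are sources) and of norm='right', the following pure-Python sketch averages the features each node receives. The function name `dense_right_norm_aggregate` is hypothetical, and the weight matrix, bias, and activation are left out.

```python
def dense_right_norm_aggregate(adj, feat):
    """Row i of adj lists the sources of node i; divide the sum by the in-degree."""
    out = []
    for adj_row in adj:
        in_degree = sum(adj_row)
        summed = [sum(a * f_row[c] for a, f_row in zip(adj_row, feat))
                  for c in range(len(feat[0]))]
        # Nodes without incoming edges keep an all-zero output.
        out.append([s / in_degree if in_degree else 0.0 for s in summed])
    return out


adj = [
    [1.0, 1.0, 0.0],  # node 0 receives from nodes 0 and 1
    [0.0, 1.0, 0.0],  # node 1 receives from itself only
    [0.0, 0.0, 0.0],  # node 2 has zero in-degree
]
feat = [[2.0, 4.0], [6.0, 8.0], [1.0, 1.0]]
out = dense_right_norm_aggregate(adj, feat)
```

Node 2 ends up with an all-zero row, which is the behaviour the note above warns about; setting its diagonal entry to 1 would make it receive its own feature instead.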

### DenseSAGEConv¶

class dgl.nn.mxnet.conv.DenseSAGEConv(in_feats, out_feats, feat_drop=0.0, bias=True, norm=None, activation=None)[source]

Bases: mxnet.gluon.block.Block

GraphSAGE layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying GraphSAGE on dense graphs.

Note that we only support gcn aggregator in DenseSAGEConv.

Parameters
• in_feats (int) – Input feature size; i.e, the number of dimensions of $$h_i^{(l)}$$.

• out_feats (int) – Output feature size; i.e, the number of dimensions of $$h_i^{(l+1)}$$.

• feat_drop (float, optional) – Dropout rate on features. Default: 0.

• bias (bool) – If True, adds a learnable bias to the output. Default: True.

• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.

• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

See also

SAGEConv

forward(adj, feat)[source]

Compute (Dense) Graph SAGE layer.

Parameters
• adj (mxnet.NDArray) – The adjacency matrix of the graph to apply SAGE Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape $$(N_{out}, N_{in})$$; when applied to a homogeneous graph, adj should be of shape $$(N, N)$$. In both cases, a row represents a destination node while a column represents a source node.

• feat (mxnet.NDArray or a pair of mxnet.NDArray) – If a mxnet.NDArray is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is size of input feature, $$N$$ is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape $$(N_{in}, D_{in})$$ and $$(N_{out}, D_{in})$$.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is size of output feature.

Return type

mxnet.NDArray

### DenseChebConv¶

class dgl.nn.mxnet.conv.DenseChebConv(in_feats, out_feats, k, bias=True)[source]

Bases: mxnet.gluon.block.Block

Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.

We recommend using this module when applying ChebConv on dense graphs.

Parameters
• in_feats (int) – Dimension of input features $$h_i^{(l)}$$.

• out_feats (int) – Dimension of output features $$h_i^{(l+1)}$$.

• k (int) – Chebyshev filter size.

• activation (function, optional) – Activation function, default is ReLu.

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

See also

ChebConv

forward(adj, feat, lambda_max=None)[source]

Compute (Dense) Chebyshev Spectral Graph Convolution layer.

Parameters
• adj (mxnet.NDArray) – The adjacency matrix of the graph to apply Graph Convolution on, should be of shape $$(N, N)$$, where a row represents the destination and a column represents the source.

• feat (mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is size of input feature, $$N$$ is the number of nodes.

• lambda_max (float or None, optional) – The largest eigenvalue of the given graph. Default: None.

Returns

The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is size of output feature.

Return type

mxnet.NDArray

## Global Pooling Layers¶

MXNet modules for graph global pooling.

### SumPooling¶

class dgl.nn.mxnet.glob.SumPooling[source]

Bases: mxnet.gluon.block.Block

Apply sum pooling over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k$
forward(graph, feat)[source]

Compute sum pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

mxnet.NDArray

### AvgPooling¶

class dgl.nn.mxnet.glob.AvgPooling[source]

Bases: mxnet.gluon.block.Block

Apply average pooling over the nodes in the graph.

$r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k$
forward(graph, feat)[source]

Compute average pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

mxnet.NDArray

### MaxPooling¶

class dgl.nn.mxnet.glob.MaxPooling[source]

Bases: mxnet.gluon.block.Block

Apply max pooling over the nodes in the graph.

$r^{(i)} = \max_{k=1}^{N_i} \left( x^{(i)}_k \right)$
forward(graph, feat)[source]

Compute max pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, *)$$, where $$B$$ refers to the batch size.

Return type

mxnet.NDArray
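
The three readouts above (sum, average, and max) differ only in the reduction applied to each graph's block of node features. A minimal pure-Python sketch, assuming the batch is described by a list of per-graph node counts (the names `segment_pool`, `mean`, and `batch_num_nodes` are chosen for illustration):

```python
def segment_pool(feat, batch_num_nodes, reduce_fn):
    """Apply reduce_fn column-wise to the node features of each graph in a batch."""
    out, start = [], 0
    for n in batch_num_nodes:
        rows = feat[start:start + n]
        out.append([reduce_fn(col) for col in zip(*rows)])
        start += n
    return out


def mean(values):
    return sum(values) / len(values)


# A batch of two graphs with 2 and 3 nodes; node features have D = 2.
feat = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0], [2.0, 5.0], [4.0, 0.0]]
batch_num_nodes = [2, 3]
sum_out = segment_pool(feat, batch_num_nodes, sum)
avg_out = segment_pool(feat, batch_num_nodes, mean)
max_out = segment_pool(feat, batch_num_nodes, max)
```

Each result has one row per graph, matching the $$(B, *)$$ output shape documented above.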

### SortPooling¶

class dgl.nn.mxnet.glob.SortPooling(k)[source]

Bases: mxnet.gluon.block.Block

Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in the graph.

Parameters

k (int) – The number of nodes to hold for each graph.

forward(graph, feat)[source]

Compute sort pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature with shape $$(N, D)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, k * D)$$, where $$B$$ refers to the batch size.

Return type

mxnet.NDArray
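
The $$(B, k * D)$$ output shape follows from keeping $$k$$ nodes per graph and flattening them. The sketch below is a simplified pure-Python reading of that idea: it ranks nodes by their last feature channel and zero-pads graphs with fewer than $$k$$ nodes. The ranking key, the padding, and the name `sort_pool` are assumptions; the library implementation may differ in details such as ordering within each feature vector.

```python
def sort_pool(feat, batch_num_nodes, k):
    """Keep the k nodes with the largest last channel per graph and flatten them."""
    out, start = [], 0
    dim = len(feat[0])
    for n in batch_num_nodes:
        rows = sorted(feat[start:start + n], key=lambda r: r[-1], reverse=True)[:k]
        rows += [[0.0] * dim] * (k - len(rows))  # zero-pad graphs with < k nodes
        out.append([value for row in rows for value in row])
        start += n
    return out


feat = [[1.0, 0.2], [2.0, 0.9], [3.0, 0.5], [4.0, 0.1]]
pooled = sort_pool(feat, batch_num_nodes=[3, 1], k=2)
```

Both graphs produce a row of length $$k * D = 4$$, regardless of how many nodes they contain.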

### GlobalAttentionPooling¶

class dgl.nn.mxnet.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)[source]

Bases: mxnet.gluon.block.Block

Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)$
Parameters
• gate_nn (gluon.nn.Block) – A neural network that computes attention scores for each feature.

• feat_nn (gluon.nn.Block, optional) – A neural network applied to each feature before combining them with attention scores.

forward(graph, feat)[source]

Compute global attention pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature with shape $$(N, D)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, D)$$, where $$B$$ refers to the batch size.

Return type

mxnet.NDArray
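
The readout formula can be sketched in pure Python by treating gate_nn and feat_nn as ordinary callables. Here `gate_fn` returns one scalar score per node and `feat_fn` is optional; both names, and `global_attention_pool` itself, are hypothetical stand-ins rather than the layer's actual interface.

```python
import math


def global_attention_pool(feat, batch_num_nodes, gate_fn, feat_fn=None):
    """r = sum_k softmax(gate_fn(x_k)) * feat_fn(x_k), computed per graph."""
    out, start = [], 0
    for n in batch_num_nodes:
        rows = feat[start:start + n]
        exp_scores = [math.exp(gate_fn(x)) for x in rows]
        total = sum(exp_scores)
        mapped = [feat_fn(x) if feat_fn else x for x in rows]
        out.append([sum(e / total * m[c] for e, m in zip(exp_scores, mapped))
                    for c in range(len(mapped[0]))])
        start += n
    return out


# A constant gate gives uniform attention, so the readout is the node average.
feat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = global_attention_pool(feat, [2, 1], gate_fn=lambda x: 0.0)
```

A learned gate would instead shift weight toward the nodes it scores highest, while the softmax keeps the weights within each graph summing to one.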

### Set2Set¶

class dgl.nn.mxnet.glob.Set2Set(input_dim, n_iters, n_layers)[source]

Bases: mxnet.gluon.block.Block

Apply Set2Set (Order Matters: Sequence to sequence for sets) over the nodes in the graph.

For each individual graph in the batch, set2set computes

\begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align}

for this graph.

Parameters
• input_dim (int) – Size of each input sample.

• n_iters (int) – Number of iterations.

• n_layers (int) – Number of recurrent layers.

forward(graph, feat)[source]

Compute set2set pooling.

Parameters
• graph (DGLGraph) – The graph.

• feat (mxnet.NDArray) – The input feature with shape $$(N, D)$$ where $$N$$ is the number of nodes in the graph.

Returns

The output feature with shape $$(B, 2D)$$, where $$B$$ refers to the batch size; the last dimension is doubled because $$q^*_t$$ concatenates $$q_t$$ and $$r_t$$.

Return type

mxnet.NDArray
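
The attention and concatenation steps of the Set2Set equations can be followed with a short pure-Python sketch for a single graph. The LSTM that produces $$q_t$$ is not modelled; the query `q` is supplied directly, and `set2set_readout` is a hypothetical helper name.

```python
import math


def set2set_readout(x, q):
    """One readout step: alpha_i = softmax(x_i . q), r = sum_i alpha_i x_i, q* = q || r."""
    scores = [math.exp(sum(a * b for a, b in zip(x_i, q))) for x_i in x]
    total = sum(scores)
    r = [sum(s / total * x_i[c] for s, x_i in zip(scores, x)) for c in range(len(q))]
    return q + r  # list concatenation


x = [[1.0, 0.0], [0.0, 1.0]]
q = [0.0, 0.0]  # a zero query attends uniformly
q_star = set2set_readout(x, q)
```

The concatenated vector $$q^*_t$$ has twice as many entries as a single node feature.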

## Heterogeneous Graph Convolution Module¶

### HeteroGraphConv¶

class dgl.nn.mxnet.HeteroGraphConv(mods, aggregate='sum')[source]

Bases: mxnet.gluon.block.Block

A generic module for computing convolution on heterogeneous graphs.

The heterograph convolution applies sub-modules on their associating relation graphs, which reads the features from source nodes and writes the updated ones to destination nodes. If multiple relations have the same destination node types, their results are aggregated by the specified method.

If the relation graph has no edge, the corresponding module will not be called.

Examples

Create a heterograph with three types of relations and nodes.

>>> import dgl
>>> g = dgl.heterograph({
...     ('user', 'follows', 'user') : edges1,
...     ('user', 'plays', 'game') : edges2,
...     ('store', 'sells', 'game')  : edges3})


Create a HeteroGraphConv that applies different convolution modules to different relations. Note that the modules for 'follows' and 'plays' do not share weights.

>>> import dgl.nn.mxnet as dglnn
>>> conv = dglnn.HeteroGraphConv({
...     'follows' : dglnn.GraphConv(...),
...     'plays' : dglnn.GraphConv(...),
...     'sells' : dglnn.SAGEConv(...)},
...     aggregate='sum')


Call forward with some 'user' features. This computes new features for both 'user' and 'game' nodes.

>>> import mxnet.ndarray as nd
>>> h1 = {'user' : nd.random.randn(g.number_of_nodes('user'), 5)}
>>> h2 = conv(g, h1)
>>> print(h2.keys())
dict_keys(['user', 'game'])


Call forward with both 'user' and 'store' features. Because both the 'plays' and 'sells' relations will update the 'game' features, their results are aggregated by the specified method (i.e., summation here).

>>> f1 = {'user' : ..., 'store' : ...}
>>> f2 = conv(g, f1)
>>> print(f2.keys())
dict_keys(['user', 'game'])


Call forward with some 'store' features. This only computes new features for 'game' nodes.

>>> g1 = {'store' : ...}
>>> g2 = conv(g, g1)
>>> print(g2.keys())
dict_keys(['game'])


Calling forward with a pair of inputs is allowed; each submodule will then also be invoked with a pair of inputs.

>>> x_src = {'user' : ..., 'store' : ...}
>>> x_dst = {'user' : ..., 'game' : ...}
>>> y_dst = conv(g, (x_src, x_dst))
>>> print(y_dst.keys())
dict_keys(['user', 'game'])

Parameters
• mods (dict[str, nn.Module]) – Modules associated with every edge type. The forward function of each module must have a DGLHeteroGraph object as the first argument, and its second argument is either a tensor object representing the node features or a pair of tensor objects representing the source and destination node features.

• aggregate (str, callable, optional) –

Method for aggregating node features generated by different relations. Allowed string values are ‘sum’, ‘max’, ‘min’, ‘mean’, ‘stack’. The ‘stack’ aggregation is performed along the second dimension, whose order is deterministic. Users can also customize the aggregator by providing a callable instance. For example, aggregation by summation is equivalent to the following:

def my_agg_func(tensors, dsttype):
    # tensors: a list of tensors to aggregate
    # dsttype: string name of the destination node type for which the
    #          aggregation is performed
    stacked = mx.nd.stack(*tensors, axis=0)
    return mx.nd.sum(stacked, axis=0)


mods

Modules associated with every edge type.

Type

dict[str, nn.Module]

forward(g, inputs, mod_args=None, mod_kwargs=None)[source]

Forward computation

Invoke the forward function with each module and aggregate their results.

Parameters
• g (DGLHeteroGraph) – Graph data.

• inputs (dict[str, Tensor] or pair of dict[str, Tensor]) – Input node features.

• mod_args (dict[str, tuple[any]], optional) – Extra positional arguments for the sub-modules.

• mod_kwargs (dict[str, dict[str, any]], optional) – Extra key-word arguments for the sub-modules.

Returns

Output representations for every type of node.

Return type

dict[str, Tensor]
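
The per-destination-type aggregation described for this module can be sketched without DGL or MXNet. In the sketch below, the outputs of each relation are plain nested lists keyed by canonical edge type, and `aggregate_by_dsttype` and `elementwise_sum` are hypothetical helpers mirroring aggregate='sum'.

```python
from collections import defaultdict


def aggregate_by_dsttype(relation_outputs, agg_fn):
    """Group per-relation outputs by destination node type, then aggregate each group.

    relation_outputs maps (srctype, etype, dsttype) to that relation's output rows.
    """
    grouped = defaultdict(list)
    for (_, _, dsttype), out in relation_outputs.items():
        grouped[dsttype].append(out)
    return {dsttype: agg_fn(outs) for dsttype, outs in grouped.items()}


def elementwise_sum(outs):
    """Sum equally shaped row lists, mirroring aggregate='sum'."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*outs)]


relation_outputs = {
    ('user', 'follows', 'user'): [[1.0, 1.0]],
    ('user', 'plays', 'game'): [[1.0, 2.0]],
    ('store', 'sells', 'game'): [[10.0, 20.0]],
}
result = aggregate_by_dsttype(relation_outputs, elementwise_sum)
```

Only 'game' receives output from two relations, so it is the only node type for which the aggregator actually combines values.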

## Utility Modules¶

### Sequential¶

class dgl.nn.mxnet.utils.Sequential(prefix=None, params=None)[source]

Bases: mxnet.gluon.nn.basic_layers.Sequential

A sequential container for stacking graph neural network blocks.

We support two modes: sequentially apply GNN blocks on the same graph or a list of given graphs. In the second case, the number of graphs equals the number of blocks inside this container.

Examples

Mode 1: sequentially apply GNN modules on the same graph

>>> import dgl
>>> from mxnet import nd
>>> from mxnet.gluon import nn
>>> import dgl.function as fn
>>> from dgl.nn.mxnet import Sequential
>>> class ExampleLayer(nn.Block):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def forward(self, graph, n_feat, e_feat):
...         with graph.local_scope():
...             graph.ndata['h'] = n_feat
...             graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
...             n_feat += graph.ndata['h']
...             # Populate edata['e'] before reading it below.
...             graph.apply_edges(fn.u_add_v('h', 'h', 'e'))
...             e_feat += graph.edata['e']
...             return n_feat, e_feat
>>>
>>> g = dgl.DGLGraph()
>>> g.add_edges([0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> net = Sequential()
>>> # Stack the blocks to apply (three are assumed here).
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.initialize()
>>> n_feat = nd.random.randn(3, 4)
>>> e_feat = nd.random.randn(9, 4)
>>> net(g, n_feat, e_feat)
(
[[ 12.412863   99.61184    21.472883  -57.625923 ]
[ 10.08097   100.68611    20.627377  -60.13458  ]
[ 11.7912245 101.80654    22.427956  -58.32772  ]]
<NDArray 3x4 @cpu(0)>,
[[  21.818504  198.12076    42.72387  -115.147736]
[  23.070837  195.49811    43.42292  -116.17203 ]
[  24.330334  197.10927    42.40048  -118.06538 ]
[  21.907919  199.11469    42.1187   -115.35658 ]
[  22.849625  198.79213    43.866085 -113.65381 ]
[  20.926125  198.116      42.64334  -114.246704]
[  23.003159  197.06662    41.796425 -117.14977 ]
[  21.391375  198.3348     41.428078 -116.30361 ]
[  21.291483  200.0701     40.8239   -118.07314 ]]
<NDArray 9x4 @cpu(0)>)


Mode 2: sequentially apply GNN modules on different graphs

>>> import dgl
>>> from mxnet import nd
>>> from mxnet.gluon import nn
>>> import dgl.function as fn
>>> import networkx as nx
>>> from dgl.nn.mxnet import Sequential
>>> class ExampleLayer(nn.Block):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def forward(self, graph, n_feat):
...         with graph.local_scope():
...             graph.ndata['h'] = n_feat
...             graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
...             n_feat += graph.ndata['h']
...             return n_feat.reshape(graph.number_of_nodes() // 2, 2, -1).sum(1)
>>>
>>> g1 = dgl.DGLGraph(nx.erdos_renyi_graph(32, 0.05))
>>> g2 = dgl.DGLGraph(nx.erdos_renyi_graph(16, 0.2))
>>> g3 = dgl.DGLGraph(nx.erdos_renyi_graph(8, 0.8))
>>> net = Sequential()
>>> # One block per graph in the list passed to net below.
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.initialize()
>>> n_feat = nd.random.randn(32, 4)
>>> net([g1, g2, g3], n_feat)
[[-101.289566  -22.584694  -89.25348  -151.6447  ]
[-130.74239   -49.494812 -120.250854 -199.81546 ]
[-112.32089   -50.036713 -116.13266  -190.38638 ]
[-119.23065   -26.78553  -111.11185  -166.08322 ]]
<NDArray 4x4 @cpu(0)>

forward(graph, *feats)[source]

Sequentially apply modules to the input.

Parameters
• graph (DGLGraph or list of DGLGraphs) – The graph(s) to apply modules on.

• *feats – Input features. The output of the $$i$$-th block should match the input of the $$(i+1)$$-th block.
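
The two modes can be summarised by a small dispatch loop. The following pure-Python sketch is not the container's source code; `apply_sequential` is a hypothetical function, and the "graphs" are plain numbers so that the control flow can be run without DGL or MXNet.

```python
def apply_sequential(blocks, graph, *feats):
    """Apply blocks in order, on one shared graph or on one graph per block."""
    if isinstance(graph, list):
        graphs = graph
        assert len(graphs) == len(blocks), "need one graph per block"
    else:
        graphs = [graph] * len(blocks)
    for block, g in zip(blocks, graphs):
        feats = block(g, *feats)
        if not isinstance(feats, tuple):
            feats = (feats,)  # wrap single outputs so they can be unpacked again
    return feats[0] if len(feats) == 1 else feats


# Hypothetical stand-ins: a "graph" is just a number that each block adds.
def add_graph_value(g, x):
    return x + g


same_graph = apply_sequential([add_graph_value] * 3, 1, 0)
per_block = apply_sequential([add_graph_value] * 3, [1, 10, 100], 0)
```

Wrapping single outputs in a tuple is what lets blocks that return one feature and blocks that return several be chained by the same loop.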