NN Modules (MXNet)¶
Contents
We welcome your contributions! If you would like a model to be implemented in DGL as an NN module, please create an issue starting with "[Feature Request] NN Module XXXModel".
If you want to contribute an NN module, please create a pull request starting with "[NN] XXXModel in MXNet NN Modules" and our team members will review it.
Conv Layers¶
MXNet modules for graph convolutions.
GraphConv¶

class dgl.nn.mxnet.conv.GraphConv(in_feats, out_feats, norm=True, bias=True, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply graph convolution over an input signal.
Graph convolution is introduced in GCN and can be described as below:
\[h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})\]where \(\mathcal{N}(i)\) is the set of neighbors of node \(i\), \(c_{ij}\) is the product of the square roots of the node degrees (i.e., \(c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}\)), and \(\sigma\) is an activation function.
The model parameters are initialized as in the original implementation where the weight \(W^{(l)}\) is initialized using Glorot uniform initialization and the bias is initialized to be zero.
Notes
Zero in-degree nodes will lead to an invalid normalizer. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by:
>>> g = ... # some DGLGraph
>>> g.add_edges(g.nodes(), g.nodes())
Parameters:  in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 norm (bool, optional) – If True, the normalizer \(c_{ij}\) is applied. Default: True.
 bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

weight
¶ mxnet.gluon.parameter.Parameter – The learnable weight tensor.

bias
¶ mxnet.gluon.parameter.Parameter – The learnable bias tensor.

forward
(graph, feat)[source]¶ Compute graph convolution.
Notes
 Input shape: \((N, *, \text{in\_feats})\) where \(*\) means any number of additional dimensions and \(N\) is the number of nodes.
 Output shape: \((N, *, \text{out\_feats})\) where all but the last dimension are the same shape as the input.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature.
Returns: The output feature.
Return type: mxnet.NDArray
RelGraphConv¶

class dgl.nn.mxnet.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=False, dropout=0.0)[source]¶
Bases: mxnet.gluon.block.Block
Relational graph convolution layer.
Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as follows:
\[h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})\]where \(\mathcal{N}^r(i)\) is the set of neighbors of node \(i\) with respect to relation \(r\), \(c_{i,r}\) is the normalizer equal to \(|\mathcal{N}^r(i)|\), \(\sigma\) is an activation function, and \(W_0\) is the self-loop weight.
The basis regularization decomposes \(W_r\) by:
\[W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}\]where \(B\) is the number of bases.
The block-diagonal-decomposition regularization decomposes \(W_r\) into \(B\) block-diagonal matrices. We refer to \(B\) as the number of bases.
Parameters:  in_feat (int) – Input feature size.
 out_feat (int) – Output feature size.
 num_rels (int) – Number of relations.
 regularizer (str) – Which weight regularizer to use: “basis” or “bdd”.
 num_bases (int, optional) – Number of bases. If None, uses the number of relations. Default: None.
 bias (bool, optional) – True if bias is added. Default: True
 activation (callable, optional) – Activation function. Default: None
 self_loop (bool, optional) – True to include self loop message. Default: False
 dropout (float, optional) – Dropout rate. Default: 0.0

forward
(g, x, etypes, norm=None)[source]¶ Forward computation
Parameters:  g (DGLGraph) – The graph.
 x (mx.ndarray.NDArray) –
 Input node features. Can be either:
 a \((V, D)\) dense tensor, or
 a \((V,)\) int64 vector representing the categorical value of each node, in which case the input feature is treated as a one-hot encoding.
 etypes (mx.ndarray.NDArray) – Edge type tensor. Shape: \((E,)\)
 norm (mx.ndarray.NDArray) – Optional edge normalizer tensor. Shape: \((E, 1)\)
Returns: New node features.
Return type: mx.ndarray.NDArray
TAGConv¶

class dgl.nn.mxnet.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply a Topology Adaptive Graph Convolutional Network layer:
\[\mathbf{X}^{\prime} = \sum_{k=0}^K \left(\mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\right)^k \mathbf{X} \mathbf{\Theta}_{k},\]where \(\mathbf{A}\) denotes the adjacency matrix and \(D_{ii} = \sum_{j=0} A_{ij}\) its diagonal degree matrix.
Parameters:  in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 k (int, optional) – Number of hops \(k\). Default: 2.
 bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

lin
¶ mxnet.gluon.parameter.Parameter – The learnable weight tensor.

bias
¶ mxnet.gluon.parameter.Parameter – The learnable bias tensor.

forward
(graph, feat)[source]¶ Compute graph convolution
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray
Global Pooling Layers¶
MXNet modules for graph global pooling.
SumPooling¶

class dgl.nn.mxnet.glob.SumPooling[source]¶
Bases: mxnet.gluon.block.Block
Apply sum pooling over the nodes in the graph.
\[r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k\]
forward
(graph, feat)[source]¶ Compute sum pooling.
Parameters:  graph (DGLGraph or BatchedDGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((*)\) (if the input graph is a BatchedDGLGraph, the result shape would be \((B, *)\), where \(B\) is the batch size).
Return type: mxnet.NDArray

AvgPooling¶

class dgl.nn.mxnet.glob.AvgPooling[source]¶
Bases: mxnet.gluon.block.Block
Apply average pooling over the nodes in the graph.
\[r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k\]
forward
(graph, feat)[source]¶ Compute average pooling.
Parameters:  graph (DGLGraph or BatchedDGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((*)\) (if the input graph is a BatchedDGLGraph, the result shape would be \((B, *)\), where \(B\) is the batch size).
Return type: mxnet.NDArray

MaxPooling¶

class dgl.nn.mxnet.glob.MaxPooling[source]¶
Bases: mxnet.gluon.block.Block
Apply max pooling over the nodes in the graph.
\[r^{(i)} = \max_{k=1}^{N_i} \left( x^{(i)}_k \right)\]
forward
(graph, feat)[source]¶ Compute max pooling.
Parameters:  graph (DGLGraph or BatchedDGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((*)\) (if the input graph is a BatchedDGLGraph, the result shape would be \((B, *)\), where \(B\) is the batch size).
Return type: mxnet.NDArray

SortPooling¶

class dgl.nn.mxnet.glob.SortPooling(k)[source]¶
Bases: mxnet.gluon.block.Block
Apply Sort Pooling (“An End-to-End Deep Learning Architecture for Graph Classification”) over the nodes in the graph.
Parameters: k (int) – The number of nodes to hold for each graph. 
forward
(graph, feat)[source]¶ Compute sort pooling.
Parameters:  graph (DGLGraph or BatchedDGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((k * D)\) (if the input graph is a BatchedDGLGraph, the result shape would be \((B, k * D)\), where \(B\) is the batch size).
Return type: mxnet.NDArray

GlobalAttentionPooling¶

class dgl.nn.mxnet.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.
\[r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)\]
Parameters:  gate_nn (gluon.nn.Block) – A neural network that computes attention scores for each feature.
 feat_nn (gluon.nn.Block, optional) – A neural network applied to each feature before combining them with attention scores.

forward
(graph, feat)[source]¶ Compute global attention pooling.
Parameters:  graph (DGLGraph or BatchedDGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((D)\) (if the input graph is a BatchedDGLGraph, the result shape would be \((B, D)\), where \(B\) is the batch size).
Return type: mxnet.NDArray
Set2Set¶

class dgl.nn.mxnet.glob.Set2Set(input_dim, n_iters, n_layers)[source]¶
Bases: mxnet.gluon.block.Block
Apply Set2Set (Order Matters: Sequence to sequence for sets) over the nodes in the graph.
For each individual graph in the batch, set2set computes
\[ \begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align} \]for this graph.
Parameters:  input_dim (int) – The size of each input sample.
 n_iters (int) – The number of iterations.
 n_layers (int) – The number of recurrent layers.
forward
(graph, feat)[source]¶ Compute set2set pooling.
Parameters:  graph (DGLGraph or BatchedDGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((2D)\) (if the input graph is a BatchedDGLGraph, the result shape would be \((B, 2D)\), where \(B\) is the batch size).
Return type: mxnet.NDArray

Utility Modules¶
Edge Softmax¶
Gluon layer for graph related softmax.

dgl.nn.mxnet.softmax.edge_softmax(graph, logits, eids='__ALL__')[source]¶
Compute edge softmax.
For a node \(i\), edge softmax is an operation of computing
\[a_{ij} = \frac{\exp(z_{ij})}{\sum_{j\in\mathcal{N}(i)}\exp(z_{ij})}\]where \(z_{ij}\) is a signal of edge \(j\rightarrow i\), also called logits in the context of softmax. \(\mathcal{N}(i)\) is the set of nodes that have an edge to \(i\).
An example of using edge softmax is in Graph Attention Network where the attention weights are computed with such an edge softmax operation.
Parameters:  graph (DGLGraph) – The graph on which to perform edge softmax.
 logits (mxnet.NDArray) – The input edge feature
 eids (mxnet.NDArray or ALL, optional) – Edges on which to apply edge softmax. If ALL, apply edge softmax on all edges in the graph. Default: ALL.
Returns: Softmax value
Return type: Tensor
Notes
 Input shape: \((E, *, 1)\) where * means any number of additional dimensions, \(E\) equals the length of eids. If eids is ALL, \(E\) equals number of edges in the graph.
 Return shape: \((E, *, 1)\)
Examples
>>> from dgl.nn.mxnet.softmax import edge_softmax
>>> import dgl
>>> from mxnet import nd

Create a DGLGraph object and initialize its edge features.

>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2])
>>> edata = nd.ones((6, 1))
>>> edata
[[1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]]
<NDArray 6x1 @cpu(0)>

Apply edge softmax on g:

>>> edge_softmax(g, edata)
[[1.        ]
 [0.5       ]
 [0.33333334]
 [0.5       ]
 [0.33333334]
 [0.33333334]]
<NDArray 6x1 @cpu(0)>

Apply edge softmax on the first 4 edges of g:

>>> edge_softmax(g, edata, nd.array([0, 1, 2, 3], dtype='int64'))
[[1. ]
 [0.5]
 [1. ]
 [0.5]]
<NDArray 4x1 @cpu(0)>