NN Modules (MXNet)¶
We welcome your contribution! If you want a model to be implemented in DGL as an NN module, please create an issue starting with “[Feature Request] NN Module XXXModel”.
If you want to contribute an NN module, please create a pull request starting with “[NN] XXXModel in MXNet NN Modules” and our team members will review it.
Conv Layers¶
MXNet modules for graph convolutions.
GraphConv¶

class dgl.nn.mxnet.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply graph convolution over an input signal.
Graph convolution is introduced in GCN and can be described as below:
\[h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})\]where \(\mathcal{N}(i)\) is the neighbor set of node \(i\), \(c_{ij}\) is the product of the square roots of the node degrees: \(c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}\), and \(\sigma\) is an activation function.
The model parameters are initialized as in the original implementation where the weight \(W^{(l)}\) is initialized using Glorot uniform initialization and the bias is initialized to be zero.
Notes
Zero in-degree nodes can lead to an invalid normalizer. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by:
>>> g = ...  # some DGLGraph
>>> g.add_edges(g.nodes(), g.nodes())
Parameters:
 in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the \(c_{ij}\) in the paper is applied.
 weight (bool, optional) – If True, apply a linear layer. Otherwise, aggregate the messages without a weight matrix.
 bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

weight¶
mxnet.gluon.parameter.Parameter – The learnable weight tensor.

bias¶
mxnet.gluon.parameter.Parameter – The learnable bias tensor.

forward(graph, feat, weight=None)[source]¶
Compute graph convolution.
Notes
 Input shape: \((N, *, \text{in\_feats})\) where \(*\) means any number of additional dimensions and \(N\) is the number of nodes.
 Output shape: \((N, *, \text{out\_feats})\) where all but the last dimension are the same shape as the input.
 Weight shape: \((\text{in\_feats}, \text{out\_feats})\).
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature.
 weight (mxnet.NDArray, optional) – Optional external weight tensor.
Returns: The output feature
Return type: mxnet.NDArray
RelGraphConv¶

class dgl.nn.mxnet.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=False, dropout=0.0)[source]¶
Bases: mxnet.gluon.block.Block
Relational graph convolution layer.
Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as below:
\[h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})\]where \(\mathcal{N}^r(i)\) is the neighbor set of node \(i\) w.r.t. relation \(r\), \(c_{i,r}\) is the normalizer equal to \(|\mathcal{N}^r(i)|\), \(\sigma\) is an activation function, and \(W_0\) is the self-loop weight.
The basis regularization decomposes \(W_r\) by:
\[W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}\]where \(B\) is the number of bases.
The block-diagonal-decomposition regularization decomposes \(W_r\) into \(B\) block-diagonal matrices. We refer to \(B\) as the number of bases.
Parameters:  in_feat (int) – Input feature size.
 out_feat (int) – Output feature size.
 num_rels (int) – Number of relations.
 regularizer (str) – Which weight regularizer to use: “basis” or “bdd”.
 num_bases (int, optional) – Number of bases. If None, use the number of relations. Default: None.
 bias (bool, optional) – True if bias is added. Default: True
 activation (callable, optional) – Activation function. Default: None
 self_loop (bool, optional) – True to include self loop message. Default: False
 dropout (float, optional) – Dropout rate. Default: 0.0

forward(g, x, etypes, norm=None)[source]¶
Forward computation.
Parameters:  g (DGLGraph) – The graph.
 x (mx.ndarray.NDArray) – Input node features. Could be either:
  a \((V, D)\) dense tensor, or
  a \((V,)\) int64 vector, representing the categorical values of each node. We then treat the input feature as a one-hot encoding feature.
 etypes (mx.ndarray.NDArray) – Edge type tensor. Shape: \((E,)\)
 norm (mx.ndarray.NDArray) – Optional edge normalizer tensor. Shape: \((E, 1)\)
Returns: New node features.
Return type: mx.ndarray.NDArray
TAGConv¶

class dgl.nn.mxnet.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply a Topology Adaptive Graph Convolutional Network layer:
\[\mathbf{X}^{\prime} = \sum_{k=0}^K \left(\mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\right)^{k}\mathbf{X} \mathbf{\Theta}_{k},\]where \(\mathbf{A}\) denotes the adjacency matrix and \(D_{ii} = \sum_{j=0} A_{ij}\) its diagonal degree matrix.
Parameters:  in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 k (int, optional) – Number of hops \(k\). Default: 2.
 bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

lin¶
mxnet.gluon.parameter.Parameter – The learnable weight tensor.

bias¶
mxnet.gluon.parameter.Parameter – The learnable bias tensor.

forward(graph, feat)[source]¶
Compute graph convolution.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray
GATConv¶

class dgl.nn.mxnet.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply Graph Attention Network over an input signal.
\[h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} h_j^{(l)}\]where \(\alpha_{ij}\) is the attention score between node \(i\) and node \(j\):
\[ \begin{align}\begin{aligned}\alpha_{ij}^{l} & = \mathrm{softmax_i} (e_{ij}^{l})\\e_{ij}^{l} & = \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}\end{align} \]
Parameters:
 in_feats (int or pair of ints) – Input feature size. If the layer is to be applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
 out_feats (int) – Output feature size.
 num_heads (int) – Number of heads in MultiHead Attention.
 feat_drop (float, optional) – Dropout rate on feature. Default: 0.
 attn_drop (float, optional) – Dropout rate on attention weight. Default: 0.
 negative_slope (float, optional) – LeakyReLU angle of negative slope.
 residual (bool, optional) – If True, use residual connection.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

forward(graph, feat)[source]¶
Compute graph attention network layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – If a mxnet.NDArray is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
Returns: The output feature of shape \((N, H, D_{out})\) where \(H\) is the number of heads, and \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray
EdgeConv¶

class dgl.nn.mxnet.conv.EdgeConv(in_feat, out_feat, batch_norm=False)[source]¶
Bases: mxnet.gluon.block.Block
EdgeConv layer.
Introduced in “Dynamic Graph CNN for Learning on Point Clouds”. Can be described as follows:
\[x_i^{(l+1)} = \max_{j \in \mathcal{N}(i)} \mathrm{ReLU}( \Theta \cdot (x_j^{(l)} - x_i^{(l)}) + \Phi \cdot x_i^{(l)})\]where \(\mathcal{N}(i)\) is the neighbor set of \(i\).
Parameters:
 in_feat (int) – Input feature size.
 out_feat (int) – Output feature size.
 batch_norm (bool) – Whether to include batch normalization on messages. Default: False.
forward(g, h)[source]¶
Forward computation.
Parameters:  g (DGLGraph) – The graph.
 h (mxnet.NDArray) – Input feature of shape \((N, D)\) where \(N\) is the number of nodes and \(D\) is the number of feature dimensions. If a pair of tensors is given, the graph must be a uni-bipartite graph with only one edge type, and the two tensors must have the same dimensionality on all except the first axis.
Returns: New node features.
Return type: mxnet.NDArray

SAGEConv¶

class dgl.nn.mxnet.conv.SAGEConv(in_feats, out_feats, aggregator_type='mean', feat_drop=0.0, bias=True, norm=None, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.
\[ \begin{align}\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} & = \mathrm{aggregate} \left(\{h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} & = \sigma \left(W \cdot \mathrm{concat} (h_{i}^{l}, h_{\mathcal{N}(i)}^{l+1}) + b \right)\\h_{i}^{(l+1)} & = \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}\end{align} \]
Parameters:
 in_feats (int) – Input feature size. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value. If aggregator type is gcn, the feature sizes of source and destination nodes are required to be the same.
 out_feats (int) – Output feature size.
 feat_drop (float) – Dropout rate on features. Default: 0.
 aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).
 bias (bool) – If True, adds a learnable bias to the output. Default: True.
 norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

forward(graph, feat)[source]¶
Compute GraphSAGE layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray or pair of mxnet.NDArray) – If a single tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of tensors are given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray
SGConv¶

class dgl.nn.mxnet.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None)[source]¶
Bases: mxnet.gluon.block.Block
Simplifying Graph Convolution layer from paper Simplifying Graph Convolutional Networks.
\[H^{l+1} = (\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2})^K H^{l} \Theta^{l}\]Parameters:  in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 k (int) – Number of hops \(K\). Default: 1.
 cached (bool) – If True, the module caches
\[(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}})^K X\Theta\]at the first forward call. This parameter should only be set to True in a Transductive Learning setting.
 bias (bool) – If True, adds a learnable bias to the output. Default: True.
 norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.

forward(graph, feat)[source]¶
Compute Simplifying Graph Convolution layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray
Notes
If cached is set to True, feat and graph should not change during training, or you will get wrong results.
APPNPConv¶

class dgl.nn.mxnet.conv.APPNPConv(k, alpha, edge_drop=0.0)[source]¶
Bases: mxnet.gluon.block.Block
Approximate Personalized Propagation of Neural Predictions layer from paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.
\[ \begin{align}\begin{aligned}H^{0} & = X\\H^{t+1} & = (1-\alpha)\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{t}\right) + \alpha H^{0}\end{aligned}\end{align} \]
Parameters:
 k (int) – Number of iterations \(K\).
 alpha (float) – The teleport probability \(\alpha\).
 edge_drop (float, optional) – Dropout rate on edges during message passing. Default: 0.
forward(graph, feat)[source]¶
Compute APPNP layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mx.NDArray) – The input feature of shape \((N, *)\) where \(N\) is the number of nodes and \(*\) could be of any shape.
Returns: The output feature of shape \((N, *)\) where \(*\) should be the same as input shape.
Return type: mx.NDArray

GINConv¶

class dgl.nn.mxnet.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)[source]¶
Bases: mxnet.gluon.block.Block
Graph Isomorphism Network layer from paper How Powerful are Graph Neural Networks?.
\[h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)\]Parameters:  apply_func (callable activation function/layer or None) – If not None, apply this function to the updated node feature, the \(f_\Theta\) in the formula.
 aggregator_type (str) – Aggregator type to use (sum, max or mean).
 init_eps (float, optional) – Initial \(\epsilon\) value. Default: 0.
 learn_eps (bool, optional) – If True, \(\epsilon\) will be a learnable parameter.

forward(graph, feat)[source]¶
Compute Graph Isomorphism Network layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray or a pair of mxnet.NDArray) – If a mxnet.NDArray is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, D_{in})\) and \((N_{out}, D_{in})\). If apply_func is not None, \(D_{in}\) should fit the input dimensionality requirement of apply_func.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output dimensionality of apply_func. If apply_func is None, \(D_{out}\) should be the same as the input dimensionality.
Return type: mxnet.NDArray
GatedGraphConv¶

class dgl.nn.mxnet.conv.GatedGraphConv(in_feats, out_feats, n_steps, n_etypes, bias=True)[source]¶
Bases: mxnet.gluon.block.Block
Gated Graph Convolution layer from paper Gated Graph Sequence Neural Networks.
\[ \begin{align}\begin{aligned}h_{i}^{0} & = [ x_i \| \mathbf{0} ]\\a_{i}^{t} & = \sum_{j\in\mathcal{N}(i)} W_{e_{ij}} h_{j}^{t}\\h_{i}^{t+1} & = \mathrm{GRU}(a_{i}^{t}, h_{i}^{t})\end{aligned}\end{align} \]
Parameters:
 in_feats (int) – Input feature size.
 out_feats (int) – Output feature size.
 n_steps (int) – Number of recurrent steps.
 n_etypes (int) – Number of edge types.
 bias (bool) – If True, adds a learnable bias to the output. Default: True.
forward(graph, feat, etypes)[source]¶
Compute Gated Graph Convolution layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(N\) is the number of nodes of the graph and \(D_{in}\) is the input feature size.
 etypes (mxnet.NDArray) – The edge type tensor of shape \((E,)\) where \(E\) is the number of edges of the graph.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
Return type: mxnet.NDArray

GMMConv¶

class dgl.nn.mxnet.conv.GMMConv(in_feats, out_feats, dim, n_kernels, aggregator_type='sum', residual=False, bias=True)[source]¶
Bases: mxnet.gluon.block.Block
The Gaussian Mixture Model Convolution layer from Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs.
\[ \begin{align}\begin{aligned}h_i^{l+1} & = \mathrm{aggregate}\left(\left\{\frac{1}{K} \sum_{k}^{K} w_k(u_{ij}), \forall j\in \mathcal{N}(i)\right\}\right)\\w_k(u) & = \exp\left(-\frac{1}{2}(u-\mu_k)^T \Sigma_k^{-1} (u - \mu_k)\right)\end{aligned}\end{align} \]
Parameters:
 in_feats (int, or pair of ints) – Number of input features. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
 out_feats (int) – Number of output features.
 dim (int) – Dimensionality of pseudo-coordinate.
 n_kernels (int) – Number of kernels \(K\).
 aggregator_type (str) – Aggregator type (sum, mean, max). Default: sum.
 residual (bool) – If True, use residual connection inside this layer. Default: False.
 bias (bool) – If True, adds a learnable bias to the output. Default: True.

forward(graph, feat, pseudo)[source]¶
Compute Gaussian Mixture Model Convolution layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – If a single tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of tensors are given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
 pseudo (mxnet.NDArray) – The pseudo coordinate tensor of shape \((E, D_{u})\) where \(E\) is the number of edges of the graph and \(D_{u}\) is the dimensionality of pseudo coordinate.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
Return type: mxnet.NDArray
ChebConv¶

class dgl.nn.mxnet.conv.ChebConv(in_feats, out_feats, k, bias=True)[source]¶
Bases: mxnet.gluon.block.Block
Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
\[ \begin{align}\begin{aligned}h_i^{l+1} &= \sum_{k=0}^{K-1} W^{k, l}z_i^{k, l}\\Z^{0, l} &= H^{l}\\Z^{1, l} &= \hat{L} \cdot H^{l}\\Z^{k, l} &= 2 \cdot \hat{L} \cdot Z^{k-1, l} - Z^{k-2, l}\\\hat{L} &= 2\left(I - \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}\right)/\lambda_{max} - I\end{aligned}\end{align} \]
Parameters:
 in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 k (int) – Chebyshev filter size \(K\).
 bias (bool) – If True, adds a learnable bias to the output. Default: True.
forward(graph, feat, lambda_max=None)[source]¶
Compute ChebNet layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
 lambda_max (list or mxnet.NDArray or None, optional) – A list (tensor) of length \(B\), storing the largest eigenvalue of the normalized Laplacian of each individual graph in graph, where \(B\) is the batch size of the input graph. Default: None. If None, this method computes the list by calling dgl.laplacian_lambda_max.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray

AGNNConv¶

class dgl.nn.mxnet.conv.AGNNConv(init_beta=1.0, learn_beta=True)[source]¶
Bases: mxnet.gluon.block.Block
Attention-based Graph Neural Network layer from paper Attention-based Graph Neural Network for Semi-Supervised Learning.
\[H^{l+1} = P H^{l}\]where \(P\) is computed as:
\[P_{ij} = \mathrm{softmax}_i ( \beta \cdot \cos(h_i^l, h_j^l))\]
Parameters:
 init_beta (float, optional) – The initial value of \(\beta\). Default: 1.0.
 learn_beta (bool, optional) – If True, \(\beta\) will be a learnable parameter. Default: True.
forward(graph, feat)[source]¶
Compute AGNN layer.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature of shape \((N, *)\) where \(N\) is the number of nodes and \(*\) could be of any shape. If a pair of mxnet.NDArray is given, the pair must contain two tensors of shape \((N_{in}, *)\) and \((N_{out}, *)\), and the \(*\) in the latter tensor must equal that of the former.
Returns: The output feature of shape \((N, *)\) where \(*\) should be the same as input shape.
Return type: mxnet.NDArray

Dense Conv Layers¶
DenseGraphConv¶

class dgl.nn.mxnet.conv.DenseGraphConv(in_feats, out_feats, norm='both', bias=True, activation=None)[source]¶
Bases: mxnet.gluon.block.Block
Graph Convolutional Network layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying graph convolution on dense graphs.
Parameters:  in_feats (int) – Input feature size.
 out_feats (int) – Output feature size.
 norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the \(c_{ij}\) in the paper is applied.
 bias (bool) – If True, adds a learnable bias to the output. Default: True.
 activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
See also
GraphConv

forward(adj, feat)[source]¶
Compute (Dense) Graph Convolution layer.
Parameters:
 adj (mxnet.NDArray) – The adjacency matrix of the graph to apply Graph Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape \((N_{out}, N_{in})\); when applied to a homo graph, adj should be of shape \((N, N)\). In both cases, a row represents a destination node while a column represents a source node.
 feat (mxnet.NDArray) – The input feature.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray
DenseChebConv¶

class dgl.nn.mxnet.conv.DenseChebConv(in_feats, out_feats, k, bias=True)[source]¶
Bases: mxnet.gluon.block.Block
Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
We recommend using this module when applying ChebConv on dense graphs.
Parameters:
 in_feats (int) – Number of input features.
 out_feats (int) – Number of output features.
 k (int) – Chebyshev filter size.
 bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
See also
ChebConv

forward(adj, feat, lambda_max=None)[source]¶
Compute (Dense) Chebyshev Spectral Graph Convolution layer.
Parameters:
 adj (mxnet.NDArray) – The adjacency matrix of the graph to apply Graph Convolution on, should be of shape \((N, N)\), where a row represents the destination and a column represents the source.
 feat (mxnet.NDArray) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
 lambda_max (float or None, optional) – A float value indicating the largest eigenvalue of the given graph. Default: None.
Returns: The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
Return type: mxnet.NDArray

Global Pooling Layers¶
MXNet modules for graph global pooling.
SumPooling¶

class dgl.nn.mxnet.glob.SumPooling[source]¶
Bases: mxnet.gluon.block.Block
Apply sum pooling over the nodes in the graph.
\[r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k\]
forward(graph, feat)[source]¶
Compute sum pooling.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
Return type: mxnet.NDArray

AvgPooling¶

class dgl.nn.mxnet.glob.AvgPooling[source]¶
Bases: mxnet.gluon.block.Block
Apply average pooling over the nodes in the graph.
\[r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k\]
forward(graph, feat)[source]¶
Compute average pooling.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
Return type: mxnet.NDArray

MaxPooling¶

class dgl.nn.mxnet.glob.MaxPooling[source]¶
Bases: mxnet.gluon.block.Block
Apply max pooling over the nodes in the graph.
\[r^{(i)} = \max_{k=1}^{N_i} \left( x^{(i)}_k \right)\]
forward(graph, feat)[source]¶
Compute max pooling.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, *)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
Return type: mxnet.NDArray

SortPooling¶

class dgl.nn.mxnet.glob.SortPooling(k)[source]¶
Bases: mxnet.gluon.block.Block
Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in the graph.
Parameters: k (int) – The number of nodes to hold for each graph. 
forward(graph, feat)[source]¶
Compute sort pooling.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((B, k * D)\), where \(B\) refers to the batch size.
Return type: mxnet.NDArray

GlobalAttentionPooling¶

class dgl.nn.mxnet.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)[source]¶
Bases: mxnet.gluon.block.Block
Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.
\[r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)\]Parameters:  gate_nn (gluon.nn.Block) – A neural network that computes attention scores for each feature.
 feat_nn (gluon.nn.Block, optional) – A neural network applied to each feature before combining them with attention scores.

forward(graph, feat)[source]¶
Compute global attention pooling.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((B, D)\), where \(B\) refers to the batch size.
Return type: mxnet.NDArray
Set2Set¶

class dgl.nn.mxnet.glob.Set2Set(input_dim, n_iters, n_layers)[source]¶
Bases: mxnet.gluon.block.Block
Apply Set2Set (Order Matters: Sequence to sequence for sets) over the nodes in the graph.
For each individual graph in the batch, set2set computes
\[ \begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align} \]for this graph.
Parameters:
 input_dim (int) – Size of each input sample.
 n_iters (int) – Number of iterations.
 n_layers (int) – Number of recurrent layers.
forward(graph, feat)[source]¶
Compute set2set pooling.
Parameters:  graph (DGLGraph) – The graph.
 feat (mxnet.NDArray) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph.
Returns: The output feature with shape \((B, D)\), where \(B\) refers to the batch size.
Return type: mxnet.NDArray

Utility Modules¶
Sequential¶

class dgl.nn.mxnet.utils.Sequential(prefix=None, params=None)[source]¶
Bases: mxnet.gluon.nn.basic_layers.Sequential
A sequential container for stacking graph neural network blocks.
We support two modes: sequentially apply GNN blocks on the same graph or a list of given graphs. In the second case, the number of graphs equals the number of blocks inside this container.
Examples
Mode 1: sequentially apply GNN modules on the same graph
>>> import dgl
>>> from mxnet import nd
>>> from mxnet.gluon import nn
>>> import dgl.function as fn
>>> from dgl.nn.mxnet import Sequential
>>> class ExampleLayer(nn.Block):
>>>     def __init__(self, **kwargs):
>>>         super().__init__(**kwargs)
>>>     def forward(self, graph, n_feat, e_feat):
>>>         graph = graph.local_var()
>>>         graph.ndata['h'] = n_feat
>>>         graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
>>>         n_feat += graph.ndata['h']
>>>         graph.apply_edges(fn.u_add_v('h', 'h', 'e'))
>>>         e_feat += graph.edata['e']
>>>         return n_feat, e_feat
>>>
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> net = Sequential()
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.initialize()
>>> n_feat = nd.random.randn(3, 4)
>>> e_feat = nd.random.randn(9, 4)
>>> net(g, n_feat, e_feat)
(
[[ 12.412863   99.61184   21.472883  57.625923]
 [ 10.08097   100.68611   20.627377  60.13458 ]
 [ 11.7912245 101.80654   22.427956  58.32772 ]]
<NDArray 3x4 @cpu(0)>,
[[ 21.818504 198.12076  42.72387  115.147736]
 [ 23.070837 195.49811  43.42292  116.17203 ]
 [ 24.330334 197.10927  42.40048  118.06538 ]
 [ 21.907919 199.11469  42.1187   115.35658 ]
 [ 22.849625 198.79213  43.866085 113.65381 ]
 [ 20.926125 198.116    42.64334  114.246704]
 [ 23.003159 197.06662  41.796425 117.14977 ]
 [ 21.391375 198.3348   41.428078 116.30361 ]
 [ 21.291483 200.0701   40.8239   118.07314 ]]
<NDArray 9x4 @cpu(0)>)
Mode 2: sequentially apply GNN modules on different graphs
>>> import dgl
>>> from mxnet import nd
>>> from mxnet.gluon import nn
>>> import dgl.function as fn
>>> import networkx as nx
>>> from dgl.nn.mxnet import Sequential
>>> class ExampleLayer(nn.Block):
>>>     def __init__(self, **kwargs):
>>>         super().__init__(**kwargs)
>>>     def forward(self, graph, n_feat):
>>>         graph = graph.local_var()
>>>         graph.ndata['h'] = n_feat
>>>         graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
>>>         n_feat += graph.ndata['h']
>>>         return n_feat.reshape(graph.number_of_nodes() // 2, 2, -1).sum(1)
>>>
>>> g1 = dgl.DGLGraph(nx.erdos_renyi_graph(32, 0.05))
>>> g2 = dgl.DGLGraph(nx.erdos_renyi_graph(16, 0.2))
>>> g3 = dgl.DGLGraph(nx.erdos_renyi_graph(8, 0.8))
>>> net = Sequential()
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.add(ExampleLayer())
>>> net.initialize()
>>> n_feat = nd.random.randn(32, 4)
>>> net([g1, g2, g3], n_feat)
[[101.289566  22.584694  89.25348  151.6447  ]
 [130.74239   49.494812 120.250854 199.81546 ]
 [112.32089   50.036713 116.13266  190.38638 ]
 [119.23065   26.78553  111.11185  166.08322 ]]
<NDArray 4x4 @cpu(0)>
Edge Softmax¶
Gluon layer for graph related softmax.

dgl.nn.mxnet.softmax.edge_softmax(graph, logits, eids='__ALL__')[source]¶
Compute edge softmax.
For a node \(i\), edge softmax is an operation of computing
\[a_{ij} = \frac{\exp(z_{ij})}{\sum_{j\in\mathcal{N}(i)}\exp(z_{ij})}\]where \(z_{ij}\) is a signal of edge \(j\rightarrow i\), also called logits in the context of softmax. \(\mathcal{N}(i)\) is the set of nodes that have an edge to \(i\).
An example of using edge softmax is in Graph Attention Network where the attention weights are computed with such an edge softmax operation.
Parameters:  graph (DGLGraph) – The graph to perform edge softmax
 logits (mxnet.NDArray) – The input edge feature
 eids (mxnet.NDArray or ALL, optional) – Edges on which to apply edge softmax. If ALL, apply edge softmax on all edges in the graph. Default: ALL.
Returns: Softmax value
Return type: Tensor
Notes
 Input shape: \((E, *, 1)\) where * means any number of additional dimensions, \(E\) equals the length of eids. If eids is ALL, \(E\) equals number of edges in the graph.
 Return shape: \((E, *, 1)\)
Examples
>>> from dgl.nn.mxnet.softmax import edge_softmax
>>> import dgl
>>> from mxnet import nd
Create a DGLGraph object and initialize its edge features.
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2])
>>> edata = nd.ones((6, 1))
>>> edata
[[1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]]
<NDArray 6x1 @cpu(0)>
Apply edge softmax on g:
>>> edge_softmax(g, edata)
[[1.        ]
 [0.5       ]
 [0.33333334]
 [0.5       ]
 [0.33333334]
 [0.33333334]]
<NDArray 6x1 @cpu(0)>
Apply edge softmax on first 4 edges of g:
>>> edge_softmax(g, edata, nd.array([0, 1, 2, 3], dtype='int64'))
[[1. ]
 [0.5]
 [1. ]
 [0.5]]
<NDArray 4x1 @cpu(0)>