# NN Modules (PyTorch)¶

We welcome your contribution! If you want a model to be implemented in DGL as a NN module, please create an issue starting with “[Feature Request] NN Module XXXModel”.

If you want to contribute a NN module, please create a pull request starting with “[NN] XXXModel in PyTorch NN Modules” and a team member will review it.

## Conv Layers¶

Torch modules for graph convolutions.

### GraphConv¶

class dgl.nn.pytorch.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None)[source]

Bases: torch.nn.modules.module.Module

Apply graph convolution over an input signal.

Graph convolution is introduced in GCN and can be described as below:

$h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})$

where $$\mathcal{N}(i)$$ is the neighbor set of node $$i$$. $$c_{ij}$$ is equal to the product of the square root of node degrees: $$\sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$$. $$\sigma$$ is an activation function.

The model parameters are initialized as in the original implementation where the weight $$W^{(l)}$$ is initialized using Glorot uniform initialization and the bias is initialized to be zero.

Notes

Nodes with zero in-degree lead to an invalid normalizer. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by:

>>> g = ... # some DGLGraph
>>> g.add_edges(g.nodes(), g.nodes())

Parameters:
• in_feats (int) – Input feature size.
• out_feats (int) – Output feature size.
• norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the $$c_{ij}$$ in the paper is applied.
• weight (bool, optional) – If True, apply a linear layer; otherwise, aggregate the messages without a weight matrix.
• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
weight

torch.Tensor – The learnable weight tensor.

bias

torch.Tensor – The learnable bias tensor.

forward(graph, feat, weight=None)[source]

Compute graph convolution.

Notes

• Input shape: $$(N, *, \text{in_feats})$$ where $$*$$ means any number of additional dimensions and $$N$$ is the number of nodes.
• Output shape: $$(N, *, \text{out_feats})$$ where all but the last dimension have the same shape as the input.
• Weight shape: $$(\text{in_feats}, \text{out_feats})$$.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature.
• weight (torch.Tensor, optional) – Optional external weight tensor.

Returns: The output feature.
Return type: torch.Tensor
reset_parameters()[source]

Reinitialize learnable parameters.
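For intuition, the propagation rule above can be sketched densely in plain PyTorch. This is an illustration with a hypothetical `graph_conv` helper on a toy adjacency matrix, not DGL's sparse message-passing implementation:

```python
import torch

def graph_conv(adj, feat, weight, bias=None):
    # Dense sketch of h' = D^{-1/2} A D^{-1/2} h W + b ('both' normalization).
    deg = adj.sum(dim=1).clamp(min=1)      # node degrees
    norm = deg.pow(-0.5)                   # D^{-1/2}
    a_hat = norm.unsqueeze(1) * adj * norm.unsqueeze(0)
    out = a_hat @ feat @ weight
    return out if bias is None else out + bias

adj = torch.eye(3)        # 3 nodes with self-loops only
feat = torch.ones(3, 2)
w = torch.ones(2, 4)
out = graph_conv(adj, feat, w)
print(out.shape)  # torch.Size([3, 4])
```

With an identity adjacency (every node only self-looped), the degrees are all 1 and the layer reduces to a plain linear map.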

### RelGraphConv¶

class dgl.nn.pytorch.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=False, dropout=0.0)[source]

Bases: torch.nn.modules.module.Module

Relational graph convolution layer.

Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as below:

$h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})$

where $$\mathcal{N}^r(i)$$ is the neighbor set of node $$i$$ w.r.t. relation $$r$$. $$c_{i,r}$$ is the normalizer equal to $$|\mathcal{N}^r(i)|$$. $$\sigma$$ is an activation function. $$W_0$$ is the self-loop weight.

The basis regularization decomposes $$W_r$$ by:

$W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}$

where $$B$$ is the number of bases.

The block-diagonal-decomposition regularization decomposes $$W_r$$ into $$B$$ block-diagonal matrices. We refer to $$B$$ as the number of bases.

Parameters:
• in_feat (int) – Input feature size.
• out_feat (int) – Output feature size.
• num_rels (int) – Number of relations.
• regularizer (str) – Which weight regularizer to use: “basis” or “bdd”.
• num_bases (int, optional) – Number of bases. If None, use the number of relations. Default: None.
• bias (bool, optional) – True if bias is added. Default: True.
• activation (callable, optional) – Activation function. Default: None.
• self_loop (bool, optional) – True to include self-loop messages. Default: False.
• dropout (float, optional) – Dropout rate. Default: 0.0.
forward(g, x, etypes, norm=None)[source]

Forward computation

Parameters:
• g (DGLGraph) – The graph.
• x (torch.Tensor) – Input node features. Either a $$(|V|, D)$$ dense tensor, or a $$(|V|,)$$ int64 vector representing the categorical values of each node, in which case the input feature is treated as a one-hot encoding.
• etypes (torch.Tensor) – Edge type tensor. Shape: $$(|E|,)$$.
• norm (torch.Tensor) – Optional edge normalizer tensor. Shape: $$(|E|, 1)$$.

Returns: New node features.
Return type: torch.Tensor
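The basis regularization above can be sketched in a few lines of PyTorch. The sizes here are hypothetical, for illustration only:

```python
import torch

# Hypothetical sizes, for illustration.
num_rels, num_bases, in_feat, out_feat = 4, 2, 3, 5
a = torch.randn(num_rels, num_bases)             # coefficients a_{rb}
v = torch.randn(num_bases, in_feat, out_feat)    # basis matrices V_b

# W_r = sum_b a_{rb} V_b, computed for every relation at once.
w = torch.einsum('rb,bio->rio', a, v)
print(w.shape)  # torch.Size([4, 3, 5])
```

Sharing the bases $$V_b$$ across relations keeps the parameter count linear in the number of bases rather than the number of relations.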

### TAGConv¶

class dgl.nn.pytorch.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)[source]

Bases: torch.nn.modules.module.Module

Topology Adaptive Graph Convolutional layer from paper Topology Adaptive Graph Convolutional Networks.

$\mathbf{X}^{\prime} = \sum_{k=0}^K \left(\mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\right)^{k}\mathbf{X} \mathbf{\Theta}_{k},$

where $$\mathbf{A}$$ denotes the adjacency matrix and $$D_{ii} = \sum_{j} A_{ij}$$ its diagonal degree matrix.

Parameters:
• in_feats (int) – Input feature size.
• out_feats (int) – Output feature size.
• k (int, optional) – Number of hops $$k$$. Default: 2.
• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
lin

torch.Module – The learnable linear module.

forward(graph, feat)[source]

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor

### GATConv¶

class dgl.nn.pytorch.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None)[source]

Bases: torch.nn.modules.module.Module

Apply Graph Attention Network over an input signal.

$h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} h_j^{(l)}$

where $$\alpha_{ij}$$ is the attention score between node $$i$$ and node $$j$$:

\begin{align}\begin{aligned}\alpha_{ij}^{l} & = \mathrm{softmax_i} (e_{ij}^{l})\\e_{ij}^{l} & = \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}\end{align}
Parameters:
• in_feats (int, or pair of ints) – Input feature size. If the layer is to be applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
• out_feats (int) – Output feature size.
• num_heads (int) – Number of heads in multi-head attention.
• feat_drop (float, optional) – Dropout rate on features. Default: 0.
• attn_drop (float, optional) – Dropout rate on attention weights. Default: 0.
• negative_slope (float, optional) – LeakyReLU angle of negative slope. Default: 0.2.
• residual (bool, optional) – If True, use residual connection. Default: False.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
forward(graph, feat)[source]

Compute graph attention network layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.

Returns: The output feature of shape $$(N, H, D_{out})$$ where $$H$$ is the number of heads and $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor
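A minimal single-head sketch of the attention computation above, restricted to the incoming edges of one node. All names and sizes are illustrative, not DGL internals:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_nodes, in_f, out_f = 3, 4, 2
h = torch.randn(num_nodes, in_f)
W = torch.randn(in_f, out_f)
a = torch.randn(2 * out_f)        # the attention vector a

src = torch.tensor([0, 1, 2])     # edges j -> i ...
dst = torch.tensor([2, 2, 2])     # ... all pointing at node 2

z = h @ W                                          # W h
e = F.leaky_relu(torch.cat([z[dst], z[src]], dim=1) @ a,
                 negative_slope=0.2)               # e_ij = LeakyReLU(a^T [W h_i || W h_j])
alpha = torch.softmax(e, dim=0)                    # softmax over node 2's in-edges
out_2 = (alpha.unsqueeze(1) * z[src]).sum(dim=0)   # updated feature of node 2
```

The softmax over `dim=0` works here only because every edge shares the same destination; DGL normalizes per destination node via edge softmax.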

### EdgeConv¶

class dgl.nn.pytorch.conv.EdgeConv(in_feat, out_feat, batch_norm=False)[source]

Bases: torch.nn.modules.module.Module

EdgeConv layer.

Introduced in “Dynamic Graph CNN for Learning on Point Clouds”. Can be described as follows:

$x_i^{(l+1)} = \max_{j \in \mathcal{N}(i)} \mathrm{ReLU}( \Theta \cdot (x_j^{(l)} - x_i^{(l)}) + \Phi \cdot x_i^{(l)})$

where $$\mathcal{N}(i)$$ is the set of neighbors of $$i$$.

Parameters: in_feat (int) – Input feature size. out_feat (int) – Output feature size. batch_norm (bool) – Whether to include batch normalization on messages.
forward(g, h)[source]

Forward computation

Parameters:
• g (DGLGraph) – The graph.
• h (torch.Tensor or pair of torch.Tensor) – $$(N, D)$$ where $$N$$ is the number of nodes and $$D$$ is the number of feature dimensions. If a pair of tensors is given, the graph must be a uni-bipartite graph with only one edge type, and the two tensors must have the same dimensionality on all but the first axis.

Returns: New node features.
Return type: torch.Tensor

### SAGEConv¶

class dgl.nn.pytorch.conv.SAGEConv(in_feats, out_feats, aggregator_type, feat_drop=0.0, bias=True, norm=None, activation=None)[source]

Bases: torch.nn.modules.module.Module

GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.

\begin{align}\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} & = \mathrm{aggregate} \left(\{h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} & = \sigma \left(W \cdot \mathrm{concat} (h_{i}^{l}, h_{\mathcal{N}(i)}^{l+1}) + b \right)\\h_{i}^{(l+1)} & = \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}\end{align}
Parameters:
• in_feats (int, or pair of ints) – Input feature size. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value. If the aggregator type is gcn, the feature sizes of the source and destination nodes are required to be the same.
• out_feats (int) – Output feature size.
• feat_drop (float) – Dropout rate on features. Default: 0.
• aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).
• bias (bool) – If True, adds a learnable bias to the output. Default: True.
• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
forward(graph, feat)[source]

Compute GraphSAGE layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor
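For intuition, the mean-aggregator update for a single node can be sketched as below. This is a toy per-node example; DGL's implementation operates on the whole graph at once:

```python
import torch

torch.manual_seed(0)
h_i = torch.randn(4)           # feature of node i
h_neigh = torch.randn(3, 4)    # features of i's three neighbors
W = torch.randn(8, 2)          # maps concat(4 + 4) -> 2

h_n = h_neigh.mean(dim=0)            # aggregate step ('mean' type)
h_new = torch.cat([h_i, h_n]) @ W    # concat + linear map; add bias/activation/norm as needed
```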

### SGConv¶

class dgl.nn.pytorch.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None)[source]

Bases: torch.nn.modules.module.Module

Simplifying Graph Convolution layer from paper Simplifying Graph Convolutional Networks.

$H^{l+1} = (\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2})^K H^{l} \Theta^{l}$
Parameters:
• in_feats (int) – Number of input features.
• out_feats (int) – Number of output features.
• k (int) – Number of hops $$K$$. Default: 1.
• cached (bool) – If True, the module caches $(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}})^K X\Theta$ at the first forward call. This parameter should only be set to True in the transductive learning setting.
• bias (bool) – If True, adds a learnable bias to the output. Default: True.
• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
forward(graph, feat)[source]

Compute Simplifying Graph Convolution layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor

Notes

If cached is set to True, feat and graph should not change during training, or you will get wrong results.
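What gets cached is the propagated feature matrix. A dense sketch of that precomputation, assuming a small symmetric toy adjacency:

```python
import torch

K = 2
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
a_hat = adj + torch.eye(3)              # add self-loops: A_hat = A + I
deg = a_hat.sum(dim=1)
norm = deg.pow(-0.5)
s = norm.unsqueeze(1) * a_hat * norm.unsqueeze(0)  # D^{-1/2} A_hat D^{-1/2}

x = torch.randn(3, 4)
for _ in range(K):      # S^K X: computed once, then reusable across epochs
    x = s @ x
```

Because the propagation has no nonlinearity, this product can be computed once up front, which is what makes SGC cheap in the transductive setting.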

### APPNPConv¶

class dgl.nn.pytorch.conv.APPNPConv(k, alpha, edge_drop=0.0)[source]

Bases: torch.nn.modules.module.Module

Approximate Personalized Propagation of Neural Predictions layer from paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.

\begin{align}\begin{aligned}H^{0} & = X\\H^{t+1} & = (1-\alpha)\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{t}\right) + \alpha H^{0}\end{aligned}\end{align}
Parameters: k (int) – Number of iterations $$K$$. alpha (float) – The teleport probability $$\alpha$$. edge_drop (float, optional) – Dropout rate on edges that controls the messages received by each node. Default: 0.
forward(graph, feat)[source]

Compute APPNP layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature of shape $$(N, *)$$ where $$N$$ is the number of nodes and $$*$$ could be of any shape.

Returns: The output feature of shape $$(N, *)$$ where $$*$$ is the same as the input shape.
Return type: torch.Tensor
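The propagation scheme above can be sketched directly. Here an identity matrix stands in for the normalized adjacency, which makes $$H^0$$ a fixed point of the update:

```python
import torch

alpha, k = 0.1, 3
s = torch.eye(3)            # stands in for D^{-1/2} A_hat D^{-1/2}
h0 = torch.randn(3, 2)      # H^0: the initial predictions

h = h0
for _ in range(k):
    h = (1 - alpha) * (s @ h) + alpha * h0
# with identity propagation, each step returns H^0 unchanged
```

Each iteration mixes the propagated signal with the initial prediction, so the teleport term keeps the output anchored to $$H^0$$ no matter how many steps are run.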

### GINConv¶

class dgl.nn.pytorch.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)[source]

Bases: torch.nn.modules.module.Module

Graph Isomorphism Network layer from paper How Powerful are Graph Neural Networks?.

$h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)$
Parameters: apply_func (callable activation function/layer or None) – If not None, apply this function to the updated node feature, the $$f_\Theta$$ in the formula. aggregator_type (str) – Aggregator type to use (sum, max or mean). init_eps (float, optional) – Initial $$\epsilon$$ value, default: 0. learn_eps (bool, optional) – If True, $$\epsilon$$ will be a learnable parameter.
forward(graph, feat)[source]

Compute Graph Isomorphism Network layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape $$(N_{in}, D_{in})$$ and $$(N_{out}, D_{in})$$. If apply_func is not None, $$D_{in}$$ should fit the input dimensionality requirement of apply_func.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output dimensionality of apply_func. If apply_func is None, $$D_{out}$$ is the same as the input dimensionality.
Return type: torch.Tensor
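A per-node sketch of the GIN update above, with sum aggregation. The MLP stands in for apply_func and all sizes are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
eps = 0.0
mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))  # f_Theta

h_i = torch.randn(4)           # feature of node i
h_neigh = torch.randn(3, 4)    # features of i's neighbors

# (1 + eps) * h_i + sum over neighbors, then the MLP
h_new = mlp((1 + eps) * h_i + h_neigh.sum(dim=0))
```

Sum aggregation (rather than mean or max) is what gives GIN its injectivity argument in the paper; the other aggregator types trade that for robustness.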

### GatedGraphConv¶

class dgl.nn.pytorch.conv.GatedGraphConv(in_feats, out_feats, n_steps, n_etypes, bias=True)[source]

Bases: torch.nn.modules.module.Module

Gated Graph Convolution layer from paper Gated Graph Sequence Neural Networks.

\begin{align}\begin{aligned}h_{i}^{0} & = [ x_i \| \mathbf{0} ]\\a_{i}^{t} & = \sum_{j\in\mathcal{N}(i)} W_{e_{ij}} h_{j}^{t}\\h_{i}^{t+1} & = \mathrm{GRU}(a_{i}^{t}, h_{i}^{t})\end{aligned}\end{align}
Parameters: in_feats (int) – Input feature size. out_feats (int) – Output feature size. n_steps (int) – Number of recurrent steps. n_etypes (int) – Number of edge types. bias (bool) – If True, adds a learnable bias to the output. Default: True.
forward(graph, feat, etypes)[source]

Compute Gated Graph Convolution layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature of shape $$(N, D_{in})$$ where $$N$$ is the number of nodes of the graph and $$D_{in}$$ is the input feature size.
• etypes (torch.LongTensor) – The edge type tensor of shape $$(E,)$$ where $$E$$ is the number of edges of the graph.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output feature size.
Return type: torch.Tensor

### GMMConv¶

class dgl.nn.pytorch.conv.GMMConv(in_feats, out_feats, dim, n_kernels, aggregator_type='sum', residual=False, bias=True)[source]

Bases: torch.nn.modules.module.Module

The Gaussian Mixture Model Convolution layer from Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs.

\begin{align}\begin{aligned}h_i^{l+1} & = \mathrm{aggregate}\left(\left\{\frac{1}{K} \sum_{k}^{K} w_k(u_{ij}) h_j^{l}, \forall j\in \mathcal{N}(i)\right\}\right)\\w_k(u) & = \exp\left(-\frac{1}{2}(u-\mu_k)^T \Sigma_k^{-1} (u - \mu_k)\right)\end{aligned}\end{align}
Parameters:
• in_feats (int) – Number of input features.
• out_feats (int) – Number of output features.
• dim (int) – Dimensionality of the pseudo-coordinates.
• n_kernels (int) – Number of kernels $$K$$.
• aggregator_type (str) – Aggregator type (sum, mean, max).
• residual (bool) – If True, use residual connection inside this layer. Default: False.
• bias (bool) – If True, adds a learnable bias to the output. Default: True.
forward(graph, feat, pseudo)[source]

Compute Gaussian Mixture Model Convolution layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – If a single tensor is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes. If a pair of tensors is given, the pair must contain two tensors of shape $$(N_{in}, D_{in_{src}})$$ and $$(N_{out}, D_{in_{dst}})$$.
• pseudo (torch.Tensor) – The pseudo-coordinate tensor of shape $$(E, D_{u})$$ where $$E$$ is the number of edges of the graph and $$D_{u}$$ is the dimensionality of the pseudo-coordinates.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output feature size.
Return type: torch.Tensor

### ChebConv¶

class dgl.nn.pytorch.conv.ChebConv(in_feats, out_feats, k, bias=True)[source]

Bases: torch.nn.modules.module.Module

Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.

\begin{align}\begin{aligned}h_i^{l+1} &= \sum_{k=0}^{K-1} W^{k, l}z_i^{k, l}\\Z^{0, l} &= H^{l}\\Z^{1, l} &= \hat{L} \cdot H^{l}\\Z^{k, l} &= 2 \cdot \hat{L} \cdot Z^{k-1, l} - Z^{k-2, l}\\\hat{L} &= 2\left(I - \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}\right)/\lambda_{max} - I\end{aligned}\end{align}
Parameters: in_feats (int) – Number of input features. out_feats (int) – Number of output features. k (int) – Chebyshev filter size. bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
forward(graph, feat, lambda_max=None)[source]

Compute ChebNet layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes.
• lambda_max (list or tensor or None, optional) – A list (or tensor) of length $$B$$, storing the largest eigenvalue of the normalized Laplacian of each individual graph in graph, where $$B$$ is the batch size of the input graph. Default: None. If None, this method computes the list by calling dgl.laplacian_lambda_max.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor

### AGNNConv¶

class dgl.nn.pytorch.conv.AGNNConv(init_beta=1.0, learn_beta=True)[source]

Bases: torch.nn.modules.module.Module

Attention-based Graph Neural Network layer from paper Attention-based Graph Neural Network for Semi-Supervised Learning.

$H^{l+1} = P H^{l}$

where $$P$$ is computed as:

$P_{ij} = \mathrm{softmax}_i ( \beta \cdot \cos(h_i^l, h_j^l))$
Parameters:
• init_beta (float, optional) – The $$\beta$$ in the formula.
• learn_beta (bool, optional) – If True, $$\beta$$ will be a learnable parameter.
forward(graph, feat)[source]

Compute AGNN layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor or pair of torch.Tensor) – The input feature of shape $$(N, *)$$ where $$N$$ is the number of nodes and $$*$$ could be of any shape. If a pair of torch.Tensor is given, the pair must contain two tensors of shape $$(N_{in}, *)$$ and $$(N_{out}, *)$$, and the $$*$$ in the latter tensor must equal that in the former.

Returns: The output feature of shape $$(N, *)$$ where $$*$$ is the same as the input shape.
Return type: torch.Tensor

### NNConv¶

class dgl.nn.pytorch.conv.NNConv(in_feats, out_feats, edge_func, aggregator_type, residual=False, bias=True)[source]

Bases: torch.nn.modules.module.Module

Graph Convolution layer introduced in Neural Message Passing for Quantum Chemistry.

$h_{i}^{l+1} = h_{i}^{l} + \mathrm{aggregate}\left(\left\{ f_\Theta (e_{ij}) \cdot h_j^{l}, j\in \mathcal{N}(i) \right\}\right)$
Parameters:
• in_feats (int) – Input feature size. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
• out_feats (int) – Output feature size.
• edge_func (callable activation function/layer) – Maps each edge feature to a vector of shape (in_feats * out_feats), used as the weight to compute messages; this is the $$f_\Theta$$ in the formula.
• aggregator_type (str) – Aggregator type to use (sum, mean or max).
• residual (bool, optional) – If True, use residual connection. Default: False.
• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
forward(graph, feat, efeat)[source]

Compute MPNN Graph Convolution layer.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor or pair of torch.Tensor) – The input feature of shape $$(N, D_{in})$$ where $$N$$ is the number of nodes of the graph and $$D_{in}$$ is the input feature size.
• efeat (torch.Tensor) – The edge feature of shape $$(E, *)$$; it should fit the input shape requirement of edge_func.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the output feature size.
Return type: torch.Tensor

### AtomicConv¶

class dgl.nn.pytorch.conv.AtomicConv(interaction_cutoffs, rbf_kernel_means, rbf_kernel_scaling, features_to_use=None)[source]

Bases: torch.nn.modules.module.Module

Atomic Convolution Layer from paper Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity.

We denote the type of atom $$i$$ by $$z_i$$ and the distance between atom $$i$$ and $$j$$ by $$r_{ij}$$.

Distance Transformation

An atomic convolution layer first transforms distances with radial filters and then performs a pooling operation.

For the radial filter indexed by $$k$$, it projects edge distances with

$h_{ij}^{k} = \exp(-\gamma_{k}|r_{ij}-r_{k}|^2)$

If $$r_{ij} < c_k$$,

$f_{ij}^{k} = 0.5 \left(\cos\left(\frac{\pi r_{ij}}{c_k}\right) + 1\right),$

else,

$f_{ij}^{k} = 0.$

Finally,

$e_{ij}^{k} = h_{ij}^{k} * f_{ij}^{k}$

Aggregation

For each type $$t$$, each atom collects distance information from all neighbor atoms of type $$t$$:

$p_{i, t}^{k} = \sum_{j\in N(i)} e_{ij}^{k} * 1(z_j == t)$

We concatenate the results for all RBF kernels and atom types.

Notes

• This convolution operation is designed for molecular graphs in Chemistry, but it might be possible to extend it to more general graphs.
• There seems to be an inconsistency about the definition of $$e_{ij}^{k}$$ in the paper and the author’s implementation. We follow the author’s implementation. In the paper, $$e_{ij}^{k}$$ was defined as $$\exp(-\gamma_{k}|r_{ij}-r_{k}|^2 * f_{ij}^{k})$$.
• $$\gamma_{k}$$, $$r_k$$ and $$c_k$$ are all learnable.
Parameters:
• interaction_cutoffs (float32 tensor of shape (K)) – $$c_k$$ in the equations above. Roughly, they can be considered as learnable cutoffs: two atoms are considered connected if the distance between them is smaller than the cutoff. K is the number of radial filters.
• rbf_kernel_means (float32 tensor of shape (K)) – $$r_k$$ in the equations above. K is the number of radial filters.
• rbf_kernel_scaling (float32 tensor of shape (K)) – $$\gamma_k$$ in the equations above. K is the number of radial filters.
• features_to_use (None or float tensor of shape (T)) – In the original paper, these are the atomic numbers to consider, representing the types of atoms. T is the number of atomic number types. Defaults to None.
forward(graph, feat, distances)[source]

Apply the atomic convolution layer.

Parameters:
• graph (DGLGraph) – Topology based on which message passing is performed.
• feat (Float32 tensor of shape (V, 1)) – Initial node features, which are atomic numbers in the paper. V is the number of nodes.
• distances (Float32 tensor of shape (E, 1)) – Distance between the end nodes of edges. E is the number of edges.

Returns: Updated node representations, where V is the number of nodes, K the number of radial filters, and T the number of atomic number types.
Return type: Float32 tensor of shape (V, K * T)

## Dense Conv Layers¶

### DenseGraphConv¶

class dgl.nn.pytorch.conv.DenseGraphConv(in_feats, out_feats, norm='both', bias=True, activation=None)[source]

Bases: torch.nn.modules.module.Module

Graph Convolutional Network layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying graph convolution on dense graphs.

Parameters:
• in_feats (int) – Input feature size.
• out_feats (int) – Output feature size.
• norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the $$c_{ij}$$ in the paper is applied.
• bias (bool) – If True, adds a learnable bias to the output. Default: True.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
forward(adj, feat)[source]

Compute (Dense) Graph Convolution layer.

Parameters:
• adj (torch.Tensor) – The adjacency matrix of the graph to apply Graph Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape $$(N_{out}, N_{in})$$; when applied to a homogeneous graph, adj should be of shape $$(N, N)$$. In both cases, a row represents a destination node and a column represents a source node.
• feat (torch.Tensor) – The input feature.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor

### DenseSAGEConv¶

class dgl.nn.pytorch.conv.DenseSAGEConv(in_feats, out_feats, feat_drop=0.0, bias=True, norm=None, activation=None)[source]

Bases: torch.nn.modules.module.Module

GraphSAGE layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying GraphSAGE on dense graphs.

Note that only the gcn aggregator is supported in DenseSAGEConv.

Parameters:
• in_feats (int) – Input feature size.
• out_feats (int) – Output feature size.
• feat_drop (float, optional) – Dropout rate on features. Default: 0.
• bias (bool) – If True, adds a learnable bias to the output. Default: True.
• norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
forward(adj, feat)[source]

Compute (Dense) Graph SAGE layer.

Parameters:
• adj (torch.Tensor) – The adjacency matrix of the graph to apply SAGE convolution on. When applied to a unidirectional bipartite graph, adj should be of shape $$(N_{out}, N_{in})$$; when applied to a homogeneous graph, adj should be of shape $$(N, N)$$. In both cases, a row represents a destination node and a column represents a source node.
• feat (torch.Tensor or a pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape $$(N_{in}, D_{in})$$ and $$(N_{out}, D_{in})$$.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor

### DenseChebConv¶

class dgl.nn.pytorch.conv.DenseChebConv(in_feats, out_feats, k, bias=True)[source]

Bases: torch.nn.modules.module.Module

Chebyshev Spectral Graph Convolution layer from paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.

We recommend using this module when applying ChebConv on dense graphs.

Parameters: in_feats (int) – Number of input features. out_feats (int) – Number of output features. k (int) – Chebyshev filter size. bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
forward(adj, feat, lambda_max=None)[source]

Compute (Dense) Chebyshev Spectral Graph Convolution layer.

Parameters:
• adj (torch.Tensor) – The adjacency matrix of the graph to apply Graph Convolution on, should be of shape $$(N, N)$$, where a row represents the destination and a column represents the source.
• feat (torch.Tensor) – The input feature of shape $$(N, D_{in})$$ where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes.
• lambda_max (float or None, optional) – A float value indicating the largest eigenvalue of the given graph. Default: None.

Returns: The output feature of shape $$(N, D_{out})$$ where $$D_{out}$$ is the size of the output feature.
Return type: torch.Tensor

## Global Pooling Layers¶

Torch modules for graph global pooling.

### SumPooling¶

class dgl.nn.pytorch.glob.SumPooling[source]

Bases: torch.nn.modules.module.Module

Apply sum pooling over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k$
forward(graph, feat)[source]

Compute sum pooling.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, *)$$, where $$B$$ is the batch size.
Return type: torch.Tensor
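On a batched graph, sum pooling amounts to a segment sum over each graph's nodes. A sketch with an explicit graph-id vector (the assignment DGL tracks internally for a batched graph):

```python
import torch

feat = torch.arange(10, dtype=torch.float32).view(5, 2)  # 5 nodes, 2 features
graph_id = torch.tensor([0, 0, 0, 1, 1])                 # batch of 2 graphs

readout = torch.zeros(2, 2)
readout.index_add_(0, graph_id, feat)  # per-graph segment sum
print(readout)  # rows: graph 0 -> [6., 9.], graph 1 -> [14., 16.]
```

Replacing the sum with a count-normalized sum or a segment max yields the AvgPooling and MaxPooling readouts below.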

### AvgPooling¶

class dgl.nn.pytorch.glob.AvgPooling[source]

Bases: torch.nn.modules.module.Module

Apply average pooling over the nodes in the graph.

$r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k$
forward(graph, feat)[source]

Compute average pooling.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, *)$$, where $$B$$ is the batch size.
Return type: torch.Tensor

### MaxPooling¶

class dgl.nn.pytorch.glob.MaxPooling[source]

Bases: torch.nn.modules.module.Module

Apply max pooling over the nodes in the graph.

$r^{(i)} = \max_{k=1}^{N_i}\left( x^{(i)}_k \right)$
forward(graph, feat)[source]

Compute max pooling.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature with shape $$(N, *)$$ where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, *)$$, where $$B$$ is the batch size.
Return type: torch.Tensor

### SortPooling¶

class dgl.nn.pytorch.glob.SortPooling(k)[source]

Bases: torch.nn.modules.module.Module

Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in the graph.

Parameters: k (int) – The number of nodes to hold for each graph.
forward(graph, feat)[source]

Compute sort pooling.

Parameters:
• graph (DGLGraph) – The graph.
• feat (torch.Tensor) – The input feature with shape $$(N, D)$$ where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, k * D)$$, where $$B$$ is the batch size.
Return type: torch.Tensor
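For a single graph, sort pooling can be sketched as sorting nodes by their last feature channel (as in the paper) and keeping the top k, which yields a fixed-size readout regardless of graph size:

```python
import torch

k = 2
feat = torch.tensor([[1., 5.],
                     [3., 2.],
                     [0., 9.]])
order = feat[:, -1].argsort(descending=True)  # sort nodes by last channel
pooled = feat[order][:k].reshape(-1)          # fixed-size (k * D,) readout
print(pooled)  # tensor([0., 9., 1., 5.])
```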

### GlobalAttentionPooling¶

class dgl.nn.pytorch.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)[source]

Bases: torch.nn.modules.module.Module

Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)$
Parameters: gate_nn (torch.nn.Module) – A neural network that computes attention scores for each feature. feat_nn (torch.nn.Module, optional) – A neural network applied to each feature before combining them with attention scores.
forward(graph, feat)[source]

Compute global attention pooling.

Parameters: graph (DGLGraph) – The graph. feat (torch.Tensor) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, D)$$, where $$B$$ refers to the batch size.

Return type: torch.Tensor
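
The readout for one graph can be sketched in plain Python: score each node with the gate function, softmax the scores into attention weights, then take the weighted sum of the (optionally transformed) features. This is an illustrative sketch, with a plain callable standing in for `gate_nn` and `feat_nn`:

```python
import math

def gap_readout(feats, gate, feat_fn=lambda v: v):
    """Global attention readout for a single graph.

    gate: scores a node's feature vector (stand-in for gate_nn)
    feat_fn: transforms features before combining (stand-in for feat_nn)
    """
    scores = [gate(v) for v in feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]       # numerically stable softmax
    alphas = [e / sum(exps) for e in exps]
    dim = len(feats[0])
    # attention-weighted sum of node features
    return [sum(a * feat_fn(v)[d] for a, v in zip(alphas, feats))
            for d in range(dim)]

# Gate scores are the first feature channel here
r = gap_readout([[1.0, 0.0], [0.0, 1.0]], gate=lambda v: v[0])
```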

### Set2Set¶

class dgl.nn.pytorch.glob.Set2Set(input_dim, n_iters, n_layers)[source]

Bases: torch.nn.modules.module.Module

Apply Set2Set (Order Matters: Sequence to sequence for sets) over the nodes in the graph.

For each individual graph in the batch, set2set computes

\begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align}

for this graph.

Parameters: input_dim (int) – Size of each input sample n_iters (int) – Number of iterations. n_layers (int) – Number of recurrent layers.
forward(graph, feat)[source]

Compute set2set pooling.

Parameters: graph (DGLGraph) – The graph. feat (torch.Tensor) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, D)$$, where $$B$$ refers to the batch size.

Return type: torch.Tensor
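
A single Set2Set iteration for one graph can be sketched in plain Python, with the query $$q_t$$ supplied directly (in the real module it is produced by the LSTM): compute attention weights over the nodes, form the readout $$r_t$$, and concatenate to get $$q^*_t$$. The helper below is an illustrative sketch, not the module's implementation:

```python
import math

def set2set_step(feats, q):
    """One Set2Set readout step for a given query q.

    alpha_i = softmax(x_i . q), r = sum_i alpha_i x_i, returns q || r.
    """
    logits = [sum(x * qd for x, qd in zip(v, q)) for v in feats]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    alphas = [e / sum(exps) for e in exps]
    dim = len(q)
    r = [sum(a * v[d] for a, v in zip(alphas, feats)) for d in range(dim)]
    return q + r  # concatenation q_t || r_t

qstar = set2set_step([[1.0, 0.0], [0.0, 1.0]], q=[0.0, 0.0])
# a zero query attends uniformly, so r = [0.5, 0.5]
```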

### SetTransformerEncoder¶

class dgl.nn.pytorch.glob.SetTransformerEncoder(d_model, n_heads, d_head, d_ff, n_layers=1, block_type='sab', m=None, dropouth=0.0, dropouta=0.0)[source]

Bases: torch.nn.modules.module.Module

The Encoder module in Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks.

Parameters: d_model (int) – Hidden size of the model. n_heads (int) – Number of heads. d_head (int) – Hidden size of each head. d_ff (int) – Kernel size in FFN (Positionwise Feed-Forward Network) layer. n_layers (int) – Number of layers. block_type (str) – Building block type: ‘sab’ (Set Attention Block) or ‘isab’ (Induced Set Attention Block). m (int or None) – Number of induced vectors in ISAB Block, set to None if block type is ‘sab’. dropouth (float) – Dropout rate of each sublayer. dropouta (float) – Dropout rate of attention heads.
forward(graph, feat)[source]

Compute the Encoder part of Set Transformer.

Parameters: graph (DGLGraph) – The graph. feat (torch.Tensor) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(N, D)$$.

Return type: torch.Tensor

### SetTransformerDecoder¶

class dgl.nn.pytorch.glob.SetTransformerDecoder(d_model, num_heads, d_head, d_ff, n_layers, k, dropouth=0.0, dropouta=0.0)[source]

Bases: torch.nn.modules.module.Module

The Decoder module in Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks.

Parameters: d_model (int) – Hidden size of the model. num_heads (int) – Number of heads. d_head (int) – Hidden size of each head. d_ff (int) – Kernel size in FFN (Positionwise Feed-Forward Network) layer. n_layers (int) – Number of layers. k (int) – Number of seed vectors in PMA (Pooling by Multihead Attention) layer. dropouth (float) – Dropout rate of each sublayer. dropouta (float) – Dropout rate of attention heads.
forward(graph, feat)[source]

Compute the decoder part of Set Transformer.

Parameters: graph (DGLGraph) – The graph. feat (torch.Tensor) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(B, D)$$, where $$B$$ refers to the batch size.

Return type: torch.Tensor

## Utility Modules¶

### Sequential¶

class dgl.nn.pytorch.utils.Sequential(*args)[source]

Bases: torch.nn.modules.container.Sequential

A sequential container for stacking graph neural network modules.

We support two modes: sequentially applying GNN modules either on the same graph, or on a list of given graphs. In the second case, the number of graphs must equal the number of modules inside this container.

Parameters: *args – Sub-modules of type torch.nn.Module, will be added to the container in the order they are passed in the constructor.

Examples

Mode 1: sequentially apply GNN modules on the same graph

>>> import torch
>>> import dgl
>>> import torch.nn as nn
>>> import dgl.function as fn
>>> from dgl.nn.pytorch import Sequential
>>> class ExampleLayer(nn.Module):
>>>     def __init__(self):
>>>         super().__init__()
>>>     def forward(self, graph, n_feat, e_feat):
>>>         graph = graph.local_var()
>>>         graph.ndata['h'] = n_feat
>>>         graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
>>>         n_feat += graph.ndata['h']
>>>         graph.apply_edges(fn.u_add_v('h', 'h', 'e'))
>>>         e_feat += graph.edata['e']
>>>         return n_feat, e_feat
>>>
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> net = Sequential(ExampleLayer(), ExampleLayer(), ExampleLayer())
>>> n_feat = torch.rand(3, 4)
>>> e_feat = torch.rand(9, 4)
>>> net(g, n_feat, e_feat)
(tensor([[39.8597, 45.4542, 25.1877, 30.8086],
         [40.7095, 45.3985, 25.4590, 30.0134],
         [40.7894, 45.2556, 25.5221, 30.4220]]),
 tensor([[80.3772, 89.7752, 50.7762, 60.5520],
         [80.5671, 89.3736, 50.6558, 60.6418],
         [80.4620, 89.5142, 50.3643, 60.3126],
         [80.4817, 89.8549, 50.9430, 59.9108],
         [80.2284, 89.6954, 50.0448, 60.1139],
         [79.7846, 89.6882, 50.5097, 60.6213],
         [80.2654, 90.2330, 50.2787, 60.6937],
         [80.3468, 90.0341, 50.2062, 60.2659],
         [80.0556, 90.2789, 50.2882, 60.5845]]))


Mode 2: sequentially apply GNN modules on different graphs

>>> import torch
>>> import dgl
>>> import torch.nn as nn
>>> import dgl.function as fn
>>> import networkx as nx
>>> from dgl.nn.pytorch import Sequential
>>> class ExampleLayer(nn.Module):
>>>     def __init__(self):
>>>         super().__init__()
>>>     def forward(self, graph, n_feat):
>>>         graph = graph.local_var()
>>>         graph.ndata['h'] = n_feat
>>>         graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
>>>         n_feat += graph.ndata['h']
>>>         return n_feat.view(graph.number_of_nodes() // 2, 2, -1).sum(1)
>>>
>>> g1 = dgl.DGLGraph(nx.erdos_renyi_graph(32, 0.05))
>>> g2 = dgl.DGLGraph(nx.erdos_renyi_graph(16, 0.2))
>>> g3 = dgl.DGLGraph(nx.erdos_renyi_graph(8, 0.8))
>>> net = Sequential(ExampleLayer(), ExampleLayer(), ExampleLayer())
>>> n_feat = torch.rand(32, 4)
>>> net([g1, g2, g3], n_feat)
tensor([[209.6221, 225.5312, 193.8920, 220.1002],
        [250.0169, 271.9156, 240.2467, 267.7766],
        [220.4007, 239.7365, 213.8648, 234.9637],
        [196.4630, 207.6319, 184.2927, 208.7465]])

forward(graph, *feats)[source]

Sequentially apply modules to the input.

Parameters: graph (DGLGraph or list of DGLGraphs) – The graph(s) to apply the modules on. *feats – Input features. The output of the $$i$$-th module should match the input of the $$(i+1)$$-th module.

### KNNGraph¶

class dgl.nn.pytorch.factory.KNNGraph(k)[source]

Bases: torch.nn.modules.module.Module

Layer that transforms one point set into a graph, or a batch of point sets with the same number of points into a union of those graphs.

If a batch of point sets is provided, then point $$j$$ in point set $$i$$ is mapped to graph node ID $$i \times M + j$$, where $$M$$ is the number of points in each point set.

The predecessors of each node are the k-nearest neighbors of the corresponding point.

Parameters: k (int) – The number of neighbors
forward(x)[source]

Forward computation.

Parameters: x (Tensor) – The point coordinates, of shape $$(M, D)$$ or $$(N, M, D)$$, where $$N$$ is the number of point sets, $$M$$ is the number of points in each point set, and $$D$$ is the feature size.

Returns: A DGLGraph with no features.

Return type: DGLGraph
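
The edge construction can be sketched with a brute-force nearest-neighbour search in plain Python: for each point, find its $$k$$ nearest points (a point is its own nearest neighbour at distance zero) and add edges from them to it. This is an illustrative sketch of the rule "the predecessors of each node are the k-nearest neighbors of the corresponding point", not the layer's implementation:

```python
def knn_edges(points, k):
    """Brute-force k-NN edge list for one point set.

    Returns (src, dst) pairs where src is among dst's k nearest points.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    edges = []
    for j, p in enumerate(points):
        # indices of the k points closest to p (p itself included)
        nbrs = sorted(range(len(points)),
                      key=lambda i: sqdist(points[i], p))[:k]
        edges += [(i, j) for i in nbrs]  # neighbour -> point
    return edges

e = knn_edges([[0.0], [1.0], [10.0]], k=2)
```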

### SegmentedKNNGraph¶

class dgl.nn.pytorch.factory.SegmentedKNNGraph(k)[source]

Bases: torch.nn.modules.module.Module

Layer that transforms one point set into a graph, or a batch of point sets with different numbers of points into a union of those graphs.

If a batch of point sets is provided, then point $$j$$ in point set $$i$$ is mapped to graph node ID $$\sum_{p<i} |V_p| + j$$, where $$|V_p|$$ means the number of points in point set $$p$$.

The predecessors of each node are the k-nearest neighbors of the corresponding point.

Parameters: k (int) – The number of neighbors
forward(x, segs)[source]

Forward computation.

Parameters: x (Tensor) – The point coordinates, of shape $$(M, D)$$, where $$M$$ is the total number of points in all point sets. segs (iterable of int) – $$(N)$$ integers, where $$N$$ is the number of point sets; the elements must sum up to $$M$$.

Returns: A DGLGraph with no features.

Return type: DGLGraph
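
The node ID mapping $$\sum_{p<i} |V_p| + j$$ is just an offset by the sizes of the preceding point sets. A small illustrative sketch (helper name is made up):

```python
def node_id(segs, i, j):
    """Global node ID of point j in point set i, when point sets of
    sizes `segs` are packed into one graph: the sizes of all preceding
    sets, plus the local index."""
    return sum(segs[:i]) + j

# three point sets with 4, 2 and 3 points
ids = [node_id([4, 2, 3], i, 0) for i in range(3)]
# first node of each set -> [0, 4, 6]
```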

### Edge Softmax¶

Torch modules for graph related softmax.

dgl.nn.pytorch.softmax.edge_softmax(graph, logits, eids='__ALL__')[source]

Compute edge softmax.

For a node $$i$$, edge softmax is an operation of computing

$a_{ij} = \frac{\exp(z_{ij})}{\sum_{j\in\mathcal{N}(i)}\exp(z_{ij})}$

where $$z_{ij}$$ is a signal of edge $$j\rightarrow i$$, also called logits in the context of softmax. $$\mathcal{N}(i)$$ is the set of nodes that have an edge to $$i$$.

An example of using edge softmax is in Graph Attention Network where the attention weights are computed with such an edge softmax operation.

Parameters: graph (DGLGraph) – The graph to perform edge softmax on. logits (torch.Tensor) – The input edge feature. eids (torch.Tensor or ALL, optional) – The edges on which to apply edge softmax. If ALL, apply edge softmax on all edges in the graph. Default: ALL.

Returns: Softmax value.

Return type: Tensor

Notes

• Input shape: $$(E, *, 1)$$ where * means any number of additional dimensions, $$E$$ equals the length of eids. If eids is ALL, $$E$$ equals number of edges in the graph.
• Return shape: $$(E, *, 1)$$

Examples

>>> from dgl.nn.pytorch.softmax import edge_softmax
>>> import dgl
>>> import torch as th


Create a DGLGraph object and initialize its edge features.

>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2])
>>> edata = th.ones(6, 1).float()
>>> edata
tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])


Apply edge softmax on g:

>>> edge_softmax(g, edata)
tensor([[1.0000],
        [0.5000],
        [0.3333],
        [0.5000],
        [0.3333],
        [0.3333]])


Apply edge softmax on first 4 edges of g:

>>> edge_softmax(g, edata[:4], th.tensor([0, 1, 2, 3]))
tensor([[1.0000],
        [0.5000],
        [1.0000],
        [0.5000]])