NN Modules (PyTorch)
Conv Layers
Torch modules for graph convolutions.
GraphConv

class dgl.nn.pytorch.conv.GraphConv(in_feats, out_feats, norm='both', weight=True, bias=True, activation=None, allow_zero_in_degree=False)
Bases: torch.nn.modules.module.Module
Graph convolution was introduced in GCN and is mathematically defined as follows:
\[h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ji}}h_j^{(l)}W^{(l)})\]where \(\mathcal{N}(i)\) is the set of neighbors of node \(i\), \(c_{ji}\) is the product of the square roots of the node degrees (i.e., \(c_{ji} = \sqrt{|\mathcal{N}(j)|}\sqrt{|\mathcal{N}(i)|}\)), and \(\sigma\) is an activation function.
If a weight tensor on each edge is provided, the weighted graph convolution is defined as:
\[h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{e_{ji}}{c_{ji}}h_j^{(l)}W^{(l)})\]where \(e_{ji}\) is the scalar weight on the edge from node \(j\) to node \(i\). This is NOT equivalent to the weighted graph convolutional network formulation in the paper.
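The normalized sum inside the two formulas above can be sketched in plain Python, without DGL. This is only an illustration under stated assumptions: features are scalars, \(W\), \(b\) and \(\sigma\) are left out, and for a directed graph the degree of the sender \(j\) is taken over its outgoing edges while the degree of the receiver \(i\) is taken over its incoming edges. The helper name `aggregate` is hypothetical.

```python
import math

# Example graph used throughout this section, as (source, destination) pairs,
# plus one self-loop per node (what dgl.add_self_loop would append).
src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
num_nodes = 6
edges = list(zip(src, dst)) + [(n, n) for n in range(num_nodes)]

# Assumption: sender degree counts outgoing edges, receiver degree counts
# incoming edges.
out_deg = [0] * num_nodes
in_deg = [0] * num_nodes
for j, i in edges:
    out_deg[j] += 1
    in_deg[i] += 1


def aggregate(h, edge_weight=None):
    """Return sum_j (e_ji / c_ji) * h_j for every node i (scalar features)."""
    if edge_weight is None:
        edge_weight = [1.0] * len(edges)  # unweighted case: e_ji = 1
    out = [0.0] * num_nodes
    for (j, i), e in zip(edges, edge_weight):
        c_ji = math.sqrt(out_deg[j]) * math.sqrt(in_deg[i])
        out[i] += e / c_ji * h[j]
    return out


agg = aggregate([1.0] * num_nodes)
print([round(a, 4) for a in agg])
```

With all-ones features, the ratios between these per-node sums agree with the ratios between rows of the Case 1 output in the Examples (for instance 1.3326 / 1.4673 for nodes 0 and 1), which is what one would expect when every input row is identical and the bias is zero.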
To customize the normalization term \(c_{ji}\), one can first set norm='none' for the model, and send the pre-normalized \(e_{ji}\) to the forward computation. We provide EdgeWeightNorm to normalize scalar edge weights following the GCN paper.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
norm (str, optional) – How to apply the normalizer. Can be one of the following values:
right, to divide the aggregated messages by each node’s in-degree, which is equivalent to averaging the received messages.
none, where no normalization is applied.
both (default), where the messages are scaled with \(1/c_{ji}\) above, equivalent to symmetric normalization.
left, to divide the messages sent out from each node by its out-degree, equivalent to random walk normalization.
weight (bool, optional) – If True, apply a linear layer. Otherwise, aggregate the messages without a weight matrix.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to those nodes. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets the users handle it by themselves. Default: False.

weight
The learnable weight tensor.
 Type
torch.Tensor

bias
The learnable bias tensor.
 Type
torch.Tensor
Note
Zero in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function will be applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice to handle this is to filter out the nodes with zero in-degree when using the output of the conv layer.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import GraphConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = GraphConv(10, 2, norm='both', weight=True, bias=True)
>>> res = conv(g, feat)
>>> print(res)
tensor([[ 1.3326, 0.2797],
        [ 1.4673, 0.3080],
        [ 1.3326, 0.2797],
        [ 1.6871, 0.3541],
        [ 1.7711, 0.3717],
        [ 1.0375, 0.2178]], grad_fn=<AddBackward0>)
>>> # allow_zero_in_degree example
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> conv = GraphConv(10, 2, norm='both', weight=True, bias=True, allow_zero_in_degree=True)
>>> res = conv(g, feat)
>>> print(res)
tensor([[0.2473, 0.4631],
        [0.3497, 0.6549],
        [0.3497, 0.6549],
        [0.4221, 0.7905],
        [0.3497, 0.6549],
        [ 0.0000, 0.0000]], grad_fn=<AddBackward0>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('_U', '_E', '_V') : (u, v)})
>>> u_fea = th.rand(2, 5)
>>> v_fea = th.rand(4, 5)
>>> conv = GraphConv(5, 2, norm='both', weight=True, bias=True)
>>> res = conv(g, (u_fea, v_fea))
>>> res
tensor([[0.2994, 0.6106],
        [0.4482, 0.5540],
        [0.5287, 0.8235],
        [0.2994, 0.6106]], grad_fn=<AddBackward0>)

forward(graph, feat, weight=None, edge_weight=None)
Compute graph convolution.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, it represents the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of torch.Tensor is given, which is the case for bipartite graph, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
weight (torch.Tensor, optional) – Optional external weight tensor.
edge_weight (torch.Tensor, optional) – Optional tensor on the edges. If given, each message is scaled by the weight of the edge it travels along.
 Returns
The output feature
 Return type
torch.Tensor
 Raises
DGLError – Case 1: If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True. Case 2: An external weight is provided while the module has also defined its own weight parameter.
Note
Input shape: \((N, *, \text{in_feats})\) where * means any number of additional dimensions, \(N\) is the number of nodes.
Output shape: \((N, *, \text{out_feats})\) where all but the last dimension are the same shape as the input.
Weight shape: \((\text{in_feats}, \text{out_feats})\).

reset_parameters()
Reinitialize learnable parameters.
Note
The model parameters are initialized as in the original implementation where the weight \(W^{(l)}\) is initialized using Glorot uniform initialization and the bias is initialized to be zero.
EdgeWeightNorm

class dgl.nn.pytorch.conv.EdgeWeightNorm(norm='both', eps=0.0)
Bases: torch.nn.modules.module.Module
This module normalizes positive scalar edge weights on a graph following the form in GCN.
Mathematically, setting norm='both' yields the following normalization term:
\[c_{ji} = (\sqrt{\sum_{k\in\mathcal{N}(j)}e_{jk}}\sqrt{\sum_{k\in\mathcal{N}(i)}e_{ki}})\]
And setting norm='right' yields the following normalization term:
\[c_{ji} = (\sum_{k\in\mathcal{N}(i)}e_{ki})\]where \(e_{ji}\) is the scalar weight on the edge from node \(j\) to node \(i\).
The module returns the normalized weight \(e_{ji} / c_{ji}\).
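As a worked illustration of the two normalization terms above, the following plain-Python sketch computes \(e_{ji} / c_{ji}\) edge by edge. It does not call DGL; the helper name `normalize_edge_weight` is hypothetical, and the sketch assumes that the first sum runs over the outgoing edges of \(j\) and the second over the incoming edges of \(i\).

```python
import math

# Example graph with self-loops and the edge weights used in this section.
src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
num_nodes = 6
edges = list(zip(src, dst)) + [(n, n) for n in range(num_nodes)]
edge_weight = [0.5, 0.6, 0.4, 0.7, 0.9, 0.1, 1, 1, 1, 1, 1, 1]


def normalize_edge_weight(edges, edge_weight, norm="both"):
    """Return e_ji / c_ji for every edge (hypothetical helper)."""
    out_sum = [0.0] * num_nodes  # sum of weights on edges leaving each node
    in_sum = [0.0] * num_nodes   # sum of weights on edges entering each node
    for (j, i), e in zip(edges, edge_weight):
        out_sum[j] += e
        in_sum[i] += e
    normalized = []
    for (j, i), e in zip(edges, edge_weight):
        if norm == "both":
            c_ji = math.sqrt(out_sum[j]) * math.sqrt(in_sum[i])
        elif norm == "right":
            c_ji = in_sum[i]
        else:
            raise ValueError("norm must be 'both' or 'right'")
        normalized.append(e / c_ji)
    return normalized


both = normalize_edge_weight(edges, edge_weight, norm="both")
right = normalize_edge_weight(edges, edge_weight, norm="right")
```

With norm='right', the normalized weights entering any node sum to one, so the subsequent aggregation becomes a weighted average of the incoming messages.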
 Parameters
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import EdgeWeightNorm, GraphConv

>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> edge_weight = th.tensor([0.5, 0.6, 0.4, 0.7, 0.9, 0.1, 1, 1, 1, 1, 1, 1])
>>> norm = EdgeWeightNorm(norm='both')
>>> norm_edge_weight = norm(g, edge_weight)
>>> conv = GraphConv(10, 2, norm='none', weight=True, bias=True)
>>> res = conv(g, feat, edge_weight=norm_edge_weight)
>>> print(res)
tensor([[1.1849, 0.7525],
        [1.3514, 0.8582],
        [1.2384, 0.7865],
        [1.9949, 1.2669],
        [1.3658, 0.8674],
        [0.8323, 0.5286]], grad_fn=<AddBackward0>)

forward(graph, edge_weight)
Compute normalized edge weight for the GCN model.
 Parameters
graph (DGLGraph) – The graph.
edge_weight (torch.Tensor) – Unnormalized scalar weights on the edges. The shape is expected to be \((E)\).
 Returns
The normalized edge weight.
 Return type
torch.Tensor
 Raises
DGLError – Case 1: The edge weight is multi-dimensional. Currently this module only supports a scalar weight on each edge. Case 2: The edge weight has non-positive values with norm='both'. This would require taking the square root of, and dividing by, a non-positive number.
RelGraphConv

class dgl.nn.pytorch.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=True, low_mem=False, dropout=0.0, layer_norm=False)
Bases: torch.nn.modules.module.Module
Relational graph convolution layer.
Relational graph convolution is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described in DGL as below:
\[h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}e_{j,i}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})\]where \(\mathcal{N}^r(i)\) is the neighbor set of node \(i\) w.r.t. relation \(r\), \(e_{j,i}\) is the normalizer, \(\sigma\) is an activation function, and \(W_0\) is the self-loop weight.
The basis regularization decomposes \(W_r\) by:
\[W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}\]where \(B\) is the number of bases, \(V_b^{(l)}\) are linearly combined with coefficients \(a_{rb}^{(l)}\).
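The basis regularization above can be illustrated with a small plain-Python sketch that builds every \(W_r\) as a linear combination of shared bases. The helper name `combine_bases` and the toy numbers are hypothetical; matrices are nested lists rather than tensors.

```python
def combine_bases(coeff, bases):
    """Return [W_0, ..., W_{R-1}] with W_r = sum_b coeff[r][b] * bases[b]."""
    num_bases = len(bases)
    rows, cols = len(bases[0]), len(bases[0][0])
    weights = []
    for a_r in coeff:  # one coefficient row per relation
        w_r = [
            [sum(a_r[b] * bases[b][p][q] for b in range(num_bases))
             for q in range(cols)]
            for p in range(rows)
        ]
        weights.append(w_r)
    return weights


# Toy setting: 3 relations sharing B = 2 bases of shape 2 x 2.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
coeff = [
    [1.0, 0.0],  # relation 0 uses only the first basis
    [0.0, 1.0],  # relation 1 uses only the second basis
    [0.5, 0.5],  # relation 2 mixes both
]
weights = combine_bases(coeff, bases)
```

Only the \(B\) bases and the coefficients need to be stored, which is consistent with the Examples where `num_rels=3` and `num_bases=2` give a stored weight of shape `[2, 10, 2]`.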
The block-diagonal-decomposition regularization decomposes \(W_r\) into \(B\) block-diagonal matrices. We refer to \(B\) as the number of bases.
The block regularization decomposes \(W_r\) by:
\[W_r^{(l)} = \oplus_{b=1}^B Q_{rb}^{(l)}\]where \(B\) is the number of bases and \(Q_{rb}^{(l)}\) are block bases with shape \(\mathbb{R}^{(d^{(l+1)}/B)\times(d^{(l)}/B)}\).
 Parameters
in_feat (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feat (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
num_rels (int) – Number of relations.
regularizer (str) – Which weight regularizer to use, “basis” or “bdd”. “basis” is short for basis decomposition. “bdd” is short for block-diagonal decomposition.
num_bases (int, optional) – Number of bases. If None, use the number of relations. Default: None.
bias (bool, optional) – True if bias is added. Default: True.
activation (callable, optional) – Activation function. Default: None.
self_loop (bool, optional) – True to include the self-loop message. Default: True.
low_mem (bool, optional) – True to use the low-memory implementation of the relation message passing function. This option trades speed for memory consumption and will slow down the forward/backward pass. Turn it on when you encounter OOM problems during training or evaluation. Default: False.
dropout (float, optional) – Dropout rate. Default: 0.0.
layer_norm (bool, optional) – Add layer norm. Default: False.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import RelGraphConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> conv = RelGraphConv(10, 2, 3, regularizer='basis', num_bases=2)
>>> conv.weight.shape
torch.Size([2, 10, 2])
>>> etype = th.tensor(np.array([0,1,2,0,1,2]).astype(np.int64))
>>> res = conv(g, feat, etype)
>>> res
tensor([[ 0.3996, 2.3303],
        [0.4323, 0.1440],
        [ 0.3996, 2.3303],
        [ 2.1046, 2.8654],
        [0.4323, 0.1440],
        [0.1309, 1.0000]], grad_fn=<AddBackward0>)

>>> # One-hot input
>>> one_hot_feat = th.tensor(np.array([0,1,2,3,4,5]).astype(np.int64))
>>> res = conv(g, one_hot_feat, etype)
>>> res
tensor([[ 0.5925, 0.0985],
        [0.3953, 0.8408],
        [0.9819, 0.5284],
        [1.0085, 0.1721],
        [ 0.5962, 1.2002],
        [ 0.0365, 0.3532]], grad_fn=<AddBackward0>)

forward(g, feat, etypes, norm=None)
Forward computation.
 Parameters
g (DGLGraph) – The graph.
feat (torch.Tensor) – Input node features. Could be either:
A \((V, D)\) dense tensor.
A \((V,)\) int64 vector, representing the categorical value of each node. The input feature is then treated as a one-hot encoding.
etypes (torch.Tensor or list[int]) – Edge type data. Could be either:
An \((E,)\) dense tensor. Each element corresponds to the edge’s type ID. Preferred format if low_mem == False.
An integer list. The i-th element is the number of edges of the i-th type. This requires the input graph to store edges sorted by their type IDs. Preferred format if low_mem == True.
norm (torch.Tensor, optional) – Edge normalizer: an \((E, 1)\) tensor storing the normalizer on each edge.
 Returns
New node features.
 Return type
torch.Tensor
Notes
Under the low_mem mode, DGL will sort the graph based on the edge types and compute message passing one type at a time. DGL recommends sorting the graph beforehand (and caching it if possible) and providing the integer list format to the etypes argument. Use DGL’s to_homogeneous() API to get a sorted homogeneous graph from a heterogeneous graph. Pass return_count=True to it to get the etypes in integer list format.
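To make the two etypes formats concrete, here is a plain-Python sketch that turns per-edge type IDs into a type-sorted edge list plus the per-type counts. It does not use DGL, and the helper name `to_sorted_counts` is hypothetical; it only illustrates what "edges sorted by their type IDs" and "the i-th element is the number of edges of the i-th type" mean.

```python
def to_sorted_counts(src, dst, etype, num_rels):
    """Sort edges by type ID and count the edges of each type."""
    # sorted() is stable, so edges of the same type keep their relative order.
    order = sorted(range(len(etype)), key=lambda e: etype[e])
    sorted_src = [src[e] for e in order]
    sorted_dst = [dst[e] for e in order]
    counts = [0] * num_rels
    for t in etype:
        counts[t] += 1
    return sorted_src, sorted_dst, counts


# Edges and per-edge type IDs from the RelGraphConv example.
src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
etype = [0, 1, 2, 0, 1, 2]

sorted_src, sorted_dst, counts = to_sorted_counts(src, dst, etype, num_rels=3)
print(sorted_src, sorted_dst, counts)
```

Here the per-edge tensor format corresponds to `etype`, while the integer list format corresponds to `counts` and is only meaningful together with the re-ordered edges.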
TAGConv

class dgl.nn.pytorch.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)
Bases: torch.nn.modules.module.Module
Topology Adaptive Graph Convolutional layer from the paper Topology Adaptive Graph Convolutional Networks.
\[H^{K} = {\sum}_{k=0}^K (D^{-1/2} A D^{-1/2})^{k} X {\Theta}_{k},\]where \(A\) denotes the adjacency matrix, \(D_{ii} = \sum_{j=0} A_{ij}\) its diagonal degree matrix, and \({\Theta}_{k}\) denotes the linear weights used to sum the results of different hops together.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(X\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(H^{K}\).
k (int, optional) – Number of hops \(K\). Default: 2.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

lin
The learnable linear module.
 Type
torch.nn.Module
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import TAGConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> conv = TAGConv(10, 2, k=2)
>>> res = conv(g, feat)
>>> res
tensor([[ 0.5490, 1.6373],
        [ 0.5490, 1.6373],
        [ 0.5490, 1.6373],
        [ 0.5513, 1.8208],
        [ 0.5215, 1.6044],
        [ 0.3304, 1.9927]], grad_fn=<AddmmBackward>)

forward(graph, feat)
Compute topology adaptive graph convolution.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
torch.Tensor
GATConv

class dgl.nn.pytorch.conv.GATConv(in_feats, out_feats, num_heads, feat_drop=0.0, attn_drop=0.0, negative_slope=0.2, residual=False, activation=None, allow_zero_in_degree=False, bias=True)
Bases: torch.nn.modules.module.Module
Apply Graph Attention Network over an input signal.
\[h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} h_j^{(l)}\]where \(\alpha_{ij}\) is the attention score between node \(i\) and node \(j\):
\[ \begin{align}\begin{aligned}\alpha_{ij}^{l} &= \mathrm{softmax_i} (e_{ij}^{l})\\e_{ij}^{l} &= \mathrm{LeakyReLU}\left(\vec{a}^T [W h_{i} \| W h_{j}]\right)\end{aligned}\end{align} \]
 Parameters
in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\). GATConv can be applied on a homogeneous graph and a unidirectional bipartite graph. If the layer is to be applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
num_heads (int) – Number of heads in multi-head attention.
feat_drop (float, optional) – Dropout rate on features. Defaults: 0.
attn_drop (float, optional) – Dropout rate on attention weights. Defaults: 0.
negative_slope (float, optional) – LeakyReLU angle of negative slope. Defaults: 0.2.
residual (bool, optional) – If True, use a residual connection. Defaults: False.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to those nodes. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets the users handle it by themselves. Defaults: False.
bias (bool, optional) – If True, learns a bias term. Defaults: True.
Note
Zero in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function will be applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice to handle this is to filter out the nodes with zero in-degree when using the output of the conv layer.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import GATConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> gatconv = GATConv(10, 2, num_heads=3)
>>> res = gatconv(g, feat)
>>> res
tensor([[[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]]], grad_fn=<BinaryReduceBackward>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_feat = th.tensor(np.random.rand(2, 5).astype(np.float32))
>>> v_feat = th.tensor(np.random.rand(4, 10).astype(np.float32))
>>> gatconv = GATConv((5,10), 2, 3)
>>> res = gatconv(g, (u_feat, v_feat))
>>> res
tensor([[[0.6066, 1.0268],
         [0.5945, 0.4801],
         [ 0.1594, 0.3825]],
        [[ 0.0268, 1.0783],
         [ 0.5041, 1.3025],
         [ 0.6568, 0.7048]],
        [[0.2688, 1.0543],
         [0.0315, 0.9016],
         [ 0.3943, 0.5347]],
        [[0.6066, 1.0268],
         [0.5945, 0.4801],
         [ 0.1594, 0.3825]]], grad_fn=<BinaryReduceBackward>)
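The attention coefficients in the GATConv formula can be sketched in plain Python as a LeakyReLU followed by a softmax taken per destination node. This is only an illustration: the raw scores stand in for \(\vec{a}^T [W h_{i} \| W h_{j}]\) and are made-up numbers, the helper names are hypothetical, and the sketch assumes that the softmax normalizes over the incoming edges of each destination node.

```python
import math


def leaky_relu(x, negative_slope=0.2):
    return x if x >= 0 else negative_slope * x


def edge_attention(edges, raw_scores, num_nodes, negative_slope=0.2):
    """Return one attention weight per edge, normalized per destination node."""
    e = [leaky_relu(s, negative_slope) for s in raw_scores]
    denom = [0.0] * num_nodes
    for (j, i), v in zip(edges, e):
        denom[i] += math.exp(v)  # accumulate over edges entering node i
    return [math.exp(v) / denom[i] for (j, i), v in zip(edges, e)]


# Homogeneous example graph with self-loops, so every node has an in-edge.
src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
num_nodes = 6
edges = list(zip(src, dst)) + [(n, n) for n in range(num_nodes)]

# Hypothetical raw scores, one per edge.
raw_scores = [0.5, -1.0, 2.0, 0.0, 1.0, -0.5] + [0.0] * num_nodes
alpha = edge_attention(edges, raw_scores, num_nodes)
```

A production implementation would normally subtract the per-node maximum before exponentiating for numerical stability; the sketch omits this to stay close to the formula.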

forward(graph, feat, get_attention=False)
Compute graph attention network layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape \((N, *, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, *, D_{in_{src}})\) and \((N_{out}, *, D_{in_{dst}})\).
get_attention (bool, optional) – Whether to return the attention values. Default to False.
 Returns
torch.Tensor – The output feature of shape \((N, *, H, D_{out})\) where \(H\) is the number of heads, and \(D_{out}\) is the size of the output feature.
torch.Tensor, optional – The attention values of shape \((E, *, H, 1)\), where \(E\) is the number of edges. This is returned only when get_attention is True.
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
EdgeConv

class dgl.nn.pytorch.conv.EdgeConv(in_feat, out_feat, batch_norm=False, allow_zero_in_degree=False)
Bases: torch.nn.modules.module.Module
EdgeConv layer.
Introduced in “Dynamic Graph CNN for Learning on Point Clouds”. Can be described as follows:
\[h_i^{(l+1)} = \max_{j \in \mathcal{N}(i)} ( \Theta \cdot (h_j^{(l)} - h_i^{(l)}) + \Phi \cdot h_i^{(l)})\]where \(\mathcal{N}(i)\) is the set of neighbors of \(i\), and \(\Theta\) and \(\Phi\) are linear layers.
Note
The original formulation includes a ReLU inside the maximum operator. This is equivalent to first applying a maximum operator then applying the ReLU.
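The equivalence stated in this note holds because ReLU is monotonically non-decreasing, so it preserves which message is largest. A small plain-Python check on made-up per-neighbor messages for a single feature channel (the helper name is hypothetical):

```python
def relu(x):
    return max(x, 0.0)


def both_orders(messages):
    """Return (max of ReLU'd messages, ReLU of max message)."""
    relu_then_max = max(relu(m) for m in messages)
    max_then_relu = relu(max(messages))
    return relu_then_max, max_then_relu


mixed = both_orders([-1.5, 0.3, 2.0, -0.2])   # some positive messages
negative = both_orders([-3.0, -1.0])          # all messages negative
print(mixed, negative)
```

Applying the ReLU once after the reduction touches one value per node instead of one value per edge.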
 Parameters
in_feat (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feat (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
batch_norm (bool) – Whether to include batch normalization on messages. Default: False.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to those nodes. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets the users handle it by themselves. Default: False.
Note
Zero in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function will be applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice to handle this is to filter out the nodes with zero in-degree when using the output of the conv layer.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import EdgeConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = EdgeConv(10, 2)
>>> res = conv(g, feat)
>>> res
tensor([[0.2347, 0.5849],
        [0.2347, 0.5849],
        [0.2347, 0.5849],
        [0.2347, 0.5849],
        [0.2347, 0.5849],
        [0.2347, 0.5849]], grad_fn=<CopyReduceBackward>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> # built with dgl.heterograph, as in the GraphConv bipartite example
>>> g = dgl.heterograph({('_U', '_E', '_V') : (u, v)})
>>> u_fea = th.rand(2, 5)
>>> v_fea = th.rand(4, 5)
>>> conv = EdgeConv(5, 2, 3)
>>> res = conv(g, (u_fea, v_fea))
>>> res
tensor([[ 1.6375, 0.2085],
        [1.1925, 1.2852],
        [ 0.2101, 1.3466],
        [ 0.2342, 0.9868]], grad_fn=<CopyReduceBackward>)

forward(g, feat)
Forward computation.
 Parameters
g (DGLGraph) – The graph.
feat (Tensor or pair of tensors) –
\((N, D)\) where \(N\) is the number of nodes and \(D\) is the number of feature dimensions.
If a pair of tensors is given, the graph must be a unibipartite graph with only one edge type, and the two tensors must have the same dimensionality on all except the first axis.
 Returns
New node features.
 Return type
torch.Tensor
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
SAGEConv

class dgl.nn.pytorch.conv.SAGEConv(in_feats, out_feats, aggregator_type, feat_drop=0.0, bias=True, norm=None, activation=None)
Bases: torch.nn.modules.module.Module
GraphSAGE layer from paper Inductive Representation Learning on Large Graphs.
\[ \begin{align}\begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} &= \mathrm{aggregate} \left(\{h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} &= \sigma \left(W \cdot \mathrm{concat} (h_{i}^{l}, h_{\mathcal{N}(i)}^{l+1}) \right)\\h_{i}^{(l+1)} &= \mathrm{norm}(h_{i}^{(l+1)})\end{aligned}\end{align} \]If a weight tensor on each edge is provided, the aggregation becomes:
\[h_{\mathcal{N}(i)}^{(l+1)} = \mathrm{aggregate} \left(\{e_{ji} h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)\]where \(e_{ji}\) is the scalar weight on the edge from node \(j\) to node \(i\). Please make sure that \(e_{ji}\) is broadcastable with \(h_j^{l}\).
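The first two steps of the formulas above can be sketched in plain Python for the mean aggregator. The sketch stops at the concatenated vector, i.e. before \(W\), \(\sigma\) and the optional normalization are applied. The helper name `sage_concat_inputs` is hypothetical, and giving an all-zero aggregate to nodes without incoming edges is an assumption of this sketch.

```python
def sage_concat_inputs(edges, h, num_nodes, edge_weight=None):
    """Return concat(h_i, mean_j(e_ji * h_j)) for every node i."""
    dim = len(h[0])
    if edge_weight is None:
        edge_weight = [1.0] * len(edges)  # unweighted case
    total = [[0.0] * dim for _ in range(num_nodes)]
    count = [0] * num_nodes
    for (j, i), e in zip(edges, edge_weight):
        for d in range(dim):
            total[i][d] += e * h[j][d]
        count[i] += 1
    out = []
    for i in range(num_nodes):
        # Assumption: nodes with no incoming edge get a zero neighbor vector.
        mean = [t / count[i] if count[i] else 0.0 for t in total[i]]
        out.append(h[i] + mean)  # list concatenation plays the role of concat
    return out


# Example graph without self-loops; node n carries the 1-d feature [n].
edges = list(zip([0, 1, 2, 3, 2, 5], [1, 2, 3, 4, 0, 3]))
h = [[float(n)] for n in range(6)]
z = sage_concat_inputs(edges, h, num_nodes=6)
print(z)
```

Node 3 receives messages from nodes 2 and 5, so its neighbor part is their average, while node 5 has no incoming edge and keeps only its own feature.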
 Parameters
in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\).
SAGEConv can be applied on a homogeneous graph and a unidirectional bipartite graph. If the layer is applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.
If the aggregator type is gcn, the feature sizes of source and destination nodes are required to be the same.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
feat_drop (float) – Dropout rate on features. Default: 0.
aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).
bias (bool) – If True, adds a learnable bias to the output. Default: True.
norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import SAGEConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = SAGEConv(10, 2, 'pool')
>>> res = conv(g, feat)
>>> res
tensor([[1.0888, 2.1099],
        [1.0888, 2.1099],
        [1.0888, 2.1099],
        [1.0888, 2.1099],
        [1.0888, 2.1099],
        [1.0888, 2.1099]], grad_fn=<AddBackward0>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> # built with dgl.heterograph, as in the GraphConv bipartite example
>>> g = dgl.heterograph({('_U', '_E', '_V') : (u, v)})
>>> u_fea = th.rand(2, 5)
>>> v_fea = th.rand(4, 10)
>>> conv = SAGEConv((5, 10), 2, 'mean')
>>> res = conv(g, (u_fea, v_fea))
>>> res
tensor([[ 0.3163, 3.1166],
        [ 0.3866, 2.5398],
        [ 0.5873, 1.6597],
        [0.2502, 2.8068]], grad_fn=<AddBackward0>)

forward(graph, feat, edge_weight=None)
Compute GraphSAGE layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, it represents the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
edge_weight (torch.Tensor, optional) – Optional tensor on the edges. If given, each message is scaled by the weight of the edge it travels along.
 Returns
The output feature of shape \((N_{dst}, D_{out})\) where \(N_{dst}\) is the number of destination nodes in the input graph and \(D_{out}\) is the size of the output feature.
 Return type
torch.Tensor
SGConv

class dgl.nn.pytorch.conv.SGConv(in_feats, out_feats, k=1, cached=False, bias=True, norm=None, allow_zero_in_degree=False)
Bases: torch.nn.modules.module.Module
Simplifying Graph Convolution layer from the paper Simplifying Graph Convolutional Networks.
\[H^{K} = (\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2})^K X \Theta\]where \(\tilde{A}\) is \(A\) + \(I\). Thus the graph input is expected to have self-loop edges added.
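The \(K\)-hop propagation in this formula can be sketched in plain Python on the example graph with self-loops. The sketch uses scalar all-ones features and leaves out \(\Theta\) and the bias. It assumes that both \(\tilde{D}^{-1/2}\) factors are computed from in-degrees, which coincides with the usual definition on undirected graphs; the helper name `propagate` is hypothetical.

```python
import math

# Example graph with one self-loop per node (A + I).
src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
num_nodes = 6
edges = list(zip(src, dst)) + [(n, n) for n in range(num_nodes)]

in_deg = [0] * num_nodes
for _, i in edges:
    in_deg[i] += 1
# Assumption: the same in-degree based factor is used on both sides.
norm = [1.0 / math.sqrt(d) for d in in_deg]


def propagate(h, k):
    """Apply the normalized adjacency k times to scalar node features."""
    for _ in range(k):
        scaled = [h[n] * norm[n] for n in range(num_nodes)]
        agg = [0.0] * num_nodes
        for j, i in edges:
            agg[i] += scaled[j]
        h = [agg[n] * norm[n] for n in range(num_nodes)]
    return h


h2 = propagate([1.0] * num_nodes, k=2)
print([round(v, 4) for v in h2])
```

Because the example input is all ones, each output row of the layer is a per-node scalar times the same vector; the ratios between the entries of `h2` agree with the ratios between rows of the Example output (for instance 2.7709 / 1.9441 for nodes 3 and 0).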
 Parameters
in_feats (int) – Number of input features; i.e., the number of dimensions of \(X\).
out_feats (int) – Number of output features; i.e., the number of dimensions of \(H^{K}\).
k (int) – Number of hops \(K\). Defaults: 1.
cached (bool) – If True, the module would cache
\[(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}})^K X\Theta\]at the first forward call. This parameter should only be set to True in the transductive learning setting.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to those nodes. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets the users handle it by themselves. Default: False.
Note
Zero in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function will be applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice to handle this is to filter out the nodes with zero in-degree when using the output of the conv layer.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import SGConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = SGConv(10, 2, k=2, cached=True)
>>> res = conv(g, feat)
>>> res
tensor([[1.9441, 0.9343],
        [1.9441, 0.9343],
        [1.9441, 0.9343],
        [2.7709, 1.3316],
        [1.9297, 0.9273],
        [1.9441, 0.9343]], grad_fn=<AddmmBackward>)

forward(graph, feat)
Compute Simplifying Graph Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
torch.Tensor
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
Note
If cached is set to True, feat and graph should not change during training, or you will get wrong results.
APPNPConv

class dgl.nn.pytorch.conv.APPNPConv(k, alpha, edge_drop=0.0)
Bases: torch.nn.modules.module.Module
Approximate Personalized Propagation of Neural Predictions layer from the paper Predict then Propagate: Graph Neural Networks meet Personalized PageRank.
\[ \begin{align}\begin{aligned}H^{0} &= X\\H^{l+1} &= (1-\alpha)\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{l}\right) + \alpha H^{0}\end{aligned}\end{align} \]where \(\tilde{A}\) is \(A\) + \(I\).
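The iteration above can be traced by hand with a short plain-Python sketch. It runs on the example graph exactly as given in the Example (no self-loops added), uses scalar all-ones features, and ignores edge dropout. Two assumptions are made: the degree factors are computed from in-degrees, and a degree of zero is clamped to one to avoid division by zero. The helper name `appnp` is hypothetical.

```python
import math

# Example graph as passed to the layer in this section (no self-loops).
src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
num_nodes = 6
edges = list(zip(src, dst))

in_deg = [0] * num_nodes
for _, i in edges:
    in_deg[i] += 1
# Assumption: in-degree based normalization, clamped to at least 1.
norm = [1.0 / math.sqrt(max(d, 1)) for d in in_deg]


def appnp(h0, k, alpha):
    """Run k propagation steps, mixing the input back in with weight alpha."""
    h = list(h0)
    for _ in range(k):
        scaled = [h[n] * norm[n] for n in range(num_nodes)]
        agg = [0.0] * num_nodes
        for j, i in edges:
            agg[i] += scaled[j]
        h = [(1 - alpha) * agg[n] * norm[n] + alpha * h0[n]
             for n in range(num_nodes)]
    return h


h = appnp([1.0] * num_nodes, k=3, alpha=0.5)
print([round(v, 4) for v in h])
```

Under these assumptions the result matches the per-node values printed in the Example for `APPNPConv(k=3, alpha=0.5)`, where every column of a row holds the same number because the input is all ones. Node 5 has no incoming edge, so only the \(\alpha H^{0}\) term survives for it.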
 Parameters
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import APPNPConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> conv = APPNPConv(k=3, alpha=0.5)
>>> res = conv(g, feat)
>>> res
tensor([[1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0303, 1.0303, 1.0303, 1.0303, 1.0303, 1.0303, 1.0303, 1.0303, 1.0303, 1.0303],
        [0.8643, 0.8643, 0.8643, 0.8643, 0.8643, 0.8643, 0.8643, 0.8643, 0.8643, 0.8643],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]])

forward(graph, feat)
Compute APPNP layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, *)\). \(N\) is the number of nodes, and \(*\) could be of any shape.
 Returns
The output feature of shape \((N, *)\) where \(*\) should be the same as input shape.
 Return type
torch.Tensor
GINConv

class dgl.nn.pytorch.conv.GINConv(apply_func, aggregator_type, init_eps=0, learn_eps=False)
Bases: torch.nn.modules.module.Module
Graph Isomorphism Network layer from the paper How Powerful are Graph Neural Networks?.
\[h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)\]
If a weight tensor on each edge is provided, the weighted graph convolution is defined as:
\[h_i^{(l+1)} = f_\Theta \left((1 + \epsilon) h_i^{l} + \mathrm{aggregate}\left(\left\{e_{ji} h_j^{l}, j\in\mathcal{N}(i) \right\}\right)\right)\]
where \(e_{ji}\) is the weight on the edge from node \(j\) to node \(i\). Please make sure that \(e_{ji}\) is broadcastable with \(h_j^{l}\).
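To see what the unweighted update computes before \(f_\Theta\) is applied, here is a small plain-Python sketch on the edge list used in the example of this section. It is an illustration under assumptions rather than the library code: features are scalars, \(f_\Theta\) is left out, and a node without incoming edges is assumed to aggregate to zero. The function name gin_update is hypothetical.

```python
def gin_update(src, dst, feat, aggregator="sum", eps=0.0):
    """Return (1 + eps) * h_i + aggregate({h_j : j -> i}) for scalar node features."""
    n = len(feat)
    inbox = [[] for _ in range(n)]
    for j, i in zip(src, dst):
        inbox[i].append(feat[j])  # edge j -> i delivers h_j to node i

    def aggregate(msgs):
        if not msgs:
            return 0.0  # assumption: an empty neighborhood contributes zero
        if aggregator == "sum":
            return sum(msgs)
        if aggregator == "max":
            return max(msgs)
        if aggregator == "mean":
            return sum(msgs) / len(msgs)
        raise ValueError("unknown aggregator: " + aggregator)

    return [(1 + eps) * feat[i] + aggregate(inbox[i]) for i in range(n)]


src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
feat = [1.0] * 6
print(gin_update(src, dst, feat, "sum"))  # [2.0, 2.0, 2.0, 3.0, 2.0, 1.0]
print(gin_update(src, dst, feat, "max"))  # [2.0, 2.0, 2.0, 2.0, 2.0, 1.0]
```

Node 5 has no incoming edge, so only its own feature survives. Under the stated assumption this is consistent with the 'max' example in this section, where the first five output rows coincide and the last one differs.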
 Parameters
apply_func (callable activation function/layer or None) – If not None, apply this function to the updated node feature, the \(f_\Theta\) in the formula.
aggregator_type (str) – Aggregator type to use (sum, max or mean).
init_eps (float, optional) – Initial \(\epsilon\) value. Default: 0.
learn_eps (bool, optional) – If True, \(\epsilon\) will be a learnable parameter. Default: False.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import GINConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> lin = th.nn.Linear(10, 10)
>>> conv = GINConv(lin, 'max')
>>> res = conv(g, feat)
>>> res
tensor([[0.4821, 0.0207, 0.7665, 0.5721, 0.4682, 0.2134, 0.5236, 1.2855, 0.8843, 0.8764],
        [0.4821, 0.0207, 0.7665, 0.5721, 0.4682, 0.2134, 0.5236, 1.2855, 0.8843, 0.8764],
        [0.4821, 0.0207, 0.7665, 0.5721, 0.4682, 0.2134, 0.5236, 1.2855, 0.8843, 0.8764],
        [0.4821, 0.0207, 0.7665, 0.5721, 0.4682, 0.2134, 0.5236, 1.2855, 0.8843, 0.8764],
        [0.4821, 0.0207, 0.7665, 0.5721, 0.4682, 0.2134, 0.5236, 1.2855, 0.8843, 0.8764],
        [0.1804, 0.0758, 0.5159, 0.3569, 0.1408, 0.1395, 0.2387, 0.7773, 0.5266, 0.4465]], grad_fn=<AddmmBackward>)

forward(graph, feat, edge_weight=None)
Compute Graph Isomorphism Network layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, D_{in})\) and \((N_{out}, D_{in})\). If apply_func is not None, \(D_{in}\) should fit the input dimensionality requirement of apply_func.
edge_weight (torch.Tensor, optional) – Optional tensor of edge weights. If given, each message is multiplied by the weight of its edge before aggregation.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output dimensionality of apply_func. If apply_func is None, \(D_{out}\) is the same as the input dimensionality.
 Return type
torch.Tensor
GatedGraphConv

class dgl.nn.pytorch.conv.GatedGraphConv(in_feats, out_feats, n_steps, n_etypes, bias=True)
Bases: torch.nn.modules.module.Module
Gated Graph Convolution layer from the paper Gated Graph Sequence Neural Networks.
\[ \begin{align}\begin{aligned}h_{i}^{0} &= [ x_i \| \mathbf{0} ]\\a_{i}^{t} &= \sum_{j\in\mathcal{N}(i)} W_{e_{ij}} h_{j}^{t}\\h_{i}^{t+1} &= \mathrm{GRU}(a_{i}^{t}, h_{i}^{t})\end{aligned}\end{align} \]
 Parameters
in_feats (int) – Input feature size; i.e, the number of dimensions of \(x_i\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(t+1)}\).
n_steps (int) – Number of recurrent steps; i.e, the \(t\) in the above formula.
n_etypes (int) – Number of edge types.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import GatedGraphConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> conv = GatedGraphConv(10, 10, 2, 3)
>>> etype = th.tensor([0,1,2,0,1,2])
>>> res = conv(g, feat, etype)
>>> res
tensor([[ 0.4652, 0.4458, 0.5169, 0.4126, 0.4847, 0.2303, 0.2757, 0.7721, 0.0523, 0.0857],
        [ 0.0832, 0.1388, 0.5643, 0.7053, 0.2524, 0.3847, 0.7587, 0.8245, 0.9315, 0.4063],
        [ 0.6340, 0.4096, 0.7692, 0.2125, 0.2106, 0.4542, 0.0580, 0.3364, 0.1376, 0.4948],
        [ 0.5551, 0.7946, 0.6220, 0.8058, 0.5711, 0.3063, 0.5454, 0.2272, 0.6931, 0.1607],
        [ 0.2644, 0.2469, 0.6143, 0.6008, 0.1516, 0.3781, 0.5878, 0.7993, 0.9241, 0.1835],
        [ 0.6393, 0.3447, 0.3893, 0.4279, 0.3342, 0.3809, 0.0406, 0.5030, 0.1342, 0.0425]], grad_fn=<AddBackward0>)

forward(graph, feat, etypes=None)
Compute Gated Graph Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(N\) is the number of nodes of the graph and \(D_{in}\) is the input feature size.
etypes (torch.LongTensor, or None) – The edge type tensor of shape \((E,)\) where \(E\) is the number of edges of the graph. When there’s only one edge type, this argument can be omitted.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
 Return type
torch.Tensor
GMMConv

class dgl.nn.pytorch.conv.GMMConv(in_feats, out_feats, dim, n_kernels, aggregator_type='sum', residual=False, bias=True, allow_zero_in_degree=False)
Bases: torch.nn.modules.module.Module
The Gaussian Mixture Model Convolution layer from Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs.
\[ \begin{align}\begin{aligned}u_{ij} &= f(x_i, x_j), x_j \in \mathcal{N}(i)\\w_k(u) &= \exp\left(-\frac{1}{2}(u-\mu_k)^T \Sigma_k^{-1} (u - \mu_k)\right)\\h_i^{l+1} &= \mathrm{aggregate}\left(\left\{\frac{1}{K} \sum_{k}^{K} w_k(u_{ij}), \forall j\in \mathcal{N}(i)\right\}\right)\end{aligned}\end{align} \]
where \(u\) denotes the pseudo-coordinates between a vertex and one of its neighbors, computed using function \(f\), and \(\Sigma_k^{-1}\) and \(\mu_k\) are learnable parameters representing the inverse covariance matrix and the mean vector of a Gaussian kernel.
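The kernel weight \(w_k(u)\) is the part of this definition that is easiest to check numerically. The sketch below evaluates it in plain Python under one simplifying assumption that is not stated above: \(\Sigma_k^{-1}\) is diagonal and passed as a vector of inverse variances. The names gaussian_kernel_weight and mixture_weight are hypothetical.

```python
import math


def gaussian_kernel_weight(u, mu, inv_sigma):
    """w_k(u) = exp(-1/2 (u - mu)^T Sigma^{-1} (u - mu)) for a diagonal Sigma^{-1}."""
    quad = sum(s * (a - m) ** 2 for a, m, s in zip(u, mu, inv_sigma))
    return math.exp(-0.5 * quad)


def mixture_weight(u, mus, inv_sigmas):
    """Average the K kernel responses for one pseudo-coordinate u."""
    weights = [gaussian_kernel_weight(u, mu, s) for mu, s in zip(mus, inv_sigmas)]
    return sum(weights) / len(weights)


u = [1.0, 0.0]
print(gaussian_kernel_weight(u, mu=[1.0, 0.0], inv_sigma=[2.0, 1.0]))  # 1.0 at the kernel mean
print(gaussian_kernel_weight(u, mu=[0.0, 0.0], inv_sigma=[2.0, 1.0]))  # exp(-1)
```

A pseudo-coordinate that sits exactly on a kernel mean gets weight 1, and the weight decays with the scaled squared distance from the mean.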
 Parameters
in_feats (int) – Number of input features; i.e., the number of dimensions of \(x_i\).
out_feats (int) – Number of output features; i.e., the number of dimensions of \(h_i^{(l+1)}\).
dim (int) – Dimensionality of the pseudo-coordinate; i.e., the number of dimensions of \(u_{ij}\).
n_kernels (int) – Number of kernels \(K\).
aggregator_type (str) – Aggregator type (sum, mean, max). Default: sum.
residual (bool) – If True, use residual connection inside this layer. Default: False.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the convolution.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import GMMConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = GMMConv(10, 2, 3, 2, 'mean')
>>> pseudo = th.ones(12, 3)
>>> res = conv(g, feat, pseudo)
>>> res
tensor([[0.3462, 0.2654],
        [0.3462, 0.2654],
        [0.3462, 0.2654],
        [0.3462, 0.2654],
        [0.3462, 0.2654],
        [0.3462, 0.2654]], grad_fn=<AddBackward0>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_fea = th.rand(2, 5)
>>> v_fea = th.rand(4, 10)
>>> pseudo = th.ones(5, 3)
>>> # in_feats is (source feature size, destination feature size)
>>> conv = GMMConv((5, 10), 2, 3, 2, 'mean')
>>> res = conv(g, (u_fea, v_fea), pseudo)
>>> res
tensor([[0.1107, 0.1559],
        [0.1646, 0.2326],
        [0.1377, 0.1943],
        [0.1107, 0.1559]], grad_fn=<AddBackward0>)

forward(graph, feat, pseudo)
Compute Gaussian Mixture Model Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – If a single tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of tensors are given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
pseudo (torch.Tensor) – The pseudo coordinate tensor of shape \((E, D_{u})\) where \(E\) is the number of edges of the graph and \(D_{u}\) is the dimensionality of pseudo coordinate.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
 Return type
torch.Tensor
 Raises
DGLError – If the input graph contains 0-in-degree nodes, a DGLError is raised, since no message would be passed to those nodes and their output would be invalid. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
ChebConv

class dgl.nn.pytorch.conv.ChebConv(in_feats, out_feats, k, activation=<function relu>, bias=True)
Bases: torch.nn.modules.module.Module
Chebyshev Spectral Graph Convolution layer from the paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
\[ \begin{align}\begin{aligned}h_i^{l+1} &= \sum_{k=0}^{K-1} W^{k, l}z_i^{k, l}\\Z^{0, l} &= H^{l}\\Z^{1, l} &= \tilde{L} \cdot H^{l}\\Z^{k, l} &= 2 \cdot \tilde{L} \cdot Z^{k-1, l} - Z^{k-2, l}\\\tilde{L} &= 2\left(I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\right)/\lambda_{max} - I\end{aligned}\end{align} \]
where \(\tilde{A}\) is \(A + I\) and \(W\) is a learnable weight.
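The three-term recurrence for \(Z^{k, l}\) is the Chebyshev polynomial recurrence applied to \(\tilde{L}\). A minimal plain-Python sketch follows; it assumes a dense \(\tilde{L}\) given as a list of rows and a single feature column, and it leaves out the learnable weights \(W^{k, l}\). The helper names are hypothetical.

```python
def matvec(matrix, vec):
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def chebyshev_basis(l_tilde, h, k):
    """Return [Z^0, ..., Z^{k-1}] with Z^0 = h, Z^1 = L~ h, Z^j = 2 L~ Z^{j-1} - Z^{j-2}."""
    zs = [list(h)]
    if k > 1:
        zs.append(matvec(l_tilde, h))
    for _ in range(2, k):
        lz = matvec(l_tilde, zs[-1])
        zs.append([2 * a - b for a, b in zip(lz, zs[-2])])
    return zs


# A diagonal L~ makes each entry follow the scalar Chebyshev polynomials T_j(x).
l_tilde = [[0.5, 0.0], [0.0, -1.0]]
for z in chebyshev_basis(l_tilde, [1.0, 1.0], k=4):
    print(z)
```

For the diagonal entry 0.5 the first column follows \(T_j(0.5)\): 1, 0.5, -0.5, -1. For the entry -1 the second column alternates between 1 and -1. Because the rescaled Laplacian has eigenvalues in \([-1, 1]\) when \(\lambda_{max}\) is chosen correctly, every \(Z^{k, l}\) stays bounded.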
 Parameters
in_feats (int) – Dimension of input features; i.e, the number of dimensions of \(h_i^{(l)}\).
out_feats (int) – Dimension of output features \(h_i^{(l+1)}\).
k (int) – Chebyshev filter size \(K\).
activation (function, optional) – Activation function. Default: ReLU.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import ChebConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> conv = ChebConv(10, 2, 2)
>>> res = conv(g, feat)
>>> res
tensor([[ 0.6163, 0.1809],
        [ 0.6163, 0.1809],
        [ 0.6163, 0.1809],
        [ 0.9698, 1.5053],
        [ 0.3664, 0.7556],
        [0.2370, 3.0164]], grad_fn=<AddBackward0>)

forward(graph, feat, lambda_max=None)
Compute ChebNet layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
lambda_max (list or tensor or None, optional) – A list (or tensor) of length \(B\) that stores the largest eigenvalue of the normalized Laplacian of each individual graph in graph, where \(B\) is the batch size of the input graph. Default: None. If None, this method sets the value to 2. One can use dgl.laplacian_lambda_max() to compute this value.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
torch.Tensor
AGNNConv

class dgl.nn.pytorch.conv.AGNNConv(init_beta=1.0, learn_beta=True, allow_zero_in_degree=False)
Bases: torch.nn.modules.module.Module
Attention-based Graph Neural Network layer from the paper Attention-based Graph Neural Network for Semi-Supervised Learning.
\[H^{l+1} = P H^{l}\]
where \(P\) is computed as:
\[P_{ij} = \mathrm{softmax}_i ( \beta \cdot \cos(h_i^l, h_j^l))\]
where \(\beta\) is a single scalar parameter.
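The attention matrix \(P\) can be computed by hand for a small graph. The following plain-Python sketch is an illustration under assumptions, not the layer's code: the graph is given as an edge list without duplicate edges, the softmax for node \(i\) runs over its incoming edges, and all feature vectors are non-zero so that the cosine is defined. The name agnn_attention is hypothetical.

```python
import math


def agnn_attention(src, dst, feat, beta=1.0):
    """Return {(i, j): P_ij} for each edge j -> i, normalized over the in-edges of i."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Unnormalized scores exp(beta * cos(h_i, h_j)) per edge.
    scores = {(i, j): math.exp(beta * cosine(feat[i], feat[j])) for j, i in zip(src, dst)}
    # Softmax denominator: total score over the in-edges of each destination node.
    totals = {}
    for (i, _), score in scores.items():
        totals[i] = totals.get(i, 0.0) + score
    return {(i, j): score / totals[i] for (i, j), score in scores.items()}


# Example graph of this section, with a self-loop added to every node.
src = [0, 1, 2, 3, 2, 5] + list(range(6))
dst = [1, 2, 3, 4, 0, 3] + list(range(6))
feat = [[1.0, 1.0]] * 6
p = agnn_attention(src, dst, feat)
print(p[(3, 2)], p[(3, 5)], p[(3, 3)])  # three equal weights of about 1/3
print(p[(5, 5)])                        # 1.0, node 5 only has its self-loop
```

When all node features are identical, every cosine is the same, the attention is uniform over each node's in-edges, and \(P H\) returns the input unchanged. That matches the all-ones output of the example in this section.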
 Parameters
init_beta (float, optional) – The \(\beta\) in the formula, a single scalar parameter.
learn_beta (bool, optional) – If True, \(\beta\) will be a learnable parameter.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the convolution.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import AGNNConv
>>>
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = AGNNConv()
>>> res = conv(g, feat)
>>> res
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], grad_fn=<BinaryReduceBackward>)

forward(graph, feat)
Compute AGNN layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, *)\), where \(N\) is the number of nodes and \(*\) could be of any shape. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, *)\) and \((N_{out}, *)\), and the \(*\) in the latter tensor must equal that of the former.
 Returns
The output feature of shape \((N, *)\) where \(*\) should be the same as input shape.
 Return type
torch.Tensor
 Raises
DGLError – If the input graph contains 0-in-degree nodes, a DGLError is raised, since no message would be passed to those nodes and their output would be invalid. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
NNConv

class dgl.nn.pytorch.conv.NNConv(in_feats, out_feats, edge_func, aggregator_type='mean', residual=False, bias=True)
Bases: torch.nn.modules.module.Module
Graph Convolution layer introduced in Neural Message Passing for Quantum Chemistry.
\[h_{i}^{l+1} = h_{i}^{l} + \mathrm{aggregate}\left(\left\{ f_\Theta (e_{ij}) \cdot h_j^{l}, j\in \mathcal{N}(i) \right\}\right)\]
where \(e_{ij}\) is the edge feature and \(f_\Theta\) is a function with learnable parameters.
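The product \(f_\Theta (e_{ij}) \cdot h_j^{l}\) treats the output of the edge function as a weight matrix. The plain-Python sketch below shows one way to read it; the row-major reshape of the flat edge_func output into an (in_feats, out_feats) matrix is an assumption made for illustration, and nnconv_message is a hypothetical helper, not part of the library.

```python
def nnconv_message(edge_weight_flat, h_src, in_feats, out_feats):
    """Reshape f_Theta(e_ij) into an (in_feats, out_feats) matrix and apply it to h_j."""
    if len(edge_weight_flat) != in_feats * out_feats:
        raise ValueError("edge function output must have in_feats * out_feats entries")
    # Row r of the matrix holds the weights that multiply input dimension r.
    w = [edge_weight_flat[r * out_feats:(r + 1) * out_feats] for r in range(in_feats)]
    # Message for output dimension c: sum over input dimensions of h_j[r] * w[r][c].
    return [sum(h_src[r] * w[r][c] for r in range(in_feats)) for c in range(out_feats)]


flat = [1, 2, 3, 4, 5, 6]   # stands in for f_Theta(e_ij) with in_feats=2, out_feats=3
h_j = [1, 10]
print(nnconv_message(flat, h_j, in_feats=2, out_feats=3))  # [41, 52, 63]
```

This is why the edge function must map every edge feature to in_feats * out_feats numbers: each edge carries its own weight matrix.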
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\). NNConv can be applied on homogeneous graphs and unidirectional bipartite graphs. If the layer is to be applied on a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature size would take the same value.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
edge_func (callable activation function/layer) – Maps each edge feature to a vector of shape (in_feats * out_feats) used as the weight to compute messages. This is the \(f_\Theta\) in the formula.
aggregator_type (str) – Aggregator type to use (sum, mean or max).
residual (bool, optional) – If True, use residual connection. Default: False.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import NNConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> lin = th.nn.Linear(5, 20)
>>> def edge_func(efeat):
...     return lin(efeat)
>>> efeat = th.ones(6+6, 5)
>>> conv = NNConv(10, 2, edge_func, 'mean')
>>> res = conv(g, feat, efeat)
>>> res
tensor([[1.5243, 0.2719],
        [1.5243, 0.2719],
        [1.5243, 0.2719],
        [1.5243, 0.2719],
        [1.5243, 0.2719],
        [1.5243, 0.2719]], grad_fn=<AddBackward0>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_feat = th.tensor(np.random.rand(2, 10).astype(np.float32))
>>> v_feat = th.tensor(np.random.rand(4, 10).astype(np.float32))
>>> conv = NNConv(10, 2, edge_func, 'mean')
>>> efeat = th.ones(5, 5)
>>> res = conv(g, (u_feat, v_feat), efeat)
>>> res
tensor([[0.6568, 0.5042],
        [ 0.9089, 0.5352],
        [ 0.1261, 0.0155],
        [0.6568, 0.5042]], grad_fn=<AddBackward0>)

forward(graph, feat, efeat)
Compute MPNN Graph Convolution layer.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor or pair of torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(N\) is the number of nodes of the graph and \(D_{in}\) is the input feature size.
efeat (torch.Tensor) – The edge feature of shape \((E, *)\), which should fit the input shape requirement of edge_func. \(E\) is the number of edges of the graph.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is the output feature size.
 Return type
torch.Tensor
AtomicConv

class dgl.nn.pytorch.conv.AtomicConv(interaction_cutoffs, rbf_kernel_means, rbf_kernel_scaling, features_to_use=None)
Bases: torch.nn.modules.module.Module
Atomic Convolution Layer from the paper Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity.
Denote the type of atom \(i\) by \(z_i\) and the distance between atoms \(i\) and \(j\) by \(r_{ij}\).
Distance Transformation
An atomic convolution layer first transforms distances with radial filters and then performs a pooling operation.
The radial filter indexed by \(k\) projects edge distances with
\[h_{ij}^{k} = \exp(-\gamma_{k}|r_{ij}-r_{k}|^2)\]
If \(r_{ij} < c_k\),
\[f_{ij}^{k} = 0.5 * \left(\cos\left(\frac{\pi r_{ij}}{c_k}\right) + 1\right),\]
else,
\[f_{ij}^{k} = 0.\]
Finally,
\[e_{ij}^{k} = h_{ij}^{k} * f_{ij}^{k}\]
Aggregation
For each type \(t\), each atom collects distance information from all neighbor atoms of type \(t\):
\[p_{i, t}^{k} = \sum_{j\in N(i)} e_{ij}^{k} * 1(z_j == t)\]
Then concatenate the results for all RBF kernels and atom types.
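The distance transformation and pooling can be checked against the example of this section with a few lines of plain Python. This is a sketch under assumptions: a single radial filter, a single atom type (so the indicator is always 1), and the cosine cutoff written as \(0.5\,(\cos(\pi r_{ij} / c_k) + 1)\), which is the form that reproduces the example output shown in this section. The names radial_filter and pool_radial are hypothetical.

```python
import math


def radial_filter(r, gamma, r_k, c_k):
    """e_ij^k = exp(-gamma_k * (r_ij - r_k)^2) * f_ij^k, with f a cosine cutoff at c_k."""
    if r >= c_k:
        return 0.0  # beyond the cutoff the edge contributes nothing
    h = math.exp(-gamma * (r - r_k) ** 2)
    f = 0.5 * (math.cos(math.pi * r / c_k) + 1.0)
    return h * f


def pool_radial(src, dst, dists, num_nodes, gamma, r_k, c_k):
    """Sum e_ij^k over the incoming edges of every node (one atom type, one filter)."""
    pooled = [0.0] * num_nodes
    for i, r in zip(dst, dists):
        pooled[i] += radial_filter(r, gamma, r_k, c_k)
    return pooled


src = [0, 1, 2, 3, 2, 5]
dst = [1, 2, 3, 4, 0, 3]
print(pool_radial(src, dst, [1.0] * 6, num_nodes=6, gamma=1.0, r_k=1.0, c_k=2.0))
# approximately [0.5, 0.5, 0.5, 1.0, 0.5, 0.0]
```

Each edge at distance 1 with cutoff 2 contributes 0.5, so node 3, which has two incoming edges, receives 1.0 and node 5, which has none, receives 0.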
 Parameters
interaction_cutoffs (float32 tensor of shape (K)) – \(c_k\) in the equations above. Roughly they can be considered as learnable cutoffs and two atoms are considered as connected if the distance between them is smaller than the cutoffs. K for the number of radial filters.
rbf_kernel_means (float32 tensor of shape (K)) – \(r_k\) in the equations above. K for the number of radial filters.
rbf_kernel_scaling (float32 tensor of shape (K)) – \(\gamma_k\) in the equations above. K for the number of radial filters.
features_to_use (None or float tensor of shape (T)) – In the original paper, these are atomic numbers to consider, representing the types of atoms. T for the number of types of atomic numbers. Default to None.
Note
This convolution operation is designed for molecular graphs in Chemistry, but it might be possible to extend it to more general graphs.
There seems to be an inconsistency between the definition of \(e_{ij}^{k}\) in the paper and the author’s implementation. We follow the author’s implementation. In the paper, \(e_{ij}^{k}\) was defined as \(\exp(-\gamma_{k}|r_{ij}-r_{k}|^2 * f_{ij}^{k})\).
\(\gamma_{k}\), \(r_k\) and \(c_k\) are all learnable.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import AtomicConv

>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 1)
>>> edist = th.ones(6, 1)
>>> interaction_cutoffs = th.ones(3).float() * 2
>>> rbf_kernel_means = th.ones(3).float()
>>> rbf_kernel_scaling = th.ones(3).float()
>>> conv = AtomicConv(interaction_cutoffs, rbf_kernel_means, rbf_kernel_scaling)
>>> res = conv(g, feat, edist)
>>> res
tensor([[0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000],
        [1.0000, 1.0000, 1.0000],
        [0.5000, 0.5000, 0.5000],
        [0.0000, 0.0000, 0.0000]], grad_fn=<ViewBackward>)

forward(graph, feat, distances)
Apply the atomic convolution layer.
 Parameters
graph (DGLGraph) – Topology based on which message passing is performed.
feat (Float32 tensor of shape \((V, 1)\)) – Initial node features, which are atomic numbers in the paper. \(V\) for the number of nodes.
distances (Float32 tensor of shape \((E, 1)\)) – Distance between end nodes of edges. E for the number of edges.
 Returns
Updated node representations. \(V\) for the number of nodes, \(K\) for the number of radial filters, and \(T\) for the number of types of atomic numbers.
 Return type
Float32 tensor of shape \((V, K * T)\)
CFConv

class dgl.nn.pytorch.conv.CFConv(node_in_feats, edge_in_feats, hidden_feats, out_feats)
Bases: torch.nn.modules.module.Module
CFConv in SchNet.
SchNet is introduced in SchNet: A continuous-filter convolutional neural network for modeling quantum interactions.
It combines node and edge features in message passing and updates node representations.
\[h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} h_j^{l} \circ W^{(l)}e_{ij}\]
where \(\circ\) represents element-wise multiplication, and \(\text{SSP}\), the shifted softplus, is:
\[\text{SSP}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x)) - \log(\text{shift})\]
 Parameters
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import CFConv
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> nfeat = th.ones(6, 10)
>>> efeat = th.ones(6, 5)
>>> conv = CFConv(10, 5, 3, 2)
>>> res = conv(g, nfeat, efeat)
>>> res
tensor([[0.1209, 0.2289],
        [0.1209, 0.2289],
        [0.1209, 0.2289],
        [0.1135, 0.2338],
        [0.1209, 0.2289],
        [0.1283, 0.2240]], grad_fn=<SubBackward0>)

forward(g, node_feats, edge_feats)
Performs message passing and updates node representations.
 Parameters
g (DGLGraph) – The graph.
node_feats (torch.Tensor or pair of torch.Tensor) – The input node features. If a torch.Tensor is given, it represents the input node feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of torch.Tensor is given, which is the case for bipartite graph, the pair must contain two tensors of shape \((N_{src}, D_{in_{src}})\) and \((N_{dst}, D_{in_{dst}})\) separately for the source and destination nodes.
edge_feats (torch.Tensor) – The input edge feature of shape \((E, edge_in_feats)\) where \(E\) is the number of edges.
 Returns
The output node feature of shape \((N_{out}, out_feats)\) where \(N_{out}\) is the number of destination nodes.
 Return type
torch.Tensor
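The SSP function in the CFConv description is a softplus shifted down by \(\log(\text{shift})\). A minimal plain-Python sketch, with beta=1 and shift=2 chosen here as illustrative defaults rather than taken from the text above, and with a hypothetical function name:

```python
import math


def shifted_softplus(x, beta=1.0, shift=2.0):
    """SSP(x) = (1 / beta) * log(1 + exp(beta * x)) - log(shift)."""
    return math.log(1.0 + math.exp(beta * x)) / beta - math.log(shift)


print(shifted_softplus(0.0))   # 0.0, since log(2) - log(2) cancels
print(shifted_softplus(20.0))  # close to 20 - log(2)
```

With these values the function passes through the origin, which is the point of the shift, and it approaches \(x - \log 2\) for large positive inputs. The direct exp call overflows for very large beta * x, so a production version would need a numerically stable form.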
DotGatConv

class dgl.nn.pytorch.conv.DotGatConv(in_feats, out_feats, num_heads, allow_zero_in_degree=False)
Bases: torch.nn.modules.module.Module
Apply dot product version of self attention in GCN.
\[h_i^{(l+1)} = \sum_{j\in \mathcal{N}(i)} \alpha_{i, j} h_j^{(l)}\]
where \(\alpha_{i, j}\) is the attention score between node \(i\) and node \(j\):
\[ \begin{align}\begin{aligned}\alpha_{i, j} &= \mathrm{softmax_i}(e_{ij}^{l})\\e_{ij}^{l} &= ({W_i^{(l)} h_i^{(l)}})^T \cdot {W_j^{(l)} h_j^{(l)}}\end{aligned}\end{align} \]
where \(W_i\) and \(W_j\) transform node \(i\)’s and node \(j\)’s features into the same dimension, so that the similarity between node features can be computed with a dot product.
 Parameters
in_feats (int, or pair of ints) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\). DotGatConv can be applied on homogeneous graphs and unidirectional bipartite graphs. If the layer is to be applied to a unidirectional bipartite graph, in_feats specifies the input feature size on both the source and destination nodes. If a scalar is given, the source and destination node feature size would take the same value.
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
num_heads (int) – Number of heads in multi-head attention.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module raises a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero-in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function is applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True in those cases to unblock the code and handle zero-in-degree nodes manually. A common practice is to filter out the zero-in-degree nodes when using the output of the convolution.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import DotGatConv

>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> dotgatconv = DotGatConv(10, 2, num_heads=3)
>>> res = dotgatconv(g, feat)
>>> res
tensor([[[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]],
        [[ 3.4570, 1.8634],
         [ 1.3805, 0.0762],
         [ 1.0390, 1.1479]]], grad_fn=<BinaryReduceBackward>)

>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.bipartite((u, v))
>>> u_feat = th.tensor(np.random.rand(2, 5).astype(np.float32))
>>> v_feat = th.tensor(np.random.rand(4, 10).astype(np.float32))
>>> dotgatconv = DotGatConv((5,10), 2, 3)
>>> res = dotgatconv(g, (u_feat, v_feat))
>>> res
tensor([[[0.6066, 1.0268],
         [0.5945, 0.4801],
         [ 0.1594, 0.3825]],
        [[ 0.0268, 1.0783],
         [ 0.5041, 1.3025],
         [ 0.6568, 0.7048]],
        [[0.2688, 1.0543],
         [0.0315, 0.9016],
         [ 0.3943, 0.5347]],
        [[0.6066, 1.0268],
         [0.5945, 0.4801],
         [ 0.1594, 0.3825]]], grad_fn=<BinaryReduceBackward>)

forward(graph, feat, get_attention=False)
Apply dot product version of self attention in GCN.
 Parameters
graph (DGLGraph or bipartite graph) – The graph.
feat (torch.Tensor or pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\).
get_attention (bool, optional) – Whether to return the attention values. Default to False.
 Returns
torch.Tensor – The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
torch.Tensor, optional – The attention values of shape \((E, 1)\), where \(E\) is the number of edges. This is returned only when get_attention is True.
 Raises
DGLError – If the input graph contains 0-in-degree nodes, a DGLError is raised, since no message would be passed to those nodes and their output would be invalid. The error can be suppressed by setting the allow_zero_in_degree parameter to True.
TWIRLSConv

class dgl.nn.pytorch.conv.TWIRLSConv(input_d, output_d, hidden_d, prop_step, num_mlp_before=1, num_mlp_after=1, norm='none', precond=True, alp=0, lam=1, attention=False, tau=0.2, T=-1, p=1, use_eta=False, attn_bef=False, dropout=0.0, attn_dropout=0.0, inp_dropout=0.0)
Bases: torch.nn.modules.module.Module
Together with iteratively reweighting least squares (TWIRLS) layer from the paper Graph Neural Networks Inspired by Classical Iterative Algorithms.
 Parameters
input_d (int) – Number of input features.
output_d (int) – Number of output features.
hidden_d (int) – Size of hidden layers.
prop_step (int) – Number of propagation steps.
num_mlp_before (int) – Number of MLP layers before propagation. Default: 1.
num_mlp_after (int) – Number of MLP layers after propagation. Default: 1.
norm (str) – The type of norm layers inside MLP layers. Can be 'batch', 'layer' or 'none'. Default: 'none'.
precond (bool) – If True, use preconditioning and the unnormalized Laplacian; otherwise, use no preconditioning and the normalized Laplacian. Default: True.
alp (float) – The \(\alpha\) in the paper. If equal to \(0\), it is decided automatically based on the other hyperparameters. Default: 0.
lam (float) – The \(\lambda\) in the paper. Default: 1.
attention (bool) – If True, add an attention layer inside propagations. Default: False.
tau (float) – The \(\tau\) in the paper. Default: 0.2.
T (float) – The \(T\) in the paper. If < 0, \(T\) will be set to infinity. Default: -1.
p (float) – The \(p\) in the paper. Default: 1.
use_eta (bool) – If True, add a learnable weight on each dimension in attention. Default: False.
attn_bef (bool) – If True, add another attention layer before propagation. Default: False.
dropout (float) – The dropout rate in MLP layers. Default: 0.0.
attn_dropout (float) – The dropout rate of attention values. Default: 0.0.
inp_dropout (float) – The dropout rate on input features. Default: 0.0.
Note
add_self_loop will be automatically called before propagation.
Example
>>> import dgl
>>> from dgl.nn import TWIRLSConv
>>> import torch as th

>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 10)
>>> conv = TWIRLSConv(10, 2, 128, prop_step = 64)
>>> res = conv(g , feat)
>>> res
tensor([[ 0.4556, 2.6692],
        [ 0.4556, 2.6692],
        [ 0.4556, 2.6692],
        [ 1.0112, 5.9241],
        [ 0.8011, 4.6935],
        [ 0.8844, 5.1814]], grad_fn=<AddmmBackward>)

forward(graph, feat)
Run TWIRLS forward.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The initial node features.
 Returns
The output feature
 Return type
torch.Tensor
Note
Input shape: \((N, \text{input_d})\) where \(N\) is the number of nodes.
Output shape: \((N, \text{output_d})\).
TWIRLSUnfoldingAndAttention

class dgl.nn.pytorch.conv.TWIRLSUnfoldingAndAttention(d, alp, lam, prop_step, attn_aft=-1, tau=0.2, T=-1, p=1, use_eta=False, init_att=False, attn_dropout=0, precond=True)
Bases: torch.nn.modules.module.Module
Combine propagation and attention together.
 Parameters
d (int) – Size of graph feature.
alp (float) – Step size. \(\alpha\) in the paper.
lam (int) – Coefficient of the graph smoothing term. \(\lambda\) in the paper.
prop_step (int) – Number of propagation steps.
attn_aft (int) – Where to put the attention layer, i.e. the number of propagation steps before attention. If set to -1, no attention is used.
tau (float) – The lower thresholding parameter. Corresponds to \(\tau\) in the paper.
T (float) – The upper thresholding parameter. Corresponds to \(T\) in the paper.
p (float) – Corresponds to \(\rho\) in the paper.
use_eta (bool) – If True, learn a weight vector for each dimension when doing attention.
init_att (bool) – If True, add an extra attention layer before propagation.
attn_dropout (float) – The dropout rate of attention values. Default: 0.0.
precond (bool) – If True, use the preconditioned and reparameterized version of propagation (eq. 28); otherwise use the normalized Laplacian (eq. 30).
Example
>>> import dgl
>>> from dgl.nn import TWIRLSUnfoldingAndAttention
>>> import torch as th

>>> g = dgl.graph(([0, 1, 2, 3, 2, 5], [1, 2, 3, 4, 0, 3])).add_self_loop()
>>> feat = th.ones(6, 5)
>>> prop = TWIRLSUnfoldingAndAttention(10, 1, 1, prop_step=3)
>>> res = prop(g, feat)
>>> res
tensor([[2.5000, 2.5000, 2.5000, 2.5000, 2.5000],
        [2.5000, 2.5000, 2.5000, 2.5000, 2.5000],
        [2.5000, 2.5000, 2.5000, 2.5000, 2.5000],
        [3.7656, 3.7656, 3.7656, 3.7656, 3.7656],
        [2.5217, 2.5217, 2.5217, 2.5217, 2.5217],
        [4.0000, 4.0000, 4.0000, 4.0000, 4.0000]])
GCN2Conv¶

class
dgl.nn.pytorch.conv.
GCN2Conv
(in_feats, layer, alpha=0.1, lambda_=1, project_initial_features=True, allow_zero_in_degree=False, bias=True, activation=None)[source]¶ Bases:
torch.nn.modules.module.Module
The Graph Convolutional Network via Initial residual and Identity mapping (GCNII) was introduced in the paper “Simple and Deep Graph Convolutional Networks”. It is mathematically defined as follows:
\[\mathbf{h}^{(l+1)} =\left( (1 - \alpha)(\mathbf{D}^{-1/2} \mathbf{\hat{A}} \mathbf{D}^{-1/2})\mathbf{h}^{(l)} + \alpha {\mathbf{h}^{(0)}} \right) \left( (1 - \beta_l) \mathbf{I} + \beta_l \mathbf{W} \right)\]
where \(\mathbf{\hat{A}}\) is the adjacency matrix with self-loops, \(\mathbf{D}_{ii} = \sum_{j=0} \mathbf{A}_{ij}\) is its diagonal degree matrix, \(\mathbf{h}^{(0)}\) is the initial node features, \(\mathbf{h}^{(l)}\) is the feature of layer \(l\), \(\alpha\) is the fraction of initial node features, and \(\beta_l\) is the hyperparameter to tune the strength of identity mapping. It is defined by \(\beta_l = \log(\frac{\lambda}{l}+1)\approx\frac{\lambda}{l}\), where \(\lambda\) is a hyperparameter. \(\beta_l\) ensures that the decay of the weight matrix adaptively increases as we stack more layers.
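To get a feel for how \(\beta_l\) shrinks with depth, the short sketch below evaluates the definition above for a few layer indices. The helper name gcnii_beta is hypothetical and is not part of DGL; it only restates \(\beta_l = \log(\lambda/l + 1)\) in plain Python.

```python
import math


def gcnii_beta(lambda_: float, layer: int) -> float:
    """Hypothetical helper (not a DGL API): beta_l = log(lambda / l + 1), l >= 1."""
    if layer < 1:
        raise ValueError("layer index must be >= 1")
    return math.log(lambda_ / layer + 1.0)


layers = (1, 2, 4, 8, 16, 32)
betas = [gcnii_beta(1.0, l) for l in layers]
for l, b in zip(layers, betas):
    # For large l, log(lambda / l + 1) is close to lambda / l.
    print(f"layer={l:2d}  beta={b:.4f}  lambda/l={1.0 / l:.4f}")
```

As the layer index grows, \(\beta_l\) decreases, so the term \((1 - \beta_l)\mathbf{I}\) dominates and deeper layers stay closer to an identity mapping.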
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
layer (int) – The index of the current layer.
alpha (float) – The fraction of the initial input features. Default: 0.1.
lambda_ (float) – The hyperparameter to ensure the decay of the weight matrix adaptively increases. Default: 1.
project_initial_features (bool) – Whether to share a weight matrix between initial features and smoothed features. Default: True.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
allow_zero_in_degree (bool, optional) – If there are 0-in-degree nodes in the graph, the output for those nodes will be invalid since no message will be passed to them. This is harmful for some applications, causing silent performance regression. This module will raise a DGLError if it detects 0-in-degree nodes in the input graph. Setting this to True suppresses the check and lets users handle such nodes themselves. Default: False.
Note
Zero in-degree nodes will lead to invalid output values. This is because no message will be passed to those nodes, so the aggregation function will be applied on empty input. A common practice to avoid this is to add a self-loop for each node in the graph if it is homogeneous, which can be achieved by:
>>> g = ... # a DGLGraph
>>> g = dgl.add_self_loop(g)
Calling add_self_loop will not work for some graphs, for example, heterogeneous graphs, since the edge type cannot be decided for self-loop edges. Set allow_zero_in_degree to True for those cases to unblock the code and handle zero-in-degree nodes manually. A common practice to handle this is to filter out the nodes with zero in-degree when using the output after the convolution.
Examples
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import GCN2Conv

>>> # Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> feat = th.ones(6, 3)
>>> g = dgl.add_self_loop(g)
>>> conv1 = GCN2Conv(3, layer=1, alpha=0.5, \
...         project_initial_features=True, allow_zero_in_degree=True)
>>> conv2 = GCN2Conv(3, layer=2, alpha=0.5, \
...         project_initial_features=True, allow_zero_in_degree=True)
>>> res = feat
>>> res = conv1(g, res, feat)
>>> res = conv2(g, res, feat)
>>> print(res)
tensor([[1.3803, 3.3191, 2.9572],
        [1.3803, 3.3191, 2.9572],
        [1.3803, 3.3191, 2.9572],
        [1.4770, 3.8326, 3.2451],
        [1.3623, 3.2102, 2.8679],
        [1.3803, 3.3191, 2.9572]], grad_fn=<AddBackward0>)

forward
(graph, feat, feat_0)[source]¶ Compute graph convolution.
 Parameters
graph (DGLGraph) – The graph.
feat (torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is the size of input feature and \(N\) is the number of nodes.
feat_0 (torch.Tensor) – The initial feature of shape \((N, D_{in})\)
 Returns
The output feature
 Return type
torch.Tensor
 Raises
DGLError – If there are 0-in-degree nodes in the input graph, a DGLError is raised since no message will be passed to those nodes, which would cause invalid output. The error can be ignored by setting the allow_zero_in_degree parameter to True.
Note
Input shape: \((N, *, \text{in_feats})\) where * means any number of additional dimensions, \(N\) is the number of nodes.
Output shape: \((N, *, \text{out_feats})\) where all but the last dimension are the same shape as the input.
Weight shape: \((\text{in_feats}, \text{out_feats})\).
Dense Conv Layers¶
DenseGraphConv¶

class
dgl.nn.pytorch.conv.
DenseGraphConv
(in_feats, out_feats, norm='both', bias=True, activation=None)[source]¶ Bases:
torch.nn.modules.module.Module
Graph Convolutional Network layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying graph convolution on dense graphs.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_j^{(l)}\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
norm (str, optional) – How to apply the normalizer. If ‘right’, divide the aggregated messages by each node’s in-degrees, which is equivalent to averaging the received messages. If ‘none’, no normalization is applied. Default is ‘both’, where the \(c_{ij}\) in the paper is applied.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
Notes
Zero in-degree nodes will lead to all-zero output. A common practice to avoid this is to add a self-loop for each node in the graph, which can be achieved by setting the diagonal of the adjacency matrix to 1.
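The note above can be illustrated without any tensor library. The sketch below uses plain Python lists and two hypothetical helpers, in_degrees and with_self_loops, which are not DGL functions. It assumes the convention stated for forward: a row is a destination node and a column is a source node, so a row sum is an in-degree.

```python
# Adjacency matrix used in the examples of this section;
# rows are destination nodes, columns are source nodes.
adj = [
    [0., 0., 1., 0., 0., 0.],
    [1., 0., 0., 0., 0., 0.],
    [0., 1., 0., 0., 0., 0.],
    [0., 0., 1., 0., 0., 1.],
    [0., 0., 0., 1., 0., 0.],
    [0., 0., 0., 0., 0., 0.],
]


def in_degrees(a):
    """Hypothetical helper: in-degree of each node is the sum of its row."""
    return [sum(row) for row in a]


def with_self_loops(a):
    """Hypothetical helper: copy of `a` with every diagonal entry set to 1."""
    return [[1.0 if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(a)]


zero_in = [i for i, d in enumerate(in_degrees(adj)) if d == 0]
adj_sl = with_self_loops(adj)
print("zero in-degree nodes:", zero_in)
print("in-degrees after adding self-loops:", in_degrees(adj_sl))
```

With a torch.Tensor adjacency matrix, an in-place call such as adj.fill_diagonal_(1.0) is one way to get the same effect before passing adj to the layer.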
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import DenseGraphConv
>>>
>>> feat = th.ones(6, 10)
>>> adj = th.tensor([[0., 0., 1., 0., 0., 0.],
...                  [1., 0., 0., 0., 0., 0.],
...                  [0., 1., 0., 0., 0., 0.],
...                  [0., 0., 1., 0., 0., 1.],
...                  [0., 0., 0., 1., 0., 0.],
...                  [0., 0., 0., 0., 0., 0.]])
>>> conv = DenseGraphConv(10, 2)
>>> res = conv(adj, feat)
>>> res
tensor([[0.2159, 1.9027],
        [0.3053, 2.6908],
        [0.3053, 2.6908],
        [0.3685, 3.2481],
        [0.3053, 2.6908],
        [0.0000, 0.0000]], grad_fn=<AddBackward0>)

forward
(adj, feat)[source]¶ Compute (Dense) Graph Convolution layer.
 Parameters
adj (torch.Tensor) – The adjacency matrix of the graph to apply Graph Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape \((N_{out}, N_{in})\); when applied to a homogeneous graph, adj should be of shape \((N, N)\). In both cases, a row represents a destination node while a column represents a source node.
feat (torch.Tensor) – The input feature.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
torch.Tensor
DenseSAGEConv¶

class
dgl.nn.pytorch.conv.
DenseSAGEConv
(in_feats, out_feats, feat_drop=0.0, bias=True, norm=None, activation=None)[source]¶ Bases:
torch.nn.modules.module.Module
GraphSAGE layer where the graph structure is given by an adjacency matrix. We recommend using this module when applying GraphSAGE on dense graphs.
Note that we only support gcn aggregator in DenseSAGEConv.
 Parameters
in_feats (int) – Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\).
out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).
feat_drop (float, optional) – Dropout rate on features. Default: 0.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features.
activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import DenseSAGEConv
>>>
>>> feat = th.ones(6, 10)
>>> adj = th.tensor([[0., 0., 1., 0., 0., 0.],
...                  [1., 0., 0., 0., 0., 0.],
...                  [0., 1., 0., 0., 0., 0.],
...                  [0., 0., 1., 0., 0., 1.],
...                  [0., 0., 0., 1., 0., 0.],
...                  [0., 0., 0., 0., 0., 0.]])
>>> conv = DenseSAGEConv(10, 2)
>>> res = conv(adj, feat)
>>> res
tensor([[1.0401, 2.1008],
        [1.0401, 2.1008],
        [1.0401, 2.1008],
        [1.0401, 2.1008],
        [1.0401, 2.1008],
        [1.0401, 2.1008]], grad_fn=<AddmmBackward>)

forward
(adj, feat)[source]¶ Compute (Dense) Graph SAGE layer.
 Parameters
adj (torch.Tensor) – The adjacency matrix of the graph to apply SAGE Convolution on. When applied to a unidirectional bipartite graph, adj should be of shape \((N_{out}, N_{in})\); when applied to a homogeneous graph, adj should be of shape \((N, N)\). In both cases, a row represents a destination node while a column represents a source node.
feat (torch.Tensor or a pair of torch.Tensor) – If a torch.Tensor is given, the input feature of shape \((N, D_{in})\) where \(D_{in}\) is the size of the input feature and \(N\) is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, D_{in})\) and \((N_{out}, D_{in})\).
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
torch.Tensor
DenseChebConv¶

class
dgl.nn.pytorch.conv.
DenseChebConv
(in_feats, out_feats, k, bias=True)[source]¶ Bases:
torch.nn.modules.module.Module
Chebyshev Spectral Graph Convolution layer from the paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
We recommend using this module when applying ChebConv on dense graphs.
 Parameters
in_feats (int) – Dimension of input features \(h_i^{(l)}\).
out_feats (int) – Dimension of output features \(h_i^{(l+1)}\).
k (int) – Chebyshev filter size.
activation (function, optional) – Activation function. Default: ReLU.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
Example
>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import DenseChebConv
>>>
>>> feat = th.ones(6, 10)
>>> adj = th.tensor([[0., 0., 1., 0., 0., 0.],
...                  [1., 0., 0., 0., 0., 0.],
...                  [0., 1., 0., 0., 0., 0.],
...                  [0., 0., 1., 0., 0., 1.],
...                  [0., 0., 0., 1., 0., 0.],
...                  [0., 0., 0., 0., 0., 0.]])
>>> conv = DenseChebConv(10, 2, 2)
>>> res = conv(adj, feat)
>>> res
tensor([[3.3516, 2.4797],
        [3.3516, 2.4797],
        [3.3516, 2.4797],
        [4.5192, 3.0835],
        [2.5259, 2.0527],
        [0.5327, 1.0219]], grad_fn=<AddBackward0>)

forward
(adj, feat, lambda_max=None)[source]¶ Compute (Dense) Chebyshev Spectral Graph Convolution layer.
 Parameters
adj (torch.Tensor) – The adjacency matrix of the graph to apply Graph Convolution on, should be of shape \((N, N)\), where a row represents the destination and a column represents the source.
feat (torch.Tensor) – The input feature of shape \((N, D_{in})\) where \(D_{in}\) is size of input feature, \(N\) is the number of nodes.
lambda_max (float or None, optional) – The largest eigenvalue of the given graph. Default: None.
 Returns
The output feature of shape \((N, D_{out})\) where \(D_{out}\) is size of output feature.
 Return type
torch.Tensor
Global Pooling Layers¶
Torch modules for graph global pooling.
SumPooling¶

class
dgl.nn.pytorch.glob.
SumPooling
[source]¶ Bases:
torch.nn.modules.module.Module
Apply sum pooling over the nodes in a graph.
\[r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k\]
Notes
Input: Could be one graph, or a batch of graphs. If using a batch of graphs, make sure nodes in all graphs have the same feature size, and concatenate the nodes’ features together as the input.
Examples
The following example uses PyTorch backend.
>>> import dgl
>>> import torch as th
>>> from dgl.nn import SumPooling
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> sumpool = SumPooling()  # create a sum pooling layer
Case 1: Input a single graph
>>> sumpool(g1, g1_node_feats)
tensor([[2.2282, 1.8667, 2.4338, 1.7540, 1.4511]])
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> sumpool(batch_g, batch_f)
tensor([[2.2282, 1.8667, 2.4338, 1.7540, 1.4511],
        [1.0608, 1.2080, 2.1780, 2.7849, 2.5420]])

forward
(graph, feat)[source]¶ Compute sum pooling.
 Parameters
graph (DGLGraph) – A DGLGraph or a batch of DGLGraphs.
feat (torch.Tensor) – The input feature with shape \((N, D)\), where \(N\) is the number of nodes in the graph, and \(D\) means the size of features.
 Returns
The output feature with shape \((B, D)\), where \(B\) refers to the batch size of input graphs.
 Return type
torch.Tensor
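To make the relation between the \((N, D)\) input and the \((B, D)\) output concrete, the following sketch reproduces sum pooling over a batch with plain Python lists. The function segment_sum and the explicit nodes_per_graph argument are illustrative assumptions, not DGL APIs; in DGL the per-graph node counts come from the batched graph itself. The feature values are the ones printed in the example above.

```python
def segment_sum(feat, nodes_per_graph):
    """Illustrative sketch: sum consecutive row blocks of `feat`.

    Block b holds the nodes_per_graph[b] rows that belong to graph b.
    """
    pooled, start = [], 0
    for n in nodes_per_graph:
        block = feat[start:start + n]
        # zip(*block) iterates over feature columns of this graph's nodes.
        pooled.append([sum(col) for col in zip(*block)])
        start += n
    return pooled


g1_node_feats = [[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
                 [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
                 [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]]
g2_node_feats = [[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
                 [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
                 [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
                 [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]]

# Concatenate node features of both graphs, as in Case 2 above.
batch_f = g1_node_feats + g2_node_feats
readout = segment_sum(batch_f, [3, 4])  # B = 2 rows, D = 5 columns
for row in readout:
    print([round(v, 4) for v in row])
```

Because the inputs here are the four-decimal printed values, the sums can differ from the tensors shown above in the last digit.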

AvgPooling¶

class
dgl.nn.pytorch.glob.
AvgPooling
[source]¶ Bases:
torch.nn.modules.module.Module
Apply average pooling over the nodes in a graph.
\[r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k\]
Notes
Input: Could be one graph, or a batch of graphs. If using a batch of graphs, make sure nodes in all graphs have the same feature size, and concatenate the nodes’ features together as the input.
Examples
The following example uses PyTorch backend.
>>> import dgl
>>> import torch as th
>>> from dgl.nn import AvgPooling
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> avgpool = AvgPooling()  # create an average pooling layer
Case 1: Input a single graph
>>> avgpool(g1, g1_node_feats)
tensor([[0.7427, 0.6222, 0.8113, 0.5847, 0.4837]])
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> avgpool(batch_g, batch_f)
tensor([[0.7427, 0.6222, 0.8113, 0.5847, 0.4837],
        [0.2652, 0.3020, 0.5445, 0.6962, 0.6355]])

forward
(graph, feat)[source]¶ Compute average pooling.
 Parameters
graph (DGLGraph) – A DGLGraph or a batch of DGLGraphs.
feat (torch.Tensor) – The input feature with shape \((N, D)\), where \(N\) is the number of nodes in the graph, and \(D\) means the size of features.
 Returns
The output feature with shape \((B, D)\), where \(B\) refers to the batch size of input graphs.
 Return type
torch.Tensor

MaxPooling¶

class
dgl.nn.pytorch.glob.
MaxPooling
[source]¶ Bases:
torch.nn.modules.module.Module
Apply max pooling over the nodes in a graph.
\[r^{(i)} = \max_{k=1}^{N_i}\left( x^{(i)}_k \right)\]
Notes
Input: Could be one graph, or a batch of graphs. If using a batch of graphs, make sure nodes in all graphs have the same feature size, and concatenate the nodes’ features together as the input.
Examples
The following example uses PyTorch backend.
>>> import dgl
>>> import torch as th
>>> from dgl.nn import MaxPooling
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> maxpool = MaxPooling()  # create a max pooling layer
Case 1: Input a single graph
>>> maxpool(g1, g1_node_feats)
tensor([[0.8948, 0.9030, 0.9137, 0.7567, 0.6118]])
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> maxpool(batch_g, batch_f)
tensor([[0.8948, 0.9030, 0.9137, 0.7567, 0.6118],
        [0.5278, 0.6365, 0.9990, 0.9028, 0.8945]])

forward
(graph, feat)[source]¶ Compute max pooling.
 Parameters
graph (DGLGraph) – A DGLGraph or a batch of DGLGraphs.
feat (torch.Tensor) – The input feature with shape \((N, *)\), where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((B, *)\), where \(B\) refers to the batch size.
 Return type
torch.Tensor

SortPooling¶

class
dgl.nn.pytorch.glob.
SortPooling
(k)[source]¶ Bases:
torch.nn.modules.module.Module
Apply Sort Pooling (An End-to-End Deep Learning Architecture for Graph Classification) over the nodes in a graph. Sort Pooling first sorts the node features in ascending order along the feature dimension, and then selects the sorted features of the top-k nodes (ranked by the largest value of each node).
 Parameters
k (int) – The number of nodes to hold for each graph.
Notes
Input: Could be one graph, or a batch of graphs. If using a batch of graphs, make sure nodes in all graphs have the same feature size, and concatenate the nodes’ features together as the input.
Examples
>>> import dgl
>>> import torch as th
>>> from dgl.nn import SortPooling
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> sortpool = SortPooling(k=2)  # create a sort pooling layer
Case 1: Input a single graph
>>> sortpool(g1, g1_node_feats)
tensor([[0.0699, 0.3637, 0.7567, 0.8948, 0.9137,
         0.4755, 0.5197, 0.5725, 0.6825, 0.9030]])
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> sortpool(batch_g, batch_f)
tensor([[0.0699, 0.3637, 0.7567, 0.8948, 0.9137,
         0.4755, 0.5197, 0.5725, 0.6825, 0.9030],
        [0.2351, 0.5278, 0.6365, 0.8945, 0.9990,
         0.2053, 0.2426, 0.4111, 0.5658, 0.9028]])

forward
(graph, feat)[source]¶ Compute sort pooling.
 Parameters
graph (DGLGraph) – A DGLGraph or a batch of DGLGraphs.
feat (torch.Tensor) – The input feature with shape \((N, D)\), where \(N\) is the number of nodes in the graph, and \(D\) means the size of features.
 Returns
The output feature with shape \((B, k * D)\), where \(B\) refers to the batch size of input graphs.
 Return type
torch.Tensor
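The two steps described for Sort Pooling (sort each node's features, then keep the k nodes with the largest maximum) can be traced by hand with the sketch below. The function sort_pool is a plain-Python illustration, not the DGL implementation; the nodes_per_graph argument and the zero padding for graphs with fewer than k nodes are assumptions made for this sketch.

```python
def sort_pool(feat, nodes_per_graph, k):
    """Illustrative sketch of sort pooling on row blocks of `feat`."""
    width = len(feat[0])
    pooled, start = [], 0
    for n in nodes_per_graph:
        # Step 1: sort every node's features in ascending order.
        rows = [sorted(r) for r in feat[start:start + n]]
        # Step 2: rank nodes by their largest value (last entry after sorting).
        rows.sort(key=lambda r: r[-1], reverse=True)
        top = rows[:k]
        # Assumption: pad with zero rows when a graph has fewer than k nodes.
        top += [[0.0] * width for _ in range(k - len(top))]
        # Flatten the k selected rows into one vector of length k * D.
        pooled.append([v for r in top for v in r])
        start += n
    return pooled


g1_node_feats = [[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
                 [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
                 [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]]
g2_node_feats = [[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
                 [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
                 [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
                 [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]]

sorted_readout = sort_pool(g1_node_feats + g2_node_feats, [3, 4], k=2)
for row in sorted_readout:
    print(row)
```

For these inputs the sketch selects the same rows, in the same order, as the tensors printed in the example above, which gives an output of shape \((B, k * D) = (2, 10)\).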
WeightAndSum¶

class
dgl.nn.pytorch.glob.
WeightAndSum
(in_feats)[source]¶ Bases:
torch.nn.modules.module.Module
Compute importance weights for atoms and perform a weighted sum.
 Parameters
in_feats (int) – Input atom feature size
Examples
The following example uses PyTorch backend.
>>> import dgl
>>> import torch as th
>>> from dgl.nn import WeightAndSum
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> weight_and_sum = WeightAndSum(5)  # create a weight and sum layer (in_feats=5)
Case 1: Input a single graph
>>> weight_and_sum(g1, g1_node_feats)
tensor([[1.2194, 0.9490, 1.3235, 0.9609, 0.7710]], grad_fn=<SegmentReduceBackward>)
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> weight_and_sum(batch_g, batch_f)
tensor([[1.2194, 0.9490, 1.3235, 0.9609, 0.7710],
        [0.5322, 0.5840, 1.0729, 1.3665, 1.2360]], grad_fn=<SegmentReduceBackward>)
Notes
The WeightAndSum module is commonly used in molecular property prediction networks; see the GCN predictor in dgl-lifesci for how to use the WeightAndSum layer to get the graph readout output.

forward
(g, feats)[source]¶ Compute molecule representations out of atom representations
 Parameters
g (DGLGraph) – DGLGraph with batch size B for processing multiple molecules in parallel
feats (FloatTensor of shape (N, self.in_feats)) – Representations for all atoms in the molecules, where N is the total number of atoms in all molecules.
 Returns
Representations for B molecules
 Return type
FloatTensor of shape (B, self.in_feats)
GlobalAttentionPooling¶

class
dgl.nn.pytorch.glob.
GlobalAttentionPooling
(gate_nn, feat_nn=None)[source]¶ Bases:
torch.nn.modules.module.Module
Apply Global Attention Pooling (Gated Graph Sequence Neural Networks) over the nodes in a graph.
\[r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)\]
 Parameters
gate_nn (torch.nn.Module) – A neural network that computes attention scores for each feature.
feat_nn (torch.nn.Module, optional) – A neural network applied to each feature before combining them with attention scores.
Examples
The following example uses PyTorch backend.
>>> import dgl
>>> import torch as th
>>> from dgl.nn import GlobalAttentionPooling
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> gate_nn = th.nn.Linear(5, 1)  # the gate layer that maps node feature to scalar
>>> gap = GlobalAttentionPooling(gate_nn)  # create a Global Attention Pooling layer
Case 1: Input a single graph
>>> gap(g1, g1_node_feats)
tensor([[0.7410, 0.6032, 0.8111, 0.5942, 0.4762]], grad_fn=<SegmentReduceBackward>)
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats], 0)
>>>
>>> gap(batch_g, batch_f)
tensor([[0.7410, 0.6032, 0.8111, 0.5942, 0.4762],
        [0.2417, 0.2743, 0.5054, 0.7356, 0.6146]], grad_fn=<SegmentReduceBackward>)
Notes
See our GGNN example on how to use the GatedGraphConv and GlobalAttentionPooling layers to build a graph neural network that can solve Sudoku.

forward
(graph, feat)[source]¶ Compute global attention pooling.
 Parameters
graph (DGLGraph) – A DGLGraph or a batch of DGLGraphs.
feat (torch.Tensor) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph, and \(D\) means the size of features.
 Returns
The output feature with shape \((B, D)\), where \(B\) refers to the batch size.
 Return type
torch.Tensor
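The readout formula for this layer is a softmax-weighted sum taken separately within each graph. The sketch below spells that out with plain Python lists. Everything in it is illustrative: global_attention_pool, linear_gate, the fixed weights in GATE_W, and the nodes_per_graph argument are assumptions standing in for gate_nn, feat_nn, and the batched graph, and they are not DGL APIs.

```python
import math

GATE_W = [0.5, -0.25, 0.1, 0.0, 0.3]  # hypothetical fixed gate weights


def linear_gate(x):
    """Hypothetical stand-in for gate_nn: maps a feature vector to a scalar."""
    return sum(w * v for w, v in zip(GATE_W, x))


def global_attention_pool(feat, nodes_per_graph, gate, feat_fn=lambda x: x):
    """Illustrative sketch: r = sum_k softmax(gate(x_k)) * feat_fn(x_k) per graph."""
    pooled, start = [], 0
    for n in nodes_per_graph:
        block = feat[start:start + n]
        scores = [gate(x) for x in block]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]  # softmax over this graph's nodes only
        proj = [feat_fn(x) for x in block]
        pooled.append([sum(a * row[d] for a, row in zip(alphas, proj))
                       for d in range(len(proj[0]))])
        start += n
    return pooled


g1_node_feats = [[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
                 [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
                 [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]]
g2_node_feats = [[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
                 [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
                 [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
                 [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]]

batch_f = g1_node_feats + g2_node_feats
attn_readout = global_attention_pool(batch_f, [3, 4], linear_gate)
print(len(attn_readout), len(attn_readout[0]))  # B rows, D columns
```

Two limiting cases are a useful sanity check: a gate that returns the same score for every node reduces the layer to average pooling, and a gate that strongly favors one node makes the readout approach that node's features.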
Set2Set¶

class
dgl.nn.pytorch.glob.
Set2Set
(input_dim, n_iters, n_layers)[source]¶ Bases:
torch.nn.modules.module.Module
For each individual graph in the batch, set2set computes
\[ \begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align} \]
for this graph.
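 Parameters
input_dim (int) – The size of each input feature.
n_iters (int) – The number of iterations.
n_layers (int) – The number of recurrent (LSTM) layers.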
Examples
The following example uses PyTorch backend.
>>> import dgl
>>> import torch as th
>>> from dgl.nn import Set2Set
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> s2s = Set2Set(5, 2, 1)  # create a Set2Set layer (n_iters=2, n_layers=1)
Case 1: Input a single graph
>>> s2s(g1, g1_node_feats)
tensor([[0.0235, 0.2291, 0.2654, 0.0376, 0.1349,
         0.7560, 0.5822, 0.8199, 0.5960, 0.4760]], grad_fn=<CatBackward>)
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats], 0)
>>>
>>> s2s(batch_g, batch_f)
tensor([[0.0235, 0.2291, 0.2654, 0.0376, 0.1349,
         0.7560, 0.5822, 0.8199, 0.5960, 0.4760],
        [0.0483, 0.2010, 0.2324, 0.0145, 0.1361,
         0.2703, 0.3078, 0.5529, 0.6876, 0.6399]], grad_fn=<CatBackward>)
Notes
Set2Set is widely used in molecular property prediction; see dgl-lifesci’s MPNN example on how to use DGL’s Set2Set layer in graph property prediction applications.

forward
(graph, feat)[source]¶ Compute set2set pooling.
 Parameters
graph (DGLGraph) – The input graph.
feat (torch.Tensor) – The input feature with shape \((N, D)\) where \(N\) is the number of nodes in the graph, and \(D\) means the size of features.
 Returns
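The output feature with shape \((B, 2D)\), where \(B\) refers to the batch size, and \(D\) means the size of features. The factor of 2 comes from the concatenation \(q_t \Vert r_t\), as in the example above where \(D = 5\) gives 10 output columns.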
 Return type
torch.Tensor
SetTransformerEncoder¶

class
dgl.nn.pytorch.glob.
SetTransformerEncoder
(d_model, n_heads, d_head, d_ff, n_layers=1, block_type='sab', m=None, dropouth=0.0, dropouta=0.0)[source]¶ Bases:
torch.nn.modules.module.Module
The Encoder module in Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks.
 Parameters
d_model (int) – The hidden size of the model.
n_heads (int) – The number of heads.
d_head (int) – The hidden size of each head.
d_ff (int) – The kernel size in FFN (Position-wise Feed-Forward Network) layer.
n_layers (int) – The number of layers.
block_type (str) – Building block type: ‘sab’ (Set Attention Block) or ‘isab’ (Induced Set Attention Block).
m (int or None) – The number of induced vectors in ISAB Block. Set to None if block type is ‘sab’.
dropouth (float) – The dropout rate of each sublayer.
dropouta (float) – The dropout rate of attention heads.
Examples
>>> import dgl
>>> import torch as th
>>> from dgl.nn import SetTransformerEncoder
>>>
>>> g1 = dgl.rand_graph(3, 4)  # g1 is a random graph with 3 nodes and 4 edges
>>> g1_node_feats = th.rand(3, 5)  # feature size is 5
>>> g1_node_feats
tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637],
        [0.8137, 0.8938, 0.8377, 0.4249, 0.6118],
        [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]])
>>>
>>> g2 = dgl.rand_graph(4, 6)  # g2 is a random graph with 4 nodes and 6 edges
>>> g2_node_feats = th.rand(4, 5)  # feature size is 5
>>> g2_node_feats
tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658],
        [0.5278, 0.6365, 0.9990, 0.2351, 0.8945],
        [0.3134, 0.0580, 0.4349, 0.7949, 0.3891],
        [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]])
>>>
>>> set_trans_enc = SetTransformerEncoder(5, 4, 4, 20)  # create a settrans encoder.
Case 1: Input a single graph
>>> set_trans_enc(g1, g1_node_feats)
tensor([[ 0.1262, 1.9081, 0.7287, 0.1678, 0.8854],
        [0.0634, 1.1996, 0.6955, 0.9230, 1.4904],
        [0.9972, 0.7924, 0.6907, 0.5221, 1.6211]], grad_fn=<NativeLayerNormBackward>)
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> set_trans_enc(batch_g, batch_f)
tensor([[ 0.1262, 1.9081, 0.7287, 0.1678, 0.8854],
        [0.0634, 1.1996, 0.6955, 0.9230, 1.4904],
        [0.9972, 0.7924, 0.6907, 0.5221, 1.6211],
        [0.7973, 1.3203, 0.0634, 0.5237, 1.5306],
        [0.4497, 1.0920, 0.8470, 0.8030, 1.4977],
        [0.4940, 1.6045, 0.2363, 0.4885, 1.3737],
        [0.9840, 1.0913, 0.0099, 0.4653, 1.6199]], grad_fn=<NativeLayerNormBackward>)
Notes
SetTransformerEncoder is not a readout layer: the tensor it returns is a node-wise representation rather than a graph-wise representation. SetTransformerDecoder returns a graph readout tensor.

forward
(graph, feat)[source]¶ Compute the Encoder part of Set Transformer.
 Parameters
graph (DGLGraph) – The input graph.
feat (torch.Tensor) – The input feature with shape \((N, D)\), where \(N\) is the number of nodes in the graph.
 Returns
The output feature with shape \((N, D)\).
 Return type
torch.Tensor
SetTransformerDecoder¶

class
dgl.nn.pytorch.glob.
SetTransformerDecoder
(d_model, num_heads, d_head, d_ff, n_layers, k, dropouth=0.0, dropouta=0.0)[source]¶ Bases:
torch.nn.modules.module.Module
The Decoder module in Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks.
 Parameters
d_model (int) – Hidden size of the model.
num_heads (int) – The number of heads.
d_head (int) – Hidden size of each head.
d_ff (int) – Kernel size in FFN (Position-wise Feed-Forward Network) layer.
n_layers (int) – The number of layers.
k (int) – The number of seed vectors in PMA (Pooling by Multi-head Attention) layer.
dropouth (float) – Dropout rate of each sublayer.
dropouta (float) – Dropout rate of attention heads.
Examples
>>> import dgl >>> import torch as th >>> from dgl.nn import SetTransformerDecoder >>> >>> g1 = dgl.rand_graph(3, 4) # g1 is a random graph with 3 nodes and 4 edges >>> g1_node_feats = th.rand(3, 5) # feature size is 5 >>> g1_node_feats tensor([[0.8948, 0.0699, 0.9137, 0.7567, 0.3637], [0.8137, 0.8938, 0.8377, 0.4249, 0.6118], [0.5197, 0.9030, 0.6825, 0.5725, 0.4755]]) >>> >>> g2 = dgl.rand_graph(4, 6) # g2 is a random graph with 4 nodes and 6 edges >>> g2_node_feats = th.rand(4, 5) # feature size is 5 >>> g2_node_feats tensor([[0.2053, 0.2426, 0.4111, 0.9028, 0.5658], [0.5278, 0.6365, 0.9990, 0.2351, 0.8945], [0.3134, 0.0580, 0.4349, 0.7949, 0.3891], [0.0142, 0.2709, 0.3330, 0.8521, 0.6925]]) >>> >>> set_trans_dec = SetTransformerDecoder(5, 4, 4, 20, 1, 3) # define the layer
Case 1: Input a single graph
>>> set_trans_dec(g1, g1_node_feats)
tensor([[0.5538, 1.8726, 1.0470, 0.0276, 0.2994, 0.6317, 1.6754, 1.3189,
         0.2291, 0.0461, 0.4042, 0.8387, 1.7091, 1.0845, 0.1902]],
       grad_fn=<ViewBackward>)
Case 2: Input a batch of graphs
Build a batch of DGL graphs and concatenate all graphs’ node features into one tensor.
>>> batch_g = dgl.batch([g1, g2])
>>> batch_f = th.cat([g1_node_feats, g2_node_feats])
>>>
>>> set_trans_dec(batch_g, batch_f)
tensor([[0.5538, 1.8726, 1.0470, 0.0276, 0.2994, 0.6317, 1.6754, 1.3189,
         0.2291, 0.0461, 0.4042, 0.8387, 1.7091, 1.0845, 0.1902],
        [0.5511, 1.8869, 1.0156, 0.0028, 0.3231, 0.6305, 1.6845, 1.3105,
         0.2136, 0.0428, 0.3820, 0.8043, 1.7138, 1.1126, 0.1789]],
       grad_fn=<ViewBackward>)
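The readout in the examples has 15 columns because the PMA layer pools the nodes of each graph into k = 3 rows of size 5. The following is a minimal sketch of that pooling idea in plain Python, not DGL's implementation: it assumes a single attention head, no learned projections, no feed-forward or normalization sublayers, and random rather than learned seed vectors. The names `pma_pool`, `seeds` and `feats` are hypothetical.

```python
import math
import random


def pma_pool(seeds, feats):
    """Pool N node feature rows into k rows using softmax attention from k seed vectors."""
    d = len(feats[0])
    pooled = []
    for s in seeds:
        # Scaled dot-product score between this seed and every node.
        scores = [sum(a * b for a, b in zip(s, x)) / math.sqrt(d) for x in feats]
        m = max(scores)
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention-weighted average of the node features.
        pooled.append([sum(w * x[j] for w, x in zip(weights, feats)) for j in range(d)])
    # Flatten the k pooled rows into one readout vector of length k * d.
    return [v for row in pooled for v in row]


random.seed(0)
k, d = 3, 5
seeds = [[random.random() for _ in range(d)] for _ in range(k)]
feats = [[random.random() for _ in range(d)] for _ in range(4)]
readout = pma_pool(seeds, feats)
shuffled = pma_pool(seeds, feats[::-1])  # same nodes, different order
```

Because each seed attends over the whole node set, reordering the nodes leaves the readout unchanged up to floating-point rounding, which is the permutation invariance the paper title refers to.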

forward
(graph, feat)[source]¶ Compute the decoder part of Set Transformer.
 Parameters
graph (DGLGraph) – The input graph.
feat (torch.Tensor) – The input feature with shape \((N, D)\), where \(N\) is the number of nodes in the graph, and \(D\) means the size of features.
 Returns
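The output feature with shape \((B, k \cdot D)\), where \(B\) refers to the batch size and \(k\) is the number of seed vectors (the examples above use \(k = 3\) and \(D = 5\), giving 15 columns).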
 Return type
torch.Tensor
Heterogeneous Graph Convolution Module¶
HeteroGraphConv¶

class
dgl.nn.pytorch.
HeteroGraphConv
(mods, aggregate='sum')[source]¶ Bases:
torch.nn.modules.module.Module
A generic module for computing convolution on heterogeneous graphs.
The heterograph convolution applies submodules on their associated relation graphs. Each submodule reads the features from source nodes and writes the updated ones to destination nodes. If multiple relations have the same destination node type, their results are aggregated by the specified method. If a relation graph has no edges, the corresponding module will not be called.
Pseudocode:
outputs = {nty : [] for nty in g.dsttypes}
# Apply submodules on their associating relation graphs in parallel
for relation in g.canonical_etypes:
    stype, etype, dtype = relation
    dstdata = relation_submodule(g[relation], ...)
    outputs[dtype].append(dstdata)

# Aggregate the results for each destination node type
rsts = {}
for ntype, ntype_outputs in outputs.items():
    if len(ntype_outputs) != 0:
        rsts[ntype] = aggregate(ntype_outputs)
return rsts
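A runnable toy version of the pseudocode may help make the control flow concrete. This sketch uses plain dictionaries instead of a DGL graph and scalar features instead of tensors; `hetero_conv`, `sum_neighbors` and the data layout are hypothetical names chosen for illustration, and nodes missing from one relation's output are assumed to contribute zero.

```python
def hetero_conv(rel_edges, mods, inputs, aggregate=sum):
    """rel_edges: {(stype, etype, dtype): [(src, dst), ...]}; inputs: {ntype: {node_id: value}}."""
    outputs = {}
    for (stype, etype, dtype), edges in rel_edges.items():
        # Skip relations with no edges or whose source type has no input features.
        if not edges or stype not in inputs:
            continue
        outputs.setdefault(dtype, []).append(mods[etype](edges, inputs[stype]))
    # Aggregate per destination type, node by node.
    results = {}
    for dtype, per_rel in outputs.items():
        nodes = set().union(*per_rel)
        results[dtype] = {n: aggregate(r.get(n, 0.0) for r in per_rel) for n in nodes}
    return results


def sum_neighbors(edges, src_feats):
    """Toy submodule: each destination node sums the features of its source nodes."""
    out = {}
    for src, dst in edges:
        out[dst] = out.get(dst, 0.0) + src_feats[src]
    return out


rel_edges = {
    ("user", "follows", "user"): [(0, 1), (1, 2)],
    ("user", "plays", "game"): [(0, 0), (1, 0)],
    ("store", "sells", "game"): [(0, 0)],
}
mods = {"follows": sum_neighbors, "plays": sum_neighbors, "sells": sum_neighbors}
users = {0: 1.0, 1: 2.0, 2: 3.0}
out_user_only = hetero_conv(rel_edges, mods, {"user": users})
out_both = hetero_conv(rel_edges, mods, {"user": users, "store": {0: 10.0}})
out_store_only = hetero_conv(rel_edges, mods, {"store": {0: 10.0}})
```

With only 'user' inputs the 'sells' relation is skipped; with both inputs the 'plays' and 'sells' results for 'game' are summed.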
Examples
Create a heterograph with three types of relations and nodes.
>>> import dgl
>>> g = dgl.heterograph({
...     ('user', 'follows', 'user') : edges1,
...     ('user', 'plays', 'game') : edges2,
...     ('store', 'sells', 'game') : edges3})
Create a HeteroGraphConv that applies different convolution modules to different relations. Note that the modules for 'follows' and 'plays' do not share weights.
>>> import dgl.nn.pytorch as dglnn
>>> conv = dglnn.HeteroGraphConv({
...     'follows' : dglnn.GraphConv(...),
...     'plays' : dglnn.GraphConv(...),
...     'sells' : dglnn.SAGEConv(...)},
...     aggregate='sum')
Call forward with some 'user' features. This computes new features for both 'user' and 'game' nodes.
>>> import torch as th
>>> h1 = {'user' : th.randn((g.number_of_nodes('user'), 5))}
>>> h2 = conv(g, h1)
>>> print(h2.keys())
dict_keys(['user', 'game'])
Call forward with both 'user' and 'store' features. Because both the 'plays' and 'sells' relations will update the 'game' features, their results are aggregated by the specified method (i.e., summation here).
>>> f1 = {'user' : ..., 'store' : ...}
>>> f2 = conv(g, f1)
>>> print(f2.keys())
dict_keys(['user', 'game'])
Call forward with some 'store' features. This only computes new features for 'game' nodes.
>>> g1 = {'store' : ...}
>>> g2 = conv(g, g1)
>>> print(g2.keys())
dict_keys(['game'])
Calling forward with a pair of inputs is also allowed; each submodule is then invoked with a pair of inputs as well.
>>> x_src = {'user' : ..., 'store' : ...}
>>> x_dst = {'user' : ..., 'game' : ...}
>>> y_dst = conv(g, (x_src, x_dst))
>>> print(y_dst.keys())
dict_keys(['user', 'game'])
 Parameters
mods (dict[str, nn.Module]) – Modules associated with every edge type. The forward function of each module must take a DGLHeteroGraph object as its first argument; its second argument is either a tensor representing the node features or a pair of tensors representing the source and destination node features.
aggregate (str, callable, optional) –
Method for aggregating node features generated by different relations. Allowed string values are ‘sum’, ‘max’, ‘min’, ‘mean’, ‘stack’. The ‘stack’ aggregation is performed along the second dimension, whose order is deterministic. Users can also customize the aggregator by providing a callable instance. For example, aggregation by summation is equivalent to the following:
import torch

def my_agg_func(tensors, dsttype):
    # tensors: a list of tensors to aggregate
    # dsttype: string name of the destination node type for which the
    #          aggregation is performed
    stacked = torch.stack(tensors, dim=0)
    return torch.sum(stacked, dim=0)

forward
(g, inputs, mod_args=None, mod_kwargs=None)[source]¶ Forward computation
Invoke the forward function with each module and aggregate their results.
 Parameters
g (DGLHeteroGraph) – Graph data.
inputs (dict[str, Tensor] or pair of dict[str, Tensor]) – Input node features.
mod_args (dict[str, tuple[any]], optional) – Extra positional arguments for the submodules.
mod_kwargs (dict[str, dict[str, any]], optional) – Extra keyword arguments for the submodules.
 Returns
Output representations for every type of node.
 Return type
dict[str, Tensor]
Utility Modules¶
Sequential¶

class
dgl.nn.pytorch.utils.
Sequential
(*args)[source]¶ Bases:
torch.nn.modules.container.Sequential
A sequential container for stacking graph neural network modules.
DGL supports two modes: sequentially apply GNN modules on 1) the same graph or 2) a list of given graphs. In the second case, the number of graphs equals the number of modules inside this container.
 Parameters
*args – Submodules of torch.nn.Module that will be added to the container in the order in which they are passed to the constructor.
Examples
The following example uses PyTorch backend.
Mode 1: sequentially apply GNN modules on the same graph
>>> import torch
>>> import dgl
>>> import torch.nn as nn
>>> import dgl.function as fn
>>> from dgl.nn.pytorch import Sequential
>>> class ExampleLayer(nn.Module):
...     def __init__(self):
...         super().__init__()
...     def forward(self, graph, n_feat, e_feat):
...         with graph.local_scope():
...             graph.ndata['h'] = n_feat
...             graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
...             n_feat += graph.ndata['h']
...             graph.apply_edges(fn.u_add_v('h', 'h', 'e'))
...             e_feat += graph.edata['e']
...             return n_feat, e_feat
>>>
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> net = Sequential(ExampleLayer(), ExampleLayer(), ExampleLayer())
>>> n_feat = torch.rand(3, 4)
>>> e_feat = torch.rand(9, 4)
>>> net(g, n_feat, e_feat)
(tensor([[39.8597, 45.4542, 25.1877, 30.8086],
         [40.7095, 45.3985, 25.4590, 30.0134],
         [40.7894, 45.2556, 25.5221, 30.4220]]),
 tensor([[80.3772, 89.7752, 50.7762, 60.5520],
         [80.5671, 89.3736, 50.6558, 60.6418],
         [80.4620, 89.5142, 50.3643, 60.3126],
         [80.4817, 89.8549, 50.9430, 59.9108],
         [80.2284, 89.6954, 50.0448, 60.1139],
         [79.7846, 89.6882, 50.5097, 60.6213],
         [80.2654, 90.2330, 50.2787, 60.6937],
         [80.3468, 90.0341, 50.2062, 60.2659],
         [80.0556, 90.2789, 50.2882, 60.5845]]))
Mode 2: sequentially apply GNN modules on different graphs
>>> import torch
>>> import dgl
>>> import torch.nn as nn
>>> import dgl.function as fn
>>> import networkx as nx
>>> from dgl.nn.pytorch import Sequential
>>> class ExampleLayer(nn.Module):
...     def __init__(self):
...         super().__init__()
...     def forward(self, graph, n_feat):
...         with graph.local_scope():
...             graph.ndata['h'] = n_feat
...             graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
...             n_feat += graph.ndata['h']
...             return n_feat.view(graph.number_of_nodes() // 2, 2, -1).sum(1)
>>>
>>> g1 = dgl.DGLGraph(nx.erdos_renyi_graph(32, 0.05))
>>> g2 = dgl.DGLGraph(nx.erdos_renyi_graph(16, 0.2))
>>> g3 = dgl.DGLGraph(nx.erdos_renyi_graph(8, 0.8))
>>> net = Sequential(ExampleLayer(), ExampleLayer(), ExampleLayer())
>>> n_feat = torch.rand(32, 4)
>>> net([g1, g2, g3], n_feat)
tensor([[209.6221, 225.5312, 193.8920, 220.1002],
        [250.0169, 271.9156, 240.2467, 267.7766],
        [220.4007, 239.7365, 213.8648, 234.9637],
        [196.4630, 207.6319, 184.2927, 208.7465]])
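The two modes of the Sequential container come down to how graphs are paired with modules. The sketch below illustrates that dispatch in plain Python under stated assumptions: graphs are dictionaries mapping a node to its in-neighbors, features are lists of floats, and `run_sequential` and `add_in_neighbors` are hypothetical names, not part of DGL.

```python
def run_sequential(modules, graph, *feats):
    """Apply modules in order, on one shared graph or on one graph per module."""
    if isinstance(graph, list):
        if len(graph) != len(modules):
            raise ValueError("need exactly one graph per module")
        graphs = graph                      # mode 2: one graph per module
    else:
        graphs = [graph] * len(modules)     # mode 1: same graph for every module
    for module, g in zip(modules, graphs):
        out = module(g, *feats)
        # A module may return one value or a tuple; normalize for the next call.
        feats = out if isinstance(out, tuple) else (out,)
    return feats[0] if len(feats) == 1 else feats


def add_in_neighbors(graph, h):
    """Toy layer: h[v] plus the sum of h[u] over the in-neighbors u of v."""
    return [h[v] + sum(h[u] for u in graph[v]) for v in range(len(h))]


ring = {0: [2], 1: [0], 2: [1]}
full = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
layers = [add_in_neighbors, add_in_neighbors]
same = run_sequential(layers, ring, [1.0, 2.0, 3.0])           # mode 1
mixed = run_sequential(layers, [ring, full], [1.0, 2.0, 3.0])  # mode 2
```

The output of each module is fed to the next one as its feature arguments, which is why a layer returning a tuple (as in the Mode 1 example) must be followed by a layer accepting the same number of features.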
WeightBasis¶

class
dgl.nn.pytorch.utils.
WeightBasis
(shape, num_bases, num_outputs)[source]¶ Bases:
torch.nn.modules.module.Module
Basis decomposition module.
Basis decomposition is introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as below:
\[W_o = \sum_{b=1}^B a_{ob} V_b\]Each weight output \(W_o\) is essentially a linear combination of basis transformations \(V_b\) with coefficients \(a_{ob}\).
It is useful as a form of regularization on a large parameter matrix. Thus, the number of weight outputs is usually larger than the number of bases.
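 Parameters
shape (tuple[int]) – Shape of the basis parameter.
num_bases (int) – Number of bases.
num_outputs (int) – Number of outputs.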
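As a worked instance of \(W_o = \sum_{b=1}^B a_{ob} V_b\), the sketch below composes four 2x2 output weights from two bases using nested lists. It is only an illustration of the formula; `compose_weights`, `bases` and `coeffs` are hypothetical names, and the values are hand-picked rather than learned.

```python
def compose_weights(coeffs, bases):
    """W_o = sum_b a_ob * V_b, with each V_b given as a 2-D list of the same shape."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [[sum(a * V[r][c] for a, V in zip(a_o, bases)) for c in range(cols)]
         for r in range(rows)]
        for a_o in coeffs
    ]


bases = [[[1.0, 0.0], [0.0, 1.0]],   # V_1
         [[0.0, 1.0], [1.0, 0.0]]]   # V_2
coeffs = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [0.5, -0.5]]  # a_ob: 4 outputs, 2 bases
weights = compose_weights(coeffs, bases)
```

The saving grows with the number of outputs: for a basis of shape (16, 16), 4 bases and 100 outputs, the decomposition stores 4 x 256 + 100 x 4 = 1424 values instead of 100 x 256 = 25600.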
KNNGraph¶

class
dgl.nn.pytorch.factory.
KNNGraph
(k)[source]¶ Bases:
torch.nn.modules.module.Module
Layer that transforms one point set into a graph, or a batch of point sets with the same number of points into a union of those graphs.
The KNNGraph is implemented in the following steps:
Compute an NxN matrix of pairwise distance for all points.
Pick the k points with the smallest distance for each point as their k-nearest neighbors.
Construct a graph with edges to each point as a node from its k-nearest neighbors.
The overall computational complexity is \(O(N^2(\log N + D))\).
If a batch of point sets is provided, the point \(j\) in point set \(i\) is mapped to graph node ID: \(i \times M + j\), where \(M\) is the number of nodes in each point set.
The predecessors of each node are the k-nearest neighbors of the corresponding point.
 Parameters
k (int) – The number of neighbors.
Notes
The nearest neighbors found for a node include the node itself.
Examples
The following example uses PyTorch backend.
>>> import torch
>>> from dgl.nn.pytorch.factory import KNNGraph
>>>
>>> kg = KNNGraph(2)
>>> x = torch.tensor([[0,1], [1,2], [1,3], [100, 101], [101, 102], [50, 50]])
>>> g = kg(x)
>>> print(g.edges())
(tensor([0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5]),
 tensor([0, 0, 1, 2, 1, 2, 5, 3, 4, 3, 4, 5]))
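The three construction steps can be traced by hand on the same six points. The following is a plain-Python sketch, not DGL's implementation: it computes distances one row at a time rather than as a full matrix, uses squared Euclidean distance (which gives the same ranking), and breaks ties by lower index, which is an assumption. `knn_edges` is a hypothetical name.

```python
def knn_edges(points, k):
    """Return (src, dst) lists with an edge from each of the k nearest points to every point."""
    n = len(points)
    edges = []
    for dst in range(n):
        # Step 1: squared Euclidean distance from this point to every point (itself included).
        dist = [sum((a - b) ** 2 for a, b in zip(points[dst], points[src]))
                for src in range(n)]
        # Step 2: indices of the k smallest distances (ties broken by lower index).
        nearest = sorted(range(n), key=lambda s: (dist[s], s))[:k]
        # Step 3: the neighbors become predecessors of the point.
        edges.extend((src, dst) for src in nearest)
    edges.sort()
    return [s for s, _ in edges], [d for _, d in edges]


x = [[0, 1], [1, 2], [1, 3], [100, 101], [101, 102], [50, 50]]
src, dst = knn_edges(x, 2)
```

Sorted by source node, the resulting edge lists match the ones printed in the example above, including the self-loops mentioned in the Notes.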

forward
(x, algorithm='bruteforce-blas', dist='euclidean')[source]¶ Forward computation.
 Parameters
x (Tensor) – \((M, D)\) or \((N, M, D)\) where \(N\) means the number of point sets, \(M\) means the number of points in each point set, and \(D\) means the size of features.
algorithm (str, optional) –
Algorithm used to compute the k-nearest neighbors.
‘bruteforce-blas’ first computes the distance matrix using the BLAS matrix multiplication operation provided by the backend framework, then uses a top-k algorithm to get the k-nearest neighbors. This method is fast when the point set is small but has \(O(N^2)\) memory complexity, where \(N\) is the number of points.
‘bruteforce’ computes distances pair by pair and selects the k-nearest neighbors directly during distance computation. This method is slower than ‘bruteforce-blas’ but has less memory overhead (i.e., \(O(Nk)\), where \(N\) is the number of points and \(k\) is the number of nearest neighbors per node) since it does not need to store all distances.
‘bruteforce-sharemem’ (CUDA only) is similar to ‘bruteforce’ but uses shared memory on CUDA devices as a buffer. This method is faster than ‘bruteforce’ when the dimension of the input points is not large.
‘kd-tree’ uses the kd-tree algorithm (CPU only). This method is suitable for low-dimensional data (e.g., 3D point clouds).
‘nn-descent’ is an approximate approach from the paper Efficient k-nearest neighbor graph construction for generic similarity measures. This method searches for nearest neighbor candidates in “neighbors’ neighbors”.
(default: ‘bruteforce-blas’)
dist (str, optional) –
The distance metric used to compute distance between points. It can be one of the following metrics:
‘euclidean’: Use Euclidean distance (L2 norm) \(\sqrt{\sum_{i} (x_{i} - y_{i})^{2}}\).
‘cosine’: Use cosine distance.
(default: ‘euclidean’)
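The choice of metric can change which points count as nearest. The sketch below compares the two metrics in plain Python; it assumes the common convention that cosine distance is one minus cosine similarity, which may differ in detail from what DGL computes internally. The function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """L2 norm of the difference: sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y (assumed convention)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


q = [1.0, 1.0]
a = [10.0, 10.0]   # same direction as q, but far away
b = [1.0, 0.0]     # close to q, but at a 45 degree angle
```

Here `b` is the nearer point under the Euclidean metric, while `a` is nearer under cosine distance because it points in the same direction as `q`.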
 Returns
A DGLGraph without features.
 Return type
DGLGraph
SegmentedKNNGraph¶

class
dgl.nn.pytorch.factory.
SegmentedKNNGraph
(k)[source]¶ Bases:
torch.nn.modules.module.Module
Layer that transforms one point set into a graph, or a batch of point sets with different number of points into a union of those graphs.
If a batch of point sets is provided, then the point \(j\) in the point set \(i\) is mapped to graph node ID: \(\sum_{p<i} V_p + j\), where \(V_p\) means the number of points in the point set \(p\).
The predecessors of each node are the k-nearest neighbors of the corresponding point.
 Parameters
k (int) – The number of neighbors.
Notes
The nearest neighbors found for a node include the node itself.
Examples
The following example uses PyTorch backend.
>>> import torch
>>> from dgl.nn.pytorch.factory import SegmentedKNNGraph
>>>
>>> kg = SegmentedKNNGraph(2)
>>> x = torch.tensor([[0,1],
...                   [1,2],
...                   [1,3],
...                   [100, 101],
...                   [101, 102],
...                   [50, 50],
...                   [24,25],
...                   [25,24]])
>>> g = kg(x, [3,3,2])
>>> print(g.edges())
(tensor([0, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 7, 7]),
 tensor([0, 0, 1, 2, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 6, 7]))
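The node ID mapping \(\sum_{p<i} V_p + j\) and the per-segment neighbor search can be sketched in plain Python on the same eight points. This is an illustration only, not DGL's implementation; it assumes squared Euclidean distance and index-based tie-breaking, and `segmented_knn_edges` is a hypothetical name.

```python
def segmented_knn_edges(points, segs, k):
    """k-NN edges computed independently inside each consecutive segment of `points`."""
    src, dst = [], []
    offset = 0  # running sum of the sizes of earlier segments
    for size in segs:
        candidates = range(offset, offset + size)
        for j in range(size):
            v = offset + j  # global node ID of local point j in this segment
            dist = {u: sum((a - b) ** 2 for a, b in zip(points[v], points[u]))
                    for u in candidates}
            # Only points from the same segment can become neighbors.
            for u in sorted(candidates, key=lambda u: (dist[u], u))[:k]:
                src.append(u)
                dst.append(v)
        offset += size
    order = sorted(range(len(src)), key=lambda e: (src[e], dst[e]))
    return [src[e] for e in order], [dst[e] for e in order]


x = [[0, 1], [1, 2], [1, 3], [100, 101], [101, 102], [50, 50], [24, 25], [25, 24]]
seg_src, seg_dst = segmented_knn_edges(x, [3, 3, 2], 2)
```

Point 5 at (50, 50) now connects to point 3 rather than point 2, because point 2 belongs to a different segment.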

forward
(x, segs, algorithm='bruteforce-blas', dist='euclidean')[source]¶ Forward computation.
 Parameters
x (Tensor) – \((M, D)\) where \(M\) means the total number of points in all point sets, and \(D\) means the size of features.
segs (iterable of int) – \(N\) integers, where \(N\) is the number of point sets. Each integer is the number of points in one point set; the integers must sum to \(M\), and each of them should be \(\ge k\).
algorithm (str, optional) –
Algorithm used to compute the k-nearest neighbors.
‘bruteforce-blas’ first computes the distance matrix using the BLAS matrix multiplication operation provided by the backend framework, then uses a top-k algorithm to get the k-nearest neighbors. This method is fast when the point set is small but has \(O(N^2)\) memory complexity, where \(N\) is the number of points.
‘bruteforce’ computes distances pair by pair and selects the k-nearest neighbors directly during distance computation. This method is slower than ‘bruteforce-blas’ but has less memory overhead (i.e., \(O(Nk)\), where \(N\) is the number of points and \(k\) is the number of nearest neighbors per node) since it does not need to store all distances.
‘bruteforce-sharemem’ (CUDA only) is similar to ‘bruteforce’ but uses shared memory on CUDA devices as a buffer. This method is faster than ‘bruteforce’ when the dimension of the input points is not large.
‘kd-tree’ uses the kd-tree algorithm (CPU only). This method is suitable for low-dimensional data (e.g., 3D point clouds).
‘nn-descent’ is an approximate approach from the paper Efficient k-nearest neighbor graph construction for generic similarity measures. This method searches for nearest neighbor candidates in “neighbors’ neighbors”.
(default: ‘bruteforce-blas’)
dist (str, optional) –
The distance metric used to compute distance between points. It can be one of the following metrics:
‘euclidean’: Use Euclidean distance (L2 norm) \(\sqrt{\sum_{i} (x_{i} - y_{i})^{2}}\).
‘cosine’: Use cosine distance.
(default: ‘euclidean’)
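The memory trade-off between storing a full distance matrix and selecting neighbors during the scan can be illustrated with a bounded heap. This plain-Python sketch shows the general idea of keeping only \(O(Nk)\) candidates; it is an assumption-laden illustration, not the kernel DGL uses, and `knn_streaming` is a hypothetical name.

```python
import heapq


def knn_streaming(points, k):
    """For each point keep only its k best candidates while scanning, never an N x N matrix."""
    neighbors = []
    for p in points:
        heap = []  # min-heap of (-distance, -index); the root is the current worst candidate
        for j, q in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if len(heap) < k:
                heapq.heappush(heap, (-d, -j))
            elif (-d, -j) > heap[0]:
                # Closer than the worst kept candidate (ties favor the lower index).
                heapq.heapreplace(heap, (-d, -j))
        neighbors.append(sorted(-neg_j for _, neg_j in heap))
    return neighbors


x = [[0, 1], [1, 2], [1, 3], [100, 101], [101, 102], [50, 50]]
nearest = knn_streaming(x, 2)
```

Each inner loop holds at most k entries, so the extra memory per point is independent of the number of points, at the cost of a Python-level comparison for every pair.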
 Returns
A DGLGraph without features.
 Return type
DGLGraph
NodeEmbedding Module¶
NodeEmbedding¶

class
dgl.nn.pytorch.sparse_emb.
NodeEmbedding
(num_embeddings, embedding_dim, name, init_func=None, device=None, partition=None)[source]¶ Bases:
object
Class for storing node embeddings.
The class is optimized for training large-scale node embeddings. It updates the embedding in a sparse way and can scale to graphs with millions of nodes. It also supports partitioning to multiple GPUs (on a single machine) for more acceleration. It does not support partitioning across machines.
Currently, DGL provides two optimizers that work with this NodeEmbedding class: SparseAdagrad and SparseAdam.
The implementation is based on the torch.distributed package. It depends on the PyTorch default distributed process group to collect multi-process information and uses torch.distributed.TCPStore to share metadata across multiple GPU processes. It uses the local address ‘127.0.0.1:12346’ to initialize the TCPStore.
NOTE: The support of NodeEmbedding is experimental.
 Parameters
num_embeddings (int) – The number of embeddings. Currently, the number of embeddings has to be the same as the number of nodes.
embedding_dim (int) – The dimension size of embeddings.
name (str) – The name of the embeddings. The name should uniquely identify the embeddings in the system.
init_func (callable, optional) – The function to create the initial data. If the init function is not provided, the values of the embeddings are initialized to zero.
device (th.device) – Device to store the embeddings on.
partition (NDArrayPartition) – The partition to use to distribute the embeddings between processes.
Examples
Before launching multiple gpu processes
>>> def initializer(emb):
...     th.nn.init.xavier_uniform_(emb)
...     return emb
In each training process
>>> emb = dgl.nn.NodeEmbedding(g.number_of_nodes(), 10, 'emb',
...                            init_func=initializer)
>>> optimizer = dgl.optim.SparseAdam([emb], lr=0.001)
>>> for blocks in dataloader:
...     ...
...     feats = emb(nids, gpu_0)
...     loss = F.sum(feats + 1, 0)
...     loss.backward()
...     optimizer.step()
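The idea of updating the embedding "in a sparse way" can be sketched with a toy table that records which rows were read and then updates only those rows. Everything here is hypothetical and simplified: `ToyNodeEmbedding` is a stand-in rather than DGL's class, plain SGD replaces SparseAdagrad and SparseAdam, and gradients are supplied by hand instead of coming from autograd.

```python
class ToyNodeEmbedding:
    """Toy stand-in (not DGL's class): a dense table plus a trace of the rows read each step."""

    def __init__(self, num_embeddings, embedding_dim):
        self.weight = [[0.0] * embedding_dim for _ in range(num_embeddings)]
        self.trace = []

    def __call__(self, node_ids):
        # Remember which rows were used so the optimizer can skip all others.
        self.trace.append(list(node_ids))
        return [self.weight[i] for i in node_ids]

    def reset_trace(self):
        self.trace = []


def sparse_sgd_step(emb, grads, lr):
    """Update only traced rows; grads[t][r] is the gradient for row emb.trace[t][r]."""
    for ids, step_grads in zip(emb.trace, grads):
        for i, g in zip(ids, step_grads):
            emb.weight[i] = [w - lr * gi for w, gi in zip(emb.weight[i], g)]
    emb.reset_trace()


emb = ToyNodeEmbedding(num_embeddings=5, embedding_dim=2)
rows = emb([1, 3])                                      # only nodes 1 and 3 are used
sparse_sgd_step(emb, grads=[[[1.0, 1.0], [2.0, 0.0]]], lr=0.5)
```

The cost of a step is proportional to the number of rows touched in the mini-batch rather than to the total number of nodes.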

all_get_embedding
()[source]¶ Return a copy of the embedding stored in CPU memory. If this is a multiprocessing instance, the tensor will be returned in shared memory. If the embedding is currently stored on multiple GPUs, all processes must call this method in the same order.
NOTE: This method must be called by all processes sharing the embedding, or it may result in a deadlock.
 Returns
The tensor storing the node embeddings.
 Return type
torch.Tensor

all_set_embedding
(values)[source]¶ Set the values of the embedding. This method must be called by all processes sharing the embedding with identical tensors for values.
NOTE: This method must be called by all processes sharing the embedding, or it may result in a deadlock.
 Parameters
values (Tensor) – The global tensor to pull values from.

property
comm
¶ Return dgl.cuda.nccl.Communicator for data sharing across processes.
 Returns
Communicator used for data sharing.
 Return type
dgl.cuda.nccl.Communicator

property
emb_tensor
¶ Return the tensor storing the node embeddings
DEPRECATED: renamed weight
 Returns
The tensor storing the node embeddings
 Return type
torch.Tensor

property
embedding_dim
¶ Return the dimension of embeddings.
 Returns
The dimension of embeddings.
 Return type
int

property
num_embeddings
¶ Return the number of embeddings.
 Returns
The number of embeddings.
 Return type
int

property
optm_state
¶ Return the optimizer related state tensor.
 Returns
The optimizer related state.
 Return type
tuple of torch.Tensor

property
partition
¶ Return the partition identifying how the tensor is split across processes.
 Returns
The mode.
 Return type
String

reset_trace
()[source]¶ Clean up the trace of the indices of embeddings used in the training step(s).

set_optm_state
(state)[source]¶ Store the optimizer related state tensor.
 Parameters
state (tuple of torch.Tensor) – Optimizer related state.

property
store
¶ Return torch.distributed.TCPStore for meta data sharing across processes.
 Returns
KVStore used for meta data sharing.
 Return type
torch.distributed.TCPStore

property
trace
¶ Return a trace of the indices of embeddings used in the training step(s).
 Returns
The indices of embeddings used in the training step(s).
 Return type
[torch.Tensor]

property
weight
¶ Return the tensor storing the node embeddings
 Returns
The tensor storing the node embeddings
 Return type
torch.Tensor