Torch modules for graph convolutions.

class dgl.nn.pytorch.conv.GraphConv(in_feats, out_feats, norm=True, bias=True, activation=None)[source]

Bases: torch.nn.modules.module.Module

Apply graph convolution over an input signal.

Graph convolution is introduced in GCN and can be described as below:

\[h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})\]

where \(\mathcal{N}(i)\) is the set of neighbors of node \(i\), \(c_{ij}\) is the product of the square roots of the node degrees, \(c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}\), and \(\sigma\) is an activation function.
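The propagation rule above can be sketched with a dense adjacency matrix. This is a NumPy illustration of the math only, not DGL's sparse implementation; the graph, features, and weights are made up for the example:

```python
import numpy as np

np.random.seed(0)
N, in_feats, out_feats = 4, 3, 2

# Adjacency of a small undirected path graph, with self-loops added
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
deg = A.sum(axis=1)                       # node degrees |N(i)|
norm = 1.0 / np.sqrt(deg)                 # 1 / sqrt(|N(i)|)

H = np.random.randn(N, in_feats)          # input node features h^{(l)}
W = np.random.randn(in_feats, out_feats)  # layer weight W^{(l)}

# h_i^{(l+1)} = sum_{j in N(i)} (1 / c_ij) h_j^{(l)} W^{(l)},
# with c_ij = sqrt(|N(i)|) * sqrt(|N(j)|)
H_next = (norm[:, None] * A * norm[None, :]) @ H @ W
```

The symmetric normalization is expressed here as \(D^{-1/2} A D^{-1/2} H W\), which is equivalent to the per-node sum in the equation above.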

The model parameters are initialized as in the original implementation where the weight \(W^{(l)}\) is initialized using Glorot uniform initialization and the bias is initialized to be zero.
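Glorot uniform initialization draws weights from \(U(-a, a)\) with \(a = \sqrt{6/(\text{fan\_in} + \text{fan\_out})}\). A minimal NumPy sketch of this scheme (an illustration of the initialization, not DGL's code):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Glorot (Xavier) uniform init: U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    rng = rng or np.random.default_rng(0)
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

weight = glorot_uniform(5, 2)  # shape (in_feats, out_feats)
bias = np.zeros(2)             # bias is initialized to zero
```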


Nodes with zero in-degree lead to an invalid (zero) normalizer. A common practice to avoid this is to add a self-loop to each node in the graph, which can be achieved by:

>>> g = ... # some DGLGraph
>>> g.add_edges(g.nodes(), g.nodes())
Parameters:

  • in_feats (int) – Number of input features.
  • out_feats (int) – Number of output features.
  • norm (bool, optional) – If True, the normalizer \(c_{ij}\) is applied. Default: True.
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
  • activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

weight (torch.Tensor) – The learnable weight tensor.


bias (torch.Tensor) – The learnable bias tensor.

forward(feat, graph)[source]

Compute graph convolution.


  • Input shape: \((N, *, \text{in_feats})\) where * means any number of additional dimensions, \(N\) is the number of nodes.
  • Output shape: \((N, *, \text{out_feats})\) where all but the last dimension are the same shape as the input.
  • feat (torch.Tensor) – The input feature.
  • graph (DGLGraph) – The graph.

Returns:

The output feature

Return type:

torch.Tensor

reset_parameters()[source]

Reinitialize learnable parameters.


Torch modules for graph related softmax.

class dgl.nn.pytorch.softmax.EdgeSoftmax[source]

Bases: torch.autograd.function.Function

Apply softmax over signals of incoming edges.

For a node \(i\), edge softmax is an operation that computes

\[a_{ij} = \frac{\exp(z_{ij})}{\sum_{k\in\mathcal{N}(i)}\exp(z_{ik})}\]

where \(z_{ij}\) is a signal of edge \(j\rightarrow i\), also called logits in the context of softmax. \(\mathcal{N}(i)\) is the set of nodes that have an edge to \(i\).

An example of using edge softmax is the Graph Attention Network, where the attention weights are computed with such an edge softmax operation.

static forward(ctx, g, score)[source]

score = dgl.EData(g, score)
score_max = score.dst_max()   # of type dgl.NData
score = score - score_max     # edge_sub_dst, returns dgl.EData
score_sum = score.dst_sum()   # of type dgl.NData
out = score / score_sum       # edge_div_dst, returns dgl.EData
return out.data
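The same per-destination softmax, including the max subtraction for numerical stability, can be sketched over a plain edge list in NumPy. The `edge_softmax` helper below is hypothetical (not DGL's API) and the edge list is invented for illustration:

```python
import numpy as np

def edge_softmax(dst, scores, num_nodes):
    """Softmax of edge scores grouped by destination node (hypothetical helper)."""
    out = np.empty_like(scores, dtype=float)
    for v in range(num_nodes):
        mask = dst == v            # incoming edges of node v
        if not mask.any():
            continue
        z = scores[mask]
        z = z - z.max()            # subtract per-destination max for stability
        e = np.exp(z)
        out[mask] = e / e.sum()    # normalize over incoming edges of v
    return out

# Two edges into node 0, three edges into node 1
dst = np.array([0, 0, 1, 1, 1])
scores = np.array([1.0, 2.0, 0.5, 0.5, 0.5])
a = edge_softmax(dst, scores, num_nodes=2)
```

The attention weights `a` sum to 1 over the incoming edges of each destination node, matching the equation above.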