dgl.nn.functional.edge_softmax(graph, logits, eids='__ALL__', norm_by='dst')

Compute softmax over weights of incoming edges for every node.

For a node \(i\), edge softmax is an operation that computes

\[a_{ij} = \frac{\exp(z_{ij})}{\sum_{k\in\mathcal{N}(i)}\exp(z_{ik})}\]

where \(z_{ij}\) is a signal of edge \(j\rightarrow i\), also called logits in the context of softmax. \(\mathcal{N}(i)\) is the set of nodes that have an edge to \(i\).

By default, edge softmax is normalized by destination nodes (i.e., \(ij\) ranges over the incoming edges of \(i\) in the formula above). Normalization by source nodes is also supported (i.e., \(ij\) ranges over the outgoing edges of \(i\) in the formula). The former case corresponds to the softmax in Graph Attention Networks (GAT), where the attention weights are computed with this operation, and in the non-GNN Transformer. The latter case corresponds to the softmax in Capsule networks.
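To make the two normalization modes concrete, here is a minimal plain-Python sketch of the formula for scalar logits. It is not DGL's implementation: `edge_softmax_reference` is a hypothetical helper, and the six-edge graph and its uniform logits are illustrative assumptions.

```python
import math
from collections import defaultdict

def edge_softmax_reference(src, dst, logits, norm_by="dst"):
    """Illustrative edge softmax over scalar logits (hypothetical helper, not DGL code)."""
    if norm_by not in ("src", "dst"):
        raise ValueError("norm_by must be 'src' or 'dst'")
    # Group every edge by the node it is normalized over.
    keys = dst if norm_by == "dst" else src
    # Subtract the per-group maximum before exponentiating for numerical stability.
    group_max = defaultdict(lambda: -math.inf)
    for k, z in zip(keys, logits):
        group_max[k] = max(group_max[k], z)
    exps = [math.exp(z - group_max[k]) for k, z in zip(keys, logits)]
    # Sum the exponentials within each group, then normalize each edge by its group sum.
    group_sum = defaultdict(float)
    for k, e in zip(keys, exps):
        group_sum[k] += e
    return [e / group_sum[k] for k, e in zip(keys, exps)]

# Edges are listed as (src[i] -> dst[i]); all logits are 1.0.
src = [0, 0, 0, 1, 1, 2]
dst = [0, 1, 2, 1, 2, 2]
logits = [1.0] * 6

by_dst = edge_softmax_reference(src, dst, logits)                  # ~[1, 1/2, 1/3, 1/2, 1/3, 1/3]
by_src = edge_softmax_reference(src, dst, logits, norm_by="src")   # ~[1/3, 1/3, 1/3, 1/2, 1/2, 1]
```

With uniform logits, each edge simply receives one over the number of edges in its group, so node 2 (three incoming edges) yields 1/3 per edge under `norm_by="dst"`, while node 0 (three outgoing edges) yields 1/3 per edge under `norm_by="src"`.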

Parameters:

  • graph (DGLGraph) – The graph over which edge softmax will be performed.

  • logits (torch.Tensor or dict of torch.Tensor) – The input edge feature. For a heterogeneous graph, this can be a dict of tensors in which each tensor stores the edge features of the corresponding relation type.

  • eids (torch.Tensor or ALL, optional) – The IDs of the edges to which edge softmax is applied. If ALL, edge softmax is applied to all edges in the graph. Default: ALL.

  • norm_by (str, either src or dst) – Whether to normalize by source nodes or destination nodes. Default: dst.


Returns:

Softmax value.

Return type:

Tensor or tuple of tensors


  • Input shape: \((E, *, 1)\), where * means any number of additional dimensions and \(E\) equals the length of eids. If eids is ALL, \(E\) equals the number of edges in the graph.

  • Return shape: \((E, *, 1)\)

Examples on a homogeneous graph

The following example uses the PyTorch backend.

>>> from dgl.nn.functional import edge_softmax
>>> import dgl
>>> import torch as th

Create a DGLGraph object and initialize its edge features.

>>> g = dgl.graph((th.tensor([0, 0, 0, 1, 1, 2]), th.tensor([0, 1, 2, 1, 2, 2])))
>>> edata = th.ones(6, 1).float()
>>> edata

Apply edge softmax over g:

>>> edge_softmax(g, edata)

Apply edge softmax over g normalized by source nodes:

>>> edge_softmax(g, edata, norm_by='src')

Apply edge softmax to the first 4 edges of g:

>>> edge_softmax(g, edata[:4], th.tensor([0, 1, 2, 3]))
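
When only a subset of edge IDs is passed, the normalization presumably runs over the selected edges alone, so the values can differ from the full-graph result. The plain-Python sketch below illustrates that reading for the first four edges of a six-edge graph. `edge_softmax_subset` is a hypothetical helper, not part of DGL, and the behavior it models is an assumption about how `eids` restricts the softmax.

```python
import math

def edge_softmax_subset(dst, logits, eids):
    """Hypothetical helper: softmax over the selected edges only, grouped by destination."""
    # logits[i] belongs to edge eids[i]; edges outside eids do not contribute to any sum.
    sums = {}
    for eid, z in zip(eids, logits):
        sums[dst[eid]] = sums.get(dst[eid], 0.0) + math.exp(z)
    return [math.exp(z) / sums[dst[eid]] for eid, z in zip(eids, logits)]

# Destination node of each of the six edges; only edges 0..3 are selected.
dst = [0, 1, 2, 1, 2, 2]
subset = edge_softmax_subset(dst, [1.0] * 4, [0, 1, 2, 3])   # ~[1.0, 0.5, 1.0, 0.5]
```

Edge 2 is the only selected edge pointing to node 2, so it receives weight 1.0 here, whereas it would share the normalization with edges 4 and 5 if all edges were included.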

Examples on a heterogeneous graph

Create a heterogeneous graph and initialize its edge features.

>>> hg = dgl.heterograph({
...     ('user', 'follows', 'user'): ([0, 0, 1], [0, 1, 2]),
...     ('developer', 'develops', 'game'): ([0, 1], [0, 1])
...     })
>>> edata_follows = th.ones(3, 1).float()
>>> edata_develops = th.ones(2, 1).float()
>>> edata_dict = {('user', 'follows', 'user'): edata_follows,
...               ('developer', 'develops', 'game'): edata_develops}

Apply edge softmax over hg normalized by source nodes:

>>> edge_softmax(hg, edata_dict, norm_by='src')
    {('developer', 'develops', 'game'): tensor([[1.],
    [1.]]), ('user', 'follows', 'user'): tensor([[0.5000],