class dgl.nn.pytorch.conv.SAGEConv(in_feats, out_feats, aggregator_type, feat_drop=0.0, bias=True, norm=None, activation=None)

Bases: torch.nn.modules.module.Module

GraphSAGE layer from Inductive Representation Learning on Large Graphs

\[ \begin{aligned}h_{\mathcal{N}(i)}^{(l+1)} &= \mathrm{aggregate} \left(\{h_{j}^{(l)}, \forall j \in \mathcal{N}(i) \}\right)\\h_{i}^{(l+1)} &= \sigma \left(W \cdot \mathrm{concat} (h_{i}^{(l)}, h_{\mathcal{N}(i)}^{(l+1)}) \right)\\h_{i}^{(l+1)} &= \mathrm{norm}(h_{i}^{(l+1)})\end{aligned} \]
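
The three steps in the equations above can be traced by hand on a tiny graph. The following is a minimal, dependency-free sketch rather than the DGL implementation: it assumes a mean aggregator, represents features as nested Python lists, and uses a zero vector for nodes with no incoming neighbors. The names `sage_mean_layer`, `l2_normalize`, and `in_neighbors` are hypothetical and exist only for illustration.

```python
import math


def sage_mean_layer(h, in_neighbors, W, activation=None, norm=None):
    """One GraphSAGE update with a mean aggregator (illustrative only).

    h            : list of node feature vectors, each of length D_in
    in_neighbors : in_neighbors[i] lists the nodes j in N(i)
    W            : D_out x (2 * D_in) weight matrix as nested lists
    """
    d_in = len(h[0])
    out = []
    for i, h_i in enumerate(h):
        nbrs = in_neighbors[i]
        # Step 1: aggregate neighbor features (assumed zero if N(i) is empty).
        if nbrs:
            h_neigh = [sum(h[j][k] for j in nbrs) / len(nbrs) for k in range(d_in)]
        else:
            h_neigh = [0.0] * d_in
        # Step 2: concatenate self and neighborhood features, apply W and sigma.
        z = h_i + h_neigh
        h_new = [sum(w * x for w, x in zip(row, z)) for row in W]
        if activation is not None:
            h_new = [activation(v) for v in h_new]
        # Step 3: optional normalization of the updated feature.
        if norm is not None:
            h_new = norm(h_new)
        out.append(h_new)
    return out


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    length = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / length for x in v]


# Path graph 0 -> 1 -> 2: node 1 receives from 0, node 2 receives from 1.
h = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
in_neighbors = [[], [0], [1]]
W = [[1.0, 0.0, 1.0, 0.0],   # output 0 = h_i[0] + h_neigh[0]
     [0.0, 1.0, 0.0, 1.0]]   # output 1 = h_i[1] + h_neigh[1]
relu = lambda v: max(v, 0.0)

result = sage_mean_layer(h, in_neighbors, W, activation=relu)
normalized = sage_mean_layer(h, in_neighbors, W, activation=relu, norm=l2_normalize)
```

With this choice of `W` each output is the sum of the node's own feature and the mean of its neighbors' features, which makes the result easy to verify by hand.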

If a weight tensor on each edge is provided, the aggregation becomes:

\[h_{\mathcal{N}(i)}^{(l+1)} = \mathrm{aggregate} \left(\{e_{ji} h_{j}^{(l)}, \forall j \in \mathcal{N}(i) \}\right)\]

where \(e_{ji}\) is the scalar weight on the edge from node \(j\) to node \(i\). The edge weight tensor must be broadcastable with \(h_j^{(l)}\).
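
To make the weighted aggregation concrete, here is a small sketch assuming a mean aggregator: each neighbor feature is scaled by its edge weight first, and the aggregate is then taken over the scaled messages. The function name `weighted_mean_aggregate` and the `(j, e_ji)` pair layout are hypothetical, chosen only for this illustration.

```python
def weighted_mean_aggregate(h, in_edges):
    """Mean of edge-weighted messages (illustrative only).

    h        : list of source feature vectors
    in_edges : in_edges[i] is a list of (j, e_ji) pairs for edges j -> i
    """
    d = len(h[0])
    out = []
    for edges in in_edges:
        if not edges:
            # Assumption: nodes without incoming edges get a zero vector.
            out.append([0.0] * d)
            continue
        # Scale each message by its edge weight, then average over the
        # incoming edges. The divisor is the edge count, not the weight sum.
        out.append([sum(e * h[j][k] for j, e in edges) / len(edges) for k in range(d)])
    return out


h = [[2.0, 4.0], [6.0, 8.0], [0.0, 0.0]]
# Node 2 receives from node 0 with weight 0.5 and from node 1 with weight 1.0.
in_edges = [[], [], [(0, 0.5), (1, 1.0)]]
agg = weighted_mean_aggregate(h, in_edges)
```

Here node 2 aggregates the messages `[1.0, 2.0]` and `[6.0, 8.0]`, giving `[3.5, 5.0]`.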

  • in_feats (int, or pair of ints) –

    Input feature size; i.e., the number of dimensions of \(h_i^{(l)}\).

    SAGEConv can be applied to homogeneous graphs and to unidirectional bipartite graphs. When the layer is applied to a unidirectional bipartite graph, in_feats specifies the input feature sizes of the source and destination nodes. If a scalar is given, the source and destination node feature sizes take the same value.

    If the aggregator type is gcn, the feature sizes of the source and destination nodes must be the same.

  • out_feats (int) – Output feature size; i.e., the number of dimensions of \(h_i^{(l+1)}\).

  • aggregator_type (str) – Aggregator type to use (mean, gcn, pool, lstm).

  • feat_drop (float) – Dropout rate on features, default: 0.

  • bias (bool) – If True, adds a learnable bias to the output. Default: True.

  • norm (callable activation function/layer or None, optional) – If not None, applies normalization to the updated node features. Default: None.

  • activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.
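
The same-size requirement for the gcn aggregator is easier to see from how that aggregator is commonly defined: the destination node's own feature is added to the sum of its neighbors' features and the total is divided by the in-degree plus one, so the two vectors must have equal length. The sketch below assumes that definition; `gcn_aggregate` is a hypothetical helper, not part of the DGL API.

```python
def gcn_aggregate(h_src, h_dst, in_neighbors):
    """GCN-style aggregation: (sum of neighbor features + own feature) / (deg + 1)."""
    out = []
    for i, h_i in enumerate(h_dst):
        nbrs = in_neighbors[i]
        # Source and destination vectors are added element-wise,
        # so their lengths have to match.
        if any(len(h_src[j]) != len(h_i) for j in nbrs):
            raise ValueError("gcn aggregation needs equal source and destination feature sizes")
        total = list(h_i)
        for j in nbrs:
            total = [a + b for a, b in zip(total, h_src[j])]
        out.append([v / (len(nbrs) + 1) for v in total])
    return out


h_src = [[1.0, 3.0], [5.0, 1.0]]
h_dst = [[0.0, 2.0]]
agg = gcn_aggregate(h_src, h_dst, [[0, 1]])

# Mismatched sizes (3 source dimensions vs. 2 destination dimensions) fail.
try:
    gcn_aggregate([[1.0, 2.0, 3.0]], [[0.0, 2.0]], [[0]])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

The other aggregators in this sketch style keep self and neighbor features separate until the concatenation step, which is why they can accept a pair of different sizes.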


>>> import dgl
>>> import numpy as np
>>> import torch as th
>>> from dgl.nn import SAGEConv
>>> # Case 1: Homogeneous graph
>>> g = dgl.graph(([0,1,2,3,2,5], [1,2,3,4,0,3]))
>>> g = dgl.add_self_loop(g)
>>> feat = th.ones(6, 10)
>>> conv = SAGEConv(10, 2, 'pool')
>>> res = conv(g, feat)
>>> res
tensor([[-1.0888, -2.1099],
        [-1.0888, -2.1099],
        [-1.0888, -2.1099],
        [-1.0888, -2.1099],
        [-1.0888, -2.1099],
        [-1.0888, -2.1099]], grad_fn=<AddBackward0>)
>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
>>> g = dgl.heterograph({('_U', '_E', '_V'): (u, v)})
>>> u_fea = th.rand(2, 5)
>>> v_fea = th.rand(4, 10)
>>> conv = SAGEConv((5, 10), 2, 'mean')
>>> res = conv(g, (u_fea, v_fea))
>>> res
tensor([[ 0.3163,  3.1166],
        [ 0.3866,  2.5398],
        [ 0.5873,  1.6597],
        [-0.2502,  2.8068]], grad_fn=<AddBackward0>)
forward(graph, feat, edge_weight=None)

Compute GraphSAGE layer.

  • graph (DGLGraph) – The graph.

  • feat (torch.Tensor or pair of torch.Tensor) – If a single torch.Tensor is given, it is the input feature of shape \((N, D_{in})\), where \(D_{in}\) is the size of the input feature and \(N\) is the number of nodes. If a pair of torch.Tensor is given, the pair must contain two tensors of shape \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\), for the source and destination nodes respectively.

  • edge_weight (torch.Tensor, optional) – Optional tensor of edge weights. If given, each message is multiplied by the weight of its edge before aggregation.
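
For the pair form of feat, the shapes can be checked with a dependency-free sketch that reuses the edge list from the bipartite example above. It assumes a mean aggregator and a single weight matrix applied to the concatenated vector; `sage_mean_bipartite` and the constant weights are hypothetical and only serve to show how \((N_{in}, D_{in_{src}})\) and \((N_{out}, D_{in_{dst}})\) inputs lead to one output row per destination node.

```python
def sage_mean_bipartite(h_src, h_dst, in_neighbors, W):
    """Mean-aggregator update on a bipartite graph (illustrative only)."""
    d_src = len(h_src[0])
    out = []
    for i, h_i in enumerate(h_dst):
        nbrs = in_neighbors[i]
        if nbrs:
            h_neigh = [sum(h_src[j][k] for j in nbrs) / len(nbrs) for k in range(d_src)]
        else:
            h_neigh = [0.0] * d_src  # assumption: zero vector without neighbors
        z = h_i + h_neigh  # length D_in_dst + D_in_src
        out.append([sum(w * x for w, x in zip(row, z)) for row in W])
    return out


# Edges u[k] -> v[k]: 2 source nodes, 4 destination nodes.
u = [0, 1, 0, 0, 1]
v = [0, 1, 2, 3, 2]
n_src, n_dst, d_src, d_dst, d_out = 2, 4, 5, 10, 2

in_neighbors = [[] for _ in range(n_dst)]
for s, t in zip(u, v):
    in_neighbors[t].append(s)

h_src = [[1.0] * d_src for _ in range(n_src)]
h_dst = [[1.0] * d_dst for _ in range(n_dst)]
W = [[0.1] * (d_dst + d_src) for _ in range(d_out)]

out = sage_mean_bipartite(h_src, h_dst, in_neighbors, W)
shape = (len(out), len(out[0]))  # (number of destination nodes, D_out)
```

Source nodes contribute only through aggregation, so the number of output rows follows the destination side of the graph.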


The output feature of shape \((N_{dst}, D_{out})\), where \(N_{dst}\) is the number of destination nodes in the input graph and \(D_{out}\) is the size of the output feature.

Return type

torch.Tensor

Reinitialize learnable parameters.


The linear weights \(W^{(l)}\) are initialized using Glorot uniform initialization. The LSTM module uses Xavier initialization for its weights.
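
Glorot and Xavier initialization are two names for the same scheme. As a rough illustration of the uniform variant, the sketch below draws each weight from \([-a, a]\) with \(a = \mathrm{gain} \cdot \sqrt{6 / (\mathrm{fan\_in} + \mathrm{fan\_out})}\). The helper `glorot_uniform` and its `gain` argument are hypothetical; the gain the layer actually uses is not stated here, so 1.0 is only a placeholder.

```python
import math
import random


def glorot_uniform(fan_out, fan_in, gain=1.0, rng=random):
    """Return a fan_out x fan_in matrix sampled from U(-a, a) (illustrative only)."""
    bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)              # seeded generator for reproducibility
W = glorot_uniform(2, 10, rng=rng)  # e.g. 10 input features, 2 output features
bound = math.sqrt(6.0 / (10 + 2))   # sampling bound for gain = 1.0
```

The bound shrinks as the layer gets wider, which keeps the variance of activations roughly stable from one layer to the next.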