class GraphormerLayer(feat_size, hidden_size, num_heads, attn_bias_type='add', norm_first=False, dropout=0.1, attn_dropout=0.1, activation=ReLU())

Bases: Module

Graphormer Layer with Dense Multi-Head Attention, as introduced in Do Transformers Really Perform Bad for Graph Representation?

Parameters:

  • feat_size (int) – Size of the input and output node features.

  • hidden_size (int) – Hidden size of feedforward layers.

  • num_heads (int) – Number of attention heads. feat_size must be divisible by num_heads.

  • attn_bias_type (str, optional) –

    The type of attention bias used for modifying attention. Selected from 'add' or 'mul'. Default: 'add'.

    • 'add' is for additive attention bias.

    • 'mul' is for multiplicative attention bias.

  • norm_first (bool, optional) – If True, it performs layer normalization before attention and feedforward operations. Otherwise, it applies layer normalization afterwards. Default: False.

  • dropout (float, optional) – Dropout probability. Default: 0.1.

  • attn_dropout (float, optional) – Attention dropout probability. Default: 0.1.

  • activation (callable activation layer, optional) – Activation function. Default: nn.ReLU().
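As a rough illustration of the two attn_bias_type options, the sketch below combines a bias with raw attention scores and normalizes over the key dimension. It uses plain PyTorch rather than DGL internals: the helper name biased_attention, the score layout (batch_size, N, N, num_heads), and the choice of softmax dimension are assumptions made for this example, not a confirmed description of the library's implementation.

```python
import torch as th


def biased_attention(scores, attn_bias, attn_bias_type="add"):
    """Hypothetical helper: apply a bias to raw scores, then softmax over keys.

    Both tensors are assumed to have shape (batch_size, N, N, num_heads),
    with dim 1 indexing queries and dim 2 indexing keys.
    """
    if attn_bias_type == "add":
        scores = scores + attn_bias   # additive bias shifts the scores
    elif attn_bias_type == "mul":
        scores = scores * attn_bias   # multiplicative bias rescales them
    else:
        raise ValueError(f"unknown attn_bias_type: {attn_bias_type!r}")
    return th.softmax(scores, dim=2)  # each query's weights sum to 1


th.manual_seed(0)
scores = th.rand(2, 4, 4, 8)
bias = th.rand(2, 4, 4, 8)
attn_add = biased_attention(scores, bias, "add")
attn_mul = biased_attention(scores, bias, "mul")
```

Under these assumptions, a zero additive bias or an all-ones multiplicative bias leaves the attention weights unchanged, which is a convenient sanity check when constructing a bias tensor.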


>>> import torch as th
>>> from dgl.nn import GraphormerLayer
>>> batch_size = 16
>>> num_nodes = 100
>>> feat_size = 512
>>> num_heads = 8
>>> nfeat = th.rand(batch_size, num_nodes, feat_size)
>>> bias = th.rand(batch_size, num_nodes, num_nodes, num_heads)
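>>> net = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,  # feedforward hidden size; value chosen for illustration
...     num_heads=num_heads,
... )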
>>> out = net(nfeat, bias)
forward(nfeat, attn_bias=None, attn_mask=None)

Forward computation.

Parameters:

  • nfeat (torch.Tensor) – A 3D input tensor. Shape: (batch_size, N, feat_size), where N is the maximum number of nodes.

  • attn_bias (torch.Tensor, optional) – The attention bias used for attention modification. Shape: (batch_size, N, N, num_heads).

  • attn_mask (torch.Tensor, optional) – The attention mask used to skip computation on invalid positions, which are marked with True. Shape: (batch_size, N, N). Note: for rows corresponding to non-existent (padding) nodes, make sure at least one entry is set to False; otherwise the softmax over a fully masked row produces NaNs.
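One possible way to build such a mask for a padded batch is sketched below in plain PyTorch. The helper name build_attn_mask and its arguments are hypothetical, and unmasking the diagonal is just one convenient way to satisfy the "at least one False entry per row" requirement for padding rows.

```python
import torch as th


def build_attn_mask(num_nodes_per_graph, max_nodes):
    """Hypothetical helper: bool mask of shape (batch_size, N, N).

    True marks an invalid position, i.e. a pair in which the query node
    or the key node is padding.
    """
    counts = th.as_tensor(num_nodes_per_graph)
    # valid[b, i] is True when node i exists in graph b.
    valid = th.arange(max_nodes).unsqueeze(0) < counts.unsqueeze(1)
    # A pair (i, j) is valid only if both nodes exist.
    mask = ~(valid.unsqueeze(2) & valid.unsqueeze(1))
    # Rows of padding nodes are fully masked at this point. Unmask the
    # diagonal so every row keeps one finite entry for softmax.
    idx = th.arange(max_nodes)
    mask[:, idx, idx] = False
    return mask


# Two graphs padded to N = 3: the first has 2 real nodes, the second has 3.
mask = build_attn_mask([2, 3], max_nodes=3)

# Emulate masked attention: invalid positions are set to -inf before softmax.
masked_scores = th.zeros(2, 3, 3).masked_fill(mask, float("-inf"))
attn = th.softmax(masked_scores, dim=-1)

# For contrast, a fully masked row yields NaNs.
nan_row = th.softmax(th.full((3,), float("-inf")), dim=0)
```

A mask built this way can then be passed as the attn_mask argument of forward, alongside nfeat and attn_bias.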


Returns:

y – The output tensor. Shape: (batch_size, N, feat_size).

Return type: