GraphormerLayer

class dgl.nn.pytorch.gt.GraphormerLayer(feat_size, hidden_size, num_heads, attn_bias_type='add', norm_first=False, dropout=0.1, attn_dropout=0.1, activation=ReLU())

Bases: torch.nn.modules.module.Module

Graphormer layer with dense multi-head attention, as introduced in "Do Transformers Really Perform Bad for Graph Representation?"

Parameters
  • feat_size (int) – Feature size.

  • hidden_size (int) – Hidden size of feedforward layers.

  • num_heads (int) – Number of attention heads. feat_size must be divisible by num_heads.

  • attn_bias_type (str, optional) –

    The type of attention bias used to modify the attention scores. One of 'add' or 'mul'. Default: 'add'.

    • 'add' applies an additive attention bias.

    • 'mul' applies a multiplicative attention bias.

  • norm_first (bool, optional) – If True, layer normalization is applied before the attention and feedforward operations. Otherwise, it is applied after them. Default: False.

  • dropout (float, optional) – Dropout probability. Default: 0.1.

  • attn_dropout (float, optional) – Attention dropout probability. Default: 0.1.

  • activation (callable activation layer, optional) – Activation function. Default: nn.ReLU().
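The effect of attn_bias_type and norm_first can be illustrated with a minimal sketch in plain PyTorch. It does not reproduce the layer's internals: the helper names apply_attn_bias and residual_block are hypothetical, the bias is assumed to be combined with the raw scores before a softmax over the key axis, and an nn.Linear stands in for the attention or feedforward sublayer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_nodes, num_heads, feat_size = 2, 4, 3, 6


def apply_attn_bias(scores, attn_bias, attn_bias_type="add"):
    """Combine raw scores with the bias, then softmax over the key axis (dim=2).

    Assumption: the bias is applied before the softmax.
    """
    if attn_bias_type == "add":
        scores = scores + attn_bias
    elif attn_bias_type == "mul":
        scores = scores * attn_bias
    else:
        raise ValueError(f"unsupported attn_bias_type: {attn_bias_type!r}")
    return torch.softmax(scores, dim=2)


def residual_block(x, sublayer, norm, norm_first=False):
    """Pre-norm computes x + f(norm(x)); post-norm computes norm(x + f(x))."""
    if norm_first:
        return x + sublayer(norm(x))
    return norm(x + sublayer(x))


# Scores use the documented attn_bias layout: (batch_size, N, N, num_heads).
scores = torch.randn(batch_size, num_nodes, num_nodes, num_heads)
attn_bias = torch.rand(batch_size, num_nodes, num_nodes, num_heads)
attn_add = apply_attn_bias(scores, attn_bias, "add")
attn_mul = apply_attn_bias(scores, attn_bias, "mul")

# Hypothetical stand-ins for a sublayer and its layer normalization.
x = torch.randn(batch_size, num_nodes, feat_size)
sublayer = nn.Linear(feat_size, feat_size)
norm = nn.LayerNorm(feat_size)
post_norm = residual_block(x, sublayer, norm, norm_first=False)
pre_norm = residual_block(x, sublayer, norm, norm_first=True)
```

In this sketch, an all-zeros bias leaves the attention unchanged under 'add', while an all-ones bias does so under 'mul', which is worth keeping in mind when choosing a neutral bias for either mode.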

Examples

>>> import torch as th
>>> from dgl.nn import GraphormerLayer
>>> batch_size = 16
>>> num_nodes = 100
>>> feat_size = 512
>>> num_heads = 8
>>> nfeat = th.rand(batch_size, num_nodes, feat_size)
>>> bias = th.rand(batch_size, num_nodes, num_nodes, num_heads)
>>> net = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,
...     num_heads=num_heads
... )
>>> out = net(nfeat, bias)

forward(nfeat, attn_bias=None, attn_mask=None)

Forward computation.

Parameters
  • nfeat (torch.Tensor) – A 3D input tensor. Shape: (batch_size, N, feat_size), where N is the maximum number of nodes.

  • attn_bias (torch.Tensor, optional) – The attention bias used for attention modification. Shape: (batch_size, N, N, num_heads).

  • attn_mask (torch.Tensor, optional) – The attention mask used to skip computation on invalid positions, which are indicated by True values. Shape: (batch_size, N, N). Note: For rows corresponding to nonexistent nodes, make sure at least one entry is set to False so that softmax does not produce NaNs.
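One way to prepare nfeat and attn_mask for a batch of graphs with different node counts is sketched below. The graph sizes and variable names are hypothetical, and the sketch assumes that attn_mask[b, i, j] masks node j (key) for node i (query) and that masked scores are set to negative infinity before the softmax. Only padded key columns are masked, so every row keeps at least one False entry.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

torch.manual_seed(0)
feat_size = 8
num_nodes_per_graph = [3, 5, 2]  # hypothetical graph sizes

# Pad per-graph node features to a common length N = max node count.
graph_feats = [torch.rand(n, feat_size) for n in num_nodes_per_graph]
nfeat = pad_sequence(graph_feats, batch_first=True)  # (batch_size, N, feat_size)
batch_size, max_nodes, _ = nfeat.shape

# is_padding[b, j] is True when slot j of graph b holds no real node.
counts = torch.tensor(num_nodes_per_graph)
is_padding = torch.arange(max_nodes).unsqueeze(0) >= counts.unsqueeze(1)

# Mask padded key columns for every query row: (batch_size, N, N).
attn_mask = is_padding.unsqueeze(1).expand(batch_size, max_nodes, max_nodes).clone()

# Each row still has an unmasked entry, because every graph has a real node.
assert not attn_mask.all(dim=-1).any()

# Illustration of the NaN note with stand-in scores (not the layer's own code).
scores = torch.randn(batch_size, max_nodes, max_nodes)
attn = torch.softmax(scores.masked_fill(attn_mask, float("-inf")), dim=-1)

# A row in which every entry is masked would instead yield NaNs.
fully_masked = torch.softmax(torch.full((max_nodes,), float("-inf")), dim=-1)
```

Masking the padded query rows as well would make those rows entirely True, which is the situation the note warns about.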

Returns

y – The output tensor. Shape: (batch_size, N, feat_size)

Return type

torch.Tensor