dgl.to_block

dgl.to_block(g, dst_nodes=None, include_dst_in_src=True)

Convert a graph into a bipartite-structured block for message passing.

A block is a graph consisting of two sets of nodes: the input nodes and output nodes. The input and output nodes can have multiple node types. All the edges connect from input nodes to output nodes.

Specifically, the input nodes and output nodes will have the same node types as the ones in the original graph. DGL maps each edge (u, v) with edge type (utype, etype, vtype) in the original graph to the edge with type etype connecting from node ID u of type utype in the input side to node ID v of type vtype in the output side.

By default, the output nodes of the block are exactly the nodes that have at least one inbound edge of any type. The input nodes of the block are the output nodes themselves, plus every node that has at least one outbound edge connecting to one of the output nodes.

If the dst_nodes argument is not None, it specifies the output nodes instead.
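The node-selection rule above can be sketched in plain Python for a homogeneous edge list. This is only an illustration of the rule, not DGL's implementation: the helper name `select_block_nodes` is hypothetical, and the ordering of automatically derived nodes (first appearance in the edge list) is an assumption made for the example.

```python
def select_block_nodes(edges, dst_nodes=None):
    """Return (src_nodes, dst_nodes) for a list of (u, v) edges.

    Hypothetical helper that mirrors the documented rule; it does not call DGL.
    """
    if dst_nodes is None:
        # Output nodes: every node with at least one inbound edge.
        # Assumption: keep them in order of first appearance.
        dst_nodes = list(dict.fromkeys(v for _, v in edges))
    else:
        dst_nodes = list(dst_nodes)

    dst_set = set(dst_nodes)
    # Input nodes start with the output nodes themselves ...
    src_nodes = list(dst_nodes)
    seen = set(dst_nodes)
    # ... followed by any node with an outbound edge into an output node.
    for u, v in edges:
        if v in dst_set and u not in seen:
            seen.add(u)
            src_nodes.append(u)
    return src_nodes, dst_nodes


edges = [(1, 2), (2, 3)]
auto_src, auto_dst = select_block_nodes(edges)
given_src, given_dst = select_block_nodes(edges, dst_nodes=[3, 2])
```

With `dst_nodes=[3, 2]`, the sketch yields the input nodes `[3, 2, 1]`, the same IDs as in the homogeneous example in the Examples section.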

Parameters
  • g (DGLGraph) – The graph. Must be on CPU.

  • dst_nodes (Tensor or dict[str, Tensor], optional) –

    The list of output nodes.

    If a tensor is given, the graph must have only one node type.

    If given, it must be a superset of all the nodes that have at least one inbound edge. An error will be raised otherwise.

  • include_dst_in_src (bool) –

    If False, do not include output nodes in input nodes.

    (Default: True)
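The effect of include_dst_in_src can be illustrated with a small plain-Python sketch. The function `input_nodes` is a hypothetical stand-in that follows the description above; it is not part of DGL, and the order in which it lists nodes is an assumption.

```python
def input_nodes(edges, dst_nodes, include_dst_in_src=True):
    """Hypothetical helper: list the input-side node IDs of a block."""
    dst_set = set(dst_nodes)
    # Optionally seed the input side with the output nodes.
    src_nodes = list(dst_nodes) if include_dst_in_src else []
    seen = set(src_nodes)
    # Add every node that sends an edge into an output node.
    for u, v in edges:
        if v in dst_set and u not in seen:
            seen.add(u)
            src_nodes.append(u)
    return src_nodes


edges = [(1, 2), (2, 3)]
with_dst = input_nodes(edges, [3, 2])
without_dst = input_nodes(edges, [3, 2], include_dst_in_src=False)
```

In this sketch, node 3 drops out of the input side when the flag is False because it has no outbound edge. Node 2 stays, since it still sends an edge to node 3.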

Returns

The new graph describing the block.

The induced node IDs for each node type, on both the input and output sides, are stored in the node feature dgl.NID.

The induced edge IDs for each edge type are stored in the edge feature dgl.EID.
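How induced IDs relate block-local indices to original node IDs can be sketched as follows. This is a plain-Python illustration under stated assumptions: `relabel_edges` is a hypothetical helper, the two lists merely play the role of the dgl.NID features on each side, and the edges are kept in their original order.

```python
def relabel_edges(edges, src_nodes, dst_nodes):
    """Hypothetical helper: rewrite (u, v) pairs as block-local index pairs."""
    src_index = {nid: i for i, nid in enumerate(src_nodes)}
    dst_index = {nid: i for i, nid in enumerate(dst_nodes)}
    return [(src_index[u], dst_index[v]) for u, v in edges]


edges = [(1, 2), (2, 3)]
induced_src = [3, 2, 1]  # stands in for the input-side dgl.NID feature
induced_dst = [3, 2]     # stands in for the output-side dgl.NID feature

local_edges = relabel_edges(edges, induced_src, induced_dst)
# Looking the local indices up in the induced-ID lists recovers the originals.
recovered = [(induced_src[s], induced_dst[d]) for s, d in local_edges]
```

The same lookup pattern, indexing the induced-ID arrays with the block's local endpoints, is what lets features from the original graph be gathered for the nodes of a block.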

Return type

DGLBlock

Raises

DGLError – If dst_nodes is specified but it is not a superset of all the nodes that have at least one inbound edge.
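The superset condition can be expressed as a short check. The sketch below is plain Python and only mirrors the documented condition: `check_dst_superset` is a hypothetical name, and `ValueError` stands in for DGLError.

```python
def check_dst_superset(edges, dst_nodes):
    """Hypothetical check: every node with an inbound edge must be in dst_nodes."""
    missing = sorted({v for _, v in edges} - set(dst_nodes))
    if missing:
        # ValueError is a stand-in for DGLError in this sketch.
        raise ValueError(f"dst_nodes is missing nodes with inbound edges: {missing}")


edges = [(1, 2), (2, 3)]
check_dst_superset(edges, [3, 2])     # exactly the nodes with inbound edges
check_dst_superset(edges, [3, 2, 0])  # a strict superset is also accepted

try:
    check_dst_superset(edges, [2])    # node 3 has an inbound edge but is absent
    rejected = False
except ValueError:
    rejected = True
```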

Notes

to_block() is most commonly used in customizing neighborhood sampling for stochastic training on a large graph. Please refer to the user guide Chapter 6: Stochastic Training on Large Graphs for a more thorough discussion about the methodology of stochastic training.
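As a rough illustration of how blocks chain together in multi-layer neighborhood sampling, the plain-Python sketch below samples in-neighbors layer by layer. It does not use DGL: `sample_block`, the toy `in_neighbors` table, and the fanouts are all assumptions for the example. In DGL itself, the sampled edges would typically form a frontier graph that is passed to to_block() with the seed nodes as dst_nodes.

```python
import random


def sample_block(in_neighbors, seeds, fanout, rng):
    """Hypothetical helper: sample up to `fanout` in-neighbors per seed.

    Returns (src_nodes, dst_nodes, edges) with the seeds listed first in src_nodes.
    """
    edges = []
    for v in seeds:
        nbrs = in_neighbors.get(v, [])
        for u in rng.sample(nbrs, min(fanout, len(nbrs))):
            edges.append((u, v))
    src_nodes = list(seeds)
    seen = set(seeds)
    for u, _ in edges:
        if u not in seen:
            seen.add(u)
            src_nodes.append(u)
    return src_nodes, list(seeds), edges


# Toy graph: node -> list of nodes with an edge into it.
in_neighbors = {0: [1, 2, 3], 1: [2, 4], 2: [4], 3: [0, 4], 4: []}
rng = random.Random(0)

seeds = [0]
blocks = []
for fanout in [2, 2]:
    src, dst, edges = sample_block(in_neighbors, seeds, fanout, rng)
    blocks.insert(0, (src, dst, edges))  # earlier hops go to the front
    seeds = src  # the input nodes of this block become the next seeds
```

The key property is that the input nodes of one block are the output nodes of the block before it in the list, so a model can apply one layer per block from first to last.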

Examples

Converting a homogeneous graph to a block as described above:

>>> import dgl
>>> import torch
>>> g = dgl.graph(([1, 2], [2, 3]))
>>> block = dgl.to_block(g, torch.LongTensor([3, 2]))

The output nodes would be exactly the same as the ones given: [3, 2].

>>> induced_dst = block.dstdata[dgl.NID]
>>> induced_dst
tensor([3, 2])

The input nodes begin with the given output nodes, in the same order. The remaining input nodes are the ones necessary for message passing into nodes 3 and 2, which means that node 1 is included.

>>> induced_src = block.srcdata[dgl.NID]
>>> induced_src
tensor([3, 2, 1])

Note that the first two input nodes are identical to the given nodes, which are also the output nodes.

The induced edges can also be obtained by the following:

>>> block.edata[dgl.EID]
tensor([2, 1])

This indicates that edges (2, 3) and (1, 2) are included in the resulting block. You can verify that the first edge in the block indeed maps to the edge (2, 3), and the second edge in the block indeed maps to the edge (1, 2):

>>> src, dst = block.edges(order='eid')
>>> induced_src[src], induced_dst[dst]
(tensor([2, 1]), tensor([3, 2]))

The specified output nodes must be a superset of the nodes that have edges connecting to them. For example, the following raises an error because the output nodes do not contain node 3, which has an edge connecting to it.

>>> g = dgl.graph(([1, 2], [2, 3]))
>>> dgl.to_block(g, torch.LongTensor([2]))     # error

Converting a heterogeneous graph to a block is similar, except that when specifying the output nodes, you have to give a dict:

>>> g = dgl.heterograph({('A', '_E', 'B'): ([1, 2], [2, 3])})

If you don’t specify any node of type A on the output side, the node type A in the block would have zero nodes on the output side.

>>> block = dgl.to_block(g, {'B': torch.LongTensor([3, 2])})
>>> block.number_of_dst_nodes('A')
0
>>> block.number_of_dst_nodes('B')
2
>>> block.dstnodes['B'].data[dgl.NID]
tensor([3, 2])

The input side would contain all the nodes on the output side:

>>> block.srcnodes['B'].data[dgl.NID]
tensor([3, 2])

The input side also contains all the nodes that have connections to the nodes on the output side:

>>> block.srcnodes['A'].data[dgl.NID]
tensor([2, 1])
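The per-type bookkeeping in the heterogeneous case can be sketched in plain Python. This is an illustration of the rule only, not DGL's implementation: `hetero_block_nodes` is a hypothetical helper, edges are given as a dict keyed by (utype, etype, vtype), and the order of nodes within each type is an assumption that may differ from what DGL returns.

```python
def hetero_block_nodes(edges_by_type, dst_nodes):
    """Hypothetical helper: per-type input and output nodes of a block.

    edges_by_type: {(utype, etype, vtype): [(u, v), ...]}
    dst_nodes:     {ntype: [node IDs]}
    """
    ntypes = {t for (ut, _, vt) in edges_by_type for t in (ut, vt)}
    # Node types missing from dst_nodes get an empty output side.
    dst = {nt: list(dst_nodes.get(nt, [])) for nt in ntypes}
    # The input side of each type starts with that type's output nodes.
    src = {nt: list(dst[nt]) for nt in ntypes}
    seen = {nt: set(dst[nt]) for nt in ntypes}
    for (utype, _, vtype), edges in edges_by_type.items():
        targets = set(dst[vtype])
        for u, v in edges:
            if v in targets and u not in seen[utype]:
                seen[utype].add(u)
                src[utype].append(u)
    return src, dst


edges_by_type = {("A", "_E", "B"): [(1, 2), (2, 3)]}
src, dst = hetero_block_nodes(edges_by_type, {"B": [3, 2]})
```

As in the example above, type A has no output nodes, type B has output nodes 3 and 2 on both sides, and the A nodes 1 and 2 appear only on the input side.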