dgl.distributed.node_split

dgl.distributed.node_split(nodes, partition_book=None, ntype='_N', rank=None, force_even=True, node_trainer_ids=None)[source]

Split nodes and return a subset for the local rank.

This function splits the input nodes based on the partition book and returns a subset of nodes for the local rank. This method is used for dividing workloads for distributed training.

The input nodes are stored as a vector of masks. The length of the vector is the same as the number of nodes in a graph; 1 indicates that the vertex in the corresponding location exists.

There are two strategies to split the nodes. By default, it splits the nodes in a way to maximize data locality. That is, all nodes that belong to a process are returned. If force_even is set to true, the nodes are split evenly so that each process gets almost the same number of nodes.

When force_even is True, the data locality is still preserved if a graph is partitioned with Metis and the node/edge IDs are shuffled. In this case, majority of the nodes returned for a process are the ones that belong to the process. If node/edge IDs are not shuffled, data locality is not guaranteed.

Parameters
  • nodes (1D tensor or DistTensor) – A boolean mask vector that indicates input nodes.

  • partition_book (GraphPartitionBook, optional) – The graph partition book

  • ntype (str, optional) – The node type of the input nodes.

  • rank (int, optional) – The rank of a process. If not given, the rank of the current process is used.

  • force_even (bool, optional) – Force the nodes are split evenly.

  • node_trainer_ids (1D tensor or DistTensor, optional) – If not None, split the nodes to the trainers on the same machine according to trainer IDs assigned to each node. Otherwise, split randomly.

Returns

The vector of node IDs that belong to the rank.

Return type

1D-tensor