dgl.distributed.initialize

dgl.distributed.initialize(ip_config, num_servers=1, num_workers=0, max_queue_size=21474836480, net_type='tensorpipe', num_worker_threads=1)[source]

Initialize DGL’s distributed module

This function initializes DGL's distributed module and behaves differently depending on whether the calling process is a server or a client. In server mode, it runs the server loop and never returns. In client mode, it builds connections with the servers for communication and creates worker processes for distributed sampling. num_workers specifies the number of sampling worker processes per trainer process. Users also have to provide the number of server processes on each machine so that clients can connect to all the server processes in the cluster correctly.
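
In a typical deployment every process runs the same training script; whether initialize() acts as a server or a client is decided by the launch environment (for example, DGL's launch tool). A minimal client-side sketch, with 'ip_config.txt' as a placeholder path:

>>> import dgl
>>> # In client mode this returns after connecting to the servers;
>>> # in server mode it serves requests and never returns.
>>> dgl.distributed.initialize('ip_config.txt')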

Parameters
  • ip_config (str) – File path of the IP configuration file that lists the machines in the cluster (see the example after this parameter list).

  • num_servers (int) – The number of server processes on each machine. This argument is deprecated in DGL 0.7.0.

  • num_workers (int) – The number of worker processes on each machine. The worker processes are used for distributed sampling. This argument is deprecated in DGL 0.7.0.

  • max_queue_size (int) –

    Maximal size (in bytes) of the client queue buffer (about 20 GB by default).

    Note that 20 GB is only an upper bound; because DGL uses zero-copy, it does not allocate 20 GB of memory at once.

  • net_type (str, optional) –

    Networking type. Valid options are: 'socket', 'tensorpipe'.

    Default: 'tensorpipe'

  • num_worker_threads (int) – The number of threads in a worker process.
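
For reference, a sketch of a hypothetical ip_config file for a two-machine cluster: one machine per line, an IP address followed by the port used by the servers on that machine (the addresses and port below are placeholders, and whether the port may be omitted depends on the DGL version):

    172.31.19.1 30050
    172.31.23.205 30050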

Note

Users have to invoke this API before calling any of DGL's distributed APIs or any framework-specific distributed API. For example, when used with PyTorch, users have to invoke this function before PyTorch's torch.distributed.init_process_group.
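
A sketch of this ordering in a trainer script; the 'gloo' backend and the graph name 'graph_name' are illustrative assumptions, not requirements of the API:

>>> import dgl
>>> import torch as th
>>> # 1. Initialize DGL's distributed module first.
>>> dgl.distributed.initialize('ip_config.txt')
>>> # 2. Only then initialize PyTorch's distributed package.
>>> th.distributed.init_process_group(backend='gloo')
>>> # 3. Distributed DGL objects can now be created.
>>> g = dgl.distributed.DistGraph('graph_name')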