dgl.dataloading.GraphDataLoader

class dgl.dataloading.GraphDataLoader(dataset, collate_fn=None, use_ddp=False, ddp_seed=0, **kwargs)[source]

Batched graph data loader.

A PyTorch dataloader for batch-iterating over a set of graphs, generating the batched graph and the corresponding label tensor (if provided) for each minibatch.

Parameters
  • dataset (torch.utils.data.Dataset) – The dataset to load graphs from.

  • collate_fn (Function, default is None) – A custom collate function that merges a list of samples into a minibatch (see the sketch after this parameter list). The default collate function is used if not given.

  • use_ddp (boolean, optional) –

    If True, tells the DataLoader to split the training set for each participating process appropriately using torch.utils.data.distributed.DistributedSampler.

    Overrides the sampler argument of torch.utils.data.DataLoader.

  • ddp_seed (int, optional) –

    The seed for shuffling the dataset in torch.utils.data.distributed.DistributedSampler.

    Only effective when use_ddp is True.

  • kwargs (dict) –

    Keyword arguments passed to the parent torch.utils.data.DataLoader class. Common arguments are:

    • batch_size (int): The number of indices in each batch.

    • drop_last (bool): Whether to drop the last incomplete batch.

    • shuffle (bool): Whether to randomly shuffle the indices at each epoch.
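For reference, a custom collate_fn receives a list of samples drawn from dataset and must merge them into one minibatch. The following is a minimal sketch for a dataset that yields (graph, label) pairs; the pair layout is an assumption about the dataset, not part of this API:

>>> import dgl
>>> import torch
>>> def collate(samples):
...     # Each sample is assumed to be a (graph, label) pair.
...     graphs, labels = map(list, zip(*samples))
...     # Merge the graphs into one batched graph and stack the labels.
...     return dgl.batch(graphs), torch.tensor(labels)
>>> dataloader = dgl.dataloading.GraphDataLoader(
...     dataset, collate_fn=collate, batch_size=32, shuffle=True)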

Examples

To train a GNN for graph classification on a set of graphs in dataset:

>>> dataloader = dgl.dataloading.GraphDataLoader(
...     dataset, batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for batched_graph, labels in dataloader:
...     train_on(batched_graph, labels)
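
The train_on call above is a placeholder. As an illustration, one possible training step is sketched below; the model architecture, the node-feature key 'feat', and the optimizer are assumptions made for this example, not part of the GraphDataLoader API:

>>> import torch
>>> import torch.nn.functional as F
>>> from dgl.nn import GraphConv, AvgPooling
>>> class Classifier(torch.nn.Module):
...     def __init__(self, in_feats, hidden_feats, n_classes):
...         super().__init__()
...         self.conv = GraphConv(in_feats, hidden_feats)
...         self.pool = AvgPooling()                      # graph-level readout
...         self.classify = torch.nn.Linear(hidden_feats, n_classes)
...     def forward(self, g, feat):
...         h = F.relu(self.conv(g, feat))
...         return self.classify(self.pool(g, h))
>>> model = Classifier(in_feats=10, hidden_feats=16, n_classes=2)
>>> opt = torch.optim.Adam(model.parameters())
>>> def train_on(batched_graph, labels):
...     logits = model(batched_graph, batched_graph.ndata['feat'])
...     loss = F.cross_entropy(logits, labels)
...     opt.zero_grad()
...     loss.backward()
...     opt.step()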

With Distributed Data Parallel

If you are using PyTorch’s distributed training (e.g. when using torch.nn.parallel.DistributedDataParallel), you can train the model by turning on the use_ddp option:

>>> dataloader = dgl.dataloading.GraphDataLoader(
...     dataset, use_ddp=True, batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for epoch in range(start_epoch, n_epochs):
...     dataloader.set_epoch(epoch)
...     for batched_graph, labels in dataloader:
...         train_on(batched_graph, labels)
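
The loop above assumes the surrounding process group and model wrapping are already set up as in any PyTorch DDP script; a minimal sketch of that context (the backend choice here is an assumption, and the usual environment variables or init_method must be configured):

>>> import torch
>>> import torch.distributed as dist
>>> dist.init_process_group(backend='gloo')   # 'nccl' is typical for GPU training
>>> model = torch.nn.parallel.DistributedDataParallel(model)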
__init__(dataset, collate_fn=None, use_ddp=False, ddp_seed=0, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(dataset[, collate_fn, use_ddp, …])

Initialize self.

check_worker_number_rationality()

set_epoch(epoch)

Sets the epoch number for the underlying sampler, ensuring that all replicas use a different ordering in each epoch.

Attributes

collator_arglist

multiprocessing_context