6.7 Using GPU for Neighborhood Sampling
Since version 0.7, DGL supports GPU-based neighborhood sampling, which is significantly faster than CPU-based neighborhood sampling. If you estimate that your graph and its features fit in GPU memory and your model does not need much GPU memory, it is best to put the graph onto the GPU and use GPU-based neighborhood sampling.
For example, OGB Products has 2.4M nodes and 61M edges, and each node has 100-dimensional features. The node features take less than 1GB of memory, and the graph structure also takes less than 1GB, since the memory consumption of a graph depends mainly on the number of edges. It is therefore entirely possible to fit the whole graph onto a GPU.
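The estimate above can be checked with a little arithmetic. The sketch below is a rough lower bound under stated assumptions, not a description of DGL internals: it assumes float32 features (4 bytes per value) and a graph stored once as a COO edge list of two int64 ID arrays (8 bytes per ID). The helper `estimate_gpu_bytes` is hypothetical, and DGL may build additional sparse formats (such as CSR or CSC) on demand, which adds to the structure cost.

```python
def estimate_gpu_bytes(num_nodes, num_edges, feat_dim, feat_bytes=4, id_bytes=8):
    """Hypothetical rough estimate of GPU memory for node features and graph structure.

    Assumptions: features form a dense num_nodes x feat_dim float32 matrix,
    and the structure is one COO edge list (source IDs + destination IDs),
    each ID stored as int64. Extra sparse formats are not counted.
    """
    feature_bytes = num_nodes * feat_dim * feat_bytes
    structure_bytes = num_edges * 2 * id_bytes  # two ID arrays of length num_edges
    return feature_bytes, structure_bytes


# Rounded OGB Products figures from the text: 2.4M nodes, 61M edges, 100-dim features.
feats, graph = estimate_gpu_bytes(2_400_000, 61_000_000, 100)
print(f"features: {feats / 1e9:.2f} GB, graph: {graph / 1e9:.2f} GB")
# features: 0.96 GB, graph: 0.98 GB
```

Both numbers stay under 1GB, which matches the claim in the text; the model's parameters, activations, and optimizer state still need to fit alongside them.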
This feature is experimental and a work-in-progress. Please stay tuned for further updates.
Using GPU-based neighborhood sampling in DGL data loaders
One can use GPU-based neighborhood sampling with DGL data loaders by:
- Putting the graph onto GPU.
- Setting the num_workers argument to 0, because CUDA does not allow multiple processes to access the same context.
- Setting the device argument to a GPU device.
All the other arguments for NodeDataLoader can be the same as in the other user guides and tutorials.
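```python
import torch
import dgl

# g, train_nid (training node IDs) and sampler are assumed to be
# defined as in the other user guides and tutorials.
g = g.to('cuda:0')
dataloader = dgl.dataloading.NodeDataLoader(
    g,                              # The graph must be on GPU.
    train_nid,
    sampler,
    device=torch.device('cuda:0'),  # The device argument must be GPU.
    num_workers=0,                  # Number of workers must be 0.
    batch_size=1000,
    drop_last=False,
    shuffle=True)
```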
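Because a misconfigured loader tends to fail only once iteration starts, it can help to check the three requirements up front. The function below is a hypothetical pre-flight check, not part of DGL: it takes device names as strings (for example `'cuda:0'` or `'cpu'`; `str(torch.device('cuda:0'))` gives the same form) and returns a list of problems, empty when the configuration satisfies the rules listed above.

```python
def check_gpu_sampling_args(graph_device, loader_device, num_workers):
    """Hypothetical check mirroring the requirements for GPU-based sampling
    with NodeDataLoader. Returns a list of human-readable problems."""
    problems = []
    # Requirement 1: the graph itself must live on a CUDA device.
    if not str(graph_device).startswith("cuda"):
        problems.append("graph must be on GPU, e.g. g = g.to('cuda:0')")
    # Requirement 2: the data loader's device argument must be a GPU device.
    if not str(loader_device).startswith("cuda"):
        problems.append("device argument must be a GPU device")
    # Requirement 3: no worker processes, since they cannot share the CUDA context.
    if num_workers != 0:
        problems.append("num_workers must be 0")
    return problems


print(check_gpu_sampling_args("cuda:0", "cuda:0", 4))
# ['num_workers must be 0']
```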
GPU-based neighborhood sampling also works for custom neighborhood samplers as long as (1) your sampler is subclassed from BlockSampler, and (2) your sampler works entirely on GPU.
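To illustrate condition (2), here is a minimal sketch of frontier sampling built only from PyTorch tensor operations, so every step runs on whichever device holds its inputs. It is not DGL's implementation and does not subclass BlockSampler; the function name `sample_frontier_csc` and the CSC inputs (`indptr`, `indices`) are assumptions for illustration. For simplicity it samples neighbors uniformly with replacement, which keeps the computation fully vectorized.

```python
import torch


def sample_frontier_csc(indptr, indices, seeds, fanout):
    """Hypothetical sketch: sample `fanout` in-neighbors per seed, uniformly
    with replacement. The in-neighbors of node v are
    indices[indptr[v]:indptr[v + 1]] (CSC layout). All ops stay on the
    inputs' device, so CUDA inputs keep the whole computation on GPU.
    Returns (src, dst) edge arrays of the sampled frontier."""
    start = indptr[seeds]                    # first edge slot of each seed
    deg = indptr[seeds + 1] - start          # in-degree of each seed
    keep = deg > 0                           # seeds without in-edges get no samples
    start, deg, seeds = start[keep], deg[keep], seeds[keep]
    # Uniform offsets in [0, deg) for every (seed, sample) pair.
    rand = torch.rand(seeds.numel(), fanout, device=indptr.device)
    offsets = (rand * deg.unsqueeze(1)).long()
    # Guard against float rounding pushing an offset up to deg.
    offsets = torch.minimum(offsets, (deg - 1).unsqueeze(1))
    src = indices[start.unsqueeze(1) + offsets].reshape(-1)
    dst = seeds.repeat_interleave(fanout)    # row-major order matches src
    return src, dst


# Toy graph: node 0 <- {1, 2}, node 1 <- {0}, node 2 has no in-edges.
indptr = torch.tensor([0, 2, 3, 3])
indices = torch.tensor([1, 2, 0])
src, dst = sample_frontier_csc(indptr, indices, torch.tensor([0, 1, 2]), fanout=4)
```

Passing tensors created with `device='cuda:0'` runs the same code on GPU. In a real custom sampler, logic of this kind would sit inside the BlockSampler subclass and operate on tensors that live on the same GPU as the graph.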
Currently, EdgeDataLoader and heterogeneous graphs are not supported by GPU-based neighborhood sampling.