dgl.sampling¶
Sampling operators.
This module contains the implementations of various sampling algorithms on graphs.
Random walk sampling functions¶
random_walk(g, nodes, *[, metapath, length, …])
Generate random walk traces from an array of starting nodes based on the given metapath.
pack_traces(traces, types)
Pack the padded traces returned by random_walk() into a concatenated array.
Neighbor sampling functions¶
sample_neighbors(g, nodes, fanout[, …])
Sample neighboring edges of the given nodes and return the induced subgraph.
select_topk(g, k, weight[, nodes, edge_dir, …])
Select the neighboring edges with the k largest (or k smallest) weights of the given nodes and return the induced subgraph.
PyTorch DataLoaders with neighborhood sampling¶
Builtin sampler classes for more complicated sampling algorithms¶

class dgl.sampling.RandomWalkNeighborSampler(G, num_traversals, termination_prob, num_random_walks, num_neighbors, metapath=None, weight_column='weights')[source]¶
PinSAGE-like neighbor sampler extended to any heterogeneous graph.
Given a heterogeneous graph and a list of nodes, this callable will generate a homogeneous graph where the neighbors of each given node are the most commonly visited nodes of the same type by multiple random walks starting from that given node. Each random walk consists of multiple metapath-based traversals, with a probability of termination after each traversal.
The edges of the returned homogeneous graph will connect to the given nodes from their most commonly visited nodes, with a feature indicating the number of visits.
The metapath must have the same beginning and ending node type for the algorithm to work.
This is a generalization of the PinSAGE sampler, which only works on bidirectional bipartite graphs.
Parameters:
- G (DGLGraph) – The graph.
- num_traversals (int) – The maximum number of metapath-based traversals for a single random walk. Usually considered a hyperparameter.
- termination_prob (float) – Termination probability after each metapath-based traversal. Usually considered a hyperparameter.
- num_random_walks (int) – Number of random walks to try for each given node. Usually considered a hyperparameter.
- num_neighbors (int) – Number of neighbors (or most commonly visited nodes) to select for each given node.
- metapath (list[str] or list[tuple[str, str, str]], optional) – The metapath. If not given, DGL assumes that the graph is homogeneous and the metapath consists of one step over the single edge type.
- weight_column (str, default "weights") – The name of the edge feature to be stored on the returned graph with the number of visits.
Inputs:
- seed_nodes (Tensor) – A tensor of given node IDs of node type ntype to generate neighbors from. The node type ntype is the beginning and ending node type of the given metapath.
Outputs:
- g (DGLGraph) – A homogeneous graph constructed by selecting neighbors for each given node according to the algorithm above.
Examples
See examples in PinSAGESampler.

class dgl.sampling.PinSAGESampler(G, ntype, other_type, num_traversals, termination_prob, num_random_walks, num_neighbors, weight_column='weights')[source]¶
PinSAGE-like neighbor sampler.
This callable works on a bidirectional bipartite graph with edge types (ntype, fwtype, other_type) and (other_type, bwtype, ntype) (where ntype, fwtype, bwtype and other_type could be arbitrary type names). It will generate a homogeneous graph of node type ntype where the neighbors of each given node are the most commonly visited nodes of the same type by multiple random walks starting from that given node. Each random walk consists of multiple metapath-based traversals, with a probability of termination after each traversal. The metapath is always [fwtype, bwtype], walking from node type ntype to node type other_type then back to ntype.
The edges of the returned homogeneous graph will connect to the given nodes from their most commonly visited nodes, with a feature indicating the number of visits.
Parameters:
- G (DGLGraph) – The bidirectional bipartite graph. The graph should only have two node types: ntype and other_type. The graph should only have two edge types, one connecting from ntype to other_type, and another connecting from other_type to ntype.
- ntype (str) – The node type for which the graph is constructed.
- other_type (str) – The other node type.
- num_traversals (int) – The maximum number of metapath-based traversals for a single random walk. Usually considered a hyperparameter.
- termination_prob (float) – Termination probability after each metapath-based traversal. Usually considered a hyperparameter.
- num_random_walks (int) – Number of random walks to try for each given node. Usually considered a hyperparameter.
- num_neighbors (int) – Number of neighbors (or most commonly visited nodes) to select for each given node.
- weight_column (str, default "weights") – The name of the edge feature to be stored on the returned graph with the number of visits.
Inputs:
- seed_nodes (Tensor) – A tensor of given node IDs of node type ntype to generate neighbors from.
Outputs:
- g (DGLHeteroGraph) – A homogeneous graph constructed by selecting neighbors for each given node according to the PinSAGE algorithm.
Examples
Generate a random bidirectional bipartite graph with 3000 “A” nodes and 5000 “B” nodes.
>>> g = scipy.sparse.random(3000, 5000, 0.003)
>>> G = dgl.heterograph({
...     ('A', 'AB', 'B'): g,
...     ('B', 'BA', 'A'): g.T})
Then we create a PinSAGE neighbor sampler that samples a graph of node type "A". Each node would have (a maximum of) 10 neighbors.
>>> sampler = dgl.sampling.PinSAGESampler(G, 'A', 'B', 3, 0.5, 200, 10)
This is how we select the neighbors for node #0, #1 and #2 of type "A" according to the PinSAGE algorithm:
>>> seeds = torch.LongTensor([0, 1, 2])
>>> frontier = sampler(seeds)
>>> frontier.all_edges(form='uv')
(tensor([ 230,    0,  802,   47,   50, 1639, 1533,  406, 2110, 2687, 2408, 2823,
            0,  972, 1230, 1658, 2373, 1289, 1745, 2918, 1818, 1951, 1191, 1089,
         1282,  566, 2541, 1505, 1022,  812]),
 tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2]))
For an end-to-end example of the PinSAGE model, including sampling on multiple layers and computing with the sampled graphs, please refer to our PinSAGE example in examples/pytorch/pinsage.
References
Graph Convolutional Neural Networks for Web-Scale Recommender Systems. Ying et al., 2018, https://arxiv.org/abs/1806.01973
Neighborhood samplers for multilayer GNNs¶

class dgl.sampling.MultiLayerNeighborSampler(fanouts, replace=False, return_eids=False)[source]¶
Sampler that builds the computational dependency of node representations via neighbor sampling for multi-layer GNNs.
This sampler makes every node gather messages from a fixed number of neighbors per edge type. The neighbors are picked uniformly.
Parameters:
- fanouts (list[int] or list[dict[etype, int] or None]) – List of neighbors to sample per edge type for each GNN layer, starting from the first layer. If the graph is homogeneous, only an integer is needed for each layer. If None is provided for one layer, all neighbors will be included regardless of edge types. If -1 is provided for one edge type on one layer, then all inbound edges of that edge type will be included.
- replace (bool, default False) – Whether to sample with replacement.
- return_eids (bool, default False) – Whether to return edge IDs of the original graph in the sampled blocks. If True, the edge IDs will be stored as the dgl.EID feature for each edge type.
Examples
To train a 3-layer GNN for node classification on a set of nodes train_nid on a homogeneous graph where each node takes messages from all neighbors (assume the backend is PyTorch):
>>> sampler = dgl.sampling.MultiLayerNeighborSampler([None, None, None])
>>> collator = dgl.sampling.NodeCollator(g, train_nid, sampler)
>>> dataloader = torch.utils.data.DataLoader(
...     collator.dataset, collate_fn=collator.collate,
...     batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for blocks in dataloader:
...     train_on(blocks)
If we wish to gather from 5 neighbors on the first layer, 10 neighbors on the second, and 15 neighbors on the third:
>>> sampler = dgl.sampling.MultiLayerNeighborSampler([5, 10, 15])
If training on a heterogeneous graph and you want a different number of neighbors for each edge type, one should instead provide a list of dicts. Each dict would specify the number of neighbors to pick per edge type.
>>> sampler = dgl.sampling.MultiLayerNeighborSampler([
...     {('user', 'follows', 'user'): 5,
...      ('user', 'plays', 'game'): 4,
...      ('game', 'playedby', 'user'): 3}] * 3)
Data loaders for minibatch iteration¶

class dgl.sampling.NodeCollator(g, nids, block_sampler)[source]¶
DGL collator that combines nodes and their computational dependencies within a minibatch for training node classification or regression on a single graph.
Parameters:
- g (DGLHeteroGraph) – The graph.
- nids (Tensor or dict[ntype, Tensor]) – The node set to compute outputs.
- block_sampler (BlockSampler) – The neighborhood sampler.
Examples
To train a 3-layer GNN for node classification on a set of nodes train_nid on a homogeneous graph where each node takes messages from all neighbors (assume the backend is PyTorch):
>>> sampler = dgl.sampling.MultiLayerNeighborSampler([None, None, None])
>>> collator = dgl.sampling.NodeCollator(g, train_nid, sampler)
>>> dataloader = torch.utils.data.DataLoader(
...     collator.dataset, collate_fn=collator.collate,
...     batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for input_nodes, output_nodes, blocks in dataloader:
...     train_on(input_nodes, output_nodes, blocks)
Abstract class for neighborhood sampler¶

class dgl.sampling.BlockSampler(num_hops)[source]¶
Abstract class specifying the neighborhood sampling strategy for DGL data loaders.
The main method of BlockSampler is sample_blocks(), which generates a list of blocks for a multi-layer GNN given a set of seed nodes to have their outputs computed.
The default implementation of sample_blocks() is to repeat num_hops times the following:
1. Obtain a frontier with the same nodes as the original graph but only the edges involved in message passing on the last layer. Customizable via sample_frontier().
2. Optionally, post-process the obtained frontier (e.g. by removing edges connecting training node pairs). One can add such postprocessors via add_frontier_postprocessor().
3. Convert the frontier into a block.
4. Optionally, post-process the block (e.g. by assigning edge IDs). One can add such postprocessors via add_block_postprocessor().
5. Prepend the block to the block list to be returned.
All subclasses should either
- override the sample_blocks() method, or
- override the sample_frontier() method while specifying the number of layers to sample in the num_hops argument.
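The default sample_blocks() control flow described above can be sketched in plain Python. The sample_frontier and frontier_to_block callables below are hypothetical stand-ins for the customizable pieces, not real DGL calls:

```python
# Sketch of the default sample_blocks() control flow described above.
# sample_frontier and frontier_to_block are hypothetical stand-ins for
# the customizable steps, not real DGL API.

def sample_blocks(seed_nodes, num_hops, sample_frontier, frontier_to_block):
    blocks = []
    for hop in reversed(range(num_hops)):
        # 1. Obtain a frontier: the edges feeding this layer's seeds.
        frontier = sample_frontier(hop, seed_nodes)
        # 2. (Optional frontier post-processing would go here.)
        # 3. Convert the frontier into a block; its input nodes become
        #    the seeds of the next (earlier) layer.
        block, input_nodes = frontier_to_block(frontier, seed_nodes)
        # 4. (Optional block post-processing would go here.)
        # 5. Prepend, so blocks[0] feeds the first GNN layer.
        blocks.insert(0, block)
        seed_nodes = input_nodes
    return blocks

# Toy demonstration on a hand-written reverse adjacency (u -> v edges,
# listed as v: [predecessors]).
adj = {0: [1, 2], 1: [3], 2: [], 3: []}

def toy_frontier(hop, seeds):
    return [(u, v) for v in seeds for u in adj[v]]

def toy_to_block(frontier, seeds):
    inputs = sorted(set(seeds) | {u for u, _ in frontier})
    return (frontier, list(seeds)), inputs

blocks = sample_blocks([0], 2, toy_frontier, toy_to_block)
print(len(blocks))  # 2: one block per hop, earliest layer first
```

Note the walk proceeds backwards from the seed nodes: the last layer is sampled first, and each block's input nodes seed the sampling of the layer before it.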