Graph samplers

dgl.contrib.sampling.sampler.NeighborSampler(g, batch_size, expand_factor, num_hops=1, neighbor_type='in', node_prob=None, seed_nodes=None, shuffle=False, num_workers=1, max_subgraph_size=None, return_seed_id=False)
Create a sampler that samples neighborhoods.
Note
This sampler currently supports only the MXNet backend. Set the “DGLBACKEND” environment variable to “mxnet”.
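A minimal sketch of selecting the backend from Python, assuming the variable is set before DGL is first imported (DGL reads DGLBACKEND when the package is imported). Exporting the variable in the shell before starting Python has the same effect.

```python
import os

# DGL chooses its tensor backend from DGLBACKEND at import time,
# so the variable must be set before the first `import dgl`.
os.environ["DGLBACKEND"] = "mxnet"

# import dgl  # uncomment once MXNet and a compatible DGL release are installed
print(os.environ["DGLBACKEND"])
```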
This creates a subgraph data loader that samples subgraphs from the input graph with neighbor sampling. The sampling routine is implemented in C, so it can sample very efficiently.
A subgraph grows from a seed vertex. It contains sampled neighbors of the seed vertex as well as the edges that connect those neighbors with the seed vertex. When the number of hops k is greater than 1, neighbors are sampled from the k-hop neighborhood. In this case, the sampled edges are the ones that connect the source nodes with their sampled neighbors.
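The hop-by-hop growth described in the previous paragraph can be illustrated with a small pure-Python sketch. It is not DGL's C implementation: the adjacency-dictionary input, the hypothetical helper name `sample_khop_subgraph`, and the fixed per-node `fanout` (a simplified stand-in for expand_factor) are assumptions. At each hop, every node in the current frontier draws up to `fanout` of its in-neighbors (matching neighbor_type='in'), records an edge from each sampled neighbor to that node, and newly reached neighbors form the frontier of the next hop.

```python
import random
from typing import Dict, List, Optional, Set, Tuple


# Illustrative sketch only; not DGL's C implementation.
def sample_khop_subgraph(
    in_adj: Dict[int, List[int]],
    seed: int,
    fanout: int,
    num_hops: int = 1,
    rng: Optional[random.Random] = None,
) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Grow a subgraph from `seed` by uniform neighbor sampling.

    in_adj maps each node to its in-neighbors. At every hop, up to `fanout`
    neighbors are drawn (without replacement) for each frontier node, and
    an edge (neighbor, node) is recorded for each draw.
    """
    rng = rng or random.Random()
    nodes = {seed}
    edges = []
    frontier = [seed]
    for _ in range(num_hops):
        next_frontier = []
        for v in frontier:
            neighbors = in_adj.get(v, [])
            for u in rng.sample(neighbors, min(fanout, len(neighbors))):
                edges.append((u, v))  # edge from the sampled neighbor to its source node
                if u not in nodes:  # only newly reached nodes expand the next hop
                    nodes.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return nodes, edges


# Toy graph given as in-neighbor lists.
in_adj = {0: [1, 2, 3], 1: [4], 2: [5], 3: [], 4: [], 5: []}
nodes, edges = sample_khop_subgraph(in_adj, seed=0, fanout=2, num_hops=2, rng=random.Random(0))
print(sorted(nodes), edges)
```

With fanout=2 the seed keeps only two of its three in-neighbors, and the second hop expands only from those two, so the result covers part of the 2-hop neighborhood rather than all of it.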
The subgraph loader returns a list of subgraphs and a dictionary of additional information about the subgraphs. The size of the subgraph list is the number of workers. The dictionary contains:
‘seeds’: a list of 1D tensors of seed IDs, present only if return_seed_id is True.
Parameters:
 g – the DGLGraph from which subgraphs are sampled.
 batch_size – the number of subgraphs in a batch.
 expand_factor – the number of neighbors sampled from the neighbor list of a vertex. The value can be: an integer, giving the number of neighbors sampled from a neighbor list; a floating-point number, giving the ratio of sampled neighbors to the size of the neighbor list; or a string naming a common way of computing the number of sampled neighbors, e.g., ‘sqrt(deg)’.
 num_hops – the size of the neighborhood, in hops, from which vertices are sampled.
 neighbor_type – which edges define the neighbors: “in” means the neighbors on the in-edges, “out” means the neighbors on the out-edges, and “both” means the neighbors on both types of edges.
 node_prob – a 1D tensor giving the probability that each neighbor node is sampled. None means uniform sampling; otherwise, the number of elements must equal the number of vertices in the graph.
 seed_nodes – a list of nodes from which subgraphs are sampled. If None, all vertices in the graph are used as seeds.
 shuffle – whether the sampled subgraphs are shuffled.
 num_workers – the number of worker threads that sample subgraphs in parallel.
 max_subgraph_size – the maximum subgraph size, in number of nodes. GPUs do not support very large subgraphs.
 return_seed_id – whether to return the seed IDs along with the subgraphs. The seed IDs refer to nodes in the parent graph.
Returns: An iterator that yields a list of batched subgraphs and a dictionary of additional information about the subgraphs.
Return type: A subgraph iterator
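The three accepted forms of expand_factor can be summarized by a small sketch. The helper name `num_sampled_neighbors` is hypothetical, only the ‘sqrt(deg)’ string mentioned above is handled, and rounding fractional counts up is an assumption; the documentation does not say how DGL rounds.

```python
import math
from typing import Union


# Hypothetical helper, not part of DGL.
def num_sampled_neighbors(expand_factor: Union[int, float, str], degree: int) -> int:
    """How many of a vertex's `degree` neighbors to sample."""
    if isinstance(expand_factor, str):
        if expand_factor != "sqrt(deg)":
            raise ValueError(f"unsupported expand_factor: {expand_factor!r}")
        n = math.ceil(math.sqrt(degree))  # rounding up is an assumption
    elif isinstance(expand_factor, int):
        n = expand_factor  # an absolute neighbor count
    elif isinstance(expand_factor, float):
        n = math.ceil(expand_factor * degree)  # a ratio; rounding up is an assumption
    else:
        raise TypeError(f"expand_factor must be int, float or str, not {type(expand_factor).__name__}")
    # A vertex cannot contribute more neighbors than it has.
    return min(n, degree)


for factor in (5, 0.25, "sqrt(deg)"):
    print(factor, num_sampled_neighbors(factor, degree=16))
```

Capping the result at the degree keeps low-degree vertices valid for any integer setting.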
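The neighbor_type and node_prob parameters can be pictured with the following pure-Python sketch. It is not DGL code: the (src, dst) edge-list representation, the hypothetical helpers `candidate_neighbors` and `sample_neighbors`, and the choice to draw weighted neighbors one at a time without replacement (never picking zero-probability nodes) are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


# Hypothetical helpers, not part of DGL.
def candidate_neighbors(edges: Sequence[Tuple[int, int]], v: int, neighbor_type: str = "in") -> List[int]:
    """Neighbors of v in a directed edge list of (src, dst) pairs."""
    if neighbor_type not in ("in", "out", "both"):
        raise ValueError(f"unknown neighbor_type: {neighbor_type!r}")
    result = []
    for src, dst in edges:
        if neighbor_type in ("in", "both") and dst == v:
            result.append(src)  # v is the destination: src is an in-neighbor
        if neighbor_type in ("out", "both") and src == v:
            result.append(dst)  # v is the source: dst is an out-neighbor
    return result


def sample_neighbors(
    neighbors: Sequence[int],
    k: int,
    node_prob: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Draw up to k distinct neighbors; node_prob is indexed by node ID (None = uniform)."""
    rng = rng or random.Random()
    pool = list(dict.fromkeys(neighbors))  # drop duplicates, keep first-seen order
    k = min(k, len(pool))
    if node_prob is None:
        return rng.sample(pool, k)
    chosen = []
    for _ in range(k):
        weights = [node_prob[u] for u in pool]
        if sum(weights) <= 0:
            break  # only zero-probability neighbors remain
        u = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(u)
        pool.remove(u)  # sample without replacement
    return chosen


edges = [(0, 1), (2, 1), (1, 3), (1, 2)]
both = candidate_neighbors(edges, 1, "both")
print(both, sample_neighbors(both, 2, node_prob=[1.0, 0.0, 0.0, 1.0], rng=random.Random(1)))
```

With “both”, a node that is an in-neighbor and an out-neighbor at once (node 2 here) is listed twice, which is why the sampler deduplicates its pool before drawing.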
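One plausible reading of how seed_nodes, batch_size, shuffle, num_workers and return_seed_id shape each iteration is sketched below. The hypothetical generator `seed_batches` yields seed batches as stand-ins for the subgraphs a worker would sample from them; grouping one batch per worker follows the statement that the subgraph list has one entry per worker, while shuffling the seed order (rather than the finished subgraphs) and the shorter final group are assumptions.

```python
import random
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


# Hypothetical sketch, not DGL's loader.
def seed_batches(
    seed_nodes: Iterable[int],
    batch_size: int,
    num_workers: int = 1,
    shuffle: bool = False,
    return_seed_id: bool = False,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[List[List[int]], Dict[str, List[List[int]]]]]:
    """Yield (batches, aux) with up to num_workers seed batches per step.

    Each seed batch stands in for the subgraph a worker would sample from it.
    """
    rng = rng or random.Random()
    seeds = list(seed_nodes)
    if shuffle:
        rng.shuffle(seeds)
    batches = [seeds[i:i + batch_size] for i in range(0, len(seeds), batch_size)]
    for i in range(0, len(batches), num_workers):
        group = batches[i:i + num_workers]  # one batch per worker; the last group may be shorter
        aux = {"seeds": group} if return_seed_id else {}
        yield group, aux


for group, aux in seed_batches(range(10), batch_size=3, num_workers=2, return_seed_id=True):
    print(group, aux)
```

Returning the seeds alongside each group mirrors the ‘seeds’ entry of the dictionary, which lets a caller map each subgraph back to the parent-graph nodes it was grown from.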