Graph samplers

dgl.contrib.sampling.sampler.NeighborSampler(g, batch_size, expand_factor, num_hops=1, neighbor_type='in', node_prob=None, seed_nodes=None, shuffle=False, num_workers=1, return_seed_id=False, prefetch=False)

Create a sampler that samples neighborhoods.

This creates a subgraph data loader that samples subgraphs from the input graph using neighbor sampling. The sampling method is implemented in C, so sampling is efficient.

A subgraph grows from a seed vertex. It contains the sampled neighbors of the seed vertex as well as the edges that connect those neighbor nodes with the seed nodes. When the number of hops is k (>1), the neighbors are sampled from the k-hop neighborhood. In this case, the sampled edges are those that connect each source node to its sampled neighbor nodes.
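The growth of a subgraph from its seeds can be illustrated with a minimal pure-Python sketch. It is not DGL's C implementation: the function name sample_neighborhood, the fanout argument, and the dictionary-based graph are hypothetical, and expanding hop by hop over in-edges is only one plausible reading of the description above.

```python
import random


def sample_neighborhood(in_neighbors, seeds, fanout, num_hops=1, rng=None):
    """Illustrative k-hop neighbor sampling (hypothetical helper, not DGL code).

    in_neighbors maps each node to the list of nodes that have an edge into it.
    Returns the set of sampled nodes and the set of sampled (src, dst) edges.
    """
    rng = rng or random.Random()
    nodes = set(seeds)
    edges = set()
    frontier = set(seeds)
    for _ in range(num_hops):
        next_frontier = set()
        for dst in sorted(frontier):
            candidates = in_neighbors.get(dst, [])
            # Keep at most `fanout` in-neighbors of this node, chosen uniformly.
            picked = rng.sample(candidates, min(fanout, len(candidates)))
            for src in picked:
                edges.add((src, dst))
                if src not in nodes:
                    nodes.add(src)
                    next_frontier.add(src)
        # The next hop expands only the nodes discovered in this hop.
        frontier = next_frontier
    return nodes, edges


# Toy graph: node 0 has in-neighbors 1, 2, 3; node 2 has in-neighbors 4, 5.
toy_graph = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: []}
sampled_nodes, sampled_edges = sample_neighborhood(
    toy_graph, seeds=[0], fanout=2, num_hops=2, rng=random.Random(0)
)
```

With num_hops=1 every sampled edge ends at a seed; with larger values the edges also connect previously sampled nodes to their own sampled neighbors.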

The subgraph loader returns a list of subgraphs and a dictionary of additional information about the subgraphs. The size of the subgraph list is the number of workers.

The dictionary contains:

  • seeds: a list of 1D tensors of seed IDs, if return_seed_id is True.
Parameters:
  • g (DGLGraph) – The graph from which subgraphs are sampled.
  • batch_size (int) – The number of subgraphs in a batch.
  • expand_factor (int, float or str) – The number of neighbors sampled from the neighbor list of a vertex. The value can be an integer, which gives the number of neighbors sampled from a neighbor list; a floating-point number, which gives the ratio of sampled neighbors in a neighbor list; or a string, which names a common way of calculating the number of sampled neighbors, e.g., ‘sqrt(deg)’.
  • num_hops (int, default 1) – The size of the neighborhood, in hops, from which vertices are sampled.
  • neighbor_type (str, default ‘in’) – The edges along which neighbors are found: “in” means the neighbors on the in-edges, “out” means the neighbors on the out-edges, and “both” means the neighbors on both types of edges.
  • node_prob (1D Tensor, default None) – The probability that a neighbor node is sampled. None means uniform sampling. Otherwise, the number of elements must equal the number of vertices in the graph.
  • seed_nodes (list, default None) – The nodes from which subgraphs are sampled. If None, the seed vertices are all vertices in the graph.
  • shuffle (bool, default False) – Whether the sampled subgraphs are shuffled.
  • num_workers (int, default 1) – The number of worker threads that sample subgraphs in parallel.
  • return_seed_id (bool, default False) – Whether to return seed IDs along with the subgraphs. The seed IDs are IDs in the parent graph.
  • prefetch (bool, default False) – Whether to prefetch the samples in the next batch.
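
The three forms of expand_factor described above can be made concrete with a small sketch. The helper num_sampled_neighbors is hypothetical and not part of DGL; rounding down for ratios and square roots, and capping the result at the size of the neighbor list, are assumptions made for illustration only.

```python
import math


def num_sampled_neighbors(expand_factor, degree):
    """Hypothetical helper: neighbors to sample from a neighbor list of size `degree`."""
    if isinstance(expand_factor, str):
        if expand_factor == "sqrt(deg)":
            # Assumption: round the square root of the degree down.
            count = int(math.sqrt(degree))
        else:
            raise ValueError("unsupported expand_factor: %r" % (expand_factor,))
    elif isinstance(expand_factor, int):
        # An integer is taken as the number of neighbors directly.
        count = expand_factor
    elif isinstance(expand_factor, float):
        # A float is taken as the ratio of the neighbor list to keep.
        count = int(degree * expand_factor)
    else:
        raise TypeError("expand_factor must be an int, a float or a str")
    # Never ask for more neighbors than the list contains.
    return min(count, degree)


fixed = num_sampled_neighbors(5, 20)             # integer form
ratio = num_sampled_neighbors(0.5, 10)           # ratio form
by_rule = num_sampled_neighbors("sqrt(deg)", 16) # string form
```

How the library itself rounds or caps these values is not stated here, so the sketch should be read as one reasonable interpretation.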
Returns:

Each iteration yields a list of batched subgraphs and a dictionary of additional information about the subgraphs.

Return type:

A subgraph iterator