dgl.distributed.graph_services.sample_neighbors

dgl.distributed.graph_services.sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False)

Sample from the neighbors of the given nodes from a distributed graph.

For each node, a number of inbound (or outbound when edge_dir == 'out') edges will be randomly chosen. The returned graph will contain all the nodes in the original graph, but only the sampled edges.

Node/edge features are not preserved. The original IDs of the sampled edges are stored as the dgl.EID feature in the returned graph.

This version provides experimental support for heterogeneous graphs. When the input graph is heterogeneous, the sampled subgraph is still stored in the homogeneous graph format. That is, every node and edge is assigned a unique ID (in contrast, a DGLGraph typically identifies a node or an edge by a type name together with a node/edge ID). We refer to IDs of this kind as homogeneous IDs. Users can call dgl.distributed.GraphPartitionBook.map_to_per_ntype() and dgl.distributed.GraphPartitionBook.map_to_per_etype() to recover the node/edge type and the type-specific node/edge ID.

For heterogeneous graphs, nodes can be a dictionary whose keys are node types and whose values are type-specific node IDs; nodes can also be a tensor of homogeneous IDs.
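To make the relationship between homogeneous IDs and per-type IDs concrete, here is a minimal pure-Python sketch. It assumes, purely for illustration, that each node type owns a single contiguous range of homogeneous IDs; a real partitioned graph need not be laid out this way, and the names NTYPES, TYPE_STARTS and to_per_type are hypothetical, not part of DGL.

```python
from bisect import bisect_right

# Hypothetical layout (an assumption, not DGL's partition book):
# each node type owns one contiguous range of homogeneous IDs.
NTYPES = ["user", "item"]
TYPE_STARTS = [0, 3]  # "user" owns IDs 0..2, "item" owns IDs 3 and up


def to_per_type(homo_ids):
    """Map homogeneous IDs to (type name, type-specific ID) pairs."""
    result = []
    for hid in homo_ids:
        # Find the last type whose range starts at or before this ID.
        type_idx = bisect_right(TYPE_STARTS, hid) - 1
        result.append((NTYPES[type_idx], hid - TYPE_STARTS[type_idx]))
    return result


pairs = to_per_type([0, 2, 3, 5])
print(pairs)  # [('user', 0), ('user', 2), ('item', 0), ('item', 2)]
```

In actual code, the two GraphPartitionBook methods named above perform this lookup; the sketch only shows the idea of splitting one ID space into a type and a type-specific ID.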

Parameters
  • g (DistGraph) – The distributed graph.

  • nodes (tensor or dict) – Node IDs to sample neighbors from. If it’s a dict, it should contain only one key-value pair to make this API consistent with dgl.sampling.sample_neighbors.

  • fanout (int) –

    The number of edges to be sampled for each node.

    If -1 is given, all of the neighbors will be selected.

  • edge_dir (str, optional) –

    Determines whether to sample inbound or outbound edges.

    Can take either 'in' for inbound edges or 'out' for outbound edges.

  • prob (str, optional) –

    Feature name used as the (unnormalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge.

    The features must be non-negative floats, and the sum of the features of inbound/outbound edges for every node must be positive (though they don’t have to sum up to one). Otherwise, the result will be undefined.

  • replace (bool, optional) –

    If True, sample with replacement.

    When sampling with replacement, the sampled subgraph could have parallel edges.

    For sampling without replacement, if fanout > the number of neighbors, all the neighbors are sampled. If fanout == -1, all neighbors are collected.
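The way fanout, prob and replace interact for a single node can be sketched in plain Python. This is an illustration of the semantics described above, not the distributed implementation: the function name sample_edges, its weights argument (standing in for the values of the prob feature) and the rng argument are all hypothetical.

```python
import random


def sample_edges(neighbors, fanout, weights=None, replace=False, rng=random):
    """Pick edge IDs for one node, following the parameter semantics above."""
    neighbors = list(neighbors)
    if not neighbors or fanout == -1:
        # fanout == -1 selects every neighbor.
        return neighbors
    if replace:
        # With replacement the same edge may be drawn more than once,
        # which is where parallel edges in the result come from.
        return rng.choices(neighbors, weights=weights, k=fanout)
    if fanout >= len(neighbors):
        # Without replacement, an oversized fanout returns all neighbors.
        return neighbors
    if weights is None:
        return rng.sample(neighbors, fanout)
    # Weighted draw without replacement: pick one edge at a time and
    # remove it from the pool. Weights are unnormalized.
    pool, w, picked = neighbors, list(weights), []
    for _ in range(fanout):
        if sum(w) <= 0:
            break  # only zero-weight edges remain (a choice made by this sketch)
        i = rng.choices(range(len(pool)), weights=w, k=1)[0]
        picked.append(pool.pop(i))
        w.pop(i)
    return picked


rng = random.Random(0)
edge_ids = [10, 11, 12, 13]
all_edges = sample_edges(edge_ids, -1, rng=rng)
capped = sample_edges(edge_ids, 10, rng=rng)
two = sample_edges(edge_ids, 2, rng=rng)
with_repl = sample_edges(edge_ids, 6, replace=True, rng=rng)
weighted = sample_edges(edge_ids, 2, weights=[0.0, 5.0, 0.0, 1.0], rng=rng)
```

In this sketch, edges with zero weight are never drawn in the weighted case, so weighted always contains edges 11 and 13. How the real sampler treats zero-weight edges when fanout exceeds the neighbor count is not stated here, so the sketch should not be read as specifying it.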

Returns

A sampled subgraph containing only the sampled neighboring edges. The subgraph resides on the CPU.

Return type

DGLGraph