dgl.sampling.PinSAGESampler¶
-
class
dgl.sampling.
PinSAGESampler
(G, ntype, other_type, num_traversals, termination_prob, num_random_walks, num_neighbors, weight_column='weights')[source]¶ PinSAGE-like neighbor sampler.
This callable works on a bidirectional bipartite graph with edge types
(ntype, fwtype, other_type)
and(other_type, bwtype, ntype)
(wherentype
,fwtype
,bwtype
andother_type
could be arbitrary type names). It will generate a homogeneous graph of node typentype
where the neighbors of each given node are the most commonly visited nodes of the same type by multiple random walks starting from that given node. Each random walk consists of multiple metapath-based traversals, with a probability of termination after each traversal. The metapath is always[fwtype, bwtype]
, walking from node typentype
to node typeother_type
then back tontype
.The edges of the returned homogeneous graph will connect to the given nodes from their most commonly visited nodes, with a feature indicating the number of visits.
- Parameters
G (DGLGraph) –
The bidirectional bipartite graph.
The graph should only have two node types:
ntype
andother_type
. The graph should only have two edge types, one connecting fromntype
toother_type
, and another connecting fromother_type
tontype
.ntype (str) – The node type for which the graph would be constructed on.
other_type (str) – The other node type.
num_traversals (int) –
The maximum number of metapath-based traversals for a single random walk.
Usually considered a hyperparameter.
termination_prob (int) –
Termination probability after each metapath-based traversal.
Usually considered a hyperparameter.
num_random_walks (int) –
Number of random walks to try for each given node.
Usually considered a hyperparameter.
num_neighbors (int) – Number of neighbors (or most commonly visited nodes) to select for each given node.
weight_column (str, default "weights") – The name of the edge feature to be stored on the returned graph with the number of visits.
Examples
Generate a random bidirectional bipartite graph with 3000 “A” nodes and 5000 “B” nodes.
>>> g = scipy.sparse.random(3000, 5000, 0.003) >>> G = dgl.heterograph({ ... ('A', 'AB', 'B'): g.nonzero(), ... ('B', 'BA', 'A'): g.T.nonzero()})
Then we create a PinSage neighbor sampler that samples a graph of node type “A”. Each node would have (a maximum of) 10 neighbors.
>>> sampler = dgl.sampling.PinSAGESampler(G, 'A', 'B', 3, 0.5, 200, 10)
This is how we select the neighbors for node #0, #1 and #2 of type “A” according to PinSAGE algorithm:
>>> seeds = torch.LongTensor([0, 1, 2]) >>> frontier = sampler(seeds) >>> frontier.all_edges(form='uv') (tensor([ 230, 0, 802, 47, 50, 1639, 1533, 406, 2110, 2687, 2408, 2823, 0, 972, 1230, 1658, 2373, 1289, 1745, 2918, 1818, 1951, 1191, 1089, 1282, 566, 2541, 1505, 1022, 812]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
For an end-to-end example of PinSAGE model, including sampling on multiple layers and computing with the sampled graphs, please refer to our PinSage example in
examples/pytorch/pinsage
.References
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Ying et al., 2018, https://arxiv.org/abs/1806.01973
-
__init__
(G, ntype, other_type, num_traversals, termination_prob, num_random_walks, num_neighbors, weight_column='weights')[source]¶ Initialize self. See help(type(self)) for accurate signature.
Methods
__init__
(G, ntype, other_type, …[, …])Initialize self.