FusedCSCSamplingGraph

class dgl.graphbolt.FusedCSCSamplingGraph(c_csc_graph: ScriptObject)[source]

Bases: SamplingGraph

A sampling graph in CSC format.

copy_to_shared_memory(shared_memory_name: str)[source]

Copy the graph to shared memory.

Parameters:

shared_memory_name (str) – Name of the shared memory.

Returns:

The copied FusedCSCSamplingGraph object on shared memory.

Return type:

FusedCSCSamplingGraph

in_subgraph(nodes: Tensor | Dict[str, Tensor]) SampledSubgraphImpl[source]

Return the subgraph induced on the inbound edges of the given nodes.

An in-subgraph is equivalent to creating a new graph from the incoming edges of the given nodes. The subgraph is compacted according to the order of the passed-in nodes.

Parameters:

nodes (torch.Tensor or Dict[str, torch.Tensor]) –

IDs of the given seed nodes.
  • If nodes is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.

  • If nodes is a dictionary: the keys are node types, and the IDs inside are heterogeneous IDs.

Returns:

The in subgraph.

Return type:

SampledSubgraphImpl

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {
...     "N0:R0:N0": 0, "N0:R1:N1": 1, "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {"N0":torch.LongTensor([1]), "N1":torch.LongTensor([1, 2])}
>>> in_subgraph = graph.in_subgraph(nodes)
>>> print(in_subgraph.sampled_csc)
{'N0:R0:N0': CSCFormatBase(indptr=tensor([0, 0]),
      indices=tensor([], dtype=torch.int64),
), 'N0:R1:N1': CSCFormatBase(indptr=tensor([0, 1, 2]),
            indices=tensor([1, 0]),
), 'N1:R2:N0': CSCFormatBase(indptr=tensor([0, 2]),
            indices=tensor([0, 1]),
), 'N1:R3:N1': CSCFormatBase(indptr=tensor([0, 1, 3]),
            indices=tensor([0, 1, 2]),
)}
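The result above can be reproduced with a plain-Python sketch of the idea. Lists stand in for tensors, and the function name in_subgraph_sketch is hypothetical; this illustrates what the method computes under those assumptions and is not GraphBolt's implementation.

```python
# Same graph as in the example above, with lists instead of tensors.
indptr = [0, 3, 5, 7, 9, 12]
indices = [0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4]
node_type_offset = [0, 2, 5]
type_per_edge = [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3]
ntypes = {"N0": 0, "N1": 1}
etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1, "N1:R2:N0": 2, "N1:R3:N1": 3}

def in_subgraph_sketch(seeds):
    """Collect, per edge type, the inbound edges of the seed nodes."""
    result = {}
    for etype, etype_id in etypes.items():
        src_type, _, dst_type = etype.split(":")
        src_offset = node_type_offset[ntypes[src_type]]
        dst_offset = node_type_offset[ntypes[dst_type]]
        sub_indptr, sub_indices = [0], []
        # One column per seed, in the order the seeds were passed in.
        for seed in seeds.get(dst_type, []):
            v = seed + dst_offset  # heterogeneous ID -> homogeneous ID
            for e in range(indptr[v], indptr[v + 1]):
                if type_per_edge[e] == etype_id:
                    # Store the source as a heterogeneous (type-local) ID.
                    sub_indices.append(indices[e] - src_offset)
            sub_indptr.append(len(sub_indices))
        result[etype] = (sub_indptr, sub_indices)
    return result

sub = in_subgraph_sketch({"N0": [1], "N1": [1, 2]})
for etype, (ptr, idx) in sub.items():
    print(etype, ptr, idx)
```

Each printed pair corresponds to the indptr and indices of one CSCFormatBase entry in the example output.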
pin_memory_()[source]

Copy FusedCSCSamplingGraph to the pinned memory in-place. Returns the same object modified in-place.

sample_layer_neighbors(seeds: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None, random_seed: Tensor | None = None, seed2_contribution: float = 0.0) SampledSubgraphImpl[source]

Sample neighboring edges of the given nodes and return the induced subgraph via layer-neighbor sampling, as described in the NeurIPS 2023 paper "Layer-Neighbor Sampling – Defusing Neighborhood Explosion in GNNs".

Parameters:
  • seeds (torch.Tensor or Dict[str, torch.Tensor]) –

    IDs of the given seed nodes.
    • If seeds is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.

    • If seeds is a dictionary: the keys are node types, and the IDs inside are heterogeneous IDs.

  • fanouts (torch.Tensor) –

    The number of edges to be sampled for each node with or without considering edge types.

    • When the length is 1, it indicates that the fanout applies to all neighbors of the node as a collective, regardless of the edge type.

    • Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.

    The value of each fanout should be >= 0 or = -1.
    • When the value is -1, all neighbors (with non-zero probability, if weighted) will be sampled once regardless of replacement. It is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to false).

    • When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.

  • replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.

  • probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equalling the total number of edges.

  • random_seed (torch.Tensor, optional) –

    An int64 tensor with one or two elements.

    The passed random_seed ensures that, for any seed node s and its neighbor t, the rolled random variate r_t is the same in every call to this function that uses the same random seed. When sampling as part of the same batch, identical seeds are desirable so that LABOR can sample globally. For example, on heterogeneous graphs a single random seed is passed for all edge types, which samples far fewer nodes than using a unique random seed per edge type. Calling this function separately for each edge type of a heterogeneous graph with different random seeds would run LABOR locally for each edge type, resulting in a larger number of nodes being sampled.

    If this function is called without a random_seed, the seed is obtained by drawing a random number from GraphBolt. Pass an identical random_seed to every call when multiple calls to this function sample parts of a single batch.

    If two seeds are given, the seed2_contribution argument determines the interpolation between them.

  • seed2_contribution (float, optional) – A float in the range [0, 1) that determines the contribution of the second random seed, random_seed[-1], to the generated random variates.

Returns:

The sampled subgraph.

Return type:

SampledSubgraphImpl

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> ntypes = {"n1": 0, "n2": 1}
>>> etypes = {"n1:e1:n2": 0, "n2:e2:n1": 1}
>>> indptr = torch.LongTensor([0, 2, 4, 6, 7, 9])
>>> indices = torch.LongTensor([2, 4, 2, 3, 0, 1, 1, 0, 1])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor([1, 1, 1, 1, 0, 0, 0, 0, 0])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {'n1': torch.LongTensor([0]), 'n2': torch.LongTensor([0])}
>>> fanouts = torch.tensor([1, 1])
>>> subgraph = graph.sample_layer_neighbors(nodes, fanouts)
>>> print(subgraph.sampled_csc)
{'n1:e1:n2': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([0]),
), 'n2:e2:n1': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([2]),
)}
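The effect of sharing random variates, as described for random_seed, can be illustrated with a simplified plain-Python sketch. The helpers variate and pick are hypothetical, the hash-based variate is only a stand-in for a real random number generator, and keeping the fanout neighbors with the smallest variates is an assumption made for illustration rather than GraphBolt's actual procedure.

```python
import hashlib

def variate(random_seed, t):
    """Deterministic pseudo-random number in [0, 1) for neighbor t.

    It depends only on (random_seed, t), never on the seed node s.
    """
    digest = hashlib.sha256(f"{random_seed}:{t}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def pick(neighbors, fanout, random_seed):
    """Keep the fanout neighbors with the smallest shared variates."""
    ranked = sorted(neighbors, key=lambda t: variate(random_seed, t))
    return ranked[:fanout]

# Two seed nodes that share the same candidate neighbors.
neighbors_of_a = [10, 11, 12, 13]
neighbors_of_b = [13, 12, 11, 10]

same_seed = set(pick(neighbors_of_a, 2, 42)) | set(pick(neighbors_of_b, 2, 42))
print(len(same_seed))  # 2
```

Because both seed nodes see the same variate for each neighbor, they select the same two neighbors, so the union of sampled nodes stays at the fanout. With independent variates per seed node the union could contain up to four nodes.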
sample_negative_edges_uniform(edge_type, node_pairs, negative_ratio)[source]

Sample negative edges by randomly choosing negative source-destination pairs according to a uniform distribution. For each edge (u, v), it generates negative_ratio negative edges (u, v'), where v' is chosen uniformly from all the nodes in the graph. As u is exactly the same as in the corresponding positive edge, None is returned for the negative sources.

Parameters:
  • edge_type (str) – The type of edges in the provided node_pairs. Any negative edges sampled will also have the same type. If set to None, the graph is treated as homogeneous.

  • node_pairs (Tuple[Tensor, Tensor]) – A tuple of two 1D tensors that represent the sources and destinations of positive edges, where 'positive' indicates that these edges are present in the graph. In a heterogeneous graph, the IDs in these tensors are heterogeneous IDs.

  • negative_ratio (int) – The ratio of the number of negative samples to positive samples.

Returns:

A tuple of two 1D tensors representing the sources and destinations of the negative edges. In a heterogeneous graph, both the input nodes and the selected nodes are represented by heterogeneous IDs, and the formed edges are of the input type edge_type. Note that a sampled negative edge may be a false negative: it may or may not be present in the graph.

Return type:

Tuple[Tensor, Tensor]
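The uniform scheme described above can be sketched in plain Python. The function sample_negative_destinations and its num_dst_nodes and rng parameters are hypothetical, lists stand in for tensors, and the sketch returns the repeated sources explicitly only to make the pairing visible.

```python
import random

def sample_negative_destinations(node_pairs, negative_ratio, num_dst_nodes, rng):
    """For every positive edge (u, v), draw negative_ratio destinations v'.

    Each v' is drawn uniformly from range(num_dst_nodes); the source u of
    the positive edge is reused unchanged.
    """
    src, _ = node_pairs
    neg_src = [u for u in src for _ in range(negative_ratio)]
    neg_dst = [rng.randrange(num_dst_nodes) for _ in neg_src]
    return neg_src, neg_dst

rng = random.Random(0)
positive = ([0, 1, 2], [2, 3, 4])
neg_src, neg_dst = sample_negative_destinations(positive, 2, 5, rng)
print(neg_src)       # [0, 0, 1, 1, 2, 2]
print(len(neg_dst))  # 6
```

Nothing in the sketch checks a drawn pair against the graph, which is why a sampled negative edge may coincide with an existing edge.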

sample_negative_edges_uniform_2(edge_type, node_pairs, negative_ratio)[source]

Sample negative edges by randomly choosing negative source-destination edges according to a uniform distribution. For each edge (u, v), it generates negative_ratio negative edges (u, v'), where v' is chosen uniformly from all the nodes in the graph and u is exactly the same as in the corresponding positive edge. It returns the positive edges concatenated with the negative edges; the negative sources are constructed from the corresponding positive edges.

Parameters:
  • edge_type (str) – The type of edges in the provided node_pairs. Any negative edges sampled will also have the same type. If set to None, the graph is treated as homogeneous.

  • node_pairs (torch.Tensor) – A 2D tensor that represents the N pairs of positive edges in source-destination format, where 'positive' indicates that these edges are present in the graph. In a heterogeneous graph, the IDs in this tensor are heterogeneous IDs.

  • negative_ratio (int) – The ratio of the number of negative samples to positive samples.

Returns:

A 2D tensor holding the positive and negative source-destination node pairs. In a heterogeneous graph, both the input nodes and the selected nodes are represented by heterogeneous IDs, and the formed edges are of the input type edge_type. Note that a sampled negative edge may be a false negative: it may or may not be present in the graph.

Return type:

torch.Tensor

sample_neighbors(seeds: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None) SampledSubgraphImpl[source]

Sample neighboring edges of the given nodes and return the induced subgraph.

Parameters:
  • seeds (torch.Tensor or Dict[str, torch.Tensor]) –

    IDs of the given seed nodes.
    • If seeds is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.

    • If seeds is a dictionary: the keys are node types, and the IDs inside are heterogeneous IDs.

  • fanouts (torch.Tensor) –

    The number of edges to be sampled for each node with or without considering edge types.

    • When the length is 1, it indicates that the fanout applies to all neighbors of the node as a collective, regardless of the edge type.

    • Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.

    The value of each fanout should be >= 0 or = -1.
    • When the value is -1, all neighbors (with non-zero probability, if weighted) will be sampled once regardless of replacement. It is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to false).

    • When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.

  • replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.

  • probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equalling the total number of edges.

Returns:

The sampled subgraph.

Return type:

SampledSubgraphImpl

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> ntypes = {"n1": 0, "n2": 1}
>>> etypes = {"n1:e1:n2": 0, "n2:e2:n1": 1}
>>> indptr = torch.LongTensor([0, 2, 4, 6, 7, 9])
>>> indices = torch.LongTensor([2, 4, 2, 3, 0, 1, 1, 0, 1])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor([1, 1, 1, 1, 0, 0, 0, 0, 0])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {'n1': torch.LongTensor([0]), 'n2': torch.LongTensor([0])}
>>> fanouts = torch.tensor([1, 1])
>>> subgraph = graph.sample_neighbors(nodes, fanouts)
>>> print(subgraph.sampled_csc)
{'n1:e1:n2': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([0]),
), 'n2:e2:n1': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([2]),
)}
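One reading of the fanout, replace and probs_name rules above, for a single node and a single fanout value, is sketched below in plain Python. The function sample_one_node is hypothetical, and treating a non-negative fanout as the number of edges to draw (capped by the available neighbors when sampling without replacement) is an assumption of this sketch, not a statement about GraphBolt's internals.

```python
import random

def sample_one_node(neighbors, fanout, replace=False, probs=None, rng=random):
    """Pick in-neighbors of one node under the assumed fanout rules."""
    if probs is None:
        probs = [1.0] * len(neighbors)
    # Neighbors with zero probability (or a False mask) are never eligible.
    pool = [(n, float(p)) for n, p in zip(neighbors, probs) if p > 0]
    if not pool:
        return []
    # fanout == -1, or a fanout covering every candidate without
    # replacement, selects each eligible neighbor exactly once.
    if fanout == -1 or (not replace and fanout >= len(pool)):
        return [n for n, _ in pool]
    if replace:
        nodes = [n for n, _ in pool]
        weights = [p for _, p in pool]
        return rng.choices(nodes, weights=weights, k=fanout)
    # Weighted sampling without replacement: draw one candidate at a time.
    picked = []
    for _ in range(fanout):
        weights = [p for _, p in pool]
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(i)[0])
    return picked

print(sample_one_node([2, 4, 7], -1))                          # [2, 4, 7]
print(sample_one_node([2, 4, 7], 5))                           # [2, 4, 7]
print(sample_one_node([2, 4, 7], -1, probs=[0.5, 0.0, 2.0]))   # [2, 7]
```

When fanouts has one entry per edge type, the same rule would be applied to each edge type's neighbors separately.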
temporal_sample_neighbors(nodes: Tensor | Dict[str, Tensor], input_nodes_timestamp: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None, node_timestamp_attr_name: str | None = None, edge_timestamp_attr_name: str | None = None) ScriptObject[source]

Temporally sample neighboring edges of the given nodes and return the induced subgraph.

If node_timestamp_attr_name or edge_timestamp_attr_name is given, the sampled neighbor or edge of an input node must have a timestamp that is smaller than that of the input node.

Parameters:
  • nodes (torch.Tensor) – IDs of the given seed nodes.

  • input_nodes_timestamp (torch.Tensor) – Timestamps of the given seed nodes.

  • fanouts (torch.Tensor) –

    The number of edges to be sampled for each node with or without considering edge types.

    • When the length is 1, it indicates that the fanout applies to all neighbors of the node as a collective, regardless of the edge type.

    • Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.

    The value of each fanout should be >= 0 or = -1.
    • When the value is -1, all neighbors (with non-zero probability, if weighted) will be sampled once regardless of replacement. It is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to false).

    • When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.

  • replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.

  • probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equalling the total number of edges.

  • node_timestamp_attr_name (str, optional) – An optional string specifying the name of a node attribute.

  • edge_timestamp_attr_name (str, optional) – An optional string specifying the name of an edge attribute.

Returns:

The sampled subgraph.

Return type:

SampledSubgraphImpl
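The timestamp constraint can be pictured as a filter applied to a node's candidate neighbors before sampling. The plain-Python sketch below uses a hypothetical helper, temporal_candidates, with lists in place of tensors; it illustrates the rule only and is not the library's implementation.

```python
def temporal_candidates(neighbors, neighbor_timestamps, seed_timestamp):
    """Keep neighbors whose timestamp is strictly smaller than the seed's."""
    return [
        n
        for n, ts in zip(neighbors, neighbor_timestamps)
        if ts < seed_timestamp
    ]

# The neighbor with timestamp 25 is later than the seed (20) and is dropped.
print(temporal_candidates([3, 5, 8], [10, 25, 19], seed_timestamp=20))  # [3, 8]
```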

to(device: device) None[source]

Copy FusedCSCSamplingGraph to the specified device.

property csc_indptr: tensor

Returns the indices pointer in the CSC graph.

Returns:

The indices pointer in the CSC graph. An integer tensor with shape (total_num_nodes+1,).

Return type:

torch.tensor

property edge_attributes: Dict[str, Tensor] | None

Returns the edge attributes dictionary.

Returns:

If present, returns a dictionary of edge attributes. Each key is the attribute's name, and the corresponding value holds the attribute's value. The length of each value should match the total number of edges.

Return type:

Dict[str, torch.Tensor] or None

property edge_type_to_id: Dict[str, int] | None

Returns the edge type to id dictionary if present.

Returns:

If present, returns a dictionary mapping edge type to edge type id.

Return type:

Dict[str, int] or None

property indices: tensor

Returns the indices in the CSC graph.

Returns:

The indices in the CSC graph. An integer tensor with shape (total_num_edges,).

Return type:

torch.tensor

Notes

It is assumed that edges of each node are already sorted by edge type ids.
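As a plain-Python illustration of how csc_indptr and indices fit together, the in-neighbors of node v occupy the slice indices[indptr[v]:indptr[v + 1]]. Lists stand in for tensors here, and the helper in_neighbors is hypothetical rather than part of GraphBolt.

```python
# Values taken from the examples on this page.
indptr = [0, 3, 5, 7, 9, 12]
indices = [0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4]

def in_neighbors(indptr, indices, v):
    """Return the source nodes of the edges pointing into node v."""
    return indices[indptr[v]:indptr[v + 1]]

total_num_nodes = len(indptr) - 1   # indptr has shape (total_num_nodes + 1,)
total_num_edges = len(indices)      # indices has shape (total_num_edges,)

print(in_neighbors(indptr, indices, 1))   # [2, 3]
print(total_num_nodes, total_num_edges)   # 5 12
```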

property node_attributes: Dict[str, Tensor] | None

Returns the node attributes dictionary.

Returns:

If present, returns a dictionary of node attributes. Each key is the attribute's name, and the corresponding value holds the attribute's value. The length of each value should match the total number of nodes.

Return type:

Dict[str, torch.Tensor] or None

property node_type_offset: Tensor | None

Returns the node type offset tensor if present. Do not modify the returned tensor in place.

Returns:

If present, returns a 1D integer tensor of shape (num_node_types + 1,). The tensor is in ascending order because nodes of the same type have contiguous IDs, and larger node IDs are paired with larger node type IDs. The first value is 0 and the last value is the number of nodes. Nodes with IDs in the range [node_type_offset[i], node_type_offset[i+1]) are of type ID i.

Return type:

torch.Tensor or None
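A short plain-Python sketch of how the offsets can be used to translate a homogeneous node ID into a node type and a type-local (heterogeneous) ID. The helper to_heterogeneous is hypothetical, and lists stand in for tensors.

```python
from bisect import bisect_right

# Values taken from the examples on this page.
node_type_offset = [0, 2, 5]
node_type_to_id = {"N0": 0, "N1": 1}
id_to_node_type = {i: t for t, i in node_type_to_id.items()}

def to_heterogeneous(node_id):
    """Map a homogeneous node ID to (node type, type-local ID)."""
    # Find i such that node_type_offset[i] <= node_id < node_type_offset[i + 1].
    type_id = bisect_right(node_type_offset, node_id) - 1
    return id_to_node_type[type_id], node_id - node_type_offset[type_id]

print(to_heterogeneous(1))  # ('N0', 1)
print(to_heterogeneous(3))  # ('N1', 1)
```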

property node_type_to_id: Dict[str, int] | None

Returns the node type to id dictionary if present.

Returns:

If present, returns a dictionary mapping node type to node type id.

Return type:

Dict[str, int] or None

property num_edges: int | Dict[str, int]

The number of edges in the graph.
  • If the graph is homogeneous, returns an integer.

  • If the graph is heterogeneous, returns a dictionary.

Returns:

The number of edges. An integer is the total number of edges of a homogeneous graph; a dictionary gives the number of edges per edge type of a heterogeneous graph.

Return type:

Union[int, Dict[str, int]]

Examples

>>> import dgl.graphbolt as gb, torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1,
...     "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> print(graph.num_edges)
{'N0:R0:N0': 2, 'N0:R1:N1': 4, 'N1:R2:N0': 3, 'N1:R3:N1': 3}
property num_nodes: int | Dict[str, int]

The number of nodes in the graph.
  • If the graph is homogeneous, returns an integer.

  • If the graph is heterogeneous, returns a dictionary.

Returns:

The number of nodes. An integer is the total number of nodes of a homogeneous graph; a dictionary gives the number of nodes per node type of a heterogeneous graph.

Return type:

Union[int, Dict[str, int]]

Examples

>>> import dgl.graphbolt as gb, torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1,
...     "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> print(graph.num_nodes)
{'N0': 2, 'N1': 3}
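The per-type counts above follow from the differences between consecutive entries of node_type_offset. A minimal plain-Python sketch, with lists standing in for tensors and no claim about how the property is implemented internally:

```python
# Values taken from the example above.
node_type_offset = [0, 2, 5]
node_type_to_id = {"N0": 0, "N1": 1}

# Nodes of type i occupy IDs node_type_offset[i] .. node_type_offset[i + 1] - 1.
num_nodes = {
    ntype: node_type_offset[i + 1] - node_type_offset[i]
    for ntype, i in node_type_to_id.items()
}
print(num_nodes)  # {'N0': 2, 'N1': 3}
```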
property total_num_edges: int

Returns the number of edges in the graph.

Returns:

The number of edges in the graph.

Return type:

int

property total_num_nodes: int

Returns the number of nodes in the graph.

Returns:

The number of rows in the dense format.

Return type:

int

property type_per_edge: Tensor | None

Returns the edge type tensor if present.

Returns:

If present, returns a 1D integer tensor of shape (total_num_edges,) containing the type of each edge in the graph.

Return type:

torch.Tensor or None