dgl.sort_csc_by_tag

dgl.sort_csc_by_tag(g, tag, tag_offset_name='_TAG_OFFSET', tag_type='node')[source]

Return a new graph whose CSC matrix is sorted by the given tag.

Sort the internal CSC matrix of the graph so that the adjacency list of each node , which contains the in-edges, is sorted by the tag of the in-neighbors. After sorting, edges sharing the same tag will be arranged in a consecutive range in a node’s adjacency list. Following is an example:

Consider a graph as follows:

0 <- 0, 1, 2, 3, 4
1 <- 0, 1, 2

Given node tags [1, 1, 0, 2, 0], each node’s adjacency list will be sorted as follows:

0 <- 2, 4, 0, 1, 3
1 <- 2, 0, 1

Given edge tags [1, 1, 0, 2, 0, 1, 1, 0] has the same effect as above node tags.

The function will also return the starting offsets of the tag segments in a tensor of shape \((N, max\_tag+2)\). For a node i, its in-edges connecting to node tag j is stored between tag_offsets[i][j] ~ tag_offsets[i][j+1]. Since the offsets can be viewed node data, we store it in the ndata of the returned graph. Users can specify the ndata name by the tag_pos_name argument.

Note that the function will not change the edge ID neither how the edge features are stored. The input graph must allow CSC format. The graph must be on CPU.

If the input graph is heterogenous, it must have only one edge type and two node types (i.e., source and destination node types). In this case, the provided node tags are for the source nodes, and the tag offsets are stored in the destination node data.

The sorted graph and the calculated tag offsets are needed by certain operators that consider node tags. See sample_neighbors_biased() for an example.

Parameters:
  • g (DGLGraph) – The input graph.

  • tag (Tensor) – Integer tensor of shape \((N,)\), \(N\) being the number of (source) nodes or edges.

  • tag_offset_name (str) – The name of the node feature to store tag offsets.

  • tag_type (str) – Tag type which could be node or edge.

Returns:

g_sorted – A new graph whose CSC matrix is sorted. The node/edge features of the input graph is shallow-copied over.

  • g_sorted.ndata[tag_offset_name] : Tensor of shape \((N, max\_tag + 2)\).

  • If g is heterogeneous, get from g_sorted.dstdata.

Return type:

DGLGraph

Examples

tag_type is node.

>>> import dgl
>>> import torch
>>> g = dgl.graph(([0,1,2,3,4,0,1,2],[0,0,0,0,0,1,1,1]))
>>> g.adj_external(scipy_fmt='csr', transpose=True).nonzero()
(array([0, 0, 0, 0, 0, 1, 1, 1], dtype=int32),
 array([0, 1, 2, 3, 4, 0, 1, 2], dtype=int32)))
>>> tag = torch.IntTensor([1,1,0,2,0])
>>> g_sorted = dgl.sort_csc_by_tag(g, tag)
>>> g_sorted.adj_external(scipy_fmt='csr', transpose=True).nonzero()
(array([0, 0, 0, 0, 0, 1, 1, 1], dtype=int32),
 array([2, 4, 0, 1, 3, 2, 0, 1], dtype=int32))
>>> g_sorted.ndata['_TAG_OFFSET']
tensor([[0, 2, 4, 5],
        [0, 1, 3, 3],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]])

tag_type is edge.

>>> g = dgl.graph(([0,1,2,3,4,0,1,2],[0,0,0,0,0,1,1,1]))
>>> tag = torch.tensor([1, 1, 0, 2, 0, 1, 1, 0])
>>> g_sorted = dgl.sort_csc_by_tag(g, tag, tag_type='edge')
>>> g_sorted.adj_external(scipy_fmt='csr', transpose=True).nonzero()
(array([0, 0, 0, 0, 0, 1, 1, 1], dtype=int32), array([2, 4, 0, 1, 3, 2, 0, 1], dtype=int32))
>>> g_sorted.dstdata['_TAG_OFFSET']
tensor([[0, 2, 4, 5],
        [0, 1, 3, 3],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]])