dgl.compact_graphs

dgl.compact_graphs(graphs, always_preserve=None, copy_ndata=True, copy_edata=True)[source]

Given a list of graphs with the same set of nodes, find and eliminate the common isolated nodes across all graphs.

This function requires the graphs to have the same set of nodes (i.e. the node types must be the same, and the number of nodes of each node type must be the same). The metagraph does not have to be the same.

It finds all the nodes that have zero in-degree and zero out-degree in all the given graphs, and eliminates them from all the graphs.

Useful for graph sampling where you have a giant graph but you only wish to perform message passing on a smaller graph with a (tiny) subset of nodes.

Parameters
  • graphs (DGLGraph or list[DGLGraph]) –

    The graph, or list of graphs.

    All graphs must be on CPU.

    All graphs must have the same set of nodes.

  • always_preserve (Tensor or dict[str, Tensor], optional) –

    If a dict of node types and node ID tensors is given, the nodes of given node types would not be removed, regardless of whether they are isolated.

    If a Tensor is given, DGL assumes that all the graphs have one (same) node type.

  • copy_ndata (bool, optional) –

    If True, the node features of the returned graphs are copied from the original graphs.

    If False, the returned graphs will not have any node features.

    (Default: True)

  • copy_edata (bool, optional) –

    If True, the edge features of the reversed graph are copied from the original graph.

    If False, the reversed graph will not have any edge features.

    (Default: True)

Returns

The compacted graph or list of compacted graphs.

Each returned graph would have a feature dgl.NID containing the mapping of node IDs for each type from the compacted graph(s) to the original graph(s). Note that the mapping is the same for all the compacted graphs.

All the returned graphs are on CPU.

Return type

DGLGraph or list[DGLGraph]

Notes

This function currently requires that the same node type of all graphs should have the same node type ID, i.e. the node types are ordered the same.

If copy_edata is True, the resulting graph will share the edge feature tensors with the input graph. Hence, users should try to avoid in-place operations which will be visible to both graphs.

This function discards the batch information. Please use dgl.DGLGraph.set_batch_num_nodes() and dgl.DGLGraph.set_batch_num_edges() on the transformed graph to maintain the information.

Examples

The following code constructs a bipartite graph with 20 users and 10 games, but only user #1 and #3, as well as game #3 and #5, have connections:

>>> g = dgl.heterograph({('user', 'plays', 'game'): ([1, 3], [3, 5])},
>>>                      {'user': 20, 'game': 10})

The following would compact the graph above to another bipartite graph with only two users and two games.

>>> new_g, induced_nodes = dgl.compact_graphs(g)
>>> induced_nodes
{'user': tensor([1, 3]), 'game': tensor([3, 5])}

The mapping tells us that only user #1 and #3 as well as game #3 and #5 are kept. Furthermore, the first user and second user in the compacted graph maps to user #1 and #3 in the original graph. Games are similar.

One can verify that the edge connections are kept the same in the compacted graph.

>>> new_g.edges(form='all', order='eid', etype='plays')
(tensor([0, 1]), tensor([0, 1]), tensor([0, 1]))

When compacting multiple graphs, nodes that do not have any connections in any of the given graphs are removed. So if you compact g and the following g2 graphs together:

>>> g2 = dgl.heterograph({('user', 'plays', 'game'): ([1, 6], [6, 8])},
>>>                      {'user': 20, 'game': 10})
>>> (new_g, new_g2), induced_nodes = dgl.compact_graphs([g, g2])
>>> induced_nodes
{'user': tensor([1, 3, 6]), 'game': tensor([3, 5, 6, 8])}

Then one can see that user #1 from both graphs, users #3 from the first graph, as well as user #6 from the second graph, are kept. Games are similar.

Similarly, one can also verify the connections:

>>> new_g.edges(form='all', order='eid', etype='plays')
(tensor([0, 1]), tensor([0, 1]), tensor([0, 1]))
>>> new_g2.edges(form='all', order='eid', etype='plays')
(tensor([0, 2]), tensor([2, 3]), tensor([0, 1]))