dgl.create_block

dgl.create_block(data_dict, num_src_nodes=None, num_dst_nodes=None, idtype=None, device=None)

Create a message flow graph (MFG) as a DGLBlock object.

Parameters
  • data_dict (graph data) –

    The dictionary data for constructing an MFG. The keys are string triplets of the form (src_type, edge_type, dst_type), specifying the source node type, edge type, and destination node type. The values are graph data in the form of \((U, V)\), where \((U[i], V[i])\) forms the edge with ID \(i\). The allowed graph data formats are:

    • (Tensor, Tensor): Each tensor must be a 1D tensor containing node IDs. DGL calls this format “tuple of node-tensors”. The tensors should have the same data type, which must be either int32 or int64. They should also be on the same device (see the descriptions of idtype and device below).

    • (iterable[int], iterable[int]): Similar to the tuple of node-tensors format, but stores node IDs in two sequences (e.g. list, tuple, numpy.ndarray).

    If you would like to create an MFG with a single source node type, a single destination node type, and a single edge type, then you can pass in the graph data directly without wrapping it in a dictionary.

  • num_src_nodes (dict[str, int] or int, optional) –

    The number of nodes for each source node type, which is a dictionary mapping a node type \(T\) to the number of \(T\)-typed source nodes.

    If not given for a node type \(T\), DGL finds the largest ID appearing in any graph data whose source node type is \(T\), and sets the number of nodes to be that ID plus one. If given and the value is no greater than the largest ID for some source node type, DGL will raise an error. By default, DGL infers the number of nodes for all source node types.

    If you would like to create an MFG with a single source node type, a single destination node type, and a single edge type, then you can pass in an integer to directly represent the number of source nodes.

  • num_dst_nodes (dict[str, int] or int, optional) –

    The number of nodes for each destination node type, which is a dictionary mapping a node type \(T\) to the number of \(T\)-typed destination nodes.

    If not given for a node type \(T\), DGL finds the largest ID appearing in any graph data whose destination node type is \(T\), and sets the number of nodes to be that ID plus one. If given and the value is no greater than the largest ID for some destination node type, DGL will raise an error. By default, DGL infers the number of nodes for all destination node types.

    If you would like to create an MFG with a single source node type, a single destination node type, and a single edge type, then you can pass in an integer to directly represent the number of destination nodes.

  • idtype (int32 or int64, optional) – The data type for storing the structure-related graph information such as node and edge IDs. It should be a framework-specific data type object (e.g., torch.int32). If None (default), DGL infers the ID type from the data_dict argument.

  • device (device context, optional) – The device of the returned graph, which should be a framework-specific device object (e.g., torch.device). If None (default), DGL uses the device of the tensors in the data_dict argument. If the graph data in data_dict is not a tuple of node-tensors, the returned graph is on CPU. If the specified device differs from that of the provided tensors, DGL copies the given tensors to the specified device first.
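The node-count rule described for num_src_nodes and num_dst_nodes can be sketched in plain Python. This is only an illustration of the documented behavior, not DGL's actual implementation; the helper name infer_num_nodes and the ValueError it raises are assumptions made for this sketch.

```python
def infer_num_nodes(id_lists, given=None):
    """Hypothetical helper (not part of DGL) mirroring the documented rule.

    id_lists: ID sequences that all refer to the same node type on one side
              (source or destination) of the MFG.
    given:    user-supplied node count, or None to infer it.
    """
    # Largest ID seen across all graph data for this node type (-1 if none).
    largest = max((max(ids) for ids in id_lists if len(ids) > 0), default=-1)
    if given is None:
        # Not given: the count is the largest ID plus one.
        return largest + 1
    if given <= largest:
        # Given but no greater than the largest ID: reject.
        raise ValueError(
            f"node count {given} is too small for largest ID {largest}")
    return given


# Source IDs of one node type appearing in two relations.
src_ids = [[1, 2, 3], [0, 5]]
print(infer_num_nodes(src_ids))            # inferred from the largest ID, 5
print(infer_num_nodes(src_ids, given=10))  # an explicit, larger count is kept
```

An explicit count is useful when some nodes have no incident edges, since inference can only see IDs that actually occur in the edge lists.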

Returns

The created MFG.

Return type

DGLBlock

Notes

  1. If the idtype argument is not given then:

    • in the case of the tuple of node-tensors format, DGL uses the data type of the given ID tensors.

    • in the case of the tuple of sequences format, DGL uses int64.

    Once the graph has been created, you can change the data type by using dgl.DGLGraph.long() or dgl.DGLGraph.int().

    If the specified idtype argument differs from the data type of the provided tensors, DGL casts the given tensors to the specified data type first.

  2. The most efficient construction approach is to provide a tuple of node tensors without specifying idtype and device. This is because the returned graph shares the storage with the input node-tensors in this case.

  3. DGL internally maintains multiple copies of the graph structure in different sparse formats and chooses the most efficient one depending on the computation invoked. If memory usage becomes an issue in the case of large graphs, use dgl.DGLGraph.formats() to restrict the allowed formats.

  4. DGL internally decides a deterministic order for the same set of node types and canonical edge types, which does not necessarily follow the order in data_dict.

Examples

The following example uses the PyTorch backend.

>>> import dgl
>>> block = dgl.create_block(([0, 1, 2], [1, 2, 3]), num_src_nodes=3, num_dst_nodes=4)
>>> block
Block(num_src_nodes=3, num_dst_nodes=4, num_edges=3)
>>> block = dgl.create_block({
...     ('A', 'AB', 'B'): ([1, 2, 3], [2, 1, 0]),
...     ('B', 'BA', 'A'): ([2, 1], [2, 3])},
...     num_src_nodes={'A': 6, 'B': 5},
...     num_dst_nodes={'A': 4, 'B': 3})
>>> block
Block(num_src_nodes={'A': 6, 'B': 5},
      num_dst_nodes={'A': 4, 'B': 3},
      num_edges={('A', 'AB', 'B'): 3, ('B', 'BA', 'A'): 2},
      metagraph=[('A', 'B', 'AB'), ('B', 'A', 'BA')])

See also

to_block()