LegacyTUDataset

class dgl.data.LegacyTUDataset(name, use_pandas=False, hidden_size=10, max_allow_node=None, raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: dgl.data.dgl_dataset.DGLBuiltinDataset

LegacyTUDataset provides a collection of graph kernel datasets for graph classification.

Parameters
  • name (str) – Dataset name, such as ENZYMES, DD, COLLAB, or MUTAG. Any dataset name listed at https://chrsmrrs.github.io/datasets/docs/datasets/ can be used.

  • use_pandas (bool) – NumPy's file-reading function is slow on large files; reading with pandas can be faster. Default: False

  • hidden_size (int) – Some datasets contain no node features. In that case, constant node features of size hidden_size are used instead. Default: 10

  • max_allow_node (int) – Remove graphs that contain more than max_allow_node nodes. Default: None

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.

max_num_node

Maximum number of nodes

Type

int

num_classes

Number of classes

Type

int

num_labels

(DEPRECATED, use num_classes instead) Number of classes

Type

numpy.int64

Notes

LegacyTUDataset uses the provided node features by default. If no features are provided, it uses one-hot encoded node labels instead. If neither features nor node labels are provided, it uses a constant vector as the node feature.
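The fallback order described in this note can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name choose_node_features, its arguments, and the constant value 1.0 are assumptions, and the one-hot width is computed here from a single label list, whereas a real dataset would presumably size it across all graphs.

```python
def choose_node_features(num_nodes, attributes=None, node_labels=None, hidden_size=10):
    """Hypothetical helper: pick node features using the fallback order above.

    Returns (features, mode), where features is a list of per-node rows.
    """
    if attributes is not None:
        # 1. Provided node features take priority.
        return [[float(x) for x in row] for row in attributes], "attributes"
    if node_labels is not None:
        # 2. Otherwise, one-hot encode the integer node labels.
        width = max(node_labels) + 1
        rows = [[1.0 if j == lab else 0.0 for j in range(width)] for lab in node_labels]
        return rows, "one-hot label"
    # 3. Otherwise, use a constant vector of length hidden_size
    #    (the value 1.0 is an assumption made for this sketch).
    return [[1.0] * hidden_size for _ in range(num_nodes)], "constant"


onehot_feats, onehot_mode = choose_node_features(3, node_labels=[0, 2, 1])
const_feats, const_mode = choose_node_features(2, hidden_size=4)
```

In the sketch, a graph with node labels but no attributes gets one row per node with a single 1.0 entry, and a graph with neither gets identical rows of length hidden_size.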

The dataset sorts graphs by their labels, so shuffle it before any manual train/val split.
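Because graphs are ordered by label, slicing the dataset directly would put whole classes into one side of the split. One possible way to avoid this is to shuffle the indices first. The sketch below uses only the standard library; the helper name shuffled_split, the 20% validation fraction, and the seed are assumptions chosen for illustration.

```python
import random


def shuffled_split(num_graphs, val_fraction=0.2, seed=0):
    """Hypothetical helper: return disjoint, shuffled (train_idx, val_idx) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(num_graphs))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    num_val = int(num_graphs * val_fraction)
    return indices[num_val:], indices[:num_val]


# 1178 is the length of the DD dataset reported in the examples.
train_idx, val_idx = shuffled_split(1178)
```

The resulting index lists can then be used to pick samples, for example `[data[i] for i in train_idx]` with a dataset instance named data.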

Examples

>>> import dgl
>>> import torch
>>> from dgl.data import LegacyTUDataset
>>> data = LegacyTUDataset('DD')

The dataset instance is iterable and supports len() and indexing

>>> len(data)
1178
>>> g, label = data[1024]
>>> g
Graph(num_nodes=88, num_edges=410,
      ndata_schemes={'feat': Scheme(shape=(89,), dtype=torch.float32), '_ID': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)})
>>> label
tensor(1)

Batch the graphs and labels for mini-batch training

>>> graphs, labels = zip(*[data[i] for i in range(16)])
>>> batched_graphs = dgl.batch(graphs)
>>> batched_labels = torch.tensor(labels)
>>> batched_graphs
Graph(num_nodes=9539, num_edges=47382,
      ndata_schemes={'feat': Scheme(shape=(89,), dtype=torch.float32), '_ID': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)})

__getitem__(idx)

Get the idx-th sample.

Parameters

idx (int) – The sample index.

Returns

The graph, with node features stored in the feat field and node labels in the node_label field if available, together with the graph's label.

Return type

(dgl.DGLGraph, Tensor)

__len__()

Return the number of graphs in the dataset.