TreeGridDataset

class dgl.data.TreeGridDataset(tree_height=8, num_motifs=80, grid_size=3, perturb_ratio=0.1, seed=None, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: DGLBuiltinDataset

TREE-GRIDS dataset from "GNNExplainer: Generating Explanations for Graph Neural Networks".

This is a synthetic dataset for node classification. It is generated by performing the following steps in order.

Construct a balanced binary tree as the base graph.
Construct a set of n-by-n grid motifs.
Attach the motifs to randomly selected nodes of the base graph.
Perturb the graph by adding random edges.
Assign a constant feature of 1 to every node.
Assign class 0 to nodes in the tree and class 1 to nodes in the grids.
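The steps above can be sketched in plain Python. This is an illustrative re-implementation, not DGL's actual code: the function name build_tree_grid, the edge-set representation, attaching each motif by a single edge from its first node, and sampling distinct attachment nodes are all assumptions made for the example.

```python
import random


def build_tree_grid(tree_height=2, num_motifs=3, grid_size=3,
                    perturb_ratio=0.1, seed=0):
    """Hypothetical sketch of the TREE-GRIDS construction (not DGL's code)."""
    rng = random.Random(seed)
    edges = set()  # undirected edges stored as (smaller_id, larger_id)

    def add_edge(u, v):
        edges.add((min(u, v), max(u, v)))

    # Step 1: balanced binary tree; node i has parent (i - 1) // 2.
    num_tree_nodes = 2 ** (tree_height + 1) - 1
    for child in range(1, num_tree_nodes):
        add_edge((child - 1) // 2, child)
    labels = [0] * num_tree_nodes  # tree nodes are class 0

    # Steps 2-3: build each n-by-n grid and attach it to a random tree node.
    # Assumption: attachment nodes are distinct (sampled without replacement).
    anchors = rng.sample(range(num_tree_nodes), num_motifs)
    next_id = num_tree_nodes
    for anchor in anchors:
        for r in range(grid_size):
            for c in range(grid_size):
                node = next_id + r * grid_size + c
                if c + 1 < grid_size:
                    add_edge(node, node + 1)          # right neighbour
                if r + 1 < grid_size:
                    add_edge(node, node + grid_size)  # neighbour below
        add_edge(anchor, next_id)  # assumption: one connecting edge per motif
        labels.extend([1] * grid_size ** 2)  # grid nodes are class 1
        next_id += grid_size ** 2

    # Step 4: add random edges, perturb_ratio times the current edge count.
    num_nodes = next_id
    num_random = int(perturb_ratio * len(edges))
    added = 0
    while added < num_random:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        key = (min(u, v), max(u, v))
        if u != v and key not in edges:
            edges.add(key)
            added += 1

    # Step 5: constant feature of 1 for every node.
    feats = [1.0] * num_nodes
    return edges, feats, labels
```

Storing each undirected edge once keeps the perturbation count easy to reason about; a real graph library may store both directions instead.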

Parameters:

tree_height (int, optional) – Height of the balanced binary tree. Default: 8
num_motifs (int, optional) – Number of grid motifs to use. Default: 80
grid_size (int, optional) – Side length of each grid motif; a motif has grid_size ^ 2 nodes. Default: 3
perturb_ratio (float, optional) – Ratio of the number of random edges added during perturbation to the number of edges in the original graph. Default: 0.1
seed (integer, random_state, or None, optional) – Seed or random state used for random number generation. Default: None
raw_dir (str, optional) – Raw file directory to store the processed data. Default: ~/.dgl/
force_reload (bool, optional) – Whether to always generate the data from scratch rather than load a cached version. Default: False
verbose (bool, optional) – Whether to print progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access. Default: None
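As a rough guide to how the size parameters interact, the following sketch estimates node and edge counts from the arguments. It is back-of-the-envelope arithmetic under stated assumptions, not a DGL function: tree_grid_sizes is a hypothetical helper, tree_height is taken to count edges from the root to a leaf, each motif is assumed to be attached by one edge, undirected edges are counted once, and the number of random edges is assumed to be rounded down.

```python
def tree_grid_sizes(tree_height=8, num_motifs=80, grid_size=3,
                    perturb_ratio=0.1):
    """Hypothetical helper: estimate graph sizes from the parameters."""
    tree_nodes = 2 ** (tree_height + 1) - 1      # balanced binary tree
    tree_edges = tree_nodes - 1                  # a tree has n - 1 edges
    grid_nodes = grid_size ** 2
    grid_edges = 2 * grid_size * (grid_size - 1)  # horizontal + vertical
    # Assumption: one extra edge connects each motif to the tree.
    base_edges = tree_edges + num_motifs * (grid_edges + 1)
    return {
        "num_nodes": tree_nodes + num_motifs * grid_nodes,
        "num_edges_before_perturbation": base_edges,
        "num_random_edges": int(perturb_ratio * base_edges),
    }


sizes = tree_grid_sizes()
# With the defaults: 511 tree nodes + 80 * 9 grid nodes = 1231 nodes.
print(sizes["num_nodes"])  # 1231
```

If the library stores each undirected edge as two directed edges, the reported edge count would be twice the figures computed here.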

num_classes

Number of node classes

Type: int

Examples

>>> from dgl.data import TreeGridDataset
>>> dataset = TreeGridDataset()
>>> dataset.num_classes
2
>>> g = dataset[0]
>>> label = g.ndata['label']
>>> feat = g.ndata['feat']

__getitem__(idx): Gets the data object at index.

__len__(): The number of examples in the dataset.
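The access pattern, including the transform being applied on every access, can be illustrated with a small stand-in class. This is a sketch only: SingleGraphDataset is a hypothetical name, and it assumes the dataset holds exactly one graph (consistent with dataset[0] in the example above); DGL's own implementation may differ in details such as error handling.

```python
class SingleGraphDataset:
    """Hypothetical stand-in showing the __getitem__/__len__ contract."""

    def __init__(self, graph, transform=None):
        self._graph = graph
        self._transform = transform

    def __getitem__(self, idx):
        # Assumption: only index 0 is valid because there is one graph.
        if idx != 0:
            raise IndexError("this dataset contains a single graph")
        if self._transform is None:
            return self._graph
        # The transform runs on every access, not once at load time.
        return self._transform(self._graph)

    def __len__(self):
        return 1


calls = []


def record_and_tag(graph):
    """Toy transform: record the call and return a modified copy."""
    calls.append(graph)
    return {**graph, "transformed": True}


dataset = SingleGraphDataset({"num_nodes": 3}, transform=record_and_tag)
first = dataset[0]
second = dataset[0]
```

Because the transform runs on each access, an expensive transform is paid for every time dataset[0] is evaluated; caching the returned graph in a variable avoids repeating the work.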