

utils.get_download_dir() Get the absolute path to the download directory.
[, path, overwrite, …]) Download a given URL.
utils.check_sha1(filename, sha1_hash) Check whether the sha1 hash of the file content matches the expected hash.
utils.extract_archive(file, target_dir) Extract archive file.
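
A hash check of the kind check_sha1 describes can be sketched with the standard library alone. The helper name `sha1_matches` and the chunk size are assumptions made for illustration; this is not the library's own implementation.

```python
import hashlib
import os
import tempfile


def sha1_matches(filename, sha1_hash, chunk_size=1 << 20):
    """Return True if the SHA-1 digest of the file content equals sha1_hash."""
    sha1 = hashlib.sha1()
    with open(filename, "rb") as f:
        # Read in chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest() == sha1_hash.lower()


# Usage: write a small file and verify it against a digest computed in memory.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
expected = hashlib.sha1(b"hello").hexdigest()
demo_ok = sha1_matches(demo_path, expected)
demo_bad = sha1_matches(demo_path, "0" * 40)
os.remove(demo_path)
```

Comparing digests after a download is a cheap way to detect a truncated or corrupted file before it is extracted.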

Dataset Classes

Stanford sentiment treebank dataset

For more information about the dataset, see Sentiment Analysis.

class'train', vocab_file=None)

Stanford Sentiment Treebank dataset.

Each sample is the constituency tree of a sentence. The leaf nodes represent words; each word is an int value stored in the x feature field. Non-leaf nodes hold the special value PAD_WORD in the x field. Every node also has a sentiment annotation from one of 5 classes (very negative, negative, neutral, positive and very positive). The sentiment label is an int value stored in the y feature field.
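
The node layout described above can be illustrated without the library. The sketch below uses plain lists; the PAD_WORD value of -1, the toy vocabulary, the label numbering (0 for very negative up to 4 for very positive) and the labels themselves are all assumptions for illustration only.

```python
PAD_WORD = -1  # assumed placeholder value for non-leaf nodes

# Toy vocabulary mapping words to int ids (hypothetical).
vocab = {"not": 0, "bad": 1}

# Tree for "not bad": node 0 is the root, nodes 1 and 2 are its leaf children.
# parent[i] is the parent of node i; the root has no parent.
parent = [None, 0, 0]

# x field: word id at leaves, PAD_WORD at non-leaf nodes.
x = [PAD_WORD, vocab["not"], vocab["bad"]]

# y field: one sentiment class per node (assumed numbering 0..4).
y = [3, 1, 1]

# Leaves are the nodes that never appear as anyone's parent.
leaves = [i for i in range(len(parent)) if i not in parent]
leaf_words = [x[i] for i in leaves]
```

Because every node carries a label, a model can be supervised on phrases as well as on the whole sentence.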


This dataset class is compatible with PyTorch's Dataset class.


All samples are loaded into memory and preprocessed up front.

  • mode (str, optional) – One of 'train', 'val' or 'test'; specifies which data file to use.
  • vocab_file (str, optional) – Optional vocabulary file.

Get the tree with index idx.

Parameters:idx (int) – Tree index.
Return type:dgl.DGLGraph

Get the number of trees in the dataset.

Returns:Number of trees.
Return type:int

Mini graph classification dataset

class, min_num_v, max_num_v)

The dataset class.

The dataset contains 8 different types of graphs.

  • class 0 : cycle graph
  • class 1 : star graph
  • class 2 : wheel graph
  • class 3 : lollipop graph
  • class 4 : hypercube graph
  • class 5 : grid graph
  • class 6 : clique graph
  • class 7 : circular ladder graph


This dataset class is compatible with PyTorch's Dataset class.

  • num_graphs (int) – Number of graphs in this dataset.
  • min_num_v (int) – Minimum number of nodes per graph.
  • max_num_v (int) – Maximum number of nodes per graph.
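
Compatibility with PyTorch's Dataset class comes down to indexing and length methods. The following is a minimal sketch of that shape in plain Python, not the library's implementation: the class name `ToyGraphDataset`, the `seed` argument, the edge-list representation, the inclusive node bounds and the restriction to two of the eight graph types are all assumptions.

```python
import random


def cycle_edges(n):
    """Edges of a cycle graph on n nodes."""
    return [(i, (i + 1) % n) for i in range(n)]


def star_edges(n):
    """Edges of a star graph: node 0 is the hub, nodes 1..n-1 are spokes."""
    return [(0, i) for i in range(1, n)]


class ToyGraphDataset:
    """Hypothetical map-style dataset covering two of the eight classes."""

    generators = [cycle_edges, star_edges]  # label 0: cycle, label 1: star

    def __init__(self, num_graphs, min_num_v, max_num_v, seed=0):
        rng = random.Random(seed)
        self.samples = []
        for i in range(num_graphs):
            label = i % len(self.generators)  # spread graphs across classes
            n = rng.randint(min_num_v, max_num_v)  # bounds assumed inclusive
            self.samples.append((self.generators[label](n), label))

    def __getitem__(self, idx):
        # Return the (graph, label) pair, with the graph as an edge list.
        return self.samples[idx]

    def __len__(self):
        return len(self.samples)

    @property
    def num_classes(self):
        return len(self.generators)


dataset = ToyGraphDataset(num_graphs=6, min_num_v=4, max_num_v=8)
edges, label = dataset[0]
star, star_label = dataset[1]
```

An object with this interface can be indexed, measured with `len()`, and iterated over in a training loop in the same way as the dataset documented here.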

Get the i-th sample.

Parameters:idx (int) – The sample index.
Returns:The graph and its label.
Return type:(dgl.DGLGraph, int)

Return the number of graphs in the dataset.


Number of classes.

Protein-Protein Interaction dataset


A toy Protein-Protein Interaction network dataset.

Adapted from

The dataset contains 24 graphs. The average number of nodes per graph is 2372. Each node has 50 features and 121 labels.

We use 20 graphs for training, 2 for validation and 2 for testing.
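
The split sizes above can be written out as index ranges. This is only a sketch: assigning contiguous blocks of graph indices to each split is an assumption, and the dataset may partition the graphs differently.

```python
NUM_GRAPHS = 24
NUM_TRAIN, NUM_VAL, NUM_TEST = 20, 2, 2

graph_ids = list(range(NUM_GRAPHS))

# Assumption: contiguous blocks; the real dataset may assign graphs differently.
train_ids = graph_ids[:NUM_TRAIN]
val_ids = graph_ids[NUM_TRAIN:NUM_TRAIN + NUM_VAL]
test_ids = graph_ids[NUM_TRAIN + NUM_VAL:]

# Rough total node count implied by the stated average of 2372 nodes per graph.
approx_total_nodes = NUM_GRAPHS * 2372
```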


Get the i-th sample.

Parameters:idx (int) – The sample index.
Returns:The graph, features and its label.
Return type:(dgl.DGLGraph, ndarray, ndarray)

Return the number of samples in this dataset.