DGLDataset

class dgl.data.DGLDataset(name, url=None, raw_dir=None, save_dir=None, hash_key=(), force_reload=False, verbose=False, transform=None)

Bases: object

The base class for creating graph datasets in DGL. It defines a template that runs the following steps automatically:

  1. Check whether a processed copy of the dataset is already cached on disk by calling has_cache(). If it returns True, go to step 5.

  2. Call download() to download the data if url is not None.

  3. Call process() to process the data.

  4. Call save() to save the processed dataset on disk, then go to step 6.

  5. Call load() to load the processed dataset from disk.

  6. Done.

Users can override these functions with their own data processing logic.
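The order of calls in the steps above can be sketched in plain Python. This is a stand-in, not DGL's implementation: the class names DatasetTemplateSketch and SquaresDataset, the in-memory _FAKE_DISK dictionary, and the example URL are all hypothetical, and skipping the cache when force_reload is set is an assumption about how that flag interacts with step 1.

```python
_FAKE_DISK = {}  # hypothetical stand-in for the on-disk cache


class DatasetTemplateSketch:
    """Imitates the documented call order; NOT dgl.data.DGLDataset."""

    def __init__(self, name, url=None, force_reload=False):
        self.name = name
        self.url = url
        self.force_reload = force_reload
        self.calls = []  # records which hooks ran, for illustration only
        self._load()

    def _load(self):
        # Step 1: prefer the cache (ignoring it on force_reload is an assumption).
        if not self.force_reload and self.has_cache():
            self.load()  # step 5
            return
        if self.url is not None:
            self.download()  # step 2
        self.process()  # step 3
        self.save()  # step 4

    # Hooks that a subclass overrides with its own logic.
    def has_cache(self):
        return False

    def download(self):
        pass

    def process(self):
        raise NotImplementedError

    def save(self):
        pass

    def load(self):
        pass


class SquaresDataset(DatasetTemplateSketch):
    def has_cache(self):
        self.calls.append("has_cache")
        return self.name in _FAKE_DISK

    def download(self):
        self.calls.append("download")  # a real hook would fetch self.url

    def process(self):
        self.calls.append("process")
        self.data = [i * i for i in range(5)]

    def save(self):
        self.calls.append("save")
        _FAKE_DISK[self.name] = list(self.data)

    def load(self):
        self.calls.append("load")
        self.data = list(_FAKE_DISK[self.name])


first = SquaresDataset("squares", url="https://example.invalid/squares.zip")
second = SquaresDataset("squares")
print(first.calls)   # ['has_cache', 'download', 'process', 'save']
print(second.calls)  # ['has_cache', 'load']
```

The first construction finds no cache and walks steps 2 to 4; the second finds the cached copy and jumps straight to load().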

Parameters:
  • name (str) – Name of the dataset

  • url (str) – URL to download the raw dataset. Default: None

  • raw_dir (str) – Directory that will store the downloaded data, or that already contains the input data. Default: ~/.dgl/

  • save_dir (str) – Directory to save the processed dataset. Default: same as raw_dir

  • hash_key (tuple) – A tuple of values as the input for the hash function. Users can distinguish instances (and their caches on the disk) from the same dataset class by comparing the hash values. Default: (), the corresponding hash value is 'f9065fa7'.

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
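To illustrate how hash_key can tell apart instances of the same dataset class, here is a minimal sketch that derives a short hash from a tuple. The hashing scheme is an assumption: this section does not say which function DGL uses, so SHA-1 over the tuple's string form, the 8-character truncation, and the helper name short_hash are all illustrative choices.

```python
import hashlib


def short_hash(hash_key=()):
    """Return an 8-character hex digest of the tuple's string form.

    Assumed scheme for illustration; not confirmed to match DGL's.
    """
    digest = hashlib.sha1(str(hash_key).encode("utf-8")).hexdigest()
    return digest[:8]


# Two configurations of the same (hypothetical) dataset class.
hash_a = short_hash(("my_dataset", 0.5))
hash_b = short_hash(("my_dataset", 0.8))

print(hash_a != hash_b)                            # different settings, different hashes
print(hash_a == short_hash(("my_dataset", 0.5)))   # same settings, same hash
```

Because the hash depends only on the tuple's contents, two instances built with the same settings would map to the same cache, while a changed setting would map to a different one.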

url

The URL to download the dataset

Type:

str

name

The dataset name

Type:

str

raw_dir

Directory to store all the downloaded raw datasets.

Type:

str

raw_path

Path to the downloaded raw dataset folder. An alias for os.path.join(self.raw_dir, self.name).

Type:

str

save_dir

Directory to save all the processed datasets.

Type:

str

save_path

Path to the processed dataset folder. An alias for os.path.join(self.save_dir, self.name).

Type:

str
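The relationship between the directory attributes and the path attributes can be shown with a short sketch. The dataset name my_dataset is hypothetical, and the directory values simply follow the defaults listed under Parameters (raw_dir of ~/.dgl/ and save_dir equal to raw_dir).

```python
import os

name = "my_dataset"  # hypothetical dataset name

# Defaults as documented: raw_dir is ~/.dgl/ and save_dir falls back to raw_dir.
raw_dir = os.path.join(os.path.expanduser("~"), ".dgl")
save_dir = raw_dir

# raw_path and save_path are the per-dataset folders inside those directories.
raw_path = os.path.join(raw_dir, name)
save_path = os.path.join(save_dir, name)

print(raw_path)
print(save_path == raw_path)  # True when save_dir is left at its default
```

With the defaults, raw and processed data share one folder; passing a separate save_dir keeps them apart.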

verbose

Whether to print more runtime information.

Type:

bool

hash

Hash value for the dataset and the setting.

Type:

str

abstract __getitem__(idx)

Gets the data object at index idx.

abstract __len__()

The number of examples in the dataset.
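A subclass has to supply both abstract methods. The sketch below shows the shape of such a subclass in plain Python, including a transform applied on every access as described for the transform parameter. It does not import DGL: IndexableDatasetSketch, ToyGraphDataset, and add_reverse_edges are hypothetical names, each "graph" is an edge list standing in for a DGLGraph, and enforcing the abstract methods through abc.ABC is an assumption of this sketch.

```python
from abc import ABC, abstractmethod


class IndexableDatasetSketch(ABC):
    """Stand-in base class; NOT dgl.data.DGLDataset."""

    def __init__(self, transform=None):
        self._transform = transform

    @abstractmethod
    def __getitem__(self, idx):
        """Return the data object at index idx."""

    @abstractmethod
    def __len__(self):
        """Return the number of examples."""


class ToyGraphDataset(IndexableDatasetSketch):
    def __init__(self, transform=None):
        super().__init__(transform)
        # Each edge list stands in for one DGLGraph.
        self._graphs = [
            [(0, 1)],
            [(0, 1), (1, 2)],
            [(0, 1), (1, 2), (2, 0)],
        ]

    def __getitem__(self, idx):
        graph = self._graphs[idx]
        # Apply the transform at access time; the stored graph is untouched.
        if self._transform is not None:
            graph = self._transform(graph)
        return graph

    def __len__(self):
        return len(self._graphs)


def add_reverse_edges(edges):
    """Return a new edge list that also contains every reversed edge."""
    return edges + [(dst, src) for src, dst in edges]


dataset = ToyGraphDataset(transform=add_reverse_edges)
print(len(dataset))  # 3
print(dataset[0])    # [(0, 1), (1, 0)]
```

Applying the transform inside __getitem__ rather than at construction time means the stored data stays unchanged and repeated accesses return the same transformed result.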