DGLDataset

class dgl.data.DGLDataset(name, url=None, raw_dir=None, save_dir=None, hash_key=(), force_reload=False, verbose=False, transform=None)

Bases: object

The base class for creating graph datasets in DGL. It defines a template in which the following steps are executed automatically:

  1. Check whether a processed dataset is already cached on disk by calling has_cache(). If so, go to step 5.

  2. Call download() to download the data if url is not None.

  3. Call process() to process the data.

  4. Call save() to save the processed dataset to disk, then go to step 6.

  5. Call load() to load the processed dataset from disk.

  6. Done.

Users can override these methods with their own data processing logic.

Parameters
  • name (str) – Name of the dataset

  • url (str) – URL from which to download the raw dataset. Default: None

  • raw_dir (str) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.dgl/

  • save_dir (str) – Directory to save the processed dataset. Default: same as raw_dir

  • hash_key (tuple) – A tuple of values used as the input to the hash function. Instances of the same dataset class (and their caches on disk) can be told apart by comparing their hash values. Default: (), whose hash value is 'f9065fa7'.

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
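One way to picture hash_key: the tuple is turned into a short, stable string, and that string keeps the caches of differently configured instances apart. The sketch below assumes a SHA-1 digest of str(hash_key) truncated to 8 hex characters; hash_of and the cache file name are hypothetical, and DGL's actual hash function may differ.

```python
import hashlib


def hash_of(hash_key=()):
    """Hypothetical helper: map a hash_key tuple to an 8-character hex string.

    Assumes SHA-1 over str(hash_key); DGL's real hash function may differ.
    """
    return hashlib.sha1(str(hash_key).encode("utf-8")).hexdigest()[:8]


# Two configurations of the same (hypothetical) dataset class.
h_plain = hash_of()
h_self_loop = hash_of((True,))

# Embedding the hash in a cache file name keeps the two caches separate.
print(f"toy_{h_plain}.bin")
print(f"toy_{h_self_loop}.bin")
```

Equal tuples always produce equal hashes, so an instance created with the same hash_key finds the cache written by an earlier one.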

url

The URL from which the dataset is downloaded.

Type

str

name

The dataset name

Type

str

raw_dir

Directory to store all the downloaded raw datasets.

Type

str

raw_path

Path to the downloaded raw dataset folder. An alias for os.path.join(self.raw_dir, self.name).

Type

str

save_dir

Directory to save all the processed datasets.

Type

str

save_path

Path to the processed dataset folder. An alias for os.path.join(self.save_dir, self.name).

Type

str
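The relationship between the directory arguments and the two path attributes can be sketched as follows. PathsSketch is a hypothetical class that only mirrors the documented defaults (raw_dir defaults to ~/.dgl/, save_dir defaults to raw_dir) and the documented os.path.join aliases; it is not DGL's implementation.

```python
import os


class PathsSketch:
    """Hypothetical sketch of how raw_path and save_path are derived."""

    def __init__(self, name, raw_dir=None, save_dir=None):
        self.name = name
        # Documented default for raw_dir: ~/.dgl/ (expanded here).
        self.raw_dir = raw_dir if raw_dir is not None else os.path.join(
            os.path.expanduser("~"), ".dgl")
        # Documented default for save_dir: same as raw_dir.
        self.save_dir = save_dir if save_dir is not None else self.raw_dir

    @property
    def raw_path(self):
        # Alias for os.path.join(self.raw_dir, self.name).
        return os.path.join(self.raw_dir, self.name)

    @property
    def save_path(self):
        # Alias for os.path.join(self.save_dir, self.name).
        return os.path.join(self.save_dir, self.name)


p = PathsSketch("toy", raw_dir="/data/raw")
print(p.raw_path)   # /data/raw/toy on POSIX systems
print(p.save_path)  # /data/raw/toy on POSIX systems, since save_dir defaults to raw_dir
```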

verbose

Whether to print more runtime information.

Type

bool

hash

Hash value of the dataset and its settings, computed from hash_key.

Type

str

abstract __getitem__(idx)

Gets the data object at index idx.

abstract __len__()

The number of examples in the dataset.
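A subclass must implement these two methods. The sketch below is hypothetical: plain dicts stand in for DGLGraph objects, items are returned as (graph, label) pairs as a graph-classification dataset might do, and add_self_loop_flag is an illustrative transform. It shows the transform being applied on every access, as the transform parameter describes, rather than once at construction.

```python
class ToyGraphDataset:
    """Hypothetical sketch of __getitem__ and __len__ in a dataset subclass.

    Plain dicts stand in for DGLGraph objects.
    """

    def __init__(self, graphs, labels, transform=None):
        self._graphs = graphs
        self._labels = labels
        self._transform = transform

    def __getitem__(self, idx):
        g = self._graphs[idx]
        # Apply the transform on every access; the stored graph is untouched.
        if self._transform is not None:
            g = self._transform(g)
        return g, self._labels[idx]

    def __len__(self):
        return len(self._graphs)


def add_self_loop_flag(g):
    # Hypothetical transform: returns a modified copy of the graph stand-in.
    return {**g, "self_loop": True}


ds = ToyGraphDataset([{"num_nodes": 3}, {"num_nodes": 5}], [0, 1],
                     transform=add_self_loop_flag)
print(len(ds))  # 2
print(ds[1])    # ({'num_nodes': 5, 'self_loop': True}, 1)
```

Because the transform returns a copy, the cached data stays in its processed form and each access sees a freshly transformed object.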