class dgl.data.utils.split_dataset(dataset, frac_list=None, shuffle=False, random_state=None)


Split a dataset into training, validation and test sets.

Parameters:

  • dataset – Any object for which len(dataset) gives the number of datapoints and dataset[i] gives the i-th datapoint.

  • frac_list (list or None, optional) – A list of length 3 containing the fractions to use for training, validation and test, in that order. If None, [0.8, 0.1, 0.1] is used.

  • shuffle (bool, optional) – By default the dataset is split consecutively, in its existing order. If True, the dataset is randomly shuffled before it is split.

  • random_state (None, int or array_like, optional) – Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If random_state is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available, or seed from the clock otherwise.


Returns:

Subsets for training, validation and test.

Return type:

list of length 3
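
The splitting behavior described above can be illustrated with a minimal, standard-library sketch. This is not DGL's implementation: `split_indices` is a hypothetical helper, it returns index lists rather than subset objects, it accepts only an int or None as the seed, and the rounding rule (sizes rounded down, remainder assigned to the test subset) is an assumption that the documentation above does not state.

```python
import random


def split_indices(num_items, frac_list=None, shuffle=False, random_state=None):
    """Hypothetical helper: return [train, val, test] lists of indices."""
    if frac_list is None:
        frac_list = [0.8, 0.1, 0.1]  # default documented for split_dataset
    if len(frac_list) != 3:
        raise ValueError("frac_list must have length 3")
    if abs(sum(frac_list) - 1.0) > 1e-8:
        raise ValueError("fractions must sum to 1")  # assumption of this sketch

    indices = list(range(num_items))
    if shuffle:
        # A fixed integer seed makes the shuffle reproducible;
        # None seeds the generator from an OS-provided source.
        random.Random(random_state).shuffle(indices)

    # Assumption: round subset sizes down; the test subset takes the remainder.
    n_train = int(num_items * frac_list[0])
    n_val = int(num_items * frac_list[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return [train, val, test]


# Any object supporting len() and integer indexing can play the dataset role.
dataset = [f"sample_{i}" for i in range(10)]

# Consecutive split (shuffle=False): the first 80% of items go to training.
train_idx, val_idx, test_idx = split_indices(len(dataset))
train_set = [dataset[i] for i in train_idx]

# Shuffled split with a fixed seed: repeated calls give the same partition.
shuffled = split_indices(len(dataset), shuffle=True, random_state=0)
```

With DGL itself, the corresponding call would follow the signature at the top of this page, for example `split_dataset(dataset, frac_list=[0.8, 0.1, 0.1], shuffle=True, random_state=0)`, and would return the three subsets directly.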