GDELTDataset

class dgl.data.GDELTDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: DGLBuiltinDataset

GDELT dataset for event-based temporal graph

The Global Database of Events, Language, and Tone (GDELT) dataset. It contains events that happened all over the world (for example, every protest held anywhere in Russia on a given day is collapsed into a single entry). This dataset consists of events collected from 1/1/2018 to 1/31/2018 at a 15-minute time granularity.

Reference:

Statistics:

  • Train examples: 2,304

  • Valid examples: 288

  • Test examples: 384
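
As a quick sanity check, the split sizes above can be compared with the number of 15-minute steps in the stated collection window. The sketch below uses only the standard library; the variable names are illustrative, and reading each example as one 15-minute step is an assumption, not something the dataset documentation states.

```python
from datetime import datetime, timedelta

# Collection window from the description: 1/1/2018 through 1/31/2018.
start = datetime(2018, 1, 1)
end = datetime(2018, 2, 1)  # exclusive end, so all of 1/31/2018 is covered
step = timedelta(minutes=15)

# Number of 15-minute steps in the window.
total_steps = (end - start) // step

# Split sizes quoted in the statistics above.
splits = {"train": 2304, "valid": 288, "test": 384}

# Assumption: one example per 15-minute step, so each split spans whole days.
steps_per_day = timedelta(days=1) // step
days_per_split = {name: n / steps_per_day for name, n in splits.items()}

print(total_steps)           # 2976
print(sum(splits.values()))  # 2976
print(days_per_split)        # {'train': 24.0, 'valid': 3.0, 'test': 4.0}
```

Under that assumption, the three splits would cover 24, 3, and 4 days respectively, which adds up to the 31 days of the window.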

Parameters:
  • mode (str) – Must be one of ('train', 'valid', 'test'). Default: 'train'

  • raw_dir (str) – Directory to which the raw data is downloaded, or which already contains the input data directory. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.

start_time

Start time of the temporal graph

Type:

int

end_time

End time of the temporal graph

Type:

int

is_temporal

Whether the dataset contains temporal graphs

Type:

bool

Examples

>>> from dgl.data import GDELTDataset
>>>
>>> # get train, valid, test dataset
>>> train_data = GDELTDataset()
>>> valid_data = GDELTDataset(mode='valid')
>>> test_data = GDELTDataset(mode='test')
>>>
>>> # length of train set
>>> train_size = len(train_data)
>>>
>>> # each item is a DGLGraph; edge types are stored in edata['rel_type']
>>> for g in train_data:
...     e_feat = g.edata['rel_type']
...     # your code here
...

__getitem__(t)

Get the graph containing events before time t + self.start_time

Parameters:

t (int) – Time index. Its value must be in the range [0, self.end_time - self.start_time]

Returns:

The graph contains:

  • edata['rel_type']: edge type

Return type:

dgl.DGLGraph
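
The indexing behaviour described above can be illustrated with a small pure-Python sketch. It is not DGL's implementation: the `snapshot` function, the `(subject, relation, object, time_index)` event layout, and the inclusive cutoff are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical event record: (subject, relation, object, time_index).
Event = Tuple[int, int, int, int]


def snapshot(events: List[Event], t: int, start_time: int, end_time: int):
    """Return (edges, rel_types) for events up to time t + start_time.

    Assumption: the cutoff is inclusive. This is a sketch, not DGL's code.
    """
    # Mirror the documented valid range for t.
    if not 0 <= t <= end_time - start_time:
        raise IndexError("t must be in [0, end_time - start_time]")
    cutoff = t + start_time
    kept = [e for e in events if e[3] <= cutoff]
    edges = [(s, o) for s, _, o, _ in kept]      # graph structure
    rel_types = [r for _, r, _, _ in kept]       # plays the role of edata['rel_type']
    return edges, rel_types


# Toy data: three events at consecutive time indices 10, 11, 12.
events = [(0, 5, 1, 10), (1, 7, 2, 11), (2, 5, 0, 12)]
start_time, end_time = 10, 12

edges, rel_types = snapshot(events, t=1, start_time=start_time, end_time=end_time)
print(edges)      # [(0, 1), (1, 2)]
print(rel_types)  # [5, 7]
```

In this toy setting the valid indices are 0, 1, and 2, that is, end_time - start_time + 1 distinct values of t, and each larger t yields a graph with at least as many edges as the previous one.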

__len__()

Number of graphs in the dataset.

Return type:

int