FB15k237Dataset

class dgl.data.FB15k237Dataset(reverse=True, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: KnowledgeGraphDataset

FB15k237 link prediction dataset.

FB15k-237 is a subset of FB15k from which inverse relations have been removed. By default, when the dataset is created, a reverse edge with a reversed relation type is added for each edge.
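The reverse-edge construction described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper `add_reverse_edges` and the toy triples are hypothetical, and the sketch assumes that the reversed copy of relation `r` receives the ID `r + num_rels`, which this page does not state.

```python
def add_reverse_edges(triples, num_rels):
    """Return the triples followed by one reversed copy of each.

    Assumption: the reversed copy of relation r gets the ID r + num_rels.
    """
    forward = [(s, r, d) for s, r, d in triples]
    # Swap source and destination and shift the relation ID.
    backward = [(d, r + num_rels, s) for s, r, d in triples]
    return forward + backward


# Toy data (hypothetical): 3 entities, 2 relation types.
toy_triples = [(0, 0, 1), (1, 1, 2)]
all_edges = add_reverse_edges(toy_triples, num_rels=2)
print(all_edges)
# [(0, 0, 1), (1, 1, 2), (1, 2, 0), (2, 3, 1)]
```

Under this assumption, a dataset with 237 relation types would use IDs 0 to 236 for the original relations and 237 to 473 for the reversed ones.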

FB15k237 dataset statistics:

  • Nodes: 14541

  • Number of relation types: 237

  • Number of reversed relation types: 237

  • Label Split:

    • Train: 272115

    • Valid: 17535

    • Test: 20466
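As a quick sanity check on the split sizes listed above, the totals can be worked out directly. The sketch assumes that each listed triple becomes one edge and that `reverse=True` stores exactly one additional edge per triple; the variable names are illustrative only.

```python
# Triple counts copied from the statistics above.
splits = {"train": 272115, "valid": 17535, "test": 20466}

total_triples = sum(splits.values())
# Assumption: reverse=True stores exactly one extra edge per triple.
total_edges_with_reverse = 2 * total_triples

print(total_triples)             # 310116
print(total_edges_with_reverse)  # 620232
```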

Parameters:
  • reverse (bool) – Whether to add reverse edges. Default: True.

  • raw_dir (str) – Directory to which the raw data is downloaded, or which already contains the input data directory. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.

num_nodes

Number of nodes

Type:

int

num_rels

Number of relation types

Type:

int

Examples

>>> import torch as th
>>> from dgl.data import FB15k237Dataset
>>>
>>> dataset = FB15k237Dataset()
>>> g = dataset[0]  # the dataset holds a single graph
>>> e_type = g.edata['e_type']
>>>
>>> # get data split
>>> train_mask = g.edata['train_mask']
>>> val_mask = g.edata['val_mask']
>>> test_mask = g.edata['test_mask']
>>>
>>> # convert the boolean masks to edge IDs
>>> train_set = th.arange(g.num_edges())[train_mask]
>>> val_set = th.arange(g.num_edges())[val_mask]
>>>
>>> # build train_g
>>> train_edges = train_set
>>> train_g = g.edge_subgraph(train_edges,
...                           relabel_nodes=False)
>>> train_g.edata['e_type'] = e_type[train_edges]
>>>
>>> # build val_g from the training and validation edges
>>> val_edges = th.cat([train_edges, val_set])
>>> val_g = g.edge_subgraph(val_edges,
...                         relabel_nodes=False)
>>> val_g.edata['e_type'] = e_type[val_edges]
>>>
>>> # Train, Validation and Test
__getitem__(idx)

Gets the graph object

Parameters:

idx (int) – Item index, FB15k237Dataset has only one graph object

Returns:

The graph contains

  • edata['e_type']: edge relation type

  • edata['train_edge_mask']: positive training edge mask

  • edata['val_edge_mask']: positive validation edge mask

  • edata['test_edge_mask']: positive testing edge mask

  • edata['train_mask']: training edge set mask (includes reversed training edges)

  • edata['val_mask']: validation edge set mask (includes reversed validation edges)

  • edata['test_mask']: testing edge set mask (includes reversed testing edges)

  • ndata['ntype']: node type. All 0 in this dataset

Return type:

dgl.DGLGraph
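The difference between the two families of masks listed above can be illustrated with a toy example in plain Python. It is a sketch under stated assumptions: the edge records, their order, and the helper `mask_to_ids` are hypothetical, and the layout of edges in the real graph is not specified on this page.

```python
# Hypothetical edge records: (is_reverse, split) for each edge ID.
edges = [
    (False, "train"), (False, "train"), (False, "val"),
    (True, "train"), (True, "train"), (True, "val"),
]

# Positive mask: original (non-reversed) training edges only.
train_edge_mask = [split == "train" and not rev for rev, split in edges]
# Full mask: training edges plus their reversed copies.
train_mask = [split == "train" for rev, split in edges]


def mask_to_ids(mask):
    """Return the edge IDs where the mask is set."""
    return [i for i, flag in enumerate(mask) if flag]


positive_ids = mask_to_ids(train_edge_mask)
message_ids = mask_to_ids(train_mask)
print(positive_ids)  # [0, 1]
print(message_ids)   # [0, 1, 3, 4]
```

In this toy layout, every edge selected by `train_edge_mask` is also selected by `train_mask`, while the reversed copies appear only in the latter.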

__len__()

The number of graphs in the dataset.