FB15k237Dataset

class dgl.data.FB15k237Dataset(reverse=True, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: dgl.data.knowledge_graph.KnowledgeGraphDataset

FB15k237 link prediction dataset.

Deprecated since version 0.5.0:
  • train is deprecated; it is replaced by:

    >>> import torch as th
    >>> from dgl.data import FB15k237Dataset
    >>> dataset = FB15k237Dataset()
    >>> graph = dataset[0]
    >>> train_mask = graph.edata['train_mask']
    >>> train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
    >>> src, dst = graph.find_edges(train_idx)
    >>> rel = graph.edata['etype'][train_idx]
    
  • valid is deprecated; it is replaced by:

    >>> dataset = FB15k237Dataset()
    >>> graph = dataset[0]
    >>> val_mask = graph.edata['val_mask']
    >>> val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
    >>> src, dst = graph.find_edges(val_idx)
    >>> rel = graph.edata['etype'][val_idx]
    
  • test is deprecated; it is replaced by:

    >>> dataset = FB15k237Dataset()
    >>> graph = dataset[0]
    >>> test_mask = graph.edata['test_mask']
    >>> test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
    >>> src, dst = graph.find_edges(test_idx)
    >>> rel = graph.edata['etype'][test_idx]
    
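The replacement snippets above all follow one pattern: select the edges whose mask entry is set, then read off their endpoints and relation types. The sketch below reproduces that logic on plain Python lists so it can be run without downloading the dataset. The helper `triplets_from_mask` and the toy values are hypothetical and are not part of DGL.

```python
def triplets_from_mask(src, dst, etype, mask):
    """Return (src, rel, dst) triplets for the edges whose mask entry is True."""
    # Plain-list counterpart of th.nonzero(mask) followed by indexing.
    idx = [i for i, keep in enumerate(mask) if keep]
    return [(src[i], etype[i], dst[i]) for i in idx]


# Toy graph with four edges; edges 0 and 2 belong to the training split.
src = [0, 1, 2, 3]
dst = [1, 2, 3, 0]
etype = [5, 7, 5, 9]
train_mask = [True, False, True, False]

train_triplets = triplets_from_mask(src, dst, etype, train_mask)
print(train_triplets)  # [(0, 5, 1), (2, 5, 3)]
```

The result has the same (src, rel, dst) layout that the deprecated train, valid and test attributes describe.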

FB15k-237 is a subset of FB15k in which inverse relations are removed. By default, when the dataset is created, a reverse edge with a reversed relation type is added for each edge.
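As a rough illustration of what adding reverse edges means, the sketch below augments a list of (src, rel, dst) triplets with their reverses. It assumes that the reversed relation type of `rel` is encoded as `rel + num_rels` and that all reverse edges are appended after the forward edges. Both are assumptions made for this example, and `add_reverse_triplets` is a hypothetical helper, not a DGL function.

```python
def add_reverse_triplets(triplets, num_rels):
    """Append one reverse triplet (dst, rel + num_rels, src) per input triplet."""
    forward = list(triplets)
    # Assumption: reversed relation ids occupy the range [num_rels, 2 * num_rels).
    reverse = [(dst, rel + num_rels, src) for src, rel, dst in forward]
    return forward + reverse


NUM_RELS = 237  # number of relation types listed in the statistics

triplets = [(0, 0, 1), (1, 236, 2)]
augmented = add_reverse_triplets(triplets, NUM_RELS)
print(augmented)  # [(0, 0, 1), (1, 236, 2), (1, 237, 0), (2, 473, 1)]
```

Under this encoding, the forward and reversed relation ids never collide, which matches the statistics listing 237 relation types and 237 reversed relation types.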

FB15k237 dataset statistics:

  • Nodes: 14541

  • Number of relation types: 237

  • Number of reversed relation types: 237

  • Label Split:

    • Train: 272115

    • Valid: 17535

    • Test: 20466
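The figures above can be combined into a few derived totals. The short calculation below assumes that, with reverse=True, every triplet contributes exactly one forward and one reverse edge and that no edges are merged or dropped. That assumption is made for this sketch rather than taken from the documentation.

```python
# Split sizes and relation count as listed in the statistics.
TRAIN, VALID, TEST = 272115, 17535, 20466
NUM_RELS = 237

total_triplets = TRAIN + VALID + TEST
# Assumption: one reverse edge per triplet when reverse=True.
edges_with_reverse = 2 * total_triplets
rel_types_with_reverse = 2 * NUM_RELS

print(total_triplets)          # 310116
print(edges_with_reverse)      # 620232
print(rel_types_with_reverse)  # 474
```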

Parameters
  • reverse (bool) – Whether to add reverse edges. Default: True.

  • raw_dir (str) – Directory that the raw data is downloaded to, or that already contains the input data. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False.

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.

num_nodes (int) – Number of nodes.

num_rels (int) – Number of relation types.

train (numpy.ndarray) – A numpy array of triplets (src, rel, dst) for the training graph.

valid (numpy.ndarray) – A numpy array of triplets (src, rel, dst) for the validation graph.

test (numpy.ndarray) – A numpy array of triplets (src, rel, dst) for the test graph.

Examples

>>> import torch as th
>>> from dgl.data import FB15k237Dataset
>>>
>>> dataset = FB15k237Dataset()
>>> g = dataset[0]
>>> e_type = g.edata['etype']
>>>
>>> # get data split
>>> train_mask = g.edata['train_mask']
>>> val_mask = g.edata['val_mask']
>>> test_mask = g.edata['test_mask']
>>>
>>> train_set = th.arange(g.number_of_edges())[train_mask]
>>> val_set = th.arange(g.number_of_edges())[val_mask]
>>>
>>> # build train_g from the training edges only
>>> train_edges = train_set
>>> train_g = g.edge_subgraph(train_edges,
...                           relabel_nodes=False)
>>> train_g.edata['etype'] = e_type[train_edges]
>>>
>>> # build val_g from the training and validation edges
>>> val_edges = th.cat([train_edges, val_set])
>>> val_g = g.edge_subgraph(val_edges,
...                         relabel_nodes=False)
>>> val_g.edata['etype'] = e_type[val_edges]
>>>
>>> # Train, Validation and Test

__getitem__(idx)

Gets the graph object

Parameters

idx (int) – Item index. FB15k237Dataset has only one graph object, so the only valid index is 0.

Returns

The graph contains

  • edata['etype']: edge relation type

  • edata['train_edge_mask']: positive training edge mask

  • edata['val_edge_mask']: positive validation edge mask

  • edata['test_edge_mask']: positive testing edge mask

  • edata['train_mask']: training edge set mask (includes reversed training edges)

  • edata['val_mask']: validation edge set mask (includes reversed validation edges)

  • edata['test_mask']: testing edge set mask (includes reversed testing edges)

  • ndata['ntype']: node type. All 0 in this dataset

Return type

dgl.DGLGraph
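
The difference between the positive-edge masks and the edge-set masks listed above can be illustrated on a toy example. The sketch below assumes that forward edges come first, followed by one reverse edge per forward edge in the same order, and that only forward edges count as positive. `build_masks` and the split labels are hypothetical names used for illustration and are not part of DGL.

```python
def build_masks(split_labels, split, reverse=True):
    """Return (positive_edge_mask, edge_set_mask) for one split.

    split_labels holds one label per forward edge, e.g. "train".
    """
    forward = [label == split for label in split_labels]
    if not reverse:
        # Without reverse edges the two masks coincide.
        return forward, list(forward)
    # Assumption: reverse edges follow the forward edges in the same order.
    positive_edge_mask = forward + [False] * len(forward)
    edge_set_mask = forward + forward
    return positive_edge_mask, edge_set_mask


labels = ["train", "valid", "train"]
train_edge_mask, train_mask = build_masks(labels, "train")
print(train_edge_mask)  # [True, False, True, False, False, False]
print(train_mask)       # [True, False, True, True, False, True]
```

In this reading, the positive-edge mask selects only the original triplets of a split, while the edge-set mask also covers their reversed counterparts.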

__len__()

The number of graphs in the dataset.