FB15kDataset

class dgl.data.FB15kDataset(reverse=True, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: dgl.data.knowledge_graph.KnowledgeGraphDataset

FB15k link prediction dataset.

    Deprecated since version 0.5.0:
  • train is deprecated; use the following instead:

    >>> import torch as th
    >>> from dgl.data import FB15kDataset
    >>> dataset = FB15kDataset()
    >>> graph = dataset[0]
    >>> train_mask = graph.edata['train_mask']
    >>> train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
    >>> src, dst = graph.edges(train_idx)
    >>> rel = graph.edata['etype'][train_idx]
    
  • valid is deprecated; use the following instead:

    >>> import torch as th
    >>> from dgl.data import FB15kDataset
    >>> dataset = FB15kDataset()
    >>> graph = dataset[0]
    >>> val_mask = graph.edata['val_mask']
    >>> val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
    >>> src, dst = graph.edges(val_idx)
    >>> rel = graph.edata['etype'][val_idx]
    
  • test is deprecated; use the following instead:

    >>> import torch as th
    >>> from dgl.data import FB15kDataset
    >>> dataset = FB15kDataset()
    >>> graph = dataset[0]
    >>> test_mask = graph.edata['test_mask']
    >>> test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
    >>> src, dst = graph.edges(test_idx)
    >>> rel = graph.edata['etype'][test_idx]
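The three replacement snippets share one pattern: turn a boolean edge mask into edge IDs, then gather the endpoints and relation types of those edges. The sketch below reproduces that pattern with NumPy on small hypothetical arrays that stand in for graph.edges() and graph.edata (it does not call DGL), and stacks the result into an (N, 3) array in the (src, rel, dst) layout that the deprecated train, valid and test attributes are documented to use.

```python
import numpy as np

# Hypothetical stand-ins for graph.edges() and graph.edata[...];
# in real use these arrays come from the DGL graph.
src = np.array([0, 1, 2, 3, 4])
dst = np.array([1, 2, 3, 4, 0])
etype = np.array([0, 1, 0, 2, 1])
train_mask = np.array([True, False, True, True, False])

# Boolean mask -> integer edge IDs (same role as th.nonzero(mask, as_tuple=False).squeeze()).
train_idx = np.nonzero(train_mask)[0]

# Gather the selected edges into an (N, 3) array laid out as (src, rel, dst).
train_triplets = np.stack([src[train_idx], etype[train_idx], dst[train_idx]], axis=1)
print(train_triplets)
```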
    

The FB15k dataset was introduced in Translating Embeddings for Modeling Multi-relational Data. It is a subset of Freebase that contains 14,951 entities and 1,345 relation types. By default, when the dataset is built, a reverse edge with a reversed relation type is created for every edge (controlled by the reverse parameter).
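A minimal sketch of what adding reverse edges means for an array of (src, rel, dst) triplets. It assumes, for illustration only, that the reversed counterpart of relation r receives the ID r + num_rels, which keeps the 1,345 reversed relation types disjoint from the 1,345 original ones; treat this ID scheme as an assumption rather than a statement of how DGL numbers them. The function name add_reverse_edges is hypothetical.

```python
import numpy as np

def add_reverse_edges(triplets, num_rels):
    """Return (src, rel, dst) arrays with one reverse edge appended per triplet.

    Assumption for illustration: the reversed type of relation `rel` is `rel + num_rels`.
    """
    src, rel, dst = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    all_src = np.concatenate([src, dst])             # reverse edges swap the endpoints
    all_dst = np.concatenate([dst, src])
    all_rel = np.concatenate([rel, rel + num_rels])  # offset so types never collide
    return all_src, all_rel, all_dst

triplets = np.array([[0, 0, 1], [2, 1, 3]])
s, r, d = add_reverse_edges(triplets, num_rels=2)
print(s, r, d)  # [0 2 1 3] [0 1 2 3] [1 3 0 2]
```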

FB15k dataset statistics:

  • Nodes: 14,951

  • Number of relation types: 1,345

  • Number of reversed relation types: 1,345

  • Triplet split:

    • Train: 483,142

    • Valid: 50,000

    • Test: 59,071
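The split sizes above add up as follows. The edge count for reverse=True assumes that every triplet in every split receives exactly one reverse edge, so read it as an estimate derived from the statistics rather than a documented figure.

```python
# Split sizes taken from the statistics above.
splits = {"train": 483_142, "valid": 50_000, "test": 59_071}
num_rels = 1_345

num_triplets = sum(splits.values())
# Assumption: with reverse=True every triplet gets exactly one reverse edge.
num_edges_with_reverse = 2 * num_triplets
# Original relation types plus one reversed type per relation.
num_edge_types_with_reverse = 2 * num_rels

print(num_triplets, num_edges_with_reverse, num_edge_types_with_reverse)
# 592213 1184426 2690
```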

Parameters
  • reverse (bool) – Whether to add reverse edges. Default: True.

  • raw_dir (str) – Directory that the raw data is downloaded to, or that already contains the input data directory. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset even if a processed copy is cached. Default: False

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
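Per the description of transform, the graph is transformed on every access rather than once at load time. The toy class below illustrates that behavior with a plain dict standing in for a DGLGraph; ToySingleGraphDataset and mark_transformed are hypothetical names, and this is a sketch of the semantics, not DGL's implementation.

```python
class ToySingleGraphDataset:
    """Hypothetical stand-in for a single-graph dataset with a `transform` hook."""

    def __init__(self, graph, transform=None):
        self._graph = graph          # stored, untransformed graph
        self._transform = transform

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError("this dataset holds a single graph")
        if self._transform is None:
            return self._graph
        # Applied on every access; the stored graph itself is never replaced.
        return self._transform(self._graph)

    def __len__(self):
        return 1


calls = []

def mark_transformed(g):
    """Toy transform: record the call and return a modified copy of the dict."""
    calls.append(1)
    return dict(g, transformed=True)

ds = ToySingleGraphDataset({"num_nodes": 3}, transform=mark_transformed)
g1 = ds[0]
g2 = ds[0]
print(len(calls), g1["transformed"], "transformed" in ds._graph)  # 2 True False
```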

num_nodes

Number of nodes

Type

int

num_rels

Number of relation types

Type

int

train

A numpy array of triplets (src, rel, dst) for the training graph

Type

numpy.ndarray

valid

A numpy array of triplets (src, rel, dst) for the validation graph

Type

numpy.ndarray

test

A numpy array of triplets (src, rel, dst) for the test graph

Type

numpy.ndarray

Examples

>>> import torch as th
>>> from dgl.data import FB15kDataset
>>> dataset = FB15kDataset()
>>> g = dataset[0]
>>> e_type = g.edata['etype']
>>>
>>> # get data split
>>> train_mask = g.edata['train_mask']
>>> val_mask = g.edata['val_mask']
>>>
>>> train_set = th.arange(g.num_edges())[train_mask]
>>> val_set = th.arange(g.num_edges())[val_mask]
>>>
>>> # build train_g
>>> train_edges = train_set
>>> train_g = g.edge_subgraph(train_edges,
...                           relabel_nodes=False)
>>> train_g.edata['etype'] = e_type[train_edges]
>>>
>>> # build val_g from the training edges plus the validation edges
>>> val_edges = th.cat([train_edges, val_set])
>>> val_g = g.edge_subgraph(val_edges,
...                         relabel_nodes=False)
>>> val_g.edata['etype'] = e_type[val_edges]
>>> # Train, Validation and Test
>>>
__getitem__(idx)

Gets the graph object

Parameters

idx (int) – Item index. FB15kDataset has only one graph object, so the only valid index is 0.

Returns

The graph contains

  • edata['etype']: edge relation type

  • edata['train_edge_mask']: positive training edge mask

  • edata['val_edge_mask']: positive validation edge mask

  • edata['test_edge_mask']: positive testing edge mask

  • edata['train_mask']: training edge set mask (includes reversed training edges)

  • edata['val_mask']: validation edge set mask (includes reversed validation edges)

  • edata['test_mask']: testing edge set mask (includes reversed testing edges)

  • ndata['ntype']: node type. All 0 in this dataset

Return type

dgl.DGLGraph
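The difference between the *_edge_mask fields (positive edges) and the *_mask fields (which also include reversed edges) can be illustrated on a toy edge list. The sketch assumes a hypothetical layout in which all reverse edges are stored after the forward edges, in the same order, and each reverse edge inherits the split of the forward edge it mirrors. Under that assumption each *_mask selects twice as many edges as the matching *_edge_mask.

```python
import numpy as np

# Hypothetical layout: 4 forward edges, then their 4 reverse edges in the same order.
forward_split = np.array(["train", "train", "val", "test"])
num_forward = len(forward_split)

# Each reverse edge inherits the split of the forward edge it mirrors.
split = np.concatenate([forward_split, forward_split])
is_forward = np.arange(2 * num_forward) < num_forward

masks = {}
for name in ("train", "val", "test"):
    in_split = split == name
    masks[name + "_edge_mask"] = in_split & is_forward  # positive (forward) edges only
    masks[name + "_mask"] = in_split                    # forward edges plus their reverses

print(masks["train_edge_mask"].astype(int))  # [1 1 0 0 0 0 0 0]
print(masks["train_mask"].astype(int))       # [1 1 0 0 1 1 0 0]
```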

__len__()

The number of graphs in the dataset.