FakeNewsDataset

class dgl.data.FakeNewsDataset(name, feature_name, raw_dir=None, transform=None)

Bases: DGLBuiltinDataset

Fake News Graph Classification dataset.

The dataset consists of two sets (Politifact and Gossipcop) of tree-structured fake/real news propagation graphs extracted from Twitter. Unlike most benchmark datasets for graph classification, the graphs in this dataset are directed trees in which the root node represents the news item and the leaf nodes are Twitter users who retweeted it. Node features come in four variants, built from each user's historical tweets (encoded with pretrained language models), their profile attributes, or both:

  • bert: 768-dimensional features; each user's historical tweets encoded with bert-as-service.

  • content: 310-dimensional features; the 300-dimensional "spacy" vector combined with the 10-dimensional "profile" vector.

  • profile: 10-dimensional features built from ten Twitter user profile attributes.

  • spacy: 300-dimensional features; each user's historical tweets encoded with the spaCy word2vec encoder.
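As a quick sanity check on the dimensions listed above, the sketch below records them in a plain dictionary and shows how a 310-dimensional content vector can be assembled from a spacy vector and a profile vector. The `FEATURE_DIMS` dictionary and the `make_content` helper are hypothetical illustrations, not part of DGL, and the concatenation order (spacy first, then profile) is an assumption.

```python
from typing import List, Sequence

# Node-feature dimensions, as listed in the bullets above.
FEATURE_DIMS = {"bert": 768, "content": 310, "profile": 10, "spacy": 300}


def make_content(spacy_vec: Sequence[float], profile_vec: Sequence[float]) -> List[float]:
    """Hypothetical helper: join a spacy vector and a profile vector.

    Assumes the spacy part comes first; the order used in the actual
    dataset files is not documented here.
    """
    if len(spacy_vec) != FEATURE_DIMS["spacy"]:
        raise ValueError(f"expected {FEATURE_DIMS['spacy']} spacy values, got {len(spacy_vec)}")
    if len(profile_vec) != FEATURE_DIMS["profile"]:
        raise ValueError(f"expected {FEATURE_DIMS['profile']} profile values, got {len(profile_vec)}")
    return list(spacy_vec) + list(profile_vec)


# The content width is exactly the sum of the spacy and profile widths.
assert FEATURE_DIMS["content"] == FEATURE_DIMS["spacy"] + FEATURE_DIMS["profile"]

content = make_content([0.0] * 300, [1.0] * 10)
print(len(content))  # 310
```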

Reference: <https://github.com/safe-graph/GNN-FakeNews>

Note: this dataset is for academic use only, and commercial use is prohibited.

Statistics:

Politifact:

  • Graphs: 314

  • Nodes: 41,054

  • Edges: 40,740

  • Classes:

    • Fake: 157

    • Real: 157

  • Node feature size:

    • bert: 768

    • content: 310

    • profile: 10

    • spacy: 300

Gossipcop:

  • Graphs: 5,464

  • Nodes: 314,262

  • Edges: 308,798

  • Classes:

    • Fake: 2,732

    • Real: 2,732

  • Node feature size:

    • bert: 768

    • content: 310

    • profile: 10

    • spacy: 300
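Because every graph is a tree, the statistics can be cross-checked: a tree with n nodes has n - 1 edges, so a collection of k trees with N nodes in total has N - k edges. The short script below verifies this for both subsets using the figures listed above, checks that the Fake and Real counts add up to the number of graphs, and reports the average graph size. It only does arithmetic on the published numbers; it does not load the dataset.

```python
# Statistics copied from the lists above.
STATS = {
    "politifact": {"graphs": 314, "nodes": 41_054, "edges": 40_740, "fake": 157, "real": 157},
    "gossipcop": {"graphs": 5_464, "nodes": 314_262, "edges": 308_798, "fake": 2_732, "real": 2_732},
}

for name, s in STATS.items():
    # A forest of k trees with N nodes in total has exactly N - k edges.
    assert s["nodes"] - s["edges"] == s["graphs"], name
    # Fake and real graphs together account for every graph, in equal numbers.
    assert s["fake"] + s["real"] == s["graphs"], name
    assert s["fake"] == s["real"], name
    print(f"{name}: {s['nodes'] / s['graphs']:.1f} nodes per graph on average")

# politifact: 130.7 nodes per graph on average
# gossipcop: 57.5 nodes per graph on average
```

The tree property is what makes the edge count predictable: Politifact has 41,054 - 40,740 = 314 more nodes than edges, one per graph, and Gossipcop has 314,262 - 308,798 = 5,464.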

Parameters:
  • name (str) – Name of the dataset (gossipcop or politifact)

  • feature_name (str) – Name of the feature (bert, content, profile, or spacy)

  • raw_dir (str, optional) – Directory in which to store the downloaded data, or that already contains the input data. Default: ~/.dgl/

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
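The `transform` callable runs each time an item is fetched rather than once at load time. The stand-in below models that contract in plain Python so it runs without DGL. `ToyDataset`, the dict-based graph, and `add_reverse_edges` are all hypothetical; with DGL you would pass a callable that takes and returns a `DGLGraph` instead.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical graph representation: a dict holding a list of (src, dst) edges.
Graph = Dict[str, List[Tuple[int, int]]]


class ToyDataset:
    """Minimal model of the transform contract; not DGL's implementation."""

    def __init__(self, graphs: List[Graph], labels: List[int],
                 transform: Optional[Callable[[Graph], Graph]] = None) -> None:
        self.graphs = graphs
        self.labels = labels
        self.transform = transform
        self.calls = 0  # counts how often the transform has run

    def __getitem__(self, i: int) -> Tuple[Graph, int]:
        g = self.graphs[i]
        if self.transform is not None:
            self.calls += 1
            g = self.transform(g)  # runs on every access
        return g, self.labels[i]

    def __len__(self) -> int:
        return len(self.graphs)


def add_reverse_edges(g: Graph) -> Graph:
    """Hypothetical transform: return a new graph with every edge also reversed."""
    edges = g["edges"]
    return {"edges": edges + [(dst, src) for src, dst in edges]}


# A three-node tree: root 0 (the news item) and two users; edge direction is illustrative.
tree: Graph = {"edges": [(0, 1), (0, 2)]}
ds = ToyDataset([tree], [1], transform=add_reverse_edges)

g, label = ds[0]
g_again, _ = ds[0]
print(g["edges"])             # [(0, 1), (0, 2), (1, 0), (2, 0)]
print(ds.graphs[0]["edges"])  # [(0, 1), (0, 2)]
print(ds.calls)               # 2
```

In this sketch the transform returns a new graph instead of mutating its input, so the stored graph is unchanged between accesses and repeated fetches do not compound the change.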

name

Name of the dataset (gossipcop or politifact)

Type:

str

num_classes

Number of label classes

Type:

int

num_graphs

Number of graphs

Type:

int

graphs

A list of DGLGraph objects

Type:

list

labels

Graph labels

Type:

Tensor

feature_name

Name of the feature (bert, content, profile, or spacy)

Type:

str

feature

Node features

Type:

Tensor

train_mask

Mask of training set

Type:

Tensor

val_mask

Mask of validation set

Type:

Tensor

test_mask

Mask of testing set

Type:

Tensor
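The three masks mark which graphs fall into each split. The plain-Python sketch below turns boolean masks into graph indices and checks that the splits do not overlap; the six-graph masks are made up for illustration, and treating the masks as disjoint and one entry per graph are assumptions. With the PyTorch backend, `torch.nonzero(mask, as_tuple=True)[0]` yields the same indices from a boolean tensor.

```python
from typing import List, Sequence


def mask_to_indices(mask: Sequence[bool]) -> List[int]:
    """Return the positions at which a boolean mask is True."""
    return [i for i, keep in enumerate(mask) if keep]


# Made-up masks over six graphs (assumed: one entry per graph).
train_mask = [True, True, True, False, False, False]
val_mask = [False, False, False, True, False, False]
test_mask = [False, False, False, False, True, True]

train_idx = mask_to_indices(train_mask)
val_idx = mask_to_indices(val_mask)
test_idx = mask_to_indices(test_mask)

# Assumed property: no graph belongs to more than one split.
assert not set(train_idx) & set(val_idx)
assert not set(train_idx) & set(test_idx)
assert not set(val_idx) & set(test_idx)

print(train_idx, val_idx, test_idx)  # [0, 1, 2] [3] [4, 5]
```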

Examples

>>> from dgl.data import FakeNewsDataset
>>> dataset = FakeNewsDataset('gossipcop', 'bert')
>>> graph, label = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = dataset.feature
>>> labels = dataset.labels

__getitem__(i)

Get the graph and its label by index.

Parameters:

i (int) – Item index

Return type:

(dgl.DGLGraph, Tensor)

__len__()

Number of graphs in the dataset.

Return type:

int
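Since the class defines `__len__` and integer `__getitem__`, code that relies only on those two methods can consume it. The `count_labels` helper below is a hypothetical example rather than part of DGL; it counts graphs per label that way and is exercised on a plain list of `(graph, label)` pairs standing in for the dataset.

```python
from collections import Counter
from typing import Any, Dict, Sequence, Tuple


def count_labels(dataset: Sequence[Tuple[Any, Any]]) -> Dict[int, int]:
    """Count graphs per label using only len() and integer indexing."""
    counts: Counter = Counter()
    for i in range(len(dataset)):
        _, label = dataset[i]
        counts[int(label)] += 1  # int() also accepts a one-element PyTorch tensor
    return dict(counts)


# Stand-in for the dataset: (graph, label) pairs with placeholder graphs.
toy = [("g0", 0), ("g1", 1), ("g2", 1), ("g3", 0)]
print(count_labels(toy))  # {0: 2, 1: 2}
```

Run against either subset, the two counts should reflect the balanced Fake/Real figures in the statistics above.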