FakeNewsDataset
class dgl.data.FakeNewsDataset(name, feature_name, raw_dir=None, transform=None)
Bases: dgl.data.dgl_dataset.DGLBuiltinDataset
Fake News Graph Classification dataset.
The dataset consists of two sets of tree-structured fake and real news propagation graphs extracted from Twitter. Unlike most benchmark datasets for graph classification, the graphs in this dataset are directed trees: the root node represents the news item, and the leaf nodes are Twitter users who retweeted it. The node features are built from each user's historical tweets, encoded with different pretrained language models, and from user profile attributes:
bert: a 768-dimensional node feature built from the user's historical tweets, encoded with bert-as-service.
content: a 310-dimensional node feature composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.
profile: a 10-dimensional node feature composed of ten Twitter user profile attributes.
spacy: a 300-dimensional node feature built from the user's historical tweets, encoded with the spaCy word2vec encoder.
Reference: <https://github.com/safe-graph/GNN-FakeNews>
Note: this dataset is for academic use only, and commercial use is prohibited.
Statistics:
Politifact:
    Graphs: 314
    Nodes: 41,054
    Edges: 40,740
    Classes:
        Fake: 157
        Real: 157
    Node feature size:
        bert: 768
        content: 310
        profile: 10
        spacy: 300
Gossipcop:
    Graphs: 5,464
    Nodes: 314,262
    Edges: 308,798
    Classes:
        Fake: 2,732
        Real: 2,732
    Node feature size:
        bert: 768
        content: 310
        profile: 10
        spacy: 300
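The statistics above are consistent with the tree structure described earlier: a tree with n nodes has n - 1 edges, so a collection of trees should have (total nodes - number of graphs) edges. The following is a small sanity check of that arithmetic, written in plain Python with no DGL dependency. The helper name `tree_edge_count` is hypothetical and only used for illustration.

```python
# Statistics copied from the listing above.
stats = {
    "politifact": {"graphs": 314, "nodes": 41_054, "edges": 40_740},
    "gossipcop": {"graphs": 5_464, "nodes": 314_262, "edges": 308_798},
}


def tree_edge_count(num_nodes: int, num_graphs: int) -> int:
    """Edges in a forest of `num_graphs` trees holding `num_nodes` nodes in total.

    Each tree with n nodes has n - 1 edges, so the forest has
    num_nodes - num_graphs edges.
    """
    return num_nodes - num_graphs


for name, s in stats.items():
    expected_edges = tree_edge_count(s["nodes"], s["graphs"])
    # Compare the edge count implied by the tree structure with the reported one.
    print(name, expected_edges, expected_edges == s["edges"])
```

Both datasets pass this check, which matches the statement that every graph is a single tree rooted at the news node.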
- Parameters
name (str) – Name of the dataset (gossipcop or politifact)
feature_name (str) – Name of the feature (bert, content, profile, or spacy)
raw_dir (str) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.dgl/
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
- labels
Graph labels
Type: Tensor
- feature
Node features
Type: Tensor
- train_mask
Mask of training set
Type: Tensor
- val_mask
Mask of validation set
Type: Tensor
- test_mask
Mask of testing set
Type: Tensor
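The three masks are typically used to pick out which graphs belong to each split. The sketch below illustrates the idea with plain Python lists on a toy dataset of six graphs. It assumes each mask holds one boolean per graph and that the splits do not overlap. The mask values and the helper `indices_from_mask` are hypothetical and are not taken from the real dataset.

```python
from typing import List, Sequence


def indices_from_mask(mask: Sequence[bool]) -> List[int]:
    """Return the positions at which the mask is True."""
    return [i for i, keep in enumerate(mask) if keep]


# Hypothetical masks for a toy dataset of 6 graphs (assumption: one entry per graph).
train_mask = [True, False, True, False, False, False]
val_mask = [False, True, False, False, False, False]
test_mask = [False, False, False, True, True, True]

train_idx = indices_from_mask(train_mask)
val_idx = indices_from_mask(val_mask)
test_idx = indices_from_mask(test_mask)

# In this toy setup, every graph index lands in exactly one split.
print(train_idx, val_idx, test_idx)
```

With the real dataset, the masks are tensors rather than lists, so the same selection would normally be done with the tensor library's own boolean indexing.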
Examples
>>> from dgl.data import FakeNewsDataset
>>> dataset = FakeNewsDataset('gossipcop', 'bert')
>>> graph, label = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = dataset.feature
>>> labels = dataset.labels
- __getitem__(i)
Get graph and label by index
- Parameters
i (int) – Item index
- Returns
The graph and its label at index i
- Return type
(dgl.DGLGraph, Tensor)