CornellDataset

class dgl.data.CornellDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: dgl.data.geom_gcn.GeomGCNDataset

Cornell subset of the WebKB dataset, as later modified in Geom-GCN: Geometric Graph Convolutional Networks.

Nodes represent web pages, and edges represent hyperlinks between them. Node features are bag-of-words representations of the web pages. Each page is manually classified into one of five categories: student, project, course, staff, and faculty.

Statistics:

  • Nodes: 183

  • Edges: 298

  • Number of Classes: 5

  • 10 train/val/test splits

    • Train: 87

    • Val: 59

    • Test: 37
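
The ten splits above can be used one at a time. The following is a minimal sketch, assuming the split masks are stored as 2-D boolean arrays of shape (num_nodes, num_splits) with one column per split; that layout is an assumption and should be checked against the shape of `g.ndata["train_mask"]`. The helper `select_split` is hypothetical (not part of DGL), and the masks here are synthetic NumPy arrays built only to mirror the statistics listed above.

```python
import numpy as np


def select_split(mask, split_idx=0):
    """Return a 1-D boolean mask for a single split.

    A 1-D mask is returned unchanged. A 2-D mask is assumed to have
    shape (num_nodes, num_splits), and column `split_idx` is returned.
    """
    if mask.ndim == 1:
        return mask
    if not 0 <= split_idx < mask.shape[1]:
        raise IndexError(f"split_idx {split_idx} out of range for {mask.shape[1]} splits")
    return mask[:, split_idx]


# Synthetic stand-ins for the dataset masks: 183 nodes, 10 random splits
# with 87 train, 59 validation, and 37 test nodes each.
rng = np.random.default_rng(0)
num_nodes, num_splits = 183, 10
train_mask = np.zeros((num_nodes, num_splits), dtype=bool)
val_mask = np.zeros_like(train_mask)
test_mask = np.zeros_like(train_mask)
for s in range(num_splits):
    perm = rng.permutation(num_nodes)
    train_mask[perm[:87], s] = True
    val_mask[perm[87:146], s] = True
    test_mask[perm[146:], s] = True

# Pick the fourth split (index 3) and count the nodes in each part.
train_3 = select_split(train_mask, 3)
val_3 = select_split(val_mask, 3)
test_3 = select_split(test_mask, 3)
print(int(train_3.sum()), int(val_3.sum()), int(test_3.sum()))
```

Because the helper relies only on `ndim`, `shape`, and column indexing, it should also accept other array types with the same interface. Reporting results averaged over all ten splits is a common choice for this kind of benchmark.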

Parameters
  • raw_dir (str, optional) – Directory used to store the downloaded raw files and the processed data. Default: ~/.dgl/

  • force_reload (bool, optional) – Whether to re-download the data source. Default: False

  • verbose (bool, optional) – Whether to print progress information. Default: True

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access. Default: None

num_classes

Number of node classes

Type

int

Notes

The graph is directed and does not include reverse edges: an edge from u to v does not imply an edge from v to u.
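
Since only one direction of each edge is stored, code that expects an undirected graph may need the reverse edges added first. The NumPy sketch below illustrates that operation on a plain edge list. The helper `symmetrize_edges` is hypothetical (not part of DGL) and the toy edges are made up for illustration. DGL provides its own utilities for this, such as `dgl.to_bidirected`, which are likely the better choice when working with a DGLGraph directly.

```python
import numpy as np


def symmetrize_edges(src, dst):
    """Return (src, dst) arrays containing every edge in both directions.

    Duplicate edges are removed, so an edge that already has its
    reverse in the input is not stored twice.
    """
    src = np.asarray(src)
    dst = np.asarray(dst)
    # Stack the original edges on top of their reversed copies.
    pairs = np.stack(
        [np.concatenate([src, dst]), np.concatenate([dst, src])], axis=1
    )
    # Drop duplicate (u, v) rows; rows come back sorted lexicographically.
    pairs = np.unique(pairs, axis=0)
    return pairs[:, 0], pairs[:, 1]


# Toy directed graph on three nodes; the edge 0 -> 1 already has its reverse.
src = np.array([0, 1, 2, 1])
dst = np.array([1, 2, 0, 0])
sym_src, sym_dst = symmetrize_edges(src, dst)
print(list(zip(sym_src.tolist(), sym_dst.tolist())))
```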

Examples

>>> from dgl.data import CornellDataset
>>> dataset = CornellDataset()
>>> g = dataset[0]
>>> num_classes = dataset.num_classes
>>> # get node features
>>> feat = g.ndata["feat"]
>>> # get data split
>>> train_mask = g.ndata["train_mask"]
>>> val_mask = g.ndata["val_mask"]
>>> test_mask = g.ndata["test_mask"]
>>> # get labels
>>> label = g.ndata["label"]

__getitem__(idx)

Gets the data object at index idx.

__len__()

The number of examples in the dataset.