class CornellDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: GeomGCNDataset

The Cornell subset of WebKB, as later modified in Geom-GCN: Geometric Graph Convolutional Networks.

Nodes represent web pages, and edges represent hyperlinks between them. Node features are bag-of-words representations of the web pages. The web pages are manually classified into five categories: student, project, course, staff, and faculty.
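As a rough illustration of bag-of-words features, the sketch below turns a page's text into a 0/1 vector over a fixed vocabulary. The vocabulary, the binary presence/absence encoding, and the helper name `bag_of_words` are assumptions for illustration only; this page does not describe the actual vocabulary or preprocessing behind Cornell's features.

```python
def bag_of_words(text, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in text.

    Hypothetical helper: it only lowercases and splits on whitespace.
    """
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


# A toy vocabulary; the real dataset's vocabulary is not listed here.
vocab = ["course", "homework", "professor", "research", "student"]
print(bag_of_words("Course homework for student groups", vocab))
# prints [1, 1, 0, 0, 1]
```

Each position in the vector corresponds to one vocabulary word, so every node's feature vector has the same length regardless of page size.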

Statistics:
  • Nodes: 183

  • Edges: 298

  • Number of Classes: 5

  • 10 train/val/test splits

    • Train: 87

    • Val: 59

    • Test: 37
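The split sizes can be sanity-checked with a little arithmetic: 87 + 59 + 37 = 183, so, assuming the three sets are disjoint, each split assigns every node to exactly one of train, validation, or test. The sketch below computes each set's share of the nodes from the listed counts, which are assumed to be the same for all 10 splits.

```python
# Split sizes copied from the statistics above.
NUM_NODES = 183
SPLIT_SIZES = {"train": 87, "val": 59, "test": 37}


def split_fractions(sizes, num_nodes):
    """Return each set's share of the nodes, rounded to three decimals."""
    # The three sets should together cover every node exactly once.
    assert sum(sizes.values()) == num_nodes, "splits should partition the nodes"
    return {name: round(n / num_nodes, 3) for name, n in sizes.items()}


print(split_fractions(SPLIT_SIZES, NUM_NODES))
# prints {'train': 0.475, 'val': 0.322, 'test': 0.202}
```

So roughly 47.5% of the nodes are used for training, 32.2% for validation, and 20.2% for testing.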
Parameters:
  • raw_dir (str, optional) – Raw file directory to store the processed data. Default: ~/.dgl/

  • force_reload (bool, optional) – Whether to re-download the data source. Default: False

  • verbose (bool, optional) – Whether to print progress information. Default: True

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access. Default: None
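To make the `transform` semantics concrete, the sketch below uses a hypothetical `ToyGraphDataset` class, with a plain dict standing in for a DGLGraph, to show a transform that runs on every access rather than once at load time. It mirrors only the documented behavior and is not DGL's implementation. With DGL itself you would pass a real callable instead; recent DGL versions provide ready-made ones such as `dgl.transforms.AddSelfLoop()`.

```python
class ToyGraphDataset:
    """Hypothetical stand-in for a single-graph dataset with a transform hook.

    The transform runs on every access; the stored graph is never replaced.
    """

    def __init__(self, graph, transform=None):
        self._graph = graph
        self._transform = transform

    def __getitem__(self, idx):
        # Single-graph dataset: only index 0 is valid.
        if idx != 0:
            raise IndexError("this dataset has only one graph")
        if self._transform is None:
            return self._graph
        return self._transform(self._graph)

    def __len__(self):
        return 1


calls = []


def mark_transformed(graph):
    """Toy transform: record the call and return a modified copy."""
    calls.append(1)
    return {**graph, "transformed": True}


base = {"num_nodes": 183}
ds = ToyGraphDataset(base, transform=mark_transformed)
first, second = ds[0], ds[0]
print(len(calls))              # prints 2: one transform call per access
print("transformed" in base)   # prints False: the stored graph is unchanged
```

Because the transform is re-applied on each access, an expensive transform is paid for every time `dataset[0]` is evaluated; caching the result in a local variable avoids that.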


num_classes

Number of node classes (5 for Cornell).

Notes:
The graph is directed: the reverse of each hyperlink edge is not added, so an edge from u to v does not imply an edge from v to u.
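If a model expects an undirected graph, reverse edges have to be added explicitly. DGL ships graph-level helpers for this, such as `dgl.to_bidirected` and `dgl.add_reverse_edges`; check their documentation for how they treat node features and duplicate edges. The pure-Python sketch below, with a hypothetical helper `to_bidirected_edges`, only illustrates the idea on plain source/destination lists: input edges are kept as given and each missing reverse edge is appended once.

```python
def to_bidirected_edges(src, dst):
    """Return edge lists with a reverse edge added wherever one is missing.

    Input edges are kept as given; each missing reverse edge is appended once.
    Self-loops are their own reverse, so they gain no extra edge.
    """
    existing = set(zip(src, dst))
    out_src, out_dst = list(src), list(dst)
    for u, v in zip(src, dst):
        if (v, u) not in existing:
            out_src.append(v)
            out_dst.append(u)
            existing.add((v, u))
    return out_src, out_dst


# Toy directed graph: 0->1 and 1->0 are reciprocal, 2->3 is not.
src, dst = [0, 1, 2], [1, 0, 3]
print(to_bidirected_edges(src, dst))
# prints ([0, 1, 2, 3], [1, 0, 3, 2])
```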

Examples:
>>> from dgl.data import CornellDataset
>>> dataset = CornellDataset()
>>> g = dataset[0]
>>> num_classes = dataset.num_classes
>>> # get node features
>>> feat = g.ndata["feat"]
>>> # get data split
>>> train_mask = g.ndata["train_mask"]
>>> val_mask = g.ndata["val_mask"]
>>> test_mask = g.ndata["test_mask"]
>>> # get labels
>>> label = g.ndata["label"]
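Since the dataset provides 10 splits, each mask has to record membership for all of them. The sketch below assumes, without confirmation from this page, that a mask stores one column per split (shape `(183, 10)` for Cornell; check `train_mask.shape` on your installation) and shows how to pick split `k` with plain Python lists via a hypothetical helper `nodes_in_split`. With PyTorch tensors the same column selection would be `train_mask[:, k]`.

```python
def nodes_in_split(mask_rows, k):
    """Return the indices of nodes whose mask entry is True for split k.

    Assumes mask_rows[i][k] says whether node i belongs to split k.
    """
    return [i for i, row in enumerate(mask_rows) if row[k]]


# Toy mask for 4 nodes and 3 splits (the real dataset has 183 nodes, 10 splits).
train_mask = [
    [True, False, True],
    [False, True, True],
    [True, True, False],
    [False, False, False],
]
print(nodes_in_split(train_mask, 0))  # prints [0, 2]
print(nodes_in_split(train_mask, 2))  # prints [0, 1]
```

Results are usually reported as the mean over all splits, so a training loop would iterate `k` over every column and evaluate each split independently.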

__getitem__(idx)

Gets the graph at index idx. Cornell contains a single graph, so the only valid index is 0.


__len__()

The number of examples in the dataset (1 for Cornell).