OnDiskDataset
class dgl.graphbolt.OnDiskDataset(path: str, include_original_edge_id: bool = False)
Bases: dgl.graphbolt.dataset.Dataset
An on-disk dataset that reads the graph topology, feature data, and training/validation/test sets from disk.
Because memory is limited, data that is too large to fit into RAM stays on disk, while the rest is loaded into RAM once OnDiskDataset is initialized. This behavior can be controlled through the in_memory field in the YAML file. All paths in the YAML file are relative to the dataset directory.
A full example of the YAML file is as follows:
    dataset_name: graphbolt_test
    graph:
      nodes:
        - type: paper # could be omitted for homogeneous graph.
          num: 1000
        - type: author
          num: 1000
      edges:
        - type: author:writes:paper # could be omitted for homogeneous graph.
          format: csv # Can be csv only.
          path: edge_data/author-writes-paper.csv
        - type: paper:cites:paper
          format: csv
          path: edge_data/paper-cites-paper.csv
    feature_data:
      - domain: node
        type: paper # could be omitted for homogeneous graph.
        name: feat
        format: numpy
        in_memory: false # If not specified, default to true.
        path: node_data/paper-feat.npy
      - domain: edge
        type: "author:writes:paper"
        name: feat
        format: numpy
        in_memory: false
        path: edge_data/author-writes-paper-feat.npy
    tasks:
      - name: "edge_classification"
        num_classes: 10
        train_set:
          - type: paper # could be omitted for homogeneous graph.
            data: # multiple data sources could be specified.
              - name: node_pairs
                format: numpy # Can be numpy or torch.
                in_memory: true # If not specified, default to true.
                path: set/paper-train-node_pairs.npy
              - name: labels
                format: numpy
                path: set/paper-train-labels.npy
        validation_set:
          - type: paper
            data:
              - name: node_pairs
                format: numpy
                path: set/paper-validation-node_pairs.npy
              - name: labels
                format: numpy
                path: set/paper-validation-labels.npy
        test_set:
          - type: paper
            data:
              - name: node_pairs
                format: numpy
                path: set/paper-test-node_pairs.npy
              - name: labels
                format: numpy
                path: set/paper-test-labels.npy
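The two rules stated above (paths are relative to the dataset directory, and in_memory defaults to true when omitted) can be illustrated with a small standalone sketch. This is not the library's implementation: the helper resolve_features, the dict excerpt, and the directory /data/graphbolt_test are hypothetical names chosen for illustration, and the second entry deliberately omits in_memory to show the default.

```python
from pathlib import Path

# Hypothetical excerpt of the parsed YAML, written as a plain dict so the
# sketch needs no YAML parser. The second entry omits "in_memory" on purpose.
config = {
    "feature_data": [
        {
            "domain": "node",
            "type": "paper",
            "name": "feat",
            "format": "numpy",
            "in_memory": False,
            "path": "node_data/paper-feat.npy",
        },
        {
            "domain": "edge",
            "type": "author:writes:paper",
            "name": "feat",
            "format": "numpy",
            "path": "edge_data/author-writes-paper-feat.npy",
        },
    ]
}


def resolve_features(config, dataset_dir):
    """Resolve each feature path against dataset_dir and apply the default."""
    resolved = []
    for entry in config.get("feature_data", []):
        resolved.append(
            {
                "key": (entry["domain"], entry.get("type"), entry["name"]),
                # Paths in the YAML file are relative to the dataset directory.
                "path": Path(dataset_dir) / entry["path"],
                # A missing in_memory field is treated as true.
                "in_memory": entry.get("in_memory", True),
            }
        )
    return resolved


features = resolve_features(config, "/data/graphbolt_test")
for feat in features:
    where = "RAM" if feat["in_memory"] else "disk"
    print(feat["key"], feat["path"].as_posix(), where)
```

Under these assumptions, the node feature is kept on disk and the edge feature, lacking an explicit in_memory value, is treated as resident in RAM.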
Parameters
    path (str)
    include_original_edge_id (bool, optional; defaults to False)
property all_nodes_set
    Return the itemset containing all nodes.
property dataset_name
    Return the dataset name.
property feature
    Return the feature.
property graph
    Return the graph.
property tasks
    Return the tasks.
property yaml_data
    Return the YAML data.