YAML specification
This document describes the YAML specification of the metadata.yaml file for OnDiskDataset. The metadata.yaml file specifies the dataset information, including the graph structure, feature data and tasks.
dataset_name: <string>
graph:
  nodes:
    - type: <string>
      num: <int>
    - type: <string>
      num: <int>
  edges:
    - type: <string>
      format: <string>
      path: <string>
    - type: <string>
      format: <string>
      path: <string>
feature_data:
  - domain: node
    type: <string>
    name: <string>
    format: <string>
    in_memory: <bool>
    path: <string>
  - domain: node
    type: <string>
    name: <string>
    format: <string>
    in_memory: <bool>
    path: <string>
  - domain: edge
    type: <string>
    name: <string>
    format: <string>
    in_memory: <bool>
    path: <string>
  - domain: edge
    type: <string>
    name: <string>
    format: <string>
    in_memory: <bool>
    path: <string>
tasks:
  - name: <string>
    num_classes: <int>
    train_set:
      - type: <string>
        data:
          - name: <string>
            format: <string>
            in_memory: <bool>
            path: <string>
          - name: <string>
            format: <string>
            in_memory: <bool>
            path: <string>
    validation_set:
      - type: <string>
        data:
          - name: <string>
            format: <string>
            in_memory: <bool>
            path: <string>
          - name: <string>
            format: <string>
            in_memory: <bool>
            path: <string>
    test_set:
      - type: <string>
        data:
          - name: <string>
            format: <string>
            in_memory: <bool>
            path: <string>
          - name: <string>
            format: <string>
            in_memory: <bool>
            path: <string>
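As a concrete instance of the template, here is a minimal sketch of a metadata.yaml for a homogeneous graph, where every optional type field is simply omitted. The dataset name, counts, file names and directory layout are all hypothetical and chosen only for illustration.

```yaml
# Hypothetical homogeneous dataset; all names, numbers and paths are illustrative.
dataset_name: example_homogeneous
graph:
  nodes:
    - num: 1000                     # type omitted: homogeneous graph
  edges:
    - format: numpy
      path: edges/edges.npy         # array of shape (2, num_edges)
feature_data:
  - domain: node
    name: feat
    format: numpy
    in_memory: true
    path: features/node_feat.npy
tasks:
  - name: node_classification
    num_classes: 10
    train_set:
      - data:
          - name: seeds
            format: numpy
            path: sets/train_seeds.npy
          - name: labels
            format: numpy
            path: sets/train_labels.npy
    validation_set:
      - data:
          - name: seeds
            format: numpy
            path: sets/val_seeds.npy
          - name: labels
            format: numpy
            path: sets/val_labels.npy
    test_set:
      - data:
          - name: seeds
            format: numpy
            path: sets/test_seeds.npy
          - name: labels
            format: numpy
            path: sets/test_labels.npy
```

All paths are written relative to the directory that contains metadata.yaml.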
dataset_name
The dataset_name field specifies the name of the dataset. It is user-defined.
graph
The graph field specifies the graph structure. It has two fields: nodes and edges.
nodes: list
  The nodes field specifies the number of nodes for each node type. It is a list of node objects. Each node object has two fields: type and num.
  type: string, optional
    The type field specifies the node type. It is null for homogeneous graphs. For heterogeneous graphs, it is the name of the node type.
  num: int
    The num field specifies the number of nodes of the given node type. It is mandatory for both homogeneous and heterogeneous graphs.
edges: list
  The edges field specifies the edges. It is a list of edge objects. Each edge object has three fields: type, format and path.
  type: string, optional
    The type field specifies the edge type. It is null for homogeneous graphs. For heterogeneous graphs, it is the name of the edge type.
  format: string
    The format field specifies the format of the edge data. It can be csv or numpy. If it is csv, no index or header fields are needed. If it is numpy, the array must have shape (2, num_edges). The numpy format is recommended for large graphs.
  path: string
    The path field specifies the path of the edge data. It is relative to the directory of the metadata.yaml file.
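To make the graph section concrete, the following sketch shows a heterogeneous graph with two node types and two edge types, one stored as csv and one as numpy. The type names, counts and paths are hypothetical, and the source:relation:target form of the edge type strings is an assumption made here for readability, not something this specification states.

```yaml
# Hypothetical heterogeneous graph; names, counts and paths are illustrative.
graph:
  nodes:
    - type: user
      num: 500
    - type: item
      num: 2000
  edges:
    - type: "user:buys:item"        # assumed naming convention for edge types
      format: csv                   # plain rows, no index and no header
      path: edges/user_buys_item.csv
    - type: "user:follows:user"
      format: numpy                 # array of shape (2, num_edges)
      path: edges/user_follows_user.npy
```

Each node type gets its own entry in nodes with its own num, and each edge type gets its own entry in edges with its own file.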
feature_data
The feature_data field specifies the feature data. It is a list of feature objects. Each feature object has five canonical fields: domain, type, name, format and path. Any other fields are passed to the Feature.metadata object.
domain: string
  The domain field specifies the domain of the feature data. It can be either node or edge.
type: string, optional
  The type field specifies the type of the feature data. It is null for homogeneous graphs. For heterogeneous graphs, it is the node or edge type.
name: string
  The name field specifies the name of the feature data. It is user-defined.
format: string
  The format field specifies the format of the feature data. It can be either numpy or torch.
in_memory: bool, optional
  The in_memory field specifies whether the feature data is loaded into memory. It can be either true or false. The default is true.
path: string
  The path field specifies the path of the feature data. It is relative to the directory of the metadata.yaml file.
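The fields described above can be combined as in the sketch below, which declares one node feature and one edge feature. The type names, feature names, file extensions and paths are hypothetical, and the unit field is an invented extra field included only to show where non-canonical fields would go.

```yaml
# Hypothetical feature entries; names, paths and the extra field are illustrative.
feature_data:
  - domain: node
    type: user
    name: feat
    format: numpy
    in_memory: false                # not loaded into memory
    path: features/user_feat.npy
  - domain: edge
    type: "user:buys:item"          # assumed naming convention for edge types
    name: weight
    format: torch
    path: features/buys_weight.pt   # in_memory omitted, so it defaults to true
    unit: count                     # hypothetical extra field, passed to Feature.metadata
```

For a homogeneous graph the type field would be left out of each entry.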
tasks
The tasks field specifies the tasks. It is a list of task objects. Each task object has at least three fields: train_set, validation_set and test_set. You are free to add other fields, such as num_classes; all of these fields are passed to the Task.metadata object.
name: string, optional
  The name field specifies the name of the task. It is user-defined.
num_classes: int, optional
  The num_classes field specifies the number of classes of the task.
train_set: list
  The train_set field specifies the training set. It is a list of set objects. Each set object has two fields: type and data.
  type: string, optional
    The type field specifies the node or edge type of the set. It is null for homogeneous graphs. For heterogeneous graphs, it is the node or edge type.
  data: list
    The data field specifies the data from which train_set is loaded. It is a list of data objects. Each data object has four fields: name, format, in_memory and path.
    name: string
      The name field specifies the name of the data. It is mandatory and identifies the corresponding data field of MiniBatch used for sampling. It can be seeds, labels or indexes. If any other name is used, it is added to the MiniBatch data fields.
    format: string
      The format field specifies the format of the data. It can be either numpy or torch.
    in_memory: bool, optional
      The in_memory field specifies whether the data is loaded into memory. It can be either true or false. The default is true.
    path: string
      The path field specifies the path of the data. It is relative to the directory of the metadata.yaml file.
validation_set: list
test_set: list
  The validation_set and test_set fields specify the validation set and test set respectively. They are similar to the train_set field.
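Putting the task fields together, the sketch below shows one task on a heterogeneous graph with a typed training, validation and test set. The task name, node type, file paths and the extra description field are hypothetical; only seeds and labels are taken from the names listed above.

```yaml
# Hypothetical task definition; names, numbers and paths are illustrative.
tasks:
  - name: user_classification
    num_classes: 4
    description: example task       # hypothetical extra field, passed to Task.metadata
    train_set:
      - type: user
        data:
          - name: seeds
            format: numpy
            in_memory: true
            path: sets/train_seeds.npy
          - name: labels
            format: numpy
            path: sets/train_labels.npy
    validation_set:
      - type: user
        data:
          - name: seeds
            format: numpy
            path: sets/val_seeds.npy
          - name: labels
            format: numpy
            path: sets/val_labels.npy
    test_set:
      - type: user
        data:
          - name: seeds
            format: numpy
            path: sets/test_seeds.npy
          - name: labels
            format: numpy
            path: sets/test_labels.npy
```

Each set is a list, so a task that spans several node or edge types would add one entry per type under train_set, validation_set and test_set.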