class dgl.nn.pytorch.sparse_emb.NodeEmbedding(num_embeddings, embedding_dim, name, init_func=None, device=None, partition=None)

Bases: object

Class for storing node embeddings.

The class is optimized for training large-scale node embeddings. It updates the embeddings sparsely, touching only the rows used in each step, and can scale to graphs with millions of nodes. It also supports partitioning the embeddings across multiple GPUs on a single machine for further acceleration. Partitioning across machines is not supported.

Currently, DGL provides two optimizers that work with this NodeEmbedding class: SparseAdagrad and SparseAdam.
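
To illustrate what a sparse update means in practice, the following is a minimal pure-Python sketch of an Adagrad-style step that modifies only the rows that received a gradient in the current mini-batch. It is a conceptual illustration under stated assumptions, not DGL's implementation: the function name sparse_adagrad_step, the list-of-lists table, and the toy values are all hypothetical.

```python
import math


def sparse_adagrad_step(table, state, grads_by_row, lr=0.01, eps=1e-10):
    """Update only the rows of `table` that received a gradient.

    table        : list of rows (each a list of floats), one row per node
    state        : per-row accumulated squared gradients, same shape as table
    grads_by_row : dict mapping row index -> gradient list for that row
    """
    for row, grad in grads_by_row.items():
        for j, g in enumerate(grad):
            # Accumulate the squared gradient for this entry only.
            state[row][j] += g * g
            # Adagrad-style step scaled by the accumulated history.
            table[row][j] -= lr * g / (math.sqrt(state[row][j]) + eps)
    return table


# Hypothetical toy setup: 5 nodes, 2-dimensional embeddings, all zeros.
table = [[0.0, 0.0] for _ in range(5)]
state = [[0.0, 0.0] for _ in range(5)]

# Assume a mini-batch touched only nodes 1 and 3.
grads = {1: [1.0, -1.0], 3: [0.5, 0.0]}
sparse_adagrad_step(table, state, grads, lr=0.1)

# Rows 0, 2 and 4 were never visited, so they are left unchanged.
```

The cost of a step in this sketch depends on the number of rows in the mini-batch rather than on the total number of nodes, which is the property that makes sparse updates attractive for very large embedding tables.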

The implementation is based on the torch.distributed package. It depends on the PyTorch default distributed process group to collect multi-process information and uses torch.distributed.TCPStore to share metadata across the GPU processes. It uses the local address of ‘’ to initialize the TCPStore.

NOTE: Support for NodeEmbedding is experimental.

  • num_embeddings (int) – The number of embeddings. Currently, the number of embeddings has to be the same as the number of nodes.

  • embedding_dim (int) – The dimension of each embedding.

  • name (str) – The name of the embeddings. The name should uniquely identify the embeddings in the system.

  • init_func (callable, optional) – The function to create the initial data. If the init function is not provided, the values of the embeddings are initialized to zero.

  • device (th.device, optional) – Device to store the embeddings on.

  • partition (NDArrayPartition, optional) – The partition to use to distribute the embeddings between processes.
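
As a rough intuition for how embeddings can be distributed between processes, the sketch below assigns node IDs to parts by taking the remainder of the ID divided by the number of parts. This is only one possible scheme and is an assumption made for illustration; it does not describe how NDArrayPartition is implemented, and the helper remainder_partition is hypothetical.

```python
def remainder_partition(node_ids, num_parts):
    """Assign each node ID to a part by `nid % num_parts`.

    Returns a list with one entry per part. Each entry holds
    (global_id, local_index) pairs, where local_index is the row the
    embedding would occupy in that part's local table.
    """
    parts = [[] for _ in range(num_parts)]
    for nid in node_ids:
        owner = nid % num_parts         # which process stores this node
        local_index = nid // num_parts  # row inside that process's table
        parts[owner].append((nid, local_index))
    return parts


# Hypothetical example: 7 nodes spread over 2 processes.
parts = remainder_partition(range(7), num_parts=2)
# parts[0] holds nodes 0, 2, 4, 6; parts[1] holds nodes 1, 3, 5.
```

With a scheme like this, each process stores roughly num_embeddings / num_parts rows, and any process can compute which peer owns a given node ID without communication.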


Before launching multiple GPU processes:

>>> def initializer(emb):
...     return emb

In each training process:

>>> emb = dgl.nn.NodeEmbedding(g.number_of_nodes(), 10, 'emb', init_func=initializer)
>>> optimizer = dgl.optim.SparseAdam([emb], lr=0.001)
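>>> for blocks in dataloader:
...     ...
...     # nids: IDs of the nodes in this mini-batch; gpu_0: the target device
...     feats = emb(nids, gpu_0)
...     loss = (feats + 1).sum()  # reduce to a scalar so backward() is valid
...     loss.backward()
...     optimizer.step()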