# dgl.optim¶

dgl sparse optimizer for pytorch.

## Node embedding optimizer¶

class dgl.optim.pytorch.SparseAdagrad(params, lr, eps=1e-10)[source]

This optimizer implements a sparse version of Adagrad algorithm for optimizing dgl.nn.NodeEmbedding. Being sparse means it only updates the embeddings whose gradients have updates, which are usually a very small portion of the total embeddings.

Adagrad maintains a $$G_{t,i,j}$$ for every parameter in the embeddings, where $$G_{t,i,j}=G_{t-1,i,j} + g_{t,i,j}^2$$ and $$g_{t,i,j}$$ is the gradient of the dimension $$j$$ of embedding $$i$$ at step $$t$$.

Parameters
• params (list[dgl.nn.NodeEmbedding]) – The list of dgl.nn.NodeEmbedding.

• lr (float) – The learning rate.

• eps (float, Optional) – The term added to the denominator to improve numerical stability Default: 1e-10

Examples

>>> def initializer(emb):
th.nn.init.xavier_uniform_(emb)
return emb
>>> emb = dgl.nn.NodeEmbedding(g.number_of_nodes(), 10, 'emb', init_func=initializer)
...     ...
...     feats = emb(nids, gpu_0)
...     loss = F.sum(feats + 1, 0)
...     loss.backward()
...     optimizer.step()

class dgl.optim.pytorch.SparseAdam(params, lr, betas=(0.9, 0.999), eps=1e-08, use_uva=None, dtype=torch.float32)[source]

Node embedding optimizer using the Adam algorithm.

This optimizer implements a sparse version of Adagrad algorithm for optimizing dgl.nn.NodeEmbedding. Being sparse means it only updates the embeddings whose gradients have updates, which are usually a very small portion of the total embeddings.

Adam maintains a $$Gm_{t,i,j}$$ and Gp_{t,i,j} for every parameter in the embeddings, where $$Gm_{t,i,j}=beta1 * Gm_{t-1,i,j} + (1-beta1) * g_{t,i,j}$$, $$Gp_{t,i,j}=beta2 * Gp_{t-1,i,j} + (1-beta2) * g_{t,i,j}^2$$, $$g_{t,i,j} = lr * Gm_{t,i,j} / (1 - beta1^t) / \sqrt{Gp_{t,i,j} / (1 - beta2^t)}$$ and $$g_{t,i,j}$$ is the gradient of the dimension $$j$$ of embedding $$i$$ at step $$t$$.

NOTE: The support of sparse Adam optimizer is experimental.

Parameters
• params (list[dgl.nn.NodeEmbedding]) – The list of dgl.nn.NodeEmbeddings.

• lr (float) – The learning rate.

• betas (tuple[float, float], Optional) – Coefficients used for computing running averages of gradient and its square. Default: (0.9, 0.999)

• eps (float, Optional) – The term added to the denominator to improve numerical stability Default: 1e-8

• use_uva (bool, Optional) – Whether to use pinned memory for storing ‘mem’ and ‘power’ parameters, when the embedding is stored on the CPU. This will improve training speed, but will require locking a large number of virtual memory pages. For embeddings which are stored in GPU memory, this setting will have no effect. Default: True if the gradients are generated on the GPU, and False if the gradients are on the CPU.

• dtype (torch.dtype, Optional) – The type to store optimizer state with. Default: th.float32.

Examples

>>> def initializer(emb):
th.nn.init.xavier_uniform_(emb)
return emb
>>> emb = dgl.nn.NodeEmbedding(g.number_of_nodes(), 10, 'emb', init_func=initializer)