DGL is still in its alpha stage, so expect some trial and error. Keep in mind that
DGL is a framework atop other frameworks (e.g. Pytorch, MXNet), so it is important
to figure out whether the bug is due to DGL or the backend framework. For example,
DGL will usually complain and throw a
DGLError if anything goes wrong. If you
are pretty confident that it is a bug, feel free to raise an issue.
Graph can be very large and training on graph may cause OOM. There are several tips to check when you get an OOM error.
Try to avoid propagating node features to edges. Number of edges are usually much larger than number of nodes. Try to use out built-in functions whenever it is possible.
Look out for cyclic references due to user-defined functions. Usually we recommend using global function or module class for the user-defined functions. Pay attention to the variables in function closure. Also, it is usually better to directly provide the UDFs in the message passing APIs rather than register them:
# define a message function def mfunc(edges): return edges.data['x'] # better as the graph `mfunc` does not hold a reference to `mfunc` g.send(some_edges, mfunc) # the graph hold a reference to `mfunc` so as all the variables in its closure g.register(mfunc) g.send(some_edges)
If your scenario does not require autograd, you can use
inplace=Trueflag in the message passing APIs. This will update features inplacely that might save memory.