# Model overview

We developed DGL with a broad range of applications in mind. Building state-of-the-art models forces us to think hard about the most common and useful APIs, learn the hard lessons, and push the system design.

We have prototyped 10 different models in total. All of them are ready to run out of the box, and some are very new graph-based algorithms. In most cases, they demonstrate the performance, flexibility, and expressiveness of DGL. Where we still fall short, these exercises point to future directions.

We categorize the models below, providing links to the original code and tutorial when appropriate. As will become apparent, these models stress the use of different DGL APIs.

# Graph neural networks and their variants

- **Graph convolutional network (GCN)** [research paper] [tutorial] [PyTorch code] [MXNet code]: This is the most basic GCN. The tutorial covers the basic uses of DGL APIs.
- **Graph attention network (GAT)** [research paper] [tutorial] [PyTorch code] [MXNet code]: GAT extends GCN by applying multi-head attention over the neighborhood of a node. This greatly enhances the capacity and expressiveness of the model.
- **Relational-GCN** [research paper] [tutorial] [PyTorch code] [MXNet code]: Relational-GCN allows multiple edges between two entities of a graph. Edges with distinct relationships are encoded differently.
- **Line graph neural network (LGNN)** [research paper] [tutorial] [PyTorch code]: This network focuses on community detection by inspecting graph structures. It uses representations of both the original graph and its line-graph companion. In addition to demonstrating how an algorithm can harness multiple graphs, this implementation shows how you can judiciously mix simple tensor operations and sparse-matrix tensor operations with message passing in DGL.
- **Stochastic steady-state embedding (SSE)** [research paper] [tutorial] [MXNet code]: SSE illustrates the co-design of algorithm and system. It samples to guarantee asymptotic convergence while lowering complexity, and batches across samples for maximum parallelism. The emphasis here is on a giant graph that cannot fit comfortably on one GPU card.
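The models in this category share one pattern: each node gathers messages from its neighbors and reduces them into a new representation. As a rough illustration, the sketch below performs one such round in plain Python. It does not use DGL's API, the function name `mean_aggregate` is hypothetical, and it is deliberately simplified: it averages neighbor features with a self-loop, whereas a real GCN layer also applies a learned weight matrix and a degree-based normalization.

```python
# Illustrative sketch only: plain Python, not DGL's API.
# `mean_aggregate` is a hypothetical helper name.

def mean_aggregate(num_nodes, edges, features):
    """One round of message passing.

    Each node averages its own feature vector with the feature vectors
    sent along its incoming edges (src -> dst).
    """
    dim = len(features[0])
    # Start from each node's own features (acts as a self-loop).
    sums = [list(features[v]) for v in range(num_nodes)]
    counts = [1] * num_nodes
    for src, dst in edges:
        # Message: the source node's features, delivered to the destination.
        for k in range(dim):
            sums[dst][k] += features[src][k]
        counts[dst] += 1
    # Reduce: divide the accumulated sum by the number of contributions.
    return [[s / counts[v] for s in sums[v]] for v in range(num_nodes)]


# A 3-node path graph 0 - 1 - 2, stored as directed edges in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = [[1.0], [2.0], [6.0]]
aggregated = mean_aggregate(3, edges, features)
```

GAT would replace the uniform average with attention weights computed per edge, and Relational-GCN would pick a different transformation per edge type; the message-then-reduce structure stays the same.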

# Batching many small graphs

**Tree-LSTM** [paper] [tutorial] [PyTorch code]: Sentences have inherent structures that are thrown away by treating them simply as sequences. Tree-LSTM is a powerful model that learns the representation by using prior syntactic structures such as a parse tree. The challenge in training is that simply padding a sentence to the maximum length no longer works, because trees of different sentences have different sizes and topologies. DGL solves this problem by adding the trees to a bigger container graph, and then using message passing to exploit maximum parallelism. Batching is a key API for this.
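The container-graph idea can be sketched as a disjoint union: node IDs of each small graph are shifted by an offset so that all graphs live side by side in one larger graph, with no edges between them. The code below is a minimal plain-Python illustration under that assumption. It is not DGL's batching API, and `batch_graphs` is a hypothetical name.

```python
# Illustrative sketch only: plain Python, not DGL's batching API.
# `batch_graphs` is a hypothetical helper name.

def batch_graphs(graphs):
    """Merge small graphs into one container graph.

    Each input graph is a pair (num_nodes, [(src, dst), ...]).
    Returns the total node count, the relabeled edge list, and the
    node-ID offset at which each original graph starts.
    """
    total_nodes = 0
    merged_edges = []
    offsets = []
    for num_nodes, edges in graphs:
        offsets.append(total_nodes)
        # Shift every node ID so graphs do not collide.
        merged_edges.extend((s + total_nodes, d + total_nodes) for s, d in edges)
        total_nodes += num_nodes
    return total_nodes, merged_edges, offsets


# Two trees of different size and shape; edges point from child to parent.
tree_a = (3, [(1, 0), (2, 0)])
tree_b = (4, [(1, 0), (2, 1), (3, 1)])
num_nodes, edges, offsets = batch_graphs([tree_a, tree_b])
```

Because the merged graph has no edges across trees, one round of message passing over it updates every tree at once, which is where the parallelism comes from. The offsets make it possible to split results back per tree.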

# Generative models

- **DGMG** [paper] [tutorial] [PyTorch code]: This model belongs to the family that deals with structural generation. Deep generative models of graphs (DGMG) uses a state-machine approach. It is also very challenging because, unlike Tree-LSTM, every sample has a dynamic, probability-driven structure that is not available before training. You can progressively leverage intra- and inter-graph parallelism to steadily improve the performance.
- **JTNN** [paper] [PyTorch code]: This network generates molecular graphs using the framework of a variational auto-encoder. The junction tree neural network (JTNN) builds structure hierarchically. In the case of molecular graphs, it uses a junction tree as the middle scaffolding.
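To make the state-machine idea concrete, here is a toy sketch of a generation loop that alternates between three decisions: add a node, add an edge, and choose the edge's destination. Everything in it is an assumption for illustration. The fixed probabilities `p_add_node` and `p_add_edge` stand in for decisions that a trained model would make from learned graph state, and the function name `generate_graph` is hypothetical.

```python
import random

# Illustrative sketch only: fixed probabilities replace the learned
# decision modules of a real generative model.

def generate_graph(rng, p_add_node=0.7, p_add_edge=0.5, max_nodes=6):
    """Grow a graph one decision at a time.

    Returns (num_nodes, edges), where each edge goes from a newly added
    node to an older node.
    """
    num_nodes = 0
    edges = []
    # State 1: decide whether to add another node.
    while num_nodes < max_nodes and rng.random() < p_add_node:
        new_node = num_nodes
        num_nodes += 1
        # Older nodes that the new node is not yet connected to.
        candidates = list(range(new_node))
        # State 2: decide whether to add another edge from the new node.
        while candidates and rng.random() < p_add_edge:
            # State 3: choose the destination of that edge.
            dst = candidates.pop(rng.randrange(len(candidates)))
            edges.append((new_node, dst))
    return num_nodes, edges


rng = random.Random(0)
samples = [generate_graph(rng) for _ in range(20)]
```

Each call produces a graph of a different size and shape, which illustrates why the structure is not known before sampling and why batching across samples is harder here than for fixed parse trees.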

# Old (new) wines in new bottles

- **Capsule** [paper] [tutorial] [code]: This new computer vision model has two key ideas: enhancing the feature representation in a vector form (instead of a scalar) called a *capsule*, and replacing max-pooling with dynamic routing. The idea of dynamic routing is to route a lower-level capsule to one (or several) higher-level capsules with non-parametric message passing. We show how the latter can be nicely implemented with DGL APIs.
- **Transformer** [paper] [tutorial] [code] and **Universal Transformer** [paper] [tutorial] [code]: These two models replace RNNs with several layers of multi-head attention to encode and discover structures among the tokens of a sentence. These attention mechanisms can similarly be formulated as graph operations with message passing.
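One way to see attention as a graph operation is to treat every token as a node in a fully connected graph. Each edge carries a score from a query-key dot product, and each node reduces its incoming messages with a softmax-weighted sum of values. The sketch below shows a single attention head in plain Python under those assumptions. It is not DGL's API, it omits the learned projections and multiple heads, and `attention_message_passing` is a hypothetical name.

```python
import math

# Illustrative sketch only: single-head scaled dot-product attention,
# written as message passing over a fully connected token graph.

def attention_message_passing(queries, keys, values):
    """Return one output vector per token (node)."""
    n = len(queries)
    dim = len(queries[0])
    outputs = []
    for dst in range(n):
        # Edge function: score every edge src -> dst.
        scores = [
            sum(q * k for q, k in zip(queries[dst], keys[src])) / math.sqrt(dim)
            for src in range(n)
        ]
        # Normalize scores over incoming edges (numerically stable softmax).
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Reduce function: weighted sum of the incoming value messages.
        outputs.append([
            sum(weights[src] * values[src][k] for src in range(n))
            for k in range(len(values[0]))
        ])
    return outputs


# Identical keys give uniform weights, so each output is the mean of the values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[2.0], [4.0]]
attended = attention_message_passing(queries, keys, values)
```

Restricting the edge set (for example, to earlier tokens only) changes which messages a node receives without changing the edge and reduce functions.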

# Training on giant graphs

- **Sampling** [paper] [tutorial] [MXNet code] [PyTorch code]: You can perform neighbor sampling and control-variate sampling to train a graph convolution network and its variants on a giant graph.
- **Scale to giant graphs** [tutorial] [MXNet code] [PyTorch code]: You can find two components (graph store and distributed sampler) to scale to graphs with hundreds of millions of nodes.
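Neighbor sampling bounds the cost of each training step by keeping only a fixed number of neighbors per node instead of the full neighborhood. The sketch below illustrates that single idea in plain Python. It is not DGL's sampler, it does not cover control-variate sampling, and the name `sample_neighbors` and the `fanout` parameter are assumptions made for illustration.

```python
import random

# Illustrative sketch only: uniform neighbor sampling for one layer.
# `sample_neighbors` and `fanout` are hypothetical names.

def sample_neighbors(adjacency, seeds, fanout, rng):
    """For each seed node, keep at most `fanout` randomly chosen neighbors.

    `adjacency` maps a node ID to the list of its neighbor IDs.
    """
    sampled = {}
    for node in seeds:
        neighbors = adjacency.get(node, [])
        if len(neighbors) <= fanout:
            # Small neighborhoods are kept whole.
            sampled[node] = list(neighbors)
        else:
            # Large neighborhoods are subsampled without replacement.
            sampled[node] = rng.sample(neighbors, fanout)
    return sampled


adjacency = {
    0: [1, 2, 3, 4, 5],
    1: [0],
    2: [0, 3],
}
rng = random.Random(42)
block = sample_neighbors(adjacency, seeds=[0, 1, 2], fanout=2, rng=rng)
```

With a fixed fanout, the number of messages per seed node no longer depends on the node's degree, so a mini-batch of seeds touches a bounded part of the giant graph.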