# Graph neural networks and their variants

**Graph convolutional network (GCN)** [research paper] [tutorial] [PyTorch code] [MXNet code]: This is the most basic GCN. The tutorial covers the basic uses of DGL APIs.

**Graph attention network (GAT)** [research paper] [tutorial] [PyTorch code] [MXNet code]: GAT extends GCN by applying multi-head attention over the neighborhood of a node, so each neighbor's contribution is weighted by a learned attention score instead of a fixed normalization constant. This increases the capacity and expressiveness of the model.

**Relational-GCN** [research paper] [tutorial] [PyTorch code] [MXNet code]: Relational-GCN supports multiple edges between two entities of a graph. Edges with distinct relationships are encoded differently.

**Line graph neural network (LGNN)** [research paper] [tutorial] [PyTorch code]: This network focuses on community detection by inspecting graph structures. It uses representations of both the original graph and its line graph. In addition to demonstrating how an algorithm can harness multiple graphs, this implementation shows how you can judiciously mix simple tensor operations, sparse-matrix tensor operations, and message passing with DGL.

**Stochastic steady-state embedding (SSE)** [research paper] [tutorial] [MXNet code]: SSE illustrates the co-design of algorithm and system. It samples to guarantee asymptotic convergence while lowering complexity, and it batches across samples for maximum parallelism. The emphasis here is on giant graphs that cannot fit comfortably on one GPU card.
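To make the GCN entry concrete, here is a minimal plain-Python sketch of one GCN layer written as message passing, assuming the common symmetric normalization $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$. The function name `gcn_layer` and its list-of-lists inputs are illustrative assumptions; this is not DGL's own GCN module.

```python
import math


def gcn_layer(num_nodes, edges, features, weight):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    edges: undirected (u, v) pairs; self-loops (the "+ I") are added here.
    features: num_nodes x in_dim list of lists (H).
    weight: in_dim x out_dim list of lists (W).
    """
    # Neighbor sets including the node itself, i.e. the rows of A + I.
    neighbors = [{v} for v in range(num_nodes)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = [len(n) for n in neighbors]

    # Message passing: node v sums neighbor features, each message
    # scaled by 1 / sqrt(deg(u) * deg(v)).
    in_dim = len(features[0])
    aggregated = []
    for v in range(num_nodes):
        acc = [0.0] * in_dim
        for u in neighbors[v]:
            norm = 1.0 / math.sqrt(degree[u] * degree[v])
            for k in range(in_dim):
                acc[k] += norm * features[u][k]
        aggregated.append(acc)

    # Linear projection by W followed by ReLU.
    out_dim = len(weight[0])
    return [
        [max(0.0, sum(row[k] * weight[k][j] for k in range(in_dim)))
         for j in range(out_dim)]
        for row in aggregated
    ]
```

On the path graph 0-1-2 with a single feature that is 1 on node 0 and 0 elsewhere, node 0 keeps half of its own signal (self-loop degree 2), node 1 receives $1/\sqrt{6}$ of it, and node 2 receives nothing after one layer. GAT keeps this structure but replaces the fixed factor `norm` with learned attention weights.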

# Batching many small graphs

**Tree-LSTM** [paper] [tutorial] [PyTorch code]: Sentences have inherent structures that are thrown away when they are treated simply as sequences. Tree-LSTM is a powerful model that learns the representation by using prior syntactic structures such as a parse tree. The challenge in training is that simply padding sentences to the maximum length no longer works, because trees of different sentences have different sizes and topologies. DGL solves this problem by adding the trees to a bigger container graph and then using message passing to exploit maximum parallelism. Batching is the key API for this.
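The container-graph idea can be sketched in plain Python: shift each tree's node IDs by a running offset so the trees become disjoint components of one graph, then group nodes into frontiers so that every node whose children are finished, in any tree, is updated in the same step. The helpers `batch_graphs`, `unbatch_values`, and `topological_frontiers` are hypothetical names for illustration. DGL's `dgl.batch` performs the batching bookkeeping for real graphs, and this sketch does not reproduce its API.

```python
def batch_graphs(graphs):
    """Merge small graphs into one disjoint container graph.

    Each graph is (num_nodes, edges) with edges as (src, dst) pairs.
    Returns the merged graph and per-graph node offsets.
    """
    total_nodes, merged_edges, offsets = 0, [], []
    for num_nodes, edges in graphs:
        offsets.append(total_nodes)
        # Shift IDs so this graph's nodes do not collide with earlier ones.
        merged_edges.extend((s + total_nodes, d + total_nodes) for s, d in edges)
        total_nodes += num_nodes
    return (total_nodes, merged_edges), offsets


def unbatch_values(values, graphs, offsets):
    """Split per-node values of the container graph back per input graph."""
    return [values[off:off + n] for (n, _), off in zip(graphs, offsets)]


def topological_frontiers(num_nodes, edges):
    """Group nodes so each node appears after all of its children.

    Edges point child -> parent, as in a bottom-up Tree-LSTM pass.
    All nodes in one frontier can be updated in parallel, across trees.
    """
    pending = [0] * num_nodes               # children not yet processed
    parents = [[] for _ in range(num_nodes)]
    for child, parent in edges:
        pending[parent] += 1
        parents[child].append(parent)
    frontier = [v for v in range(num_nodes) if pending[v] == 0]
    frontiers = []
    while frontier:
        frontiers.append(frontier)
        nxt = []
        for v in frontier:
            for p in parents[v]:
                pending[p] -= 1
                if pending[p] == 0:
                    nxt.append(p)
        frontier = nxt
    return frontiers
```

For a three-node tree and a two-node tree, the leaves of both trees form the first frontier and both roots form the second. That yields two parallel steps instead of one step per node, with no padding.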

# Generative models

**DGMG** [paper] [tutorial] [PyTorch code]: This model belongs to the family that deals with structural generation. Deep generative models of graphs (DGMG) uses a state-machine approach. It is also very challenging to train efficiently because, unlike Tree-LSTM, every sample has a dynamic, probability-driven structure that is not available before training. You can progressively leverage intra- and inter-graph parallelism to steadily improve the performance.

**JTNN** [paper] [PyTorch code]: This network generates molecular graphs using the framework of a variational autoencoder. The junction tree neural network (JTNN) builds structure hierarchically. For molecular graphs, it uses a junction tree as the intermediate scaffolding.
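The state-machine approach of DGMG can be sketched as a loop of three decisions: whether to add a node, whether to add an edge from the new node, and which existing node that edge connects to. In DGMG these decisions come from learned networks conditioned on the partial graph. Here they are pluggable callables, and `RandomPolicy` is a hypothetical stand-in that flips biased coins. The sketch also does not prevent duplicate edges.

```python
import random


def generate_graph(policy, max_nodes=10):
    """Build a graph through DGMG-style sequential decisions.

    policy must provide:
      add_node(graph) -> bool
      add_edge(graph, new_node) -> bool
      choose_dest(graph, new_node) -> id of an existing node
    where graph is the partial (num_nodes, edges) built so far.
    """
    num_nodes, edges = 0, []
    while num_nodes < max_nodes and policy.add_node((num_nodes, edges)):
        new_node = num_nodes
        num_nodes += 1
        # The first node has nothing to connect to.
        while new_node > 0 and policy.add_edge((num_nodes, edges), new_node):
            dest = policy.choose_dest((num_nodes, edges), new_node)
            edges.append((new_node, dest))
    return num_nodes, edges


class RandomPolicy:
    """Hypothetical stand-in for DGMG's learned decision modules."""

    def __init__(self, seed=0, p_node=0.8, p_edge=0.5):
        self.rng = random.Random(seed)
        self.p_node, self.p_edge = p_node, p_edge

    def add_node(self, graph):
        return self.rng.random() < self.p_node

    def add_edge(self, graph, new_node):
        return self.rng.random() < self.p_edge

    def choose_dest(self, graph, new_node):
        # Uniformly pick one of the nodes added before new_node.
        return self.rng.randrange(new_node)
```

Each sample follows its own sequence of decisions, so two samples in a minibatch can diverge after the first step. This is why the structure is unavailable before training, and why batching DGMG is harder than batching Tree-LSTM.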

# Revisit classic models from a graph perspective

**Capsule** [paper] [tutorial] [PyTorch code]: This computer vision model has two key ideas. First, it enhances the feature representation by using a vector (instead of a scalar) called a *capsule*. Second, it replaces max-pooling with dynamic routing. The idea of dynamic routing is to connect a lower-level capsule to one or several higher-level capsules through non-parametric message passing. A tutorial shows how the latter can be implemented with DGL APIs.

**Transformer** [paper] [tutorial] [PyTorch code] and **Universal Transformer** [paper] [tutorial] [PyTorch code]: These two models replace recurrent neural networks (RNNs) with several layers of multi-head attention to encode and discover structures among the tokens of a sentence. These attention mechanisms can likewise be formulated as graph operations with message passing.
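Dynamic routing can be read as repeated message passing on a complete bipartite graph from lower capsules to higher capsules. The following plain-Python sketch assumes the routing-by-agreement scheme from the capsule paper. Coupling coefficients are a softmax of per-edge logits. Each higher capsule squashes the weighted sum of its incoming prediction vectors, and each logit grows by the dot product between a prediction and the resulting output. The function names and the nested-list layout of `u_hat` are illustrative assumptions.

```python
import math


def squash(s):
    """Shrink a vector's length into [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def dynamic_routing(u_hat, num_iters=3):
    """Route lower capsules i to higher capsules j by agreement.

    u_hat[i][j] is lower capsule i's prediction vector for higher capsule j.
    Each (i, j) pair is an edge; each iteration is one round of messages.
    Returns the higher-capsule outputs v and the final coupling coefficients.
    """
    n_lower, n_higher, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_higher for _ in range(n_lower)]  # b_ij, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax of each lower capsule's logits over j.
        coupling = []
        for i in range(n_lower):
            m = max(logits[i])
            exps = [math.exp(b - m) for b in logits[i]]
            total = sum(exps)
            coupling.append([x / total for x in exps])
        # Each higher capsule aggregates weighted predictions, then squashes.
        v = []
        for j in range(n_higher):
            s = [sum(coupling[i][j] * u_hat[i][j][k] for i in range(n_lower))
                 for k in range(dim)]
            v.append(squash(s))
        # Agreement (dot product) strengthens consistent routes.
        for i in range(n_lower):
            for j in range(n_higher):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v, coupling
```

When two lower capsules agree on a prediction for one higher capsule and contradict each other for another, routing shifts both couplings toward the agreeing capsule. The contradictory capsule's output then collapses toward zero length.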

# Training on giant graphs

**Sampling** [paper] [tutorial] [MXNet code] [PyTorch code]: You can perform neighbor sampling and control-variate sampling to train a graph convolutional network and its variants on a giant graph.

**Scale to giant graphs** [tutorial] [MXNet code] [PyTorch code]: Two components, a graph store and a distributed sampler, let you scale to graphs with hundreds of millions of nodes.
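Neighbor sampling (not the control-variate variant) can be sketched as follows. Start from the minibatch's seed nodes and keep at most `fanout` in-neighbors per node at each hop. The work per minibatch then stays bounded no matter how large the graph is. The function `sample_blocks` and the dict-of-lists adjacency are hypothetical illustrations. DGL ships its own samplers, and this sketch does not reproduce their API.

```python
import random


def sample_blocks(adj, seeds, fanouts, rng=None):
    """Layer-wise neighbor sampling for a multi-layer GNN.

    adj maps node -> list of in-neighbors. fanouts[0] applies to the hop
    feeding the output layer, fanouts[1] to the hop before it, and so on.
    Returns blocks ordered from the input layer to the output layer; each
    block is a list of sampled (src, dst) edges.
    """
    if rng is None:
        rng = random.Random(0)
    blocks = []
    frontier = list(seeds)
    for fanout in fanouts:
        edges = []
        # Destination nodes stay in the next frontier: a GNN layer
        # usually also needs a node's own representation.
        next_frontier = set(frontier)
        for dst in frontier:
            nbrs = adj.get(dst, [])
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for src in picked:
                edges.append((src, dst))
                next_frontier.add(src)
        blocks.append(edges)
        frontier = sorted(next_frontier)
    blocks.reverse()  # computation runs from the input layer outward
    return blocks
```

With fanouts `[3, 2]` and a seed node of degree 10, the output layer sees 3 sampled neighbors instead of 10. Each hop multiplies the frontier by at most `fanout`, so a two-layer model touches at most a few dozen nodes per seed even on a graph with millions of nodes.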