{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\nLink Prediction using Graph Neural Networks\n===========================================\n\nIn the :doc:`introduction <1_introduction>`, you have already learned\nthe basic workflow of using GNNs for node classification,\ni.e.\u00a0predicting the category of a node in a graph. This tutorial will\nteach you how to train a GNN for link prediction, i.e.\u00a0predicting the\nexistence of an edge between two arbitrary nodes in a graph.\n\nBy the end of this tutorial you will be able to\n\n- Build a GNN-based link prediction model.\n- Train and evaluate the model on a small DGL-provided dataset.\n\n(Time estimate: 28 minutes)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import itertools\nimport os\nos.environ['DGLBACKEND'] = 'pytorch'\n\nimport numpy as np\nimport scipy.sparse as sp\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nimport dgl\nimport dgl.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Overview of Link Prediction with GNN\n------------------------------------\n\nMany applications such as social recommendation, item recommendation,\nknowledge graph completion, etc., can be formulated as link prediction,\nwhich predicts whether an edge exists between two particular nodes. 
This\ntutorial shows an example of predicting whether a citation relationship,\neither citing or being cited, between two papers exists in a citation\nnetwork.\n\nThis tutorial formulates the link prediction problem as a binary classification\nproblem as follows:\n\n- Treat the edges in the graph as *positive examples*.\n- Sample a number of non-existent edges (i.e.\u00a0node pairs with no edges\n between them) as *negative* examples.\n- Divide the positive examples and negative examples into a training\n set and a test set.\n- Evaluate the model with any binary classification metric such as Area\n Under Curve (AUC).\n\n
The practice comes from\n `SEAL
``dgl.remove_edges`` works by creating a subgraph from the\n original graph, which produces a copy and can therefore be slow for\n large graphs. If so, you could save the training and test graphs to\n disk, as you would for preprocessing.
The builtin functions are optimized for both speed and memory.\n We recommend using builtin functions whenever possible.
If you have read the :doc:`message passing\n tutorial <3_message_passing>`, you will notice that the\n argument that ``apply_edges`` takes has exactly the same form as a message\n function in ``update_all``.
This tutorial does not include evaluation on a validation\n set. In practice, you should select and save the best model based on\n its performance on the validation set.