dgl.node_label_informativeness

dgl.node_label_informativeness(graph, y, eps=1e-08)[source]

Label informativeness (\(\mathrm{LI}\)) is a characteristic of labeled graphs proposed in Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond.
Label informativeness shows how much information about a node's label we get from knowing its neighbor's label. Formally, assume that we sample an edge \((\xi,\eta) \in E\). The class labels of nodes \(\xi\) and \(\eta\) are then random variables \(y_\xi\) and \(y_\eta\). We want to measure the amount of knowledge the label \(y_\eta\) gives for predicting \(y_\xi\). The entropy \(H(y_\xi)\) measures the 'hardness' of predicting the label of \(\xi\) without knowing \(y_\eta\). Given \(y_\eta\), this value is reduced to the conditional entropy \(H(y_\xi \mid y_\eta)\). In other words, \(y_\eta\) reveals \(I(y_\xi,y_\eta) = H(y_\xi) - H(y_\xi \mid y_\eta)\) information about the label. To make the obtained quantity comparable across different datasets, label informativeness is defined as the normalized mutual information of \(y_{\xi}\) and \(y_{\eta}\):
\[\mathrm{LI} = \frac{I(y_\xi,y_\eta)}{H(y_\xi)}\]

Depending on the distribution used for sampling an edge \((\xi, \eta)\), several variants of label informativeness can be obtained. Two of them are particularly intuitive: in edge label informativeness (\(\mathrm{LI}_{edge}\)), edges are sampled uniformly at random, and in node label informativeness (\(\mathrm{LI}_{node}\)), first a node is sampled uniformly at random and then an edge incident to it is sampled uniformly at random. These two versions of label informativeness differ in how they weight high- and low-degree nodes. In edge label informativeness, averaging is over the edges, so high-degree nodes are given more weight. In node label informativeness, averaging is over the nodes, so all nodes are weighted equally.
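The identity \(I(y_\xi,y_\eta) = H(y_\xi) - H(y_\xi \mid y_\eta)\) that underlies this definition can be checked numerically. The joint label distribution below is a made-up example, not taken from any dataset:

```python
import math

# Hypothetical joint distribution of (y_xi, y_eta) over sampled edges.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals of the two endpoint labels.
p_xi = {c: sum(p for (a, _), p in joint.items() if a == c) for c in (0, 1)}
p_eta = {c: sum(p for (_, b), p in joint.items() if b == c) for c in (0, 1)}

# Entropy H(y_xi), conditional entropy H(y_xi | y_eta), mutual information.
h_xi = -sum(p * math.log(p) for p in p_xi.values())
h_cond = -sum(p * math.log(p / p_eta[b]) for (a, b), p in joint.items())
mi = sum(p * math.log(p / (p_xi[a] * p_eta[b])) for (a, b), p in joint.items())

# The identity I = H(y_xi) - H(y_xi | y_eta) holds up to floating-point error.
li = mi / h_xi  # normalized mutual information, i.e. label informativeness
```

Dividing the mutual information by \(H(y_\xi)\) then gives a value in \([0, 1]\) that is comparable across datasets with different numbers of classes.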
This function computes node label informativeness.
 Parameters
graph (DGLGraph) – The input graph.
y (Tensor) – The node labels.
eps (float, optional) – A small value to avoid division-by-zero errors.
 Returns
The node label informativeness value.
 Return type
float
Examples
>>> import dgl
>>> import torch

>>> graph = dgl.graph(([0, 1, 2, 2, 3, 4], [1, 2, 0, 3, 4, 5]))
>>> y = torch.tensor([0, 0, 0, 0, 1, 1])
>>> dgl.node_label_informativeness(graph, y)
0.3381872773170471
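For intuition, \(\mathrm{LI}_{node}\) can also be computed from scratch. The helper `node_li` below is a hypothetical pure-Python sketch for a small undirected graph, not DGL's implementation; it builds the joint label distribution induced by sampling a node uniformly and then one of its incident edges uniformly:

```python
import math
from collections import defaultdict

def node_li(edges, y, eps=1e-8):
    """Hypothetical sketch of node label informativeness (LI_node).

    edges: list of undirected edges (u, v); y: label per node id.
    """
    # Adjacency lists (isolated nodes are ignored in this sketch).
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    n = len(nbrs)
    # Joint distribution of (y_xi, y_eta): pick a node uniformly,
    # then one of its neighbors uniformly, so every node gets equal weight.
    joint = defaultdict(float)
    for u, vs in nbrs.items():
        for v in vs:
            joint[(y[u], y[v])] += 1.0 / (n * len(vs))
    # Marginals of the two endpoint labels.
    p_xi, p_eta = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_xi[a] += p
        p_eta[b] += p
    # Mutual information normalized by the entropy of y_xi.
    mi = sum(p * math.log(p / (p_xi[a] * p_eta[b] + eps))
             for (a, b), p in joint.items())
    h = -sum(p * math.log(p + eps) for p in p_xi.values())
    return mi / h

# Perfectly homophilous graph: each neighbor's label fully determines
# the node's label, so LI is (numerically close to) 1.
li = node_li([(0, 1), (2, 3)], [0, 0, 1, 1])
```

Note that a perfectly heterophilous bipartite graph also reaches \(\mathrm{LI} = 1\): unlike homophily measures, label informativeness rewards any neighbor labels that are predictive, not only matching ones.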