dgl.node_label_informativeness

dgl.node_label_informativeness(graph, y, eps=1e-08)[source]

Label informativeness (\(\mathrm{LI}\)) is a characteristic of labeled graphs proposed in the paper "Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond".

Label informativeness shows how much information about a node's label we get from knowing its neighbor's label. Formally, assume that we sample an edge \((\xi,\eta) \in E\). The class labels of nodes \(\xi\) and \(\eta\) are then random variables \(y_\xi\) and \(y_\eta\). We want to measure the amount of knowledge the label \(y_\eta\) gives for predicting \(y_\xi\). The entropy \(H(y_\xi)\) measures the hardness of predicting the label of \(\xi\) without knowing \(y_\eta\). Given \(y_\eta\), this value is reduced to the conditional entropy \(H(y_\xi|y_\eta)\). In other words, \(y_\eta\) reveals \(I(y_\xi,y_\eta) = H(y_\xi) - H(y_\xi|y_\eta)\) information about the label. To make the obtained quantity comparable across different datasets, label informativeness is defined as the normalized mutual information of \(y_\xi\) and \(y_\eta\):

\[\mathrm{LI} = \frac{I(y_\xi,y_\eta)}{H(y_\xi)}\]
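As a concrete illustration of the definitions above, the following sketch (plain PyTorch, not part of the DGL API) computes \(\mathrm{LI}\) from a hypothetical joint distribution over the class labels of an edge's endpoints; the numbers in the `joint` tensor are made up for illustration:

```python
import torch

# Hypothetical joint distribution p(y_xi, y_eta) over two classes;
# rows index y_xi, columns index y_eta. Entries sum to 1.
joint = torch.tensor([[0.5, 0.1],
                      [0.1, 0.3]])

p_xi = joint.sum(dim=1)   # marginal distribution of y_xi
p_eta = joint.sum(dim=0)  # marginal distribution of y_eta

# H(y_xi): hardness of predicting y_xi without knowing y_eta.
h_xi = -(p_xi * p_xi.log()).sum()

# Conditional entropy via the chain rule:
# H(y_xi | y_eta) = H(y_xi, y_eta) - H(y_eta).
h_joint = -(joint * joint.log()).sum()
h_eta = -(p_eta * p_eta.log()).sum()
h_cond = h_joint - h_eta

# I(y_xi, y_eta) = H(y_xi) - H(y_xi | y_eta), normalized by H(y_xi).
mi = h_xi - h_cond
li = (mi / h_xi).item()
print(li)
```

A graph whose edge endpoints share labels deterministically would give \(\mathrm{LI} = 1\), while independent endpoint labels give \(\mathrm{LI} = 0\).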

Depending on the distribution used for sampling an edge \((\xi, \eta)\), several variants of label informativeness can be obtained. Two of them are particularly intuitive: in edge label informativeness (\(\mathrm{LI}_{edge}\)), edges are sampled uniformly at random, and in node label informativeness (\(\mathrm{LI}_{node}\)), first a node is sampled uniformly at random and then an edge incident to it is sampled uniformly at random. These two versions of label informativeness differ in how they weight high/low-degree nodes. In edge label informativeness, averaging is over the edges, thus high-degree nodes are given more weight. In node label informativeness, averaging is over the nodes, so all nodes are weighted equally.
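The difference between the two sampling distributions can be sketched as follows. This is a simplified illustration, not DGL's implementation: it assumes a small undirected star graph stored with both directions of each edge and no isolated nodes, and only computes the per-edge sampling weights:

```python
import torch

# Star graph on 3 nodes (hub 0), with both directions of each edge.
src = torch.tensor([0, 0, 1, 2])
dst = torch.tensor([1, 2, 0, 0])
n = 3

# Node degrees, counted over the directed representation.
deg = torch.zeros(n).scatter_add_(
    0, src, torch.ones_like(src, dtype=torch.float)
)

# Edge sampling (LI_edge): every directed edge is equally likely,
# so the hub's edges carry half of the total probability mass.
w_edge = torch.full((src.shape[0],), 1.0 / src.shape[0])

# Node sampling (LI_node): pick the source node uniformly, then one
# of its edges uniformly, so every node gets equal total weight and
# the hub's individual edges are down-weighted.
w_node = 1.0 / (n * deg[src])

print(w_edge, w_node)  # both weight vectors sum to 1
```

Here edge sampling assigns weight 1/4 to each directed edge, while node sampling assigns 1/6 to each of the hub's two outgoing edges and 1/3 to each leaf's single edge.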

This function computes node label informativeness.

Parameters:
  • graph (DGLGraph) – The graph.

  • y (torch.Tensor) – The node labels, a tensor of shape (|V|).

  • eps (float, optional) – A small constant for numerical stability. (default: 1e-8)

Returns:

The node label informativeness value.

Return type:

float

Examples

>>> import dgl
>>> import torch
>>> graph = dgl.graph(([0, 1, 2, 2, 3, 4], [1, 2, 0, 3, 4, 5]))
>>> y = torch.tensor([0, 0, 0, 0, 1, 1])
>>> dgl.node_label_informativeness(graph, y)
0.3381872773170471