dgl.node_label_informativeness
- dgl.node_label_informativeness(graph, y, eps=1e-08)[source]
Label informativeness (\(\mathrm{LI}\)) is a characteristic of labeled graphs proposed in the paper Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond.
Label informativeness shows how much information about a node's label we get from knowing its neighbor's label. Formally, assume that we sample an edge \((\xi,\eta) \in E\). The class labels of nodes \(\xi\) and \(\eta\) are then random variables \(y_\xi\) and \(y_\eta\). We want to measure the amount of knowledge the label \(y_\eta\) gives for predicting \(y_\xi\). The entropy \(H(y_\xi)\) measures the "hardness" of predicting the label of \(\xi\) without knowing \(y_\eta\). Given \(y_\eta\), this value is reduced to the conditional entropy \(H(y_\xi|y_\eta)\). In other words, \(y_\eta\) reveals \(I(y_\xi,y_\eta) = H(y_\xi) - H(y_\xi|y_\eta)\) information about the label. To make the obtained quantity comparable across different datasets, label informativeness is defined as the normalized mutual information of \(y_{\xi}\) and \(y_{\eta}\):
\[\mathrm{LI} = \frac{I(y_\xi,y_\eta)}{H(y_\xi)}\]
Thus, \(\mathrm{LI} = 1\) when a neighbor's label fully determines a node's label, and \(\mathrm{LI} = 0\) when the two labels are independent. Depending on the distribution used for sampling an edge \((\xi, \eta)\), several variants of label informativeness can be obtained. Two of them are particularly intuitive: in edge label informativeness (\(\mathrm{LI}_{edge}\)), edges are sampled uniformly at random, and in node label informativeness (\(\mathrm{LI}_{node}\)), first a node is sampled uniformly at random and then an edge incident to it is sampled uniformly at random. These two versions differ in how they weight high- and low-degree nodes: in edge label informativeness, the averaging is over edges, so high-degree nodes are given more weight; in node label informativeness, the averaging is over nodes, so all nodes are weighted equally.
This function computes node label informativeness; a from-scratch sketch of the computation follows the examples below.
- Parameters:
  - graph (DGLGraph) – The graph.
  - y (torch.Tensor) – The node labels, a tensor of shape \((|V|,)\).
  - eps (float, optional) – A small constant for numerical stability. Default: 1e-08.
- Returns:
The node label informativeness value.
- Return type:
float
Examples
>>> import dgl
>>> import torch
>>> graph = dgl.graph(([0, 1, 2, 2, 3, 4], [1, 2, 0, 3, 4, 5]))
>>> y = torch.tensor([0, 0, 0, 0, 1, 1])
>>> dgl.node_label_informativeness(graph, y)
0.3381872773170471
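To make the definition concrete, below is a minimal from-scratch sketch of \(\mathrm{LI}_{node}\). It follows the sampling scheme described above (a node \(\xi\) is drawn uniformly at random, then an incident edge \((\xi, \eta)\) uniformly at random) on an undirected graph stored as a bidirected edge list. The helper li_node_sketch and its edge-list interface are illustrative assumptions, not part of DGL, and implementation details (for example, the treatment of edge direction) may differ from dgl.node_label_informativeness.

import torch

def li_node_sketch(src, dst, y, eps=1e-8):
    # Assumes: class labels are 0..C-1, every node has at least one edge,
    # and (src, dst) lists both directions of every undirected edge.
    num_nodes = y.numel()
    num_classes = int(y.max()) + 1
    deg = torch.bincount(src, minlength=num_nodes).double()
    # p(xi = u, eta = v) = 1 / (|V| * deg(u)) for each directed edge (u, v).
    w = 1.0 / (num_nodes * deg[src])
    joint = torch.zeros(num_classes, num_classes, dtype=torch.float64)
    joint.index_put_((y[src], y[dst]), w, accumulate=True)
    p_xi = joint.sum(dim=1)   # marginal distribution of y_xi
    p_eta = joint.sum(dim=0)  # marginal distribution of y_eta
    # H(y_xi) and I(y_xi, y_eta) = sum over label pairs of p * log(p / (p_xi * p_eta)).
    h_xi = -(p_xi * torch.log(p_xi + eps)).sum()
    mask = joint > 0
    outer = p_xi[:, None] * p_eta[None, :]
    mi = (joint[mask] * torch.log(joint[mask] / (outer[mask] + eps))).sum()
    return (mi / h_xi).item()

>>> u = torch.tensor([0, 1, 2, 2, 3, 4])
>>> v = torch.tensor([1, 2, 0, 3, 4, 5])
>>> y = torch.tensor([0, 0, 0, 0, 1, 1])
>>> li_node_sketch(torch.cat([u, v]), torch.cat([v, u]), y)  # approx. 0.3109

On this bidirected version of the example graph, the sketch gives roughly 0.3109; it need not reproduce the value shown above, which DGL computes on the directed example graph.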