Homophily measure recommended in "Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond".

Adjusted homophily is edge homophily adjusted for the expected number of edges connecting nodes with the same class label (taking into account the number of classes, their sizes, and the distribution of node degrees among them).

Mathematically, it is defined as follows:

$$\frac{h_{edge} - \sum_{k=1}^C \bar{p}(k)^2}{1 - \sum_{k=1}^C \bar{p}(k)^2},$$

where $$h_{edge}$$ denotes edge homophily, $$C$$ denotes the number of classes, and $$\bar{p}(\cdot)$$ is the empirical degree-weighted distribution of classes: $$\bar{p}(k) = \frac{\sum_{v\,:\,y_v = k} d(v)}{2|E|}$$, where $$d(v)$$ is the degree of node $$v$$.
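
As a small illustration (a hypothetical example, not taken from the paper), consider a path graph on four nodes, with edges $$\{0,1\}$$, $$\{1,2\}$$, $$\{2,3\}$$ and labels $$y = (0, 0, 1, 1)$$. Two of the three edges connect nodes of the same class, so $$h_{edge} = 2/3$$. The node degrees are $$(1, 2, 2, 1)$$ and $$2|E| = 6$$, so $$\bar{p}(0) = \bar{p}(1) = 3/6 = 1/2$$ and $$\sum_{k=1}^C \bar{p}(k)^2 = 1/2$$. Substituting into the definition gives

$$\frac{2/3 - 1/2}{1 - 1/2} = \frac{1}{3},$$

a mildly positive value: same-class edges are somewhat more frequent than the degree-weighted class distribution alone would predict.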

It has been shown that adjusted homophily satisfies more desirable properties than other homophily measures, which makes it appropriate for comparing the levels of homophily across datasets with different numbers of classes, different class sizes, and different degree distributions among classes.

Adjusted homophily can be negative. If adjusted homophily is zero, then the edge pattern in the graph is independent of node class labels. If it is positive, then the nodes in the graph tend to connect to nodes of the same class more often, and if it is negative, then the nodes in the graph tend to connect to nodes of different classes more often (compared to the null model where edges are independent of node class labels).
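
The formula can be checked by hand with a few lines of plain PyTorch. The following is a minimal sketch, not the library's implementation: the function name `adjusted_homophily_sketch` and its `src`/`dst` edge-list arguments are assumptions made for illustration, and the graph is assumed to be undirected with every edge listed in both directions, so that the number of listed edges equals $$2|E|$$.

```python
import torch


def adjusted_homophily_sketch(src, dst, y):
    """Illustrative re-implementation of the formula (hypothetical helper).

    src, dst: 1-D long tensors; every undirected edge appears in both directions.
    y: 1-D tensor of integer class labels, one per node.
    """
    y = y.long()
    num_classes = int(y.max()) + 1

    # Edge homophily: fraction of edges whose endpoints share a label.
    h_edge = (y[src] == y[dst]).float().mean()

    # Degree-weighted class distribution: each listed edge adds one unit of
    # degree to its source node, so summing ones per source class gives the
    # total degree of that class; dividing by the edge count (2|E|) normalizes.
    degree_sums = torch.zeros(num_classes).index_add_(
        0, y[src], torch.ones(src.numel())
    )
    p_bar = degree_sums / src.numel()

    expected = (p_bar ** 2).sum()
    return ((h_edge - expected) / (1 - expected)).item()


# Path graph 0 - 1 - 2 - 3 with labels (0, 0, 1, 1), edges in both directions.
path_src = torch.tensor([0, 1, 1, 2, 2, 3])
path_dst = torch.tensor([1, 0, 2, 1, 3, 2])
path_y = torch.tensor([0, 0, 1, 1])
h_path = adjusted_homophily_sketch(path_src, path_dst, path_y)
print(round(h_path, 4))  # 0.3333

# A single edge between two nodes of different classes is fully heterophilous.
pair_src = torch.tensor([0, 1])
pair_dst = torch.tensor([1, 0])
pair_y = torch.tensor([0, 1])
h_pair = adjusted_homophily_sketch(pair_src, pair_dst, pair_y)
print(round(h_pair, 4))  # -1.0
```

The two toy graphs show both signs: the path graph has more same-class edges than the null model predicts, while the single cross-class edge gives a negative value.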

Parameters:
• graph (DGLGraph) – The graph.

• y (torch.Tensor) – The node labels, a tensor of shape (|V|).

Returns: