# Is Homophily Really A Bad Metric? Disentangling Graph Homophily For Graph Neural Networks
Code Implementation for "Is Homophily Really A Bad Metric? Disentangling Graph Homophily For Graph Neural Networks" (Yilun Zheng $et\ al.$ 2024)

## Overview
The repository is organised as follows:
```
| -- train.py # Main training framework
| -- config.py # Configurations and descriptions of hyper-parameters
| -- datasets.py # Dataset class and homophily metrics of label, structural, and feature homophily
| -- homophily.py # Additional homophily metrics
| -- preprocess_dataset.py # Preprocess real-world datasets
| -- utils.py # Some useful functions
| -- /data # Folder to place preprocessed datasets
| -- /experiments # Folder to save experimental results
```
## Environment

```
python==3.7.16
torch==1.12.0+cu116
torch-cluster==2.1.1
torch-geometric==2.3.1
torch-scatter==2.1.1
torch-sparse==2.1.1
torch-spline-conv==1.2.2
dgl==1.1.2+cu116
ogb==1.3.6
numpy==1.19.2
scipy==1.7.3
networkx==2.3
```

## Data Preparation
Datasets of roman-empire, amazon-ratings, minesweeper, tolokers, questions, squirrel-filtered, chameleon-filtered, actor, texas-4-classes, cornell, and wisconsin are from https://github.com/yandex-research/heterophilous-graphs/tree/main/data.

Datasets of amazon-photo, amazon-computer, coauthor-cs, coauthor-physics, wikics, blog-catalog, ogbn-arxiv, genius, twitch-DE, twitch-ENGB, twitch-ES, twitch-FR, twitch-PTBR, twitch-RU, and twitch-TW are downloaded from preprocess_dataset.py
```
load_new_dataset(dataset_name=<DATASET_NAME>,split_type='random', train_prop=0.6, valid_prop=0.2, num_data_splits=10)
```

Please put the preprocessed datasets in ./data

## Training

For synthetic datasets, 
```
python train.py --dataset syn --model <MODEL NAME> --num_runs 1 --device <DEVICE> --save_dir <SAVED FILE DIR> --seed <RANDOM SEED> --syn_num_node <NUM OF NODES> --syn_num_class <NUM OF CLASS> --syn_num_degree "<MINIMUM NODE DEGREE> <MAXIMUM NODE DEGREE>" --syn_feat_dim <FEATURE DIMENSION> --syn_h_l <LABEL HOMOPHILY> --syn_h_s <STRUCTURAL HOMOPHILY> --syn_h_f <FEATURE HOMOPHILY> --num_steps <MAXIMUM TRAINING STEPS>
```

For real-world datasets,
```
python train.py --dataset <DATASET NAME> --model <MODEL NAME> --num_layers 2 --device <DEVICE> --save_dir <SAVED FILE DIR> --num_steps <MAXIMUM TRAINING STEPS> --hidden_dim <SIZE OF EMBEDDING> --dropout <DROPOUT RATE> --lr <LEARNING RATE>
```

For model hyper-parameters, please refer to config.py for detailed descriptions.

## Homophily Measurement

For label, structural, and feature homophily, please run
```
python datasets.py
```
For the other types of homophily, please refer to homophily.py

## Attribution
Part of model training are based on https://github.com/yandex-research/heterophilous-graphs
Parts of the homophily metrics are based on https://github.com/SitaoLuan/When-Do-GNNs-Help

## License
MIT