Graph Out-of-Distribution (OOD) generalization, which requires models trained on biased data to generalize to unseen test data, has considerable real-world applications. A mainstream approach extracts an invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, such solutions can discard or retain redundant semantic subgraphs, resulting in suboptimal generalization. To address this challenge, we propose exploiting the Probability of Necessity and Sufficiency (PNS) to extract sufficient and necessary invariant substructures. Beyond that, we leverage domain-variant subgraphs related to the labels to boost generalization performance in an ensemble manner. Specifically, we first model the data generation process for graph data. Under mild conditions, we show that the sufficient and necessary invariant subgraph can be extracted by minimizing an upper bound built on theoretical advances in the probability of necessity and sufficiency. To bridge the theory and the algorithm, we devise a model called Sufficiency and Necessity Inspired Graph Learning (SNIGL), which ensembles an invariant subgraph classifier, trained on top of latent sufficient and necessary invariant subgraphs, with a domain-variant subgraph classifier specific to the test domain for generalization enhancement. Experimental results on six public benchmarks demonstrate that SNIGL outperforms state-of-the-art techniques, highlighting its effectiveness in real-world scenarios.
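For context, the abstract's central quantity admits the standard counterfactual definition of PNS due to Pearl; the notation below is the conventional one and is an assumption, not taken from this paper. For a binary cause X and label Y,

\mathrm{PNS} := P\big(Y_{x} = y,\; Y_{x'} = y'\big),

the probability that Y would equal y had X been x, and would equal y' had X been x'. Under the usual exogeneity and monotonicity conditions this is identifiable from observational data as \mathrm{PNS} = P(y \mid x) - P(y \mid x'). Intuitively, a subgraph with high PNS for a label is one whose presence suffices to produce the label and whose absence prevents it, which is the property the proposed upper-bound objective targets.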
Keywords: Domain generalization; Graph out-of-distribution; Probability of necessity and sufficiency.