Purpose: Genetic variants in complement genes are associated with age-related macular degeneration (AMD). However, many rare variants have been identified in these genes, but have an unknown significance, and their impact on protein function and structure is still unknown. We set out to address this issue by evaluating the spatial placement and impact on protein structureof these variants by developing an analytical pipeline and applying it to the International AMD Genomics Consortium (IAMDGC) dataset (16,144 AMD cases, 17,832 controls).
Methods: The IAMDGC dataset was imputed using the Haplotype Reference Consortium (HRC), leading to an improvement of over 30% more imputed variants, over the original 1000 Genomes imputation. Variants were extracted for the CFH , CFI , CFB , C9 , and C3 genes, and filtered for missense variants in solved protein structures. We evaluated these variants as to their placement in the three-dimensional structure of the protein (i.e. spatial proximity in the protein), as well as AMD association. We applied several pipelines to a) calculate spatial proximity to known AMD variants versus gnomAD variants, b) assess a variant's likelihood of causing protein destabilization via calculation of predicted free energy change (ddG) using Rosetta, and c) whole gene-based testing to test for statistical associations. Gene-based testing using seqMeta was performed using a) all variants b) variants near known AMD variants or c) with a ddG >|2|. Further, we applied a structural kernel adaptation of SKAT testing (POKEMON) to confirm the association of spatial distributions of missense variants to AMD. Finally, we used logistic regression on known AMD variants in CFI to identify variants leading to >50% reduction in protein expression from known AMD patient carriers of CFI variants compared to wild type (as determined by in vitro experiments) to determine the pipeline's robustness in identifying AMD-relevant variants. These results were compared to functional impact scores, ie CADD values > 10, which indicate if a variant may have a large functional impact genomewide, to determine if our metrics have better discriminative power than existing variant assessment methods. Once our pipeline had been validated, we then performed a priori selection of variants using this pipeline methodology, and tested AMD patient cell lines that carried those selected variants from the EUGENDA cohort (n=34). We investigated complement pathway protein expression in vitro , looking at multiple components of the complement factor pathway in patient carriers of bioinformatically identified variants.
Results: Multiple variants were found with a ddG>|2| in each complement gene investigated. Gene-based tests using known and novel missense variants identified significant associations of the C3 , C9 , CFB , and CFH genes with AMD risk after controlling for age and sex (P=3.22×10 -5 ;7.58×10 -6 ;2.1×10 -3 ;1.2×10 -31 ). ddG filtering and SKAT-O tests indicate that missense variants that are predicted to destabilize the protein, in both CFI and CFH, are associated with AMD (P=CFH:0.05, CFI:0.01, threshold of 0.05 significance). Our structural kernel approach identified spatial associations for AMD risk within the protein structures for C3, C9, CFB, CFH, and CFI at a nominal p-value of 0.05. Both ddG and CADD scores were predictive of reduced CFI protein expression, with ROC curve analyses indicating ddG is a better predictor (AUCs of 0.76 and 0.69, respectively). A priori in vitro analysis of variants in all complement factor genes indicated that several variants identified via bioinformatics programs PathProx/POKEMON in our pipeline via in vitro experiments caused significant change in complement protein expression (P=0.04) in actual patient carriers of those variants, via ELISA testing of proteins in the complement factor pathway, and were previously unknown to contribute to AMD pathogenesis.
Conclusion: We demonstrate for the first time that missense variants in complement genes cluster together spatially and are associated with AMD case/control status. Using this method, we can identify CFI and CFH variants of previously unknown significance that are predicted to destabilize the proteins. These variants, both in and outside spatial clusters, can predict in-vitro tested CFI protein expression changes, and we hypothesize the same is true for CFH . A priori identification of variants that impact gene expression allow for classification for previously classified as VUS. Further investigation is needed to validate the models for additional variants and to be applied to all AMD-associated genes.