Phenotypic Prediction of Missense Variants via Deep Contrastive Learning

Missense variants (MVs) significantly influence numerous clinical phenotypes, but our understanding of their phenotypic consequences remains limited. Existing computational approaches to interpreting MVs predominantly assess their pathogenicity, without considering phenotypic heterogeneity. We present a machine-learning-based method, PheMART, to predict the clinical phenotypic consequences of MVs. PheMART integrates comprehensive variant and phenotype characterizations by combining multiple resources: protein language models, protein-protein interactions, protein domains, medical knowledge graphs, and electronic health records. Using contrastive learning, PheMART connects MVs to 4,179 phenotypes by jointly projecting them into a shared low-dimensional metric space in which proximity signifies relevance. Besides substantially outperforming existing models, PheMART aids in diagnosing individuals with rare diseases by effectively pinpointing clinical diagnoses and causative MVs. As a resource to the community, we provide a database of phenotypic predictions for 5.1 million putatively pathogenic amino acid alterations. We provide visualizations both by phenotype and by gene.
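To make the idea of a shared metric space concrete, the following is a minimal sketch, not PheMART's actual implementation. It assumes variants and phenotypes have already been embedded as vectors of equal dimension, ranks phenotypes for a variant by cosine similarity, and computes an InfoNCE-style contrastive loss as one common way such a space can be trained. The function names, the toy vectors, the phenotype labels, and the `temperature` value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rank_phenotypes(variant_vec, phenotype_vecs):
    """Return phenotype names sorted from most to least similar to the variant.

    phenotype_vecs maps a (hypothetical) phenotype name to its embedding.
    """
    scores = {name: cosine(variant_vec, vec) for name, vec in phenotype_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def contrastive_loss(variant_vecs, phenotype_vecs, temperature=0.1):
    """InfoNCE-style loss (an assumption, not necessarily the paper's objective).

    Variant i is treated as matching phenotype i; every other phenotype in
    the batch serves as a negative for that variant.
    """
    total = 0.0
    for i, v in enumerate(variant_vecs):
        logits = [cosine(v, p) / temperature for p in phenotype_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(variant_vecs)


# Toy usage with made-up two-dimensional embeddings.
toy_phenotypes = {"A": [0.0, 1.0], "B": [2.0, 0.0], "C": [1.0, 1.0]}
ranking = rank_phenotypes([1.0, 0.0], toy_phenotypes)
```

The loss is low when each variant lies closer to its own phenotype than to the others in the batch, which is what pulls related variant and phenotype pairs together in the learned space.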
