Machine Learning for Personalized Medicine

Marie-Curie Action: "Initial Training Networks"

Research talk by Yves Moreau: Variant prioritization by genomic data fusion

Watch the video at

Next-generation sequencing (NGS) has rapidly increased our ability to discover the causes of many previously unresolved rare monogenic disorders by sequencing rare exomic variation. However, after standard filtering for nonsynonymous single nucleotide variants (nSNVs) and loss-of-function mutations that are not present in healthy populations or unaffected samples, many candidate mutations often remain, and predictive methods are needed to prioritize variants for further validation. Several computational methods have been proposed that use biochemical, evolutionary and structural properties of mutations to assess their potential deleteriousness. However, most of these methods suffer from high false positive rates when predicting the impact of rare nSNVs. A plausible explanation for this poor performance is that many of the variants they flag are mildly deleterious but in no way specific to the disease of interest. We therefore propose a genomic data fusion methodology that integrates multiple strategies for detecting the deleteriousness of mutations and prioritizes them in a phenotype-specific manner. A key innovation is that our strategy incorporates a computational method for gene prioritization, which scores mutated genes by their similarity to known disease genes, fusing heterogeneous genomic information. We also integrate haploinsufficiency prediction scores, which estimate the probability that a gene's function is impaired when the gene is present in a functionally haploid state. To fuse these data sources, we train a machine-learning model on disease-causing mutations from the Human Gene Mutation Database (HGMD), contrasted with three control sets: common polymorphisms and two independent sets of rare variation. Benchmarking on HGMD demonstrates that this integrative, phenotype-specific variant prioritization significantly outperforms state-of-the-art predictors such as SIFT and PolyPhen-2.
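The fusion idea in the abstract can be illustrated with a small sketch. It is not the speaker's implementation: the abstract does not name the learning algorithm, so a plain logistic regression stands in for it here, and the three feature names, the synthetic training data and the candidate variants are all hypothetical. The only point is to show how per-variant deleteriousness, gene-level prioritization and haploinsufficiency scores might be combined into one ranking score.

```python
import math
import random

# Hypothetical feature names, each assumed to be scaled to [0, 1].
FEATURES = ["deleteriousness", "gene_prioritization", "haploinsufficiency"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fused_score(weights, bias, x):
    """Combine the individual scores of one variant into a single probability."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, epochs=500, lr=0.5):
    """Batch gradient descent for logistic regression (stand-in model)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = fused_score(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make_synthetic(rng, n_per_class=200):
    """Synthetic stand-in for disease-causing (1) and control (0) variants."""
    rows, labels = [], []
    for label, centre in ((1, 0.7), (0, 0.3)):
        for _ in range(n_per_class):
            rows.append([min(1.0, max(0.0, rng.gauss(centre, 0.15)))
                         for _ in FEATURES])
            labels.append(label)
    return rows, labels


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, labels = make_synthetic(rng)
weights, bias = fit_logistic(rows, labels)

# Hypothetical candidates left over after standard filtering.
candidates = {
    "variant_A": [0.9, 0.8, 0.7],  # damaging, in a phenotype-relevant gene
    "variant_B": [0.9, 0.1, 0.2],  # damaging, but gene unrelated to phenotype
    "variant_C": [0.2, 0.1, 0.1],  # probably benign
}
ranked = sorted(candidates,
                key=lambda name: fused_score(weights, bias, candidates[name]),
                reverse=True)
for name in ranked:
    print(name, round(fused_score(weights, bias, candidates[name]), 3))
```

In this toy setting, variant_B has the same deleteriousness score as variant_A but ranks below it because its gene-level scores are low. That mirrors the abstract's argument that a mildly deleterious variant is not necessarily relevant to the disease of interest.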

Download slides (PDF)
