Machine Learning for Personalized Medicine

Marie-Curie Action: "Initial Training Networks"

Project: Comparison of multi-marker methods to identify the genetic pathways underlying asthma

Yuanlong Liu
Florence Demenais
Paris, France 

Yuanlong Liu was born in Chongqing, China, in 1988. He studied Probability and Mathematical Statistics at the University of Science and Technology of China (USTC) and obtained his MSc in 2013, working on bandwidth selection methods in kernel density estimation.

Project description

Like many multifactorial diseases, asthma is a complex and heterogeneous disease that results from many genetic and environmental factors. Numerous genetic studies, based on candidate-gene and positional-cloning approaches and, more recently, on genome-wide association studies (GWAS), have identified a number of asthma susceptibility genes. However, these genes explain only part of the genetic component of asthma.

To date, GWAS have consisted of testing the association of a disease (or any phenotype) with each of the hundreds of thousands to millions of SNPs genotyped across the genome, and of highlighting the SNPs that reach a stringent genome-wide significance level (e.g., p-value < 10^-8, to account for multiple testing). In comparison, the joint analysis of multiple SNPs (multi-marker analysis) can detect SNPs with small effects and SNP-SNP interactions, and can provide more insight into the biological pathways that influence disease. An increasing number of multi-marker methods have been proposed, which differ in several aspects:

1) whether they are applied at the genome-wide level or targeted towards candidate pathways;
2) whether or not they use prior biological knowledge;
3) the input data they require: raw genotypic and phenotypic data, or SNP summary statistics;
4) the statistical method used and the way the significance of the test statistics is assessed.
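To make the multiple-testing argument concrete, here is a minimal Python sketch. It computes a Bonferroni per-SNP threshold (0.05 / 1,000,000 = 5 × 10^-8, the same order as the 10^-8 quoted above) and, purely as one simple illustration of a multi-marker statistic, combines several modest SNP p-values with Fisher's method. Fisher's method is an assumption chosen for illustration, not one of the methods compared in this project, and it assumes independent tests, which SNPs in linkage disequilibrium violate.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-SNP significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def fisher_combined_p(p_values: list[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square with 2k df under H0,
    assuming the k tests are independent. For even df the survival function is
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0                        # i = 0 term of the sum
    for i in range(1, k):
        term *= half_x / i                        # (x/2)^i / i!
        total += term
    return math.exp(-half_x) * total


# Hypothetical example: five SNPs in one gene, each with p = 1e-3.
threshold = bonferroni_threshold(0.05, 1_000_000)
gene_snps = [1e-3] * 5
print(any(p < threshold for p in gene_snps))  # False: no single SNP is genome-wide significant
print(fisher_combined_p(gene_snps))           # about 6.7e-11 under the independence assumption
```

No single SNP passes the genome-wide threshold, yet the combined evidence is strong; in real data, correlation between neighbouring SNPs must be accounted for, which is one reason permutation-based procedures are used.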

This project will focus on multi-marker and network-based approaches that use biological knowledge, which is becoming available in public databases at an increasing rate with the development of new sequencing and “omic” technologies (e.g., protein networks, gene co-expression, co-occurrence of genes in publications…). These approaches integrate GWAS association data with biological data, such as Gene Ontology classes, biological pathways or protein networks, to search for specific pathways or networks that are enriched in disease-associated genes. The main objectives of this project are:

1) to compare the characteristics of different approaches that integrate GWAS data and biological knowledge for the genetic analysis of complex phenotypes (e.g., gene set enrichment analysis, protein-protein interaction networks, gene co-expression networks…);
2) to define the most suitable analysis strategy based on the outcomes of the first objective, and to propose extensions of existing methods (e.g., to address the issue of genetic heterogeneity);
3) to apply these approaches to large-scale genome-wide SNP data on asthma to identify new gene sets involved in this disease.
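Gene set enrichment can take several forms. As a minimal sketch of the simplest one, an over-representation test, the Python code below computes a one-sided hypergeometric p-value for the overlap between a pathway and a set of disease-associated genes. This is an illustrative assumption, not the project's method: rank-based enrichment and network approaches work differently, the function names are hypothetical, and the gene identifiers in the example are made up.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_set: int, n_hits: int, overlap: int) -> float:
    """One-sided P(X >= overlap), X ~ Hypergeometric.
    n_universe: genes tested; n_set: genes in the pathway;
    n_hits: disease-associated genes; overlap: associated genes in the pathway."""
    upper = min(n_set, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(n_universe, n_hits)  # exact integers until this division


def enrichment_for_gene_set(tested, associated, pathway):
    """Restrict both gene lists to the tested genes, then test the overlap."""
    tested = set(tested)
    associated = set(associated) & tested
    pathway = set(pathway) & tested
    overlap = len(associated & pathway)
    p = hypergeom_enrichment_p(len(tested), len(pathway), len(associated), overlap)
    return overlap, p


# Hypothetical genes G1..G10; three are associated and all three lie in the pathway.
tested = [f"G{i}" for i in range(1, 11)]
print(enrichment_for_gene_set(tested, ["G1", "G2", "G3"], ["G1", "G2", "G3"]))
```

Restricting both lists to the tested genes matters: the background (universe) should be the genes that could have been called associated, otherwise the p-value is miscalibrated.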

Motivation for participating in the network

Medicine and human health have long been keen interests of mine and a high priority for my career development. The ITN MLPM program gives me perspectives on cutting-edge medical technologies, especially the frontier of personalized medicine tailored to individuals. The high-level pre-doctoral training provided by the program helps to broaden my knowledge and enhance my skills in medical analysis. It also offers various opportunities to collaborate with academic labs across the network's nodes and to work in private companies, which readily enables interdisciplinary cooperation within this inspiring academic-industrial network.


Duration of fellowship: from September 2013 to September 2016


MLPM2012 Publications:


- Y. Liu, M. Brossard, C. Sarnowski, P. Margaritte-Jeannin, F. Llinares, A. Vaysse, M.H. Dizier, E. Bouzigon, F. Demenais. Bring together Machine Learning and Statistical Genetics for Personalized Medicine. 14th Annual Congress of International Drug Discovery Science and Technology (invited talk), Nanjing, China, 16-19 Nov 2016.

- Y. Liu, M. Brossard, C. Sarnowski, P. Margaritte-Jeannin, A. Vaysse, M.H. Dizier, E. Bouzigon, F. Demenais. Integrate network resources to optimize genetic association studies. European Society of Human Genetics annual meeting, Barcelona, Spain, 21-24 May 2016.

- Y. Liu, M. Brossard, C. Sarnowski, P. Margaritte-Jeannin, F. Llinares, A. Vaysse, M.H. Dizier, E. Bouzigon, F. Demenais, and GABRIEL asthma consortium. Integration of genome-wide association data and human protein interaction networks identifies a gene sub-network underlying childhood-onset asthma. American Society of Human Genetics annual meeting, Baltimore, USA, 6-10 Oct 2015.

- Y. Liu, M. Brossard, P. Margaritte-Jeannin, F. Llinares, C. Sarnowski, L. Al-Shikhley, N. Lavielle, A. Vaysse, M.H. Dizier, E. Bouzigon, F. Demenais. Network-Assisted Investigation of Signals from Genome-Wide Association Studies in Childhood-onset Asthma. Capita Selecta in Complex Disease Analysis conference, Liège, Belgium, 24-26 Nov 2014.

- Y. Liu, M. Brossard, D. Roqueiro, P. Margaritte-Jeannin, C. Sarnowski, E. Bouzigon, F. Demenais. A novel network method (SigMod) identifies a strongly interconnected gene module associated with childhood asthma. European Society of Human Genetics annual meeting, Barcelona, Spain, 21-24 May 2016.

- Y. Liu, M. Brossard, C. Sarnowski, P. Margaritte-Jeannin, F. Llinares, A. Vaysse, M.H. Dizier, E. Bouzigon, F. Demenais. Network-based analysis of GWAS data identifies a gene sub-network underlying childhood-onset asthma. International Genetic Epidemiology Society Annual Meeting, Baltimore, USA, 4-6 Oct 2015.

- Y. Liu et al. Pathways and protein networks associated with asthma. Annual meeting of the French Doctoral School, Saint Malo, France, 20-22 Oct 2014.


SigMod: an exact and efficient method to identify a strongly interconnected disease-associated module in gene connectome

SigMod2: identify novel disease-associated modules closely connected to previously known disease genes

fastCGP: a fast and powerful algorithm to compute gene-level P-values from Genome-Wide Association Studies through circular genomic permutation
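The general idea behind circular genomic permutation can be illustrated with a naive Python sketch. SNP p-values are kept in genomic order and treated as a circle; rotating the circle by a random offset moves a whole contiguous block of SNPs under the gene, so correlation between neighbouring SNPs is approximately preserved, and the gene statistic is recomputed on each rotation. The gene statistic used here (the smallest SNP p-value in the gene) and all function names are hypothetical choices for illustration; this is not fastCGP and does not reproduce its speed-ups.

```python
import random


def gene_stat(window):
    """Hypothetical gene statistic: the smallest SNP p-value in the gene."""
    return min(window)


def circular_permutation_p(snp_p, start, end, n_perm=1000, seed=0):
    """Empirical gene-level p-value for SNPs snp_p[start:end] (genomic order)."""
    rng = random.Random(seed)
    n = len(snp_p)
    width = end - start
    observed = gene_stat(snp_p[start:end])
    as_extreme = 0
    for _ in range(n_perm):
        shift = rng.randrange(1, n)  # random rotation, excluding the identity
        new_start = (start + shift) % n
        # Contiguous block of the rotated circle that now sits under the gene.
        window = [snp_p[(new_start + j) % n] for j in range(width)]
        if gene_stat(window) <= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: 100 SNPs with p = 0.5, except a 3-SNP gene at positions 10-12.
snp_p = [0.5] * 100
snp_p[10:13] = [1e-6, 1e-4, 1e-3]
print(circular_permutation_p(snp_p, 10, 13))
```

Because only rotations that bring the strongest SNP back under the gene are as extreme as the observed statistic, the gene-level p-value is small here; for a gene with no signal it approaches 1.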


Other publications:

  • Yuanlong Liu, On optimal data-based bandwidth selection in kernel density estimation: a revisit with new approach, Applications of statistics and management [Journal in Chinese, in press]
  • Xiangning Wang, Jie Xu, Yuanlong Liu. [DCC-GARCH, M-Copula-GARCH, Copula-SV in the application of futures hedging], Systems Engineering [Journal in Chinese] 2012, 31:50-64