Machine Learning for Personalized Medicine

Marie-Curie Action: "Initial Training Networks"

Project: High-throughput detection of higher-order epistasis using GPGPU computing methods

Meiwen Jia

Bertram Müller-Myhsok

Max Planck Institute of Psychiatry 


Meiwen Jia was born in Wuhu, China, in 1986. She studied Bioinformatics at East China Normal University and obtained her MSc in 2012, working on Prediction of human diseasome-related nonsynonymous variants by Bayesian causal inference.

Project description

Complex human phenotypes (e.g. common diseases) are likely associated with the effects of multiple genetic variants in combination with environmental factors. Investigating gene-gene and gene-environment interactions is therefore essential for understanding the mechanisms of complex diseases. Models of two-way epistatic interactions have been widely investigated and refined. However, evidence is emerging that genes not only interact pairwise but can also be involved in complicated networks. Unfortunately, there is no prior information on how many SNPs interact epistatically. Detecting high-order interactions in high-dimensional data is therefore difficult both theoretically and statistically, with challenges that include model plausibility, computational complexity and statistical power. The goal of this project is not only to develop refined methods but also to implement them as practical tools. The implementation will run on standard graphics processing units (GPUs), providing a fast and inexpensive way to detect high-order interactions through parallel computing. The work will focus on:
1. Simulation studies comparing different state-of-the-art techniques in epistasis analysis, including SVMs.
2. Development of a further refined method based on the results of simulation studies.
3. Implementation of the developed method using GPGPU techniques for maximum throughput.
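
To make the computational problem concrete, the following is a minimal Python sketch of an exhaustive k-way interaction scan on toy data. It is not the method developed in the project: the function names (n_combinations, chi_square_score, exhaustive_scan), the binary genotype coding and the use of a Pearson chi-square statistic as the score are all assumptions chosen for illustration. The scan scores every SNP combination independently of the others, which is the kind of workload that parallel hardware such as a GPU can distribute.

```python
from collections import Counter
from itertools import combinations
from math import comb


def n_combinations(n_snps, order):
    """Number of SNP sets an exhaustive scan of this order must score."""
    return comb(n_snps, order)


def chi_square_score(genotypes, phenotype, snp_idx):
    """Pearson chi-square of joint genotype at snp_idx versus case/control status.

    Illustrative score only (an assumption, not the project's statistic).
    """
    n = len(phenotype)
    cell = Counter()  # (joint genotype, status) -> observed count
    row = Counter()   # joint genotype -> count
    col = Counter()   # status -> count
    for sample, status in zip(genotypes, phenotype):
        key = tuple(sample[i] for i in snp_idx)
        cell[(key, status)] += 1
        row[key] += 1
        col[status] += 1
    score = 0.0
    for key in row:
        for status in col:
            expected = row[key] * col[status] / n
            score += (cell[(key, status)] - expected) ** 2 / expected
    return score


def exhaustive_scan(genotypes, phenotype, order):
    """Score every SNP combination of the given order and return the best one."""
    n_snps = len(genotypes[0])
    scored = (
        (chi_square_score(genotypes, phenotype, idx), idx)
        for idx in combinations(range(n_snps), order)
    )
    best_score, best_idx = max(scored)
    return best_idx, best_score


# Toy data (hypothetical): 3 binary SNPs, status = SNP0 XOR SNP2.
# Neither SNP has a marginal effect, so only a pairwise test detects the signal.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
phenotype = [a ^ c for a, b, c in genotypes]

best_pair, best_score = exhaustive_scan(genotypes, phenotype, order=2)
print("best pair:", best_pair, "score:", best_score)
print("3-way tests for 500,000 SNPs:", n_combinations(500_000, 3))
```

The number of combinations grows roughly as the k-th power of the number of SNPs, so the same scan that is trivial on this toy example becomes the dominant cost at genome-wide scale.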

Motivation for participating in the network

After a three-year master's education in biomedicine, I realized the importance and necessity of cross-disciplinary knowledge. Interdisciplinary education can break the limitations of traditional approaches in a single field and foster innovative perspectives and methods. I was looking for a unique and creative PhD programme that gives doctoral students opportunities to learn and integrate knowledge from multiple academic fields. MLPM creates such an inspiring and collaborative environment, enabling young researchers to gain deep insight into machine learning methodology and statistical genetics theory. MLPM emphasizes not only high-quality doctoral-level training in science but also the researchers' career development. The industrial internship helps us to better understand how the theories are applied in practice and aims to build a bridge from basic research to clinical application. It also lets us learn about professional work culture, communication and management skills, project-based assignments and the basic operations of an enterprise, which complements the experience of researchers in academia.


Duration of fellowship: from June 2013 to June 2016



MLPM2012 Publications: 

  • Yu Y, Fuscoe JC, Zhao C, Guo C, Jia M, Qing T, Bannon DI, Lancashire L, Bao W, Du T, Luo H, Su Z, Jones WD, Moland CL, Branham WS, Qian F, Ning B, Li Y, Hong H, Guo L, Mei N, Shi T, Wang KY, Wolfinger RD, Nikolsky Y, Walker SJ, Duerksen-Hughes P, Mason CE, Tong W, Thierry-Mieg J, Thierry-Mieg D, Shi L, Wang C. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat Commun 2014; 5: 3230.
  • Jia M, Liu Y, Shen Z, Zhao C, Zhang M, Yi Z, Wen C, Deng Y, Shi T. HDAM: a resource of human disease associated mutations from next generation sequencing studies. BMC Med Genomics 2013; 6 Suppl 1: S16.
  • Wang J, Jia M, Zhu L, Yuan Z, Li P, Chang C, Luo J, Liu M, Shi T. Systematical detection of significant genes in microarray data by incorporating gene interaction relationship in biological systems. PLoS One 2010; 5: e13721. 

Other publications:

  • Zhang W, Yu Y, Hertwig F, Thierry-Mieg J, Zhang W, Thierry-Mieg D, Wang J, Furlanello C, Devanarayan V, Cheng J, Deng Y, Hero B, Hong H, Jia M, Li L, Lin SM, Nikolsky Y, Oberthuer A, Qing T, Su Z, Volland R, Wang C, Wang MD, Ai J, Albanese D, Asgharzadeh S, Avigad S, Bao W, Bessarabova M, Brilliant MH, Brors B, Chierici M, Chu TM, Zhang J, Grundy RG, He MM, Hebbring S, Kaufman HL, Lababidi S, Lancashire LJ, Li Y, Lu XX, Luo H, Ma X, Ning B, Noguera R, Peifer M, Phan JH, Roels F, Rosswog C, Shao S, Shen J, Theissen J, Tonini GP, Vandesompele J, Wu PY, Xiao W, Xu J, Xu W, Xuan J, Yang Y, Ye Z, Dong Z, Zhang KK, Yin Y, Zhao C, Zheng Y, Wolfinger RD, Shi T, Malkas LH, Berthold F, Wang J, Tong W, Shi L, Peng Z, Fischer M. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biol 2015; 16: 133. doi: 10.1186/s13059-015-0694-1
  • SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 2014; 32(9):903-14. doi: 10.1038/nbt.2957