Machine Learning for Personalized Medicine

Marie-Curie Action: "Initial Training Networks"

Inter-platform Concordance and Prediction of Chemical Mode of Action Using

 

The organizers selected a CAMDA 2016 challenge question. The same question had been asked at CAMDA 2015. Since it received no satisfactory answer, the CAMDA officials repeated the challenge in 2016.

CAMDA 2016

Challenge 3: SEQC Rat TGx - rat liver response to chemicals

Topic 2: Classification / prediction: "We know we can get 100% accurate prediction of the chemical mode of action for RNASeq. Can you develop a similarly good predictor that works crossplatform?"

 

BACKGROUND:

Over the last 10 years, technologies in molecular genetics have rapidly moved from low throughput to high throughput. Accordingly, the quality of the data, which has a high impact on efficient diagnosis and prognosis, has improved considerably. These technologies are based on polymerase chain reaction, microarrays and next-generation sequencing, in chronological order from oldest to newest. Concordance between these technologies/platforms is one of the challenging topics in genetics and computational biology. It is also the key to combining retrospective and prospective data.

GOAL:

To extract as much accurate information as possible from dataset collections and transform it into an understandable structure for further use in medicine. The project work focused on predicting the effect of chemicals from gene expression data.

The data provider tried 61 different classifiers (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243706/). Using RNASeq data, they achieved 100% accurate prediction, but they were not successful using microarray data.
So the main question for us was: Can we develop a similarly good predictor that works for microarray data?
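
To make the cross-platform question concrete, the following is a minimal sketch and not the method used in this project or by the data provider. It assumes purely synthetic data in which two platforms measure the same underlying signal with different scale and offset, standardizes each gene within its platform, trains a nearest-centroid classifier on the RNASeq-like data and evaluates it on the microarray-like data. All names (simulate_platform, MOA_A, and so on) and all parameter values are hypothetical.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each gene (column) to mean 0 and sd 1 within one platform."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant genes
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def fit_centroids(rows, labels):
    """Compute the mean expression profile of each class."""
    centroids = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [mean(c) for c in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def simulate_platform(labels, signal, scale, offset, noise, rng):
    """Synthetic profiles: shared signal, platform-specific scale and offset."""
    return [
        [scale * s + offset + rng.gauss(0, noise) for s in signal[lab]]
        for lab in labels
    ]


rng = random.Random(0)
n_genes = 20
modes = ["MOA_A", "MOA_B", "MOA_C"]  # hypothetical modes of action
signal = {m: [rng.gauss(0, 1) for _ in range(n_genes)] for m in modes}
labels = [m for m in modes for _ in range(10)]  # 10 samples per class

# Assumed platform differences; the values are illustrative only.
rnaseq = simulate_platform(labels, signal, scale=3.0, offset=5.0, noise=0.3, rng=rng)
microarray = simulate_platform(labels, signal, scale=1.0, offset=8.0, noise=0.3, rng=rng)

# Train on one platform, test on the other, after per-platform standardization.
centroids = fit_centroids(zscore_columns(rnaseq), labels)
predicted = [predict(centroids, r) for r in zscore_columns(microarray)]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
print(f"cross-platform accuracy on synthetic data: {accuracy:.2f}")
```

Real matched profiles differ between platforms in more than scale and offset, so per-gene standardization alone should not be expected to give comparable accuracy on the actual challenge data.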

DATA:

The matched RNASeq and microarray gene expression profiles, provided by the FDA SEQC consortium, were obtained from the CAMDA Challenge website.

RESULTS:

We did not succeed in predicting the chemicals' mode of action as well as the data provider did. However, we obtained good results for microarray data, comparable with the results of Wang C et al. (2014)*.

Data, code and results can be found here: https://github.com/mzwiessele/MLPMMarchRetreat2016

Reference:

* Wang C et al. A comprehensive study design reveals treatment- and abundance-dependent concordance between RNASeq and microarray data. Nat Biotechnol 2014; 32: 926-932.