Project: Machine Learning for biomarker discovery from heterogeneous data


Researcher: Felipe Llinares López	Supervisor: Karsten Borgwardt	Department of Biosystems Science and Engineering, ETH Zürich, Switzerland

Felipe Llinares López was born in Algeciras, Spain, in December 1989. He studied Telecommunication Engineering at Universidad Carlos III de Madrid and obtained his MSc in September 2012, working on “Clustering Techniques for Base Station Coordination in a Wireless Cellular System”.

Project description

Due to the nature of the data, applying Machine Learning to biomedicine and computational biology is particularly challenging. Besides being extremely high-dimensional and typically having small sample sizes, the available data exhibit a high degree of heterogeneity. For example, we might be interested in diagnosing patients by integrating biochemical data (such as blood tests) and different imaging modalities (such as MRI or PET) with qualitative clinical descriptors or the patient’s medical history.
Another relevant source of data heterogeneity in biomedical and biochemical research comes from the data structures themselves. While most standard Machine Learning methods assume that the input data consists of real-valued vectors, structures such as strings, graphs or time series are extremely common in medicine and computational biology. Without being able to handle those structures, representing DNA/protein sequences, chemical molecules, known interactions between genes or the evolution of clinical biomarkers and gene expression profiles, just to name a few, would be extremely difficult.

The main goal of this project is to address those obstacles by developing specific Machine Learning algorithms that go beyond the state-of-the-art in heterogeneous data integration for computational biology and medical research.

Motivation for participating in the network

Deciphering how living beings work, all the way from the molecular level up to the physiological traits they exhibit, is, in my opinion, the greatest reverse engineering problem Mankind has ever faced. A better understanding of the biochemical machinery that makes us who we are would open the door to custom-tailored medical treatment, which could dramatically enhance our quality of life.

However, I deeply believe that the task ahead of us is so overwhelmingly complex that traditional scientific research will never suffice to solve the puzzle on its own. Instead of relying on scientists to come up with and validate every single explanation, with data analysis serving merely as a tool to test precise man-made hypotheses against empirical data, I believe that Machine Learning holds the key to surpassing our inherent limitations. With ever-increasing data availability and computing power, I am confident that in the not-so-distant future, Machine Learning algorithms will allow us to find and make sense of extremely complex patterns in experimental data, which we could never hope to find otherwise. Those new findings, fueled by artificial intelligence (AI), will in turn allow us to refine our Machine Learning algorithms by incorporating more and more prior knowledge, creating a synergy between Machine-Learning-based research and traditional research that I believe will ultimately succeed in explaining the detailed mechanisms of molecular biology, changing forever the way we fight disease.

My long-term career goal is to become an active part of that process, contributing to the design of specialized Machine Learning algorithms for computational biology and biomedical research.

By bringing together Machine Learning with Statistical Genetics, the “Machine Learning for Personalized Medicine” Initial Training Network offers an unparalleled environment to start such a career regardless of the student’s background. Therefore, I believe this to be a unique opportunity for anyone sharing the same motivation.

****************************

Duration of fellowship: from September 2013 to September 2016

Contact: felipe.llinares@bsse.ethz.ch

MLPM Publications:

ARTICLES

- Laetitia Papaxanthos*, Felipe Llinares-López*, Dean Bodenham and Karsten Borgwardt (*=equal contributions). Finding significant combinations of features in the presence of categorical covariates, Accepted at NIPS 2016.

- Mahito Sugiyama, Felipe Llinares-López, Niklas Kasenburg, Karsten Borgwardt. Significant Subgraph Mining with Multiple Testing Correction, Proceedings of the 2015 SIAM International Conference on Data Mining. 2015, in press.

- Felipe Llinares-López, Dominik G. Grimm, Dean A. Bodenham, Udo Gieraths, Mahito Sugiyama, Beth Rowan, Karsten Borgwardt. Genome-wide detection of intervals of genetic heterogeneity associated with complex traits. Bioinformatics (2015) 31 (12):i240-i249.

- Llinares-López F, Sugiyama M, Papaxanthos L, Borgwardt KM. Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing. Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2015), 2015, 725-734.

CODE

http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html

ORAL PRESENTATIONS

- Llinares-López F, Sugiyama M, Papaxanthos L, Borgwardt KM. Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing. KDD conference 2015.

POSTERS

- Llinares-López F, Sugiyama M, Papaxanthos L, Borgwardt KM. Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing. KDD conference 2015.

- Llinares-López F*, Bodenham D*, Papaxanthos L*, Borgwardt KM. Detecting significant high-order associations between genotype and phenotype while conditioning on covariates. NIPS conference 2015 (* first authors)

Machine Learning for Personalized Medicine

Marie-Curie Action: "Initial Training Networks"