Machine Learning for Personalized Medicine

Marie-Curie Action: "Initial Training Networks"

Talks and Speakers

Invited Speakers

Pierre Baldi

University of California, Irvine

Carbon-Based Computing Vs Silicon-Based Computing: Towards A New Theory of Circadian Rhythms

Carbon-based and silicon-based computing systems are very different. One key difference is the pervasive presence of circadian rhythms in living systems at multiple levels. At the molecular level, circadian rhythms are regulated by a central clock consisting of a key negative transcription-translation feedback loop involving a dozen genes. However, integrative systems biology analyses of high-throughput transcriptomic and metabolomic data reveal that roughly 10% of genes or metabolites oscillate in a circadian manner in any given cell or tissue. Furthermore, when data are aggregated across different tissues and genetic or environmental conditions, the overlap in circadian species beyond the core clock is very small. Thus, a large fraction of molecular species in the cell is capable of oscillating in a circadian manner under some set of conditions. We will present a novel theory of circadian rhythms to explain these puzzling findings. In this theory, molecular networks are viewed as networks of coupled oscillators sculpted by 3.5 billion years of evolution. Under a given set of genetic and environmental conditions, a cell can reprogram itself and select its own subset of oscillatory species out of a vast repertoire. The oscillating species provide a physiological signature of the state of the cell.
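The abstract does not say how oscillating species are detected. As a rough illustration only, the sketch below uses classic cosinor regression, a standard technique that is not necessarily the method behind the talk or the cited work: it fits a 24 h sinusoid to a time series and reports the amplitude and the fraction of variance explained. The function name, the sampling design and both profiles are hypothetical, and the closed form assumes evenly spaced samples (at least 3 per cycle) covering whole periods.

```python
import math

def cosinor_fit(times, values, period=24.0):
    """Fit values ~ mesor + a*cos(w*t) + b*sin(w*t) by least squares.

    Assumes samples are evenly spaced (at least 3 per cycle) over a whole
    number of periods, so the cosine and sine regressors are orthogonal
    and the least-squares solution has this closed form.
    """
    n = len(values)
    w = 2 * math.pi / period
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    fitted = [mesor + a * math.cos(w * t) + b * math.sin(w * t) for t in times]
    ss_tot = sum((y - mesor) ** 2 for y in values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    # Fraction of the variance explained by the 24 h sinusoid
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return mesor, math.hypot(a, b), r2

# Hypothetical profiles sampled every 4 h over 48 h (12 time points)
times = list(range(0, 48, 4))
rhythmic = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
flat = [5.5 + 0.5 * (-1) ** i for i in range(len(times))]  # no 24 h component

print(cosinor_fit(times, rhythmic))  # mesor ~10, amplitude ~3, r2 ~1
print(cosinor_fit(times, flat))      # amplitude ~0, r2 ~0
```

A species would then be called "circadian" when its amplitude and explained variance exceed some cutoff; any such threshold is an analysis choice, not something stated in the abstract.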

References:
V. R. Patel, K. Eckel-Mahan, P. Sassone-Corsi, and P. Baldi. How Pervasive Are Circadian Oscillations? Trends in Cell Biology, in press, DOI:10.1016/j.tcb.2014.04.005, (2014).
K. L. Eckel-Mahan, V. R. Patel, S. de Mateo, N. J. Ceglia, S. Sahar, S. Dilag, K. A. Dyar, R. Orozco-Solis, P. Baldi, and P. Sassone-Corsi. Reprogramming of the Circadian Clock by Nutritional Challenge. Cell, 155, 7, 1464-1478, (2013).
V. R. Patel, K. L. Eckel-Mahan, P. Sassone-Corsi, and P. Baldi. CircadiOmics: Integrating Circadian Genomics, Transcriptomics, Proteomics, and Metabolomics. Nature Methods, 9, 8, 772-773, (2012).
K. L. Eckel-Mahan, V. R. Patel, K. S. Vignola, R. P. Mohney, P. Baldi, and P. Sassone-Corsi. Coordination of Metabolome and Transcriptome by the Circadian Clock. PNAS, 109, 14, 5541-5546, (2012).

Guillaume Bourque

McGill University

Deciphering human non-coding DNA using machine learning approaches

In this presentation we will give an overview of the functional genomics datasets and tools that have been made available by consortia such as ENCODE, the NIH Roadmap and now the International Human Epigenome Consortium (IHEC). These data have been generated in a collection of reference and disease cell types and include information on protein-DNA interactions or histone marks (ChIP-Seq), the transcriptome (RNA-Seq), methylation (Methyl-Seq) and open chromatin (DNase-Seq). We will explain how these data can be used to interpret human non-coding DNA and help identify detrimental DNA variants or mutations. Finally, we will show how machine learning approaches can be used to go beyond the simple annotation of non-coding DNA and to mine these functional genomics data even further.

Machine learning for brain imaging

In this talk, I would like to showcase a few examples of machine learning problems that arise when using brain imaging to understand brain function and its pathologies. I'll first introduce brain images and how statistical features are derived from them. Then I'll discuss how prediction from these images is useful for diagnostic purposes, but also as a window to understand the brain. I'll highlight specific challenges that arise when learning predictive models from brain maps, and detail the solutions put forward by our group, namely spatial penalties. Moving beyond well-posed statistical maps, I'll show how a combination of unsupervised modeling and supervised learning can predict phenotypic traits from spontaneous brain activity, recorded without controlling the subjects' behavior. Finally, I'll detail how our work builds upon and nourishes a Python software stack that we leverage to interact with practitioners.

Scientific Lectures

Antonio Artés Rodríguez

University Carlos III

Introduction to Hidden Markov Models

The lecture will present an overview of Hidden Markov Models (HMMs), a ubiquitous tool for dealing with sequential data. We will introduce students to different methods for estimating the hidden states and the model parameters. We will consider classic as well as parametric and non-parametric Bayesian inference methods, and methods suited to massive data sets, such as spectral learning.
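To make "estimating the hidden states" concrete, here is a minimal log-space Viterbi decoder for a discrete HMM, one classic method for recovering the most likely hidden-state path once the parameters are known. The two-state "Healthy"/"Fever" model and its probabilities are a toy example chosen for illustration, not material from the lecture; parameter estimation (for example by Baum-Welch or the Bayesian and spectral methods mentioned above) is not shown.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM (log-space Viterbi).

    All probabilities used must be non-zero, since log(0) is undefined.
    """
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for entering state s at time t
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace the back-pointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model (hypothetical numbers)
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

print(viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p))
# → ['Healthy', 'Healthy', 'Fever']
```

Working in log space avoids numerical underflow on long sequences, which matters for the massive data sets the lecture mentions.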

Joaquin Dopazo and David Montaner

Centro de Investigación Príncipe Felipe

Principles of genomic data analysis

Based on our MDA courses held in different locations (Valencia, Cambridge, Edinburgh and Lisbon), we will introduce students to the basics and the state of the art of genomic data analysis for the most common platforms (microarrays, genotyping arrays and NGS) and the most common analytic tasks in these contexts, including differential gene expression, class discovery, predictors, association tests and functional analysis.
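As a small illustration of the differential gene expression task, the sketch below scores each gene with Welch's two-sample t statistic and ranks genes by its absolute value. This is a deliberately plain example, not the course's pipeline: real analyses typically use moderated statistics (as in limma) and correct for multiple testing. The gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances.

    Uses sample variances (n - 1 denominator); raises ZeroDivisionError
    if both groups have zero variance.
    """
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def rank_genes(expr_a, expr_b):
    """Rank genes by |t|; expr_a and expr_b map gene name -> expression values."""
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical log-expression values, three samples per condition
expr_tumour = {"GENE1": [8.1, 8.4, 7.9], "GENE2": [6.0, 6.3, 5.8]}
expr_normal = {"GENE1": [5.0, 5.2, 4.9], "GENE2": [6.1, 5.9, 6.2]}

for gene, t in rank_genes(expr_tumour, expr_normal):
    print(f"{gene}\t{t:.2f}")
```

GENE1, with a large shift between conditions relative to its within-group spread, ranks first; GENE2 barely differs.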

Bertram Müller-Myhsok

Max Planck Institute of Psychiatry

Personalized medicine with a special view on psychiatry

We will present the state of the art on this subject, trying to derive guidelines and future perspectives.

Fernando Pérez Cruz

University Carlos III

Bayesian Machine Learning and Graphical Models

Based on lectures held at Princeton University, we will introduce students to the basics and the state of the art of parametric and non-parametric Bayesian machine learning and graphical models (description, inference and learning), as well as exponential family models for machine learning. We will also show how these tools apply to problems in genetics and medicine.

Kristel Van Steen

University of Liege

Methodological aspects in integromics: integrating multiple omics data sets

The advent of high-throughput technologies, including sequencers and array-based assays (expression, SNP, CpG), has led to the generation of humongous amounts of data, often referred to as “Big Data”. The biological datasets are heterogeneous and often include gene expression, genotype, epigenome and other types of data that are referred to as “-omics” data. As a result, there is a strong effort across multidisciplinary scientific communities to develop robust, computationally efficient and sensible data-processing pipelines to effectively analyze “-omics” data in order to extract biologically and clinically relevant information – “useful knowledge”.

The enthusiasm about having access to vast information resources comes with a caveat. In contrast to single-omics studies, integrated omics studies are extremely challenging. These challenges include developing protocols to standardize data generation and pre-processing or cleansing in integrative analysis contexts, developing computationally efficient analytic tools to extract knowledge from dissimilar data types to answer particular research questions, establishing validation and replication procedures, and building tools to visualize results. However, from a personalized medicine point of view, the anticipated advantages are believed to outweigh any difficulty related to “integromics”. The strong interest in the topic has already resulted in the emergence of new integrative cross-disciplinary techniques based on, for instance, kernel fusion, probabilistic Bayesian networks, correlation networks, statistical dimensionality-reduction models, and clustering.

In this contribution, we will highlight the key steps involved in omics integration efforts and summarize the main analytic paths. We will then zoom in on a novel integrated analysis framework (based on genomic MB-MDR). This framework will be used as a common thread to discuss the main issues, pitfalls and merits of integrated analyses. Unprecedented opportunities lie ahead!

Valentina Boeva

Institut Curie

Introduction to the bioinformatics of cancer: high-throughput sequencing of the genome, epigenome and transcriptome

In this presentation, I will introduce bioinformatics approaches to study genetic, epigenetic and transcriptomic variations in cancer. I will demonstrate how whole-genome, exome and amplicon sequencing data can be analyzed to find small mutations, copy number changes and large structural variants in cancer DNA. I will also talk about the analysis of RNA-seq and ChIP-seq data in cancer studies.

Analysis of epigenetics and chromatin states in normal and cancer cells

In this talk, I will introduce methods for characterizing epigenetic profiles and detecting chromatin states. I will explain how chromatin states are associated with gene expression. I will also speak about epigenetic changes in cancer cells.

Bernhard Schölkopf

Max Planck Institute for Intelligent Systems

Kernel Methods

The course will introduce the main concepts of machine learning and statistical inference using kernel functions. Kernels induce a vector space representation of the data, they formalize the notion of a nonlinear similarity measure, and they parameterize the function class used for estimation. Kernel methods are studied using methods of functional analysis, and implemented using tools of convex optimization.
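The three roles of a kernel described above can be seen in one short example. The sketch below uses kernel ridge regression, chosen here as one representative kernel method (the course may well emphasize others, such as support vector machines): the Gaussian (RBF) kernel acts as a nonlinear similarity, the learned function is a weighted sum of kernels centred on the training points, and the weights solve the convex problem in closed form, alpha = (K + lam*I)^{-1} y. The helper names, `gamma`, `lam` and the toy data are illustrative assumptions; a small Gaussian-elimination solver keeps the block free of third-party dependencies.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: a nonlinear similarity between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Kernel ridge regression: alpha = (K + lam*I)^{-1} y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    # The learned function is a weighted sum of kernels centred on training points
    return lambda x: sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X))

# Toy 1-D regression problem: learn sin(x) from five samples
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [math.sin(x[0]) for x in X]
f = fit_kernel_ridge(X, y, lam=1e-6, gamma=1.0)

print(f([1.0]), math.sin(1.0))    # at a training point
print(f([0.75]), math.sin(0.75))  # between training points
```

Note that the data enter only through kernel evaluations, never through explicit feature vectors; this is what lets kernel methods work in very high-dimensional (even infinite-dimensional) feature spaces.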

Volker Tresp

Siemens

Statistical Relational Learning

Statistical Relational Learning concerns applications where relationships between objects are important. Examples are social networks (Jack, likes, Mary), a clinical setting (Jack, hasDisease, Diabetes) and knowledge graphs (Obama, presidentOf, USA). I will first review multivariate statistical models (Bayesian networks, Markov networks, mixture models, factor models) and then discuss how they are generalized to relational domains as probabilistic relational models, Markov logic networks, infinite hidden relational models and the RESCAL model.
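For the RESCAL model mentioned above, the core idea is that every entity gets one latent vector shared across all relations, and each relation gets its own matrix; a triple (subject, relation, object) is scored by the bilinear form a_subj^T R_rel a_obj. The sketch below shows only this scoring step, using the abstract's own example triples. The 2-dimensional latent vectors and relation matrices are set by hand for illustration (hypothetical values); in RESCAL they are learned by factorizing the relational data, which is not shown here.

```python
def rescal_score(a_subj, R_rel, a_obj):
    """Bilinear RESCAL score a_subj^T R_rel a_obj for one (subject, relation, object) triple."""
    return sum(
        a_subj[p] * R_rel[p][q] * a_obj[q]
        for p in range(len(a_subj))
        for q in range(len(a_obj))
    )

# Hypothetical hand-set latent vectors: dimension 0 ~ "person", 1 ~ "condition".
E = {"Jack": [1.0, 0.0], "Mary": [0.9, 0.1], "Diabetes": [0.0, 1.0]}
R = {
    "likes": [[1.0, 0.0], [0.0, 0.0]],       # person -> person
    "hasDisease": [[0.0, 1.0], [0.0, 0.0]],  # person -> condition
}

print(rescal_score(E["Jack"], R["likes"], E["Mary"]))           # → 0.9
print(rescal_score(E["Jack"], R["hasDisease"], E["Diabetes"]))  # → 1.0
print(rescal_score(E["Diabetes"], R["hasDisease"], E["Jack"]))  # → 0.0
```

Because the relation matrix need not be symmetric, the score distinguishes (Jack, hasDisease, Diabetes) from the reversed triple, which is what directional relations in clinical data and knowledge graphs require.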

Microbiome analytics: teeny organisms, Big Data

A hundred trillion microbes of thousands of different kinds live in and on us, outnumbering our body cells 10 to 1. Recent studies using innovative metagenomic (population-level genomic) technologies suggest that these microbial communities and their collective genomes affect our health, our metabolism and even our behavior. In this course, I will review the computational tools currently used for microbiome-related analysis and discuss the main challenges of incorporating such data into personalized medicine applications.

Fabian Heinemann

Roche Diagnostics

Machine learning in the health-care industry at Roche

Roche is a Swiss-based international health-care company. An important focus of Roche is the field of Personalized Healthcare (PHC). The talk will start with a brief company profile, followed by an introduction to PHC. In the second part, selected examples of machine learning at Roche will be presented.

Complementary Skills Courses

Mikel Tapia

University Carlos III

The Firm and the Financial Markets

The aim of this course is to understand what a corporation is and what the link is between the company's operations and the financial markets. By the end of the course, students should be able to understand the definition of a firm, the role played by managers, basic concepts about instruments, markets and bonds, the functioning of financial markets, and some basics of price statistics.

Alexander Gerber

Rhine Waal University

Open Science Communication: Changes in Academia Through Social Media

Part 1: Overview of tools and platforms on the social web
Part 2: Cases, Demonstrations, Interactive Training
Suggested reading.

Patrice Wegener

Max Planck Institute for Biological Cybernetics

How to make good use of funding programmes for your own career development

(a) Overview: The funding landscape at a glance.

(b) Team simulation game: a Collaborative research project application (2 x 12 participants max).