Most common diseases, including allergy, cancer and diabetes, are complex. The genetic susceptibility of an individual to such a disease is not the result of a single causative gene, but rather altered interactions between multiple genes. In many cases, DNA microarray studies have implicated hundreds of genes. Moreover, there is considerable individual variability. A clinical consequence of all this is variable response to medication, which increases both suffering and cost. Physicians should ideally be able to personalize medication routinely, based on measurements of but a few protein markers. The identification of such markers is thus an important goal, but also a formidable challenge, and one that requires understanding complex pathogenic mechanisms and how they vary across populations. It is possible that the recent advances in high-throughput genomics, computer science, bioinformatics and systems biology outlined below could contribute to such understanding (Fox JL. Nature Biotechnol 2007).
In this project we hypothesize that markers for individualized medication can be identified by network-based analysis of gene expression arrays. Our approach is outlined as follows:
The project is facilitated by the recent development of high-throughput methods to analyse SNPs, proteins and different regulatory elements such as microRNAs and DNA methylation. Moreover, because the layers and elements are interdependent, an analysis of dependencies can be used for step-wise cross-validation (for example, altered mRNA expression due to regulatory SNPs).
On the other hand, the project faces several noteworthy challenges: a) the heterogeneity of complex diseases, for which in many cases very little is known about causal mechanisms; b) the difficulty of finding representative study models; c) methodological problems in developing computational and bioinformatics methods to build modules; d) experimental validation of disease mechanisms that may involve large numbers of genes, many of which have unknown or poorly defined functions.
The effort is based on ongoing multi-disciplinary collaborations between clinically active researchers and leading experts in genomics, systems biology, computer science, bioinformatics and statistics.
The development and application of methods draw on the applicants’ experience of genome-wide association studies (Sladek et al. Nature 2007), network models of gene expression data (Jenssen et al. Nat Genet 2001, Voy et al. PLoS Comput Biol 2006) and linking such models to genetic variations (Chesler et al. Nat Genet 2005). These methods are further developed on allergen-challenged lymphocytes from patients with seasonal allergic rhinitis (SAR). This is an optimal disease model because it is common, well-defined, has a known external cause (pollen) and can be analyzed in both experimental and clinical studies. The analytical methods are, however, likely to be applicable to other complex diseases. The allergen-challenged cells are analysed with DNA microarrays to identify transcriptomal modules responding to the challenge. SNPs in regulatory regions of key genes in the modules are studied, and the corresponding proteins are analysed in supernatants as well as in nasal fluids as diagnostic markers (Benson et al. J Allergy Clin Immunol 2006). The timetable calls for delivery of the first multi-layer modules, based on a transcriptomal template from DNA microarray studies, after the first 18 project months. Data about regulatory elements such as transcription factors, microRNAs and DNA methylation are added onto this template. These elements are analyzed on a genome-wide scale, and this information will be used to examine selected SNPs and proteins. During the next six months, selected genes will be examined experimentally. For genes with unknown roles, this will involve combined bioinformatics and RNAi studies as described by the participants (Murali et al. Nat Biotechnol 2006, Sönnichsen et al. Nature 2005, Echeverri et al. Nat Methods 2006, Nat Rev Genet 2006). The aim is to define one or more disease modules that are general to all patients with seasonal allergic rhinitis.
During year three, large-scale studies will be performed to define individual variations in such modules, ranging from SNPs to proteins. In parallel, the original general modules will be refined using novel methods to analyse other layers and regulatory elements. During year four, clinical studies will be performed to test if selected protein markers can be used to predict treatment response. In addition, the analytical methods will be made available on the Internet in a standardized format for studies of other complex diseases.
Complex diseases like diabetes, allergy and cancer depend on altered interactions between large numbers of genes, many of which do not belong to known disease mechanisms. Genome-wide association studies performed by one of the applicants have, for example, described such genes in diabetes (Sladek et al. Nature 2007). In allergic disease, DNA microarray studies have shown changes in the expression of hundreds of genes (Benson et al. J Allergy Clin Immunol 2004, 2006). Emerging high-throughput technologies indicate disease-associated changes in other layers and regulatory elements, for example copy number variants, DNA methylation and microRNAs (Hardiman et al. Pharmacogenomics 2006). On top of this complexity there is considerable individual heterogeneity. A clinical consequence of this is variable response to treatment, which increases both suffering and cost. Personalized medication has therefore been highlighted as a priority. At present, however, only a few examples have reached the clinic (Fox JL. Nat Biotechnol 2007).
One approach to functionally understand gene expression changes in complex diseases may be to change the scale from individual genes to groups of functionally related genes. Such genes may be identified with bioinformatics methods, like cluster analysis, that group genes whose expression levels correlate. These clusters can be used for classification, for example of different lymphomas (Alizadeh et al. Nature 2000). However, the analysis in itself does not give any functional understanding of disease mechanisms. One way to obtain such understanding is to search the gene expression data for genes known to belong to specific pathways (Benson et al. Cytokine 2002). A problem with this is that complex diseases often involve multiple interacting pathways, which may be difficult to separate from each other. Rather, they form sub-networks or modules. Such modules have been identified in studies of cancer and functionally annotated (Segal et al. Nat Genet 2005). Networks provide a compelling framework to organize and functionally understand complex systems (Barabasi et al. Nat Rev Genet 2004, Mustacchi et al. Yeast 2006). In the context of gene expression data in human cells, network-based analysis has been applied to form networks of interacting genes and dissect those networks to find modules and pathways (Jenssen et al. Nat Genet 2001, Calvano et al. Nature 2005). The same analytical methods have also been applied to human disease and used to go from modules to individual genes (figure 1). The corresponding proteins have been tried as diagnostic markers in human disease (Benson et al. J Allergy Clin Immunol 2006). The latter study was also based on computational methods developed by one of the applicants in studies of inbred mice, in which altered gene expression patterns were used to find the genetic variants that caused those alterations (Chesler et al. Nat Genet 2005).
Linking gene expression changes to genetic variants has also been performed in human cells (Bystrykh et al. Nat Genet 2005). These studies show how changes in a transcriptional module can be used as a template for further studies:
Problems in high-throughput studies of complex diseases
In this project we address these problems as follows:
Figure 1. Network-based analysis of DNA microarray data. A) Genes identified by DNA microarray analysis of allergic disease (red) are mapped onto an interaction network formed by all human genes (grey). B) Modules of interacting genes that represent disease-associated biological functions are identified. C) A module is dissected to find a pathway. D) The pathway is analysed to find a putative disease-causing gene (in this case exemplified by an upstream gene).
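The four steps in Figure 1 can be illustrated with a minimal sketch. It assumes a toy undirected interaction network, a toy set of directed regulatory edges and placeholder gene names (G1 to G9); none of these come from the project data, and connected components are used here as a simple stand-in for whatever module definition the project adopts.

```python
from collections import deque

# Hypothetical toy interaction network (undirected); gene names are placeholders.
INTERACTIONS = [
    ("G1", "G2"), ("G2", "G3"), ("G3", "G4"),
    ("G5", "G6"), ("G6", "G7"), ("G8", "G9"),
]
# Hypothetical directed regulatory edges (regulator -> target), used in steps C-D.
REGULATES = [("G1", "G2"), ("G2", "G3"), ("G5", "G6")]
# Hypothetical genes flagged by the microarray analysis (the "red" genes).
DISEASE_GENES = {"G1", "G2", "G3", "G5", "G6", "G9"}


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency dictionary."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def find_modules(adj, disease_genes):
    """Steps A-B: restrict the network to disease genes, return connected components."""
    seen, modules = set(), []
    for start in sorted(disease_genes):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for neighbour in adj.get(gene, ()):
                if neighbour in disease_genes and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        modules.append(component)
    return modules


def upstream_genes(module, regulates):
    """Steps C-D: genes that regulate others in the module but are not regulated within it."""
    inside = [(r, t) for r, t in regulates if r in module and t in module]
    regulators = {r for r, _ in inside}
    targets = {t for _, t in inside}
    return sorted(regulators - targets)


adjacency = build_adjacency(INTERACTIONS)
modules = find_modules(adjacency, DISEASE_GENES)
largest = max(modules, key=len)
candidates = upstream_genes(largest, REGULATES)
```

In this toy example the largest module is the chain G1, G2, G3, and G1 is returned as the putative upstream gene because nothing inside the module regulates it.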
To our knowledge this is the first project that aims to define multi-layer modules (MLM) in a complex disease and use them for a clinical goal, to personalize medication. This involves development and integration of novel computational and bioinformatics methods based on a systems biological framework. If successful, the project may serve as a model for studies of other complex diseases. The analytical methods will be made available on the Internet in a standardized format for such studies. The project may also increase understanding of the relative role of different layers and elements, as well as of genes with presently unknown functions in complex diseases.
Applying systems biology and high-throughput genomics to solve a concrete clinical problem, i.e. to personalize medication, is a formidable challenge. It requires integration of many different forms of expertise and complex analytical methods that are applied to diverse materials such as human cells and knockout mice. Some of the patient groups are hard to find (e.g. monozygous twins) or need to be examined more than once. Finally, the results need to be validated in clinical studies to see if treatment response can be predicted. Since there is limited experience of analysing data of such diversity and complexity, the integrated development and application of new computational, bioinformatics and statistical solutions is required.
In order to build MLMs, all the high-throughput experiments must be performed on the same individuals. Therefore the clinical materials are obtained in one clinical research centre (The Unit for Clinical Systems Biology/The Unit for Pediatric Allergology, The Queen Silvia Children’s Hospital, Göteborg, Sweden, WP1). The Unit for Clinical Systems Biology is headed by a clinician, Dr Mikael Benson, who coordinates this project. MB has a six-year grant from the Swedish Research Council as a senior researcher that is combined with his position as a senior consultant in paediatric allergology at Queen Silvia Children’s Hospital. Obtaining rare patient materials is simplified by patient registries, such as the Swedish Twin Registry (figure 2). All high-throughput experiments are performed either at the Genomics Core Facilities in Göteborg and Oslo, or by Cenix Bioscience, a biotech SME with expertise in RNA interference (WP1, WP3 and WP5, respectively).
A team of leading international experts has been assembled to analyse the data. Professor Michael A. Langston and his group of post docs and PhD students at the Department of Electrical Engineering and Computer Science at the University of Tennessee were the first to harness fixed-parameter tractability in order to develop pioneering clique-centric methods that find modules of putatively co-regulated genes in gene expression data (Abu-Khzam et al. Algorithmica 2006), link these modules to genetic variation using quantitative trait loci (Chesler et al. Nat Genet 2005), and synthesize from these methods novel topological differential analysis tools (Voy et al. PLoS Comput Biol 2006). ML is also an expert on combinatorial algorithms for the integration and analysis of biological data of large scale and wide diversity (Kirova et al. CAMDA 2006; Zhang et al. Supercomputing 2005). This work is facilitated by state-of-the-art supercomputers at the nearby Oak Ridge National Laboratory, where ML is a Collaborating Scientist in the Systems Genetics Group within the Biosciences Division (WP2).
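The idea of finding modules as cliques in a gene co-expression graph can be sketched as follows. This is an illustration only: it uses made-up expression profiles, an arbitrary correlation threshold of 0.9 and a plain Bron-Kerbosch enumeration of maximal cliques, which is swapped in for simplicity and is not the fixed-parameter tractable algorithm developed by the applicants.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles (gene -> values across five samples).
EXPRESSION = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.1, 6.0, 8.2, 10.0],
    "C": [1.1, 1.9, 3.2, 3.9, 5.1],
    "D": [5.0, 1.0, 4.0, 2.0, 3.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_graph(expression, threshold):
    """Connect two genes when the absolute correlation reaches the threshold."""
    adj = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


graph = correlation_graph(EXPRESSION, threshold=0.9)
cliques = maximal_cliques(graph)
# Treat cliques of at least three genes as putative co-expression modules.
modules = [c for c in cliques if len(c) >= 3]
```

Genes A, B and C rise together across the samples and form one clique, while D is uncorrelated with the others and remains isolated. Exhaustive enumeration like this scales poorly, which is why specialised algorithms and supercomputing resources matter for genome-scale data.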
Functional annotation of modules is done in WP3 by Professor Eivind Hovig and associates using their PubGene co-citation literature network (Jenssen et al. Nat Genet 2001). An advantage of PubGene is that modules are not restricted by canonical gene interaction pathways that have been described in healthy cells. It may therefore be particularly suitable for functionally annotating disease modules, whose gene interactions may differ from those in healthy cells. Another advantage is that PubGene is combined with other data sources to provide cell-specific and multi-layer network information. Additionally, PubGene’s algorithms can be customized to the context-specific needs of the complex diseases to be analyzed clinically. EH is an expert in fields including the bioinformatics of biomedical text mining, microarray analysis, and annotation-based comparisons of DNA features. EH has both a wetlab group working on cancer and gene silencing, and a bioinformatics group.
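The general principle of a co-citation network can be sketched in a few lines: two genes are linked when they are mentioned in the same text, and the link is weighted by how often that happens. The abstracts, the gene symbol list and the tokenisation below are hypothetical simplifications for illustration; this is not PubGene's actual algorithm or data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical example sentences standing in for literature abstracts.
ABSTRACTS = [
    "IL4 and IL13 drive IgE switching in allergic rhinitis.",
    "IL13 signalling requires STAT6 in airway epithelium.",
    "IL4 activates STAT6 and cooperates with IL13.",
]
# Hypothetical list of gene symbols to look for.
GENE_SYMBOLS = {"IL4", "IL13", "STAT6"}


def cocitation_counts(abstracts, symbols):
    """Count, for each gene pair, the number of abstracts mentioning both genes."""
    counts = Counter()
    for text in abstracts:
        # Naive tokenisation: split on whitespace and strip common punctuation.
        tokens = {token.strip(".,;()") for token in text.split()}
        mentioned = sorted(tokens & symbols)
        counts.update(combinations(mentioned, 2))
    return counts


edges = cocitation_counts(ABSTRACTS, GENE_SYMBOLS)
```

Each key in `edges` is an alphabetically ordered gene pair and each value is an edge weight. Because the links come from the literature rather than from curated pathways, such a network is not limited to canonical interactions.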
Figure 2. Analysis of selected patient groups, such as monozygous twins, under controlled conditions reduces the complexity of the project. The photo shows a nurse at the Unit for Clinical Systems Biology obtaining blood samples from twins for in vitro allergen-challenge of CD4+ lymphocytes.
An important challenge in the project is that high-throughput technologies generate data where the number of variables greatly exceeds the number of observations. This requires the development and application of new statistical methods, which will be undertaken in WP4 by Professor David J. Balding and Dr Lachlan J.M. Coin of the Centre for Biostatistics at Imperial College London. DJB has a PhD in applied probability from Oxford, and since graduating has worked to apply mathematical, computational and statistical methods to solve problems in biology and medicine, particularly in population, evolutionary and medical genetics. Recently he has been active in developing and applying novel statistical methods for the analysis of genetic association data, particularly for genome-wide association studies (Sladek et al. Nature 2007). These methods have tackled problems of confounding by population structure, exploiting haplotype clustering to strengthen signals of association, and simultaneous analysis of large numbers of genetic variants (Balding et al. Nat Rev Genet 2006). LJMC has a PhD in Bioinformatics from the Wellcome Trust Sanger Institute and Cambridge University, and has worked in comparative genomics, phylogenetics, microarray analysis, identification of copy-number variants and haplotype clustering methods based on Hidden Markov Models (Coin et al. PNAS 2003, Futreal PA et al. Nat Rev Cancer 2004).
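One consequence of having far more variables than observations is that thousands of tests are run at once, so some small p-values arise by chance alone. As a hedged illustration of how this is commonly handled, the sketch below implements the standard Benjamini-Hochberg false discovery rate procedure. It is a textbook method chosen for illustration, not necessarily one of the methods to be developed in WP4, and the p-values are invented.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four genes, in arbitrary order.
example_p = [0.039, 0.001, 0.216, 0.008]
example_calls = benjamini_hochberg(example_p, alpha=0.05)
```

With these four values the second and fourth genes are called significant, while 0.039, which would pass an uncorrected 0.05 threshold, is not.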
The role of disease-associated genes identified through the methods described above will be confirmed experimentally using high-throughput RNAi (HT-RNAi). Cenix BioScience GmbH, represented by Drs. Christophe Echeverri (founder, CEO/CSO) and Birte Sönnichsen (COO) (WP5), pioneered high-throughput applications of RNAi, carrying out the first comprehensive genome-wide RNAi screen (Sönnichsen et al. Nature 2005) for genes involved in early embryogenesis of the nematode worm C. elegans, the organism in which RNAi was first described in 1998 in the Nobel Prize-winning work of Fire and Mello (Fire et al. Nature 1998). Since 2001, Cenix has focused entirely on further developing and applying the power of genome-scale HT-RNAi with high content, multi-parametric assays using automated microscopy in a wide range of human and rodent cultured cell models. Cenix has since established itself as a global leader in exploiting this technology in collaboration with both academic and pharmaceutical research groups, to advance a wide range of basic research and applied disease fields including oncology as well as metabolic, infectious and cardiovascular diseases (e.g. Sachse et al. Oncogene 2004). Along the way, Cenix has also led the way in defining the combination of computational and automated laboratory analysis infrastructures required to drive these advanced functional genomics applications (Sachse et al. Methods Enzymol 2005; Echeverri et al. Nat Rev Genet 2006 and Nat Methods 2006). CE has a PhD in cell biology from the University of Massachusetts (Worcester, MA), and his postdoctoral work at the European Molecular Biology Laboratory (EMBL, Heidelberg, Germany) pioneering the genome-scale use of RNAi formed the basis for founding Cenix.
BS has a PhD in cell biology from the University of Göttingen (Göttingen, Germany), and joined Cenix as one of its first senior scientists following her postdoctoral work at Cancer Research UK (formerly the ICRF) in London and the EMBL (Heidelberg, Germany).
Organization and funding: The UCSB is part of the Centre for Systems Biology in Göteborg, which links several groups within the medical, natural science and technological faculties. This local network extends internationally through EC funded projects with a total budget of 15 million Euro. The UCSB is supported by the Swedish Research Council, the European Commission and Göteborg University. The total funding is some 2 million Euro over the next three years. Key investigators and their roles: Mikael Benson is the coordinator of the project and the head of the UCSB. He and co-workers are responsible for gathering the clinical materials and performing the experimental and genomic studies in collaboration with the UPA, the Department of Immunology and the Bio-X-Med genomics unit, as well as with the applicants in WP3 and WP5. MB is a senior consultant in pediatric allergology at the UPA. He has a six-year grant as a senior researcher from the Swedish Research Council to apply high-throughput technology and systems biology to find markers for personalized medication.
Organization and funding: The University of Tennessee is a comprehensive land-grant university and a Carnegie I Research Institution. Knoxville is the flagship campus of the University of Tennessee system. Students enroll from every state in the nation and over 100 foreign countries. Oak Ridge National Laboratory is the Department of Energy’s primary institution for high performance computing. It has distinctive capabilities in biological science, materials science, neutron science and many other areas. ORNL employs roughly 1500 scientists and engineers and covers a total of 58 square miles. Dr. Langston regularly consults at and maintains accounts on ORNL’s vast assortment of state-of-the-art clusters, supercomputers and mass storage systems. His team is funded 75% time through UT and 25% time through grants on which he serves as PI or co-PI.
Key PIs and their roles: Professor Langston leads a team of students, post doctoral fellows and research associates whose work is focused on efficient algorithm design, analysis and high performance implementations, with a special emphasis on applications to computational biology. He also serves as Collaborating Scientist at ORNL, where he maintains offices in the Biosciences Division and regularly consults in the Computer Science and Mathematics Division, the Chemical Sciences Division, the Joint Institute for Computational Science and the Computational Biology Institute. He is currently in the process of developing portals through which the community at large may access his team’s computational tools. His work in developing ClustalXP is a prominent example. Professor Langston has authored over 200 refereed publications, including those in journals relevant to this project such as Nature Genetics, PLoS Computational Biology, Journal of the ACM, and Journal of Allergy and Clinical Immunology. He is perhaps best known for his long-standing work on combinatorial algorithms, complexity theory and design paradigms for sequential and parallel computation. In addition to maintaining his research program, he regularly teaches courses on algorithmic analysis, computational and systems biology, discrete optimization, graph theory and related subjects. His research has been funded in the U.S. by the National Science Foundation, the Department of Defense, the Department of Energy, the National Institutes of Health and a variety of other agencies. It has been funded abroad by the Australian Research Council and the European Commission. He has received numerous awards, most recently the Distinguished Service Prize from the Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory.
Organization and funding: The research group is integrated within several institutions: the Rikshospitalet-Radiumhospitalet university clinic and the Institute of Informatics at the University of Oslo. The group also runs a bioinformatics core facility for both institutions, funded by them. Funding is also obtained from the Functional Genomics programme of Norway. The group is integrated into the national bioinformatics framework. The group is active in text mining, statistical mechanics of DNA and a number of aspects related to microarrays.
Key investigators and their roles: Eivind Hovig, professor at the Department of Informatics at the University of Oslo, Norway, holds positions as a group leader at the Institute for Cancer Research, and as section head at the Department of Medical Informatics at The Norwegian Radium Hospital. He leads two research groups (a wetlab group and a drylab group) of students, post doctoral fellows and research associates whose work is focused on development and implementation of high-throughput techniques for genomics and clinical bioinformatics. He has coauthored some 85 articles in journals including Nature Genetics, Nature Biotechnology, PNAS and Lancet, has authored 7 patents and has received awards for scientific excellence as well as inventor prizes for his work. He is the leader of the FUGE functional genomics platform of bioinformatics in the Oslo region, and a member of the board of the national FUGE bioinformatics steering group. He is chief scientific officer of the Norwegian-based bioinformatics company PubGene Inc., which is based on a patent of Hovig, Jenssen et al., and is engaged as an adviser in two other companies.
Organization and funding: The Centre for Biostatistics at Imperial College (http://www.icbiostatistics.org.uk/), jointly led by Professors David Balding and Sylvia Richardson, includes around 10 academic staff, 20 postdocs and 15 research students, based in several departments but primarily within the Department of Epidemiology and Public Health at the St Mary’s Hospital campus. The research of the Centre is focussed on using advanced biostatistical methods to enhance population health. It has major strengths in the statistical analysis of genetic association studies and gene expression studies, particularly using highly-structured Bayesian modelling implemented via intensive stochastic simulation techniques. The latter exploit a local computer farm within the Department as well as the Imperial High Performance Computing centre. The work of the Centre is funded by UK research councils (MRC, BBSRC, EPSRC, ESRC), research charities (Wellcome Trust, BHF), the European Union, the US NIH, and industry (GSK, AstraZeneca).
Key PIs and their roles: DJB is Professor of Statistical Genetics and LJMC is Senior Research Fellow, and both contribute to leading a team of around 12 postdocs and PhD students. LJMC will be the primary supervisor of the postdoc, and together they will deliver the principal research outcomes of the workpackage. DJB will offer advice and overall supervision, and co-ordinate contributions from other members of the Centre for Biostatistics when their specialist expertise is required for the project. In addition to the deliverables specified above, the Imperial College team will provide specialist biostatistical advice to the MultiMod project partners as needs arise during the collaboration.
Organization and funding: Cenix BioScience GmbH (based in Dresden, Germany: http://www.cenix-bioscience.com) is a privately held biotechnology company founded in 1999, specializing in advanced, cell-based applications of RNAi for the discovery and functional characterization of novel therapeutic targets, biomarkers and drug candidates. Led by Drs. Christophe Echeverri (lead founder, CEO/CSO) and Birte Sönnichsen (COO), Cenix counts 34 staff, 28 of whom form the scientific team combining both laboratory and IT operations that drive the company’s core offerings: advanced research services. The revenues from these RNAi-based discovery projects with both pharma (e.g. Bayer, Schering, Merck KGaA, etc.) and academic clients have secured the company’s continued profitability for the last 4 years running. Cenix has also maintained active R&D activities to further develop its technology base both on its own and within research consortia supported by 7 major grants from German federal granting programs. Cenix scientists have successfully completed genome-scale RNAi screening projects using various human and rodent cell-based models in a wide range of disease areas including oncology, as well as metabolic, cardiovascular and infectious diseases. Cenix’s particular area of specialisation is combining HT cell-based applications of RNAi with high content, multi-parametric readout assays, to extract the richest possible analyses from these datasets, with maximal patho-physiological relevance.
Key PIs and their roles: Dr. Maria Mirotsu has led major RNAi studies for clients and several internal RNAi optimization studies in disease-relevant cell systems at Cenix, including primary B and T lymphocytes and the monocyte cell line U937.