Bayesian Latent Class Modelling for Functional Genomics: Combining Experimental Results and Data Base Knowledge (FGBayes)
Project leader: Professor ELJA ARJAS
Rolf Nevanlinna Institute, University of Helsinki
Doctoral students of the project:
Sarish Talikota, Rashi Gupta
Other researchers of the project:
Madhuchhanda Bhattacharjee
Jukka Corander
Dario Gasbarra
Andrew Thomas
Key words: gene expression data, Bayesian latent variable modelling, dynamic analysis
Project description and main results:
The study of the role of genes in controlling biological processes is shifting from describing the functions of individual genes towards a more holistic, systems-level point of view. In order to gain a better understanding of such complex structures, one needs to integrate evidence from a number of different sources of biological data and combine this information with expert hypotheses.
The starting point of our analyses is microarray gene expression data. Such data are generally very noisy, so extracting information that is relevant for the targeted biological questions is a challenging task in itself. The process of information extraction consists of a number of distinct steps, with the outcome of each step depending on the outcomes of the preceding ones. In our work we integrate these steps into a single framework using Bayesian latent variable modelling. This also combines different sources of uncertainty, by jointly modelling the uncertainty involved in gene identification and the uncertainty in the existing knowledge about gene function. Our results so far indicate distinct functional patterns across tissue types among the genes that were selected on the basis of expression information.
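As a purely illustrative sketch of the latent class idea (not the model used in the project), the following Python fragment treats each gene as belonging to one of two unobserved classes, "unchanged" or "differentially expressed", and applies Bayes' rule to a single noisy log expression ratio. The function names, the zero-mean normal likelihoods, and the standard deviations are assumptions made for this example only. The prior probability stands in for database knowledge about the gene.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_active(log_ratio, prior_active, sd_noise=0.5, sd_signal=2.0):
    """Posterior probability that a gene is in the 'differentially expressed' class.

    Hypothetical two-class model, assumed for illustration:
      unchanged class:  log_ratio ~ Normal(0, sd_noise)
      active class:     log_ratio ~ Normal(0, sd_signal), with sd_signal > sd_noise
    prior_active encodes prior (e.g. database-derived) belief that the gene is active.
    """
    like_null = normal_pdf(log_ratio, 0.0, sd_noise)
    like_active = normal_pdf(log_ratio, 0.0, sd_signal)
    numerator = prior_active * like_active
    return numerator / (numerator + (1.0 - prior_active) * like_null)


# The same observed log ratio under two different levels of prior support.
weak_prior = posterior_active(1.2, prior_active=0.05)
strong_prior = posterior_active(1.2, prior_active=0.50)
```

In this toy setting the same measurement yields a higher posterior probability for a gene with stronger prior support, which is the sense in which experimental evidence and existing knowledge are combined within one probabilistic model. A full analysis would also treat the class parameters as unknown and estimate them jointly with the class memberships.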
Several approaches that use expression information to make inferences about pathways are currently available for mapping gene clusters onto networks. We propose that the considerable amount of knowledge available on putative reactions, or on other interactions, should be incorporated into the analysis at an earlier stage than is currently done. We propose using a unified statistical approach based on Bayesian networks, again integrating different sources of information and data, and utilising expert knowledge on specific issues while still accounting for the uncertainties involved.
For a dynamic analysis of a cellular system it is necessary to create a mathematical model. The scope and level of abstraction of the model may vary substantially depending on whether the intention is, for example, to obtain an in-depth understanding of system behaviour or to predict complex behaviours in response to complex stimuli. Valuable tools for creating and understanding such models can be developed by combining the existing theory of causality, findings from expression level analysis, graphical models, and information from biological databases. The main goal is to develop new analytic and numerical tools that apply and extend the theory of Bayesian graphical models in conjunction with various sources of information, such as experimental data, literature, and functional and pathway databases.
The proposed work will be carried out in close collaboration with a number of study groups involved in substantive biological research, both in Finland and abroad. By applying the new tools and new ways of thinking to specific research projects, we hope to contribute to a better understanding of the corresponding biological systems, their structures, and their dynamics.
Publications: