Probabilistic Methods for Microarray Data Analysis
Consortium leader: JUKKA HEIKKONEN
Laboratory of Computational Engineering, Helsinki University of Technology
Other project leaders of the consortium:
Henry Tirri, professor, Department of Computer Science, University of Helsinki
Tomi Mäkelä, professor, Institute of Biomedicine, University of Helsinki
Doctoral students of the consortium:
Aatu Kaapro
Thomas Westerling
Tommi Mononen
Other researchers of the consortium:
Key words: microarray data analysis, probabilistic methods
Project description and main results:
The main objective of the research is to develop advanced methods for microarray data analysis. In particular we want to tackle the following main research issues:
1) Denoising of microarray images. This is a crucial preprocessing step in microarray data analysis, and the research team will study a new information-theoretic technique for it. The goal is to provide a generic methodology for microarray image noise reduction that is based on the Minimum Description Length (MDL) principle proposed by Dr. Jorma Rissanen and that is also applicable to other one- to N-dimensional biosignals. Part of the work will be done in collaboration with Dr. Jorma Rissanen.
2) Comprestimation (also called multiterminal estimation). In this research we will develop comprestimation methods for microarray images. In particular, we are interested in developing lossless and progressive compression schemes that allow lossy image reconstruction and transmission at low bitrates while still retaining the possibility of a fully lossless reconstruction. Because microarray images are processed using a multi-step procedure, no simple distortion criterion exists. This research will be carried out jointly with Professor Bin Yu (UC Berkeley).
3) Gene clustering and classification. We are interested in viewing the clustering of genes and the sample classification problem in a unified setting, as one model selection problem, rather than treating them separately. It is important to realize that unsupervised clustering of genes and clustering for prediction generate very different results. By including a classification criterion in the clustering algorithm, we generate clusters that have predictive ability, which is not guaranteed when genes are clustered in an unsupervised fashion. The approach adopted is based on the information-theoretic Minimum Description Length (MDL) principle.
4) Gene interaction network reconstruction. Our goal is to reconstruct gene interaction networks from microarray gene expression data combined with a priori knowledge of the process. We take into consideration both gene-to-gene interactions in one and several time phases and the effects of slow but cumulative actions of certain genes or of oscillations in the concentrations of the products of some genes.
5) Estimation of the reliability of the results. When analyzing and interpreting results from microarray experiments, it is essential to also provide estimates of the reliability of those results. This requires that statistical uncertainties are maintained in all processing steps, starting from noise reduction and ending with gene clustering and classification. Our goal is to develop a methodology in which the uncertainties are handled in a systematic and robust way during all image analysis steps, so that reliability estimates can be given for the final output.
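To make the MDL-based denoising idea in item 1 more concrete, the following is a minimal sketch on a one-dimensional toy signal. It is an illustration only and not the method developed in the project: the Haar transform, the synthetic step signal, the noise level, and all function names are assumptions introduced here, and the code-length formula is a commonly quoted form of the MDL denoising criterion that should be checked against the original literature before use. The sketch keeps the k largest transform coefficients, where k minimizes an approximate description length of the data.

```python
import math
import random


def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs = list(signal)
    length = len(coeffs)
    root2 = math.sqrt(2.0)
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    signal = list(coeffs)
    length = 1
    root2 = math.sqrt(2.0)
    while length < len(signal):
        merged = []
        for a, d in zip(signal[:length], signal[length:2 * length]):
            merged.append((a + d) / root2)
            merged.append((a - d) / root2)
        signal[:2 * length] = merged
        length *= 2
    return signal


def mdl_denoise(noisy):
    """Keep the k largest Haar coefficients, with k chosen by an MDL-style criterion.

    Assumed criterion (in nats, constants dropped):
      (n-k)/2 * ln(residual/(n-k)) + k/2 * ln(kept/k) + 1/2 * ln(k*(n-k))
    where 'kept' is the energy of the retained coefficients.
    """
    coeffs = haar_forward(noisy)
    n = len(coeffs)
    order = sorted(range(n), key=lambda i: -abs(coeffs[i]))
    total = sum(c * c for c in coeffs)
    best_k, best_length, kept = 1, float("inf"), 0.0
    for k in range(1, n):
        kept += coeffs[order[k - 1]] ** 2
        residual = total - kept
        if residual <= 0.0 or kept <= 0.0:
            break  # nothing left to model as noise (or nothing retained yet)
        length = (0.5 * (n - k) * math.log(residual / (n - k))
                  + 0.5 * k * math.log(kept / k)
                  + 0.5 * math.log(k * (n - k)))
        if length < best_length:
            best_k, best_length = k, length
    retained = set(order[:best_k])
    thresholded = [c if i in retained else 0.0 for i, c in enumerate(coeffs)]
    return haar_inverse(thresholded), best_k


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic example: a step signal with additive Gaussian noise (hypothetical data).
rng = random.Random(0)
clean = [0.0] * 32 + [4.0] * 32
noisy = [value + rng.gauss(0.0, 0.5) for value in clean]
denoised, best_k = mdl_denoise(noisy)
print("coefficients kept:", best_k)
print("MSE before:", mse(noisy, clean), "MSE after:", mse(denoised, clean))
```

The point of the design is that no noise variance or threshold has to be supplied by the user: the split between "signal" and "noise" coefficients follows from the code-length comparison itself. A two-dimensional transform would be needed for actual microarray images.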
Publications: