Combining multiple data sources in functional genomics for improving genome-wide inferences
Consortium leader: SAMUEL KASKI
Lab. of Computer and Information Science, Helsinki University of Technology http://www.cis.hut.fi/projects/mi/
Other project leaders of the consortium:
Jaakko Hollmén, chief research scientist, Department of Computer Science and Engineering, Helsinki University of Technology http://www.cis.hut.fi/jhollmen/
Eero Castrén, professor, Neuroscience Center, University of Helsinki http://www.helsinki.fi/neurosci/castren.htm
Sakari Knuutila, professor, Depts of Pathology and Medical Genetics, University of Helsinki http://www.helsinki.fi/cmg/
Doctoral students of the consortium:
Ioana Borze
Arto Klami
Juha E.A. Knuuttila
Mikko Korpela
Leo Lahti
Pamela Lindholm
Samuel Myllykangas
Penny Nymark
Anne Tyybäkinoja
Merja Oja
Salla Ruosaari
Suvi Savola
Ilari Scheinin
Janne Toivola
Hanna Vauhkonen
Jarkko Venna
Other researchers of the consortium:
Sami Khuri (visiting professor) (San Jose State University)
Janne Nikkilä
Jaakko Peltonen
Key words: bioinformatics, cancer, data mining, data fusion, microarray, functional genomics, neurotrophins
Project description and main results:
We address a fundamental data-analytic limitation of genome-wide microarray measurements. The number of genes that can be measured at a time is huge, but the number of samples (microarrays) is limited by measurement cost and sample availability. Hence the number of representative samples per gene is always very small, and the problem will persist: in a new experimental setting, representative data never exist a priori. This makes accurate data analysis difficult and increases the risk of false discoveries when a holistic view of the cell is sought from noisy, high-dimensional data.
Our bioinformatics research problem is how to take advantage of existing, partially representative data sets of different types to support inferences in biological and medical questions. We have developed general data analysis methods that can use the accumulating body of data in supporting genome-wide inferences in new settings and research questions. The developed methods have been applied in cancer research and comparative functional genomics.
For example, genome-wide integration strategies have been used to study various cancers. We have also developed CanGEM (Cancer GEnome Mine), a public database for storing microarray data and relevant metadata about the measurements. By combining information from aCGH and gene expression arrays using novel computational approaches, we have identified putative target genes that could have value in clinical applications.
Publications:
Gupta R, Ruosaari S, Kulathinal S, Hollmén J, and Auvinen P. Microarray image segmentation using additional dye - An experimental study. Molecular and Cellular Probes 21(5-6) 321-328. October-December, 2007. (In Press).
Keski-Säntti H, Atula T, Tikka J, Hollmén J, Mäkitie AA, and Leivo I. Predictive value of histopathologic parameters in early squamous cell carcinoma of oral tongue. Oral Oncology 43(10) 1007-1013, November, 2007.
Hollmén J and Tikka J. Compact and understandable descriptions of mixture of Bernoulli distributions. In M.R. Berthold and J. Shawe-Taylor and N. Lavrac (editors): Proc. of the 7th International Symposium on Intelligent Data Analysis (IDA 2007) 4723:1-12. Lecture Notes in Computer Science. Springer-Verlag, Ljubljana, Slovenia. September 2007.
Myllykangas S, Junnila S, Kokkola A, Autio R, Scheinin I, Karjalainen-Lindsberg ML, Kiviluoto T, Knuutila S, Puolakkainen P, and Monni O. Integrated Gene Copy Number and Expression Microarray Analysis of Gastric Cancer Highlights Potential Target Genes. Int J Cancer. In press.
Scheinin I, Myllykangas S, Borze I, Knuutila S, and Saharinen J. CanGEM: mining gene copy number changes in cancer. Nucl Acids Res. In press.
Lindholm P, Salmenkivi K, Vauhkonen H, Nicholson AG, Anttila S, Kinnula V, and Knuutila S. Gene copy number analysis in malignant pleural mesothelioma using oligonucleotide array CGH. Cytogenet Genome Res. In press.
Savola S, Nardi F, Scotlandi K, Picci P, and Knuutila S. Microdeletions in 9p21.3 induce false negative results in CDKN2A FISH analysis of Ewing sarcoma. Cytogenet Genome Res. In press.
Tikka J, Hollmén J, and Myllykangas S. Mixture modeling of DNA copy number amplification patterns in cancer. In Sandoval F, Prieto A, Cabestany J, Graña M (eds.): Proc. of the 9th International Work-Conference on Artificial Neural Networks (IWANN'2007). Springer-Verlag: Heidelberg, pp. 972-979. 2007.
Nymark P, Lindholm PM, Korpela MV, Lahti L, Ruosaari S, Kaski S, Hollmén J, Anttila S, Kinnula VL, and Knuutila S. Gene expression profiles in asbestos-exposed epithelial and mesothelial lung cell lines. BMC Genomics 8:62, 2007.
Myllykangas S, Böhling T, and Knuutila S. Specificity, selection and significance of gene amplifications in cancer. Review. Semin Cancer Biol 17:42-55, 2007.
Venna J. Dimensionality Reduction for Visual Exploration of Similarity Structures. D.Sc. thesis. Dissertations in Computer and Information Science, Report D20. Espoo, Finland, 2007.
Kaski S and Peltonen J. Learning from Relevant Tasks Only. In Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenic, and Andrzej Skowron (eds.): Machine Learning: ECML 2007 (Proc. of the 18th European Conference on Machine Learning), Lecture Notes in Artificial Intelligence 4701, pp. 608-615. Springer-Verlag, Berlin, Germany, 2007.
Peltonen J, Goldberger J, and Kaski S. Fast Semi-supervised Discriminative Component Analysis. In Diamantaras K, Adali T, Pitas I, Larsen J, Papadimitriou T, and Douglas S, editors, Machine Learning for Signal Processing XVII, pp. 312-317. IEEE, 2007.
Nybo K, Venna J and Kaski S. The self-organizing map as a visual neighbor retrieval method. In Proc. of 6th Int. Workshop on Self-Organizing Maps (WSOM '07). Bielefeld University, Bielefeld, Germany, 2007.
Klami A and Kaski S. Local Dependent Components. In Zoubin Ghahramani (Ed.), Proc. of the 24th International Conference on Machine Learning (ICML 2007), pp. 425-433. Omni Press, 2007.
Wikman H, Ruosaari S, Nymark P, Sarhadi VK, Saharinen J, Vanhala E, Karjalainen A, Hollmén J, Knuutila S and Anttila S. Gene expression and copy number profiling suggests the importance of allelic imbalance in 19p in asbestos-associated lung cancer. Oncogene 26:4730-4737, 2007.
Kaski S, Rousu J, and Ukkonen E. Probabilistic modeling and machine learning in structural and systems biology; editorial of a special issue. BMC Bioinformatics, 8(Suppl 2):S1, 2007.
Venna J, and Kaski S. Comparison of visualization methods for an atlas of gene expression data sets. Information Visualization, 6:139-154, 2007.
Sairanen M, O'Leary OF, Knuuttila JE and Castrén E. Chronic antidepressant treatment selectively increases expression of plasticity-related proteins in the hippocampus and medial prefrontal cortex of the rat. Neuroscience 144:368-374, 2007.
Nymark P, Wikman H, Ruosaari S, Hollmén J, Vanhala E, Karjalainen A, Anttila S, and Knuutila S. Identification of Specific Gene Copy Number Changes in Asbestos-Related Lung Cancer. Cancer Research 66(11):5737-5743, 2006.
Nikkilä J, Honkela A, and Kaski S. Exploring the independence of gene regulatory modules. In Rousu J, Kaski S, and Ukkonen E, editors, Probabilistic Modeling and Machine Learning in Structural and Systems Biology (PMSB 2006), workshop Proc., pages 131-136, Helsinki University Printing House, 2006.
Myllykangas S, Himberg J, Böhling T, Nagy B, Hollmén J, and Knuutila S. DNA copy number amplification profiling of human neoplasms. Oncogene 25:7324-7332, 2006.
Myllykangas S and Knuutila S. Manifestation, mechanisms and mysteries of gene amplifications. Review. Cancer Lett 232:79-89, 2006.
Rousu J, Kaski S and Ukkonen E (eds.) Probabilistic Modeling and Machine Learning in Structural and Systems Biology. Workshop, Tuusula, Finland, June 17-18. University of Helsinki, Finland, 2006.
Seiffert U, Hammer B, Kaski S and Villmann T. Neural Networks and Machine Learning in Bioinformatics - Theory and Applications. Proc. of ESANN'06, 14th European Symposium on Artificial Neural Networks, pp. 521-532. d-side, Evere, Belgium, 2006.
Semenov A., Goldsteins G. and Castrén E. Phosphoproteomic analysis of neurotrophin receptor TrkB signaling pathways in mouse brain. Cell. Mol. Neurobiol., 26:163-175, 2006.
Korpela M and Hollmén J. Extending an algorithm for clustering gene expression time series. In J. Rousu and S. Kaski and E. Ukkonen (eds.): Proc. of the Workshop on Probabilistic Modeling and Machine Learning in Structural and Systems Biology pp. 120-124. University of Helsinki, Department of Computer Science, Series of Publications B, Report B-2006-4, 2006.
Venna J and Kaski S. Visualizing Gene Interaction Graphs with Local Multidimensional Scaling. In Proc. of ESANN'06, 14th European Symposium on Artificial Neural Networks, pp. 557-562. d-side, Evere, Belgium, 2006.
Klami A and Kaski S. Generative models that discover dependencies between data sets. In McLoone S, Adali T, Larsen J, Van Hulle M, Rogers A, and Douglas SC (eds.): Machine Learning for Signal Processing XVI, pp. 123-128. IEEE, 2006.
Venna J and Kaski S. Local multidimensional scaling. Neural Networks 19, 889-899, 2006.
Rantamäki T, Knuuttila J, Hokkanen M and Castrén E. The effects of acute and long-term lithium treatments on trkB neurotrophin receptor activation in the mouse hippocampus and anterior cingulate cortex. Neuropharmacology 50, 421-427, 2006.
Bounsaythip C, Hollmén J, Kaski S and Oresic M (eds.). Proc. of KRBIO05, Symposium on Knowledge Representation in Bioinformatics. Helsinki University of Technology, Espoo, Finland, 2005.
Kaski S and Nikkilä J. Of mice and men and yeast, and dependency exploration. CSCnews, Information Technology for Science in Finland 4, pp. 24-26, 2005.
Koponen E., Rantamäki T., Voikar V., Saarelainen T., MacDonald E. and Castrén E. Enhanced BDNF signaling is associated with an antidepressant-like behavioral response and changes in brain monoamines. Cell. Mol. Neurobiol. 25, 973-980, 2005.
Sairanen M., Lucas G., Ernfors P., Castrén M, and Castrén E. BDNF and antidepressant drugs have different but coordinated effects on neuronal turnover, proliferation and survival in the adult dentate gyrus. J. Neurosci. 25, 1089-1094, 2005.
Bounsaythip C, Lindfors E, Gopalacharyulu PV, Hollmén J and Oresic M. Network-based representation of biological data for enabling context-based mining. Proc. of KRBIO'05, International Symposium of the Knowledge Representation in Bioinformatics, pp. 1-6. 2005.
Gopalacharyulu PV, Lindfors E, Bounsaythip C, Kivioja T, Yetukuri L, Hollmén J and Oresic M. Data integration and visualization system for enabling conceptual biology. Bioinformatics 21 (suppl 1.) pp. 177-185, 2005.
Nikkilä J, Roos C, Savia E, and Kaski S. Explorative modeling of yeast stress response and its regulation with gCCA and associative clustering. International Journal of Neural Systems, 15, 237-246, 2005.
Elo LL, Lahti L, Skottman H, Kyläniemi M, Lahesmaa R and Aittokallio T. Integrating probe-level expression changes across generations of Affymetrix arrays. Nucleic Acids Research 33, e193, 2005.
Kaski S, Nikkilä J, Sinkkonen J, Lahti L, Knuuttila J, and Roos C. Associative clustering for exploring dependencies between functional genomics data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Special Issue on Machine Learning for Bioinformatics Part 2, vol. 2, nr. 3, pp. 203-216, 2005.
Sinkkonen J, Kaski S, Nikkilä J, and Lahti L. Associative Clustering (AC): Technical Details. Technical Report A84, Helsinki University of Technology, Publications in Computer and Information Science, Espoo, Finland, April 2005.
Klami A, Nikkilä J, Roos C, and Kaski S. Extracting yeast stress genes by dependencies between stress treatments. Poster in the European Conference on Computational Biology 05 (ECCB), Madrid, Spain, September 28 - October 1, 2005.
Kettunen E, Nicholson AG, Nagy B, Seppänen JK, Ollikainen T, Ladas G, Kinnula V, Dusmet M, Nordling S, Hollmén J, Kamel D, Goldstraw P and Knuutila S. L1CAM, INP10, P-cadherin, tPA and ITGB4 overexpression in malignant pleural mesotheliomas revealed by combined use of cDNA and tissue microarray. Carcinogenesis 26(1), 17-25, 2005.
Nikkilä J, Roos C, and Kaski S. Integration of transcription factor binding and gene expression by associative clustering. In Bounsaythip, Hollmén, Kaski, Oresic (eds.): KRBIO05, Proc. of Symposium of Knowledge Representation in Bioinformatics, pp. 22-29, Espoo, Finland, 15.-17. June 2005, Otamedia Ltd.
Kaski S, Nikkilä J, Savia E, and Roos C. Discriminative clustering of yeast stress response. In Seiffert, Jain, and Schweizer, editors, Bioinformatics using Computational Intelligence Paradigms, pp. 75-92. Springer, Berlin, 2005.
Sinkkonen J, Nikkilä J, Lahti L, and Kaski S. Associative Clustering. ECML 2004, Pisa, Italy. In: Boulicaut, Esposito, Giannotti, Pedreschi (eds.): Machine Learning: ECML 2004 (Proc. of 15th European Conference on Machine Learning), Lecture Notes in Computer Science 3201, pp. 396-406, 2004.
Nikkilä J, Roos C, and Kaski S. Exploring dependencies between yeast stress genes and their regulators. In: Yang, Everson, and Yin, editors, Proc. of International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2004), pp. 92-98, Springer, 2004.
Knuuttila J., Törönen P. and Castrén E. Effects of antidepressant drug imipramine on gene expression in rat prefrontal cortex. Neurochem Res. 29, 1235-1244, 2004.
Lähteinen S., Pitkänen A., Knuuttila J., Törönen P. and Castrén E. Modified hippocampal gene expression during epileptogenesis in transgenic mice with altered BDNF signalling. Eur. J. Neurosci. 19, 3245-3254, 2004.
Koponen E., Võikar V., Riekki R., Saarelainen T., Rauramaa T., Rauvala H., Taira T., Castrén E. Transgenic mice overexpressing the full-length neurotrophin receptor trkB exhibit increased activation of trkB/PLC pathway, reduced anxiety, and facilitated learning. Mol. Cell. Neurosci. 26, 166-181, 2004.
Kettunen E, Anttila S, Seppänen JK, Karjalainen A, Edgren H, Lindström I, Salovaara R, Nissen AM, Salo J, Mattson K, Hollmén J, Knuutila S and Wikman H. Differentially expressed genes in non-small cell lung cancer: expression profiling of cancer-related genes in squamous cell lung cancer. Cancer Genetics and Cytogenetics 149(2) pp. 98-106, 2004.
Wikman H, Seppänen JK, Sarhadi VK, Kettunen E, Salmenkivi K, Kuosma E, Vainio-Siukola K, Nagy B, Karjalainen A, Sioris S, Salo J, Hollmén J, Knuutila S and Anttila S. Caveolins as tumor markers in lung cancer detected by combined use of cDNA and tissue microarrays. Journal of Pathology 203, pp. 584-593, 2004.