At the same time that transcriptional analysis is booming, a considerable amount of protein family, function and pathway/process data is accumulating as a result of the structural and functional genomics initiatives. Structural data, in particular, can give profound insights into protein function and the nature of protein-protein interactions. Annotations can now be provided for nearly 50% of some genomes and for more than 80% of genes coding for enzymes and other proteins participating in biochemical pathways. In this project we therefore propose to use extensive structural data to provide prior knowledge on gene function.
The development of successful strategies for mining transcriptomics data will depend critically on integrating the expression data with this protein family/structure/function data. This data will provide crucial prior knowledge on gene function for guiding and interpreting the clustering of co-expressed genes, or for modelling temporal events in gene expression patterns.
In this project we will develop an integrated data warehouse - BioMap - containing protein family data, i.e. sequence, structure, function and pathway/process data, integrated with gene expression and other experimental data. The data warehouse will then underpin the data mining protocols we will develop, which use prior knowledge of protein families and functions to facilitate the analysis of co-expressed genes. We will also develop methods for data visualisation, especially for pathways/processes and interacting proteins suggested by the data mining.
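As a minimal sketch of how prior knowledge of protein families might be joined to the output of an expression clustering, consider the following. It is an illustration only: the gene identifiers, family labels, cluster names and the functions `annotate_clusters` and `dominant_family` are all hypothetical and are not part of BioMap or any tool named here.

```python
from collections import Counter

# Hypothetical prior knowledge: gene identifier -> protein family label.
FAMILY_OF = {
    "geneA": "kinase",
    "geneB": "kinase",
    "geneC": "protease",
    "geneD": "kinase",
}

# Hypothetical clustering result: cluster identifier -> co-expressed genes.
CLUSTERS = {
    "cluster1": ["geneA", "geneB", "geneE"],
    "cluster2": ["geneC", "geneD"],
}


def annotate_clusters(clusters, family_of):
    """Count protein families per cluster; unannotated genes count as 'unknown'."""
    return {
        cluster_id: Counter(family_of.get(gene, "unknown") for gene in genes)
        for cluster_id, genes in clusters.items()
    }


def dominant_family(counts):
    """Return the most frequent annotated family in a cluster, or None if none is annotated."""
    annotated = {family: n for family, n in counts.items() if family != "unknown"}
    if not annotated:
        return None
    return max(annotated, key=annotated.get)


summary = annotate_clusters(CLUSTERS, FAMILY_OF)
```

In a warehouse setting the two dictionaries would presumably be replaced by queries over the integrated annotation and expression data; the per-cluster family counts are the kind of summary that could guide the interpretation of a cluster.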
The focus at Birkbeck is the construction of the integrated data warehouse. The significant Computer Science challenges which arise are, first, the modelling and integration of functional data within the data warehouse and, second, the development of data warehousing techniques to support data mining that exploits changing domain knowledge.
Cluster based integration of Heterogeneous Biological Databases using the AutoMed toolkit. M. Maibaum, L. Zamboulis, G. Rimon, C. Orengo, N. Martin, A. Poulovassilis. Proc. 2nd International Workshop on Data Integration in the Life Sciences (DILS 2005), 191-207 (2005).