User:Kpzhang

Einstein: $A = X + Y + Z$, where A: Success, X: Hard Work, Y: Correct Method, Z: Less Talk.

An old and famous Chinese poem

Research

Database Integration

Linking biological databases semantically for knowledge discovery

Many important life sciences questions concern the relationships and interactions between biological functions or processes and biological entities such as genes. The answers may be found by examining diverse types of biological and genomic databases. Finding these answers, however, requires accessing and retrieving data from diverse biological data sources. More importantly, sophisticated knowledge discovery processes involve traversing large numbers of inherent links among various data sources. Currently, the links among data are either implemented as hyperlinks that do not explicitly indicate their meanings or labels, or hidden in a seemingly simple text format. Consequently, biologists spend many hours identifying potentially useful links and following each lead manually, which is time-consuming and error-prone. Our research aims to construct semantic relationships among all biological entities. We have designed a semantic model to categorize and formally define the links. By incorporating ontologies such as the Gene Ontology or the Sequence Ontology, we propose techniques to analyze the links embedded within and among data records, to label their semantics explicitly, and to facilitate link traversal, querying, and data sharing. Users may then ask complicated, ad hoc questions and even design their own workflows to support their knowledge discovery processes.
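To make the idea of explicitly labeled links concrete, here is a minimal sketch, assuming links are stored as (source, label, target) triples. The class `SemanticLinkGraph`, the label names `encodes` and `participates_in`, and the entity identifiers are all hypothetical illustrations, not the actual data model used in this research.

```python
from collections import defaultdict


class SemanticLinkGraph:
    """Hypothetical store of links between biological entities, each carrying an explicit semantic label."""

    def __init__(self):
        # entity id -> list of (label, target entity id)
        self._links = defaultdict(list)

    def add_link(self, source, label, target):
        self._links[source].append((label, target))

    def neighbors(self, source, label):
        """Entities reachable from `source` through one link with the given label."""
        return [target for (lbl, target) in self._links.get(source, []) if lbl == label]

    def follow(self, start, labels):
        """Follow a chain of link labels and return the set of entities reached at the end."""
        frontier = {start}
        for label in labels:
            frontier = {t for s in frontier for t in self.neighbors(s, label)}
        return frontier
```

With labels attached, a call such as `follow("gene:G1", ["encodes", "participates_in"])` answers "which processes do the products of this gene take part in?" as one query, instead of a chain of manually followed hyperlinks.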

The semantic constructs defined in our semantic model, BioSem, can be used to characterize and classify the links among biological entities. The hyperlinks recorded in the Entrez system between entries in different databases carry relationship semantics that can be represented in BioSem. However, many of these semantics are hidden in the records or summaries and must be derived from the hyperlinks and labeled explicitly. We classify and label the links using our link ontology, which is derived from the relationships in BioSem. These semantically labeled links can then help users answer complex queries.
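The classification step can be pictured as a lookup from a raw hyperlink to a relationship label. The sketch below assumes the link ontology can be approximated as a mapping from (source database, target database) pairs to labels; the labels in `LINK_ONTOLOGY` are hypothetical placeholders, not the actual relationships defined in BioSem, although `gene`, `protein`, `structure`, and `pubmed` are real Entrez database names.

```python
from typing import NamedTuple


class Hyperlink(NamedTuple):
    source_db: str
    source_id: str
    target_db: str
    target_id: str


class LabeledLink(NamedTuple):
    source: str
    label: str
    target: str


# Hypothetical link ontology: (source database, target database) -> relationship label.
LINK_ONTOLOGY = {
    ("gene", "protein"): "encodes",
    ("protein", "structure"): "has_structure",
    ("gene", "pubmed"): "described_in",
    ("protein", "pubmed"): "described_in",
}


def label_links(hyperlinks, ontology=LINK_ONTOLOGY, default="unclassified"):
    """Attach a relationship label to each raw hyperlink; unknown pairs get `default`."""
    labeled = []
    for h in hyperlinks:
        label = ontology.get((h.source_db, h.target_db), default)
        labeled.append(LabeledLink(f"{h.source_db}:{h.source_id}", label,
                                   f"{h.target_db}:{h.target_id}"))
    return labeled
```

A real link ontology would likely need more than the database pair (for example, record content or summaries) to disambiguate links; the default label keeps such cases visible rather than silently mislabeled.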

Based on the semantic model and the link ontology, we use the existing API provided by the Entrez system to implement a prototype system that demonstrates the ability to answer ad hoc and semantic queries.
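The exact Entrez calls used by the prototype are not described here. As an assumption, one natural building block is NCBI's ELink E-utility, which reports which records in one Entrez database are linked from given records in another. The sketch only builds the request URL (it does not fetch or parse the XML response); the helper name `elink_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's ELink E-utility.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom, db, ids):
    """Build an ELink request for records in `db` linked from `ids` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(str(i) for i in ids)}
    return f"{ELINK_URL}?{urlencode(params)}"
```

The links returned by such a request are exactly the unlabeled hyperlinks described above; a labeling step would then attach a relationship from the link ontology to each one. NCBI's usage guidelines also ask clients to identify themselves (for example with `tool` and `email` parameters), which this sketch omits.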

Using Semantic Meta-Graph Theory to Integrate Biological Databases

We are using semantic meta-graph theory to represent the relationships among biological entities, including functions and some important processes. A semantic meta-graph not only shows the semantics of nodes and their links but also provides coordinated information about the nodes. We are developing a three-layer framework (Frontend: Wikipedia visualization; Middleware: Semantic Meta-Graph; Backend: Data sources) to determine which data sources should be integrated for a particular query condition.
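One way the middleware layer could decide which sources to integrate is to treat the meta-graph as entity types connected by relationships, each annotated with the data source that supplies it, and to search for a path between the entity types named in a query. The sketch below uses breadth-first search under that assumption; `META_GRAPH`, its relationships, and the source assignments are hypothetical, and the actual framework may select sources differently.

```python
from collections import deque

# Hypothetical meta-graph: entity type -> list of (related type, relationship, data source).
META_GRAPH = {
    "gene": [("protein", "encodes", "Entrez Gene"),
             ("disease", "associated_with", "OMIM")],
    "protein": [("structure", "has_structure", "PDB"),
                ("function", "annotated_with", "Gene Ontology")],
}


def sources_for_query(graph, start, goal):
    """Breadth-first search for the shortest relationship path from `start` to `goal`.

    Returns (path, sources): path is a list of (from, relationship, to, source) edges,
    sources lists each data source once in path order. Returns None if unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            sources = list(dict.fromkeys(edge[3] for edge in path))
            return path, sources
        for target, relation, source in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target, source)]))
    return None
```

For a query linking genes to functions, the path runs gene, protein, function, so only the two sources on that path need to be integrated; sources off the path (here OMIM and PDB) can be left out.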

Bioinformatics

Data Mining Application--Phosphorylation Sites Prediction
Data Mining Application--Catalytic Sites Prediction

Enzyme catalytic sites are well-defined residues that are relevant to enzyme function, through which an enzyme accelerates specific chemical reactions. More and more enzyme tertiary structures have been resolved; in contrast, the functions of only a small number of enzymes are known. Predicting catalytic sites is therefore very helpful for understanding enzyme function, especially since identifying catalytic residues by biological experiments is time-consuming and expensive. In the past few years, sequence analysis has been the first guide for predicting special residues on a protein. However, sequence conservation alone is not enough for accurate prediction. We therefore add physico-chemical properties of each residue, such as B-factors and solvent accessibility, and adopt a new model, the Naïve Bayes classifier. This method reached a correctly classified rate of 91.22% and an incorrectly classified rate of 8.77%, with less running time.
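The classifier can be sketched as follows, assuming each residue is described by three numeric features (a conservation score, a B-factor, and relative solvent accessibility) and that each feature follows a Gaussian distribution within each class. The Gaussian likelihood, the feature set, and the toy data in the usage note are assumptions for illustration; the published method may discretize features or use others. The correctly classified rate is simply correct predictions divided by the number of residues.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance (Gaussian likelihood assumed)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps variances positive for features that are constant in a class.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Pick the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def classification_rates(model, X, y):
    """Return (correctly classified rate, incorrectly classified rate)."""
    correct = sum(predict(model, row) == label for row, label in zip(X, y))
    return correct / len(y), (len(y) - correct) / len(y)
```

On a hypothetical toy set where catalytic residues (label 1) are highly conserved, rigid (low B-factor), and buried, and non-catalytic residues (label 0) are not, `fit_gaussian_nb` followed by `predict` separates the two classes; the toy data do not reproduce the 91.22% figure reported above.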

Experimental results on the same dataset for catalytic site prediction

Influence of dataset size on catalytic site prediction

Comparison of prediction accuracy with previous methods

Publications

Sudha Ram, Kunpeng Zhang. Using Semantic Meta-Graph Theory to Integrate Biological Databases. Submitted to WITS 2008.
Sudha Ram, Kunpeng Zhang, Wei Wei. Linking Biological Databases Semantically for Knowledge Discovery. Accepted by ER 2008.
Kunpeng Zhang, Yun Xu, Guoliang Chen. Prediction of Enzyme Catalytic Residues Based on Naïve Bayes Classification. International Journal of Bioinformatics Research and Applications, 2008, Vol. 4, No. 3, pp. 295-305.
Yun Xu, Kunpeng Zhang, Guoliang Chen. Prediction of Enzyme Catalytic Residues Based on Bayes Classification. 2006 Workshop on Intelligent Computing & Bioinformatics of CAS, 2006, pp. 35-38.
Kunpeng Zhang, Yun Xu, Yifei Shen, Guoliang Chen. Using A Neural Networking Method to Predict Protein Phosphorylation Sites with Specific Kinase. IEEE International Symposium on Neural Networks 2006, LNCS Vol. 3973, pp. 682-689.

Preceding unsigned comment added by Kpzhang, 00:24, 19 July 2008 (UTC)