Different people describe information in different ways, so communication over the World Wide Web is often chaotic. XML is a standard for describing Web-accessible data; however, it is still frequently necessary to map and translate XML-formatted data before users can exchange information. If there are N XML-formatted Bioinformatics data sources, about N² XSL translators are typically used, since each source must be translated to each of the others.
Methods : To address this problem, we map the XML DTDs describing Bioinformatics data into a common OWL ontology using the BSML tag set, and then map this representation into the ISO SQL/XML representation used in the newest version of the SFSU ER Design Tools. We test our strategy using NIH-NCBI XML-formatted data. The W3C OWL language uses features similar to those of relational databases to describe how pieces of information on one web site relate to information on other web sites. A single item of data can thus be logically related to a large body of information, which supports automated searching and intelligent querying. An OWL ontology for N different data sets requires 2N XSL translators for communication. Because an OWL ontology provides unified semantics, merging different data sources in a meaningful way is straightforward, and easier than merging the equivalent relational data sources. However, each data source still needs two XSL translators, one into the ontology and one out of it. When each OWL ontology community has a standard tag set as a backbone or common vocabulary, it is easier to understand how to combine data meaningfully. XML files, however, are large because they use character data types and embedded tags. Mapping the OWL community ontology into a centralized relational database, instead of storing it in XML files, speeds up storing and retrieving semantic information.
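The idea of moving tagged XML data into a relational store can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sequence` element and its `id`, `title`, and `length` attributes are hypothetical stand-ins rather than the actual BSML or NIH-NCBI DTDs, and SQLite is used in place of an ISO SQL/XML database purely so that the example runs on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the tag and attribute names are assumptions,
# not taken from the real BSML or NIH-NCBI DTDs.
SAMPLE_XML = """
<Sequences>
  <Sequence id="seq1" title="example A" length="1200"/>
  <Sequence id="seq2" title="example B" length="845"/>
</Sequences>
"""


def load_sequences(xml_text, conn):
    """Parse the XML text and store one row per <Sequence> element."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence "
        "(id TEXT PRIMARY KEY, title TEXT, length INTEGER)"
    )
    root = ET.fromstring(xml_text)
    # Character data from the XML becomes typed columns (length is an integer).
    rows = [
        (elem.get("id"), elem.get("title"), int(elem.get("length")))
        for elem in root.iter("Sequence")
    ]
    conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_sequences(SAMPLE_XML, conn)

# Once the data is relational, ordinary SQL queries replace XML traversal.
longest = conn.execute(
    "SELECT id FROM sequence ORDER BY length DESC LIMIT 1"
).fetchone()[0]
print(count, longest)
```

Storing `length` as an integer column rather than as character data is a small instance of the size and retrieval advantage described above.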
Results : We have completed a semantic mapping of the NIH-NCBI XML DTDs into the ISO SQL/XML Standard, as implemented in the newest version of the SFSU ER Design Tools, using OWL and a common BSML tag set. Our prototype is operational with a large selection of data downloaded from NIH-NCBI via the SFSU Hedgehog web site for demonstration purposes.
Conclusions : The OWL ontology we developed allows N different Bioinformatics data sources to communicate with each other using only 2N XSL translation files. By mapping the NIH-NCBI XML files into the ISO SQL/XML representation, all of the existing ER Design Tools for modeling, storing, and retrieving centralized and distributed data can be used.
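The difference between pairwise translation and translation through a shared ontology can be made concrete with a short calculation. The sketch below assumes one translator per ordered pair of distinct sources in the pairwise case, which gives N(N-1), a count that grows as N²; the function names are illustrative only.

```python
def pairwise_translators(n: int) -> int:
    """Translators needed when every source maps directly to every other source.

    Assumes one translator per ordered pair of distinct sources: n * (n - 1).
    """
    return n * (n - 1)


def ontology_translators(n: int) -> int:
    """Translators needed when each source maps into and out of one shared ontology."""
    return 2 * n


for n in (3, 10, 50):
    print(n, pairwise_translators(n), ontology_translators(n))
```

Under this assumption, 10 sources need 90 pairwise translators but only 20 when a common ontology is used, and the gap widens as more sources are added.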