Link: NCBI PubMed
Summary
Query Integrator System (QIS) is a database mediator framework intended to address robust data integration from continuously changing heterogeneous data sources in the biosciences. This paper outlines barriers to interoperability of bioscience databases, summarizes previous interoperation approaches, and describes QIS. The QIS architecture is based on a set of distributed network-based servers (data source servers, integration servers, and ontology servers) that exchange metadata as well as mappings of both metadata and data elements to elements in an ontology. Metadata version-difference determination, coupled with decomposition of stored queries, is used as the basis for partial query recovery when the schema of a data source changes.
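The idea of partial query recovery can be illustrated with a minimal sketch. It is not the QIS implementation: the `Schema` representation, the `SubQuery` class, and the function names are all hypothetical, and the sketch assumes a stored query has already been decomposed into parts that each list the fields they depend on. Comparing two metadata versions then shows which parts still run.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical metadata model: table name -> set of column names.
Schema = Dict[str, Set[str]]
Field = Tuple[str, str]  # (table, column)


@dataclass(frozen=True)
class SubQuery:
    """One part of a decomposed stored query (illustrative only)."""
    name: str
    fields: FrozenSet[Field]  # fields this part depends on


def removed_fields(old: Schema, new: Schema) -> Set[Field]:
    """Return fields present in the old metadata version but not the new one."""
    gone: Set[Field] = set()
    for table, columns in old.items():
        surviving = new.get(table, set())
        gone.update((table, col) for col in columns - surviving)
    return gone


def recover(parts: List[SubQuery], old: Schema, new: Schema):
    """Split query parts into those still usable and those broken by the change."""
    gone = removed_fields(old, new)
    usable = [p for p in parts if not (p.fields & gone)]
    broken = [p for p in parts if p.fields & gone]
    return usable, broken
```

In this sketch a part is discarded only when a field it depends on has disappeared, so the remaining parts of the stored query can still be executed after the schema changes.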
Bioscience schemas evolve significantly and rapidly because of scientific progress, changing research goals, and the discovery of better data representation techniques. As a result, federation of bioscience databases faces these major barriers:
- For performance and safety reasons, databases do not usually support unstructured SQL, so predefined parametrized queries are generally used. Any alteration to the database schema can cause these queries to break.
- Interoperation between databases becomes difficult in the presence of synonymy and polysemy. Addressing these issues requires a controlled vocabulary, with data and metadata mapped to its concepts.
- Federated search mechanisms must appropriately exclude data that are still preliminary and not available for public access beyond the research group creating an individual database.
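Two of the barriers above, vocabulary mapping and the exclusion of private data, can be sketched together. This is an illustration under stated assumptions, not how QIS does it: the `VOCABULARY` table, the concept identifiers, the record layout with a `public` flag, and both function names are hypothetical. The sketch covers synonymy only; handling polysemy would also need the context in which a term is used.

```python
# Hypothetical controlled vocabulary: local term (lowercased) -> concept ID.
VOCABULARY = {
    "neuron": "C001",
    "nerve cell": "C001",  # synonym of "neuron", so it maps to the same concept
    "astrocyte": "C002",
}


def to_concept(term):
    """Map a free-text term to a concept ID, or None if it is not in the vocabulary."""
    return VOCABULARY.get(term.strip().lower())


def public_matches(records, term):
    """Return public records whose label maps to the same concept as the search term."""
    concept = to_concept(term)
    if concept is None:
        return []
    return [
        r for r in records
        if r["public"] and to_concept(r["label"]) == concept
    ]
```

Because matching is done on concept IDs rather than on the literal strings, a search for "neuron" also finds records labeled "nerve cell", while records not flagged as public never leave the source.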
The objectives underlying the creation of QIS are: (1) to integrate data sources, (2) to devise a scalable approach to schema evolution in a loosely coupled database federation, (3) to devise robust mechanisms for metadata exchange, (4) to address the breakage of federated queries by automatically detecting schema evolution, (5) to support interoperation with a separation between public and private data, (6) to facilitate recording of system semantics, and (7) to devise an open-source, low-cost, lightweight system to facilitate research work.
QIS uses a distributed architecture that is composed of three main functional units: integrator servers (ISs), data source servers (DSSs), and the ontology server (OS). These units form the system's middle tier, connecting data consumers with data providers and knowledge sources.
Definitions
Synonymy: The semantic relation that holds between two words that can, in a given context, express the same meaning.
Polysemy: The ambiguity of an individual word or phrase that can be used, in different contexts, to express two or more different meanings.