Tuesday, April 21, 2009

Requirements of Phylogenetic Databases

Citation: Luay Nakhleh, Daniel Miranker, Francois Barbancon, William H. Piel, Michael Donoghue. Requirements of Phylogenetic Databases, Third IEEE International Symposium on Bioinformatics and Bioengineering, vol. 0, no. 0, pp. 141, 2003.
Link: IEEE CS Digital Library

Summary

This paper examines how phylogenetic databases affect the need for and use of phylogenetic data. It evaluates the drawbacks of storing trees in the unnormalized Newick format, as existing databases such as TreeBASE do, and argues for a normalized data model, motivating it with a list of applications and queries that a biologist may wish to see integrated into a phylogenetic DBMS.

There are two major drawbacks of the unnormalized Newick format:
  • The database cannot directly support queries concerning the relationships between the taxa and the structure of the phylogeny.
  • Some processes (e.g. hybridization and horizontal gene transfer) result in graph structures, which the Newick format cannot represent.
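
Both drawbacks can be illustrated with a small sketch. It is not the schema proposed in the paper: the `edge` table, the node names, and the helper functions are assumptions made up for illustration, using Python's built-in `sqlite3` module. A Newick string is opaque to the database, whereas one row per parent-child edge lets the database answer structural queries itself, and the same table also holds a node with two parents.

```python
import sqlite3

# A hypothetical three-taxon phylogeny in Newick form. To a DBMS this is
# just a string; the tree structure stays hidden inside it.
NEWICK = "((A,B),C);"

# The same tree, normalized: one row per parent -> child edge.
TREE_EDGES = [("root", "n1"), ("root", "C"), ("n1", "A"), ("n1", "B")]

# A hypothetical network: node "h" arises by hybridization and has two
# parents ("x" and "y"), so it is a graph rather than a tree.
NETWORK_EDGES = [("root", "x"), ("root", "y"),
                 ("x", "A"), ("x", "h"), ("y", "h"), ("y", "C")]


def build_db(edges):
    """Load an edge list into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (parent TEXT NOT NULL, child TEXT NOT NULL)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    return conn


def descendants(conn, node):
    """All nodes below `node`, found by the database via a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(name) AS (
            SELECT child FROM edge WHERE parent = ?
            UNION
            SELECT edge.child FROM edge JOIN sub ON edge.parent = sub.name
        )
        SELECT name FROM sub ORDER BY name
        """,
        (node,),
    ).fetchall()
    return [row[0] for row in rows]


def parents(conn, node):
    """Direct parents of `node`; more than one means a reticulation."""
    rows = conn.execute(
        "SELECT parent FROM edge WHERE child = ? ORDER BY parent", (node,)
    ).fetchall()
    return [row[0] for row in rows]


tree_db = build_db(TREE_EDGES)
network_db = build_db(NETWORK_EDGES)

clade_of_n1 = descendants(tree_db, "n1")    # taxa below the internal node n1
hybrid_parents = parents(network_db, "h")   # both parents of the hybrid node
```

The `UNION` in the recursive query removes duplicates, so a node reachable along two paths, such as `h` in the network, is reported only once.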

The authors identify six categories of users and uses of phylogenetic databases: (1) casual users, (2) visualization, (3) study development, (4) super-tree algorithms, (5) simulation studies, and (6) comparative genomics.

Definitions

Phylogeny: A phylogeny is a rooted, leaf-labeled tree whose leaves represent a set of operational taxa and whose internal nodes represent the (hypothetical) ancestral taxa. A phylogeny on a set of taxa represents the evolutionary history of those taxa from their most recent common ancestor (at the root of the tree).
Tree of Life: A phylogenetic tree that represents the evolutionary history of all species in the world. It is expected that, when finished, the Tree of Life will contain millions of species.
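
The definition of a phylogeny above can be made concrete with a minimal sketch. The child-to-parent map, the taxon names, and the function names below are all hypothetical and chosen only for illustration. Leaves are the operational taxa, internal nodes are the hypothetical ancestors, and the most recent common ancestor of two taxa is the first node their paths to the root share.

```python
# Child -> parent map for a hypothetical rooted phylogeny ((A,B),(C,D));
# "root", "n1" and "n2" are internal (ancestral) nodes.
PARENT = {
    "A": "n1", "B": "n1",
    "C": "n2", "D": "n2",
    "n1": "root", "n2": "root",
}


def leaves(parent):
    """Operational taxa: nodes that are never the parent of another node."""
    return sorted(set(parent) - set(parent.values()))


def path_to_root(parent, node):
    """The node itself followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(parent, a, b):
    """Most recent common ancestor of a and b (a node counts as its own ancestor)."""
    ancestors_of_a = set(path_to_root(parent, a))
    for node in path_to_root(parent, b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


taxa = leaves(PARENT)
ancestor_of_a_and_c = mrca(PARENT, "A", "C")
```

For the full taxon set, the most recent common ancestor is the root, which matches the definition: the tree traces the taxa back to that single node.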