The EcoCyc electronic database is a literature-derived resource that describes the genome and biochemical machinery of Escherichia coli. The database contains up-to-date annotations and the DNA sequences of all genes in E. coli and describes all known pathways of its small-molecule metabolism. Each pathway and its component reactions and enzymes are annotated in rich detail, with extensive references to the biomedical literature. For example, the detail provided for each E. coli enzyme includes its cofactors, activators, inhibitors, and subunit structure. In July, the database included 159 metabolic pathways, 946 reactions, 629 enzymes, and 4390 genes.
Investigators also use EcoCyc to annotate other microbial genomes. Because E. coli has the largest fraction of gene products whose functions were determined experimentally, sequence-similarity matches to E. coli are less likely to result in incorrect function predictions than are matches to other microbial genomes. Those genomes may have a higher rate of annotation errors due to computational, and perhaps transitive, misannotation.
EcoCyc Features
EcoCyc's Pathway Tools software provides a powerful environment for creating, managing, and publishing pathway and genome databases (DBs) on the Web. One of the components is the Pathway Genome Navigator, which allows users to query these DBs and to visualize and compare the resulting pathways, genes, genome maps, reactions, and enzymes. The PathoLogic program computationally predicts an organism's metabolic-pathway complement and creates a DB describing that prediction. A set of graphical tools allows users to edit the annotations interactively.
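To make the idea of predicting a pathway complement concrete, the following is a minimal illustrative sketch in Python. It is not the PathoLogic algorithm, whose details are not described here. It assumes a hypothetical scoring rule: a reference pathway is predicted to be present when at least a chosen fraction of its reactions is catalyzed by enzymes annotated in the genome. The function name, the threshold, and all pathway and reaction identifiers are invented for illustration.

```python
from typing import Dict, Set


def predict_pathways(
    reference_pathways: Dict[str, Set[str]],
    annotated_reactions: Set[str],
    min_fraction: float = 0.5,
) -> Dict[str, float]:
    """Return pathways whose reaction coverage meets the threshold.

    Hypothetical rule (an assumption, not PathoLogic's actual method):
    coverage = reactions with an annotated enzyme / reactions in pathway.
    """
    predicted = {}
    for name, reactions in reference_pathways.items():
        if not reactions:
            continue  # skip empty pathway definitions
        coverage = len(reactions & annotated_reactions) / len(reactions)
        if coverage >= min_fraction:
            predicted[name] = coverage
    return predicted


# Invented reference pathways, each a set of reaction identifiers.
reference = {
    "pathway-A": {"rxn-1", "rxn-2", "rxn-3", "rxn-4"},
    "pathway-B": {"rxn-5", "rxn-6"},
}

# Invented set of reactions for which the genome has an annotated enzyme.
genome_reactions = {"rxn-1", "rxn-2", "rxn-3", "rxn-6", "rxn-9"}

result = predict_pathways(reference, genome_reactions, min_fraction=0.6)
print(result)
```

With these invented inputs, pathway-A has three of its four reactions covered and passes the 0.6 threshold, while pathway-B has one of two and does not. A real predictor would need to weigh far more evidence than simple coverage.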
Future Directions
The EcoCyc project is now moving beyond E. coli metabolic pathways to include signal-transduction pathways, transport proteins, regulation of gene expression, and tRNAs. Version 5.0, released in June, contains detailed annotations of E. coli phosphotransferase-system transporters authored by collaborators Milton Saier and Ian Paulsen (University of California, San Diego). In addition, Julio Collado (Universidad Nacional Autonoma de Mexico) is adding descriptions of E. coli gene regulation, including operons, promoters, and DNA-binding proteins. Once the regulatory mechanisms are added to EcoCyc, researchers will be able to compare known mechanisms of E. coli gene regulation with microarray-derived gene-expression data.
Pangea Systems, a bioinformatics company in Oakland, California, makes EcoCyc available free to the academic community and for a fee to commercial organizations. The company is interested in collaborating with academic genome centers to use the Pathway Tools for curation and Web publishing of their genomes.