Comparative Plant Genomics Resources at PlantGDB1
Qunfeng Dong, Carolyn J. Lawrence2, Shannon D. Schlueter, Matthew D. Wilkerson, Stefan Kurtz,
Carol Lushbough, and Volker Brendel*
Department of Genetics, Development and Cell Biology (Q.D., C.J.L., S.D.S., M.D.W., V.B.), and
Department of Statistics (V.B.), Iowa State University, Ames, Iowa 50011-3260; Zentrum für
Bioinformatik, Universität Hamburg, 20146 Hamburg, Germany (S.K.); and Department of
Computer Science, University of South Dakota, Vermillion, South Dakota 57069 (C.L.)
PlantGDB (http://www.plantgdb.org/) is a database of plant molecular sequences. Expressed sequence tag (EST) sequences are
assembled into contigs that represent tentative unique genes. EST contigs are functionally annotated with information derived
from known protein sequences that are highly similar to the putative translation products. Tentative Gene Ontology terms are
assigned to match those of the similar sequences identified. Genome survey sequences are assembled similarly. The resulting
genome survey sequence contigs are matched to ESTs and conserved protein homologs to identify putative full-length open
reading frame-containing genes, which are subsequently provisionally classified according to established gene family
designations. For Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), the exon-intron boundaries for gene structures are
annotated by spliced alignment of ESTs and full-length cDNAs to their respective complete genome sequences. Unique
genome browsers have been developed to present all available EST and cDNA evidence for current transcript models (for
Arabidopsis, see the AtGDB site at http://www.plantgdb.org/AtGDB/; for rice, see the OsGDB site at
http://www.plantgdb.org/OsGDB/). In addition, a number of bioinformatic tools have been integrated at PlantGDB that enable researchers to carry
out sequence analyses on-site using both their own data and data residing within the database.
Plant genome sequence data have been accumulating from three major sources: whole-genome sequencing
and assembly (Arabidopsis [Arabidopsis thaliana]: Lin et al., 1999; Mayer et al., 1999; Salanoubat et al.,
2000; rice [Oryza sativa]: Goff et al., 2002; Yu et al., 2002; Medicago truncatula: http://medicago.org/),
genome survey sequences (GSS; maize [Zea mays]: Palmer et al., 2003; Whitelaw et al., 2003; Fernandes
et al., 2004; sorghum [Sorghum bicolor]: Bedell et al., 2005), and expressed sequence tags (ESTs; more
than 50 species). This data flow is likely to continue, with a focus on complete sequencing of
"reference species" (Arabidopsis, rice, maize, M. truncatula, and tomato [Lycopersicon esculentum]),
draft sequencing of other selected species, and further EST and full-length cDNA sequencing.
Considerable resources have been devoted to the development of public databases that provide access
to plant genome data. However, finding ways to efficiently access and effectively analyze those
sequence data remains a nontrivial challenge for many
PlantGDB (http://www.plantgdb.org/) is our ongoing effort to aid in the organization and interpretation
of sequence data through the development and implementation of integrated databases and analytical
tools. In this article, we discuss some of the unique sequence storage and analysis capabilities
provided by PlantGDB and compare them to those made available through other online resources. All
PlantGDB data and scripts described here are freely available from our download site
(http://www.plantgdb.org/download/download.php) or by request.
1 This work was supported by the National Science Foundation Plant Genome Research Projects (grant no. DBI-0321600 to V.B. and
2 Present address: Corn Insects and Crop Genetics Research, U.S. Department of Agriculture Agricultural Research Service and Department of Agronomy, Iowa State University, Ames, IA 50011-3260.
* Corresponding author; e-mail firstname.lastname@example.org; fax 515-
PlantGDB is a plant sequence database. Its data consist of plant sequences and their associated
annotations. There are three main types of plant sequences: complete genome sequences for Arabidopsis
and rice; other kinds of sequences, including EST and GSS, extracted from public sequence repositories
such as GenBank (Benson et al., 2005); and assembled EST and GSS contigs.
Data Sources and Updates
Plant sequences that are made available through public repositories compose the core PlantGDB
sequence set. Currently, PlantGDB contains sequences from more than 24,000 plant species (belonging
to more than 6,000 genera). Our sequence-processing scripts extract all plant nucleotide sequences
from EST (http://www.ncbi.nlm.nih.gov/dbEST/), GSS (http://www.ncbi.nlm.nih.gov/dbGSS/), sequence
tagged sites (http://www.ncbi.nlm.nih.gov/dbSTS/), high-throughput genomic
(http://www.ncbi.nlm.nih.gov/HTGS/), and other genomic DNA sequence categories at GenBank and
populate our relational database. All
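As a rough illustration of the extract-and-populate step described above, the following Python sketch parses FASTA-formatted records and loads them into a relational table. It is not PlantGDB's actual code: the FASTA input format, the SQLite backend, the `sequence` table layout, the function names, and the category labels are all assumptions made for this example.

```python
import sqlite3
from typing import Iterator, List, Tuple

# Hypothetical labels standing in for the GenBank categories named in the text.
CATEGORIES = ("EST", "GSS", "STS", "HTG", "OTHER")


def _record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    """Split a FASTA header into accession and description; join the residues."""
    accession, _, description = header.partition(" ")
    return accession, description, "".join(chunks).upper()


def parse_fasta(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (accession, description, sequence) for each FASTA record in text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def load_sequences(conn: sqlite3.Connection, category: str, fasta_text: str) -> int:
    """Insert all records from fasta_text under one category; return the row count."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Assumed schema: one row per sequence, keyed by accession.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence ("
        "accession TEXT PRIMARY KEY, category TEXT NOT NULL, "
        "description TEXT, residues TEXT NOT NULL)"
    )
    rows = [(acc, category, desc, seq) for acc, desc, seq in parse_fasta(fasta_text)]
    # INSERT OR REPLACE lets a periodic update overwrite a revised record.
    conn.executemany("INSERT OR REPLACE INTO sequence VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Made-up records, for demonstration only.
    demo = ">X00001 made-up EST record\nACGT\nacgt\n>X00002 another made-up record\nGGCC\n"
    connection = sqlite3.connect(":memory:")
    print(load_sequences(connection, "EST", demo), "records loaded")
```

Keying the table on accession is one simple way to make repeated loads idempotent, so a rerun over a refreshed download replaces changed records rather than duplicating them.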
Plant Physiology, October 2005, Vol. 139, pp. 610-618, www.plantphysiol.org © 2005 American Society of Plant Biologists
Dong, Qunfeng; Lawrence, Carolyn J.; Schlueter, Shannon D.; Wilkerson, Matthew D.; Kurtz, Stefan; Brendel, Volker et al. Comparative Plant Genomics Resources at Plant GDB, article, October 2005; [Rockville, Maryland]. (digital.library.unt.edu/ark:/67531/metadc78294/m1/1/: accessed March 26, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Arts and Sciences.