A new tool for protein fold recognition: A Bayesian heuristic threading algorithm. Page: 4 of 12
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
Extracted Text
The following text was automatically extracted from the image on this page using optical character recognition software:
2
1. Introduction
Each protein generally folds into its own unique three-dimensional structure. The latter is thus
assumed to be determined solely by the sequence of amino acids of which the protein is composed. It is
of considerable interest to discover how proteins fold, and to understand it well enough to make
predictions. Increased attention has been attracted to this problem as a result of the Human Genome
Project. The sequencing centers are determining nucleotide sequences of genetic DNA, which translate
to amino acid sequences of new proteins. The functions of these proteins-the biochemical workhorses of
the organism-are determined by their 3D conformations. Thus, solving the protein folding problem will
be of immense importance in producing the scientific and technical benefits promised by modern
molecular biology.
An active area of research is development of fold recognition methods for structure prediction.
[1,2] Such methods, also called threading, are intended for cases where the sequence of the target protein
is not similar to that of any protein of known structure. The idea is that the fold of the target protein U
might be close to one in a database, even if their sequences are dissimilar. The computation consists of
aligning the sequence of U to a known structure K in a way that optimizes the energy [3] or some other
fitness function. This is repeated for all the examples in a fold library, to find the one that gives the best
fitness score. Fold-recognition methods for structure prediction are currently lacking in accuracy and
reliability. This was illustrated in a recent contest [4], in which folds were predicted for sequences
whose maximum degrees of homology with proteins of known structure ranged from quite low to
moderate. (Experimentally-derived structures of the targets were published only after the contest.) Only
one third of the folds predicted were the correct choices from the fold libraries (not including targets
whose folds fail to match any known ones). Even when the correct fold was recognized, the alignment of
sequence to structure was generally very poor [1].
One shortcoming in existing fold-recognition methods is the lack of realism of the potentials
employed [3]. Another weakness is in the threading algorithms used, i.e., the procedures that (given the
energy function) are supposed to find the lowest-energy alignment between a given sequence and given
structure. This work is directed toward providing an improved threading algorithm.
The threading algorithm is asked to locate the lowest-energy one out of a huge number of
possible alignments of sequence U to template K. When interactions occur between amino acids that are
spatially close, this is an NP-hard problem [5], meaning that any method that always finds the global
minimum, even in the worst case, requires computational effort that increases exponentially with the
number of independent variables. Recently, Lathrop and Smith [6] published a branch-and-bound search
(BB) procedure. The BB is guaranteed to find the lowest-energy alignment for the worst case, but (as
expected) at a cost that grows exponentially with the number of degrees of freedom (i.e., with the number
of core elements in the template). Xu and Uberbacher [7] have presented a threading algorithm that
likewise always finds the global minimum. This one however, runs in polynomial time when the
interactions are modest in number.
In the following, protein fold recognition is described in further detail, along with the definition
and role of threading algorithms. Two popular threading methods are outlined, to show where the
difficulties enter. Our new Bayesian threading algorithm is then presented. Finally, we show numerical
tests of the speed and accuracy of the three methods.
2.1 Protein Fold Recognition
Several general computational approaches to the protein-folding problem are under intense
exploration. The one that concerns us here is called threading, or fold recognition. The idea is, given a
Upcoming Pages
Here’s what’s next.
Search Inside
This article can be searched. Note: Results may vary based on the legibility of text within the document.
Tools / Downloads
Get a copy of this page or view the extracted text.
Citing and Sharing
Basic information for referencing this web page. We also provide extended guidance on usage rights, references, copying or embedding.
Reference the current page of this Article.
Crawford, O. A new tool for protein fold recognition: A Bayesian heuristic threading algorithm., article, October 1, 1997; Tennessee. (https://digital.library.unt.edu/ark:/67531/metadc689876/m1/4/: accessed April 25, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.