VOLUME 52, NUMBER 5
Dynamical model for DNA sequences
P. Allegrini,1 M. Barbi,2 P. Grigolini,1,2,3 and B. J. West1
1Center for Nonlinear Science, University of North Texas, P.O. Box 5368, Denton, Texas 76203
2Dipartimento di Fisica dell'Università di Pisa, Piazza Torricelli 2, 56100 Pisa, Italy
3Istituto di Biofisica del Consiglio Nazionale delle Ricerche, Via San Lorenzo 28, 56127 Pisa, Italy
(Received 4 May 1995)
We address the problem of DNA sequences, developing a "dynamical" method based on the
assumption that the statistical properties of DNA paths are determined by the joint action of
two processes, one deterministic with long-range correlations and the other random and δ-function
correlated. The generator of the deterministic evolution is a nonlinear map belonging to a class of
maps recently tailored to mimic the processes of weak chaos responsible for the birth of anomalous
diffusion. It is assumed that the deterministic process corresponds to unknown biological rules
that determine the DNA path, whereas the noise mimics the influence of an infinite-dimensional
environment on the biological process under study. We prove that the resulting diffusion process,
if the effect of the random process is neglected, is an α-stable Lévy process with 1 < α < 2. We
also show that, if the diffusion process is determined by the joint action of the deterministic and the
random process, the correlation effects of the "deterministic dynamics" are canceled on the short-
range scale, but show up in the long-range one. We denote our prescription to generate statistical
sequences as the copying mistake map (CMM). We carry out our analysis of several DNA sequences
and their CMM realizations with a variety of techniques and we especially focus on a method of
regression to equilibrium, which we call the Onsager analysis. With these techniques we establish
the statistical equivalence of the real DNA sequences with their CMM realizations. We show that
long-range correlations are present in exons as well as in introns, but are difficult to detect, since the
exon "dynamics" is shown to be determined by the entanglement of three distinct and independent
CMM's.
PACS number(s): 87.10.+e, 05.40.+j, 05.70.Ln

I. INTRODUCTION
In the past decade or so there has been a ground swell
of interest in unraveling the mysteries of DNA. One ap-
proach that has, in just a few years, proven to be partic-
ularly fruitful in this regard is the statistical analysis of
DNA sequences [1-9] using modern statistical measures.
One focus of this analysis has been on the distribution of
the four bases adenine, cytosine, guanine, and thymine
(A, C, G, and T) in order to shed light on the following
fundamental problems: (i) establishing the role of the
noncoding regions in DNA sequences (introns) in the hi-
erarchy of biological functions [2,5], (ii) finding simple
methods of statistical analysis of such sequences to dis-
tinguish the noncoding from the coding regions (exons)
[6], (iii) discovering the constraints and regularities be-
hind DNA evolution and their connections to the Darwin
theory of selection and more generally to contemporary
evolution theories [7,8], (iv) extracting new global infor-
mation on DNA and its function [2,3,5], and (v) estab-
lishing the roles of chance and determinism in genetic
evolution and coding regarded as being the "program"
underlying the development and life of every organism
[7].
A familiar kind of analysis of DNA sequences is that
used by Voss [3] based on the equal-symbol correlation.
He uses a binary indicator function U_k(x_n) that is equal
to 1 if a letter k occurs at the position x_n, and to 0 otherwise.
The letter k is defined by the four nucleotides
k = A, C, T, G. The indicator functions are used to construct
the correlation function and its Fourier transform,
the spectral density

S(f) = \sum_{k=A,C,T,G} S_k(f),    (1)

from which he removed the white noise floor. The details
of this technique are reviewed in Sec. III D.
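
As a rough illustration of Eq. (1), the sketch below builds the four indicator sequences for a short string and sums their periodograms. It is not Voss's procedure in detail: the function names, the NumPy FFT, the 1/N normalization, and the subtraction of the mean are assumptions made here for illustration, and the removal of the white noise floor is not implemented.

```python
import numpy as np

BASES = "ACGT"  # the four nucleotides indexed by k

def indicator(seq, k):
    """U_k(x_n): 1.0 where base k occurs at position n, 0.0 otherwise."""
    return np.array([1.0 if s == k else 0.0 for s in seq])

def equal_symbol_spectrum(seq):
    """Return (f, S) with S(f) the sum over k of S_k(f), as in Eq. (1)."""
    n = len(seq)
    f = np.fft.rfftfreq(n)               # frequencies in cycles per base
    total = np.zeros(len(f))
    for k in BASES:
        u = indicator(seq, k)
        u = u - u.mean()                 # assumption: drop the f = 0 component
        total += np.abs(np.fft.rfft(u)) ** 2 / n   # periodogram estimate of S_k(f)
    return f, total

# A toy sequence with an exact period of three bases.
f, S = equal_symbol_spectrum("ACG" * 40)
peak = f[np.argmax(S)]                   # frequency at which S(f) is largest
```

For this artificial period-three string, the spectrum is concentrated at f = 1/3, which is the same frequency as the peak discussed for coding sequences; real sequences would of course show a much noisier spectrum.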
The analysis of the spectrum S(f) led Voss to the
following two major observations regarding the general
properties of DNA spectra: (a) the spectra have a peak
at f = 1/3, t = 3 (for coding sequences) and (b) the
DNA sequences have long-range correlations as indicated
by the slope of the spectrum when plotted on log-log
graph paper.
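
Reading a slope off a log-log plot amounts to a straight-line fit of log S against log f. The sketch below is a minimal version of that fit, assuming a spectrum that behaves as a power law f^(-ν) at low frequency; the function name `spectral_exponent`, the cutoff `f_max`, and the use of an ordinary least-squares fit are choices made here for illustration and are not taken from the paper.

```python
import numpy as np

def spectral_exponent(f, S, f_max=0.1):
    """Estimate nu from the log-log slope of S(f) for 0 < f <= f_max.

    If S(f) is proportional to f**(-nu), then log S is linear in log f
    with slope -nu, so the estimate is minus the fitted slope.
    """
    f = np.asarray(f, dtype=float)
    S = np.asarray(S, dtype=float)
    mask = (f > 0) & (f <= f_max) & (S > 0)   # logs need positive values
    slope, _intercept = np.polyfit(np.log(f[mask]), np.log(S[mask]), 1)
    return -slope

# Synthetic check on an exact power law (hypothetical data, not a DNA spectrum).
f = np.linspace(0.001, 0.5, 500)
nu_hat = spectral_exponent(f, f ** -0.6)
```

On the synthetic power law the fit recovers the exponent used to build it, and a flat spectrum gives an estimate of zero. For a measured spectrum the result depends on the chosen frequency window, so `f_max` should be treated as a tunable assumption.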
We shall discuss the 1/3 peak subsequently. Here we
stress that the long-range correlation means that

\lim_{f \to 0} S(f) \propto \frac{1}{f^{\nu}}    (2)

with 1 > ν > 0 (the case ν = 0 corresponds to a completely
random distribution, with no correlation). The

© 1995 The American Physical Society
PHYSICAL REVIEW E
NOVEMBER 1995
1063-651X/95/52(5)/5281(16)/$06.00
52 5281
Allegrini, Paolo; Barbi, M.; Grigolini, Paolo & West, Bruce J. Dynamical model for DNA sequences, article, November 1995; [College Park, Maryland]. (https://digital.library.unt.edu/ark:/67531/metadc139499/m1/1/: accessed April 18, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Arts and Sciences.