Research article on the topic "Algorithms for the prediction of RNA secondary structures" (specialty: Mathematics)

CC BY

Abstract of the research article in mathematics; author: Elloumi M.

In this paper, we tackle the problem of predicting stable RNA secondary structures by energy computation. We present, under the Hypothesis of Loops Dependent Energy (HLDE), our dynamic programming algorithm to compute the free energies of the stable secondary structures and our traceback algorithm to predict these structures. We compute the free energies of the stable secondary structures by using a new approach, called the m-Multiloop Approach (m-MA), $m > 1$. This computation is achieved within a time proportional to $n^4$ and using a memory space proportional to $n^2$. The prediction of the stable secondary structures is achieved within a time proportional to $n^3 \log_3(n)$. Compared to other approaches, the m-MA enables us to improve the estimation of the minimum energetic contributions of the multiloops, and hence the estimation of the free energies of the stable secondary structures.


Computational Technologies

Vol 6, No 5, 2001

ALGORITHMS FOR THE PREDICTION OF RNA SECONDARY STRUCTURES

M. Elloumi

Computer Science Department, Faculty of Economic Sciences and Management of Tunis, Tunisia e-mail: [email protected]

In this paper, the problem of predicting the secondary structure of RNA macromolecules is solved by means of "energy computations". A dynamic programming algorithm for computing the free energies of stable secondary structures and a traceback algorithm for predicting these structures are presented. The free energies of stable secondary structures are computed using a new approach called the "m-multiloop approach" (m-MA), $m > 1$. The computation takes time proportional to $n^4$ and requires memory proportional to $n^2$. The prediction of stable secondary structures takes time proportional to $n^3 \log_3(n)$. Compared to other approaches, the m-MA improves the estimate of the minimum energetic contributions of the multiloops, which refines the estimate of the free energies of stable secondary structures.

Introduction

In molecular biology, a macromolecule can be coded by a string called its primary structure. Each character in this string codes a constituent of the macromolecule. For RiboNucleic Acid (RNA), these constituents, called bases, are Adenine, Cytosine, Guanine and Uracil. They are coded respectively by the characters A, C, G and U. Under some thermodynamic conditions, some regions of the macromolecule interact, thus creating folds within the macromolecule. These interactions are expressed at the level of the primary structure by pairings between the substrings coding the regions that interact. The primary structure of the macromolecule together with these pairings is called a secondary structure. A macromolecule, represented by its primary structure, can therefore have many secondary structures. However, only one of these structures is stable: the one that has the minimum free energy. Knowledge of this structure plays an important role, not only in determining the interactions of the macromolecule with DeoxyriboNucleic Acid (DNA) and proteins, but also in understanding its functions and its biochemical activities [9, 24].

Purely experimental methods, such as X-ray diffraction and Nuclear Magnetic Resonance (NMR) [18, 19], used to determine the secondary structures of RNA macromolecules are costly, require a long experimentation time and are practicable only for small molecules (a few tens of bases). We resort then to a technique, called prediction by energy computation, which is both experimental and algorithmic: the estimation of the energetic contribution generated by a given pairing or by a loop is made from experimental results [11, 12, 26, 27, 21, 16], whereas the choice of the pairings to keep in order to have the stable secondary structure is made through an algorithm [20, 30, 14, 17, 2, 6, 23, 31, 13, 7, 8]. We distinguish two types of algorithms to predict secondary structures of RNA macromolecules:

© M. Elloumi, 2001.

(i) Either algorithms adopting a regions approach: we establish the list of all the substrings (regions) that can be paired with each other while respecting the thermodynamic laws. Then, from the different combinations of non-overlapping pairings, we establish the list of all possible secondary structures. For each secondary structure, we compute its free energy by using the experimental results [11, 12, 26, 27, 21, 16]. The structure that has the minimum free energy is taken as the true secondary structure of the macromolecule.

The algorithms that adopt this approach are, unfortunately, costly in computing time. Among them, we cite those of Pipas and McMahon [20], Studnicka et al. [25], Martinez [14], and Dumas and Ninio [6]. The algorithm of Pipas and McMahon was the first to be used to predict secondary structures of RNA macromolecules. Its computing time complexity is $O(2^n)$, where $n$ is the length of the string [22].

(ii) Or algorithms adopting a dynamic programming approach [3, 4]. These algorithms have been developed either under the Hypothesis of Linearity of Energy (HLE) or under the Hypothesis of Loops Dependent Energy (HLDE) [23, 31]. Using these algorithms, we proceed in two steps:

During the first step, we compute the energy of the stable secondary structure associated with the concerned string (primary structure): the computation of the energies of the stable secondary structures associated with longer substrings is made by using the computations results of the energies of the stable secondary structures associated with shorter substrings. We reiterate this process until the energy of the stable secondary structure associated with the whole string is computed.

During the second step, we predict the pairings that generate the stable secondary structure associated with the concerned string: the prediction of the pairings that generate the stable secondary structures associated with shorter substrings is made according to the pairings that generate the stable secondary structures associated with longer substrings. We reiterate this process until the prediction of the stable secondary structure associated with the whole string is ended.

The algorithms that adopt this approach are less costly. Among them, we cite those of Waterman and Smith [30], Nussinov and Jacobson [17], Auron et al. [2], Sankoff et al. [23] and Elloumi [7, 8]. The computing time complexity of these algorithms varies between $O(n^3)$ and $O(n^4)$, where $n$ is the length of the string.

In this paper, we present under the HLDE our dynamic programming algorithm to compute the free energies of the stable secondary structures and our traceback algorithm to predict these structures. We compute the free energies of the stable secondary structures thanks to a new approach called the m-Multiloop Approach (m-MA), where $m > 1$. This computation is achieved within a time proportional to $n^4$ and using a memory space proportional to $n^2$. The prediction of the stable secondary structures is achieved within a time proportional to $n^3 \log_3(n)$. Compared to other approaches, the m-MA enables us to improve the estimation of the minimum energetic contributions of the multiloops, and hence the estimation of the free energies of the stable secondary structures. The other approaches either ignore the energetic contributions of the multiloops or compute these contributions under the HLE.

In the first section of this paper, we present, on one hand, a formal definition of a secondary structure and of its different kinds of loops, on the other hand, we define the free energy and the loop energy associated with a substring.

In the second section, we show how we represent a secondary structure and its different loops.

In the third section, we present the different equations of energies computation.

In the fourth section, we present, under the HLDE, our dynamic programming algorithm to compute the free energies of the stable secondary structures and our traceback algorithm to predict these structures.

Finally, in the last section, we present our conclusion.

1. Definitions and notations

Let $A$ be a finite alphabet. A string is an element of $A^*$, i.e. a concatenation of elements of $A$. The length of a string $w$, denoted by $|w|$, is the number of characters that constitute this string. By convention, the string of length zero will be denoted by $\varepsilon$. A string $w$ of length $n$ will be denoted by $w_{1,n}$ and the $i$th character of $w$, $1 \le i \le n$, will be denoted by $w^i$. A portion of $w$ that begins at position $i$ and ends at position $j$, $1 \le i \le j \le n$, is called a substring of $w$ and will be denoted by $w_{i,j}$. By convention, when $j < i$ we set $w_{i,j} = \varepsilon$. When $i = 1$ and $1 \le j \le n$, the substring $w_{1,j}$ is called a prefix of $w$, and when $1 \le i \le n$ and $j = n$, the substring $w_{i,n}$ is called a suffix of $w$. The primary structure of an RNA macromolecule is a string whose characters belong to the alphabet $A_{RNA} = \{A, C, G, U\}$.

Let $w$ be the primary structure of an RNA macromolecule. The set $\{w^i, w^{i+1}, \dots, w^j\}$, $0 < i \le j \le |w|$, of the characters making up a substring $w_{i,j}$ of $w$ will be denoted by $C(w_{i,j})$. We define on $C(w)$ a pairing relation, denoted by $\approx$, satisfying the following properties:

(i) If $w^i \approx w^j$ then $(j - i) \ge 4$.

(ii) If $w^i \approx w^j$ then $w^i = A$ and $w^j = U$, or $w^i = U$ and $w^j = A$, or $w^i = C$ and $w^j = G$, or $w^i = G$ and $w^j = C$, or $w^i = G$ and $w^j = U$, or $w^i = U$ and $w^j = G$. The pair $\{w^i, w^j\}$ is called a Watson-Crick Pair (WCP).

(iii) If $w^i \approx w^j$ then for any $k$, $k \in [1..i-1] \cup [i+1..j-1] \cup [j+1..|w|]$, we can have neither $w^i \approx w^k$ nor $w^j \approx w^k$.

(iv) For any couples $(i, i')$ and $(j, j')$, $i' \in\ ]i..j[$ and $j' \in [1..i[\ \cup\ ]j..|w|]$, if we have $w^i \approx w^j$ then we cannot have $w^{i'} \approx w^{j'}$.
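The four properties of the pairing relation can be checked mechanically. The following is a minimal sketch, not part of the original paper: the function name `is_valid_pairing`, the 1-based pair representation and the `min_gap` parameter (the smallest allowed value of j - i, taken here as 4) are assumptions made for illustration.

```python
# Allowed base pairs from property (ii): the four canonical pairs plus G-U / U-G.
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def is_valid_pairing(w, pairs, min_gap=4):
    """Return True if `pairs` (1-based index pairs (i, j), i < j) satisfies
    properties (i)-(iv) on the string `w`.

    `min_gap` is an assumption: the minimum allowed distance j - i.
    """
    pairs = list(pairs)
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(w)):
            return False
        if j - i < min_gap:                        # property (i)
            return False
        if (w[i - 1], w[j - 1]) not in ALLOWED:    # property (ii)
            return False
        if i in used or j in used:                 # property (iii): one partner per base
            return False
        used.update((i, j))
    for i, j in pairs:                             # property (iv): no crossing pairs
        for p, q in pairs:
            if i < p < j < q:
                return False
    return True
```

For instance, on the string `GGGAAAACCC` the nested pairs (1,10), (2,9), (3,8) are accepted, whereas two pairs that cross each other, or two pairs sharing a base, are rejected.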

A secondary structure associated with a primary structure $w$ and a pairing relation $\approx$ defined on $C(w)$ is the set $S(w, \approx) = \{(w^i, w^j) \mid w^i \approx w^j \text{ and } 0 < i < j \le |w|\}$. The empty secondary structure will be denoted by $\omega$. A subset $S(w_{i,j}, \approx)$, $0 < i < j \le |w|$, of $S(w, \approx)$ such that $S(w_{i,j}, \approx) = \{(w^p, w^q) \mid w^p \approx w^q \text{ and } 0 < i \le p < q \le j \le |w|\}$ is called a substructure of $S(w, \approx)$.

With each secondary structure $S(w, \approx)$ we associate a negative weight, denoted by $E(w, \approx)$, called the free energy of the structure $S(w, \approx)$. The function $E$ is called the energetic function. The secondary structure for which this energy is minimum is called the stable secondary structure of the macromolecule. It will be denoted by $S_{min}(w)$ and its free energy will be denoted by $E_{min}(w)$:

$$E_{min}(w) = \begin{cases} \min_{\approx} \{E(w, \approx)\} & \text{if } \exists \approx \text{ on } C(w), \\ E''(w) & \text{else.} \end{cases} \qquad (1)$$

The function $E''$ is an energetic function dependent solely on the nature of the bases that constitute the string $w$. By convention, we set $E''(\varepsilon) = 0$.

Let us now consider a substring $w_{i,j}$, $0 < i < j \le |w|$. The loop energy, denoted by $E_{loop}(w_{i,j})$, associated with the substring $w_{i,j}$ is the minimum free energy that a secondary structure of $w_{i,j}$ containing the couple $(w^i, w^j)$ can have:

$$E_{loop}(w_{i,j}) = \begin{cases} \min_{\approx \,:\, w^i \approx w^j} \{E(w_{i,j}, \approx)\} & \text{if } \exists \approx \text{ on } C(w_{i,j}) \text{ such that } w^i \approx w^j, \\ +\infty & \text{else.} \end{cases} \qquad (2)$$

Each secondary structure $S(w, \approx)$ can be subdivided in a unique way into a certain number of loops. We distinguish five types of loops:

(i) If $w^i \approx w^j$ and the bases $w^{i+1}, w^{i+2}, \dots, w^{j-1}$ are not paired then the singleton $\eta_{i,j}(w) = \{(w^i, w^j)\}$ is called a hairpin loop.

(ii) If $w^i \approx w^j$, $w^{i+1} \approx w^{j-1}$, ..., $w^{i+k} \approx w^{j-k}$, with $k \ge 1$, then the set $\sigma^{k}_{i,j}(w) = \{(w^i, w^j), (w^{i+1}, w^{j-1}), \dots, (w^{i+k}, w^{j-k})\}$ is called a stack.

(iii) If $w^i \approx w^j$ and $w^{i+k} \approx w^{j-1}$ (resp. $w^i \approx w^j$ and $w^{i+1} \approx w^{j-k}$), with $i + 1 < i + k < j - 1$ (resp. $i + 1 < j - k < j - 1$), and the bases $w^{i+1}, w^{i+2}, \dots, w^{i+k-1}$ (resp. $w^{j-k+1}, w^{j-k+2}, \dots, w^{j-1}$) are not paired then the pair $\lambda^{k}_{i,j}(w) = \{(w^i, w^j), (w^{i+k}, w^{j-1})\}$ (resp. $\rho^{k}_{i,j}(w) = \{(w^i, w^j), (w^{i+1}, w^{j-k})\}$) is called a left bulge loop (resp. right bulge loop).

(iv) If $w^i \approx w^j$ and $w^{i+l} \approx w^{j-m}$, with $i + 1 < i + l < j - m < j - 1$, and the bases $w^{i+1}, w^{i+2}, \dots, w^{i+l-1}$ and $w^{j-m+1}, w^{j-m+2}, \dots, w^{j-1}$ are not paired then the pair $\zeta^{l,m}_{i,j}(w) = \{(w^i, w^j), (w^{i+l}, w^{j-m})\}$ is called an interior loop.

(v) If $w^i \approx w^j$, $w^{i+k_1} \approx w^{i+l_1}$, $w^{i+k_2} \approx w^{i+l_2}$, ..., $w^{i+k_m} \approx w^{i+l_m}$, with $i < i + k_1 < i + l_1 < i + k_2 < i + l_2 < \dots < i + k_m < i + l_m < j$, and for any $k$, $k \in\ ]i..i+k_1[\ \cup\ ]i+l_1..i+k_2[\ \cup \dots \cup\ ]i+l_m..j[$, the base $w^k$ is not paired, then the set $\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w) = \{(w^i, w^j), (w^{i+k_1}, w^{i+l_1}), (w^{i+k_2}, w^{i+l_2}), \dots, (w^{i+k_m}, w^{i+l_m})\}$ is called a multiloop. The couples $(k_1, l_1), (k_2, l_2), \dots, (k_m, l_m)$ generate together $m$ branches, which is why $\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)$ is also called an m-multiloop.

Each one of these loops is said to be closed by the couple $(w^i, w^j)$. The set of loops that constitute a secondary structure $S(w, \approx)$ will be denoted by $L(w, \approx)$ and the set of loops that constitute the stable secondary structure $S_{min}(w)$ will be denoted by $L_{min}(w)$.
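The five loop types can be told apart by looking at the pairs that lie directly inside a closing couple. The sketch below is an illustration only: the function name `classify_loop` and the representation of a structure as a collection of 1-based index pairs are assumptions, and the input is assumed to be a valid (non-crossing) secondary structure. It labels the loop immediately below the closing couple, so a run of stacked pairs is reported one pair at a time.

```python
def classify_loop(pairs, i, j):
    """Classify the loop closed by the pair (i, j) of a non-crossing structure.

    `pairs` is an iterable of 1-based (p, q) pairs with p < q (hypothetical
    representation chosen for this sketch).
    """
    partner = {}
    for p, q in pairs:
        partner[p], partner[q] = q, p
    if partner.get(i) != j:
        raise ValueError("(i, j) is not a pair of the structure")

    inner = []          # pairs directly enclosed by (i, j)
    k = i + 1
    while k < j:
        q = partner.get(k)
        if q is not None and q > k:
            inner.append((k, q))
            k = q + 1   # skip everything nested inside (k, q)
        else:
            k += 1

    if not inner:
        return "hairpin"
    if len(inner) >= 2:
        return "multiloop"
    p, q = inner[0]
    if p == i + 1 and q == j - 1:
        return "stack"
    if q == j - 1:
        return "left bulge"    # unpaired bases only on the left side
    if p == i + 1:
        return "right bulge"   # unpaired bases only on the right side
    return "interior"
```

With the pairs (1,20), (2,19), (4,18), (5,10), (12,17), the couple (1,20) closes a stack, (2,19) a left bulge loop, (4,18) a 2-multiloop and (5,10) a hairpin loop.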

Let $l_i$ be one of the loops of a secondary structure $S(w, \approx)$. An unpaired base $w^k$ of $w$ is said to be accessible from $l_i$ if and only if there is a couple $(w^p, w^q)$ of $l_i$ such that $p < k < q$ and there is no other couple $(w^l, w^m)$ belonging to $S(w, \approx)$ but not belonging to $l_i$ such that $p < l < k < m < q$.

The set of bases that are not accessible from any loop of $L(w, \approx)$ is called the tail of the secondary structure $S(w, \approx)$ and will be denoted by $\tau(w, \approx)$. The tail of the stable secondary structure $S_{min}(w)$ will be denoted by $\tau_{min}(w)$.

Let $w_{i,j}$ be a substring and let $S_{\eta}(w_{i,j})$ be the set of the hairpin loops that can be defined from the bases of $w_{i,j}$. We define on $S_{\eta}(w_{i,j})$ a partial order relation, denoted by $\triangleright$, such that for a couple $(\{(w^p, w^q)\}, \{(w^r, w^s)\})$ of $S_{\eta}(w_{i,j}) \times S_{\eta}(w_{i,j})$, we have $\{(w^p, w^q)\} \triangleright \{(w^r, w^s)\}$ if and only if $p < q < r < s$. A list of hairpin loops $[\{(w^{p_1}, w^{q_1})\}, \{(w^{p_2}, w^{q_2})\}, \dots, \{(w^{p_m}, w^{q_m})\}]$ ordered by the partial order relation $\triangleright$ will be denoted by $\{(w^{p_1}, w^{q_1})\} \triangleright \{(w^{p_2}, w^{q_2})\} \triangleright \dots \triangleright \{(w^{p_m}, w^{q_m})\}$.

The hairpin loops graph, denoted by $G_{\eta}(w_{i,j}) = (V_{\eta}(w_{i,j}), E_{\eta}(w_{i,j}))$, associated with the substring $w_{i,j}$ is the directed graph such that $V_{\eta}(w_{i,j}) = \{(p - i + 1, q - i + 1) \mid \{(w^p, w^q)\} \in S_{\eta}(w_{i,j})\}$ and $E_{\eta}(w_{i,j}) = \{((p - i + 1, q - i + 1), (r - i + 1, s - i + 1)) \mid \{(w^p, w^q)\} \triangleright \{(w^r, w^s)\}\}$.
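The construction of the hairpin loops graph can be sketched directly from its definition. This is an illustrative sketch under assumptions: the function name `hairpin_graph`, the 1-based inclusive substring bounds and the `min_gap` parameter (minimum distance between paired positions, taken as 4) are not from the paper.

```python
# Base pairs allowed by the pairing relation (canonical pairs plus G-U / U-G).
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def hairpin_graph(w, i, j, min_gap=4):
    """Vertices and edges of the hairpin loops graph of the substring w[i..j].

    Positions are 1-based and inclusive. A vertex (p - i + 1, q - i + 1) stands
    for the hairpin loop closed by the pair (p, q); an edge u -> v exists when
    the hairpin u ends strictly before the hairpin v begins.
    """
    hairpins = [
        (p, q)
        for p in range(i, j + 1)
        for q in range(p + min_gap, j + 1)
        if (w[p - 1], w[q - 1]) in ALLOWED
    ]
    vertices = [(p - i + 1, q - i + 1) for p, q in hairpins]
    edges = [(u, v) for u in vertices for v in vertices if u[1] < v[0]]
    return vertices, edges
```

Because the order relation is strict, the resulting graph has no cycles, which is what makes an exhaustive path search on it terminate.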

2. Representation of a secondary structure and its loops

A secondary structure $S(w, \approx)$ can be represented by an undirected graph $G = (V, E)$ such that $V = C(w)$ and $E = S(w, \approx)$. A loop $l$ of $S(w, \approx)$ defined on $C(w_{i,j})$, $0 < i < j \le |w|$, can be represented by a subgraph $G' = (V', E')$ of $G$ such that $V' = C(w_{i,j})$ and $E' = l$. When the edges of $G$ are represented by segments of equal length, we say that we have a normal representation of the structure $S(w, \approx)$.

Fig. 1 is a normal representation of a secondary structure with its different loops.

Fig. 1. Normal representation of a secondary structure.

3. Equations of energies computation

As explained in the introduction, to predict the stable secondary structure of an RNA macromolecule, we have to compute the minimum free energy that a secondary structure of this macromolecule can have. Unfortunately, computing methods based on the principles of thermodynamics do not allow this energy to be computed directly. It is, however, experimentally possible to determine the energetic contribution of a WCP or of a loop [11, 12, 26, 27, 21, 16]. A first hypothesis introduced by biochemists consists in supposing that the energy of a secondary structure depends only on the WCPs that constitute this structure [17]. We will call this hypothesis the Hypothesis of Pairs Dependent Energy (HPDE):

$$E_{min}(w) = \begin{cases} \min_{\approx} \Big\{ \sum_{(w^i, w^j) \in S(w, \approx)} e(w^i, w^j) \Big\} & \text{if } \exists \approx \text{ on } C(w), \\ 0 & \text{else,} \end{cases} \qquad (3)$$

where $e$ is a negative energetic function dependent solely on the nature of the concerned WCP.
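Under the HPDE, the minimum energy can be obtained by a simple dynamic programme in the style of base-pair maximisation. The sketch below is one standard way of doing this and is not the algorithm of this paper; the function name `hpde_min_energy`, the pair energies in `PAIR_ENERGY` and the `min_gap` parameter are hypothetical values chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical (illustrative) pair energies; real values come from experiments.
PAIR_ENERGY = {
    frozenset("GC"): -3.0,
    frozenset("AU"): -2.0,
    frozenset("GU"): -1.0,
}


def hpde_min_energy(w, min_gap=4):
    """Minimum total pair energy over all non-crossing pairings of `w`.

    `min_gap` is an assumption: the minimum distance between paired positions.
    """
    n = len(w)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Minimum energy of w[i..j] (0-based, inclusive bounds).
        if j - i < min_gap:
            return 0.0                       # too short to contain any pair
        result = best(i + 1, j)              # case: w[i] stays unpaired
        for k in range(i + min_gap, j + 1):  # case: w[i] pairs with w[k]
            e = PAIR_ENERGY.get(frozenset((w[i], w[k])))
            if e is not None:
                result = min(result, e + best(i + 1, k - 1) + best(k + 1, j))
        return result

    return best(0, n - 1) if n else 0.0
```

The recursion is memoised, so each substring is solved once; it uses recursion and is therefore only suitable for short strings in this form.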

Actually, the HPDE is a hypothesis that is far from being realistic. In fact, it ignores a very important fact: only stacks contribute with negative energies in the computation of the free energies of the secondary structures, the other loops contribute with positive energies [10]. And hence, only stacks tend to stabilize the secondary structures of the macromolecules, the other loops tend to destabilize them. Starting from this fact, biochemists have introduced later a better (more realistic) hypothesis. This hypothesis consists in supposing that the free energy of a secondary structure depends not only on the pairs that constitute this structure, but also, on the other unpaired bases of the macromolecule [31]. We will call this hypothesis, Hypothesis of Loops Dependent Energy (HLDE):

$$E_{min}(w) = \begin{cases} \min_{\approx} \Big\{ \sum_{l_i \in L(w, \approx)} E'(l_i) + \sum_{w^r \in \tau(w, \approx)} e'(w^r) \Big\} & \text{if } \exists \approx \text{ on } C(w), \\ E''(w) = \sum_{s=1}^{|w|} e'(w^s) & \text{else.} \end{cases} \qquad (4)$$

The function $E'$ is an energetic function that depends on both the pairs that constitute the loop $l_i$ and the bases accessible from this loop. The values of the energetic function $E'$ are negative for the stacks and positive for the other loops [11, 12, 26, 27, 21, 16]. The function $e'$ is a positive energetic function dependent only on the nature of the base in question.

In the particular case where for any loop $l_i$, $l_i \in L(w, \approx)$, we have:

$$E'(l_i) = \sum_{\substack{(w^p, w^q) \in l_i \\ \text{and } (w^p, w^q) \notin l_h,\ (h < i)}} e(w^p, w^q) + \sum_{\substack{w^s \text{ accessible} \\ \text{from } l_i}} e'(w^s), \qquad (5)$$

we say that the function $E$ is linear. We will call this hypothesis the Hypothesis of Linearity of Energy (HLE) [23].

It is easy to see that the equation of $E_{min}(w)$ under the HPDE is a particular case of the one of $E_{min}(w)$ under the HLE. Indeed, in the equation of $E_{min}(w)$ under the HLE, we only have to set $e'(w^i) = 0$, for any $i$, $0 < i \le |w|$, to recover the one of $E_{min}(w)$ under the HPDE.
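The reduction from the HLE to the HPDE can be written out explicitly. In the worked step below, the symbol $\approx$ stands for the pairing relation, $L(w, \approx)$ for the set of loops and $S(w, \approx)$ for the set of pairs of the structure; the step is a sketch of the argument stated above, not an additional result.

```latex
% Substitute the linear form (5) into (4) and set e'(w^s) = 0 for every base.
% The tail term and the accessible-base terms vanish, and E''(w) = 0.
\begin{align*}
E_{min}(w)
  &= \min_{\approx} \Big\{ \sum_{l_i \in L(w,\approx)}
       \sum_{\substack{(w^p,w^q) \in l_i \\ (w^p,w^q) \notin l_h,\ h < i}} e(w^p,w^q) \Big\} \\
  &= \min_{\approx} \Big\{ \sum_{(w^p,w^q) \in S(w,\approx)} e(w^p,w^q) \Big\}.
\end{align*}
% The second equality holds because the condition (w^p,w^q) \notin l_h, h < i,
% counts every pair exactly once, in the first loop that contains it.
```

The last line is the HPDE expression of Equation (3), and the "else" branch of (4) reduces to 0 as well.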

Our dynamic programming algorithm to compute the free energies of the stable secondary structures, and our traceback algorithm to predict these structures, under the HLDE, use the following lemma and theorems:

Lemma 1. For any substring $w_{i,j}$, $0 < i \le j - 4 < |w|$, of a primary structure $w$ and for any $k$, $i < k \le j$, if $(w^i, w^k) \in S_{min}(w_{i,j})$ then under the HLDE we have:

$$E_{min}(w_{i,j}) = E_{loop}(w_{i,k}) + E_{min}(w_{k+1,j}).$$

Proof. By definition, we have $E_{min}(w_{i,j}) = \min_{\approx} \{E(w_{i,j}, \approx)\}$. Considering that the couple $(w^i, w^k) \in S_{min}(w_{i,j})$, we eliminate all the substructures that do not contain this couple. Therefore, we have:

$$E_{min}(w_{i,j}) = \min_{\approx \,:\, w^i \approx w^k} \{E(w_{i,j}, \approx)\}.$$

Since the concerned substructures are those that contain the couple $(w^i, w^k)$, thanks to the HLDE we have:

$$E_{min}(w_{i,j}) = \min_{\approx \,:\, w^i \approx w^k} \{E(w_{i,k}, \approx) + E(w_{k+1,j}, \approx)\},$$

i.e.:

$$E_{min}(w_{i,j}) = \min_{\approx \,:\, w^i \approx w^k} \{E(w_{i,k}, \approx)\} + \min_{\approx} \{E(w_{k+1,j}, \approx)\}.$$

Finally, thanks to Equations (1) and (2), we have:

$$E_{min}(w_{i,j}) = E_{loop}(w_{i,k}) + E_{min}(w_{k+1,j}).$$

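Theorem 1. For any substring $w_{i,j}$, $0 < i \le j - 4 < |w|$, of a primary structure $w$, we have under the HLDE:

$$E_{min}(w_{i,j}) = \begin{cases} \min \Big\{ e'(w^i) + E_{min}(w_{i+1,j}),\ \min_{i+4 \le k \le j} \{E_{loop}(w_{i,k}) + E_{min}(w_{k+1,j})\} \Big\} & \text{if } \exists \approx \text{ on } C(w_{i,j}), \\ \sum_{s=i}^{j} e'(w^s) & \text{else.} \end{cases}$$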

Proof. If there are pairings on $C(w_{i,j})$ then the secondary structures of $w_{i,j}$ are in one of the following cases:

(a) either there is no base $w^k$, $0 < i < k \le j$, such that $(w^i, w^k)$ belongs to this structure,

(b) or there is a base $w^k$, $0 < i < k \le j$, such that $(w^i, w^k)$ belongs to this structure.

If the stable secondary structure $S_{min}(w_{i,j})$ is in the case (a) then we have obviously $S_{min}(w_{i,j}) = S_{min}(w_{i+1,j})$, and therefore also $L_{min}(w_{i,j}) = L_{min}(w_{i+1,j})$. On the other hand, under the HLDE we have:

$$E_{min}(w_{i,j}) = \sum_{l_k \in L_{min}(w_{i,j})} E'(l_k) + \sum_{w^r \in \tau_{min}(w_{i,j})} e'(w^r).$$

The base $w^i$ is unpaired, so it belongs to $\tau_{min}(w_{i,j})$. We can then rewrite $E_{min}(w_{i,j})$ as follows:

$$E_{min}(w_{i,j}) = \sum_{l_k \in L_{min}(w_{i,j})} E'(l_k) + \sum_{\substack{w^r \in \tau_{min}(w_{i,j}) \\ \text{and } w^r \ne w^i}} e'(w^r) + e'(w^i).$$

Or, thanks to the equality between $L_{min}(w_{i,j})$ and $L_{min}(w_{i+1,j})$:

$$E_{min}(w_{i,j}) = \sum_{l_k \in L_{min}(w_{i+1,j})} E'(l_k) + \sum_{w^r \in \tau_{min}(w_{i+1,j})} e'(w^r) + e'(w^i).$$

Hence, thanks to Equation (4):

$$E_{min}(w_{i,j}) = E_{min}(w_{i+1,j}) + e'(w^i).$$

On the other hand, if the structure $S_{min}(w_{i,j})$ is in the case (b) then let us call $k_0$ the position in $w_{i,j}$ such that $(w^i, w^{k_0}) \in S_{min}(w_{i,j})$. According to Lemma 1, we have:

$$E_{min}(w_{i,j}) = E_{loop}(w_{i,k_0}) + E_{min}(w_{k_0+1,j}).$$

Since $E_{min}(w_{i,j})$ is minimum, we have then:

$$E_{min}(w_{i,j}) = \min_{i+4 \le k \le j} \{E_{loop}(w_{i,k}) + E_{min}(w_{k+1,j})\}.$$

Considering both cases (a) and (b), since we seek to minimize the value of the energy $E_{min}(w_{i,j})$, we have then:

$$E_{min}(w_{i,j}) = \min \Big\{ e'(w^i) + E_{min}(w_{i+1,j}),\ \min_{i+4 \le k \le j} \{E_{loop}(w_{i,k}) + E_{min}(w_{k+1,j})\} \Big\}.$$

Finally, from Equation (4), if there are no pairings on $C(w_{i,j})$ then we have:

$$E_{min}(w_{i,j}) = \sum_{s=i}^{j} e'(w^s).$$
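The recurrence of Theorem 1 translates into a table-filling loop once the loop energies are available. The following is a minimal sketch under assumptions: the function name `fill_emin`, the dictionary `eloop` mapping 1-based pairs (i, k) to finite loop energies, and the callback `e_unpaired` for the energy of an unpaired base are all hypothetical names introduced here. When no loop energy is finite, the recurrence reduces to the sum of the unpaired-base energies, which covers the second branch of the theorem.

```python
import math


def fill_emin(n, eloop, e_unpaired, min_gap=4):
    """Return a table E with E[i][j] = minimum free energy of positions i..j.

    n          -- length of the string (positions are 1-based)
    eloop      -- dict {(i, k): loop energy}; missing entries mean +infinity
    e_unpaired -- function giving the energy of the unpaired base at a position
    min_gap    -- assumed minimum distance between paired positions
    """
    # E[i][j] stays 0 for the empty substring (j < i), including E[n + 1][n].
    E = [[0.0] * (n + 2) for _ in range(n + 2)]
    for j in range(1, n + 1):
        for i in range(j, 0, -1):
            best = e_unpaired(i) + E[i + 1][j]          # base i left unpaired
            for k in range(i + min_gap, j + 1):         # base i paired with base k
                loop = eloop.get((i, k), math.inf)
                if loop < math.inf:
                    best = min(best, loop + E[k + 1][j])
            E[i][j] = best
    return E
```

For a fixed j, the entries are filled for decreasing i, so every value on the right-hand side of the recurrence is already available when it is needed.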

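Theorem 2. Let $w_{i,j}$, $0 < i \le j - 4 < |w|$, be a substring of a primary structure $w$ such that the bases $w^i$ and $w^j$ can be paired with each other. Under the HLDE, we have:

$$\begin{aligned} E_{loop}(w_{i,j}) = \min \Big\{ & E'(\eta_{i,j}(w)), \\ & \min_{l} \{E'(\sigma^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-l})\}, \\ & \min \big\{ \min_{l} \{E'(\lambda^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-1})\},\ \min_{l} \{E'(\rho^{l}_{i,j}(w)) + E_{loop}(w_{i+1,j-l})\} \big\}, \\ & \min_{(l,m)} \{E'(\zeta^{l,m}_{i,j}(w)) + E_{loop}(w_{i+l,j-m})\}, \\ & \min_{(k_1,l_1,\dots,k_m,l_m)} \Big\{E'(\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)) + \sum_{s=1}^{m} E_{loop}(w_{i+k_s,i+l_s})\Big\} \Big\}. \end{aligned}$$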

Proof. Let us denote by $\approx_{min}$ the pairing on $C(w_{i,j})$ such that $E(w_{i,j}, \approx_{min})$ is minimum and $w^i \approx_{min} w^j$. We have $E_{loop}(w_{i,j}) = E(w_{i,j}, \approx_{min})$. The loop $q_0$ of $S(w_{i,j}, \approx_{min})$ that contains the couple $(w^i, w^j)$ is in one of the following cases:

(a) either it is a hairpin loop,

(b) or it is a stack,

(c) or it is a bulge loop,

(d) or it is an interior loop,

(e) or, finally, it is a multiloop.

Let us examine each one of these cases.

Case (a): In this case, we have obviously:

$$E_{loop}(w_{i,j}) = E'(\eta_{i,j}(w)).$$

Case (b): Let us call $l_0$ the position in $w_{i,j}$ such that $\sigma^{l_0}_{i,j}(w) = q_0$. Under the HLDE, we have:

$$E_{loop}(w_{i,j}) = E'(\sigma^{l_0}_{i,j}(w)) + E(w_{i+l_0,j-l_0}, \approx_{min}).$$

Considering that the base $w^{i+l_0}$ is paired with the base $w^{j-l_0}$ (since $(w^{i+l_0}, w^{j-l_0}) \in \sigma^{l_0}_{i,j}(w)$) and that the energy $E(w_{i+l_0,j-l_0}, \approx_{min})$ is minimum (otherwise the energy $E_{loop}(w_{i,j})$ would not be minimum), we have then:

$$E(w_{i+l_0,j-l_0}, \approx_{min}) = E_{loop}(w_{i+l_0,j-l_0}).$$

Hence, we have:

$$E_{loop}(w_{i,j}) = E'(\sigma^{l_0}_{i,j}(w)) + E_{loop}(w_{i+l_0,j-l_0}).$$

The energy $E'(\sigma^{l_0}_{i,j}(w)) + E_{loop}(w_{i+l_0,j-l_0})$ is minimum, we have then:

$$E_{loop}(w_{i,j}) = \min_{l} \{E'(\sigma^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-l})\}.$$

The cases (c), (d) and (e) are processed in the same way as the case (b) and we have:

Case (c):

$$E_{loop}(w_{i,j}) = \min \big\{ \min_{l} \{E'(\lambda^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-1})\},\ \min_{l} \{E'(\rho^{l}_{i,j}(w)) + E_{loop}(w_{i+1,j-l})\} \big\}.$$

Case (d):

$$E_{loop}(w_{i,j}) = \min_{(l,m)} \{E'(\zeta^{l,m}_{i,j}(w)) + E_{loop}(w_{i+l,j-m})\}.$$

Case (e):

$$E_{loop}(w_{i,j}) = \min_{(k_1,l_1,\dots,k_m,l_m)} \Big\{E'(\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)) + \sum_{s=1}^{m} E_{loop}(w_{i+k_s,i+l_s})\Big\}.$$

Considering all these cases together, since we seek to minimize the value of the energy $E_{loop}(w_{i,j})$, we have then:

$$\begin{aligned} E_{loop}(w_{i,j}) = \min \Big\{ & E'(\eta_{i,j}(w)), \\ & \min_{l} \{E'(\sigma^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-l})\}, \\ & \min \big\{ \min_{l} \{E'(\lambda^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-1})\},\ \min_{l} \{E'(\rho^{l}_{i,j}(w)) + E_{loop}(w_{i+1,j-l})\} \big\}, \\ & \min_{(l,m)} \{E'(\zeta^{l,m}_{i,j}(w)) + E_{loop}(w_{i+l,j-m})\}, \\ & \min_{(k_1,l_1,\dots,k_m,l_m)} \Big\{E'(\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)) + \sum_{s=1}^{m} E_{loop}(w_{i+k_s,i+l_s})\Big\} \Big\}. \end{aligned}$$

We present now, under the HLDE, our dynamic programming algorithm, guided by a base-to-base pairing, to compute the energy of the stable secondary structure, then, we present our algorithm that predicts this structure by basing itself on tracing back the matrix filled by the previous algorithm. We use a new approach, called m-Multiloop Approach (m-MA), m > 1, that enables us to determine the m-multiloop that has the minimum energetic contribution. Thanks to this approach, the complexity of our prediction algorithm under the HLDE is reduced to a polynomial order.

In [8], we have also presented, under the HLE, our other algorithms of energies computation and prediction.

4. Energies computation and prediction of the stable secondary structure

According to Theorems 1 and 2, the computation of an energy $E_{min}(w)$ under the HLDE depends on the computation of the energetic contributions of the different loops defined on $C(w)$.

The computation of the energetic contributions of stacks, hairpin, bulge and interior loops does not cause problems. In fact, there are experimental results [11, 12, 26, 27, 21, 16] concerning the energetic contributions of these loops. Moreover, for a substring $w_{i,j}$ such that the bases $w^i$ and $w^j$ can be paired with each other, the number of these loops does not increase exponentially with the size of the substring $w_{i,j}$. Indeed, the number of stacks closed by the couple $(w^i, w^j)$ is $O(j - i)$, the numbers of bulge and interior loops are $O((j - i)^2)$, and finally, there is only one hairpin loop closed by the couple $(w^i, w^j)$.

Concerning the computation of the energetic contributions of multiloops, two problems arise:

(i) The first one concerns the measure of the energetic contributions of these loops. In fact, the measures that have been made concern only multiloops with at most 8 residues [16]. Those which concern multiloops with more than 8 residues are very difficult to accomplish.

(ii) The second one concerns the number of possible multiloops associated with a given substring. Indeed, for a substring w*j, such that the bases w* and wj can be paired with each other, the number of these loops grows exponentially with the size of the substring w*j [7] (Theorem 2.5, p25).

To overcome these two problems, we suggest the following approach, which we will call m-Multiloop Approach (m-MA), m > 1, that enables us to determine the m-multiloop that has the minimum energetic contribution. In what follows, we describe our approach for m = 3:

If the concerned substring, let us call it $w_{i,j}$, is of a length greater than a certain threshold $tsz$, $tsz \ll |w|$, we operate by a divide-and-conquer strategy [1]: we locate a couple $(w^l, w^m)$, $i < l < m < j$, such that, on one hand, the bases $w^l$ and $w^m$ can be paired with each other and, on the other hand, the loop energy $E_{loop}(w_{l,m})$ is minimum. This couple divides the substring $w_{i+1,j-1}$ into two smaller substrings: $w_{i+1,l-1}$ and $w_{m+1,j-1}$. Then we process the substring $w_{i+1,l-1}$ (resp. $w_{m+1,j-1}$) in the same way as we have processed the substring $w_{i,j}$: we locate a couple $(w^p, w^q)$ (resp. $(w^r, w^s)$), $i + 1 \le p < q \le l - 1$ (resp. $m + 1 \le r < s \le j - 1$), such that, on one hand, the bases $w^p$ and $w^q$ (resp. $w^r$ and $w^s$) can be paired with each other and, on the other hand, the loop energy $E_{loop}(w_{p,q})$ (resp. $E_{loop}(w_{r,s})$) is minimum.

The pairings between the bases $w^p$ and $w^q$, $w^l$ and $w^m$, $w^r$ and $w^s$, and $w^i$ and $w^j$, generate together a 3-multiloop.

If the substring $w_{i,j}$ is of a length less than the threshold $tsz$, we identify all the multiloops defined on $C(w_{i,j})$ and closed by the couple $(w^i, w^j)$. The energetic contribution of one of these loops is estimated thanks to Ninio's experimental results [16].

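By adopting the 3-MA, the loop energy $E_{loop}(w_{i,j})$ is defined by the following equation:

$$\begin{aligned} E_{loop}(w_{i,j}) = \min \Big\{ & E'(\eta_{i,j}(w)),\ \min_{l} \{E'(\sigma^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-l})\}, \\ & \min \big\{ \min_{l} \{E'(\lambda^{l}_{i,j}(w)) + E_{loop}(w_{i+l,j-1})\},\ \min_{l} \{E'(\rho^{l}_{i,j}(w)) + E_{loop}(w_{i+1,j-l})\} \big\}, \\ & \min_{(l,m)} \{E'(\zeta^{l,m}_{i,j}(w)) + E_{loop}(w_{i+l,j-m})\},\ E^{\mu}(w_{i,j}) \Big\}, \end{aligned}$$

where $E^{\mu}(w_{i,j})$ is the minimum energy that a substructure associated with the substring $w_{i,j}$, containing a multiloop closed by the couple $(w^i, w^j)$, can have:

$$E^{\mu}(w_{i,j}) = \begin{cases} E_{loop}(w_{l_0,m_0}) + E_{loop}(w_{p_0,q_0}) + E_{loop}(w_{r_0,s_0}) + \sum_{s \in\, ]i..j[\, \setminus ([p_0..q_0] \cup [l_0..m_0] \cup [r_0..s_0])} e'(w^s) + e(w^i, w^j) & \text{if } (j - i) > tsz, \\ \min_{(k_1,l_1,\dots,k_m,l_m)} \Big\{E'(\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)) + \sum_{s=1}^{m} E'(\eta_{i+k_s,i+l_s}(w))\Big\} & \text{else.} \end{cases}$$

Fig. 2. A secondary structure with a 3-multiloop.

where $l_0$, $m_0$, $p_0$, $q_0$, $r_0$ and $s_0$ are positions in $w_{i,j}$ such that:

$$\begin{aligned} E_{loop}(w_{l_0,m_0}) &= \min_{i < l < m < j} \{E_{loop}(w_{l,m})\}, \\ E_{loop}(w_{p_0,q_0}) &= \min_{i < p < q < l_0} \{E_{loop}(w_{p,q})\}, \\ E_{loop}(w_{r_0,s_0}) &= \min_{m_0 < r < s < j} \{E_{loop}(w_{r,s})\}. \end{aligned} \qquad (8)$$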
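The long-substring case of the 3-multiloop approach can be sketched as three successive minimum searches over a table of loop energies. The code below is an illustration under assumptions: the function name `three_multiloop_energy`, the dictionary `eloop` of loop energies keyed by 1-based pairs, and the callbacks `e_unpaired` and `e_closing` are hypothetical. Returning infinity when one of the three branches cannot be found is also an assumption, since the text does not say how that situation is handled.

```python
import math


def three_multiloop_energy(i, j, eloop, e_unpaired, e_closing):
    """Energy of the 3-multiloop closed by (i, j), chosen greedily.

    eloop      -- dict {(a, b): loop energy} for pairable positions a < b
    e_unpaired -- energy of the unpaired base at a given position
    e_closing  -- energy contribution of the closing pair (i, j)
    """
    def argmin(lo, hi):
        # Pair (a, b) with lo < a < b < hi and minimum loop energy, or None.
        cands = [(v, a, b) for (a, b), v in eloop.items()
                 if lo < a < b < hi and v < math.inf]
        if not cands:
            return None
        _, a, b = min(cands)
        return a, b

    centre = argmin(i, j)                 # couple (l0, m0)
    if centre is None:
        return math.inf
    l0, m0 = centre
    left = argmin(i, l0)                  # couple (p0, q0), left of (l0, m0)
    right = argmin(m0, j)                 # couple (r0, s0), right of (l0, m0)
    if left is None or right is None:
        return math.inf                   # assumption: no 3-multiloop available
    (p0, q0), (r0, s0) = left, right

    covered = (set(range(p0, q0 + 1)) | set(range(l0, m0 + 1))
               | set(range(r0, s0 + 1)))
    unpaired = sum(e_unpaired(s) for s in range(i + 1, j) if s not in covered)
    return (eloop[(l0, m0)] + eloop[(p0, q0)] + eloop[(r0, s0)]
            + unpaired + e_closing(i, j))
```

Each search scans the candidate pairs once, so the whole selection stays polynomial in the length of the substring, in contrast with the enumeration of all multiloops.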

Compared to other approaches, among them those of Waterman [28, 29], Zuker and Stiegler [33] and Sankoff et al. [23], the m-MA improves the estimation of the minimum energetic contributions of the multiloops. In fact:

(i) When the concerned substring is long, we compute in a polynomial time the minimum energy that can have an m-multiloop, where m = 2 for 5S rRNA, m = 3 for 4RNA, 12S rRNA and 16S rRNA, and m = 4 for 23S rRNA [9, 24].

(ii) When the concerned substring is short, we compute the minimum energy that can have a multiloop, basing ourselves on Ninio's experimental results [16]. This enables us to compute more accurately the minimum free energy of the whole macromolecule.

The other approaches:

(i) Either ignore the energetic contributions of multiloops, i.e. suppose that $E^{\mu}(w_{i,j}) = 0$ for any couple $(i, j)$, $0 < i \le j - 4 < |w|$. This is the case of Waterman's approach [28, 29] and of the approach of Zuker and Stiegler [33].

(ii) Or compute these contributions under the HLE. These approaches suppose, implicitly, that we have $E^{\mu}(w_{i,j}) = E_{min}(w_{i+1,j-1}) + e(w^i, w^j)$ for any couple $(i, j)$, $0 < i \le j - 4 < |w|$. This is the case of the approach of Sankoff et al. [23].

Certainly, the approaches of the second type are better than those of the first type, but they nevertheless remain unrealistic. Indeed, the linear functions that approximate the energetic contributions of the multiloops give values that are too large when these loops are too long or too short, and values that are too small when these loops are of moderate length [23]. Our approach, by its search for m-multiloops in the case of long substrings and its use of experimental results [16] in the case of short substrings, gives a more accurate estimation of the minimum energetic contributions of the multiloops.

On the other hand, by using the m-MA, the computation of the free energies of the stable secondary structures is achieved within a time proportional to $n^4$ and using a memory space proportional to $n^2$. These complexities are known to be the best existing ones for solving the problem of the prediction of the stable secondary structures of RNA macromolecules under the HLDE [16, 31, 29, 32, 15].

We present now our dynamic programming algorithm that computes the loop energies by using the 3-MA. We will need one half of a matrix $M$, of size $(|w| \times |w|)$, to store the energetic values of the function $E_{loop}$. For any couple $(i, j)$, $0 < i < j \le |w|$, we will set:

$$M[j, i] := E_{loop}(w_{i,j}).$$

The other half of the matrix will be used to store the energetic values of the function $E_{min}$. For any couple $(i, j)$, $0 < i < j \le |w|$, we will set:

$$M[i, j] := E_{min}(w_{i,j}).$$
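The sharing of one square matrix between the two tables can be sketched with a small wrapper. The class name `EnergyMatrix` and its method names are hypothetical; placing the value for a single base on the diagonal and returning 0 for an empty substring are assumptions made for this sketch.

```python
import math


class EnergyMatrix:
    """One (n+1) x (n+1) table holding loop energies below the diagonal
    and minimum free energies on and above it (index 0 is unused)."""

    def __init__(self, n):
        self.M = [[math.inf] * (n + 1) for _ in range(n + 1)]

    def set_eloop(self, i, j, value):
        assert i < j
        self.M[j][i] = value          # lower triangle: M[j, i]

    def eloop(self, i, j):
        return self.M[j][i] if i < j else math.inf

    def set_emin(self, i, j, value):
        assert i <= j
        self.M[i][j] = value          # upper triangle (and diagonal): M[i, j]

    def emin(self, i, j):
        # Assumption: the empty substring (j < i) has energy 0.
        return self.M[i][j] if i <= j else 0.0
```

Both tables together therefore need memory proportional to the square of the string length, with no second matrix.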

Algorithm 1

(i) (i.a) Construct a matrix $M$ of size $(|w| \times |w|)$ such that, for any couple $(i, j)$, $0 \le (j - i) < 4$, we have $M[j, i] := +\infty$;

(ii) for j := 5 to |w| do
  for i := j - 4 downto 1 do
    if $\{w^i, w^j\}$ is a WCP then

    (ii.a) $m_1 := E'(\eta_{i,j}(w))$; {hairpin}

    (ii.b) $m_2 := \min \{E'(\sigma^{l}_{i,j}(w)) + M[j - l, i + l]\}$; {stack}
           over all $l$ with $1 \le l \le \lfloor (j - i - 4)/2 \rfloor$ and $\{w^{i+l}, w^{j-l}\}$ is a WCP

    (ii.c) {bulge}

    (ii.c') $m_3 := \min \{E'(\lambda^{l}_{i,j}(w)) + M[j - 1, i + l]\}$; {left bulge}
            over all $l$ with $2 \le l \le (j - i - 5)$ and $\{w^{i+l}, w^{j-1}\}$ is a WCP

    (ii.c'') $m'_3 := \min \{E'(\rho^{l}_{i,j}(w)) + M[j - l, i + 1]\}$; {right bulge}
             over all $l$ with $2 \le l \le (j - i - 5)$ and $\{w^{i+1}, w^{j-l}\}$ is a WCP

    (ii.c''') $m_3 := \min \{m_3, m'_3\}$;

    (ii.d) $m_4 := \min \{E'(\zeta^{l,m}_{i,j}(w)) + M[j - m, i + l]\}$; {interior}
           over all $(l, m)$ with $l \ge 2$, $m \ge 2$, $(j - m) - (i + l) \ge 4$ and $\{w^{i+l}, w^{j-m}\}$ is a WCP

    (ii.e) {multiloop}
    if $(j - i) > tsz$ then {3-multiloop}

      (ii.e')

      (ii.e'.a.a) $m_5 := \min \{M[m, l]\}$
                  over all $(l, m)$ with $(i + 1) \le l \le (j - 5)$, $(l + 4) \le m \le (j - 1)$ and $\{w^l, w^m\}$ is a WCP

      (ii.e'.a.b) give values to $l_0$ and $m_0$ such that $M[m_0, l_0] = m_5$;

      (ii.e'.b.a) $m'_5 := \min \{M[q, p]\}$ {$\{(w^p, w^q)\}$ is on the left of $\{(w^l, w^m)\}$}
                  over all $(p, q)$ with $(i + 1) \le p \le (l_0 - 5)$, $(p + 4) \le q \le (l_0 - 1)$ and $\{w^p, w^q\}$ is a WCP

      (ii.e'.b.b) give values to $p_0$ and $q_0$ such that $M[q_0, p_0] = m'_5$;

      (ii.e'.c.a) $m''_5 := \min \{M[s, r]\}$; {$\{(w^r, w^s)\}$ is on the right of $\{(w^l, w^m)\}$}
                  over all $(r, s)$ with $(m_0 + 1) \le r \le (j - 5)$, $(r + 4) \le s \le (j - 1)$ and $\{w^r, w^s\}$ is a WCP

      (ii.e'.c.b) give values to $r_0$ and $s_0$ such that $M[s_0, r_0] = m''_5$;

      (ii.e'.d) $m_5 := m_5 + m'_5 + m''_5 + \sum_{s \in\, ]i..j[\, \setminus ([p_0..q_0] \cup [l_0..m_0] \cup [r_0..s_0])} e'(w^s) + e(w^i, w^j)$

    else {multiloop closed by $(w^i, w^j)$}

      (ii.e'') $m_5 := \min_{(k_1,l_1,\dots,k_m,l_m)} \Big\{E'(\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)) + \sum_{s=1}^{m} E'(\eta_{i+k_s,i+l_s}(w))\Big\}$

    endif

    (ii.f) $M[j, i] := \min \{m_1, m_2, m_3, m_4, m_5\}$;

    else $M[j, i] := +\infty$
    endif
  endfor
endfor
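The loop structure of Algorithm 1 can be illustrated with a reduced sketch that covers only the hairpin, stack, bulge and interior steps (ii.a)-(ii.d) and (ii.f). The multiloop step (ii.e) is deliberately left out, so this is not the full algorithm. The function name `fill_loop_energies` and the energy callbacks `hairpin`, `stack`, `bulge` and `interior` are hypothetical placeholders for experimentally measured values, and `min_gap` (taken as 4) is an assumption.

```python
import math

ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def fill_loop_energies(w, hairpin, stack, bulge, interior, min_gap=4):
    """Return M with M[j][i] = loop energy of positions i..j (1-based),
    ignoring multiloops. Entries for non-pairable (i, j) stay at +infinity."""
    n = len(w)

    def can_pair(a, b):
        return (w[a - 1], w[b - 1]) in ALLOWED

    M = [[math.inf] * (n + 1) for _ in range(n + 1)]
    for j in range(min_gap + 1, n + 1):
        for i in range(j - min_gap, 0, -1):
            if not can_pair(i, j):
                continue
            best = hairpin(i, j)                                  # step (ii.a)
            for l in range(1, (j - i - min_gap) // 2 + 1):        # step (ii.b)
                if not can_pair(i + l, j - l):
                    break             # a stack needs every intermediate pair
                best = min(best, stack(l) + M[j - l][i + l])
            for l in range(2, j - i - min_gap):                   # step (ii.c)
                best = min(best,
                           bulge(l) + M[j - 1][i + l],            # left bulge
                           bulge(l) + M[j - l][i + 1])            # right bulge
            for l in range(2, j - i - min_gap - 1):               # step (ii.d)
                for m in range(2, j - i - l - min_gap + 1):
                    best = min(best, interior(l, m) + M[j - m][i + l])
            M[j][i] = best                                        # step (ii.f)
    return M
```

Entries that were never filled remain infinite, so inner couples that cannot be paired drop out of every minimum without an explicit test. Every entry read on the right-hand side has a smaller second index j and has therefore been computed in an earlier iteration of the outer loop.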

During the step (ii.e''), we seek to construct all the multiloops defined from the bases of the substring $w_{i+1,j-1}$ by searching for all the lists of hairpin loops defined, too, from the bases of the substring $w_{i+1,j-1}$. The search for all these lists is made by identifying all the paths that exist in the graph $G_{\eta}(w_{i+1,j-1})$. Indeed, each vertex $(k_s, l_s)$ in this graph represents a hairpin loop $\{(w^{i+k_s}, w^{i+l_s})\}$. A path $[(k_1, l_1), (k_2, l_2), \dots, (k_m, l_m)]$, $m \ge 2$, represents then an ordered list $\{(w^{i+k_1}, w^{i+l_1})\} \triangleright \{(w^{i+k_2}, w^{i+l_2})\} \triangleright \dots \triangleright \{(w^{i+k_m}, w^{i+l_m})\}$ made up of $m$ hairpin loops. If the bases $w^i$ and $w^j$ can be paired with each other then this list and the couple $(w^i, w^j)$ constitute together the multiloop $\mu^{k_1,l_1,\dots,k_m,l_m}_{i,j}(w)$.

The search of all the paths linking two vertices in a graph is a well-known problem. In [5], Berge describes an algorithm that solves this problem.
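As an illustration of this path search, the following Python sketch enumerates every simple path between two vertices by a plain depth-first traversal. It is not necessarily the algorithm that Berge describes in [5]; the function name `all_paths` and the adjacency-dict representation of the graph are assumptions made for the example.

```python
def all_paths(graph, source, target):
    """Yield every simple path from source to target in a directed graph.

    graph maps each vertex to the list of its successors.
    """
    stack = [(source, [source])]
    while stack:
        vertex, path = stack.pop()
        if vertex == target:
            yield path
            continue
        for nxt in graph.get(vertex, []):
            if nxt not in path:          # keep the path simple
                stack.append((nxt, path + [nxt]))
```

With vertices written as couples $(k_s, l_s)$, each path that is produced corresponds to one ordered list of hairpin loops.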

Proposition 1. Let $w$ be the primary structure of an RNA macromolecule. Algorithm 1 computes the loop energies $E_{loop}(w_{i,j})$, $0 < i < j \le |w|$, by using the 3-MA, and we have

$M[j, i] = E_{loop}(w_{i,j})$.

Proof. From Theorem 2 and Equations (6) and (7), Algorithm 1 computes the loop energies $E_{loop}(w_{i,j})$, $0 < i < j \le |w|$, by using the 3-MA, and for any couple $(i, j)$, $0 < i < j \le |w|$, if the bases $w^i$ and $w^j$ can be paired with each other then $M[j, i] = E_{loop}(w_{i,j})$, else $M[j, i] =$

Proposition 2. Algorithm 1 is of complexity $O(|w|^4)$ in computing time and $O(|w|^2)$ in memory space.

Proof. For each couple $(i, j)$, $0 < i < j \le |w|$, we make at most 1 iteration during step (ii.a), $\lfloor (j-i-4)/2 \rfloor$ iterations during step (ii.b), $2(j-i-6)$ iterations during step (ii.c) and $(j-i-7)^2$ iterations during step (ii.d). Computing the energetic contribution of a hairpin or a bulge loop is of complexity $O(|w|)$ in computing time, whereas computing that of a stack or an interior loop is of complexity $O(1)$ [16]. Therefore, the computing time complexity of steps (ii.a) and (ii.b) is $O(|w|)$, and that of steps (ii.c) and (ii.d) is $O(|w|^2)$.

Let us now consider step (ii.e):

(i) When $(j - i) >$ , we go into step (ii.e'): we make at most $(j-i-5)^2$ iterations during step (ii.e'.a), $(l_0-i-5)^2$ iterations during step (ii.e'.b) and $(j-m_0-5)^2$ iterations during step (ii.e'.c) ($i < l_0 < m_0 < j$). Therefore, step (ii.e') is of complexity $O(|w|^2)$ in computing time.

(ii) When $(j - i) <$ , we look for all the multiloops defined by bases of the substring $w_{i,j}$ and closed by the couple $(w^i, w^j)$. The computation of the energetic contribution of one of these multiloops is achieved within a time proportional to [16], << |w|. Then, the computing time of this step is bounded by the constant )*tsz, where ) is the maximum number of multiloops that can be defined by bases of a substring xi)tsz and closed by the couple (xi,xtsz) [7] (Theorem 2.5, p. 25). Hence, step (ii.e) is of complexity $O(|w|^2)$ in computing time.

We have $(|w|-4)^2/2$ couples $(i, j)$ to process, since we must have $(j - i) \ge 4$; therefore, Algorithm 1 is of complexity $O(|w|^4)$ in computing time.

Algorithm 1 uses a memory space equal to $|w|^2/2$; it is therefore of complexity $O(|w|^2)$ in memory space.
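The counting argument of this proof can be condensed into a single bound. The following is only a restatement under an assumption: $c$ is a hypothetical constant such that the work done for one couple $(i, j)$ in steps (ii.a) to (ii.e) is at most $c\,|w|^2$, and the number of couples is bounded loosely by $|w|^2/2$.

```latex
T(|w|) \;\le\; \sum_{0 < i < j \le |w|} c\,|w|^2
       \;\le\; \frac{|w|^2}{2} \cdot c\,|w|^2
       \;=\; O(|w|^4).
```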

We now present our dynamic programming algorithm that computes the energies $E_{min}(w_{i,j})$, $0 < i < j \le |w|$, under the HLDE. This algorithm uses the matrix $M$ whose second half has been filled by Algorithm 1 ($M[j, i] := E_{loop}(w_{i,j})$, for any $(i, j)$, $0 < i < j \le |w|$).

Algorithm 2

(i) (i.a) Construct a matrix $M$ of size $(|w| \times |w|)$;

(i.b) for any $(i, j)$, $0 \le j - i < 4$, do

$s := 0$;

for $k := i$ to $j$ do $s := s + e'(w^k)$ endfor;

$M[i, j] := s$

endfor;

(ii) for $j := 5$ to $|w|$ do

for $i := j - 4$ downto 1 do

(ii.a) $m_1 := e'(w^i) + M[i+1, j]$; $m_2 := +\infty$;

(ii.b) for any $k$, $i + 4 \le k \le j$, do

if $\{w^i, w^k\}$ is a WCP then $m_2 := \min\{m_2, M[k, i] + M[k+1, j]\}$ endif;

endfor;

(ii.c) $M[i, j] := \min\{m_1, m_2\}$;

endfor;

endfor;

(iii) $E_{min}(w) := M[1, |w|]$.
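The recurrence of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's implementation: the function name `fill_emin` is hypothetical, the two halves of $M$ are held in separate dicts (`e_loop` for the loop energies produced by Algorithm 1, `E` for the energies $E_{min}$), the accumulator $m_2$ starts at $+\infty$, and the empty suffix that arises when $k = j$ is assumed to contribute an energy of 0.

```python
import math

def fill_emin(n, e_unpaired, is_wcp, e_loop):
    """Return a dict E with E[(i, j)] = E_min(w_{i,j}) for 1 <= i <= j <= n.

    e_unpaired(k): energy e'(w^k) of an unpaired base (1-based position).
    is_wcp(i, k): True when bases i and k can form a Watson-Crick pair.
    e_loop[(i, k)]: loop energy E_loop(w_{i,k}), i.e. M[k, i] after Algorithm 1.
    """
    E = {}
    # (i.b) substrings too short to contain a pair: every base is unpaired.
    for i in range(1, n + 1):
        for j in range(i, min(i + 3, n) + 1):
            E[(i, j)] = sum(e_unpaired(k) for k in range(i, j + 1))
    # (ii) longer substrings, by increasing right end j and decreasing left end i.
    for j in range(5, n + 1):
        for i in range(j - 4, 0, -1):
            m1 = e_unpaired(i) + E[(i + 1, j)]           # (ii.a) base i unpaired
            m2 = math.inf
            for k in range(i + 4, j + 1):                # (ii.b) base i paired with k
                if is_wcp(i, k):
                    rest = E[(k + 1, j)] if k < j else 0.0   # empty suffix: assumed 0
                    m2 = min(m2, e_loop[(i, k)] + rest)
            E[(i, j)] = min(m1, m2)                      # (ii.c)
    return E
```

Step (iii) then corresponds to reading `E[(1, n)]`.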

Proposition 3. Let $w$ be the primary structure of an RNA macromolecule. For any couple $(i, j)$, $0 < i < j \le |w|$, Algorithm 2 computes the energy $E_{min}(w_{i,j})$ under the HLDE, by using the 3-MA, and we have $M[i, j] = E_{min}(w_{i,j})$.

Proof. According to Theorem 1 and Proposition 1, for any couple $(i, j)$, $0 < i < j \le |w|$, Algorithm 2 computes the energy $E_{min}(w_{i,j})$ under the HLDE, by using the 3-MA, and if the bases $w^i$ and $w^j$ can be paired with each other then $M[i, j] = E_{min}(w_{i,j})$, else $M[i, j] = \sum_{s=i}^{j} e'(w^s)$.

Proposition 4. Algorithm 2 is of complexity $O(|w|^3)$ in computing time and $O(|w|^2)$ in memory space.

Proof. For each couple $(i, j)$, $0 < i < j \le |w|$, we search for the position $k_0$, $i + 4 \le k_0 \le j$, such that $\{w^i, w^{k_0}\}$ is a WCP and $M[k_0, i] + M[k_0 + 1, j] = \min_{i+4 \le k \le j}\{M[k, i] + M[k + 1, j]\}$. This search is made linearly by incrementing the position $k$, $i + 4 \le k \le j$. Hence, for a couple $(i, j)$, this search is of complexity $O(|w|)$ in computing time. We have $(|w|-4)^2/2$ couples $(i, j)$ to process (since we must have $(j - i) \ge 4$). Therefore, Algorithm 2 is of complexity $O(|w|^3)$ in computing time.

Algorithm 2 uses a memory space equal to $|w|^2/2$; it is therefore of complexity $O(|w|^2)$ in memory space.

We now present Algorithm 4, our prediction algorithm under the HLDE. This algorithm traces back the matrix $M$ filled by Algorithms 1 and 2.

To construct the stable secondary structure $S_{min}(w_{i,j})$ associated with a substring $w_{i,j}$, Algorithm 4 operates in the following way:

(i) When the base $w^i$ is not paired with any other base of the substring $w_{i,j}$, the secondary structure $S_{min}(w_{i,j})$ is equal to the secondary structure $S_{min}(w_{i+1,j})$. We then search for the couples that constitute this structure by a recursive call to Algorithm 4.

(ii) When the base $w^i$ is paired with a base $w^{k_0}$, $0 < i < k_0 \le j$, then, from Lemma 1, the energy $E_{min}(w_{i,j})$ satisfies the equation:

$E_{min}(w_{i,j}) = E_{loop}(w_{i,k_0}) + E_{min}(w_{k_0+1,j})$.

The couple $(w^i, w^{k_0})$ then also belongs to the substructure associated with the substring $w_{i,k_0}$ and having a free energy equal to $E_{loop}(w_{i,k_0})$. The couples that constitute this substructure are found by a recursive call to Algorithm 3. This algorithm uses the 3-MA but in the opposite direction: we deduce the pairings associated with shorter substrings from the pairings associated with the ends of longer substrings. The structure $S_{min}(w_{k_0+1,j})$, associated with the substring $w_{k_0+1,j}$, is constructed by a recursive call to Algorithm 4.

We begin by describing Algorithm 3 and then describe Algorithm 4. The list $L$, used by Algorithm 3, represents the set of the couples that make up the stable secondary structure associated with the substring $w_{i,j}$. It is initialized to the empty list at the first call to this algorithm.

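Algorithm 3 (i, j)

(i) (i.a) $m_2 := \min_{\substack{1 \le l \le \lfloor (j-i-4)/2 \rfloor \\ \{w^{i+l},\, w^{j-l}\} \text{ is a WCP}}} \{E'(\sigma^{l}_{i,j}(w)) + M[j-l, i+l]\}$; {stack}

(i.b) give a value to $l_\sigma$ such that $E'(\sigma^{l_\sigma}_{i,j}(w)) + M[j-l_\sigma, i+l_\sigma] = m_2$;

(ii) (ii.a) $m_3 := \min_{\substack{2 \le l \le (j-i-5) \\ \{w^{i+l},\, w^{j-1}\} \text{ is a WCP}}} \{E'(\lambda^{l}_{i,j}(w)) + M[j-1, i+l]\}$; {left bulge}

(ii.b) give a value to $l_\lambda$ such that $E'(\lambda^{l_\lambda}_{i,j}(w)) + M[j-1, i+l_\lambda] = m_3$;

(iii) (iii.a) $m'_3 := \min_{\substack{2 \le l \le (j-i-5) \\ \{w^{i+1},\, w^{j-l}\} \text{ is a WCP}}} \{E'(\rho^{l}_{i,j}(w)) + M[j-l, i+1]\}$; {right bulge}

(iii.b) give a value to $l_\rho$ such that $E'(\rho^{l_\rho}_{i,j}(w)) + M[j-l_\rho, i+1] = m'_3$;

(iv) (iv.a) $m_4 := \min_{\substack{2 \le l \le (j-i-6) \\ 2 \le m \le (j-i-l-4) \\ \{w^{i+l},\, w^{j-m}\} \text{ is a WCP}}} \{E'(\zeta^{l,m}_{i,j}(w)) + M[j-m, i+l]\}$; {interior}

(iv.b) give values to $l_\zeta$ and $m_\zeta$ such that $E'(\zeta^{l_\zeta,m_\zeta}_{i,j}(w)) + M[j-m_\zeta, i+l_\zeta] = m_4$;

(v) {multiloop}

(v.a) if $(j - i) >$ , then {3-multiloop}

(v.a.a) $m_5 := \min_{\substack{(i+1) \le l \le (j-5) \\ (l+4) \le m \le (j-1) \\ \{w^{l},\, w^{m}\} \text{ is a WCP}}} \{M[m, l]\}$;

(v.a.b) give values to $l_\mu$ and $m_\mu$ such that $M[m_\mu, l_\mu] = m_5$;

(v.a.c) $m'_5 := \min_{\substack{(i+1) \le p \le (l_\mu-5) \\ (p+4) \le q \le (l_\mu-1) \\ \{w^{p},\, w^{q}\} \text{ is a WCP}}} \{M[q, p]\}$; {$\{(w^p, w^q)\}$ is on the left of $\{(w^l, w^m)\}$}

(v.a.d) give values to $p_\mu$ and $q_\mu$ such that $M[q_\mu, p_\mu] = m'_5$;

(v.a.e) $m''_5 := \min_{\substack{(m_\mu+1) \le r \le (j-5) \\ (r+4) \le s \le (j-1) \\ \{w^{r},\, w^{s}\} \text{ is a WCP}}} \{M[s, r]\}$; {$\{(w^r, w^s)\}$ is on the right of $\{(w^l, w^m)\}$}

(v.a.f) give values to $r_\mu$ and $s_\mu$ such that $M[s_\mu, r_\mu] = m''_5$;

(v.a.g) $m_5 := m_5 + m'_5 + m''_5 + \sum_{s \notin [p_\mu, q_\mu] \cup [l_\mu, m_\mu] \cup [r_\mu, s_\mu]} e'(w^s) + e(w^i, w^j)$

(v.b) else {multiloop closed by $(w^i, w^j)$}

(v.b.a) $m_5 := \min_{(k_1, l_1, \ldots, k_m, l_m)} \{E'(\mu^{k_1, l_1, \ldots, k_m, l_m}_{i,j}(w)) + \sum_{s=1}^{m} E'(\eta_{i+k_s,\, i+l_s}(w))\}$;

(v.b.b) give values to $k_1^0, l_1^0, k_2^0, l_2^0, \ldots, k_m^0, l_m^0$ such that $E'(\mu^{k_1^0, l_1^0, \ldots, k_m^0, l_m^0}_{i,j}(w)) + \sum_{s=1}^{m} E'(\eta_{i+k_s^0,\, i+l_s^0}(w)) = m_5$

endif;

(vi) $L := L \cup \{(w^i, w^j)\}$;

(vii) case $M[j, i]$ of

$m_2$: $L := L \cup \{(w^{i+1}, w^{j-1})\} \cup \{(w^{i+2}, w^{j-2})\} \cup \ldots \cup \{(w^{i+l_\sigma-1}, w^{j-l_\sigma+1})\}$; Algorithm 3 $(i + l_\sigma, j - l_\sigma)$;

$m_3$: Algorithm 3 $(i + l_\lambda, j - 1)$;

$m'_3$: Algorithm 3 $(i + 1, j - l_\rho)$;

$m_4$: Algorithm 3 $(i + l_\zeta, j - m_\zeta)$;

$m_5$: if $(j - i) >$ then Algorithm 3 $(l_\mu, m_\mu)$; Algorithm 3 $(p_\mu, q_\mu)$; Algorithm 3 $(r_\mu, s_\mu)$

else $L := L \cup \{(w^{i+k_1^0}, w^{i+l_1^0})\} \cup \{(w^{i+k_2^0}, w^{i+l_2^0})\} \cup \ldots \cup \{(w^{i+k_m^0}, w^{i+l_m^0})\}$ endif

endcase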

Proposition 5. Let $w_{i,j}$ be a substring of the primary structure $w$ such that the bases $w^i$ and $w^j$ can be paired with each other, and let $M$ be the matrix filled by Algorithm 1 ($M[j, i] := E_{loop}(w_{i,j})$ for any $(i, j)$, $0 < i < j \le |w|$). Algorithm 3 gives the substructure, associated with the substring $w_{i,j}$, containing the couple $(w^i, w^j)$ and having a free energy equal to $E_{loop}(w_{i,j})$.

Proof. From Proposition 1, Equations (6) and (7), and Theorem 2, Algorithm 3 gives the substructure, associated with the substring $w_{i,j}$, containing the couple $(w^i, w^j)$ and having a free energy equal to $E_{loop}(w_{i,j})$.

Proposition 6. Algorithm 3 is of complexity $O(|w|^2 \log_3(|w|))$ in computing time.

Proof. We demonstrate in the same way as in the proof of Proposition 2 that:

(i) step (i) of Algorithm 3 is of complexity $O(|w|)$ in computing time,

(ii) steps (ii), (iii), (iv) and (v) are of complexity $O(|w|^2)$ in computing time.

Therefore, each call to Algorithm 3 is of complexity $O(|w|^2)$ in computing time. On the other hand, each call to Algorithm 3 generates at most three other recursive calls to Algorithm 3. Therefore, for a primary structure $w$, we have at most $\log_3(|w|)$ recursive levels. Hence, Algorithm 3 is of complexity $O(|w|^2 \log_3(|w|))$ in computing time.

Algorithm 4 (i, j)

(i) if $(j - i) \ge 4$ then

(i.a) if $M[i, j] = e'(w^i) + M[i+1, j]$ then Algorithm 4 $(i + 1, j)$

else

(i.b) $k := i + 4$;

while ($M[i, j] \ne M[k, i] + M[k+1, j]$) do $k := k + 1$ endwhile;

(i.c) Algorithm 3 $(i, k)$; Algorithm 4 $(k + 1, j)$;

endif;

endif;

(ii) $S_{min}(w_{i,j}) := L$.
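The traceback of Algorithm 4 can be sketched in Python as below. Several points are assumptions made for the illustration: the function name `trace_emin` is hypothetical, `E` is a dict holding $E_{min}(w_{i,j})$ under the key `(i, j)`, `e_loop` is a dict holding $E_{loop}(w_{i,k})$ under the key `(i, k)`, the callback `trace_loop` stands in for Algorithm 3, the two tail-recursive calls are written as a loop, and the empty suffix that arises when $k = j$ is assumed to have energy 0.

```python
def trace_emin(E, e_unpaired, e_loop, trace_loop, i, j, pairs):
    """Append to pairs the couples of a minimum-energy structure of w_{i,j}.

    trace_loop(i, k, pairs) must add the couples of the loop closed by (i, k).
    """
    while j - i >= 4:
        if E[(i, j)] == e_unpaired(i) + E[(i + 1, j)]:
            i += 1                                   # (i.a) base i is unpaired
            continue
        for k in range(i + 4, j + 1):                # (i.b) look for the partner of i
            rest = E[(k + 1, j)] if k < j else 0.0   # empty suffix: assumed 0
            if (i, k) in e_loop and E[(i, j)] == e_loop[(i, k)] + rest:
                trace_loop(i, k, pairs)              # (i.c) Algorithm 3 on (i, k)
                i = k + 1                            #       then continue on w_{k+1,j}
                break
        else:
            raise ValueError("table E is inconsistent with e_loop")
    return pairs
```

The exact float comparisons mirror the pseudocode; they are reliable only if `E` was filled with the same expressions, evaluated in the same order, as those tested here.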

Proposition 7. Let $w_{i,j}$ be a substring of the primary structure $w$ and let $M$ be the matrix filled by Algorithms 1 and 2. Algorithm 4 gives the substructure $S_{min}(w_{i,j})$, under the HLDE, by using the 3-MA.

Proof. Propositions 1 and 3, and Theorem 1 guarantee that Algorithm 4 gives the substructure $S_{min}(w_{i,j})$, under the HLDE, by using the 3-MA.

Proposition 8. Algorithm 4 is of complexity $O(|w|^3 \log_3(|w|))$ in computing time.

Proof. Each call to Algorithm 4 generates another call to this algorithm. Therefore, for a primary structure $w$, we make at most $|w|$ calls. During a call concerning a couple $(i, j)$, $0 < i \le j - 4 < |w|$:

(i) We look for the position $k_0$, $i + 4 \le k_0 \le j$, such that $\{w^i, w^{k_0}\}$ is a WCP and $M[i, j] = M[k_0, i] + M[k_0 + 1, j] = \min_{i+4 \le k \le j}\{M[k, i] + M[k + 1, j]\}$. This search is of complexity $O(|w|)$ in computing time.

(ii) Then, we make a call to Algorithm 3. From Proposition 6, this algorithm is of complexity $O(|w|^2 \log_3(|w|))$ in computing time.

Therefore, each call to Algorithm 4 is of complexity $O(|w|^2 \log_3(|w|))$ in computing time. Hence, Algorithm 4 is of complexity $O(|w|^3 \log_3(|w|))$ in computing time.

It is easy to verify that the complexities of our algorithms remain the same for $m \ne 3$, i.e., $m = 2$ or $m = 4$, where $m$ is the number of branches considered in a multiloop.

5. Experimental results

We have executed the program corresponding to our algorithm on strings coding tRNA, 5S rRNA, 12S rRNA, 16S rRNA and 23S rRNA macromolecules. These data were provided by the European Molecular Biology Laboratory (Heidelberg, Germany).

The program is written in C and implemented on a SUN SPARCstation computer. Tab. 1 shows the processed data sizes and the corresponding results, where $\theta$ is the success rate, defined by:

$\theta = \frac{n_{pred}}{n_{total}} \times 100.$ (9)

Here $n_{pred}$ is the number of predicted structures that are close to the true ones and $n_{total}$ is the total number of structures. We have measured the closeness of a predicted structure to the true one by using the following rate:

$p = \frac{n_{pred}}{n_{total}} \times 100.$ (10)

Here $n_{pred}$ is the number of Watson-Crick Pairs (WCPs) in the true structure that were predicted and $n_{total}$ is the total number of WCPs in the true structure.
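The two rates (9) and (10) can be computed as in the following Python sketch. The function names are hypothetical, structures are assumed to be given as collections of couples of positions, and the `threshold` above which a predicted structure counts as close to the true one is an assumption: the text does not state the value that was used.

```python
def pair_recovery(true_pairs, predicted_pairs):
    """Rate (10): percentage of the true structure's pairs that were predicted."""
    true_set = set(true_pairs)
    if not true_set:
        raise ValueError("the true structure has no pairs")
    return 100.0 * len(true_set & set(predicted_pairs)) / len(true_set)

def success_rate(recoveries, threshold):
    """Rate (9): percentage of structures whose recovery reaches threshold."""
    if not recoveries:
        raise ValueError("no structures to evaluate")
    close = sum(1 for r in recoveries if r >= threshold)
    return 100.0 * close / len(recoveries)
```

For instance, a prediction that recovers 2 of the 4 pairs of the true structure has a rate (10) of 50%.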

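Table 1

Experimental results

| Macromolecule | Approximate string length | Number of strings processed | $m$ | Success rate $\theta$, % |
|---|---|---|---|---|
| tRNA | 80 | 200 | 3 | 95.06 |
| 5S rRNA | 120 | 150 | 2 | 94.68 |
| 12S rRNA | 950 | 100 | 3 | 95.24 |
| 16S rRNA | 1600 | 100 | 3 | 94.62 |
| 23S rRNA | 2900 | 50 | 4 | 94.50 |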

6. Conclusion

In this paper, we have tackled the problem of the prediction, by energy computation, of the stable secondary structures of RNA macromolecules. The algorithms that we have presented deal with this problem under the Hypothesis of Loops Dependent Energy (HLDE): we compute the free energies of the stable secondary structures by using a new approach called the m-Multiloop Approach (m-MA), where $m > 1$. This computation is made in a time proportional to $n^4$ and using a memory space proportional to $n^2$. The prediction of the stable secondary structures is made within a time proportional to $n^3 \log_3(n)$. Compared to other approaches, the m-MA improves the estimation of the minimal energetic contributions of the multiloops and, hence, the estimation of the free energies of the stable secondary structures. The other approaches either ignore the energetic contributions of the multiloops or compute these contributions under the HLE.

Our prediction algorithm, under the HLDE, has predicted secondary structures close to the true ones with a success rate of the order of 95%. This result is notable, given that the other algorithms either do not reach this rate or reach it with an exponential complexity [16, 31, 29, 32, 15].

References

[1] Aho A. V., Hopcroft J. E., Ullman J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Company, 1974. P. 60-65.

[2] Auron B. E., Rindowe W. P., Vary C. P. H. et al. Computer aided prediction of RNA secondary structures // Nucleic Acids Research. 1982. Vol. 10, No. 1. P. 403-419.

[3] Bellman R. E. Dynamic Programming. New Jersey: Princeton Univ. Press, 1957.

[4] Bellman R. E., Dreyfus S. E. Applied Dynamic Programming. New Jersey: Princeton Univ. Press, 1962.

[5] Berge C. Graphes et Hypergraphes. Paris: Dunod Editeur, 1970.

[6] Dumas J. P., Ninio J. Efficient algorithms for folding and comparing nucleic acid sequences // Nucleic Acids Research. 1982. Vol. 10, No. 1. P. 197-206.

[7] Elloumi M. Analysis of Strings Coding Biological Macromolecules. Science Doctorate Dissertation. The Univ. of Aix-Marseilles III, France, June 1994.

[8] Elloumi M. New Algorithms to Predict Secondary Structures of RNA Macromolecules // 11th Intern. Conf. on Industrial and Eng. Appl. of Artificial Intelligence and Expert Systems (Benicassim, Castellon, Spain), Springer Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin, June 1998. P. 864-875.

[9] Gesteland R. F., Atkins J. F. (Eds.) The RNA World. N.Y.: Cold Spring Harbor Lab. Press, 1993.

[10] Gralla J., Steitz J. A., Crothers D. M. Direct physical evidence for secondary structure in an isolated fragment of R17 bacteriophage mRNA. 1974. No. 248. P. 204-208.

[11] Gralla J., Crothers D. M. Free energy of imperfect nucleic acid helices II. Small hairpin loops // J. Mol. Biol. 1973. No. 73. P. 497-511.

[12] Gralla J., Crothers D. M. Free energy of imperfect nucleic acid helices III. Small internal loops resulting from mismatches // J. Mol. Biol. 1973. No. 78. P. 301-319.

[13] Larmore L. L., Schieber B. On-line dynamic programming with applications to the prediction of RNA secondary structure // J. of Algorithms. 1991. No. 12. P. 490-515.

[14] Martinez H. A New Algorithm for Calculating RNA Secondary Structure. Manuscript, 1980.

[15] Meidanis J., Setubal J. C. Introduction to Computational Molecular Biology. Boston: PWS Publ. Company, 1997.

[16] Ninio J. Prediction of pairing schemes in RNA molecules-loop contributions and energy of wobble and non-wobble pairs // Biochimie. 1979. No. 61. P. 1133-1150.

[17] Nussinov R., Jacobson A. Fast algorithm for predicting the secondary structure of single-stranded RNA // Proc. Nat. Acad. Sci. USA. Nov. 1980. Vol. 77, No. 11. P. 6309-6313.

[18] Oppenheimer N. J., James T. L. Nuclear magnetic resonance: Part A, Spectral techniques and dynamics // Methods in Enzymology. 1989. No. 176.

[19] Oppenheimer N. J., James T. L. Nuclear magnetic resonance: Part B, Structure and mechanism // Ibid. No. 177.

[20] Pipas J. M., McMahon J. E. Method for predicting RNA secondary structure // Proc. Nat. Acad. Sci. USA. June 1975. Vol. 72, No. 6. P. 2017-2021.

[21] Salser W. Globin messenger-RNA sequences: analysis of base-pairing and evolutionary implications // Cold Spring Harbor Symp. Quant. Biol. 1977. No. 42. P. 985-1002.

[22] Sankoff D., Morin A. M., Cedergren R. J. The evolution of 5S RNA secondary structure // J. Canadien de Biochimie. 1978. No. 56. P. 440-443.

[23] Sankoff D., Kruskal J. B., Mainville S., Cedergren R. Fast algorithms to determine RNA secondary structures containing multiple loops. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley Publ., Massachusetts. 1983. P. 93-120.

[24] Simons R. W., Grunberg-Manago M. (Eds.) RNA Structure and Function. N.Y.: Cold Spring Harbor Lab. Press, 1998.

[25] Studnicka G. M., Rahn G. M., Cummings I. W., Salser W. A. Computer method for predicting the secondary structure of single-stranded RNA // Nucleic Acids Research. 1978. Vol. 5, No. 9. P. 3356-3387.

[26] Tinoco I. Jr, Borer Ph. N., Dengler B. et al. Improved estimation of secondary structure in ribonucleic acids // Nature New Biology. 1973. No. 246. P. 40-41.

[27] Uhlenbeck O. C., Borer Ph. N., Dengler B., Tinoco I. Stability of RNA hairpin loops: A6-Cm-U6 // J. Mol. Biol. 1973. No. 73. P. 483-496.

[28] Waterman M. S. Secondary structure of single-stranded nucleic acids // Studies in Foundations and Combinatorics, Advances in Mathematics Supplementary Studies. 1978. No. 1. P. 167-212.

[29] Waterman M. S. Introduction to computational biology / J. Wiley (Eds.), 1995.

[30] Waterman M. S., Smith T. F. RNA secondary structure: A complete mathematical analysis // Mathematical Biosciences. 1978. No. 42. P. 257-266.

[31] Zuker M. The use of dynamic programming algorithms in RNA secondary structure prediction // Mathematical Methods for DNA Sequences, CRC Press Inc., Boca Raton, Florida, 1989. P. 168-170.

[32] Zuker M. Prediction of RNA Secondary Structure by Energy Minimization. Washington Univ. Press, St. Louis, Mo., 1996.

[33] Zuker M., Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information // Nucleic Acids Research. 1981. Vol. 9, No. 1. P. 133-148.

Received for publication March 21, 2001
