CC BY
Polarization Analysis of Gene Sequence Structures: Mapping of Extreme Local Polarization States

Dmitry A. Zimnyakov1,2, Marina V. Alonova1*, Anatoly V. Skripal3, Sergey S. Zaitsev4, and Valentina A. Feodorova4

1 Yury Gagarin State Technical University of Saratov, 77 Polytechnicheskaya str., Saratov 410054, Russia

2 Institute for Precision Mechanics and Control Problems of the Russian Academy of Sciences (IPTMU RAS), 24 Rabochaya str., Saratov 410028, Russia

3 Saratov State University, 83 Astrakhanskaya str., Saratov 410012, Russia

4 Federal Research Center for Virology and Microbiology (Saratov Branch), 53 Strelkovaya Divisiya str., building 6, Saratov 410028, Russia

* e-mail: alonova marina@mail.ru

Abstract. A method for the visualization and identification of nucleotide sequences is proposed. It is based on the synthesis of phase screens that display the structure of the analyzed sequences and on the reconstruction of binary maps of the extreme values of the local Stokes vector components in the diffraction zone. The diffraction zone is formed when the phase screen is read out by a coherent collimated beam with two orthogonally polarized (x-y) components. The different phase delays that the phase screen elements introduce into the x-y components of the readout beam produce a variety of local polarization states in the diffraction zone. The discrimination level for the local Stokes vector component chosen for the binary mapping is set near the extreme value of this component. Computer verification of the proposed method, using nucleotide sequences of various strains of the model African swine fever (ASF) virus as test objects, shows its high efficiency in detecting differences between two compared sequences that correspond to the same type of infectious agent. Analysis of the model data for the strains under study revealed a power-law dependence of the correlation coefficients of the synthesized binary distributions of extreme local polarization states on the detuning of the discrimination threshold from the extreme value of the Stokes vector component used for the mapping. © 2022 Journal of Biomedical Photonics & Engineering.

Keywords: gene sequences; nucleotide triplets; phase screen; diffraction; Stokes vector components; extreme states; binary distributions.

Paper #3526 received 12 Sep 2022; revised manuscript received 24 Oct 2022; accepted for publication 28 Oct 2022; published online 3 Dec 2022. doi: 10.18287/JBPE22.08.040302.

1 Introduction

Improving the tools and methods for rapid diagnosis of viral infectious disease (VID) agents requires the development and application of the latest approaches based on advances in molecular microbiology and bioinformatics. In recent years, the emergence of new infections and more frequent outbreaks of VIDs with high epidemic or even pandemic potential, associated with the identification of mutant virus variants, have called for a detailed study of the biodiversity of the relevant pathogens and a search for alternative technologies for rapid and highly accurate analysis of the polymorphism of both individual target genes and entire genomes of VID pathogens [1]. Studying the genetic polymorphism of biological objects involves two essential stages. At the first stage, the relevant DNA is sequenced using one of the two major well-established next-generation sequencing (NGS) techniques: short-read sequencing or long-read sequencing [2]. As a result, the primary structure of a linear biomolecule of a certain length is determined, strictly indicating the positions of the four basic nucleotides (adenine, cytosine, thymine, and guanine, denoted by the symbols A, C, T, and G, respectively) in the target DNA. At the second stage, the structure of the obtained symbol sequence is analyzed in order to identify the most significant features that characterize the DNA under study. This analysis is no less important than the initial DNA sequencing and plays a decisive role in various bioinformatics applications. Typically, the structure of the symbol series obtained by sequencing microbial DNA and RNA is analyzed using software methods based on the statistical analysis of nucleotide-corresponding symbols and their groups in the studied series [3]. At the same time, new instrumental and instrumental-software (hybrid) approaches based on the principles of coherent-optical and polarization analysis of quasi-random two-dimensional structures can also be applied to such problems.

In Refs. [4, 5], a method for imaging and analysis of gene sequences is considered that is based on the computer synthesis of GB (gene-based) speckle patterns, which carry information on unique combinations of three of the four basic nucleotides (adenine (A), guanine (G), cytosine (C), and thymine (T)). A fragment of a linear nucleotide sequence containing $N^2$ triplets (for example, ATG, GGA, TGT) is transformed into a matrix of $N \times N$ elements, each of which is assigned an integer value in the range from 0 to 63, depending on the combination of nucleotides in the triplet. In particular, the alphabetic code of the triplet $X_{i,j}$ ($0 < i < N$, $0 < j < N$) can be transformed into the corresponding value of the matrix element according to the following algorithm [4, 5]:

$$X_{i,j} = 16H_1 + 4H_2 + H_3 - 21,$$

where each of the three factors $H_1$, $H_2$, $H_3$ (the indices 1-3 correspond to the position of the nucleotide in the triplet) takes a value from 1 to 4 ($A \leftrightarrow 1$, $C \leftrightarrow 2$, $G \leftrightarrow 3$, $T \leftrightarrow 4$). Thus, the maximum value (63) occurs when the triplet "TTT" appears in the sequence, and the minimum (zero) value occurs when the triplet "AAA" appears.
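The encoding rule of Refs. [4, 5] described above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical, and the handling of a trailing incomplete triplet is an assumption rather than part of the cited method.

```python
# Nucleotide-to-factor mapping stated in the text: A -> 1, C -> 2, G -> 3, T -> 4.
NUCLEOTIDE_FACTOR = {"A": 1, "C": 2, "G": 3, "T": 4}


def triplet_to_value(triplet):
    """Return X = 16*H1 + 4*H2 + H3 - 21 for a three-letter triplet."""
    if len(triplet) != 3:
        raise ValueError("a triplet must contain exactly three nucleotides")
    h1, h2, h3 = (NUCLEOTIDE_FACTOR[n] for n in triplet.upper())
    return 16 * h1 + 4 * h2 + h3 - 21


def sequence_to_values(sequence):
    """Encode consecutive triplets of a nucleotide string.

    Assumption: trailing nucleotides that do not fill a triplet are ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    return [triplet_to_value(sequence[p:p + 3]) for p in range(0, usable, 3)]
```

For the example triplets mentioned in the text, `sequence_to_values("ATGGGATGT")` encodes ATG, GGA, and TGT in turn; the extreme cases "AAA" and "TTT" give the values 0 and 63.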

The matrix generated in this way can be used as a spatial phase modulator for a coherent light beam, with the depth of phase modulation of each element equal to $K_\varphi X_{i,j}$ ($K_\varphi$ is a scaling factor). As a result of diffraction of a coherent readout beam on the synthesized phase modulator, a GB speckle structure is formed in the far diffraction zone (for example, in the focal plane of a Fourier-transforming lens).

The stochasticity of the diffracted light field distribution is due to the close-to-random character of the $X_{i,j}$ distribution; accordingly, at sufficiently large values of the scaling factor ($K_\varphi > \pi$), the formed GB speckle structure tends to the so-called developed speckle field [6]. Developed speckle patterns are characterized by Gaussian probability density functions of the real and imaginary components of the complex amplitude with zero mean values. In turn, this leads to a negative exponential form of the intensity probability density function. At the same time, each nucleotide sequence transformed into a stochastic phase matrix produces a unique microscopic GB speckle structure, which can be used to identify the nucleotide sequence of a particular gene. In Refs. [4, 5], it was proposed to search for mutation-driven differences in gene sequences corresponding to biological objects of one pathogenic microorganism, Chlamydia trachomatis, based on the correlation analysis of GB speckle structures synthesized in this way.

The concept of visualizing and recognizing differences in nucleotide sequences (polymorphism) using synthesized GB speckles is quite original; however, it has certain drawbacks that limit its instrumental implementation and practical application. One of the most significant drawbacks is the rather weak decorrelation of the spatial intensity distribution of the synthesized GB speckle structure with respect to the initial distribution when a small number of triplets (1-2) in the analyzed target gene sequence are substituted. In addition, the initial consideration of this concept within the framework of scalar diffraction theory greatly limits its functionality, which can be significantly expanded by applying the principles of polarization optics.

The aim of this work was to further develop the methodology for coherent-optical imaging and analysis of the nucleotide sequences of infectious agents. The approach is based on local stochastic transformations of the polarization state of the coherent readout beam by a synthesized spatial modulator of the local phase shifts between the orthogonally polarized components of the beam, and on the identification, in the diffracted field, of the regions corresponding to extreme values of the local Stokes vector components. This approach is expected to be substantially more sensitive to local changes in the structure of the analyzed triplet sequences than the correlation analysis of "scalar" GB speckles. This expectation rests on the fact that extreme states of the normalized local Stokes vector components (e.g., minima of the first component $s^0$, or absolute values close to 1 of the fourth component $s^3$, which determines the contribution of circular polarization to the local polarization state) are achieved only at strictly defined phase and amplitude relations between the x-y components of the diffracted light field at a given point. Changing a triplet in the original sequence of the target gene by replacing at least one nucleotide should violate these relations and, consequently, shift the zones of extreme states in the observation plane; these shifts can be identified by correlation analysis of the binarized distributions of extreme states.

2 Polarization Visualization of the Structure of Nucleotide Sequences

The proposed principle of transforming a sequence of $N^2$ nucleotide triplets into a phase screen is fundamentally different from the principle considered in Ref. [4]. Each triplet is associated with a $2 \times 2$ submatrix whose elements are assigned to the base nucleotides as $a_{00} \leftrightarrow A$, $a_{01} \leftrightarrow G$, $a_{10} \leftrightarrow C$, $a_{11} \leftrightarrow T$. Note that this assignment is arbitrary and does not affect the subsequent analysis; the only condition is that the transformation rule is the same for all triplets in the nucleotide sequence of a particular gene. The value of each submatrix element is the number of corresponding nucleotides in the triplet; thus, the sum of all submatrix elements always equals 3. Accordingly, the minimum number of zero elements in the submatrix is 1, and the maximum number is 3. The phase screen matrix $D_{i,t}$ of size $2N \times 2N$ is synthesized by sequentially combining all $N^2$ submatrices.
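A minimal Python sketch of this screen synthesis is given below. The names are hypothetical, and the row-by-row placement of the submatrices is an assumption: the text only states that the submatrices are combined sequentially.

```python
# Assignment stated in the text: a00 <-> A, a01 <-> G, a10 <-> C, a11 <-> T.
CELL_OF_NUCLEOTIDE = {"A": (0, 0), "G": (0, 1), "C": (1, 0), "T": (1, 1)}


def triplet_to_submatrix(triplet):
    """Count the nucleotides of one triplet into a 2 x 2 submatrix."""
    if len(triplet) != 3:
        raise ValueError("a triplet must contain exactly three nucleotides")
    sub = [[0, 0], [0, 0]]
    for nucleotide in triplet.upper():
        row, col = CELL_OF_NUCLEOTIDE[nucleotide]
        sub[row][col] += 1
    return sub


def build_phase_screen(sequence, n):
    """Assemble the 2n x 2n matrix D from the first n*n triplets.

    Assumption: submatrices are laid out row by row (row-major order).
    """
    if len(sequence) < 3 * n * n:
        raise ValueError("sequence is too short for n*n triplets")
    screen = [[0] * (2 * n) for _ in range(2 * n)]
    for idx in range(n * n):
        sub = triplet_to_submatrix(sequence[3 * idx:3 * idx + 3])
        block_row, block_col = divmod(idx, n)
        for r in range(2):
            for c in range(2):
                screen[2 * block_row + r][2 * block_col + c] = sub[r][c]
    return screen
```

Each submatrix sums to 3 by construction, so a screen built from $n^2$ triplets always sums to $3n^2$, which gives a quick sanity check on the assembled matrix.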

Consider the diffraction of a linearly polarized coherent light beam on the synthesized phase screen; the polarization plane of the beam forms the angle $\pi/4$ with the sides of the screen. The diffracted light field (formed, for example, in the focal plane of a Fourier-transforming lens, the "frequency plane") is a superposition of linearly polarized components with orthogonal polarization directions, where the spatial distribution of each can be calculated using identical discrete Fourier transforms (see, for example, Ref. [7]):

$$E^{x,y}_{k,m} = \frac{1}{4N^2} \sum_{i=-N} \sum_{t=-N} \exp\left[-j \cdot \mathrm{scale} \cdot \left\{ (\pi/N)(k \cdot i + m \cdot t) - \varphi^{x,y}_{i,t} \right\}\right]. \tag{1}$$

In Eq. (1), the index "y" corresponds to the direction along the columns of the synthesized phase screen and "x" to the direction along the rows. The amplitudes of the x- and y-components of the readout beam incident on the screen are assumed to be 1. The scaling parameter $\mathrm{scale}$ is used to select the analyzed region of the diffracted field. In an instrumental implementation of the scheme under consideration, the dimensional scales in the input $(i, t)$ and output $(k, m)$ planes are related as $\Delta_{k,m} = F\lambda/\Delta_{i,t}$. Here, $F$ is the focal length of the Fourier-transforming lens, $\lambda$ is the wavelength of the laser radiation, and $\Delta_{k,m}$, $\Delta_{i,t}$ determine the dimensional separation between adjacent points in the corresponding systems of discrete coordinates.

The terms $\varphi^x_{i,t}$ are defined as $\varphi^x_{i,t} = K_\varphi D_{i,t}$, and the terms $\varphi^y_{i,t}$ are defined as $\varphi^y_{i,t} = \varphi^x_{i,t} + \Delta\Phi$ ($\Delta\Phi$ is the constant phase shift between the x- and y-components of the readout beam passing through the screen).
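The double sum of Eq. (1) can be evaluated directly, as in the hedged Python sketch below. It follows the bracket structure of Eq. (1) as written, with the scale parameter multiplying the whole exponent; the function name, the default parameter values, and the index range $-N \ldots N-1$ (chosen so that the sum has $4N^2$ terms) are assumptions.

```python
import cmath
import math


def diffracted_amplitudes(screen, k, m, k_phi=math.pi,
                          delta_phi=math.pi / 2, scale=1.0):
    """Return (E_x, E_y) at output indices (k, m) for a 2N x 2N screen.

    Assumption: i and t run from -N to N-1, so that the double sum
    contains exactly 4N^2 terms and matches the 1/(4N^2) normalization.
    """
    n = len(screen) // 2
    e_x = 0j
    e_y = 0j
    for i in range(-n, n):
        for t in range(-n, n):
            phi_x = k_phi * screen[i + n][t + n]   # phase of the x-component
            phi_y = phi_x + delta_phi              # y-component is shifted
            carrier = (math.pi / n) * (k * i + m * t)
            e_x += cmath.exp(-1j * scale * (carrier - phi_x))
            e_y += cmath.exp(-1j * scale * (carrier - phi_y))
    norm = 4 * n * n
    return e_x / norm, e_y / norm
```

For a screen in which the values 0, 1, 2, and 3 occur equally often, the on-axis x-amplitude returned for `k_phi = math.pi` and `scale = 1.0` vanishes, which is a convenient check of the implementation. The direct sum costs $O(N^2)$ operations per output point, so it is only practical for small screens.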

For each pair of calculated complex amplitudes of the diffracted components $E^x_{k,m}$, $E^y_{k,m}$ ($0 < k < 2N$, $0 < m < 2N$), we calculate the normalized local values of the Stokes vector components as:

$$\begin{aligned}
s^0_{k,m} &= \left( |E^x_{k,m}|^2 + |E^y_{k,m}|^2 \right)/2; \\
s^1_{k,m} &= \left( |E^x_{k,m}|^2 - |E^y_{k,m}|^2 \right)/2s^0_{k,m}; \\
s^2_{k,m} &= 2|E^x_{k,m}||E^y_{k,m}| \cos(\delta_{k,m})/2s^0_{k,m}; \\
s^3_{k,m} &= 2|E^x_{k,m}||E^y_{k,m}| \sin(\delta_{k,m})/2s^0_{k,m},
\end{aligned} \tag{2}$$

where $\delta_{k,m}$ is the phase difference of the x- and y-components of the diffracted field at the corresponding point of the diffraction plane.
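Eq. (2) translates into a few lines of Python, sketched below under two labeled assumptions: the phase difference is taken as $\delta = \arg E^y - \arg E^x$ (the text does not fix the sign convention), and a point with exactly zero intensity returns zeros for all components. The function name is hypothetical.

```python
import cmath
import math


def stokes_components(e_x, e_y):
    """Return (s0, s1, s2, s3) for complex amplitudes e_x and e_y.

    Assumption: delta = arg(e_y) - arg(e_x).
    """
    a_x, a_y = abs(e_x), abs(e_y)
    s0 = (a_x ** 2 + a_y ** 2) / 2
    if s0 == 0:
        # Assumption: polarization is undefined at an exact field zero.
        return 0.0, 0.0, 0.0, 0.0
    delta = cmath.phase(e_y) - cmath.phase(e_x)
    s1 = (a_x ** 2 - a_y ** 2) / (2 * s0)
    s2 = 2 * a_x * a_y * math.cos(delta) / (2 * s0)
    s3 = 2 * a_x * a_y * math.sin(delta) / (2 * s0)
    return s0, s1, s2, s3
```

With this normalization, $(s^1)^2 + (s^2)^2 + (s^3)^2 = 1$ at every point of non-zero intensity, so $|s^3| = 1$ is reached only when the two amplitudes are equal and $\delta = \pm\pi/2$.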

3 Model Polarization Images of the Nucleotide Sequences of the p72 Target Gene (by the Example of the African Swine Fever Pathogen; Strains "HuB20", "Zaire", "Ulyanovsk 19/WB-5699")

As an example, let us consider the results of a computer experiment on polarization imaging of the nucleotide sequences of the target gene p72 for three different strains of the model African swine fever (ASF) virus: "HuB20" (NCBI GenBank accession number MW521382.1), "Zaire" (NCBI GenBank accession number MW296952.1), and "Ulyanovsk 19/WB-5699" (NCBI GenBank accession number MW306192.1) [8]. ASF is known to be a contagious viral disease of animals (pigs and boars) with mortality of up to 100% [9]. The initial fragment of the nucleotide sequence from the start codon for the "HuB20" strain is ATGGCATCAGGAGGAGGAGCTTTT..., and the corresponding fragment of the first line of the synthesized phase matrix is

$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 0 & 1 & 2 & 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 3 \end{pmatrix}. \tag{3}$$

The differences in the structures of nucleotide triplet (codon) sequences within the analyzed fragments of 625 triplets for the strains under consideration are presented in Table 1. Note that the sequence of the "Zaire" strain differs significantly from those of "HuB20" and "Ulyanovsk" (30 and 29 triplets, respectively), while the difference between the "HuB20" and "Ulyanovsk" sequences is minimal (1 triplet).

For simulation purposes, the value of the scaling factor $K_\varphi$ was taken as $\pi$. It can be shown that, when the values 0, 1, 2, and 3 are equally probable for a randomly chosen matrix element $D_{i,t}$, the value $K_\varphi = \pi$ gives a zero value of the element $s^0_{0,0}$ (i.e., suppression of the non-diffracted component of the readout beam passing through the modulator in the near-axis region of the observation plane). Indeed, for equally probable values of the matrix elements $D_{i,t}$, the real part of the on-axis amplitude of the x-component of the diffracted field can be represented as follows:

$$\begin{aligned}
\mathrm{Re}(E^x_{0,0}) &\propto (1 - 4p) + p + p\cos(K_\varphi) + p\cos(2K_\varphi) + p\cos(3K_\varphi) \\
&= (1 - 3p) + p\left(\cos(K_\varphi) + \cos(2K_\varphi) + \cos(3K_\varphi)\right).
\end{aligned} \tag{4}$$

Here, $p = 0.25$ is the probability of finding one of the elements 0, 1, 2, and 3 at random choice. Accordingly, the imaginary part of $E^x_{0,0}$ can be written as:

$$\mathrm{Im}(E^x_{0,0}) \propto p\left(\sin(K_\varphi) + \sin(2K_\varphi) + \sin(3K_\varphi)\right), \tag{5}$$

and the on-axis intensity of the x-component is:

$$\begin{aligned}
I^x_{0,0} &\propto [\mathrm{Re}(E^x_{0,0})]^2 + [\mathrm{Im}(E^x_{0,0})]^2 \\
&\propto 1 - 6p + 12p^2 + 2p(1 - p)\cos K_\varphi + 2p(1 - 2p)\cos 2K_\varphi + 2p(1 - 3p)\cos 3K_\varphi.
\end{aligned} \tag{6}$$

It can be seen that substituting $K_\varphi = \pi$ for $p = 0.25$ gives a zero value of $I^x_{0,0}$. A similar consideration can be carried out for the y-component under the condition $\Delta\Phi = \mathrm{const}$.
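The on-axis result can be checked numerically. The Python sketch below (function names are hypothetical) evaluates the intensity both directly from the real and imaginary parts of Eqs. (4) and (5) and from the closed form of Eq. (6), so that the two routes can be compared for arbitrary $K_\varphi$ and $p$.

```python
import cmath
import math


def on_axis_intensity_direct(k_phi, p):
    """|(1 - 3p) + p * sum_q exp(j*q*K_phi)|^2 for q = 1, 2, 3 (Eqs. (4)-(5))."""
    amplitude = (1 - 3 * p) + p * sum(
        cmath.exp(1j * q * k_phi) for q in (1, 2, 3)
    )
    return abs(amplitude) ** 2


def on_axis_intensity_closed(k_phi, p):
    """Closed-form expression of Eq. (6)."""
    return (1 - 6 * p + 12 * p * p
            + 2 * p * (1 - p) * math.cos(k_phi)
            + 2 * p * (1 - 2 * p) * math.cos(2 * k_phi)
            + 2 * p * (1 - 3 * p) * math.cos(3 * k_phi))
```

For $K_\varphi = \pi$ and $p = 0.25$, the four terms of Eq. (6) are $0.25 - 0.375 + 0.25 - 0.125 = 0$, in agreement with the statement above.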

Significantly non-zero values of the intensity of the non-diffracted component indicate, in particular, the dominance of some of the four nucleotides in the original sequence. The value of the phase shift $\Delta\Phi$ was assumed to be $\pi/2$. With this choice, the local polarization states of the boundary field (directly behind the phase screen) correspond to either right circular polarization (if $D_{i,t} = 0$ or 2) or left circular polarization (if $D_{i,t} = 1$ or 3). This causes a broader variety of local polarization states in the $(k, m)$ plane and, accordingly, a greater probability of the appearance of states with $s^3_{k,m}$ close to the extreme values of 1 or -1, compared, for example, with the case of $\Delta\Phi = \pi$.

Based on the size of the generated phase matrices $D_{i,t}$ ($50 \times 50$), the $N$ value in the simulation was taken to be 100, and the indices $k$ and $m$ varied from 0 to 99; the value of $\mathrm{scale}$ should not exceed 0.5 to avoid spatial frequency overlap (aliasing) [10].

Fig. 1 shows the synthesized spatial distributions of $\lg(s^0_{k,m})$ as grayscale maps in the near-axis region of the frequency plane ($\mathrm{scale} = 0.1$) for the three strains under consideration (the logarithmic representation was chosen because of the very wide range of intensity variation within the near-axis region). Note the pronounced speckle modulation of the diffracted field. For the phase modulation algorithm used, the phase differences of the boundary field (immediately behind the phase screen) for different elements of the phase matrix take only three possible values ($-\pi$, 0, $\pi$); nevertheless, the stochasticity of the spatial distributions of the elements of the synthesized phase screens leads to speckle formation in the diffraction zone. Substantially non-zero values of the diffracted field intensity at the beam axis (the central region of the synthesized distributions) indicate that the nucleotide distributions in the analyzed sequences are substantially non-uniform.

Table 1 Differences in triplet sequences for the analyzed strains.

| # in the sequence (from the start codon) | Triplet in HuB20 | Triplet in Ulyanovsk | Triplet in Zaire |
| --- | --- | --- | --- |
| 26 | CTG | CTG | CTT |
| 30 | ATC | ATC | ATT |
| 44 | GAT | GAT | GAC |
| 107 | CAT | CAT | CAC |
| 115 | TCA | TCA | TCG |
| 126 | TCC | TCC | GCC |
| 132 | GGG | GGG | GGT |
| 149 | CCC | CCC | CCT |
| 160 | CCT | CCT | CCC |
| 164 | CCC | CCC | CCT |
| 200 | CTA | CTA | CTG |
| 228 | GTT | GTT | GTC |
| 268 | TGT | TGT | TGC |
| 275 | TTT | TTT | TTC |
| 278 | CAG | CAG | CAA |
| 298 | ATC | ATC | ATT |
| 333 | TTG | TTG | CTG |
| 338 | AAT | AAT | AAC |
| 347 | TCA | TCA | TCG |
| 376 | GTA | GTA | ATA |
| 379 | TCA | TCA | TCG |
| 383 | GCT | GCT | CCT |
| 432 | CGC | CGC | CGA |
| 434 | TCG | TCG | TCC |
| 483 | CAA | CAA | CAG |
| 490 | TTT | TTC | TTC |
| 500 | CCC | CCC | CCT |
| 523 | TCT | TCT | TCG |
| 552 | AAA | AAA | AAG |
| 570 | GCG | GCG | GCA |

Fig. 1 Model distributions of local values of the logarithm of the first component of the Stokes vector in the frequency plane (scale = 0.1) for target genes of the ASF virus strains: (a) "HuB20"; (b) "Ulyanovsk"; and (c) "Zaire".

Visually, the distributions of $\lg(s^0_{k,m})$ appear to differ insignificantly from each other, but the spatial distributions of special points (in particular, those for which the $\lg(s^0_{k,m})$ values are less than some threshold value, i.e., conditional "zero-amplitude" points and lines) are expected to show significant differences from strain to strain.

Fig. 2 shows similar distributions of the component $s^3_{k,m}$, which characterizes the contribution of the left ($s^3_{k,m} = -1$) or right ($s^3_{k,m} = 1$) circular component to the local polarization state at the point $(k, m)$. The antisymmetric character of these distributions ($s^3_{k-50,\,m-50} = -s^3_{50-k,\,50-m}$), caused by the applied algorithm of phase modulation of the x- and y-components of the readout beam passing through the synthesized screen, is remarkable. The uniqueness of the $s^3_{k,m}$ distributions for each analyzed strain should also be noted. The next section is devoted to the analysis of this uniqueness.

4 Mutational Changes in Nucleotide Sequences and Variability of the Binary Maps of Extreme Polarization States

The obtained model spatial distributions of the local values of the Stokes vector component $\{s^3_{k,m}\}$ can be transformed into binary form by introducing a certain threshold value $s^3_{th}$ and subsequent binarization according to the rule:

$$\begin{cases}
s^3_{k,m} \ge s^3_{th} \Rightarrow \tilde{s}^3_{k,m} = 1\,(0); \\
s^3_{k,m} < s^3_{th} \Rightarrow \tilde{s}^3_{k,m} = 0\,(1).
\end{cases} \tag{7}$$

Fig. 2 Model distributions of the local values of the Stokes vector component $s^3_{k,m}$ in the frequency plane (scale = 0.1) for the target genes of the ASF virus strains: (a) "HuB20"; (b) "Ulyanovsk"; and (c) "Zaire".

Fig. 3 "Panoramic" binary maps of the polarization extreme states (left elliptical polarization with $s^3_{k,m} < -0.99$, scale = 0.5) for the target genes of the ASF virus strains: (a) "HuB20"; (b) "Ulyanovsk"; and (c) "Zaire".

Binary $\tilde{s}^3_{k,m}$ distributions for the target genes of the strains "HuB20", "Zaire", and "Ulyanovsk" at $s^3_{th} = -0.99$ (selection of the near-circular left local polarization states) are shown in Fig. 3. The maximum allowable value of $\mathrm{scale}$, equal to 0.5, was used in the modeling. For quantitative comparison of the extreme state maps synthesized in this way, a correlation coefficient is defined as:

$$R^3_{1,2} = \frac{\sum_{k,m} \tilde{s}^{3,(1)}_{k,m}\, \tilde{s}^{3,(2)}_{k,m}}{\sum_{k,m} \left( \tilde{s}^{3,(1)}_{k,m} \right)^2}, \tag{8}$$

where the indices "1" and "2" correspond to the nucleotide sequences of different strains. Accordingly, for the "HuB20" and "Ulyanovsk" strains (Fig. 3 (a, b)), $R^3_{H,U}$ is 0.472, and for the "HuB20" and "Zaire" strains, $R^3_{H,Z} \approx 0.099$.
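The binarization rule of Eq. (7) and the comparison of two binary maps can be sketched in Python as follows. The names are hypothetical. The normalization by the number of extreme points in the reference map is an assumption; it is chosen because it gives 1 for identical maps and tends to the density of extreme states for statistically independent maps.

```python
def binarize(s3_map, threshold, select_below=True):
    """Binarize a 2-D list of s3 values.

    select_below=True marks points with s3 < threshold (the parenthesized
    variant of Eq. (7)); select_below=False marks points with
    s3 >= threshold.
    """
    return [
        [1 if ((v < threshold) if select_below else (v >= threshold)) else 0
         for v in row]
        for row in s3_map
    ]


def correlation(reference, other):
    """Overlap of two binary maps divided by the reference map energy.

    Assumption: the normalization uses the reference map only.
    """
    overlap = sum(a * b
                  for row_a, row_b in zip(reference, other)
                  for a, b in zip(row_a, row_b))
    energy = sum(a * a for row in reference for a in row)
    if energy == 0:
        raise ValueError("reference map contains no extreme states")
    return overlap / energy
```

For example, thresholding at -0.99 with `select_below=True` keeps only the near-circular left states used for Fig. 3, and `correlation(map_a, map_a)` returns 1.0 for any non-empty map.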

It is of interest to analyze the sensitivity of the correlation coefficient of the binary distributions of extreme states $\{\tilde{s}^3_{k,m}\}$ introduced in this way to changes in the structure of nucleotide sequences caused by random substitutions of one nucleotide for another in some of the triplets. This analysis was performed for the original nucleotide sequence of the "HuB20" strain by randomly changing one, two, or more triplets, provided that the sum of the elements of the substituting submatrix equals 3. The average values of the correlation coefficient were calculated as

$$\left\langle R^3_H \right\rangle = \left\langle \frac{\sum_{k,m} \tilde{s}^{3,H}_{k,m}\, \tilde{s}^{3,r}_{k,m}}{\sum_{k,m} \left( \tilde{s}^{3,H}_{k,m} \right)^2} \right\rangle \tag{9}$$

depending on the number of substitutions $N_r$. Here, the superscripts "H" and "r" denote the maps for the original "HuB20" sequence and for a sequence with random substitutions, respectively. Averaging was carried out over groups of 10 binary maps of extreme states $\{\tilde{s}^3_{k,m}\}$ synthesized for sequences with a given number $N_r$ of random substitutions at the threshold values $s^3_{th} = -0.95$ and $s^3_{th} = -0.99$.

Fig. 4 Average correlation coefficients of the binary distributions $\tilde{s}^3_{k,m}$ (Eq. (9)) with respect to the "HuB20" strain versus the number $N_r$ of random substitutions of single nucleotides in triplets (1, 2), compared with the values of the correlation coefficients for the "HuB20" - "Ulyanovsk" (3, 5) and "HuB20" - "Zaire" (4, 6) pairs. The discrimination level is -0.95 for (1, 3, 4) and -0.99 for (2, 5, 6).

The obtained values $\langle R^3_H \rangle = f(N_r)$ are shown in Fig. 4; the selectively shown confidence intervals for $N_r = 1$ and 2 correspond to a confidence level of 0.9. Note the sharp drop in $\langle R^3_H \rangle$ when single elements of the original sequence are replaced and the subsequent slow decrease as $N_r$ increases further. Assuming a uniform distribution of the displayed polarization extreme states at a given threshold $s^3_{th}$ with the density $\rho_{th} = N_{th}/4N^2$ (i.e., the density equals the ratio of the number of extreme states to the number of pixels in the frequency plane), it can be shown that if $N_r \to N^2$ (random nucleotide substitution in all triplets), then $\langle R^3_H \rangle \to \rho_{th}$. Fig. 4 also shows $R^3_{H,U}$ (sequences for the "HuB20" and "Ulyanovsk" strains) and $R^3_{H,Z}$ (sequences for the "HuB20" and "Zaire" strains). Note the absence of overlapping confidence intervals in the figure for small numbers of triplets with nucleotide substitutions; this indicates the high sensitivity of the correlation coefficient of the binary distributions of extreme local polarization states to minor changes in the structure of the nucleotide sequences of the analyzed target genes.
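The limiting value $\rho_{th}$ can be motivated by a short estimate. The sketch below assumes that the two binary maps are statistically independent, that each has $N_{th}$ extreme points spread uniformly over the $4N^2$ pixels, and that the correlation coefficient is the number of coincident extreme points divided by the number of extreme points in the reference map (this normalization is an assumption):

$$\sum_{k,m} \tilde{s}^{3,H}_{k,m}\, \tilde{s}^{3,r}_{k,m} \approx 4N^2 \rho_{th}^2, \qquad \sum_{k,m} \left( \tilde{s}^{3,H}_{k,m} \right)^2 = N_{th} = 4N^2 \rho_{th}, \qquad \left\langle R^3_H \right\rangle \approx \frac{4N^2 \rho_{th}^2}{4N^2 \rho_{th}} = \rho_{th}.$$

The first relation uses the fact that a given pixel is an extreme point in both independent maps with probability $\rho_{th}^2$; the second uses $\tilde{s}^2 = \tilde{s}$ for binary values.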

Tightening the discrimination threshold (moving it toward the extreme value) leads to a decrease in the correlation coefficient. Fig. 5 shows the dependences of the correlation coefficients for the pairs of nucleotide sequences of the ASF virus strains "HuB20" - "Ulyanovsk" ($N_r = 1$) and "HuB20" - "Zaire" ($N_r = 30$) (Table 1) on the detuning of the discrimination level from the minimum possible value ($s^3_{th} = -1$).

Fig. 5 Correlation coefficients for the pairs of nucleotide sequences of the ASF virus strains "HuB20" - "Ulyanovsk" (1) and "HuB20" - "Zaire" (2) depending on the detuning $1 + s^3_{th}$ of the discrimination threshold $s^3_{th}$ from the minimum possible value (-1). The solid and dashed lines are guides for the eye and show trends in the behavior of the dependences that are close to a power law.

Note that over sufficiently wide intervals of the detuning parameter $1 + s^3_{th}$ (on the order of 1.5-2 decades), the dependences presented in Fig. 5 admit, with acceptable accuracy, power-law approximations of the form

$$R \propto \left( 1 + s^3_{th} \right)^{\alpha}, \tag{10}$$

similar to the behavior of complex systems near critical points (for example, in the region of a phase transition).

At the same time, in contrast to classical critical phenomena, the established close-to-power-law decrease in the correlation coefficients as $s^3_{th}$ approaches its minimum possible value is not universal in the broad sense: the $\alpha$ values for the pairs of nucleotide sequences of the ASF virus strains "HuB20" - "Ulyanovsk" ($\alpha \approx 0.39$) and "HuB20" - "Zaire" ($\alpha \approx 0.75$) differ significantly in the case under consideration, i.e., they depend on the number of triplets substituted. The issue of such "quasi-critical" behavior of the synthesized binary distributions of polarization extreme states, depending on the structure of the synthesized phase screens, is beyond the scope of this work and is a subject of further research.
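An exponent $\alpha$ of the kind quoted above can be estimated by a straight-line fit in log-log coordinates. The Python sketch below is one possible way to do this, not necessarily the procedure used for Fig. 5; the function name is hypothetical, and any input data used with it here are synthetic.

```python
import math


def fit_power_law(detunings, correlations):
    """Least-squares fit of log R = log C + alpha * log(1 + s_th).

    detunings    -- values of 1 + s_th (all positive)
    correlations -- corresponding correlation coefficients (all positive)
    Returns (alpha, C).
    """
    if len(detunings) != len(correlations) or len(detunings) < 2:
        raise ValueError("need at least two matching data points")
    xs = [math.log(d) for d in detunings]
    ys = [math.log(r) for r in correlations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(mean_y - alpha * mean_x)
    return alpha, prefactor
```

Applied to synthetic data generated as $R = 2\,(1 + s^3_{th})^{0.5}$, the fit recovers $\alpha = 0.5$ and the prefactor 2 to within rounding error. For real data, the fit should be restricted to the interval of detunings over which the dependence is close to a power law.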

5 Conclusion

Thus, the considered method of polarization imaging of the structure of nucleotide sequences in the target genes of the model ASF virus strains, which uses binary mapping of the extreme local polarization states of the diffracted readout beam, demonstrates high sensitivity to small changes in the structure of the sequences. At an appropriately tight discrimination threshold, the synthesized map of extreme states is a unique identifying object that corresponds solely to the analyzed nucleotide sequence.

Note that the proposed method, in contrast to the correlation analysis of synthesized GB speckles [4, 5], allows a rather simple instrumental implementation for research and demonstration purposes. The corresponding polarimetric system can consist of a single-mode He-Ne laser with a beam expander-collimator and a spatial filter used to improve the quality of the linearly polarized beam, a computer-controlled multielement liquid crystal spatial light modulator (for example, manufactured by Holoeye (Germany), or a similar unit), a Fourier-transform lens, a set of polarization filters, and a CMOS camera. In particular, when using a He-Ne laser, the transmissive spatial light modulator Holoeye LC 2012 (pixel size 36 μm), and a Fourier-transform lens with a focal length of about 100 mm, the characteristic dimensions of the diffraction pattern in the focal plane are covered by the photosensitive zones of modern CMOS sensors. The number of CMOS sensor pixels within the area of interest will be of the order of $3 \times 10^4$ for a CMOS pixel size of 10 μm; this is more than enough to implement the discussed technique. Given the expected dimensions of the used zone of the spatial light modulator (no more than 3 × 3 mm) and the requirement for readout beam uniformity within this zone, a beam expansion factor of about 20-25 can be assumed to be sufficient. Note that the considered model of the formation of diffraction patterns in the focal plane of the Fourier-transforming lens (see Eq. (1)) assumes a uniform distribution of the readout field amplitude over the surface of the synthesized phase screen. This uniformity can be ensured at the stage of formation of the readout beam by using a beam-expanding telescopic system with a pinhole diaphragm as the beam cleaner. With a sufficiently large magnification factor of the telescopic system (as noted above, about 20-25 or more), a close-to-uniform amplitude distribution will be formed in the paraxial region of the phase screen used for phase modulation. On the other hand, the spatial inhomogeneity of the readout beam can be taken into account within the framework of the model (1) by additionally introducing a matrix of weighting amplitude coefficients. This matrix can be obtained by preliminary calibration of the considered polarimetric system. Note that the instrumental implementation is beyond the scope of this study and will be carried out in the future.
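The quoted estimate of about $3 \times 10^4$ sensor pixels can be reproduced from the relation $\Delta_{k,m} = F\lambda/\Delta_{i,t}$ given in Section 2. The Python sketch below takes the wavelength of the red He-Ne line (632.8 nm), which the text does not state explicitly, and interprets $F\lambda/\Delta_{i,t}$ as the side of the square zone of interest in the focal plane; both points are assumptions, and all names are hypothetical.

```python
WAVELENGTH_M = 632.8e-9   # red He-Ne line; assumed, not stated in the text
FOCAL_LENGTH_M = 100e-3   # focal length of the Fourier-transform lens
SLM_PIXEL_M = 36e-6       # pixel size of the Holoeye LC 2012 modulator
CMOS_PIXEL_M = 10e-6      # pixel size of the CMOS sensor


def diffraction_zone_size(focal_length, wavelength, input_pitch):
    """Side of the zone of interest in the focal plane, F * lambda / pitch."""
    return focal_length * wavelength / input_pitch


def sensor_pixels_in_zone(zone_size, sensor_pitch):
    """Number of sensor pixels covering a square zone of the given side."""
    return (zone_size / sensor_pitch) ** 2


zone = diffraction_zone_size(FOCAL_LENGTH_M, WAVELENGTH_M, SLM_PIXEL_M)
pixels = sensor_pixels_in_zone(zone, CMOS_PIXEL_M)
print(f"zone side: {zone * 1e3:.2f} mm, sensor pixels: {pixels:.0f}")
```

Under these assumptions the zone side is about 1.76 mm, which corresponds to roughly $3.1 \times 10^4$ pixels of 10 μm size, consistent with the order of magnitude given above.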

It should also be noted that the considered approach, which combines four-gradation matrix coding of the initial nucleotide sequences, superposition of two Fourier transforms with a certain phase shift, and nonlinear transformation of the obtained superposition with subsequent binarization and synthesis of the maps of extreme states, can serve as a basis for computer algorithms for identifying objects with a complex quasi-random structure.

The proposed method is sufficiently flexible and allows for various modifications of the algorithms of boundary field phase modulation, which extend the functional capabilities of mapping and identifying the structure of nucleotide sequences in the target genes of infectious agents.

Disclosures

The authors declare no conflict of interest.

Acknowledgements

This work was supported by the Russian Science Foundation (Project № 22-21-00194).

References

1. S. Kumar, G. S. Kumar, S. S. Maitra, P. Maly, S. Bharadwaj, P. Sharma, and V. D. Dwivedi, "Viral informatics: bioinformatics-based solution for managing viral infections," Briefings in Bioinformatics 23(5), bbac326 (2022).

2. S. Goodwin, J. D. McPherson, and W. R. McCombie, "Coming of age: ten years of next-generation sequencing technologies," Nature Reviews Genetics 17, 333-351 (2016).

3. N. N. Pozdnichenko, M. S. Stupin, A. S. Gumenyuk, M. S. Doroshenko, and O. P. Shafeeva, "Algorithms and software for obtaining dissimilar order and high order of symbolic sequences," Journal of Physics: Conference Series 1050(1), 012063 (2018).

4. S. S. Ulyanov, O. V. Ulianova, S. S. Zaytsev, Y. V. Saltykov, and V. A. Feodorova, "Statistics on gene-based laser speckles with a small number of scatterers: implications for the detection of polymorphism in the Chlamydia trachomatis ompl gene," Laser Physics Letters 15(4), 045601 (2017).

5. S. S. Ulyanov, S. S. Zaytsev, O. V. Ulianova, Y. V. Saltykov, and V. A. Feodorova, "Using of methods of speckle optics for Chlamydia trachomatis typing," Proceedings of SPIE 10336, 103360D (2017).

6. J. W. Goodman, Statistical Optics, 2nd ed., John Wiley & Sons, Hoboken, New Jersey, USA (2015). ISBN: 978-1119-00945-0.

7. J. W. Goodman, Introduction to Fourier Optics, 3rd ed., Publishers, Englewood, Colorado, USA (2005). ISBN: 9780974707723.

8. A. Mazloum, A. van Schalkwyk, A. Shotin, A. Igolkin, I. Shevchenko, K. Gruzdev, and N. Vlasova, "Comparative Analysis of Full Genome Sequences of African Swine Fever Virus Isolates Taken from Wild Boars in Russia in 2019," Pathogens 10(5), 521 (2021).

9. D. Beltran-Alcrudo, M. A. C. C. Gallardo, S. A. Kramer, M. L. Penrith, A. Kamata, and L. Wiersma, African swine fever: detection and diagnosis, Food and Agriculture Organization of the United Nations (FAO), Rome, Italy (2017). ISBN: 978-92-5-109752-6.

10. R. Bracewell, The Fourier Transform and Its Applications, 2nd ed., McGraw Hill, New York, USA (1986). ISBN 978-0070070165.
