Research article (computer and information sciences): 'COMPUTATIONAL EXPERIMENTS ON SOLVING GRID ELLIPTIC EQUATIONS BY SOME PARALLEL ITERATIVE METHODS ON THE K60 CLUSTER'

CC BY


UDC 519.6. DOI: 10.23947/2587-8999-2022-1-3-121-136

COMPUTATIONAL EXPERIMENTS ON SOLVING GRID ELLIPTIC EQUATIONS BY SOME PARALLEL ITERATIVE METHODS ON THE K60 CLUSTER*

A.M. Atayan, A.E. Chistyakov

Don State Technical University, Rostov-on-Don, Russia

atayan24@mail.ru

The paper deals with the problem of preserving natural water systems and maintaining their integrity, not only through organizational, engineering and technical solutions, but also through the development of highly effective mathematical modeling techniques. Based on interconnected high-precision models of hydrophysics and hydrobiology, such techniques make it possible to predict quickly and efficiently the spread of pollution and the occurrence of hazardous phenomena in coastal systems. The article considers algorithms for solving grid equations developed for high-performance cluster systems. A model of parallel computations is proposed which, when choosing a method for solving a problem of aquatic ecology, makes it possible to estimate the cost of calculations, defined as the product of the parallel solution time and the number of processors used. An estimate of the optimal size of the data packet exchanged between processors is obtained. The adaptive modified alternating-triangular method of minimum corrections is described, and the results of numerical experiments for its parallel variant are presented.

Keywords: mathematical modeling, adaptive alternating-triangular method, parallel algorithms for solving grid equations, MPI, OpenMP.

Introduction. Ecology and the preservation of the quality of coastal (especially fresh) and commercial waters are becoming increasingly relevant. To preserve water systems and maintain their integrity, it is important not only to adopt organizational, engineering and technical solutions, but also to have highly effective methods for modeling the potential and actual mechanisms of primary and secondary pollution of coastal systems. Based on interconnected high-precision models of hydrophysics and hydrobiology, such methods make it possible to predict quickly and efficiently the transport of pollutants and the occurrence of hazardous phenomena. Cluster systems are widely used to reduce the computation time of mathematical models. Two trends currently shape the development of computing systems. On the one hand, performance continues to grow through an increase in the number of processor cores. On the other hand, heterogeneous systems are becoming more and more popular; their high performance is due to the use of computational elements of fundamentally different architectures. The first direction requires an increasing degree of parallelism from the algorithm and motivates the transition from the MPI model to the two-level parallel model MPI + OpenMP, which better matches the architecture of modern supercomputers with multi-core nodes. The second direction requires the adaptation of algorithms to a heterogeneous architecture and an even more complex parallel model that combines fundamentally different types of parallelism. When developing parallel algorithms for problems of computational mathematics, the fundamental point is the analysis of the efficiency of parallelism, which usually consists in estimating the resulting acceleration of the calculation (the reduction of the time needed to solve the problem). Such acceleration estimates can be formed for a selected computational algorithm (evaluation of the efficiency of parallelization of a particular algorithm).

* The study was financially supported by the Council for Grants of the President of the Russian Federation within the framework of the scientific project No. MD-3624.2021.1.1

The paper [1] presents software for hydrodynamic computational experiments that allows numerical modeling of bottom deformation in the coastal zone of a reservoir, together with the results of numerical experiments.

The paper [2] considers the types of parallelism used in the architectures of modern computer systems and describes how they manifest themselves in programs. Six paradigms of parallel programming are analyzed, and the connection of paradigms with generations of high-performance computing systems is shown. Methods for describing and representing parallelism with the help of various kinds of program models are considered. The reasons that determine the complexity of developing effective software for parallel computing systems are discussed. The connection of the discussed material with the actively developed online encyclopedia of properties and features of parallel algorithms AlgoWiki is noted.

The paper [3] considers the use of extended parallelization for computing problems of gas dynamics and aeroacoustics on heterogeneous clusters with nodes that combine computational elements of fundamentally different architectures, CPUs and GPGPUs. The two-level parallelization model MPI + OpenMP is supplemented by the use of OpenCL to load the GPGPU, thus implementing the third level of parallelism.

In [4], a comparative experimental analysis of various methods of iteration acceleration for solving superlarge sparse systems of linear algebraic equations (SLAE) is carried out: parametrized intersection of subdomains, the use of special interface conditions at the boundaries of adjacent subdomains, and the use of coarse-grid correction (aggregation, or reduction) of the original SLAE for constructing an additional preconditioner. Algorithms are parallelized at two levels by software tools for distributed and shared memory. Test SLAEs are obtained using finite-difference approximations of the Dirichlet problem for the diffusion-convective equation with different values of the convective coefficients on a sequence of condensing grids.

Decomposition of the computational domain in one spatial direction. Hydrodynamic problems are described by the system of Navier-Stokes equations and the continuity equation with the corresponding initial and boundary conditions [5, 6]. The hydrodynamics problem can be approximated in time on the basis of splitting schemes for physical processes [7]. In this case, the most computationally expensive part is the calculation of the pressure field from the Poisson equation. The use of B.N. Chetverushkin's regularizers in the continuity equation changes the method of calculating the pressure field, which is then computed from a wave equation of the form [8]:

$$\frac{1}{c^2} P''_{tt} - \Delta P = f, \quad (1)$$

where $\Delta$ is the Laplace operator (Laplacian), $c$ is the speed of sound, and $f$ is a given function.

It follows from this approach that the pressure cannot propagate faster than the speed of the shock front (in the linear approximation, the speed of sound). If the time of collision between molecules is not taken into account in the continuity equation, we obtain the Poisson equation [9]. Applying the Gauss-Ostrogradsky theorem to this equation yields instantaneous propagation of the pressure field from the sources of the field to the sinks. The regularized approach is less laborious computationally: the discrete analog of the wave equation has diagonal dominance, in contrast to the discrete analog of the Poisson equation, so the condition number of the operator, which affects the rate of convergence of the iterative method, decreases. It should also be noted that when the pressure is calculated from the Poisson equation with Neumann boundary conditions, the determinant of the system is equal to zero, which raises the question of the existence of a solution to the grid equation obtained by discretization. With B.N. Chetverushkin's regularizers, this problem does not arise.

For equation (1), we set the initial conditions in the following form:

$$P|_{t=0} = P_0, \quad P'_t|_{t=0} = 0. \quad (2)$$

If the pressure field is known, then we will use the boundary conditions of the first kind:

$$P|_{(x, y, z) \in \gamma} = P_1.$$

If the flow through the boundary is known, then we will use the boundary conditions of the second (for $\alpha_n = 0$) or third kind (for $\alpha_n > 0$):

$$P'_n|_{(x, y, z) \in \gamma} = \alpha_n P + \beta_n, \quad (3)$$

where $n$ is the normal directed inside the computational domain, $\gamma$ is the boundary of the computational domain, and $\alpha = \{\alpha_x, \alpha_y, \alpha_z\}$, $\beta = \{\beta_x, \beta_y, \beta_z\}$ are given vectors.

Problem (1) - (3) is considered in the domain $\Gamma$, whose linear dimensions along the vertical are much smaller than its dimensions along the horizontal coordinate directions.

We will assume that the computational domain $\Gamma$ is inscribed in a parallelepiped, which we will cover with a uniform computational grid:

$$\bar{\omega}_h = \{x_i = i h_x, \; y_j = j h_y; \; i \in \overline{0, N_x - 1}, \; j \in \overline{0, N_y - 1}; \; h_x N_x = l_x, \; h_y N_y = l_y\},$$

where $i, j$ are the indices of the grid nodes, $h_x, h_y$ are the steps in the spatial directions, $N_x, N_y$ are the numbers of steps in the spatial directions, and $l_x, l_y$ are the dimensions of the computational domain.

The values of the field $u(x, y)$ are calculated at the nodes of the computational grid: $u_{i,j}$ for $i \in \overline{1, N_x - 2}$, $j \in \overline{1, N_y - 2}$, while the nodes along the perimeter ($i \in \{0, N_x - 1\}$, $j \in \{0, N_y - 1\}$) are fictitious. Let us decompose the computational domain along the spatial direction $Oy$ by straight lines parallel to the axis $Ox$, and let $\omega_r$ denote the subdomain with number $r$, $0 \le r \le R - 1$, where $R$ is the number of subdomains into which the original domain is divided. The calculated nodes of subdomain $\omega_r$ are the elements $u^r_{i,j}$ for $i \in \overline{1, N_x - 2}$, $j \in \overline{1, N_2^r - 2}$, where $N_2^r$ is the width of $\omega_r$ along $Oy$. The original domain is split in such a way that adjacent subdomains $\omega_r$ and $\omega_{r+1}$ intersect at two nodes along the direction perpendicular to the splitting lines, and the equalities $u^r_{i, N_2^r - 2} = u^{r+1}_{i, 0}$, $u^r_{i, N_2^r - 1} = u^{r+1}_{i, 1}$ hold (Fig. 1).

To represent the values of the field $u(x, y)$ in vector form, a pair of indices $i, j$ can be matched with the value $m$, the ordinal number of the element of the vector $u$: $m = i + j N_x$, $0 \le m \le n - 1$, where $n$ is the length of the vector $u = (u_0, u_1, \ldots, u_{n-1})^T$. This representation is convenient for describing and studying algorithms that solve grid equations by iterative methods.
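The index mapping above can be sketched in a few lines of Python. The function names `flat_index` and `grid_indices` are illustrative only and do not come from the paper; the sketch assumes row-by-row storage exactly as in $m = i + j N_x$.

```python
def flat_index(i, j, nx):
    """Map grid indices (i, j) to the position m = i + j*Nx in the vector u."""
    return i + j * nx


def grid_indices(m, nx):
    """Inverse mapping: recover (i, j) from the ordinal number m."""
    j, i = divmod(m, nx)
    return i, j


if __name__ == "__main__":
    nx, ny = 5, 4  # small illustrative grid
    m = flat_index(3, 2, nx)
    print(m, grid_indices(m, nx))
```

With this layout, neighbours along $Ox$ differ by 1 in $m$ and neighbours along $Oy$ differ by $N_x$, which is why a strip decomposition along $Oy$ keeps each subdomain contiguous in memory.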

Fig. 1. Decomposition of the computational domain.

For the fragments $\omega_r$ obtained by decomposing the computational domain in one spatial direction, two parameters must be known: the initial index $j = N_1^r$ in the original computational domain and the width of the fragment $N_2^r$. The index $N_1^r$ from which the corresponding fragment of the computational domain begins can be calculated by the formula

$$N_1^r = \lfloor r (N_y - 2)/R \rfloor,$$

where $\lfloor x \rfloor$ is the floor function, defined as the largest integer less than or equal to $x$, and $\lceil x \rceil$ is the ceiling function, defined as the smallest integer greater than or equal to $x$.

The width of the subdomain $\omega_r$ along the axis $Oy$ is calculated by the formula

$$N_2^r = \lceil (r + 1)(N_y - 2)/R \rceil - N_1^r + 2.$$
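The two decomposition formulas can be checked with a short Python sketch. It assumes the reading $N_1^r = \lfloor r (N_y - 2)/R \rfloor$ and $N_2^r = \lceil (r + 1)(N_y - 2)/R \rceil - N_1^r + 2$ of the formulas above; the function name `subdomain` is hypothetical. Integer arithmetic is used so that floor and ceiling are exact.

```python
def subdomain(r, num_subdomains, ny):
    """Return (n1, n2): start index and width of subdomain r along Oy."""
    inner = ny - 2
    n1 = (r * inner) // num_subdomains                  # floor(r (Ny-2) / R)
    upper = -((-(r + 1) * inner) // num_subdomains)     # ceil((r+1)(Ny-2) / R)
    n2 = upper - n1 + 2
    return n1, n2


if __name__ == "__main__":
    ny, R = 14, 3  # illustrative sizes, Ny - 2 divisible by R
    for r in range(R):
        n1, n2 = subdomain(r, R, ny)
        print(f"r={r}: rows {n1}..{n1 + n2 - 1} (width {n2})")
```

For $N_y = 14$, $R = 3$ the strips cover rows 0..5, 4..9 and 8..13, so neighbours share exactly two rows. When $N_y - 2$ is not divisible by $R$, the floor and ceiling differ and, under this reading, neighbouring strips can share one extra row.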

The following parameters are used for the theoretical evaluation of the work of computer systems:

$t_a$ is the execution time of one arithmetic operation; $t_l$ is the time of data transmission organization (latency); $t_x$ is the transmission time per data element.

Figure 2 shows the dependence of the transmission time on the amount of data for different numbers of exchanges between nodes of the computing system. The graph shows that the transmission time has a jump when the volume of transmitted data is approximately 512 floating-point numbers. Let us denote this value $N_{max} = 512$.

Fig. 2. Dependence of data transfer time on volume when working with a different number of computing nodes.

Running a computation on a multiprocessor computing system can significantly reduce the calculation time. However, the observed efficiency does not always match expectations. It is therefore useful to estimate the calculation time theoretically on the basis of regression analysis.

Least Squares Latency Calculation. Let $y_i$ denote the $i$-th observation of the dependent variable, and let the vector $x_i$ denote the explanatory factors. Then the multiple regression model can be expressed in the following form:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i, \quad (1)$$

where $i = 1, 2, \ldots, n$; $\beta_0$ is the free term; $\varepsilon_i$ is the error term.

$Y$ is a vector of dimension $n$, and $X$ is the matrix of values of the explanatory factors, of dimension $n \times (p + 1)$. The model in matrix form is

$$Y = X\beta + \varepsilon. \quad (2)$$

An estimate of this model from a given sample is an equation in which $\beta = (\beta_0 \; \beta_1 \; \ldots \; \beta_p)^T$, $\varepsilon = (\varepsilon_1 \; \varepsilon_2 \; \ldots \; \varepsilon_n)^T$. To estimate the vector of unknown parameters $\beta$, we use the least squares method.

The minimization condition for the residual sum of squares can be represented as:

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon^T \varepsilon = (Y - X\beta)^T (Y - X\beta) \to \min. \quad (3)$$

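Performing the transformations in (3), we obtain

$$S = Y^T Y - \beta^T X^T Y - Y^T X \beta + \beta^T X^T X \beta. \quad (4)$$

The product $Y^T X \beta$ has dimension $[1 \times n] \cdot [n \times (p + 1)] \cdot [(p + 1) \times 1] = 1 \times 1$, that is, it is a scalar and therefore equals its transpose $\beta^T X^T Y$. Hence it follows that

$$S = Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta \to \min, \quad (5)$$

where $X^T X$ is the matrix of sums of first powers, squares and pairwise products of the $n$ observations of the explanatory factors, and $X^T Y$ is the vector of products of the $n$ observations of the explanatory factors and the dependent variable.

Setting the derivative of (5) with respect to $\beta$ to zero gives the system $X^T X \beta = X^T Y$. The solution of this matrix equation is the vector $\beta = (X^T X)^{-1} X^T Y$, where $(X^T X)^{-1}$ is the inverse of the matrix of system coefficients and $X^T Y$ is the vector of its free terms.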

Knowing the vector $\beta$, any multiple regression equation can be represented as $\hat{y} = X_0^T \beta$. When calculating the operating time of the computing system, $y$ is the total running time, and the explanatory factors in the vector $x$ are the size of the computational grid and the number of computing nodes used. Thus, it is possible to calculate the average operating time of the entire system. Based on the presented regression analysis, a linear dependence of the running time of the parallel program on the volume of transmitted data and the number of computing nodes involved was obtained (Fig. 3), separately for the cases when the volume of transmitted data is less than and more than 512 elements.
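The normal-equation solution $\beta = (X^T X)^{-1} X^T Y$ can be illustrated with a small, dependency-free Python sketch. This is not the authors' code: the helper names are hypothetical, the linear system is solved by plain Gaussian elimination rather than an explicit inverse, and the sample data are synthetic values generated from $y = 2 + 3x_1 - x_2$, not measurements from the K60 cluster.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for c in range(col, n + 1):
                a[k][c] -= factor * a[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][c] * z[c] for c in range(i + 1, n))
        z[i] = s / a[i][i]
    return z


def least_squares(rows, y):
    """Return beta solving X^T X beta = X^T Y; an intercept column is prepended."""
    x = [[1.0] + list(row) for row in rows]
    p1 = len(x[0])
    xtx = [[sum(r[c1] * r[c2] for r in x) for c2 in range(p1)] for c1 in range(p1)]
    xty = [sum(r[c1] * yi for r, yi in zip(x, y)) for c1 in range(p1)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    # Synthetic observations of two explanatory factors (illustration only).
    rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    y = [2 + 3 * x1 - x2 for x1, x2 in rows]
    print(least_squares(rows, y))
```

In the setting of the paper, the two factors would be the grid size and the number of computing nodes, and $y$ the measured running time.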

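Latency and data transfer times are calculated using the least squares method. The formula for latency is:

$$t_l = \begin{cases} 5.21 \times 10^{-6} + 1.53 \times 10^{-7} (R - 2), & \text{if } N_x < N_{max}, \\ 6.733 \times 10^{-6} (R - 2), & \text{if } N_x > N_{max}. \end{cases} \quad (6)$$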

Transmission time per data element: $t_x = 3.3 \times 10^{-9}$ [s].
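A minimal sketch of the fitted latency formula (6) as a Python function follows. Two points are assumptions rather than statements of the paper: the result is taken to be in seconds, and the boundary case of exactly $N_{max}$ elements, which formula (6) leaves unspecified, is assigned to the second branch.

```python
N_MAX = 512  # packet size at which the jump in transfer time was observed


def latency(num_nodes, n_elements):
    """Latency t_l from formula (6); units assumed to be seconds."""
    if n_elements < N_MAX:
        return 5.21e-6 + 1.53e-7 * (num_nodes - 2)
    # n_elements >= N_MAX: equality is placed here by assumption.
    return 6.733e-6 * (num_nodes - 2)


if __name__ == "__main__":
    for nodes in (2, 8, 24):
        print(nodes, latency(nodes, 100), latency(nodes, 1000))
```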

Fig. 3. Dependence of data transfer time on volume: a) data volume up to 512 elements; b) data volume of more than 512 elements.

Parallel computations model of two-dimensional problems based on explicit schemes and iterative methods with a diagonal preconditioner. The algorithm for solving grid problems based on an explicit scheme corresponds to one iteration of the Jacobi algorithm. Let us consider the complexity of algorithms based on explicit schemes and iterative methods with a diagonal preconditioner using the Jacobi method as an example. It is required to solve a system of linear algebraic equations $Au = f$, where $A = (a_{ij})_{i,j=1}^{N}$ is an invertible matrix, and $u = (u_1, u_2, \ldots, u_N)^T$, $f = (f_1, f_2, \ldots, f_N)^T$ are the column vectors of unknowns and right-hand sides, respectively.

In the sequential version of the Jacobi method, the residual vector $r^n = f - Au^n$, where $n$ is the iteration number, is used to find $(N_x - 2)(N_y - 2)$ elements of the vector $u$ at the next time layer. With the parallel organization of calculations, each computing node processes $(N_x - 2)(N_2^r - 2)$ elements of the vector $u$. Calculating each element of the residual vector requires 10 arithmetic operations. Once the residual vector is known, calculating each element of the vector $u$ takes 2 more arithmetic operations. The parallel variant of the Jacobi method also requires 2 data transfers of $(N_x - 2)$ elements each.

Fig. 4. Calculation by the Jacobi method.
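The structure of one Jacobi iteration (residual first, then the diagonal update) can be sketched as follows. The sketch assumes the simplest five-point stencil $4u_{i,j} - u_{i\pm1,j} - u_{i,j\pm1} = f_{i,j}$ with unit grid step; the paper's operator has general coefficients, which is where its count of 10 operations per residual element comes from. The function name is illustrative.

```python
def jacobi_step(u, f, nx, ny):
    """One Jacobi iteration on interior nodes; u and f are flat lists, m = i + j*nx."""
    new_u = u[:]  # perimeter (fictitious) nodes are left unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            m = i + j * nx
            # Residual r = f - A u for the assumed five-point stencil.
            au = 4.0 * u[m] - u[m - 1] - u[m + 1] - u[m - nx] - u[m + nx]
            residual = f[m] - au
            # Diagonal preconditioner: u_new = u + D^{-1} r with D = 4.
            new_u[m] = u[m] + residual / 4.0
    return new_u


if __name__ == "__main__":
    nx = ny = 5
    u = [1.0] * (nx * ny)          # boundary values equal to 1
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            u[i + j * nx] = 0.0    # zero initial guess inside
    f = [0.0] * (nx * ny)
    for _ in range(100):
        u = jacobi_step(u, f, nx, ny)
    print(max(abs(v - 1.0) for v in u))
```

In a strip decomposition, each node would run the same loop over its own rows and exchange the first and last interior rows with its neighbours, which corresponds to the 2 transfers of $N_x - 2$ elements mentioned above.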

The time spent on one iteration of the sequential version of the Jacobi method is:

$$t = 12 t_a (N_x - 2)(N_y - 2). \quad (7)$$

Let us estimate the running time of the parallel algorithm on a multiprocessor computing system:

$$t = 12 t_a (N_x - 2) \max_r (N_2^r - 2) + 2 (t_l + (N_x - 2) t_x).$$

Since

$$\frac{N_y - 2}{R} \le \max_r (N_2^r - 2), \qquad \max_r (N_2^r - 2) \approx \frac{N_y - 2}{R},$$

we obtain

$$t = 12 t_a \frac{(N_x - 2)(N_y - 2)}{R} + 2 (t_l + (N_x - 2) t_x). \quad (8)$$

The speed-up of the parallel algorithm of the Jacobi method is:

$$A = \frac{6 t_a (N_x - 2)(N_y - 2)}{6 t_a \dfrac{(N_x - 2)(N_y - 2)}{R} + t_l + (N_x - 2) t_x}. \quad (9)$$

If the amount of transmitted data is $N_x - 2 > N_{max}$, then $k = \lceil (N_x - 2)/N_{max} \rceil$ exchanges are performed. Then the speed-up of the parallel algorithm of the Jacobi method is:

$$A = \frac{R}{1 + \dfrac{R (k t_l + (N_x - 2) t_x)}{6 t_a (N_x - 2)(N_y - 2)}}. \quad (10)$$
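The cost model of formulas (7) and (8) is easy to evaluate numerically. The sketch below is an illustration under assumptions: it takes the serial time as $12 t_a (N_x - 2)(N_y - 2)$ and the parallel time as the serial time divided by $R$ plus $2(t_l + (N_x - 2) t_x)$, and the machine parameters in the usage example are placeholders, not measured values.

```python
def jacobi_speedup(nx, ny, num_nodes, t_a, t_l, t_x):
    """Model speed-up of one Jacobi iteration: serial time / parallel time."""
    serial = 12.0 * t_a * (nx - 2) * (ny - 2)                      # formula (7)
    parallel = serial / num_nodes + 2.0 * (t_l + (nx - 2) * t_x)   # formula (8)
    return serial / parallel


if __name__ == "__main__":
    # Placeholder parameters (seconds), chosen only to show the trend.
    t_a, t_l, t_x = 1e-9, 6e-6, 3e-9
    for nodes in (2, 4, 8, 16, 24):
        print(nodes, round(jacobi_speedup(1000, 1000, nodes, t_a, t_l, t_x), 2))
```

With zero communication cost the model returns exactly the number of nodes; any positive latency or transfer time pulls the speed-up below that bound, and the gap widens as $R$ grows.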

Seidel method. We will solve the grid equations in a pipelined manner. Let us additionally partition the domain along the $Ox$ axis into blocks of $m$ elements. Let $s$ denote the step number and $q$ the block number, $0 \le q \le Q - 1$ (Fig. 5). At step $s$, computing node $r$ calculates the block with number $q = s - r$, provided that $s \ge r$. The parameter $s = q + r$ lies in the range $0 \le s \le R + Q - 2$, where $Q = \lceil (N_x - 2)/m \rceil$ is the number of blocks.

Fig. 5. Calculation by the Seidel method.
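The pipeline rule "at step $s$, node $r$ processes block $q = s - r$" can be made concrete with a short schedule generator. The function name `pipeline_schedule` is illustrative; the sketch only enumerates which node works on which block and does not perform any Seidel updates.

```python
def pipeline_schedule(num_nodes, num_blocks):
    """For each step s, list the active (node r, block q) pairs with q = s - r."""
    steps = []
    for s in range(num_nodes + num_blocks - 1):
        active = [(r, s - r) for r in range(num_nodes) if 0 <= s - r <= num_blocks - 1]
        steps.append(active)
    return steps


if __name__ == "__main__":
    for s, active in enumerate(pipeline_schedule(3, 4)):
        print(f"step {s}: {active}")
```

For $R = 3$ and $Q = 4$ the schedule has $R + Q - 1 = 6$ steps: the first $R - 1$ steps fill the pipeline, the last $R - 1$ steps drain it, and only in between are all nodes busy.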

It is necessary to perform $R + Q - 1$ steps; at each step, $m (N_2^r - 2)$ grid values of the function

$$u^{n+1} = D^{-1} (A^{-} u^{n+1} + A^{+} u^{n})$$

are calculated, each of which takes 9 arithmetic operations. Using $mQ = N_x - 2$, the total time spent on calculations is

$$t = 9 t_a m (N_2^r - 2)(R + Q - 1) = 9 t_a (N_x - 2 + m (R - 1))(N_2^r - 2).$$

The time $9 t_a m (R - 1)(N_2^r - 2)$ is the start delay of the last computing node.

Let us calculate the time spent on data exchanges. When calculating the values of the grid functions, $R + Q - 1$ transfers of $m$ elements are required. While the pipeline is starting up, $0 \le s \le R - 2$, $R - 1$ collective exchanges are performed between $s$ computing nodes. During steady operation of the pipeline, $R - 1 \le s \le Q - 1$, collective exchanges are performed between all computing nodes ($Q - R + 1$ exchanges in total). While the pipeline is shutting down, $Q \le s \le R + Q - 2$, $R - 1$ collective exchanges are performed between $R + Q - 2 - s$ computing nodes. After an iteration of the Seidel method, one transmission of $N_x - 2$ elements between all the computing nodes is required.

The total time for the organization of transfers is

$$\sum_{s=0}^{R-2} t_l(s) + (Q - R + 1)\, t_l(R - 1) + \sum_{s=Q}^{R+Q-2} t_l(R + Q - 2 - s) = 2 \sum_{s=0}^{R-2} t_l(s) + (Q - R + 1)\, t_l(R - 1).$$

The time spent on exchanges is $t_x m Q + t_x (N_x - 2) = 2 t_x (N_x - 2)$. The time spent on the parallel implementation of one iteration of the Seidel method is

$$t = 9 t_a (N_x - 2 + m (R - 1))(N_2^r - 2) + 2 \sum_{s=0}^{R-2} t_l(s) + (Q - R + 1)\, t_l(R - 1) + 2 t_x (N_x - 2).$$

Let us differentiate $t(m)$ with respect to $m$, taking $Q = (N_x - 2)/m$ and $N_2^r \approx N_2$:

$$t'(m) = 9 t_a (R - 1)(N_2 - 2) - \frac{N_x - 2}{m^2}\, t_l(R - 1) = 0.$$

Therefore, the optimal amount of data per transfer is equal to

$$m = \sqrt{\frac{(N_x - 2)\, t_l(R - 1)}{9 t_a (R - 1)(N_2 - 2)}}.$$
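The optimal block size can be checked numerically. The sketch below assumes that only two terms of the Seidel iteration time depend on $m$: the pipeline start-up delay $9 t_a m (R - 1)(N_2 - 2)$ and the latency term $(N_x - 2)\, t_l(R - 1)/m$ that comes from $Q = (N_x - 2)/m$. Function names and all parameter values are placeholders, not values from the paper.

```python
import math


def m_dependent_time(m, nx, n2, num_nodes, t_a, t_l_full):
    """Terms of the Seidel iteration time that depend on the block size m."""
    startup_delay = 9.0 * t_a * m * (num_nodes - 1) * (n2 - 2)
    latency_cost = (nx - 2) / m * t_l_full
    return startup_delay + latency_cost


def optimal_block_size(nx, n2, num_nodes, t_a, t_l_full):
    """m = sqrt((Nx-2) t_l(R-1) / (9 t_a (R-1) (N2-2)))."""
    return math.sqrt((nx - 2) * t_l_full / (9.0 * t_a * (num_nodes - 1) * (n2 - 2)))


if __name__ == "__main__":
    args = (1002, 52, 8, 1e-9, 6e-6)  # placeholder problem and machine parameters
    m_opt = optimal_block_size(*args)
    print(m_opt, m_dependent_time(m_opt, *args))
```

Smaller blocks shorten the start-up delay but multiply the number of latency-bound exchanges; larger blocks do the opposite, which is why an interior optimum exists.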

The speed-up of the parallel algorithm of the Seidel method is equal to

$$A = \frac{9 t_a (N_x - 2)(N_y - 2)}{9 t_a (N_x - 2 + m (R - 1))(N_2^r - 2) + 2 \sum_{s=0}^{R-2} t_l(s) + (Q - R + 1)\, t_l(R - 1) + 2 t_x (N_x - 2)}.$$

Figure 6 compares the acceleration for different amounts of transmitted data on a grid of 1000 x 1000 computational nodes. The calculation time was measured for transmissions of 10, 50 and 100 elements. The greatest acceleration was observed with a transmission volume of 50 elements.

Figure 7 compares the acceleration of the parallel algorithm for different amounts of transmitted data on a grid of 10,000 x 10,000 computational nodes. The calculation time was measured for transmissions of 5, 10, 50, 100, 500, and 1000 elements. The greatest acceleration was again observed with a transmission volume of 50 elements. As the volume of received and transmitted data increased further, the speed of calculations began to decrease, because with large transfer volumes the associated costs grow and are no longer justified.

Fig. 6. Comparison of the acceleration of the parallel algorithm depending on the amount of transmitted data on a 1000 x 1000 grid.

Fig. 7. Comparison of the acceleration of the parallel algorithm depending on the amount of transmitted data on a grid of 10000 x 10000 calculation nodes.

Figure 8 shows a comparison of the theoretical and practical values of the acceleration of the parallel algorithm in the case of the optimal amount of transfers.

Fig. 8. Comparison of the theoretical and practical values of the acceleration of the parallel algorithm in the case of the optimal amount of transfers.

Modified alternating triangular iterative method. In a finite-dimensional Hilbert space $H$, the problem of finding a solution to an operator equation is considered [10]:

$$Ax = f, \quad A: H \to H, \quad (21)$$

where $A$ is a linear, positive definite operator ($A > 0$). To solve problem (21), we will use the implicit iterative process

$$B \frac{x^{m+1} - x^m}{\tau} + A x^m = f, \quad B: H \to H. \quad (22)$$

In equation (22), $m$ is the iteration number, $\tau > 0$ is the iteration parameter, and $B$ is some invertible operator. The inversion of the operator $B$ in (22) should be much simpler than the direct inversion of the original operator $A$ in (21). When constructing $B$, we will proceed from the additive representation of the operator $A_0$, the symmetric part of the operator $A$:

$$A_0 = R_1 + R_2, \quad R_1 = R_2^*, \quad (23)$$

where $R_1$, $R_2$ are the lower- and upper-triangular operators.

Here and below we will also use the skew-symmetric part of the operator $A$:

$$A_1 = \frac{A - A^*}{2}.$$

Due to (23), $(Ay, y) = (A_0 y, y) = 2(R_1 y, y) = 2(R_2 y, y)$. Therefore $R_1 > 0$, $R_2 > 0$. Let in (22)

$$B = (D + \omega R_1) D^{-1} (D + \omega R_2), \quad D = D^* > 0, \quad \omega > 0, \quad y \in H, \quad (24)$$

where $D$ is some operator (for example, the diagonal part of the operator $A$).

Since $A_0 = A_0^* > 0$, together with (23) this gives $B = B^* > 0$. Relations (22) - (24) define the modified alternating-triangular method (MATM) for solving the problem, provided that the operators $R_1$, $R_2$ are defined and the methods for determining the parameters $\tau$, $\omega$ and the operator $D$ are specified.

The algorithm for solving the grid equations by the MATM of the variational type can be written as follows:

$$r^m = A x^m - f, \quad B(\omega_m) w^m = r^m, \quad \tilde{\omega}_m = \sqrt{\frac{(D w^m, w^m)}{(D^{-1} R_2 w^m, R_2 w^m)}},$$

$$s_m^2 = 1 - \frac{(A_0 w^m, w^m)^2}{(B^{-1} A_0 w^m, A_0 w^m)(B w^m, w^m)}, \quad k_m^2 = \frac{(B^{-1} A_1 w^m, A_1 w^m)}{(B^{-1} A_0 w^m, A_0 w^m)}, \quad (25)$$

$$\theta_m = \frac{1 - \sqrt{\dfrac{s_m^2 k_m^2}{1 + k_m^2}}}{1 + k_m^2 (1 - s_m^2)}, \quad \tau_{m+1} = \theta_m \frac{(A_0 w^m, w^m)}{(B^{-1} A_0 w^m, A_0 w^m)}, \quad x^{m+1} = x^m - \tau_{m+1} w^m, \quad \omega_{m+1} = \tilde{\omega}_m,$$

where $r^m$ is the residual vector, $w^m$ is the correction vector, the parameter $s_m$ describes the rate of convergence of the method, and $k_m$ describes the ratio of the norm of the skew-symmetric part of the operator to the norm of the symmetric part.
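To make the iteration concrete, here is a small dense-matrix sketch of an adaptive alternating-triangular iteration of the kind described by (22) - (25). It is a sketch under stated assumptions, not the authors' parallel implementation: $D$ is taken as the diagonal of $A_0$, $R_1$ is the strict lower triangle of $A_0$ plus half the diagonal (so that $A_0 = R_1 + R_2$, $R_1 = R_2^T$), the starting value $\omega_0 = 1$ is arbitrary, and the formulas for $s_m^2$, $k_m^2$, $\theta_m$ and $\tau_{m+1}$ follow the reading of (25) given above. All function names are hypothetical.

```python
import math


def matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_b(d, r1, r2, omega, rhs):
    """Solve B w = rhs with B = (D + omega R1) D^{-1} (D + omega R2)."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):  # forward substitution with the lower-triangular factor
        s = rhs[i] - omega * sum(r1[i][j] * y[j] for j in range(i))
        y[i] = s / (d[i] + omega * r1[i][i])
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution: (D + omega R2) w = D y
        s = d[i] * y[i] - omega * sum(r2[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / (d[i] + omega * r2[i][i])
    return w


def matm_solve(a, f, omega=1.0, tol=1e-10, max_iter=2000):
    """Adaptive alternating-triangular iteration; returns (x, iterations)."""
    n = len(f)
    a0 = [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]
    a1 = [[0.5 * (a[i][j] - a[j][i]) for j in range(n)] for i in range(n)]
    d = [a0[i][i] for i in range(n)]
    r1 = [[a0[i][j] if j < i else (0.5 * d[i] if j == i else 0.0)
           for j in range(n)] for i in range(n)]
    r2 = [[r1[j][i] for j in range(n)] for i in range(n)]
    x = [0.0] * n
    for it in range(max_iter):
        r = [ax - fi for ax, fi in zip(matvec(a, x), f)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        w = solve_b(d, r1, r2, omega, r)
        r2w = matvec(r2, w)
        omega_next = math.sqrt(dot([di * wi for di, wi in zip(d, w)], w)
                               / dot([v / di for v, di in zip(r2w, d)], r2w))
        a0w, a1w = matvec(a0, w), matvec(a1, w)
        denom = dot(solve_b(d, r1, r2, omega, a0w), a0w)
        a0ww = dot(a0w, w)
        # (B w, w) equals (r, w) because B w = r; clamp guards against rounding.
        s2 = max(0.0, 1.0 - a0ww ** 2 / (denom * dot(r, w)))
        k2 = dot(solve_b(d, r1, r2, omega, a1w), a1w) / denom
        theta = (1.0 - math.sqrt(s2 * k2 / (1.0 + k2))) / (1.0 + k2 * (1.0 - s2))
        tau = theta * a0ww / denom
        x = [xi - tau * wi for xi, wi in zip(x, w)]
        omega = omega_next
    return x, max_iter


if __name__ == "__main__":
    n = 8  # small symmetric test matrix: tridiag(-1, 2, -1)
    a = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    f = matvec(a, [1.0] * n)
    x, iters = matm_solve(a, f)
    print(iters, max(abs(xi - 1.0) for xi in x))
```

For a symmetric operator $A_1 = 0$, so $k_m = 0$, $\theta_m = 1$, and the step reduces to the method of minimum corrections with the adaptively chosen $\omega$.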

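The convergence rate of the method is:

$$\rho \le \frac{\nu^* - 1}{\nu^* + 1},$$

where $\nu^* = \nu \left(\sqrt{1 + k^2} + k\right)^2$ and $\nu$ is the condition number of the matrix $C_0 = B^{-1/2} A_0 B^{-1/2}$. The value of $\omega$ is optimal for

$$\omega = \sqrt{\frac{(D w^m, w^m)}{(D^{-1} R_2 w^m, R_2 w^m)}},$$

and there is an estimate of the condition number of the matrix $C_0$:

$$\nu = \max_y \left( \frac{1}{2} + \frac{\sqrt{(Dy, y)(D^{-1} R_2 y, R_2 y)}}{(A_0 y, y)} \right) \le \frac{1}{2} \left( 1 + \frac{1}{\sqrt{\xi}} \right), \quad (26)$$

where $\xi = \dfrac{\delta}{\Delta}$, $D \le \dfrac{1}{\delta} A_0$, $R_1 D^{-1} R_2 \le \dfrac{\Delta}{4} A_0$.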

Fig. 9. Dependence of the acceleration of parallel algorithms of the Seidel, Jacobi methods and the modified alternating-triangular method on the number of computing nodes.

Figure 9 shows a comparison of the acceleration of parallel algorithms for the Seidel, Jacobi methods and the modified alternating-triangular method depending on the number of computing nodes. The calculations were made on a grid with one million calculation cells. The launches were carried out sequentially, starting from the launch on one computing node and ending with the connection of all available nodes.

In comparison with the Jacobi and Seidel methods, the MATM requires a significantly smaller number of iterations for convergence. With good optimization of the parallel MATM algorithm, its acceleration differs from that of the parallel Jacobi algorithm by no more than 10% for up to 24 computing nodes. Figure 10 shows the speed-up graphs of the developed parallel MATM algorithms based on MPI and on the hybrid technology MPI + OpenMP, depending on the number of computing nodes involved (taking into account various options for decomposition of the computational domain). The maximum number of computing nodes used was 24, and the computational grid contained one million cells.

Fig. 10. Dependence of the speed-up of parallel algorithms based on MPI and MPI + OpenMP technologies on the number of computing nodes (alternating-triangular method). Axes: number of nodes (1 to 24) against acceleration; curves: MPI and MPI + OpenMP.

Conclusion. The efficiency of parallel programs on systems with distributed memory depends essentially on the communication environment. The communication environment is characterized fairly completely by two parameters: bandwidth, which determines the number of bytes transmitted per unit of time, and latency. Communication operations are much slower than accesses to local memory, so the most efficient parallel programs are those in which exchanges are minimized.

Under certain circumstances, the acceleration may be greater than the number of processors used; this is called superlinear acceleration. Despite the paradoxical nature of such situations, superlinear acceleration can occur in practice. One of the reasons for this phenomenon may be the non-linear dependence of the complexity of solving the problem on the amount of data being processed. A difference between the computational schemes of the serial and parallel methods can also be a source of superlinear acceleration.

On closer examination, attempts to improve the quality of parallel computing in one indicator (acceleration or efficiency) can worsen the other, because these quality indicators conflict. For example, an increase in acceleration can usually be achieved by increasing the number of processors, which usually leads to a drop in efficiency. Conversely, efficiency gains are achieved in many cases by reducing the number of processors (in the limit, ideal efficiency is easily achieved with a single processor). As a result, the development of parallel computing methods often involves choosing a compromise that takes into account the desired acceleration and efficiency.

When choosing an appropriate parallel method for solving a problem, it may be useful to estimate the cost of computing, defined as the product of the parallel solution time of the problem and the number of processors used, and to obtain the optimal amount of information packet for exchange between processors. The application of this approach made it possible to achieve the greatest acceleration.
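The trade-off between acceleration, efficiency and cost described above can be illustrated with a few lines of Python. The definitions follow the text (cost is the product of the parallel solution time and the number of processors; efficiency is taken as speed-up divided by the number of processors, the usual definition, which the paper does not spell out). The timing values in the usage example are invented solely for illustration.

```python
def speedup(serial_time, parallel_time):
    return serial_time / parallel_time


def efficiency(speedup_value, processors):
    """Speed-up per processor (standard definition, assumed here)."""
    return speedup_value / processors


def cost(parallel_time, processors):
    """Cost of computing: parallel solution time times number of processors."""
    return parallel_time * processors


if __name__ == "__main__":
    serial_time = 100.0
    # Invented timings: processors -> parallel time (seconds).
    timings = {1: 100.0, 4: 30.0, 16: 10.0}
    for p, t in timings.items():
        s = speedup(serial_time, t)
        print(p, round(s, 2), round(efficiency(s, p), 3), cost(t, p))
```

In this made-up series the speed-up grows with the number of processors while the efficiency falls and the cost rises, which is the conflict between indicators that motivates a compromise choice.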

To obtain theoretical estimates of the performance of a computing system for two-dimensional and three-dimensional diffusion-convection problems solved by iterative methods based on the decomposition of the computational domain in one and two spatial directions, the dependence of the transmission time on the amount of data was obtained for different numbers of nodes of the computing system. It was established that the transmission time has a jump when the volume of transmitted data is approximately 512 floating-point numbers. Latency and transmission times were fitted by the least squares method, and the optimal amount of data per transfer was calculated from the number of processors, the size of the calculated subdomain, the latency and the execution time of one arithmetic operation. The time spent on the parallel implementation of one iteration by pipeline-type methods (the Seidel method and the ATM) was calculated from the volume of transmitted data, the number of blocks and the time of data transmission organization (latency), and the optimal amount of transmitted data depending on the system parameters was also calculated.

For areas elongated along horizontal directions, a method for solving a system of grid equations is described. To solve such a system, which arises during the discretization of the problem of hydrodynamics of a shallow reservoir, the MATM was applied. The results showed the advantages of hybrid parallelization technology: the acceleration of the MATM-based algorithm reached 68.40 when the computational domain was divided into 24 parts along the direction of the Ox axis.

References

1. Sukhinov A.I., Chistyakov A.E., Protsenko E.A., Sidoryakina V.V., Protsenko S.V. Parallel algorithms for solving the problem of coastal bottom relief dynamics // Num. Meth. Prog. - 2020. - Vol.21, №3. - P.196-206.

2. Voevodin V.V. Parallelism in large software packages (why is it difficult to create efficient software) // Chebyshevskii Sb. - 2017. - Vol.18, №3, P. 188-201 (in Russian).

3. Gorobets A.V., Sukov S.A., Zheleznyakov A.O., Bogdanov P.B., Chetverushkin B.N. The use of GPUs in the framework of hybrid two-level parallelization MPI + OpenMP on heterogeneous scientific computing resources // Proceedings of the international conference. PCT2011. - 2011. - P.452-460 (in Russian).

4. Gur'eva Ya.L., Il'in V.P. On acceleration technologies of parallel decomposition methods // Num. Meth. Prog. - 2015. - Vol.16, №1. - P.146-154 (in Russian).

5. Atayan A.M., Nikitina A.V., Sukhinov A.I., Chistyakov A.E. Mathematical modeling of hazardous natural phenomena in a shallow basin // Comput. Math. Math. Phys. - 2022. - Vol.61, № 2.- P. 269-286.

6. Sukhinov A.I., Chistyakov A.E., Alekseenko E.V. Numerical realization of three-dimensional model of hydrodynamics for shallow water basins on high-performance system // Math. Models Comput. Simul. - 2011. - Vol.3, №5. - P. 562-574.

7. Alekseenko E., Roux B., Fougere D., Chen P.G. The effect of wind induced bottom shear stress and salinity on Zostera noltii replanting in a Mediterranean coastal lagoon. Estuarine // Coastal and Shelf Science.

- 2017. - Vol.187. - P.293-305.

8. Belotserkovskii O.M., Gushchin V.A., Shchennikov V.V. Use of the splitting method to solve problems of the dynamics of a viscous incompressible fluid // Comput. Math. Math. Phys.- 1975. - Vol.15, №1. - P. 190-200.

9. Sukhinov A.I., Chistyakov A.E., Kuznetsova I.Y., Atayan A.M., Nikitina A.V. Regularized difference scheme for solving hydrodynamic problems // Math. Models Comput. Simul. - 2022. - Vol. 14, №5.

- P.745-754.

10. Sukhinov A.I., Chistyakov A.E. Adaptive analog-SSOR iterative method for solving grid equations with nonselfadjoint operators // Math. Models Comput. Simul. - 2012. - Vol.4, №4. - P.398-409.

Authors:

Atayan Asya Mikhailovna, Don State Technical University, (Gagarin square, 1, Rostov-on-Don, Russia), Senior Lecturer of the Department of Computer Engineering and Automated Systems Software, DSTU. Email address: atayan24@mail.ru, ORCID: 0000-0003-4629-1002.

Chistyakov Alexander Evgenyevich, Don State Technical University, (Gagarin square, 1, Rostov-on-Don, Russia), Doctor of Science in Physics and Maths, Professor of the Department of Computer Engineering and Automated Systems Software. Email address: cheese_05@mail.ru, ORCID: 0000-0002-8323-6005.

UDC 519.6 DOI: 10.23947/2587-8999-2022-1-3-121-136

COMPUTATIONAL EXPERIMENTS ON SOLVING GRID ELLIPTIC EQUATIONS BY SOME PARALLEL ITERATIVE METHODS ON THE K60 CLUSTER *

A.M. Atayan, A.E. Chistyakov

Don State Technical University, Rostov-on-Don, Russian Federation

atayan24@mail.ru

The paper addresses the problem of preserving natural water systems and maintaining their integrity, not only through organizational, engineering and technical measures, but also through the development of highly effective mathematical modeling techniques that make it possible, on the basis of interconnected high-precision models of hydrophysics and hydrobiology, to predict quickly and efficiently the spread of pollution and the occurrence of hazardous phenomena in coastal systems. The article considers algorithms for solving grid equations developed for high-performance cluster systems. A model of parallel computations is proposed that makes it possible, when choosing an appropriate method for solving a problem of aquatic ecology, to estimate the cost of calculations, defined as the product of the parallel solution time and the number of processors used. An estimate of the optimal size of the information packet exchanged between processors is obtained. The adaptive modified alternating-triangular method of minimum corrections is described, and the results of numerical experiments for the parallel variant of the MATM are presented.

Keywords: mathematical modeling, adaptive alternating-triangular method, parallel algorithms for solving grid equations, MPI, OpenMP

Authors:

Atayan Asya Mikhailovna, Don State Technical University (344000, Rostov-on-Don, Gagarin square, 1), Senior Lecturer of the Department of Computer Engineering and Automated Systems Software, DSTU, atayan24@mail.ru, ORCID: 0000-0003-4629-1002.

Chistyakov Alexander Evgenyevich, Don State Technical University (344000, Rostov-on-Don, Gagarin square, 1), Doctor of Science in Physics and Maths, Professor of the Department of Computer Engineering and Automated Systems Software, DSTU, cheese_05@mail.ru, ORCID: 0000-0002-8323-6005.

* The study was supported by the Council for Grants of the President of the Russian Federation under scientific project No. MD-3624.2021.1.1
