
MSC 34N05, 37M05, 68U20

DOI: 10.14529/mmp180111

NEW FEATURES OF PARALLEL IMPLEMENTATION OF N-BODY PROBLEMS ON GPU

S.S. Khrapov1, S.A. Khoperskov2, A.V. Khoperskov1

1 Volgograd State University, Volgograd, Russian Federation

2 Institute of Astronomy, Russian Academy of Sciences, Moscow, Russian Federation

E-mail: [email protected], [email protected], [email protected]

This paper focuses on the parallel implementation of a direct N-body method (particle-particle algorithm) and the application of multiple GPUs for galactic dynamics simulations. Application of a hybrid OpenMP-CUDA technology is considered for models with a number of particles N ~ 10^5 – 10^7. By means of N-body simulations of a gravitationally unstable stellar galactic disk we have investigated the parallelization efficiency of the algorithms for various Nvidia Tesla graphics processors (K20, K40, K80). Particular attention was paid to the parallel performance of the simulations and to the accuracy of the numerical solution by comparing single and double floating-point precision (SP and DP). We show that the double-precision simulations are slower by a factor of 1.7 than the single-precision runs performed on Nvidia Tesla K-Series processors. We also claim that application of single-precision operations leads to incorrect results in the evolution of non-axisymmetric gravitating N-body systems. In particular, it leads to significant quantitative and even qualitative distortions in the galactic disk evolution. For instance, after 10^4 integration time steps for single-precision numbers the total energy, momentum, and angular momentum of a system with N = 2^20 are conserved with accuracies of 10^-3, 10^-2 and 10^-3, respectively, whereas in the double-precision simulations these values are 10^-5, 10^-15 and 10^-13, respectively. Our estimates favour the use of second-order accuracy schemes with double-precision numbers, since this is more efficient than fourth-order schemes with single-precision numbers.

Keywords: Multi-GPU; OpenMP-CUDA; GPU-Direct; Nvidia Tesla; N-body; single and double precision numerical simulation; collisionless system; gravitational instability.

Introduction

Different N-body models are essential for theoretical studies of the dynamics of gravitating collisionless systems [1], such as galactic stellar disks, elliptical galaxies, globular clusters, and galactic dark matter haloes [2-4]. N-body models are also a fundamental tool for dark-matter-only cosmological simulations [5,6].

Our research may also contribute to the Lagrangian methods of computational fluid dynamics. Let us refer to the widely used SPH method (Smoothed Particle Hydrodynamics) for a self-gravitating gas [7,8]. In addition to astrophysical applications, the N-body method is widely used for modeling rarefied plasma, ion and electron beams, and problems of molecular dynamics, which differ in the type of interaction between the particles.

Gravitational interaction between N particles is a resource-intensive problem, and it can be solved using different approaches. There are various groups of approximate methods (for example, Particle-Mesh, SuperBox [9], Particle-Multiple-Mesh/Nested-Grid Particle-Mesh [10], TreeCode [11], Fast Multipole Method [12]) which significantly reduce the computation time in comparison to the direct calculation of the gravitational forces between all pairs of particles (the so-called Particle-Particle, or PP, method), which has a complexity of O(N^2). However, the PP approach provides the best accuracy for the total gravitational force calculation, and it serves as a benchmark for testing the approximate methods.

In computational astrophysics, the problem of transferring software to new hardware platforms has become relevant due to the wide availability of powerful computing systems based on graphics processors. The result of a parallel software implementation and its efficiency depend strongly on the features of the code and the sequence of numerical operations [13,14].

To increase the spatial resolution for a large number of particles N, researchers often use second-order time integration schemes with single-precision numbers. On the other hand, such schemes are very efficient for long-term integration, e.g., the evolution of galactic systems on cosmological time scales (up to hundreds of disk rotations, or ~ 10^5 – 10^6 integration time steps).

This approach is justified for CPU calculations; however, for parallel GPUs it can lead to unphysical results because of the parallel features of the hardware. Our work aims to provide a computational characteristics analysis of the parallel program for N-body simulations on GPUs.

1. Basic Equations and Numerical Scheme

The N-body problem consists in solving the equations of motion for N gravitationally interacting particles:

$$\frac{d v_i}{dt} = \sum_{j=1,\,j\neq i}^{N} f_{ij}, \qquad i = 1, \ldots, N, \qquad (1)$$

where v_i is the velocity vector of the i-th particle. The gravitational interaction force between the i-th and j-th particles is:

$$f_{ij} = -G\,\frac{m_j\,(r_i - r_j)}{|r_i - r_j + \delta|^3}, \qquad (2)$$

where G is the gravitational constant, m_j is the mass of the j-th particle, δ is the gravitational softening length, and the radius-vector r_i(t) = ∫ v_i dt determines the position of the i-th particle in space; in a Cartesian coordinate system we use |r_i - r_j + δ| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 + \delta^2}. The small parameter δ ensures the collisionless character of the system.
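For illustration, a minimal sketch of the softened pairwise acceleration (2) in CUDA C is given below; the names (pair_accel, Body) and the choice of double precision are our own assumptions for the sketch, not taken from the authors' code:

```cuda
// Sketch only: acceleration exerted on particle i by particle j according to (2);
// the softening delta enters as delta^2 under the square root.
struct Body { double x, y, z, m; };

__device__ void pair_accel(const Body bi, const Body bj,
                           double G, double delta2,
                           double &ax, double &ay, double &az)
{
    double dx = bj.x - bi.x;                              // components of r_j - r_i
    double dy = bj.y - bi.y;
    double dz = bj.z - bi.z;
    double r2 = dx * dx + dy * dy + dz * dz + delta2;     // |r_i - r_j|^2 + delta^2
    double inv_r3 = 1.0 / (r2 * sqrt(r2));                // 1 / |r_i - r_j + delta|^3
    ax += G * bj.m * dx * inv_r3;                         // f_ij = G m_j (r_j - r_i)/|...|^3
    ay += G * bj.m * dy * inv_r3;
    az += G * bj.m * dz * inv_r3;
}
```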

The equilibrium model of the collisionless stellar disk in the radial direction is set by the balance between the disk self-gravity, rotation, and chaotic (thermal) motions [15]:

$$\frac{\bar{v}_\varphi^2}{r} = \frac{\partial \Phi}{\partial r} + \frac{c_r^2}{r}\left(1 - \frac{c_\varphi^2}{c_r^2} + \frac{r}{\varrho c_r^2}\,\frac{\partial(\varrho c_r^2)}{\partial r} + \frac{r}{c_r^2}\,\frac{\partial \langle v_r v_z \rangle}{\partial z}\right), \qquad (3)$$

here Φ is the gravitational potential, ϱ is the density of the disk, (r, φ, z) are the cylindrical coordinates, c_r is the radial velocity dispersion, c_φ is the azimuthal velocity dispersion, v_r, v_φ, v_z are the radial, azimuthal, and vertical velocity components, respectively, and ⟨...⟩ is the averaging operation.

Vertical equilibrium for the geometrically thin disk is the following:

$$\frac{\partial^2 \varrho}{\partial z^2} - \frac{1}{\varrho}\left(\frac{\partial \varrho}{\partial z}\right)^2 + \frac{4\pi G \varrho}{c_z^2}\left[\varrho - \varrho_*\right] = 0, \qquad (4)$$

Fig. 1. (a) The two-level scheme of parallelization with OpenMP-CUDA. (b) Architecture of 2xCPU+6xGPU

where

$$\varrho_* = \frac{1}{4\pi G r}\,\frac{\partial V_c^2}{\partial r} - \frac{1}{4\pi G}\,\frac{\partial}{\partial z}\!\left[\frac{1}{r \varrho}\,\frac{\partial \bigl(r \varrho \langle v_r v_z \rangle\bigr)}{\partial r}\right],$$

c_z is the vertical velocity dispersion, and V_c is the circular velocity in the disk equatorial plane (z = 0).

In (1), (2) we neglect the gravitational interaction with the dark matter halo, the stellar bulge, and the gaseous component [9, 16, 17]. The iterative procedure of the disk initial conditions generation for this model is described in [1, 15, 18]. Note that, in such a model, the stellar disk is completely self-gravitating, and the following analysis provides an upper limit for numerical errors in direct N-body integration.

Fig. 2. Scheme of the parallel algorithm for calculation of the gravitational interactions between particles on Multi-GPU

Next, we describe the main features of the time integration of the N-body system. For the second-order time integration of equations (1), we used a leapfrog scheme with a fixed step size of 0.2 Myr. This approach is also known as the "Kick-Drift-Kick" or KDK scheme. In order to speed up the integration, this method calls the gravity solver for (1) only once per time step. The main steps of the leapfrog method for self-gravitating N-body models are as follows:

(I) The velocity vector at the predictor sub-step, at the moment t + Δt, is given by

$$\tilde{v}_i(t + \Delta t) = v_i(t) + \Delta t \sum_{j=1,\,j\neq i}^{N} f_{ij}(t), \qquad (5)$$

where Δt is the full time step.

(II) Next we update the positions of the particles r_i at time t + Δt as

$$r_i(t + \Delta t) = r_i(t) + \frac{\Delta t}{2}\left[\tilde{v}_i(t + \Delta t) + v_i(t)\right]. \qquad (6)$$

After this step we re-calculate the accelerations of the particles f_ij(t + Δt) according to (2).

(III) During the corrector step the velocities v_i are recalculated at time t + Δt:

$$v_i(t + \Delta t) = \frac{v_i(t) + \tilde{v}_i(t + \Delta t)}{2} + \frac{\Delta t}{2}\sum_{j=1,\,j\neq i}^{N} f_{ij}(t + \Delta t). \qquad (7)$$

As is clearly seen, the KDK scheme (5) – (7) increases the N-body solver performance by a factor of 2 in comparison to second-order Runge-Kutta schemes, because the gravitational interaction between the particles is calculated only once per integration time step.
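To make the sequence (5) – (7) explicit, here is a compact host-side sketch of one KDK step (CUDA-compatible C++); all names are illustrative placeholders, and compute_forces is a plain O(N^2) evaluation of (2), not the authors' GPU kernel:

```cuda
#include <cmath>
#include <cstddef>

struct Vec3 { double x, y, z; };

// Direct O(N^2) evaluation of f_i = sum_{j != i} f_ij according to (2).
static void compute_forces(std::size_t n, const Vec3 *r, const double *m,
                           double G, double delta, Vec3 *f)
{
    double delta2 = delta * delta;
    for (std::size_t i = 0; i < n; ++i) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            double dx = r[j].x - r[i].x;
            double dy = r[j].y - r[i].y;
            double dz = r[j].z - r[i].z;
            double r2 = dx * dx + dy * dy + dz * dz + delta2;
            double s  = G * m[j] / (r2 * std::sqrt(r2));
            ax += s * dx;  ay += s * dy;  az += s * dz;
        }
        f[i].x = ax;  f[i].y = ay;  f[i].z = az;
    }
}

// One KDK (leapfrog) step (5)-(7); v_pred is scratch storage for the predicted velocity.
void kdk_step(std::size_t n, double dt, Vec3 *r, Vec3 *v, Vec3 *f,
              const double *m, double G, double delta, Vec3 *v_pred)
{
    for (std::size_t i = 0; i < n; ++i) {        // (5) kick: v~(t+dt) = v(t) + dt * f(t)
        v_pred[i].x = v[i].x + dt * f[i].x;
        v_pred[i].y = v[i].y + dt * f[i].y;
        v_pred[i].z = v[i].z + dt * f[i].z;
    }
    for (std::size_t i = 0; i < n; ++i) {        // (6) drift: r += dt/2 * (v~ + v)
        r[i].x += 0.5 * dt * (v_pred[i].x + v[i].x);
        r[i].y += 0.5 * dt * (v_pred[i].y + v[i].y);
        r[i].z += 0.5 * dt * (v_pred[i].z + v[i].z);
    }
    compute_forces(n, r, m, G, delta, f);        // forces f(t+dt) at the new positions
    for (std::size_t i = 0; i < n; ++i) {        // (7) kick: v = (v + v~)/2 + dt/2 * f(t+dt)
        v[i].x = 0.5 * (v[i].x + v_pred[i].x) + 0.5 * dt * f[i].x;
        v[i].y = 0.5 * (v[i].y + v_pred[i].y) + 0.5 * dt * f[i].y;
        v[i].z = 0.5 * (v[i].z + v_pred[i].z) + 0.5 * dt * f[i].z;
    }
}
```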

Fig. 3. Flow diagram for the calculation module

2. Parallel Algorithm Structure

We have developed a parallel algorithm of N-body integration based on the hybrid parallelization technology OpenMP-CUDA (Fig. 1, 2). We parallelized the numerical scheme of N-body dynamics described above (5) – (7) by using OpenMP-CUDA and GPU-Direct technologies. Figures 1a and 1b demonstrate the schemes of the two-level parallelization of OpenMP-CUDA and the communications between GPUs based on GPU-Direct technology.

Fig. 4. Calculation time (in seconds) for galactic stellar disk evolution on various GPUs

The parallel algorithm of N-body dynamics can be used only on computational systems with shared CPU + k×GPU memory, taking into account the two-level OpenMP-CUDA scheme (Fig. 1a). OpenMP creates k parallel threads, one for each of the k GPUs; each GPU computes the dynamics of N/k particles, and fast data exchange between the GPUs is carried out via the PCI-e bus based on GPU-Direct technology. Note that for multiprocessor computing systems (2 or more CPUs on the same motherboard), GPU-Direct technology works only for GPUs connected to PCI-e buses under the control of one processor (Fig. 1b). Previously we applied a similar approach for the hydrodynamic SPH code [3].
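A schematic sketch of this two-level layout is given below (hypothetical names; the GFC and US kernel launches discussed below are indicated only by comments, so this is an assumed skeleton rather than the authors' code):

```cuda
#include <omp.h>
#include <cuda_runtime.h>

// Sketch: one OpenMP host thread per GPU; GPU number g integrates particles
// [g*N/k, (g+1)*N/k) while reading the positions of all N particles.
void advance_one_step(int k_gpu, int n_total, double dt)
{
    int chunk = n_total / k_gpu;                  // N/k particles per GPU
    #pragma omp parallel num_threads(k_gpu)
    {
        int g = omp_get_thread_num();
        cudaSetDevice(g);                         // bind this host thread to GPU g
        int first = g * chunk;
        (void)first; (void)dt;                    // would be passed to the kernels below
        // 1) GFC kernel: gravitational forces (2) for particles [first, first+chunk);
        //    remote particle data is read via GPU-Direct (peer-to-peer over PCI-e).
        //    gfc_kernel<<<grid, block>>>(first, chunk, n_total, ...);
        // 2) US kernel: KDK update (5)-(7) for the same block of particles.
        //    us_kernel<<<grid, block>>>(first, chunk, dt, ...);
        cudaDeviceSynchronize();                  // this GPU has finished its block
    }                                             // implicit barrier: all GPUs are done
}
```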

Fig. 2 shows the data flows between different types of GPU memory during the calculation of the gravitational forces between particles (2) on Multi-GPU architectures. Each computational CUDA kernel first copies the data of the i-th and j-th particles from the slow Global Memory to the fast Shared Memory, and then, after the synchronization of all parallel CUDA threads, the gravitational interaction between the particles is calculated according to (2). This parallel algorithm allows us to accelerate the computations by a factor of 3 – 4 due to the fast Shared Memory [3,19]. We emphasize that the transfer of data stored on different GPUs from Global Memory to Shared Memory proceeds via the PCI-e bus using GPU-Direct technology. The computational algorithm for the N-body system dynamics consists of two main Global CUDA Kernels, which are launched on multiple GPUs from the CPU by using OpenMP technology:
- The Gravity Force Computation (GFC) is a CUDA kernel for the calculation of the gravitational forces between particles (2). It is characterized by a computational complexity of O(N^2).
- The Update System (US) is a CUDA kernel for the calculation of the particle positions and velocities at the next step of the KDK scheme (5) – (7). Here the computational complexity is O(N).
Fig. 3 shows the sequence of execution of the main Global CUDA Kernels.
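As an illustration of the GFC structure, here is a minimal shared-memory tiled kernel of the standard kind (see, e.g., [19]); it is a generic sketch with assumed names, to be launched with blockDim.x == TILE, and not the authors' exact kernel:

```cuda
#include <cuda_runtime.h>

#define TILE 256   // block size; each block stages TILE j-particles in shared memory

// body[i] = (x, y, z, m); acc[i] receives the total acceleration of particle i.
__global__ void gravity_force_kernel(int n, const double4 *body, double3 *acc,
                                     double G, double delta2)
{
    __shared__ double4 tile[TILE];                          // fast shared-memory buffer
    int i = blockIdx.x * blockDim.x + threadIdx.x;          // particle handled by this thread
    double4 bi = (i < n) ? body[i] : make_double4(0.0, 0.0, 0.0, 0.0);
    double ax = 0.0, ay = 0.0, az = 0.0;

    for (int start = 0; start < n; start += TILE) {
        int j = start + threadIdx.x;                        // each thread loads one j-particle
        tile[threadIdx.x] = (j < n) ? body[j] : make_double4(0.0, 0.0, 0.0, 0.0);
        __syncthreads();                                    // tile fully loaded
        for (int k = 0; k < TILE && start + k < n; ++k) {
            double dx = tile[k].x - bi.x;                   // j == i gives zero contribution,
            double dy = tile[k].y - bi.y;                   // since dx = dy = dz = 0 and
            double dz = tile[k].z - bi.z;                   // r2 = delta2 > 0
            double r2 = dx * dx + dy * dy + dz * dz + delta2;
            double s  = G * tile[k].w / (r2 * sqrt(r2));    // G m_j / |r_i - r_j + delta|^3
            ax += s * dx;  ay += s * dy;  az += s * dz;
        }
        __syncthreads();                                    // done with this tile
    }
    if (i < n) acc[i] = make_double3(ax, ay, az);
}
```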

3. Main Results

We have studied the parallelization efficiency and accuracy of our algorithm by means of simulations of the gravitationally unstable collisionless disk [1,20,21]. The calculations were carried out on Nvidia Tesla GPUs: K20, K40, K80.

In Fig. 4 we show the computation time (one integration step) for various GPUs. The calculation time with double precision on one Tesla K80 GPU is 15% larger than on the Tesla K40 GPU. This is due to the different speeds of access to Global Memory and Shared Memory on these GPUs [3]. The speed of access to global memory on the K80 GPU is greater than on the K40 GPU, while for shared memory the situation is the opposite.

In Table 1 we present the calculation time for simulations with various numbers of particles performed on different GPUs for single and double precision. The computation time depends quadratically on the number of particles, which corresponds to the O(N^2) algorithm complexity. Table 1 also shows that an increase in the number of GPUs leads to an almost linear increase in computing performance.

Table 1

The dependence of the one integration step calculation time t_GPU [s] on the number of particles N

N × 1024    t_GPU [s], single precision      t_GPU [s], double precision
            1×GPU    2×GPU    4×GPU          1×GPU    2×GPU    4×GPU
128         0.4      0.2      0.1            0.9      0.5      0.3
256         1.7      0.9      0.45           3.7      2        1
512         6.9      3.6      1.8            15       7.9      4
1024        27.4     14.4     7.4            60       31.6     16.2
2048        109.6    57.6     29.6           240      126.4    64.8
4096        438      230      118            960      506      259
8192        1754     922      474            3840     2022     1037

The speed of our algorithm on the Nvidia Tesla K-Series processors for double precision is lower by a factor of 1.5 – 2.2 in comparison to single precision, depending on the particle number and the GPU type. The average speed-up for a single GPU is approximately 1.7.
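For reference, a common way to obtain the SP and DP versions from one source is a compile-time type alias; this is a generic illustration, not necessarily how the authors' code is organized:

```cuda
#include <cuda_runtime.h>

// Build with -DUSE_DOUBLE for the double-precision (DP) version,
// without it for the single-precision (SP) version.
#ifdef USE_DOUBLE
typedef double  real;
typedef double4 real4;
#else
typedef float   real;
typedef float4  real4;
#endif

// Kernels, host buffers and constants are declared with 'real'/'real4',
// so the two builds being compared differ only in this definition
// (and, possibly, in compiler flags such as --use_fast_math).
```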

Let us consider the accuracy of the most important integral physical conservation laws. For total energy, momentum and angular momentum, we have the following expressions:

$$E = \sum_{i=1}^{N} \frac{m_i |v_i|^2}{2} - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1,\,j\neq i}^{N} \frac{G\, m_i m_j}{|r_i - r_j + \delta|}, \qquad (8)$$

$$P = \sum_{i=1}^{N} m_i v_i, \qquad (9)$$

$$L = \sum_{i=1}^{N} m_i\, [\,r_i \times v_i\,]_z. \qquad (10)$$
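A direct way to monitor (8) – (10) during a run is sketched below (host-side C++ within the same CUDA source; the names and the O(N^2) host loop are our illustrative assumptions, used only as a correctness check):

```cuda
#include <cmath>
#include <cstddef>

struct BodyState { double x, y, z, vx, vy, vz, m; };

// Direct evaluation of E (8), P (9) and L_z (10) for N bodies.
void diagnostics(const BodyState *b, std::size_t n, double G, double delta,
                 double &E, double &Px, double &Py, double &Pz, double &Lz)
{
    E = Px = Py = Pz = Lz = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double v2 = b[i].vx * b[i].vx + b[i].vy * b[i].vy + b[i].vz * b[i].vz;
        E  += 0.5 * b[i].m * v2;                                // kinetic part of (8)
        Px += b[i].m * b[i].vx;                                 // (9)
        Py += b[i].m * b[i].vy;
        Pz += b[i].m * b[i].vz;
        Lz += b[i].m * (b[i].x * b[i].vy - b[i].y * b[i].vx);   // (10), z-component
        for (std::size_t j = i + 1; j < n; ++j) {               // potential part of (8)
            double dx = b[i].x - b[j].x;
            double dy = b[i].y - b[j].y;
            double dz = b[i].z - b[j].z;
            double r  = std::sqrt(dx * dx + dy * dy + dz * dz + delta * delta);
            E -= G * b[i].m * b[j].m / r;                       // each pair counted once
        }
    }
}
```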

Fig. 5 demonstrates the evolution of the total energy error for the entire system of particles. Obviously, this quantity accumulates faster for single precision. The angular momentum error for double precision varies in the range of 10^-16 – 10^-13, while for the single-precision format it is larger by four orders of magnitude. Note, however, that the center of mass of the stellar disk system for SP and DP (Fig. 6) is roughly the same.

For SP all dependencies demonstrate a sharp increase in the errors of the integral quantities at t ~ 5 (Figs. 5, 6). The latter is caused by the emergence of strong asymmetric disturbances due to the development of gravitational instability in the simulated disk (Fig. 7).

As seen from Table 2, with an increase in the number of particles the errors in the conservation of energy, momentum, and angular momentum for SP grow faster than O(N^1.5), while for DP they remain within the round-off errors.

Fig. 5. Evolution of the relative deviations of the total energy (top panel) and angular momentum (bottom panel) for the stellar disk system of N = 2^20 particles. Red and blue lines are the single precision and double precision, respectively. The period of rotation of the stellar disk at the periphery is ≈ 4

In Fig. 7 we show the impact of the total energy, momentum, and angular momentum conservation on the evolution of the stellar galactic disk. There are no qualitative differences in the disk surface density distributions for SP and DP until t ≈ 5. However, at later times (t > 7), we obtain significant quantitative and qualitative distortions of the simulation results for SP in comparison to the more accurate DP-based simulation.

Conclusion

In this work, we analyze N-body simulations on GPUs using a direct method of gravitational force calculation (Particle-Particle algorithm) and parallel OpenMP-CUDA technologies. We found that the use of single-precision numbers in the second-order accuracy scheme leads to incorrect results for the evolution of non-axisymmetric gravitating N-body systems.

Fig. 6. Evolution of the absolute deviations of the momentum (top panel) and the center of mass (bottom panel) for the system consisting of N = 2^20 particles. Red and blue lines are single precision and double precision, respectively

Table 2

Maximal deviation of the total momentum, total energy, and angular momentum as a function of the number of particles

N × 1024   |ΔP|max, SP   |ΔP|max, DP      |ΔL|max/L0, SP   |ΔL|max/L0, DP   |ΔE|max/E0, SP   |ΔE|max/E0, DP
128        0.0040954     2.504·10^-15     0.0002052        2.621·10^-14     0.0002132        4.602·10^-5
256        0.0055965     1.923·10^-15     0.0009059        3.553·10^-14     0.0005128        7.409·10^-5
512        0.0138642     3.074·10^-15     0.0020217        7.017·10^-14     0.0004764        6.474·10^-5
1024       0.0421156     4.278·10^-15     0.0078611        7.327·10^-14     0.0029361        5.627·10^-5


Fig. 7. Distribution of the surface density in the stellar disk (N = 2^20) at different times for single precision (left panels, σ_SP), double precision (center panels, σ_DP) and |Δσ| = |σ_DP − σ_SP| (right panels)

We claim that this is due to significant violation of the conservation of momentum, angular momentum, and energy at modeling times exceeding 10^4 integration time steps, which, in turn, is the result of the accumulation of round-off errors in the calculation of the gravitational forces. This effect is most pronounced for large numbers of particles N.

Acknowledgements. SSK is thankful to the RFBR (grants 16-07-01037 and 16-02-00649). SAK gratefully acknowledges funding from the Russian Foundation for Basic Research (16-32-60043). AVKh is thankful to the Ministry of Education and Science of the Russian Federation (government task 2.852.2017/4.6). The authors also wish to thank Yulia Venichenko for useful comments which helped improve the paper.

References

1. Fridrrian A.M., Khoperskov A.V. Physics of Galactic Disks. Cambridge: Cambridge International Science Publishing, 2013.

2. Kennedy G.F., Meiron Y., Shukirgaliyev B., Panamarev T., Berczik P. et al. The DRAGON Simulations: Globular Cluster Evolution with a Million Stars. Monthly Notices of the Royal Astronomical Society, 2016, vol. 458, no. 2, pp. 1450-1465. DOI: 10.1093/mnras/stw274


3. Khrapov S., Khoperskov A. Smoothed-Particle Hydrodynamics Models: Implementation Features on GPUs. Communications in Computer and Information Science, 2017, vol. 793, pp. 266-277. DOI: 10.1007/978-3-319-71255-0_21

4. Smirnov A.A., Sotnikova N.Ya., Koshkin A.A. Simulations of Slow Bars in Anisotropic Disk Systems. Astronomy Letters, 2017, vol. 43, no. 2, pp. 61-74. DOI: 10.1134/S1063773717020062

5. Comparat J., Prada F., Yepes G., Klypin A. Accurate Mass and Velocity Functions of Dark Matter Haloes. Monthly Notices of the Royal Astronomical Society, 2017, vol. 469, no. 4, pp. 4157-4174. DOI: 10.1093/mnras/stx1183

6. Knebe A., Stoppacher D., Prada F., Behrens C., Benson A. et al. Multidark-Galaxies: Data Release and First Results. Monthly Notices of the Royal Astronomical Society, 2018, vol. 474, no. 4, pp. 5206-5231. DOI: 10.1093/mnras/stx2662

7. Hwang J.-S., Park C. Effects of Hot Halo Gas on Star Formation and Mass Transfer During Distant Galaxy-Galaxy Encounters. The Astrophysical Journal, 2015, vol. 805, pp. 131-149. DOI: 10.1088/0004-637X/805/2/131

8. Portaluri E., Debattista V., Fabricius M., Cole D.R., Corsini E. et al. The Kinematics of σ-drop Bulges from Spectral Synthesis Modelling of a Hydrodynamical Simulation. Monthly Notices of the Royal Astronomical Society, 2017, vol. 467, no. 1, pp. 1008-1015. DOI: 10.1093/mnras/stx172

9. Khoperskov A.V., Just A., Korchagin V.I., Jalali M.A. High Resolution Simulations of Unstable Modes in a Collisionless Disc. Astronomy and Astrophysics, 2007, vol. 473, pp. 31-40. DOI: 10.1051/0004-6361:20066512

10. Gelato S., Chernoff D.F., Wasserman I. An Adaptive Hierarchical Particle-Mesh Code with Isolated Boundary Conditions. The Astrophysical Journal, 1997, vol. 480, pp. 115-131. DOI: 10.1086/303949

11. Barnes J., Hut P. A Hierarchical O(N log N) Force-Calculation Algorithm. Nature, 1986, vol. 324, pp. 446-449. DOI: 10.1038/324446a0

12. Greengard L. The Numerical Solution of the N-Body Problem. Computers in Physics, 1990, vol. 4, pp. 142-152. DOI: 10.1063/1.4822898

13. Huang S.-Y., Spurzem R., Berczik P. Performance Analysis of Parallel Gravitational N-Body Codes on Large GPU Clusters. Research in Astronomy and Astrophysics, 2016, vol. 16, no. 1, p. 11. DOI: 10.1088/1674-4527/16/1/011

14. Steinberg O.B. Circular Shift of Loop Body - Programme Transformation, Promoting Parallelism. Bulletin of the South Ural State University. Series: Mathematical Modelling, Programming and Computer Software, 2017, vol. 10, no. 3, pp. 120-132. DOI: 10.14529/mmp170310

15. Khoperskov A., Bizyaev D., Tiurina N., Butenko M. Numerical Modelling of the Vertical Structure and Dark Halo Parameters in Disc Galaxies. Astronomische Nachrichten, 2010, vol. 331, pp. 731-745. DOI: 10.1002/asna.200911402

16. Khoperskov A.V., Khoperskov S.A., Zasov A.V., Bizyaev D.V., Khrapov S.S. Interaction between Collisionless Galactic Discs and Non-Axisymmetric Dark Matter Haloes. Monthly Notices of the Royal Astronomical Society, 2013, vol. 431, pp. 1230-1239. DOI: 10.1093/mnras/stt245

17. Khoperskov S.A., Vasiliev E.O., Khoperskov A.V., Lubimov V.N. Numerical Code for Multi-Component Galaxies: from N-Body to Chemistry and Magnetic Fields. Journal of Physics: Conference Series, 2014, vol. 510, pp. 1-13. DOI: 10.1088/1742-6596/510/1/012011

18. Rodionov S.A., Athanassoula E., Sotnikova N.Ya. An Iterative Method for Constructing Equilibrium Phase Models of Stellar Systems. Monthly Notices of the Royal Astronomical Society, 2009, vol. 392, no. 2, pp. 904-916. DOI: 10.1111/j.1365-2966.2008.14110.x

19. Belleman R.G., Bédorf J., Portegies Zwart S.F. High Performance Direct Gravitational N-Body Simulations on Graphics Processing Units: An Implementation in CUDA. New Astronomy, 2008, vol. 13, pp. 103-112. DOI: 10.1016/j.newast.2007.07.004

20. Griv E., Wang H.-H. Density Wave Formation in Differentially Rotating Disk Galaxies: Hydrodynamic Simulation of the Linear Regime. New Astronomy, 2014, vol. 30, pp. 8-27. DOI: 10.1016/j.newast.2014.01.001

21. Romeo A., Falstad N. A Simple and Accurate Approximation for the Q Stability Parameter in Multicomponent and Realistically Thick Discs. Monthly Notices of the Royal Astronomical Society, 2013, vol. 433, no. 2, pp. 1389-1397. DOI: 10.1093/mnras/stt809

Received January 22, 2018

UDC 502.57 DOI: 10.14529/mmp180111

FEATURES OF THE PARALLEL IMPLEMENTATION OF NUMERICAL N-BODY MODELS ON GPU

S.S. Khrapov1, S.A. Khoperskov2, A.V. Khoperskov1

1 Volgograd State University, Volgograd, Russian Federation

2 Institute of Astronomy, Russian Academy of Sciences, Moscow, Russian Federation

We consider the features of a parallel implementation of direct gravitational N-body simulations on multiple GPUs using GPU-Direct technology. The parallel algorithm for solving the N-body problem, based on the hybrid OpenMP-CUDA parallelization technology, is described in detail for particle numbers N ~ 10^5 – 10^7. The parallelization efficiency of our algorithm is investigated for various GPUs of the Nvidia Tesla line (K20, K40, K80) in simulations of the dynamics of a gravitationally unstable stellar galactic disk. The performance and accuracy of the simulations with single- and double-precision numbers are studied. For example, on the Nvidia Tesla K80 the double-precision computation time is only a factor of 1.85 larger than for single precision. We show that using single-precision numbers in GPU simulations of strongly non-axisymmetric systems of interacting N bodies with second-order time integration schemes is incorrect, since it leads to significant quantitative and qualitative distortions of the result. For instance, with single-precision numbers, after 10^4 time steps the total energy, momentum, and angular momentum of an N-body system (N = 2^20) are conserved with accuracy no better than 2·10^-3, 4·10^-2 and 7·10^-3, respectively. With double-precision numbers these conservation laws hold to better than 5·10^-5, 10^-15 and 10^-13, respectively. Our estimates show that, on the performance-accuracy scale, the use of second-order time integration schemes together with double-precision numbers is 20-30% more efficient than fourth-order schemes with single-precision numbers.

Keywords: Multi-GPU; OpenMP-CUDA; GPU-Direct; Nvidia Tesla; N-body problem; single and double precision of numerical solutions; stellar galactic disk; gravitational instability.


The work was supported by the Ministry of Education and Science of the Russian Federation (government task no. 2.852.2017/4.6) and by the RFBR (grants 16-07-01037, 16-02-00649, 16-32-60043).

Sergey Sergeevich Khrapov, Candidate of Physical and Mathematical Sciences, Associate Professor, Department of Information Systems and Computer Modelling, Volgograd State University (Volgograd, Russian Federation), [email protected].

Sergey Alexandrovich Khoperskov, Candidate of Physical and Mathematical Sciences, Research Fellow, Department of Physics of Stellar and Planetary Systems, Institute of Astronomy of the Russian Academy of Sciences (Moscow, Russian Federation), [email protected].

Alexander Valentinovich Khoperskov, Doctor of Physical and Mathematical Sciences, Professor, Head of the Department of Information Systems and Computer Modelling, Volgograd State University (Volgograd, Russian Federation), [email protected].

Received January 22, 2018
