
Discrete & Continuous Models & Applied Computational Science

ISSN 2658-7149 (Online), 2658-4670 (Print)

UDC 519.65:519.217

PACS 07.05.Tp, 07.05.Mh, 02.70.-c

DOI: 10.22363/2658-4670-2024-32-1-38-47

2024, 32 (1), 38-47

http://journals.rudn.ru/miph

Research article

EDN: GFROYO

Sampling of integrand for integration using shallow neural network

Alexander Ayriyan 1,2,3, Hovik Grigorian 1,2,3,4, Vladimir Papoyan 1,2,3

1 Joint Institute for Nuclear Research, 6 Joliot-Curie St, Dubna, 141980, Russian Federation

2 Alikhanyan National Science Laboratory (YerPhI), 2 Alikhanyan Brothers St, Yerevan, 0036, Republic of Armenia

3 Dubna State University, 19 Universitetskaya St, Dubna, 141980, Russian Federation

4 Yerevan State University, 1 Alex Manoogian St, Yerevan, 0025, Republic of Armenia

(received: November 13, 2023; revised: December 15, 2023; accepted: January 12, 2024)

Abstract. In this paper, we study the effect of using the Metropolis-Hastings algorithm for sampling the integrand on the accuracy of calculating the value of the integral with the use of a shallow neural network. In addition, a hybrid method for sampling the integrand is proposed, in which part of the training sample is generated by applying the Metropolis-Hastings algorithm, and the other part includes points of a uniform grid. Numerical experiments show that when integrating in high-dimensional domains, sampling of integrands both by the Metropolis-Hastings algorithm and by the hybrid method is more efficient than using a uniform grid.

Key words and phrases: Shallow Neural Network, Numerical Integration, Metropolis-Hastings Algorithm

1. Introduction

In the recent study [1], an algorithm for numerical integration was proposed based on the use of a neural network with one hidden layer. In this approach, the neural network approximates the integrand function within a bounded region that includes the integration domain. Training the neural network may certainly require a significant amount of time and computational resources. However, once the training is completed, the neural network architecture allows for the analytical integration of the approximated integrand. Furthermore, the integral of the neural network's function can be computed in any other subregion without the need for retraining. Thus, the neural network integration approach is efficient for tasks where it is necessary to repeatedly calculate the integral of the same function in different regions.

Training the neural network is the main challenge of integrand approximation. In supervised learning, the training data, and consequently the approach to sampling it, play a significant role. In the paper [1], a uniform grid-based discretization of the domain was used as the function sampling method. However, this approach is inefficient for integrands with significant variations in certain subregions.

In this article, the impact of using the Metropolis-Hastings algorithm for shape-based sampling of an integrand on the integration accuracy is considered. A hybrid approach for forming the training dataset is proposed, in which a portion of the training dataset (with a relative volume fraction denoted as p) is generated using the Metropolis-Hastings algorithm, while the other part includes nodes of a uniform grid.

© Ayriyan A., Grigorian H., Papoyan V., 2024

This work is licensed under a Creative Commons Attribution 4.0 International License https://creativecommons.org/licenses/by-nc/4.0/legalcode

2. The neural network method of approximate integration

In many areas of science and engineering there is a need for the approximate calculation of an integral of a given continuous real function f : R^n → R over a region S:

I[f] = \int_{S} f(x)\, dx.   (1)

According to the universal approximation theorem [2] and Theorem 2 in [1], any function f(x) as defined above can be approximated arbitrarily accurately by a shallow (single hidden layer) neural network f̂(x) with the logistic sigmoid activation function (3). This network can be analytically integrated within a bounded convex region S.

The mathematical expression for such a neural network can be represented by the formula:

\hat{f}(x) = b^{(2)} + W^{(2)} \sigma\left(b^{(1)} + W^{(1)} x\right) = b^{(2)} + \sum_{j=1}^{K} w_j^{(2)} \, \sigma\left(b_j^{(1)} + \sum_{i=1}^{n} w_{ji}^{(1)} x_i\right),   (2)

where:

- W^(1) and W^(2) are the weight matrices for the first and second layers of the neural network, respectively;

- b^(1) and b^(2) are the bias vectors for the first and second layers of the neural network, respectively;

- x is an n-dimensional vector of input values for the function f;

- σ is the logistic sigmoid function:

\sigma(z) = \frac{1}{1 + e^{-z}}.   (3)
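For readers who prefer code, a minimal numpy sketch of the forward pass (2) may be helpful. The weight shapes below (W1 of shape (K, n), W2 of shape (K,)) are an assumed convention for illustration, not taken from [1].

```python
# A minimal sketch of the forward pass in Eq. (2); shapes are assumptions:
# W1 is (K, n), b1 is (K,), W2 is (K,), b2 is a scalar.
import numpy as np

def sigmoid(z):                        # Eq. (3)
    return 1.0 / (1.0 + np.exp(-z))

def f_hat(x, W1, b1, W2, b2):          # Eq. (2): one hidden sigmoid layer, affine output
    return b2 + W2 @ sigmoid(b1 + W1 @ x)
```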

The logistic sigmoid function and its antiderivatives are expressed in terms of polylogarithmic functions Li_m of different orders m; the zeroth-order polylogarithm is

\mathrm{Li}_0(z) = \frac{z}{1 - z}.   (4)

In particular, the sigmoid function itself is expressed in terms of the zeroth-order polylogarithm:

\sigma(z) = \frac{1}{1 + e^{-z}} = -\mathrm{Li}_0(-e^{z}).   (5)

Note that each subsequent antiderivative of σ(z) raises the order of the polylogarithm by one. This representation of the sigmoid function allows us to integrate f̂(x) in accordance with Theorem 2 from [1].

The integration region S can be extended to S̃ such that S ⊆ S̃ and f(x) = 0 for x ∈ S̃ \ S, where S̃ is an n-dimensional hyperrectangle in R^n, specifically S̃ = [α_1, β_1] × [α_2, β_2] × ⋯ × [α_n, β_n].

Thus, the expression for the neural network integral I(f̂, α, β) is defined as:

I(\hat{f}, \alpha, \beta) = \tilde{I}[f] = b^{(2)} \prod_{i=1}^{n} (\beta_i - \alpha_i) + \sum_{j=1}^{K} \frac{w_j^{(2)}}{\prod_{i=1}^{n} w_{ji}^{(1)}} \, \Phi_j,   (6)

where Φ_j is defined as:

\Phi_j = \sum_{r=1}^{2^n} s_r \, \mathrm{Li}_n\!\left(-\exp\left(-b_j^{(1)} - \sum_{i=1}^{n} w_{ji}^{(1)} l_{i,r}\right)\right).   (7)

Here, s_r is the sign in front of the r-th term of the sigmoid integration, and l_{i,r} represents the corresponding integration limit for the i-th dimension. These quantities are defined by:

s_r = \prod_{d=1}^{n} (-1)^{\lfloor r / 2^{d-1} \rfloor},   (8)

l_{i,r} = \begin{cases} \alpha_i, & \text{if } \lfloor r / 2^{i-1} \rfloor \text{ is even}, \\ \beta_i, & \text{otherwise}. \end{cases}   (9)
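To illustrate how such a corner sum can be evaluated in practice, the sketch below integrates a single sigmoid neuron over a hyperrectangle using the n-th antiderivative of σ, namely -Li_n(-e^z), and the standard inclusion-exclusion sign over the 2^n corners. The sign and corner indexing here is a common convention and may differ from the notation of (7)-(9), so the result is checked against direct quadrature.

```python
# Sketch only: integral of one sigmoid neuron over a box via polylogarithms (mpmath).
from itertools import product
import mpmath as mp

def sigmoid_box_integral(b, w, alpha, beta):
    """Integral of sigma(b + w.x) over the box prod_i [alpha_i, beta_i].

    Uses the n-th antiderivative of sigma(z) = -Li_0(-e^z), which is -Li_n(-e^z),
    and an inclusion-exclusion sum over the 2^n corners (factor -1 for every lower
    limit). Assumes all w_i are non-zero.
    """
    n = len(w)
    total = mp.mpf(0)
    for corner in product(*zip(alpha, beta)):             # all 2^n corners of the box
        sign = (-1) ** sum(c == a for c, a in zip(corner, alpha))
        z = b + mp.fsum(wi * ci for wi, ci in zip(w, corner))
        total += sign * (-mp.polylog(n, -mp.exp(z)))       # n-th antiderivative of sigma
    return total / mp.fprod(w)

# sanity check against direct 2D quadrature
b, w = mp.mpf("0.3"), [mp.mpf("1.5"), mp.mpf("-0.7")]
analytic = sigmoid_box_integral(b, w, [0, 0], [1, 1])
numeric = mp.quad(lambda x, y: 1 / (1 + mp.exp(-(b + w[0] * x + w[1] * y))), [0, 1], [0, 1])
print(analytic, numeric)    # the two values should agree to working precision
```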

It is worth noting that an alternative to polylogarithmic functions can be the Fermi-Dirac integral:

F_n(x) = \frac{1}{\Gamma(n+1)} \int_{0}^{\infty} \frac{t^n}{e^{t-x} + 1} \, dt,   (10)

which is related to the polylogarithm as:

F_n(x) = -\mathrm{Li}_{n+1}(-e^{x}).   (11)

Then the function (7) takes on a new form:

\Phi_j = -\sum_{r=1}^{2^n} s_r \, F_{n-1}\!\left(-b_j^{(1)} - \sum_{i=1}^{n} w_{ji}^{(1)} l_{i,r}\right).   (12)

This alternative representation can be useful when the weight coefficients w_{ji}^{(1)} and biases b_j^{(1)} acquire large values due to training, causing potential data type overflow issues. However, the effectiveness of this approach depends on the specific implementation of the Fermi-Dirac integral calculation.
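As a quick illustration of identity (11), the following sketch evaluates the defining integral (10) with mpmath quadrature and compares it to the polylogarithm form; whether this representation actually helps against overflow depends, as noted above, on the specific Fermi-Dirac implementation.

```python
# Sketch: numerical check of identity (11) with mpmath.
import mpmath as mp

def fermi_dirac(n, x):
    """Complete Fermi-Dirac integral F_n(x), evaluated from its definition (10)."""
    integrand = lambda t: t**n / (mp.exp(t - x) + 1)
    return mp.quad(integrand, [0, x, mp.inf]) / mp.gamma(n + 1)

n, x = 3, mp.mpf(5)
lhs = fermi_dirac(n, x)
rhs = -mp.polylog(n + 1, -mp.exp(x))    # identity (11)
print(lhs, rhs)                          # the two values should agree
# mpmath works in arbitrary precision, so -exp(x) stays representable even for
# arguments where a double-precision exponential would already overflow.
```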

3. Sampling of integrand

The main difficulty of the considered integration approach lies in training the neural network that approximates the integrand. In other words, the main problem is to calculate the weight matrices W^(1), W^(2) and biases b^(1), b^(2) that achieve the minimum value of the objective function. A successful solution to a supervised learning problem depends on several factors, the most important of which is the generation of the training dataset.

The training set, denoted as D = {(x, f(x)) | x ∈ S_N ⊂ S} (where S_N is an N-element finite subset of S), includes the argument vector x and the corresponding function values f(x). In other words, the training set results from sampling the integrand.

In [1], a uniform grid of nodes is used as the training dataset, which leads to an insufficiently accurate approximation for families of integrands with sharp value changes in certain subregions and, hence, to unsatisfactory integration results. In particular, there would be too few learning points in regions where the shape of the function changes drastically, for instance for functions with pronounced peaks. This case is illustrated in figure 1b.

In this research, a hybrid approach is proposed for forming the training dataset, where a part of the training data D_MH is generated using the Metropolis-Hastings algorithm [3, 4], which allows sampling from an arbitrary probability distribution, while another part D_UG consists of nodes of a uniform grid. As a result, both datasets are united into the final training set D = D_MH ∪ D_UG.

The Metropolis-Hastings algorithm is based on constructing a converging Markov chain, where each iteration involves generating a new random point x from an auxiliary distribution, followed by deciding whether to accept or reject this point, using information about the value of the integrand f(x). Applying such an approach to functions with a narrow and high peak will increase the point density in areas where the function value increases, thereby improving both the integrand approximation and integration accuracy. Examples of point generation using the hybrid method and the Metropolis-Hastings algorithm are shown in figures 1d and 1c, respectively.

The main challenge in applying the Metropolis-Hastings algorithm lies in the absence of a universally efficient approach to determine the algorithm's parameters. Additionally, it is necessary to establish the number of points to generate using the Metropolis-Hastings algorithm relative to the total number of points for effective integrand approximation. This paper empirically investigates the impact of parameter determination and the proportion of points generated by the Metropolis-Hastings algorithm on the integration results.
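A minimal sketch of the hybrid sampling described above is given below: it targets a density proportional to |f(x)| on the unit cube with a Gaussian random-walk proposal and then joins the chain output with uniform-grid nodes. The proposal step size, the burn-in length, and the way the grid part is built are illustrative assumptions, since, as noted, there is no universally efficient recipe for these parameters.

```python
# Sketch of the hybrid training-set generation (assumed tuning knobs: scale, burn_in).
import numpy as np

def metropolis_hastings(f, n_dim, n_samples, scale=0.1, burn_in=1000, rng=None):
    """Random-walk Metropolis-Hastings targeting a density ~ |f| on [0,1]^n."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(size=n_dim)                       # random starting point
    density = lambda y: abs(f(y))
    samples = []
    for step in range(burn_in + n_samples):
        y = x + rng.normal(scale=scale, size=n_dim)   # symmetric Gaussian proposal
        inside = np.all((y >= 0.0) & (y <= 1.0))      # restrict the target to the unit cube
        if inside and rng.uniform() < min(1.0, density(y) / max(density(x), 1e-300)):
            x = y                                     # accept the proposal
        if step >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

def hybrid_training_set(f, n_dim, n_total, p):
    """Hybrid set D = D_UG ∪ D_MH: a share p of points from the chain, the rest from a grid."""
    n_mh = int(p * n_total)
    parts = []
    if n_mh < n_total:                                # uniform-grid part D_UG
        nodes_per_axis = max(2, round((n_total - n_mh) ** (1.0 / n_dim)))
        axes = [np.linspace(0.0, 1.0, nodes_per_axis)] * n_dim
        parts.append(np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, n_dim))
    if n_mh > 0:                                      # Metropolis-Hastings part D_MH
        parts.append(metropolis_hastings(f, n_dim, n_mh))
    x = np.vstack(parts)
    return x, np.apply_along_axis(f, 1, x)            # training pairs (x, f(x))
```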

Figure 1. Different ways to sample a function with a clearly defined peak, with fixed parameters c_1 = 0.0146162197 and c_2 = 299.985384

4. Implementation and testing

4.1. Implementation

The implementation of the approach was carried out in the Python programming language within the ML/DL ecosystem, in the computational component called jhub2 [5], using the Keras [6] and mpmath [7] Python libraries.

The number of neurons K in the hidden layer of the network was determined according to the expression:

K = \frac{K_2 N}{K_1 (n + 2)},   (13)

where K_1 = 4.33 and K_2 = 16 [1], and N is the number of elements in the sample. The mean squared error (MSE) was used as the objective function:

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(f_i - \hat{f}_i\right)^2.   (14)

The neural network was trained using the Levenberg-Marquardt backpropagation algorithm [8, 9] for 5000 epochs. The Levenberg-Marquardt method is one of the fastest and shows high convergence for small neural networks [10]. For training, 90% of the points of D were used, with the remaining 10% used for validation.
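A hedged Keras sketch of the approximator described above is given below: K sigmoid neurons in a single hidden layer and a linear output trained on the MSE objective (14). The number of neurons is assumed to be supplied from (13); the optimizer here is only a placeholder, since the Levenberg-Marquardt implementation [9] used in the paper is not part of standard Keras.

```python
# Sketch of the shallow approximator of Eq. (2) in Keras.
from tensorflow import keras

def build_shallow_net(n_dim, n_hidden):
    """n_hidden sigmoid neurons (the K of Eq. (13)), linear output, MSE loss of Eq. (14)."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_dim,)),
        keras.layers.Dense(n_hidden, activation="sigmoid"),   # hidden layer of Eq. (2)
        keras.layers.Dense(1, activation="linear"),            # b^(2) + W^(2) sigma(...)
    ])
    model.compile(optimizer="adam", loss="mse")                # placeholder optimizer
    return model

# model = build_shallow_net(n_dim=2, n_hidden=64)
# model.fit(x_train, f_train, validation_split=0.1, epochs=5000, verbose=0)
```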

The inputs and outputs of the neural network were transformed into the range [d_min, d_max] through Min-Max normalization [11]. After the training process, based on the obtained weight coefficients W^(1), W^(2) and biases b^(1), b^(2) related to the normalized function f̃, the integral is calculated according to formula (6). The integration limits α and β must be scaled to α' and β' according to the transformation of the neural network arguments. Then, according to formula (6), a scaled value of the integral I(f̃, α', β') is obtained. To obtain the integral I(f, α, β), it is necessary to rescale I(f̃, α', β') according to the expression:

I(f, \alpha, \beta) = \frac{V(S)\,(f_{\max} - f_{\min})}{V(S')\,(d_{\max} - d_{\min})}\, I(\tilde{f}, \alpha', \beta') + \left(f_{\min} - \frac{f_{\max} - f_{\min}}{d_{\max} - d_{\min}}\, d_{\min}\right) V(S).

Here f_max and f_min are the maximum and minimum values of the function in the training set D, and V(S) and V(S') are the hypervolumes of the integration domain S before and S' after the scale transformation, respectively. Since both are hyperrectangles, the volumes are:

V(S) = \prod_{i=1}^{n} (\beta_i - \alpha_i),   (15)

V(S') = \prod_{i=1}^{n} (\beta'_i - \alpha'_i).   (16)

In further calculations, d_max = 1 and d_min = -1.
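The normalization and back-scaling step can be summarized by the following sketch (targets only; the inputs are scaled analogously). It follows the rescaling expression above with d_min = -1 and d_max = 1.

```python
# Sketch of the Min-Max normalization of the targets and the back-scaling of the integral.
import numpy as np

D_MIN, D_MAX = -1.0, 1.0             # normalization range used in the calculations

def normalize_targets(f_values):
    """Min-Max normalization of the function values to [D_MIN, D_MAX]."""
    f_min, f_max = float(f_values.min()), float(f_values.max())
    f_tilde = (f_values - f_min) / (f_max - f_min) * (D_MAX - D_MIN) + D_MIN
    return f_tilde, f_min, f_max

def rescale_integral(I_tilde, f_min, f_max, vol_S, vol_S_prime):
    """Back-scaling of the integral of the normalized network, per the expression above."""
    scale = (f_max - f_min) / (D_MAX - D_MIN)
    return vol_S / vol_S_prime * scale * I_tilde + (f_min - scale * D_MIN) * vol_S
```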

The accuracy of integration is assessed by determining the number of correct digits (CD) in the approximate value of the integral obtained using the neural network:

\mathrm{CD}(I, \tilde{I}) = -\log_{10} \left| \frac{I - \tilde{I}}{I} \right|.   (17)

4.2. Functions for testing

The testing of the hybrid sampling was performed on three classes of integrands, f_1, f_2, f_3, defined within the unit cube [0,1]^n. All three classes of parameterized functions were taken from a set of functions for testing multidimensional integration algorithms compiled by Alan Genz [12].

Oscillatory function:

f_1(x) = \cos\left(2\pi u_1 + \sum_{i=1}^{n} c_i x_i\right).   (18)

Corner Peak function:

f_2(x) = \left(1 + \sum_{i=1}^{n} c_i x_i\right)^{-(n+1)}.   (19)

Continuous function:

f_3(x) = \exp\left(-\sum_{i=1}^{n} c_i \, |x_i - u_i|\right).   (20)

Here, u_i are shift parameters, with values uniformly randomly distributed in the interval [0,1]. The vector of parameters c can be used to control the complexity of integration. It is determined for each family of functions f_j separately:

c = \frac{h_j}{n^{e_j} \sum_{i=1}^{n} c'_i} \, c',   (21)

where c' is a vector of size n, with its components uniformly randomly distributed in the range [0,1]. The values of h_j and e_j are fixed for each class of functions and are presented in table 1.

Table 1
Values of integration complexity parameters

n        Parameter    f_1      f_2      f_3
n = 2    h_j          100      600      100
         e_j          1        1        1
n = 6    h_j          300      1000     200
         e_j          1.75     1.75     1.75
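For reference, a sketch of the test integrands (18)-(20) with the difficulty scaling (21) is given below; the normalization of c so that its components sum to h_j / n^{e_j} is the standard Genz convention and is assumed here.

```python
# Sketch of the Genz test integrands (18)-(20) and the parameter scaling of Eq. (21).
import numpy as np

def genz_parameters(n_dim, h, e, rng=None):
    """Draw u and c' uniformly from [0,1]^n and scale c so that sum(c) = h / n**e."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=n_dim)
    c_prime = rng.uniform(size=n_dim)
    c = c_prime * h / (n_dim**e * c_prime.sum())
    return u, c

def oscillatory(x, u, c):      # Eq. (18)
    return np.cos(2.0 * np.pi * u[0] + np.dot(c, x))

def corner_peak(x, u, c):      # Eq. (19); u is unused for this family
    return (1.0 + np.dot(c, x)) ** (-(len(x) + 1))

def continuous(x, u, c):       # Eq. (20)
    return np.exp(-np.dot(c, np.abs(np.asarray(x) - u)))

# example: parameters for the 2D corner peak family from Table 1
u, c = genz_parameters(2, h=600, e=1)
print(corner_peak(np.array([0.5, 0.5]), u, c))
```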

4.3. Testing of the hybrid approach

Each class of integrands (18)-(20) was investigated for two dimensionalities, n = 2 and n = 6, with the number of points N = 10^3, 10^4, and 10^5. The results of the computations are illustrated in figure 2.

Each point on the graph corresponds to the average value of 20 approximate integrals with varying parameters u and c. For N = 10^5, the number of computed integrals was reduced from 20 to 5 due to the lengthy neural network training.

For functions with a clear peak, the proposed method of defining the training set increases the accuracy of integration compared to training on the nodes of a uniform grid, for any number of points generated by the Metropolis-Hastings algorithm. On the other hand, a value of p near 1 can deteriorate the function approximation due to a lack of points in regions of small function values, thereby reducing integration accuracy. In the case of oscillatory and continuous functions, applying hybrid sampling does not significantly increase accuracy when n = 2. Furthermore, for large p values of 0.9 and 1.0, integration accuracy significantly decreases compared to training on the nodes of a uniform grid. However, for the case when n = 6, integration accuracy increases relative to p = 0 when p is between 0.2 and 0.7 for the oscillatory function, and between 0.1 and 0.2 for the continuous function.

It is worth noting that for functions with a clear peak, with a small number of points (N = 10^3) and n = 2, increasing p raises the accuracy by 2 digits. However, for the oscillatory function the situation is reversed: with increasing p, accuracy decreases from 4 to 2 digits. In the case of the continuous function, for most p values the accuracy remains nearly unchanged, but for large p values the average CD decreases significantly due to the low number or complete absence of uniform-grid nodes in the training set. A similar trend of decreasing integration accuracy for large p values is also observed for the oscillatory function when n = 6. However, for the function with a clear peak, the behavior of the average CD as a function of p remains the same for small N values.

With an increase in the value of N, accuracy increases for almost all p values; the dependence of CD on p keeps its shape but becomes less abrupt. It is important to note that the use of the Metropolis-Hastings algorithm for generating the training set reduces computational costs. In particular, for a function with a clear peak, the accuracy at p = 0.8 and N = 10^3 is comparable to the result at p = 0 and N = 10^5. A similar effect exists for other function classes, but only for n = 6. The choice of the optimal value of p remains an open question.

In general, the dependence of accuracy on the proportion of points may not be monotonic since it is determined by the nature of the function itself and the training set derived from this nature. Furthermore, neural network training also depends on weight initialization. Therefore, the non-monotonicity has a statistical nature and is dependent on the type of integrands. An assessment of statistical divergence requires a much larger volume of computations and could be discussed in a separate study.

5. Conclusion

In the context of this study, it was found that the application of the Metropolis-Hastings algorithm improves the accuracy of the neural network integration compared to using a uniform grid of nodes.

Figure 2. Results of the validation of hybrid sampling for functions (18)-(20): (a) oscillatory function, n = 2; (b) oscillatory function, n = 6; (c) corner peak function, n = 2; (d) corner peak function, n = 6; (e) continuous function, n = 2; (f) continuous function, n = 6. Each curve corresponds to a value of N (10^3, 10^4, 10^5); each point on the graphs is the average value of CD over 20 approximate integrals at a given p. For N = 10^5, the number of trials was reduced to 5 due to the lengthy training time.

Thus, a hybrid method for creating a training dataset has been proposed and tested. Part of the dataset is generated based on the function's values using the aforementioned algorithm, while the other part includes nodes of a uniform grid within the function's domain and their corresponding function values. To characterize the hybrid method, the relative proportion p of the sample volume obtained through pseudo-random generation was introduced, and the relationship between the accuracy of approximate integration and this parameter was investigated.

The testing was performed on three families of functions of two and six variables proposed for testing integral computation methods. It was shown that the best results, on average, were obtained when p ranged from 0.1 to 0.3. It is also worth noting the improvement of the results of the hybrid approach for the higher dimension n and the larger number of points N.

Neural network integration appears promising for certain classes of problems, since the analytical formula for the integral as a function of the integration domain parameters allows one to store the integral form of the function and to compute an approximate value analytically in any subdomain without retraining the network. This work demonstrated that sampling using methods that generate points based on the values of the integrand, in combination with uniform grid nodes, can improve the results of approximate integration by a neural network. Nevertheless, the choice of the optimal ratio of the first set of points to the second in the training dataset remains an open question.

Acknowledgments: The authors express their gratitude to the HybriLIT heterogeneous computing platform team for the opportunity to perform calculations in an ecosystem for machine learning, deep learning and data analysis problems. The authors thank Dr. Jan Busa for valuable comments while reading the manuscript.

References

1. Lloyd, S., Irani, R. A. & Ahmadi, M. Using neural networks for fast numerical integration and optimization. IEEE Access 8, 84519-84531. doi:10.1109/access.2020.2991966 (2020).

2. Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control Signals and Systems 2, 303-314. doi:10.1007/bf02551274 (Dec. 1989).

3. Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109. doi:10.1093/biomet/57.1.97 (Apr. 1970).

4. Chib, S. & Greenberg, E. Understanding the Metropolis-Hastings algorithm. The American Statistician 49, 327. doi:10.2307/2684568 (Nov. 1995).

5. Ecosystem for tasks of machine learning, deep learning and data analysis. http://hlit.jinr.ru/en/access-to-resources_eng/ecosystem-for-ml_dl_bigdataanalysis-tasks_eng/. Accessed: 2023-10-10.

6. Chollet, F. et al. Keras. https://keras.io.

7. Johansson, F. et al. mpmath: a Python library for arbitrary-precision floating-point arithmetic (version 0.18). http://mpmath.org.

8. Marquardt, D. W. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11, 431-441. doi:10.1137/0111030 (June 1963).

9. Marco, F. D. Tensorflow Levenberg-Marquardt. https://github.com/fabiodimarco/tf-levenberg-marquardt.

10. Kişi, Ö. & Uncuoğlu, E. Comparison of three back-propagation training algorithms for two case studies. Indian Journal of Engineering and Materials Sciences 12, 434-442 (Oct. 2005).

11. Han, J., Kamber, M. & Pei, J. Data mining: concepts and techniques. Third Edition. 703 pp. doi:10.1016/c2009-0-61819-5 (Elsevier Inc., 225 Wyman Street, Waltham, MA 02451, USA, 2012).

12. Genz, A. A package for testing multiple integration subroutines in Numerical Integration 337-340 (Springer Netherlands, 1987). doi:10.1007/978-94-009-3889-2_33.

To cite: Ayriyan A., Grigorian H., Papoyan V., Sampling of integrand for integration using shallow neural network, Discrete and Continuous Models and Applied Computational Science 32 (1) (2024) 38-47. DOI: 10.22363/2658-4670-2024-32-1-38-47.

Information about the authors

Ayriyan, Alexander—PhD in Physics and Mathematics, Head of sector of the Division of Computational Physics of JINR, Assistant professor of Department of Distributed Information Computing Systems of Dubna State University; Senior Researcher of AANL (YerPhI) (e-mail: ayriyan@jinr.ru, phone: +7(496)216-35-98, ORCID: https://orcid.org/0000-0002-5464-4392)

Grigorian, Hovik—Candidate of Physical and Mathematical Sciences, Senior Researcher of JINR; Senior Researcher of AANL (YerPhI); Assistant professor of Dubna State University; Assistant professor of Yerevan State University (e-mail: hovik.grigorian@gmail.com, phone: +7(496)216-27-46, ORCID: https://orcid.org/0000-0002-0003-0512)

Papoyan, Vladimir—Junior researcher of JINR, Junior researcher of AANL (YerPhI), PhD student of Dubna State University (e-mail: vlpapoyan@jinr.ru, phone: +7(496)216-35-98, ORCID: https://orcid.org/0000-0003-0025-5444)
