CC BY

Journal of Siberian Federal University. Mathematics & Physics 2013, 6(2), 168-173

UDC 519.24

Software Implementation of Numerical Operations on Random Variables

Boris S. Dobronets*

Institute of Space and Information Technology, Siberian Federal University, Kirenskogo, 26, Krasnoyarsk, 660074

Russia

Artem M. Krantsevich, Nikolay M. Krantsevich

Institute of Mathematics and Computer Science, Siberian Federal University, Svobodny, 79, Krasnoyarsk, 660041

Russia

Received 23.11.2012, received in revised form 26.12.2012, accepted 26.01.2013

We consider a software implementation of numerical operations on different types of random variables and introduce algorithms for arithmetic operations on random variables represented by their probability densities. We also estimate the accuracy of these algorithms and compare the accuracy of histogram arithmetic and the Monte Carlo methods.

Keywords: numerical operations on random variables, histogram arithmetic, second order histogram, Monte Carlo method, interval mathematics.

Introduction

Statistical methods are being increasingly used in a wide range of applications. The presence of uncertainties in the input data of many practical problems motivates the need for methods that take them into account.

Analytical methods of probabilistic analysis are limited and cannot be used for most applications. Monte Carlo methods are a powerful and versatile approach that is widely used for stochastic modeling. Despite their strengths, they have some shortcomings; one of the most serious is slow convergence.

In some cases, alternatives include interval analysis [1], approaches relying on numerical operations on random variables [2-5], and numerical methods of probabilistic analysis [6,7].

The packages that numerically implement histogram arithmetic [2,3,5] have some drawbacks, because arithmetic operations on random variables largely employ the Cartesian product of subintervals, which significantly affects the accuracy of the results.

1. The probability density function types

In this section we list different types of probability density functions of random variables, on which we consider arithmetic operations.

* BDobronets@sfu-kras.ru; takrantsevich@gmail.com; krantsevich@gmail.com. © Siberian Federal University. All rights reserved.

Discrete random variables. A discrete random variable $\xi$ assumes values $x_1, x_2, \ldots, x_n$, each with probability $p(x_i)$. The function $p(x)$ is sometimes called the probability function or the probability density.

Histograms. A histogram is a random variable whose density function is piecewise constant. The histogram $P$ is defined by the grid $\{x_i \mid i = 0, \ldots, n\}$; on each interval $[x_{i-1}, x_i]$, $i = 1, \ldots, n$, the histogram takes the constant value $p_i$.

Interval histograms. In most applications it is impossible to obtain an accurate probability density function, so we estimate it from below and above. Such estimates are usually approximated by intervals. A random variable is called an interval histogram if its probability density function $P(x)$ is a piecewise-interval function.

Second order histograms. In the case of epistemic uncertainty second order histograms are also used, along with interval histograms. The probability density function P(x) of a second order histogram is a piecewise-histogram function, i.e., a histogram such that each column of it is again a histogram [8].

Piecewise linear functions. Piecewise linear functions can also be used to approximate the density function of a random variable. A piecewise linear function is a continuous function that is linear on each segment $[x_{i-1}, x_i]$, $i = 1, \ldots, n$.

Splines. A spline is a sufficiently smooth piecewise-polynomial function. We consider random variables whose probability density functions are approximated by splines.

Analytically given probability density. We also consider random variables whose probability density is given analytically.

2. Operations on probability densities of random variables

Here we consider the arithmetic operations on probability density functions of different kinds.

Operations on discrete values. Let $* \in \{+, -, \cdot, /, \uparrow\}$ be an arithmetic operation on two independent discrete random variables $\xi$ and $\eta$, where $\xi$ and $\eta$ take the values $x_i$ and $y_j$ with probabilities $p_i$ and $q_j$, respectively. The result of this arithmetic operation is a random variable $\zeta$ that assumes the values $x_i * y_j$ with probabilities $p_i q_j$.

Because the number of values of the result can grow as the product of the numbers of values of the operands (a "combinatorial explosion"), it is necessary to transform discrete random variables to other types, e.g., a histogram. For this purpose, the algorithms of [9], for example, can be used.
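The operation on discrete values can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `combine_discrete` and the dice example are hypothetical and are not taken from the package described in the paper.

```python
from collections import defaultdict
import operator


def combine_discrete(xs, ps, ys, qs, op=operator.add):
    """Distribution of op(X, Y) for independent discrete X and Y.

    xs, ys are the values; ps, qs are the corresponding probabilities.
    """
    dist = defaultdict(float)
    for x, p in zip(xs, ps):
        for y, q in zip(ys, qs):
            # The pair (x, y) has probability p * q; equal results are merged.
            dist[op(x, y)] += p * q
    values = sorted(dist)
    return values, [dist[v] for v in values]


# Hypothetical example: the sum of two fair dice.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
values, masses = combine_discrete(faces, probs, faces, probs)
```

Here 36 value pairs collapse to 11 distinct sums because many pairs coincide. For operands with non-coinciding results the number of values grows as the product of the operand sizes, which is the growth that motivates the conversion to a histogram.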

Operation on histograms. Let $p(x, y)$ be the joint probability density function of two random variables $x$ and $y$, and let $p_z$ be the histogram that approximates the probability density of the result $z = x * y$ of an arithmetic operation on them, where $* \in \{+, -, \cdot, /, \uparrow\}$. Then the probability that $z$ falls in the interval $[z_k, z_{k+1}]$ is determined by the formula from [10]

$$P(z_k < z < z_{k+1}) = \int_{\Omega_k} p(x, y)\,dx\,dy, \qquad (1)$$

where $\Omega_k = \{(x, y) \mid z_k < x * y < z_{k+1}\}$.

The numerical implementation of this method is as follows. Let the histogram variable $x$ be given by the grid $\{a_i\}$ and the probabilities $\{p_i\}$, and the variable $y$ by $\{b_i\}$ and $\{q_i\}$, respectively. Let $[a_0, a_n]$ and $[b_0, b_n]$ be the supports of the probability densities of these variables, so that the rectangle $[a_0, a_n] \times [b_0, b_n]$ is the support of the joint probability density $p(x, y)$. We divide this rectangle into $n^2$ rectangles $[a_i, a_{i+1}] \times [b_j, b_{j+1}]$; the probability of falling into each of them is $p_i q_j$ for independent random variables and $p_{ij}$ for dependent ones.

To compute the required histogram $p_z$, we walk through all the rectangles $[a_i, a_{i+1}] \times [b_j, b_{j+1}]$ and for each of them calculate its contribution to each segment $[z_k, z_{k+1}]$ of the resultant histogram. To this end, we consider the region

$$\Omega_k^{ij} = \Omega_k \cap \left([a_i, a_{i+1}] \times [b_j, b_{j+1}]\right)$$

and compute the integral over $\Omega_k^{ij}$

$$p_{z_k}^{ij} = \int_{\Omega_k^{ij}} p(x, y)\,dx\,dy. \qquad (2)$$

Note that on each rectangle $[a_i, a_{i+1}] \times [b_j, b_{j+1}]$ the joint probability density $p(x, y)$ is constant; therefore this integral equals the probability of the rectangle multiplied by the ratio of the area of $\Omega_k^{ij}$ to the area of $[a_i, a_{i+1}] \times [b_j, b_{j+1}]$. Having walked through all the rectangles and summed their contributions, we obtain the histogram $p_z$. These computations require $O(n^2)$ arithmetic operations.
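The walk over rectangles can be illustrated for addition of independent histogram variables. The sketch below is our own and is not the authors' code: the names `area_below` and `add_histograms` are hypothetical, bin probabilities (not density values) are assumed as input, and the area of the part of a rectangle lying below the line $x + y = t$ is computed by inclusion-exclusion over the four corners of the rectangle.

```python
def area_below(t, a1, a2, b1, b2):
    """Area of {(x, y) in [a1, a2] x [b1, b2] : x + y <= t}."""
    def g(u):
        # Area of the right triangle cut off by x + y <= t from a quadrant
        # whose corner lies at distance u below the line.
        return 0.5 * max(u, 0.0) ** 2
    return (g(t - a1 - b1) - g(t - a2 - b1)
            - g(t - a1 - b2) + g(t - a2 - b2))


def add_histograms(a, p, b, q, z):
    """Bin probabilities of X + Y on the grid z for independent X and Y.

    a, b are the grids; p[i] is the probability of [a[i], a[i+1]],
    q[j] is the probability of [b[j], b[j+1]].
    """
    pz = [0.0] * (len(z) - 1)
    for i in range(len(a) - 1):
        for j in range(len(b) - 1):
            rect = (a[i], a[i + 1], b[j], b[j + 1])
            rect_area = (a[i + 1] - a[i]) * (b[j + 1] - b[j])
            for k in range(len(z) - 1):
                # Area of the strip z[k] < x + y < z[k+1] inside the rectangle.
                strip = area_below(z[k + 1], *rect) - area_below(z[k], *rect)
                pz[k] += p[i] * q[j] * strip / rect_area
    return pz


# Two variables uniform on [0, 1], two bins each; result on four bins of [0, 2].
grid = [0.0, 0.5, 1.0]
pz = add_histograms(grid, [0.5, 0.5], grid, [0.5, 0.5],
                    [0.0, 0.5, 1.0, 1.5, 2.0])
```

For this input the result reproduces the bin probabilities of the triangular distribution, 0.125, 0.375, 0.375, 0.125. For simplicity the sketch tests every result bin against every rectangle; an implementation concerned with cost would visit only the bins that the rectangle can actually reach.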

Operations on a histogram and a discrete random variable. Consider operations of the form $x * c$, where $* \in \{+, -, \cdot, /\}$, $c$ is a constant, and $x$ is a random variable with the probability density $f_x$. If $* \in \{+, -\}$, then the probability density function $f_z$ of the random variable $z = x * c$ is a shift of $f_x$: $f_z(\xi * c) = f_x(\xi)$, $\xi \in \mathbb{R}$. Let $*$ be multiplication and $c \neq 0$; then $f_z(\xi) = f_x(\xi/c)/|c|$. If $c = 0$, then the random variable $z$ takes the single value 0 with probability 1. Division by $c \neq 0$ is performed analogously: $f_z(\xi) = f_x(\xi \cdot c) \cdot |c|$, $\xi \in \mathbb{R}$.
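For a histogram, the operations with a constant reduce to transforming the grid and rescaling the density values. The following is a minimal sketch under our own assumptions: the names `shift_histogram` and `scale_histogram` are hypothetical, the histogram is stored as a grid plus one density value per bin, and the density is divided by the absolute value of the factor so that the total probability stays equal to one for negative factors as well.

```python
def shift_histogram(grid, density, c):
    """Histogram of z = x + c: the grid moves, the density values do not."""
    return [g + c for g in grid], list(density)


def scale_histogram(grid, density, c):
    """Histogram of z = c * x for c != 0 (density values, not bin masses)."""
    if c == 0:
        raise ValueError("c = 0 gives the constant z = 0, not a histogram")
    new_grid = [c * g for g in grid]
    new_density = [d / abs(c) for d in density]
    if c < 0:
        # A negative factor reverses the order of the bins.
        new_grid.reverse()
        new_density.reverse()
    return new_grid, new_density


# Hypothetical example: bins [0, 1] and [1, 3] with densities 0.4 and 0.3.
new_grid, new_density = scale_histogram([0.0, 1.0, 3.0], [0.4, 0.3], -2.0)
```

In this example the new bins are [-6, -2] and [-2, 0] with densities 0.15 and 0.2, so the bin probabilities 0.6 and 0.4 are preserved. Division by a nonzero constant can be obtained by calling `scale_histogram` with `1 / c`.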

In the case of operations on a discrete and a histogram random variable we consider the segments $\{a_i\} \times [b_j, b_{j+1}]$, where $a_i$ are the values of the discrete variable, instead of the rectangles $[a_i, a_{i+1}] \times [b_j, b_{j+1}]$. We proceed analogously to the previous case, walking through all these segments and computing the contribution of each of them to the resultant histogram. The only difference in the numerical implementation is that we compute ratios of segment lengths, not of areas. However, if the number of values assumed by the discrete random variable is large, these calculations can significantly increase the computation time. In such a situation, the discrete random variable is first represented by a histogram, and the operation is then performed on two histograms, as described in the preceding subsection.

Operation on a histogram and an analytically given density function. In this case the computation is similar to the case of two histograms. However, the joint probability density is no longer constant on each rectangle, so the integrals of the form (2) have to be computed numerically. The result is again a histogram that approximates the probability density of the resulting random variable.

Operations on the second order histograms. Let $X$, $Y$ be two second order histograms defined by the grids $\{v_i,\ i = 0, 1, \ldots, n\}$ and $\{w_i,\ i = 0, 1, \ldots, n\}$ and the sets of histograms $\{P_{x_i}\}$, $\{P_{y_i}\}$. Let $Z = X * Y$, where $* \in \{+, -, \cdot, /, \uparrow\}$. We compute $Z$ as a second order histogram. Let $\{z_i,\ i = 0, 1, \ldots, n\}$ be a grid; then, following (1), the histogram $P_{z_k}$ on the interval $[z_k, z_{k+1}]$ is determined by the formula

$$P_{z_k} = \iint_{\Omega_k} X(\xi)\,Y(\eta)\,d\xi\,d\eta \,/\, (z_{k+1} - z_k),$$

where $\Omega_k = \{(\xi, \eta) \mid z_k < \xi * \eta < z_{k+1}\}$.

Note that the function $X(\xi)Y(\eta)$ on each rectangle $[v_{i-1}, v_i] \times [w_{j-1}, w_j]$ is a constant histogram $P_{x_i} \cdot P_{y_j}$. The integral of a constant histogram over a region is the value of the histogram multiplied by the area of the region.

3. Procedures

The designed package includes the following modules: addition, subtraction, addition of a random variable and a number, multiplication, multiplication of a random variable by a number, division, rational exponents of a random variable, computation of the mean and the variance (dispersion), and normalization.
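The paper does not list the interfaces of these modules. As a rough illustration only, the three simplest ones (normalization, mean, variance) could look as follows for a histogram stored as a grid plus one density value per bin; the function names and the data layout are our assumptions, not the package's API.

```python
def normalize(grid, density):
    """Rescale the density values so that the total probability is one."""
    total = sum(d * (grid[i + 1] - grid[i]) for i, d in enumerate(density))
    return [d / total for d in density]


def mean(grid, density):
    """Mean of a histogram: each bin contributes its mass times its midpoint."""
    return sum(d * (grid[i + 1] - grid[i]) * 0.5 * (grid[i] + grid[i + 1])
               for i, d in enumerate(density))


def variance(grid, density):
    """Variance via the exact second moment of a piecewise constant density."""
    second = sum(d * (grid[i + 1] ** 3 - grid[i] ** 3) / 3.0
                 for i, d in enumerate(density))
    return second - mean(grid, density) ** 2


# Hypothetical example: an unnormalized uniform histogram on [0, 1].
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
dens = normalize(grid, [2.0, 2.0, 2.0, 2.0])
m, v = mean(grid, dens), variance(grid, dens)
```

For the uniform example the sketch returns the mean 1/2 and the variance 1/12, as expected for the uniform distribution on [0, 1].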

4. Test. Comparison with Monte Carlo

In order to test the numerical operations on histogram variables, we consider the addition of four random variables uniformly distributed on [0,1].

Note that the probability density of the sum of n uniformly distributed variables is

$$p_n(x) = \frac{1}{(n-1)!}\left(x^{n-1} - C_n^1 (x-1)^{n-1} + C_n^2 (x-2)^{n-1} - \ldots\right), \qquad (3)$$

where $C_n^k$ are binomial coefficients, and for each fixed value of the argument $x$ the sum in brackets comprises only those terms for which the value of $(x - k)$, $k = 1, 2, \ldots$, is nonnegative [11]. Thus, when $n = 4$ we have:

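$$p(x) = \begin{cases}
\dfrac{1}{6}x^3, & \text{if } 0 < x < 1; \\[4pt]
-\dfrac{1}{2}x^3 + 2x^2 - 2x + \dfrac{2}{3}, & \text{if } 1 < x < 2; \\[4pt]
\dfrac{1}{2}x^3 - 4x^2 + 10x - \dfrac{22}{3}, & \text{if } 2 < x < 3; \\[4pt]
-\dfrac{1}{6}x^3 + 2x^2 - 8x + \dfrac{32}{3}, & \text{if } 3 < x < 4.
\end{cases}$$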
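The general formula (3) and its piecewise form for n = 4 can be checked against each other numerically. The sketch below is ours; the function names `uniform_sum_density` and `p4` are hypothetical.

```python
from math import comb, factorial


def uniform_sum_density(x, n):
    """Density of the sum of n independent U[0, 1] variables, formula (3)."""
    if x < 0 or x > n:
        return 0.0
    total = 0.0
    for k in range(n + 1):
        if x - k < 0:
            # Only terms with nonnegative (x - k) enter the sum.
            break
        total += (-1) ** k * comb(n, k) * (x - k) ** (n - 1)
    return total / factorial(n - 1)


def p4(x):
    """Piecewise cubic form of the density for n = 4."""
    if 0 <= x <= 1:
        return x ** 3 / 6
    if 1 < x <= 2:
        return -x ** 3 / 2 + 2 * x ** 2 - 2 * x + 2 / 3
    if 2 < x <= 3:
        return x ** 3 / 2 - 4 * x ** 2 + 10 * x - 22 / 3
    if 3 < x <= 4:
        return -x ** 3 / 6 + 2 * x ** 2 - 8 * x + 32 / 3
    return 0.0


# Largest disagreement between the two forms on a grid of test points.
points = [0.1 * i for i in range(41)]
max_diff = max(abs(uniform_sum_density(x, 4) - p4(x)) for x in points)
```

The two forms agree up to rounding error, which confirms the coefficients of the four cubic pieces.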

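Table 1. The errors of histogram arithmetic and Monte Carlo methods

| $n$ | $\|H_n - MC_{n,N}\|_2$, $N = 10^4$ | $\|H_n - MC_{n,N}\|_2$, $N = 10^5$ | $\|H_n - MC_{n,N}\|_2$, $N = 10^6$ | $\|H_n - P_n\|_2$ |
|-----|--------|---------|---------|---------|
| 10  | 0.0059 | 0.00168 | 0.00037 | 4.16e-3 |
| 20  | 0.0055 | 0.00198 | 0.00041 | 5.39e-4 |
| 50  | 0.0026 | 0.00103 | 0.00026 | 3.47e-5 |
| 100 | 0.0023 | 0.00062 | 0.00018 | 4.35e-6 |
| 150 | 0.0016 | 0.00055 | 0.00016 | 1.28e-6 |
| 200 | 0.0014 | 0.00044 | 0.00014 | 5.44e-7 |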

Let $N$ be the number of Monte Carlo repetitions, $n$ the number of grid cells, $H_n$ the histogram probabilistic extension of $p$ for $n$ (the exact histogram), $P_n$ the natural histogram extension of $p$ for $n$ obtained by performing arithmetic operations, and $MC_{n,N}$ the histogram approximation of $p$ obtained by the Monte Carlo method for $n$ and $N$.

The table presents the approximation errors $\|H_n - P_n\|_2$ and $\|H_n - MC_{n,N}\|_2$ for the sum of four uniformly distributed random variables. We note that for a fixed $n$ the error of the Monte Carlo method decreases as $\approx 1/\sqrt{N}$, while the order of convergence of the natural histogram extension is $\alpha \approx 3.5$ [12], that is, its error decreases as $n^{-\alpha}$. Moreover, the number of operations in histogram arithmetic is $O(n^2)$, and the number of operations for the Monte Carlo method is $O(N)$.
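The Monte Carlo side of this comparison can be reproduced with a short experiment. The sketch below is ours and rests on assumptions: the error is measured as the Euclidean norm of the difference of bin probabilities, which need not coincide with the norm used for Table 1, so the numbers it produces are not directly comparable with the table; all function names are hypothetical.

```python
import random
from math import comb, factorial, sqrt


def uniform_sum_cdf(x, n):
    """Distribution function of the sum of n independent U[0, 1] variables."""
    x = min(max(x, 0.0), float(n))
    return sum((-1) ** k * comb(n, k) * (x - k) ** n
               for k in range(int(x) + 1)) / factorial(n)


def exact_bins(n_bins, n_terms=4):
    """Exact bin probabilities on a uniform grid over [0, n_terms]."""
    edges = [n_terms * k / n_bins for k in range(n_bins + 1)]
    return [uniform_sum_cdf(edges[k + 1], n_terms) - uniform_sum_cdf(edges[k], n_terms)
            for k in range(n_bins)]


def monte_carlo_bins(n_bins, n_samples, rng, n_terms=4):
    """Empirical bin probabilities from n_samples simulated sums."""
    counts = [0] * n_bins
    for _ in range(n_samples):
        s = sum(rng.random() for _ in range(n_terms))
        counts[min(int(s * n_bins / n_terms), n_bins - 1)] += 1
    return [c / n_samples for c in counts]


def l2_error(p, q):
    return sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


rng = random.Random(0)  # fixed seed so that the run is repeatable
exact = exact_bins(10)
errors = {N: l2_error(exact, monte_carlo_bins(10, N, rng))
          for N in (10 ** 3, 10 ** 4, 10 ** 5)}
```

In such runs the error typically shrinks by a factor of about $\sqrt{10}$ for each tenfold increase of $N$, in line with the $1/\sqrt{N}$ behaviour noted above.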

Suppose that we want to achieve accuracy $\varepsilon$. The number of operations for the Monte Carlo method is $O(\varepsilon^{-2})$, which should be compared with the $O(\varepsilon^{-2/\alpha})$ required in histogram arithmetic. Thus, we conclude that the approach relying on histogram operations is about $\varepsilon^{-2(1-1/\alpha)}$ times more efficient than Monte Carlo methods.
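As a worked instance of this estimate, take the convergence order $\alpha = 3.5$ quoted above and a target accuracy $\varepsilon = 10^{-2}$; the choice of $\varepsilon$ is ours, and the constants hidden in the $O(\cdot)$ bounds are ignored.

```latex
\[
\frac{\varepsilon^{-2}}{\varepsilon^{-2/\alpha}} = \varepsilon^{-2(1 - 1/\alpha)},
\qquad
\alpha = 3.5,\ \varepsilon = 10^{-2}:
\quad
\left(10^{-2}\right)^{-2(1 - 1/3.5)} = 10^{20/7} \approx 7.2 \times 10^{2}.
\]
```

Under these assumptions the gain is several hundred times, and it grows as the required accuracy becomes stricter.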

It follows immediately from Table 1 that histogram arithmetic is about 100-1000 times more efficient than Monte Carlo methods.

5. Increase in accuracy

More accurate results can be obtained if the sought probability density function is represented as a piecewise linear function or a spline. This can be achieved in two ways.

The first is to smooth the resultant histogram. For example, connecting the midpoints of the tops of the histogram columns gives a piecewise linear function that approximates the probability density.
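The midpoint construction can be sketched as follows. The code is an illustration under our own assumptions: the name `piecewise_linear_density` is hypothetical, and the treatment of the two half-bins at the ends of the support (the value of the outermost column is held constant there) is our choice, since the text does not specify it.

```python
from bisect import bisect_right


def piecewise_linear_density(grid, density):
    """Return a function that linearly interpolates the column midpoints."""
    mids = [0.5 * (grid[i] + grid[i + 1]) for i in range(len(density))]

    def f(x):
        if x < grid[0] or x > grid[-1]:
            return 0.0                      # outside the support
        if x <= mids[0]:
            return density[0]               # first half-bin: held constant
        if x >= mids[-1]:
            return density[-1]              # last half-bin: held constant
        i = bisect_right(mids, x) - 1       # mids[i] <= x < mids[i + 1]
        t = (x - mids[i]) / (mids[i + 1] - mids[i])
        return (1 - t) * density[i] + t * density[i + 1]

    return f


# Hypothetical example: three unit-width columns with heights 0.2, 0.5, 0.3.
f = piecewise_linear_density([0.0, 1.0, 2.0, 3.0], [0.2, 0.5, 0.3])
```

The resulting function is continuous on the support but, in general, no longer integrates exactly to one, so a normalization step may be needed afterwards.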

The second is to determine the values of the probability density function of the result at specific points. Each such point corresponds to a curve on the graph of the joint probability density function of the two random variables (these curves are straight lines for addition, subtraction, and division, and hyperbolas for multiplication). Computing the integral of the joint density along these curves, we obtain the values at these specific points; after normalizing the result we construct a piecewise linear function or a spline. Instead of an integral along a curve we can compute the probability of falling into a strip (as in the histogram case), where the strip is a sufficiently small neighborhood of the curve.

References

[1] B.S.Dobronets, Interval Mathematics, Krasnoyarsk, KSU, 2004, (in Russian).

[2] D.Berleant, Automatically verified reasoning with both intervals and probability density functions, Interval Computations, 1993, no. 2, 48-70.

[3] W.Li, J.Hyman, Computer arithmetic for probability distribution variables, Reliability Engineering and System Safety, 85(2004).

[4] R.Williamson, T.Downs, Probabilistic arithmetic. I. Numerical methods for calculating convolutions and dependency bounds, International Journal of Approximate Reasoning, 1990, no. 4, 89-158.

[5] V.A.Gerasimov, B.S.Dobronets, M.Yu.Shustrov, Numerical operations of histogram arithmetic and their applications, Automation and Remote Control, 52(1991), no. 2, 208-212.

[6] B.S.Dobronets, O.A.Popova, Numerical Operations on Random Variables and their Application, Journal of Siberian Federal University. Mathematics & Physics, 4(2011), no. 2, 229-239 (in Russian).

[7] B.S.Dobronets, O.A.Popova, Numerical probabilistic analysis and probabilistic extension, Proceedings of the XV International EM'2011 Conference, Oleg Vorobyev, ed., Krasnoyarsk, SFU, RIFS, 2011, 67-69 (in Russian).

[8] B.S.Dobronets, O.A.Popova, Histogram time series, Proceedings of the X International FAMES'2011 Conference, Oleg Vorobyev, ed., Krasnoyarsk, RIFS, SFU, KSTEI, 2011, 127-130 (in Russian).

[9] A.V.Kryanev, G.V.Lukin, Mathematical Methods for treatment undefined data, 2nd ed., Rev., Moscow, FIZMATLIT, 2006 (in Russian).

[10] B.V.Gnedenko, A course in the theory of probability, Moscow, Nauka, 1988 (in Russian).

[11] S.P.Shary, Interval analysis or Monte-Carlo methods? Computational Technologies, 12(2007), no. 1, 103-112 (in Russian).

[12] B.S.Dobronets, O.A.Popova, Numerical probabilistic analysis under aleatory and epistemic uncertainty, 15th GAMM-IMACS International Symposium SCAN'12, Book of Abstracts, Novosibirsk, Russia, Institute of Computational Technologies, 2012, 33-34.

Software Implementation of Operations on Random Variables

Boris S. Dobronets, Artem M. Krantsevich, Nikolay M. Krantsevich

The paper considers a software implementation of operations on different types of random variables. Algorithms are presented for arithmetic operations on random variables given by their probability density functions. Problems of converting between types of random variables are considered. Accuracy estimates for the constructed operations are given. The accuracy of the implemented operations is compared with that of the Monte Carlo method.

Keywords: numerical operations on random variables, histogram arithmetic, second order histograms, Monte Carlo, interval mathematics.
