Scientific article on the topic 'The analysis of nonparametric mixture properties with a probability density of a multidimensional random variable'


CC BY
Keywords
MIXTURE OF PROBABILITY DENSITIES / NONPARAMETRIC ESTIMATION / LARGE SAMPLES / ASYMPTOTIC PROPERTIES

Abstract of the scientific article in mathematics. Authors: Lapko A. V., Lapko V. A.

This article studies the asymptotic properties of a mixture of nonparametric estimates of the probability density of a multidimensional random variable. They are compared with the properties of the traditional Rosenblatt-Parzen type nonparametric probability density estimate, depending on the number of components in the mixture and on the dimension of the random variable.



A. V. Lapko

Institute of Computational Modelling, Russian Academy of Sciences, Siberian Branch, Russia, Krasnoyarsk

V. A. Lapko

Siberian State Aerospace University named after academician M. F. Reshetnev, Russia, Krasnoyarsk

THE ANALYSIS OF NONPARAMETRIC MIXTURE PROPERTIES WITH A PROBABILITY DENSITY OF A MULTIDIMENSIONAL RANDOM VARIABLE


Keywords: mixture of probability densities, nonparametric estimation, large samples, asymptotic properties.

Nonparametric statistical methods based on Rosenblatt-Parzen type probability density estimates [1; 2] are a rapidly developing approach to modelling systems under a priori uncertainty. However, as the conditions under which a system is studied become more complicated, traditional nonparametric algorithms and models run into methodological and computational difficulties; this is especially evident when large volumes of statistical data are processed.

A promising way around these problems is to decompose the training sample by size and to apply parallel computing technology.

The purpose of this work is to justify the use of decomposition principles when processing large arrays of statistical data, based on an analysis of the asymptotic properties of a mixture of nonparametric probability density estimates.

Let $V = (x^i,\ i = \overline{1,n})$ be a sample of $n$ independent observations of a $k$-dimensional random variable $x = (x_v,\ v = \overline{1,k})$ with probability density $p(x)$. The form of $p(x)$ is a priori unknown.

Let us divide the sample $V$ into $T$ groups of observations $V_j = (x^i,\ i \in I_j)$, $j = \overline{1,T}$, where $I_j$ denotes the set of indices of the observations in group $j$, so that $\bigcup_{j=1}^{T} I_j = I = (i = \overline{1,n})$. All samples $V_j$ contain the same number of elements, $\bar n = n/T$.

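On each sample $V_j$ let us construct a nonparametric estimate of the probability density of the multidimensional random variable $x$ [1]:

$$\bar p_j(x) = \frac{1}{\bar n \prod_{v=1}^{k} c_v} \sum_{i \in I_j} \prod_{v=1}^{k} \Phi\left(\frac{x_v - x_v^i}{c_v}\right), \quad j = \overline{1,T}. \tag{1}$$

In statistic (1) the kernel functions $\Phi(u_v)$ satisfy the conditions of normalization, positivity and symmetry. The kernel parameters $c_v = c_v(\bar n)$ decrease as the sample size grows.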
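Estimate (1) can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses the standard normal density as the kernel $\Phi$ (one admissible choice, not prescribed by the article), and the function names `gaussian_kernel` and `product_kernel_density` are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal kernel (an assumed choice): symmetric, non-negative,
    integrates to 1 and has unit second moment."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def product_kernel_density(x, sample, c):
    """Evaluate estimate (1) at a point x.

    x      : point of shape (k,)
    sample : observations of one group V_j, shape (n_bar, k)
    c      : blur coefficients c_v, scalar or shape (k,)
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    n_bar, k = sample.shape
    c_vec = np.broadcast_to(np.asarray(c, dtype=float), (k,))
    u = (x - sample) / c_vec                        # (n_bar, k) scaled differences
    weights = np.prod(gaussian_kernel(u), axis=1)   # product over components v
    return weights.sum() / (n_bar * np.prod(c_vec))
```

Passing a scalar `c` corresponds to the case of identical blur coefficients for all components.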

Let the ranges of values of the components $x_v$ of the vector $x$ be identical. Under these conditions it is reasonable to assume that the coefficients $c_v$ in the nonparametric probability density estimates $\bar p_j(x)$, $j = \overline{1,T}$, are identical and equal to $c$. Then estimate (1) takes the form:

$$\bar p_j(x) = \frac{1}{\bar n c^k} \sum_{i \in I_j} \prod_{v=1}^{k} \Phi\left(\frac{x_v - x_v^i}{c}\right), \quad j = \overline{1,T}. \tag{2}$$

As an estimate of $p(x)$ from the statistical sample $V$ we shall use a mixture of nonparametric probability density estimates:

$$\bar{\bar p}(x) = \frac{1}{T} \sum_{j=1}^{T} \bar p_j(x). \tag{3}$$

Statistic (3) allows parallel computing technology to be used when estimating the probability density from large samples.
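The way statistic (3) lends itself to parallel evaluation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel, a common blur coefficient `c` as in (2), a sample size divisible by `T`, and a thread pool as the parallel backend; the names `group_density` and `mixture_density` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def group_density(x, group, c):
    """Estimate (2) on one group, with a Gaussian kernel (an assumption)."""
    n_bar, k = group.shape
    u = (x - group) / c
    kern = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return np.prod(kern, axis=1).sum() / (n_bar * c**k)

def mixture_density(x, sample, T, c, max_workers=None):
    """Statistic (3): the average of T group estimates, evaluated concurrently."""
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    if sample.shape[0] % T != 0:
        raise ValueError("n must be divisible by T so that each group has n/T points")
    groups = np.split(sample, T)   # T groups of equal size n_bar = n / T
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(lambda g: group_density(x, g, c), groups))
    return sum(parts) / T
```

Each group estimate depends only on its own observations, so the `T` evaluations are independent tasks. Note that for one and the same `c` the average in (3) coincides algebraically with estimate (2) built on the whole sample; the two statistics differ once the blur coefficient is tuned to the group size $\bar n$ rather than to $n$.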

The asymptotic properties of $\bar{\bar p}(x)$ are established by the following statement.

Theorem. Let $p(x)$ and its first two derivatives with respect to each component $x_v$, $v = \overline{1,k}$, be bounded and continuous; let the kernel functions $\Phi(u_v)$ satisfy the conditions

$$\Phi(u_v) = \Phi(-u_v), \quad 0 \le \Phi(u_v) < \infty, \quad \int \Phi(u_v)\,du_v = 1, \quad \int u_v^2\,\Phi(u_v)\,du_v = 1, \quad \int u_v^m\,\Phi(u_v)\,du_v < \infty, \quad 0 \le m < \infty, \quad v = \overline{1,k};$$

and let the sequence $c = c(\bar n)$ of kernel blur coefficients be such that $c \to 0$ and $\bar n c^k \to \infty$ as $n \to \infty$.

Then, for finite values of $T$, the nonparametric estimate (3) of the probability density $p(x)$ is asymptotically unbiased and consistent.

Hereinafter the infinite limits of integration are omitted.

Proof. 1. By definition:

$$M(\bar{\bar p}(x)) = \frac{1}{T} \sum_{j=1}^{T} M(\bar p_j(x)) = \frac{1}{T} \sum_{j=1}^{T} \frac{1}{\bar n c^k} \sum_{i \in I_j} \int \dots \int \prod_{v=1}^{k} \Phi\left(\frac{x_v - t_v}{c}\right) p(t_1, \dots, t_k)\,dt_1 \dots dt_k =$$

$$= \frac{1}{c^k} \int \dots \int \prod_{v=1}^{k} \Phi\left(\frac{x_v - t_v}{c}\right) p(t_1, \dots, t_k)\,dt_1 \dots dt_k = \int \dots \int \prod_{v=1}^{k} \Phi(u_v)\,p(x_1 - c u_1, \dots, x_k - c u_k)\,du_1 \dots du_k,$$

where $M$ denotes mathematical expectation. The transformations take into account that the elements of the samples $V_j$, $j = \overline{1,T}$, are values of the same random variable $t$ with probability density $p(t_1, \dots, t_k)$.

Expanding $p(x_1 - c u_1, \dots, x_k - c u_k)$ in a Taylor series at the point $x = (x_1, \dots, x_k)$ and keeping the leading terms of the series (the first-order terms vanish because $\Phi$ is symmetric, and $\int u_v^2\,\Phi(u_v)\,du_v = 1$), we get:

$$W_1 = M(\bar{\bar p}(x) - p(x)) \sim \frac{c^2}{2} \sum_{v=1}^{k} p_v^{(2)}(x), \tag{4}$$

where $p_v^{(2)}(x)$ is the second derivative of the probability density $p(x)$ with respect to the component $x_v$.

Hence, since $c \to 0$ as $n \to \infty$, the mixture of nonparametric probability density estimates (3) is asymptotically unbiased.

2. To prove the mean square convergence of $\bar{\bar p}(x)$, consider the following expression:

$$M \int \dots \int (p(x) - \bar{\bar p}(x))^2\,dx_1 \dots dx_k = M \int \dots \int \left[\frac{1}{T} \sum_{j=1}^{T} (p(x) - \bar p_j(x))\right]^2 dx_1 \dots dx_k =$$

$$= \frac{1}{T^2}\,M \left[\sum_{j=1}^{T} \int \dots \int (p(x) - \bar p_j(x))^2\,dx_1 \dots dx_k + \sum_{j=1}^{T} \sum_{\substack{t=1 \\ t \ne j}}^{T} \int \dots \int (p(x) - \bar p_j(x))(p(x) - \bar p_t(x))\,dx_1 \dots dx_k\right]. \tag{5}$$

Let us find the asymptotic expression for the terms of the second sum in (5):

$$M \int \dots \int (p(x) - \bar p_j(x))(p(x) - \bar p_t(x))\,dx_1 \dots dx_k = \int \dots \int p^2(x)\,dx_1 \dots dx_k - M \int \dots \int \bar p_t(x)\,p(x)\,dx_1 \dots dx_k -$$

$$- M \int \dots \int \bar p_j(x)\,p(x)\,dx_1 \dots dx_k + M \int \dots \int \bar p_j(x)\,\bar p_t(x)\,dx_1 \dots dx_k. \tag{6}$$

Let us transform its last term. Since the samples $V_j$ and $V_t$, $t \ne j$, consist of independent observations,

$$M \int \dots \int \bar p_j(x)\,\bar p_t(x)\,dx_1 \dots dx_k = \int \dots \int M(\bar p_j(x))\,M(\bar p_t(x))\,dx_1 \dots dx_k,$$

which, for sufficiently large volumes of statistical data and taking expression (4) into account, is represented as:

$$\int \dots \int \left(p(x) + \frac{c^2}{2} \sum_{v=1}^{k} p_v^{(2)}(x)\right)^2 dx_1 \dots dx_k. \tag{7}$$

Note that the asymptotic expression of a statistic of the type

$$M \int \dots \int \bar p_t(x)\,p(x)\,dx_1 \dots dx_k$$

corresponds to:

$$\int \dots \int \left(p(x) + \frac{c^2}{2} \sum_{v=1}^{k} p_v^{(2)}(x)\right) p(x)\,dx_1 \dots dx_k. \tag{8}$$

Substituting expressions (7), (8) into (6) gives, after a series of simple transformations:

$$M \int \dots \int (p(x) - \bar p_j(x))(p(x) - \bar p_t(x))\,dx_1 \dots dx_k \sim \frac{c^4}{4} \int \dots \int \left(\sum_{v=1}^{k} p_v^{(2)}(x)\right)^2 dx_1 \dots dx_k = \frac{c^4}{4} B. \tag{9}$$

In V. A. Epanechnikov's work [2], an asymptotic expression was obtained for the mean square deviation of the nonparametric probability density estimate $\bar p_j(x)$, which forms the first sum of expression (5):

$$M \int \dots \int (p(x) - \bar p_j(x))^2\,dx_1 \dots dx_k \sim \frac{\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v}{\bar n c^k} + \frac{c^4}{4} B. \tag{10}$$

Taking (9) and (10) into account, expression (5) for sufficiently large values of $n$ is represented as:

$$M \int \dots \int (p(x) - \bar{\bar p}(x))^2\,dx_1 \dots dx_k \sim \frac{\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v}{T \bar n c^k} + \frac{c^4}{4} B. \tag{11}$$

It is easy to see that if $c \to 0$ and $\bar n c^k \to \infty$ as $n \to \infty$, the mixture (3) of probability density estimates converges in mean square to $p(x)$; together with its asymptotic unbiasedness, this means that the estimate is consistent.

At $T = 1$, result (11) coincides with Epanechnikov's theorem [2], which confirms the correctness of the transformations performed.
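The passage from (5) to (11) can be written out explicitly. The following is a sketch in which $A$ stands for $\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v$ (a shorthand introduced here, not used in the original). The first sum in (5) contains $T$ terms of the form (10), and the double sum contains $T(T-1)$ terms of the form (9), so

```latex
M \int \dots \int \bigl(p(x) - \bar{\bar p}(x)\bigr)^2 dx_1 \dots dx_k
\sim \frac{1}{T^2}\left[ T\left(\frac{A}{\bar n c^k} + \frac{c^4}{4}B\right)
      + T(T-1)\,\frac{c^4}{4}B \right]
= \frac{A}{T \bar n c^k} + \frac{c^4}{4}B .
```

Averaging therefore divides the variance term by $T$ and leaves the squared-bias term unchanged.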

Analysis of the approximating properties of the statistic $\bar{\bar p}(x)$. To compare the efficiency of the mixture (3) of nonparametric probability density estimates with that of the Rosenblatt-Parzen probability density estimate

$$\bar p(x) = \frac{1}{n c^k} \sum_{i=1}^{n} \prod_{v=1}^{k} \Phi\left(\frac{x_v - x_v^i}{c}\right),$$

let us consider the ratio of the asymptotic expressions for their mean square deviations at the optimal values of the kernel blur coefficients.

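Let us determine the minimum value $W_2$ of expression (11) at the optimal value $c^*$ of the blur coefficient of the nonparametric estimates $\bar p_j(x)$ that make up the mixture of probability densities. Under the accepted assumptions:

$$c^* = \left[\frac{k \prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v}{\bar n B}\right]^{\frac{1}{k+4}}.$$

Then:

$$W_2 = \left[\left(\frac{\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v}{\bar n}\right)^{4} B^{k}\right]^{\frac{1}{k+4}} \frac{4 + Tk}{4\,T\,k^{\frac{k}{k+4}}}. \tag{12}$$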

If $k = 1$, then $W_2$ coincides with the minimal asymptotic expression for the mean square deviation of the mixture of nonparametric probability density estimates obtained in [3].

At $T = 1$ and $\bar n = n$, expression (12) reduces to the minimal asymptotic expression $W_2'$ for the mean square deviation of the Rosenblatt-Parzen type probability density estimate [2].
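The value of $c^*$ and the form of (12) follow from a short calculation. The sketch below writes $A$ for $\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v$ (a shorthand introduced here) and assumes, in line with the text, that $c^*$ is the value minimizing criterion (10) of a single component $\bar p_j(x)$, which is then substituted into (11):

```latex
\begin{aligned}
\frac{d}{dc}\left[\frac{A}{\bar n c^{k}} + \frac{c^{4}}{4}B\right]
  &= -\frac{kA}{\bar n c^{k+1}} + c^{3}B = 0
  \;\Longrightarrow\;
  c^{*} = \left[\frac{kA}{\bar n B}\right]^{\frac{1}{k+4}}, \\
W_2 = \frac{A}{T \bar n (c^{*})^{k}} + \frac{(c^{*})^{4}}{4}B
  &= \left[\left(\frac{A}{\bar n}\right)^{4} B^{k}\right]^{\frac{1}{k+4}}
     \left(\frac{1}{T\,k^{\frac{k}{k+4}}} + \frac{k^{\frac{4}{k+4}}}{4}\right)
   = \left[\left(\frac{A}{\bar n}\right)^{4} B^{k}\right]^{\frac{1}{k+4}}
     \frac{4 + Tk}{4\,T\,k^{\frac{k}{k+4}}}.
\end{aligned}
```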

After simple transformations we get:

$$R_2 = \frac{W_2}{W_2'} = \frac{4 + Tk}{(4 + k)\,T^{\frac{k}{k+4}}}.$$

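By analogy, let us calculate the ratio of the minimal values of the principal variance terms of the statistics $\bar{\bar p}(x)$ and $\bar p(x)$:

$$W_3 = \frac{1}{T\,k^{\frac{k}{k+4}}} \left[\left(\frac{\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v}{\bar n}\right)^{4} B^{k}\right]^{\frac{1}{k+4}}, \qquad W_3' = \frac{1}{k^{\frac{k}{k+4}}} \left[\left(\frac{\prod_{v=1}^{k} \int \Phi^2(u_v)\,du_v}{n}\right)^{4} B^{k}\right]^{\frac{1}{k+4}}.$$

Their ratio is:

$$R_3 = \frac{W_3}{W_3'} = T^{-\frac{k}{k+4}}.$$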

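It is easy to verify that the ratio of the asymptotic expressions for the biases $W_1$, $W_1'$ of the probability density estimates $\bar{\bar p}(x)$ and $\bar p(x)$ at the optimal kernel blur coefficients is determined by the fact that the bias (4) is proportional to $c^2$, while the optimal coefficients for the sample sizes $\bar n$ and $n$ differ by the factor $T^{\frac{1}{k+4}}$:

$$R_1 = \frac{W_1}{W_1'} = T^{\frac{2}{k+4}}.$$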

Figure. Dependence of the ratios $R_2$ (a), $R_3$ (b), $R_1$ (c) on the dimension $k$ of the random variable $x = (x_v,\ v = \overline{1,k})$ and on the number $T$ = 1-10 (curves 1, ..., 10) of nonparametric estimates composing the mixture (3) of the probability density $p(x)$
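The behaviour of the three ratios can be reproduced numerically. The sketch below is illustrative only: the function name `ratios` is hypothetical, and it assumes that $W_1$ denotes the bias from (4), so that $R_1 = T^{2/(k+4)}$, together with the expressions $R_2 = (4 + Tk) / ((4 + k)\,T^{k/(k+4)})$ and $R_3 = T^{-k/(k+4)}$.

```python
def ratios(k, T):
    """Return (R1, R2, R3) for dimension k and number of mixture components T."""
    r1 = T ** (2.0 / (k + 4.0))                                # bias ratio (assumed form)
    r2 = (4.0 + T * k) / ((4.0 + k) * T ** (k / (k + 4.0)))    # mean square deviation ratio
    r3 = T ** (-k / (k + 4.0))                                 # variance ratio
    return r1, r2, r3

# Tabulate the ratios for a fixed number of components and several dimensions.
for k in (1, 2, 5, 10):
    r1, r2, r3 = ratios(k, T=4)
    print(f"k={k:2d}  R1={r1:.3f}  R2={r2:.3f}  R3={r3:.3f}")
```

Under these assumptions the three quantities are linked by $R_2 = (k R_1^2 + 4 R_3)/(k + 4)$, because at the optimal blur coefficient the squared-bias and variance terms of the Rosenblatt-Parzen criterion are in the proportion $k : 4$.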

As the number $T$ of components in the mixture of nonparametric probability density estimates grows, the values of the ratios $R_2 > 1$ (figure, a) and $R_1 > 1$ (figure, c) increase. This deterioration of the approximating properties of the mixture $\bar{\bar p}(x)$ in comparison with the traditional nonparametric probability density estimate $\bar p(x)$ (12) is explained by the decrease in the sizes of the samples used to estimate the components $\bar p_j(x)$.

This effect is characteristic of small dimensions $k$ of the random variable. As the estimation problem becomes more complex with growing $k$, the efficiency of both nonparametric estimates $\bar{\bar p}(x)$ and $\bar p(x)$ decreases. The corresponding criteria $W_2$, $W_2'$ and $W_1$, $W_1'$ become commensurable, which shows up as a decrease in the values of the ratios $R_2$ and $R_1$.

The proposed mixture $\bar{\bar p}(x)$ of probability density estimates has a smaller variance than the nonparametric estimate $\bar p(x)$. This is explained by its structure, since the statistic $\bar{\bar p}(x)$ is synthesized by means of an averaging operator (figure, b). This advantage increases with the number $T$ of nonparametric estimates $\bar p_j(x)$ composing the mixture and with the dimension $k$ of the random variable.

Based on the analysis of the asymptotic properties of mixtures of nonparametric estimates of the probability density of a multidimensional random variable, the possibility of decomposing the initial statistical data when synthesizing nonparametric statistics from large samples is justified. Compared with the traditional Rosenblatt-Parzen nonparametric estimate, the studied statistic has a considerably smaller variance and allows parallel computing technologies to be used.

References

1. Parzen E. On estimation of a probability density function and mode // Ann. Math. Statistic. 1962. Vol. 33. P. 1065-1076.


2. Epanechnikov V. A. Nonparametric estimation of a many-dimensional probability density // Teoriya veroyatnosti i ee primeneniya, 1969. Vol. 14. № 1. P. 156-161.

3. Lapko V. A., Varochkin S. S., Egorochkin I. A. Development and research of a nonparametric estimation of the probability density grounded on a principle of decomposition of learning sample on its size // Vestnik SibSAU. 2009. Vol. 1 (22). P. 45-49.

© Lapko A. V., Lapko V. A., 2010

D. V. Lichargin Siberian Federal University, Russia, Krasnoyarsk

GENERATION OF THE STATE TREE BASED ON GENERATIVE GRAMMAR OVER TREES OF STRINGS

The article considers the principle of generating state trees based on generative grammars over trees of strings, for such objects as natural language sentences and two- and three-dimensional images. The image of an object is considered as a forest that includes trees for different layouts of the object, for the purpose of modelling complex systems.

Keywords: natural language generation, generative grammars, semantics.

The problem of generating natural language sentences is one of the key issues in computer science and formal grammar theory. Meaningful speech generation belongs to the fields of semantics and computer science [1-7]. The generation of state trees is studied well enough in computer science and systems analysis. As for generating trees of meaningful phrases, the problem is primarily connected with sentence generation by means of Chomsky's generative grammars. Generative grammars are successfully applied in software such as machine translation systems, expert systems, spell checkers, etc.

The focus of the article is the analysis of the prospects for using generative grammars not over strings, but over trees of strings. This makes it possible to generate grammatically and semantically meaningful speech more effectively and to increase the efficiency of various aspects of image analysis and synthesis.

The importance of effectively generating meaningful language constructions and two- or three-dimensional images is generally recognized and stems from the demands of linguistic and other software.

The purpose of this research is to apply generative grammars over trees, where necessary, as a means of generating meaningful speech that takes a larger heterogeneous context into account.

The novelty of the work is in the application of generative grammars not over strings but over trees of strings.
