

УДК 681.32:007

NEURAL NETWORK MODELING OF DATA WITH GAPS

A. N. Gorban, A. A. Rossiev, D. C. Wunsch II


A method of modeling data with gaps by a sequence of curves has been developed. It is a generalization of the iterative construction of the singular expansion of matrices with gaps. The derived dependencies are extrapolated by Carleman's formulas. The method is interpreted as the construction of a neural network conveyor.


INTRODUCTION

Information about an object under study often contains gaps and erroneous values. Generally, data need preliminary processing, e.g. to fill gaps and correct distorted values. Explicitly or implicitly, such preprocessing takes place almost every time.

Let there be a table of data $(a_{ij})$; its rows correspond to objects, and its columns correspond to features. Let part of the information in the table be missing: there are gaps (some $a_{ij} = @$; the symbol @ is used to denote gaps in the data). The main problem that arises in this connection is to fill the existing gaps plausibly. There is an associated problem: to "repair" the table, i.e. to detect data with unfeasible values and correct them. In addition, it is useful to construct a calculator associated with the table, to fill the gaps in incoming data about new objects and to repair these new data (under the assumption that the data are interrelated in the same way as in the initial table).

It should be emphasized that these problems do not concern the true data values or statistical provability; they concern plausibility only.

The described problems are especially difficult (and, simultaneously, attractive) in the cases when the density of gaps is high, their location is irregular, and the amount of data is small; for instance, when the number of rows is approximately equal to the number of columns.

Ordinary regression algorithms - construction of empirical dependencies of some data on others - are inapplicable here. If the gaps are located irregularly, this would require constructing dependencies of the unknown data on every possible set of known positions in the table, which would actually mean constructing $2^n - 1$ dependencies, where n is the number of features. To recover any unknown set of data, at least something should be known. In this connection one needs a method of modeling data by manifolds of small dimension.

The essence of the method is as follows. A vector of data x with k gaps is represented as a k-dimensional linear manifold $L_x$, parallel to the k coordinate axes corresponding to the missing data. Under a priori restrictions on the missing values, instead of $L_x$ we have a rectangular parallelepiped $P_x \subset L_x$. A manifold M of a given small dimension (in most cases a curve), approximating the data in the best way and satisfying certain regularity conditions, is sought. For complete data vectors the accuracy of approximation is measured by the usual distance from a point to a set (the lower bound of the distances to the points of the set). For incomplete data, the lower bound of the distances between the points of M and $L_x$ (or, accordingly, $P_x$) is used instead. From each datum the closest point of M is subtracted, giving a residual, and the process is repeated until the residuals are sufficiently close to zero. Proximity of the linear manifold $L_x$ or the parallelepiped $P_x$ to zero means that the distance from zero to the closest point of $L_x$ (accordingly, $P_x$) is small. To specify the method further, one must determine how the manifold M is constructed.
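For instance, the distance from a model point $z$ to the manifold $L_x$ of an incomplete vector $x$ reduces to a sum over the known coordinates only, since the missing coordinates are free. A minimal numpy sketch (the function name is ours; gaps are marked by np.nan):

```python
import numpy as np

def dist_to_gap_manifold(z, x):
    """Distance from a model point z to the linear manifold L_x
    of a data vector x with gaps (np.nan = @): the missing
    coordinates of x are free, so only the known ones contribute."""
    known = ~np.isnan(x)
    return np.sqrt(((x[known] - z[known]) ** 2).sum())
```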

The idea of modeling data by manifolds of small dimension was conceived long ago.

Its most widespread, oldest and most feasible implementation for data without gaps is the classical method of principal components. The method models the data by their orthogonal projections onto the "principal components": the eigenvectors of the correlation matrix corresponding to the largest eigenvalues. Another algebraic interpretation of the principal component method is the singular expansion of the data table. Generally, presenting the data with sufficient accuracy requires relatively few principal components, and the dimension is reduced by a factor of tens.
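For complete data the construction is a few lines of numpy; a sketch under the assumption that the table fits in memory (names ours):

```python
import numpy as np

def principal_components(A, k):
    """First k principal components of a complete table A:
    center the data and take the leading right singular vectors."""
    A0 = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(A0, full_matrices=False)
    return A0 @ Vt[:k].T, Vt[:k]    # projections and components
```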

Its generalization for the nonlinear case - the method of principal curves - was proposed in 1988. There are generalizations of the classical method of principal components for data with gaps as well.

This work describes a technique for constructing a system of models for incomplete data. In the simplest case the models are a generalization of the classical (linear) method of principal components to data with gaps. The quasilinear method below is built on top of the linear one and employs its results. Finally, the formalism of self-organizing curves is employed to construct an essentially nonlinear method.

Each method is accompanied by a physical interpretation illustrating the similarity of the methods and their sequential development.

In the general case the method of modeling data with gaps by manifolds (linear and non-linear) of small dimension appears to be more efficient as compared to ordinary regression equations.

STATEMENT OF THE PROBLEM

Let there be a rectangular table $A = (a_{ij})$ the cells of which are filled with real numbers or the symbol @ denoting absence of data.

It is required to construct models that make it possible to solve the following three problems related to the recovery of the missing data:

1. To fill the gaps in the data plausibly.

2. To repair the data, i.e. to correct their values in such a way as to make the constructed models work best.

3. To construct, from the available table, a calculator that fills the gaps in incoming data and repairs them as they arrive (assuming the data in a line arriving at the input to be connected by the same relations as in the initial table).

This raises the question: how (in which metric) should the error of the model be evaluated? A measure of the error is required both to construct the models and to test them. From the viewpoint of simplicity of calculations the most attractive choice is the least squares method, by which the error is calculated as the sum of squares of deviations over all known data (mean square error, MSE). Yet an arbitrary choice associated with the scale, i.e. with the normalization of the data, is present here too.

The algorithms being developed can be applied when the data matrix cannot be reduced, by rearrangement of rows and columns, to the following box-diagonal form:

$$A = \begin{pmatrix} A_1 & @ & \cdots & @ \\ @ & A_2 & \cdots & @ \\ \vdots & \vdots & \ddots & \vdots \\ @ & @ & \cdots & A_n \end{pmatrix},$$

where the @ stand for rectangular blocks of unknown elements; in such tables it is impossible to establish connections between the different $A_i$ blocks.

1 THE LINEAR MODEL

1.1 Singular Expansion of Tables with Gaps

The material of this section is not immediately employed to process data, yet it provides the simplest example and a prototype for further constructions.

Let there be a rectangular table $A = (a_{ij})$ the cells of which are filled with real numbers or the symbol @ denoting absence of data.

The problem is to find the best least-squares approximation of A by a matrix of the form $x_i y_j$:

$$\Phi = \sum_{i,j:\ a_{ij} \neq @} (a_{ij} - x_i y_j)^2 \to \min. \qquad (1)$$

The problem is solved by successive iterations by explicit formulas.

The halting criterion is the smallness of the relative improvement $\Delta\Phi/\Phi$, where $\Delta\Phi$ is the decrease of the value $\Phi$ obtained in one cycle and $\Phi$ is its current value. The second criterion is the smallness of the value $\Phi$ per se. The procedure comes to a halt when $\Delta\Phi/\Phi < \varepsilon$ or $\Phi < \delta$ for certain $\varepsilon, \delta > 0$. As a result, for the given matrix A we find the best approximation by a matrix $P_1$ of the form $x_i y_j$.

Further on, we look for the best approximation of $A - P_1$ by a matrix $P_2$ of the same form, and so on, until, e.g., the norm of the residual comes sufficiently close to zero.

Thus, the initial matrix A is presented as a sum of matrices of rank 1: $A = P_1 + P_2 + \ldots + P_Q$.
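A minimal numpy sketch of this iteration (the function name and thresholds are ours; `mask` marks the known entries, and the updates are the explicit least-squares solutions for one factor with the other fixed):

```python
import numpy as np

def rank_one_with_gaps(A, mask, eps=1e-6, max_iter=500):
    """Best least-squares approximation of the known entries of A
    by a rank-1 matrix x_i * y_j; mask[i, j] is True where a_ij != @."""
    A0 = np.where(mask, A, 0.0)
    y = np.random.rand(A.shape[1])
    phi_old = np.inf
    for _ in range(max_iter):
        x = (A0 @ y) / (mask @ y**2)        # minimizes Phi over x, y fixed
        y = (A0.T @ x) / (mask.T @ x**2)    # minimizes Phi over y, x fixed
        phi = ((A0 - np.outer(x, y) * mask) ** 2).sum()
        if phi_old - phi < eps * phi or phi < eps:   # halting criteria
            break
        phi_old = phi
    return x, y
```

Successive exhaustion then repeats this call on the residuals $A - P_1$, $A - P_1 - P_2$, and so on.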

1.2 Principal Component Method for Tables with Gaps

The method described in the previous section produces straight lines passing through the origin of coordinates. Such homogeneous models are not always adequate. Let us expand the initial table not over matrices of the form $P = x_i y_j$ but over matrices of the form $P = x_i y_j + b_j$. This brings us to the next problem:

$$\Phi = \sum_{i,j:\ a_{ij} \neq @} (a_{ij} - x_i y_j - b_j)^2 \to \min. \qquad (2)$$

Solution of problem (2) yields models of the data by linear manifolds not necessarily passing through the origin of coordinates.

The basic procedure is to find the best approximation of the table with gaps by a matrix of the form $x_i y_j + b_j$ [4, 5].

With the vectors $y_j$ and $b_j$ fixed, the values $x_i$ providing the minimum of the form (2) are defined unambiguously and simply from the equalities $\partial\Phi/\partial x_i = 0$.

Similarly, with the vector $x_i$ fixed, the values $y_j$ and $b_j$ providing the minimum of the form (2) are defined explicitly from the two equalities $\partial\Phi/\partial y_j = 0$ and $\partial\Phi/\partial b_j = 0$.
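Writing these equalities out explicitly (a step the text leaves implicit; the notation $n_j$ for the number of known entries in column $j$ matches its use below) gives, for fixed $y_j$, $b_j$,

$$x_i = \frac{\sum_{j:\ a_{ij} \neq @} (a_{ij} - b_j)\, y_j}{\sum_{j:\ a_{ij} \neq @} y_j^2},$$

and, for fixed $x_i$, for each column $j$ the pair $(y_j, b_j)$ solves the $2 \times 2$ linear system

$$y_j \sum_{i:\ a_{ij} \neq @} x_i^2 + b_j \sum_{i:\ a_{ij} \neq @} x_i = \sum_{i:\ a_{ij} \neq @} a_{ij} x_i, \qquad
y_j \sum_{i:\ a_{ij} \neq @} x_i + b_j\, n_j = \sum_{i:\ a_{ij} \neq @} a_{ij}.$$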

Initial values:

$y$ is random, normalized to 1 (i.e. $\sum_j y_j^2 = 1$);

$b_j = \frac{1}{n_j} \sum_{i:\ a_{ij} \neq @} a_{ij}$, where $n_j = \sum_{i:\ a_{ij} \neq @} 1$ (the number of known data), i.e. $b_j$ is determined as the mean value of a column.

Setting practically arbitrary initial approximations for $y_j$ and $b_j$, we find the values $x_i$; then, assuming $y_j$ and $b_j$ unknown, we find their values for fixed $x_i$, and so on; these simple iterations converge.

The halting criterion is the same as for problem (1).

Successive exhaustion of the matrix A. For a given matrix A, find the best approximation by a matrix $P_1$ of the form $x_i y_j + b_j$. Then find for $A - P_1$ the best approximation $P_2$ of the same form, and so on. The checking can be done, e.g., by the residual dispersion of the columns.

In the case of no gaps the described method leads to ordinary principal components - the singular expansion of the centered initial table of data. In this case, starting with $q = 2$, $P_q = x_i^q y_j^q$ ($b = 0$). In the general case this is not necessarily so. It should be emphasized that centering (transition to zero mean values) is not applicable to data with gaps.

Q-factor filling of gaps is their definition from the sum of the $Q$ obtained matrices of the form $x_i y_j + b_j$.

Q-factor repairing of a table is its substitution with the sum of the $Q$ obtained matrices of the form $x_i y_j + b_j$.

Let us describe the procedure of recovering the data in a line $a_j$ with gaps (some $a_j = @$) arriving for processing. Let there be constructed a sequence of matrices $P_q$ of the form $x_i^q y_j^q + b_j^q$, exhausting the initial matrix A with a preset accuracy. For each q, determine the number $x^q(a)$ and the vector $a_j^q$ from the given line:

$$a_j^0 = a_j \ (a_j \neq @); \qquad
x^q(a) = \frac{\sum_{j:\ a_j \neq @} (a_j^{q-1} - b_j^q)\, y_j^q}{\sum_{j:\ a_j \neq @} (y_j^q)^2}; \qquad
a_j^q = a_j^{q-1} - b_j^q - x^q(a)\, y_j^q \ (a_j \neq @). \qquad (3)$$

Here the manifold M is a straight line, the coordinates of points on M are assigned by the parametric equation $z_j = t y_j + b_j$, and the projection $\mathrm{Pr}_M(a)$ minimizing the distance (2) is

$$\mathrm{Pr}_M(a) = t(a)\, y_j + b_j, \qquad
t(a) = \frac{\sum_{j:\ a_j \neq @} (a_j - b_j)\, y_j}{\sum_{j:\ a_j \neq @} y_j^2}.$$

For recovery by the Q-factor model assume

$$a_j = \sum_{q=1}^{Q} \big( x^q(a)\, y_j^q + b_j^q \big) \quad (a_j = @). \qquad (4)$$

When there are no gaps the derived straight lines are orthogonal and we have an orthogonal system of factors. For incomplete data this is not the case, but the derived system of factors can be orthogonalized: the process consists in recovering the initial table by the derived system of factors, after which the system is recalculated, now on the complemented data.
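A numpy sketch of this recovery for one incoming line (names ours; `factors` is the list of pairs $(y^q, b^q)$ produced by the successive exhaustion, and np.nan marks @):

```python
import numpy as np

def recover_line(a, factors):
    """Fill the gaps of a line `a` by formulas (3)-(4)."""
    known = ~np.isnan(a)
    residual = np.where(known, a, 0.0)
    filled = np.zeros_like(a)
    for y, b in factors:
        residual = residual - np.where(known, b, 0.0)
        # x^q(a): projection over the known coordinates, formula (3)
        t = residual[known] @ y[known] / (y[known] @ y[known])
        filled += t * y + b                    # partial sums of formula (4)
        residual = np.where(known, residual - t * y, 0.0)
    return np.where(known, a, filled)          # gaps get the model values
```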

1.3 Mechanical Interpretation

Consider mechanical concepts underlying construction of the modeling manifold.

Figure 1

Let the space of data contain a straight rigid beam (Fig. 1). Let every datum be connected with the beam by a spring whose end can move along the beam. Fix the initial position of the beam and find the positions of the springs that correspond to the minimum of elastic energy. Then fix the positions of the spring ends on the beam, release the beam, and let it regain mechanical equilibrium. Then fix the beam in its new position and again release the spring ends.

The system monotonically approaches equilibrium - the minimum of its energy - as at every stage the strain energy decreases. The obtained beam is nothing but the first principal component. The projection of a datum is determined by the point where its spring is fixed on the beam.

The iterative finding of the coefficients $x_i$, $y_j$ and $b_j$ described in the previous section is fully analogous to this mechanical process: the position of the beam is determined by the coefficients $y_j$, $b_j$, and the points where the springs are fixed by the coefficients $x_i$.

The data with gaps are modeled by rigid rods (one gap), planes, etc. The corresponding spring can move freely along these objects. This means that the spring effectively connects the shadow (projection) of the beam on the subspace of the known data with the data point in this subspace.

2 THE QUASILINEAR MODEL

2.1 The Principal Curve Method

Take the principal curve method as a prototype to construct the nonlinear models [1-3].

Definition 1 (principal curve). Let X denote a random vector in $R^d$ and let there be n data vectors. The principal curve $f \subset R^d$ is a smooth ($C^\infty$) curve (of dimension 1) in $R^d$, parametrized by the parameter $\lambda \in \Lambda \subset R$, going through the middle of the d-dimensional data described by X:

$$f(\lambda) = (f_1(\lambda), \ldots, f_d(\lambda)) = E\{X \mid \lambda_f(X) = \lambda\}, \qquad (5)$$

where

$$\lambda_f(x) = \sup\{\lambda : \|x - f(\lambda)\| = \inf_\mu \|x - f(\mu)\|\} \qquad (6)$$

is the function of projection on the curve.

In brief, every point of the principal curve is the mean value of all data points projected onto it. The mean square error between the data points and their closest projections on the principal curve at the j-th iteration is

$$MSE^{(j)} = E\,\big\|X - f^{(j)}\big(\lambda_{f^{(j)}}(X)\big)\big\|^2. \qquad (7)$$

The algorithm of constructing the principal curve is an iteration involving the calculation of the mathematical expectation (5) and of the projection (6) at each step.

2.2 A Method of Constructing Quasilinear Models

The simplest version of nonlinear factor analysis is built on top of the linear one. It is proposed to use quasilinear models admitting simple explicit formulas for data processing, based on the described algorithm of constructing linear models.

Let, as in the case of linear models, there be a table A with gaps. The quasilinear model is constructed in several stages:

1) Construct a linear model, i.e. solve problem (2). For definiteness assume $(y, b) = 0$, $(y, y) = 1$; this can always be achieved.

2) Interpolate (smoothen). Construct a vector-function $f(t)$ minimizing the functional

$$\Phi = \sum_{i,j:\ a_{ij} \neq @} \Big( a_{ij} - f_j\Big(\sum_k a_{ik} y_k\Big) \Big)^2 + \alpha \int (f''(t))^2\, dt, \qquad (8)$$

where $\alpha > 0$ is the smoothening parameter.

The problem is solved by cubic splines whose coefficients are found by setting to zero the corresponding partial derivatives of $\Phi$ (8) on a certain uniform grid.

The problem can also be solved with a polynomial of small degree; however, although such a solution makes it possible to obtain a satisfactory interpolation with little computational effort, it cannot yield a good extrapolation.

3) Extrapolation. The obtained function is extrapolated over the entire real axis.
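A rough sketch of stage 2 in scipy (names ours; scipy's UnivariateSpline uses a residual-bound formulation of spline smoothing rather than the penalty form (8), so its parameter s plays the role of $\alpha$ only loosely, and complete rows are assumed for brevity):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_factors(A, y, s=1.0):
    """One smoothing spline per column j, fitted to the pairs
    (t_i, a_ij), where t_i = sum_k a_ik y_k is the projection of
    the i-th row; the t_i must be strictly increasing, so tied
    projections would have to be merged first."""
    t = A @ y
    order = np.argsort(t)
    return [UnivariateSpline(t[order], A[order, j], s=s)
            for j in range(A.shape[1])]
```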

2.2.1 Interpolation with a polynomial of small degree

In the case of approximation by a polynomial of degree n, the problem is to find the best approximation of the matrix A by polynomials of the form $f_j(x) = f_n^j x^n + f_{n-1}^j x^{n-1} + \ldots + f_1^j x + f_0^j$, i.e.

$$\Phi = \sum_{i,j:\ a_{ij} \neq @} \Big( a_{ij} - f_j\Big(\sum_k a_{ik} y_k\Big) \Big)^2 + \alpha \int (f''(t))^2\, dt \to \min,$$

where $\alpha > 0$ is the smoothening parameter.

2.2.2 Interpolation by cubic splines

The following smoothening problem is to be solved by cubic splines:

$$\Phi = \sum_{i,j:\ a_{ij} \neq @} \Big( a_{ij} - f_j\Big(\sum_k a_{ik} y_k\Big) \Big)^2 + \alpha \int (f''(t))^2\, dt \to \min,$$

where $\alpha > 0$ is the smoothening parameter and $f$ is an arbitrary smooth function.

2.3 The Problem of Extrapolation, Optimum Analytical Continuation and Carleman's Formula

The problem of extrapolating the available data beyond their range is well known. Its solution cannot be omitted: there is no guarantee that the data to follow will fall exactly into the variation range of the available data, and it is not always possible to restrict consideration to this range. The necessity to construct formulas valid for all possible data values is dictated by two more circumstances: first, the smoothened dependence constructed at the first stage is fundamentally an interpolation and cannot be extrapolated; second, it actually carries, in explicit form, the information about each line of the data matrix. Smoothening, e.g., merely by a polynomial of small degree by the least squares method is free from the second drawback (the information is "reduced" to several coefficients) but does not yield a good extrapolation.

A common (but fairly rough) extrapolation by straight lines is the following: the resulting function $f(t)$ is extrapolated from a segment (e.g. $[a, b]$) over the entire real axis by the first-order approximations constructed at the ends of the segment, $f(t) = f(a) + f'(a)(t - a)$ for $t < a$ and $f(t) = f(b) + f'(b)(t - b)$ for $t > b$.

Optimal extrapolation is of more interest. Its stricter statement calls for the problem of analytical extension of a function (from a finite set of points on a straight line or in space). It is also convenient to pass from the real variable t to a band on the plane of complex variables.

So, under consideration is the problem of analytical extension of a function given on an infinite sequence of points $\{t_k\}$ ($k = 1, 2, \ldots$). It is required to construct a formula of extension from a finite set that is the best in the following sense: the sequence of functions $f_n(t)$ obtained by extension from the sets $\{t_k\}$ ($k = 1, 2, \ldots, n$) converges faster than for all other formulas of this class. Of course, this requires additional definitions: what the "formulas of this class" are, what convergence is meant, etc. All this has been done in the relevant mathematical literature [6].

The smoothened vector-function $f(t)$ can be optimally extrapolated from a certain finite set $\{t_k\}$ (not necessarily related to the projections of the initial data lines onto the straight line $z_j = t y_j + b_j$):

$$f(t) \approx ty + b + \sum_{k=1}^{n} \big(f(t_k) - t_k y - b\big)\,
\frac{2\,(e^{\lambda t} - e^{\lambda t_k})}{\lambda\,(e^{\lambda t} + e^{\lambda t_k})(t - t_k)}
\prod_{\substack{j=1 \\ j \neq k}}^{n}
\frac{(e^{\lambda t_k} + e^{\lambda t_j})(e^{\lambda t} - e^{\lambda t_j})}
{(e^{\lambda t_k} - e^{\lambda t_j})(e^{\lambda t} + e^{\lambda t_j})}, \qquad (9)$$

where $\lambda$ is the parameter of the method, specifying the width of the band on the plane of complex numbers in which the extrapolated function is guaranteed to be holomorphic (the width is $\pi/\lambda$).

Generally, to extrapolate according to Carleman, for the set of nodes $\{t_k\}$ we take points placed uniformly on the segment, rather than the initial experimental data. The values of $f(t)$ at these points are found by the interpolation formulas.

By Carleman's formulas we extrapolate the deviation of the curve $f(t)$ from the straight line $ty + b$. Carleman's formulas provide a good extrapolation of analytical functions over the entire straight line (it cannot be guaranteed, of course, that in every specific case formula (9) will assure the best extrapolation; yet there are several theorems stating that formula (9) and related formulas yield the best approximation for various classes of analytical functions [6]).

Smoothing and extrapolation by Carleman's formula can be combined into a single process, i.e. interpolation and extrapolation can be done simultaneously.
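For illustration, a direct numpy transcription of (9) (names ours; the deviation values $g_k = f(t_k) - t_k y - b$ are extended coordinatewise, and the expression is undefined exactly at the nodes $t = t_k$):

```python
import numpy as np

def carleman_term(t, t_nodes, k, lam):
    """k-th basis function of Carleman's formula (9); it tends to 1
    at t_k and vanishes at the other nodes."""
    e_t, e_n = np.exp(lam * t), np.exp(lam * t_nodes)
    term = 2.0 * (e_t - e_n[k]) / (lam * (e_t + e_n[k]) * (t - t_nodes[k]))
    for j in range(len(t_nodes)):
        if j != k:
            term *= ((e_n[k] + e_n[j]) * (e_t - e_n[j])
                     / ((e_n[k] - e_n[j]) * (e_t + e_n[j])))
    return term

def carleman_extrapolate(t, t_nodes, g_nodes, lam=1.0):
    """Extend the deviations g_k = f(t_k) - t_k*y - b to arbitrary t."""
    return sum(g * carleman_term(t, t_nodes, k, lam)
               for k, g in enumerate(g_nodes))
```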

2.4 Mechanical Interpretation

Assuming the beam able to deviate elastically from its straight form, we have the following picture (Fig. 2).

Figure 2

The points where the springs are fixed on the beam are determined by the projections onto the straight beam (as the model is quasilinear).

The problem arises of determining the behavior of the beam ends beyond the boundaries of the data range - the extrapolation problem described above.

2.5 Application of Quasilinear Models

A point on the constructed curve $f(t)$ corresponding to a complete data vector $a$ is constructed as $f((a, y))$; this is the quasilinearity of the method: first we find the projection of the data vector onto the straight line, $\mathrm{Pr}(a) = ty + b$, $t = (a, y)$, and then we construct the point $f(t)$ on the curve. The same holds for incomplete data vectors: first we find the nearest point $t(a)$ on the straight line, then the respective point on the curve, $f(t)$ with $t = t(a)$.

After the curve $f(t)$ is constructed, the data matrix is replaced with the matrix of deviations from the model. Then we again find the best approximation of the form $x_i y_j + b_j$ for the matrix of deviations, again construct the smoothening, then extrapolate by Carleman, and so on, until the deviations come sufficiently close to zero.

As a result the initial table takes the form of the Q-factor model:

$$a_{ij} \approx \sum_{q=1}^{Q} f_j^q(t_i^q), \qquad (10)$$

where $t_i^q$ is the projection of the i-th line at the q-th step.

If $a_{ij} \neq @$, this formula approximates the initial data; otherwise it yields a method of data recovery.

3 NEURAL CONVEYOR

The constructed algorithm admits a neural network interpretation. Connected to each curve $f^q(t)$ is one summator (its weights are the coordinates of the vector $y^q$), a set of n free summands ("thresholds") - the coordinates of the vector $b^q$ - and n nonlinear converters, each of which calculates one coordinate of a point on the curve by formula (9). Such a "neuron" acts on the vector $a$ of input signals (with gaps) as follows: $t(a)$ is calculated as in (3) (operation of the summator); then the nonlinear elements calculate $f_j^q(t(a))$, after which the difference $a_j - f_j^q(t(a))$ ($a_j \neq @$) is transmitted to the following neuron. As $a$ travels along this conveyor, the sum of the values $f_j^q(t(a))$ ($a_j = @$) builds up. It is these sums that form the vector of output signals - the proposed values of the missing data. Should the need arise to repair the data, the sum of the values $f_j^q(t(a))$ is built up for each coordinate j.

The structure of the neurons is not standard (Fig. 3): each has one input summator and n nonlinear converters (in compliance with the dimension of the data vector).

Figure 3 - Neuron: input summator and nonlinear converters

Operation of the summator is not quite ordinary either: for incomplete data vectors it calculates the scalar product with the available data and performs an additional normalization. This additional normalization of the input weights of the summator takes into account only those weights for which the respective values of the input vector coordinates are known.
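A sketch of the whole conveyor in numpy (all names ours; `stages` is a list of triples $(y^q, b^q, f^q)$, where $f^q$ maps a scalar t to an n-vector, e.g. splines extended by Carleman's formula as above):

```python
import numpy as np

def conveyor_fill(a, stages):
    """Pass one data line `a` (np.nan = @) through the conveyor:
    each neuron projects the residual with its summator, applies
    its n nonlinear converters, and hands the residual on."""
    known = ~np.isnan(a)
    residual = np.where(known, a, 0.0)
    output = np.zeros_like(a)
    for y, b, f in stages:
        # summator: scalar product normalized over known coordinates only
        t = (residual[known] - b[known]) @ y[known] / (y[known] @ y[known])
        z = f(t)                   # nonlinear converters, one per coordinate
        output += z                # builds up the proposed values
        residual = np.where(known, residual - z, 0.0)
    return np.where(known, a, output)
```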

4 THE SELF-ORGANIZING CURVES (SOC)

4.1 The Idea of Self-Organizing Curves

The quasilinear factors are good for "moderately nonlinear" problems, where the contribution of the linear part is relatively substantial (about 50% or more). For essentially nonlinear problems it is natural to use some approximation of principal curves instead of linear principal components.

Instead of linear manifolds of small dimension we propose to use the corresponding self-organizing maps (SOM) or, in other words, self-organizing curves (SOC). The method we use differs somewhat from the method of Kohonen maps [7] by a more transparent physical interpretation and an explicit form of the variation principle.

Let a SOC be defined by a set of points (kernels) successively placed on a curve (in the first approximation let the SOC be simply a broken line Y), and let it be required to map onto it a set of data points $X = \{x_i\}$. Introduce the operator which, for every vector $x \in X$, associates with it the nearest point of Y:

$$x \mapsto y_j: \quad \|y_j - x\| \to \min; \qquad (11)$$

each kernel $y_j$ is associated with its taxon

$$K_j = \{x \in X \mid x \mapsto y_j\}. \qquad (12)$$

The method of constructing a SOC resembles the methods of dynamic kernels, except for the additional restrictions on connectivity and elasticity. The minimized value is constructed from the following summands:

the measure of data approximation,

$$D_1 = \sum_j \sum_{x \in K_j} \|x - y_j\|^2; \qquad (13)$$

the measure of connectivity (points that are close on the curve must be close in the data space),

$$D_2 = \sum_j \|y_j - y_{j+1}\|^2; \qquad (14)$$

the measure of nonlinearity,

$$D_3 = \sum_j \|2y_j - y_{j-1} - y_{j+1}\|^2. \qquad (15)$$

So, to construct the SOC we have to minimize the functional

$$D = \frac{D_1}{|X|} + \lambda\,\frac{D_2}{m} + \mu\,\frac{D_3}{m} \to \min, \qquad (16)$$

where $\lambda$ and $\mu$ are the parameters of connectivity and nonlinearity, the "moduli of elasticity" (division by the number of points $|X|$ and by the number of kernels m normalizes "per summand" and makes it possible to vary $\lambda$ and $\mu$ in the same way for samplings of different size).

When the division of the data set into taxons is fixed, the SOC is constructed unambiguously: a simple linear problem is solved. When the positions of the kernels are fixed, the taxons, in turn, are easily constructed by formulas (11), (12). Dividing the problem into successive searches, kernels - taxons - kernels, we obtain an algorithm whose convergence is ensured by the fact that the criterion D (16) decreases at each step.
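A numpy sketch of this alternation (all names and the initialization along the first principal axis are ours; with the taxons fixed, setting the gradient of (16) to zero gives a single linear system shared by all coordinates):

```python
import numpy as np

def fit_soc(X, m=10, lam=1.0, mu=1.0, n_iter=50):
    """Alternate taxon assignment (11)-(12) with the linear solve
    that minimizes (16) for fixed taxons."""
    n, d = X.shape
    x0 = X.mean(axis=0)
    u = np.linalg.svd(X - x0, full_matrices=False)[2][0]
    Y = x0 + np.outer(np.linspace(-1.0, 1.0, m), u * X.std())
    E = np.diff(np.eye(m), axis=0)        # first differences, for D2
    S = np.diff(np.eye(m), n=2, axis=0)   # second differences, for D3
    for _ in range(n_iter):
        idx = np.argmin(((X[:, None, :] - Y[None]) ** 2).sum(-1), axis=1)
        counts = np.bincount(idx, minlength=m)
        sums = np.zeros((m, d))
        np.add.at(sums, idx, X)
        M = np.diag(counts / n) + (lam / m) * E.T @ E + (mu / m) * S.T @ S
        Y = np.linalg.solve(M, sums / n)  # dD/dY = 0 with taxons fixed
    return Y
```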

4.2 The Local Minimum Problem

As opposed to the linear and quasilinear cases, the problem of minimizing the functional (16) is not convex, and difficulties arise that are associated with getting stuck near a local minimum. This can result in an unsatisfactory solution of the problem.

Even though there are numerous methods to cope with this problem, we would rather dwell upon the multigrid method and the "annealing" method.

4.3 The Smoothening Problem

The obtained broken line $\{y_j\}$ ($j = 1..m$) can be smoothened by different methods, e.g. by cubic splines. This, however, brings forth certain difficulties associated with finding the projections of the data onto the smoothened curve, as this requires solving algebraic equations of the 5th degree.

This called for the construction, over the initial broken line, of a continuous projector. The ends of the broken line correspond to the values -1 and 1, respectively; the projections of the nodes onto the segment [-1, 1] are determined by the nodes of a uniform grid (only the number of the nodes of the broken line is considered) or of a non-uniform grid (the distances between the nodes of the broken line are also taken into account).

Final smoothening is done by Carleman's formulas. The broken line $\{y_j\}$ ($j = 1..m$) replaces here the principal component of the classical method, while the smoothening process is analogous to the construction of the quasilinear model.


4.4 Mechanical Interpretation

The above described linear and quasilinear models have a very strong restriction: the beam is either rigid, or it is flexible but built along a straight line. This is essential, e.g., when the data are placed not along some straight line but along a circumference (or, at least, along a strongly curved arc).

To circumvent this, the beam should be elastic (determined not by a straight line but by a curve). This presents difficulties in defining the distance from a point to a curve in space (even more so when it is not a point but a linear manifold).

In the method described above, which is close to the method of Kohonen's self-organizing maps, the sought-for elastic beam is presented in the form of a broken line whose nodes are freely connected with the data (Fig. 4).

In analogy to the rigid case, the system regains its equilibrium in several iterations. Their number is finite, as at every step the springs decrease their total energy, and the number of possible states (ways to attach the nodes of the broken line to the data by the springs) is finite.


Figure 4

The introduced moduli of elasticity represent, respectively, the degree of attraction of the nodes of the broken line to each other and the degree of resistance to bending at the nodes.

5 EXPERIMENTAL RESULTS

Let us illustrate the process of modeling data with gaps using the table of presidential elections in the USA, with 31 election situations (from 1860 to 1980). Each election in the table is described by 12 binary features [10]:

1. Has the presidential party (P-party) been in power for more than one term? (More1)

2. Did the P-party receive more than 50% of the popular vote in the last election? (More50)

3. Was there significant activity of a third party during the election year? (Third)

4. Was there serious competition in the P-party primaries? (Conc)

5. Was the P-party candidate the president at the time of the election? (Prez)


6. Was there a depression or recession in the election year? (Depr)

7. Was there an average annual growth in the gross national product of more than 2.1% in the last term? (Val2.1)

8. Did the P-party president make any substantial political changes during his term? (Chan)

9. Did significant social tension exist during the term of the P-party? (Wave)

10. Was the P-party administration guilty of any serious mistakes or scandals? (Mist)

11. Was the P-party candidate a national hero? (R.Hero)

12. Was the O-party candidate a national hero? (O.Hero)

The table also contains information about results of elections (victory of the presidential or opposition party).

The values of the binary features are equal to -1 (answer "no" for an input feature, or victory of the presidential party) and to 1 (answer "yes" for an input feature, or victory of the opposition).

The models constructed from this table confidently predicted the results of the second election of Reagan, the victory of Bush over Dukakis, and both victories of Clinton [11].

The degrees of approximation of the table (in per cent of the initial value) by several factors, according to each model, were as follows (Table 1).

If the error in the calculated value of a feature is less than 50%, the value is recovered exactly (the features are qualitative; therefore, with an error of less than 50%, the sign of the prediction defines the exact value).

Four factors suffice for satisfactory prediction by linear models, which indicates that this is a 4-factor problem (in the ordinary meaning of the word).

The quasilinear models, and SOC-based models in particular, usually need only one nonlinear factor for satisfactory prediction.

The produced sets of factors were tested as follows:

1. A model was constructed on a complete table, then gaps were randomly added to the table, following which the procedure of filling the gaps was launched. The obtained values were compared to the initial ones.

The testing demonstrated that up to 25% of gaps (of the total number of initial data) were satisfactorily filled by the linear and quasilinear models. For the SOC-based models this index is 50%: even if every other datum is eliminated from the table, it can still be recovered with satisfactory accuracy.

2. Gaps were randomly introduced into the table, then a model was constructed on the basis of the gapped table, following which the procedure of filling the gaps was launched. The obtained values were compared to the initial ones. The filling was satisfactory when the gaps amounted to 10% of the total number of initial data.

The method was also tested on the problem of predicting complications of myocardial infarction. The table of data on complications presents observations of 1700 patients with 126 parameters each. Experiments demonstrated that the first 15-20 quasilinear factors are sufficient to satisfactorily repair most values of the features in the table.

DISCUSSION

A method to fill gaps and repair data with gaps has been developed. Three versions of the method have been presented, from the simplest linear models to the method of principal curves for data with gaps. The neural network implementation of the method makes it easy to construct parallel implementations.

Table 1 - Approximation (%) depending on the number of factors

№   Feature   Linear model           Quasilinear model      SOC
              1      4      10       1      4      10       1      4      10
1   More1     11.88  59.98  77.36    25.37  63.75  95.91    53.49  80.81  96.85
2   More50     9.07  61.10  79.43    14.99  73.69  95.18    30.77  75.89  95.12
3   Third     29.66  44.89  91.56    31.73  66.97  97.45    32.94  76.83  96.93
4   Conc      62.30  63.28  77.84    69.51  77.12  90.24    72.63  78.86  95.42
5   Prez      31.72  59.68  80.27    45.58  68.01  93.03    56.08  74.85  95.18
6   Depr      32.17  52.43  93.38    37.95  71.08  95.56    58.86  80.53  95.62
7   Val2.1     4.12  37.67  94.19     6.27  69.22  96.53    28.80  72.23  95.44
8   Chan       2.33  49.87  86.19    16.81  61.15  94.95    13.01  72.01  93.77
9   Wave      25.13  62.33  80.34    33.18  66.82  95.65    32.68  63.96  96.71
10  Mist      50.34  61.05  86.17    64.83  70.52  97.55    60.80  81.35  96.90
11  R.Hero    33.35  48.12  90.69    54.86  66.27  97.52    27.30  83.67  95.76
12  O.Hero    36.55  50.07  92.03    45.69  68.41  97.55    52.22  76.42  96.17
13  Answer    69.22  69.96  81.78    92.27  92.85  96.72    97.82  98.43  99.50

The given algorithm of filling gaps, as opposed to many other algorithms designed for this purpose, does not require a priori filling of the gaps. However, it calls for preliminary normalization ("dedimensionalizing") of the data: transition, in each column of the table, to a "natural" unit of measurement. It is noteworthy that centering of the data cannot turn the problem of processing data with gaps into a homogeneous problem.

Of great interest is the question: how many summands (principal curves) should be taken to process the data? There are several versions of the answer, yet most of them reduce to the heuristic formula: the number of summands must be minimal among those that provide satisfactory (tolerable) testing of the method on the known data. Such a principle of "minimum sufficiency" is specific to many neural network applications [8, 9].

The method developed manifests itself in the form of an "ansatz" - a proposal rather than a series of theorems. This is not incidental: we propose a technology for constructing plausible evaluations of the missing data, not of their unknown real values.

REFERENCES

1. Hastie T., Stuetzle W. Principal curves. Journal of the American Statistical Association, 1989, Jun. V. 84, No. 406. PP. 502-516.

2. LeBlanc M., Tibshirani R. Adaptive principal surfaces. Journal of the American Statistical Association, 1994, Mar. V. 89, No. 425. PP. 53-64.

3. Kramer M.A. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 1991. V. 37, No. 2. PP. 233-243.

4. Gorban A.N., Makarov S.V., Rossiev A.A. Neural conveyor to recover gaps in tables and construct regression by small samplings with incomplete data // Matematica. Computer. Obrazovanie. Vyp. 5. Part II. Selected Transactions / Ed. G.Yu. Riznichenko. M.: Progress-Traditsiya Publishers, 1998. PP. 27-32.

5. Rossiev A.A. Modeling data by curves to recover gaps in tables // Neuroinformatics methods / Ed. A.N. Gorban, Krasnoyarsk: KSTU Press, 1998. PP.6-22.

6. Aizenberg L.A. Carleman's Formulas in Complex Analysis. Theory and Applications / Ed. M. Hazewinkel - Kluwer Academic Publishers-Dordrecht/Boston/London, 1993, 300 p.

7. Kohonen T. Self-Organizing Maps. Springer: Berlin - Heidelberg, 1997.

8. Gorban A.N., Rossiev A.A. Neural networks on PC. Novosi-birsk:Nauka, 1996. 276 p.

9. Neuroinformatics / A.N. Gorban, V.L. Dunin-Barkovsky, E.M. Mirkes et al. Novosibirsk: Nauka (Sib. Otd-nie), 1995. 256 p.

10. Lichtman A.J., Keilis-Borok V.I., Pattern Recognition as Applied to Presidential Elections in U.S.A., 1860-1980; Role of Integral Social, Economic and Political Traits, Contribution N 3760. 1981, Division of Geological and Planetary Sciences, California Institute of Technology.

11. Gorban A.N., Waxman C. Neural Networks for Political Forecast. Proceedings of the WCNN'95 (World Congress on Neural Networks'95, Washington DC, July 1995). PP. 176-178.

Received 08.02.2000; after revision 01.03.2000.
