On non-parametric models of multidimensional non-inertial processes with dependent input variables (scientific article, field: Mathematics)

CC BY
Keywords
Nonparametric modeling / non-inertial processes with delay / indicator function / H-process / non-inertial object with delay

Abstract of a scientific article in mathematics; authors: Medvedev Alexander V., Chzhan Ekaterina A.

The problem of identification of multidimensional non-inertial systems with delay is considered. The components of the input vector are stochastically related, and this relationship is unknown a priori. Such processes have a "tubular" structure in the space of input and output variables, and the methods of the identification theory of non-inertial systems are not applicable to them. In general, it is not known a priori whether a process has a "tubular" structure. To resolve this question, the problem of estimating the volume of the subdomain in which the "tubular" process takes place is considered. The initial data for this problem are the measurements of the input and output variables. An algorithm is proposed for estimating the volume of the "tubular" subdomain relative to the volume of the investigated process, which is always known from a priori information or production schedules. Numerical experiments carried out by the method of statistical modeling show the high effectiveness of the proposed algorithm.




UDC 512.54

On Non-parametric Models of Multidimensional Non-inertial Processes with Dependent Input Variables

Alexander V. Medvedev*

Siberian State Aerospace University, Krasnoyarsky Rabochy, 31, Krasnoyarsk, 660014, Russia

Ekaterina A. Chzhan^

Institute of Information and Space Technology, Siberian Federal University, Svobodny, 79, Krasnoyarsk, 660041, Russia

Received 08.11.2016, received in revised form 12.03.2017, accepted 20.07.2017

Keywords: non-parametric modeling, non-inertial processes with delay, indicator function, H-process. DOI: 10.17516/1997-1397-2017-10-4-514-521.

Introduction

One of the key factors in identifying processes in various sectors of human activity (economy, production) is the use of a priori information about the process under investigation. A sample of observations of the input and output variables can be obtained on site in an experiment. In many practical problems these variables are stochastically related, and the nature of this relationship is often unknown a priori. Ragnar Frisch drew attention to this fact when creating economic models [1]. He introduced the term multicollinearity for stochastic relationships between input variables. A close linear correlation between input variables reduces the accuracy of the estimated model coefficients or even makes it impossible to obtain the estimates [2]. This phenomenon is typical of many industries. For example, correlations between world financial indicators were found in [3], and the authors suggested using in predictive models only variables that are not linearly related. A linear model of net profit based on the actual financial statements of the "Svyaz" company was obtained using input variables that are not linearly related [4]. Multicollinearity is also typical of processes in genetics [5] and ecology [6]. Parametric linear models are traditionally applied to such processes. We consider the situation in which the stochastic relationships between input variables are non-linear, and we propose models of "tubular" processes.

* medvedev@sibgau.ru tekach@list.ru © Siberian Federal University. All rights reserved

We consider dynamic processes. In practice, input variables are often measured at short intervals Δt, for example, by electrical sensors (current, frequency, temperature, humidity, etc.). Some output variables can be measured only at a substantially longer interval ΔT, ΔT ≫ Δt (chemical analysis, physical and mechanical testing, etc.). Thus, the duration of the investigated process may be considerably less than the interval ΔT. In this case, the main idea is to treat such a channel as non-inertial with delay and to formulate the corresponding problem of identification and control.

1. Problem statement

A general scheme of the identification process is shown in Fig. 1. The input vector $u(t) = (u_1(t), u_2(t), \dots, u_m(t)) \in \Omega(u) \subset \mathbb{R}^m$ has dimension $m$, and the output vector $x(t) = (x_1(t), x_2(t), \dots, x_n(t)) \in \Omega(x) \subset \mathbb{R}^n$ has dimension $n$. For simplicity, we consider the case of a scalar output variable $x(t)$. The channels $G^{u_1}, G^{u_2}, \dots, G^{u_m}, G^{x}$ correspond to the input and output variables and include the control (measurement) tools. The random measurement errors have zero mean and bounded variance. The object can be described as follows:

$$x(t + \tau) = A(u(t)) + \xi(t), \qquad (1)$$

where $A$ is the unknown operator of the object, $\tau$ is the delay, and $\xi(t)$ is a random disturbance with zero mean and bounded variance.

Fig. 1. General scheme of the identification process

The input and output variables are continuous by the nature of the process, but the control tools measure them at discrete time steps Δt. There is an initial sample of observations $\{u_i, x_i,\ i = 1, 2, \dots, s\}$, where $s$ is the sample size.

The current information about the process (the sample of observations) and the available a priori information are supplied to the "model" unit, where $\hat{x}(t + \tau)$ is the model output. This unit contains a certain class of models, and the task is to formulate the model of the process within this class.

A peculiarity of these processes is the presence of a stochastic relationship between the input variables [7]. Such processes are called "tubular" processes or H-processes. Conventional identification algorithms do not give satisfactory results when modeling them, so the use of H-models is proposed.

2. "Tubular" processes

For simplicity and without loss of generality, we consider a process with two input variables $u_1$, $u_2$ and one output variable $x$. Let us assume that the values of the input variables $u_1$, $u_2$ and of the output variable $x$ lie in the interval $[0, 1]$. Since the domain of each variable is an interval, the process takes place in the unit hypercube (Fig. 2). If there is a relationship between the input variables:

$$u_1(t) = f(u_2(t)), \qquad (2)$$

then the process proceeds along a line (a curve, if $f$ is nonlinear) in three-dimensional space. In Fig. 2, the unit cube $\Omega(u,x)$ is the domain of the process, yet the observations do not fill the whole cube: they belong only to the line located inside it. Note that relationship (2) may be either linear or nonlinear.

Fig. 2. Simple scheme of a process with functionally related input variables

The relationship between input variables can be stochastic:

$$u_1(t) = f(u_2(t)) + \mu(t), \qquad (3)$$

where $\mu(t)$ is a random disturbance with zero mean and bounded variance. Let us denote the domain in which the "tubular" process proceeds by $\Omega_H(u,x)$. The volume of this subdomain is less than the volume of the hypercube $\Omega(u,x)$ (Fig. 3).
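To see why the volume of $\Omega_H$ is small, the following minimal Python sketch assumes, purely for illustration, a linear link $f(u_2) = u_2$ in (3) and a disturbance $\mu(t)$ bounded by $b$ in absolute value, so that the input points $(u_1, u_2)$ stay in the band $|u_1 - u_2| \le b$ of the unit square. It estimates by Monte Carlo the share of the square covered by this band and compares it with the exact area $1 - (1 - b)^2$. The names `f`, `in_tube` and `tube_fraction` are hypothetical.

```python
import random


def f(u2: float) -> float:
    """Hypothetical link u1 = f(u2) between the inputs (linear here)."""
    return u2


def in_tube(u1: float, u2: float, b: float) -> bool:
    """True if (u1, u2) is within b of the curve u1 = f(u2), measured along u1."""
    return abs(u1 - f(u2)) <= b


def tube_fraction(b: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo share of the unit square [0, 1]^2 that lies inside the tube."""
    rng = random.Random(seed)
    hits = sum(in_tube(rng.random(), rng.random(), b) for _ in range(n))
    return hits / n


for b in (0.2, 0.1, 0.05):
    # Exact area: the unit square minus two corner triangles of area (1 - b)^2 / 2 each.
    exact = 1 - (1 - b) ** 2
    print(f"b = {b}: Monte Carlo {tube_fraction(b):.3f}, exact {exact:.3f}")
```

For $b = 0.1$ the band already covers only about a fifth of the input square; each additional related input multiplies the share by another narrow factor, so in higher dimensions it shrinks much faster.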

The peculiarity of this process is that it proceeds not in the whole domain $\Omega(u,x)$ but only in the subdomain $\Omega_H(u,x)$. This must be taken into account when solving the identification problem for a "tubular" process; the use of conventional parametric models [8-10] for identifying "tubular" processes can lead to unsatisfactory results. Suppose that, to make a prediction, we set values of the input variables $u_1$ and $u_2$ that belong to the regulated domain $\Omega(u,x)$ but do not belong to the "tubular" subdomain $\Omega_H(u,x)$. The predicted value of the output variable may then fall outside $\Omega(u,x)$ (point C in Fig. 3). Such a value is easily eliminated because the boundary of $\Omega(u,x)$ is always known. A different situation occurs when the predicted value of $x$ lies in the regulated area but not in $\Omega_H(u,x)$ (point A); such a value is difficult to eliminate. Only point B belongs to the domain of the "tubular" process, i.e., $(u_B, x_B) \in \Omega_H(u,x) \subset \Omega(u,x)$.

Fig. 3. Scheme of a "tubular" process

2. Model of "tubular" process

Let us consider the use of conventional parametric models for identifying a stochastic process with related input variables. In particular, let an H-process be defined by a linear equation in three-dimensional space, so that in the absence of noise the graph of the process is a line. The conventional parametric model for this process has the form

$$\hat{x}(t + \tau) = A_\alpha(u(t), \alpha), \qquad (4)$$

where $A_\alpha$ is the selected class of parametric models and $\alpha$ is the parameter vector.

If we use several samples of observations, we obtain different estimates of the coefficients $\alpha$. Each such model is a plane, whereas the observations lie on a line, and infinitely many planes pass through a line. Hence the estimates that determine the position of the plane in the space of input and output variables can vary significantly from one sample to another, and such a model cannot adequately describe the investigated process.
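A minimal Python sketch of this instability, under assumptions of our own: a hypothetical H-process with $u_2 = u_1 + \mu$, $\mu$ uniform on $[-0.01, 0.01]$, and $x = u_1 + u_2 + \xi$, $\xi$ uniform on $[-0.1, 0.1]$, is fitted by ordinary least squares with the plane $\hat{x} = \alpha_1 u_1 + \alpha_2 u_2$. The function names `fit_plane` and `h_sample` are illustrative.

```python
import random


def fit_plane(u1, u2, x):
    """Least squares for x ~ a1*u1 + a2*u2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(p * p for p in u1)
    s22 = sum(q * q for q in u2)
    s12 = sum(p * q for p, q in zip(u1, u2))
    r1 = sum(p * y for p, y in zip(u1, x))
    r2 = sum(q * y for q, y in zip(u2, x))
    det = s11 * s22 - s12 * s12  # nearly zero when u1 and u2 are almost collinear
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


def h_sample(rng, s=50, link_noise=0.01, out_noise=0.1):
    """Hypothetical H-process: u2 = u1 + mu, x = u1 + u2 + xi, with uniform mu and xi."""
    u1 = [rng.random() for _ in range(s)]
    u2 = [p + rng.uniform(-link_noise, link_noise) for p in u1]
    x = [p + q + rng.uniform(-out_noise, out_noise) for p, q in zip(u1, u2)]
    return u1, u2, x


rng = random.Random(1)
estimates = [fit_plane(*h_sample(rng)) for _ in range(20)]
for a1, a2 in estimates[:5]:
    print(f"a1 = {a1:+.2f}, a2 = {a2:+.2f}, a1 + a2 = {a1 + a2:.3f}")
```

Across the repeated samples the individual estimates $\alpha_1$, $\alpha_2$ scatter widely, while their sum, which fixes the model along the line of the process, stays close to 2: the fitted plane rotates about the line from sample to sample.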

It is proposed to supplement conventional parametric models with an indicator function $I(u)$. Model (4) can then be rewritten as follows:

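$$\hat{x}(t + \tau) = A_\alpha(u(t), \alpha)\, I_s(u(t)), \qquad (5)$$

where the indicator $I_s(u)$ can be taken in the following form:

$$I_s(u) = \begin{cases} 1, & \text{if } \displaystyle\sum_{i=1}^{s} \prod_{j=1}^{m} \Phi\left(c_s^{-1}\left(u^j - u_i^j\right)\right) > 0, \\[2ex] 0, & \text{if } \displaystyle\sum_{i=1}^{s} \prod_{j=1}^{m} \Phi\left(c_s^{-1}\left(u^j - u_i^j\right)\right) = 0. \end{cases} \qquad (6)$$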

The smoothing parameter $c_s$ is found by minimizing a quadratic criterion of the discrepancy between the outputs of the object and the model. The minimization uses the method of "sliding examination" [10], i.e., leave-one-out cross-validation. The parameter $c_s$ and the bell-shaped function $\Phi\left(c_s^{-1}\left(u^j - u_i^j\right)\right)$ satisfy the convergence conditions [11].
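The following Python sketch shows one way to compute estimate (6) and to choose $c_s$ by sliding examination. Assumptions not fixed by the text: the bell function is the triangular kernel $\Phi(z) = \max(0, 1 - |z|)$ (a kernel with bounded support is needed, since a kernel that is positive everywhere, such as the Gaussian, would make the sum in (6) positive for every $u$ and the indicator identically 1); the quadratic criterion is the leave-one-out error of a Nadaraya-Watson estimate of the output; and $c_s$ is searched over a small hypothetical grid. All function names are illustrative.

```python
import random


def phi(z: float) -> float:
    """Triangular bell function with support (-1, 1); an assumed choice of Phi."""
    return max(0.0, 1.0 - abs(z))


def weight(u, ui, c):
    """Product kernel prod_j Phi((u^j - u_i^j) / c) over the m input components."""
    w = 1.0
    for a, b in zip(u, ui):
        w *= phi((a - b) / c)
    return w


def indicator(u, sample_u, c):
    """Estimate (6): 1 if the kernel sum over the learning sample is positive, else 0."""
    return 1 if sum(weight(u, ui, c) for ui in sample_u) > 0 else 0


def loo_error(sample_u, sample_x, c):
    """Sliding examination: mean squared leave-one-out error of a Nadaraya-Watson estimate."""
    err = 0.0
    for i, (ui, xi) in enumerate(zip(sample_u, sample_x)):
        num = den = 0.0
        for k, (uk, xk) in enumerate(zip(sample_u, sample_x)):
            if k != i:  # leave observation i out
                w = weight(ui, uk, c)
                num += w * xk
                den += w
        if den == 0.0:
            return float("inf")  # c is too small: point i has no neighbours
        err += (xi - num / den) ** 2
    return err / len(sample_x)


# Hypothetical learning sample of an H-process with inputs related as in (3).
rng = random.Random(0)
sample_u, sample_x = [], []
for _ in range(200):
    u1 = rng.random()
    u2 = u1 + rng.uniform(-0.05, 0.05)
    sample_u.append((u1, u2))
    sample_x.append(u1 + u2 + rng.uniform(-0.05, 0.05))

grid = [0.02, 0.05, 0.1, 0.2]
c_s = min(grid, key=lambda c: loo_error(sample_u, sample_x, c))
print("c_s =", c_s)
print(indicator((0.5, 0.5), sample_u, c_s))  # a point on the tube
print(indicator((0.9, 0.1), sample_u, c_s))  # far from the tube: 0
```

The grid search is only the simplest way to minimize the criterion; any one-dimensional minimizer over $c_s$ would serve the same purpose.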

The initial sample of observations $\{u_i, x_i,\ i = 1, 2, \dots, s\}$ is obtained by measuring the input and output variables. It is used to calculate the parameters $\alpha$ in model (5), and it also serves as the learning sample for estimating the indicator function (6). If the input variables are related, this relationship is contained in the initial sample, that is, all sample points belong to the "tubular" domain. Therefore, for real data obtained from the object, the estimate of the indicator function (6) in model (5) equals one. If model (5) is evaluated at a value $u' \in \Omega(u,x)$ that does not belong to the domain of the "tubular" process, the indicator function equals zero, which indicates that the process does not exist at this value $u'$. If there is no relationship between the input variables, model (5) coincides with the standard parametric model (4).
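A minimal Python sketch of how model (5) behaves, assuming that the parameters $\alpha$ of a linear parametric part $\alpha_1 u_1 + \alpha_2 u_2$ have already been estimated (the values below are hypothetical), a triangular bell function, and a fixed $c_s = 0.1$. On the learning sample the indicator is always 1, because each sample point contributes $\Phi(0)^m = 1$ to its own kernel sum; far from the tube the model output is set to 0.

```python
import random


def phi(z):
    """Triangular bell function with support (-1, 1); an assumed choice of Phi."""
    return max(0.0, 1.0 - abs(z))


def indicator(u, sample_u, c):
    """Estimate (6) of the indicator function from the learning sample."""
    total = 0.0
    for ui in sample_u:
        w = 1.0
        for a, b in zip(u, ui):
            w *= phi((a - b) / c)
        total += w
    return 1 if total > 0 else 0


def h_model(u, alpha, sample_u, c):
    """Model (5): a linear parametric part A_alpha(u, alpha) multiplied by I_s(u)."""
    parametric = sum(a * uj for a, uj in zip(alpha, u))
    return parametric * indicator(u, sample_u, c)


rng = random.Random(3)
sample_u = []
for _ in range(100):
    u1 = rng.random()
    sample_u.append((u1, u1 + rng.uniform(-0.05, 0.05)))  # hypothetical link (3)

alpha = (1.0, 1.0)  # hypothetical, already estimated parameters
c_s = 0.1
print(h_model(sample_u[0], alpha, sample_u, c_s))  # equals the parametric part
print(h_model((0.9, 0.1), alpha, sample_u, c_s))   # outside the tube -> 0.0
```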

3. Volume estimation of "tubular" process domain

It is not known a priori whether a process has a "tubular" structure. Restoring the relationship between the input variables is complex and time-consuming, especially if the input vector $u(t)$ has a high dimension. It is therefore proposed to estimate the volume of the "tubular" process domain $\Omega_H(u,x)$ using the following algorithm.

Algorithm.

Step 1. Generate the initial learning sample $\{u_i, x_i,\ i = 1, 2, \dots, s\}$. In practice, the input and output variables are measured and these observations are used as the learning sample.

Step 2. For every input variable $u^j$, $j = 1, \dots, m$, find its minimum $\underline{u}^j$ and maximum $\overline{u}^j$ values, so that $u^j \in [\underline{u}^j, \overline{u}^j]$, $j = 1, \dots, m$.

Step 3. Generate a test sample $\{u'_l,\ l = 1, \dots, s'\}$ in the region $[\underline{u}, \overline{u}]$.

Step 4. Determine which test points $u'_l$, $l = 1, \dots, s'$, belong to the "tubular" subdomain by calculating the estimate of the indicator function (6) at each of them.

Step 5. Find the ratio of the number $s_1$ of test points that belong to the "tubular" subdomain (those for which the indicator function equals 1) to the total size $s'$ of the test sample:

$$v = \frac{s_1}{s'}. \qquad (7)$$

Accordingly, the stronger the relationship between the input variables, the smaller the value of $v$. If the process is not "tubular", i.e., all input variables are independent of each other, the value of $v$ is close to 1.
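Steps 2 to 5 can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' code: the test sample of Step 3 is drawn uniformly from the bounding box found in Step 2 (as a Monte Carlo volume estimate requires); the bell function is triangular, so that the sum in (6) is positive exactly when some learning point lies closer than $c_s$ to the test point in every coordinate; and $c_s$ is fixed by hand rather than by sliding examination. The sketch compares a hypothetical learning sample with related inputs against one with independent inputs.

```python
import random


def indicator(u, sample_u, c):
    """Estimate (6) with a triangular kernel: positive iff some learning point is
    within distance c of u in every coordinate (strict inequality)."""
    return int(any(all(abs(a - b) < c for a, b in zip(u, ui)) for ui in sample_u))


def volume_ratio(sample_u, c, s_test=2000, seed=0):
    """Steps 2-5 of the algorithm: Monte Carlo estimate of v = s1 / s' from (7)."""
    m = len(sample_u[0])
    lo = [min(u[j] for u in sample_u) for j in range(m)]  # Step 2: minima
    hi = [max(u[j] for u in sample_u) for j in range(m)]  # Step 2: maxima
    rng = random.Random(seed)
    s1 = 0
    for _ in range(s_test):
        u_test = [rng.uniform(lo[j], hi[j]) for j in range(m)]  # Step 3
        s1 += indicator(u_test, sample_u, c)                    # Step 4
    return s1 / s_test                                          # Step 5


rng = random.Random(42)
related, independent = [], []
for _ in range(300):  # Step 1: hypothetical learning samples
    u1 = rng.random()
    related.append((u1, u1 + rng.uniform(-0.05, 0.05)))
    independent.append((rng.random(), rng.random()))

print("related inputs:     v =", volume_ratio(related, c=0.05))
print("independent inputs: v =", volume_ratio(independent, c=0.05))
```

With related inputs only a narrow band of the bounding box is occupied, so $v$ is small; with independent inputs the learning points fill the box and $v$ approaches 1. Since $v$ also depends on $c_s$ and on the sample size, in practice $c_s$ should be chosen by the criterion used for (6) rather than by hand.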

4. Computer experiment

We carry out a series of simulations of a "tubular" process. Let us assume that the object is described by the following equation:

$$x(t + \tau) = 2u_1(t) + 3\sin u_2(t) - 0.5u_3^2(t) + u_4(t) - 0.3u_5^2(t). \qquad (8)$$

This equation simulates the behavior of a real process. The initial sample is $\{x_{i+\tau}, u_i,\ i = 1, 2, \dots, s\}$, where the delay $\tau$ is a multiple of the sampling interval $\Delta t$. The output variable $x$ is then shifted in the observation matrix of input and output variables, so the delay is omitted in subsequent expressions, and the sample is $\{x_i, u_i,\ i = 1, 2, \dots, s\}$. Equation (8) is not known. When the output variable $x$ is measured, random noise is introduced as
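The shift that removes the delay can be sketched as follows, assuming, for illustration only, that the inputs are logged at every step $\Delta t$, that the output is measured only at some steps (every $\Delta T \gg \Delta t$), and that $\tau$ equals $k$ steps of $\Delta t$. The function name `build_sample` and the data layout are hypothetical.

```python
def build_sample(u_series, x_measured, k):
    """Pair each measured output with the input recorded k steps earlier.

    u_series   -- list of input vectors, one per step dt
    x_measured -- dict {step index: measured output}, available only occasionally
    k          -- delay tau expressed in steps of dt (tau = k * dt)
    Returns the learning sample as a list of (u_i, x_i) pairs with the delay removed.
    """
    if k < 0:
        raise ValueError("the delay must be non-negative")
    sample = []
    for step, x in sorted(x_measured.items()):
        i = step - k  # index of the input that produced this output
        if 0 <= i < len(u_series):
            sample.append((u_series[i], x))
    return sample


u_series = [(0.1,), (0.2,), (0.3,), (0.4,), (0.5,)]
x_measured = {2: 1.0, 4: 2.0}  # the output is measured only at steps 2 and 4
print(build_sample(u_series, x_measured, k=1))
# [((0.2,), 1.0), ((0.4,), 2.0)]
```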

$$\xi(t) = a\,\zeta(t), \qquad (9)$$

where $\zeta(t)$ is a random variable uniformly distributed on the interval $[-1, 1]$ and $a$ is the noise level. For example, for a 10% noise level $a = 0.1$.

The investigated object has a "tubular" structure because of the relationship between the input variables; such objects are described by H-models. No a priori information on the form of the relationship between the input variables is available. The relationships between the input variables are as follows:

$$\begin{cases} u_1(t) \in [0, 3], \\ u_2(t) = u_1(t) + \mu_1(t), \\ u_3(t) = \sin(u_1(t) + u_2(t)) + \mu_2(t), \\ u_4(t) = 0.3\,u_1(t)\,u_2(t) + \mu_3(t), \\ u_5(t) = u_1(t) - u_4(t) + \mu_4(t), \end{cases} \qquad (10)$$

here $u_1(t)$ is a random number uniformly distributed on the interval $[0, 3]$, and $\mu_i(t)$, $i = 1, \dots, 4$, are random values generated according to the formula

$$\mu_i(t) = b\,\eta(t), \qquad (11)$$

where $\eta(t)$ is a random number uniformly distributed on the interval $[-1, 1]$ and $b$ is the noise level. Note again that the form of equation (8) and of system (10) is not known; they serve only to generate the observations of the input and output variables from which the model of the object is constructed. First, let us consider the traditional approach to identification. Following identification theory and using a priori information, we assume the following parametric model for object (8) [8]:

$$\hat{x}(t + \tau) = \alpha_1 u_1(t) + \alpha_2 \sin u_2(t) + \alpha_3 u_3^2(t) + \alpha_4 u_4(t) + \alpha_5 u_5^2(t), \qquad (12)$$

where $\alpha_i$, $i = 1, \dots, 5$, are unknown parameters.

Let us generate a sample $\{u_{1i}, u_{2i}, u_{3i}, u_{4i}, u_{5i}, x_i,\ i = 1, \dots, s\}$ and estimate the coefficients $\alpha_i$, $i = 1, \dots, 5$, of model (12) by the least squares method [8] for various noise levels $a$, $b$ and sample sizes $s$. The results are presented in Tab. 1.
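A minimal Python reproduction of this experiment is sketched below. It assumes our reading of the object equation, $x = 2u_1 + 3\sin u_2 - 0.5u_3^2 + u_4 - 0.3u_5^2$, which agrees with the coefficients in Tab. 1, takes the noise terms (9) and (11) as written (uniform values on $[-a, a]$ and $[-b, b]$), and solves the normal equations with a small hand-written Gaussian elimination so that no external library is needed. The helper names and the random seed are arbitrary, and the printed numbers will not coincide with Tab. 1.

```python
import math
import random


def generate(s, a, b, rng):
    """Simulated sample: inputs by system (10)-(11), output by object (8) plus noise (9)."""
    rows, xs = [], []
    for _ in range(s):
        mu = [b * rng.uniform(-1, 1) for _ in range(4)]  # mu_1..mu_4, formula (11)
        u1 = rng.uniform(0, 3)
        u2 = u1 + mu[0]
        u3 = math.sin(u1 + u2) + mu[1]
        u4 = 0.3 * u1 * u2 + mu[2]
        u5 = u1 - u4 + mu[3]
        x = 2 * u1 + 3 * math.sin(u2) - 0.5 * u3**2 + u4 - 0.3 * u5**2
        xs.append(x + a * rng.uniform(-1, 1))  # measurement noise (9)
        rows.append([u1, math.sin(u2), u3**2, u4, u5**2])  # regressors of model (12)
    return rows, xs


def solve(A, y):
    """Solve A w = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    w = [0.0] * n
    for k in range(n - 1, -1, -1):
        w[k] = (M[k][n] - sum(M[k][j] * w[j] for j in range(k + 1, n))) / M[k][k]
    return w


def least_squares(rows, xs):
    """Ordinary least squares through the normal equations (R^T R) alpha = R^T x."""
    m = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    y = [sum(r[i] * x for r, x in zip(rows, xs)) for i in range(m)]
    return solve(A, y)


rng = random.Random(2017)
rows, xs = generate(2000, a=0.5, b=0.5, rng=rng)
alpha = least_squares(rows, xs)
print("alpha =", [round(v, 2) for v in alpha])
```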

The parameters of model (12) are refined slightly as the sample size grows. The simulation accuracy at $s = 500$, $a = 0.5$, $b = 0.5$ is 0.07. However, model (12) does not take into account the stochastic relationship between the input variables. Consequently, this parametric model cannot be used for prediction, because the process does not exist in the whole regulated area. The parametric model must therefore be modified with the indicator function:

$$\hat{x}(t + \tau) = \left(\alpha_1 u_1(t) + \alpha_2 \sin u_2(t) + \alpha_3 u_3^2(t) + \alpha_4 u_4(t) + \alpha_5 u_5^2(t)\right) I_s(u(t)), \qquad (13)$$

where $I_s(u)$ is the indicator function (6).

The results of estimating the volume of the "tubular" subdomain are presented in Tab. 2. The test sample $\{u'_i,\ i = 1, 2, \dots, s'\}$ is generated by the proposed algorithm.

About 2% of the test sample elements belong to the "tubular" subdomain for the noise level $a = b = 0.5$, and about 3% for $a = b = 0.1$. Thus the region of the "tubular" process is much smaller than the regulated area, which means that the investigated process has a "tubular" structure.

Table 1. Coefficient estimates of model (12)

s a b α1 α2 α3 α4 α5

500 0.5 0.5 2.01 2.99 -0.49 0.98 -0.24

1000 0.5 0.5 1.99 3 -0.5 1.01 -0.3

500 0.1 0.1 1.98 3 -0.51 1.09 -0.23

1000 0.1 0.1 2 3 -0.49 0.99 -0.21

Table 2. Volume estimation of the "tubular" subdomain

s' a b s1 v

500 0.5 0.5 11 0.022

1000 0.5 0.5 24 0.024

500 0.1 0.1 16 0.032

1000 0.1 0.1 30 0.03
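Assuming system (10)-(11) as written with $b = 0.1$, the volume estimate (7) for the five related inputs can be sketched in Python as below. The value of $c_s$ is a hypothetical hand-picked constant, the test sample is drawn uniformly from the bounding box, and the kernel is triangular, so the printed ratio only illustrates the method and is not expected to match Tab. 2.

```python
import math
import random


def related_inputs(s, b, rng):
    """Learning sample of the five inputs generated by system (10) with noise (11)."""
    sample = []
    for _ in range(s):
        mu = [b * rng.uniform(-1, 1) for _ in range(4)]
        u1 = rng.uniform(0, 3)
        u2 = u1 + mu[0]
        u3 = math.sin(u1 + u2) + mu[1]
        u4 = 0.3 * u1 * u2 + mu[2]
        u5 = u1 - u4 + mu[3]
        sample.append((u1, u2, u3, u4, u5))
    return sample


def indicator(u, sample_u, c):
    """Estimate (6) with a triangular kernel: 1 iff some learning point is closer
    than c to u in every coordinate."""
    return int(any(all(abs(p - q) < c for p, q in zip(u, ui)) for ui in sample_u))


def volume_ratio(sample_u, c, s_test, rng):
    """Steps 2-5: share v = s1 / s' of uniform test points in the bounding box with I_s = 1."""
    m = len(sample_u[0])
    lo = [min(u[j] for u in sample_u) for j in range(m)]
    hi = [max(u[j] for u in sample_u) for j in range(m)]
    s1 = sum(indicator([rng.uniform(lo[j], hi[j]) for j in range(m)], sample_u, c)
             for _ in range(s_test))
    return s1 / s_test


rng = random.Random(7)
sample_u = related_inputs(500, b=0.1, rng=rng)
v = volume_ratio(sample_u, c=0.2, s_test=2000, rng=rng)
print(f"v = {v:.3f}")
```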

Conclusion

An algorithm is proposed for identifying "tubular" processes with stochastic relationships between the input variables whose form is not known a priori. It is shown that a dynamic system in which the output variable is measured at large discrete intervals should be treated as non-inertial with delay. In this case, conventional models of identification theory cannot be used, and appropriate indicators must be introduced. When the input variables are stochastically independent, the H-model coincides with the well-known models.


The problem of estimating the volume of the region $\Omega_H(u,x)$ is also discussed. The method of calculating this volume is based on the Monte Carlo method. Using only the available initial learning sample, this algorithm makes it possible to determine whether or not the process has a "tubular" structure. Some numerical results of implementing the proposed algorithms are presented.

References

[1] K.J.Arrow, The work of Ragnar Frisch, econometrician, Econometrica: Journal of the Econometric Society, 28(1960), no. 2, 175-192.

[2] S.A.Ayvazyan, I.S.Enukov, L.D.Meshalkin, Applied Statistics: Basics of modeling and primary data processing, M., Finansy i Statistika, 1983 (in Russian).

[3] A.V.Koltyshev, The methods of forecasting the financial condition of the oil and gas company, Problems of Geology and Mineral Resources Development: Proceedings of the XIX International Symposium of Academician M.A.Usov, 2(2015), Tomsk, 664-670 (in Russian).

[4] I.V.Orlova, E.S.Filonova, Selection of exogenous factors in the regression model with data multicollinearity, The International Journal of Applied and Basic Research, 5(2015), 108-116 (in Russian).

[5] J.G.Prunier et al., Multicollinearity in spatial genetics: separating the wheat from the chaff using commonality analyses, Molecular Ecology, 24(2015), 263-283.

[6] S.F.Spear, N.Balkenhol, M.J.Fortin, B.H.McRae, K.Scribner, Use of resistance surfaces for landscape genetic studies: considerations for parameterization and analysis, Molecular Ecology, 19(2010), 3576-3591.

[7] A.V.Medvedev, E.D.Mihov, O.V.Nepomnyashchiy, Mathematical Modeling of H-processes, Journal of Siberian Federal University. Mathematics & Physics, 9(2016), no. 3, 338-346.

[8] Ya.Z.Tcypkin, Foundation of theory identification, M., Nauka, 1984 (in Russian).

[9] A.Fournier, D.Fussell, L.Carpenter, Computer rendering of stochastic models, Communications of the ACM, 25(1982), no. 6, 371-384.

[10] B.Peeters, G. De Roeck, Stochastic system identification for operational modal analysis: a review, Journal of Dynamic Systems, Measurement and Control, 123(2001), 659-667.

[11] E.A.Nadaraya, On estimating regression, Theory of Probability and its Applications, 9(1964), 141-142.

On non-parametric models of non-inertial multidimensional processes with dependent input variables

Alexander V. Medvedev

Siberian State Aerospace University, Krasnoyarsky Rabochy, 31, Krasnoyarsk, 660014

Russia

Ekaterina A. Chzhan

Institute of Space and Information Technologies, Siberian Federal University, Svobodny, 79, Krasnoyarsk, 660041

Russia

The problem of identification of multidimensional non-inertial systems with delay is considered for the case in which the components of the input vector are stochastically dependent and the nature of this dependence is unknown a priori. Such processes have a "tubular" structure in the space of input and output variables, and the methods of identification theory for building models of non-inertial systems turn out to be inapplicable. In general, it is not known a priori whether the process of interest is "tubular". To analyze this, the problem of computing the volume of the subdomain in which the "tubular" process proceeds is considered separately. The initial data are the results of observations of the input and output variables. An algorithm is given for computing the volume of this subdomain relative to the volume of the investigated process, which is always known from a priori information or the production schedule. Extensive numerical studies carried out by the method of statistical modeling indicate a fairly high efficiency of the proposed models.

Keywords: nonparametric modeling, non-inertial object with delay, indicator function, H-process.
