
MATHEMATICS

DOI: 10.18721/JPM.12314    UDC 519.24

THE SHANNON INFORMATION QUANTITY IN THE TASKS ASSOCIATED WITH LINEAR REGRESSION: USAGE PATTERN

Yu.A. Pichugin

Saint Petersburg State University of Aerospace Instrumentation, St. Petersburg, Russian Federation

The article discusses the use of the Shannon information quantity (SIQ) in the tasks associated with linear regression. It has been shown that the SIQ contained in the response components with respect to stochastic parameters is expressed through the Fisher information matrix, is a convex functional on the set of response components, and is equivalent to the use of the D-criterion in the problems of planning the experiment at a sufficiently large scale of parameters. The SIQ with respect to constant regression parameters has been determined. An alternative formulation of the optimal experiment planning (OEP) problem was considered and its relation to the traditional formulation was analyzed. The problem of information ordering of data using regression on the basis of principal components was considered. Some algorithms taking into account the value of information in the presence of partial data gaps were proposed.

Keywords: Shannon information quantity, linear regression, stochastic regression parameter

Citation: Pichugin Yu.A., The Shannon information quantity in the tasks associated with linear regression: usage pattern, St. Petersburg Polytechnical State University Journal. Physics and Mathematics. 12 (3) (2019) 150-161. DOI: 10.18721/JPM.12314


Introduction

The study considers Shannon information as its main subject. Let us briefly describe how this concept evolved in the literature.

The quantity of information in a transmitted message, first defined by Claude Elwood Shannon, is closely related to the concept of entropy. The average entropy $H(\xi)$, i.e., the information transmitted in a message $\xi$ by $n$ characters (values) $\{x_1, x_2, \ldots, x_n\}$ occurring with probabilities $\{P_\xi(1), P_\xi(2), \ldots, P_\xi(n)\}$, follows the expression [1]:

$$H(\xi) = -\sum_{i=1}^{n} P_\xi(i)\log_2 P_\xi(i) = \sum_{i=1}^{n} P_\xi(i) H_i(\xi),$$

where $H_i(\xi) = -\log_2 P_\xi(i)$ is the individual entropy.

Theoretically, any number a > 1, which sets the scale, can be used as the base of the logarithm.

If there is another random variable $\eta$ taking $m$ values $\{y_1, y_2, \ldots, y_m\}$ with probabilities $\{P_\eta(1), P_\eta(2), \ldots, P_\eta(m)\}$, then, according to Shannon, the quantity of information $I(\xi,\eta)$ contained in the message $\xi$ relative to the message $\eta$, or in $\eta$ relative to $\xi$ (the quantity is symmetric), is expressed as

$$I(\xi,\eta) = \sum_{i=1}^{n}\sum_{j=1}^{m} P_{\xi\eta}(i,j)\,\log_a\frac{P_{\xi\eta}(i,j)}{P_\xi(i)\,P_\eta(j)},$$

where $P_{\xi\eta}(i,j)$ is the probability that $\xi$ takes the value $x_i$ and $\eta$ takes the value $y_j$, respectively. If $P_{\xi\eta}(i,j) = 0$, the corresponding term of the sum is assumed to equal zero.

Gelfand and Yaglom [2] proved that if $\xi$ and $\eta$ are two random Gaussian vectors, then the quantity of information $I(\xi,\eta)$ contained in the vector $\xi$ relative to the vector $\eta$ (and vice versa) is

$$I(\xi,\eta) = -\frac{1}{2}\log_a\det\left(I - V_{\xi\eta}V_\eta^{-1}V_{\eta\xi}V_\xi^{-1}\right), \qquad (1)$$

where $V_\xi$, $V_\eta$ are the covariance matrices of the components of the vectors $\xi$ and $\eta$, respectively; $I$ is the unit matrix of the corresponding dimension; $V_{\xi\eta}$ is the mutual covariance matrix of the components of the vector $\xi$ and the vector $\eta$, with

$$V_{\eta\xi} = V_{\xi\eta}^T,$$

where $T$ is the transpose operator.

It was established in [2] that Shannon's quantity of information is independent of the scale of the variables and invariant with respect to linear transformations of the vectors $\xi$ and $\eta$. This makes it possible to represent Eq. (1) in the form

$$I(\xi,\eta) = -\log_a\left|\sin\alpha_1\sin\alpha_2\cdots\sin\alpha_h\right|, \qquad (2)$$

where $\cos\alpha_j = r_j$ ($j = 1,2,\ldots,h$) is nothing other than a geometric interpretation of the correlation coefficient $r_j$ between a pair of independent (principal) components of these vectors; here $h = \min(n,m)$. Evidently, Eq. (1) makes it possible to use the SIQ in multidimensional problems.
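As an illustration of the agreement between Eqs. (1) and (2), here is a minimal numerical sketch (assuming NumPy; the joint covariance V is synthetic). The coefficients $r_j$ are obtained here as canonical correlations, which is an implementation choice for this sketch rather than a prescription of [2]:

```python
import numpy as np

rng = np.random.default_rng(6)
n_xi, n_eta = 4, 3
A = rng.normal(size=(n_xi + n_eta, n_xi + n_eta))
V = A @ A.T                                   # joint covariance of the Gaussian pair (xi, eta)
V_xi, V_eta = V[:n_xi, :n_xi], V[n_xi:, n_xi:]
V_xieta = V[:n_xi, n_xi:]

# Eq. (1): I = -(1/2) log det(I - V_xieta V_eta^{-1} V_xieta^T V_xi^{-1})
C = V_xieta @ np.linalg.inv(V_eta) @ V_xieta.T @ np.linalg.inv(V_xi)
I_eq1 = -0.5 * np.linalg.slogdet(np.eye(n_xi) - C)[1]

# Eq. (2): I = -log|sin a_1 ... sin a_h| with cos a_j = r_j (canonical correlations)
def inv_sqrt(M):
    w, U = np.linalg.eigh(M)
    return U @ np.diag(w ** -0.5) @ U.T

r = np.linalg.svd(inv_sqrt(V_xi) @ V_xieta @ inv_sqrt(V_eta), compute_uv=False)
I_eq2 = -np.sum(np.log(np.sqrt(1.0 - r ** 2)))   # |sin a_j| = sqrt(1 - r_j^2)

print(I_eq1, I_eq2)   # natural logarithms (nats); the two values coincide
```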

A notable applied study by Pokrovsky [3] used the SIQ to solve the problem of optimizing satellite observations based on a linear regression model. This study is very close to the problem of optimal planning of a regression experiment (referred to as optimal experiment planning (OEP) from now on). However, the case of stochastic regression parameters (see the section "Relationship between SIQ and the Fisher information matrix" below) is considered in [3], departing from the traditional statement of the general OEP problem (see the section "Traditional and alternative statements of OEP problem").

The goal of this study is to clarify the relationship between the SIQ, the Fisher information matrix and the OEP criteria in an alternative statement of the problem, to develop the ideas of [2, 3], and to find new possibilities for using this mathematical framework in various practical applications.

Relationship between SIQ and the Fisher information matrix

The classical statement of the regression analysis problem includes the following model:

$$y = F\theta + \varepsilon, \qquad (3)$$

where y is the measurement vector with the dimension n; $\theta$ is the vector of parameters to be estimated, with the dimension m; F is the $(n \times m)$ matrix ($n > m$); $\varepsilon$ is the random error vector.

It is assumed here that the components of the vector $\varepsilon$ obey a multidimensional normal distribution, have zero mean values, are mutually independent and have the same variance $\sigma^2$, i.e.,

$$\varepsilon \sim N(\mathbf{0}, \sigma^2 I),$$

where $\mathbf{0}$ is a zero vector. Therefore,

$$y \sim N(F\theta, \sigma^2 I),$$

i.e., $V_y = V_\varepsilon$.

In addition, we shall assume that F is a matrix of full rank.

The Fisher information matrix of the regression parameter vector $\theta$, which quantifies the information contained in the response components relative to these regression parameters, is determined by the following formula [4]:

$$M = -E\left\{\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}\right\},$$

where $E$ is the expected value operator and $L$ is the log-likelihood, i.e.,

$$L = \ln\left[(2\pi)^{-n/2}\det{}^{-1/2}(V_\varepsilon)\exp\left(-\frac{1}{2}(y - F\theta)^T V_\varepsilon^{-1}(y - F\theta)\right)\right].$$

It follows then that

$$M = \sigma^{-2}F^T F = V_{\hat\theta}^{-1}, \qquad (4)$$

where

$$V_{\hat\theta} = \sigma^2\left(F^T F\right)^{-1} \qquad (5)$$

is the covariance matrix of the OLS (ordinary least squares) estimates of the components of the regression parameter vector $\theta$.

Different convex functionals of the information matrix M (see below) can serve as a measure of the information contained in the response components with respect to the parameters of regression (3). In this case, the vector of OLS estimates of these regression parameters is expressed as

$$\hat\theta = \left(F^T F\right)^{-1} F^T y \qquad (6)$$

(see monograph [5]).

If the assumptions about the covariance matrix of the regression residuals are violated and this matrix has an arbitrary structure, i.e.,

$$\varepsilon \sim N(\mathbf{0}, V_\varepsilon) \quad (V_\varepsilon \neq \sigma^2 I),$$

we have, instead of Eqs. (4)-(6), the following expressions, respectively:

$$M = F^T V_\varepsilon^{-1} F = V_{\hat\theta}^{-1}, \qquad (7)$$

$$V_{\hat\theta} = \left(F^T V_\varepsilon^{-1} F\right)^{-1}, \qquad (8)$$

$$\hat\theta = \left(F^T V_\varepsilon^{-1} F\right)^{-1} F^T V_\varepsilon^{-1} y. \qquad (9)$$

Eq. (9) is known as the generalized method of OLS estimation.
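To make Eqs. (4)-(9) concrete, here is a minimal numerical sketch (assuming NumPy; the design matrix F, the noise level sigma and the residual covariance V_eps are synthetic illustrations, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 3
F = rng.normal(size=(n, m))              # design matrix of full rank
theta = np.array([1.0, -2.0, 0.5])       # "true" parameters (synthetic)
sigma = 0.3

# --- case V_eps = sigma^2 I (Eqs. (4)-(6)) ---
y = F @ theta + sigma * rng.normal(size=n)
M = (F.T @ F) / sigma**2                          # Fisher information matrix, Eq. (4)
V_theta_hat = sigma**2 * np.linalg.inv(F.T @ F)   # covariance of OLS estimates, Eq. (5)
theta_ols = np.linalg.solve(F.T @ F, F.T @ y)     # OLS estimate, Eq. (6)

# --- case of an arbitrary V_eps (Eqs. (7)-(9)) ---
A = rng.normal(size=(n, n))
V_eps = A @ A.T + np.eye(n)                       # some positive-definite covariance
Vi = np.linalg.inv(V_eps)
M_gen = F.T @ Vi @ F                              # Eq. (7)
V_theta_gen = np.linalg.inv(M_gen)                # Eq. (8)
theta_gls = V_theta_gen @ F.T @ Vi @ y            # generalized OLS estimate, Eq. (9)
print(theta_ols, theta_gls)
```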

It follows from Eqs. (5) and (8) that the OLS estimate vector $\hat\theta$ has the normal distribution

$$\hat\theta \sim N\left(\theta,\, \sigma^2\left(F^T F\right)^{-1}\right)$$

or

$$\hat\theta \sim N\left(\theta,\, \left(F^T V_\varepsilon^{-1} F\right)^{-1}\right)$$

(depending on the structure of the matrix $V_\varepsilon$).

Each of these distributions defines a corresponding likelihood function. It is easy to verify that the Fisher information matrices of the vector $\theta$ calculated from these likelihood functions coincide with the matrices given by Eqs. (4) and (7), respectively.

Let us assume that the regression parameters have a stochastic nature and obey a multidimensional normal distribution, i.e.,

$$\theta \sim N(E\theta, V_\theta).$$

Then, even though the distribution of the vector y is different, Eqs. (5), (6) or (8), (9) remain valid for the OLS estimates of the components of the vector $\theta$ of the parameters of regression (3), depending on the structure of the matrix $V_\varepsilon$; the form of the probability distribution of the vector $\hat\theta$ and the Fisher information matrix do not change (see Eqs. (4), (7)). However, any convex functional of this matrix serves as a measure of the information with respect to the parameter vector $\theta$ contained in its OLS estimate. It is this information matrix (as the more invariant object) that is considered in the case of stochastic parameters, for which the following theorem holds.

Theorem 1. The quantity of information $I(y,\theta)$ contained in the response components in model (3) with respect to stochastic normally distributed parameters, provided that F is a full-rank matrix, is related to the Fisher information matrix M by the formula

$$I(y,\theta) = \frac{1}{2}\log_a\det\left(I + MV_\theta\right). \qquad (10)$$

Proof. Assuming that the parameters of regression (3) are stochastic, we have the following equalities:

$$V_y = FV_\theta F^T + V_\varepsilon, \quad V_{y\theta} = FV_\theta \quad \text{and} \quad V_{\theta y} = V_\theta F^T.$$

Let us use these expressions to transform Eq. (1):

$$\begin{aligned}
I(y,\theta) &= -\frac{1}{2}\log_a\det\left(I - FV_\theta V_\theta^{-1}V_\theta F^T V_y^{-1}\right) = \\
&= -\frac{1}{2}\log_a\det\left(\left(V_y - FV_\theta F^T\right)V_y^{-1}\right) = -\frac{1}{2}\log_a\det\left(V_\varepsilon V_y^{-1}\right) = \\
&= \frac{1}{2}\log_a\det\left(V_\varepsilon^{-1}V_y\right) = \frac{1}{2}\log_a\det\left(V_\varepsilon^{-1}\left(V_\varepsilon + FV_\theta F^T\right)\right) = \\
&= \frac{1}{2}\log_a\det\left(I + V_\varepsilon^{-1}FV_\theta F^T\right) \overset{(*)}{=} \frac{1}{2}\log_a\det\left(I + F^T V_\varepsilon^{-1}FV_\theta\right) = \\
&= \frac{1}{2}\log_a\det\left(I + MV_\theta\right).
\end{aligned}$$

We used the algebraic identity

$$\det(I + AB) = \det(I + BA)$$

for the equality (*); the identity is obvious in the case of square matrices at least one of which is invertible. Indeed, let the matrix A be invertible. Then, multiplying the matrix under the determinant on the left-hand side of this equality by $A^{-1}$ on the left and by $A$ on the right, we obtain the identity. It is known that this transformation, called the similarity transformation, does not change the value of the determinant.

Now let A and B be rectangular matrices: A with the dimensions $n \times m$ and B with the dimensions $m \times n$ ($n > m$). Based on the statement of the theorem, we may assume that A is a matrix of full rank. Without loss of generality, let us assume that the first m rows of the matrix A are linearly independent. We extend the rectangular matrices A and B to square ones with the dimensions $n \times n$:

$$A^* = (A\,|\,T) \quad \text{and} \quad (B^*)^T = (B^T\,|\,T),$$

where the block T has the dimensions $n \times (n - m)$. Let us fill the block T by the following rule:

$$t_{i+m,\,j} = \delta \;\; (\delta \neq 0) \quad \text{if } i = j, \; j = 1,2,\ldots,n-m;$$

the remaining elements of T are filled with zeros. The given identity then holds for the matrices $A^*$ and $B^*$ (see above). Next, letting $\delta$ tend to zero, we obtain the required identity for the original matrices A and B.

Theorem 1 is completely proved.

Eq. (10) and its formulation were previously given in [6, 7].

Assuming that the parameters of regression (3) are stochastic, transforming Eq. (10) yields another equality, namely,

$$I(y,\theta) = \frac{1}{2}\log_a\det V_y - \frac{1}{2}\log_a\det V_\varepsilon.$$

This equality provides a good description of SIQ in model (3) with stochastic parameters.
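As a quick illustration of Theorem 1, the following sketch (assuming NumPy; F, V_theta and V_eps are synthetic) compares the right-hand side of Eq. (10) with the Shannon information of two jointly Gaussian vectors computed directly from their joint covariance; natural logarithms are used, so the result is in nats:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
F = rng.normal(size=(n, m))
A = rng.normal(size=(m, m)); V_theta = A @ A.T + np.eye(m)   # covariance of stochastic parameters
B = rng.normal(size=(n, n)); V_eps = B @ B.T + np.eye(n)     # covariance of residuals

# Right-hand side of Eq. (10): (1/2) log det(I + M V_theta), with M from Eq. (7)
M = F.T @ np.linalg.inv(V_eps) @ F
rhs = 0.5 * np.linalg.slogdet(np.eye(m) + M @ V_theta)[1]

# Shannon information of the jointly Gaussian pair (y, theta) from its joint covariance
V_y = F @ V_theta @ F.T + V_eps
V_ytheta = F @ V_theta
V_joint = np.block([[V_y, V_ytheta], [V_ytheta.T, V_theta]])
lhs = 0.5 * (np.linalg.slogdet(V_y)[1] + np.linalg.slogdet(V_theta)[1]
             - np.linalg.slogdet(V_joint)[1])

print(lhs, rhs)   # the two values agree up to rounding error
```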

Traditional and alternative statements of OEP problem

Traditional statement. The initial statement of the OEP problem (that has become traditional) under the conditions of model (3) is reduced to calculating a set of quantities [8, 9]:

$$p = \{p_1, p_2, \ldots, p_n\}, \quad p_i \geq 0, \;\; i = 1,2,\ldots,n; \quad \sum_{i=1}^{n} p_i = 1, \qquad (11)$$

called a plan.

These quantities are interpreted as relative frequencies at which the response components are measured (which is why it is termed the plan), similar to the law of discrete probability distribution. This concept was apparently borrowed from the theory of matrix games for mixed strategies [10], where a similar situation occurs. OEP theory uses the values comprising the plan to transform the information matrix by the following scheme (within the framework of this theory):

$$M \to M(p) = F^T P F, \qquad (12)$$

where $P = \mathrm{diag}(p_1, p_2, \ldots, p_n)$.

A requirement for maximization of some convex functional (criterion) from this transformed matrix (see below) is imposed on the plan.

Thus, a transition is made from the initial values $p_i = 1/n$ ($i = 1,2,\ldots,n$) to optimal values at which some criterion of the transformed (see above) information matrix reaches a maximum, i.e.,

$$\mathrm{crit}\,M(p) = \mathrm{crit}\,F^T P F \to \max.$$

The criteria (functionals of M(p)) considered include the determinant (D criterion), trace (A criterion), the minimum eigenvalue of the matrix M(p) (E criterion) and other convex functionals of this matrix [8, 9].
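For illustration only, a minimal sketch (assuming NumPy; the matrix F and the plan p are synthetic) of the three criteria listed above, evaluated on the transformed matrix M(p) of Eq. (12):

```python
import numpy as np

def criteria(F, p):
    """D-, A- and E-criteria of the transformed information matrix M(p) = F^T P F, Eq. (12)."""
    Mp = F.T @ np.diag(p) @ F
    return {"D": np.linalg.det(Mp),          # determinant (D criterion)
            "A": np.trace(Mp),               # trace (A criterion)
            "E": np.linalg.eigvalsh(Mp)[0]}  # minimum eigenvalue (E criterion)

rng = np.random.default_rng(5)
F = rng.normal(size=(12, 3))
p_uniform = np.full(12, 1 / 12)              # initial plan p_i = 1/n
print(criteria(F, p_uniform))
```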

Clarification. It is easy to notice from the above that the information matrix in OEP theory is understood not exactly as the Fisher information matrix equal to $\sigma^{-2}F^T F$ (see Eq. (4)), but rather as the proportional matrix $F^T F$ [8, 9] or $F^T P F$ (see above), which is of no fundamental importance for the problem of maximizing the functionals of this matrix. On the other hand, it follows from the regression analysis described, for example, in monograph [5] that the transition to the matrix $F^T P F$ is only practical if we use the parameter estimate

$$\hat\theta = \left(F^T P F\right)^{-1} F^T P y,$$

known as the generalized OLS estimate (see Eq. (9)) or, if P has a diagonal structure, as the weighted OLS estimate, which is what we actually have here. It follows from the Gauss-Markov theorem, whose proof is given in [9], that it is correct to apply such an estimate only when, firstly, $V_\varepsilon \neq \sigma^2 I$ and, secondly, the matrix P is inversely proportional to the matrix $V_\varepsilon$, i.e.,

$$P = cV_\varepsilon^{-1} \quad (c \neq 0).$$

However, no mention of these conditions is found in OEP theory [8, 9]. We can only assume that transformation (12) turned out to be unrelated to the structure of the covariance matrix of the errors of regression (3) as a result of discarding the factors $\sigma^2$ and $\sigma^{-2}$ in the equations for $V_{\hat\theta}$ and M, and sometimes $\varepsilon$ in some regression equations (see [9]), even though the final goal is precisely the estimation of the parameters.

Alternative statement. The following statement is equally important from a practical standpoint.

Let us consider different proper subsets of the form

$$q \subset \{1,2,\ldots,n\}$$

($\mathrm{card}\,q$ denotes the number of elements in q). Let $y_q$ be a vector with the dimension $\mathrm{card}\,q$ containing the components of y only with numbers from the set q; let $F_q$ be a matrix that contains the rows of F only with numbers from the set q.

Using the vector $y_q$, we can also calculate the OLS estimate of the vector $\theta$, i.e.,

$$\hat\theta = \left(F_q^T F_q\right)^{-1} F_q^T y_q. \qquad (13)$$

Then the covariance matrix of this estimate follows the expression

$$V_{\hat\theta} = \sigma^2\left(F_q^T F_q\right)^{-1},$$

and the Fisher information matrix the expression

$$M_q = \sigma^{-2} F_q^T F_q. \qquad (14)$$

In the case $V_\varepsilon \neq \sigma^2 I$, Eqs. (13) and (14) change similarly to Eqs. (4) and (6) (see Eqs. (7) and (9)). Taking a fixed value $\mathrm{card}\,q = k$ ($k < n$), let us select a subset q that yields the maximum of some convex functional of the information matrix $M_q$.

This statement is fairly significant for the problem of economizing on observation tools with a minimum loss in the accuracy with which all the components of y are reproduced. Indeed, when we substitute the OLS estimate of the vector $\theta$ given by Eq. (13) into model (3), we are interested in ensuring that all response values are reproduced with a minimum loss in accuracy, which is what minimizing the variance of the OLS estimate of the vector $\theta$ should ensure.
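A minimal sketch of this alternative statement under the stated assumptions (NumPy; a synthetic F; the A criterion tr M_q is taken as the convex functional). For a fixed k it grows the subset q greedily, adding at each step the response component that most increases the criterion; as discussed in the "Results and discussion" section, such sequential selection is not guaranteed to reach the global maximum:

```python
import numpy as np

def greedy_subset(F, k, sigma2=1.0, crit=lambda M: np.trace(M)):
    """Greedily select k row indices of F maximizing crit(M_q), M_q = F_q^T F_q / sigma^2."""
    n = F.shape[0]
    q = []
    for _ in range(k):
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in q:
                continue
            Fq = F[q + [i], :]
            val = crit(Fq.T @ Fq / sigma2)
            if val > best_val:
                best_i, best_val = i, val
        q.append(best_i)
    return q, best_val

rng = np.random.default_rng(2)
F = rng.normal(size=(15, 3))
q, val = greedy_subset(F, k=5)
print(sorted(q), val)
```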

Comparison of the two statements. Let us establish that the alternative statement is reduced to the traditional one by imposing the corresponding restrictions. Indeed, let $p_i = 1/k$ if $i \in q$ and $p_i = 0$ if $i \notin q$, which does not violate the basic requirements for plan (11). Then the information matrix corresponding to the traditional statement of the OEP problem follows the expression

$$M(p) = F^T P F = \frac{1}{k}F_q^T F_q = \frac{\sigma^2}{k}M_q,$$

where Mq is determined by Eq. (14), i.e., it is the Fisher information matrix.

However, we also obtain a parameter estimate expressed by Eq. (13):

$$\hat\theta = \left(F^T P F\right)^{-1} F^T P y = \left(\frac{1}{k}F_q^T F_q\right)^{-1}\frac{1}{k}F_q^T y_q = \left(F_q^T F_q\right)^{-1} F_q^T y_q.$$

This estimate does not contradict the Gauss-Markov theorem.

Evidently, using, for example, the D criterion to solve the problem in this statement is only possible for $k \geq m$, when

$$\det\left(F_q^T F_q\right) \neq 0.$$

Therefore, the most convenient criteria in the alternative statement are the A criterion, the E criterion (see above), etc. However, as Seber observed in [5], criteria that are independent of the scale of the quantities are preferable. As noted above, the SIQ does not depend on the scale of the variables (see Eq. (2)). On the other hand, using the SIQ implies that the regression parameters are stochastic in nature. We are going to confirm below that the SIQ can be used in the alternative statement of the OEP problem if the parameters of regression (3) are constant values.

Properties of SIQ as an OEP criterion in the alternative statement

It follows from Eqs. (10) and (14) that the SIQ contained in the components of the vector $y_q$ relative to the stochastic components of $\theta$ is expressed as

$$I(y_q,\theta) = \frac{1}{2}\log_a\det\left(I + M_q V_\theta\right). \qquad (15)$$

Eq. (15) also makes it possible to determine the SIQ contained in the vector $y_q$ relative to the vector of regression parameters $\theta$ for the case when the parameters of regression (3) are not stochastic in nature, i.e., have constant values.

Definition. If the parameters in model (3) are constant, then $I(y_q,\theta)$ is expressed by the formula

$$I(y_q,\theta) = \frac{1}{2}\log_a\det\left(I + M_q V_{\hat\theta}\right) = \frac{1}{2}\log_a\det\left(I + M_q M^{-1}\right). \qquad (16)$$

Comments.

1) In this definition, we equated $I(y_q,\theta)$ to $I(y_q,\hat\theta)$, i.e., reduced the problem to the case when both vectors are stochastic, as required by Eq. (1).

2) If $a = 2$ (the bit is the unit of information in the binary numeral system), then

$$0 < I(y_q,\theta) < m \quad (m = \dim\theta).$$

Next, let us define the following functional on subsets of the form q (see above): $\Phi(q) = I(y_q,\theta)$. Then the following theorem holds true.

Theorem 2. The functional $\Phi(q)$ is convex, i.e., the inequality

$$\Phi(\alpha p + \beta q) \geq \alpha\Phi(p) + \beta\Phi(q),$$

where $\alpha, \beta > 0$, $\alpha + \beta = 1$, holds true for $p, q \subset \{1,2,\ldots,n\}$.

Proof. Let us transform Eq. (15) as follows:

$$I(y_q,\theta) = \frac{1}{2}\log_a\det\left(V_\theta^{-1} + M_q\right) + \frac{1}{2}\log_a\det V_\theta. \qquad (17)$$

Let us prove the inequality imposed by the theorem for the functional $\Psi(q)$:

$$\Psi(q) = \frac{1}{2}\log_a\det\left(V_\theta^{-1} + M_q\right). \qquad (18)$$

Let $p, q \subset \{1,2,\ldots,n\}$, as assumed. Then, according to the definition of $\Psi(q)$, we have:

$$\begin{aligned}
\Psi(\alpha p + \beta q) &= \frac{1}{2}\log_a\det\left(V_\theta^{-1} + (\alpha M_p + \beta M_q)\right) = \\
&= \frac{1}{2}\log_a\det\left((\alpha + \beta)V_\theta^{-1} + (\alpha M_p + \beta M_q)\right) = \\
&= \frac{1}{2}\log_a\det\left(\alpha\left(V_\theta^{-1} + M_p\right) + \beta\left(V_\theta^{-1} + M_q\right)\right) \overset{(**)}{\geq} \\
&\geq \frac{1}{2}\log_a\left(\det{}^{\alpha}\left(V_\theta^{-1} + M_p\right)\cdot\det{}^{\beta}\left(V_\theta^{-1} + M_q\right)\right) = \\
&= \alpha\,\frac{1}{2}\log_a\det\left(V_\theta^{-1} + M_p\right) + \beta\,\frac{1}{2}\log_a\det\left(V_\theta^{-1} + M_q\right) = \alpha\Psi(p) + \beta\Psi(q).
\end{aligned}$$

Here the inequality (**) follows from the general inequality

$$\det(\alpha A + \beta B) \geq \det{}^{\alpha}\!A \cdot \det{}^{\beta}\!B,$$

which holds true under the given restrictions on $\alpha$ and $\beta$ for symmetric positive-definite matrices A and B (the proof is given in [9]); the matrices $V_\theta^{-1}$ and $M_q$, and therefore their sum, are such matrices.

Thus, we have the inequality

$$\Psi(\alpha p + \beta q) \geq \alpha\Psi(p) + \beta\Psi(q).$$

Adding the term

$$\frac{1}{2}\log_a\det V_\theta$$

to both sides of this inequality, we obtain the following inequality:

$$\Psi(\alpha p + \beta q) + \frac{1}{2}\log_a\det V_\theta \geq \alpha\Psi(p) + \beta\Psi(q) + (\alpha + \beta)\,\frac{1}{2}\log_a\det V_\theta.$$

In view of Eq. (17), we have the required inequality for the functional $\Phi(q)$.

Theorem 2 is completely proved.

Remark. It is evident from Eq. (16) and the proof of Theorem 2 that the functional $\Phi(q) = I(y_q,\theta)$ is convex for constant regression parameters as well, when the quantity of information is determined by Eq. (16).

Let us return to the case of stochastic regression parameters. Here the following almost obvious theorem relating $I(y_q,\theta)$ to the D criterion (the determinant of the information matrix, see above) holds true. Notably, while OEP theory traditionally does not consider the case of stochastic parameters, it does not explicitly prohibit it.

Theorem 3. If the stochastic regression parameters have a sufficiently large scale, maximizing the SIQ contained in the response components relative to the regression parameters is equivalent to using the D criterion.

Proof. Let $p, q \subset \{1,2,\ldots,n\}$ and $\mathrm{card}\,p = \mathrm{card}\,q$. It follows from Eq. (17) that in this case it is also possible to consider the functional $\Psi(q)$ defined by Eq. (18) instead of the functional $\Phi(q) = I(y_q,\theta)$. It is fairly evident that if the stochastic regression parameters have a sufficiently large scale and, accordingly, the elements of the matrix $V_\theta^{-1}$ are small, the following inequalities are certainly equivalent:

$$\det\left(M_p + V_\theta^{-1}\right) \geq \det\left(M_q + V_\theta^{-1}\right) \quad \text{and} \quad \det\left(M_p\right) \geq \det\left(M_q\right),$$

which, given that the logarithm is monotonic, also means the required result.

Theorem 3 is completely proved.

Notably, Eqs. (15), (16) and the statement of Theorem 2 are quite sufficient to justify using the SIQ in the alternative statement of the OEP problem, for both stochastic and constant parameters of regression (3).

If we solve the OEP problem in the alternative statement for successive values of k (see above) from k = 1 to k = n, we obtain a sequence of response components ordered by descending information contribution, as well as an increasing sequence of information quantities $I_j$. It is convenient to represent the latter, excluding the dependence on the choice of the base of the logarithm a, in the following form:

$$I_j := \left(I_j / I(y,\theta)\right)\cdot 100\% \quad (j = 1,2,\ldots,n),$$

where $I(y,\theta)$ is the total quantity of information when $q = \{1,2,\ldots,n\}$.

Considerations arguing that the statement and solution of such a problem are valid are given at the end of the next section.
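A minimal sketch of this information ordering under the stated assumptions (NumPy; F and V_theta are synthetic; the selection is the greedy sequential scheme described above):

```python
import numpy as np

def siq(Fq, V_theta, sigma2=1.0):
    """Eq. (15): SIQ contained in the components y_q about the stochastic parameters theta."""
    m = V_theta.shape[0]
    Mq = Fq.T @ Fq / sigma2
    return 0.5 * np.linalg.slogdet(np.eye(m) + Mq @ V_theta)[1]

rng = np.random.default_rng(3)
n, m = 10, 3
F = rng.normal(size=(n, m))
V_theta = np.diag([4.0, 2.0, 1.0])        # synthetic parameter covariance

order, q = [], []
for _ in range(n):                        # sequential (greedy) information ordering
    i_best = max((i for i in range(n) if i not in q),
                 key=lambda i: siq(F[q + [i], :], V_theta))
    q.append(i_best)
    order.append(siq(F[q, :], V_theta))

I_total = order[-1]                       # I(y, theta) for q = {1, ..., n}
percentages = [100.0 * I / I_total for I in order]
print(q)            # response components by descending information contribution
print(percentages)  # increasing sequence I_j, in percent of the total information
```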

Using principal component regression in information ordering and accounting for the value of information with data gaps

In most practical problems, we at best only have a sample of observations of a certain vector,

$$\{y_j,\; j = 1,2,\ldots,N\}.$$

Such a sample is typically represented as a so-called sample matrix Y of dimension $n \times N$; thus, the columns of the matrix Y are realizations of a random vector y. Let us assume that the vector y obeys a multidimensional normal distribution with the parameters $\theta_y$ and $V_y$, i.e.,

$$y \sim N(\theta_y, V_y).$$

Let us calculate the estimates of the distribution parameters:

$$\hat\theta_y = \frac{1}{N}\sum_{j=1}^{N} y_j, \qquad \hat V_y = \frac{1}{N-1}\sum_{j=1}^{N}\left(y_j - \hat\theta_y\right)\left(y_j - \hat\theta_y\right)^T. \qquad (19)$$

Next, let us calculate an orthogonal matrix Q such that the equality

$$Q^T \hat V_y Q = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$$

holds and $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$.

Let us define a vector of dimension m ($m < n$) in the form

$$z = Q_m^T\left(y - \hat\theta_y\right),$$

where the matrix $Q_m$ contains only the first m columns of the matrix Q.

The components of the vector z are mutually independent, are called sample principal components, and are interpreted as hidden factors. The matrix

$$V_z = \Lambda_{(m)} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$$

is essentially the estimate of the covariance matrix of the components of z. The word "sample" is usually omitted, but it should be borne in mind that we would have true principal components only if the distribution parameters $\theta_y$ and $V_y$ were known.

The most correct choice of the dimension of the vector z ($m = \dim z$) is connected with verifying the statistical hypothesis

$$H\!: \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_m > \lambda_{m+1} = \lambda_{m+2} = \ldots = \lambda_n.$$

However, there are relatively simple methods for finding m [11].
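A minimal sketch of this construction under the stated assumptions (NumPy; the sample matrix Y is synthetic, and m is simply fixed rather than chosen by a statistical test):

```python
import numpy as np

rng = np.random.default_rng(4)
n, N, m = 6, 200, 2
Y = rng.normal(size=(n, N))                     # sample matrix: columns are realizations of y

theta_y = Y.mean(axis=1)                        # estimate of the mean, Eq. (19)
Yc = Y - theta_y[:, None]
V_y = Yc @ Yc.T / (N - 1)                       # estimate of the covariance matrix, Eq. (19)

lam, Q = np.linalg.eigh(V_y)                    # eigendecomposition: Q^T V_y Q = diag(lambda)
idx = np.argsort(lam)[::-1]                     # sort eigenvalues in descending order
lam, Q = lam[idx], Q[:, idx]

Q_m = Q[:, :m]                                  # first m eigenvectors
Z = Q_m.T @ Yc                                  # sample principal components z for every column
F = Q_m                                         # regression matrix of model (20): y ~ F z
Lambda_m = np.diag(lam[:m])                     # V_z, the covariance estimate of z
print(lam[:m], np.allclose(np.cov(Z), Lambda_m, atol=1e-10))
```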

Let us now use centered values of the vector y, i.e.,

$$y := y - \hat\theta_y,$$

and consider the regression

$$y = Fz + \varepsilon, \qquad (20)$$

where $F = Q_m$.

Let us assume that the residual vector $\varepsilon$ has the same properties as in model (3), i.e., $\varepsilon \sim N(\mathbf{0}, \sigma^2 I)$.

Model (20) differs from model (3) primarily in that the regression parameters in the first case (in model (20)) are, generally speaking, stochastic. As noted above, this model makes it possible to solve various practical and theoretical problems related to information ordering by sequentially selecting response components so that the SIQ,

$$I(y_q, z) = \frac{1}{2}\log_a\det\left(I + M_q\Lambda_{(m)}\right),$$

or any of the classical criteria, for example, $\mathrm{tr}\,M_q$ [12], is maximized at each selection step.

In addition, model (20) allows different approaches to be applied to determining the value of information when data are missing due to instability of information sources.

Let the initial sample matrix Y contain gaps in individual components. We construct a matrix N of the same dimension as Y ($n \times N$), where each missing measurement corresponds to 0 and each existing (not missing) measurement to 1. Then each element $n_{ij}$ ($i = 1,2,\ldots,n$; $j = 1,2,\ldots,N$) of the matrix N is equal to zero or unity. The gaps in the matrix Y are also filled with zeros. Let us calculate the mean values of the components of the vector y by the formula

$$\bar y_i = \sum_{j=1}^{N} y_{ij} \Big/ \sum_{j=1}^{N} n_{ij} \quad (i = 1,2,\ldots,n)$$

and proceed to the centered matrix $\tilde Y$, where $\tilde y_{ij} := y_{ij} - \bar y_i$ if $n_{ij} \neq 0$, leaving the values that correspond to $n_{ij} = 0$ unchanged, i.e., zeros.

Algorithm 1. Now, if stable sources of information are preferred over unstable ones for some reason, the estimate of the covariance matrix of the components of the vector y should be calculated by the formula

$$\hat V_y = \frac{1}{N}\tilde Y\tilde Y^T. \qquad (21)$$

If there are no gaps and the divisor is equal to N − 1, estimate (21) coincides with estimate (19). After calculating the matrix $F = Q_m$ and performing informational ordering of the components of the vector y, the numbers of the response components corresponding to unstable information (gaps) are likely to be at the end of this sequence (see the end of the previous section). Estimates of the individual elements of the mutual covariance matrix by Eq. (21) with missing data are similar to the estimates of autocovariance proposed by Jenkins and Watts in their well-known study [13]. The divisor in those estimates is equal to the length of the series N regardless of the magnitude of the lag and, accordingly, of the number of terms; as established in [13], this does not lead to overestimated wavelengths. Such a constant divisor can reduce the value of unstable information in our case.
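A minimal sketch of Algorithm 1 under the stated assumptions (NumPy; gaps are marked by NaN in a synthetic sample matrix Y):

```python
import numpy as np

def cov_algorithm1(Y):
    """Eq. (21): covariance estimate with the constant divisor N; gaps are NaN in Y."""
    n, N = Y.shape
    Nmask = (~np.isnan(Y)).astype(float)        # data-availability matrix N (1 = present)
    Y0 = np.where(np.isnan(Y), 0.0, Y)          # gaps filled with zeros
    means = Y0.sum(axis=1) / Nmask.sum(axis=1)  # component means over available data
    Yc = np.where(Nmask > 0, Y0 - means[:, None], 0.0)  # centered matrix, zeros kept at gaps
    return Yc @ Yc.T / N                        # constant divisor N devalues unstable sources

Y = np.array([[1.0, 2.0, np.nan, 4.0],
              [2.0, np.nan, 1.0, 3.0],
              [0.5, 1.5, 2.5, np.nan]])
print(cov_algorithm1(Y))
```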

Algorithm 2. However, it is precisely the unstable sources of information that may prove to have the greatest value to the researcher. In this case, the elements of the covariance matrix should be estimated by the formula

$$\left[\hat V_y\right]_{ij} = \left[\tilde Y\tilde Y^T\right]_{ij} \Big/ \left[NN^T\right]_{ij} \quad (i, j = 1,2,\ldots,n) \qquad (22)$$

when $\left[NN^T\right]_{ij} \neq 0$, and $\left[\hat V_y\right]_{ij} = 0$ otherwise. Here $[\,\cdot\,]_{ij}$ is the operator taking the matrix element from the $i$th row and the $j$th column.

Eq. (22) ensures that the divisor is equal to the number of nonzero terms of the corresponding sum. If the estimate of the covariance matrix is calculated this way, the numbers of the response components with data gaps can appear at the beginning of the ordered sequence [14].
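A minimal sketch of Algorithm 2 under the same assumptions (NumPy; gaps are marked by NaN in a synthetic sample matrix Y); only the divisor differs from Algorithm 1:

```python
import numpy as np

def cov_algorithm2(Y):
    """Eq. (22): element-wise divisor [N N^T]_ij, i.e., the number of nonzero terms."""
    Nmask = (~np.isnan(Y)).astype(float)        # data-availability matrix N
    Y0 = np.where(np.isnan(Y), 0.0, Y)
    means = Y0.sum(axis=1) / Nmask.sum(axis=1)
    Yc = np.where(Nmask > 0, Y0 - means[:, None], 0.0)
    counts = Nmask @ Nmask.T                    # [N N^T]_ij: number of joint observations
    V = np.zeros_like(counts)
    np.divide(Yc @ Yc.T, counts, out=V, where=counts > 0)   # zero where [N N^T]_ij = 0
    return V

Y = np.array([[1.0, 2.0, np.nan, 4.0],
              [2.0, np.nan, 1.0, 3.0],
              [0.5, 1.5, 2.5, np.nan]])
print(cov_algorithm2(Y))
```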

Recall that the SIQ is calculated using biased estimates and has a geometric interpretation for a Gaussian distribution (see Eq. (2)). Therefore, a divisor equal to N could also be used in Eq. (19), while a biased estimate of $\sigma^2$ should be used in model (20) [11].

Remark. The only serious criticism regarding SIQ as a measure of information quantity is that it does not take into account the subjective value of information to the consumer.

The algorithms proposed above solve this problem to some extent, since they can also be used in the most general case for calculating the SIQ by Eq. (1). Moreover, when calculating the estimates of the matrices $V_\xi$, $V_\eta$ and $V_{\xi\eta}$, we should construct and use the corresponding data availability matrices $N_\xi$ and $N_\eta$ (see above). Evidently, if unstable information has high value (Algorithm 2), the estimate $V_{\xi\eta}$ should be calculated using the corresponding elements of the matrix $N_\xi N_\eta^T$ as divisors (see Eq. (22)). Any subjective preferences for the value of information with respect to individual components of the given vectors can be satisfied by combining Algorithms 1 and 2.

Before we discuss the results, we should note that using principal component regression and information ordering based on this regression (see the end of the previous section) has made it possible to solve a number of applied problems. The most important of these problems are the following:

optimal integration of satellite and ground-based observations [12];

detecting the core of the corruption structure [14];

detecting the zones with dynamic instability of atmospheric circulation [15];

optimization of environmental monitoring [16];

optimization of microelectronic production control [17].

This approach can be used for solving a wide range of applied problems, which confirms the practical significance of our study.

Results and discussion

Reviewing the situation as a whole, we should first of all pay tribute to Pokrovsky's studies (see Introduction and Ref. [3]), establishing that both the case of stochastic regression parameters and the alternative statement of the OEP problem are relevant for theoretical and practical problems. Pokrovsky was the first to use the SIQ in the problem of optimizing atmospheric sensing via satellites (see also Introduction and Ref. [3]). However, that study did not address the relationship between the SIQ and the Fisher information matrix (see Eqs. (10), (15), (16)), as well as other issues considered in our paper.

The traditional statement of the OEP problem has generated some interesting mathematical results, as this approach makes it possible to solve problems based on continuous plans by invoking methods of mathematical analysis. Therefore, a theoretically substantiated answer to the clarification made in the section "Traditional and alternative statements of OEP problem" would make these results more valuable for practical problems.

While the alternative statement of the OEP problem is fairly simple and clear, the following issue is still poorly understood. With a fixed

$$\mathrm{card}\,q = k^* \quad (k^* < n),$$

there is obviously a proper subset q which gives the maximum of the optimality criterion, $\max\,\mathrm{crit}\,M_q$ ($\mathrm{crit}\,M_q = \det M_q$, $\mathrm{tr}\,M_q$, $I(y_q,\theta)$, etc.). However, we cannot be certain, and neither can we obtain a rigorous proof, that we will find this exact value $\max\,\mathrm{crit}\,M_q$ by solving the problem through successively increasing k from 1 to $k^*$ (see the end of the section "Properties of SIQ as an OEP criterion in the alternative statement"), maximizing the given criterion at each step of the sequential selection.

A similar situation is observed in constructing an optimal regression [5]:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_{k^*} x_{k^*} + \varepsilon,$$

when we select the regressors providing the maximum coefficient of determination $R^2$, with a fixed number $k^*$ of them, from the set of potential regressors x ($\mathrm{card}\,x > k^*$).

The problem can also be solved by sequential selection of regressors, maximizing the coefficient of determination $R^2$ at each step [5]; similarly, the question of whether we can obtain the maximum value of $R^2$ by this procedure remains open.

It seems plausible enough that the answer is positive with a normal distribution. In practice,

this problem is usually solved in both cases by the inclusion and rejection procedure, stopping the process if repeated steps are detected [5].

Conclusion

We have obtained the following results.

1. We have proved that the SIQ contained in the response components with respect to stochastic parameters

a) is directly related to the Fisher information matrix;

b) is a convex functional on the set of response components;

c) is equivalent to using the D criterion in the OEP problem given a sufficiently large scale of parameters.

2. The SIQ has been determined with respect to constant regression parameters; this SIQ is also a convex functional on the set of response components.

3. We have analyzed the relationship between traditional and alternative statements of the OEP problem.

4. We have considered principal component regression in the problem of information ordering of data.

5. We have proposed algorithms accounting for the subjective value of information with partially missing data.

Finally, we should stress that the studies cited above [3, 12, 14-17] conclusively prove that the regression model with stochastic parameters can be used in the alternative statement of the OEP problem; the practical significance of the results obtained in this study is fully confirmed by [14-17].

REFERENCES

1. Shannon C.E., Weaver W., The mathematical theory of communication, University of Illinois Press, Urbana, 1949.

2. Gelfand I.M., Yaglom A.M., Computation of the amount of information about a stochastic function contained in another such function, Uspekhi Matematicheskikh Nauk. 12 (1 (73)) (1957) 3-52.

3. Pokrovsky O.M., Ob optimalnyh usloviyah kosvennogo zondirovaniya atmosfery [On the optimum conditions of the atmosphere indirect sensing], Izvestiya AS USSR, Physics of Atmosphere and Ocean. 5 (12) (1969) 1324-1326.

4. Kendall M.G., Stuart A., The advanced theory of statistics, Vol. 2: Inference and relationship, Charles Griffin & Company limited, London, 1958 -1966.

5. Seber G.A.F., Linear regression analysis, John Wiley & Sons, New York, London, Sydney, Toronto, 1977.

6. Pichugin Yu.A., O svyazi kolichestva informatsii po Shennonu s informatsionnoy matritsey Fishera v zadache planirovaniya regressionnogo experimenta [On the relation of the amount of Shannon's information with Fisher's information matrix in the problem of regression experiment planning], The Issue of Inter-University Scientific Articles "Informatics - Research and Innovation", No. 3, Saint-Petersburg, Herzen University (1999) 32-36.

7. Pichugin Yu.A., Notes on selecting data in the tasks associated with linear multiple regression, Industrial Laboratory. Diagnostics of materials. 5(2002) 61-62.

8. Matematicheskaya teoriya planirovaniya experimenta [Mathematical theory of experiment planning], Ed. S.M. Ermakov, Nauka, Moscow, 1983.

9. Ermakov S.M., Zhiglyavskiy A. A., Matematicheskaya teoriya optimalnogo experimenta [Mathematical theory of optimal experiment], Nauka, Moscow, 1987.

10. Von Neumann J., Morgenstern O., Theory of games and economic behavior, Princeton University Press, Princeton, 1953.

11. Pichugin Yu.A., Notes on using the principal components in the mathematical simulation, St. Petersburg Polytechnical State University Journal. Physics and Mathematics. 11 (3) (2018) 74-89.

12. Pichugin Yu.A., Pokrovsky O.M., On the method of complexing the ground and spaceborne meteorologic information, Remote Sensing, (6) (1992) 25-31.

13. Jenkins G.M., Watts D.G., Spectral analysis and its applications, Holden-Day, San Francisco, Cambridge, London, Amsterdam, 1969.

14. Pichugin Yu.A., Malafeyev O.A., Rylow D., Zaitseva I., A statistical method for corrupt agents detection, International Conference on Numerical Analysis and Applied Mathematics (ICNAAM 2017), AIP Conf. Proc. 1978, 100014-1-100014-4.

15. Pichugin Yu.A., Geografiya dinamicheskoy neustoychivosti tsirkulyatsii atmosfery v Severnom polusharii (modelirovanie i analiz) [Geography of dynamic instability of atmospheric circulation in the Northern hemisphere (simulation and analysis)], Reports of Russian Geographical Society. 137 (3) (2005) 12-16.

16. Pichugin Yu.A., Ekologicheskiy monitoring i metody mnogomernoy matematicheskoy statistiki [Environmental monitoring and methods of multivariate mathematical statistics], Astrakhan Bulletin of Environmental Education. (2) (2012) 101-105.

17. Gusman Yu.A., Pichugin Yu.A., Smirnov A.O., Shannon's informative value for control of microelectronic productions, Voprosy Radioelektroniki [Radio Electronics Issues]. (10) (2018) 6-10.

THE AUTHOR

PICHUGIN Yury A. — Dr. Sci. (Phys.-Math.), Professor at the Institute of Innovation and Basic Master's Training, Saint Petersburg State University of Aerospace Instrumentation.

61 Bolshaya Morskaya St., St. Petersburg, 190000, Russian Federation

yury-pichugin@mail.ru


Received 22.05.2019, accepted 14.06.2019.


© Peter the Great St. Petersburg Polytechnic University, 2019
