CC BY
EVALUATION OF RELIABILITY INDICES AND CHARACTERISTICS OF POWER SYSTEM EQUIPMENT AND DEVICES BY A NON-TRADITIONAL METHOD

Farzaliyev Y.Z., Farhadzadeh E.M.

Azerbaijan Scientific - Research and Design - Prospecting Institute of Energetic, Baku city, H. Zardabi Avenue 94 [email protected]

Abstract

This paper considers the expediency of classifying statistical data according to given varieties of signs. The study is based on modeling small, multidimensional samples from statistical distribution functions. A discrepancy was found in the estimate of the mathematical expectation of the average values of the sample realizations; to overcome this inconsistency, a new method for modeling samples of random variables is proposed. It is established that, in the literature, data are classified according to the varieties of signs accepted in the classifiers, without any check of expediency. The causes of the errors that arise when the Kolmogorov statistic is evaluated as the largest absolute deviation between the statistical distribution functions of the population and the sample are analyzed using simulation modeling, fiducial intervals and the theory of testing statistical hypotheses. With a small number of sample realizations and multidimensional data, these erroneous calculations double the Type II error. Finally, the results show the advantages of the new method over the Kolmogorov criterion for checking the representativeness of a sample.

Keywords: representativeness, reliability indices, varieties of signs, statistical distribution function, simulation modeling, fiducial intervals, testing statistical hypotheses, Type I and II errors, sample, population, multidimensionality

1. Introduction

This paper presents the results of a study of the most difficult case of estimating the statistical distribution function for given varieties of signs. An analogue of the problem solved is Kolmogorov's criterion, with the significant difference that the statistical distribution function is here compared with the analytical distribution of a random variable. The practical application of this criterion is often erroneous, since what is compared with the critical value is not the Kolmogorov statistic but the magnitude of the largest difference between the given distribution function and the statistical distribution function of a random sample, which resembles it. It is shown that, where the deviation between the distribution functions is in doubt, the risk of an erroneous decision is reduced by taking into account the magnitude of the Type II error. Studying this issue allowed us to identify the main cause of this error and to indicate a way to eliminate it. Under the above criteria, the representativeness of the sample is recognized only if the significance of the statistic "the greatest value of a random variable" is substantially higher than that of the other statistics.

The main assumption is that the operational statistical data can be represented as a representative sample from the population of these data, i.e. that the data are homogeneous. The reliability indices calculated in this way are naturally averaged characteristics. In reality, the data belong to the class of multidimensional data. However, for lack of methods for analyzing multidimensional data, they are mistakenly treated as an analogue of the general population, and reliability indices and characteristics are calculated using methods designed for samples from a general population, i.e. data with one constant distribution law. This assumption in turn leads to erroneous decisions, with all the ensuing consequences. Ensuring reliability therefore requires the ability to compare estimates of the reliability indices of specific electrical equipment, i.e. a transition from average reliability indices to indices of individual reliability.

In analyzing the reliability of electric power system equipment, operational statistical data are classified by one sign, and sometimes by two. Classification by more than two signs is not practiced. The reasons are the diversity of the varieties of signs and the decrease in the accuracy of the estimated reliability indices. The accuracy decreases because of the assumption that the statistical data correspond to a random sample from a general population.

Statistical data characterizing the reliability of electric power system equipment (information on non-operating states) depend on a large number of passport and operational data (installation site, voltage class, design, service life and other signs). That is why they can be considered neither an analogue of the general population nor a finite sample of homogeneous data. Multidimensional data are specified not only by a set of random variables characterizing the reliability of the studied objects, but also by a set of varieties of signs characterizing each random variable.

When multidimensional statistical data are classified by given varieties of signs, the sample is extracted non-randomly from a finite population of multidimensional data. A non-random sample consists of random variables, and the features of their distribution over the variation interval of the random variables of the finite population depend on the varieties of signs. The type of distribution law of a finite population of multidimensional statistical data is not known and keeps changing randomly as statistical data accumulate. The variation interval of a random variable in a sample taken from the finite population for given varieties of signs is no longer than the variation interval of that random variable in the finite population.

These features allow us to conclude that applying classical methods for analyzing samples from a general population to small samples from a finite population of multidimensional data increases the risk of an erroneous decision.

2. Methods

The method and algorithm for calculating the statistical distribution function (s.d.f.) that characterizes the largest deviation between $F_\Sigma(\xi)$ and $F_s^*(\xi)$, provided that $F_s^*(\xi)$ is unrepresentative, consists of the following sequence of calculations:

1. The next sample (of the required N realizations) of n random numbers is simulated;

2. The s.d.f. $F_s^*(\xi)$ is formed;

3. The largest divergence between $F_\Sigma(\xi)$ and $F_s^*(\xi)$ is determined. We denote this value by $\Delta_{n,emp}$, where the index "emp" reflects the empirical character of the sample.

Having determined the statistical characteristics of this sample, $F_s^*(\xi)$ and $\Delta_{n,emp}$, we proceed to the formation of $F^*(\Delta_n^*)$ from the realizations of the greatest divergence between the distribution function $F_\Sigma(\xi)$ and the set of N s.d.f. $F_s^*(y)$ modeled on $F_s^*(\xi)$. For this purpose:

4. From $F_s^*(\xi)$ the distribution (1) is formed, where $i = 1, \ldots, (n+1)$ and $\xi$ is a random variable with a uniform distribution in the interval [0,1]:

$$F_s^*(y) = \begin{cases} 0 & \text{if } y < y_1 \\ \dfrac{i-1}{n+1} + \dfrac{y - y_i}{(y_{i+1} - y_i)(n+1)} & \text{if } y_1 \le y \le y_{n+1} \\ 1 & \text{if } y > y_{n+1} \end{cases} \qquad (1)$$

5. A random number $\xi$ with a uniform distribution in the interval [0,1] is simulated with the standard RAND() program;

6. From distribution (1), the random number y corresponding to the probability $\xi$ is calculated by the formula:

$$y = y_i + (y_{i+1} - y_i)\,[\xi (n+1) - (i-1)] \qquad (2)$$

with $i = 1, \ldots, (n+1)$;

7. Items 5 and 6 are repeated n times;

8. From the sample $\{y\}_n$ the s.d.f. $F_s^*(y)$ is built;

9. The largest divergence between $F_\Sigma(\xi)$ and $F_s^*(y)$ is determined. Denote it by $\Delta_n^*$;

10. Items 5 to 9 are repeated N times;

11. The average value of the random variable $\Delta_n^*$ is determined. Denote it by $M^*(\Delta_n^*)$;

12. From the N values of $\Delta_n^*$, the s.d.f. $F^*(\Delta_n^*)$ is formed [1].
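The sequence of calculations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program. The function names `inverse_sdf`, `largest_divergence` and `simulate_divergences` are hypothetical. The piecewise-linear s.d.f. is assumed to be built on the sorted sample plus the two bounds of the variation interval (n + 2 nodes, n + 1 equiprobable segments), the population distribution is assumed uniform on [0, 1], and the divergence is taken with its sign at the point where its absolute value is largest. The sample is the one used in the example of Table 1; the bounds 0.5 and 1.0 are likewise an assumption.

```python
import random
from statistics import fmean


def inverse_sdf(nodes, xi):
    """Inverse of a piecewise-linear s.d.f. at probability xi (cf. formula (2)).

    Assumption: `nodes` are sorted and define equiprobable linear segments.
    """
    k = len(nodes) - 1                 # number of linear segments
    i = min(int(xi * k), k - 1)        # 0-based index of the segment holding xi
    return nodes[i] + (nodes[i + 1] - nodes[i]) * (xi * k - i)


def largest_divergence(sample):
    """Signed deviation x_i - i/n that is largest in absolute value
    (comparison with a uniform distribution on [0, 1])."""
    ordered = sorted(sample)
    n = len(ordered)
    deviations = [x - (i + 1) / n for i, x in enumerate(ordered)]
    return max(deviations, key=abs)


def simulate_divergences(emp_sample, lower, upper, n_rep, rng):
    """Steps 4-10: resample from the s.d.f. of emp_sample, collect divergences."""
    nodes = [lower] + sorted(emp_sample) + [upper]
    n = len(emp_sample)
    result = []
    for _ in range(n_rep):
        resample = [inverse_sdf(nodes, rng.random()) for _ in range(n)]
        result.append(largest_divergence(resample))
    return result


emp_sample = [0.86346, 0.50672, 0.91424, 0.67210]    # sample of the Table 1 example
delta_emp = largest_divergence(emp_sample)            # step 3
divergences = simulate_divergences(emp_sample, 0.5, 1.0, 1000, random.Random(1))
mean_divergence = fmean(divergences)                  # step 11
sdf_of_divergence = sorted(divergences)               # step 12: ordered realizations
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; the ordered list `sdf_of_divergence` is only the raw material for the s.d.f., whose k-th value corresponds to the probability k/N.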

* *

If we assume that the distribution $F^*(\Delta_n^*)$ follows the normal distribution law, and that the average value $M^*(\Delta_n^*)$ is equal to $\Delta_{n,emp}$ and corresponds to $F^*(\Delta_n^*) = P = 0.5$, then for all realizations of $\Delta_{n,emp}$ whose probability is $0.1 < \alpha < 0.5$, preference should be given to assumption $H_2$. However, the assumption that $F^*(\Delta_n^*)$ follows the normal law does not correspond to reality. As an example, figure 1 shows a histogram of the distribution of the realizations $\Delta_n^*$ for the s.d.f. $F_s^*(y)$ given in table 1.

Figure 1. Histogram of the realizations of $\Delta_n^*$

Experimental studies have established that:

- evaluating the expediency of data classification by comparing the boundary values of the confidence intervals of reliability indices increases the risk of erroneous decisions;

- the error in using the absolute value of the largest discrepancy between $F_\Sigma^*$ and $F_s^*$ instead of the calculated value of Kolmogorov's statistic lies in the difference between their distribution functions and, consequently, between their critical values [2];

- the regression equations for the boundary values of the fiducial intervals, obtained with a standard power-transformation program with a coefficient of determination $R^2 > 0.999$, have the form:

$$\underline{\Delta}(n_s) = -A\, n_s^{-0.5}, \qquad \overline{\Delta}(n_s) = -\underline{\Delta}(n_s) - n_s^{-1}, \qquad \text{where } A = 0.652\, \alpha^{-0.175};$$

at $\alpha = 0.05$:

$$\underline{\Delta}(n_s) = -1.12\, n_s^{-0.5}, \qquad \overline{\Delta}(n_s) = (1.12\, n_s^{0.5} - 1)\, n_s^{-1};$$

- the risk of an erroneous decision is reduced by taking into account the significance of the difference between the distributions of the random divergence between the s.d.f. of the population and that of the data sample.

In this regard, there is a need to model the s.d.f. of statistical parameters of random variables [3].

3. Results and Discussion

Classifying statistical data according to given varieties of signs presupposes, first of all, that its expediency can be evaluated.

One way to characterize the expediency of data classification is to assess the nature of the divergence between the s.d.f. of a finite population of multidimensional data and that of a sample of these data for given varieties of signs.

Thus, the developed method and algorithm determine the most significant varieties of signs and, therefore, the working sample at each stage of the classification of multidimensional data. The risk of erroneous classification of multidimensional statistical data is reduced by evaluating the expediency of such a classification. The comparison between $F_\Sigma^*(x)$ and $F_s^*(x)$ is based on the statistical modeling of $n_s$ pseudorandom numbers $\xi$, equal in number to the random variables of the sample, with a uniform distribution in the interval [0,1]. A precondition for this is the random nature of the difference between $F_\Sigma(\xi)$ and $F_s^*(\xi)$.

In solving the problem of evaluating the expediency of classifying multidimensional data, the representativeness of the sample $\{\xi\}_n$ is controlled by Kolmogorov's criterion. According to this criterion, the sample $\{\xi\}_n$ is not representative if:

$$D_n > D_{n,(1-\alpha)} \qquad (3)$$

where:

$$D_n = \max(D_n^+, D_n^-) \qquad (4)$$

$$D_n^+ = \max_{1 \le i \le n}\{D_i^+\}, \qquad D_n^- = \max_{1 \le i \le n}\{D_i^-\} \qquad (5)$$

$$D_i^+ = \frac{i}{n} - \xi_i \qquad (6)$$

$$D_i^- = \xi_i - \frac{i-1}{n} \qquad (7)$$

$D_{n,(1-\alpha)}$ is the critical value of the statistic $D_n$, provided that $F_\Sigma(\xi)$ and $F_s^*(\xi)$ differ randomly.

In Kolmogorov's criterion it is noted that evaluating $D_n$ by the formula

$$D_n^+ = \max_{1 \le i \le n}\{D_i^+\} \qquad (8)$$

leads to incorrect decisions about the relation between $F_\Sigma(\xi)$ and $F_s^*(\xi)$. The reason for this discrepancy is not specified. When n is not known in advance, the calculation time is reduced by using the Stephens approximation, with which the tabulated critical values $D_{n,(1-\alpha)}$, which depend on n and $\alpha$, reduce to a dependence on $\alpha$ alone. The sample $\{\xi\}_n$ is unrepresentative if:

$$A \cdot D_{n_v} > S_{1-\alpha} \qquad (9)$$

where

$$A = \sqrt{n_v} + 0.12 + \frac{0.11}{\sqrt{n_v}} \qquad (10)$$

For example, for a sample size of $n_s = 4$, $A = 2.175$; for $\alpha = 0.1$ the critical value is $S_{1-\alpha} = 1.224$, and for $\alpha = 0.05$ it is $S_{1-\alpha} = 1.358$.

Applying the inverse problem-solving method, in which it is known in advance that the sample $\{\xi\}_n$ is unrepresentative, has shown that for the values $\alpha = 0.05$ and $\alpha = 0.1$ most commonly used in practice, criteria (3) and (8) establish the non-random nature of the divergence between $F_\Sigma(\xi)$ and $F_s^*(\xi)$ at small $n_s$ only in those cases where it is not in doubt.

To confirm this statement, consider the following example. Let random numbers y have a uniform distribution $F_\Sigma(y)$ in the interval [0.5; 1]. A random sample $\{y\}_n$ with n = 4 is specified: {0.86346; 0.50672; 0.91424; 0.67210}. Let us check the assumption that this sample is representative of the uniform distribution law of the random variable $\xi$ in the interval [0,1]. The results of the calculations are given in table 1.

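Table 1. An example of checking the representativeness of the sample

| i | $\xi_i$ | i/n | $D_i^+$ | $D_i^-$ | Note |
|---|---------|------|---------|---------|------|
| 1 | 0.507 | 0.25 | -0.257 | +0.506 | $D_n^+ = 0.086$; $D_n^- = 0.506$ |
| 2 | 0.672 | 0.50 | -0.172 | +0.422 | $D_n = 0.506$; $D_n < D_{4;0.9} = 0.565$ |
| 3 | 0.863 | 0.75 | -0.113 | +0.363 | $A \cdot D_n = 1.101$ |
| 4 | 0.914 | 1.00 | +0.086 | +0.164 | $A \cdot D_n < S_{0.9} = 1.224$ |

As follows from table 1, the sample $\{y\}_4$ does not contradict the assumption of representativeness with respect to $F_\Sigma$ at $\alpha = 0.1$.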
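The check summarized in table 1 can be reproduced with a short Python sketch. The function names `kolmogorov_statistics` and `stephens_factor` are hypothetical, and the coefficient 0.11 in the correction factor is an assumption taken from the standard form of the Stephens approximation, which reproduces the value A = 2.175 quoted for n = 4. The critical value 1.224 is the one given in the text for a significance level of 0.1.

```python
import math


def kolmogorov_statistics(sample):
    """Return (D+, D-, D_n) for a sample tested against uniform [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))   # formulas (5), (6)
    d_minus = max(x - i / n for i, x in enumerate(xs))        # formulas (5), (7)
    return d_plus, d_minus, max(d_plus, d_minus)              # formula (4)


def stephens_factor(n):
    """Correction factor A of formula (10); 0.11 is an assumed coefficient."""
    return math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)


sample = [0.86346, 0.50672, 0.91424, 0.67210]
d_plus, d_minus, d_n = kolmogorov_statistics(sample)
modified = stephens_factor(len(sample)) * d_n     # left-hand side of (9)
is_unrepresentative = modified > 1.224            # S_0.9 from the text
```

With unrounded inputs the product comes out slightly above the tabulated 1.101, which was computed from the rounded value 0.506; the decision is the same, and the criterion does not reject a sample that is known to be unrepresentative.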

These features, and some assumptions about the reasons for their occurrence, required a move from analyzing the absolute value of the largest discrepancy $D_n$ to analyzing the distribution of the realizations of the vertical discrepancy between $F_\Sigma(\xi)$ and $F_s^*(\xi)$ that is largest in absolute value, which we denote by $\Delta_n$. Using formulas of the type

$$\Delta_n = \max_{1 \le i \le n}\left(\xi_i - \frac{i}{n}\right) \qquad (11)$$

in computer calculations leads to erroneous results. For example, according to the data of Table 1, the maximum among the four realizations of $D_i^+$ is $D_i^+ = 0.086$, whereas the vertical divergence between $F_\Sigma(\xi)$ and $F_s^*(\xi)$ that is largest in absolute value is $D_i^+ = -0.257$.
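The pitfall described here is easy to reproduce. The sketch below, with the hypothetical helper `d_plus_values`, computes the four deviations i/n minus the ordered sample value for the sample of Table 1 and compares a plain maximum with a maximum taken by absolute value; it illustrates the point and is not the authors' program.

```python
def d_plus_values(sample):
    """Deviations i/n - x_i for the ordered sample (uniform [0, 1] reference)."""
    xs = sorted(sample)
    n = len(xs)
    return [(i + 1) / n - x for i, x in enumerate(xs)]


values = d_plus_values([0.86346, 0.50672, 0.91424, 0.67210])
plain_max = max(values)                  # picks the only positive deviation
largest_by_abs = max(values, key=abs)    # keeps the sign of the largest |deviation|
```

A plain `max` returns the small positive deviation in the fourth row, while `max(..., key=abs)` returns the negative deviation in the first row, which is the one that actually measures the greatest vertical gap.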

Figure 2. Block diagram of the algorithm for calculating the largest divergence between the distributions $F_\Sigma(\xi)$ and $F_s^*(\xi)$

The systematized results of these realizations, presented in Table 2, allow us to conclude the following.

Table 2. Some results of evaluating the s.d.f. $F^*(\Delta_n)$ (quantiles of $\Delta_n$ for the given values of $F^*(\Delta_n)$)

| n | 0.025 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 | 0.975 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | -0.842 | -0.775 | -0.684 | -0.551 | -0.473 | -0.149 | -0.363 | -0.304 | -0.239 | -0.060 | 0.184 | 0.285 | 0.343 |
| 3 | -0.7094 | -0.635 | -0.566 | -0.471 | -0.400 | -0.335 | -0.296 | -0.252 | -0.200 | -0.145 | 0.231 | 0.299 | 0.372 |
| 4 | -0.623 | -0.567 | -0.494 | -0.414 | -0.355 | -0.302 | -0.253 | -0.217 | -0.173 | 0.155 | 0.240 | 0.319 | 0.377 |
| 5 | -0.567 | -0.511 | -0.449 | -0.370 | -0.318 | -0.274 | -0.232 | -0.190 | -0.147 | 0.164 | 0.246 | 0.309 | 0.360 |
| 6 | -0.523 | -0.469 | -0.411 | -0.338 | -0.292 | -0.252 | -0.215 | -0.173 | -0.127 | 0.171 | 0.244 | 0.303 | 0.358 |
| 7 | -0.481 | -0.438 | -0.384 | -0.318 | -0.274 | -0.235 | -0.201 | -0.162 | -0.113 | 0.165 | 0.235 | 0.290 | 0.342 |
| 11 | -0.389 | -0.353 | -0.309 | -0.255 | -0.219 | -0.189 | -0.110 | -0.129 | -0.097 | 0.160 | 0.216 | 0.260 | 0.302 |
| 16 | -0.33 | -0.295 | -0.258 | -0.215 | -0.184 | -0.158 | -0.134 | -0.103 | 0.107 | 0.150 | 0.194 | 0.232 | 0.264 |
| 22 | -0.280 | -0.253 | -0.221 | -0.183 | -0.157 | -0.135 | -0.113 | -0.083 | 0.105 | 0.137 | 0.176 | 0.210 | 0.235 |
| 29 | -0.246 | -0.219 | -0.193 | -0.160 | -0.138 | -0.119 | -0.099 | -0.068 | 0.098 | 0.126 | 0.158 | 0.186 | 0.212 |
| 40 | -0.208 | -0.187 | -0.164 | -0.136 | -0.119 | -0.102 | -0.084 | -0.050 | 0.089 | 0.112 | 0.140 | 0.164 | 0.185 |
| 60 | -0.173 | -0.156 | -0.137 | -0.114 | -0.097 | -0.083 | -0.069 | 0.054 | 0.077 | 0.096 | 0.118 | 0.138 | 0.155 |
| 90 | -0.142 | -0.127 | -0.111 | -0.092 | -0.079 | -0.068 | -0.055 | 0.051 | 0.067 | 0.081 | 0.100 | 0.116 | 0.130 |
| 120 | -0.122 | -0.110 | -0.096 | -0.080 | -0.068 | -0.059 | -0.047 | 0.047 | 0.060 | 0.072 | 0.089 | 0.102 | 0.114 |
| 150 | -0.110 | -0.099 | -0.086 | -0.071 | -0.062 | -0.053 | -0.042 | 0.041 | 0.053 | 0.065 | 0.079 | 0.092 | 0.104 |

1. For n > 2, the quantiles of the distribution $F^*(\Delta_n) = \alpha$ are equal in magnitude and opposite in sign to the quantiles of the distribution $F(D_n) = 2\alpha$ (the difference in sign is due to the difference between formulas (6) and (11));

2. The distribution $F^*(\Delta_n)$ is asymmetrical. For illustration, figure 3 shows the s.d.f. $F^*(\Delta_n)$ for a number of values of $n_s$. It is precisely the assumption that the distribution $F(\Delta_n)$ is symmetrical that can explain the discrepancy between the probabilities of the almost equal quantiles of the distributions $F^*(\Delta_n)$ and $F(D_n)$;

3. The smaller $\xi_n$ is, the larger in magnitude the negative value of $\Delta_n$, since $\Delta_n = (\xi_n - 1)$. According to experimental data, for n = 2 the smallest value was $\Delta_n = -0.992$ and the largest $\Delta_n = +0.489$, with limiting magnitudes of 1 and 0.5 respectively;

4. In the distribution $F^*(\Delta_n)$ we distinguish the lower boundary value $\underline{\Delta}_n$ and the upper boundary value $\overline{\Delta}_n$ for a significance level $\alpha$, i.e.

$$F^*(\underline{\Delta}_n) = \alpha/2, \qquad F^*(\overline{\Delta}_n) = 1 - \alpha/2 \qquad (12)$$

5. It was established that for $F^*(\Delta_n) < 0.25$ or $F^*(\Delta_n) > 0.75$, i.e. for $\alpha < 0.5$:

$$\overline{\Delta}_n = -\left(\frac{1}{n} + \underline{\Delta}_n\right) \qquad (13)$$

For example, for n = 4 and $\alpha = 0.1$, in accordance with the distribution $F^*(\Delta_n)$ (see table 2), $\underline{\Delta}_4 = -0.567$ and $\overline{\Delta}_4 = +0.319$. At the same time, formula (13) gives $-(0.25 - 0.567) = 0.317 \approx \overline{\Delta}_4$.

If n = 29 and $\alpha = 0.2$, then $\underline{\Delta}_n = -0.193$ and $\overline{\Delta}_n = 0.158$. Formula (13) gives $\overline{\Delta}_n = -(0.034 - 0.193) = 0.159$.

Figure 3. S.d.f. $F^*(\Delta_n)$ for a number of values of $n_s$

Figure 4 shows histograms of the distribution of negative and positive values of $\Delta_n$ for n = 4 and n = 29.

Figure 4. Histograms of the distribution of the greatest divergence between the distributions $F_\Sigma(\xi)$ and $F_s^*(\xi)$ (panels for n = 4 and n = 29)

As follows from figure 4, negative values of $\Delta_n$ significantly exceed positive ones both in relative number and in range of variation. Based on item 3, it is clear that this is not accidental and does not indicate that the sample is unrepresentative. With increasing n, the ratio of the numbers of negative and positive values of $\Delta_n$ decreases and tends to unity: negative values of $\Delta_n$ make up 87.5% for n = 2, 61% for n = 29 and 55% for n = 150. Thus, even for n = 150, the quantiles of the distribution $F^*(\Delta_n)$ for $\alpha = 0.05$ and $\alpha = 0.95$, [-0.099; +0.092], are not equal in magnitude. The histograms also explain the patterns of the distribution $F^*(\Delta_n)$ shown in figure 3.
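The predominance of negative values can be checked with a small simulation. The sketch below assumes that the divergence is the signed deviation of the i-th ordered value from i/n with the largest absolute value, and that the sample is drawn from the uniform distribution on [0, 1]; the function names, the seed and the number of replications are illustrative choices, not taken from the source.

```python
import random


def signed_largest_divergence(sample):
    """Signed deviation x_i - i/n that is largest in absolute value."""
    xs = sorted(sample)
    n = len(xs)
    return max((x - (i + 1) / n for i, x in enumerate(xs)), key=abs)


def negative_share(n, n_rep, rng):
    """Fraction of simulated uniform samples of size n with a negative divergence."""
    negatives = sum(
        signed_largest_divergence([rng.random() for _ in range(n)]) < 0
        for _ in range(n_rep)
    )
    return negatives / n_rep


share_n2 = negative_share(2, 20000, random.Random(0))
share_n29 = negative_share(29, 20000, random.Random(0))
```

Under these assumptions the share for n = 2 should land close to the 87.5% reported in the text, and the share for n = 29 should be noticeably lower while staying above one half.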

Figure 5 shows how the boundary values of the statistic $\Delta_n$ change for a number of values of the s.d.f. $F^*(\Delta_n)$. The criterion for controlling the representativeness of the sample $\{\xi\}_n$ with significance level $\alpha$ is in this case:

$$\underline{\Delta}_n < \Delta_n < \overline{\Delta}_n \qquad (14)$$

Figure 5. Regularities in the change of the boundary values of the greatest divergence between the distributions $F_\Sigma(\xi)$ and $F_s^*(\xi)$

Denote positive values of $\Delta_n$ by $\Delta_n^+$ and negative values by $\Delta_n^-$. Taking into account item 1 and equation (13), a sample $\{\xi\}_n$ can be taken as representative at a significance level $\alpha < 0.5$ if:

$$\Delta_n^+ < D_{n,(1-2\alpha)} - \frac{1}{n}, \qquad |\Delta_n^-| < D_{n,(1-2\alpha)} \qquad (15)$$

Since

$$\overline{\Delta}_n + \frac{1}{n} = |\underline{\Delta}_n|,$$

criterion (14) for significance level $\alpha$ can be represented as:

$$\left(\overline{\Delta}_n + \frac{1}{n}\right) = |\underline{\Delta}_n| = D_{n,(1-2\alpha)} \qquad (16)$$

Here attention should be paid to the mismatch between the significance levels of $\Delta_n$ and $D_{n,(1-2\alpha)}$.

Turning to Table 1, it is easy to see that the interval criterion (12), which takes into account the sign of the greatest divergence $\Delta_n$, is also unable to establish the unrepresentative nature of the sample $\{y\}_n$.

It is known that the risk of an erroneous decision when classifying data can be reduced by considering not only the Type I error but also the Type II error [4].

The simplest solution to this problem would be to compare $\Delta_n$ between $F_\Sigma(\xi)$ and $F_s^*(\xi)$ with the boundary values of the interval $[\underline{\Delta}_n; \overline{\Delta}_n]$ corresponding to the significance level $\alpha = 0.5$. This is the limiting case of $\alpha$, when $\Delta_n = 0$. The Type II error is $\beta = (1 - \alpha)$, i.e. also equal to 0.5. If $\alpha$ is taken to be less than 0.5, the Type II error $\beta$ increases. In real conditions:

- the configurations of $F_\Sigma(\xi)$ and $F_s^*(\xi)$ are different, i.e. $\Delta_n \neq 0$;

- for the same values of $\Delta_n$, $(\alpha + \beta)$ is less than or equal to one;

- as $\Delta_n$ increases, $(\alpha + \beta)$ decreases, reaches its minimum (at $\Delta_{n,opt}$) and then increases;

- if $\Delta_n < \Delta_{n,opt}$, then $\alpha > \beta$; if $\Delta_n > \Delta_{n,opt}$, then $\alpha < \beta$;

- the difference between $\alpha$ and $\beta$ increases as the difference between $\Delta_n$ and $\Delta_{n,opt}$ increases.

Comparing the realizations of $\Delta_n$ with the boundary values $\underline{\Delta}_n$ and $\overline{\Delta}_n$, calculated respectively for $F^*(\underline{\Delta}_n) = 0.25$ and $F^*(\overline{\Delta}_n) = 0.75$, makes it possible not to calculate the s.d.f. that determines the Type II error $\beta$, which can be counted among the advantages of this method. Its disadvantages are the need to double the number of simulated realizations of the distribution $F_s^*(\xi)$, the unjustified reduction of the spread of $\Delta_n$, and the heuristic nature of the approach.

Conclusions

1. Statistical data on the reliability of electric power system equipment and devices form a finite population of multidimensional data. Therefore, applying classical methods for analyzing samples from a general population to samples from multidimensional data increases the risk of erroneous decisions;

2. A study of the accuracy of existing methods for modeling random variables from an s.d.f. showed that the methods differ in accuracy only when the number of sample realizations $n_s < 20$. A new method of modeling random numbers from an s.d.f. is recommended;

3. Experimental studies have established that, at the significance levels used in practice, if the critical value of Kolmogorov's statistic at significance level $\alpha$ is equal to the estimate of the magnitude of the largest divergence, then the significance level of this estimate is equal to $0.5\alpha$;

4. The risk of an erroneous decision is reduced by taking into account the significance of the difference between the distributions of the random divergence between the s.d.f. of the population and that of the data sample;

5. Assessing the expediency of data classification requires simulation modeling of realizations of random variables, together with the mathematical apparatus of the theory of testing statistical hypotheses and of fiducial probabilities.

References

[1] Buslenko, N.P. (1978) Modelirovanie slozhnyh sistem. Moskva, Nauka. 400 s. (In Russian).

[2] Gnedenko, B.V., Belyaev, Yu.K. and Solovev, A.D. (1965) Matematicheskie metodyi v teorii nadezhnosti. M., Nauka. 524 s. (In Russian).

[3] Kendall, M.Dzh. and St'yuart, A. (1966) Teoriya raspredeleniy. Per. s angl. Pod red. A.N. Kolmogorova. 587 s.

[4] Ryabinin, I.A. (1971) Osnovyi teorii i rascheta nadezhnosti sudovyih elektroenergeticheskih sistem. L., Sudostroenie. 454 s. (In Russian).
