SOFTWARE CONTRIBUTION TO THE AVAILABILITY OF MICROPROCESSOR-BASED RELAY PROTECTION

M.I. Uspensky


Komi SC UB RAS, Syktyvkar, Russian Federation. uspensky@energy.komisc.ru

Abstract

An important characteristic of relay protection performance is the availability of the software of microprocessor relay protection. This paper considers an approach to estimating this parameter and its relationship to hardware availability, using the microprocessor protection of a 110/35/10 kV distribution network as an example. The behavior of the availability under study, the causes and shares of the various kinds of errors that lead to program execution failures, options for defining program size, some approaches to the problem, including the Jelinsky-Moranda method, and examples of assessing the ratio of these availabilities are considered. The algorithm used for software evaluation is presented, and the influence of different conditions on the evaluation is shown. Applications of different approaches to estimating software availability for these types of protection, based on data collected during debugging of the protection programs, are given.

Key words: reliability, availability, software, relay protection module.

I Introduction

The reliability index is an important characteristic of the functioning of relay protection and automation (RPA). Many authors, including the present author [1, 2], have noted that for modern digital protections this characteristic is conveniently divided into components: hardware (technical) reliability, associated with failure (destruction) of relay protection device elements; traffic reliability, determined by temporary loss or distortion of data without failure of a process bus element; software reliability, determined by errors made in developing the executable programs; and resistance to deliberate external interference with the transmitted information. In [3], the contribution of the first component to the reliability indicator was analyzed using the example of the 110/35/10 kV distribution network protection system. Here we consider an approach to characterizing software reliability and, using the same system as an example, evaluate the contribution of this component to the total availability of these protections.

II Specifics of software reliability

Software failure is associated with the inability of the software to perform the tasks set for it. There are many definitions of software failure, and most definitions of software reliability come down to the following [4]: software reliability is the probability that a program will work without failures for a certain period of time, taking into account the degree of their influence on the output results. The frequency of error causes from statistical data, normalized to 100% of errors, is given in Table 1, where the item "Incomplete or erroneous assignment" is broken down in more detail.

On the one hand, software is not subject to wear, and its reliability is determined only by development errors. Thus, this indicator should improve with time, provided that the correction of detected errors does not introduce new ones. On the other hand, the experience of many programmers shows that some errors remain in large software no matter how much it is tested. Testing that simulates almost all real operating modes allows errors of incorrect software operation to be corrected, but there always remains a set of data arising from some, usually external, conditions (for example, interference or erroneous human actions) that cannot be foreseen and that will make the software work incorrectly. The next dilemma is how to optimize the quality/cost ratio so as not to lose market priority or customer confidence. It should be remembered that here we are examining the availability of the software, i.e., its readiness to work.


An error in a software system manifests itself as a failure, which leads the program either to hang (stop while waiting for a next command that never arrives) or to perform incorrect calculations that lead to erroneous actions.

A specific feature of relay protection programs is that the application programs are often written in the languages of programmable logic controllers (PLCs) [5], which reduces the probability of program errors. However, the operating environment is written in more traditional programming languages such as C, Java, etc. When the reliability of a system of programs written in different programming languages is estimated, its size is reduced to an average assembler equivalent expressed in KAELOC (K of Assembler Equivalent Lines of Code, where K stands for 1000 lines of code) [6], using the conversion factors of Table 2.

Table 1. Frequency of causes of software errors

Cause of error                                   Frequency, %
Task deviation                                   12
Ignorance of programming rules                   10
Erroneous data sample                            10
Erroneous logic or operation sequence            12
Erroneous arithmetic operations                   9
Insufficient time to solve                        4
Improper interrupt handling                       4
Incorrect constants or input data                 3
Inaccurate writing                                8
Incomplete or erroneous assignment               28
  of which (share within this item, %):
  Errors in numerical values                     12
  Insufficient accuracy requirements              4
  Erroneous characters or symbols                 2
  Mistakes in the design                         15
  Incorrect description of hardware               2
  Incomplete or inaccurate design basis          52
  Ambiguity of requirements                      13
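Since the second group in Table 1 sums to 100% of the 28% "Incomplete or erroneous assignment" item, the share of each sub-cause in all errors follows by simple scaling. A minimal Python sketch under that reading (the dictionary and function names are illustrative):

```python
# Table 1 data: causes of software errors, % of all errors.
MAIN_CAUSES = {
    "Task deviation": 12,
    "Ignorance of programming rules": 10,
    "Erroneous data sample": 10,
    "Erroneous logic or operation sequence": 12,
    "Erroneous arithmetic operations": 9,
    "Insufficient time to solve": 4,
    "Improper interrupt handling": 4,
    "Incorrect constants or input data": 3,
    "Inaccurate writing": 8,
    "Incomplete or erroneous assignment": 28,
}

# Breakdown of "Incomplete or erroneous assignment", % within that item.
ASSIGNMENT_BREAKDOWN = {
    "Errors in numerical values": 12,
    "Insufficient accuracy requirements": 4,
    "Erroneous characters or symbols": 2,
    "Mistakes in the design": 15,
    "Incorrect description of hardware": 2,
    "Incomplete or inaccurate design basis": 52,
    "Ambiguity of requirements": 13,
}


def overall_shares(parent_share, breakdown):
    """Rescale within-item shares (%) to shares of all errors (%)."""
    return {cause: parent_share * pct / 100 for cause, pct in breakdown.items()}


# Both groups are normalised to 100 %.
assert sum(MAIN_CAUSES.values()) == 100
assert sum(ASSIGNMENT_BREAKDOWN.values()) == 100

shares = overall_shares(MAIN_CAUSES["Incomplete or erroneous assignment"],
                        ASSIGNMENT_BREAKDOWN)
for cause, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{cause:40s} {pct:6.2f} %")
```

For example, an incomplete or inaccurate design basis alone accounts for about 14.6% of all errors (0.28 x 52%).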

Table 2. Conversion factors

Programming language               Factor
Assembler, macroassembler          1
C                                  2.5
C++                                11
Fortran                            3
Pascal                             3.5
LISP                               1.5
Ada                                4.5
Forth                              5

Query languages (like SQL) Object-oriented 4th generation languages PLC languages 25 16 10 ... 33
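The reduction to assembler-equivalent size can be sketched in Python with the unambiguous rows of Table 2 (the language keys are illustrative names). Applied to the module sizes given in Tables 3 and 4 (1439 assembler + 200 C++ lines and 1346 assembler + 130 C++ lines), it gives 3639 and 2776 assembler-equivalent lines, which are the N_l values listed there.

```python
# Conversion factors to assembler-equivalent lines (unambiguous rows of Table 2).
FACTORS = {
    "asm": 1.0,
    "c": 2.5,
    "c++": 11.0,
    "fortran": 3.0,
    "pascal": 3.5,
    "lisp": 1.5,
    "ada": 4.5,
    "forth": 5.0,
}


def assembler_equivalent(lines_by_language):
    """Program size in assembler-equivalent lines."""
    return sum(FACTORS[lang] * n for lang, n in lines_by_language.items())


def kaeloc(lines_by_language):
    """Program size in thousands of assembler-equivalent lines (KAELOC)."""
    return assembler_equivalent(lines_by_language) / 1000


print(assembler_equivalent({"asm": 1439, "c++": 200}))   # transformer module
print(assembler_equivalent({"asm": 1346, "c++": 130}))   # 35 kV busbar module
```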

Software bugs are mainly removed during coding and debugging, and many programs have been created to detect bugs at the debugging stage. Nevertheless, a certain (small) number of errors is expected to remain in the program. Detection programs are tuned to specific external conditions (the team that prepared the program under test, the temperature and electromagnetic environment, etc.). What should be done with the remaining errors? There are three options: 1. The vendor continues testing and identifying errors, which are then corrected at the customers' sites. 2. Customers identify bugs and report them to the developers for correction. 3. The customer changes the vendor.

III Evaluating the software's contribution to availability by programming averages

A fairly rough estimate of software availability can be obtained as follows [7]. For safety-critical applications, which include RPA software, the system delivered to the client may contain from 4 to 15 errors per 100 000 lines of program code [8]. For illustration, Windows XP has over 45 million lines of code, the NASA program 40 million, and the Linux 4.11 kernel over 18 million. If the complex of simultaneously working RPA programs is estimated at 1 million code lines, the number of errors at the beginning of software operation is E = (V/100 000)·15 = 150 errors. Then, using the formula for the average software MTBF, we get

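$$\lambda_{sw} = p\,\frac{E}{V} = 0.01\cdot\frac{150}{10^{6}} = 1.5\cdot10^{-6}\ \text{h}^{-1}, \qquad t_{sw} = \frac{1}{\lambda_{sw}} = \frac{10^{6}}{1.5\cdot 8760} \approx 76\ \text{years}, \tag{1}$$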

where E is the number of errors in the complex of jointly working programs accepted for operation, V is the size of the complex in code lines, p is the program complexity factor, usually in the range 0.001...0.01, λ_sw is the failure rate and t_sw is the MTBF of the software, and 8760 is the number of hours per year. The size of RPA application programs is most often limited to a few thousand assembler lines because of speed requirements. Then, with 15 errors per 100 000 code lines adopted for application software after testing and a program size of 4000 lines, E = 4000·15/100 000 = 0.6 errors, and

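$$\lambda_{sw} = p\,\frac{E}{V} = 0.01\cdot\frac{0.6}{4000} = 1.5\cdot10^{-6}\ \text{h}^{-1}, \qquad t_{sw} = \frac{1}{\lambda_{sw}} = \frac{10^{6}}{1.5\cdot 8760} \approx 76\ \text{years}, \tag{2}$$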

or about one failure in 76 years. With a recovery time of t_r = 2 h,

$$A_{sw} = \frac{\mu}{\lambda_{sw} + \mu} = \frac{4380}{1.5\cdot10^{-6} + 4380} = 0.9999999997.$$
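The arithmetic of (1) and (2) can be checked with a few lines of Python. Because E is proportional to V, the ratio E/V equals the assumed error density (15 per 100 000 lines), so λ_sw = p·15/100 000 does not depend on the program size; this is why (1) and (2) give the same failure rate. The sketch covers only the failure rate and MTBF, with p = 0.01 as in the text; the function names are illustrative.

```python
HOURS_PER_YEAR = 8760


def initial_errors(lines, errors_per_100k=15):
    """Expected residual errors E at delivery for a program of `lines` lines."""
    return lines / 100_000 * errors_per_100k


def software_failure_rate(errors, lines, p=0.01):
    """Average software failure rate lambda_sw = p * E / V, in 1/h."""
    return p * errors / lines


for lines in (1_000_000, 4000):             # V for eq. (1) and eq. (2)
    errors = initial_errors(lines)
    lam = software_failure_rate(errors, lines)
    t_sw_years = 1 / lam / HOURS_PER_YEAR    # MTBF in years
    print(f"V = {lines}: E = {errors:g}, lambda_sw = {lam:.2e} 1/h, "
          f"t_sw = {t_sw_years:.0f} years")
```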

IV Software contribution to availability according to the Jelinsky-Moranda model

There are a number of reliability growth models describing the process of failure detection [9, 10]. These models are classified into two groups: models that treat the number of failures as a Markov process, and models that treat the failure rate as a Poisson process. Here we use a model of the second group.

The Jelinsky-Moranda model is based on the following assumptions: 1) the time to the next failure is exponentially distributed; 2) the failure rate of a program is proportional to the number of errors remaining in the program.

This model assumes that the time between failures follows an exponential distribution with a parameter proportional to the number of errors remaining in the software. Figure 1 shows the stepped curve of the program failure rate as a function of the model run time. As each error is detected and corrected, the failure rate decreases by the proportionality constant C_D, i.e., each fault correction has the same impact.

According to these assumptions, the probability of failure-free operation of the program during the time t_i between the (i - 1)-th and i-th failures is

$$P(t_i) = e^{-\lambda_i t_i}, \tag{3}$$

where the failure rate is

$$\lambda_k = C_D\,[E_0 - (k-1)]. \tag{4}$$

Fig. 1. Failure rate curve in the Jelinsky-Moranda model: the rate decreases in steps of the constant C_D.

Here E_0 is the initial number of errors, k is the number of the last observed program failure (fault), and C_D is the proportionality factor. Time is counted from the preceding, (k - 1)-th, program failure. A drawback of the model is that it assumes complete elimination of each error after its detection, without introducing new errors.
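To illustrate eq. (4), the short Python sketch below lists the stepped failure rates for hypothetical values C_D = 0.004 h^-1 and E_0 = 5 (chosen only for illustration, not taken from the paper). Consecutive steps differ by exactly C_D, which is the equal-impact property described above.

```python
def jm_failure_rates(c_d, e0):
    """Failure rate in force before failure k = 1..E_0, eq. (4):
    lambda_k = C_D * (E_0 - (k - 1))."""
    return [c_d * (e0 - (k - 1)) for k in range(1, e0 + 1)]


# Hypothetical values chosen only for illustration (not from the paper).
rates = jm_failure_rates(c_d=0.004, e0=5)
steps = [before - after for before, after in zip(rates, rates[1:])]
print("rates, 1/h:", [round(r, 4) for r in rates])
print("steps, 1/h:", [round(s, 4) for s in steps])
```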

From model (3) and the maximum likelihood method, the likelihood function is

$$F = \prod_{i=1}^{k-1} C_D (E_0 - i + 1)\, e^{-C_D (E_0 - i + 1)\, t_i}, \tag{5}$$

or, in logarithmic form,

$$L = \ln F = \sum_{i=1}^{k-1}\left\{\ln\left[C_D (E_0 - i + 1)\right] - C_D (E_0 - i + 1)\, t_i\right\}, \tag{6}$$

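whence, setting the partial derivatives to zero to find the extremum,

$$\frac{\partial L}{\partial C_D} = \sum_{i=1}^{k-1}\left[\frac{1}{C_D} - (E_0 - i + 1)\, t_i\right] = 0, \tag{7}$$

$$\frac{\partial L}{\partial E_0} = \sum_{i=1}^{k-1}\left[\frac{1}{E_0 - i + 1} - C_D\, t_i\right] = 0. \tag{8}$$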

From (8) we get

$$C_D = \frac{\sum_{i=1}^{k-1} 1/(E_0 - i + 1)}{\sum_{i=1}^{k-1} t_i}. \tag{9}$$

Substituting (9) into (7), we obtain

$$\frac{(k-1)\sum_{i=1}^{k-1} t_i}{\sum_{i=1}^{k-1} 1/(E_0 - i + 1)} = \sum_{i=1}^{k-1} (E_0 - i + 1)\, t_i, \tag{10}$$

from which E_0 is found by trying its values. Since E_0 is an integer, we look for the value at which the difference between the left- and right-hand sides of (10) is minimal. This value is usually sought in the range k - 1 ... 2k, since the initial number of errors is not less than the known total number of corrected errors, and the number of remaining errors is usually not greater than the number of detected errors, i.e., the total is at most twice the detected number.
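The search for E_0 described above can be sketched in Python as follows. The times between failures are taken from Tables 3 and 4 (one error found per test stage); the function names are illustrative. With these data the minimum of |LHS - RHS| of (10) falls at E_0 = 5 for the transformer module and E_0 = 4 for the busbar module, in agreement with Table 5.

```python
def inverse_sum(e0, n):
    """Sum over i = 1..n of 1 / (E_0 - i + 1), with n = k - 1."""
    return sum(1 / (e0 - i + 1) for i in range(1, n + 1))


def c_d_estimate(e0, times):
    """Proportionality factor C_D from eq. (9), in 1/h."""
    return inverse_sum(e0, len(times)) / sum(times)


def discrepancy(e0, times):
    """Left-hand side minus right-hand side of eq. (10)."""
    n = len(times)
    lhs = n * sum(times) / inverse_sum(e0, n)
    rhs = sum((e0 - i + 1) * t for i, t in enumerate(times, start=1))
    return lhs - rhs


def estimate_e0(times):
    """Integer E_0 in the range k - 1 ... 2k minimising |LHS - RHS| of (10)."""
    n = len(times)                       # n = k - 1 observed failures
    return min(range(n, 2 * (n + 1) + 1), key=lambda e0: abs(discrepancy(e0, times)))


# Times between failures t_i in hours (Tables 3 and 4, one error per stage).
for name, times in (("transformer", [77, 63, 7, 187]), ("busbar", [63, 11, 117])):
    e0 = estimate_e0(times)
    print(name, "E_0 =", e0, "C_D =", round(c_d_estimate(e0, times), 6), "1/h")
```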

The manifestation intensity of the remaining program errors is then determined. According to the methodology of [11], it is calculated as

$$\lambda = \frac{\sum_{i=1}^{k-1} 1/(E_0 - i + 1)}{\sum_{i=1}^{k-1} t_i}\left(E_0 - \sum_{i=1}^{k-1} E_i\right). \tag{11}$$

M. Uspensky. SOFTWARE CONTRIBUTION TO THE AVAILABILITY OF MICROPROCESSOR-BASED RELAY PROTECTION. RT&A, No 3 (69), Volume 17, September 2022

However, this intensity refers only to the volume of lines containing errors. In reality, the failure rate is statistically defined as [12]

$$\lambda(t) = \frac{m(t)}{n(t)\,\Delta t}, \tag{12}$$

where m(t) is the number of failed elements (lines with errors) in the period Δt under consideration, and n(t) is the average number of elements (in our case, code lines or program commands) working in this interval. Therefore, the intensity obtained in (9) should be recalculated to the full volume of lines or commands of the software under study, i.e.

* = , (13)

where N_l is the number of lines of the software under study.

Assuming a constant error rate, in accordance with the Jelinsky-Moranda model concept, we calculate the mean time to error of the software:

$$t_E = \frac{1}{\lambda^*}. \tag{14}$$

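If the error detection and correction time is assumed to be 2 hours (μ = 0.5 h^-1, i.e., 4380 years^-1), the software availability coefficient, with both rates expressed in the same units, is

$$A_{sw} = \frac{\mu}{\lambda^* + \mu}. \tag{15}$$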

The calculation algorithm is shown in Fig. 2. Its initial data are: N_ll, the number of program lines in the programming languages used, which is converted by means of Table 2 into N_l, the number of commands reduced to assembler code; N_t, the number of executed tests; and the array [E_i], the number of errors detected and corrected at the i-th stage of testing, whose duration is given by the array [t_i]. DE tracks the minimal discrepancy between the left-hand (LP) and right-hand (RP) sides of formula (10) during the search for E_0. ΣE_i is the sum of errors known from the tests up to position i, and Σt_i is the sum of times between tests up to position i. The variable LA corresponds to the manifestation intensity of the remaining program errors (λ), and A_sw is the software availability index.

Fig. 2. Calculation algorithm for software availability characteristics.
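A compact Python sketch of the Fig. 2 flow for one module is given below. It assumes one error found per test stage (as in Tables 3 and 4) and uses only the Table 2 factors needed here (assembler 1, C++ 11). The step corresponding to eq. (13) is a hypothetical stand-in, λ* = λ·E_0/N_l, chosen because with it the sketch reproduces E_0 and, to within about 0.1%, the λ and t_E values of Table 5 for the transformer and busbar modules; it should not be read as the author's formula.

```python
HOURS_PER_YEAR = 8760
MU_PER_HOUR = 0.5                          # recovery rate for t_r = 2 h, as in the text
ASM_FACTORS = {"asm": 1.0, "c++": 11.0}    # subset of Table 2


def software_availability(lines_by_language, times):
    """Sketch of the Fig. 2 flow for one module (one error found per test stage)."""
    n = len(times)                          # k - 1 detected errors (sum of E_i)
    n_l = sum(ASM_FACTORS[lang] * cnt for lang, cnt in lines_by_language.items())

    def inv_sum(e0):                        # sum over i of 1 / (E_0 - i + 1)
        return sum(1 / (e0 - i + 1) for i in range(1, n + 1))

    def gap(e0):                            # |LP - RP| of eq. (10)
        rp = sum((e0 - i + 1) * t for i, t in enumerate(times, start=1))
        return abs(n * sum(times) / inv_sum(e0) - rp)

    e0 = min(range(n, 2 * (n + 1) + 1), key=gap)    # search E_0 in k - 1 ... 2k
    c_d = inv_sum(e0) / sum(times)          # eq. (9), 1/h
    lam = c_d * (e0 - n)                    # eq. (11), 1/h
    # HYPOTHETICAL normalisation standing in for eq. (13): lambda * E_0 / N_l.
    lam_star = lam * e0 / n_l * HOURS_PER_YEAR      # 1/year
    t_e = 1 / lam_star                      # eq. (14), years
    mu = MU_PER_HOUR * HOURS_PER_YEAR       # 4380 1/year
    a_sw = mu / (lam_star + mu)             # eq. (15)
    return {"N_l": n_l, "E0": e0, "lambda": lam_star, "t_E": t_e, "A_sw": a_sw}


transformer = software_availability({"asm": 1439, "c++": 200}, [77, 63, 7, 187])
busbar = software_availability({"asm": 1346, "c++": 130}, [63, 11, 117])
for name, res in (("transformer", transformer), ("busbar", busbar)):
    print(name, res)
```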


V Calculation of software contributions to the availability of relay protections

Let us evaluate the software availability of the protection and control modules for the 35 kV busbar section and the transformer section [3]. The necessary data for the modules studied are presented in Tables 3 and 4, and the results of the calculations in Table 5.

Table 3. Transformer section protection and control software module

Initial data                  E_i, errors   ΣE_i, errors   t_i, hours   Σt_i, hours
N_t = 4                       0             0              0            0
N_ll = 1439 asm + 200 C++     1             1              77           77
N_l = 3639 asm                1             2              63           140
μ = 0.5 h^-1                  1             3              7            147
                              1             4              187          334

C++: lines of C++ code; asm: lines of assembler code.

Table 4. 35 kV busbar section protection and control software module

Initial data                  E_i, errors   ΣE_i, errors   t_i, hours   Σt_i, hours
N_t = 3                       0             0              0            0
N_ll = 1346 asm + 130 C++     1             1              63           63
N_l = 2776 asm                1             2              11           74
μ = 0.5 h^-1                  1             3              117          191

C++: lines of C++ code; asm: lines of assembler code.

Table 5. Calculated software availability characteristics of the modules

Parameter        Transformer section    35 kV busbar section    Flexible logic
                 protection and         protection and          module
                 control module         control module
E_0              5                      4                       4
ΣE_i             4                      3                       3
ΔE               1                      1                       1
λ, years^-1      0.04621                0.07153                 0.01016
t_E, years       21.64                  13.98                   98.4
A_sw             0.99998945             0.99994558              0.99998367

Fig. 3. Reliability block diagram of protection (blocks CT, Link, MU, IED, IEDC, BC, PS, CB; measurement channel backup).

According to the protection reliability model for the hardware part (Fig. 3) presented in [3] and the software organization (Fig. 4), the software part of the model consists of two software protection blocks, autonomous and centralized, connected in parallel in the reliability sense with the output to the process bus, and two parallel blocks of the flexible logic program. In the software evaluation, the process bus is not considered, because it is already accounted for in the hardware model.

In accordance with the scheme of Fig. 4, we determine the equivalent failure rate and recovery rate of the corresponding transformer protection programs from the known relations λ_e = Σλ_i, μ_e = λ_e/Σ(λ_i/μ_i) for a series connection and μ_e = Σμ_i, λ_e = μ_e/Σ(μ_i/λ_i) for a parallel connection. Then for Fig. 4 the equivalent values are:

left-hand side:

$$\mu_{el} = 2\cdot0.5 = 1\ \text{h}^{-1},\qquad \lambda_{el} = \frac{1}{2\cdot0.5\cdot 8760/0.04621} = 5.2751\cdot10^{-6}\ \text{h}^{-1};$$

right-hand side:

$$\mu_{er} = 2\cdot0.5 = 1\ \text{h}^{-1},\qquad \lambda_{er} = \frac{1}{2\cdot0.5\cdot 8760/0.01016} = 1.1598\cdot10^{-6}\ \text{h}^{-1};$$

$$\lambda_e = 5.2751\cdot10^{-6} + 1.1598\cdot10^{-6} = 6.4349\cdot10^{-6}\ \text{h}^{-1} = 0.05637\ \text{years}^{-1};$$

$$\mu_e = 6.4349\cdot10^{-6} \Big/ \frac{5.2751\cdot10^{-6} + 1.1598\cdot10^{-6}}{0.5} = 0.5\ \text{h}^{-1} = 4380\ \text{years}^{-1}.$$

Fig. 4. Reliability model of transformer protection software.

Consequently,

$$A_{sw} = \frac{\mu_e}{\lambda_e + \mu_e} = \frac{4380}{0.05637 + 4380} = 0.999987.$$

From [3], in the worst case the hardware model of transformer protection gives A_hw = 0.9999999764; i.e., the hardware contributes much less to the unavailability of protection than the software. The total availability is A_Σ = A_hw·A_sw = 0.9999869, and the mean time to failure is t_E = A_Σ/(μ_e(1 - A_Σ)) = 17.6 years, or 153565 hours.

The 35 kV busbar protection software model also corresponds to the scheme of Fig. 4. For the left part, N_l = 2776 lines of assembler code, λ = 0.07153 years^-1 and μ = 0.5 h^-1; the right part is the same as in the transformer model. The equivalent values are then

$$\mu_{el} = 2\cdot0.5 = 1\ \text{h}^{-1},\qquad \lambda_{el} = \frac{1}{2\cdot0.5\cdot 8760/0.07153} = 8.1655\cdot10^{-6}\ \text{h}^{-1};$$

$$\lambda_e = 8.1655\cdot10^{-6} + 1.1598\cdot10^{-6} = 9.3253\cdot10^{-6}\ \text{h}^{-1} = 0.08169\ \text{years}^{-1};$$

$$\mu_e = 9.3253\cdot10^{-6} \Big/ \frac{8.1655\cdot10^{-6} + 1.1598\cdot10^{-6}}{0.5} = 0.5\ \text{h}^{-1} = 4380\ \text{years}^{-1}.$$

Consequently,

$$A_{sw} = \frac{\mu_e}{\lambda_e + \mu_e} = \frac{4380}{0.08169 + 4380} = 0.999981.$$

From [3], in the worst case the hardware model of busbar protection gives A_hw = 0.999999884; i.e., here too the hardware contributes much less to the unavailability of protection than the software. The total availability is A_Σ = A_hw·A_sw = 0.9999809, and the mean time between errors is t_E = A_Σ/(μ_e(1 - A_Σ)) = 12.2 years, or 106872 hours.
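The equivalent-rate calculation for both modules can be reproduced with the Python sketch below. It applies the parallel relation μ_e = Σμ_i, λ_e = μ_e/Σ(μ_i/λ_i), which for two identical blocks returns the block's own failure rate (the values 5.2751·10^-6, 8.1655·10^-6 and 1.1598·10^-6 h^-1 above), adds the branch rates for the series connection, and, as in the text, takes the equivalent recovery rate as 0.5 h^-1 when computing A_sw. The λ inputs come from Table 5 and A_hw from [3]; the function names are illustrative.

```python
HOURS_PER_YEAR = 8760
MU_BLOCK = 0.5        # recovery rate of each program block, 1/h (t_r = 2 h)
MU_EQUIV = 0.5        # equivalent recovery rate used in the text for A_sw, 1/h


def parallel(lams, mus):
    """Parallel relation: mu_e = sum(mu_i), lambda_e = mu_e / sum(mu_i / lambda_i)."""
    mu_e = sum(mus)
    lam_e = mu_e / sum(mu / lam for lam, mu in zip(lams, mus))
    return lam_e, mu_e


def module_software_availability(lam_module_per_year, lam_logic_per_year, a_hw):
    """Fig. 4: two parallel protection blocks in series with two parallel
    flexible-logic blocks. Returns (lambda_e in 1/year, A_sw, A_hw * A_sw)."""
    lam_m = lam_module_per_year / HOURS_PER_YEAR
    lam_f = lam_logic_per_year / HOURS_PER_YEAR
    lam_left, _ = parallel([lam_m, lam_m], [MU_BLOCK, MU_BLOCK])
    lam_right, _ = parallel([lam_f, lam_f], [MU_BLOCK, MU_BLOCK])
    lam_e = lam_left + lam_right                 # series: failure rates add
    a_sw = MU_EQUIV / (lam_e + MU_EQUIV)
    return lam_e * HOURS_PER_YEAR, a_sw, a_hw * a_sw


for name, lam_mod, a_hw in (("transformer", 0.04621, 0.9999999764),
                            ("busbar", 0.07153, 0.999999884)):
    lam_e, a_sw, a_total = module_software_availability(lam_mod, 0.01016, a_hw)
    print(f"{name}: lambda_e = {lam_e:.5f} 1/year, A_sw = {a_sw:.6f}, "
          f"A_total = {a_total:.6f}")
```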

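To estimate the contribution of software to the total unavailability of protection, we can use the expression

$$\delta_{sw} = \frac{n_{sw}}{n_{sw} + n_{hw}}\cdot100\% = \frac{\lambda_{sw}}{\lambda_{sw} + \lambda_{hw}}\cdot100\% = \frac{t_E}{t_{sw}}\cdot100\%. \tag{16}$$

Here n is the number of failures, λ is the failure rate, t_E is the mean time to failure of the protection as a whole, t_sw is the mean time to software failure, and the subscripts sw and hw refer to the software and hardware parts.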

When discussing the results of this work, it should be understood that, despite their outward resemblance to a quantitative assessment, they are only qualitative indicators of software availability. On the other hand, the results do not take into account software test control, which improves the indicators studied. Unfortunately, as noted in [13], there are no reliable methods for quantitative software evaluation other than statistics collected over a significant period of program operation. Nevertheless, the results show that software errors can have a significant influence on the reliability indicators of microprocessor relay protection.

VI Other approaches to assessing contributions to the availability of relay protections

Let us try to estimate the impact of software on RPA functioning from the following statistics. According to [14], the number of microprocessor-based RPA devices in operation was 274062 in 2013 and 319912 in 2014 (Table 4 of [14]). From the data of [15], "Distribution of cases of RPA device malfunction by types of technical reasons and RPA device types for the period from 01.01.2020 to 30.06.2020", we know that out of 727 cases of RPA failure, 18 cases are related to software failure or malfunction. The forecast number of RPA devices for 2020 relative to 2013 is then obtained from the formula d_n = d_1(1 + r)^n with r = (d_2 - d_1)/d_1, where d_1 is the number of devices in the first year, d_n is the number of devices in year n, and r is the average annual growth rate of the number of devices:

$$d_7 = 274062\left(1 + \frac{319912 - 274062}{274062}\right)^{6} = 693328\ \text{devices}.$$

Let us take Rosseti's share of RPA devices as 70% of all devices in [15]. Then a rough estimate of the software failure rate is

$$\lambda = \frac{2n}{0.7\,N\,t} = \frac{2\cdot18}{0.7\cdot693328} = 7.42\cdot10^{-5}\ \text{years}^{-1}.$$

Here n is the number of devices that failed due to software in half a year (hence the factor 2 in the numerator), 0.7·N is the number of all microprocessor protections, and t is the design period (one year). For the recovery time t_r = 2 hours,

$$A_{sw} = \frac{\mu}{\lambda + \mu} = \frac{4380}{7.42\cdot10^{-5} + 4380} = 0.999999983,$$

and the mean time between errors is t = 1/λ = 13477 years, which is clearly unrealistic. From the relation

$$\delta_{sw} = \frac{n_{SW}}{n_{SW} + n_{HW}}\cdot100\% = \frac{18}{727}\cdot100\% \approx 2.5\%$$

we note that the share of failures caused by program errors was 2.5%.

One more approach is based on the data of [16], where, for a small sample, the share of failures due to software errors in the total number of failures can be estimated as (3+1+3+4)/(11+15+18+17)·100% = 11/61·100% ≈ 18%; the numerator counts failures due to software and the denominator counts all failures. Of course, the small sample does not allow us to judge the representativeness of these figures with confidence; nevertheless, it gives some idea of the ratio.
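The statistics-based estimates of this section can be recomputed with the Python sketch below. The 70% Rosseti share and the exponent 6 follow the text, and the variable names are illustrative. Small differences in the last digits (for example, about 13481 rather than 13477 years) arise because the text rounds λ to 7.42·10^-5 before inverting it.

```python
HOURS_PER_YEAR = 8760

d_2013, d_2014 = 274_062, 319_912                 # devices in operation [14]
r = (d_2014 - d_2013) / d_2013                     # average annual growth rate
d_7 = d_2013 * (1 + r) ** 6                        # forecast used in the text

n_sw_half_year = 18                                # software-related cases [15]
n_all_half_year = 727                              # all RPA failure cases [15]
rosseti_share = 0.7

lam = 2 * n_sw_half_year / (rosseti_share * d_7)   # failures per device-year
mu = 0.5 * HOURS_PER_YEAR                          # 1/year for t_r = 2 h
a_sw = mu / (lam + mu)
mean_time_years = 1 / lam

share_stats = n_sw_half_year / n_all_half_year * 100            # data of [15]
share_sample = (3 + 1 + 3 + 4) / (11 + 15 + 18 + 17) * 100      # data of [16]

print(f"d_7 = {d_7:.0f} devices, lambda = {lam:.3e} 1/year")
print(f"A_sw = {a_sw:.9f}, mean time = {mean_time_years:.0f} years")
print(f"software share: {share_stats:.1f} % [15], {share_sample:.0f} % [16]")
```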

VII Conclusion

The approach based on formulas (1) and (2) has a rather large uncertainty range, depending on the chosen number of errors per 100 thousand lines of code and the program complexity coefficient. Its result can be considered an upper bound of A_sw under the chosen conditions. An estimate of the contribution of A_sw showed that software unavailability was 1.3% of the total unavailability.

The Jelinsky-Moranda reliability model can be considered to give a lower bound for A_sw, since its initial conditions are more restrictive.

Calculations of software availability with the Jelinsky-Moranda reliability model showed that the unavailability of the protections considered is determined mainly by software unavailability, whose share was 99.8% for transformer protection and 49.8% for busbar protection. Nevertheless, even in this case, the mean time to failure of the protection as a whole exceeds 150 thousand hours for transformer protection and 100 thousand hours for 35 kV busbar protection.

In contrast, the statistical data [14, 15] show that the share of failures due to software is about 2.5%, and the data of [16] give 18%.

The work was carried out within the framework of the theme "Models and methods of adaptation of power systems in modern conditions".

References

1. Morozov Yu.M. Reliability of hardware and software systems. St. Petersburg, 2011. 136 p. (In Russian).

2. Uspensky M.I. Contribution of Hardware, Software, and Traffic to the WAMS Communication Network Availability // Reliability: Theory & Applications, Vol. 15, No 3, 2020, pp. 70-83. DOI: https://doi.org/10.24411/1932-2321-2020-13007

3. Uspensky M.I. Reliability Assessment of the Digital Relay Protection System // Reliability: Theory & Applications Vol. 14, No 3. 2019, pp. 10-17. DOI: https://doi.org/10.24411/1932-2321-2019-13001.


4. Shklyar V.N. Reliability of control systems. Tomsk, Russia: Publishing house of Tomsk Polytechnic University, 2009. 126 p. (In Russian).

5. Livshits, Yu. E. Programmable Logic Controllers for Process Control / Minsk: BNTU, 2014, Ch. 1, 206 p. (In Russian).

6. Baranov S.P., Domaratsky A.N., Lastochkin N.K., Morozov V.P. Defects prevention during software products creation // Software Products, #1, 2000, pp. 59-63. (In Russian).

7. Borovikov S.M., Dik S.S., Fomenko N.K. A method for predicting applied software tools at the early stages of their development // Reports of the Belarusian State University of Informatics and Radioelectronics. 2019, #5, pp. 45-51. (In Russian).

8. Chukanov V.O., Gurov V.V., Prokopyeva E.V. Methods of ensuring software and hardware reliability for computing systems // Russia, Presentation of the report at the seminar, pp. 1-44. Available: http://www.mcst.ru/files/5357 ec/dd0cd8/50af39/000000/seminar_metody_ obespecheni-ya_apparatno-programmnoy_nadezhnosti_vychislitelnyh_sistem.pdf (In Russian). (accessed 12.03.2019)

9. Bubnov V. P., Safonov V. I., Shardakov K. S. Review of existing models of nonstationary service systems and methods of their calculation // Control, Communication and Security Systems. 2020, # 3, pp. 65-121. DOI: 10.24411/2410-9916-2020-10303. (In Russian).

10. Vasilenko N.V., Makarov V.A. Software Reliability Assessment Models // Bulletin of Novgorod State University, 2004, # 28, pp. 126-132. (In Russian).

11. Iyudu K.A. Reliability and diagnostics of computing machines and systems: Textbook on special "Computing machines, complexes, systems and networks" / M.:Vyssh. shk. 1989, 216 p. (In Russian).

12. Shalin A.I. Reliability and diagnostics of relay protection of power systems. Novosibirsk: Publishing house of NSTU, 2002, 384 p. (In Russian).

13. Littlewood B., Strigini L. "Validation of ultra-high dependability..." - 20 years on // BL-LS-SCSS newsletter2011_02_v04distrib.pdf, 5 p. Available: http://www.staff.city.ac.uk

14. Concept for the Development of Relay Protection and Automation in the Electric Grid Sector // Appendix # 1 to Rosseti's Management Board Protocol # 356pr dated June 22, 2015. M.,2015, 49 p. Available: https://mig-energo.ru>wp-content/uploads/2015/12/rza-fsk.pdf. (In Russian).

15. Distribution of malfunctions of RPA devices by types of technical reasons and types of RPA devices for the period from 01.01.2020 to 30.06.2020 // Available: https://www.so-ups.ru/fileadmin/files/company/rza/rza_rez_info/rza_rez_vid_teh_1-2k2020.xls. (In Russian).

16. Zakharov O. G. Reliability of Digital Relay Protection Devices. Indicators. Requirements. Estimates. Moscow: Infra-engineering, 2018, 128 p. (In Russian).
