METHODS FOR INCREASING DATA RELIABILITY BASED ON FUZZY INFERENCE IN A SYSTEM FOR MONITORING PRODUCTION AND TECHNOLOGICAL INDICATORS
Zhumanov Israil Ibrahimovich, Doctor of Technical Sciences, Professor, Department of Information Technology, Samarkand State University, Samarkand, Uzbekistan, E-mail: olimjondi@mail.ru
Holmonov Sunatillo Mahmudovich, Assistant, Department of Information Technology, Samarkand State University, Samarkand, Uzbekistan, E-mail: s-xolmonov@mail.ru
Temerbekova Barnohon Maratovna, senior researcher, Department of "Automation of Production Processes", Tashkent State University, Tashkent, Uzbekistan, E-mail: doctor_temerbekova2016@mail.ru
The problem of increasing the reliability of information about non-stationary objects during transmission and processing is formulated. It is addressed with data mining technologies (DMT) that improve and develop the existing methodology by using the properties and characteristics of fuzzy sets, fuzzy logic and neural network models. The efficiency of methods and algorithms that use statistical and dynamic properties, hidden patterns, relationships and other specific characteristics of information sources is investigated. The structure of the research and practical development is presented as the solution of tasks of preliminary data processing, identification, approximation and control of information reliability, based on the synthesis of fuzzy inference algorithms and computing schemes of neuro-fuzzy networks. A problem-oriented complex of built-in services, databases and knowledge bases is designed to ensure the reliability, safety and integrity of information. An easily interpretable unified framework is implemented, which ensures the adaptability and high performance of DMT. Results of the implementation of the program complexes and of the study of their efficiency are given.
Key words: intelligent technology, software and algorithmic complex, data reliability, non-stationary object, structural and parametric identification, fuzzy inference, parameter tuning.
МЕТОДЫ ПОВЫШЕНИЯ ДОСТОВЕРНОСТИ ДАННЫХ НА ОСНОВЕ НЕЧЕТКИХ ВЫВОДОВ В СИСТЕМЕ МОНИТОРИНГА ПРОИЗВОДСТВЕННО-ТЕХНОЛОГИЧЕСКИХ ПОКАЗАТЕЛЕЙ Жуманов Исраил Ибрагимович, доктор технических наук, профессор кафедры информационных технологий Самаркандского Государственного Университета, г.Самарканд, Узбекистан, (E-mail: olimjondi@mail.ru) Холмонов Сунатилло Махмудович, ассистент кафедры информационных технологий Самаркандского Государственного Университета,
г. Самарканд, Узбекистан, (E-mail: s-xolmonov@mail.ru)
Темербекова Барнохон Маратовна, старший научный сотрудник- исследователь кафедры «Автоматизация производственных процессов» Ташкентского Государственного Университета, г. Ташкент, Узбекистан (E-mail: doctor_temerbekova2016@mail.ru)
Сформулирована проблема повышения достоверности передачи и обработки информации нестационарных объектов за счет технологий интеллектуального анализа данных (ИАД), направленных на совершенствование и развитие сложившейся методологии на основе использования свойств и особенностей моделей нечетких множеств, логики и нейронных сетей. Исследована эффективность методов и алгоритмов, использующих статистические, динамические свойства, скрытые закономерности, взаимосвязи и другие специфические характеристики источников информации. Представлена структура исследований и практических разработок в виде решения задач предварительной обработки данных, идентификации, аппроксимации и контроля достоверности информации на основе синтеза алгоритмов нечетких выводов и вычислительных схем нейро-нечетких сетей. Спроектирован проблемно-ориентированный комплекс встроенных сервисов, баз данных и знаний для обеспечения достоверности, сохранности и целостности информации. Реализован легко интерпретируемый унифицированный фреймворк, обеспечивающий адаптируемость и высокую производительность технологий ИАД. Приведены результаты реализации программных комплексов и исследования их эффективности.
Ключевые слова: интеллектуальные технологии, программно-алгоритмический комплекс, достоверность данных, нестационарный объект, идентификация, нечеткий вывод, настройка параметров.
Relevance of the subject. The continuous growth of the amount of information, the limited availability of primary data, the complexity of its structure, uncertainty and the non-stationarity of parameters require the development of data mining (DM) methods and algorithms based on the conceptual principles of finding and using new, previously unknown knowledge, hidden properties, regularities, interrelations and features of the random time processes that represent non-stationary objects in systems for monitoring the technical, economic, social and technological indicators of production and technological complexes [1].
Traditional approaches to building systems for monitoring non-stationary objects, and the data processing algorithms they use, apply artificial intelligence models based on statistical methods, which cannot fully account for the conditions that negatively affect the accuracy of calculations. In addition, systems based on DM methods are able to learn and adapt when undesirable conclusions are obtained only with an expanded history of results. Besides, they convert primary data into knowledge for optimization by searching for correlations and by using characteristic tendencies, interrelations and regularities in algorithms of statistical recognition, clustering, classification, and regression and correlation analysis [2].
Research into improving and developing the existing methodology, which is based on effective DM methods and algorithms that use the properties and features of models of fuzzy sets, fuzzy logic, neural networks (NN) and neuro-fuzzy networks (NFN) to supplement traditional statistical approaches, is of theoretical and practical interest. The efficiency of methods and algorithms that use statistical and dynamic properties, hidden regularities, interrelations and other specific characteristics of information sources is becoming increasingly evident [3, 4].
At the same time, DM methods acquire unique properties and capabilities that provide tools for adequate identification and optimization of the processing of data on random time processes (ATP) when incomplete, heterogeneous, partially specified information is transmitted under large parametric uncertainty of non-stationary objects.
In this regard, the following are set as the key questions in the research and development of methods for increasing the reliability of data transmission and processing in systems for monitoring non-stationary objects on the basis of DM technologies:
- development of effective data analysis methods for creating a software and algorithmic complex that increases the reliability of information transmission and processing in interactive and integrated environments;
- development of methods and various applications, including tools for specific applied tasks, that use the mathematical apparatus of soft computing to solve complex applied problems of data analysis, data processing and ensuring the reliability of information;
- design of new application areas of problem-oriented DM technologies in the form of various built-in services and program complexes that ensure the reliability, safety and integrity of information in databases (DB) and knowledge bases (KB);
- implementation of the software and algorithmic complex for increasing the reliability of data transmission and processing as an easily interpreted, unified framework that provides completeness, safety and reliability of information, and the adaptability and high performance of DM systems.
Basic functions and tasks of the software and algorithmic complex for increasing the reliability of data transmission and processing. The basic functions of the software and algorithmic complex for increasing the reliability of information (PACIRI) are the following:
- search for correlations, tendencies and regularities in the data to be placed in the DB and KB in the form of a set of rules for operational and intelligent data processing;
- extraction and use of useful properties, specific characteristics and regularities of data distribution for clustering, classification and the formation of training, control and test sets;
- preliminary data processing based on partitioning the feature space of an object and extracting the contours of an object, identification, approximation and regularization of model parameters to optimize decision making on the reliability of information.
The DM methods and algorithms used in creating PACIRI are based on the results of solving the following tasks:
- identification and selection of informative features, cluster analysis and optimization of the structure of a multicomponent non-stationary object;
- recognition, classification, identification and approximation of ATP;
- finding adequate descriptions of the contours of objects and of non-linear "input-output" dependences;
- formation of specific characteristics, useful properties, regularities of data distribution, knowledge, reference descriptions, images and rules;
- control and assessment of the reliability of ATP processing.
Mathematical model of the functioning of PACIRI on the basis of fuzzy sets and fuzzy logic. Let the set of input and output data of the complex be presented in the form:
$$(X_r, y_r), \quad r = \overline{1, M}, \tag{1}$$
where $X_r = (x_{r1}, x_{r2}, \ldots, x_{rn})$ is the measured input vector and $Y = (y_1, y_2, \ldots, y_M)$ is the output vector.
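The fuzzy rules of logical inference are set in the form [5]
$$\bigcup_{i=1}^{m}\left(\bigcap_{j=1}^{n} x_j = \tilde{x}_{j,i}\right) \rightarrow y_i = b_{i0} + b_{i1}x_1 + \ldots + b_{in}x_n, \tag{2}$$
where $\tilde{x}_{j,i}$ is the fuzzy term of the input $x_j$ in the $i$-th rule and $B = (b_{ij})$, $i = \overline{1, m}$, $j = \overline{0, n}$.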
The requirement of ensuring the reliability of information consists in minimizing the following functional:
$$E = \sum_{r=1}^{M} \left(y_r - y_r^f\right)^2 \rightarrow \min, \tag{3}$$
where $y_r^f$ is the result of fuzzy inference with the parameters $B$ for the $r$-th row of the input data matrix.
The input vector $X_r$ corresponds to the following result:
$$y_r^f = \sum_{i=1}^{m} \mu_{d_i}(X_r)\, d_i \Big/ \sum_{i=1}^{m} \mu_{d_i}(X_r), \tag{4}$$
where $d_i = b_{i0} + b_{i1}x_{r1} + b_{i2}x_{r2} + \ldots + b_{in}x_{rn}$ is the output of the fuzzy inference algorithm for the $i$-th rule;
$\mu_{d_i}(X_r)$ is the membership function (FP) corresponding to each input variable, which is defined by the expression
$$\mu_{d_i}(X_r) = A_1(x_{r1}) \cdot A_1(x_{r2}) \cdot A_1(x_{r3}) \cdots A_1(x_{rn}) \vee A_2(x_{r1}) \cdot A_2(x_{r2}) \cdot A_2(x_{r3}) \cdots A_2(x_{rn}) \vee \ldots \vee A_m(x_{r1}) \cdot A_m(x_{r2}) \cdot A_m(x_{r3}) \cdots A_m(x_{rn}). \tag{5}$$
Using formula (4), the expression for assessing the reliability of information control by the fuzzy inference algorithm is written in the form:
$$y_r^f = \sum_{i=1}^{m} \beta_{ri} d_i = \sum_{i=1}^{m}\left(\beta_{ri} b_{i0} + \beta_{ri} b_{i1} x_{r1} + \beta_{ri} b_{i2} x_{r2} + \ldots + \beta_{ri} b_{in} x_{rn}\right), \tag{6}$$
where
$$\beta_{ri} = \frac{\mu_{d_i}(X_r)}{\sum_{k=1}^{m} \mu_{d_k}(X_r)}. \tag{7}$$
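Formulas (4), (6) and (7) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `sugeno_output`, the argument layout and the numbers in the usage lines are hypothetical, and the rule firing degrees are taken as given rather than computed from (5).

```python
def sugeno_output(firing, coeffs, x):
    """Weighted-average output y_r^f of formulas (4), (6) and (7).

    firing : firing degrees mu_di(X_r), one per rule (assumed precomputed)
    coeffs : rows (b_i0, b_i1, ..., b_in), one row per rule
    x      : input vector (x_r1, ..., x_rn)
    """
    total = sum(firing)
    if total == 0:
        raise ValueError("no rule fired for this input")
    # beta_ri: relative firing degree of rule i, formula (7)
    betas = [mu / total for mu in firing]
    # d_i: linear conclusion of rule i
    outputs = [b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)) for b in coeffs]
    # formula (6): sum of beta_ri * d_i
    return sum(beta * d for beta, d in zip(betas, outputs))


# Hypothetical example: two rules with constant conclusions 1 and 3, two inputs
y = sugeno_output([0.2, 0.6], [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]], [5.0, 7.0])
```

With these illustrative numbers the relative degrees are 0.25 and 0.75, so the output is pulled toward the conclusion of the more strongly fired rule.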
Further, we introduce the following notation:
$$Y^f = (y_1^f, y_2^f, \ldots, y_M^f)^T, \quad Y = (y_1, y_2, \ldots, y_M)^T.$$
Then the functional (3) is written in matrix form as the minimization condition:
$$E = (Y - Y^f)^T (Y - Y^f) \rightarrow \min. \tag{8}$$
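Because (6) is linear in the coefficients $b_{ij}$, condition (8) can be read as an ordinary linear least-squares problem. The derivation below is a sketch added for illustration; the matrix $A$ and the stacked ordering of $B$ are notation assumed here, not taken from the source.

```latex
% Row r of A collects the regressors of formula (6); A has M rows and m(n+1) columns
a_r = \left(\beta_{r1}, \ldots, \beta_{rm},\;
            \beta_{r1}x_{r1}, \ldots, \beta_{rm}x_{r1},\; \ldots,\;
            \beta_{r1}x_{rn}, \ldots, \beta_{rm}x_{rn}\right),
\qquad
B = \left(b_{10}, \ldots, b_{m0},\; b_{11}, \ldots, b_{m1},\; \ldots,\;
          b_{1n}, \ldots, b_{mn}\right)^T

% Matrix form of (6) and of the functional (8)
Y^f = A B, \qquad E = (Y - AB)^T (Y - AB)

% Setting the gradient to zero gives the normal equations
\frac{\partial E}{\partial B} = -2 A^T (Y - AB) = 0
\;\Longrightarrow\;
A^T A\, B = A^T Y
\;\Longrightarrow\;
B = (A^T A)^{-1} A^T Y \quad \text{if } A^T A \text{ is invertible}
```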
Design of the algorithms of PACIRI on the basis of fuzzy logic. The algorithms of the complex are built by synthesizing fuzzy logical inference algorithms with NN, which makes it possible to design effective tools for the identification and approximation of ATP and of non-linear "input-output" dependences on the basis of linguistic statements of the "IF-THEN" type and operations on fuzzy sets [4]. The computing schemes of the NN implement the following structural elements:
- a fuzzifier, which transforms the fixed vector of input influencing factors into a vector $\tilde{X}$ of fuzzy sets;
- a fuzzy KB, which contains the linguistic rules that define the "input-output" dependences $Y = f(X)$;
- a fuzzy inference engine, which applies the fuzzy rules of the KB to determine the value of the output variable corresponding to the fuzzy values of the input variables $X$ and output variables $Y$ from the membership functions (FP) of the linguistic variables;
- a defuzzifier, which transforms the output value from a fuzzy set $\tilde{Y}$ into a crisp number $Y$.
The system is built on the Mamdani and Sugeno algorithms, which differ in the format of the KB and in the defuzzification procedure [6].
The relationships between the inputs $X = (x_1, x_2, \ldots, x_n)$ and the output $y$ in the Mamdani fuzzy inference algorithm are defined in the following form:
IF ($x_1 = a_{1,j1}$) AND ($x_2 = a_{2,j1}$) AND ... AND ($x_n = a_{n,j1}$)
OR ($x_1 = a_{1,j2}$) AND ($x_2 = a_{2,j2}$) AND ... AND ($x_n = a_{n,j2}$)
OR ...
OR ($x_1 = a_{1,jk_j}$) AND ($x_2 = a_{2,jk_j}$) AND ... AND ($x_n = a_{n,jk_j}$)
THEN $y = d_j$, $j = \overline{1, m}$,
where $a_{i,jp}$ is the linguistic term that evaluates the variable $x_i$ in the row with number $p$ ($p = \overline{1, k_j}$); $k_j$ is the number of conjunction rows in which the output $y$ is evaluated by the linguistic term $d_j$; $m$ is the number of terms of the linguistic variable $y$.
Using the operations $\cup$ (OR) and $\cap$ (AND), the KB is written in the compact form
$$\bigcup_{p=1}^{k_j}\left(\bigcap_{i=1}^{n} x_i = a_{i,jp}\right) \rightarrow y = d_j, \quad j = \overline{1, m}. \tag{9}$$
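In the complex, the tunable parameters of the fuzzy model are the membership functions (FP) $\mu_{jp}(x_i)$ of the input linguistic terms corresponding to the input variable $x_i$. The membership of an input variable in a fuzzy term is given by
$$a_{i,jp} = \int_{\underline{x}_i}^{\overline{x}_i} \mu_{jp}(x_i) / x_i, \quad x_i \in [\underline{x}_i, \overline{x}_i], \tag{10}$$
where $\mu_{jp}(x_i)$ is the FP of the input linguistic terms. Here the integral sign denotes the union of the pairs $\mu_{jp}(x_i) / x_i$ over the interval, not integration.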
The membership of the output variable in a fuzzy term $d_j$, $j = \overline{1, m}$, is given by
$$d_j = \int_{\underline{y}}^{\overline{y}} \mu_{d_j}(y) / y, \quad y \in [\underline{y}, \overline{y}], \tag{11}$$
where $\mu_{d_j}(y)$ is the membership function (FP) corresponding to the output variable $y$.
The degree of membership of the input vector $X^* = (x_1^*, x_2^*, \ldots, x_n^*)$ in the fuzzy terms $d_j$ is given by the fuzzy logic equations
$$\mu_{d_j}(X^*) = \bigvee_{p=\overline{1,k_j}} \bigwedge_{i=\overline{1,n}} \left[\mu_{jp}(x_i^*)\right], \quad j = \overline{1, m}, \tag{12}$$
where $\vee$ ($\wedge$) is an s-norm (t-norm) operation from the set of implementations of the logical operation OR (AND).
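A minimal sketch of equation (12) follows, assuming the common choice of maximum as the s-norm and minimum as the t-norm; other norm pairs, such as probabilistic sum and product, fit the same structure. The function name `firing_degree` and the membership values are hypothetical.

```python
def firing_degree(memberships):
    """Formula (12) with max as the s-norm and min as the t-norm.

    memberships[p][i] is mu_jp(x_i*): the membership of input i
    in the term used by conjunction row p of the rules for d_j.
    """
    return max(min(row) for row in memberships)


# Hypothetical membership values: k_j = 2 conjunction rows, n = 2 inputs
mu_dj = firing_degree([[0.7, 0.4], [0.5, 0.9]])
```

Each row is limited by its weakest input (0.4 and 0.5 here), and the term $d_j$ takes the strongest of the rows.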
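The degree of membership of $\tilde{y}$ in the fuzzy terms $d_j$ is given by the fuzzy logic equation
$$\tilde{y} = \underset{j=\overline{1,m}}{\mathrm{agg}} \left( \int_{\underline{y}}^{\overline{y}} \mathrm{imp}\left(\mu_{d_j}(X^*), \mu_{d_j}(y)\right) / y \right), \tag{13}$$
where imp is the implication, implemented as the operation of finding the minimum; agg is the aggregation of fuzzy sets, implemented as the operation of finding the maximum; $i = \overline{1, n}$, $j = \overline{1, m}$, $p = \overline{1, k_j}$.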
As a result of defuzzification of the fuzzy variable $\tilde{y}$, the crisp output value $y$ corresponding to the input vector $X^*$ is determined by the center of gravity method:
$$y = \int_{\underline{y}}^{\overline{y}} y \cdot \mu_{\tilde{y}}(y)\,dy \Big/ \int_{\underline{y}}^{\overline{y}} \mu_{\tilde{y}}(y)\,dy. \tag{14}$$
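The Mamdani steps (13) and (14) can be sketched numerically as follows. This is an illustration, not the authors' implementation: the triangular shape of the output terms, the replacement of the integrals by sums over a uniform grid, and all names (`tri`, `mamdani_centroid`, `low`, `high`) are assumptions made here.

```python
def tri(y, a, b, c):
    """Triangular membership function: feet at a and c, peak at b (assumed shape)."""
    if y <= a or y >= c:
        return 0.0
    if y <= b:
        return (y - a) / (b - a)
    return (c - y) / (c - b)


def mamdani_centroid(firing, output_terms, y_min, y_max, steps=1000):
    """Discrete approximation of formulas (13) and (14).

    firing[j]       : degree mu_dj(X*) obtained from formula (12)
    output_terms[j] : callable returning mu_dj(y)
    """
    num = 0.0
    den = 0.0
    for k in range(steps + 1):
        y = y_min + (y_max - y_min) * k / steps
        # imp = min clips each output term; agg = max merges the clipped terms
        mu = max(min(f, term(y)) for f, term in zip(firing, output_terms))
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("aggregated fuzzy set is empty")
    return num / den


# Hypothetical output terms on the interval [0, 10]
def low(y):
    return tri(y, 0.0, 2.0, 4.0)


def high(y):
    return tri(y, 6.0, 8.0, 10.0)


y_equal = mamdani_centroid([0.5, 0.5], [low, high], 0.0, 10.0)  # both terms fire equally
y_low = mamdani_centroid([1.0, 0.0], [low, high], 0.0, 10.0)    # only "low" fires
```

When both terms fire equally the aggregated set is symmetric and the center of gravity lies midway between them; when only "low" fires it coincides with the peak of that term.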
Unlike the Mamdani fuzzy inference algorithm, in Sugeno's algorithm the relationships between the inputs $X = (x_1, x_2, \ldots, x_n)$ and the output $y$ are set in the form
$$\bigcup_{p=1}^{k_j}\left(\bigcap_{i=1}^{n} x_i = a_{i,jp}\right) \rightarrow y = b_{j,0} + b_{j,1} x_1 + b_{j,2} x_2 + \ldots + b_{j,n} x_n, \quad j = \overline{1, m}. \tag{15}$$
The conclusions of the rules in Sugeno's algorithm are set as a linear function of the inputs:
$$d_j = b_{j,0} + \sum_{i=1}^{n} b_{j,i} x_i.$$
As a result of defuzzification, the output value $y$ of the Sugeno fuzzy inference algorithm is defined as a superposition of the linear dependences evaluated at the given point $X^*$ of the $n$-dimensional factor space, computed as a weighted average.
Implementation of the DMT methods and algorithms. PACIRI includes program modules based on the Mamdani and Sugeno fuzzy inference algorithms, algorithms for training multilayer NN with forward and backward propagation of errors, and the fuzzy rules of the KB. The following features of fuzzy models are used:
- the terms $d_j$ of the output variable in Mamdani's model are set by singletons, the fuzzy analogs of crisp numbers;
- the membership degree of one element of the universal set is equal to one, and that of the remaining elements to zero;
- the conclusions of the rules in the KB of Sugeno's model are set by functions in which the coefficients of the input variables are equal to zero;
- the fuzzy KB includes the types of premises, the properties of the fuzzy sets and the conclusions in the form of a crisp linear function.
PACIRI implements program complexes for the identification and approximation of ATP and for tuning the structure and parameters of fuzzy models in order to optimize the monitoring of information reliability.
An important component of PACIRI is the software for structural and parametric identification.
For this purpose the following vectors are formed: $I$, the vector of membership function (FP) parameters of the terms of the input variables; $O$, the vector of FP parameters of the terms of the output variable; and $B$, the vector of coefficients of the linear functions in the conclusions of the rules.
Parametric identification for Mamdani's model reduces to finding a vector $(I, O)$ such that
$$\frac{1}{M} \sum_{r=1}^{M} \left(y_r - F(I, O, X_r)\right)^2 \rightarrow \min.$$
Parametric identification for Sugeno's model reduces to finding a vector $(I, B)$ such that
$$\frac{1}{M} \sum_{r=1}^{M} \left(y_r - F(I, B, X_r)\right)^2 \rightarrow \min,$$
where $F(\cdot)$ denotes the output of the fuzzy model for the input $X_r$.
Constraints are imposed on the variables $(I, O)$ to ensure the linear ordering of the elements of the term sets. The non-linear optimization function implemented in the Optimization Toolbox package is used as the tool for parametric identification of ATP.
The accepted term sets of the linguistic variables are used when forming the fuzzy KB.
Programs for parametric identification of non-linear dependences on the basis of a hybrid fuzzy model are implemented in the MATLAB environment. The fuzzy model is tuned both in command mode, by means of the anfis function, and in interactive mode, using the anfisedit GUI module.
The model parameters are tuned by finding the FP and their parameters that minimize the discrepancy between the actual and the reference behavior of the output variable.
The program developed for parametric identification on the basis of Mamdani's fuzzy model performs the following functions:
- creation of the m-script that calls the non-linear optimization function constr;
- formation of the goal_fun m-function, which calculates the discrepancy for the given values of the variables.
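The division of labor between the script and the discrepancy function can be sketched in Python. The name `goal_fun` is borrowed from the text, but its signature here is assumed. The optimizer below is a simple derivative-free coordinate search used as a stand-in for the MATLAB function constr, not a reproduction of it, and `linear_model` is a hypothetical placeholder for the fuzzy model $F$.

```python
def goal_fun(params, model, data):
    """Mean squared discrepancy (1/M) * sum_r (y_r - F(params, X_r))^2."""
    return sum((y - model(params, x)) ** 2 for x, y in data) / len(data)


def coordinate_search(params, model, data, step=1.0, tol=1e-6):
    """Derivative-free stand-in for the non-linear optimizer (an assumption)."""
    params = list(params)
    best = goal_fun(params, model, data)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                value = goal_fun(trial, model, data)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # refine the search once no coordinate move helps
    return params, best


def linear_model(params, x):
    """Hypothetical placeholder for the fuzzy model output F(params, x)."""
    return params[0] + params[1] * x


# Synthetic reference data generated from y = 2 + 3x
data = [(x, 2.0 + 3.0 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best_params, best_error = coordinate_search([0.0, 0.0], linear_model, data)
```

Keeping the discrepancy in its own function means the same objective can be handed to any optimizer, including a constrained one that enforces the ordering of the term sets.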
The number of tuned linguistic variables (parameters) is taken equal to sixteen. Of these, eleven are the concentration coefficients of the FP of the terms of the input and output variables; two are the coordinates of the maxima of the FP of the term "average" of the input variables; three are the coordinates of the maxima of the FP of the non-extreme terms of the output variable: "below average", "average" and "above average".
The coordinates of the maxima of the FP of the extreme terms "low" and "high" are not tuned. As a result of testing the software on Mamdani's model, suitable FP and a rational number of fuzzy terms were found.
For Mamdani's model, the mean squared error of parametric identification of ATP on a control sample of 1000 points, in terms of the conditional parameter, is 4.61.
The second part of the experimental studies is devoted to structural identification in interactive mode in the environment of the Fuzzy Logic Toolbox package, which contains a set of GUI modules. The main GUI module is called with the command fuzzy.
Identification on the basis of Sugeno's model is performed with the grid partition and subtractive clustering algorithms. The first algorithm creates a KB containing all possible fuzzy rules. The second algorithm generates the fuzzy rules corresponding to the areas of the greatest concentration of points. Each input linguistic variable is evaluated by a bell-shaped FP
$$\mu(x) = \frac{1}{1 + \left|\dfrac{x - h}{c}\right|^{2v}}, \tag{16}$$
where $c$ is the concentration coefficient of the FP; $v$ is the steepness coefficient of the FP; $h$ is the coordinate of the FP maximum.
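A small sketch of the bell-shaped function follows. It assumes that (16) is the three-parameter generalized bell function with exponent $2v$; the parameter names follow the description after (16), while the function name `bell_mf` and the numeric values are hypothetical.

```python
def bell_mf(x, c, v, h):
    """Bell-shaped membership function, read here as the generalized bell.

    c : concentration coefficient (must be non-zero)
    v : steepness coefficient
    h : coordinate of the maximum
    """
    return 1.0 / (1.0 + abs((x - h) / c) ** (2.0 * v))


# Hypothetical parameter values
peak = bell_mf(5.0, c=2.0, v=1.0, h=5.0)       # at x = h
shoulder = bell_mf(7.0, c=2.0, v=1.0, h=5.0)   # at x = h + c
```

Under this reading the function equals one at $x = h$ and one half at a distance $c$ from the maximum, which is why $c$ controls the width and $v$ the slope of the sides.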
The 24 tunable parameters are divided according to the following principle:
- 3 parameters are determined for each of the 4 rules of the KB;
- 3 parameters are determined for each of the 4 terms of the input variables.
For Sugeno's model with two inputs and one output this number of tunable parameters is minimal. The mean squared error of structural identification on a control sample of 1000 points is 1.81.
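The parameter count can be checked with a short sketch of grid partition. It rests on assumptions that the text does not state explicitly: that the 4 input terms are two terms for each of the two inputs, that the 3 parameters per rule are the coefficients of a linear conclusion in two inputs, and that the 3 parameters per term are those of a bell-shaped FP. The term names and the function name are hypothetical.

```python
from itertools import product


def grid_partition_rules(terms_per_input):
    """All combinations of one term per input, i.e. every possible rule premise."""
    return list(product(*terms_per_input))


# Assumed layout: two inputs with two terms each (4 input terms in total)
terms = [["low", "high"], ["low", "high"]]
rules = grid_partition_rules(terms)

n_inputs = len(terms)
rule_params = len(rules) * (n_inputs + 1)      # constant plus one coefficient per input
term_params = sum(len(t) for t in terms) * 3   # three parameters per bell-shaped term
total_params = rule_params + term_params
```

Under these assumptions grid partition yields 2 x 2 = 4 rules, and the two groups of 12 parameters add up to the 24 quoted above.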
The efficiency of PACIRI is analyzed from the curves of the dependence of the error on the number of iterations, which are shown in the figure. Here the efficiency of the fuzzy models is compared with the efficiency of algorithms based on statistical identification models that use a polynomial of the 4th order. In the figure, dashed lines show the results for Mamdani's model, dash-dotted lines the results for Sugeno's model and solid lines the results for the polynomial of the 4th order; P is the data processing error and K is the number of iterations of the complex.
Figure. Efficiency of the functioning of the complex.
It was found that, to achieve the required reliability of data transmission and processing, the average number of iterations of PACIRI with Mamdani's model is 10 times greater than with Sugeno's model.
It is interesting to note that for small training sets the efficiency of the complex is significantly higher with Mamdani's fuzzy model, and the complex operates with a smaller number of iterations. At the same time, when the size of the training sample is more than twice the number of tuned linguistic variables, the complex with Sugeno's model becomes more stable. When the size of the training sample is more than three times the number of tuned linguistic variables, the efficiency of the complex practically does not increase. The accuracy of the information obtained with the fuzzy models is up to two orders of magnitude higher than with the polynomial models.
List of references
1. Makarychev P.P., Afonin Yu. Operational and Data Mining: textbook. - Penza: PSU Publishing, 2010. - 156 p.
2. Barseghyan A.A., Kupriyanov M.S., Stepanenko V.V. Data Analysis Technology: Data Mining, Visual Mining, Text Mining, OLAP. - St. Petersburg: BHV-Petersburg, 2007. - 384 p.
3. Zhumanov I.I. Antigenic control system reliability and transmission of time-dependent processes data based on neuro-fuzzy network // Chemical Technology. Kontrol i upravlenie. - TSTU, Tashkent, 2013. - No. 5. - P. 49-56.
4. Zaripova G.I. Increase of information transfer authenticity for non-stationary processes on the basis of neuro fuzzy data processing system // "Applied Technologies and Innovations", Prague Development Center. - Prague, 2013. - Vol. 9. - P. 1-10.
5. Rothstein A.P. Intelligent identification technology: fuzzy logic, genetic algorithms, neural networks. - Vinnitsa: UNIVERSUM, 1999. - 320 p.
6. Mityushkin Y.I., Mokin B.I., Rothstein A.P. Soft Computing: identification of patterns of fuzzy knowledge bases. - Vinnitsa: UNIVERSUM, 2002. - 145 p.