Information-extreme machine learning algorithms are developed for a system of radionuclide myocardial diagnostics, with the dictionary of recognition features optimized by the Kullback information criterion, which makes it possible to improve the accuracy of diagnostic decisions. Sequential directed and swarm (cluster) algorithms for selecting a dictionary of features that contains both quantitative and categorical features are investigated. Decision rules that are error-free on the training matrix are obtained.
Keywords: scintigraphy, Fourier transform, information criterion, machine learning, swarm algorithm
UDC 004:891.032.26:616.127-073.7
doi: 10.15587/1729-4061.2016.71930
DESIGNING ALGORITHMS FOR OPTIMIZATION OF PARAMETERS OF FUNCTIONING OF INTELLIGENT SYSTEM FOR RADIONUCLIDE MYOCARDIAL DIAGNOSTICS
A. Dovbysh
Doctor of technical sciences, Professor, Head of Department*
E-mail: kras@id.sumdu.edu.ua
A. Moskalenko
Postgraduate Student*
E-mail: a.moskalenko@id.sumdu.edu.ua
V. Moskalenko
PhD, Senior Lecturer*
E-mail: systemscoders@gmail.com
I. Shelehov
PhD, Associate Professor*
E-mail: igor-i@ukr.net
*Department of computer science
Sumy State University
Rimsky-Korsakov str., 2, Sumy, Ukraine, 40007
1. Introduction
Radionuclide diagnostics of the myocardium makes it possible to detect perfusion disorders at an early stage of the disease and to assess the severity of the pathological process in patients who have had a myocardial infarction, in order to determine the patient's course of treatment [1]. The result of a radionuclide examination is a set of reconstructed scintigraphic heart sections, which are often represented as a polar map that visualizes the share of radiopharmaceutical uptake in the myocardium segments at rest or under stress. The reliability of interpreting the results of such a functional examination is determined by the skill of the diagnosing physician, because in addition to analyzing the pixel brightness of the scintigram at rest and under stress, it is necessary to take into account contextual characteristics, for example symptoms, harmful habits and chronic illnesses, age and weight categories, race, blood type, gender and others. One way to reduce the load on the diagnosing physician and to increase the accuracy of diagnostic conclusions is to develop and implement learning-capable decision-making support systems (DMSS) that perform computer interpretation of radionuclide examination results.
The original matrix of the polar map image contains 1088 pixels, which complicates recognition of pathological functional states of the myocardium [1-3]. The alphabet of classes that characterizes the functional states of the myocardium generally does not fully cover the associated space of diagnostic recognition features, because a complete analysis of the subject area is difficult both when the dictionary of features is formed and when the alphabet is formed. As a result, for the current alphabet of classes the dictionary of features is overloaded in the information sense and needs optimization. Deleting non-informative and disruptive features can improve the information capacity and reduce the computational complexity of the decision rules of the diagnostic system. However, when training samples are small and the number of features is large, traditional methods of machine learning and feature selection have low efficiency.
2. Analysis of scientific literature and the problem statement
In tasks involving the analysis of medical examination results, artificial neural networks [3, 4], in which the minimum number of training vectors depends on the dimension of the feature space, have become widespread. In practice, however, the training samples of certain recognition classes do not reach even a hundred vectors, while the dictionary of features in radionuclide heart diagnostics runs into the thousands, which makes this approach inefficient. The methods proposed in [4, 5] for compressing the polar map image of the radiopharmaceutical distribution, by segment-by-segment averaging of pixel brightness or by decomposing the image into components with the fast Fourier transform, reduce the feature space to hundreds or dozens of features, but do not yield decision rules that are faultless on the training matrix.
This is due to the intentional coarsening of the results, the loss of informative features and the neglect of contextual data from the patient's examination. Papers [6, 7] consider the use of contextual examination results, in the form of disease history and other clinical information, for predicting myocardial perfusion disorders with the support vector machine (SVM) method. However, the accuracy of the decision rules obtained with different kernels of the SVM classifier did not exceed 81 %, which is associated with the heterogeneous distribution of the sample vectors and the intersection of classes in the feature space. In [5, 8], principal component analysis and the Relief-F algorithm were proposed for reducing a dictionary that contains quantitative features together with categorical features converted by dummy coding. However, the decision rules obtained by traditional algorithms (SVM, J4.8, Bayes Net and Naive Bayes) do not achieve high accuracy, because non-linear structural relations in the image and the presence of categorical features are ignored.
One promising way to improve the reliability of decision rules for recognizing the functional state of the myocardium is to use the ideas and methods of information-extreme intellectual technology (IEI-technology) for the analysis and synthesis of learning-capable diagnostic systems [9]. IEI-technology is based on adapting the input mathematical description of the diagnostic system to the conditions of its functioning by maximizing the information capacity of the system, which makes it possible to substantiate the choice of the method and parameters for compressing the polar map images of radionuclide myocardial examination without loss of diagnostic information.
In this case, coarse binary encoding of the training sample [10, 11] accelerates the search for the optimal geometric parameters of the separating hypersurfaces in the binary sub-paraception feature space, while the information criterion makes it possible to handle small samples (about 40 vectors) and ensures a high generalizing capacity of the decision rules owing to the smoothing effect of the logarithmic function [12]. In addition, in order to increase the reliability and reduce the computational complexity of decision rules, algorithms of sequential feature selection without return, which assess separate features, and genetic algorithms, which assess subsets of features, were investigated within the IEI-technology [13].
However, it was shown that the investigated algorithms tend to get stuck in a local optimum of the multi-extremal information criterion. Moreover, genetic algorithms require a large number of iterations and are sensitive to the settings of their input parameters, which are generally unknown a priori.
The work [13] examined sequential algorithms of feature selection with return, which are simple to implement; however, their high computational complexity limits their practical use in tasks with many features. At the same time, the IEI-technology makes it possible to assess the informativeness of features directly in the process of optimizing the genotype and phenotype parameters of the diagnostic system, which can make the search for the dictionary of features that is optimal in the information sense more efficient.
The paper [14] considered the cluster algorithm of feature selection, which is easier to implement than the genetic one, has fewer control parameters and is able to reach the optimal solution within several iterations. It is therefore necessary to study algorithms of information-extreme machine learning for the radionuclide diagnostics system in which the dictionary of recognition features is optimized by sequential directed and cluster procedures of searching for the global maximum of the information optimization criterion, in order to increase the reliability and efficiency of diagnostic decisions.
3. The purpose and objectives of the study
The aim of this work is to enhance the functional efficiency of a learning-capable DMSS, as a part of a computer system for the functional diagnostics of cardiovascular pathologies, from polar map images of myocardial perfusion and from clinical information.
To achieve this aim, the following tasks are to be solved:
- to determine the optimal parameters for compressing the polar map images of radionuclide examination of the myocardium without loss of diagnostic information;
- to design information-extreme sequential directed and cluster algorithms for selecting a dictionary of features that includes both quantitative and categorical features, and to compare the effectiveness of the designed algorithms;
- to determine the volume of the dictionary of features that is optimal in the information sense, using representative training samples.
4. Algorithms for information-extreme machine training system of the functional diagnostics of the myocardium with the optimization of the dictionary of features
The input data for making a diagnostic conclusion about the functional state of the myocardium can include both quantitative and categorical features. The quantitative features generally comprise the brightness values of the pixels in the polar maps of myocardial perfusion at rest and under load (stress); the categorical (contextual) features comprise symptoms, harmful habits and chronic illnesses, age and weight categories, race, blood type, gender and others.
Within the IEI-technology, the structure of the decision rules was developed on the hypothesis that there exists a basic recognition class $X_B^o \in \{X_m^o\}$, relative to which all other images are regarded as deviations of a certain level and direction. To take into account how often the values of categorical features occur in the basic class, the optimization contour should include an operator of their frequency conversion. In the simplest case, the training matrix is converted by replacing the nominal values of the categorical features with the frequencies of their occurrence in the basic class.
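A minimal sketch of such a frequency conversion is given below in Python. The function name `frequency_encode`, the toy `gender` column and the choice to map values never seen in the basic class to 0.0 are illustrative assumptions and are not taken from the original method description.

```python
from collections import Counter


def frequency_encode(column, is_base):
    """Replace nominal values by their relative frequency in the basic class.

    column  -- nominal values of one categorical feature, one per patient
    is_base -- booleans, True where the patient belongs to the basic class
    """
    base_values = [v for v, b in zip(column, is_base) if b]
    if not base_values:
        raise ValueError("the basic class has no realizations")
    counts = Counter(base_values)
    n_base = len(base_values)
    # Assumption: a value never observed in the basic class gets frequency 0.0.
    return [counts.get(v, 0) / n_base for v in column]


# Toy data: the first four patients form the basic class.
gender = ["m", "f", "m", "m", "f", "f"]
is_base = [True, True, True, True, False, False]
encoded = frequency_encode(gender, is_base)
```

After this conversion the categorical column is numeric and can be passed to the same tolerance-based encoding as the quantitative features.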
The use of coarse adaptive encoding of features, which unifies information of different types through binary representation, makes it possible to take into account the probabilistic characteristics of both quantitative and categorical features [10, 11]. Within the IEI-technology, encoding the features consists in searching for the optimal limits of the control tolerance field, which bound the probable values of a feature in the basic recognition class $X_B^o \in \{X_m^o\}$. In the general case of an L-level tolerance control system (TCS), the corresponding lower $A_{Lower,l,i}$ and upper $A_{Upper,l,i}$ control tolerances of the l-th level are calculated by the formulas:

$$A_{Lower,l,i} = y_{B,i}\left[1 - \dots\right], \tag{1}$$

$$A_{Upper,l,i} = y_{B,i}\left[1 - \dots\right], \tag{2}$$

where $y_{B,i}$ is the averaged value of the i-th feature in the basic class; $\delta_i$ is the parameter of the control tolerance field; $\delta_{max}$ is the maximum value of the control tolerance field. The binary training matrix

$$\{x_{m,i}^{(j)} \mid i = \overline{1,N};\ j = \overline{1,n};\ m = \overline{1,M}\}$$

for the L-level tolerance control system is formed by the rule

$$x_{m,\,L \cdot i - L + l}^{(j)} = \begin{cases} 1, & \text{if } A_{Lower,l,i} \le y_{m,i}^{(j)} \le A_{Upper,l,i}; \\ 0, & \text{else}; \end{cases} \quad l = \overline{1,L}. \tag{3}$$

The control tolerance limits divide the range of possible values of a recognition feature into 2L+1 areas, each of which corresponds to a separate binary code of the i-th feature consisting of L digits. The code distance between the codes of adjacent areas is equal to one code unit, and the code distance between the codes of areas separated by one or more areas is equal to two or more code units. The proposed encoding scheme (3) increases the variety of binary realization vectors and takes into account the direction in which the distribution of realization vectors deviates from the basic class, which corresponds to the most desirable functional state of the examined object.

Iterative optimization of the control tolerances by the sequential algorithm consists in bringing the global maximum of the information optimization criterion closer to its limiting value in the admissible domain of the criterion function and has the following structure of the iterative procedure:

$$\{\delta_i^*\} = \arg \bigotimes_{l=1}^{L} \left\{ \max_{G_{\delta_i}} \left[ \frac{1}{M} \sum_{m=1}^{M} \max_{\{d_m\} \in G_{d_m}} E_m \right] \right\}, \quad i = \overline{1,N}, \tag{4}$$

where $G_{\delta_i}$ is the admissible domain of values of the control tolerance field parameter for the i-th feature; $\otimes$ is the symbol of the repetition operation; L is the number of runs of the iterative procedure of sequential optimization of the control tolerances; $E_m$ is the information criterion of functional efficiency (CFE) of training the diagnostic system to recognize realizations of the class $X_m^o$.

However, the efficiency of sequential algorithms usually depends on the starting point of the search. Therefore, to determine the initial values of the parameter $\delta_i$, which is the input parameter of the sequential optimization algorithm, the IEI-technology commonly uses a parallel procedure that searches for quasi-optimal values of $\delta_i$ lying in the working domain of the information CFE function. The iterative procedure for optimizing the control tolerance field parameter by the parallel algorithm has the form [13]

$$\delta^* = \arg \max_{G_\delta} \left\{ \frac{1}{M} \sum_{m=1}^{M} \max_{G_E \cap \{d\}} E_m \right\}, \tag{5}$$

where $G_\delta$ is the admissible domain of values of the control tolerance field parameter $\delta = \delta_i$, $i = \overline{1,N}$.

In order to increase the reliability of the decision rules and reduce their computational complexity, it is necessary to delete the disruptive features, whose deletion increases the information CFE of training averaged over the alphabet of classes $\bar{E}$, and the non-informative features, whose removal does not change the information CFE of training. The recognition features are selected by a three-cycle iterative procedure that searches for the maximum of the CFE over the set of admissible dictionaries $G_Z$ and the working (admissible) domain of the CFE function:

$$Z^* = \arg \max_{G_Z} \left\{ \max_{G_\delta} \left\{ \max_{\{k\}} \bar{E}_k \right\} \right\},$$

where $\bar{E}_k$ is the averaged value of the CFE of training the DMSS, computed at the k-th step of training (selection of the dictionary); $\{k\}$ is the set of training steps.
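As an illustration of encoding rule (3), the sketch below builds the binary code of one realization vector from given tolerance limits. The limits are passed in as ready-made arrays rather than computed from formulas (1) and (2); the function name `encode_realization`, the 0-based indexing and the toy limits are assumptions made only for this example.

```python
def encode_realization(y, lower, upper):
    """Binary-encode one realization vector by rule (3).

    y      -- list of N feature values
    lower  -- lower[l][i], lower control tolerance of level l for feature i
    upper  -- upper[l][i], upper control tolerance of level l for feature i
    Returns a list of N*L bits. With 0-based i and l the bit for (i, l) is
    stored at position L*i + l, the analogue of index L*i - L + l in (3).
    """
    n_levels = len(lower)
    code = [0] * (len(y) * n_levels)
    for i, value in enumerate(y):
        for l in range(n_levels):
            if lower[l][i] <= value <= upper[l][i]:
                code[n_levels * i + l] = 1
    return code


# Toy example: 2 features, 3 tolerance levels (the limits are made up).
lower = [[7.0, 0.0], [8.0, 1.0], [9.0, 2.0]]
upper = [[11.0, 4.0], [12.0, 5.0], [13.0, 6.0]]
code = encode_realization([8.5, 5.5], lower, upper)
```

Each feature contributes L consecutive bits, so a dictionary of N features yields binary vectors of length N·L.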
Among the heuristic methods of feature selection, sequential directed algorithms, which include stages of sequential deletion and sequential return of features, are widespread. However, the way features are encoded within the IEI-technology provides built-in mechanisms for detecting non-informative features, and sequentially assessing the informativeness of features by their influence on the information CFE of training makes it possible to identify the sets of informative, non-informative and disruptive features at the beginning of each run of the directed search for the optimal dictionary of features. Consider the basic steps of the sequential directed feature selection algorithm modified within the IEI-technology:
1. Initialization of the counter of the number of features in the dictionary: i := N.
2. Parallel-sequential optimization of the TCS by the procedures (4), (5) to obtain the optimal parameters of training.
3. Initialization of the counter of runs: k := 0.
4. k := k + 1.
5. Formation of the current dictionary $Z_i$ by deleting from the initial dictionary all $N_p$ features for which the optimal values of the upper and lower tolerances are equal to each other or equal to their limit values, i := i - $N_p$, since such features do not take part in implementing the maximum-distance or the minimum-distance principles of optimizing the geometric parameters of the feature space partition.
6. Computation, by the procedure (5) of parallel optimization of the TCS, of the maximum averaged value of the CFE $\bar{E}_i$ for the current dictionary $Z_i$.
7. Formation of the set of dictionary variants $\{Z_{i,h} \mid h = \overline{1,i}\}$ whose capacity is one less than that of the current dictionary.
8. Computation, by the procedure (5) of parallel optimization of the TCS, of the maximum averaged value of the CFE $\bar{E}_{i,h}$ for each dictionary $Z_{i,h}$.
9. Deletion from the current dictionary $Z_i$ of all $N_i$ features whose deletion, compared with $\bar{E}_i$, does not change (non-informative features) or increases (disruptive features) the CFE of training: i := i - $N_i$.
10. Computation, by the procedure (5) of parallel optimization of the TCS, of the maximum averaged value of the CFE $\bar{E}_i$.
11. Formation of the set of dictionary variants $\{Z_{i,t} \mid t = \overline{1,N-i}\}$ whose capacity is one greater than that of the current dictionary, obtained by returning the deleted features.
12. Computation, by the procedure (5) of parallel optimization of the TCS, of the maximum averaged value of the CFE $\bar{E}_{i,t}$ for each dictionary variant $Z_{i,t}$.
13. Addition to the current dictionary of the one feature, i := i + 1, whose return increases the CFE the most: $\bar{E}_i = \max_t \bar{E}_{i,t}$.
14. Comparison: if i < N, then go to step 11; otherwise update the optimal dictionary $Z^* = \arg\max_i \bar{E}_i$.
15. Computation, by the procedures (4), (5) of parallel-sequential optimization of the TCS, of the global maximum of the CFE $\bar{E}^*$ for the optimal dictionary $Z^*$ and determination of the optimal parameters of training.
16. Comparison: if $|\bar{E}^{*}_{k} - \bar{E}^{*}_{k-1}| < \varepsilon$, where $\varepsilon$ is a small positive number, then "STOP"; otherwise, make the transition to step 5.

One of the simplest population-based algorithms of search optimization to implement is the particle swarm algorithm (Particle Swarm Optimization), called the cluster algorithm here [14]. The cluster algorithm that searches for the optimal dictionary of features operates on a population of $n_a$ agents, each of which holds one variant of the dictionary. The input set of features is represented in the cluster search algorithm as a numeric vector $P_{|N|}$ of length N, which corresponds to the position of a cluster agent in the multi-dimensional space of solutions.

The i-th component of the position of a cluster agent takes values from 0 to 100 inclusive and corresponds to the probability that the i-th feature of the full (original) dictionary is included in the dictionary of features of the given agent. The threshold $\theta$ is used to decide which features are used and which are not. By default, the feature selection threshold is $\theta = 0.5$. If the condition $P_i > \theta$ holds for a cluster agent, the i-th feature is present in the corresponding dictionary; otherwise it is deleted.

To calculate the value of the objective function of the cluster search algorithm, the procedure (5) of parallel optimization of the TCS can be used, and the final value of the information capacity of the diagnostic system for the optimal dictionary found can be obtained by the procedures (4) and (5) of the parallel-sequential algorithm of TCS optimization. Consider the basic steps of the particle cluster algorithm for optimizing the vector $P_{|N|}$.

1. Initialization of the cluster particles (agents):
1) initialization of the number of particles $n_a$;
2) initialization of the dimension N of each particle and of the limits of change of the i-th coordinate of the j-th particle $P_{j,i}$;
3) initialization of the initial positions of the particles

$$P_{j,i}[0] := 100 \cdot U(0,1),$$

where U(0,1) is the generator of random numbers from the range (0,1);
4) initialization of the initial velocities of the particles: $V_j[0] := 0$;
5) initialization of the maximum velocity of the particles $V_{max,i}$;
6) initialization of the weight coefficients of the velocity formula, i.e. the inertia weight w and the acceleration constants $c_1$ and $c_2$.
2. Increment of the iteration number: k := k + 1.
3. Increment of the particle number: j := j + 1.
4. Increment of the coordinate number in the position: i := i + 1.
5. Calculation of the new particle state:
1) computation of the i-th component of the velocity of the j-th particle by the rules

$$V_{j,i}[k+1] := w V_{j,i}[k] + c_1 a_{1,i}[k]\,(Pbest_{j,i}[k] - P_{j,i}[k]) + c_2 a_{2,i}[k]\,(Gbest_{i}[k] - P_{j,i}[k]);$$

$$V_{j,i}[k+1] := \begin{cases} V_{j,i}[k+1], & \text{if } V_{j,i}[k+1] < V_{max,i}; \\ V_{max,i}, & \text{else}, \end{cases}$$

where $a_{1,i}[k] = U(0,1)$, $a_{2,i}[k] = U(0,1)$;
2) updating the particle position

$$P_j[k+1] := P_j[k] + V_j[k+1];$$

3) calculation of the objective function $J_j[k+1]$;
4) updating the best personal Pbest and global Gbest positions of the search agents

$$Pbest_j[k+1] := \begin{cases} Pbest_j[k], & \text{if } J(P_j[k+1]) < J(Pbest_j[k]); \\ P_j[k+1], & \text{else}; \end{cases}$$

$$Gbest[k+1] := \arg\max_j \{J(Pbest_j[k+1])\}.$$

6. Checking the stop conditions: if $k < K_{max}$, where $K_{max}$ is the maximum number of search iterations, and $J(Gbest[k+1]) < 1.0$, then make the transition to step 2, otherwise to step 7.
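The steps of the particle algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_objective` is a made-up stand-in for the CFE computed by procedure (5), coordinates are rescaled from [0, 100] to [0, 1] before being compared with the threshold, the velocity is clipped symmetrically to [-v_max, v_max], and positions are clipped to [0, 100]. The default values of `w`, `c1`, `c2`, `v_max`, `n_agents` and `k_max` follow the settings reported in the experimental section.

```python
import random


def mask_of(position, theta=0.5):
    # Assumption: coordinates lie in [0, 100] and are rescaled to [0, 1]
    # before being compared with the selection threshold theta.
    return [p / 100.0 > theta for p in position]


def pso_select(objective, n_features, n_agents=30, w=0.95, c1=1.0, c2=1.0,
               v_max=2.0, k_max=170, seed=0):
    rng = random.Random(seed)
    pos = [[100.0 * rng.random() for _ in range(n_features)]
           for _ in range(n_agents)]
    vel = [[0.0] * n_features for _ in range(n_agents)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(mask_of(p)) for p in pos]
    g = max(range(n_agents), key=lambda j: pbest_val[j])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    k = 0
    while k < k_max and gbest_val < 1.0:
        k += 1
        for j in range(n_agents):
            for i in range(n_features):
                a1, a2 = rng.random(), rng.random()
                v = (w * vel[j][i]
                     + c1 * a1 * (pbest[j][i] - pos[j][i])
                     + c2 * a2 * (gbest[i] - pos[j][i]))
                # Velocity limit; the lower bound -v_max is an added assumption.
                vel[j][i] = max(-v_max, min(v_max, v))
                # Keep the coordinate inside its admissible range [0, 100].
                pos[j][i] = max(0.0, min(100.0, pos[j][i] + vel[j][i]))
            value = objective(mask_of(pos[j]))
            if value >= pbest_val[j]:  # personal best update
                pbest[j], pbest_val[j] = pos[j][:], value
        g = max(range(n_agents), key=lambda j: pbest_val[j])  # global best
        gbest, gbest_val = pbest[g][:], pbest_val[g]
    return mask_of(gbest), gbest_val, k


INFORMATIVE = {0, 2, 5}  # hypothetical "useful" features of the toy problem


def toy_objective(mask):
    # Stand-in for procedure (5): rewards informative features and
    # penalizes the rest; the value stays within [0, 1].
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(1 for i, used in enumerate(mask)
                 if used and i not in INFORMATIVE)
    return max(0.0, (hits - 0.5 * extras) / len(INFORMATIVE))


best_mask, best_value, iterations = pso_select(toy_objective, n_features=8)
```

In a real system the objective would run the tolerance optimization for the dictionary encoded by the mask, which is by far the most expensive part of each iteration.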
However, the cluster search algorithm is aimed primarily at deleting the disruptive recognition features and, accordingly, at increasing the CFE of training averaged over the alphabet of classes. To reduce the capacity of the dictionary of features further by deleting the remaining non-informative features, the cluster search algorithm has to be modified. To this end, the procedure of updating the best personal position Pbest of the search agents is supplemented with the rule:

if $|J(P_j) - J(Pbest_j)| < \varepsilon$ and $|P_j| < |Pbest_j|$, then $Pbest_j := P_j$,

where J(...) is the objective function in the form of the averaged value of the CFE; $P_j$, $Pbest_j$ are the current and the best personal positions of the j-th agent, respectively; $|P_j|$, $|Pbest_j|$ are the capacities of the dictionaries of features of the current and the best personal positions of the j-th agent; $\varepsilon$ is any small positive number close to zero.

The procedure of updating the best global position Gbest of the search agents is modified similarly:

if $|J(Pbest_j) - J(Gbest)| < \varepsilon$ and $|Pbest_j| < |Gbest|$, then $Gbest := Pbest_j$.
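The modified update rule amounts to a tie-break in favour of the smaller dictionary. A minimal sketch is shown below; the function name `prefer`, its argument names and the default `eps` are hypothetical, and the second branch is simply the ordinary "higher objective wins" comparison.

```python
def prefer(candidate_value, candidate_size, best_value, best_size, eps=1e-6):
    """Return True if the candidate dictionary should replace the stored best.

    candidate_value, best_value -- averaged CFE of the two dictionaries
    candidate_size, best_size   -- number of features in each dictionary
    """
    if abs(candidate_value - best_value) < eps:
        # Practically equal CFE: the smaller dictionary wins.
        return candidate_size < best_size
    # Otherwise the ordinary rule applies: the higher CFE wins.
    return candidate_value > best_value
```

The same predicate can be used for both the personal and the global best, which is what lets the dictionary keep shrinking after the CFE has stopped growing.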
As the information CFE of machine learning of the diagnostic DMSS, it is proposed to use the normalized Kullback measure as modified in [15], the working formula of which has the form

$$E_m^{(k)} = \frac{K_{1,m}^{(k)} - K_{2,m}^{(k)}}{n_m\left(\log\left(2n_m + 10^{-\varpi}\right) + \varpi\right)} \times \log\frac{10^{-\varpi} + n_m + \left[K_{1,m}^{(k)} - K_{2,m}^{(k)}\right]}{10^{-\varpi} + n_m - \left[K_{1,m}^{(k)} - K_{2,m}^{(k)}\right]}, \tag{6}$$

where $K_{1,m}^{(k)}$ is the number of events in which realizations of the class $X_m^o$ are assigned to the container of the class $X_m^o$ at the k-th step of machine learning; $K_{2,m}^{(k)}$ is the number of events in which realizations of the nearest neighbouring class $X_c^o$ are assigned to the container of the class $X_m^o$.

The value of the CFE calculated by formula (6) lies in the range of real numbers [0; 1]. The parameter $\varpi$ affects only the sensitivity of the CFE to changes in the accuracy characteristics. Its value is usually chosen from the range $\varpi \in [2; 4]$.
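Formula (6) can be evaluated directly, as in the sketch below. It assumes that the logarithm is decimal, since with the term $10^{-\varpi}$ this is the base for which the criterion equals exactly 1 when all realizations of the class and none of the neighbouring class fall into the container. The function and argument names are illustrative.

```python
import math


def kullback_cfe(k1, k2, n, r=3):
    """Normalized Kullback criterion (6) for one class.

    k1 -- realizations of the class that fall inside its own container
    k2 -- realizations of the nearest neighbouring class that fall inside it
    n  -- number of realizations per class
    r  -- sensitivity parameter (varpi), usually taken from [2; 4]
    """
    small = 10.0 ** (-r)
    diff = k1 - k2
    # Normalizing factor: makes the criterion equal to 1 when diff == n.
    norm = n * (math.log10(2 * n + small) + r)
    ratio = (small + n + diff) / (small + n - diff)
    return diff / norm * math.log10(ratio)


perfect = kullback_cfe(300, 0, 300)     # all own realizations, no foreign ones
useless = kullback_cfe(150, 150, 300)   # the container does not separate classes
partial = kullback_cfe(290, 10, 300)
```

Because of the logarithm, the criterion grows steeply as the last few errors disappear, so values well below 1 can still correspond to a small number of misclassified vectors.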
Thus, the proposed algorithms for training the intelligent DMSS for functional diagnostics of myocardial pathologies consist in a multi-cycle iterative procedure that optimizes the dictionary of features, the control tolerances and the geometric parameters of partitioning the feature space into equivalence classes while searching for the global maximum of the modified Kullback information CFE in the working domain of its function.
5. Results of the physical simulation of the system of radionuclide diagnosing capable of learning
Fig. 1 shows the dependence of the averaged information CFE (6) of training the system of functional diagnostics of the myocardium on the number of complex components of the fast Fourier transform used to form the dictionary of diagnostic features. The three-level tolerance control system was chosen following the recommendations of the paper [15].
Fig. 1. Dependence of the averaged value of the CFE training system of diagnostics on the number of the components of the Fourier transformation
The analysis of Fig. 1 shows that the optimal number of complex components of the fast Fourier transform is c* = 33, and a further increase in their number does not increase the CFE of training to recognize the functional state of the myocardium. The maximum value of the CFE averaged over the alphabet of classes is then $\bar{E}^* = 0.75$, which, in accordance with the principle of deferred decisions, indicates the need to optimize other parameters of functioning of the diagnostic system.
Let us consider the results of machine learning of the diagnostic system with sequential directed selection of features for the alphabet of classes that characterizes the functional state of the myocardium: $X_1^o$ is the norm; $X_2^o$ is quiet ischemia; $X_3^o$ is acute ischemia; $X_4^o$ is a cicatrix on the heart. The volume of the sample for each class is $n_m = 300$. The dictionary of features contains 66 quantitative features obtained through the Fourier transform and 10 categorical features that characterize the contextual data of the examination.
Fig. 2 shows two curves. The first curve illustrates how the maximum of the normalized CFE of training, averaged over the alphabet of classes, changes during feature selection, and the second curve shows the change in the ratio of the capacity of the current dictionary of features to that of the full dictionary during its optimization.
Fig. 2. Graph of the change of the maxima of the averaged CFE (6) and the capacity of the dictionary of features during its optimization by the sequential directed search algorithm
The first curve in Fig. 2 is formed from the maxima of the averaged CFE calculated by the parallel algorithm (5) of TCS optimization. The jump in the CFE value at the last run of the feature selection algorithm corresponds to the result of parallel-sequential optimization of the TCS by the procedures (5) and (4) with the optimal dictionary of features. The second curve in Fig. 2 consists of consecutive falls and rises, which correspond to deleting non-informative and disruptive features and to sequentially returning the deleted features.
The analysis of Fig. 2 shows that 5 runs of the feature selection algorithm took 100 iterations and produced an optimal dictionary reduced by 17 %, for which the CFE calculated by parallel-sequential optimization of the TCS is $\bar{E}^* = 0.91$, which corresponds to an increase in the reliability of recognition compared with the full dictionary of features.
Consider the results of the cluster algorithm of dictionary selection with the default settings: the inertia weight is w = 0.95, the particles move without acceleration, $c_1 = c_2 = 1.0$, the velocity of the particles is limited by the value $V_{max,i} = 2$, and the number of particles in the cluster is $n_a = 30$.
Fig. 3, a shows the process of cluster selection of the dictionary without modifying the procedure of updating the best personal Pbest and global Gbest positions, and Fig. 3, b shows it with the proposed modification. In both cases, the first curve illustrates how the maximum of the CFE (6) of training, averaged over the alphabet of classes, changes during cluster selection of features, and the second curve shows the change in the ratio of the capacity of the current dictionary of features to the capacity of the full dictionary during its optimization.
The analysis of Fig. 3, a shows that at the 250th iteration of the cluster search algorithm the best dictionary was found, which ensures decision rules that are faultless on the training matrix and reduces the capacity of the dictionary of features by 42 %. At the same time, 90 % of the non-informative and disruptive features were deleted during the first iterations of the cluster algorithm, and the capacity of the dictionary did not change substantially during the rest of the search.
Fig. 3. Graph of change of the maxima of the averaged CFE and the capacity of the dictionary of features during its optimization by the cluster algorithm of search: a - without modification; b - with modification
The analysis of Fig. 3, b shows that, when features are selected by the modified cluster search algorithm, the limit value of the CFE averaged over the alphabet of classes is reached at the 82nd iteration. However, limiting the number of iterations to $K_{max} = 170$ without stopping the search once the limit CFE value is reached makes it possible to reduce the capacity of the dictionary of features by 60 % compared with the original one. The graph of the dictionary capacity is mostly decreasing.
6. Discussion of the results of physical simulation
Unlike the works [5, 7, 8], where the achieved accuracy of diagnostic results is 85 %, the modification of sequential directed selection of the dictionary of features proposed within the IEI-technology provides a probability of a correct diagnostic decision, estimated on the training matrix, of $P_{true} = 0.999$, which corresponds to two incorrectly classified vectors of the training sample. The proposed modification of the cluster algorithm of dictionary selection with information-extreme machine learning yields decision rules that are faultless on the training matrix.
In order to assess the representation level used in the experimental samples, the volume of which is 300 vectors, within the IEI-technology, the minimum volume of the representative training sample is defined. This is carried out by building at each test the confidence interval for estimating the probability of pi of the location of the i-th feature in the area of the control tolerance with the probability of reliability 1-Q:
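$$P\left\{\hat{p}_i(n) - \varepsilon_{Q\max}(n) \le p_i \le \hat{p}_i(n) + \varepsilon_{Q\max}(n)\right\} = 1 - Q, \qquad (7)$$
where $\hat{p}_i(n)$ is the empirical frequency with which the i-th feature falls within its control tolerance area after n tests; Q is the level of significance (a small positive number that is usually chosen from the standard values 0.05, 0.01 and 0.001); $\varepsilon_{Q\max}(n)$ is the maximum statistical error for the given Q, which, in accordance with the paper [16], is a function of the number of tests n and is equal to
$$\varepsilon_{Q\max}(n) = \frac{c}{2\sqrt{n}}, \qquad c = \arg\left[\Phi(x) = 1 - Q/2\right],$$
where Φ(...) is the Laplace function.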
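As a numerical illustration, the maximum statistical error can be evaluated for a few values of n. The sketch below assumes that the expression above reads as c divided by 2√n and that Φ is the standard normal cumulative distribution function, so that c is its quantile at 1 - Q/2. The function name `max_statistical_error` is introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def max_statistical_error(n: int, q: float) -> float:
    """Return c / (2 * sqrt(n)), where c solves Phi(c) = 1 - q/2.

    Assumption: Phi is the standard normal CDF, so c is obtained
    from the inverse CDF (quantile function).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of tests")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    c = NormalDist().inv_cdf(1.0 - q / 2.0)
    return c / (2.0 * sqrt(n))


for n in (30, 81, 100):
    print(n, round(max_statistical_error(n, q=0.05), 4))
```

The error shrinks as 1/√n, so halving it requires roughly four times as many tests.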
The minimum volume of the training sample is defined by the number of tests n from the compromise area 30 < n < 100 at which the preset interval [0.5 ± Δ] covers the dynamic confidence interval (7); the value Δ is chosen from the range 0 < Δ < 0.5 by the software developer or is calculated by the program. In the general case, when the preset interval covers the confidence interval already in the forbidden area (n < 30), an iterative procedure is used to decrease the parameter Δ until the lower and upper limits of the confidence interval (7) are covered by the interval [0.5 ± Δ] within the compromise area. Fig. 4 shows an example of determining the minimum volume of the training sample for the first recognition feature with the optimum system of control tolerances and the preset parameter Δ = 0.15.
Fig. 4. Determination of the minimum volume of the training sample (horizontal axis: number of tests n): 1 is the graph of the function εQmax = f(n); 2 is the graph of the empirical frequency $\hat{p}_i = k_i/n$; 3 is the lower limit of the confidence interval; 4 is the upper limit of the confidence interval
The analysis of Fig. 4 shows that, in the compromise area, the minimum volume of the training sample for the first feature equals nmin = 81. A similar procedure was carried out for the other features. Since the procedure gives lower values of nmin for the other examined features, in order to obtain the same number of implementations in the training matrix for all features, it is advisable to take the maximum value nmin = max{nmin,i}, i.e. nmin = 81.
Thus, the proposed algorithms for optimizing the functioning parameters of the information-extreme machine training of the diagnostic system yield highly reliable decision rules that are faultless on training samples of volume n ≥ 81 and whose accuracy in the exam mode may acceptably approach the limit value.
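The procedure for one feature, and the choice of the maximum over all features, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the half-width of the preset interval around 0.5 is named `delta`, the shrink factor 0.9 used to decrease it is arbitrary, Φ is taken to be the standard normal CDF, and the binary sequences in `feature_hits` are synthetic data invented for the example (1 means that the feature fell within its control tolerance area on that test).

```python
from math import sqrt
from statistics import NormalDist


def epsilon_q_max(n, q):
    """Maximum statistical error c / (2 * sqrt(n)), with Phi(c) = 1 - q/2."""
    return NormalDist().inv_cdf(1.0 - q / 2.0) / (2.0 * sqrt(n))


def first_covered_n(hits, q, delta, n_high):
    """Smallest n < n_high at which [p_hat - eps, p_hat + eps] lies inside
    [0.5 - delta, 0.5 + delta]; None if that never happens."""
    k = 0
    for n, hit in enumerate(hits, start=1):
        if n >= n_high:
            break
        k += hit
        p_hat = k / n                      # empirical frequency after n tests
        eps = epsilon_q_max(n, q)
        if 0.5 - delta <= p_hat - eps and p_hat + eps <= 0.5 + delta:
            return n
    return None


def minimum_sample_size(hits, q=0.05, delta=0.15, n_low=30, n_high=100, shrink=0.9):
    """Decrease delta until the first covered n lies in n_low < n < n_high.

    Returns (n_min, delta_used), or None if no such n exists.
    """
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must lie strictly between 0 and 1")
    while True:
        n = first_covered_n(hits, q, delta, n_high)
        if n is None:
            return None
        if n > n_low:
            return n, delta
        delta *= shrink                    # covered too early: narrow the interval


# Synthetic hit sequences for two hypothetical features.
feature_hits = {
    "feature_1": [1, 0] * 50,
    "feature_2": [1] * 6 + [1, 0] * 47,
}
results = {name: minimum_sample_size(h) for name, h in feature_hits.items()}
n_min = max(r[0] for r in results.values() if r is not None)
print(results, n_min)
```

Taking the maximum over the per-feature values guarantees that every feature has at least its own minimum number of implementations in the training matrix.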
7. Conclusions
1. The optimum, in the information sense, number of components of the fast Fourier transform used in the analysis of myocardial scintigrams in the states of stress and rest was determined, which increases the efficiency of the machine training algorithms of the diagnostic system.
2. The results of the physical simulation proved the high efficiency of the developed modifications of the algorithms for selecting the dictionary of features. The cluster algorithm was shown to be superior to the sequential directed selection of the dictionary in terms of both search speed and compression level. Decision rules that are faultless on the training matrix were obtained, and their reliability in the exam mode acceptably approaches the limit value.
3. The optimal, in the information sense, capacity of the dictionary of features found by the cluster algorithm was determined; it is 40 % of the original dictionary, i.e. 30 features. The minimum volume of the representative training sample of myocardial scintigrams required to obtain decision rules that are faultless on the training matrix was also established, nmin = 81, which reduces the requirements for forming the input mathematical description when the alphabet of recognition classes is expanded.
References
1. Synefia, S. 3D images quantitative perfusion analysis and myocardium polar index for cardiac scintigraphy improvement [Text] / S. Synefia, M. Sotiropoulos, M. Argyrou, M. Bella, I. Floros, A. Valasi, M. Lyra // e-Journal of Science & Technology. - 2014. - Vol. 3, Issue 9. - P. 35-41.
2. Ohlsson, M. WeAidU-decision support system for myocardial perfusion images using artificial neural networks [Text] / M. Ohlsson // Artificial Intelligence in Medicine. - 2004. - Vol. 30, Issue 1. - P. 49-60. doi: 10.1016/s0933-3657(03)00050-2
3. Wadhonkar, B. M. A data mining approach for classification of heart disease dataset using neural network [Text] / B. M. Wadhonkar, P. A. Tijare, S. N. Sawalkar // International Journal of Application or Innovation in Engineering & Management (IJAIEM). - 2015. - Vol. 4, Issue 5. - P. 426-433.
4. Heart Disease Diagnosis Using Multiple Kohonen Self Organizing Maps [Text] // Materials of International Conference on Advanced Research in Engineering and Technology, 2013. - 915 p.
5. Sajn, L. Image processing and machine learning for fully automated probabilistic evaluation of medical images [Text] / L. Sajn, M. Kukar // Journal Computer Methods and Programs in Biomedicine. - 2011. - Vol. 104, Issue 3. - P. e75-e86. doi: 10.1016/j.cmpb.2010.06.021
6. Arsanjani, R. Prediction of revascularization after myocardial perfusion SPECT by machine learning in a large population [Text] / R. Arsanjani, D. Dey, T. Khachatryan, A. Shalev, S. W. Hayes, M. Fish et al. // Journal of Nuclear Cardiology. - 2015. - Vol. 22, Issue 5. - P. 877-884. doi: 10.1007/s12350-014-0027-x
7. Tagil, K. A decision support system for stress only myocardial perfusion scintigraphy may save unnecessary rest studies [Text] / K. Tagil, D. Jakobsson, M. Lomsky, J. Marving, S. Svensson, P. Wollmer, B. Hesse // Journal of Biomedical Graphics and Computing. - 2013. - Vol. 3, Issue 2. - P. 46-53. doi: 10.5430/jbgc.v3n2p46
8. Ciecholewski, M. Ischemic heart disease detection using selected machine learning methods [Text] / M. Ciecholewski // International Journal of Computer Mathematics. - 2013. - Vol. 90, Issue 8. - P. 1734-1759. doi: 10.1080/00207160.2012.742189
9. Moskalenko, V. V. Intelligent Decision Support System for Medical Radioisotope Diagnostics with Gamma-camera [Text] / V. V. Moskalenko, A. S. Dovbysh, A. S. Rizhova, O. V. Dyomin // Journal of Nano- and Electronic Physics. - 2015. - Vol. 7, Issue 4. - P. 04036-1-04036-7.
10. Random Subspace Classifier for Recognition of Pests on Crops [Text] // Materials of 4th International Work Conference Bioinspired Intelligence (IWOBI), 2015. - P. 219.
11. Dovbysh, A. S. Information-extreme method of classification of observation with categorical attributes [Text] / A. S. Dovbysh, V. V. Moskalenko, A. S. Rizhova // Cybernetics and Systems Analysis. - 2016. - Vol. 52, Issue 2. - P. 224-231. doi: 10.1007/s10559-016-9818-1
12. Sipos, R. Log-based predictive maintenance [Text] / R. Sipos // Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14, 2014. - P. 1977. doi: 10.1145/2623330.2623340
13. Dovbysh, A. S. Feature set optimization of learning control system [Text] / A. S. Dovbysh, I. Shelekhov, E. V. Korobchenko // Adaptive automatic control system. - 2015. - Vol. 2, Issue 27. - P. 44-50.
14. Sivakumar, S. Modified PSO Based Feature Selection for Classification of Lung CT Images [Text] / S. Sivakumar, C. Chandrasekar // International Journal of Computer Science and Information Technologies. - 2014. - Vol. 5, Issue 2. - P. 2095-2098.
15. Dovbysh, A. S. Information-Extreme Algorithm for Optimizing Parameters of Hyperellipsoidal Containers of Recognition Classes [Text] / A. S. Dovbysh, N. N. Budnyk, V. V. Moskalenko // Journal of automation and information sciences. - 2012. - Vol. 44, Issue 10. - P. 35-44. doi: 10.1615/jautomatinfscien.v44.i10.30
A wireless vibration transducer based on a MEMS accelerometer is presented. The circuit design solutions used, the structure of the transducer software, and the algorithm for correcting the amplitude-frequency response of the MEMS accelerometer and processing the data are described, and the influence of electromagnetic fields on the transducer is considered. The results of the tests confirm that the transducer is operational and that it can provide measurements with an accuracy comparable to that of industrial piezoaccelerometers
Keywords: vibration, MEMS accelerometer, wireless vibration transducer, Wi-Fi, monitoring of rotating equipment
UDC 534.08 + 531.768
|DOI: 10.15587/1729-4061.2016.71954|
DEVELOPMENT OF WIRELESS VIBRATION TRANSDUCER BASED ON MEMS ACCELEROMETER
P. Oliynik
PhD, Senior Researcher Research institute of telecommunications National technical university of Ukraine "Kyiv polytechnic institute" Peremohy ave., 37, Kyiv, Ukraine, 03056 E-mail: poleinik@ukr.net
1. Introduction
When measuring vibration of heavy machinery (generators, turbines), connection of transducers is one of the critical issues. As a rule, classical piezoaccelerometers with charge output are not used for that task due to their inherent limitations. As cable capacitance has a significant influence on the piezoaccelerometer voltage sensitivity coefficient and on the frequency response slope at high frequencies, cable length typically does not exceed 1-3 m, which is insufficient for measuring vibration of heavy machinery. Moreover, the cable of piezoaccelerometer has to be prop-