Piecewise Continuous Segmentation of Multidimensional Experimental
Signals
Alexander G. Dmitriev, Cherepovets Higher Military Engineering School of Radio Electronics, Cherepovets, Russia, dag334a@fxmail.ru
Abstract
An algorithm is proposed for piecewise continuous approximation of structural experimental multidimensional signals when the number of intervals that split the signals into "similar" fragments is not known in advance. The multidimensional piecewise continuous approximating function is constructed "left to right", which makes it possible to use dynamic programming to determine the boundaries of the partition intervals. The approximation quality criterion takes into account the number of samples on each partition interval and the "complexity" of the local signal models used.
Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: A. Khomonenko, B. Sokolov, K. Ivanova (eds.): Selected Papers of the Models and Methods of Information Systems Research Workshop, St. Petersburg, Russia, 4-5 Dec. 2019, published at http://ceur-ws.org
Introduction
In various applications, the problem arises of analyzing so-called structural experimental signals, considered as a time-sequential combination of simpler signals (functions) that have constant properties on the corresponding time intervals [Mot79, Kos04]. Processing of such signals is in most cases reduced to a two-stage procedure: the allocation of "similar" fragments (segmentation stage) and the subsequent construction of a description of the presented signals as a whole. Existing signal segmentation methods prove insufficiently effective under conditions of high dimensionality, limited experimental observations and an unknown number of "similar" sites (partition intervals). In addition, the approximating function is usually discontinuous at the boundaries of "similar" sites [Dmi10, Dor84].
The aim of the work is to develop an algorithm for piecewise continuous approximation of multidimensional signals with a previously unknown number of intervals splitting the signals into "similar" fragments, which under certain conditions attains the optimal value of the selected approximation quality criterion.
1 Problem Statement
Let a set of signals (a multidimensional signal) $y(t) = (y^{(1)}(t), \ldots, y^{(s)}(t))$, collectively characterizing the object under study, be presented for analysis. The values $y^{(i)}(t)$, $i = 1, \ldots, s$, are specified at discrete points in time $t = t_1, \ldots, t_N$. The criterion of approximation quality $J$ on the sample of experimental values is chosen as:
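$$J = \sum_{i=1}^{s} \mu_i \sum_{j=1}^{r} \frac{n_j}{n_j - m} \sum_{t \in (T_{j-1}, T_j]} \left[ y^{(i)}(t) - F_j^{(i)}(t, a_j^{(i)}) \right]^2, \qquad (1)$$
where $\mu_i$ is the a priori "weight" of the signal $y^{(i)}$; $n_j$ is the number of discrete samples on the interval $(T_{j-1}, T_j]$; $F_j^{(i)}(t, a_j^{(i)}) = \sum_{k=1}^{m} a_{jk}^{(i)} \varphi_k(t)$ is a polynomial over a given set of basis functions $\{\varphi_k(t), k = 1, \ldots, m\}$; $a_j^{(i)} = (a_{j1}^{(i)}, \ldots, a_{jm}^{(i)})$ is the vector of estimated parameters on the $j$-th interval.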
Criterion (1) uses the weights $\mu_i$ ($i = 1, \ldots, s$) and $n_j / (n_j - m)$ ($j = 1, \ldots, r$). The weights $\mu_i$ are introduced because in practice the signals often have different practical significance. Specific values of $\mu_i$ are chosen on substantive grounds; usually they are normalized so that $\sum_{i=1}^{s} \mu_i = 1$. The weights $n_j / (n_j - m)$ are the usual normalizing coefficients that take into account the dimension of the model.
It is required to find a partition $T = (T_0, T_1, \ldots, T_r)$, $T_0 < T_1 < \ldots < T_r$ (a synchronous change of the signals $y^{(i)}$ is assumed) of the given interval $[t_1, t_N]$, $T_0 = t_1$, $T_r = t_N$, into $r$ intervals $(T_{j-1}, T_j]$, $j = 1, \ldots, r$ ($r$ is, in general, unknown), and to determine on each of these intervals values of the parameter vectors $a_j^{(i)}$, $i = 1, \ldots, s$, such that the functional (1) takes its minimum value subject to the continuity constraints on the approximating functions:
$$F_j^{(i)}(T_j, a_j^{(i)}) = F_{j+1}^{(i)}(T_j, a_{j+1}^{(i)}), \quad j = 1, \ldots, r - 1; \quad i = 1, \ldots, s. \qquad (2)$$
2 Algorithm
Denote by-the s(l) (Tj_1, Tj) error of approximation of
the i-th signal on the interval (Tj-l,Tj ] , calculated by
the method of least squares. Then criterion (1) will take the form:
J = (3)
¿=1 7=1 7
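As an illustration, criterion (3) can be evaluated for a fixed partition as in the minimal sketch below. It assumes linear local models ($m = 2$) and, for brevity, ignores the continuity constraints (2), so each interval is fitted independently. The function names and the index convention (boundaries given as sample indices, interval $j$ covering samples `bounds[j-1]` up to but not including `bounds[j]`) are illustrative assumptions, not part of the paper.

```python
from typing import List, Sequence


def line_sse(t: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares line a1 + a2*t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = s_ty / s_tt if s_tt > 0 else 0.0
    intercept = y_mean - slope * t_mean
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))


def criterion(t: Sequence[float],
              signals: List[Sequence[float]],
              bounds: Sequence[int],
              weights: Sequence[float],
              m: int = 2) -> float:
    """Weighted criterion (3) for a given partition.

    Assumption: continuity constraints (2) are NOT enforced here;
    every interval is fitted by an independent least-squares line.
    """
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        n_j = hi - lo
        if n_j <= m:
            raise ValueError("each interval needs more than m samples")
        # Weighted sum of per-signal errors on this interval.
        eps = sum(w * line_sse(t[lo:hi], y[lo:hi])
                  for w, y in zip(weights, signals))
        total += n_j / (n_j - m) * eps
    return total


t = list(range(10))
y1 = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
print(criterion(t, [y1], [0, 5, 10], [1.0]))  # two exact lines
print(criterion(t, [y1], [0, 10], [1.0]))     # one line for all samples
```

In this toy example the two-interval partition reproduces the signal exactly, while a single interval leaves a residual that is further inflated by the factor $n_j / (n_j - m)$.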
We make the following assumption. The boundaries $T_j$ ($j = 1, \ldots, r - 1$) of the partition $T = (T_0, T_1, \ldots, T_r)$ and the corresponding local approximating functions on each interval of the partition are found from "left to right". Let the position of the boundary $T_1$ be determined; we find the local approximating functions $F_1^{(i)}(t, a_1^{(i)})$ delivering the minimum of $\varepsilon^{(i)}(T_0, T_1)$, $i = 1, \ldots, s$. Next, we fix the position of the boundary $T_1$ and find the boundary $T_2$ and the corresponding local approximating functions $F_2^{(i)}(t, a_2^{(i)})$ delivering the minimum of $\varepsilon^{(i)}(T_1, T_2)$ ($i = 1, \ldots, s$) subject to continuity at the boundary $T_1$:
$$F_1^{(i)}(T_1, a_1^{(i)}) = F_2^{(i)}(T_1, a_2^{(i)}), \quad i = 1, \ldots, s.$$
Then we fix $T_2$, and so on.
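The "left to right" construction can be sketched for a single signal and linear local models. The first interval is fitted by an unconstrained least-squares line; on the next interval the line is forced through the value reached at the shared boundary, which leaves only the slope free. Both function names are hypothetical, and the closed-form slope is the standard least-squares solution for a line through a fixed point, not a formula taken from the paper.

```python
from typing import Sequence, Tuple


def fit_free_line(t: Sequence[float],
                  y: Sequence[float]) -> Tuple[float, float, float]:
    """Unconstrained least-squares line; returns (intercept, slope, sse)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = s_ty / s_tt if s_tt > 0 else 0.0
    intercept = y_mean - slope * t_mean
    sse = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return intercept, slope, sse


def fit_anchored_line(t: Sequence[float], y: Sequence[float],
                      t0: float, v0: float) -> Tuple[float, float]:
    """Least-squares line forced through (t0, v0); returns (slope, sse).

    The model is F(t) = v0 + slope * (t - t0), so continuity at the
    left boundary t0 holds by construction.
    """
    den = sum((ti - t0) ** 2 for ti in t)
    if den == 0:
        raise ValueError("need at least one sample away from the anchor")
    slope = sum((ti - t0) * (yi - v0) for ti, yi in zip(t, y)) / den
    sse = sum((yi - v0 - slope * (ti - t0)) ** 2 for ti, yi in zip(t, y))
    return slope, sse


# First interval: samples at t = 0..4, boundary T1 at t = 4.
a1, a2, err1 = fit_free_line([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
v_at_T1 = a1 + a2 * 4
# Second interval: the line must start from the value reached at T1.
slope2, err2 = fit_anchored_line([5, 6, 7, 8, 9], [3, 2, 1, 0, -1], 4, v_at_T1)
print(slope2, err2)
```

Because the anchored fit has one free parameter fewer, its error is never smaller than that of an unconstrained fit on the same samples.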
Under this assumption, criterion (3) with constraints (2) has the following property. Let $T_j$ be some fixed position of the right boundary of the $j$-th interval. Then the boundaries $T_1^*, \ldots, T_{j-1}^*$, obtained by minimizing (3) only over the boundaries $T_1, \ldots, T_{j-1}$, are independent of the values of the boundaries $T_{j+1}, \ldots, T_{r-1}$. Indeed, the functional (3) can be represented as the sum of two nonnegative quantities: $J = J_a + J_b$, where
$$J_a = J_a(T_1, \ldots, T_{j-1} \mid T_j) = \sum_{k=1}^{j} \frac{n_k}{n_k - m} \sum_{i=1}^{s} \mu_i \varepsilon^{(i)}(T_{k-1}, T_k),$$
$$J_b = J_b(T_{j+1}, \ldots, T_{r-1} \mid T_j) = \sum_{k=j+1}^{r} \frac{n_k}{n_k - m} \sum_{i=1}^{s} \mu_i \varepsilon^{(i)}(T_{k-1}, T_k).$$
But then, obviously, $\arg\min_{T_1, \ldots, T_{j-1}} J = \arg\min_{T_1, \ldots, T_{j-1}} J_a$.
It follows from this property that if $T_j^*$ is the optimal position of the $j$-th boundary, then the boundaries $T_1^*, \ldots, T_{j-1}^*$, obtained by minimizing $J$ over $T_1, \ldots, T_{j-1}$, are also optimal. This property of criterion (3) makes it possible to use the dynamic programming procedure [Bel69] to determine the optimal boundaries of the intervals.
Let the number of intervals be equal to $r_0$.
The following recursive algorithm finds the partition and the local approximating functions that deliver the optimal value of the functional (1) (under the above assumption).
First, the functions $J_j(T_j)$ ($j = 1, 2, \ldots, r_0 - 1$) are tabulated sequentially, where
$$J_1(T_1) = \frac{n_1}{n_1 - m} \sum_{i=1}^{s} \mu_i \varepsilon^{(i)}(T_0, T_1), \quad T_1 = t_p, \ldots, t_{N - (r_0 - 1)p};$$
$$J_j(T_j) = \min_{T_{j-1} = t_{(j-1)p}, \ldots, t_{\tau_j - p}} \left\{ J_{j-1}(T_{j-1}) + \frac{n_j}{n_j - m} \sum_{i=1}^{s} \mu_i \varepsilon^{(i)}(T_{j-1}, T_j) \right\}, \quad T_j = t_{jp}, \ldots, t_{N - (r_0 - j)p}, \quad j = 2, \ldots, r_0 - 1;$$
$\tau_j$ is the sample index corresponding to the boundary $T_j$, and $p$ is the specified minimum allowed number of samples on a partition interval. The approximating functions on adjacent intervals are constructed subject to the continuity conditions (2).
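At the same time, the values $M_{j-1}(T_j)$, $T_j = t_{jp}, \ldots, t_{N - (r_0 - j)p}$, $j = 2, \ldots, r_0 - 1$, that is, the optimal positions of the boundaries $T_{j-1}$ for each $T_j$, are stored. Next, the optimal boundaries of the intervals are determined:
$$T_{r_0 - 1}^* = \arg\min_{T_{r_0 - 1} = t_{(r_0 - 1)p}, \ldots, t_{N - p}} \left\{ J_{r_0 - 1}(T_{r_0 - 1}) + \frac{n_{r_0}}{n_{r_0} - m} \sum_{i=1}^{s} \mu_i \varepsilon^{(i)}(T_{r_0 - 1}, t_N) \right\}, \qquad (4)$$
$$T_{r_0 - 2}^* = M_{r_0 - 2}(T_{r_0 - 1}^*), \; \ldots, \; T_1^* = M_1(T_2^*).$$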
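The tabulation and backtracking steps can be sketched as a generic dynamic program over boundary positions. This is a simplified illustration rather than the author's implementation: it handles one signal, uses an unconstrained linear fit as the interval cost (continuity constraints (2) are omitted), and represents boundaries as sample indices with interval $j$ covering samples `bounds[j-1]` up to but not including `bounds[j]`. The names `segment_dp` and `make_cost` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def make_cost(t: Sequence[float], y: Sequence[float],
              m: int = 2) -> Callable[[int, int], float]:
    """Interval cost n/(n-m) * SSE of an unconstrained least-squares line.

    Assumption: every interval passed in has more than m samples.
    """
    def cost(lo: int, hi: int) -> float:
        n = hi - lo
        ts, ys = t[lo:hi], y[lo:hi]
        tm, ym = sum(ts) / n, sum(ys) / n
        stt = sum((a - tm) ** 2 for a in ts)
        sty = sum((a - tm) * (b - ym) for a, b in zip(ts, ys))
        slope = sty / stt if stt > 0 else 0.0
        sse = sum((b - ym - slope * (a - tm)) ** 2 for a, b in zip(ts, ys))
        return n / (n - m) * sse
    return cost


def segment_dp(n_samples: int, r0: int, p: int,
               cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Split samples 0..n_samples-1 into r0 intervals of at least p samples,
    minimizing the sum of cost(lo, hi) over the intervals [lo, hi)."""
    if r0 * p > n_samples:
        raise ValueError("not enough samples for r0 intervals of length p")
    # best[j][e]: minimal cost of covering samples [0, e) with j intervals.
    best = [[math.inf] * (n_samples + 1) for _ in range(r0 + 1)]
    # back[j][e]: optimal previous boundary; plays the role of M_{j-1}(T_j).
    back = [[-1] * (n_samples + 1) for _ in range(r0 + 1)]
    best[0][0] = 0.0
    for j in range(1, r0 + 1):
        # Interval j ends no earlier than j*p and leaves room for the rest.
        for e in range(j * p, n_samples - (r0 - j) * p + 1):
            for b in range((j - 1) * p, e - p + 1):
                if best[j - 1][b] == math.inf:
                    continue
                c = best[j - 1][b] + cost(b, e)
                if c < best[j][e]:
                    best[j][e], back[j][e] = c, b
    # Backtrack from the right end, as in expressions (4).
    bounds = [n_samples]
    for j in range(r0, 0, -1):
        bounds.append(back[j][bounds[-1]])
    bounds.reverse()
    return best[r0][n_samples], bounds


t = list(range(12))
y = [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]
value, bounds = segment_dp(len(t), 2, 3, make_cost(t, y))
print(value, bounds)
```

The triple loop makes the number of cost evaluations grow as $r_0 N^2$, which is acceptable for short records; enforcing continuity would additionally require carrying the fitted boundary value along with each stored state.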
The extremal nature of the dependence of $J$ on $r$ is used to find the partition $T^H = (T_0, T_1^H, \ldots, T_{r_H - 1}^H, T_{r_H})$ when the number of intervals is unknown. Indeed, as $r$ increases, on the one hand the weights $n_j / (n_j - m)$ grow because the average $n_j$ decreases, which, other things being equal, increases criterion (1). On the other hand, as the number of intervals increases, the usual quadratic residual decreases, which decreases (1). The combined action of these two factors means that the functional (1) reaches its minimum value at some intermediate (not boundary) value $r_H$.
To determine $r_H$, the approach proposed in [Dmi10] can be used. First, the minimum values of the functional (1), $J_j(t_N)$, $j = 2, \ldots, r_{max}$, are calculated. At the same time, the values $M_{j-1}(T_j)$ of the optimal positions of the boundaries $T_{j-1}$ for each $T_j$ are stored. Next, the number of intervals is selected and the optimal boundaries are determined. As $r_H$, the smallest number of intervals $r$ at which $J_r(t_N)$ takes its minimum value is chosen. The optimal boundaries $T^H$ are determined from expressions analogous to (4):
$$T_{r_H - 1}^H = M_{r_H - 1}(t_N), \quad T_{r_H - 2}^H = M_{r_H - 2}(T_{r_H - 1}^H), \; \ldots, \; T_1^H = M_1(T_2^H).$$
The value $r_{max}$ is chosen for substantive or statistical reasons. In particular, $r_{max}$ can be taken as $[N / p]$, where $[x]$ is the integer part of $x$.
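The selection rule for the number of intervals is simple enough to state in code. The sketch below assumes the minimum criterion values $J_r(t_N)$ have already been computed for each candidate $r$ and are passed in as a dictionary; both function names are illustrative.

```python
from typing import Dict


def max_intervals(n_samples: int, p: int) -> int:
    """Upper bound r_max = [N / p] on the number of intervals."""
    return n_samples // p


def choose_number_of_intervals(j_by_r: Dict[int, float]) -> int:
    """Smallest r at which the tabulated criterion value is minimal.

    Iterating over r in ascending order makes min() return the
    smallest r among ties.
    """
    return min(sorted(j_by_r), key=lambda r: j_by_r[r])


# Hypothetical criterion values for r = 2..5.
values = {2: 9.0, 3: 1.5, 4: 1.5, 5: 2.0}
print(choose_number_of_intervals(values))
print(max_intervals(100, 7))
```

Preferring the smallest $r$ among equal minima favors the simpler description of the signal.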
3 Modeling
To check the efficiency of the algorithm, it was tested on specially generated multidimensional signals. The developed program made it possible to obtain a multidimensional signal with specified properties. The procedure is as follows. The vector function
$$y(t) = (y^{(1)}(t), \ldots, y^{(s)}(t)), \qquad (5)$$
is considered on the interval $[1, N]$, where
$$y^{(i)}(t) = \sum_{j=1}^{r^*} e_j(t) \left[ v_{j-1}^{(i)} + \left( v_j^{(i)} - v_{j-1}^{(i)} \right) \frac{t - T_{j-1}}{T_j - T_{j-1}} \right] + \eta^{(i)}(t), \quad i = 1, \ldots, s,$$
is a superposition of a piecewise linear continuous function and independent Gaussian noise $\eta^{(i)}(t)$ with zero mean and variance $\sigma^2$; $T_0 = 1$, $T_{r^*} = N$; $T_j$, $j = 1, \ldots, r^* - 1$, are the nodal points of the piecewise linear vector function; $v_j^{(i)}$, $j = 0, \ldots, r^*$, are the values of this function at the points $T_j$; $e_j(t)$ is the characteristic function, equal to one if $t \in (T_{j-1}, T_j]$ and equal to zero if $t \notin (T_{j-1}, T_j]$.
The nodal points $T_j$, the values $v_j^{(i)}$ of the piecewise linear functions at these points, and the number of intervals $r^*$ are determined randomly, using a random number generator, by the following procedure:
1. $j = 0$; $T_0 = 1$; $v_0^{(i)} = \xi_{0i} d$, $i = 1, \ldots, s$.
2. If $N - T_j < p$, then go to 4. Otherwise go to 3.
3. $j = j + 1$; $T_j = T_{j-1} + [\beta_j c + p]$ ($[x]$ is the integer part of $x$); $v_j^{(i)} = \xi_{ji} d$, $i = 1, \ldots, s$; then go to 2.
4. $T_j = N$; $r^* = j$.
Here $\xi_{ji}$ and $\beta_j$ are random, independent values uniformly distributed on the intervals $[-1; 1]$ and $[0; 1]$, respectively; $p$, $d$, $c$, $N$ are pre-selected parameters of the algorithm.
Next, the values $y^{(i)}(t)$ ($t = 1, 2, \ldots, N$; $i = 1, \ldots, s$) of the vector function (5) are calculated. Linear functions were used as local approximating functions:
$$F_j^{(i)}(t, a_j^{(i)}) = a_{j1}^{(i)} + a_{j2}^{(i)} t, \quad i = 1, \ldots, s, \quad j = 1, \ldots, r.$$
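A test signal of the kind described in this section can be produced with a few lines of standard-library Python. The sketch below is one possible reading of the generation procedure, under stated assumptions: the stopping test compares the remaining length $N - T_j$ with $p$, node values are uniform on $[-d, d]$, interval lengths are the integer part of a uniform draw on $[p, p + c]$, and the noise is passed as a standard deviation `sigma` rather than a variance. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def generate_nodes(N: int, p: int, c: float, d: float, s: int,
                   rng: random.Random) -> Tuple[List[int], List[List[float]]]:
    """Random nodal points T_0..T_r* and node values v[j][i]."""
    if N - 1 < p:
        raise ValueError("N is too small for a single interval")
    T = [1]
    v = [[rng.uniform(-1.0, 1.0) * d for _ in range(s)]]
    # Assumed stopping rule: add nodes while at least p samples remain.
    while N - T[-1] >= p:
        T.append(T[-1] + int(rng.uniform(0.0, 1.0) * c + p))
        v.append([rng.uniform(-1.0, 1.0) * d for _ in range(s)])
    T[-1] = N  # the last node is moved to the end of the record
    return T, v


def sample_signal(T: List[int], v: List[List[float]], sigma: float,
                  rng: random.Random) -> List[List[float]]:
    """Piecewise linear interpolation of the nodes plus Gaussian noise."""
    N, s = T[-1], len(v[0])
    y = [[0.0] * N for _ in range(s)]
    j = 1
    for t in range(1, N + 1):
        while t > T[j]:
            j += 1
        frac = (t - T[j - 1]) / (T[j] - T[j - 1])
        for i in range(s):
            clean = v[j - 1][i] + (v[j][i] - v[j - 1][i]) * frac
            y[i][t - 1] = clean + rng.gauss(0.0, sigma)
    return y


rng = random.Random(0)
T, v = generate_nodes(N=200, p=10, c=30.0, d=1.0, s=3, rng=rng)
y = sample_signal(T, v, sigma=0.2, rng=rng)
print(len(T) - 1, "intervals;", len(y), "components of length", len(y[0]))
```

With this reading every generated interval contains at least $p$ samples, because a new node is added only while at least $p$ samples remain and the final node is then moved to $N$.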
In the studies, the experimental material contained three groups of three-component multidimensional signals obtained using the procedure described above. The first group consisted of multidimensional signals without noise; the second and third groups contained signals with an average (or low) noise level (variance $\sigma^2 \in [0.01, 0.1]$) and an increased noise level ($\sigma^2 \in [0.1, 0.3]$), respectively. As expected, for noise-free signals the algorithm found the desired number of intervals, the partition, and the approximating functions without error.
Typical dependences of the criterion $J$ on $r$ for the second signal group are shown in Figure 1 (the ↑ sign in the figures indicates the actual number of intervals).
Figure 1: Average noise level (vertical axis: criterion J, ×0.1; horizontal axis: number of intervals r)
For signals with an average noise level, the minimum of the functional, as a rule, occurs at the desired number of intervals.
Typical dependences of the criterion $J$ on $r$ for the third signal group are shown in Figure 2.
Figure 2: Increased noise level (vertical axis: criterion J, ×0.1; horizontal axis: number of intervals r)
For signals with an increased noise level, a discrepancy between the optimal and the desired number of intervals occurred more often.
For both groups of signals, this discrepancy was observed when the multidimensional signal contained adjacent intervals on which the difference in the "behavior" of the signal was insignificant; as a rule, such intervals contained a small number of samples.
It can also be noted that for the signals of the second group the decreasing section of the graph of $J$ versus $r$ changes abruptly into the increasing section, whereas for signals with an increased noise level this transition is smooth. The latter can be used to construct optimization procedures that perform a limited search over $r$.
Conclusion
Thus, the proposed approach of constructing a piecewise continuous approximating function "left to right" makes it possible to apply dynamic programming to determine the boundaries of the partition intervals. The number of partition intervals is determined using the extremal behavior of the approximation quality criterion.
References
[Mot79] V. V. Mottl, I. B. Muchnik. Hidden Markov models in structural analysis of signals. M.: Fizmatlit, 1999.
[Dmi10] A. G. Dmitriev. The algorithm of optimal structural approximation of experimental multidimensional signals. Naukoemkie technologii, No. 9: 31-35, 2010.
[Kos04] A. A. Kostin, O. V. Krasotkina, M. V. Markov, V. V. Mottl, I. B. Muchnik. Dynamic programming algorithms for analysis of non-stationary signals. Computational Mathematics and Mathematical Physics, Vol. 44, No. 1: 62-77, 2004.
[Dor84] A. A. Dorofeyuk, A. G. Dmitriev. Methods for piecewise approximation of multidimensional curves. Automatics and telemechanics, No. 12: 101-108, 1984.
[Bel69] R. Bellman, S. Dreyfus. Applied problems of dynamic programming. M.: Nauka, 1969.