Научная статья на тему 'For piecewise smooth signals, 𝑙1 - method is the best among 𝑙𝑝 - methods: an interval-based justification of an empirical fact'

For piecewise smooth signals, 𝑙1 - method is the best among 𝑙𝑝 - methods: an interval-based justification of an empirical fact Текст научной статьи по специальности «Математика»

CC BY
36
8
i Надоели баннеры? Вы всегда можете отключить рекламу.
Область наук
Ключевые слова
КУСОЧНО-ГЛАДКИЙ СИГНАЛ / PIECEWISE SMOOTH SIGNAL / 𝑙1 МЕТОД / 𝑙1 METHOD / ИНТЕРВАЛЬНАЯ НЕОПРЕДЕЛЕННОСТЬ / INTERVAL UNCERTAINTY

Аннотация научной статьи по математике, автор научной работы — Kreinovich Vladik, Neumaier Arnold

Traditional engineering techniques often use the Least Squares Method (i. e., in mathematical terms, minimization of the 𝑙2 norm) to process data. It is known that in many real-life situations, 𝑙𝑝 methods with ≠2 lead to better results, and different values of are optimal in different practical cases. In particular, when we need to reconstruct a piecewise smooth signal, the empirically optimal value of is close to 1. In this paper, we provide a new theoretical explanation for this empirical fact based on ideas and results from interval analysis.

i Надоели баннеры? Вы всегда можете отключить рекламу.
iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.
i Надоели баннеры? Вы всегда можете отключить рекламу.

𝑙1 -метод наилучший для кусочно-гладких сигналов среди 𝑙𝑝 -методов: интервальное обоснование эмпирического факта

Традиционные методики инженерных расчетов часто используют метод наименьших квадратов (т. е., в математических терминах, минимизацию 𝑙2 -нормы) при обработке различных данных. Известно, что во многих реальных ситуациях 𝑙𝑝 -методы с 𝑝≠2 могут приводить к лучшим результатам, причем для различных практических случаев оптимальны различные значения 𝑝. В частности, при восстановлении кусочно-гладкого сигнала практически оптимальное значение близко к 1. В этой статье дано теоретическое объяснение этого экспериментального факта, основываясь на соображениях и фактах из интервального анализа.

Текст научной работы на тему «For piecewise smooth signals, 𝑙1 - method is the best among 𝑙𝑝 - methods: an interval-based justification of an empirical fact»

Вычислительные технологии

Том 22, № 2, 2017

For piecewise smooth signals, ^-method is the best among /^-methods: an interval-based justification of an empirical fact

V. KREINOVICH1'*, A. NEUMAIER2

1 University of Texas at El Paso, El Paso, TX 79968, USA 2University of Vienna, Vienna, A-1090, Austria

*Corresponding author: Kreinovich, Vladik, e-mail: vladik@utep.edu

Traditional engineering techniques often use the Least Squares Method (i.e., in mathematical terms, minimization of the ¿2-norm) to process data. It is known that in many real-life situations, P-methods with p = 2 lead to better results, and different values of p are optimal in different practical cases. In particular, when we need to reconstruct a piecewise smooth signal, the empirically optimal value of p is close to 1. In this paper, we provide a new theoretical explanation for this empirical fact based on ideas and results from interval analysis.

Keywords: piecewise smooth signal, ^-method, interval uncertainty.

1. Formulation of the problem

Z2-methods: brief reminder. Traditional engineering techniques frequently use the Least Squares Method — LSM (i.e., in mathematical terms, the /2-norm) to process data. For example, if we know that measured values b1,..., bm are related to the unknowns x1,..., xn by the known dependence

n

} ^ Aij Xj ~ bi,

3=1

and we know the accuracy ai of each measurement, then the LSM means that we find the values Xj for which the function

m / i га \ 2

у=^ - y, - ч 1=1 J

takes the smallest possible value.

By the Gauss — Markov Theorem [1], this method is provably optimal (being the best linear unbiased estimator) under the assumption that the measurement errors

Abj d=f Y АцXj - bi

3 = 1

ICT SB RAS, 2017

are uncorrelated with zero mean and standard deviation a». In addition, if the Abi are independent and normally distributed, then the maximum likelihood method [2, 3], which requires p(Ab1,..., Abn) ^ max, takes the form

Since the logarithm is a strictly increasing function, and the logarithm of a product p1 ■ ... ■ pn is equal to the sum of the logarithms, maximizing the maximal likelihood is equivalent to minimizing the sum of negative logarithms — log(p^) of pi, i. e., minimizing the sum

with ^>(x) = x2. We thus get the Least Squares Method.

Similarly, if we know that the next value Xi+1 is close to the previous value Xi of the desired signal, and that the average difference between x+\ — Xi is about ai, then we can use LSM to find the values Xi which minimize the sum

M-methods: brief reminder. In many practical situations, different measurement errors are independent, but the distribution may be different from normal; see, e.g., [4-6]. In this case, the maximum likelihood method is still equivalent to minimizing the sum (1), but with a different function ^>(x) = — log(p(x)).

In many other practical situations, we know that the distribution is not normal, but we do not know its exact shape. In such situations of robust statistics, we can still use a similar method, with an appropriately selected function ^>(x). Such methods are called M-methods; see e.g^ [2, 3, 7].

In such situations, if we know that the next value xi+1 is close to the previous value Xi of the desired signal, and that the average difference between x+\ — Xi is about ai, then we can use LSM to find the values Xi which minimize the sum

/^-methods: a brief reminder. Among different M-methods, empirically, /p-meth-ods — with ip(x) = |x|p for some p > 1 — turn out to be the best for several practical applications; see, e.g., [8]. In this case, we select a signal (= tuple) Xi for which the value

p(A&i,..., Abn) = pi(A&i) ■... ■ pn(Abn),

where

(1)

is the smallest possible. These methods have been successfully used to solve inverse problems in geophysics; see, e.g., [9, 10].

In [11], the empirical success of /p-methods was theoretically explained: it was shown that /p-methods are the only scale-invariant ones, and that they are the only methods optimal with respect to all reasonable scale-invariant optimality criteria. It is therefore reasonable to use /p-methods for processing data.

/^-methods: how to select p. The above-mentioned justification explains that with respect to each optimality criterion, one of the /p-methods is optimal — but does not explain which one. It is known that in different practical situations, different values of p lead to the best signal reconstruction.

For example, in the situation when the errors are normally distributed, p = 2 is the best value. For other situations, we may get p = 1 or p E ]1, 2[.

In each situation, we must therefore empirically select p — e. g., by comparing the result of data processing with the actual (measured) values of the reconstructed quantity.

Empirical fact. In several situations, we know that the reconstructed signal is piece-wise smooth. For example, in geophysics, the Earth consists of several layers with abrupt transition between layers; in image processing, an image often consists of several zones with an abrupt boundary between the zones, etc.

It turns out that in many such situations, the empirically optimal value of p is close to 1; see, e.g., [9] for the inverse problem in geophysics, and [12-15] for image reconstruction.

How this fact is explained now (see, e. g., [12]). In the continuous approximation, the /p-criterion leads to the minimization of f |i|p dt (in the 1D case; multidimensional case is handled similarly). For a transition of magnitude C and width e, the derivative x is ~ C/e, so the contribution of the transition zone to the integral is of order e/ep = e-(p-1). For p > 1, when e ^ 0, this contribution tends to x>. Thus, for p > 1, the minimum is never attained at the discontinuous transition ("jump") e = 0, but always at a smoother transition e > 0.

For p =1, the contribution is finite, so jumps are not automatically excluded — and indeed, they may be correctly reconstructed.

Limitations of our explanation. There are two limitations to this explanation:

• first, it explains why /p-methods for p > 1 do not reconstruct a jump, but it does not explain why I1 methods reconstruct the jump correctly;

• second, it strongly relies on the continuous case and does not fully explain why a similar phenomenon occurs for real-life (discretized) computations.

What we do in this paper. In this paper, we provide a new interval-based theoretical explanation for the above mentioned empirical fact, an explanation that is directly applicable to real-life (discretized) computations.

2. Analysis of the problem and the main results

For simplicity, we will consider 1-D signals x(t). In the interval setting, for several moments of time t1 < ... < tn (usually, equidistant ti = t1 + (i — 1)Ai), we know the intervals i — [Xi, Xi ] that contain the actual (unknown) values xi = x(tj). Based on this interval information, we would like to select the values xi E xi. According to the lp-criterion, among all the tuples (x\,..., xn) for which x\ E X\, ... , ^n E Xn, ^ve select the one for which the value

p

n— 1 £

i=1

(Ti

is the smallest possible.

To select p, we will consider the case of a "transition zone", i. e., the case when for some values I < u, we know two things:

• that the value xi-1 right before the zone cannot be equal to the value xu+1 right after the zone — i. e., that xl-1 If xu+1 = 0; and

• that we have (practically) no information about the values of Xi inside the zone — i. e., at least that for all i from I to u, the interval Xi contains both xl-1 and xu+1.

In this case, the above criterion interpolates the values Xi inside the zone. If we assumed that the signal is smooth, then, no matter how steep the transition, we would have had a smooth interpolation. However, since we consider the situations when the signal is only piecewise smooth, we would rather prefer to have a signal which "jumps" discontinuously from one value to another.

In this section, we will show that for p = 1, we will indeed get such a jump, while for p > 1, we have a smooth transition instead. Let us describe this result in precise terms.

Definition 1. By an /p-problem, we mean the following problem:

GIVEN: n intervals Xi =[x1,X\], ..., [xn,xn], n real numbers a1,... ,an,

and a real number p > 1; AMONG: tuples x1... ,xn such that Xi E [x^Xi] for every i;

FIND: the tuple for which the value V

n— 1

E

i=1

(Ti

is the smallest possible.

Definition 2. An lp-problem is called degenerate if all the values ai are different.

Comment. Almost all combinations a1,... ,an are degenerate.

Definition 3. Let I < u be integers. We say that an lp-problem contains a transition zone between I and u if the following two conditions hold (Fig. 1):

• Xi-1 fl xu+1 = 0; and

• for all i from I to u, we have Xi D xl-1 and Xi D xu+1.

Proposition 1. For p = 1, for each solution Xi to a non-degenerate lp-problem, in each transition zone, we have xl-1 = xl = ... = xt and xt+1 = ... = xu = xu+1 for some t.

In other words, for p = 1, in each transition zone, we have a "jump" from the value xl-1 before the transition zone to the value xu+1 after the transition zone (Fig. 2).

Comment. In the degenerate case, when different values ai are equal, the jump is still an optimal solution, but we may also get other solutions, with a smooth transition from

I- 1

u + 1

I-1

u +1

p

I

u

u

Fig. 1

Fig. 2

xl-1 to xu+1. For example, if all the values are the same, then, as one can easily see, the minimized criterion is proportional to the sum

n— 1

'y ] lXi — Xi+11.

i=1

So, for each solution that monotonically changes from xl—1 to xu—1, the corresponding part

u

^^ lXi — Xi+11 i=l—1

of the sum is equal to |xl—1 — xu+11. Thus, the value of the minimized criterion is the same for the jump solution and for a different solution in which Xi is the same outside [/ — 1,u + 1] — but strictly monotonically changes between I — 1 and u + 1.

Proposition 2. For p > 1, for each solution Xi to an lp-problem, in each transition zone, we have a strictly monotonic sequence xl—1 < xl < ... < xu < xu+1 or xl—1 > xl >

. . . > Xu > Xu+1.

Proposition 3. For p > 1, in the limit when all the values ai tend to the same value a, each solution Xi to an lp-problem, in each transition zone, is linear, i. e., has the form Xi = a + bi for some numbers a and b (Fig. 3).

These results explain why p « 1 is indeed empirically best for processing piecewise smooth signals: only for p = 1, ^-interpolation leads to a piecewise smooth signal.

Comment. The fact that ^-methods are the best among /p-methods does not mean that they are always the best possible interpolation techniques. For example, the above results show that, with an ^-method, we always get a jump, both

• for the steep transition from xl—1 to xu+1, where such a jump is desirable, and

• for a smoother transition from xl—1 to xu+1, where, from the physical viewpoint, we may want to prefer a smooth interpolation.

In other words,

• for small differences Xi — xi+1, we would like to have smooth transitions, while

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

• for large differences Xi — xi+1, we would like to have a jump.

Since a jump is reconstructed when ^>(x) = |x| and a smooth transition, when, e. g., ^>(x) = |x|2, a natural idea is to use a Huber function vp(x) which is equal to |x|2 when |x| is below

Fig. 3

a certain threshold x0, and which is linear ^(x) = C ■ |x| for | x | > Xq ; from the requirement that the function ^>(x) be continuous, we conclude that C = = C ■ |x0|, i.e., that C = x0. Such technique indeed leads to a better reconstruction of piecewise smooth signals; see, e. g. [12] and references therein. Various related choices for ^>(x) have been explored in the context of computer tomography by Kaufman and Neumaier [16, 17].

Huber's function ^>(x), in its turn, has its own limitations; it is worth mentioning that in general, the problem of optimally reconstructing piecewise smooth 2-D signals is NP-hard; see, e.g. [18-20].

3. Proofs

1. First, we observe that the solution to an /p-problem minimizes a continuous function V on a bounded closed set (box) x1 x ... x xn. Thus, this minimum is always attained, i. e., a solution always exists.

2. Let us prove that for every p, the solution xi to the /p-problem is (non-strictly) monotonic in each transition zone, i.e., that xi—1 < xi < ... < xu < xu+1 or xi—1 > xi >

. . . > Xu > Xu+1.

Let us prove this result by reduction to a contradiction. Namely, let us assume that the solution is attained on some non-monotonic sequence. The fact that xi is not monotonic on the transition zone means that not all inequalities between the neighboring values are of the same sign, i.e., that we have xi—1 < xi and Xj > Xj+1 for some indices i and j from this zone. Among such pairs (i,j), let us select a one with the smallest distance |z — j| between i and j.

Without losing generality, we can assume that i < j in this selected pair.

For the selected pair, for indices k between xi and Xj, we cannot have Xk < xk+1 or Xk > xk+1 — otherwise we would get a pair with an even smaller difference |z — j|. Thus, for all intermediate indices k, we get xk = xk+1. Since Xi — Xi+1 — ... — Xj, we thus have Xi, = Xj. So, we have xi—1 < xi = ... = Xj > Xj+1. Let e = min(xi — xi—1,xj — Xj+1 ). Let us now keep all the x-values outside (i,j) intact and replace Xi ... 'X j with the values Xi £ — ... — Xj £. The resulting value xi — e is equal to either xi—1 E xi—1 or to Xj+1 E Xj+1. By the definition of a transition zone, all intermediate intervals xk contain both xi—1 and Xj+1. Hence, the new value of xk is within the corresponding interval xk.

By making this change, we decrease the differences |xi — xi—]\ and lxj+1 — Xj| and leave all other differences intact — and hence, we decrease the value of the minimized objective function V (Fig. 4).

Since the objective function V attains its minimum at the original tuple xi, the possibility to minimize even further is a contradiction. This proves that the solution is monotonic in each transition zone.

3. For the solution, we have xi—1 < xi < ... < xu < xu+1 or xi—1 > xi > ... > xu > xu+1 according to Part 2 of this proof. To complete the proof of Proposition 2, it is now sufficient to prove that for p =1 and for k = I,... ,u, we cannot have any strictly intermediate values

Xk E (Xi—1,Xu+1) (or Xk E (Xu+1,Xi—1)).

We will prove this ad absurdum. Let us assume that an intermediate value xk does exist. In principle, we may have values equal to xk. Due to monotonicity, these values form an interval within [l,u\. Let Xb be the first value equal to xk, and let xe be the last value equal to xk. Then, we have ... < xb—1 < xb = ... = xe < xe+1 < ...

Xi X • Xj .. X

X Xi—l 1 1 X Xj+l

xb X . Xe . X X xe+l

X Xb—l 1 1

Fig. 4

Fig. 5

Let us now choose a value e E [xb—l — xb,xe+l — xe], keep all the x-values from outside [b, e] intact, and replace all the x-values from [b, e] with xb + e = ... = xe + e. Similarly to Part 2 of this proof, we can show that for every e, we still have xb + £ E xb, ..., xe + e E xe. After this replacement, we change only two differences lxi+l — Xi[.

• the difference — Xb —l1 %b—

i is replaced with Xb ^b— i + e, and

• the difference lxe+l Xe | 'Xe |l 'Xe is replaced with xe+l — xe — e.

Thus, after this replacement, the original value V of the minimized objective function is replaced with V + AV, where (Fig. 5)

AV = e

(- —1)

\&b—l OeJ

Since the problem is non-degenerate, i. e., all the values Oi are different, the coefficient at e in AV is non-zero. If this coefficient is positive, we can take negative £ and decrease V; if it is negative, we can decrease V by taking £ > 0. In both cases, we get a contradiction with the fact that the original tuple xi minimizes V. This contradiction proves that intermediate values are impossible. Proposition 2 is proven.

4. Let us now prove that the solution is strictly monotonic for p > 1, using reduction to a contradiction once again.

We assume that the solution is not strictly monotonic, while usual monotonicity holds (Part 2 of the proof). Since it is monotonic, the only way for the solution to be not strictly monotonic is to have xi = xi+l for some index i. We may have several indices with an x-value equal to this let b be the first such index, and let e be the last such index. Then,

Xb Xb+l ... Xe.

Since the intervals Xi—l and xu+l have no common points, we cannot have xi—l = xu+l. Thus, either b = I — 1 or e = u + 1. Without losing generality, we can assume that b = I — 1. Also, without losing generality, we can assume that the solution xi is increasing. Thus, we have xb—l < xb = xb+l.

Let us now pick a small value £ > 0 and replace xb with xb — £ — while leaving all other x-valued intact (Fig. 6).

xb Xb+l

X X

X Xb—l

Fig. 6

This replacement changed the original value V of the minimized function with a new value V + AV, where

(Xb — Xb—1 — e)p £p (Xb — Xb—1)p

=-p--1 p —

p p p 1 °b °b-1

By applying the first term of Taylor expansion to the first ratio in the expression for AV, one can conclude that

AF = —p(Xb — °:b—1)P~l e + 0(e2) + ^. °b—1

We consider the case p > 1. Then, for sufficiently small e, the first term dominates, so the difference AV is negative — which means that we can further decrease V.

This possibility contradicts to the fact that the tuple xi minimizes V. Thus, the solution is indeed strictly monotonic. Proposition 2 is proven. 5. Let us now prove Proposition 3.

By definition of the transition zone, for each index i from this zone, we have xi—1 Ç xi, hence x^ 1 E x^ 1 Ç xi and x i—1 E [x_i, x i ] — thence xi < xi—1. Similarly, from xu+1 Ç Xj,, we conclude that xu+1 < xi.

Due to strict monotonicity (Part 4 of this proof), we have xi—1 < xi < xu+1. Thus, xi < Xi—1 < Xi and and similarly, Xi ^^ ^Ci.

Since the value xi is strictly inside the interval xi, the derivative of the minimized function V is equal to 0. Differentiating V relative to xi (and taking monotonicity into account), we conclude that

p(xi — Xi—1)p—lapi—1 — p(xi+1 — Xi)p—lvpi = 0.

When ai ^ a, we get xi — xi—1 = xi+1 — xi. So, the difference xi — xi—1 is indeed the same for all i within the transition zone. Thus, we get the desired linear dependence of xi on i. The proposition is proven.

Acknowledgement. This work was supported in part by NSF grants EAR-0225670 and DMS-0532645 and by Texas Department of Transportation grant No. 0-5453. The authors are thankful to the participants of the Intern. Conf. on Finite Element Methods in Engineering and Science FEMTEC'2006 (El Paso, Texas, December 11-15, 2006) for valuable discussions, and to the editors, especially to Sergey P. Shary, for their encouragement and help.

References

[1] Zyskind, G., Martin, F.B. On best linear estimation and a general Gauss — Markov theorem in linear models with arbitrary nonnegative covariance structure // SIAM J. Appl. Math. 1969. Vol. 17. P. 1190-1202.

[2] Sheskin, D.J. Handbook of parametric and nonparametric statistical procedures. 3rd edition. Boca Raton, Florida: Chapman & Hall/CRC, 2003. 1184 p.

[3] Wadsworth, H.M. Handbook of statistical methods for engineers and scientists. New York: McGraw-Hill, 1990. 736 p.

[4] Novitskiy, P.V., Zograph, I.A. Estimating the measurement errors. Leningrad: Energoa-tomizdat, 1991. 304 p. (In Russ.)

Новицкий П.В., Зограф И.А. Оценка погрешностей результатов измерений. Л.: Энер-гоатомиздат, 1991. 304 с.

[5] Orlov, A.I. How often are the observations normal? // Industrial Laboratory. 1991. Vol. 57, No. 7. P. 770-772.

[6] Rabinovich, S.G. Measurement errors and uncertainties: theory and practice. N.Y.: Springer-Verlag, 2005. 308 p.

[7] Huber, P.J. Robust statistics. N.Y.: Wiley, 1981. 320 p.

[8] Bickel, P.J., Lehmann, E.L. Descriptive statistics for nonparametric models. 1. Introduction // Ann. Statist. 1975. Vol. 3. P. 1045-1069.

[9] Doser, D.I., Crain, K.D., Baker, M.R., Kreinovich, V., Gerstenberger, M.C. Estimating uncertainties for geophysical tomography // Reliable Computing. 1998. Vol. 4, No. 3. P. 241-268.

[10] Tarantola, A. Inverse problem theory: methods for data fitting and model parameter estimation. Amsterdam: Elsevier, 1987. 613 p.

[11] Nguyen, H.T., Kreinovich, V. Applications of continuous mathematics to computer science. Dordrecht: Kluwer, 1997. 419 p.

[12] Ascher, U.M., Haber, E., Huang, H. On effective methods for implicit piecewise smooth surface recovery // SIAM J. Sci. Comput. 2006. Vol. 28, No. 1. P. 339-358.

[13] Feng, H., Castanon, D.A., Karl, W.C. Underground imaging based on edge-preserving regularization // Proc. of the IEEE Intern. Conf. on Information Intelligence and Systems, October 31 — November 3, 1999, Bethesda, Maryland, IEEE Computer Society, Los Alamitos, California. P. 460-464.

[14] Nikolova, M. Minimizations of cost-functions involving nonsmooth data fidelity terms // SIAM J. Numer. Anal. 2002. Vol. 40. P. 965-994.

[15] Zhou, J., Shu, H., Xia, T., Luo, L. PET image reconstruction using Mumford-Shah regu-larization coupled with L1 data fitting // Proc. of the 27th Annual IEEE Conf. on Engineering in Medicine and Biology, China, September 1-4, 2005. P. 1905-1908.

[16] Kaufman, L.C., Neumaier, A. PET regularization by envelope guided conjugate gradients // IEEE Trans. Medical Imag. 1996. Vol. 15. P. 385-389.

[17] Kaufman, L.C., Neumaier, A. Regularization of ill-posed problems by envelope guided conjugate gradients //J. Comput. Graph. Stat. 1997. Vol. 6. P. 451-463.

[18] Averill, M.G., Miller, K.C., Keller, G.R., Kreinovich, V., Araiza, R., Starks, S.A.

Using expert knowledge in solving the seismic inverse problem // Proc. of the 24th Intern. Conf. of the North American Fuzzy Information Processing Society NAFIPS'2005, Ann Arbor, Michigan, June 22-25, 2005. P. 310-314.

[19] Averill, M.G., Miller, K.C., Keller, G.R., Kreinovich, V., Araiza, R., Starks, S.A.

Using expert knowledge in solving the seismic inverse problem // Intern. J. of Approximate Reasoning. 2007. Vol. 45, No. 3. P. 564-587.

[20] Cabrera, S.D., Iyer, K., Xiang, G., Kreinovich, V. On inverse halftoning: computational complexity and interval computations // Proc. of the 39th Conf. on Information Sciences and Systems CISS'2005, John Hopkins University, March 16-18, 2005. Paper 164.

Received 24 January 2017

i Надоели баннеры? Вы всегда можете отключить рекламу.