Math-Net.Ru
A. A. Gudkov, S. V. Mironov, S. P. Sidorov, S. V. Tyshkevich, A dual active set algorithm for optimal sparse convex regression, Vestn. Samar. Gos. Tekhn. Univ., Ser. Fiz.-Mat. Nauki [J. Samara State Tech. Univ., Ser. Phys. Math. Sci.], 2019, Volume 23, Number 1, 113-130
DOI: https://doi.org/10.14498/vsgtu1673
Use of the all-Russian mathematical portal Math-Net.Ru implies that you have read and agreed to these terms of use http://www.mathnet.ru/eng/agreement
Download details: IP: 109.252.26.93 November 29, 2019, 20:53:53
Vestn. Samar. Gos. Tekhn. Univ., Ser. Fiz.-Mat. Nauki
[J. Samara State Tech. Univ., Ser. Phys. Math. Sci.], 2019, vol. 23, no. 1, pp. 113-130 ISSN: 2310-7081 (online), 1991-8615 (print) d https://doi.org/10.14498/vsgtu1673
Mathematical Modeling, Numerical Methods and Software Complexes
MSC: 90C20, 65K05, 62G08
A dual active set algorithm for optimal sparse convex regression
A. A. Gudkov, S. V. Mironov, S. P. Sidorov, S. V. Tyshkevich
N. G. Chernyshevsky Saratov State University (National Research University),
83, Astrakhanskaya st., Saratov, 410012, Russian Federation.
Abstract
The shape-constrained problems in statistics have attracted much attention in recent decades. One of them is the task of finding the best fitting monotone regression. The problem of constructing monotone regression (also called isotonic regression) is to find best fitted non-decreasing vector to a given vector. Convex regression is the extension of monotone regression to the case of 2-monotonicity (or convexity). Both isotone and convex regression have applications in many fields, including the non-parametric mathematical statistics and the empirical data smoothing. The paper proposes an iterative algorithm for constructing a sparse convex regression, i.e. for finding a convex vector z G 1" with the lowest square error of approximation to a given vector y G 1" (not necessarily convex). The problem can be rewritten in the form of a convex programming problem with linear constraints. Using the Karush-Kuhn-Tucker optimality conditions it is proved that optimal points should lie on a piecewise linear function. It is proved that the proposed dual active-set algorithm for convex regression has polynomial complexity and obtains the optimal solution (the Karush-Kuhn-Tucker conditions are fulfilled).
Keywords: dual active set algorithm, pool-adjacent-violators algorithm, isotonic regression, monotone regression, convex regression.
Research Article
3 ©® The content is published under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/) Please cite this article in press as:
Gudkov A. A., Mironov S. V., Sidorov S. P., Tyshkevich S. V. A dual active set algorithm for optimal sparse convex regression, Vestn. Samar. Gos. Tekhn. Univ., Ser. Fiz.-Mat. Nauki [J. Samara State Tech. Univ., Ser. Phys. Math. Sci.], 2019, vol. 23, no. 1, pp. 113-130. doi: 10.14498/vsgtu1673. Authors' Details:
Alexander A. Gudkov © https://orcid.org/0000-0002-6472-4855 Laboratory Assistant; Risk Institute; e-mail: [email protected] Sergei V. Mironovte https://orcid.org/0000-0003-3699-5006
Cand. Phys. & Math. Sci.; Associate Professor; Dept. of Mathematic Cybernetics and Computer Science; e-mail: [email protected]
Received: 22nd January, 2019 / Revised: 2nd March, 2019 / Accepted: 4th March, 2019 / First online: 5th March, 2019
Introduction. For a given vector z = (z1,...,zn)T from Rra, n £ N, the finite difference operator of order 2 is defined as follows
where A1Zi = Zi+1 — Zi is the finite difference operator of order 1 at Zi.
A vector 2 = (z1,..., zn)T £ Rra is said to be convex if and only if A2Zi ^ 0 for each 1 ^ i ^ n — 2. A vector z = (z1,..., zn)T £ Rra is called 1-monotone (or monotone) if Zi+1 — Zi ^ 0, i = 1,... ,n — 1, and convex vectors also may be called 2-monotone vectors.
The shape-constrained problems in statistics (the task of finding the best fitting monotone regression is one of them) have attracted much attention in recent decades [1-6]. One of most examined problems is the problem of constructing monotone regression (also called isotonic regression) which is to find best fitted non-decreasing vector to a given vector. One can find the detailed review on isotone regression in the work of Robertson and Dykstra [7]. The papers of Barlow and Brunk [8], Dykstra [9], Best and Chakravarti [10], Best [11] consider the problem of finding monotone regression in quadratic and convex programming frameworks. Using mathematical programming approach, the works [12-14] have recently provided some new results on the topic. The papers [15,16] extend the problem to particular orders defined by the variables of a multiple regression. The recent paper [1] proposes and analyzes a dual active-set algorithm for regularized monotonic regression.
fc-monotone regression is the extension of monotone regression to the general case of fc-monotonicity [17]. Both isotone and fc-monotone regression have applications in many fields, including the non-parametric mathematical statistics [3,18], the empirical data smoothing [19-21], the shape-preserving dynamic programming [22], the shape-preserving approximation [23-25].
Denote Ag the set of all vectors from Rra, which are convex. The task of constructing convex regression is to obtain a vector z £ Rra with the lowest square error of approximation to the given vector y £ Rra (not necessarily convex) under condition of convexity of z:
In this paper using the ideas of [26] we develop a new algorithm (the dual active set algorithm) for finding sparse convex regression. We prove that the dual
Sergei P. Sidorov® https://orcid.org/0000-0003-4047-8239
Dr. Phys. & Math. Sci.; Head of Department; Dept. of Functions and Approximations Theory; e-mail: [email protected]
Sergei V. Tyshkevich © https://orcid.org/0000-0001-5417-6165
Cand. Phys. & Math. Sci.; Associate Professor; Dept. of Functions and Approximations Theory; e-mail: tyszkiewicz@yandex. ru
A2Zi = A1zi+i - A1zi = zi+2 - 2zi+i + Zi, 1 ^ i ^ n - 2,
(1)
active set algorithm proposed in this paper is optimal, i.e. the solutions obtaining by the algorithm are optimal.
1. The dual active set algorithm for convex regression. Preliminary Analysis. The problem (1) can be rewritten in the form of a convex programming problem with linear constraints as follows
F(z) = 1 zTz — yTz ^ min,
where minimum is taken over all z £ Rn such that
gi(z) := —(zi+2 — 2zi + Zi) ^ 0, 1 ^ i ^ n — 2.
(2)
(3)
The problem (2)-(3) is a quadratic programming problem and is strictly convex, and therefore it has a unique solution.
Let z be a (unique) global solution of (2)-(3), then there is a Lagrange multiplier ^ = (^i,..., [i!n_2)T G R"-2 such that
n-2
VF (z) + J2 ViVgi(z)=0,
i=i
Qi(z) ^ 0, 1 ^ i < n — 2, ßi ^ 0, 1 ^ i ^ n — 2, ßi9i(z) =0, 1 ^ i ^ n — 2,
(4)
(5)
(6) (7)
where Vgi denotes the gradient of §i. The equations (4)-(7) are the well-known Karush-Kuhn-Tucker optimality conditions. It follows from (4) that
d_ dzi
1 n n-2
— yi)2 + Vi(—Zi+2 + 2Zi+i — Zi)
i=l
/ j I
i=l
= 0, 1 < i < n- 2,
i.e.
zi — yi — ßi =0
Z2 — V2 — ß2 + 2ßi = 0
¿3 — y3 — ß3 + 2/J.2 — ßi = 0
< Zj — yj — /J.j + 2^-1 — /j.j-2 = 0 (8)
Zn-2 — Vn-2 — №n-2 + 2^n-3 — №n-4 = 0 Zn-1 — Vn-1 + 2№n-2 — ^n-3 = 0 Zn — yn — V-n-2 = 0
One can ask the natural question: is it possible to reduce the problem of constructing convex regression for points y = (y1,y2,... ,yn) to the problem of constructing monotone regression for points Ay = (y2 — y1,y3 — y2,... ,yn — yn-1)? The answer to this question is negative. First of all, the Karush-Kuhn-Tucker optimality conditions for these problems are not identical. Secondly, we give a simple example showing that it is not possible to lower the order of monotonicity.
If we take y = (1,0,1,0,1) then Ay = (—1,1, —1,1). Optimal monotone regression for Ay is Az = (—|, — 1, — 1, 1). The convex regression recovered from Az is (1, 2,1, 0,1). The square error of approximation to y = (1, 0,1, 0,1) is equal to 0 + (2)2 + (2)2 + 0 + 0 = §.
On the other hand, if we take the convex sequence z = (1, |, |, |, 1), then the error of approximation to the given vector y = (1,0,1, 0,1) is equal to
0 + (1)2 + (I)2 + (3)2 + 0 = 5, which is less than the error for the convex regression recovered from monotone regression for Ay.
If one sums up all equalities in (8) then
n n
E= E y- (9)
i=i i=i
If we multiply s-th equality in (8) by s and sum up the resulting equalities, we obtain
n n
J2iZi = Y1iy%. (10)
i=i i=i
Thus, equalities (9)-(10) are necessary conditions for optimal solution.
Preliminary analysis of (4)-(7) shows that the first order differences of the optimal solution Z should be sparse, i.e. the sequence A1 Zi, 1 ^ i ^ n — 2, should have many zeroes. It follows from the Karush-Kuhn-Tucker conditions that optimal points should lie on a piecewise linear function. Note that some of these linear functions may be linear regressions constructed from the points of corresponding blocks. In this case the necessary conditions (9)-(10) are fulfilled.
A dual active-set algorithm for convex regression. In this subsection we propose an algorithm such that
- it is with polynomial complexity, i.e. the number of operations required to complete the algorithm for a given input y from Rra is 0(nk) for some nonnegative integer k;
- the solution is convex (the reachability analysis will show this);
- the solution is optimal (the Karush-Kuhn-Tucker conditions are fulfilled).
The proposed algorithm uses so called active set. The active set S consists
of blocks of the form [I, r — 2] C [1,n — 2], such that [l,r — 2] C 5, I — 1 / S, r — 1 / S, and
S = [li,ri\ U [¿2, n] U ... U [lm-i,rm-i\ U [I
where l1 ^ 1, rm ^ n — 2, fi + 3 ^ li+i, i S [1,to — 1], and m is the number of blocks. If ri = li then the f-th block consists of only one point.
Points zri,zri+i,..., zii,zii+i, zii+2 corresponding to the f-th block (plus two points from right) lies on a linear line, for each i.
At each iteration of the algorithm, the active set S C [1,n — 2] is chosen and the corresponding optimization problem is solved
1 n
— yrf ^ min, (11)
i=i
where the minimum is taken over all z £ Rn satisfying
Zi — 2Zi+1 + Zi+2 =0 V i £ S.
(12)
Note that there exists a unique solution to the problem (11)-(12). Let us denote the solution by z(S).
The dual active-set algorithm for convex regression begin
■ Input y £ Rn;
■ Active set S = 0;
■ Initial point z(S) = y;
■ while z(S) £ AV; do
■ set 5 ^ 5 U [n : Zi(S) — 2z+1(S) + z^S) < 0};
■ solve (11)-(12) using points from the active set S;
■ update z(S);
■ Return z(S); end
The computational complexity of the dual active set algorithm for convex regression is 0(n3). It follows from two remarks:
- at each iteration of the algorithm, the active set S is attaching, at least, one index from [1 : n — 2], which means that the number of the while loop iterations can not be greater that n — 2.
- the computational complexity of solving the problem (11)-(12) is 0(n2). 2. The convergence analysis of the dual active set algorithm. Lemmas. The next lemma finds the values of Lagrange multipliers.
Lemma 1. Let z be a global solution of (2)-(3). Then the values of the Lagrange multipliers ^ = (^1,..., ^n-2)T £ Rn-2 defined in (4)-(7) are
Proof. Lemma can be proved by induction. If i = 1 then it follows from (8)
Vi = — 3 + 1){zj — Vj), 1 < i < n — 2.
3 = i
(13)
that
zi — yi = ßi.
(14)
Then from (8) and (14) we obtain
Z2 — y2 = ß2 — 2ßi = ß2 — 2(Zi — yi),
and therefore
ß2 = (Z2 — V2)+2(Zi — yi).
Suppose that (13) is fulfilled for i > 2. Then it follows from (8) that
Zi+1 — Vi+1 = №i+1 — 2Vi + lJ■i-1, and consequently we have
Mi+1 = (zi+1 - Vi+l) + - Vi-1 =
i i— 1
= (Zi+i - Vi+l) + 2 J2(i - 3 + 1)(zj - Vj) - (i - i)(zi - yi) =
3=1 ]=i
i—1
= (Zi+1 - yi+1) + 2(Zi - yi") + Y^ (2(i - 3 + !) - (i - 3))(zj - Vj) =
3 = 1
i—1
= (Zi+1 - Vi+1) + 2(Zi - yi) + J2(i - 3 + 2)(zj - yj) =
j=1
i+1
= J2(i - 3 +2)(zj - yj). □
j=1
Lemma 2. Let 1 <E S, i.e. A2y1 < 0, and assume that 2, 3 ¡/ S. Let z1, z2, z3 be the values of linear regression constructed for input points (1,y1), (2,y2), (3,y3) (see Fig. 1(a)). Then the values of corresponding Lagrange multipliers defined in (13) are non-negative.
Proof. We will show that
^1 = Z1 - y1 ^ 0, ^2 = 2(^1 - y1) + (Z2 - y2) = 0, № = 3(^1 - y{) + 2(z2 - y2) + (¿3 - y:t) = 0.
We have ^1 = z1 - y1 > 0, since y1,y2,y3 lie on a concave line. It follows from (9) and (10) that
3
- Vi) = 0 (15)
i=1
and
3
J^i(Zi - yi) = 0. (16)
i=1
• yi • yi
• Zi • Zi
3
(a)
3 (b)
Figure 1. (a) The case 1 € S, i.e. A2yi < 0, and 2,3 € S, z1, z2, z3 are the values of linear regression constructed for input points (1 ,y1), (2,y2), (3,y3). (b) The case 1, 3 € S, i.e. A2y1 < 0, A2y3 < 0, and 2,4, 5 € S, Zi's are the solutions to the problem (17)
1
2
4
5
1
2
4
5
If we multiply (15) by 3 and subtract (16), we get 2(z1 — y1) + (z2 — y2) = 0, and therefore = 0.
If we multiply (15) by 4 and subtract (16), we get 3(z1 — y1) + 2(z2 — y2) + (z3 — y3) = 0, and therefore = 0. □
Lemma 3. Let y1,y2,y3,y4,y5 be such that 1, 3 E S, i.e. A2yi < 0, A2y3 < 0, and assume that 2, 4, 5 E S (see Fig. 1(b)). Let Zi be the solutions to the optimization problem
5
](zi — yi)2 ^ min, (17)
i=i
where minimum is taken over all z = (z1,..., z5)T such that A2z1 = A2z3 = 0. Then the values of corresponding Lagrange multipliers ^1,... defined in (13) are non-negative.
Proof. Since A2z1 < 0 we have ^1 = z1 — y1 ^ 0. The Lagrangian for the
constrained optimization problem (17) is 1 5
L(z, X1, X2) = 2^(zi — yi)2 — X1(z1 — 2z2 + z3) — h(z3 — 2Z4 + Z5),
i=1
where multipliers A1,A2 satisfy
dL dL
— = Z1 — y1 — X1 =0, — = Z2 — y2 +2\1 =0, OZ1 oz2
dL dL
T— = ¿3 — yz — X1 — X2 = 0, — = Z4 — y4 + 2X2 = 0,
OZ3 oz4
dL X n
T— = Z5 — y5 — X2 = 0.
0Z5
Then
/12 = 2(Z1 — V1) + (Z2 — V2) = 2X1 — 2X1 = 0, y,3 = 3(Z1 — y1) + 2(Z2 — y2) + (Z3 — y3) = 3A1 — 4X1 + X1 + X2 = X2, V4 = 4(Z1 —y1)+3(Z2 —V2)+2(Z3 — y3) + (Z4 — V4) = 4X1 — 6X1+2A1+2A2 — 2A2 = 0, V5 = 5(Z1 — y{)+4(Z2 — y2) +... + (Z5 — V5) = 5X1 — 8X1+3A1+3A2 — 4A2 + A2 = 0. Since A2z3 < 0 we have X2 = z5 — y5 > 0, and therefore > 0. □
If z2 — z1 ^ z4 — z3 (as it is shown in Fig. 1(b)) then A2Zj, ^ 0 for all i £ [1:3], and the algorithm constructed a convex solution. If z2 — z1 > z4 — z3 (as it is in Fig. 2(a)), then the point 2 must be added to the active set S = [1,3} at the next iteration of the algorithm and the block [1,2,3} of the active set S will be formed. The solution to the optimization problem (11)-(12) corresponding to the block [1, 2,3} are points lying on linear regression line constructed for points (1,z-\), ..., (5,z5) (see Fig. 2(b)). Note that the same regression will be constructed for the initial points (1,y1), ..., (5,y5).
3
(a)
3 (b)
Figure 2. (a) The case 1, 3 e S, i.e. A2yi < 0, A2y3 < 0, and 2,4, 5 / S, Zi's are the solutions to the problem (17), z2 — > z4 — z3. (b) The solution to the optimization problem (11)-(12) corresponding to the block {1, 2, 3} C S are points lying on linear regression line constructed
for points (1, zi),. .. , (5, z5)
1
2
4
5
1
2
4
5
Lemma 4. Let at an iteration of the algorithm the pairs
Y = {(1,Z1),..., (k,zk+1)}
be such that [1 : k - 2] C S, A2zi < 0 for all i <E [1 : k - 2], and k - 1,k,k + 1 / S
(see Fig. 3). Let zii £ [1 : k], be the solution to the optimization problem
1 k
- zi)2 ^ min,
i=1
where the minimum is taken over all ( £ Rk such that (i - 2^+ + = 0 for all i £ [1 : k - 2]. Assume that zf^ - 2zf0 + zk+1 < 0, i.e. the point k - 1
should be added to the active set S at the next iteration of the algorithm. Let zi1), i £ [1 : k + 1], be the solution to the optimization problem
1 k+1
2 ^(Ci - ¿i)2 ^ min, i=1
Figure 3. The case 1, 2,. .. ,k — 2 € S and k — 1 € S, ^¿0)'s are the values of linear regression constructed for input points (i, Zi), i € [1 : k]. After the iteration, the point k — 1 must be added to the block [1 : k — 2] since z(°0)1 — + Zk+1 < 0
where the minimum is taken over all ( £ Rfc+1 satisfying Q — 2(i+1 + (i+2 = 0 for all i £ [1 : k — 1]. Then for every i = 1, 2,... ,k we have
Vi — Vi > 0,
where , ^ are Lagrange multipliers for z(0) and z(1) respectively, and yi^1 = d+1 =0.
Proof. Note that z\0) are the values of the linear regression which uses input data points [(1, z-]),..., (k, Zk)}, obtained at points i £ [1 : k]. By the same
reason, zi11 are the values of linear regression which uses input data points Y points i £ [1 : k + 1].
It follows from (13) that
^ = £ (i + 1 — 3)( zf — Zj), № = £ (i + 1 — 3)( z? — Zj) 3=1 j=1
for all i £ [1 : k], and therefore, we have
i
^ = ^ + 1 — №ï] — (18) 3 = 1
Without the loss of generality we will assume that points z\0), i E [1 '■ k] , lie on a straight line passing through the origin and with a slope a, where —1 < a < 1, i.e.
zf] =aj, j = 1,... ,k. (19)
Since z— 2z^0 + Zk+1 < 0, there exists d> 0 such that Zk+1 = a(k + 1) — d. In this case the points z\10, i = 1,... ,k + 1, lie on a straight line with a slope a — (k+i)tk+2) and intercept W+Ï,
m ( 6d \ 2d , . .
= ia — (k Trn+2)) + —1 • > = 1--k + 1 (20)
It follows from (18), (19) and (20) that
^—^ = è+1—j){Ka — (k + m+2) ) + Y+ï— ajy
E„ , ( . 6 d 2d \ J' + l — "i— (k + 1)(k + 2) +—1) =
= —(' + 1)( k + rn+2) p + (' + l> vu È1+
3 = i 3 = i
6d ^ ,2 2d si^ ■
+ (k + 1)(fc + 2) J - + J = j=1 j=1
,. 6d i(i +1) ,. 2d .
= -(« + 1)^-^-r--- + (« + 1)i-
v J(k + 1)(fc + 2) 2 v Jk + 1
+ 6d i(i + 1)(2i + 1) 2d i(i + 1)_
(fc + 1)(fc + 2) 6 k + 12 di(i + 1) f 3(i + 1) 2i + 1 \ di(i + 1)(fc - i)
+ 2 + 2l±i_^ = a%(% + 1)(fc - %) > 0 n (k + 1) V k + 2 + + k + 2 ; (fc + 1)(fc + 2) > . □
Remark. It follows from Lemmas 2 and 4 that the values of Lagrange multipliers are increasing when adding a point to an existing block from right hand side.
Lemma 5. Suppose that before an iteration of the algorithm the pairs
Y = {(M1),..., (k,Zk)}
are such that [2 : k - 2] C S, i.e. A2zi = 0 for all i £ [2 : k - 2], and k - 1,k £ S. If the inequality z1 - 2z2 + z3 < 0 holds (see Fig. 4), i.e. 1 must be added to the block [2 : k - 2] C S at the iteration, then we obtain fai > 0 for every
i £ [1 : k], where fa, i £ [1 : k], corresponds to the solutions zii £ [1 : k], of the optimization problem
1 k
2 ^(Ci - Zi)2 ^ min, i=1
where the minimum is taken over all ( £ Rk such that A2^ = 0 for all i £ [1 : k - 2].
Proof. If we take S = {1}, then it follows from Lemma 2 that fa!"' > 0,
fa!^' = fas'"' = 0. Now let us consider the sequence of the basic operations of adding one point to a block from the right hand side:
- S = {1, 2}, then we have fa^1, fa^1 > 0, fai1 = fai1 =0 by Remark.
Figure 4. The case [2 : k — 2] C S, i.e. A2^ = 0 for all i € [2 : k — 2], and k — 1, k € S. After the iteration, the point k — 1 must be added to the block [1 : k — 2] since z1 — 2z2 + z3 < 0. Points z(1), i € [1 : k], are the solutions to the optimization problem (5).
- S = {1, 2,3}, then we have p^2 ^ 0, pf2 = = 0 (by Remark).
- S'= {1,2,...,k — 2}, then pf-3),..., fat-32 > 0, fat-32 = faih-3) = 0 (by Remark).
At each step we add a point to an existing block of the active set from the right hand side, and therefore the values of Lagrange multipliers are increasing, i.e. they are remaining non-negative. □
Lemma 6. Suppose that before an iteration of the algorithm the pairs
Yi = {(1,2i),..., (k - 3,zk-3)}, Y~2 = i(k,zk ),..., (s,zp)}, 4 < k < p,
are such that [1 : k - 3] C S, [k : p - 2] C S, k - 2,k - 1 / S, i.e. S2zi = 0 for all i E [1 : k - 3], A2Zi = 0 for all i E [k : p - 2], and p - 1,p / S. Assume that one of the following conditions is fulfilled :
- (the case 1) the inequality Zk-2 - 2zk-i + Zk < 0 holds (see Fig. 5), i.e. k - 2 must be added to S at the iteration;
- (the case 2) the inequality Zk - 2zk+1 + Zk+2 < 0 holds (see Fig. 6), i.e. k must be added to S at the iteration.
Let z(1\ i E [1 : p], be the solutions to the optimization problem
1 P
-^ min> (21)
i=1
where the minimum is taken over all ( E Rp satisfying Q — 2Q+\ + Q+2 = 0 for all i E [1 : p — 2] \{k — 1}. Then for every i = 1, 2,... ,p we have
^ > 0,
where p^2 are the values of Lagrange multipliers corresponding to the solutions z\1], i E [1 : p].
-»•
1 2 ••• k - 2 k - 1 k k + 1 k + 2 p - 1 P
Figure 5. The case 1 of merging two blocks
Figure 6. The case 2 of merging two blocks
Proof. The case 1. If z^10, i E [1 ■ p], are the optimal solutions to the problem (21) then there exists a point z* such that
- zk-2 — 2zk-1 + Z*k < 0,
- z\10, i E [1 ■ k], are lying on the regression line constructed by points (1, Z1),..., (k — 1, Zk-1), ( k, z**).
It follows from Lemma 1 that ,..., ^k-1 ^ 0, ^k-1 = 0. There exists a point z'p+1 such that
- the inequality zp-1 — 2zp + z'p+1 < 0 is fulfilled,
- points z\10, i E [k ■ p], are lying on the regression line constructed by points ( k, zk ), (k + 1, zk+1),..., (p, zp), (p + 1, z'p+l).
Then it follows from Lemma 1 that , ^k+1,..., ^ 0, ^p+1 = 0.
The case 2. If z\10, i E [1 ■ p], are the optimal solutions to the problem (21) then there exists a point 0* such that
- the inequality 0* — 2 1 + 2 < 0 holds,
- points z|10, i E [1 ■ k], are lying on the regression line constructed by points (0, zk), (1, Z1 ),..., (k — 1, zk-1), (k, zk ).
It follows from Lemma 5 that ,..., ^k-1 ^ 0, ^k-1 = 0. There exists a point z'k such that
- the inequality z'k — 2zk+1 + zk+2 < 0 is fulfilled,
- points z\10, i E [k ■ p], are lying on the regression line constructed by points ( k, z'k ), (k + 1, zk+1),..., (p, zp)).
Then it follows from Lemma 5 that , ^k+1,..., ^p-1 ^ 0, = 0. □
Availability and optimality analysis. When the algorithm is completed, the output vector has the form
, i %r± + 1 1 ... 1 %r2 ,.. . , Zrm — ± +1, . . . , Zn), Ax times A2 times Am times
where
- m is the number of segments;
- Xj is the number of points at jth segment, X1 = m1, m2 = X1 + X2, ... , mi+1 = X1 + X2 + ... + Xi,n = X1 + X2 + ... Xm.
All points Zj of each segment [ri + 1,..., ri+l] lie on a straight line. A segment can consists of one point.
The solution obtained as a result of the algorithm is convex, by design of the algorithm. It remains to show that the solution is optimal, i.e. it is necessary to show that the Karush-Kuhn-Tucker conditions are fulfilled.
The algorithm starts with any active set such that S C S*, where S* is the active set corresponding the optimal solution. At he first iteration we can always take S = 0 for simplicity.
Theorem. For any initial S C S*, the Algorithm converges to the optimal solution of the problem (1) in, at most, n — 1 — ^ iterations.
Proof. At each while loop of the algorithm, the active set S is expanded by attaching at least one index point from [1,n — 2], which is previously not belonged to the set S. The Algorithm terminates when z(S) becomes convex. If S = [1,n — 2], then the number of blocks is equal to 1 and, therefore, there is no violation of the convexity. If IS| < n — 2 then the number of iterations must be less than n — |51, where IS| is the number of indices in the initial active set S.
The optimality of the solution follows from Lemmas 2-6.
At each iteration of the algorithm, the points at which the convexity is violated are attached to the active set S. If one of such points i with a negative value of the second order finite difference A2zi is isolated (i.e. i — 2,i — 1,i + 1,i + 2 E S), then the algorithm replaces zi, zi+l,zi+2 with the values of linear regression constructed by the points (i, z^, (i + 1, zi+l), (i + 2, zi+2). Lemma 2 states that the values of the corresponding Lagrange multipliers will be non-negative.
If the second-order differences are negative at several consecutive k ^ 1 neighboring points, e.g. Zi,, zi+l,..., zi+k are such that A2Zj < 0, j = i,...,i + k, and A2Zi-2, A2Zi-1 , A2Zi+k+i, A2Zi+k+2 ^ 0, then the algorithm replaces Zj,, zi+l,..., zi+k+2 with the values of a linear regression constructed by the points (i, Zi), (i + 1, z.i+1), (i + k + 2, zi+k+2). Lemma 4 states that the values of the corresponding Lagrange multipliers will be non-negative.
When the algorithm works on subsequent iterations, it may turn out that a point with a negative second-order finite difference is not isolated. In this case the point where the violation of convexity takes place will be attached to the one of the neighboring blocks of the active set S as follows:
- Let r < rl < r2 < ... < rs < p — 2 be such that r + 1 < rl,rl + 1 < r2,...,rj + 1 < rj+l,... ,rs + 1 < p — 2. Suppose that [r : p — 2] C S and rl,r2,... ,rs E S at the previous iteration of the algorithm. If p — 1 must be added to the active set S at the current iteration of the algorithm, i.e. zp-l —2zp+zv+l < 0 for the current solution z, then it follows from Lemma 1 that the values of the corresponding Lagrange multipliers will increase, and therefore they will remain non-negative.
- Let r < rl < r2 < ... < rs < p — 2 be such that r + 1 < rl,rl + 1 < r2,...,rj + 1 < rj+]_,... ,rs + 1 < p — 2. Suppose that [r : p — 2] C S and rl,r2,... ,rs E S at the previous iteration of the algorithm. If r — 1 must be added to the active set S at the current iteration of the algorithm, i.e. A2zr-l < 0 for the current solution z, then it follows from Lemma 5 that the values of the corresponding Lagrange multipliers will increase, and therefore they will remain non-negative.
- Let r < rl < r2 < ... < rs < p — 2 be such that r + 1 < rl,rl + 1 <
r2,..., Tj + 1 < rj+1,..., rs + 1 < p — 2. Suppose that [r : p — 2] c S and r1, r2,..., rs £ S at the previous iteration of the algorithm. If p must be added to the active set S at the current iteration of the algorithm, i.e. zp—2zp+1+zp+2 < 0 for the current solution z, then it follows from Lemma 6 that the values of the corresponding Lagrange multipliers will increase, and therefore they will remain non-negative. - Let r < r1 < r 2 < ... < rs < p — 2 be such that r + 1 < r1, r1 + 1 < r2,..., Vj + 1 < rj+1,..., rs + 1 < p — 2. Suppose that [r : p — 2] c S and r1, r2,..., rs £ S at the previous iteration of the algorithm. If r — 2 must be added to the active set S at the current iteration of the algorithm, i.e. A2zr-2 < 0 for the current solution z, then it follows from Lemma 6 that the values of the corresponding Lagrange multipliers will increase, and therefore they will remain non-negative. The remaining cases of mergers for two or more adjacent blocks and isolated points can be represented as sequences of these basic cases. □
Conclusion. In this paper using the ideas of [26] we develop a new algorithm (the dual active set algorithm) for finding sparse convex regression. At each iteration of the algorithm, it first determines the active set and then solve a standard least-squares subproblem on the active set with small size, which exhibits a local superlinear convergence. Therefore, the algorithm is very efficient when coupled with a parallel execution. The classical optimization algorithms for nonconvex problems (e.g. coordinate descent or proximal gradient descent) only possess sublinear convergence in general or linear convergence under certain conditions [27]. On the other hand, k-FWA and k-PAVA examined in [17] are not optimal. Competing interests. We declare that we have no conflicts of interest in the authorship and publication of this article.
Authors' contributions and responsibilities. Each author has participated in the article concept development and in the manuscript writing. The authors are absolutely responsible for submitting the final manuscript in print. Each author has approved the final version of manuscript.
Funding. This work was supported by the Russian Foundation for Basic Research (project no. 18-37-00060).
Acknowledgments. The authors are grateful to the referee for careful reading of the paper and his constructive comments.
References
1. Burdakov O., Sysoev O. A Dual Active-Set Algorithm for Regularized Monotonic Regression, J. Optim. Theory Appl., 2017, vol.172, no. 3, pp. 929-949. doi: 10.1007/ s10957-017-1060-0.
2. Brezger A., Steiner W. J. Monotonic Regression Based on Bayesian P-Splines: An application to estimating price response functions from store-level scanner data, J. Bus. Econ. Stat., 2008, vol.26, no. 1, pp. 90-104. doi: 10.1198/073500107000000223.
3. Chen Y. Aspects of Shape-constrained Estimation in Statistics, PhD Thesis. Cambridge, UK, University of Cambridge, 2013, https://ethos.bl.uk/OrderDetails.do?uin=uk.bl. ethos.648300.
4. Balabdaoui F., Rufibach K., Santambrogio F. Least-squares estimation of two-ordered monotone regression curves, J. Nonparametric Stat., 2010, vol.22, no. 8, pp. 1019-1037. doi:10.1080/10485250903548729.
5. Hazelton M. L., Turlach B. A. Semiparametric Regression with Shape-Constrained Penalized Splines, Comput. Stat. Data Anal., 2011, vol.55, no. 10, pp. 2871-2879. doi: 10.1016/j. csda.2011.04.018.
6. Lu M. Spline estimation of generalised monotonic regression, J. Nonparametric Stat., 2014, vol.27, no. 1, pp. 19-39. doi: 10.1080/10485252.2014.972953.
7. Robertson T., Wright F. T. Dykstra R. L. Order Restricted Statistical Inference, Wiley Series in Probability and Mathematical Statistics. Chichester (UK) etc., John Wiley & Sons, 1988, xix+521 pp.
8. Barlow R. E. Brunk H. D. The Isotonic Regression Problem and its Dual, J. Amer. Stat. Assoc., 1972, vol.67, no. 337, pp. 140-147. doi: 10.1080/01621459.1972.10481216.
9. Dykstra R. L. An Isotonic Regression Algorithm, J. Stat. Plan. Inf., 1981, vol.5, no. 1, pp. 355-363. doi: 10.1016/0378-3758(81)90036-7.
10. Best M. J., Chakravarti N. Active set algorithms for isotonic regression: A unifying framework, Mathematical Programming, 1990, vol.47, no. 3, pp. 425-439. doi: 10.1007/ bf01580873.
11. Best M. J., Chakravarti N., Ubhaya V. A. Minimizing Separable Convex Functions Subject to Simple Chain Constraints, SIAM J. Optim., 2000, vol.10, no. 3, pp. 658-672. doi: 10. 1137/s1052623497314970.
12. Ahuja R. K., Orlin J. B. A Fast Scaling Algorithm for Minimizing Separable Convex Functions Subject to Chain Constraints, Operations Research, 2001, vol.49, no. 1, pp. 784-789. doi:10.1287/opre.49.5.784.10601.
13. Strömberg U. An Algorithm for Isotonic Regression with Arbitrary Convex Distance Function, Comput. Stat. Data Anal., 1991, vol.11, no. 2, pp. 205-219. doi: 10.1016/ 0167-9473(91)90072-a.
14. Hansohm J. Algorithms and Error Estimations for Monotone Regression on Partially Pre-ordered Sets, J. Multivariate Analysis, 2007, vol.98, no. 5, pp. 1043-1050. doi: 10.1016/j. jmva.2006.11.001.
15. Burdakow O., Grimvall A., Hussian M. A Generalised PAV Algorithm for Monotonic Regression in Several Variables, In: COMPSTAT, Proceedings in Computational Statistics, 16th Symposium Held in Prague, Czech Republic, vol. 10; ed. J. Antoch. New York, SpringerVerlag, 2004, pp. 761-767, http://users.mai.liu.se/olebu87/PAPERS/C0MPSTAT04.pdf.
16. Dykstra R. L., Robertson T. An Algorithm for Isotonic Regression for Two or More Independent Variables, Ann. Statist., 1982, vol.10, no. 3, pp. 708-716. doi: 10.1214/aos/ 1176345866.
17. Sidorov S. P., Faizliev A. R., Gudkov A. A., Mironov S. V. Algorithms for Sparse fc-Monotone Regression, In: Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Lecture Notes in Computer Science, 10848; ed. W.-J. van Hoeve. Cham, Springer International Publishing, 2018, pp. 546-556. doi: 10.1007/978-3-319-93031-2_ 39.
18. Bach F. R. Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization, 2017, arXiv: 1707.09157 [cs.LG].
19. Altmann D., Grycko Eu., Hochstättler W., Klützke G. Monotone smoothing of noisy data, Technical Report feu-dmo034.15, FernUniversität in Hagen, 2014, https://www. fernuni-hagen.de/mathematik/DM0/pubs/feu-dmo034-14.pdf.
20. Gorinevsky D. M., Kim S.-J., Beard Sh., Boyd S. P., Gordon G. Optimal Estimation of Deterioration From Diagnostic Image Sequence, IEEE Transactions on Signal Processing, 2009, vol.57, no. 3, pp. 1030-1043. doi: 10.1109/tsp.2008.2009896.
21. Hastie T. Tibshirani R. Wainwright M. Statistical Learning with Sparsity: The Lasso and Generalizations, Monographs on Statistics and Applied Probability, vol. 143. New York, Chapman and Hall/CRC, 2015, xvi+351 pp. doi: 10.1201/b18401.
22. Cai Y., Judd K. L. Advances in Numerical Dynamic Programming and New Applications, In: Handbook of Computational Economics, vol. 3; eds. K. Schmedders, K. L. Judd, Elsevier, 2014, pp. 479-516. doi: 10.1016/b978-0-444-52980-0.00008-6.
23. Boytsov D. I., Sidorov S. P. Linear approximation method preserving fc-monotonicity, Sib. Elektron. Mat. Izv., 2015, vol.12, pp. 21-27. doi: 10.17377/semi.2015.12.003.
24. Cullinan M. P. Piecewise Convex-Concave Approximation in the Minimax Norm, In: Abstracts of Conference on Approximation and Optimization: Algorithms, Complexity, and Applications (June 29-30, 2017, Athens, Greece); eds. I. Demetriou, P. Pardalos. Athens, National and Kapodistrian University of Athens, 2017, pp. 4.
25. Shevaldin V. T. Approksimatciia lokalnymi splainami [Local approximation by splines]. Ekaterinburg, Ural Branch of RAS, 2014, 198 pp. (In Russian)
26. Leeuw J. Hornik K. Mair P. Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods, J. Stat. Softw., 2009, vol.32, no. 5, pp. 1-24. doi: 10. 18637/jss.v032.i05.
27. Li G., Pong T. K. Calculus of the Exponent of Kurdyka-tojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods, Found. Comput. Math., 2017, vol.18, no. 5, pp. 1199-1232, arXiv: 1602.02915 [math.OC]. doi: 10.1007/ s10208-017-9366-8.
Вестн. Сам. гос. техн. ун-та. Сер. Физ.-мат. науки. 2019. Т. 23, № 1. С. 113-130
ISSN: 2310-7081 (online), 1991-8615 (print)
d https://doi.org/10.14498/vsgtu1673
УДК 519.65, 519.25, 519.853
Двойственный алгоритм на основе активного множества для построения оптимальной разреженной выпуклой регрессии
А. А. Гудков, С. В. Миронов, С. П. Сидоров, С. В. Тышкевич
Саратовский государственный университет им. Н. Г. Чернышевского (национальный исследовательский университет), Россия, 410012, Саратов, ул. Астраханская, 83.
Аннотация
В последнее время задачи статистики с ограничениями на форму данных привлекают повышенное внимание. Одной из таких задач является задача поиска оптимальной монотонной регрессии. Проблема построения монотонной регрессии (которая также называется изотонной регрессией) состоит в том, чтобы для данного вектора (не обязательно монотонного) найти неубывающий вектор с наименьшей ошибкой приближения к данному. Выпуклая регрессия есть развитие понятия монотонной регрессии для случая 2-монотонности (т.е. выпуклости). Как изо-тонная, так и выпуклая регрессия находят применение во многих областях, включая непараметрическую математическую статистику и сглаживание эмпирических данных. В данной статье предлагается итерационный алгоритм построения разреженной выпуклой регрессии, т.е. для нахождения выпуклого вектора z € К" с наименьшей квадратичной ошибкой приближения к данному вектору у € К" (не обязательно являющемуся выпуклым). Задача может быть представлена в виде задачи выпуклого программирования с линейными ограничениями. Используя условия оптимальности Каруша—Куна—Таккера, доказано, что оптимальные точки должны лежать на кусочно-линейной функции. Доказано, что предложенный двойственный алгоритм на основе активного
Научная статья
3 ©® Контент публикуется на условиях лицензии Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/deed.ru) Образец для цитирования
Gudkov A. A., Mironov S. V., Sidorov S. P., Tyshkevich S. V. A dual active set algorithm for optimal sparse convex regression, Vestn. Samar. Gos. Tekhn. Univ., Ser. Fiz.-Mat. Nauki [J. Samara State Tech. Univ., Ser. Phys. Math. Sci.], 2019, vol. 23, no. 1, pp. 113-130. doi: 10.14498/vsgtu1673.
Сведения об авторах
Александр Александрович Гудков © https://orcid.org/0000-0002-6472-4855 лаборант; институт рисков СГУ; e-mail: [email protected] Сергей Владимирович Миронов А https://orcid.org/0000-0003-3699-5006 кандидат физико-математических наук; доцент; каф. математической кибернетики и компьютерных наук; e-mail: [email protected]
Сергей Петрович Сидоров © https://orcid.org/0000-0003-4047-8239
доктор физико-математических наук, доцент; заведующий кафедрой; каф. теории функций и стохастического анализа; e-mail: [email protected]
Сергей Викторович Тышкевич © https://orcid.org/0000-0001-5417-6165 кандидат физико-математических наук, доцент; доцент; каф. теории функций и стохастического анализа; e-mail:[email protected]
множества для построения оптимальной разреженной выпуклой регрессии имеет полиномиальную сложность и позволяет найти оптимальное решение (для которого выполнены условия Каруша—Куна—Таккера).
Ключевые слова: двойственный алгоритм, активное множество, изо-тонная регрессия, монотонная регрессия, выпуклая регрессия.
Получение: 22 января 2019 г. / Исправление: 2 марта 2019 г. / Принятие: 4 марта 2019 г. / Публикация онлайн: 5 марта 2019 г.
Конкурирующие интересы. Мы заявляем, что у нас нет конфликта интересов в авторстве и публикации этой статьи.
Авторский вклад и ответственность. Все авторы принимали участие в разработке концепции статьи и в написании рукописи. Авторы несут полную ответственность за предоставление окончательной рукописи в печать. Окончательная версия рукописи была одобрена всеми авторами.
Финансирование. Работа выполнена при поддержке Российского фонда фундаментальных исследований (проект № 18-37-00060).
Благодарность. Авторы благодарны рецензенту за тщательное прочтение статьи и сделанные им конструктивные замечания.