Nonlinear (local) optimization. The state of the art

Abstract — P. Spellucci:

In this overview article we first give a short introduction to NLP theory and then review some of the most promising solution techniques. Whereas convex problems can already be dealt with successfully even in very high dimension, the treatment of nonconvex cases still resists a satisfactory solution approach, since methods which worked well for medium-size problems obviously cannot be transferred to very high dimensions.


Вычислительные технологии (Computational Technologies)

Vol. 6, No. 3, 2001

NONLINEAR (LOCAL) OPTIMIZATION. THE STATE OF THE ART

P. SPELLUCCI
TU Darmstadt, Dept. of Mathematics
e-mail: [email protected]

This paper gives a short introduction to the theory of nonlinear local optimization (NLP) and a review of some of the most promising approaches to the solution of the corresponding problems.

Contents

1 Notation
2 Some theory
3 Unconstrained minimization
4 Bound constrained problems
5 General linearly constrained problems: active set methods
6 Linearly constrained problems: interior-point methods
7 Nonlinearly constrained problems
  7.1 Transformation into an only bound constrained problem
  7.2 Linearization methods
  7.3 Modifications of the SQP method
  7.4 Adaptation of interior-point methods
  7.5 Homotopy methods for the KTC-system
8 Some numerical results
9 Conclusion

© P. Spellucci, 2001.

1. Notation

Superscripts denote elements of sequences. Exponentiation is applied to parenthesized expressions only, i.e. we write (a)^2 instead of a^2. || · || denotes the Euclidean norm. The superscript T denotes transposition. We avoid the clumsy notation (x^T, y^T)^T for composite vectors and simply write (x, y); no confusion should result from this. We make frequent use of the notation y_A = {(y)_i : i ∈ A} for the subvector of a vector y with indices in A. Similarly, for a matrix B, B_A denotes the matrix made up from the columns of B with indices in A. Given a vector z, y = z_− denotes the vector with components y_i = 0 if z_i > 0 and y_i = z_i otherwise. e denotes the vector (1, ..., 1)^T. ∇f(x) denotes the gradient of a function f, a column or a system of columns (in the case of a vector function). It is always related to the Jacobian by

∇f(x) = (J(x))^T .

2. Some theory

The general nonlinear optimization problem is

NLP: Minimize f(x) with respect to x ∈ S, where the so called feasible set S is defined through

S = {x ∈ ℝ^n : h(x) = 0, g(x) ≥ 0} .

We assume here that h and g are defined everywhere on ℝ^n with values in some ℝ^E and ℝ^I respectively and are of class C^2 at least, although some results are valid for the class C^1 too. The nonsmooth case requires quite special treatment, which is beyond the scope of this paper. For f we also assume differentiability of class C^2 on an open superset of S. It is important to note that the same set S can be represented quite differently by functions h and g. It is the representation by a given triple (f, g, h) which defines a nonlinear programming problem in the sequel.

From a practical point of view, other formulations of NLP are more suitable, e. g. one standard form is

S = {x ∈ ℝ^n : x_u ≤ x ≤ x_o , b_u ≤ Ax ≤ b_o , c_u ≤ c(x) ≤ c_o } ,

where x_u, x_o, b_u, b_o, c_u, c_o are given constants, possibly with components in {−∞, ∞} or possibly of equal value (defining an equation), and A a given matrix. This formulation reflects the different numerical treatment which may be applied to different parts of the constraints. In order to obtain necessary or sufficient optimality conditions, we need further assumptions on h and g, not on S. These assumptions are known as constraint qualifications. Much research has gone into obtaining minimal assumptions of this kind, and indeed a, in some sense, minimal one is known: the Guignard constraint qualification [69, 65]. However, additional considerations finally showed that the following one, which lends itself much more readily to a geometric interpretation, is "the" qualification one should have; it was first given by Mangasarian and Fromowitz in [85]:

MFCQ: ∇h(x) is of full rank and

∇h(x)^T z = 0 , ∇g_A(x)^T z > 0 is solvable,

where

A = A(x) = {i ∈ I : g_i(x) ≤ 0}

is the so called "active set". (The full rank condition means that the gradients of the equality constraints are linearly independent. From this it follows via the implicit function theorem that the solution set of {z : h(z) = h(x)} is locally (around x) a smooth nonlinear manifold in ℝ^n.) Geometrically, MFCQ means that at x there exists a direction z which is tangential to the nonlinear manifold defined by {z : h(z) = h(x)}, whereas all active inequality constraints can be strongly improved towards feasibility along z. From the rank condition on ∇h it follows that it is also possible to find some z which simultaneously improves feasibility with respect to h if h(x) ≠ 0 [114].

Observe that we have defined MFCQ here also for infeasible points, whereas the usual definition in the literature only takes feasible points into account. A much stronger condition is the so called regularity assumption

LICQ: (∇h(x), ∇g_A(x)) is of full rank.

LICQ implies MFCQ. In order to see that, consider the underdetermined linear system

∇h(x)^T z = 0 , ∇g_A(x)^T z = e ,

which is always solvable if the matrix (∇h(x), ∇g_A(x))^T has full row rank, as assumed. There are many problems of practical relevance where the former is violated. MFCQ has some far reaching consequences:

Theorem 2.1. Let x* ∈ S be a local minimizer of problem NLP and let MFCQ be satisfied there. Then there exists a bounded set L of multipliers λ* ∈ ℝ^I_+, and for every such λ* there is a unique multiplier μ* ∈ ℝ^E such that there holds

KTC: ∇f(x*) − ∇g(x*)λ* − ∇h(x*)μ* = 0 ,

g_i(x*)λ_i* = 0 for all i ∈ I,

with λ* ∈ L and μ* = μ*(λ*). Conversely, if KTC holds with a bounded set of multipliers (λ*, μ*), then MFCQ is satisfied. Finally, if f is convex, h affinely linear and g_i concave for all i ∈ I (a convex optimization problem), then KTC is also sufficient for global optimality.

The proof of this can be found in [55] and in standard textbooks of nonlinear optimization, e.g. [97]. MFCQ has even more implications for a solution of NLP; e.g. it is the weakest known condition which guarantees exact penalizability of NLP based on properties of h and g alone, see e.g. [17] and also [68]. By exact penalization is meant a transformation of NLP into an unconstrained optimization problem whose (local and global) solutions coincide with those of NLP. Such a penalization can be obtained via penalty functions of the form

f(x) + γ (||h(x)|| + ||g(x)_−||) ,

where γ is a suitably large constant and the norm is an absolute one, i.e. ||y|| = || |y| || for all y. Here and in the following we use the convention: given a vector x, by x_− resp. x_+ we understand the vector with components min{0, x_i} resp. max{0, x_i}.
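As an illustration, a minimal sketch of such a penalty function in Python, using the l1 norm as the absolute norm (any absolute norm qualifies) and an arbitrary "suitably large" γ = 10; the test data are the two-dimensional example discussed further below:

```python
import numpy as np

def exact_penalty(f, h, g, gamma):
    """l1 exact penalty p(x) = f(x) + gamma*(||h(x)||_1 + ||g(x)_-||_1).

    f, h, g are callables returning the objective value, the equality
    residuals h(x) = 0 and the inequality values g(x) >= 0 respectively."""
    def p(x):
        hx = np.atleast_1d(h(x))
        gx = np.atleast_1d(g(x))
        return (f(x) + gamma * (np.abs(hx).sum()
                                + np.abs(np.minimum(gx, 0.0)).sum()))
    return p

# the example from the text: f = -x1 - x2, h = (x1)^2 + (x2)^2 - 2, g(x) = x;
# the constrained minimizer is x* = (1, 1) with f(x*) = -2
p = exact_penalty(lambda x: -x[0] - x[1],
                  lambda x: x[0] ** 2 + x[1] ** 2 - 2.0,
                  lambda x: x,
                  gamma=10.0)
```

At the feasible minimizer the penalty term vanishes and p coincides with f; at an infeasible point the violation of g(x) ≥ 0 is charged with weight γ.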

The Mangasarian-Fromowitz condition plays also a fundamental role in questions of stability of NLP. E.g.

Theorem 2.2. Consider the perturbed problem NLP(g_0, h_0) where S is replaced by

S(g_0, h_0) = {x ∈ ℝ^n : h(x) = h_0, g(x) ≥ g_0} .

For x̄ ∈ S(0, 0) there exists some x ∈ S(g_0, h_0) with ||x − x̄|| ≤ C · (||h_0|| + ||g_0^+||) for some constant C = C(x̄) and all (h_0, g_0) in some neighborhood of (0, 0) if and only if MFCQ holds at x̄.

For the proof, see [105]. A similar result applies to perturbations of the problem functions h and g in a nonparametric (topological) sense [66]. Hence one may consider a problem instance of NLP without MFCQ as incorrectly stated. This is surely true as long as only feasible points are considered. But observe that in practice one also needs that condition at infeasible points. Unfortunately, it is far from true that MFCQ is naturally satisfied there. Consider the simple and extremely well-behaved example with

n = 2, f(x) = −x_1 − x_2, h(x) = (x_1)^2 + (x_2)^2 − 2 and g(x) = x .

The feasible set is the quarter circle of radius √2 in the positive quadrant, and even LICQ is satisfied in a small neighborhood of the feasible set. However, MFCQ is violated for x_1 < 0, x_2 < 0 and, surprisingly enough, some highly renowned codes fail on this example if provided with an initial guess in this region, unless g is given special treatment. (Most codes do not accept initial guesses which violate the given bound constraints. However, it is possible to construct similar examples with a more complicated, nonlinear g, hence this trouble remains.)

There exist also sufficient local optimality conditions for nonconvex problems. These involve the Hessian matrix of the so called Lagrangian of the problem NLP, namely

L(x, λ, μ) = f(x) − h(x)^T μ − g(x)^T λ ,

e. g.

Theorem 2.3. Let there be given a point x* ∈ S which satisfies MFCQ and KTC with some pair (λ*, μ*). If in addition

z^T ∇²_{xx} L(x*, λ*, μ*) z > 0 for all z ≠ 0

with

∇h(x*)^T z = 0 , ∇g_i(x*)^T z ≥ 0 if λ_i* = 0 and i ∈ A(x*) ,

∇g_i(x*)^T z = 0 if λ_i* > 0 ,

and some λ* ∈ L, then x* is a strong local minimizer of NLP.

For a proof, again see [97]. Assumptions as given in the previous theorems are also typical for the convergence analysis of optimization algorithms for solving NLP.

3. Unconstrained minimization

The original problem

f(x) = min , x ∈ ℝ^n ,

is typically reduced to the identification of a single stationary point, that means one solution of the (in general nonlinear) equation

∇f(x) = 0 .

There are also known methods which aim at identifying points which in addition satisfy the necessary second order condition "∇²f(x*) positive semidefinite". The classical approach here is the damped Newton method, either in the form of a line search along the Newton direction or in the form of a trust-region approach. Newton's method uses second derivatives, and before the advent of automatic differentiation techniques this was considered a severe disadvantage. Much effort has therefore gone into the development of methods which avoid the use of second derivatives, which led to the class of quasi-Newton methods. Astonishingly enough, the use of e.g. the well analyzed methods from the Broyden class, especially the BFGS update formula, often results in algorithms of higher efficiency than the use of exact second derivatives, maybe because these methods in some sense remember the global behaviour of f, whereas a true Newton step employs purely local information. It can safely be stated that unconstrained problems of small and medium size, say up to some hundred variables, can nowadays be solved routinely and safely using available quasi-Newton based codes, even if the Hessian matrix is rather ill-conditioned. Unfortunately, no method of practical relevance evolved in this class which is capable of dealing with sparsity in the Hessian (updates from the Broyden class always give full matrices, even for a diagonal Hessian). However, much progress has been made towards the solution of large-scale problems by exploiting special structure in the Hessian or by using methods adapted from large-scale linear solvers, see e.g. the work by Griewank and Toint [67], Nash [92], Buckley and Lenir [18], Byrd, Nocedal and Schnabel [24] and Liu and Nocedal [83]. Griewank and Toint consider problems which show a special additive structure, the class of so called partially separable functions, that is, functions f with

f(x) = Σ_{j=1}^m f_j(x^j) with x^j a low-dimensional subvector of x .

The central idea is to devise updating formulae for the Hessians of the f_j by dense matrices A_j separately and to combine these into a sparse approximation of the Hessian of f. There is public domain software which implements this. The method works well in the convex case and if dimensionality is small enough to allow direct solution techniques for the resulting linear systems. All the other methods mentioned can be characterized as modifications of Newton's method (so called inexact Newton methods) or of the method of conjugate gradients, known for the linear case since 1951. Using these methods, problems with some 10 000 variables can be solved without much trouble, at least if ill-conditioning of the Hessian is not too severe.

In high dimensional unconstrained minimization the main effort goes into the determination of the so called direction of descent d^k with (∇f(x^k))^T d^k < 0. With this in mind, the majority of methods proposed up to now follow the line search model, i.e.

x^{k+1} = x^k + α_k d^k ,

where α_k is chosen such that {f(x^k)} decreases strongly monotonically, satisfying the principle of sufficient decrease

f(x^k) − f(x^{k+1}) → 0 ⟹ ∇f(x^k) → 0 ,

with d^k satisfying the condition

|(∇f(x^k))^T d^k| / ||d^k|| ≥ φ(||∇f(x^k)||) .

Here φ is a strongly monotonically increasing function with φ(0) = 0, not necessarily explicitly known. The stepsize can be computed by a variety of algorithms, the simplest one being backtracking:

α_k = max{β^j α_{0,k} : j ∈ ℕ_0 and f(x^k) − f(x^k + β^j α_{0,k} d^k) ≥ δ β^j α_{0,k} (−∇f(x^k))^T d^k} ,

where 0 < β < 1 and 0 < δ < 1/2 are two constants and α_{0,k} an initial guess, maybe simply 1 or some value obtained from a preliminary interpolation, e.g. using the values f(x^k), f(x^k + d^k) and ∇f(x^k)^T d^k.
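The backtracking rule can be sketched as follows; the constants β = 0.5 and δ = 10^{-4} are typical but arbitrary choices, and d^k is assumed to be a direction of descent (otherwise the loop would not terminate):

```python
import numpy as np

def backtracking(f, gk, xk, dk, alpha0=1.0, beta=0.5, delta=1e-4):
    """Backtracking: the largest stepsize a = beta^j * alpha0, j = 0, 1, ...,
    with f(xk) - f(xk + a*dk) >= delta * a * (-gk)^T dk."""
    slope = -(gk @ dk)      # > 0 for a direction of descent
    a = alpha0
    fk = f(xk)
    while fk - f(xk + a * dk) < delta * a * slope:
        a *= beta
    return a

# usage on f(x) = ||x||^2 with the steepest descent direction
f = lambda x: x @ x
xk = np.array([2.0, -1.0])
gk = 2.0 * xk
a = backtracking(f, gk, xk, -gk)
```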

The so called trust-region methods also require a descent of this form, formally using α_k = 1. Here a local model for f is used which has the form

f̃(d) = f(x^k) + (∇f(x^k))^T d + (1/2) d^T A_k d , ||d|| ≤ ρ_k .

The (global) minimizer of this problem yields d^k, and the step x^{k+1} = x^k + d^k is accepted if

(f(x^k) − f(x^{k+1})) / (f̃(0) − f̃(d^k)) ≥ η

for some small constant η > 0, e.g. η = 10^{−3}. There is then an adaptive choice of ρ_k. If the test fails, then ρ_k is decreased and d^k must be computed anew; that means a further (large) linear system must be solved, which increases the algebraic complexity considerably. But the function information need not be computed anew, and for very expensive evaluations the additional algebraic effort may be worthwhile. If the step is accepted, then ρ_{k+1} is chosen equal to or larger than ρ_k. Trust-region methods have advantages over line search methods because of weaker conditions on the model matrix A_k and stronger theoretical results (typically these methods converge to second order necessary points without using negative curvature information if A_k = ∇²f(x^k) for all k). A trust-region Newton method is realized in the code TRON of Lin and Moré [82].
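A minimal sketch of the acceptance test and radius update; as a simplification, the model is minimized here only along −∇f(x^k) (the Cauchy point) instead of globally over the ball, so this is not the full method described above:

```python
import numpy as np

def trust_region_step(f, gk, Ak, xk, rho, eta=1e-3):
    """One trust-region iteration with a Cauchy-point model minimizer."""
    # unconstrained minimizer of the model along -gk, capped by the radius
    gAg = gk @ Ak @ gk
    t = (gk @ gk) / gAg if gAg > 0 else np.inf
    t = min(t, rho / np.linalg.norm(gk))
    d = -t * gk
    pred = -(gk @ d + 0.5 * d @ Ak @ d)   # model decrease f~(0) - f~(d)
    ared = f(xk) - f(xk + d)              # actual decrease
    if pred > 0 and ared / pred >= eta:
        # accept; keep the radius or enlarge it
        return xk + d, max(rho, 2.0 * np.linalg.norm(d))
    return xk, rho / 2.0                  # reject: shrink rho, recompute d

# usage on the convex quadratic f(x) = ||x||^2
f = lambda x: x @ x
x0 = np.array([1.0, 1.0])
xn, rho_new = trust_region_step(f, 2 * x0, 2 * np.eye(2), x0, rho=10.0)
```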

In the (theoretically) ideal case the direction d^k is given by Newton's method. The linear system

∇²f(x^k) d^k = −∇f(x^k)

can be solved by direct methods for large n only if the Hessian is very sparse. Fortunately, it is not necessary to solve this system very precisely, at least far from the solution, without losing the good convergence properties. This observation leads to the so called inexact Newton methods, where the Newton direction is approximated by some steps of the preconditioned conjugate gradient method or some solver based on the Lanczos algorithm. This latter approach requires a routine which computes the product of the Hessian with a given arbitrary vector. In Nash's code TN/TNB, which is available through NETLIB [93], this is replaced by taking a forward difference of the gradient, (∇f(x^k + tv) − ∇f(x^k))/t, which provokes additional problems. If the solution precision is controlled appropriately (one needs a final residual less than η_k ||∇f(x^k)|| with η_k → 0), then superlinear convergence speed is maintained. The computation of d^k requires several matrix-vector products, respectively several evaluations of ∇f.
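The finite-difference Hessian-vector product and the truncated CG iteration can be sketched as follows; the forcing parameter and iteration limit are illustrative choices:

```python
import numpy as np

def hess_vec(grad, x, v, t=1e-6):
    """Forward-difference approximation (grad f(x+t*v) - grad f(x))/t of the
    Hessian-vector product, as used in place of an exact product in TN/TNB."""
    return (grad(x + t * v) - grad(x)) / t

def inexact_newton_dir(grad, x, eta=1e-8, maxit=50):
    """Inexact Newton direction: CG on H d = -g, stopped once the residual
    falls below eta * ||grad f(x)||."""
    g = grad(x)
    d = np.zeros_like(g)
    r = -g.copy()                 # residual of H d = -g at d = 0
    p = r.copy()
    tol = eta * np.linalg.norm(g)
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol:
            break
        Hp = hess_vec(grad, x, p)
        curv = p @ Hp
        if curv <= 0:             # negative curvature: stop with the current d
            break
        alpha = (r @ r) / curv
        d = d + alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        r, p = r_new, r_new + beta * p
    return d if d.any() else -g   # fall back to steepest descent

# usage on f(x) = (x1)^2 + 5 (x2)^2, whose Newton direction at (1,1) is (-1,-1)
grad = lambda x: np.array([2.0 * x[0], 10.0 * x[1]])
d = inexact_newton_dir(grad, np.array([1.0, 1.0]))
```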

Alternatively one may use the so called quasi-Newton methods with limited memory or the nonlinear preconditioned conjugate gradient method, applied directly to f. These methods generate directions of descent also for nonconvex f and need as essential information only one gradient of f per step. The nonlinear conjugate gradient method, originally proposed by Fletcher and Reeves and now known in a lot of variants, is the simpler of the two approaches. Here d^k is computed recursively:

g^k = ∇f(x^k) ,

ḡ^k = B g^k (B is the preconditioner, positive definite) ,

γ_k = (g^k)^T ḡ^k ,

d^k = −ḡ^k if k = 0 (mod N) ,

d^k = −(1 + (g^k)^T d^{k−1} / γ_{k−1}) ḡ^k + (γ_k / γ_{k−1}) d^{k−1} otherwise.

N is the so called restart index. In theory one has N ≥ n, and, as shown by AlBaali [2], N may even be infinite. With the exception of the leading factor of ḡ^k these are the relations of the classical (linear) cg-algorithm. The modification forces −(g^k)^T d^k = γ_k > 0, independently of the properties of f and α_k. In practical computations it makes little sense to let N grow very much, because the quality of the directions deteriorates very strongly through roundoff and finally the determination of the stepsize α gets into trouble.

Remark: there is a point which is often neglected in theoretical discussions of this method. In the linear case, with a fixed positive definite Hessian, one knows for sure that the "optimal" stepsize produces descent and never checks this, whereas here monotonic descent of f must be verified in order to fix the stepsize. However, under the influence of roundoff, the test for descent may fail even if the direction d^k is a good one.

If one chooses N ≥ n, then N-step quadratic convergence of this method can be shown. However, for very large n this is practically irrelevant.
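Since the recursion above had to be reconstructed from a damaged source, the following sketch shows one consistent reading of it; what the sketch provably reproduces is the property stated in the text, namely descent with −(g^k)^T d^k = γ_k:

```python
import numpy as np

def cg_direction(g, d_prev, gamma, gamma_prev, B, k, N):
    """Direction of the preconditioned nonlinear cg method with restart index N.
    The leading factor on the preconditioned gradient enforces
    -(g^k)^T d^k = gamma_k > 0 independently of f and of the stepsize."""
    gbar = B @ g                        # preconditioned gradient; gamma = g^T gbar
    if k % N == 0 or d_prev is None:
        return -gbar                    # restart with the preconditioned gradient
    return -(1.0 + (g @ d_prev) / gamma_prev) * gbar \
           + (gamma / gamma_prev) * d_prev

# usage with the identity as preconditioner
B = np.eye(3)
g = np.array([1.0, 2.0, -3.0])
gamma = g @ B @ g
d = cg_direction(g, np.array([0.5, -1.0, 2.0]), gamma, 3.7, B, k=1, N=10)
```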

The usual quasi-Newton methods replace the Hessian G_k by a positive definite "approximation" A_k and compute d^k from

A_k d^k = −∇f(x^k) ,

resp., with H_k = (A_k)^{−1},

d^k = −H_k ∇f(x^k) .

None of the quasi-Newton methods designed for the maintenance of a given sparsity pattern of the Hessian has shown practical success, whereas the successful ones all produce full matrices even for a diagonal Hessian. That caused the research in limited memory methods. One of the best practical methods is based on the so called BFGS formula, which defines {A_k} recursively using the differences

s^k = x^{k+1} − x^k

and

y^k = ∇f(x^{k+1}) − ∇f(x^k) .

It reads

A_{k+1} = A_k − (A_k s^k (s^k)^T A_k) / ((s^k)^T A_k s^k) + (y^k (y^k)^T) / ((y^k)^T s^k) ,

which for (y^k)^T s^k > 0 and positive definite A_k produces a positive definite A_{k+1}. (It is even possible to show that for uniformly convex f the eigenvalues of A_k and (A_k)^{−1} remain uniformly bounded under very weak conditions on the step x^k → x^{k+1}, see e.g. [23]. The condition (y^k)^T s^k > 0 is satisfied automatically for f uniformly convex and is enforced otherwise through the stepsize selection using the so called Powell-Wolfe conditions.) For A_k resp. H_k = (A_k)^{−1} there exists, for the case A_0 = γI, another closed formula [24]:

H_{k+1} = γI + C_k [ D_{11,k}  D_{12,k} ; D_{21,k}  O ] (C_k)^T ,

where

C_k = (S_k, γY_k) ,

D_{21,k} = −(R_k)^{−1} , D_{12,k} = (D_{21,k})^T ,

D_{11,k} = D_{12,k} (Λ_k + γ (Y_k)^T Y_k) D_{21,k} ,

with

(R_k)_{ij} = (s^i)^T y^j for 0 ≤ i ≤ j ≤ k, 0 otherwise ,

Λ_k = diag_{0≤i≤k}((s^i)^T y^i) ,

S_k = (s^0, ..., s^k) , Y_k = (y^0, ..., y^k) .

The idea of quasi-Newton methods with limited memory simply consists in defining in this formula Y_k, Λ_k and R_k using only the last m instances

s^{k−m+1}, ..., s^k and y^{k−m+1}, ..., y^k

and a variable γ = γ_k, fixing a formula for γ_k this way. Of course this applies for k ≥ m only. Recommended values for m are in the range {7, ..., 20}. This is implemented in the code LBFGS(B) of Byrd, Nocedal and coworkers [22]. With m ≥ n this method should, in theory, give the same results as the original method if applied to a uniformly convex quadratic f. Unfortunately this is far from being true: the numerical evaluation using this "direct" formula seems to be subject to numerical instability, and further research in this direction is required. However, practitioners like the method because of its simplicity (no Hessian required). But from numerical experiments it became clear that for strongly nonquadratic f the inexact Newton methods are to be preferred. Numerical experience is reported in the papers [57, 83, 84, 120].
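In practice the closed formula is rarely evaluated verbatim; the following sketch uses the standard two-loop recursion instead, which for the stored pairs (s, y) and H_0 = γI is mathematically equivalent:

```python
import numpy as np

def lbfgs_direction(g, S, Y, gamma):
    """Limited-memory BFGS direction -H_k g via the two-loop recursion.
    S and Y hold the last m difference pairs, oldest first."""
    q = g.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alphas = []
    for s, y, rho in reversed(list(zip(S, Y, rhos))):   # first loop: newest to oldest
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    r = gamma * q                                       # apply H_0 = gamma * I
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ r)                               # second loop: oldest to newest
        r = r + (a - b) * s
    return -r

# check against the explicit inverse BFGS update for a single stored pair
s, y = np.array([1.0, 0.0]), np.array([2.0, 0.0])
d = lbfgs_direction(np.array([4.0, 3.0]), [s], [y], gamma=1.0)
```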

4. Bound constrained problems

For reasons of simple exposition we restrict the discussion to the problem

f(x) = min , x ≥ 0 .

The case of general lower and upper bounds and of only partially constrained components of x can be derived with ease from this special one. The necessary first order optimality conditions now read

∇f(x) − λ = 0 , λ_i x_i = 0 , λ ≥ 0 .

This can be rewritten as

∇f(x)_i = 0 if x_i > 0 , ∇f(x)_i ≥ 0 otherwise .

The first methods for this type of problem were of the type of active set methods. Initially f is minimized using an unconstrained technique. However, by proper selection of the stepsize, the constraints remain satisfied. If a variable meets its lower bound, it is fixed there and minimization of f continues on a boundary linear manifold of the feasible set. This process continues until either a (resp. the) constrained minimizer is located, or the components of the gradient corresponding to free variables become small whereas one of the components corresponding to a fixed variable stays relatively large and of negative sign, indicating a false constraining manifold. In the latter case the corresponding variable is freed and moved into the relative interior of the feasible set, with minimization of f continuing. Using some simple criteria for fixing what is meant by "small" and "relatively large" here, convergence of this process to a constrained minimizer can be shown. This kind of method is simple to implement and robust, but necessarily very slow if many changes of the active set are required. Nash's code TNB is of this kind. All of the newer methods for this kind of problem, however, make use of a special intermediate step as soon as a variable meets its bound. This special step consists in (at least) one step of the gradient projection method. This is given by exact (for linear or quadratic f) or approximate minimization of f along the projection of the ray x^k − α∇f(x^k) onto the feasible set, which in the case of bound constraints is trivial, namely

P_S(x^k − α∇f(x^k)) = max{0, x^k − α∇f(x^k)} ,

with max for a vector understood componentwise. If the active set A does not change, then unconstrained minimization of f on a constraining manifold (using a method better than mere gradient descent) is resumed. This leads to fast changes in the active set and rather efficient overall performance, e.g. Byrd et al. [22] (code LBFGSB), Conn, Gould and Toint [26] (code SBMIN, gone into LANCELOT), [27], Moré and Lin [82] (code TRON), Moré and Toraldo [88, 89] (code GPCG by Felkel, see [87]) and Felkel's code PL2, again see [87] and Felkel's PhD dissertation [39]; see also Moré's MINPACK 2 project [3].

Using this technique it is possible to solve bound constrained quadratic problems with almost the same effort as unconstrained ones. However, this author does not fully share the enthusiasm of many colleagues for gradient projection. It is true that the sole role of this method is the fast identification of the correct active set. Nevertheless, for a general nonlinear f it is necessary to check the descent of f along the projected ray if the function is nonconvex or if the two directional derivatives of f at the breakpoints of the ray cannot be computed cheaply. Because the optimal stepsize along the projection might be quite small for a mildly ill-conditioned Hessian of f, this may lead to premature termination. The optimal stepsize along a ray x − α∇f(x) is of the form

α* ∈ [1/eig_max(G), 1/eig_min(G)] (1 + O(||∇f||)) ,

where eig(G) are the eigenvalues of the Hessian G of f. The correction

s = α* ∇f(x)

can be as small as of the order (1/cond(G)) ||x − x*||, whereas the difference f(x) − f(x + s) may be of the order of

(9/4) (1/cond(G)) eig_min(G) ||x − x_opt||²

(which is much smaller than f(x) − f(x_opt)). If this value reaches the roundoff level in the evaluation of f, the method of course terminates, at a relatively large value of ||x − x_opt||, even for ∇²f only mildly ill-conditioned. Numerical tests verified this behaviour [101]. Therefore it might finally not be the best solution to move along constraining manifolds at all. There exist newer approaches which make use of a primal-dual approach, including movement in the dual variables (∇f(x))_i for x_i = 0, e.g. [33]. Others aim at avoiding movement along constraining manifolds by using moves that, at least in a number of variables, are always interior to the feasible set, e.g. [13]. But thorough testing of all these solution approaches on an identical testbed is necessary to assess their relative merits.
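A minimal sketch of one projected gradient step with an Armijo-type rule along the projection arc; the acceptance rule is a simplifying assumption, the actual rules in the cited codes are more refined:

```python
import numpy as np

def projected_gradient_step(f, grad, x, alpha0=1.0, beta=0.5, delta=1e-4, maxit=30):
    """One step of gradient projection for min f(x) s.t. x >= 0: backtrack on
    the stepsize a until the projected point max{0, x - a*grad f(x)} gives
    sufficient decrease."""
    g = grad(x)
    fx = f(x)
    a = alpha0
    for _ in range(maxit):
        xn = np.maximum(0.0, x - a * g)      # the "trivial" projection
        if fx - f(xn) >= delta * (g @ (x - xn)):
            return xn
        a *= beta
    return x                                 # no acceptable step found

# usage: the unconstrained minimizer (1, -1) is infeasible; one step from (0, 2)
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 1.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 1.0)])
xn = projected_gradient_step(f, grad, np.array([0.0, 2.0]))
```

Note how a single projected step can change several components of the active set at once, which is exactly the property that makes the method attractive for active set identification.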

5. General linearly constrained problems: active set methods

Linearly constrained problems are much easier to handle than those with general constraints. The feasible set is convex. There exist methods for the elimination of redundant equalities and inequalities from systems of linear equality and inequality constraints which work (at least theoretically); numerically there might be trouble due to ill-conditioning. (Indeed there are lots of examples in the NETLIB LP library where infeasibility is hard to discern.) With this reduction done, MFCQ is satisfied globally for such systems. There exist finite algorithms to decide feasibility of such constraints (e.g. phase I of the simplex algorithm). None of these properties translates to general constraints.

Minimization methods of the active set type (minimization on submanifolds) can easily be devised for general linear constraints; hence it is possible to maintain feasibility in linearly constrained optimization quite easily once a feasible point is known. Some of the sequential quadratic programming (SQP) methods discussed later in this paper use this fact. Contrary to this, the gradient projection method cannot be implemented efficiently for this type of constraints, since the determination of the projection of an (infeasible) point onto a polyhedron involves the solution of a convex quadratic programming problem, whose solution might be as costly as the solution of the original problem.

For the convex quadratic linearly constrained optimization problem, the so called convex QP problem, i.e.

f(x) = (1/2) x^T A x − a^T x , where A is positive semidefinite,

with the constraints

g(x) = G^T x + g_0 ≥ 0 , h(x) = H^T x + h_0 = 0 ,

including the case A = O, the LP problem, there exist specialized highly efficient methods, e.g. the simplex method, the dual simplex method and their generalizations to the quadratic case, e.g. the methods given by Fletcher [42], Goldfarb [60] and Goldfarb and Idnani [61]. These methods are part of many optimization libraries and are the core of most of the SQP methods for medium-size problems, which will be dealt with later in this paper. Using sparse matrix methods these algorithms can also be extended to higher dimensional problems; for the quadratic case see e.g. [62]. For the LP problem there exists a bunch of implementations, commercial as well as in the public domain, capable of solving problems with up to 10^6 variables, given sufficient sparsity in the constraints.

There is newer work [49] which deals with the possibility of combining classical active set technology with activation and inactivation of an arbitrary number of constraints per step, which makes these methods even more efficient; see also sections 3.4.6 and 3.4.7 in [114]. Nevertheless, the theoretical worst case complexity of such an approach is always exponential in the dimension (and input size). Algorithmically, these methods can be quite involved.

Besides direct treatment of the active linear constraints via elimination techniques (using a choice of numerical linear algebra techniques) there exists also the possibility to transform the problem into a bound constrained one and to use the powerful techniques described in the previous section for the latter. For simplicity of exposition let us assume that the constraints are already in the form used by the primal simplex method, namely

B^T x − b = 0 , x ≥ 0 .

Indeed, any linearly constrained problem can be transformed such that the constraints take this form. Let us also assume that B is of full rank. (In reality this is often not the case, and the numerical treatment of rank deficiency is extremely delicate.) The first order necessary optimality conditions then read

∇f(x) − Bμ − λ = 0 , λ ≥ 0 ,

λ_i x_i = 0 , i = 1, ..., n .

If f is convex then these conditions are also sufficient for optimality of a feasible point. If there exists a so called Slater point x̄ with B^T x̄ − b = 0, x̄ > 0, then the set of possible Lagrangian multipliers is bounded. Friedlander, Martinez and Santos [54] have shown that, in the case of f convex and of class C², the solution can be obtained via the following bound constrained minimization problem involving the primal and the dual variables x, μ and λ simultaneously:

M(x, μ, λ) = (1/2) (||∇f(x) − Bμ − λ||² + ||B^T x − b||² + (x^T λ)²) = min over x ≥ 0, λ ≥ 0 .
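A minimal sketch of the merit function M, evaluated on a tiny hypothetical QP whose data are invented for illustration:

```python
import numpy as np

def M(x, mu, lam, grad_f, B, b):
    """Merit function of Friedlander, Martinez and Santos: the sum of squared
    stationarity, feasibility and complementarity residuals, to be minimized
    subject only to the simple bounds x >= 0, lam >= 0."""
    r_stat = grad_f(x) - B @ mu - lam     # dual feasibility residual
    r_feas = B.T @ x - b                  # primal feasibility residual
    return 0.5 * (r_stat @ r_stat + r_feas @ r_feas + (x @ lam) ** 2)

# hypothetical QP: f(x) = ||x||^2 / 2  s.t.  x1 + x2 = 2, x >= 0;
# its KTC point is x* = (1, 1), mu* = 1, lam* = 0, where M vanishes
B = np.array([[1.0], [1.0]])
b = np.array([2.0])
val = M(np.array([1.0, 1.0]), np.array([1.0]), np.zeros(2), lambda x: x, B, b)
```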

An obvious disadvantage of this approach is the increase in dimensionality. Kanzow [76], using the so called Fischer function

φ(α, β) = (√((α)² + (β)²) − α − β)² ,

succeeded in reformulating the problem as a completely unconstrained one:

K(x, μ, λ) = (1/2) (||∇f(x) − Bμ − λ||² + ||B^T x − b||² + Σ_{i=1}^n φ(x_i, λ_i)) = min .

Fischer's function φ has the interesting property of being stationary exactly for α, β ≥ 0, αβ = 0, penalizing simultaneously the primal and dual bound constraints as well as the so called complementarity condition x_i λ_i = 0. There are many other functions of this type, which are heavily used in solution methods for the so called complementarity problem

F(x)^T y = 0 , F(x) ≥ 0 , y ≥ 0 .
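The Fischer function and Kanzow's merit function can be sketched as follows; the vanishing property of φ is checked directly:

```python
import numpy as np

def fischer(a, b):
    """The Fischer function phi(a, b) = (sqrt(a^2 + b^2) - a - b)^2.
    It vanishes exactly for a >= 0, b >= 0, a*b = 0 (works elementwise)."""
    return (np.sqrt(a * a + b * b) - a - b) ** 2

def K(x, mu, lam, grad_f, B, b):
    """Kanzow's completely unconstrained reformulation: bounds and the
    complementarity condition are absorbed into the Fischer terms."""
    r_stat = grad_f(x) - B @ mu - lam
    r_feas = B.T @ x - b
    return 0.5 * (r_stat @ r_stat + r_feas @ r_feas + np.sum(fischer(x, lam)))
```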

Numerical tests with high dimensional convex QP problems have shown superiority of the approach of Friedlander et al. compared with Kanzow's. Both methods transform a convex quadratic problem into a nonconvex nonquadratic one, which may be considered a disadvantage. Indeed, any convex QP problem can be rewritten as a convex bound constrained QP problem, as shown by the present author [111]. With

L(x, μ, λ) = f(x) − μ^T (B^T x − b) − λ^T x

and

Φ(x, μ, λ; δ, η) = −L(x, μ, λ) + (δ/2) ||B^T x − b||² + (η/2) ||Ax − a − Bμ − λ||² ,


there holds the following: if A is positive semidefinite and in addition positive definite on the kernel of B^T, then the minimization of Φ with respect to x, λ under the bound constraints

x ≥ 0 , λ ≥ 0

is equivalent to solving the original QP, provided δ > 0 and η > ρ((Z^T A Z)^{−1}), with Z an orthonormal basis of the kernel of B^T; ρ(·) denotes the spectral radius of a matrix. Here the trouble comes through the determination of η. Since the (nearly exact) determination of the theoretical lower bound for this parameter is out of the question for a truly large-scale problem, much remains to be done here. All these approaches introduce an additional problem (also present for all known primal-dual differentiable exact penalty functions for the general NLP problem): the condition number of the Jacobian of the corresponding KTC-system is of the order of the square of the condition number of the KTC-system of the original problem, and there seems to be no simple way to alleviate this effect. It is an open question whether there exist exact primal-dual smooth penalty functions (a function for which a single unconstrained minimization gives the solution of NLP together with the multipliers) without that nasty effect. Numerical tests with such methods are given in [40, 115]. In [115] the QP problems are constructed artificially with known solution and known condition number and minimized with the code LBFGSB of [22]. This solver was used with identical parameters and identical termination criterion. The computations were done on a HP9000/715 with double precision IEEE754 arithmetic (about 16 decimals). By proper selection of the termination criteria, termination took place only if the relative changes in the objective function were below 10^{−13}. The tables which follow show the precision obtained. DXR, DMR and DLR denote the norm of the error divided by the norm of the true solution, measured in the Euclidean norm, for the variables x, μ and λ respectively. The condition number of the Hessian and of the reduced Hessian was chosen identically in {10, 100, 1000, 10 000}, just the reciprocal of the value α_min listed in the tables. Such a condition number would not be considered overly bad.
n varied from 1000 to 10 000 in steps of 1000.


Method of Friedlander et al.

σ_min    DXR ∈                    DMR ∈                    DLR ∈
10^-1    [5·10^-6, 4·10^-5]       [6·10^-5, 1·10^-3]       [7·10^-7, 3·10^-5]
10^-2    [3·10^-4, 3·10^-3]       [5·10^-2, 1·10^-1]       [1·10^-5, 3·10^-3]
10^-3    [1·10^-3, 1·10^-2]       [8·10^-2, 1·10^-1]       [2·10^-5, 3·10^-2]
10^-4    [3·10^-3, 3·10^-2]       [9·10^-2, 2·10^-1]       [3·10^-5, 6·10^-2]

Method of Kanzow

σ_min    DXR ∈                    DMR ∈                    DLR ∈
10^-1    [2·10^-5, 1·10^-2]       [4·10^-4, 1·10^-1]       [9·10^-8, 3·10^-2]
10^-2    [2·10^-3, 2·10^-2]       [2·10^-1, 3·10^-1]       [6·10^-3, 8·10^-2]
10^-3    [7·10^-3, 4·10^-2]       [1·10^-1, 2·10^-1]       [8·10^-3, 1·10^-1]
10^-4    [1·10^-2, 5·10^-2]       [1·10^-1, 3·10^-1]       [6·10^-3, 1·10^-1]

Method of Spellucci

σ_min    DXR ∈                    DMR ∈                    DLR ∈
10^-1    [1·10^-6, 2·10^-5]       [2·10^-5, 3·10^-4]       [2·10^-11, 3·10^-6]
10^-2    [7·10^-5, 3·10^-4]       [5·10^-3, 4·10^-2]       [4·10^-12, 1·10^-6]
10^-3    [2·10^-4, 1·10^-1]       [5·10^-2, 9·10^-2]       [6·10^-12, 2·10^-2]
10^-4    [3·10^-3, 3·10^-1]       [4·10^-2, 9·10^-2]       [2·10^-11, 4·10^-2]

Taking into account that such QP solvers may possibly be used as a black box solver within other algorithms, it becomes clear from the above results, which indicate errors up to 30 percent for only moderately ill-conditioned problems, that the practical value of these approaches is quite questionable.

6. Linearly constrained problems: Interior-point methods

In the following we again restrict the discussion to the special problem

f(x) = min

with the constraints

B^T x − b = 0 , x ≥ 0 .

Again we require full rank of B. At first we only consider a convex quadratic f,

f(x) = (1/2) x^T A x − a^T x , A positive semidefinite,

including the LP case. Since Karmarkar's seminal paper [77] on the construction of methods of polynomial complexity for the LP problem there has been an enormous research effort on what is now known as "interior-point methods". Meanwhile the online archive [74] contains papers which appeared since 1994, and also a pointer to an older collection from Eberhard Kranich, making up some thousand contributions. Today the so called primal-dual interior-point methods are considered as the most powerful approach and we discuss here a typical candidate from this class. The development starts with considering the KTC conditions which in the convex case, given a Slater-point, are necessary and sufficient characterizations of the solution:

Ax − a − Bμ − λ = 0 , λ ≥ 0 ,

x ≥ 0 , λ_i x_i = 0 , i = 1, … , n , B^T x − b = 0 .

We consider this as a nonlinear system in x, μ and λ. Discarding the sign-constraints for x and λ we can write this as a nonlinear equation

         ( Ax − a − Bμ − λ )
F(z) =   (   −B^T x + b    )  = 0 ,
         (       Xλ        )

where X = diag(x_1, … , x_n). We parametrize the problem changing the complementarity condition to

Ax − a − Bμ − λ = 0 ,

λ_i x_i = ε , i = 1, … , n , B^T x − b = 0 .

On the set of "interior" points

x > 0 , λ > 0

the Jacobian of this system is always regular. For ε > 0 the unique solution describes a smooth curve with parameter ε (known as the "central trajectory") which tends for ε → 0 to one of the solutions of the initial problem.

Under the additional condition of strict complementarity of one solution this limit point can be precisely characterized (it is the "analytic" center of the solution set).

Remark: Observe that in the LP- and also in the not strictly convex QP-case the solution will not necessarily be unique.

This solution curve is now computed only pointwise and approximately, using the damped Newton method for ε fixed, and varying ε slowly. During this computation the constraints x > 0, λ > 0 must be maintained. If this is done naively, then one runs into serious trouble because of the singularities at the boundary, and the true art in devising an efficient algorithm consists in a sensible change of ε and a sensible selection of the stepsize parameter for damping the Newton step. It turns out that ε_k should depend on the duality gap

(x^k)^T λ^k / n .

A typical algorithm is the one of Kojima, Mizuno and Yoshise [78]:

J_F(z^k) Δz_N^k = −F(z^k)  (Newton step, to be solved),

J_F(z^k) Δz_C^k = ρ_k ē  (centering step, to be solved),

ρ_k = σ_k ((x^k)^T λ^k)/n ,

Δz^k = Δz_N^k + Δz_C^k = (Δx^k, Δμ^k, Δλ^k)^T ,

ᾱ_k = min{ min{ x_i^k/(−Δx_i^k) : Δx_i^k < 0 } , min{ λ_i^k/(−Δλ_i^k) : Δλ_i^k < 0 } } ,

α_k = min{1, τ_k ᾱ_k} ,

z^{k+1} = z^k + α_k Δz^k .

Here z = (x, μ, λ), τ_k ∈ ]0,1[, σ_k ∈ [0,1[ and ē = (0, … , 0, e)^T where e = (1, … , 1)^T ∈ ℝ^n.
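Stripped of all large-scale concerns, one damped step of this scheme can be sketched in a few lines (a toy illustration with dense linear algebra; the fixed choices σ_k = 0.1 and τ_k = 0.99 and all variable names are ours, not taken from [78]):

```python
import numpy as np

def kmy_step(A, a, B, b, x, mu, lam, sigma=0.1, tau=0.99):
    """One combined Newton/centering step of a Kojima-Mizuno-Yoshise type
    primal-dual method for  min 0.5 x'Ax - a'x  s.t.  B'x = b, x >= 0
    (an illustrative dense sketch, not a large-scale implementation)."""
    n, m = len(x), B.shape[1]
    # residuals of the parametrized KTC system
    r_dual = A @ x - a - B @ mu - lam
    r_prim = B.T @ x - b
    comp = x * lam
    rho = sigma * (x @ lam) / n               # centering target rho_k
    # Jacobian J_F(z), assembled densely
    J = np.block([
        [A,            -B,               -np.eye(n)],
        [-B.T,         np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(lam), np.zeros((n, m)), np.diag(x)],
    ])
    # combined right hand side: -F(z) plus rho*e in the complementarity block
    rhs = -np.concatenate([r_dual, -r_prim, comp - rho])
    dz = np.linalg.solve(J, rhs)
    dx, dmu, dlam = dz[:n], dz[n:n+m], dz[n+m:]
    # largest step keeping x > 0 and lam > 0, damped by tau
    def max_step(v, dv):
        neg = dv < 0
        return np.min(-v[neg] / dv[neg]) if neg.any() else np.inf
    alpha = min(1.0, tau * min(max_step(x, dx), max_step(lam, dlam)))
    return x + alpha * dx, mu + alpha * dmu, lam + alpha * dlam
```

Iterating this step on a small convex QP drives the duality gap (x^k)^T λ^k / n to zero while the iterates stay strictly positive.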

In the LP and convex QP case one can show total polynomial complexity of the algorithm for reaching the exact solution (in the case of exact rational input data) with a step number of O(√n L), where L is the so called input length. In the case of integer coefficients and an LP this is defined as

L = Σ_{i,j} log_2(|a_ij| + 1) + Σ_i log_2(|b_i| + 1) + Σ_j log_2(|c_j| + 1) .

After a finite number of steps the correct constraining manifold can be identified exactly, which subsequently allows one to abandon the path following and to move to the boundary in O(n) steps. Of course, this is a nice theoretical result (concerning the so called "small step" methods) but without any practical relevance. A rule of thumb states that "thirty steps suffice" (for the so called "large step" methods) to come sufficiently near to the desired solution. It remains to discuss how to solve the linear systems in this method in the true large-scale situation. The Jacobian of F reads

           (  A     −B    −I )
J_F(z) =   ( −B^T    O     O ) ,   Λ = diag(λ_i) .
           (  Λ      O     X )

A is the Hessian of f and hence positive semidefinite for convex f. If A = O, i. e. in the LP case, the solution of the linear systems with this matrix is relatively easy, since it can be reduced to a linear system with the (by assumption) positive definite and much smaller matrix B^T Λ^{-1} X B. In the case of a general semidefinite A this is no longer possible, and a first resort would be methods for large indefinite symmetric systems, e. g. the techniques developed by Duff, Reid and coworkers [34, 35]. Fortunately these matrices are not general indefinite, such that more efficient solution methods are available, see e. g. [59, 117]. There exist other specialized techniques, e. g. iterative preconditioned techniques like those considered by Freund and Jarre [53]. Primal-dual methods for the LP case are discussed in depth by St. Wright in [116].

Mittelmann [86] performed a large number of comparative studies of several interior-point- and simplex-based methods. From his results it can be seen that in many instances interior-point methods can by far outperform the simplex method, but there are problems concerning the reliability of such codes, which however continue to diminish over time. Since the codes tested are continuously updated and improved, the interested reader should directly consult Mittelmann's pages for obtaining the newest results.

It is possible to derive interior-point methods from the minimization of the so called logarithmic barrier function, a method known for a long time [41] and disregarded subsequently because of its inherent ill-conditioning. (In the case of a problem satisfying the linear independence and second order sufficiency conditions, a number of eigenvalues of the Hessian equal to the number of active inequality constraints tends to infinity, whereas the remaining ones stay bounded. If path following is used and the exact Hessian is available, this has only little effect. However, if one uses a general method of descent, then even slight ill-conditioning has a detrimental effect due to the loss of centrality. Restricted to the central path the conditioning of the problem is much better.) The logarithmic barrier method penalizes a step towards the boundary by a term which grows to infinity there, multiplied by a parameter (the ε above) which is driven to zero, such that the minimizer is approached from the interior of the feasible set. Mostly one uses the so called logarithmic barrier function

f(x) − ε Σ_{i=1}^n ln(x_i) = min_x , B^T x − b = 0

with ε → 0. The direct application of this idea requires a point x > 0 which satisfies the equality constraints. This defines a "feasible interior-point" approach, but there exist also "infeasible interior-point" methods, where the equality constraints are satisfied in the limit only.
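The central-trajectory behaviour is easy to visualize on a toy LP. For min x_1 + 2x_2 subject to x_1 + x_2 = 1, x ≥ 0, parametrized as x = (t, 1 − t), the barrier minimizer t(ε) has a closed form (this example is ours, for illustration only):

```python
import math

def central_path_point(eps):
    """Barrier minimizer of  t + 2*(1-t) - eps*(ln t + ln(1-t))  on (0,1),
    i.e. the central-path point of the toy LP  min x1 + 2*x2,
    x1 + x2 = 1, x >= 0.  Stationarity gives the quadratic
    t^2 - (1 - 2*eps)*t - eps = 0; the root in (0,1) is returned."""
    p = 1.0 - 2.0 * eps
    return (p + math.sqrt(p * p + 4.0 * eps)) / 2.0
```

As ε decreases, t(ε) moves monotonically towards the solution vertex t = 1, i.e. x = (1, 0), staying strictly inside the feasible segment for every ε > 0.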

Remark: If the QP problem is nonconvex, then there is no known algorithm of polynomial complexity to solve it. Even the decision whether a stationary point found is a local minimizer is NP-hard [108, 119]. The methods described above cannot be used for nonconvex cases.

Remark: In addition to the results mentioned above it is at least of theoretical interest that it is also possible to obtain superlinear convergence using these methods by proper choice of the parameters, see e. g. [103].

If f is a general smooth convex function, then the same techniques can in principle be applied, simply replacing the matrix A there by ∇²f(x^k). The sole additional complication is that the step size selection then has in addition to check the descent property of f. An algorithm built around these ideas has been developed from HOPDM, see [37].

7. Nonlinearly constrained problems

Concerning nonlinearly constrained problems there is presently not yet as much progress as with linearly constrained ones, although several successful solution approaches exist. The solution of small to medium-scale general nonlinear problems is nowadays routine. There exist commercial solvers with dense linear algebra, e. g. in the libraries NAG and IMSL, and some more powerful ones which already use sparse matrix technology, capable of dealing with at least some thousand variables, see e. g. the optimization software guide by Moré and Wright [90] and the nonlinear programming FAQ [51]. There are even codes in the public domain which perform quite well [86, 87].

From the older approaches like penalty and the classical augmented Lagrangian methods, only the generalized reduced gradient method survived, e. g. the code GRG2 of Lasdon (see e. g. [51]). This is an active set method and can be considered as an adaptation of the simplex method to nonlinear constraints. Typically the problem is formulated as

f(x) = min , h(x) = 0 , x_i ≥ 0

by introducing positive slacks for the original inequality constraints. Given a feasible point x, a direction of descent d is computed which satisfies (∇h(x))^T d = 0 and d_i = 0 if x_i = 0 for i ∈ J, the current active set. For a prospective move

x → x + αd

a restoration value r(x, d, α) is computed such that

h(x + αd + r(x, d, α)) = 0 , x + αd + r(x, d, α) ≥ 0 .

Of course r(x, d, α) = O(α²). The computation of r requires the solution of a nonlinear system of equations for every instance of α. α itself is computed using e. g. backtracking for f along that arc. Freedom exists in defining d; it may be obtained by solving an equality constrained QP subproblem, with the sole restriction of A being positive definite on the kernel of B^T. Here A stands for the Hessian of the Lagrangian or a replacement thereof and B is (∇h(x), (e_i)_{i∈J}). In addition one needs some inactivation scheme for eliminating indices from J, based on the signs of the current multiplier estimates. This type of method is reliable but too expensive, due to the maintenance of feasibility and the necessity of providing a feasible point first, which requires some phase I like method. Therefore other approaches are currently in the center of interest. Especially SQP methods, which can be interpreted as modified Newton methods for the KTC system combined with some globalization scheme (mostly via a so called merit function), have shown enormous success for medium-scale problems.
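The restoration computation mentioned above (solving h(x + αd + r) = 0 with r in the range space of the constraint normals) can be sketched as follows (an illustrative Newton-type iteration, not Lasdon's actual implementation; names are ours):

```python
import numpy as np

def restoration(h, Jh, x_new, iters=30, tol=1e-12):
    """Newton-type restoration for the GRG move (sketch): find r with
    h(x_new + r) = 0 and r restricted to the range space of the
    constraint normals.  Jh(x) returns the n x m matrix whose columns
    are the gradients of the components h_i."""
    r = np.zeros_like(x_new)
    for _ in range(iters):
        v = h(x_new + r)
        if np.linalg.norm(v) < tol:
            break
        J = Jh(x_new + r)
        # restrict the correction to range(J): r += J*y with J'J y = -v
        y = np.linalg.solve(J.T @ J, -v)
        r = r + J @ y
    return r
```

For a single quadratic constraint this iteration pulls an infeasible trial point back onto the constraint manifold in a handful of steps, which illustrates why a full nonlinear solve per trial stepsize α is costly.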

However, for large-scale problems (with more than some thousand variables and even larger ones) the solution approaches have not yet matured. One might discern five different approaches to tackle such large-scale problems:

i. Modification of well established methods, simply by introducing sparse matrix techniques into the linear algebra part.

ii. Transformation into bound constrained problems, with subsequent use of the specialized solution techniques for the latter.

iii. Methods of linearization (SLP).

iv. New variants of the SQP method.

v. Direct adaptations of interior-point methods.

In the first group we find the further development of the code MINOS by Saunders [107], work of Gill, Murray and Saunders on SNOPT [58] (a special instance of a SQP-method which evolved from NPSOL) and for example the SQP methods of Nickel and Tolle [96], Betts and Frank [6], Bartholomew-Biggs and Hernandez [5], Ni [95] and Biegler and Nocedal [7].

MINOS is very efficient if there are only few and only mildly nonlinear constraints. Here a sequence of nonlinear but linearly constrained optimization problems is solved, following Robinson's method [104]. Globalization is obtained by the classical exterior quadratic penalty function. The linear constraints are dealt with using the elimination technique of the simplex method (indeed, MINOS is the long standing reference for an implementation of the simplex method). If nonlinearity is more pronounced and there are many nonlinear constraints, this method is inferior to other approaches.

SNOPT takes advantage of possible sparsity of the gradients of the constraints, using sparse matrix methods for a factorization which is used in the steps of a primal QP solver within an SQP method. Using this factorization a reduced Hessian is defined which is needed to solve the KTC like systems. It is normally a full matrix; hence either one restricts oneself to problems with a rather modest number of free variables (a case for which SNOPT was originally designed, a discretized control problem) or one resorts to the use of limited-memory quasi-Newton matrices.

Similarly, Bartholomew-Biggs's approach is based on exploiting sparsity of constraint gradients and limited-memory approximations of the Hessian of the Lagrangian (this time the full matrix). However, like other users of the limited-memory approach, the authors report poor performance of this technique in the case of nonconvex or ill-conditioned test cases. This corresponds to the numerical experience of the present author.

The papers [4, 5] describe how sparse-matrix techniques can be used directly in the kernel of an otherwise unchanged SQP code, the QP solver. Since one cannot assume that the Hessian of the Lagrangian is positive definite, the arising systems are not quasi-definite as in interior-point methods for convex QPs. The key role here is played by techniques for solving sparse symmetric indefinite systems, as developed by Duff, Reid and coworkers, which have gone into the codes MA27 and MA47 of the Harwell library. This technique also plays a key role in many other approaches. There is a disadvantage: the user must specify the nonzero patterns of the gradients and of the Hessian of the Lagrangian. If one uses a modeling system, this will be provided automatically, but otherwise it may be a prohibitively painstaking job. These systems are of symmetric indefinite KTC type.

Keeping A sparse requires exact second derivatives or difference approximations thereof, since quasi-Newton updates will be full. Finite differences might be very costly, if possible at all. For problems of very high dimension the Bunch-Parlett decomposition is of little use because of the increased fill-in involved with it and also because of numerical problems occurring if pivoting is limited. Iterative solvers for such systems are known "in principle", e. g. the classic SYMMLQ [98], see also [9]. However, due to problems with detecting rank deficiency and indefiniteness of the reduced Hessian they are not reliable. But there is intense research in this area, see e. g. [53, 63]. The last cited work applies the conjugate gradient method to the system with the reduced Hessian without setting this up explicitly.

There is an even more problematic point. Indefinite QP problems which occur quite often in the SQP method if applied to nonconvex problems using the exact Hessian are regularized by a spectral shift, using a norm of the matrix or a similar crude eigenvalue bound. This is done for example by Franke [52] and by Conn, Gould and Toint in [29], with obviously bad outcome.

Ni's approach is a further development of the ideas of Nickel and Tolle. Here the inverse of the Hessian of the Lagrangian, or of a positive definite replacement thereof, is approximated by a limited-memory quasi-Newton update. The QP subproblem is solved via its dual. Problems with an incompatible primal QP are circumvented by regularizing the dual such that it always has a bounded solution.

Biegler, Nocedal and Schmid [7] consider the equality constrained case only. In principle they use the standard SQP approach, however decomposing the correction explicitly into a "vertical" and a "horizontal" part. (The "vertical" part here and at other places always concerns a correction of infeasibility. This correction is typically in the range space of the gradients of the (active) constraints.) The essential contribution of this paper is the demonstration of how to compute this decomposition efficiently in the large-scale case. Numerical results are presented in the follow-up paper [8].

7.1. Transformation into an only bound constrained problem

Transformation into an only bound constrained problem is the basic idea of the LANCELOT project of Conn, Gould and Toint [28]. One introduces slack variables into the general inequality constraints and obtains a problem with only equality constraints as general constraints and simple bound constraints as the only inequality constraints. For reasons of simple presentation we consider the case that all variables are bounded from below by zero:

(NLP:) f(x) = min , x ≥ 0 , c(x) = 0 .

Then the LANCELOT (large and nonlinearly constrained extended Lagrangian optimization technique) approach is based on the following


Theorem 7.1. Let x* be a local solution of NLP with multipliers μ* and λ* for the equality and inequality constraints. Assume ∇c(x*) being of full rank and ∇²_xx L(x*, μ*, λ*) positive definite on the kernel of ∇c(x*)^T. Then there exists some γ₀ > 0 such that for γ ≥ γ₀

x(μ) = argmin_x { f(x) − μ^T c(x) + (γ/2)||c(x)||² : x ≥ 0 }

is well defined on a neighborhood U(μ*) as a function of μ and

Φ(μ) = f(x(μ)) − μ^T c(x(μ)) + (γ/2)||c(x(μ))||²

possesses in μ* an unconstrained strict local maximizer.

For given μ, x(μ) will be computed by a bound constrained minimization and Φ(μ) is simply the minimal value in this process. The function Φ(μ) itself is maximized with respect to μ. It is of great help that ∇Φ(μ) = −c(x(μ)) and ∇²Φ(μ) = −(1/γ)I + O(1/γ²). This has the consequence that for sufficiently large γ even the simplest gradient method will converge quickly. All these relations indeed hold only locally for a general c, whereas for an originally convex problem the validity of the approach is global. There exist theoretical bounds for γ₀, but these cannot be evaluated in practice. They involve the singular values of ∇c and the eigenvalues of the Hessian and the reduced Hessian of the original Lagrangian function. These facts and the necessity to choose γ appropriately (large enough, but not too large, since otherwise the minimization will become troublesome) make the approach problematic. But for a long time this had been the only access to really large-scale problems. Meanwhile alternatives have evolved, see below. If the required strong regularity assumptions are not satisfied, this will cause additional trouble, although a rank deficient ∇c doesn't hurt too much. The inner/outer iteration is expensive, although the outer maximization usually converges fast. It is possible to increase the efficiency by using the problem structure introduced by the slacks. This is discussed for example in the paper [30]. The weak points of the extended Lagrangian approach are discussed in the book [114], sections 3.5.2 and 3.5.3.
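The interplay of inner bound constrained minimization and outer multiplier update μ ← μ − γc(x(μ)) (a first-order step justified by ∇Φ(μ) = −c(x(μ))) can be sketched on a toy problem. Projected gradient descent as inner solver and the single scalar constraint are our simplifications; LANCELOT uses far more sophisticated machinery:

```python
import numpy as np

def alm_bound_constrained(f_grad, c, c_grad, x0, gamma=10.0,
                          outer=20, inner=2000, lr=0.01):
    """Sketch of the LANCELOT idea for  f(x) = min, c(x) = 0, x >= 0
    with a single scalar constraint c: inner bound-constrained
    minimization of the augmented Lagrangian by projected gradient
    descent, outer multiplier update mu <- mu - gamma*c(x(mu))."""
    x, mu = np.asarray(x0, dtype=float).copy(), 0.0
    for _ in range(outer):
        for _ in range(inner):
            # gradient of f(x) - mu*c(x) + (gamma/2)*c(x)^2
            g = f_grad(x) + (-mu + gamma * c(x)) * c_grad(x)
            x = np.maximum(x - lr * g, 0.0)   # projection onto x >= 0
        mu = mu - gamma * c(x)                # since grad Phi(mu) = -c(x(mu))
    return x, mu
```

For f(x) = (1/2)||x||² and c(x) = x_1 + x_2 − 1 this converges to x* = (1/2, 1/2) with multiplier μ* = 1/2, and the outer iteration contracts the multiplier error by roughly 1/(1 + 2γ) per step.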

7.2. Linearization methods

The pure linearization methods (SLP-methods) use the fact that there exists a large choice of highly sophisticated software for LP problems which is capable of solving problems with up to millions of variables and constraints (sparsity assumed). One computes a direction of descent dk for a given exact penalty function at a given guess xk as a solution of the LP-problem

∇f(x^k)^T d + γw = min ,

−g(x^k) − ∇g(x^k)^T d ≤ we ,
−h(x^k) − ∇h(x^k)^T d ≤ we ,
h(x^k) + ∇h(x^k)^T d ≤ we ,
w ≥ 0 ,
−ρ_k ≤ d_i ≤ ρ_k , i = 1, … , n .

Here w is a slack variable which serves the purpose of making the linear constraints of this subproblem always compatible, such that the subproblem is solvable due to the artificial compactness conditions on d. One aims at choosing γ sufficiently large, such that w = 0 is achieved whenever this is possible. ρ_k is the so called "trust-region radius", which is computed adaptively, depending on the descent obtained for the penalty function (in this case the exact penalty function). This method cannot converge fast; it corresponds to ordinary gradient descent in unconstrained minimization. But there exist a lot of enhancements, see e. g. [47, 75].
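The LP subproblem above is in a form any standard LP solver accepts. A sketch of assembling its data over the variables z = (d, w) follows (illustrative only; the resulting arrays could be handed e.g. to scipy.optimize.linprog):

```python
import numpy as np

def slp_subproblem_data(grad_f, g, Jg, h, Jh, gamma, rho):
    """Assemble the SLP subproblem
        min  grad_f'd + gamma*w
        s.t. -g - Jg d <= w e,  +-(h + Jh d) <= w e,
             w >= 0,  -rho <= d_i <= rho,
    as LP data (cost, A_ub, b_ub, bounds) over z = (d, w) for a generic
    LP solver (a sketch; Jg, Jh are the constraint Jacobians at x^k)."""
    n, mg, mh = len(grad_f), len(g), len(h)
    col = lambda m: -np.ones((m, 1))          # coefficient of -w
    cost = np.concatenate([grad_f, [gamma]])
    A_ub = np.vstack([
        np.hstack([-Jg, col(mg)]),            # -Jg d - w <=  g
        np.hstack([ Jh, col(mh)]),            #  Jh d - w <= -h
        np.hstack([-Jh, col(mh)]),            # -Jh d - w <=  h
    ])
    b_ub = np.concatenate([g, -h, h])
    bounds = [(-rho, rho)] * n + [(0.0, None)]
    return cost, A_ub, b_ub, bounds
```

By construction the point d = 0 with w equal to the maximal constraint violation is always feasible, which is exactly the compatibility role of the slack w.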

7.3. Modifications of the SQP method

The articles [64] and [91] describe existing possibilities and the second one also gives some review of existing software. We restrict ourselves here to a short review of the most promising approaches for which there is some numerical experience.

Franke [52] uses a classical SQP method, essentially Schittkowski's method [110]. The emerging QP subproblem

(1/2) x^T A x − a^T x = min , B^T x − b = 0 , C^T x − c ≥ 0

is then solved by an interior-point method, considering the parametrized KTC system

Ax − a − Bμ − Cλ = 0 , B^T x − b = 0 , w = C^T x − c ≥ 0 , λ ≥ 0 , Λw − εe = 0

with parameter ε, ε → 0. This nonlinear system is solved approximately via Newton's method. The resulting linear systems are solved by a modification of the sparse Bunch-Parlett code from the MESCHACH library of Stewart and Leyk (to be found as a tar-file in netlib/cephes). Here A represents the Hessian of the Lagrangian. If this matrix turns out to be indefinite, then Franke regularizes it as done in [4, 29], using estimates of the eigenvalues provided by Gerschgorin discs. As is to be expected, the numerical results are disappointing if this occurs. As a further severe restriction, Franke's approach misses a systematic treatment of inconsistent QP problems, which occur rather often in nonconvex cases. The approach has gone into a code HQP (see [87]) which solved a bunch of high-dimensional problems from the CUTE collection successfully.

Plantenga [100] first transforms a general NLP into an equality constrained one with additional bound constraints, using slack variables for the inequality constraints. For reasons of simple presentation we again assume that all variables are subject to the positivity constraints:

f (x) = min , c(x) = 0 , x > 0 .

The solution of this problem is then approximated by the minimization of a barrier function subject to the equality constraints:

f(x) − ε Σ_{i=1}^n ln(x_i) = min , c(x) = 0 , x > 0 ,

that means the barrier technique here is used at the outer stage, as opposed to Franke's approach, which uses it at the inner stage. The resulting equality constrained problem in turn is solved using the method of Byrd and Omojokun. The latter is a specialized method for equality constrained problems, a variant of Burke's method [16]. This is a trust-region method. In every step two QP problems with an additional trust-region constraint have to be solved.

||∇c(x^k)^T v + c(x^k)||_2 = min , ||v||_2 ≤ ζΔ_k ,    (7.3)

∇f(x^k)^T d + (1/2) d^T G_k d = min ,    (7.4)

∇c(x^k)^T d = ∇c(x^k)^T v^k ,    (7.5)

||d||_2 ≤ Δ_k .

Here v^k is the solution of (7.3). This is an example of the so called "vertical" correction: locally it points vertically to the boundary of the feasible set. This vertical correction serves the purpose of diminishing the infeasibility. The parameter ζ is chosen arbitrarily, but fixed in ]0,1[. In (7.4), (7.5) d is decomposed into d = v^k + Z_k u, where Z_k is a basis of the kernel of ∇c(x^k)^T. Then the condition (7.5) is satisfied automatically. In order to compute such a basis one uses sparse matrix techniques, e. g. the code MA28 of the Harwell library. The trust-region radius Δ_k is computed adaptively as usual, checking descent for the exact penalty function

f(x) − ε Σ_{i=1}^n ln(x_i) + γ||c(x)||_2

and ε is diminished in the outer iteration. Plantenga's first version showed performance comparable with LANCELOT. A detailed exposition is to be found in the paper [80]. This method has been further developed by Hribar [73].

A completely different approach is taken by Boggs and coworkers, [10-12]. The algorithm can be used for problems with mixed constraints, but like the authors in their papers we restrict ourselves here to the inequality constrained case, changing also the formulation into our standard form with constraints of the form g(x) > 0. Therefore the original problem now reads

f(x^I) = min , g(x^I) ≥ 0 .

After introducing slacks x^II > 0 the problem becomes formally one with equality constraints, the difference lying in the fact that the slacks are kept strictly positive:

x = (x^I, x^II) , c(x) = g(x^I) − x^II = 0 .

The backbone of the method is the classical SQP method with Fletcher's differentiable penalty function as a merit function. This is the function

F(x) = f(x) − μ(x)^T c(x) + (γ/2)||Λ(x)^{1/2} c(x)||²

where

μ(x) = (∇c(x)^T ∇c(x))^{-1} ∇c(x)^T ∇f(x) .

Since the QP solver used is an interior-point method which maintains strict positivity of the slacks (by means of the stepsize selection), these can be eliminated from the computation. That means that no increase of dimension occurs here. Infeasible initial values are treated by a "big M" method, the QP problems being of the type

c^T d + (1/2) d^T A d + Mθ = min ,

B^T d + b + θe ≥ 0 , θ ≥ 0 .

If the algorithm is to be used with an equality constrained problem, then this "big M" phase persists throughout (since a strictly feasible point for the original problem is never found). During the stepsize algorithm the estimate μ(x) for the multipliers and the weight matrix Λ(x) for the quadratic penalty term are frozen in order to reduce the evaluation effort for the Fletcher function. Nevertheless every function value requires the solution of a linear system with the matrix

∇g(x^k)^T ∇g(x^k) + Z_k ,

where g is the vector of inequality constraints and Z_k the diagonal matrix built from the values of the slack variables x^{k,II}. This matrix is always positive definite, but may be ill-conditioned. This introduces some trouble if the system must be solved iteratively. Furthermore the parameter γ in Fletcher's function is not so easy to obtain, which produces even more trouble for nonconvex problems.

The QP solver itself is an inexact one and works with small dimensional subproblems of dimension up to three, which in turn are solved exactly. The basis of these subspaces is spanned by solutions of systems of the form

(∇g(x^k) Z_k^{-2} ∇g(x^k)^T + Q_k/ρ_k) p_{i,k} = t_{i,k} , i = 1, 2, 3,

where Q_k is an approximation for the Hessian of the Lagrangian and ρ_k is chosen sufficiently large. These are systems of exactly the same structure as occur when applying the classical barrier method for inequality constrained problems in connection with Newton's method. Should Q_k turn out to be not positive definite, then also directions of negative curvature for the Lagrangian are used. The paper [12] contains an overview of the numerical results obtained so far.

In the approach of Felkel and Spellucci the starting point again is a classical SQP method, this time using the nondifferentiable exact penalty function

φ(x; γ) = f(x) + γ ||(h(x), g(x)^-)||_∞

for a problem with mixed constraints, with γ chosen adaptively. This function has the advantage of requiring minimal global conditions on the problem. Its clear disadvantage is its scale dependency, which requires adaptive and dynamic scaling of the problem. For this task a satisfactory practical solution was given already in [71].

A direction of descent for this function is obtained from the QP problem

(1/2) d^T B_k d + ∇f(x^k)^T d + γw + (α/2)w² = min ,

−g(x^k) − ∇g(x^k)^T d ≤ we , −h(x^k) − ∇h(x^k)^T d ≤ we , h(x^k) + ∇h(x^k)^T d ≤ we , w ≥ 0 ,

where w is a single slack variable which makes the problem compatible. As usual, e = (1, … , 1)^T. But of course one wishes to obtain w = 0, which in turn guarantees that d^k is also a direction of descent for the penalty term alone. This is obtained by increasing γ cautiously. The QP subproblem is not solved exactly as in [71], but only approximately, by minimizing a shifted logarithmic barrier function for this problem (observe that there are only inequality constraints due to the transformation of an equality constraint into two inequalities with slack), maintaining the constraint w ≥ 0. For a restriction r_i(d) of the QP problem the barrier term is

b(r_i; s, η) = ln(r_i + s) if r_i ≥ η − s ,
b(r_i; s, η) = a_0 + a_1 r_i + a_2 r_i² + a_3 r_i³ otherwise,

where the parameters a_j are chosen such that C³ continuity is satisfied. Here s > 0 is the shift, which must be chosen sufficiently small but must not tend to zero, and η is a parameter. This avoids the singular behaviour at the solution, [102].
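The cubic continuation can be sketched directly: below the switch point it is simply the third-order Taylor polynomial of ln(r + s) there, which yields the required matching of value and first three derivatives (the switch point tau and the function name are ours; this is a sketch of the construction, not the exact published formula):

```python
import math

def shifted_barrier(r, s, tau):
    """Shifted log barrier term: ln(r+s) for r >= tau, continued below
    tau by the cubic matching value and first three derivatives at tau
    (C^3 join).  tau plays the role of the switch point; sketch only."""
    if r >= tau:
        return math.log(r + s)
    u = tau + s                     # ln(u) and its derivatives at the joint
    d0, d1, d2, d3 = math.log(u), 1.0 / u, -1.0 / u**2, 2.0 / u**3
    t = r - tau
    return d0 + d1 * t + d2 * t**2 / 2.0 + d3 * t**3 / 6.0
```

Unlike the pure log barrier, this function stays finite for arbitrarily negative arguments, which is exactly what removes the singular behaviour at the boundary.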

This minimization is accomplished by an inexact Newton method based on the Lanczos algorithm. This QP solver has first been tested as a stand-alone solver for QP problems with great success, see [38]. In the current version of the NLP solver several inefficiencies are to be observed. These occur if the matrices are not sufficiently positive definite or if there is a large number of strongly nonlinear equality constraints. In the last case there occur steeply grooved and narrow valleys (due to the choice of a possibly large γ and a small slack w), which make the method slow even if combined with second order corrections. Also the descent property of d^k is sometimes lost if the solution precision in the QP solver is relaxed. Improvements are a theme of current research.

The filter SQP method of Fletcher and Leyffer [43-46] avoids the usage of a merit function altogether, and thereby the complications involved in choosing the penalty parameters appropriately. Rather it considers NLP as a special instance of vector optimization

( f(x*) )         ( f(x) )
( θ(x*) )  "≤"    ( θ(x) )

where

θ(x) := max{ ||h(x)||_∞ , ||g(x)^-||_∞ } .

Two types of moves are considered. If the trust-region QP problem

m_k(d) = f(x^k) + ∇f(x^k)^T d + (1/2) d^T A_k d = min_d ,

g(x^k) + ∇g(x^k)^T d ≥ 0 , h(x^k) + ∇h(x^k)^T d = 0 , ||d|| ≤ Δ_k ,

with the trust-region radius Δ_k is solvable, then its solution is taken as a prospective move and its acceptability is checked; otherwise a restoration move is computed, aiming at diminishing θ. This might be e. g. a solution of

w = min ,

g(x^k) + ∇g(x^k)^T d + we ≥ 0 ,
h(x^k) + ∇h(x^k)^T d + we ≥ 0 ,
−h(x^k) − ∇h(x^k)^T d + we ≥ 0 ,
||d|| ≤ Δ_k .

The acceptance of a move and the trust-region radius are controlled by a so called "filter". This is a set of values (f(x^s), θ(x^s)), none of which is dominated by the others in the sense of the half order of ℝ², and with θ(x^s) ≠ 0. If the QP problem is infeasible, the corresponding values are always added to the filter and a move is made by a restoration step, aiming at computing some pair (x^{k+1}, Δ_{k+1}) such that the corresponding new QP problem becomes feasible. Otherwise the values at x^k are added to the filter if

m_k(0) − m_k(d^k) < κ θ(x^k)²

for some constant κ > 0. The point x^{k+1} = x^k + d^k is accepted if

θ(x^{k+1}) ≤ (1 − γ₀)θ_j or f(x^{k+1}) ≤ f_j − γ₀θ_j

for all values (θ_j, f_j) in the filter and if in addition

(f(x^k) − f(x^{k+1})) / (m_k(0) − m_k(d^k)) ≥ σ > 0 .

In this case Δ_{k+1} can be increased; in all other cases Δ_{k+1} is decreased. γ₀ is chosen arbitrarily but fixed in ]0,1[. After adding a pair (θ_k, f_k) to the filter, the filter is purged by deleting all pairs with θ_j ≥ θ_k and f_j ≥ f_k. This method is quite flexible; in principle it uses a merit function whose weights may change every step (via scalarization of the vector optimization problem). The code filter can be used via NEOS.
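The filter bookkeeping itself is only a few lines (our sketch of the dominance test, the acceptance test with margin γ₀, and the purge; the constants are illustrative):

```python
def dominates(p, q):
    """p = (theta_p, f_p) dominates q in the half order of R^2."""
    return p[0] <= q[0] and p[1] <= q[1]

def acceptable(theta, f, filt, gamma0=1e-3):
    """Trial point acceptable to the filter: against every stored pair
    it must improve feasibility or objective by the margin gamma0
    (a sketch of the Fletcher-Leyffer acceptance test)."""
    return all(theta <= (1.0 - gamma0) * th or f <= fj - gamma0 * th
               for th, fj in filt)

def add_to_filter(theta, f, filt):
    """Add (theta, f) and purge all entries it dominates."""
    kept = [entry for entry in filt if not dominates((theta, f), entry)]
    kept.append((theta, f))
    return kept
```

Because dominated pairs are purged on insertion, the filter stays an antichain in ℝ², and its size remains modest in practice.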

7.4. Adaptation of interior-point methods

The modified SQP methods described above result in multistage algorithms of inner/outer iteration type. Since the QP problems inherit the combinatorial nature of the original problem, their solution itself may be quite costly, especially in the large-scale case. Therefore much effort has gone into avoiding this by using some more direct approach.

Conn, Gould and Toint [29] describe a new approach for minimizing general nonlinear functions subject to linear equality and bound constraints. Nonlinear constraints could be incorporated via an augmented Lagrangian. The problem

min_x f(x) ,   B^T x = b ,   x ≥ 0

is embedded into the higher dimensional one

min_{x,ξ} f(x) + ½ p(ξ + 1)² ,

B^T x − b = ξ(B^T x⁰ − b) ,   x ≥ 0 ,   ξ ≥ 0 ,

where the penalty parameter p is fixed in principle but has to be determined adaptively. The Kuhn-Tucker conditions for this extended problem become parametrized and the parametric systems are solved by Newton's method, with the Bunch-Parlett decomposition as linear solver:

∇f(x) − Bμ − λ = 0 ,
−(B^T x⁰ − b)^T μ + p(ξ + 1) = 0 ,
B^T x − b − ξ r⁰ = 0 ,
XΛe − εe = 0 ,
x, λ ≥ 0 ,

where r⁰ = B^T x⁰ − b.

The logarithmic barrier function

f(x) + ½ p(ξ + 1)² − ε Σ_{i=1}^n ln(x_i)

serves as a merit function for stepsize control. If ∇²f(x) is not positive definite, it is regularized by the additive term ‖∇²f(x)‖₁ I. Numerical results are reported for large QP problems. In the convex case the results are very good, compared e.g. with Gould's direct method (code VE09), but for nonconvex cases they are completely disappointing, as is to be expected from this crude regularization. Devising a good adaptive scheme for the penalty parameter p is difficult. It depends, among other things, on the reciprocal of the smallest positive eigenvalue of the indefinite matrix

( ∇²f(x)   B )
( B^T      O ) .

Attempts have also been made to use the classical barrier and penalty functions more directly, with some modifications for large-scale problems, in the hope of exploiting the progress made in high-dimensional unconstrained minimization, e.g. Nash and Sofer [94] or Shanno and Breitfeld [15]. The results obtained this way are not overly encouraging. The main reason for this lies in the fact that one loses too much of the "centering" when the penalty parameter is changed. Along the central trajectory the behaviour of these functions is much more benign than even slightly off it, especially if one uses quasi-Newton or conjugate gradient type methods.

Forsgren and Gill [50] use the minimization of a mixed penalty-barrier function which was already described by Fiacco and McCormick. One of the problems in connection with this function is the fact that the theoretical multiplier estimates provided by it, e.g.

ε / g_i(x(ε)) → λ_i ,

are quite unreliable in practice.

are quite unreliable in practice. The authors try to alleviate this effect by adding penalty terms for these deviations in a primal-dual framework:

Φ(x, p, λ; ε, γ) = f(x) + (1/(2ε)) Σ_{i=1}^p [ c_i(x)² + γ(c_i(x) + ε p_i)² ] − ε Σ_{i=1}^n [ ln(x_i) + γ( ln(x_i λ_i/ε) + 1 − x_i λ_i/ε ) ] .

This function is minimized simultaneously with respect to x, p and λ. ε is the path parameter, ε → 0, while γ > 0 is kept fixed. For ε fixed, minimization is performed by a damped Newton method, making use also of directions of negative curvature, which are available through the Bunch-Parlett decomposition of the Hessian. A special pivoting rule for this decomposition is at the heart of their algorithm. Thereby convergence to second order stationary points can be guaranteed. Numerical results are described in [81]. The outcome of Laux's experiments is a severe scale dependency of the chosen penalty function. Furthermore, a good choice of the parameters involved is problematic and, last but not least, a point satisfying the inequality constraints must be known as an initial guess.

In the work [19] a method is investigated which treats only the slack variables by an interior-point approach. A general NLP problem

f(x) = min ,   c_E(x) = 0 ,   c_I(x) ≥ 0

is transformed into

f(x) = min ,   c_E(x) = 0 ,   c_I(x) − s = 0 ,   s ≥ 0 ,

and this in turn is embedded in a family of equality constrained problems

f(x) − ε Σ_{i=1}^m ln(s_i) = min ,

c_E(x) = 0 ,   c_I(x) − s = 0 .

For ε fixed this is an equality constrained nonlinear optimization problem which is solved by an SQP method; a trust-region approach is taken. It turns out that in constructing the trust region the slack variables need a special treatment, namely the use of a scaled norm like

‖(d_x, S^{-1} d_s)‖ ≤ Δ ,

where d_x designates the change in x and d_s the change in s and, as usual, S = diag(s_1, … , s_m). A quite general convergence theory is developed. These investigations are continued in the paper by Byrd, Hribar and Nocedal [20]. There an essentially new aspect comes up, namely the formulation of an adequate quadratic model for the Lagrangian of the barrier problem with respect to the slack variable s. It turns out that the obvious choice εS^{-2} for the quadratic part corresponding to s is not favourable, since it gives directions which tend to violate the feasibility of s and hence enforce a reduction of the step size. Much better seems to be the choice S^{-1}Λ, where Λ designates the diagonal matrix with the multiplier estimates for the equality constraints c_I(x) − s = 0. These estimates are computed separately using the least squares estimate from the KTC conditions for x and s fixed. The paper contains a detailed discussion of how to solve the subproblems efficiently and only approximately. Promising numerical results are presented for the implementation NITRO. NITRO can be tested via the NEOS submission tool.
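The feasibility issue for the slacks is the usual one in interior-point practice: any step d_s must be cut back so that s stays strictly positive. A small illustrative helper along these lines (the fraction-to-the-boundary factor tau is my assumption, not a parameter from the cited papers):

```python
import numpy as np

def max_step(s, d_s, tau=0.995):
    """Largest alpha in (0, 1] keeping s + alpha * d_s strictly positive,
    the classical fraction-to-the-boundary rule."""
    neg = d_s < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, tau * float(np.min(-s[neg] / d_s[neg])))

def scaled_norm(d_x, d_s, s):
    """Trust-region measure ||(d_x, S^{-1} d_s)||_2 for the slack scaling."""
    return np.sqrt(np.dot(d_x, d_x) + np.dot(d_s / s, d_s / s))

s = np.array([1.0, 0.5])
alpha = max_step(s, np.array([-2.0, 0.1]))  # first component would hit 0 at 0.5
```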

Gay, Overton and M. Wright [56] consider a general NLP problem with mixed equality and inequality constraints. Different from their competitors they treat the inequality constraints directly by barrier terms. Hence

f(x) = min ,   h(x) = 0 ,   g(x) ≥ 0

is solved via

f(x) − ε Σ_j ln(g_j(x)) = min ,   h(x) = 0

with ε → 0. The parameter dependent KTC conditions

∇h(x)y + ∇g(x)z − ∇f(x) = 0 ,
h(x) = 0 ,
G(x)z − εe = 0 ,   G = diag(g_j(x)) ,

are solved by Newton's method, after symmetrization and partial elimination:

( −K    A ) ( d_x )     ( ∇f − Ay − ε B^T G^{-1} e )
( A^T   O ) ( d_y )  =  ( −h )

with K = H + B^T G^{-1} Z B .

An indefinite K is regularized to definiteness, but such that its eigenvalues tending to infinity are not touched. Two merit functions are used, namely the norm of the right-hand side of the Newton equation and periodically also some type of extended Lagrangian

f(x) − ε Σ_j ln(g_j(x)) − y^T h(x) + (c/2)‖h(x)‖²

(with c finite and in principle fixed). This device serves the purpose of preventing convergence of Newton's method to saddle points or maxima. Numerical results for low dimensional problems are reported. No complete convergence analysis seems to exist.
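To make the structure of such a parameter dependent KTC system concrete, here is a minimal damped Newton iteration on a toy instance (min x₁² + x₂² s.t. x₁ + x₂ = 1, x₁ ≥ 0 — my own example, not from [56]; the symmetrization, regularization and merit-function machinery of the paper is omitted):

```python
import numpy as np

eps = 1e-6  # barrier parameter

def residual(v):
    """KTC residual for f = x1^2 + x2^2, h = x1 + x2 - 1, g = x1:
    grad(h)*y + grad(g)*z - grad(f) = 0,  h = 0,  g*z - eps = 0."""
    x1, x2, y, z = v
    return np.array([y + z - 2 * x1,
                     y - 2 * x2,
                     x1 + x2 - 1.0,
                     x1 * z - eps])

def jacobian(v):
    x1, x2, y, z = v
    return np.array([[-2.0, 0.0, 1.0, 1.0],
                     [0.0, -2.0, 1.0, 0.0],
                     [1.0,  1.0, 0.0, 0.0],
                     [z,    0.0, 0.0, x1]])

v = np.array([0.6, 0.4, 1.0, eps / 0.6])        # strictly interior start
for _ in range(50):
    d = np.linalg.solve(jacobian(v), -residual(v))
    alpha = 1.0
    while min(v[0] + alpha * d[0], v[3] + alpha * d[3]) <= 0:
        alpha *= 0.5                             # keep x1 > 0 and z > 0
    v = v + alpha * d
    if np.linalg.norm(residual(v)) < 1e-12:
        break
```

For small ε the iterates approach the constrained minimizer (0.5, 0.5), as one expects from the barrier trajectory.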

Vanderbei and Shanno [118] make use of the powerful QP solver LOQO of Vanderbei. It serves as the heart of an SQP code. Here too, inequality constraints g(x) ≥ 0 are transformed into equations using positive slacks which in turn are treated by the logarithmic barrier term. The method computes directions of descent for the inexact penalty-barrier function

f(x) − ε Σ_{i=1}^m ln(s_i) + (β/2)‖g(x) − s‖²

from the reduced system

( H      O         −A ) ( Δx )     ( −∇f + Aλ     )
( O     −S^{-1}Λ    I ) ( Δs )  =  ( εS^{-1}e − λ )
( −A^T   I          O ) ( Δλ )     ( g(x) − s     )

where A = ∇g(x) and H denotes the Hessian of the Lagrangian. The descent property is obtained for β sufficiently large.

Indefinite H is regularized by adding a suitable multiple of the unit matrix. ε is chosen dependent on the duality gap:

ε = γ min{ (1 − r)(1 − ξ)/ξ , 2 }³ · (s^T λ)/I

with

0 < r < 1 ,   ξ = min_i s_i λ_i / (s^T λ / I)

(remember that I is the number of inequality constraints). In principle we must have β → ∞, but since in practice one contents oneself with an approximate solution anyway, the authors consider this fact as not so critical. The paper also describes a special treatment of bound and interval constraints such that only general inequalities must be transformed by slacks. Equality constraints are dealt with directly by a quadratic penalty term. The paper also reports numerical experience. LOQO can also be used through NEOS.
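As reconstructed above, the barrier parameter update depends only on the complementarity products s_i λ_i; a small sketch (the constants gamma and r are illustrative choices, not LOQO's defaults):

```python
import numpy as np

def barrier_parameter(s, lam, gamma=0.1, r=0.95):
    """Duality-gap based update: xi measures the worst complementarity
    product relative to the average; eps shrinks fast when the products
    are nearly uniform (xi close to 1)."""
    I = len(s)
    avg = s @ lam / I                  # average duality gap per constraint
    xi = np.min(s * lam) / avg
    return gamma * min((1.0 - r) * (1.0 - xi) / xi, 2.0) ** 3 * avg

# Perfectly centered products give xi = 1 and hence eps = 0.
eps_uniform = barrier_parameter(np.ones(4), np.ones(4))
eps_skewed = barrier_parameter(np.array([1.0, 1.0]), np.array([1.0, 3.0]))
```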

Akrotirianakis and Rustem [1] describe an approach similar to the one given by Vanderbei and Shanno. The starting point is an NLP problem

f(x) = min ,   g(x) = 0 ,   x ≥ 0 .

As a merit function serves the mixed penalty-barrier function

Φ(x; c; ε) = f(x) − ε Σ_{i=1}^n ln(x_i) + (c/2)‖g(x)‖²

subject to the equality constraints

g(x) = 0 ,

that means the equality constraints are treated in duplicate: they enter the merit function and are also posed explicitly. The barrier parameter ε is controlled in an outer and the penalty parameter c in an inner iteration. For ε and c fixed, the directions of descent come from a Newton step for a zero of the function

                     ( ∇f(x) − z + c∇g(x)g(x) − ∇g(x)y )
F(x, y, z; c, ε)  =  ( g(x)                             )
                     ( XZe − εe )

Here y represents the vector of Lagrange multipliers corresponding to the equality constraints and z is obtained as a new free variable from the relation

z = εX^{-1}e .

As usual e = (1, … , 1)^T; z corresponds to the Lagrange multipliers of the bounds. For sufficiently large c the descent property of this direction can be guaranteed, and the adaptive computation of c is based on that fact. The Jacobian of F has as its left upper block

H = ∇²f(x) − Σ_i y_i ∇²g_i(x) + c ∇g(x)(∇g(x))^T + c Σ_i g_i(x) ∇²g_i(x) .

H + X^{-1}Z must be invertible in order to have a well-defined direction d. This is of course a restrictive condition, which also occurs in other work. Even stronger, uniform boundedness and uniform positive definiteness of this matrix is assumed in the convergence analysis. Different stepsizes are used for the primal and the dual variables. For a special version of the algorithm there is a proof of convergence. No numerical results are presented. The proof of the uniform boundedness of c needed for that seems to be incomplete.
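The mixed penalty-barrier merit function above is straightforward to evaluate; a sketch with a made-up f and g (purely illustrative, not data from [1]):

```python
import numpy as np

def merit(x, c, eps, f, g):
    """Phi(x; c; eps) = f(x) - eps * sum(ln x_i) + (c/2) * ||g(x)||^2.
    Requires x > 0 componentwise, i.e. strictly interior iterates."""
    return f(x) - eps * np.sum(np.log(x)) + 0.5 * c * np.dot(g(x), g(x))

f = lambda x: x[0] ** 2 + x[1] ** 2           # toy objective
g = lambda x: np.array([x[0] + x[1] - 1.0])   # toy equality constraint

x_feas = np.array([0.5, 0.5])     # g(x) = 0 here: no penalty contribution
x_infeas = np.array([1.0, 1.0])   # g(x) = 1 here: penalized by c/2
phi_feas = merit(x_feas, 10.0, 1e-2, f, g)
phi_infeas = merit(x_infeas, 10.0, 1e-2, f, g)
```

Raising c drives the minimizer of Φ toward feasibility, which is exactly why its uniform boundedness matters for the convergence proof.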

7.5. Homotopy methods for the KTC-system

Whereas the work of the previous section results in some inner/outer iteration scheme with the possibility to choose subalgorithms independently, it is also possible to use the primal-dual interior-point formulation known from convex LP and QP directly for a general NLP. This is shown in the interesting paper of El-Bakry et al. [32]. These investigations have been continued by Durazzi [36]. The procedure is best interpreted as a transfer of the Kojima, Mizuno and Yoshise method from the LP to the NLP case. In the LP case this approach leads to the most efficient solvers presently known. The starting point is a general problem with mixed constraints h(x) = 0 and g(x) ≥ 0. The trajectory defined by the system

∇f(x) − ∇h(x)μ − ∇g(x)λ = 0 ,
λ − z = 0 ,
h(x) = 0 ,
g(x) − s = 0 ,
ZSe − εe = 0 ,

formally written as (x, μ, λ, z, s)(ε) with λ, z, s > 0, is traced with ε → 0, the nonlinear system in turn being approximately solved by Newton's method. The Jacobian of this system can be symmetrized, and once again one has to solve a large symmetric indefinite linear system in every step. Some numerical evidence with small-scale problems is provided. There are some critical points. One is the large number of artificial unknowns (a total of n + 3m + p, if m is the dimension of g and p that of h). The structure of the linear system is much less amenable than in the LP case. The convergence conditions are quite strong and seem to restrict the treatment to convex problems. Among other things one needs regularity of the matrix

∇²_{xx}L(x, μ, λ) + ∇g(x) S^{-1} Z ∇g(x)^T

for all values of x, μ, λ, z, s in a neighborhood of the trajectory.
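The primal-dual system above can be assembled generically once the problem functions and their Jacobians are available. A sketch (the toy problem data at the bottom is hypothetical) returning the residual whose zero, for each ε, defines a point on the trajectory:

```python
import numpy as np

def pd_residual(x, mu, lam, z, s, eps, grad_f, h, Jh, g, Jg):
    """Residual of the primal-dual system: Lagrangian stationarity,
    lam - z = 0, h(x) = 0, g(x) - s = 0 and perturbed complementarity
    ZSe - eps*e = 0."""
    return np.concatenate([
        grad_f(x) - Jh(x).T @ mu - Jg(x).T @ lam,
        lam - z,
        h(x),
        g(x) - s,
        z * s - eps,
    ])

# Toy data: f = x1^2 + x2^2, h = x1 - x2 = 0, g = x1 + x2 - 1 >= 0.
grad_f = lambda x: 2 * x
h = lambda x: np.array([x[0] - x[1]])
Jh = lambda x: np.array([[1.0, -1.0]])
g = lambda x: np.array([x[0] + x[1] - 1.0])
Jg = lambda x: np.array([[1.0, 1.0]])

# (0.5, 0.5) with multiplier lam = 1 is a KTC point for eps = 0.
x = np.array([0.5, 0.5]); mu = np.zeros(1)
lam = np.array([1.0]); z = lam.copy(); s = g(x)
res = pd_residual(x, mu, lam, z, s, 0.0, grad_f, h, Jh, g, Jg)
```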

A quite similar approach is followed by Epelly, Gondzio and Vial [37] who restrict themselves to the convex case from the very beginning.

8. Some numerical results

In the following we give an excerpt of the results obtained by the COPS project [31]. The test comprises four solvers, LANCELOT, LOQO, MINOS and SNOPT, in their newest versions (2000). Default options were used for all codes. The problems are formulated in AMPL. Most of them are discretized control problems. There are also three large QP problems in the test suite. The evaluation was done on a SUN ULTRA SPARC 2 under Solaris 7. We give here a part of the results only, for the lowest and the highest dimension tested. The table gives computing time in seconds and the optimal value of the objective function. "F" denotes a failure (either premature termination or time out). The numbers x-y-z give the number of variables, of constraints and of bound constraints.

Problem / solver          LANCELOT        LOQO            MINOS           SNOPT
largest small polygon     12.83           3.49            2.11            1.14
  50-49-50                7.9715e-01      7.64714e-01     7.64383e-01     7.79740e-01
largest small polygon     F               F               344.81          256.6
  200-199-200             -               -               7.72803e-01     7.85040e-01
electron on sphere        3.98            0.84            6.22            9.65
  75-25-0                 2.43812e+02     2.43812e+02     2.43812e+02     2.43812e+02
electron on sphere        371.6           2437.78         F               1600.48
  600-200-0               1.84389e+04     1.84389e+04     -               1.84390e+04
hanging chain             13.91           17.24           1.22            5.72
  100-51-0                5.07230e+00     5.07226e+00     5.07226e+00     5.07226e+00
hanging chain             577.24          1028.35         73.9            F
  800-401-0               5.06788e+00     5.06862e+00     5.06862e+00     -
shape of cam              42.74           0.68            0.87            0.6
  100-202-100             4.30178e+00(F)  4.28414e+00     4.28414e+00     4.28414e+00
shape of cam              1887.03         12.47           21.72           25.17
  800-1602-800            4.85693e+00(F)  4.27427e+00     4.27426e+00     4.23739e+00
alpha-pinene              1426.01         28.85           1.98            3.74
  505-500-5               1.96766e+01     1.98715e+01     1.98715e+01     1.98715e+01
alpha-pinene              F               16.87           F               235.44
  4005-4000-5             -               1.98721e+01     -               1.98721e+01
marine population         623.75          2.07            6.58            85.37
  615-592-15              1.97522e+07     1.97522e+07     1.97522e+07     1.97522e+07
marine population         F               38.4            F               1502.26
  4815-4792-15            -               1.97465e+07     -               1.97465e+07
flow in channel           F               1.55            1.09            2.14
  400-400-0               -               1.00000         1.00000         1.00000
flow in channel           F               22.54           32.65           98.5
  3200-3200-0             -               1.00000         1.00000         1.00000
robot arm                 F               1.03            2.82            10.22
  460-300-306             -               9.14687         9.14687         9.14687
robot arm                 F               F               161.18          2671.63
  3610-2400-2406          -               -               9.14108         9.14101
particle steering         30.83           923.3           1.58            3.25
  256-200-51              5.54672e-01     5.54668e-01     5.54668e-01     5.54668e-01
particle steering         2997.88         F               143.09          147.37
  2006-1600-401           5.54552e-01     -               5.54572e-01     5.54573e-01
goddard rocket            F               3.34            1.69            3.04
  205-150-153             -               1.01281         1.01280         1.01282
goddard rocket            F               12.42           F               64.48
  1605-1200-1203          -               1.01283         F               1.01283
hang glider               F               F               F               11.14
  256-200-153             -               -               -               1.28239e+03
hang glider               F               F               F               1268.67
  2006-1601-1203          -               -               -               1.24797e+03
cracking oil              918.28          1.37            5.04            5.25
  503-500-3               5.23633e-03     5.23664e-03     5.23664e-03     5.23664e-03
cracking oil              F               49.62           161.99          179.71
  4003-4000-3             -               5.23659e-03     5.23659e-03     5.23659e-03
methanol to hydrocarbon   196.62          2.13            5.05            12.92
  605-600-5               9.02300e-03     9.02229e-03     9.02228e-03     9.02228e-03
methanol to hydrocarbon   F               45.2            263.67          512.16
  4805-4800-5             -               9.02229e-03     9.02228e-03     9.02228e-03
catalyst mixing           7.71            0.66            2.39            3.99
  303-200-101             -4.77480e-02    -4.80694e-02    -4.80605e-02    -4.80579e-02
catalyst mixing           424.61          8.25            17.88           181.24
  2403-1600-801           -4.71856e-02    -4.80559e-02    -4.74787e-02    -4.80451e-02
elastic plastic torsion   3.01            2.99            108.31          125.58
  1250-0-1250             -4.17510e-01    -4.17510e-01    -4.17510e-01    -4.17510e-01
elastic plastic torsion   17.19           15.86           F               F
  5000-0-5000             -4.18239e-01    -4.18239e-01    -               -
journal bearing           3.02            3.36            173.65          722.68
  1250-0-1250             -1.54015e-01    -1.54015e-01    -1.54015e-01    -1.54015e-01
journal bearing           17.89           13.33           F               F
  5000-0-5000             -1.55042e-01    -1.55042e-01    -               -
obstacle                  2.77            2.98            103.76          137.68
  1250-0-1250             2.51948         2.51948         2.51948         2.51948
obstacle                  16.33           F               F               F
  5000-0-5000             2.50694e+00     -               -               -

Besides these tests there are also many interesting benchmarks provided by H. D. Mittelmann [86], showing the same effect: there is no clear-cut winner, and the reliability of the codes is still far below the level meanwhile reached for small- and medium-scale problems. Hence much remains to be done.

9. Conclusion

The great impact of interior-point methods on present optimization technology can clearly be seen from the discussion above. Summarizing, it can safely be said that convex problems, be it QP or more general NLP problems, can be solved by a variety of methods with good success, even in the large-scale case. As T. Rockafellar states: "The great watershed in optimization isn't between linearity and nonlinearity but convexity and nonconvexity" [106]. For nonconvex problems, first of all we must be prepared to accept a local solution, possibly much weaker than the desired global one. Which of the possibly many local solutions is identified depends on details of the algorithm's implementation and on the initial guess, and it is almost impossible to influence this. For a nonconvex problem often some kind of restoration steps must be used. This typically requires the linear independence of the gradients of the violated constraints or at least the selection of a subset of constraints which satisfy this condition. In the small-scale case the QR or the SVD decomposition of the matrix of these gradients helps to do that. This technique cannot be used for large-scale applications because of the enormous fill-in produced by orthogonal transformations. Other linear algebra techniques are much less reliable. The detection of infeasibility cannot be based on the unboundedness of the dual variables. The detection of nonconvexity via detection of indefiniteness of the Hessian of the Lagrangian is an unreliable process, especially for large-scale applications. How to regularize a nonconvex problem, especially in connection with interior-point methods, is an open problem. The crude technique of adding a multiple of the unit matrix is clearly inadequate. Hence much remains to be done in order to increase the reliability and efficiency of existing methods.

References

[1] Akrotirianakis I., Rustem B. A Globally Convergent Interior-Point Algorithm for General Nonlinear Programming Problems. Tech. Rep. 97/14. Dept. of Comput. Imperial College, London.

[2] Al-Baali M. Descent property and global convergence of the Fletcher-Reeves method with inexact line searches // J. I. M. A. Num. Anal. 1985. Vol. 5. P. 121-124.

[3] Averick B. M., Moré J. J. Evaluation of large-scale optimization problems on vector and parallel architectures // SIOPT 4. 1994. P. 708-721 [Software in info.mcs.anl.gov: / pub/MINPACK-2].

[4] Bartholomew-Biggs M. C., Hernandez M. de F. G. Using the KKT-matrix in the augmented Lagrangian SQP method for sparse constrained optimization // J. O. T. A. 1995. Vol. 85. P. 201-220.

[5] Bartholomew-Biggs M. C., Hernandez M. de F.G. Modifications to the subroutine OPALQP for dealing with large problems //J. Econ. Dyn. Control. 1994. Vol. 18. P. 185-203.

[6] Betts J. T., Frank P. D. A sparse nonlinear optimization algorithm // J. O. T. A. 1994. Vol. 82. P. 519-541.

[7] Biegler L. T., Nocedal J., Schmid C. A reduced Hessian method for large-scale constrained optimization // SIOPT 5. 1995. P. 314-347.

[8] Biegler L. T., Nocedal J., Schmid C., Ternet D. Numerical experience with a reduced Hessian method for large scale constrained optimization //J. Comp. Optim. Appl. 2000. Vol. 15. P. 45-67.

[9] BJOERCK A. Numerical stability of methods for solving augmented systems // American Math. Society. Contemp. Math. 1997. Vol. 204. P. 51-60.

[10] BOGGS P.T., DOMICH P.T., Rogers J.E. An interior-point method for large-scale quadratic programming problems // Ann. Oper. Res. 1996. Vol. 62. P. 419-437.

[11] BOGGS P. T., Kearsley A. J., TOLLE J. W. A truncated SQP algorithm for large-scale nonlinear programming problems // Advances in Optimization and Numerical Analysis / S. Gomez and J.P. Hennart (eds.). Dordrecht: Kluw. Acad Publ., 1994. P. 69-77.

[12] BOGGS P.T., Kearsley A. J., TOLLE J.W. A practical algorithm for general large-scale nonlinear optimization problems // SIOPT 9. 1999. P. 755-778.

[13] Branch M. A., Coleman Th. F., Li Y. A subspace, interior and conjugate gradient method for large-scale bound-constrained minimization problems // SIAM J. Sci Comp. 1999. Vol. 21. P. 1-23.

[14] Bongartz I., Conn A. R., Gould N. I. M., Toint Ph. L. CUTE: Constrained and unconstrained testing environment // ACM Trans. Math. Softw. 1995. Vol. 21. P. 123-160.

[15] Breitfeld M. G., Shanno D. Preliminary Computational Experience with Modified Log-barrier Functions for Large-Scale Nonlinear Programming / W.W. Hager (Ed.). Large scale optimization. State of the art. Dordrecht: Kluw. Acad. Publ., 1994. P. 45-67.

[16] BURKE J. V. A robust trust-region method for constrained nonlinear programming problems // SIOPT 2. 1992. P. 325-347.

[17] BURKE J. V. An exact penalization viewpoint of constrained optimization // SIAM J. Control and Optimization. 1991. Vol. 29. P. 968-998.

[18] BUCKLEY A., Lenir L. QN-like variable storage conjugate gradients // Math. Prog. 1983. Vol. 27. P. 155-175.


[19] Byrd R.H., Gilbert J.Ch., Nocedal J. A Trust-region Method Based on Interior-point Techniques for Nonlinear Programming. Rep. 96/05. Optimization Technology Center, Northwestern Univ., Evanston.

[20] Byrd R.H., Hribar M.E., Nocedal J. An interior-point algorithm for large-scale nonlinear programming // SIOPT 9. 1999. P. 877-900.

[21] Byrd R.H., LIU G., Nocedal J. On the local behaviour of an interior point method for nonlinear programming // (Griffiths et al. Eds.) Numer. Anal. 1997: Proc. of the 17th Dundee Biennial Conf. Harlow: Longman. Pitman Res. Notes Math. Ser. 380. 1998. P. 37-56.

[22] Byrd R.H., Lu P., Nocedal J., ZHU C. A limited memory algorithm for bound constrained optimization // SIAM J. Sci. Comp. 1995. Vol. 16. P. 1190-1208.

[23] BYRD R.H., NOCEDAL J. A tool for the analysis of quasi-Newton methods with applications to unconstrained minimization //J. SIAM. Numer. Anal. 1989. Vol. 26. P. 727-739.

[24] Byrd R.H., Nocedal J., SCHNABEL R. B. Representations of quasi-Newton matrices and their use in limited memory methods // Math. Prog. 1994. Vol. 63. P. 129-156.

[25] CONN A. R., GOULD N.I.M., TOINT Ph. L. Large-scale nonlinear constrained optimization: A current survey / E. Spedicato (Ed.). Algorithms for continuous optimization: the state of the art // Proc. of the NATO Advanced Study Institute, Il Ciocco, Barga, Italy, Sept. 5-18, 1993. Dordrecht: Kluwer Acad. Publ. NATO ASI Ser., Ser. C, Math. Phys. Sci. 1994. Vol. 434. P. 287-332.

[26] Conn A. R., Gould N.I.M., Toint Ph. L. Global convergence of a class of trustregion algorithms for optimization with simple bounds // SINUM 25. 1988. P. 433-460; SINUM 26. 1989. P. 764-767.

[27] Conn A. R., Gould N.I. M., Toint Ph. L. Testing a class of methods for solving minimization problems with simple bounds on the variables // Math. Comp. 1988. Vol. 50. P. 399-430.

[28] Conn A. R., Gould N.I.M., Toint Ph. L. A globally convergent Lagrangian algorithm for optimization with general constraints and simple bounds // SINUM 28. 1991. P. 545-572.

[29] CONN A. R., GOULD N. I. M., TOINT Ph. L. A Primal-Dual Algorithm for Minimizing a Nonconvex Function Subject to Bound and Linear Equality Constraints. RAL Rep. 96/9, 1996.

[30] Conn A. R., Gould N. I. M., Toint Ph. L. A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds // Math. Comp. 1997. Vol. 66. P. 261-288.

[31] Dolan E.D., More J.J. Benchmarking Optimization Software with COPS. Argonne National Laboratory Math. and Comp. Sci. Division Technical Rep. ANL/MCS-246, Nov. 2000.

[32] El-Bakry A. S., Tapia R. A., Tsuchiya T., Zhang Y. On the formulation and theory of the Newton interior-point method for nonlinear programming // J. O. T. A. 1996. Vol. 89. P. 507-541.

[33] Di Pillo G., Lucidi S., Palagi L. A superlinearly convergent primal-dual algorithm model for constrained optimization problems with bounded variables // OMS. 2000. Vol. 14. P. 49-73.

[34] Duff I. S., Reid J. K. The multifrontal solution of indefinite sparse symmetric linear systems // TOMS 9. 1983. P. 302-323.

[35] Duff I. S., Reid J.K. MA47: A Set of FORTRAN Subroutines for Solving Sparse Symmetric Sets of Linear Equations. Research Rep. Rutherford Appl. Laboratory, Chilton, 1993.

[36] Durazzi C. On the Newton interior-point method for nonlinear programming problems // J. O. T. A. 2000. Vol. 104. P. 73-90.

[37] Epelly O., Gondzio J., Vial J.-P. An Interior-Point Solver for Smooth Convex Optimization with an Application to Environmental-Energy-Economic Models. Logilab Technical Rep. 2000.8. Dept. of Management Studies, Univ. of Geneva, Switzerland, July 2000.

[38] Felkel R. An Interior-point Method for Large-Scale QP Problems. THD FB Math. 1996. Rep. 1850. [http://www.mathematik.th-darmstadt.de/ags/ag8/felkel/felkel-d.html]

[39] Felkel R. On Solving Large-Scale Nonlinear Programming Problems Using Iterative Methods. Aachen: Shaker ISBN 3-8265-6797-8. 1999.

[40] Felkel R., Spellucci P. Numerical Experiments with an Exact Penalty Function for Convex Inequality Constrained QP-Problems. TUD Math. Dept. 1996. Rep. 1815.

[41] Fiacco A. v., McCormick G. P. Nonlinear Programming. Sequential Unconstrained, Minimization Techniques. N. Y.: Wiley 1968. Reprinted by SIAM Publ., 1990.

[42] Fletcher R. A general quadratic programming algorithm // J. I. M. A. 1971. Vol. 7. P. 76-91.

[43] Fletcher R., Leyffer S. Nonlinear Programming without a Penalty Function. Num. Anal. Rep. NA/171, Dept. of Math., Univ. of Dundee, 1997.

[44] Fletcher R., Leyffer S. User Manual for FilterSQP. Num. Anal. Rep. NA/181, Dept. of Math., University of Dundee, 1998.

[45] Fletcher R., Leyffer S. On the Global Convergence of an SLP-filter Algorithm. Tech. Rep. 98/13, Dept. of Math., FUNDP, Namur, Belgium, 1998.

[46] Fletcher R., Gould N.I.M., Leyffer S., Toint Ph. L. Trust-Region SQP-filter Algorithms for General Nonlinear Programming. Tech. Rep. 99/03. Dept. of Math., FUNDP, Namur, Belgium, 1999.

[47] Fletcher R., Sainz DE la Maza E. Nonlinear programming and nonsmooth optimization by successive linear programming // Math. Prog. 1989. Vol. 43. P. 235256.

[48] Fontecilla R. Inexact secant methods for nonlinear constrained optimization // SINUM 27. 1990. P. 154-165.

[49] FORSGREN A., Murray W. Newton Methods for large-scale linear inequality constrained minimization // SIOPT 7. 1997. P. 162-176.

[50] FORSGREN A., Gill Ph. E. Primal-dual interior methods for nonconvex nonlinear programming // SIAM J. Optim. 1998. Vol. 8. P. 1132-1152.

[51] FOURER R. Linear/Nonlinear Programming FAQ's. http://www.mcs.anl.gov/otc/ Guide/faq.

[52] Franke R. HQP — a Solver for Sparse Nonlinear Optimization. Intern. Rep., TU Ilmenau Dept. of Autom. and Systems Theory. 1995.

[53] FREUND R., JARRE F. Iterative Solution of Indefinite Systems Arising in Large-Scale Optimization by the Symmetric QMR Algorithm. Presented at XVISMP Ann Arbor, 1994.

[54] Friedlander A., Martinez J. M., Santos A. S. On the resolution of linearly constrained convex minimization problems // SIOPT 4. 1994. P. 331-339.

[55] GAUVIN J. A necessary and sufficient regularity condition to have bounded multipliers in nonconvex programming. Math. Prog. 1977. Vol. 12. P. 136-138.

[56] Gay D., Overton M., Wright M. H. A primal-dual interior method for nonconvex nonlinear programming / Y. Yuan (Ed.). Advances in nonlinear programming // Proc. of the 96 Intern. Conf. Beijing, China, Sept. 2-5, 1996. Dordrecht: Kluw. Acad. Publ. Appl. Optim. 1998. Vol. 14. P. 31-56.

[57] GILBERT J. Ch., Lemarechal C. Some numerical experiments with variable-storage quasi-Newton methods // Math. Prog. 1989. Vol. 45. P. 407-435.

[58] Gill Ph. E., Murray W., Saunders M. User's Guide for SNOPT (Version 5.3): A FORTRAN Package for Large-Scale Nonlinear Programming. Tech. Rep. NA 97-4, Dept. of Mathematics, Univ. of California at San Diego, 1997.

[59] Gill Ph. E., Saunders M. A., Shinnerl J. R. On the stability of Cholesky factorization for symmetric quasidefinite systems // SIAM J. Matrix Anal. Appl. 1996. Vol. 17. P. 35-46.

[60] GOLDFARB D. Extensions of Newton's method and simplex methods for solving quadratic programs // (Lootsma, ed.) Numer. Meth. for Nonlinear Optimization. N. Y.: Acad. Press, 1972.

[61] Goldfarb D., Idnani A. A numerically stable method for solving strictly convex quadratic programs // Math. Prog. 1983. Vol. 27. P. 1-33.

[62] GOULD N. I. M. An algorithm for large-scale quadratic programming //J.I. M. A. Num. Anal. 1991. Vol. 11. P. 299-324.

[63] GOULD N. I. M., HRIBAR M.E., NOCEDAL J. On the Solution of Equality Constrained Quadratic Programming Problems Arising in Optimization. Rep. 09/24/98. Optimization Technology Center, Northwestern Univ., Evanston.

[64] Gould N.I. M., Toint Ph. L. SQP Methods for Large-Scale Nonlinear Programming. Rep. 99/05 FUNDP, available trough anonymous ftp at "pub/reports" on thales.math.fundp.ac.be.

[65] GOULD F. J., Tolle J. W. A necessary and sufficient qualification for constrained optimization // SIAM J. Appl. Math. 1971. Vol. 20. P. 164-172.

[66] Guddat J., Jongen H. Th., Rueckmann J. On stability and stationary points in nonlinear optimization //J. Austr. Math. Soc. Ser. B. 1986. Vol. 28. P. 36-56.

[67] GRIEWANK A., TOINT Ph. L. Partitioned variable metric updates for large structured optimization problems // Num. Math. 1982. Vol. 39. P. 429-448.

[68] Guddat J., Guerra F., Nowack D. On the role of the Mangasarian-Fromowitz constraint qualification for penalty-, exact penalty-, and Lagrange multiplier methods. A. Fiacco (Ed.). Math. Programming with Data Perturbations // 17th Symp., George Washington Univ. N.Y.: Marcel Dekker. Lect. Notes Pure Appl. Math. 195. 1997. P. 159-183.

[69] Guignard M. Generalized Kuhn — Tucker conditions for mathematical programming in a Banach space // SIAM J. Control. 1969. Vol. 7. P. 232-241.

[70] HAN S. P., MANGASARIAN O. L. A dual differentiable exact penalty function // Math. Prog. 1983. Vol. 25. P. 293-306.

[71] HEINZ J., SPELLUCCI P. A successful implementation of the Pantoja-Mayne sqp method // Optimization Methods and Software. 1994. Vol. 4. P. 1-28.

[72] Hock W., Schittkowski K. Test Examples for Nonlinear Programming Codes. Lect. Notes in Econ. and Math. Syst. 187. Berlin: Springer-Verl., 1981.

[73] HRIBAR M. E. Large-Scale Constrained Optimization. Northwestern Univ. PhD thesis. 1996. Evanston, Illinois, 1996.

[74] INTERIOR Points Online: http://www.mcs.anl.gov/otc/InteriorPoint/

[75] Jonasson K., Madsen K. Corrected sequential linear programming for sparse minimax optimization // BIT. 1994. Vol. 34. P. 372-387.

[76] KANZOW Ch. An unconstrained optimization technique for large-scale linearly constrained convex minimization problems // Computing. 1994. Vol. 53. P. 101-117.

[77] KARMARKAR N. A new polynomial time algorithm for linear programming // Combinatorica. 1984. Vol. 4. P. 373-395.

[78] KOJIMA M., MIZUNO S., YOSHISE A. A primal-dual interior-point algorithm for linear programming / N. Megiddo (Ed.) // Progress in mathematical programming: interior-point and related methods. N. Y.: Springer-Verl., 1989. P. 29-47.

[79] KUHN H. W., TUCKER A. W. Nonlinear Programming // Proc. 2nd Berkeley Symp. Math. Stat. Prob. Univ. Cal. Press. 1951. P. 481-492.

[80] Lalee M., Nocedal J., Plantenga T. On the implementation of an algorithm for large-scale equality constrained optimization // SIOPT 8. 1998. P. 682-706.

[81] Laux Th. Numerische Untersuchungen zur Primal-Dualen Methode von A. Forsgren und Ph. E. Gill: Diploma Thesis, TU Darmstadt, Dept. of Mathematics, 1999.

[82] Lin Chih-Jen, More J.J. Newton's method for large bound-constrained optimization problems // SIAM J. Optim. 1999. Vol. 9, No. 4. P. 1100-1127.

[83] Liu D. C., Nocedal J. On the limited memory BFGS method for large-scale optimization // Math. Prog. B. 1989. Vol. 45. P. 503-528.

[84] Lucidi St., Roma M. Numerical experiences with new truncated Newton methods in large-scale unconstrained optimization // Comp. Opt. Appl. 1997. Vol. 7. P. 71-87.

[85] Mangasarian O. L., Fromovitz S. The Fritz John necessary optimality conditions in the presence of equality and inequality constraints //J. Math. Anal. Appl. 1967. Vol. 17. P. 37-47.

[86] Mittelmann H. D. Benchmarks for optimization software: http://plato.la.asu.edu/bench.html (2000).

[87] Mittelmann H.D., SPELLUCCI P. Decision Tree for Optimization Software: at http://plato.la.asu.edu/guide.html (2000).

[88] Moré J. J., Toraldo G. Algorithms for bound constrained quadratic programming problems // Num. Math. 1989. Vol. 55. P. 377-400.

[89] Moré J. J., Toraldo G. On the solution of large quadratic programming problems with bound constraints // SIOPT 1. 1991. P. 93-113.

[90] Moré J. J., Wright St. J. Optimization Software Guide. Philadelphia: SIAM, 1993.

[91] Murray W. Sequential quadratic programming methods for large-scale problems // COAP 7. 1997. P. 127-142.

[92] Nash St. G. Newton type minimization via the Lanczos method // SINUM 21. 1984. P. 770-788.

[93] Nash St. G. Code TN and TNB: http://netlib.org/opt/index.html.

[94] Nash St. G., Sofer A. A barrier method for large-scale constrained optimization // ORSA J. on Computing. 1993. Vol. 5. P. 40-53.

[95] Ni Q. General large-scale nonlinear programming using sequential quadratic programming methods // Bayreuther Math. Schriften. 1993. Vol. 45. P. 133-236.

[96] Nickel R. H., Tolle J. W. A sparse sequential quadratic programming algorithm // J. O. T. A. 1989. Vol. 60. P. 453-473.

[97] Nocedal J., Wright St. J. Numerical Optimization. N. Y.: Springer-Verl., 1999.

[98] Paige C. C., Saunders M. A. Solution of sparse indefinite systems of linear equations // SIAM J. Numer. Anal. 1975. Vol. 12. P. 617-629.

[99] Pantoja J. F. A. de O., Mayne D. Q. Exact penalty function algorithm with simple updating of the penalty parameter // J. O. T. A. 1991. Vol. 69. P. 441-467.

[100] Plantenga T. D. Large-scale constrained optimization using trust-regions: PhD Dissertation. Northwestern Univ., Evanston, Illinois, 1994.

[101] Pinz W. Ein neues Verfahren zum Lösen quadratischer Optimierungsprobleme: Diplomarbeit, TUD FB Mathematik, 1996.

[102] Polyak R. Modified barrier functions (theory and methods) // Math. Prog. 1992. Vol. 54. P. 177-222.

[103] Potra F., Tapia R., Zhang Y. On the superlinear convergence of interior-point algorithms for a general class of problems // SIAM J. Optim. 1993. Vol. 3. P. 413-422.

[104] Robinson St. A quadratically convergent algorithm for general nonlinear programming problems // Math. Prog. 1972. Vol. 1. P. 145-156.

[105] Robinson St. Stability theory for systems of inequalities. Part II. Differentiable nonlinear systems // SIAM J. Num. Anal. 1976. Vol. 13. P. 497-512.

[106] Rockafellar R. T. Lagrange multipliers and optimality // SIAM Rev. 1993. Vol. 35. P. 183-238.

[107] Saunders M. A. Recent Developments to MINOS (MINOS 5.4). Presented at the XV ISMP at Ann Arbor, 1994.

[108] Sahni S. Computationally related problems // SIAM J. Comp. 1974. Vol. 3. P. 262-279.

[109] Schittkowski K. More Test Examples for Nonlinear Programming Codes: Lect. Notes in Econ. and Math. Syst. 282. Springer-Verl., 1987.

[110] Schittkowski K. On the convergence of a sequential quadratic programming method with an augmented Lagrangian line search function // Math. Oper. and Stat. Ser. Opt. 1983. Vol. 14. P. 197-216.

[111] Spellucci P. Solving general convex QP problems via an exact quadratic Lagrangian with bound constraints. THD FB4 preprint 1563 (1993). Revised version 7/94.

[112] Spellucci P. An SQP method for general nonlinear programs using only equality constrained subproblems // Math. Prog. 1998. Vol. 82. P. 413-448.

[113] Spellucci P. A new technique for inconsistent QP problems in the SQP method // Math. Meth. OR. 1998. Vol. 47. P. 355-400.

[114] Spellucci P. Numerische Verfahren der nichtlinearen Optimierung. Birkhäuser, 1993.

[115] Spellucci P. Numerical experiments with modern methods for large-scale QP problems // Recent Advances in Optimization / P. Gritzmann et al. (Eds). Lect. Notes in Econ. and Math. Syst. 452. Berlin: Springer-Verl., 1997. P. 313-335.

[116] Wright St. J. Primal-dual Interior-point Methods. Philadelphia: SIAM Publ., 1997.

[117] Vanderbei R. J. Symmetric quasidefinite matrices // SIAM J. Optim. 1995. Vol. 5. P. 100-113.

[118] Vanderbei R. J., Shanno D. F. An Interior-Point Algorithm for Nonconvex Nonlinear Programming. SOR-97-21. Statistics and Operations Research, Princeton Univ.


[119] Vavasis S. Quadratic programming is in NP // Info. Proc. Lett. 1990. Vol. 36. P. 73-77.

[120] Zou X., Navon I. M., Berger M. et al. Numerical experience with limited-memory quasi-Newton and truncated Newton methods // SIOPT 3. 1993. P. 582-608.

Received for publication February 1, 2001
