№4(8) 2007
G. Antille, A. Weinberg Allen
D-optimal Design for Polynomial Regression Choice of Degree and Robustness
In this paper we show that for the D-optimal design, departures from the design are much less important than a departure from the model. As a consequence, we propose, based on D-optimality, a rule for choosing the regression degree.
We also study different types of departures from the model to define a new class of D-optimal designs, which is robust and more efficient than the uniform one.
One of the main problems in the polynomial regression concerns a choice of the regression degree. In the literature on optimal design it is usually assumed that a statistical model is known. In practice, however, the exact degree of the polynomial is not known with certainty. Moreover, the design support may not correspond to the optimal one given by an analytical solution.
Considerthe common polynomial regression model with heteroscedastic error
and X(x) > 0 is the efficiency function of the experiment under study.
These models are widely used when the response is curvilinear or when nonlinear relationships are approximated by a polynomial over reasonably small range of the x's.
A design of experiment is a probability measure 2 defined on the interval I. An exact design is a probability measure 2 with finite support x1,x2,....,xn at which uncorrelated observations are taken, such that 2(x,) = nt/N where nt is the number of observations taken at x, and N = ^nt is the total number of observations. Otherwise, 2 is said to be an approximate design.
The information matrix of an approximate design, denoted by M( 2), is defined bytheequality
In that case the covariance matrix of the generalized least squares estimates of p is (a 2/N)M2), and the variance of the prediction at a given point x is a2 fT(x) M2)f (x).| M2)| is known as the generalised variance.
An optimal design minimizes (or maximizes) Om, a specific convex (or concave) function of the information matrix. Om (M(2)) = [trace(M(2))_1]1/m defines a class of classical criteria. For m = 1
1. Introduction
E(y|x) = fT(x)p, V(y|x) =a2/Mx), where the explanatory variable x varies in I, a real compact interval
fT (x) = (1,x, x 2,..., xm ), p= (p 0,p1,--,p m )T
M( 2) =jM x )fT ( x )f ( x Щ x ),
or, equivalently, for an exact design by
M( 2) x, )fT ( x, )f ( x, )2( x, ).
55
No4(8) 2007
we have the A-optimality.The D-optimality corresponds tom ^ 0, the criteria based on the determinant of the information matrix. For m ^ w the E-optimality is defined through the eigenvalues of the information matrix.
The purpose of this paper is, for a certain class of efficiency functions and D-optimality, to study the influence of departures from the optimal design on the values of the criteria.
Our numerical results show that the influence on the value of D-efficiency is more important when perturbations are applied to boundary points than to others.
As a consequence of a numerical study of the behavior of the standardized determinant of the information matrix with respect to the degree of the polynomial regression for classical efficiency functions, we propose the following rule for choosing the regression degree.
For homoscedastic cases and for Jacobi's type efficiency functions the regression degree should be 'the smallest reasonable'. For Laguerre efficiency functions the regression degree should be the 'the largest reasonable'. For the Hermite function there is no general rule.
We also point out that departures from the design are much less important than a departure from the model.
Finally we define a new class of designs, higher-order D-optimal, which is
(1) more efficient than uniform designs and
« (2) as robust as uniform ones.
<u £
3
■o
£ family of approximate designs.
T3
c
2 2. Review of Classical Theory
<u
2 Feasibility corresponds to the possibility of estimating the full parameter p.
<u .o o ■c
" | M( & *)| = max| M( &)|,
c i -= i
So, the family of higher-order D-optimal designs can be considered as a near optimal, robust lily of approximate designs.
2. Review of Classical Theory
In this article we restrict our attention to feasible designs for the polynomial regression. isibility corresponds to the possibility A design &*, satisfying the equation
is called D-optimal, where 3 is the set of all feasible designs. D-optimal designs are well known for an important class of the efficiency functions given in Theorems 1 and 2.
.0 V
I &
<u CC
■S
§ Theorem 1 [Fedorov (1972)]
£ a) Forthe polynomial regression of degree n, the D-optimal design is unique, and it is concen-
trated at n+1 equally weighted points. ^ b) For the following efficiency functions: ■5?
<3 1. M x) = 1, -1 < x < 1;
! 2. Mx) = (1-x) a+1(1 + x)p+1, -1< x < 1, a >-1 and p>-1; t 3. Mx) = exp(-x), 0 < x < w;
q 4. Mx) = xa+1 exp(-x), 0<x<w,a>-1
5. Mx) = exp(-x2), -w < x <w,
56
№4(8) 2007
the design points are roots of the corresponding polynomials:
1. (1-x2)P„(x), where Pn(x) is the nth Legendre polynomial;
2. Ja+p (x), where J"+P (x) is the (n+1)th Jacobi polynomial with parameters a, P;
3. xL]n(x), where L\(x) is the nth Laguerre polynomial with parameter 1;
4. La+1(x), where L°n+1(x) is the (n+1)th Laguerre polynomial with parameter a;
5. Hn+1 (x), where Hn+1 (x) is the (n+1)th Hermite polynomial.
The generalization of the second partofTheorem 1, proposed by [Antille (1977)], was followed by numerous authors [Huang et al. (1995)], [He et al. (1996)], [Chang and Lin (1997)], [Imhoff et al. (1998)], [Dette et al. (1999)] and others, and finally resulted in Theorem 2.
Theorem 2 [Antille, Dette, Weinberg (2001)]
For the following efficiency functions:
1. Mx) = x~y exp(-S/x), 0 < x < w, S > 0 and y > 2n;
2. A,(x) = (1 + x2)a+1 exp(2parctan(x)), -w < x< w, a < -n- 1and pg^,
the design points are roots of the corresponding polynomials:
1. 6-+y,'S(x), where B-+y1S (x) is the generalized Bessel polynomial with parameters y,S;
2. Ja++;p,a-,p (ix), the (n+1)th Jacobi polynomial with parameters a + /p,a-/p.
Figures 1 and 2 contain graphical representations of examples of optimal designs for different degrees of regression and for the efficiency functions 1 and 4 (a = 5) of Theorem 1.
I ?
■o
S
■4 с
<5
of regression 9 8 7 6 5 4 3 2 1
Determinant
of the information matrix 1.43e-24
-1.0 -0.5 0.0 0.5 1.0
x
Figure 1. Optimal design for Legendre efficiency function
For more details on optimal designs we refer to the monographs [Fedorov (1972)], [Silvey (1980)], [Atkinson and Donev (1992)], and [Pukelsheim (1993)].
57
№4(8) 2007
Determinant
of the information matrix 1.81e46 2.36e34 2.89e-24 2.64e-16 1.41e-10 3.32e5 2.45e2 3.78 7.23e-1
10 15 20 25 30
«
<u js
!S
■o
0
ОС
тз
с «
<u
1
«Я <Ъ CJ
о ч ,<j о
в
с ,0
<л &
<ъ ос
£ о
<5
а
ii <u
Q
.g
f
Figure 2. The optimal design for the Generalized Lagurre efficiency function (a = 5)
3. Robustness of D-optimal Designs
3.1 Influence of Design Points
Classical theory on optimal designs for polynomial regression provides analytical solutions to optimization problems. However the design points may not correspond to the optimal ones. In this section we study the influence of different perturbations on the design points.
In order to compare designs in term of D-optimality let us define Def as the D-efficiency of a design 2 with respect to the optimal design 2*:
Df = -ML.
| M( 2 *)|
The standardised D-efficiency is the nth root of the D-efficiency, where n is the degree of the polynomial regression.
As an example we consider the well known homoscedastic case for the regression of degree 4 with explanatory variable x g [-1;+1]. Figures 3 and 4 exhibit perturbed designs and their respec-
Perturbation of central points, Legendre
Optimal-» i D-efficiency
0.967 0.957 0.917 0.883 0.878 0.878 0.872 0.845 0.836 0.835 0.82 0.81 0.786 0.76 0.676 0.664 0.644 0.634 0.629 0.553
-1.0 -0.5 0.0 0.5 1.0
x
Figure 3. Perturbation of central points of the optimal design for the Legendre efficiency function
58
Optimal->
Perturbation of boundary points, Legendre
No4(8) 2007
D-efficiency
0.593
0.325
0.32
0.308
0.256
0.226
0.212
0.122
0.106
0.106
0.0939
0.0683
0.0482
0.0279
0.0148
I ?
■o
S
■4 Is
C5
Figure 4. Perturbation of boundary points of the optimal design for the Legendre efficiency function
tive D-efficiencies. In Figure 3 perturbations are applied to central points and in Figure 4 to boundary ones.
The numerical results show that the influence on D-efficiency is more important when perturbations are applied to boundary points than to others.
Another way to illustrate this fact is presented in Figure 5.
We shift respectively x, towards x2, then x2 towards x3, after that x3 towards x2 and finally x2 towards x, and plot, in Figure 5, the D-efficiency versus the distance between two neighbouring points, where x, < x2 < x3 < x4 < x5 are the D-optimal points. Due to the symmetry of the design we do not need to consider the moves of the points x 4 and x 5. In each case the D-optimal design corresponds to the maximal distance between two neighbouring points, with the D-efficiency equal to 1, and when the two points are close, the D-efficiency is close to 0. Looking at Figure 5 from right to left we observe that the decrease of the D-efficiency curves is more important when
D-efficiency 1.0
'*' — moving x1 to x2, V — moving x2 to x3, 'o' — moving x2 to xl, V — moving x3 to x2.
Distance
Figure 5. The D-efficiency versus (vertical axis) the distance between two neighbouring points (horizontal axis)
59
№4(8) 2007
x1 moves towards x2 than when x2 moves towards x, or when x2 towards x3. Therefore and obviously, the conclusions are similar to those drawn in the cases of Figures 3 and 4.
3.2. Influence of the Degree of Regression
Usually the degree of a regression is supposed to be known to allow computation of optimal designs. In practice however, the exact degree of the polynomial is not known with certainty.
In this section we study the behavior of the determinant of the information matrix with respect to the degree of the polynomial regression for classical efficiency functions.
In each case we consider, for comparison,
| Mlm =
m +1
m+1 m+1
в ^x, ))B-
i =1 k< j
«
<u JS
3
■о о ОС тз
с «
<и
<Ъ Q
ч« О
4
,<J
о в с
.О
<л &
<ъ ос
£ о
! о а
<л <и Q
"<S
the standardized determinant of the information matrix. For the classical efficiency functions, D-optimal design points are roots of orthogonal polynomials. The property of these roots, which interlace for increasing degrees, implies that^( xk - xj )2 is a decreasing function of degree if the
k < j
design support is in the interval [-1;+1]. Hence, for efficiency functions, satisfying |A,(x)|^ 1, x g [-1;+1], the standardised determinant of the information matrix is a decreasing function of regression degree. The opposite is true for Laguerre efficiency functions.
So for D-optimal designs generated by Legendre orJacobi polynomials the generalised variance is an increasing function of the degree of regression. For Laguerre or generalised Laguerre the reverse is true. No general result exists for the Hermite case.
Hence, as long as D-optimality is concerned, a rule of choice of the degree can be formulated as follows:
for homoscedastic cases and for Jacobi's type efficiency functions the regression degree should be 'the smallest reasonable';for Laguerre efficiency functions the regression degree should be the 'the largest reasonable';for the Hermite function a case by case study has to be done.
Table 1 presents standardized determinants for D-optimal designs for various efficiency functions.
Table I
Standardized determinants of the information matrices for D-optimal designs for degrees of regression from 1 to 9
Polynomials of Legendre, x) = 1
1. 0.385 0.172 0.081 0.0389 0.0189 0.00924 0.00454 0.00224
Polynomials of Jacobi, x) = (1 - x)"(1 + x ) ", a = 1, ß = 1
0.148 0.0716 0.035 0.0173 0.00854 0.00423 0.0021 0.00105 0.00052
Polynomials of Jacobi, x) = (1 - x)"(1 + x )15, a = 2, ß= 2
0.0819 0.0353 0.0159 0.00739 0.00349 0.00167 0.000802 0.000388 0.000189
m
60
№4(8) 2007
End of Table 1
Polynomials of Jacobi, X(x) = (1 - x)"(1 + x)15,a - 2,p = 4
0.102 0.0326 0.0126 0.00524 0.00228 0.00102 0.000462 0.000213 0.0000996
Polynomials of Legendre, x) = e x
0.135 0.199 0.415 1.12 3.71 14.5 65.7 337 1940
Polynomials of Generalized Laguerre, x) = xae , a = 1
0.0733 0.231 0.695 2.4 9.58 43.8 226 1300 8230
Polynomials of Generalized Laguerre, x) - xae x, a = 2
0.268 0.824 2.68 10.1 44 218 1210 7480 50800
Polynomials of Generalized Laguerre, x) - xae x, a = 2.5
0.723 1.94 6.26 24 107 546 3120 19800 138000
Polynomials of Hermite, x) - e x2
0.184 0.158 0.161 0.187 0.241 0.337 0.506 0.811 1.37
Polynomials of Hermite (x unsealed), x) - ke tx/2o)2, a2 - 1, k - = 10
36.8 14.1 13.9 18.8 30.5 55.9 113 245 568
Polynomials of Hermite (x unsealed), x) - ke tx/2a)2, a2 - 0.1, k = 1
0.0368 0.0141 0.00644 0.00335 0.00192 0.0012 0.00081 0.000581 0.00044
I ?
■о !
4 1:
<5
3.3. Departure from the Optimal Design Versus Departure from the Model
Theanalysis performed in this part of the study was motivated by [Huber (1975), (1981)], who pointed out that D-optimal designs are strongly non-robust in the case of rather small deviations from linearity and showed that the uniform designs have much more satisfactory behavior than the optimal ones.
First, for illustrative purposes, in Table 2 we present the values of the standardized determinants for different regression degrees for uniform designs. It can be easily seen that departure of a model affects more the value of the standardized determinant than departures of the optimal design. We also note that with increasing number of points in uniform designs the values of standardized determinants monotonically increase as well.
At the next step we introduce the concept of a family of D-optimal designs. D-optimal designs of order k for the regression degree l < k are called higher-order D-optimal designs. For Legendre, Jacobi, and Hermite efficiency functions we compute the standardized determinants for different degrees of regression for higher-order D-optimal designs. So, for example, for the optimal design of order 9, which contains 10 points we calculate the standardized determinants for the polynomial regressions of degrees from 1 to 9. These results are presented in Tables 3, 4, and 5. Our goal would be, on the one hand, to verify the Huber's assertions and, on the other hand, to compare the performance of the higher-order D-optimal designs versus uniform ones.
Again we observe that departures from the design are much less important than the departure from the model (Tables 3, 4, and 5), the values of standardized determinants are determined more by the degree of regression than by the «quality» of the design.
61
№4(8) 2007
Table 2
Standardized determinants of the information matrices for uniform designs for the Legendre efficiency function
Regression degree
Number 1 2 3 4 5 6 7 8 9
of points Standardized Determinants
x10-1 x10-1 x10-2 x10-2 x10-2 x10-3 x10-3 x10-3 x10-4
2 2.50
3 2.96 1.14
4 3.12 1.40 5.16
5 3.20 1.51 6.47 2.32
6 3.24 1.58 7.17 2.96 1.04
7 3.27 1.62 7.60 3.35 1.35 4.69
8 3.28 1.64 7.87 3.60 1.55 6.12 2.10
9 3.29 1.66 8.06 3.78 1.70 7.13 2.77 0.941
10 3.30 1.67 8.20 3.90 1.79 7.87 3.26 1.25 4.21
«
¡p
js
3
■о о ОС тз
с «
<и
<Ъ Q
ч« О
4
,<J
о в с
.О
<л ?
& щ
ОС
£ о
! о а
<л <u
Cl .g
f
Comparing Tables 2 and 3 for Legendre efficiency functions, we observe that, in difference from the uniform designs, the values of standardized determinants for higher order D-optimal designs monotonically decrease with increasing number of points. But this decrease is slow and mo-notonic;the values for higher-order D-optimal designs remain always superior to the ones for the uniform designs. The same observation is true for other efficiency functions.
Table 3
Standardized determinants for the information matrices for higher order D-optimal designs for the Legendre efficiency function
Regression degree
Number 1 2 3 4 5 6 7 8 9
of points Standardized Determinants
x10-1 x10-1 x10-1 x10-2 x10-2 x10-2 x10-3 x10-3 x10-3
2 10.00
3 6.67 3.85
4 6.00 3.10 1.72
5 5.71 2.90 1.49 8.10
6 5.56 2.80 1.42 7.24 3.89
7 5.45 2.74 1.38 6.97 3.55 1.89
8 5.38 2.70 1.36 6.83 3.45 1.75 9.24
9 5.33 2.67 1.34 6.73 3.39 1.71 8.67 4.54
10 5.29 2.65 1.33 6.66 3.35 1.68 8.47 4.30 2.24
62
№4(8) 2007
Table 4
Standardized determinants for the information matrices for higher order D-optimal designs for Jacobi (1,1) efficiency functions
Regression degree
Number i 2 3 4 s б 7 s 9
of points Standardized Determinants
xiO-2 xiO-2 xiO-2 xiO-2 xiO-3 xiO-3 xiO-3 xiO-3 xiO-4
2 14.8
S 9.60 7.16
4 8.40 5.74 S.50
5 7.84 5.SS S.02 1.7S
6 7.51 5.1S 2.87 1.54 8.54
7 7.S0 5.00 2.79 1.48 7.80 4.2S
8 7.15 4.91 2.74 1.45 7.56 S.9S 2.10
9 7.0S 4.85 2.70 1.4S 7.4S S.82 1.97 1.05
10 6.95 4.80 2.84 1.42 7.S4 S.77 1.92 0.98 5.20
Table 5
Standardized determinants for the information matrices for higher order D-optimal designs for the Hermite efficiency function
Regression degree
Number i 2 3 4 s б 7 s 9
of points Standardized Determinants
xiO-2 xiO-2 xiO-2 x ю-1 xiO-1 xiO-1 xiO-1 xiO-1 x1OO
2 18.4
S 10.8 15.8
4 8.00 12.0 16.1
5 6.S8 10.S 1S.5 1.87
6 5.S1 8.99 12.1 1.64 2.41
7 4.55 8.05 11.0 1.50 2.16 S.S7
8 S.98 7.S1 10.1 1.40 2.02 S.08 5.07
9 S.54 6.71 9.40 1.S1 1.91 2.92 4.70 8.12
10 S.18 6.21 8.80 1.24 1.81 2.78 4.48 7.60 1.S7
I ?
■o É
■4 Is
CS
This analysis shows, that for the same or close number of observations (points) higher-order D-optimal designs are
(1) more efficient than uniform designs
and
(2)as robustas uniform designs.
63
>
о а
№4(8) 2007
So the family of higher-order D-optimal designs can be considered as a near optimal, robust family of approximate designs.
4. Conclusion
Analytical solutions to optimization problems in the optimal design theory do not exist in many cases. Numerical calculations for D-optimal designs in the polynomial regression pointed out the following results:
1. For Legendre, Jacobi and other non-exponential efficiency functions lower regression degrees correspond to higher D-efficiency;for Laguerre efficiency functions higher degrees correspond to higher D-efficiency and for Hermite efficiency there is no general rule.
2. For Legendre and Laguerre efficiency functions, less central is a point more important is its influence on the D-efficiency;for Hermite efficiency functions this result is not valid;for Jacobi efficiency functions there is no general rule, the results depend on its parameters.
3. We confirm the Huber's hypotheses, that departures from the design are considerably less important than departures from the model.
4. For Legendre efficiency functions we can compare uniform designs with higher-order D-optimal designs, introduced in this paper. D-optimal higher-order designs are significantly more efficient, and the family of higher-order D-optimal designs can be considered as a near optimal, robust family of approximate designs.
Antille G. Plans d'expérience. Measures et plans d'expérience D-optimaux pour les polynomes de régres-
«
<u £
1
0 References
cc
T3 c
; sion // C.R. Acad. Sci. Paris. 1977. №285. ï
j» Antille G., Dette H., Weinberg A. A note on optimal designs in weighted polynomial regression fort he clas-
2 sical efficiency functions // J. Statistic. Plann. Inference. 2003. №113.
<b Chang F. C., Lin G. C. D-optimal designs for weighted polynomial regression. J. Statistic. Plann. Inference.
1 1997. №62.
^ Dette H., Haines L. M., ImhoffL. Optimal designs for rational models and weighted polynomial regression.
I Ann. Statist. 1999. №27. <0
<2 Fedorov V.V. Theory of Optimal Experiments // Academic Press, New York, 1972.
J? HeZ., Studden W.J., Sun D. Optimal designs for rational models // Ann. Statist. 1996. №24.
Huang M.-N. L., Chang F. C., Wong W.K.D-optimal designs for polynomial regression without an intercept.
§ Statist. Sinica. 1995. №5. c
Imhoff L., Krafft O., Schaefer M. D-optimal designs for polynomial regression with weight function x) = x / (1 + x) // Statist. Sinica 1998. №8. Karlin S., Studden W.J. Optimal experimental designs // Ann. Math. Statist. 1966. №37. Karlin S., Studden W.J. Tchebycheff systems: with Applications. Analysis and Statistics. Wiley, New York,
S
<Л
Q 1966.
| KieferJ.C. General equivalence theory for optimum design (approximate theory)//Ann. Statist. 1974.
а №2. о
Pukelsheim F. Optimal Design of Experiments. Wiley, New York, 1993. SilveyS. D. Optimal Design. Chapman and Hall, London, 1980.
64