A.R. Gokul and M. Pachamuthu RT&A, No 4(80)
PREDICTION OF BOX-BEHNKEN DESIGN IN MISSING CASE Volume 19, December, 2024
OPTIMALITY PREDICTION OF SECOND ORDER BOX-BEHNKEN DESIGN ROBUST TO MISSING OBSERVATION
A.R. Gokul1 and M. Pachamuthu2*
1 Research Scholar, Department of Statistics, Periyar University, Salem-636011, India, gokulmalar6@gmail.com
2* Assistant Professor, Department of Statistics, Periyar University, Salem-636011, India, [email protected]
Abstract
Designs that remain robust when observations are missing have gained prominence in statistical research. In particular, Response Surface Methodology (RSM), a widely applied approach in experimental design, faces challenges when dealing with missing data. This paper investigates two design variants with one missing observation: the three-level second-order Box-Behnken design (BBD) and the Small Box-Behnken Design (SBBD), which involves fewer experimental runs than the standard BBD. We evaluate prediction performance using fraction of design space (FDS) plots, which reveal the distribution of scaled prediction variance (SPV) values across the design space. Additionally, we assess the efficiency with which the model parameters are estimated using information-based criteria (relative A-, D-, and G-efficiency). Our analysis covers k factors, with k ranging from 3 to 9. The findings guide practitioners in selecting design points for efficient parameter estimation and accurate prediction when observations are missing. This comparative study sheds light on the trade-offs between BBD and SBBD, providing valuable insights for experimental design practitioners.
Keywords: Box-Behnken Design, Fraction of Design Space, Scaled Prediction Variance, Optimality, Small Box-Behnken Design
I. Introduction
Response Surface Methodology is a powerful blend of statistical and mathematical model-building techniques. It is designed to assess the impact of multiple independent variables and to find the values of those variables that yield the most desirable outcomes. The methodology is particularly useful when the aim is to optimize a product or process. The empirical model is based on data observed from the system or process, and RSM builds such models using multiple regression and related statistical techniques [19]. The second-order model is commonly used in RSM, particularly with Central Composite Designs (CCD) and BBD.
Robustness to missing observations is a critical research area in all statistical methodologies. Even in well-planned experiments, observations may go missing, which makes it challenging to estimate the parameters of a model. Much of the research on robustness to missing observations in the literature has been carried out for different types of Central Composite Design with various values of alpha ($\alpha$), rather than for second-order Box-Behnken Designs. Draper [7] reviews the research on robust missing observation methods in response surface design and credits the first researcher to develop a parameter estimation formula. Akhtar and Prescott [2] proposed a minimax loss criterion for handling missing observations, which is now the most widely used criterion in response surface designs. Akhtar [1] examines a five-factor CCD with two missing observations in three different settings. Smucker et al. [20] gave empirical results for the effect of missing observations on various classical and optimal designs and on a new type of missing-robust design in screening and response surface settings. Alrweili et al. [4] use the minimax loss criterion to create designs that are more robust to missing observations by combining the latest CCDs from GSA and AEK, which are new designs. Hayat et al. [8] explore designs from regular and irregular structure subsets, assess how they deal with missing design points using the minimax loss criterion, and investigate their alphabetic optimality and prediction performance with FDS plots of the response difference variance. Alanazi et al. [3] present closed-form expressions for two missing observations as a function of $\alpha$, the axial value used in CCDs with up to 10 factors. Whittinghill [22] explores how Box-Behnken Designs can handle missing observations without losing the ability to estimate all the parameters of interest, and defines $t_{max}$ as the maximum number of rows that can be arbitrarily deleted from the design matrix while keeping the parameters estimable. Tanco et al. [21] used $t_{max}$ and D-efficiency criteria to evaluate three-level second-order polynomial designs, such as Box-Behnken, Face Centered, and other smaller and intermediate designs. Rashid et al. [18] investigate how to deal with one missing observation in Augmented BBD (ABBD) and Augmented Fractional BBD (AFBBD) using the minimax loss criterion and relative D- and G-efficiency. Rashid et al. [17] examine how a missing observation affects the estimation and prediction abilities of the ABBD in terms of relative A-, D-, and G-efficiencies. Hemavathi et al. [9] explore how a sequential third-order rotatable design can handle missing observations without losing much information, and measure the information loss due to one or two missing experimental runs at different distances from the center of the design. Park et al. [16] use graphical methods such as variance dispersion graphs and fraction of design space plots, together with G- and I-optimality criteria, to examine how different experimental designs perform in spherical and cuboidal regions for three to seven factors. Chigbu et al. [13] compare CCD, Small Composite Design (SCD), and MinResV designs for spherical regions with k = 3 to 7 factors based on optimality criteria and the Variance Dispersion Graph (VDG), and show that none of these designs is consistently better than the others. Li et al. [12] evaluate different CCD, SCD, and MinResV designs for spherical and cuboidal regions with various axial values, using FDS plots and box plots to analyze the prediction variance properties of the designs. Onwuameze et al. [14] use graphical methods such as VDG and FDS plots to evaluate the prediction variance performance of CCD, SCD, and MinResV in the hypercube region. However, most recent research on robustness to missing observations in Box-Behnken designs has focused on third-order designs, called ABBD.
This paper conducts a comparative analysis of the classical BBD and the SBBD, focusing on the robustness of these designs when a single observation is missing. The SBBD [24] is noted for its advantage of requiring fewer runs than the BBD. The paper evaluates these designs using relative A-, D-, and G-efficiency to assess parameter estimation accuracy, and uses fraction of design space plots to examine the scaled prediction variance. The paper is structured as follows. Section 2 outlines the methodology used in the study. Section 3.1 presents the results and discussion on scaled prediction variance and relative efficiencies. Section 3.2 provides an analysis and discussion of both BBD and SBBD using Fraction of Design Space plots. Section 4 summarizes the findings and conclusions of the study.
II. Methodology
I. Description of Second Order Model
In numerous instances involving response surface methodology, we may not understand the relationship between the predictor variables and the response. A first-order model might not be sufficient to capture the curvature of the response function. Therefore, we often use higher-degree polynomials, such as a second-order model, to better evaluate curvature in optimization experiments. For k quantitative factors denoted by $x_1, x_2, \ldots, x_k$, a second-order model is

$$y = \beta_0 + \sum_{i=1}^{k}\beta_i x_i + \sum_{i=1}^{k}\beta_{ii} x_i^2 + \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\beta_{ij} x_i x_j + \varepsilon \qquad (1)$$

where $\beta_0$, $\beta_i$, $\beta_{ii}$, and $\beta_{ij}$ are the intercept, linear, quadratic, and bilinear coefficients, respectively, and $\varepsilon$ is a random error with mean zero and variance $\sigma^2$, independent between any pair of runs. The number of unknown parameters to be estimated is $p = k + k + \binom{k}{2} + 1$, and to have sufficient degrees of freedom to estimate the model coefficients, the number of runs or observations n must be greater than or equal to p.
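As a quick check on the parameter count, the following minimal sketch counts the terms of the second-order model and expands a point to model form. The function names `num_params` and `expand_to_model`, and the term ordering (intercept, linear, quadratic, bilinear), are illustrative assumptions rather than notation from the paper.

```python
from itertools import combinations
from math import comb


def num_params(k):
    """Number of coefficients in the second-order model: p = 1 + k + k + C(k, 2)."""
    return 1 + k + k + comb(k, 2)


def expand_to_model(x):
    """Expand a point x = (x_1, ..., x_k) to model form.

    Assumed term order: intercept, linear, quadratic, bilinear.
    """
    x = list(x)
    quadratic = [xi * xi for xi in x]
    bilinear = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1] + x + quadratic + bilinear


# Minimum number of runs needed for each number of factors studied in the paper
for k in range(3, 10):
    print(k, num_params(k))
```

For k = 3 this gives p = 10, so a three-factor design needs at least 10 runs to estimate every coefficient.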
II. Small Box-Behnken Design
Box and Behnken [6] combined balanced incomplete block designs and factorial designs to create a class of three-level designs called Box-Behnken designs. Box-Behnken Designs are three-level, second-order spherical designs with all points on a sphere. They are typically used for fitting second-order response surface models and are available for 3-12 and 16 factors. This design is widely used for second-order models in analytical chemistry and industrial applications. The Small Box-Behnken Design is constructed from a Balanced Incomplete Block Design (BIBD) or a Partially Balanced Incomplete Block Design (PBIBD) by replacing the treatments partly with $2^{3-1}_{III}$ designs and partly with full factorial designs. A unique feature of SBBD is that it requires fewer runs than the classical BBD. SBBD consists of two categories of design points: full factorial $2^2$ or $2^3$ design points, denoted as (F), and $2^{3-1}_{III}$ fractional factorial design points, denoted as (FF), which have runs of the form

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{pmatrix}$$
Appendix A of the article outlines the detailed structure of the design point types, that is, the design matrix X, for the Small Box-Behnken Design. The design matrices X of BBD and SBBD are the basis for all subsequent computations. For further details on the construction of SBBD, refer to [24].
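To make the two kinds of building blocks concrete, here is a minimal sketch. It builds the classical BBD for k = 3, 4 or 5 by placing a $2^2$ factorial on every pair of factors (which gives the 12, 24 and 40 factorial runs listed in Table 1), and a $2^{3-1}$ half fraction under the assumed defining relation I = ABC. The function names and the choice of center-run counts are illustrative assumptions; the SBBD construction of [24] itself is not reproduced here.

```python
from itertools import combinations, product

import numpy as np


def bbd_pairs(k, n_center):
    """Classical BBD built from all factor pairs (assumed valid for k = 3, 4, 5)."""
    rows = []
    for i, j in combinations(range(k), 2):
        # 2^2 factorial on factors i and j, all other factors held at 0
        for a, b in product((-1, 1), repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            rows.append(row)
    rows.extend([[0] * k for _ in range(n_center)])  # center runs
    return np.array(rows, dtype=float)


def half_fraction():
    """2^(3-1) fraction with the assumed generator C = AB (I = ABC)."""
    return np.array([[a, b, a * b] for a, b in product((1, -1), repeat=2)], dtype=float)


print(bbd_pairs(3, n_center=5).shape)  # (17, 3): 12 factorial runs + 5 center runs
print(half_fraction())
```

The half fraction uses only four of the eight runs of a $2^3$ factorial, which is where the run savings of a small design come from.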
III. Scaled Prediction Variance
Borkowski [5] gives an analytical form for the scaled prediction variance of central composite and Box-Behnken designs. The scaled prediction variance (SPV) is an essential criterion for selecting response surface designs, since a good design should predict the response well at all points of interest throughout the experimental region. Scaling by the number of runs facilitates comparisons among designs of different sizes. The scaled prediction variance at a point x is given by

$$V(x) = \frac{n\,\mathrm{Var}[\hat{y}(x)]}{\sigma^2} = n\,x^{(m)\prime}(X'X)^{-1}x^{(m)} \qquad (2)$$

where $x^{(m)}$ is the point x in the design space expanded to model form, n is the design size (number of runs), and $\sigma^2$ is the observation error variance. Desirable designs are those with the smallest values of scaled prediction variance.
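The scaled prediction variance of equation (2) translates directly into a few lines of linear algebra. The sketch below is an illustration under stated assumptions: a three-factor BBD with five center runs (17 runs in total), a term order of intercept, linear, quadratic, bilinear, and helper names (`bbd3`, `model_matrix`, `spv`) that do not come from the paper.

```python
from itertools import combinations, product

import numpy as np


def bbd3(n_center=5):
    """Three-factor BBD: a 2^2 factorial on each pair of factors, plus center runs."""
    rows = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0, 0, 0]
            row[i], row[j] = a, b
            rows.append(row)
    rows.extend([[0, 0, 0]] * n_center)
    return np.array(rows, dtype=float)


def model_matrix(points):
    """Expand points to second-order model form: 1, x_i, x_i^2, x_i * x_j."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    k = pts.shape[1]
    cross = [pts[:, i] * pts[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack([np.ones(len(pts)), pts, pts**2] + cross)


def spv(design, x):
    """Scaled prediction variance n * x(m)' (X'X)^{-1} x(m) at a single point x."""
    X = model_matrix(design)
    xm = model_matrix(x)[0]
    return len(design) * xm @ np.linalg.solve(X.T @ X, xm)


design = bbd3()
print(spv(design, [0, 0, 0]))  # SPV at the design center
print(spv(design, [1, 1, 0]))  # SPV at a factorial design point
```

Under these assumptions the SPV is about 3.4 at the center and about 12.75 at a factorial point, so prediction is more precise near the center of this design.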
IV. Fraction of Design Space
The FDS plot is a useful tool for comparing two or more designs, as it shows the SPV distribution of each design as a single curve, together with its G-efficiency and V-average values. The FDS plot [15] is constructed by sampling a large number of points from throughout the design space and obtaining the corresponding SPV values. The FDS plot informs the experimenter how the SPV varies throughout the design space, including the minimum and maximum SPVs. The idea is that the design is better if a larger fraction of the design space is close to the minimum SPV value. Moreover, the design is more stable if the line is flatter. The FDS plot helps summarize the range and the proportions of SPV values in the design space and makes it easy to compare designs, each with a single curve. In addition, it provides the researcher with a single plot to compare designs or study the properties of a specific design [23]. Accordingly, the FDS technique can be applied to regular and non-regular design regions.
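The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a three-factor BBD with five center runs, a spherical design region of radius $\sqrt{2}$, uniform sampling inside that sphere, and hypothetical helper names (`bbd3`, `model_matrix`, `fds_curve`).

```python
from itertools import combinations, product

import numpy as np


def bbd3(n_center=5):
    """Three-factor BBD: a 2^2 factorial on each pair of factors, plus center runs."""
    rows = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0, 0, 0]
            row[i], row[j] = a, b
            rows.append(row)
    rows.extend([[0, 0, 0]] * n_center)
    return np.array(rows, dtype=float)


def model_matrix(points):
    """Expand points to second-order model form: 1, x_i, x_i^2, x_i * x_j."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    k = pts.shape[1]
    cross = [pts[:, i] * pts[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack([np.ones(len(pts)), pts, pts**2] + cross)


def fds_curve(design, n_samples=5000, radius=np.sqrt(2.0), seed=0):
    """Return (fraction of design space, sorted SPV) for points sampled in a sphere."""
    rng = np.random.default_rng(seed)
    k = design.shape[1]
    # Uniform sample inside a k-ball: random direction times radius * U^(1/k)
    directions = rng.standard_normal((n_samples, k))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    points = directions * radius * rng.random((n_samples, 1)) ** (1.0 / k)
    X = model_matrix(design)
    info_inv = np.linalg.inv(X.T @ X)
    F = model_matrix(points)
    # Row-wise quadratic form f' (X'X)^{-1} f, scaled by the number of runs
    values = len(design) * np.einsum("ij,jk,ik->i", F, info_inv, F)
    values.sort()
    fraction = np.arange(1, n_samples + 1) / n_samples
    return fraction, values


fraction, values = fds_curve(bbd3())
print("median SPV:", np.interp(0.5, fraction, values))
print("max sampled SPV:", values[-1])
```

Plotting `values` against `fraction` gives the FDS curve; repeating the call with one row deleted from the design gives the curve for the corresponding missing-observation case.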
V. Relative G-, D-, and A-Efficiency
G-optimality is defined as minimizing the maximum variance of any predicted value over the experimental space. Iwundu [10] investigates how single or multiple missing observations affect the relative A-, D-, and G-efficiency of cuboidal designs. The G-efficiency of a design is defined as

$$G_{eff} = \frac{p}{n \cdot \max_{x \in R} V(x)} \qquad (3)$$

Here p is the number of parameters of the estimated model, n is the number of observations in the respective design, and $\max_{x \in R} V(x)$ is the maximum over the design region R of the variance of the predicted response (in units of $\sigma^2$), so that the denominator equals the maximum scaled prediction variance. Thus, relative G-efficiency, denoted by $RE_G$, is given as the ratio of the $G_{eff}$ of the reduced design to that of the complete design:

$$RE_G = \frac{G_{eff(reduced)}}{G_{eff}} = \frac{n \cdot \max_{x \in R} V(x)}{n_r \cdot \max_{x \in R} V(x)_{reduced}} \qquad (4)$$

where n is the number of runs of the complete design, and $n_r$ is the number of runs of the reduced design. According to this definition, a design with a higher value of $RE_G$ is preferred. The relative G-efficiency values computed from equations (3) and (4) are presented in Tables 3 and 4, where G-efficiency is reported as a percentage.
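As a worked check of equations (3) and (4), the snippet below uses the maximum SPV values reported in Table 1 for k = 3, for which p = 10. It assumes that the denominator of equation (3) is the maximum SPV and that G-efficiency is expressed as a percentage; the function name `g_efficiency` is illustrative.

```python
def g_efficiency(p, max_spv):
    """G-efficiency in percent: 100 * p / (maximum scaled prediction variance)."""
    return 100.0 * p / max_spv


p = 10  # parameters of the second-order model for k = 3

g_full = g_efficiency(p, 16.6260)     # Table 1, k = 3, no missing run
g_reduced = g_efficiency(p, 42.2400)  # Table 1, k = 3, one factorial run missing
re_g = g_reduced / g_full             # relative G-efficiency

print(round(g_full, 2), round(g_reduced, 2), round(re_g, 4))
```

The results, roughly 60.15%, 23.67% and 0.3936, agree with the k = 3 rows of Table 3 (60.14, 23.69 and 0.3939) up to rounding.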
D-optimality is defined as maximizing the determinant of the information matrix or, equivalently, minimizing the determinant of the inverse of the information matrix. Thus, relative D-efficiency is given as

$$RE_D = \left(\frac{|X'X|_{reduced}}{|X'X|}\right)^{1/p} \qquad (5)$$

where p is the number of parameters of the model to be estimated, $|X'X|_{reduced}$ is the determinant of the information matrix of the reduced design, and $|X'X|$ is the determinant of the information matrix of the complete design. A value close to one represents a minor loss, whereas a value well below one represents a more significant loss in model estimation. The relative D-efficiency values computed from equation (5) are listed in Tables 3 and 4.
The A-criterion considers the individual variances of the regression coefficients rather than the covariances among coefficients. Thus, relative A-efficiency is given as

$$RE_A = \frac{\mathrm{trace}\,(X'X)^{-1}}{\mathrm{trace}\,(X'X)^{-1}_{reduced}} \qquad (6)$$

where the trace is the sum of the main diagonal entries of $(X'X)^{-1}$, and $\mathrm{trace}\,(X'X)^{-1}_{reduced}$ is the trace of $(X'X)^{-1}$ for the reduced design. A design with a higher value of $RE_A$ is preferable. The relative A-efficiency values computed from equation (6) are listed in Tables 3 and 4.
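The relative D- and A-efficiencies of equations (5) and (6) can be computed by deleting one row from the model matrix. The sketch below is illustrative only: it assumes a three-factor BBD with five center runs and uses hypothetical helper names (`bbd3`, `model_matrix`, `relative_d`, `relative_a`).

```python
from itertools import combinations, product

import numpy as np


def bbd3(n_center=5):
    """Three-factor BBD: a 2^2 factorial on each pair of factors, plus center runs."""
    rows = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0, 0, 0]
            row[i], row[j] = a, b
            rows.append(row)
    rows.extend([[0, 0, 0]] * n_center)
    return np.array(rows, dtype=float)


def model_matrix(points):
    """Expand points to second-order model form: 1, x_i, x_i^2, x_i * x_j."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    k = pts.shape[1]
    cross = [pts[:, i] * pts[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack([np.ones(len(pts)), pts, pts**2] + cross)


def relative_d(design, missing_row):
    """(|X'X|_reduced / |X'X|)^(1/p) after deleting one run."""
    X = model_matrix(design)
    Xr = np.delete(X, missing_row, axis=0)
    p = X.shape[1]
    _, logdet_full = np.linalg.slogdet(X.T @ X)
    _, logdet_red = np.linalg.slogdet(Xr.T @ Xr)
    return float(np.exp((logdet_red - logdet_full) / p))


def relative_a(design, missing_row):
    """trace((X'X)^-1) of the complete design over that of the reduced design."""
    X = model_matrix(design)
    Xr = np.delete(X, missing_row, axis=0)
    trace_full = np.trace(np.linalg.inv(X.T @ X))
    trace_red = np.trace(np.linalg.inv(Xr.T @ Xr))
    return float(trace_full / trace_red)


design = bbd3()
factorial_row, center_row = 0, len(design) - 1  # first run is factorial, last is a center run
print("RE_D:", relative_d(design, factorial_row), relative_d(design, center_row))
print("RE_A:", relative_a(design, factorial_row), relative_a(design, center_row))
```

Under these assumptions the relative D-efficiencies come out at about 0.8706 (factorial run missing) and 0.9779 (center run missing), in line with the k = 3 rows of Table 3.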
III. Result and Discussion
I. Scaled Prediction Variance of One Missing Box-Behnken Design
Table 1 shows that the Average Scaled Prediction Variance (ASPV) differs only slightly between designs with and without a missing observation. Within each factor's design points, the smallest SPV is preferred for optimal prediction performance. When a factorial design point is missing, factor k = 8 has the lowest maximum SPV value, 24.08, whereas the other factors k = 3, 4, 5, 6, 7 and 9 have maximum SPV values ranging from about 34 to 42.
Table 1: SPV values of BBD with one missing observation, k = 3 to 9 factors.

| Number of Factors | Missing Design Points | Runs | Min SPV | Avg SPV | Max SPV |
|---|---|---|---|---|---|
| k = 3 | None | | 3.2521 | 7.3627 | 16.6260 |
| | F | 12 | 3.0656 | 8.9792 | 42.2400 |
| | Centre | 5 | 3.6256 | 7.0704 | 15.7584 |
| k = 4 | None | | 4.6440 | 10.4340 | 17.4960 |
| | F | 24 | 4.4892 | 10.9765 | 39.4400 |
| | Centre | 6 | 5.1591 | 10.1732 | 16.9157 |
| k = 5 | None | | 6.6424 | 14.9776 | 24.9964 |
| | F | 40 | 6.5025 | 15.3540 | 40.3875 |
| | Centre | 6 | 7.3260 | 14.8095 | 24.5565 |
| k = 6 | None | | 7.7598 | 13.1220 | 21.8160 |
| | F | 48 | 7.7009 | 13.3666 | 33.7663 |
| | Centre | 6 | 8.5807 | 13.3030 | 21.1735 |
| k = 7 | None | | 9.3248 | 15.3698 | 20.0942 |
| | F | 56 | 9.1744 | 15.6709 | 40.1258 |
| | Centre | 6 | 10.5530 | 15.5733 | 19.9958 |
| k = 8 | None | | 12.1800 | 16.7040 | 21.0480 |
| | F | 112 | 12.0309 | 16.7076 | 24.0856 |
| | Centre | 8 | 12.7449 | 17.3621 | 21.5985 |
| k = 9 | None | | 11.8170 | 24.9730 | 34.4110 |
| | F | 120 | 11.7648 | 24.8325 | 34.8945 |
| | Centre | 10 | 12.7839 | 25.0647 | 34.4301 |
II. Scaled Prediction Variance of One Missing Small Box-Behnken Design
In the Small Box-Behnken Design, the scaled prediction variance generally increases with the number of factors, both when no design point is missing and when a center point is missing. However, factor k = 7 deviates from this increasing trend in its maximum SPV. Comparing missing full factorial (F) and missing $2^{3-1}_{III}$ fractional factorial (FF) design points, the average SPV shows a moderate difference for most factors, while for factors k = 8 and 9 the two averages are nearly equal. In contrast, the maximum SPV for factors k = 4, 7 and 9 differs substantially between F and FF design points, while factors k = 5, 6 and 8 exhibit moderate differences. Therefore, we can infer that the design predicts better when the missing observation is a full factorial point than when it is a fractional factorial point.
Table 2: SPV values of SBBD with one missing observation, k = 4 to 9 factors.

| Number of Factors | Missing Design Points | Runs | Min SPV | Avg SPV | Max SPV |
|---|---|---|---|---|---|
| k = 4 | None | | 3.1856 | 14.1042 | 44.0000 |
| | F | 12 | 3.0954 | 18.6249 | 113.4000 |
| | FF | 4 | 2.8371 | 27.5100 | 307.2300 |
| | Centre | 6 | 3.3411 | 13.5639 | 43.2600 |
| k = 5 | None | | 4.8030 | 18.8430 | 60.6000 |
| | F | 8 | 4.7096 | 19.9404 | 65.2500 |
| | FF | 16 | 4.7908 | 21.9820 | 123.8300 |
| | Centre | 6 | 5.1881 | 18.2294 | 58.2900 |
| k = 6 | None | | 6.0458 | 21.2116 | 74.1000 |
| | F | 16 | 5.8793 | 21.7375 | 74.0000 |
| | FF | 16 | 5.9385 | 24.1092 | 109.1500 |
| | Centre | 6 | 6.6304 | 21.0974 | 75.8500 |
| k = 7 | None | | 6.0000 | 23.5152 | 67.2000 |
| | F | 24 | 5.8750 | 25.9534 | 94.4700 |
| | FF | 16 | 5.7528 | 29.6429 | 248.6300 |
| | Centre | 8 | 6.4719 | 23.0723 | 70.5000 |
| k = 8 | None | | 7.7760 | 30.6688 | 85.1200 |
| | F | 28 | 7.6608 | 32.4198 | 102.0600 |
| | FF | 28 | 7.6734 | 32.8734 | 121.5900 |
| | Centre | 8 | 8.7129 | 30.4668 | 85.0500 |
| k = 9 | None | | 7.0000 | 40.4110 | 108.5000 |
| | F | 24 | 6.9000 | 46.1817 | 180.7800 |
| | FF | 36 | 6.9000 | 46.1196 | 325.6800 |
| | Centre | 10 | 7.6659 | 40.1373 | 100.0500 |
III. Relative G, A, and D efficiency values of Box-Behnken Design
Regarding the impact on relative efficiencies, we first consider the relative A-efficiency. Table 3 presents the relative A-efficiencies obtained when a factorial point, a center point, or no point is missing, for all factors from k = 3 to 9. The numerical data indicate that the absence of a factorial point reduces the relative A-efficiency most for factor k = 3 (0.7846), whereas for all other factors the loss is small, with relative A-efficiencies of 0.9181 or higher. When a center point is missing, the relative A-efficiency is similar to that when no design points are missing. Therefore, the individual variances of the regression coefficients of the second-order model remain well estimated when either a factorial or a center run is missing.
The relative D-efficiencies closely mirror the A-efficiencies. When a factorial or center run observation is missing, the relative D-efficiencies are similar to the efficiencies of a complete design for factors k = 3 to 9, which indicates that the covariances among coefficients remain well estimated when an observation is absent.
The relative G-efficiencies follow a similar pattern to the relative A- and D-efficiencies, but with larger losses. The absence of a factorial point strongly affects the relative G-efficiencies for factors k = 3, 4 and 7, moderately affects factors k = 5 and 6, and is less concerning for factors k = 8 and 9. When a center point is missing, the relative G-efficiency is close to or slightly above one for most factors.
Table 3: Relative G-, A-, and D-efficiency values of BBD with one missing observation, k = 3 to 9 factors.

| Number of Factors | Missing Design Points | G-efficiency (%) | Relative G-efficiency | Relative A-efficiency | Relative D-efficiency |
|---|---|---|---|---|---|
| k = 3 | None | 60.1400 | 1.0000 | 1.0000 | 1.0000 |
| | F | 23.6900 | 0.3939 | 0.7846 | 0.8706 |
| | Centre | 63.4600 | 1.0552 | 0.9577 | 0.9779 |
| k = 4 | None | 85.7300 | 1.0000 | 1.0000 | 1.0000 |
| | F | 37.9100 | 0.4422 | 0.9181 | 0.9433 |
| | Centre | 88.6700 | 1.0343 | 0.9736 | 0.9879 |
| k = 5 | None | 85.5100 | 1.0000 | 1.0000 | 1.0000 |
| | F | 52.0000 | 0.6081 | 0.9569 | 0.9675 |
| | Centre | 13.7400 | 0.1607 | 0.9780 | 0.9914 |
| k = 6 | None | 64.17 | 1.0000 | 1.0000 | 1.0000 |
| | F | 82.9200 | 0.6461 | 0.9607 | 0.9709 |
| | Centre | 66 | 1.0303 | 0.9818 | 0.9935 |
| k = 7 | None | 89.5 | 1.0000 | 1.0000 | 1.0000 |
| | F | 89.7100 | 0.5007 | 0.9574 | 0.9731 |
| | Centre | 90.025 | 1.0049 | 0.9836 | 0.9949 |
| k = 8 | None | 71.28 | 1.0000 | 1.0000 | 1.0000 |
| | F | 93.405 | 0.8736 | 0.9853 | 0.9890 |
| | Centre | 69.45 | 0.9744 | 0.9853 | 0.9970 |
| k = 9 | None | 79.84 | 1.0000 | 1.0000 | 1.0000 |
| | F | 78.81 | 0.9871 | 0.9957 | 0.9932 |
| | Centre | 79.88 | 1.0004 | 0.9957 | 0.9981 |
IV. Relative G, A, and D efficiency values of SBBD K = 4 to 9 factors.
Relative A- and D-efficiencies exhibit similar effects for all factors from k = 5 to 9, but not for factor k = 4. For factor k = 4, the relative A-efficiency when a $2^{3-1}_{III}$ fractional factorial point is missing is 0.3906, while the relative D-efficiency is 0.7179. For factors k = 5 to 9, both full factorial and $2^{3-1}_{III}$ fractional factorial points retain good accuracy for individual coefficients and for covariances among coefficients when an observation is missing, in terms of relative A- and D-efficiency. The numerical data indicate that the absence of a center point has little impact on the relative A- and D-efficiency for any factor.
Relative G-efficiencies are most strongly affected for factor k = 4. Factors k = 7 and 9 exhibit better prediction performance than factor k = 4, and factors k = 5, 6 and 8 are best at keeping the maximum prediction variance low when a full factorial observation is missing. When a $2^{3-1}_{III}$ fractional factorial observation is missing, only factors k = 6 and 8 perform marginally better than the other factors of SBBD. The absence of a center point has little impact on the relative G-efficiency for any factor.
Table 4: Relative G-, A-, and D-efficiency values of SBBD with one missing observation, k = 4 to 9 factors.

| Number of Factors | Missing Design Points | G-efficiency (%) | Relative G-efficiency | Relative A-efficiency | Relative D-efficiency |
|---|---|---|---|---|---|
| k = 4 | None | 34.07 | 1.0000 | 1.0000 | 1.0000 |
| | F | 13.2300 | 0.3883 | 0.7360 | 0.8874 |
| | FF | 4.5600 | 0.1338 | 0.3906 | 0.7179 |
| | Centre | 34.6900 | 1.0182 | 0.9855 | 0.9879 |
| k = 5 | None | 34.6200 | 1.0000 | 1.0000 | 1.0000 |
| | F | 32.2000 | 0.9301 | 0.9101 | 0.9399 |
| | FF | 16.9700 | 0.4902 | 0.8006 | 0.9127 |
| | Centre | 35.9900 | 1.0396 | 0.9883 | 0.9914 |
| k = 6 | None | 37.8100 | 1.0000 | 1.0000 | 1.0000 |
| | F | 37.8200 | 1.0003 | 0.9553 | 0.9582 |
| | FF | 25.6800 | 0.6792 | 0.8147 | 0.9285 |
| | Centre | 36.8300 | 0.9741 | 0.9904 | 0.9935 |
| k = 7 | None | 53.6200 | 1.0000 | 1.0000 | 1.0000 |
| | F | 38.0800 | 0.7102 | 0.9097 | 0.9573 |
| | FF | 14.0700 | 0.2624 | 0.7087 | 0.8710 |
| | Centre | 51.0700 | 0.9524 | 0.9959 | 0.9963 |
| k = 8 | None | 53.0100 | 1.0000 | 1.0000 | 1.0000 |
| | F | 44.0600 | 0.8312 | 0.9426 | 0.9700 |
| | FF | 37.0600 | 0.6991 | 0.9013 | 0.9613 |
| | Centre | 52.7300 | 0.9947 | 0.9956 | 0.9971 |
| k = 9 | None | 50.7400 | 1.0000 | 1.0000 | 1.0000 |
| | F | 30.4500 | 0.6001 | 0.8923 | 0.9661 |
| | FF | 16.8800 | 0.3327 | 0.8130 | 0.9519 |
| | Centre | 54.8300 | 1.0023 | 0.9864 | 0.9981 |
V. Discussion on Box-Behnken Design by FDS plot
Reading Figure 1 (a) to identify the most effective design point types in terms of G-efficiency and maximum SPV, the center and non-missing design points outperform the factorial points. They achieve a maximum SPV value of 15.98, which equates to a G-efficiency of 60.6%; this is considerably better than the factorial design points, which reach a maximum SPV value of 42.44 and a G-efficiency of 23.69%.
Figure 1: (a) FDS for BBD K = 3. (b) FDS for BBD K = 4. (c) FDS for BBD K = 5. (d) FDS for BBD K = 6
When evaluating BBD k=4 in Figure 1 (b), we can see a clear difference between the 50th and 75th percentiles of the design points. For the factorial points, the SPV at the 50% FDS is 19.80, and at the 75% FDS, it is 29.92, resulting in a difference of 10.12. On the other hand, for the center and non-missing points, the SPV at the 50% FDS is 10.29, and at the 75% FDS, it is 13.68, yielding a smaller difference of only 3.39. This percentile-based assessment leads us to conclude that the center and non-missing points demonstrate more consistent and superior performance across the design space than the factorial points.
The FDS plot depicted in Figure 1 (c) for BBD k=5 suggests that when either a center or a factorial design point is missing, the FDS curve is far from a flat horizontal line. The median SPV for the center and non-missing points is 15.21, with a mean SPV of 14.8, indicating a lack of symmetry in the SPV and FDS distributions [11]. When a factorial point is missing, the maximum SPV reaches 40.38, and the average SPV is 15.34, suggesting that 50% of the design space exhibits moderate prediction performance.
The FDS plots in Figures 1 (d) and (e) reveal that the curves for the center and non-missing design points exhibit similar prediction performance. Both reach a maximum SPV value of approximately 20 at FDS=1 for factors six and seven. A nearly flat segment is noticeable in the curves of the center and non-missing design points for both factors. This flat segment begins at the median of the FDS and extends to roughly 90% of the design space region, suggesting a certain level of stability in the SPV distribution of the design points. When a factorial point is missing, factor six achieves a maximum SPV value of 33.76, indicating better prediction performance than factor seven, which has a maximum SPV value of 40.13.
Comparing factors eight and nine in the FDS plots in Figures 1 (f) and (g), BBD k=8 exhibits very similar curves across the entire design space region, each a slightly increasing, nearly horizontal line. The SPV values range from a minimum of 12 to a maximum of 21. For BBD k=9, the curves for the center, factorial, and non-missing design points have similar prediction performance and are very close to each other, with SPV values ranging from 12 (min) to 34 (max) across the entire design space region. Interestingly, for factors k = 8 and 9 the prediction performance is comparable whichever type of design point is missing.
Figure 1: (e) FDS for BBD K = 7. (f) FDS for BBD K = 8. (g) FDS for BBD K = 9.
VI. Discussion on Small Box-Behnken Design by FDS plot
Figure 2 (a) presents four types of design points: full factorial, $2^{3-1}_{III}$ fractional factorial, center, and non-missing design points. The point at approximately (0.50, 12.15) indicates that 50% of the total design space has an SPV value at or below about 12 when one center run is missing and when no run is missing; for these design points the curve is flat and horizontal over most of the design space region up to FDS = 1. A flatter curve implies that the maximum and minimum SPV values are closer together, indicating a more stable distribution of the SPV [11]. The full factorial design points of SBBD k = 4 consist of 12 runs. When a $2^2$ full factorial point is missing, the maximum SPV value is 113 (at FDS = 1), and 80% of the design region has an SPV value of 47 or below, which is close to the maximum SPV obtained when a center run is missing. From this, we can infer that when a $2^2$ full factorial point is missing, 80% of the design space region has a moderate SPV compared to the maximum at FDS = 1. The $2^{3-1}_{III}$ fractional factorial design points in SBBD k = 4 consist of four runs. When an FF point is missing, it results in a large SPV value of 307 at the maximum (FDS = 1) of the design space region. However, 50% of the total design space has an SPV value at or below 76.6, and 75% of the total design space region has an SPV value at or below 111.78. This suggests that FF has a moderate SPV for 75% of the design space region compared to the large SPV value at the maximum (FDS = 1).
The FDS plot of the $2^3$ full factorial, center, and non-missing design points is depicted in Figure 2 (b). These design points exhibit similar performance, as indicated by their comparable curves and G-efficiencies of 32.20, 35.99, and 34.62, respectively. Upon closer inspection, the center and non-missing curves are strikingly similar across the entire design space region, from the minimum (FDS=0) to the maximum (FDS=1), with maximum SPV values of 58.29 and 60.60, respectively. However, the $2^3$ full factorial curve deviates slightly from the other two in the FDS region around 75%. The average SPV values for the center, full factorial, and non-missing points are also similar at 18.22, 19.94, and 18.84, respectively, suggesting that the absence of these design points during experimentation does not significantly impact the average prediction performance.
Despite having a high SPV value of 123.83 at the maximum (FDS=1), the $2^{3-1}_{III}$ fractional factorial design points maintain a median SPV value of 31.82 and an SPV value of 60.42 or below for 88% of the total design space. This roughly equals the maximum SPV value (FDS = 1) of the center, factorial, and non-missing runs. Therefore, we can conclude that $2^{3-1}_{III}$ fractional factorial design points generally provide good prediction performance for most of the design space (roughly 90%), but their performance is subpar near the maximum region (FDS = 1).
As depicted in Figure 2 (c), the design points for the factor k = 6, including center, $2^3$ full factorial, and non-missing, exhibit a similar horizontal curve across the entire design space region, from the minimum (FDS = 0) to the maximum (FDS = 1). These design points have a G-efficiency of 37% and a maximum SPV value of 74. The SPV value lies between 21 and 30 over the region from 55% to 80% of the design space. It is interesting to note that while the full factorial and fractional factorial design points consist of the same number of runs (sixteen), the absence of a run from the $2^{3-1}_{III}$ fractional factorial alone results in a large SPV value of 109.45 in the maximum (FDS = 1) region. The curve for the $2^{3-1}_{III}$ fractional factorial is closely aligned with those of the other design points from the minimum (FDS = 0) region up to 80% of the design space, with an SPV of 40.47 or less. All design points demonstrate moderate or average prediction performance up to 80% of the total design space region, indicating satisfactory prediction performance within this FDS region.
As shown in Figure 2 (d), the $2^{3-1}_{III}$ fractional factorial design points are the only ones that do not exhibit a flat horizontal curve among all design points. The center and non-missing design points share the same horizontal curve, with an average SPV of 22.79 and similar G-efficiencies of 51 and 53.62, respectively. The $2^3$ full factorial design points of SBBD k = 7 consist of 24 runs. If one of these runs is missing from the experiment, the prediction performance of the design remains relatively consistent over approximately 80% of the total design space region, with an SPV value of 42.76. The $2^{3-1}_{III}$ fractional factorial design points have a substantial SPV value of 248.63 in the maximum (FDS=1) design space region. However, the SPV at the median of the FDS curve is about 70, so the maximum SPV is more than twice the median value. Therefore, we can conclude that the prediction performance of the $2^{3-1}_{III}$ fractional factorial is moderate for only 50% of the total design space region.
As per the FDS plot in Figure 2 (e), the prediction performance across all shrinkage levels is quite similar for all types of design points, ranging from the lowest to the highest SPV value. Approximately half of each FDS curve closely follows the average SPV value, while the remainder diverges towards the maximum (FDS=1). The global FDS curve suggests that less than 25% of the design space has an SPV value of 29.61 or lower, and 50% has an SPV value of 39.62 or lower for all types of points, including those without missing values. For up to 75% of the design space, the prediction performance of non-missing and center points is significantly better than that of $2^2$ or $2^3$ full factorial points, and missing $2^2$ or $2^3$ full factorial points outperform missing $2^{3-1}_{III}$ fractional factorial points.
Figure 2: (a) FDS for SBBD K = 4. (b) FDS for SBBD K = 5. (c) FDS for SBBD K = 6
Figure 2: (d) FDS for SBBD K = 7. (e) FDS for SBBD K = 8. (f) FDS for SBBD K = 9
IV. Conclusion
The robustness of a single missing observation in BBD and SBBD is examined to determine which design points offer the best design efficiency and model parameter estimation using an information-based criterion. BBD and SBBD show good accuracy in estimating individual coefficients and covariances among coefficients when a design point is missing for all factors, except for factor k = 3 in BBD and k = 4 in SBBD, which show poor accuracy. The relative G- efficiency of BBD indicates that factors with increasing numbers have lower maximum variance across the experimental space, except for factor k = 7. In SBBD, full factorial and center design points have lower maximum variance across the experimental space than fractional factorial design points. The FDS plot reveals that missing a center and non-missing run have similar SPV values for all factors across the entire design space region for both BBD and SBBD. In comparison, the factorial design point type in SBBD has higher SPV values than BBD despite the fewer runs in SBBD. For BBD, all factors have similar performance for factorial design points. However, in SBBD, full factorial design points outperform 2fI-1 fractional factorial design points have a high SPV value at the maximum region (FDS =1). If a design point is missing in the 23I-1 fractional factorial design points of SBBD, it results in subpar prediction performance.
This work helps identify which design point types are most robust to loss when an observation is missing in an experiment run with a Box-Behnken Design or a Small Box-Behnken Design, for the numbers of factors studied. The findings are useful when data may be lost or corrupted during the experimental process. Future studies could extend this work to more factors and to more than one missing observation, across multiple combinations of missing points, which would give a more comprehensive understanding of the robustness of these designs under various conditions.
References
[1] Akhtar, M. (2001). Five-Factor Central Composite Designs Robust To A Pair Of Missing Observations. J. Res. Sci, 12(2), 105-115.
[2] Akhtar, M. and Prescott, P. (1986). Response surface designs robust to missing observations. Communications in Statistics - Simulation and Computation, 15(2), 345-363.
[3] Alanazi, K. Georgiou, S. D. and Stylianou, S. (2023). On the robustness of central composite designs when missing two observations. Quality and Reliability Engineering International, 39(4), 1143-1171.
[4] Alrweili, H. Georgiou, S. and Stylianou, S. (2019). Robustness of response surface designs to missing data. Quality and Reliability Engineering International, 35(5), 1288-1296.
[5] Borkowski, J. J. (1995). Spherical prediction-variance properties of central composite and Box-Behnken designs. Technometrics, 37(4), 399-410.
[6] Box, G. E. P. and Behnken, D. W. (1960). Some New Three Level Designs for the Study of Quantitative Variables. Technometrics, 2(4), 455-475.
[7] Draper, N. R. (1961). Missing Values in Response Surface Designs. Technometrics, 3(3), 389.
[8] Hayat, H. Akbar, A. Ahmad, T. Bhatti, S. H. and Ullah, M. I. (2023). Robustness to missing observation and optimalities of response surface designs with regular and complex structure. Communications in Statistics - Simulation and Computation, 52(11), 5213-5230.
[9] Hemavathi, M. Varghese, E. Shekhar, S. Athulya, C. K. Ebeneezar, S. Gills, R. and Jaggi, S. (2022). Robustness of sequential third-order response surface design to missing observations. Journal of Taibah University for Science, 16(1), 270-279.
[10] Iwundu, M. (2017). Missing observations: The loss in relative A-, D- and G-efficiency. International Journal of Advanced Mathematical Sciences, 5(2), 43.
[11] Khuri, A. I. and Mukhopadhyay, S. (2010, March). Response surface methodology. Wiley Interdisciplinary Reviews: Computational Statistics.
[12] Li, J. Li, L. Borror, C. M. Anderson-Cook, C. and Montgomery, D. C. (2009). Graphical Summaries to Compare Prediction Variance Performance for Variations of the Central Composite Design for 6 to 10 Factors. Quality Technology & Quantitative Management, 6(4), 433-449.
[13] Chigbu, P. E. Ukaegbu, E. C. and Nwanya, J. C. (2009). On comparing the prediction variances of some central composite designs in spherical regions: A review. Statistica, 69(4), 285-298.
[14] Onwuamaeze, C. U. (2021). Optimal prediction variance properties of some central composite designs in the hypercube. Communications in Statistics - Theory and Methods, 50(8), 1911-1924.
[15] Ozol-Godfrey, A. Anderson-Cook, C. M. and Montgomery, D. C. (2005). Fraction of Design Space Plots for Examining Model Robustness. Journal of Quality Technology, 37(3), 223-235.
[16] Park, Y. J. Richardson, D. E. Montgomery, D. C. Ozol-Godfrey, A. Borror, C. M. and Anderson-Cook, C. M. (2005). Prediction variance properties of second-order designs for cuboidal regions. Journal of Quality Technology, 37(4), 253-266.
[17] Rashid, F. Akbar, A. and Arshad, H. M. (2022). Effects of missing observations on predictive capability of augmented Box-Behnken designs. Communications in Statistics - Theory and Methods, 52(20), 7225-7242.
[18] Rashid, F. Akbar, A. and Zafar, Z. (2019). Some new third order designs robust to one missing observation. Communications in Statistics - Theory and Methods, 48(24), 6054-6062.
[19] Myers, R. H. Montgomery, D. C. and Anderson-Cook, C. M. (2016). Response surface methodology: process and product optimization using designed experiments. John Wiley & Sons.
[20] Smucker, B. J. Jensen, W. Wu, Z. and Wang, B. (2017). Robustness of classical and optimal designs to missing observations. Computational Statistics and Data Analysis, 113, 251-260.
[21] Tanco, M. Del Castillo, E. and Viles, E. (2013). Robustness of three-level response surface designs against missing data. IIE Transactions (Institute of Industrial Engineers), 45(5), 544-553.
[22] Whittinghill, D. C. (1998). A note on the robustness of Box-Behnken designs to the unavailability of data. Metrika (Vol. 48). Springer-Verlag.
[23] Zahran, A. Anderson-Cook, C. M. and Myers, R. H. (2003). Fraction of design space to assess prediction capability of response surface designs. Journal of Quality Technology, 35(4), 377-386.
[24] Zhang, T. F. Yang, J. F. and Lin, D. K. J. (2011). Small Box-Behnken design. Statistics and Probability Letters, 82(8), 1027-1033.