Article

Determination of the Optimal Neural Network Transfer Function for Response Surface Methodology and Robust Design

Tuan-Ho Le, Hyeonae Jang and Sangmun Shin
1 Department of Electrical Engineering, Faculty of Engineering and Technology, Quy Nhon University, Binh Dinh 591417, Vietnam
2 Department of Technology Management Engineering, Jeonju University, Jeonju 55069, Korea
3 Department of Industrial & Management Systems Engineering, Dong-A University, Busan 49315, Korea
* Authors to whom correspondence should be addressed.
Appl. Sci. 2021, 11(15), 6768; https://doi.org/10.3390/app11156768
Submission received: 13 May 2021 / Revised: 14 July 2021 / Accepted: 20 July 2021 / Published: 23 July 2021
(This article belongs to the Section Applied Industrial Technologies)

Abstract

Response surface methodology (RSM) has been widely recognized as an essential estimation tool in many robust design studies investigating the second-order polynomial functional relationship between the responses of interest and their associated input variables. However, there is scope for improvement in the flexibility of estimation models and the accuracy of their results. Although many neural network (NN)-based estimation and optimization approaches have been reported in the literature, a closed functional form is not readily available from them. To address this limitation, a maximum-likelihood estimation approach for an NN-based response function estimation (NRFE) is used to obtain the functional forms of the process mean and standard deviation. While the estimation results of most existing NN-based approaches depend primarily on their transfer functions, these approaches often require a screening procedure over various transfer functions. In this study, the proposed NRFE introduces a new screening procedure that obtains the best transfer function in an NN structure from a desirability function family while determining its associated weight parameters. A statistical simulation was performed to evaluate the efficiency of the proposed NRFE method; in this particular simulation, the proposed NRFE method provided significantly better results than conventional RSM. Finally, a numerical example is used to validate the proposed method.

1. Introduction

Among the many quality engineering methodologies, robust design (RD), based on statistical design, analysis, and optimization methods, has contributed significantly to the improvement of product/process quality for more than 20 years. The main objective of RD is to identify the optimal factor settings that minimize both variability and bias (namely, the deviation between the process mean and the desired target value) in a process/product. To demonstrate this RD principle, Taguchi introduced a two-step model based on new design and analysis approaches, particularly orthogonal arrays and signal-to-noise ratios [1]. However, orthogonal arrays, statistical analysis, and signal-to-noise ratios are highly controversial [2,3,4,5]. Consequently, Vining and Myers [6] improved Taguchi's model by proposing dual-response (DR) estimation and optimization approaches, in which separate functions for the mean and variance responses are estimated using response surface methodology (RSM) based on the least-squares method (LSM). Their optimization approach can calculate robust factor settings by minimizing process variability while maintaining the process mean at the target value. The primary RD procedure thus comprises three sequential steps: design of experiments (DoE), response function estimation, and optimization. However, Lin and Tu [7] pointed out that process bias and variability are not considered simultaneously in the DR model and proposed a mean square error (MSE) model as an alternative that provides more modeling flexibility than the DR approach. These DR and MSE models were further extended by Del Castillo and Montgomery [8], Copeland and Nelson [9], Cho et al. [10], Ames et al. [11], Borror [12], Kim and Lin [13], Koksoy and Doganaksoy [14], Ding et al. [15], Shin and Cho [16,17,18], Robinson et al. [19], Fogliatto [20], Goethals and Cho [21], Truong and Shin [22,23], Nha et al. [24], Baba et al. [25], and Yanıkoğlu et al. [26].
In the response function estimation step of the RD procedure, RSM is used extensively to obtain the estimated function relating an output response to its associated input factors based on the DoE results. RSM is applied widely in real-world industrial situations, mainly where several input factors potentially affect the performance measures or quality characteristics of a product or process [27]. The relationship between several input factors and one or more associated output responses can be estimated using RSM. Conventionally, the LSM estimates the unknown coefficients of a regression model under specific error assumptions, such as that the errors in the output responses are independently and identically normally distributed with zero mean and constant variance. However, these assumptions are often violated in practical problems. Several alternative approaches, such as transformations, weighted least squares (WLS), maximum-likelihood estimation (MLE), Bayesian methods, and inverse problems, can be used to address this issue. For example, the WLS method has been applied to RSM to estimate model parameters for unbalanced data [28,29], and Lee and Park [30] and Cho et al. [31] integrated an MLE method and an expectation–maximization algorithm to estimate unknown parameters from incomplete data. In addition, Goegebeur et al. [32] and Chen and Ye [33,34] proposed Bayesian approaches to estimate the coefficients of response functions whose variances follow a log-normal distribution. Truong and Shin [22,23] proposed a new estimation method that uses an inverse-problem approach based on Bayesian perspectives. Most existing estimation methods in the RSM and RD literature attempt to generate functional forms in the input factors for the output responses (namely, the process mean, process variance, and quality characteristic). However, relaxing the error assumptions and increasing the estimation precision would yield further improvements.
Artificial neural networks (ANNs), or neural networks (NNs), with nonlinear mapping structures inspired by the function of the human brain, have been widely used in recent decades as powerful tools for data classification, forecasting, clustering, function approximation, and optimization. Irie and Miyake [35], Hornik et al. [36], Cybenko [37], and Funahashi [38] addressed the question of approximation using feed-forward NNs and proved that NNs are universal function approximators capable of any desired accuracy. ANNs belong to a class of self-adaptive, data-driven techniques; therefore, undefined relationships between the input factors and outputs of a process/product can be ascertained effectively. ANNs can capture a linear or nonlinear relationship between inputs and several outputs without any distributional assumptions, based on the generalizing capacity of the activation function. Consequently, the functional relationships between input factors and their related output responses in an RD procedure can be estimated effectively without any specific assumptions. Most of the literature on NN function approximation has concentrated on network structure, hidden layers, hidden neurons, and training algorithms. By contrast, fewer studies focus on the transfer (or activation) function, which can strongly affect the complexity and performance of NNs. In addition, most transfer functions reported in the literature provide only a single fixed configuration for transferring information from the inputs to the associated outputs of a neuron.
The desirability function (DF) approach is one of the most commonly utilized techniques for multiple-response optimization problems, in which each quality characteristic is converted into an individual DF that takes a value ranging from 0 to 1. The DF concept was introduced by Harrington [39] for the simultaneous optimization of multiple-response problems in industry. This approach can be used to determine the optimal settings of the input variables that yield the most desirable output values while keeping every response value within its specified boundaries. Derringer and Suich [40] further extended Harrington's DF. Depending on whether the response is to be minimized, maximized, or kept at a desired target value, there are three associated DFs: STB (smaller is better), LTB (larger is better), and NTB (nominal is best). Through their weight parameters, the DF family can provide a flexible transfer function configuration.
Rowlands et al. [41] integrated an NN approach into RD and used the NN to perform the DoE. Su and Hsieh [42], Cook et al. [43], Chow et al. [44], Chang [45], and Chang and Chen [46] combined an NN approach as an estimation method with a genetic algorithm to determine the optimal process parameter settings or the optimal costs in both static and dynamic characteristics in Taguchi’s formula without considering the process mean and variance functions. Recently, Arungpadang and Kim [47] presented a feed-forward NN structure based on RSM in the RD concept.
In the RD concept, the process mean and standard deviation of an output response observed in experimental results can be estimated as functions by many different statistical approaches (e.g., LSM and MLE). These process mean and standard deviation functions can then be optimized simultaneously toward their desired targets using existing optimization methods. The main objective of this paper is to propose an alternative method for estimating the DR functions (i.e., the mean and standard deviation functions) by integrating NN approaches into RSM and RD modeling, and then to compare this method with the conventional LSM-based estimation method. First, a feed-forward NN is integrated into the RD procedure as a new DR estimation approach to estimate the process mean and standard deviation response functions. In developing the new NN-based estimation method, many different types of transfer functions, which can affect the quality of the estimation results, should be considered.
For this reason, DFs can be utilized as a transfer function family because they can represent all three types of quality characteristics (i.e., L-type, S-type, and N-type). Among these DF-based transfer functions, the best transfer function for the experimental data is then identified in the proposed approach using the specified learning and validation procedures; this best-transfer-function determination procedure for a given NN structure is derived as part of the proposed DR estimation method. DFs with their associated weight parameters can provide a wide variety of transfer function families. In addition, in the RD methodology, the process mean is kept at the target value while the process standard deviation is minimized as far as possible. Therefore, the "nominal is best" and "smaller is better" DFs are proposed as new NN transfer functions. Next, the best transfer functions for the process mean response function and the standard deviation response function are determined based on the optimal weight parameters of the proposed transfer functions using MLE. The associated confidence intervals for the optimal weight parameters are also derived. The results of a case study are presented to support the view that the proposed feed-forward NN-based DR estimation method obtains better solutions than both the conventional LSM and a feed-forward NN estimation approach using the conventional log-sigmoid transfer function. An overview of the proposed NN-based DR function estimation method is presented in Figure 1.

2. RD Estimation Method Based on RSM

RSM, developed by Box and Wilson [48], consists of mathematical and statistical techniques based on the fittings of empirical models obtained from an experimental design. RSM can estimate the functional empirical relationship between the input factors and their related output responses. The basic theory, estimation methods, and analytical techniques of RSM can be found in the study by Myers and Montgomery [27]. Myers [49] and Khuri and Mukhopadhyay [50] provide insights into the various developmental stages and future directions of RSM. In general, the output response y can be identified as a function of the input factor x as follows:
$$y = \mathbf{x}^{T}\boldsymbol{\beta} + \varepsilon, \qquad (1)$$
where $\mathbf{x}$, $\boldsymbol{\beta}$, and $\varepsilon$ denote the vector of control factors, the column vector of model parameters, and the random error, respectively. The estimated second-order models for the process mean and standard deviation functions are represented as follows:
$$\hat{\mu}_{LSM}(\mathbf{x}) = \hat{\beta}_0 + \sum_{i=1}^{p}\hat{\beta}_i x_i + \sum_{i=1}^{p}\hat{\beta}_{ii} x_i^{2} + \sum_{\substack{i=1 \\ i<j}}^{p}\sum_{j=1}^{p}\hat{\beta}_{ij} x_i x_j, \qquad (2)$$
$$\hat{\sigma}_{LSM}(\mathbf{x}) = \hat{\delta}_0 + \sum_{i=1}^{p}\hat{\delta}_i x_i + \sum_{i=1}^{p}\hat{\delta}_{ii} x_i^{2} + \sum_{\substack{i=1 \\ i<j}}^{p}\sum_{j=1}^{p}\hat{\delta}_{ij} x_i x_j, \qquad (3)$$
where $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\delta}}$ are the estimators of the unknown parameters in the process mean and standard deviation functions, respectively. These coefficients are estimated using the conventional LSM, as follows:
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\bar{\mathbf{y}}_{obs} \quad \text{and} \quad \hat{\boldsymbol{\delta}} = \left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{s}_{obs}, \qquad (4)$$
where $\bar{\mathbf{y}}_{obs}$, $\mathbf{s}_{obs}$, and $\mathbf{X}$ are the observation averages, the standard deviations of the replicated responses, and the design matrix, respectively.
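As a minimal sketch of Equation (4), the following NumPy fragment computes both coefficient vectors from a hypothetical two-factor 3² design; the design points and response summaries are placeholders, not the printing data of the case study.

```python
import numpy as np

# Hypothetical 3^2 design with two control factors (not the paper's data).
levels = [-1.0, 0.0, 1.0]
g1, g2 = np.meshgrid(levels, levels)
x1, x2 = g1.ravel(), g2.ravel()

# Design matrix X for the second-order model of Equations (2)-(3):
# columns are [1, x1, x2, x1^2, x2^2, x1*x2].
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Average and standard deviation of the replicated responses at each run.
y_bar = np.array([50.1, 55.3, 49.8, 60.2, 66.0, 59.7, 52.4, 58.1, 51.9])
s_obs = np.array([2.1, 1.8, 2.5, 1.5, 1.2, 1.9, 2.3, 2.0, 2.6])

# Equation (4): beta_hat = (X'X)^{-1} X' y_bar and delta_hat = (X'X)^{-1} X' s.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y_bar)
delta_hat = np.linalg.solve(X.T @ X, X.T @ s_obs)
print("mean-model coefficients:", beta_hat)
print("std-model coefficients: ", delta_hat)
```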

3. Proposed Feed-Forward NN-Based DR Estimation Method

3.1. Feed-Forward NN Structure

NNs may include a broad class of flexible nonlinear regression, discriminant and data reduction models, and nonlinear dynamical systems [51]. An NN consists of simple, highly interconnected computational units called artificial neurons. Zainuddin and Pauline [52] demonstrated that feed-forward NNs, which pass information from the input to the output, are the most popular NNs for function approximation. In addition, Cybenko [37], Hornik et al. [36], Funahashi [38], and Hartman et al. [53] demonstrated that a feed-forward NN with one hidden layer can approximate a continuous, arbitrary nonlinear, multidimensional function. Both the process mean and the process standard deviation can serve as the two output response characteristics of interest in the RD methodology. The proposed feed-forward NN-based method for estimating the functional relationship between input factors and output responses in RD is depicted in Figure 2.
A vector $\mathbf{x}$ comprising $k$ control factors $x_1, \dots, x_i, \dots, x_k$, called the input layer, is the input to the hidden neurons in the hidden layer. The weighted sum of the $k$ inputs of a neuron plus a bias $a$ is transformed by the transfer function $f$ to generate the associated output of the hidden neuron. The outputs of the hidden neurons are used as inputs to the neuron in the output layer. Assume that there are $h$ hidden nodes $1, \dots, j, \dots, h$, and let $w_{ij}$, $v_j$, $a_j$, and $b$ denote the weight connecting input factor $x_i$ to hidden node $j$, the weight connecting hidden node $j$ to the final output, the bias at hidden node $j$, and the bias at the output, respectively. Then, the general form of the response function obtained using the proposed NN-based estimation method is as follows:
$$\hat{y}_{NN} = g\left(\sum_{j=1}^{h} v_j y_j^{hid} + b^{out}\right) = g\left(\sum_{j=1}^{h} v_j\, f\!\left(\sum_{i=1}^{k} w_{ij} x_i + a_j\right) + b^{out}\right). \qquad (5)$$
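A direct reading of Equation (5) is a single forward pass; the sketch below uses random placeholder weights and a log-sigmoid stand-in for the transfer function $f$.

```python
import numpy as np

def forward(x, W, a, v, b, f):
    """Equation (5): hidden outputs f(W'x + a), then a linear output layer."""
    y_hid = f(W.T @ x + a)   # outputs of the h hidden neurons
    return v @ y_hid + b     # linear transfer g at the single output neuron

# Illustrative use with random placeholder weights and a log-sigmoid f:
rng = np.random.default_rng(0)
k, h = 3, 20                                  # 3 input factors, 20 hidden nodes
W = rng.normal(size=(k, h))                   # w_ij: input i -> hidden node j
a = rng.normal(size=h)                        # a_j: hidden biases
v, b = rng.normal(size=h), 0.0                # v_j and output bias b
logsig = lambda z: 1.0 / (1.0 + np.exp(-z))
print(forward(np.array([1.0, 0.0, -1.0]), W, a, v, b, logsig))
```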

3.2. Back-Propagation Learning Algorithm

Among the learning algorithms reported in the literature for performing NN estimation, the back-propagation algorithm is commonly used for training a network owing to its simplicity and applicability [54]. This algorithm comprises two sequential steps: first, the weights of the NN are initialized randomly; then, the output of the NN is computed and compared with the actual output. The error at the output layer, that is, the network error, is calculated by comparing the network output $\hat{y}^{out}$ to the desired value $y^{out}$ to iteratively adjust the weights of the hidden and output layers in accordance with the following function:
$$E = \frac{1}{2}\left(y^{out} - \hat{y}^{out}\right)^{2}. \qquad (6)$$
An epoch is a hyperparameter that defines the number of times the entire set of training vectors is used to update the weights. During the training process, the error is iteratively minimized. The iterative step of the gradient descent algorithm changes the weights $w_j$ based on the following relationship:
$$w_j \leftarrow w_j + \Delta w_j, \quad \text{where} \quad \Delta w_j = -\eta \frac{\partial E}{\partial w_j}. \qquad (7)$$
The parameter η   ( > 0 ) is called the learning rate. There are numerous variations in the basic algorithm based on other optimization techniques (namely, conjugate gradient and Newton methods).
The Marquardt–Levenberg algorithm (trainlm) approximates Newton's method; it is designed to approach second-order training speed without explicitly computing the Hessian matrix.
The resilient back-propagation training algorithm (trainrp) is a local adaptive learning technique, which performs supervised batch learning in a feed-forward NN. Trainrp removes the negative impact of the dimension of the partial derivative. Therefore, only the sign of the derivative implies the direction of a weight update process.
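For concreteness, one plain gradient-descent update per Equations (6) and (7) can be sketched as follows for a log-sigmoid hidden layer; this illustrates basic back-propagation only and is not an implementation of the trainlm or trainrp variants described above.

```python
import numpy as np

def train_step(x, y_target, W, a, v, b, eta=0.01):
    """One plain gradient-descent update (Equation (7)) for the squared
    error of Equation (6), with a log-sigmoid hidden layer. A sketch of
    basic back-propagation, not trainlm or trainrp."""
    y_hid = 1.0 / (1.0 + np.exp(-(W.T @ x + a)))   # hidden-layer outputs
    y_out = v @ y_hid + b                          # network output
    err = y_out - y_target                         # dE/dy_out for Equation (6)
    # Chain rule: output-layer gradients, then hidden-layer gradients.
    grad_v, grad_b = err * y_hid, err
    delta = err * v * y_hid * (1.0 - y_hid)        # back-propagated error
    grad_W, grad_a = np.outer(x, delta), delta
    # Equation (7): w <- w + delta_w with delta_w = -eta * dE/dw.
    return (W - eta * grad_W, a - eta * grad_a,
            v - eta * grad_v, b - eta * grad_b)
```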

4. Proposed Transfer Functions Based on DFs

4.1. Integration of DFs into Transfer Functions

The DF $d(y)$ scales a possible response value $y$ to the range $[0, 1]$, where $d(y) = 0$ and $d(y) = 1$ represent completely undesirable and completely desirable values, respectively. Among the three DFs proposed by Derringer and Suich [40] for the three situations of a particular response, the "nominal is best" and "smaller is better" DFs are utilized as new transfer functions in the hidden NN layers in order to estimate the process mean and standard deviation functions. Assume that there are $q$ runs; the summation at each hidden neuron then forms a vector $\mathbf{y}_j$ consisting of the values $y_1, \dots, y_p, \dots, y_q$, where the input of each hidden neuron at each run is calculated as $y_p(\mathbf{x}) = \sum_{i=1}^{k} x_i w_{ij} + a_j$. The associated DF output can therefore be represented as $\mathbf{d}_j = (d_1, \dots, d_p, \dots, d_q)$. When a response belongs to the "nominal is best" case, the individual DF is defined as
$$d_p(y_p) = \begin{cases} 0 & \text{if } y_p(\mathbf{x}) < L_p \\[2pt] \left(\dfrac{y_p(\mathbf{x}) - L_p}{T_p - L_p}\right)^{s} & \text{if } L_p \le y_p(\mathbf{x}) \le T_p \\[2pt] \left(\dfrac{y_p(\mathbf{x}) - U_p}{T_p - U_p}\right)^{t} & \text{if } T_p \le y_p(\mathbf{x}) \le U_p \\[2pt] 0 & \text{if } y_p(\mathbf{x}) > U_p \end{cases} \qquad (8)$$
When a response needs to be minimized, the individual DF is
$$d_p(y_p) = \begin{cases} 1 & \text{if } y_p(\mathbf{x}) < L_p \\[2pt] \left(\dfrac{y_p(\mathbf{x}) - U_p}{L_p - U_p}\right)^{r} & \text{if } L_p \le y_p(\mathbf{x}) \le U_p \\[2pt] 0 & \text{if } y_p(\mathbf{x}) > U_p \end{cases} \qquad (9)$$
where $L_p$, $T_p$, and $U_p$ denote the lower specification, target, and upper specification values, respectively. Exponents $s$ and $t$ in Equation (8) and exponent $r$ in Equation (9) are the weight parameters of the "nominal is best" and "smaller is better" DFs, respectively. When the values of $s$, $t$, and $r$ are set to 1, the DFs become linear. For $s < 1$, $t < 1$, and $r < 1$, the functions are convex, and for $s > 1$, $t > 1$, and $r > 1$, the functions become concave. The integration of the "nominal is best" and "smaller is better" DFs as new transfer functions in the hidden layers to estimate the process mean and standard deviation is depicted in Figure 3.
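Equations (8) and (9) translate directly into vectorized functions, as in the sketch below; the bounds $L$, $T$, and $U$ and the sample inputs are hypothetical and are not tied to the case study.

```python
import numpy as np

def d_ntb(y, L, T, U, s, t):
    """'Nominal is best' DF, Equation (8): rises from L to the target T with
    exponent s and falls from T to U with exponent t; 0 outside [L, U]."""
    y = np.asarray(y, dtype=float)
    d = np.zeros_like(y)
    up = (y >= L) & (y <= T)
    dn = (y > T) & (y <= U)
    d[up] = ((y[up] - L) / (T - L)) ** s
    d[dn] = ((y[dn] - U) / (T - U)) ** t
    return d

def d_stb(y, L, U, r):
    """'Smaller is better' DF, Equation (9): 1 below L, decreasing with
    exponent r on [L, U], and 0 above U."""
    y = np.asarray(y, dtype=float)
    return np.where(y < L, 1.0,
                    np.where(y > U, 0.0, ((y - U) / (L - U)) ** r))

# s = t = r = 1 gives linear DFs; values < 1 are convex, > 1 concave.
print(d_ntb([2.0, 5.0, 8.0], L=0.0, T=5.0, U=10.0, s=0.5, t=2.0))
print(d_stb([1.0, 5.0, 9.0], L=0.0, U=10.0, r=1.0))
```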
From Equation (5), the proposed estimated process mean and standard deviation functions (namely, $\hat{\mu}_{pro}(\mathbf{x})$ and $\hat{\sigma}_{pro}(\mathbf{x})$) are defined, respectively, as
$$\hat{\mu}_{pro}(\mathbf{x}) = \sum_{j=1}^{h\_mean\_pro} v_j^{mean\_pro}\, d_{mean}^{*}\!\left(\sum_{i=1}^{k} w_{ij}^{mean\_pro} x_i + a_j^{mean\_pro}\right) + b^{mean\_pro}, \qquad (10)$$
$$\hat{\sigma}_{pro}(\mathbf{x}) = \sum_{j=1}^{h\_std\_pro} v_j^{std\_pro}\, d_{std}^{*}\!\left(\sum_{i=1}^{k} w_{ij}^{std\_pro} x_i + a_j^{std\_pro}\right) + b^{std\_pro}, \qquad (11)$$
where $h\_mean\_pro$, $h\_std\_pro$, $a_j^{mean\_pro}$, $a_j^{std\_pro}$, $b^{mean\_pro}$, $b^{std\_pro}$, $v_j^{mean\_pro}$, $v_j^{std\_pro}$, $w_{ij}^{mean\_pro}$, $w_{ij}^{std\_pro}$, $d_{mean}^{*}$, and $d_{std}^{*}$ denote the numbers of hidden neurons, the biases at hidden node $j$, the biases at the output neuron, the weights connecting hidden node $j$ to the output responses, the weights connecting input factor $x_i$ to hidden node $j$ of the proposed feed-forward NNs, and the optimal DFs for the process mean and standard deviation functions, respectively. In addition, the log-sigmoid and linear functions are usually used as the conventional transfer functions in the hidden and output layers, respectively. The conventional log-sigmoid function $f$ and linear function $g$ for an input $x$ are given as
$$f(x) = \frac{1}{1 + \exp(-x)} \quad \text{and} \quad g(x) = x. \qquad (12)$$
Similarly, the estimated process mean and standard deviation functions are given, respectively, as follows:
$$\hat{\mu}_{NN}(\mathbf{x}) = \sum_{j=1}^{h\_mean} v_j^{mean}\, \frac{1}{1 + \exp\!\left(-\sum_{i=1}^{k} w_{ij}^{mean} x_i - a_j^{mean}\right)} + b^{mean\_NN}, \qquad (13)$$
$$\hat{\sigma}_{NN}(\mathbf{x}) = \sum_{j=1}^{h\_std} v_j^{std}\, \frac{1}{1 + \exp\!\left(-\sum_{i=1}^{k} w_{ij}^{std} x_i - a_j^{std}\right)} + b^{std\_NN}, \qquad (14)$$
where $h\_mean$, $h\_std$, $a_j^{mean}$, $a_j^{std}$, $b^{mean\_NN}$, $b^{std\_NN}$, $v_j^{mean}$, $v_j^{std}$, $w_{ij}^{mean}$, and $w_{ij}^{std}$ denote the number of hidden neurons, the bias at hidden node $j$, the bias at the output neuron, the weight connecting hidden node $j$ to the final output, and the weight connecting input factor $x_i$ to hidden node $j$ of the feed-forward NNs for the process mean and standard deviation functions, respectively.

4.2. Estimation of Weight Parameters for NN Transfer Functions

The DF can provide a flexible NN transfer function family through the various values of its weight parameter. Among the likelihood-based estimation methods, the MLE method is adopted to estimate the optimal weight parameters of the NN transfer functions. On the premise that the DF values $d_j$ follow a normal distribution $N(\mu, \sigma^2)$, the distribution of the summation $y_j$ can be inferred.

4.2.1. Estimation of Parameter s

Using Equation (8), the parametric family in this case can be given as $(\mu, \sigma^2, s)$. Based on the principles of MLE, the parameters $(\mu, \sigma^2, s)$ are identified where the log-likelihood function $\ell(d_j; \mu, \sigma^2, s)$, written $\ell(d_j, s)$ for short, is maximized. The likelihood in terms of the summation $y$ is obtained by multiplying the normal probability density function by the Jacobian of the DF, $J(d_j, s)$:
$$\ell(d_j, s) = \log\!\left[\prod_{p=1}^{q} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(d_p - \mu)^2}{2\sigma^2}} \times J(d_j, s)\right], \qquad (15)$$
where
$$J(d_j, s) = \prod_{p=1}^{q} \left|\frac{\partial d_p}{\partial y_p}\right| = \left(\frac{s}{T_p - L_p}\right)^{q} \prod_{p=1}^{q} \left(\frac{y_p - L_p}{T_p - L_p}\right)^{s-1}. \qquad (16)$$
Therefore, the log-likelihood function is
$$\ell(d_j, s) = -\frac{q}{2}\log \sigma^2 - \frac{q}{2}\log 2\pi - \sum_{p=1}^{q}\frac{(d_p - \mu)^2}{2\sigma^2} + q\log\frac{s}{T_p - L_p} + (s-1)\sum_{p=1}^{q}\log\frac{y_p - L_p}{T_p - L_p}. \qquad (17)$$
For a fixed value of $s$, taking the derivative of $\ell(d_j, s)$ with respect to $\sigma^2$, setting this derivative equal to zero, and solving for $\sigma^2$ yields
$$\sigma^2 = \frac{\sum_{p=1}^{q}(d_p - \mu)^2}{q}. \qquad (18)$$
Further, taking the derivative of $\ell(d_j, s)$ with respect to $\mu$, setting it equal to zero, and solving for $\mu$ yields
$$\mu = \frac{1}{q}\sum_{p=1}^{q} d_p = \bar{d}_p. \qquad (19)$$
Accordingly, the profile log-likelihood function can be represented as follows:
$$\ell(d_j, s) = -\frac{q}{2}\log 2\pi - \frac{q}{2}\log\!\left[\frac{\sum_{p=1}^{q}\left(d_p - \bar{d}_p\right)^{2}}{q}\right] - \frac{q}{2} + q\log\frac{s}{T_p - L_p} + (s-1)\sum_{p=1}^{q}\log\frac{y_p - L_p}{T_p - L_p}. \qquad (20)$$
The optimal value $\hat{s}$ is specified using the statistical simulation method.
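One simple realization of this search is a grid scan of the profile log-likelihood in Equation (20), as sketched below; the summation values $y$, the bounds, and the grid range are hypothetical placeholders, and the paper's own simulation procedure may differ.

```python
import numpy as np

def profile_loglik_s(s, y, L, T):
    """Profile log-likelihood of Equation (20): mu and sigma^2 are replaced
    by their MLEs from Equations (18)-(19)."""
    q = len(y)
    d = ((y - L) / (T - L)) ** s                 # DF values, rising branch
    sigma2 = np.sum((d - d.mean()) ** 2) / q     # Equation (18) at the MLE
    return (-q / 2 * np.log(2 * np.pi) - q / 2 * np.log(sigma2) - q / 2
            + q * np.log(s / (T - L))
            + (s - 1) * np.sum(np.log((y - L) / (T - L))))

# Grid search for s_hat; y values lie strictly inside the rising branch (L, T).
rng = np.random.default_rng(1)
y = rng.uniform(1.1, 4.9, size=27)
grid = np.linspace(0.05, 3.0, 500)
ll = [profile_loglik_s(s, y, L=1.0, T=5.0) for s in grid]
print("s_hat =", round(grid[int(np.argmax(ll))], 3))
```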

4.2.2. Estimation of Parameter t

Similarly, from Equation (8), the log-likelihood function for parameter t is
$$\ell(d_j, t) = -\frac{q}{2}\log \sigma^2 - \frac{q}{2}\log 2\pi - \sum_{p=1}^{q}\frac{(d_p - \mu)^2}{2\sigma^2} + q\log\frac{t}{T_p - U_p} + (t-1)\sum_{p=1}^{q}\log\frac{y_p - U_p}{T_p - U_p}. \qquad (21)$$
The profile log-likelihood function used to estimate parameter t can be defined as follows:
$$\ell(d_j, t) = -\frac{q}{2}\log 2\pi - \frac{q}{2}\log\!\left[\frac{\sum_{p=1}^{q}\left(d_p - \bar{d}_p\right)^{2}}{q}\right] - \frac{q}{2} + q\log\frac{t}{T_p - U_p} + (t-1)\sum_{p=1}^{q}\log\frac{y_p - U_p}{T_p - U_p}. \qquad (22)$$
The optimal value $\hat{t}$ is also identified using the statistical simulation method.

4.2.3. Estimation of Parameter r

Using Equation (9), the log-likelihood function for parameter r is
$$\ell(d_j, r) = -\frac{q}{2}\log \sigma^2 - \frac{q}{2}\log 2\pi - \sum_{p=1}^{q}\frac{(d_p - \mu)^2}{2\sigma^2} + q\log\frac{r}{L_p - U_p} + (r-1)\sum_{p=1}^{q}\log\frac{y_p - U_p}{L_p - U_p}. \qquad (23)$$
The profile log-likelihood function can then be defined as follows:
$$\ell(d_j, r) = -\frac{q}{2}\log 2\pi - \frac{q}{2}\log\!\left[\frac{\sum_{p=1}^{q}\left(d_p - \bar{d}_p\right)^{2}}{q}\right] - \frac{q}{2} + q\log\frac{r}{L_p - U_p} + (r-1)\sum_{p=1}^{q}\log\frac{y_p - U_p}{L_p - U_p}. \qquad (24)$$
The optimal value $\hat{r}$ is also determined using the statistical simulation method.

4.3. Confidence Intervals for Weight Parameters

The asymptotic distribution of the maximum-likelihood estimator is used to identify confidence intervals for the weight parameters because a parameter $\hat{\theta}$ estimated using MLE is asymptotically normal [55]. An approximate $(1-\alpha)100\%$ confidence interval for a parameter $\theta$ can be given as follows:
$$\hat{\theta}_i \pm z^{*}\, \widehat{SE}(\hat{\theta}_i), \qquad (25)$$
where $z^{*}$ refers to the $(1 - \alpha/2)100$th percentile of the standard normal distribution and $\widehat{SE}(\hat{\theta}_i)$ represents the estimated standard error of $\hat{\theta}_i$. Based on the parameter estimates $\hat{\theta}$, the standard errors of the parameters are defined as the square roots of the diagonal elements of the estimated covariance matrix. First, the distribution of $\hat{\theta}$ is stated, followed by the derivation of the approximate asymptotic distribution of the parameter $\hat{s}$, $\hat{t}$, or $\hat{r}$.
The first derivative of the log-likelihood function $\ell(\theta|d_j)$ can be given as
$$V = \frac{\partial \ell(\theta|d_j)}{\partial \theta}. \qquad (26)$$
The observed Fisher’s total information matrix is given as
$$\hat{J}(\theta) = -\frac{\partial V}{\partial \theta} = -\frac{\partial^{2} \ell(\theta|d_j)}{\partial \theta^{2}}. \qquad (27)$$
Therefore, Fisher's total information matrix $J(\theta)$ can then be defined as follows:
$$J(\theta) = E\!\left[\hat{J}(\theta)\right]. \qquad (28)$$
The distribution of $\sqrt{q}\,(\hat{\theta} - \theta)$ converges to a normal distribution as $q \to \infty$. In particular,
$$\left(\frac{J(\theta)}{q}\right)^{1/2} \sqrt{q}\,(\hat{\theta} - \theta) \xrightarrow{\,dist\,} N(0, I), \qquad (29)$$
where $J(\theta)$ is Fisher's total information matrix [55]. $J(\theta)$ is typically not known, but by substituting the $q^{1/2}$-consistent estimator $\hat{J}(\hat{\theta})/q$ for $J(\theta)/q$ in Equation (29), the same result can be obtained. Accordingly,
$$\hat{J}(\hat{\theta})^{1/2}\,(\hat{\theta} - \theta) \xrightarrow{\,dist\,} N(0, I) \quad \text{and} \quad \hat{\theta} \sim N\!\left(\theta,\, \hat{J}(\hat{\theta})^{-1}\right). \qquad (30)$$
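Combining Equations (25) and (30), the interval for a single weight parameter can be sketched as below; the function transcribes the closed-form variance estimator of Equations (34)-(35), derived in the next subsection and Appendix A, and the DF values used here are illustrative.

```python
import numpy as np
from statistics import NormalDist

def ci_weight_param(d, theta_hat, alpha=0.05):
    """Approximate (1 - alpha) CI for a DF weight parameter (s, t, or r),
    transcribing the variance estimator of Equations (34)-(35): the inverse
    of the 2x2 block determinant of J_hat(theta)."""
    q = len(d)
    mu = d.mean()
    sigma2 = np.sum((d - mu) ** 2) / q
    J11 = q / sigma2
    J12 = np.sum(d - mu) / sigma2 ** 2            # vanishes at the MLE of mu
    J22 = -q / (2 * sigma2 ** 2) + np.sum((d - mu) ** 2) / sigma2 ** 3
    se = np.sqrt(1.0 / (J11 * J22 - J12 ** 2))    # Equation (35)
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z* percentile, Equation (25)
    return theta_hat - z * se, theta_hat + z * se  # Equation (36)

# Illustrative DF values at one hidden neuron and a previously found s_hat:
d = np.array([0.21, 0.35, 0.42, 0.55, 0.61, 0.48, 0.30, 0.52, 0.44])
print(ci_weight_param(d, theta_hat=1.15))
```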

4.3.1. Confidence Intervals for Parameter s

The general parameter $\theta$ can be written as $\theta = (\mu\ \ \sigma^2\ \ s)^{T}$. From Equation (27), $\hat{J}(\theta)$ can be given as
$$\hat{J}(\theta) = -\begin{bmatrix} \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial s} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial s} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial s\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial s\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial s^{2}} \end{bmatrix}. \qquad (31)$$
From the log-likelihood function in Equation (17), $\hat{J}(\theta)$ can be calculated as follows (see Appendix A.1):
$$\hat{J}(\theta) = \begin{bmatrix} \dfrac{q}{\sigma^{2}} & \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & 0 \\[6pt] \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & -\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}} & 0 \\[6pt] 0 & 0 & \dfrac{q}{s^{2}} \end{bmatrix} = \begin{bmatrix} \hat{J}(\theta)_{11} & \hat{J}(\theta)_{12} & \hat{J}(\theta)_{13} \\ \hat{J}(\theta)_{21} & \hat{J}(\theta)_{22} & \hat{J}(\theta)_{23} \\ \hat{J}(\theta)_{31} & \hat{J}(\theta)_{32} & \hat{J}(\theta)_{33} \end{bmatrix}. \qquad (32)$$
The square roots of the diagonal elements of $\hat{J}(\hat{\theta})^{-1}$ represent estimates of the standard errors of the estimators. An estimator for the standard error of $\hat{s}$ is of particular interest and can be found using the inverses of partitioned matrices. First, $\hat{J}(\theta)$ is partitioned as
$$\hat{J}(\theta) = \begin{bmatrix} \hat{J}(\theta)_{11} & \hat{J}(\theta)_{12} \\ \hat{J}(\theta)_{21} & \hat{J}(\theta)_{22} \end{bmatrix}. \qquad (33)$$
Next, one estimator for $\operatorname{Var}(\hat{s})$ can be obtained as follows:
$$\widehat{\operatorname{Var}}(\hat{s}) = \left[\hat{J}(\theta)_{11}\hat{J}(\theta)_{22} - \hat{J}(\theta)_{21}\hat{J}(\theta)_{12}\right]^{-1} = \left[\frac{q}{\sigma^{2}}\left(-\frac{q}{2}\left(\frac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}\right]^{-1}. \qquad (34)$$
The associated standard error of $\hat{s}$ can be calculated by
$$\widehat{SE}(\hat{s}) = \frac{1}{\sqrt{\dfrac{q}{\sigma^{2}}\left(-\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}}}. \qquad (35)$$
Therefore, an approximate $(1-\alpha)100\%$ confidence interval for $s$ can be obtained by using
$$\hat{s} \pm z^{*}\, \widehat{SE}(\hat{s}). \qquad (36)$$

4.3.2. Confidence Intervals for Parameter t

Similarly, a confidence interval for $t$ can be obtained by using
$$\hat{t} \pm z^{*}\, \widehat{SE}(\hat{t}). \qquad (37)$$
The standard error of $\hat{t}$ can be defined as follows (see Appendix A.2):
$$\widehat{SE}(\hat{t}) = \frac{1}{\sqrt{\dfrac{q}{\sigma^{2}}\left(-\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}}}. \qquad (38)$$

4.3.3. Confidence Intervals for Parameter r

Similarly, a confidence interval for $r$ can be obtained by using
$$\hat{r} \pm z^{*}\, \widehat{SE}(\hat{r}). \qquad (39)$$
The standard error of $\hat{r}$ can be defined as follows (see Appendix A.3):
$$\widehat{SE}(\hat{r}) = \frac{1}{\sqrt{\dfrac{q}{\sigma^{2}}\left(-\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}}}. \qquad (40)$$

5. Case Study

This case study was performed using the printing data example introduced by Vining and Myers [6] and Lin and Tu [7]. The impacts of three input factors, that is, speed ($x_1$), pressure ($x_2$), and distance ($x_3$), on the ability of a printing machine to apply colored inks to package labels ($y$) were investigated. Each input factor had three levels, resulting in a total of 27 runs, and three replications were performed for each factor-level combination.
In comparative studies between methods, the expected quality loss (EQL) is usually applied as a critical optimization criterion in RD. The expectation of the loss function is defined as
$$EQL = \theta\left[\left(\hat{\mu}(\mathbf{x}) - \tau\right)^{2} + \hat{\sigma}^{2}(\mathbf{x})\right], \qquad (41)$$
where $\theta$ represents a positive loss coefficient, and $\hat{\mu}(\mathbf{x})$, $\tau$, and $\hat{\sigma}(\mathbf{x})$ are the estimated process mean function, the desired target value, and the estimated standard deviation function, respectively. In this case study, the target value was $\tau = 500$. Using MATLAB, the estimated process mean and standard deviation functions obtained using the LSM based on RSM were as follows:
$$\hat{\mu}_{LSM}(\mathbf{x}) = 327.6296 + \mathbf{x}^{T}\mathbf{m}_{LSM} + \mathbf{x}^{T} M_{LSM}\, \mathbf{x}, \qquad (42)$$
$$\hat{\sigma}_{LSM}(\mathbf{x}) = 34.8832 + \mathbf{x}^{T}\mathbf{n}_{LSM} + \mathbf{x}^{T} N_{LSM}\, \mathbf{x}, \qquad (43)$$
where
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mathbf{m}_{LSM} = \begin{bmatrix} 177.000 \\ 109.426 \\ 131.463 \end{bmatrix}, \quad M_{LSM} = \begin{bmatrix} 32.000 & 66.028 & 75.472 \\ 66.028 & -22.389 & 43.583 \\ 75.472 & 43.583 & -29.056 \end{bmatrix},$$
$$\mathbf{n}_{LSM} = \begin{bmatrix} 11.527 \\ 15.323 \\ 29.190 \end{bmatrix}, \quad N_{LSM} = \begin{bmatrix} 4.204 & 7.720 & 5.109 \\ 7.720 & -1.316 & 14.082 \\ 5.109 & 14.082 & 16.778 \end{bmatrix}. \qquad (44)$$
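As a check on these surfaces, the EQL of Equation (41) can be evaluated at any factor setting. The sketch below makes two assumptions: the minus signs restored to $M_{LSM}$ and $N_{LSM}$ during reconstruction, and an upper-triangular reading of the quadratic forms so that each interaction term is counted once; under these assumptions it approximately reproduces the Table 5 values at the LSM optimum.

```python
import numpy as np

# Reconstructed LSM surfaces of Equations (42)-(44) and the EQL criterion of
# Equation (41), with theta = 1 and tau = 500. Upper-triangular matrices are
# used so each interaction coefficient enters the quadratic form once
# (an assumption of this sketch).
m = np.array([177.000, 109.426, 131.463])
M = np.array([[32.000, 66.028, 75.472],
              [0.0,   -22.389, 43.583],
              [0.0,     0.0,  -29.056]])
n = np.array([11.527, 15.323, 29.190])
N = np.array([[4.204, 7.720,  5.109],
              [0.0,  -1.316, 14.082],
              [0.0,   0.0,   16.778]])

def eql(x, tau=500.0, theta=1.0):
    mu = 327.6296 + m @ x + x @ M @ x   # Equation (42)
    sd = 34.8832 + n @ x + x @ N @ x    # Equation (43)
    return theta * ((mu - tau) ** 2 + sd ** 2)

# At the LSM-optimal settings of Table 5 this gives an EQL near 2006:
print(eql(np.array([1.000, 0.072, -0.250])))
```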
For comparison, the criteria used in the NN (namely, the number of hidden neurons, training algorithms, and the number of epochs) with the conventional log-sigmoid transfer function in the hidden layer were the same as those of the proposed NNs. The NN information for the conventional log-sigmoid transfer function and the proposed DR estimation method is presented in Table 1. The NN architecture consists of the input factors, hidden neurons, and output responses. The weight and bias matrix values of the NNs using the conventional log-sigmoid transfer function for the process mean and standard deviation functions are presented in Table 2. Based on the structures, training algorithms, transfer functions, and numbers of epochs of the proposed feed-forward NNs used to estimate the process mean and standard deviation functions, the optimal weight parameters $\hat{s}$, $\hat{t}$, and $\hat{r}$ were calculated using Equations (20), (22), and (24), respectively. As shown in Table 3, twenty hidden neurons are used to estimate each of the mean and standard deviation functions, and a desirability function is proposed as the transfer function of each hidden neuron. The weight parameters (i.e., $s$, $t$, and $r$) of the DFs identified in Equations (8) and (9), together with their respective 95% confidence intervals calculated using Equations (35), (38), and (40), are presented in Table 3. The weight and bias matrix values of the feed-forward NNs using the proposed transfer functions for the process mean and standard deviation functions are presented in Table 4.
The estimated process mean and standard deviation functions obtained by applying the conventional LSM-RSM, the NN-based estimation method using the conventional log-sigmoid transfer function, and the proposed DR approach with the estimated optimal weight parameters listed in Table 3, along with the associated coefficients of determination $R^2$ of each estimated response function, are illustrated in Figure 4, Figure 5 and Figure 6 as contour plots, respectively.
The optimal control factor settings, the optimal process parameters (i.e., process mean, bias, and variance), and their associated EQL values obtained by the three different approaches (i.e., the conventional LSM based on RSM, the NN-based estimation method using the conventional log-sigmoid transfer functions, and the proposed NN-based DR estimation method) are presented in Table 5.
As demonstrated in Table 5, the process variance obtained from the NN-based estimation methods is remarkably smaller than that of the conventional statistical LSM approach. Therefore, the NN-based estimation methods can provide considerably smaller EQL values (i.e., a significant criterion for evaluating the level of quality) than the conventional LSM approach in this particular example, although the same RD optimization model is utilized. Of the two NN-based estimation methods, the proposed DR approach provides better optimal solutions than the NN-based estimation method using the conventional log-sigmoid transfer function because the bias obtained from the proposed DR approach is almost zero in this particular example. Therefore, in terms of EQL, the best optimal solutions can be obtained using the NN-based DR estimation method. The criterion spaces (squared bias vs. variance) of the three functions estimated using the conventional LSM approach based on RSM, the NN-based estimation method using the log-sigmoid transfer function, and the proposed DR approach are illustrated in Figure 7. A green cross marks the optimal settings in each figure.

6. Conclusions and Further Studies

This study proposes a feed-forward NN-based DR estimation method to estimate the process mean and standard deviation functions for RSM and RD modeling. The proposed method can avoid the basic error assumptions of the conventional LSM approach in this particular situation. Further, integrating the DFs into the NN structures results in a general and flexible transfer function family for transforming data. The optimal weight parameters of the DF family can be determined using MLE, and the confidence intervals of the optimal weight parameters can be obtained using the asymptotic distribution of the maximum-likelihood estimators. The best transfer function in the NN structure was achieved using the proposed method. The proposed NN-based DR estimation method provides better solutions in terms of EQL than both the conventional LSM approach and the NN-based estimation method using the conventional log-sigmoid transfer function in the particular case study example. In order to improve the reliability of the proposed methods, different types of experimental data (i.e., small and large data sets, different DoE results, and small and large sets of replications) should be considered.
In further studies, the assumption that the error follows a normal distribution, and whether the DF values can follow other distributions, will be investigated. Additionally, the optimal weight parameters of the DF-based transfer functions could be estimated using a Bayesian approach or the Newton–Raphson algorithm, and the confidence intervals of the optimal weight parameters could then be determined using a t-distribution. In addition, a more comprehensive comparative study between the proposed NN models and higher-order models of conventional RSM is a significant further research issue. Based on such a comparative study and a Bayesian approach, a new optimal estimation system may be developed by integrating the proposed NN models and conventional RSM models.

Author Contributions

Conceptualization, T.-H.L. and S.S.; Methodology, T.-H.L. and S.S.; Modeling, T.-H.L.; Validation T.-H.L. and S.S.; Writing—original draft preparation, T.-H.L. and S.S.; Writing—Review and Editing, H.J. and S.S.; Funding Acquisition, H.J. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) (No. NRF-2019R1F1A1060067 and No. NRF-2019R1G1A1010335).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Confidence Intervals for Weight Parameters

Appendix A.1. Parameter s

The observed Fisher's total information matrix $\hat{J}(\theta)$ for the general parameter $\theta$ is defined as follows:
$$\hat{J}(\theta) = -\begin{bmatrix} \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial s} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial s} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial s\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial s\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial s^{2}} \end{bmatrix}.$$
The first derivatives of the log-likelihood function $\ell(\theta|d_j)$ for the mean, variance, and parameter $s$ can be expressed as follows:
$$\frac{\partial \ell(\theta|d_j)}{\partial \mu} = \sum_{p=1}^{q}\frac{d_p - \mu}{\sigma^{2}}, \quad \frac{\partial \ell(\theta|d_j)}{\partial \sigma^{2}} = -\frac{q}{2}\,\frac{1}{\sigma^{2}} + \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{2}\left(\frac{1}{\sigma^{2}}\right)^{2}, \quad \text{and} \quad \frac{\partial \ell(\theta|d_j)}{\partial s} = \frac{q}{s} + \sum_{p=1}^{q}\log\frac{y_p - L_p}{T_p - L_p}.$$
The second derivatives of the log-likelihood function $\ell(\theta|d_j)$ for the mean, variance, and parameter $s$, with mixed partials, are given as follows:
$$\frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} = -\frac{q}{\sigma^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} = -\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial s} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} = -\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} = \frac{q}{2}\left(\frac{1}{\sigma^{2}}\right)^{2} - \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial s} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial s\, \partial \mu} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial s\, \partial \sigma^{2}} = 0, \quad \text{and} \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial s^{2}} = -\frac{q}{s^{2}}.$$
Consequently, the observed Fisher's total information matrix $\hat{J}(\theta)$ can be expressed as
$$\hat{J}(\theta) = \begin{bmatrix} \dfrac{q}{\sigma^{2}} & \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & 0 \\[6pt] \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & -\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}} & 0 \\[6pt] 0 & 0 & \dfrac{q}{s^{2}} \end{bmatrix}.$$

Appendix A.2. Parameter t

General parameter $\theta$ can be determined as $\theta = (\mu\ \ \sigma^2\ \ t)^{T}$. Based on Equation (27), $\hat{J}(\theta)$ can be expressed as
$$\hat{J}(\theta) = -\begin{bmatrix} \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial t} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial t} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial t\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial t\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial t^{2}} \end{bmatrix}.$$
Based on the log-likelihood function in Equation (21), the first derivatives of the log-likelihood function $\ell(\theta|d_j)$ for the mean, variance, and parameter $t$ can be expressed as follows:
$$\frac{\partial \ell(\theta|d_j)}{\partial \mu} = \sum_{p=1}^{q}\frac{d_p - \mu}{\sigma^{2}}, \quad \frac{\partial \ell(\theta|d_j)}{\partial \sigma^{2}} = -\frac{q}{2}\,\frac{1}{\sigma^{2}} + \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{2}\left(\frac{1}{\sigma^{2}}\right)^{2}, \quad \text{and} \quad \frac{\partial \ell(\theta|d_j)}{\partial t} = \frac{q}{t} + \sum_{p=1}^{q}\log\frac{y_p - U_p}{T_p - U_p}.$$
The second derivatives of the log-likelihood function $\ell(\theta|d_j)$ for the mean, variance, and parameter $t$, with mixed partials, are given as follows:
$$\frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} = -\frac{q}{\sigma^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} = -\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial t} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} = -\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} = \frac{q}{2}\left(\frac{1}{\sigma^{2}}\right)^{2} - \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial t} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial t\, \partial \mu} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial t\, \partial \sigma^{2}} = 0, \quad \text{and} \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial t^{2}} = -\frac{q}{t^{2}}.$$
Therefore,
$$\hat{J}(\theta) = \begin{bmatrix} \dfrac{q}{\sigma^{2}} & \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & 0 \\[6pt] \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & -\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}} & 0 \\[6pt] 0 & 0 & \dfrac{q}{t^{2}} \end{bmatrix} = \begin{bmatrix} \hat{J}(\theta)_{11} & \hat{J}(\theta)_{12} & \hat{J}(\theta)_{13} \\ \hat{J}(\theta)_{21} & \hat{J}(\theta)_{22} & \hat{J}(\theta)_{23} \\ \hat{J}(\theta)_{31} & \hat{J}(\theta)_{32} & \hat{J}(\theta)_{33} \end{bmatrix}.$$
The square roots of the diagonal elements of $\hat{J}(\hat{\theta})^{-1}$ represent the standard errors of the estimators. An estimator for the standard error of $\hat{t}$ is of particular interest and can be obtained using the inverses of partitioned matrices. First, $\hat{J}(\theta)$ is partitioned as
$$\hat{J}(\theta) = \begin{bmatrix} \hat{J}(\theta)_{11} & \hat{J}(\theta)_{12} \\ \hat{J}(\theta)_{21} & \hat{J}(\theta)_{22} \end{bmatrix}.$$
Next, one estimator for $\operatorname{Var}(\hat{t})$ can be obtained as follows:
$$\widehat{\operatorname{Var}}(\hat{t}) = \left[\hat{J}(\theta)_{11}\hat{J}(\theta)_{22} - \hat{J}(\theta)_{21}\hat{J}(\theta)_{12}\right]^{-1} = \left[\frac{q}{\sigma^{2}}\left(-\frac{q}{2}\left(\frac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}\right]^{-1}.$$
The associated standard error of $\hat{t}$ can be defined as
$$\widehat{SE}(\hat{t}) = \frac{1}{\sqrt{\dfrac{q}{\sigma^{2}}\left(-\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}}}.$$

Appendix A.3. Parameter r

To estimate parameter $r$, the general parameter $\theta$ can be determined as $\theta = (\mu\ \ \sigma^2\ \ r)^{T}$.
From Equation (27),
$$\hat{J}(\theta) = -\begin{bmatrix} \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial r} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial r} \\[6pt] \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial r\, \partial \mu} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial r\, \partial \sigma^{2}} & \dfrac{\partial^{2} \ell(\theta|d_j)}{\partial r^{2}} \end{bmatrix}.$$
Based on the log-likelihood function in Equation (23), the first derivatives of the log-likelihood function $\ell(\theta|d_j)$ for the mean, variance, and parameter $r$ can be expressed as follows:
$$\frac{\partial \ell(\theta|d_j)}{\partial \mu} = \sum_{p=1}^{q}\frac{d_p - \mu}{\sigma^{2}}, \quad \frac{\partial \ell(\theta|d_j)}{\partial \sigma^{2}} = -\frac{q}{2}\,\frac{1}{\sigma^{2}} + \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{2}\left(\frac{1}{\sigma^{2}}\right)^{2}, \quad \text{and} \quad \frac{\partial \ell(\theta|d_j)}{\partial r} = \frac{q}{r} + \sum_{p=1}^{q}\log\frac{y_p - U_p}{L_p - U_p}.$$
The second derivatives of the log-likelihood function $\ell(\theta|d_j)$ for the mean, variance, and parameter $r$, with mixed partials, are given as follows:
$$\frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu^{2}} = -\frac{q}{\sigma^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial \sigma^{2}} = -\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \mu\, \partial r} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial \mu} = -\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial (\sigma^{2})^{2}} = \frac{q}{2}\left(\frac{1}{\sigma^{2}}\right)^{2} - \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial \sigma^{2}\, \partial r} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial r\, \partial \mu} = 0, \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial r\, \partial \sigma^{2}} = 0, \quad \text{and} \quad \frac{\partial^{2} \ell(\theta|d_j)}{\partial r^{2}} = -\frac{q}{r^{2}}.$$
Therefore,
$$\hat{J}(\theta) = \begin{bmatrix} \dfrac{q}{\sigma^{2}} & \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & 0 \\[6pt] \sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}} & -\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}} & 0 \\[6pt] 0 & 0 & \dfrac{q}{r^{2}} \end{bmatrix} = \begin{bmatrix} \hat{J}(\theta)_{11} & \hat{J}(\theta)_{12} & \hat{J}(\theta)_{13} \\ \hat{J}(\theta)_{21} & \hat{J}(\theta)_{22} & \hat{J}(\theta)_{23} \\ \hat{J}(\theta)_{31} & \hat{J}(\theta)_{32} & \hat{J}(\theta)_{33} \end{bmatrix}.$$
The square roots of the diagonal entries of $\hat{J}(\hat{\theta})^{-1}$ represent the standard errors of the estimators. An estimator for the standard error of $\hat{r}$ is of particular interest and can be obtained using the inverses of partitioned matrices. First, $\hat{J}(\theta)$ is partitioned as
$$\hat{J}(\theta) = \begin{bmatrix} \hat{J}(\theta)_{11} & \hat{J}(\theta)_{12} \\ \hat{J}(\theta)_{21} & \hat{J}(\theta)_{22} \end{bmatrix}.$$
Next, one estimator for $\operatorname{Var}(\hat{r})$ can be obtained as follows:
$$\widehat{\operatorname{Var}}(\hat{r}) = \left[\hat{J}(\theta)_{11}\hat{J}(\theta)_{22} - \hat{J}(\theta)_{21}\hat{J}(\theta)_{12}\right]^{-1} = \left[\frac{q}{\sigma^{2}}\left(-\frac{q}{2}\left(\frac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\frac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\frac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}\right]^{-1}.$$
Finally, the associated standard error of $\hat{r}$ can be defined as
$$\widehat{SE}(\hat{r}) = \frac{1}{\sqrt{\dfrac{q}{\sigma^{2}}\left(-\dfrac{q}{2}\left(\dfrac{1}{\sigma^{2}}\right)^{2} + \sum_{p=1}^{q}\dfrac{(d_p - \mu)^{2}}{(\sigma^{2})^{3}}\right) - \left(\sum_{p=1}^{q}\dfrac{d_p - \mu}{(\sigma^{2})^{2}}\right)^{2}}}.$$

References

  1. Taguchi, G. Introduction to Quality Engineering; UNIPUB/Kraus International: White Plains, NY, USA, 1986. [Google Scholar]
  2. Leon, R.V.; Shoemaker, A.C.; Kackar, R.N. Performance measures independent of adjustment: An explanation and extension of Taguchi signal-to-noise ratio. Technometrics 1987, 29, 253–285. [Google Scholar] [CrossRef]
  3. Box, G.E.P. Signal-to-noise ratios, performance criteria, and transformations. Technometrics 1988, 30, 1–17. [Google Scholar] [CrossRef]
  4. Box, G.; Bisgaard, S.; Fung, C. An explanation and critique of Taguchi’s contribution to quality engineering. Qual. Reliab. Eng. Int. 1988, 4, 123–131. [Google Scholar] [CrossRef]
  5. Nair, V.N. Taguchi’s parameter design: A panel discussion. Technometrics 1992, 34, 127–161. [Google Scholar] [CrossRef]
  6. Vining, G.G.; Myers, R.H. Combining Taguchi and response surface philosophies: A dual response approach. J. Qual. Technol. 1990, 22, 38–45. [Google Scholar] [CrossRef]
  7. Lin, D.K.J.; Tu, W. Dual response surface optimization. J. Qual. Technol. 1995, 27, 34–39. [Google Scholar] [CrossRef]
  8. Del Castillo, E.; Montgomery, D.C. A nonlinear programming solution to the dual response problem. J. Qual. Technol. 1993, 25, 199–204. [Google Scholar] [CrossRef]
  9. Copeland, K.A.F.; Nelson, P.R. Dual response optimization via direct function minimization. J. Qual. Technol. 1996, 28, 331–336. [Google Scholar] [CrossRef]
  10. Cho, B.R.; Philips, M.D.; Kapur, K.C. Quality improvement by RSM modeling for robust design. In Proceedings of the 5th Industrial Engineering Research Conference, Minneapolis, MN, USA, 18–20 May 1996; pp. 650–655. [Google Scholar]
  11. Ames, A.E.; Mattucci, N.; Macdonald, S.; Szonyi, G.; Hawkins, D.M. Quality loss functions for optimization across multiple response surfaces. J. Qual. Technol. 1997, 29, 339–346. [Google Scholar] [CrossRef]
  12. Borror, C.M. Mean and variance modeling with qualitative responses: A case study. Qual. Eng. 1998, 11, 141–148. [Google Scholar] [CrossRef]
  13. Kim, K.J.; Lin, D.K.J. Dual response surface optimization: A fuzzy modeling approach. J. Qual. Technol. 1998, 30, 1–10. [Google Scholar] [CrossRef]
  14. Koksoy, O.; Doganaksoy, N. Joint optimization of mean and standard deviation using response surface methods. J. Qual. Technol. 2003, 35, 239–252. [Google Scholar] [CrossRef]
  15. Ding, R.; Lin, D.K.J.; Wei, D. Dual response surface optimization: A weighted MSE approach. Qual. Eng. 2004, 16, 377–385. [Google Scholar] [CrossRef]
  16. Shin, S.; Cho, B.R. Bias-specified robust design optimization and an analytical solutions. Comput. Ind. Eng. 2005, 48, 129–148. [Google Scholar] [CrossRef]
  17. Shin, S.; Cho, B.R. Robust design models for customer specified bounds on process parameters. J. Syst. Sci. Syst. Eng. 2006, 15, 2–18. [Google Scholar] [CrossRef]
  18. Shin, S.; Cho, B.R. Studies on a bi-objective robust design optimization problem. IIE Trans. 2009, 41, 957–968. [Google Scholar] [CrossRef]
  19. Robinson, T.J.; Wulff, S.S.; Montgomery, D.S.; Khuri, A.I. Robust parameter design using generalized linear mixed models. J. Qual. Technol. 2006, 38, 65–75. [Google Scholar] [CrossRef]
  20. Fogliatto, S.F. Multiresponse optimization of products with functional quality characteristics. Qual. Reliab. Eng. Int. 2008, 24, 927–939. [Google Scholar] [CrossRef]
  21. Goethals, P.L.; Cho, B.R. The development of a robust design methodology for time-oriented dynamic quality characteristics with a target profile. Qual. Reliab. Eng. Int. 2010, 27, 403–414. [Google Scholar] [CrossRef]
  22. Truong, N.K.V.; Shin, S. Development of a new robust design method based on Bayesian perspectives. Int. J. Qual. Eng. Technol. 2012, 3, 50–78. [Google Scholar] [CrossRef]
  23. Truong, N.K.V.; Shin, S. A new robust design method from an inverse-problem perspective. Int. J. Qual. Eng. Technol. 2013, 3, 243–271. [Google Scholar] [CrossRef]
  24. Nha, V.T.; Shin, S.; Jeong, S.H. Lexicographical dynamic goal programming approach to a robust design optimization within the pharmaceutical environment. Eur. J. Oper. Res. 2013, 229, 505–517. [Google Scholar] [CrossRef]
  25. Baba, I.; Midi, H.; Rana, S.; Ibragimov, G. An alternative approach of dual response surface optimization based on penalty function method. Math. Probl. Eng. 2015, 2015, 450131. [Google Scholar] [CrossRef]
  26. Yanıkoğlu, İ.; den Hertog, D.; Kleijnen, J.P. Robust dual-response optimization. IIE Trans. 2016, 48, 298–312. [Google Scholar] [CrossRef]
  27. Myers, R.H.; Montgomery, D.C. Response Surface Methodology: Process and Product Optimization using Designed Experiments, 1st ed.; John Wiley & Sons: New York, NY, USA, 1995. [Google Scholar]
  28. Luner, J.J. Achieving continuous improvement with the dual response approach: A demonstration of the Roman catapult. Qual. Eng. 1994, 6, 691–705. [Google Scholar] [CrossRef]
  29. Cho, B.R.; Park, C.S. Robust design modeling and optimization with unbalanced data. Comput. Ind. Eng. 2005, 48, 173–180. [Google Scholar] [CrossRef]
  30. Lee, S.B.; Park, C.S. Development of robust design optimization using incomplete data. Comput. Ind. Eng. 2006, 50, 345–356. [Google Scholar] [CrossRef]
  31. Cho, B.R.; Choi, Y.; Shin, S. Development of censored data-based robust design for pharmaceutical quality by design. Int. J. Adv. Manuf. Technol. 2010, 49, 839–851. [Google Scholar] [CrossRef]
  32. Goegebeur, Y.; Goos, P.; Vandebroek, M. A hierarchical Bayesian approach to robust parameter design. SSRN 2007, 1–23. [Google Scholar] [CrossRef]
  33. Chen, Y.; Ye, K. Bayesian hierarchical modeling on dual response surfaces in partially replicated designs. Qual. Technol. Quant. Manag. 2009, 6, 371–389. [Google Scholar] [CrossRef]
  34. Chen, Y.; Ye, K. A Bayesian hierarchical approach to dual response surface modeling. J. Appl. Stat. 2011, 38, 1963–1975. [Google Scholar] [CrossRef]
  35. Irie, B.; Miyake, S. Capabilities of three-layered perceptrons. In Proceedings of the International Conference Neural Networks, San Diego, CA, USA, 24–27 July 1988; pp. 641–648. [Google Scholar]
  36. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  37. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
  38. Funahashi, K. On the approximate realization of continuous mappings by neural networks. Neural Netw. 1989, 2, 183–192. [Google Scholar] [CrossRef]
  39. Harrington, E.C. The desirability function. Ind. Qual. Control. 1965, 21, 494–498. [Google Scholar]
  40. Derringer, G.; Suich, R. Simultaneous optimization of several response variables. J. Qual. Technol. 1980, 12, 214–219. [Google Scholar] [CrossRef]
  41. Rowlands, H.; Packianather, M.S.; Oztemel, E. Using artificial neural networks for experimental design in off-line quality. J. Syst. Eng. 1996, 6, 46–59. [Google Scholar]
  42. Su, C.T.; Hsieh, K.L. Applying neural network approach to achieve robust design for dynamic quality characteristics. Int. J. Qual. Reliab. Manag. 1998, 15, 509–519. [Google Scholar] [CrossRef]
  43. Cook, D.F.; Ragsdale, C.T.; Major, R.L. Combining a neural network with a genetic algorithm for process parameter optimization. Eng. Appl. Artif. Intell. 2000, 13, 391–396. [Google Scholar] [CrossRef]
  44. Chow, T.T.; Zhang, G.Q.; Lin, Z.; Song, C.L. Global optimization of absorption chiller system by genetic algorithm and neural network. Energy Build. 2002, 34, 103–109. [Google Scholar] [CrossRef]
  45. Chang, H.H. Applications of neural networks and genetic algorithms to Taguchi’s robust design. Int. J. Electron. Commer. 2005, 3, 90–96. [Google Scholar]
  46. Chang, H.H.; Chen, Y.K. Neuro-genetic approach to optimize parameter design of dynamic multiresponse experiments. Appl. Soft. Comput. 2011, 11, 436–442. [Google Scholar] [CrossRef]
  47. Arungpadang, R.T.; Kim, J.Y. Robust parameter design based on back propagation neural network. Korean Manag. Sci. Rev. 2012, 29, 81–89. [Google Scholar] [CrossRef]
  48. Box, G.E.P.; Wilson, K.B. On the experimental attainment of optimum conditions (with discussion). J. R. Stat. Soc. B 1951, 13, 270–310. [Google Scholar]
  49. Myers, R.H. Response surface methodology—Current status and future directions. J. Qual. Technol. 1999, 31, 54–57. [Google Scholar] [CrossRef]
  50. Khuri, A.I.; Mukhopadhyay, S. Response surface methodology. Wiley Interdiscip. Rev. Comput Stat. 2010, 2, 128–149. [Google Scholar] [CrossRef]
  51. Sarle, W.S. Neural networks and statistical models. In Proceeding of the 19th Annual SAS User Group International Conference, Dallas, TX, USA, 10–13 April 1994; pp. 1–13. [Google Scholar]
  52. Zainuddin, Z.; Pauline, O. Function approximation using artificial neural networks. WSEAS Trans. Math. 2008, 7, 333–338. [Google Scholar]
  53. Hartman, E.J.; Keeler, J.D.; Kowalski, J.M. Layered neural networks with Gaussian hidden units as universal approximations. Neural Comput 1990, 2, 210–215. [Google Scholar] [CrossRef]
  54. Zilouchian, A.; Jamshidi, M. Intelligent Control. Systems Using Soft Computing Methodologies; CRC Press: Boca Raton, FL, USA, 2000. [Google Scholar]
  55. Lindsey, J.K. Parametric Statistical Inference; Oxford University Press: New York, NY, USA, 1996. [Google Scholar]
Figure 1. Overview of the proposed NN-based DR function estimation method.
Figure 2. Proposed feed-forward NN-based RD estimation method.
Figure 3. Integration of DFs as transfer functions in hidden layers: (a) "nominal is best" transfer function and (b) "smaller is better" transfer function.
Figure 4. Estimated process mean and standard deviation response functions from the conventional LSM approach based on RSM: (a) mean ($R^2 = 92.68\%$); (b) standard deviation ($R^2 = 45.42\%$).
Figure 5. Estimated process mean and standard deviation response functions from the feed-forward NN approach with the conventional log-sigmoid transfer function: (a) mean ($R^2 = 93.44\%$); (b) standard deviation ($R^2 = 51.48\%$).
Figure 6. Estimated process mean and standard deviation response functions from the feed-forward NN approach with the proposed transfer functions: (a) mean ($R^2 = 67.17\%$); (b) standard deviation ($R^2 = 53.49\%$).
Figure 7. Criterion spaces of the three estimated functions: (a) LSM based on RSM; (b) log-sigmoid transfer function-based NN; (c) proposed NN-based DR method.
Table 1. Feed-forward NN information used to estimate the DR functions.

| Model | Response | Transfer Functions | Training Function | Architecture | Number of Epochs |
|---|---|---|---|---|---|
| Conventional | Mean | Logsig-Purelin | Trainlm | 3-20-1 | 5 |
| Conventional | Standard deviation | Logsig-Purelin | Trainrp | 3-20-1 | 83 |
| Proposed | Mean | "nominal is best"-Purelin | Trainlm | 3-20-1 | 5 |
| Proposed | Standard deviation | "smaller is better"-Purelin | Trainrp | 3-20-1 | 83 |
Table 2. Weight and bias values from the NN approach using the conventional log-sigmoid transfer function: (a) estimated process mean; (b) estimated process standard deviation.

| $W_{ij}^{mean}$ (a) | | | $V_j^{mean}$ | $a_j^{mean}$ | $b^{mean\_NN}$ | $W_{ij}^{std}$ (b) | | | $V_j^{std}$ | $a_j^{std}$ | $b^{std\_NN}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7.048 | 3.571 | -1.123 | 0.134 | -7.259 | -0.343 | 6.693 | 3.771 | -0.855 | 0.0516 | -7.531 | -0.456 |
| 4.726 | -5.710 | -1.371 | -0.212 | -6.831 | | 4.928 | -5.169 | -1.884 | -0.038 | -7.045 | |
| -4.981 | 4.672 | 3.442 | -0.789 | 6.003 | | -4.907 | 4.677 | 3.435 | -0.147 | 5.755 | |
| 4.459 | 3.990 | 3.498 | 1.079 | -6.208 | | 4.251 | 4.486 | 3.809 | 0.467 | -5.651 | |
| 2.679 | 3.682 | -6.266 | -0.199 | -4.201 | | 2.864 | 3.086 | -5.745 | 0.330 | -4.851 | |
| -6.436 | 4.389 | 0.793 | 0.209 | 3.175 | | -6.849 | 3.647 | 0.288 | 0.467 | 3.440 | |
| -4.908 | 5.623 | -1.339 | 0.124 | 2.863 | | -5.123 | 5.469 | -1.166 | -0.356 | 2.874 | |
| 1.562 | -4.993 | 5.438 | 0.190 | -2.048 | | 1.447 | -4.815 | 5.474 | -0.271 | -2.451 | |
| 6.692 | 2.115 | 3.035 | -0.139 | -0.859 | | 6.152 | 2.694 | 2.570 | -0.250 | -1.183 | |
| 5.521 | -4.127 | 3.079 | 0.099 | -0.243 | | 6.115 | -3.555 | 2.652 | -0.034 | 0.050 | |
| -5.977 | 3.126 | -3.407 | 0.325 | -0.669 | | -6.131 | 2.967 | -3.265 | 0.230 | -0.850 | |
| 5.256 | -5.103 | 1.936 | 0.633 | 1.256 | | 5.651 | -4.723 | 1.535 | -0.040 | 1.651 | |
| 6.483 | -3.230 | 2.033 | -0.294 | 2.105 | | 6.082 | -3.433 | 2.667 | 0.697 | 1.926 | |
| -0.589 | -5.911 | -4.442 | -0.462 | -3.229 | | -0.441 | -5.646 | -4.983 | -0.529 | -2.783 | |
| 3.818 | -4.951 | -4.368 | -0.443 | 3.550 | | 4.070 | -5.306 | -4.141 | 0.407 | 3.526 | |
| -6.018 | 4.983 | -0.135 | -0.385 | -4.140 | | -6.091 | 4.643 | -0.476 | -0.545 | -4.469 | |
| -1.158 | 2.826 | 6.948 | -0.294 | -5.321 | | -1.107 | 3.380 | 6.462 | -0.155 | -5.651 | |
| 6.528 | -3.113 | -2.385 | 0.419 | 6.011 | | 6.112 | -2.436 | -2.970 | -0.514 | 6.074 | |
| 4.157 | 6.574 | 1.314 | 0.425 | 6.397 | | 4.160 | 5.845 | 1.643 | 0.072 | 7.251 | |
| 4.931 | -5.012 | -2.952 | -0.233 | 7.576 | | 4.467 | -4.909 | -3.407 | -0.070 | 7.674 | |
Table 3. Estimated weight parameters and associated confidence intervals.

| Hidden Neuron | $\hat{s}$ | $\hat{s} - z^{*}\widehat{SE}(\hat{s})$ | $\hat{s} + z^{*}\widehat{SE}(\hat{s})$ | $\hat{t}$ | $\hat{t} - z^{*}\widehat{SE}(\hat{t})$ | $\hat{t} + z^{*}\widehat{SE}(\hat{t})$ | $\hat{r}$ | $\hat{r} - z^{*}\widehat{SE}(\hat{r})$ | $\hat{r} + z^{*}\widehat{SE}(\hat{r})$ |
|---|---|---|---|---|---|---|---|---|---|
| 1st | 1.150 | 1.024 | 1.276 | 0.100 | -0.007 | 0.207 | 0.180 | 0.118 | 0.242 |
| 2nd | 0.100 | -0.006 | 0.206 | 0.180 | 0.095 | 0.265 | 0.160 | 0.128 | 0.192 |
| 3rd | 0.120 | 0.008 | 0.232 | 0.100 | 0.019 | 0.181 | 0.190 | 0.139 | 0.241 |
| 4th | 0.100 | -0.022 | 0.222 | 0.150 | 0.056 | 0.244 | 0.910 | 0.777 | 1.043 |
| 5th | 0.100 | -0.038 | 0.238 | 0.110 | 0.017 | 0.203 | 0.170 | 0.070 | 0.270 |
| 6th | 0.100 | -0.056 | 0.256 | 0.650 | 0.469 | 0.831 | 0.310 | 0.156 | 0.464 |
| 7th | 0.100 | -0.076 | 0.276 | 0.100 | -0.056 | 0.256 | 0.120 | -0.236 | 0.476 |
| 8th | 0.100 | -0.108 | 0.308 | 0.140 | 0.005 | 0.275 | 0.100 | -0.115 | 0.315 |
| 9th | 0.100 | -0.187 | 0.387 | 0.150 | 0.023 | 0.277 | 0.160 | 0.029 | 0.291 |
| 10th | 0.820 | 0.667 | 0.973 | 0.370 | 0.245 | 0.495 | 0.110 | 0.012 | 0.208 |
| 11th | 0.110 | -0.063 | 0.283 | 1.050 | 0.852 | 1.248 | 0.100 | 0.029 | 0.171 |
| 12th | 0.130 | -0.073 | 0.333 | 0.100 | -0.137 | 0.337 | 0.150 | 0.095 | 0.205 |
| 13th | 0.100 | -0.123 | 0.323 | 0.110 | -0.173 | 0.393 | 2.370 | 2.249 | 2.491 |
| 14th | 0.120 | -0.139 | 0.379 | 0.120 | -0.283 | 0.523 | 0.390 | 0.267 | 0.513 |
| 15th | 0.310 | 0.019 | 0.601 | 0.190 | -0.424 | 0.804 | 0.150 | 0.013 | 0.287 |
| 16th | 0.480 | 0.313 | 0.647 | 0.100 | -0.324 | 0.524 | 0.230 | 0.071 | 0.389 |
| 17th | 0.110 | -0.069 | 0.289 | 3.200 | 3.085 | 3.315 | 0.120 | -0.079 | 0.319 |
| 18th | 0.240 | 0.042 | 0.438 | 0.100 | -0.017 | 0.217 | 0.100 | -0.261 | 0.461 |
| 19th | 0.180 | -0.046 | 0.406 | 0.120 | 0.001 | 0.239 | 0.100 | -0.167 | 0.367 |
| 20th | 0.100 | -0.154 | 0.354 | 0.100 | -0.021 | 0.221 | 0.400 | 0.204 | 0.596 |
Table 4. Weight and bias values from the NN-based DR estimation method: (a) estimated mean response; (b) estimated standard deviation response.

| $W_{ij}^{mean\_pro}$ (a) | | | $V_j^{mean\_pro}$ | $a_j^{mean\_pro}$ | $b^{mean\_pro}$ | $W_{ij}^{std\_pro}$ (b) | | | $V_j^{std\_pro}$ | $a_j^{std\_pro}$ | $b^{std\_pro}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7.048 | 3.571 | -1.123 | 0.134 | -7.259 | -0.343 | 6.693 | 3.771 | -0.855 | 0.0516 | -7.531 | -0.456 |
| 4.726 | -5.710 | -1.371 | -0.212 | -6.831 | | 4.928 | -5.169 | -1.884 | -0.038 | -7.045 | |
| -4.981 | 4.672 | 3.442 | -0.789 | 6.003 | | -4.907 | 4.677 | 3.435 | -0.147 | 5.755 | |
| 4.459 | 3.990 | 3.498 | 1.079 | -6.208 | | 4.251 | 4.486 | 3.809 | 0.467 | -5.651 | |
| 2.679 | 3.682 | -6.266 | -0.199 | -4.201 | | 2.864 | 3.086 | -5.745 | 0.330 | -4.851 | |
| -6.436 | 4.389 | 0.793 | 0.209 | 3.175 | | -6.849 | 3.647 | 0.288 | 0.467 | 3.440 | |
| -4.908 | 5.623 | -1.339 | 0.124 | 2.863 | | -5.123 | 5.469 | -1.166 | -0.356 | 2.874 | |
| 1.562 | -4.993 | 5.438 | 0.190 | -2.048 | | 1.447 | -4.815 | 5.474 | -0.271 | -2.451 | |
| 6.692 | 2.115 | 3.035 | -0.139 | -0.859 | | 6.152 | 2.694 | 2.570 | -0.250 | -1.183 | |
| 5.521 | -4.127 | 3.079 | 0.099 | -0.243 | | 6.115 | -3.555 | 2.652 | -0.034 | 0.050 | |
| -5.977 | 3.126 | -3.407 | 0.325 | -0.669 | | -6.131 | 2.967 | -3.265 | 0.230 | -0.850 | |
| 5.256 | -5.103 | 1.936 | 0.633 | 1.256 | | 5.651 | -4.723 | 1.535 | -0.040 | 1.651 | |
| 6.483 | -3.230 | 2.033 | -0.294 | 2.105 | | 6.082 | -3.433 | 2.667 | 0.697 | 1.926 | |
| -0.589 | -5.911 | -4.442 | -0.462 | -3.229 | | -0.441 | -5.646 | -4.983 | -0.529 | -2.783 | |
| 3.818 | -4.951 | -4.368 | -0.443 | 3.550 | | 4.070 | -5.306 | -4.141 | 0.407 | 3.526 | |
| -6.018 | 4.983 | -0.135 | -0.385 | -4.140 | | -6.091 | 4.643 | -0.476 | -0.545 | -4.469 | |
| -1.158 | 2.826 | 6.948 | -0.294 | -5.321 | | -1.107 | 3.380 | 6.462 | -0.155 | -5.651 | |
| 6.528 | -3.113 | -2.385 | 0.419 | 6.011 | | 6.112 | -2.436 | -2.970 | -0.514 | 6.074 | |
| 4.157 | 6.574 | 1.314 | 0.425 | 6.397 | | 4.160 | 5.845 | 1.643 | 0.072 | 7.251 | |
| 4.931 | -5.012 | -2.952 | -0.233 | 7.576 | | 4.467 | -4.909 | -3.407 | -0.070 | 7.674 | |
Table 5. Comparative results of various methods.

| Estimation Model | $x_1$ | $x_2$ | $x_3$ | Process Mean | Process Bias | Process Variance | EQL |
|---|---|---|---|---|---|---|---|
| LSM | 1.000 | 0.072 | -0.250 | 494.672 | 5.328 | 1977.533 | 2005.917 |
| Feed-forward NN | -0.067 | 0.170 | 1.000 | 499.321 | 0.679 | 397.507 | 397.967 |
| Proposed method | 0.133 | -0.014 | 0.096 | 499.885 | 0.115 | 103.900 | 103.913 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
