Article

Hybrid Fuzzy Regression Analysis Using the F-Transform

1
Department of Applied Mathematics, Hanyang University, Gyeonggi-do 15588, Korea
2
Department of Mathematics, Yonsei University 50 Yonsei-Ro, Seoul 03722, Korea
3
School of Liberal Arts and Science, Korea Aerospace University, Gyeonggi-do 10540, Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(19), 6726; https://doi.org/10.3390/app10196726
Submission received: 28 August 2020 / Revised: 17 September 2020 / Accepted: 22 September 2020 / Published: 25 September 2020
(This article belongs to the Special Issue Selected Papers from The Conference ISIS & ICBAK 2019)

Abstract

This paper proposes a hybrid estimation algorithm that independently estimates the response function for the center and the response function for the spread in a fuzzy regression model. The proposed algorithm combines least absolute deviations estimation with discriminant analysis. In addition, the F-transform is used to convert the spreads of the dependent variable into several groups. Two examples show that our method is superior to existing methods based on a fuzzy regression model that assumes the same function for the spread and the center.

1. Introduction

Regression analysis is one of the most useful and popular sub-parts of supervised learning, and success stories have been reported in many areas. Regression analysis assumes that the relevant experimental data are accurate, and numerous statistical treatments have been proposed to model accurately defined data [1]. However, there are cases in which this assumption is unrealistic, because data used in regression analysis consist of observations that might be imprecise such as “about 50 years old” or “almost fully cured”.
In regression analysis, we sometimes encounter imprecise data with non-sharp boundaries, linguistic data, and incomplete information. The fuzzy set theory introduced by Zadeh [2] has been used as a means to handle such data. A fuzzy regression model is needed to estimate the statistical relationship among variables when the independent variables, dependent variables, or regression coefficients are expressed as fuzzy sets. Since the fuzzy regression model was introduced by Tanaka [3], numerous modified or extended fuzzy regression models have been proposed, and many successful applications have been presented [2,3,4,5,6,7,8,9,10,11,12,13,14,15]. One goal of fuzzy regression analysis is to discover statistical relationships between given fuzzy values. The general approach, as discussed by many authors, is to induce this relationship from the center and spread of the fuzzy values using the same function.
In the fuzzy regression model, however, the response functions for the center and the spread may not be the same. Choi and Yoon [6] described a fuzzy regression model whose response functions for the center and the spread are an exponential function and a linear function, respectively. If the response functions for the center and the spread do not match, it may be more effective to assume them in independent forms and to estimate each one separately. In addition, in fuzzy regression models, the observed centers or spreads of the response often take repeated values. If survey results or product quality ratings are expressed as fuzzy numbers, the spread of the response may repeat a finite set of values, as in the examples presented by D’Urso [8] and Yoon et al. [14,15]. In this case, it may be more efficient to use regression models for categorical data instead of continuous data. In ordinary regression analysis, categorical regression analysis, logistic regression analysis, or discriminant analysis is used to analyze a categorical response variable. If the spreads of the fuzzy data repeat the same values while the centers vary, it may be more efficient to estimate the fuzzy regression model by independently estimating the spread and the center instead of applying the same function to both. This is our motivation. The proposed hybrid estimation algorithm is a modified and extended version of the algorithm introduced in Jung et al. [16].
We propose an algorithm that constructs a fuzzy regression model when the response functions for the spread and the center of the dependent variable do not match. To estimate a regression model for the center of the dependent variable, which is continuous, we use the least absolute deviations (LAD) estimation method, which is not sensitive to outliers. In addition, discriminant analysis is used to estimate the response function for the spread of the dependent variable, which takes only a finite number of repeated values. The F-transform introduced by Perfilieva [17] has been studied and found useful in many applications [18,19,20,21,22]. The F-transform converts the original data into weighted mean values, where the weights are given by basic functions, that is, membership functions that identify fuzzy sets. In this paper, we use the F-transform to categorize the spreads of the dependent variable. To predict the dependent variable of the fuzzy regression model, we propose a hybrid algorithm that combines LAD estimation with discriminant analysis and the F-transform. In fuzzy regression analysis, we expect the proposed hybrid algorithm to reduce the prediction error when the response functions for the spread and the center of the dependent variable do not match.
This paper is organized as follows: Section 2 presents preliminary concepts required to develop the main results. Section 3 proposes a hybrid estimation algorithm combining the LAD estimation method and discriminant analysis. Section 4 gives two numerical examples to illustrate our results and compare them with existing methods. Section 5 concludes the paper.

2. Preliminaries

Fuzzy set theory was introduced to provide a suitable concept for dealing with inaccurate data [2]. Following [3,4], we introduce some definitions required to develop the main results such as the fuzzy sets and the fuzzy numbers.
A fuzzy set $A$ is a set of ordered pairs
$$A = \{(x, \mu_A(x)) : x \in X\},$$
where $\mu_A : X \to [0, 1]$ is the membership function of $A$.
The support of $A$ defined on $\mathbb{R}$ is the crisp set
$$\operatorname{supp} A = \{x \mid x \in \mathbb{R},\ \mu_A(x) > 0\}.$$
A fuzzy number $A$ is a normal and convex fuzzy subset of the real line $\mathbb{R}$ with bounded support. As a special case, a fuzzy number $A$, denoted by $A = (m_A, s_{l_A}, s_{r_A})_{LR}$, is said to be an $LR$-fuzzy number if its membership function is given by
$$\mu_A(x) = \begin{cases} L\left(\dfrac{m_A - x}{s_{l_A}}\right) & \text{for } 0 \le m_A - x \le s_{l_A}, \\ R\left(\dfrac{x - m_A}{s_{r_A}}\right) & \text{for } 0 \le x - m_A \le s_{r_A}, \\ 0 & \text{otherwise}, \end{cases}$$
where $m_A$ is the center, and $s_{l_A}$ and $s_{r_A}$ are the left spread and the right spread. $L$ and $R$ are functions verifying the properties of the class of fuzzy sets such that $L(0) = R(0) = 1$ and $L(x) = R(x) = 0$ for $x \in \mathbb{R} \setminus [0, 1)$. In particular, if $L(x) = R(x) = 1 - x$ in $A = (m_A, s_{l_A}, s_{r_A})_{LR}$, then $A$ is called a triangular fuzzy number and is denoted by $A = (m_A, s_{l_A}, s_{r_A})_T$.
For any $\alpha$ in $[0, 1]$, the $\alpha$-level set of a fuzzy set $A$ is the crisp set $A_\alpha = \{x \in \mathbb{R} : \mu_A(x) \ge \alpha\}$, which contains all the elements of $X$ whose membership value in $A$ is greater than or equal to $\alpha$.
The F-transform (fuzzy transform) was introduced by Perfilieva [17] in 2001. Here, some basic concepts from [18] are introduced.
Definition 1.
Let $c_1 < \cdots < c_k$ be fixed nodes within $[a, b]$, such that $c_1 = a$, $c_k = b$ and $k \ge 2$. We say that fuzzy sets $A_1, \dots, A_k$, identified with their membership functions $A_1(x), \dots, A_k(x)$ defined on $[a, b]$, constitute a fuzzy partition of $[a, b]$ if they fulfill the following conditions for $j = 1, \dots, k$:
(1) 
$A_j : [a, b] \to [0, 1]$, $A_j(c_j) = 1$;
(2) 
$A_j(x) = 0$ if $x \notin (c_{j-1}, c_{j+1})$, where, for uniformity of notation, we put $c_0 = a$, $c_{k+1} = b$;
(3) 
$A_j(x)$ is continuous;
(4) 
$A_j(x)$ strictly increases on $[c_{j-1}, c_j]$ ($j = 2, \dots, k$) and $A_j(x)$ strictly decreases on $[c_j, c_{j+1}]$ ($j = 1, \dots, k-1$);
(5) 
For all $x \in [a, b]$,
$$\sum_{j=1}^{k} A_j(x) = 1.$$
The membership functions $A_1(x), \dots, A_k(x)$ are called symmetric basic functions. An example of fuzzy sets $A_1, \dots, A_k$ with symmetric triangular membership functions on the interval $[a, b]$ is given below:
$$A_j(x) = \begin{cases} 1 - \left| \dfrac{x - c_j}{h_j} \right| & x \in [c_{j-1}, c_{j+1}], \\ 0 & \text{otherwise}, \end{cases}$$
where $h_j = c_j - c_{j-1}$ on $[c_{j-1}, c_j]$ ($j = 2, \dots, k$) and $h_j = c_{j+1} - c_j$ on $[c_j, c_{j+1}]$ ($j = 1, \dots, k-1$). Here, $A_j(x)$ strictly increases on $[c_{j-1}, c_j]$ ($j = 2, \dots, k$) and strictly decreases on $[c_j, c_{j+1}]$ ($j = 1, \dots, k-1$), so the above conditions are satisfied.
Definition 2.
Let a discrete function $f : X \to \mathbb{R}$ be given at a finite set of points $X = \{x_t : t = 1, \dots, n\} \subseteq [a, b]$. The F-transform of the discrete function $f$ with respect to $A_1, \dots, A_k$ defines the numerical vector $F_k[f] = [F_1, F_2, \dots, F_k]$, where each $F_j$ is given by
$$F_j = \frac{\sum_{t=1}^{n} f(x_t) A_j(x_t)}{\sum_{t=1}^{n} A_j(x_t)}, \quad j = 1, \dots, k.$$
The $F_j$ are weighted mean values of $f$, where the weights are determined by the membership values. The $F_j$ are called the components of the discrete F-transform.
Definition 3.
Let $F_k[f] = [F_1, F_2, \dots, F_k]$ be the F-transform of $f$ with respect to $A_1, \dots, A_k$. Then the function
$$f_{F,k}(x_t) = \frac{\sum_{j=1}^{k} F_j A_j(x_t)}{\sum_{j=1}^{k} A_j(x_t)}, \quad t = 1, \dots, n,$$
is called the inverse F-transform of $f$.
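Definitions 1 to 3 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names (`triangular_partition`, `f_transform`, `inverse_f_transform`), the three nodes, and the toy function $f(x) = x^2$ are hypothetical choices, not taken from the paper.

```python
def triangular_partition(nodes):
    """Build triangular basic functions A_1..A_k on strictly increasing nodes."""
    k = len(nodes)

    def make_basic(j):
        c = nodes[j]
        left = nodes[j - 1] if j > 0 else None        # A_1 has no rising edge
        right = nodes[j + 1] if j < k - 1 else None   # A_k has no falling edge

        def basic(x):
            if x == c:
                return 1.0
            if left is not None and left < x < c:
                return (x - left) / (c - left)        # rising edge
            if right is not None and c < x < right:
                return (right - x) / (right - c)      # falling edge
            return 0.0

        return basic

    return [make_basic(j) for j in range(k)]


def f_transform(xs, ys, basics):
    """Discrete F-transform: membership-weighted means of ys (Definition 2)."""
    components = []
    for basic in basics:
        weights = [basic(x) for x in xs]
        total = sum(weights)
        if total == 0:
            raise ValueError("a basic function covers no data point")
        components.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return components


def inverse_f_transform(x, components, basics):
    """Inverse F-transform evaluated at x (Definition 3)."""
    weights = [basic(x) for basic in basics]
    return sum(w * F for w, F in zip(weights, components)) / sum(weights)


# Toy data (assumed for illustration): f(x) = x^2 sampled on [0, 1].
nodes = [0.0, 0.5, 1.0]
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [x * x for x in xs]
basics = triangular_partition(nodes)
components = f_transform(xs, ys, basics)
approx = [inverse_f_transform(x, components, basics) for x in xs]
```

With these nodes the basic functions sum to one at every sample point, so the denominator of the inverse F-transform equals one and `approx` is a piecewise-linear smoothing of `ys`.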

3. Fuzzy Regression Based on the F-Transform and Discriminant Analysis

The general fuzzy linear regression model proposed by Tanaka [3], which has been applied in many fields, is formulated as follows:
$$Y(X_i) = F(A, X_i) + E_i, \quad i = 1, \dots, n,$$
where $X_i = (X_{ij})_{1 \times (p+1)}$ is the fuzzy input, $A = (A_j)_{1 \times (p+1)}$ is the vector of fuzzy coefficients, $F(A, X_i)$ is the known response function, $Y(X_i)$ is the fuzzy output, and $E_i$ is the fuzzy error.
The $\alpha$-level set of the general fuzzy regression model is given as follows:
$$Y(X_i)_\alpha = F(A_\alpha, X_{i\alpha}) + E_{i\alpha}. \qquad (*)$$
The center of the above model (*) is represented as follows:
$$m_{Y_i} = f(a_0, a_1, \dots, a_p, x_{i0}, x_{i1}, \dots, x_{ip}) + e_i.$$
The models for the left and right spreads of the model (*) are as follows:
$$(s_{l_{Y_i}})_\alpha = f_l\big((s_{l_A})_\alpha, (s_{l_{X_i}})_\alpha\big) + (s_{l_{E_i}})_\alpha \quad \text{and} \quad (s_{r_{Y_i}})_\alpha = f_r\big((s_{r_A})_\alpha, (s_{r_{X_i}})_\alpha\big) + (s_{r_{E_i}})_\alpha.$$
In particular, if the response function is linear, the fuzzy regression model is formulated as follows:
$$Y_i = A_0 + A_1 X_{i1} + \cdots + A_p X_{ip} + E_i, \quad i = 1, \dots, n,$$
where $Y_i = (m_{Y_i}, s_{l_{Y_i}}, s_{r_{Y_i}})_T$, $A_i = (m_{A_i}, s_{l_{A_i}}, s_{r_{A_i}})_T$ and $X_{ik} = (m_{X_{ik}}, s_{l_{X_{ik}}}, s_{r_{X_{ik}}})_T$ are fuzzy numbers.
We estimate the $\alpha$-level set of the proposed fuzzy regression model using the least squares method, which minimizes the sum of squared residuals
$$\sum_{i=1}^{n} \left[ (s_{l_{Y_i}})_{\alpha^*} - f_l\big((s_{l_A})_{\alpha^*}, (s_{l_{X_i}})_{\alpha^*}\big) \right]^2.$$
In fuzzy regression analysis, many researchers have used the above least squares method. However, there are two points to consider in a general fuzzy regression model. The first is that the response functions may differ: the response function for the left spread $f_l(\cdot)$ and the response function for the right spread $f_r(\cdot)$ need not be the same, and Choi and Yoon [6] proposed a general fuzzy regression model whose response function for the center is exponential while that for the spread is linear. The second is that the number of distinct spread or center values of the dependent variable may be much smaller than the number of samples.
In fuzzy regression analysis, if the response functions of the spread and the center do not match, it may be more efficient to construct the fuzzy regression model by independently estimating the spread and the center instead of applying the same function to both. In addition, if the spreads of the dependent and independent variables take only a few repeated values, as in Table 1 presented by D’Urso [8], it may be more efficient to use statistical methods for categorical data than statistical methods for continuous variables. In that table, the number of samples is 30, but the spread of the response variable takes only 4 values.
In general, the spreads of the fuzzy data in a fuzzy sample vary greatly. However, the spreads of fuzzy data obtained from product quality ratings, preferences, or surveys consist of repeating numbers. If the spread repeats the same values while the center changes, it may be more efficient to estimate the fuzzy regression model by estimating the spread and the center separately.
Especially for fuzzy regression models, it may be more efficient to estimate the spread using categorical data analysis methods when the number of distinct spreads of the dependent variable is small because values repeat.
Discriminant analysis [23,24] is a technique used to predict the probability of belonging to a given category based on one or more independent variables when the dependent variable is categorical and the independent variables are interval in nature. A categorical variable here means that the dependent variable is divided into several categories. In the fuzzy regression model, if the number of distinct spreads of a given dependent variable is small because values repeat, we convert the dependent variable to a categorical variable.
To do this, we use the F-transform. In this paper, we propose the hybrid estimation algorithm to predict the dependent variable of the fuzzy regression with spreads which are represented by only a few numbers due to repetition.
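To make the discriminant-analysis step concrete, the following Python sketch implements Fisher's linear discriminant for the simplest case of two groups and two predictors. It is a simplified illustration, not the multi-group classifier used in the paper: the function names `fisher_lda_2d` and `classify`, the midpoint threshold, and the toy points are all assumptions made for this example.

```python
def fisher_lda_2d(class0, class1):
    """Two-class Fisher discriminant for 2-D points.

    Returns the direction w = S_w^{-1} (mu1 - mu0) and a threshold placed
    at the midpoint of the projected class means (an assumed decision rule).
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, mu):
        sxx = sum((p[0] - mu[0]) ** 2 for p in points)
        sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
        syy = sum((p[1] - mu[1]) ** 2 for p in points)
        return sxx, sxy, syy

    mu0, mu1 = mean(class0), mean(class1)
    a0, b0, d0 = scatter(class0, mu0)
    a1, b1, d1 = scatter(class1, mu1)
    # Pooled within-class scatter matrix [[a, b], [b, d]].
    a, b, d = a0 + a1, b0 + b1, d0 + d1
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (mu1[0] - mu0[0], mu1[1] - mu0[1])
    # Apply the closed-form inverse of the 2x2 scatter matrix to diff.
    w = ((d * diff[0] - b * diff[1]) / det, (a * diff[1] - b * diff[0]) / det)
    threshold = 0.5 * (w[0] * (mu0[0] + mu1[0]) + w[1] * (mu0[1] + mu1[1]))
    return w, threshold


def classify(point, w, threshold):
    """Assign group 1 if the discriminant score exceeds the threshold, else group 0."""
    score = w[0] * point[0] + w[1] * point[1]
    return 1 if score > threshold else 0


# Toy data (assumed): two well-separated groups.
class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, threshold = fisher_lda_2d(class0, class1)
labels = [classify(p, w, threshold) for p in [(3, 3), (6, 6)]]
```

In the setting of this paper the groups would be the spread categories produced by the fuzzy partition, and the predictors would be the independent variables; with more than two groups or predictors a general linear algebra routine would be needed.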

The Proposed Hybrid Estimation Algorithm

Step 1.
Estimate the center $m_{Y_i}$ of the dependent variable $Y_i$.
Using the set of centers $\{(m_{X_{i1}}, \dots, m_{X_{ip}}, m_{Y_i}) : i = 1, \dots, n\}$, obtain the predicted center $m_{\hat{Y}_i}$ of the dependent variable by minimizing the following objective function:
$$\sum_{i=1}^{n} \left| m_{Y_i} - m_{A_0} - m_{A_1} m_{X_{i1}} - \cdots - m_{A_p} m_{X_{ip}} \right|.$$
Step 2.
Estimate the left spread of the dependent variable using the F-transform and Fisher’s linear discriminant analysis.
1.
Define the universe of discourse $S_l$: Let $S_{l_Y} = \{s_{l_{Y_i}} : i = 1, 2, \dots, n\}$ be the set of left spreads of the dependent variable, and let $s_{l_Y}^{(1)}$ and $s_{l_Y}^{(n)}$ be the minimum and the maximum value of $S_{l_Y}$. Then the universe of discourse is defined by $S_l = [\max\{0, s_{l_Y}^{(1)} - c_1\},\ s_{l_Y}^{(n)} + c_2]$, where $c_1$ and $c_2$ are two suitable positive real numbers. The values $c_1$ and $c_2$ are predefined by the researcher or can be treated as tuning parameters.
2.
Define the basic functions on $S_l$ and obtain the F-transform: The F-transform $F_k[S_{l_Y}] = [F_1^l, F_2^l, \dots, F_k^l]$ is obtained by
$$F_j^l = \frac{\sum_{i=1}^{n} s_{l_{Y_i}} A_j(s_{l_{Y_i}})}{\sum_{i=1}^{n} A_j(s_{l_{Y_i}})}, \quad j = 1, \dots, k,$$
where the function $A_j(x)$ is the basic function on the universe $S_l$.
3.
Classify the given data using Fisher’s linear discriminant analysis: Each observation is assigned to $q$ groups, where $q$ is the number of overlapping basic functions at that observation. In Figure 1, the number of overlapping basic functions is $q = 2$ in (a,b) and $q = 3$ in (c). If $s_{l_{Y_i}}$ is included in the support of $A_j$, then $s_{l_{Y_i}}$ is assigned to group $g_j$. Since the basic functions overlap, $s_{l_{Y_i}}$ can be assigned to more than one group. Using the assigned groups and the independent variables, we construct Fisher’s linear discriminant function and predict the assigned group based on the discriminant score. The predicted group is represented by $\hat{g}_j$, $j = 1, 2, \dots, q$.
4.
Predict the left spread of the dependent variable using the inverse F-transform: Using the F-transform obtained in sub-step 2 above and the inverse F-transform, the spread $s_{l_{Y_i}}$ is predicted. Let $F_k[S_{l_Y}] = [F_1^l, F_2^l, \dots, F_k^l]$ be the F-transform of the left spread with respect to $A_1(x), \dots, A_k(x)$. Then the predicted left spread is given by
$$\hat{s}_{l_{Y_i}} = \frac{\sum_{j=1}^{k} F_j^* A_j(s_{l_{Y_i}})}{\sum_{j=1}^{k} A_j(s_{l_{Y_i}})}, \quad i = 1, \dots, n,$$
where $F_j^*$ represents the F-transform component corresponding to the predicted group $\hat{g}_j$.
Step 3.
Predict the right spread $s_{r_{Y_i}}$ by repeating Step 2 for the right spreads.
By this hybrid algorithm, we predict the values of the dependent variable. Our overall method is summarized in Algorithm 1.
Algorithm 1: Predicting the value of the dependent variable with the F-transform
[Algorithm 1 is provided as an image in the original article.]
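Step 1 of the algorithm relies on least absolute deviations (LAD) estimation. As a rough sketch of what LAD does, the Python code below fits a line with a single predictor by exploiting a known property of LAD regression: some optimal line passes through at least two data points, so it is enough to try every pair. This is an illustrative simplification, not the authors' implementation; the function name `lad_line` and the toy data are assumptions, and with $p$ predictors one would typically solve the equivalent linear program instead.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Exact LAD fit y = intercept + slope * x by enumerating point pairs.

    Returns (intercept, slope, loss), where loss is the sum of absolute residuals.
    Practical only for small samples because it tries O(n^2) candidate lines.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(abs(y - intercept - slope * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[0]


# Toy centers (assumed): four points on y = 1 + 2x and one gross outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 30.0]
intercept, slope, loss = lad_line(xs, ys)
# intercept == 1.0 and slope == 2.0: the outlier does not move the fitted line.
```

The example shows the robustness that motivates LAD for the centers: the single outlier contributes its residual to the loss but leaves the fitted coefficients unchanged.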

4. Numerical Examples and Comparison Studies

In this section, we present two examples to compare the performance of the proposed method with existing methods that assume the same model for the center and the spread. Chachi and Taheri estimated the fuzzy regression model by minimizing the sum of squares of the left and right endpoints of the $\alpha$-level sets of the residuals [4]. Diamond constructed the fuzzy regression model by minimizing the sum of squared residuals of the endpoints of the support and of the center [7]. Two measures are used to compare the performance of the estimated fuzzy regression models. One is $D(Y, \hat{Y})$, based on the difference between the estimated value and the observed value, and the other is $I(Y, \hat{Y})$, which compares the overlapping area.
The performance measure $D(Y, \hat{Y})$, which compares the difference between the predicted value and the observed value, is given as follows [5]:
$$D(Y, \hat{Y}) = \sum_{i=1}^{n} m_d(Y_i, \hat{Y}_i),$$
where
$$m_d(Y_i, \hat{Y}_i) = \frac{\int \left| \mu_{Y_i}(x) - \mu_{\hat{Y}_i}(x) \right| dx}{\int \left| \mu_{Y_i}(x) \right| dx} + h_d\big(Y_i(0), \hat{Y}_i(0)\big)$$
and $h_d(Y_i(0), \hat{Y}_i(0)) = \inf\{\inf\{|a - b| : a \in Y_i(0)\} : b \in \hat{Y}_i(0)\}$. The more efficient model has an $m_d$ value closer to zero.
The performance measure $I(Y, \hat{Y})$, which compares the overlapping area between the predicted value and the observed value, is given as follows [4]:
$$I(Y, \hat{Y}) = \sum_{i=1}^{n} c(Y_i, \hat{Y}_i),$$
where
$$c(Y_i, \hat{Y}_i) = \frac{\int \min\{\mu_{Y_i}(x), \mu_{\hat{Y}_i}(x)\}\, dx}{\int \mu_{Y_i}(x)\, dx + \int \mu_{\hat{Y}_i}(x)\, dx - \int \min\{\mu_{Y_i}(x), \mu_{\hat{Y}_i}(x)\}\, dx}$$
and $\min$ denotes the minimum value. The more efficient method has the smaller value of $D(Y, \hat{Y})$ and the larger value of $I(Y, \hat{Y})$.
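The two performance measures can be approximated numerically for triangular fuzzy numbers. The Python sketch below computes $m_d$ and $c$ for one observed/predicted pair with the midpoint rule; the function names, the number of integration steps, and the handling of the 0-level sets as closed support intervals are assumptions made for illustration. Each fuzzy number is passed as a tuple `(m, sl, sr)`.

```python
def tri_membership(x, m, sl, sr):
    """Membership of x in the triangular fuzzy number (m, sl, sr)_T."""
    if x == m:
        return 1.0
    if m - sl < x < m:
        return 1.0 - (m - x) / sl
    if m < x < m + sr:
        return 1.0 - (x - m) / sr
    return 0.0


def compare(Y, Y_hat, steps=20000):
    """Return (m_d, c) for one observed/predicted pair via the midpoint rule."""
    lo = min(Y[0] - Y[1], Y_hat[0] - Y_hat[1])
    hi = max(Y[0] + Y[2], Y_hat[0] + Y_hat[2])
    if hi <= lo:
        raise ValueError("both fuzzy numbers are crisp, so all areas are zero")
    dx = (hi - lo) / steps
    area_y = area_hat = area_min = area_diff = 0.0
    for s in range(steps):
        x = lo + (s + 0.5) * dx
        u = tri_membership(x, *Y)
        v = tri_membership(x, *Y_hat)
        area_y += u * dx
        area_hat += v * dx
        area_min += min(u, v) * dx
        area_diff += abs(u - v) * dx
    if area_y == 0:
        raise ValueError("the observed fuzzy number has zero area")
    # h_d: distance between the two supports (zero when they overlap).
    gap = max(0.0,
              (Y_hat[0] - Y_hat[1]) - (Y[0] + Y[2]),
              (Y[0] - Y[1]) - (Y_hat[0] + Y_hat[2]))
    m_d = area_diff / area_y + gap
    c = area_min / (area_y + area_hat - area_min)
    return m_d, c


# Observation 3 of Table 1 and its predicted value.
m_d_3, c_3 = compare((6.0, 0.25, 0.50), (6.08, 0.29, 0.50))
```

Summing `m_d` over all observations gives $D(Y, \hat{Y})$, and summing `c` gives $I(Y, \hat{Y})$. For triangular fuzzy numbers the areas could also be computed in closed form; numerical integration is used here only to keep the sketch short.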
To show the efficiency of the proposed estimation method, we use examples used by D’Urso [8] and Yoon et al. [14]. To compare the performance of the proposed algorithm with the existing methods, the Chachi and Taheri method [4] and the Diamond method [7], which are based on least squares estimation, are used.
Example 1.
D’Urso [8] and Chachi and Taheri [4] consider a multiple fuzzy regression model. Table 1 shows the performance data for 30 quality Roman restaurants.
1.
Estimate the center of the dependent variable using the least absolute deviations method. The estimated center is given by the following equation:
$$m_{\hat{Y}_i} = 0.070486 + 0.418084\, x_1 + 0.500537\, x_2.$$
2.
From Table 1, we obtain $S_{l_Y} = \{0, 0.25, 0.5, 0.75\}$, which is the set of left spreads of the dependent variable. Define the universe of discourse $S_l$ as $[\max\{0, S_{l_Y}^{\min} - c_1\},\ S_{l_Y}^{\max} + c_2] = [0, 0.8]$, where $S_{l_Y}^{\min} = 0$ and $S_{l_Y}^{\max} = 0.75$, with suitable constants $c_1$ and $c_2$.
3.
Estimate the left spread of the dependent variable using the F-transform and Fisher’s linear discriminant analysis. The given data show that the spreads of the 30 samples take only four values. The basic functions on $S_l$ can be given as follows:
$$L_k = \begin{cases} (0, 0, 0.2)_T & k = 1, \\ (0.2(k-1), 0.2, 0.2)_T & 2 \le k \le 4, \\ (0.8, 0.2, 0)_T & k = 5. \end{cases}$$
Using the defined fuzzy partition and Equation (2), the F-transform is obtained as follows:
$$F_k[S_{l_Y}] = [F_1^l, F_2^l, F_3^l, F_4^l, F_5^l] = [0, 0.25, 0.42742, 0.55357, 0.75].$$
Since the number of overlapping basic functions is two, each observation is grouped two times. The first group is represented by $g_1$, and the Fisher linear discriminant functions that classify the given data using the independent variables $l_{X_1}$ and $l_{X_2}$ are as follows:
$$\begin{aligned} g_{12} &= 68.95679 + 8.57909\, l_{X_1} + 11.23412\, l_{X_2}, \\ g_{13} &= 54.63401 + 7.71719\, l_{X_1} + 9.92721\, l_{X_2}, \\ g_{14} &= 68.25801 + 8.59155\, l_{X_1} + 11.12702\, l_{X_2}, \\ g_{15} &= 64.67950 + 8.022077\, l_{X_1} + 11.13042\, l_{X_2}, \end{aligned}$$
where $l_{X_1} = m_{X_1} - s_{l_{X_1}}$ and $l_{X_2} = m_{X_2} - s_{l_{X_2}}$ are not spreads but the left endpoints of the triangular fuzzy sets.
Using the Fisher discriminant scores obtained from the above Fisher linear discriminant functions, $g_1$ is predicted as $\hat{g}_1$. In the same way, the second group is represented by $g_2$, and $g_2$ is predicted as $\hat{g}_2$. The results are presented in Table 2. In Table 2, the symbol $m.g_1$ (or $m.g_2$) represents the membership degree of $s_{l_Y}$ in the group $g_1$ (or $g_2$). From Table 2 and Equation (3), the left spreads are predicted. For instance,
$$\hat{s}_{l_{Y_1}} = \frac{F_3^l \cdot m.g_1 + F_2^l \cdot m.g_2}{m.g_1 + m.g_2} = \frac{0.42742 \cdot 0.75 + 0.25 \cdot 0.25}{0.75 + 0.25} = 0.38306.$$
By the same method, the right spreads are also predicted, and the predicted spreads are presented in Table 3. Table 4 shows the final values estimated by the proposed hybrid algorithm. The symbols $l_y$ and $r_y$ denote the left endpoint and the right endpoint of the dependent variable, respectively.
The final estimate Y ^ is obtained from Table 4, and the performance measures D ( Y , Y ^ ) and I ( Y , Y ^ ) are given in Table 5. Table 5 shows that the proposed hybrid estimation algorithm has the smallest sum of areas for residual. The area of overlap between the estimated and observed values is smaller than that of the difference method, but the difference is not large. Therefore, we can say that the proposed hybrid algorithm outperforms the existing methods.
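The worked prediction of the first left spread in Example 1 can be checked with a few lines of Python. The sketch below uses the F-transform components and the predicted groups (3 and 2) quoted in the example, and it recomputes the membership degrees from the triangular partition with nodes 0, 0.2, ..., 0.8, assuming that observation 1 has the left spread 0.75 listed in Table 1. The function names are hypothetical.

```python
# F-transform components of the left spreads, as reported in Example 1.
F_left = [0.0, 0.25, 0.42742, 0.55357, 0.75]


def basic_membership(x, node, width=0.2):
    """Triangular basic function centred at `node` with half-width `width`."""
    return max(0.0, 1.0 - abs(x - node) / width)


def predict_spread(components, predicted_groups, memberships):
    """Weighted mean of the components of the predicted groups (1-based indices)."""
    numerator = sum(components[g - 1] * w
                    for g, w in zip(predicted_groups, memberships))
    return numerator / sum(memberships)


# Membership degrees of the observed left spread 0.75 in its two groups.
m_g1 = basic_membership(0.75, 0.8)   # about 0.75
m_g2 = basic_membership(0.75, 0.6)   # about 0.25

# Predicted groups 3 and 2, as used in the worked formula of Example 1.
s_hat = predict_spread(F_left, [3, 2], [m_g1, m_g2])
```

Up to floating-point rounding, `s_hat` equals $0.42742 \cdot 0.75 + 0.25 \cdot 0.25 = 0.383065$, which agrees with the value 0.38306 reported in the example.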
Example 2.
Yoon et al. [15] surveyed the impact of family ($X_{i1}$), colleague ($X_{i2}$), school ($X_{i3}$), and national satisfaction ($X_{i4}$) on life satisfaction ($Y_i$).
Table 6 shows the data from the satisfaction study, where each entry $(m, s)_T$ is a symmetric triangular fuzzy number with center $m$ and width $s$. The number of observations of life satisfaction in Table 6 is 106, but the set of spreads $S_s = \{0, 0.5, 1, 1.5, 2.5, 3.5, 4, 4.5, 5\}$ has only nine elements. This means that the same values occur repeatedly. In this case, it may be more efficient to classify the spread of the dependent variable using discriminant analysis. The fuzzy partition of $S_s$ for discriminant analysis can be defined as follows:
$$S_k = \begin{cases} (0, 0, 1.25)_T & k = 1, \\ ((k-1)\,1.25, 1.25, 1.25)_T & 2 \le k \le 5, \\ (6.25, 1.25, 0)_T & k = 6. \end{cases}$$
Since the number of overlapping basic functions is two, each observation is grouped two times. The first group is represented by $g_1$, and the Fisher linear discriminant functions that classify the given data using the independent variables $l_{X_1}, \dots, l_{X_4}$ are as follows:
$$\begin{aligned} g_{12} &= 166.828 + 1.12827\, l_{X_1} + 1.69757\, l_{X_2} + 0.63796\, l_{X_3} + 0.40676\, l_{X_4}, \\ g_{13} &= 160.58558 + 1.08448\, l_{X_1} + 1.69049\, l_{X_2} + 0.662188\, l_{X_3} + 0.40055\, l_{X_4}, \\ g_{14} &= 163.4525 + 1.10958\, l_{X_1} + 1.67704\, l_{X_2} + 0.59332\, l_{X_3} + 0.44547\, l_{X_4}, \\ g_{15} &= 165.96681 + 1.09384\, l_{X_1} + 1.69994\, l_{X_2} + 0.62236\, l_{X_3} + 0.45089\, l_{X_4}, \end{aligned}$$
where $l_{X_i} = m_{X_i} - s_{l_{X_i}}$ ($i = 1, 2, 3, 4$) are not spreads but the left endpoints of the triangular fuzzy sets. The result of LAD estimation for the centers $\{(m_{X_{i1}}, \dots, m_{X_{i4}}, m_{Y_i}) : i = 1, \dots, 106\}$ of the dependent and independent variables is as follows:
$$m_{\hat{Y}_i} = 0.0039 + 0.2005\, m_{X_{i1}} + 0.2501\, m_{X_{i2}} + 0.3500\, m_{X_{i3}} + 0.1992\, m_{X_{i4}}.$$
The life satisfaction estimated using the fuzzy partition, discriminant analysis, and LAD estimation is given in Table 6. The results of the performance measures for life satisfaction $Y_i$ are presented in Table 7. The performance of the proposed hybrid estimation algorithm is compared with the Diamond method [7] and the Chachi and Taheri method [4], as in Example 1. Table 7 shows that if the spreads repeatedly take the same values, it may be more efficient to use the F-transform and discriminant analysis.
Examples 1 and 2 show that the proposed hybrid estimation algorithm may be more efficient if the spread of a given dependent variable is expressed as a repeated number. That is, when the number of spreads of the dependent variable is smaller than the sample size, the proposed hybrid estimation algorithm is effective.
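The grouping of repeated spreads used in Example 2 can be sketched as follows. The Python code builds the triangular partition with nodes $0, 1.25, \dots, 6.25$ described in that example and lists, for each distinct spread value, the groups in which it has positive membership. The helper names are hypothetical. Under the open-support convention assumed here, a spread that coincides with a node (0, 2.5, or 5) falls into a single group; how such ties are handled in the paper is not stated in the text.

```python
def basic_membership(x, node, width):
    """Triangular basic function centred at `node` with half-width `width`."""
    return max(0.0, 1.0 - abs(x - node) / width)


width = 1.25
nodes = [k * width for k in range(6)]            # 0, 1.25, ..., 6.25
spreads = [0, 0.5, 1, 1.5, 2.5, 3.5, 4, 4.5, 5]  # distinct spreads in Example 2


def groups_for(spread):
    """Map each group index (1-based) with positive membership to that membership."""
    memberships = {}
    for k, node in enumerate(nodes, start=1):
        degree = basic_membership(spread, node, width)
        if degree > 0.0:
            memberships[k] = degree
    return memberships


assignments = {s: groups_for(s) for s in spreads}
# For example, the spread 1.5 lies between the nodes 1.25 and 2.5, so it is
# assigned to groups 2 and 3 with memberships of about 0.8 and 0.2.
```

Each group index then serves as a categorical label for discriminant analysis, and the memberships are the weights used when the predicted groups are mapped back to a spread through the inverse F-transform.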

5. Conclusions

In this paper, we have confirmed that the response functions for the center and the spread of the dependent variable in the fuzzy regression model may not match, and we have proposed a hybrid estimation algorithm for independently estimating the response function for the center and the response function for the spread. The proposed hybrid estimation algorithm is a modified and extended version of the algorithm introduced in Jung et al. [16]. We also applied discriminant analysis for categorical data to construct the fuzzy regression model when the set of distinct spreads of the dependent variable is much smaller than the number of samples. In addition, the F-transform was used to categorize the spread of the dependent variable. We then combined the LAD estimation method for the center of the dependent variable with the F-transform and discriminant analysis for the spread of the dependent variable to estimate the value of the dependent variable.
Two examples have confirmed that the proposed fuzzy regression model, estimated using the F-transform, the Fisher discriminant function, and the LAD estimation method, can be more efficient than other existing methods. This means that when the number of distinct spreads of the dependent variable is much smaller than the sample size or the number of distinct centers of the dependent variable, the proposed hybrid estimation algorithm can provide more efficient estimation results.
In future studies, we plan to check whether the proposed hybrid algorithm is robust to the number of basic functions and the type of membership function. In addition, we will apply our algorithm to fuzzy regression models with fuzzy coefficients.

Author Contributions

Conceptualization, S.H.C.; methodology, H.-Y.J. and W.-J.L.; software, W.-J.L.; validation, W.-J.L.; formal analysis, W.-J.L.; investigation, H.-Y.J.; resources, H.-Y.J.; data curation, W.-J.L.; writing—original draft preparation, H.-Y.J.; writing—review and editing, H.-Y.J. and S.H.C.; visualization, H.-Y.J.; supervision, S.H.C.; project administration, S.H.C.; funding acquisition, S.H.C. and H.-Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1D1A1B03039559, No.2019R1I1A1A01046810) and Institute of Information & communications Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT) (No.2020-0-01343, Artificial Intelligence Convergence Research Center(Hanyang University)).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Kim, H.K.; Choi, S.H. Statistical Analysis with Applications; Kyung Moon Sa: Seoul, Korea, 2002.
  2. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
  3. Tanaka, H.; Uejima, S.; Asai, K. Linear regression analysis with fuzzy model. IEEE Trans. Syst. Man Cybern. 1982, 12, 903–907.
  4. Chachi, J.; Taheri, S.M. Multiple Fuzzy Regression Model for Fuzzy Input-Output Data. Iran. J. Fuzzy Syst. 2016, 13, 63–78.
  5. Choi, S.H.; Buckley, J.J. Fuzzy regression using least absolute deviation estimators. Soft Comput. 2008, 12, 257–263.
  6. Choi, S.H.; Yoon, J.H. General fuzzy regression using least squares method. Int. J. Syst. Sci. 2010, 41, 477–485.
  7. Diamond, P. Fuzzy least squares. Inf. Sci. 1988, 46, 141–157.
  8. D’Urso, P. Linear regression analysis for fuzzy/crisp input and fuzzy/crisp output data. Comput. Stat. Data Anal. 2003, 42, 47–72.
  9. Kim, I.K.; Lee, W.J.; Yoon, J.H.; Choi, S.H. Fuzzy Regression Model Using Trapezoidal Fuzzy Numbers for Re-auction Data. Int. J. Fuzzy Log. Intell. Syst. 2016, 16, 72–80.
  10. Lee, W.J.; Jung, H.Y.; Yoon, J.H.; Choi, S.H. The statistical inferences of fuzzy regression based on bootstrap techniques. Soft Comput. 2015, 19, 883–890.
  11. Taheri, S.M.; Kelkinnama, M. Fuzzy linear regression based on least absolute deviations. Iran. J. Fuzzy Syst. 2012, 9, 121–140.
  12. Jung, H.Y.; Yoon, J.H.; Choi, S.H. Fuzzy linear regression using rank transform method. Fuzzy Sets Syst. 2015, 274, 97–108.
  13. Tanaka, H.; Hayashi, I.; Watada, J. Possibilistic linear regression analysis for fuzzy data. Eur. J. Oper. Res. 1989, 40, 389–396.
  14. Yoon, J.H.; Yoon, H.S.; Choi, S.H. A study of the life satisfaction using fuzzy theory. In Proceedings of the 18th International Symposium on Advanced Intelligent Systems (ISIS2017), Daegu, Korea, 11–14 October 2017.
  15. Yoon, H.S.; Choi, S.H. The Impact on Life Satisfaction of Nursing Students Using the Fuzzy Regression Model. Int. J. Fuzzy Log. Intell. Syst. 2019, 19, 1–8.
  16. Jung, H.Y.; Lee, W.J.; Choi, S.H. Fuzzy regression model using fuzzy partition. J. Phys. Conf. Ser. 2019, 1334, 12–19.
  17. Perfilieva, I.; Haldeeva, E. Fuzzy transformation. In Proceedings of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vancouver, BC, Canada, 25–28 July 2001; pp. 127–130.
  18. Perfilieva, I. Fuzzy transforms: Theory and applications. Fuzzy Sets Syst. 2006, 157, 993–1023.
  19. Nguyen, L.; Perfilieva, I.; Holčapek, M. F-Transform Inspired Weak Solution to a Boundary Value Problem. Axioms 2020, 9, 5.
  20. Stepnika, M.; Polakovic, O. A neural network approach to the fuzzy transform. Fuzzy Sets Syst. 2009, 160, 1037–1047.
  21. Roh, S.B.; Oh, S.K.; Seo, K. Design of face recognition system based on fuzzy transform and radial basis function neural networks. Soft Comput. 2019, 23, 4969–4985.
  22. Lee, W.J.; Jung, H.Y.; Yoon, J.H.; Choi, S.H. A novel forecasting method based on F-transform and fuzzy time series. Int. J. Fuzzy Syst. 2017, 19, 1793–1802.
  23. McLachlan, G.J. Discriminant Analysis and Statistical Pattern Recognition; Wiley Interscience: New York, NY, USA, 2004.
  24. Fisher, R.A. The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugen. 1936, 7, 179–188.
Figure 1. Examples of basic functions. (a) Triangular function, (b) sinusoidal function, (c) trapezoidal function.
Table 1. Fuzzy observed value and predicted value.

| No | $X_{i1}$ | $X_{i2}$ | $Y_i$ | $\hat{Y}_i$ |
|---|---|---|---|---|
| 1 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (8, 0.75, 1.00)_T | (5.75, 0.38, 0.83)_T |
| 2 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (5.25, 0.29, 0.5)_T |
| 3 | (6, 0.25, 0.50)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6.08, 0.29, 0.50)_T |
| 4 | (8, 0.75, 1.00)_T | (9, 0.00, 1.00)_T | (9, 0.00, 1.00)_T | (7.92, 0.00, 1.20)_T |
| 5 | (8, 0.75, 1.00)_T | (8, 0.75, 1.00)_T | (8, 0.75, 1.00)_T | (7.42, 0.50, 1.20)_T |
| 6 | (6, 0.25, 0.50)_T | (7, 0.50, 1.25)_T | (5, 0.00, 1.00)_T | (6.08, 0.25, 0.83)_T |
| 7 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (7, 0.50, 1.25)_T | (7.00, 0.50, 1.18)_T |
| 8 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (5, 0.00, 1.00)_T | (6.50, 0.25, 1.20)_T |
| 9 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (7, 0.50, 1.25)_T | (7.00, 0.49, 1.18)_T |
| 10 | (6, 0.25, 0.50)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6.08, 0.29, 0.50)_T |
| 11 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (8, 0.75, 1.00)_T | (7.00, 0.19, 1.20)_T |
| 12 | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6, 0.25, 0.50)_T | (6.00, 0.29, 0.50)_T |
| 13 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (9, 0.00, 1.00)_T | (7.00, 0.43, 1.20)_T |
| 14 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (8, 0.75, 1.00)_T | (7.00, 0.19, 1.20)_T |
| 15 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6.50, 0.39, 1.18)_T |
| 16 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6.50, 0.29, 1.18)_T |
| 17 | (6, 0.25, 0.50)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6.08, 0.29, 0.50)_T |
| 18 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (7, 0.50, 1.25)_T | (7.00, 0.13, 1.18)_T |
| 19 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (6.50, 0.52, 1.20)_T |
| 20 | (7, 0.50, 1.25)_T | (9, 0.00, 1.00)_T | (7, 0.50, 1.25)_T | (7.50, 0.13, 1.18)_T |
| 21 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (7, 0.50, 1.25)_T | (7.00, 0.49, 1.18)_T |
| 22 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (6, 0.25, 0.50)_T | (7.00, 0.06, 1.11)_T |
| 23 | (7, 0.50, 1.25)_T | (9, 0.00, 1.00)_T | (7, 0.50, 1.25)_T | (7.50, 0.13, 1.18)_T |
| 24 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (6.50, 0.38, 1.20)_T |
| 25 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6.50, 0.29, 1.11)_T |
| 26 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6.50, 0.29, 1.11)_T |
| 27 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6.50, 0.34, 1.18)_T |
| 28 | (7, 0.50, 1.25)_T | (8, 0.75, 1.00)_T | (7, 0.50, 1.25)_T | (7.00, 0.49, 1.18)_T |
| 29 | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (7, 0.50, 1.25)_T | (6.50, 0.34, 1.18)_T |
| 30 | (6, 0.25, 0.50)_T | (7, 0.50, 1.25)_T | (6, 0.25, 0.50)_T | (6.08, 0.29, 0.50)_T |
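Table 2. The predicted groups based on the overlapped basic functions.

| obs | $s_{ly}$ | $g_1$ | $\hat{g}_1$ | $m.g_1$ | $g_2$ | $\hat{g}_2$ | $m.g_2$ |
|---|---|---|---|---|---|---|---|
| 1 | 0.75 | 5 | 3 | 0.75 | 4 | 2 | 0.25 |
| 2 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 3 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 4 | 0 | 2 | 2 | 0 | 1 | 1 | 1 |
| 5 | 0.75 | 5 | 4 | 0.75 | 4 | 3 | 0.25 |
| 6 | 0 | 2 | 3 | 0 | 1 | 2 | 1 |
| 7 | 0.5 | 4 | 4 | 0.5 | 3 | 3 | 0.5 |
| 8 | 0 | 2 | 3 | 0 | 1 | 2 | 1 |
| 9 | 0.5 | 4 | 4 | 0.5 | 3 | 3 | 0.5 |
| 10 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 11 | 0.75 | 5 | 2 | 0.75 | 4 | 1 | 0.25 |
| 12 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 13 | 0 | 2 | 4 | 0 | 1 | 3 | 1 |
| 14 | 0.75 | 5 | 2 | 0.75 | 4 | 1 | 0.25 |
| 15 | 0.5 | 4 | 3 | 0.5 | 3 | 2 | 0.5 |
| 16 | 0.5 | 4 | 3 | 0.5 | 3 | 2 | 0.5 |
| 17 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 18 | 0.5 | 4 | 2 | 0.5 | 3 | 1 | 0.5 |
| 19 | 0.75 | 5 | 4 | 0.75 | 4 | 3 | 0.25 |
| 20 | 0.5 | 4 | 2 | 0.5 | 3 | 1 | 0.5 |
| 21 | 0.5 | 4 | 4 | 0.5 | 3 | 3 | 0.5 |
| 22 | 0.25 | 3 | 2 | 0.25 | 2 | 1 | 0.75 |
| 23 | 0.5 | 4 | 2 | 0.5 | 3 | 1 | 0.5 |
| 24 | 0.75 | 5 | 3 | 0.75 | 4 | 2 | 0.25 |
| 25 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 26 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |
| 27 | 0.5 | 4 | 3 | 0.5 | 3 | 2 | 0.5 |
| 28 | 0.5 | 4 | 4 | 0.5 | 3 | 3 | 0.5 |
| 29 | 0.5 | 4 | 3 | 0.5 | 3 | 2 | 0.5 |
| 30 | 0.25 | 3 | 3 | 0.25 | 2 | 2 | 0.75 |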
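The group and membership columns of Table 2 are consistent with triangular basic functions on a uniform fuzzy partition, where each left spread belongs to two neighbouring groups whose memberships sum to 1. The following is a minimal Python sketch of that assignment. It assumes nodes at 0, 0.2, ..., 0.8 (a spacing inferred from the tabulated memberships, not a parameter stated in the table) and 1-based group labels; the function name `overlapping_groups` is hypothetical.

```python
import math

def overlapping_groups(s, h=0.2, n_nodes=5):
    """Return ((g1, m_g1), (g2, m_g2)) for a spread s.

    Assumes triangular basic functions centred on the nodes
    x_k = (k - 1) * h, k = 1..n_nodes. The value h = 0.2 is an
    assumption inferred from Table 2, not a stated parameter.
    g2 is the node at or below s; g1 = g2 + 1 is the node above it.
    """
    # 0-based index of the node to the left of s (small tolerance for float error)
    lower = min(int(math.floor(s / h + 1e-9)), n_nodes - 2)
    m_upper = s / h - lower   # membership in the right-hand basic function
    m_lower = 1.0 - m_upper   # memberships of the two neighbours sum to 1
    return (lower + 2, round(m_upper, 10)), (lower + 1, round(m_lower, 10))

for s in (0.0, 0.25, 0.5, 0.75):
    print(s, overlapping_groups(s))
```

Under these assumptions, a spread of 0.75 falls in groups 5 and 4 with memberships 0.75 and 0.25, which matches the first row of Table 2.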
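Table 3. The predicted spreads based on the inverse F-transform.

| obs | $s_{ly}$ | $\hat{s}_{ly}$ | $s_{ry}$ | $\hat{s}_{ry}$ |
|---|---|---|---|---|
| 1 | 0.75 | 0.38306 | 1 | 0.83333 |
| 2 | 0.25 | 0.29435 | 0.5 | 0.5 |
| 3 | 0.25 | 0.29435 | 0.5 | 0.5 |
| 4 | 0 | 0 | 1 | 1.20434 |
| 5 | 0.75 | 0.52203 | 1 | 1.20434 |
| 6 | 0 | 0.25 | 1 | 0.83333 |
| 7 | 0.5 | 0.4905 | 1.25 | 1.18151 |
| 8 | 0 | 0.25 | 1 | 1.20434 |
| 9 | 0.5 | 0.4905 | 1.25 | 1.18151 |
| 10 | 0.25 | 0.29435 | 0.5 | 0.5 |
| 11 | 0.75 | 0.1875 | 1 | 1.20434 |
| 12 | 0.25 | 0.29435 | 0.5 | 0.5 |
| 13 | 0 | 0.42742 | 1 | 1.20434 |
| 14 | 0.75 | 0.1875 | 1 | 1.20434 |
| 15 | 0.5 | 0.33871 | 1.25 | 1.18151 |
| 16 | 0.5 | 0.33871 | 1.25 | 1.18151 |
| 17 | 0.25 | 0.29435 | 0.5 | 0.5 |
| 18 | 0.5 | 0.125 | 1.25 | 1.18151 |
| 19 | 0.75 | 0.52203 | 1 | 1.20434 |
| 20 | 0.5 | 0.125 | 1.25 | 1.18151 |
| 21 | 0.5 | 0.4905 | 1.25 | 1.18151 |
| 22 | 0.25 | 0.0625 | 0.5 | 1.11301 |
| 23 | 0.5 | 0.125 | 1.25 | 1.18151 |
| 24 | 0.75 | 0.38306 | 1 | 1.20434 |
| 25 | 0.25 | 0.29435 | 0.5 | 1.11301 |
| 26 | 0.25 | 0.29435 | 0.5 | 1.11301 |
| 27 | 0.5 | 0.33871 | 1.25 | 1.18151 |
| 28 | 0.5 | 0.4905 | 1.25 | 1.18151 |
| 29 | 0.5 | 0.33871 | 1.25 | 1.18151 |
| 30 | 0.25 | 0.29435 | 0.5 | 0.5 |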
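The predicted left spreads in Table 3 appear consistent with a membership-weighted sum of per-group components, which is the form the inverse F-transform takes when only two basic functions overlap at a point. The sketch below illustrates that calculation. The component values in `F_LEFT` are hypothetical: they were back-calculated from rows 4, 6, 13 and 7 of Table 3 and are not reported as such in the source, and the function name `predicted_spread` is likewise an assumption.

```python
# Hypothetical F-transform components for the left spread, indexed by group.
# Back-calculated from Table 3 (an assumption, not values given in the source).
F_LEFT = {1: 0.0, 2: 0.25, 3: 0.42742, 4: 0.55358}

def predicted_spread(g1_hat, m_g1, g2_hat, m_g2, components=F_LEFT):
    """Weighted combination of the components of the two predicted groups."""
    return m_g1 * components[g1_hat] + m_g2 * components[g2_hat]

# Observation 1 in Table 2: predicted groups (3, 2), memberships (0.75, 0.25).
print(round(predicted_spread(3, 0.75, 2, 0.25), 4))
# Observation 11 in Table 2: predicted groups (2, 1), memberships (0.75, 0.25).
print(round(predicted_spread(2, 0.75, 1, 0.25), 4))
```

With these assumed components the two calls give values close to the tabulated 0.38306 and 0.1875 for observations 1 and 11.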
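Table 4. The final prediction results of the dependent variable.

| obs | $l_y$ | $m_y$ | $r_y$ | $\hat{l}_y$ | $\hat{m}_y$ | $\hat{r}_y$ |
|---|---|---|---|---|---|---|
| 1 | 7.25 | 8 | 9 | 5.36384 | 5.7469 | 6.58023 |
| 2 | 5.75 | 6 | 6.5 | 4.95205 | 5.2464 | 5.7464 |
| 3 | 5.75 | 6 | 6.5 | 5.78825 | 6.0826 | 6.5826 |
| 4 | 9 | 9 | 10 | 7.9198 | 7.9198 | 9.12414 |
| 5 | 7.25 | 8 | 9 | 6.89727 | 7.4193 | 8.62364 |
| 6 | 5 | 5 | 6 | 5.8326 | 6.0826 | 6.91593 |
| 7 | 6.5 | 7 | 8.25 | 6.5107 | 7.0012 | 8.18271 |
| 8 | 5 | 5 | 6 | 6.2507 | 6.5007 | 7.70504 |
| 9 | 6.5 | 7 | 8.25 | 6.5107 | 7.0012 | 8.18271 |
| 10 | 5.75 | 6 | 6.5 | 5.78825 | 6.0826 | 6.5826 |
| 11 | 7.25 | 8 | 9 | 6.8137 | 7.0012 | 8.20554 |
| 12 | 5.75 | 6 | 6.5 | 5.70585 | 6.0002 | 6.5002 |
| 13 | 9 | 9 | 10 | 6.57378 | 7.0012 | 8.20554 |
| 14 | 7.25 | 8 | 9 | 6.8137 | 7.0012 | 8.20554 |
| 15 | 6.5 | 7 | 8.25 | 6.16199 | 6.5007 | 7.68221 |
| 16 | 6.5 | 7 | 8.25 | 6.16199 | 6.5007 | 7.68221 |
| 17 | 5.75 | 6 | 6.5 | 5.78825 | 6.0826 | 6.5826 |
| 18 | 6.5 | 7 | 8.25 | 6.8762 | 7.0012 | 8.18271 |
| 19 | 7.25 | 8 | 9 | 5.97867 | 6.5007 | 7.70504 |
| 20 | 6.5 | 7 | 8.25 | 7.3767 | 7.5017 | 8.68321 |
| 21 | 6.5 | 7 | 8.25 | 6.5107 | 7.0012 | 8.18271 |
| 22 | 5.75 | 6 | 6.5 | 6.9387 | 7.0012 | 8.11421 |
| 23 | 6.5 | 7 | 8.25 | 7.3767 | 7.5017 | 8.68321 |
| 24 | 7.25 | 8 | 9 | 6.11764 | 6.5007 | 7.70504 |
| 25 | 5.75 | 6 | 6.5 | 6.20635 | 6.5007 | 7.61371 |
| 26 | 5.75 | 6 | 6.5 | 6.20635 | 6.5007 | 7.61371 |
| 27 | 6.5 | 7 | 8.25 | 6.16199 | 6.5007 | 7.68221 |
| 28 | 6.5 | 7 | 8.25 | 6.5107 | 7.0012 | 8.18271 |
| 29 | 6.5 | 7 | 8.25 | 6.16199 | 6.5007 | 7.68221 |
| 30 | 5.75 | 6 | 6.5 | 5.78825 | 6.0826 | 6.5826 |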
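The endpoint columns of Table 4 follow from the centre and the two spreads of each triangular fuzzy number: the left endpoint is the centre minus the left spread, and the right endpoint is the centre plus the right spread. A short Python sketch of that conversion is given here, with the hypothetical helper name `endpoints` and input values taken from Tables 1, 3 and 4.

```python
def endpoints(center, left_spread, right_spread):
    """Return (l, m, r) for a triangular fuzzy number (m, s_l, s_r)_T."""
    return center - left_spread, center, center + right_spread

# Observed Y_1 = (8, 0.75, 1.00)_T from Table 1.
print(endpoints(8, 0.75, 1.00))  # (7.25, 8, 9.0)

# Predicted values for observation 1: centre from Table 4, spreads from Table 3.
l_hat, m_hat, r_hat = endpoints(5.7469, 0.38306, 0.83333)
print(round(l_hat, 5), m_hat, round(r_hat, 5))
```

The second call gives endpoints that agree with the first row of Table 4 up to floating-point rounding.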
Table 5. The comparison of performance measures for Example 1.

| Measure | Diamond Method [7] | Chachi Method [4] | Proposed Method |
|---|---|---|---|
| $D(Y, \hat{Y})$ | 1.7442 | 1.9495 | 1.607425 |
| $I(Y, \hat{Y})$ | 0.2668 | 0.3785 | 0.368251 |
Table 6. Data for the satisfaction in Example 2.

Level of Satisfaction

| No | $X_{i1}$ | $X_{i2}$ | $X_{i3}$ | $X_{i4}$ | $Y_i$ | $\hat{Y}_i$ |
|---|---|---|---|---|---|---|
| 1 | (67.5, 2.5)_T | (77.5, 2.5)_T | (75.5, 4.5)_T | (70.0, 0.0)_T | (77.5, 2.5)_T | (76.0, 2.5)_T |
| 2 | (85.5, 4.5)_T | (87.5, 2.5)_T | (90.0, 0.0)_T | (77.5, 2.5)_T | (75.5, 4.5)_T | (88.0, 5.0)_T |
| 105 | (90.0, 0.0)_T | (90.0, 0.0)_T | (77.5, 2.5)_T | (90.0, 2.5)_T | (85.5, 4.5)_T | (86.5, 5.0)_T |
| 106 | (100, 0.00)_T | (90.0, 1.0)_T | (80.0, 4.0)_T | (80.0, 3.5)_T | (67.5, 2.5)_T | (84.5, 2.5)_T |
Table 7. The comparison of performance measures for Example 2.

| Measure | Diamond Method [7] | Chachi Method [4] | Proposed Method |
|---|---|---|---|
| $D(Y, \hat{Y})$ | 7.1002 | 6.9835 | 6.8701 |
| $I(Y, \hat{Y})$ | 0.1083 | 0.1078 | 0.1330 |

Share and Cite

MDPI and ACS Style

Jung, H.-Y.; Lee, W.-J.; Choi, S.H. Hybrid Fuzzy Regression Analysis Using the F-Transform. Appl. Sci. 2020, 10, 6726. https://doi.org/10.3390/app10196726
