Article

Regression Models for Symbolic Interval-Valued Variables

by Jose Emmanuel Chacón 1 and Oldemar Rodríguez 2,*
1 National Bank of Costa Rica, San José 11501-2060, Costa Rica
2 School of Mathematics, CIMPA, University of Costa Rica, San José 11501-2060, Costa Rica
* Author to whom correspondence should be addressed.
Entropy 2021, 23(4), 429; https://doi.org/10.3390/e23040429
Submission received: 15 February 2021 / Revised: 27 March 2021 / Accepted: 30 March 2021 / Published: 6 April 2021

Abstract

This paper presents new approaches to fit regression models for symbolic interval-valued variables, which are shown to improve and extend the center method suggested by Billard and Diday and the center and range method proposed by Lima-Neto, E.A. and De Carvalho, F.A.T. Like the previously mentioned methods, the proposed regression models consider the midpoints and half of the length of the intervals as additional variables. We considered various methods to fit the regression models, including tree-based models, K-nearest neighbors, support vector machines, and neural networks. The approaches proposed in this paper were applied to a real dataset and to synthetic datasets generated with linear and nonlinear relations. For an evaluation of the methods, the root-mean-squared error and the correlation coefficient were used. The methods presented herein are available in the RSDA package written in the R language, which can be installed from CRAN.

1. Introduction

Statistical and data mining methods have been developed mainly for cases in which variables take a single value. In real life, there are many situations in which the use of this type of variable can lead to an important loss of information or result in time-consuming calculations. In the case of quantitative variables, a more complete description can be achieved by describing an ensemble of statistical units in terms of interval data, that is, the value taken by a variable is a closed interval of real numbers.
Interval data are especially useful when it is convenient to summarize large datasets in such a way that the resulting summary is of a manageable size and still maintains as much information as possible from the original dataset. For example, suppose we want to substitute the information of all transactions made by the owner of a credit card with a unique "transaction" summarizing all the original transactions. This is possible because this new transaction will have in its fields not only numbers, but also intervals defined, for example, by the minimum and maximum purchase.
The statistical treatment of interval-valued data was considered in the context of symbolic data analysis (SDA) introduced by E. Diday in [1], the objective of which is to extend the classic statistical methods to the study of more complex data structures that include, among others, interval-valued variables. A complete presentation on symbolic data analysis can be found in [2,3,4].
Research on SDA has focused primarily on unsupervised learning, with few contributions in the field of regression, and those have been made mainly with linear models. In this work, we explore nonlinear regression algorithms in conjunction with the center method and the center and range method for interval-valued data, with the aim of improving on the results obtained with classical linear regression for the center and the center and range methods proposed in [5,6], respectively, and with the extended lasso and ridge regression for interval-valued data proposed in [7]. In this way, we are able to study how more sophisticated algorithms can improve on the traditional models based on linear methods.
Furthermore, we extend the tool kit of algorithms available for regression on interval-valued data, taking advantage of the properties and the large number of algorithms for real-valued data. We explore the classical linear regression models, tree-based regression models (regression trees, random forest, and boosting), K-nearest neighbors regression, support vector machines regression, and regression using neural networks.
Section 2 gives a summarized presentation of the different regression models for real-valued data. Section 3 presents a summary of the center method and the center and range methods in the context of each regression model considered. Section 4 presents an experimental evaluation with a real dataset and simulated datasets, which evidences the differences and improvements among the models. Finally, Section 5 gives the concluding remarks based on the results.

2. Regression Methods

In this section, we provide a short review of the main regression methods and their mathematical formulation.

2.1. Classical Linear Regression Methods

We present only a summary of the classical linear regression model; a complete presentation can be found in [8,9]. For decades, linear regression models have been among the most important predictive methods in statistics, and they remain today one of the most important tools in statistics and data mining. The idea is, given an input vector $x^t = (x_1, x_2, \ldots, x_p)$, where $p$ is the number of variables and $x^t$ denotes the transpose of $x$, to predict the response variable $y$ through the following linear model:

$$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^{p} x_j \hat{\beta}_j, \qquad (1)$$

where $\beta_0$ is called the intercept. If a constant 1 is included in the vector $x$ and $\beta_0$ is included in the coefficient vector $\beta$, the linear model can be written in vector form as a product:

$$\hat{y} = x^t \hat{\beta}. \qquad (2)$$
To fit the linear model to the training data, the most popular estimation method is least squares. In this approach, we pick the coefficients $\beta$ that minimize the residual sum of squares (RSS):

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{n} \left( y_i - x_i^t \beta \right)^2, \qquad (3)$$

where $n$ is the number of observations in the dataset. $\mathrm{RSS}(\beta)$ is a quadratic function; therefore, its minimum always exists. Note that it can be written as:

$$\mathrm{RSS}(\beta) = (y - X\beta)^t (y - X\beta), \qquad (4)$$

where $X$ is an $n \times p$ matrix in which each row is a vector of the training dataset and $y$ is the vector of length $n$ of responses in the training dataset. It is well known that, if $X^t X$ is a nonsingular matrix, the solution is given by:

$$\hat{\beta} = (X^t X)^{-1} X^t y. \qquad (5)$$

The value approximated by this model for an observation $x_i$ can be estimated as $\hat{y}_i = x_i^t \hat{\beta}$, and the fitted value for a new case $x^t = (1, x_1, \ldots, x_p)$ is given by $\hat{y} = x^t \hat{\beta}$.
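As a minimal illustration, the closed-form least-squares solution can be computed directly in base R on simulated data (the data and variable names below are purely illustrative):

```r
set.seed(1)

# Simulated training data: n observations, p predictors, intercept column included
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * p), n, p))
beta_true <- c(2, -1, 0.5, 3)
y <- X %*% beta_true + rnorm(n, sd = 0.1)

# Least-squares estimate: beta_hat = (X^t X)^{-1} X^t y
# solve(A, b) is used instead of an explicit matrix inverse for numerical stability
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Fitted value for a new case x = (1, x_1, ..., x_p)
x_new <- c(1, 0.3, -0.2, 1.1)
y_hat <- sum(x_new * beta_hat)
```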
In practice, there are various methods for finding the coefficients $\hat{\beta}$, but the existing methods, and in particular least squares, are labor intensive and time consuming on large datasets, while others are not accurate enough for these kinds of datasets. A new non-iterative algorithm for identifying multiple regression coefficients, based on the SGTM neural-like structure and designed for processing large volumes of data, was proposed in [10], where the high efficiency of the method in terms of accuracy and speed in comparison with the existing methods was established.
Extensions to this model include shrinkage linear regression models, such as the ridge and lasso models. These models impose different types of regularization on the parameters: L2 regularization, used in ridge, and L1, used in lasso. A complete presentation of these models can be found in [8,11].

2.2. Tree-Based Regression Models

We now present a summary of the three main tree-based regression models. A complete presentation of these models can be found in [8].

2.2.1. Regression Trees

There are two main steps for regression using decision trees: first, we divide the predictor space in $\mathbb{R}^n$ into $J$ non-overlapping regions $R_1, \ldots, R_J$; second, for every test example that falls in $R_j$, we predict the response variable as the mean of the response variable over the training examples in $R_j$.
The regions $R_j$ could have any geometrical shape, but for the sake of simplicity and interpretability, we restrict ourselves to rectangular regions (boxes). Therefore, we search for a partition of the predictor space into boxes $R_j$ that minimizes the RSS, which can be written as:

$$\mathrm{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2.$$
It is computationally infeasible to consider every possible partition of the predictor space; for this reason, the method uses top-down, greedy recursive binary splitting. It is binary because every split of a predictor variable divides the predictor space into two sets; it is top-down because it builds the tree from the root down to the leaves; and it is greedy because, at each step, the best split available at that step is made, without looking ahead for a split that might lead to a lower RSS in some future step.
To construct the tree, we first consider all the predictor variables and all the possible binary splits of their values. If the variable is numerical, all the possible split values $s$ in the range of the variable are considered, and if the variable is categorical, we consider all possible partitions of its values into two sets. We then select the variable and the split of that variable that lead to the greatest reduction in RSS. This produces two regions corresponding to two branches of the tree. We then continue looking for the best predictor and the best partition in each of the resulting regions, and the process ends when a stopping criterion is reached. It is common to build a large, complex tree and then prune it in order to reduce overfitting.
Once the regions $R_1, \ldots, R_J$ have been created, we can use them to predict the response for a new example as the mean of the response of the training observations in the region to which the new example belongs.
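As a brief illustration, the following sketch fits a regression tree with the rpart package (one common CART implementation; the simulated data and variable names are purely illustrative, and the control values mirror the hyperparameters listed in Section 4):

```r
library(rpart)

set.seed(1)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- 5 * sin(2 * pi * df$x1) + df$x2 + rnorm(200, sd = 0.2)

# Recursive binary splitting with the RSS (ANOVA) criterion
fit <- rpart(y ~ x1 + x2, data = df, method = "anova",
             control = rpart.control(minsplit = 20, maxdepth = 10))

# The prediction for a new example is the mean response of its leaf region
predict(fit, newdata = data.frame(x1 = 0.4, x2 = 0.7))
```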

2.2.2. Random Forest

The idea is to build a given number of regression trees on bootstrap sets (that is, distinct datasets obtained by repeatedly sampling observations with replacement from the original dataset) and to use, in each tree, a random subset of $m$ of the original predictor variables. Usually, $m$ is taken to be $m \approx \sqrt{p}$, where $p$ is the total number of variables.
In this case, the prediction of a new example is the mean over the predictions of the individual trees. The idea of random forest is to decorrelate the trees, thereby making the average of the resulting trees less variable.

2.2.3. Boosting

In this method, the idea is to sequentially construct trees on repeatedly modified versions of the training data and the loss function, thereby producing a sequence of regression models, $G_j(x)$, whose predictions are then combined and weighted according to the errors they produce.

Initially, all $N$ training examples $(x_i, y_i)$ have the same weight $w_i = 1/N$. In each subsequent step of the training process, the data are modified by updating these weights $w_i$. At step $m$, the observations with a higher error under the model $G_{m-1}(x)$ induced at the previous step have their weights increased, whereas the weights of those with a lower error are decreased, and these weights are in turn taken into account by the loss function. As a result of this weight update, each successive model in the sequence is forced to concentrate on the training observations for which the previous models produced higher errors.
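As a hedged illustration, the boosting runs reported in Section 4 use a shrinkage parameter and depth-one trees, which corresponds to a gradient-boosting style fit such as the one provided by the gbm package (assumed here; the simulated data are illustrative):

```r
library(gbm)

set.seed(1)
df <- data.frame(x1 = runif(500), x2 = runif(500))
df$y <- 3 * df$x1^2 - 2 * df$x2 + rnorm(500, sd = 0.1)

# Sequentially grown shallow trees; n.trees, shrinkage, and interaction.depth
# follow the hyperparameter values listed in Section 4
fit <- gbm(y ~ x1 + x2, data = df, distribution = "gaussian",
           n.trees = 500, shrinkage = 0.1, interaction.depth = 1)

predict(fit, newdata = data.frame(x1 = 0.5, x2 = 0.2), n.trees = 500)
```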

2.3. K-Nearest Neighbors

Given a number $K$ and a test example $x_0$, we identify the set $N_0$ of the $K$ training examples nearest to $x_0$. The prediction $\hat{y}_0$ of the response variable for $x_0$ is the mean of the response variable over the examples in $N_0$, that is:

$$\hat{y}_0 = \frac{1}{K} \sum_{i \in N_0} y_i.$$

We can use any distance between examples, but it is recommended to use the distance that minimizes the test RSS. To select the appropriate number of neighbors $K$, it is recommended to use cross-validation to compare the RSS of the resulting models for $K = 1, \ldots, n$, where $n$ is the total number of examples.
More details on the K-nearest neighbors regression model were presented in [8].
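For concreteness, a minimal base-R sketch of K-nearest neighbors regression with the Euclidean distance is given below (the simulated data and the helper name knn_reg are illustrative only):

```r
# Minimal K-nearest-neighbors regression (Euclidean distance)
knn_reg <- function(X_train, y_train, x0, K) {
  d <- sqrt(colSums((t(X_train) - x0)^2))  # distances from x0 to every training row
  nbrs <- order(d)[seq_len(K)]             # indices of the K closest examples
  mean(y_train[nbrs])                      # prediction: mean response over N_0
}

set.seed(1)
X <- matrix(runif(200), ncol = 2)
y <- X[, 1] + 2 * X[, 2] + rnorm(100, sd = 0.05)
knn_reg(X, y, x0 = c(0.5, 0.5), K = 5)
```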

2.4. Support Vector Machines

The support vector machine model for regression is an extension of the linear regression model that is less sensitive to outliers: by means of an $\epsilon$-insensitive term in the loss function, it uses a fixed margin to ignore the errors that fall within this margin, and it defines a hyperplane that is meant to fit the data and define the prediction.

Given a threshold $\epsilon$, the idea is to define a margin such that examples whose residuals fall within the margin do not contribute to the regression fit, while examples whose residuals fall outside the margin contribute proportionally to their magnitude. Therefore, outlying observations have a limited effect, and the examples that the model fits well have no effect on the model.
The loss function of this model is given by:
$$LF(\beta) = C \sum_{i=1}^{n} L_{\epsilon}(y_i - \hat{y}_i) + \sum_{j=1}^{p} \beta_j^2,$$

where $C$ is the cost penalty, which penalizes large residuals, $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$, and

$$L_{\epsilon}(\xi) = \begin{cases} |\xi| - \epsilon & \text{if } |\xi| > \epsilon, \\ 0 & \text{if } |\xi| \le \epsilon, \end{cases}$$

is the $\epsilon$-insensitive function. We search for the parameters $\hat{\beta}_j$ that minimize $LF(\beta)$.
A complete presentation of this model can be found in [8].
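As a small sketch, the $\epsilon$-insensitive loss and the resulting objective for a linear model can be written directly in R (the function names and the values of $C$ and $\epsilon$ are illustrative, not part of the original formulation):

```r
# epsilon-insensitive loss: zero inside the margin, linear outside it
eps_loss <- function(res, eps) pmax(abs(res) - eps, 0)

# Illustrative SVM-style objective for a linear model; the intercept is not penalized
svm_objective <- function(beta, X, y, C = 1, eps = 0.1) {
  y_hat <- as.vector(cbind(1, X) %*% beta)
  C * sum(eps_loss(y - y_hat, eps)) + sum(beta[-1]^2)
}
```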

2.5. Neural Networks

For a neural network model, the function $f$ that approximates the true relation of the data, $y = f^*(x)$, takes the form of a composition $f(x) = f^{(n)} \circ f^{(n-1)} \circ \cdots \circ f^{(1)}(x)$. In this chain structure, $f^{(i)}$ is called the $i$-th layer of the network, $n$ is the depth of the network, and $f^{(n)}$ is the output layer of the network (which in the regression setting is a real-valued function); the other layers are called hidden layers and are typically vector-valued functions. The neural network model is associated with a directed acyclic graph describing how the functions are composed together, and the idea of using many layers of vector-valued representations is that each one can learn distinct specific patterns in the data.
The training examples specify directly what the output layer must do at each point x ; that is, it must produce a value that is close to the true value y. The behavior of the hidden layers is not directly specified by the training data; instead, the learning algorithm must decide how to use these layers to best implement an approximation of the true value y.
Neural networks are usually trained using stochastic gradient descent, which involves computing the gradients of complicated functions, and the back-propagation algorithm is used to efficiently compute these gradients.
Full details of the mathematical formulation of the neural network model can be found in [8,12].

3. Regression Models for Symbolic Interval-Valued Variables

In this section, we summarize the center method and the center and range method; complete presentations can be found in [5,6,13,14,15]. We also propose approaches that combine the center method and the center and range method with each of the other regression models considered.

3.1. Center Method

In the center method, the $\beta$ parameters are estimated from the intervals' midpoints. In this method, there are predictors $X_1, \ldots, X_p$ and a response to be predicted $Y$, all of which are interval valued. Therefore, $X$ is an $n \times p$ matrix, where each row is a vector of the training dataset $x_i = (x_{i1}, \ldots, x_{ip})$ with $x_{ij} = [a_{ij}, b_{ij}]$, and each component of the variable $Y$ is also an interval, $y_i = [y_{Li}, y_{Ui}]$.

We denote by $X^c$ the matrix of the interval midpoints of $X$, that is $x_{ij}^c = (a_{ij} + b_{ij})/2$, and by $y_i^c = (y_{Li} + y_{Ui})/2$ the midpoints of $Y$. The idea of the center method is to fit a linear regression model over $X^c = ((x_1^c)^t, \ldots, (x_n^c)^t)^t$, with $(x_i^c)^t = (1, x_{i1}^c, \ldots, x_{ip}^c)$ for $i = 1, \ldots, n$, and $y^c = (y_1^c, \ldots, y_n^c)^t$. If $(X^c)^t X^c$ is nonsingular, then from Equation (5) the unique solution for $\beta$ is given by:

$$\hat{\beta} = ((X^c)^t X^c)^{-1} (X^c)^t y^c.$$

The prediction of $y = [y_L, y_U]$ for a new case $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is estimated as follows:

$$\hat{y}_L = (x^L)^t \hat{\beta} \quad \text{and} \quad \hat{y}_U = (x^U)^t \hat{\beta},$$

where $(x^L)^t = (1, a_1, \ldots, a_p)$ and $(x^U)^t = (1, b_1, \ldots, b_p)$.
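To make the procedure concrete, here is a short base-R sketch of the center method on a toy interval dataset (the data are simulated and the variable names are illustrative; this is not the RSDA implementation):

```r
set.seed(1)
n <- 60; p <- 2

# Toy interval-valued predictors and response: lower (a) and upper (b) bounds
a  <- matrix(runif(n * p, 10, 20), n, p)
b  <- a + matrix(runif(n * p, 0, 5), n, p)
yl <- 2 * a[, 1] + a[, 2] + rnorm(n)
yu <- 2 * b[, 1] + b[, 2] + rnorm(n)

# Center method: ordinary least squares on the interval midpoints
Xc <- (a + b) / 2
yc <- (yl + yu) / 2
fit_c <- lm(yc ~ Xc)

# Bounds for a new interval-valued case are predicted at its endpoints
beta_hat <- coef(fit_c)
x_new_l <- c(12, 15); x_new_u <- c(14, 18)
y_hat_L <- sum(c(1, x_new_l) * beta_hat)
y_hat_U <- sum(c(1, x_new_u) * beta_hat)
```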

3.2. Center and Range Method

With the center and range method, Lima Neto and De Carvalho fit the linear regression model for interval-valued variables using the information contained in both the midpoints and the ranges of the intervals, in order to improve upon the quality of the predictions of the center method. The idea is to fit two regression models, the first on the midpoints of the intervals and the second on the ranges of those same intervals. As in the center method, there are predictors $X_1, \ldots, X_p$ and a response $Y$, all of which are interval valued. Therefore, $X$ is an $n \times p$ matrix, where each row is a vector of the training dataset $x_i = (x_{i1}, \ldots, x_{ip})$ with $x_{ij} = [a_{ij}, b_{ij}]$, and each component of the variable $Y$ is also an interval, $y_i = [y_{Li}, y_{Ui}]$.

To fit the first regression model, we proceed in the same way as in the center method: denoting by $X^c$ the midpoint matrix ($x_{ij}^c = (a_{ij} + b_{ij})/2$) and by $y_i^c = (y_{Li} + y_{Ui})/2$ the midpoints of $Y$, the center and range method fits a first linear regression model over $X^c = ((x_1^c)^t, \ldots, (x_n^c)^t)^t$, with $(x_i^c)^t = (1, x_{i1}^c, \ldots, x_{ip}^c)$ for $i = 1, \ldots, n$, and $y^c = (y_1^c, \ldots, y_n^c)^t$. In this case, if $(X^c)^t X^c$ is nonsingular, then we know that the unique solution for $\beta^c$ is given by:

$$\hat{\beta}^c = ((X^c)^t X^c)^{-1} (X^c)^t y^c.$$

To fit the second regression model, half of the range of each interval is used. For this, we denote by $X^r$ the matrix that contains in each component half of the interval range of the corresponding entry of $X$, i.e., $x_{ij}^r = (b_{ij} - a_{ij})/2$, and by $y_i^r = (y_{Ui} - y_{Li})/2$ the half-ranges of the interval-valued variable $Y$. The center and range method then fits a second linear regression model over $X^r = ((x_1^r)^t, \ldots, (x_n^r)^t)^t$, with $(x_i^r)^t = (1, x_{i1}^r, \ldots, x_{ip}^r)$ for $i = 1, \ldots, n$, and $y^r = (y_1^r, \ldots, y_n^r)^t$. In this case, if $(X^r)^t X^r$ is nonsingular, then from Equation (5) the solution for $\beta^r$ is given by:

$$\hat{\beta}^r = ((X^r)^t X^r)^{-1} (X^r)^t y^r,$$
so each case in the training dataset is represented by two vectors, $w_i = (x_i^c, y_i^c)$ and $r_i = (x_i^r, y_i^r)$, for $i = 1, \ldots, n$. The prediction of $y = [y_L, y_U]$ for a new case $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is then estimated as follows:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = (x^c)^t \hat{\beta}^c \quad \text{and} \quad \hat{y}^r = (x^r)^t \hat{\beta}^r,$$

where $(x^c)^t = (1, x_1^c, \ldots, x_p^c)$ and $(x^r)^t = (1, x_1^r, \ldots, x_p^r)$.

This model cannot mathematically guarantee that $\hat{y}_{Li} \le \hat{y}_{Ui}$ for all $i = 1, \ldots, n$, a problem addressed by Lima Neto and De Carvalho in [6].
Extensions to this model include shrinkage linear regression models for symbolic interval-valued data, which involve a generalization of the ridge and lasso models for interval-valued data. A complete presentation of these models can be found in [7].
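For concreteness, the following base-R sketch fits the two ordinary least-squares models of the basic center and range method on a toy interval dataset (simulated data and illustrative variable names; not the RSDA implementation):

```r
set.seed(1)
n <- 60; p <- 2
a  <- matrix(runif(n * p, 10, 20), n, p)
b  <- a + matrix(runif(n * p, 0, 5), n, p)
yl <- 2 * a[, 1] + a[, 2] + rnorm(n)
yu <- 2 * b[, 1] + b[, 2] + rnorm(n)

# Center and range method: one OLS fit on midpoints, one on half-ranges
Xc <- (a + b) / 2;   Xr <- (b - a) / 2
yc <- (yl + yu) / 2; yr <- (yu - yl) / 2
fit_c <- lm(yc ~ Xc)
fit_r <- lm(yr ~ Xr)

# Prediction for a new interval-valued case
x_new_l <- c(12, 15); x_new_u <- c(14, 18)
xc_new  <- (x_new_l + x_new_u) / 2
xr_new  <- (x_new_u - x_new_l) / 2
y_hat_c <- sum(c(1, xc_new) * coef(fit_c))
y_hat_r <- sum(c(1, xr_new) * coef(fit_r))
y_hat_L <- y_hat_c - y_hat_r
y_hat_U <- y_hat_c + y_hat_r
```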

3.3. Tree-Based Regression for Symbolic Interval-Valued Variables

3.3.1. Regression Trees Center Method

We begin by dividing the predictor space of interval midpoints in $\mathbb{R}^n$ into $J$ non-overlapping regions $R_1^c, \ldots, R_J^c$.

We search for a partition of the predictor space into boxes $R_j^c$ that minimizes $\mathrm{RSS}^c$, which can be written as:

$$\mathrm{RSS}^c = \sum_{j=1}^{J} \sum_{i \in R_j^c} \left( y_i^c - \hat{y}_{R_j^c} \right)^2.$$
To construct the tree, we first consider all the predictor variables and all the possible binary splits of their values. If the variable is numerical, all the possible split values $s$ in the range of the variable are considered, and if the variable is categorical, we consider all possible partitions of its values into two sets. We then select the variable and the split of that variable that lead to the greatest reduction in $\mathrm{RSS}^c$. This produces two regions corresponding to two branches of the tree. We then continue looking for the best predictor and the best partition in each of the resulting regions, and the process ends when a stopping criterion is reached.
Once the regions $R_1^c, \ldots, R_J^c$ have been created, the model can be used to predict the response of a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ as:

$$\hat{y}_L = \sum_{j=1}^{J} c_j \, 1\{x^L \in R_j^c\} \quad \text{and} \quad \hat{y}_U = \sum_{j=1}^{J} c_j \, 1\{x^U \in R_j^c\},$$

where $c_j$ is the mean of the response centers of the training observations in $R_j^c$, $x^L = (a_1, \ldots, a_p)$, and $x^U = (b_1, \ldots, b_p)$.

3.3.2. Regression Tree Center and Range Method

We begin by dividing the predictor space of interval midpoints in $\mathbb{R}^n$ into $J$ non-overlapping regions $R_1^c, \ldots, R_J^c$, and by dividing the predictor space of interval ranges in $\mathbb{R}^n$ into $L$ non-overlapping regions $R_1^r, \ldots, R_L^r$.

In this case, we search for partitions of the predictor spaces of centers and ranges into boxes $R_j^c$ and $R_j^r$ that minimize $\mathrm{RSS}^c$ and $\mathrm{RSS}^r$, respectively, which can be written as:

$$\mathrm{RSS}^c = \sum_{j=1}^{J} \sum_{i \in R_j^c} \left( y_i^c - \hat{y}_{R_j^c} \right)^2 \quad \text{and} \quad \mathrm{RSS}^r = \sum_{j=1}^{L} \sum_{i \in R_j^r} \left( y_i^r - \hat{y}_{R_j^r} \right)^2.$$
To construct each tree, we first consider all the predictor variables and all the possible binary splits of their values. If the variable is numerical, all the possible split values $s$ in the range of the variable are considered, and if the variable is categorical, we consider all possible partitions of its values into two sets. We then select the variable and the split of that variable that lead to the greatest reduction in $\mathrm{RSS}^c$ and $\mathrm{RSS}^r$, respectively. This produces two regions corresponding to two branches of the tree. We then continue looking for the best predictor and the best partition in each of the resulting regions, and the process ends when a stopping criterion is reached.
Once the regions $R_1^c, \ldots, R_J^c$ and $R_1^r, \ldots, R_L^r$ have been created, the model can be used to predict the response of a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ as:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = \sum_{j=1}^{J} c_j^c \, 1\{x^c \in R_j^c\} \quad \text{and} \quad \hat{y}^r = \sum_{j=1}^{L} c_j^r \, 1\{x^r \in R_j^r\},$$

where $c_j^c$ and $c_j^r$ are the means of the response centers and ranges of the training observations in $R_j^c$ and $R_j^r$, respectively.

3.3.3. Random Forest Center Method

In this method, the idea is to build a given number $M$ of regression trees, $T_j^c$, on bootstrap sets of the center data, using in each tree a random subset of $m$ of the original predictor variables in $X^c$.

In this case, the prediction for a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is the mean over the predictions of the individual trees:

$$\hat{y}_L = \frac{1}{M} \sum_{j=1}^{M} T_j^c(x^L) \quad \text{and} \quad \hat{y}_U = \frac{1}{M} \sum_{j=1}^{M} T_j^c(x^U),$$

where $x^L = (a_1, \ldots, a_p)$ and $x^U = (b_1, \ldots, b_p)$.
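A hedged sketch of this idea using the randomForest package (assumed here; the toy data and variable names are illustrative) is:

```r
library(randomForest)

set.seed(1)
n <- 200; p <- 3
a  <- matrix(runif(n * p, 0, 10), n, p)
b  <- a + matrix(runif(n * p, 0, 3), n, p)
yl <- a[, 1]^2 - a[, 2] + rnorm(n)
yu <- b[, 1]^2 - b[, 2] + rnorm(n)

# Random forest fitted on the interval midpoints (center method)
Xc <- (a + b) / 2
colnames(Xc) <- paste0("x", 1:p)
yc <- (yl + yu) / 2
rf_c <- randomForest(x = Xc, y = yc, ntree = 500)

# Bounds predicted by evaluating the center forest at the interval endpoints
new_l <- matrix(c(2, 5, 1), nrow = 1, dimnames = list(NULL, colnames(Xc)))
new_u <- matrix(c(3, 6, 2), nrow = 1, dimnames = list(NULL, colnames(Xc)))
y_hat_L <- predict(rf_c, newdata = new_l)
y_hat_U <- predict(rf_c, newdata = new_u)
```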

3.3.4. Random Forest Center and Range Method

With this method, the idea is to build a given number $M$ of regression trees, $T_j^c$, on bootstrap sets of the center data and a given number $L$ of regression trees, $T_j^r$, on bootstrap sets of the range data, using in each tree a random subset of $m$ of the original predictor variables in $X^c$ and $X^r$, respectively.

In this case, the prediction for a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is given by:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = \frac{1}{M} \sum_{j=1}^{M} T_j^c(x^c) \quad \text{and} \quad \hat{y}^r = \frac{1}{L} \sum_{j=1}^{L} T_j^r(x^r).$$

3.3.5. Boosting Center Method

We construct trees sequentially on repeatedly modified versions of the training center data, thereby producing a sequence of regression models, $G_j^c$, whose predictions are then combined and weighted according to the errors they produce.

Initially, all $N$ training examples $(x_i^c, y_i^c)$ have the same weight $w_i^c = 1/N$. In each subsequent step of the training process, the data are modified by updating these weights $w_i^c$. At step $m$, the observations with a higher error under the model $G_{m-1}^c(x)$ induced at the previous step have their weights increased, whereas the weights of those with a lower error are decreased.

In this case, the prediction for a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is a weighted combination of the predictions of the individual trees:

$$\hat{y}_L = \sum_{j=1}^{M} \alpha_j^c G_j^c(x^L) \quad \text{and} \quad \hat{y}_U = \sum_{j=1}^{M} \alpha_j^c G_j^c(x^U),$$

where $\alpha_j^c$ measures the error of the $j$-th model, $x^L = (a_1, \ldots, a_p)$, and $x^U = (b_1, \ldots, b_p)$.

3.3.6. Boosting Center and Range Method

We construct trees sequentially on repeatedly modified versions of the training center and range data, thereby producing two sequences of regression models, $G_j^c$ and $G_j^r$, whose predictions are then combined and weighted according to the errors they produce.

Initially, all $N$ center training examples $(x_i^c, y_i^c)$ have the same weight $w_i^c = 1/N$, and the same applies to the range training examples. In each subsequent step of the training process, the data are modified by updating these weights $w_i^c$ and $w_i^r$. At step $m$, the observations with a higher error under the models $G_{m-1}^c(x)$ and $G_{m-1}^r(x)$ induced at the previous step have their weights increased, whereas the weights of those with a lower error are decreased.

In this case, the prediction for a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is given by:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = \sum_{j=1}^{M} \alpha_j^c G_j^c(x^c) \quad \text{and} \quad \hat{y}^r = \sum_{j=1}^{M} \alpha_j^r G_j^r(x^r),$$

where $\alpha_j^c$ measures the error of the $j$-th center model and $\alpha_j^r$ measures the error of the $j$-th range model.

3.4. K-Nearest Neighbors Center Method

Given a number $K$ and a test example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ in the symbolic dataset, we identify the set $N_0$ of the $K$ training examples nearest to $x^c$. The prediction $\hat{y}$ of the response variable for $x$ is given by:

$$\hat{y}_L = \frac{1}{K} \sum_{x_i \in N_0} (y_i)_L \quad \text{and} \quad \hat{y}_U = \frac{1}{K} \sum_{x_i \in N_0} (y_i)_U.$$

3.5. K-Nearest Neighbors Center and Range Method

Given numbers $K^c$ and $K^r$ and a test example $x$ in the symbolic dataset, we identify the sets $N^c$ and $N^r$ of the $K^c$ and $K^r$ training examples nearest to $x^c$ and $x^r$, respectively. The prediction $\hat{y}$ of the response variable for $x$ is given by:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = \frac{1}{K^c} \sum_{x_i^c \in N^c} y_i^c \quad \text{and} \quad \hat{y}^r = \frac{1}{K^r} \sum_{x_i^r \in N^r} y_i^r.$$

3.6. Support Vector Machines Center Method

Given a threshold $\epsilon^c$, the idea is to define a margin such that examples whose residuals fall within the margin do not contribute to the regression fit, while examples whose residuals fall outside the margin contribute proportionally to their magnitude.

The loss function of this model is given by:

$$LF^c(\beta) = C \sum_{i=1}^{n} L_{\epsilon^c}(y_i^c - \hat{y}_i^c) + \sum_{j=1}^{p} \beta_j^2,$$

where:

$$L_{\epsilon^c}(\xi) = \begin{cases} |\xi| - \epsilon^c & \text{if } |\xi| > \epsilon^c, \\ 0 & \text{if } |\xi| \le \epsilon^c. \end{cases}$$
We search for the parameters $\hat{\beta}^c$ that minimize $LF^c(\beta)$, so the prediction of $y = [y_L, y_U]$ for a new case $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is estimated as follows:

$$\hat{y}_L = (x^L)^t \hat{\beta}^c \quad \text{and} \quad \hat{y}_U = (x^U)^t \hat{\beta}^c,$$

where $(x^L)^t = (1, a_1, \ldots, a_p)$ and $(x^U)^t = (1, b_1, \ldots, b_p)$.
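Section 3.6 is written in terms of a linear hyperplane, whereas the experiments in Section 4 use a radial kernel; the following hedged sketch, which assumes the e1071 package and uses illustrative toy data, fits an eps-regression SVM on the midpoints and predicts the bounds at the interval endpoints:

```r
library(e1071)

set.seed(1)
n <- 150; p <- 2
a  <- matrix(runif(n * p, 0, 10), n, p)
b  <- a + matrix(runif(n * p, 0, 2), n, p)
yl <- 3 * a[, 1] - a[, 2] + rnorm(n)
yu <- 3 * b[, 1] - b[, 2] + rnorm(n)

# eps-regression SVM fitted on the interval midpoints (radial kernel as in Section 4)
Xc <- (a + b) / 2
colnames(Xc) <- c("x1", "x2")
yc <- (yl + yu) / 2
svm_c <- svm(x = Xc, y = yc, type = "eps-regression", kernel = "radial")

# Bounds predicted at the lower and upper endpoints of a new interval case
new_l <- matrix(c(4, 7), nrow = 1, dimnames = list(NULL, colnames(Xc)))
new_u <- matrix(c(5, 8), nrow = 1, dimnames = list(NULL, colnames(Xc)))
y_hat_L <- predict(svm_c, new_l)
y_hat_U <- predict(svm_c, new_u)
```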

3.7. Support Vector Machines Center and Range Method

Given thresholds $\epsilon^c$ and $\epsilon^r$, we define margins such that examples whose residuals fall within these margins do not contribute to the regression fit, while examples whose residuals fall outside the margins contribute proportionally to their magnitude.

The loss functions of the center and range models are given by:

$$LF^c(\beta) = C \sum_{i=1}^{n} L_{\epsilon^c}(y_i^c - \hat{y}_i^c) + \sum_{j=1}^{p} \beta_j^2 \quad \text{and} \quad LF^r(\beta) = C \sum_{i=1}^{n} L_{\epsilon^r}(y_i^r - \hat{y}_i^r) + \sum_{j=1}^{p} \beta_j^2,$$

where:

$$L_{\epsilon}(\xi) = \begin{cases} |\xi| - \epsilon & \text{if } |\xi| > \epsilon, \\ 0 & \text{if } |\xi| \le \epsilon. \end{cases}$$
We search for the parameters $\hat{\beta}^c$ and $\hat{\beta}^r$ that minimize $LF^c(\beta)$ and $LF^r(\beta)$, respectively, so the prediction of $y = [y_L, y_U]$ for a new case $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is estimated as:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = (x^c)^t \hat{\beta}^c \quad \text{and} \quad \hat{y}^r = (x^r)^t \hat{\beta}^r,$$

where $(x^c)^t = (1, x_1^c, \ldots, x_p^c)$ and $(x^r)^t = (1, x_1^r, \ldots, x_p^r)$.

3.8. Neural Networks Center Method

We consider a neural network model on the center data; the function $f^c$ that approximates the true relation of these data, $y = f^c(x)$, takes the form of a composition $f^c(x) = f_n^c \circ f_{n-1}^c \circ \cdots \circ f_1^c(x)$.

In this case, the prediction for a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is given by:

$$\hat{y}_L = f_n^c \circ f_{n-1}^c \circ \cdots \circ f_1^c(x^L) \quad \text{and} \quad \hat{y}_U = f_n^c \circ f_{n-1}^c \circ \cdots \circ f_1^c(x^U),$$

where $x^L = (a_1, \ldots, a_p)$ and $x^U = (b_1, \ldots, b_p)$.
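The single-hidden-layer configuration listed in Section 4 (10 neurons, threshold 0.05, at most $10^5$ training steps) matches the interface of the neuralnet package, which is assumed in the following sketch on illustrative simulated data:

```r
library(neuralnet)

set.seed(1)
n  <- 200
a  <- matrix(runif(n * 2), n, 2)
b  <- a + matrix(runif(n * 2, 0, 0.2), n, 2)
dc <- data.frame(x1 = (a[, 1] + b[, 1]) / 2, x2 = (a[, 2] + b[, 2]) / 2)
dc$yc <- sin(2 * pi * dc$x1) + dc$x2 + rnorm(n, sd = 0.05)

# One hidden layer with 10 neurons fitted on the midpoints (center method)
nn_c <- neuralnet(yc ~ x1 + x2, data = dc, hidden = 10,
                  threshold = 0.05, stepmax = 1e5, linear.output = TRUE)

# Bounds predicted at the lower and upper endpoints of a new interval case
new_l <- data.frame(x1 = 0.30, x2 = 0.40)
new_u <- data.frame(x1 = 0.35, x2 = 0.55)
y_hat_L <- compute(nn_c, new_l)$net.result
y_hat_U <- compute(nn_c, new_u)$net.result
```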

3.9. Neural Networks Center and Range Method

We consider two neural network models, one for the center data and another for the range data. The function $f^c$ that approximates the true relation of the center data takes the form of a composition $f^c(x) = f_n^c \circ f_{n-1}^c \circ \cdots \circ f_1^c(x)$, and analogously $f^r(x) = f_n^r \circ f_{n-1}^r \circ \cdots \circ f_1^r(x)$ for the range data.

In this case, the prediction for a new example $x = (x_1, \ldots, x_p)$ with $x_j = [a_j, b_j]$ is given by:

$$\hat{y}_L = \hat{y}^c - \hat{y}^r \quad \text{and} \quad \hat{y}_U = \hat{y}^c + \hat{y}^r,$$

with:

$$\hat{y}^c = f_n^c \circ f_{n-1}^c \circ \cdots \circ f_1^c(x^c) \quad \text{and} \quad \hat{y}^r = f_n^r \circ f_{n-1}^r \circ \cdots \circ f_1^r(x^r).$$

4. Experimental Evaluation

As done by Lima Neto and De Carvalho in [6], the evaluation of the results of these interval-valued regression models was carried out using the following metrics: the lower boundary root-mean-squared error $\mathrm{RMSE}_L$, the upper boundary root-mean-squared error $\mathrm{RMSE}_U$, the square of the lower boundary correlation coefficient $r_L^2$, and the square of the upper boundary correlation coefficient $r_U^2$:

$$\mathrm{RMSE}_L = \sqrt{\frac{\sum_{i=1}^{n} (y_{Li} - \hat{y}_{Li})^2}{n}} \quad \text{and} \quad \mathrm{RMSE}_U = \sqrt{\frac{\sum_{i=1}^{n} (y_{Ui} - \hat{y}_{Ui})^2}{n}},$$

$$r_L^2 = \left( \frac{\mathrm{Cov}(y_L, \hat{y}_L)}{S_{y_L} S_{\hat{y}_L}} \right)^2 \quad \text{and} \quad r_U^2 = \left( \frac{\mathrm{Cov}(y_U, \hat{y}_U)}{S_{y_U} S_{\hat{y}_U}} \right)^2,$$

where $y_i = [y_{Li}, y_{Ui}]$ and its corresponding prediction is $\hat{y}_i = [\hat{y}_{Li}, \hat{y}_{Ui}]$, $y_L = (y_{L1}, \ldots, y_{Ln})^t$, $\hat{y}_L = (\hat{y}_{L1}, \ldots, \hat{y}_{Ln})^t$, $y_U = (y_{U1}, \ldots, y_{Un})^t$, and $\hat{y}_U = (\hat{y}_{U1}, \ldots, \hat{y}_{Un})^t$; as usual, $\mathrm{Cov}(\Psi, \Phi)$ denotes the covariance between the variables $\Psi$ and $\Phi$, and $S_\Psi$ denotes the standard deviation of the variable $\Psi$.
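These metrics are straightforward to compute; a small R sketch with hypothetical bounds and predictions (illustrative values only) is:

```r
# Interval evaluation metrics used in Section 4
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))
r2   <- function(y, y_hat) (cov(y, y_hat) / (sd(y) * sd(y_hat)))^2

# Hypothetical lower/upper bounds and their predictions
yL <- c(44, 60, 56); yL_hat <- c(50, 58, 60)
yU <- c(68, 72, 90); yU_hat <- c(66, 75, 85)
c(RMSE_L = rmse(yL, yL_hat), RMSE_U = rmse(yU, yU_hat),
  r2_L = r2(yL, yL_hat), r2_U = r2(yU, yU_hat))
```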
For the experimental evaluation, we used the following hyperparameters for the models.
  • Lasso and ridge: number of folds used in K-fold cross-validation to find the best tuning parameter λ: 10.
  • Regression trees: minimum number of observations in a node: 20; maximum depth of any node of the final tree: 10.
  • Random forest: number of trees to grow: 500; number of variables randomly sampled at each split: p/3, where p is the total number of variables.
  • Boosting: number of trees to grow: 500; shrinkage parameter applied to each tree in the expansion: 0.1; highest level of variable interactions allowed: 1.
  • K-nearest neighbors: maximum number of neighbors considered: 20; kernel used: triangular.
  • Support vector machines: kernel used: radial.
  • Neural networks: number of layers: 1; number of neurons: 10; threshold used as the stopping criterion: 0.05; maximum number of steps for the training process: 10^5.
All examples presented in this section were developed using the package RSDA (R for symbolic data analysis) constructed by the authors of this paper for applications in symbolic data analysis (see [16]). The datasets used in this section are also available in the RSDA package. A reproducible document with all the R code for the examples can be downloaded from http://www.oldemarrodriguez.com/publicaciones.html, accessed on 29 March 2021.

4.1. Cardiological Interval Dataset

To illustrate the application of the methods, we considered the data based on [17] and taken from [3], shown in Table 1, in which the pulse rate (Pulse), systolic blood pressure (Syst), diastolic blood pressure (Diast), Art1, and Art2 were recorded as an interval for each patient, where Art1 and Art2 are artificial variables added to the data table. The goal was to predict the pulse rate (Pulse). The dataset consisted of 44 rows and 5 attributes, and Table 1 presents a glimpse of it. To measure the performance of the models, we split the dataset into training and testing sets, using 70% and 30% of the rows, respectively.
After closer inspection of Table 2, it can be seen that the neural network had the best metrics for RMSE L , r L 2 , and r U 2 , and had the second-best RMSE U , the random forest center and range model being the best one. As a result, the neural network model was best for this dataset.
As mentioned before, the neural network center and range model consists of two neural networks, each with 1 layer and 10 neurons; this selection was made to avoid overfitting the small dataset. This simple neural network model was able to capture the relation in the data better than the other models, and there was a large difference in performance, even when compared with the second-best model.
In Table 2, the boosting method was omitted because the dataset is too small for that model to be fitted.

4.2. Monte Carlo Experiments and Applications

The usefulness of the methods proposed in this paper was evaluated through experiments with synthetic interval-valued datasets with different linear and nonlinear configurations.

4.2.1. Synthetic Linear Symbolic Interval Datasets

We followed the approach of Lima Neto and De Carvalho in [14] and constructed a symbolic interval dataset with the center and range values of the intervals simulated according to a linear relationship.
Our goal was to construct a symbolic dataset with 375 rows and 4 variables. We used 250 rows as the learning set and 125 as the test set. We followed these steps to construct the synthetic dataset.
  • The random variable $X_j^c$ was uniformly distributed on $[20, 40]$.
  • The random variable $Y^c$ was related to the random variables $X_j^c$ according to $Y^c = X^c \beta + \epsilon$, where $X^c = (1, X_1^c, X_2^c, X_3^c)$, $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$ with $\beta_j \sim U[-10, 10]$, and $\epsilon \sim U[a, b]$.
  • The random variable $Y^r$ was related to the variable $Y^c$ according to $Y^r = Y^c \beta^* + \epsilon^*$; in the same manner, $X_j^r$ was related to $X_j^c$ according to $X_j^r = X_j^c \beta^* + \epsilon^*$, where $\beta^* \sim U[0.5, 1.5]$ and $\epsilon^* \sim U[i, j]$ (a generation sketch follows this list).
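To make the recipe concrete, the following R sketch generates one dataset of this form under configuration D1 (the exact simulation details of the original study, such as whether $\beta^*$ is shared across variables, are assumptions here):

```r
set.seed(1)
n <- 375; p <- 3

# Centers: X_j^c ~ U[20, 40]; Y^c linear in X^c with U[-10, 10] coefficients
Xc   <- matrix(runif(n * p, 20, 40), n, p)
beta <- runif(p + 1, -10, 10)
Yc   <- cbind(1, Xc) %*% beta + runif(n, -20, 20)        # epsilon ~ U[-20, 20] (D1)

# Half-ranges: linearly related to the corresponding centers
beta_star <- runif(1, 0.5, 1.5)                          # assumed common beta*
Xr <- Xc * beta_star + matrix(runif(n * p, 1, 5), n, p)  # epsilon* ~ U[1, 5] (D1)
Yr <- Yc * beta_star + runif(n, 1, 5)

# Recover the interval bounds from centers and half-ranges
Xa <- Xc - Xr; Xb <- Xc + Xr
Ya <- Yc - Yr; Yb <- Yc + Yr
```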
Table 3 shows the different configurations used for the values of $a, b$ and $i, j$.
As per Lima Neto and De Carvalho, the configurations in Table 3 take into account the error at the midpoints combined with the degree of dependence between the midpoints and ranges, with two levels of variability: a high-variability error $\epsilon \sim U[-20, 20]$, a low-variability error $\epsilon \sim U[-5, 5]$, a high degree of dependence $\epsilon^* \sim U[1, 5]$, and a low degree of dependence $\epsilon^* \sim U[10, 20]$.
We present the results for all the datasets.
For dataset D1, we were able to see that the lasso, ridge, and LM center and range models were the best.
For dataset D2, the lasso, ridge, and LM center and range models were the best, and the KNN showed the next best performance.
Once again, for dataset D3, the lasso, ridge, and LM center and range models were the best, without any close competitors.
For dataset D4, the lasso, ridge, LM, and the boost center and range model were the best. Therefore, the boost model was able to perform well in this dataset.
Note that in Table 4, Table 5, Table 6, and Table 7, some models have NA for $r_L^2$ or $r_U^2$; this is because, for those models, either the prediction $\hat{y}_L$ or $\hat{y}_U$ had zero standard deviation, and thus the square of the correlation coefficient could not be calculated.

4.2.2. Synthetic Nonlinear Symbolic Interval Datasets

We considered a synthetic interval-valued dataset with a nonlinear relation between the response variable and the explanatory variables for the midpoint and range of the intervals.
In order to test the predictive power of the proposed methods, we followed the approach of Lima Neto and De Carvalho in [18] and constructed a symbolic interval dataset with the values of the center and range of the intervals simulated according to a nonlinear relationship between the response variable and the explanatory variables.
Our goal was to construct a symbolic dataset with 3000 rows and 4 variables and to compare the methods using a K-fold cross-validation scheme with 10 folds. We followed these steps to construct the synthetic dataset.
  • The center random variables $X_j^c$ were uniformly distributed on the interval $[-6, 6]$; that is, $X_j^c \sim U[-6, 6]$.
  • The center random variable $Y^c$ was related to the random variables $X_j^c$ according to the logistic function:

$$Y_i^c = \frac{\theta_0^c}{\theta_1^c + e^{\theta_2^c X_{1i}^c + \cdots + \theta_{n+1}^c X_{ni}^c}} + \epsilon_i^c,$$

    where $X_{ji}^c$ and $Y_i^c$ are the $i$th entries of the variables $X_j^c$ and $Y^c$, respectively, $\theta_0^c \sim U[1.9, 2.1]$, $\theta_1^c \sim U[2.9, 3.1]$, and $\theta_m^c \sim U[0.9, 1.1]$ for $m = 2, \ldots, n+1$, and the error component $\epsilon_i^c$ is normally distributed, $\epsilon_i^c \sim N(0, 0.05)$.
  • The range random variables $X_j^r$ were uniformly distributed on the interval $[1, 4]$; that is, $X_j^r \sim U[1, 4]$.
  • The range random variable $Y^r$ was related to the random variables $X_j^r$ according to the exponential function:

$$Y_i^r = \theta_0^r + e^{-(\theta_1^r X_{1i}^r + \cdots + \theta_n^r X_{ni}^r)} + \epsilon_i^r,$$

    where $X_{ji}^r$ and $Y_i^r$ are the $i$th entries of the variables $X_j^r$ and $Y^r$, respectively, $\theta_0^r \sim U[0, 0.5]$ and $\theta_m^r \sim U[0.9, 1.1]$ for $m = 1, \ldots, n$, and the error component $\epsilon_i^r$ is normally distributed, $\epsilon_i^r \sim N(0, 0.01)$ (a generation sketch follows this list).
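As with the linear case, the following R sketch generates a dataset of this form with three explanatory variables (the sign of the exponent in the reconstructed formulas and the other simulation details are assumptions, not the original code):

```r
set.seed(1)
n <- 3000; k <- 3   # number of explanatory variables

# Centers: logistic relation between Y^c and X^c
Xc <- matrix(runif(n * k, -6, 6), n, k)
theta0_c <- runif(1, 1.9, 2.1); theta1_c <- runif(1, 2.9, 3.1)
theta_c  <- runif(k, 0.9, 1.1)
Yc <- theta0_c / (theta1_c + exp(Xc %*% theta_c)) + rnorm(n, 0, 0.05)

# Half-ranges: exponential relation between Y^r and X^r
Xr <- matrix(runif(n * k, 1, 4), n, k)
theta0_r <- runif(1, 0, 0.5)
theta_r  <- runif(k, 0.9, 1.1)
Yr <- theta0_r + exp(-(Xr %*% theta_r)) + rnorm(n, 0, 0.01)

# Interval bounds for the response
Ya <- Yc - Yr; Yb <- Yc + Yr
```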
As per Lima Neto and De Carvalho, the previous configuration had a high nonlinearity degree at the midpoints and a high nonlinearity degree in the ranges, as defined by the error components.
We here present the results for the synthetic dataset.
We note from Table 8 that every CRM method outperformed all of the CM methods. On closer inspection, we noticed that the neural network CRM model had the best cross-validation performance among all methods, and we observed that the proposed nonlinear models generally had better evaluation metrics. The second best method was the KNN CRM model, followed by the SVM CRM model.
Table 9 shows the standard deviation of the methods among the 10 folds.
As we can see from Table 9, the standard deviation was low for all methods among the folds, which gave us confidence in our results.

5. Conclusions and Future Work

In this paper, we proposed 12 new methods to fit regression models to interval-valued variables, all based on the central idea of fitting regression models to the centers and ranges of the intervals and extending the ideas of nonlinear methods for real-valued data. We presented new approaches to fit regression models for symbolic interval-valued variables, which were shown to improve and extend the center method and the center and range method proposed by Lima Neto and De Carvalho in [6,13,14,18].
In the experimental evaluation, we found that the use of nonlinear methods greatly improved the prediction results in the regression problems. With the cardiological dataset, a simple neural network was able to radically improve the predictions in comparison to the other methods, especially those based on linear methods. In the Monte Carlo experiments, as expected, the linear models were the best on the synthetic linear symbolic interval datasets, and only the boosting center and range model came close in performance on one of the datasets. On the synthetic nonlinear symbolic interval data, we observed the real power and advantages of the nonlinear framework for regression with interval-valued data; in particular, we saw the benefit of using neural networks, as this was once again the model that best captured the underlying structure of the data when combined with the center and range approach. Compared with the linear models, the neural network center and range model had an $\mathrm{RMSE}_L$ and an $\mathrm{RMSE}_U$ of about 0.051, less than half of the approximately 0.141 obtained by the center and range models based on linear methods; similarly, the neural network had an $r_L^2$ and an $r_U^2$ of about 0.97, much higher than the squared correlation coefficients of about 0.77 obtained by the models based on linear methods. In addition, we note that almost all of the proposed models outperformed the classical models based on linear approaches and that these results did not involve hyperparameter tuning, which in turn represents an opportunity to further improve the results.
Based on these results, we not only achieved our goal of extending the tool kit of regression models for interval-valued datasets, which had focused on linear methods, but we also demonstrated the predictive advantages of moving to nonlinear methods. This is relevant because, in real-life applications, data rarely follow a linear structure.
The methods proposed here, just like the original center method, have the problem explained by Lima Neto and De Carvalho in [6]: it cannot be mathematically guaranteed that $\hat{y}_L \le \hat{y}_U$. In future work, we will apply the idea proposed in [6], which consists of generating the regression models under certain restrictions that guarantee that the methods satisfy this condition. This was not included in this paper because it could have caused confusion in the results, since it would not have been clear whether improvements in the predictions were due to the application of shrinkage or to the application of restrictions in the regression methods.

Author Contributions

The authors contributed equally to this work. Both the authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Diday, E. Introduction à l'approche symbolique en Analyse des Données. In Premières Journées Symbolique-Numérique; Université Paris IX Dauphine: Paris, France, 1987.
  2. Billard, L.; Diday, E. From the statistics of data to the statistics of knowledge: Symbolic data analysis. J. Am. Stat. Assoc. 2003, 98, 470–487.
  3. Billard, L.; Diday, E. Symbolic Data Analysis: Conceptual Statistics and Data Mining; John Wiley & Sons Ltd.: Letchworth, UK, 2006.
  4. Bock, H.-H.; Diday, E. (Eds.) Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data; Springer: Berlin, Germany, 2000.
  5. Billard, L.; Diday, E. Regression analysis for interval-valued data. In Data Analysis, Classification and Related Methods; Springer: Berlin, Germany, 2000; pp. 369–374.
  6. Lima-Neto, E.A.; De Carvalho, F.A.T. Constrained linear regression models for symbolic interval-valued variables. Comput. Stat. Data Anal. 2010, 54, 333–347.
  7. Rodríguez, O. Shrinkage Linear Regression for Symbolic Interval-Valued Variables. 2018. Available online: https://editions-rnti.fr/?inprocid=1002533 (accessed on 29 March 2021).
  8. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction; Springer: New York, NY, USA, 2008.
  9. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
  10. Izonin, I.; Tkachenko, R.; Kryvinska, N.; Greguš, M. Multiple linear regression based on coefficients identification using non-iterative SGTM neural-like structure. In Proceedings of the International Work-Conference on Artificial Neural Networks, Cadiz, Spain, 14–16 June 2019; Springer: Cham, Switzerland, 2019.
  11. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
  12. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: London, UK, 2016.
  13. Lima Neto, E.d.A.; de Carvalho, F.d.A.T. An exponential-type kernel robust regression model for interval-valued variables. Inf. Sci. 2018, 454, 419–442.
  14. Lima-Neto, E.A.; De Carvalho, F.A.T. Centre and range method for fitting a linear regression model to symbolic interval data. Comput. Stat. Data Anal. 2008, 52, 1500–1515.
  15. Rodríguez, O. Classification et Modèles Linéaires en Analyse des Données Symboliques. Ph.D. Thesis, Paris IX-Dauphine University, Paris, France, 2000.
  16. Rodríguez, O.; Agüero, C.; Arce, J.; Chacón, J.E. RSDA: R to Symbolic Data Analysis. R Package Version 1.2. 2020. Available online: http://CRAN.R-project.org/package=RSDA (accessed on 29 March 2021).
  17. Raju, S.R.K. Symbolic Data Analysis in Cardiology. In Symbolic Data Analysis and Its Applications; Diday, E., Gowda, K.C., Eds.; Ceremade, Université Paris-Dauphine: Paris, France, 1997; pp. 245–249.
  18. Lima Neto, E.d.A.; de Carvalho, F.d.A.T. Nonlinear regression applied to interval-valued data. Pattern Anal. Appl. 2017, 20, 809–824.
Table 1. Cardiological interval dataset. Syst, systolic; Diast, diastolic; Art, artificial.

      Pulse        Syst          Diast        Art1       Art2
1     [44, 68]     [90, 100]     [50, 70]     [6, 9]     [1, 6]
2     [60, 72]     [90, 130]     [70, 90]     [2, 9]     [4, 6]
3     [56, 90]     [140, 180]    [90, 100]    [9, 10]    [4, 7]
4     [70, 112]    [110, 142]    [80, 108]    [8, 9]     [5, 5]
5     [54, 72]     [90, 100]     [50, 70]     [5, 7]     [6, 9]
…
44    [65, 67]     [90, 150]     [78, 90]     [8, 9]     [7, 9]
Table 2. Performance of the methods in the cardiological interval dataset.

Method       RMSE_L    RMSE_U    r_L^2     r_U^2
LM CM        15.8464   13.7108   0.1744    0.2238
Ridge CM     16.8055   13.5504   0.2234    0.2456
Lasso CM     17.2616   13.9484   0.0358    0.1652
RT CM        18.6783   15.9769   0.0055    0.0765
RF CM        15.0495   13.9372   0.3548    0.2406
KNN CM       16.2914   12.9731   0.3370    0.4125
SVM CM       15.5756   13.4378   0.4358    0.4307
NNet CM       9.2535   13.9276   0.6602    0.5340
LM CRM       12.3913   13.5961   0.2989    0.2191
Ridge CRM    12.5081   13.0821   0.3689    0.2368
Lasso CRM    13.5452   13.8853   0.1654    0.1381
RT CRM       12.7732   13.6137   0.2623    0.1730
RF CRM        7.6566    7.3574   0.7988    0.8172
KNN CRM       9.8981   10.7309   0.5889    0.4518
SVM CRM      11.9789   12.5868   0.3324    0.2728
NNet CRM      5.6585    7.4140   0.9270    0.9010
CM: center method. CRM: center and range method. LM: linear regression. Ridge: ridge regression. Lasso: lasso regression. RT: regression trees. RF: random forest. KNN: K-nearest neighbors. SVM: support vector machines. NNet: neural networks.
Table 3. Configurations with the center and range with a linear relationship. D1, Dataset 1.

D1    ϵ ∼ U[−20, 20]    ϵ* ∼ U[1, 5]
D2    ϵ ∼ U[−20, 20]    ϵ* ∼ U[10, 20]
D3    ϵ ∼ U[−5, 5]      ϵ* ∼ U[1, 5]
D4    ϵ ∼ U[−5, 5]      ϵ* ∼ U[10, 20]
Table 4. Performance of the methods for the D1 dataset.

Method       RMSE_L     RMSE_U     r_L^2     r_U^2
LM CM         94.2771    97.2412   0.2129    0.9710
Ridge CM      66.2051    67.6800   0.2131    0.9711
Lasso CM      92.9969    95.7236   0.2129    0.9710
RT CM        222.4937   324.0148   NA        NA
RF CM        258.4727   313.7617   NA        NA
Boost CM     235.2453   286.0998   NA        NA
KNN CM       248.2890   295.5031   0.0731    0.5838
SVM CM       332.8712   376.9175   0.0434    0.0282
NNet CM      133.2041   134.9265   0.2086    0.7714
LM CRM        11.1395    22.2623   0.2756    0.9716
Ridge CRM     10.2776    24.5461   0.2791    0.9719
Lasso CRM     11.0720    22.2897   0.2760    0.9717
RT CRM        25.5772    52.6583   0.0401    0.8410
RF CRM        11.9748    34.3630   0.1905    0.9521
Boost CRM     13.9365    25.9654   0.1388    0.9613
KNN CRM       12.0253    28.2744   0.2130    0.9563
SVM CRM       11.6466    27.3321   0.1579    0.9571
NNet CRM      12.6617    25.0032   0.1415    0.9641
NA: number not available.
Table 5. Performance of the methods for the D2 dataset.

Method       RMSE_L     RMSE_U     r_L^2     r_U^2
LM CM        208.2321   211.0179   0.0391    0.8514
Ridge CM     182.7565   185.2648   0.0389    0.8501
Lasso CM     207.0609   209.7936   0.0391    0.8509
RT CM         92.9444   119.6915   NA        NA
RF CM        109.5064   127.3871   NA        NA
Boost CM      84.7520   108.0348   NA        NA
KNN CM        99.2145   120.9664   0.0331    0.5261
SVM CM       153.5649   161.4119   NA        0.0196
NNet CM       41.5299    76.7332   0.0221    0.2926
LM CRM        12.3272    22.9804   0.7451    0.8829
Ridge CRM     10.8678    23.4012   0.7399    0.8817
Lasso CRM     12.2189    23.0000   0.7453    0.8826
RT CRM        21.3481    35.0539   0.2794    0.7257
RF CRM        11.3682    27.7170   0.5811    0.8503
Boost CRM     14.3557    23.3277   0.6593    0.8799
KNN CRM       12.4306    24.2769   0.7061    0.8723
SVM CRM       13.3351    24.3496   0.6763    0.8685
NNet CRM      15.5105    24.7335   0.6063    0.8644
NA: number not available.
Table 6. Performance of the methods for the D3 dataset.

Method       RMSE_L     RMSE_U     r_L^2     r_U^2
LM CM        731.6593   731.9502   0.8309    0.9835
Ridge CM     713.0555   711.7151   0.8309    0.9835
Lasso CM     730.6541   730.8220   0.8311    0.9835
RT CM        429.9846   405.3798   0.0000    NA
RF CM        428.2619   391.7614   0.0194    NA
Boost CM     392.7227   354.0930   0.2566    NA
KNN CM       413.8011   364.5694   0.4734    0.4805
SVM CM       491.7221   468.3171   0.1116    0.1944
NNet CM      343.0967   302.1161   0.8369    0.6365
LM CRM        30.4423    29.5111   0.9709    0.1197
Ridge CRM     34.7499    27.7174   0.9707    0.1324
Lasso CRM     30.6647    29.3510   0.9709    0.1198
RT CRM        70.3581    49.0615   0.8417    0.0329
RF CRM        54.0816    28.2500   0.9458    0.0724
Boost CRM     37.4534    31.9037   0.9576    0.0635
KNN CRM       41.8289    32.3533   0.9563    0.0203
SVM CRM       37.0102    29.4140   0.9615    0.0598
NNet CRM      36.1038    34.2631   0.9597    0.0487
NA: number not available.
Table 7. Performance of the methods for the D4 dataset.

Method       RMSE_L     RMSE_U     r_L^2     r_U^2
LM CM        209.5566   210.1047   0.0041    0.9305
Ridge CM     165.3670   165.6865   0.0042    0.9304
Lasso CM     205.1182   205.6540   0.0041    0.9306
RT CM        273.0309   289.5941   NA        NA
RF CM        274.3768   301.9242   NA        NA
Boost CM     241.1328   259.7999   NA        NA
KNN CM       252.3822   275.0876   0.0040    0.2819
SVM CM       339.6949   354.9830   NA        0.0714
NNet CM      184.6237   212.1473   0.0022    0.3013
LM CRM        18.1846    18.8634   0.0012    0.9473
Ridge CRM     16.7686    20.6743   0.0012    0.9470
Lasso CRM     18.0800    18.9213   0.0012    0.9474
RT CRM        23.3177    29.6393   0.0116    0.8671
RF CRM        15.7857    25.8713   0.0098    0.9272
Boost CRM     17.6997    19.1021   0.0089    0.9466
KNN CRM       18.8748    22.6928   0.0002    0.9299
SVM CRM       19.0737    21.1123   0.0053    0.9362
NNet CRM      21.0460    21.8625   0.0019    0.9287
NA: number not available.
Table 8. Ten-fold cross-validation mean metrics.

Method       RMSE_L    RMSE_U    r_L^2     r_U^2
LM CM        0.6656    0.6656    0.7322    0.7306
Ridge CM     0.6493    0.6493    0.7322    0.7306
Lasso CM     0.6634    0.6634    0.7322    0.7306
RT CM        0.2766    0.2927    0.3274    0.2710
RF CM        0.2426    0.2667    0.3991    0.3599
Boost CM     0.1717    0.1757    0.6961    0.6962
KNN CM       0.2525    0.2687    0.3356    0.3091
SVM CM       0.2798    0.2907    0.1969    0.2017
NNet CM      0.2642    0.2715    0.2695    0.2764
LM CRM       0.1411    0.1408    0.7743    0.7746
Ridge CRM    0.1418    0.1415    0.7743    0.7746
Lasso CRM    0.1411    0.1408    0.7743    0.7746
RT CRM       0.1495    0.1495    0.7462    0.7453
RF CRM       0.0681    0.0683    0.9531    0.9525
Boost CRM    0.1428    0.1425    0.7684    0.7687
KNN CRM      0.0580    0.0581    0.9621    0.9619
SVM CRM      0.0617    0.0617    0.9570    0.9568
NNet CRM     0.0514    0.0519    0.9701    0.9694
Table 9. Ten-fold cross-validation standard deviation metrics.

Method       RMSE_L    RMSE_U    r_L^2     r_U^2
LM CM        0.0111    0.0092    0.0123    0.0197
Ridge CM     0.0109    0.0093    0.0123    0.0197
Lasso CM     0.0111    0.0092    0.0123    0.0197
RT CM        0.0093    0.0040    0.0376    0.0318
RF CM        0.0075    0.0050    0.0268    0.0206
Boost CM     0.0089    0.0071    0.0305    0.0211
KNN CM       0.0075    0.0047    0.0221    0.0225
SVM CM       0.0073    0.0069    0.0190    0.0359
NNet CM      0.0079    0.0049    0.0209    0.0212
LM CRM       0.0055    0.0049    0.0144    0.0129
Ridge CRM    0.0054    0.0047    0.0144    0.0129
Lasso CRM    0.0055    0.0048    0.0144    0.0129
RT CRM       0.0117    0.0115    0.0463    0.0464
RF CRM       0.0036    0.0030    0.0051    0.0047
Boost CRM    0.0056    0.0051    0.0144    0.0139
KNN CRM      0.0019    0.0012    0.0026    0.0026
SVM CRM      0.0014    0.0017    0.0024    0.0030
NNet CRM     0.0023    0.0023    0.0024    0.0028
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
