Article

A Novel Price Prediction Service for E-Commerce Categorical Data

1 Department of Mathematics, Faculty of Science, Suez Canal University, Ismailia 41522, Egypt
2 Department of Computer Science, Faculty of Computers and Informatics, Zagazig University, Zagazig 44519, Egypt
3 College of Computing and Information Sciences, University of Technology and Applied Sciences, Ibri 516, Oman
4 Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
5 Higher Future Institute for Specialized Technological Studies, Cairo 3044, Egypt
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2023, 11(8), 1938; https://doi.org/10.3390/math11081938
Submission received: 3 March 2023 / Revised: 11 April 2023 / Accepted: 17 April 2023 / Published: 20 April 2023

Abstract

Most e-commerce data include items that belong to different categories, e.g., product types on Amazon and eBay. The accurate prediction of an item's price on an e-commerce platform facilitates the maximization of economic benefits for the seller and buyer. Consequently, the task of price prediction of e-commerce items can be seen as a multiple regression on categorical data. Performing multiple regression tasks with categorical independent variables is tricky, since the observations of each product type might have different distribution shapes, whereas the distribution shape of all the data might not be representative of each group. In this vein, we propose a service for facilitating the price prediction task of e-commerce categorical products. The main novelty of the proposed service relies on two unique data transformations aiming at increasing the between-group variance and decreasing the within-group variance to improve the task of regression analysis on categorical data. The proposed data transformations are tested on four different e-commerce datasets over a set of linear, non-linear, and neural network-based regression models. Compared with the best existing regression models without the proposed transformations, the results show improvements ranging from 1.98% to 8.91% on the four evaluation metric scores, namely, R², MAE, RMSE, and MAPE. Moreover, the best metric improvements on each dataset have average values of 16.8%, 8.0%, 6.0%, and 25.0% for R², MAE, RMSE, and MAPE, respectively.

1. Introduction

Using e-commerce platforms, e.g., Amazon and eBay, is a key activity in modern life. On an e-commerce platform, the seller can utilize the task of price prediction to estimate the proper price of a newly added item based on similar items already existing on the same e-commerce platform [1]. In addition, the buyer can use the price prediction service to evaluate the price of a given item existing on the e-commerce platform instead of searching and comparing. The price prediction service should suggest the proper price based on the item features. In e-commerce platforms, the items are grouped based on the product type, e.g., toys and vehicles. In a regression task, predicting the outcome (item price) of categorical data is more difficult than for non-categorical data, since each product category might have a different distribution shape, whereas the distribution shape of all the items might not be representative of each product type [2]. In the rest of this paper, we use the terms product type, product category, and group interchangeably.
Few researchers have addressed the problem of price prediction of e-commerce products. Noor et al. [3] used the multiple linear regression model to predict new and second-hand vehicle prices. Yang et al. [4] proposed a model for predicting vehicle prices. Ahmed et al. proposed a model for addressing house price prediction. Kalaiselvi et al. [5] performed pricing analytics on smartphone products. Although most of the e-commerce data used in their work can be grouped into different categories, none of these methods handles product categories with different price distribution shapes.
Machine learning (ML) algorithms’ performance and success depend highly on data representation; that is, the better the representation of the data, the better the performance and results of a supervised predictor [6,7]. Data transformations are potential methods for improving the data representation and addressing various practical issues (e.g., tackling normally distributed errors assumption) [8].
Data variables can take different forms, e.g., numeric (quantitative) or categorical (qualitative). If the desired outcome (response variable) consists of one or more continuous variables, then the task is called regression; otherwise, it is called a classification task. Simple and multiple regression are distinguished by the number of independent variables; likewise, univariate and multivariate regression correspond to single and multiple response variables, respectively [9]. Regression models, which are supervised learning methods, are forms of predictive modeling techniques that are used to predict a continuous dependent value (response variable) based on the independent (predictor) variable(s). In a regression task, the dependent variable is modeled as a function of the independent variables, the corresponding regression parameters (coefficients), and a random error term [10].
Data transformation (e.g., taking the square root or the logarithm of a variable) is a replacement that changes the shape of a variable's distribution. Specifically, data transformations not only alter the scale of the transformed variable but also change the fundamental relationships between variables while simultaneously altering the distribution of the errors [8]. They also address other issues, e.g., reducing skewness, equalizing spreads of variable values when applied to a single variable, or making relationships between two or more variables linear or additive [11].
In regression tasks on qualitative/categorical data, different data groups might have means that differ from the overall data mean, since each group has its own distribution shape. This may increase the degree of data non-linearity. Therefore, some ML models can possibly benefit from their own non-linearity (e.g., tree-based models, k-nearest neighbor, support vector machines, or polynomial regression) when addressing that problem. On the other hand, some statistical models have been presented to tackle the aforementioned problem, among them linear mixed models (by incorporating fixed and random effects) and generalized estimating equations.
One possible solution to increase the data linearity between the items of e-commerce product categories is to guarantee that items of different categories do not have close/similar outcome values. In other words, each product category has a unique range of outcome values (prices) so that items of one product category have closer responses relative to the outcomes of any item from the other product categories. Thus, finding a metric to measure the intersection of the product categories’ price ranges and a data transformation to improve this metric score might provide a basis for the solution to this problem.
In this vein, we propose an experimental study to determine the effect of applying two data transformations to the price prediction regression tasks of e-commerce categorical data by increasing the between-group variance and decreasing the within-group variance. In other words, we propose separately transforming each product category data into a new distribution shape but with lower variance in order to reduce the within-group variance. Additionally, we propose separately shifting each product category data so that all of the product categories’ means are aligned in a straight line (increasing the between-group variance). These data transformations can be applied to the response variable (price) only. The proposed work is provided as an open-source code on GitHub repositories (https://github.com/Ahmed-Fathalla/Transformation, accessed on 3 January 2023).
The main goal of this paper is to propose a service for facilitating the price prediction task of e-commerce categorical products. Therefore, the major contributions of this work can be summarized as follows:
  • We are the first to propose data transformations to increase the between-group variance and to decrease the within-group variance for regression tasks on e-commerce categorical data.
  • The two proposed data transformations are evaluated on four categorical e-commerce datasets, where different regression models of different families are used, namely, linear (e.g., linear regression), non-linear (e.g., random forest), neural network (i.e., gated recurrent unit), and statistical (i.e., generalized estimating equations and linear mixed-effects models) methods.
  • In e-commerce data, textual features play a main role in the regression tasks [12,13]. Therefore, we propose evaluating several regression models to determine which family of regression models is suitable for a high-dimensional (e.g., textual) features dataset. The results show that the proposed data transformations enhanced the linear, non-linear, and neural network-based models for solving regression tasks of e-commerce price prediction.
  • The two proposed data transformations for regression tasks can possibly be applied to the response variable of categorical non-commercial datasets.
The remainder of this paper is organized as follows. Section 2 discusses the related works. Section 3 describes the research methodology. The experimental results and discussion are presented in Section 4. Finally, we conclude the proposed work in Section 5.

2. Related Work

Machine learning and deep neural networks have recently exhibited top-notch performance in a wide range of applications [14,15,16]. In this section, we discuss the price prediction problem and review the data transformation and feature engineering methods used in various price prediction tasks. Finally, we review the regression algorithms utilized in the proposed work.

2.1. Price Prediction of E-Commerce Products

Various research efforts have been devoted to addressing the problem of product price prediction by using different prediction methods to improve the accuracy of the deployed models. Most e-commerce datasets include products of different categories (e.g., clothes and vehicles) with different price distribution properties (minimum, maximum, mean, and variance of price values) that allow products of different categories to have similar/intersecting price ranges.
Pal et al. [17] utilized a secondhand car dataset to perform price prediction; the authors used linear regression and random forest regression models. Shastri et al. [18] proposed a model for stock price prediction, where the model is composed of the naïve Bayes model as a feature extractor and a neural network that predicts the stocks’ prices for a particular day. Kalaiselvi et al. [5] proposed a price prediction model for smartphone products using historical data on smartphones over the past few years. Sentimental analysis and a multilayer feedforward neural network are used to predict the prices. Yu et al. [19] proposed different prediction models based on deep learning for predicting house prices based on some factors, i.e., the building age, the number of subways around the building, the number of schools around the building, and the location of the building.
All of the discussed research methods for addressing price prediction of e-commerce products of different categories utilize only feature engineering methods, e.g., one-hot encoding, to tackle the price prediction problem. To the best of the authors' knowledge, there is no method that addresses the difference in the distribution shapes of the product categories.

2.2. Feature Engineering for Regression Problem

2.2.1. Textual Feature Engineering

In many applications [12,13,20,21], textual features play an important role in the regression task. Machine learning regression models are mathematical models that require numerical inputs. Thus, a numerical representation of the textual feature should be extracted. The most common feature extraction method applied to textual features is term frequency–inverse document frequency (TF–IDF) [22]. TF–IDF is used as a weighting schema in information retrieval and text mining. TF–IDF is also used to convert a corpus (e.g., a textual feature) into a structured numerical format that reflects how important a word is to a document (e.g., the textual feature of an observation) in a corpus. The TF–IDF value rises proportionally with the number of times a word occurs in a document (TF) but is offset by the frequency of that word in the corpus (IDF).
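As a concrete illustration, the following minimal sketch converts a handful of hypothetical product descriptions into a TF–IDF matrix using Scikit-Learn's TfidfVectorizer, the implementation used later in the experiments; the descriptions are illustrative only, and the vocabulary cap mirrors the top-1000-terms limit described in Section 4.1.1.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical product descriptions, one document per item.
descriptions = [
    "classic wool men suit slim fit",
    "two piece men suit formal wear",
    "car audio system with bluetooth amplifier",
    "car audio subwoofer and amplifier kit",
]

# Keep only the top terms across the corpus (the experiments cap
# TF-IDF at the top 1000 features/words).
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(descriptions)  # sparse matrix: items x terms

print(X.shape)
print(vectorizer.get_feature_names_out())
```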

2.2.2. Categorical Features Engineering

A common type of feature is the categorical feature (e.g., the brand or color of a product). Data in a categorical feature are not necessarily in numerical form but can be textual in nature. Therefore, to represent such a textual feature in a numerical format, the categorical data need to be pre-processed before being fed to machine learning models.
One-Hot Encoding (OHE): The most common way to represent categorical variables is by using OHE or dummy variables, which are widely used for both ordinal (feature of ordered values) and nominal (no order in the feature values) features. OHE is conducted by replacing the categorical variable with one or more new features that can have values of either 0 or 1.
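A minimal sketch of OHE with Scikit-Learn follows; the color feature and its values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature: product color.
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# Each category value becomes a new 0/1 column.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df[["color"]]).toarray()

print(encoder.get_feature_names_out())  # ['color_blue' 'color_green' 'color_red']
print(encoded)
```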

3. Methodology

3.1. The Intuition

The main focus of the two proposed data transformations is the regression task of the categorical e-commerce data. The motivation of the proposed work is to increase the degree of linearity between the different product categories (groups). For linear data, two different values of x should be mapped to two different values of y. For non-linear data, this condition does not hold. For instance, if the data can be fitted by a quadratic curve, then two different x values can have the same y value. In this vein, the statistical analysis of variance (ANOVA) is used to test the degree of variances of a set of groups. ANOVA produces the F-value using Equation (1) [23]:
F = \frac{MS_{bg}}{MS_{wg}}
where MS_bg represents the between-group variation and MS_wg represents the within-group variation. The higher the F-value, the higher the distinction of each group's response in comparison to the other groups. In other words, the observations of each group behave similarly to one another, but the observations from different groups behave distinctively.
We argue that the regression task on the categorical data should be easier with a higher F-value. Thus, we propose two data transformations that decrease the denominator, MS_wg, and increase the numerator, MS_bg; thus, the F-value increases as a result. In other words, the total variation in the dependent variable can be decomposed into two parts: the variation between groups and the variation within groups. We argue that the smaller the within-group variation and/or the higher the between-group variation, the more meaningful/useful the data categories are for understanding/predicting the data. The idea can be explained in light of two motivational examples.
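To make Equation (1) concrete, the following minimal sketch computes the F-value for hypothetical group data; it computes the same statistic as scipy.stats.f_oneway.

```python
import numpy as np

def f_value(groups):
    """One-way ANOVA F-value of Equation (1) for a list of 1-D arrays,
    one array of outcome values per group."""
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    k, n = len(groups), all_y.size
    # Between-group mean square: variation of group means around the grand mean.
    ms_bg = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-group mean square: variation of observations around their group mean.
    ms_wg = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    return ms_bg / ms_wg

# Two hypothetical, well-separated price groups yield a large F-value.
groups = [np.array([35.0, 37.0, 39.5]), np.array([88.0, 90.0, 92.5])]
print(f_value(groups))  # matches scipy.stats.f_oneway(*groups)
```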
The first motivational example discusses the effect of reducing the within-group variation. This example involves predicting the price of a given product based on its description, i.e., a regression task. If we have four different observations from two different product types, e.g., car audio systems and men's suits, then the observations belong to two different groups. Additionally, the features of the observations at hand include the term frequency–inverse document frequency (TF–IDF) score of the product description, the product type, and the product price; Figure 1 shows this example.
Figure 1a contains four observations from two different groups, where each group has two observations. The observations include two features: the first feature is the product description, i.e., textual data, and the second feature is the group identifier. Additionally, each observation has a price, i.e., the dependent value, and these prices form the response vector. The description feature is converted into six different features using the TF–IDF technique. The first three features represent words that may appear in the description of the men's suits group observations; the last three features represent words that may appear in the car audio group observations.
In Figure 1a, one can notice that the second and third observations belong to two different groups, but both observations have the same price. This case motivates the proposed work of separating as much data as possible from the different groups based on the outcome value, i.e., the price. We argue that observations are more related to other observations within the same group than to observations of different groups. Thus, this relationship should be reflected in the outcome variable as well.
Figure 1b shows the mean and the standard deviation of the men’s suit and car audio systems groups before and after possible data transformations. The proposed within-group reduction transformation reduces the variability within each group and maintains the same group means. The intuition of the proposed data transformations is to reduce the intersection of the response of the observations of different groups.
Finally, Figure 1c shows that the price of 63 can be a response to two observations (the second and third observations) from two different groups. Apparently, each group has a different distribution shape, i.e., mean and variance. Since each observation belongs to a different distribution shape, the proposed within-group variation reduction data transformation modifies the shape of each group so that the intersection between the two different groups is minimized. Thus, the price of 63 changed to 50.08 for the first group and 76.61 for the second group. Moreover, considering 68% of the data, the minimum value of the second group, μ − σ = 90.22 − 17.66 = 72.56, is larger than the maximum value of the first group, μ + σ = 37.15 + 13.11 = 50.26. The proposed within-group reduction transformation forces the data to have a smaller within-group variation.
The second motivational example discusses the effect of increasing the between-group variation. Figure 2 includes data spread over four product categories, where each product category (group) has a different color. The x values of each group are closer to one another compared to the x values of the other groups. In other words, the predictor values of a group's observations are relatively closer than the predictor values of other groups. In Figure 2a, the regression line (the green line) connects the average of each group. Obviously, a regression line with acceptable performance cannot be a linear equation here. On the other hand, the data in Figure 2b can be addressed as a linear regression problem, i.e., using a polynomial equation with linear coefficients. However, a polynomial equation is impractical for a regression problem with so many independent variables, as discussed in detail at the end of Section 4.1.1.
In addition, we argue that observations of different groups with different values of the independent variable x that have similar or very close values of the dependent variable y might be ambiguous for the regressor. We call a region of the dependent variable y with observations of different groups an ambiguity region. Figure 2b depicts these regions as red rectangles.
The proposed between-group increment transformation reduces or eliminates these regions of ambiguity. This transformation suggests a shift value for each group's data so that the group means are aligned on a straight line. In other words, the dependent variable y is shifted by a certain amount for each observation. This shifting amount is fixed per group. Figure 3 depicts the data in Figure 2 with two different shift values. Figure 3a depicts the data after shifting each group's observations separately with a small shift (increasing the dependent variable values). The data in Figure 3a still have a small ambiguity region relative to the data in Figure 2b, and the regression line is linear. In contrast, Figure 3b depicts the same data with a large shift value. We call this data transformation the group means linear alignment.

3.2. The Proposed Data Transformation: Ambiguity Reduction

The proposed method is two-fold: decreasing the within-group variability and increasing the between-group variability. First, the variance of the dependent variable values is reduced per data group (cluster or category), as shown in Figure 1. Second, the values of the dependent variable are shifted in one direction, i.e., upward or downward, to align the groups' averages on a straight line (Figure 3). In the following subsections, we discuss the proposed method in detail. Of note, the proposed data transformations are applied to the dependent value only, since it is used in calculating the F-value.

3.2.1. Within-Group Reduction

One aspect of improving the F-value is to decrease the denominator of Equation (1). Thus, we propose moving the observations' values toward the group average to reduce the group variance (we interchangeably refer to this transformation as the variance reduction transformation). The further an observation is from the group average, the more its dependent value moves toward the group average, and vice versa. The observation value is changed by applying Equation (2):
y_{j,i\_transformed} = \frac{y_{j,i\_original} + \mu_i \times VRR}{VRR + 1}
where y_{j,i_original} is the original dependent value of observation j belonging to group i, VRR represents the variance reduction ratio of the data, μ_i is the mean of group i, and y_{j,i_transformed} is the transformed (new) dependent value of observation j belonging to group i. In Equation (2), the higher the VRR, the lower the group variance. In other words, a higher value of the VRR forces the observations' dependent values of group i to be closer to μ_i. For instance, suppose a group's mean equals five, μ = 5, the dependent value of an observation equals ten, y_original = 10, and VRR = 0.5. Applying Equation (2), y_transformed = 8.3; thus, y_original moved approximately 1.7 units toward the average. On the other hand, if the dependent value of an observation equals six, y_original = 6, then y_transformed = 5.7; thus, y_original moved approximately 0.3 units toward the average.
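The worked example above can be reproduced with a few lines; a minimal sketch of Equation (2):

```python
import numpy as np

def variance_reduction(y, group_mean, vrr):
    """Within-group reduction of Equation (2): pull each dependent value
    toward its group mean; a larger VRR gives a stronger pull."""
    return (y + group_mean * vrr) / (vrr + 1)

# Worked example from the text: mu = 5 and VRR = 0.5.
print(variance_reduction(np.array([10.0, 6.0]), group_mean=5.0, vrr=0.5))
# -> [8.33... 5.66...]: 10 moves ~1.7 units toward the mean, 6 moves ~0.3 units.
```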

3.2.2. Between-Group Increase

Another aspect of improving the F-value is to increase the numerator of Equation (1). Thus, we propose increasing the between-group variability. We separately shift each group's observations so that the averages of all the groups are aligned on a straight line (we interchangeably refer to this transformation as the means linear alignment transformation). In other words, all the observations of the same group are shifted by the same value. Of note, only the dependent value is shifted, whereas the independent values are fixed. Shifting the observations of each group therefore serves two purposes: first, it increases the between-group variability; second, it increases the data linearity.
The algorithm for increasing the between-group variation is listed in Algorithm 1. This algorithm aligns the set of group means on a straight line. Thus, the algorithm first computes the line equation (lines 1 to 4). Then, the algorithm computes the shift of each group's mean so that the shifted mean fits on the obtained line (lines 5 to 8).
The algorithm has two inputs: an ascendingly sorted list that includes the mean of each data group and the slope percentage. The second input controls the degree of the slope, and the output is a list of shifts per group. For example, if the data have five groups, then the algorithm produces a list of five shifts, one shift per group. The maximum and minimum group means, i.e., the last and first means in the input list, are extracted in lines 1 and 2, respectively. Then, in line 3, the slope is computed by considering the first and last means as two successive points (the difference between them on the x-axis is one). Thus, the slope is calculated by subtracting the minimum group's mean from the maximum group's mean and dividing the result by one. Then, the slope is multiplied by a variable, slope_perc, representing the slope percentage; this variable can increase or decrease the slope based on the nature of the data at hand. Finally, the algorithm computes the amount of shift required to move the mean of each group to fit the line (lines 5 to 8).
Shifting the dependent values in one direction (positive or negative) guarantees that the sign of the shifted values, e.g., prices, is not flipped. The proposed data transformation is about determining the proper values of the two aforementioned steps independently per group.
Algorithm 1: Increasing the between-group variability
Input:
Means: a list of ascendingly sorted product category (group) means
slope_perc: a ratio to modify the slope
Output:
GroupShift: a list of group shift values
ShiftData(Means)
[The pseudocode body appears as an image in the original; a reconstructed sketch is given below.]
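Since the pseudocode body is only available as an image, the following Python sketch reconstructs Algorithm 1 from its textual description; anchoring the target line at the smallest group mean is our assumption.

```python
def shift_data(means, slope_perc):
    """Reconstruction of Algorithm 1 from its textual description.
    `means` is an ascendingly sorted list of group means; the returned
    list holds one shift value per group."""
    max_mean = means[-1]                    # line 1: largest group mean
    min_mean = means[0]                     # line 2: smallest group mean
    slope = (max_mean - min_mean) / 1.0     # line 3: means as successive points
    slope *= slope_perc                     # line 4: scale by slope_perc
    group_shift = []
    for idx, mean in enumerate(means):      # lines 5 to 8: per-group shift
        # Assumption: the target line starts at the smallest group mean.
        target = min_mean + slope * idx
        group_shift.append(target - mean)   # shift that moves mean onto the line
    return group_shift

print(shift_data([10.0, 12.0, 30.0, 55.0], slope_perc=0.5))
```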

3.2.3. Ambiguity-Value: A Metric to Measure Ambiguity of the Dependent Variable

We define the data ambiguity of a regression model as observations of different groups having the same or close outcome values. For instance, suppose observation a belongs to group 1 and observation b belongs to group 2; if the outcome of observation b is closer to that of observation a than the outcomes of the other observations belonging to group 1, then observation b causes an ambiguity.
In this context, we propose a metric to measure the degree of data ambiguity by calculating the percentage of observations with outcome values located within the boundaries of many groups. This is achieved by counting the number of observations with an outcome value in the range of other groups rather than the group to which this observation belongs. Then, this counter is divided by the number of observations, n. We call this metric ambiguity-value and defined it in Equation (3):
\text{ambiguity-value} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{m_i}
\begin{cases}
1, & \text{if } l \le \sum_{g=1}^{k} \begin{cases} 1, & \text{if } lb_g \le o_{i,j} \le ub_g \\ 0, & \text{otherwise} \end{cases} \\
0, & \text{otherwise}
\end{cases}
where i represents the group number out of k groups, i ∈ [1, k]; j represents the observation number out of the m_i observations of group i, j ∈ [1, m_i]; l represents the ambiguity level, l ∈ [2, k]; g represents the group numbers other than the group the observation belongs to; lb_g and ub_g represent the lower and upper bounds of the outcomes of group g, respectively; o_{i,j} represents the outcome of observation j belonging to group i; and n represents the number of all observations in the k groups.
For observation a belonging to group i, the intuition of the proposed metric is that when observation a's outcome value is located only within group i's boundaries, it is easier to handle the observation than when observation a's outcome value is located within the boundaries of more than one group. The ambiguity-value metric calculates the percentage of observations with outcome values located within the boundaries of at least l groups. The ambiguity level l is used to tune the ambiguity-value metric. For instance, when l equals four, only observations with outcomes belonging to four or more different group boundaries are counted.
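A minimal sketch of the ambiguity-value metric follows; it counts an observation's own group among the containing groups, which is one reading of Equation (3), and uses the μ ± 2σ group bounds adopted later in Section 4.4.

```python
import numpy as np

def ambiguity_value(outcomes_per_group, level=2):
    """Percentage of observations whose outcome lies within the bounds of
    at least `level` groups (Equation (3)). Group bounds are mu +/- 2*sigma,
    as used in the experiments, to limit the effect of outliers."""
    bounds = [(g.mean() - 2 * g.std(), g.mean() + 2 * g.std())
              for g in outcomes_per_group]
    n = sum(g.size for g in outcomes_per_group)
    ambiguous = 0
    for g in outcomes_per_group:
        for o in g:
            containing = sum(lb <= o <= ub for lb, ub in bounds)
            if containing >= level:
                ambiguous += 1
    return 100.0 * ambiguous / n

rng = np.random.default_rng(0)
groups = [rng.normal(40, 12, 1000), rng.normal(60, 12, 1000)]
print(ambiguity_value(groups, level=2))  # overlapping groups -> high ambiguity
```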

3.2.4. Back-Transformation

The back/inverse transformation of the proposed method includes two steps. First, the shifted outcome values are shifted back with the same shift value but in the opposite direction. Second, as the data variance is reduced using Equation (2), the predicted value of the regression model is transformed back using the inverse equation, as in Equation (4):
y_{predicted\_original} = y_{predicted\_transformed} \times (VRR + 1) - \mu_i \times VRR
where y_predicted_transformed represents the output of the regression model for an input of group i and y_predicted_original represents the back-transformed value, which is reported as the final value. Another approach to applying the back-transformation is to consider the problem as a regression task; we refer to this model as regression model-II. In this task, the predicted transformed values are the independent variables and the actual outcome is the dependent variable. Thus, the model can be seen as a two-level regression. The first level predicts the outcome in the transformed form, and the second level predicts the final reported value from the predicted transformed value. More specifically, this approach of back-transformation (e.g., regression model-II) may be thought of as a boosting-type ensemble (as a general concept) where subsequent models correct the predictions from prior models.
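Both back-transformation routes can be sketched briefly; the arrays below are hypothetical stand-ins for model-I's predictions and the true prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def inverse_variance_reduction(y_pred, group_mean, vrr):
    """Analytical back-transformation of Equation (4)."""
    return y_pred * (vrr + 1) - group_mean * vrr

# Route 1: invert Equation (2) directly (worked example: mu = 5, VRR = 0.5).
print(inverse_variance_reduction(np.array([8.33, 5.67]), group_mean=5.0, vrr=0.5))

# Route 2 (regression model-II): a second-level regressor that learns to map
# model-I's transformed predictions back to the original prices.
y_pred_transformed = np.array([[8.1], [5.9], [7.2]])  # hypothetical model-I output
y_true = np.array([10.0, 6.0, 8.5])                   # hypothetical true prices
model_ii = LinearRegression().fit(y_pred_transformed, y_true)
print(model_ii.predict(y_pred_transformed))
```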

3.2.5. Putting It Together: Ambiguity Reduction Transformation

Figure 4 depicts the block diagram of the proposed service, combining the two proposed data transformation methods with the regression model(s). The block diagram describes the complete regression task from the input observations to the predicted prices. First, the original prices of the observations are transformed by applying Equation (2) to reduce the data variance per product category. Then, another transformation is applied to align the product category means on a straight line. Then, the regressor model, model-I, is trained on the transformed prices, producing the predicted values. Two back-transformations should then be applied to the predicted prices in the reverse order. Thus, the predicted prices are back-shifted first. Then, to reverse the effect of Equation (2), one possible solution is to apply the inverse of Equation (2). Another solution is to train another regression model, model-II, to reverse the transformation effect (back-transformation). Regression model-II takes the predicted prices as an input (independent variable) and the true prices as an output (dependent variable). Thus, regression model-II learns to reverse the effect of Equation (2).
As with any regression model, regression model-I produces the predicted value with an error from the true outcome. Trying to inverse this predicted value using the inverse equation of Equation (2) might increase the error value. Thus, we propose using regression model-II to reduce this error.

4. Experimental Results

4.1. Experimental Setup

The implementation of the proposed work is developed in the Python programming language. The problem being addressed is a multiple regression problem, as the aim is to predict product prices. Therefore, various machine learning linear, tree-based, deep learning, and statistical regression models from different Python packages are utilized. Herein, we list all the employed libraries and techniques in the proposed work; the Keras (https://keras.io, accessed on 3 January 2023) library with the TensorFlow [24] back-end is used to implement a deep recurrent neural network model. We adopted machine learning implementations from two different machine learning packages, namely, Scikit-Learn [25] and H2O (https://github.com/h2oai/h2o-3, accessed on 3 January 2023). Furthermore, two statistical regression models from the statsmodels package [26] are used. Other utilized libraries include Pandas [27], Numpy [28], and Matplotlib [29].
Specifically, a list of the used methods and techniques of the different libraries includes Keras (GRU, Model Checkpoint, Reduce LR On Plateau, and Early Stopping), Scikit-Learn (Linear SVR (LSVR for short), Linear Regression (LR), MLP Regressor, Gradient Boosting (GB) Regressor, Random Forest (RF) Regressor, Polynomial Features, Train-Test Split, OneHotEncoder, TFIDF Vectorizer, mean absolute error, mean squared error, and R² score), H2O (Generalized Linear (GLM) Estimator, Gradient Boosting (GB) Estimator, XGBoost (XGB) Estimator, and Random Forest (RF) Estimator), and statsmodels (i.e., GEE and mixedLM). The Python code used in this experiment can be downloaded from the author's GitHub page (https://github.com/Ahmed-Fathalla/Transformation, accessed on 3 January 2023).
For easy reference to those different ML models in the results section (Section 4), we added a prefix to each model name according to its library (i.e., HO_ for H2O and SK_ for Scikit-Learn), followed by the model's short name, whereas MLE refers to mixedLM. The hardware used to perform the experiments is a computer running Ubuntu 16.04 OS with two 2.3 GHz Intel 8-core processors, 500 GB of RAM, and an NVIDIA P100 GPU.

4.1.1. Model Description

Each dataset is divided into a train set and a test set with a splitting ratio of 80% to 20%, respectively. All stated findings apply only to the test set. Additionally, the ML models are adjusted so that they do not suffer from overfitting or underfitting; in other words, the acceptable score disparity between the train and test assessment metrics should be less than 0.003. The values of the SK_RF parameters that produced the findings reported for the four datasets are given in Table 1, where n_estimators represents the number of trees created by the model and max_depth represents the maximal depth each individual tree is grown to. Similarly, the HO_RF model uses a fixed number of 30 trees. The rectified linear unit (ReLU) activation function is used for the hidden nodes in the two hidden layers of the MLP regressor, which have 512 and 128 hidden nodes, respectively. Additionally, the Adam optimizer is used with an adaptive learning rate that is initially set to 0.01, and the maximum number of iterations is set to 1000. Other hyperparameters are set to the default values implemented by the libraries.
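The stated MLP configuration translates to the following Scikit-Learn call; note that in Scikit-Learn the 'adaptive' schedule only takes effect with the SGD solver, so with Adam the initial rate of 0.01 is what applies.

```python
from sklearn.neural_network import MLPRegressor

# MLP regressor as described: two ReLU hidden layers of 512 and 128 nodes,
# the Adam optimizer with an initial learning rate of 0.01, and at most
# 1000 iterations; all other hyperparameters keep their library defaults.
mlp = MLPRegressor(
    hidden_layer_sizes=(512, 128),
    activation="relu",
    solver="adam",
    learning_rate="adaptive",  # only honored by solver="sgd" in Scikit-Learn
    learning_rate_init=0.01,
    max_iter=1000,
)
# mlp.fit(X_train, y_train); y_pred = mlp.predict(X_test)
```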
The deep neural network (GRU for short) model is composed of two network branches (i.e., GRU and one-hot encoding branches). The GRU network branch consists of an embedding layer (with an embedding length of 128 and 300 words per item/textual feature), followed by a GRU of 128 hidden units, and then a fully connected layer of 64 units. The one-hot encoding branch consists of a matrix of the categorical features of the used dataset. Finally, the outputs of those two network branches are concatenated into one layer, which is followed by a fully connected layer of 64 neurons and then the output layer.
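A Keras sketch of the described two-branch network follows; the vocabulary size, the activations, the loss, and the category count are assumptions not stated in the text.

```python
from tensorflow.keras import Model, layers

VOCAB_SIZE, SEQ_LEN, N_CATEGORIES = 20000, 300, 113  # VOCAB_SIZE is assumed

# GRU branch: 300-word description -> 128-d embedding -> GRU(128) -> Dense(64).
text_in = layers.Input(shape=(SEQ_LEN,), name="item_description")
x = layers.Embedding(input_dim=VOCAB_SIZE, output_dim=128)(text_in)
x = layers.GRU(128)(x)
x = layers.Dense(64, activation="relu")(x)

# One-hot branch: matrix of the categorical features.
cat_in = layers.Input(shape=(N_CATEGORIES,), name="one_hot_categories")

# Concatenate both branches, then one fully connected layer before the output.
merged = layers.concatenate([x, cat_in])
merged = layers.Dense(64, activation="relu")(merged)
price = layers.Dense(1, name="price")(merged)

model = Model(inputs=[text_in, cat_in], outputs=price)
model.compile(optimizer="adam", loss="mse")  # assumed optimizer/loss settings
```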
The implementation of the polynomial regression suffers from a memory error even for a polynomial degree of two, which is the lowest significant degree. Such a memory error is due to the huge number of features extracted by the TF-IDF, even though TF-IDF was limited to extracting the top 1000 features/words across the corpus/textual feature. To illustrate the source of this error, polynomial features of degree n generate N features, where N(n, d) = C(n + d, d) for a dataset of d features. According to that equation, the Mercari dataset, which has 1113 features, would have C(1113 + 2, 2) = 621,055 features. In conclusion, polynomial features are not applicable to data with a high number of features. Finally, since the mixed linear model implementation does not specify the random effects structure, the default random effects structure (a random intercept for each group) is automatically used.
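The feature count is easy to verify:

```python
from math import comb

# N(n, d) = C(n + d, d): number of polynomial features of degree n
# generated for a dataset of d input features.
d, n = 1113, 2          # Mercari feature count, quadratic expansion
print(comb(n + d, n))   # 621055 features, hence the memory error
```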

4.1.2. Transformation Parameters

In this section, we present the different parameters for the two proposed data transformations (variance reduction and means linear alignment transformations).
Variance Reduction Transformation Parameters: The main parameter of this transformation is the variance reduction ratio (VRR). Thus, various values for the VRR are employed to test the transformation performance on the ML regression models. Specifically, 0.0, 0.01, 0.1, 0.5, 1.0, and 5.0 are tested as values for the VRR, where a VRR of 0.0 has no reduction effect. The results follow the same pattern; that is, as the value of the VRR increases, the results of regression model-I improve, but the final (back-transformed) results become worse. Therefore, we select VRR = 1 as a representative value for the variance reduction transformation.
Since the back-transformation of the variance reduction transformation can be achieved by two methods (i.e., regression model-II or the inverse equation, Equation (4)), as mentioned in Section 3.2.4, we refer to the method used to obtain the reported results in the results section (Section 4) as a variable named method, m for short. m = 1, the default back-transformation method, refers to the inverse equation, Equation (4), whereas m = 2 refers to the regression model-II method.
Means Linear Alignment Transformation Parameters: The between-group increment (or means linear alignment) transformation has only one parameter, slope_perc. slope_perc = None leaves the original data unchanged and slope_perc = 0 aligns the means on a horizontal line, whereas other values of slope_perc generate aligned means with a slope equal to slope_perc multiplied by the original slope of the line connecting the lowest and highest mean values. A list of values is used to check the effect of the means linear alignment transformation: None, 0.0, 0.0001, and 0.5, corresponding to the original data, horizontal alignment, and a tiny and a high slope of the aligned means, respectively.

4.2. Datasets

In order to test the efficiency of the two proposed transformations accurately, we ran a set of experiments. We utilized four different e-commerce datasets that have commercial products with the aim of predicting item prices based on a set of features describing each item. In this section, we briefly describe each e-commerce dataset.

4.2.1. Mercari Dataset

The Mercari dataset (https://www.kaggle.com/c/mercari-price-suggestion-challenge, accessed on 3 January 2023) is the dataset of the Mercari Price Suggestion Challenge, a Kaggle competition aimed at suggesting selling prices of secondhand products based on a set of tabular features, namely, item description, name, item condition, category name, brand name, and shipping information. The dataset is allowed to be used for the purposes of academic research and education. We used the textual features (name, item description), category_name, and prices. We concatenated the product name and product item description into one field named 'item_description'. Category_name is split by the "/" character, resulting in three categorical features. The second category (which has 113 unique values) is selected to be the categorical feature used in our work; the other two categories are ignored. The dataset has 1,482,535 observations, where prices range from USD 1 up to USD 2009. We dropped the observations where the category name or item_description is null, so the resulting dataset has 1,476,204 observations.
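For reference, the described preprocessing can be sketched in pandas as follows; the file name and column names follow the Kaggle competition data.

```python
import pandas as pd

df = pd.read_csv("train.tsv", sep="\t")  # Mercari competition training file

# Drop observations with a missing category name or item description.
df = df.dropna(subset=["category_name", "item_description"])

# Concatenate the product name and item description into one textual field.
df["item_description"] = df["name"] + " " + df["item_description"]

# Split category_name on "/" and keep the second-level category
# (113 unique values); the other two levels are ignored.
df["category"] = df["category_name"].str.split("/").str[1]
```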

4.2.2. Used Cars Dataset

The used cars dataset is a public dataset published on Kaggle (https://www.kaggle.com/orgesleka/used-cars-database, accessed on 3 January 2023), which is scraped from eBay Kleinanzeigen. We used this dataset with the goal of predicting car prices based on a group of tabular features given in the dataset; more specifically, we dropped the irrelevant features. The employed features include offer_type, brand, abtest, vehicle_type, gearbox, model, kilometer, fuel_type, and price. All features are categorical features, and the model feature is the main feature we used for the proposed target variable transformation, since it is the most significant feature for representing car prices. After removing the observations with null values of the model feature, the dataset has 351,044 observations, whereas the car model feature has 249 unique values. The car prices range from USD 1 up to USD 1,000,000.

4.2.3. Inside Airbnb Dataset

Airbnb is an online marketplace where people who want to rent out their homes are connected with people who are looking for accommodations in that area. Airbnb currently covers more than 81,000 cities and 191 countries worldwide. The Airbnb dataset used in this work is a combination of data from 32 different cities that are available online (http://insideairbnb.com/get-the-data.html, accessed on 3 January 2023) with a variety of prices. We merged those cities' data into one dataset of 457,785 observations. We then selected the features that have a significant correlation with the price and a low number of null values. The selected features include name, description, street, city, room_type, property_type, dataset_name, and price. After dropping the observations with null, incorrect, or zero prices, the final dataset has 443,807 observations. The prices range from USD 1 up to USD 7,021,525.

4.2.4. Amazon Products Dataset

The Amazon products dataset contains product reviews and metadata from Amazon, which include reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). The Amazon products dataset was developed by [30] and is available online (http://jmcauley.ucsd.edu/data/amazon/, accessed on 3 January 2023) for academic use. In the proposed experiment, we only utilized 12 different product categories. Then, we merged the 12 product categories and kept the following features only: price, description, and categories. The dataset has 1,559,192 observations, and the product prices range from USD 1 up to USD 999.

4.3. Accuracy Metrics

We utilized the most frequently used accuracy metrics to compare and evaluate the performance of the proposed data transformation. The evaluation is performed by running the regression model on the data with and without applying the proposed transformation(s). The selection of the evaluation metrics is based on recently published works on regression problems [14,17,31,32,33,34]. These four metrics are the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R²). These metrics represent absolute and relative error evaluation methods.
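A small helper computing the four metrics with Scikit-Learn and NumPy (MAPE is written out explicitly for compatibility with older Scikit-Learn versions):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """The four evaluation metrics used in the experiments."""
    return {
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": np.mean(np.abs((y_true - y_pred) / y_true)) * 100,
        "R2": r2_score(y_true, y_pred),
    }

print(evaluate(np.array([10.0, 6.0, 7.5]), np.array([9.5, 6.4, 7.2])))
```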

4.4. Results

For a better assessment of the different evaluation metrics, all results are reported with the response variable's values scaled to the range between 0 and 1. The best results obtained by the hierarchical (MLE and GEE) and non-hierarchical (other ML) regression models are listed in Table 2 for the four datasets. Additionally, the results of the regression models on the data transformed with the proposed method are listed in the last line of the table. The results of the MLE and GEE models are the same, as both models are equivalent in the linear case. The results of the first and last lines of Table 2 are produced by the same non-hierarchical model; the only difference is that the table's last-line results are produced from the data with the two proposed data transformations applied. Of note, we compute the evaluation metric score improvement of the proposed data transformation using Equation (5) (for measuring the R² improvement, the equation must be multiplied by −1).
\text{improvement} = \frac{baseline\_score - proposed\_method\_score}{baseline\_score} \times 100
Table 3 lists the best-achieved results without and with applying the two proposed data transformations under the naming baseline and proposed, respectively. These best results are per evaluation metric and per dataset. Additionally, the name of the model producing the best results is indicated in the remark column. Finally, for each result, the evaluation metrics scores are listed as a percentage in the improvement row of the table.
Table 4 lists the model that benefits the most from the proposed data transformations per evaluation metric for each dataset. This model might not have achieved the best results for that dataset, but its evaluation metric score improved the most in comparison to the other models. For instance, for the Airbnb data, the RF model's R² score improved by 13.92% using the proposed data transformations, but the improved R² score, 0.4801, is less than the best improved R² score, 0.6196, achieved by the GRU model.
To visually compare the obtained results of the proposed method and the baseline method, Figure 5 depicts the MAE and RMSE metrics for an SK_LR model. The model was tested with the proposed shifting and variance reduction transformations on the Mercari dataset. Apparently, predicting the data after applying the proposed data transformation improved the results by reducing the error values, especially when the parameters were VRR = 1 and m = 2. The other SK-based models show the same level of prediction accuracy improvement, and the proposed data transformation improved their results regardless of the utilized values of the transformation parameters. However, the proposed data transformation did not achieve the same success for the GRU model; Figure 6 shows that it improved the GRU results only when the used parameter values were VRR = 1 and m = 1.
Table 5 shows the improvement in the evaluation metrics as the slope increases. The results are shown when using only the variance reduction, when aligning the group means on a horizontal line (slope_perc = 0), and when using slope_perc = 10^−4 and slope_perc = 0.5, all with VRR = 1. In general, the higher the slope_perc, the better the reported evaluation metrics. These gained improvements in the evaluation metrics are degraded after applying the back-transformation to the predicted results. In some cases, the decline in the improvement of the evaluation metrics makes them worse than the baseline evaluation metric scores. In other cases, although in decline, the final evaluation metric scores outperform the baseline evaluation metric scores, e.g., for slope_perc = 10^−4 or VRR = 1.
Table 6 presents the model size and the training time of different models for the four datasets used, where model fitting time and size are measured in minutes and KB, respectively. Furthermore, ‘Not measured’ values had a memory error during saving the model to the disk due to the huge model size (>40 GB).
Table 7 lists different scores of the ambiguity-value metric, Equation (3), for the proposed data transformations with different parameters. Table 7 shows the proposed metric scores for two l values. First, l equals two, as this value is the least possible value, which represents the percentage of observations located in the range of at least two different groups. Second, the observation outcome value belongs to the largest number of group outcome value ranges, and l equals the number of these groups. For instance, in the last row of Table 7, applying the proposed data transformations with slope_perc = 0.5 and VRR = 1, there are no observations in the Mercari dataset located within the range of more than three different groups. Additionally, 99.55% of the observations of the Mercari dataset are located within the range of at least two different groups, and only 0.45% of the observations are located within at least three groups. Of note, the range of each group is considered to be μ ± 2σ to eliminate the effect of outliers.
Table 7 outlines the huge reduction in the ambiguity-value metric after applying the proposed transformation. For instance, applying the proposed transformation with VRR = 1, the ambiguity-value metric improved from 58.23% to 3.04%. A further improvement was obtained by applying the proposed data transformation with slope_perc = 1 × 10^−4 and VRR = 1; the ambiguity-value metric became 0.45%.

4.5. Discussion

The effect of the proposed data transformations can be understood in light of three factors. First, the dataset properties, such as the number of records and the number of categories. Second, the regression model family, such as linear, tree-based, or neural network-based models. Third, the value of the transformations, i.e., the values to which the between-group variance increased and the within-group variance decreased.
First, the obtained results reveal that there are two factors controlling the effect of the two proposed data transformations: the dataset size and the number of categories. Table 2 and Table 3 emphasize these observations. The used cars dataset has the smallest number of records and the largest number of categories in comparison to the other three datasets. Thus, the effect of the proposed data transformations was insignificant (an approximately 0.5% improvement on average) for the five transformation configurations of the first column of Table 5 on the evaluation metric scores in comparison to the best regression model on the data without applying the proposed data transformations. Additionally, the model that benefited most from the proposed data transformations with VRR = 1 and slope_perc = None on the used cars dataset was HO_GB, with an approximately 7.26% improvement in the average of the four evaluation metrics.
On the other hand, among the four datasets, the Amazon dataset has the largest number of records, 1.5 × 10^6, and the smallest number of categories, 12 product categories. Running the GRU regression model on the Amazon data with the proposed data transformations, the evaluation metric scores improved by approximately 6.8% on average, whereas the SK Random Forest (SK_RF) model's evaluation metric scores improved by approximately 12.7% on average for the five transformation configurations in the first column of Table 5. Applying the proposed data transformations to the Mercari and Airbnb datasets, the evaluation metric scores improved by values between the discussed improvements of the used cars and Amazon datasets, in line with the dataset size and the number of categories.
Second, the effect of the proposed data transformations on the evaluation metric scores also depends on the model family. The linear regression models, SK_LR, SK_LSVR, and HO_GLM, were not adversely affected by increasing the slope_perc or by combining the two data transformations over the four datasets. These three models achieved improvements of 16.4% and 4.79% on average for the Amazon and used cars datasets, respectively, when VRR = 1 and slope_perc = 0.5. The tree-based models, SK_GB, SK_RF, HO_GB, HO_XGB, and HO_RF, achieved improvements of 16.1% and 6.0% on average for the Amazon and used cars datasets, respectively, when VRR = 1 and slope_perc = 0.5.
Finally, the neural network-based models, SK_MLP and GRU, achieved improvements of 12.6% and 5.2% on average for the Amazon and used cars datasets, respectively, when V R R = 1 and s l o p e _ p e r c = 0.5 ; the latter dataset utilized the SK_MLP model only.
Third, for the Amazon dataset, all the regressor models show an overall improvement in the four evaluation metric scores when the data transformations were applied with the values mentioned in the experiments. Of note, the non-linear models achieved only a 12.83% improvement on average on the Amazon dataset when VRR = 1 and slope_perc = 0.5. The Mercari dataset showed a behavior similar to the Amazon dataset, except when the two data transformations are combined with slope_perc = 0.5; in this situation, the evaluation metric scores declined drastically for all non-linear models, −26.496% on average for the used cars dataset.
The Airbnb and used cars datasets show an overall improvement when the slope_perc parameter value is not high (i.e., 10^−4), although the evaluation metric scores before applying the back-transformation were extremely high, as shown in Table 5. This indicates that the back-transformation of the proposed method was not effective with a non-linear model when the slope_perc value was high.
Finally, the obtained results show the superiority of the GRU regression model on the textual datasets. The Amazon, Mercari, and Airbnb datasets have textual features, and these three datasets gave the best results using the GRU regression model.

4.6. Opportunities and Limitations of the Proposed Work

The proposed data transformations were validated on four well-known e-commerce datasets, and the obtained results show that training on the datasets after applying the proposed data transformations improved the scores of the evaluation metrics. Whereas the proposed idea is validated only using e-commerce datasets, it can be extended to any field where the dataset contains categorical data. The main target of the proposed work is to reduce the similarity between the different data categories to ease the prediction tasks. Thus, there are opportunities to use the proposed work in fields other than e-commerce.
The main limitation of the proposed work is parameter tuning. The proposed variance reduction data transformation includes the VRR parameter, the means linear alignment transformation includes the slope_perc parameter, and the back-transformation includes the method parameter (i.e., m). For instance, in Figure 5, regardless of the values of the proposed data transformation parameters, there was an improvement in the prediction accuracy. On the contrary, the GRU model's prediction accuracy did not improve for all combinations of parameter values; only one combination improved the prediction accuracy, as shown in Figure 6. Thus, it seems that the parameter tuning of the proposed data transformations is crucial for improving the prediction accuracy, and the main challenge is how to find such a combination. Of note, this is a common problem for any machine learning or deep learning model, known as hyperparameter tuning. However, using the proposed data transformations, the user needs to tune the predictive model's parameters and the data transformation parameters as well.

5. Conclusions

E-commerce data offer items of different product categories. Multiple regression tasks on categorical data require handling groups with different distribution shapes. In this paper, we proposed a price prediction service for the items of an e-commerce platform. The proposed service relies on two data transformations to facilitate regression tasks on categorical data. The two data transformations increase the between-group variance and decrease the within-group variance so that a single item's price can belong to as few groups as possible. The proposed data transformations are evaluated over four different e-commerce datasets and four evaluation metrics. The evaluation metrics improved slightly, by approximately 2.8% on average, for the relatively small dataset with 4.5 × 10^5 observations. This improvement increased, to approximately 8.3% on average, as the dataset size increased to about 1.5 × 10^6 observations for the Amazon dataset. The GRU regression model achieved the best evaluation metric scores for the regression problems with textual features. Additionally, the best metric improvements on each dataset have average values of 16.8%, 8.0%, 6.0%, and 25.0% for R², MAE, RMSE, and MAPE, respectively. Future work includes applying the two proposed data transformations to multivariate regression problems.

Author Contributions

Conceptualization, A.F. and A.S.; methodology, A.S.; software, A.F.; validation, A.F., A.S. and A.A.; formal analysis, A.S.; investigation, A.F.; resources, A.A.; data curation, A.F.; writing—original draft preparation, A.A.; writing—review and editing, A.A.; visualization, A.A.; supervision, A.A.; project administration, A.S.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported via funding from Prince Sattam Bin Abdulaziz University project number (PSAU/2023/R/1444).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data can be found here: https://www.kaggle.com/c/mercari-price-suggestion-challenge, accessed on 3 January 2023; https://www.kaggle.com/orgesleka/used-cars-database, accessed on 3 January 2023; http://insideairbnb.com/get-the-data/, accessed on 3 January 2023; http://jmcauley.ucsd.edu/data/amazon/, accessed on 3 January 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Faiz, T.; Aldmour, R.; Ahmed, G.; Alshurideh, M.; Paramaiah, C. Machine Learning Price Prediction During and Before COVID-19 and Consumer Buying Behavior. In The Effect of Information Technology on Business and Marketing Intelligence Systems; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1845–1867.
2. Laird, N.M.; Ware, J.H. Random-effects models for longitudinal data. Biometrics 1982, 38, 963–974.
3. Noor, K.; Jan, S. Vehicle price prediction system using machine learning techniques. Int. J. Comput. Appl. 2017, 167, 27–31.
4. Yang, R.R.; Chen, S.; Chou, E. AI Blue Book: Vehicle Price Prediction using Visual Features. arXiv 2018, arXiv:1803.11227.
5. Kalaiselvi, N.; Aravind, K.; Balaguru, S.; Vijayaragul, V. Retail price analytics using backpropogation neural network and sentimental analysis. In Proceedings of the 2017 Fourth International Conference on Signal Processing, Communication and Networking (ICSCN), Chennai, India, 16–18 March 2017; pp. 1–6.
6. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
7. Domingos, P.M. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87.
8. Pek, J.; Wong, O.; Wong, C. Data Transformations for Inference with Linear Regression: Clarifications and Recommendations. Pract. Assess. Res. Eval. 2017, 22, 9.
9. Huberty, C.J.; Morris, J.D. Multivariate Analysis Versus Multiple Univariate Analyses; American Psychological Association: Washington, DC, USA, 1992.
10. Yan, X.; Su, X. Linear Regression Analysis: Theory and Computing; World Scientific: Singapore, 2009.
11. Cox, N.J. Transformations: An Introduction. 2005.
12. Nicholson, D.; Paranjpe, R. A Novel Method for Predicting the End-Price of eBay Auctions. Stanford 2013. Available online: https://www.google.com.hk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwji2tT-37T-AhUE7zgGHds1AKQQFnoECAsQAQ&url=http%3A%2F%2Fcs229.stanford.edu%2Fproj2013%2Fdnicholson_rparanjpe_finalpaper_references_corrected.pdf&usg=AOvVaw1yqtlPXg-ZVDKhVm3xksDa (accessed on 3 January 2023).
13. Lee, S.; Choeh, J.Y. Predicting the helpfulness of online reviews using multilayer perceptron neural networks. Expert Syst. Appl. 2014, 41, 3041–3046.
14. Ali, A.; Fathalla, A.; Salah, A.; Bekhit, M.; Eldesouky, E. Marine data prediction: An evaluation of machine learning, deep learning, and statistical predictive models. Comput. Intell. Neurosci. 2021, 2021, 8551167.
15. Eldesouky, E.; Bekhit, M.; Fathalla, A.; Salah, A.; Ali, A. A robust UWSN handover prediction system using ensemble learning. Sensors 2021, 21, 5777.
16. Abbas, M.E.; Chengzhang, Z.; Fathalla, A.; Xiao, Y. End-to-end antigenic variant generation for H1N1 influenza HA protein using sequence to sequence models. PLoS ONE 2022, 17, e0266198.
17. Pal, N.; Arora, P.; Kohli, P.; Sundararaman, D.; Palakurthy, S.S. How Much Is My Car Worth? A Methodology for Predicting Used Cars' Prices Using Random Forest. In Proceedings of the Future of Information and Communication Conference, Singapore, 5–6 April 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 413–422.
18. Shastri, M.; Roy, S.; Mittal, M. Stock Price Prediction using Artificial Neural Model: An Application of Big Data. EAI Endorsed Trans. Scalable Inf. Syst. 2019, 6, e1.
19. Yu, L.; Jiao, C.; Xin, H.; Wang, Y.; Wang, K. Prediction on housing price based on deep learning. Int. J. Comput. Inf. Eng. 2018, 12, 90–99.
20. Tseng, K.K.; Lin, R.F.Y.; Zhou, H.; Kurniajaya, K.J.; Li, Q. Price prediction of e-commerce products through Internet sentiment analysis. Electron. Commer. Res. 2018, 18, 65–88.
21. Fathalla, A.; Salah, A.; Li, K.; Li, K.; Francesco, P. Deep end-to-end learning for price prediction of second-hand items. Knowl. Inf. Syst. 2020, 62, 4541–4568.
22. Robertson, S. Understanding inverse document frequency: On theoretical arguments for IDF. J. Doc. 2004, 60, 503–520.
23. Girden, E.R. ANOVA: Repeated Measures; Sage: Newcastle upon Tyne, UK, 1992; p. 84.
24. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
25. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
26. Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010.
27. McKinney, W. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; Volume 445, pp. 51–56.
28. Oliphant, T.E. A Guide to NumPy; Trelgol Publishing: Austin, TX, USA, 2006; Volume 1.
29. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90.
30. McAuley, J.; Targett, C.; Shi, Q.; Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52.
31. Yadav, A.; Sahay, A.; Yadav, M.R.; Bhandari, S.; Yadav, A.; Sahay, K.B. One hour Ahead Short-Term Electricity Price Forecasting Using ANN Algorithms. In Proceedings of the 2018 International Conference and Utility Exhibition on Green Energy for Sustainable Development (ICUE), Phuket, Thailand, 24–26 October 2018; pp. 1–4.
32. Yu, M.H.; Wu, J.L. CEAM: A Novel Approach Using Cycle Embeddings with Attention Mechanism for Stock Price Prediction. In Proceedings of the 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), Kyoto, Japan, 27 February–2 March 2019; pp. 1–4.
33. Law, S.; Paige, B.; Russell, C. Take a look around: Using street view and satellite images to estimate house prices. arXiv 2018, arXiv:1807.07155.
34. You, Q.; Pang, R.; Cao, L.; Luo, J. Image-based appraisal of real estate properties. IEEE Trans. Multimed. 2017, 19, 2751–2759.
Figure 1. A motivational example of within-group variation reduction. * For 68% of the data (μ ± σ).
Figure 2. A motivational example of a low between-group score.
Figure 3. A motivational example of increasing the between-group variation, where the dependent variable (price) has two different shifting effect values.
Figure 4. A block diagram of the proposed service with the two proposed data transformations.
Figure 5. Error values for the representative regression SK model.
Figure 6. Error values for the GRU regression model.
Table 1. Scikit-learn random forest hyperparameters.

| Dataset       | n_estimators | max_depth |
|---------------|--------------|-----------|
| Amazon        | 30           | 13        |
| Mercari       | 20           | 10        |
| Inside_Airbnb | 20           | 10        |
| Cars          | 30           | 10        |
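If these settings are reproduced with scikit-learn [25], they map directly onto the regressor's constructor, as in the sketch below. The dictionary simply restates Table 1; random_state is our addition for reproducibility and is not reported in the paper.

```python
from sklearn.ensemble import RandomForestRegressor

# Per-dataset hyperparameters from Table 1.
RF_PARAMS = {
    "Amazon":        {"n_estimators": 30, "max_depth": 13},
    "Mercari":       {"n_estimators": 20, "max_depth": 10},
    "Inside_Airbnb": {"n_estimators": 20, "max_depth": 10},
    "Cars":          {"n_estimators": 30, "max_depth": 10},
}

model = RandomForestRegressor(**RF_PARAMS["Mercari"], random_state=0)
```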
Table 2. Hierarchical vs. non-hierarchical regression models.

| Dataset | Model            | R²     | MAE    | RMSE   | MAPE   |
|---------|------------------|--------|--------|--------|--------|
| Amazon  | Non-hierarchical | 0.6427 | 0.0735 | 0.1008 | 26.993 |
| Amazon  | MLE              | 0.4290 | 0.0993 | 0.1274 | 35.060 |
| Amazon  | GEE              | 0.4290 | 0.0993 | 0.1274 | 35.060 |
| Amazon  | Proposed         | 0.6928 | 0.0667 | 0.0934 | 24.610 |
| Mercari | Non-hierarchical | 0.6040 | 0.0557 | 0.0747 | 39.539 |
| Mercari | MLE              | 0.3511 | 0.0729 | 0.0956 | 54.935 |
| Mercari | GEE              | 0.3511 | 0.0729 | 0.0956 | 54.935 |
| Mercari | Proposed         | 0.6543 | 0.0511 | 0.0698 | 36.018 |
| Airbnb  | Non-hierarchical | 0.5863 | 0.0400 | 0.0568 | 10.364 |
| Airbnb  | MLE              | 0.5159 | 0.0446 | 0.0614 | 11.753 |
| Airbnb  | GEE              | 0.5159 | 0.0446 | 0.0614 | 11.753 |
| Airbnb  | Proposed         | 0.6207 | 0.0377 | 0.0545 | 9.6767 |
| Cars    | Non-hierarchical | 0.5477 | 0.0339 | 0.0481 | 18.697 |
| Cars    | MLE              | 0.5363 | 0.0347 | 0.0488 | 19.046 |
| Cars    | GEE              | 0.5363 | 0.0347 | 0.0488 | 19.046 |
| Cars    | Proposed         | 0.5640 | 0.0326 | 0.0472 | 18.000 |
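For reference, the four scores reported in Tables 2–5 can be computed with standard scikit-learn and NumPy calls as sketched below. This is a generic re-implementation, not the paper's evaluation code, and the MAPE line assumes strictly positive true prices.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred):
    """Return the four scores used throughout the experiments."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAPE": 100.0 * np.mean(np.abs((y_true - y_pred) / y_true)),
    }
```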
Table 3. The best achieved results.

| Metric |             | Amazon                                   | Mercari             | Airbnb                    | Cars                              |
|--------|-------------|------------------------------------------|---------------------|---------------------------|-----------------------------------|
| R²     | Baseline    | 0.6427 (GRU)                             | 0.6040 (GRU)        | 0.5863 (GRU)              | 0.5477 (HO_XGB)                   |
|        | Proposed    | 0.6928 (GRU, slope_perc = 10^-4)         | 0.6543 (GRU, VRR = 1) | 0.6207 (GRU, VRR = 1)   | 0.5640 (HO_GLM, VRR = 1, m = 2)   |
|        | Improvement | 7.79%                                    | 8.32%               | 5.86%                     | 2.98%                             |
| MAE    | Baseline    | 0.0735 (GRU)                             | 0.0557 (GRU)        | 0.0400 (GRU)              | 0.0339 (HO_XGB)                   |
|        | Proposed    | 0.0667 (GRU, slope_perc = 10^-4)         | 0.0511 (GRU, VRR = 1) | 0.0377 (GRU, VRR = 1)   | 0.0326 (HO_GLM & SK_LR, VRR = 1, m = 2) |
|        | Improvement | 9.25%                                    | 8.26%               | 5.75%                     | 3.83%                             |
| RMSE   | Baseline    | 0.1008 (GRU)                             | 0.0747 (GRU)        | 0.0568 (GRU)              | 0.0481 (HO_XGB)                   |
|        | Proposed    | 0.0934 (GRU, slope_perc = 10^-4)         | 0.0698 (GRU, VRR = 1) | 0.0545 (GRU, VRR = 1)   | 0.0472 (HO_GLM, VRR = 1, m = 2)   |
|        | Improvement | 7.34%                                    | 6.56%               | 4.05%                     | 1.87%                             |
| MAPE   | Baseline    | 26.993 (GRU)                             | 39.539 (GRU)        | 10.364 (GRU)              | 18.697 (HO_XGB)                   |
|        | Proposed    | 24.61 (GRU, slope_perc = 10^-4, VRR = 1) | 36.018 (GRU, VRR = 1) | 9.6767 (GRU, slope_perc = 0) | 18.000 (ALL, VRR = 1, m = 2) |
|        | Improvement | 8.83%                                    | 8.91%               | 6.63%                     | 3.73%                             |
Table 4. Best improved model.

| Metric |             | Amazon                      | Mercari               | Airbnb                      | Cars                   |
|--------|-------------|-----------------------------|-----------------------|-----------------------------|------------------------|
| R²     | Baseline    | 0.3730 (HO_GB)              | 0.2567 (HO_GB)        | 0.4214 (SK_RF)              | 0.5011 (SK_GB)         |
|        | Proposed    | 0.4324 (VRR = 1, m = 2)     | 0.3275 (VRR = 1, m = 2) | 0.4800 (slope_perc = 10^-4) | 0.5521 (VRR = 1, m = 2) |
|        | Improvement | 15.92%                      | 27.60%                | 13.92%                      | 10.18%                 |
| MAE    | Baseline    | 0.0735 (GRU)                | 0.0557 (GRU)          | 0.0400 (GRU)                | 0.0365 (SK_GB)         |
|        | Proposed    | 0.0667 (slope_perc = 10^-4) | 0.0511 (VRR = 1)      | 0.0377 (slope_perc = 0)     | 0.0332 (VRR = 1, m = 2) |
|        | Improvement | 9.25%                       | 8.26%                 | 5.75%                       | 9.04%                  |
| RMSE   | Baseline    | 0.1008 (GRU)                | 0.0747 (GRU)          | 0.0649 (HO_RF)              | 0.0505 (SK_GB)         |
|        | Proposed    | 0.0934 (slope_perc = 10^-4) | 0.0698 (VRR = 1)      | 0.0615 (slope_perc = 0)     | 0.0478 (VRR = 1, m = 2) |
|        | Improvement | 7.34%                       | 6.56%                 | 5.24%                       | 5.35%                  |
| MAPE   | Baseline    | 101.41 (SK_MLP)             | 65.231 (HO_GB)        | 12.708 (SK_GB)              | 19.51 (SK_GB)          |
|        | Proposed    | 35.517                      | 56 (VRR = 1)          | 11 (VRR = 1, m = 2)         | 18.00 (VRR = 1, m = 2) |
|        | Improvement | 64.98%                      | 14.15%                | 13.44%                      | 7.74%                  |
Table 5. Different slope values for the Mercari dataset.

| Setting                      | R² (SK_LR) | MAE (SK_LR) | RMSE (SK_LR) | MAPE (SK_LR) | R² (GRU) | MAE (GRU) | RMSE (GRU) | MAPE (GRU) |
|------------------------------|------------|-------------|--------------|--------------|----------|-----------|------------|------------|
| VRR = 1 & slope_perc = None  | 0.5335     | 0.0364      | 0.0478       | 14.731       | 0.7140   | 0.0278    | 0.0374     | 11.111     |
| slope_perc = 0               | 0.25379    | 0.0630      | 0.0826       | 21.348       | 0.5446   | 0.0480    | 0.0645     | 15.842     |
| slope_perc = 10^-4           | 0.25382    | 0.0631      | 0.0828       | 21.437       | 0.5485   | 0.0481    | 0.0644     | 16.158     |
| slope_perc = 0.5             | 0.9994     | 0.0045      | 0.0059       | 1.8379       | 0.9996   | 0.0038    | 0.0050     | 1.7401     |
| slope_perc = 0.5 & VRR = 1   | 0.9999     | 0.0022      | 0.0029       | 0.7246       | 0.9999   | 0.0020    | 0.0027     | 0.6522     |
Table 6. Models' sizes on disk in KB and model fitting times in minutes. Each cell gives size on disk (KB) / fitting time (min).

| Model   | Amazon               | Mercari               | Airbnb               | Cars               |
|---------|----------------------|-----------------------|----------------------|--------------------|
| SK_LSVR | 9 / 1.665            | 10 / 1.173            | 10 / 1.189           | 4 / 0.449          |
| SK_LR   | 17 / 3.447           | 18 / 0.861            | 18 / 0.267           | 6 / 0.037          |
| SK_MLP  | 13,703 / 73.25       | 14,915 / 11.54        | 14,711 / 4.328       | 5513 / 1.274       |
| SK_GB   | 1332 / 10.9          | 1331 / 63.5           | 1333 / 8.74          | 133 / 7.452        |
| SK_RF   | 12,142 / 14.57       | 1839 / 3.208          | 1366 / 1.048         | 2384 / 0.218       |
| HO_GLM  | 8098 / 0.149         | 9783 / 0.118          | 9403 / 0.069         | 825 / 0.042        |
| HO_GB   | 284 / 3.058          | 298 / 3.042           | 288 / 1.024          | 162 / 0.616        |
| HO_XGB  | 661 / 0.53           | 650 / 0.473           | 638 / 0.216          | 340 / 0.178        |
| HO_RF   | 5007 / 18.239        | 281 / 5.929           | 69 / 4.678           | 2761 / 1.354       |
| GRU     | 1,325,488 / 10.853   | 88,795 / 14.597       | 70.99 / 3.004        | 310.00 / 6.900     |
| GEE     | Not measured / 1.219 | 41,181,999 / 1.425    | 12,159,951 / 0.442   | 2,148,072 / 0.105  |
| MLE     | Not measured / 2.712 | Not measured / 3.143  | 25,348,159 / 0.863   | 3,076,098 / 0.167  |
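The two quantities in Table 6 can be measured with a recipe such as the one below: wall-clock fitting time converted to minutes and the pickled model's size on disk converted to KB. This is an assumed measurement procedure, not necessarily the one the authors used.

```python
import os
import pickle
import time


def size_and_fit_time(model, X, y, path="model.pkl"):
    """Fit the model, then report (size on disk in KB, fitting time in minutes)."""
    start = time.perf_counter()
    model.fit(X, y)
    minutes = (time.perf_counter() - start) / 60.0
    # Persist the fitted model and measure its footprint on disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    kilobytes = os.path.getsize(path) / 1024.0
    return kilobytes, minutes
```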
Table 7. Ambiguity-value metric under different transformation values for the Mercari dataset.

| Transformation                   | Largest l Value | l = 2  | l = Largest_Value |
|----------------------------------|-----------------|--------|-------------------|
| No transformation                | 113             | 99.73% | 58.23%            |
| VRR = 1                          | 112             | 99.89% | 3.04%             |
| slope_perc = 0                   | 112             | 99.89% | 3.04%             |
| slope_perc = 1 × 10^-4           | 113             | 98.87% | 80.13%            |
| slope_perc = 1 × 10^-4 & VRR = 1 | 113             | 98.88% | 80.09%            |
| slope_perc = 0.5                 | 5               | 99.81% | 2.42%             |
| slope_perc = 0.5 & VRR = 1       | 3               | 99.55% | 0.45%             |
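The ambiguity value is not formally defined in this back matter, so the function below encodes only one plausible reading, inferred from the stated goal that a single item's price should belong to as few groups as possible: for each price, l counts the groups whose mean ± one standard deviation band contains it (cf. the 68% note under Figure 1). Treat this as a hypothetical reconstruction, not the paper's definition.

```python
import numpy as np


def ambiguity_counts(prices, groups):
    """Return, per observation, how many group bands (mean +/- std) cover its price."""
    prices = np.asarray(prices, dtype=float)
    groups = np.asarray(groups)
    bands = [(prices[groups == g].mean(), prices[groups == g].std())
             for g in np.unique(groups)]
    # l value per observation; e.g., max(result) gives the "largest l value".
    return np.array([sum(abs(p - mu) <= sd for mu, sd in bands)
                     for p in prices])
```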

