Article

Comparing Two Deep-Learning Models in Discrete Choice Analysis of Customers’ Mobile Plan Preferences

Department of Statistics, School of Mathematics, Statistics and Computer Science, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Durban 3630, South Africa
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4616; https://doi.org/10.3390/app14114616
Submission received: 15 April 2024 / Revised: 23 May 2024 / Accepted: 24 May 2024 / Published: 27 May 2024

Abstract

A discrete choice experiment (DCE) was used to investigate students’ preferences for mobile phone plans at a South African university. Using the resulting data, this study compares the predictive performance of two machine-learning models for discrete choice analysis and makes recommendations for model selection. Using concepts from blocked fractional factorial designs, a locally optimal DCE was created for the choice sets. This contrasts with alternative approaches that, in practice, could be more difficult, especially when there is a large number of attributes. The call rate, data speed, customer service, premiums, and network coverage were the features considered. A total of 180 respondents were chosen from the student population using a two-stage sampling approach, and data were gathered through face-to-face interviews. Two deep-learning models are examined to analyze the data: the artificial neural network (ANN) and the extreme gradient boosting (XGBoost) models. Root mean square error (RMSE) and mean absolute error (MAE) were used to assess model fitness, while accuracy, precision, recall, and F1 score were used to compare the models’ performance. The results showed that XGBoost performs better than ANN in both model fitness and prediction. The use of the XGBoost model in choice preference modeling is therefore encouraged.

1. Introduction

All facets of society, including the administration of our personal lives, as well as the educational, social, industrial, and economic sectors, depend heavily on communication. It now permeates every aspect of our daily lives. Over the past three decades, there have been major technological breakthroughs in the mobile communication sector. From 1G and 2G, which essentially allowed users to speak and send text messages via their mobile phones, to 3G, which permitted connection to the internet and reception of multimedia communications, mobile communication technology has advanced. Due to 4G’s extremely fast speed, mobile internet became widely available. In operation since 2018, 5G technology has a speed of 10 gigabits per second (Gbps), which is 10 to 100 times faster than that of 4G [1,2].
Profit maximization is the aim of every sector, including telecommunications. Given how fiercely competitive the telecommunications sector is, service providers must go above and beyond to satisfy clients in order to both retain existing customers and win over new ones. The natural tendency of consumers is to select goods or services that satisfy them, and they typically take several factors into account when making this decision. Telecommunication firms must therefore identify their customers’ wants and perceptions regarding price, product, promotion, service quality, customer service, and other aspects that may influence their choice of service provider in order to develop sustainable, high-performing mobile broadband [3,4]. This would make it possible for the service providers to plan their operations profitably.
There is a wealth of literature on prior studies on customer preferences for mobile telephony networks and services. In order to discover important characteristics influencing Bangladeshi consumers’ choice of telecommunication service providers, Haque et al. [5] conducted a study using data obtained via non-probability convenience sampling methods and a methodology based on structural equation modeling (SEM). The findings indicated that Bangladeshi customers gave price the utmost weight when deciding which telecommunications service provider to use.
Using the SEM methodology, Olatokun and Nwonne [6] looked at the variables that affect consumers’ decisions regarding their mobile service provider in Nigeria. From the study population, 367 respondents were selected using convenience sampling, a non-probabilistic sampling technique. A survey was conducted through face-to-face contact with mobile phone users, and questionnaires were used to collect data. The findings showed that call rate, service quality, and service accessibility were the aspects that mobile phone consumers gave the most thought to when making their decisions. Dagli and Jenkins [7] used a discrete choice experiment (DCE) to research the variables North Cyprus consumers thought were most crucial for enhancing mobile phone services. The decision scenarios were built using an orthogonal main effects plan (OMEP), and the respondents were chosen using an exogenous stratified random sampling (ESRS) probability sampling technique.
According to the findings, people valued unlimited internet use, faster internet speeds, and unrestricted roaming the most. Confraria et al. [8] employed a DCE to determine what Portuguese consumers value most when choosing a mobile phone service. The study’s decision scenarios were created using an orthogonal main effects plan (OMEP). The authors demonstrated that consumers would spend an additional 1.3 euros per month for a commitment period lowered from one year to six months. To overcome the drawbacks of the non-probability sampling approaches employed in [5,6], a probability two-stage sampling technique was applied in this study to choose our respondents. In addition, a locally optimal DCE was constructed for our choice sets using readily available principles from blocked fractional factorial designs (BFFD), as opposed to alternative approaches that might be more technically challenging, especially when the number of attributes is large. In a typical DCE, respondents are shown samples of hypothetical scenarios that were selected a priori from all feasible scenarios using statistical design approaches.
In each scenario, each respondent must choose only one answer. Options for each scenario include a number of attributes with at least two levels. It is anticipated that the respondent would select the choice that would benefit them the most, or the option with the highest utility. The degree to which each feature influences the decision maker’s choice is measured by ensuring a specific amount of variety in the scenarios [9].
Numerous study areas, such as marketing science [10], energy prices [11,12], infection contact tracing in healthcare [13], healthcare drug administration [14,15], sustainable agriculture [16,17], tourism development [18,19], sustainable investment from private investors [19], food research [20], human resource policy intervention [21], and transportation infrastructure management [8], have found use for discrete choice experiments. More particularly, the use of DCE in this study is centered on telecoms, with a summary of studies presented in Table 1 below.
From Table 1, discrete choice analysis has been applied in the context of telecoms to elicit customers’ preferences, but scarcely using ML models. The multinomial logit (MNL) model proposed by McFadden [26] is the most popular model for discrete choice analysis. With the MNL model, choice probabilities have a closed form and are easily interpretable [27]; however, strict statistical assumptions imply that the model captures all sources of variability and correlation between alternatives (and individuals) using a linear combination of covariates [28]. Furthermore, according to Lheritier et al. [29], the MNL model formulation requires domain knowledge to incorporate sources of heterogeneity and to introduce non-linearity on the attributes of the alternatives. Different versions of the MNL model have been proposed in a bid to add more flexibility to its underlying assumptions. The most common approach to capturing this response heterogeneity in mode choice is the mixed multinomial logit (MMNL) model [30,31,32].
However, compared to classic statistical models, ML approaches are a viable alternative for choice modeling. They provide an alternate approach to standard prediction modeling that may overcome existing shortcomings. By using data-driven learning to represent complex relationships between variables, machine-learning techniques avoid making strong prior assumptions [33]. ML models are more flexible and have better prediction accuracy because they learn from the data [34]. They are able to represent complex relationships [35], demonstrating outstanding results in many areas. Many studies have demonstrated that ML approaches outperform logit models in terms of predictive power [35,36,37,38]. The cited studies have made substantial contributions to the use of ML in discrete choice problems. Weng et al. [39] used an SVM model to study the behavior of public transportation choice. In a large-scale problem, Liang et al. [40] used SVM to estimate the household travel choice mode. Tang et al. [41] simulated travel mode decisions using the classification tree (CT) technique under dynamic behavioral processes. Pitombo et al. [42] estimated mode choice based on location and socioeconomic characteristics using CT and other approaches. The random forest (RF) ensemble method has been employed by other scholars such as Sekhar et al. [43], who looked at the mode choice behavior of travelers in Delhi. Cheng et al. [36] also investigated the interpretation and predictive power of an RF to simulate travel mode decision behavior. The extreme gradient boosting (XGBoost) model is another ML method that has been applied in travel demand prediction. Wang and Ross [44] employed XGBoost to analyze data from transport mode choice research carried out in the Delaware Valley region. Neural networks (NNs) have also been applied to predict modal choice [41].
All these studies demonstrate the predictive power of ML techniques, but most of them also critique the black-box system that underlies the predictions they make. Hence, a key benefit of logit models, which makes them more attractive than ML, is their interpretability, ascertained from the regression coefficients. However, in recent times, a number of travel behavior modeling studies [35,36,44] have investigated the possibility of using model-agnostic methods, like permutation feature importance, to extract interpretable information from ML models.
Despite the wide application of discrete choice modeling, as can be seen in Table 1, there is sparse application of DCE in telecoms and scarcely any using deep-learning models. Hence, the objective of this study is to employ two deep-learning models (the artificial neural network (ANN) and extreme gradient boosting (XGBoost)) rather than traditional econometric models (such as the MNL) to analyze discrete choice data. The discrete choice experiment (DCE) investigated customers’ preferences for mobile plans, using a South African university as a case study. It builds upon a previous study [24] that utilized blocked fractional factorial designs to construct a locally optimal DCE for the choice sets. Root mean square error (RMSE) and mean absolute error (MAE) were used to assess the models’ fitness, while accuracy, precision, recall, and F1 score were used to compare the models’ performance. Having applied the deep-learning models to the discrete choice analysis of customers’ mobile plan preferences, their applicability to other areas and cases becomes more straightforward, thereby overcoming the constraints of the traditional approach to DCE and achieving higher model accuracy. This study is envisaged to have decision-making implications for telecommunication companies in enhancing customer satisfaction.
The remaining part of this article is organized as follows: the study’s methodology is presented in Section 2. Section 3 presents results and discussion, while Section 4 presents concluding thoughts.

2. Methodology

In this section, we discuss the discrete choice experiment (DCE) and its development process in this study. We also discuss the deep-learning models used in modeling customers’ preferences for mobile telecom plans. These models are the artificial neural network (ANN) model and the extreme gradient boosting (XGBoost) model.

2.1. The Discrete Choice Experiment (DCE)

The DCE is an attribute-based survey approach for assessing the advantages (utility) of, or quantifying subject preferences for, different attributes and determining how various attributes affect subject choices. In a discrete choice experiment, respondents are given samples of hypothetical situations (choice sets) that were selected a priori from among all feasible choice sets in accordance with the rules of statistical design. Each option in a choice set has many attributes with two or more levels, and each choice set consists of a number of options. In most cases, a single survey asks each respondent a number of multiple-choice questions, with each question presenting only one choice set. The option that the subject selects is assumed to have the highest utility in each choice set, where utility refers to the advantage that the subject obtains from making a certain decision. According to Tarfasa et al. [45], it is possible to assess the extent to which each characteristic influences the decision maker’s choice by obtaining a specific variance in the scenarios and estimating the marginal rates of substitution of the attributes. Using information gathered from a stated preference survey, we present an application of DCEs to elicit students’ preferences for mobile telecommunications plans in this work. The definitions of the different attributes and levels are presented in Table 2. Tetteh et al. [15] affirmed that the design of the DCE is important since it dictates how attributes and their levels are combined to produce choice sets. There are 1024 distinct combinations of the five attributes (call rates, data speed, customer service, premiums/incentives, and network coverage), each with four levels (i.e., 4 × 4 × 4 × 4 × 4 = 4⁵ = 1024). Given the size of the study, it was evident that this was not practicable. The FrF2 function in R was used to reduce the design to a 16-run fractional factorial experimental design using BFFD.
The DCE questionnaire is built on the options produced by this experimental design.
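As an illustration of the design’s scale, the full factorial candidate set can be enumerated in a few lines. The sketch below is hypothetical (the integer level codes 0–3 stand in for the actual attribute levels in Table 2); the 16-run fractional design used in the study was produced with the FrF2 function in R, which this sketch does not replicate.

```python
from itertools import product

# Five attributes at four levels each; the integer codes 0-3 are
# placeholders for the actual levels listed in Table 2.
attributes = {
    "call_rate": range(4),
    "data_speed": range(4),
    "customer_service": range(4),
    "premiums": range(4),
    "network_coverage": range(4),
}

# Full factorial: every possible combination of attribute levels.
full_factorial = list(product(*attributes.values()))
print(len(full_factorial))  # 4**5 = 1024 -- far too many to present to respondents
```

The point of the fractional design is to select a small, statistically efficient subset of these 1024 profiles for the questionnaire.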

2.2. Survey Design

The study was conducted at the Edgewood Campus of the University of KwaZulu-Natal in Pinetown, South Africa. Using a two-stage sampling procedure, 180 respondents were chosen from among the mobile student customers in the various dorms at the institution. More specifically, ten residences were chosen at random from the campus’s twelve residences, and 18 respondents were then chosen at random from each of the selected residences. A stated preference survey was developed. At the start of the choice task, a description of the overall decision scenario was provided. This explanation puts the options in the context of how the respondents should choose. Table 3 is an example of a choice scenario. Each respondent chose a preferred option from each of the four scenarios to complete the questionnaire. Face-to-face interviews were the main method of conducting the survey. An additional choice set, the fifth scenario (a repeat of the first scenario), was included to test for biases and the experiment’s validity. This was done to assess preference consistency: it was anticipated that rational respondents would provide identical answers to the two queries. In this study, the 74% of responses deemed “rational” (133/180) were included.
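The two-stage selection described above can be sketched as follows. The residence labels and residence sizes here are hypothetical; only the structure (10 of 12 residences sampled, then 18 students per sampled residence) follows the study.

```python
import random

rng = random.Random(2024)  # fixed seed so the illustration is reproducible

# Stage 1: simple random sample of 10 of the campus's 12 residences.
residences = [f"residence_{i}" for i in range(1, 13)]  # hypothetical labels
chosen_residences = rng.sample(residences, k=10)

# Stage 2: 18 students drawn at random within each chosen residence
# (each residence is assumed to hold 200 students, purely for illustration).
sample = {res: rng.sample(range(200), k=18) for res in chosen_residences}

total_respondents = sum(len(students) for students in sample.values())
print(total_respondents)  # 10 residences x 18 students = 180
```

Because both stages use random sampling, every student in a selected residence has a known, non-zero chance of inclusion, which is what distinguishes this design from the convenience sampling used in [5,6].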

2.3. Artificial Neural-Network Model (ANN)

An ANN is a computational model inspired by the structure and function of biological neural networks. ANNs are non-linear statistical information modeling tools that are used to model complex relationships between inputs and outputs or to discover patterns. ANNs are made up of layers of interconnected nodes, or neurons, where each neuron processes information and exchanges messages with neurons in adjacent layers [46]. The three layers of neurons are the input layer, the hidden layers, and the output layer. Data are received by the input layer and are then passed to the hidden layers. In the hidden layers, the supplied data are transformed through a number of operations so that relevant features may be extracted [46]. The output layer uses the information that has been processed to create the final output. The strength of a connection between neurons is determined by the weight assigned to each connection [47]. Prior to the activation function being applied, biases are added to the weighted sum. Activation functions enable the model to learn complex patterns by introducing non-linearity into the model; common examples are ReLU (rectified linear unit), tanh, sigmoid, and softmax (for multi-class classification). A total of 70% of the data is used for the training set, while 30% is used as a testing set [48]. Training was repeated with 80% and 90% of the data.
The neural-network data-processing algorithm involves several steps to prepare the data for training and prediction. One is data cleaning, where inconsistent data, outliers, and missing values are dealt with. Features are normalized to ensure that the scale of all input variables is similar. Another step is feature selection, where the most predictive features for the target variable are selected. Next is data splitting, where the dataset is split into training, validation, and test sets. This aids in assessing how well the model performs on unseen data. One other important step is data encoding. At this stage, input data are transformed into a suitable format for neural-network training by scaling or standardization, and categorical variables are converted to numerical representations using one-hot encoding or label encoding. To ensure effective model training, the data also need to be organized into batches. The order of the batches is randomized during training so as to lower the chance of overfitting. Data augmentation is applied to image or text data to artificially increase the diversity of the training data. Finally, a data pipeline can be constructed for loading and preprocessing data during model training and evaluation.
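The encoding and normalization steps above can be sketched with standard-library Python alone. The records and attribute levels below are hypothetical examples; in practice, preprocessing like this is usually delegated to a library such as scikit-learn.

```python
# Hypothetical raw records: one categorical and one numeric attribute.
raw = [
    {"customer_service": "poor", "call_rate": 0.5},
    {"customer_service": "average", "call_rate": 1.5},
    {"customer_service": "good", "call_rate": 2.5},
]

# One-hot encoding: each categorical level becomes its own 0/1 indicator.
levels = sorted({r["customer_service"] for r in raw})
onehot = [[int(r["customer_service"] == lvl) for lvl in levels] for r in raw]

# Min-max normalization: rescale the numeric attribute to [0, 1] so all
# inputs share a similar scale.
rates = [r["call_rate"] for r in raw]
lo, hi = min(rates), max(rates)
scaled = [(x - lo) / (hi - lo) for x in rates]
print(scaled)  # [0.0, 0.5, 1.0]
```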
Convolutional neural networks (CNNs): These networks are frequently employed in image processing, which includes normalization, augmentation, and resizing of images.
Recurrent neural networks (RNNs): When used with sequential data, such as text or time series, these require temporal alignment for time series data and tokenization, embedding, and padding for text data.
Feed-forward neural network (FNNs): This is a classification algorithm inspired by biology. It is made up of a number of simple neuron-like processing units that are arranged in layers. Every unit in a layer is related to every unit in the layer before it. It is the most basic type of neural network. Feature scaling, categorical variable encoding, and data partitioning into training, validation, and test sets are common preprocessing steps for tabular data [49,50]. The tangent hyperbolic (Tanh) activation function is used. The mathematical representation of the ANN model is given as:
$$ y = \Phi_0\!\left( \alpha + \sum_{h=1}^{H} w_h \, \Phi_h\!\left( \alpha_h + \sum_{j=1}^{J} w_{jh} X_j \right) \right) $$
where:
$y$ — choice set;
$w_{jh}$ — weight from input node $j$ to hidden node $h$;
$w_h$ — weight from hidden node $h$ to the output node;
$X_j$ — inputs: call rate, data speed, customer service, premiums, and network coverage;
$\alpha$, $\alpha_h$ — biases;
$\Phi_0$, $\Phi_h$ — activation functions (tanh).
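The single-hidden-layer model above translates directly into code. A minimal standard-library sketch follows; the weights and inputs are arbitrary illustrative numbers, not fitted values from the study.

```python
import math

def ann_forward(x, w_jh, w_h, alpha_h, alpha, phi=math.tanh):
    """Forward pass of a one-hidden-layer network with tanh activations.

    x       -- input attributes (call rate, data speed, ...)
    w_jh    -- input-to-hidden weights, indexed as w_jh[h][j]
    w_h     -- hidden-to-output weights
    alpha_h -- hidden-node biases
    alpha   -- output bias
    phi     -- activation function (tanh, as in the study)
    """
    hidden = [phi(alpha_h[h] + sum(w_jh[h][j] * x[j] for j in range(len(x))))
              for h in range(len(w_h))]
    return phi(alpha + sum(w_h[h] * hidden[h] for h in range(len(w_h))))

# Two inputs, two hidden nodes; weights chosen arbitrarily for illustration.
y = ann_forward(x=[0.5, 1.0],
                w_jh=[[0.1, -0.2], [0.3, 0.4]],
                w_h=[0.7, -0.5],
                alpha_h=[0.0, 0.1],
                alpha=0.2)
print(y)  # a value in (-1, 1), since tanh is the output activation
```

Training consists of adjusting the weights and biases to minimize a loss over the training set; this sketch shows only the prediction step that the equation describes.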
Artificial neural networks are relevant in various machine-learning applications due to their versatility and capability. They do, however, have some limitations. Apart from the fact that they have a tendency to overfit, particularly when trained on limited datasets, it is sometimes challenging to interpret their decisions. Also, substantial computational cost is required to train deep networks with lots of layers and neurons. Optimal performance is only obtainable when the correct architecture (number of layers, neurons per layer, activation functions) and optimization methods have been selected. Their performance can be enhanced by performing appropriate preprocessing processes such as normalization, standardization, and handling missing values [37].

2.4. Extreme Gradient Boosting (XGBoost) Model

XGBoost (extreme gradient boosting) is an advanced machine-learning algorithm that is particularly effective for regression and classification tasks. It belongs to the family of gradient-boosting methods and is known for its high predictive performance and ability to handle complex datasets. XGBoost builds a predictive model by iteratively combining multiple weak predictive models, called decision trees, to form a strong ensemble model [51]. The algorithm optimizes an objective function by minimizing the sum of the loss function and a regularization term, thereby reducing both bias and variance in the final model. The ensemble’s predictions for each individual tree are added together and weighted by a shrinkage factor (learning rate), to produce the XGBoost model’s prediction:
$$ \hat{Y} = \sum_{i} F_i(x) $$
where:
$\hat{Y}$ — the predicted value of the dependent variable (choice);
$\sum_{i}$ — the summation over all individual decision trees (base learners);
$F_i(x)$ — the prediction of the $i$th decision tree for the given input variables $x$ (where the $x$ variables are the call rate, data speed, customer service, premiums, and network coverage).
In XGBoost, each decision tree is constructed progressively, with succeeding trees picking up on the errors (residuals) of their predecessors. This method improves overall performance by allowing the model to concentrate on the patterns that the earlier trees were unable to identify. Regularization, gradient-based optimization, and tree pruning are just a few of the approaches that XGBoost uses to increase the accuracy of its predictions [52]. These methods enhance model robustness, decrease over-fitting, and improve generalization to new data. A particular loss function, such as mean squared error (MSE) for regression or log loss for classification, is minimized during the training phase in order to determine the parameters of the XGBoost model. The performance of the model can also be improved by adjusting hyper-parameters like the learning rate, maximum depth of the trees, and regularization parameters. Due to its capacity to handle big datasets and missing values and to perform feature selection automatically, the XGBoost method has grown in popularity. In several industries, including finance, healthcare, and natural language processing, it is now an extensively utilized tool. Programming languages like Python, R, or Scala and specialized libraries like XGBoost or scikit-learn can be used to implement the XGBoost model. These libraries offer practical tools for developing, testing, and optimizing XGBoost models. XGBoost is therefore a strong and adaptable machine-learning algorithm that uses ensemble techniques to produce precise predictions. It is a useful tool for a variety of predictive modeling jobs since it can handle complex interactions and large-scale datasets. It is also able to provide insight into feature importance, making it possible for users to determine which features have the greatest bearing on predictions.
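The additive, residual-fitting logic can be illustrated with a toy boosting loop over one-feature regression stumps. This is a didactic sketch of the gradient-boosting principle only, not the XGBoost algorithm itself, which adds regularization, second-order gradients, column subsampling, and much more.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (greedy squared error)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - (lmean if x <= threshold else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Additive ensemble: each new stump fits the current residuals."""
    trees, pred = [], [0.0] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        pred = [p + learning_rate * tree(x) for p, x in zip(pred, xs)]
    # The final prediction is the shrinkage-weighted sum over all trees.
    return lambda x: sum(learning_rate * t(x) for t in trees)

# Toy 1D regression: a step function learned by successive residual fits.
model = boost([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0])
```

Each stump corrects what the ensemble so far got wrong, and its contribution is shrunk by the learning rate before being added, which is exactly the weighted summation in the equation above.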
Although XGBoost is known for its effectiveness in a wide range of predictive modeling tasks [52], it has certain limitations that should be considered. One is that it is not as interpretable as simpler models like linear regression, although techniques like SHAP (Shapley additive explanations) values or partial dependence plots can provide some insights into model behavior. XGBoost can be sensitive to noisy data, outliers, and missing values. This can impact the model’s performance because noisy data can cause false splits. To address this, it is critical to preprocess the data by dealing with missing values, eliminating outliers, and using proper feature scaling or normalization methods. In addition, it can learn complex patterns in the training data, resulting in extremely specific models that perform well on the training set but generalize poorly to new, unseen data. It also requires careful tuning of hyperparameters to achieve optimal performance, and this can be time-consuming. A potential problem with XGBoost is that the use of large and deep trees can lead to overfitting. To address this, there is a need to tune the hyperparameters so as to regulate the size and complexity of the trees. Regularization methods can also be used, such as lambda (the L2 regularization term on weights) and gamma (the minimum loss reduction necessary to make a further split on a leaf node) [51].

2.5. Model Validation

A statistical method called cross-validation is used to evaluate how well machine-learning models generalize to new data, hence determining model performance. By offering a more reliable assessment of the model’s actual performance, it helps reduce overfitting, a prevalent issue in machine learning where a model learns to memorize the training data instead of learning the underlying patterns. The various cross-validation methods, such as leave-n-out cross-validation, k-fold cross-validation, and hold-out cross-validation, each have their benefits and applicability depending on the size, structure, and modeling goals of the dataset. For this study, the hold-out cross-validation method was applied. In hold-out cross-validation, the dataset is divided into two distinct sets: a training set and a testing set. Part of the data is set aside for testing the model’s performance, and the other portion is utilized for training the model. This entails understanding the underlying patterns in the training data, fitting the model to it, and then optimizing its parameters to minimize the selected objective function (e.g., maximizing accuracy in classification or minimizing loss in regression). The testing set, which comprises unseen data not used in training, is used to assess the model’s performance after it has been trained. This evaluation gives an estimate of the model’s ability to generalize to new, unseen instances. One disadvantage of this approach is that the findings could be affected by the specific random split of the data.
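A minimal sketch of the hold-out split, assuming a shuffled 70/30 division as used in this study:

```python
import random

def holdout_split(data, train_frac=0.7, seed=42):
    """Shuffle a copy of the data, then split it into train and test portions."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # stand-ins for the survey records
train, test = holdout_split(records)
print(len(train), len(test))  # 70 30
```

Because the split depends on the random seed, results can vary from one split to another, which is the disadvantage noted above; repeating the split (as in k-fold cross-validation) averages this variability out.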

2.6. Model Performance Measures

Accuracy, precision, recall, and F1 score were used to evaluate the models’ performance. These are common performance metrics used in cross-validation. Their mathematical expressions are as follows:
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$
$$ \text{Precision} = \frac{TP}{TP + FP} $$
$$ \text{Recall} = \frac{TP}{TP + FN} $$
$$ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
where TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative, respectively. These counts are found in the confusion matrix, and the performance metrics are then computed from them. In classification problems involving machine learning, the confusion matrix is a performance indicator. For binary classification, the true positive, true negative, false positive, and false negative counts are displayed in a 2 × 2 table. When multi-class classification is considered, the confusion matrix has a size equal to the number of classes squared.
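These four metrics follow directly from the binary confusion-matrix counts, as the following sketch shows (the label vectors are hypothetical):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the binary confusion matrix."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# One TP, one FN, one TN, one FP -> every metric equals 0.5 here.
acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
print(acc, prec, rec, f1)  # 0.5 0.5 0.5 0.5
```

Note that precision and recall are undefined when their denominators are zero; a production implementation would guard against that case.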

2.7. Model Fitness Criteria

Three criteria are used in this study to evaluate model fitness: the mean squared error (MSE), the root mean squared error (RMSE), and the mean absolute error (MAE). Each measures the discrepancy between the actual values and the values the model predicted. The MAE is a technique that can be used to assess the accuracy of predictions for continuous variables: given a set of forecasts, it determines the average size of the errors without considering their direction.
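The three fitness criteria can be computed in a few lines; the actual and predicted values below are hypothetical illustrations.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average error size, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

actual = [1.0, 0.0, 1.0, 1.0]      # hypothetical observed choices
predicted = [0.8, 0.2, 0.6, 1.0]   # hypothetical model outputs
print(round(mae(actual, predicted), 2))  # 0.2
print(rmse(actual, predicted))
```

Because the RMSE squares the errors before averaging, it penalizes large individual errors more heavily than the MAE does.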

3. Results and Discussion

For each classification model taken into consideration, the findings of the performance metrics are displayed. The modeling for each classification model uses the data divided into training and testing sets. The training set is used to fit the model, while the test data are used for forecasting with the constructed models. The model fitness results for the XGBoost and ANN models are displayed in Table 4 and Table 5, respectively.
Table 4 shows the model fitness for the XGBoost model. The table shows the accuracy, sensitivity, specificity, and AUC values of 77.78%, 80.32%, 60.71%, and 78.64% at 70% training set; 77.83%, 80.40%, 61.67%, and 77.73% at 80% training set; and 77.90%, 89.61%, 60.61%, and 79.27% at 90% training set, respectively. The table also shows that the XGBoost model produced MSE, MAE, and RMSE values of 0.15417, 0.3086, and 0.3931 at 70% training set; 0.15381, 0.30497, and 0.39219 at 80% training set; and 0.14940, 0.29790, and 0.38650 at 90% training set. This implies that the XGBoost model performs best at the training set of 90% with the lowest MSE, MAE, and RMSE values.
Table 5 above shows the model fitness for the ANN model. The table shows the accuracy, sensitivity, specificity, and the AUC values of 76.39%, 78.59%, 53.70%, and 75.6% at 70% training set; 76.93%, 79%, 54.08%, and 77.29% at 80% training set; and 77.43%, 79.49%, 57.14%, and 79.14% at 90% training set, respectively. The table also shows that the ANN model produced MSE, MAE, and RMSE values of 0.1585, 0.3236, and 0.3981 at 70% training set; 0.15504, 0.31208, and 0.39376 at 80% training set; and 0.1511, 0.3046, and 0.3887 at 90% training set. This implies that the ANN model performs best at the training set of 90% with the lowest MSE, MAE, and RMSE values.
Figure 1, Figure 2 and Figure 3 show the area under curve (AUC) plots for the XGBoost model, while Figure 4, Figure 5 and Figure 6 show the area under curve (AUC) plots for the ANN model.

4. Model Performance Result

In this section, the comparison between the models is presented using the four metrics: accuracy, precision, recall, and the F1 score. This is done at the varying training-set sizes considered (70%, 80%, and 90%) so as to determine the data split that best depicts the data. Table 6 displays the model performance comparison for the two models.
The performance metrics for the two models under consideration are shown in Table 6 above. The outcome demonstrates that the XGBoost model produced accuracy, precision, recall, and F1 score values of 77.78%, 80.32%, 92.59%, and 86.02% at 70% training set; 77.83%, 80.40%, 92.63%, and 86.87% at 80% training set; and 77.90%, 89.61%, 93.98%, and 87.20% at 90% training set, respectively. For the outcome using the ANN model, the accuracy, precision, recall, and F1 score values are 76.39%, 78.59%, 92.90%, and 85.15% at 70% training set; 76.93%, 79%, 92.95%, and 85.71% at 80% training set; and 77.43%, 79.49%, 93.20%, and 86.37% at 90% training set, respectively. This demonstrates the efficacy of these models in estimating consumer preference for mobile networks.
ML is therefore recommended for its efficacy in modeling consumer preferences. Compared to the traditional linear models used in consumer choice theory, ML models can capture complex non-linear relationships in consumer preferences. By taking into account the subtle interactions between features, the model can yield more correct predictions of consumer behavior. Since ML algorithms can easily handle large volumes of data, big data sources can be leveraged to obtain more comprehensive and data-driven insights into consumer behavior. ML approaches permit flexible model specification, which makes it possible for researchers to model consumer preferences in innovative ways. They allow for the inclusion of diverse preferences and behaviors and do not make strict assumptions about utility functions or decision-making processes. Behavioral biases like loss aversion and framing effects can be incorporated into ML models to capture more realistic decision-making scenarios. This incorporation bridges the gap between theoretical concepts and empirical findings, consequently enhancing consumer choice theory and leading to more robust models of consumer behavior. One drawback of ML models is that predictive accuracy is prioritized over interpretability. Because of this black-box nature, it could be necessary to develop new approaches (such as feature importance analysis and model-agnostic interpretation methods) to extract insights and validate model outputs.

5. Conclusions

Every company, including telecommunications providers, aims to maximize profit. To accomplish this, telecommunications firms should understand what their customers need and consider the price, product, marketing, service quality, customer service, and other elements that might affect customers' decisions about which service provider to use. Additionally, if these businesses want to maintain a steady consumer base in the future, they should emphasize addressing the demands and attitudes of young people. Accurate model predictions can help a company address these questions of preference and improve its offering to attract consumers' interest. The purpose of this study was therefore to examine the predictive performance of two machine-learning models for discrete choice analysis and to make recommendations for model selection. To build a locally optimal DCE for our choice sets, we employed readily available principles from blocked fractional factorial designs, as opposed to alternative approaches that may be considerably more difficult in practice, especially when the number of attributes is large. A two-stage sampling procedure was used to select 180 students of the University of KwaZulu-Natal, South Africa (Edgewood Campus) for the study. The data were analyzed using the XGBoost and ANN models. The findings show that, in modeling consumers' preferences for mobile telecom plans, the XGBoost model is preferred over the ANN, as it produced lower MSE, MAE, and RMSE values as well as higher accuracy, precision, and F1 score values across all training splits.
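The model-fitness criteria used in this comparison (MSE, MAE, and RMSE) can be computed from observed 0/1 choices and predicted choice probabilities as in the sketch below; the choices and probabilities are invented for illustration only:

```python
import math

def fit_metrics(y_true, y_prob):
    """MSE, MAE and RMSE between observed 0/1 choices and predicted probabilities."""
    errors = [t - p for t, p in zip(y_true, y_prob)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae, math.sqrt(mse)

# Invented observed choices and model-predicted choice probabilities
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.3, 0.5]
mse, mae, rmse = fit_metrics(y_true, y_prob)
print(f"MSE={mse:.4f} MAE={mae:.4f} RMSE={rmse:.4f}")
# MSE=0.1050 MAE=0.3000 RMSE=0.3240
```

Since RMSE is the square root of MSE, the two always rank models identically; MAE penalizes large individual errors less severely, so reporting both gives a fuller picture of fit.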
For future research, we intend to explore additional factors that may affect model performance, such as socioeconomic variables. Socioeconomic factors such as income, education, and occupation can have a direct impact on consumers' preferences and decision making. For feature selection and model building, it is important to understand which socioeconomic factors have the greatest influence on decisions.
In conclusion, consumer choice theory has undergone a paradigm shift with the use of ML to model customer preferences. It offers nuanced analytical tools, scalability, and predictive capability, although it also necessitates careful consideration of interpretability, data ethics, and the integration of behavioral insights to improve understanding of consumer decision making.

Author Contributions

Conceptualization, C.O.; methodology, C.O.; software, C.O.; validation, K.C. and T.Z.; formal analysis, C.O.; investigation, C.O.; resources, K.C. and T.Z.; data curation, C.O.; writing—original draft preparation, C.O.; writing—review and editing, K.C. and T.Z.; supervision, K.C. and T.Z.; project administration, K.C. and T.Z.; funding acquisition, K.C. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors are grateful for the support from the University of KwaZulu-Natal in undertaking the study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. da Silva, M.M.; Guerreiro, J. On the 5G and Beyond. Appl. Sci. 2020, 10, 7091. [Google Scholar] [CrossRef]
  2. Panwar, N.; Sharma, S.; Singh, A.K. A survey on 5G: The next generation of mobile communication. Phys. Commun. 2016, 18, 64–84. [Google Scholar] [CrossRef]
  3. Edquist, H.; Goodridge, P.; Haskel, J.; Li, X. How important are mobile broadband networks for the global economic development? Inf. Econ. Policy 2018, 45, 16–29. [Google Scholar] [CrossRef]
  4. Umoh, V.; Ekpe, U.; Davidson, I.; Akpan, J. Mobile Broadband Adoption, Performance Measurements and Methodology: A Review. Electronics 2023, 12, 1630. [Google Scholar] [CrossRef]
  5. Haque, A.; Rahman, S.; Rahman, M. Factors Determinants the Choice of Mobile Service Providers: Structural Equation Modeling Approach on Bangladeshi Consumers. Bus. Econ. Res. J. 2010, 1, 17–34. [Google Scholar]
  6. Wole, O.; Simeon, A.N. Determinants of Users’ Choice of Mobile Service Providers in the Nigerian Telecommunications Market. Afr. J. Comput. ICT 2012, 5, 19–32. [Google Scholar]
  7. Dagli, O.; Jenkins, G.P. Consumer preferences for improvements in mobile telecommunication services. Telemat. Inform. 2016, 33, 205–216. [Google Scholar] [CrossRef]
  8. Confraria, J.; Ribeiro, T.; Vasconcelos, H. Analysis of consumer preferences for mobile telecom plans using a discrete choice experiment. Telecomm. Policy 2017, 41, 157–169. [Google Scholar] [CrossRef]
  9. Louviere, J.; Hensher, D.A.; Swait, J. Combining sources of preference data. In Stated Choice Methods; Cambridge University Press: Cambridge, UK, 2000; pp. 227–251. [Google Scholar] [CrossRef]
  10. Elshiewy, O.; Guhl, D.; Boztug, Y. Multinomial Logit Models in Marketing—From Fundamentals to State-of-the-Art. Mark. ZFP 2017, 39, 32–49. [Google Scholar] [CrossRef]
  11. Ayansola, O.A.; Ogundunmade, T.P.; Adedamola, A.O. Modelling Willingness to Pay of Electricity Supply Using Machine Learning Approach. Mod. Econ. Manag. 2022, 1, 9. [Google Scholar] [CrossRef]
  12. Adepoju, A.A.; Oludunni, A.A.; Ogundunmade, T.P. Pettitt and Bayesian Change Point Detections in the Price of Kerosene in the Southwestern Region of Nigeria. Int. J. Data Sci. 2022, 3, 33–44. [Google Scholar] [CrossRef]
  13. Jonker, M.; de Bekker-Grob, E.; Veldwijk, J.; Goossens, L.; Bour, S.; Mölken, M.R.-V. COVID-19 Contact Tracing Apps: Predicted Uptake in the Netherlands Based on a Discrete Choice Experiment. JMIR Mhealth Uhealth 2020, 8, e20741. [Google Scholar] [CrossRef] [PubMed]
  14. Dong, D.; Xu, R.H.; Wong, E.L.-Y.; Hung, C.-T.; Feng, D.; Feng, Z.; Yeoh, E.-K.; Wong, S.Y.-S. Public preference for COVID-19 vaccines in China: A discrete choice experiment. Health Expect. 2020, 23, 1543–1578. [Google Scholar] [CrossRef]
  15. Tetteh, E.K.; Morris, S.; Titcheneker-Hooker, N. Discrete-choice modelling of patient preferences for modes of drug administration. Health Econ. Rev. 2017, 7, 26. [Google Scholar] [CrossRef]
  16. Manyise, T.; Lam, R.D.; Lazo, D.P.; Padiyar, A.; Shenoy, N.; Chadag, M.V.; Benzie, J.A.; Rossignoli, C.M. Exploring preferences for improved fish species among farmers: A discrete choice experiment applied in rural Odisha, India. Aquaculture 2024, 583, 740627. [Google Scholar] [CrossRef]
  17. Block, J.B.; Danne, M.; Mußhoff, O. Farmers’ Willingness to Participate in a Carbon Sequestration Program—A Discrete Choice Experiment. Environ. Manag. 2024, 1–18. [Google Scholar] [CrossRef] [PubMed]
  18. Grilli, G.; Tyllianakis, E.; Luisetti, T.; Ferrini, S.; Turner, R.K. Prospective tourist preferences for sustainable tourism development in Small Island Developing States. Tour. Manag. 2021, 82, 104178. [Google Scholar] [CrossRef]
  19. Lizin, S.; Rousseau, S.; Kessels, R.; Meulders, M.; Pepermans, G.; Speelman, S.; Vandebroek, M.; Van Den Broeck, G.; Van Loo, E.J.; Verbeke, W. The state of the art of discrete choice experiments in food research. Food Qual. Prefer. 2022, 102, 104678. [Google Scholar] [CrossRef]
  20. Gutsche, G.; Ziegler, A. Which private investors are willing to pay for sustainable investments? Empirical evidence from stated choice experiments. J. Bank Financ. 2019, 102, 193–214. [Google Scholar] [CrossRef]
  21. Lagarde, M.; Blaauw, D. A review of the application and contribution of discrete choice experiments to inform human resources policy interventions. Hum. Resour. Health 2009, 7, 62. [Google Scholar] [CrossRef]
  22. Sobolewski, M.; Kopczewski, T. Estimating demand for fixed-line telecommunication bundles. Telecomm. Policy 2017, 41, 227–241. [Google Scholar] [CrossRef]
  23. Sohrabi, A.; Pishvaee, M.S.; Hafezalkotob, A.; Bamdad, S. Analysis of consumer preferences for prepaid mobile internet packages in Iran: A Discrete Choice Experiment. Econ. J. Emerg. Mark. 2020, 12, 39–53. [Google Scholar] [CrossRef]
  24. Otekunrin, O.A.; Oliobi, C.E. Modelling Students’ Preferences for Mobile Telecommunication Plans: A Discrete Choice Experiment. Arch. Curr. Res. Int. 2021, 21, 1–10. [Google Scholar] [CrossRef]
  25. Kim, S.; Lee, C.; Lee, J.; Kim, J. Over-the-top bundled services in the Korean broadcasting and telecommunications market: Consumer preference analysis using a mixed logit model. Telemat. Inform. 2021, 61, 101599. [Google Scholar] [CrossRef]
  26. McFadden, D. Conditional Logit Analysis of Qualitative Choice Behavior. In Frontiers in Econometrics; Zarembka, P., Ed.; Academic Press: Cambridge, MA, USA, 1973; pp. 105–142. [Google Scholar]
  27. Train, K.E. Discrete Choice Methods with Simulation; Cambridge University Press: Cambridge, UK, 2001. [Google Scholar] [CrossRef]
  28. Uncles, M.D. Discrete Choice Analysis: Theory and Application to Travel Demand. J. Oper. Res. Soc. 1987, 38, 370–371. [Google Scholar] [CrossRef]
  29. Lhéritier, A.; Bocamazo, M.; Delahaye, T.; Acuna-Agost, R. Airline itinerary choice modeling using machine learning. J. Choice Model. 2019, 31, 198–209. [Google Scholar] [CrossRef]
  30. Paulssen, M.; Temme, D.; Vij, A.; Walker, J.L. Values, attitudes and travel behavior: A hierarchical latent variable mixed logit model of travel mode choice. Transportation 2014, 41, 873–888. [Google Scholar] [CrossRef]
  31. Yáñez, M.F.; Raveau, S.; Ortúzar, J.D. Inclusion of latent variables in Mixed Logit models: Modelling and forecasting. Transp. Res. Part A Policy Pract. 2010, 44, 744–753. [Google Scholar] [CrossRef]
  32. Zhao, X.; Yan, X.; Yu, A.; Van Hentenryck, P. Prediction and behavioral analysis of travel mode choice: A comparison of machine learning and logit models. Travel Behav. Soc. 2020, 20, 22–35. [Google Scholar] [CrossRef]
  33. Bishop, C. Pattern Recognition and Machine Learning. J. Electron. Imaging 2007, 16, 049901. [Google Scholar] [CrossRef]
  34. Lazar, A.; Ballow, A.; Jin, L.; Spurlock, C.A.; Sim, A.; Wu, K. Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4520–4524. [Google Scholar] [CrossRef]
  35. Hagenauer, J.; Helbich, M. A comparative study of machine learning classifiers for modeling travel mode choice. Expert Syst. Appl. 2017, 78, 273–282. [Google Scholar] [CrossRef]
  36. Cheng, L.; Chen, X.; De Vos, J.; Lai, X.; Witlox, F. Applying a random forest method approach to model travel mode choice behavior. Travel Behav. Soc. 2019, 14, 1–10. [Google Scholar] [CrossRef]
  37. Lee, D.; Derrible, S.; Pereira, F.C. Comparison of Four Types of Artificial Neural Network and a Multinomial Logit Model for Travel Mode Choice Modeling. Transp. Res. Rec. 2018, 2672, 101–112. [Google Scholar] [CrossRef]
  38. Paredes, M.; Hemberg, E.; O’Reilly, U.-M.; Zegras, C. Machine learning or discrete choice models for car ownership demand estimation and prediction? In Proceedings of the 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), Naples, Italy, 26–28 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 780–785. [Google Scholar] [CrossRef]
  39. Weng, J.; Tu, Q.; Yuan, R.; Lin, P.; Chen, Z. Modeling Mode Choice Behaviors for Public Transport Commuters in Beijing. J. Urban Plan Dev. 2018, 144, 05018013. [Google Scholar] [CrossRef]
  40. Liang, L.; Xu, M.; Grant-Muller, S.; Mussone, L. Household travel mode choice estimation with large-scale data—An empirical analysis based on mobility data in Milan. Int. J. Sustain. Transp. 2021, 15, 70–85. [Google Scholar] [CrossRef]
  41. Tang, D.N.; Yang, M.; Zhang, M.H. Travel Mode Choice Modeling: A Comparison of Bayesian Networks and Neural Networks. Appl. Mech. Mater. 2012, 209–211, 717–723. [Google Scholar] [CrossRef]
  42. Pitombo, C.S.; Salgueiro, A.R.; da Costa, A.S.G.; Isler, C.A. A two-step method for mode choice estimation with socioeconomic and spatial information. Spat. Stat. 2015, 11, 45–64. [Google Scholar] [CrossRef]
  43. Sekhar, C.R.; Minal; Madhu, E. Mode Choice Analysis Using Random Forrest Decision Trees. Transp. Res. Procedia 2016, 17, 644–652. [Google Scholar] [CrossRef]
  44. Wang, F.; Ross, C.L. Machine Learning Travel Mode Choices: Comparing the Performance of an Extreme Gradient Boosting Model with a Multinomial Logit Model. Transp. Res. Rec. 2018, 2672, 35–45. [Google Scholar] [CrossRef]
  45. Tarfasa, S.; Balana, B.B.; Tefera, T.; Woldeamanuel, T.; Moges, A.; Dinato, M.; Black, H. Modeling Smallholder Farmers’ Preferences for Soil Management Measures: A Case Study from South Ethiopia. Ecol. Econ. 2018, 145, 410–419. [Google Scholar] [CrossRef]
  46. Kantardzic, M. Data Mining: Concepts, Models, Methods, and Algorithms, 3rd ed.; Wiley-IEEE Press: Piscataway, NJ, USA, 2019; Available online: https://ieeexplore.ieee.org/servlet/opac?bknumber=6105606 (accessed on 10 May 2024).
  47. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning. Genet. Program. Evolvable Mach. 2018, 19, 305–307. [Google Scholar] [CrossRef]
  48. Ogundunmade, T.P.; Adepoju, A.A. The Performance of Artificial Neural Network Using Heterogeneous Transfer Functions. Int. J. Data Sci. 2021, 2, 92–103. [Google Scholar] [CrossRef]
  49. Ripley, B.D. Pattern Recognition and Neural Networks; Cambridge University Press: Cambridge, UK, 1996. [Google Scholar] [CrossRef]
  50. Shalev-Shwartz, S.; Ben-David, S. Understanding Machine Learning; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar] [CrossRef]
  51. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
  52. Parsa, A.B.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A.K. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid. Anal. Prev. 2020, 136, 105405. [Google Scholar] [CrossRef] [PubMed]
Figure 1. AUC plot for XGBoost model at 70% training set.
Figure 2. AUC plot for XGBoost model at 80% training set.
Figure 3. AUC plot for XGBoost model at 90% training set.
Figure 4. AUC plot for ANN model at 70% training set.
Figure 5. AUC plot for ANN model at 80% training set.
Figure 6. AUC plot for ANN model at 90% training set.
Table 1. Summary of related studies on the application of DCE.

[8] Analysed customer preferences for mobile plans and operator features; reported the features people are willing to pay for and how they relate to socio-demographic variables. Attributes: calling club network effects, pure network effects, length of commitment period, monthly fee/recharge obligations, per-minute call charges.

[22] Studied a sample of Polish people's subscription choices between standardised packages of fixed-line telecommunications services; willingness-to-pay estimates were obtained for the attributes. Attributes: type of fixed-line package, monthly subscription fee for the fixed-line package, outside option.

[23] Used a combination of software and paper-based interviews to study customer preferences for prepaid mobile internet bundles; deployed a Bayesian D-optimal design procedure to build choice sets from four main attributes. Traffic volume and brand qualities were found to have the greatest influence on customer behaviour. Attributes: validity period, traffic volume, price, brand attributes.

[24] Investigated factors that elicited university students' preferences for mobile phone plans; the data were analysed using the MNL model, and estimates of the marginal willingness to pay for the attributes were obtained. Attributes: call rate, data speed, customer service, premiums, network coverage.

[25] Analysed consumer preferences for bundled pay-TV and over-the-top (OTT) services; the marginal utilities of attributes were analysed using a mixed logit model. Attributes: service provider, live channel streaming, video on demand (VOD) service, bundled OTT service, discount rate (% of monthly fee).

This study: Compared two deep-learning models, the artificial neural network (ANN) and extreme gradient boosting (XGBoost), for the discrete choice analysis of customers' mobile plan preferences. Root mean square error (RMSE) and mean absolute error (MAE) were used to assess model fitness, while accuracy, precision, recall and F1 score were used to compare the models' performance; XGBoost performed better than the ANN in both model fitness and prediction. Attributes: call rate, data speed, customer service, premiums, network coverage.
Table 2. Attributes and levels.

Call rate: the cost a consumer pays the network service provider for using a mobile phone to communicate. Levels: 0 = 99c; 1 = 76c.
Data speed: the rate at which information or material is transferred from the World Wide Web to computers, tablets, or smartphones, expressed in megabits per second (Mbps). Levels: 0 = 22.22; 1 = 21.63.
Customer service: a collection of tasks designed to guarantee that customers' expectations for certain goods or services are met. Levels: 0 = not very good; 1 = very good.
Premiums: an extra incentive offered to customers to buy a good or service. Levels: 0 = low; 1 = high.
Network coverage: the region of the world where a mobile phone can be effectively used to make a call. Levels: 0 = vast; 1 = not vast.
Table 3. A sample choice set presented to participants.

Attributes             Network A       Network B   Network C   Network D
Call rate              99c             79c         79c         99c
Data speed             21.63           22.22       22.22       21.63
Customer service       Not very good   Very good   Very good   Not very good
Premiums/incentives    Low             Low         High        High
Network coverage       Vast            Not vast    Vast        Not vast
I would choose         [ ]             [ ]         [ ]         [ ]
Table 4. Model fitness for XGBoost model.

Measures       70%                 80%                 90%
Accuracy       0.7778              0.7783              0.7790
95% CI         (0.7486, 0.8051)    (0.7379, 0.8078)    (0.7216, 0.8213)
Sensitivity    0.8032              0.8040              0.8961
Specificity    0.6071              0.6167              0.6061
AUC            0.7864              0.7773              0.7927
MSE            0.15417             0.15381             0.14940
MAE            0.3086              0.30497             0.29790
RMSE           0.3931              0.39219             0.38650
Table 5. Model fitness for artificial neural network (ANN).

Measures       70%                 80%                 90%
Accuracy       0.7778              0.7783              0.7790
95% CI         (0.7486, 0.8051)    (0.7379, 0.8078)    (0.7216, 0.8213)
Sensitivity    0.8032              0.8040              0.8961
Specificity    0.6071              0.6167              0.6061
AUC            0.7864              0.7773              0.7927
MSE            0.15417             0.15381             0.14940
MAE            0.3086              0.30497             0.29790
RMSE           0.3931              0.39219             0.38650
Table 6. Model performance.

               XGBoost                       ANN
Metrics        70%      80%      90%        70%      80%      90%
Accuracy       0.7778   0.7783   0.7790     0.7639   0.7693   0.7743
Precision      0.8032   0.8040   0.8961     0.7859   0.7900   0.7949
Recall         0.9259   0.9263   0.9398     0.9290   0.9295   0.9320
F1 score       0.8602   0.8687   0.8720     0.8515   0.8571   0.8637
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Oliobi, C.; Chinhamu, K.; Zewotir, T. Comparing Two Deep-Learning Models in Discrete Choice Analysis of Customers’ Mobile Plan Preferences. Appl. Sci. 2024, 14, 4616. https://doi.org/10.3390/app14114616

