Article

Consumer Purchasing Power Prediction of Interest E-Commerce Based on Cost-Sensitive Support Vector Machine

School of Economics, Hangzhou Dianzi University, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(20), 14693; https://doi.org/10.3390/su152014693
Submission received: 31 August 2023 / Revised: 4 October 2023 / Accepted: 5 October 2023 / Published: 10 October 2023
(This article belongs to the Section Economic and Business Aspects of Sustainability)

Abstract

The traditional e-commerce business chain is being reconstructed around the content of short videos and live streams, and interest e-commerce is thriving as a new trend in the e-commerce industry. Diversified content promotes the rapid development of interest e-commerce. For consumers, preferences for different content reflect their consumption level to a certain extent. The purpose of this study is to accurately predict the purchasing power level from consumer content preference and to provide new ideas for the interest e-commerce business. In this paper, new swarm intelligence algorithms are used to find the optimal misclassification costs, and three cost-sensitive models are established. On this basis, the content preferences of interest e-commerce consumers are used to predict the level of purchasing power. The results show that the content preferences of interest e-commerce consumers, such as “fashion”, “photography” and “interpretation”, have significant effects on the prediction of purchasing power at the 95% confidence level. The accuracies of the optimized cost-sensitive support vector machines in predicting consumer purchasing power are all above 0.9, the highest being 0.9792. This study effectively alleviates the tendency of classification results to be biased towards negative samples, especially when the imbalance ratio of the sample is high. It not only provides researchers with an efficient parameter optimization method, but also reveals the relationship between consumer content preference and purchasing power, providing data support for interest e-commerce operations.

1. Introduction

With the advent of 5G, consumers’ attention has further shifted to the content of short videos and live streams. According to the 51st Statistical Report on China’s Internet Development, released by the China Internet Network Information Center, as of December 2022 the number of short video users had reached 1.012 billion, a penetration rate of 94.8%, and the number of live stream users had reached 751 million, accounting for 70.3% of all Internet users. Short video and live stream platforms are currently the platforms with the fastest growth rate and the largest growth in user scale. Users not only see the good life through short videos and live streams, but also develop a new consumption awareness from them, which lays a solid foundation for the development of the interest e-commerce business. Content has become the hub that connects consumers to goods. Its form has been upgraded from text and graphics to short videos and live streams, so that multi-dimensional product information is presented clearly, with higher information density and a more vivid form. Consumers are able to fully understand the advantages and usage scenarios of goods, which stimulates their interest and ultimately leads to clear needs. However, as the consumer market returns to rationality, acquiring business through network traffic will soon run into a growth bottleneck, particularly on interest e-commerce platforms that provide consumers with rich content. Content prominently drives business, which poses an operational challenge for merchants.
Although the purchasing power of consumers is affected by many factors, such as personal economic status and the attractiveness of the product itself, an interest e-commerce platform that analyzes consumer content preference and purchasing power can deeply understand consumer preferences and interests. Personalized recommendations and advertisements can then be provided according to the needs of consumers, so as to improve the click-through rate and conversion rate of advertisements and the volume of sales. Therefore, predicting the purchasing power of interest e-commerce consumers from their content preference enables the stratified operation of consumers and better business outcomes. However, at this stage, research on interest e-commerce rarely addresses consumer content preference, and few scholars have examined the direct relationship between consumer content preference and purchasing power. In this paper, consumer purchasing power is predicted with content preference as the feature variable. This study broadens the perspective of interest e-commerce business strategy and fills this research gap to a certain extent.
In the interest e-commerce market, consumers with high purchasing power are the target customers that most businesses want to attract, and accurately identifying them is a binary classification problem. Consumers with high purchasing power account for a relatively small proportion of the consumer group, showing a typical imbalanced distribution, and on imbalanced datasets the predictions of standard classifiers tend to be biased toward the negative (majority) class. To enhance classifier performance on imbalanced data, researchers have proposed a series of optimization methods concerning the size of the training set, class priors, cost matrices and decision boundaries. Among these, resampling, cost-sensitive learning and ensemble learning have seen widespread application [1]. Resampling is a class of data-level methods that balances the positive and negative sample sizes by varying the size of the training set. Among them, the synthetic minority oversampling technique alleviates overfitting and sample imbalance, and can effectively recognize user consumption behavior [2]. Cost-sensitive learning and ensemble learning are the main algorithm-level methods for handling class imbalance. Liu et al. [3] proposed a composite support vector machine that employs an adaptive synthetic sampling algorithm while assigning different penalty factors to positive and negative samples; this approach enhances the adaptability of the classifier to different data characteristics. Zheng et al. [4] and Castro and Braga [5] incorporated prior knowledge into cost parameters using a standard support vector machine and multilayer perceptron neural networks, respectively, thereby proposing cost-sensitive classifiers that consider different misclassification costs. Wang et al. [6] introduced a cost-sensitive AdaBoost ensemble learning algorithm that updates sample weights through an iterative oversampling process, and validated its performance using decision trees and Naive Bayes as base classifiers.
Cost-sensitive learning is essentially a process of parameter optimization: it searches for the parameter configuration that achieves the best performance. Besides gradient-based optimization, Bayesian optimization and grid search, population-based methods also show promising performance. The basic idea of the swarm intelligence algorithm is to optimize model parameters by simulating animals or phenomena in nature: all individuals in the swarm adjust their positions according to their individual extremum and the global optimal solution. This kind of method is generally used for optimization problems. Huang et al. [7] employed particle swarm optimization, a genetic algorithm and grid search to optimize the parameters of the radial basis kernel function in the support vector machine, and applied it to risk identification in the railway transportation of hazardous goods. Mulyawan et al. [8], Sreedevi and Anuradha [9] and Adnan et al. [10] adopted support vector machines optimized using particle swarm optimization to classify the human development index, electrocardiogram signals and dissolved oxygen, respectively. Oh et al. [11] employed a combination of a support vector machine and a genetic algorithm to generate training data from an approximate model of general cash crops. Huo et al. [12] classified the usual faults of fuel cells using an extreme learning machine and a support vector machine optimized using a genetic algorithm. Aalizadeh et al. [13] used the ant colony algorithm to optimize the support vector machine and applied it to predict the toxicity of pollutants to water fleas. Liu et al. [14] introduced a support vector machine optimized using the crossover mutation artificial bee colony algorithm and applied it to intrusion detection. Bui et al. [15] presented a hybrid intelligent approach using the least squares support vector machine and artificial bee colony optimization for the spatial prediction of rainfall-induced landslides. Tharwat et al. [16] proposed a support vector machine parameter optimization method based on the bat algorithm, and compared it with particle swarm optimization and a genetic algorithm. These results show that models based on swarm intelligence algorithms can avoid local optima and have application value.
According to the aforementioned research findings, it is worth noting that the majority of studies based on the support vector machine do not take into account the classification error resulting from the imbalance between positive and negative samples. A cost-sensitive support vector machine adjusts the information contribution of positive and negative samples within the model by assigning a distinct misclassification cost to each class; this can be regarded as an essential parameter optimization procedure. The swarm intelligence algorithm emulates the population behavior of organisms in nature to construct a stochastic optimization algorithm, which offers numerous advantages, including a simple structure, few variables, straightforward implementation, parallel search capability and population diversity. This paper has two main research objectives. The first is to verify the optimization ability of new swarm intelligence algorithms in parameter optimization and to provide a parameter optimization method for the cost-sensitive support vector machine. The second is to predict consumers’ purchasing power from their content preferences and thereby improve operational strategies for the growth of the e-commerce business.
The rest of the paper is organized as follows. Section 2 reviews recently proposed improved swarm intelligence algorithms and their applications to practical problems. Section 3 introduces the basic models of this study, including the support vector machine, the cost-sensitive support vector machine, the whale optimization algorithm, grey wolf optimization and the salp swarm algorithm; the cost-sensitive support vector machine optimized using a swarm intelligence algorithm is then established, and its performance is verified on open-source datasets. In Section 4, the cost-sensitive support vector machine optimized using a swarm intelligence algorithm is applied to predict the purchasing power level of interest e-commerce consumers based on their content preference. Conclusions and future work are provided in Section 5.

2. Related Work

Many scholars have improved new swarm intelligence algorithms by adjusting flight strategies, integrating chaotic strategies and combining multiple strategies. Deepa and Venkataraman [17] utilized the Lévy flight strategy to adjust the position of humpback whales after position updates. He et al. [18] employed the Lévy flight strategy instead of the spiral position update mechanism of WOA to quickly jump out of local optima. Ji and Fan [19] introduced the Lévy flight strategy into the position updates of the leader and followers of the salps. Fan and Yu [20] proposed an improved Lévy flight strategy that enhances the global exploration ability of the grey wolf algorithm. Prasad et al. [21] employed a logistic chaotic map to adjust the control parameters of WOA; the experimental results showed that overfitting was effectively alleviated. Kohli and Arora [22] introduced chaotic strategies into the optimization algorithm and used different chaotic maps to adjust the key parameters of global optimization, so as to improve the convergence speed. Li et al. [23] proposed a hybrid whale optimization algorithm that combines an adaptive weight factor and Gaussian random perturbation to adjust the position update rules. Nadimi-Shahraki et al. [24] introduced an adaptive movement step design based on a new multi-trial vector approach, which combines different search strategies in the form of trial vector producers. On this basis, improved new swarm intelligence algorithms have been applied to feature selection [25,26,27], data prediction [28,29,30], engineering design [31,32], optimal power flow [33,34,35] and other scenarios.
Table 1 shows a comparison of the existing methods mentioned above. The improved methods can diversify the population, alleviate the problem of falling into local optima and improve the convergence speed. However, in these studies the optimized parameters of the support vector machine are only the penalty factor and the kernel function parameter, without considering the error caused by the difference in size between the positive and negative classes. Therefore, a cost-sensitive support vector machine optimized using a swarm intelligence algorithm is established in this paper.

3. Theoretical Basis

3.1. Cost-Sensitive Support Vector Machine

A support vector machine (SVM) is a machine learning method based on the statistical learning theory proposed by Cortes and Vapnik [36]. It is a supervised learning algorithm commonly used for regression and classification. SVM is widely used by machine learning researchers due to advantages such as high generalization ability and structural risk minimization.
Let $X = (x_1, x_2, \dots, x_n)$ be a feature matrix containing $n$ samples, $x_i$ the feature vector of the $i$th sample, and $y_i \in \{\pm 1\}$ the class label of $x_i$, $i = 1, 2, \dots, n$. In the binary classification problem, the essence of the support vector machine is to maximize the margin between the positive and negative samples and find the optimal hyperplane, which is equivalent to the optimization problem (1).
$$\min_{\omega, b, \xi_i} \ \frac{1}{2} \|\omega\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( \omega^T x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \dots, n \tag{1}$$
$\omega$ and $b$ are the normal vector and displacement term determining the hyperplane, respectively; $\xi_i$ are slack variables; and $C > 0$ is the penalty coefficient, which balances the minimization of $\|\omega\|$ against $\sum_{i=1}^{n} \xi_i$.
On non-linearly separable datasets, the support vector machine makes samples linearly separable by mapping them from the original space to a higher-dimensional feature space through a kernel function. The kernel function measures the similarity between any pair of samples $x_i$ and $x_j$. Common nonlinear kernel functions are shown in Table 2 [37].
The performance of a machine learning algorithm on a particular dataset relies heavily on hyperparameter optimization. Selecting the type of kernel function and determining its parameters are essential steps in building a support vector machine model. For example, the parameters to be optimized for the Gaussian-kernel support vector machine are the penalty coefficient $C$ and the Gaussian kernel bandwidth $\sigma$. If each of these two parameters is tested with $m$ candidate values, $m^2$ combinations must be evaluated in total; the more parameters to be optimized, the more combinations are required.
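To make this combinatorial cost concrete, here is a minimal sketch (our illustration, assuming scikit-learn and synthetic data, not the authors' code) of a grid search over the two Gaussian-kernel parameters; with four candidate values each, 4 × 4 = 16 combinations are evaluated:

```python
# A minimal grid-search sketch: m candidate values per parameter means m * m fits.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],        # penalty coefficient C
    "gamma": [0.01, 0.1, 1, 10],   # gamma = 1 / (2 * sigma^2) for the RBF kernel
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)                   # 4 x 4 = 16 combinations tested
print(search.best_params_, search.best_score_)
```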
The cost-sensitive support vector machine uses different penalty coefficients for positive and negative samples, which not only reduces the classification error caused by sample imbalance, but also reflects the practical value of accurately identifying the minority class in applications. Let $C_1$ and $C_{-1}$ be the misclassification costs of positive and negative samples, respectively; the cost-sensitive support vector machine can then be expressed as problem (2).
$$\min_{\omega, b, \xi_i} \ \frac{1}{2} \|\omega\|^2 + C_1 \sum_{i \mid y_i = 1} \xi_i + C_{-1} \sum_{i \mid y_i = -1} \xi_i \quad \text{s.t.} \quad y_i \left( \omega^T x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \dots, n \tag{2}$$
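In practice, problem (2) can be instantiated with off-the-shelf tools. The sketch below is our illustration (not the authors' code): scikit-learn's SVC accepts per-class weights, and the effective penalty for class $i$ becomes $C \times$ class_weight[$i$], so the dictionary plays the role of $C_1$ and $C_{-1}$:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
y = np.where(y == 1, 1, -1)   # relabel to {+1, -1} as in problem (2)

C1, C_m1 = 10.0, 1.0          # illustrative misclassification costs C_1, C_-1
clf = SVC(kernel="rbf", C=1.0, class_weight={1: C1, -1: C_m1})
clf.fit(X, y)                 # effective penalty for class i is C * class_weight[i]
```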

3.2. New Swarm Intelligence Algorithm

3.2.1. Whale Optimization Algorithm

The whale optimization algorithm (WOA) is a recent and successful metaheuristic that simulates the hunting behavior of whales. It performs optimization by simulating the processes of searching for, encircling, pursuing and attacking prey [38]. The whales search for the target randomly and approach it through two strategies, shrinking encirclement and spiral bubble-net predation, finally converging on the optimum. Each whale searches for the target by random walk, and the specific mathematical model is shown in Equations (3) and (4).
$$\vec{D} = \left| \vec{C} \cdot \vec{X}_{rand}(t) - \vec{X}(t) \right| \tag{3}$$
$$\vec{X}(t+1) = \vec{X}_{rand}(t) - \vec{A} \cdot \vec{D} \tag{4}$$
$\vec{X}$ is the whale position vector, $t$ is the current iteration, $\vec{X}_{rand}$ is the position vector of a randomly chosen whale, and $\vec{D}$ is the distance between the whale and the random whale. The vectors $\vec{A}$ and $\vec{C}$ are defined in Equations (5) and (6).
$$\vec{A} = 2 \vec{a} \cdot \vec{r} - \vec{a} \tag{5}$$
$$\vec{C} = 2 \vec{r} \tag{6}$$
$\vec{r}$ is a random vector in $(0, 1)$, and $a = 2 - 2t/t_{max}$, where $t_{max}$ is the maximum number of iterations, so $a$ decreases linearly from 2 to 0 over the course of the iterations. In each iteration of the whale algorithm, a whale updates its position with probability $p_0$ by shrinking encirclement or with probability $1 - p_0$ by spiral movement, as shown in Equation (7).
$$\vec{X}(t+1) = \begin{cases} \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, & p < p_0 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t), & p \ge p_0 \end{cases}, \qquad \vec{D} = \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right|, \quad \vec{D}' = \left| \vec{X}^{*}(t) - \vec{X}(t) \right| \tag{7}$$
$\vec{X}^{*}$ is the position vector of the best whale, $b$ is a constant limiting the curvature of the spiral, and $l$ is a random number in $[-1, 1]$.
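For concreteness, a compact NumPy sketch of Equations (3)–(7) follows. It is an illustrative implementation rather than the authors' code; the population size, iteration budget, bounds, $p_0 = 0.5$ and the $|A| < 1$ exploitation switch from the original WOA paper [38] are assumptions, and fitness is maximized (e.g., classification accuracy):

```python
import numpy as np

def woa(fitness, dim, n_whales=20, t_max=50, lb=0.0, ub=1.0, p0=0.5, b=1.0):
    """Minimal whale optimization algorithm, Equations (3)-(7); maximizes fitness."""
    X = np.random.uniform(lb, ub, (n_whales, dim))      # whale positions
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmax()].copy(), fit.max()  # X*, the best whale so far
    for t in range(t_max):
        a = 2 - 2 * t / t_max                           # decreases linearly 2 -> 0
        for i in range(n_whales):
            A = 2 * a * np.random.rand(dim) - a         # Equation (5)
            C = 2 * np.random.rand(dim)                 # Equation (6)
            if np.random.rand() < p0:                   # encircling branch
                if np.abs(A).max() < 1:                 # exploit around the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                                   # explore around a random whale, Eqs. (3)-(4)
                    Xr = X[np.random.randint(n_whales)]
                    X[i] = Xr - A * np.abs(C * Xr - X[i])
            else:                                       # spiral branch of Equation (7)
                l = np.random.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.max() > best_fit:
            best_fit, best = fit.max(), X[fit.argmax()].copy()
    return best, best_fit
```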

3.2.2. Grey Wolf Optimization

Grey wolf optimization (GWO) simulates the hunting process based on the social hierarchy and collaborative mechanisms of the grey wolf pack. It achieves optimization through the stages of tracking, chasing, encircling and attacking prey [39]. In the mathematical model, the individuals with the best fitness are defined as $\alpha$, $\beta$ and $\delta$ and guide the others; all remaining grey wolves are defined as $\omega$, and their positions are updated around the three leader wolves. The models of individual position and distance to the target are consistent with those of the whale optimization algorithm. When the grey wolves encircle their prey, the positions of $\alpha$, $\beta$ and $\delta$ are constantly updated, as shown in Equations (8)–(10).
$$\vec{D}_{\alpha} = \left| \vec{C}_1 \cdot \vec{X}_{\alpha}(t) - \vec{X}(t) \right|, \quad \vec{D}_{\beta} = \left| \vec{C}_2 \cdot \vec{X}_{\beta}(t) - \vec{X}(t) \right|, \quad \vec{D}_{\delta} = \left| \vec{C}_3 \cdot \vec{X}_{\delta}(t) - \vec{X}(t) \right| \tag{8}$$
$$\vec{X}_1 = \vec{X}_{\alpha}(t) - \vec{A}_1 \cdot \vec{D}_{\alpha}, \quad \vec{X}_2 = \vec{X}_{\beta}(t) - \vec{A}_2 \cdot \vec{D}_{\beta}, \quad \vec{X}_3 = \vec{X}_{\delta}(t) - \vec{A}_3 \cdot \vec{D}_{\delta} \tag{9}$$
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{10}$$
$\vec{X}_{\alpha}$, $\vec{X}_{\beta}$ and $\vec{X}_{\delta}$ are the positions of $\alpha$, $\beta$ and $\delta$; $\vec{X}$ is the position of $\omega$; $\vec{D}_{\alpha}$, $\vec{D}_{\beta}$ and $\vec{D}_{\delta}$ are the distances between $\alpha$, $\beta$, $\delta$ and $\omega$; and $\vec{X}_1$, $\vec{X}_2$ and $\vec{X}_3$ give the direction and distance by which $\omega$ adjusts under the influence of $\alpha$, $\beta$ and $\delta$.
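A corresponding NumPy sketch of Equations (8)–(10) (again illustrative rather than the authors' implementation; the parameter defaults are assumptions):

```python
import numpy as np

def gwo(fitness, dim, n_wolves=20, t_max=50, lb=0.0, ub=1.0):
    """Minimal grey wolf optimizer, Equations (8)-(10); maximizes fitness."""
    X = np.random.uniform(lb, ub, (n_wolves, dim))
    for t in range(t_max):
        fit = np.array([fitness(x) for x in X])
        order = np.argsort(fit)[::-1]              # best fitness first
        alpha = X[order[0]].copy()                 # three leader wolves
        beta, delta = X[order[1]].copy(), X[order[2]].copy()
        a = 2 - 2 * t / t_max
        for i in range(n_wolves):
            x_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                A = 2 * a * np.random.rand(dim) - a
                C = 2 * np.random.rand(dim)
                D = np.abs(C * leader - X[i])      # Equation (8)
                x_new += leader - A * D            # Equation (9)
            X[i] = np.clip(x_new / 3, lb, ub)      # Equation (10)
    fit = np.array([fitness(x) for x in X])
    return X[fit.argmax()], fit.max()
```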

3.2.3. Salp Swarm Algorithm

The salp swarm algorithm (SSA) is also a bionic swarm intelligence algorithm; it simulates the chain search pattern of salps during predation, in which individuals follow one another in turn [40]. In the chain search process the salps form a salp chain, and the moving direction of each salp is affected only by the one ahead of it, which helps to avoid falling into local optima. To model the formation of the chain and the movement of the salps, they are divided into a leader and followers: the leader is the foremost salp in the chain, and all others are followers, which directly or indirectly follow the leader. The leader's position update is stochastic, as shown in Equation (11).
$$X^{1} = \begin{cases} F + c_1 \left( (UB - LB) c_2 + LB \right), & c_3 \ge 0 \\ F - c_1 \left( (UB - LB) c_2 + LB \right), & c_3 < 0 \end{cases} \tag{11}$$
$X^{1}$ is the updated position of the leader; $F$ is the current position of the globally optimal salp (the food source); $UB$ and $LB$ are the upper and lower boundaries of the search space; $c_1$ is the convergence factor, which balances global and local search and is determined by Equation (12); and $c_2$ and $c_3$ are random numbers in $[0, 1]$.
$$c_1 = 2 e^{-\left( \frac{4l}{L} \right)^2} \tag{12}$$
$l$ is the current iteration and $L$ is the maximum number of iterations. The followers move in chain order, as shown in Equation (13).
$$X^{i} = \frac{1}{2} \left( X^{i} + X^{i-1} \right), \quad i \ge 2 \tag{13}$$
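A NumPy sketch of Equations (11)–(13) follows (illustrative; since $c_3 \in [0, 1]$, common implementations branch at 0.5 rather than 0, and we adopt that convention here):

```python
import numpy as np

def ssa(fitness, dim, n_salps=20, t_max=50, lb=0.0, ub=1.0):
    """Minimal salp swarm algorithm, Equations (11)-(13); maximizes fitness."""
    X = np.random.uniform(lb, ub, (n_salps, dim))
    fit = np.array([fitness(x) for x in X])
    F, F_fit = X[fit.argmax()].copy(), fit.max()    # food source: best salp so far
    for l in range(1, t_max + 1):
        c1 = 2 * np.exp(-(4 * l / t_max) ** 2)      # Equation (12)
        for j in range(dim):                        # leader update, Equation (11)
            c2, c3 = np.random.rand(), np.random.rand()
            step = c1 * ((ub - lb) * c2 + lb)
            X[0, j] = F[j] + step if c3 >= 0.5 else F[j] - step
        for i in range(1, n_salps):                 # followers, Equation (13)
            X[i] = (X[i] + X[i - 1]) / 2
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.max() > F_fit:
            F_fit, F = fit.max(), X[fit.argmax()].copy()
    return F, F_fit
```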

3.3. Cost-Sensitive Support Vector Machine Optimized Using New Swarm Intelligence Algorithm

3.3.1. Evaluation Index

Based on the confusion matrix, this paper uses accuracy, recall, precision, G-mean and F1-score to evaluate the classification performance of the cost-sensitive support vector machine. The specific definitions of each index are shown in Equations (14)–(18).
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \tag{14}$$
$$Recall = \frac{TP}{TP + FN} \tag{15}$$
$$Precision = \frac{TP}{TP + FP} \tag{16}$$
$$G\text{-}mean = \sqrt{ \frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP} } \tag{17}$$
$$F1\text{-}score = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision} \tag{18}$$
TP is the number of true positive samples. TN is the number of true negative samples. FP is the number of false positive samples. FN is the number of false negative samples.
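As a compact cross-check of Equations (14)–(18), the sketch below (plain Python with NumPy, not from the paper) computes all five indices directly from the confusion-matrix counts:

```python
import numpy as np

def imbalance_metrics(tp: int, tn: int, fp: int, fn: int):
    """Equations (14)-(18): accuracy, recall, precision, G-mean and F1-score."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)               # true positive rate
    precision = tp / (tp + fp)
    g_mean = np.sqrt(recall * tn / (tn + fp))
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, recall, precision, g_mean, f1

# Example: 30 of 40 positives and 150 of 160 negatives classified correctly.
print(imbalance_metrics(tp=30, tn=150, fp=10, fn=10))
```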

3.3.2. Model Evaluation

In the cost-sensitive support vector machine optimized using a swarm intelligence algorithm, the parameters to be optimized are the misclassification costs of the positive and negative samples. To counteract the biased results caused by the difference in the number of positive and negative samples, the misclassification cost should, in theory, be negatively correlated with the amount of information a class's samples provide to the model, so that the accuracy of the model remains meaningful. When optimizing parameters with swarm intelligence algorithms, each individual's position corresponds to the parameter values to be optimized, the dimension of the position corresponds to the number of parameters, and the fitness function corresponds to the classification performance. The flowchart for establishing the cost-sensitive support vector machine optimized using the whale optimization algorithm, grey wolf optimization and the salp swarm algorithm is shown in Figure 1.
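To make the flowchart concrete, the following hedged sketch wires a swarm optimizer to the cost-sensitive model: each individual's position encodes the candidate misclassification costs $(C_1, C_{-1})$, and the fitness is the cross-validated accuracy of the resulting class-weighted SVM. It reuses the `woa` sketch from Section 3.2.1; the data, search bounds and names are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_data, y_data = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
y_data = np.where(y_data == 1, 1, -1)      # labels in {+1, -1}

def fitness(position):
    """Position encodes the candidate misclassification costs (C_1, C_-1)."""
    c_pos, c_neg = position
    clf = SVC(kernel="rbf", class_weight={1: c_pos, -1: c_neg})
    return cross_val_score(clf, X_data, y_data, cv=5).mean()

# `woa` is the sketch from Section 3.2.1; `gwo` or `ssa` can be swapped in identically.
best_costs, best_acc = woa(fitness, dim=2, n_whales=20, t_max=10, lb=0.01, ub=100.0)
print(best_costs, best_acc)
```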
In addition, the support vector machine (SVM), the cost-sensitive support vector machine optimized using grid search (GS-CSSVM), the cost-sensitive support vector machine optimized using a genetic algorithm (GA-CSSVM), the cost-sensitive support vector machine optimized using particle swarm optimization (PSO-CSSVM) and the recently proposed cost-sensitive support vector machine optimized using the dung beetle optimizer (DBO-CSSVM) [41] are established for comparison with the above three models. To verify and compare their classification performance, this paper selects 10 datasets with imbalance ratios (IR) between 1.79 and 9.98 from the open-source data platform KEEL (Knowledge Extraction based on Evolutionary Learning) [42]. The datasets are described in Table 3.
In the experiment, the fitness function and the objective function are both the classification accuracy of the model; the population size is set to 20, the maximum number of iterations is 10 and the number of runs is 10. The following conclusions can be drawn from the experimental results in Tables 4–8.
(1) The cost-sensitive support vector machines optimized using WOA, GWO and SSA perform better than SVM and the GS-optimized model in accuracy, recall, precision, G-mean and F1-score.
(2) Compared with the cost-sensitive support vector machines optimized using GA and PSO, the models optimized using WOA, GWO and SSA have somewhat lower recall, but improved accuracy, precision, G-mean and F1-score.
(3) The classification performance of the cost-sensitive support vector machines optimized using WOA, GWO and SSA remains stable even on datasets where SVM and the GS-optimized model already reach high accuracy.
(4) The cost-sensitive support vector machines optimized using WOA, GWO and SSA are slightly better than the one optimized using DBO in accuracy, recall, G-mean and F1-score.
To sum up, a cost-sensitive support vector machine optimized using WOA, GWO or SSA can avoid falling into local optima and achieves the same or higher optimization accuracy than GA and PSO. The search process is global and robust, so the approach can be widely used in practice.

4. Consumer Purchasing Power Prediction of Interest E-Commerce

4.1. Data Description

In this paper, the research data were derived from Ocean Engine, a regularly updated data platform. The consumer content preference TGI reflects the interest of the target population in a certain category of content. The consumers are divided into high and low purchasing power groups according to ratios of 1:9 (IR = 9), 2:8 (IR = 4), 3:7 (IR = 2.3), 4:6 (IR = 1.5) and 5:5 (IR = 1), which yields five sample groups. TGI measures how strongly a particular characteristic is represented in the target population relative to the overall population, and is defined in Equation (19).
$$TGI = \frac{P_x}{P_X} \tag{19}$$
$P_x$ is the proportion of individuals with a certain characteristic in the target group, and $P_X$ is the corresponding proportion in the overall population.
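As a worked numeric example of Equation (19): the TGI levels around 100 reported in Tables 9 and 10 suggest the platform scales the raw ratio by 100, an assumption we adopt in this small sketch.

```python
# Worked example of Equation (19). Scaling the raw ratio P_x / P_X by 100 is an
# assumption made here to match the TGI levels around 100 in Tables 9 and 10.
def tgi(p_x: float, p_X: float) -> float:
    return p_x / p_X * 100

# If 40% of the target group watches "fashion" content while 25% of all
# consumers do, the group over-indexes on that content:
print(tgi(0.40, 0.25))  # 160.0
```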

4.2. Feature Selection

4.2.1. Variance Homogeneity Test

The variance homogeneity test precedes the two-sample mean test and determines the form of the test statistic. Taking the sample data with IR = 9 as an example, the homogeneity test results for the content preference variances of high and low purchasing power consumers are shown in Table 9. The variances of the “parent–child”, “news” and “game” content preference variables are homogeneous, while the variances of the other content preference variables differ significantly.

4.2.2. Two-Sample Mean Test

In this paper, invalid content preferences are eliminated by testing whether there is a significant difference between the content preferences of consumers with high and low purchasing power. Again taking the sample data with IR = 9 as an example, the results of the mean test are shown in Table 10. Sixteen content preferences, such as “fashion”, “cultural education” and “photography”, differ significantly between the positive and negative samples at the 5% significance level and therefore play an important role in predicting the purchasing power of interest e-commerce consumers. This paper chooses these 16 variables as the input variables of the prediction model.
Figure 2 shows the content preference TGI of some high purchasing power consumers in the sample data of IR = 9. It can be initially concluded that consumers with high purchasing power prefer to watch content related to “fashion”, “game”, “science and technology” and so on, while they are less interested in content related to “countryside”.
The feature selection processes for the sample data with IR = 1, 1.5, 2.3 and 4 follow the same procedure and are not repeated; the final selection results are shown in Table 11, with 15, 13, 19 and 17 variables, respectively, chosen as input variables.

4.3. Prediction of Consumer Purchasing Power

To balance the weights of the input variables while retaining the original relationships in the data, the consumer content preference TGI is normalized to the interval [0, 1]. On this basis, the sample data are divided into training and test sets at a ratio of 7:3, and five-fold cross-validation is used to establish the WOA-CSSVM, GWO-CSSVM and SSA-CSSVM consumer purchasing power prediction models (a sketch of this setup is given at the end of this subsection). The population sizes are set to 20 and the maximum number of iterations to 10. The prediction results are shown in Table 12, from which the following conclusions can be drawn:
(1) Compared with the GS-CSSVM model, WOA-CSSVM, GWO-CSSVM and SSA-CSSVM improve the prediction accuracy by 0.0625 to 0.1667; for IR = 1.5, the accuracies of the WOA-CSSVM and GWO-CSSVM models increase to 0.9375, an improvement of 0.1667.
(2) The prediction accuracies of WOA-CSSVM, GWO-CSSVM and SSA-CSSVM match or exceed those of the GA-CSSVM and PSO-CSSVM models, except that the accuracy of the SSA-CSSVM model decreases slightly at IR = 1.5 and IR = 4. When the sample imbalance ratio is high, WOA-CSSVM, GWO-CSSVM and SSA-CSSVM effectively alleviate the tendency of the model to predict a single class, with the F1-score increased by 0.1197, 0.1399 and 0.1197, respectively.
(3) The WOA-CSSVM, GWO-CSSVM and SSA-CSSVM models all achieve accuracies above 0.9, with a maximum of 0.9792. GWO-CSSVM outperforms WOA-CSSVM and SSA-CSSVM in accuracy and F1-score. The specific results are shown in Figure 3.
Therefore, WOA-CSSVM, GWO-CSSVM and SSA-CSSVM have high predictive value: they allow interest e-commerce platforms to judge purchasing power from consumer content preference and demonstrate the commercial value of content.
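For reproducibility, the sketch below mirrors the setup described at the start of this subsection on synthetic stand-in data (an assumption, since the Ocean Engine TGI data are not distributed with the paper): min-max normalization to [0, 1], a stratified 7:3 train/test split and five-fold cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=160, n_features=16, weights=[0.9, 0.1],
                           random_state=0)  # stand-in for the 16 TGI features
y = np.where(y == 1, 1, -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)  # 7:3 split

pipe = make_pipeline(MinMaxScaler(),                  # normalize TGI to [0, 1]
                     SVC(kernel="rbf", class_weight={1: 10.0, -1: 1.0}))
print(cross_val_score(pipe, X_train, y_train, cv=5).mean())  # five-fold CV accuracy
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))                     # held-out test accuracy
```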

5. Conclusions

The booming e-commerce industry is significantly impacting the lives of a massive number of users and driving the transformation of consumer experiences, and the motivation for business growth is gradually shifting towards interest-driven stimulation. Content preference is the foundation for accurately identifying consumers with high purchasing power, which enables differentiated operations for users in interest e-commerce. This paper establishes cost-sensitive models optimized using the whale optimization algorithm, grey wolf optimization and the salp swarm algorithm, and evaluates their performance on publicly available datasets. The experimental results demonstrate that adjusting the misclassification costs using these new swarm intelligence algorithms can effectively address the problem of getting stuck in local optima, with accuracies comparable to or even higher than those of the genetic algorithm and particle swarm optimization. Furthermore, consumer purchasing power is predicted using the significantly different consumer content preference TGI variables. The results show that the accuracies of predicting consumer purchasing power are all above 0.9, with a maximum of 0.9792. Especially when the imbalance ratio of the sample data is high, the models effectively alleviate the bias of predictions towards negative samples, with F1-scores increased by 0.1197, 0.1399 and 0.1197, respectively.
In this paper, the whale optimization algorithm, grey wolf optimization and the salp swarm algorithm are used to optimize the cost-sensitive support vector machine, which fully demonstrates the optimization ability of these swarm intelligence algorithms in machine learning models. The cost-sensitive model assigns different misclassification costs to different classes of samples and aims to minimize the total cost of the classifier, which better matches practical problems in consumption, medical treatment, detection and other fields. In the field of interest e-commerce studied here, insight into consumer interest has become a key link between marketing and consumer decision-making. Digital consumption contains the power to drive economic development and promote economic transformation; as a new consumption mode, interest e-commerce has given rise to endogenous growth potential in the e-commerce economy, the consumption field and the supply side. Through consumer content insight in interest e-commerce, we can better understand the needs of consumers and provide more personalized services and products, so as to improve the consumer experience and increase business sales. In the future, we will explore variants of the new swarm intelligence algorithms and compare them with the best methods available at that time. In addition, we will establish a bagging ensemble learning model based on the cost-sensitive support vector machine and find the optimal weight of each base classifier using the new swarm intelligence algorithms and their variants to further verify the generalization performance.

Author Contributions

Conceptualization, R.Y. and M.Y.; methodology, M.Y.; software, M.Y. and P.S.; validation, M.Y. and P.S.; writing—review and editing, M.Y. and P.S.; funding acquisition, R.Y. and M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Foundation of China (Grant No. 21BTJ068) and the Scientific Research Fund of the Zhejiang Provincial Education Department (Grant No. Y202249766).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from open databases and from the authors.

Acknowledgments

The authors would like to sincerely thank the editors and the anonymous referees for their valuable suggestions and helpful comments, which greatly improved the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Y.X.; Chai, Y.; Hu, Y.Q.; Yin, H.P. Review of imbalanced data classification methods. Control Decis. 2019, 34, 673–688.
  2. Hu, H.J.; Zhu, D.J.; Wang, T.; He, C.; Sikder, J.; Jia, Y.C. User consumption behavior recognition based on SMOTE and improved AdaBoost. Int. J. Softw. Sci. Comput. Intell. 2022, 14, 1–20.
  3. Liu, D.Q.; Chen, Z.J.; Xu, Y.; Li, F.T. Hybrid SVM algorithm oriented to classifying imbalanced datasets. Appl. Res. Comput. 2018, 35, 1023–1027.
  4. Zheng, E.H.; Li, P.; Song, Z.H. Cost sensitive support vector machine. Control Decis. 2006, 21, 473–476.
  5. Castro, C.L.; Braga, A.P. Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 888–899.
  6. Wang, L.; Chen, H.M.; Wang, S.W. NIBoost: New imbalanced datasets classification method on cost sensitive ensemble learning. J. Comput. Appl. 2019, 39, 629–633.
  7. Huang, W.C.; Liu, H.Y.; Zhang, Y.; Mi, R.W.; Tong, C.G.; Xiao, W.; Shuai, B. Railway dangerous goods transportation system risk identification: Comparisons among SVM, PSO-SVM, GA-SVM and GS-SVM. Appl. Soft Comput. 2021, 109, 107541.
  8. Mulyawan; Dwilestari, G.; Bahtiar, A.; Basysyar, F.M.; Suarna, N. Classification of human development index using particle swarm optimization based on support vector machine algorithm. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1088, 012033.
  9. Sreedevi, G.; Anuradha, B. Feature extraction and classification of ECG signals with support vector machines and particle swarm optimisation. Int. J. Biomed. Eng. Technol. 2021, 35, 242–262.
  10. Adnan, R.M.; Dai, H.L.; Mostafa, R.R.; Parmar, K.S.; Heddam, S.; Kisi, O. Modeling multistep ahead dissolved oxygen concentration using improved support vector machines by a hybrid metaheuristic algorithm. Sustainability 2022, 14, 3470.
  11. Oh, M.S.; Chen, Z.Y.; Jahanshiri, E.; Isa, D.; Wong, Y.W. An economic feasibility assessment framework for underutilised crops using Support Vector Machine. Comput. Electron. Agric. 2020, 168, 105116.
  12. Huo, W.W.; Li, W.E.; Sun, C.; Ren, Q.; Gong, G.Q. Research on fuel cell fault diagnosis based on genetic algorithm optimization of support vector machine. Energies 2022, 15, 2294.
  13. Aalizadeh, R.; von der Ohe, P.C.; Thomaidis, N.S. Prediction of acute toxicity of emerging contaminants on the water flea Daphnia magna by ant colony optimization-support vector machine QSTR models. Environ. Sci. Process. Impacts 2017, 19, 438–448.
  14. Liu, M.; Huang, F.L.; Fu, Y.M.; Yang, X.L. Application of improved support vector machine algorithm optimized by artificial bee colony algorithm in intrusion detection. Comput. Appl. Softw. 2017, 34, 230–246.
  15. Bui, D.T.; Tran, A.T.; Nhat-Duc, H.; Nguyen, Q.T.; Duy, B.N.; Ngo, V.L.; Pradhan, B. Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization. Landslides 2017, 14, 447–458.
  16. Tharwat, A.; Hassanien, A.E.; Elnaghi, B.E. A BA-based algorithm for parameter optimization of support vector machine. Pattern Recognit. Lett. 2016, 93, 13–22.
  17. Deepa, R.; Venkataraman, R. Enhancing whale optimization algorithm with Levy flight for coverage optimization in wireless sensor networks. Comput. Electr. Eng. 2021, 94, 107359.
  18. He, X.L.; Zhang, G.; Chen, Y.H.; Yang, S.Z. Multi-class algorithm of WOA-SVM using Levy flight and elite opposition-based learning. Appl. Res. Comput. 2021, 38, 3640–3645.
  19. Ji, Y.J.; Fan, C.J. Short-term power load prediction based on WNR-CLSSA-LSTM. Intell. Comput. Appl. 2023, 13, 76–84+88.
  20. Fan, X.Z.; Yu, M. Coverage optimization of WSN based on improved grey wolf optimizer. Comput. Sci. 2022, 49, 628–631.
  21. Prasad, D.; Mukherjee, A.; Mukherjee, V. Temperature dependent optimal power flow using chaotic whale optimization algorithm. Expert Syst. 2021, 38, 12685.
  22. Kohli, M.; Arora, S. Chaotic grey wolf optimization algorithm for constrained optimization problems. J. Comput. Des. Eng. 2017, 5, 458–472.
  23. Li, J.H.; Guo, H. A hybrid whale optimization algorithm for plane block parallel blocking flowline scheduling optimization with deterioration effect in lean shipbuilding. IEEE Access 2021, 9, 131893–131905.
  24. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Faris, H. MTDE: An effective multi-trial vector-based differential evolution algorithm and its applications for engineering design problems. Appl. Soft Comput. 2020, 97, 106761.
  25. Too, J.; Mafarja, M.; Mirjalili, S. Spatial bound whale optimization algorithm: An efficient high-dimensional feature selection approach. Neural Comput. Appl. 2021, 33, 16229–16250.
  26. Khaire, U.M.; Dhanalakshmi, R. Stability investigation of improved whale optimization algorithm in the process of feature selection. IETE Tech. Rev. 2022, 39, 286–300.
  27. Pan, H.Y.; Chen, S.X.; Xiong, H.L. A high-dimensional feature selection method based on modified gray wolf optimization. Appl. Soft Comput. 2023, 135, 110031.
  28. Cui, X.W.; E, S.J.; Niu, D.X.; Chen, B.S.; Feng, J.Q. Forecasting of carbon emission in China based on gradient boosting decision tree optimized by modified whale optimization algorithm. Sustainability 2021, 13, 12302.
  29. Yang, J.X.; Lan, X.P.; Feng, Y.D.; Yang, Y.M.; Guo, Z.M. An ammunition quality evaluation method based on least squares support vector machine. Acta Armamentarii 2022, 43, 1012–1022.
  30. He, P.; Wu, W.J. Levy flight-improved grey wolf optimizer algorithm-based support vector regression model for dam deformation prediction. Front. Earth Sci. 2023, 11, 1122937.
  31. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S. An improved grey wolf optimizer for solving engineering problems. Expert Syst. Appl. 2021, 166, 113917.
  32. Chandran, V.; Mohapatra, P. Enhanced opposition-based grey wolf optimizer for global optimization and engineering design problems. Alex. Eng. J. 2023, 76, 429–467.
  33. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Abualigah, L.; Abd, E.M.; Oliva, D. EWOA-OPF: Effective whale optimization algorithm to solve optimal power flow problem. Electronics 2021, 10, 2975.
  34. Bathina, V.; Devarapalli, R.; García, M.F.P. Hybrid approach with combining cuckoo-search and grey wolf optimizer for solving optimal power flow problems. J. Electr. Eng. Technol. 2022, 18, 1637–1653.
  35. Zhang, F.; Wang, L.; Zhao, J.; Wu, L. Application of salp swarm algorithm in optimal power flow calculation for power system. Distrib. Energy 2021, 6, 35–43.
  36. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  37. Hofmann, T.; Schölkopf, B.; Smola, A.J. Kernel methods in machine learning. Ann. Stat. 2008, 36, 1171–1220.
  38. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  39. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
  40. Mirjalili, S.; Gandomi, A.H.; Mirjalili, S.Z.; Saremi, S.; Faris, H.; Mirjalili, S.M. Salp swarm algorithm: A bio-inspired optimizer for engineering design problems. Adv. Eng. Softw. 2017, 114, 163–191.
  41. Xue, J.K.; Shen, B. Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. J. Supercomput. 2022, 79, 7305–7336.
  42. Triguero, I.; González, S.; Moyano, J.M.; García, S.; Alcalá-Fdez, J.; Luengo, J.; Fernández, A.; del Jesús, M.J.; Sánchez, L.; Herrera, F. KEEL 3.0: An open source software for multi-stage analysis in data mining. Int. J. Comput. Intell. Syst. 2017, 10, 1238–1249.
Figure 1. Cost-sensitive support vector machine optimized using swarm intelligence algorithm.
Figure 2. Content preference TGI level of some high-purchasing-power consumers.
Figure 3. (a) Comparison of accuracy; (b) Comparison of F1-score.
Table 1. Comparison of existing methods.

| Method | Strategy | Optimal Point | Characteristic |
|---|---|---|---|
| Lévy Whale Optimization Algorithm [17] | Adjusting Flight Strategy | Lévy flight is embedded in the exploration process of the whale optimization algorithm. | It easily leaps out of local optima and increases the speed of convergence. |
| Cubic Lévy Salp Swarm Algorithm [19] | Adjusting Flight Strategy | Lévy flight is added to the position update strategy of the leader and followers in the salp swarm algorithm. | It expands the search range of the population and increases the optimization speed, but does not render reliable and consistent results. |
| Lévy Grey Wolf Optimization [20] | Adjusting Flight Strategy | An improved Lévy flight strategy is used to update the position of the grey wolf. | It effectively balances the global and local capabilities. |
| Chaotic Whale Optimization Algorithm [21] | Integrating Chaotic Strategy | It generates the initial population through a logistic chaotic map. | Its exploration in the search space is more dynamic and global. |
| Chaotic Grey Wolf Optimization [22] | Integrating Chaotic Strategy | The ten most relevant chaotic maps are used to update the position of the grey wolf. | It finds the optimal solution faster and improves the convergence speed of the algorithm. |
| Lévy Flight and Elite Opposition-based Whale Optimization Algorithm [18] | Multiple Strategies | It uses Lévy flight instead of the spiral strategy to update the position and introduces elite opposition-based learning to increase population diversity. | It quickly leaps out of local optima and greatly improves the probability of the algorithm finding the global optimal solution. |
| Hybrid Whale Optimization Algorithm [23] | Multiple Strategies | A tent chaotic map is introduced to initialize the population; an adaptive weight factor is used to update the position, and the Lévy flight strategy is applied to the current optimal individuals. | It improves the diversity of the initial population and balances strong global exploration with efficient local search. |
Table 2. Common nonlinear kernel functions.

| Kernel Function | Representation | Parameter Explanation |
|---|---|---|
| Polynomial Kernel | $K(x_i, x_j) = (x_i^T x_j)^d$ | $d \ge 1$ is the degree of the polynomial; it degenerates to a linear kernel when $d = 1$ |
| Gaussian Kernel | $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$ | $\sigma > 0$ is the bandwidth of the Gaussian kernel |
| Sigmoid Kernel | $K(x_i, x_j) = \tanh(\beta x_i^T x_j + \theta)$ | $\beta > 0$, $\theta < 0$ |
Table 3. Information of datasets.

| Dataset | IR | Attributes | Examples |
|---|---|---|---|
| ionosphere | 1.79 | 33 | 351 |
| glass1 | 1.82 | 9 | 214 |
| ecoli-0vs1 | 1.86 | 7 | 220 |
| iris0 | 2.00 | 4 | 150 |
| glass0 | 2.06 | 9 | 214 |
| ecoli1 | 3.36 | 7 | 336 |
| appendicitis | 4.05 | 7 | 106 |
| ecoli2 | 5.46 | 7 | 336 |
| ecoli3 | 8.60 | 7 | 336 |
| vowel0 | 9.98 | 13 | 988 |
Table 4. Accuracy of optimized cost-sensitive support vector machine.

| Dataset | IR | SVM | GS | GA | PSO | WOA | GWO | SSA | DBO |
|---|---|---|---|---|---|---|---|---|---|
| ionosphere | 1.79 | 0.8714 | 0.9000 | 0.8857 | 0.9000 | 0.9000 | 0.9000 | 0.9000 | 0.9000 |
| glass1 | 1.82 | 0.6512 | 0.6512 | 0.6977 | 0.6744 | 0.6744 | 0.6744 | 0.6744 | 0.6744 |
| ecoli-0vs1 | 1.86 | 0.9545 | 0.9773 | 0.9773 | 0.9773 | 0.9773 | 0.9773 | 0.9773 | 0.9773 |
| iris0 | 2.00 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| glass0 | 2.06 | 0.6744 | 0.6744 | 0.6744 | 0.7907 | 0.7907 | 0.8140 | 0.7907 | 0.7907 |
| ecoli1 | 3.36 | 0.8657 | 0.8209 | 0.8806 | 0.8657 | 0.8657 | 0.8657 | 0.8806 | 0.8806 |
| appendicitis | 4.05 | 0.8095 | 0.8095 | 0.8571 | 0.8571 | 0.8571 | 0.8571 | 0.8571 | 0.8095 |
| ecoli2 | 5.46 | 0.9403 | 0.9701 | 0.9701 | 0.9701 | 0.9701 | 0.9701 | 0.9701 | 0.9701 |
| ecoli3 | 8.60 | 0.8955 | 0.8955 | 0.9104 | 0.8955 | 0.9104 | 0.8955 | 0.9104 | 0.9104 |
| vowel0 | 9.98 | 0.9949 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 5. Recall of optimized cost-sensitive support vector machine.

| Dataset | IR | SVM | GS | GA | PSO | WOA | GWO | SSA | DBO |
|---|---|---|---|---|---|---|---|---|---|
| ionosphere | 1.79 | 0.6800 | 0.9200 | 0.9600 | 0.9600 | 0.9200 | 0.9600 | 0.9200 | 0.9600 |
| glass1 | 1.82 | 0.0000 | 0.0000 | 0.5333 | 0.4667 | 0.4667 | 0.4667 | 0.4667 | 0.2667 |
| ecoli-0vs1 | 1.86 | 0.8667 | 0.9333 | 0.9333 | 0.9333 | 0.9333 | 0.9333 | 0.9333 | 0.9333 |
| iris0 | 2.00 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| glass0 | 2.06 | 0.0714 | 0.0000 | 1.0000 | 0.7857 | 0.7857 | 0.8571 | 0.7857 | 0.5000 |
| ecoli1 | 3.36 | 0.4667 | 0.2667 | 0.5333 | 0.4667 | 0.4667 | 0.4667 | 0.5333 | 0.5333 |
| appendicitis | 4.05 | 0.2500 | 0.0000 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 |
| ecoli2 | 5.46 | 0.7000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| ecoli3 | 8.60 | 0.0000 | 0.0000 | 0.4286 | 0.0000 | 0.4286 | 0.0000 | 0.4286 | 0.4286 |
| vowel0 | 9.98 | 0.9444 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 6. Precision of optimized cost-sensitive support vector machine.

| Dataset | IR | SVM | GS | GA | PSO | WOA | GWO | SSA | DBO |
|---|---|---|---|---|---|---|---|---|---|
| ionosphere | 1.79 | 0.9444 | 0.8214 | 0.7742 | 0.8000 | 0.8214 | 0.8000 | 0.8214 | 0.8000 |
| glass1 | 1.82 | 0.0000 | 0.0000 | 0.5714 | 0.5385 | 0.5385 | 0.5385 | 0.5385 | 0.5714 |
| ecoli-0vs1 | 1.86 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| iris0 | 2.00 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| glass0 | 2.06 | 0.5000 | 0.0000 | 0.5000 | 0.6471 | 0.6471 | 0.6667 | 0.6471 | 0.7778 |
| ecoli1 | 3.36 | 0.8750 | 0.8000 | 0.8889 | 0.8750 | 0.8750 | 0.8750 | 0.8889 | 0.8889 |
| appendicitis | 4.05 | 0.5000 | 0.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5000 |
| ecoli2 | 5.46 | 0.8750 | 0.8333 | 0.8333 | 0.8333 | 0.8333 | 0.8333 | 0.8333 | 0.8333 |
| ecoli3 | 8.60 | 0.0000 | 0.0000 | 0.6000 | 0.0000 | 0.6000 | 0.0000 | 0.6000 | 0.6000 |
| vowel0 | 9.98 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 7. G-mean of optimized cost-sensitive support vector machine.

| Dataset | IR | SVM | GS | GA | PSO | WOA | GWO | SSA | DBO |
|---|---|---|---|---|---|---|---|---|---|
| ionosphere | 1.79 | 0.8154 | 0.9043 | 0.9004 | 0.9121 | 0.9043 | 0.9121 | 0.9043 | 0.9121 |
| glass1 | 1.82 | / | / | 0.6473 | 0.6055 | 0.6055 | 0.6055 | 0.6055 | 0.4880 |
| ecoli-0vs1 | 1.86 | 0.9309 | 0.9661 | 0.9661 | 0.9661 | 0.9661 | 0.9661 | 0.9661 | 0.9661 |
| iris0 | 2.00 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| glass0 | 2.06 | 0.2626 | / | 0.7192 | 0.7894 | 0.7894 | 0.8245 | 0.7894 | 0.6823 |
| ecoli1 | 3.36 | 0.6765 | 0.5114 | 0.7323 | 0.6765 | 0.6765 | 0.6765 | 0.7232 | 0.7232 |
| appendicitis | 4.05 | 0.4851 | / | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.4851 |
| ecoli2 | 5.46 | 0.8293 | 0.9823 | 0.9823 | 0.9823 | 0.9823 | 0.9823 | 0.9823 | 0.9823 |
| ecoli3 | 8.60 | / | / | 0.6437 | / | 0.6437 | / | 0.6437 | 0.6437 |
| vowel0 | 9.98 | 0.9718 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 8. F1-score of optimized cost-sensitive support vector machine.

| Dataset | IR | SVM | GS | GA | PSO | WOA | GWO | SSA | DBO |
|---|---|---|---|---|---|---|---|---|---|
| ionosphere | 1.79 | 0.7907 | 0.8679 | 0.8571 | 0.8727 | 0.8679 | 0.8727 | 0.8679 | 0.8727 |
| glass1 | 1.82 | / | / | 0.5517 | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.3636 |
| ecoli-0vs1 | 1.86 | 0.9286 | 0.9655 | 0.9655 | 0.9655 | 0.9655 | 0.9655 | 0.9655 | 0.9655 |
| iris0 | 2.00 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| glass0 | 2.06 | 0.1250 | / | 0.6667 | 0.7097 | 0.7097 | 0.7500 | 0.7097 | 0.6087 |
| ecoli1 | 3.36 | 0.6087 | 0.4000 | 0.6667 | 0.6087 | 0.6087 | 0.6087 | 0.6667 | 0.6667 |
| appendicitis | 4.05 | 0.3333 | / | 0.4000 | 0.4000 | 0.4000 | 0.4000 | 0.4000 | 0.3333 |
| ecoli2 | 5.46 | 0.7778 | 0.9091 | 0.9091 | 0.9091 | 0.9091 | 0.9091 | 0.9091 | 0.9091 |
| ecoli3 | 8.60 | / | / | 0.5000 | / | 0.5000 | / | 0.5000 | 0.5000 |
| vowel0 | 9.98 | 0.9714 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 9. Variance homogeneity test results.

| Feature Variable | Std. Dev. (Positive Samples) | Std. Dev. (Negative Samples) | F Statistic | p-Value |
|---|---|---|---|---|
| Fashion | 2.22 | 11.00 | 0.04 | 0.00 |
| Food | 1.77 | 8.23 | 0.05 | 0.00 |
| Cultural Education | 0.92 | 8.57 | 0.01 | 0.00 |
| Sports | 1.84 | 8.67 | 0.04 | 0.00 |
| Travel | 2.74 | 11.63 | 0.06 | 0.00 |
| Photography | 2.56 | 8.24 | 0.10 | 0.00 |
| Interpretation | 4.69 | 10.89 | 0.19 | 0.00 |
| Originality | 4.53 | 8.78 | 0.27 | 0.01 |
| Animals and Plants | 1.66 | 8.25 | 0.04 | 0.00 |
| Film and Television | 3.13 | 8.14 | 0.15 | 0.00 |
| Life | 1.69 | 8.92 | 0.04 | 0.00 |
| Parent–child | 12.80 | 10.37 | 1.52 | 0.21 |
| Automobile | 2.48 | 8.92 | 0.08 | 0.00 |
| News | 16.35 | 12.15 | 1.81 | 0.08 |
| 2D | 1.89 | 6.18 | 0.09 | 0.00 |
| Game | 12.65 | 9.69 | 1.70 | 0.11 |
| Dance | 1.35 | 9.72 | 0.02 | 0.00 |
| Military Politics Legislation Police | 1.94 | 9.27 | 0.04 | 0.00 |
| Emotion | 16.17 | 7.25 | 4.98 | 0.00 |
| Music | 2.08 | 11.07 | 0.04 | 0.00 |
| Science and Technology | 2.22 | 14.17 | 0.02 | 0.00 |
| Finance | 6.30 | 11.93 | 0.28 | 0.01 |
| Medical Treatment | 2.17 | 8.18 | 0.07 | 0.00 |
| Countryside | 9.69 | 19.96 | 0.24 | 0.00 |
Table 10. Mean test results.

| Feature Variable | Mean (Positive Samples) | Mean (Negative Samples) | T Statistic | p-Value |
|---|---|---|---|---|
| Fashion | 125.82 | 108.92 | 15.79 | 0.00 |
| Food | 105.97 | 105.29 | 0.84 | 0.40 |
| Cultural Education | 99.63 | 101.71 | −2.77 | 0.01 |
| Sports | 106.59 | 105.57 | 1.19 | 0.24 |
| Travel | 101.99 | 103.70 | −1.44 | 0.15 |
| Photography | 98.43 | 100.65 | −2.36 | 0.02 |
| Interpretation | 90.79 | 99.85 | −6.11 | 0.00 |
| Originality | 101.06 | 103.61 | −1.89 | 0.07 |
| Animals and Plants | 100.37 | 101.23 | −1.08 | 0.28 |
| Film and Television | 97.23 | 104.48 | −7.00 | 0.00 |
| Life | 92.22 | 102.68 | −12.25 | 0.00 |
| Parent–child | 91.30 | 102.25 | −3.91 | 0.00 |
| Automobile | 101.47 | 105.80 | −4.47 | 0.00 |
| News | 99.39 | 102.21 | −0.85 | 0.40 |
| 2D | 111.02 | 105.15 | 8.40 | 0.00 |
| Game | 132.97 | 108.25 | 9.37 | 0.00 |
| Dance | 108.04 | 102.82 | 5.96 | 0.00 |
| Military Politics Legislation Police | 95.96 | 102.86 | −7.58 | 0.00 |
| Emotion | 111.03 | 103.92 | 1.74 | 0.10 |
| Music | 105.04 | 102.88 | 2.04 | 0.04 |
| Science and Technology | 137.47 | 110.60 | 20.59 | 0.00 |
| Finance | 92.40 | 101.89 | −5.10 | 0.00 |
| Medical Treatment | 106.82 | 105.58 | 1.42 | 0.16 |
| Countryside | 77.85 | 98.08 | −6.88 | 0.00 |
Table 11. Feature variable screening results; cells give the test statistic with the p-value in parentheses.

| Feature Variable | IR = 1 | IR = 1.5 | IR = 2.3 | IR = 4 |
|---|---|---|---|---|
| Fashion | 6.15 (0.00) | 6.34 (0.00) | 7.18 (0.00) | 8.52 (0.00) |
| Food | 0.68 (0.50) | 0.96 (0.34) | −0.32 (0.75) | 0.13 (0.90) |
| Cultural Education | −1.81 (0.07) | −1.31 (0.19) | −2.30 (0.02) | −1.96 (0.05) |
| Sports | 0.17 (0.87) | 0.43 (0.67) | −0.20 (0.84) | 0.03 (0.98) |
| Travel | −1.33 (0.19) | −1.51 (0.13) | −1.98 (0.05) | −2.32 (0.02) |
| Photography | −3.77 (0.00) | −3.21 (0.00) | −3.35 (0.00) | −3.46 (0.00) |
| Interpretation | −3.66 (0.00) | −3.69 (0.00) | −5.16 (0.00) | −5.73 (0.00) |
| Originality | −2.01 (0.05) | −1.80 (0.07) | −3.12 (0.00) | −2.84 (0.01) |
| Animals and Plants | −1.40 (0.17) | −0.92 (0.36) | −1.22 (0.22) | −1.13 (0.26) |
| Film and Television | −2.76 (0.01) | −2.34 (0.02) | −3.74 (0.00) | −3.48 (0.00) |
| Life | −4.45 (0.00) | −4.08 (0.00) | −6.49 (0.00) | −7.92 (0.00) |
| Parent–child | −4.40 (0.00) | −4.31 (0.00) | −5.46 (0.00) | −4.95 (0.00) |
| Automobile | −2.01 (0.05) | −2.56 (0.01) | −3.87 (0.00) | −4.79 (0.00) |
| News | −2.31 (0.02) | −1.61 (0.11) | −2.05 (0.04) | −1.09 (0.28) |
| 2D | 2.43 (0.02) | 3.35 (0.00) | 2.48 (0.01) | 3.23 (0.00) |
| Game | 10.74 (0.00) | 10.23 (0.00) | 9.36 (0.00) | 8.25 (0.00) |
| Dance | 1.40 (0.16) | 1.97 (0.05) | 2.56 (0.01) | 4.09 (0.00) |
| Military Politics Legislation Police | −3.83 (0.00) | −3.83 (0.00) | −5.55 (0.00) | −5.64 (0.00) |
| Emotion | 3.73 (0.00) | 4.00 (0.00) | 3.41 (0.00) | 2.60 (0.01) |
| Music | −0.96 (0.34) | −0.58 (0.56) | −0.66 (0.51) | 0.31 (0.75) |
| Science and Technology | 8.07 (0.00) | 7.61 (0.00) | 8.78 (0.00) | 8.93 (0.00) |
| Finance | −1.73 (0.08) | −1.48 (0.14) | −3.03 (0.00) | −4.19 (0.00) |
| Medical Treatment | 1.20 (0.23) | 1.59 (0.11) | 0.34 (0.73) | 0.39 (0.70) |
| Countryside | −4.65 (0.00) | −5.16 (0.00) | −7.12 (0.00) | −8.24 (0.00) |
Table 12. Forecast results of consumer purchasing power.

| Model | Evaluation Index | IR = 1 | IR = 1.5 | IR = 2.3 | IR = 4 | IR = 9 |
|---|---|---|---|---|---|---|
| SVM | Accuracy | 0.7500 | 0.7500 | 0.8125 | 0.8542 | 0.8958 |
| SVM | F1-score | 0.7500 | 0.6000 | 0.5714 | 0.6316 | / |
| GS-CSSVM | Accuracy | 0.8333 | 0.7708 | 0.8125 | 0.8542 | 0.8958 |
| GS-CSSVM | F1-score | 0.8519 | 0.5926 | 0.5263 | 0.6667 | / |
| GA-CSSVM | Accuracy | 0.9167 | 0.9375 | 0.9167 | 0.9167 | 0.9375 |
| GA-CSSVM | F1-score | 0.9091 | 0.9143 | 0.8667 | 0.8182 | 0.7692 |
| PSO-CSSVM | Accuracy | 0.9167 | 0.9167 | 0.9167 | 0.9375 | 0.9375 |
| PSO-CSSVM | F1-score | 0.9130 | 0.8889 | 0.8333 | 0.8235 | 0.7692 |
| WOA-CSSVM | Accuracy | 0.9167 | 0.9375 | 0.9167 | 0.9375 | 0.9792 |
| WOA-CSSVM | F1-score | 0.9200 | 0.9189 | 0.8667 | 0.8571 | 0.8889 |
| GWO-CSSVM | Accuracy | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9792 |
| GWO-CSSVM | F1-score | 0.9362 | 0.9231 | 0.8966 | 0.8571 | 0.9091 |
| SSA-CSSVM | Accuracy | 0.9375 | 0.9167 | 0.9375 | 0.9167 | 0.9792 |
| SSA-CSSVM | F1-score | 0.9388 | 0.9000 | 0.8889 | 0.8182 | 0.8889 |