1. Introduction
With an ever-changing global economic environment and intensifying market competition [1], the bank deposit business has undergone profound changes. When interest rates rise, customers are more inclined to choose time deposits to obtain higher returns; when rates fall, they are more inclined to choose more liquid demand deposits or other investment channels [2]. In addition, inflation affects customers’ purchasing power and financial management behavior [3]. As customer needs become increasingly diversified and personalized, customers of different ages, occupations, income levels, and risk preferences hold different expectations of banks’ deposit business. Young customers tend to prefer highly liquid deposits, while middle-aged and elderly customers lean toward time deposits, which offer greater stability and security. Banks should therefore segment the market [4] to understand the needs of different customer groups and launch corresponding deposit products.
With their development, big data, artificial intelligence, and financial technology have shown great potential in analyzing and predicting customer behavior, formulating marketing strategies, and making personalized recommendations [4]. They give banks powerful tools for analyzing customer behavior, market trends, and risk, and they also open new opportunities for applying customer segmentation theory. Customer segmentation theory focuses on dividing a market’s customer base into subgroups with similar characteristics or behaviors [5], which allows marketing strategies and service plans to be tailored precisely to each group. By discovering user preferences within vast amounts of data, banks can provide differentiated services to different customer segments, retain customers at risk of attrition, and stand out in the competition for customer resources [6], laying a theoretical foundation for precise marketing.
Current research on and application of customer segmentation theory have the following shortcomings. First, customer segmentation models are overly simple. Existing models tend to neglect the diversity of customers: they often rely on single-dimensional analyses of explicit indicators such as age, assets, or account balance without integrating these indicators, and they frequently overlook implicit indicators such as customer attitudes, comments, feedback, and preferences [7]. In real banking cases, understanding implicit factors such as satisfaction with current services, brand recognition, risk tolerance, and online comments can provide deeper insight into customer needs and reveal latent demands and preferences. Second, existing approaches lack dynamic adaptability and real-time feedback mechanisms. Traditional tabular analysis requires periodic customer data to be consolidated manually before analysis, whereas in today’s fast-paced environment banks need flexible operational mechanisms [8] that adjust business structures and product types based on real-time data analysis. By leveraging artificial intelligence and machine learning, banks can acquire customer data in a timely manner, establish multi-channel feedback mechanisms, and analyze customer preferences to reduce churn. This paper proposes a bank customer segmentation strategy [9] based on the DBSCAN algorithm that uses web scraping to collect real-time data quickly and to obtain customer attribute information, feedback, and comments [10]. When segmenting bank customers with these data, the strategy considers both explicit and implicit customer indicators and adjusts service plans as customer needs change.
A well-designed bank marketing plan enables precise customer targeting and product customization. By tailoring financial management solutions to the business scale and needs of different types of corporate clients, and by applying effective customer relationship management strategies, such as timely follow-ups to learn clients’ latest requirements, banks can improve clients’ financial knowledge and awareness of funds management [11]. This in turn increases customer loyalty and satisfaction, which is why precise marketing is key to improving bank profitability. In the current information age, major banks widely use big data technologies and machine learning to aggregate customer information into databases, process customer data in bulk, and apply techniques such as clustering and association rule analysis to assess customers’ consumption behavior and investment preferences. This approach replaces traditional manual processing in marketing models and reduces operational costs [12]. However, current research on marketing plans has several shortcomings. First, marketing is not differentiated: banking services are recommended without strictly distinguishing among customers, and the resulting indiscriminate marketing wastes considerable human and material resources without achieving significant results. Second, customer relationship maintenance receives too little attention: customer relationship management has not focused enough on effective communication with customers or on timely handling of their feedback and comments, which lowers loyalty and causes significant customer loss.
In recent years, machine learning-based clustering methods have attracted increasing attention in customer segmentation. In particular, the DBSCAN algorithm [13] has become an important customer segmentation tool in the financial industry because it identifies cluster structures automatically, without a preset number of clusters, and it can handle noise points. Despite these theoretical advantages, DBSCAN faces several challenges in practice. First, the algorithm is highly sensitive to its parameters (the neighborhood radius and the density threshold), which are often set empirically without a systematic optimization mechanism. Second, customer groups typically have complex attributes across multiple dimensions, so relying on a single clustering metric or on simple parameter choices may fail to capture the diverse needs of the customer base and lead to inaccurate clustering results.
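The properties mentioned above (no preset cluster count, explicit noise labeling, and sensitivity to the density parameters) can be seen in a small sketch. It uses scikit-learn’s `DBSCAN` on synthetic two-dimensional data with three hypothetical customer groups plus scattered outliers; the data, the helper `summarize`, and the parameter values are illustrative assumptions, not the paper’s setup.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three synthetic "customer groups" (hypothetical data) plus scattered outliers
X_groups, _ = make_blobs(n_samples=300, centers=[[-4, 0], [0, 4], [4, 0]],
                         cluster_std=0.5, random_state=0)
rng = np.random.default_rng(0)
X_noise = rng.uniform(low=-8, high=8, size=(20, 2))
X = np.vstack([X_groups, X_noise])


def summarize(eps, min_samples):
    """Return (number of clusters found, number of points labeled noise)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


# No cluster count is supplied; only the density parameters change
for eps in (0.1, 0.5, 3.0):
    print(f"eps={eps}: clusters, noise = {summarize(eps, min_samples=5)}")
```

With `eps=0.5` the three groups are recovered and most outliers receive the noise label `-1`; shrinking `eps` to 0.1 labels far more points as noise, which is the parameter sensitivity described above.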
This paper aims to explore the current development status and limitations of existing bank customer segmentation methods and marketing strategies. It proposes a customer segmentation framework based on the KM-DBSCAN algorithm to address the challenges banks face in accurately segmenting their customer base. The framework aims to optimize the accuracy of customer segmentation, reduce time complexity, and enhance the application efficiency of customer segmentation in large-scale datasets. Additionally, it offers marketing strategy optimization suggestions and verifies the feasibility and effectiveness of the approach through practical applications in the financial industry.
2. Literature Review
Bank customer segmentation is a key step for banks to achieve targeted marketing, risk control, and resource optimization, and its importance is growing as competition in modern banking becomes increasingly fierce. Through effective segmentation, banks can identify the needs and characteristics of different customer groups and offer tailored products and services to each, thereby enhancing customer satisfaction and loyalty [14]. Accurate segmentation not only helps banks gain a competitive edge but also supports cost control, optimizes resource allocation, and improves marketing efficiency. For targeted marketing, banks can push suitable financial products to each segment based on customers’ consumption habits, financial needs, and behavior patterns. For example, banks can offer personalized wealth management services to high-net-worth individuals, promote online payment or investment products to younger customers, and provide more appropriate loan terms and risk control strategies for customers with credit risk [15]. Such customized services improve the match between products and customers, increasing customer stickiness and loyalty.
However, bank customer segmentation faces several practical challenges. First, in the big data era, customer attribute and behavior data have become increasingly complex and multidimensional [16]. Traditional segmentation methods often rely on a few simple dimensions (such as age, gender, and income), which cannot fully capture the diverse and changing needs of customers. Handling large volumes of high-dimensional data and extracting useful information from them is therefore a major challenge. Second, customer data often contain considerable noise and outliers [17], which can seriously distort segmentation results. Traditional methods are typically not robust to noisy data, which can lead to inaccurate clustering and even misguide marketing decisions, so removing noise while preserving clustering accuracy is another key issue. Lastly, customer behavior patterns and needs are becoming more diverse and change rapidly, and traditional segmentation methods struggle to capture these dynamics. For example, with the development of financial technology, more and more customers use online banking and mobile payments, a shift that has significant implications for banks’ marketing strategies and service design. Using flexible and efficient algorithms [18] to capture the dynamic features of customer behavior and update them in real time is thus another major challenge for bank customer segmentation.
In summary, bank customer segmentation not only holds significant theoretical and practical importance but also faces numerous challenges in actual operation, such as data complexity, noise interference, and changes in customer needs. Therefore, how to employ advanced technologies and methods to optimize the bank customer segmentation process, improving its accuracy and timeliness, has become a core issue that the banking industry urgently needs to address.
Market segmentation theory was first proposed by the American scholar Smith in 1956. He suggested that, when formulating marketing strategies, businesses should combine customer attributes, preferences, and values with product differentiation to develop targeted marketing models and service plans. Since the 1960s, numerous scholars have contributed to market segmentation theory [19] and continually produced new research outcomes. Customer segmentation theory mainly includes behavioral, attribute-based, value-based, and needs-based segmentation, each of which seeks to understand customer needs and behavior patterns from a different perspective, thereby enhancing customer loyalty and business profitability.
Joung and Kim [20] proposed an interpretable machine learning method for customer segmentation that divides customer groups by combining sentiment analysis with customer behavior data. This data-driven approach is an effective supplement to traditional demographic segmentation, although it faces challenges with complex and noisy data. This study also adopts machine learning and proposes the KM-DBSCAN algorithm, which, unlike traditional algorithms such as K-means, performs density-based clustering on noisy data and can therefore better identify latent needs within customer groups.
Sun et al. [21] proposed a Gaussian Peak Heuristic Clustering (GPHC) method, which identifies customer preference patterns using genetic algorithms and hierarchical clustering and refines the final segmentation through heuristic information. This study draws on the segmentation ideas of GPHC and combines them with the density-based clustering of DBSCAN to build a multi-dimensional clustering framework that integrates customer preferences and behavioral data. The framework helps uncover latent patterns in customer demand and provides data support for precise marketing to bank customer groups. Although GPHC achieves high clustering accuracy on customer demand data, it may struggle with customer data that are unevenly distributed or very noisy. DBSCAN, in contrast, handles heterogeneity and noise in customer data through density-based clustering and can identify latent customer groups without explicit category labels, which makes it particularly valuable for bank customer segmentation with large datasets and complex, diverse customer groups.
Tang et al. [22] proposed a three-dimensional joint segmentation model for the B2B market that segments customers based on the behaviors and needs of direct customers as well as the needs of downstream clients, giving marketers more accurate sales allocation and communication strategies. Although it targets B2B customers, the model also offers insights for bank customer segmentation: besides the explicit needs and behavior data of direct customers, implicit needs (such as latent demand for financial products) may also affect segmentation results. This study therefore integrates explicit indicators, behavioral data, and implicit indicators to improve the accuracy and practicality of the clustering results.
Tabianan et al. [23] proposed a K-Means-based customer segmentation method that clusters purchase data from e-commerce customers, identifies high-profit customer groups, and promotes products to them accordingly. Focusing on customer behavior characteristics, it uses K-Means to analyze purchasing patterns and improves segmentation accuracy by optimizing intra-group similarity and inter-group differences. The work shows how cluster analysis can support customer segmentation in theory and offers practical experience for bank customer segmentation. Compared with the K-Means approach of Tabianan et al. [23], however, the KM-DBSCAN algorithm proposed in this study is better suited to noisy data and irregularly distributed customer groups; particularly where bank customer groups have complex distributions, KM-DBSCAN is better at identifying heterogeneity between groups.
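The contrast drawn above between K-Means and density-based clustering on irregularly shaped groups can be illustrated with scikit-learn’s two-moons toy data. This is a sketch on synthetic data with assumed parameters (`eps=0.3`, `min_samples=5` after standardization); it compares plain DBSCAN, not KM-DBSCAN, against K-Means using the adjusted Rand index with respect to the known labels.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaving crescents: a non-convex, irregular group structure
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}  DBSCAN ARI: {ari_dbscan:.2f}")
```

K-Means cuts the moons with a roughly straight boundary because it assigns each point to the nearest centroid, whereas DBSCAN follows the curved dense regions.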
Othayoth et al. [24] explored the application of various machine learning algorithms to customer segmentation. Faced with information overload about product details in e-commerce, they used customer profiling, similarity clustering, and RFM unit classification to segment customer groups based on behavioral data and thereby deliver personalized recommendations. Sarkar et al. proposed a customer segmentation marketing strategy that combines the K-Means algorithm with RFM analysis and reported an overall clustering purity of 0.95, which they interpret as 95% accuracy in segmenting customer attributes and features; formally, purity is the fraction of samples that belong to the majority class of their assigned cluster. Hicham et al. [25] introduced a clustering-ensemble segmentation technique that runs DBSCAN, Mini Batch K-means, and Mean Shift and then uses spectral clustering to combine the multiple clustering results into the final customer groups. Li et al. [26] proposed a model that integrates Support Vector Machines (SVM) with clustering algorithms: SVM is first used to segment the current customer data, and SVM is then combined with clustering algorithms to construct the segmentation model. These methods explore machine learning for customer segmentation from different perspectives and effectively address the information overload of the big data context, which can complicate decision-making and marketing strategy formulation. However, they often rely on a single model, fail to integrate multiple customer attribute indicators [27], and do not investigate how the models’ parameters affect the clustering results.
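The ensemble idea attributed to Hicham et al. [25] can be sketched with a co-association matrix: each base clustering votes on whether two customers belong together, and spectral clustering partitions the resulting similarity matrix. The sketch also computes the purity measure reported for Sarkar et al. It is an illustration under assumptions (synthetic blobs, the hypothetical helpers `co_association` and `purity`, and treating DBSCAN noise as never co-clustered), not a reproduction of either paper’s method.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MeanShift, MiniBatchKMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import contingency_matrix

# Synthetic stand-in for customer features with three known groups
X, y_true = make_blobs(n_samples=300, centers=[[-5, 0], [0, 5], [5, 0]],
                       cluster_std=0.7, random_state=0)

# Base clusterings named in the text
base_labelings = [
    DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
    MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0).fit_predict(X),
    MeanShift().fit_predict(X),
]


def co_association(labelings):
    """Fraction of base clusterings that put each pair of points together."""
    n = len(labelings[0])
    C = np.zeros((n, n))
    for labels in labelings:
        same = labels[:, None] == labels[None, :]
        same &= labels[:, None] != -1  # DBSCAN noise (-1) is never co-clustered
        C += same
    C /= len(labelings)
    np.fill_diagonal(C, 1.0)  # every point agrees with itself
    return C


def purity(y_true, y_pred):
    """Share of points belonging to the majority true class of their cluster."""
    cm = contingency_matrix(y_true, y_pred)  # rows: true classes, columns: clusters
    return cm.max(axis=0).sum() / cm.sum()


C = co_association(base_labelings)
consensus = SpectralClustering(n_clusters=3, affinity="precomputed",
                               random_state=0).fit_predict(C)
print(f"consensus purity: {purity(y_true, consensus):.3f}")
```

On well-separated data the groups share no co-association, so `SpectralClustering` may warn that the graph is not fully connected; the consensus labels should still be usable in that case.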
Customer segmentation is also widely applied in practice. Bank of America has enhanced its retail banking marketing effectiveness by implementing customer segmentation strategies. By analyzing customer account activities, spending behaviors, and other financial data, the bank uses clustering algorithms to segment customers and provide personalized products and services such as loans, credit cards, and savings accounts to different customer groups. For example, Bank of America offers exclusive wealth management services and investment products to high-net-worth clients, flexible financial products to younger customers, and low-interest loans and preferential savings accounts to low-income customers. Citibank uses big data analytics and machine learning algorithms to achieve precise customer segmentation based on customer behavioral data, transaction data, and other historical information, and it provides financial product recommendations, behavior prediction, and risk management tailored to each customer group. Citibank’s recommendation system is based on historical customer behavior data and provides personalized financial product suggestions. It also uses Natural Language Processing (NLP) techniques to process and analyze interactions with customers, extracting insights about customer satisfaction and latent needs. China Construction Bank (CCB) uses the K-means clustering algorithm to analyze customer data, segmenting customers into groups based on characteristics such as consumption habits, credit status, and account activity. Using the RFM model, the bank further classifies customers into high-value, potential, and at-risk groups, offering personalized services and products and devising targeted marketing strategies for each group.
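RFM features of the kind mentioned for China Construction Bank can be derived from a transaction log and clustered with K-means roughly as follows. The transaction table, the snapshot date, and the choice of three segments are hypothetical; labeling each segment as high-value, potential, or at-risk would still require inspecting the segment means.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction log: one row per transaction
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 6],
    "date": pd.to_datetime([
        "2024-05-10", "2024-06-20", "2024-06-28",
        "2024-03-01", "2024-04-15",
        "2024-01-05",
        "2024-06-01", "2024-06-10", "2024-06-18", "2024-06-25",
        "2024-02-20",
        "2024-06-30",
    ]),
    "amount": [300, 500, 700, 200, 100, 50, 1200, 900, 1500, 1100, 80, 400],
})
snapshot = pd.Timestamp("2024-07-01")  # assumed analysis date

rfm = transactions.groupby("customer_id").agg(
    recency=("date", lambda d: (snapshot - d.max()).days),  # days since last transaction
    frequency=("date", "count"),                            # number of transactions
    monetary=("amount", "sum"),                             # total amount transacted
)

# Standardize so no single RFM dimension dominates the Euclidean distance
X = StandardScaler().fit_transform(rfm[["recency", "frequency", "monetary"]])
rfm["segment"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(rfm)
```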
Although customer segmentation has achieved some success in the banking industry, practical applications still have limitations. Citibank improves segmentation accuracy with big data and NLP techniques, yet it still relies largely on explicit indicators such as transaction behavior and historical records; implicit characteristics (e.g., psychological needs or latent preferences) may not be sufficiently considered, which limits the depth and precision of segmentation. Moreover, banks such as Citibank and China Construction Bank, despite using sophisticated machine learning and clustering techniques, often face high computational complexity on large, high-dimensional datasets, which slows processing and makes it hard to respond quickly in real-time decision-making.
The KM-DBSCAN algorithm proposed in this paper addresses these issues. First, it reduces dependency on parameters by introducing K-means preprocessing, solving the difficulty of selecting global parameters in DBSCAN, especially the choice of neighborhood radius and density threshold. Second, it integrates both explicit and implicit features. By innovatively combining explicit and implicit indicators and using a weighted mechanism, the model increases sensitivity to different features, helping banks better capture customers’ multidimensional needs in practice, compensating for the limitations of relying solely on explicit features. Third, it improves computational efficiency. By using K-means as a preprocessing step, the algorithm reduces the computational complexity of DBSCAN, enhancing its efficiency in big data scenarios. This is particularly advantageous when handling large amounts of customer data.
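The efficiency argument in the third point can be made concrete with a rough operation count. This is a back-of-the-envelope sketch, assuming brute-force neighborhood queries (every point compared with every other point in the same partition), $t$ K-means iterations, and $k$ K-means groups of roughly equal size $n_i \approx n/k$:

$$
\underbrace{O(n^2)}_{\text{DBSCAN on all points}}
\;\longrightarrow\;
\underbrace{O(nkt)}_{\text{K-means}} + \sum_{i=1}^{k} O(n_i^2) \approx O(nkt) + O\!\left(\frac{n^2}{k}\right)
$$

For example, with $n = 100{,}000$ customers and $k = 4$, the neighborhood search drops from about $10^{10}$ to about $4 \times (2.5 \times 10^{4})^2 = 2.5 \times 10^{9}$ distance evaluations. The trade-off is that density-connected points falling into different K-means groups are never linked by the DBSCAN step.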
4. Experimental Process and Result Analysis
4.1. Experimental Process
In this experiment, the customer attribute columns include the customer’s age, occupation, marital status, education background, asset balance, housing loan, borrowing, and activity information. The process of customer analysis with the KM-DBSCAN algorithm is shown in Figure 7.
First, PCA (principal component analysis) was applied to the attribute feature columns to reduce the data dimension and improve the efficiency of the model. Next, the neighborhood radius (Eps) and density threshold (MinPts) were tuned according to the resulting clustering quality; after verification, Eps = 0.6 and MinPts = 5 were selected in this experiment as giving the best clustering.
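A minimal preprocessing sketch for this step, assuming categorical attributes are one-hot encoded and numeric attributes standardized before PCA. The column names (`age`, `balance`, `job`, `marital`, `education`, `housing`, `loan`), the synthetic values, and the 95% explained-variance cutoff are illustrative assumptions; the paper does not state how many components were kept.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer attributes (synthetic values for illustration only)
rng = np.random.default_rng(0)
n = 200
customers = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(1500, 3000, n),
    "job": rng.choice(["admin.", "technician", "retired", "student"], n),
    "marital": rng.choice(["married", "single", "divorced"], n),
    "education": rng.choice(["primary", "secondary", "tertiary"], n),
    "housing": rng.choice(["yes", "no"], n),
    "loan": rng.choice(["yes", "no"], n),
})
numeric = ["age", "balance"]
categorical = ["job", "marital", "education", "housing", "loan"]

preprocess = ColumnTransformer(
    [("num", StandardScaler(), numeric),
     ("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    sparse_threshold=0,  # force a dense array so PCA accepts it
)
# Keep the smallest number of components explaining >= 95% of the variance
pipeline = make_pipeline(preprocess, PCA(n_components=0.95, svd_solver="full"))
X_reduced = pipeline.fit_transform(customers)
pca = pipeline.named_steps["pca"]
print(X_reduced.shape, pca.explained_variance_ratio_.sum().round(3))
```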
Based on the above model and process, the following clustering results were obtained. The dataset was split into a training set (80% of the data) and a test set (20%). In a comparative experiment, the clustering accuracy of the proposed model improved by 15% over the traditional DBSCAN algorithm.
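The paper does not state how test-set customers are assigned to clusters. DBSCAN in scikit-learn has no `predict` method, so one common option, sketched here as an assumption, is to give each new point the label of its nearest core sample when that sample lies within `eps`, and to mark it as noise otherwise. The helper `dbscan_predict` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def dbscan_predict(model, X_new):
    """Hypothetical helper: label of nearest core sample within eps, else -1."""
    core_points = model.components_
    core_labels = model.labels_[model.core_sample_indices_]
    nn = NearestNeighbors(n_neighbors=1).fit(core_points)
    dist, idx = nn.kneighbors(X_new)
    labels = core_labels[idx[:, 0]]
    labels[dist[:, 0] > model.eps] = -1  # too far from any core sample: noise
    return labels


# Synthetic data split 80/20 as in the text
X, _ = make_blobs(n_samples=500, centers=[[-5, 0], [0, 5], [5, 0]],
                  cluster_std=0.7, random_state=0)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

model = DBSCAN(eps=0.6, min_samples=5).fit(X_train)
test_labels = dbscan_predict(model, X_test)
print("test points per label:", dict(zip(*np.unique(test_labels, return_counts=True))))
```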
- (1)
Experimental Environment
The environment configuration for this experiment is shown in Table 1.
- (2)
Selection of Weight Factors
In this experiment, the weight factors of the explicit and implicit indicators in the customer dataset are adjusted to explore how different weight combinations affect the clustering results, and the optimal weight factors are chosen based on clustering performance. The Davies-Bouldin Index (DBI) is used as the evaluation metric [37]. A smaller DBI indicates better separation between clusters and better compactness within clusters, and thus better clustering performance. For $k$ clusters, the DBI is calculated as follows:
$$
\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}}
$$
where $S_i$ is the compactness of the $i$-th cluster, i.e., the average distance between the samples in the cluster and the cluster center, and $M_{ij}$ is the separation between the $i$-th and $j$-th clusters, i.e., the distance between the two cluster centers.
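The DBI formula can be checked numerically. This sketch implements it directly from the definitions of $S_i$ and $M_{ij}$ (Euclidean distance, cluster centers taken as means) and compares it with scikit-learn’s `davies_bouldin_score`, which uses the same definitions; the data and the K-Means labels are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score


def davies_bouldin(X, labels):
    """DBI = mean over clusters i of max_{j != i} (S_i + S_j) / M_ij."""
    clusters = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # S_i: average distance between the samples of cluster i and its center
    S = np.array([np.linalg.norm(X[labels == c] - centers[k], axis=1).mean()
                  for k, c in enumerate(clusters)])
    # M_ij: distance between the centers of clusters i and j
    M = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    k = len(clusters)
    ratios = np.full((k, k), -np.inf)
    for i in range(k):
        for j in range(k):
            if i != j:
                ratios[i, j] = (S[i] + S[j]) / M[i, j]
    return ratios.max(axis=1).mean()


X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(davies_bouldin(X, labels), davies_bouldin_score(X, labels))
```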
As shown in Table 2, the DBI is smallest when the weight ratio of explicit to implicit indicators is 0.7:0.3, indicating that customer clustering is best at this setting.
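One way to carry out this weight search is sketched below, assuming each indicator block is standardized and then multiplied by its weight before clustering. The feature blocks, the use of K-Means as the clusterer, and the candidate weights are hypothetical stand-ins for the paper’s setup. Because each weighting changes the geometry of the feature space, the DBI values are computed in different spaces and are best read as a relative guide.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical explicit indicators (e.g. age, balance, loans) with group structure
X_explicit, _ = make_blobs(n_samples=400, n_features=3, centers=4, random_state=0)
# Hypothetical implicit indicators (e.g. satisfaction, risk-tolerance scores)
X_implicit = rng.normal(size=(400, 2))

E = StandardScaler().fit_transform(X_explicit)
I = StandardScaler().fit_transform(X_implicit)


def weighted_dbi(w_explicit, n_clusters=4):
    """Cluster the weighted feature blocks and return the DBI of the result."""
    X = np.hstack([w_explicit * E, (1.0 - w_explicit) * I])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    return davies_bouldin_score(X, labels)


weights = [0.5, 0.6, 0.7, 0.8, 0.9]
scores = {w: weighted_dbi(w) for w in weights}
best_w = min(scores, key=scores.get)  # smaller DBI is better
for w, s in scores.items():
    print(f"explicit:implicit = {w:.1f}:{1 - w:.1f}  DBI = {s:.3f}")
print("lowest DBI at explicit weight", best_w)
```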
- (3)
DBSCAN Parameter Selection
When DBSCAN is applied for clustering within a region, the optimal neighborhood radius and density threshold must be determined to obtain the best clustering results [38]. The comparative experiments set up in this paper show that with Eps = 0.6 and MinPts = 5 the number of clusters stabilizes, giving the best clustering outcome. The results are shown in Table 3.
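A parameter sweep of the kind summarized in Table 3 might look like the following sketch. The grid, the synthetic standardized data, and the use of the silhouette score on non-noise points as a secondary criterion are assumptions; the paper’s stated criterion is the stabilization of the number of clusters.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.8, random_state=0)
X = StandardScaler().fit_transform(X)

results = {}
for eps in (0.2, 0.4, 0.6, 0.8):
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        noise_ratio = float(np.mean(labels == -1))
        mask = labels != -1
        # Silhouette on non-noise points only; it needs at least 2 clusters
        sil = silhouette_score(X[mask], labels[mask]) if n_clusters >= 2 else float("nan")
        results[(eps, min_samples)] = (n_clusters, noise_ratio, sil)

for (eps, min_samples), (k, noise, sil) in results.items():
    print(f"Eps={eps:.1f} MinPts={min_samples:2d} clusters={k} "
          f"noise={noise:.2%} silhouette={sil:.3f}")
```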
Key code as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN

# X: the PCA-reduced customer feature matrix (n_samples x n_components)
n_coarse = 4  # number of K-Means groups (assumed from the loop below)
kmeans = KMeans(n_clusters=n_coarse, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)
# Visualize the K-Means clustering results on the first two components
plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis', s=10)
plt.title('K-Means Clustering')
plt.show()
# Apply DBSCAN within each K-Means cluster
epsilon = 0.6  # Neighborhood radius (Eps)
min_samples = 5  # Density threshold (MinPts)
# Refine the results of K-Means
final_labels = np.copy(kmeans_labels)
for i in range(n_coarse):  # DBSCAN clustering for each K-Means cluster
    # Get the points of the current cluster
    mask = kmeans_labels == i
    cluster_points = X[mask]
    # DBSCAN clustering inside the current cluster
    dbscan = DBSCAN(eps=epsilon, min_samples=min_samples)
    dbscan_labels = dbscan.fit_predict(cluster_points)
    # Merge results: sub-cluster j of K-Means cluster i becomes i * 100 + j;
    # DBSCAN noise points are marked as -1 in the final labels
    final_labels[mask] = np.where(dbscan_labels == -1, -1, i * 100 + dbscan_labels)
The experimental results are shown in Figure 8.
4.2. Evaluation Metrics and Result Analysis
To test the algorithm, comparative experiments were conducted on the K-Means, DBSCAN, and KM-DBSCAN algorithms, using the Silhouette Coefficient [39], accuracy, and F1 score as evaluation indicators.
The Silhouette Coefficient combines within-cluster tightness and between-cluster separation and is commonly used to evaluate the quality of clustering results. For a data point $i$, the average distance $n$ from $i$ to all other points in its own cluster is the tightness, and the average distance $m$ from $i$ to all points in the nearest neighboring cluster is the separation; the silhouette coefficient is then calculated as:
$$
SC(i) = \frac{m - n}{\max(n, m)}
$$
The value range of $SC(i)$ is $[-1, 1]$, and the closer $SC(i)$ is to 1, the better the clustering. As shown in Figure 9, after 40 iterations the number of clusters found by KM-DBSCAN stabilized and its silhouette coefficient reached 0.97, close to 1, whereas the highest silhouette coefficients of the K-Means and DBSCAN algorithms were 0.85 and 0.75, respectively. This indicates that the proposed KM-DBSCAN algorithm achieves better clustering.
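The formula for $SC(i)$ translates directly into code. This sketch computes it from the pairwise distance matrix and checks the result against scikit-learn’s `silhouette_samples`; the data are synthetic. When scoring DBSCAN-style output, points labeled `-1` would be treated as one more cluster unless they are removed first, which can noticeably change the score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples


def silhouette(X, labels):
    """SC(i) = (m - n) / max(n, m) for every point i."""
    D = pairwise_distances(X)
    scores = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        n = D[i, same].mean()                # tightness: mean distance within own cluster
        m = min(D[i, labels == c].mean()     # separation: nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (m - n) / max(n, m)
    return scores


X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
sc = silhouette(X, labels)
print(f"mean silhouette: {sc.mean():.3f}")
```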
Accuracy is commonly used to evaluate an algorithm’s predictive power across categories and can also be used to assess clustering performance. In this experiment, the customer dataset was divided into a training set and a test set, and accuracy was used as the evaluation index [40]. The results are shown in Figure 10: accuracy gradually increases with the number of iterations, and the average accuracy of the KM-DBSCAN, DBSCAN, and K-Means algorithms is 92.3%, 81.6%, and 71.8%, respectively, indicating that KM-DBSCAN achieves higher accuracy in customer segmentation.
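Because cluster IDs are arbitrary, computing accuracy for a clustering requires first matching clusters to reference classes. The paper does not specify its matching; one standard choice, sketched here as an assumption, is the one-to-one assignment that maximizes agreement, found with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). The helper `clustering_accuracy` and the labels are hypothetical.

```python
from scipy.optimize import linear_sum_assignment
from sklearn.metrics.cluster import contingency_matrix


def clustering_accuracy(y_true, y_pred):
    """Accuracy after the best one-to-one matching of clusters to classes."""
    cm = contingency_matrix(y_true, y_pred)  # rows: true classes, columns: clusters
    rows, cols = linear_sum_assignment(cm, maximize=True)
    return cm[rows, cols].sum() / cm.sum()


y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 0, 0, 2, 2]  # cluster IDs are arbitrary
print(clustering_accuracy(y_true, y_pred))  # prints 0.875
```

Here the best matching maps class 0 to cluster 1, class 1 to cluster 0, and class 2 to cluster 2, so 7 of the 8 points agree.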
To further evaluate the overall performance of the model, the F1 score is used as an evaluation index. It is the harmonic mean of precision and recall, so optimizing it takes both indicators into account [41]. The results are shown in Figure 11. The average F1 score of the proposed KM-DBSCAN algorithm was 0.92, compared with 0.83 for DBSCAN and 0.71 for K-Means. The proposed algorithm’s average F1 is thus clearly higher than those of K-Means and DBSCAN, which supports the higher overall performance of the constructed model in customer segmentation.
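A quick numeric check of the definition $F_1 = 2PR/(P + R)$, using scikit-learn’s binary metrics on made-up labels. With precision 2/3 and recall 1/2, the F1 score is 4/7 (about 0.571), slightly below the arithmetic mean (about 0.583), because the harmonic mean penalizes imbalance between the two.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]

p = precision_score(y_true, y_pred)   # TP / (TP + FP) = 2 / 3
r = recall_score(y_true, y_pred)      # TP / (TP + FN) = 2 / 4
f1_manual = 2 * p * r / (p + r)       # harmonic mean of precision and recall
print(f"precision={p:.3f} recall={r:.3f} F1={f1_manual:.3f}")
```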
4.3. Development of Marketing Plan
Based on the above clustering results, the customers are subdivided into four groups, with the results shown in Table 1. The customer clustering results and the feature segmentation table are shown in Table 4.
Based on the above analysis of customer information, activity information, and clustering characteristics, the following marketing strategy analysis is made:
First, the characteristics of customers with time deposit behavior are summarized as follows:
- (1)
The proportion of technical personnel, management personnel, and company blue-collar workers is high.
- (2)
Customers with higher education have the habit and awareness of time deposits.
- (3)
Their asset balances are high, and they have disposable funds.
- (4)
Customers without borrowing or housing loans show a high proportion of time deposits.
Second, the influencing factors of customers’ time deposits under the bank’s existing activities are as follows:
- (1)
When customers are contacted in April, May, July, and August, they are more likely to choose a fixed deposit.
- (2)
The outcome of the previous marketing campaign directly affects customers’ choice of time deposits this time.
- (3)
An appropriate call duration helps customers decide on time deposits.
- (4)
Proper regular contact with customers helps maintain the stability and durability of customer relationships.
Based on the results of the clustering analysis, the following marketing plan is formulated:
- (1)
Based on the clustering results, the following personalized marketing plans are formulated for the different customer groups:
Customer group 1: This group has high asset balances and no borrowing or housing loans, with a high proportion of managers and technical personnel aged around 40. This group should mainly be retained: bank staff should maintain the current frequency of contact and recommend high-yield, flexible time deposit products.
Customer group 2: This group has high asset balances and no borrowing or housing loans, with a high proportion of retirees and technical personnel aged around 50. These are key customers. Bank staff should appropriately increase the frequency and duration of contact and launch highly secure, stable fixed deposit products for the elderly that emphasize guaranteed principal and fixed income. The bank should also streamline its offline service process and provide a dedicated service window or green channel to make banking easier for elderly customers.
Customer group 3: This group has borrowing and housing loans, and its members are about 30 years old. The bank should focus on this group: increase the frequency and duration of contact, proactively learn their needs, and recommend short-term, high-yield fixed deposit products with low minimum deposits.
Customer group 4: This group has lower asset balances, with a relatively high proportion of students and unemployed people. The bank contacts them less often, but they are key customers to reach. Given these characteristics, the bank should strengthen communication with them, cultivate their financial management awareness and habits [42], and launch short-term fixed deposit products with lower minimum deposits.
- (2)
Proposed customer marketing strategies for different occupational groups
Retired Group: Retired individuals receive pensions, have low material demands, and generally maintain good saving habits. This demographic represents the greatest potential for time deposit purchases. Banks should organize more communication activities targeting elderly retirees, promote awareness of financial scams, and offer gift redemption programs to attract customers to deposit their money in banks.
Student Group: This demographic typically lacks significant funds for time deposits but learns quickly and is willing to try new things. Banks can therefore promote savings and financial management concepts to this group, offer more time deposit and financial products suited to young people, and support both online and offline marketing and participation.
High-Income Groups (Managers, Technicians, etc.): Individuals in this category have substantial disposable assets, are highly educated, and plan ahead, tending to approach issues with careful thought and foresight. When recommending products to this group, it is therefore essential to present the short-term and long-term returns of deposit and financial products from a data analysis perspective.
- (3)
Implement cross-marketing for high-value customers and customers with high asset balances:
Cross-marketing refers to promoting two or more related products, on the basis of an analysis of customers' historical data [43], to customers with a potential need for them, thereby expanding the scope of business. For example, for customers with bank loans who have higher liquidity requirements, the bank can recommend credit card applications and installment payment services. For customers with high asset balances and other high-value customers, bank financial products such as funds, insurance, and bonds can be recommended to help them grow their wealth.
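The two cross-marketing rules above can be written as a small rule table. The Python sketch below is a hypothetical illustration, not the bank's system: the `Customer` fields, the `cross_sell` helper, and the `HIGH_BALANCE_THRESHOLD` cut-off are assumptions chosen for readability, whereas the text envisages deriving such rules from historical customer data.

```python
from dataclasses import dataclass

# Hypothetical cut-off for a "high asset balance"; not taken from the study.
HIGH_BALANCE_THRESHOLD = 100_000.0


@dataclass
class Customer:
    """Hypothetical customer record holding only the fields the rules need."""
    has_loan: bool
    asset_balance: float
    high_value: bool = False


def cross_sell(customer: Customer) -> list[str]:
    """Return related products to recommend, following the two rules in the text."""
    recommendations: list[str] = []
    # Loan holders tend to need liquidity: offer credit and installment products.
    if customer.has_loan:
        recommendations += ["credit card", "installment payment"]
    # High-balance or high-value customers: offer wealth-management products.
    if customer.high_value or customer.asset_balance >= HIGH_BALANCE_THRESHOLD:
        recommendations += ["funds", "insurance", "bonds"]
    return recommendations


print(cross_sell(Customer(has_loan=True, asset_balance=250_000.0)))
# ['credit card', 'installment payment', 'funds', 'insurance', 'bonds']
```

Keeping the rules as independent checks lets a customer who matches both conditions receive both sets of recommendations, which is the point of cross-marketing.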
4.4. Application Effect Analysis
In this study, the proposed customer segmentation and marketing strategy is further applied to real-world scenarios, with the revenue growth rate used as the evaluation index and its distribution shown at each percentile [44]. The results of the proposed algorithm are compared with those of the optimal model. As shown in Figure 12, the average revenue growth rate was 16.08% for the proposed model and 18.0% for the optimal model, a difference of 1.92 percentage points. This gap is within the range of growth-rate differences seen in practical applications, which indicates that the proposed customer segmentation model and marketing suggestions can help enterprises formulate marketing strategies and improve their efficiency.
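Comparing the two models percentile by percentile, as the text describes, amounts to computing matching quantiles of the two growth-rate distributions and taking their difference in percentage points. The sketch below uses Python's standard `statistics` module; the growth-rate lists are hypothetical placeholders rather than the data behind Figure 12, and the choice of quartiles (`n=4`) and the helper name `percentile_gaps` are assumptions.

```python
from statistics import mean, quantiles


def percentile_gaps(model_rates, optimal_rates, n=4):
    """Percentage-point gaps (optimal minus proposed) in the mean and at each n-quantile cut point.

    Rates are growth rates expressed in percent, e.g. 16.08 for 16.08%.
    """
    mean_gap = mean(optimal_rates) - mean(model_rates)
    # "inclusive" treats the observed minimum and maximum as the 0th and 100th percentiles.
    model_q = quantiles(model_rates, n=n, method="inclusive")
    optimal_q = quantiles(optimal_rates, n=n, method="inclusive")
    return mean_gap, [o - m for o, m in zip(optimal_q, model_q)]


# Hypothetical per-scenario growth rates (%); placeholders only, not study data.
proposed = [12.0, 14.0, 16.0, 18.0, 20.0]
optimal = [14.0, 16.0, 18.0, 20.0, 22.0]
print(percentile_gaps(proposed, optimal))  # (2.0, [2.0, 2.0, 2.0])
```

Applied to the two reported averages, the same subtraction gives 18.0 - 16.08 = 1.92 percentage points.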
In this study, customers are divided into four groups. To further assess the effect of the customer segmentation and marketing strategy, and to analyze changes in customers' attention to bank time deposit products, customer groups 1 and 4 were selected for comparison. In customer value theory, customer group 4 is a low-value group that should be actively reached. The results are shown in Figure 13. The average customer attention rose from 36.67% initially to 41.17% after the marketing strategy was applied, an increase of 4.5 percentage points, which indicates that the proposed marketing strategy is effective in expanding the potential customer base.
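Because both attention figures are themselves percentages, the reported 4.5 is a difference in percentage points; the relative increase is larger. The arithmetic, using only the two averages reported above, is sketched below in Python; the `lift` helper is hypothetical.

```python
def lift(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute_pp = after_pct - before_pct
    relative_pct = absolute_pp / before_pct * 100
    return absolute_pp, relative_pct


pp, rel = lift(36.67, 41.17)  # average customer attention before and after
print(f"+{pp:.2f} percentage points, +{rel:.2f}% relative")
# +4.50 percentage points, +12.27% relative
```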
6. Conclusions
This paper addresses the limitations of the traditional DBSCAN algorithm in customer segmentation by proposing KM-DBSCAN, an improved algorithm based on K-means, and verifies its effectiveness and applicability through experiments. The findings show that the KM-DBSCAN algorithm reduces computational complexity while improving clustering accuracy and stability. By incorporating a weight adjustment mechanism for both explicit and implicit indicators, this study further optimizes the customer segmentation model, providing technical support for targeted marketing and differentiated services. The main results are as follows:
- (1)
With the silhouette coefficient, accuracy, and F1 score as evaluation indicators, the KM-DBSCAN algorithm achieves better overall performance than the K-means and DBSCAN algorithms. Under this configuration, the model's silhouette coefficient was 0.97, its average accuracy was 92.3%, and its average F1 score was 0.92.
- (2)
The proposed customer segmentation and marketing strategy is applied to real scenarios, with revenue growth rate and customer attention as evaluation indicators. The results showed that the average revenue growth rate under the proposed model was 16.08%, and customer attention increased by 4.5 percentage points.
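For reference, the three evaluation indicators in item (1) can be computed from scratch as below. This is a hedged pure-Python sketch rather than the authors' evaluation code: it assumes cluster labels have already been mapped to ground-truth classes before accuracy and F1 are computed, reads "average F1" as the macro average over classes, and follows the common convention that a point alone in its cluster gets a silhouette value of 0. The helper names are hypothetical; library implementations such as scikit-learn's `silhouette_score`, `accuracy_score`, and `f1_score(average="macro")` should agree with them on the same inputs.

```python
import math
from collections import defaultdict


def silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        # a: mean distance to the other members of p's own cluster (p itself adds 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1_scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(points, [0, 0, 1, 1]), 3))      # 0.9
print(accuracy([0, 0, 1, 1], [0, 0, 1, 0]))            # 0.75
print(round(macro_f1([0, 0, 1, 1], [0, 0, 1, 0]), 3))  # 0.733
```

A silhouette value near 1, such as the 0.97 reported for KM-DBSCAN, means points sit much closer to their own cluster than to the nearest neighbouring cluster.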
Theoretically, this paper optimizes the traditional DBSCAN algorithm by introducing the K-means algorithm, yielding the KM-DBSCAN algorithm, which offers a new approach to applying density clustering methods in bank customer segmentation. This improvement not only enhances clustering accuracy and efficiency but also extends the potential of density-based clustering algorithms in real-world business scenarios. Through an in-depth exploration of both explicit and implicit customer characteristics, this study presents a customer segmentation model based on multi-dimensional information. The model incorporates behavioral data in addition to basic customer information, which gives it significant theoretical value, especially in customer behavior analysis and targeted marketing. This approach provides a new perspective for future academic research on consumer behavior patterns and group characteristics.
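The KM-DBSCAN procedure itself is not reproduced here. As a hedged illustration of the density-clustering part of the idea and of weighting explicit and implicit indicators, the sketch below runs plain DBSCAN over a weighted Euclidean distance. It is an assumption-laden toy, not the authors' algorithm: the K-means step that gives KM-DBSCAN its name is deliberately omitted, and the weight vector, `eps`, `min_pts`, feature values, and helper names are all hypothetical.

```python
import math


def weighted_distance(p, q, weights):
    """Weighted Euclidean distance; a larger weight makes that indicator count more."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


def dbscan(points, eps, min_pts, weights):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # The eps-neighbourhood includes the point itself, so min_pts counts it.
        return [j for j in range(n)
                if weighted_distance(points[i], points[j], weights) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


# Hypothetical standardized features: (explicit indicator, implicit indicator).
customers = [(0.0, 0.0), (0.0, 10.0), (0.1, 0.0), (0.1, 10.0)]
print(dbscan(customers, eps=0.5, min_pts=2, weights=(1.0, 1.0)))  # [0, 1, 0, 1]
print(dbscan(customers, eps=0.5, min_pts=2, weights=(1.0, 0.0)))  # [0, 0, 0, 0]
```

Setting the implicit indicator's weight to zero merges the two groups, which shows how the weighting of explicit and implicit indicators can change the resulting segments. The brute-force neighbour search is quadratic in the number of customers and is meant only for small illustrations.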
In practice, the findings of this study can help banks develop personalized marketing strategies based on customer group characteristics, allocate marketing and service resources more rationally, and effectively improve customer satisfaction and operational efficiency [47].
Although the KM-DBSCAN algorithm demonstrates advantages in accuracy and efficiency, it still has certain limitations. Multimodal data, such as complex text and images, are harder to integrate, which increases the complexity of algorithm training [48]. Moreover, while this paper improves customer segmentation accuracy by integrating explicit and implicit indicators, the set of integrated indicators remains limited. Customer characteristics span further dimensions, such as social networks, that can also affect the effectiveness of customer segmentation. Future research could combine multimodal neural network methods [49] and natural language processing (NLP) techniques to process various data types (such as text, images, and speech) and enhance the model's ability to understand complex data. Fully integrating multidimensional data on user characteristics, and using social network analysis to identify customer groups with higher social influence, could help build a multidimensional customer segmentation model and thereby achieve more precise customer profiling and marketing strategies.
Finally, although our dataset comes mainly from a specific region, the customer behavior patterns and clustering results found in this study may be similar in other regions or countries, and the algorithms and analytical framework we used are also broadly applicable. In future work, we plan to incorporate datasets from more regions or countries to evaluate the algorithm's performance and the generalizability of the conclusions more comprehensively.