Interactive 3D Vase Design Based on Gradient Boosting Decision Trees

Wang, Dongming; Xu, Xing; Xia, Xuewen; Jia, Heming

doi:10.3390/a17090407


¹ College of Physics and Information Engineering, Minnan Normal University, Zhangzhou 363000, China

² School of Information Engineering, Sanming University, Sanming 365004, China

* Author to whom correspondence should be addressed.

Algorithms 2024, 17(9), 407; https://doi.org/10.3390/a17090407

Submission received: 14 August 2024 / Revised: 2 September 2024 / Accepted: 4 September 2024 / Published: 11 September 2024


Abstract:

Traditionally, ceramic design began with sketches on rough paper and later evolved into using CAD software for more complex designs and simulations. With technological advancements, optimization algorithms have gradually been introduced into ceramic design to enhance design efficiency and creative diversity. The use of Interactive Genetic Algorithms (IGAs) for ceramic design is a new approach, but an IGA requires a significant amount of user evaluation, which can result in user fatigue. To overcome this problem, this paper introduces the LightGBM and CatBoost algorithms to improve the IGA because their strong predictive capabilities can assist users in evaluations. The algorithms are also applied to a vase design platform for validation. First, bicubic Bézier surfaces are used for modeling, the genetic encoding of the vase is designed, and appropriate evolutionary operators are selected. Second, user data from the online platform are collected to train and optimize the LightGBM and CatBoost algorithms. Finally, LightGBM and CatBoost are combined with an IGA and applied to the vase design platform to verify their effectiveness. Compared with the traditional IGA and with IGAs assisted by KD trees, Random Forest, and XGBoost, the IGAs improved with LightGBM and CatBoost perform better overall, requiring fewer evaluations and less time. Their R² values are higher than those of the other proxy models, reaching 0.816 and 0.839, respectively. The improved method proposed in this paper can effectively alleviate user fatigue and enhance the user experience in product design participation.

Keywords:

Interactive Genetic Algorithm; bicubic Bézier surface; LightGBM algorithm; CatBoost algorithm; 3D vase

1. Introduction

Ceramics have a long history that dates back thousands of years. The earliest ceramic products were mostly simple household utensils and decorative items. As craftsmanship techniques developed, ceramics gradually evolved into artworks with both artistic value and practicality. Ceramic products are cherished for their unique texture, heat resistance, and aesthetic appeal. From ancient hand-made pottery to modern industrial production, the development of ceramics fully reflects the progress of human civilization and technology.

With the advancement of time, ceramic design has undergone a transformation from traditional hand-drawing to modern technology-assisted design. Initially, ceramic designers would create preliminary sketches and patterns on paper, repeatedly modifying and adjusting them to form the final design. With the development of computer technology, software such as CAD (Computer-Aided Design) [1] has been gradually introduced into ceramic design, enabling designers to more precisely draw complex patterns and shapes and perform 3D modeling and virtual display. In recent years, with the rise of intelligent optimization algorithms, ceramic design has entered a new stage. Among them, the Interactive Genetic Algorithm (IGA) [2] has emerged as an advanced design tool, generating complex design schemes by simulating the evolutionary process, allowing designers to create innovative works that traditional hand methods could not conceive. The use of the IGA has not only greatly improved design efficiency and accuracy, but also opened up unprecedented possibilities for the creation of ceramic art.

The IGA is an improved optimization method derived from the Genetic Algorithm (GA). The GA is an optimization algorithm based on the principles of biological evolution. As an effective method for optimizing explicit objective performance indicators, it has gradually become widely used in feature selection and model parameter optimization across various fields. Specifically, the GA has been successfully applied to energy optimization in a heterogeneous blockchain IoT [3], effective feature selection methods in intrusion detection systems [4], Parkinson’s disease detection [5], cloud data security cryptographic systems [6], a hybrid electronic search for task scheduling in cloud computing [7], interpolation of missing values in datasets [8], and real-time monitoring system design for dust in thermal power plants [9].

However, some performance indicators are difficult to define clearly and involve subjective evaluations; these are known as implicit performance indicator problems. To address them, improved human–computer interaction algorithms have been proposed, of which the IGA is one. The IGA is an evolutionary algorithm developed from the GA, which combines human subjective judgment with computer algorithms to optimize complex problems. The IGA has been successfully applied in propeller optimization [10], user interface design [11], user-friendly image retrieval systems [12], traditional pattern design [13], and other fields. Despite these extensive applications, some problems and challenges remain. One of the main problems with the IGA is user fatigue during the evaluation process. Users need to spend a lot of time and effort evaluating candidate solutions, especially during long optimization runs, which may lead to fatigue and loss of interest and thereby affect both their commitment to the task and the accuracy of their evaluations. Therefore, effectively reducing user fatigue and improving user engagement is one of the important issues that the IGA needs to solve.

Nowadays, more and more researchers are adopting proxy models to alleviate the evaluation burden on users of IGAs. These proxy models can predict the fitness of candidate solutions, thereby reducing the number of times users need to directly evaluate solutions and effectively alleviating the problem of user fatigue. Wang and Xu [14] used a PSO-optimized XGBoost algorithm to assist in the evaluation of Interactive Genetic Algorithms. Sun et al. [15] proposed a semi-supervised learning-assisted Interactive Genetic Algorithm with a large population to alleviate the evaluation burden. Huang et al. [16] constructed KD tree proxy models and Random Forest proxy models based on historical user evaluation information to assist in evaluation. Gypa et al. [1] further integrated IGAs into SVM models to avoid user fatigue. Some researchers consider that ordinary users usually do not possess professional design skills, so they propose a method of individual interval fitness to reduce the impact of ambiguity in fitness on evaluation. Wang et al. [17] proposed a product form design method that combines IGAs with interval hesitation time and user satisfaction. Considering the importance of individual customer preferences and the key to effective user engagement, Dou et al. [18] proposed an innovative approach that combines the Kano model with IGAs to achieve more effective product customization and drive customer-driven product design.

LightGBM and CatBoost are machine learning methods based on Gradient Boosting Decision Trees (GBDT). Their combination with optimization algorithms has been widely applied in various fields. For example, Li et al. [19] proposed an intrusion detection method based on optimizing LightGBM with a Genetic Algorithm. Li et al. [20] proposed a model for predicting the exhaust gas temperature of aircraft engines based on optimizing LightGBM with a bat algorithm. Qian et al. [21] used metaheuristic algorithms combined with CatBoost to predict urban gas consumption. Kilinc et al. [22] used a Genetic Algorithm to optimize CatBoost for predicting river flow. Khan et al. [23] proposed an intelligent fish farm dissolved oxygen optimization prediction based on CatBoost and XGBoost optimized by a Genetic Algorithm.

These studies demonstrate that the powerful predictive capabilities of LightGBM and CatBoost can solve many real-world problems. To address the issue of user fatigue in IGAs, this article proposes a method for optimizing IGAs based on the LightGBM and CatBoost algorithms. By collecting evaluation data from online platform users, a dataset was built to train and validate the LightGBM and CatBoost proxy models. During population evolution, the proxy model predicts the fitness of each individual in the population to assist users in evaluation; if a predicted value does not match the fitness the user expects, the user can modify it. The innovation of this article lies in the use of a proxy model, which effectively alleviates user fatigue. Finally, the algorithm was applied to the vase design platform to verify its performance and its ability to alleviate user fatigue.

2. Algorithm and Principle

2.1. Proposed Method

In this study, LightGBM and CatBoost were chosen as the core models to improve the Interactive Genetic Algorithm. This decision is primarily based on empirical findings from the existing literature. After conducting an in-depth analysis and comparison of multiple relevant papers [24,25,26], the authors found that LightGBM and CatBoost demonstrate outstanding performance in handling predictive tasks, particularly when dealing with high-dimensional data and complex feature relationships. Specifically, LightGBM excels in efficiency and speed of training within its gradient-boosting framework, significantly reducing computational costs while maintaining prediction accuracy. On the other hand, CatBoost offers unique advantages in handling categorical features and requires minimal data preprocessing, making it more robust and stable in practical applications. Given these strengths, this study employs these two algorithms to enhance the Interactive Genetic Algorithm, aiming to improve the model’s prediction accuracy and overall effectiveness.

2.2. Decision Tree

Decision trees [27] are among the most commonly used machine learning algorithms for building regression prediction models. The principle is as follows:

Nodes and splits: The decision tree consists of nodes and branches. Each non-leaf node in the tree represents a test on a feature attribute; each branch represents the test result; and each leaf node stores a category label or regression value. The process of constructing a decision tree is to select appropriate feature attributes and test them, dividing the dataset into different subsets until the predetermined stopping conditions are met.
Feature selection: At each node, the decision tree algorithm needs to select an optimal feature for splitting to maximize the purity of the node. Generally, the decision tree uses information gain to select features. For the feature set $a = \{a_{1}, a_{2}, \dots, a_{i}\}$ in the dataset, the information gain of the feature $a_{i}$ represents the degree to which knowing $a_{i}$ reduces the uncertainty of the target data $Z$. The information gain $g(Z, a_{i})$ of the feature $a_{i}$ on target data $Z$ is calculated as shown in Equation (1):

$$g(Z, a_{i}) = H(Z) - H(Z \mid a_{i}), \tag{1}$$

where $H(Z)$ is the entropy of $Z$ and $H(Z \mid a_{i})$ is the conditional entropy of $Z$ given $a_{i}$.
Recursive splitting: The process of constructing a decision tree is recursive. Starting from the root node, select the optimal feature for splitting, and then recursively process each child node until the stopping condition is met, such as if the number of samples in the node is less than a certain threshold, the depth of the tree reaches a predetermined value, or there are no more features to choose from.
Pruning: Decision trees are prone to overfitting training data, and to improve generalization ability, pruning of decision trees is necessary. The purpose of pruning is to simplify the model and reduce overfitting. Common pruning methods include pre-pruning and post-pruning. Pre-pruning is the process of stopping the growth of a tree in advance based on certain conditions during the construction of the tree; post-pruning involves first constructing a complete tree and then reducing the complexity of the tree by deleting nodes or subtrees.
Classification and regression: Decision trees can be used for classification and regression problems. In classification problems, each leaf node represents a category; in regression problems, leaf nodes store a numerical value.
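
As an illustration of Equation (1), the following is a minimal sketch of information gain for a discrete feature and a discrete target. The toy feature names (`texture`, `height`) and the labels are hypothetical and are not taken from the vase dataset; for a regression target, a variance-based criterion would normally replace entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Z), in bits, of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """g(Z, a_i) = H(Z) - H(Z | a_i) for one discrete feature a_i."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each subset, weighted by subset size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four samples, two candidate features.
texture = ["floral", "floral", "plain", "plain"]
height = ["tall", "short", "tall", "short"]
liked = ["yes", "yes", "no", "no"]

gain_texture = information_gain(texture, liked)  # texture separates the labels
gain_height = information_gain(height, liked)    # height carries no information
```

In this toy case the tree would split on `texture`, because it removes all uncertainty about the label while `height` removes none.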

2.3. Boosting Decision Trees

Boosting decision trees [28] is an ensemble learning method. It uses an additive model and a forward stagewise algorithm: weak learners (here, decision trees) are trained iteratively and combined into a strong learner. For a regression problem, consider the training dataset $A = \{(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})\}$, where $n$ is the number of samples in the dataset, $X_{i}$ represents the input value of the $i$th training sample, and $Y_{i}$ represents the output value of the $i$th training sample. For a regression problem with one output parameter, a basic model is initialized first, usually a simple model that fits the initial values of the dataset. Equation (2) presents the initial model for boosting decision trees:

$$g_{0}(x) = 0. \tag{2}$$

Then, the residuals between the predictions of the model after the $(m - 1)$th boosting decision tree and the actual values of the samples are calculated. These residuals are used to fit a new boosting decision tree $T(x; \theta_{m})$, which corrects the errors from the previous round. The residuals are calculated as shown in Equation (3):

$$r_{mi} = Y_{i} - g_{m-1}(X_{i}), \quad i = 1, 2, \dots, n. \tag{3}$$

Finally, the boosting decision tree model after the $m$th tree is obtained, as shown in Equation (4):

$$g_{m}(x) = g_{m-1}(x) + T(x; \theta_{m}). \tag{4}$$

These steps are repeated until the number of generated boosting decision trees reaches the expected value $M$, and the boosting decision tree model of the regression problem can be expressed by Equation (5):

$$g_{M}(x) = \sum_{m=1}^{M} T(x; \theta_{m}). \tag{5}$$

In Equation (5), $T(x; \theta_{m})$ is the $m$th boosting decision tree, $\theta_{m}$ is the parameter of the $m$th boosting decision tree, and $M$ is the total number of boosting decision trees.
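
Equations (2)–(5) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the weak learner is a depth-1 regression tree (a stump) on a single input feature, and the data are a hypothetical toy set with distinct input values. It is not the tree learner used by LightGBM or CatBoost.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree T(x; theta) to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees):
    """Build the additive model g_M as a list of trees, starting from g_0(x) = 0."""
    trees = []
    predictions = [0.0] * len(xs)                                    # Equation (2)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]         # Equation (3)
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + tree(x) for p, x in zip(predictions, xs)]  # Equation (4)
    return trees


def predict(trees, x):
    """Equation (5): the prediction is the sum of all fitted trees."""
    return sum(tree(x) for tree in trees)


def training_mse(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 2.5, 3.0, 7.0, 7.5, 8.0]
mse_1 = training_mse(boost(xs, ys, 1), xs, ys)
mse_20 = training_mse(boost(xs, ys, 20), xs, ys)
```

Each added tree fits what the previous trees left unexplained, so the training error shrinks as $M$ grows; in practice a learning rate and a validation set are used to keep this from overfitting.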

2.4. LightGBM Algorithm

LightGBM [29] (Light Gradient Boosting Machine) is a GBDT framework developed by Microsoft. It supports efficient parallel training and offers faster training, lower memory consumption, better accuracy, and distributed processing of massive data. GBDT implementations grow trees with one of two splitting strategies: level-wise or leaf-wise. The traditional GBDT adopts the level-wise strategy, which splits all nodes in each layer; this increases the computational cost of the algorithm and reduces efficiency, because many of these splits contribute little gain. In contrast, LightGBM uses the leaf-wise strategy: it computes the splitting gain of the current leaves and splits only the leaf with the highest gain, which improves the utilization of sample information, avoids computation on low-gain nodes, and makes model training more efficient.

The LightGBM algorithm uses a histogram algorithm to save time and memory. The basic idea is to discretize continuous floating-point feature values into k integer bins and construct a histogram with k bins. While traversing the data, it accumulates statistics in the histogram according to the discretized values. After one pass over the data, the histogram holds the required statistics, and the optimal split point is then searched among the discrete bin boundaries. Therefore, this algorithm outperforms traditional GBDT algorithms in training speed and memory usage. In addition, because each decision tree is a weak learner, the coarser split points produced by binning have a regularization effect and help prevent overfitting.
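
The histogram idea can be sketched as follows. This is a simplified illustration, not LightGBM's implementation: it assumes equal-width bins, a single feature, and an unregularized squared-error gain of the form $G_L^2/n_L + G_R^2/n_R - G^2/n$, where $G$ denotes a sum of gradients; the input values and gradients are hypothetical.

```python
def build_histogram(values, gradients, k):
    """Discretize values into k equal-width bins; accumulate gradient sum and count per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # guard against a constant feature
    grad_sum = [0.0] * k
    count = [0] * k
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), k - 1)   # the maximum value falls in the last bin
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def best_split_bin(grad_sum, count):
    """Scan the k - 1 bin boundaries and return (bin index, gain) of the best split."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    for b in range(len(count) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = (left_g ** 2 / left_n + right_g ** 2 / right_n
                - total_g ** 2 / total_n)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Hypothetical feature values and gradients for eight samples, k = 4 bins.
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
gradients = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
grad_sum, count = build_histogram(values, gradients, k=4)
split_bin, split_gain = best_split_bin(grad_sum, count)
```

The split search touches only k - 1 boundaries instead of every distinct feature value, which is where the time and memory savings come from.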

LightGBM uses GOSS (Gradient-based One-Side Sampling) technology to improve training efficiency by utilizing gradient information. GOSS prioritizes retaining samples with larger gradients as these samples are crucial for model updates. Meanwhile, a portion of the samples with smaller gradients is randomly sampled to ensure representativeness and reduce computational complexity. To maintain the balance of data distribution, GOSS weights and adjusts these samples with smaller gradients to compensate for the decrease in quantity. LightGBM also adopts Exclusive Feature Bundling (EFB) technology, which effectively reduces the number of features in model training by identifying and merging exclusive features, thereby improving training efficiency and memory utilization. This technology is particularly suitable for high-dimensional and sparse datasets, making LightGBM more efficient in processing large-scale data. The pseudocode for LightGBM is shown in Algorithm 1.

Algorithm 1: LightGBM Algorithm

# Input: train_data, test_data, hyperparameters
# Output: predictions
import lightgbm as lgb

def lightgbm_predict(train_data, test_data, hyperparameters):
    # Configure the gradient-boosting regressor from the tuned hyperparameters.
    lgb_model = lgb.LGBMRegressor(
        learning_rate=hyperparameters['learning_rate'],
        max_depth=hyperparameters['max_depth'],
        subsample=hyperparameters['subsample'],
        # LightGBM has no colsample_bylevel; colsample_bytree is its per-tree feature fraction.
        colsample_bytree=hyperparameters['colsample_bylevel'],
        n_estimators=hyperparameters['n_estimators'],
        random_state=42,
    )
    # Fit on the training split and track the test split as the evaluation set.
    lgb_model.fit(train_data['X'], train_data['y'],
                  eval_set=[(test_data['X'], test_data['y'])])
    return lgb_model.predict(test_data['X'])
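
The GOSS step described in Section 2.4 can also be sketched independently of the library. The example below is a minimal illustration under stated assumptions: `top_rate` and `other_rate` are hypothetical parameter names for the fractions of large-gradient and small-gradient samples that are kept, and the small-gradient samples are reweighted by `(1 - top_rate) / other_rate`, the compensation factor given in the LightGBM paper.

```python
import random


def goss_sample(gradients, top_rate, other_rate, seed=0):
    """Return (indices, weights) chosen by Gradient-based One-Side Sampling."""
    n = len(gradients)
    # Rank samples by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top = order[:n_top]                       # always keep large-gradient samples
    rest = order[n_top:]
    sampled = random.Random(seed).sample(rest, n_other)  # random subset of the rest
    amplify = (1.0 - top_rate) / other_rate   # compensates for the dropped samples
    indices = top + sampled
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights


# Hypothetical gradients for ten samples.
gradients = [0.05, -2.0, 0.1, 1.5, -0.02, 0.3, -0.2, 0.01, 0.9, -0.15]
indices, weights = goss_sample(gradients, top_rate=0.2, other_rate=0.3)
```

Here five of the ten samples are used in the next tree: the two with the largest gradients at weight 1, and three randomly chosen small-gradient samples at an amplified weight.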

2.5. CatBoost Algorithm

The CatBoost [30] algorithm was developed by the Yandex team and is optimized specifically for datasets containing categorical features. It is competitive in performance and accuracy and can handle large-scale datasets. Its chief characteristic is that it processes categorical features directly, without the need for tedious preprocessing. It adopts the idea of gradient boosting, with an implementation optimized for high-dimensional and large-scale datasets.

The simplest way to handle a categorical feature in a GBDT is to replace each category with the average label value of the samples in that category. In the decision tree, this average serves as the criterion for node splitting. This method is called Greedy Target-based Statistics, as shown in Equation (6):

$$\hat{x}_{i,k} = \frac{\sum_{j=1}^{n} [x_{j,k} = x_{i,k}] \cdot Y_{j}}{\sum_{j=1}^{n} [x_{j,k} = x_{i,k}]}, \tag{6}$$

where $x_{i,k}$ is the $k$th feature of the $i$th sample, $\hat{x}_{i,k}$ is its encoded value, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise.

This method has an obvious drawback: the features usually contain more information than the labels. If the average label value is forcibly used to represent a feature, a conditional shift problem arises when the training dataset and test dataset have different data structures and distributions. CatBoost uses Ordered Target Statistics (OTS) to reduce the impact of noise and low-frequency categorical data on the data distribution. OTS is shown in Equation (7):

$$\hat{x}_{\sigma_{p},k} = \frac{\sum_{j=1}^{p-1} [x_{\sigma_{j},k} = x_{\sigma_{p},k}] \cdot Y_{\sigma_{j}} + a \cdot P}{\sum_{j=1}^{p-1} [x_{\sigma_{j},k} = x_{\sigma_{p},k}] + a}, \tag{7}$$

where $\sigma$ is a random permutation of the samples, so that the sums run only over the samples that precede the $p$th one in this permutation; $P$ is the added prior term; and $a$ is a weight coefficient, usually greater than 0. Adding a prior term is a common practice that reduces the noise coming from low-frequency categories.
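
A minimal sketch of Equation (7) follows. It assumes a single random permutation, uses the mean target value as the prior (a common choice, not stated in the text), and runs on hypothetical texture categories and scores; CatBoost's own implementation is more elaborate and uses several permutations.

```python
import random


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode one categorical feature using only the samples seen earlier in a random order."""
    n = len(categories)
    permutation = list(range(n))
    random.Random(seed).shuffle(permutation)   # the random order sigma
    sums, counts = {}, {}
    encoded = [0.0] * n
    for idx in permutation:
        cat = categories[idx]
        # Numerator and denominator of Equation (7), built from preceding samples only.
        encoded[idx] = (sums.get(cat, 0.0) + a * prior) / (counts.get(cat, 0) + a)
        # Only now does the current sample's own target enter the running statistics.
        sums[cat] = sums.get(cat, 0.0) + targets[idx]
        counts[cat] = counts.get(cat, 0) + 1
    return encoded, permutation


# Hypothetical data: a texture category and a user score per sample.
categories = ["floral", "plain", "floral", "plain", "floral"]
targets = [8.0, 3.0, 9.0, 4.0, 7.0]
prior = sum(targets) / len(targets)
encoded, permutation = ordered_target_statistics(categories, targets, prior)
```

Because a sample's own target never contributes to its encoded value, the encoding avoids the target leakage of the greedy statistic in Equation (6); the first sample of each category simply receives the prior.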

The specific steps of feature processing are as follows. First, CatBoost randomly permutes the dataset to reduce the model's dependence on a specific sample order and increase its generalization ability. Second, for each categorical feature, CatBoost performs OTS calculations to convert the statistical information of the target variable for each category into a numerical feature. This transformation is based on the average value of the target variable in that category, and the categories are mapped to numerical values so that the model can use the transformed features directly for training. In addition, CatBoost introduces the prior term, whose weight coefficient balances the prior against the observed category statistics and thereby improves the stability and generalization ability of the model. These feature-processing strategies enable CatBoost to process categorical features more effectively. The pseudocode for CatBoost is shown in Algorithm 2.

Algorithm 2: CatBoost Algorithm

# Input: train_data, test_data, hyperparameters
# Output: predictions
from catboost import CatBoostRegressor

def catboost_predict(train_data, test_data, hyperparameters):
    # Configure the gradient-boosting regressor from the tuned hyperparameters.
    catboost_model = CatBoostRegressor(
        learning_rate=hyperparameters['learning_rate'],
        max_depth=hyperparameters['max_depth'],
        subsample=hyperparameters['subsample'],
        colsample_bylevel=hyperparameters['colsample_bylevel'],
        n_estimators=hyperparameters['n_estimators'],
        random_seed=42,
    )
    # Fit on the training split and track the test split as the evaluation set.
    catboost_model.fit(train_data['X'], train_data['y'],
                       eval_set=(test_data['X'], test_data['y']))
    return catboost_model.predict(test_data['X'])

2.6. Proxy Model Process

The specific steps for building a proxy model are shown in Figure 1. The main steps include collecting data, selecting training models, conducting training, saving models, and testing model performance.

2.7. Data Collection and Update

The use of proxy models to improve IGAs aims to assist users in evaluating products; however, training proxy models requires a large amount of data to improve their accuracy. Collecting data requires users to provide feedback, but a small amount of user feedback contains limited data. Therefore, this article designs an online platform [31] to collect data on online users, including basic user information, characteristics and fitness values of each generation population, running time, and other relevant information. The collected dataset is used to train the LightGBM algorithm model and the CatBoost algorithm model to help users predict and evaluate individual fitness values.

Although the LightGBM algorithm model and the CatBoost algorithm model make predictions based on similar individual information, the fitness values predicted based on historical data of different users may deviate from the fitness values evaluated by the current user. To address this, users are allowed to adjust the prediction results, and the evaluation data of new users are ultimately stored in the database. Since a significant amount of time is required for each training of the proxy model, to avoid taking up the user’s time, the proxy model is set to automatically update after the user’s data are saved, allowing the model to account for more situations. The specific update process is detailed in Figure 2.

3. Vase Design and Algorithm Design

3.1. Vase Design

The vase designed by the user in this article is modeled using a bicubic Bézier surface [32], which can be used to design various vase shapes by adjusting the control points of the vase surface, enriching the diversity of the vase. The formula for bicubic Bézier surfaces is as follows:

$$p(u, v) = \sum_{i=0}^{3} \sum_{j=0}^{3} P_{i,j} B_{i,3}(u) B_{j,3}(v), \quad (u, v) \in [0,1] \times [0,1]. \tag{8}$$

In Equation (8), $B_{i,3}(u)$ and $B_{j,3}(v)$ are cubic Bernstein basis functions, and $P_{i,j}$ are the control points of the surface. Equation (8) can be rewritten in matrix form as follows:

$$p(u, v) = U M P M^{T} V^{T}. \tag{9}$$

In Equation (9), $U = [u^{3} \; u^{2} \; u \; 1]$, $V = [v^{3} \; v^{2} \; v \; 1]$,

$$M = \begin{bmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad P = \begin{bmatrix} P_{0,0} & P_{0,1} & P_{0,2} & P_{0,3} \\ P_{1,0} & P_{1,1} & P_{1,2} & P_{1,3} \\ P_{2,0} & P_{2,1} & P_{2,2} & P_{2,3} \\ P_{3,0} & P_{3,1} & P_{3,2} & P_{3,3} \end{bmatrix}.$$
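
The equivalence of Equations (8) and (9) can be checked numerically. The sketch below evaluates one patch both ways; the 4 × 4 control net is a hypothetical example, not one of the 16 patches of the vase model.

```python
from math import comb

# Basis matrix M from Equation (9).
M = [[-1, 3, -3, 1],
     [3, -6, 3, 0],
     [-3, 3, 0, 0],
     [1, 0, 0, 0]]


def bernstein(i, t):
    """Cubic Bernstein basis function B_{i,3}(t)."""
    return comb(3, i) * t ** i * (1 - t) ** (3 - i)


def surface_point_bernstein(P, u, v):
    """Equation (8): double sum over the control net, P[i][j] = (x, y, z)."""
    return tuple(
        sum(P[i][j][c] * bernstein(i, u) * bernstein(j, v)
            for i in range(4) for j in range(4))
        for c in range(3))


def surface_point_matrix(P, u, v):
    """Equation (9): p(u, v) = U M P M^T V^T, evaluated per coordinate."""
    U = [u ** 3, u ** 2, u, 1.0]
    V = [v ** 3, v ** 2, v, 1.0]
    UM = [sum(U[r] * M[r][i] for r in range(4)) for i in range(4)]  # row vector U M
    VM = [sum(V[r] * M[r][j] for r in range(4)) for j in range(4)]  # entries of M^T V^T
    return tuple(
        sum(UM[i] * P[i][j][c] * VM[j] for i in range(4) for j in range(4))
        for c in range(3))


# Hypothetical control net: a 4 x 4 grid in x and y with a few raised z values.
P = [[(float(j), float(i), float((i * j) % 3)) for j in range(4)] for i in range(4)]

point_sum = surface_point_bernstein(P, 0.3, 0.7)
point_mat = surface_point_matrix(P, 0.3, 0.7)
```

The product $U M$ reproduces the four Bernstein polynomials in $u$, which is why the two forms agree; the patch also interpolates its corner control points, for example $p(0, 0) = P_{0,0}$.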

The vase model in this article consists of 16 bicubic Bézier surfaces, including a bottle mouth surrounded by 4 bicubic Bézier surfaces, a bottle body surrounded by 8 bicubic Bézier surfaces, and a bottom cover surrounded by 4 bicubic Bézier surfaces. During population evolution, differently shaped vase models can be designed by changing the parameters of surface control points. However, without texture, a vase is not complete. Therefore, various prepared vase texture maps were mapped to the surface of the vase using image texture mapping technology [33], increasing the realism of the vase and enriching its style.

3.2. Gene Coding

This article aims to apply IGAs to vase design, requiring the conversion of various vase design parameters into computer-readable formats, with gene coding being one effective method. The design of a vase can be represented by a series of parameters, such as shape, size, pattern, etc. The design parameters of each vase can be represented as a genome, where each gene represents a specific design parameter. This article divides the genome of a vase into three parts, namely vase texture, height, and scaling factor of control points. By crossing and mutating genomes, various types of vases can be obtained, as shown in Figure 3. The bottle body is controlled by control points $P_{0}$–$P_{6}$, and the shape of the vase is changed by changing the scaling coefficients of the control points.

In this paper, binary encoding is used as the method for gene encoding. The specific gene encoding is shown in Table 1. The vase height part is composed of 8 bits of binary code. The vase texture is represented by 6 bits of binary code as there are 64 texture images. Each control point, $P_{0}$–$P_{6}$, is composed of 8 bits of binary code. Therefore, the entire genome is composed of 70 bits of binary code (6 + 8 + 7 × 8). After the population evolves, the genome is decoded, and the vase parameters obtained from the decoding are used to generate new vases.
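
Decoding such a genome can be sketched as below. The bit widths come from the text; everything else is an assumption made for illustration, because Table 1 is not reproduced here: the field order (texture, height, then the seven scaling factors), the hypothetical `height_range` and `scale_range` values, and the linear mapping from integers to real values.

```python
TEXTURE_BITS, HEIGHT_BITS, SCALE_BITS, N_CONTROL_POINTS = 6, 8, 8, 7
GENOME_BITS = TEXTURE_BITS + HEIGHT_BITS + SCALE_BITS * N_CONTROL_POINTS  # 70


def decode(genome, height_range=(10.0, 40.0), scale_range=(0.5, 2.0)):
    """Decode a 70-character bit string into vase parameters (assumed layout and ranges)."""
    if len(genome) != GENOME_BITS or not set(genome) <= {"0", "1"}:
        raise ValueError("genome must be a string of 70 binary digits")
    pos = 0

    def take(n_bits):
        # Read the next n_bits of the genome as an unsigned integer.
        nonlocal pos
        value = int(genome[pos:pos + n_bits], 2)
        pos += n_bits
        return value

    def to_range(value, n_bits, lo, hi):
        # Map an n_bits integer linearly onto the interval [lo, hi].
        return lo + (hi - lo) * value / (2 ** n_bits - 1)

    texture = take(TEXTURE_BITS)                        # index of one of 64 textures
    height = to_range(take(HEIGHT_BITS), HEIGHT_BITS, *height_range)
    scales = [to_range(take(SCALE_BITS), SCALE_BITS, *scale_range)
              for _ in range(N_CONTROL_POINTS)]         # scaling factors for P_0..P_6
    return {"texture": texture, "height": height, "scales": scales}


smallest = decode("0" * GENOME_BITS)
largest = decode("1" * GENOME_BITS)
```

The all-zero and all-one genomes map to the two ends of every assumed range, which makes them convenient sanity checks for a decoder.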

3.3. Evolution Operators

In Genetic Algorithms, evolution operators include selection, crossover, and mutation. In vase design, the selection operation identifies the best individuals based on the fitness of the vase designs; the crossover operation exchanges genetic information between two individuals to produce new individuals; and the mutation operation introduces new genetic variations into the population. The evolution operators used in this paper are roulette wheel selection with an elite strategy, multi-point crossover, and simple mutation. Roulette wheel selection assigns a selection probability to each individual in proportion to its fitness, so individuals with higher fitness have a greater chance of being selected, as shown in Figure 4. The elite strategy retains the best individuals in the population. During crossover, n crossover points are chosen randomly on the genome, and the genetic information between them is exchanged. During mutation, one random position is selected on the genome, and the gene value at this position is changed. The specific implementation of crossover and mutation is shown in Figure 5, where the blue arrows represent the positions selected for crossover and mutation, and the red characters represent the genes after crossover and mutation.
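
The three operators can be sketched on bit-string genomes as follows. This is an illustration, not the platform's code: the number of crossover points (`n_points=2`), keeping a single elite, and flipping the bit as the "simple mutation" are assumptions, while the rates 0.9 and 0.1 are the values reported in Section 4.1.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its (positive) fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def multipoint_crossover(a, b, n_points, rng):
    """Swap alternating segments of two genomes at n_points random cut positions."""
    points = sorted(rng.sample(range(1, len(a)), n_points))
    child_a, child_b = [], []
    swap, prev = False, 0
    for cut in points + [len(a)]:
        seg_a, seg_b = a[prev:cut], b[prev:cut]
        child_a.append(seg_b if swap else seg_a)
        child_b.append(seg_a if swap else seg_b)
        swap, prev = not swap, cut
    return "".join(child_a), "".join(child_b)


def mutate(genome, rng):
    """Simple mutation: flip the bit at one random position."""
    pos = rng.randrange(len(genome))
    flipped = "1" if genome[pos] == "0" else "0"
    return genome[:pos] + flipped + genome[pos + 1:]


def next_generation(population, fitness, rng,
                    crossover_rate=0.9, mutation_rate=0.1, n_points=2):
    """Elite strategy plus roulette selection, crossover, and mutation."""
    elite = population[max(range(len(population)), key=lambda i: fitness[i])]
    children = [elite]                      # the best individual survives unchanged
    while len(children) < len(population):
        p1 = roulette_select(population, fitness, rng)
        p2 = roulette_select(population, fitness, rng)
        if rng.random() < crossover_rate:
            p1, p2 = multipoint_crossover(p1, p2, n_points, rng)
        for child in (p1, p2):
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            if len(children) < len(population):
                children.append(child)
    return children


rng = random.Random(1)
population = ["".join(rng.choice("01") for _ in range(70)) for _ in range(6)]
fitness = [3.0, 7.0, 5.0, 9.0, 2.0, 6.0]    # hypothetical user scores out of 10
offspring = next_generation(population, fitness, rng)
```

With a population of only six, keeping the elite guarantees that the best design seen so far is never lost to an unlucky crossover or mutation.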

3.4. Algorithm Flow

The algorithm flow includes steps such as initializing the population, evaluating fitness values, selecting excellent individuals, and performing crossover and mutation. In the initialization phase, an initial population is randomly generated, with each individual representing a possible vase design. Next, the proxy model predicts the fitness value of each individual, and users then modify any predicted values they consider inappropriate, so that the final fitness values reflect their own assessment of the designs. In the selection phase, excellent individuals are chosen based on their fitness values, and crossover and mutation operations are performed to produce new offspring. This process continues for multiple generations until the stopping condition is met. The flowchart is shown in Figure 6.
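
The loop described above can be sketched with the proxy model, the user, and the breeding step passed in as functions. All three stand-ins below (`proxy_predict`, `user_adjust`, `breed`) are hypothetical placeholders chosen only so that the sketch runs on its own; on the platform they would be the trained LightGBM or CatBoost model, the user interface, and the evolution operators of Section 3.3.

```python
import random


def run_iga(population, proxy_predict, user_adjust, breed, generations):
    """Proxy-assisted IGA loop: predict, let the user correct, then breed."""
    history = []          # (genome, final fitness) pairs, usable to retrain the proxy
    n_user_edits = 0      # how many predictions the user had to change
    for _ in range(generations):
        predicted = [proxy_predict(g) for g in population]
        fitness = [user_adjust(g, f) for g, f in zip(population, predicted)]
        n_user_edits += sum(1 for f, p in zip(fitness, predicted) if f != p)
        history.extend(zip(population, fitness))
        population = breed(population, fitness)
    return population, history, n_user_edits


rng = random.Random(0)


def proxy_predict(genome):
    # Hypothetical proxy: the score grows with the number of 1 bits (range 1 to 10).
    return 1 + 9 * genome.count("1") / len(genome)


def user_adjust(genome, predicted):
    # Hypothetical user: accepts the prediction unless the genome starts with "0".
    return 1.0 if genome.startswith("0") else predicted


def breed(population, fitness):
    # Hypothetical breeding step: keep the best genome and fill up with its mutants.
    best = population[max(range(len(population)), key=lambda i: fitness[i])]
    children = [best]
    while len(children) < len(population):
        pos = rng.randrange(len(best))
        flipped = "1" if best[pos] == "0" else "0"
        children.append(best[:pos] + flipped + best[pos + 1:])
    return children


initial = ["".join(rng.choice("01") for _ in range(70)) for _ in range(6)]
final_population, history, n_user_edits = run_iga(
    initial, proxy_predict, user_adjust, breed, generations=20)
```

The count `n_user_edits` corresponds to the idea behind the evaluation numbers compared later: the better the proxy matches the user, the fewer predictions the user has to correct.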

By following the above steps, an interactive 3D vase design based on a GBDT can be achieved. This design method fully leverages the human–computer interaction capabilities of IGAs, enabling users to engage in vase design and ultimately obtain vase design solutions that meet their personal needs and preferences.

4. Experimental Results and Analysis

4.1. Parameter Setting of the Genetic Algorithm

Consideration of user experience is crucial during human–computer interaction. Excessive evaluations per generation and too many generations can lead to user fatigue, while appropriate parameter settings can alleviate this fatigue. Based on references to existing papers and experimental validation, the population size for each evolution is set to 6, the number of generations to 20, and the crossover and mutation rates to 0.9 and 0.1, respectively, as shown in Table 2.

4.2. Comparison of Evolution Strategies

To demonstrate the superiority of the evolution operators chosen in this paper, two different evolution strategies for the IGA were compared, as shown in Table 3.

Figure 7 shows the average fitness values of the evolving population in each generation, where the two lines represent the average fitness value changes for the IGA using Strategy 1 and Strategy 2. Overall, both lines show an upward trend. In the later stages of evolution, the line for Strategy 1 is higher than that for Strategy 2, indicating that Strategy 1 achieves better final optimization results.

4.3. Comparison of Different Proxy Models

To validate the predictive capability of the CatBoost model and LightGBM model chosen in this paper, comparisons were made with the KDT algorithm, RF algorithm, and XGBoost algorithm. User data from the 3D vase design platform were used as the dataset, which was divided into training and test sets in certain proportions to train and validate the models. The predicted values output by the models were compared with the actual values, and a random selection of test set samples was displayed, as shown in Figure 8.

From Figure 8, it can be seen that the curves of the CatBoost model and LightGBM model fit the actual value curves better than those of the other models, indicating better fitting performance. To further verify the predictive performance of the LightGBM and CatBoost models, RMSE, MSE, MAE, and R² were used as performance evaluation metrics.

As shown in Table 4, the RMSE for the LightGBM and CatBoost models is 1.0200 and 0.9900, respectively, with MSE values of 1.05 and 0.99, MAE values of 0.626 and 0.637, and R² of 0.816 and 0.839. Compared with the other three proxy models, these values indicate lower errors and higher accuracy, demonstrating that the LightGBM and CatBoost models have better predictive performance.
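
For reference, the four metrics can be computed as in the following sketch. The `actual` and `predicted` values are hypothetical and unrelated to the results in Table 4.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, and R² for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# Hypothetical user scores and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(actual, predicted)
```

RMSE, MSE, and MAE measure error, so lower is better, whereas R² measures the share of variance explained, so values closer to 1 are better.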

4.4. Comparison of Optimization Capabilities

To verify the impact of combining proxy models on the optimization capabilities of the IGA, a comparison was made between a traditional IGA and an IGA combined with KDT (KDT-IGA), RF (RF-IGA), XGBoost (XGB-IGA), LightGBM (LGBM-IGA), and CatBoost (CAT-IGA). The more effective Strategy 1 was used for the evolutionary parameters of all six algorithms.

A simple, user-friendly interface was designed for this study, as shown in Figure 9. The left side displays the vase individuals of the evolving population for the user to score (maximum score of 10), and the right side shows the population size, maximum number of generations, crossover probability, mutation probability, and the proxy model in use. To assess the optimization capabilities of LGBM-IGA and CAT-IGA, five users each ran all six algorithms, and the resulting average fitness values and average maximum fitness values were compared.

Evaluation data were collected from the five users' interactions with the vase design platform, and the average fitness values and average maximum fitness values were calculated. Figure 10a shows the average fitness values of the evolving population in each generation for the six algorithms. The curves for LGBM-IGA and CAT-IGA trend upward overall and lie above those of the other algorithms. Figure 11a shows the average maximum fitness values in each generation, where the curves for LGBM-IGA and CAT-IGA are again higher than those of the other algorithms. This comparison indicates that the two improved algorithms proposed in this paper have stronger optimization capabilities.

Figure 10b and Figure 11b display the final generation’s average fitness values and average maximum fitness values, respectively, with error bars for the six algorithms. LGBM-IGA and CAT-IGA achieve the highest average fitness values and average maximum fitness values, with relatively small error bars, indicating superior performance and stability.

4.5. Comparison of User Fatigue Alleviation

The IGA relies on user interaction to optimize problems whose objectives cannot be defined with formulas. However, a core issue with IGAs is user fatigue caused by prolonged evaluation. Fatigue impairs concentration and attention, which lowers the user's efficiency and, in turn, the efficiency of the overall system. In this paper, the proxy models in LGBM-IGA and CAT-IGA predict vase fitness values by fitting the user's evaluations, which reduces the number of evaluations the user must perform and thereby alleviates fatigue. To evaluate how effectively the proposed algorithms alleviate user fatigue, the evaluation numbers and evaluation time were recorded for five users using each of the six algorithms. Here, evaluation numbers refer to the count of evaluations a user performs on individuals in the population, while evaluation time is the time a user spends in the design process, which also reflects system running speed and algorithm performance. The larger the evaluation numbers and the longer the evaluation time, the more likely users are to experience fatigue, so comparing these two metrics across algorithms indicates how well each one mitigates fatigue. The data are shown in Table 5.

Figure 12 and Figure 13 show the mean and standard deviation of the evaluation numbers and evaluation time for the five users on the vase design platform using IGA, KDT-IGA, RF-IGA, XGB-IGA, LGBM-IGA, and CAT-IGA. IGAs with proxy models have lower average evaluation numbers and a shorter average evaluation time than the traditional IGA, and LGBM-IGA and CAT-IGA have the lowest values among the proxy models. Compared with the traditional IGA, LGBM-IGA and CAT-IGA reduced the average evaluation numbers by 74 and 72.4 and the average evaluation time by 233 s and 230 s, respectively. Their standard deviations for both metrics are also smaller than those of the other proxy models, indicating more consistent evaluation numbers and evaluation time across users. The standard deviation of the evaluation numbers for the traditional IGA is 0 because every user performed the same 120 evaluations, which matches a population of 6 evaluated over 20 generations (Table 2). Overall, IGAs with proxy models mitigate user fatigue better than the traditional IGA. LGBM-IGA and CAT-IGA, with the lowest means and standard deviations, are the most stable and most effective of the algorithms compared, which improves both algorithm performance and user experience.
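The means, standard deviations, and reductions discussed above can be recomputed directly from Table 5. The sketch below does so with Python's standard library. The dictionary layout and function name are assumptions for illustration, and the population standard deviation (`pstdev`) is used because the text does not state which variant underlies Figures 12 and 13.

```python
from statistics import mean, pstdev

# Evaluation numbers and evaluation time (seconds) for users 1-5, from Table 5.
TABLE5 = {
    "IGA":      {"numbers": [120, 120, 120, 120, 120], "time_s": [511, 394, 478, 354, 468]},
    "KDT-IGA":  {"numbers": [73, 66, 57, 54, 42],      "time_s": [369, 407, 370, 322, 341]},
    "RF-IGA":   {"numbers": [68, 65, 83, 31, 36],      "time_s": [340, 341, 314, 286, 292]},
    "XGB-IGA":  {"numbers": [58, 60, 30, 42, 75],      "time_s": [358, 236, 270, 242, 248]},
    "LGBM-IGA": {"numbers": [40, 32, 60, 52, 46],      "time_s": [205, 196, 230, 213, 196]},
    "CAT-IGA":  {"numbers": [36, 54, 45, 56, 47],      "time_s": [192, 221, 195, 226, 221]},
}


def summarize(table):
    """Mean and population standard deviation per algorithm and metric."""
    return {
        algo: {
            metric: {"mean": mean(values), "std": pstdev(values)}
            for metric, values in rows.items()
        }
        for algo, rows in table.items()
    }


summary = summarize(TABLE5)

# Reduction in the mean relative to the traditional IGA.
reductions = {
    algo: {
        metric: summary["IGA"][metric]["mean"] - summary[algo][metric]["mean"]
        for metric in ("numbers", "time_s")
    }
    for algo in ("LGBM-IGA", "CAT-IGA")
}

for algo, saved in reductions.items():
    print(f"{algo}: {saved['numbers']:.1f} fewer evaluations, "
          f"{saved['time_s']:.0f} s less time")
```

With the Table 5 data, this reproduces the reductions quoted above (74 and 72.4 evaluations, 233 s and 230 s). Replacing `pstdev` with `statistics.stdev` gives the sample standard deviation instead.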

5. Conclusions

This work aims to optimize the user experience and improve design efficiency. An improved IGA method is proposed that uses LightGBM and CatBoost as proxy models to predict vase fitness values. The algorithm is applied to a 3D vase design platform, and its effectiveness is validated through experiments. The method trains the LightGBM and CatBoost proxy models on historical user data, uses them to predict user fitness evaluations, and continuously updates them to increase accuracy.

The experimental results show that LightGBM and CatBoost outperform KDT, RF, and XGBoost on the proxy model performance metrics. Both LGBM-IGA and CAT-IGA achieve higher average fitness and average maximum fitness curves than KDT-IGA, RF-IGA, and XGB-IGA, indicating stronger optimization capabilities. Moreover, LGBM-IGA and CAT-IGA reduce the number and duration of evaluations more than KDT-IGA, RF-IGA, and XGB-IGA, thereby alleviating user fatigue and making operation more convenient. In summary, the proposed algorithm streamlines the operational process, reduces user fatigue, and enhances product design efficiency, which may in turn lower design costs and strengthen product innovation and competitiveness. However, individual differences between users may affect how well fatigue is alleviated, which calls for further research.

Author Contributions

Conceptualization, D.W. and X.X. (Xing Xu); methodology, D.W. and X.X. (Xing Xu); software, D.W.; validation, D.W., X.X. (Xing Xu), X.X. (Xuewen Xia) and H.J.; formal analysis, D.W. and X.X. (Xing Xu); investigation, D.W.; resources, D.W.; data curation, D.W.; writing—original draft preparation, D.W.; writing—review and editing, D.W., X.X. (Xing Xu), X.X. (Xuewen Xia) and H.J.; visualization, D.W.; supervision, X.X. (Xing Xu), X.X. (Xuewen Xia) and H.J.; project administration, D.W.; funding acquisition, X.X. (Xing Xu). All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the Natural Science Foundation of Fujian Province of China (no. 2021J011007), the Fujian Provincial Department of Education Undergraduate Education and Teaching Research Project (no. FBJY20230083), the Principal’s Foundation of Minnan Normal University (KJ19015), the Program for the Introduction of High-Level Talent of Zhangzhou, and the National Natural Science Foundation of China (no. 61702239).

Data Availability Statement

The data will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Proxy model flowchart.

Figure 2. Updating the proxy model.

Figure 3. Composition of vase coding.

Figure 4. Roulette wheel selection.

Figure 5. Graphic interaction mechanism.

Figure 6. Algorithm flow chart.

Figure 7. Comparison of evolutionary strategies.

Figure 8. Comparison of predicted and actual values for five proxy models.

Figure 9. Interactive interface.

Figure 10. (a) Comparison of average fitness values. (b) Comparison of average fitness values in the last generation.

Figure 11. (a) Comparison of average maximum fitness values. (b) Comparison of average maximum fitness values in the last generation.

Figure 12. Mean value and standard deviation of evaluation numbers.

Figure 13. Mean value and standard deviation of evaluation time.

Table 1. Binary encoding.

Encoding Composition	Binary Encoding	Number of Binary Bits
height	00101011	8 bit
texture	000110	6 bit
$P_{0}$	00111011	8 bit
$P_{1}$	00010011	8 bit
$P_{2}$	00111111	8 bit
$P_{3}$	01000001	8 bit
$P_{4}$	01000011	8 bit
$P_{5}$	00111111	8 bit
$P_{6}$	00111010	8 bit
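The encoding in Table 1 can be decoded by slicing a chromosome into its fields. The sketch below assumes that the fields are concatenated in the order listed in the table, giving a 70-bit string, and it returns the raw integer value of each field. How those integers map to vase geometry is not covered here, and the names `FIELDS` and `decode` are illustrative.

```python
# Field widths in bits, in the order listed in Table 1.
# The concatenation order is an assumption made for this sketch.
FIELDS = [("height", 8), ("texture", 6)] + [(f"P{i}", 8) for i in range(7)]
GENOME_LENGTH = sum(width for _, width in FIELDS)  # 8 + 6 + 7 * 8 = 70 bits


def decode(genome: str) -> dict:
    """Split a binary chromosome into named fields and parse each as an integer."""
    if len(genome) != GENOME_LENGTH or set(genome) - {"0", "1"}:
        raise ValueError(f"expected a {GENOME_LENGTH}-bit binary string")
    values, pos = {}, 0
    for name, width in FIELDS:
        values[name] = int(genome[pos:pos + width], 2)
        pos += width
    return values


# The example chromosome from Table 1, field by field.
example = "".join([
    "00101011",  # height
    "000110",    # texture
    "00111011",  # P0
    "00010011",  # P1
    "00111111",  # P2
    "01000001",  # P3
    "01000011",  # P4
    "00111111",  # P5
    "00111010",  # P6
])
decoded = decode(example)
print(decoded)
```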

Table 2. Genetic parameter setting.

Parameter	Numerical Value
mutation probability	0.1
crossover probability	0.9
population size	6
maximum generation	20

Table 3. Evolutionary strategy.

Evolutionary Operator	Strategy 1	Strategy 2
selection	roulette wheel selection and elite strategy	random selection and elite strategy
crossover	multi-point crossover	two-point crossover
mutation	simple mutation	simple mutation
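As an illustration of how the Strategy 1 operators fit together, the following is a minimal sketch of one generation step on binary chromosomes, using the probabilities from Table 2. Several details are assumptions rather than the platform's actual implementation: the number of crossover points (3), applying the mutation probability per bit, keeping exactly one elite individual, and all function names.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, score in zip(population, fitness):
        running += score
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off


def multipoint_crossover(a, b, points, rng):
    """Swap alternating segments of two equal-length bit strings."""
    cuts = sorted(rng.sample(range(1, len(a)), points))
    child_a, child_b = [], []
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        seg_a, seg_b = a[prev:cut], b[prev:cut]
        child_a.append(seg_b if swap else seg_a)
        child_b.append(seg_a if swap else seg_b)
        swap, prev = not swap, cut
    return "".join(child_a), "".join(child_b)


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate` (an assumption)."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit for bit in genome)


def next_generation(population, fitness, rng,
                    crossover_p=0.9, mutation_p=0.1, points=3):
    """Elitism plus roulette selection, multi-point crossover, and mutation."""
    best = max(range(len(population)), key=fitness.__getitem__)
    new_population = [population[best]]  # the elite is copied unchanged
    while len(new_population) < len(population):
        a = roulette_select(population, fitness, rng)
        b = roulette_select(population, fitness, rng)
        if rng.random() < crossover_p:
            a, b = multipoint_crossover(a, b, points, rng)
        new_population.append(mutate(a, mutation_p, rng))
        if len(new_population) < len(population):
            new_population.append(mutate(b, mutation_p, rng))
    return new_population


rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = ["".join(rng.choice("01") for _ in range(70)) for _ in range(6)]
scores = [3.0, 7.5, 5.0, 9.0, 4.0, 6.0]  # hypothetical user scores out of 10
offspring = next_generation(population, scores, rng)
```

In an interactive setting, `scores` would come from the user or from the proxy model's predictions. Keeping the elite ensures that the best-rated vase of a generation is never lost to crossover or mutation.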

Table 4. Performance comparison of proxy models.

Proxy Model	RMSE	MSE	MAE	R²
KDT	2.5677	2.78	1.953	0.537
RF	1.1533	1.33	0.750	0.766
XGBoost	1.0858	1.18	0.695	0.793
LightGBM	1.0200	1.05	0.626	0.816
CatBoost	0.9900	0.99	0.637	0.839

Table 5. Comparison of evaluation numbers and evaluation time.

Algorithm	Metric	User 1	User 2	User 3	User 4	User 5
IGA	evaluation numbers	120	120	120	120	120
IGA	evaluation time	511 s	394 s	478 s	354 s	468 s
KDT-IGA	evaluation numbers	73	66	57	54	42
KDT-IGA	evaluation time	369 s	407 s	370 s	322 s	341 s
RF-IGA	evaluation numbers	68	65	83	31	36
RF-IGA	evaluation time	340 s	341 s	314 s	286 s	292 s
XGB-IGA	evaluation numbers	58	60	30	42	75
XGB-IGA	evaluation time	358 s	236 s	270 s	242 s	248 s
LGBM-IGA	evaluation numbers	40	32	60	52	46
LGBM-IGA	evaluation time	205 s	196 s	230 s	213 s	196 s
CAT-IGA	evaluation numbers	36	54	45	56	47
CAT-IGA	evaluation time	192 s	221 s	195 s	226 s	221 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

