Prediction of Glass Chemical Composition and Type Identification Based on Machine Learning Algorithms

Chen, Ziwei; Xu, Yang; Zhang, Chao; Tang, Min

doi:10.3390/app14104017


¹ College of Hydraulic Science and Engineering, Yangzhou University, Yangzhou 225009, China

² College of Mathematical Science, Yangzhou University, Yangzhou 225009, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(10), 4017; https://doi.org/10.3390/app14104017

Submission received: 11 March 2024 / Revised: 26 April 2024 / Accepted: 30 April 2024 / Published: 9 May 2024

(This article belongs to the Section Computing and Artificial Intelligence)


Abstract

Ancient glass artifacts were susceptible to weathering from the environment, causing changes in their chemical composition, which poses significant obstacles to the identification of glass products. Analyzing the chemical composition of ancient glass has been beneficial for evaluating its weathering status and proposing measures to reduce glass weathering. The objective of this study was to explore the optimal machine learning algorithm for glass type classification based on chemical composition. A set of glass artifact data including color, emblazonry, weathering, and chemical composition was employed, and various methods including logistic regression and machine learning techniques were used. The results indicated that a significant correlation (p < 0.05) could only be observed between surface weathering and the glass types (high-potassium and lead–barium). Based on the random forest and logistic regression models, the primary chemical components that signify glass types and weathering status were determined to be PbO, K₂O, BaO, SiO₂, Al₂O₃, and P₂O₅. The random forest model presented a superior ability to identify glass types and weathering status, with a global accuracy of 96.3%. This study demonstrates the great potential of machine learning for glass chemical component estimation and glass type and weathering status identification, providing technical guidance for the appraisal of ancient glass artifacts.

Keywords: glass type classification; logistic regression; weathering; BP neural network; random forest

1. Introduction

The Silk Road served as a channel for ancient cultural exchange between Eastern and Western countries, and glass is invaluable proof of early trade. Ancient Chinese glass was a unique style in the history of world glass, with a distinct chemical composition compared to foreign glass. The main raw material used in glass is quartz sand, with the primary chemical component being silicon dioxide (SiO₂). As pure quartz sand has a high melting point, fluxing agents were commonly added during glass refining to reduce the melting temperature. However, the addition of different fluxing agents could change the major chemical components of the glass. For example, adding more lead fluxing agent could increase the content of lead oxide (PbO) and barium oxide (BaO), resulting in the production of lead–barium glass. Ancient glass artifacts, especially those buried underground, are highly susceptible to chemical weathering due to the element exchanges between the glass and soil, thus altering their chemical composition ratio. This often affects the accurate identification of glass types. Therefore, identifying and analyzing the composition of ancient glass is vital to accurately classify the types of glass artifacts.

To date, limited studies have been performed on the patterns of chemical composition changes in weathered glass and the impact of chemical composition on glass classification. For example, Zhou [1] employed infrared reflection spectroscopy and polarized light analysis to investigate the weathering of glass in carbon dioxide and atmospheric conditions, reveal the weathering mechanism of silicate glass, and analyze the primary factors influencing silicate glass weathering. Tao et al. [2] investigated the effect of doses of zirconium ions injected into phosphate glass on its surface weathering features and layer thickness and proposed that the degree of glass weathering was reduced by adding zirconium ions, from which the glass weathering could be effectively identified. Xu et al. [3] investigated the chemical composition of 11 natural glass samples based on X-ray diffraction and used chemical composition analysis to distinguish glass specimens. Compositional heterogeneity is another crucial aspect of the study of glass artifacts. For instance, Fu et al. [4] suggested that the chemical composition of ancient glass may exhibit heterogeneity across different regions, possibly due to localized variations during manufacturing processes or subsequent treatments. Moreover, when potassium oxide is not detected, researchers may rely on the characteristics of other chemical components to infer the type of glass samples. However, current methods have been shown to be insufficient for effectively classifying glass artifacts; thus, this study considers modern machine learning algorithms for analyzing the chemical composition of glass.

The development of modern computer technology has provided superior methods for analyzing and processing the composition of glass artifacts, such as machine learning algorithms [5,6]. Machine learning draws on multiple disciplines including probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory, which could further enhance data mining, analysis, and processing capabilities [7,8]. For example, Sun et al. [9] conducted a systematic analysis of the glass-forming ability of binary alloys using the support vector machine algorithm to establish the correlation between the alloys’ composition and properties, and had much success in identifying new materials. Li [10] utilized deep learning methods to extract defect features through a data-driven analysis of glass products, which effectively enhanced the accuracy of recognizing and detecting defects in glass products. Zhang et al. [11] proposed an integrated deep learning model for identifying glass defects, which combined a sparse coding classifier with deep convolutional neural networks to improve the model’s recognition accuracy to meet engineering applications. Babbar et al. [12] established connections between polymer data factors (feature space representation, scope of training data, and learning algorithms) for predicting the glass transition temperature of polymers, yielding promising results. Zhang et al. [13] investigated the long-term durability of glass fiber-reinforced polymers in highly alkaline environments, developing a universal, accurate, and optimized tensile strength retention prediction model from a machine learning perspective. Ultimately, they found that machine learning offers practical benefits for long-term durability studies of glass fiber-reinforced polymers. These studies suggest that machine learning is an effective approach for identifying and classifying types of glass products.

However, two issues still hinder the identification of different types of glass. First, the weathering process could change the internal elemental composition of glass, causing misjudgment in the classification result [14,15]. Thus, accurately predicting the chemical composition of glass before weathering is a prerequisite for achieving precise classification. Second, although several studies on the classification of glass types have been reported, further investigations are still needed to construct new classification methods and compare the capability of different machine learning algorithms to improve accuracy. For example, Zou et al. [16] used one-way analysis of variance to sub-classify glass artifacts, Li et al. [17] employed a multivariate linear regression model for classifying unknown glass products, and Guo et al. [18] utilized a support vector machine approach with the slime mold algorithm strategy for classifying glass products. Previous studies have reported that neural network and random forest algorithms exhibit superior estimation and classification performances [19,20].

Therefore, to address the aforementioned issues, a dataset of ancient glass products comprising high-potassium and lead–barium glass was used, and the chi-square test, binary logistic regression, neural network, and random forest algorithms were employed. The objectives of this study were to (1) investigate the correlation between the degree of surface weathering of glass artifacts and the glass type, emblazonry, and color; (2) establish a model for predicting the chemical composition of glass prior to weathering based on machine learning algorithms; and (3) determine the optimal chemical components for glass classification, achieve an accurate classification of the types of glass products, and recognize the effects of weathering on them.

2. Materials and Methods

2.1. Attribute Statistics

A set of glass artifacts was measured, and four categories of data were collected: glass type, emblazonry, color, and chemical composition. The glass type was divided into high-potassium and lead–barium; the emblazonry was simplified to designs A, B, and C; and eight colors were recorded: light green, light blue, dark green, dark blue, purple, green, teal, and black. Statistics for the attributes of the data are provided in Table 1.

2.2. Chi-Square Test Model

A chi-square test was adopted to explore the correlation between the four categorical variables: the type, color, decorative pattern (emblazonry), and surface weathering of the cultural relics [21]. The correlations between the color, decorative pattern, type, and surface weathering of the glass cultural relics were obtained by calculating the chi-square statistic (χ²) and the contingency coefficient (C). The formula for the chi-square statistic χ² is shown in Equation (1):

χ^{2} = \sum_{i = 1}^{n} \frac{(f_{i 1} - f_{i 0})^{2}}{f_{i 0}}

(1)

where f_i0 represents the expected frequency and f_i1 the observed frequency of the i-th cell of the contingency table, and the sum runs over all n cells.

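The contingency coefficient (C) was used to measure the strength of the correlation between variables, as shown in Equation (2):

C = \sqrt{\frac{χ^{2}}{χ^{2} + n}}

(2)

where χ² represents the chi-square statistic and n here represents the total number of observations (the sample size), as in the usual definition of the contingency coefficient.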
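As a minimal sketch of Equations (1) and (2), the snippet below computes the chi-square statistic and the contingency coefficient from a table of observed counts. It assumes the standard squared-difference form of the statistic and takes n in Equation (2) to be the total number of observations; the function name and the counts in `example_table` are hypothetical and are not taken from Table 2.

```python
import math

def chi_square_and_contingency(table):
    """Return (chi2, C) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed - expected) ** 2 / expected
    # Contingency coefficient, assuming n is the total sample size.
    contingency = math.sqrt(chi2 / (chi2 + total))
    return chi2, contingency

# Hypothetical counts: rows = glass type, columns = (weathered, non-weathered).
example_table = [[10, 20], [30, 10]]
chi2, c_coef = chi_square_and_contingency(example_table)
print(f"chi2 = {chi2:.3f}, C = {c_coef:.3f}")
```

The p-value would additionally require the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom, which is omitted here to keep the sketch dependency-free.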

2.3. Binary Logistic Regression Model

Logistic regression can be used to perform regression on binary categorical data to predict the probability of each category of the dependent variable and to analyze the contribution of each independent variable [22]. The logistic regression model was used in this study to extract the chemical components that most influence the classification of glass artifact type, because a number of the measured chemical components are insensitive to the glass type. The independent variables were defined as the chemical components of the glass artifacts, and the dependent variable was the type of artifact. Among the available data, 42 samples of glass artifacts were used for training and the remaining 10 served as the testing set. The types of glass artifacts were encoded by treating “high-potassium” as 1 and “lead–barium” as 0. Let (P) denote the probability of the glass product being of the “high-potassium” type, with a range of values between 0 and 1. Therefore, (1 − P) represents the probability of the glass product being of the “lead–barium” type, and a specific model was established as shown in Equations (3)–(5):

\mathrm{odds} = \frac{P}{1 - P}

(3)

\ln(\mathrm{odds}) = β_{0} + \sum_{i = 1}^{n} β_{i} x_{i}

(4)

P = \frac{e^{β_{0} + \sum_{i = 1}^{n} β_{i} x_{i}}}{1 + e^{β_{0} + \sum_{i = 1}^{n} β_{i} x_{i}}}

(5)

where (P) is the probability of glass artifacts being of the high-potassium type, odds refers to the ratio of the probability of an event occurring to the probability of it not occurring, x_i (i = 1, 2, …, n) is the content of the i-th chemical component, (β_i) is the corresponding regression coefficient of the binary logistic regression model, and (β₀) represents the intercept term.
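To illustrate Equations (3)–(5), the following sketch evaluates the probability P for one sample and applies a 0.5 decision threshold. The function names, the threshold, and all coefficient and content values are hypothetical placeholders chosen for illustration; they are not the fitted coefficients reported in this study.

```python
import math

def high_potassium_probability(intercept, coefficients, contents):
    """Equation (5): P = e^z / (1 + e^z), with z the log-odds of Equation (4)."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, contents))
    return 1.0 / (1.0 + math.exp(-log_odds))  # algebraically equal to e^z / (1 + e^z)

def classify(probability, threshold=0.5):
    """Map P to the encoded classes: 1 = high-potassium, 0 = lead-barium."""
    return 1 if probability >= threshold else 0

# Hypothetical model with two components (for example, K2O and PbO contents in wt%).
beta0 = 0.0
betas = [0.8, -0.5]  # positive -> favors high-potassium, negative -> favors lead-barium
p_sample = high_potassium_probability(beta0, betas, [9.0, 0.5])
print(f"P(high-potassium) = {p_sample:.4f}, class = {classify(p_sample)}")
```

A positive coefficient raises the log-odds as the component content increases, and a negative coefficient lowers it, which is the basis for reading component importance from the signs and magnitudes of the coefficients.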

2.4. Neural Network Models

Artificial neural network models are mathematical models established by simulating the microstructure and function of the human brain’s neural system, and they are an important method for simulating human intelligence [23,24]. This study employed a neural network based on the error backpropagation algorithm (referred to herein as the BP neural network), which is a supervised learning algorithm used in artificial neural networks. In theory, the network consists of nonlinear transformation units and has strong nonlinear mapping capabilities that could approximate any function. Furthermore, the number of hidden layers, the number of processing units, and the learning coefficients of the network could be set according to specific circumstances, providing great flexibility. The structure of the BP neural network can be displayed in a topological diagram as shown in Figure 1:

The BP neural network mainly consists of three layers: the input layer, the hidden layer, and the output layer. Information from external sources is transmitted through the input layer to the hidden layer for processing, and the final result is obtained from the output layer. When there is a large error between the output layer result and the expected output value, the error enters the backpropagation stage of the BP neural network and the network weights are updated until the output meets the preset conditions with respect to the expected result [25,26].

The input variable (net_j) of the j-th node in the hidden layer was calculated as follows:

\mathrm{net}_{j} = \sum_{i = 1}^{N} w_{j i} y_{i} + θ_{j}

(6)

where the variable y_i represents the input parameter of the i-th node in the input layer, i = 1, …, N; the variable w_ji represents the weight parameter between the j-th node of the hidden layer and the i-th node of the input layer. The symbol θ_j denotes the bias term (also referred to as the threshold) of the j-th neuron in the hidden layer.

The output variable (x_j) of the j-th node in the hidden layer of the neural network was defined as follows:

x_{j} = ϕ(\mathrm{net}_{j}) = ϕ\left(\sum_{i = 1}^{N} w_{j i} y_{i} + θ_{j}\right)

(7)

where θ_j denotes the threshold parameter of the j-th node in the hidden layer of the network.

The input variable (net_t) of the t-th node in the output layer was expressed as follows:

\mathrm{net}_{t} = \sum_{j = 1}^{p} w_{t j} x_{j} + b_{t} = \sum_{j = 1}^{p} w_{t j} ϕ\left(\sum_{i = 1}^{N} w_{j i} y_{i} + θ_{j}\right) + b_{t}

(8)

where ϕ(·) represents the activation function of the hidden layer [27]; w_tj denotes the weight parameter between the j-th node in the hidden layer and the t-th node in the output layer, j = 1, …, p; and b_t denotes the threshold parameter of the t-th node in the output layer, t = 1, …, l.

The output variable (o_t) of the t-th node in the output layer of the neural network can be expressed as follows:

o_{t} = ψ(\mathrm{net}_{t}) = ψ\left(\sum_{j = 1}^{p} w_{t j} x_{j} + b_{t}\right) = ψ\left(\sum_{j = 1}^{p} w_{t j} ϕ\left(\sum_{i = 1}^{N} w_{j i} y_{i} + θ_{j}\right) + b_{t}\right)

(9)

where ψ(·) represents the activation function of the output layer and o_t denotes the output of the t-th node in the output layer.
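The forward pass described by Equations (6)–(9) can be sketched in a few lines. This is an illustration only: it assumes a sigmoid hidden activation ϕ and an identity (linear) output activation ψ, neither of which is specified in the text, and the function names, weights, and inputs are hypothetical.

```python
import math

def sigmoid(v):
    """Assumed hidden-layer activation (the text does not name the function)."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """One forward pass through a single-hidden-layer network."""
    # Equations (6) and (7): net_j = sum_i w_ji * y_i + theta_j, x_j = phi(net_j)
    hidden = [
        sigmoid(sum(w * y for w, y in zip(row, inputs)) + theta)
        for row, theta in zip(hidden_weights, hidden_bias)
    ]
    # Equations (8) and (9) with an assumed identity output activation:
    # o_t = net_t = sum_j w_tj * x_j + b_t
    outputs = [
        sum(w * x for w, x in zip(row, hidden)) + b
        for row, b in zip(output_weights, output_bias)
    ]
    return hidden, outputs

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output.
demo_hidden, demo_outputs = forward(
    inputs=[0.3, 0.7],
    hidden_weights=[[0.5, -0.2], [0.1, 0.4]],
    hidden_bias=[0.0, 0.1],
    output_weights=[[1.0, -1.0]],
    output_bias=[0.2],
)
print(demo_hidden, demo_outputs)
```

Training would then compare the outputs with the expected values and propagate the error backwards to update the weights, which is not shown here.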

In this study, the number of neurons in the hidden layer was determined following the empirical formula:

n = \sqrt{x + y} + a

(10)

where n represents the number of neurons in the hidden layer, x denotes the number of neurons in the input layer, y refers to the number of neurons in the output layer, and a is a constant.
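Equation (10) yields a range of candidate hidden-layer sizes rather than a single value, since a is a free constant. The sketch below lists the candidates for a given input and output size; the range of a from 1 to 10 is a commonly used convention assumed here, and the layer sizes in the example are hypothetical because the text does not state them.

```python
import math

def candidate_hidden_sizes(n_inputs, n_outputs, a_values=range(1, 11)):
    """Equation (10): n = sqrt(x + y) + a, rounded to the nearest integer."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in a_values]

# Hypothetical layer sizes: 14 input and 14 output components.
candidates = candidate_hidden_sizes(14, 14)
print(candidates)
```

Each candidate would then be trained and compared on held-out error to select the final hidden-layer size.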

The selection of a suitable training set was crucial for the BP neural network algorithm. Although some glass artifacts have been weathered, specific areas of glass surfaces may still have their original conditions preserved, which could serve as significant references for the identification models as well. Hence, to construct a training set, the chemical compositions of glass samples before and after weathering were included and randomly correlated following the principle of consistent chemical content. Subsequently, a neural network identification model was derived with the post-weathering glass chemical composition constituting the input layer and the pre-weathering glass chemical composition comprising the output layer. After multiple rounds of testing, the optimal number of neurons in the hidden layer was identified as 10.

2.5. Random Forest Model

Random forest is a frequently used ensemble learning technique based on the use of decision trees in machine learning, which involves training and identifying samples using multiple trees. The random forest method generates independent, identically distributed training sample sets for each decision tree using the bagging approach, and the final classification result relies on the aggregated votes from all decision trees. The primary theory is to repeatedly and randomly extract, with replacement, a vast number of original samples, enabling access to M training sample sets. Subsequently, n classification features (n ≤ N) are randomly selected from the total N features within each sample set to facilitate full node splitting of decision trees, thus generating M decision trees. Consequently, the category of new samples is determined through a majority voting process among the outcomes obtained from all M decision trees [28].
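The two ingredients described above, sampling with replacement and majority voting, can be sketched as follows. The helper names and the example labels are hypothetical, and the decision trees themselves are omitted; only the resampling and the vote aggregation are shown.

```python
import random
from collections import Counter

def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement (one bagging training set)."""
    return [rng.choice(samples) for _ in samples]

def majority_vote(predictions):
    """Return the most frequent class; ties go to the class seen first."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)                      # fixed seed for reproducibility
original = list(range(10))                  # hypothetical sample indices
training_sets = [bootstrap_sample(original, rng) for _ in range(3)]  # M = 3 sets

# Hypothetical votes from M = 5 trees for one new sample.
tree_votes = ["LB_w", "LB_nw", "LB_w", "HP_w", "LB_w"]
print(majority_vote(tree_votes))
```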

First, assuming a sample set X with n categories, the expected sample entropy can be denoted as Equation (11):

I(x_{1}, x_{2}, \dots, x_{n}) = - \sum_{i = 1}^{n} \frac{x_{i}}{x} \log_{2} \frac{x_{i}}{x}

(11)

where x_i (i = 1, 2, …, n) represents the number of samples in category i and x is the total number of samples involved.

Considering a single feature (C) of the sample set, the expected entropy for the feature can be represented by Equation (12):

\left\{ \begin{array}{l} E(C) = \sum_{j = 1}^{m} \frac{x_{1 j} + x_{2 j} + \dots + x_{n j}}{x} I(x_{1 j}, x_{2 j}, \dots, x_{n j}) \\ I(x_{1 j}, x_{2 j}, \dots, x_{n j}) = - \sum_{i = 1}^{n} \frac{x_{i j}}{x_{j}} \log_{2} \frac{x_{i j}}{x_{j}} \end{array} \right.

(12)

where m denotes the number of subsets into which feature C partitions the sample set, x_ij is the number of samples of category i in the j-th subset, and x_j = x_1j + x_2j + … + x_nj is the total number of samples in the j-th subset.

The entropy gain for feature C could be calculated using the following Formula (13) [29]:

\mathrm{Gain}(C) = I(x_{1}, x_{2}, \dots, x_{n}) - E(C)

(13)
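Equations (11)–(13) can be checked on a small example. The sketch below assumes the usual reading in which a feature splits the samples into subsets by its value and the expected entropy is the size-weighted entropy of those subsets; the function names, labels, and feature values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Equation (11): -sum_i (x_i / x) * log2(x_i / x) over the categories."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(feature_values, labels):
    """Equation (13): Gain(C) = I(x_1, ..., x_n) - E(C)."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    # Equation (12): entropy of each subset, weighted by its share of samples.
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - expected

# Hypothetical samples: a binarized PbO level against the glass type.
glass_types = ["HP", "HP", "LB", "LB"]
pbo_level = ["low", "low", "high", "high"]   # separates the types perfectly
color = ["blue", "green", "blue", "green"]   # carries no information here
print(information_gain(pbo_level, glass_types), information_gain(color, glass_types))
```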

The margin function for a sample D and a category E could be represented using the following Formula (14):

na(D, E) = av_{m}\left(I(t_{m}(D) = E)\right) - \max_{j \neq E} av_{m}\left(I(t_{m}(D) = j)\right)

(14)

where I(·) is the indicator function, t_m denotes the m-th decision tree, and E and j refer, respectively, to the correct category and any other category determined by the random forest. av_m(·) denotes the mean over all decision trees, and the value of na(D, E) is proportional to the effectiveness of the classification: the larger the margin, the more confident the identification.
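A short sketch may make Formula (14) concrete. It assumes the usual random forest margin: the share of trees voting for the correct category minus the largest share received by any other category. The function name and the votes are hypothetical.

```python
from collections import Counter

def margin(tree_votes, true_class):
    """Share of votes for true_class minus the best share of any other class."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    true_share = counts[true_class] / n_trees
    other_share = max(
        (count / n_trees for cls, count in counts.items() if cls != true_class),
        default=0.0,  # no competing class received any vote
    )
    return true_share - other_share

# Hypothetical votes from 10 trees for one lead-barium sample.
votes = ["LB_w"] * 7 + ["LB_nw"] * 3
print(margin(votes, "LB_w"))
```

A positive margin means the forest classifies the sample correctly, and a value near 1 means nearly all trees agree.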

Before deploying the random forest identification model, its parameters, which can significantly affect the model’s performance, need to be calibrated. In this study, the optimal minimum number of leaves and number of decision trees were determined by the mean square error method proposed in [30,31]. The optimization results showed that the mean square error values were low when the minimum number of leaves was lower than 5, and the associated errors were steady when the number of decision trees exceeded 200 (Figure 2). Consequently, a total of 300 trees and 5 leaves were adopted for model identification.

The workflow of this study is as follows: Firstly, we collected a dataset concerning glass artifacts, which included properties such as color, type, emblazonry, weathering conditions, and chemical composition of the glass. Secondly, we employed the chi-square test method to explore the correlation between the surface weathering degree of glass and their types, emblazonries, and colors. Subsequently, the BP neural network algorithm was utilized to predict the chemical composition of weathered glass before weathering. Then, a binary logistic regression model was used to determine the chemical composition, which is important for identifying glass type. Finally, random forest and neural network algorithms were employed to identify the type and weathering conditions of unknown glass artifacts. The specific process is presented in Figure 3.

3. Results

3.1. Correlation Analysis

To reveal the main contributor to glass surface weathering, the correlations between glass surface weathering and features such as color, emblazonry, and type were analyzed based on a chi-squared test (Table 2). The results indicated that there was no significant correlation between weathering and the color or emblazonry of the glass, with p > 0.05. On the contrary, a significant correlation was observed between surface weathering and the glass type (p < 0.05). The glass types, high-potassium and lead–barium, were mainly determined by their chemical components, indicating that different chemical elements exhibited different weathering resistance capabilities. A higher coefficient C implied a stronger correlation between that attribute and surface weathering, with glass type exhibiting the strongest correlation. A similar result was also reported by [32].

3.2. Predicting Chemical Composition

The random forest model was used to analyze the chemical compounds within the high-potassium and lead–barium glasses and identify the sensitive element changes that occurred due to weathering. As shown in Figure 4, for high-potassium glass, the top six chemical components most altered by weathering were K₂O, Fe₂O₃, SiO₂, Al₂O₃, CaO, and P₂O₅, with importance indices of 0.55, 0.41, 0.37, 0.27, 0.20, and 0.17, respectively. Other components such as MgO, CuO, PbO, and BaO were almost unchanged, with importance indices lower than 0.03, indicating that the contents of these chemical substances were hardly influenced by weathering. For lead–barium glass, the top five chemical components most altered by weathering were SiO₂, PbO, CaO, Al₂O₃, and BaO. Among them, the importance indices of SiO₂, PbO, and CaO were many times higher than those of the other chemical components, with values of 0.41, 0.34, and 0.33, respectively. In addition, the importance indices of the other components such as K₂O, PbO, MgO, Fe₂O₃, and SrO were negligible, indicating that these components were very stable and were not changed by weathering. Therefore, the components that are sensitive to weathering could be effectively utilized to determine the weathering status of glass artifacts.

The pre-weathering chemical compositions of the high-potassium and lead–barium glass artifacts were predicted by the BP model and are presented in Figure 5. From Figure 5, the average predicted pre-weathering chemical composition of the weathered high-potassium and lead–barium glass samples was obtained. A comparison was conducted between the average chemical composition after weathering and the original composition, yielding the rate of change for each chemical component after weathering. For the high-potassium glass, the contents of K₂O, Fe₂O₃, Al₂O₃, CaO, and P₂O₅ decreased by 95.3%, 76.3%, 72.2%, 81.1%, and 55.9% after weathering, respectively. In contrast, only the content of SiO₂ increased after weathering, by 32.1%. In addition, the contents of the other chemical components were hardly changed. The results indicated that weathering was highly corrosive to high-potassium glass, and the main chemical components were largely converted into SiO₂. In contrast, after weathering, the lead–barium glass samples experienced a decrease in SiO₂ (30.4%) and CaO (18.8%) and an increase in PbO (42.2%), Al₂O₃ (10.2%), and BaO (21.2%). Consequently, the variability in the chemical components showed that the lead–barium glass experienced relatively limited erosion and minor alterations in SiO₂ content following weathering compared with high-potassium glass. The results were consistent with those shown in Figure 4, indicating that the chemical components greatly affected by weathering exhibited significant changes before and after weathering for both glass types.
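The rate of change quoted above is taken here to be the relative change of the post-weathering content with respect to the predicted pre-weathering content, which is an assumption about how the percentages were computed. A minimal sketch, with a hypothetical function name and hypothetical contents in wt%:

```python
def relative_change(before, after):
    """Percentage change of a component relative to its pre-weathering content."""
    return (after - before) / before * 100.0

# Hypothetical contents (wt%), not values from Figure 5.
print(relative_change(10.0, 2.5))   # a decrease is reported as a negative percentage
print(relative_change(60.0, 75.0))  # an increase is reported as a positive percentage
```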

3.3. Identifying Glass Type and Weathering

The previous section investigated the chemical components that are sensitive to weathering in the two types of glass and estimated the content of each component prior to weathering. However, for an unidentified glass sample, the type of the glass is unknown and needs to be classified in advance. The logistic regression model was used to extract the most influential chemical components affecting the classification of high-potassium and lead–barium glass artifacts (Figure 6). The results showed that the identified types were completely consistent with the actual types for all ten testing samples. Therefore, the impact weights of each chemical component on the glass type could be determined using the coefficients in the regression model [33].

The regression coefficients of each chemical component used in the classification of high-potassium and lead–barium glass artifacts are shown in Table 3. A positive regression coefficient indicated that the higher the content of the chemical component, the higher the likelihood of a glass artifact being classified as the high-potassium type. Conversely, a negative regression coefficient suggested that the higher the content of the chemical component, the greater the probability of the artifact being categorized as the lead–barium type. Comparing the absolute values of the regression coefficients of the various chemical components, it can be seen that PbO, K₂O, BaO, SiO₂, Al₂O₃, and P₂O₅ have significantly higher values, offering critical contributions to distinguishing the glass types. Furthermore, these six chemical components are crucial variables for identifying glass weathering status, as mentioned above in Section 3.2. Therefore, the main classes of high-potassium and lead–barium and the sub classes of weathered and non-weathered could be judged accordingly using these six chemical components.

Both the BP neural network and random forest models were employed to identify the main and sub classes of the glass based on chemical components (Figure 7). The 52 glass samples were divided into training and testing datasets at a ratio of 7:3. The input variables were the six chemical components, and the outputs were the glass classes of weathered high-potassium (HP_w), non-weathered high-potassium (HP_nw), weathered lead–barium (LB_w), and non-weathered lead–barium (LB_nw). The neural network model showed an overall identification accuracy of 81.25%, with three incorrectly identified samples (the first, fifth, and twelfth) out of the 16 testing samples, all in the weathering status of the lead–barium glass. In contrast, the random forest model achieved a 100% identification accuracy for all of the tested samples.

To mitigate the influence of random sampling on the accuracy evaluation of the models, ten rounds of random sampling were conducted from the 52 samples. In each round, 36 samples were allocated to the training set and 16 to the testing set, followed by model training and testing. The results from the ten rounds were then aggregated for accuracy analysis. Confusion matrix plots of the identification results are presented in Figure 8.
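The repeated random partitioning can be sketched as below. Only the index bookkeeping is shown; the seed and the function name are hypothetical, and model training on each split is omitted.

```python
import random

def repeated_splits(n_samples=52, n_train=36, rounds=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per round."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    splits = []
    for _ in range(rounds):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((indices[:n_train], indices[n_train:]))
    return splits

splits = repeated_splits()
total_test_predictions = sum(len(test) for _, test in splits)
print(total_test_predictions)  # 10 rounds x 16 test samples = 160
```

Aggregating the 16 test predictions from each of the ten rounds gives the 160 identifications that the confusion matrices summarize.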

In Figure 8, the blue boxes denote the correct identification rates for each sub class, while the orange boxes represent the identification error rates. The sum of the values in each row indicates the total number of samples in that category. The results showed that both the neural network and random forest models could effectively and accurately identify the main classes (high-potassium and lead–barium). The main identification errors for both models were found in the sub classes (weathered and non-weathered) of lead–barium. Specifically, the neural network model misclassified a total of 21 out of 74 LB_w samples and 11 out of 31 LB_nw samples. Consequently, the correct rates for identifying LB_w and LB_nw were determined to be 71.6% and 64.5%, respectively. The neural network model achieved an overall identification accuracy of 80% for the entire sample set. For the random forest model, among the seventy-four LB_w samples, only three were identified incorrectly, resulting in an identification accuracy of 95.9%; among the thirty-one LB_nw samples, three were identified incorrectly, resulting in an identification accuracy of 90.3%. The random forest model had an overall identification accuracy of 96.3% for the total sample set. Comparing the accuracy of the two machine learning models, the random forest model presented a superior capability and could be used to classify and identify the main and sub classes of unknown glass artifacts.
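The reported rates can be reproduced from the misclassification counts given above. The sketch assumes that the aggregated test set contains 10 × 16 = 160 identifications and that all errors fall in the two lead–barium sub classes, which is consistent with the reported overall accuracies; the variable and function names are hypothetical.

```python
def accuracy(correct, total):
    """Identification accuracy in percent."""
    return 100.0 * correct / total

total_tested = 10 * 16                      # ten rounds of 16 test samples
class_totals = {"LB_w": 74, "LB_nw": 31}    # counts stated in the text
nn_errors = {"LB_w": 21, "LB_nw": 11}       # neural network misclassifications
rf_errors = {"LB_w": 3, "LB_nw": 3}         # random forest misclassifications

for name, errors in (("BP neural network", nn_errors), ("random forest", rf_errors)):
    per_class = {
        cls: accuracy(class_totals[cls] - errors[cls], class_totals[cls])
        for cls in class_totals
    }
    # Assumes no errors outside the lead-barium sub classes.
    overall = accuracy(total_tested - sum(errors.values()), total_tested)
    print(name, {cls: f"{val:.1f}%" for cls, val in per_class.items()}, f"{overall:.2f}%")

nn_overall = accuracy(total_tested - sum(nn_errors.values()), total_tested)
rf_overall = accuracy(total_tested - sum(rf_errors.values()), total_tested)
```

Under these assumptions the overall accuracies come out at 80% for the neural network and 96.25% for the random forest, matching the reported 80% and 96.3%.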

To further evaluate the robustness of the prediction model, eight additional samples not included in the training dataset were tested, and the validation results are presented in Table 4. Among the eight target samples, A1, A6, and A7 were high-potassium glass, with A1 being non-weathered and A6 and A7 being weathered. A2, A3, A4, A5, and A8 were lead–barium glass, with A2, A3, and A5 being weathered and A4 and A8 being non-weathered. Apart from sample A3, the main and sub classes were all accurately identified. The results confirm that the random forest identification model performed well in determining the type and weathering status of glass artifacts and highlight that machine learning algorithms can be effectively used to identify glass artifacts based on their chemical composition.

4. Discussion and Conclusions

In this study, we investigated the correlation between glass surface weathering status and the features of color, emblazonry, and type. The results suggested that there is a correlation between the weathering status and the glass type, which may be attributed to the different capabilities of different types of glass to resist weathering due to the variations in their chemical compositions. Wang et al. [34] investigated the influence index for silicate glass weathering and demonstrated that the degree of weathering was determined by the chemical components of the glass. This study supports that finding, as a significant correlation was only observed between glass type and surface weathering, suggesting that the weathering of glass artifacts was chemical weathering. Therefore, suitable protective measures should be employed in the protection and restoration processes to minimize potential environmental threats to ancient glass artifacts [35,36].

When evaluating the identification performances of the BP neural network and random forest models on the test set, consistent inaccuracies were only observed in the classification of LB_w and LB_nw, and both models presented a superior ability to identify HP_w and HP_nw. The reason for this could be attributed to the sensitive chemical components in lead–barium glass (Figure 4b). Compared with the top five weathering-sensitive chemical components in high-potassium glass, only three components in lead–barium glass were significantly changed by weathering (importance index > 0.2), which could bring uncertainty to the weathering identification. Furthermore, this study revealed that the BP neural network model displayed a lower predictive accuracy than the random forest model, possibly due to the presence of data noise. Data noise refers to inaccuracies introduced by sample selection and the measuring area [37]. Random forests use the randomness of the sample and feature sets to efficiently train highly generalized models, which are relatively robust against data noise [38]. BP neural networks, however, tend to be more vulnerable to outliers and other forms of data noise, which may disturb classification and identification outcomes [39].

Using the random forest and BP neural network algorithms, our study successfully identified the main classes and subclasses of eight glass artifacts (Table 4). Notably, the resulting predictive models demonstrated a high degree of accuracy, contrasting with the methodology proposed by Wang et al. [40], who employed K-means clustering with the elbow rule to distinguish glass subcategories. Compared with previous studies, the classification in this study clarifies the features of each category and highlights the efficacy of the hybrid methods in gauging and characterizing the type and weathering status of glass relics.

Despite the positive outcomes of this study, certain limitations should be acknowledged. Firstly, the dataset utilized may not encompass all possible types of glass and weathering conditions, potentially restricting the model’s generalizability. Secondly, biases and noise in the data collection process could affect the accuracy of the model. Additionally, this study primarily focused on the impact of chemical composition on glass type and weathering status, with insufficient consideration given to external environmental factors, such as soil type and climatic conditions. To address these limitations, future research could pursue the following directions:

1. Collect data on glass artifacts from various regions, historical periods, and cultural backgrounds to enhance the model’s generalizability and applicability.
2. Investigate the effects of external factors, such as soil type, climatic conditions, and burial environments, on glass weathering and how these factors interact with the chemical composition of glass.
3. Explore new machine learning algorithms to improve the model’s robustness against data noise and outliers, thereby further enhancing classification and identification accuracy.
4. Conduct a more comprehensive study on the classification and weathering mechanisms of glass artifacts by integrating knowledge from fields such as materials science, chemistry, and archaeology.

Through these future research directions, we hope to gain a deeper understanding of the chemical composition, type classification, and weathering processes of ancient glass artifacts, providing more scientific and systematic methods for the conservation and restoration of cultural heritage.

Author Contributions

Conceptualization and writing—original draft preparation, Z.C.; investigation, Y.X.; writing—review and editing, M.T.; supervision and funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (52379049, 52209071), the China Postdoctoral Science Foundation (2023T160552, 2020M671623), the “Chunhui Plan” Cooperative Scientific Research Project of the Ministry of Education of China (HZKY20220115), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors would like to acknowledge the China Society for Industrial and Applied Mathematics for providing the experimental data.

Conflicts of Interest

The authors declare no competing interests.

References

1. Zhou, L. Major factors affecting the weathering of the silicate glass. J. Dalian Polytech. Univ. 1984, 34–44.
2. Tao, Y.; Wang, S.; Wang, C. Effect of zirconium ion injection on the weathering of phosphate glass. Glass Enamel Ophthalmic Opt. 1993, 1–6.
3. Xu, Z.; Zhao, H.; Gan, F. Nondestructive Analysis of Chemical Composition, Structure and Mineral Phase of Natural Glasses. J. Chin. Ceram. Soc. 2012, 40, 443–449.
4. Fu, Q.; Kuang, G.; LÜ, L.; Mo, H.; Li, Q.; Gan, F. Nondestructive Analysis of the Glass Artifacts of the Han Dynasties Excavated from Guangzhou. J. Chin. Ceram. Soc. 2013, 41, 994–1003.
5. Wang, L. High-Efficiency Screening of Zeolite Materials for Adsorption Separation CO₂ by Machine Learning; Dalian University of Technology: Dalian, China, 2022.
6. Tian, J.; Zhao, Y.; Huang, Y.; Li, Y.; Zhang, C.; Peng, S.; Han, G.; Liu, Y. Theoretical Prediction of Vickers Hardness for Oxide Glasses: Machine Learning Model, Interpretability Analysis, and Experimental Validation. Materialia 2024, 33, 102006.
7. Liu, F.; Shi, J.; Wang, W.; Zhao, R. Review of machine learning algorithm applied in materials science. New Chem. Mater. 2022, 50, 42–46, 52.
8. Zhou, G.; Zhang, Z.; Feng, R.; Zhao, W.; Peng, S.; Li, J.; Fan, F.; Fang, Q. Chemical Composition Optimization of Biocompatible Non-Equiatomic High-Entropy Alloys Using Machine Learning and First-Principles Calculations. Symmetry 2023, 15, 2029.
9. Sun, Y.; Bai, H.; Li, M.; Wang, W. Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability. J. Phys. Chem. Lett. 2017, 8, 3434–3439.
10. Li, J. Glass Defect Detection Based on Deep Learning; Fujian University of Technology: Fuzhou, China, 2021.
11. Zhang, D.; Jin, Y.; Hu, B.; Zhao, Y. Glass Defect Recognition Method Based on Integrated Learning. Comput. Meas. Control. 2019, 30, 168–180.
12. Babbar, A.; Ragunathan, S.; Mitra, D.; Dutta, A.; Patra, T.K. Explainability and extrapolation of machine learning models for predicting the glass transition temperature of polymers. J. Polym. Sci. 2023, 62, 1175–1186.
13. Zhang, K.; Zhang, K.; Bao, R. Machine learning models to predict the residual tensile strength of glass fiber reinforced polymer bars in strong alkaline environments: A comparative study. J. Build. Eng. 2023, 73, 106817.
14. Zhang, Y.; Chen, Y. Subclass classification and chemical composition analysis and identification of ancient glass products. Acad. J. Mater. Chem. 2023, 4, 54–61.
15. Gentiana, S.; Andreas, H.; Edda, R. AES and EDX surface analysis of weathered float glass exposed in different environmental conditions. J. Non-Cryst. Solids 2021, 572, 121083.
16. Zou, Y. Molecular-Composition Analysis of Glass Chemical Composition Based on Time-Series and Clustering Methods. Molecules 2023, 28, 853.
17. Li, Z.; Lu, P.; Wang, G.; Li, J.; Yang, Z.; Ma, Y.; Wang, H. Analysis of the Composition of Ancient Glass and Its Identification Based on the Daen-LR, ARIMA-LSTM and MLR Combined Process. Appl. Sci. 2023, 13, 6639.
18. Guo, Y.; Zhan, W.; Li, W. Application of Support Vector Machine Algorithm Incorporating Slime Mould Algorithm Strategy in Ancient Glass Classification. Appl. Sci. 2023, 13, 3718.
19. Rybacki, P.; Niemann, J.; Derouiche, S.; Chetehouna, S.; Boulaares, I.; Seghir, N.M.; Diatta, J.; Osuch, A. Convolutional Neural Network (CNN) Model for the Classification of Varieties of Date Palm Fruits (Phoenix dactylifera L.). Sensors 2024, 24, 558.
20. Fang, R.; Zhang, S.; Deng, L.; Fan, W.; Wang, H. Research on a classification model of loess seismic landslides based on random forest in the Haiyuan region. Bull. Eng. Geol. Environ. 2023, 82, 72.
21. Cao, T.; Chen, M. A Test of Independence for Two Dimensional Contingency Tables Based on Distance Covariance. J. Syst. Sci. Math. 2020, 40, 1687–1700.
22. Rao, Y.; Zhang, X. Based on logistic regression model to determine the weight fuzz comprehensive evaluation method in the application of the slope stability analysis. Nonferrous Met. Sci. Eng. 2015, 6, 111–115.
23. Viatkin, D.; Zakharov, M.; Zhuro, D. Prediction of reduced glass transition temperature of metallic alloys based on a neural network. J. Phys. Conf. Ser. 2022, 2373, 082016.
24. Chai, B.X.; Eisenbart, B.; Nikzad, M.; Fox, B.; Blythe, A.; Bwar, K.H.; Wang, J.; Du, Y.; Shevtsov, S. Application of KNN and ANN Metamodeling for RTM Filling Process Prediction. Materials 2023, 16, 6115.
25. Chen, C.T.; Gu, G.X. Generative Deep Neural Networks for Inverse Materials Design Using Backpropagation and Active Learning. Adv. Sci. 2020, 7, 1902607.
26. Noor, K.; Jan, S. Vehicle Price Prediction System using Machine Learning Techniques. Int. J. Comput. Appl. 2017, 167, 27–31.
27. Sun, G.; Zeng, G.; Hu, C.; Jiang, T. Starch-based aerogel prepared by freeze-drying: Establishing a BP neural network prediction model. Iran. Polym. J. 2022, 32, 37–44.
28. Yang, Z.; Lv, H.; Xu, Z.; Wang, X. Source discrimination of mine water based on the random forest method. Sci. Rep. 2022, 12, 19568.
29. Prasetiyowati, M.I.; Maulidevi, N.U.; Surendro, K. The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy. PeerJ Comput. Sci. 2022, 8, e1041.
30. Khaire, U.M.; Dhanalakshmi, R. Effects of Random Forest Parameters in the Selection of Biomarkers. Comput. J. 2020, 64, 1840–1847.
31. Cao, T. Study on the Importance of Variables Based on Random Forest. Stat. Decis. 2022, 38, 60–63.
32. Kun, J. Research on the composition analysis and identification model of ancient glass products. Acad. J. Mater. Chem. 2022, 3, 48–54.
33. Moawed, S.A.; Osman, M.M. The Robustness of Binary Logistic Regression and Linear Discriminant Analysis for the Classification and Differentiation between Dairy Cows and Buffaloes. Int. J. Stat. Appl. 2017, 7, 7.
34. Wang, C.; Tao, Y. The Weathering of Silicate Glasses. J. Chin. Ceram. Soc. 2003, 31, 78–85.
35. Zhang, K.; Wang, J.; Yu, W.; Zhao, J.; Yue, X.; Luo, H. Corrosion mechanisms for lead-barium glass from the Warring States period. Herit. Sci. 2023, 11, 79.
36. Zhang, C.; Li, R.; Chen, W.; Wang, X. On the research of cultural relic restoration under reverse design. E3S Web Conf. 2020, 189, 03006.
37. Ali, Y.; Irfan, M.; Hussain, E. The impact of data noise on permanent deformation behaviour of asphalt concrete mixtures. Int. J. Pavement Eng. 2020, 21, 1470–1481.
38. Tu, H.; Xia, K.; Zhao, E.; Mu, L.; Sun, J. Optimum trim prediction for container ships based on machine learning. Ocean. Eng. 2023, 277, 111322.
39. Zhou, X.; Ma, Y.; Luo, Y.; Tian, T.; Liu, W.; Li, X.; He, N.; Yan, Z.; Ni, H. Study on Chromatographic Condition Assessment of Transformer Oil Based on Random Forest Model. DEStech Trans. Environ. Energy Earth Sci. 2020, 35482.
40. Wang, Z.; Zhao, X.; Li, Z.; Guo, M.; Xiao, W.; Liu, Z. Prediction of Original Ingredients of Portland Glass and Research into Subclassification Methods Based on Machine Learning. J. Chin. Ceram. Soc. 2023, 51, 416–426.

Figure 1. Topological diagram of BP neural network.

Figure 2. Comparison of mean square error under different numbers of leaves and trees in random forest.

Figure 3. Schematic overview of the study workflow.

Figure 4. Weathering-induced changes in chemical composition of two glass types.

Figure 5. The predicted chemical component proportion in high-potassium and lead–barium glass before weathering.

Figure 6. Results of the logistic regression model test.

Figure 7. Identification accuracy comparison between the neural network and random forest models. (a) Neural network; (b) random forest.

Figure 8. Confusion matrices of the performances of the neural network and random forest identification models on the test set. (a) Neural network; (b) random forest.

Table 1. Properties and weathering conditions of glass products.

Property	Level	Non-Weathered	Weathered	Total
Type	Lead–Barium	12	24	36
Type	High-Potassium	10	6	16
Emblazonry	A	11	9	20
	B	0	6	6
	C	11	15	26
Color	Light green	2	1	3
	Pale blue	6	12	18
	Dark green	3	4	7
	Deep Blue	2	0	2
	Purple	2	2	4
	Green	1	0	1
	Blue-green	6	9	15
	Black	0	2	2

Table 2. Results of chi-square test.

Properties	Asymptotic Significance (p)	χ²	Contingency Coefficient (C)
Emblazonry	0.057	5.72	0.291
Type	0.009	3.86	0.326
Color	0.428	7.01	0.302

Table 3. Regression coefficients of each chemical component.

Chemical Components	SiO₂	K₂O	CaO	MgO	Al₂O₃	Fe₂O₃	CuO	PbO	BaO	P₂O₅	SrO
Regression Coefficients	7.68	10.94	3.78	1.31	−6.21	−4.16	−5.46	−12.3	−8.03	−5.83	4.21

Table 4. Chemical composition and identification results of unknown glass artifacts.

Number	Chemical Component						Glass Category	Weathering Condition
Number	PbO	K₂O	BaO	SiO₂	Al₂O₃	P₂O₅	Glass Category	Weathering Condition
A1	0.00	0.00	0.00	78.45	7.23	1.06	High-potassium	Non-weathered
A2	34.30	0.00	0.00	37.75	2.33	14.27	Lead–barium	Weathered
A3	39.58	1.36	4.69	31.95	2.93	2.68	Lead–barium	Weathered
A4	24.28	0.79	8.31	35.47	7.07	8.45	Lead–barium	Non-weathered
A5	12.23	0.37	2.16	64.29	12.75	0.19	Lead–barium	Weathered
A6	0.00	1.35	0.00	93.17	1.52	0.21	High-potassium	Weathered
A7	0.00	0.98	0.00	90.83	5.06	0.13	High-potassium	Weathered
A8	21.24	0.23	11.34	51.12	2.12	1.46	Lead–barium	Non-weathered

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

