1. Introduction
In recent years, global emergencies have become more frequent, posing a substantial risk to human survival and development. Data from the Emergency Events Database (EM-DAT) indicate that more than 22,000 major emergencies occurred worldwide between 1990 and 2020, causing direct economic losses in the trillions of dollars. Events such as the 2015 Nepal earthquake, Hurricane Harvey in the United States in 2017, the 2018 Indonesia earthquake and tsunami, the 2018 Yarlung Tsangpo landslide dam on China’s Jinsha River, and the 2019 outbreak of the global COVID-19 epidemic seriously threatened national economies and the safety of lives and property, generating large and varied demands for emergency supplies [1]. Because emergencies are uncertain and disasters can be severe, it is often difficult to respond quickly by relying solely on the limited variety of emergency materials stockpiled by governments, so governments must mobilize the material resources of businesses and society to provide collaborative relief [2]. In this context, the government–enterprise joint reserve model has emerged as a contemporary research hotspot.
The emergency supplies joint reserve mode (ESJRM) [3] coordinates and dispatches supplies across sectors, regions, and disaster types. Although the problem of resource scarcity has been addressed through contractual coordination [4] and cost-sharing [5], governments and enterprises classify emergency supplies under different standards that reflect their different purposes and demands. These standards differ in the number and depth of their categories, so emergency supplies often cannot be found or deployed when needed. Governments generally classify emergency supplies by purpose: examples include FEMA’s Authorized Equipment List (AEL) and the InterAgency Board’s (IAB) Standardized Equipment List (SEL) in the United States [6], Japan’s Emergency Supplies Reserve and Rotation System (ESSRS) [7], and Australia’s Overseas Disaster Rescue Plan (ODRP), which sets out detailed resources for relief supplies [8]; the Chinese government adopts the national standard GB/T 38565-2020, Classification and Coding of Emergency Supplies [9] (hereinafter referred to as GB/T 38565). Enterprises generally classify their emergency supplies according to commerce and trade demands, using common product standards such as the Global Product Classification (hereinafter referred to as GPC) [10] and the United Nations Standard Products and Services Code (hereinafter referred to as UNSPSC) [11]. However, because the government and enterprises pursue differing objectives and purposes, it is difficult for either side to adopt, or to build, a new unified supply classification standard.
With the advancement of artificial intelligence and related technologies, constructing a mapping between the emergency supplies classification standard and a general supplies classification standard can become more efficient, precise, and convenient, enabling government and enterprises to share information on jointly reserved supplies and thereby resolving the problems described above. The mapping of supply classification standards falls under taxonomy category mapping, a long-standing research topic in natural language processing (NLP). Traditional machine learning algorithms, such as NB [12], KNN [13], and SVM [14], rely heavily on manually engineered features and generalize poorly. Deep learning methods based on neural networks, such as RNN [15], LSTM [16], CNN [17], and TextCNN [18], are preferred for their powerful feature extraction capabilities and provide good mapping classification results. Pre-trained language models such as BERT [19] can learn finer-grained dynamic word vectors than classic word vector models such as Word2vec [20] and TF-IDF [21].
As a result, several researchers have combined BERT with neural networks, as in BERT-RNN [22] and BERT-CNN [23], to improve performance in domain-specific text categorization and mapping tasks. Category mapping for emergency supplies classification involves issues such as data sparsity and a strong reliance on context, yet little work has applied this combined approach to mapping between emergency supplies classification standards.
Therefore, this study uses BERT’s sophisticated semantic extraction to characterize the full-text features of supply classes and the TextCNN convolutional layer to extract additional local features, expecting the combination to outperform either network working alone. The purpose of this study is to propose a novel combination of the BERT and TextCNN networks for more accurate category mapping between two separate supply classification standards, and to provide technical support for the subsequent development of a collaborative government–enterprise reserve supply information exchange system.
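As a rough illustration of the local-feature step, the following NumPy sketch shows the TextCNN-style convolution and max-over-time pooling that would be applied to token embeddings such as those produced by BERT. All shapes, names, and the random inputs are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def textcnn_features(embeddings, filters):
    """Max-over-time pooled features from a TextCNN-style 1-D convolution.

    embeddings: (seq_len, dim) token vectors, e.g. from a BERT encoder.
    filters:    (n_filters, window, dim) convolution kernels.
    """
    seq_len, dim = embeddings.shape
    n_filters, window, _ = filters.shape
    # Slide each filter over every window of `window` consecutive tokens.
    conv = np.empty((n_filters, seq_len - window + 1))
    for i in range(seq_len - window + 1):
        patch = embeddings[i:i + window]  # (window, dim)
        conv[:, i] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    conv = np.maximum(conv, 0.0)          # ReLU
    return conv.max(axis=1)               # max-over-time pooling

rng = np.random.default_rng(0)
emb = rng.normal(size=(16, 8))    # 16 tokens, 8-dim embeddings (illustrative)
filt = rng.normal(size=(4, 3, 8)) # 4 filters with window size 3
feats = textcnn_features(emb, filt)
print(feats.shape)                # (4,): one pooled value per filter
```

In the full model these pooled features would feed a classification layer; here the point is only that each filter reduces a variable-length sequence to a single salient local feature.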
The rest of this study is organized as follows.
Section 2 presents the literature review.
Section 3 constructs an algorithmic model for emergency supplies classification standard category mapping.
Section 4 details the experimental methodology employed in the model.
Section 5 describes the experimental results and analyzes them in depth. Section 6 concludes with a discussion of the study’s significance, limitations, and future research directions.
5. Results and Analysis
5.1. Results
Table 7 presents the accuracy, precision, recall, and F1 scores for GB/T 38565-to-GPC category mapping across the various models. Notably, the BERT-TextCNN approach introduced in this study achieves the highest accuracy, 98.22%, significantly surpassing the other deep learning models optimized on the training dataset, including BERT-DSSM, BERT-S2Net, BERT-RNN, BERT-CNN, BERT-BiLSTM, and BERT-BiLSTM-CNN. This finding suggests that the BERT- and TextCNN-based methodology proposed herein substantially enhances the accuracy of GB/T 38565 and GPC category mapping. Furthermore, the BERT-TextCNN model achieves the highest F1 score, 97.14%, indicating both the most accurate predictions and the strongest identification of positive samples among all models. Lastly, the accuracy of all seven models evaluated in this study exceeds 90%, which indirectly reflects the high quality of the manually annotated corpus and the strong performance of models trained on it.
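The four reported metrics follow their standard definitions over confusion-matrix counts; a minimal sketch (the counts in the example are arbitrary illustrations, not values from Table 7):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Arbitrary illustrative counts: 180 of 200 samples classified correctly.
acc, p, r, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(acc, p, r, f1)  # 0.9 for all four metrics in this balanced example
```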
The performance comparison between BERT-S2Net and BERT-DSSM, both of which employ a two-tower language model architecture, reveals only a marginal difference of 0.36 percentage points. This minimal variance may be attributed to the incorporation of two deep neural network (DNN) structures (f1, f2) within both models, which possess nearly identical parameters, thereby yielding comparable performance outcomes. In the analysis of BERT-RNN and BERT-CNN models, the results demonstrate a notable consistency, likely due to their reliance on shared textual features derived from the pre-training of the foundational BERT model. This configuration enables the RNN to capture global information from text sequences, while the CNN is oriented toward local information, resulting in aligned mapping results.
When examining BERT-RNN, BERT-BiLSTM, and BERT-BiLSTM-CNN, BERT-BiLSTM achieves an accuracy of 97.86% together with the highest recall, 100%. This notable performance can be attributed to the BiLSTM’s capacity to establish more effective contextual relationships by processing information in both the forward and backward directions, enhancing its understanding of the deep semantics of the text. However, introducing a CNN layer on top of the BiLSTM may complicate the model, potentially leading to overfitting or increased training difficulty, which could explain why BERT-BiLSTM-CNN does not perform as well as BERT-BiLSTM. Furthermore, all evaluation metrics of the BERT-TextCNN model exceed those of the BERT-CNN model, likely because generic CNNs are better suited to image features, whereas TextCNN is designed to capture the textual features extracted by BERT efficiently; all of its evaluation metrics surpass 95%.
5.2. Analysis
Figure 7 illustrates the variations in accuracy, precision, recall, and F1 score for the BERT-TextCNN model presented in this study, as evaluated on both the training and test datasets. The red dashed curve represents the training set, while the green solid curve denotes the testing set. The results indicate that the model achieves an accuracy exceeding 95% on the training set following the training process. After 20 epochs, the training accuracy reached 99.80%, with a corresponding testing accuracy of 98.22%. Throughout the training process, the model’s accuracy exhibited minor fluctuations within a narrow range of values after each training stage. Notably, the performance metrics for the test and training sets are closely aligned, with a maximum difference of no more than 2%. This observation suggests that the model possesses robust generalization capabilities and demonstrates a strong capacity to adapt to new data.
To further elucidate the performance of the models, we conducted a comparative analysis of the training outcomes across various models.
Figure 8 illustrates the accuracy, precision, recall, and F1 scores of the different models for category mapping on the testing dataset. The findings indicate that, as the number of epochs increases, the performance metrics of the seven models generally improve, suggesting that all models learn and adapt to the dataset’s features during training. Notably, BERT-RNN, BERT-BiLSTM, BERT-CNN, BERT-BiLSTM-CNN, and BERT-TextCNN exhibit exceptionally high performance from the outset, with all metrics surpassing 90%. In contrast, while the BERT-DSSM and BERT-S2Net models gradually approach the highest accuracy, their performance remains significantly inferior to that of the other models, indicating a need to improve their ability to capture and learn the dataset features. Furthermore, the performance metrics of all models stabilize at higher epochs, particularly for the BERT-TextCNN model, which demonstrates robust adaptability to and learning of the dataset features. At the same time, all models show signs of overfitting as the number of epochs increases, particularly after 15 epochs, suggesting that the optimal number of epochs depends on both the dataset size and the model complexity.
In Table 3, the terms True Positive (TP) and True Negative (TN) represent the numbers of samples accurately classified by the classifier, so the sum of TP and TN gives the total number of correctly classified samples. Analyzing the confusion matrix presented in Figure 9, it is evident that the BERT-TextCNN model achieves the highest number of correct predictions, totaling 276 (191 TP and 85 TN) out of 281 test samples, establishing it as the model with the best performance and highest accuracy. Although the BERT-BiLSTM model reports zero false negatives, it exhibits a false positive rate of 2.14%, the highest among the evaluated models. In contrast, when both false positive and false negative rates are considered, BERT-TextCNN demonstrates the lowest combined rate, 1.78% (0.71% false positives and 1.07% false negatives). This further substantiates that BERT-TextCNN not only attains the highest accuracy but also maintains a minimal incidence of false positives and false negatives, minimizing the potential for erroneous mappings.
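The headline figures for BERT-TextCNN can be re-derived from the confusion-matrix counts. In this sketch, FP = 2 and FN = 3 are inferred from the reported 0.71% and 1.07% rates on 281 test samples, since those are the integer counts the rates round to:

```python
# BERT-TextCNN counts from Figure 9; FP and FN are inferred from the
# reported rates (0.71% and 1.07% of 281 round to 2 and 3 samples).
tp, tn, fp, fn = 191, 85, 2, 3
total = tp + tn + fp + fn            # 281 test samples

accuracy = (tp + tn) / total
combined_error = (fp + fn) / total   # false positives plus false negatives

print(f"correct={tp + tn}/{total}")           # correct=276/281
print(f"accuracy={accuracy:.2%}")             # accuracy=98.22%
print(f"combined rate={combined_error:.2%}")  # combined rate=1.78%
```

The arithmetic reproduces both the 98.22% accuracy from Table 7 and the 1.78% combined error rate quoted above.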
Figure 10 illustrates the variation in the loss function across various models throughout the training process on the dataset. The findings indicate that the loss values for all models exhibit a consistent decline as the number of epochs increases, suggesting that the models are capable of effectively learning the features of the category description text during training and are continuously enhancing their mapping performance. Notably, the BERT-TextCNN model demonstrates the lowest loss value within the dataset, signifying its superior generalization capability and stability, as well as its enhanced ability to extract both local and global features of the text through model fusion.
6. Conclusions and Discussion
6.1. Conclusions
Recent advancements in the research and implementation of collaborative government–enterprise reserve supplies have largely overlooked the establishment of standardized classification systems for emergency supplies. The absence of consistency in the classification standards utilized by both governmental bodies and enterprises hampers the efficiency of supply responses during coordinated relief operations, potentially resulting in significant adverse outcomes.
This study integrates the strengths of the BERT model, which excels in semantic abstraction, and the TextCNN model, which excels in local feature representation. The proposed approach, termed BERT-TextCNN, is applied to mapping categories between an emergency supplies taxonomy and a general-purpose supplies taxonomy, enabling automatic alignment of the two classifications. Training, learning, and testing experiments are conducted on a high-quality, manually annotated corpus. Compared with six other models, including BERT-RNN and BERT-CNN, the BERT-TextCNN model demonstrates superior prediction accuracy and the highest stability among the models evaluated. The experimental findings indicate that: (1) the hybrid model introduced in this study effectively integrates the strengths of BERT and TextCNN, successfully capturing local correlations while preserving comprehensive textual information with notable accuracy and stability; (2) compared with the second-best BERT-BiLSTM model, the target BERT-TextCNN model improves the evaluation metrics of accuracy, recall, and F1 by 0.36%, 2.3%, and 0.61%, respectively, proficiently accomplishing category mapping between the supplies classification standards; (3) each component of the target model is essential: the fine-tuned ERNIE (an enhanced pre-training model based on BERT) is particularly well suited to the semantic representation of the emergency supplies classification standard data, enhancing text comprehension, while the TextCNN module adeptly extracts significant feature information from the text and accurately identifies keywords, yielding relatively precise category mapping.
6.2. Theoretical Significance
This study contributes to the field of supply chain management for emergency supplies by developing a novel methodology aimed at optimizing both performance and stability. The approach is grounded in established emergency supplies classification standards and an automated mapping knowledge base. Our method comprises three main components:
- (1) Dataset Pre-processing: We performed pre-processing on category description datasets aligned with two supply classification standards: the GB/T 38565 dataset, which includes three classes and 739 categories, and the GPC dataset, which consists of 44 classes and 5250 categories. Through random polynomial sampling, we manually labeled 200 categories from the GB/T 38565 dataset, creating a category mapping pair dataset with 798 pairs.
- (2) Semantic Representation Generation: We developed and fine-tuned a BERT-based word embedding model using the category mapping pair dataset to generate global semantic representations. These word vectors were then integrated into a TextCNN framework, which analyzes the semantic representations and extracts locally significant features. Our approach involves a comprehensive comparative analysis to evaluate the accuracy of each model’s category mapping.
- (3) Model Evaluation and Selection: We retained the highest-performing BERT-TextCNN model for further computations involving the inference dataset. This model enables the derivation of mappings between all categories from the GB/T 38565 dataset and the GPC categories.
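The sampling step in component (1) can be sketched as weighted sampling without replacement. The uniform weights and the function name here are illustrative assumptions, not the study’s actual code or weighting scheme:

```python
import random

def sample_without_replacement(categories, weights, k, seed=42):
    """Pick k distinct categories with probability proportional to weights.

    Illustrative sketch of the multinomial ("polynomial") sampling used to
    select 200 GB/T 38565 categories for manual labeling; the uniform
    weights below are an assumption, not the study's actual scheme.
    """
    rng = random.Random(seed)
    pool = list(zip(categories, weights))
    chosen = []
    for _ in range(k):
        total = sum(w for _, w in pool)
        r = rng.uniform(0, total)
        acc = 0.0
        for i, (cat, w) in enumerate(pool):
            acc += w
            if acc >= r:
                chosen.append(cat)
                pool.pop(i)  # without replacement: each category drawn once
                break
    return chosen

# 739 GB/T 38565 categories, uniform weights, 200 drawn for labeling.
picked = sample_without_replacement(range(739), [1.0] * 739, k=200)
print(len(picked), len(set(picked)))  # 200 distinct categories
```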
6.3. Practical Significance
The practical implications of the proposed methodology are significant for addressing inefficiencies in emergency supply management. The BERT-TextCNN mapping model developed in this study offers a robust solution for both governmental and private entities, enabling the automated assessment of correspondence between two supply classification standards. This model can be seamlessly integrated into software applications to facilitate automated mapping between the GPC and GB/T 38565 standards.
From an application perspective, the model demonstrating optimal performance during experimental trials can be utilized to infer mappings for new datasets, thus aiding in the determination of relationships between novel supply classification standards. Additionally, due to the normative and stable nature of classification standards, the mapping results produced by this methodology are expected to remain valid over extended periods, pending any updates to the classification standards.
Furthermore, the code, model, and manually annotated corpus used in this research are made available for free use by other researchers (Note 1), promoting further exploration and application of the methodology.
6.4. Limitations
The research process is characterized by several significant limitations. Firstly, the labeling of datasets presents considerable challenges, particularly in terms of selecting the appropriate quantity and diversity of datasets. An increase in the number of datasets and the breadth of categories necessitates a greater investment of time, while a limited dataset may undermine the effectiveness of model training. This study seeks to mitigate these challenges by employing existing methodologies to optimize dataset selection based on overall volume.
Secondly, the quality of the dataset has a direct influence on the efficacy of model learning, which is heavily dependent on the expertise of the individuals conducting the labeling. Consequently, the accuracy and reliability of the model are contingent upon the skill level of the personnel involved in this process.
Lastly, the proposed model encounters limitations in terms of interpretability. While it is proficient in making predictions, it lacks the capability for inference. This limitation arises from the need to compare each category in the GB/T 38565 classification standard with all categories in the GPC, which could involve over 3.8 million calculations. Such extensive computational requirements place significant demands on the experimental environment. In light of these constraints, this study primarily assesses the feasibility of the experimental methodology and the performance of the model, concluding that the proposed method is practical and capable of addressing real-world challenges.
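The exhaustive-comparison figure follows directly from the two catalogue sizes quoted in Section 6.2:

```python
gbt_38565_categories = 739  # categories in the GB/T 38565 dataset
gpc_categories = 5250       # categories in the GPC dataset

# Comparing every GB/T 38565 category against every GPC category.
pairwise_comparisons = gbt_38565_categories * gpc_categories
print(pairwise_comparisons)  # 3879750, i.e. over 3.8 million calculations
```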
6.5. Future Work
Future research should focus on several critical areas to further the advancement of the field. First, it is essential to enhance the processes involved in manual dataset labeling. This entails refining methodologies to optimize both the quantity and quality of data, which is vital for improving model performance. Subsequent studies could explore automated or semi-automated labeling techniques to address the limitations inherent in manual processes, thereby reducing the time and resources required.
Second, within the realm of semantic refinement, there is a pressing need to investigate more sophisticated models for semantic comprehension and feature extraction. The integration of advanced text enhancement strategies and cutting-edge feature learning techniques could substantially enhance the accuracy and interpretability of semantic analyses. Assessing these enhancements through metrics such as ROC-AUC curves will ensure the validity of classifications and contribute to the robustness of the models.
Finally, the utilization of existing text similarity corpora, coupled with the integration of deep learning models via transfer learning, represents a promising direction for future inquiry. Adapting the proposed model to other classification tasks, particularly those related to supply standard categories, could mitigate the challenges associated with the labor-intensive nature of manual labeling. This strategy has the potential to improve the efficiency and scalability of category-matching systems, thereby addressing current limitations and broadening the applicability of the model across diverse contexts.