**3. Materials and Methods**

In this section, we first outline the overall research workflow, then describe the WTSS algorithm in detail. Next, we present the structure of our DNN classifier and define the multiclass classification metrics used to evaluate its performance. Finally, we introduce the method used to judge the plausibility of the generated synthetic samples.

#### *3.1. Workflow of the Study*

To solve the data augmentation problem and the supervised learning problem, an integrated modeling approach that incorporates the war trauma severity scoring algorithm (WTSS) and a DNN model was proposed. This approach's workflow is summarized as follows (Figure 1).


#### *3.2. Random Injury Generation*

In the injury generation process, we first randomly sampled the injured body part according to its probability of occurrence. Then, we randomly selected a possible injury type for that part. Finally, we randomly sampled whether the injury was accompanied by complications and, if so, randomly selected the possible complications.
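The hierarchical sampling above (part → injury type → optional complication) can be sketched as follows. The part probabilities, injury catalogues, and complication probability here are illustrative placeholders, not the distributions used in the study:

```python
import random

# Hypothetical distributions for illustration only; the probabilities and
# injury catalogues actually used in the study are not reproduced here.
PART_PROBS = {"head": 0.15, "chest": 0.20, "abdomen": 0.25, "limbs": 0.40}
INJURY_TYPES = {part: ["blast injury", "gunshot wound"] for part in PART_PROBS}
COMPLICATIONS = {
    "head": ["coma"],
    "chest": ["hemopneumothorax"],
    "abdomen": ["hemorrhagic shock"],
    "limbs": ["wound infection"],
}
P_COMPLICATION = 0.3  # assumed probability that a complication occurs

def generate_injury(rng: random.Random) -> dict:
    """Sample one injury: part -> injury type -> optional complication."""
    part = rng.choices(list(PART_PROBS), weights=list(PART_PROBS.values()))[0]
    injury_type = rng.choice(INJURY_TYPES[part])
    complication = None
    if rng.random() < P_COMPLICATION:
        complication = rng.choice(COMPLICATIONS[part])
    return {"part": part, "type": injury_type, "complication": complication}
```

Each call yields one synthetic injury record; repeating the call builds the raw sample pool that WTSS then scores.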

**Figure 1.** Workflow of the WTSS–DNN integrated approach.

#### *3.3. WTSS Algorithm*

After injuries were randomly generated, the focus of the research was on how to conduct standardized and accurate injury assessments. To solve this problem, we conducted multiple rounds of discussions and communication with the expert panel and finally decided to carry out a standardized quantitative assessment of various injuries by proposing a war trauma severity scoring algorithm.

By summarizing the various existing trauma scoring algorithms in depth, drawing on the idea of multiple nonlinear regression, and considering the key factors that affect the severity of an injury (injured part, injury type, complications, and the presence of multiple injuries), we determined, after several rounds of testing and optimization, the WTSS equation as follows:

$$F(P, X, C) = a + \sum\_{i=0}^{6} \left( P\_i X\_i + C\_i \right) \tag{2}$$

where *F* represents the severity score; *P<sub>i</sub>* represents the weight coefficient of injury severity for each of the seven body parts; *X<sub>i</sub>* indicates whether the corresponding body part was injured (if not injured, the corresponding *X<sub>i</sub>* value equals 0; otherwise, it equals the injury severity standard score for the corresponding body part); *C<sub>i</sub>* indicates whether the injury was accompanied by complications (if there were no complications, *C<sub>i</sub>* equals 0; otherwise, it equals the corresponding severity score); and the bias *a* is the correction value for multiple injuries (if there were multiple injuries, *a* equals −20; otherwise, it equals 0).

Next, we calculated *F* according to the predictive factors *P<sub>i</sub>*, *X<sub>i</sub>*, *C<sub>i</sub>*, and *a*, then selected the corresponding score interval according to the magnitude of *F*. Finally, we labeled the synthetic samples with the consequences of the injury. The pseudocode of WTSS is provided in Algorithm 1.

```
Algorithm 1. War trauma severity score (WTSS).
Input:  Weight coefficients of injured parts: P = {P0, P1, ..., P6}.
        Injury type scores: X = {X0, X1, ..., X6}.
        Complication scores: C = {C0, C1, ..., C6}.
        Correction value for multiple injuries: a = −20.
Output: Severity score: F(P, X, C).
1:  n = 0
2:  for i = 0 to 6 do
3:      if Pi ≠ 0 and Xi ≠ 0 then
4:          F(P, X, C) += Pi * Xi
5:          n += 1
6:      end if
7:      if Ci ≠ 0 then
8:          F(P, X, C) += Ci
9:      end if
10: end for
11: if n > 1 then
12:     F(P, X, C) += a
13: end if
14: return F(P, X, C)
```
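A minimal Python sketch of Algorithm 1, assuming *P*, *X*, and *C* are supplied as length-7 numeric sequences as defined above:

```python
def wtss_score(P, X, C, a=-20):
    """War trauma severity score, following Algorithm 1.

    P, X, C are length-7 sequences: weight coefficients, injury-type
    scores, and complication scores for the seven body parts; an entry
    is 0 when the part is uninjured or has no complication.  `a` is the
    correction value applied when more than one part is injured.
    """
    score = 0.0
    n_injured = 0
    for p, x, c in zip(P, X, C):
        if p != 0 and x != 0:
            score += p * x          # weighted injury-type score
            n_injured += 1
        if c != 0:
            score += c              # complication score
    if n_injured > 1:
        score += a                  # multiple-injury correction
    return score
```

A single injured part leaves the correction unapplied; two or more injured parts trigger the −20 adjustment, matching lines 11–13 of the pseudocode.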
The WTSS algorithm is a nonlinear model that ignores the complicated details of an injury and instead exploits the strong correlation between injury consequences and the severity of the injured parts and injury types [45]. The weight coefficients of injuries in different body parts are shown in Table 1, examples of the standard severity scores for injury types and complications are shown in Figures 2 and 3, and the score intervals for the injury consequences are listed in Table 2.

**Table 1.** Weight coefficients of each body part.


When different injury types or complications have the same standard injury severity score within a given injured part, we coded them to distinguish between them. Taking the abdomen as an example, the coding method is shown in Figure 4.

As an independent scoring algorithm for determining the severity of war trauma, WTSS does not attempt an extremely accurate diagnosis of a specific injury. Instead, it performs a standardized assessment and prediction of the most probable consequences of injuries from an objective perspective to ensure the accuracy of the injury consequence assessment. Additionally, WTSS is not only the core of our WTSS–DNN integrated model, contributing to the large-scale analysis and evaluation of war trauma data, but it also helps to quickly evaluate and diagnose soldiers' injuries on the battlefield and determine the treatment strategy. Furthermore, in complex battlefield environments, a soldier's age, physical constitution, and other factors may cause the same trauma to have different consequences. Consequently, WTSS objectively assesses only the injury itself, without considering age and other physiological indicators, to meet the requirements of an ideal scoring method that is "easy to implement, objective, and accurate" [38].


**Figure 2.** Standard severity scores for different injury types. In this figure, I indicates a blast injury and II indicates a gunshot wound.


**Figure 3.** Standard severity score for different complications.

**Table 2.** Description of the score intervals.


**Figure 4.** Example of injury coding in the abdominal area.

#### *3.4. Deep Neural Network*

Because the WTSS algorithm is a complicated nonlinear model, this article used a DNN as the classifier model to test the accuracy of the injury consequences. The DNN classifier consists of an input layer, an output layer, and several hidden layers. It uses multilayer nonlinear information processing and can be widely and flexibly applied to problems such as classification, regression, dimensionality reduction, feature extraction, and clustering. First, we built a suitable DNN classifier network structure according to the actual needs; after experimentation, the network structure was determined to be 22–16–16–16–16–4. Next, to test whether such a classifier has good generalization ability, we trained it with synthetic samples and tested it with real samples. To verify its performance, we used four multiclass classification metrics based on a confusion matrix: accuracy, precision, recall, and the F<sub>1</sub> score [46]. Among these metrics, the F<sub>1</sub> score is the harmonic mean of precision and recall. Finally, we adjusted and optimized the hyperparameters and determined the best learning rate and training sample size. The confusion matrix is shown in Figure 5.
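The forward pass of the 22–16–16–16–16–4 structure can be illustrated with plain NumPy. The study itself trained the network with TensorFlow; the random initialization here is an assumption for demonstration only:

```python
import numpy as np

LAYER_SIZES = [22, 16, 16, 16, 16, 4]  # input, four hidden layers, output

def init_params(rng):
    """He-style random weights and zero biases (illustrative only; the
    study trained the real weights with TensorFlow)."""
    return [
        (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in),
         np.zeros(n_out))
        for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
    ]

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(params, x):
    """ReLU hidden layers, softmax over the four consequence classes."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return softmax(x @ w + b)
```

The output is a probability distribution over the four injury-consequence classes for each 22-feature input sample.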

**Figure 5.** Graph of the multiclassification confusion matrix.

In Figure 5, *L* represents the number of classes; *n<sub>ii</sub>* is the number of class *C<sub>i</sub>* samples correctly predicted as class *C<sub>i</sub>*, and *n<sub>ij</sub>* is the number of class *C<sub>i</sub>* samples incorrectly predicted as class *C<sub>j</sub>*; *R<sub>i</sub>* and *P<sub>i</sub>* denote the recall and precision of class *C<sub>i</sub>*, defined in Equations (3) and (4); accuracy and the F<sub>1</sub> score are defined in Equations (5) and (6).

$$P\_i = \frac{n\_{ii}}{\sum\_{j=1}^{L} n\_{ji}} \tag{3}$$

$$R\_i = \frac{n\_{ii}}{\sum\_{j=1}^{L} n\_{ij}} \tag{4}$$

$$Accuracy = \frac{\sum\_{i=1}^{L} n\_{ii}}{\sum\_{i=1}^{L} \sum\_{j=1}^{L} n\_{ij}} \tag{5}$$

$$F\_1 \text{ score} = 2 \frac{\sum\_{i=1}^{L} R\_i \sum\_{i=1}^{L} P\_i}{\sum\_{i=1}^{L} R\_i + \sum\_{i=1}^{L} P\_i} \tag{6}$$
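All four metrics can be computed directly from the confusion matrix. This sketch follows Equations (3)–(5); for the F<sub>1</sub> score it uses the conventional macro-averaged form (harmonic mean of mean precision and mean recall), whereas Equation (6) as printed sums the per-class values:

```python
import numpy as np

def multiclass_metrics(cm):
    """Per-class precision/recall plus accuracy and macro-F1 from an
    L x L confusion matrix (rows = true class, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    precision = diag / cm.sum(axis=0)   # Eq. (3): column sums
    recall = diag / cm.sum(axis=1)      # Eq. (4): row sums
    accuracy = diag.sum() / cm.sum()    # Eq. (5)
    p_bar, r_bar = precision.mean(), recall.mean()
    f1 = 2 * p_bar * r_bar / (p_bar + r_bar)  # macro-averaged F1
    return precision, recall, accuracy, f1
```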

#### *3.5. Discrimination of Unreasonable Injuries Based on the Delphi Method*

After data generation, to improve the plausibility of the synthetic samples, the expert panel reached a consensus on multiple unreasonable injuries based on domain knowledge and provided feedback. Based on this feedback, we analyzed the patterns of unreasonable injury combinations and filtered out the unreasonable synthetic samples. Finally, we output the credible synthetic samples.

#### **4. Empirical Analysis**

War trauma data are highly confidential and difficult to access, yet they are attracting growing attention from the army, military academies, and related hospitals. To eliminate the obstacles to related research, an efficient and credible data augmentation approach is urgently needed to support large-scale war trauma data research and war-game deduction. Our proposed integrated model provides a new and feasible way to meet the real need for the large-scale, automated generation of credible war trauma data.

#### *4.1. Data Collection*

In this study, we collected and organized two types of real war trauma data at a certain scale: data on gunshot wounds and blast injuries. We selected 338 cases with complete available data (minor injury, 114 cases; moderate injury, 82 cases; serious injury, 74 cases; and critical injury and death, 68 cases) to form the test set. After preprocessing operations such as one-hot encoding, data standardization, and feature reduction, our war trauma data had a total of 22 features.

#### *4.2. Results Analysis*

We implemented our proposed WTSS–DNN integrated model in Python 3.7.7 and conducted experiments on a personal computer with a Windows 64-bit operating system. After a series of tests on the DNN, the optimal values of all the hyperparameters were determined. The classifier's input dimension was 22, equal to the feature dimension of the war trauma samples. The number of hidden layers was set at 4, each using ReLU as the activation function. The softmax function was used as the activation function of the output layer, and categorical cross-entropy was used as the loss function. We used TensorFlow 2.0.0 on a GPU to train our DNN classifier; the number of epochs was set at 1000 and the batch size at 256. We chose Adam as our optimization algorithm, as it performed best compared with SGD and RMSProp [47].

After determining the best network structure of the DNN classifier (22–16–16–16–16–4), we conducted contrast experiments at different learning rates [48]. Specifically, we kept the network structure and the other hyperparameters unchanged and set the learning rate to 0.05, 0.02, 0.01, 0.005, 0.002, 0.001, 0.0005, and 0.0001, respectively. Table 3 shows accuracy, precision, recall, and the F<sub>1</sub> score at different learning rates on the same training set with a sample size of 10,000. The results show that a learning rate of 0.001 led to the best overall model performance and was therefore selected.


**Table 3.** Comparison of the multiclassification metrics at different learning rates.

Next, we explored the best training sample size (*n*). On the one hand, too few training samples cannot fully convey the sample features and do not meet the model accuracy requirements; on the other hand, too many training samples increase computation and time costs and are not conducive to hyperparameter optimization. Therefore, we sought the best training sample size in the range of 1000–20,000 through trial and error [49]. In the search process, to avoid the impact of class imbalance on the experimental results, synthetic samples of the four classes were drawn in equal proportions to form the training set for each experiment. The overall performance results of the multiclass classification metrics at different training sample sizes are shown in Table 4.
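The class-balanced training-set construction described above can be sketched as follows, assuming each synthetic sample carries one of the four consequence labels:

```python
import random
from collections import defaultdict

def balanced_sample(samples, labels, n_total, rng):
    """Draw n_total training samples with each class represented in
    equal proportion, to avoid class-imbalance effects."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    per_class = n_total // len(by_class)
    batch = []
    for label, pool in by_class.items():
        for sample in rng.sample(pool, per_class):
            batch.append((sample, label))
    rng.shuffle(batch)
    return batch
```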


**Table 4.** Comparison of the multiclassification metrics at different sample sizes.

The experimental results showed that a small-scale training set did not meet the model accuracy requirements. As the training sample size increased, the prediction accuracy gradually improved: at a training sample size of 8000, the accuracy reached 80.88%, and at 12,000, it increased to 84.33%. However, model performance deteriorated when the training scale exceeded 12,000, which indicates that blindly increasing the training scale cannot guarantee a consistently higher classification accuracy. Moreover, as the training scale increased, the trend of the F<sub>1</sub> score, as the harmonic mean of precision and recall, was largely consistent with that of accuracy. Therefore, we concluded that a training sample size of 12,000 achieves the best compromise between training cost and classification performance.

Finally, our DNN classifier achieved the best overall performance with 84.33% accuracy, 90.07% precision, 88.44% recall, and an 89.25% F<sub>1</sub> score.

#### *4.3. Evaluation of WTSS Combined with a DNN*

In this section, we first explored the accuracy of injury assessment of different classifier models. Subsequently, to evaluate the respective contributions of the WTSS algorithm and the DNN classifier in the WTSS–DNN data augmentation method, we set up an ablation experiment. Finally, we provided the prediction results of the DNN for real data through the confusion matrix.

First, we compared our DNN model with three classic machine-learning classifiers: random forest (RF) [50], XGBoost [51], and naïve Bayes (NB) [52].

The RF, XGBoost, and NB models and our DNN model were trained on the same training set and then tested on the same real samples. As shown in Figure 6, our DNN classifier outperformed the three classic machine-learning models. The NB model showed the weakest performance because its classification effect is poor when the number of features is large or the correlation between features is high. These results indicate that classic machine-learning models cannot be trained effectively with few samples and verify that a DNN classifier trained with a large amount of data achieves better classification performance.

**Figure 6.** Performance of the different classification strategies.

Next, to evaluate the respective contributions of the WTSS algorithm and the DNN classifier in the WTSS–DNN integrated model, we set up an ablation experiment. Specifically, we combined different injury assessment methods with different classifier models to observe the performance of the various combinations. The injury assessment methods were the WTSS algorithm and the manual assessment method (MA); the classification models were DNN, RF, XGBoost, and NB. The results of the ablation experiment are shown in Table 5.

**Table 5.** Ablation experiment of different injury assessment methods and classifier models.


The results of the ablation experiment show that the WTSS algorithm outperforms the traditional manual evaluation method, that the prediction performance of the DNN classifier exceeds that of the classic machine-learning models, and that the combination of WTSS and the DNN performs best. Therefore, the combination of WTSS and the DNN can effectively solve the data augmentation problem for war trauma data and is superior to manual generation methods.

Finally, we provided the prediction results of the DNN for real data through the confusion matrix.

From Table 6, we can see that the prediction accuracy for minor injuries and moderate injuries is very high, but the prediction accuracy for critical injuries is only about 60%, which is caused by the complexity of critical injuries.


**Table 6.** Confusion matrix of injury consequence identification.

#### *4.4. Data Filtering*

The Delphi method, also known as the "expert investigation method", was developed in 1946 by the RAND Corporation in the United States. It is based on the key assumption that predictions from groups are usually more accurate than predictions from individuals. The goal of the method is to use a structured, iterative approach to obtain consensual opinions from an expert panel [44].

Some of the generated multiple-injury combinations are unreasonable: they are almost impossible to occur in a real war. To improve the plausibility and usability of the synthetic samples in our experiment, we used the Delphi method to identify unreasonable multiple injuries and filter them out. After several rounds of identification and discussion, the expert panel reached a consensus on the unreasonable multiple injuries based on domain knowledge. We analyzed the experts' feedback, filtered out the unreasonable synthetic samples, and output the credible samples. Next, to verify whether data plausibility had improved, we randomly selected 300 original multiple-injury synthetic samples and 300 filtered ones, divided them into three groups, and conducted contrast experiments, counting the number of reasonable samples before and after filtering. The experimental results are shown in Figure 7.
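The filtering step can be sketched as a rule check, assuming each synthetic sample is represented as a set of injury codes and the panel's consensus is encoded as forbidden combinations (the actual consensus rule list is not published here):

```python
def filter_unreasonable(samples, forbidden_combinations):
    """Drop any sample containing a forbidden injury combination.

    samples: list of sets of injury codes (one set per casualty).
    forbidden_combinations: collection of frozensets; a sample is
    removed when it contains every code of any forbidden combination.
    """
    return [
        s for s in samples
        if not any(rule <= s for rule in forbidden_combinations)
    ]
```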

**Figure 7.** The numbers of reasonable samples in the synthetic samples.

The experimental results showed that the data plausibility of the filtered synthetic samples was significantly improved compared with that of the original ones, approaching 100%.

#### **5. Discussion**

For the WTSS–DNN integrated model, the plausibility and effectiveness of the WTSS algorithm play a crucial role in overall performance. Therefore, we evaluated the plausibility and effectiveness of the WTSS algorithm through the two methods described below. First, through expert panel intervention and assistance: the parameter settings and scoring standards of the algorithm were determined after multiple rounds of discussion and evaluation with the expert panel, making them highly reasonable and professional. Second, we tested the plausibility and effectiveness of the algorithm through ablation experiments. On the one hand, we used the DNN classifier to verify the accuracy and plausibility of the algorithm in injury assessment; the experimental results show that the prediction accuracy reached 84.33%, which is a satisfactory result. On the other hand, we compared the WTSS algorithm with the traditional manual assessment method, further verifying the plausibility and superiority of the WTSS algorithm in injury assessment. Therefore, compared with manual generation methods, the proposed WTSS algorithm combined with a DNN is superior for war trauma data augmentation: it ensures high data quality and automatically generates large-scale war trauma data on demand.

However, the experiment also showed that the prediction accuracy of the severity of multiple injuries was lower than that for a single injury due to the complexity of multiple injuries. Furthermore, after determining the WTSS standards, the proposed approach no longer relies on additional professional knowledge due to the characteristics of DL. Thus, for nonprofessionals, the proposed approach has a low barrier to successful application. Although we were able to generate credible virtual trauma data only for blast injuries and gunshot wounds in this study, with the continuous real data collection, the types of war trauma we can generate will become more abundant. Finally, the combination of DL with medical scoring algorithms can be used for other types of injury data augmentation, such as for surgical injuries and emergency injuries.

#### **6. Conclusions**

In this article, the WTSS algorithm combined with a DNN was presented for the augmentation of war trauma data. Compared with the traditional manual data augmentation method, our integrated modeling approach not only improves the quality of injury consequence assessment but can also automatically generate large-scale and credible virtual war trauma data. The generated data make it possible to carry out related data-based military research, which has great practical significance and value. In addition, the approach provides a practical auxiliary tool for quickly evaluating soldiers' injuries and formulating treatment strategies, which are of crucial significance to the analysis and evaluation of war trauma data. Finally, because this study was the first attempt to combine DL and a trauma scoring algorithm for the augmentation of war trauma data, it still had some shortcomings; however, with the continuous improvement of the WTSS algorithm, the performance of our WTSS–DNN integrated model will continue to improve. Continuously improving the comprehensiveness and applicability of our integrated modeling approach is the focus and direction of our future research.

**Author Contributions:** P.Z. conceived the presented idea and verified the analytical methods. J.Y. provided the experimental environment, supervised and validated the findings of this work. Y.Z. provided the research topic, medical theoretical and technical support. Writing, editing, and formatting the manuscript was carried out by P.Z. with support from Y.Z. and J.Y. Funding acquisition was carried out by S.W. and Y.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Fund, sponsor: Jibin Yin, funding number: 61741206.

**Acknowledgments:** We would like to thank Shuoyu Wang and Yi Han for their assistance with this study.

**Conflicts of Interest:** The authors declare no conflict of interest.
