#### 2.4.1. Training

Ninety-three classification samples were randomly split into training and test sets comprising 80% and 20% of the samples, respectively, with the relative numbers of injured and healthy samples kept balanced between the two sets. Because only one CT scan per patient was used in this study, each patient and their scan were assigned exclusively to either the training or the test set.
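The split described above can be sketched with scikit-learn; this is an assumed implementation (the feature vectors, labels, and random seed below are placeholders, not the study's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients = 93
X = rng.normal(size=(n_patients, 10))    # one feature vector per patient/scan
y = rng.integers(0, 2, size=n_patients)  # 1 = injured, 0 = healthy

# stratify=y keeps the injured/healthy ratio balanced across the two sets;
# because each patient contributes exactly one scan, splitting rows also
# assigns each patient exclusively to the training or test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
```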

Five-fold cross-validation was employed during the training phase to select models with low variance and low bias. The training set was divided into five folds of roughly equal size; the classifier was trained on four folds and validated on the remaining fold, rotating through all five folds. Mean validation accuracy, AUC, and their standard deviations were used to select models.
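A minimal sketch of this cross-validation scheme, assuming a scikit-learn workflow (the classifier and synthetic data here are placeholders for the study's features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Placeholder data standing in for the ~74-sample training set.
X, y = make_classification(n_samples=74, n_features=10, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    RandomForestClassifier(random_state=0), X, y,
    cv=cv, scoring=["accuracy", "roc_auc"],
)

# Mean and SD of validation accuracy/AUC guide model selection.
acc_mean, acc_sd = scores["test_accuracy"].mean(), scores["test_accuracy"].std()
auc_mean, auc_sd = scores["test_roc_auc"].mean(), scores["test_roc_auc"].std()
```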

#### 2.4.2. Model Selection

Five models—RF, naive Bayes, SVM, *k*-NN ensemble, and subspace discriminant ensemble—were selected based on validation performance during the training phase as reported in Table 1. Of the five models, RF performed the best on the training set with an AUC of 0.91.

RF, naive Bayes, and SVM are all popular supervised learning models used for analysis of medical images [5]. Naive Bayes is a probabilistic classifier that applies Bayes' theorem under the assumption that features are conditionally independent given the class label. Ensemble learning combines several classifiers to improve prediction performance. RF is an ensemble learner that leverages multiple decision trees to produce a more accurate and stable prediction. Subspace discriminant ensemble [24] applies linear discriminant analysis (LDA) within a specific low-dimensional discriminant subspace. The *k*-NN ensemble employed in this study uses the Random Subspace method with *k*-NN learners.
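The five classifiers have close scikit-learn analogues; the sketch below is an assumed mapping (the study's exact hyperparameters are not specified, so defaults are used, and the random-subspace ensembles are approximated with `BaggingClassifier` drawing feature subsets without bootstrapping):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "RF": RandomForestClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True, random_state=0),
    # Random Subspace ensembles: each base learner sees a random feature subset.
    "k-NN ensemble": BaggingClassifier(
        KNeighborsClassifier(), bootstrap=False, max_features=0.5, random_state=0
    ),
    "subspace discriminant": BaggingClassifier(
        LinearDiscriminantAnalysis(), bootstrap=False, max_features=0.5, random_state=0
    ),
}

# Placeholder data to demonstrate fitting each candidate model.
X, y = make_classification(n_samples=60, random_state=0)
fitted = {name: clf.fit(X, y) for name, clf in models.items()}
```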

Deep learning, and more specifically the application of convolutional neural networks (CNN) to image analysis, has achieved great success in recent years [25]. To assess the validity of the hand-crafted features proposed in this study, an end-to-end deep learning method was evaluated alongside the traditional machine learning models. A pre-trained CNN, ResNet-50 [26], was used for feature extraction on the segmented CT volumes, with subsequent classification performed by a Long Short-Term Memory (LSTM) recurrent neural network (RNN). This combination of a CNN for slice-wise feature extraction and an LSTM for capturing spatial information across the CT volume has been successful in previous injury detection studies, including classification of intracranial hemorrhage [27,28], lung cancer [29], and liver and brain tumors [30]. The goal of this approach is to leverage 2D models pre-trained on the ImageNet dataset [31] while still accounting for spatial information between slices in the 3D volume. ResNet-50 was selected for feature extraction because of its relatively higher accuracy and lower number of parameters (23 M) compared to other architectures commonly used for medical image analysis, such as AlexNet (62 M parameters) and VGGNet (138 M parameters).

Feature extraction was performed by ResNet-50 on each slice of the segmented CT, each of which was cropped to reduce the blank space surrounding the region of interest. An LSTM model then performed classification on the sequence of extracted features across each patient's CT volume.


**Table 1.** Mean and standard deviation (SD) of performance metrics for spleen injury classification from 5-fold cross-validation on the training set. The highest value for each performance metric is **bolded**, while the lowest SD is *italicized*.
