1. Introduction
Glioblastoma (GBM) is a malignant brain tumor arising from genetic mutations and epigenetic mechanisms that drive continuous cell cycle progression and mitosis in brain cells, leading to the abnormal energy metabolism that supports sustained growth [1]. Solitary brain metastases (SBM), in contrast, are single intracranial lesions formed when highly malignant tumors from other organs or tissues spread to the brain, typically through the bloodstream [2]. Primary central nervous system lymphoma (PCNSL) is a rare and highly malignant non-Hodgkin lymphoma characterized by the malignant clonal proliferation of lymphocytes confined to the central nervous system [3]. Malignant tumors of the central nervous system (CNS) primarily consist of gliomas, meningiomas, pituitary adenomas, ventricular meningiomas, CNS lymphomas, and metastatic tumors. Among these, GBM is the most common malignant primary brain tumor, accounting for 77–81% of all primary malignant tumors of the CNS [4]. Between 20% and 40% of patients with systemic cancers will develop brain metastases during the progression of their disease [5]. PCNSL represents approximately 6% of intracranial malignancies [6]. CNS tumors are most prevalent in Northern Europe, and they also impose a substantial burden on countries such as China, the United States, and India; this high incidence underscores the need for effective strategies for prevention, diagnosis, and treatment [7]. All three classes of tumors are malignant tumors of the CNS, with typical clinical manifestations of elevated intracranial pressure and various neurological symptoms [8,9]. As shown in Figure 1, these three common malignant tumors share similar conventional MRI characteristics, including central necrosis, irregular or garland-like enhancement of the tumor margins after contrast administration, and extensive peritumoral edema [10,11]. With conventional MRI sequences and traditional medical image analysis methods, it can be challenging to differentiate among the three tumor types, particularly for less experienced doctors. Studies have shown that when brain metastases appear as solitary lesions without a clear history of a primary tumor, their imaging similarity to high-grade gliomas leads to misdiagnosis in approximately 40% or more of cases [12]. Additionally, while GBM and PCNSL often exhibit distinct MRI manifestations, differentiation can still be difficult: atypical PCNSL tumors containing necrosis and hemorrhage may resemble GBM, while atypical GBM tumors without necrosis and with a solid appearance may resemble PCNSL [13,14]. However, the treatments for these tumors differ substantially, and early selection of the most appropriate treatment option can greatly improve the prognosis. Some patients are forced to undergo surgery or puncture biopsy because a clear diagnosis cannot be reached, causing unnecessary trauma and delaying the optimal window for treatment. Therefore, accurate identification of the three tumor types before treatment is of significant clinical importance for guiding clinical treatment, optimizing patient management, and improving patient prognosis.
With the advancement of medical imaging and computer information technology, Lambin et al. [15] proposed the concept of radiomics in 2012, and radiomics analysis methods have since been rapidly developed and applied in medical imaging. Several researchers have used radiomics models to classify brain tumors [16,17]. They extracted features from manually or roughly delineated regions of interest (ROIs) and then constructed classification models using machine learning (ML) methods. However, delineating ROIs is a subjective, tedious, and time-consuming task. To address this issue, scholars have established numerous automatic segmentation models for brain tumor ROI annotation, including unsupervised methods [18,19], supervised ML methods [20,21], and approaches that combine the strengths of both [22,23]. Although these methods can perform automatic brain tumor segmentation, they still require an image preprocessing step. Deep learning (DL) is a complex nonlinear regression method that has developed into a new research direction within ML. DL can extract data features accurately, automatically, and efficiently, avoiding the errors that manual segmentation may introduce while saving manpower, financial resources, and time. It has been applied to the qualitative diagnosis, efficacy evaluation, and prognosis assessment of many brain tumor diseases [24,25,26,27]. The convolutional neural network (CNN) is the most commonly used classification model in DL. Owing to their ability to directly learn the features most relevant to brain tumor ROIs, together with their adaptability and nonlinear representation of data, CNNs have been widely used in multi-modal MRI-based brain tumor classification research.
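To illustrate the kind of hand-crafted measurements such radiomics pipelines extract from an ROI, here is a minimal sketch of a few first-order features. The ROI data and the feature set are hypothetical, chosen for illustration, and are not the features used in the studies cited above:

```python
import numpy as np

def first_order_features(roi: np.ndarray) -> dict:
    """Compute a few first-order radiomics-style features from ROI intensities."""
    x = roi.astype(float).ravel()
    mean = x.mean()
    var = x.var()
    # Skewness: third standardized moment (0 for a symmetric distribution)
    skew = ((x - mean) ** 3).mean() / (var ** 1.5 + 1e-12)
    # Shannon entropy over a fixed-bin histogram of voxel intensities
    hist, _ = np.histogram(x, bins=32)
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return {"mean": mean, "variance": var, "skewness": skew, "entropy": entropy}

# Toy 3-D "ROI" standing in for a segmented tumor volume
rng = np.random.default_rng(0)
roi = rng.normal(loc=100.0, scale=15.0, size=(8, 8, 8))
feats = first_order_features(roi)
```

In a real pipeline these values would feed an ML classifier; the point here is only that each ROI is reduced to a fixed-length feature vector before modeling.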
In existing studies, researchers have focused on constructing DL-based binary classification models for brain tumor-assisted diagnosis [28,29,30], including differentiating GBM from SBM and GBM from PCNSL. Few studies have applied DL to the differential diagnosis of GBM, SBM, and PCNSL together. The main reason may be that DL requires large-scale, high-quality, and standardized medical image data for training, and it is difficult to obtain large amounts of imaging data covering all three tumor types that meet these standards. Therefore, based on the standardized medical imaging database of brain tumors established by Huashan Hospital, Fudan University, this study uses DL to establish a three-class prediction model for GBM, SBM, and PCNSL, aiming to achieve non-invasive and accurate diagnosis of the three tumor types before treatment, providing evidence for subsequent treatment and saving valuable time for patients.
This paper presents MFFC-Net, a multi-modal MRI fusion-based model for assisting in the diagnosis of three common and radiologically similar malignant CNS tumors: GBM, SBM, and PCNSL. The key contributions of this work can be summarized as follows:
DenseBlock-based parallel multiple encoders are proposed to extract features simultaneously from different sequences. This allows for comprehensive representation learning across various MRI sequences.
A novel feature fusion module is introduced to enhance the interrelated information between different tumor tissues. By improving the tumor characterization ability of the extracted features, the model achieves more accurate tumor classification.
The model incorporates a spatial-channel self-attention weighting operation on both the modal and fusion features. This operation dynamically adjusts the weights of different spatial positions and channels, enhancing the model’s expressive ability and improving its overall performance.
By leveraging these contributions, MFFC-Net demonstrates promising potential for assisting in the diagnosis of GBM, SBM, and PCNSL, thereby aiding in the effective management and treatment of these malignant CNS tumors.
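The three contributions above can be sketched end-to-end as a toy forward pass. This is a minimal NumPy illustration, not the actual MFFC-Net implementation: the linear maps stand in for the DenseBlock encoders, the softmax channel re-weighting stands in for the spatial-channel self-attention module, and all shapes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encoder(x, w):
    # Stand-in for a DenseBlock encoder: linear map + ReLU nonlinearity
    return np.maximum(x @ w, 0.0)

# Two MRI "modalities" (flattened patches), one parallel encoder each
x_t2flair = rng.normal(size=(1, 64))
x_cet1wi = rng.normal(size=(1, 64))
w1 = rng.normal(size=(64, 32)) * 0.1
w2 = rng.normal(size=(64, 32)) * 0.1

f1 = encoder(x_t2flair, w1)
f2 = encoder(x_cet1wi, w2)

# Fusion: concatenate the modality features into one vector
fused = np.concatenate([f1, f2], axis=1)   # shape (1, 64)

# Channel re-weighting: softmax-normalized importance per channel
attn = softmax(fused, axis=1)
weighted = fused * attn * fused.shape[1]   # rescale to keep magnitude

# Classifier head: linear map to the 3 tumor classes + softmax
w_cls = rng.normal(size=(64, 3)) * 0.1
probs = softmax(weighted @ w_cls, axis=1)  # class probabilities, sum to 1
```

The design point the sketch captures is that each sequence keeps its own encoder, so modality-specific features are learned independently before fusion rather than being mixed at the input.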
4. Results
Table 3 and Figure 4 show the three-class brain tumor results for the radiomics and DL models. Among the radiomics models constructed from a single sequence, the CE-T1WI-based model (CR-Model) obtained the highest ACC of 0.810. Compared to the CR-Model, the multi-modality radiomics model (MR-Model) increased the ACC by 0.019. In the DeLong test, no significant difference in AUC was identified between the CR-Model and MR-Model (0.859 vs. 0.873, p = 0.208). The DL models showed single-sequence results consistent with those of the radiomics models: the CE-T1WI-based model (CC-Net) obtained the highest ACC among the single-sequence DL models (ACC of 0.841 and AUC of 0.877). In contrast, the ACC of the multi-modal MC-Net was significantly higher than that of CC-Net (0.890 vs. 0.841, p = 0.021). Although MFFC-Net did not differ significantly from MC-Net in ACC, its AUC was 0.026 higher and significantly different from that of MC-Net (0.942 vs. 0.916, p = 0.032), and its F1-score was 0.029 higher (0.919 vs. 0.890).
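The AUC values compared above with the DeLong test are, at bottom, Mann-Whitney statistics over score pairs. As a minimal sketch of that underlying statistic (not the full DeLong variance estimate behind the reported p-values), with hypothetical scores:

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the Mann-Whitney U statistic: P(score_pos > score_neg),
    counting ties as 1/2. Equivalent to the trapezoidal ROC AUC."""
    pos = np.asarray(scores_pos, float)[:, None]
    neg = np.asarray(scores_neg, float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

# Toy example: a model that scores positives higher on average
auc = auc_mann_whitney([0.9, 0.8, 0.6], [0.7, 0.3, 0.2])  # 8 of 9 pairs ranked correctly
```

The DeLong test then estimates the covariance of two such statistics computed on the same cases, which is what allows a paired comparison of correlated AUCs.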
We compared the classification results for GBM, SBM, and PCNSL obtained by MFFC-Net with those of DenseNet [33], SENet [34], and EfficientNetV2-S [35]. The results are presented in Table 4. While both DenseNet and SENet achieved an AUC exceeding 0.90, these values were still slightly lower than that of EfficientNetV2-S, which achieved an AUC of 0.938. Our method demonstrated improvements in ACC, PPV, SEN, SPE, and F1-score over EfficientNetV2-S, despite not showing a significant difference in AUC (p = 0.512, DeLong test).
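For a three-class task, the ACC, PPV, SEN, SPE, and F1 reported above are typically macro-averaged over per-class one-vs-rest confusion counts. A minimal sketch with a hypothetical confusion matrix (not the study's actual counts):

```python
import numpy as np

def macro_metrics(cm: np.ndarray) -> dict:
    """Macro-averaged ACC, PPV, SEN, SPE, F1 from a KxK confusion matrix
    (rows = true class, columns = predicted class)."""
    k = cm.shape[0]
    total = cm.sum()
    acc = np.trace(cm) / total
    ppv, sen, spe, f1 = [], [], [], []
    for c in range(k):
        tp = cm[c, c]
        fp = cm[:, c].sum() - tp          # predicted c, truly another class
        fn = cm[c, :].sum() - tp          # truly c, predicted another class
        tn = total - tp - fp - fn
        p = tp / (tp + fp) if tp + fp else 0.0
        s = tp / (tp + fn) if tp + fn else 0.0
        ppv.append(p)
        sen.append(s)
        spe.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * p * s / (p + s) if p + s else 0.0)
    return {"ACC": acc, "PPV": np.mean(ppv), "SEN": np.mean(sen),
            "SPE": np.mean(spe), "F1": np.mean(f1)}

# Hypothetical 3-class confusion matrix (GBM / SBM / PCNSL)
cm = np.array([[45, 3, 2],
               [4, 38, 1],
               [2, 1, 29]])
m = macro_metrics(cm)
```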
As demonstrated in
Figure 5, MFFC-Net exhibited outstanding performance when compared with clinician diagnoses. Specifically, our model achieved an accuracy (ACC) of 0.920, comparable to that of expert radiologists with no significant difference between the two (0.920 vs. 0.924,
p = 0.774). These results further validate the effectiveness and reliability of the proposed MFFC-Net as a diagnostic tool for identifying brain tumors.
To visualize the classification weights of the DL models, we plotted gradient-weighted class activation maps (Grad-CAM) to provide a more intuitive view of the regions each DL model attends to, as shown in
Figure 6. The red areas correspond to high scores for the tumor category. We found that the MFFC-Net model focuses more closely on the tumor region.
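Grad-CAM weights each feature map by the global-average-pooled gradient of the class score and keeps only the positive evidence. A minimal sketch with hypothetical activations and gradients (not tied to MFFC-Net's actual layers):

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the target class score w.r.t. those activations."""
    # Channel weights: global-average-pool the gradients
    weights = gradients.mean(axis=(1, 2))                              # shape (C,)
    # Weighted sum of activation maps; ReLU keeps positive evidence only
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for display as a heatmap overlay
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

rng = np.random.default_rng(1)
acts = rng.random((8, 7, 7))         # hypothetical feature maps
grads = rng.normal(size=(8, 7, 7))   # hypothetical gradients
cam = grad_cam(acts, grads)          # (7, 7) map, upsampled onto the MRI in practice
```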
5. Discussion
In this paper, we developed and tested MFFC-Net for GBM, SBM, and PCNSL classification. MFFC-Net extracts high-level features from T2-Flair and CE-T1WI through parallel encoders. A feature fusion layer then enhances the interrelationship information between different tumor tissues and suppresses redundant features. The deeply fused features are concatenated, and the tumor classification task is completed by convolution and Softmax layers. Furthermore, we compared the diagnostic efficacy of radiomics models, DL models, and radiologists.
Among the single-sequence classification models (the FR-Model, CR-Model, FC-Model, and CC-Model), the models based on the CE-T1WI sequence outperformed those based on the T2-Flair sequence (radiomics models: SEN 0.810 vs. 0.728, SPE 0.905 vs. 0.865, AUC 0.859 vs. 0.797; DL models: SEN 0.840 vs. 0.750, SPE 0.920 vs. 0.875, AUC 0.877 vs. 0.818), consistent with our clinical experience [36,37]. This supports the view that CE-T1WI more directly reflects cellular anisotropy, neovascularization, the degree of blood–brain barrier disruption, and infiltration of surrounding tissues in brain tumors [38,39]. However, the weak correlation between edema-region features and tumor type (as shown in
Figure 6) may explain the poorer performance of the T2-Flair-based classification models in this task.
As for the multi-modal MRI-based classification models (the MR-Model and MC-Net), the models based on multi-modal MRI had better diagnostic efficacy than those based on a single MRI sequence (radiomics models: SEN 0.829 vs. 0.728, SPE 0.915 vs. 0.865, AUC 0.873 vs. 0.797; DL models: SEN 0.889 vs. 0.750, SPE 0.945 vs. 0.875, AUC 0.916 vs. 0.818). This result is consistent with the radiomics-based brain tumor classification performance reported by Bae et al. [40]. To some extent, these findings also reflect the potential significance and value of multi-modal MRI in radiomics and DL model construction and in clinical application. Furthermore, from
Figure 6 we found that both CC-Net and MC-Net focus on the tumor area, whereas MFFC-Net significantly reduces the weight of non-tumor regions (the weight map of normal brain regions appears more blue). The AUC of MFFC-Net, based on fused features, was significantly better than that of MC-Net (0.942 vs. 0.916,
p = 0.038), which indicates that fusing deep features better characterizes the tissue relationships of tumors, suppresses redundant features, reduces prediction variance, and decreases generalization error [41]. In addition, the proposed feature fusion layer improves the classification ability of the DL model.
In addition, the ACC of MFFC-Net was significantly better than that of the junior radiologist (0.940 vs. 0.782,
p < 0.001) and the senior radiologist (0.940 vs. 0.879,
p = 0.017), while there was no statistically significant difference in ACC between MFFC-Net and the expert radiologists (0.940 vs. 0.943,
p = 0.775). In a sense, the accuracy of diagnostic imaging depends heavily on the clinical experience of radiologists and must be built up through long-term practice; our MFFC-Net can effectively compensate for the limited diagnostic experience of junior radiologists. Similar findings have been reported: Shin et al. [42] developed a classification model using ResNet50 on multi-modal MRI and achieved AUCs of 0.889 and 0.835 in the internal and external test sets, respectively, results generally consistent with those of radiologists, who achieved AUCs of 0.889 and 0.857. It can be seen that our MFFC-Net can help radiologists improve the differential diagnosis of the three types of brain tumors and gain time for patients’ subsequent treatment. By assisting diagnosis in a non-invasive manner, the model also spares patients the unnecessary harm of surgery or puncture biopsy undertaken only because the diagnosis could not otherwise be confirmed.