#### **1. Introduction**

Tuberous sclerosis complex (TSC) is a rare neurodevelopmental disorder caused by mutations in the TSC1 or TSC2 gene [1,2]. It is characterized by facial angiofibromas, epilepsy, intellectual disability, and hamartomas in multiple organs, including the heart, kidneys, brain, and lungs [3–5]. The majority of pediatric TSC patients experience their first seizure within the first year of life [6–8], which severely affects their lives [9,10]. It is therefore urgent and valuable to develop valid and robust classification models for identifying children with TSC in clinical practice.

Neurological symptoms are present in nearly all children with TSC, and multi-contrast magnetic resonance imaging (MRI) is frequently employed for clinical diagnosis [11]. To date, T2-weighted imaging (T2W) and fluid-attenuated inversion recovery (FLAIR) have been commonly used in pediatric TSC diagnosis because they allow lesions to be identified with high lesion-to-brain contrast. However, the cerebrospinal fluid (CSF) signal is strong in T2W, which severely interferes with the visualization of periventricular TSC lesions. FLAIR imaging suppresses the CSF signal and shows the lesion–brain contrast clearly, but it also reduces the signal-to-noise ratio while suppressing CSF [12]. Currently, no single MRI sequence can produce all the required tissue contrasts in one image because of the trade-offs involved in choosing MRI pulse sequence parameters [13]. Recent studies have demonstrated that a synthesized contrast blending T2W and FLAIR can enhance the contrast of multiple sclerosis (MS) lesions, leading to improved diagnostic efficacy [12,13]. However, to the best of our knowledge, no study has yet applied a synthesized contrast combining T2W and FLAIR to the diagnosis of pediatric TSC.

**Citation:** Jiang, D.; Liao, J.; Zhao, C.; Zhao, X.; Lin, R.; Yang, J.; Li, Z.; Zhou, Y.; Zhu, Y.; Liang, D.; et al. Recognizing Pediatric Tuberous Sclerosis Complex Based on Multi-Contrast MRI and Deep Weighted Fusion Network. *Bioengineering* **2023**, *10*, 870. https://doi.org/10.3390/bioengineering10070870

Academic Editors: Paolo Zaffino, Maria Francesca Spadea and Kevin J. Otto

Received: 29 May 2023 Revised: 24 June 2023 Accepted: 12 July 2023 Published: 22 July 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Meanwhile, deep learning has been studied as an advanced artificial intelligence technology that can automatically learn from medical image data and extract a large number of features [14]. Deep learning models combined with multi-contrast MRI have been successfully used to detect strokes automatically [15] and to classify brain tissues [16]. Convolutional neural networks (CNNs) have also been applied to assist in tuber segmentation in TSC patients [17]. Sanchez et al. [18] used two MRI contrasts, T2W and FLAIR, to detect TSC tubers and achieved an area under the receiver operating characteristic curve (AUC) of 0.99. However, their approach employed a 2D network and relied solely on handpicked MRI slices with evident tubers as input to the network. This method did not account for the spatial attributes of MRI and neglected the fact that not all TSC patients exhibit visible lesions. Additionally, their dataset was limited to 114 TSC patients and 114 controls. In contrast, recent research suggests that 3D CNNs excel at capturing the spatial characteristics of MRI and effectively exploit the relationships between voxels. For example, they have been reported to yield superior results in predicting chronological age [19].

To further improve the identification of children with TSC in clinical practice, we propose a novel deep learning method, the deep weighted fusion network (DWF-net), which diagnoses pediatric TSC from multi-contrast MRI. The method uses a synthesized contrast, FLAIR3, obtained by combining T2W and FLAIR, to maximize the lesion–brain contrast of pediatric TSC lesions. It also applies a weighted late fusion strategy with 3D CNNs to multi-contrast MRI to diagnose pediatric TSC automatically. The experimental dataset comprises 680 children: 331 healthy children and 349 children with TSC. Experiments show that the synthesized FLAIR3 contrast and the weighted 3D CNN strategy improve both the contrast saliency of pediatric TSC lesions and the classification performance.

The proposed deep learning method efficiently distinguishes TSC children from healthy children and currently achieves the best performance. It has great potential to help clinicians diagnose TSC in children and provides an effective research tool for pediatricians.

#### **2. Methods**

#### *2.1. Optimal Combination of T2W and FLAIR*

Cortical and subcortical nodules are the most common lesions in TSC children, and making these lesions more prominent is crucial for clinicians diagnosing pediatric TSC [20]. The T2W signal is related to water content, and most lesions have stronger T2W signals than the surrounding normal tissue, so they often appear bright. The location and size of pediatric TSC lesions can therefore be seen in the T2W sequence. However, the lesion boundary is relatively vague in T2W, which makes it difficult to delineate the lesion clearly. Moreover, T2W suffers from strong cerebrospinal fluid (CSF) signal interference. FLAIR, also known as water-suppression imaging, suppresses (darkens) the CSF hyperintensity seen in T2W, thereby making lesions adjacent to CSF appear clear (bright). Compared with the T2W sequence, the FLAIR sequence better represents the surroundings of the lesion and clearly shows the lesion area. FLAIR is a T2W scan that selectively suppresses CSF by means of an inversion pulse. However, CSF signal suppression comes at the expense of a reduced signal-to-noise ratio [12]. FLAIR2 and FLAIR3 have been proposed to combine T2W and FLAIR to improve lesion visualization in MS [12,13]. Inspired by [12,13], we propose to optimize the combination of T2W and FLAIR as a new modality, named FLAIR3, for pediatric TSC as follows [13]:

$$\begin{aligned} \text{FLAIR}_3 &= \text{FLAIR}^{\alpha} \times \text{T2W}^{\beta}\\ \text{s.t. } \alpha + \beta &= 3 \end{aligned} \tag{1}$$

where the optimized *α* is 1.55 and *β* is 1.45 based on the signal equations of FLAIR and T2W [13], which can optimally balance the lesion contrast between FLAIR and T2W.
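As an illustration of Equation (1), the following minimal sketch computes the voxel-wise product on two co-registered intensity lists. It is a sketch under assumptions, not the authors' implementation: the function name `flair3`, the use of flat Python lists instead of 3D arrays, and the toy intensity values are all hypothetical, and intensities are assumed to be non-negative (for example, already scaled to [0, 1]).

```python
def flair3(flair, t2w, alpha=1.55, beta=1.45):
    """Voxel-wise FLAIR^alpha * T2W^beta for two co-registered intensity lists.

    Assumes non-negative intensities; the default exponents are the values
    reported in the text and must sum to 3.
    """
    if len(flair) != len(t2w):
        raise ValueError("FLAIR and T2W must have the same number of voxels")
    if abs(alpha + beta - 3.0) > 1e-9:
        raise ValueError("exponents must satisfy alpha + beta = 3")
    return [(f ** alpha) * (t ** beta) for f, t in zip(flair, t2w)]


# Hypothetical toy voxels: lesion-like (bright in both contrasts),
# CSF-like (bright in T2W, suppressed in FLAIR), and background.
flair_voxels = [0.9, 0.1, 0.0]
t2w_voxels = [0.8, 1.0, 0.0]
combined = flair3(flair_voxels, t2w_voxels)
```

In this toy case the lesion-like voxel stays bright while the CSF-like voxel is strongly attenuated, which is the qualitative behavior the combination is meant to provide.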

#### *2.2. Late Fusion Strategies*

Some recent studies [21] have shown that late fusion models can capture the data distribution effectively and achieve the best classification performance. Inspired by [22,23], we used a weighted late fusion strategy to combine multi-contrast MRI for the classification of pediatric TSC patients. First, T2W, FLAIR, and FLAIR3 were each fed into a feature extractor. We then propose a deep weighted fusion network (DWF-net) that takes the scores of the T2W, FLAIR, and FLAIR3 models as input and outputs the final classification through a simple and efficient weighted average, as follows:

$$\begin{aligned} S_{\text{DWF}} &= W_1 \times S_{\text{T2W}} + W_2 \times S_{\text{FLAIR}} + W_3 \times S_{\text{FLAIR3}}\\ \text{s.t. } \sum_{i=1}^{3} W_i &= 1 \end{aligned} \tag{2}$$

where *S*<sub>T2W</sub>, *S*<sub>FLAIR</sub>, and *S*<sub>FLAIR3</sub> represent the classification scores of the T2W, FLAIR, and FLAIR3 models, respectively. *S*<sub>DWF</sub> denotes the final output prediction scores of the proposed DWF-net. *W*<sub>1</sub>, *W*<sub>2</sub>, and *W*<sub>3</sub> are the weights of the prediction scores of the three multi-contrast MRIs.

To explore the optimal fusion of the multi-contrast MRIs and to maximize the AUC of the proposed DWF-net, experiments were performed for values of *W*<sub>1</sub> between 0 and 1 and *W*<sub>2</sub> from 0.1 to 1−*W*<sub>1</sub>, with a step of 0.1; *W*<sub>3</sub> is 1−*W*<sub>1</sub>−*W*<sub>2</sub>. The weight-searching algorithm is shown in Algorithm 1.

**Algorithm 1** The weight searching algorithm for fusion

```
Input: The prediction scores S_T2W, S_FLAIR, and S_FLAIR3 of the three input images and the
corresponding ground truth y on the testing set.
Output: The weights (W1, W2, and W3) with the best AUC on the testing set.
Initialize AUC_best ← 0
for i := 0 to 10 do
    for j := 0 to 10 − i do
        k ← 10 − i − j
        S_temp ← (i × S_T2W + j × S_FLAIR + k × S_FLAIR3) × 0.1
        AUC_temp ← Compare(S_temp, y)
        if AUC_temp > AUC_best then
            AUC_best ← AUC_temp
            W1 ← i × 0.1
            W2 ← j × 0.1
            W3 ← k × 0.1
        end if
    end for
end for
Return W1, W2, and W3
```
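
The grid search in Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' code: the names `roc_auc` and `search_weights` are hypothetical, the `Compare` step is assumed to be the ROC AUC (computed here by pairwise ranking, with ties counted as 0.5), the best AUC is initialized to −1 instead of 0 so that a weight triple is always returned, and the toy scores are made up.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def search_weights(s_t2w, s_flair, s_flair3, y, steps=10):
    """Try every (W1, W2, W3) on a 1/steps grid with W1 + W2 + W3 = 1."""
    best_auc, best_w = -1.0, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            fused = [(i * a + j * b + k * c) / steps
                     for a, b, c in zip(s_t2w, s_flair, s_flair3)]
            auc = roc_auc(fused, y)
            if auc > best_auc:  # strict '>' keeps the first best triple found
                best_auc = auc
                best_w = (i / steps, j / steps, k / steps)
    return best_w, best_auc


# Hypothetical per-subject scores from three single-modality models.
y = [1, 1, 0, 0]
s_t2w = [0.6, 0.4, 0.5, 0.3]
s_flair = [0.7, 0.6, 0.65, 0.2]
s_flair3 = [0.9, 0.8, 0.3, 0.1]
best_w, best_auc = search_weights(s_t2w, s_flair, s_flair3, y)
```

With a step of 0.1 the search evaluates only 66 weight triples, so an exhaustive loop is sufficient and no optimizer is needed.
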
#### *2.3. Network Architectures*

The proposed DWF-net method for pediatric TSC patients was implemented using two different 3D CNN architectures. The following sections describe two different 3D CNN models.

ResNet was proposed in 2015 and has been widely applied in detection, segmentation, recognition, and other fields [24]. In addition, among the variants of 3D CNNs, ResNet has demonstrated stable and excellent image classification performance [24]. Therefore, the first 3D CNN model we consider is 3D-ResNet, which uses shortcut connections that pass each block's input forward so that the layers learn a residual function with reference to that input. The residual function is easier to optimize, which allows much deeper networks to be trained and higher accuracy to be gained from the increased depth.

For the second 3D CNN model, we used the 3D-EfficientNet architecture [25] as our feature extractor. This classification network is known for improving accuracy while reducing the training time and the number of network parameters. EfficientNet was designed using a neural architecture search and employs the mobile inverted bottleneck convolution (MBConv) module as its core structure. This module, similar to depth-wise separable convolution, reduces the number of parameters significantly. EfficientNet also incorporates the channel attention mechanism of the squeeze-and-excitation network (SENet) [26]. This attention mechanism allows the model to focus on the most informative channel features while suppressing unimportant ones, thereby improving model performance.

As shown in Figure 1a, for the pediatric TSC identification tasks with a single MRI modality, 3D-ResNet34 and 3D-EfficientNet were used as feature extractors. When DWF-net was used, two or three modalities were applied as inputs, as shown in Figure 1b. Table 1 lists the 10 models that were trained in this study, each with a distinct architecture and set of inputs.

**Figure 1.** Overall network structure, (**a**) single modality model pipeline, (**b**) schematic of the proposed DWF-net pipeline. The two dotted lines represent the optimal combination of T2W and FLAIR to generate FLAIR3.


**Table 1.** Detailed information on ten network structures.

#### **3. Materials and Experiments**

#### *3.1. Dataset*

In this study, all pediatric volunteers were from Shenzhen Children's Hospital. The study was approved by the Ethics Committee of Shenzhen Children's Hospital (No.2019005). Written informed consent was obtained from all pediatric volunteers and/or their parents. In total, 349 TSC children and 331 healthy children (HC) were included in this study. Inclusion criteria for pediatric TSC patients were (1) aged 0–20 years, (2) no other neurological disorders, (3) clinically diagnosed with TSC, and (4) complete and clear T2W and FLAIR images. Inclusion criteria for healthy children were (1) aged 0–20 years, (2) no neurological disorder, (3) clinically defined normal or non-specific findings during routine clinical care, and (4) complete and clear T2W and FLAIR images. Figure 2 shows the exclusion and inclusion criteria of our study.

**Figure 2.** Study exclusion and inclusion criteria of the pediatric dataset.

The data were randomly split into train-validation-test sets in a 7:1:2 ratio. To ensure that every group had the same class proportion, stratified random sampling was employed. Training, validation, and testing datasets had no overlap of patients.
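A stratified 7:1:2 split at the subject level can be sketched as below. This is only an illustration under assumptions: the function name `stratified_split`, the fixed seed, and the rounding rule for the per-class group sizes are hypothetical choices, and each list index is assumed to stand for one child, so that no subject can appear in more than one set.

```python
import random


def stratified_split(labels, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split subject indices into train/val/test while preserving class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    splits = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]  # remainder goes to test
    return splits


# One label per child: 1 = TSC (349 subjects), 0 = healthy (331 subjects).
labels = [1] * 349 + [0] * 331
splits = stratified_split(labels)
```

Shuffling and slicing within each class separately is what keeps the TSC-to-HC proportion approximately equal across the three sets.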

#### *3.2. Data Processing*

First, the FMRIB Linear Image Registration Tool (FLIRT) of FSL (http://fsl.fmrib.ox.ac.uk (accessed on 1 January 2021)) was used to register T2W to the FLAIR space, with mutual information as the cost function. In neuroimaging studies, lesions are usually located in the brain tissue, and the skull is irrelevant. When brain MRI is used for classification networks, the brain tissue of the region of interest is often the input. HD-bet is an algorithm for extracting brain tissue [27] that can remove irrelevant structures such as the neck and eyeballs. Therefore, in the second step, the deep learning tool HD-bet was used to strip the skull from the MRI. Subsequently, all 3D MRI images were resized to 128 × 128 × 128, and the image intensity was normalized to the range of 0 to 1 using the min–max normalization formula:

$$x_{Normalized} = \frac{x - \mathrm{Min}(x)}{\mathrm{Max}(x) - \mathrm{Min}(x)} \tag{3}$$

where *Max*(*x*) and *Min*(*x*) represent the highest and lowest values of the brain-extracted MRI images, respectively, and *x*<sub>Normalized</sub> refers to the normalized MRI images. Finally, T2W and FLAIR were combined and transformed into FLAIR3. The flowchart illustrating the data preprocessing can be found in Figure 3.
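The min–max normalization of Equation (3) can be illustrated with a short sketch. The function name `min_max_normalize`, the flat list of voxel intensities, and the handling of a constant image (mapped to zeros) are assumptions made for the example and are not taken from the study's code.

```python
def min_max_normalize(voxels):
    """Rescale intensities linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        # Assumed convention for a constant image: avoid division by zero.
        return [0.0 for _ in voxels]
    return [(v - lo) / (hi - lo) for v in voxels]


# Hypothetical raw intensities of a brain-extracted image.
normalized = min_max_normalize([120.0, 480.0, 300.0, 120.0])
```

Because the scale is set entirely by the two extreme values, a single very bright voxel compresses all other intensities toward 0, which is the outlier sensitivity usually associated with this method.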

**Figure 3.** Flowchart of the data preprocessing.

#### *3.3. Baseline and Effectiveness of Skull Stripping*

In this study, we compared 10 different proposed 3D CNN models with a 2D-InceptionV3 model [18] (baseline model) to evaluate the effectiveness of the proposed deep learning methods. The 2D-InceptionV3 model was exclusively trained on our FLAIR data, with the maximum transverse slice of the FLAIR chosen as the input. Furthermore, we conducted a series of experiments on FLAIR images and T2W images with and without skull-stripping preprocessing to assess the effectiveness of the skull-stripping methodology.

#### *3.4. Comparison of Normalization Methods*

Normalization methods often have a significant impact on the performance of deep learning models. Min–max normalization and Z-score normalization are the most commonly used methods in medical image normalization. Although min–max normalization is appropriate for most kinds of data and preserves the structure of the original data distribution, it is not ideal for sparse data and is easily affected by outliers. Z-score normalization uses the mean and standard deviation of the original data to normalize it, as follows:

$$x_{Normalized} = \frac{x - \mathrm{Mean}(x)}{\mathrm{std}(x)} \tag{4}$$

After Z-score normalization, the processed data have a mean of 0 and a standard deviation of 1. The Z-score method is suitable for most types of data, but it centers the data, which changes the distribution structure of the original data, and it is also unsuitable for sparse data. To explore the effect of normalization, we conducted three sets of experiments on both T2W and FLAIR images using the same network: without normalization, with Z-score normalization, and with min–max normalization.
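Equation (4) can be sketched in the same spirit. The function name `z_score_normalize` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or sample form was used.

```python
import statistics


def z_score_normalize(voxels):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(voxels)
    std = statistics.pstdev(voxels)
    if std == 0:
        # Assumed convention for a constant image: avoid division by zero.
        return [0.0 for _ in voxels]
    return [(v - mean) / std for v in voxels]


# Hypothetical intensities with mean 5 and population standard deviation 2.
z = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Unlike min–max scaling, the output is not bounded to a fixed range; the values are instead expressed in units of standard deviations from the mean.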

#### *3.5. Model Training and Evaluation*

For our experiments, we used the same partitioning for the training set, validation set, and test set across all models. Each model was trained using a learning rate of 0.0001, SGD optimization, a batch size of 4, and 50 epochs, with the binary cross-entropy loss function. To implement the training, validation, and testing process, we used Python version 3.8.10 and PyTorch version 1.9.0 environments.

For each cohort, we calculated the area under the curve (AUC) of the receiver operating characteristic (ROC), accuracy (ACC), sensitivity (SEN), and specificity (SPE) to evaluate the classification performance of all models. These metrics rely on the true positive (TP), which counts the total number of correct positive classifications, and the true negative (TN), which represents the total number of accurate negative classifications. The false positive (FP) accounts for the total number of positive classifications that are incorrect, while the false negative (FN) represents the total number of negative classifications that are incorrect. We obtained the ACC, SEN, and SPE through the following formulas:

Accuracy (ACC): The percentage of the whole sample that is correctly classified:

$$\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \tag{5}$$

Sensitivity (SEN): The percentage of positive samples that are correctly classified:

$$\text{SEN} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{6}$$

Specificity (SPE): The percentage of negative samples that are correctly classified:

$$\text{SPE} = \frac{\text{TN}}{\text{TN} + \text{FP}} \tag{7}$$
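Equations (5)–(7) translate directly into code. The sketch below is illustrative: the function name `classification_metrics` and the toy labels are hypothetical, label 1 is assumed to denote TSC and label 0 a healthy child, and both classes are assumed to be present so that no denominator is zero.

```python
def classification_metrics(y_true, y_pred):
    """Return ACC, SEN, and SPE from binary ground-truth and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }


# Hypothetical example: 4 TSC and 6 healthy subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Here TP = 3, FN = 1, TN = 4, and FP = 2, so the three metrics follow from the formulas above by substitution.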

#### *3.6. Statistical Analysis*

In this study, categorical variables were presented as frequencies and percentages, and continuous variables were expressed as the mean ± standard deviation. Continuous variables were analyzed using the F-test, and categorical variables were analyzed using the chi-square test. Statistical significance was defined as *p* < 0.05. All statistical analyses were performed using the scikit-learn, scipy, and stats libraries in Python 3.8.10.

#### **4. Results**

#### *4.1. Clinical Characteristics of Patients*

The primary clinical features of all 680 children are listed in Table 2. Among the 349 TSC patients, 188 (53.9%) were male, and the average age was 45.5 months. Among the 331 HC, 183 (55.3%) were male, and the average age was 733 months. The average age differed significantly between the HC group and the TSC group (*p* < 0.05). There was no significant difference in gender.
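The gender comparison can be reproduced approximately with a Pearson chi-square test on the 2 × 2 table of counts. This is a sketch under stated assumptions: the function name `chi_square_2x2` is hypothetical, the female counts are derived by assuming every child not counted as male was recorded as female, and no continuity correction is applied because the text does not say whether one was used. For one degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2 x 2 count table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for r, row in enumerate(table):
        for col, observed in enumerate(row):
            expected = row_totals[r] * col_totals[col] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: TSC, HC. Columns: male, female (female counts derived as total - male).
sex_table = [[188, 349 - 188], [183, 331 - 183]]
chi2, p = chi_square_2x2(sex_table)
```

Under these assumptions the p-value is well above 0.05, which agrees with the reported absence of a significant gender difference between the two groups.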

**Table 2.** The main clinical characteristics of all 680 child subjects.

