Article

Multiple-Stream Models for a Single-Modality Dataset with Fractal Dimension Features

by Yen-Ching Chang 1,2
1 Department of Medical Informatics, Chung Shan Medical University, Taichung 40201, Taiwan
2 Department of Medical Imaging, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
Fractal Fract. 2025, 9(4), 248; https://doi.org/10.3390/fractalfract9040248
Submission received: 20 February 2025 / Revised: 5 April 2025 / Accepted: 10 April 2025 / Published: 15 April 2025
(This article belongs to the Special Issue Fractal Dimensions with Applications in the Real World)

Abstract

Multiple-stream deep learning (DL) models are typically used for multiple-modality datasets, with each model extracting favorable features from its own modality dataset. Through feature fusion, multiple-stream models can generally achieve higher recognition rates. While feature engineering is indispensable for machine learning models, it is generally omitted for DL. However, feature engineering can be regarded as an important supplement to DL, especially when using small datasets with rich characteristics. This study aims to utilize limited existing resources to improve the overall performance of the considered models. Therefore, I choose a single-modality dataset—the Chest X-Ray dataset—as my original dataset. For ease of evaluation, I take 16 pre-trained models as basic models for the development of multiple-stream models. Based on the characteristics of the Chest X-Ray dataset, three characteristic datasets are generated from the original dataset, including the Hurst exponent dataset (corresponding to a fractal dimension dataset), as inputs to the multiple-stream models. For comparison, various multiple-stream models are developed based on the same dataset. The experimental results show that, with feature engineering, the accuracy can be raised from 91.67% (one-stream) to 94.52% (two-stream), 94.73% (three-stream), and 94.79% (four-stream), while, without feature engineering, it can be increased from 91.67% to 92.35%, 93.49%, and 93.66%, respectively. In the future, the simple yet effective methodology proposed in this study can be widely applied to other datasets, in order to effectively promote the overall performance of models in scenarios characterized by limited resources.

1. Introduction

Chollet once mentioned, in the book Deep Learning with Python [1], that support vector machines (SVMs) are hard to scale to large datasets and cannot provide good results for perceptual problems such as image classification. He stated that an SVM is a shallow method, and, therefore, useful features must be manually extracted—in a step called feature engineering—to apply an SVM to perceptual problems. SVMs belong to the class of machine learning methods. As indicated by this statement, feature engineering can be seen to play an indispensable role in the era of machine learning.
Subsequently, he mentioned that deep learning makes problem solving much easier, as it allows for the total automation of feature engineering, which used to be an extremely important or even necessary step in the machine learning workflow.
Machine learning, as a kind of shallow learning, is equivalent to transforming the input data into another representation space through one simple transformation scheme, into two representation spaces consecutively through two simple transformation schemes, and so on. For example, SVMs utilize high-dimensional nonlinear projections [1], and decision trees use simple calculation and comparison operations. However, such an approach alone is insufficient for complex problems that require more delicate representations. Therefore, multiple-layer representations must be extracted manually; that is, the task of feature engineering.
On the other hand, deep learning can almost completely achieve the task of feature engineering in a fully automatic way. We can learn almost all features through deep learning without any feature engineering operations, thereby reducing the workflow or burden imposed by machine learning approaches. Rather than saying that deep learning removes the need for feature engineering, it is more accurate to say that it replaces a complicated, delicate, work-intensive, and time-consuming pipeline with a single, simple, smooth, and consistent end-to-end model.
As is well known, feature engineering requires domain knowledge specific to the data and the machine learning algorithms. Feature engineering is a procedure in which data are manually transformed, in a hard-coded way and based on domain knowledge, into the features required by machine learning methods. The main purpose of feature engineering is to improve the final performance of the developed model. In the early days, it was indeed the case that, for higher accuracy, we depended heavily on domain knowledge regarding machine learning algorithms. In the era of machine learning, it is inevitable that machine learning approaches demand such hard-coded features, whether for classification, to recognize the categories of objects, or for regression, to fit the data meaningfully. In this study, I particularly require domain knowledge on deep learning algorithms in order to achieve the task of feature engineering specific to deep learning.
Machine learning features derive from the representations used to transform the original data into other data or a representation space in a meaningful way. Therefore, we call them hard-coded features, as these features cannot be changed or tuned as model learning progresses. However, deep learning features are developed during the training of deep learning models, and, hence, we call these features soft-coded to differentiate them from hard-coded ones. On the surface, this seemingly implies that soft-coded features have a higher possibility of reaching a better performance than hard-coded features, which, to some extent, is true. In particular, this depends on what deep learning models are chosen and how they are implemented. However, this does not imply that deep learning models completely do not require feature engineering for enhanced performance, as is typically the case for machine learning models.
For the time being, deep learning techniques are imperfect, and, hence, we can still improve the overall performance of deep learning models through domain knowledge regarding the considered problems.
Before deep learning, feature engineering was inevitable; in fact, without well-handcrafted features, good results could not be obtained. Even in the era of machine learning, we still depend highly on the contributions of feature engineering. As deep learning progresses, it removes most tasks of feature engineering, as deep learning models possess the ability to automatically extract most useful and hidden features. However, this does not mean that we do not obtain any benefits at all from feature engineering in the context of deep learning. Chollet [1] has mentioned two reasons for performing feature engineering:
  • Good features still allow you to solve problems more elegantly while using fewer resources.
  • Good features let you solve problems with far less data.
These two views are very convincing. Therefore, my aim is to utilize existing resources—namely, a single-modality dataset and pre-trained models—to increase the overall recognition rate through the use of multiple-stream deep learning models (simply called multiple-stream models hereafter) and feature engineering. As is well known, whether deep learning models learn well depends on the size of the used dataset, as well as the learning ability of the models. In this context, different pre-trained models possess certain advantages for some datasets. Under the same model, the more data that are available, the higher the learning ability of the model. However, if we have only a small volume of data, then the information obtained through feature engineering may be extremely useful.
Russell and Norvig [2] have stated that “The ability to do a good job of feature engineering is critical to success”. They quoted from Domingos [3]: “At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used”. This quotation indicates that the success or failure of machine learning almost certainly depends on the feature engineering step.
They also mentioned the concept that “Neural nets can map from raw inputs directly to outputs, and thus largely avoid the need for feature engineering, but they do require more data”. This quote implies that the volume of data and feature engineering form a subtle relationship. The larger the used dataset, the lower the dependence on feature engineering; conversely, the smaller the dataset, the higher the dependence on feature engineering. As such, feature engineering plays an extremely important role in machine learning, and it may play a critical role in deep learning, at least in the short-term. After all, the existing pre-trained models are not so highly advanced that they can extract all hidden features from images.
Given that some people believe that the task of learning representations is the same as feature engineering, Rivas [4] further distinguished the difference between the tasks of learning representations and feature engineering. He emphasized that feature engineering contains explicable components, whereas the task of learning representations does not necessarily associate any meaning to the learned representations. However, we have more possibilities to describe the meanings of the learned representations as we obtain more domain knowledge regarding deep learning techniques.
Based on the comments of Rivas [4], for feature engineering, it is necessary to provide the meaning of features; or, we can say that the task of feature engineering is highly dependent on the meaningful characteristics that data possess. According to the potential meanings of the original data, we can exploit the “treasure trove” of the original data. After all, not all features are meaningful to learn, at least, for some deep learning models.
The explainability of feature engineering largely depends on how well we understand the data and the adopted algorithms; for example, the fractal dimension [5,6,7] and its corresponding Hurst exponent [8,9,10] provide more physical meaning and explicability in the context of medical images and natural scenes. Since electromyographic signals usually show traces of self-similarity, Coelho and Lima [5] used fractal dimension to characterize and measure the complexity inherent in different types of muscle contraction; based on the fact that fractal dimension can properly represent the derivatives of particle breakage, Lai et al. [6] studied the correlation between fractal dimension and particle breakage for tungsten ores under impact crushing; based on the concept of box counting, Cui and Wang [7] recently proposed some novel algorithms to estimate fractal dimension of animal movement in order to analyze animal behavior. To meet the future needs of various fields for fractal analysis, Chang [8] proposed an efficient maximum likelihood estimator for the Hurst exponent of two-dimensional fractional Brownian motion; through deep learning, Chang and Jeng [9] further classified images of two-dimensional fractional Brownian motion in terms of the Hurst exponent; recently, Chang [10] proposed deep-learning estimators for the Hurst exponent of two-dimensional fractional Brownian motion. Therefore, the fractal dimension and Hurst exponent are particularly suitable for feature engineering in these fields.
In addition, Zhang et al. [11] also mentioned one key advantage of deep learning, in that deep learning approaches can automate the labor-intensive process of feature engineering. The point of view of Zhang et al. reflects the common perception of deep learning at present; that is, deep learning can completely learn all meaningful features or extract all important features.
Janiesch et al. [12] gave some opinions on machine learning and deep learning, as well as the transition from machine learning to deep learning. They talked about three approaches to building an analytical model: explicit programming, shallow machine learning, and deep learning. Each model consists of three components: the input part, the model-building part, and the output part. In the context of explicit programming, the model-building part consists of handcrafted model building; in shallow machine learning, the model-building part involves handcrafted feature engineering and automated model building; and, in deep learning, the model-building part includes feature learning (automated feature engineering) and automated model building. In particular, these models are manually, semi-automatically, and automatically built, respectively.
Wang and Liu [13] integrated feature engineering—namely, feature extraction and feature selection—into deep learning to conduct a diagnostic and predictive analysis of turbofan engines. Their adopted feature engineering techniques involved some existing tools for feature extraction and feature selection. Feature extraction techniques include principal component analysis and sliced inverse regression, while feature selection techniques include stepwise regression, multivariate adaptive regression splines, random forest, and extreme gradient boosting. The main purpose of feature extraction and feature selection is to reduce the dimensionality of the features. Next, they combined machine learning models and deep learning models with feature engineering (dimension reduction) for regressors (for models of numerical output), in order to carry out tests to find the best combination.
Different from the previously mentioned applications, my implementation of feature engineering involves transforming or mapping the original images to other characteristic images or maps that can help deep learning models to learn other hidden features from different perspectives. According to the characteristics of the original dataset, in this study, I use three transformation techniques (or tools) to generate three characteristic maps in order to form three extra datasets for multiple-stream models, in particular, the Hurst exponent dataset, the second partial derivative dataset, and the average information content dataset. With the original data included, my experiments utilize four datasets as possible inputs to the considered multiple-stream models.
In general, multiple-stream models are used for multiple-modality datasets (or, simply, modalities) as multiple inputs. For example, Simonyan and Zisserman [14] proposed a two-stream convolutional network architecture that incorporates spatial and temporal networks for action recognition in videos. Ercolano and Rossi [15] proposed a two-stream model combining CNN and long short-term memory (LSTM) models for the recognition of activities of daily living. Chu et al. [16] proposed a gesture recognition system based on a dual-stream model, where the first-stream model consists of one 3DCNN and three convolutional LSTMs in series, while the second-stream model is three LSTMs in series. Zhang et al. [17] proposed a dual-stream feature aggregation network for water body segmentation. Sun and Qu [18] proposed a dual-stream U-Net structure for efficient semantic segmentation tasks. Wu et al. [19] proposed a multi-stream feature aggregation network with multi-scale supervision for single image dehazing. Xu et al. [20] proposed a dual-stream representation—the segmentation stream and super-resolution stream—fusion learning paradigm for accurate clinical segmentation. Zhang et al. [21] proposed a dual-stream feature fusion network to extract discriminative features from color (RGB) and grayscale pedestrian images. The two inputs to the dual-stream network are color images and their corresponding grayscale images. With these two types of datasets, the dual-stream network can extract more discriminative global and local features of pedestrians.
Gerber and Pillay [22] proposed an automated design for a deep neural network pipeline composed of five components: pre-processing, data augmentation, feature engineering, neural network architecture selection, and hyperparameter tuning. They adopted feature engineering techniques to extract potential features from the original data. As a result, a striking increase in performance was obtained when well-chosen feature engineering techniques were considered. Their best-performing design for the sentiment analysis and spam detection datasets was composed of three stages—pre-processing, feature engineering, and classification. On the other hand, their best-performing design for image datasets consisted of three stages—augmentation, pre-processing, and classification. They considered contrast enhancement, mean normalization, Gaussian blur, and transformation from RGB to HSV (hue, saturation, and value) color space as pre-processing operations. However, there sometimes exists a gray area between pre-processing and feature engineering, and these four operations for pre-processing may also be viewed as simple feature engineering techniques.
In a case study focused on malware classification, Gibert et al. [23] integrated feature engineering into deep learning. They fused two types of features—handcrafted features and deep features—where the handcrafted features were derived from domain knowledge and the deep features from deep learning models. Finally, they trained XGBoost (eXtreme Gradient Boosting)—a machine learning technique—to classify whether the input is malware or not. Their work considers a task characterized by multiple modalities; therefore, they essentially presented a manual multiple-stream model. They claimed that their research was the first application fusing feature engineering and deep learning through a simple yet effective early fusion mechanism for the problem of malware classification.
Based on the similarity in structure between the wavelet scattering transform (WST)—a deep learning-inspired feature engineering approach—and deep convolutional neural networks (CNNs), Taee et al. [24] adopted the deep WST to classify tremor severity. As the WST does not require optimization of the weights of the convolutional layers, it can largely reduce the computational time. Therefore, the WST can be considered as a compact CNN without training.
At present, it is widely believed that feature engineering is absolutely necessary in machine learning but almost always omitted in deep learning. As such, it is no wonder that studies presenting deep learning models supplemented by feature engineering are rare. However, in this study, I show that it is considerably meaningful and useful to pre-process the original data (e.g., transform the original images into other characteristic images), in order to make them easier for some deep learning models to process.
Therefore, in this study, I propose several multiple-stream models for a single-modality dataset, using fractal dimension features to increase the overall recognition rate. Compared to one-stream pre-trained models on the original dataset, the proposed scheme provides the following contributions:
  • The proposed scheme is easy to implement in practice.
  • The proposed scheme is considerably effective.
  • The proposed scheme can be easily expanded and extended.
  • The proposed scheme can be widely used on other datasets.
The remainder of this paper is organized as follows: Section 2 provides related materials and methods. Section 3 gives a detailed introduction to the proposed scheme. Section 4 contains the experimental results and discussion. Finally, I conclude the paper with some closing remarks.

2. Materials and Methods

In this study, I aim to classify images through the use of deep learning approaches supplemented with feature engineering; that is, my purpose is to make good use of the original dataset through feature engineering. Therefore, for analysis and evaluation, three characteristic maps or images were generated from the original dataset such that, in total, four datasets were considered. In addition, to make effective use of all four datasets, I designed some multiple-stream models to verify that feature engineering is still useful, even in the era of deep learning, not just in the context of machine learning.
For practical applications, my ultimate goal is to improve the overall performance of the resulting model(s) under limited resources, especially when considering only one modality or dataset. Therefore, I created other characteristic images as extra datasets, in order to match possible deep learning models and achieve the best possible result.
As four datasets are considered in total, the multiple-stream models are composed of one-stream deep learning models (simply called one-stream models), two-stream deep learning models (simply called two-stream models), three-stream deep learning models (simply called three-stream models), and four-stream deep learning models (simply called four-stream models). In this pilot and pioneering study, for simplicity, the basic models for each stream were all selected from 16 pre-trained deep learning models. In addition, no fine-tuning or further training of parameters was performed, except for the last fully connected layers. In this way, I aim to objectively analyze the effects of deep learning models supplemented by feature engineering.

2.1. The Original Dataset

For comparison, I chose a single open dataset, called “Chest X-Ray”, from Kaggle [25] as the subject of my research. The Chest X-Ray dataset will be called the original dataset in the following. The dataset contains two classes of images: normal and pneumonia (abnormal). As the images of the original dataset are unequal in size, they were cropped into the same size for reliability and fair comparison, with the detailed procedure introduced in Section 4. If there is no possibility of confusion, it will also be called the original dataset (or the O dataset, simply expressed as O); otherwise, it will be called the original cropped dataset.

2.2. The Hurst Exponent Dataset

To experimentally verify that my idea is feasible—that is, that feature engineering is still useful and meaningful—I chose the original dataset as one input to the deep learning models, with feature datasets derived from the original dataset as the other inputs. According to the characteristics of the original dataset, I generated three possibly effective features as additional datasets. Among the possible features for distinguishing between normal and abnormal instances, the fractal dimension (D) has been considered a reliable indicator for a long time, both before and in the era of machine learning. The fractal dimension has a constant relation with the Hurst exponent (H), according to the formulas D = 3 − H for images and D = 2 − H for signals.
For signals, the value of the fractal dimension ranges from 1 to 2; for images, it ranges from 2 to 3. Although the fractal dimension is more meaningful and common than the Hurst exponent, its value is not suitable for being saved as an image. Therefore, I adopted the corresponding Hurst exponent—with values ranging between 0 and 1—as the characteristic value. The higher the fractal dimension, the rougher the corresponding image; equivalently, the lower the Hurst exponent, the rougher the image. Even so, the Hurst exponent is equivalent in meaning to the fractal dimension. Therefore, the first manually generated feature is the Hurst exponent, and its corresponding dataset is denoted as the Hurst exponent dataset (or the H dataset, simply expressed as H), which was compared to the O dataset.
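As a quick illustration of the relation above—the snippet below is only a sketch, not part of the original pipeline—the following Python helpers convert between the Hurst exponent and the fractal dimension and quantize H to the 8-bit grayscale range used when saving characteristic maps:

```python
import numpy as np

def hurst_to_fractal_dim(H, image=True):
    # D = 3 - H for images; D = 2 - H for one-dimensional signals.
    return (3.0 if image else 2.0) - H

def hurst_to_gray(H):
    # H = 0 is saved as 0 (black) and H = 1 as 255 (white).
    return np.uint8(np.round(np.clip(H, 0.0, 1.0) * 255))

print(hurst_to_fractal_dim(0.2), hurst_to_gray(0.2))  # 2.8 51
```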
Considering the Hurst exponent as the first manually generated feature, I modeled the images of the original dataset as fractional Brownian images (FBIs), with the values of two-dimensional fractional Brownian motion (2D FBM) saved as images. Before the era of deep learning, it was quite common to model certain images as 2D FBM images and calculate their fractal dimension to distinguish the difference between normality and abnormality or further use the fractal dimension as an effective feature in the field of machine learning. Even in the era of deep learning, this feature seems to be useful when deep learning models are imperfect. As is well known, perfection or ideality appears to be impossible, although it can be expected that deep learning approaches will be able to automatically extract as many hidden features as possible in the long term.
For modeling, suppose that an FBI of size M × N is expressed as in the following equation:
I_B = \begin{bmatrix} B_{0,0} & B_{0,1} & \cdots & B_{0,N-1} \\ B_{1,0} & B_{1,1} & \cdots & B_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ B_{M-1,0} & B_{M-1,1} & \cdots & B_{M-1,N-1} \end{bmatrix}. (1)
Here, B_{m,n} (m = 0, 1, …, M − 1; n = 0, 1, …, N − 1) represents a grayscale pixel in an FBI.
The autocorrelation function (ACF) of any two pixels B_{x_1, y_1} and B_{x_2, y_2} is calculated using the following equation:
r_{BB}(x_1, y_1; x_2, y_2) = E\left[ B_{x_1, y_1} B_{x_2, y_2} \right] = \frac{\sigma^2}{2} \left[ \left( \sqrt{x_1^2 + y_1^2} \right)^{2H} + \left( \sqrt{x_2^2 + y_2^2} \right)^{2H} - \left( \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \right)^{2H} \right], (2)
where H stands for the Hurst exponent, and σ is the standard deviation of the 2D FBM. It is clear, from the equation, that the ACF is proportional to the sum of the two respective distances from the origin raised to the power 2H minus the distance between the coordinates of two pixels raised to the power 2H.
For its corresponding covariance matrix, I_B is first rearranged into a column vector of size MN × 1, as follows:
\mathbf{B}^T = \left[ \mathbf{B}_0^T \;\; \mathbf{B}_1^T \;\; \cdots \;\; \mathbf{B}_{M-1}^T \right], (3)
where
\mathbf{B}_m^T = \left[ B_{m,0} \;\; B_{m,1} \;\; \cdots \;\; B_{m,N-1} \right], \quad m = 0, 1, \ldots, M-1.
Based on the column vector, the corresponding covariance matrix is obtained as follows:
R_{BB} = E\left[ \mathbf{B} \mathbf{B}^T \right]. (4)
As the covariance matrix is positive definite [26], it can be decomposed through Cholesky factorization [27], according to the following equation:
R_{BB} = L L^T. (5)
Therefore, the data of the 2D FBM can be generated using the following equation:
\mathbf{B} = L \mathbf{w}, (6)
where w is white Gaussian noise of zero mean and unit variance. The data of the 2D FBM are numerical values. When the data of the 2D FBM are saved as an image, an FBI is generated.
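The Cholesky-based construction in Equations (2) and (4)–(6) can be sketched in a few lines of Python. This is only an illustration; the coordinate offset and the diagonal jitter are my own implementation assumptions to keep the covariance matrix numerically positive definite, not part of the original derivation.

```python
import numpy as np

def fbm2d(M=16, N=16, H=0.5, sigma=1.0, seed=0):
    """Synthesize a small 2D FBM field by Cholesky factorization (a sketch)."""
    rng = np.random.default_rng(seed)
    # Pixel coordinates flattened row by row, shifted by 1 so that no point
    # sits at the origin (B(0, 0) = 0 would make the covariance singular).
    coords = np.array([(m + 1, n + 1) for m in range(M) for n in range(N)], float)
    d0 = np.linalg.norm(coords, axis=1)                          # distances from the origin
    dij = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # ACF of Equation (2): r = sigma^2 / 2 * (|p|^2H + |q|^2H - |p - q|^2H).
    R = 0.5 * sigma**2 * (d0[:, None]**(2 * H) + d0[None, :]**(2 * H) - dij**(2 * H))
    L = np.linalg.cholesky(R + 1e-10 * np.eye(M * N))            # R_BB = L L^T, Equation (5)
    w = rng.standard_normal(M * N)                               # white Gaussian noise
    B = L @ w                                                    # B = L w, Equation (6)
    return B.reshape(M, N)                                       # reshape back to an M x N FBI

field = fbm2d(H=0.7)
print(field.shape)  # (16, 16)
```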
As R_{BB} is not stationary, for estimation of the Hurst exponent, Hoefer et al. [26] obtained a stationary covariance matrix through the following second increment process of the 2D FBM:
X_{m,n} = B_{m,n} - B_{m,n-1} - B_{m-1,n} + B_{m-1,n-1}, \quad m = 1, 2, \ldots, M-1; \; n = 1, 2, \ldots, N-1. (7)
This equation is also called the second partial derivative (SPD), and the resulting process is called two-dimensional fractional Gaussian noise (FGN2) [28]. For the sake of consistency, it is called the 2D FGN here. Based on the stationary covariance matrix, Chang [8] proposed an efficient maximum likelihood estimator of the Hurst exponent for the 2D FBM.
The procedure used to obtain the H dataset from the O dataset is as follows: First, I chose a sliding window of size 6 × 6 and set both the row and column strides to 1. Second, I took sub-images of the same size as the window from each original image in a zigzag pattern, that is, first by row and then by column. Third, I estimated the Hurst exponent of each sub-image using the approach described by Chang [8]. Finally, I saved all Hurst exponents as a grayscale image; that is, a Hurst exponent equal to 0 is mapped to 0, while a Hurst exponent equal to 1 is mapped to 255. Therefore, if the original image has a size of 127 × 384, then the transformed image will be 122 × 379.
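The sliding-window procedure can be sketched as follows; the Hurst-exponent estimator itself is only a placeholder here (the paper uses the maximum likelihood estimator of [8]):

```python
import numpy as np

def hurst_map(image, estimator, window=6):
    """Slide a window (stride 1) over a grayscale image and build a Hurst map.

    `estimator` is any callable returning the Hurst exponent of a small block;
    a 127 x 384 input with a 6 x 6 window yields a 122 x 379 map.
    """
    rows = image.shape[0] - window + 1
    cols = image.shape[1] - window + 1
    H = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            H[r, c] = estimator(image[r:r + window, c:c + window])
    # Save as an 8-bit grayscale image: H = 0 -> 0, H = 1 -> 255.
    return np.uint8(np.round(np.clip(H, 0.0, 1.0) * 255))
```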

2.3. The Second Partial Derivative Dataset

Intuitively, abnormal images will present more fluctuating appearances than normal ones. Similar to using the first derivative of a signal to express the variability of one-dimensional signals, I chose the SPD of an image to express the variability of two-dimensional signals or images. The original purpose of the SPD was to obtain a stationary process of a 2D FBM. In this paper, I utilized the variable characteristics of the SPD to serve as a possible feature.
Therefore, my second manually generated feature is the second partial derivative, and its corresponding dataset will be called the second partial derivative dataset (or the S dataset, simply expressed as S).
To obtain the S dataset, I transformed the original images to SPD images one by one. If the pixels of an image from the O dataset follow the notation of B in Equation (7), then the resulting matrix is the X in Equation (7). As the values of the resulting matrix may lie outside the grayscale range [0, 255], I adopted Min–Max normalization; that is, the minimum value in the matrix is mapped to 0, the maximum value to 255, and all other values are scaled linearly in between. Then, the matrix was saved as an image. The corresponding SPD images constitute the S dataset. As the original images are of size 127 × 384, the transformed images are 126 × 383.
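A minimal sketch of this transformation, assuming an 8-bit grayscale input that is not constant:

```python
import numpy as np

def spd_image(image):
    """Second partial derivative (Equation (7)) with Min-Max normalization."""
    B = image.astype(float)
    X = B[1:, 1:] - B[1:, :-1] - B[:-1, 1:] + B[:-1, :-1]   # 2D FGN / SPD
    X = (X - X.min()) / (X.max() - X.min()) * 255.0         # Min-Max to [0, 255]
    return np.uint8(np.round(X))                            # 127 x 384 -> 126 x 383
```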

2.4. The Average Information Content Dataset

As mentioned by Chang [29], the average information content (AIC) can provide the average information of an image. The higher the AIC, the more details there are in an image. Therefore, my third manually generated feature was the average information content [30,31], and its corresponding dataset will be called the average information content dataset (or the A dataset, simply expressed as A). According to the nature of normality and abnormality, the AIC seems to be a satisfactory index for the analysis conducted in this study. The AIC (or entropy) of an image with the grayscale values in the range [0, L − 1] (where 0 is viewed as black and L − 1 as white) is defined as follows [29]:
AIC = -\sum_{k=0}^{L-1} p_k \log_2 p_k, \quad p_k = n_k / N, \quad k = 0, 1, \ldots, L-1, (8)
where N is the number of pixels in an image, and n_k is the number of pixels whose gray level (or intensity) equals k. In the normal case, L is equal to 256. Obviously, the minimum possible value of the AIC is 0, and the maximum possible value is 8. To save it as an image, the AIC is normalized into the range [0, 1]; that is, it is divided by 8. Therefore, the normalized AIC is used as another feature. Wherever there is no possibility of confusion, the normalized AIC is simply called the AIC for conciseness.
Similar to generating the H dataset, in the case of the A dataset, the sliding window has a size of 8 × 8 rather than 6 × 6. I calculated the normalized AIC value of each sub-image and then saved all values as a grayscale image; in particular, a normalized AIC value equal to 0 is mapped to 0, while a normalized AIC value equal to 1 is mapped to 255. If the original image has a size of 127 × 384, the transformed image will be 120 × 377.
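A minimal sketch of the AIC map generation, assuming an 8-bit grayscale input (L = 256):

```python
import numpy as np

def normalized_aic(block, L=256):
    """Normalized average information content (entropy divided by log2(L))."""
    p = np.bincount(block.ravel(), minlength=L) / block.size
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(L)   # divide by 8 when L = 256

def aic_map(image, window=8):
    """Slide an 8 x 8 window (stride 1) over an image and build an AIC map.

    A 127 x 384 input yields a 120 x 377 map; AIC = 0 -> 0, AIC = 1 -> 255.
    """
    rows = image.shape[0] - window + 1
    cols = image.shape[1] - window + 1
    A = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            A[r, c] = normalized_aic(image[r:r + window, c:c + window])
    return np.uint8(np.round(A * 255))
```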

2.5. Deep Learning Models

In this pilot and pioneering study, I aim to determine whether I can easily increase the recognition rate in a scenario involving a limited resource; in particular, using only one modality dataset. Therefore, I kept the environments of the used deep learning models as simple as possible. For analysis and evaluation, I only chose the first 16 pre-trained models from the list of the available models of Keras applications [32]—namely, Xception, VGG16, VGG19, ResNet50, ResNet50V2, ResNet101, ResNet101V2, ResNet152, ResNet152V2, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, DenseNet169, and DenseNet201—as the basic models serving as single-stream models.
For practical applications, it is desirable to increase the overall performance as much as possible using a single original dataset. Therefore, when time permits, it is usually worthwhile to consider other models as basic models and then apply these basic models to the characteristic datasets derived from the original dataset, as well as to multiple-stream models. The ultimate goal is to maximize the overall performance under a limited resource (e.g., only one dataset), especially in the context of medical datasets, as it is generally difficult to gather medical data.
Notably, other available models from Keras applications [32] include NASNetMobile, NASNetLarge, EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, EfficientNetB4, EfficientNetB5, EfficientNetB6, EfficientNetB7, EfficientNetV2B0, EfficientNetV2B1, EfficientNetV2B2, EfficientNetV2B3, EfficientNetV2S, EfficientNetV2M, EfficientNetV2L, ConvNeXtTiny, ConvNeXtSmall, ConvNeXtBase, ConvNeXtLarge, and ConvNeXtXLarge.
A model appropriate for one dataset may not be effective for other datasets. How to choose appropriate models depends on the nature of the dataset, as well as its size. Therefore, we must learn to fit the available models to our dataset through trial and error and then determine the best combination of models and datasets.

2.6. Multiple-Stream Deep Learning Models

Based on the four utilized datasets—namely, the original dataset and the three transformed datasets, including the Hurst exponent dataset, the second partial derivative dataset, and the average information content dataset—I designed four kinds of multiple-stream models, including one-, two-, three-, and four-stream models. Figure 1, Figure 2, Figure 3 and Figure 4 show the schematic diagrams for the one-, two-, three-, and four-stream models, respectively. All four schematic diagrams are composed of four main blocks: the input block, the pre-trained model block, the fully connected layer block, and the output block.
For one-stream models, the inputs were from each of the four datasets; that is, the input to a model may be an image from the original dataset, the Hurst exponent dataset, the second partial derivative dataset, or the normalized average information content dataset. “Pre-trained model” here refers to any of the 16 pre-trained models mentioned previously, which were used to extract all possible hidden features. Then, all of these features were followed by four fully connected layers and one classification layer. In practical applications, the pre-trained models used here can be replaced with other, more suitable deep learning models, considering the used dataset. For simplicity, the number of artificial neurons with the ReLU activation function (simply called ReLU neurons) in the experiments was set to 512, 256, 128, and 64, respectively, from the first hidden layer to the fourth layer. Other parameters were set with their default values. The output was NORMAL or PNEUMONIA.
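The one-stream design above can be sketched as a Keras model. This is only an illustrative reading of that design; the three-channel replication of the grayscale inputs, the ImageNet weights, the global average pooling, and the single sigmoid output unit are my own assumptions, as these details are not fully specified here.

```python
from tensorflow.keras import layers, models, applications

def one_stream_model(input_shape=(127, 384, 3),
                     backbone_fn=applications.MobileNetV2):
    """Frozen pre-trained backbone + four ReLU FC layers (512, 256, 128, 64)."""
    backbone = backbone_fn(include_top=False, weights="imagenet",
                           input_shape=input_shape, pooling="avg")
    backbone.trainable = False                    # only the FC head is trained
    inputs = layers.Input(shape=input_shape)
    x = backbone(inputs)
    for units in (512, 256, 128, 64):
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)   # NORMAL vs. PNEUMONIA
    return models.Model(inputs, outputs)
```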
For two-stream models, two inputs from combinations of the four datasets were considered; however, Figure 2 only shows one of the dataset combinations: the original dataset and the Hurst exponent dataset. The pre-trained models could be any of the 16 pre-trained models mentioned above, and the features extracted by the two streams were simply concatenated together. Then, the concatenated features from the two streams were followed by four fully connected layers. For simplicity, these four fully connected layers were the same as those in the one-stream models. The output was also NORMAL or PNEUMONIA.
For the three-stream models, the three inputs came from combinations of the four datasets; however, Figure 3 only shows one of the dataset combinations: the original dataset, the Hurst exponent dataset, and the second partial derivative dataset. The pre-trained models could be any of the 16 pre-trained models. Then, the three models in the three streams were simply concatenated together, and the concatenated features were followed by four fully connected layers. For simplicity, these four fully connected layers were the same as those mentioned above. The output was also NORMAL or PNEUMONIA.
For the four-stream models, the four inputs were the four datasets, that is, the original dataset for the first-stream model, the Hurst exponent dataset for the second-stream model, the second partial derivative dataset for the third-stream model, and the average information content dataset for the fourth-stream model. The pre-trained models could be any of the 16 pre-trained models, and the four models from four streams were simply concatenated together. Then, the concatenated features were followed by four fully connected layers. For simplicity, these four fully connected layers were the same as those mentioned above. The output was also NORMAL or PNEUMONIA.
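A sketch of the corresponding n-stream fusion, reusing backbones built as in the one-stream sketch (each backbone is built with the input shape of its own dataset; the concatenation point and the sigmoid output are again my own assumptions):

```python
from tensorflow.keras import layers, models

def multi_stream_model(backbones, input_shapes):
    """One frozen backbone per stream, concatenated features, shared FC head."""
    inputs, features = [], []
    for backbone, shape in zip(backbones, input_shapes):
        backbone.trainable = False
        inp = layers.Input(shape=shape)
        inputs.append(inp)
        features.append(backbone(inp))            # pooled features of one stream
    x = layers.Concatenate()(features)            # simple feature fusion
    for units in (512, 256, 128, 64):
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)
```

For example, a two-stream model could pair a backbone taking 127 × 384 original images with another taking 122 × 379 Hurst exponent images; the fused features then pass through the same four fully connected layers as in the one-stream case.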

3. The Proposed Scheme

In this study, the original input images for the deep learning models were obtained from the Chest X-Ray dataset from Kaggle [25]. For the sake of generalization and fair comparison, I cropped the original images as they were unequal in size, with the cropped images serving as input images to the deep learning models. When there is no possibility of confusion, these cropped images are referred to as the original (O) images. In addition, I adopted three characteristic maps, or images transformed from the cropped images, including Hurst exponent (H) images, second partial derivative (SPD, simply S) images, and normalized average information content (AIC, simply A) images. Therefore, four datasets in total were used in the experiments: one is the original dataset, while the other three were transformed from the original dataset.
To understand whether these characteristic images derived through feature engineering can improve the overall performance under a limited resource, I designed multiple-stream models to integrate all features, each stream model using its respective input images. In total, four kinds of model combinations or multiple-stream models were considered: namely, one-, two-, three-, and four-stream models.
In this pilot and pioneering study, I aim to determine, under limited resources (i.e., the Chest X-Ray dataset) and using ready-made models (the available pre-trained models from Keras applications [32]), whether I could increase the overall performance through the use of feature engineering and multiple-stream models. Therefore, I adopted a one-stream model run under the original images as an experimental benchmark for comparison. For the one-stream models, I adopted 16 pre-trained models from Keras applications [32] for analysis and comparison. They are listed in Table 1, where M1 stands for the first chosen pre-trained model, M2 for the second chosen pre-trained model, and so on.
These 16 pre-trained models were also run on the three characteristic image datasets for comparison. Therefore, four datasets were used as inputs to the same 16 pre-trained models. My aim was to determine whether these 16 pre-trained models, when taking the characteristic images as input, outperform those with the original images as input. For reliable comparison, each case was executed twice, and their means and standard deviations were recorded for comparison. Therefore, a total of 128 (4 datasets × 2 times × 16 models) experiments were run under the one-stream model.
Based on the average accuracy of two runs for four different datasets, I chose the top two models—that is, the two models with the best accuracies—as the subsequent experimental models for use in multiple-stream models, including the two-, three-, and four-stream models. For convenience, I denote the best model on the original dataset as O-B1 and the second-best model as O-B2; the best model on the Hurst exponent dataset as H-B1 and the second-best model in this regard as H-B2; the best model on the second partial derivative dataset as S-B1 and the second-best model on this dataset as S-B2; and the best model on the average information content dataset as A-B1 and the second-best model as A-B2. Therefore, a total of 8 (4 × 2) models remained for use in the subsequent experiments considering multiple-stream models. Each stream adopts one model, and each model is among the two selected best models regarding the used dataset.
For multiple-stream models—meaning that multiple inputs are used—the number of combinations for inputs is a combinatorial problem [33]; that is, I had to choose two, three, or four elements from four objects (i.e., datasets) as inputs. Therefore, for the two-stream models, there were six input combinations; for three-stream models, there were four input combinations; and, for four-stream models, only one input combination.
For two-stream models, there were six input combinations in total: O-H (O and H as inputs), O-S (O and S), O-A (O and A), H-S (H and S), H-A (H and A), and S-A (S and A). As I chose the two best models on each dataset to run each combination, based on the counting principle [33], the number of model combinations for each input combination was 4 (2 × 2). Therefore, in total, I had to execute 24 (6 × 4) experiments, reflecting the number of input combinations (6) multiplied by the number of model combinations (4). In addition, I executed 2 experiments for each two-stream model, for a total of 48 experiments.
For the three-stream models, there were four input combinations in total: O-H-S (O, H, and S as inputs), O-H-A (O, H, and A), O-S-A (O, S, and A), and H-S-A (H, S, and A). Likewise, I only focused on the two best models on each dataset to run each input combination, and, therefore, each input combination contained eight model combinations. In total, I executed 32 (4 × 8) experiments: the number of input combinations (4) multiplied by the number of model combinations (8). In addition, I executed 2 experiments for each three-stream model, for a total of 64 experiments.
For four-stream models, there was only one input combination: O-H-S-A (O, H, S, and A as inputs). Similarly, I only focused on the 2 best models on each dataset to run each input combination, and, therefore, this input combination contained 16 model combinations. In total, I executed 16 (1 × 16) experiments: the number of input combinations (1) multiplied by the number of model combinations (16). In addition, I executed 2 experiments for each four-stream model, for a total of 32 experiments. The counting above can be reproduced with the short sketch below.
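The following snippet simply enumerates the dataset and model combinations; the B1/B2 labels are the placeholders defined earlier in this section:

```python
from itertools import combinations, product

datasets = ["O", "H", "S", "A"]
best_models = {d: [f"{d}-B1", f"{d}-B2"] for d in datasets}

for k in (2, 3, 4):
    runs = 0
    for subset in combinations(datasets, k):                      # dataset combinations
        for combo in product(*(best_models[d] for d in subset)):  # model combinations
            runs += 1
    print(k, runs)   # 2 -> 24, 3 -> 32, 4 -> 16 (before the x2 repetitions)
```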

4. Experimental Results and Discussion

The original dataset used in this study was an open dataset—the Chest X-Ray dataset—obtained from Kaggle [25]. Originally, the dataset was composed of three folders (train, test, and val), where each folder consisted of two subfolders reflecting the two categories (NORMAL and PNEUMONIA). In total, there were 5863 grayscale JPEG images, with the train folder containing 5216 images (1314 NORMAL and 3875 PNEUMONIA), the test folder containing 624 images (234 NORMAL and 390 PNEUMONIA), and the val folder containing 16 images (8 NORMAL and 8 PNEUMONIA). Figure 5 shows two sample images (H × W or R × C = 2090 × 1858 and 1422 × 1152) from folder train\NORMAL, while Figure 6 shows two sample images (712 × 439 and 1240 × 840) from folder train\PNEUMONIA. For clarity of illustration, the pairs of images have been resized to the same height (4.92 cm for Figure 5; 4 cm for Figure 6).
Except for their different sizes, it is obvious from Figure 5 and Figure 6 that NORMAL images contain more peripheral parts than PNEUMONIA ones. Therefore, before applying these images, we need to understand the size differences among images in order to avoid wrongly interpreting the performance of the deep learning models, especially when the size difference may dominate during model training. Table 2 lists the statistics of the sizes in the two categories, where R stands for the number of pixels by row, C for the number of pixels by column, and RC for the number of pixels per image; the mean and standard deviation (std.) are both rounded to the nearest whole number.
Based on the means, it is obvious that NORMAL images are much larger than PNEUMONIA ones. According to experience, the distribution of image sizes can severely influence the classification results, potentially leading to biased judgments. As is well known, larger images generally contain extra components, objects, or contents. When trained, deep learning models will learn these peripheral contents rather than genuine information specific to pneumonia, thereby leading the models to make decisions according to these irrelevant features. As a result, deep learning models may obtain a higher accuracy on the same dataset, whereas the trained models will generally fail to transfer to other datasets due to a lack of generalization ability.
To obtain deep learning models with more generalization ability, the images of the original dataset need to be fairly pre-processed. Based on the center of each image, I cropped the original images to images of size 127 × 384—the smallest size in the dataset. All cropped images corresponding to the original images constituted the cropped dataset. In the following experiments, for the sake of conciseness and convenience, I refer to the cropped dataset as the original dataset (O) when there is no possibility of confusion. Figure 7 shows cropped images corresponding to those in Figure 5, and Figure 8 shows those corresponding to Figure 6.
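A minimal sketch of this center-cropping step, assuming grayscale images loaded as NumPy arrays:

```python
import numpy as np

def center_crop(image, target=(127, 384)):
    """Crop an image to 127 x 384 (the smallest size in the dataset),
    keeping the crop centered on the original image."""
    h, w = image.shape[:2]
    th, tw = target
    top, left = (h - th) // 2, (w - tw) // 2
    return image[top:top + th, left:left + tw]
```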
It is obvious that the cropped images focus more on the lung region rather than the peripheral parts, which generally contain contents irrelevant to pneumonia. It can be expected that deep learning models trained using the cropped dataset will be more accurate than those trained using the original dataset.
The Hurst exponent images corresponding to the cropped images in Figure 7 and Figure 8 are shown in Figure 9 and Figure 10; the second partial derivative images corresponding to the cropped images in Figure 7 and Figure 8 are shown in Figure 11 and Figure 12; and the normalized average information content images corresponding to the cropped images in Figure 7 and Figure 8 are shown in Figure 13 and Figure 14, respectively.
Furthermore, for reliable and fair comparison, I adopted 5-fold cross-validation and, therefore, pooled all normal images from the three subfolders (train, test, and val) to form the folder NORMAL and all pneumonia images from the three subfolders (train, test, and val) to form the folder PNEUMONIA. Therefore, the number of images in the NORMAL folder was 1583, while that in the PNEUMONIA folder was 4273. In total, there were 5856 images.
For fair comparison, all deep learning models were implemented in the following operating environment: (1) the computer (HP Z4 G5 Workstation), manufactured by HP (Taipei, Taiwan), had an Intel® Xeon(R) w3-2435 CPU with a GPU processor (NVIDIA RTX A4500); (2) the programming language was Python 3.9 with tensorflow-gpu 2.6; (3) the optimizer was adaptive moment estimation (adam); (4) the batch size was 16; (5) the number of epochs was 12; and (6) the other parameters were set to their default values.
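Under these settings, the evaluation protocol can be sketched as follows; `build_model`, `images`, and `labels` are placeholders for one of the model builders above and the pooled NORMAL/PNEUMONIA data, and the use of scikit-learn's StratifiedKFold is my own assumption for illustration:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(build_model, images, labels, n_splits=5):
    """5-fold cross-validation with the training settings listed above."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(images, labels):
        model = build_model()
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.fit(images[train_idx], labels[train_idx],
                  batch_size=16, epochs=12, verbose=0)
        _, acc = model.evaluate(images[test_idx], labels[test_idx], verbose=0)
        accuracies.append(acc)
    return float(np.mean(accuracies))   # mean accuracy over the 5 folds
```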

4.1. One-Stream Models

For one-stream models, the 16 pre-trained models were executed twice on the four datasets, namely, the original dataset (O) and the other three transformed datasets, including the Hurst exponent dataset (H), the second partial derivative dataset (S), and the average information content dataset (A). Then, I calculated the means and standard deviations of the accuracies of the two runs. For the multiple-stream models, I chose the two best models, according to these 16 means, as subsequent experimental models. Table 3 lists the results on the original dataset, where R1 stands for the means of accuracies in 5-fold cross-validation in the first run and R2 for the means of accuracies in 5-fold cross-validation in the second run. Corresponding tables (Table A1 and Table A2) providing detailed data on the 5-fold cross-validation results are provided in Appendix A. As mentioned in Table 1, M1 stands for the Xception model, M2 for VGG16, and so on.
From Table 3, it can be seen that the best four models were M13 (91.67%), M3 (91.21%), M16 (90.97%), and M2 (90.88%), that is, MobileNetV2 (O-B1), VGG19 (O-B2), DenseNet201 (O-B3), and VGG16 (O-B4). The best two models, M13 and M3, were chosen for use in the multiple-stream models.
Table 4 lists the results obtained on the Hurst exponent dataset, where R1 and R2 have similar meanings as above. The corresponding tables (Table A3 and Table A4), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
From Table 4, the best four models were M3 (92.59%), M2 (92.10%), M14 (91.15%), and M13 (90.36%), that is, VGG19 (H-B1), VGG16 (H-B2), DenseNet121 (H-B3), and MobileNetV2 (H-B4). The best two models, M3 and M2, were chosen for use in the multiple-stream models. Obviously, these were both superior to the best model (91.67%) on the original dataset, indicating that appropriate feature selection is beneficial in terms of model performance. It can be seen, from this result, that the domain knowledge of feature engineering is still important in the context of deep learning.
Table 5 lists the results obtained on the second partial derivative dataset, where R1 and R2 are as defined above. The corresponding tables (Table A5 and Table A6), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
From Table 5, it can be seen that the best four models were M2 (93.03%), M3 (92.97%), M13 (91.07%), and M16 (90.99%), that is, VGG16 (S-B1), VGG19 (S-B2), MobileNetV2 (S-B3), and DenseNet201 (S-B4). The best two models, M2 and M3, were chosen for use in the multiple-stream models. It is also clear that these models are both superior to the best model (91.67%) run on the original dataset, further indicating that suitable feature selection through feature engineering is helpful in promoting the overall performance of the considered deep learning models. Therefore, these results also emphasize that the domain knowledge obtained through feature engineering is still important in deep learning.
Table 6 lists the results obtained on the average information content dataset, with R1 and R2 defined as above. The corresponding tables (Table A7 and Table A8), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
From Table 6, it can be seen that the best four models were M2 (88.32%), M3 (87.74%), M13 (86.78%), and M12 (85.74%), that is, VGG16 (A-B1), VGG19 (A-B2), MobileNetV2 (A-B3), and MobileNet (A-B4). The best two models, M2 and M3, were chosen for use in the multiple-stream models. Notably, the best model (88.32%) obtained on the average information content dataset only ranked in the eleventh position, according to the results obtained on the original dataset. This indicates that not all features selected through feature engineering are effective in improving the overall performance. Therefore, properly implementing feature engineering approaches plays a very important role in the field of deep learning.

4.2. Two-Stream Models

Based on the results obtained with the one-stream models, I chose the best two of the 16 models on each of the four datasets as the inputs to the multiple-stream models. These were MobileNetV2 and VGG19 on the original dataset, VGG19 and VGG16 on the Hurst exponent dataset, VGG16 and VGG19 on the second partial derivative dataset, and VGG16 and VGG19 on the average information content dataset.
For the two-stream models, there were a total of six input combinations (i.e., dataset combinations), with each combination composed of four model combinations. Table 7 lists the results for all 24 combinations of datasets and models, where the Datasets column indicates the six combinations of datasets, and the Models column indicates the four combinations of the best two models. The R1 column provides the means of accuracies in 5-fold cross-validation in the first run, while the R2 column provides the means of accuracies in 5-fold cross-validation in the second run. The corresponding tables (Table A9 and Table A10), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
For example, the O-H notation indicates that the first-stream model in a two-stream model adopts the original dataset (O) as input, while the second-stream model adopts the Hurst exponent dataset (H) as input. The B1-B1 notation indicates that the first-stream model is the best model (MobileNetV2) for the original dataset (O), while the second-stream model is the best model (VGG19) for the Hurst exponent dataset (H).
For ease of comparison, I averaged the four accuracies of each dataset combination, which are listed in Table 8, where the S1 column stands for the best accuracy among all first-stream models, the S2 column for the best accuracy among all second-stream models, and the S12 column for the mean of four accuracies for each group of datasets. For example, the 94.09% in the O-H row is the mean accuracy of the four model combinations for the O and H datasets, while the best accuracy (B1-B1) was 94.52%.
Obviously, through feature engineering and the use of two-stream models, all accuracies were better than the best accuracies of the individual one-stream models; for example, the average accuracy of the two-stream models on the O-H dataset combination reached up to 94.09%, higher than the best accuracies of 91.67% on O and 92.59% on H. Although the worst results were obtained in the O-A case, the average accuracy (91.69%) of the two-stream models with the O-A dataset combination was still higher than the best accuracies of 91.67% on O and 88.32% on A.

4.3. Three-Stream Models

Similar to the experiments considering two-stream models, each single-stream model was determined from the best two models on their corresponding datasets. Therefore, in the three-stream models, there were four input combinations (or dataset combinations), with each combination consisting of eight model combinations. In total, there were 32 combinations of datasets and models.
Table 9 lists the obtained results for these combinations, where the Datasets and Models columns are similar to those mentioned above, as well as the R1 and R2 columns. The corresponding tables (Table A11 and Table A12), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
For example, the O-H-S notation indicates that the first-stream model of a three-stream model adopts the original dataset (O) as input, the second-stream model adopts the Hurst exponent dataset (H) as input, and the third-stream model adopts the second partial derivative dataset (S) as input. The B1-B1-B1 notation indicates that the best models on the respective datasets are used: the first-stream model (MobileNetV2) for the original dataset (O), the second-stream model (VGG19) for the Hurst exponent dataset (H), and the third-stream model (VGG16) for the second partial derivative dataset (S).
For ease of comparison, I averaged the eight accuracies under each dataset combination, which are listed in Table 10, where the S1 column denotes the best accuracy among all first-stream models, the S2 column the best accuracy among all second-stream models, the S3 column the best accuracy among all third-stream models, and the S123 column the mean of the eight accuracies for each group of datasets. For example, the 94.32% in the O-H-S row is the mean accuracy of the eight model combinations on the O, H, and S datasets, while the best accuracy (B1-B2-B2) was 94.73%.
Obviously, through the use of feature engineering and three-stream models, all accuracies were better than the best accuracies obtained by the individual one-stream models. For example, the average accuracy of the three-stream models on the O-H-S dataset combination reached up to 94.32%, higher than the best accuracies of 91.67% (O), 92.59% (H), and 93.03% (S). Even in the worst case, observed for the O-S-A combination, the average accuracy (93.72%) of this three-stream model was still higher than the best accuracies of 91.67% (O), 93.03% (S), and 88.32% (A).
Among the two-stream models, the best combination was observed in the O-H case, that is, the original dataset together with the Hurst exponent dataset. The average accuracy in the O-H case was up to 94.09%. Unsurprisingly, the best two three-stream models both combined the O-H case with one of the other two datasets, namely, the second partial derivative dataset and the average information content dataset (O-H-S and O-H-A), with average accuracies of 94.32% and 94.16%, respectively.

4.4. Four-Stream Models

Similar to the previous experiments involving two- and three-stream models, each model in the four-stream case utilized the best two models for the corresponding datasets. Therefore, for the four-stream models, there was only one input combination (or dataset combination), with this combination consisting of 16 model combinations. Therefore, in total, there were 16 combinations of datasets and models.
Table 11 lists the obtained results, where the Datasets, Models, R1, and R2 columns are as defined above. Corresponding tables (Table A13 and Table A14), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
In particular, the O-H-S-A notation indicates that the first-stream model of the four-stream model adopted the original dataset (O) as input, the second-stream model adopted the Hurst exponent dataset (H) as input, the third-stream model adopted the second partial derivative dataset (S) as input, and the fourth-stream model adopted the average information content dataset (A) as input. Meanwhile, the B1-B1-B1-B1 notation indicates that the chosen models were the best models (MobileNetV2, VGG19, VGG16, and VGG16) on the original, Hurst exponent, second partial derivative, and average information content datasets, respectively.
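These 16 model combinations are simply the Cartesian product of the two candidate models (B1 and B2) over the four streams. A minimal Python sketch (for illustration only) that enumerates them is as follows:

# Enumerate the 16 model combinations of the four-stream model:
# B1 or B2 is chosen independently for each of the four streams.
from itertools import product

combos = list(product(("B1", "B2"), repeat=4))
print(len(combos))   # 16 model combinations
print(combos[0])     # ('B1', 'B1', 'B1', 'B1'), i.e., the B1-B1-B1-B1 combination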
For ease of comparison, I averaged the 16 accuracies under this dataset combination and list the results in Table 12, where the S1 column stands for the best accuracy for the first-stream model, the S2 column for the best accuracy for the second-stream model, the S3 column for the best accuracy for the third-stream model, the S4 column for the best accuracy for the fourth-stream model, and the S1234 column for the mean of the 16 accuracies. For example, the 94.28% in the O-H-S-A row is the mean accuracy of the 16 model combinations on the O, H, S, and A datasets, where the best accuracy (B1-B2-B2-B2) was 94.79%.
Obviously, through feature engineering, the average accuracy (94.28%) of the proposed four-stream model was still better than the best accuracy (93.03%) obtained by the individual one-stream models. However, it was slightly lower than the best average accuracy (94.32%) among the proposed three-stream models. This phenomenon indicates that not all characteristic images or maps can contribute effectively to the final performance, which depends on how they are generated based on domain knowledge of the adopted dataset, as well as on whether the generated features can complement each other.

4.5. Multiple-Stream Models for the Original Dataset

In the previous experiments using the multiple-stream models, the inputs came from four different datasets: the original cropped X-ray dataset (O) and the three characteristic datasets (H, S, and A). On the surface, four datasets were used; in fact, however, only one dataset was effectively utilized, as the other three datasets were transformed from the original dataset through domain knowledge regarding it. In essence, the four datasets stem from the same source. In general, multiple-stream models are applied to multiple modalities; that is, each input differs from the others. Here, in contrast, only one original resource was available. Therefore, I also wished to determine whether multiple-stream models are suitable when only one modality (indeed, one and the same dataset) is used for every input.
Considering models with up to four streams, I chose the four pre-trained models with the highest accuracies as the basic models for the subsequent two-, three-, and four-stream experiments. In the case of multiple-stream models using the original dataset, all inputs were the same, and, hence, each stream model had to differ from the others. Therefore, the number of model combinations was six for the two-stream models, four for the three-stream models, and one for the four-stream models.
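These counts follow from choosing distinct models for identical inputs; the following minimal Python sketch (for illustration only) enumerates the corresponding model combinations:

# With four candidate models and identical inputs, each stream must use a
# different model, giving C(4,2) = 6, C(4,3) = 4, and C(4,4) = 1 combinations.
from itertools import combinations

best_models = ("B1", "B2", "B3", "B4")        # the four best pre-trained models
for n_streams in (2, 3, 4):
    combos = list(combinations(best_models, n_streams))
    print(n_streams, len(combos))             # 2 -> 6, 3 -> 4, 4 -> 1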
Table 13 lists the results for the two-stream models run on the same dataset using the best four of the 16 pre-trained models, where R1 and R2 are as defined above. Corresponding tables (Table A15 and Table A16), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A. As above, B1 stands for the model with the best accuracy among the 16 pre-trained models, B2 for the model with the second-best accuracy, B3 for the model with the third-best accuracy, and B4 for the model with the fourth-best accuracy.
For ease of comparison, Table 14 lists the mean accuracies obtained with the two-stream models using the B1-B2 model combination. These also represent the best accuracy for each dataset combination, except for the O-O dataset combination, for which the best accuracy was 92.94%, obtained with the B1-B3 model combination.
Table 15 lists the results for the three-stream models run on the same dataset for the best four models of the 16 pre-trained models, with R1 and R2 defined as above. Corresponding tables (Table A17 and Table A18), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
For ease of comparison, Table 16 lists the mean accuracies obtained with the three-stream models for B1, B2, and B3. These represent the best accuracy for each dataset combination, except for the H-H-H dataset combination, for which the best accuracy was 92.36% for the B2-B3-B4 model combination.
Table 17 lists the results for the four-stream models run on the same dataset using the best four models of the 16 pre-trained models, with R1 and R2 defined as above. Corresponding tables (Table A19 and Table A20), with detailed data regarding the 5-fold cross-validation results, are provided in Appendix A.
For ease of comparison, Table 18 lists the mean accuracies obtained with the four-stream models for B1, B2, B3, and B4.
For ease of comparison and discussion, Table 19 summarizes the accuracies of the multiple-stream models, and Table 20 lists the differences in accuracy among the multiple-stream models. In Table 19 and Table 20, the notation “S.” stands for “stream”.
It is obvious, from Table 19 and Table 20, that only when the original dataset (O) is utilized as the single input does the accuracy continue to increase with the number of streams; that is, the mean accuracy of the four-stream models is higher than that of the three-stream models, which in turn is higher than that of the two-stream models, which is higher than that of the one-stream models. For the other three characteristic datasets (the H, S, and A datasets), the behavior was unstable.
The results on the original dataset imply that no single pre-trained model has yet extracted all of its hidden features. Although using the other three datasets did not make the results of the three- and four-stream models all superior to those of the one-stream models, the results of all two-stream models were superior to those of the one-stream models. This can be explained by the example images shown in Figures 7–14: the original dataset presents richer structural information, whereas the three characteristic datasets appear to provide simpler structural information. Therefore, we can continue to extract more hidden features from the original dataset but not from the other three characteristic datasets.
In fact, this matches our intuition that more complex models suit datasets with richer information or appearance, whereas simpler models are more suitable for datasets with concise information or appearance. The experimental results indicate that two-stream models are sufficient to extract most of the information from these three structurally simpler characteristic datasets. Just as in feature engineering for machine learning models, the features were manually extracted from the structurally complicated original images. Such extracted features generally compress high-dimensional information into low-dimensional information, and, therefore, their intrinsic structure must be simpler than that of the original images.

4.6. Discussion

To experimentally prove that feature engineering still works in the present environment of deep learning, in this study, by carrying out feature engineering based on the original dataset, I proposed three types of characteristic images and established the corresponding characteristic datasets, namely, the Hurst exponent dataset, the second partial derivative dataset, and the average information content dataset. Including the original dataset, four datasets in total were considered in this study. Therefore, I designed four categories of multiple-stream models using 16 pre-trained models; namely, one-, two-, three-, and four-stream models.
For better generalization and fair comparison, I performed 5-fold cross-validation in the experiments. For higher reliability, every experiment was executed twice. For the sake of simplicity, all basic models for the one-stream models were chosen from the 16 pre-trained models, each followed by four fully connected layers and one classification layer. No fine-tuning or further training was imposed on these pre-trained models. As four datasets (O, H, S, and A) were considered, for each run, I trained 64 (4 × 16) one-stream models, for a total of 128 experiments.
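For concreteness, the following Python/Keras sketch outlines a one-stream model of the kind described above: a frozen ImageNet-pretrained backbone followed by four fully connected layers and a classification layer. It is only a minimal illustration; the input size, layer widths, activation functions, and training settings are assumptions for the sketch rather than the exact configuration used in the experiments.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_one_stream(backbone_name="MobileNetV2", input_shape=(224, 224, 3),
                     fc_units=(512, 256, 128, 64), num_classes=2):
    # Frozen ImageNet-pretrained backbone (no fine-tuning, as in the experiments).
    backbone_cls = getattr(tf.keras.applications, backbone_name)
    backbone = backbone_cls(include_top=False, weights="imagenet",
                            input_shape=input_shape, pooling="avg")
    backbone.trainable = False

    inputs = layers.Input(shape=input_shape)
    x = backbone(inputs, training=False)       # model-specific preprocessing omitted
    for units in fc_units:                     # four fully connected layers
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_one_stream()                     # e.g., the O-B1 (MobileNetV2) model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])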
Considering efficiency and meaningfulness in the subsequent experiments, only the 2 best-performing models among the 16 pre-trained models were chosen as the basic models for use in each stream of the multiple-stream models. According to the average accuracies between two runs (R1 and R2) for the four datasets, the best two models for the original dataset (O) were MobileNetV2 (O-B1) and VGG19 (O-B2); the best two models for the Hurst exponent dataset (H) were VGG19 (H-B1) and VGG16 (H-B2); the best two models for the second partial derivative dataset (S) were VGG16 (S-B1) and VGG19 (S-B2); and the best two models for the average information content dataset (A) were VGG16 (A-B1) and VGG19 (A-B2).
Accordingly, for the two-stream models, there were six combinations of datasets (O-H, O-S, O-A, H-S, H-A, and S-A) and four model combinations per dataset combination (B1-B1, B1-B2, B2-B1, and B2-B2). Therefore, for each run, there were 24 (6 × 4) combinations of datasets and models and, hence, 48 experiments in total. Taking the dataset combination O-H and its possible model combination B1-B1 as an example, the inputs of the first- and second-stream models were the original dataset and the Hurst exponent dataset, and the first- and second-stream models were O-B1 and H-B1 (i.e., MobileNetV2 and VGG19), respectively.
For the three-stream models, there were four combinations of datasets (O-H-S, O-H-A, O-S-A, and H-S-A) and eight model combinations per dataset combination (B1-B1-B1, B1-B1-B2, B1-B2-B1, B1-B2-B2, B2-B1-B1, B2-B1-B2, B2-B2-B1, and B2-B2-B2). Therefore, for each run, there were 32 (4 × 8) combinations of datasets and models and, hence, 64 experiments in total. Taking the dataset combination O-H-S and its possible model combination B1-B1-B1 as an example, the inputs of the first-, second-, and third-stream models were the original dataset, the Hurst exponent dataset, and the second partial derivative dataset, and the first-, second-, and third-stream models were O-B1, H-B1, and S-B1 (i.e., MobileNetV2, VGG19, and VGG16), respectively.
For the four-stream models, there was only one dataset combination (O-H-S-A) and 16 model combinations per dataset combination (B1-B1-B1-B1, B1-B1-B1-B2, B1-B1-B2-B1, B1-B1-B2-B2, B1-B2-B1-B1, B1-B2-B1-B2, B1-B2-B2-B1, B1-B2-B2-B2, B2-B1-B1-B1, B2-B1-B1-B2, B2-B1-B2-B1, B2-B1-B2-B2, B2-B2-B1-B1, B2-B2-B1-B2, B2-B2-B2-B1, and B2-B2-B2-B2). Therefore, for each run, there were 16 (1 × 16) combinations of datasets and models, for a total of 32 experiments. Taking the dataset combination O-H-S-A and its possible model combination B1-B1-B1-B1 as an example, the inputs of the first-, second-, third-, and fourth-stream models were the original dataset, the Hurst exponent dataset, the second partial derivative dataset, and the average information content dataset, and the first-, second-, third-, and fourth-stream models were O-B1, H-B1, S-B1, and A-B1 (i.e., MobileNetV2, VGG19, VGG16, and VGG16), respectively.
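A corresponding sketch of a multiple-stream model is given below, taking the four-stream O-H-S-A combination with the B1-B1-B1-B1 model combination as an example. Each stream receives its own input and frozen backbone, and the stream features are fused before the fully connected layers; concatenation is assumed here as the fusion operator, and the layer settings are again illustrative.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_multi_stream(backbone_names, input_shape=(224, 224, 3),
                       fc_units=(512, 256, 128, 64), num_classes=2):
    inputs, features = [], []
    for name in backbone_names:                # one stream per input dataset
        inp = layers.Input(shape=input_shape)
        net = getattr(tf.keras.applications, name)(
            include_top=False, weights="imagenet",
            input_shape=input_shape, pooling="avg")
        net.trainable = False                  # frozen, as in the one-stream case
        inputs.append(inp)
        features.append(net(inp, training=False))
    x = layers.Concatenate()(features)         # feature fusion across the streams
    for units in fc_units:                     # shared fully connected layers
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

# O-H-S-A with B1-B1-B1-B1: MobileNetV2, VGG19, VGG16, and VGG16.
four_stream = build_multi_stream(["MobileNetV2", "VGG19", "VGG16", "VGG16"])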
In order to evaluate whether feature engineering can contribute to the field of deep learning, I calculated the maximum, average, and minimum accuracies of the one-stream models executed on the original dataset (O) across the 16 pre-trained models as benchmarks. Each value was averaged over the two runs, and the resulting statistics were 91.67%, 89.02%, and 85.63%, respectively.
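These benchmark statistics can be cross-checked from the Mean columns of Table A1 and Table A2: each model's accuracy is first averaged over the two runs, and the maximum, mean, and minimum are then taken over the 16 models, as in the following short NumPy sketch.

import numpy as np

# Mean accuracies (%) of models M1-M16 on the original dataset, taken from the
# Mean columns of Table A1 (first run) and Table A2 (second run).
run1 = np.array([88.37, 90.73, 91.32, 88.22, 88.30, 85.74, 88.66, 86.51,
                 88.27, 87.74, 87.50, 90.97, 91.51, 89.89, 90.10, 91.09])
run2 = np.array([88.32, 91.04, 91.10, 88.13, 88.06, 85.52, 88.54, 85.50,
                 88.23, 88.35, 89.21, 90.51, 91.82, 89.79, 88.76, 90.85])

per_model = (run1 + run2) / 2                  # average each model over the two runs
print(per_model.max().round(2),                # 91.67 (M13, MobileNetV2)
      per_model.mean().round(2),               # 89.02
      per_model.min().round(2))                # 85.63 (M6, ResNet101)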
The same experiments and operations as on the original dataset were then performed, and the maximum, average, and minimum accuracies of the 16 pre-trained models on the Hurst exponent dataset (H) were 92.59% (↑), 88.86% (↓), and 86.51% (↑), respectively. Obviously, the Hurst exponent dataset yielded higher maximum and minimum accuracies than the original dataset, but its average accuracy was slightly lower. This indicates that the proposed characteristic images—namely, those in the Hurst exponent dataset—derived through feature engineering have good potential to promote the performance on the basis of the original dataset.
For the second partial derivative dataset (S), the maximum, average, and minimum accuracies for the 16 pre-trained models were 93.03% (↑), 89.93% (↑), and 87.57% (↑), respectively. It is obvious that all three indicators on the second partial derivative dataset were higher than those on the original dataset, inspiring confidence that the proposed characteristic images—namely, those in the second partial derivative dataset—derived through feature engineering can enhance the performance on the basis of the original dataset.
For the average information content dataset (A), the maximum, average, and minimum accuracies of the 16 pre-trained models were 88.32% (↓), 85.20% (↓), and 82.71% (↓), respectively. It is obvious that all three indicators from this dataset were lower than those from the original dataset; that is, not all characteristic images are appropriate for promoting the performance of the original dataset.
These results obtained with the proposed characteristic datasets indicate that feature engineering is not only applicable in the context of machine learning; instead, the domain knowledge used for feature engineering can also play a crucial role in deep learning. An appropriately selected feature will contribute to raising the performance relative to the original dataset. In addition, simpler models are more suitable for simpler structural information; accordingly, the one-stream models performed better on the two characteristic datasets (H and S), which have simpler structural information, than on the original dataset.
On the other hand, to better utilize all four datasets, as mentioned above, I designed multiple-stream models to improve the overall performance; that is, under the limited existing resources, I attempted to leverage the proposed datasets to make further progress. In the two-stream setting, the average accuracies over the four model combinations for the six dataset combinations were all superior to the best one-stream accuracy (91.67%) on the original dataset. Among these six average accuracies, the maximum (94.09%) was obtained with the dataset combination O-H, with a promotion rate of 2.65% (i.e., from 91.67% to 94.09%), while the second-best (93.81%) was obtained with the combination O-S, with a promotion rate of 2.34% (from 91.67% to 93.81%).
In the three-stream setting, the average accuracies over the eight model combinations for the four dataset combinations were all superior to that (91.67%) on the original dataset. Among these four average accuracies, the maximum (94.32%) was obtained with the dataset combination O-H-S, with a promotion rate of 2.89% (from 91.67% to 94.32%), while the second-best (94.16%) was obtained with the combination O-H-A, with a promotion rate of 2.72% (from 91.67% to 94.16%).
In the four-stream setting, the average accuracy (94.28%) over all 16 model combinations was superior to that (91.67%) on the original dataset, with a promotion rate of 2.85% (from 91.67% to 94.28%). This promotion rate was slightly lower than that of the best three-stream dataset combination (O-H-S) but still higher than those of the other combinations.
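Throughout, the promotion rate can be read as the relative improvement over the one-stream baseline accuracy on the original dataset, as in the following minimal sketch (the helper name is hypothetical):

def promotion_rate(new_acc, base_acc=91.67):
    """Relative improvement (%) over the one-stream baseline accuracy on O."""
    return (new_acc - base_acc) / base_acc * 100.0

print(round(promotion_rate(94.32), 2))   # ~2.89, best three-stream combination (O-H-S)
print(round(promotion_rate(94.28), 2))   # ~2.85, four-stream combination (O-H-S-A)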
Among the four datasets (the three characteristic datasets and the original dataset), the best dataset in terms of accuracy was S, the second-best was H, the third-best was O, and the last was A. In general, the effect of feature combinations depends on two main factors: the individual performance of the datasets and the complementarity between them. Accordingly, the best performance was obtained with the combination of O and H, not with the combination of the best two datasets (S and H), and the second-best was the combination of O and S. One possible reason for this is that the original dataset retains richer details than the characteristic datasets, thereby contributing more strongly to feature fusion. The worst combination was O and A, the two worst-performing individual datasets. Even so, the accuracy (91.69%) was higher than those of both O (91.67%) and A (88.32%), which can be attributed to the effect of feature fusion.
Table 10 indicates that the best feature combination was obtained through the combination of the best three datasets (S, H, and O); that is, the best two datasets, when integrated with the original dataset (with richer details), led to the best performance. This result is consistent with the previously mentioned reasons, namely, individual performance and complementarity.
Table 12 demonstrates that the feature combination of four datasets resulted in lower accuracy than the best combination of three datasets; in particular, the best combination (S, H, and O) of three datasets was weakened through further integration with another dataset (A) having weaker performance.
On the other hand, even when only the same original dataset was used, multiple-stream models were still found to contribute to the overall performance. Therefore, we can also increase the performance by constructing multiple-stream models on the original dataset without feature engineering, especially when we have no domain knowledge regarding the original dataset. In addition, while two-stream single-dataset models for the three characteristic datasets still contributed to the overall performance, models with more streams did not; this is because these three characteristic datasets are simpler in structure than the original dataset.
Thanks to the success of these two schemes—namely, multiple-stream models with different datasets derived from a single dataset (or resource) and multiple-stream models utilizing the same dataset—on the Chest X-Ray dataset, the two schemes will be applied to other datasets in the future; for example, the Alzheimer’s Disease Dataset [34] and the Breast Cancer Histopathological Database (BreakHis) [35]. It can be expected that the overall performance will be increased through these two simple yet effective schemes.
In this study, at most four-stream models were considered. It can be expected that models with more streams will further improve the overall performance, as long as suitable characteristic datasets can still be found. However, more streams mean that the proposed scheme requires more execution time and memory, which limits its scalability. Therefore, given sufficient time and computing resources, models with more streams and suitable characteristic datasets are preferred.

5. Conclusions

In this study, I aimed to improve the overall performance under limited existing resources, considering a single-modality dataset and a range of pre-trained models. To achieve this goal, I designed four categories of multiple-stream models in order to verify that, under a single-modality dataset—namely, the Chest X-Ray dataset—the overall performance can be boosted through the use of multiple-stream models (both with and without feature engineering). Each stream model was drawn from 16 pre-trained models. Any state-of-the-art models developed in the future can likewise serve as basic models, in the same way as the 16 pre-trained models used here. In general, multiple-stream models are developed for multiple-modality datasets, which are used as different inputs in order to complement each other by providing different resources. In contrast, in this study, I generated characteristic datasets from a single original dataset to be used as different inputs to the multiple-stream models.
Based on domain knowledge regarding the original dataset, three characteristic datasets were developed, namely, the Hurst exponent dataset (equivalent to a fractal dimension dataset), the second partial derivative dataset, and the normalized average information content dataset. The maximum accuracy on the original dataset was 91.67%, while those on the Hurst exponent, second partial derivative, and normalized average information content datasets were 92.59%, 93.03%, and 88.32%, respectively. Among the three characteristic datasets, the two that are closely related to fractal characteristics yielded higher accuracy than the original dataset, whereas the remaining one performed worse. The characteristic datasets with higher accuracy can be considered relatively reliable, compared to that with lower accuracy. Therefore, accuracy can be considered an important indicator of whether a chosen feature is effective; however, it is not an absolute index—after all, the whole process involves a complicated combination of operations. As such, the result depends on the complementarity of the various components, such as the deep learning models used, the original dataset, and the characteristic datasets derived from it.
The obtained experimental results demonstrated that handcrafted features obtained through domain knowledge of the original dataset can contribute to increasing the overall accuracy, as long as appropriate features can be found to serve as inputs. However, it should be noted that the use of unsuitable features may lead to worse results. For example, two of the proposed characteristics—the Hurst exponent and the second partial derivative—were found to be suitable in the context of the Chest X-Ray dataset, whereas the normalized average information content was not. Therefore, the average accuracy increased from 91.67% for the one-stream models on the original dataset to 94.09% for the two-stream models on the original and Hurst exponent datasets, and it increased further to 94.32% for the three-stream models on the original, Hurst exponent, and second partial derivative datasets; meanwhile, it slightly decreased to 94.28% for the four-stream models on all four datasets. However, the maximum accuracy increased continuously, from 91.67% (one-stream) to 94.52% (two-stream) to 94.73% (three-stream) to 94.79% (four-stream).
Obviously, feature engineering still plays a very crucial role in deep learning. As with machine learning, we need to understand the background of the original dataset in order to find appropriate features to complement or match the corresponding deep learning models.
Sometimes, we do not have any knowledge regarding the original dataset, and, hence, we cannot manually extract effective features through domain knowledge. Even in this case, we can still boost the overall accuracy through the use of multiple-stream models run on the original dataset. The obtained experimental results demonstrated that multiple-stream models run on the original dataset improved the overall effect of the models; in particular, the accuracy increased continuously from 91.67% (one-stream) to 92.35% (two-stream) to 93.49% (three-stream) to 93.66% (four-stream).
Obviously, no matter whether feature engineering is taken into consideration or not, the overall recognition rate can be improved through the use of multiple-stream models under limited existing resources. In this study, at most four-stream models were considered in the experiments. In the future, the proposed scheme or methodology can be easily applied to other datasets, potentially also enhancing the overall performance of the resulting models. In addition, I will continue to identify further potential characteristic datasets for the original dataset and design expanded multiple-stream models in order to further increase the overall performance.
In this pilot and pioneering study, for fairness of comparison, I did not perform any fine-tuning on the adopted pre-trained models; however, in the future, some fine-tuning—including unfreezing some top layers of the frozen pre-trained models and then jointly training both the newly added fully connected layers and these unfrozen top layers—will be carried out under the same environment, which may further improve the overall performance.

Funding

This work was supported by the National Science and Technology Council, Taiwan, Republic of China, under Grant NSTC 113-2221-E-040-004.

Data Availability Statement

Data are contained within this article.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACF     Autocorrelation function
AIC     Average information content
CNNs    Convolutional neural networks
DL      Deep learning
FBIs    Fractional Brownian images
FBM     Fractional Brownian motion
FGN     Fractional Gaussian noise
LSTM    Long short-term memory
SPD     Second partial derivative
SVMs    Support vector machines
WST     Wavelet scattering transform

Appendix A

Table A1. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the original dataset and their corresponding means and standard deviations in the first run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M188.48%86.76%87.96%88.47%90.18%88.37%1.10%
M291.13%91.37%90.44%89.24%91.46%90.73%0.83%
M392.75%90.86%91.37%91.55%90.09%91.32%0.87%
M488.57%89.92%87.87%87.87%86.85%88.22%1.01%
M589.33%86.17%88.30%88.90%88.81%88.30%1.12%
M681.66%86.42%86.59%87.87%86.17%85.74%2.13%
M788.65%88.39%88.90%88.56%88.81%88.66%0.18%
M887.63%86.85%87.28%85.99%84.80%86.51%1.01%
M988.48%87.28%87.62%89.07%88.90%88.27%0.71%
M1087.80%87.62%87.79%87.79%87.70%87.74%0.07%
M1188.82%85.31%86.93%87.45%88.98%87.50%1.35%
M1290.96%91.29%90.52%91.72%90.35%90.97%0.50%
M1391.81%92.40%90.69%91.97%90.69%91.51%0.70%
M1490.10%89.92%89.50%90.52%89.41%89.89%0.41%
M1589.16%91.12%90.35%89.24%90.61%90.10%0.77%
M1691.98%90.78%90.86%90.95%90.86%91.09%0.45%
Table A2. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the original dataset and their corresponding means and standard deviations in the second run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M187.46%89.84%86.08%89.92%88.30%88.32%1.46%
M289.76%92.83%90.44%91.97%90.18%91.04%1.17%
M390.96%90.52%90.86%91.37%91.80%91.10%0.44%
M487.54%89.24%90.18%86.51%87.19%88.13%1.36%
M587.63%88.81%87.87%87.36%88.64%88.06%0.57%
M684.39%85.65%86.08%85.06%86.42%85.52%0.73%
M787.29%89.84%89.15%88.30%88.13%88.54%0.88%
M886.09%85.31%85.82%86.08%84.20%85.50%0.71%
M987.71%88.47%89.33%86.25%89.41%88.23%1.17%
M1087.20%88.47%89.50%87.96%88.64%88.35%0.76%
M1188.82%89.84%89.24%89.58%88.56%89.21%0.47%
M1290.96%89.84%90.09%91.80%89.84%90.51%0.77%
M1390.96%92.74%90.95%93.08%91.37%91.82%0.91%
M1489.76%88.30%90.86%89.67%90.35%89.79%0.86%
M1589.08%88.73%89.58%88.39%88.04%88.76%0.53%
M1691.30%89.33%91.55%90.61%91.46%90.85%0.83%
Table A3. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the Hurst exponent dataset and their corresponding means and standard deviations in the first run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M186.52%89.33%90.44%89.24%89.15%88.93%1.29%
M291.72%91.72%93.85%92.40%92.66%92.47%0.78%
M391.64%93.42%91.72%93.60%92.14%92.50%0.84%
M486.86%87.53%86.08%87.36%88.56%87.28%0.81%
M585.67%87.62%87.45%86.51%86.34%86.71%0.73%
M688.99%87.53%86.59%87.96%87.79%87.77%0.77%
M787.12%87.62%85.40%85.31%88.64%86.82%1.29%
M887.63%88.39%86.34%86.51%87.28%87.23%0.75%
M986.77%85.40%89.07%86.25%86.93%86.89%1.22%
M1085.67%86.59%87.19%88.22%86.17%86.77%0.88%
M1187.29%86.17%87.11%89.84%88.73%87.82%1.30%
M1289.51%90.26%90.86%88.73%91.03%90.08%0.86%
M1389.85%91.03%90.61%88.39%90.69%90.11%0.95%
M1491.81%92.57%91.55%89.58%90.69%91.24%1.02%
M1591.72%88.73%91.03%90.35%90.01%90.37%1.01%
M1689.93%91.29%89.24%90.35%90.09%90.18%0.67%
Table A4. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the Hurst exponent dataset and their corresponding means and standard deviations in the second run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M189.68%88.81%88.98%89.92%88.47%89.17%0.54%
M289.85%91.89%92.40%91.20%93.34%91.74%1.17%
M392.32%92.23%93.60%93.25%91.97%92.67%0.63%
M485.49%85.99%85.74%87.36%87.96%86.51%0.97%
M587.29%84.97%86.17%86.59%87.36%86.48%0.87%
M686.95%84.29%86.51%86.85%88.39%86.59%1.32%
M783.70%85.91%85.40%88.22%87.79%86.20%1.65%
M886.95%88.39%88.64%88.04%87.36%87.88%0.63%
M988.14%86.68%86.76%85.82%86.42%86.77%0.76%
M1088.23%87.45%86.93%85.48%87.70%87.16%0.94%
M1187.20%88.90%86.85%85.99%88.98%87.59%1.17%
M1291.72%87.62%89.84%90.44%88.81%89.69%1.40%
M1391.72%89.84%89.67%91.03%90.78%90.61%0.77%
M1488.48%91.97%92.49%90.18%92.14%91.05%1.51%
M1589.59%89.15%90.69%89.15%91.29%89.98%0.86%
M1691.38%90.09%90.09%90.35%88.81%90.15%0.82%
Table A5. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the second partial derivative dataset and their corresponding means and standard deviations in the first run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M191.13%89.92%90.44%90.69%89.50%90.33%0.57%
M293.86%92.83%93.42%92.06%92.31%92.90%0.67%
M392.41%93.42%92.57%93.08%93.08%92.91%0.37%
M488.74%88.30%88.81%89.07%89.50%88.88%0.39%
M589.76%89.92%88.47%89.92%88.73%89.36%0.63%
M689.93%88.81%88.73%86.51%88.30%88.46%1.11%
M790.36%86.17%88.73%87.70%88.73%88.34%1.38%
M888.82%87.79%85.91%87.87%89.50%87.98%1.21%
M986.69%89.15%88.64%88.64%90.61%88.75%1.26%
M1088.74%88.81%86.76%88.13%88.47%88.18%0.75%
M1190.27%88.81%87.02%88.13%88.47%88.54%1.05%
M1291.13%89.41%91.72%90.69%91.37%90.86%0.80%
M1391.13%90.26%91.37%90.61%91.37%90.95%0.44%
M1491.98%91.12%90.09%89.84%91.20%90.85%0.78%
M1591.89%90.26%91.20%90.61%91.29%91.05%0.57%
M1690.61%91.72%90.09%91.12%90.61%90.83%0.55%
Table A6. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the second partial derivative dataset and their corresponding means and standard deviations in the second run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M190.87%91.72%89.84%89.33%92.83%90.92%1.26%
M292.92%93.60%93.08%93.00%93.25%93.17%0.24%
M393.77%92.74%92.57%92.66%93.42%93.03%0.48%
M487.46%89.33%86.42%88.56%88.64%88.08%1.02%
M589.33%89.67%86.68%90.26%90.69%89.33%1.41%
M688.23%83.60%87.62%86.42%88.47%86.87%1.78%
M789.33%89.15%87.96%89.92%89.75%89.22%0.69%
M884.64%87.62%86.17%88.30%89.07%87.16%1.58%
M989.51%88.90%88.81%90.18%89.41%89.36%0.49%
M1088.31%89.84%88.22%86.42%87.28%88.01%1.14%
M1189.25%89.41%88.90%89.15%87.96%88.93%0.52%
M1291.38%91.55%90.09%90.44%90.44%90.78%0.58%
M1391.30%92.06%91.97%90.26%90.35%91.19%0.77%
M1492.49%91.37%88.98%90.26%90.52%90.73%1.17%
M1589.59%91.80%90.09%91.29%89.92%90.54%0.85%
M1691.98%91.12%93.00%90.26%89.41%91.15%1.26%
Table A7. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the average information content dataset and their corresponding means and standard deviations in the first run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M181.66%82.49%83.69%82.07%82.84%82.55%0.70%
M287.63%87.96%87.02%90.18%87.87%88.13%1.08%
M388.05%88.39%87.45%87.36%88.04%87.86%0.39%
M485.24%82.07%82.32%85.40%85.48%84.10%1.56%
M585.49%84.80%83.26%83.18%83.60%84.07%0.92%
M684.56%83.43%84.20%86.34%84.03%84.51%0.98%
M784.73%84.46%85.40%83.43%84.29%84.46%0.64%
M881.91%82.75%84.80%80.87%80.70%82.21%1.49%
M984.47%86.42%85.23%85.14%85.74%85.40%0.65%
M1082.68%85.99%84.12%84.12%84.97%84.38%1.09%
M1186.95%85.14%85.91%84.37%86.76%85.83%0.97%
M1285.92%85.31%86.51%84.37%85.82%85.59%0.72%
M1387.12%88.47%84.54%87.62%86.76%86.90%1.31%
M1485.24%84.97%84.88%84.20%86.51%85.16%0.76%
M1586.52%86.25%84.71%84.63%86.68%85.76%0.90%
M1687.20%85.91%86.76%85.82%82.75%85.69%1.56%
Table A8. The accuracies of 5-fold cross-validation for the 16 pre-trained models on the average information content dataset and their corresponding means and standard deviations in the second run.
Models    F1    F2    F3    F4    F5    Mean    Std.
M183.70%83.01%83.26%84.29%83.26%83.50%0.45%
M288.05%89.41%88.64%88.73%87.70%88.51%0.59%
M387.20%88.30%88.22%86.85%87.53%87.62%0.56%
M484.98%87.11%82.58%86.51%85.57%85.35%1.57%
M583.87%82.75%85.14%84.46%83.35%83.91%0.83%
M680.72%86.51%84.12%84.80%83.77%83.98%1.89%
M785.84%84.54%84.29%83.69%85.57%84.78%0.80%
M883.53%82.84%81.21%83.86%84.63%83.21%1.15%
M984.90%87.19%84.71%84.29%86.25%85.47%1.08%
M1083.53%85.31%84.37%84.54%83.77%84.31%0.62%
M1184.04%84.97%86.34%85.99%83.52%84.97%1.08%
M1284.64%85.74%86.08%86.93%86.08%85.90%0.74%
M1386.18%87.96%83.95%87.79%87.45%86.66%1.50%
M1485.67%84.37%84.29%84.29%86.17%84.96%0.80%
M1585.92%86.34%86.08%85.74%84.20%85.66%0.75%
M1684.22%85.23%85.57%85.31%85.40%85.14%0.48%
Table A9. The accuracies of 5-fold cross-validation for the 24 two-stream model–dataset combinations under different datasets and their corresponding means and standard deviations in the first run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-HB1-B194.97%94.36%93.68%95.13%94.45%94.52%0.51%
B1-B293.69%94.53%94.71%93.25%94.19%94.07%0.54%
B2-B195.14%93.17%93.25%93.17%94.28%93.80%0.79%
B2-B292.92%92.66%94.62%93.42%93.42%93.41%0.67%
O-SB1-B193.09%93.68%94.02%93.94%94.11%93.77%0.37%
B1-B293.60%93.00%94.36%94.28%92.23%93.49%0.80%
B2-B193.43%94.53%93.08%94.53%94.28%93.97%0.60%
B2-B294.88%94.71%92.57%93.51%93.85%93.90%0.84%
O-AB1-B191.55%92.06%92.49%91.89%92.91%92.18%0.47%
B1-B291.64%92.83%93.08%93.00%93.77%92.86%0.69%
B2-B190.36%91.55%90.18%91.20%90.44%90.74%0.53%
B2-B291.21%92.06%91.20%90.52%90.01%91.00%0.70%
H-SB1-B194.20%93.94%93.85%94.19%94.28%94.09%0.17%
B1-B294.20%93.68%93.85%92.91%93.25%93.58%0.45%
B2-B192.41%94.71%93.94%94.36%93.77%93.84%0.79%
B2-B293.60%91.72%92.83%94.71%93.94%93.36%1.02%
H-AB1-B193.86%94.28%94.11%93.77%92.40%93.68%0.67%
B1-B293.52%92.91%92.49%93.51%94.71%93.43%0.75%
B2-B193.94%93.25%93.68%93.94%92.23%93.41%0.64%
B2-B292.92%93.51%92.83%94.79%94.45%93.70%0.80%
S-AB1-B194.71%93.60%93.42%92.40%92.40%93.31%0.86%
B1-B292.06%92.49%93.85%93.68%93.51%93.12%0.71%
B2-B193.00%93.34%93.85%93.60%93.77%93.51%0.31%
B2-B293.09%91.80%92.91%94.28%93.60%93.14%0.82%
Table A10. The accuracies of 5-fold cross-validation for the 24 two-stream model–dataset combinations under different datasets and their corresponding means and standard deviations in the second run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-HB1-B194.28%93.68%94.11%95.13%95.39%94.52%0.64%
B1-B294.20%95.39%95.90%93.08%94.53%94.62%0.98%
B2-B193.00%93.34%93.42%94.62%94.45%93.77%0.64%
B2-B293.86%94.11%93.77%94.53%93.85%94.02%0.28%
O-SB1-B193.60%93.42%94.11%92.74%94.28%93.63%0.54%
B1-B294.62%94.88%93.00%92.66%93.08%93.65%0.92%
B2-B194.03%93.77%93.77%94.28%93.25%93.82%0.34%
B2-B293.52%94.11%94.28%95.05%94.19%94.23%0.49%
O-AB1-B190.70%92.74%91.29%92.40%94.02%92.23%1.16%
B1-B293.69%94.02%92.31%92.49%92.23%92.95%0.75%
B2-B191.38%89.24%90.95%92.06%90.69%90.86%0.93%
B2-B289.93%89.84%91.80%90.86%91.12%90.71%0.74%
H-SB1-B193.94%93.85%93.60%94.02%93.77%93.84%0.15%
B1-B294.37%93.25%94.11%93.85%93.51%93.82%0.40%
B2-B193.69%94.28%93.00%93.51%91.29%93.15%1.02%
B2-B293.34%92.23%94.11%93.94%93.34%93.39%0.66%
H-AB1-B193.34%94.11%93.25%93.00%92.91%93.32%0.42%
B1-B293.60%93.25%94.11%93.25%93.34%93.51%0.32%
B2-B193.17%92.83%93.77%94.45%92.31%93.31%0.74%
B2-B293.94%93.25%93.51%93.17%93.51%93.48%0.27%
S-AB1-B193.69%93.51%93.00%93.85%93.00%93.41%0.35%
B1-B292.75%90.95%93.42%93.68%92.74%92.71%0.95%
B2-B194.28%93.77%93.60%93.17%92.06%93.37%0.75%
B2-B293.17%92.74%91.20%91.89%94.11%92.62%1.01%
Table A11. The accuracies of 5-fold cross-validation for the 32 three-stream model–dataset combinations and their corresponding means and standard deviations in the first run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-H-SB1-B1-B194.71%94.88%95.05%93.85%92.83%94.26%0.83%
B1-B1-B293.86%93.25%94.53%94.71%95.22%94.31%0.69%
B1-B2-B194.03%93.42%94.45%93.85%94.53%94.06%0.41%
B1-B2-B293.34%95.39%94.45%95.82%94.62%94.72%0.85%
B2-B1-B194.88%94.28%95.22%94.62%94.02%94.60%0.42%
B2-B1-B295.65%94.19%93.85%93.60%93.77%94.21%0.74%
B2-B2-B193.09%94.53%94.02%94.02%95.30%94.19%0.72%
B2-B2-B295.56%93.85%95.13%94.96%94.19%94.74%0.63%
O-H-AB1-B1-B194.54%94.62%95.05%94.02%93.77%94.40%0.45%
B1-B1-B294.03%93.60%93.51%95.39%94.11%94.13%0.67%
B1-B2-B194.54%94.62%94.11%95.05%94.28%94.52%0.32%
B1-B2-B293.94%92.74%95.30%94.96%96.07%94.60%1.16%
B2-B1-B193.43%92.40%93.77%94.53%95.30%93.89%0.99%
B2-B1-B294.28%94.28%93.25%94.19%93.60%93.92%0.42%
B2-B2-B193.69%93.94%94.53%93.34%94.79%94.06%0.54%
B2-B2-B294.11%94.28%94.45%93.25%94.02%94.02%0.41%
O-S-AB1-B1-B194.03%93.17%93.51%93.34%94.02%93.61%0.35%
B1-B1-B293.09%94.19%92.66%93.60%93.94%93.49%0.56%
B1-B2-B194.37%92.83%92.83%94.71%94.88%93.92%0.91%
B1-B2-B292.66%93.85%91.20%93.85%94.96%93.31%1.28%
B2-B1-B194.62%92.57%93.00%93.85%94.19%93.65%0.76%
B2-B1-B291.55%93.08%94.28%95.39%93.42%93.55%1.28%
B2-B2-B194.03%93.25%93.94%94.02%94.19%93.89%0.33%
B2-B2-B293.86%94.28%93.00%95.05%94.79%94.19%0.73%
H-S-AB1-B1-B195.90%93.08%93.51%94.02%94.79%94.26%1.00%
B1-B1-B294.20%93.08%94.53%94.36%92.40%93.72%0.83%
B1-B2-B194.11%95.13%93.60%94.28%93.68%94.16%0.55%
B1-B2-B293.60%95.05%94.36%94.02%94.11%94.23%0.48%
B2-B1-B193.43%93.00%94.62%92.91%94.19%93.63%0.67%
B2-B1-B292.92%92.31%94.79%91.46%93.60%93.02%1.13%
B2-B2-B194.20%93.68%94.02%93.68%94.02%93.92%0.21%
B2-B2-B294.88%94.88%93.94%94.62%92.83%94.23%0.78%
Table A12. The accuracies of 5-fold cross-validation for the 32 three-stream model–dataset combinations and their corresponding means and standard deviations in the second run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-H-SB1-B1-B195.05%94.11%93.85%93.08%94.45%94.11%0.65%
B1-B1-B295.22%93.34%94.45%94.11%95.05%94.43%0.68%
B1-B2-B193.86%94.71%91.89%94.19%94.11%93.75%0.97%
B1-B2-B294.62%95.05%95.22%94.28%94.53%94.74%0.34%
B2-B1-B194.54%94.19%94.88%95.30%92.40%94.26%1.00%
B2-B1-B293.86%95.05%94.53%94.71%94.62%94.55%0.39%
B2-B2-B194.88%93.85%93.60%94.45%94.45%94.25%0.46%
B2-B2-B294.28%92.14%93.51%94.71%94.62%93.85%0.95%
O-H-AB1-B1-B193.60%95.13%93.51%93.25%95.05%94.11%0.81%
B1-B1-B294.71%93.08%94.36%94.28%94.36%94.16%0.56%
B1-B2-B194.62%93.51%94.88%94.36%93.68%94.21%0.53%
B1-B2-B294.54%94.71%95.30%93.68%94.02%94.45%0.56%
B2-B1-B194.11%93.17%92.91%93.42%95.47%93.82%0.92%
B2-B1-B294.37%94.45%93.51%94.96%93.34%94.13%0.61%
B2-B2-B192.58%94.45%94.36%93.34%94.88%93.92%0.84%
B2-B2-B293.43%94.45%95.47%93.51%94.28%94.23%0.74%
O-S-AB1-B1-B193.52%93.85%94.28%92.74%93.42%93.56%0.51%
B1-B1-B294.37%93.85%92.83%92.74%92.91%93.34%0.65%
B1-B2-B194.37%94.96%94.19%93.25%93.60%94.07%0.60%
B1-B2-B294.62%93.08%94.45%93.08%92.40%93.53%0.86%
B2-B1-B193.86%94.11%93.85%93.17%93.08%93.61%0.41%
B2-B1-B293.17%94.19%94.28%93.25%92.49%93.48%0.68%
B2-B2-B194.97%93.60%93.34%94.02%94.53%94.09%0.60%
B2-B2-B294.45%92.49%94.28%96.16%94.11%94.30%1.17%
H-S-AB1-B1-B193.09%94.19%94.45%94.45%93.77%93.99%0.51%
B1-B1-B292.92%92.83%93.60%94.36%95.39%93.82%0.96%
B1-B2-B194.28%93.42%94.71%93.85%95.05%94.26%0.58%
B1-B2-B293.94%94.79%93.00%93.85%94.96%94.11%0.71%
B2-B1-B194.71%93.51%94.79%93.17%93.17%93.87%0.73%
B2-B1-B293.94%94.62%94.45%92.66%92.49%93.63%0.90%
B2-B2-B193.52%94.88%94.36%93.77%94.02%94.11%0.48%
B2-B2-B294.88%94.53%94.19%93.68%94.19%94.30%0.40%
Table A13. The accuracies of 5-fold cross-validation for the 16 four-stream model–dataset combinations and their corresponding means and standard deviations in the first run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-H-S-AB1-B1-B1-B195.39%93.77%94.45%95.22%93.17%94.40%0.85%
B1-B1-B1-B293.77%93.51%94.71%94.79%94.62%94.28%0.53%
B1-B1-B2-B194.54%95.30%94.19%94.53%94.62%94.64%0.36%
B1-B1-B2-B295.73%94.96%94.02%93.08%94.53%94.47%0.89%
B1-B2-B1-B194.45%96.58%93.94%95.22%94.96%95.03%0.89%
B1-B2-B1-B294.28%93.25%93.51%94.88%94.71%94.13%0.64%
B1-B2-B2-B194.45%94.79%94.11%94.28%95.13%94.55%0.37%
B1-B2-B2-B294.54%94.19%94.96%94.28%94.96%94.59%0.33%
B2-B1-B1-B194.03%94.62%93.42%94.11%94.36%94.11%0.40%
B2-B1-B1-B295.39%94.62%93.51%93.85%94.79%94.43%0.67%
B2-B1-B2-B193.94%94.71%94.88%93.60%93.42%94.11%0.58%
B2-B1-B2-B294.62%93.85%93.85%95.64%94.71%94.54%0.66%
B2-B2-B1-B193.86%93.77%93.34%93.94%93.68%93.72%0.21%
B2-B2-B1-B294.71%91.97%93.77%91.46%95.05%93.39%1.44%
B2-B2-B2-B193.17%94.28%93.08%93.51%94.36%93.68%0.54%
B2-B2-B2-B294.11%94.62%94.28%92.83%94.53%94.07%0.65%
Table A14. The accuracies of 5-fold cross-validation for the 16 four-stream model–dataset combinations and their corresponding means and standard deviations in the second run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-H-S-AB1-B1-B1-B195.73%93.42%95.13%92.91%94.71%94.38%1.06%
B1-B1-B1-B293.34%94.19%94.28%93.85%95.56%94.25%0.73%
B1-B1-B2-B194.88%95.30%94.88%93.34%93.85%94.45%0.73%
B1-B1-B2-B293.86%94.53%94.88%93.51%96.07%94.57%0.89%
B1-B2-B1-B195.39%94.88%90.86%93.60%94.11%93.77%1.58%
B1-B2-B1-B294.62%93.34%93.94%94.45%95.82%94.43%0.82%
B1-B2-B2-B194.45%94.96%94.19%95.05%94.36%94.60%0.34%
B1-B2-B2-B294.71%95.73%94.36%95.13%95.05%95.00%0.46%
B2-B1-B1-B194.62%93.42%95.13%95.05%94.62%94.57%0.61%
B2-B1-B1-B294.80%94.45%93.68%95.22%93.34%94.30%0.70%
B2-B1-B2-B193.77%94.28%94.36%94.45%94.19%94.21%0.24%
B2-B1-B2-B294.03%94.45%94.02%93.42%94.28%94.04%0.35%
B2-B2-B1-B194.03%95.47%93.94%94.62%93.85%94.38%0.61%
B2-B2-B1-B292.83%93.60%93.60%93.25%95.47%93.75%0.91%
B2-B2-B2-B194.54%95.47%95.22%92.83%95.30%94.67%0.98%
B2-B2-B2-B291.64%94.53%93.94%92.57%94.11%93.36%1.08%
Table A15. The accuracies of 5-fold cross-validation for the 24 two-stream model–dataset combinations under the same datasets and their corresponding means and standard deviations in the first run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-OB1-B293.00%92.14%91.29%92.91%91.72%92.21%0.67%
B1-B394.37%92.83%93.94%93.17%91.55%93.17%0.98%
B1-B492.24%91.80%92.91%91.37%90.95%91.85%0.68%
B2-B391.13%90.01%91.29%92.23%92.31%91.39%0.84%
B2-B491.98%91.72%91.55%94.02%92.57%92.37%0.90%
B3-B491.47%91.55%89.41%91.55%92.40%91.27%0.99%
H-HB1-B292.92%93.77%93.68%93.25%93.17%93.36%0.32%
B1-B391.47%92.74%93.25%91.46%92.49%92.28%0.71%
B1-B492.75%91.72%92.49%92.06%92.66%92.33%0.39%
B2-B391.47%92.83%91.97%91.80%92.74%92.16%0.53%
B2-B493.94%92.57%91.29%92.66%90.35%92.16%1.23%
B3-B490.96%93.00%92.23%93.94%92.06%92.44%0.99%
S-SB1-B293.69%93.00%93.34%93.85%93.51%93.48%0.29%
B1-B390.19%92.31%90.69%91.46%91.63%91.26%0.74%
B1-B491.81%90.78%91.29%93.25%91.55%91.73%0.83%
B2-B390.36%92.14%91.37%90.44%92.49%91.36%0.86%
B2-B490.53%91.80%92.23%91.20%93.60%91.87%1.03%
B3-B492.15%92.91%93.34%91.37%92.06%92.37%0.69%
A-AB1-B287.88%88.73%90.09%88.30%89.07%88.82%0.75%
B1-B390.10%87.28%88.56%86.93%85.99%87.77%1.42%
B1-B488.48%88.56%86.93%84.97%86.34%87.06%1.35%
B2-B389.68%87.28%86.34%86.51%89.07%87.77%1.36%
B2-B486.77%86.08%85.40%88.81%86.51%86.71%1.15%
B3-B488.14%86.76%88.64%87.79%87.87%87.84%0.62%
Table A16. The accuracies of 5-fold cross-validation for the 24 two-stream model–dataset combinations under the same datasets and their corresponding means and standard deviations in the second run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-OB1-B293.43%91.97%91.29%93.25%92.49%92.49%0.80%
B1-B393.34%94.71%92.23%90.52%92.74%92.71%1.37%
B1-B490.53%92.74%92.31%92.06%92.91%92.11%0.85%
B2-B390.96%91.46%90.01%89.84%90.35%90.52%0.60%
B2-B492.06%91.37%92.66%93.42%91.80%92.26%0.71%
B3-B491.47%91.37%91.97%92.57%92.40%91.96%0.48%
H-HB1-B290.87%91.46%93.60%93.34%94.11%92.67%1.27%
B1-B390.36%92.66%92.83%90.86%92.40%91.82%1.01%
B1-B491.81%91.37%94.19%91.37%93.77%92.50%1.22%
B2-B392.32%92.23%93.51%89.41%90.86%91.67%1.41%
B2-B493.00%92.66%92.40%94.19%93.68%93.19%0.66%
B3-B492.66%91.63%90.26%92.40%89.33%91.26%1.28%
S-SB1-B294.28%93.00%94.36%93.17%93.60%93.68%0.56%
B1-B392.66%90.52%93.17%91.37%90.26%91.60%1.15%
B1-B490.96%91.63%91.63%91.89%92.83%91.79%0.61%
B2-B391.89%92.40%92.14%91.72%90.86%91.80%0.52%
B2-B492.15%93.42%92.06%92.06%92.14%92.37%0.53%
B3-B491.55%93.25%91.97%92.83%91.55%92.23%0.69%
A-AB1-B289.85%89.33%88.64%86.59%89.50%88.78%1.16%
B1-B387.29%88.04%88.22%86.34%88.47%87.67%0.78%
B1-B487.80%89.33%87.28%87.45%86.76%87.72%0.87%
B2-B386.77%89.24%87.28%87.79%83.86%86.99%1.77%
B2-B486.18%87.70%88.47%87.79%86.76%87.38%0.81%
B3-B487.63%88.30%86.34%89.33%85.65%87.45%1.32%
Table A17. The accuracies of 5-fold cross-validation for the 16 three-stream model–dataset combinations and their corresponding means and standard deviations in the first run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-O-OB1-B2-B394.20%94.11%93.60%94.88%93.08%93.97%0.60%
B1-B2-B490.02%92.91%92.74%93.25%92.91%92.37%1.19%
B1-B3-B493.00%93.42%93.00%92.40%91.37%92.64%0.71%
B2-B3-B492.83%92.31%92.31%92.66%92.49%92.52%0.20%
H-H-HB1-B2-B391.98%93.34%94.28%93.08%90.95%92.73%1.15%
B1-B2-B490.27%93.25%90.95%91.55%88.64%90.93%1.51%
B1-B3-B491.89%89.92%92.06%91.63%93.00%91.70%1.00%
B2-B3-B493.26%92.31%92.57%93.25%91.72%92.62%0.59%
S-S-SB1-B2-B393.00%92.66%94.45%93.25%91.97%93.07%0.81%
B1-B2-B490.27%93.17%91.46%92.40%93.77%92.21%1.24%
B1-B3-B491.38%92.40%92.40%91.63%92.40%92.04%0.44%
B2-B3-B492.24%92.14%92.57%91.63%92.83%92.28%0.41%
A-A-AB1-B2-B389.93%88.13%89.07%88.47%90.09%89.14%0.78%
B1-B2-B487.46%87.79%88.30%86.08%87.53%87.43%0.74%
B1-B3-B488.99%88.47%87.53%88.04%88.81%88.37%0.53%
B2-B3-B490.44%88.47%89.41%87.36%86.34%88.40%1.45%
Table A18. The accuracies of 5-fold cross-validation for the 16 three-stream model–dataset combinations and their corresponding means and standard deviations in the second run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-O-OB1-B2-B392.83%93.77%93.94%90.95%93.51%93.00%1.09%
B1-B2-B493.17%93.00%93.42%91.46%93.60%92.93%0.76%
B1-B3-B493.17%93.68%93.08%94.11%94.36%93.68%0.50%
B2-B3-B491.55%92.31%92.31%91.97%92.06%92.04%0.28%
H-H-HB1-B2-B391.30%91.46%92.23%92.40%91.80%91.84%0.43%
B1-B2-B490.78%91.89%91.89%92.57%93.68%92.16%0.95%
B1-B3-B491.72%92.66%91.72%89.33%93.94%91.87%1.51%
B2-B3-B491.21%92.23%91.37%92.57%93.08%92.09%0.71%
S-S-SB1-B2-B390.87%92.57%91.29%91.80%92.40%91.79%0.64%
B1-B2-B489.76%94.02%91.55%91.46%91.72%91.70%1.36%
B1-B3-B491.64%92.57%92.31%93.25%93.60%92.67%0.69%
B2-B3-B491.81%92.83%93.17%92.66%92.31%92.55%0.46%
A-A-AB1-B2-B389.08%88.90%88.47%88.64%88.64%88.75%0.21%
B1-B2-B488.23%88.90%87.02%86.85%87.62%87.72%0.76%
B1-B3-B485.84%90.35%87.62%86.17%87.79%87.55%1.60%
B2-B3-B487.20%88.64%88.73%89.75%87.87%88.44%0.86%
Table A19. The accuracies of 5-fold cross-validation for the 4 four-stream model–dataset combinations and their corresponding means and standard deviations in the first run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-O-O-OB1-B2-B3-B494.45%93.77%93.77%94.19%93.34%93.90%0.39%
H-H-H-HB1-B2-B3-B493.17%93.51%92.83%93.94%92.49%93.19%0.51%
S-S-S-SB1-B2-B3-B493.52%92.49%93.42%88.30%92.91%92.13%1.95%
A-A-A-AB1-B2-B3-B489.59%88.81%89.33%84.71%89.07%88.30%1.81%
Table A20. The accuracies of 5-fold cross-validation for the 4 four-stream model–dataset combinations and their corresponding means and standard deviations in the second run.
Datasets    Models    F1    F2    F3    F4    F5    Mean    Std.
O-O-O-OB1-B2-B3-B493.86%93.68%92.91%93.08%93.51%93.41%0.36%
H-H-H-HB1-B2-B3-B492.41%90.95%91.80%92.40%93.51%92.21%0.84%
S-S-S-SB1-B2-B3-B492.41%92.40%92.83%92.66%91.03%92.26%0.64%
A-A-A-AB1-B2-B3-B488.65%89.50%89.07%88.56%89.67%89.09%0.44%

References

  1. Chollet, F. Deep Learning with Python; Manning: New York, NY, USA, 2018. [Google Scholar]
  2. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 4th ed.; Pearson Education Limited: New York, NY, USA, 2021. [Google Scholar]
  3. Domingos, P. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87. [Google Scholar] [CrossRef]
  4. Rivas, P. Deep Learning for Beginners: A Beginner’s Guide to Getting up and Running with Deep Learning from Scratch Using Python; Packt Publishing: Birmingham, UK, 2020. [Google Scholar]
  5. Coelho, A.L.V.; Lima, C.A.M. Assessing fractal dimension methods as feature extractors for EMG signal classification. Eng. Appl. Artif. Intell. 2014, 36, 81–89. [Google Scholar] [CrossRef]
  6. Lai, J.; Wu, C.; Liao, N.; Shen, H.; Zhong, J.; Liu, R.; Li, L. A study on the correlation between fractal dimension and particle breakage for tungsten ores under impact crushing. Miner. Eng. 2024, 218, 108980. [Google Scholar] [CrossRef]
  7. Cui, T.; Wang, T. Exact box-counting and temporal sampling algorithms for fractal dimension estimation with applications to animal behavior analysis. Results Eng. 2025, 25, 103755. [Google Scholar] [CrossRef]
  8. Chang, Y.-C. An efficient maximum likelihood estimator for two-dimensional fractional Brownian motion. Fractals 2021, 29, 2150025. [Google Scholar] [CrossRef]
  9. Chang, Y.-C.; Jeng, J.-T. Classifying images of two-dimensional fractional Brownian motion through deep learning and its applications. Appl. Sci. 2023, 13, 803. [Google Scholar] [CrossRef]
  10. Chang, Y.-C. Deep-learning estimators for the Hurst exponent of two-dimensional fractional Brownian motion. Fractal Fract. 2024, 8, 50. [Google Scholar] [CrossRef]
  11. Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J. Dive into Deep Learning. Available online: https://d2l.ai/ (accessed on 16 December 2022).
  12. Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695. [Google Scholar] [CrossRef]
  13. Wang, C.-H.; Liu, J.-Y. Integrating feature engineering with deep learning to conduct diagnostic and predictive analytics for turbofan engines. Math. Probl. Eng. 2022, 2022, 9930176. [Google Scholar] [CrossRef]
  14. Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  15. Ercolano, G.; Rossi, S. Combining CNN and LSTM for activity of daily living recognition with a 3D matrix skeleton representation. Intell. Serv. Robot. 2021, 14, 175–185. [Google Scholar] [CrossRef]
  16. Chu, H.-C.; Wang, Y.-X.; Wang, T.-Y.; Chang, T.-H.; Lin, C.-H.; Chang, Y.-C. A gesture recognition system based on a dual-stream model for deep learning. Chung Shan Med. J. 2023, 34, 67–73. [Google Scholar]
  17. Zhang, X.; Geng, M.; Yang, X.; Li, C. An enhanced dual-stream network using multi-source remote sensing imagery for water body segmentation. Appl. Sci. 2024, 14, 178. [Google Scholar] [CrossRef]
  18. Sun, Q.; Qu, F. CPF-UNet: A dual-path U-Net structure for semantic segmentation of panoramic surround-view images. Appl. Sci. 2024, 14, 5473. [Google Scholar] [CrossRef]
  19. Wu, J.; Tao, H.; Xiao, K.; Chu, J.; Leng, L. Multi-stream feature aggregation network with multi-scale supervision for single image dehazing. Eng. Appl. Artif. Intell. 2025, 139, 109486. [Google Scholar] [CrossRef]
  20. Xu, R.; Wang, C.; Xu, S.; Meng, W.; Zhang, X. Dual-stream Representation Fusion Learning for Accurate Medical Image Segmentation. Eng. Appl. Artif. Intell. 2023, 123, 106402. [Google Scholar] [CrossRef]
  21. Zhang, W.; Li, Z.; Du, H.; Tong, J.; Liu, Z. Dual-stream Feature Fusion Network for Person Re-identification. Eng. Appl. Artif. Intell. 2024, 131, 107888. [Google Scholar] [CrossRef]
  22. Gerber, M.; Pillay, N. Automated design of the deep neural network pipeline. Appl. Sci. 2022, 12, 12215. [Google Scholar] [CrossRef]
  23. Gibert, D.; Planes, J.; Mateu, C.; Le, Q. Fusing feature engineering and deep learning: A case study for malware classification. Expert Syst. Appl. 2022, 207, 117957. [Google Scholar] [CrossRef]
  24. Taee, A.A.; Hosseini, S.; Khushaba, R.N.; Zia, T.; Lin, C.-T.; Al-Jumaily, A. Deep learning inspired feature engineering for classifying tremor severity. IEEE Access 2022, 10, 105377–105386. [Google Scholar] [CrossRef]
  25. Chest X-Ray Images (Pneumonia). Available online: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia (accessed on 24 September 2021).
  26. Hoefer, S.; Hannachi, H.; Pandit, M.; Kumaresan, R. Isotropic two-dimensional fractional Brownian motion and its application in ultrasonic analysis. In Proceedings of the Engineering in Medicine and Biology Society, 14th Annual International Conference of the IEEE, Paris, France, 29 October–1 November 1992; pp. 1267–1269. [Google Scholar]
  27. Kincaid, D.; Cheney, W. Numerical Analysis: Mathematics of Scientific Computing; Brooks/Cole Publishing Company: New York, NY, USA, 1996. [Google Scholar]
  28. McGaughey, D.R.; Aitken, G.J.M. Generating two-dimensional fractional Brownian motion using the fractional Gaussian process (FGp) algorithm. Phys. A 2002, 311, 369–380. [Google Scholar] [CrossRef]
  29. Chang, Y.-C. A flexible contrast enhancement method with visual effects and brightness preservation: Histogram planting. Comput. Electr. Eng. 2018, 69, 796–807. [Google Scholar] [CrossRef]
  30. Wang, Y.; Chen, Q.; Zhang, B. Image enhancement based on equal area dualistic sub-image histogram equalization method. IEEE Trans. Consum. Electron. 1999, 45, 68–75. [Google Scholar] [CrossRef]
  31. Dougherty, G. Digital Image Processing for Medical Applications; Cambridge University Press: New York, NY, USA, 2009. [Google Scholar]
  32. Keras Applications. Available online: https://keras.io/api/applications/ (accessed on 6 August 2024).
  33. Ghahramani, S. Fundamentals of Probability with Stochastic Processes, 3rd ed.; Pearson Prentice Hall: New York, NY, USA, 2005. [Google Scholar]
  34. Alzheimer’s Disease Dataset. Available online: https://www.kaggle.com/datasets/rabieelkharoua/alzheimers-disease-dataset (accessed on 21 November 2024).
  35. Breast Cancer Histopathological Database (BreakHis). Available online: https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/ (accessed on 8 January 2025).
Figure 1. Schematic diagram of one-stream models.
Figure 2. Schematic diagram of a two-stream model.
Figure 3. Schematic diagram of a three-stream model.
Figure 4. Schematic diagram of a four-stream model.
Figure 5. Two sample images from the folder train\NORMAL.
Figure 6. Two sample images from the folder train\PNEUMONIA.
Figure 7. The cropped images corresponding to Figure 5.
Figure 8. The cropped images corresponding to Figure 6.
Figure 9. The Hurst exponent images corresponding to Figure 7.
Figure 10. The Hurst exponent images corresponding to Figure 8.
Figure 11. The second partial derivative images corresponding to Figure 7.
Figure 12. The second partial derivative images corresponding to Figure 8.
Figure 13. The normalized average information content images corresponding to Figure 7.
Figure 14. The normalized average information content images corresponding to Figure 8.
Table 1. The model numbers and their corresponding names of the 16 pre-trained models.

Model No. | Model Name
M1 | Xception
M2 | VGG16
M3 | VGG19
M4 | ResNet50
M5 | ResNet50V2
M6 | ResNet101
M7 | ResNet101V2
M8 | ResNet152
M9 | ResNet152V2
M10 | InceptionV3
M11 | InceptionResNetV2
M12 | MobileNet
M13 | MobileNetV2
M14 | DenseNet121
M15 | DenseNet169
M16 | DenseNet201
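All 16 backbones in Table 1 are available from Keras Applications [32]. The following sketch shows how they could be instantiated as ImageNet-pretrained feature extractors; the input size (224 × 224 × 3) and the global-average-pooling setting are illustrative placeholders rather than the exact configuration used in this study.

```python
# A minimal sketch: instantiating the 16 pre-trained backbones of Table 1
# from Keras Applications [32]. Input shape and pooling are assumptions.
from tensorflow.keras import applications

BACKBONES = {
    "M1": applications.Xception,            "M2": applications.VGG16,
    "M3": applications.VGG19,               "M4": applications.ResNet50,
    "M5": applications.ResNet50V2,          "M6": applications.ResNet101,
    "M7": applications.ResNet101V2,         "M8": applications.ResNet152,
    "M9": applications.ResNet152V2,         "M10": applications.InceptionV3,
    "M11": applications.InceptionResNetV2,  "M12": applications.MobileNet,
    "M13": applications.MobileNetV2,        "M14": applications.DenseNet121,
    "M15": applications.DenseNet169,        "M16": applications.DenseNet201,
}

def build_backbone(model_no, input_shape=(224, 224, 3)):
    """Return the ImageNet-pretrained backbone for a model number (M1-M16)."""
    return BACKBONES[model_no](
        include_top=False, weights="imagenet",
        input_shape=input_shape, pooling="avg",
    )

backbone = build_backbone("M13")  # e.g., MobileNetV2 (M13)
```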
Table 2. The statistics of sizes of normal and pneumonia images, including mean, standard deviation (std.), minimum (min.), and maximum (max.).

Statistic | NORMAL R | NORMAL C | NORMAL RC | PNEUMONIA R | PNEUMONIA C | PNEUMONIA RC
mean | 1379 | 1686 | 2,417,480 | 820 | 1195 | 1,048,972
std. | 343 | 305 | 1,061,581 | 271 | 285 | 605,822
min. | 496 | 912 | 488,064 | 127 | 384 | 48,768
max. | 2713 | 2916 | 7,532,028 | 2304 | 2772 | 5,815,656
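Statistics such as those in Table 2 can be gathered with a few lines of Python. The sketch below reads R and C as the numbers of rows and columns of each image and RC as their product; the folder layout (train\NORMAL and train\PNEUMONIA) follows the Chest X-Ray dataset [25], while the root path and the .jpeg extension are assumptions.

```python
# A minimal sketch of how the size statistics in Table 2 could be gathered.
# The root path "chest_xray/train" and the .jpeg extension are placeholders.
from pathlib import Path
import numpy as np
from PIL import Image

def size_stats(folder):
    """Mean, std., min., and max. of rows (R), columns (C), and pixels (R*C)."""
    sizes = []
    for path in Path(folder).glob("*.jpeg"):
        with Image.open(path) as img:
            c, r = img.size              # PIL reports (width, height) = (C, R)
        sizes.append((r, c, r * c))
    arr = np.asarray(sizes, dtype=np.int64)
    return {"mean": arr.mean(axis=0), "std.": arr.std(axis=0),
            "min.": arr.min(axis=0), "max.": arr.max(axis=0)}

for label in ("NORMAL", "PNEUMONIA"):
    print(label, size_stats(Path("chest_xray/train") / label))
```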
Table 3. The means and standard deviations of accuracies for the 16 pre-trained models on the original dataset.

Models | R1 | R2 | Mean | Std.
M1 | 88.37% | 88.32% | 88.35% | 0.03%
M2 | 90.73% | 91.04% | 90.88% | 0.15%
M3 | 91.32% | 91.10% | 91.21% | 0.11%
M4 | 88.22% | 88.13% | 88.17% | 0.04%
M5 | 88.30% | 88.06% | 88.18% | 0.12%
M6 | 85.74% | 85.52% | 85.63% | 0.11%
M7 | 88.66% | 88.54% | 88.60% | 0.06%
M8 | 86.51% | 85.50% | 86.01% | 0.50%
M9 | 88.27% | 88.23% | 88.25% | 0.02%
M10 | 87.74% | 88.35% | 88.05% | 0.31%
M11 | 87.50% | 89.21% | 88.35% | 0.85%
M12 | 90.97% | 90.51% | 90.74% | 0.23%
M13 | 91.51% | 91.82% | 91.67% | 0.15%
M14 | 89.89% | 89.79% | 89.84% | 0.05%
M15 | 90.10% | 88.76% | 89.43% | 0.67%
M16 | 91.09% | 90.85% | 90.97% | 0.12%
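In Tables 3–6, R1 and R2 appear to be the accuracies of two repeated runs, and the reported Mean and Std. columns are consistent with the arithmetic mean and the population standard deviation (ddof = 0) of those two values; for example, for M3 in Table 3, (91.32 + 91.10)/2 = 91.21 and |91.32 − 91.10|/2 = 0.11. This is an inference from the tabulated numbers, not a statement about the author's code. A minimal sketch:

```python
# Consistent with the Mean and Std. columns in Tables 3-6: arithmetic mean and
# population standard deviation (ddof=0) of the two reported accuracies.
# This reproduces the tabulated values but is an inference, not the author's code.
import numpy as np

r1, r2 = 91.32, 91.10                      # M3 in Table 3 (accuracies in %)
runs = np.array([r1, r2])
print(f"mean = {runs.mean():.2f}%")        # -> mean = 91.21%
print(f"std. = {runs.std(ddof=0):.2f}%")   # -> std. = 0.11%
```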
Table 4. The means and standard deviations of accuracies for the 16 pre-trained models on the Hurst exponent dataset.

Models | R1 | R2 | Mean | Std.
M1 | 88.93% | 89.17% | 89.05% | 0.12%
M2 | 92.47% | 91.74% | 92.10% | 0.37%
M3 | 92.50% | 92.67% | 92.59% | 0.09%
M4 | 87.28% | 86.51% | 86.89% | 0.38%
M5 | 86.71% | 86.48% | 86.59% | 0.12%
M6 | 87.77% | 86.59% | 87.18% | 0.59%
M7 | 86.82% | 86.20% | 86.51% | 0.31%
M8 | 87.23% | 87.88% | 87.55% | 0.32%
M9 | 86.89% | 86.77% | 86.83% | 0.06%
M10 | 86.77% | 87.16% | 86.96% | 0.20%
M11 | 87.82% | 87.59% | 87.70% | 0.12%
M12 | 90.08% | 89.69% | 89.88% | 0.20%
M13 | 90.11% | 90.61% | 90.36% | 0.25%
M14 | 91.24% | 91.05% | 91.15% | 0.09%
M15 | 90.37% | 89.98% | 90.17% | 0.20%
M16 | 90.18% | 90.15% | 90.16% | 0.02%
Table 5. The means and standard deviations of accuracies for the 16 pre-trained models on the second partial derivative dataset.

Models | R1 | R2 | Mean | Std.
M1 | 90.33% | 90.92% | 90.62% | 0.29%
M2 | 92.90% | 93.17% | 93.03% | 0.14%
M3 | 92.91% | 93.03% | 92.97% | 0.06%
M4 | 88.88% | 88.08% | 88.48% | 0.40%
M5 | 89.36% | 89.33% | 89.34% | 0.02%
M6 | 88.46% | 86.87% | 87.66% | 0.79%
M7 | 88.34% | 89.22% | 88.78% | 0.44%
M8 | 87.98% | 87.16% | 87.57% | 0.41%
M9 | 88.75% | 89.36% | 89.05% | 0.31%
M10 | 88.18% | 88.01% | 88.10% | 0.09%
M11 | 88.54% | 88.93% | 88.74% | 0.20%
M12 | 90.86% | 90.78% | 90.82% | 0.04%
M13 | 90.95% | 91.19% | 91.07% | 0.12%
M14 | 90.85% | 90.73% | 90.79% | 0.06%
M15 | 91.05% | 90.54% | 90.80% | 0.26%
M16 | 90.83% | 91.15% | 90.99% | 0.16%
Table 6. The means and standard deviations of accuracies for the 16 pre-trained models on the average information content dataset.

Models | R1 | R2 | Mean | Std.
M1 | 82.55% | 83.50% | 83.03% | 0.48%
M2 | 88.13% | 88.51% | 88.32% | 0.19%
M3 | 87.86% | 87.62% | 87.74% | 0.12%
M4 | 84.10% | 85.35% | 84.73% | 0.62%
M5 | 84.07% | 83.91% | 83.99% | 0.08%
M6 | 84.51% | 83.98% | 84.25% | 0.26%
M7 | 84.46% | 84.78% | 84.62% | 0.16%
M8 | 82.21% | 83.21% | 82.71% | 0.50%
M9 | 85.40% | 85.47% | 85.43% | 0.03%
M10 | 84.38% | 84.31% | 84.34% | 0.03%
M11 | 85.83% | 84.97% | 85.40% | 0.43%
M12 | 85.59% | 85.90% | 85.74% | 0.15%
M13 | 86.90% | 86.66% | 86.78% | 0.12%
M14 | 85.16% | 84.96% | 85.06% | 0.10%
M15 | 85.76% | 85.66% | 85.71% | 0.05%
M16 | 85.69% | 85.14% | 85.42% | 0.27%
Table 7. The means and standard deviations of accuracies for the 24 two-stream dataset–model combinations under different datasets.

Datasets | Models | R1 | R2 | Mean | Std.
O-H | B1-B1 | 94.52% | 94.52% | 94.52% | 0.00%
O-H | B1-B2 | 94.07% | 94.62% | 94.35% | 0.27%
O-H | B2-B1 | 93.80% | 93.77% | 93.78% | 0.02%
O-H | B2-B2 | 93.41% | 94.02% | 93.72% | 0.31%
O-S | B1-B1 | 93.77% | 93.63% | 93.70% | 0.07%
O-S | B1-B2 | 93.49% | 93.65% | 93.57% | 0.08%
O-S | B2-B1 | 93.97% | 93.82% | 93.90% | 0.08%
O-S | B2-B2 | 93.90% | 94.23% | 94.07% | 0.16%
O-A | B1-B1 | 92.18% | 92.23% | 92.20% | 0.03%
O-A | B1-B2 | 92.86% | 92.95% | 92.90% | 0.04%
O-A | B2-B1 | 90.74% | 90.86% | 90.80% | 0.06%
O-A | B2-B2 | 91.00% | 90.71% | 90.86% | 0.15%
H-S | B1-B1 | 94.09% | 93.84% | 93.96% | 0.13%
H-S | B1-B2 | 93.58% | 93.82% | 93.70% | 0.12%
H-S | B2-B1 | 93.84% | 93.15% | 93.49% | 0.34%
H-S | B2-B2 | 93.36% | 93.39% | 93.37% | 0.02%
H-A | B1-B1 | 93.68% | 93.32% | 93.50% | 0.18%
H-A | B1-B2 | 93.43% | 93.51% | 93.47% | 0.04%
H-A | B2-B1 | 93.41% | 93.31% | 93.36% | 0.05%
H-A | B2-B2 | 93.70% | 93.48% | 93.59% | 0.11%
S-A | B1-B1 | 93.31% | 93.41% | 93.36% | 0.05%
S-A | B1-B2 | 93.12% | 92.71% | 92.91% | 0.21%
S-A | B2-B1 | 93.51% | 93.37% | 93.44% | 0.07%
S-A | B2-B2 | 93.14% | 92.62% | 92.88% | 0.26%
Table 8. The best accuracies for the first- and second-stream models, as well as the means of the four accuracies for each dataset combination.

Datasets | S1 | S2 | S12
O-H | 91.67% | 92.59% | 94.09%
O-S | 91.67% | 93.03% | 93.81%
O-A | 91.67% | 88.32% | 91.69%
H-S | 92.59% | 93.03% | 93.63%
H-A | 92.59% | 88.32% | 93.48%
S-A | 93.03% | 88.32% | 93.15%
Table 9. The means and standard deviations of accuracies for the 32 three-stream dataset–model combinations.

Datasets | Models | R1 | R2 | Mean | Std.
O-H-S | B1-B1-B1 | 94.26% | 94.11% | 94.19% | 0.08%
O-H-S | B1-B1-B2 | 94.31% | 94.43% | 94.37% | 0.06%
O-H-S | B1-B2-B1 | 94.06% | 93.75% | 93.90% | 0.15%
O-H-S | B1-B2-B2 | 94.72% | 94.74% | 94.73% | 0.01%
O-H-S | B2-B1-B1 | 94.60% | 94.26% | 94.43% | 0.17%
O-H-S | B2-B1-B2 | 94.21% | 94.55% | 94.38% | 0.17%
O-H-S | B2-B2-B1 | 94.19% | 94.25% | 94.22% | 0.03%
O-H-S | B2-B2-B2 | 94.74% | 93.85% | 94.30% | 0.44%
O-H-A | B1-B1-B1 | 94.40% | 94.11% | 94.25% | 0.15%
O-H-A | B1-B1-B2 | 94.13% | 94.16% | 94.14% | 0.02%
O-H-A | B1-B2-B1 | 94.52% | 94.21% | 94.36% | 0.15%
O-H-A | B1-B2-B2 | 94.60% | 94.45% | 94.53% | 0.08%
O-H-A | B2-B1-B1 | 93.89% | 93.82% | 93.85% | 0.03%
O-H-A | B2-B1-B2 | 93.92% | 94.13% | 94.02% | 0.10%
O-H-A | B2-B2-B1 | 94.06% | 93.92% | 93.99% | 0.07%
O-H-A | B2-B2-B2 | 94.02% | 94.23% | 94.13% | 0.10%
O-S-A | B1-B1-B1 | 93.61% | 93.56% | 93.59% | 0.03%
O-S-A | B1-B1-B2 | 93.49% | 93.34% | 93.42% | 0.08%
O-S-A | B1-B2-B1 | 93.92% | 94.07% | 94.00% | 0.08%
O-S-A | B1-B2-B2 | 93.31% | 93.53% | 93.42% | 0.11%
O-S-A | B2-B1-B1 | 93.65% | 93.61% | 93.63% | 0.02%
O-S-A | B2-B1-B2 | 93.55% | 93.48% | 93.51% | 0.03%
O-S-A | B2-B2-B1 | 93.89% | 94.09% | 93.99% | 0.10%
O-S-A | B2-B2-B2 | 94.19% | 94.30% | 94.25% | 0.05%
H-S-A | B1-B1-B1 | 94.26% | 93.99% | 94.13% | 0.14%
H-S-A | B1-B1-B2 | 93.72% | 93.82% | 93.77% | 0.05%
H-S-A | B1-B2-B1 | 94.16% | 94.26% | 94.21% | 0.05%
H-S-A | B1-B2-B2 | 94.23% | 94.11% | 94.17% | 0.06%
H-S-A | B2-B1-B1 | 93.63% | 93.87% | 93.75% | 0.12%
H-S-A | B2-B1-B2 | 93.02% | 93.63% | 93.32% | 0.31%
H-S-A | B2-B2-B1 | 93.92% | 94.11% | 94.01% | 0.09%
H-S-A | B2-B2-B2 | 94.23% | 94.30% | 94.26% | 0.03%
Table 10. The best accuracies of the first-, second-, and third-stream models, as well as the means of all eight accuracies of each dataset combination.

Datasets | S1 | S2 | S3 | S123
O-H-S | 91.67% | 92.59% | 93.03% | 94.32%
O-H-A | 91.67% | 92.59% | 88.32% | 94.16%
O-S-A | 91.67% | 93.03% | 88.32% | 93.72%
H-S-A | 92.59% | 93.03% | 88.32% | 93.95%
Table 11. The means and standard deviations of accuracies for the 16 four-stream dataset–model combinations.

Datasets | Models | R1 | R2 | Mean | Std.
O-H-S-A | B1-B1-B1-B1 | 94.40% | 94.38% | 94.39% | 0.01%
O-H-S-A | B1-B1-B1-B2 | 94.28% | 94.25% | 94.26% | 0.02%
O-H-S-A | B1-B1-B2-B1 | 94.64% | 94.45% | 94.54% | 0.09%
O-H-S-A | B1-B1-B2-B2 | 94.47% | 94.57% | 94.52% | 0.05%
O-H-S-A | B1-B2-B1-B1 | 95.03% | 93.77% | 94.40% | 0.63%
O-H-S-A | B1-B2-B1-B2 | 94.13% | 94.43% | 94.28% | 0.15%
O-H-S-A | B1-B2-B2-B1 | 94.55% | 94.60% | 94.58% | 0.03%
O-H-S-A | B1-B2-B2-B2 | 94.59% | 95.00% | 94.79% | 0.20%
O-H-S-A | B2-B1-B1-B1 | 94.11% | 94.57% | 94.34% | 0.23%
O-H-S-A | B2-B1-B1-B2 | 94.43% | 94.30% | 94.36% | 0.07%
O-H-S-A | B2-B1-B2-B1 | 94.11% | 94.21% | 94.16% | 0.05%
O-H-S-A | B2-B1-B2-B2 | 94.54% | 94.04% | 94.29% | 0.25%
O-H-S-A | B2-B2-B1-B1 | 93.72% | 94.38% | 94.05% | 0.33%
O-H-S-A | B2-B2-B1-B2 | 93.39% | 93.75% | 93.57% | 0.18%
O-H-S-A | B2-B2-B2-B1 | 93.68% | 94.67% | 94.18% | 0.50%
O-H-S-A | B2-B2-B2-B2 | 94.07% | 93.36% | 93.72% | 0.36%
Table 12. The best accuracies of the first-, second-, third-, and fourth-stream models, as well as the means of all accuracies for each combination.

Datasets | S1 | S2 | S3 | S4 | S1234
O-H-S-A | 91.67% | 92.59% | 93.03% | 88.32% | 94.28%
Table 13. The means and standard deviations of accuracies for the 24 two-stream dataset–model combinations under the same datasets.

Datasets | Models | R1 | R2 | Mean | Std.
O-O | B1-B2 | 92.21% | 92.49% | 92.35% | 0.14%
O-O | B1-B3 | 93.17% | 92.71% | 92.94% | 0.23%
O-O | B1-B4 | 91.85% | 92.11% | 91.98% | 0.13%
O-O | B2-B3 | 91.39% | 90.52% | 90.96% | 0.44%
O-O | B2-B4 | 92.37% | 92.26% | 92.32% | 0.05%
O-O | B3-B4 | 91.27% | 91.96% | 91.62% | 0.34%
H-H | B1-B2 | 93.36% | 92.67% | 93.02% | 0.34%
H-H | B1-B3 | 92.28% | 91.82% | 92.05% | 0.23%
H-H | B1-B4 | 92.33% | 92.50% | 92.42% | 0.09%
H-H | B2-B3 | 92.16% | 91.67% | 91.91% | 0.25%
H-H | B2-B4 | 92.16% | 93.19% | 92.67% | 0.51%
H-H | B3-B4 | 92.44% | 91.26% | 91.85% | 0.59%
S-S | B1-B2 | 93.48% | 93.68% | 93.58% | 0.10%
S-S | B1-B3 | 91.26% | 91.60% | 91.43% | 0.17%
S-S | B1-B4 | 91.73% | 91.79% | 91.76% | 0.03%
S-S | B2-B3 | 91.36% | 91.80% | 91.58% | 0.22%
S-S | B2-B4 | 91.87% | 92.37% | 92.12% | 0.25%
S-S | B3-B4 | 92.37% | 92.23% | 92.30% | 0.07%
A-A | B1-B2 | 88.82% | 88.78% | 88.80% | 0.02%
A-A | B1-B3 | 87.77% | 87.67% | 87.72% | 0.05%
A-A | B1-B4 | 87.06% | 87.72% | 87.39% | 0.33%
A-A | B2-B3 | 87.77% | 86.99% | 87.38% | 0.39%
A-A | B2-B4 | 86.71% | 87.38% | 87.05% | 0.33%
A-A | B3-B4 | 87.84% | 87.45% | 87.65% | 0.20%
Table 14. The mean accuracies for two-stream models using the four datasets.

Datasets | Mean
O-O | 92.35%
H-H | 93.02%
S-S | 93.58%
A-A | 88.80%
Table 15. The means and standard deviations of accuracies for the 16 three-stream dataset–model combinations.

Datasets | Models | R1 | R2 | Mean | Std.
O-O-O | B1-B2-B3 | 93.97% | 93.00% | 93.49% | 0.49%
O-O-O | B1-B2-B4 | 92.37% | 92.93% | 92.65% | 0.28%
O-O-O | B1-B3-B4 | 92.64% | 93.68% | 93.16% | 0.52%
O-O-O | B2-B3-B4 | 92.52% | 92.04% | 92.28% | 0.24%
H-H-H | B1-B2-B3 | 92.73% | 91.84% | 92.28% | 0.44%
H-H-H | B1-B2-B4 | 90.93% | 92.16% | 91.55% | 0.61%
H-H-H | B1-B3-B4 | 91.70% | 91.87% | 91.79% | 0.09%
H-H-H | B2-B3-B4 | 92.62% | 92.09% | 92.36% | 0.26%
S-S-S | B1-B2-B3 | 93.07% | 91.79% | 92.43% | 0.64%
S-S-S | B1-B2-B4 | 92.21% | 91.70% | 91.96% | 0.26%
S-S-S | B1-B3-B4 | 92.04% | 92.67% | 92.36% | 0.32%
S-S-S | B2-B3-B4 | 92.28% | 92.55% | 92.42% | 0.14%
A-A-A | B1-B2-B3 | 89.14% | 88.75% | 88.94% | 0.20%
A-A-A | B1-B2-B4 | 87.43% | 87.72% | 87.58% | 0.15%
A-A-A | B1-B3-B4 | 88.37% | 87.55% | 87.96% | 0.41%
A-A-A | B2-B3-B4 | 88.40% | 88.44% | 88.42% | 0.02%
Table 16. The mean accuracies for three-stream models using the four datasets.

Datasets | Mean
O-O-O | 93.49%
H-H-H | 92.28%
S-S-S | 92.43%
A-A-A | 88.94%
Table 17. The means and standard deviations of accuracies for the 4 four-stream dataset–model combinations.

Datasets | Models | R1 | R2 | Mean | Std.
O-O-O-O | B1-B2-B3-B4 | 93.90% | 93.41% | 93.66% | 0.25%
H-H-H-H | B1-B2-B3-B4 | 93.19% | 92.21% | 92.70% | 0.49%
S-S-S-S | B1-B2-B3-B4 | 92.13% | 92.26% | 92.20% | 0.07%
A-A-A-A | B1-B2-B3-B4 | 88.30% | 89.09% | 88.70% | 0.39%
Table 18. The mean accuracies for four-stream models using the four datasets.

Datasets | Mean
O-O-O-O | 93.66%
H-H-H-H | 92.70%
S-S-S-S | 92.20%
A-A-A-A | 88.70%
Table 19. Summary of the accuracies of multiple-stream models on the four datasets.

Datasets | One-S. | Two-S. | Three-S. | Four-S.
O | 91.67% | 92.35% | 93.49% | 93.66%
H | 92.59% | 93.02% | 92.28% | 92.70%
S | 93.03% | 93.58% | 92.43% | 92.20%
A | 88.32% | 88.80% | 88.94% | 88.70%
Table 20. The accuracy differences among all multiple-stream models on the four datasets.

Datasets | Two-S. vs. One-S. | Three-S. vs. Two-S. | Four-S. vs. Three-S.
O | 0.68% | 1.14% | 0.17%
H | 0.43% | −0.73% | 0.42%
S | 0.55% | −1.15% | −0.23%
A | 0.48% | 0.15% | −0.25%
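Each entry in Table 20 is the accuracy of an n-stream model minus that of the corresponding (n − 1)-stream model on the same dataset in Table 19. Recomputing the differences from the rounded values in Table 19, as in the sketch below, reproduces Table 20 to within 0.01%; the small discrepancies (e.g., −0.74% vs. −0.73% for H) suggest the published differences were taken before rounding.

```python
# A minimal sketch relating Table 20 to Table 19: each entry is the accuracy
# of an n-stream model minus that of the (n-1)-stream model on the same dataset.
# Values are copied from Table 19 (already rounded to two decimals).
table19 = {                       # One-S., Two-S., Three-S., Four-S. (%)
    "O": [91.67, 92.35, 93.49, 93.66],
    "H": [92.59, 93.02, 92.28, 92.70],
    "S": [93.03, 93.58, 92.43, 92.20],
    "A": [88.32, 88.80, 88.94, 88.70],
}
for dataset, acc in table19.items():
    diffs = [f"{b - a:+.2f}%" for a, b in zip(acc, acc[1:])]
    print(dataset, diffs)         # e.g., O ['+0.68%', '+1.14%', '+0.17%']
```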
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
