For intracranial hemorrhage detection, the proposed system comprises five phases: image collection (RSNA 2019 brain CT hemorrhage database), segmentation (FCM clustering algorithm), feature extraction (HoG, LTP, and LBP), feature optimization (binary cuckoo search algorithm), and classification (OGRU). The flowchart of the proposed system is illustrated in Figure 1.
3.1. Image Collection
The proposed OGRU model’s performance is validated on an external database: the RSNA 2019 brain CT hemorrhage database, which consists of 25,272 3D brain scans with 870,301 slices and a pixel size of $512 \times 512$. In this manuscript, the proposed OGRU model is re-trained on this database, and the results are then validated with different cross-folds. In the RSNA 2019 brain CT hemorrhage database, the 3D brain scans are labeled by annotators using five brain hemorrhage label types: intraparenchymal, epidural, intraventricular, subarachnoid, and subdural. Furthermore, the brain scans are collected from three institutions: Stanford University, Universidade Federal de Sao Paulo, and Thomas Jefferson University Hospital. The annotators had no information about symptom acuity, medical history, patient age, or prior examinations. A scan is automatically labeled as intracranial hemorrhage when at least one slice contains at least one intracranial hemorrhage type. Sample acquired 3D brain scans are depicted in Figure 2.
3.2. Image Segmentation
After collecting the 3D brain scans, image segmentation is accomplished using the FCM clustering algorithm to localize the specific object in complex templates. Hence, the FCM uses fuzzy-set theory to assign a data object to the clusters. In the FCM clustering algorithm, each object is considered a member of every cluster with a variable degree of membership. The similarity between objects is estimated utilizing the Euclidean distance measure, which plays a crucial role in selecting the precise clusters. In every iteration, the objective function $J_m$ is reduced in the FCM clustering algorithm, as defined in Equation (1):

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^{2} \tag{1}$$

where $C$ indicates the number of clusters, $u_{ij}$ states the degree of membership of the $i$-th data point $x_i$ in the cluster $j$, $c_j$ indicates the center vector of the cluster $j$, and $N$ denotes the number of data points.
In addition, the norm $\left\| x_i - c_j \right\|$ estimates the similarity of the data point $x_i$ to the center vector of the cluster $c_j$. Then, $u_{ij}$ is determined for a given data point $x_i$ using Equation (2):

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\left\| x_i - c_j \right\|}{\left\| x_i - c_k \right\|} \right)^{\frac{2}{m-1}}} \tag{2}$$

where $m$ states the fuzziness coefficient. Additionally, the center vector is determined using Equation (3) [21,22]:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{3}$$
The fuzziness coefficient $m$ controls the clustering tolerance in Equations (2) and (3): a smaller fuzziness coefficient $m$ yields less overlap between the clusters. In this clustering algorithm, the accuracy $\varepsilon$ is estimated using the change in $u_{ij}$ from the present iteration $k$ to the next iteration $k+1$, which is mathematically specified in Equation (4):

$$\varepsilon = \max_{ij} \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| \tag{4}$$

where $u_{ij}^{(k)}$ and $u_{ij}^{(k+1)}$ indicate the degrees of membership at the iterations $k$ and $k+1$, and $\max$ specifies the highest vector value. Furthermore, hybrid feature extraction is accomplished using the HoG, LBP, and LTP descriptors to extract features from the segmented images. Sample segmented 3D brain scans are depicted in Figure 3.
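The FCM update loop of Equations (1)–(4) can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name `fcm` and its defaults (e.g., fuzziness coefficient $m = 2$) are assumptions.

```python
import numpy as np

def fcm(data, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy C-means sketch: returns membership matrix U and centers."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Random initial memberships; each point's degrees sum to 1.
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        um = u ** m
        # Equation (3): cluster centers as membership-weighted means.
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
        # Euclidean distances from every point to every center.
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Equation (2): membership update from distance ratios.
        power = 2.0 / (m - 1.0)
        new_u = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** power).sum(axis=2)
        # Equation (4): stop when the largest membership change is below eps.
        if np.abs(new_u - u).max() < eps:
            u = new_u
            break
        u = new_u
    return u, centers
```

On well-separated data the loop typically converges in a handful of iterations; for image segmentation, `data` would hold the pixel intensities of a brain slice reshaped into a column of feature vectors.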
3.3. Hybrid Feature Extraction
After image segmentation, hybrid feature extraction is accomplished using the HoG, LBP, and LTP feature descriptors, which are selected based on a feature importance calculation. In image processing applications, the HoG descriptor is often used to extract feature values from medical images. In the HoG feature descriptor, the magnitude and orientation of the brain scans $I(x, y)$ are initially computed. The horizontal gradient $G_x$ and vertical gradient $G_y$ are mathematically specified in Equation (5):

$$G_x(x, y) = I(x+1, y) - I(x-1, y), \quad G_y(x, y) = I(x, y+1) - I(x, y-1) \tag{5}$$

The computed vertical gradient $G_y$ and horizontal gradient $G_x$ are utilized to calculate the gradient magnitude $M(x, y)$ and angular orientation $\theta(x, y)$ that are defined in Equations (6) and (7):

$$M(x, y) = \sqrt{G_x(x, y)^{2} + G_y(x, y)^{2}} \tag{6}$$

$$\theta(x, y) = \tan^{-1}\left( \frac{G_y(x, y)}{G_x(x, y)} \right) \tag{7}$$
The gradient magnitude $M(x, y)$ and angular orientation $\theta(x, y)$ partition the 3D brain scans into different cells. Furthermore, the orientations belonging to the same cell are integrated and quantized into histogram bins, and the respective bins are then combined into the final histogram [23,24]. The total number of features $F$ is estimated utilizing Equation (8):

$$F = N_b \times B \times S \tag{8}$$

where $N_b$ represents the number of bins, $B$ specifies the number of blocks per 3D brain scan, and $S$ denotes the block size.
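As an illustration of Equations (5)–(7), the sketch below computes the gradients, magnitude, and orientation of a single cell and accumulates one magnitude-weighted orientation histogram. The function name and the 9-bin unsigned-orientation convention are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def hog_cell_histogram(img, n_bins=9):
    """Gradients (Eq. 5), magnitude/orientation (Eqs. 6-7), one histogram."""
    img = img.astype(float)
    # Equation (5): central-difference horizontal and vertical gradients.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    # Equations (6) and (7): magnitude and unsigned orientation in [0, 180).
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Quantize orientations into n_bins, weighting each vote by magnitude.
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
```

Concatenating such per-cell histograms over all blocks yields the $F = N_b \times B \times S$ feature count of Equation (8).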
In addition, the LBP and LTP encode the relation between the neighborhood pixels and the referenced pixel by calculating the gray-level difference. The LBP is an effective texture feature descriptor, which transforms the 3D brain scans into labels based on luminance values. In a 3D brain scan, the position of a pixel is represented as $(x_c, y_c)$; the central pixel value $g_c$ is used as the threshold to signify the neighborhood pixels $g_p$. Additionally, each binary pixel value is weighted by a power of two, the values are summed to generate a decimal number, and the result is stored at the location of $(x_c, y_c)$. The LBP is mathematically specified in Equations (9) and (10) [25]:

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p} \tag{9}$$

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{10}$$

where $P$ denotes the number of neighborhood pixels and $g_c$ specifies the gray-level value of the center pixel $(x_c, y_c)$.
Similarly, the LTP is an extension of the LBP that uses a thresholding constant to split the pixel intensity differences into three values. In the LTP feature descriptor, the thresholding function is defined using Equation (11) [26]:

$$s'(x) = \begin{cases} 1, & x \geq t \\ 0, & -t < x < t \\ -1, & x \leq -t \end{cases} \tag{11}$$

where $t$ denotes the thresholding constant. Subsequently, the extracted 9824 feature vectors are given as the input to the binary cuckoo search algorithm for feature optimization. The graphical representation of the feature importance calculation is shown in Figure 4.
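Equations (9)–(11) can be illustrated with the following sketch, which computes an LBP code and the upper binary pattern of the usual LTP split for every interior pixel of a 2D slice. The clockwise neighbor ordering and the default threshold $t = 5$ are assumptions for illustration.

```python
import numpy as np

def lbp_ltp_codes(img, t=5):
    """LBP (Eqs. 9-10) and upper-LTP (Eq. 11) codes per 3x3 neighborhood."""
    img = img.astype(int)
    h, w = img.shape
    # Offsets of the 8 neighbors, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros((h - 2, w - 2), dtype=int)
    ltp_upper = np.zeros((h - 2, w - 2), dtype=int)  # "positive half" pattern
    center = img[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        diff = neigh - center
        # Equation (10): binary thresholding against the center pixel.
        lbp += (diff >= 0).astype(int) << p
        # Equation (11): ternary split; here only the upper (diff >= t) half
        # is encoded as a binary pattern, as in the standard LTP decomposition.
        ltp_upper += (diff >= t).astype(int) << p
    return lbp, ltp_upper
```

Histograms of these codes over the segmented region would then form the LBP/LTP part of the hybrid feature vector.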
3.4. Feature Optimization
After feature extraction, feature optimization is accomplished using the binary cuckoo search algorithm, which is inspired by obligate brood parasitism. Cuckoo birds lay their eggs in the nests of host birds. The cuckoo mimics external properties of the host eggs, such as color, spots, and size, and then places its eggs in the host bird's nest. When this approach is ineffective, the host bird identifies the cuckoo eggs and either abandons the nest or throws the cuckoo eggs away; otherwise, the cuckoo succeeds in its strategy and proceeds to the next generation. Based on this concept, the cuckoo search algorithm is generated [27,28], and the step-by-step process of this algorithm is given below:
Initialization Stage: Firstly, the host nest population is selected randomly.
Generation of New Cuckoo Stage: After randomly initializing the nest population in search space, the initialized cuckoos are assessed by utilizing an objective function for identifying better solutions.
Fitness Evaluation Stage: Compute the fitness based on Equation (12), which helps to select the best solution:

$$Fit = \frac{1}{L} \sum_{i=1}^{L} \left( x_i - \hat{x}_i \right)^{2} \tag{12}$$

where $L$ indicates the feature length, $x$ denotes the state vector of the chaotic system, and $\hat{x}$ represents the state vector of the estimated system.
Updating Stage: A cosine transform is employed to revise the initial solution of the Levy flights. A nest is chosen randomly and the quality of the novel solution is assessed. If the quality of the new solution is superior to that of the old solution, the old solution is replaced with the new solution; otherwise, the old solution is retained as the best solution. The Levy flight used by the cuckoo search algorithm is mathematically represented in Equation (13):

$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus Levy(\beta) \tag{13}$$

The Levy flight of Equation (13) is generated from Gaussian distributions as shown in Equations (14) and (15):

$$Levy(\beta) \sim \frac{u}{|v|^{1/\beta}}, \quad u \sim N(0, \sigma_u^{2}),\; v \sim N(0, 1) \tag{14}$$

$$\sigma_u = \left( \frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\!\left(\frac{1+\beta}{2}\right)\beta\, 2^{(\beta-1)/2}} \right)^{1/\beta} \tag{15}$$

where $\alpha$ indicates a constant (step size) value and $t$ denotes the current generation.
Reject Worst Nest Stage: In this stage, novel nests are generated randomly, and the worst nests are discarded based on the probability values. Additionally, the solutions are graded based on the fitness function. Finally, the best solutions are identified and recognized as optimal solutions.
Stopping Criterion Stage: This process is repeated until the maximum iteration is reached.
Immigration of Cuckoos: Once the cuckoos have grown and matured, they live in their own area and society for a certain period. The society with the best profit value is selected after cuckoo groups have formed in dissimilar areas. It is hard to recognize which cuckoo belongs to which group when mature cuckoos live all over the environment; to avoid this concern, cuckoo grouping is carried out using a decision tree method. Each cuckoo flies $\lambda\%$ of the way toward the goal habitat, with a deviation of $\varphi$ radians. These two parameters, $\lambda$ and $\varphi$, help cuckoos identify their positions in the environment. For each cuckoo, $\lambda$ and $\varphi$ are determined using Equations (16) and (17):

$$\lambda \sim U(0, 1) \tag{16}$$

$$\varphi \sim U(-\omega, \omega) \tag{17}$$

where $U$ indicates the uniform random number generator and $\omega$ denotes the parameter which compels the deviation from the goal habitat.
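The fitness of Equation (12) can be sketched directly; note that the mean-squared-error form below is a reconstruction from the stated symbol definitions, not the authors' exact formula.

```python
import numpy as np

def fitness(x_true, x_est):
    """Equation (12): mean squared error between the chaotic system's
    state vector and the estimated system's state vector."""
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    return float(np.mean((x_true - x_est) ** 2))
```

Lower fitness values indicate feature subsets whose estimated state vector better matches the reference, so nests are ranked in ascending order of this score.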
The parameter settings of the cuckoo search algorithm are given as follows: the number of iterations is 100, the step length is 0.01, the Levy flight distribution parameter is 1.5, the number of nests is 20, the number of transition groups is 8, the transition separation coefficient is 1, and the transition probability coefficient is 0.1. Next, the selected 5409 feature vectors are given as the input to the OGRU model to classify six classes: intraparenchymal, subdural, subarachnoid, intraventricular, epidural, and any other. The flowchart of the binary cuckoo search algorithm is represented in Figure 5.
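A Levy-flight nest update of the kind used in the Updating Stage (Equations (13)–(15)) can be sketched with Mantegna's algorithm. The bias of the step toward the current best nest and the defaults $\alpha = 0.01$, $\beta = 1.5$ mirror the stated parameter settings, but the exact update form is an assumption for illustration.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(beta=1.5, size=1, rng=None):
    """Draw Levy-distributed step lengths via Mantegna's algorithm (Eqs. 14-15)."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)   # Gaussian numerator
    v = rng.normal(0.0, 1.0, size)     # Gaussian denominator
    return u / np.abs(v) ** (1 / beta)

def cuckoo_update(nest, best, alpha=0.01, beta=1.5, rng=None):
    """Equation (13): new candidate = old nest + alpha * Levy step,
    scaled by the distance to the current best nest."""
    step = levy_step(beta, nest.shape[0], rng)
    return nest + alpha * step * (nest - best)
```

The heavy-tailed Levy steps mix many small local moves with occasional long jumps, which is what lets the cuckoo search escape local optima during feature selection.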
3.5. Classification
The GRU is an updated version of the Long Short-Term Memory (LSTM) network that integrates the forget and input gates into a single gate named the "update gate"; further, the GRU model includes an additional gate named the "reset gate". Compared to the LSTM network, the GRU model is simpler; therefore, it is becoming increasingly popular. The GRU modulates the feature information inside the unit without using a memory cell. In the GRU model, the activation $h_t$ is a linear interpolation between the previous activation $h_{t-1}$ and the candidate activation $\tilde{h}_t$ at the time step $t$, which is mathematically specified in Equation (18) [29,30]:

$$h_t = (1 - z_t)\, h_{t-1} + z_t\, \tilde{h}_t \tag{18}$$

where $z_t$ represents the update gate that decides the number of units updating their activations and $\tilde{h}_t$ states the candidate activation.
The mathematical expressions of the update gate and the candidate activation are defined in Equations (19) and (20):

$$z_t = \sigma\left( W_z x_t + U_z h_{t-1} \right) \tag{19}$$

$$\tilde{h}_t = \tanh\left( W x_t + U \left( r_t \odot h_{t-1} \right) \right) \tag{20}$$

where $r_t$ states the reset gate and $\tanh$ states the hyperbolic tangent function.
The reset gate $r_t$ is mathematically calculated using Equation (21):

$$r_t = \sigma\left( W_r x_t + U_r h_{t-1} \right) \tag{21}$$

where $W$ states a parameter (weight) matrix and $\sigma$ indicates the sigmoid function.
In this scenario, the update gate $z_t$ controls the prior states, where the long-term dependency units are called active update gates and the short-term dependency units are called active reset gates. The Stochastic Gradient Descent (SGD) optimization algorithm is applied in the GRU model for optimizing stochastic objective functions based on lower-order moments. The iterative SGD algorithm initially starts at a random point on the gradient curve and then descends along the slope with the help of a user-defined learning rate until the gradient curve reaches its minimum value. In this study, the SGD optimization algorithm updates the weight or parameter $w$ utilizing the gradient value $\nabla L(w)$, where the gradient value is multiplied by the learning rate $\eta$. Therefore, the weight update is mathematically defined in Equation (22):

$$w_{t+1} = w_t - \eta\, \nabla L(w_t) \tag{22}$$

where $\eta$ denotes the learning rate and the term $\nabla L(w_t)$ states the gradient of the loss function $L$ that is minimized with respect to $w$.
If any decimal values occur, the GRU model approximately rounds off the respective decimal values into complete values. The architecture of the GRU model is specified in Figure 6.
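Equations (18)–(22) can be summarized in a sketch of one GRU step and one SGD weight update. The parameter names and shapes are illustrative assumptions, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU step implementing Equations (18)-(21)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # Eq. (19): update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # Eq. (21): reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # Eq. (20): candidate
    return (1.0 - z) * h_prev + z * h_cand           # Eq. (18): interpolation

def sgd_update(w, grad, lr=0.0025):
    """Equation (22): one stochastic gradient descent step on a weight."""
    return w - lr * grad
```

In training, `gru_cell` is applied across the feature sequence and `sgd_update` is applied to each weight matrix with the gradient of the classification loss.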
The parameter settings of the GRU model are listed as follows: the lambda loss amount is 0.0015, the number of hidden units is 32, the learning rate is 0.0025, and the number of iterations is 100. To resolve unconstrained non-linear optimization issues, a BFGS algorithm is integrated with the GRU model. The BFGS algorithm minimizes the objective function $f(x)$, further reducing the gradient value toward a local minimum. The optimization problem is mathematically defined in Equation (23):

$$\min_{x} f(x) \tag{23}$$

where $f(x)$ states a non-convex function.
The point $x_{k+1}$ is computed in the next iteration $k+1$ using the point $x_k$, as mentioned in Equation (24):

$$x_{k+1} = x_k + \alpha_k p_k \tag{24}$$

where $p_k$ states the search direction and $\alpha_k$ denotes the step size; the minimizer $\alpha_k$ is mathematically defined in Equation (25):

$$\alpha_k = \arg\min_{\alpha} f(x_k + \alpha p_k) \tag{25}$$

Additionally, the search direction is specified in Equation (26):

$$p_k = -\left[ \nabla^{2} f(x_k) \right]^{-1} \nabla f(x_k) \tag{26}$$

where $\nabla^{2} f(x_k)$ denotes the second derivative of $f$, which is called the Hessian matrix.
In this scenario, the quasi-Newton method is employed to compute an approximation $B_k$ of the Hessian, as mentioned in Equation (27):

$$B_{k+1} = B_k + \frac{y_k y_k^{T}}{y_k^{T} s_k} - \frac{B_k s_k s_k^{T} B_k}{s_k^{T} B_k s_k} \tag{27}$$

where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
Furthermore, the inverse approximation $H_{k+1} = B_{k+1}^{-1}$ is computed utilizing Equation (28):

$$H_{k+1} = \left( I - \frac{s_k y_k^{T}}{y_k^{T} s_k} \right) H_k \left( I - \frac{y_k s_k^{T}}{y_k^{T} s_k} \right) + \frac{s_k s_k^{T}}{y_k^{T} s_k} \tag{28}$$

The classes intraparenchymal, subdural, subarachnoid, intraventricular, epidural, and any other are classified based on this approximation. The experimental results of the OGRU-CSO model are specified in Section 4.
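The BFGS iteration of Equations (24)–(28) can be sketched as below. A crude backtracking line search stands in for the exact minimizer of Equation (25), and maintaining the inverse approximation $H$ directly (Equation (28)) avoids solving a linear system at every step. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, max_iter=50, tol=1e-8):
    """Minimal BFGS sketch following Equations (24)-(28)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                       # inverse-Hessian approximation (Eq. 28)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:     # stationary point reached
            break
        p = -H @ g                      # Eq. (26): quasi-Newton search direction
        alpha, fx = 1.0, f(x)           # Eq. (25): crude backtracking search
        while f(x + alpha * p) > fx and alpha > 1e-10:
            alpha *= 0.5
        x_new = x + alpha * p           # Eq. (24): take the step
        s = x_new - x                   # s_k = x_{k+1} - x_k
        y = grad(x_new) - g             # y_k = gradient difference (Eq. 27)
        sy = float(s @ y)
        if sy > 1e-12:                  # curvature condition before updating H
            rho = 1.0 / sy
            I = np.eye(n)
            H = ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
                 + rho * np.outer(s, s))
        x = x_new
    return x
```

On a quadratic objective the inverse-Hessian estimate becomes exact after a few updates, which is why BFGS typically converges far faster than plain gradient descent near a minimum.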