1. Introduction
The liver is the largest organ, located underneath the right ribs and below the lung base. It has a role in digesting food [
1,
2]. It is responsible for filtering blood cells, processing and storing nutrients, and converting some these nutrients into energy; it also breaks down toxic agents [
3,
4]. There are two main hepatic lobes, the left and right lobes. When the liver is viewed from the undersurface, there are two more lobes, the quadrate and caudate lobes [
5].
Hepatocellular carcinoma (HCC) [
6] may occur when the liver cells begin to grow out of control and can spread to other areas in the body. Primary hepatic malignancies develop when there is abnormal behavior of the cells [
7].
Liver cancer has been reported to be the second most frequent cancer to cause death in men, and sixth for women. About 750,000 people got diagnosed with liver cancer in 2008, 696,000 of which died from it. Globally, the rate of infection of males is twice than that of females [
8]. Liver cancer can be developed from viral hepatitis, which is much more problematic. According to the World Health Organization, WHO, about 1.45 million deaths a year occur because of this infection [
9]. In 2015, Egypt was named as the country with the highest rate of adults infected by viral hepatitis C (HCV), at 7% [
9]. Because the treatment could not be reached by all infected people, the Egypt’s government launched a “100 Million
Seha” (
seha is an Arabic word meaning “health”) national campaign between October 2018 and April 2019. At the end of March 2019, around 35 million people had been examined for HCV [
10].
Primary hepatic malignancy is more prevalent in Southeast Asia and Africa than in the United States [
11,
12]. The reported survival rate is generally 18%. However, survival rates rely on the stage of disease at the time of diagnosis [
13].
Primary hepatic malignancy is diagnosed by clinical, laboratory, and imaging tests, including ultrasound scans, magnetic resonance imaging (MRI) scans, and computed tomography (CT) scans [
14]. A CT scan utilizes radiation to capture detailed images around the body from different angles, including sagittal, coronal, and axial images. It shows organs, bones, and soft tissues; the information is then processed by the computer to create images, usually in DICOM format [
15]. Quite often, the examination requires intravenous injection of contrast material. The scans can help to differentiate malignant lesions from acute infection, chronic inflammation, fibrosis, and cirrhosis [
16].
Staging of hepatic malignancies depends on the size and location of the malignancy [
16,
17]. Hence, it is important to develop an automatic procedure to detect and extract the cancer region from the CT scan accurately. Image segmentation is the process of partitioning the liver region in the CT scan into regions, where each region represents a semantic part of the liver [
18,
19]. This is a fundamental step to support the diagnosis by radiologists, and a fundamental step to create automatic computer-aided diagnosis (CAD) systems [
20,
21,
22]. CT scans of the liver are usually interpreted by manual or semi-manual techniques, but these techniques are subjective, expensive, time-consuming, and highly error prone.
Figure 1 shows an example where the gray level intensities of the liver and the spleen are too similar to be differentiated by the naked eye. To overcome these obstacles and improve the quality of liver tumors’ diagnosis, multiple computer-aided methods have been developed. However, these systems have not been that great at the segmentation of the liver and lesions due to multiple challenges, such as the low contrast between the liver and neighboring organs and between the liver and tumors, different contrast levels in tumors, variation in the numbers and sizes of tumors, tissues’ abnormalities, and irregular tumor growth in response to medical treatment. Therefore, a new approach must be used to overcome these obstacles [
23].
In this work, a review of wide variety of recent publications of image analysis for liver malignancy segmentation is introduced. In recent years, extensive research has depended on supervised learning methods. The supervised method use inputs labeled to train a model for a specific task—liver or tumor segmentation, in this case. On top of these learning methods are the deep learning methods [
26,
27]. There are many different models of deep learning that have been introduced, such as stacked auto-encoder (SAE), deep belief nets (DBN), convolutional neural networks (CNNs), and Deep Boltzmann Machines (DBM) [
28,
29,
30,
31]. The superiority of the deep learning models in terms of accuracy has been established. However, it is still a challenge to find proper training dataset, which should be huge in size and prepared by experts.
CNNs are considered the best of deep learning methods used. Elshaer et al. [
13] reduced the computation time of a large number of slices by using two trained deep CNN models. The first model was used to get the liver region, and the second model was used for avoiding fogginess from image re-sampling and for avoiding missed small lesions.
Wen Li et al. [
28] utilized a convolutional neural network (CNN) that uses image patches. It considers an image patch for each pixel, such that the pixel of interest is in the center of that patch. The patches are divided into normal or tumor liver tissue. If the patch contains at least 50 percent or more of tumor tissue, the patch is labeled as a positive sample. The reported accuracy reached about 80.6%. The work presented in [
12,
13] reported s more than 94% accuracy rate for classifying the images either as normal or abnormal if the image showed a liver with tumor regions. The CNN model has different architectures—i.e., Alex Net, VGG-Net, ResNet, etc. [
32,
33,
34]. The work presented by Bellver et al. [
5] used VGG-16 architecture as the base network in their work. Other work [
11,
16,
29,
32,
35,
36] has used two-dimensional (2D) U-Net, which is designed mainly for medical image segmentation.
The main objective of this work is to present a novel segmentation technique for liver cross-sectional CT scans based on a deep learning model that has proven successful in image segmentation for scene understanding, namely SegNet [
37]. Memory and performance efficiency are the main advantages of this architecture over the other models. The model has been modified to fit two-class classification tasks.
The paper is organized as follows. In the next section, a review is presented on recent segmentation approaches for the liver and lesions in CT images, as well as a short introduction to the basic concepts addressed in this work.
Section 3 presents the proposed method and the experimental dataset. Experimental results are presented in
Section 4. Finally, conclusions are presented and discussed in
Section 5.
2. Basic Concepts
Convolutional neural networks are similar to traditional neural networks [
20,
38,
39]. A convolutional neural network (CNN) includes one or more layers of convolutional, fully connected, pooling, or fully connected and rectified linear unit (ReLU) layers. Generally, as the network becomes deeper with many more parameters, the accuracy of the results increases, but it also becomes more computationally complex.
Recently, CNN models have been used widely in image classification for different applications [
20,
34,
40,
41,
42] or to extract features from the convolutional layers before or after the down sampling layers [
41,
43]. However, the architectures discussed above are not suitable for image segmentation or pixel-wise classifications. VGG-16 network architecture [
44] is a type of CNN model. The network includes 41 layers. There are 16 layers with learnable weights: there are 13 convolutional layers and three fully connected layers.
Figure 2 shows the architecture of VGG-16 as introduced by Simonyan and Zisserman [
44].
Most pixel-wise classification network architectures are of encoder–decoder architecture, where the encoder part is the VGG-16 model. The encoder gradually decreases the spatial dimension of the images with pooling layers; however, the decoder retrieves the details of the object and spatial dimensions for fast and precise segmentation of images. U-Net [
45,
46] is a convolutional encoder-decoder network used widely for semantic image segmentation. It is interesting because it applies a fully convolutional network architecture for medical images. However, it is very time- and memory-consuming.
The semantic image segmentation approach uses the predetermined weights of the pertained VGG-16 network [
45]. Badrinarayanan et al. [
37], have proposed an encoder–decoder deep network, named SegNet, for scene understanding applications tested on road and indoor scenes. The main parts of the core trainable segmentation engine are an encoder network, a decoder network, and a pixel-wise classification layer. The architecture of the encoder network is similar to the 13 convolutional layers in the VGG-16 network. The function of the decoder network is mapping the features of encoder with low to full-input resolution feature maps for pixel-wise classification.
Figure 3 shows a simple illustration of the SegNet model during the down sampling (max-pooling or subsampling layers) of the encoder part. Instead of transferring the pixel values to the decoder, the indices of the chosen pixel are saved and synchronized with the decoder for the up-sampling process. In SegNet, more shortcut connections are presented. The indices are copied from max pooling instead of copying the features of encoder, such as in FCN [
47], so the memory and performance of SegNet is much more efficient than FCN and U-Net.
5. Conclusions
This paper presents experimental work to adopt a deep learning model, used for semantic segmentation of road scene understanding, for tumor segmentation in CT Liver scans in DICOM format.
SegNet is recent encoder–decoder network architecture that employs the trained VGG-16 image classification network as encoder, and employs corresponding decoder architecture to transform the features back into the image domain to reach a pixel-wise classification at the end. The advantage of SegNet over standard auto-encoder architecture is in the simple yet very efficient modification where the max-pooling indices of the feature map are saved, instead of saving the feature maps in full. As a result, the architecture is much more efficient in training time, memory requirements, and accuracy.
To facilitate binary segmentation of medical images, the classification layer was replaced with binary pixel classification layer. For training and testing, the standard 3D-IRCADb-01 dataset was used. The proposed method correctly detects most parts of the tumor, with accuracy above 86% for tumor classification. However, by examining the results, there were few false positives that could be improved by applying false positive filters or by training the model on a larger dataset.
As a future work, we propose using a new deep learning model as an additional level to increase the localization accuracy of the tumor, and hence reduce the FN rate and increase the IoU metric, like the work introduced in [
20].