Machine Learning Platform for Disease Diagnosis with Contrast CT Scans

Jin, Jennifer; Kim, Mira; Kim, Soo Dong; Jin, Daniel

doi:10.3390/app14177822

¹ School of Computer Science and Engineering, California State University, San Bernardino, CA 92407, USA
² School of Computer Science, California State University, Fullerton, CA 92831, USA
³ School of Software, Soongsil University, Seoul 06978, Republic of Korea
⁴ Department of Radiology, Loma Linda University Medical Center, Loma Linda, CA 92354, USA
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(17), 7822; https://doi.org/10.3390/app14177822

Submission received: 24 July 2024 / Revised: 22 August 2024 / Accepted: 22 August 2024 / Published: 3 September 2024

(This article belongs to the Special Issue Advances in Machine Learning for Healthcare Applications)

Abstract:

Machine learning has gained significant recognition as a powerful approach for medical diagnosis using medical images. Among various medical imaging modalities, contrast-enhanced CT (CECT) is utilized to obtain additional diagnostic information that improves visualization and evaluation of certain abnormalities in the human body, as well as to observe temporal changes in lesions and tumors across different time phases. However, developing such medical diagnostic systems presents two significant challenges: high technical complexity and substantial development effort. This paper presents a software platform that effectively addresses these challenges. Specifically, we propose a unified software process that fully automates contrast-enhanced CT (CECT)-specific disease diagnosis, with key tasks performed by leveraging task-specific machine learning models to enhance accuracy. The platform incorporates a suite of specialized machine learning models into the diagnostic process, enabling precise diagnosis of lesions, malignancies, tumors, tumor characteristics, and temporal changes over phases. Moreover, the platform has been designed according to the Open–Closed Principle, allowing it to be applicable to a wide range of CECT-based diagnostic systems. The platform has been implemented in Python using the Scikit-learn and TensorFlow libraries. To validate its applicability and reusability, a hepatocellular carcinoma (HCC) diagnosis system has been implemented.

Keywords:

machine learning platform; medical image analytics; contrast-enhanced CT; Open–Closed Principle (OCP); liver cancer diagnosis

1. Introduction

Machine learning (ML) has gained significant recognition within healthcare organizations as a potent tool for medical image analytics, particularly in the realm of disease diagnosis. More specifically, ML models can effectively extract meaningful insights from medical images, identify tumor occurrences, stage tumor progression, determine disease occurrences, predict future disease occurrences, and even recommend treatment procedures. Some of the commonly used ML algorithms include convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and support vector machines (SVMs).

Among various medical image modalities, contrast-enhanced CT (CECT) is widely utilized for disease assessment since CECT offers several advantages over non-contrast CT. The advantages include improved visibility of blood vessels, better characterization of lesions, enhanced detection of tumors, improved assessment of organ perfusion, clearer delineation of anatomical structures, and detection of active inflammation. Some of the diseases that can be better diagnosed using CECT include oncological diseases, cardiovascular diseases, and gastrointestinal diseases.

However, developing ML-based diagnosis systems with CECT presents two significant challenges: substantial technical complexity and a high development effort. Consequently, many of these systems suffer from limited diagnostic performance and are not accepted for use in medical clinics.

The high technical complexity of developing these systems is attributed to transforming the diagnosis processes applied by physicians into software processes. It is challenging mainly due to the deviations between the terms, practices, and standards used by physicians and those of software developers. Moreover, it is challenging to configure the most suitable ML algorithms for various diagnostic tasks, train them with an optimal set of hyperparameters, fine-tune trained ML models for optimal performance, manage control flows and data flows among the ML models, and synthesize various diagnostic results to determine ongoing diseases and their progression.

The high development effort is attributed to the considerable time required for designing the system architecture, functional components, data persistence, and runtime behavior. Additionally, the effort needed for designing, training, and tuning multiple ML models trained with different algorithms can be significantly high. This process also requires a team of skilled professionals, including data scientists, software engineers, and medical domain experts.

Our strategy to address these challenges is to develop a machine learning-based software platform that captures the common diagnostic process with CECT, provides an essential set of functional components, manages data persistence, and supports a high level of configurability and extensibility for developing various CECT-based diagnostic systems. More specifically, we designed the platform by applying the Open–Closed Principle, where common features are designed to be fixed and variable features among diagnostic systems are designed to be customizable and extendable. The significant advantages of leveraging this platform for developing CECT-based diagnostic systems are the substantial reduction in technical complexity and the cost-effective development of high-quality diagnostic solutions.

The paper is organized as follows: Section 2 summarizes related works on medical image analytics. Section 3 provides the fundamentals of medical analytics with CECT. Section 4 presents the design of the platform in terms of platform architecture, components, diagnostic process, database, and machine learning models. Section 5 elaborates on a case study of developing an HCC diagnosis system using the proposed platform.

2. Related Works

This section summarizes representative works on machine learning-based disease diagnosis with CT scans.

Tao et al. [1] present a case report of a renal anastomosing hemangioma, an uncommon variant that histologically resembles angiosarcoma. The authors describe the clinical and pathological features of this rare lesion and provide a literature review to distinguish it from angiosarcoma for an accurate diagnosis. Wang et al. [2] describe the development and deployment of an AI-assisted CT imaging analysis system for COVID-19 screening. The system leverages deep learning algorithms to analyze CT images, providing accurate and efficient COVID-19 detection. The study demonstrates the system’s effectiveness in improving screening workflows during the pandemic.

Kouanou et al. [3] use the Hadoop framework and Spark framework to develop appropriate methods to handle large image datasets for classification. Mansour et al. [4] present an artificial intelligence (AI) and big data analytics-based intracerebral hemorrhage e-diagnosis (AIBDA-ICH) model using CT images. The presented model utilizes IoMT devices for the data acquisition process, involving a graph cut-based segmentation model for identifying the affected regions in the CT images. Dhar et al. [5] traverse the major challenges that the deep learning community faces in medical image diagnosis, like the unavailability of balanced annotated medical image data, adversarial attacks faced by deep neural networks and architectures due to noisy medical image data, a lack of trustability among users and patients, and ethical and privacy issues related to medical data.

Xie et al. [6] conduct design activities to formulate a system that enables physicians to explore and understand AI-enabled chest X-ray analysis using a paired survey between referring physicians and radiologists to reveal whether, when, and what kinds of explanations are needed. A low-fidelity prototype co-designed with three physicians formulates eight key features. Olveres et al. [7] cover different AI advances that tackle medical problems such as cardiology, cancer, dermatology, neurodegenerative disorders, respiratory problems, and gastroenterology. Different imaging is utilized to build automatic systems that help medical diagnosis, with limitations due to the signal-to-noise ratio and the contrast and resolutions in time, space, and wavelength.

Lei et al. [8] introduce the background of AI and its application in breast medical imaging (mammography, ultrasound, and MRI), such as in the identification, segmentation, and classification of lesions; breast density assessment; and breast cancer risk assessment. Lutnick et al. [9] create an intuitive interface for data annotation and the display of neural network predictions within a commonly used digital pathology whole-slide viewer. They demonstrate that segmentation of human and mouse renal micro compartments is repeatedly improved when humans interact with automatically generated annotations throughout the training process. Mehta et al. [10] conduct a systematic mapping study to identify and analyze research on big data analytics and artificial intelligence in healthcare, with a focus on the application of analytics, machine learning, and artificial intelligence over big data, which enables the identification of patterns and correlations and hence provides actionable insights.

Ker et al. [11] introduce machine learning algorithms as applied to medical image analysis, focusing on convolutional neural networks and emphasizing clinical aspects of the field. Chen et al. [12] propose a convolutional autoencoder deep learning framework to support unsupervised image feature learning for lung nodules through unlabeled data, which only needs a small amount of labeled data for efficient feature learning. Willemink et al. [13] describe fundamental steps for preparing medical imaging data in AI algorithm development, explain current limitations to data curation, and explore new approaches to address the problem of data availability. Shi et al. [14] review AI techniques in imaging data acquisition, segmentation, and diagnosis for COVID-19.

Contrast-enhanced computed tomography (CT) is a standard of care for the radiological diagnosis of various types of diseases, including circulatory system disease, kidney and bladder stones, inflammatory diseases, heart disease, and liver masses. Gosalia et al. [15] evaluate the ability of contrast-enhanced CT to detect acute myocardial infarction (MI), which has not been systematically assessed. On contrast-enhanced helical chest CT, they retrospectively identified 18 patients with an initial MI. They conclude acute MI is detectable on contrast-enhanced chest CT as an area of decreased left ventricular myocardial enhancement in a specific coronary arterial distribution. Gonio et al. [16] investigate the radiological findings prognostic for the development of pancreatic adenocarcinoma in a cohort of patients with hepatocellular carcinoma using multiphasic CT.

Chang et al. [17] evaluate the accuracy of contrast-enhanced ultrasound (CEUS), which is safe for patients with chronic kidney disease, for the characterization of kidney lesions in patients with and without chronic kidney disease. Cayet et al. [18] assess CT-scan performance for sinusoidal obstruction syndrome (SOS) diagnosis in patients receiving neoadjuvant chemotherapy (NC) prior to CRLM surgery, comparing the obtained results with the pathological gold standard. Rawson and Pelletier [19] detail the types of contrast agents and describe when to order contrast-enhanced CT.

Ohno et al. [20] explore the basics of dual-energy CT and dynamic first-pass contrast-enhanced perfusion CT techniques, the basics of time-resolved contrast-enhanced MRA and dynamic first-pass contrast-enhanced perfusion MRI, and the clinical applications of contrast-enhanced CT- and MRI-based perfusion assessments for patients with pulmonary nodules, lung cancer, and pulmonary vascular diseases. Scheffel et al. [21] assess the value of dual-energy CECT imaging for the detection of urinary stone disease using dual-source CT and suggest virtual unenhanced CT images reconstructed from contrast-enhanced dual-energy CT that allow for the detection of urinary stones with good sensitivity and excellent specificity but with decreased sensitivity in abdominal obese patients. Fletcher et al. [22] demonstrate contrast-enhanced CT colonography is a promising method for detecting local recurrence, metachronous disease, and distant metastases in patients with prior invasive colorectal carcinoma.

Lavarone et al. [23] suggest that intrahepatic cholangiocarcinoma (ICC), which may develop in patients with cirrhosis, displays distinct vascular patterns in cirrhotic patients during a CT scan, which may allow for differentiation from hepatocellular carcinoma (HCC). Chen et al. [24] compare the enhancement pattern of ICC on contrast-enhanced ultrasound (CEUS) with that on CECT and conclude that the enhancement patterns of ICC on CEUS were consistent with those on CECT in the arterial phase, whereas in the portal phase, ICC faded out more obviously on CEUS than on CECT. Prionas et al. [25] suggest the conspicuity of malignant breast lesions, including ductal carcinoma in situ, is significantly improved on contrast-enhanced breast CT, and quantifying lesion enhancement may aid in the detection and diagnosis of breast cancer. Cha et al. [26] propose a software platform using Big Health data. Vuppala et al. [27] describe a framework for leveraging the cloud platform and big data technologies to perform medical image data analytics even across multiple institutions, addressing security, privacy, and scalability using a newer generation de-identification technique based on the big data platform.

Chrimes et al. [28] establish a healthcare big data analytics (HBDA) platform and test its performance for different patient query types to demonstrate high usability for a variety of reporting requirements by providers and health professionals. Allen et al. [29] demonstrate a robust platform that uses software automation and high-performance computing (HPC) resources to achieve real-time analytics of clinical data, specifically magnetic resonance imaging (MRI) data. Kaur et al. [30] propose a generic architecture for enabling AI-based healthcare analytics platforms by using open sources and present the importance of applying AI-based predictive and prescriptive analytics techniques in the health sector. The system will be able to extract useful knowledge that helps in decision-making and medical monitoring in real-time through intelligent process analysis and big data processing.

The current work on medical image analytics with CT scans primarily focuses on addressing software approaches for diagnosing diseases using medical images. Some works have been developed to utilize machine learning models for segmentation and classification tasks involved in the diagnostic processes. A few works propose platform approaches to disease diagnosis that are not tied to specific medical image schemes or machine learning model configurations.

Our work is uniquely distinguished from existing works in three aspects: (1) defining a unified software process to automate disease diagnosis with CECT scans; (2) configuring specific ML models for various diagnostic tasks; and (3) designing core components of the platform by applying the Open–Closed Principle (OCP) to provide high configurability for key variability areas.

3. Foundation of Medical Image Analytics with Machine Learning

This section provides a foundation for medical image analytics with ML models and disease diagnosis with CECT.

3.1. Machine Learning Algorithms for Medical Image Analytics

Medical imaging analytics refers to the application of software analytical methods to medical imaging data to extract useful medical information, including health conditions and ongoing diseases. ML plays a key role in developing high-performing medical image analytics systems. The essential types of ML algorithms for medical image analytics include the following:

CNN for image classification, segmentation, and object detection tasks
RNN for analyzing time-series medical images or volumetric data
GAN for image data augmentation, image synthesis, and image-to-image translation
SVM for medical image classification and abnormality detection

There are multiple ML algorithms for each type of ML model. For example, representative CNN algorithms that can be used for medical image analytics include U-Net, V-Net, DeepMedic, ResNet, Inception, and Mask R-CNN. The most suitable ML algorithm for a given diagnostic task should be identified and applied to yield high-performance analytics.

3.2. Contrast-Enhanced Computed Tomography

A CECT scan is a medical imaging technique that uses CT imaging in conjunction with a contrast agent to produce detailed images of internal organs, blood vessels, and other tissues.

A contrast agent is a substance, typically iodine-based or barium sulfate-based, that is administered to the patient either orally, intravenously, or rectally before or during the CT scan. The contrast agent increases the contrast within the images by highlighting specific areas, thereby improving the visibility of organs, tissues, and vessels.

CECT offers significant advantages over non-contrast CT, particularly for specific diagnostic tasks. A primary benefit is the enhanced visualization of target objects in CT slices, which is particularly significant for several types of anatomical and pathological structures. For example, blood vessels and vascular structures such as arteries and veins are much more clearly defined with the use of contrast agents.

Another key benefit of using CECT-based diagnosis is the ability to observe the temporal evolution of medical conditions. This includes monitoring changes in the size, shape, and characteristics of lesions, tumors, or other abnormalities over time. This information is crucial for assessing the progression or regression of diseases, evaluating the effectiveness of treatments, and making informed clinical decisions. An example of CT slices showing temporal changes in a lesion is shown in Figure 1.

The CT image in the figure shows axial slices of the abdomen, specifically highlighting the liver. The arrows point to a lesion of interest within the liver, and the four slices reveal the temporal changes of this hepatic lesion over the different phases of contrast enhancement.

CT slice #1 shows the liver in the unenhanced (non-contrast) phase, captured in an axial plane. CT slice #2 depicts the liver in the arterial phase after the administration of contrast material. CT slice #3 represents the liver in the venous phase, and CT slice #4 displays the liver in the delayed phase. This observation of the temporal evolution of medical objects is essential in diagnosing some diseases, such as liver cancer [31].
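The phase-by-phase observation described above can be illustrated with a minimal Python sketch. It is not taken from the platform; it assumes that the mean attenuation of a lesion and of the surrounding liver (in Hounsfield units) has already been measured in each of the four phases. The function name, the sample values, and the 10 HU margin are hypothetical and carry no clinical meaning.

```python
from typing import Dict

# Phase names follow the four CT slices described in the text.
PHASES = ("unenhanced", "arterial", "venous", "delayed")


def enhancement_profile(lesion_hu: Dict[str, float],
                        liver_hu: Dict[str, float],
                        margin: float = 10.0) -> Dict[str, str]:
    """Label the lesion as hyper-, iso-, or hypo-attenuating in each phase.

    `margin` is an illustrative threshold in HU, not a clinical value.
    """
    profile = {}
    for phase in PHASES:
        diff = lesion_hu[phase] - liver_hu[phase]
        if diff > margin:
            profile[phase] = "hyper"
        elif diff < -margin:
            profile[phase] = "hypo"
        else:
            profile[phase] = "iso"
    return profile


# Made-up measurements for a single lesion (assumption, for illustration only).
lesion = {"unenhanced": 48.0, "arterial": 110.0, "venous": 95.0, "delayed": 80.0}
liver = {"unenhanced": 55.0, "arterial": 75.0, "venous": 115.0, "delayed": 100.0}

profile = enhancement_profile(lesion, liver)
print(profile)
```

With these made-up values, the lesion is brighter than the liver in the arterial phase and darker in the venous and delayed phases. A per-phase profile of this kind is one possible input for the classification tasks discussed later.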

3.3. Tasks in the Disease Diagnosis Process with CECT

Disease diagnosis with CECT involves a number of common tasks to comprehensively assess the presence, extent, and progression of diseases.

Not all diagnostic tasks should be performed with ML models. Tasks with low complexity, such as the characterization of lesions and tumors by size, shape, and density, do not necessarily require ML models. Many of the complex diagnostic tasks can be performed more accurately with ML models for segmentation and classification purposes, and such diagnostic tasks are summarized in Table 1.

This set of diagnosis tasks becomes the basis for designing ML models and deriving functional components of the platform.
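As an illustration of a low-complexity task that does not need an ML model, the following sketch computes the size and mean density of a lesion directly from a binary mask. It is a simplified example under stated assumptions: the mask and the attenuation values are plain nested lists, the pixel spacing is given in millimeters, and the function names `lesion_size` and `mean_density` are hypothetical.

```python
import math
from typing import List, Tuple


def lesion_size(mask: List[List[int]],
                spacing_mm: Tuple[float, float]) -> Tuple[float, float]:
    """Return (area in mm^2, equivalent circular diameter in mm) of a binary mask."""
    pixel_area = spacing_mm[0] * spacing_mm[1]
    n_pixels = sum(1 for row in mask for v in row if v)
    area = n_pixels * pixel_area
    # Diameter of the circle that has the same area as the lesion.
    diameter = 2.0 * math.sqrt(area / math.pi)
    return area, diameter


def mean_density(mask: List[List[int]], slice_hu: List[List[float]]) -> float:
    """Return the mean attenuation (HU) of the pixels inside the mask."""
    values = [hu
              for mask_row, hu_row in zip(mask, slice_hu)
              for m, hu in zip(mask_row, hu_row) if m]
    return sum(values) / len(values) if values else float("nan")


# Toy 4x4 slice with a 2x2 lesion (made-up values).
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
slice_hu = [[50, 50, 50, 50],
            [50, 60, 70, 50],
            [50, 80, 90, 50],
            [50, 50, 50, 50]]

area, diameter = lesion_size(mask, spacing_mm=(0.5, 0.5))
density = mean_density(mask, slice_hu)
print(area, round(diameter, 3), density)
```

The mask itself would typically come from a segmentation model; only the measurements on top of it are rule-based here.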

4. Design of the Platform

This section elaborates on the detailed design of the proposed platform in terms of the platform architecture, functional components, disease diagnosis process, database, and ML algorithms.

A software platform is not a specific application; rather, it provides a set of functions and features that are common among potential applications constructed with the platform. Therefore, the reusability of the platform and its configurability for specific applications are key quality requirements. Our approach to meeting these requirements is to design the platform by applying the Open–Closed Principle (OCP) [32]. In this design principle, the common features of a system are designed to be closed to modifications, and the variable features of the system are designed to be open to extension and customization.

4.1. Architecture Design with Microkernel Style

The architecture of a software system defines the stable and schematic layout of the system, where various components are deployed. The architecture of software platforms reveals a higher degree of variability than the architecture of a single application. This is because there are various types of differences, i.e., variation points, among application systems that are built upon the platform [33].

This becomes especially evident with the medical analytics platform proposed in this paper, where variability exists for the target disease types, diagnosis processes, machine learning models deployed, and the policy for determining disease occurrences and their stages.

To handle the variability, we applied the microkernel architecture style to design the platform architecture, as shown in Figure 2.

The microkernel architecture style is characterized by the division of system responsibilities between the core control layer and the plugin control layer, which binds externally devised plugin components on-the-fly. The core control layer provides functionality that is common among various diagnosis systems and is designed with closed-design schemes.

The plugin control layer defines a set of required interfaces in UML that are implemented/realized in plugin components. Externally implemented plugin components are then bound to the plugin control layer to fulfill the variation points.

The benefits of adopting the microkernel architecture style when building this platform are as follows:

Separation of Concerns: Microkernels separate the invariant components from the components with variability, allowing for a clear separation of concerns.
Customizability: The functionality of components with variability can be customized for specific diseases, diagnosis methods, and ML models applied. Updating or replacing plugin objects does not require modifying the kernel, making the system adaptable and extensible.
Platform Independence: The components in the core control layer are stable, platform-independent elements, making the platform easier to port to different medical image analytics applications.
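The division between the two layers can be sketched in a few lines of Python. This is an illustrative sketch only, not the platform's implementation: the class names `PluginControlLayer` and `CoreControlLayer`, the slot names, and the string-based stand-ins for scans and results are all assumptions.

```python
from typing import Callable, Dict, List


class PluginControlLayer:
    """Holds the variation points (slots) and the plugins bound to them."""

    def __init__(self, required_slots: List[str]) -> None:
        self._required = set(required_slots)
        self._slots: Dict[str, Callable[[str], str]] = {}

    def bind(self, slot: str, plugin: Callable[[str], str]) -> None:
        # Only declared variation points can receive a plugin.
        if slot not in self._required:
            raise KeyError(f"unknown variation point: {slot}")
        self._slots[slot] = plugin

    def resolve(self, slot: str) -> Callable[[str], str]:
        if slot not in self._slots:
            raise LookupError(f"no plugin bound to: {slot}")
        return self._slots[slot]


class CoreControlLayer:
    """Closed part: a fixed control flow that reaches plugins only via slots."""

    def __init__(self, plugins: PluginControlLayer) -> None:
        self._plugins = plugins

    def run(self, scan: str) -> str:
        organ = self._plugins.resolve("segment_organ")(scan)
        return self._plugins.resolve("classify_disease")(organ)


# Bind hypothetical plugins, then swap one of them without touching the core.
layer = PluginControlLayer(["segment_organ", "classify_disease"])
layer.bind("segment_organ", lambda scan: f"liver({scan})")
layer.bind("classify_disease", lambda organ: f"finding-for-{organ}")
core = CoreControlLayer(layer)

first = core.run("scan-001")
layer.bind("segment_organ", lambda scan: f"lung({scan})")
second = core.run("scan-001")
print(first, second)
```

Rebinding the `segment_organ` slot changes the behavior of `run()` while `CoreControlLayer` stays unmodified, which mirrors the customizability benefit listed above.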

4.2. Design of Functional Components

To derive functional components effectively, we first constructed a use case model that captures the platform functionality by applying the commonality and variability analysis on a set of representative family applications, including the Liver Imaging Reporting and Data System (LI-RADS), Lung Imaging Reporting and Data System (LUNG-RADS), Breast Imaging Reporting and Data System (BI-RADS) [34], and Prostate Imaging Reporting and Data System (PI-RADS) [35]. From this use case model, we derived 12 functional components by clustering related use cases into a component.

A key feature of any software platform is its capability to enable effective customization of platform functionality. For this reason, these functional components were analyzed for their potential variability. Functional components with variability are placed in the plugin control layer, and components with no variability are placed in the core control layer, as shown in Figure 3.

Among the identified components, the six components in the plugin control layer were determined to have intrinsic variability.

The design of functional components with variability can effectively be completed by applying the commonality and variability analysis. The result of applying variability modeling to these components is summarized in Table 2.

The variability analysis becomes the basis for deriving design schemes to effectively customize the variability, such as defining required interfaces for ‘open’-scoped variation points.

For example, the Tumor Segmentation Manager for a specific organ can be defined with a required interface, which can bind different implementations of CNN segmentation, as shown in Figure 4.

Figure 4a shows two different interfaces of this component: a provided interface, iTumor_Segmenter(), and a required interface, iTumor_Seg_Model(), which is implemented in various ways using different CNN segmentation algorithms. Figure 4b shows two plugin objects implemented with different CNN algorithms that can equally bind to the required interface. This will enable the component Tumor Segmentation Manager to work with any valid CNN segmentation model.
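A required interface of this kind can be approximated in Python with a structural type. The sketch below is illustrative only: `ITumorSegModel` stands in for the required interface iTumor_Seg_Model(), and the two plugins use simple thresholding as placeholders for real CNN segmentation models, which are not shown here. All class and method names are assumptions.

```python
from typing import List, Protocol

Image = List[List[float]]
Mask = List[List[int]]


class ITumorSegModel(Protocol):
    """Required interface: any object offering segment() can be bound."""

    def segment(self, lesion: Image) -> Mask: ...


class FixedThresholdPlugin:
    """Placeholder plugin (stands in for one CNN segmentation model)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def segment(self, lesion: Image) -> Mask:
        return [[1 if v > self.threshold else 0 for v in row] for row in lesion]


class MeanThresholdPlugin:
    """Placeholder plugin (stands in for a different CNN segmentation model)."""

    def segment(self, lesion: Image) -> Mask:
        flat = [v for row in lesion for v in row]
        mean = sum(flat) / len(flat)
        return [[1 if v > mean else 0 for v in row] for row in lesion]


class TumorSegmentationManager:
    """Component that depends only on the required interface."""

    def __init__(self, model: ITumorSegModel) -> None:
        self._model = model

    def bind(self, model: ITumorSegModel) -> None:
        self._model = model

    def segment_tumor(self, lesion: Image) -> Mask:
        return self._model.segment(lesion)


lesion = [[10.0, 40.0], [90.0, 20.0]]
manager = TumorSegmentationManager(FixedThresholdPlugin(threshold=30.0))
mask_a = manager.segment_tumor(lesion)
manager.bind(MeanThresholdPlugin())
mask_b = manager.segment_tumor(lesion)
print(mask_a, mask_b)
```

The manager never refers to a concrete plugin class, so either plugin can be bound without changing `TumorSegmentationManager`.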

4.3. Design of Persistent Datasets

Software systems provide functionality by manipulating relevant data, and some of the data must be stored on permanent data storage or cloud space for persistence. Accordingly, our platform for diagnosing diseases with CECT scans should also manage the set of persistent datasets common among its family of applications.

The persistent data model of the platform was constructed by applying commonality and variability analysis to a set of representative family applications, including LI-RADS, LUNG-RADS, BI-RADS, and PI-RADS applications. The class diagram capturing persistent datasets for the platform is shown in Figure 5.

The class diagram in the figure organizes persistent objects around two core classes: Diagnosis Session and ML Model. The Diagnosis Session is a logic object class that captures essential contents for each disease diagnosis session. Hence, it aggregates other part-classes: Lesion, Tumor, Medical Feature, and Disease.

The ML Model class captures the meta-information of various machine learning models deployed in the system. This class is specialized into two subclasses: Segmentation Model and Classification Model. Each of the subclasses is further specialized into ML models for specific purposes.

Note that association relationships are defined between classes capturing diagnosis results and the ML models used for diagnosis. For example, the Tumor class is associated with the Tumor Segmentation Model, making it possible to trace which ML model is utilized for segmenting each specific tumor.
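The aggregation and association relationships described above can be sketched with Python dataclasses. The class names follow the text, but every attribute name (for example `segmented_by` and `version`) is a hypothetical choice made for this illustration and is not taken from Figure 5.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MLModel:
    """Meta-information of a deployed ML model."""
    name: str
    version: str


@dataclass
class SegmentationModel(MLModel):
    target: str = "unspecified"


@dataclass
class ClassificationModel(MLModel):
    label_set: str = "unspecified"


@dataclass
class TumorSegmentationModel(SegmentationModel):
    target: str = "tumor"


@dataclass
class Lesion:
    lesion_id: str


@dataclass
class Tumor:
    tumor_id: str
    # Association that records which model segmented this tumor.
    segmented_by: TumorSegmentationModel


@dataclass
class MedicalFeature:
    name: str
    value: str


@dataclass
class Disease:
    name: str


@dataclass
class DiagnosisSession:
    """Aggregates the results produced during one diagnosis session."""
    session_id: str
    lesions: List[Lesion] = field(default_factory=list)
    tumors: List[Tumor] = field(default_factory=list)
    features: List[MedicalFeature] = field(default_factory=list)
    diseases: List[Disease] = field(default_factory=list)


model = TumorSegmentationModel(name="tumor-seg", version="v2")
session = DiagnosisSession(session_id="S-001")
session.tumors.append(Tumor(tumor_id="T-1", segmented_by=model))

# Trace the model used for a specific tumor.
used = session.tumors[0].segmented_by
print(used.name, used.version, used.target)
```

Keeping the model reference on each result object is what makes the traceability mentioned above possible after the session has been stored.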

4.4. Design of the Unified Diagnosis Process

The process of diagnosing diseases in medical analytics systems using CECT scans reveals a high commonality among different disease types, such as liver disease, lung disease, pancreatic disorder, and vascular disease. Hence, this common process should be modeled and designed in one place to promote effective reuse.

The algorithm design of the unified diagnosis process, diagnose_disease(), is shown in Figure 6.

The algorithm is defined as a sequence of ten steps. Step 1 is to load a patient’s profile and treatment history. Step 2 is to load a CECT scan of the patient. Steps 3 through 7 are the key tasks of the process, consisting of segmentation, classification, and analytics tasks. Step 8 is to determine the occurrence of diseases by referring to the results of steps 3 through 7. Step 10 is to determine the stage of each disease occurrence.

4.5. Design of the Diagnosis Manager

There is a high level of commonality in diagnosing various diseases using CECT scans. Modeling this common diagnosis process can lead to a more effective and reusable design of the platform. However, there are also variabilities in some steps of the main process. Both the common and variable features of the diagnosis process should be captured in the design.

We design the diagnosis process by applying the Template Method Pattern, which represents a single fixed algorithm as the template method and the variability of some steps in the overall algorithm using subclasses and dynamic binding. The design of the Diagnosis Manager by applying this design pattern is shown in Figure 7.

The class diagram illustrates the application of the Template Method Pattern in the design of the Diagnosis Manager component. This pattern is characterized by a superclass that defines the skeleton of an algorithm in a template method, consisting of a sequence of method invocations. Some of these method invocations are to abstract methods, referred to as placeholder methods. These placeholder methods, which are designed to be overridden by subclasses, are highlighted in blue in the figure. They represent the customizable parts of the algorithm, allowing for variation in behavior without altering the overall structure or flow of the algorithm.

The figure illustrates three subclasses that implement the abstract methods defined in the superclass. Each subclass adapts these methods to its specific requirements. For instance, the subclass Liver_Cancer_Diagnosis_Manager implements the segment_organ() method to segment the liver from a given CECT scan. Additionally, it implements the classify_disease() method to classify the type of liver cancer, such as hepatocellular carcinoma (HCC), by applying a standard called Liver Imaging Reporting and Data System (LI-RADS). This classification is based on the analysis of identified tumors, changes over contrast phases, and medical imaging data. The classify_stage() method of this subclass will determine an HCC stage using the categories of LR-NC (Not Categorizable), LR-1, LR-2, LR-3, LR-4, LR-5, LR-5V, LR-M, and LR-TR.

Similarly, another subclass, Lung_Cancer_Diagnosis_Manager, implements all the abstract methods according to its specific requirements for diagnosing lung cancer. The implementation of the abstract methods in this subclass can be performed by applying a standard, the Lung Imaging Reporting and Data System (LUNG-RADS). The classify_stage() method of this subclass will determine a lung cancer stage using the taxonomic categories of Category 0 (Incomplete), Category 1 (Negative), Category 2 (Benign Appearance or Behavior), Category 3 (Probably Benign), Category 4A (Suspicious), Category 4B (More Suspicious), and Category 4X (Highest Concern for Malignancy).

This design facilitates the reuse of common diagnostic procedures while allowing for customization to accommodate disease-specific diagnostic methods. The design of Diagnosis Manager demonstrates the high reusability of commonality and customizability for variability, which are essential features of any software platform. This design also provides high extensibility through a specialization of the superclass for new types of disease diagnosis using CECT.
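The Template Method Pattern described in this section can be sketched in Python with an abstract base class. This is a reduced illustration, not the design in Figure 7: only three of the placeholder methods named in the text are shown, the scan is a plain string, and the subclass returns hard-coded placeholder values instead of calling ML models.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Diagnosis_Manager(ABC):
    """Superclass holding the fixed skeleton of the diagnosis process."""

    def diagnose_disease(self, scan: str) -> Dict[str, str]:
        # Template method: the order of the steps is fixed here.
        organ = self.segment_organ(scan)
        disease = self.classify_disease(organ)
        stage = self.classify_stage(disease)
        return {"organ": organ, "disease": disease, "stage": stage}

    @abstractmethod
    def segment_organ(self, scan: str) -> str:
        """Placeholder method: segment the target organ."""

    @abstractmethod
    def classify_disease(self, organ: str) -> str:
        """Placeholder method: classify the disease type."""

    @abstractmethod
    def classify_stage(self, disease: str) -> str:
        """Placeholder method: classify the disease stage."""


class Liver_Cancer_Diagnosis_Manager(Diagnosis_Manager):
    """Subclass for liver cancer; return values are placeholders only."""

    def segment_organ(self, scan: str) -> str:
        return f"liver segmented from {scan}"

    def classify_disease(self, organ: str) -> str:
        return "HCC"  # placeholder, a real system would apply LI-RADS criteria

    def classify_stage(self, disease: str) -> str:
        return "LR-NC"  # placeholder category, not a computed result


result = Liver_Cancer_Diagnosis_Manager().diagnose_disease("scan-001")
print(result)
```

Supporting a new disease type then means adding another subclass that overrides the placeholder methods, while `diagnose_disease()` remains unchanged.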

4.6. Design of Machine Learning Models

The core functionality of this platform is to diagnose diseases by analyzing CECT scans, and the platform relies on several types of ML models for diagnosing diseases. A set of six core ML models has been identified through a comparative analysis of representative standards for analyzing CECT scans.

Organ Segmentation Model for segmenting a target organ based on CECT slices.
Lesion Segmentation Model for segmenting lesions based on CECT slices.
Tumor Segmentation Model for segmenting tumors based on lesions.
Image Feature Classifier for classifying various image features.
Disease Type Classifier for classifying diseases from the identified tumors and their medical features.
Disease Stage Classifier for classifying a stage for the identified diseases.

A challenge in managing ML models for this platform is to handle the high variability of ML algorithms for various diseases and the multiple versions of an ML model trained with different training sets. Therefore, the platform should be designed to support the high configurability of training ML models with different algorithms and binding ML models on-the-fly.

Our approach to meeting this requirement is to structure ML model management with the Bridge Design Pattern. This pattern decouples the abstraction part of the system from its implementation part so that the two can vary independently. The design of ML management is shown in Figure 8.

The class diagram depicted in the figure shows two abstract classes: the Segmentation Model and the CNN Segmentation Algorithm. Each abstract class is specialized into multiple subclasses, which represent valid alternatives at their respective variation points. On the left side of the Bridge, there is a class hierarchy that includes various segmentation models tailored for specific segmentation targets such as Organ, Lesion, and Tumor. Conversely, the right side of the Bridge shows a class hierarchy for a range of CNN-based segmentation algorithms.

This structure effectively demonstrates the decoupling of abstraction (segmentation models) from implementation (segmentation algorithms). This separation allows for the independent evolution of segmentation models and the underlying computational segmentation algorithms.
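A minimal Python sketch of this Bridge structure is given below. The class names are simplified stand-ins for those in Figure 8, the two concrete algorithm classes and the bind() method are hypothetical, and the segment() bodies return placeholder strings instead of running a CNN.

```python
from abc import ABC, abstractmethod


class CNNSegmentationAlgorithm(ABC):
    """Implementation side of the Bridge: how segmentation is computed."""

    @abstractmethod
    def segment(self, ct_slice): ...


class UNetAlgorithm(CNNSegmentationAlgorithm):
    def segment(self, ct_slice):
        return f"unet-mask({ct_slice})"  # placeholder for a real prediction


class PlaceholderAlgorithm(CNNSegmentationAlgorithm):
    def segment(self, ct_slice):
        return f"placeholder-mask({ct_slice})"


class SegmentationModel:
    """Abstraction side of the Bridge: what is being segmented."""

    target = "unspecified"

    def __init__(self, algorithm):
        self.algorithm = algorithm

    def bind(self, algorithm):
        # Rebind a different algorithm (or model version) at run time.
        self.algorithm = algorithm

    def run(self, ct_slice):
        return {"target": self.target, "mask": self.algorithm.segment(ct_slice)}


class OrganSegmentationModel(SegmentationModel):
    target = "organ"


class TumorSegmentationModel(SegmentationModel):
    target = "tumor"


model = TumorSegmentationModel(UNetAlgorithm())
print(model.run("slice-7"))
model.bind(PlaceholderAlgorithm())
print(model.run("slice-7"))
```

Because the model holds only a reference to the algorithm interface, a new segmentation target or a new algorithm can be added as one extra subclass on either side without touching the other hierarchy.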

5. Case Study of Developing Liver Cancer Diagnosis

We present the result of a case study that develops a liver cancer diagnosis system by utilizing the proposed platform.

5.1. The HCC Type of Liver Cancer

Liver cancer is considered one of the most life-threatening cancers due to its typically late diagnosis and rapid progression. Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer, accounting for about 75% to 85% of cases. There is a widely accepted standard for diagnosing HCC-type liver cancer called the Liver Imaging Reporting and Data System (LI-RADS).

Our case study develops a medical image analytics system that particularly diagnoses occurrences of HCC using CECT scans. LI-RADS categorizes liver observations into different categories based on their imaging features and the likelihood of representing HCC. The criteria for HCC categorization are summarized in Table 3.

The imaging features listed in the third column of the table can be directly determined based on the results of organ, lesion, and tumor segmentations, as well as medical features classified by the platform.

5.2. Customizing the Platform for HCC Diagnosis

We adopted the proposed platform as the foundation and specialized its design for the HCC diagnosis system as follows:

5.2.1. Designing the Schematic Architecture

The schematic architecture of the platform, as depicted in Figure 2, was devised to be generic and common among disease diagnosis systems using CECT. Hence, the architecture was applied to the HCC diagnosis system without any modifications.

The allocation of the 12 functional components on the core control layer and the plugin control layer was applicable to this target system as well. No additional components were needed in developing the HCC diagnosis system.

5.2.2. Applying the Diagnosis Process for HCC Diagnosis

The platform defines a diagnosis process as a sequence of ten steps, as shown in Figure 6. The control flow of this diagnosis process is also applicable to diagnosing HCC according to the LI-RADS standard. Hence, no change was made to the diagnosis process.

5.2.3. Refining the Diagnosis Manager for HCC Diagnosis

The platform defines the design of Diagnosis Manager by applying the Template Method Pattern, as shown in Figure 7. To specialize, the eight abstract methods defined in the superclass Diagnosis_Manager were implemented in a subclass for the purpose of diagnosing HCC.

The design of HCC Diagnosis Manager by specializing Diagnosis Manager is shown in Figure 9.

The class HCC_Diagnosis_Manager is defined as a subclass, and it implements the eight abstract methods for HCC diagnosis by applying the guidelines of the LI-RADS standard. We use the NumPy library for handling arrays and the Keras library for leveraging deep learning models.

For example, the implementation of the method for segmenting tumors from lesions found in CT slices, segment_tumor(), is shown in Figure 10.

This program uses the same reference for a tumor segmentation model, tumor_seg_model. It performs three tasks: reading the list of pre-segmented lesions in line #6, preprocessing CT slices for segmentation in line #9, and segmenting tumors by invoking the ‘predict()’ method in line #26.

5.2.4. Applying the Database for the HCC Diagnosis System

The class diagram capturing persistent datasets on the platform is designed to be generic and common among the family of diagnosis applications. This design is also applicable to the data persistence of the HCC diagnosis system, requiring no change to the persistence design.

5.3. Training ML Models for HCC Diagnosis

5.3.1. Training Set of Labeled CECT Scans

We acquired the training set, i.e., a set of 130 CECT scans, from two sources: Loma Linda University Medical Center [36] and Radiopaedia. The CECT scans from the university medical center were initially unlabeled and therefore required manual labeling. The labels include image masks of lesions and tumors on the liver regions on CT slices and all the necessary features specified in LI-RADS.

An attending physician and medical assistants in Vascular and Interventional Radiology manually labeled the CT slices in the following two steps:

Step 1. Identifying the Region of Interest (ROI), i.e., target elements such as organs, lesions, and tumors. We used touch-screen monitors and stylus pens to delineate the border areas.
Step 2. Creating the Mask: Once the ROI is identified, the mask is created by a software library for converting borderlines into masks. Masks are then represented as a set of pixel values corresponding to the ROI, such as [0, 0, 255]. An example of an original CT slice and its masked image of the liver is shown in Figure 11.
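Step 2 can be illustrated with a short sketch that converts a closed borderline into a mask. The library used in the study is not named, so the code below is a stand-in written in plain Python with even-odd ray casting; the function names, the polygon, and the image size are hypothetical, and only the ROI pixel value [0, 0, 255] comes from the text.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: True if (x, y) lies inside the closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def borderline_to_mask(polygon, width, height,
                       roi_value=(0, 0, 255), background=(0, 0, 0)):
    """Label each pixel by testing its center against the borderline."""
    mask = []
    for row in range(height):
        mask.append([
            roi_value if point_in_polygon(col + 0.5, row + 0.5, polygon)
            else background
            for col in range(width)
        ])
    return mask


# Hypothetical borderline: a square drawn on a 6 x 6 image.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = borderline_to_mask(square, width=6, height=6)
roi_pixels = sum(px == (0, 0, 255) for row in mask for px in row)
print(roi_pixels)
```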

5.3.2. Training ML Models for HCC

The six essential ML models of the platform were applicable to HCC diagnosis. We developed an additional ML model for analyzing medical features presented in tumors. The types of ML models implemented in this case study are summarized in Table 4.

The ML algorithms for segmentation and classification were selected through extensive experiments that compared the trained models against models trained with other candidate algorithms.

5.3.3. Integrating an Image Feature Classifier and an HCC Type Classifier

The appearance of HCC on CT changes over time as the tumor grows and develops a blood supply. These changes are particularly evident when contrast agents are used, as in CECT. The contrast enhancement pattern over different phases (arterial, venous, and delayed) provides important diagnostic information. HCC typically shows hyperenhancement in the arterial phase (due to its increased arterial blood supply) and washout in the venous or delayed phases (due to its lack of portal venous blood supply).

We applied the Long Short-Term Memory (LSTM) algorithm, a type of RNN, to capture these temporal changes. The input to the LSTM consists of a sequence of medical feature sets observed on different phases of the same detected tumor, where the medical features are identified by the CNN Image Feature Classifier. Hence, we integrated the CNN Image Feature Classifier (Step 6 in the process) and the LSTM HCC Type Classifier (Step 7 in the process).
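The shape of the LSTM input described above can be sketched as follows: one fixed-length feature vector per contrast phase, ordered by acquisition. This shows only how such a sequence might be assembled, not the classifier itself; the function name, the feature names, and the 0/1 encoding are assumptions made for illustration.

```python
# Phase order follows the text: arterial, venous, delayed.
PHASES = ("arterial", "venous", "delayed")


def build_phase_sequence(features_by_phase, feature_names):
    """Return one feature vector per phase, in acquisition order.

    features_by_phase maps a phase name to the features that the image
    feature classifier reported for the same tumor in that phase.
    Features not reported for a phase are encoded as 0.0.
    """
    sequence = []
    for phase in PHASES:
        observed = features_by_phase.get(phase, {})
        sequence.append([float(observed.get(name, 0.0)) for name in feature_names])
    return sequence


# Hypothetical classifier output for one tumor.
feature_names = ["hyperenhancement", "washout"]
tumor_features = {
    "arterial": {"hyperenhancement": 1},
    "venous": {"washout": 1},
    "delayed": {"washout": 1},
}
sequence = build_phase_sequence(tumor_features, feature_names)
print(sequence)  # 3 time steps x 2 features
```

A recurrent model would consume this list as a sequence of three time steps, which is what lets it relate arterial-phase hyperenhancement to later washout.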

The structure of the integration and information flow in this integration are shown in Figure 12.

5.4. Implementing the HCC Diagnosis System

We implemented the HCC diagnosis system in Python 3.11 and trained the ML models using Scikit-learn and TensorFlow 2.4 with the Keras API.

5.4.1. Hardware Environment for Training and Running ML Models

The hardware environment required for training the six machine learning models in the CECT-based diagnostic process depends on several factors, including the number of CT scans, the average number of slices per CT scan, and the complexity of the models. The training set used for the liver cancer diagnosis case study consists of 130 CT scans, with an average of 447 slices per CT scan.

The following hardware environment was used for training the models. We encountered system crashes when the hardware configuration was insufficient for the computational demands.

Processor: AMD Ryzen Threadripper 7970X with 32 Cores and 64 Threads
GPU: Two NVIDIA RTX 4090 GPUs, each with 24 GB VRAM
RAM: 256 GB DDR4

Running the six machine learning models generally requires less computing power than training them. In our case study, we used the same processor and RAM, but only a single NVIDIA RTX 4090 GPU.

5.4.2. User Interface of the System

The user interface of the system was developed as a web-based interface, and a screen of the interface is shown in Figure 13.

The system displays the seven steps of the HCC diagnosis at the top of the screen by adhering to the LI-RADS standard. The list of CT slices in a CT scan is displayed on the left side of the screen, with segmented organs highlighted in yellow. The summary of organ segmentation is shown at the bottom of the screen. The volume of the organ is also computed and displayed in the table.
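The text does not state how the organ volume is computed. A common approach, shown here as an assumption, is to count the segmented voxels and multiply by the physical size of one voxel. The function name, the mask contents, and the spacing values are hypothetical.

```python
def organ_volume_ml(masks, pixel_spacing_mm, slice_thickness_mm):
    """Estimate organ volume in millilitres by voxel counting.

    masks: one 2D list per CT slice, with 1 for organ pixels and 0 elsewhere.
    pixel_spacing_mm: (row spacing, column spacing) in millimetres.
    slice_thickness_mm: distance between consecutive slices in millimetres.
    """
    voxel_mm3 = pixel_spacing_mm[0] * pixel_spacing_mm[1] * slice_thickness_mm
    voxel_count = sum(sum(row) for mask in masks for row in mask)
    return voxel_count * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Two tiny hypothetical slices, each with four organ pixels.
masks = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
volume = organ_volume_ml(masks, pixel_spacing_mm=(0.5, 0.5), slice_thickness_mm=2.0)
print(volume)
```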

We designed the user interface to be intuitive and self-explanatory. Additionally, the user interface was designed to be comprehensive, displaying all intermediate analytics results.

5.4.3. Experiments with the CECT Test Set

Software testing is an effective way of validating the functionality and features of software platforms. We created a test set using the CECT scans from Loma Linda University Medical Center, using the typical distribution of datasets into an 80% training set and a 20% test set.
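An 80%/20% split of this kind can be sketched as below. The text does not say at which level the split was made; this sketch assumes it is done per scan rather than per slice, so that slices from one scan never appear in both sets. The function name, the seed, and the scan identifiers are hypothetical.

```python
import random


def split_scans(scan_ids, train_percent=80, seed=0):
    """Shuffle scan IDs reproducibly and split them into train and test lists."""
    ids = list(scan_ids)
    random.Random(seed).shuffle(ids)
    cut = len(ids) * train_percent // 100  # integer arithmetic avoids rounding drift
    return ids[:cut], ids[cut:]


# Hypothetical identifiers for ten scans.
scan_ids = [f"scan-{i:03d}" for i in range(10)]
train_ids, test_ids = split_scans(scan_ids)
print(len(train_ids), len(test_ids))
```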

The result of running the HCC diagnosis using this implementation is shown in Figure 14.

The system output shows five occurrences of HCC, (a) through (e), where each HCC is shown with the temporal changes in the tumor, major medical features, the size of the tumor, its location as a Hepatic Segment ID, and the determined stage.

5.4.4. Performance of Machine Learning Models

The performance of ML models is not very relevant to the quality of platform design since the models are trained outside of the platform using disease-specific datasets. However, we measured the performance of seven ML models for HCC diagnosis as the system implementation is available.

The average performance of the seven ML models is shown in Figure 15.

The performance measures for the ML models vary due to the intrinsic characteristics of each prediction task. The first three segmentation models show higher performance than the Tumor Segmentation Model. This difference can be attributed to three factors: the size of the target objects, the vividness of the target objects as they appear in CT slices, and the intrinsic nature of the visual ambiguity and blurriness of tumors.

5.4.5. Execution Performance of the ML Models

ML models, especially those based on deep neural network architectures such as CNNs and RNNs, run much slower than conventional algorithms like tree- or graph-based analysis. Hence, a direct comparison of execution times between ML models and traditional algorithms may not be feasible.

Instead, we analyze the runtime performance of the ML models in the context of the liver cancer diagnosis case study. The liver cancer diagnosis system, utilizing ML models, processes a CECT scan as input and follows the ten steps outlined in Figure 6. Consequently, the time required for diagnosing liver cancer is heavily dependent on the size of the given CECT scan.

Our dataset for training the ML models consists of 130 CECT scans, with an average of 447 slices per scan. The execution time for diagnosing liver cancer with a CT scan of this size was measured to be 18 min and 38 s. This execution time includes the time required to update the database contents and refresh the web-based user interface for each task involved in processing each CT slice, as shown in Figure 13. The actual time for making predictions with the ML models was measured to be approximately 14 min.
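A rough per-slice breakdown follows from these measurements. The calculation below assumes the measured scan had exactly the average of 447 slices and takes the prediction time as exactly 14 min, although the text gives it only approximately; the derived values are therefore estimates.

```python
total_s = 18 * 60 + 38   # measured end-to-end time: 1118 s
predict_s = 14 * 60      # approximate ML prediction time: 840 s
slices = 447             # assumed slice count (the dataset average)

per_slice_total = total_s / slices      # about 2.50 s per slice end to end
per_slice_predict = predict_s / slices  # about 1.88 s per slice in prediction
overhead_s = total_s - predict_s        # database and UI updates: 278 s
overhead_share = overhead_s / total_s   # about 25% of the total

print(f"{per_slice_total:.2f} s/slice total, {per_slice_predict:.2f} s/slice prediction")
print(f"overhead: {overhead_s} s ({overhead_share:.1%})")
```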

6. Assessment of the Platform

Evaluating the development of software platforms is a multi-faceted process that involves assessing various qualitative and quantitative aspects of the software.

6.1. Evaluating Functional Coverage of the Platform

The functional coverage of software platforms refers to the extent to which the platform’s features and capabilities meet the various requirements of potential applications that can be built with the platform [37]. It ensures that all necessary functionalities are implemented and can handle the diverse needs of medical diagnostics using CECT scans.

We define a metric, Functional Coverage (FC), for the functional coverage of the platform as follows:

FC = \frac{N_f}{N_t}

N_f is the number of features implemented by the platform, and N_t is the total number of features required by the family of medical diagnostic applications using CECT scans. The value range of FC is from 0 to 1, where 0 indicates that none of the required features have been implemented and 1 indicates that all required features have been fully implemented.

The value of N_f is computed as the sum of the four coverage-relevant categories:

The number of functional components that fulfill the functional requirements.
The number of persistent object classes that manage the persistent datasets.
The number of diagnostic tasks specified in the main diagnosis process.
The number of machine learning models required by the functional components.

The functional components of the platform were derived directly from the use-case diagram of the platform. Hence, the set of 12 functional components effectively fulfills the requirements.

The persistent object classes were derived from the data manipulated by each functional component. Hence, the set of 16 persistent object classes represents the persistent datasets managed by medical image analytics systems.

Steps 3 through 10 of the defined diagnostic process (eight steps in total) map directly to the tasks for diagnosing the target diseases with CECT, as defined in various standards such as LI-RADS and LUNG-RADS.

The machine learning models needed for diagnosing diseases were derived by considering the intrinsic nature of the required diagnostic tasks. A total of six machine learning models, trained with CNN, RNN, and SVM classification algorithms, sufficiently assist with the required diagnostic tasks.

Accordingly, the N_f for the platform is computed as 42, as shown in Table 5.

All 42 features of the platform have been applicable to developing the liver cancer diagnosis system with the platform. In addition, the liver cancer diagnosis system required additional features to fulfill the diagnosis process and guidelines of the LI-RADS.

Additional Functional Component: Liver Lesion Tracker
This is a specialized software component designed to monitor and document changes in liver lesions over time. This functionality is crucial for the effective management and treatment of liver diseases, particularly hepatocellular carcinoma (HCC), as specified in the LI-RADS framework.
Additional Task in the Diagnosis Process: Specializing Step 9 for Tracking Lesions
Step 9 of the diagnosis process in Section 4.4 has been specialized to support the additional functionality of tracking lesions on the liver.
An Additional ML Model in the Diagnosis Process: Hepatic Segmentation Model
This model is required for segmenting hepatic segments on the liver. It is trained using the U-Net algorithm, whose encoder–decoder structure allows for precise localization and segmentation.

The numbers of these additional features required for the liver cancer diagnosis system are shown in the third column of the table. The value of N_t, the number of features required for this target system, becomes 45. This yields a Functional Coverage (FC) of 42/45, or 93.3%, for the case study of constructing the LI-RADS application.
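The FC computation for the case study can be reproduced with a few lines of Python. The per-category counts are taken from the text above; the function and variable names are illustrative only.

```python
def functional_coverage(n_f, n_t):
    """FC = N_f / N_t, defined for a positive number of required features."""
    if n_t <= 0:
        raise ValueError("n_t must be positive")
    return n_f / n_t


# Features provided by the platform, per category (from the text).
platform_features = {
    "functional components": 12,
    "persistent object classes": 16,
    "diagnostic tasks": 8,
    "ML models": 6,
}
# Additional features the HCC system needed beyond the platform.
additional_hcc_features = {
    "functional components": 1,  # Liver Lesion Tracker
    "diagnostic tasks": 1,       # specialized Step 9
    "ML models": 1,              # Hepatic Segmentation Model
}

n_f = sum(platform_features.values())
n_t = n_f + sum(additional_hcc_features.values())
fc = functional_coverage(n_f, n_t)
print(n_f, n_t, f"{fc:.1%}")
```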

6.2. Evaluating the Feature Satisfaction Index

The Feature Satisfaction Index (FSI) is a quantitative metric used to evaluate the degree to which a software platform’s features meet the set of common functional and non-functional requirements of potential applications that can be built with the platform [38]. This metric takes into account both the importance (weight) of each feature and how well each feature has been implemented (compliance score), providing a comprehensive assessment of overall feature satisfaction.

We define a metric, Feature Satisfaction (FS), for the feature satisfaction of the platform as follows:

FS = \frac{\sum_{i=1}^{N_f} w_i \cdot c_i}{\sum_{i=1}^{N_f} w_i}

where the terms are defined as follows:

N_f is the total number of features provided by the platform.
w_i is the weight assigned to each feature i, representing the relative importance of each feature in the context of the platform’s overall objectives. Its value ranges between 0 and 1. Features critical to the platform’s functionality are assigned higher weights, while less critical features receive lower weights.
c_i is the compliance score of each feature i, indicating how well the feature is implemented. The compliance score assesses how well each feature meets its requirements. Its value ranges between 0 and 1. The value 0 indicates that the feature does not meet the requirements at all, and the value 1 indicates full compliance.
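The FS formula is a weighted average of compliance scores, which the following sketch computes directly. The three weights and scores in the example are made-up values chosen to show the arithmetic; they are not the values measured in the case study.

```python
def feature_satisfaction(weights, compliance):
    """FS = sum(w_i * c_i) / sum(w_i) over all platform features."""
    if len(weights) != len(compliance):
        raise ValueError("weights and compliance must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(w * c for w, c in zip(weights, compliance)) / total_weight


# Hypothetical example with three features.
weights = [1.0, 0.5, 0.5]     # relative importance, each in [0, 1]
compliance = [1.0, 0.8, 0.6]  # how well each feature is met, each in [0, 1]
fs = feature_satisfaction(weights, compliance)
print(f"{fs:.2f}")  # (1.0 + 0.4 + 0.3) / 2.0
```

Dividing by the sum of the weights keeps FS between 0 and 1 regardless of how the weights are scaled.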

Through the case study of developing the liver cancer diagnosis system using the LI-RADS standard, we measured the compliance scores for the 42 features of the platform and synthesized their values into the metric FS. This result is summarized in Table 6.

Each category of the features is given an equal overall weight of 0.25, i.e., 25%. The weights of the 42 features are defined by the authors and developers who implement the platform. Unlike end-users, who typically assess feature satisfaction of application systems, the platform’s features—comprising approaches, design paradigms, software methods, and ML model strategies—are best understood and evaluated by developers with technical expertise. For example, technical aspects such as the applicability, reusability, and customizability of components like the Medical Feature Analyzer, Diagnosis Session data component, and core ML models require in-depth knowledge of software design, which end-users may lack.

The weights were determined by considering the relevance and impact of each feature on (1) diagnostic accuracy, (2) clinical significance, and (3) their contribution to the overall performance of the ML models. These considerations include expert domain knowledge, statistical analyses of feature importance, and empirical results from extensive testing and validation using diverse datasets.

The compliance scores of the 42 features were determined mainly by the developers who implemented the case study system (the liver cancer diagnosis system). The determination was based on a combination of factors, including (1) clinical relevance, (2) statistical significance, (3) empirical testing, (4) expert consensus, and (5) technical feasibility. This multifaceted approach ensures that the compliance scores reflect a comprehensive and balanced assessment of each feature’s value in diagnosing liver cancer accurately and effectively.

As shown in the table, the overall value of Feature Satisfaction (FS) was computed as 91.4%, which we believe is considerably high for a software platform in the medical application domain. We noticed that the FS value for the feature category of Diagnosis Process measured at 86.6%, the lowest among the four categories. This was mainly due to the high variability in the diagnosis processes for (1) different diseases, (2) their relevant body organs, and (3) standard guidelines governing the diagnosis processes.

In general, an FS rate above 90% is uncommon for software platforms. However, the high FS rate of this platform can be attributed to several objective factors and technical design strategies. The platform is focused on medical analytics using contrast-enhanced CT scans, which narrows its scope and leads to a high degree of feature commonality across applications. Additionally, there is substantial overlap in diagnostic tasks across international standards like LI-RADS, LUNG-RADS, BI-RADS, and PI-RADS, further contributing to the increased FS rate. The unified diagnosis process with CECT scans, paired with well-suited ML algorithms for each task, also enhances the FS rate.

To further boost the platform’s applicability, the Open–Closed Principle (OCP) design was applied to all variation points, including the ML model interfaces. By ensuring that software components such as classes, interfaces, and ML models are open for extension but closed for modification, the platform promotes flexibility and adaptability. This design approach allows for the extension of functionality without altering existing code, leading to enhanced reliability, easier maintenance, and ultimately contributing to the higher FS rate.

7. Concluding Remarks

In this paper, we have introduced a comprehensive machine learning-based software platform designed to address the complexities and challenges associated with diagnosing diseases using CECT scans. The platform provides a unified software process that can fully automate the diagnosis practices of various diseases using CECT scans. In addition, the platform effectively integrates a suite of specialized machine learning models, including CNN models for segmenting organs, lesions, and tumors, RNN models for observing temporal changes in tumors, and SVM models for disease classification and staging.

As the key software strategy, we leveraged the Open–Closed Principle (OCP) to develop a highly configurable and extendable system architecture, ensuring that common features remain stable while allowing for customization to meet the specific needs of various diagnostic applications. The platform’s architecture, based on the microkernel style, separates core control functionalities from plugin control functionalities, enhancing modularity and scalability.

Through a detailed case study, we demonstrated the platform’s applicability by developing a liver cancer diagnosis system that adheres to the LI-RADS standard. The case study highlighted the platform’s ability to support the end-to-end diagnostic process, from CT scan acquisition to generating comprehensive diagnosis reports.

We evaluated the proposed platform using two evaluation metrics: Functional Coverage (FC) and Feature Satisfaction (FS). Each of these evaluation metrics was applied to the case study of developing a liver cancer diagnosis system. Our platform achieved an overall FC score of 93.3% and an overall FS score of 91.4%, indicating its robustness and effectiveness in the medical application domain.

The key contributions of our research can be summarized in the following three points:

Unified Software Process of Diagnosis: Proposing a software process that fully automates the entire procedure involved in CECT-based disease diagnosis.
Integration of Essential ML Models: Identifying and integrating six core machine learning models to support various diagnostic tasks, thereby reducing the technical complexity of developing high-performing diagnostic systems.
Configurable and Extensible Design: Designing the platform to be highly configurable and extendable, enabling efficient customization for different diagnostic requirements.

Our future work will focus on expanding the types of diseases that can be diagnosed using our platform, integrating more dynamic and self-learning algorithms, and enhancing the platform’s ability to manage and analyze large datasets from diverse populations. We note that conducting a case study of developing an application with the proposed platform requires a tremendous amount of time and effort. This is primarily due to the effort required for acquiring a sufficient number of CECT scans, labeling the CT slices for training purposes, training the required ML models, and fine-tuning the models for acceptable performance. By cooperating with university hospitals and private clinics, we aim to conduct each case study within approximately six months.

Author Contributions

J.J. and M.K. conceptualized the research scope and coordinated the co-authoring of the paper and peer review. S.D.K. designed the software platform and led the implementation of the platform and the case study system. D.J. was responsible for governing the software process and methods that automate the application of LI-RADS. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tao, L.L.; Dai, Y.; Yin, W.; Chen, J. A case report of a renal anastomosing hemangioma and a literature review: An unusual variant histologically mimicking angiosarcoma. Diagn. Pathol. 2014, 9, 159.
Wang, B.; Jin, S.; Yan, Q.; Xu, H.; Luo, C.; Wei, L.; Zhao, W.; Hou, X.; Ma, W.; Xu, Z.; et al. AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system. Appl. Soft Comput. 2021, 98, 106897.
Kouanou, A.T.; Tchiotsop, D.; Kengne, R.; Zephirin, D.T.; Armele, N.M.A.; Tchinda, R. An optimal big data workflow for biomedical image analysis. Inform. Med. Unlocked 2018, 11, 68–74.
Mansour, R.F.; Escorcia-Gutierrez, J.; Gamarra, M.; Díaz, V.G.; Gupta, D.; Kumar, S. Artificial intelligence with big data analytics-based brain intracranial hemorrhage e-diagnosis using CT images. Neural Comput. Appl. 2023, 35, 16037–16049.
Dhar, T.; Dey, N.; Borra, S.; Sherratt, R.S. Challenges of Deep Learning in Medical Image Analysis—Improving Explainability and Trust. IEEE Trans. Technol. Soc. 2023, 4, 68–75.
Xie, Y.; Chen, M.; Kao, D.; Gao, G.; Chen, X.A. CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20), Honolulu, HI, USA, 25–30 April 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–13.
Olveres, J.; González, G.; Torres, F.; Moreno-Tagle, J.C.; Carbajal-Degante, E.; Valencia-Rodríguez, A.; Méndez-Sánchez, N.; Escalante-Ramírez, B. What is new in computer vision and artificial intelligence in medical image analysis applications. Quant. Imaging Med. Surg. 2021, 11, 3830–3853.
Lei, Y.; Yin, M.; Yu, M.; Yu, J.; Zeng, S.; Lv, W.; Li, J.; Ye, H.; Cui, X.; Dietrich, C.F. Artificial Intelligence in Medical Imaging of the Breast. Front. Oncol. 2021, 11, 600557.
Lutnick, B.; Ginley, B.; Govind, D.; McGarry, S.D.; LaViolette, P.S.; Yacoub, R.; Jain, S.; Tomaszewski, J.D.; Jen, K.; Sader, P. An integrated iterative annotation technique for easing neural network training in medical image analysis. Nat. Mach. Intell. 2019, 1, 112–119.
Mehta, N.; Pandit, A.; Shukla, S. Transforming healthcare with big data analytics and artificial intelligence: A systematic mapping study. J. Biomed. Inform. 2019, 100, 103311.
Ker, J.; Wang, L.; Rao, J.; Lim, T. Deep Learning Applications in Medical Image Analysis. IEEE Access 2018, 6, 9375–9389.
Chen, M.; Shi, X.; Zhang, Y.; Wu, D.; Guizani, M. Deep Feature Learning for Medical Image Analysis with Convolutional Autoencoder Neural Network. IEEE Trans. Big Data 2021, 7, 750–758.
Willemink, M.J.; Koszek, W.A.; Hardell, C.; Wu, J.; Fleischmann, D.; Harvey, H.; Folio, L.R.; Summers, R.M.; Rubin, D.L.; Lungren, M.P. Preparing Medical Imaging Data for Machine Learning. Radiology 2020, 295, 4–15.
Shi, F.; Wang, J.; Shi, J.; Ziyan, W.; Wang, Q.; Tang, Z.; He, K.; Shi, Y.; Shen, D. Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation, and Diagnosis for COVID-19. IEEE Rev. Biomed. Eng. 2021, 14, 4–15.
Gosalia, A.; Haramati, L.B.; Sheth, M.P.; Spindola-Franco, H. CT detection of acute myocardial infarction. Am. J. Roentgenol. 2004, 182, 1563–1566.
Gonoi, W.; Hayashi, T.Y.; Okuma, H.; Akahane, M.; Nakai, Y.; Mizuno, S.; Tateishi, R.; Isayama, H.; Koike, K.; Ohtomo, K. Development of pancreatic cancer is predictable well in advance using contrast-enhanced CT: A case-cohort study. Eur. Radiol. 2017, 27, 4941–4950.
Chang, E.H.; Chong, W.K.; Kasoji, S.K.; Fielding, J.R.; Altun, E.; Mullin, L.B.; Kim, J.I.; Fine, J.P.; Dayton, P.A.; Rathmell, W.K. Diagnostic accuracy of contrast-enhanced ultrasound for characterization of kidney lesions in patients with and without chronic kidney disease. BMC Nephrol. 2017, 18, 266.
Cayet, S.; Pasco, J.; Dujardin, F.; Besson, M.; Orain, I.; De Muret, A.; Miquelestorena-Standley, E.; Thiery, J.; Genet, T.; Le Bayon, A.G. Diagnostic performance of contrast-enhanced CT-scan in sinusoidal obstruction syndrome induced by chemotherapy of colorectal liver metastases: Radio-pathological correlation. Eur. J. Radiol. 2017, 94, 180–190.
Rawson, J.V.; Pelletier, A.L. When to order contrast-enhanced CT. Am. Fam. Physician 2013, 88, 312–316.
Ohno, Y.; Koyama, H.; Lee, H.Y.; Miura, S.; Yoshikawa, T.; Sugimura, K. Contrast-enhanced CT- and MRI-based perfusion assessment for pulmonary diseases: Basics and clinical applications. Diagn. Interv. Radiol. 2016, 22, 407–421.
Scheffel, H.; Stolzmann, P.; Frauenfelder, T.; Schertler, T.; Desbiolles, L.; Leschka, S.; Marincek, B.; Alkadhi, H. Dual-energy contrast-enhanced computed tomography for the detection of urinary stone disease. Investig. Radiol. 2007, 42, 823–829.
Fletcher, J.G.; Johnson, C.D.; Krueger, W.R.; Ahlquist, D.A.; Nelson, H.; Ilstrup, D.; Harmsen, W.S.; Corcoran, K.E. Contrast-enhanced CT colonography in recurrent colorectal carcinoma: Feasibility of simultaneous evaluation for metastatic disease, local recurrence, and metachronous neoplasia in colorectal carcinoma. Am. J. Roentgenol. 2002, 178, 283–290.
Iavarone, M.; Piscaglia, F.; Vavassori, S.; Galassi, M.; Sangiovanni, A.; Venerandi, L.; Forzenigo, L.V.; Golfieri, R.; Bolondi, L.; Colombo, M. Contrast enhanced CT-scan to diagnose intrahepatic cholangiocarcinoma in patients with cirrhosis. J. Hepatol. 2013, 58, 1188–1193.
Chen, L.D.; Xu, H.X.; Xie, X.Y.; Lu, M.D.; Xu, Z.F.; Liu, G.J.; Liang, J.Y.; Lin, M.X. Enhancement patterns of intrahepatic cholangiocarcinoma: Comparison between contrast-enhanced ultrasound and contrast-enhanced CT. Br. J. Radiol. 2008, 81, 881–889.
Prionas, N.D.; Lindfors, K.K.; Ray, S.; Huang, S.Y.; Beckett, L.A.; Monsky, W.L.; Boone, J.M. Contrast-enhanced dedicated breast CT: Initial clinical experience. Radiology 2010, 256, 714–723.
Cha, S.; Abusharekh, A.; Abidi, S.S. Towards a ‘Big’ Health Data Analytics Platform. In Proceedings of the IEEE First International Conference on Big Data Computing Service and Applications, Redwood City, CA, USA, 30 March–2 April 2015; pp. 233–241.
Vuppala, S.K.; Dinesh, M.S.; Viswanathan, S.; Ramachandran, G.; Bussa, N.; Geetha, M. Cloud based big data platform for image analytics. In Proceedings of the 2017 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), Bangalore, India, 1–3 November 2017; pp. 11–18.
Chrimes, D.; Moa, B.; Zamani, H.; Kuo, M.H. Interactive healthcare big data analytics platform under simulated performance. In Proceedings of the 2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Auckland, New Zealand, 8–12 August 2016; pp. 811–818.
Allen, W.J.; Gabr, R.E.; Tefera, G.B.; Pednekar, A.S.; Vaughn, M.W.; Narayana, P.A. Platform for automated real-time high performance analytics on medical image data. IEEE J. Biomed. Health Inform. 2017, 22, 318–324.
Kaur, J.; Mann, K.S. AI based healthcare platform for real time, predictive and prescriptive analytics using reactive programming. J. Phys. Conf. Ser. 2017, 933, 012010.
American College of Radiology. Liver Imaging Reporting and Data System (LI-RADS) Version 2018. Available online: https://www.acr.org/Clinical-Resources/Reporting-and-Data-Systems/LI-RADS (accessed on 21 August 2024).
Meyer, B. Object-Oriented Software Construction; Prentice Hall: Englewood Cliffs, NJ, USA, 1988.
Kim, S.; Her, J.; Chang, S. A theoretical foundation of variability in component-based development. Inf. Softw. Technol. 2005, 47, 663–673.
American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS). 2013. Available online: https://www.acr.org/Clinical-Resources/Reporting-and-Data-Systems/Bi-Rads (accessed on 21 August 2024).
American College of Radiology. Prostate Imaging Reporting and Data System (PI-RADS). 2019. Available online: https://www.acr.org/Clinical-Resources/Reporting-and-Data-Systems/PI-RADS (accessed on 21 August 2024).
Loma Linda University Medical Center. Department of Radiology, Loma Linda, CA, USA. Available online: https://lluh.org/services/radiology (accessed on 21 August 2024).
Pressman, R.S. Software Engineering: A Practitioner’s Approach; McGraw-Hill: New York, NY, USA, 2014.
Kan, S.H. Metrics and Models in Software Quality Engineering; Addison-Wesley: Reading, MA, USA, 2003.

Figure 1. Temporal changes in a lesion with contrast-enhanced CT.

Figure 2. Applying microkernel architecture for extensibility.

Figure 3. Functional components in each control layer.

Figure 4. Defining the required interface for plugin components.

Figure 5. Class diagram for persistent datasets.

Figure 6. Software process for disease diagnosis with CECT scans.

Figure 7. Applying the Template Method Pattern for customizability.

Figure 8. Configuring ML models by applying the Bridge Design Pattern.

Figure 9. Specializing the diagnosis manager for HCC liver cancer diagnosis.

Figure 10. Python program for specializing segment_tumor().

Figure 11. Original CT slice and the mask of the liver organ.

Figure 12. Integrating CNN models and LSTM models for HCC diagnosis.

Figure 13. Web user interface of the HCC diagnosis system.

Figure 14. Example of an HCC diagnosis report.

Figure 15. Performance of the ML models for HCC diagnosis.

Table 1. Diagnosis tasks performed by machine learning models.

Diagnosis Tasks	Medical Objects
Segmentation	Organ
Segmentation	Lesion
Segmentation	Tumor
Classification	Medical Features of Tumors
Classification	Temporal Change in Tumors
Classification	Occurrence of Diseases
Classification	Stage of Diseases

Table 2. Variability analysis of the functional components.

Functional Components	Variability Type	Variation Points	Variant Scope	Set of Variants
Organ Segmentation Manager	Attribute	Organ	Selection	Organ of Interest for Target Disease
Organ Segmentation Manager	ML Model	CNN Segmentation Model	Open	U-Net, V-Net, RCNN, FCNN, SegNet, etc.
Lesion Segmentation Manager	Attribute	Set of Target Lesions	Selection	Types of Lesions for Target Disease
Lesion Segmentation Manager	ML Model	CNN Segmentation Model	Open	U-Net, V-Net, RCNN, FCNN, SegNet, etc.
Tumor Segmentation Manager	Attribute	Set of Target Tumors	Open	Types of Tumors for Target Disease
Tumor Segmentation Manager	ML Model	CNN Segmentation Model	Open	U-Net, V-Net, RCNN, FCNN, SegNet, etc.
Tumor Segmentation Manager	ML Model	RNN Classification Model	Open	Basic RNN, LSTM, GRU, Convolutional LSTM, etc.
Medical Feature Analyzer	Attribute	Set of Medical Feature Types	Open	Types of Medical Features for Target Disease
Medical Feature Analyzer	ML Model	Set of ML Models	Open	ML Models for Analyzing Target Medical Features
Disease Type Classifier	Logic	Algorithm for Disease Classification	Open	Algorithm for Classifying Occurrences of Target Disease
Disease Stage Classifier	Logic	Algorithm for Stage Classification	Open	Algorithm for Classifying Stage of Disease Occurrence

Table 3. Diagnostic criteria for HCC-type liver cancer.

Category	Probability	Imaging Features		Management
LR-1	Definitely Benign	Diagnostic features for a benign entity OR Lesion resolves without treatment		Continue standard surveillance
LR-2	Probably Benign	Suggestive, but not diagnostic, features for a benign entity		Continue standard surveillance. If >1 cm, consider accelerated follow-up, alternative imaging, or multidisciplinary discussion.
LR-3	Intermediate Probability of HCC	Mass HYPERenhancing in the arterial phase with NO WCT; Mass HYPO- or ISOenhancing in the arterial phase: ○ <20 mm with ≤1 of WCT ○ ≥20 mm with NO WCT		Variable. At a minimum, accelerated follow-up; consider alternative imaging or multidisciplinary discussion.
LR-4	Probably HCC	Mass HYPERenhancing in the arterial phase: ○ <10 mm with ≥1 of WCT ○ 10–19 mm with 1 of WCT ○ ≥20 mm with NO WCT; Mass HYPO- or ISOenhancing in the arterial phase: ○ <20 mm with ≥2 of WCT ○ ≥20 mm with ≥1 of WCT		Variable. Alternative imaging (if distinct advantage) and/or multidisciplinary discussion.
LR-5	Definitely HCC	Mass HYPERenhancing in the arterial phase: ○ 10–19 mm with ≥2 of WCT ○ ≥20 mm with ≥1 of WCT		Multidisciplinary discussion
LR-5V	Tumor in Vein	Definite enhancing soft tissue in vein		Multidisciplinary discussion. Contraindication to liver transplant.
LR-M	Probable Malignancy, but Non-specific for HCC	Favor other malignancy: Arterial rim HYPERenhancement; Central delayed phase enhancement; Concentric enhancement; Peripheral washout; Hepatic retraction; Biliary obstruction greater than expected; Targetoid on DWI or hepatobiliary phase	Favor HCC: Diffuse arterial HYPERenhancement; Delayed washout; Capsule; Distinct rim; Intralesional fat; Diffuse T1 HYPERintensity; Tumor in vein; Nodule-in-nodule; Mosaic architecture; Spontaneous hemorrhage	Variable, may include follow-up, alternative imaging, biopsy, treatment, and/or multidisciplinary discussion
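
The size and WCT thresholds for LR-3, LR-4, and LR-5 in Table 3 can be read as a small decision rule. The following Python sketch is only an illustration of that reading, not the platform's classifier (Table 4 lists learned models for that task). The `Observation` class and `lirads_category` function are hypothetical names introduced here, and the sketch assumes that the LR-3 row "HYPERenhancing with NO WCT" applies to masses below 20 mm, because the LR-4 row explicitly lists "≥20 mm with NO WCT". LR-1, LR-2, LR-5V, and LR-M are not covered.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical container for one liver mass (not part of the platform)."""
    aphe: bool       # True if the mass is HYPERenhancing in the arterial phase
    size_mm: float   # largest diameter of the mass in millimetres
    n_wct: int       # number of WCT features observed (0 or more)


def lirads_category(obs: Observation) -> str:
    """Return LR-3, LR-4, or LR-5 following the thresholds listed in Table 3."""
    if obs.aphe:
        if obs.size_mm >= 20:
            # >=20 mm: any WCT feature gives LR-5, none gives LR-4
            return "LR-5" if obs.n_wct >= 1 else "LR-4"
        if obs.size_mm >= 10:
            # 10-19 mm: >=2 features LR-5, exactly 1 LR-4, none LR-3 (assumed)
            if obs.n_wct >= 2:
                return "LR-5"
            return "LR-4" if obs.n_wct == 1 else "LR-3"
        # <10 mm: any WCT feature gives LR-4, none gives LR-3 (assumed)
        return "LR-4" if obs.n_wct >= 1 else "LR-3"
    # HYPO- or ISOenhancing in the arterial phase
    if obs.size_mm >= 20:
        return "LR-4" if obs.n_wct >= 1 else "LR-3"
    return "LR-4" if obs.n_wct >= 2 else "LR-3"


if __name__ == "__main__":
    print(lirads_category(Observation(aphe=True, size_mm=15, n_wct=2)))
```

A rule table of this kind is mainly useful as a sanity check against the outputs of the trained classifiers; it does not replace radiologist review.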

Table 4. Machine learning models for HCC diagnosis.

ML Models	Organ Segmentation Model	Lesion Segmentation Model	Tumor Segmentation Model	Hepatic Segmentation Model	Image Feature Classifier	HCC Type Classifier	HCC Stage Classifier
ML Category	Segmentation	Segmentation	Segmentation	Segmentation	Classification	Classification	Classification
ML Algorithm	U-Net	U-Net	U-Net	U-Net	CNN	LSTM	SVM
Functional Components	Organ Segmentation Manager	Lesion Segmentation Manager	Tumor Segmentation Manager	Lesion/Tumor Segmentation Manager	Medical Feature Analyzer	Disease Type Classifier	Disease Stage Classifier
Prediction	Liver in CT Slices	Lesions in Liver	Tumors in Lesions	Hepatic Segments	List of Image Features	HCC Occurrence	Stage of HCC

Table 5. Computing the Functional Coverage (FC) Index.

Categories of Features	# of Features (Platform)	# of Features (Liver Cancer System)	Functional Coverage (FC)
Functional Components	12	13	92.3%
Persistent Object Classes	16	16	100.0%
Tasks in Diagnosis Process	8	9	88.9%
ML Models for Diagnosis	6	7	85.7%
Sum	42	45	93.3%
	N_f	N_f
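
The percentages in Table 5 are consistent with dividing the number of features provided by the platform by the number of features required by the liver cancer system. The short Python sketch below reproduces that arithmetic. The ratio is inferred from the table values rather than quoted from a stated formula, and the names `FEATURE_COUNTS` and `functional_coverage` are hypothetical.

```python
# Feature counts copied from Table 5: (platform, liver cancer system)
FEATURE_COUNTS = {
    "Functional Components": (12, 13),
    "Persistent Object Classes": (16, 16),
    "Tasks in Diagnosis Process": (8, 9),
    "ML Models for Diagnosis": (6, 7),
}


def functional_coverage(n_platform: int, n_system: int) -> float:
    """Fraction of the target system's features that the platform provides."""
    return n_platform / n_system


if __name__ == "__main__":
    for category, (n_platform, n_system) in FEATURE_COUNTS.items():
        fc = functional_coverage(n_platform, n_system)
        print(f"{category}: {fc:.1%}")

    # Overall FC uses the column sums, not the mean of the per-category values
    total_platform = sum(p for p, _ in FEATURE_COUNTS.values())
    total_system = sum(s for _, s in FEATURE_COUNTS.values())
    print(f"Sum: {functional_coverage(total_platform, total_system):.1%}")
```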

Table 6. Computing the Feature Satisfaction (FS) Index.

Categories	List of Features	Weight (W_i)	Compliance (C_i)	W_i * C_i	Feature Satisfaction (FS)
Functional Components
	Patient Profile Manager	0.01	1	0.01	100%
	Symptom Manager	0.01	1	0.01	100%
	Medical Image Manager	0.03	1	0.03	100%
	Feedback Manager	0.01	1	0.01	100%
	Diagnosis Session Manager	0.015	0.8	0.012	80%
	Diagnosis Report Generator	0.015	0.9	0.0135	90%
	Organ Segmentation Manager	0.03	1	0.03	100%
	Lesion Segmentation Manager	0.03	1	0.03	100%
	Tumor Segmentation Manager	0.03	1	0.03	100%
	Medical Feature Analyzer	0.03	0.7	0.021	70%
	Disease Type Classifier	0.02	0.9	0.018	90%
	Disease Stage Classifier	0.02	0.9	0.018	90%
	Sub-total	0.25		0.2325	93.0%
Persistent Object Classes
	Patient	0.01	1	0.01	100%
	Diagnosis Report	0.015	1	0.015	100%
	CECT	0.01	1	0.01	100%
	Diagnosis Session	0.025	0.8	0.02	80%
	Disease	0.025	1	0.025	100%
	Lesion	0.025	1	0.025	100%
	Tumor	0.025	1	0.025	100%
	Medical Feature	0.025	0.7	0.0175	70%
	Organ Segmentation Model	0.01	1	0.01	100%
	Lesion Segmentation Model	0.01	1	0.01	100%
	Tumor Segmentation Model	0.01	1	0.01	100%
	Disease Classification Model	0.01	1	0.01	100%
	Stage Classification Model	0.01	1	0.01	100%
	Segmentation Model	0.01	1	0.01	100%
	Machine Learning (ML) Model	0.02	0.9	0.018	90%
	Classification Model	0.01	1	0.01	100%
	Sub-total	0.25		0.2355	94.2%
Tasks in Diagnosis Process
	Step 3. Segment target organ	0.035	1	0.035	100%
	Step 4. Segment lesions	0.035	1	0.035	100%
	Step 5. Segment tumors	0.035	0.8	0.028	80%
	Step 6. Analyze medical features	0.035	0.7	0.0245	70%
	Step 7. Analyze temporal changes	0.035	0.8	0.028	80%
	Step 8. Classify disease occurrence	0.03	0.8	0.024	80%
	Step 9. Classify disease stages	0.03	0.9	0.027	90%
	Step 10. Generate diagnosis report	0.015	1	0.015	100%
	Sub-total	0.25		0.2165	86.6%
ML Models used in Diagnosis
	Organ Segmentation Model	0.04	1	0.04	100%
	Lesion Segmentation Model	0.05	1	0.05	100%
	Tumor Segmentation Model	0.05	1	0.05	100%
	Image Feature Classifier	0.05	0.7	0.035	70%
	Disease Type Classifier	0.03	0.9	0.027	90%
	Disease Stage Classifier	0.03	0.9	0.027	90%
	Sub-total	0.25		0.229	91.6%
	GRAND TOTAL	1		0.9135	91.4%
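
The sub-totals in Table 6 are consistent with a weighted average: the sum of W_i * C_i divided by the sum of W_i over the features in a category. The Python sketch below checks this for the "ML Models used in Diagnosis" category and for the grand total. The formula is inferred from the table values, and `feature_satisfaction`, `ML_MODEL_FEATURES`, and `SUBTOTALS` are hypothetical names used only for this illustration.

```python
import math

# (weight W_i, compliance C_i) for "ML Models used in Diagnosis" in Table 6
ML_MODEL_FEATURES = {
    "Organ Segmentation Model": (0.04, 1.0),
    "Lesion Segmentation Model": (0.05, 1.0),
    "Tumor Segmentation Model": (0.05, 1.0),
    "Image Feature Classifier": (0.05, 0.7),
    "Disease Type Classifier": (0.03, 0.9),
    "Disease Stage Classifier": (0.03, 0.9),
}

# (sum of W_i, sum of W_i * C_i) per category, copied from the sub-total rows
SUBTOTALS = {
    "Functional Components": (0.25, 0.2325),
    "Persistent Object Classes": (0.25, 0.2355),
    "Tasks in Diagnosis Process": (0.25, 0.2165),
    "ML Models used in Diagnosis": (0.25, 0.229),
}


def feature_satisfaction(features: dict) -> float:
    """Weighted average compliance: sum(W_i * C_i) / sum(W_i)."""
    weighted = sum(w * c for w, c in features.values())
    total_weight = sum(w for w, _ in features.values())
    return weighted / total_weight


if __name__ == "__main__":
    print(f"ML models FS: {feature_satisfaction(ML_MODEL_FEATURES):.3f}")

    # Grand total: combine the category sub-totals
    grand_weight = sum(w for w, _ in SUBTOTALS.values())
    grand_weighted = sum(wc for _, wc in SUBTOTALS.values())
    print(f"Overall FS: {grand_weighted / grand_weight:.4f}")
```

Because the category weights sum to 1, the overall FS equals the sum of the W_i * C_i column.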

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
