#### 3.1.2. Lung Cancer

Lung cancer poses a significant threat to human life worldwide and has the highest mortality rate among cancers. Early detection is crucial for timely diagnosis and subsequent treatment. However, conventional methods for lung cancer detection have limitations, such as low sensitivity, high cost, and invasiveness, which restrict their practicality [69]. In this section, we review optical biosensors for detecting lung cancer and explore how machine learning can aid in analyzing their data and improving their application.

Image-based detection relies on images or videos of cells, which must be processed to identify and quantify the cells [41]. For example, Hashemzadeh et al. [70] developed a microfluidic chip for lung cancer detection that employs image-based analysis. Images were obtained using an inverted Olympus fluorescence microscope and were analyzed by a deep learning model. The researchers achieved an accuracy of 98.37% in classifying images of lung cancer cell lines and normal cell lines. An overview of the combined microfluidic and deep learning approach is shown in Figure 3A. As another example, Sui et al. [71] described a microfluidic imaging flow cytometer that detects lung cancer using complex-field imaging and fluorescence detection subsystems. The system can analyze millions of cells and provide a hierarchical analysis of the intrinsic morphological descriptors of single-cell optical and mass density, as well as fluorescently labeled biochemical markers. The data collected from the system were used to train deep learning-based models, which achieved classification accuracies ranging from 91% to 95% for lung cancer detection.

Volatile organic compounds (VOCs) are potential biomarkers for lung cancer detection. Nguyen et al. [72] developed a controllable gap plasmonic color film biosensor for the detection and quantification of VOCs, with the goal of diagnosing lung cancer from VOC gases in exhaled breath samples. The color changes in the sensor arrays upon exposure to humidity and VOCs were recorded using a camera, and a CNN model was trained to classify them into different VOCs (Figure 3B). The authors collected data from 70 healthy individuals and 50 lung cancer patients and trained the ML models, reporting training classification accuracies of 90% and 92.8% for lung cancer patients and healthy individuals, respectively. They achieved a classification accuracy of 89% on the test data.

Wei et al. [73] developed a label-free method for the automatic classification of small cell lung cancer (SCLC) and poorly differentiated lung adenocarcinoma (PD-LUAD) cells by combining two-dimensional (2D) light-scattering static cytometry with machine learning (Figure 3C). A laser was used to probe the cells, and measurements of forward and side scattered light enabled the differentiation of overlapping SCLC and PD-LUAD cells. By employing a support vector machine (SVM) classifier, the team classified these cells with an accuracy greater than 99.78%.
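To make the classification step concrete, the following is a minimal sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss. It is not the implementation of Wei et al. [73]: the two features per cell (forward and side scattered intensity), the synthetic data, and all hyperparameters are assumptions chosen for illustration, and the original work may use a different kernel and feature set.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.001, lr=0.05, epochs=100, seed=0):
    """Minimise lam/2*||w||^2 + hinge loss by stochastic subgradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (dot(w, X[i]) + b)
            # The L2 regulariser shrinks w slightly on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Sample lies inside the margin: follow the hinge subgradient.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Synthetic (hypothetical) cells: [forward scatter, side scatter] in arbitrary units.
rng = random.Random(42)

def make_cells(center, label, n=50):
    return [([rng.gauss(center[0], 0.2), rng.gauss(center[1], 0.2)], label)
            for _ in range(n)]

cells = make_cells((2.0, 0.5), +1) + make_cells((0.5, 2.0), -1)
X = [features for features, _ in cells]
y = [label for _, label in cells]

w, b = train_linear_svm(X, y)
train_accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {train_accuracy:.2f}")
```

In practice, accuracy should be estimated on held-out cells rather than on the training set, and real scattering patterns typically need feature extraction before a classifier of this kind can be applied.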

**Figure 2.** (**A**) Cross-sectional view of the proposed SPR sensor with the experimental setup. Reprinted with permission from [64]. Copyright 2023 Elsevier. (**B**) Colored entities of the designed sensor with a cross-sectional view with the experimental setup. Reprinted with permission from [65]. Copyright 2023 IEEE. (**C**) Schemes of the digital in-line holographic microscope (DIHM). Reprinted with permission from [68]. Copyright 2023 Springer Nature.

**Figure 3.** (**A**) Overview of the combined microscopic cell imaging and deep learning approach. Reprinted from [70]. Copyright 2021 Springer Nature. (**B**) Schematic of the biosensor with the combination of machine learning methods to detect lung cancers. Reprinted from [72]. (**C**) Schematic of the experimental setups. Reprinted with permission from [73]. Copyright 2018 John Wiley and Sons.

Feature extraction plays a vital role in machine learning when dealing with large amounts of data. It identifies and extracts the most relevant and informative characteristics of the data, enabling more effective and efficient analysis [74]. For example, Ahmad et al. [75] presented a single-cell classification system based on a microfluidic platform and light-sheet fluorescence microscopy to classify human mammary epithelial cells, primary tumor cells, and lung metastasis-derived cells. They used an optofluidic device to deliver single cells to the fluorescence microscope and simulated 3D point clouds of the fluorescent markers. They applied feature extraction techniques along with custom CNN models to classify the images. The authors achieved high accuracy on both the simulated and actual datasets, reporting an accuracy of 99.4% on the actual dataset, and studied the effects of varying flow rates on accuracy.
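As a simple illustration of the idea, the sketch below reduces a grayscale image to three summary features. The feature names, the brightness threshold, and the toy 4 × 4 image are assumptions made for this example and are not taken from [74] or [75], where learned CNN features are used instead.

```python
from statistics import fmean, pvariance

def extract_features(image, threshold=128):
    """Reduce a grayscale image (list of rows, values 0-255) to three numbers."""
    pixels = [p for row in image for p in row]
    return {
        "mean": fmean(pixels),              # average intensity
        "variance": pvariance(pixels),      # spread of intensities
        "bright_fraction": sum(p > threshold for p in pixels) / len(pixels),
    }

# Toy image: a bright 2 x 2 "cell" on a dark background (hypothetical values).
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
features = extract_features(image)
print(features)
```

Here 16 pixel values are compressed into three descriptors, which is the kind of dimensionality reduction that keeps downstream classifiers tractable on large image datasets.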

Surface-enhanced Raman scattering (SERS) is a powerful method for identifying chemical information at the single-molecule scale [76]. Lin et al. [76] developed a biosensing platform that identifies and differentiates exosomes derived from cancerous and noncancerous sources. The platform uses a porous-plasmonic SERS chip with CP05 polypeptide to capture and distinguish exosomes without the need for labeling or purification. By combining biological analysis with Raman spectra and machine learning methods, the team differentiated between lung and colon cancer cell-derived exosomes and normal exosomes at the single-vesicle level, achieving an accuracy of 85.72%. The protocol is fast, reliable, and easy to operate, making it a promising tool for early tumor detection and prognosis. As another example, Park et al. [77] used SERS and statistical pattern analysis to identify lung cancer cells (Figure 4). Instead of looking at specific peak positions and amplitudes in the spectrum, they analyzed the whole SERS spectra of exosomes using principal component analysis (PCA). Using this approach, they were able to distinguish exosomes derived from lung cancer cells from those derived from normal cells with 95.3% sensitivity and 97.3% specificity.
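The whole-spectrum PCA idea can be sketched as follows. This is an illustrative example under stated assumptions, not the pipeline of Park et al. [77]: the six-bin spectra, the location and size of the extra peak, and the noise level are synthetic, and only the first principal component is computed (by power iteration) to keep the code free of external libraries.

```python
import random

def first_principal_component(X, iters=200):
    """Return (mean spectrum, unit vector of the leading principal component)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    v = [1.0] * d
    for _ in range(iters):
        # Apply the covariance matrix without forming it: C v = Xc^T (Xc v) / n
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        v = [sum(centered[i][j] * proj[i] for i in range(n)) / n for j in range(d)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return mean, v

def score(spectrum, mean, pc):
    """Project one spectrum onto the principal component."""
    return sum((s - m) * c for s, m, c in zip(spectrum, mean, pc))

# Synthetic spectra (hypothetical): intensities in six wavenumber bins.
rng = random.Random(0)
base = [1.0, 2.0, 1.5, 0.5, 1.0, 0.8]

def make_spectra(extra_peak, n=20):
    return [[v + (extra_peak if j == 2 else 0.0) + rng.gauss(0.0, 0.05)
             for j, v in enumerate(base)] for _ in range(n)]

normal = make_spectra(0.0)
cancer = make_spectra(1.0)   # assumed extra intensity in bin 2

mean, pc1 = first_principal_component(normal + cancer)
normal_scores = [score(s, mean, pc1) for s in normal]
cancer_scores = [score(s, mean, pc1) for s in cancer]
print(min(normal_scores), max(normal_scores))
print(min(cancer_scores), max(cancer_scores))
```

Because the two groups differ mainly in one spectral region, their scores along the first component fall into two separate ranges. The sign of a principal component is arbitrary, so either group may end up on the positive side.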

**Figure 4.** Schematic diagram of lung cancer diagnosis by SERS classification of the exosome. (**A**,**B**) Lung cancer cells and normal cells release exosomes to the extracellular environment, having their own profiles by fusing multivesicular endosomes to the plasma membrane, respectively. (**C**,**D**) Raman spectra of lung cancer cells and normal cell-derived exosomes were achieved by SERS, respectively. (**E**) SERS spectra, achieved by methods of panels (**C**,**D**), are shown. Red lines indicate specific peaks of lung cancer-derived exosomes. (**F**) Exosome classification is obtained by PCA of SERS spectra. Reprinted with permission from [77]. Copyright 2017 American Chemical Society.

#### 3.1.3. Gastrointestinal Cancer

In this section, we will review the application of optical-based biosensors in combination with machine learning to analyze the data collected from sensors for the detection of gastrointestinal cancers, such as pancreatic and liver cancers.

Many recent studies detect pancreatic cancer using exosomes, small vesicles secreted by cancer cells, as biomarkers. For example, Ko et al. [78] developed a multichannel nanofluidic system to analyze crude clinical samples. The exosomes were isolated using a microfluidic chip with a nanoporous membrane that captures exosomes based on their size. The captured exosomes were then analyzed using machine learning algorithms to classify them as either cancerous or non-cancerous. The approach achieved an area under the curve (AUC) of 0.81 in diagnosing pancreatic cancer, indicating its potential for use in clinical settings as a non-invasive diagnostic tool. As another example, Li et al. [79] developed a method for detecting colorectal cancer using exosomes as a specific protein biomarker. They created a microfluidic chip with a 3D porous sponge structure and functionalized it with CD9 antibodies to capture exosomes flowing through the microfluidic channel. The authors then used an anti-SORL1 antibody modified with silicon quantum dots (Si-QDs) to label the captured exosomes and obtain fluorescence images. They extracted three features (luminance, mean, and variance) and trained a random forest (RF) algorithm to classify the exosomes, reporting a classification accuracy of 91.14%. Finally, Cheng et al. [80] described a nano-biosensing chip that uses SERS to detect cancer without the need for antibodies. This study showcased a simple and intelligent detection method for efficiently screening liver cancer, achieving a sensitivity of 90% and a specificity of 92% in distinguishing 50 serum SERS spectra from hepatocellular carcinoma (HCC) patients from 50 serum SERS spectra from healthy individuals.
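For readers less familiar with these metrics, the short example below shows how sensitivity and specificity follow from confusion-matrix counts. The counts are hypothetical: they are simply the values that would be consistent with 90% sensitivity and 92% specificity if both figures were computed over all 50 patient and 50 healthy spectra, which is an assumption rather than something reported in [80].

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of patient spectra correctly flagged
    specificity = tn / (tn + fp)   # fraction of healthy spectra correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts for 50 patient and 50 healthy spectra (assumed, see text).
sens, spec, acc = sensitivity_specificity(tp=45, fn=5, tn=46, fp=4)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```

Under these assumed counts, the overall accuracy would be 91%, which illustrates why sensitivity and specificity are usually reported together: a single accuracy value hides how the errors are split between missed cases and false alarms.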

D'Orazio et al. [81] introduced the concept of machine learning phenomics (MLP), which combines deep learning with time-lapse microscopy to monitor drug responses in colorectal cancer cells. The study evaluated the effectiveness of this approach by comparing it with the conventional methods used to analyze drug responses in these cells. The results demonstrated that MLP can accurately predict drug responses in colorectal adenocarcinoma cells based on their gene expression patterns and that it outperforms the conventional methods in terms of accuracy and efficiency.

Quantum dot immunobiosensors are powerful optical sensors for detecting cancer cells. Saren et al. [82] used them to detect and quantify gastrointestinal tumor biomarkers. They developed quantum dot (QD)-labeled biofilms to detect four biomarkers that indicate the presence of gastrointestinal tumors: CEA, CA125, CA19-9, and AFP. The antibody conjugates of the QDs were analyzed using fluorescence and ultraviolet absorption spectroscopy, and PCA was applied to the images obtained from the collected data. The approach was tested on standard samples rather than clinical samples, achieving a classification precision of 99.52% and 99.03% and a classification accuracy of 94.86% and 94.2% for colon tumors and gastric tumors, respectively.

Pyruvate kinase disease (PKD) is an inherited disorder that affects red blood cell metabolism and may be associated with an increased risk of developing liver cancer and some types of colon and kidney cancer [83,84]. Mencattini et al. [85] described a machine learning microfluidic-based platform that integrates lab-on-chip devices and data analysis algorithms to evaluate the plasticity of red blood cells in PKD monitoring. The platform uses microfluidic channels to measure the deformability of red blood cells, which is a critical indicator of the disease. The blood cells were recorded as they passed through a 'forest of pillars', and the video was saved for offline analysis. The data collected from the microfluidic device were then analyzed using machine learning algorithms to determine the severity of the disease. The efficacy of three pre-trained deep learning architectures (AlexNet, ResNet-101, and NasNetLarge) was tested on actual samples. On the live samples, the performance of AlexNet was 88%, ResNet-101 was 82%, and NasNetLarge was 85%.

#### 3.1.4. Gynecological Cancer

The most common types of gynecological cancers are cervical, ovarian, and endometrial (uterine) cancers. Late diagnosis and chemoresistance present significant obstacles to the successful treatment of gynecological cancers. Therefore, there is a pressing need to develop new markers to detect gynecological cancers at an early stage. In this regard, biosensors that are low-cost and non-invasive hold great potential for predicting these types of cancers at an early stage [86]. Moreover, with the emergence of biosensors that generate large amounts of data, the application of machine learning to analyze this data has become increasingly important.

High-content video flow cytometry (VFC) uses a 2D light-scattering technique to project optical signals from cells onto an image sensor without optical focusing. This allows high-content patterns to be obtained and combined with machine learning algorithms, enabling automated, high-throughput analysis of single cells. The VFC technique developed by Liu et al. [87] achieves a measurement rate of around 1000 unlabeled cells per minute and classifies cervical carcinoma cell lines, including Caski, HeLa, and C33-A cells, with high accuracy. Using a deep learning model, the authors reported accuracies of 91.5%, 90.5%, and 90.5% for these cell lines. The study provides high-quality cell images, automatic digital filtering, and label-free cell classification, offering potential clinical applications. An illustration of the high-content VFC is shown in Figure 5.

**Figure 5.** The schematic diagram of high-content VFC and the 3D schematic diagram of the sheath flow in the flow chamber. Reprinted with permission from [87]. Copyright 2022 John Wiley and Sons.

Serum biomarkers are frequently used for cancer screening and diagnostic testing because of their sensitivity and specificity. For example, Kim et al. [88] developed a nanosensor array and a computational model for the perception-based detection of ovarian cancer from patient serum samples. The researchers aimed to diagnose ovarian cancer based on the unique spectral characteristics of carbon nanotubes modified with quantum defects, using machine learning algorithms to analyze the spectral data obtained from the serum samples. They trained and validated several machine learning classifiers with 269 serum samples to distinguish ovarian cancer patients from those with other diseases and from healthy individuals. The SVM algorithm yielded the best F-scores among the five machine learning algorithms tested, with an accuracy of 95%.
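Selecting a classifier by F-score can be sketched as follows. The classifier names and confusion-matrix counts are hypothetical placeholders used only to show the calculation, not results from [88].

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical confusion counts for three unnamed classifiers (not data from [88]).
candidates = {
    "model_a": {"tp": 48, "fp": 2, "fn": 3},
    "model_b": {"tp": 40, "fp": 10, "fn": 11},
    "model_c": {"tp": 45, "fp": 8, "fn": 6},
}
scores = {name: f1_score(**counts) for name, counts in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The F-score is useful when the classes are imbalanced, as is common in screening cohorts, because it ignores true negatives and therefore cannot be inflated by a large healthy group.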

In another study, Pirone et al. [89] used digital holography to model cells in 3D rather than 2D space, which provides a better characterization of endometrial cancer cells. They extracted 67 features, such as morphological and histogram features, as inputs to machine learning algorithms. To compare classification performance with the 3D and 2D features, several common machine learning methods were trained and tested on the feature data. The results show that the 3D features achieve better classification performance and that the linear discriminant analysis (LDA) classifier achieves the best score.
