Article

YOLO-PBESW: A Lightweight Deep Learning Model for the Efficient Identification of Indomethacin Crystal Morphologies in Microfluidic Droplets

School of Mechatronic Engineering, Guangdong Polytechnic Normal University, Guangzhou 510665, China
*
Author to whom correspondence should be addressed.
Micromachines 2024, 15(9), 1136; https://doi.org/10.3390/mi15091136
Submission received: 10 August 2024 / Revised: 4 September 2024 / Accepted: 5 September 2024 / Published: 6 September 2024
(This article belongs to the Special Issue Recent Advances in Lab-on-a-Chip and Their Biomedical Applications)

Abstract:
Crystallization is important to the pharmaceutical, chemical, and materials fields, where crystal morphology is one of the key factors affecting crystallization quality. High-throughput screening based on microfluidic droplets is a potent technique to accelerate the discovery and development of new crystal morphologies of active pharmaceutical ingredients. However, the massive amount of crystal morphology data must be identified completely and accurately, which is time-consuming and labor-intensive. Effective morphology detection and small-target tracking are therefore essential for high-efficiency experiments. In this paper, an improved YOLOv8-based algorithm (YOLO-PBESW) for detecting indomethacin crystals with different morphologies is proposed. We enhanced its capability in detecting small targets through the integration of a high-resolution feature layer P2 and the adoption of a BiFPN structure. Additionally, an EMA mechanism was added before the P2 detection head to improve network attention towards global features. Furthermore, we replaced SPPF with SimSPPF to mitigate computational costs and reduce inference time. Lastly, the CIoU loss function was substituted with WIoUv3 to improve detection performance. The experimental findings indicate that the enhanced YOLOv8 model attained AP metrics of 93.3%, 77.6%, 80.2%, and 99.5% for crystal wire, crystal rod, crystal sheet, and jelly-like phase, respectively. The model also achieved a precision of 85.2%, a recall of 83.8%, and an F1-score of 84.5%, with a mAP of 87.6%. In terms of computational efficiency, the model occupies 5.46 MB and processes each image in 12.89 ms, corresponding to a speed of 77.52 FPS.
Compared with state-of-the-art lightweight small-object detection models such as the FFCA-YOLO series, our proposed YOLO-PBESW model achieved improvements in detecting indomethacin crystal morphologies, particularly for crystal sheets and crystal rods. The model demonstrated AP values that exceeded L-FFCA-YOLO by 7.4% for crystal sheets and 3.9% for crystal rods, while also delivering a superior F1-score. Furthermore, YOLO-PBESW maintained a lower computational complexity, with only 11.8 GFLOPs and 2.65 M parameters, and achieved a higher FPS. These outcomes collectively demonstrate that our method achieves a balance between precision and computational speed.

1. Introduction

Crystallization is an operation employed within the pharmaceutical, chemical, materials, and food industries for the targeted formation, purification, and isolation of crystalline organic compounds. The quality of crystallization is directly linked to pharmacological and therapeutic effectiveness, and one of the most important characteristics for evaluating the quality of a crystal is its morphology [1,2]. Pharmaceutical crystallization also plays a crucial role in downstream processing, such as the filtration, drying, and milling of active pharmaceutical ingredients (APIs) [3,4,5]. However, the inherent crystal morphology of many drugs can significantly hinder these processes. To address this challenge, crystal engineering strategies have been employed to manipulate crystal growth and achieve desirable morphologies. Antisolvent crystallization is a commonly used method for purifying APIs [6,7]. This technology usually involves targeting specific intermolecular interactions through the meticulous selection of solvents and the inclusion of tailored additives. The concentration ratio of the drug solvent to the antisolvent can significantly affect the crystal morphology, which in turn influences the concentration of APIs and ultimately impacts the clinical efficacy of the drug. Kim et al. [8] investigated the crystallization of indomethacin using supercritical carbon dioxide and aqueous antisolvent methods, exploring the impact of experimental conditions and habit-modifying agents on crystal size, morphology, and polymorphic form. They concluded that variations in drug concentration and in the mixing rate of the drug and antisolvent can affect the size of indomethacin crystals, while using different antisolvents results in distinct crystal morphologies, including crystal sheets, crystal rods, and crystal wires.
Additionally, precise control of the experimental parameters during crystallization is crucial. Although effective, this traditional approach is inherently empirical, relying on the scientist's expertise and often requiring extensive trial-and-error experimentation. This can create a significant bottleneck in terms of time, resource consumption, and overall efficiency [9]. Droplet microfluidic technologies can precisely control experimental conditions, reduce operation time, and conduct a substantial volume of crystallization experiments using only a minimal quantity of APIs [10,11,12]. Sun et al. [13] developed a droplet-based system for protein crystallization, which enabled automated sample introduction and the generation of droplets at an ultra-high throughput of up to 6000 droplets per hour. Yadavali et al. [14] developed a microfluidic droplet platform that efficiently generated large quantities of polymer microparticles, achieving a production rate of 227 g/h. This system significantly enhances production rates, offering a scalable solution for pharmaceutical applications. Fortt et al. [15] produced spherical crystalline particles of the antiretroviral HIV API cabotegravir via solvent extraction on droplet microfluidic devices. Parallelization led to a 100-fold increase in throughput over single-channel devices, with the resulting drug excipient demonstrating stability during downstream processing. However, the ultra-high throughput of droplet microfluidics generates extensive datasets covering many small-object crystals that also obscure each other. In this situation, it is hard for humans to quickly identify different crystal morphologies, and automatic screening platforms (such as CrystalQuick X plates from Greiner Bio-One or In Situ-1 plates from MiTeGen) are expensive and require specialized hardware such as translational stages for crystal alignment [16,17,18].
These problems lead to a lack of high-throughput characterization approaches, which limits production rates [19].
Recently, the advent and progression of artificial intelligence have facilitated deep learning-based methods for object detection and classification, making it possible to achieve fast and accurate detection of crystal morphologies [20]. Scholars from Thailand utilized the VGG16 model to identify sugar crystals [21]. Scholars from Princess Nourah Bint Abdulrahman University combined squeeze-excitation (SQE) with a dense neural network (DCNN) to achieve high accuracy and efficiency in defect detection in silicon nitride crystal structures [22]. Yann et al. designed CrystalNet, a convolutional neural network (CNN) trained on a dataset comprising 163,894 high-resolution, grayscale labeled images derived from protein crystallization trials conducted on 1536-well plates [23]. This model achieved an accuracy of 0.908 and an impressive area under the receiver operating characteristic curve (AUC-ROC) of 0.9903 for the classification of crystal classes. Manee et al. modified RetinaNet, designing a novel deep learning network to solve the problem of crystal detection in high-density solute [24].
This paper builds on this work [11] by further examining and employing indomethacin, a nonsteroidal anti-inflammatory drug [25], as a research subject. We used droplet microfluidic technologies to obtain different morphologies of indomethacin, including crystal wire, crystal sheet, crystal rod, and jelly-like phase. The impact of crystal morphology on the properties of indomethacin is substantial. Different crystal forms and shapes can greatly influence a drug’s physical and chemical properties, including its solubility, its dissolution rate, and ultimately its bioavailability and efficacy [26]. The presence of jelly-like substances during the crystallization process can significantly reduce the crystallization rate of the active drug components, hindering the production of high-quality crystals [27]. Indomethacin’s polymorphism is complex, but the α-form and the γ-form are the most produced and useful forms [28]. Crystal wire corresponds to the γ-form of indomethacin, whereas crystal rod and crystal sheet belong to the α-form. Based on their different physical and chemical properties, these crystal forms can be developed into various pharmaceutical dosage forms, such as tablets and capsules [8,29,30]. Additionally, the formation of crystal wire, crystal sheet, and crystal rod requires different drug concentrations in the solvent, as well as varying types and ratios of antisolvents, which in turn results in varying drug concentrations. Manual identification of these different crystal forms is inefficient, and any errors in identification can lead to the waste of both the active pharmaceutical ingredients and the solvents. To solve the problems of slow manual detection and classification of different indomethacin crystal morphologies and the poor recognition of small target crystals, we proposed an improved detection algorithm (YOLO-PBESW) of crystal morphologies based on YOLOv8. 
Firstly, the original YOLOv8 has only three feature layers (P3, P4, and P5), which are inadequate for the detection of small target crystals. To enhance this detection, we integrated the high-resolution feature layer P2, improving the acquisition of shallow details and location information. Secondly, the PANet architecture is adopted in the neck of YOLOv8; we utilized the BiFPN [31] for multi-scale feature fusion to improve detection performance and reduce computation cost. We also incorporated an EMA mechanism within the architecture. Furthermore, the introduction of the Wise-IoU loss function enhanced the model's robustness and generalization, an improvement that was particularly crucial because the dataset contains low-quality examples.
This paper’s contributions can be summarized as follows:
  • Incorporating the high-resolution feature layer P2 into the architecture to preserve greater detail and location information that enhance the accuracy of the detection of small target crystals;
  • Adopting the methodology of BiFPNs to design a structure that combines features from different scales in place of the original PANet;
  • Adding the EMA mechanism to enhance feature fusion abilities;
  • Utilizing WIoU_v3 as an enhanced loss function to improve the accuracy and generalization of predicted bounding boxes.

2. Related Work

The task of object detection involves not only the classification of objects but also their accurate localization within imagery [32]. There are two main periods in object detection: the conventional object detection era (1998–2014) and the era of object detection based on deep learning (2014–now). In the conventional period, methods such as the scale-invariant feature transform (SIFT) [33], an image descriptor used for object detection and image matching, were common. Histograms of Oriented Gradients (HOG) [34] were also used, computing and compiling histograms of oriented gradients in localized portions of images; such methods had limitations with complex backgrounds, multi-scale objects, and small targets. Feng et al. [35] utilized support vector machines (SVMs) to classify crystals, leading to a high false-positive rate due to low precision. With the advancement of deep learning and computational capabilities, the new era of object detection has made huge progress [36]. The mainstream of object detection based on deep learning can be divided into two categories: two-stage algorithms and one-stage algorithms [37].
Two-stage algorithms like R-CNN [38], Fast R-CNN [39], and Faster R-CNN [40] typically follow a two-stage process. In the first stage, the algorithm generates a set of region proposals (e.g., via a region proposal network, RPN) that efficiently identify and extract image regions likely to contain objects of interest. In the second stage, the algorithm utilizes convolutional networks to classify and regress these proposals. Gao et al. [41] utilized the mask regional convolutional neural network (Mask R-CNN) to achieve the segmentation and categorization of two distinct morphological classes of LGA crystals, but the processing speed was only 10 frames per second. Su et al. [11] developed an integrated system that merges a hydrogel droplet-based platform with the Faster R-CNN deep learning algorithm to facilitate the high-throughput screening of antisolvent crystallization conditions for active pharmaceutical ingredients (APIs). However, they did not modify or optimize the network architecture, resulting in limitations in both accuracy and detection speed.
Although these algorithms have advantages in detection accuracy, they are often complex and limited in detection speed. One-stage algorithms differ from two-stage algorithms: they directly compute the class probability and spatial coordinates of objects, producing the final detections in a single stage. This methodology reduces computation and enhances detection speed but sacrifices some accuracy. Representative one-stage algorithms include the YOLO series and SSD [42]. Jiang et al. [43] proposed a method based on YOLOv4 to reduce the labeling effort and computation load of real-time image segmentation. They used rectangular boxes for object detection to evaluate the characteristic size of crystals, enabling real-time implementation. Fan et al. [44] utilized the YOLOv5 algorithm to detect scintillation crystals; the recognition rate and confidence reached 98% and 80%, respectively. Since our detection task must meet demands for both speed and accuracy, we selected YOLOv8, a state-of-the-art algorithm of the YOLO series, as the base model. We modified and improved its architecture to achieve the fast and precise detection of different indomethacin crystal morphologies.

3. Methodology

3.1. Improved YOLOv8

YOLO (you only look once) is a family of single-stage detection algorithms known for fast speed and high accuracy. The latest iteration, YOLOv8, introduced by Ultralytics in 2023, offers a range of configurations tailored for different performance and resource requirements. There are several reasons for our choice of YOLOv8 as the base model. Firstly, compared with YOLOv5 and YOLOv7 [45], YOLOv8 demonstrates notable improvements in mean Average Precision (mAP), parameter count, and floating point operations (FLOPs) when all models are tested on the COCO dataset [46]. Secondly, the limitations of YOLOv5 include challenges in detecting small targets and a need for improvement in dense-target detection, while the performance of YOLOv7 is constrained by the quality and quantity of training data, the structure of the model, and the hyperparameter settings during training [47]. Thirdly, YOLOv8 has a large and active user community that fosters the development and dissemination of readily available implementation resources. YOLOv8 is provided in a number of variants, including YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, each designed for different detection tasks. The objective of this paper is to design and implement a more efficient algorithm for the detection of crystal morphologies, with the aim of improving high-throughput characterization approaches. Accordingly, the YOLOv8 nano model has been selected as the base model due to its recognition speed; its structure is shown in Figure 1.
The architectural design of the YOLOv8 model encompasses three principal components: the backbone, the neck, and the head. To enhance detection quality, we introduced structural improvements to all three parts; the modified structure is illustrated in Figure 2. Firstly, we added a high-resolution feature layer P2 to acquire more location information, which proved advantageous in the detection of small targets. Secondly, we utilized the methodology of BiFPNs to replace the PANet structure, which reduced computational complexity and improved detection accuracy. Thirdly, we added the EMA mechanism before the detection head to improve detection performance while keeping the model lightweight. Additionally, we substituted the SPPF module with SimSPPF to reduce inference time. Finally, we replaced the CIoU loss function with the WIoUv3 loss function. The specific modifications are described as follows:

3.1.1. High-Resolution Feature Layer P2

Given the presence of some small crystal targets in our dataset and the large downsampling factors of YOLOv8, acquiring feature information for small targets from the deeper feature maps poses a challenge. The YOLOv8 algorithm uses three downsampling factors: 32×, 16×, and 8×. Larger downsampling extracts stronger semantic representations of the features but loses more location information, which is detrimental to the detection of small targets. For example, a downsampling factor of 8 generates an 80 × 80 detection scale, where the receptive field of each grid cell is 8 × 8; if a target's height and width are both less than 8 pixels, the original YOLOv8 algorithm will struggle to discern it. Features derived from shallower layers with less downsampling retain more locational information, which is advantageous for the detection of small targets. Since there are a large number of small targets in our crystal datasets, a high-resolution feature map with 4× downsampling and a detection scale of 160 × 160 was added to the backbone. Its structure is displayed in Figure 3. Incorporating the 160 × 160 scale layer facilitates the propagation of small-crystal information throughout the downsampling, thereby reinforcing the model's capacity for feature fusion and enhancing the precision of small-target detection. Furthermore, the additional detection head extends the detection range for crystals of small size. The enhancement in detection accuracy and range enables the network to more precisely recognize crystal morphologies within the droplet.
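The stride-to-grid arithmetic described above can be sketched in a few lines (an illustrative Python fragment, not part of the model code; the 640 × 640 input size matches the training resolution used later):

```python
# Sketch: relationship between downsampling stride and detection grid
# for a 640 x 640 input (illustrative arithmetic, not model code).
def grid_size(img_size: int, stride: int) -> int:
    """A stride-s downsampling yields an (img/s) x (img/s) grid."""
    return img_size // stride

IMG = 640
# Original YOLOv8 heads: P3 (stride 8), P4 (stride 16), P5 (stride 32).
assert [grid_size(IMG, s) for s in (8, 16, 32)] == [80, 40, 20]
# Added P2 head: stride 4 gives a 160 x 160 grid, so each cell covers
# only a 4 x 4 pixel patch and sub-8-pixel crystals still fill a cell.
assert grid_size(IMG, 4) == 160
```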

3.1.2. BiFPN

Feature pyramid networks (FPNs) employ a top-down methodology, as shown in Figure 4a, enhancing the resolution of the features by upsampling the coarse spatial information present at lower levels and merging it with the semantically richer feature maps at elevated pyramid levels. Path Aggregation Networks (PANs) build upon FPNs by incorporating a complementary bottom-up pathway, as shown in Figure 4b. For example, given a list of multi-scale features $P^{in} = (P^{in}_{l_1}, P^{in}_{l_2}, P^{in}_{l_3}, \ldots)$, where $P^{in}_{l_i}$ denotes the feature at level $l_i$, feature fusion generates a sequence of intermediate features $P^{td} = (P^{td}_{l_1}, P^{td}_{l_2}, P^{td}_{l_3}, \ldots)$. PAN adopts a simple summation strategy for the integration of multi-scale features:
$$P^{td}_{l_i} = \mathrm{Conv}\big(P^{in}_{l_i} + \mathrm{Resize}(P^{td}_{l_{i+1}})\big)$$
$$P^{out}_{l_i} = \mathrm{Conv}\big(P^{td}_{l_i} + \mathrm{Resize}(P^{out}_{l_{i-1}})\big)$$
where $\mathrm{Resize}$ represents an upsampling or downsampling operation, $\mathrm{Conv}$ represents a convolution operation, and $P^{out}_{l_i}$ denotes the output feature at level $l_i$.
This addition facilitates the propagation of precise localization information from the lower levels to the higher levels of the network. The original neck network of YOLOv8 employs a combination of FPNs and PANs that enables the fusion of features across various layers. But Li et al. [48] found that this structure may filter out some essential feature information and result in a large computational cost. Tan et al. [31] proposed the bi-directional feature pyramid network (BiFPN), as displayed in Figure 4c. The BiFPN simplifies the network by removing nodes with only one input edge, as these nodes contribute less to the process of feature fusion. By focusing on nodes that integrate multiple features, the BiFPN reduces computational complexity while still effectively capturing and combining important features compared with FPNs and PANs. It has two main directions to achieve cross-layer information transfer and feature fusion: upward convergence from lower feature layers and downward convergence from higher feature layers. Combining features from various layers and assigning appropriate weights means faster multi-scale target detection. For example, two fused features at level 4 for BiFPN are as follows:
$$P^{td}_{4} = \mathrm{Conv}\left(\frac{w_1 P^{in}_{4} + w_2\,\mathrm{Resize}(P^{in}_{5})}{w_1 + w_2 + \varepsilon}\right)$$
$$P^{out}_{4} = \mathrm{Conv}\left(\frac{w'_1 P^{in}_{4} + w'_2 P^{td}_{4} + w'_3\,\mathrm{Resize}(P^{out}_{3})}{w'_1 + w'_2 + w'_3 + \varepsilon}\right)$$
where ReLU is applied after each $w_i$ to ensure $w_i \geq 0$ $(i = 1, 2, 3, \ldots)$, and the small constant $\varepsilon$ prevents numerical instability. At level 4, $P^{td}_{4}$ serves as the intermediate feature on the top-down pathway, and $P^{out}_{4}$ is the output feature on the bottom-up pathway.
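As a minimal sketch, BiFPN's fast normalized fusion can be illustrated with NumPy: learnable scalar weights pass through ReLU, and the weighted inputs are divided by their sum plus a small ε. The convolution that follows in the real network is omitted, and the feature maps here are toy arrays:

```python
import numpy as np

def fuse(features, weights, eps=1e-4):
    """Fast normalized fusion: ReLU the weights, then take the
    weighted sum of same-shape feature maps over (sum(w) + eps)."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU
    num = sum(wi * f for wi, f in zip(w, features))
    return num / (w.sum() + eps)

# Two same-shape toy maps, e.g. P4_in and an upsampled P5 feature.
a = np.ones((4, 4))
b = np.full((4, 4), 3.0)
fused = fuse([a, b], [1.0, 1.0])
# Equal weights give (almost exactly) the plain average of the maps.
assert np.allclose(fused, 2.0, atol=1e-3)
```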
In hydrogel droplets, crystal sheets and crystal rods have similar morphologies and sizes, making it difficult for the PAN of the original YOLOv8 to distinguish them. This paper utilized the methodology of BiFPNs to fuse cross-level feature information, improving the detection accuracy of crystal rods and crystal sheets.

3.1.3. EMA Mechanism

The EMA [49] mechanism is an innovative and effective multiscale attention module. It differs from previous attention mechanisms such as the coordinate attention mechanism (CA) [50], which incorporates positional information to enhance spatial feature extraction but struggles to capture all critical information in complex spatial settings, and the channel attention mechanism, which refines feature representations by exploiting inter-channel relationships and amplifying informative channels but fails to address the significance of feature information across spatial scales. The EMA mechanism analyzes features across a spectrum of scales, from small to large, and can therefore achieve better performance in intricate detection environments. The structure of the EMA mechanism is shown in Figure 5. Given any input feature tensor $X \in \mathbb{R}^{C \times H \times W}$, EMA partitions $X$ across the channel dimension into $G$ sub-features that learn different semantics. To extract attention weight descriptors from the segmented feature groups, EMA leverages three parallel processing pathways: two operate a 1 × 1 convolution branch, while the third utilizes a 3 × 3 convolution branch. This architecture effectively captures intricate dependencies while minimizing computational expense. Within the 1 × 1 convolution branch, two independent 1D global average pooling operations are employed, encoding channel-wise information along the two spatial dimensions (height and width) separately. The 3 × 3 convolution branch utilizes a single 3 × 3 kernel to capture multi-scale feature representations. Following the two 1D global average pooling operations, EMA adopts a processing methodology from CA, concatenating the two encoded features along the image-height dimension so that they share the same 1 × 1 convolution.
After the convolution, the output is factorized into two vectors. To ensure compatibility with the subsequent linear convolutions, both vectors are passed through separate non-linear sigmoid activation functions. The “re-weight” operation then mixes features from various scales, enhancing or suppressing features through an attention map that is deduced from the input feature map and applied to the original map. After that, as shown in Equation (5), 2D global average pooling is used to encode the global spatial information of the 1 × 1 branch's outputs, and the pooled and remaining outputs are reshaped to $\mathbb{R}^{1 \times C/G}$ and $\mathbb{R}^{C/G \times HW}$, respectively. To ensure efficient computation, EMA applies the softmax function to the outputs of the 2D global average pooling, fitting a 2D Gaussian map to adjust for linear transformations. The first spatial attention map is derived by combining the outputs of these parallel branches with a matrix dot-product operation. Likewise, the same 2D global average pooling is used to encode the global spatial information of the 3 × 3 branch's outputs, with the reshaping of the two branches interchanged, and the second spatial attention map is derived, preserving precise spatial positional information. In the end, EMA combines the spatial attention weight values derived from the output feature maps using the sigmoid function. This approach focuses on the relationship between each pair of pixels and emphasizes global contexts for all pixels.
$$z_c = \frac{1}{H \times W} \sum_{j=1}^{H} \sum_{i=1}^{W} x_c(i, j)$$
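Equation (5) is standard 2D global average pooling, which collapses each channel's H × W grid to a single descriptor; a minimal NumPy sketch:

```python
import numpy as np

def global_avg_pool(x: np.ndarray) -> np.ndarray:
    """2D global average pooling: (C, H, W) -> (C,), averaging each
    channel's H x W spatial grid into one descriptor z_c."""
    C, H, W = x.shape
    return x.reshape(C, H * W).mean(axis=1)

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
z = global_avg_pool(x)
assert z.shape == (2,)
assert np.isclose(z[0], x[0].mean())
```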

3.1.4. WISE-IoU Function

In the domain of object detection, the loss function plays a crucial role in determining the overall performance of the model. Selecting an appropriate loss function can significantly enhance the accuracy and robustness of the bounding box predictions, thereby improving the model’s ability to precisely localize and classify objects within an image.
Different from YOLOv5, which only uses CIoU loss, YOLOv8 utilizes two kinds of regression loss: CIoU loss and DFL (distribution focal loss) [51]. The equations of CIoU are as follows:
$$\mathrm{Loss}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v;$$
$$\alpha = \frac{v}{(1 - IoU) + v};$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2,$$
where $\alpha$ denotes the coefficient employed to balance competing objectives and the parameter $v$ assesses the consistency of the aspect ratio, with $w$, $h$ and $w^{gt}$, $h^{gt}$ denoting the widths and heights of the predicted and labeled boxes, respectively. The centers of the predicted and labeled boxes are represented by $b$ and $b^{gt}$, respectively. $\rho$ denotes the Euclidean distance between the two center points, while $c$ represents the length of the diagonal of the minimum outer rectangle that contains both the predicted box and the labeled box.
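For concreteness, a pure-Python sketch of the CIoU loss above for a single pair of axis-aligned boxes; the corner (x1, y1, x2, y2) box format is an assumption made for illustration:

```python
import math

def ciou_loss(box, gt):
    """CIoU loss for one predicted box and one ground truth box,
    both given as (x1, y1, x2, y2) corners."""
    x1, y1, x2, y2 = box
    g1, g2, g3, g4 = gt
    # Intersection over union
    iw = max(0.0, min(x2, g3) - max(x1, g1))
    ih = max(0.0, min(y2, g4) - max(y1, g2))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g3 - g1) * (g4 - g2) - inter
    iou = inter / union
    # Squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((x1 + x2 - g1 - g3) ** 2 + (y1 + y2 - g2 - g4) ** 2) / 4
    cw = max(x2, g3) - min(x1, g1)
    ch = max(y2, g4) - min(y1, g2)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency v and trade-off coefficient alpha
    v = (4 / math.pi ** 2) * (math.atan((g3 - g1) / (g4 - g2))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

# A perfectly matching box incurs (near-)zero loss.
assert ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)) < 1e-6
```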
While the CIoU loss function applied by YOLOv8 offers significant improvements over the conventional IoU loss, effectively addressing challenges such as bounding box offset and aspect ratio imbalance, its primary utility lies in improving the fitting precision of bounding box regression. However, low-quality data within the dataset can cause the model to overfit, as bounding box regression is overly emphasized for such low-quality examples, which subsequently diminishes the overall detection performance of the model. Since our dataset includes some blurred and small crystals, CIoU's dependence on distance and aspect ratio metrics may unfairly penalize these lower-quality examples. Wise-IoU (WIoU) [52] incorporates weights for the region between predicted bounding boxes and ground truth boxes to solve this problem. There are three versions of WIoU. WIoU_v1 employs an attention mechanism to construct the bounding box loss, focusing on crucial areas within the box to optimize localization accuracy:
$$L_{IoU} = 1 - IoU;$$
$$R_{WIoU} = \exp\left(\frac{(x - x^{gt})^2 + (y - y^{gt})^2}{(W_g^2 + H_g^2)^{*}}\right);$$
$$L_{WIoU\_v1} = R_{WIoU} \cdot L_{IoU},$$
where the IoU loss, denoted as $L_{IoU}$, is defined as the complement of the IoU between the predicted bounding boxes and the ground truth and focuses on the quality of the overlap. $x$ and $y$ are the predicted bounding box's center coordinates, and $x^{gt}$ and $y^{gt}$ are the center coordinates of the ground truth bounding box. The width and height of the minimum enclosing box are represented by $W_g$ and $H_g$, respectively, and the superscript $*$ indicates that these terms are detached from the computation graph. The $R_{WIoU}$ term quantifies the normalized distance between the center points of the predicted and the ground truth bounding boxes.
WIoU_v2 introduces a monotonic static focus mechanism (FM), as shown in Equation (12):
$$L_{WIoU\_v2} = \left(\frac{L_{IoU}^{*}}{\overline{L_{IoU}}}\right)^{\gamma} L_{WIoU\_v1}, \quad \gamma > 0,$$
where $\overline{L_{IoU}}$ represents the exponential running average with momentum. Normalizing by $\overline{L_{IoU}}$ keeps the gradient gain $L_{IoU}^{*} / \overline{L_{IoU}}$ at a high level overall, which solves the problem of slow convergence in the late stages of training.
WIoU_v3 employs a gradient gain as the focusing coefficient, utilizing a nonmonotonic dynamic focusing mechanism to allocate the loss function strategically. This approach ensures that the model prioritizes samples which are challenging to match accurately with the target, thereby significantly enhancing the detection accuracy and robustness. The equations are as follows:
$$\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}};$$
$$r = \frac{\beta}{\delta \alpha^{\beta - \delta}};$$
$$L_{WIoU\_v3} = r \cdot L_{WIoU\_v1},$$
where $\beta$ represents the outlier degree of an anchor box, $r$ is the gradient gain, and the hyper-parameters $\alpha$ and $\delta$ adjust it.
Compared with WIoU_v1 and WIoU_v2, the $\overline{L_{IoU}}$ is dynamic, and the standard for demarcating the quality of anchor boxes is similarly dynamic, so WIoU_v3 can employ a gradient gain allocation strategy that aligns optimally with the prevailing conditions at any given moment. For these reasons, we adopted WIoU_v3 to replace the original CIoU, setting the $\alpha$ value to 1.9 and the $\delta$ value to 3. This configuration allocates smaller gradient gains to lower-quality anchor boxes, enhancing the effectiveness of the bounding box loss function.
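The effect of the nonmonotonic focusing mechanism can be illustrated numerically with these settings (α = 1.9, δ = 3); the IoU-loss values below are made-up inputs for illustration only:

```python
import math

def gradient_gain(l_iou: float, l_iou_mean: float,
                  alpha: float = 1.9, delta: float = 3.0) -> float:
    """WIoU_v3 focusing coefficient: the outlier degree beta compares
    an anchor's IoU loss with the running mean, and r = beta /
    (delta * alpha**(beta - delta)) suppresses extreme outliers."""
    beta = l_iou / l_iou_mean          # outlier degree
    return beta / (delta * alpha ** (beta - delta))

# An average-quality anchor (beta = 1) receives a modest gain, while
# an extreme outlier (beta = 8) is suppressed relative to it.
g_avg = gradient_gain(0.3, 0.3)
g_out = gradient_gain(2.4, 0.3)
assert g_out < g_avg
```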

3.1.5. SimSPPF

To improve detection speed, we substituted the conventional Spatial Pyramid Pooling-Fast (SPPF) module (as shown in Figure 6a) in YOLOv8 with the quicker Simple SPPF (SimSPPF) module, as shown in Figure 6b. Unlike the Sigmoid Linear Unit (SiLU) activation function adopted by SPPF, SimSPPF uses the Rectified Linear Unit (ReLU) activation function, which mitigates the issue of vanishing gradients and facilitates more rapid convergence. The two activation functions are shown in Equations (16) and (17):
$$\mathrm{ReLU}: \quad f(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$
$$\mathrm{SiLU}: \quad f(x) = \frac{x}{1 + e^{-x}}$$
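A minimal Python sketch of the two activations, matching the equations above:

```python
import math

def relu(x: float) -> float:
    """Piecewise-linear ReLU: identity for positive x, zero otherwise."""
    return x if x > 0 else 0.0

def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x) = x / (1 + e^(-x))."""
    return x / (1.0 + math.exp(-x))

assert relu(-2.0) == 0.0 and relu(3.0) == 3.0
# SiLU is smooth, zero at the origin, and slightly negative
# (not clipped to zero) for small negative inputs.
assert silu(0.0) == 0.0
assert -1.0 < silu(-2.0) < 0.0
```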

4. Experiments

4.1. Experimental Environment and Training Parameters

The configuration of the experimental environment is shown in Table 1. The experiments were performed on a workstation with 128 GB RAM, an Intel Xeon Silver 4210R CPU (10 cores and 20 threads), and an Nvidia Quadro RTX 5000 GPU (16 GB memory). All experiments were based on Windows 10, Python 3.8, PyTorch 2.0.1, and CUDA 11.8.
The training settings are shown in Table 2. In the training stage, we employed the Stochastic Gradient Descent (SGD) optimizer with a weight decay of 0.0005 and a momentum of 0.937. The learning rate was set to 0.01. Additionally, the number of training epochs was set to 300 with a batch size of 12, and the input image resolution was set to 640 × 640. To prevent overfitting, early stopping terminated training if the validation metric did not improve for 50 consecutive epochs. A learning rate warm-up strategy was applied for the initial 3 epochs.
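The early-stopping rule can be sketched as follows (a minimal sketch with an assumed function name; it tracks a single higher-is-better validation metric, not the full training loop):

```python
def stopping_epoch(epoch_metrics, patience=50, max_epochs=300):
    """Return the epoch at which training halts: either after
    `patience` consecutive epochs without improvement in the
    validation metric, or at `max_epochs`, whichever comes first."""
    best, best_epoch = float("-inf"), 0
    for epoch, metric in enumerate(epoch_metrics[:max_epochs], start=1):
        if metric > best:
            best, best_epoch = metric, epoch      # new best checkpoint
        elif epoch - best_epoch >= patience:
            return epoch                          # patience exhausted
    return min(len(epoch_metrics), max_epochs)
```

For example, if the metric last improves at epoch 10 and then plateaus, training stops at epoch 60 under the 50-epoch patience used here.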

4.2. Datasets

We developed an advanced high-throughput system employing droplet microfluidic technology to conduct the experiments, generating indomethacin crystals with various morphologies within hydrogel droplets. Images of the droplets and crystals were captured with a fluorescence microscope (TI-U, Nikon, Tokyo, Japan) equipped with a CCD camera (Digital Sight DS-Fi2, Nikon, Tokyo, Japan), and imaging software (NIS-Elements BR, Nikon, Tokyo, Japan) was used to display their morphologies, as shown in Figure 7. Further details can be found in our previous work [11]. We utilized the LabelImg annotation tool to delineate the minimum bounding box encompassing each crystal and JLP instance. The dataset comprised 635 images in total, including 191 images of the jelly-like phase, 193 images of rod crystals, 127 images of sheet crystals, and 124 images of wire crystals (Figure 8). To support efficient training and evaluation, the raw dataset was randomly divided into training, validation, and testing sets at a ratio of 6:3:1. Data augmentation in object detection is an effective and efficient way to avoid low generalization and robustness when data are insufficient [53]. We first applied model-free augmentation [54] to extend the dataset, including horizontal and vertical flipping, lighting and color changes, rotation, cropping, and mirroring. During training, we additionally adopted mosaic data augmentation [55], HSV-hue, HSV-saturation, and HSV-value augmentation, image translation, image scaling, and horizontal flipping to improve the generalization of the model. Mosaic data augmentation randomly combines four training images into a single composite image; because such composites are not representative of the true distribution of natural images, mosaic augmentation was disabled for the last 10 epochs. All of these methods were applied to the training set, as shown in Figure 9.
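The 6:3:1 split can be reproduced with a short sketch (a minimal sketch; the function name and fixed seed are assumptions, not the authors' procedure):

```python
import random

def split_dataset(image_paths, ratios=(0.6, 0.3, 0.1), seed=0):
    """Shuffle the image list reproducibly and split it into
    train/val/test subsets at the paper's 6:3:1 ratio."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)       # reproducible shuffle
    n = len(paths)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```

Applied to the 635 images used here, this yields roughly 381 training, 190 validation, and 64 test images, with every image assigned to exactly one subset.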
Figure 10 demonstrates the fundamental characteristics of the training set, which constitutes a portion of the complete dataset, segmented into four primary sections.

4.3. Evaluation Metrics

In evaluating the performance of crystal detection, we employed a multifaceted approach by considering various metrics:
Confusion matrix: It provides a visual representation of the classification outcomes for each category, as shown in Figure 11. Each row corresponds to actual categories, while each column represents categories predicted by the model. Values along the diagonal indicate the proportion of instances correctly classified into their respective categories. Based on these definitions, a true positive (TP) is when the model correctly predicts a positive sample, and the actual class is indeed positive. A false negative (FN) occurs when the sample’s true class is positive but the model incorrectly predicts it as negative. A false positive (FP) happens when the sample’s true class is negative yet the model incorrectly identifies it as positive. A true negative (TN) is when the sample’s true class is negative and the model correctly identifies it as negative. The confusion matrix is crucial for calculating various metrics, including precision, recall, and F1-score.
Precision (P): This metric is defined as the proportion of TP outcomes relative to the aggregate number of instances classified as positive, including TP and FP.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall (R): This metric calculates the ratio of TP to the total of TP and FN.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
F1-Score: It represents the harmonic mean of precision and recall.
$$F1\_Score = \frac{2 \times P \times R}{P + R}$$
Average precision (AP): AP signifies the mean precision values computed at various recall levels. It also represents the area under the precision–recall curve, providing a comprehensive evaluation of the model’s ability to accurately identify relevant instances across varying recall thresholds. It was calculated by Equation (21), where P indicates precision, and R indicates recall.
$$AP = \int_{0}^{1} P(R)\,dR$$
Mean Average Precision (mAP): mAP is obtained by summing the AP values over all classes and dividing by the total number of classes (num_classes), as the following equation shows:
$$mAP = \frac{\sum AP}{num\_classes}$$
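The metric definitions above translate directly into code (a minimal sketch in plain Python; the function names are assumptions, not the authors' implementation):

```python
def confusion_matrix(y_true, y_pred, classes):
    """Build a confusion matrix: rows are actual classes and
    columns are predicted classes, matching the layout of Figure 11."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from detection counts (Eqs. 18-20)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mean_average_precision(ap_per_class):
    """mAP: the AP averaged over all classes (Eq. 22)."""
    return sum(ap_per_class) / len(ap_per_class)
```

Averaging the four per-class AP values reported later (93.3%, 77.6%, 80.2%, and 99.5%) gives roughly 87.65%, consistent with the reported mAP of 87.6%.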

5. Results and Discussion

5.1. Experimental Analysis

5.1.1. Confusion Matrix

The confusion matrices for YOLOv8n and YOLO-PBESW are presented in Figure 12. Firstly, the YOLO-PBESW demonstrated improved recognition accuracy of crystal wire, crystal rod, and crystal sheet, with respective increases of 2%, 4%, and 1%. Secondly, the misclassification rate of crystal wire, crystal rod, and crystal sheet as background decreased from 13%, 21%, and 25% to 11%, 18%, and 22%, respectively.

5.1.2. P–R Curve

Figure 13 demonstrates the P–R curve graphs of YOLOv8n and YOLO-PBESW. It is obvious that the area enclosed by each curve of our method is significantly larger than the original YOLOv8n, which indicates that our methods achieved a significant improvement in detection.

5.1.3. Detection

Figure 14 shows two sets of comparisons between YOLOv8n and YOLO-PBESW. In scenarios where the images exhibited intricate backgrounds, particularly blurred imagery with overlapping crystals of varied types, YOLOv8n often tended to omit detections. This comparison reveals that our method exhibited superior performance in detecting crystals.

5.2. Ablation Experiments

To evaluate the efficacy of each improvement to the network, including the incorporation of the EMA mechanism, the BiFPN architecture, and the WIoU loss function, we conducted ablation experiments with identical training strategies and hyperparameters. The performance results are given in Table 3, and the complexity results in Table 4. W represents the AP of crystal wire, R the AP of crystal rod, S the AP of crystal sheet, and Jlp the AP of the jelly-like phase. Adding the high-resolution feature layer P2 improved the AP values of the crystal rod, crystal sheet, and crystal wire compared with the original YOLOv8n, from 74.2%, 72.2%, and 92.5% up to 76.2%, 74.1%, and 93.4%, respectively. Utilizing the BiFPN structure caused a slight decrease in the AP of crystal wire and crystal rod but reduced the complexity of the model and the computational cost: FLOPs dropped by 0.3 G, the parameter count by 0.14 M, and the model size by 0.5 MB. Integrating the EMA mechanism before the detection head raised the AP values of the crystal rod and crystal sheet by 1.9% and 1.4%, respectively. Replacing the original SPPF with SimSPPF effectively improved the inference speed while reducing the model size and parameters. Finally, substituting CIoU with WIoU improved the AP values of all the crystals: from 90.1% to 93.3% for crystal wire, to 77.6% for crystal rod (close to the previous value), and from 78% to 80.2% for crystal sheet. Additionally, recall increased from 81% to 83.8%, meaning that fewer crystals were missed, and the increase in precision from 83% to 85.2% signifies more accurate predictions.
The F1-score is the harmonic mean of precision and recall. A higher F1-score indicates a better balance between precision and recall. The final model has the highest F1 score (84.49%), which makes it the most reliable model for accurate crystal detection.

5.3. Comparison of Different Models

The complexity of a model determines whether expensive, high-capacity, high-computing-power equipment is required; it also affects detection speed. To facilitate a comprehensive comparison, we selected representative models from each category. The two-stage algorithm was Faster-RCNN [40] implemented with a ResNet-50 backbone; the one-stage algorithms were YOLOv3-tiny, YOLOv5n, YOLOv7-tiny [45], and YOLOv8n. We employed four key metrics to evaluate the complexity of each model: Floating-Point Operations (FLOPs), the number of parameters, model size, and frames per second (FPS). FLOPs denote the number of arithmetic operations required per image, parameters are the total number of weights that need to be trained, and FPS measures inference speed. The results are shown in Table 5. The complexity of the two-stage algorithm was much higher than that of the one-stage algorithms. YOLOv5n had the lowest complexity, with FLOPs of 4.1 G, 1.76 M parameters, and a model size of 3.78 MB. Compared with YOLOv5n, YOLOv8n was somewhat more complex. Although the FLOPs of our method showed a small increase, to 11.8 G, its parameter count (2.65 M) and model size (5.4 MB) remained low, while YOLOv7-tiny and YOLOv3-tiny were much more complex than our method. We also compared the detection performance of our method with the above algorithms, first using the AP and mAP metrics to measure the detection accuracy for the various indomethacin crystal morphologies. In Table 6, the APs of crystal wire, crystal rod, crystal sheet, and jelly-like phase using YOLOv8n are 92.5%, 74.2%, 72.2%, and 99.5%, respectively.
Although YOLOv8n was not the fastest algorithm, its detection performance showed an overall improvement over the other algorithms at only a slight increase in complexity. Balancing detection accuracy against model lightness, we therefore chose YOLOv8n as the base model to modify for better detection performance. Firstly, YOLO-PBESW demonstrated the highest AP values in detecting each indomethacin crystal morphology: 93.3%, 77.6%, 80.2%, and 99.5% for crystal wire, crystal rod, crystal sheet, and jelly-like phase, respectively. Compared with the basic YOLOv8n, these represent increments of 0.8%, 3.4%, and 8% for crystal wire, crystal rod, and crystal sheet, respectively. Secondly, the precision-recall (PR) curve is a graphical representation used to evaluate the performance of a model, with precision plotted on the vertical axis and recall on the horizontal axis. Precision and recall are inversely related metrics, measuring a model's predictive accuracy and its ability to identify all relevant instances, respectively.

5.4. Comparison of Different Attention Mechanisms

To augment the model’s capability for feature fusion during detection while keeping it lightweight, we integrated the EMA mechanism before the P2 detection head. Different attention mechanisms influence a model’s feature-extraction capability, and hence its target-detection performance, in different ways. Therefore, we used YOLOv8n-P2-BiFPN-SimSPPF-WIoU (YOLO-PBSW) as the baseline and conducted comparative experiments, adding various prevailing attention mechanisms: CA [50], CBAM [56], SEAM [57], AcMix [58], and EMA. The results are illustrated in Table 7.
The model adopting the CA mechanism achieved the best FPS, reaching 79.36. Incorporating the CBAM attention mechanism into YOLO-PBSW led to a marginal enhancement in the AP of crystal wire, but caused the AP of crystal rod and crystal sheet to decline from 77.2% and 80.6% to 76.3% and 78.4%, which suggests that CBAM is not an optimal choice for crystal morphology detection. With the addition of SEAM, the AP of crystal wire and crystal rod improved by 4.5% and 0.3%, but the AP of crystal sheet exhibited a 2.9% decline. AcMix caused a significant drop in FPS, which is inadequate for real-time detection. The model with the EMA mechanism exhibited the best detection performance, with AP values of 93.3% for crystal wire, 77.6% for crystal rod, 80.2% for crystal sheet, and 99.5% for the jelly-like phase. Although the AP of crystal sheet slightly decreased compared with the baseline, the AP of crystal wire increased by 4% and the AP of crystal rod by 0.4%. Additionally, the model with EMA remained one of the most lightweight, and its FPS reached 77.52, showing that YOLO-PBESW achieves a balance between detection performance and speed. This experiment demonstrated the efficacy of integrating the EMA mechanism for recognizing small crystal targets.
Furthermore, the detection results of adding different attention mechanisms are presented in Figure 15. In the first row, the model with the AcMix mechanism incorrectly identified the droplet in the upper-left corner as a crystal sheet, whereas the model with the EMA mechanism, in contrast to the models with SEAM and CBAM, missed neither the target on the right nor the one at the top. In the second row, the performance of the model with the EMA mechanism was clearly the best; models incorporating the other attention mechanisms missed targets to varying degrees. In the third row, compared with the other attention mechanisms, the model with the EMA mechanism showed the most outstanding performance in detecting crystal wire. In the fourth row, although the models fused with CA and SEAM identified as many crystal rods as the model with EMA, the latter demonstrated greater confidence. In the fifth row, the model with the EMA mechanism was able to recognize the remaining crystal rod in the image. In sum, the model with EMA showed excellent performance in detecting the different crystal morphologies.

5.5. Comparison of Different Loss Functions

To assess the effectiveness of different loss functions in enhancing the model for our dataset, we conducted experiments with several loss functions: CIoU, MPDIoU, EIoU, ShapeIoU, and WIoUv3. We selected YOLOv8n-P2-BiFPN-EMA-SimSPPF (YOLO-PBES) as the base model.
Table 8 makes clear that the model adopting WIoUv3 demonstrated superior precision, recall, mAP, and FPS compared to the models using the other loss functions. High precision indicates that when the model predicts a positive instance, it is highly likely to be correct, while high recall suggests that the model minimizes false negatives, ensuring that fewer instances of the target class are missed during detection. The model with WIoUv3 reached a precision of 85.2% and a recall of 83.8%, better than the others. It is also noteworthy that its mAP and FPS were 87.6% and 77.52, respectively. In comparison to the model applying CIoU, the original loss function, utilizing WIoUv3 increased the mAP by 1.1% and the FPS by 14.63, meaning that the WIoUv3 loss function significantly enhanced both the model’s detection performance and speed.

5.6. Comparison of Different Improved Models

FFCA-YOLO is an enhanced version of the YOLO model specifically designed for the improved detection of small objects in remote sensing images. To further validate the effectiveness of the proposed method, we introduced FFCA-YOLO and its lightweight model, L-FFCA-YOLO, for comparison.
As shown in Table 9, YOLO-PBESW is the most efficient model, with the lowest FLOPs (11.8 G), the smallest size (5.46 MB), and the highest FPS (77.52), making it ideal for real-time applications. Firstly, for the more challenging small-object-detection task, our model demonstrated an advantage in detecting crystal rod and crystal sheet, achieving AP improvements of 4.6% and 5.4% over FFCA-YOLO, as well as 3.9% and 7.4% over L-FFCA-YOLO, respectively (Table 10). Secondly, Table 11 shows that YOLO-PBESW outperformed the other models, with a leading mAP of 87.6%, which is 3.4% higher than FFCA-YOLO and 3.8% higher than L-FFCA-YOLO. Additionally, the YOLO-PBESW model achieved the highest F1-score of 84.49%, suggesting that it is more effective at maintaining a low rate of false positives and false negatives.
Based on the comparison of confusion matrices (Figure 16), YOLO-PBESW outperformed the other models in detecting crystal sheet and crystal rod morphologies, achieving 80.2% accuracy for crystal sheet and 77.6% for crystal rod. In comparison, FFCA-YOLO reached only 74.8% accuracy for crystal sheet and 73% for crystal rod, while L-FFCA-YOLO lagged further behind with 72.8% for crystal sheet and 73.7% for crystal rod. These results demonstrate YOLO-PBESW’s superior ability to accurately distinguish between crystal sheet and crystal rod morphologies, leading to lower misclassification rates and more dependable detection outcomes.
Figure 17 illustrates the detection comparison of YOLO-PBESW with two other small-target detection models. In the first column, FFCA-YOLO missed several instances, and L-FFCA-YOLO incorrectly identified the upper-left droplet as a crystal sheet. In the third set of images, only YOLO-PBESW recognized the crystal rod in the bottom-right corner. In the last column, our model identified the largest number of targets. These results show that YOLO-PBESW has better detection performance.

6. Conclusions

The bioavailability, solubility, permeability, and other properties of drugs are primarily influenced by crystal morphology, and different drug crystals require specific crystallization conditions. This paper focused on indomethacin and proposed the YOLO-PBESW network to alleviate the poor recognition and slow detection of indomethacin crystal morphologies, including crystal wire, crystal rod, crystal sheet, and jelly-like phase, thereby reducing the consumption of reagents and active pharmaceutical ingredients. We utilized a high-throughput droplet microfluidic system to generate indomethacin crystals, which were captured using a CCD camera and imaging software. The model enhanced the detection of small targets by integrating the high-resolution feature layer P2 and adopted the BiFPN concept to modify the network structure, reducing the computational cost while improving feature extraction. Additionally, we added the EMA mechanism only before the P2 detection head to improve the network’s attention to global features while remaining lightweight. Furthermore, we replaced SPPF with SimSPPF, which adopts the ReLU activation function, to lower the computational cost. Finally, the CIoU loss function was substituted with WIoUv3, which utilizes a dynamic non-monotonic focusing mechanism to improve detection performance.
The experimental results demonstrated that the YOLO-PBESW network reached 87.6% in the mAP metric, with a model size of 5.46 MB and an FPS of 77.52. Our proposed method demonstrates superior overall performance compared to established algorithms, including Faster-RCNN, YOLOv3-tiny, YOLOv5n, and YOLOv7-tiny, when evaluated for small-target detection of crystals within high-throughput screening droplets. In addition, compared to other algorithms designed for small-object detection, such as FFCA-YOLO and its lightweight version L-FFCA-YOLO, our model exhibits lower complexity while achieving better performance in reducing both false negatives and false positives. This performance suggests significant promise for the method in the study of crystallization. By integrating YOLO-PBESW with droplet microfluidics technology, we can rapidly identify and classify crystal wire, crystal rod, crystal sheet, and jelly-like phase. This combination enables the precise determination of crystallization conditions for the three crystal types and helps to avoid conditions that lead to jelly-like formation, thereby significantly reducing the consumption of reagents and active pharmaceutical ingredients.
The shortcomings of this work and future research directions are as follows. Firstly, the dataset used in this study was not sufficiently comprehensive. In future studies, the dataset will be augmented and enhanced by acquiring a larger number of videos and images for model training and testing. Secondly, the model proposed in this work has been validated only for the recognition of indomethacin crystal morphologies and has not yet been tested on other drug crystal morphologies. Future work will focus on validating and optimizing the model for the recognition of other drug crystals. Thirdly, although the model we proposed demonstrates generally effective performance, its accuracy can be compromised when crystals are partially occluded. This issue may affect detection in scenarios where crystals are not fully visible. We will explore techniques such as distillation and network pruning in the future to further reduce computational complexity and enhance detection performance. Furthermore, we intend to adapt the model for deployment on edge computing platforms. This will involve optimizing the algorithm for reduced computational complexity and memory footprint to ensure seamless integration and operation within such platforms.

Author Contributions

Conceptualization, J.W. and P.Z.; Methodology, J.W.; Validation, J.W.; Investigation, J.L.; Writing—original draft, J.W.; Writing—review & editing, J.S. and P.Z.; Funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 51905557).

Data Availability Statement

The data supporting this study’s findings can be obtained from the corresponding author upon reasonable request. However, the data are not publicly available due to privacy concerns.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Schmitz, C.; Hussain, M.N.; Meers, T.; Xie, Z.; Zhu, L.; Van Gerven, T.; Yang, X. Pervaporation-Assisted Crystallization of Active Pharmaceutical Ingredients (APIs). Adv. Membr. 2023, 3, 100069. [Google Scholar] [CrossRef]
  2. Galata, D.L.; Meszaros, L.A.; Kallai-Szabo, N.; Szabo, E.; Pataki, H.; Marosi, G.; Nagy, Z.K. Applications of Machine Vision in Pharmaceutical Technology: A Review. Eur. J. Pharm. Sci. 2021, 159, 105717. [Google Scholar] [CrossRef]
  3. Pu, S.; Hadinoto, K. Habit Modification in Pharmaceutical Crystallization: A Review. Chem. Eng. Res. Des. 2024, 201, 45–66. [Google Scholar] [CrossRef]
  4. Halliwell, R.A.; Bhardwaj, R.M.; Brown, C.J.; Briggs, N.E.B.; Dunn, J.; Robertson, J.; Nordon, A.; Florence, A.J. Spray Drying as a Reliable Route to Produce Metastable Carbamazepine Form IV. J. Pharm. Sci. 2017, 106, 1874–1880. [Google Scholar] [CrossRef]
  5. Acevedo, D.; Tandy, Y.; Nagy, Z.K. Multiobjective Optimization of an Unseeded Batch Cooling Crystallizer for Shape and Size Manipulation. Ind. Eng. Chem. Res. 2015, 54, 2156–2166. [Google Scholar] [CrossRef]
  6. Di Profio, G.; Stabile, C.; Caridi, A.; Curcio, E.; Drioli, E. Antisolvent Membrane Crystallization of Pharmaceutical Compounds. J. Pharm. Sci. 2009, 98, 4902–4913. [Google Scholar] [CrossRef] [PubMed]
  7. Müller, M.; Meier, U.; Kessler, A.; Mazzotti, M. Experimental Study of the Effect of Process Parameters in the Recrystallization of an Organic Compound Using Compressed Carbon Dioxide as Antisolvent. Ind. Eng. Chem. Res. 2000, 39, 2260–2268. [Google Scholar] [CrossRef]
  8. Kim, D.-C.; Yeo, S.-D. Modification of Indomethacin Crystals Using Supercritical and Aqueous Antisolvent Crystallizations. J. Supercrit. Fluids 2016, 108, 96–103. [Google Scholar] [CrossRef]
  9. Wilkinson, M.R.; Martinez-Hernandez, U.; Huggon, L.K.; Wilson, C.C.; Dominguez, B.C. Predicting Pharmaceutical Crystal Morphology Using Artificial Intelligence. CrystEngComm 2022, 24, 7545–7553. [Google Scholar] [CrossRef]
  10. Sui, S.; Mulichak, A.; Kulathila, R.; McGee, J.; Filiatreault, D.; Saha, S.; Cohen, A.; Song, J.; Hung, H.; Selway, J.; et al. A Capillary-Based Microfluidic Device Enables Primary High-Throughput Room-Temperature Crystallographic Screening. J. Appl. Crystallogr. 2021, 54, 1034–1046. [Google Scholar] [CrossRef]
  11. Su, Z.; He, J.; Zhou, P.; Huang, L.; Zhou, J. A High-Throughput System Combining Microfluidic Hydrogel Droplets with Deep Learning for Screening the Antisolvent-Crystallization Conditions of Active Pharmaceutical Ingredients. Lab Chip 2020, 20, 1907–1916. [Google Scholar] [CrossRef]
  12. Srikanth, S.; Dubey, S.K.; Javed, A.; Goel, S. Droplet Based Microfluidics Integrated with Machine Learning. Sens. Actuators-Phys. 2021, 332, 113096. [Google Scholar] [CrossRef]
  13. Sun, M.; Fang, Q. High-Throughput Sample Introduction for Droplet-Based Screening with an on-Chip Integrated Sampling Probe and Slotted-Vial Array. Lab Chip 2010, 10, 2864. [Google Scholar] [CrossRef]
  14. Yadavali, S.; Jeong, H.-H.; Lee, D.; Issadore, D. Silicon and Glass Very Large Scale Microfluidic Droplet Integration for Terascale Generation of Polymer Microparticles. Nat. Commun. 2018, 9, 1222. [Google Scholar] [CrossRef] [PubMed]
  15. Fortt, R.; Tona, R.; Martin-Soladana, P.M.; Ward, G.; Lai, D.; Durrant, J.; Douillet, N. Extractive Crystallization of Cabotegravir in Droplet-Based Microfluidic Devices. J. Cryst. Growth 2020, 552, 125908. [Google Scholar] [CrossRef]
  16. Steinwandter, V.; Borchert, D.; Herwig, C. Data Science Tools and Applications on the Way to Pharma 4.0. Drug Discov. Today 2019, 24, 1795–1805. [Google Scholar]
  17. Neun, S.; van Vliet, L.; Hollfelder, F.; Gielen, F. High-Throughput Steady-State Enzyme Kinetics Measured in a Parallel Droplet Generation and Absorbance Detection Platform. Anal. Chem. 2022, 94, 16701–16710. [Google Scholar] [CrossRef] [PubMed]
  18. Broecker, J.; Morizumi, T.; Ou, W.-L.; Klingel, V.; Kuo, A.; Kissick, D.J.; Ishchenko, A.; Lee, M.-Y.; Xu, S.; Makarov, O.; et al. High-Throughput in Situ X-Ray Screening of and Data Collection from Protein Crystals at Room Temperature and under Cryogenic Conditions. Nat. Protoc. 2018, 13, 260–292. [Google Scholar] [CrossRef]
  19. Wilkinson, M.R.; Castro-Dominguez, B.; Wilson, C.C.; Martinez-Hernandez, U. Low-Cost, Autonomous Microscopy Using Deep Learning and Robotics: A Crystal Morphology Case Study. Eng. Appl. Artif. Intell. 2023, 126, 106985. [Google Scholar] [CrossRef]
  20. Wu, Y.; Gao, Z.; Rohani, S. Deep Learning-Based Oriented Object Detection for in Situ Image Monitoring and Analysis: A Process Analytical Technology (PAT) Application for Taurine Crystallization. Chem. Eng. Res. Des. 2021, 170, 444–455. [Google Scholar] [CrossRef]
  21. Chayatummagoon, S.; Chongstitvatana, P. Image Classification of Sugar Crystal with Deep Learning. In Proceedings of the 2021 13th International Conference on Knowledge and Smart Technology (KST-2021), Chonburi, Thailand, 21–24 January 2021; IEEE: New York, NY, USA, 2021; pp. 118–122. [Google Scholar]
  22. Alarfaj, A.A.; Hosni Mahmoud, H.A. Feature Fusion Deep Learning Model for Defects Prediction in Crystal Structures. Crystals 2022, 12, 1324. [Google Scholar] [CrossRef]
  23. Yann, M.; Tang, Y. Learning Deep Convolutional Neural Networks for X-ray Protein Crystallization Image Analysis. Proc. AAAI Conf. Artif. Intell. 2016, 30, 1373–1379. [Google Scholar] [CrossRef]
  24. Manee, V.; Zhu, W.; Romagnoli, J.A. A Deep Learning Image-Based Sensor for Real-Time Crystal Size Distribution Characterization. Ind. Eng. Chem. Res. 2019, 58, 23175–23186. [Google Scholar] [CrossRef]
  25. Yu, P.; Guo, Z.; Wang, T.; Wang, J.; Guo, Y.; Zhang, L. Insights into the Mechanisms of Natural Organic Matter on the Photodegradation of Indomethacin under Natural Sunlight and Simulated Light Irradiation. Water Res. 2023, 244, 120539. [Google Scholar] [CrossRef]
  26. Bongioanni, A.; Bueno, M.S.; Mezzano, B.A.; Longhi, M.R.; Garnero, C.; Bongioanni, A.; Bueno, M.S.; Mezzano, B.A.; Longhi, M.R.; Garnero, C. Pharmaceutical Crystals: Development, Optimization, Characterization and Biopharmaceutical Aspects. In Crystal Growth and Chirality—Technologies and Applications; IntechOpen: Rijeka, Croatia, 2022; ISBN 978-1-80355-058-9. [Google Scholar]
  27. Liao, H.; Huang, W.; Zhou, L.; Gao, Z.; Yin, Q. Gelation Mechanism and Oscillatory Temperature Control Strategy in Perindopril Erbumine Solution Crystallization. Cryst. Growth Des. 2023, 23, 1805–1812. [Google Scholar] [CrossRef]
  28. Matsumoto, M.; Ohno, M.; Wada, Y.; Sato, T.; Okada, M.; Hiaki, T. Enhanced Production of α-Form Indomethacin Using the Antisolvent Crystallization Method Assisted by N2 Fine Bubbles. J. Cryst. Growth 2017, 469, 91–96. [Google Scholar] [CrossRef]
  29. Jarmer, D.J.; Lengsfeld, C.S.; Anseth, K.S.; Randolph, T.W. Supercritical Fluid Crystallization of Griseofulvin: Crystal Habit Modification with a Selective Growth Inhibitor. J. Pharm. Sci. 2005, 94, 2688–2702. [Google Scholar] [CrossRef]
  30. Okumura, T.; Ishida, M.; Takayama, K.; Otsuka, M. Polymorphic Transformation of Indomethacin Under High Pressures*. J. Pharm. Sci. 2006, 95, 689–700. [Google Scholar] [CrossRef]
  31. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  32. Wang, K.; Liu, Z. BA-YOLO for Object Detection in Satellite Remote Sensing Images. Appl. Sci. 2023, 13, 13122. [Google Scholar] [CrossRef]
  33. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar]
  34. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 886–893. [Google Scholar]
  35. Feng, S.; Zhou, H.; Dong, H. Using Deep Neural Network with Small Dataset to Predict Material Defects. Mater. Des. 2019, 162, 300–310. [Google Scholar] [CrossRef]
  36. Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868. [Google Scholar] [CrossRef]
  37. Wang, J.; Zhang, T.; Cheng, Y.; Al-Nabhan, N. Deep Learning for Object Detection: A Survey. Comput. Syst. Sci. Eng. 2021, 38, 165–182. [Google Scholar] [CrossRef]
  38. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef] [PubMed]
  39. Girshick, R. Fast R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  40. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2016, arXiv:1506.01497. [Google Scholar] [CrossRef]
  41. Gao, Z.; Wu, Y.; Bao, Y.; Gong, J.; Wang, J.; Rohani, S. Image Analysis for In-Line Measurement of Multidimensional Size, Shape, and Polymorphic Transformation of l-Glutamic Acid Using Deep Learning-Based Image Segmentation and Classification. Cryst. Growth Des. 2018, 18, 4275–4281. [Google Scholar] [CrossRef]
  42. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3. [Google Scholar]
  43. Jiang, Z.; Liu, T.; Huo, Y.; Fan, J. Image Analysis of Crystal Size Distribution and Agglomeration for β Form L-Glutamic Acid Crystallization Based on YOLOv4 Deep Learning. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 3017–3022. [Google Scholar]
  44. Fan, R.; Chen, H.; Chen, S.; Wang, S. Scintillation Crystal Growth Quality Evaluation Based on Machine Learning. IEEE Access 2023, 11, 85191–85201. [Google Scholar] [CrossRef]
  45. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  46. Wang, J.; Dai, H.; Chen, T.; Liu, H.; Zhang, X.; Zhong, Q.; Lu, R. Toward Surface Defect Detection in Electronics Manufacturing by an Accurate and Lightweight YOLO-Style Object Detector. Sci. Rep. 2023, 13, 7062. [Google Scholar] [CrossRef]
  47. Wang, Z.; Lei, L.; Shi, P. Smoking Behavior Detection Algorithm Based on YOLOv8-MNC. Front. Comput. Neurosci. 2023, 17, 1243779. [Google Scholar] [CrossRef]
  48. Li, N.; Ye, T.; Zhou, Z.; Gao, C.; Zhang, P. Enhanced YOLOv8 with BiFPN-SimAM for Precise Defect Detection in Miniature Capacitors. Appl. Sci. 2024, 14, 429. [Google Scholar] [CrossRef]
  49. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  50. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13713–13722. [Google Scholar]
  51. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012. [Google Scholar]
  52. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
  53. Kaur, P.; Khehra, B.S.; Mavi, E.B.S. Data Augmentation for Object Detection: A Review. In Proceedings of the 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), East Lansing, MI, USA, 8–11 August 2021; pp. 537–543. [Google Scholar]
  54. Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit. 2023, 137, 109347. [Google Scholar] [CrossRef]
  55. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding Yolo Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  56. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  57. Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. YOLO-FaceV2: A Scale and Occlusion Aware Face Detector. Pattern Recognit. 2022, 155, 110714. [Google Scholar] [CrossRef]
  58. Pan, X.; Ge, C.; Lu, R.; Song, S.; Chen, G.; Huang, Z.; Huang, G. On the Integration of Self-Attention and Convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 815–825. [Google Scholar]
Figure 1. YOLOv8 network structure.
Figure 2. YOLO-PBESW network structure. The improvements comprise an additional 160 × 160 detection layer with the EMA mechanism, the integration of BiFPN between the backbone and neck, the replacement of the original SPPF with SimSPPF, and the substitution of the loss function with WIoU.
Figure 3. Structure with the high-resolution detection layer P2. Red lines indicate the newly added P2 layer.
Figure 4. Feature network designs: (a) an FPN; (b) a PANet, which adds a bottom-up pathway on top of the FPN; and (c) a BiFPN, which reduces the number of nodes while achieving cross-scale feature fusion.
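The BiFPN in (c) combines its multi-scale inputs with the fast normalized weighted fusion introduced by EfficientDet. A minimal sketch in plain Python — the learnable per-edge weights are passed in as ordinary numbers here purely for illustration; in the network they are trainable scalars:

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fast normalized weighted fusion at a BiFPN node:
    O = sum_i(w_i * I_i) / (sum_j w_j + eps), with w_i clamped >= 0.

    `features` is a list of equal-length feature vectors; `weights` holds
    one scalar per input edge.
    """
    w = [max(0.0, wi) for wi in weights]   # ReLU keeps weights non-negative
    norm = sum(w) + eps                    # eps avoids division by zero
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / norm
        for k in range(len(features[0]))
    ]
```

With equal weights, the fusion reduces (up to eps) to a plain average of the input features; a negative weight is clamped to zero and its input is ignored.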
Figure 5. Structure of EMA.
Figure 6. Comparison of the SPPF and SimSPPF modules: (a) SPPF, where CBS denotes a sequence of convolution, batch normalization, and SiLU activation; (b) SimSPPF, where CBR denotes a sequence of convolution, batch normalization, and ReLU activation.
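The only functional difference between CBS and CBR is the activation. A small sketch of the two functions, which illustrates why ReLU is the cheaper choice (no exponential to evaluate):

```python
import math

def silu(x):
    """SiLU (Swish): x * sigmoid(x), the activation in the CBS blocks of SPPF."""
    return x / (1.0 + math.exp(-x))

def relu(x):
    """ReLU: max(0, x), the activation in the CBR blocks of SimSPPF."""
    return x if x > 0 else 0.0
```

Both are zero at the origin and near-identity for large positive inputs; SiLU is smooth but requires an exponential per element, which is the computational cost SimSPPF avoids.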
Figure 7. Micrographs depicting three representative indomethacin crystal morphologies within hydrogel droplets: (a) crystal wire, (c) crystal sheet, and (d) crystal rod, alongside (b) the amorphous jelly-like phase (JLP). The scale bar at the bottom right corner of each image represents 400 μm.
Figure 8. Annotation performed using LabelImg: (a) crystal wire, (b) jelly-like phase, (c) crystal sheet, and (d) crystal rod.
Figure 9. Model-free augmentation.
Figure 10. Basic information of the training set: (a) counts for each detection category; (b) dimensions of each target box, including the total number of boxes and their range of variation; (c) relative position of each target's center point within the image; and (d) aspect ratio of target height to width relative to the entire dataset.
Figure 11. Confusion matrix.
Figure 12. (a) Confusion matrix of YOLOv8n and (b) confusion matrix of YOLO-PBESW. In each matrix, the rows represent the actual classes, while the columns represent the predicted classes. The diagonal elements indicate the correct classifications, with darker shades representing higher accuracy.
Figure 13. P–R curves: (a) depicts the P–R curve of YOLOv8n and (b) depicts the P–R curve of YOLO-PBESW.
Figure 14. Comparison of detection results: (a) YOLOv8 and (b) YOLO-PBESW.
Figure 15. (a) Model with AcMix; (b) model with CA; (c) model with CBAM; (d) model with SEAM; and (e) model with EMA.
Figure 16. Confusion matrices of different improved models: (a) FFCA-YOLO, (b) L-FFCA-YOLO, and (c) YOLO-PBESW. In each matrix, the rows represent the actual classes, while the columns represent the predicted classes. The diagonal elements indicate the correct classifications, with darker shades representing higher accuracy.
Figure 17. (a) FFCA-YOLO; (b) L-FFCA-YOLO; and (c) YOLO-PBESW.
Table 1. Experiments’ environment configuration.
Experimental Component | Version
Operating System | Windows 10
CPU | Intel Xeon Silver 4210R
GPU | RTX 5000
CUDA version | 11.8
Python version | 3.8
PyTorch version | 2.0.1
Table 2. Training setting.
Parameter | Value
Epochs | 300
Batch size | 12
Optimizer | SGD
Initial learning rate (lr0) | 0.01
Momentum | 0.937
Weight decay | 0.0005
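If the model is trained with the Ultralytics toolchain, the settings in Table 2 map directly onto the trainer's keyword arguments. The config fragment below is an illustrative sketch, not the authors' script: the argument names follow the Ultralytics convention, and any dataset path would need to be supplied before calling something like `model.train(**TRAIN_CFG)`:

```python
# Hyperparameters from Table 2, expressed as keyword arguments in the
# Ultralytics naming convention (illustrative; dataset path not included).
TRAIN_CFG = {
    "epochs": 300,          # training epochs
    "batch": 12,            # batch size
    "optimizer": "SGD",     # gradient-based optimizer
    "lr0": 0.01,            # initial learning rate
    "momentum": 0.937,      # SGD momentum
    "weight_decay": 0.0005, # L2 regularization
}
```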
Table 3. Ablation experiments 1. The ‘√’ symbol indicates that the corresponding technique was utilized in the model’s construction for detection.
YOLOv8 | P2 | BiFPN | EMA | SimSPPF | WIoU | W (%) | R (%) | S (%) | Jlp (%) | Precision (%) | Recall (%) | F1 (%) | mAP@0.5 (%)
√ |   |   |   |   |   | 92.5 | 74.2 | 72.2 | 99.5 | 83.4 | 80.6 | 81.9 | 84.6
√ | √ |   |   |   |   | 93.4 | 76.2 | 74.1 | 99.5 | 84.7 | 82.2 | 83.4 | 85.8
√ | √ | √ |   |   |   | 92.3 | 75.7 | 76.3 | 99.5 | 84.5 | 80.5 | 82.4 | 86.0
√ | √ | √ | √ |   |   | 88.7 | 77.6 | 77.7 | 99.5 | 83.4 | 81.4 | 82.4 | 85.9
√ | √ | √ | √ | √ |   | 90.1 | 78.5 | 78.0 | 99.5 | 83.0 | 81.0 | 81.9 | 86.5
√ | √ | √ | √ | √ | √ | 93.3 | 77.6 | 80.2 | 99.5 | 85.2 | 83.8 | 84.5 | 87.6
Table 4. Ablation Experiments 2. The ‘√’ symbol indicates that the corresponding technique was utilized in the model’s construction for detection.
YOLOv8n | P2 | BiFPN | EMA | SimSPPF | WIoU | FLOPs (G) | Parameters (M) | Size (MB) | FPS
√ |   |   |   |   |   | 8.1 | 3 | 6.1 | 119
√ | √ |   |   |   |   | 12.2 | 2.92 | 5.96 | 96.15
√ | √ | √ |   |   |   | 11.9 | 2.78 | 5.46 | 88.5
√ | √ | √ | √ |   |   | 11.8 | 2.67 | 5.69 | 76.9
√ | √ | √ | √ | √ |   | 11.8 | 2.66 | 5.46 | 79.3
√ | √ | √ | √ | √ | √ | 11.8 | 2.65 | 5.46 | 77.52
Table 5. Comparison of complexity.
Model | FLOPs (G) | Parameters (M) | Size (MB) | FPS
Faster-RCNN | 370.2 | 137 | 108 | 7.46
YOLOv3-tiny | 18.9 | 12.1 | 23.2 | 166.7
YOLOv5n | 4.1 | 1.76 | 3.78 | 111.1
YOLOv7-tiny | 13.0 | 6.01 | 12 | 96.15
YOLOv8n | 8.1 | 3 | 6.1 | 119
L-FFCA-YOLO | 3.8 | 0.47 | 1.85 | 62.89
YOLO-PBESW | 11.8 | 2.65 | 5.46 | 77.52
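The FPS column is simply the reciprocal of the per-image inference time. A one-line sketch of that conversion, checked against the paper's figures (the abstract reports 12.89 ms per image, which works out to roughly 77.6 FPS, in line with the reported 77.52 FPS; the small gap presumably reflects measurement overhead):

```python
def fps_from_latency(ms_per_image):
    """Frames per second from a per-image inference time in milliseconds."""
    return 1000.0 / ms_per_image
```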
Table 6. Comparison of performance.
Model | Crystal Wire AP | Crystal Rod AP | Crystal Sheet AP | Jelly-like Phase AP | mAP
Faster-RCNN | 93% | 23% | 25% | 100% | 68.35%
YOLOv3-tiny | 87.3% | 66% | 64.5% | 99.5% | 84.6%
YOLOv5n | 87.2% | 69.5% | 69.3% | 99.5% | 81.4%
YOLOv7-tiny | 89.4% | 71.4% | 72.9% | 99.7% | 83.4%
YOLOv8n | 92.5% | 74.2% | 72.2% | 99.5% | 84.6%
L-FFCA-YOLO | 88.8% | 71% | 70.4% | 99.1% | 82.3%
YOLO-PBESW | 93.3% | 77.6% | 80.2% | 99.5% | 87.6%
Table 7. Comparison of different attention mechanisms.
Model | Crystal Wire AP | Crystal Rod AP | Crystal Sheet AP | Jelly-like Phase AP | Model Size | FPS
YOLO-PBSW | 89.3% | 77.2% | 80.6% | 99.5% | 5.46 MB | 78.13
+CA | 91.7% | 78.1% | 79.5% | 99.5% | 5.46 MB | 79.36
+CBAM | 89.9% | 76.3% | 78.4% | 99.5% | 5.46 MB | 72.46
+SEAM | 93.8% | 77.9% | 78.8% | 99.5% | 5.47 MB | 75.18
+AcMix | 89% | 77.5% | 77.7% | 99.5% | 5.48 MB | 57.80
+EMA | 93.3% | 77.6% | 80.2% | 99.5% | 5.46 MB | 77.52
Table 8. Comparison of different loss functions.
Loss Function (on YOLO-PBES) | Precision (%) | Recall (%) | mAP@0.5 (%) | FPS
CIoU | 83 | 81 | 86.5 | 62.89
MPDIoU | 82 | 82.2 | 86 | 73.53
EIoU | 83.9 | 82.1 | 86.1 | 74.63
ShapeIoU | 84.4 | 80.7 | 85.3 | 76.34
WIoUv3 | 85.2 | 83.8 | 87.6 | 77.52
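All the loss variants in Table 8 build on the same plain IoU overlap term and differ only in the penalty or focusing weight applied around it. A minimal sketch of that shared base term, with boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes (x1, y1, x2, y2): the base term
    that CIoU, EIoU, ShapeIoU, and WIoU all extend in different ways."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-overlap 2×2 boxes offset by (1, 1) give IoU = 1/7; the variants then reshape the gradient this term produces, which is what drives the precision and FPS differences in the table.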
Table 9. Comparison of different improved models.
Model | FLOPs (G) | Parameters (M) | Size (MB) | FPS
FFCA-YOLO | 51.2 | 7.1 | 14.5 | 58.13
L-FFCA-YOLO | 37.1 | 5.04 | 10.6 | 50.25
YOLO-PBESW | 11.8 | 2.65 | 5.46 | 77.52
Table 10. Comparison of improved models’ performance 1.
Model | Crystal Wire AP | Crystal Rod AP | Crystal Sheet AP | Jelly-like Phase AP
FFCA-YOLO | 89.6% | 73% | 74.8% | 99.5%
L-FFCA-YOLO | 89.4% | 73.7% | 72.8% | 99.3%
YOLO-PBESW | 93.3% | 77.6% | 80.2% | 99.5%
Table 11. Comparison of improved models’ performance 2.
Model | Precision (%) | Recall (%) | F1-Score (%) | mAP@0.5 (%)
FFCA-YOLO | 85.6 | 80.7 | 83 | 84.2
L-FFCA-YOLO | 86 | 81.4 | 83.6 | 83.8
YOLO-PBESW | 85.2 | 83.8 | 84.5 | 87.6
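The summary metrics in Tables 6 and 11 follow the standard definitions: F1 is the harmonic mean of precision and recall, and mAP is the arithmetic mean of the per-class APs. A quick arithmetic check against the YOLO-PBESW rows:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in %)."""
    return 2 * precision * recall / (precision + recall)

def mean_ap(aps):
    """mAP as the arithmetic mean of per-class AP values."""
    return sum(aps) / len(aps)
```

With the reported values, f1_score(85.2, 83.8) ≈ 84.5 and mean_ap([93.3, 77.6, 80.2, 99.5]) ≈ 87.65, consistent with the F1 of 84.5% and mAP of 87.6% given for YOLO-PBESW (to rounding).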
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Wei, J.; Liang, J.; Song, J.; Zhou, P. YOLO-PBESW: A Lightweight Deep Learning Model for the Efficient Identification of Indomethacin Crystal Morphologies in Microfluidic Droplets. Micromachines 2024, 15, 1136. https://doi.org/10.3390/mi15091136