Article

An Accurate Urine Red Blood Cell Detection Method Based on Multi-Focus Video Fusion and Deep Learning with Application to Diabetic Nephropathy Diagnosis

College of Data Science, Taiyuan University of Technology, Taiyuan 030024, China
*
Author to whom correspondence should be addressed.
Electronics 2022, 11(24), 4176; https://doi.org/10.3390/electronics11244176
Submission received: 3 November 2022 / Revised: 11 December 2022 / Accepted: 12 December 2022 / Published: 14 December 2022

Abstract

Background and Objective: Detecting urine red blood cells (U-RBCs) is an important operation in diagnosing nephropathy. Existing U-RBC detection methods usually employ single-focus images to implement such tasks, which inevitably results in false positives and missed detections due to the abundance of defocused U-RBCs in the single-focus images. Meanwhile, the current diabetic nephropathy diagnosis methods heavily rely on artificially setting a threshold on the detected U-RBC proportion, whose accuracy and robustness still need improvement. Methods: To overcome these limitations, a novel multi-focus video dataset in which the typical shape of every U-RBC can be captured in one frame is constructed, and an accurate U-RBC detection method based on multi-focus video fusion (D-MVF) is presented. The proposed D-MVF method consists of a multi-focus video fusion stage and a detection stage. In the fusion stage, D-MVF first uses the frame-difference data of the multi-focus video to separate the U-RBCs from the background. Then, a new key frame extraction method based on the three metrics of information entropy, edge gradient, and intensity contrast is proposed. This method is responsible for extracting the typical shapes of U-RBCs and fusing them into a single image. In the detection stage, D-MVF utilizes the high-performance deep learning model YOLOv4 to rapidly and accurately detect U-RBCs based on the fused image. In addition, based on the U-RBC detection results from D-MVF, this paper applies the K-nearest neighbor (KNN) method to replace artificial threshold setting for achieving a more accurate diabetic nephropathy diagnosis. Results: A series of controlled experiments are conducted on the self-constructed dataset containing 887 multi-focus videos, and the experimental results show that the proposed D-MVF obtains a satisfactory mean average precision (mAP) of 0.915, which is significantly higher than that of the existing method based on single-focus images (0.700). Meanwhile, the diabetic nephropathy diagnosis accuracy and specificity of KNN reach 0.781 and 0.793, respectively, which significantly exceed the traditional threshold method (0.719 and 0.759). Conclusions: The research in this paper intelligently assists microscopists in completing U-RBC detection and diabetic nephropathy diagnosis. Therefore, the workload of microscopists can be effectively relieved, and the urine test demands of nephrotic patients can be met.

1. Introduction

The urine of nephrotic patients usually contains numerous normal and abnormal red blood cells (RBCs) [1]. According to morphological characteristics, the abnormal urine red blood cells (U-RBCs) can be subdivided into three categories: heterocyst, orbicular, and shadow. Detecting U-RBCs with the support of an optical microscope is an important operation for the diagnosis of nephropathy [2,3,4,5]. By establishing an artificial threshold, researchers were able to accurately identify glomerulonephritis and even make a preliminary diagnosis of IgA and diabetic nephropathy based on the proportion of detected U-RBCs. Currently, U-RBC detection is mainly completed by microscopists. However, manual U-RBC detection makes it difficult to satisfy the urine test demand, especially with the increasing number of nephrotic patients. Meanwhile, long hours and high-intensity labor can not only decrease the U-RBC detection accuracy and efficiency of microscopists but also harm their eyesight and spines.
Benefiting from the rapid development of deep learning technology, a series of U-RBC detection methods have been proposed [6,7,8,9,10,11,12]. The U-RBC images obtained at a single focus of the microscope are used as detection data in all these methods. However, since U-RBCs are often distributed at different depths in the urine sample under the microscope, there are abundant U-RBCs with defocusing phenomena in the single-focus images. That is, only the U-RBCs at a certain depth can be clearly observed when the corresponding focus is set, and the shapes of U-RBCs at other depths may be blurred or obviously deformed. Therefore, the single-focus images severely limit the further improvement of U-RBC detection accuracy.
It is worth noting that the microscopists are accustomed to continuously adjusting the microscope’s focus when observing U-RBCs. Inspired by this habit, the author and his research team [13] proposed a new data type named “multi-focus videos of U-RBCs”, which contain the typical shapes of all U-RBCs no matter which depths they are distributed at in the urine sample. The experimental results prove that replacing single-focus images with multi-focus videos can effectively reduce the occurrence of misclassification. However, the current methods can only complete the classification task and cannot yet implement U-RBC detection on multi-focus videos.
In this paper, an accurate U-RBC detection method based on multi-focus video fusion (D-MVF) is proposed. The proposed D-MVF consists of two stages: multi-focus video fusion and U-RBC detection. In the fusion stage, referring to mature multi-focus image fusion technology [14,15,16,17], the U-RBC objects are first separated from the background using the frame-difference data of the multi-focus video. Then, the typical shapes of all separated U-RBCs are extracted by computing three image metrics: information entropy, edge gradient, and intensity contrast. Finally, the extracted typical shapes of U-RBCs are fused into the separated background to form a single image in which all U-RBCs have a clear morphological shape. In the detection stage, a high-accuracy and rapid target detection deep learning model, You Only Look Once version 4 (YOLOv4) [18,19,20], is employed to detect and classify the various U-RBCs in the fused image. A total of 887 U-RBC multi-focus videos are collected as the experimental dataset. In addition, this paper uses the K-nearest neighbor (KNN) model to replace the traditional artificial threshold for achieving a more accurate diagnosis of diabetic nephropathy (DN). A series of controlled experimental results verifies that the accuracy of the proposed D-MVF model is clearly higher than that of the existing methods relying on single-focus images. The accuracy and specificity of DN diagnosis based on D-MVF and KNN basically reach the level of professional microscopists. There is no doubt that the research in this paper has the potential to assist microscopists in detecting U-RBCs and diagnosing nephropathy.
The remainder of this paper is structured as follows: Section 2 introduces this research’s related works. Section 3 contains the procedure and mathematical principle of the proposed D-MVF, as well as the complete process of diabetic nephropathy diagnosis based on the detected U-RBC proportion. Section 4 gives the experimental designs and results in detail. Section 5 contains the paper’s conclusion.

2. Related Work

With the support of accurate and robust deep learning algorithms, researchers have designed a series of U-RBC detection methods. Initially, Fernandez et al. demonstrated the feasibility of using an artificial neural network (ANN) to detect U-RBCs and created a sophisticated urine test system [6]. Then, Pan et al. [7] and Ji et al. [8] introduced high-performance convolutional neural networks (CNNs) into the field of U-RBC detection and effectively improved the detection accuracy for several U-RBC categories. Building on CNNs, Li et al. [9] modified the representative LeNet-5 neural network by changing the numbers of output nodes and convolutional layers, and the experimental results proved that this modification is effective. Subsequently, Suhail et al. [10] compared the U-RBC detection performance of different CNNs, such as LeNet-5, R-CNN, SSD, and their variants. Simultaneously, Zhao et al. [11] implemented U-RBC detection using a Siamese network with a dual-mode contrastive loss function and emphasized the importance of adjusting the loss function for improving the detection accuracy. In previous work, the authors' team attempted to use Faster R-CNN to detect various types of U-RBCs and obtained impressive detection and recall rates [12]. However, there are still many false and missed detections in the results of the above methods because the single-focus images used by these methods contain numerous blurred and deformed U-RBCs caused by the defocusing phenomenon. In the authors' previous work [13], a new data type referred to as "multi-focus video" was put forward, which efficiently overcomes the defocusing phenomenon of single-focus images, and it was proven that multi-focus videos have an advantage over single-focus images in improving U-RBC detection accuracy.
Generally, multi-focus videos can be directly processed by deep learning-based video processing algorithms for object detection. However, directly dealing with a multi-focus video may result in low accuracy for U-RBC detection because U-RBCs only have clear morphological shapes in one frame of the video. An alternative is to fuse the multi-focus video into a single image that contains clear and typical morphological shapes for all the U-RBC targets. The existing multi-focus data fusion methods can be divided into two directions: space fusion and frequency fusion. Most space fusion strategies include two steps: segmentation and typical shape extraction. The researchers first segment the foreground objects and then extract their typical shapes using classical image feature representations, such as Laplacian pyramids [15,17]. In recent years, owing to the powerful image feature extraction ability of CNNs, they have replaced the classical feature representation methods for fusing multi-focus data and achieved inspiring fusion results [14]. The main strategy of frequency fusion methods is to extract and fuse the high-frequency components of multi-focus data to obtain the clear shapes of foreground objects. In general, image transform algorithms, such as the wavelet transform (WT) and the non-subsampled shearlet transform (NSST) [16], are used to acquire the frequency features of multi-focus data. Although object detection on still images is very mature, video object detection is still developing, because deformation, occlusion, motion blur, and other factors can make an object undetectable in intermediate frames where its appearance changes greatly; object detection from video (VID) became a recognized challenge task in 2015. One line of work incorporates tracking into detection: by learning the similarity between features of different frames, object tracks are used to obtain the displacement of a target between frames [21]. Other researchers realize object detection on video by mining the relationships between objects in the current sequence and in other sequences [22].
However, the existing methods cannot be directly applied to fuse the multi-focus videos of U-RBCs. The current space fusion methods are designed for fusing two or three multi-focus images, so they lack the capability to fuse a multi-focus video with several hundred frames. Meanwhile, since the clear and typical shapes of U-RBCs are not consistent, the current frequency fusion methods cannot extract the typical shapes of U-RBCs. Video-based detection networks are also unsuitable for such multi-focus videos because, in the out-of-focus state, one typical form can appear as another typical form, which degrades the classification performance of video-based target detection networks. In summary, it is necessary to design a novel algorithm that can acquire the typical shapes of U-RBCs and simultaneously fuse massive numbers of frames from the multi-focus video.

3. Methodology for the Proposed D-MVF

The designed U-RBC detection method based on multi-focus video fusion (D-MVF) is presented in this section. The overall framework of D-MVF is shown in Figure 1. It consists of two phases: the multi-focus video fusion phase and the U-RBC detection phase. It should be emphasized that U-RBC detection on multi-focus videos cannot be realized by simply integrating the detection results of the same target on different frames, because a U-RBC can only be correctly detected on its key frame. If the detection results of the same target on different frames were integrated, the erroneous detection results on the other frames would evidently reduce the U-RBC detection accuracy. Therefore, the "fusion first and then detection" strategy in the D-MVF model is more suitable for U-RBC detection. The mathematical principles of multi-focus video fusion and the structure of the U-RBC detection model are detailed in the following subsections. In addition, this section also discusses the feasibility of accurate diabetic nephropathy diagnosis based on the proportion of detected U-RBCs.

3.1. Multi-Focus Video Fusion

3.1.1. Necessity of Utilizing U-RBC Multi-Focus Videos

The U-RBCs can usually be classified into four typical shapes, which are shown in Figure 2. The typical shapes of U-RBCs can only be observed under the appropriate focus of a microscope. Because U-RBCs are distributed at different depths in urine samples, there are numerous U-RBCs with defocusing phenomena at any single focus under the microscope. On the one hand, the defocusing phenomenon makes the shapes of U-RBCs blurry, as shown in Figure 3a. On the other hand, the defocusing phenomenon can deform the shapes of U-RBCs, as shown in Figure 3b. Obviously, the blurred and deformed U-RBCs lead to frequent false and missed detections. Multi-focus videos can capture the typical shape of each U-RBC in one frame as the microscope focus is tuned, and it has been proven that using multi-focus videos as training data is an effective way to overcome the defocusing phenomenon of single-focus images [13].

3.1.2. Principles of Multi-Focus Video Fusion

As mentioned in Section 2, the prerequisite for detecting U-RBCs on a multi-focus video is to fuse it into a single image containing the clear and typical shapes of all U-RBC objects. Indeed, there is still no general method that can fuse a multi-focus video with hundreds of frames into one single image. Inspired by multi-focus image space fusion methods, the fusion of a multi-focus video can be divided into two steps: segmentation and typical shape extraction. In the first step, the U-RBCs are segmented from the background based on the frame-difference data of the multi-focus video. In the second step, the typical U-RBC shapes are extracted based on the three metrics of information entropy, edge gradient, and intensity contrast. Finally, the extracted typical shapes of U-RBCs are fused into the separated background, completing the fusion of the multi-focus video. The principles of multi-focus video fusion are shown in the green area of Figure 1.
Firstly, the U-RBCs are separated from the frame background. Since the size of the visual field and the position of the U-RBC are not changed when the multi-focus video is collected, it has a special property: only the shapes of foreground objects (urine sediments) represented by U-RBCs can change, while the brightness and range of the background (urine supernatant) remain basically unchanged. This property means that in the frame difference data of multi-focus video, only the pixels located on the foreground objects can have large values, while the values of the pixels located on the background may be close to 0. Thus, by accumulating the frame difference data of multi-focus video, the heatmap of foreground objects can be obtained, and its calculation formula is as follows:
M = \sum_{i=1}^{N-1} \left| F_{i+1} - F_i \right|
where M is the heatmap, N is the number of frames in the multi-focus video, and Fi is the i-th frame of the multi-focus video. The mask obtained by OTSU binarization of the heatmap can roughly mark the positions and outlines of the foreground objects. By applying element-wise multiplication with the mask, the foreground objects and background in each frame of the multi-focus video can be separated from each other. Considering that the foreground objects in a multi-focus video only change their shapes rather than their positions, the foreground objects at the same position in different frames can be spliced into a series of sub-videos. In addition, the backgrounds of different frames are also spliced into a sub-video. When a U-RBC exists in a sub-video, there must be a key frame in this sub-video containing the typical shape of the U-RBC; a minimal code sketch of this segmentation step is given below, and the key frame extraction method proposed in this paper is introduced afterwards.
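To make the segmentation step concrete, the following minimal Python sketch (an illustration under assumed helper names, not the authors' released code) accumulates the frame-difference heatmap, binarizes it with OTSU's method, and crops each connected foreground region out of every frame to form the sub-videos.

```python
import cv2
import numpy as np

def foreground_mask(frames):
    """frames: list of grayscale frames (uint8) from one multi-focus video."""
    # Accumulate absolute frame differences: M = sum_{i=1}^{N-1} |F_{i+1} - F_i|
    heat = np.zeros(frames[0].shape, dtype=np.float64)
    for prev, nxt in zip(frames[:-1], frames[1:]):
        heat += np.abs(nxt.astype(np.float64) - prev.astype(np.float64))

    # Normalize the heatmap to 8 bits and binarize it with OTSU's method
    heat8 = cv2.normalize(heat, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(heat8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

def split_sub_videos(frames, mask):
    """Crop every connected foreground region out of all frames -> sub-videos."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    sub_videos = []
    for k in range(1, num):  # label 0 is the background
        x, y, w, h, _ = stats[k]
        sub_videos.append([f[y:y + h, x:x + w] for f in frames])
    return sub_videos
```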
Secondly, the key frame containing the typical shape of the U-RBC is extracted from each U-RBC sub-video. As shown in Figure 4, the typical shapes of U-RBCs mainly exhibit three characteristics. (1) The color inside the U-RBC should be consistent and uniform. (2) The outline of the U-RBC should be clear. (3) The contrast between the U-RBC and the background should be high. In the field of digital image processing, information entropy, edge gradient, and standard deviation are commonly used to measure the color uniformity, contour clarity, and contrast of an image. Among them, information entropy is inversely proportional to color uniformity, while edge gradient and standard deviation are proportional to clarity and contrast, respectively. Therefore, an evaluation index based on these three characteristics (EI-TC) can be constructed to determine whether the shape of a U-RBC is typical. The evaluation index K of the typical shape is defined as follows:
K = \frac{E \cdot S}{H}
with
H = -\sum_{i=0}^{255} p_i \log_2 p_i
E = \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ (f \ast L)(x, y) \right]^2
S = \sqrt{\frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left( f(x, y) - \mu \right)^2}{MN}}
where H, E, and S are the entropy, edge gradient, and standard deviation, respectively; pi is the proportion of pixels whose intensity value equals i in the image; f(x, y) denotes the intensity value at pixel (x, y); M and N are the numbers of rows and columns of the image, respectively; L is the Laplacian operator; and μ is the average intensity value. Among all the frames in each sub-video, the higher the K value of a frame, the more typical the shape of the U-RBC in that frame, so the frame with the largest K value is deemed the key frame containing the most typical shape of the U-RBC. Based on the coordinates of the foreground objects and backgrounds marked by the mask, the key frames of all sub-videos are stitched together to obtain a fused image. The process of multi-focus video fusion is summarized in Algorithm 1.
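The following minimal Python sketch (function names are assumptions, not the authors' code) shows how the EI-TC index K = E·S/H could be computed for a sub-video frame and used to pick the key frame.

```python
import cv2
import numpy as np

def typicality_index(frame):
    """frame: grayscale uint8 patch of one sub-video frame; returns K = E*S/H."""
    # Information entropy H over the 256 intensity levels
    hist = np.bincount(frame.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))

    # Edge gradient E: summed squared response of the Laplacian operator
    lap = cv2.Laplacian(frame.astype(np.float64), cv2.CV_64F)
    E = np.sum(lap ** 2)

    # Intensity contrast S: standard deviation around the mean intensity
    S = frame.std()
    return E * S / (H + 1e-12)  # small epsilon guards against H = 0

def key_frame(sub_video):
    """Return the frame of a sub-video with the largest K (the key frame)."""
    return max(sub_video, key=typicality_index)
```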
It is noteworthy that a detection model with high performance rather than a simple classification model is still required in the subsequent detection phases, although the mask obtained at the fusion stage marks the approximate positions of U-RBCs. On the one hand, when the positions of several U-RBCs in the multi-focus video are too close, they will be marked as the same foreground object in the mask, as shown in Figure 5a. On the other hand, when the U-RBC is too close to other urine sediments, such as leukocytes and epithelial cells, the U-RBC and other urine sediments are marked as the same foreground object in the mask, as shown in Figure 5b. Therefore, directly classifying these two foreground objects can lead to a significant decrease in detection accuracy.
Algorithm 1. Procedure for multi-focus video fusion.
Input: U-RBC multi-focus video v.
Output: Fused image f.
1: Calculate the heat map H of the frame-difference data in v;
2: Obtain the mask M by binarizing H;
3: Based on M, segment the foreground objects and backgrounds of all Fi, and then splice the foreground objects and backgrounds with the same positions in different Fi into a series of sub-videos s;
4: for each s do
5:   for each frame Si of s do
6:     Calculate the evaluation index of typical shape K of Si;
7:   end for
8:   Si with the largest K is the key frame KS of s;
9: end for
10: According to the positions of the foreground objects and backgrounds marked by M, the KS of all s are stitched together to obtain a fused image f.

3.2. Architecture of the U-RBC Detection

Once a fused image with clear shapes of U-RBCs is obtained, any state-of-the-art detection model could be applied to detect U-RBCs. This paper uses the high-performance model YOLOv4 to implement the detection task. The architecture of YOLOv4 is shown in the blue area of Figure 1. The input of YOLOv4 is a U-RBC-fused image generated from the fusion stage. Subsequently, the CSPDarknet-53 model containing 5 residual blocks is responsible for rapidly generating high-quality feature maps of the fused image. PANet is responsible for connecting feature maps of different sizes under the premise of not changing the channel number of feature maps to enrich the shallow semantic information of feature maps. Then, the head of YOLOv4 receives the three sets of feature maps transmitted from PANet and standardizes their sizes as 19 × 19 × 255, 38 × 38 × 255 and 76 × 76 × 255. The head is responsible for generating many boxes (anchors) that may contain U-RBCs based on the feature maps, and each box contains only one U-RBC. Finally, YOLOv4 outputs these boxes to mark the classification and location of U-RBCs in the fused image.
The mosaic data enhancement method is used in YOLOv4. Mosaic data enhancement splices four images together using random scaling, random cropping, and random layout.
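The sketch below is a simplified illustration of mosaic augmentation (not YOLOv4's exact implementation): four images are randomly scaled and laid out around a random split point; in practice the bounding-box labels must be transformed in the same way.

```python
import random
import numpy as np
import cv2

def mosaic(images, out_size=608):
    """images: list of 4 BGR images (np.uint8). Returns one mosaic image."""
    assert len(images) == 4
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray fill
    # Random split point that defines the four quadrants
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        # Random scaling before cropping each image into its quadrant
        scale = random.uniform(0.5, 1.5)
        resized = cv2.resize(img, None, fx=scale, fy=scale)
        h = min(resized.shape[0], y2 - y1)
        w = min(resized.shape[1], x2 - x1)
        canvas[y1:y1 + h, x1:x1 + w] = resized[:h, :w]
    return canvas
```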
As shown in Figure 6, with the increasing number of training iterations, the loss function gradually converges to its minimum, at which point the training of YOLOv4 can be stopped.
Training is the prerequisite for YOLOv4 to detect U-RBCs. Whether the training process of YOLOv4 is completed is determined by the loss function, which is defined as follows:
Loss = L_{cd} + L_{pc} + L_{nc} + L_{c}
L_{cd} = \lambda_{coord} \sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{obj} \left( 2 - w_i \times h_i \right) \left( 1 - CIOU \right)
L_{pc} = -\sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{obj} \left[ \hat{C}_i \log ( C_i ) + ( 1 - \hat{C}_i ) \log ( 1 - C_i ) \right]
L_{nc} = -\lambda_{noobj} \sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{noobj} \left[ \hat{C}_i \log ( C_i ) + ( 1 - \hat{C}_i ) \log ( 1 - C_i ) \right]
L_{c} = -\sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{obj} \sum_{c \in classes} \left[ \hat{p}_i(c) \log ( p_i(c) ) + ( 1 - \hat{p}_i(c) ) \log ( 1 - p_i(c) ) \right]
where L_cd represents the coordinate loss of positive samples; L_pc and L_nc are the cross-entropy confidence losses of positive samples and negative samples, respectively; and L_c is the cross-entropy classification loss of positive samples. λ_coord and λ_noobj are the positive-sample and negative-sample weight coefficients, respectively; the values of I_ij^obj and I_ij^noobj are 0 or 1 and indicate whether a sample is positive or negative; w_i and h_i are the width and height of the prediction box; K × K is the number of grid cells, and M is the number of anchor boxes predicted in each cell; CIOU is the CIOU loss, which considers three factors: overlapping area, center distance, and aspect ratio.
As a typical single-stage detection network, YOLOv4 has a loss function that is the sum of three sub-losses: the confidence loss (L_pc + L_nc), the classification loss (L_c), and the bounding box regression loss (L_cd). L_pc and L_nc are utilized to judge whether the boxes proposed by the head contain U-RBCs, and L_c is utilized to determine the classification of the U-RBCs. L_cd is essentially a Complete IoU (CIOU) loss function, which is utilized to measure the deviation between the predicted box positions and the actual positions of U-RBCs. The trained YOLOv4 has ideal localization and classification capabilities for U-RBCs in the fused images and can effectively resist the interference of other types of urine sediments, such as leukocytes, bacteria, mucus, casts, and epithelial cells, as shown in Figure 7.
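As an illustration of the CIOU term inside L_cd, the following PyTorch sketch (not the authors' code) computes CIOU for boxes given as (x1, y1, x2, y2) tensors; the regression loss is then 1 − CIOU.

```python
import math
import torch

def ciou(pred, target, eps=1e-7):
    """pred, target: (N, 4) tensors of boxes in (x1, y1, x2, y2) format."""
    # Intersection over union
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared center distance over the squared diagonal of the enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4

    # Aspect-ratio consistency term
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) -
                              torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v  # CIOU; the regression loss is 1 - CIOU
```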

3.3. Diabetic Nephropathy Diagnosis Based on the U-RBC Proportion

As pointed out in Section 1, counting the proportions of different U-RBCs to diagnose diabetic nephropathy has a solid basis in pathological theory [23,24]. Specifically, diabetic nephropathy can cause glomerulosclerosis and capillary rupture, so the urine of patients may contain numerous abnormal U-RBCs. The heterocyst U-RBCs among the abnormal U-RBCs are caused by the compressional deformation of RBCs when they pass through cracks in the glomerular basement membrane (GBM). However, diabetic nephropathy does not obviously change the permeability of the GBM, and the hyperglycemic environment can reduce the possibility of U-RBC compressional deformation. It can therefore be speculated that numerous abnormal U-RBCs together with rare heterocyst U-RBCs in the urine are typical markers of diabetic nephropathy.
To verify the above inference, urine samples from 71 nephrotic patients at Shanxi Provincial People's Hospital were collected in 2021. Among them, 12 patients have diabetic nephropathy (DN), 23 patients have IgA nephropathy (IgAN), 23 patients have membranous nephropathy (MN), 2 patients have hypertensive nephropathy (HN), 5 patients have ANCA nephropathy (ANCAN), 2 patients have lupus nephropathy (LN), and 4 patients have purpura nephropathy (PN). The urine of all these patients contains many abnormal U-RBCs. To explore the proportional differences of the different abnormal U-RBCs in DN and non-diabetic renal disease (NDRD), Figure 8 shows the box diagrams of the heterocyst, orbicular, and shadow U-RBC proportions for DN and NDRD patients, together with the growth trend charts of the heterocyst, orbicular, and shadow U-RBC proportions in DN and NDRD as the proportion of abnormal U-RBCs increases. In the growth trend charts, the linear fitting slope (LFS) of each curve is calculated to mark the growth difference of each abnormal U-RBC type between DN and NDRD. Figure 8a indicates that the proportion of heterocyst U-RBCs can be used to distinguish DN from NDRD. In the box diagram, compared with NDRD, the maximum, upper quartile, and median heterocyst U-RBC proportions in DN are decreased by 0.456, 0.155, and 0.054, respectively. In the growth trend diagram, with the increasing proportion of abnormal U-RBCs, the LFS of the heterocyst U-RBC proportion in NDRD is 71.4% higher than that in DN. In contrast, Figure 8b shows that the proportion of orbicular U-RBCs does not clearly discriminate between DN and NDRD. In the box diagram, the maximum, upper quartile, median, and lower quartile of the orbicular U-RBC proportion in DN are only slightly higher than those in NDRD. In the growth trend diagram, the LFS of the orbicular U-RBC proportion in DN and NDRD is almost the same. Similarly, Figure 8c emphasizes that the proportion of shadow U-RBCs is not suitable for distinguishing DN from NDRD. Although in the growth trend diagram the LFS of the shadow U-RBC proportion in DN is 43.4% higher than that in NDRD, the box diagrams of the shadow U-RBC proportion in DN and NDRD are highly similar.
In summary, the proportions of heterocyst U-RBCs and abnormal U-RBCs can be regarded as two effective features for distinguishing DN from NDRD. Considering that the current DN diagnosis method heavily relies on an artificial threshold setting, it is obvious that a fixed threshold cannot diagnose various patients accurately and robustly. Therefore, this paper employs the K-nearest neighbor (KNN) algorithm, a mature machine learning method, to replace the threshold method for achieving a more accurate DN diagnosis. Section 4.5 describes the experiments comparing the artificial threshold setting method and KNN. In the KNN algorithm, the distance measure is the Euclidean distance. For two n-dimensional vectors x and y, the distance between them is
D(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
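A minimal scikit-learn sketch of this idea (the feature layout and example values are assumptions, not real patient data) treats each patient as a two-dimensional point of abnormal and heterocyst U-RBC proportions and classifies a new patient by its nearest neighbor under the Euclidean distance.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical example data: rows = patients, columns = (abnormal proportion,
# heterocyst proportion), both in [0, 1]
X_train = np.array([[0.72, 0.02], [0.65, 0.01], [0.55, 0.12], [0.80, 0.15]])
y_train = np.array([1, 1, 0, 0])             # 1 = DN, 0 = NDRD (illustrative labels)

knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
knn.fit(X_train, y_train)

# Proportions produced by D-MVF for a new patient
x_new = np.array([[0.70, 0.03]])
print(knn.predict(x_new))                    # -> predicted DN / NDRD label
```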

4. Experimental Design and Result

PyTorch 1.6.0 was used to implement the proposed approach in Python on a Linux server with an NVIDIA Tesla V100 32 GB GPU and an Intel(R) Core(TM) i7-8700 CPU @ 3.2 GHz. Ubuntu 16.04 served as the operating system.
In this section, based on the constructed experimental dataset, the effectiveness of the EI-TC key frame extraction method proposed for the D-MVF fusion stage is verified, and the U-RBC detection accuracy of D-MVF is compared with that of existing state-of-the-art methods. In addition, this section also compares the DN diagnosis performance of D-MVF with KNN against that of professional microscopists. The description of the multi-focus video dataset and the experimental results are detailed in the following subsections.

4.1. U-RBC Multi-Focus Video Dataset

From the urine of the 71 nephrotic patients mentioned in the second paragraph of Section 3.3, a total of 887 U-RBC multi-focus videos were collected as the experimental dataset, of which 567 videos were set as the training dataset, 160 videos as the validation dataset, and 160 videos as the test dataset. In the process of collecting a multi-focus video, the microscopist rotates the focusing wheel of a 400× optical microscope clockwise so that the shapes of all U-RBCs change from blurry to clear and then back to blurry. Meanwhile, the multi-focus videos of U-RBCs are recorded by the matching CCD camera at 30 frames per second. The collected multi-focus videos have a uniform frame size of 2000 × 2000 pixels, and each video contains multiple U-RBCs. The classifications and locations of the U-RBCs in all multi-focus videos are labeled by a professional microscopist with nine years of experience in detecting U-RBCs. Additionally, while collecting each multi-focus video, single-focus images within the same visual field are collected for control experiments.

4.2. Performance Metrics

Three types of evaluation indicators are adopted in this paper. The first category includes key frame quantitative evaluation indicators, namely the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity index (SSIM). The second category comprises the quantitative evaluation indicators for the U-RBC classification results of the D-MVF framework: average precision (AP), mean average precision (mAP), mean average recall (mAR), and their harmonic mean (HM). The third category comprises the quantitative evaluation indicators of the DN diagnosis results: accuracy and specificity.
A multi-focus video records the morphology of U-RBCs under different focal lengths. One of its frames must contain the typical morphology of the urinary red blood cell, and the frame with the typical morphology is defined as the key frame. In the experiment, the key frames of U-RBCs in some videos were manually labeled, and the proposed EI-TC method based on the three characteristics was used to find the key frames of U-RBCs in the same videos. To judge whether the key frame found by the EI-TC method is correct, the similarity between the U-RBC morphology extracted from the found key frame and the U-RBC morphology in the labeled key frame is calculated. PSNR and SSIM are both used to measure image similarity; if the two frames are consistent, the key frame has been found correctly. Assume the video frame size is m × n, the manually labeled key frame is denoted as I, and the key frame discovered by the EI-TC method is denoted as K; the mean square error (MSE) and PSNR are then defined as follows:
MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i, j) - K(i, j) \right]^2
PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)
where MAX_I is the maximum possible pixel value of the image.
The formula of SSIM is based on three comparative measures between I and K: luminance (l), contrast (c), and structure (s), which are defined as follows:
l(I, K) = \frac{2 \mu_I \mu_K + c_1}{\mu_I^2 + \mu_K^2 + c_1}
c(I, K) = \frac{2 \sigma_I \sigma_K + c_2}{\sigma_I^2 + \sigma_K^2 + c_2}
s(I, K) = \frac{\sigma_{IK} + c_3}{\sigma_I \sigma_K + c_3}
SSIM(I, K) = \left[ l(I, K) \right]^{\alpha} \cdot \left[ c(I, K) \right]^{\beta} \cdot \left[ s(I, K) \right]^{\gamma}
where μ_I and μ_K are the means of I and K, respectively; σ_I and σ_K are the standard deviations of I and K, respectively; σ_IK is the covariance of I and K; c_1 = (k_1 L)^2 and c_2 = (k_2 L)^2 are two constants with c_3 = c_2 / 2; L is the dynamic range of the pixel values; and k_1 = 0.01 and k_2 = 0.03 are the default values.
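For reference, the following sketch computes PSNR and a global, single-window SSIM directly from the formulas above (production SSIM implementations usually average the measure over local windows).

```python
import numpy as np

def psnr(I, K, max_val=255.0):
    """PSNR between the labeled key frame I and the extracted key frame K."""
    mse = np.mean((I.astype(np.float64) - K.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(I, K, L=255.0, k1=0.01, k2=0.03):
    """Global SSIM with alpha = beta = gamma = 1 over the whole frame."""
    I = I.astype(np.float64); K = K.astype(np.float64)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    c3 = c2 / 2
    mu_i, mu_k = I.mean(), K.mean()
    sig_i, sig_k = I.std(), K.std()
    sig_ik = ((I - mu_i) * (K - mu_k)).mean()
    l = (2 * mu_i * mu_k + c1) / (mu_i ** 2 + mu_k ** 2 + c1)
    c = (2 * sig_i * sig_k + c2) / (sig_i ** 2 + sig_k ** 2 + c2)
    s = (sig_ik + c3) / (sig_i * sig_k + c3)
    return l * c * s
```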
The parameters mAP and mAR are used to evaluate the results of U-RBCs’ classification, which are defined as follows:
Precision = \frac{TP}{TP + FP}
Recall = \frac{TP}{TP + FN}
AP = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m} Precision_{i,j}
AR = \frac{2}{H} \sum_{h=1}^{H} \max \left( IoU(gt_h) - 0.5,\ 0 \right)
mAP = \frac{1}{C} \sum_{j=1}^{C} AP_j
mAR = \frac{1}{C} \sum_{j=1}^{C} AR_j
HM = \frac{2 \cdot mAP \cdot mAR}{mAP + mAR}
where AR is the average recall, C is the number of categories, n is the number of images, m is the number of positive instances in each image, and H is the number of ground-truth instances. True positives (TP): the number of positive instances correctly detected; false positives (FP): the number of negative instances wrongly detected as positive; false negatives (FN): the number of positive instances wrongly missed; true negatives (TN): the number of negative instances correctly rejected. IoU is the overlapping area (intersection) between the predicted bounding box and the ground-truth bounding box divided by their joint area (union).
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
Specificity = \frac{TN}{TN + FP}
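The diagnosis metrics can be computed straightforwardly from the confusion counts; the short sketch below (with illustrative labels) returns accuracy and specificity.

```python
import numpy as np

def accuracy_specificity(y_true, y_pred):
    """y_true, y_pred: arrays of 0/1 labels (1 = DN, 0 = NDRD)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    return accuracy, specificity

print(accuracy_specificity([1, 0, 0, 1, 0], [1, 0, 1, 1, 0]))  # -> (0.8, 0.666...)
```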

4.3. Comparison between Different Key Frame Extraction Methods

The extraction of key frames is the most important stage of multi-focus video fusion. To verify the accuracy and real-time performance of the key frame extraction algorithm proposed in Section 3.1, 40 multi-focus videos were constructed by partially segmenting the collected multi-focus videos as the experimental data. Each partially segmented multi-focus video contains only a single U-RBC object, and 60 multi-focus videos for each type of U-RBC are collected. Two widely recognized fast key frame extraction methods based on frame-difference data (K-FD) [25] and optical flow data (K-OF) [26], as well as the accurate multi-instance learning (MIL) key frame extraction method [27], are compared with the proposed EI-TC algorithm. The experimental results for comparison are shown in Figure 9. It can be seen that the EI-TC algorithm finds a satisfactory key frame with a clear U-RBC shape. In addition, the PSNR and SSIM values between the key frames extracted by the four compared methods and the actual key frames are calculated, and the time consumption of the four compared methods is given to quantitatively compare the performance of key frame extraction. The experimental results are listed in Table 1.
The experimental results listed in Table 1 confirm that the proposed EI-TC method has obvious advantages over the other three methods in terms of key frame extraction quality and speed. The different key frame extraction methods perform differently on the four typical morphologies of U-RBCs. In terms of key frame extraction quality, the mean PSNR and SSIM values over the four typical morphologies for EI-TC reach inspiring values of 35.249 and 0.955, which are 2.6% to 4.7% and 0.3% to 0.5% higher than the best results of the existing methods, respectively. Only for the normal morphology of U-RBCs are the PSNR and SSIM of EI-TC slightly lower than those of MIL, by 0.9% and 0.2%, respectively. In terms of key frame extraction speed, the mean time for the EI-TC method to extract the key frames from a multi-focus video is only 0.538 s, which surpasses the competing methods by an order of magnitude.

4.4. Comparison between Different U-RBC Detection Methods

The authors' previous work [13] has proven that deep learning algorithms have sufficient research potential for accurately detecting U-RBCs. To validate the performance of the proposed D-MVF on the multi-focus video dataset, two sets of comparative experiments for U-RBC target detection are carried out using four high-performance deep learning models: Faster R-CNN [28], Cascade R-CNN [29], RetinaNet [30], and YOLOv4 (applied in the proposed framework). Among them, Faster R-CNN and Cascade R-CNN are the state-of-the-art models in the field of U-RBC detection [12]. The object detection algorithms used in this paper are based on 2D images; if key frames are not extracted from the multi-focus video, the object detection algorithms cannot be directly applied to the video. Therefore, a pseudo-ablation experiment is adopted: from each video, the microscopists selected the images they considered to most clearly show the morphology of the U-RBCs as the reference group without key frame extraction, and these were compared with the images obtained with key frame extraction and fusion. The images selected by the microscopists are treated as single-focus images.
The first experiment compares the detection accuracy of the existing U-RBC detection methods based on single-focus images. The second experiment tests the detection accuracy of the competing detection models based on fused multi-focus videos. In the first experiment, the collected single-focus images are fed into the four detection models. In the second experiment, the multi-focus videos are first fused into a series of multi-focus fusion images by the proposed fusion algorithm, and then the fused images are fed into the four detection models. The sub-index used to evaluate the detection accuracy of each model on each type of U-RBC is the average precision (AP), as shown in Table 2. The overall evaluation indexes include the mean average precision (mAP), the mean average recall (mAR), and the harmonic mean (HM) of mAP and mAR, as shown in Table 3.
In Table 2, by comparing the results between the first and second groups of experiments, it is obvious that the proposed D-MVF framework has significant advantages in U-RBC detection accuracy over the existing methods on the same dataset. For all U-RBC categories, the D-MVF model with YOLOv4 as the detection stage achieves the highest AP values. For single-form U-RBCs, represented by orbicular U-RBCs, the AP of D-MVF with YOLOv4 is 10.3% to 15.9% higher than that of the existing methods, while for severely deformed heterocyst U-RBCs, the AP gap increases to more than 44.8%.
The experimental results in Table 3 further validate the conclusions drawn from Table 2. No matter which model is used in the detection stage, the mAP, mAR, and HM of D-MVF generally exceed those of the same model using single-focus images by 10% to 30%. In particular, YOLOv4 proves to be the best detection model for D-MVF. With the help of YOLOv4, the mAP of D-MVF achieves an inspiring 0.915, and the gap between it and the other models used in the detection stage reaches 6–11%. Naturally, the HM of D-MVF with YOLOv4 also reaches the highest value of 0.922. These experiments show that the D-MVF algorithm with the key frame extraction and image fusion algorithms recommended in this paper is necessary for object detection on multi-focus videos of U-RBCs and improves the performance of the target detection task.

4.5. Comparison between KNN and Artificial Threshold in Diagnosing DN

Numerous abnormal U-RBCs together with rare heterocyst U-RBCs in urine are typical markers of diabetic nephropathy. Thus, the proportions of abnormal and heterocyst U-RBCs in the urine can be used to help determine whether a urine sample comes from a DN patient. It is worth noting that professional microscopists detect and count the abnormal and heterocyst U-RBC proportions and then apply an artificial threshold, which is an empirical estimate. The current artificial threshold standard is that when the proportion of abnormal U-RBCs is more than 50% or the proportion of heterocyst U-RBCs is more than 5%, the patient is diagnosed with NDRD; otherwise, the patient has DN [23,24]. However, this threshold is based purely on experience. This paper uses the KNN algorithm, operating directly on these proportions, to replace the artificial threshold and improve diagnostic accuracy.
Among the 71 nephrotic patients corresponding to the multi-focus video dataset, 39 patients belong to the training and validation datasets, and 32 patients belong to the test dataset. The experimental comparison between KNN and the artificial threshold in diagnosing DN can be divided into three steps. Firstly, the proportions of abnormal and heterocyst U-RBCs from the 39 patients (corresponding to the training and validation datasets) are used to train KNN. To avoid overfitting, the KNN algorithm adopts 5-fold cross-validation, and the empirical value of K is set to 1. Secondly, the trained KNN and the artificial threshold are used to diagnose whether the 32 patients (corresponding to the test dataset) have diabetic nephropathy. Thirdly, considering that the clinical significance of correctly diagnosing NDRD is higher than that of DN, accuracy and specificity are utilized to evaluate the DN diagnostic performance of KNN and the artificial threshold. The input of KNN is the pair of U-RBC proportions detected and counted either by professional microscopists or by the D-MVF model, and KNN uses these two proportions to complete the DN diagnosis. The experimental results of KNN using 5-fold cross-validation with professional microscopists, KNN using 5-fold cross-validation with D-MVF, and the artificial threshold with professional microscopists are shown in Figure 10.
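A minimal sketch of this protocol (the data shapes and labels are placeholders, not the real patient data) runs 5-fold cross-validation of the K = 1 KNN on the 39 training/validation patients.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_trainval = rng.random((39, 2))         # (abnormal, heterocyst) proportions
y_trainval = rng.integers(0, 2, 39)      # 1 = DN, 0 = NDRD (placeholder labels)

knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
scores = cross_val_score(knn, X_trainval, y_trainval, cv=5, scoring="accuracy")
print(scores.mean())                     # mean validation accuracy across folds
```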
The comparison of the blue and green columns in Figure 10 shows that the DN diagnosis performance of KNN with 5-fold cross-validation is significantly higher than that of the artificial threshold when the detection and counting methods of U-RBCs are the same. With the support of professional microscopists, the DN diagnosis accuracy and specificity of KNN using 5-fold cross-validation reach 0.785 and 0.858, which are 8.6% and 4.5% higher than those of the artificial threshold, respectively. The comparison of the orange and green columns in Figure 10 shows that, although the U-RBC detection and counting accuracy of D-MVF is slightly lower than that of professional microscopists, the performance of intelligent DN diagnosis based on D-MVF and KNN basically reaches the level of professional microscopists. The accuracy of KNN with D-MVF is only 4.3% lower than that of the artificial threshold with professional microscopists, and the specificity of KNN with D-MVF is 4.3% higher than that of the artificial threshold with professional microscopists. In addition, the speed of intelligent DN diagnosis is 3.78 times that of professional microscopists.

5. Conclusions and Discussion

In this paper, an innovative D-MVF method to realize U-RBC detection is proposed based on a multi-focus video dataset for the first time. The independently designed D-MVF is a composite method that integrates multi-focus video fusion and U-RBC detection. At the fusion stage, the typical shapes of all U-RBCs in a multi-focus video are fused into one image. At the detection stage, the YOLOv4 model is utilized to output the classifications and locations of all U-RBCs in the fused image. The above process enables D-MVF to effectively avoid the missed and false detections caused by defocused U-RBCs. Based on the detected and counted U-RBC proportions, the traditional artificial threshold setting is replaced by the KNN model to effectively improve the performance of DN diagnosis. A series of experimental results indicates that the mAP of D-MVF using multi-focus videos achieves an inspiring 0.915, which is 30.7%–38.4% higher than that of existing U-RBC detection methods based on single-focus images. In addition, the accuracy and specificity of intelligent DN diagnosis based on D-MVF and KNN basically reach the level of professional microscopists. In summary, the proposed intelligent DN diagnosis method can assist microscopists in U-RBC detection and DN diagnosis and effectively relieve the working burden of microscopists.
This method has some limitations. The approach used in this paper to process multi-focus videos is derived from methods for processing multi-focus images, i.e., image fusion is performed as it would be on multi-focus images. A multi-focus video has two kinds of features: temporal and spatial. In this paper, only the spatial features are used when searching for key frames, although the appearance of atypical morphology during the defocusing of U-RBCs has a temporal correlation. In addition, potential information may be lost during image fusion. Therefore, how to process multi-focus videos with methods based on 3D convolutional neural networks is a future research direction.

Author Contributions

Conceptualization, W.Z. and M.L.; methodology, F.H. and X.L.; writing–review and editing, F.H. and X.L.; formal analysis, Y.W. and W.Z.; investigation, Y.W., M.L. and W.Z.; writing—original draft preparation, X.L.; supervision, X.L.; project administration, W.Z. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, Grant Nos. 11771321, 61901292 and 62101376; the Natural Science Foundation of Shanxi Province, China, Grant Nos. 201901D211080 and 201901D211078; the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi, Grant No. 2019L0350.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are included in the article.

Acknowledgments

We thank the Shanxi Provincial People's Hospital for providing the desensitized information and the microscopist for the thorough explanation of the data collection procedure.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fairley, K.F.; Birch, D.F. Hematuria: A simple method for identifying glomerular bleeding. Kidney Int. 1982, 21, 105–108. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Rizzoni, G.; Braggion, F.; Zacchello, G. Evaluation of glomerular and non-glomerular hematuria by phase-contrast microscopy. J. Pediatr. 1983, 103, 370–374. [Google Scholar] [CrossRef] [PubMed]
  3. Rath, B.; Turner, C.; Hartley, B.; Chantler, C. What makes red cells dysmorphic in glomerular haematuria? Pediatr. Nephrol. 1992, 6, 424–427. [Google Scholar] [CrossRef] [PubMed]
  4. Lettgen, B.; Wohlmuth, A. Validity of g1-cells in the differentiation between glomerular and non-glomerular haematuria in children. Pediatr. Nephrol. 1995, 9, 435–437. [Google Scholar] [CrossRef] [PubMed]
  5. Ozkan, I.A.; Koklu, M.; Sert, I.U. Diagnosis of urinary tract infection based on artificial intelligence methods. Comput. Methods Programs Biomed. 2018, 166, 51–59. [Google Scholar] [CrossRef]
  6. Fernandez, E.; Barlis, J.; Dematera, K.A.; LLas, G.; Paeste, R.M.; Taveso, D.; Velasco, J.S. Four class urine microscopic recognition system through image processing using artificial neural network. J. Telecommun. Electron. Comput. Eng. (JTEC) 2018, 214–218. [Google Scholar]
  7. Pan, J.; Jiang, C.; Zhu, T. Classification of urine sediment based on convolution neural network. AIP Conf. Proc. 2018, 1955, 040176. [Google Scholar]
  8. Ji, Q.; Li, X.; Qu, Z.; Dai, C. Research on urine sediment images recognition based on deep learning. IEEE Access 2019, 7, 166711–166720. [Google Scholar] [CrossRef]
  9. Li, T.; Jin, D.; Du, C.; Cao, X.; Chen, H.; Yan, J.; Chen, N.; Chen, Z.; Feng, Z.; Liu, S. The image-based analysis and classification of urine sediments using a LeNet-5 neural network. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2020, 8, 109–114. [Google Scholar] [CrossRef]
  10. Suhail, K.; Brindha, D. A review on various methods for recognition of urine particles using digital microscopic images of urine sediments. Biomed. Signal Process. Control. 2021, 68, 102806. [Google Scholar]
  11. Zhao, X.; Xiang, J.; Ji, Q. Urine red blood cell classification based on Siamese Network. In Proceedings of the 2nd International Workshop on Electronic communication and Artificial Intelligence (IWECAI), Nanjing, China, 12–14 March 2021; Volume 1873, p. 012089. [Google Scholar]
  12. Li, K.; Li, M.; Wu, Y.; Li, X.; Zhou, X. An accurate urine erythrocytes detection model coupled faster rcnn with vggnet. In Proceedings of the 2020 Conference on Artificial Intelligence and Healthcare (CAIH2020), Taiyuan, China, 23–25 October 2020; pp. 224–228. [Google Scholar]
  13. Li, X.; Li, M.; Wu, Y.; Zhou, X.; Hao, F.; Liu, X. An accurate classification method based on multi-focus videos and deep learning for urinary red blood cell. In Proceedings of the 2020 Conference on Artificial Intelligence and Healthcare (CAIH2020), Taiyuan, China, 23–25 October 2020. [Google Scholar]
  14. Nian, Z.; Jung, C. Cnn-based multi-focus image fusion with light field data. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1044–1048. [Google Scholar]
  15. Kou, L.; Zhang, L.; Zhang, K.; Sun, J.; Han, Q.; Jin, Z. A multi-focus image fusion method via region mosaicking on Laplacian pyramids. PLoS ONE 2018, 13, e0191085. [Google Scholar] [CrossRef] [PubMed]
  16. Chen, Y.; Deng, N.; Xin, B.-J.; Xing, W.-Y.; Zhang, Z.-Y. Nonwovens structure measurement based on NSST multi-focus image fusion. Micron 2019, 123, 102684. [Google Scholar] [CrossRef] [PubMed]
  17. Sun, J.; Han, Q.; Kou, L.; Zhang, L.; Zhang, K.; Jin, Z. Multi-focus image fusion algorithm based on Laplacian pyramids. J. Opt. Soc. Am. A-Opt. Image Sci. Vis. 2018, 35, 480–490. [Google Scholar] [CrossRef] [PubMed]
  18. Albahli, S.; Nida, N.; Irtaza, A.; Yousaf, M.H.; Mahmood, M.T. Melanoma lesion detection and segmentation using yolov4-darknet and active contour. IEEE Access 2020, 8, 198403–198414. [Google Scholar] [CrossRef]
  19. Montalbo, F. A computer-aided diagnosis of brain tumors using a finetuned yolo-based model with transfer learning. KSII Trans. Internet Inf. Syst. 2021, 14, 4816–4834. [Google Scholar]
  20. Lyra, S.; Mayer, L.; Ou, L.; Chen, D.; Timms, P.; Tay, A.; Chan, P.; Ganse, B.; Leonhardt, S.; Antink, C.H. A deep learning based camera approach for vital sign monitoring using thermography images for ICU patients. Sensors 2021, 21, 1495. [Google Scholar] [CrossRef]
  21. Feichtenhofer, C.; Pinz, A.; Zisserman, A. Detect to Track and Track to Detect. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  22. Han, M.; Wang, Y.; Chang, X.; Qiao, Y. Mining Inter-Video Proposal Relations for Video Object Detection; Springer: Cham, Switzerland, 2020. [Google Scholar]
  23. Lee, E.Y.; Chung, H.C.; Choi, S.O. Non-diabetic renal disease in patients with non-insulin dependent diabetes mellitus. Yonsei Med. J. 1999, 40, 321–326. [Google Scholar] [CrossRef] [Green Version]
  24. Dong, Z.; Wang, Y.; Qiu, Q.; Hou, K.; Zhang, L.; Wu, J.; Zhu, H.; Cai, G.; Sun, X.; Zhang, X.; et al. Dysmorphic erythrocytes are superior to hematuria for indicating non-diabetic renal disease in type 2 diabetics. J. Diabetes Investig. 2016, 7, 115–120. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Liu, H.; Pan, L.; Meng, W. Key frame extraction from online video based on improved frame difference optimization. In Proceedings of the IEEE 14th International Conference on Communication Technology, Chengdu, China, 9–11 November 2012; pp. 940–944. [Google Scholar]
  26. Chen, Y.; Hu, W.; Zeng, X.; Li, W. Indexing and matching of video shots based on motion and color analysis. In Proceedings of the 9th International Conference on Control, Automation, Robotics and Vision, Singapore, 5–8 December 2006. [Google Scholar]
  27. Zhu, W.; Hu, J.; Sun, G.; Cao, X.; Qiao, Y. A key volume mining deep framework for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1991–1999. [Google Scholar]
  28. Lin, B.-S.; Chen, J.-L.; Tu, Y.-H.; Shih, Y.-X.; Lin, Y.-C.; Chi, W.-L.; Wu, Y.-C. Using deep learning in ultrasound imaging of bicipital peritendinous effusion to grade inflammation severity. IEEE J. Biomed. Health Inform. 2020, 24, 1037–1045. [Google Scholar] [CrossRef]
  29. Sun, C.-Y.; Hong, X.-J.; Shi, S.; Shen, Z.-Y.; Zhang, H.-D.; Zhou, L.-X. Cascade faster R-CNN detection for vulnerable plaques in oct images. IEEE Access 2021, 9, 24697–24704. [Google Scholar] [CrossRef]
  30. Du, X.; Wang, X.; Ni, G.; Zhang, J.; Hao, R.; Zhao, J.; Wang, X.; Liu, J.; Liu, L. Sdof-net: Super depth of field network for cell detection in leucorrhea micrograph. IEEE J. Biomed. Health Inform. 2021, 26, 1229–1238. [Google Scholar] [CrossRef]
Figure 1. The overall framework of D-MVF.
Figure 2. The typical shapes of U-RBCs.
Figure 3. Examples of U-RBCs with blurred shapes and deformed shapes. (a) typical shapes and corresponding blurred shapes; (b) typical shapes and corresponding deformed shapes.
Figure 4. The 3 characteristics of typical U-RBC shapes.
Figure 5. The purple regions are the masks of (a) several U-RBCs close to each other, and (b) U-RBCs and other urine sediments close to each other.
Figure 6. Trend plots of the loss function for YOLOv4.
Figure 7. The YOLOv4 detection results of U-RBC fusion images: (a) without interference, (b) interfered with the leukocytes, (c) interfered with the bacteria, (d) interfered with the mucus, (e) interfered with the casts, and (f) interfered with the epithelial cells.
Figure 8. The box diagrams and growth trend diagrams of heterocyst, orbicular, and shadow U-RBCs in DN and NDRD. (a) The box plot and growth trend of heterocyst U-RBCs; (b) The box plot and growth trend of orbicular U-RBCs; (c) The box plot and growth trend of shadow U-RBCs.
Figure 9. Key frames are extracted by different methods. (a) actual key frames; (b) key frames extracted by the K-FD model; (c) key frames extracted by the K-OF model; (d) key frames extracted by the MIL model; (e) key frames extracted by the EI-TC model adopted by this paper.
Figure 10. The accuracy and specificity of KNN and artificial thresholds in DN diagnosis.
Table 1. Comparison of PSNR, SSIM, and time consumption between four key frame extraction methods in different typical morphologies of urinary red blood cells. The bolded numbers are the best results.
Method      Index     Normal    Heterocyst   Orbicular   Shadow    Mean
K-FD [25]   PSNR      36.011    31.486       31.170      35.993    33.665
            SSIM      0.957     0.937        0.910       0.966     0.950
            Time (s)  8.662     6.800        6.275       5.254     6.748
K-OF [26]   PSNR      36.211    31.892       31.464      37.498    34.266
            SSIM      0.972     0.940        0.923       0.970     0.951
            Time (s)  9.835     7.289        7.118       6.035     7.569
MIL [27]    PSNR      36.754    32.106       32.327      36.172    34.340
            SSIM      0.974     0.938        0.929       0.966     0.952
            Time (s)  10.732    7.680        7.591       7.142     8.286
EI-TC       PSNR      36.405    33.074       33.705      37.810    35.249
            SSIM      0.972     0.938        0.939       0.970     0.955
            Time (s)  0.605     0.521        0.562       0.463     0.538
Table 2. Comparison of AP between various U-RBC detection methods using single-focus and multi-focus images in different typical morphologies of urinary red blood cells. The bolded numbers are the best results.
Data Type             Model                    Normal   Heterocyst   Orbicular   Shadow
Single-focus images   Faster R-CNN             0.784    0.443        0.871       0.680
                      Cascade R-CNN            0.801    0.396        0.829       0.682
                      RetinaNet                0.814    0.336        0.863       0.632
                      YOLOv4                   0.764    0.525        0.831       0.674
Multi-focus videos    D-MVF w Faster R-CNN     0.943    0.671        0.894       0.909
                      D-MVF w Cascade R-CNN    0.947    0.686        0.919       0.918
                      D-MVF w RetinaNet        0.897    0.604        0.917       0.891
                      D-MVF w YOLOv4           0.974    0.760        0.961       0.963
Table 3. Comparison of mAP, mAR, and HM between different U-RBC detection methods based on single-focus image and multi-focus videos. The bolded numbers are the best results.
Data Type             Model                    mAP     mAR     HM
Single-focus images   Faster R-CNN             0.694   0.873   0.733
                      Cascade R-CNN            0.677   0.840   0.750
                      RetinaNet                0.661   0.929   0.722
                      YOLOv4                   0.700   0.760   0.729
Multi-focus videos    D-MVF w Faster R-CNN     0.854   0.966   0.907
                      D-MVF w Cascade R-CNN    0.867   0.963   0.912
                      D-MVF w RetinaNet        0.827   0.978   0.896
                      D-MVF w YOLOv4           0.915   0.930   0.922

