
Sharpness-Based Distance Detection

Ying Jin, Cangtao Zhou and Wanjun Dai
1 College of Engineering Physics, Shenzhen University of Technology, Shenzhen 518118, China
2 College of Applied Technology, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8913; https://doi.org/10.3390/app14198913
Submission received: 27 August 2024 / Revised: 28 September 2024 / Accepted: 1 October 2024 / Published: 3 October 2024

Abstract
With the advancement of artificial intelligence, visual ranging has become a widely researched field. This paper introduces a novel method for distance measurement that evaluates the sharpness of objects in the current frame. It is well known that an image is sharpest at the camera’s focal point and becomes blurred when moving away from it. Leveraging this characteristic, this study uses sharpness to achieve distance measurement. Initially, the target object is identified and located in a specific direction. The image region of the target object is then cropped in that direction, and its image quality is assessed through a sharpness calculation. Subsequently, the relationship between sharpness and distance is analysed statistically, and a function curve depicting their correlation is plotted. Consequently, the distance between the target object and the detector can be determined from sharpness with an extremely small margin of error using this method.

1. Introduction

Measuring the distance between objects plays a key role in many fields. From the earliest physical distance measurement (e.g., straightedge, tape measure), to traditional sensor-based distance measurement (e.g., ultrasonic and infrared sensors), to present-day visual distance measurement, human beings have always been searching for better measurement technology. Many inter-object ranging methods have been developed during this period, and the commonly used methods are currently laser ranging [1,2], ultrasonic ranging [3,4], radar ranging [5], and machine-vision ranging [6,7]. Among them, laser ranging provides high accuracy but at a high cost, and the target is lost when the direction of the laser beam changes while the object is travelling; ultrasonic ranging is affected by ambient temperature, and ultrasonic waves extract little information from the surrounding environment; radar ranging is affected by other radar devices, which causes large changes in measurement accuracy.
In this paper, the direct distance between the target object and the detector is obtained with the visual ranging method [8,9], which is often used in visual navigation [10], traffic safety [11], visual obstacle avoidance [12], and many other fields. What differs from previous visual ranging is that, in this paper, distance detection between objects is carried out through the change in image sharpness. For visual ranging, target detection is the problem to be solved in the first stage [13]. In this paper, YOLOv7 is used to detect target objects, because YOLO has many advantages, such as: high speed—YOLO divides the image into a grid and maps grid cells directly to regions, reducing redundant computation and saving a great deal of search time; global reasoning—because the YOLO grid model is used to generate candidate regions, YOLO performs the convolution over the entire image when classifying objects, significantly reducing background errors (i.e., mistaking the background for an object) compared with Fast-RCNN; good generalisation—YOLO models the typical size and position of objects, so when transferred to other detection domains it can maintain an AP comparable to DPM, a strong model of image space. After YOLO target detection, the target object region is extracted, and then the image quality of the region where the target object is located is evaluated.
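To make this pipeline concrete, the following minimal sketch shows how detections could be filtered and the target region cropped for the later sharpness evaluation; `run_yolov7` is a hypothetical detector wrapper (not from the paper) assumed to return class names, confidences, and pixel-coordinate boxes.

```python
import cv2  # OpenCV for image I/O; cropping is plain NumPy slicing

def crop_target_regions(image_path, run_yolov7, target_class="chair", conf_thres=0.5):
    """Detect objects and crop the regions belonging to the target class.

    `run_yolov7` is a hypothetical callable returning a list of
    (class_name, confidence, (x1, y1, x2, y2)) tuples in pixel coordinates.
    """
    image = cv2.imread(image_path)
    crops = []
    for cls_name, conf, (x1, y1, x2, y2) in run_yolov7(image):
        if cls_name == target_class and conf >= conf_thres:
            crops.append(image[int(y1):int(y2), int(x1):int(x2)])  # cropped target region
    return crops
```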
In general, image quality assessment can be divided into full-reference, partial-reference and no-reference methods. Commonly used objective full-reference image quality metrics are mainly based on pixel statistics, information theory, and structural information, such as the peak signal-to-noise ratio (PSNR), the information fidelity criterion (IFC), and the structural similarity index (SSIM). Partial-reference assessment, also called semi-reference assessment, takes part of the feature information of the ideal image as a reference and compares it against the image to be evaluated to obtain the quality result. No-reference image quality assessment, by contrast, is a challenging computer vision problem, and current no-reference assessment is mainly performed with end-to-end deep learning models for pixel-level image quality assessment. With the rapid development of deep learning technology, significant progress has been made in the field of image quality evaluation [14]. The main advantages of deep learning are the ability to build end-to-end feature learning systems instead of manually crafted features, and the ability to automatically extract and generalise features from training samples to form effective quality prediction models. Researchers have attempted to further improve image quality assessment by exploiting the powerful feature-extraction capability of convolutional neural networks (CNNs). Through continuous research on convolutional neural networks, Zhu et al. [15] evaluated image quality with an approach based on an optimised convolutional neural network structure, aiming to automatically extract distinctive image quality features, improve the network’s learning capability, and predict evaluation scores through normalisation and dropout. Ma et al. [16] proposed a CNN-based method for block-wise image quality evaluation and showed that it has strong advantages in terms of image compression time and compression efficiency. Yang et al. [17] proposed a new no-reference image quality evaluation method using a transfer learning technique; the feature-sharing process is optimised by introducing a network called TTL, which focuses on the transfer of semantic features, and experiments show that the network generalises well and can effectively handle different types of images. Liu et al. [18] proposed an improved method for no-reference image sharpness evaluation, which obtains edge point locations and directions with a Canny edge detection algorithm based on an activation mechanism and an edge direction detection algorithm based on the grey-scale information of eight adjacent pixels, and then solves for the edge width to build an edge-width histogram. Based on the behaviour of three types of distance factors derived from the histogram information, these factors are introduced into a weighted-average edge-width model to obtain the sharpness evaluation index. Recently, Li et al. [19] proposed adaptively fusing high-level and low-level semantic features using attention blocks; the method extracts content and distortion information from images through content awareness and distortion inference.
Not only does it achieve good prediction accuracy, it also shows strong generalisation performance in cross-dataset testing, providing a scientific and effective research direction for blind image quality evaluation.
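As a point of reference for the full-reference metrics mentioned above, the short sketch below computes PSNR with NumPy and SSIM with scikit-image (assuming scikit-image ≥ 0.19 for the `channel_axis` argument); these are standard formulas, not part of the method proposed in this paper.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio between two images of identical shape."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_score(reference, distorted):
    """Structural similarity index; channel_axis=-1 treats the last axis as colour channels."""
    return ssim(reference, distorted, channel_axis=-1, data_range=255)
```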
With the continuous development of deep learning, many breakthroughs have been made in visual ranging. Recently, Yang et al. [20] proposed a vehicle distance measurement method using YOLOv5, and the results show an average error of 3.15% within 60 m on a two-way lane. Huang et al. [21] proposed a small-target pedestrian ranging method by combining YOLOv3 with a geometric relation method. Zhang et al. [22] improved YOLOv5 and combined it with a binocular camera, computing the disparity between the two images to measure object distance, with an average error of 2.05% over a range of 16 m. Similarly, Zhang et al. [23] improved the YOLOv5 algorithm and used the spatial geometry of the pixels in the image to achieve distance measurement.
In this paper, distance detection of objects is realised through object sharpness. By building a CNN architecture model and training it on different datasets, an evaluation score with a high correlation to human ratings is obtained. After statistics on the sharpness of the target object and its distance from the detector are collected, a function curve of sharpness versus distance is drawn, so that the distance of the target object can be obtained from a sharpness measurement. This method differs greatly from other visual ranging methods: most purely visual ranging is based on the size of the target object, but in real application scenarios the target object is unknown and its size is difficult to determine, which greatly limits the scope of application. The method proposed in this paper does not need to know the size of the target object, so its range of application is much wider. The proposed measuring method has many advantages, such as low computational cost, high measurement accuracy and low hardware cost.

2. Target Identification and Location

YOLO is used for object recognition [24,25]. YOLOv7 [26] is used in this paper. Although it is not the latest YOLO version, its speed and accuracy exceed those of all object detectors known at the time of its release. This section introduces the network structure of YOLOv7, its recognition mechanism, and related recognition algorithms; finally, YOLOv7 is used to detect the target.

2.1. YOLOv7 Basic Model

This paper adopts the YOLOv7 model from the YOLO series for object tracking and recognition. YOLOv7 is the base model of the family: it outperforms most known target detectors in speed and accuracy in the range of 5–160 frames/s, and has the highest accuracy among known real-time target detectors running at 30 frames/s or more on a V100 GPU. Depending on the code-running environment (edge GPU, normal GPU, and cloud GPU), three basic models are provided, called YOLOv7-tiny, YOLOv7 and YOLOv7-W6. Compared with other network models in the YOLO series, the detection idea of YOLOv7 is similar to that of YOLOv4 and YOLOv5. YOLO’s real-time object detection capability enables rapid recognition and tracking of a variety of objects such as vehicles, pedestrians [27,28], bicycles, and other obstacles [29,30]. These capabilities have been applied in many areas, including motion recognition [31] in video sequences used for surveillance [32], sports analysis [33], and human–computer interaction [34].
The YOLOv7 network is composed of three modules: Input, Backbone and Head. When an image passes through the network, the Input module scales the input image to meet the input size requirements of the backbone network. The backbone network is composed of CBS layers, E-ELAN (Extended-ELAN) layers [35], MPConv layers, and an SPPCSPC layer. The CBS layer is a convolutional block, and the E-ELAN layer is an efficient layer-aggregation module; the ELAN module is shown in Figure 1. E-ELAN changes the computational blocks of the original ELAN layer using expand, shuffle, and merge-cardinality operations, improving the learning ability of the network without destroying the original gradient path. The MPConv layer adds a Maxpool (maximum pooling) layer to the convolutional layer, which fuses the previously extracted feature information and improves the network’s generalisability. The SPPCSPC layer adds multiple MaxPool layers, avoiding the image processing operations that lead to image distortion while also fusing feature information at different levels. Finally, the fused feature information is passed to the detection head and the results are output. Evaluated on the MS COCO test-dev 2017 dataset, YOLOv7-E6 achieved 55.9% AP and 73.5% AP50 with an input size of 1280 pixels at 50 FPS on an NVIDIA V100 (Nvidia, Santa Clara, CA, USA) [36].
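To illustrate what the CBS and MPConv layers amount to, the PyTorch sketch below shows a plausible Conv-BatchNorm-SiLU block and an MPConv-style downsampling block that concatenates a max-pooled branch with a strided-convolution branch; the exact channel configurations of YOLOv7 differ, so this is illustrative only.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU: the basic convolutional block of the backbone."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MPConvLike(nn.Module):
    """Illustrative MPConv-style block: fuse a MaxPool branch with a strided-conv branch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.pool_branch = nn.Sequential(nn.MaxPool2d(2, 2), CBS(c_in, c_out // 2, k=1))
        self.conv_branch = nn.Sequential(CBS(c_in, c_out // 2, k=1),
                                         CBS(c_out // 2, c_out // 2, k=3, s=2))

    def forward(self, x):
        # both branches halve the spatial size; concatenation fuses their features
        return torch.cat([self.pool_branch(x), self.conv_branch(x)], dim=1)
```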

2.2. Attention Mechanism

The attention mechanism is a method for processing data information in machine learning [37]. In the field of machine learning, there is a large amount of data information to be processed; however, only a small portion of this data information is often important, which makes the attention mechanism particularly important.
Introducing an attention mechanism into the YOLOv7 network not only provides more detailed information about the target region, but also substantially improves the representational ability of the network model, thereby reducing the interference of most invalid information and improving the detection performance of the network.
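The paper does not specify which attention variant is used, so the sketch below shows one common choice, a squeeze-and-excitation channel attention block, as an example of a module that could be inserted between backbone stages of YOLOv7.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: reweight feature channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average over the spatial dimensions
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # emphasise informative channels, suppress the rest
```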

2.3. DeepSORT Algorithm

The DeepSORT algorithm is optimised and improved on the basis of the SORT algorithm [38] but still follows the tracking-by-detection paradigm, taking target feature information extracted by deep learning as the matching criterion. The first step is target detection: DeepSORT relies on a target detector to determine the target position in each frame of the video, and the detector output usually includes the bounding box and category of the target. Next, feature extraction is carried out using deep learning models to extract the appearance features of the object, which are essential for re-identifying the target. Matching and tracking then involve calculating the similarity between detection and prediction boxes and using the Hungarian algorithm to find the best match. There is also a special cascade-matching mechanism: the cost matrix is composed of the cosine distance matrix calculated between track features and detection-box features, and the Mahalanobis distance matrix calculated between prediction boxes and detection boxes. After matching against the detection results, any unmatched target is defined as a new target and assigned a new ID number. This improves the accuracy of matching; after cascade matching, an occluded target can be recovered and the number of ID switches when a target is occluded again is reduced. Finally, there is trajectory management, in which DeepSORT maintains a trajectory for each target and initialises new trajectories for newly detected targets. The overall flow is shown in Figure 2.
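A minimal sketch of the matching step described above: an appearance (cosine) distance and a motion (Mahalanobis) distance are combined into one cost matrix and solved with the Hungarian algorithm via SciPy. The gating threshold and weighting below are illustrative rather than the exact DeepSORT defaults.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_distance(track_feats, det_feats):
    """1 - cosine similarity between L2-normalised appearance feature vectors."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return 1.0 - t @ d.T

def match_tracks(track_feats, det_feats, maha_dist, weight=0.98, gate=9.4877):
    """Combine appearance and motion costs, gate implausible pairs, run Hungarian matching."""
    cost = weight * cosine_distance(track_feats, det_feats) + (1.0 - weight) * maha_dist
    cost[maha_dist > gate] = 1e5  # gate out pairs that are inconsistent with the motion model
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1e5]
    unmatched = sorted(set(range(det_feats.shape[0])) - {c for _, c in matches})
    return matches, unmatched  # unmatched detections are assigned new track IDs
```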

2.4. Target Detection Experiment

The experimental results obtained by using YOLO for target detection are shown in Figure 3. The objects on the screen are identified and localised, and the target objects are extracted. The result graph shows that the confidence level of the target object is very high. This confidence contains two aspects: the probability that the bounding box contains an object, and the accuracy of the bounding box. The former is recorded as Pr(object)—Pr(object) = 1 when there is a target, and Pr(object) = 0 when there is not—while the accuracy of the bounding box is measured by the IoU (intersection over union) of the predicted box and the ground-truth box. The confidence level is displayed after the name of the predicted object for easy observation. This demonstrates that the deep learning approach can be used for target recognition, where the loss function for training is composed of CIoU, objectness and classification terms:
$$ L\left(t_{p}, t_{gt}\right) = \sum_{k=0}^{K} \alpha_{k}^{balance} \left[ \alpha_{box} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{kij}^{obj} L_{CIoU} + \alpha_{obj} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{kij}^{obj} L_{obj} + \alpha_{cls} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{kij}^{obj} L_{cls} \right] $$
where L is the loss function; K, S² and B are the number of output feature maps, the number of cells, and the number of anchors on each cell, respectively; α is the corresponding weight, with default values α_box = 0.05, α_obj = 0.7 and α_cls = 0.3; 1_{kij}^{obj} indicates whether the j-th anchor box of the i-th cell on the k-th output feature map is a positive sample (1 if it is, 0 otherwise); t_p and t_gt are the prediction and ground-truth vectors; α_k^{balance} balances the weights of the output feature maps at each scale, with default values [4.0, 1.0, 0.4] corresponding in turn to the 80 × 80, 40 × 40 and 20 × 20 feature maps; and L_CIoU, L_obj and L_cls are the CIoU, objectness and classification losses, respectively.
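Both the confidence target described above and the L_CIoU term depend on the overlap between a predicted box and a ground-truth box; a minimal sketch of the plain IoU computation (boxes given as (x1, y1, x2, y2) corners) is shown below for reference.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # coordinates of the intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```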

3. Sharpness Assessment

For a camera, an object at the focal point is the sharpest, while the sharpness of objects on either side of the focal point changes with their distance from it: the further from the focal point, the blurrier the object, as shown in Figure 4. The position of the sharpness peak is the most difficult to determine, because a locally high sharpness value may still have higher values on either side of it. In this paper, several experiments are conducted to find this point, i.e., the focus position, and a large number of measurements are taken on both sides of it to confirm that this position is indeed the peak of the sharpness score. Based on this characteristic of the camera, distance measurement through object sharpness is proposed. In the special case where two objects have the same sharpness value, the object that appears larger in the image must be closer to the lens, while the smaller one is further away, as shown in Figure 5. The predicted curve relating sharpness to the distance from the object to the lens is shown in Figure 6. To further explore this relationship and obtain the corresponding curve function, deep learning is used.
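A trivial sketch of the size-based tie-break mentioned above: when two detections of the same kind of object have (near-)equal sharpness, the one with the larger bounding box is taken to be closer to the lens. The tolerance value is an illustrative assumption.

```python
def closer_of_two(det_a, det_b, sharpness_tol=0.01):
    """Each detection is (sharpness_score, (x1, y1, x2, y2)).
    Returns the detection assumed closer when sharpness values are effectively equal."""
    (s_a, box_a), (s_b, box_b) = det_a, det_b
    if abs(s_a - s_b) > sharpness_tol:
        return None  # sharpness differs: use the sharpness-distance curve instead
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return det_a if area(box_a) > area(box_b) else det_b
```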

3.1. Image Sharpness Assessment

In previous work, many datasets have been built for assessing image quality. For example, the LIVE database [39] is a colour image database developed by the University of Texas at Austin in 2006. It includes 779 distorted images with difference mean opinion scores (DMOS) from 161 observers, giving image quality scores ranging from 1 to 100. The AVA dataset [40] contains more than 250,000 images, each with semantic labels, style labels, and an aesthetic score; there are 66 semantic label categories and 14 style label categories, with scores on a 1–10 scale. The TID2013 database [41,42] is a colour image database developed by Tampere University of Technology in Finland in 2013. It consists of 25 reference images in BMP format, 3000 distorted images and 24 distortion types, subjectively evaluated by 985 observers, giving image quality scores ranging from 0 to 9. The NITS-IQA database [43] is composed of 405 distorted images and nine original images, subjectively evaluated by 162 observers, giving image quality scores ranging from 1 to 100.
The image sharpness predictor used in this paper is based on an improved Inception-v4 network [44], replacing the last layer of the CNN with a fully connected layer of 10 neurons followed by a softmax activation, which produces a score distribution over the range 1–10 for any given image. The network structure is shown in Figure 7. The initial weights of the CNN are obtained by training on the ImageNet dataset [45]. To obtain image quality assessment scores, training is conducted on a large number of annotated datasets, including the LIVE dataset and the TID2013 dataset. It is worth mentioning that the model is not based on a single-scale prediction: it makes multi-scale predictions and obtains the final score by combining them. To demonstrate the reliability of the sharpness predictor, this paper also validates it on an existing dataset by predicting the evaluation score distribution of the AVA dataset and generating histograms for comparison with the original score distribution. As shown in Figure 8, the predicted mean value distribution is very close to the real mean value distribution.
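The 10-neuron softmax head yields a probability distribution over the scores 1–10; a single sharpness score such as those quoted below is the expected value of that distribution, and multi-scale predictions are combined by averaging. A minimal sketch of this scoring step (the backbone itself is omitted):

```python
import numpy as np

def mean_score(softmax_probs):
    """Expected score of a 10-way softmax output over the values 1..10."""
    probs = np.asarray(softmax_probs, dtype=np.float64)
    return float(np.sum(probs * np.arange(1, 11)))

def multiscale_score(per_scale_probs):
    """Combine predictions made at several input scales by averaging their expected scores."""
    return float(np.mean([mean_score(p) for p in per_scale_probs]))
```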
To this end, the same image was tested at different sharpness levels, and the experimental results are shown in Figure 9. It can be seen that when the image is sharpest the score is 4.84678, while the blurred image scores only 2.68806. Even for a very clear image, it is extremely difficult to reach a high score.

3.2. Image Sharpness Evaluation at Different Distances

For a fixed-focus camera shooting objects at different distances, the sharpness changes, but within a certain range the human eye can hardly distinguish the difference, so an intelligent method is needed. A data acquisition device was built in which the object and the detector are kept on the same horizontal line. The detector is an integrated photoelectric sensor and carrying platform (HP-DMA2186), as shown in Figure 10. The results obtained by collecting images of the same object at different distances and evaluating their sharpness are shown in Figure 11. The detector is set to fixed-focus mode for the experiment, with a focal length of 11 mm (the current focal length is displayed by the detector); the actual distance is measured with an infrared laser rangefinder, a fixed 2 m tape measure defines the step by which the object is moved each time, and the measurement results are obtained over several experiments. The final measurements show that the image of the object is sharpest at 5.172 m. PyCharm 2023 was used to draw the graph of distance against object sharpness, shown in Figure 12. The experimental curve is similar to the predicted curve: when the distance is less than 5.172 m the sharpness is lower, and when the distance is greater than 5.172 m the sharpness also decreases. In fact, at 0 m the object is pressed against the detector and the sharpness is 0; as the distance grows beyond 5.172 m, the sharpness slowly declines until the detector is no longer able to capture the object, at which point the sharpness of the object again becomes 0.
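The measured (distance, sharpness) pairs can then be fitted with a smooth peaked function to obtain the curve in Figure 12. The Gaussian-like form below is only an assumed shape for illustration (the paper does not state the functional form), and the sample arrays are placeholders rather than the recorded data.

```python
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def sharpness_model(d, s_max, d_focus, width):
    """Assumed peaked model: sharpness is highest at the focus distance and falls off on both sides."""
    return s_max * np.exp(-((d - d_focus) ** 2) / (2.0 * width ** 2))

# placeholder measurements for illustration only; replace with the recorded (distance, sharpness) pairs
d_meas = np.array([1.0, 3.0, 5.172, 7.0, 9.0, 11.0, 13.0])
s_meas = np.array([2.0, 3.8, 4.85, 4.75, 4.60, 4.50, 4.42])

params, _ = curve_fit(sharpness_model, d_meas, s_meas, p0=[4.85, 5.172, 5.0])
d_grid = np.linspace(0.0, 20.0, 200)
plt.plot(d_meas, s_meas, "o", label="measured")
plt.plot(d_grid, sharpness_model(d_grid, *params), label="fitted curve")
plt.xlabel("distance (m)")
plt.ylabel("sharpness score")
plt.legend()
plt.show()
```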

3.3. Sharpness-Based Ranging for Different Targets

To further test the feasibility of sharpness ranging, different objects in the same horizontal direction are evaluated; the experimental principle is shown in Figure 13. In an experiment with two target objects, the object on the left has a sharpness score of 4.73108 and the object on the right has a sharpness score of 4.74301. Here, the distance of the object on the right is the measured true value of 8.656 m, whereas the predicted distance of the object on the left is 8.870 m against a true distance of 8.833 m. The sharpness of another target in the horizontal direction of the centre target object is also evaluated: the sharpness of the target on the left is lower than that of the centre target object, because the left target is further from the camera than the centre target. The experimental results are shown in Figure 14. Subsequently, several groups of experiments were conducted, and the experimental data are shown in Table 1. The comparison across these groups shows that the error of sharpness-based distance measurement is very small. The distance of the other target object is derived from the curve of sharpness as a function of distance, and comparing the actual distance with the experimentally predicted distance shows that the error is very small. In addition, to further demonstrate the feasibility of sharpness ranging, comparative experiments were also carried out; the results are shown in Figure 15. The sharpness ranging proposed in this paper is closer to the actual distance, and the target detection accuracy is higher. A data comparison was also conducted, with the results shown in Table 2.
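Given the fitted curve, a sharpness score measured for a new target is converted back into a distance by inverting the model on the far side of the focus (the side on which the targets in Table 1 lie), and the relative error is computed as reported in the tables. The sketch below reuses the assumed Gaussian model from the previous snippet.

```python
import numpy as np

def distance_from_sharpness(score, s_max, d_focus, width, far_side=True):
    """Invert the assumed Gaussian sharpness model; far_side selects distances beyond the focus."""
    score = min(score, s_max)  # clamp to the model's peak value
    offset = width * np.sqrt(2.0 * np.log(s_max / score))
    return d_focus + offset if far_side else d_focus - offset

def relative_error_percent(predicted, actual):
    """Magnitude of error as reported in Table 1, e.g. about 0.5%."""
    return abs(predicted - actual) / actual * 100.0

# example: a sharpness score of 4.73108 is mapped to a predicted distance,
# which is then compared against the measured 8.833 m (parameters come from the fit above)
```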

3.4. Result Analysis

The objects detected in this paper have obvious features, and the more features an object has, the more accurately it can be identified and its sharpness measured. A chair, for example, has two armrests and a backrest. For this detector, the distance at which the target object is sharpest is 5.172 m, and the imaging of a target object away from this point becomes blurred.

4. Conclusions

In this paper, the target object was first located with object detection, and then its sharpness was evaluated with a CNN; the distance of the object was measured using an infrared laser rangefinder and a tape measure. The function curve between sharpness and distance was then drawn with PyCharm 2023, and the distances corresponding to different sharpness values were generated by the computer. The feasibility of this technique was verified through several experiments, and the comparison experiments show that it has a more accurate ranging ability.
For object ranging, sharpness-based ranging is undoubtedly a new approach, and it has a variety of advantages. Traditional laser and radar ranging are easily interfered with by other equipment, whereas visual ranging is not; in particular, the extensive use of solid-state lidar may fill a scene with signal pollution, affecting laser detection. Within visual ranging, the sharpness-based ranging proposed in this paper has several strengths. Firstly, it can measure the distance of unknown objects, whereas previous monocular and binocular ranging methods need the size of the object to be known in order to calculate the distance. Secondly, it requires little computation, whereas current 3D scene reconstruction must reconstruct the environment around the object to measure distance, which requires a great deal of calculation. Thirdly, its measurement accuracy is high: as shown in this paper, ranging with the YOLO-based method yields relatively large errors, because the size of the target object is assumed to be a fixed value before calculation while its real size is unknown, which leads to a large error in the final ranging. Compared with other visual ranging methods, it can therefore be better applied in transportation, military and other related fields.

Author Contributions

Conceptualization, W.D. and C.Z.; methodology, W.D.; software, Y.J.; validation, W.D., C.Z. and Y.J.; formal analysis, W.D.; investigation, Y.J.; resources, W.D. and C.Z.; data curation, Y.J.; writing—original draft preparation, Y.J.; writing—review and editing, Y.J.; visualization, Y.J.; supervision, W.D.; project administration, W.D. and C.Z.; funding acquisition, W.D. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen Science and Technology Program (Grant No. ZDSYS20200811143600001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, Y.; Meng, Z.; Liu, J.; Jiang, H. High-precision infrared pulse laser ranging for active vehicle anti-collision application. In Proceedings of the International Conference on Electric Information & Control Engineering, Wuhan, China, 15–17 April 2011. [Google Scholar]
  2. Xie, X.S.; Fang, Y.W.; Wu, Y.F. Infrared laser ranging in auto adaptive cruise control system. Laser Technol. 2004, 28, 521–523. [Google Scholar]
  3. IEEE Staff. 2010 Intl. Symposium on Spread Spectrum Techniques and Applications; IEEE: Piscataway, NJ, USA, 2010. [Google Scholar]
  4. Parviainen, J.; Lopez, M.A.V.; Pekkalin, O.; Hautamaki, J.; Collin, J.; Davidson, P. Using Doppler Radar and MEMS Gyro to Augment DGPS for Land Vehicle Navigation. In Proceedings of the IEEE Control Applications, St. Petersburg, Russia, 8–10 July 2009. [Google Scholar]
  5. Barford, L. Parallel Transition Localization. In Proceedings of the Instrumentation and Measurement Technology Conference (I2MTC), 2010 IEEE, Austin, TX, USA, 3–6 May 2010; pp. 176–180. [Google Scholar]
  6. Kuo, Y.C.; Pai, N.S.; Li, Y.F. Vision-based vehicle detection for a driver assistance system. Comput. Math. Appl. 2011, 61, 2096–2100. [Google Scholar] [CrossRef]
  7. Yan, L.; Hao-Xue, L. Research on lengthways vehicle distance measurement system of monocular photograph based on computer vision. J. Highw. Transp. Res. Dev. 2004, 21, 103–106. [Google Scholar]
  8. Varuna, D.S.; Jamie, R.; Ahmet, K. Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots. Sensors 2018, 18, 2730. [Google Scholar] [CrossRef]
  9. Tippetts, B.J.; Lee, D.J.; Archibald, J.K. An on-board vision sensor system for small unmanned vehicle applications. Mach. Vis. Appl. 2012, 23, 403–415. [Google Scholar] [CrossRef]
  10. Frag, A.L.; Yu, X.R.; Yi, W.J.; Saniie, J. Indoor Navigation System for Visually Impaired People using Computer Vision. In Proceedings of the 2022 IEEE International Conference on Electro Information Technology (eIT), Mankato, MN, USA, 19–21 May 2022; Volume 19–21, pp. 257–260. [Google Scholar]
  11. Li, S.; Zhao, Q. Research on the Emergency Obstacle Avoidance Strategy of Intelligent Vehicles Based on a Safety Distance Model. IEEE Access 2023, 11, 7124–7134. [Google Scholar] [CrossRef]
  12. Nunes, D.; Fortuna, J.; Damas, B.; Ventura, R. Real-time Vision Based Obstacle Detection in Maritime Environments. In Proceedings of the 2022 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), Santa Maria da Feira, Portugal, 29–30 April 2022; Volume 29–30, pp. 243–248. [Google Scholar]
  13. Shi, Z.; Li, Z.; Che, S.; Gao, M.; Tang, H. Visual Ranging Based on Object Detection Bounding Box Optimization. Appl. Sci. 2023, 13, 10578. [Google Scholar] [CrossRef]
  14. Zhu, M.; Yu, L.; Wang, Z.; Ke, Z.; Zhi, C. Review: A Survey on Objective Evaluation of Image Sharpness. Appl. Sci. 2023, 13, 2652. [Google Scholar] [CrossRef]
  15. Zhu, M.L.; Ge, D.Y. Image quality assessment based on deep learning with FPGA implementation. Signal Process. Image Commun. 2020, 83, 115780. [Google Scholar] [CrossRef]
  16. Ma, D.; Wen, H.; Li, X.; Xie, T.; Li, X. A sub-regional compression method for greenhouse images based on CNN image quality assessment. J. Food Process. Preserv. 2022, 46, e16992. [Google Scholar] [CrossRef]
  17. Yang, X.; Li, F.; Liu, H. TTL-IQA: Transitive transfer learning based no-reference image quality assessment. IEEE Trans. Multimed. 2021, 23, 4326–4340. [Google Scholar] [CrossRef]
  18. Liu, Z.; Hong, H.; Gan, Z.; Wang, J.; Chen, Y. An Improved Method for Evaluating Image Sharpness Based on Edge Information. Appl. Sci. 2022, 12, 6712. [Google Scholar] [CrossRef]
  19. Li, X.; He, S. Blind Image Quality Evaluation Method Based on Cyclic Generative Adversarial Network. IEEE Access 2024, 12, 40555–40568. [Google Scholar] [CrossRef]
  20. Yang, R.; Yu, S.Y.; Yao, Q.H.; Huang, J.M.; Ya, F.M. Vehicle Distance Measurement Method of Two-Way Two-Lane Roads Based on Monocular Vision. Appl. Sci. 2023, 13, 3468. [Google Scholar] [CrossRef]
  21. Huang, T.Y.; Yang, X.J.; Xiang, G.H.; Chen, L. Study on small target pedestrian detection and ranging based on monocular vision. Comput. Sci. 2023, 30, 94–99. [Google Scholar]
  22. Zhang, Y.; Gong, Y.; Chen, X. Research on YOLOv5 Vehicle Detection and Positioning System Based on Binocular Vision. World Electr. Veh. J. 2024, 15, 62. [Google Scholar] [CrossRef]
  23. Zhang, C.; Guo, C.; Li, Y. Research on Aircraft Door Identification and Position Method Based on Improved YOLOv5. Comput. Meas. Cont. 2024, 1–9. Available online: http://kns.cnki.net/kcms/detail/11.4762.TP.20230816.1145.022.html (accessed on 16 August 2023).
  24. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  25. Ji, Y.; Cao, Y.; Cheng, X.; Zhang, Q. Research on the Application of Helmet Detection Based on YOLOv4. J. Comput. Commun. 2022, 10, 129–139. [Google Scholar] [CrossRef]
  26. Contributors, M. YOLOv7 by MMYOLO. Available online: https://github.com/open-mmlab/mmyolo/tree/main/configs/yolov7 (accessed on 15 August 2023).
  27. Lan, W.; Dang, J.; Wang, Y.; Wang, S. Pedestrian detection based on YOLO network model. In Proceedings of the 2018 IEEE International Conference on Mechatronics and Automation (ICMA), Jilin, China, 5–8 August 2018; pp. 1547–1551. [Google Scholar]
  28. Hsu, W.Y.; Lin, W.Y. Adaptive fusion of multi-scale YOLO for pedestrian detection. IEEE Access 2021, 9, 110063–110073. [Google Scholar] [CrossRef]
  29. Dazlee, N.M.A.A.; Khalil, S.A.; Abdul-Rahman, S.; Mutalib, S. Object detection for autonomous vehicles with sensor-based technology using yolo. Int. J. Intell. Syst. Appl. Eng. 2022, 10, 129–134. [Google Scholar] [CrossRef]
  30. Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge YOLO: Real-time intelligent object detection system based on edge-cloud cooperation in autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360. [Google Scholar] [CrossRef]
  31. Shinde, S.; Kothari, A.; Gupta, V. YOLO-based human action recognition and localization. Procedia Comput. Sci. 2018, 133, 831–838. [Google Scholar] [CrossRef]
  32. Ashraf, A.H.; Imran, M.; Qahtani, A.M.; Alsufyani, A.; Almutiry, O.; Mahmood, A.; Attique, M.; Habib, M. Weapons detection for security and video surveillance using CNN and YOLO-v5s. Comput. Mater. Contin. 2022, 70, 2761–2775. [Google Scholar]
  33. Zheng, Y.; Zhang, H. Video Analysis in Sports by Lightweight Object Detection Network under the Background of Sports Industry Development. Comput. Intell. Neurosci. 2022, 2022, 3844770. [Google Scholar] [CrossRef]
  34. Ma, H.; Celik, T.; Li, H. Fer-yolo: Detection and classification based on facial expressions. In Proceedings of the Image and Graphics: 11th International Conference, ICIG 2021, Proceedings, Part I 11, Haikou, China, 26–28 December 2021; pp. 28–39. [Google Scholar]
  35. Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H. Designing Network Design Strategies Through Gradient Path Analysis. arXiv 2022, arXiv:2211.04800. [Google Scholar]
  36. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  37. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  38. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
  39. Sheikh, H.R.; Wang, Z.; Cormack, L.; Bovik, A.C. LIVE Image Quality Assessment Database Release 2. 2005. Available online: http://live.ece.utexas.edu/research/quality (accessed on 8 September 2006).
  40. Murray, N.; Marchesotti, L.; Perronnin, F. AVA: A large-scale database for aesthetic visual analysis. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2408–2415. [Google Scholar]
  41. Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Jin, L.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Color image database TID2013: Peculiarities and preliminary results. In Proceedings of the 4th European Workshop on Visual Information Processing (EUVIP), Paris, France, 10–12 June 2013; pp. 106–111. [Google Scholar]
  42. Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Jin, L.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. A New Color Image Database TID2013: Innovations and Results. In Advanced Concepts for Intelligent Vision Systems: Proceedings of the 15th International Conference, ACIVS 2013, Poznań, Poland, 28–31 October 2013; Springer International Publishing: Berlin/Heidelberg, Germany, 2013; Volume 8192, pp. 402–413. [Google Scholar]
  43. Ruikar, J.; Chaudhury, S. NITS-IQA Database: A New Image Quality Assessment Database. Sensors 2023, 23, 2279. [Google Scholar] [CrossRef]
  44. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
  45. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
Figure 1. ELAN module.
Figure 2. DeepSORT flowchart.
Figure 3. Target detection and target object region extraction map.
Figure 4. Sharpness of objects near the focal point.
Figure 5. Objects of the same sharpness.
Figure 6. Plot of predicted sharpness as a function of distance.
Figure 7. Convolutional neural network structure.
Figure 8. Comparison plots of assessment scores using the CNN Prediction AVA dataset.
Figure 9. Same picture with different resolution contrasts.
Figure 10. Detector structure and appearance.
Figure 11. Comparison of object sharpness at different distances.
Figure 12. Plot of sharpness of actual results as a function of distance.
Figure 13. Contrast in sharpness of different objects at the same level.
Figure 14. Results of different object-ranging experiments. (a) the overall area, (b) the area where the target object is located.
Figure 15. Comparison of experimental results. (1) Combining YOLOV5 to achieve monocular distance measurement; (2) Monocular distance measurement achieved by combining specific convolutional neural network.
Table 1. Results of multiple ranging experiments.

Sharpness Score | Predicted Distance | Actual Distance | Magnitude of Error
4.73108 | 8.870 m | 8.833 m | ≈0.5%
4.52128 | 11.391 m | 11.404 m | ≈0.1%
4.41614 | 13.756 m | 13.802 m | ≈0.3%
4.37261 | 15.793 m | 15.719 m | ≈0.5%
4.24381 | 18.692 m | 18.987 m | ≈1.6%
Table 2. Comparison of experimental results of different methods.

Experimental Method | Actual Value | Predicted Value | Magnitude of Error | Actual Value | Predicted Value | Magnitude of Error
Sharpness ranging method | 8.833 m | 8.870 m | ≈0.5% | 11.404 m | 11.391 m | ≈0.1%
Sharpness ranging method | 8.656 m | 8.561 m | ≈1.1% | 11.316 m | 11.291 m | ≈0.2%
Distance measuring method combined with YOLO | 8.833 m | 7.65 m | ≈13% | 11.404 m | 9.49 m | ≈16%
Distance measuring method combined with YOLO | 8.656 m | 7.47 m | ≈14% | 11.316 m | 9.52 m | ≈16%
Monocular distance measurement combined with CNN | 8.833 m | 8.01 m | ≈0.9% | 11.404 m | 10.31 m | ≈1.0%
Monocular distance measurement combined with CNN | 8.656 m | 7.89 m | ≈0.8% | 11.316 m | 10.07 m | ≈1.1%