Abstract
The growing number of vehicles on the roads has resulted in several challenges, including increased accident rates, fuel consumption, pollution, travel time, and driving stress. However, recent advancements in intelligent vehicle technologies, such as sensors and communication networks, have the potential to revolutionize road traffic and address these challenges. In particular, the concept of platooning for autonomous vehicles, where they travel in groups at high speeds with minimal distances between them, has been proposed to enhance the efficiency of road traffic. To achieve this, it is essential to determine the precise position of vehicles relative to each other. Global positioning system (GPS) devices have an inherent positioning error that might increase due to various conditions, e.g., the number of available satellites, nearby buildings, trees, driving into tunnels, etc., making it difficult to compute the exact relative position between two vehicles. To address this challenge, this paper proposes a new architectural framework to improve positioning accuracy using images captured by onboard cameras. It presents a novel algorithm and performance results for vehicle positioning based on GPS and video data. This approach is decentralized, meaning that each vehicle has its own camera and computing unit and communicates with nearby vehicles.
1. Introduction
We live in a constantly developing world, where cities are becoming increasingly crowded and the number of cars keeps growing. Traffic becomes more and more congested because the road infrastructure can no longer cope with the increasing number of vehicles. This means more fuel consumption, more pollution, longer journeys, stressed drivers, and, most importantly, an increase in the number of accidents. Pedestrians, cyclists, and motorcyclists are the most exposed to road accidents. According to a report from 2017 by the World Health Organization, every year, 1.25 million people die in road accidents, and millions more are injured [1]. The latest status report from 2023 [2] indicates a slight decrease in the number of road traffic deaths to 1.19 million per year, highlighting the positive impact of efforts to enhance road safety. However, it underscores that the cost of mobility remains unacceptably high. The study described in [3] tracked the progress of reducing the number of car accidents since 2010 in several cities. It concluded that very few of the studied cities are improving road safety at a pace that will reduce road deaths by 50% by 2030, in line with the United Nations’ road safety targets.
Autonomous vehicles can avoid some errors made by drivers, and they can improve the flow of traffic by controlling their pace so that traffic stops oscillating. They are equipped with advanced technologies such as global positioning systems (GPS), video cameras, radars, light detection and ranging (LiDARs), and many other types of sensors. They can travel together, exchanging information about travel intentions, detected hazards and obstacles, etc., through vehicle-to-vehicle (V2V) or vehicle-to-everything (V2X) communication networks.
To increase the efficiency of road traffic, the idea of grouping autonomous vehicles into platoons through information exchange was proposed in [4]. Vehicles should consider all available lanes on a given road sector when forming a group and travel at high speeds with minimal safety distances between them. However, this is possible only if a vehicle can determine its precise position with respect to other traffic participants.
In recent years, image processing and computer vision techniques have been widely applied to solve various real-world problems related to traffic management, surveillance, and autonomous driving. In particular, the detection of traffic participants such as vehicles, pedestrians, and bicycles [5] plays a crucial role in many advanced driver assistance systems (ADAS) and smart transportation applications.
Image processing is important in optimizing traffic by being used to develop functionalities that reduce the number of accidents, increase traffic comfort, and group vehicles into platoons. Several approaches have been proposed over time to detect traffic participants with video cameras using convolutional neural networks [6,7], but most of them require a significant amount of computing power and cannot be used in real-time due to increased latency.
In this paper, a proof of concept algorithm to solve a part of the image-based vehicle platooning problem is proposed. It uses a decentralized approach, with each vehicle performing its own computing steps and determining its position with respect to the nearby vehicles. This approach relies on images acquired by the vehicle’s cameras and the communication between vehicles. To test our approach, we used cheap commercial dashboard cameras equipped with a GPS sensor. No other sensors were used, mainly because they would have greatly increased the hardware cost. Each vehicle computes an image descriptor for every frame in the video stream, which it sends as a message along with other GPS information to other vehicles. Vehicles within communication range receive this message and attempt to find the frame in their own stream that most closely resembles the received one. The novelty of this approach lies in calculating the distance between the two vehicles by matching image descriptors computed for frames from both vehicles, determining the time difference at which the two frames were captured, and considering the traveling speeds of vehicles.
The rest of the paper is organized as follows. Section 2 presents some vehicle grouping methods for traffic optimization, then reviews applications of image processing related to street scenes, and, lastly, presents several image descriptors. The method proposed in this paper is detailed in Section 3, while in Section 4, the implementation of the algorithm is presented. In Section 5, preliminary results are presented, demonstrating the feasibility of the proposed algorithm. Finally, Section 6 presents the main conclusions of this study and directions for future research.
2. Related Work
The aim of this study is to describe a system architecture for positioning nearby vehicles using image processing techniques. The related work section is divided into three subsections: traffic optimization, street scene image processing, and image descriptors, as detailed in the following subsections.
2.1. Traffic Optimization
The majority of driver errors can be avoided by autonomous vehicles. They can reduce traffic oscillations by maintaining a safe distance, while exchanging information in real-time. Single-lane platooning solutions, proposed in [8,9], prove that vehicle platooning improves traffic safety and increases the capacity of existing roads.
To further increase traffic flow, ref. [4] extends the idea of single-lane platoons to multi-lane platoons [10,11,12]. Vehicles should consider all available lanes on a given road sector when creating a group of vehicles and travel with small distances between them at high speeds. The platoon should be dynamic to allow new vehicles to join or leave the group, be able to overcome certain obstacles encountered on the road, and also allow faster-moving vehicles to overtake. Through the vehicle-to-vehicle communication network, they exchange information about travel intentions, dangers, and detected obstacles.
Vehicle movement control is divided into two parts: lateral control and longitudinal control. Lateral control is in charge of changing lanes, whereas longitudinal control is in charge of actions in the current lane. These two, when combined, must ensure that vehicle collisions are avoided. Maintaining formations and joining new members is made easier with lateral control via a lane change solution. At the same time, longitudinal control is used to keep a safe distance between vehicles. As such, the triangle formation strategy inspired by [13] is usually chosen because it offers several advantages, such as the stability of each member within the group and quick regrouping of the members in case of a dissolving scenario.
The way platoons are formed is based on the fields of swarm robotics and flocking, which are inspired by nature, more precisely, by the way fish, birds [14], insects, and mammals interact in a group [15,16]. An individual possesses limited abilities, but within a group, individuals contribute to the formation of complex collective behaviors, thus providing flexibility and robustness in tasks such as route planning and task allocation.
2.2. Street Scene Image Processing
Vehicle detection plays a very important role in modern society, significantly impacting transportation efficiency, safety, and urban planning. It optimizes traffic flow by refining signal timings and reducing congestion, as evidenced in [17]. Moreover, advancements in vehicle detection technology have facilitated features like automatic collision avoidance and pedestrian detection, contributing to a decrease in accidents [5].
Law enforcement also benefits from vehicle detection systems, aiding in tasks like license plate identification and stolen vehicle tracking [18]. Additionally, these systems support efficient parking management by monitoring parking spaces and guiding drivers to available spots [19].
Regarding the topic of advanced driver-assistance systems (ADAS), recent studies provide comprehensive reviews of vision-based on-road vehicle detection systems [20,21]. These systems, mounted on vehicles, face challenges in handling the vast amounts of data from traffic surveillance cameras and necessitate real-time analysis for effective traffic management.
Addressing challenges in traffic monitoring requires precise object detection and classification, accurate speed measurement, and interpretation of traffic patterns. Techniques proposed in studies offer efficient approaches for detecting cars in video frames, utilizing image processing methods [22].
While vision-based solutions have made significant advances in the automotive industry, they remain vulnerable to adverse weather conditions [23]. Weather elements like heavy rain, fog, or low lighting can potentially impact the accuracy and reliability of these systems, thus necessitating further research and development efforts.
Moreover, communication between vehicles is an emerging area of research, as highlighted in [24]. Understanding and optimizing vehicle-to-vehicle communication techniques are essential for enhancing road safety and traffic efficiency. Implementing robust communication protocols can facilitate cooperative driving strategies, leading to smoother traffic flow and reduced congestion levels.
2.3. Image Descriptors
Image descriptors are essential in computer vision and image processing. They extract robust and distinctive features from images for tasks like matching, recognition, and retrieval. Rather than processing the entire image, these techniques focus on specific key points. Each key point is associated with a descriptor that describes its properties. Examples of such descriptors are provided below.
2.3.1. Scale-Invariant Feature Transform (SIFT) Descriptor
The scale-invariant feature transform (SIFT) [25] is a widely used feature descriptor in computer vision and image processing. It extracts distinctive features using the scale-space extrema detection and the difference in the Gaussian (DoG) method. SIFT provides invariance to image scaling, rotation, and robustness against changes in viewpoint, illumination, and occlusion.
The algorithm consists of several steps [26]:
- Scale-space extrema detection: this step identifies potential points of interest that are invariant to orientation using the difference of Gaussians (DoG) function.
- Key point localization: the algorithm establishes the location and scale of key points to measure their stability.
- Contrast threshold: following the selection of key points, the algorithm sets a contrast threshold to ensure stability. By considering the DoG function as a contrast function, key points with a DoG value less than 0.03 (after normalizing the intensity to the range [0, 1]) are excluded from the list.
- Eliminating edge responses: the next step in key point localization involves the elimination of edge responses, achieved by employing the Hessian matrix derived from the DoG function. The DoG function has a strong response along edges, where the principal curvature across the edge is large while the curvature along it is small, so such poorly localized key points must be removed. This is achieved by estimating the 2 × 2 Hessian matrix H at the location and scale of the key point and examining its eigenvalues. Let r denote the ratio of the largest to the smallest magnitude eigenvalue; the quantity $\mathrm{Tr}(H)^2/\mathrm{Det}(H) = (r+1)^2/r$ reaches its minimum value when the two eigenvalues are equal and increases as r increases. Therefore, to ensure that the ratio of principal curvatures stays below a threshold r, it is sufficient to verify that $\mathrm{Tr}(H)^2/\mathrm{Det}(H) < (r+1)^2/r$. In rare cases, the curvatures have different signs and the determinant is negative, in which case the key point is discarded as well. In practice, key points with a ratio between the principal curvatures greater than 10 are disregarded.
- Orientation assignment: local image gradient directions are assigned to each key point position.
- Key point descriptor: descriptors are obtained from the region surrounding each key point, incorporating local image gradients and scale information to represent significant shifts in light and local shape distortions.
2.3.2. Sped-Up Robust Feature (SURF) Descriptor
The sped-up robust feature (SURF) descriptor [27] is a faster and more efficient alternative to SIFT for feature extraction in computer vision and image processing. It is based on Haar wavelet responses and the determinant of the Hessian matrix. SURF achieves comparable performance to SIFT in matching and recognition tasks while significantly improving processing efficiency. Unlike SIFT, which utilizes the difference of Gaussian (DoG) technique to approximate the Laplacian of Gaussian (LoG), SURF employs box filters. This approach offers computational advantages, as box filters can be efficiently computed, and calculations for different scales can be performed simultaneously.
To handle orientation, SURF calculates Haar wavelet responses in both the x and y directions within a circular neighborhood of radius 6s around each key point, where s is the scale at which the key point was detected. The dominant orientation is determined by summing the responses within a sliding orientation window.
For feature extraction, a 20s × 20s neighborhood is extracted around each key point, oriented along the dominant direction and divided into 4 × 4 cells. For each cell, the Haar wavelet responses and their absolute values are summed, and the resulting values from all cells are concatenated to form a 64-dimensional feature descriptor.
The SURF algorithm’s implementation involves the following key steps [28]:
- Identifying salient features like blobs, edges, intersections, and corners in specific regions of the integral image. SURF utilizes the fast Hessian detector for feature point detection.
- Utilizing descriptors to characterize the surrounding neighborhood of each feature point. These feature vectors must possess uniqueness while remaining robust to errors, geometric deformations, and noise.
- Assigning orientation to key point descriptors by calculating Haar wavelet responses across image coordinates.
- Ultimately, SURF matching is conducted using the nearest-neighbor approach.
2.3.3. Oriented FAST and Rotated BRIEF (ORB) Descriptor
The oriented FAST and rotated BRIEF (ORB) descriptor [29] is an efficient algorithm for feature extraction in computer vision and image processing. It combines the FAST key point detector [30] with the binary robust independent elementary features (BRIEF) descriptor [31] and introduces rotation invariance. This makes ORB robust to image rotations and enhances its performance in matching and recognition tasks [32]. It achieves comparable performance to other popular descriptors like SIFT and SURF while being significantly faster in computation time.
The ORB method utilizes a simple measure for corner orientation, namely the intensity centroid [33]. First, the moments of a patch are defined as follows:

$$m_{pq} = \sum_{x,y} x^{p} y^{q} I(x, y)$$

With these moments, the centroid, also known as the ‘center of mass’ of the patch, can be determined as follows:

$$C = \left( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} \right)$$

One can construct a vector from the corner’s center O to the centroid C. The orientation of the patch is then provided as follows:

$$\theta = \operatorname{atan2}(m_{01}, m_{10})$$

After calculating the orientation of the patch, it can be rotated to a canonical position, enabling the computation of the descriptor and ensuring rotation invariance. The BRIEF descriptor is a bit string built from a set of binary intensity tests; since BRIEF lacks rotation invariance, ORB employs rotation-aware BRIEF (rBRIEF), integrating this feature while maintaining the speed advantage of BRIEF:

$$f_{n}(p) = \sum_{1 \le i \le n} 2^{i-1}\, \tau(p; x_i, y_i)$$

where the binary test τ is defined as follows:

$$\tau(p; x, y) = \begin{cases} 1 & \text{if } p(x) < p(y) \\ 0 & \text{otherwise} \end{cases}$$

and p(x) is the intensity of the smoothed patch p at a point x.

The steered BRIEF operator is obtained as follows:

$$g_{n}(p, \theta) = f_{n}(p) \mid (x_i, y_i) \in S_{\theta}$$

where S_θ denotes the set of binary test locations rotated by the patch orientation θ.
2.3.4. Boosted Efficient Binary Local Image Descriptor (BEBLID)
The boosted efficient binary local image descriptor (BEBLID) [34] is a newer binary descriptor that encodes intensity differences between neighboring pixels. It provides an efficient representation for feature matching and recognition. BEBLID enhances the performance of the real-valued descriptor, boosted efficient local image descriptor (BELID) [35], improving both matching efficiency and accuracy.
The proposed algorithm assumes that there is a training dataset consisting of pairs of image blocks, denoted as $\{(x_i, y_i)\}_{i=1}^{N}$, labeled with $l_i \in \{-1, 1\}$. Here, $l_i = 1$ indicates that the two blocks belong to the same image structure, while $l_i = -1$ indicates that they are different. The objective is to minimize the following loss using AdaBoost:

$$L = \sum_{i=1}^{N} \exp\left( -\gamma\, l_i \sum_{k=1}^{K} \alpha_k\, h_k(x_i)\, h_k(y_i) \right)$$

where $\gamma$ represents the learning rate parameter. The function $h_k$ corresponds to the k-th weak learner (WL), combined with its weight $\alpha_k$. The WL depends on the feature extraction function f and threshold T, defined as follows:

$$h(z; f, T) = \begin{cases} +1 & \text{if } f(z) \le T \\ -1 & \text{if } f(z) > T \end{cases}$$

The BEBLID feature extraction function is defined as follows:

$$f(z; p_1, p_2, s) = \frac{1}{s^2} \left( \sum_{q \in R(p_1, s)} I(q) - \sum_{r \in R(p_2, s)} I(r) \right)$$

where $I(q)$ represents the gray value of pixel q, and $R(p, s)$ is the square image box centered at p, with size s. Thus, f computes the difference between the average gray values of the pixels in $R(p_1, s)$ and $R(p_2, s)$ and thresholds it. To output binary values, $h = -1$ is represented as 0 and $h = +1$ as 1, resulting in the BEBLID binary descriptor.
BEBLID has demonstrated superior performance compared to other state-of-the-art descriptors such as SIFT [25], SURF [27], ORB [29], and convolutional neural networks (CNNs) [36] in terms of speed, accuracy, and robustness.
Compared to CNNs, a powerful deep learning approach for feature extraction and recognition, BEBLID is a lightweight and efficient alternative, ideal for low-resource applications. It also offers easier interpretation and debugging due to its binary string representation. Hence, this paper utilizes the advantages and performance of BEBLID as the chosen algorithm.
3. Precise Localization Algorithm
In this paper, an algorithm is proposed to help vehicles position themselves with respect to other nearby vehicles. The approximate distance between two nearby vehicles is computed using GPS data. The exact relative position between two vehicles cannot be computed using only GPS data because all commercial GPS devices have an inherent positioning error [37]. This error might increase further according to various specific conditions, like the number of available satellites, nearby buildings, trees, driving into tunnels, etc. For example, when using a smartphone, the GPS error can be, on average, as much as 4.9 m [38]. Such errors can lead to potentially dangerous situations if any relative vehicle positioning system relies solely on GPS data. For this reason, the aim of this paper is to increase the positioning accuracy using images captured by cameras mounted on each vehicle. Thus, the proposed solution aims to find two similar frames from different vehicles within a certain distance range. Each vehicle sends information about multiple consecutive frames while also receiving similar information from other vehicles for local processing. By using an algorithm to match image descriptors calculated based on these frames, a high number of matches indicates that the vehicles are in relatively the same position. Using the timestamps associated with the two frames, we can determine the moment each vehicle was in that position, allowing us to calculate the distance between them by considering their traveling speed and the time difference between the two.
The proposed approach is decentralized, meaning that each vehicle acts as an independent entity. It has to handle the information exchange with the other vehicles, as well as the processing of both self-acquired and received data. In our model, vehicles employ a V2X communication system to broadcast information, but they will also use a V2V communication model if the distance to a responding nearby vehicle is below a pre-defined threshold (Figure 1). Each vehicle will broadcast processed information without being aware of whether any other vehicle will receive it.
Figure 1.
Vehicle communication. First, each vehicle broadcasts data in a V2X communication model (blue circles). Depending on the computed distance between two vehicles, they can start V2V communication (orange arrows).
The proposed algorithm assumes that each vehicle is equipped with an onboard camera with GPS and a computing unit. The GPS indicates the current position of the vehicle in terms of latitude and longitude, as well as the corresponding timestamp and the vehicle’s speed. The vehicle computing unit handles all computations and communications, so it processes the data and sends it to the other vehicles. The processing unit also receives data from other vehicles that are nearby. Based on the received information, the processing unit must determine whether a V2V communication can start between the two vehicles. If it can, it will begin an information exchange with the other vehicle and will process the subsequent received data. This means that each vehicle has two roles: the first one involves data processing and communication, while the second involves receiving messages from other nearby vehicles and analyzing them. As the paper does not focus on the communication model itself but rather on the image processing part, we employed a very simple and straightforward communication model. This model cannot be used in real-world applications, where security, compression, and other factors must be taken into consideration. Our main focus when developing the communication model was on the processing steps required from the image processing point of view. The send and receive roles are described in the following subsections.
3.1. Message Transmission Procedure
To avoid sending large amounts of irrelevant data between vehicles, a handshake system must be defined first. This will prevent congestion in any communication technique. The handshake system allows all vehicles to broadcast their GPS position. As a short-range communication system is assumed, only nearby vehicles will receive this message. Any nearby vehicle that receives this broadcast message will compute the distance between the sending vehicle and itself, and if the distance is lower than a threshold, it will send back a start communication message. The distance threshold should be around 15 to 20 m, which takes into consideration both the GPS errors and the minimum safety distance between two vehicles. Also, note that, as most GPS systems record data once per second and the video records at a much greater rate, synchronization between the GPS coordinates and each frame must be performed. In other words, if the camera records 30 frames per second, it means that 30 frames will have the same GPS coordinate attached. For example, a car driving at 60 km/h travels approximately 16.7 m during this one-second interval. Thus, over these approximately 17 m, all messages sent by the vehicle will carry the same GPS position, which can lead to potentially dangerous situations if data from images is not taken into consideration.
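As an illustration of this handshake step, the sketch below computes the great-circle distance between a received GPS broadcast and the local position and decides whether to answer with a start-communication message. The function and field names, as well as the 20 m threshold, are illustrative assumptions rather than the exact implementation used in this work.

```python
import math

DIST_THRESHOLD_M = 20.0  # upper end of the 15-20 m handshake threshold (assumed value)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    earth_radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius * math.asin(math.sqrt(a))

def should_start_v2v(own_fix, received_fix):
    """Answer a V2X broadcast with a start-communication message only if the
    sender is within the handshake distance threshold."""
    d = haversine_m(own_fix["lat"], own_fix["lon"],
                    received_fix["lat"], received_fix["lon"])
    return d <= DIST_THRESHOLD_M
```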
After establishing that the two vehicles are close enough and that image data has to be sent between them, the next question to be answered is exactly which data should be exchanged. Sending the entire video stream is unfeasible due to high bandwidth requirements, so the most straightforward approach is to compute key points for each frame. Then, for each detected key point, a descriptor is computed. All these steps are presented in the flowchart illustrated in Figure 2. Once the descriptors have been computed, every piece of information related to the current frame is serialized and sent to other paired vehicles. This includes the timestamp, latitude, longitude, speed, key points, and descriptors.
Figure 2.
Send message architecture overview.
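To make the content of such a message concrete, the following sketch detects ORB key points, computes BEBLID descriptors (available in opencv-contrib-python as cv2.xfeatures2d.BEBLID_create), and serializes the frame information listed above. The dictionary layout, serialization format, and function name are illustrative assumptions; only the detector/descriptor combination mirrors the one used later in the experiments.

```python
import cv2
import pickle

orb = cv2.ORB_create(nfeatures=10000)          # key point detector
beblid = cv2.xfeatures2d.BEBLID_create(0.75)   # descriptor (requires opencv-contrib-python)

def build_frame_message(frame_bgr, gps_fix, frame_no):
    """Detect key points, compute descriptors, and serialize one frame message."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    key_points = orb.detect(gray, None)
    key_points, descriptors = beblid.compute(gray, key_points)
    message = {
        "timestamp": gps_fix["timestamp"],
        "lat": gps_fix["lat"],
        "lon": gps_fix["lon"],
        "speed": gps_fix["speed"],
        "frame_no": frame_no,  # index of the frame within its GPS second (1..30)
        # cv2.KeyPoint objects are not picklable, so keep only their basic attributes
        "keypoints": [(kp.pt, kp.size, kp.angle) for kp in key_points],
        "descriptors": descriptors,
    }
    return pickle.dumps(message)  # payload to broadcast/send to paired vehicles
```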
3.2. Message Reception Procedure
After passing the handshake system described before, each vehicle will only receive messages from nearby vehicles. The message will contain data about GPS positioning and processed image data. The steps these data will take are presented in Figure 3 and described in detail in the following subsection.
Figure 3.
Receive message architecture overview.
Each vehicle will also have its own video stream that is processed locally. This means that, for each frame, the key points and their descriptors are computed. These will have to be matched against the key points and descriptors received from other vehicles. There are various algorithms developed for feature matching, but the most used ones are brute-force matcher (BFMatcher) [39] and fast library for approximate nearest neighbors (FLANN) [40]. Brute-force matcher matches one feature descriptor from the first set with all features in the second set using distance computation to find the closest match. FLANN is an optimized library for fast nearest neighbor search in large datasets and high dimensional features, which works faster than BFMatcher for large datasets and requires two dictionaries specifying the algorithm and its related parameters.
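A minimal matching sketch along these lines uses OpenCV's BFMatcher with the Hamming norm for binary descriptors (ORB, BEBLID) and a FLANN-based matcher for SIFT; the ratio-test value of 0.8 and the FLANN parameters are common heuristics, not values taken from the paper.

```python
import cv2

# Hamming brute-force matcher for binary descriptors (ORB, BEBLID)
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
# FLANN matcher (KD-tree index) for floating-point descriptors such as SIFT
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

def match_descriptors(desc_local, desc_received, matcher=bf, ratio=0.8):
    """Match the local descriptors against the received ones and keep only
    matches that pass Lowe's ratio test (filters ambiguous correspondences)."""
    good = []
    for pair in matcher.knnMatch(desc_local, desc_received, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good
```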
The matching algorithm will output the matched descriptors, which are the descriptors that correspond to two matched pixels in the input images. Usually, filtering is carried out to remove the outliers, i.e., points that have the same descriptors and do not correspond to matching pixels in the input images. Having the two sets of matched descriptors, the next step is to determine the relative position between them. In other words, at this point, a set of matched points is available, but the location of the points from the received image (their descriptors) in the current vehicle’s frame is unknown. This will determine where the two vehicles are positioned with respect to each other: if the points are located in the center of the image, it means that the two vehicles are in the same lane, one in front of the other. If the points are located to the side of the image, it means that the two vehicles are on separate lanes, close to each other. In other words, if corresponding matched points are located on the right side of the first image and on the left side of the second image, then the first vehicle is on the right side of the second vehicle.
One way to determine the points’ relative position is to compute a homography matrix that transforms a point in the first image into a point in the second image. Once the homography matrix is computed, it can be applied to the points from the first image to see where those points are in the second image. In this way, the two vehicles can be relatively positioned in relation to each other.
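The sketch below illustrates this idea: it estimates a homography with RANSAC from the matched point coordinates, projects the points of the received frame into the local frame, and reduces the result to a normalized left/right offset. The offset heuristic and the RANSAC reprojection threshold are illustrative assumptions, not part of the described method.

```python
import cv2
import numpy as np

def relative_lateral_offset(pts_received, pts_local, frame_width):
    """Project the received frame's matched points into the local frame and
    summarize their horizontal position as a value in [-1, 1] (0 = image centre)."""
    if len(pts_received) < 4:
        return None                     # findHomography needs at least 4 correspondences
    src = np.float32(pts_received).reshape(-1, 1, 2)
    dst = np.float32(pts_local).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    projected = cv2.perspectiveTransform(src, H).reshape(-1, 2)
    mean_x = float(projected[:, 0].mean())
    # Values near 0 suggest the vehicles share a lane; a clear positive or
    # negative bias suggests they travel on adjacent lanes.
    return (mean_x - frame_width / 2) / (frame_width / 2)
```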
Of course, it might happen that the frame received from the other vehicle corresponds to an older or newer frame in the receiving vehicle’s own stream. In these cases, multiple comparisons between the received frame and the local frames must be performed. By combining all these pieces of information, the vehicle can detect the location of each nearby vehicle accurately and efficiently, and the exact steps are described in detail in the following section regarding the algorithm implementation.
4. Framework for Algorithm Implementation
The following section details the implementation of the algorithm, which involves several steps: resizing the image dimensions, extracting the camera-displayed time, simulating the communication process, detecting the corresponding frame, defining the distance calculation formula, and outlining the hardware equipment utilized.
4.1. Adjust Image Dimensions
Considering that the camera captures a significant part of the car dashboard (see, for example, Figure 4), this aspect can negatively influence the image-matching algorithm. It is important to note that this particular area is not relevant for the intended purposes. Furthermore, in that area, information from the camera is displayed, such as the camera name, current date, and speed. These elements can also affect the performance of the used image descriptors. For this reason, the decision was made to crop out the respective area from the original image, thus eliminating irrelevant information and retaining only the essential data for the intended purposes. The cropped area corresponds to the region beneath the red line in Figure 4, and this approach enables greater precision in image matching and enhances the algorithm’s performance regarding the specific intended objectives.
Figure 4.
Cropping out non-relevant image areas: enhancing data relevance and algorithm efficiency.
4.2. Extract Camera Displayed Time
It is necessary to consider that most GPS systems record data once per second, while video recording is carried out at a much higher rate. Therefore, meticulous synchronization between GPS coordinates and each video frame is required. To put it simply, if the camera records 30 frames per second, it means that 30 frames will have the same GPS coordinate attached. To address this issue and ensure the accuracy of our data, optical character recognition (OCR) technology was utilized to extract time information from the images provided by the camera. We specified the exact area where the time is displayed in the image and assigned the corresponding GPS coordinates to that moment. This synchronization process has allowed us to ensure that GPS data are accurately correlated with the corresponding video frames, which is essential for the subsequent analysis and understanding of our information.
To extract text from the image, we used Tesseract version 5.3.0, an open-source optical character recognition engine [41]. Tesseract is renowned for its robustness and accuracy in converting images containing text into editable content.
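The time-extraction step could look like the following sketch, which crops the on-screen clock region and reads it with pytesseract (a Python wrapper around the Tesseract engine). The ROI coordinates and the character whitelist are illustrative assumptions that depend on the specific camera overlay layout.

```python
import cv2
import pytesseract

# Region of the frame where the camera renders its clock (x1, y1, x2, y2).
# These coordinates are illustrative and depend on the camera model/overlay layout.
TIME_ROI = (20, 1400, 400, 1440)

def read_overlay_time(frame_bgr):
    """Read the camera-rendered timestamp so that GPS fixes can be attached
    to the exact frames they belong to."""
    x1, y1, x2, y2 = TIME_ROI
    roi = cv2.cvtColor(frame_bgr[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
    # Binarize to make the overlay text easier for Tesseract to read.
    roi = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    text = pytesseract.image_to_string(
        roi, config="--psm 7 -c tessedit_char_whitelist=0123456789:/-")
    return text.strip()
```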
4.3. Simulation of Vehicle-to-Vehicle Communication
In our system, a vehicle extracts key points from each frame to obtain significant information about the surrounding environment. Additionally, information is retrieved from the GPS system through a dedicated parsing function, providing the latitude, the longitude, the vehicle speed, and details about the number of lanes.
To ensure the exchange of information with other involved vehicles, a function was developed to simulate message transmission. As stated in Section 3, our paper focuses mainly on image processing and not on the communication itself. This is why we use a simulation model, in order to prove the feasibility of the proposed method. The function that simulates the communication is responsible for transmitting the processed data to other vehicles. This vehicle-to-vehicle communication is simulated through a file where the information is stored and later read.
To receive and process messages from other vehicles, a function for message reception is utilized. This simple function reads information from the specific file and extracts relevant data to make decisions and react appropriately within our vehicle communication system.
Overall, this architecture enables us to successfully collect, transmit, and interpret data to facilitate efficient communication and collaboration among vehicles in our project.
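A file-based stand-in for the V2V channel, in the spirit of the simulation described above, might look like the sketch below; the mailbox file name and the length-prefixed framing are illustrative choices and not the exact format used in this work. The payload is assumed to be the serialized (pickled) frame message.

```python
import pickle

MAILBOX = "v2v_mailbox.bin"   # file standing in for the V2V channel (simulation only)

def send_message(serialized_msg, mailbox=MAILBOX):
    """Simulated transmission: append one length-prefixed message to the mailbox file."""
    with open(mailbox, "ab") as f:
        f.write(len(serialized_msg).to_bytes(4, "big") + serialized_msg)

def receive_messages(mailbox=MAILBOX):
    """Simulated reception: read back every message written by the other vehicle."""
    messages = []
    with open(mailbox, "rb") as f:
        while header := f.read(4):
            messages.append(pickle.loads(f.read(int.from_bytes(header, "big"))))
    return messages
```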
4.4. Detection of the Corresponding Frame
To detect the frame received from another vehicle within the current vehicle’s frame sequence, we used an approach relying on the number of matches between the two frames. Thus, the frame with the highest number of matches in the video recording was selected.
When the received frame information is searched for in a video, the number of matches increases as the search approaches the corresponding frame (as observed in Figure 5). Essentially, the closer the current vehicle gets to the position where that frame was captured, the more similar the images become. With some exceptions that will be discussed at the end of the paper, this approach proved valid during all our tests.
Figure 5.
Observing frame proximity in video analysis: closer vehicle positioning correlates with increased image similarity and match frequency.
Implementing this algorithm involved monitoring the number of matches for each frame and retaining the frame with the most matches. An essential aspect was identifying two consecutive decreases in the number of matches, at which point the frame with the highest number of matches up to that point was considered the corresponding frame.
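A compact sketch of this stopping rule is given below: it scans the local frames, tracks the best match count, and stops after two consecutive decreases. The match_fn argument stands for any descriptor-matching routine (for example, one along the lines of Section 3.2) and is an assumption of this sketch.

```python
def find_corresponding_frame(received_descriptors, local_frame_descriptors, match_fn):
    """Scan the local frames in order, track the best match count, and stop
    after two consecutive decreases (the peak has been passed)."""
    best_idx, best_count = None, -1
    prev_count, consecutive_drops = None, 0
    for idx, local_desc in enumerate(local_frame_descriptors):
        count = len(match_fn(local_desc, received_descriptors))
        if count > best_count:
            best_idx, best_count = idx, count
        if prev_count is not None and count < prev_count:
            consecutive_drops += 1
            if consecutive_drops == 2:
                break
        else:
            consecutive_drops = 0
        prev_count = count
    return best_idx, best_count
```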
4.5. Compute Distance
After detecting the corresponding frame in the video stream, the next step involves computing the distance between vehicles and determining their positions. This can be achieved using information from the two vehicles associated with these frames, such as the timestamp and speed.
Thus, if the timestamp from the current vehicle is greater than that of the other vehicle, it indicates that the latter is in front of the current one. The approach to comparing the two timestamps is presented in Figure 6. By knowing the speed of the front vehicle and the time difference between the two vehicles, the distance between them is computed.
Figure 6.
Determining neighboring vehicle relative position using matched frame timestamps.
Conversely, if the timestamp from the current vehicle is smaller than that of the other vehicle, it suggests that the latter is behind the current one. With the speed of the current vehicle and the time difference between the two vehicles, the distance between them can still be computed.
Given that the video operates at a frequency of 30 frames per second and GPS data is reported every second, each of the 30 frames contains the same set of information. However, this uniformity prevents the exact determination of distance because both frame 1 and frame 30 will have the same timestamp despite an almost 1-s difference between the two frames.
To enhance the accuracy of distance computation between the two vehicles, adjustments are made to the timestamp for the frames from both vehicles. In addition to other frame details, the frame number reported with the same timestamp (ranging from 1 to 30) is transmitted. In the distance computation function, the timestamp is adjusted by adding the current frame number divided by the total number of frames (30). For instance, if the frame number is 15, 0.5 s are added to the timestamp.
In Figure 7, the method of computing the distance, assuming that Vehicle 1 is in front and Vehicle 2 is behind, is detailed. Frame V1 from Vehicle 1, which is the x-th frame at timestamp T1, is detected by Vehicle 2 as matching frame V2, which is the y-th frame at timestamp T2. To determine its position relative to Vehicle 1, the other vehicle needs to compute the distance traveled by the first vehicle in the time interval from timestamp T1 to the current timestamp T2, taking into account its speed.
Figure 7.
Distance estimation between two vehicles through frame matching and timestamp comparison.
To compute the distance as accurately as possible, the speed reported at each timestamp is considered, and the calculation formula is presented in Equation (16). Since frame V1 is the x-th frame at timestamp T1, and considering that there are 30 frames per second, the time remaining until the end of second T1 is (30 - x)/30 s. This time interval is multiplied by the speed at timestamp T1 to determine the distance traveled in this interval. The distance traveled from timestamp T1 + 1 to T2 - 1 is determined by multiplying the speed reported at each of these timestamps by 1 s. To determine the distance traveled from T2 up to the y-th frame, the speed at T2 is multiplied by y/30 s. By summing all these distances, the total distance is obtained:

$$d = v_{T_1} \cdot \frac{30 - x}{30} + \sum_{t = T_1 + 1}^{T_2 - 1} v_t \cdot 1\,\mathrm{s} + v_{T_2} \cdot \frac{y}{30} \qquad (16)$$

where $v_t$ denotes the speed reported at timestamp t and the two fractional terms are expressed in seconds.
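Assuming the GPS speeds are available in metres per second and indexed by integer timestamps, Equation (16) can be evaluated with a few lines of code; the container name below is illustrative.

```python
FPS = 30  # camera frame rate

def platooning_distance(x, t1, y, t2, speed_mps):
    """Evaluate Equation (16): distance travelled by the front vehicle between
    the x-th frame of GPS second t1 and the y-th frame of GPS second t2 (t2 > t1).
    `speed_mps[t]` is assumed to hold the speed, in m/s, reported at second t."""
    distance = speed_mps[t1] * (FPS - x) / FPS                 # remainder of second t1
    distance += sum(speed_mps[t] for t in range(t1 + 1, t2))   # full seconds in between
    distance += speed_mps[t2] * y / FPS                        # elapsed fraction of second t2
    return distance
```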
4.6. Hardware Used
For the developed solution, two DDPAI MOLA N3 cameras were utilized, each featuring a 2K resolution and operating at a frame rate of 30 frames per second. These cameras feature built-in GPS functionality that accurately records the vehicle’s location and speed. The advantage of these cameras lies in their GPS data storage format, which facilitates the seamless retrieval of this information. Cameras with identical resolutions were selected to ensure consistency, as not all image descriptors maintain scale invariance, which could otherwise affect algorithm performance.
5. Performed Experiments and Test Results
Based on the implementation presented in the previous section, a series of tests were conducted to demonstrate both the feasibility of the algorithm and its performance. The performances of the BEBLID, ORB, and SIFT descriptors were tested, as well as how the number of features influences frame detection. Finally, a comparison between the distance calculated by the proposed algorithm, the one calculated based on GPS data, and the measured distance is presented to illustrate the preciseness of the proposed algorithm in real-world applications. This comparison shows that the algorithm reflects a high degree of accuracy when validated against physically measured distances, which demonstrates its potential effectiveness for applications requiring precise distance calculations, e.g., vehicle platooning applications.
5.1. Test Architecture
To prove the feasibility and robustness of the proposed algorithm, we conducted various tests in real-world scenarios. For the first test, a vehicle equipped with a dashboard camera was used to make two passes on the same streets, resulting in two video recordings. The main purpose of this test was to determine which descriptors work best and whether the proposed system performs well when the errors caused by using multiple cameras are eliminated. For this purpose, for a frame extracted from the first video (left picture in Figure 8), we had to find the corresponding frame in the second video (right picture in Figure 8). Additionally, the performances of three different descriptors were compared, namely SIFT, ORB, and BEBLID, in terms of matching accuracy and speed. This comparison allows us to evaluate the strengths and limitations of each descriptor in the context of our experiment.
Figure 8.
Two corresponding frames from two video sequences.
For the selected frame and each frame in the second video, the following steps were performed:
- In total, 10,000 key points were detected using the ORB detector for each frame.
- Based on these key points, the descriptors were computed, and the performances of SIFT, ORB, and BEBLID descriptors were compared.
- For feature matching, the brute-force descriptor matcher was used for ORB and BEBLID, which are binary descriptors. This technique compares binary descriptors efficiently by calculating the Hamming distance. As for SIFT, a floating-point descriptor, the FLANN descriptor matcher, was employed. FLANN utilizes approximate nearest neighbor search techniques to efficiently match floating-point descriptors.
- The frame with the highest number of common features with the selected frame from the first video is considered as its corresponding frame in the second video. This matching process is based on the similarity of visual features between frames, allowing us to find the frame in the second video that best corresponds to the reference frame from the first video. In Figure 8, an example of the identified corresponding frame is presented.
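For reference, the detector, descriptor, and matcher combinations compared in this experiment could be configured with OpenCV roughly as follows (BEBLID requires opencv-contrib-python); the exact parameter values shown here are illustrative.

```python
import cv2

N_FEATURES = 10000                                     # key points per frame in this experiment

orb_detector = cv2.ORB_create(nfeatures=N_FEATURES)   # same detector for all three descriptors

descriptors = {
    "SIFT":   cv2.SIFT_create(),                       # 128 floating-point values
    "ORB":    cv2.ORB_create(nfeatures=N_FEATURES),    # 32-byte binary descriptor
    "BEBLID": cv2.xfeatures2d.BEBLID_create(0.75),     # 64-byte binary descriptor
}

matchers = {
    "SIFT":   cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50)),
    "ORB":    cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True),
    "BEBLID": cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True),
}

def count_matches(gray_a, gray_b, name):
    """Number of matches between two frames for the chosen descriptor."""
    kps_a = orb_detector.detect(gray_a, None)
    kps_b = orb_detector.detect(gray_b, None)
    _, desc_a = descriptors[name].compute(gray_a, kps_a)
    _, desc_b = descriptors[name].compute(gray_b, kps_b)
    return len(matchers[name].match(desc_a, desc_b))
```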
The SIFT descriptor is considered a gold-standard reference but requires a significant amount of computational power. It has a feature vector of 128 values. This descriptor managed to match the reference frame with frame 136 from the second video with a total of 3456 matches (Figure 9a). The ORB descriptor is considered one of the fastest algorithms and has a feature vector of 32 values. It also successfully detected frame 136 from the second video with 2178 matches, as shown in Figure 9b. According to the authors, the BEBLID descriptor achieves results similar to SIFT and surpasses ORB in terms of accuracy and speed, with a feature vector of 64 values. However, in our specific test case, the BEBLID descriptor managed to detect frame 136 but with a lower number of matches, specifically 2064 (as shown in Figure 9c) compared to ORB. This discrepancy could be attributed to the specific conditions of our experiment, such as variations in lighting, perspective, or the content of the frames.

Figure 9.
Comparison of Matching Results for SIFT, ORB, and BEBLID.
As a result of this first experiment, all three considered descriptors successfully matched two frames from different videos recorded with the same camera, even if their content is slightly different. Such differences, for example variations in traffic conditions, will also occur when video sequences from different vehicles are used.
5.2. Descriptor Performance Test
The underlying concept of this test involved the deployment of two vehicles equipped with dashboard cameras driving on the same street. As they progressed, the cameras recorded footage, resulting in two distinct videos.
Using these video recordings, the objective of the test was to identify 50 consecutive frames from the leading vehicle within the footage captured by the trailing vehicle. Each frame from the first video was compared with 50 frames from the second video, and the frame with the highest number of matches was taken into consideration. This aimed to ascertain the algorithm’s capability to consistently detect successive frames, thus showcasing its robustness. Furthermore, a secondary aim was to evaluate the performance of the three descriptors used in the process. In Figure 10a, one of the frames from the car in front (left) and the matched frame from the rear car are presented (right). In the frame on the right, the front car is also visible.
Figure 10.
Two matched frames from vehicles.
In Table 1, the results of the three descriptors for a total of 20,000 features are presented. BEBLID successfully detected 39 frames correctly, with instances of incorrect detections shown in blue in the table. These incorrect detections exhibit a minor deviation by detecting a frame either preceding or following the actual frame, which poses no significant concern.
Table 1.
Results of descriptor analysis with color indications: blue for incorrect detections, red for frames detected from behind, and orange for differences exceeding 1 frame.
Note that the frames shown in blue in Table 1 might be caused by the fact that a perfect synchronization between the frames of the two videos cannot be accomplished. For example, the first car, traveling at 21.9 km/h (6.083 m/s), records a frame approximately every 0.2 m (considering 30 frames per second). This sampling rate might, in our opinion, cause some of the slightly incorrect detections presented in blue in Table 1. This is why these cases are marked in blue: they might result from the sampling rate of the cameras and not from an actual error in the matching algorithm.
Similarly, ORB shows good performance by correctly detecting 38 frames. However, the performance of SIFT falls short of expectations, with only 23 out of 50 frames being detected accurately. Additionally, for SIFT, there are cases when it detected a frame from behind after the detection of a subsequent frame, indicated in red in the table. Moreover, the case when the difference between the correct frame and the predicted one is greater than 1 frame is highlighted in orange in the table. Another downside of using SIFT is that it has more cases with three consecutive detections of the same frame than the other two descriptors (BEBLID-0, ORB-1 (frame 45), SIFT-4 (frames 55, 63, 71, 73)). Also, when using SIFT, there are cases when two consecutive detected frames differ by three frames (frames 29, 58, 66 and 71), which is a case that was not encountered when using BEBLID or ORB.
Furthermore, it is noteworthy to highlight that a higher number of matches, as observed in the case of SIFT, does not necessarily translate to better performance. Despite BEBLID having a lower number of matches compared to the other two descriptors, it achieved the highest performance in this test.
These findings underscore the importance of not only relying on the quantity of matches but also considering the accuracy and robustness of the detection algorithm. In this context, BEBLID stands out as a promising descriptor for its ability to deliver reliable performance even with a comparatively lower number of matches.
It is worth mentioning that the frames written in orange and red are most likely errors in the detection algorithm and could lead to potentially dangerous situations if their number increases.
5.3. Influence of the Number of Features
In this test, the objective was to analyze the influence of the number of features associated with each descriptor on its performance. As the computational time increases with a higher number of features, we examined the performance of the three descriptors across a range of feature numbers, from 20,000 down to 5000.
The test methodology involved detecting 20 frames from the first video against 20 frames from the second video. This approach facilitated an assessment of how varying feature counts affected the accuracy and efficiency of frame detection for each descriptor.
As observed in Table 2, the number of matches per frame decreased as the number of features decreased. For BEBLID, if the number of features decreased from 20,000 to 10,000, the performance did not decline considerably. In fact, for a feature count of 16,000, we achieved the highest number of correctly detected frames, with 18 out of 20. However, if the number of features dropped below 10,000, performance deteriorated significantly.
Table 2.
BEBLID—The influence of the number of features on the frame detection: blue for incorrect detections, red for frames detected from behind.
The results for ORB can be observed in Table 3. For a feature count of 12,000 and 10,000, we achieved 16 out of 20 correctly detected frames. However, if the feature count dropped below 10,000, the performance deteriorated.
Table 3.
ORB—The influence of the number of features on the frame detection: blue for incorrect detections, and orange for differences exceeding 1 frame.
From Table 4, it is clear that the number of correctly detected frames varies depending on the number of features for SIFT. However, overall, this descriptor exhibits poor performance in all cases.
Table 4.
SIFT—The influence of the number of features on the frame detection: blue for incorrect detections, red for frames detected from behind, and orange for differences exceeding 1 frame.
Based on the outcomes of the last two tests, we can conclude that BEBLID generally achieves better results, the exception being the case with 12,000 features, where ORB detects 16 frames correctly compared to BEBLID’s 15 (see Figure 11). ORB also shows satisfactory results, whereas the performance of SIFT is not as commendable. For these reasons, only the BEBLID descriptor is used in further tests.
Figure 11.
Comparison of correct detection rates for BEBLID, ORB, and SIFT descriptors across different feature counts.
5.4. Distance Computation Test
The objective of this test was to evaluate the distance calculation algorithm and compare the distance calculated based on the proposed algorithm with the distance calculated based on GPS coordinates.
Thirty frames were used as a reference from the car in front, and attempts were made to detect them in the video stream from the car behind using the BEBLID descriptor. As can be observed in Table 5, the detected frames are mostly consecutive. Additionally, it is evident that the distance calculated based on GPS coordinates is significantly larger than the one calculated by the algorithm.
Table 5.
First Test—The computed distance between the vehicles (Car A in Front, Car B Behind) for 30 consecutive frames.
A second test was conducted by reversing the order of the two cars. In this test, the distance calculated based on GPS coordinates is smaller than the distance calculated by the proposed algorithm. These results are presented in Table 6.
Table 6.
Second Test—The computed distance between the vehicles (Reversed Order: Car B in Front, Car A Behind) for 30 consecutive frames.
5.5. Accuracy of the Computed Distance in a Real-World Scenario
The last test that was conducted aims to ascertain the accuracy of the distance between two vehicles computed by the proposed algorithm. For this, we use a simple but very effective real-world testing scenario. This scenario allows us to measure the exact distance between two vehicles and compare it with the computed distance using the presented approach.
For this test, we used two vehicles, each equipped with a video camera. The street where we recorded the videos was a one-lane street. First, we recorded the videos used by our positioning algorithm. Next, for the same frames for which the distance was computed, we positioned the cars in exactly the same locations and measured the exact distance using a measuring tape. During this test, the cars were traveling at speeds between 17.8 and 20.3 km/h.
We repeated this procedure twice, the only difference being that the car order was switched. The detected frames were in the same area in both cases. The results are presented in Table 7 and Table 8. In these tables, we included the frame from the first video (from the car in front), the detected frame in the video from the car behind, the distance computed relying solely on the GPS coordinates, the distance computed by the proposed algorithm, and the real measured distance.
Table 7.
First Test—Comparison between the measured distance and the computed distance for 3 frames (Car A in Front, Car B Behind).
Table 8.
Second Test—Comparison between the measured distance and the computed distance for 3 frames (Reversed Order: Car B in Front, Car A Behind).
The data presented in these two tables confirm the hypothesis that the distance computed only using the GPS coordinates presents a significant error compared to the real distance and should not be used in car platooning applications. Also, the data indicate that the distance computed using the proposed algorithm outperforms the GPS distance by a great margin, with small differences compared to the real distance. All the differences between the distances computed using the proposed algorithm and the real distances were under 1 m, compared to around 10 m using the GPS distance.
5.6. Limitations
In the conducted tests, we observed that for certain areas captured in the images, such as in Figure 12, the proposed algorithm detected very few matches between frames from the two cameras. This compromised the optimal functioning of the corresponding frame detection algorithm. As depicted in Figure 13, for frame 11, which should ideally have the highest number of matches, they amounted to only around 150. Due to the low number of matches, the algorithm fails to accurately identify the correct frame.
Figure 12.
Area of sparse matches detected by the algorithm.
Figure 13.
Correct frame detection.
One of the reasons for this issue could be the lower brightness in these areas, where descriptors may struggle to extract and match significant features between images, resulting in a reduced number of matches. Nevertheless, such cases can be labeled as failed detections and excluded from further vehicle platooning applications.
6. Conclusions
Increasing urbanization and vehicle density have led to escalating traffic congestion and a rise in road accidents. With millions of lives lost or injured annually, urgent measures are required to enhance road safety. This underscores the necessity for effective vehicle positioning algorithms to mitigate these challenges.
A robust vehicle positioning algorithm is crucial for effective traffic management and enhanced road safety. With the rising number of vehicles, there is an increased need to optimize traffic flow, minimize delays, and enable intelligent control systems. By accurately determining the position of vehicles, advanced functionalities can be developed to reduce the risk of accidents, improve commuting experiences, and facilitate efficient resource allocation. Implementing such an algorithm is vital for creating a safer and more efficient transportation system.
In this paper, an algorithm that accurately and robustly positions vehicles on a road with respect to other nearby vehicles was described. The algorithm follows a decentralized approach where each vehicle acts as an independent computational node and positions itself based on data received from the other nearby vehicles.
The decentralized approach proposed in this paper can use a short-range communication system with a very high bandwidth, but each vehicle requires high computational power to perform all the processing tasks in real time. A centralized approach (using cloud services, for example) can perform all the processing tasks in real time, but it highly depends on the communication between the vehicles and the server, mainly because each vehicle would have to send its entire video stream to the server.
Based on the results obtained for the various performed tests, it was proven that the novel approach proposed in this paper is efficient and can be used to increase the accuracy of the computed distance between vehicles.
For the first future research direction, the goal is to detect whether vehicles are in the same lane or in different lanes based on the relative position of the two matched descriptors. Another research direction involves implementing a centralized approach, where each vehicle sends data to a server that utilizes cloud computing to process all the data in real-time. This way, each vehicle will have a clearer understanding of vehicles that are not within the considered distance threshold. Furthermore, we plan to expand the experiments and conduct them at higher speeds once we find a suitable road that allows for this, aiming to ensure minimal interference and achieve more accurate results.
Author Contributions
Conceptualization, I.-A.B. and P.-C.H.; methodology, I.-A.B. and P.-C.H.; software, I.-A.B.; validation, I.-A.B., P.-C.H. and C.-F.C.; formal analysis, I.-A.B. and P.-C.H.; investigation, I.-A.B. and P.-C.H.; resources, I.-A.B. and P.-C.H.; data curation, I.-A.B.; writing—original draft preparation, I.-A.B.; writing—review and editing, I.-A.B., P.-C.H. and C.-F.C.; visualization, I.-A.B.; supervision, P.-C.H. and C.-F.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data is contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| GPS | Global Positioning System |
| LiDAR | Light Detection and Ranging |
| V2V | Vehicle-to-Vehicle |
| V2X | Vehicle-to-Everything |
| ADAS | Advanced Driver Assistance Systems |
| SIFT | Scale-Invariant Feature Transform |
| DoG | Difference of Gaussian |
| LoG | Laplacian of Gaussian |
| SURF | Speeded-Up Robust Feature |
| ORB | Oriented FAST and Rotated BRIEF |
| BRIEF | Binary Robust Independent Elementary Features |
| rBRIEF | Rotation-aware BRIEF |
| BELID | Boosted Efficient Local Image Descriptor |
| BEBLID | Boosted Efficient Binary Local Image Descriptor |
| CNN | Convolutional Neural Networks |
| BFMatcher | Brute-force Matcher |
| FLANN | Fast Library for Approximate Nearest Neighbors |
| OCR | Optical Character Recognition |
References
- World Health Organization. Save Lives: A Road Safety Technical Package; World Health Organization: Geneva, Switzerland, 2017; p. 60.
- World Health Organization. Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023.
- Forum, I.T. Monitoring Progress in Urban Road Safety; International Traffic Forum: Paris, France, 2018. [Google Scholar]
- Caruntu, C.F.; Ferariu, L.; Pascal, C.; Cleju, N.; Comsa, C.R. Connected cooperative control for multiple-lane automated vehicle flocking on highway scenarios. In Proceedings of the 23rd International Conference on System Theory, Control and Computing, Sinaia, Romania, 9–11 October 2019; pp. 791–796. [Google Scholar] [CrossRef]
- Sun, Y.; Song, J.; Li, Y.; Li, Y.; Li, S.; Duan, Z. IVP-YOLOv5: An intelligent vehicle-pedestrian detection method based on YOLOv5s. Connect. Sci. 2023, 35, 2168254. [Google Scholar] [CrossRef]
- Ćorović, A.; Ilić, V.; Ðurić, S.; Marijan, M.; Pavković, B. The Real-Time Detection of Traffic Participants Using YOLO Algorithm. In Proceedings of the 2018 26th Telecommunications Forum (TELFOR), Belgrade, Serbia, 20–21 November 2018; pp. 1–4. [Google Scholar] [CrossRef]
- Joshi, R.; Rao, D. AlexDarkNet: Hybrid CNN architecture for real-time Traffic monitoring with unprecedented reliability. Neural Comput. Appl. 2024, 36, 1–9. [Google Scholar] [CrossRef]
- Jia, D.; Lu, K.; Wang, J.; Zhang, X.; Shen, X. A Survey on Platoon-Based Vehicular Cyber-Physical Systems. IEEE Commun. Surv. Tutor. 2016, 18, 263–284. [Google Scholar] [CrossRef]
- Axelsson, J. Safety in Vehicle Platooning: A Systematic Literature Review. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1033–1045. [Google Scholar] [CrossRef]
- Yang, H.; Hong, J.; Wei, L.; Gong, X.; Xu, X. Collaborative Accurate Vehicle Positioning Based on Global Navigation Satellite System and Vehicle Network Communication. Electronics 2022, 11, 3247. [Google Scholar] [CrossRef]
- Kolat, M.; Bécsi, T. Multi-Agent Reinforcement Learning for Highway Platooning. Electronics 2023, 12, 4963. [Google Scholar] [CrossRef]
- Gao, C.; Wang, J.; Lu, X.; Chen, X. Urban Traffic Congestion State Recognition Supporting Algorithm Research on Vehicle Wireless Positioning in Vehicle–Road Cooperative Environment. Appl. Sci. 2022, 12, 770. [Google Scholar] [CrossRef]
- Lee, G.; Chong, N. Flocking Controls for Swarms of Mobile Robots Inspired by Fish Schools. In Recent Advances in Multi Robot Systems; InTechOpen: London, UK, 2008; pp. 53–68. [Google Scholar] [CrossRef]
- Reynolds, C.W. Flocks, Herds and Schools: A Distributed Behavioral Model. SIGGRAPH Comput. Graph. 1987, 21, 25–34. [Google Scholar] [CrossRef]
- Tan, Y.; Yang, Z. Research Advance in Swarm Robotics. Def. Technol. 2013, 9, 18–39. [Google Scholar] [CrossRef]
- Kennedy, J.; Eberhart, R.C.; Shi, Y. Swarm Intelligence. In The Morgan Kaufmann Series in Artificial Intelligence; Morgan Kaufmann: San Francisco, CA, USA, 2001. [Google Scholar] [CrossRef]
- Mandal, V.; Mussah, A.R.; Jin, P.; Adu-Gyamfi, Y. Artificial Intelligence-Enabled Traffic Monitoring System. Sustainability 2020, 12, 9177. [Google Scholar] [CrossRef]
- Sultan, F.; Khan, K.; Shah, Y.A.; Shahzad, M.; Khan, U.; Mahmood, Z. Towards Automatic License Plate Recognition in Challenging Conditions. Appl. Sci. 2023, 13, 3956. [Google Scholar] [CrossRef]
- Rafique, S.; Gul, S.; Jan, K.; Khan, G.M. Optimized real-time parking management framework using deep learning. Expert Syst. Appl. 2023, 220, 119686. [Google Scholar] [CrossRef]
- Tang, X.; Zhang, Z.; Qin, Y. On-Road Object Detection and Tracking Based on Radar and Vision Fusion: A Review. IEEE Intell. Transp. Syst. Mag. 2022, 14, 103–128. [Google Scholar] [CrossRef]
- Umair Arif, M.; Farooq, M.U.; Raza, R.H.; Lodhi, Z.U.A.; Hashmi, M.A.R. A Comprehensive Review of Vehicle Detection Techniques Under Varying Moving Cast Shadow Conditions Using Computer Vision and Deep Learning. IEEE Access 2022, 10, 104863–104886. [Google Scholar] [CrossRef]
- Kalyan, S.S.; Pratyusha, V.; Nishitha, N.; Ramesh, T.K. Vehicle Detection Using Image Processing. In Proceedings of the IEEE International Conference for Innovation in Technology, Bengaluru, India, 6–8 November 2020; pp. 1–5. [Google Scholar]
- Zhang, Y.; Carballo, A.; Yang, H.; Takeda, K. Perception and sensing for autonomous vehicles under adverse weather conditions: A survey. ISPRS J. Photogramm. Remote Sens. 2023, 196, 146–177. [Google Scholar] [CrossRef]
- Lu, S.; Shi, W. Vehicle Computing: Vision and challenges. J. Inf. Intell. 2023, 1, 23–35. [Google Scholar] [CrossRef]
- Lowe, D. Object recognition from local scale-invariant features. In Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. [Google Scholar] [CrossRef]
- Vaithiyanathan, D.; Manigandan, M. Real-time-based Object Recognition using SIFT algorithm. In Proceedings of the 2023 Second International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Tiruchirappalli, India, 5–7 April 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
- Sreeja, G.; Saraniya, O. Chapter 3—Image Fusion Through Deep Convolutional Neural Network. In Deep Learning and Parallel Computing Environment for Bioengineering Systems; Sangaiah, A.K., Ed.; Academic Press: Cambridge, MA, USA, 2019; pp. 37–52. [Google Scholar] [CrossRef]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G.R. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
- Rosten, E.; Drummond, T. Machine Learning for High-Speed Corner Detection. In Computer Vision—ECCV; Springer: Berlin/Heidelberg, Germany, 2006; pp. 430–443. [Google Scholar]
- Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary Robust Independent Elementary Features. In Computer Vision—ECCV; Springer: Berlin/Heidelberg, Germany, 2010; pp. 778–792. [Google Scholar]
- Wu, S.; Fan, Y.; Zheng, S.; Yang, H. Object tracking based on ORB and temporal-spacial constraint. In Proceedings of the IEEE 5th International Conference on Advanced Computational Intelligence, Nanjing, China, 18–20 October 2012; pp. 597–600. [Google Scholar] [CrossRef]
- Rosin, P.L. Measuring Corner Properties. Comput. Vis. Image Underst. 1999, 73, 291–307. [Google Scholar] [CrossRef]
- Suárez, I.; Sfeir, G.; Buenaposada, J.M.; Baumela, L. BEBLID: Boosted efficient binary local image descriptor. Pattern Recognit. Lett. 2020, 133, 366–372. [Google Scholar] [CrossRef]
- Suárez, I.; Sfeir, G.; Buenaposada, J.M.; Baumela, L. BELID: Boosted Efficient Local Image Descriptor. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; pp. 449–460. [Google Scholar] [CrossRef]
- Tian, Y.; Fan, B.; Wu, F. L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6128–6136. [Google Scholar] [CrossRef]
- Zhang, H.C.; Zhou, H. GPS positioning error analysis and outlier elimination method in forestry. Trans. Chin. Soc. Agric. Mach. 2010, 41, 143–147. [Google Scholar] [CrossRef]
- van Diggelen, F.; Enge, P.K. The World’s first GPS MOOC and Worldwide Laboratory using Smartphones. In Proceedings of the 28th International Technical Meeting of the Satellite Division of the Institute of Navigation (ION GNSS+ 2015), Tampa, FL, USA, 14–18 September 2015. [Google Scholar]
- OpenCV Modules. Available online: https://docs.opencv.org/4.9.0/ (accessed on 1 May 2024).
- Muja, M.; Lowe, D. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. VISAPP 2009, 1, 331–340. [Google Scholar]
- Tesseract OCR. Available online: https://github.com/tesseract-ocr (accessed on 1 May 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).