Article

Machine Vision-Based Method for Reconstructing the Vehicle Coordinate System in End-of-Line ADAS Calibration

1 School of Mechanical and Automotive Engineering, Fujian University of Technology, Fuzhou 350118, China
2 School of Transportation, Fujian University of Technology, Fuzhou 350118, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(17), 3405; https://doi.org/10.3390/electronics13173405
Submission received: 3 July 2024 / Revised: 15 August 2024 / Accepted: 16 August 2024 / Published: 27 August 2024
(This article belongs to the Section Electrical and Autonomous Vehicles)

Abstract

To ensure the accuracy and reliability of Advanced Driver Assistance Systems (ADAS), it is essential to perform offline calibration before the vehicles leave the factory. This paper proposes a method for reconstructing the vehicle coordinate system based on machine vision, which can be applied to the offline calibration of ADAS. Firstly, this study explains the preliminary preparations, such as the selection of feature points and the choice of camera model, combining actual application scenarios and testing requirements. Subsequently, the YOLO model is trained to identify and obtain feature regions, and feature point coordinates are extracted from these regions using template matching and ellipse fitting. Finally, a validation experiment is designed to evaluate the accuracy of this method using metrics such as the vehicle’s lateral and longitudinal offset distances and yaw angle. Experimental results show that, compared to traditional vehicle alignment platforms, this method improves reconstruction accuracy while reducing costs.

1. Introduction

Advanced Driver Assistance Systems (ADAS) are technologies that use various sensors installed on vehicles to collect and process internal and external information, which is then conveyed to the driver. This enables the driver to react quickly based on the information, thereby enhancing driving safety and comfort [1]. The rapid development of this technology is expected to support increasingly complex driving situations in the future, thus continuously raising the requirements for its reliability [2]. The integration and performance of multiple sensors directly determine the feasibility and safety of the ADAS [3,4,5,6]. Installation errors of these sensors can cause data deviations [7], which are further amplified during multi-sensor fusion. Therefore, calibration of the sensor installation positions is essential. In summary, for the ADAS to function properly, it must be calibrated before the vehicle leaves the factory [8,9], with sensor calibration being a crucial step in this process [10]. The standard installation positions of sensors are determined during the design phase and are unified within the vehicle coordinate system. Thus, the standard positions of sensors are described using the vehicle coordinate system, making the construction of the vehicle coordinate system an indispensable part of sensor calibration. In practice, the vehicle coordinate system is defined relative to a fixed installation frame during vehicle manufacturing. Once the vehicle leaves the installation frame, the vehicle coordinate system can no longer be determined using this method. Therefore, reconstructing the vehicle coordinate system is necessary before sensor calibration.
The calibration of ADAS is divided into three categories: beginning-of-line calibration, end-of-line calibration, and maintenance calibration. Beginning-of-line calibration refers to the preliminary calibration conducted during vehicle assembly to determine the installation positions of the sensors. Maintenance calibration refers to the recalibration performed in repair shops when a vehicle’s sensors are displaced or malfunction due to incidents such as collisions, ensuring the proper functioning of ADAS. End-of-line calibration, on the other hand, is conducted when the vehicle leaves the production line at specialized functional testing stations, where high-precision equipment is used to accurately calibrate all ADAS sensors. Compared to the other two types, end-of-line calibration is more rigorous, with each vehicle having a fixed calibration time. There are many devices on the market for maintenance calibration that can also reconstruct the vehicle coordinate system. However, these devices are often cumbersome to deploy and require time-consuming manual adjustments, making them unsuitable for fast-paced end-of-line inspection. The method proposed in this paper is aimed at end-of-line (offline) calibration of ADAS, enabling quick and accurate reconstruction of the vehicle coordinate system at fixed calibration stations. This study focuses on end-of-line calibration for Advanced Driver Assistance Systems (ADAS) and does not address other application scenarios.
According to ISO 8855:2011 [11], the vehicle coordinate system is a three-dimensional coordinate system used in vehicle design and engineering to define the positions of various points within the vehicle. It includes a coordinate origin and three coordinate axes, as shown in Figure 1. The X-axis is typically horizontal and points forward. The Y-axis is usually perpendicular to the vehicle’s longitudinal symmetry plane and points to the left. The Z-axis generally points upward. The origin O of the vehicle coordinate system is located at a vehicle reference point, which can be defined as the vehicle’s center of gravity, the center of suspension mass, the mid-point of the axis, or the center of the front axle, depending on the requirements of the analysis or test.
The vehicle alignment platform is currently the most advanced technology for vehicle coordinate system reconstruction and is widely adopted by major automotive manufacturers worldwide. It is used to perform the necessary vehicle coordinate system reconstruction during end-of-line calibration [12]. The structure of the platform is shown in Figure 2. At the beginning of the alignment process, the operator drives the vehicle onto the alignment platform until the front wheels are positioned in the “V” grooves. At this point, the axes of the left and right V-grooves are parallel to the front axle of the wheels, which is usually parallel to the Y-axis of the vehicle coordinate system, thereby establishing the Y-axis. Next, the operator exits the vehicle and pulls the lever to control the motor, which drives the screw rod to move the push rod outward. The push rod pushes against the inner sides of the left and right tires, aligning the vehicle body. At this point, the midline of the wheels coincides with the midline of the alignment device, which is parallel to the X-axis of the vehicle coordinate system, thus establishing the X-axis. Assuming equal tire pressure in all four tires and level ground at the wheel positions, the Z-axis direction is upward perpendicular to the horizontal plane. This determines the axes of the vehicle’s coordinate system. The origin is defined according to actual needs and the definitions of the alignment device manufacturer. Thus, after completing the alignment operation, the vehicle coordinate system is reconstructed. The precision of the vehicle alignment system is within a lateral offset of ±1 mm at the front/rear wheel axle center, a longitudinal offset of ±2 mm at the front axle center, and a yaw angle accuracy error of ±0.2°. The alignment error of this device is influenced by factors such as suspension, wheel load, and tire pressure variations. Moreover, using this device for vehicle alignment involves high construction costs, significant precision degradation over time, large space requirements, and lengthy alignment times. This device is typically an integral component of the complete ADAS offline calibration system, serving as an essential part of the process. However, current research predominantly focuses on the calibration methods within the system, with limited attention given to the initial step of reconstructing the vehicle coordinate system. This paper addresses this gap by focusing on this critical step and aiming to propose a more advanced method for vehicle coordinate system reconstruction.
To address the shortcomings of traditional vehicle alignment platforms, this paper proposes a vehicle coordinate system reconstruction method based on machine vision. Utilizing machine vision principles, the method captures images of specific white body areas with a camera and extracts the coordinates of particular feature points. From these, the transformation relationship between the camera coordinate system and the vehicle coordinate system can be calculated, thus achieving the reconstruction of the vehicle coordinate system. This paper also discusses how to use deep learning to extract regions of interest, select appropriate light sources, and develop feature extraction algorithms, overcoming challenges such as reflection. This enables the accurate extraction of feature point coordinates from white body areas.
This method requires only simple hardware, including a camera, two light sources, and a computer, to complete the entire coordinate system reconstruction process. Compared to vehicle alignment platforms costing tens of thousands of dollars, the total cost of the equipment for this method is only a few thousand dollars. Since this is a non-contact solution, the lifespan and maintenance costs of the camera, light sources, and computer are far superior to those of contact-based vehicle alignment platforms. Additionally, the reconstruction accuracy is not affected by factors such as tire pressure changes. In terms of detection efficiency, this method takes less than five seconds to complete the entire process, from taking a photograph to running the program and obtaining the result. In contrast, a vehicle alignment platform takes nearly ten seconds from the time the vehicle enters to complete the alignment. Overall, this method offers higher efficiency, greater accuracy, and a lower cost for reconstructing the vehicle coordinate system.

2. Materials

2.1. Vehicle Coordinate System Reconstruction Based on Rigid Body Transformation

The reconstruction of the vehicle coordinate system [13] can also be referred to as the reconstruction of the vehicle coordinate system representation method. In various studies, coordinate system representation methods are typically divided into absolute representation and relative representation [14]. The former, based on definitions, uses global or predefined reference coordinate systems to describe positions. This requires a known fixed origin and axes in the coordinate system, making it suitable for global positioning or fixed reference system scenarios. For representing vehicle coordinate systems in dynamic scenarios, since the positioning and orientation of each vehicle during offline inspection are different, the origin and axes in the reference system constantly change. Using absolute representation to describe each instance requires redefinition every time, which is challenging and cumbersome. The latter method involves determining a relative point or relative coordinate system in space and finding the relative relationship between the coordinate system to be represented and the known coordinate system. This method of using relative relationships to describe unknown coordinate systems is more suitable for dynamic environments. Even if the vehicle coordinate system constantly changes, it only requires solving the transformation relationship between it and a known coordinate system. This allows the vehicle coordinate system to be described using the known coordinate system and the transformation relationship. Therefore, this study uses relative representation for the reconstruction of the vehicle coordinate system.
In machine vision-based solutions, the camera coordinate system is typically used as a known coordinate system to describe unknown points or coordinate systems. The technology for determining the camera coordinate system is already very mature and is commonly referred to as camera calibration, which will not be elaborated on in this paper. The camera’s coordinate system is generally defined as follows: The origin of the camera coordinate system is the center point of the camera lens; the positive direction of the X-axis is to the right of the camera lens; the positive direction of the Y-axis is downward along the camera lens; and the Z-axis points forward along the direction of the camera lens. In this method, the camera is fixed in position at the detection station, meaning the camera coordinate system is stationary. When the vehicle enters the detection station and comes to a stop, the vehicle’s coordinate system also becomes stationary. At this point, there is a fixed transformation relationship between the two coordinate systems, which can be described by a fixed rotation matrix and translation matrix, known as rigid body transformation.
In daily calculations, a three-dimensional rigid body transformation typically requires the construction of a rigid body transformation matrix T, as shown in Equation (1).
$$T = \begin{bmatrix} R & t \\ \mathbf{0}^{\top} & 1 \end{bmatrix}$$ (1)
where R is a 3 × 3 rotation matrix used to represent the rotation of the rigid body; t is a 3 × 1 translation vector used to represent the translation of the rigid body; $\mathbf{0}^{\top}$ is a 1 × 3 zero row vector; and 1 is a scalar used to maintain the homogeneity of the coordinates. The formula for the rigid body transformation can be written as Equation (2).
$$Q = T P$$ (2)
where P represents the homogeneous coordinates of a point in three-dimensional space before the rigid body transformation, and Q represents the coordinates after the transformation. In practical calculations, the objects of a three-dimensional rigid body transformation are usually described by a set of three-dimensional points in space. In this study, P refers to all known point sets in the camera coordinate system, and Q represents the corresponding point sets in the vehicle coordinate system. Thus, Equation (2) describes the transformation relationship between the coordinate systems where P and Q reside, i.e., the relationship between the camera coordinate system and the vehicle coordinate system. The objective of this study is to solve for the rigid body transformation matrix T. The rotation part R of T has three degrees of freedom, representing rotations around the X-, Y-, and Z-axes. The translation part t also has three degrees of freedom, representing translations along the X, Y, and Z directions, for a total of six degrees of freedom. To determine these degrees of freedom, two sets of corresponding points, each consisting of at least three non-collinear points, are required for the solution [15].
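To make the computation concrete, the rigid transformation between two corresponding point sets can be estimated in closed form with an SVD-based (Kabsch-style) procedure. The sketch below is a minimal NumPy illustration under that assumption; the function name and the example coordinates are invented for demonstration and are not the manufacturer's data or the authors' implementation.

```python
import numpy as np

def solve_rigid_transform(P, Q):
    """Estimate R (3x3) and t (3,) such that Q ~= R @ P + t (Kabsch/SVD).

    P, Q: (N, 3) arrays of corresponding points (N >= 3, non-collinear),
    e.g., feature points in the camera and vehicle coordinate systems.
    Returns the 4x4 homogeneous matrix T of Equation (1).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)      # centroids of both point sets
    H = (P - cP).T @ (Q - cQ)                    # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Illustrative usage with made-up coordinates (not the manufacturer's data):
P_cam = np.array([[0.10, 0.20, 0.80], [0.40, 0.20, 0.80], [0.25, 0.50, 0.85]])
P_veh = np.array([[1.20, 0.90, 0.60], [1.20, 0.60, 0.60], [1.25, 0.75, 0.90]])
print(solve_rigid_transform(P_cam, P_veh))
```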
In summary, when the positions of three points in two coordinate systems are known, the rigid body transformation matrix can be solved to obtain the transformation relationship between the two coordinate systems. Therefore, this study needs to obtain the three-dimensional coordinates of physical points in both the camera coordinate system and the vehicle coordinate system. Accurate point coordinates in the vehicle coordinate system involve the vehicle design model, which is typically confidential to automobile manufacturers. Excessive disclosure of dimensions could lead to the model being deciphered and stolen. This study relies on a collaborative project with an automobile manufacturer. However, considering data security, the manufacturer only provides a limited number of point coordinates. The small quantity and weak relative relationships among these points pose challenges. In machine vision solutions, point coordinates in the camera coordinate system are usually obtained through image processing. Theoretically, all point coordinates can be acquired, but due to extraction difficulty and stability concerns, only distinct and easily identifiable feature points are generally extracted. The images are captured by a fixed camera at the detection station. Considering computational accuracy and cost, this method uses a single camera fixed at the detection station, limiting the maximum range of feature point extraction to the camera’s field of view (FOV). A “white body” refers to the part of the vehicle body that has been welded but not yet painted [16]. Compared to subsequently added body parts, feature points on the white body are typically stable and do not move or deform. These stable feature points provide a reliable reference, making the computation results more accurate. Therefore, it is necessary to select easily extractable points from the known points provided by the automobile manufacturer. Considering the limited FOV of the camera, the selected points should be within a relatively concentrated area, belong to the white body, and be fully captured in a single image.
Based on this, as shown in Figure 3, the points selected in this study are located in the white body area at the rear of the vehicle, specifically the body contour points around the fuel tank cover and the center point of the fuel tank cover. These points are referred to as body contour point 1 (feature point 1), body contour point 2 (feature point 2), and the center point of the fuel tank cover (feature point 3) in the following sections.

2.2. Pixel Coordinate System to Camera Coordinate System

To solve the rigid body transformation matrix, as mentioned in Section 2.1, it is necessary to obtain the 3D coordinates of physical points in both the camera coordinate system and the vehicle coordinate system. In Section 2.1, it is explained that this study uses specific feature points as representatives of these physical points. Additionally, it is noted that the 3D coordinates of the physical points in the vehicle coordinate system are directly provided by the automaker. Therefore, the only coordinates that need to be obtained are the 3D coordinates of the feature points in the camera coordinate system, which are typically calculated from the pixel coordinates of the feature points [17]. This calculation process usually involves three coordinate systems: the pixel coordinate system, the image coordinate system, and the camera coordinate system. The pixel coordinate system is transformed into the image coordinate system through an affine transformation, as shown in Equation (3). The image coordinate system is then transformed into the camera coordinate system through a perspective transformation, as shown in Equation (4). Combining these transformations, the conversion from the pixel coordinate system to the camera coordinate system satisfies Equation (5).
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{d_x} & 0 & u_0 \\ 0 & \dfrac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$ (3)
$$z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$ (4)
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{d_x} & 0 & u_0 \\ 0 & \dfrac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$ (5)
where $f$ is the focal length, and $d_x$ and $d_y$ represent the physical length of one pixel on the camera sensor in the X and Y directions, respectively. $u_0$ and $v_0$ represent the coordinates of the center of the camera sensor in the pixel coordinate system. The two matrices can be uniformly expressed as:
$$\begin{bmatrix} \dfrac{1}{d_x} & 0 & u_0 \\ 0 & \dfrac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$ (6)
It can be observed that $f_x = f/d_x$ and $f_y = f/d_y$, the equivalent focal lengths of the camera along the x and y axes expressed in pixels. Together with $u_0$ and $v_0$, these are known as the camera intrinsic parameters, which are inherent properties of the camera and generally do not change. Since the camera used in this study was strictly calibrated before leaving the factory, these intrinsic parameters can be read directly through an API (Application Programming Interface) during actual use. Therefore, Equation (5) can also be written as:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$ (7)
Here, $(u, v)$ represents the two-dimensional coordinates of the feature point in the pixel coordinate system, which are obtained through image processing in this study and are therefore known quantities. $(X_c, Y_c, Z_c)$ represents the three-dimensional coordinates of the feature point in the camera coordinate system, which are unknown and need to be determined. This can be expressed more explicitly as Equation (8): when the camera intrinsic parameters and the depth value $Z_c$ are known, the three-dimensional coordinates of the feature points in the camera coordinate system can be calculated using Equation (8).
$$\begin{cases} X_c = (u - u_0) \cdot Z_c / f_x \\ Y_c = (v - v_0) \cdot Z_c / f_y \\ Z_c = Z_c \end{cases}$$ (8)
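As a small worked illustration of Equation (8), the back-projection can be written as a few lines of Python. The intrinsic values below are placeholders chosen only for demonstration, not the calibrated parameters of the camera used in this study.

```python
def pixel_to_camera(u, v, Zc, fx, fy, u0, v0):
    """Back-project pixel (u, v) with depth Zc into the camera frame, Equation (8)."""
    Xc = (u - u0) * Zc / fx
    Yc = (v - v0) * Zc / fy
    return Xc, Yc, Zc

# Placeholder intrinsics (fx, fy, u0, v0 in pixels) and depth in millimetres:
print(pixel_to_camera(u=1020.0, v=560.0, Zc=800.0, fx=1400.0, fy=1400.0, u0=960.0, v0=540.0))
```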

3. Methods

The overall workflow of the method used in this study is illustrated in Figure 4. The method can be divided into two parts. The first part involves obtaining the 3D coordinates of selected feature points using machine vision; these coordinates are calculated using Equation (8). To achieve this, the camera is first calibrated to obtain its intrinsic parameters. Next, RGB and depth images of the region containing all feature points, as shown in Figure 3, are captured, and both images are corrected. The RGB image is used to obtain the (u, v) values: the corrected RGB image is input into a target detection model to identify and crop the region containing the feature points, and image processing is then performed on this region using template matching and ellipse fitting algorithms to obtain the (u, v) values. Finally, the depth values of the feature points are read from the corrected depth image. At this point, all known quantities in Equation (8) have been acquired, allowing the calculation of the 3D coordinates $(X_c, Y_c, Z_c)$ of the feature points in the camera coordinate system.
The second part of the method involves calculating the coordinate transformation matrix from the camera coordinate system to the vehicle coordinate system. The 3D coordinates of the feature points in the camera coordinate system obtained in the first part are combined with the 3D coordinates of the same feature points in the vehicle coordinate system provided by the vehicle manufacturer. This allows the calculation of the coordinate transformation matrix, thereby enabling the use of the camera coordinate system to describe the vehicle coordinate system, completing the reconstruction. This chapter will introduce the specific implementation methods in the above order.

3.1. Camera Selection and Image Distortion Correction

Before acquiring images, it is necessary to consider the camera selection and installation position. The camera installation position must take into account the actual test station. In this study, when the test vehicle drives into the test station along the guide line, the maximum distance from the edge of the test station to the vehicle body surface is approximately 800 mm. To save costs, major renovations to the test station will not be undertaken, so the maximum distance from the camera installation position to the vehicle body surface is 800 mm. The camera’s FOV needs to cover all regions where the features are located. Measurements indicate that the area to be tested is approximately 550 mm by 450 mm. In practical measurements, there will be some variation in the vehicle’s parked position. Therefore, a margin needs to be left in terms of working distance and FOV. However, due to the presence of guide lines and limit devices, this margin requirement is not large. Considering the above factors, this study selects the Dkam330 M model binocular structured light depth camera from Xi’an Zhiwei Sensing Technology Co., Ltd. in Xi’an, China. Its working distance is 500–1500 mm, with a FOV of 650 mm by 570 mm at 800 mm and a depth accuracy of up to 0.105 mm.
After selecting and installing the camera according to the above criteria, raw images can be captured, as shown in Figure 5. However, because of lens distortion during the imaging process, the feature point coordinates (u, v) obtained from an uncorrected image may be off by several pixels, so the distortion in the raw images must be corrected [18,19]. Camera distortion is usually divided into radial distortion and tangential distortion [20]. Radial distortion is caused by the manufacturing process of the lens, where the irregular shape of the lens results in different magnifications in different parts. Radial distortion is further divided into barrel distortion and pincushion distortion [21]. Barrel distortion occurs when the magnification in the central region of the optical axis is much greater than in the peripheral regions, while pincushion distortion occurs when the magnification in the peripheral regions is much greater than in the central region of the optical axis. Tangential distortion is caused by the lens not being parallel to the camera sensor plane, usually due to misalignment during lens installation.
Radial distortion is zero at the center of the optical axis and becomes more severe toward the edges along the radial direction of the lens. However, in practical applications, radial distortion is relatively small and can be quantitatively described using the first few terms of a Taylor series expansion [22] around the position where r = 0.
$$x' = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$ (9)
$$y' = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$ (10)
In these equations, $r^2 = x^2 + y^2$, $(x, y)$ are the ideal (undistorted) point coordinates in the camera coordinate system, and $(x', y')$ are the actual (distorted) point coordinates. This notation is consistent throughout the subsequent equations. Tangential distortion can be described using the parameters $p_1$ and $p_2$.
$$x' = x + \left[ 2 p_1 x y + p_2 (r^2 + 2 x^2) \right]$$ (11)
$$y' = y + \left[ 2 p_2 x y + p_1 (r^2 + 2 y^2) \right]$$ (12)
The overall distortion model combining radial distortion and tangential distortion can be expressed as follows:
$$x' = x + x (k_1 r^2 + k_2 r^4 + k_3 r^6) + \left[ 2 p_1 x y + p_2 (r^2 + 2 x^2) \right]$$ (13)
$$y' = y + y (k_1 r^2 + k_2 r^4 + k_3 r^6) + \left[ 2 p_2 x y + p_1 (r^2 + 2 y^2) \right]$$ (14)
Let the coordinates of point P in the pixel coordinate system be $(u, v)$, and the coordinates of the pixel center be $(u_0, v_0)$. Let $f_x$ and $f_y$ be the focal lengths of the camera in the x and y directions, respectively, as defined in Equation (6). Then, the coordinates of point P in the camera coordinate system are:
$$x = (u - u_0) / f_x$$ (15)
$$y = (v - v_0) / f_y$$ (16)
The distortion components of point P can be obtained according to Equations (13) and (14) as follows:
$$\Delta x = x (k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)$$ (17)
$$\Delta y = y (k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_2 x y + p_1 (r^2 + 2 y^2)$$ (18)
The corrected coordinates $(\hat{x}, \hat{y})$ of point P in the camera coordinate system after distortion correction are:
$$\hat{x} = x - \Delta x$$ (19)
$$\hat{y} = y - \Delta y$$ (20)
Combining Equations (15)–(20), the coordinates $(\hat{u}, \hat{v})$ of point P in the pixel coordinate system after distortion correction are:
$$\hat{u} = \hat{x} f_x + u_0 = x f_x - \Delta x \, f_x + u_0$$ (21)
$$\hat{v} = \hat{y} f_y + v_0 = y f_y - \Delta y \, f_y + v_0$$ (22)
By rearranging the pixel positions of the captured raw image according to Equations (21) and (22), the distortion-corrected image as shown in Figure 6 can be obtained.
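In practice, this per-pixel remapping is usually delegated to a standard routine. Below is a minimal sketch using OpenCV's undistortion function; the intrinsic matrix, distortion coefficients, and file names are placeholders, and in this study the calibrated intrinsics would instead be read from the camera's API.

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients (k1, k2, p1, p2, k3);
# the real values come from the camera's factory calibration.
K = np.array([[1400.0, 0.0, 960.0],
              [0.0, 1400.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.12, 0.05, 0.001, -0.0005, 0.0])

raw = cv2.imread("raw_image.jpg")            # raw capture, as in Figure 5 (placeholder path)
undistorted = cv2.undistort(raw, K, dist)    # applies the radial + tangential model above
cv2.imwrite("corrected_image.jpg", undistorted)
```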

3.2. Feature Region Target Detection

After completing distortion correction, the next step is to determine the 2D coordinates (u, v) of the feature points using the RGB image. As shown in Figure 3, the feature points selected in this study, except for the midpoint of the fuel tank cap, are all edge corner points. For light-colored vehicles, such as white, the edge contours are clear, and corner points are easy to extract. A simple threshold segmentation of the large area containing the corner points, as shown in Figure 7, is sufficient. However, when the vehicle body color darkens, the edge contours become less distinct, and uneven ambient lighting can cause reflections and shadows on the vehicle surface, making corner point extraction in large areas challenging. Using the same lighting method and large-area threshold segmentation, the results, as shown in Figure 8, reveal blurred edges and indistinct corner points. In most vision projects, specific lighting methods are used to address issues of indistinct edges. However, in this study, the white body surface area is large, and solving the problem through lighting would be highly difficult and costly. After comprehensive consideration, this study finds it more reasonable to address the issue by focusing on a small region of interest (ROI) centered on the feature points. Specifically, this involves isolating a small area centered on the feature point from the entire image and then applying targeted lighting. This approach shifts the requirement from achieving uniform lighting and edge enhancement over a large area to a smaller, more manageable area. The difficulty and cost are therefore more reasonable. With appropriate localized lighting, the contours at the corner points become very clear.
Thus, before extracting features, it is first necessary to isolate the ROI from the entire image. This study ultimately employs the YOLOv5 object detection model [23,24] to achieve this goal. The YOLOv5 series includes four versions, listed from the largest to the smallest computational load: YOLOv5x, YOLOv5l, YOLOv5m, and YOLOv5s. The YOLOv5s model has the smallest network structure and the fastest speed [25,26], making it more suitable for detecting the areas around the extraction points, and therefore it is used as the basic algorithm for target detection in this study.

3.2.1. Dataset Construction

To ensure that the training dataset is sufficiently diverse and representative, this study implemented a systematic data augmentation approach. Initially, 330 raw images of vehicles in various colors were captured using a camera from multiple angles, distances, and under different lighting conditions. These images represent different sides of the vehicles, front and rear views, and were taken at various times of the day and under different weather conditions to ensure diversity.
To further enrich the dataset and enhance the model’s generalization ability to different scenarios, several data augmentation techniques were applied to the original images. The specific methods include:
  • Rotation: The images were subjected to multiple small-angle rotations, both clockwise and counterclockwise. Each rotation was performed at a 5-degree increment, for a total rotation of 10 degrees. This operation was designed to simulate possible variations in the height of the front or rear vehicle body due to changes in tire pressure, thereby improving the model’s accuracy in recognizing such variations.
  • Brightness Adjustment: The brightness of the images was quantitatively increased or decreased to simulate variations in natural light intensity. Prior to this, the natural light intensity at the test site was measured during midday and nighttime under clear weather conditions using a lux meter on the surface of a white car body, which was consistently illuminated. The results indicated that the natural light intensity ranged between 120 and 150 lux, with a maximum variation of 25%. Consequently, the brightness of the images was adjusted in increments of 10%, up to an increase or decrease of 30%. This adjustment enhances the model’s robustness under varying natural light conditions.
  • Noise Addition: Gaussian noise with a standard deviation of 10 to 30 and salt-and-pepper noise with a density of 1% to 5% were randomly introduced into the images to simulate potential interference during image acquisition. This augmentation method improves the model’s ability to adapt to various types of image noise that may occur in real-world scenarios.
  • Cropping: Images were cropped to retain different regions of the image, simulating various shooting distances or occlusion scenarios. Since the detection area is generally located in the center of the image and is not typically obscured, random cropping of 5% to 20% was applied only to the edges of the images. This simulates scenarios where the shooting distance is reduced or the edges are obscured.
  • Flipping: Horizontal or vertical flipping of the images was performed, causing the vehicles in the images to appear inverted left-to-right or top-to-bottom. This increased the diversity of the data and enhanced the model’s invariance to flipping.
By employing these data augmentation methods, the initial 330 images were expanded to a total of 4620 images, significantly increasing the dataset’s richness and improving the model’s training efficacy. All images have a resolution of 1920 × 1080 and are in JPG format, ensuring consistent image quality and processing. This expanded dataset provides a more robust foundation for subsequent model training and testing. Some of the images from the dataset are shown in Figure 9.
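For illustration, the augmentation operations described above can be approximated with OpenCV and NumPy as in the sketch below. The parameter values mirror the ranges stated in the text, but the exact augmentation pipeline used to build the dataset is not published, so this is only an assumed reconstruction.

```python
import cv2
import numpy as np

def rotate(img, angle_deg):
    """Small-angle rotation about the image centre (e.g., +/-5 or +/-10 degrees)."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(img, M, (w, h))

def adjust_brightness(img, factor):
    """Scale brightness, e.g., factor in 0.7 ... 1.3 (up to +/-30% in 10% steps)."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma):
    """Add Gaussian noise with standard deviation 10-30."""
    noise = np.random.normal(0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def edge_crop(img, ratio):
    """Crop 5-20% from the image borders, then resize back to the original size."""
    h, w = img.shape[:2]
    dy, dx = int(h * ratio / 2), int(w * ratio / 2)
    return cv2.resize(img[dy:h - dy, dx:w - dx], (w, h))

img = cv2.imread("vehicle.jpg")              # placeholder file name
augmented = [
    rotate(img, 5), rotate(img, -10),
    adjust_brightness(img, 1.2), adjust_brightness(img, 0.8),
    add_gaussian_noise(img, 20), edge_crop(img, 0.1),
    cv2.flip(img, 1), cv2.flip(img, 0),      # horizontal and vertical flips
]
```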
In this study, LabelImg software was used to annotate the images. Three categories were labeled: point1, point2, and point3, representing the regions where the three feature points are located. The data were saved in YOLO format as TXT files. Given the small scale of this dataset, and in order to reduce the risk of information leakage and reflect the model’s performance more accurately, 2772 images were used as the training set, 330 images as the validation set, and 924 images as the test set.

3.2.2. Detection Model

This study focuses on locating the vehicle’s corner points and the center point of the fuel cap. The region where these points are located needs to be detected. The YOLOv5s target detection algorithm was selected as the detection model due to its high detection accuracy and speed, which meet the detection requirements of this study. The YOLOv5s framework mainly consists of four parts: Input, Backbone, Neck, and Head. The training set is fed into the Input part, which includes Mosaic data augmentation, adaptive anchor box calculation, and adaptive image scaling. Mosaic data augmentation randomly stitches together four vehicle images into a new image, enriching the dataset. Before training, the most suitable anchor box parameters for the input images are adaptively calculated. The input image size is adaptively scaled to detect corner and center point regions of different scales. Next, the processed vehicle images are input into the Backbone network, which mainly consists of the Focus structure and the CSP (Cross Stage Partial) structure, efficiently extracting vehicle feature information. Then, the images are passed to the Neck structure, which includes FPN (Feature Pyramid Network) and PAN (Path Aggregation Network), allowing for the extraction of more complex vehicle feature information. Finally, the images are passed to the Head, which outputs information such as the locations of the corners and fuel cap, as well as the categories of the feature points.

3.2.3. Results Analysis

The experimental configuration for training the YOLOv5s model in this study is as follows: Windows 11 operating system, AMD Ryzen 7 6800H CPU, NVIDIA GeForce RTX 3050 GPU, PyTorch version 2.0.1, CUDA version 11.8, and Python version 3.9.18. The model was trained for a total of 300 epochs, with a batch size of 4. The input image size for the neural network was 640 × 640, and the initial learning rate was set to 0.01.
The metrics used to evaluate the detection performance of the YOLOv5s model in this study are Precision (P), Recall (R), and Mean Average Precision at IoU = 0.5 (mAP@0.5).
$$P = \frac{TP}{TP + FP}$$ (23)
$$R = \frac{TP}{TP + FN}$$ (24)
$$mAP = \frac{\sum_{i=1}^{N} AP_i}{N}$$ (25)
Here, TP is the number of feature regions correctly identified by the model, FP is the number of regions incorrectly identified as feature regions, and FN is the number of feature regions the model failed to identify. $AP_i$ is the average precision for the i-th feature region category, and N is the total number of feature region categories. The training results are shown in Figure 10, with a precision of 97.3%, a recall of 92.9%, and a mean average precision (mAP@0.5) of 93.7%. The actual detection effect is shown in Figure 11.
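Once trained, the detector is used to locate and crop the feature regions that are passed to the extraction algorithms in Section 3.3. A minimal inference sketch using the public YOLOv5 torch.hub interface is shown below; the weight file name, confidence threshold, and image path are illustrative assumptions.

```python
import cv2
import torch

# Load the trained YOLOv5s weights through the public torch.hub interface.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.5                               # illustrative confidence threshold

img = cv2.imread("corrected_rgb.jpg")          # distortion-corrected RGB image (placeholder path)
results = model(img[:, :, ::-1])               # BGR -> RGB for inference
det = results.xyxy[0].cpu().numpy()            # rows of [x1, y1, x2, y2, conf, class]

rois = {}
for x1, y1, x2, y2, conf, cls in det:
    name = results.names[int(cls)]             # point1 / point2 / point3
    rois[name] = img[int(y1):int(y2), int(x1):int(x2)]   # cropped feature region
```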

3.3. Feature Point Extraction

3.3.1. Light Source Selection

After obtaining the ROI, the 2D coordinates of the feature points can be extracted. During extraction, appropriate localized lighting is crucial, as it makes the features within the small area more distinct, thereby improving extraction efficiency and accuracy. Conversely, improper lighting can cause the features to become blurred, significantly increasing the difficulty of extraction and resulting in poor precision. Therefore, the selection of the light source is of utmost importance. The light source selection in this study first considers highlighting the edge contours. The edge contours of the white body typically have significant depth variations, with areas of greater depth often having weaker reflective capabilities, resulting in lower gray values. When the vehicle itself is darker in color, its reflectivity to different wavelengths of light also varies. If the wavelength of the selected light source results in a low reflectance for the vehicle color, the gray values of non-edge areas in the detection region will also be low in the camera’s FOV. This leads to indistinct edges in the detection region, making contour extraction difficult.
To address this issue, the color of the light source needs to be selected based on the reflectivity of the vehicle body color to different wavelengths of light. When the reflectivity is high, the gray value at the edges is larger, resulting in overexposure in the image, which also causes the edges to blur [27]. Through practical testing, it was found that white light meets the detection needs of light-colored vehicle bodies, while red light provides better illumination for dark-colored vehicle bodies.
In addition, the selection of the light source needs to consider the uniformity of the illumination. Uneven lighting can lead to the appearance of extraneous contours, reducing the robustness of the algorithm. This study involves a relatively large feature area, so a strip light source is used for illumination [28]. Finally, the illumination range and power range of the light source are also important considerations. The illumination range must encompass all feature regions, and the power setting only needs to be adjusted once to prevent the feature regions in the image from being too dark or overexposed, considering the relatively stable natural lighting in the actual application scenario.

3.3.2. Corner Extraction Based on Template Matching

In addition to using appropriate lighting, it is also necessary to develop a suitable extraction algorithm for feature point extraction. In this study, Halcon software is used for image processing. Halcon is a powerful machine vision software library developed by MVTec Software GmbH (Munich, Germany). It provides a rich set of image processing and machine vision algorithms that can meet the needs of extracting feature point pixel coordinates in this study. Given the variability of vehicle colors in practice, a threshold-based edge detection method makes it difficult to determine an appropriate threshold range as the body color changes. Therefore, to improve the algorithm’s adaptability to different vehicle colors, this study employs a shape-based template matching method [29,30,31]. This method uses edge features for localization and is not sensitive to grayscale changes caused by color variations, thus offering good robustness for feature extraction on vehicle bodies of different colors.
This method first requires creating a template image with clear and simple edge features. The template image consists of a small region containing the corner point. A template image needs to be created for each corner point. The steps to create the template image are as follows:
  • Use the read_image operator to read the image.
  • Use the rgb1_to_gray operator to convert the image to grayscale.
  • Use the gen_circle operator to draw a circular ROI (Region of Interest) in the region containing the feature point.
  • Use the reduce_domain operator to crop the ROI region from the grayscale image, obtaining the ROI1 image.
  • Use the threshold operator to perform threshold segmentation on ROI1.
  • Use the connection operator to divide the segmented regions into connected regions.
  • Use the select_shape operator to select the region containing the feature point.
  • Use the reduce_domain operator to crop the region containing the feature point from the ROI1 image, obtaining the template image, as shown in Figure 12.
After obtaining the template image, the coordinates of the template center point $(C_x, C_y)$ can be acquired. The Harris corner detection algorithm [32] is then applied to this image to obtain the coordinates of its corner point $(H_x, H_y)$. Next, the template image is used to create the template, and after setting a series of matching parameters, the template is matched within the image to be matched. The match with the highest score is taken as the target item. The image to be matched is obtained by cropping the target detection result from Section 3.2. After obtaining the target item, the matching operator is used to obtain its center point coordinates $(c_x, c_y)$ and the row and column scaling factors $S_R$ and $S_C$. The corner coordinates of the target item can then be calculated using the following relationships:
$$\Delta x = H_x - C_x$$ (26)
$$\Delta y = H_y - C_y$$ (27)
$$h_x = c_x + S_R \cdot \Delta x$$ (28)
$$h_y = c_y + S_C \cdot \Delta y$$ (29)
Here, $\Delta x$ and $\Delta y$ represent the row and column offsets between the corner point and the center point of the template image, while $h_x$ and $h_y$ are the row and column coordinates of the corner point in the target item. To simulate changes in vehicle placement position, which might affect the lighting at the corner points, this study also tested corner point extraction while adjusting the light source position or the shooting angle. Tests were conducted on feature regions under different lighting conditions and shooting angles. After optimization, the extraction results are shown in Figure 13. In practical application scenarios, due to the presence of guide lines and limit devices, changes in vehicle placement position are minimal and the lighting is stable. Thus, even with fixed light sources and camera positions, the captured images remain relatively consistent.
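The coordinate bookkeeping in Equations (26)–(29) amounts to transferring the Harris corner from template coordinates into the matched target, scaled by the match's row and column scaling factors. A small sketch with invented input values (in practice these come from the Halcon matching and Harris steps):

```python
def corner_in_target(Hx, Hy, Cx, Cy, cx, cy, SR, SC):
    """Map the Harris corner (Hx, Hy) of the template, whose centre is (Cx, Cy),
    into the matched target with centre (cx, cy) and scale factors (SR, SC)."""
    dx = Hx - Cx          # Eq. (26): row offset of the corner inside the template
    dy = Hy - Cy          # Eq. (27): column offset of the corner inside the template
    hx = cx + SR * dx     # Eq. (28): corner row in the target item
    hy = cy + SC * dy     # Eq. (29): corner column in the target item
    return hx, hy

# Illustrative values only:
print(corner_in_target(Hx=52.0, Hy=47.5, Cx=50.0, Cy=50.0,
                       cx=812.3, cy=604.8, SR=1.02, SC=0.99))
```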

3.3.3. Fuel Cap Center Extraction Based on Ellipse Fitting

As mentioned in Section 2.1, in addition to edge corner points, the center of the fuel cap also needs to be used as a feature point. On the experimental vehicle, the fuel cap is a standard circle, but in the actual image, it generally appears as an ellipse due to the shooting angle. Therefore, finding the center of the fuel cap equates to finding the center of an ellipse.
First, the Canny algorithm [33] is used to extract the edges of the region where the fuel cap is located. Then, conditions such as length and roundness are set to filter out contours that are too short or obviously do not have ellipse edges. The contours that are close and collinear are connected, and continuous and longer contour lines are selected based on length conditions. Finally, all remaining contour lines are counted and sorted by length. The contours are then subjected to ellipse fitting one by one, from longest to shortest [34]. Due to the fixed working distance and minimal vehicle pose changes in practice, the ellipse area of the fuel cap in the image does not vary significantly. Therefore, the target ellipse is determined based on the area of the ellipses fitted to each contour.
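A rough OpenCV analogue of this procedure is sketched below (the study itself uses Halcon operators): Canny edge extraction, contour filtering by length, ellipse fitting, and selection by expected area. The thresholds and the area range are placeholder assumptions.

```python
import cv2
import numpy as np

def fuel_cap_center(roi_gray, min_len=100, area_range=(5_000, 50_000)):
    """Estimate the fuel-cap centre in a cropped grayscale ROI by fitting ellipses
    to long edge contours and keeping the one whose area is plausible."""
    edges = cv2.Canny(roi_gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if len(c) >= max(min_len, 5)]   # fitEllipse needs >= 5 points
    contours.sort(key=len, reverse=True)                            # longest contours first
    for c in contours:
        (cx, cy), (major, minor), angle = cv2.fitEllipse(c)         # centre, axes, rotation
        area = np.pi * (major / 2) * (minor / 2)
        if area_range[0] <= area <= area_range[1]:                  # expected ellipse size
            return cx, cy
    return None
```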
During testing, issues such as uneven lighting can cause reflections or shadows on the fuel cap surface. This extraction algorithm shows good robustness to such conditions, and the extraction results are shown in Figure 14. If deployed in actual application scenarios with more stable light sources and natural lighting conditions, the detection results will be more consistent.

3.4. Acquisition of 3D Coordinates in the Camera Coordinate System

After obtaining the two-dimensional coordinates of the target feature points in the pixel coordinate system through the above steps, the three-dimensional coordinates in the camera coordinate system can be calculated according to Equation (7) by acquiring the camera intrinsic parameters and the depth value Zc at the feature points. The depth of the feature points is obtained through the depth map. When the camera captures an image, it simultaneously acquires both the RGB image and the depth map, and both are corrected.
In this case, any point $(x, y)$ in the RGB image has the same coordinates $(x, y)$ in the depth map. Each pixel in the RGB image consists of three channels, recording the R, G, and B values of that pixel, while the depth map consists of a single channel, recording the depth value at that point. In Section 3.3, the two-dimensional coordinates of the feature points were obtained. Therefore, it is only necessary to read the value at the same two-dimensional coordinates in the corrected depth image to obtain the depth value of each feature point. Finally, according to Equation (8), the three-dimensional coordinates of the feature points in the camera coordinate system can be calculated.
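Because the corrected RGB image and depth map are pixel-aligned, the depth of a feature point is simply the depth-map value at its (u, v) position, which is then substituted into Equation (8). A short self-contained sketch with a synthetic depth map and placeholder intrinsics:

```python
import numpy as np

def feature_point_3d(depth_map, u, v, fx, fy, u0, v0):
    """Read the aligned depth at pixel (u, v) and back-project it via Equation (8)."""
    Zc = float(depth_map[int(round(v)), int(round(u))])  # depth map is single-channel (row = v, col = u)
    Xc = (u - u0) * Zc / fx
    Yc = (v - v0) * Zc / fy
    return Xc, Yc, Zc

# Illustrative call with a synthetic depth map and placeholder intrinsics:
depth = np.full((1080, 1920), 800.0, dtype=np.float32)   # constant 800 mm depth for demonstration
print(feature_point_3d(depth, u=1020.4, v=560.7, fx=1400.0, fy=1400.0, u0=960.0, v0=540.0))
```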

4. Results

By employing the methods introduced earlier, the rigid body transformation matrix from the camera coordinate system to the vehicle coordinate system can ultimately be calculated. The structure of this matrix is shown in Equation (1), comprising a rotation matrix R and a translation vector t. This indicates that each point in the camera coordinate system can be mapped to the corresponding point in the vehicle coordinate system using Equation (2). In other words, each point in the vehicle coordinate system originates from the corresponding point in the camera coordinate system through certain rotations and translations. Thus, the accuracy of this method can be verified in two parts: rotation and translation. Taking the translation experiment as an example, an initial displacement vector $[A, B, C]^T$ is recorded first. At this point, the origin of the camera coordinate system needs to be translated along the X-axis by A, along the Y-axis by B, and along the Z-axis by C to coincide with the origin of the vehicle coordinate system. Subsequently, the camera is moved along the X-axis of the camera coordinate system by D, and the displacement vector $[a, b, c]^T$ is calculated again. The theoretical displacement vector should be $[A + D, B, C]^T$, so the translation error can be calculated as $(A + D - a)$.

4.1. Rotation Experiment

4.1.1. Evaluation Metrics

Since this study determines a rotation matrix, using the matrix itself to evaluate the rotation accuracy of the method is neither intuitive nor straightforward. In practice, Euler angles can also describe rigid body rotations and have a direct mathematical relationship with the rotation matrix, allowing conversion between the two. Euler angles decompose the rotation matrix into specific rotation amounts around the X, Y, and Z axes, making the evaluation of the rotation process more intuitive. Therefore, after obtaining the rotation matrix, this study calculates the Euler angles to evaluate the accuracy of the rotation part of the coordinate system transformation. This evaluates the method’s ability to reflect the vehicle’s rotation during position and orientation changes. In practical tests, when the vehicle’s position and orientation change on a flat surface, the change typically involves rotation around the Z-axis. Therefore, this experiment focuses solely on the vehicle’s rotational changes around the Z-axis. The ability of this method to reflect rotational changes around the Z-axis represents its ability to reflect the rotational aspect of the vehicle’s position and orientation changes.
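For reference, the yaw angle about the Z-axis can be read off the rotation matrix directly. The sketch below assumes a Z-Y-X (yaw-pitch-roll) Euler convention, which is a common choice but not necessarily the exact convention used in this study.

```python
import numpy as np

def yaw_from_rotation(R):
    """Yaw (rotation about Z) in degrees from a 3x3 rotation matrix,
    assuming a Z-Y-X Euler decomposition."""
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

# Example: a pure 2.5-degree rotation about Z should be recovered exactly.
a = np.radians(2.5)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
print(yaw_from_rotation(Rz))   # ~2.5
```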

4.1.2. Experiment Design

This experiment aims to verify the method’s ability to reflect the vehicle’s rotation. Specifically, after the vehicle rotates by a fixed angle α around an axis, the actual rotation angle θ calculated using this method is compared to α. The rotation calculation error is then given by (θ − α). As shown in Figure 15, rotating the camera to the right around the Y-axis is equivalent to rotating the vehicle to the left around the Z-axis.
The experiment site is a well-lit, flat, enclosed laboratory. The ambient light is provided by LED white lights on the ceiling. The experimental vehicle is a white SUV model that has not been involved in any accidents or severe collisions, meaning there is no deformation on the white body, closely resembling the state when the vehicle leaves the production line. Additionally, the following hardware is required:
  • One depth camera, model Dkam330 M, is consistent with the description in Section 3.1.
  • One camera stand with a gimbal at the top end that mounts an electric turntable. The turntable rotation is controlled by pulses with an accuracy of up to 0.005 degrees, enabling high-precision angle rotation.
  • Two red strip light sources (with controllers).
  • One computer, with hardware configuration consistent with the deep learning setup mentioned earlier.
Prepare the following software:
  • The SDK that comes with the experimental camera is used for setting the initial camera parameters and adjusting the initial shooting position.
  • PyCharm is used for running the test programs. The specific environment configuration for this experiment includes Python 3.9.13, PyTorch 2.1.0, CUDA 11.7, and MVTec Halcon 22111.0.0.
Under the above conditions, set up the following experiment:
(1)
Park the experimental vehicle on a flat surface and fix the camera on the electric turntable. Then, set the camera at a distance of 800 mm from the selected white body area of the vehicle. Using the real-time camera feed displayed by the SDK, adjust the camera’s horizontal and height positions to center the test area in the frame. Adjust the stand and gimbal to make the camera as level as possible. After adjustments, fine-tune the turntable so that its rotation scale aligns with the 0-degree mark.
(2)
Mount the red strip light sources on the stand and place them on both sides of the detection area. Adjust their positions to ensure the light evenly illuminates the detection surface.
(3)
Capture RGB images and corresponding depth images. After completing this step, rotate the electric turntable 2.5 degrees around the Z-axis and capture the images again. Repeat this step multiple times to capture RGB and depth images with the gimbal rotated ±12.5 degrees around the Z-axis.
(4)
Process the captured images using a program to calculate the three-dimensional coordinates of the target feature points in the camera coordinate system. Compare these with the known coordinates of the corresponding feature points in the vehicle coordinate system to obtain the rotation matrix and calculate the Euler angles. In this experiment, the calculation is performed every 2.5 degrees of rotation. The rotation error is obtained by subtracting the standard value (2.5 degrees) from the actual calculated value of the vehicle’s rotation angle around the Z-axis. Repeat this experiment three times. The results are shown in Table 1.
As shown in Table 1, when the vehicle rotates 2.5 degrees around the Z-axis (the camera rotates around the Y-axis), the average rotation errors measured using this method in the three trials are 0.11, 0.09, and 0.12 degrees, respectively, with a maximum error of 0.36 degrees. Currently, the vehicle alignment platform used by the automaker supporting this research has a rotation accuracy of 0.2 degrees. Therefore, this method improves the accuracy of reflecting the vehicle’s rotation compared to the original device. The table also shows some relatively large error values. Upon analysis, it was found that repeated calculations of the Euler angles for the same position exhibit a repeatability error of approximately 0.1 degrees. To reduce this error, this study averages ten calculations of the Euler angles for the same position.

4.2. Translation Experiment

This experiment aims to verify the ability of this study to reflect the translation amount of a vehicle. The experiment simulates vehicle translation by translating the camera. Moving the camera along the X-axis simulates the vehicle moving along the X-axis, and moving the camera along the Z-axis simulates the vehicle moving along the Y-axis. The movement described here is in the camera coordinate system for the camera and in the vehicle coordinate system for the vehicle, and this convention is followed throughout the subsequent sections. The quantitative translation amount of the camera along the axis is compared with the translation amount calculated using this method. In practical tests, vehicles typically only move along the X and Y axes. Therefore, this experiment only investigates the translation reflection ability of this method in these two directions. In addition to the hardware used in Section 4.1, the following hardware is also required for this experiment:
  • An electric slide rail with degrees of freedom in the X and Z directions of the camera coordinate system.
  • A set of laser levels.
The remaining software, hardware, test site, and test vehicle are the same as in Section 4.1. Based on this, the following experimental settings were made:
(1)
Mount the camera on the electric slide rail. Set the camera position as described in Section 4.1.2. After setting up, use the laser level to adjust the camera’s position to make the camera’s X and Y axes as parallel as possible to the vehicle’s X and Z axes.
(2)
Move the camera along the X-axis on the slide rail, moving 50 mm forward and 50 mm backward, recording and calculating the translation amount at 5 mm intervals. Compare the actual calculated translation of the vehicle along the X-axis with the standard value (5 mm) to obtain the translation error. Repeat this experiment three times. The results are shown in Table 2.
(3)
Move the camera along the Z-axis on the slide rail, moving 50 mm forward and 50 mm backward, recording and calculating the translation amount at 5 mm intervals. Compare the actual calculated translation of the vehicle along the Y-axis with the standard value (5 mm) to obtain the translation error. Repeat this experiment three times. The results are shown in Table 3.
As shown in Table 2, when the vehicle moves 5 mm along the X-axis, the average translation error between the calculated values and the actual values using this method is 0.56 mm, 0.41 mm, and 0.45 mm, respectively, with a maximum error of 1.6 mm. The current vehicle alignment platform used in this research has a translation accuracy of 2 mm along the X-axis. Therefore, this method demonstrates a better ability to reflect the vehicle’s translation along the X-axis compared to the original device. As shown in Table 3, when the vehicle moves 5 mm along the Y-axis, the average translation error between the calculated values and the actual values using this method is 0.25 mm, 0.13 mm, and 0.14 mm, respectively, with a maximum error of 0.61 mm. The current vehicle alignment platform has a translation accuracy of 1 mm along the Y-axis. Therefore, this method also demonstrates a better ability to reflect the vehicle’s translation along the Y-axis compared to the original device.

4.3. Adaptability Experiment for Dark Vehicle Bodies

Using the vehicle alignment platform for coordinate system reconstruction is not affected by changes in the vehicle’s body color. However, body color can impact the accuracy of feature point extraction with this method, subsequently affecting the accuracy of the reflected rotation and translation measurements. Therefore, this experiment aims to investigate the influence of different body colors on the reconstruction accuracy of this method. Preliminary tests revealed that light-colored bodies generally do not affect feature extraction accuracy, whereas dark-colored bodies do, with the extraction errors being most significant for dark brown vehicles. The difficulty in feature extraction for dark brown vehicles is comparable to that of black vehicles, making the results for dark brown vehicles highly representative. Therefore, keeping other conditions constant, the test vehicle in this experiment was replaced with a dark brown vehicle of the same model. The rotation experiment from Section 4.1 was then repeated, and the results are shown in Table 4. The translation experiments from Section 4.2 were also repeated, and the results are shown in Table 5 and Table 6.
As shown in Table 4, the errors in measuring the rotation around the Z-axis for dark-colored vehicles using this method are 0.08 degrees, 0.09 degrees, and 0.15 degrees, respectively. Compared to the measurement accuracy for white vehicle bodies, there is no significant change, and these errors are still lower than the 0.2 degree error of the current vehicle alignment platform. As shown in Table 5 and Table 6, the errors in measuring the translation along the X-axis are 0.66 mm, 0.69 mm, and 0.38 mm, and the errors in measuring the translation along the Y-axis are 0.26 mm, 0.13 mm, and 0.14 mm, respectively. Compared to the white vehicle body, there is still no significant change, and the accuracy is superior to that of the vehicle alignment platform. Therefore, although a dark-colored vehicle body increases the difficulty of feature point extraction, the current method’s algorithm and lighting scheme can adapt to the feature extraction of dark-colored vehicles. The calculation accuracy for both rotation and translation is similar to that for white vehicle bodies and is superior to the current alignment device.

5. Discussion

The validation experiments show that the coordinate system reconstruction described in this paper benefits from advances in machine vision and deep learning. Compared with traditional vehicle alignment platforms, the method requires simpler hardware, reducing the total hardware cost from tens of thousands of dollars to a few thousand dollars, and its installation and maintenance costs are also much lower than those of traditional detection equipment. While significantly reducing costs, it still meets and exceeds the accuracy of vehicle alignment platforms.
However, this method still has some limitations. Firstly, equipment built on this method suits only a limited number of vehicle models and lacks general applicability: whenever the vehicle model changes, the feature points required by the method must be reselected and their extraction reconfigured. Developing feature extraction schemes for every existing vehicle model is impractical. This study therefore targets specialized equipment for a single automobile manufacturer, which only requires extraction schemes for the few models that manufacturer produces; this is sufficient to achieve general applicability across that manufacturer’s production lines. The situation is similar for traditional vehicle alignment platforms, which must also be adjusted to the models each manufacturer produces and are likewise not universal. Given current technological limitations, the method cannot yet be deployed universally, but for a specific manufacturer, such as the one collaborating in this study, it already covers all of their vehicle models; after further field tests, it will be deployed on production lines to replace alignment platforms.
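To make the per-manufacturer scope concrete, the feature extraction schemes for the handful of models a manufacturer produces can be organized as a simple configuration lookup. The sketch below is purely hypothetical: the model names, template paths, and nominal coordinates are invented for illustration and are not taken from this study.

```python
from dataclasses import dataclass

@dataclass
class FeatureScheme:
    """Per-model settings needed to extract feature points and rebuild the vehicle coordinate system."""
    template_paths: list[str]        # feature templates used for template matching
    nominal_points_mm: list[tuple]   # designed feature point positions in the vehicle frame
    max_camera_distance_mm: float    # camera working distance assumed at the station

# Hypothetical registry covering the few models produced on one manufacturer's lines.
SCHEMES = {
    "model_A": FeatureScheme(
        template_paths=["templates/model_A_fuel_cap.png", "templates/model_A_door_handle.png"],
        nominal_points_mm=[(1250.0, 880.0, 720.0), (2310.0, 905.0, 950.0)],
        max_camera_distance_mm=700.0,
    ),
}

def scheme_for(vehicle_model: str) -> FeatureScheme:
    """Look up the extraction scheme for a model; unknown models are rejected explicitly."""
    try:
        return SCHEMES[vehicle_model]
    except KeyError:
        raise ValueError(f"No feature extraction scheme configured for {vehicle_model!r}")
```

Adding support for a new model then reduces to registering a new entry rather than modifying the detection pipeline.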
Secondly, the working range of this method is largely limited by the camera’s working range, so this paper only verifies accuracy within ±12.5 degrees of rotation and ±50 mm of translation. If the vehicle’s pose deviates beyond this range, the method cannot complete the coordinate system reconstruction. To ensure reliability, guide lines need to be set up at the actual detection station so that the vehicle’s yaw angle and Y-axis translation do not exceed the acceptable limits, and front wheel stops are needed so that the X-axis translation does not exceed its limit. Because this study targets a manufacturer whose detection station limits the maximum camera working distance to 700 mm, all of the configurations described earlier assume this working distance; if the maximum working distance changes, the camera model needs to be reselected according to the actual requirements.
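In such a setup, the station can reject poses outside the verified range before attempting reconstruction. A minimal sketch of this gate is shown below, assuming the ±12.5 degree and ±50 mm limits verified in this paper; the exact thresholds would be tuned to the guide lines and wheel stops actually installed.

```python
MAX_YAW_DEG = 12.5    # verified rotation range around the Z-axis
MAX_OFFSET_MM = 50.0  # verified translation range along the X- and Y-axes

def pose_within_working_range(yaw_deg: float, dx_mm: float, dy_mm: float) -> bool:
    """Return True if the estimated vehicle pose lies inside the verified working range."""
    return (abs(yaw_deg) <= MAX_YAW_DEG
            and abs(dx_mm) <= MAX_OFFSET_MM
            and abs(dy_mm) <= MAX_OFFSET_MM)

# Example: a vehicle offset 30 mm along X and 12 mm along Y with 1.8 degrees of yaw is accepted,
# whereas a yaw of 15 degrees falls outside the verified range and is rejected.
assert pose_within_working_range(1.8, 30.0, 12.0)
assert not pose_within_working_range(15.0, 0.0, 0.0)
```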
Compared with traditional equipment, the method described in this paper is more extensible. Future work will focus on defining or obtaining features common to all vehicles, which would resolve the current applicability limitation and make the method valuable for broader applications.

6. Conclusions

Based on the actual production needs of the automobile manufacturer, this study explored a vehicle coordinate system reconstruction method based on feature point extraction. The feasibility of this method was verified through practical testing, and the following conclusions were obtained:
  • A white-body feature region detection model based on YOLOv5 was constructed, achieving an accuracy of 97.3%, a recall of 92.9%, and a mean average precision of 93.7% for feature region detection.
  • A feature point extraction algorithm based on template matching and ellipse fitting was developed, capable of extracting the target feature points within the detected feature regions (an illustrative sketch follows this list).
  • A vehicle coordinate system reconstruction method was designed for End-of-Line ADAS calibration. The vehicle alignment platforms currently in use achieve a reconstruction accuracy of 2 mm for translation along the X-axis, 1 mm for translation along the Y-axis, and 0.2 degrees for the rotation angle, whereas the proposed method is more accurate. Specifically, for a light-colored vehicle body the method achieves a rotational accuracy better than 0.15 degrees, an X-axis translation accuracy better than 0.6 mm, and a Y-axis translation accuracy better than 0.3 mm; for a dark-colored body it achieves a rotational accuracy better than 0.2 degrees, an X-axis translation accuracy better than 0.7 mm, and a Y-axis translation accuracy better than 0.3 mm.
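As referenced in the second conclusion, the feature point extraction step combines template matching with ellipse fitting. The sketch below is a minimal OpenCV-based illustration of that idea rather than the authors’ implementation: it assumes a grayscale image of a detected feature region and a pre-cropped feature template, locates the template, and fits an ellipse to the largest contour inside the matched patch (e.g., the fuel cap) to obtain the feature point.

```python
import cv2
import numpy as np

def extract_feature_center(region_gray: np.ndarray, template_gray: np.ndarray):
    """Locate the template in the feature region and return the fitted ellipse center (x, y)."""
    # 1) Template matching: find the best-matching patch in the detected feature region.
    result = cv2.matchTemplate(region_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    x0, y0 = max_loc
    h, w = template_gray.shape[:2]
    patch = region_gray[y0:y0 + h, x0:x0 + w]

    # 2) Edge detection and contour extraction inside the matched patch.
    edges = cv2.Canny(patch, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if len(c) >= 5]  # cv2.fitEllipse needs at least 5 points
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)

    # 3) Ellipse fitting: the ellipse center, mapped back to region coordinates, is the feature point.
    (cx, cy), _, _ = cv2.fitEllipse(largest)
    return x0 + cx, y0 + cy
```

A production pipeline would additionally apply the distortion correction, threshold segmentation, and lighting scheme described in the paper before this step.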

Author Contributions

Conceptualization, Z.D. and J.J.; methodology, Z.D.; software, J.J.; validation, J.J. and J.Z.; formal analysis, Z.D.; investigation, J.J.; resources, Z.D.; data curation, Z.D.; writing—original draft preparation, J.J.; writing—review and editing, Z.D.; visualization, L.K.; supervision, J.Z.; project administration, J.J.; funding acquisition, L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

If you need to use the dataset mentioned in the text, the feature point extraction algorithm code, or the overall implementation code, please contact us via email at [email protected] to obtain the necessary resources.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Definition of vehicle coordinate system.
Figure 2. Vehicle alignment platform. 1. V-groove; 2. Expanding lead screw.
Figure 3. Selected feature points.
Figure 4. Overall workflow diagram.
Figure 5. Camera installation position diagram.
Figure 6. Distortion correction effect diagram.
Figure 7. Threshold segmentation effect on a large area of the white vehicle.
Figure 8. Threshold segmentation effect on a large area of the dark vehicle in the same region.
Figure 9. Sample dataset images.
Figure 10. Model training results.
Figure 11. Actual detection effect diagram.
Figure 12. Feature template extraction.
Figure 13. Extraction results under different shooting and lighting angles.
Figure 14. Fuel cap center extraction effect diagram.
Figure 15. Camera coordinate system and vehicle coordinate system.
Table 1. Rotation experiment results for the white vehicle.

Serial Number | Group 1 Rotation Error (°) | Group 2 Rotation Error (°) | Group 3 Rotation Error (°)
0 | 0.06106456 | 0.011940902 | 0.101769954
1 | 0.12520578 | 0.078518433 | 0.222099227
2 | 0.05672865 | 0.015447324 | 0.182213674
3 | 0.06759493 | 0.120020601 | 0.060097901
4 | 0.20326134 | 0.080259201 | 0.355974318
5 | 0.04664799 | 0.014292221 | 0.012174529
6 | 0.138900167 | 0.041243078 | 0.001969286
7 | 0.148740236 | 0.199963316 | 0.104253871
8 | 0.160723236 | 0.190992423 | 0.097897069
9 | 0.117763513 | 0.154380578 | 0.030382058
Average Error | 0.11266304 | 0.090705808 | 0.116883189
Table 2. Translation experiment results for the white vehicle along the X-axis.

Serial Number | Group 1 Translation Error (mm) | Group 2 Translation Error (mm) | Group 3 Translation Error (mm)
1 | 1.212616 | 0.161628 | 0.297261
2 | 0.680244 | 0.24413 | 0.038846
3 | 0.192302 | 0.562938 | 0.717352
4 | 0.312884 | 0.647776 | 0.150448
5 | 0.288948 | 0.287786 | 0.730163
6 | 0.660793 | 0.3153 | 0.325741
7 | 0.860879 | 0.036497 | 0.368419
8 | 0.619974 | 0.093484 | 0.725637
9 | 0.100123 | 0.49675 | 0.742166
10 | 0.3524 | 0.01389 | 0.588189
11 | 0.374209 | 0.635626 | 0.603003
12 | 0.2896 | 0.512492 | 0.217116
13 | 0.555504 | 0.293146 | 0.518138
14 | 0.200913 | 0.424166 | 0.223727
15 | 1.626608 | 0.836264 | 0.638419
16 | 0.512087 | 0.202957 | 0.583547
17 | 0.078319 | 0.313625 | 0.20543
18 | 0.949803 | 0.69512 | 0.729247
19 | 0.590864 | 0.608455 | 0.20906
20 | 0.761421 | 0.745988 | 0.417868
Average Error | 0.56102455 | 0.4064009 | 0.45148885
Table 3. Translation experiment results for the white vehicle along the Y-axis.

Serial Number | Group 1 Translation Error (mm) | Group 2 Translation Error (mm) | Group 3 Translation Error (mm)
1 | 0.063245 | 0.240266 | 0.005953
2 | 0.16856 | 0.115381 | 0.049188
3 | 0.488055 | 0.012431 | 0.066972
4 | 0.286109 | 0.10641 | 0.077893
5 | 0.267347 | 0.113603 | 0.313601
6 | 0.163825 | 0.034446 | 0.254591
7 | 0.506093 | 0.128727 | 0.162956
8 | 0.240081 | 0.007877 | 0.04599
9 | 0.06413 | 0.115182 | 0.06651
10 | 0.604514 | 0.268209 | 0.007594
11 | 0.153304 | 0.24639 | 0.15872
12 | 0.281826 | 0.038554 | 0.037033
13 | 0.093734 | 0.113457 | 0.198007
14 | 0.260771 | 0.120136 | 0.350038
15 | 0.243542 | 0.183649 | 0.21578
16 | 0.170627 | 0.029727 | 0.389129
17 | 0.07739 | 0.143945 | 0.246755
18 | 0.107461 | 0.259571 | 0.160788
19 | 0.339932 | 0.049695 | 0.033095
20 | 0.372605 | 0.17314 | 0.048491
Average Error | 0.24765755 | 0.1250398 | 0.1444542
Table 4. Rotation experiment results for the dark brown vehicle.

Serial Number | Group 1 Rotation Error (°) | Group 2 Rotation Error (°) | Group 3 Rotation Error (°)
0 | 0.0290804 | 0.086217737 | 0.124777565
1 | 0.14910198 | 0.096245506 | 0.160051482
2 | 0.11235762 | 0.061321481 | 0.168438072
3 | 0.016245862 | 0.109398444 | 0.115871588
4 | 0.108355514 | 0.001538416 | 0.121698554
5 | 0.103137187 | 0.143563245 | 0.234350587
6 | 0.089882864 | 0.135435349 | 0.0297635
7 | 0.083852585 | 0.155285655 | 0.192473126
8 | 0.0680272 | 0.077534128 | 0.173592166
9 | 0.0403274 | 0.02620824 | 0.124777565
Average Error | 0.080036861 | 0.08927482 | 0.146779627
Table 5. Translation experiment results for the dark brown vehicle along the X-axis.

Serial Number | Group 1 Translation Error (mm) | Group 2 Translation Error (mm) | Group 3 Translation Error (mm)
1 | 0.6274918 | 0.5371849 | 0.115956498
2 | 1.1275836 | 0.7283951 | 0.988685804
3 | 0.3847291 | 0.4938176 | 0.125872326
4 | 0.5172836 | 0.6129483 | 0.189174608
5 | 0.6928473 | 0.8571934 | 0.106124861
6 | 0.4219857 | 0.3728419 | 0.241910168
7 | 0.7592846 | 0.6471825 | 0.250135184
8 | 0.3928471 | 0.5831927 | 0.174699325
9 | 0.5873921 | 0.2918374 | 0.870254966
10 | 1.0157392 | 0.7692831 | 0.800122681
11 | 0.3582917 | 0.1849372 | 0.382251901
12 | 1.3782956 | 1.0193847 | 0.685869685
13 | 0.2164829 | 0.4351829 | 0.506938138
14 | 0.5938174 | 1.1728391 | 0.332190276
15 | 1.1248397 | 0.6934812 | 0.674503059
16 | 0.6482917 | 1.2834917 | 0.056187558
17 | 0.7251938 | 0.5821948 | 0.116162652
18 | 0.2918374 | 1.3928475 | 0.292512886
19 | 0.9174826 | 0.4739184 | 0.656851104
20 | 0.4621849 | 0.6852937 | 0.151100448
Average Error | 0.66219509 | 0.690872405 | 0.385875206
Table 6. Translation experiment results for the dark brown vehicle along the Y-axis.

Serial Number | Group 1 Translation Error (mm) | Group 2 Translation Error (mm) | Group 3 Translation Error (mm)
1 | 0.063245 | 0.240266 | 0.005953
2 | 0.16856 | 0.115381 | 0.049188
3 | 0.488055 | 0.012431 | 0.066972
4 | 0.286109 | 0.10641 | 0.077893
5 | 0.267347 | 0.113603 | 0.313601
6 | 0.163825 | 0.034446 | 0.254591
7 | 0.506093 | 0.128727 | 0.162956
8 | 0.240081 | 0.007877 | 0.04599
9 | 0.06413 | 0.115182 | 0.06651
10 | 0.28543 | 0.268209 | 0.007594
11 | 0.73664 | 0.24639 | 0.15872
12 | 0.281826 | 0.038554 | 0.037033
13 | 0.093734 | 0.113457 | 0.198007
14 | 0.260771 | 0.120136 | 0.350038
15 | 0.243542 | 0.183649 | 0.21578
16 | 0.170627 | 0.029727 | 0.389129
17 | 0.07739 | 0.143945 | 0.246755
18 | 0.107461 | 0.259571 | 0.160788
19 | 0.339932 | 0.049695 | 0.033095
20 | 0.372605 | 0.17314 | 0.048491
Average Error | 0.26087015 | 0.1250398 | 0.1444542
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
