Article

An Autonomous Intelligent Liability Determination Method for Minor Accidents Based on Collision Detection and Large Language Models

1 School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China
2 Institute of Artificial Intelligence and Data Science, Xi’an Technological University, Xi’an 710021, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7716; https://doi.org/10.3390/app14177716
Submission received: 14 July 2024 / Revised: 30 August 2024 / Accepted: 30 August 2024 / Published: 1 September 2024

Abstract

With the rapid increase in the number of vehicles on the road, minor traffic accidents have become more frequent, contributing significantly to traffic congestion and disruptions. Traditional methods for determining responsibility in such accidents often require human intervention, leading to delays and inefficiencies. This study proposed a fully intelligent method for liability determination in minor accidents, utilizing collision detection and large language models. The approach integrated advanced vehicle recognition using the YOLOv8 algorithm coupled with a Minimum Output Sum of Squared Error (MOSSE) filter for real-time target tracking. Additionally, an improved global optical flow estimation algorithm and support vector machines were employed to accurately detect traffic accidents. Key frames from accident scenes were extracted and analyzed using the GPT4-Vision-Preview model to determine liability. Simulation experiments demonstrated that the proposed method accurately and efficiently detected vehicle collisions, rapidly determined liability, and generated detailed accident reports. The method achieved fully automated AI processing of minor traffic accidents without manual intervention, ensuring both objectivity and fairness.

1. Introduction

As global urbanization accelerates, car ownership has surged and urban traffic density has increased significantly [1,2,3], posing significant challenges to traffic management. This situation not only increases the frequency of minor traffic accidents, but also exacerbates traffic congestion. Although minor accidents typically do not result in casualties, the efficient and accurate management of these incidents is crucial for public safety and the smooth operation of transportation systems [4].
Recent advancements in machine learning and artificial intelligence have paved the way for new methods to predict and prevent traffic accidents. For instance, Kodapogu et al. [5] utilized machine learning techniques to predict the severity of road accidents, demonstrating the potential of computational tools in enhancing traffic safety proactively. Kan et al. [6] developed an integrated model that combined convolutional neural networks (CNN), bidirectional long short-term memory (BiLSTM), and an attention mechanism, which significantly enhanced highway traffic flow prediction and demonstrated the potential of advanced predictive models in improving traffic safety. Similarly, Muhammad et al. [7] proposed a dynamic spatial-temporal attention (DSTA) network that utilized dashcam video data to predict traffic accidents early by learning the spatial-temporal features, enhancing accident prevention preparedness.
However, traffic accidents continue to occur despite these advancements. When a minor traffic accident occurs, a primary cause of traffic congestion is the challenge in determining the liability of the involved parties. This process typically involves waiting for traffic police to arrive at the scene to formally assign liability [8]. During this time, the affected lanes remain blocked, exacerbating traffic delays and congestion. Consequently, there is a pressing need for more advanced technical solutions to address traffic accidents.
Currently, the majority of methods for managing minor accidents still depend heavily on manual review. One approach is to handle liability determination online to minimize manual intervention. For instance, the “Traffic Management 12123” mobile app, promoted by Qingdao City, enables individuals involved in minor road traffic accidents to resolve liability independently through an online process governed by traffic regulations [9]. Although this initiative has been somewhat successful, self-resolution rates remain below 20%, and the overall effectiveness of the process is still unsatisfactory.
Some scholars have proposed automatic accident liability determination schemes based on traffic accident collision detection [10,11,12,13]; however, the resulting liability determinations are not clear enough, and the specific proportion of accident liability cannot be accurately apportioned. Moreover, the scope of such liability determinations is quite limited and fails to address the complexities of dynamic traffic environments.
In summary, there is an urgent need to explore more efficient and precise technical methods to enhance the handling of accidents and alleviate both traffic congestion and the burden on traffic police.
Based on this foundation, our study introduced an autonomous method for determining liability in minor traffic accidents, leveraging collision detection and large language models. This method was designed to autonomously manage the rapid resolution of traffic incidents by precisely quantifying the proportion of responsibility for each party without human intervention. This capability aims to streamline traffic management, potentially reduce the need for on-site police intervention, and enable traffic authorities to allocate resources more effectively.

2. Methodology

This section outlines the computational approaches and algorithms employed to address the challenges identified in the scope of this research.

2.1. YOLOv8 Network

YOLOv8 (You Only Look Once v8) is a sophisticated, single-stage object detection network designed for the rapid and precise identification and localization of objects within images [14]. A key innovation in YOLOv8 is the C2f (Coarse-to-Fine) module, which significantly enhances the model’s capacity to discern complex patterns and intricate details in vehicle recognition tasks.
The C2f module operates by initially processing the input image through a convolutional layer, applying filters to detect fundamental features such as edges and textures. This layer generates a feature map that is subsequently divided into two segments. Each segment is further processed through distinct convolutional pathways, each capturing different levels of detail; one focuses on broader, coarser features while the other concentrates on finer, more specific details. Following this dual processing, the outputs are recombined, enabling the model to amalgamate both coarse and fine information to enhance the prediction accuracy.
This architecture enables the C2f module to extract a more comprehensive set of features from the image, proving especially advantageous in recognizing vehicles where both general shapes and specific details (such as logos or lights) are crucial [15]. By integrating these multi-scale features, the YOLOv8 network achieves enhanced accuracy in identifying vehicles, even under challenging conditions such as partial obscuration or variable lighting [16].
Figure 1 illustrates the structure of the C2f module [17], highlighting the process by which the input is divided, processed, and subsequently merged to enhance feature detection.
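For illustration, the following is a minimal sketch of this detection step using the Ultralytics YOLOv8 Python API; the weight file, image path, and class filter are illustrative assumptions rather than the exact configuration trained in this study:

```python
# Minimal vehicle-detection sketch with the Ultralytics YOLOv8 API.
# "yolov8n.pt" and "traffic_frame.jpg" are placeholders; this study trains
# its own model on the KITTI dataset (see Section 4.2).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # load pretrained detection weights
results = model("traffic_frame.jpg")       # single-image inference

for box in results[0].boxes:
    if model.names[int(box.cls)] == "car":     # keep vehicle detections only
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # corners of the tracking window
        print(f"vehicle at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), "
              f"confidence {float(box.conf):.2f}")
```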

2.2. “Minimum Output Sum of Squared Error” Tracking Algorithm

Bolme et al. [18] proposed the use of the MOSSE (Minimum Output Sum of Squared Error) filter for target tracking. This tracking algorithm demonstrates significant robustness in handling rotation, occlusion, and illumination changes while maintaining a high speed throughout the tracking process. The FPS (frames per second) comparison between the MOSSE tracker and other filter algorithms across a dataset of 50 videos is illustrated in Table 1.
The core of the MOSSE tracker is the minimization of the output sum of squared error across all training images. The mathematical model of the optimal filter H can be expressed as:
$$H^{*} = \arg\min_{H^{*}} \sum_{i} \left| F_{i} \odot H^{*} - G_{i} \right|^{2} \tag{1}$$
where $F_{i}$ represents the Fourier transform of the pre-processed training image $f_{i}$, $G_{i}$ is the Fourier transform of the desired output (a Gaussian function centered on the target), $H^{*}$ denotes the complex conjugate of the filter $H$, and $\odot$ indicates element-wise multiplication.
Equation (1) minimizes the squared difference between the filtered input image $F_{i} \odot H^{*}$ and the desired Gaussian output $G_{i}$, leading to an optimal filter that best matches the target appearance.
The MOSSE filter is updated iteratively based on the current frame to maintain tracking accuracy. The updated position of the target is determined by calculating the correlation map $C_{g}$ using the inverse Fourier transform:
$$C_{g} = \mathcal{F}^{-1}\left( F \odot H^{*} \right) \tag{2}$$
where $\mathcal{F}^{-1}$ represents the inverse Fourier transform, $C_{g}$ is the correlation map indicating the location of the tracked object, and $F$ is the Fourier transform of the current frame.
The position of the target in the current frame is identified as the peak of the correlation map $C_{g}$, given by:
$$\left( x_{new}, y_{new} \right) = \arg\max C_{g} \tag{3}$$
Here, $(x_{new}, y_{new})$ represents the coordinates of the target in the current frame, and $\arg\max C_{g}$ finds the coordinates where $C_{g}$ is maximized, indicating the target’s location.

2.3. Horn and Schunck Algorithm

The HS (Horn and Schunck) algorithm, extensively used for estimating optical flow vector fields [20,21,22], computes velocity vectors that indicate each pixel’s direction and speed between consecutive images in a sequence. Operating under the assumption of smooth motion within images, this algorithm calculates the optical flow vector field by minimizing an energy functional. The energy functional is composed of two components: the data term $(I_{x}u + I_{y}v + I_{t})^{2}$ and the smoothing term $\alpha^{2}\left( \lVert \nabla u \rVert^{2} + \lVert \nabla v \rVert^{2} \right)$, expressed as:
$$E = \iint \left[ \left( I_{x}u + I_{y}v + I_{t} \right)^{2} + \alpha^{2}\left( \lVert \nabla u \rVert^{2} + \lVert \nabla v \rVert^{2} \right) \right] dx\,dy \tag{4}$$
In this formulation, $I_{x}$ and $I_{y}$ represent the spatial derivatives of the image brightness in the horizontal ($x$) and vertical ($y$) directions, respectively, while $I_{t}$ denotes the temporal derivative. The variables $u$ and $v$ correspond to the horizontal and vertical components of the optical flow vector, respectively. The regularization parameter $\alpha$ balances the influence of the data term against the smoothing term, where $\lVert \nabla u \rVert^{2} + \lVert \nabla v \rVert^{2}$ imposes a smoothness constraint on the optical flow field. Consequently, at time $t$, the optical flow vector at a given image coordinate $(x, y)$ can be represented as $(u(x, y, t), v(x, y, t))$.
Estimating the optical flow vector field involves minimizing the energy functional presented in Equation (4). To achieve this, the Euler–Lagrange equation is applied to minimize $E(u, v)$, resulting in a system of partial differential equations [23]. This system is then solved iteratively to derive the optical flow vector $(u(x, y, t), v(x, y, t))$.
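As a concrete reference for this iterative solution, the following is a compact sketch of the classic Horn–Schunck scheme, where each iteration applies the Euler–Lagrange update rule; the derivative approximations and parameter values are common textbook choices, not the exact settings of this study:

```python
# Classic Horn-Schunck sketch: iterate
#   u <- u_avg - Ix * (Ix*u_avg + Iy*v_avg + It) / (alpha^2 + Ix^2 + Iy^2)
# (and symmetrically for v) until convergence.
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(f1, f2, alpha=1.0, n_iter=100):
    """Estimate the optical flow (u, v) between two grayscale frames."""
    f1 = f1.astype(np.float64)
    f2 = f2.astype(np.float64)
    # finite-difference derivatives of image brightness, averaged over frames
    Ix = (np.gradient(f1, axis=1) + np.gradient(f2, axis=1)) / 2
    Iy = (np.gradient(f1, axis=0) + np.gradient(f2, axis=0)) / 2
    It = f2 - f1                                 # temporal derivative
    avg = np.array([[1.0, 2.0, 1.0],
                    [2.0, 0.0, 2.0],
                    [1.0, 2.0, 1.0]]) / 12.0     # neighbourhood-average kernel
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_avg = convolve(u, avg)
        v_avg = convolve(v, avg)
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * common
        v = v_avg - Iy * common
    return u, v
```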

2.4. Multimodal Large Language Model

A multimodal large language model is an artificial intelligence model that can process multiple data types (such as text, images, and audio) [24,25]. Unlike traditional models that process a single modality (such as text only), multimodal models can process multiple types of inputs simultaneously, thus having greater adaptability and flexibility in generating and understanding outputs for complex real-world scenarios [26,27].
GPT-4 [28], released by OpenAI, has demonstrated excellent performance on several professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers, demonstrating the feasibility of applying large language models in the field of legal judgment. With the launch of the GPT4-vision-preview (GPT4-V) model, multimodal large language models have made new progress in fusing text and images. By combining textual cues with a frame-by-frame analysis of video content, the model can infer the logical development of events from an image sequence and conduct in-depth decision analysis.

3. Fully Intelligent Accident Liability Determination Method Based on Collision Detection

This section introduces an advanced methodology for determining accident liability utilizing fully intelligent systems. The input for the proposed method is video captured by traffic road cameras; the overall process plan in our study is outlined as follows:
1. Vehicle recognition and tracking: Utilize the YOLOv8 algorithm for high-precision vehicle recognition, followed by real-time target tracking with the MOSSE filter.
2. Accident detection: Calculate the amplitude change in the global optical flow for each frame at the time of the accident and establish a threshold to extract features indicative of violent or abnormal motion patterns. These features are subsequently input into a trained SVM (support vector machine) to detect traffic accidents.
3. Liability determination: Extract key frames of the accident using the K-means algorithm. These key frames, along with descriptive information, are passed into a large language model for in-depth analysis. The system then automatically generates an accident liability determination report.
The modular process of the fully intelligent accident liability determination method, based on collision detection, is depicted in Figure 2.

3.1. Vehicle Recognition and Tracking

Accurate target recognition and effective tracking are crucial for the success of our proposed system in collision detection and liability determination. The process commences with the recognition of vehicles in traffic road videos using the trained YOLOv8 network model. The YOLOv8 model scans video frames to generate a tracking window that accurately outlines the recognized vehicle. This tracking window forms the basis for the subsequent tracking process, ensuring consistent monitoring of the identified vehicle across multiple frames.
Once the tracking window is established, the next critical step involves preprocessing the pixel values within this window to enhance the tracking filter’s accuracy. Initially, each pixel value $p$ in the tracking window undergoes a logarithmic transformation to yield $p'$:
$$p' = \log\left( 1 + p \right) \tag{5}$$
Subsequently, $p'$ is normalized according to Formula (6):
$$\hat{p} = \frac{p' - \mu}{\sigma_{p'}} \tag{6}$$
where $\hat{p}$ represents the normalized pixel value, with $\mu$ and $\sigma_{p'}$ being the mean and standard deviation of the pixel values post-logarithmic transformation, respectively.
To focus on the center of the tracking window, a cosine window function is applied to taper the pixel values toward the edges. If the window $W$ is an $N \times M$ matrix, the cosine window function can be expressed as:
$$W(n, m) = 0.5\left( 1 - \cos\frac{2\pi n}{N} \right) \cdot 0.5\left( 1 - \cos\frac{2\pi m}{M} \right) \tag{7}$$
where $n = 0, 1, \ldots, N-1$ and $m = 0, 1, \ldots, M-1$ are the row and column indices of the pixel in the window, respectively. The final preprocessed pixel value is obtained using Formula (8):
$$P_{final}(n, m) = \hat{p}(n, m) \cdot W(n, m) \tag{8}$$
$P_{final}$ at the pixel location $(n, m)$ forms a two-dimensional cosine window, peaking at the center and diminishing to zero at the edges. This preprocessing step effectively converts and optimizes the image data to meet the tracking algorithm’s requirements, enhancing the central features of the tracked target while minimizing edge interference.
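A minimal sketch of this preprocessing chain, following Formulas (5)–(8), might look as follows (the small epsilon guarding against division by zero is an implementation assumption):

```python
# Preprocessing sketch for a tracking window: log transform, normalization,
# and two-dimensional cosine windowing, per Formulas (5)-(8).
import numpy as np

def preprocess_window(window):
    """window: 2-D array of pixel values cut out by the YOLOv8 tracking box."""
    p = np.log1p(window.astype(np.float64))        # p' = log(1 + p), Eq. (5)
    p = (p - p.mean()) / (p.std() + 1e-5)          # normalization, Eq. (6)
    n_rows, n_cols = p.shape
    # separable 2-D cosine window, peaking at the centre of the window
    row_win = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n_rows) / n_rows))
    col_win = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n_cols) / n_cols))
    return p * np.outer(row_win, col_win)          # P_final(n, m), Eqs. (7)-(8)
```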

3.2. Vehicle Target Tracking Based on Minimum Error Square Filter

For the training image $f_{i}$, assuming that the center of the tracking target is $(x_{0}, y_{0})$, each pixel value of the desired output $g_{i}$ at $(x, y)$ is determined by the Gaussian response function:
$$g_{i}(x, y) = A \exp\left( -\frac{(x - x_{0})^{2} + (y - y_{0})^{2}}{2\sigma^{2}} \right) \tag{9}$$
where $A$ is the peak amplitude and $\sigma$ is the standard deviation of the Gaussian distribution, which controls the width of the response peak.
By solving the optimization problem $H^{*} = \arg\min_{H^{*}} \sum_{i} \lvert F_{i} \odot H^{*} - G_{i} \rvert^{2}$, we can derive the analytical solution for the filter:
$$H^{*} = \frac{\sum_{i} G_{i} \odot F_{i}^{*}}{\sum_{i} F_{i} \odot F_{i}^{*}} \tag{10}$$
Here, $F_{i} \odot F_{i}^{*}$ represents the element-wise product of $F_{i}$ with its complex conjugate, and the division is also performed element-wise.
Combined with Formula (2), the location of the maximum value in the correlation map $C_{g}$ indicates the updated position $(x_{new}, y_{new})$ of the tracked target in the current frame. When the appearance of the tracking target is obscured or altered, the filter $H$ can be updated online based on the updated position:
$$H_{new}^{*} = \alpha \frac{G \odot F^{*}}{F \odot F^{*}} + (1 - \alpha) H_{prev}^{*} \tag{11}$$
where $\alpha$ is the learning rate parameter, controlling the speed and extent of the updates.
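The following sketch ties Formulas (9)–(11) together in the Fourier domain; the small regularization constants and the localization helper based on Formulas (2) and (3) are implementation assumptions:

```python
# MOSSE filter sketch: closed-form training (Eq. 10), online update (Eq. 11),
# and peak localization via the correlation map (Eqs. 2 and 3).
import numpy as np

def gaussian_response(shape, center, sigma=3.0):
    """Desired output g_i: a Gaussian peaked at the target center, Eq. (9)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((xs - center[0])**2 + (ys - center[1])**2) / (2 * sigma**2))

def train_filter(windows, centers):
    """Closed-form H* (Eq. 10) from preprocessed training windows f_i."""
    num = np.zeros(windows[0].shape, dtype=complex)
    den = np.zeros(windows[0].shape, dtype=complex)
    for f, c in zip(windows, centers):
        F = np.fft.fft2(f)
        G = np.fft.fft2(gaussian_response(f.shape, c))
        num += G * np.conj(F)                  # sum_i G_i (element-wise) F_i*
        den += F * np.conj(F)                  # sum_i F_i (element-wise) F_i*
    return num / (den + 1e-5)                  # H*

def update_filter(H_star_prev, window, center, lr=0.22):
    """Online update (Eq. 11); lr is the learning rate alpha."""
    F = np.fft.fft2(window)
    G = np.fft.fft2(gaussian_response(window.shape, center))
    H_star = G * np.conj(F) / (F * np.conj(F) + 1e-5)
    return lr * H_star + (1 - lr) * H_star_prev

def locate(H_star, window):
    """Correlation map C_g = F^{-1}(F (element-wise) H*) and its peak."""
    C = np.real(np.fft.ifft2(np.fft.fft2(window) * H_star))
    y, x = np.unravel_index(np.argmax(C), C.shape)
    return (x, y), C
```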

3.3. Accident Identification Based on Collision Detection

In Section 3.2, a target tracking window was established for each frame. Collision detection is then performed based on the changes in all pixel values within each tracking window.

3.3.1. Calculation of the Amplitude Change in the Optical Flow Vector

The HS algorithm was used to derive the energy functional of the optical flow vector as described in Equation (4). In certain image areas, particularly where gradient values are low, the optical flow constraints may not be valid, potentially resulting in unreliable optical flow calculations. Furthermore, the smoothness constraint does not apply in directions perpendicular to image boundaries, which can cause excessive smoothing and diminish critical image details [29]. Based on these observations, we propose an improved method that dynamically adjusts the smoothing parameter to accommodate changes in image content. A predefined threshold $T$ defines the dynamic smoothing factor $\lambda$ as follows:
$$\lambda(x, y) = \begin{cases} 1 & \text{if } \left| I_{x}u + I_{y}v + I_{t} \right| > T \\ 0.5 & \text{otherwise} \end{cases} \tag{12}$$
The resulting improved energy functional is defined as:
$$E(u, v) = \iint \left[ \left( I_{x}u + I_{y}v + I_{t} \right)^{2} + \alpha^{2} \lambda(x, y) \left( \lVert \nabla u \rVert^{2} + \lVert \nabla v \rVert^{2} \right) \right] dx\,dy \tag{13}$$
The optical flow vector $(u(x, y, t), v(x, y, t))$ is derived by solving the Euler–Lagrange equation. Abnormal flow, potentially indicative of a collision, is detected by analyzing the amplitude changes in the optical flow vector, calculated as:
$$m(x, y, t) = \sqrt{u(x, y, t)^{2} + v(x, y, t)^{2}} - \sqrt{u(x, y, t-1)^{2} + v(x, y, t-1)^{2}} \tag{14}$$
A threshold $\beta$ is used to construct a binary significance index $b(x, y, t)$, which indicates significant amplitude changes:
$$b(x, y, t) = \begin{cases} 1 & \text{if } m(x, y, t) > \beta \\ 0 & \text{otherwise} \end{cases} \tag{15}$$
Each frame generates a binary significance map $b(x, y, t)$, which reflects the significance of the change in the amplitude of the optical flow at each pixel.
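A minimal sketch of Formulas (14) and (15), given per-frame flow fields from the improved estimator (the default threshold follows the experimental setting β = 0.35 reported in Section 4.4.2):

```python
# Binary significance map sketch: frame-to-frame change in optical-flow
# magnitude (Eq. 14), thresholded at beta (Eq. 15).
import numpy as np

def binary_significance(flow_prev, flow_curr, beta=0.35):
    """flow_*: (u, v) pairs of 2-D arrays for frames t-1 and t."""
    mag_prev = np.sqrt(flow_prev[0]**2 + flow_prev[1]**2)
    mag_curr = np.sqrt(flow_curr[0]**2 + flow_curr[1]**2)
    m = mag_curr - mag_prev                  # amplitude change m(x, y, t)
    return (m > beta).astype(np.uint8)       # b(x, y, t)
```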

3.3.2. Collision Detection Classification Based on SVM

For the entire video sequence, which contains $S$ frames, the average binary significance value is calculated for each pixel position across all frames:
$$\bar{b}(x, y) = \frac{1}{S} \sum_{t=1}^{S} b(x, y, t) \tag{16}$$
Each video frame is divided into $n$ non-overlapping blocks, and the distribution of significant amplitude changes is independently quantified within each block. For each block, the histogram of the saliency index $\bar{b}(x, y)$ is calculated, representing the proportion of significant pixel positions within the block.
The histogram results of all blocks are concatenated to construct a feature vector:
$$V_{SVM} = \left[ h_{1}, h_{2}, \ldots, h_{i}, \ldots, h_{n} \right] \tag{17}$$
Here, $h_{i}$ represents the histogram of the $i$-th block. Finally, the feature vector $V_{SVM}$ is input into a trained SVM model [30], and the classifier assesses whether there are abnormal events, such as vehicle collisions, in the video sequence based on these features.
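The following sketch assembles Formulas (16) and (17) with a 4 × 4 block grid, matching the 16-dimensional feature vector described in Section 4.4.2; representing each block histogram by the proportion of significant pixels is a simplifying assumption, and the classifier is assumed to be a scikit-learn SVC:

```python
# Feature-vector sketch for collision classification: average the per-frame
# significance maps (Eq. 16), split into non-overlapping blocks, and
# concatenate the per-block statistics (Eq. 17).
import numpy as np

def svm_feature_vector(b_maps, grid=(4, 4)):
    """b_maps: list of binary significance maps b(x, y, t), one per frame."""
    b_mean = np.mean(b_maps, axis=0)                   # \bar{b}(x, y), Eq. (16)
    feats = []
    for rows in np.array_split(b_mean, grid[0], axis=0):
        for block in np.array_split(rows, grid[1], axis=1):
            feats.append(block.mean())                 # h_i: share of significant pixels
    return np.asarray(feats)                           # V_SVM, Eq. (17)

# usage with a pre-trained classifier `svm` (e.g., sklearn.svm.SVC):
#   label = svm.predict(svm_feature_vector(b_maps).reshape(1, -1))
```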

3.4. Responsibility Determination Method Based on Multimodal Large Language Model

The liability determination method we proposed operates autonomously by first extracting key frames from the video data and then combining these with relevant textual information. It utilizes a large language model to automatically generate an accident liability determination report.

3.4.1. Key Frames Extraction Based on K-Means Algorithm

Considering that accident responsibility determination relies on comprehensive process information about the event, we employed a key frame extraction method based on K-means clustering. This method treats each frame of the video as a sample so as to comprehensively describe the motion process, avoiding the loss of overall motion patterns that can result from focusing only on local features.
For each video frame, we extracted a feature vector that captured its essential visual information. Specifically, the feature vector originated from the color histogram of the frame, computed in the HSV color space. This space is more perceptually uniform than the RGB color space and less sensitive to lighting variations.
Let $f_{i}$ represent the $i$-th video frame; the feature vector of this frame is calculated as follows:
$$F_{HSV,i} = \mathrm{HSV}(f_{i}) \tag{18}$$
where $F_{HSV,i}$ represents the frame $f_{i}$ converted to the HSV color space.
Next, calculate the color histogram $h_{HSV,i}$ of $F_{HSV,i}$:
$$h_{HSV,i} = \mathrm{Hist}\left( F_{HSV,i}, \text{bins} = (b_{H}, b_{S}, b_{V}) \right) \tag{19}$$
Here, $h_{HSV,i}$ is the histogram of the frame $f_{i}$, with $(b_{H}, b_{S}, b_{V})$ representing the number of bins for the hue, saturation, and value channels, respectively. In our implementation, we used $b_{H} = b_{S} = b_{V} = 8$, resulting in a 512-dimensional histogram.
This histogram $h_{HSV,i}$ was normalized to ensure that the feature vector was invariant to the number of pixels in the frame, and then flattened into a one-dimensional vector $V_{i}$:
$$V_{i} = \frac{h_{HSV,i}}{\lVert h_{HSV,i} \rVert_{2}} \tag{20}$$
Figure 3 illustrates a schematic diagram of the described process.
Given a feature set $\{V_{1}, V_{2}, \ldots, V_{N_{v}}\}$ of video frames, where $N_{v}$ is the total number of frames in the video, these frames are divided into $K$ clusters to maximize intra-cluster similarity and minimize inter-cluster similarity. Define the center of each cluster as $C_{k}$, where $k = 1, 2, \ldots, K$. To minimize the sum of the distances between the frame features and their corresponding centers within a cluster, the objective function is defined as:
$$J = \min \sum_{k=1}^{K} \sum_{V_{t} \in C_{k}} \lVert V_{t} - C_{k} \rVert^{2} \tag{21}$$
where $\lVert V_{t} - C_{k} \rVert^{2}$ represents the square of the Euclidean distance between the feature vector $V_{t}$ of frame $t$ and the center $C_{k}$ of its cluster.
In the solution process, the cluster centers $C_{k}$ are initialized and iteratively updated until convergence is achieved. Each frame feature $V_{t}$ is assigned to the nearest cluster by minimizing the squared Euclidean distance to the corresponding cluster center $C_{k}$. The mean of the feature vectors from all frames in each cluster is then recalculated to define the new cluster center $C_{k}$.
Upon the completion of clustering, each cluster represents a “scene”, or a collection of frames with similar content in the video. The frame closest to the cluster center $C_{k}$ in each cluster is selected as the representative of that cluster, designated as the key frame.
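A minimal end-to-end sketch of this key frame extraction step, using OpenCV for the HSV histograms of Formulas (18)–(20) and scikit-learn’s K-means for Formula (21); the library choices and the epsilon guard are assumptions:

```python
# Key-frame extraction sketch: 8-bin-per-channel HSV histograms clustered
# with K-means; the frame nearest each cluster center is the key frame.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def frame_feature(frame_bgr):
    """512-dimensional, L2-normalized HSV color histogram (Eqs. 18-20)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 8],
                        [0, 180, 0, 256, 0, 256]).flatten()
    return hist / (np.linalg.norm(hist) + 1e-10)       # V_i

def extract_key_frames(frames, k=8):
    """Return the indices of the k frames closest to their cluster centers."""
    V = np.stack([frame_feature(f) for f in frames])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(V)
    key_idx = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(V[members] - km.cluster_centers_[c], axis=1)
        key_idx.append(int(members[np.argmin(dists)])) # frame nearest C_k
    return sorted(key_idx)
```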

3.4.2. GPT4-V API Access

The key frames were encoded as JPEG images in time-series order, converted into base64-encoded strings, and stored in the base64Frames list.
A prompt containing the analysis request is then defined, directing the model to generate an accident responsibility analysis report. The report should comprehensively cover aspects including, but not limited to, the time, location, vehicles involved, accident process, and possible causes. Drawing on the accident description report, applicable laws and regulations, and precedents, the GPT4-V model can generate a judgment that clarifies the accident responsibility and apportions liability between the involved parties.
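The request might be assembled as in the sketch below, using the OpenAI Python client; the file names and prompt wording are illustrative assumptions, not the exact prompt used in this study:

```python
# Sketch of a GPT4-V request: key frames as base64 JPEG images plus a
# text prompt asking for an accident liability report.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

base64Frames = []
for path in ["keyframe_0.jpg", "keyframe_1.jpg"]:   # key frames in time order
    with open(path, "rb") as f:
        base64Frames.append(base64.b64encode(f.read()).decode("utf-8"))

content = [{"type": "text",
            "text": "These key frames show a minor traffic accident in time "
                    "order. The black vehicle was traveling fast. Generate an "
                    "accident liability report covering the time, location, "
                    "vehicles involved, accident process, and possible causes, "
                    "and apportion liability between the parties."}]
content += [{"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b}"}}
            for b in base64Frames]

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{"role": "user", "content": content}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```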
The proposed autonomous accident liability determination process is depicted in Figure 4.

4. Experimental Analysis

In this section, we analyze the stages of the algorithm framework including vehicle recognition, vehicle tracking, collision detection, and responsibility determination.

4.1. Experimental Parameter Settings

The environment configuration of our study is shown in Table 2.

4.2. Vehicle Recognition Accuracy

Vehicle recognition training was conducted on the KITTI dataset, which is co-sponsored by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago specifically for research in autonomous driving. The dataset comprises 7480 images, categorized into three labels: vehicles, pedestrians, and bicycles. It was partitioned into training, validation, and test sets in an 8:1:1 ratio.
During the training process, the settings are detailed in Table 3.
This section focuses on the recognition of vehicle categories. After training, the model achieved a vehicle recognition accuracy of 0.894, a recall rate of 0.882, an mAP@50 of 0.938, and an mAP@50-95 of 0.735. These results demonstrate that the model performed robustly in vehicle recognition, accurately and comprehensively detecting most instances. Figure 5 illustrates the training curves for the model’s precision, recall, combined precision–recall, and F1-score, each evaluated at various confidence thresholds. These metrics were evaluated across three categories: cars, pedestrians, and cyclists.
To assess the model’s performance under diverse environmental conditions, we utilized the test set of the UA-DETRAC dataset, encompassing weather conditions such as sunny, cloudy, rainy, and night. This dataset offers a robust testing environment to assess the adaptability and performance of the model across various scenarios. Figure 6 displays representative scenarios we selected from the UA-DETRAC dataset.
The test set was segmented based on weather conditions, and the performance of the trained YOLOv8 model was assessed across each weather-specific dataset. The evaluation metrics employed included recall, precision, F1-score, and average precision (AP). Datasets were categorized into sunny, cloudy, rainy, and night conditions, containing 740, 736, 245, and 510 images, respectively. The vehicle recognition performance is summarized in Table 4.
The results demonstrate that the YOLOv8 model consistently delivered high recognition performance across various weather conditions. Notably, the performance slightly diminished during nighttime, potentially due to the limited representation of night scenes in the KITTI dataset.

4.3. Tracking Performance

Our experiments determined that a standard deviation of $\sigma = 3$ for the Gaussian function in Equation (9) yielded the best results for the training images $f_{i}$. Furthermore, the learning rate $\alpha$ was optimized to 0.22 to enhance the filter performance.
To assess the real-time tracking performance, the PSR (peak sidelobe ratio) was recorded for each frame. The PSR evaluates the clarity and accuracy of the adaptive processing output by contrasting the peak’s statistical properties, ideally located at the target position, with those of the surrounding sidelobe area. For analysis, a 10 × 10 window centered at the latest position $(x_{new}, y_{new})$ was defined, with the remaining area designated as the sidelobe area. The PSR for each frame was calculated as follows:
$$PSR = \frac{P_{max} - \mu_{psr}}{\sigma_{psr}} \tag{22}$$
where $P_{max}$ is the correlation peak value, $\mu_{psr}$ is the mean of the pixel values in the sidelobe area, and $\sigma_{psr}$ is their standard deviation. We conducted tracking tests on 153 videos collected from public Internet searches for vehicle collisions; the videos were devoid of any personally identifiable information or visible license plate details, thus ensuring adequate privacy protection. Following testing, a PSR below 6 was considered indicative of tracking failure, necessitating either halting the online updates or reinitializing the target.
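A sketch of the PSR computation of Formula (22) on a correlation map; the boundary clamping and the small epsilon are implementation assumptions:

```python
# PSR sketch: compare the correlation peak against the mean and standard
# deviation of the sidelobe region (everything outside a 10 x 10 window
# around the detected peak), per Eq. (22).
import numpy as np

def psr(C, peak_xy, win=10):
    """C: correlation map; peak_xy: (x, y) of the detected peak."""
    x, y = peak_xy
    half = win // 2
    mask = np.ones(C.shape, dtype=bool)
    mask[max(0, y - half):y + half, max(0, x - half):x + half] = False
    sidelobe = C[mask]
    return (C[y, x] - sidelobe.mean()) / (sidelobe.std() + 1e-5)

# in this study, a PSR below 6 is treated as a tracking failure
```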
Taking a vehicle collision video as an example, we recorded the PSR in 70 video frames, as shown in Figure 7.
Calculations indicated that the average PSR was 9.91. Although the PSR significantly dropped at the moment of collision, it remained above 8, suggesting that the tracking performance of this method is effective.

4.4. Collision Detection Efficacy

The collision detection stage employed the improved HS algorithm and SVM classification. The analysis begins with these two models to evaluate the results.

4.4.1. Effectiveness of the Improved HS Algorithm

Take two frames of a traffic road video as an example. The comparison of the optical flow vectors obtained by the improved HS algorithm and the classic algorithm is shown in Figure 8, where red arrows represent the optical flow vectors.
In Figure 8a, using the original Horn and Schunck algorithm, the optical flow vectors displayed discontinuities in multiple areas, particularly around the car on the left side of the image. In contrast, the results from the improved algorithm (Figure 8b) showed smoother and more coherent optical flow vectors, particularly along the vehicle’s edges, thereby reducing errors.
The comparison between the improved and classic algorithms was analyzed through the average endpoint error (AEE), angular error, and FPS, with the performance results detailed in Table 5.
Although some computational efficiency was sacrificed, the improved algorithm significantly reduced the AEE and slightly reduced the angular error, thereby increasing the accuracy of optical flow estimation.

4.4.2. Performance of SVM Collision Detection

When calculating the amplitude change in the optical flow vector, each frame was first resized to 134 × 100 pixels, and a threshold of β = 0.35 was applied according to Formula (15) to construct the feature vector. The image was then divided into 4 × 4 non-overlapping cells, and the amplitude change frequency was independently collected for each cell. The amplitude change distribution within each cell was represented by a fixed-size histogram. These histograms were subsequently concatenated into a 16-dimensional feature vector.
To create a training set for the SVM, we compiled a dataset of 178 videos sourced from publicly accessible Internet content, ensuring that they were devoid of personally identifiable information or sensitive data. Of these, 121 videos were classified as non-accident data, while the remaining videos were labeled as accident data. During training, 10% of the videos were designated as a test set. The feature vectors extracted from these videos were utilized as input features for training the SVM.
The grid search technique was employed to optimize the SVM hyperparameters. The candidate values for the regularization parameter C were {0.1, 1, 10, 100}, while the candidate values for the kernel coefficient gamma were {scale, auto, 0.01, 0.1, 1}. The kernel functions considered were {linear, rbf, poly}. A 5-fold cross-validation strategy was adopted, where the model was trained and evaluated on five different training and validation splits for each parameter combination.
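This search can be reproduced almost verbatim with scikit-learn, as in the sketch below; the synthetic stand-ins for the 16-dimensional feature vectors and labels are illustrative assumptions:

```python
# Grid-search sketch matching the hyperparameter space described above,
# with 5-fold cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# illustrative stand-ins for the 16-dimensional feature vectors and labels
rng = np.random.default_rng(0)
X_train = rng.random((160, 16))
y_train = rng.integers(0, 2, 160)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", "auto", 0.01, 0.1, 1],
    "kernel": ["linear", "rbf", "poly"],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)  # this study reports C=0.1, gamma='scale', kernel='poly'
```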
The regularization parameter C is crucial in controlling the trade-off between minimizing errors on the training data and reducing the norm of the model’s weights. A larger value of C leads to a closer fit to the training data, potentially causing overfitting, while a smaller value promotes better generalization at the possible expense of training set accuracy.
The gamma parameter, also known as the kernel coefficient, determines the influence of individual training examples. A low gamma value yields a broader, smoother decision boundary by considering points far from it, while a high gamma value enables the model to capture more complex patterns, which can also lead to overfitting.
Figure 9 provides a comprehensive view of the model performance on different data subsets.
The grid search results in Figure 9 highlight the interaction of these hyperparameters to achieve the best model performance. The performance reached its peak when C was set to 0.1 and gamma was set to ‘scale’, suggesting that this configuration enables the model to balance flexibility and regularization effectively. This equilibrium is crucial because the dataset for collision detection likely contains noise and variability typical of real-world scenarios. The chosen parameters ensure that the model is adequately robust to handle such noise without being overly sensitive to it. The polynomial kernel’s success in this context also indicates that the relationships between features in the dataset are not strictly linear. Its ability to effectively capture higher-order feature interactions enhances the SVM’s ability to classify collision and non-collision frames more effectively. This ability is particularly important in the context of accident detection, where the interactions between variables, such as the speeds and trajectories of different vehicles, are often complex and nonlinear.
The model exhibited strong performance on the test set with this parameter combination, achieving an AP score of 0.756, a precision of 75%, an accuracy of 75%, and a recall of 1.0. The precision–recall curve is illustrated in Figure 10.
The average precision–recall score was 0.88. Initially, the precision was very high. As the recall rate increased, the precision decreased but remained relatively high overall. This demonstrated the model’s strong capability in recognizing positive samples (traffic accidents). It is important to note that within the limited test set, our model achieved a perfect recall of 1.0, ensuring that all positive samples were correctly identified. Although there is potential for improving precision, this approach remains effective as a screening tool, ensuring that no collision incidents are overlooked. Additionally, the simplicity and low computational resource requirements make this method practical for real-world applications.

4.5. Accident Liability Assessment

In this section, we will demonstrate the practical effectiveness of the proposed method in this paper through a typical traffic accident case, thereby further validating its applicability in real-world scenarios.

4.5.1. Key Frame Extraction Evaluation

A real traffic accident video from Sohu News’ official reports was used for analysis. This publicly accessible video depicts a collision between two vehicles on the road, capturing several key moments before, during, and after the accident [31]. We extracted frames 80 to 90 from this video to test the effectiveness of our algorithm in extracting key frames. These frames contained the main details of the accident. Figure 11 illustrates the sequence of these original frames and the corresponding clustering results.
To analyze the video frames effectively, we utilized the color histogram features of each frame as inputs for the K-means clustering algorithm. To facilitate a clear visualization of these clustering results, we applied PCA (principal component analysis) to reduce the high-dimensional color histogram features down to two dimensions. This reduction allowed us to construct a more interpretable clustering distribution map. Figure 12 illustrates the distribution of video frames within the feature space, where different colors correspond to frames that belong to different clusters.
As specified in Formula (21), the number of clusters was set to eight. Consequently, eight key frames were identified and extracted, representing distinct stages or events within the video. These key frames are displayed in Figure 13.

4.5.2. Accident Liability Determination Results

Consider the typical traffic accident case depicted in Figure 13. In this incident, the white car suddenly braked to avoid a pedestrian crossing the road. This action caused the closely following black car to collide with the white car. Due to the failure to maintain a safe distance, both the black and white cars were damaged. A truck also executed a 180-degree rotation during the incident but was undamaged.
The challenge in determining liability lies in assessing whether the white car’s emergency braking should be deemed responsible. This decision is contentious. One argument is that the white car’s emergency braking led to the rear-end collision with the black car, warranting secondary responsibility. Conversely, the opposing view is that the white car, having executed an emergency brake, should not be held liable. Regarding this dispute, the traffic police conducted a thorough investigation and ultimately concluded that the black vehicle, having failed to maintain a safe distance, should bear full responsibility.
When submitting the key frame to the GPT4-V API, the supplementary text included only a description of the vehicle’s speed: “The black vehicle was traveling fast”. Based on this input, the model generated the following liability determination report, as shown in Table 6:
The result in this study for the typical traffic accident case aligned perfectly with the conclusion reached by the traffic police.

4.6. Analysis of the Impact of Image and Text Information on Liability Determination Results

In real-world judgment scenarios, statements from the parties involved in an accident can serve as supplementary textual information. Given the inherent subjectivity of these statements, they often carry strong biases that may influence the model’s judgment. When the extracted key frames do not clearly depict the accident scene, this supplementary textual information becomes critical for determining judgment outcomes. To examine this effect, we conducted two experiments.
Experiment 1: We aimed to explore the influence of subjective statements from the parties involved in the accident on the model’s judgment. The key frames extracted in Figure 13 were kept unchanged, and various textual descriptions reflecting the parties’ subjectivity were introduced. The model was tasked with evaluating whether these different statements could guide the judgment outcomes and whether the text accurately matched the key frames. Additionally, the model was required to determine responsibility based on a combined analysis of the key frames and accompanying textual descriptions.
Experiment 2: We removed a key frame depicting a vehicle crash and obscured vehicle information to assess the impact on the accuracy of the generated liability determination.

4.6.1. Experiment 1

To simulate varying degrees of subjectivity in supplementary textual information, we established the following scenarios:
In one scenario, the driver of the white car claimed to have braked suddenly to avoid a pedestrian, arguing they should not be held liable, while the driver of the black car contended that the abrupt stop left no time to brake, thus attributing responsibility to the white car. Another scenario lacked relevant statements from either party, necessitating the model to rely solely on objective evidence and legal provisions for an independent judgment. Scenarios were also included where the text contained distracting or irrelevant details, such as mentions of the weather or unrelated items in the vehicles, to test the model’s ability to filter out such information and concentrate on the facts pertinent to the accident. Additionally, cases where the text was misleading or incorrect were examined, such as false claims of deliberate actions or complications attributed to road conditions, to assess the model’s ability to exclude erroneous information and maintain fairness. Furthermore, some texts presented a mixture of relevant and irrelevant content, posing a challenge for the model to extract key information while disregarding distractions. The model was also tested with complex scenarios involving multiple vehicles and incidents, such as chain-reaction collisions, and situations where sudden health issues caused a driver to lose control, requiring a balanced approach in its liability determination.
Table 7 summarizes the influence of these different subjective expressions on the model’s accident judgments, demonstrating its capability to handle diverse and complex scenarios with varying degrees of subjectivity.
The supplementary text information in the test was designed to simulate various scenarios commonly encountered by traffic police in real-life judgments. A key factor in these scenarios is the motivation behind the white car’s sudden braking. For example, when the driver of the white car claimed he braked suddenly to avoid a pedestrian, the model considered his action reasonable under traffic regulations and assigned full responsibility to the driver of the black car for failing to maintain a safe distance and brake in time. This illustrates how the driver’s motivation directly influenced the model’s determination of liability.
As shown in Table 7, when the description text included key information—regardless of the presence of irrelevant details—the model accurately determined liability based on traffic regulations and the specifics of the accident. However, when key information was missing or incorrect, the model’s judgment became biased, resulting in inaccurate conclusions. In more complex situations, such as those involving adverse weather or sudden U-turns, the model’s judgment deviated from standard legal practices. For instance, the simulation incorrectly attributed responsibility to the white car, contrary to the standard where the black car would typically bear full responsibility. This discrepancy suggests limitations in the model’s logic and its interpretation of legal provisions in complex scenarios.
Furthermore, when conflicts arose between the textual information and visual evidence, the model exhibited signs of confusion. For instance, the model failed to detect that the road surface was not slippery, despite the key frames indicating otherwise and did not challenge the textual description of the white car’s sudden turn. This underscores the model’s difficulty in accurately identifying and resolving contradictions between textual descriptions and video evidence. In other instances, where no such contradictions or critical information gaps existed, the model accurately determined liability.

4.6.2. Experiment 2

Complete image information is often unavailable in real-life scenarios. To simulate this, Experiment 2 tested the model’s judgment when the key frame capturing the moment of collision was removed, as shown in Figure 14, and the vehicles were obscured, as illustrated in Figure 15. The results are presented in Table 8.
As shown in Table 8, missing or obscured frames can significantly impact the model’s judgment, particularly in understanding the critical details of the accident. For instance, if a key frame, such as the moment of collision, is missing, the model may struggle to accurately determine the accident’s cause and the specific actions of the involved parties, potentially leading to an incorrect assignment of responsibility. However, in this experiment, even with some visual information obstructed, the model was still able to correctly determine responsibility by relying on the supplementary textual information provided. While this demonstrates the model’s ability to compensate for certain visual gaps, further investigation is required to evaluate the model’s performance under more complex scenarios with more significant or varied obstructions.

5. Discussion

While the proposed method offers promising results in simulated settings, its real-world application in traffic systems revealed certain limitations and challenges.

5.1. Limitations of the Algorithm

The performance of the improved HS algorithm relies heavily on the threshold $T$, which regulates the adjustment of the smoothing parameter. Although effective in controlled experiments, its robustness under varied and unpredictable traffic conditions remains insufficiently validated. In extreme lighting conditions or with sudden changes in scene dynamics, the preset threshold may result in suboptimal optical flow estimates.
The dataset sourced online for training the SVM for collision detection may not fully capture the diversity of real-world driving scenarios, particularly under adverse weather conditions such as heavy rain or fog. This limitation could impede the model’s ability to generalize effectively across various scenarios.
The application of large language models like GPT-4 in legal contexts, such as automated accident liability determination, raises concerns because of their susceptibility to generating plausible yet factually incorrect outputs [32,33,34]. This risk is especially significant in legal contexts where precision is paramount.

5.2. Challenges in Real-World Implementation

Implementing this automated system in real-world traffic scenarios presents several challenges that underscore the gap between the current capabilities and the potential for full autonomy. Due to current technological and regulatory constraints, the system cannot yet replace human intervention in determining liability in traffic accidents. Instead, its application is currently best suited for supportive roles that complement human decision-makers.
At the current stage, the system can effectively assist traffic authorities by providing initial assessments of accident scenarios, helping to prioritize cases or guide further investigations. This not only streamlines the process but also enables human officers to concentrate on more complex cases requiring nuanced judgment. Furthermore, a feedback mechanism could be integrated, permitting parties involved in an accident to request a human review if dissatisfied with the automated determination. This approach ensures that, while the system provides a preliminary analysis, ultimate accountability and decision-making remain with human authorities, thereby maintaining trust and accuracy.
The system could also be advantageous in legal settings by providing magistrates and judges with comprehensive, data-driven insights to support their rulings on accident liability cases. This could reduce court time by providing clear, objective data about the incident.
Moreover, the performance of such automated systems is highly dependent on the diversity and quality of the training data utilized. The system must be trained on a broad range of scenarios encompassing different weather conditions, times of day, and traffic densities to ensure robustness and reliability. The variability in environmental conditions and the complexity of real-world traffic scenarios present challenges in capturing all possible variables within training datasets.
Until technological and regulatory frameworks advance to enable greater autonomy and reliability for these systems, their implementation should be approached with careful consideration of their supportive role in human-led processes.

5.3. Legal and Ethical Considerations

Deploying this technology in traffic management systems entails substantial legal and ethical challenges. The use of automated algorithms for liability determination carries significant legal implications that necessitate transparent operations and strict adherence to existing legal frameworks to gain public trust and regulatory approval. Furthermore, the risk of algorithmic bias, especially in situations involving vulnerable road users, introduces ethical dilemmas that must be addressed diligently. It is imperative to rigorously assess these systems and maintain a level of human oversight to mitigate potential biases and ensure the fairness and reliability of the technology.

5.4. Future Work

Future research should aim to enhance the reliability of large models in handling complex legal scenarios, especially in mitigating potential “hallucinations” related to external references such as laws and regulations. To address these challenges, it is essential to develop mechanisms that can automatically detect and rectify inconsistencies in information, ensuring accuracy and consistency in the legal judgments produced by the models. Additionally, these models require further optimization to more effectively integrate and interpret multimodal information from both textual and visual data, particularly in determining legal liability. This would enhance the transparency of the decision-making process and provide more credible and fair judgments within the legal framework, thereby reducing legal risks associated with inaccurate information or misinterpretations.
Regarding deployment in real-world applications, the robustness of the software architecture is paramount. Future design efforts should focus on constructing an architecture that is highly fault-tolerant and scalable, capable of handling multi-source data inputs—including video, sensor data, and legal databases—and performing real-time processing to meet diverse scenario demands. Moreover, designing user interaction modes is essential; the system must provide an intuitive and interpretable interface that enables users to easily understand the basis and process of its judgments. Interactive feedback and customizable options should be included, enabling users to intervene or challenge decisions as needed, thereby enhancing user trust and the practical effectiveness of the system. A user-centered design will significantly enhance the system’s applicability and acceptance in real-world operations.
An important direction for future research is to explore the system’s potential to prevent traffic accidents, particularly those “intentional” incidents staged for fraudulent purposes. Leveraging predictive analytics and pattern recognition, the system could proactively identify and alert authorities to suspicious behaviors indicative of staged accidents, thereby contributing to safer roads and reducing fraudulent activities [35,36].
To enhance the accuracy of vehicle detection and collision recognition across various scenarios, it is crucial to expand and diversify the training dataset. The current dataset may not adequately cover all scenarios and conditions, potentially limiting the model’s performance in complex or unusual situations. Therefore, collecting and annotating a broader set of traffic video data—particularly under varying weather conditions, lighting, road types, and traffic densities—will significantly improve the model’s generalization ability. Additionally, including more samples featuring abnormal behaviors, such as sudden braking, speeding, and intentional collisions, will enable the model to more accurately identify and distinguish different types of traffic incidents.

6. Conclusions

This study proposed a fully automated liability determination solution for minor traffic accidents that was designed to operate without human intervention. The solution utilizes YOLOv8 for vehicle recognition and MOSSE filters for vehicle tracking. We validated the performance of vehicle recognition under various weather conditions and demonstrated the stability of tracking results through real-time PSR analysis. Furthermore, we enhanced the Horn and Schunck algorithm and integrated it with an SVM model to detect and recognize collision events, proving its feasibility in complex traffic scenarios. Based on the collision detection results, we analyzed the key frames of accidents using a multimodal large language model, supplemented with textual information to generate detailed liability reports. We also tested the impact of additional textual information and incomplete image data on the determination of liability in different scenarios. The results indicate that with sufficient information, the model can achieve high accuracy in analyzing accident liability.
This research not only offers an innovative technical solution for the swift handling of minor traffic accidents but also opens new avenues for the intelligent and automated management of traffic safety. Going forward, we will continue to refine the model to enhance its effectiveness across diverse scenarios and expand its potential applications within the broader domain of smart transportation, with the aim of achieving more efficient accident handling and more precise responsibility determination.

Author Contributions

Conceptualization, J.C. and L.Z.; Methodology, J.C.; Software, J.C. and S.L.; Resources, L.Z. and J.C.; Data curation, J.C.; Writing—original draft preparation, J.C. and S.L.; Writing—review and editing, J.C., S.L., and L.Z.; Visualization, J.C.; Supervision, L.Z.; Project administration, J.C. and S.L.; Funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China Youth Fund Project, grant number 61401360.

Institutional Review Board Statement

The study involved the analysis of publicly accessible traffic accident video frames from Sohu News, which did not contain any personally identifiable information. As the data were derived from publicly available sources and involved no direct human subjects, this type of publicly available information did not require Institutional Review Board review.

Informed Consent Statement

Informed consent was not sought for the specific use of these public domain images as they contained no identifiable private information and were sourced from publicly accessible media reports. The use of such data was considered exempt from typical consent requirements; however, this manuscript maintained adherence to ethical standards concerning public data.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank Haobin Li for his contributions to the initial stages of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Duan, L.; Song, L.; Wang, W.; Jian, X.; Heijungs, R.; Chen, W.-Q. Urbanization inequality: Evidence from vehicle ownership in Chinese cities. Humanit. Soc. Sci. Commun. 2024, 11, 703. [Google Scholar] [CrossRef]
  2. Chang, Y.S.; Jo, S.J.; Lee, Y.-T.; Lee, Y. Population Density or Populations Size. Which Factor Determines Urban Traffic Congestion? Sustainability 2021, 13, 4280. [Google Scholar] [CrossRef]
  3. Patrick, M.; Damon, H. Slower, smaller and lighter urban cars. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 1999, 213, 19–26. [Google Scholar] [CrossRef]
  4. Ehsani, J.P.; Michael, J.P.; Mackenzie, E.J. The Future of Road Safety: Challenges and Opportunities. Milbank Q. 2023, 101, 613–636. [Google Scholar] [CrossRef] [PubMed]
  5. Kodepogu, K.; Manjeti, V.B.; Siriki, A.B. Machine Learning for Road Accident Severity Prediction. Mechatron. Intell Transp. Syst. 2023, 2, 211–226. [Google Scholar] [CrossRef]
  6. Kan, H.Y.; Li, C.; Wang, Z.Q. An Integrated Convolutional Neural Network-Bidirectional Long Short-Term Memory-Attention Mechanism Model for Enhanced Highway Traffic Flow Prediction. J. Urban Dev. Manag. 2024, 3, 18–33. [Google Scholar] [CrossRef]
  7. Karim, M.M.; Li, Y.; Qin, R.; Yin, Z. A Dynamic Spatial-Temporal Attention Network for Early Anticipation of Traffic Accidents. IEEE Trans. Intell. Transp. Syst. 2022, 23, 9590–9600. [Google Scholar] [CrossRef]
  8. Zheng, Z.; Wang, Z.; Zhu, L.; Jiang, H. Determinants of the congestion caused by a traffic accident in urban road networks. Accid. Anal. Prev. 2020, 136, 105327. [Google Scholar] [CrossRef] [PubMed]
  9. Chen, X. Online video quick handling of minor traffic accidents saves you trouble. Qinghai Legal News, 8 November 2023. [Google Scholar]
  10. Wang, Y.; Cui, H. Traffic accident detection and responsibility determination based on image processing. Comput. Syst. Appl. 2022, 31, 120–126. [Google Scholar]
  11. Bai, P.; Li, J. A video-based traffic accident detection method. J. Jinan Univ. Nat. Sci. Ed. 2012, 26, 282–286. [Google Scholar]
  12. Xu, S.; Huang, H. Traffic crash liability determination: Danger and Dodge model. Accid. Anal. Prev. 2016, 95 Pt B, 317–325. [Google Scholar] [CrossRef]
  13. Liu, S.; Zhang, Z.J.; Yu, Z.H. Research on liability Identification System of Road Traffic Accident. J. Comput. 2022, 33, 215–224. [Google Scholar]
  14. Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  15. Maity, S.; Chakraborty, A.; Singh, P.K.; Sarkar, R. Performance comparison of various YOLO models for vehicle detection: An experimental study. In Proceedings of the International Conference on Data Analytics & Management, Porto, Portugal, 25–29 September 2023; Springer Nature Singapore: Singapore, 2023; pp. 677–684. [Google Scholar]
  16. Deng, T.; Liu, X.; Wang, L. Occluded Vehicle Detection via Multi-Scale Hybrid Attention Mechanism in the Road Scene. Electronics 2022, 11, 2709. [Google Scholar] [CrossRef]
  17. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar] [CrossRef]
  18. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2544–2550. [Google Scholar]
  19. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [Google Scholar] [CrossRef] [PubMed]
  20. Bruhn, A.; Weickert, J.; Schnörr, C. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. Int. J. Comput. Vis. 2005, 61, 211–231. [Google Scholar] [CrossRef]
  21. Pinto, A.M.G.; Moreira, A.; Costa, P.; Correia, M. Revisiting Lucas-Kanade and Horn-Schunck. J. Comput. Eng. Inf. 2013, 1, 23–29. [Google Scholar] [CrossRef]
  22. Dong, J. Study on video key frame extraction in different scenes based on optical flow. J. Phys. Conf. Ser. 2023, 2646, 012035. [Google Scholar] [CrossRef]
  23. Chen, D.; Sheng, H.; Chen, Y.; Xue, D. Fractional-order variational optical flow model for motion estimation. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2013, 371, 20120148. [Google Scholar] [CrossRef] [PubMed]
  24. Luo, J.; Sun, Y.; Qian, Z.; Zhou, L.; Wang, J. A review and prospect of large AI models. Radio Eng. 2023, 53, 2461–2472. [Google Scholar]
  25. Huang, D.; Yan, C.; Li, Q.; Peng, X. From Large Language Models to Large Multimodal Models: A Literature Review. Appl. Sci. 2024, 14, 5068. [Google Scholar] [CrossRef]
  26. Bo, H.; Wenchao, L.; Jin, L. Exploring the Capabilities of the ChatGPT Model: Prospects and Challenges in Industrial Applications. J. Wuhan Univ. (Sci. Ed.) 2024, 70, 267–280. [Google Scholar]
  27. Zhuang, M. The application mechanism and development path of ChatGPT in the field of legal supervision. J. Jianghan Univ. (Soc. Sci. Ed.) 2024, 41, 14–22. [Google Scholar] [CrossRef]
  28. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  29. Qi, Y.; An, G.; Cao, Y. An improved global optical flow estimation method. Comput. Sci. 2012, 39, 510–512. [Google Scholar]
  30. Hassner, T.; Itcher, Y.; Kliper-Gross, O. Violent flows: Real-time detection of violent crowd behavior. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1–6. [Google Scholar]
  31. Following Too Closely! If You Hit Them, It’s Entirely Your Fault. Available online: https://www.sohu.com/a/518682257_121123756 (accessed on 20 June 2024).
  32. McIntosh, T.R.; Liu, T.; Susnjak, T.; Watters, P.; Ng, A.; Halgamuge, M.N. A culturally sensitive test to evaluate nuanced GPT hallucination. IEEE Trans. Artif. Intell. 2023, 5, 2739–2751. [Google Scholar] [CrossRef]
  33. Lee, M. A mathematical investigation of hallucination and creativity in GPT models. Mathematics 2023, 11, 2320. [Google Scholar] [CrossRef]
  34. Sovrano, F.; Ashley, K.; Bacchelli, A. Toward eliminating hallucinations: GPT-based explanatory ai for intelligent textbooks and documentation. In CEUR Workshop Proceedings; No. 3444; CEUR-WS: Tokyo, Japan, 2023; pp. 54–65. [Google Scholar]
  35. Muktar, B.; Fono, V. Toward Safer Roads: Predicting the Severity of Traffic Accidents in Montreal Using Machine Learning. Electronics 2024, 13, 3036. [Google Scholar] [CrossRef]
  36. Arciniegas-Ayala, C.; Marcillo, P.; Valdivieso Caraguay, Á.L.; Hernández-Álvarez, M. Prediction of Accident Risk Levels in Traffic Accidents Using Deep Learning and Radial Basis Function Neural Networks Applied to a Dataset with Information on Driving Events. Appl. Sci. 2024, 14, 6248. [Google Scholar] [CrossRef]
Figure 1. Structure of the C2f module. This diagram shows the ConvBNSiLU module handling convolution, batch normalization, and SiLU activation functions, split into three parallel ‘BottleNeck’ pathways to enhance detailed feature extraction.
Figure 2. Flowchart of the accident liability determination process.
Figure 3. Feature extraction process from RGB to HSV conversion and vector representation.
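
As an illustration of the feature extraction summarized in Figure 3, a per-frame HSV histogram vector can be computed along the following lines (a minimal sketch; the bin counts and normalization are assumptions, not the paper's exact settings):

```python
import cv2
import numpy as np

def hsv_feature_vector(frame_bgr, bins=(8, 4, 4)):
    """Convert a BGR frame to HSV and flatten its 3D colour histogram
    into a single feature vector (bin counts are illustrative)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist)      # scale so vectors are comparable across frames
    return hist.flatten()          # e.g. 8 * 4 * 4 = 128-dimensional vector
```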
Figure 4. Detailed flowchart of the autonomous accident liability determination process.
Figure 5. Performance metrics of YOLOv8 for object detection in traffic scenarios.
Figure 6. Representative scenarios in the UA-DETRAC dataset. A composite image demonstrating four environmental conditions—sunny, cloudy, rainy, and night—selected to highlight dataset diversity.
Figure 7. Vehicle tracking and PSR analysis. (a) Tracking visualization shows a vehicle tracked over several frames with blue bounding boxes; (b) PSR variation over 70 frames: graph displaying PSR metric changes over time.
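
The PSR metric plotted in Figure 7b can be computed from a correlation response map as follows (a sketch following the convention of Bolme et al. [18]; the 11 × 11 exclusion window is their suggested default, not necessarily the setting used here):

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=11):
    """PSR of a correlation response map: the peak value versus the mean and
    standard deviation of the sidelobe (all pixels outside a small window
    around the peak). A low PSR signals likely tracking failure."""
    peak_y, peak_x = np.unravel_index(np.argmax(response), response.shape)
    peak = response[peak_y, peak_x]
    mask = np.ones_like(response, dtype=bool)
    half = exclude // 2
    mask[max(0, peak_y - half):peak_y + half + 1,
         max(0, peak_x - half):peak_x + half + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)
```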
Figure 8. Detailed visualization comparison of the optical flow estimations highlighting enhancements: (a) original Horn and Schunck algorithm results with a focus on specific flow details; (b) improved Horn and Schunck algorithm results showcasing enhanced flow accuracy.
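
For reference, panel (a) corresponds to the classic Horn-Schunck iteration, which can be sketched as follows (a baseline illustration only; the refinements of the improved algorithm [29] shown in panel (b) are not reproduced here):

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(im1, im2, alpha=1.0, n_iter=100):
    """Classic Horn-Schunck optical flow between two grayscale frames."""
    im1 = im1.astype(np.float32) / 255.0
    im2 = im2.astype(np.float32) / 255.0
    # Spatial and temporal derivatives (simple averaged difference kernels)
    kx = np.array([[-1, 1], [-1, 1]]) * 0.25
    ky = np.array([[-1, -1], [1, 1]]) * 0.25
    kt = np.ones((2, 2)) * 0.25
    Ix = convolve(im1, kx) + convolve(im2, kx)
    Iy = convolve(im1, ky) + convolve(im2, ky)
    It = convolve(im2, kt) - convolve(im1, kt)
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]]) / 12.0  # neighbour average
    for _ in range(n_iter):
        u_avg = convolve(u, avg)
        v_avg = convolve(v, avg)
        num = Ix * u_avg + Iy * v_avg + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_avg - Ix * num / den
        v = v_avg - Iy * num / den
    return u, v
```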
Figure 9. Hyperparameter optimization via grid-search. The heatmap displays the grid-search results for optimizing the performance of our model. Each cell represents the accuracy score achieved with different combinations of the hyperparameters C and gamma, aiding in the selection of the optimal settings.
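
The grid search in Figure 9 can be reproduced in outline with scikit-learn (a sketch with synthetic stand-in data; the grid values and scoring metric are illustrative, not the paper's exact configuration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the optical-flow feature vectors and
# accident/no-accident labels used in the paper.
X, y = make_classification(n_samples=200, n_features=16, random_state=0)

param_grid = {
    "C":     [0.1, 1, 10, 100],       # illustrative grid values
    "gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
# search.cv_results_["mean_test_score"] holds the per-cell accuracies that a
# Figure 9 style heatmap would visualize.
print(search.best_params_, search.best_score_)
```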
Figure 10. SVM classifier performance: precision–recall curve with AP = 0.88.
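
A precision–recall curve and average precision of the kind shown in Figure 10 can be generated as follows (synthetic stand-in data; in the paper, the scores come from the tuned SVM classifier):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import PrecisionRecallDisplay, average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data in place of the paper's optical-flow features.
X, y = make_classification(n_samples=400, n_features=16, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = SVC(kernel="rbf").fit(X_tr, y_tr).decision_function(X_te)

ap = average_precision_score(y_te, scores)       # Figure 10 reports AP = 0.88
PrecisionRecallDisplay.from_predictions(y_te, scores)
plt.title(f"Precision-Recall curve (AP = {ap:.2f})")
plt.show()
```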
Figure 11. Key frame extraction using K-means clustering from the video sequence.
Figure 12. PCA-reduced clustering distribution of video frames.
Figure 13. Key frames extracted from the K-means cluster analysis.
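
The key-frame extraction illustrated in Figures 11–13 can be sketched as follows (an assumption-laden illustration: each frame is represented by a feature vector, e.g. the HSV histogram sketch above, and the frame nearest each cluster centre is kept; eight clusters match the eight key frames used in the experiments):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def extract_key_frames(features, n_key=8):
    """Cluster per-frame feature vectors with K-means and keep the frame
    closest to each cluster centre as a key frame."""
    km = KMeans(n_clusters=n_key, n_init=10, random_state=0).fit(features)
    key_idx = []
    for c in range(n_key):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        key_idx.append(members[np.argmin(dists)])
    return sorted(key_idx)

# For a 2D scatter like Figure 12, project the features first:
# coords = PCA(n_components=2).fit_transform(features)
```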
Figure 14. Schematic diagram of the missing frame.
Figure 15. Schematic diagram of occluded vehicles.
Table 1. Results of the comparison of correlation filter tracking algorithms [19].

Tracking Algorithm    FPS
MOSSE                 615
TLD                   28
Struck                20
MIL                   38
ORIA                  9
CT                    64
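
For context, the MOSSE tracker that tops Table 1 is available in OpenCV's contrib package; a minimal usage sketch follows (the video path and initial bounding box are placeholders; in the pipeline described here, the box would come from a YOLOv8 detection):

```python
import cv2

# MOSSE tracker from opencv-contrib-python (cv2.legacy namespace in OpenCV >= 4.5).
cap = cv2.VideoCapture("traffic.mp4")            # placeholder video path
ok, frame = cap.read()
tracker = cv2.legacy.TrackerMOSSE_create()
tracker.init(frame, (100, 120, 80, 60))          # (x, y, w, h), placeholder box

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)           # fast correlation-filter update
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
```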
Table 2. Environment configuration.

Configuration Item     Configuration Information
Operating system       Windows 10
Development language   Python 3.7.16
CPU                    11th Gen Intel(R) Core(TM) i5-11400H @ 2.70 GHz
GPU                    Intel(R) UHD Graphics / NVIDIA GeForce RTX 3050 Ti Laptop GPU
Memory                 16 GB
Table 3. Training setup for YOLOv8.

Parameter             Description                                          Value
Pre-trained weights   Path to the pre-trained YOLOv8 weights file          YOLOv8s.pt
Epochs                Number of training cycles                            15
Batch size            Number of images processed per batch                 4
Image size            Dimensions of the images used in training (pixels)   640 × 640
Worker processes      Number of worker processes for data loading          4
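
The configuration in Table 3 corresponds to a training call along the following lines with the ultralytics package (a sketch; the dataset YAML path is an assumed placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")          # pre-trained weights, as in Table 3
model.train(
    data="ua_detrac.yaml",          # assumed dataset config, not the paper's exact path
    epochs=15,                      # training cycles
    batch=4,                        # images per batch
    imgsz=640,                      # 640 x 640 input size
    workers=4,                      # data-loading worker processes
)
```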
Table 4. Test results of the UA-DETRAC dataset under four weather conditions.

Weather   Recall (%)   Precision (%)   F1-Score (%)   AP (%)
Cloudy    50.42        98.74           67.55          92.77
Night     48.55        97.89           65.92          89.97
Rainy     50.68        98.92           67.76          93.18
Sunny     52.11        98.99           67.88          94.56
Table 5. Performance comparison of the optical flow estimation algorithms.

Algorithm               AEE     Angle Error   FPS
Improved HS algorithm   12.02   66.09°        5.26
Classic HS algorithm    13.32   66.88°        6.25
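
For reference, the two error metrics in Table 5 are conventionally defined as follows (standard optical-flow error formulations, where (u_i, v_i) is the estimated flow and (u_i^gt, v_i^gt) the ground truth at pixel i):

```latex
\mathrm{AEE} = \frac{1}{N}\sum_{i=1}^{N}
  \sqrt{\left(u_i - u_i^{\mathrm{gt}}\right)^2 + \left(v_i - v_i^{\mathrm{gt}}\right)^2}
\qquad
\mathrm{AAE} = \frac{1}{N}\sum_{i=1}^{N}
  \arccos\frac{u_i u_i^{\mathrm{gt}} + v_i v_i^{\mathrm{gt}} + 1}
              {\sqrt{u_i^2 + v_i^2 + 1}\,\sqrt{\left(u_i^{\mathrm{gt}}\right)^2 + \left(v_i^{\mathrm{gt}}\right)^2 + 1}}
```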
Table 6. Liability determination letter generated by the model.
Traffic Accident Liability Determination
I. Basic Situation:
  • Date and Time: 16 January 2022
  • Location: [Specific Location]
  • Vehicles of the Parties:
  • White car: To avoid an elderly person who suddenly crossed the road, emergency braking was performed.
  • Black car: Driving at a high speed, failed to stop in time, and collided with the white car.
  • Truck: Avoided a direct collision with the white car; it only rotated.
  • Road and traffic environment: No control by traffic signals or traffic signs.
  • How the accident happened:
On a highway without traffic lights, an elderly person suddenly crossed the road. To avoid the elderly person, the driver of the white car braked urgently, and the black car behind failed to brake in time due to excessive speed, and collided with the white car. At the same time, a truck was also present, but it only rotated and did not collide with other vehicles.
II. Accident Evidence and Analysis:
  • Accident evidence: Key frames of surveillance video, witness testimony.
  • Analysis: After analysis, the elderly person’s crossing was the direct cause of the accident, and the excessive speed of the black car and the failure to maintain a safe distance were the main causes of the collision.
III. Fault and responsibility analysis:
  • White car: Braked urgently to avoid a pedestrian; no fault.
  • Black car: Failed to maintain a safe distance and failed to slow down in time according to road conditions; bears full responsibility.
  • Truck: No collision; no responsibility.
  • Responsibility determination:
    According to Article 43 of the Road Traffic Safety Law of the People’s Republic of China, the driver of the black car violated the requirement to maintain a necessary safe distance from the vehicle in front and therefore should bear full responsibility. The responsibility ratio is as follows:
    Black car: Bears full responsibility, the responsibility ratio is 100%.
    White car: No fault, the responsibility ratio is 0%.
    Truck: No fault, the responsibility ratio is 0%.
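
A determination letter such as that in Table 6 can be requested from the GPT4-Vision-Preview model along the following lines (a minimal sketch using the OpenAI Python client; the prompt wording and key-frame file names are placeholders, not the paper's exact prompt):

```python
import base64
from openai import OpenAI

client = OpenAI()

def encode(path):
    """Base64-encode an image file for inline transmission."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Text instruction followed by the eight extracted key frames (placeholder names).
content = [{"type": "text",
            "text": "These are key frames of a minor traffic accident. "
                    "Determine each party's liability and draft a report."}]
for path in [f"key_frame_{i}.jpg" for i in range(8)]:
    content.append({"type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encode(path)}"}})

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{"role": "user", "content": content}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```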
Table 7. The impact of additional textual information with subjective information on responsibility judgment.

Case 1
Supplementary text information: The driver of the white car claimed that he stopped suddenly to avoid the old man, so he was not responsible; the driver of the black car claimed that the white car stopped suddenly and left him no time to brake, so the responsibility should fall on the driver of the white car.
Qualitative assessment: In this scenario, the model needs to fairly weigh these conflicting statements against the other evidence to make a balanced judgment of responsibility.
Responsibility determination result: Article 43 of the Road Traffic Safety Law of the People’s Republic of China stipulates that the rear car should maintain a safe distance from the front car sufficient to take emergency braking measures. The driver of the black car failed to maintain a sufficient safe distance and failed to brake in time, so he should bear all the responsibility for this accident. During the emergency braking process, the driver of the white car avoided a pedestrian who suddenly crossed the road; this was a reasonable evasive action that complied with the relevant road traffic safety regulations and did not violate Article 47 regarding stopping to let pedestrians pass. Therefore, the driver of the white car was not responsible. In summary, the driver of the black car was fully responsible and the driver of the white car was not responsible.

Case 2
Supplementary text information: (none)
Qualitative assessment: In such cases, the model must rely entirely on objective data and laws to make an independent and unbiased judgment.
Responsibility determination result: According to the relevant provisions of the Road Traffic Safety Law of the People’s Republic of China, the white car did not ensure safety when turning, which violated the provision that motor vehicles should slow down or stop to give way when encountering vehicles turning left in front (Article 44), and was primarily responsible. The black car failed to control its speed, which violated the provision that motor vehicles should maintain a safe speed according to road, traffic, weather, and other conditions (Article 42), and was secondarily responsible. Responsibility ratio: the white car was responsible for 70% of the accident, and the black car for 30%.

Case 3
Supplementary text information: It did not rain today. There were two people sitting in the white car and the white truck was full of eggs.
Qualitative assessment: The model needs to filter out such irrelevant information to maintain focus on the pertinent facts and ensure the accuracy of the liability determination.
Responsibility determination result: Article 43 of the Road Traffic Safety Law of the People’s Republic of China stipulates that the rear vehicle should maintain a safe distance from the front vehicle sufficient to take emergency braking measures; the driver of the black car failed to maintain a sufficient safe distance and failed to brake in time, so he should bear all the responsibility for this accident. Article 47 stipulates that motor vehicles should slow down when passing through crosswalks and stop to give way when pedestrians are crossing. The driver of the white car braked urgently and reasonably to avoid a pedestrian who suddenly crossed the road; he complied with the relevant laws and regulations on road traffic safety and was therefore not responsible.

Case 4
Supplementary text information: The white car deliberately braked urgently, leaving the black car no time to react, resulting in a tragedy.
Qualitative assessment: The model must identify and exclude any misleading or incorrect information to preserve the fairness and integrity of the judgment.
Responsibility determination result: The driver of the white car violated Article 42 of the Road Traffic Safety Law of the People’s Republic of China, which stipulates that drivers should pay attention to driving safety and avoid emergency braking without reason. According to the accident circumstances and evidence, the sudden deceleration of the white car was the main cause of the accident. Although the driver of the black car maintained the necessary distance between vehicles, he failed to take effective evasive measures, violating the safe driving requirement to take emergency avoidance measures in response to emergencies. According to the Road Traffic Safety Law and relevant regulations, the driver of the white car bears the main responsibility for the accident, and the driver of the black car bears secondary responsibility. The responsibility ratio is 70% for the white car and 30% for the black car.

Case 5
Supplementary text information: There was a young couple in the white car. The weather was good today. The driver of the white car suddenly stopped to avoid the elderly person. The black car was about 20 m away from the white car when driving.
Qualitative assessment: The model needs to filter out irrelevant details while extracting key information to ensure an accurate responsibility determination.
Responsibility determination result: The white car braked to avoid a pedestrian, which is normal driving behavior. The black car failed to maintain a safe distance and failed to react to the sudden stop of the vehicle ahead, violating the safe-distance requirement of Article 43 of the Road Traffic Safety Law of the People’s Republic of China. Responsibility determination: the driver of the black car is fully responsible; the driver of the white car is not responsible.

Case 6
Supplementary text information: The white car was forced to brake suddenly because the vehicle in front braked. The black car followed closely behind and failed to stop in time, hitting the white car. At the same time, another car behind the black car also failed to brake and hit the black car.
Qualitative assessment: Here, the model must navigate a multi-vehicle accident scenario, assessing each vehicle’s actions and making a comprehensive judgment on liability.
Responsibility determination result: How the accident happened: according to surveillance video and on-site photos, the black car failed to maintain a safe distance while driving and collided with the white car in front. When the accident occurred, the white car was driving normally in the right lane, with the black car following closely behind. The white car braked normally due to an emergency ahead, and the black car failed to respond in time, resulting in a rear-end collision. Fault and responsibility analysis: the driver of the black car failed to maintain a safe distance, violating Article 43 of the Road Traffic Safety Law, which requires vehicles to maintain a distance sufficient to take emergency braking measures; the driver of the white car drove normally without obvious violations. Responsibility determination: the driver of the black car bears the main responsibility for this accident, and the driver of the white car is not responsible.

Case 7
Supplementary text information: The road was snowy and very slippery. The white car skidded and lost control when turning. The black car behind it was unable to brake in time because the road was too slippery and hit the white car.
Qualitative assessment: This scenario reflects the model’s performance when faced with accidents caused by poor road conditions (such as snow or slippery surfaces), especially how it apportions responsibility for accidents caused by external conditions.
Responsibility determination result: How the accident happened: according to the surveillance video and photos taken at the scene, the white car lost control and skidded at the turn due to the slippery, snow-covered road. The black car following closely behind failed to brake in time under the same slippery conditions, resulting in a collision with the white car in front. Fault and responsibility analysis: when driving on the slippery road, the driver of the white car failed to effectively control the speed and stability of the vehicle, violating Article 42 of the Road Traffic Safety Law on appropriately reducing speed according to road conditions; the driver of the black car followed too closely under the same conditions and failed to maintain a safe distance, violating Article 43. Responsibility determination: the driver of the white car bears the main responsibility, and the driver of the black car bears secondary responsibility.

Case 8
Supplementary text information: The driver of the white car suddenly became ill, causing the vehicle to lose control and stop suddenly. The black car behind it had no time to react and rear-ended it.
Qualitative assessment: In this scenario, the model needs to account for the sudden health issue as a contributing factor, balancing it against the actions of the other driver to determine liability.
Responsibility determination result: How the accident happened: according to the surveillance video and photos taken at the scene, the driver of the white car stopped suddenly due to illness, and the black car following closely behind failed to react in time, causing a rear-end collision with the white car in front. Fault and liability analysis: the driver of the white car lost control of the vehicle due to a sudden illness, which constitutes force majeure; according to Article 63 of the Road Traffic Safety Law, he shall not be liable in cases of emergency avoidance. The driver of the black car failed to maintain a safe distance from the car in front and failed to take timely measures to avoid the accident, violating Article 43, which requires the rear car to maintain a safe distance. The driver of the black car bears the main responsibility for this accident.
Table 8. Model performance with incomplete image information.

Case 1
Change in image input: Figure 14.
Input explanation: The frame at the moment of collision was removed from the eight key frames.
Responsibility determination result: Fault and liability of the parties causing the road traffic accident. Elderly woman: according to Article 62 of the Road Traffic Safety Law of the People’s Republic of China, “Pedestrians shall use crosswalks or pedestrian crossing facilities when passing through intersections or crossing roads”. The elderly woman did not comply with this provision, which directly led to the white car taking emergency measures; therefore, she bears primary responsibility for the accident. Driver of the white car: his behavior was an emergency avoidance, but he did not consider the risks that emergency braking might pose to the vehicle behind, so he bears secondary responsibility. Driver of the black car: he failed to maintain a safe distance, violating Article 43 of the same law (“For motor vehicles traveling in the same lane, the rear vehicle shall maintain a safe distance from the front vehicle sufficient to take emergency braking measures”); therefore, he bears secondary responsibility for the accident.

Case 2
Change in image input: Figure 15.
Input explanation: Key information at the moment of collision is obscured.
Responsibility determination result: How the accident happened: A was driving a white car along the main road. Because an old man suddenly crossed the road in front of him, he stopped suddenly to avoid him. B was driving a black car about 20 m behind; due to the high speed, he failed to react in time and rear-ended the white car. Responsibility determination: according to the Road Traffic Safety Law of the People’s Republic of China, A’s behavior was a normal risk-avoidance behavior and he was not at fault; B failed to maintain a safe distance and drove at a high speed, violating Article 43, and should bear the main responsibility.