Machine-Learning-Based Path Loss Prediction for Vehicle-to-Vehicle Communication in Highway Environments

Sagir, Nugman; Tugcu, Zeynep Hasirci

doi:10.3390/app14177545

by Nugman Sagir and Zeynep Hasirci Tugcu *

Department of Electronics and Communications Engineering, Karadeniz Technical University, Trabzon 61830, Türkiye

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(17), 7545; https://doi.org/10.3390/app14177545

Submission received: 2 August 2024 / Revised: 16 August 2024 / Accepted: 22 August 2024 / Published: 26 August 2024

(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract:

Vehicle-to-vehicle (V2V) communication, which plays an important role in intelligent transportation systems, has been statistically shown to improve traffic efficiency and reduce the probability of accidents. In real-world applications, it is critical to accurately estimate the path loss in communication channels because of the variable and complex propagation environments often encountered in inter-vehicle communication scenarios. This paper presents a study of various machine learning methods for improving path loss estimation in V2V communication using a dataset (192,000 observations) obtained from field measurements in highway environments in the Trabzon and Gümüşhane provinces of Türkiye. For this purpose, path loss was estimated with different machine learning algorithms such as Artificial Neural Networks, Random Forest, Linear Regression, Gradient Boosting, Support Vector Regression, and AdaBoost, using various environmental and system features. Performance comparisons were then conducted between the machine learning methods and traditional empirical approaches such as log-distance, two-ray, and log-ray. The results show that machine learning methods outperform traditional methods and yield results quickly. The Random Forest and Gradient Boosting methods demonstrated the highest prediction performance, with R² values of 0.97 and 0.96, MAE values of 0.0557 and 0.0701, and RMSE values of 0.0774 and 0.0964, respectively, outperforming the empirical methods, the other machine learning techniques, and existing V2V studies. Overall, our study contributes to the existing literature by providing a comprehensive parameter set for highway environments, examining the path loss prediction performance of machine learning models with different capabilities, and comparing them with traditional methods. This study not only fills a critical gap in the existing literature but also highlights the necessity, efficiency, and originality of machine learning approaches for improving the reliability of V2V communication systems.

Keywords:

machine learning; vehicular communication; path loss; V2V; DSRC; highway

1. Introduction

Vehicle-to-vehicle (V2V) communication is crucial to intelligent transportation systems (ITS), particularly for increasing traffic efficiency and decreasing the probability of accidents [1]. Therefore, V2V research has great potential and is a growing trend in communication system design, implementation, and performance evaluation. V2V propagation channels differ from cellular channels in that both transceiver antennas are mounted comparatively low and both ends are mobile, creating an environment that is both relatively complicated and rapidly changing [2]. In general, the power of wireless signals decreases with increasing distance between the transmitter and receiver, and electromagnetic waves propagate via a variety of mechanisms, which are broadly categorized as reflection, diffraction, and scattering [3]. Estimating the received signal strength is challenging because of the complex V2V propagation environment.

Path loss prediction is a crucial aspect of V2V communication systems, similar to wireless communication systems. Path loss generally refers to the attenuation of signal power as it propagates through space between two vehicles. This attenuation is influenced by factors such as distance, the presence of obstacles (e.g., buildings, vehicles), and the environment (urban vs. rural settings) [4]. Therefore, various path loss models have been developed for V2V communication channels under different traffic conditions, road types, and environments. Path loss models are commonly developed using deterministic or empirical methods. Most empirical models were created using data collected under specific conditions and within a particular frequency range [5]. These models provide a statistical description of the correlation between path loss and propagation factors such as frequency, antenna height, and distance between antennas [6,7]. Furthermore, the reduction in signal strength induced by the shadowing effect is typically modeled as a Gaussian random variable with a mean of zero. Empirical models are characterized by simple model equations and minimal parameter requirements, which make them comparatively comprehensible. Nevertheless, when these models are implemented in more general situations, their precision may be reduced because their parameters are determined from the data gathered in specific environments [8]. Thus, these models may offer limited solutions for accurately predicting exactly what power is received at a particular location outside the environment in which they are developed. Instead, they can only offer a statistical overview of the path loss at a given distance. To reduce this limitation and obtain more generalizable results, many measurements and parameter estimations need to be made for different geographies and scenarios. This requires the use of new techniques that can be generalized and provide strong predictions based on existing data.

In recent years, machine learning techniques have been suggested as powerful and robust tools for path loss prediction because of their capacity to learn from data, their ability to generate more accurate models by training on different datasets, and their quick adaptation to varying environments and conditions. In addition, their higher prediction accuracy, especially in complex and dynamic systems, supports their use for path loss prediction in V2V propagation environments [9]. Table 1 provides a comprehensive comparison of machine-learning-based path loss prediction studies in the literature.

Many of these studies focused on various environments, such as urban (U), suburban (SU), rural (R), highway (H), and specific areas, such as campuses or indoor environments, and only a small number specifically addressed V2V scenarios ([23,31,34,35,41]). Among these studies, [31] focuses on the Visible Light Communication (VLC) standard in campus road and indoor settings. While [23] provides a path loss estimation in the parking garage environment, it also offers a path loss estimation for SU environments at different frequencies (450, 1450, and 2300 MHz), which differs from the 5.9 GHz DSRC (Dedicated Short-Range Communication) standard we are focusing on in this study. On the other hand, researchers [34,41] who investigated the path loss in V2V environments for the 5.9 GHz DSRC standard based on IEEE 802.11p estimated the path loss in SU environments, while [35] used the Random Forest technique to estimate the path loss in both H and U environments. Unlike many existing studies, we aimed to examine path loss estimation in V2V communications more comprehensively under highway conditions. This study is necessary because accurate path loss estimation is critical for the development and optimization of V2V communication systems, particularly in dynamic and high-speed environments such as highways. Examining the existing studies, we found that only one study [35] concentrated on H scenarios in V2V, revealing a model performance specific to the RF technique. Therefore, we investigated the path loss estimation performances of various techniques, including Artificial Neural Networks (ANNs), Random Forest (RF), Linear Regression (LR), Gradient Boosting (GB), Support Vector Regression (SVR), and AdaBoost (AB) in our study. Additionally, we compared the model performances of traditional empirical path loss models in H environments, such as log-distance, two-ray, and log-ray, with those of machine learning techniques. 
This work is not only one of the few studies examining V2V communication in an H environment but also stands out for its use of various machine learning models, including ANN, RF, LR, GB, SVR, and AB, and their comparison with traditional path loss models. In summary, the main motivation of this study is to highlight the potential of machine learning models in this domain by evaluating the performance of different machine learning models, particularly regression models, and comparing them with empirical models. This study contributes to the existing literature and aims to suggest a model that can be effectively used in practical applications by comparing the advantages and disadvantages of empirical models with those of machine learning techniques in path loss prediction.

This paper is structured as follows to provide a thorough understanding of how machine learning approaches can affect path loss prediction and how to increase the effectiveness of V2V communication systems. A comprehensive review of the existing literature and the main motivations for this study are presented in Section 1. The theory of traditional and machine-learning-based models for path loss estimation is briefly explained in Section 2. Section 3 discusses the design and implementation of the models and explains the methodology, data-gathering procedures, model selection, and algorithm configurations. The practical findings, numerical performance comparisons, and suggestions for further research are presented in Section 4 and Section 5, which also summarize the conclusions of the study.

2. Theory

2.1. Empirical-Model-Based Predictive Modeling

2.1.1. Log-Distance Path Loss Model

Path loss is defined as signal attenuation in dB, representing the difference between the transmitted power and the received power [42]. The log-distance path loss model commonly used in V2V communication assumes that the average path loss increases logarithmically with the distance between the transmitter (Tx) and the receiver (Rx). The formula for this model is given by

$$\overline{PL}(d) = PL(d_{0}) + 10 n \log\left(\frac{d}{d_{0}}\right), \quad d > d_{0} \tag{1}$$

where $\overline{PL}(d)$ represents the path loss at distance $d$ between the Tx and Rx (dB), $d_{0}$ is the reference distance (m), $PL(d_{0})$ is the path loss at the reference distance (dB), and $n$ is the path loss exponent, indicating the rate at which the received signal strength decreases with distance.
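
As a minimal sketch of Equation (1), the function below evaluates the log-distance model for a single distance. The logarithm is taken as base 10, as is standard for quantities in dB; the function name and the numeric values in the usage line are hypothetical and are not taken from the measurement campaign.

```python
import math

def log_distance_path_loss(d, d0, pl_d0, n):
    """Mean path loss in dB at distance d (Equation (1)); valid for d > d0."""
    if d <= d0:
        raise ValueError("the model is defined for d > d0")
    return pl_d0 + 10.0 * n * math.log10(d / d0)

# Hypothetical values: 60 dB at a 10 m reference distance, exponent n = 2.
print(log_distance_path_loss(100.0, 10.0, 60.0, 2.0))  # 80.0
```

In practice, the exponent n and the reference loss would be fitted to measured data rather than chosen by hand.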

2.1.2. Two-Ray Path Loss Model

The two-ray path loss model is often used in scenarios where there is a clear line-of-sight (LOS) between the Tx and Rx, and it includes both the direct path and a ground-reflected path. This model provides a more accurate prediction of the path loss over long distances than the simple free-space model [3]. In the two-ray model, the total electric field $E_{\text{two-ray}}(d, t)$ at the receiver is the sum of the electric fields of the direct and ground-reflected rays and is given by the following formula:

$$E_{\text{two-ray}}(d, t) = \frac{E_{0} d_{0}}{d'} \cos\left(\omega_{c}\left(t - \frac{d'}{c}\right)\right) + \Gamma \frac{E_{0} d_{0}}{d''} \cos\left(\omega_{c}\left(t - \frac{d''}{c}\right)\right) \tag{2}$$

where $E_{0}$ is the electric field in free space at the reference distance $d_{0}$, $d'$ is the length of the direct path between the Tx and Rx, $d''$ is the length of the reflected path between the Tx and Rx, $\omega_{c}$ is the angular frequency of the carrier, $c$ is the speed of light, and $\Gamma$ is the ground reflection coefficient. The received power $P_{\text{two-ray}}(d)$ can then be calculated from the total electric field using the following formula:

$$P_{\text{two-ray}}(d) = \frac{|E_{\text{two-ray}}|^{2} G_{r} \lambda^{2}}{480 \pi^{2}} \tag{3}$$

where $G_{r}$ is the gain of the receiving antenna, and $\lambda$ is the wavelength of the signal. This model takes into account both the direct line-of-sight signal and the ground-reflected signal, making it suitable for environments where such reflections are significant.

2.1.3. Log-Ray Path Loss Model

The log-ray path loss model is an approach developed to model path loss in vehicle-to-vehicle (V2V) communication channels [43]. This model incorporates the path loss exponent derived from measurements to overcome the limitations of traditional log-distance and two-ray path loss models. Mathematically, this is expressed as

$$P_{\text{log-ray}}(d) = \frac{|E_{\text{log-ray}}|^{2} G_{r} \lambda^{2}}{480 \pi^{2}} \tag{4}$$

where $E_{\text{log-ray}}$ represents the total electric field at the Rx and is given by

$$E_{\text{log-ray}}(d, t) = E_{0} \left(\frac{d_{0}}{d'}\right)^{n/2} \cos\left(\omega_{c}\left(t - \frac{d'}{c}\right)\right) + \Gamma E_{0} \left(\frac{d_{0}}{d''}\right)^{n/2} \cos\left(\omega_{c}\left(t - \frac{d''}{c}\right)\right) \tag{5}$$

where the path loss exponent, which varies from one environment to another, is denoted by $n$, and the other parameters are explained in detail in Section 2.1.2.
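
The two-ray and log-ray expressions in Equations (2) to (5) can be evaluated with one routine, since setting n = 2 in Equation (5) recovers Equation (2). The sketch below is illustrative only. It assumes a flat road, so the path lengths follow from the antenna heights h_t and h_r by the usual image construction, and it represents each cosine term as a phasor so that |E| is the envelope of the sum. The antenna heights, the default reflection coefficient of -1, and all names are assumptions that do not come from the paper.

```python
import cmath
import math

C = 299_792_458.0  # speed of light (m/s)

def received_power(d, h_t, h_r, freq_hz, e0, d0, gamma=-1.0, g_r=1.0, n=2.0):
    """Received power from the sum of the direct and ground-reflected rays.

    n = 2 gives the two-ray model (Equations (2)-(3)); other values of n
    give the log-ray model (Equations (4)-(5)).
    """
    lam = C / freq_hz
    k = 2.0 * math.pi / lam                  # wavenumber, omega_c / c
    d_direct = math.hypot(d, h_t - h_r)      # d'  (assumed flat-earth geometry)
    d_reflect = math.hypot(d, h_t + h_r)     # d'' (assumed flat-earth geometry)
    # Phasor form of each cosine term: amplitude * exp(-j * k * path_length).
    e_direct = e0 * (d0 / d_direct) ** (n / 2.0) * cmath.exp(-1j * k * d_direct)
    e_reflect = (gamma * e0 * (d0 / d_reflect) ** (n / 2.0)
                 * cmath.exp(-1j * k * d_reflect))
    e_total = abs(e_direct + e_reflect)      # envelope |E|
    return e_total ** 2 * g_r * lam ** 2 / (480.0 * math.pi ** 2)

# Hypothetical 5.9 GHz link, 1.5 m antennas, 100 m separation.
p_two_ray = received_power(100.0, 1.5, 1.5, 5.9e9, e0=1.0, d0=1.0)
p_log_ray = received_power(100.0, 1.5, 1.5, 5.9e9, e0=1.0, d0=1.0, n=1.8)
```

Keeping the exponent as a parameter makes it easy to compare the two models on the same geometry once n has been estimated from measurements.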

2.2. Machine-Learning-Based Predictive Modeling

In this study, several machine learning models are used to predict path loss in V2V communication. The input parameters used in all machine learning methods are weather conditions, number of antennas, distance, antenna position, and vehicle direction. The output parameter for all models is path loss. A common dataset was used in machine learning methods.

2.2.1. Artificial Neural Networks (ANNs)

ANN is an information-processing paradigm that is inspired by the way biological nervous systems such as the brain process information [44]. The mathematical model of an ANN neuron consists of three basic components: synapses modeled as weights, an adder for summing the inputs, and an activation function. Each connection weight $w$ represents the strength of the connection, where positive values indicate excitatory connections and negative values indicate inhibitory ones. The adder linearly combines the inputs $x$ and their corresponding weights $w$, as described by the equation

$$v_{k} = \sum_{i, j} w_{i j} x_{i j} \tag{6}$$

The activation function $\phi$ then processes this sum to produce the neuron’s output $y_{k} = \phi(v_{k})$. This approach allows ANNs to approximate complex functions by adjusting the weights based on training data, enabling the network to learn and generalize from the input data effectively.
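
The adder and activation steps described above can be sketched for a single neuron as follows. This is a simplified illustration with one index over the inputs and no bias term, mirroring Equation (6); the function names and the example numbers are hypothetical.

```python
import math

def neuron_output(weights, inputs, activation=math.tanh):
    """Weighted sum of the inputs (the adder) followed by the activation."""
    v = sum(w * x for w, x in zip(weights, inputs))
    return activation(v)

def relu(v):
    """Rectified linear unit, one of the activation functions tuned in this study."""
    return max(0.0, v)

print(neuron_output([0.5, -0.25], [2.0, 2.0], activation=relu))  # 0.5
```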

2.2.2. Random Forest (RF)

Random Forest (RF) is an ensemble learning method that operates by constructing multiple decision trees during training and outputting the mode of the classes (classification) or the mean prediction (regression) of the individual trees [45]. In its classification form, a Random Forest consists of a collection of tree-structured classifiers $\{h_{k}(x, \Theta_{k}),\ k = 1, \dots\}$, where the $\{\Theta_{k}\}$ are independent, identically distributed random vectors and each tree casts a unit vote for the most popular class at input $x$. The decision function for an RF can be represented as

$$H(x) = \arg\max_{Y} \sum_{k = 1}^{K} I(h_{k}(x) = Y) \tag{7}$$

where $H(x)$ is the combined classification model, $h_{k}(x)$ is the individual decision tree model, $Y$ is the output variable, and $I(\cdot)$ is the indicator function. For regression tasks such as path loss prediction, the majority vote is replaced by the mean of the individual tree predictions.
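
To make the aggregation step concrete, the sketch below contrasts the majority vote of Equation (7) with the averaging used when the trees are regressors. It only shows how the outputs of already-trained trees are combined; the class labels and numbers are hypothetical.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_votes):
    """Equation (7): return the class that receives the most tree votes."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_predictions):
    """Regression counterpart: average the individual tree predictions."""
    return mean(tree_predictions)

print(forest_classify(["LOS", "NLOS", "LOS"]))  # LOS
print(forest_regress([0.25, 0.5, 0.75]))        # 0.5
```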

2.2.3. Linear Regression (LR)

Linear Regression is a statistical technique used to model the relationship between a dependent variable $Y$ and one or more independent variables $X_{1}, X_{2}, \dots, X_{p}$ [46]. The model assumes that the relationship between the variables is linear and can be expressed with the equation

$$Y = \beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \dots + \beta_{p} X_{p} + \epsilon \tag{8}$$

where $\beta_{0}, \beta_{1}, \dots, \beta_{p}$ are the regression coefficients, and $\epsilon$ represents the random error term. This method is widely used for its simplicity and ability to interpret the influence of each predictor on the outcome.

2.2.4. Gradient Boosting Regression (GB)

Gradient Boosting is a powerful ensemble learning algorithm used for regression and classification. It combines many weak learners (usually decision trees) to form a strong predictive model [47]. At each step, a new weak learner is added to correct the errors of the existing model. This process can be summarized by the following formula:

$$f_{m}(x) = f_{m - 1}(x) + \gamma h_{m}(x) \tag{9}$$

where $f_{m}(x)$ is the model created at the m-th stage, $\gamma$ is the learning rate, and $h_{m}(x)$ is the m-th weak learner. GB regression provides a flexible and highly accurate prediction by offering multiple adjustable hyperparameters and loss functions.
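
The stage-wise update of Equation (9) can be illustrated with a deliberately small implementation. The sketch assumes a squared-error loss, a single input feature, and one-split decision stumps as weak learners; none of these choices is taken from the paper, whose models were built in MATLAB with tuned hyperparameters.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of 1-D inputs under squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_stages=50, gamma=0.1):
    """Build f_m(x) = f_{m-1}(x) + gamma * h_m(x), fitting h_m to the residuals."""
    f0 = sum(ys) / len(ys)          # initial constant model
    learners = []
    preds = [f0] * len(xs)
    for _ in range(n_stages):
        # For squared loss, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + gamma * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + gamma * sum(h(x) for h in learners)

# Hypothetical toy data with a step between x = 2 and x = 3.
model = gradient_boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0], n_stages=200)
```

A small learning rate shrinks each correction, so more stages are needed, which is the trade-off that the learning-rate and number-of-learners hyperparameters control.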

2.2.5. Support Vector Regression (SVR)

Support Vector Regression (SVR) is a machine learning method based on statistical learning theory [20]. The main goal of SVR is to find a hyperplane in high-dimensional space with the following formula:

$$f(x) = w^{T} \phi(x) + b \tag{10}$$

where $w$ is the normal vector of the hyperplane, $x$ represents the input feature vector, $\phi(\cdot)$ is the nonlinear transformation function, and $b$ is the bias term. The approximate function of SVR is expressed by solving the dual problem as follows:

$$f(x) = \sum_{i = 1}^{N} (\alpha_{i} - \alpha_{i}^{*}) K(x_{i}, x) + b \tag{11}$$

where $\alpha_{i}$ and $\alpha_{i}^{*}$ are Lagrange multipliers and $K(\cdot, \cdot)$ is the kernel function. The commonly used Gaussian kernel function is given by

$$K(x_{i}, x_{j}) = \exp\left(-\gamma \| x_{i} - x_{j} \|^{2}\right) \tag{12}$$

where $x_{i}$ and $x_{j}$ represent the feature vectors of the support vectors used in the SVR model.
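
Equations (11) and (12) translate directly into a prediction routine once the dual problem has been solved. The sketch below assumes that the support vectors, the differences of the Lagrange multipliers, and the bias are already available from training; the function and parameter names are hypothetical.

```python
import math

def gaussian_kernel(xi, xj, gamma):
    """Equation (12): exp(-gamma * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Equation (11); dual_coefs[i] stands for (alpha_i - alpha_i^*)."""
    return sum(c * gaussian_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b
```

Only the support vectors enter the sum, so the cost of a prediction grows with their number rather than with the size of the training set.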

2.2.6. AdaBoost Regression (AB)

AdaBoost is a powerful ensemble learning algorithm used to improve the performance of weak learners [48]. The AB algorithm trains a weak learner at each iteration and then updates the sample weights to reduce the prediction errors of the weak learner. This process enhances the overall performance of the model by combining the predictions of various weak learners. Although the AB algorithm is particularly successful in classification problems, it can also be applied to regression problems. The following formula summarizes the AB algorithm:

$$F_{m}(x) = F_{m - 1}(x) + \alpha_{m} h_{m}(x) \tag{13}$$

where $F_{m}(x)$ is the model created at the m-th stage, $\alpha_{m}$ is the weight of the weak learner, and $h_{m}(x)$ is the m-th weak learner. AB increases the model’s accuracy by giving more weight at each iteration to the examples that the previous learners handled poorly (misclassified examples in classification, poorly predicted examples in regression).

3. Materials and Methods

3.1. Measurement Scenarios and Setup

All measurement studies were conducted in different highway environments in two cities, Trabzon and Gümüşhane in Türkiye. A highway environment can typically be defined as a road with fast-flowing traffic, multiple lanes, and few or no surrounding structures, such as buildings and trees. The highways Trabzon–Giresun and Gümüşhane–Erzincan, where the data on V2V communication were collected in this study, are shown in Figure 1a and Figure 1b, respectively. These roads were selected as suitable testing environments because of their different traffic and geographical structures, which capture different climatic conditions and road features to provide an extensive dataset for path loss prediction. The Trabzon–Giresun highway has coastal areas with high humidity, while the Gümüşhane–Erzincan highway is characterized by significant altitude changes and mountainous terrain with frequent snowfall in winter. Thus, the varied geography and design of these highways offer beneficial data on how environmental factors impact V2V communication, facilitating the development of accurate machine-learning models for path loss prediction.

Figure 2 shows a block diagram of the measurement system used to gather the data. The transmitter (Tx) and receiver (Rx) were represented by two different automobiles fitted with DSRC + GPS antennas. Data processing and collection were performed using the DSRC on-board unit (DSRC-OBU) devices with the technical specifications given in Table 2 that link each car to a laptop.

For the experimental measurements in this study, two passenger cars, one Tx (2022 Skoda Fabia) and one Rx (2020 Volkswagen Golf), were used. The measurement setup was identical in both vehicles and included a laptop computer, a Cohda Wireless MK5 OBU kit [49], a 12V/220V inverter, an in-car camera, and a cigarette lighter multiplexer, as shown in Figure 3. The laptops were connected to the DSRC OBU device via an Ethernet port, and the sending and receiving of data packets were controlled through these computers.

3.2. Data Processing

The measurement data obtained from the field studies were first grouped according to their characteristics. Then, they were used as input parameters that significantly impact path loss for machine learning. In this context, these different inputs were categorized into two groups: environmental and systemic features. The path loss that represents the attenuation encountered by the signal between the Tx and the Rx was defined as an output parameter. This output was determined using learning algorithms, as depicted in Figure 4, and input parameters were considered in the estimation process.

Weather conditions such as sunny and snowy are among the most important environmental factors in wireless communication, affecting how the signal propagates under atmospheric conditions. The study used weather conditions as an environmental input parameter due to their significant impact on attenuation. On the other hand, the distance between Tx and Rx vehicles, which is one of the most significant parameters affecting signal loss directly in empirical models, the number of antennas (one or two) that enable wireless communication between vehicles, the position of antennas (inside or outside the vehicle), and the direction of vehicles’ movement (same or opposite) were defined as systemic input features. Table 3 summarizes the input feature encoding used in the machine learning models for this study to ensure uniformity in data processing and enhance the model’s predictive accuracy. The one-hot encoding method was used for each category for the inputs in Table 3 to be processed by the machine learning models [50].
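
The encoding step described above can be sketched as follows. The category names come from the text, but the column order is an assumption made for illustration, since the actual layout of Table 3 is not reproduced here. With one continuous feature (distance) and four two-level categorical features, one-hot encoding yields nine columns, which is consistent with the nine input parameters reported in this section.

```python
def encode_observation(distance, weather, n_antennas, antenna_position, direction):
    """Return a 9-element feature vector: distance plus four one-hot pairs."""
    def one_hot(value, categories):
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        return [1.0 if value == c else 0.0 for c in categories]

    # Column order is hypothetical, chosen only for this example.
    return ([float(distance)]
            + one_hot(weather, ("sunny", "snowy"))
            + one_hot(n_antennas, (1, 2))
            + one_hot(antenna_position, ("inside", "outside"))
            + one_hot(direction, ("same", "opposite")))

print(encode_observation(120.0, "snowy", 2, "outside", "same"))
# [120.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```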

Nine independent input parameters and one dependent output parameter (path loss) were selected from the 192,000 observations measured in this study. All simulations and analyses were performed using MATLAB R2023a on a computer running 64-bit Windows 10 Pro with an x64-based Intel(R) Core(TM) i5-9300H CPU at 2.40 GHz and 16.0 GB of RAM. Next, we applied the normalization process described in [10] to our data to mitigate the impact of different feature scales, as given in Equation (14). Thus, all features contribute equally to the model, and the stability and efficiency of the optimization algorithms are improved, resulting in more accurate results and shorter convergence times during training. The formula used for normalization is as follows:

$$x_{n} = \frac{2 (x - x_{min})}{x_{max} - x_{min}} - 1 \tag{14}$$

where $x_{n}$ refers to the normalized value, $x$ is the original value, and $x_{min}$ and $x_{max}$ represent the minimum and maximum values in the data, respectively. As a result of this normalization process, all input and output parameters were scaled between −1 and 1. Normalization was performed to allow fast and effective learning. After the initial stages of machine learning, such as data cleaning and preprocessing, 85% of the dataset was allocated as the training set, and the remaining 15% was set aside as test data to validate the accuracy of the model. In order to accurately assess the generalizability of the model, a five-fold cross-validation method was applied.
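
A minimal sketch of the scaling in Equation (14), together with its inverse, is given below. The function names and the sample values are hypothetical. The inverse is included because, if error metrics are computed on a scaled output, they are expressed in normalized units and must be mapped back to obtain values in dB.

```python
def normalize(values):
    """Scale values to [-1, 1] with Equation (14); also return (x_min, x_max)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant feature cannot be scaled")
    scaled = [2.0 * (x - x_min) / (x_max - x_min) - 1.0 for x in values]
    return scaled, (x_min, x_max)

def denormalize(scaled, bounds):
    """Invert Equation (14) to recover values in the original units."""
    x_min, x_max = bounds
    return [(s + 1.0) * (x_max - x_min) / 2.0 + x_min for s in scaled]

scaled, bounds = normalize([60.0, 80.0, 100.0])
print(scaled)  # [-1.0, 0.0, 1.0]
```

When a train/test split is used, the bounds would normally be taken from the training set and reused for the test set so that no test information leaks into preprocessing.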

Figure 5 shows the distribution of the 192,000 observations used in the path loss estimation across the input parameters. After data preprocessing, normalization, and division of the data into 85% training and 15% test sets, we conducted hyperparameter optimization using grid search [51]. This step is crucial because it tunes parameters that are not directly learned during training and thereby yields the best possible model performance. It also improves the ability of the model to perform well on new, unseen data, contributing to improved accuracy and a lower risk of overfitting. In this study, hyperparameter optimization was conducted for the ANN, RF, LR, GB, SVR, and AB machine-learning models, as presented in detail in Table 4. For the ANN, the number of layers and neurons was tuned together with the choice among the ReLU, Sigmoid, and Tanh activation functions. RF was optimized for surrogate decision bins and minimum leaf size, which increased its flexibility and learning ability. SVR was optimized over the RBF, Linear, and Polynomial kernel functions, the regularization parameter (C), the tolerance margin (epsilon), and the kernel scale, achieving a balanced error tolerance and good performance. AB was optimized by adjusting the learning rate, the base predictor, and the number of weak learners (n-predictors), whereas GB was optimized by selecting the learning rate, the maximum tree depth, and the number of weak learners. The optimization process for LR ensured efficient model convergence by determining the maximum number of steps, limiting the terms used, and defining the initial terms. In summary, all of these parameter optimizations contributed to increasing the accuracy of path loss prediction and the robustness of each model.

4. Results and Discussion

In this study, V2V propagation analyses were performed in highway environments using ANN, RF, LR, GB, SVR, and AB machine-learning techniques and their performance was compared with the traditional log-distance, two-ray, and log-ray models. Various environmental and systemic features, such as weather, distance, number of antennas, antenna position, and direction of vehicle movement, have been used as input parameters in path loss prediction using machine learning. Processing these parameters with machine learning algorithms produced more accurate and faster predictions than using traditional methods. Moreover, the large dataset and diversity of parameter options in our study increased the reliability and generalizability of the model in real-world scenarios.

The effectiveness of these machine learning models in estimating path loss is first shown in Figure 6. The expectation here is that the predicted values will align closely with the actual data line (y = x line), indicating the accuracy and reliability of the model. In this regard, the RF and GB models showed the highest accuracy by closely reflecting the distribution of the actual and predicted values with minimal deviations from the actual data line. This means that these models performed well and effectively captured general trends. Although the ANN model was successful in effectively capturing the nonlinear structure of the data, we believe that its sensitivity to extreme values reduces its overall accuracy. In general, the SVR model is known for its success in representing nonlinear relationships; however, in this problem, the error distance increased notably at high carrying capacities. On the other hand, the AB model, characterized by a more scattered distribution of its predictions, demonstrated lower accuracy, while the LR model had the highest error rate, failing to adequately represent nonlinear relationships. Consequently, the RF and GB models provided higher accuracy and reliability compared to the others, whereas the ANN and SVR models exhibited acceptable performance and the AB and LR models showed lower accuracy.

The numerical results for the performance comparison of the path loss prediction models are presented in Table 5 and Figure 7 using the metrics Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Coefficient of Determination (R²). The first observation is that the machine learning models are generally more accurate than the traditional empirical models. The results show that the RF and GB models demonstrated superior performance, achieving the lowest MAE and RMSE values. The RF model, with an MAE of 0.0557, an RMSE of 0.0774, and an R² of 0.97, emerged as the best-performing model, indicating its high predictive accuracy and robustness. GB performed almost as well as RF, with an MAE of 0.0701, an RMSE of 0.0964, and an R² of 0.96. Thus, it can be concluded that GB is nearly as effective as RF in processing complex data models in the context of this study. In contrast, the ANN and SVR models, while performing reasonably well, demonstrated higher error rates than RF and GB. AB and LR were among the worst-performing machine learning models, with the lowest R² and the highest RMSE, respectively. On the other hand, the log-distance model was the best-performing empirical model, with an RMSE of 3.9884 and an R² of 0.92, but it generally lagged behind the machine learning models. The two-ray and log-ray models also performed poorly, with RMSE values of 10.038 and 8.4032 and R² values of 0.32 and 0.29, respectively. This clearly shows that, whereas machine-learning models are capable of highly accurate predictions for various complex V2V communication scenarios, the limitations of empirical approaches negatively affect their prediction performance. Overall, these results suggest the necessity of adopting advanced machine learning techniques for more accurate and reliable path loss predictions in V2V communication.
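
For reference, the four metrics named above can be computed as in the following sketch, which uses their standard definitions. The function name and the toy values in the test data are hypothetical; the paper's own figures come from Table 5.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot        # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}
```

MAE, MSE, and RMSE carry the units (or scaling) of the target variable, whereas R² is dimensionless, which matters when comparing values reported on different scales.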

Then, the Principal Component Analysis (PCA), Minimum Redundancy Maximum Relevance (MRMR), and F-Test methods were used as preprocessing steps before rerunning the algorithms to optimize the performance of the machine learning models. PCA was employed to decrease the number of dimensions in the data while preserving a significant portion of the variability. This process aims to simplify the model and improve computational performance. The F-Test facilitated the selection of important features, thus guaranteeing that only the most useful factors were considered, thereby boosting the accuracy of the model. The MRMR method was used to identify features that optimize their relevance to the target parameter and minimize duplication between them. Table 6 shows the feature selection weights of various features using PCA, MRMR, and F-Test methods.

Here, PCA highlights four out of nine features, MRMR highlights eight features, and F-Test highlights three features, of which distance and weather (sunny or snowy) are the common features weighted in each. Subsequently, to investigate the individual effects of each method on model adaptability and predictive performance, the error outputs were calculated when each method was used separately and when it was not, as shown in Table 7.

Comparing the models before and after feature selection shows that each method affected the RMSE of each model differently. The RMSE values of the RF and GB models remained rather constant, indicating that these models were already well adapted to the relevance and redundancy of the original features. In particular, the RMSE of the RF model increased slightly from 0.0774 to about 0.081, and the RMSE of the GB model increased from 0.0964 to about 0.1053. These slight increases indicate that feature selection had a negligible effect, likely because these models already performed best with the original feature set. The ANN and SVR models responded differently. For the ANN model, PCA and MRMR marginally increased the RMSE from 0.1477 to 0.1488 and 0.1493, respectively, so these techniques did not improve the ANN model. In contrast, the F-Test reduced the ANN model’s RMSE notably to 0.1032, demonstrating its effectiveness in identifying the most important features and thereby improving the accuracy of the model. The RMSE of the SVR model, however, increased from 0.1533 to 0.1601 with PCA, 0.1645 with MRMR, and 0.1585 with the F-Test. These observations suggest that the original feature set was more advantageous for the SVR model and that feature selection did not improve its prediction accuracy. The AB model benefited notably from both PCA and MRMR, with the RMSE decreasing from 0.2198 to 0.1787 and 0.1998, respectively, which suggests that PCA and MRMR identified the features most critical to AB and enhanced its performance.
The F-Test also reduced the RMSE of the AB model, but only marginally, to 0.2156; it was therefore helpful but less effective than PCA and MRMR for this model. Finally, the RMSE of the LR model did not change significantly with feature selection. Minor increases from 0.2780 to 0.2793 with PCA, 0.2815 with MRMR, and 0.2832 with the F-Test show that the initial feature set was the best of those tested for the LR model. In summary, the impact of feature selection on model performance varies considerably with the model architecture. The F-Test was highly effective for the ANN model, and both PCA and MRMR were effective for the AB model, which underlines the need for model-specific feature selection strategies to optimize prediction accuracy.
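
The comparison above can be made more concrete by expressing each RMSE change as a percentage of the value without feature selection. The following is a minimal sketch that uses only the RMSE values quoted in the text for the ANN, SVR, AB, and LR models; the helper names `relative_change` and `best_method` are introduced here for illustration and are not part of the original study. The RF and GB models are omitted because the text quotes only approximate values for them.

```python
# RMSE values quoted in the text (baseline = no feature selection).
baseline = {"ANN": 0.1477, "SVR": 0.1533, "AB": 0.2198, "LR": 0.2780}
after = {
    "ANN": {"PCA": 0.1488, "MRMR": 0.1493, "F-Test": 0.1032},
    "SVR": {"PCA": 0.1601, "MRMR": 0.1645, "F-Test": 0.1585},
    "AB": {"PCA": 0.1787, "MRMR": 0.1998, "F-Test": 0.2156},
    "LR": {"PCA": 0.2793, "MRMR": 0.2815, "F-Test": 0.2832},
}


def relative_change(before: float, after_value: float) -> float:
    """Signed RMSE change in percent; negative values are improvements."""
    return 100.0 * (after_value - before) / before


# Percentage change for every (model, method) pair.
changes = {
    model: {
        method: relative_change(baseline[model], value)
        for method, value in per_method.items()
    }
    for model, per_method in after.items()
}


def best_method(model: str) -> str:
    """Method with the lowest (most favourable) relative change for a model."""
    return min(changes[model], key=changes[model].get)


for model, per_method in changes.items():
    summary = ", ".join(f"{m}: {c:+.2f}%" for m, c in per_method.items())
    print(f"{model}: {summary} (best: {best_method(model)})")
```

Under these values, the F-Test lowers the ANN RMSE by roughly 30%, and PCA lowers the AB RMSE by roughly 19%, while every method raises the RMSE of the SVR and LR models.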

When the existing machine-learning-based path loss estimation studies are examined as a whole, it is clear from Table 1 that relatively few of them focus on V2V communication scenarios [23,31,34,35,41]. Of these, only [34,35,41] were conducted at the 5.9 GHz carrier frequency, which is also the focus of our research; [23] and [31] focused on different frequencies. These studies presented path loss estimation performances with RF, ANN, and Gaussian Process Regression (GPR) models in various frequency bands and propagation environments. Refs. [34] and [41] applied the RF model in SU environments and reported an MAE of 3.18 and an RMSE of 2.49, respectively, while [35] obtained an MAE of 4.31 in H and U environments with the RF model. In our study, a comprehensive path loss estimation for V2V in an H environment at 5.9 GHz was performed with ANN, RF, LR, GB, SVR, and AB machine learning models and compared with the performances of the log-distance, two-ray, and log-ray empirical models. In this context, the LR model (R² = 0.68, RMSE = 0.278, MAE = 0.2299) performed worst among the machine learning models examined, whereas the RF model (R² = 0.97, RMSE = 0.0774, MAE = 0.0557) and the GB model (R² = 0.96, RMSE = 0.0964, MAE = 0.0701) performed best. The MAE values obtained for RF and the other models in our study are much lower than the MAE of 3.18 reported for the RF model in the SU environment in [34] and the MAE of 4.31 reported for the H and U scenarios in [35]. Furthermore, the RMSE of the RF model in our study (0.0774) is much lower than the RMSE of 2.49 reported in [41] for the SU environment.
On the other hand, the empirical log-distance, two-ray, and log-ray models yielded RMSE values of 3.9884, 10.038, and 8.4032, respectively, which are considerably higher than those reported in the literature and those of the machine learning models analyzed. In general, these findings indicate that machine learning approaches are more effective for path loss estimation. Moreover, our study advances the area by providing a comprehensive parameter set, a range of machine learning models, and path loss predictions for an H environment. While previous studies generally used datasets of 1000–5000 observations, our study used comprehensive field measurements in H environments comprising 192,000 observations. In addition, rather than the single algorithm or handful of algorithms commonly used in other studies, several algorithms with different features and capabilities (ANN, RF, LR, GB, SVR, and AB) were used and their performances were compared. The performances of the machine learning models were then also compared with those of traditional empirical models. Furthermore, a wider range of environmental and system factors was considered, including weather, the number and position of antennas, and vehicle direction, in an attempt to increase the prediction performance. Finally, unlike the single approach frequently seen in other studies, multiple feature selection methods (PCA, MRMR, and F-Test) were applied, and the resulting change in performance was investigated for each model and method. As a result, this study not only fills a critical gap in the existing literature but also emphasizes the necessity and originality of our approach for improving reliable V2V communication systems.

5. Conclusions

This study shows that machine learning algorithms offer better path loss estimation performance in V2V communication than traditional empirical models. In particular, when large datasets are combined with environmental factors, machine learning algorithms can estimate path loss more accurately in complex and dynamic environments. For this purpose, a comprehensive dataset was created from experimental measurements conducted at 5.9 GHz in highway environments in the Trabzon and Gümüşhane provinces; 85% of the dataset was used for training and 15% for testing. Path loss estimations were made using ANN, RF, LR, GB, SVR, and AB machine learning models, and the performances of these models were compared with the empirical log-distance, two-ray, and log-ray models used in highway environments. The RF model obtained the lowest RMSE (0.0774) and MAE (0.0557), whereas the GB model showed the second-best performance, with an RMSE of 0.0964 and an MAE of 0.0701. LR was the worst-performing machine learning model, with an RMSE of 0.278 and an MAE of 0.2299. The traditional empirical models exhibited significantly higher RMSE values (≥3.9884) than all the machine learning models used. In addition, the feature selection methods PCA, MRMR, and F-Test improved model performance notably for AB and ANN, although their effects varied across algorithms. The RMSE obtained for the ANN model using the F-Test (0.1032) was the third lowest overall, after those of the RF and GB models. This also suggests the need for a model-specific investigation of feature selection and weighting. This study contributes to the existing literature by evaluating the performance of different machine learning techniques and provides important practical results for the design and optimization of V2V communication systems.
The performance comparisons highlighted the strengths and weaknesses of machine learning approaches relative to empirical models, and ultimately, a more robust solution was proposed for path loss estimation. In general, our findings emphasize the necessity and effectiveness of using advanced machine learning techniques for accurate and reliable path loss estimation in V2V communications. By using accurate path loss predictions to optimize the signal density in V2V communication, the reliability of data transmission may be enhanced, supporting safer driving, particularly for autonomous vehicles. Moreover, such predictions can improve the efficiency of traffic flow and reduce the probability of accidents by minimizing communication interruptions in highly congested locations.
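
The 85%/15% partition of the 192,000 observations mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not state whether the split was random or sequential, so the shuffling, the seed, and the function name `split_indices` are assumptions.

```python
import random


def split_indices(n_samples: int, train_fraction: float = 0.85, seed: int = 0):
    """Shuffle sample indices and split them into training and test sets.

    The random shuffle and the fixed seed are assumptions made for this
    sketch; only the 85/15 proportions come from the text.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(192_000)
print(len(train_idx), len(test_idx))  # 163200 28800
```

With these proportions, 163,200 observations are available for training and 28,800 for testing.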

Author Contributions

Conceptualization, N.S. and Z.H.T.; methodology, N.S. and Z.H.T.; software, N.S.; validation, Z.H.T.; formal analysis, Z.H.T.; investigation, N.S. and Z.H.T.; resources, N.S. and Z.H.T.; data curation, Z.H.T.; writing—original draft preparation, N.S. and Z.H.T.; writing—review and editing, Z.H.T.; visualization, N.S.; supervision, Z.H.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. He, R.; Renaudin, O.; Kolmonen, V.M.; Haneda, K.; Zhong, Z.; Ai, B.; Oestges, C. A dynamic wideband directional channel model for vehicle-to-vehicle communications. IEEE Trans. Ind. Electron. 2015, 62, 7870–7882.
2. Meireles, R.; Boban, M.; Steenkiste, P.; Tonguz, O.; Barros, J. Experimental study on the impact of vehicular obstructions in VANETs. In Proceedings of the Vehicular Networking Conference (VNC) 2010 IEEE, Jersey, NJ, USA, 13–15 December 2010.
3. Rappaport, T.S. Wireless Communications: Principles and Practice, 2nd ed.; Prentice-Hall: Upper Saddle River, NJ, USA, 2002.
4. Abbas, T.; Gustafson, C.; Tufvesson, F. A Path Loss and Shadowing Model for Multilink Vehicle-to-Vehicle Channels in Urban Intersections. Sensors 2018, 18, 4433.
5. Turan, B.; Coleri, S. Machine Learning Based Channel Modeling for Vehicular Visible Light Communication. IEEE Trans. Veh. Technol. 2021, 70, 9659–9672.
6. Erceg, V.; Greenstein, L.J.; Tjandra, S.Y.; Parkoff, S.R.; Gupta, A.; Kulic, B.; Julius, A.A.; Bianchi, R. An empirically based path loss model for wireless channels in suburban environments. IEEE J. Sel. Areas Commun. 1999, 17, 1205–1211.
7. Phillips, C.; Sicker, D.; Grunwald, D. A survey of wireless path loss prediction and coverage mapping methods. IEEE Commun. Surv. Tutor. 2013, 15, 255–270.
8. Ayadi, M.; Zineb, A.B.; Tabbane, S. A UHF path loss model using learning machine for heterogeneous networks. IEEE Trans. Antennas Propag. 2017, 65, 3675–3683.
9. Piovano, M.; Samani, M.R.; Crupi, P. Path loss prediction in urban environment using learning machines and dimensionality reduction techniques. Comput. Manag. Sci. 2023, 20, 533–548.
10. Wu, D.; Zhu, G.; Ai, B. Application of Artificial Neural Networks for Path Loss Prediction in Railway Environments. In Proceedings of the 5th International ICST Conference on Communications and Networking, Beijing, China, 25–27 August 2010; pp. 1–5.
11. Idogho, J.; George, G. Path Loss Prediction Based on Machine Learning Techniques: Support Vector Machine, Artificial Neural Network, and MultiLinear Regression Model. Open J. Phys. Sci. (OJPS) 2022, 3, 1–22.
12. He, R.; Gong, Y.; Bai, W.; Li, Y.; Wang, X. Random Forest Based Path Loss Prediction in Mobile Communication Systems. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 1246–1250.
13. Popoola, S.I.; Misra, S.; Atayero, A.A. Outdoor Path Loss Predictions Based on Extreme Learning Machine. Wirel. Pers. Commun. 2018, 99, 441–460.
14. Sotiroudis, S.P.; Boursianis, A.D.; Goudos, S.K.; Siakavara, K. From Spatial Urban Site Data to Path Loss Prediction: An Ensemble Learning Approach. IEEE Trans. Antennas Propag. 2022, 70, 6101–6105.
15. Popescu, I.; Kanatas, A.; Angelou, E.; Nafornita, I.; Constantinou, P. Applications of Generalized RBF-NN for Path Loss Prediction. In Proceedings of the IEEE PIMRC 2002, Lisbon, Portugal, 15–18 September 2002; pp. 133–138.
16. Singh, H.; Gupta, S.; Dhawan, C.; Mishra, A. Path Loss Prediction in Smart Campus Environment: Machine Learning-based Approaches. In Proceedings of the 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–5.
17. Pedraza, L.F.; Hernández, C.A.; López, D.A. A model to determine the propagation losses based on the integration of hata-okumura and wavelet neural models. Int. J. Antennas Propag. 2017, 2017, 1–8.
18. Cruz, H.A.O.; Nascimento, R.N.A.; Araujo, J.P.L.; Pelaes, E.G.; Cavalcante, G.P.S. Methodologies for Path Loss Prediction in LTE-1.8 GHz Networks Using Neuro-Fuzzy and ANN. In Proceedings of the 2017 IEEE International Conference on Communications (ICC), Paris, France, 21–25 May 2017; pp. 1–6.
19. Surajudeen-Bakinde, N.T.; Faruk, N.; Salman, M.; Popoola, S.; Oloyede, A.; Olawoyin, L.A. On Adaptive Neuro-Fuzzy Model for Path Loss Prediction in the VHF Band. ITU J. ICT Discov. 2018, 1, 67–75.
20. Zhang, Y.; Wen, J.; Yang, G.; He, Z.; Wang, J. Path loss prediction based on machine learning: Principle, method, and data expansion. Appl. Sci. 2019, 9, 1908.
21. Sousa, M.; Alves, A.; Vieira, P.; Queluz, M.P.; Rodrigues, A. Analysis and optimization of 5G coverage predictions using a beamforming antenna model and real drive test measurements. IEEE Access 2021, 9, 101787–101808.
22. Moraitis, N.; Tsipi, L.; Vouyioukas, D. Machine learning-based methods for path loss prediction in urban environment for LTE networks. In Proceedings of the 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Thessaloniki, Greece, 12–14 October 2020; pp. 1–6.
23. Jo, H.-S.; Park, C.; Lee, E.; Choi, H.K.; Park, J. Path loss prediction based on machine learning techniques: Principal component analysis, Artificial Neural Network, and Gaussian process. Sensors 2020, 20, 1927.
24. Moraitis, N.; Tsipi, L.; Vouyioukas, D.; Gkioni, A.; Louvros, S. Performance evaluation of machine learning methods for path loss prediction in rural environment at 3.7 GHz. Wirel. Netw. 2021, 27, 4169–4188.
25. Levie, R.; Yapar, C.; Kutyniok, G.; Caire, G. RadioUNet: Fast radio map estimation with convolutional neural networks. IEEE Trans. Wirel. Commun. 2019, 20, 4001–4015.
26. Egi, Y.; Otero, C.E. Machine-learning and 3D point-cloud based signal power path loss model for the deployment of wireless communication systems. IEEE Access 2019, 7, 42507–42517.
27. Elmezughi, M.K.; Salih, O.; Afullo, T.J.; Duffy, K.J. Comparative Analysis of Major Machine-Learning-Based Path Loss Models for Enclosed Indoor Channels. Sensors 2022, 22, 4967.
28. Thrane, J.; Zibar, D.; Christiansen, H.L. Model-aided deep learning method for path loss prediction in mobile communication systems at 2.6 GHz. IEEE Access 2020, 8, 7925–7936.
29. Wu, L.; He, D.; Ai, B.; Wang, J.; Qi, H.; Guan, K.; Zhong, Z. Artificial Neural Network based path loss prediction for wireless communication network. IEEE Access 2020, 8, 199523–199538.
30. Timoteo, R.D.A.; Cunha, D.; Cavalcanti, G.D.C. A proposal for path loss prediction in urban environments using Support Vector Regression. In Proceedings of the Tenth Advanced International Conference on Telecommunications, Paris, France, 20–24 July 2014; pp. 119–124.
31. Ostlin, E.; Zepernick, H.J.; Suzuki, H. Macrocell pathloss prediction using Artificial Neural Networks. IEEE Trans. Veh. Technol. 2010, 59, 2735–2747.
32. Angeles, J.C.D.; Dadios, E.P. Neural network-based path loss prediction for digital TV macrocells. In Proceedings of the International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management, Cebu, Philippines, 9–12 December 2015; pp. 1–9.
33. Popescu, I.; Nafornita, I.; Constantinou, P.; Kanatas, A.; Moraitis, N. Neural networks applications for the prediction of propagation path loss in urban environments. In Proceedings of the IEEE 53rd Vehicular Technology Conference, Rhodes, Greece, 6–9 May 2001; pp. 387–391.
34. Panthangi, R.M.; Boban, M.; Zhou, C.; Stanczak, S. Online Learning Framework for V2V Link Quality Prediction. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
35. Turan, B.; Uyrus, A.; Koc, O.N.; Kar, E.; Coleri, S. Machine Learning Aided Path Loss Estimator and Jammer Detector for Heterogeneous Vehicular Networks. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–6.
36. Zhang, Y.; Wen, J.; Yang, G.; He, Z.; Luo, X. Air-to-air path loss prediction based on machine learning methods in urban environments. Wirel. Commun. Mob. Comput. 2018, 2018, 8489326.
37. Zhao, X.; Hou, C.; Wang, Q. A new SVM-based modeling method of cabin path loss prediction. Int. J. Antennas Propag. 2013, 2013, 279070.
38. Ayadi, M.; Zineb, A.B. Body shadowing and furniture effects for accuracy improvement of indoor wave propagation models. IEEE Trans. Wirel. Commun. 2014, 13, 5999–6006.
39. Popoola, S.I.; Adetiba, E.; Atayero, A.A.; Faruk, N.; Calafate, C.T. Optimal model for path loss predictions using feed-forward neural networks. Cogent Eng. 2018, 5, 1–20.
40. Park, C.; Tettey, D.K.; Jo, H.-S. Artificial Neural Network modeling for path loss prediction in urban environments. arXiv 2019, arXiv:1904.02383.
41. Ramya, P.M.; Boban, M.; Zhou, C.; Stańczak, S. Using Learning Methods for V2V Path Loss Prediction. In Proceedings of the 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakesh, Morocco, 15–18 April 2019; pp. 1–6.
42. Hata, M. Empirical Formula for Propagation Loss in Land Mobile Radio Services. IEEE Trans. Veh. Technol. 1980, 29, 317–325.
43. Kuzulugil, K.; Tugcu, Z.H.; Cavdar, I.H. A Proposed V2V Path Loss Model: Log-Ray. Arab. J. Sci. Eng. 2023, 48, 14901–14911.
44. Dongare, A.D.; Kharde, R.R.; Kachare, A.D. Introduction to Artificial Neural Network. Int. J. Eng. Innov. Technol. (IJEIT) 2012, 2, 189–194.
45. Liu, Y.; Wang, Y.; Zhang, J. New Machine Learning Algorithm: Random Forest. In Proceedings of the International Conference on Information Computing and Applications (ICICA 2012), Chengde, China, 14–16 September 2012; Volume 7473, pp. 246–252.
46. Lunt, M. Introduction to statistical modelling: Linear Regression. Rheumatology 2015, 54, 1137–1140.
47. Cai, J.; Xu, K.; Zhu, Y.; Hu, F.; Li, L. Prediction and analysis of net ecosystem carbon exchange based on Gradient Boosting regression and Random Forest. Appl. Energy 2020, 262, 114566.
48. Solomatine, D.P.; Shrestha, D.L. AdaBoost.RT: A boosting algorithm for regression problems. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics IEEE, Budapest, Hungary, 25–29 July 2004; Volume 1, pp. 1163–1168.
49. Cohda Wireless. MK5 OBU. Cohda Wireless Website. Available online: https://www.cohdawireless.com/solutions/hardware/mk5-obu/ (accessed on 27 July 2024).
50. Rodríguez, P.; Bautista, M.A.; Gonzalez, J.; Escalera, S. Beyond one-hot encoding: Lower dimensional target embedding. Image Vis. Comput. 2018, 75, 21–31.
51. Huang, Q.; Mao, J.; Liu, Y. An improved grid search algorithm of SVR parameters optimization. In Proceedings of the 2012 IEEE 14th International Conference on Communication Technology, Chengdu, China, 9–11 November 2012; pp. 1022–1026.

Figure 1. Measured highway routes and map views.

Figure 2. Measurement system block diagram.

Figure 3. Interior view of the transmitter and receiver vehicle and MK5 OBU Kit [49].

Figure 4. Input parameters used in path loss prediction with machine learning.

Figure 5. Distribution of input parameters.

Figure 6. Visual performance comparison of machine learning models for path loss prediction.

Figure 7. Comparison of machine learning models for path loss prediction using performance metrics (MAE, MSE, RMSE, R²).

Table 1. Comparison of machine-learning-based path loss estimation studies in the literature.

Ref.	Freq. (MHz)	Environment	Input Parameters	Output Parameter	Test Model	RMSE (dB)	Study Area
[10]	930	Railway	Distance between Base station and Receiver	Path Loss	ANN	1.10	Other
[11]	900	Suburban	Base Station transmission power, Mobile Station receiver power, Frequency, Distance	Path Loss	ANN, SVR, MLR	ANN = 0.00493 MLR = 0.051 SVM = 0.019	Other
[12]	VHF, UHF	High-building city, High-density city, Inland lake and quasi-flat terrain	Antenna horizontal deflection theta, Effective height of transmitting antenna, Transmitting downtilt, Signal power, Frequency	Path Loss	RF	6.106	Other
[13]	1800	Highway (Nigeria)	Normalized distance between the base station and the mobile station	Path Loss	ELM, Okumura-Hata	ELM = 4.250 Okumura-Hata = 8.73	Other
[14]	900	Urban environments (Greece)	Coordinates of the Transmitter and Receiver, Distance along the xy-plane, Features depicting the intersections between the buildings and the LOS path	Path Loss	ANN SVR kNN RF AB	ANN: 2.95 SVR: 3.02 kNN: 2.97 RF: 2.85 XGB: 2.84 AB: 2.92 LGB: 2.83	Other
[15]	1890	Urban and Suburban Environments (Greece)	Distance between transmitter and receiver, Street width, Building separation, Building height, Difference between base station antenna height and rooftop height	Path Loss	RBF-NN	3.71	Other
[16]	1800	Smart campus (Nigeria)	Longitude, Latitude, Elevation, Altitude, Clutter height, Distance	Path Loss	ANN, RF	RF = 1.927, ANN = 65.89	Other
[17]	828.93	Urban (Colombia)	BS transmission power, BS height, BS antenna gain, BS combiner losses, BS cable losses, Analyzer antenna gain, Analyzer cable losses, Low-noise amplifier gain, Analyzer height, GSM channel transmission frequency	Path Loss	Hata-Okumura ANN	1.2688	Other
[18]	203.25	Urban (Nigeria)	Electromagnetic field strengths from the transmitter	Path Loss	NF	5.2	Other
[19]	1800	Suburban (Brazil)	Distance, Transmitter Antenna Height, Transmitter Antenna Gain	Path Loss	ANN	4.29	Other
[20]	877.26, 2021.4, 2127.6	City (Beijing, China)	Distance, Frequency	Path Loss	ANN, SVR, RF	ANN = 4.74, SVR = 4.54, RF = 4.19	Other
[21]	3700, 26000	City (Lisbon, Portugal)	3D Distance, Frequency	Path Loss	SVR, RF	SVR = 6.97, RF = 6.25	Other
[22]	2140	City (Frankfurt, Germany)	Distance, Frequency, Tx Altitude, Rx Altitude, Rx Coordinate, LoS, NLoS	Path Loss	SVR, RF, kNN	2.1–4.1	Other
[23]	450, 1450, 2300	Suburban (Korea)	Distance, Frequency, Tx Height, Rx Height, Rx Height Ratio	Path Loss	ANN, GPR	ANN = 8.42–9.17, GPR = 8.53–8.94	V2V
[24]	3700	Rural (Greece)	3D Distance, Tx Height, Rx Height, LoS, OLoS	Path Loss	ANN, SVR	ANN = 4–4.9, SVR = 4.4–6.5	Other
[25]	5900	City (Berlin, Germany)	2D City Map, Street Map, Vehicle Location	Path Loss	CNN	0.032	Other
[26]	700–2600	City (Melbourne, USA)	Tx Gain, Position Height, Distance	Path Loss	ANN	0.636	Other
[27]	14000, 18000, 22000	Indoor Environment (South Africa)	Distance, Operating Frequency, Angle of incidence, Transmitter Antenna Height	Path Loss	SVR, DT, RF, ANN, KNN	SVR = 0.93, ANN = 0.036, KNN = 0.91, DT = 1.16, RF = 0.91	Other
[28]	811, 2630	University Campus (Denmark)	Local coordinates (lat. and long.), Tx indicator, Distance, 3D Distance, Satellite Image	Path Loss	DNN	4.1	Other
[29]	2500	Urban/Suburban Rural (China)	Distance, Tx height, Rx height, Obstruction loss (DL): Calculated by Deygout method, Frequency, Crossed distance, di	Path Loss	ANN	4.1–6.8	Other
[30]	854.71	Urban (Brazil)	Distance, Terrain elevation, Horizontal angle, Vertical angle, Latitude, Longitude, Antenna horizontal and vertical attenuation	Path Loss	SVR	1.76–4.55	Other
[31]	10 (VVLC)	Campus Road and Indoor Parking Garage	Distance, Ambient light, Rx tilt angle, Optical turbulence, Occupied lane, Tx LED type, Rx detector type, Tx height, Rx height, Road surface, Environmental conditions	Path Loss	RF	3.95	V2V
[32]	600	Urban (Ecuador)	Distance, Base station (BS) height, Mobile station (MS) height, Carrier frequency, Rx coordinates (X, Y), LoS or NLoS status	Path Loss	SVR, RF, kNN, ANN	5.9–6.4	Other
[33]	1890	Urban (Greece)	Distance, Street width, Building height, Distance between building blocks, Tx antenna position with respect to roof height	Path Loss	ANN	5.8–6.6	Other
[34]	5900	Suburban	Tx and Rx Location, Distance, Type of obstacle between Tx and Rx	Path Loss	RF	MAE:3.18	V2V
[35]	5860	Urban and Highway Scenarios	Distance, Direction, Relative Speed Between Two Vehicles	Path Loss	RF	MAE:4.31	V2V
[36]	3700	Suburban (Korea)	Distance, LoS/NLoS status, Effective height between Tx and Rx	Path Loss	ANN, RF, SVR, kNN	RF = 4.3, SVR = 4.4, ANN = 4.5, kNN = 4.2	Other
[37]	3520	Aircraft Interior Cabin	Location data, Antenna configurations, Channel impulse response matrix	Path Loss	SVR	0.03–1.43	Other
[38]	900, 1800, 2100, 2400	Indoor	Tx-Rx distance, Frequency, Number of walls and floors crossed, Collision angle with walls and floors, Furniture index and human density ratio	Path Loss	ANN	5.4–9.7	Other
[39]	1800	Urban, rural, suburban	Longitude, Latitude, Altitude, Obstacle height, Elevation	Path Loss	ANN	5.56	Other
[40]	3000–6000	Urban	Distance, Frequency	Path Loss	ANN	6.10	Other
[41]	5900	Suburban	Distance, Tx and Rx Latitude-Longitude Info, Integer variable that distinguishes vehicles between Tx and Rx	Path Loss	RF	2.49	V2V

Table 2. Technical specifications of the DSRC OBU device [49].

Parameter	Value
Standard	IEEE 802.11p
Frequency band	5.9 GHz
Data rate	3–27 Mbps
Tx power	22 dBm
Antenna gain	5 dBi
Antenna heights	Vehicles = 1.46 m (Tx and Rx)
Rx sensitivity	−99 dBm at 3 Mbps

Table 3. Description of input features.

Input Feature	Description
Number of Antennas	Measurements were categorized into two groups: single (0) and double (1) antenna setups.
Antenna Position	The position of the antennas was classified as either inside (0) or outside (1) the vehicle.
Weather Conditions	Measurements were taken under two different weather conditions: sunny (0) and snowy (1) days.
Direction of Vehicles	Measurements were conducted based on the direction of the vehicles, either opposite (0) or the same (1).
Distance	The distance between vehicles was considered in meters.
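
The binary codes in Table 3 can be expanded into the nine named features that appear in Table 6 (one column per category level plus distance). The following is a minimal sketch of such a one-hot expansion; whether the authors used exactly this layout, and how distance was scaled, is not stated here, so the function `encode_observation`, its argument names, and the column order are assumptions made for illustration.

```python
# Feature names as they appear in Table 6; the order is an assumption.
FEATURE_NAMES = [
    "Distance",
    "Single Antenna", "Dual Antenna",
    "Antenna Inside", "Antenna Outside",
    "Sunny", "Snowy",
    "Opposite Direction", "Same Direction",
]


def encode_observation(num_antennas: int, antenna_position: int,
                       weather: int, direction: int,
                       distance_m: float) -> dict:
    """One-hot encode a measurement using the 0/1 codes from Table 3.

    Codes: num_antennas 0 = single, 1 = double; antenna_position 0 = inside,
    1 = outside; weather 0 = sunny, 1 = snowy; direction 0 = opposite,
    1 = same. Distance is passed through in meters (no scaling applied).
    """
    for code in (num_antennas, antenna_position, weather, direction):
        if code not in (0, 1):
            raise ValueError("categorical codes must be 0 or 1")
    return {
        "Distance": float(distance_m),
        "Single Antenna": int(num_antennas == 0),
        "Dual Antenna": int(num_antennas == 1),
        "Antenna Inside": int(antenna_position == 0),
        "Antenna Outside": int(antenna_position == 1),
        "Sunny": int(weather == 0),
        "Snowy": int(weather == 1),
        "Opposite Direction": int(direction == 0),
        "Same Direction": int(direction == 1),
    }


# Hypothetical example: dual antenna, outside, snowy, opposite direction, 250 m.
row = encode_observation(1, 1, 1, 0, 250.0)
vector = [row[name] for name in FEATURE_NAMES]
print(vector)
```

Each pair of one-hot columns is perfectly complementary, which is one reason a feature selection method may keep only one column of a pair.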

Table 4. Hyperparameter optimization results for machine learning models.

Model	Parameter	Range Tested	Optimized Value
ANN	Number of Layers	1-3-5-7	3
	Number of Neurons in Layer 1	0-10-50-100	10
	Number of Neurons in Layer 2	0-10-50-100	10
	Number of Neurons in Layer 3	0-10-50-100	10
	Activation Function	ReLU, Sigmoid, Tanh	ReLU
	Number of Epochs	500-1000-1500-2000	1500
RF	Minimum Leaf Size	10-20-30-40-50	20
	Surrogate Decision Splits	On, Off	On
LR	Initial Terms	Linear, Quadratic	Linear
	Upper Bound on Terms	Interactions, Stepwise Linear	Interactions
	Maximum Number of Steps	1000-2000-3000-4000-5000	2000
GB	n_estimators	50-100-200-300	200
	Learning Rate	0.001-0.01-0.1-1	0.01
	Max Depth	1-5-10	5
SVR	Kernel Function	RBF, Linear, Polynomial	RBF
	C (Regularization Parameter)	0.1-0.5-1-5-10	0.5
	Epsilon	0.01-0.05-0.1-0.5	0.05
	Kernel Scale	0.1-1-5-10	1
AB	Base Estimator	Random Forest, Linear Regression	Random Forest
	n_estimators	50-100-150-200	100
	Learning Rate	0.01-0.1-0.5-1	0.1
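
To illustrate how the ranges in Table 4 translate into a search space, the sketch below enumerates the GB grid exhaustively and picks the candidate with the lowest score. This is an assumption-laden illustration rather than the authors' tuning code: the search strategy is not described in this section, and `toy_score` is a hypothetical placeholder that stands in for a real validation RMSE so that the example runs without a dataset.

```python
import math
from itertools import product

# GB ranges copied from Table 4.
GB_GRID = {
    "n_estimators": [50, 100, 200, 300],
    "learning_rate": [0.001, 0.01, 0.1, 1],
    "max_depth": [1, 5, 10],
}


def iter_grid(grid: dict):
    """Yield every parameter combination of the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid: dict, score_fn):
    """Return the candidate with the lowest score (e.g., validation RMSE)."""
    return min(iter_grid(grid), key=score_fn)


def toy_score(params: dict) -> float:
    """Hypothetical stand-in for a validation RMSE.

    In practice this would train a model with `params` and evaluate it on
    held-out data; here it is simply smallest at the optimized values
    reported in Table 4.
    """
    return (
        abs(params["n_estimators"] - 200) / 100
        + abs(math.log10(params["learning_rate"]) - math.log10(0.01))
        + abs(params["max_depth"] - 5) / 5
    )


n_candidates = sum(1 for _ in iter_grid(GB_GRID))
best = grid_search(GB_GRID, toy_score)
print(n_candidates, best)
```

The GB grid contains 4 × 4 × 3 = 48 candidates; the same enumeration applied to the SVR ranges would give 3 × 5 × 4 × 4 = 240, which shows how quickly the cost of an exhaustive search grows with the number of tuned parameters.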

Table 5. Performance comparison of path loss prediction models (validation).

Prediction Type	Model Type	MAE	MSE	RMSE	R²
Machine Learning	RF	0.0557	0.0059	0.0774	0.97
	GB	0.0701	0.0093	0.0964	0.96
	ANN	0.1080	0.0222	0.1477	0.91
	SVR	0.1108	0.0235	0.1533	0.90
	AB	0.1782	0.0483	0.2198	0.79
	LR	0.2299	0.0772	0.2780	0.68
Empirical Model	Log-Distance	-	15.840	3.9884	0.92
	Two-Ray	-	100.602	10.0380	0.32
	Log-Ray	-	70.5625	8.4032	0.29
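
For reference, the four metrics reported in Table 5 follow the standard definitions of MAE, MSE, RMSE, and R². The sketch below implements them in plain Python; the sample values are hypothetical and chosen only to make the arithmetic easy to follow, and they are not measurements from this study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, i.e., the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical path loss values, for illustration only.
y_true = [100.0, 110.0, 120.0, 130.0]
y_pred = [102.0, 108.0, 121.0, 129.0]

print(mae(y_true, y_pred))   # 1.5
print(mse(y_true, y_pred))   # 2.5
print(rmse(y_true, y_pred))
print(r2(y_true, y_pred))
```

Because RMSE is the square root of MSE, the two columns of Table 5 carry the same information on different scales, while MAE is less sensitive to occasional large errors.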

Table 6. Feature weights obtained with the PCA, MRMR, and F-Test methods.

Features	PCA (%)	MRMR	F-Test
Distance	43.1	1.7983	∞
Snowy	26.7	0.8683	∞
Dual Antenna	-	0.8683	-
Sunny	13.9	0.8683	∞
Same Direction	12.3	0.5973	736.6049
Opposite Direction	-	-	369.3322
Antenna Outside	-	0.5295	-
Antenna Inside	-	0.5295	-
Single Antenna	-	0.6326	-

Table 7. Performance of models after feature selection methods (RMSE).

Model	No Feature Selection	PCA	MRMR	F-Test
RF	0.0774	0.0810	0.0821	0.08124
GB	0.0964	0.1053	0.9874	0.9959
ANN	0.1477	0.1488	0.1493	0.1032
SVR	0.1533	0.1601	0.1645	0.1585
AB	0.2198	0.1787	0.1998	0.2156
LR	0.2780	0.2793	0.2815	0.2832

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

