*Article* **Development of a Multilayer Perceptron Neural Network for Optimal Predictive Modeling in Urban Microcellular Radio Environments**

**Joseph Isabona 1, Agbotiname Lucky Imoize 2,3, Stephen Ojo 4, Olukayode Karunwi 5, Yongsung Kim 6,\*, Cheng-Chi Lee 7,8,\* and Chun-Ta Li <sup>9</sup>**


**Abstract:** Modern cellular communication networks are already strained by a large and steadily growing number of mobile subscribers who demand better service quality. To deploy and manage such networks reliably and optimally, the radio signal attenuation (path loss) along the path between a base station transmitter and a mobile station receiver must be estimated accurately. Although many log-distance-based linear models for path loss prediction in wireless cellular networks exist, radio frequency planning requires advanced non-linear models for more accurate path loss estimation, particularly in complex microcellular environments. Several works have reported that the prediction error of the conventional models generally ranges from 8 to 12 dB in terms of Root Mean Square Error (RMSE), which is too high compared with the acceptable error limit of 0 to 6 dB. Near-precise machine learning-based path loss prediction models are therefore needed. This work develops a distinctive multi-layer perceptron (MLP) neural network-based path loss model with a well-structured network architecture, empowered by grid search-based hyperparameter tuning. The proposed model is designed for optimal path loss approximation between the mobile station and the base station. The hyperparameters examined are the number of neurons, the learning rate and the number of hidden layers. Using the tuned best values of these hyperparameters, the prediction accuracy of the developed MLP model was evaluated with different learning and training algorithms on extensive experimental path loss datasets. The experimental path loss data were acquired through a field drive test conducted over an operational 4G LTE network in an urban microcellular environment. The results were assessed using several first-order statistical performance indicators.
The results show that the prediction errors of the proposed MLP model compare favourably with the measured data and are lower than those obtained with conventional log-distance-based path loss models.

**Keywords:** path loss models; log-distance models; neural network models; MLP-based models; optimal predictive modelling; multi-layer perceptron neural network; urban microcellular radio networks

**Citation:** Isabona, J.; Imoize, A.L.; Ojo, S.; Karunwi, O.; Kim, Y.; Lee, C.-C.; Li, C.-T. Development of a Multilayer Perceptron Neural Network for Optimal Predictive Modeling in Urban Microcellular Radio Environments. *Appl. Sci.* **2022**, *12*, 5713. https://doi.org/10.3390/app12115713

Academic Editors: Pavel Lyakhov and Amalia Miliou

Received: 12 April 2022 Accepted: 2 June 2022 Published: 3 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Path loss models are prediction models used by telecom network engineers to estimate the coverage area served by a given transmitter during network planning and management [1–3]. However, developing path loss models with adequate accuracy is a complex and significant problem in telecommunication network planning. The conventional log-distance-based statistical models available in the literature, such as the cluster factor model, the COST 231 Hata model, the free space model, the Hata model and the Lee model, lack accuracy for realistic path loss prediction in cellular mobile network environments [4–11]. This fundamental limitation is usually most pronounced when a model is applied in cellular radio environments other than the one for which it was developed [12,13]. This is mainly due to differences in environmental formations (hilly, mountainous or quasi-plain), weather conditions, soil electrical properties and terrain type (open, rural, suburban or urban) across radio propagation locations, cities and countries [14–23]. For example, the Hata model was developed from extensive practical measurements carried out in Japan at transmission frequencies of 150 to 1920 MHz and macrocellular communication distances of 1 to 100 km, with mobile station and base station antenna heights of 2 to 3 m and 30 to 1000 m, respectively [24]. This model, like the other conventional models mentioned above, is also generally limited in capturing the non-linear relationship between the independent variable (e.g., distance) and the dependent variable (e.g., path loss) [25].
Many previous works have reported that the prediction error of these conventional models generally ranges from 8 to 12 dB in terms of Root Mean Square Error (RMSE), which is well above acceptable values [9,14,15].

Recently, the Artificial Neural Network (ANN), an artificial intelligence-based soft computing and modelling technique, has been shown to solve function approximation and pattern classification problems effectively [26]. Several ANN models exist in the literature; the key ones include Radial Basis Function models, Multilayer Perceptron models and Generalized Neural Network models. Among these, MLP ANN models have recently stood out because they are robust and widely used for learning, function approximation and pattern classification [27–30]. The MLP ANN offers many robust algorithms that can be used to perform more proficient adaptive non-linear statistical modelling than the classical logistic regression methods [14–23] frequently used to develop predictive models. This robustness can be ascribed to their acknowledged ability to learn, predict and classify non-linear data from experience and from previous samples presented to the network. Huang [31] also noted that the MLP is well suited to input-output data mapping. In general, a clear advantage of ANN-based models over conventional log-distance-based models is their large number of degrees of freedom, which allows them to fit datasets with non-linear or linear correlation patterns. Intelligent ANN-based models for optimal and adaptive estimation of path losses were introduced to overcome the limitations of existing empirically and deterministically developed log-distance models [32,33]. The availability of many resourceful training algorithms and tunable hyperparameters in MLP ANNs, which can further boost their predictive data analysis, is worth exploring for optimal predictive modelling. Hence, the motivation for developing a multilayer perceptron neural network for optimal predictive modelling in urban microcellular radio environments is self-evident.
Other key advantages of ANNs in general are highlighted in Section 2.3.

Though several ANN models exist in the literature, a critical and challenging task remains: developing and using them appropriately by correctly selecting their network architecture, input elements and hyperparameters to solve specific predictive mapping and function approximation problems [34]. Addressing this issue is the leading motivation for this paper. Another critical challenge with neural network models is determining which input variables correlate with the target variables [35,36].

This paper develops a distinctive MLP-based path loss model with a well-structured network architecture, empowered by grid search-based hyperparameter tuning, for optimal path loss approximation between the mobile station and the base station. The hyperparameters are the number of neurons, the learning rate and the number of hidden layers. Using the tuned best values of these hyperparameters, the prediction accuracy of the developed MLP model was evaluated with different learning and training algorithms on extensive experimental path loss datasets. The datasets were acquired via field drive tests conducted over a Long Term Evolution (LTE) network in an urban microcellular radio environment. The MLP-ANN model was developed and implemented with the Neural Network Toolbox of MATLAB R2018a. The toolbox provides the required user interface, algorithms and platform to train, test, validate, visualize and simulate networks with the desired number of layers, neurons and activation functions.

In particular, the contributions of this paper are summarized as follows:


The remainder of this paper is structured as follows. Section 2 outlines background information, including radio propagation mechanisms, log-distance-based path loss prediction models and the fundamentals of artificial neural networks (ANNs). Section 3 presents the methodology, detailing the neural network implementation for predictive modelling. Section 4 presents the results and analysis, the evaluation of the introduced neural network model at different study locations, a comparison of the developed model with log-distance models, and a discussion. Finally, Section 5 concludes the paper.

#### **2. Theoretical Background**

The theoretical background covers the radio propagation mechanism, log-distance-based path loss prediction models and artificial neural network systems.

#### *2.1. Radio Propagation Mechanism*

As radio signals, which are a form of electromagnetic wave, propagate, they interact with the media and objects they travel through. Through these interactions, the signals become weaker owing to refraction, reflection, diffraction, absorption and other propagation phenomena. The combined effect of these phenomena on the propagated signal is the propagation loss. The characteristics of the pathway or medium through which the signals travel determine the amount of propagation loss and the quality of the signal received at the receiving terminal. The received signal level is also governed by other factors, particularly the transmitter power, receiver sensitivity and antenna parameters such as antenna gain, antenna height and receiver location [1,2,37,38].

The prominent factors that influence the amount of path loss in a medium include diffraction, reflection, refraction, scattering and absorption. For example, diffraction arises when radio waves encounter obstacles that are large compared with the signal wavelength, causing the signals to bend around the objects, especially those with sharp edges. This bending often allows the received signal energy to spread around the boundaries of the obstructing object [39,40]. Diffraction is also influenced by the phase, amplitude, pathway and frequency of the transmitted waves.
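To make the diffraction mechanism concrete, the sketch below computes the loss caused by a single sharp-edged obstacle with the classical knife-edge approximation: the Fresnel–Kirchhoff parameter *v* and the approximation of the loss *J*(*v*) given in ITU-R Recommendation P.526. This idealized model is not one of the models used in this paper, and the obstacle geometry in the example is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fresnel_kirchhoff_v(h_m: float, d1_m: float, d2_m: float, freq_hz: float) -> float:
    """Diffraction parameter v for a single knife edge.

    h_m  : height of the edge above the direct Tx-Rx line (negative if below it)
    d1_m : distance from the transmitter to the edge
    d2_m : distance from the edge to the receiver
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    return h_m * math.sqrt((2.0 / wavelength) * (1.0 / d1_m + 1.0 / d2_m))


def knife_edge_loss_db(v: float) -> float:
    """Approximate knife-edge diffraction loss J(v) in dB (ITU-R P.526 form).

    The approximation applies for v > -0.78; below that the loss is taken as 0 dB.
    """
    if v <= -0.78:
        return 0.0
    return 6.9 + 20.0 * math.log10(math.sqrt((v - 0.1) ** 2 + 1.0) + v - 0.1)


# Hypothetical example: a rooftop edge 10 m above the line of sight,
# 200 m from the eNodeB and 300 m from the mobile, at 2600 MHz.
v = fresnel_kirchhoff_v(10.0, 200.0, 300.0, 2.6e9)
loss_db = knife_edge_loss_db(v)
```

When the edge just grazes the line of sight (*v* = 0), the approximation gives about 6 dB, and the loss grows as the obstacle rises further into the path.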

The environment in which radio frequency signals propagate inevitably affects them. For example, received signal levels and propagation loss vary considerably with the terrain landscape, building structures and population density. The type of ground, such as marshy, damp or sandy terrain, also affects attenuation, particularly of low-frequency signals, which propagate with less loss over highly conductive terrain than over poorly conductive terrain.

#### *2.2. Log-Distance-Based Path Loss Prediction Models*

In general, path loss models are sets of mathematical expressions and algorithms used to predict signal attenuation along the path between a base station transmitter and a mobile station receiver. These models are helpful planning tools that assist radio network designers of cellular telecommunication systems in optimally positioning base station transmitters to meet the desired coverage level and service quality requirements of the networks.

The predictive performance of a path loss model is judged by its prediction accuracy against actual field-measured loss data.

Log-distance-based path loss models are models in which the average power loss depends logarithmically on the distance (transmission path length), scaled by a propagation exponent. The propagation exponent accounts for the specific radio propagation environment. These can also be described as simplified models that attempt to capture variations, fluctuations and attenuation in the received signal power. Examples of log-distance-based models include the Walfisch–Ikegami, Walfisch–Bertoni, cluster factor, COST 231 Hata, Okumura–Hata, SUI, Lee and Egli models, among others [41]. Although these models have different frequency validity ranges, correction factors have been applied to extend their applicability to the tested frequency band. Detailed descriptions of these models are given in [14,17,42].
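As a minimal sketch of the log-distance family, the code below implements the generic model PL(*d*) = PL(*d*<sub>0</sub>) + 10 *n* log<sub>10</sub>(*d*/*d*<sub>0</sub>), fits PL(*d*<sub>0</sub>) and the exponent *n* to (distance, path loss) pairs by ordinary least squares, and evaluates the urban COST 231 Hata formula for comparison. The function names are illustrative and the shadowing term is omitted. COST 231 Hata is strictly defined only for roughly 1500–2000 MHz, base station heights of 30–200 m, mobile heights of 1–10 m and distances of 1–20 km, so applying it at 2600 MHz relies on the kind of correction factors mentioned above.

```python
import numpy as np


def fspl_db(d_km, f_mhz):
    """Free-space path loss (dB) for distance in km and frequency in MHz."""
    return 20.0 * np.log10(d_km) + 20.0 * np.log10(f_mhz) + 32.45


def log_distance_pl_db(d_m, pl_d0_db, n, d0_m=1.0):
    """Mean log-distance path loss PL(d) = PL(d0) + 10 n log10(d / d0)."""
    return pl_d0_db + 10.0 * n * np.log10(np.asarray(d_m, dtype=float) / d0_m)


def fit_log_distance(d_m, pl_db, d0_m=1.0):
    """Least-squares estimates of PL(d0) and the path loss exponent n."""
    x = 10.0 * np.log10(np.asarray(d_m, dtype=float) / d0_m)
    design = np.column_stack([np.ones_like(x), x])
    (pl_d0_db, n), *_ = np.linalg.lstsq(design, np.asarray(pl_db, dtype=float), rcond=None)
    return pl_d0_db, n


def cost231_hata_db(d_km, f_mhz, hb_m, hm_m, c_m_db=3.0):
    """COST 231 Hata path loss (dB); c_m_db = 3 dB for metropolitan centres, 0 dB otherwise."""
    a_hm = (1.1 * np.log10(f_mhz) - 0.7) * hm_m - (1.56 * np.log10(f_mhz) - 0.8)
    return (46.3 + 33.9 * np.log10(f_mhz) - 13.82 * np.log10(hb_m) - a_hm
            + (44.9 - 6.55 * np.log10(hb_m)) * np.log10(d_km) + c_m_db)


# Usage with hypothetical noise-free data generated with PL(d0) = 40 dB and n = 3.
d = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
pl_d0, n_hat = fit_log_distance(d, log_distance_pl_db(d, 40.0, 3.0))
```

With real drive-test data the residuals of such a fit are what the RMSE figures quoted in Section 1 summarize.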

#### *2.3. Artificial Neural Networks (ANNs)*

ANNs, also popularly referred to as artificial neural systems, are efficient and relatively simple computational models inspired by the neural organization of the brain, whose adjustable parameters allow them to process information effectively. ANNs are distinctive and robust non-linear statistical data modelling networks in which relatively simple connections are established between input and output nodes. According to Robert Hecht-Nielsen, the inventor of one of the first neurocomputers, a neural network can be defined as "*a computing system made up of several simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs*". The processing elements are called neurons. A neuron is a mathematical function that receives and processes information according to the neural network architecture.

Some of the essential features or advantages of ANNs are [31,34–36].


#### **3. Methodology**

As mentioned earlier, the ANN model offers many robust training algorithms and hyperparameters that can be explored to perform proficient adaptive non-linear statistical modelling beyond the classical logistic regression methods. This section describes the materials and methods used to develop the proposed MLP-ANN-based model with a well-structured network architecture, empowered by an appropriate hyperparameter tuning algorithm, for optimal predictive analysis of practical path loss data. The steps followed to develop the proposed MLP-ANN-based model are as follows:


#### *3.1. Data Collection*

Field measurements were conducted for one year (i.e., 12 months) to acquire live signal data around three Long Term Evolution (LTE) base station transceivers operating at 2600 MHz with a 10 MHz bandwidth [43]; the one-year duration was chosen to capture the seasonal variations of the study locations. The base station antennas (eNodeBs) are sectorized, with 17.5 dBi gain and 43 dBm transmit power. The LTE network belongs to one of the major GSM/WCDMA/HSPA/LTE telecom service providers operating across major towns, villages and cities in Nigeria. The measurements were performed with field test tools running the TEMS application for radio spectrum analysis. The test tools, some of which are displayed in Figure 1, include a rover car, a scanner, two Samsung mobile phones and an HP laptop; they were used to assess the performance of the eNodeBs over the LTE radio air interface by connecting the mobile phones directly to the eNodeB transmitters. Global Positioning System (GPS) equipment was employed to obtain the eNodeB locations and to geo-reference the measurement data. The measured path loss, *PL*<sub>meas</sub> (dB), was obtained from the measured reference signal received power, RSRP (dBm), using Equation (1):

$$PL\_{\text{meas}}(\text{dB}) = EIRP + G\_A - RSRP\_{\text{meas}} \tag{1}$$

**Figure 1.** Illustration of the TEMS Drive Test Measurement System.

The effective isotropic radiated power, *EIRP*, is calculated using Equation (2):

$$EIRP = P\_{TX} + G\_{TX} - CL\_{TX} \tag{2}$$

where *G*<sub>TX</sub> and *G*<sub>A</sub> are the base station (BS) transmit antenna gain and the mobile station (MS) receive antenna gain, respectively, *P*<sub>TX</sub> is the transmit power and *CL*<sub>TX</sub> denotes the transmission cable loss, with powers in dBm and gains and losses in dB. Table 1 lists some of the key BS site parameters acquired during the field drive test and used in the calculation.
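Equations (1) and (2) amount to simple decibel arithmetic, sketched below for a few RSRP readings. The 43 dBm transmit power and 17.5 dBi antenna gain are the values quoted in this section, whereas the 2 dB cable loss, the 0 dBi mobile antenna gain and the RSRP samples are hypothetical placeholders, since the Table 1 values are not reproduced here.

```python
def eirp_dbm(p_tx_dbm: float, g_tx_dbi: float, cable_loss_db: float) -> float:
    """Equation (2): EIRP = P_TX + G_TX - CL_TX."""
    return p_tx_dbm + g_tx_dbi - cable_loss_db


def measured_path_loss_db(rsrp_dbm: float, eirp: float, g_rx_dbi: float) -> float:
    """Equation (1): PL_meas = EIRP + G_A - RSRP_meas (dBm minus dBm gives dB)."""
    return eirp + g_rx_dbi - rsrp_dbm


EIRP = eirp_dbm(p_tx_dbm=43.0, g_tx_dbi=17.5, cable_loss_db=2.0)  # cable loss is hypothetical
rsrp_samples_dbm = [-75.0, -85.0, -98.5]                           # hypothetical drive-test RSRP
path_loss_db = [measured_path_loss_db(r, EIRP, g_rx_dbi=0.0) for r in rsrp_samples_dbm]
print(path_loss_db)  # [133.5, 143.5, 157.0]
```

Each measured sample thus becomes one target value for the MLP, paired with its input features.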

**Table 1.** Measurement Path Loss Computation Parameters.


#### *3.2. The MLP Neural Network Model*

The first step towards using neural networks effectively for predictive modelling is to choose the type of network and determine its architecture. This paper considers one of the most robust and widely used types of neural network: the multi-layer perceptron (MLP). A single-layer perceptron has limited input-output mapping capability because it contains only a single neuron with one set of adaptable synaptic weights and a bias; thus, it can only represent ridge-like functions, regardless of the activation function used [42]. This limitation can be overcome with an MLP neural network, in which one or more hidden layers of neurons are placed between the input and output layers. The multiple layers of neurons in the MLP provide enhanced input-output mapping capability.

Figure 2 displays the structure of a classic feedforward MLP network with inputs *g*<sub>1</sub>, *g*<sub>2</sub>, ..., *g*<sub>m</sub>, predicted outputs *ŷ*<sub>1</sub>, *ŷ*<sub>2</sub>, ..., *ŷ*<sub>N</sub> and *k*<sub>h</sub> hidden nodes. The weights connecting the input and hidden layers and those connecting the hidden and output layers are denoted by *w*<sup>1</sup><sub>ij</sub> and *w*<sup>2</sup><sub>jn</sub>, respectively, while *c*<sub>j</sub> denotes the hidden-node thresholds. The network learns the relationship between the input data and the desired output by adjusting its weight and bias values. Accordingly, the predicted output of the MLP network at the *n*th output node can be expressed as Equation (3):

$$\hat{y}\_n(t) = \sum\_{j=1}^{k\_h} w^{2}\_{jn}\, F\left(\sum\_{i=1}^{m} w^{1}\_{ij}\, g\_i(t) + c\_j\right) \tag{3}$$

for 1 ≤ *n* ≤ *N*, 1 ≤ *j* ≤ *k*<sub>h</sub> and 1 ≤ *i* ≤ *m*.

**Figure 2.** Scheme of a three-layered ANN multi-layer perceptron.

where *m* and *k*<sub>h</sub> denote the numbers of input and hidden nodes, respectively, and the index *i* denotes the input connected to hidden-layer neuron *j*.

The function *F*(·) in Equation (3) denotes the sigmoid activation function, which is commonly used in MLP networks. It is defined by Equation (4):

$$F(a) = \frac{1}{1 + e^{-a}} \tag{4}$$

where *F*(*a*) always lies in the open interval (0, 1) for any real input *a*.
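A minimal NumPy sketch of the forward pass in Equation (3), using the sigmoid of Equation (4), may help make the notation concrete. Matrix shapes follow the symbols above (*m* inputs, *k*<sub>h</sub> hidden nodes, *N* outputs); the random weights are placeholders for trained values, and, as in Equation (3), the output layer is linear with no output bias.

```python
import numpy as np


def sigmoid(a):
    """Equation (4): F(a) = 1 / (1 + exp(-a)), bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-a))


def mlp_forward(g, W1, c, W2):
    """Equation (3) for one input vector.

    g  : (m,)       inputs g_1..g_m
    W1 : (m, k_h)   input-to-hidden weights w1_ij
    c  : (k_h,)     hidden-node thresholds c_j
    W2 : (k_h, N)   hidden-to-output weights w2_jn
    returns (N,)    predicted outputs y_hat_1..y_hat_N
    """
    hidden = sigmoid(g @ W1 + c)  # F(sum_i w1_ij * g_i + c_j) for every hidden node j
    return hidden @ W2            # sum_j w2_jn * hidden_j for every output n


# Shapes matching this paper's network (5 inputs, 1 output); 30 hidden nodes chosen for illustration.
rng = np.random.default_rng(0)
m, k_h, N = 5, 30, 1
y_hat = mlp_forward(rng.uniform(-1, 1, m), rng.normal(size=(m, k_h)),
                    rng.normal(size=k_h), rng.normal(size=(k_h, N)))
```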

The weights *w*<sup>1</sup><sub>ij</sub> and *w*<sup>2</sup><sub>jn</sub> and the thresholds *c*<sub>j</sub> are unknown and are therefore adjusted during training to reduce the prediction error, which can be expressed as Equation (5):

$$\varepsilon = \frac{1}{2} \sum\_{n=1}^{N} \left( y\_n - \hat{y}\_n \right)^2 \tag{5}$$

where *y*<sub>n</sub> and *ŷ*<sub>n</sub> represent the target (i.e., actual) data and the corresponding predicted output, *n* = 1, ..., *N*, and *N* is the number of data samples.

In MLP training, the network's learning progress and convergence are assessed with an aggregate error measure, most often the mean square error (MSE), defined by the least-squares expression in Equation (6):

$$MSE = \frac{1}{N} \sum\_{n=1}^{N} \left( y\_n - \hat{y}\_n \right)^2 \tag{6}$$

Figure 3 shows the feedforward MLP network model used in this work for path loss prediction.

**Figure 3.** ANN MLP Model Structure for Path Loss Prediction.

#### *3.3. MLP Modelling Parameters and Search Space*

Hyperparameters are a set of control parameters, fixed before training, that govern how the NN model learns from the data. They may be categorical, continuous or integer variables, and their values are usually bounded from below and above. Several MLP hyperparameters directly affect predictive modelling, including the number of hidden layers, the number of neurons in each hidden layer, the transfer function and the learning rate. These are described briefly in the following subsections.

#### 3.3.1. The Hidden Layers

Deciding the number of hidden layers is one of the most important issues when designing a neural network architecture for predictive modelling and data mining. Using too many hidden layers can result in poor generalization and make network training more complex. According to [44–47], two hidden layers, combined with *m* output neurons, are adequate for a neural network to learn *N* data samples with negligibly small errors.

Previous studies have examined the suitability of several machine learning models for path loss prediction [48–50]. The need to overcome the problems of empirical models in path loss prediction motivated the use of artificial neural networks [49]. ANN path loss prediction models were also found to be more efficient and easier to deploy than deterministic models [51]. In [52,53], empirical models with different propagation features were analysed, and the model with the lowest RMSE was then compared with an ANN prediction; the ANN-based path loss model produced a much lower MSE upon validation. In [54], a multi-layer perceptron neural network trained with a backpropagation algorithm was introduced for path loss prediction. Its predictions were compared with those of analytical models, and the results showed the MLP-based approach to be efficient for radio network planning and optimization.

ANNs have also been used for path loss prediction in urban areas [55]. That work explored the effect of various input parameters and environmental terrains on the robustness of path loss prediction. One key finding was that prediction accuracy increases with the number of input parameters (features). This trend arises because machine learning algorithms thrive on large datasets: the model learns from the data, which here include the input features. The ANN-based path loss models for 800 MHz and 1800 MHz introduced in [47,48] used longitude, latitude, distance, elevation and clutter height as inputs. The ANN method in [56] outperformed predictions based on the Support Vector Machine (SVM) and the Random Forest (RF).

In [57], an artificial neural network with two hidden layers was used for path loss prediction in a smart campus environment at 1800 MHz, and it outperformed RF-based prediction. Moreover, several machine learning-based models for signal prediction in wireless sensor networks were introduced in [58–60]. The machine learning-based prediction model in [61,62] also produced lower RMSE values than the analytical models it was compared with in a wireless sensor network.

#### 3.3.2. Neurons Number in the Hidden Layers

Determining the number of neurons in the hidden layers is an integral part of designing the overall neural network architecture. Too few neurons in the hidden layers can lead to underfitting, which arises when the hidden layers lack the capacity to learn or detect the underlying signal satisfactorily, especially in a complex dataset.

On the other hand, using too many neurons can lead to overfitting, which occurs when the network has so much information-processing capacity that it fits noise in the training data rather than the underlying relationship. Too many neurons can also greatly increase training time, possibly to the point that the network cannot be trained adequately. A trade-off must therefore be struck between too few and too many neurons in the hidden layers.
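One way to see how the number of hidden neurons controls capacity is to count the trainable weights and biases. The helper below is an illustrative sketch that assumes fully connected layers with a bias on every layer, including the output layer (a common implementation choice that differs slightly from Equation (3)). With five inputs and one output, as in the proposed model, it shows how quickly the parameter count grows relative to the number of training samples available.

```python
def mlp_param_count(n_inputs, hidden_sizes, n_outputs):
    """Trainable weights plus biases of a fully connected MLP (bias on every layer)."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


for hidden in ([2], [30], [50], [30, 30]):
    print(hidden, mlp_param_count(5, hidden, 1))
# [2] 15
# [30] 211
# [50] 351
# [30, 30] 1141
```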

#### 3.3.3. Transfer Function

The transfer (activation) function is usually a monotonically increasing, differentiable function that converts a neuron's weighted input into its output signal. Transfer functions are fundamental to neural networks for two key reasons. First, without non-linear activation functions, the entire neural network reduces to a linear function that cannot learn non-linear relationships. Second, the choice of transfer function shapes the main computation performed by the network.
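The first reason can be checked numerically. In the sketch below (random, hypothetical weights), two layers with identity transfer functions reproduce exactly a single linear layer with merged weights, whereas inserting a tanh transfer function (mathematically equivalent to MATLAB's `tansig`) breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))                           # 4 samples, 5 inputs
W1, b1 = rng.normal(size=(5, 8)), rng.normal(size=8)  # hidden layer
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)  # output layer

# Two layers with identity (linear) transfer functions...
two_linear_layers = (x @ W1 + b1) @ W2 + b2
# ...equal one linear layer with W = W1 W2 and b = b1 W2 + b2.
W, b = W1 @ W2, b1 @ W2 + b2
one_linear_layer = x @ W + b
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# A non-linear transfer function in the hidden layer prevents this collapse.
nonlinear = np.tanh(x @ W1 + b1) @ W2 + b2
```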

#### 3.3.4. Learning Rate

The *learning rate* is another vital hyperparameter; it controls how much the NN weights are adjusted in response to the loss gradient. Its value must be chosen carefully to support both optimization and generalization. A learning rate that is too large can cause the learning process to overshoot minima, whereas one that is too small can make learning too slow to converge and leave it trapped in poor, spurious local minima.
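The effect of the step size can be seen on the simplest possible loss, *f*(*w*) = *w*<sup>2</sup>, whose gradient is 2*w*. This toy sketch, which is not the paper's training setup, applies plain gradient descent, so each step multiplies *w* by (1 − 2η): η = 0.01 shrinks *w* slowly, η = 0.4 converges quickly and η = 1.1 overshoots the minimum with growing oscillations. Because this loss has a single minimum, it illustrates speed and divergence but not trapping in local minima.

```python
def gradient_descent_quadratic(w0: float, lr: float, steps: int) -> list[float]:
    """Minimise f(w) = w**2 (gradient 2w) with plain gradient descent; return all iterates."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * 2.0 * w  # w <- w - lr * df/dw
        path.append(w)
    return path


for lr in (0.01, 0.4, 1.1):
    final = gradient_descent_quadratic(1.0, lr, steps=20)[-1]
    print(f"lr={lr}: |w| after 20 steps = {abs(final):.3g}")
```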

#### *3.4. Hyperparameter Tuning*

Hyperparameter tuning (or optimization) is the procedure of identifying the best feasible values of a machine learning model's hyperparameters to attain the desired modelling outcome. Popular hyperparameter tuning algorithms in the literature include random search, grid search and Bayesian optimization. In this paper, the last two methods are considered. Grid search is a standard hyperparameter optimization technique in which a list of critical parameters is selected, each with a set of feasible values; the model is trained for every combination of these values, and the combination that yields the best outcome is chosen. Bayesian optimization is a sequential model-based optimization method that uses Bayes' theorem to conduct an automatic hyperparameter search. In particular, it uses the results of previous iterations to select the next hyperparameter values to evaluate.
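The grid search procedure described above can be sketched in a few lines. Here `evaluate` stands in for "train the MLP with these hyperparameters and return its validation RMSE". The toy scoring function and the learning-rate values in the grid are hypothetical, while the neuron and layer values mirror the ranges reported later in this paper (2–50 neurons, 1–3 hidden layers). A Bayesian optimizer would replace the exhaustive loop with a sequence of evaluations chosen using the results of earlier ones.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Exhaustive grid search: evaluate every combination and keep the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {
    "hidden_layers": [1, 2, 3],
    "neurons": [2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
    "learning_rate": [0.001, 0.01, 0.1],  # hypothetical values
}


def toy_validation_rmse(hidden_layers, neurons, learning_rate):
    """Hypothetical stand-in for training an MLP and returning its validation RMSE."""
    return abs(neurons - 30) / 10 + abs(hidden_layers - 1) + abs(learning_rate - 0.01) * 10


best, score = grid_search(toy_validation_rmse, grid)  # 3 * 11 * 3 = 99 evaluations
```

The cost grows multiplicatively with every added hyperparameter, which is why the search ranges are kept narrow.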

#### *3.5. MLP Learning Algorithms*

The training algorithm a neural network uses to learn and solve a problem is essential. The right choice among the many available algorithms depends on several factors, including data sample size, task type, training time constraints and accuracy requirements, and it is difficult to know in advance which algorithm will produce the most satisfactory results. For predictive modelling tasks involving function approximation, the most common choices are backpropagation training algorithms, which carry out computations backwards through the network to fine-tune the weights and minimize the performance error.
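As a minimal sketch of backpropagation, the function below trains a one-hidden-layer sigmoid MLP with a linear output by plain batch gradient descent on the squared error of Equation (5), averaged over the *N* samples. This is the simplest member of the family; it is not the Levenberg–Marquardt or other MATLAB algorithms evaluated in this paper, and the synthetic data, learning rate and epoch count are illustrative assumptions.

```python
import numpy as np


def train_mlp_backprop(X, y, k_h=10, lr=0.1, epochs=2000, seed=0):
    """Batch gradient-descent backpropagation for a sigmoid-hidden, linear-output MLP.

    Minimises (1 / 2N) * sum((y_hat - y)**2); returns the weights and the MSE per epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, m = X.shape
    W1, c = rng.normal(scale=0.5, size=(m, k_h)), np.zeros(k_h)
    W2, b2 = rng.normal(scale=0.5, size=(k_h, 1)), np.zeros(1)
    y = y.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        # Forward pass
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + c)))   # hidden activations
        err = H @ W2 + b2 - y                      # prediction errors
        history.append(float(np.mean(err ** 2)))  # MSE, Equation (6)
        # Backward pass: propagate the error gradient from the output to the hidden layer
        d_out = err / n_samples
        d_hidden = (d_out @ W2.T) * H * (1.0 - H)  # sigmoid derivative F' = F(1 - F)
        # Gradient-descent weight updates
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        c -= lr * d_hidden.sum(axis=0)
    return (W1, c, W2, b2), history


# Hypothetical smooth target on inputs scaled to [-1, 1].
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(np.pi * X[:, 0])
weights, mse_history = train_mlp_backprop(X, y)
```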

#### *3.6. MLP Network Model Implementation Process*

MATLAB is a programming language and multi-paradigm numerical computing environment with a graphical user interface. It provides easy matrix manipulation, graphical multi-domain simulation, symbolic computing, function plotting, data mining and straightforward algorithm implementation. MATLAB also supports optional toolboxes; the Neural Network Toolbox provides tools for the model-based design, implementation, visualization and simulation of neural networks. In this work, MATLAB scripts were written for MLP model training, testing and quantitative evaluation, as well as for computing and assessing the conventional path loss models. The proposed MLP model has five input nodes and one output node. The flowchart for executing the proposed predictive modelling with the MLP, including training and testing, is shown in Figure 4.

As mentioned earlier, for practical and optimal application of an MLP network to predictive modelling, the right choice of learning algorithm and of network elements such as the number of neurons, the number of layers and the transfer functions is crucial. For example, a network with too few neurons in a hidden layer may fail to capture complex links between the target output and the input variables. Conversely, if too many neurons are allotted to the hidden layer, the network is likely to follow the latent noise in the dataset owing to over-parameterization, which in turn leads to poor generalization and poor predictive modelling of the original data [63,64]. Because no fixed rule gives these values, the number of hidden layers and the number of neurons in them were determined by trial and error. To keep the search tractable while still attaining optimal training and testing, the number of neurons and the number of hidden layers were restricted to 2–50 and 1–3, respectively.

Generally, if an algorithm performs well on the training data but fails to generalize, it is said to overfit. To improve generalization (i.e., to prevent overfitting) during path loss data training with each NN algorithm, we employed input/target transformations and early stopping. The input and target datasets were scaled to lie in the range [−1, 1] to speed up training and testing. In addition, early stopping was used during training and testing to avoid overtraining, reduce the influence of the initial weight values and develop robust adaptive predictive ability. Although many learning algorithms are available for MLP training and testing in MATLAB, it is difficult to identify which one works best for a given predictive modelling problem in terms of convergence speed and accuracy [16]. Therefore, an exhaustive search over all the available learning algorithms was conducted in this study to evaluate their impact on convergence and accuracy during network training. The learning algorithms assessed to develop an optimal MLP predictive model, together with their weight adaptation techniques, are listed in Table 2. The aim of weight adaptation is to determine the optimum weight update for each input-output data pair during training.
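The two generalization measures above can be sketched as follows. `minmax_scale` maps each column linearly to [−1, 1] using the same formula as MATLAB's `mapminmax`, y = (y<sub>max</sub> − y<sub>min</sub>)(x − x<sub>min</sub>)/(x<sub>max</sub> − x<sub>min</sub>) + y<sub>min</sub>, and keeps the statistics needed to map predictions back to dB. `early_stopping_epoch` is a simplified, hypothetical stand-in for validation-based early stopping; its patience of 6 mirrors what we understand to be the default number of allowed validation failures (`max_fail`) in MATLAB's training functions. Columns are assumed not to be constant.

```python
import numpy as np


def minmax_scale(x, lo=-1.0, hi=1.0):
    """Scale each column of x linearly to [lo, hi]; assumes no column is constant."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    scaled = (hi - lo) * (x - xmin) / (xmax - xmin) + lo
    return scaled, (xmin, xmax)


def minmax_inverse(scaled, xmin, xmax, lo=-1.0, hi=1.0):
    """Undo minmax_scale, e.g. to express predicted path loss in dB again."""
    return (np.asarray(scaled, dtype=float) - lo) * (xmax - xmin) / (hi - lo) + xmin


def early_stopping_epoch(val_losses, patience=6):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve for `patience`
    consecutive epochs; the weights from `best_epoch` are the ones kept.
    """
    best, best_epoch, fails = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, fails = loss, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical path loss targets (dB) and validation-loss curve.
pl_db = np.array([[118.0], [131.5], [142.0], [150.0]])
pl_scaled, (pl_min, pl_max) = minmax_scale(pl_db)
stop, best = early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1])
```

In this example the validation loss bottoms out at epoch 2, so training halts at epoch 8 and the epoch-2 weights are retained.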

**Figure 4.** Flowchart for the execution of Proposed Predictive Modelling with MLP Neural Network.


**Table 2.** MLP Network Learning Algorithms.

#### **4. Results and Discussions**

As mentioned earlier, many factors directly impact the development of a good back-propagation neural network predictive model, especially with the trial-and-error method adopted in this work. They include the training algorithm, the number of hidden layers, the number of neurons in each hidden layer, the transfer function, the momentum term, the learning rate, etc. Here, the focus is on the training algorithm, the number of hidden layers and the number of neurons in the hidden layers. To obtain the predictive path loss modelling results, we first divided the path loss data into partitions and employed the grid search-based hyperparameter tuning method to generate configurations for training and testing on the partitioned data. The performance of the proposed method was evaluated and reported for each machine learning algorithm described in Table 1, using the Mean Absolute Error (MAE), Mean Square Error (MSE), Root Mean Square Error (RMSE), Coefficient of Efficiency (COE), Correlation Coefficient (R) and Standard Deviation Error (STD) [65].
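The error statistics listed above can be computed as in the following sketch. Two definitions are assumptions, since [65] is not reproduced here: COE is taken in the Nash–Sutcliffe form 1 − Σe²/Σ(y − ȳ)², and STD is taken as the standard deviation of the absolute error about the MAE. The latter satisfies RMSE² = MAE² + STD², which is consistent with the 30-neuron figures quoted in Section 4.1 (2.71² ≈ 2² + 1.82²). The function name `error_metrics` is hypothetical.

```python
import numpy as np


def error_metrics(measured, predicted):
    """First-order error statistics between measured and predicted path loss (dB)."""
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    e = y - yhat                      # prediction error per sample
    abs_e = np.abs(e)
    mse = np.mean(e ** 2)
    return {
        "MAE": float(np.mean(abs_e)),
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        # Assumed Nash-Sutcliffe form: 1 - SSE / total sum of squares about the mean.
        "COE": float(1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)),
        # Pearson correlation between measured and predicted values.
        "R": float(np.corrcoef(y, yhat)[0, 1]),
        # Assumed definition: spread of the absolute error about the MAE.
        "STD": float(np.std(abs_e)),
    }
```

R and COE are reported in the text as percentages, so multiply those two entries by 100 for direct comparison.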

Secondly, the predictive path loss modelling was conducted for the three study locations using standard Bayesian optimization for hyperparameter tuning, and the results were compared with those obtained using the grid search-based hyperparameter tuning method.

#### *4.1. Neurons Number Impact*

Determining the number of neurons in the hidden layers is also integral to the overall neural network architecture. Too few neurons in the hidden layers can lead to underfitting, which arises when the hidden layers lack the capacity to adequately learn or detect the signals, especially in a complex dataset. On the other hand, too many neurons can lead to overfitting, which occurs when the network has more information-processing capacity than the training data can constrain. An excessive number of neurons also increases the neural network training time, potentially to the point that the network cannot be adequately trained. Clearly, a trade-off must be struck between too few and too many neurons in the hidden layers.

Accordingly, starting with the fastest training algorithm, Levenberg–Marquardt (lm), the network was trained and tested with 2, 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 neurons to ascertain the incremental impact of the neuron count on its performance. Table 3 and Figures 5 and 6 display the overall predictive performance for each neuron count using different error statistics. As expected, Table 3 shows that increasing the number of neurons per layer improves how closely the neural network prediction fits the measured loss data. At 30 neurons per layer, the neural network model fitted the measured loss data well, with an MAE of 2 dB, an RMSE of 2.71 dB, an STD of 1.82 dB, an R-value of 95% and a COE of 90%. Increasing the number of neurons to 50 yielded no further performance improvement, as seen in Figure 6.
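The neuron-count sweep can be organized as below. `train_and_score` is a hypothetical callable that trains and tests one lm network with the given number of neurons and returns its test RMSE. The rule of picking the smallest network whose RMSE lies within a tolerance `tol_db` of the best is an assumption that formalizes the "no further improvement" observation, not necessarily the authors' exact criterion.

```python
from typing import Callable, Dict, Iterable


def sweep_neurons(train_and_score: Callable[[int], float],
                  counts: Iterable[int]) -> Dict[int, float]:
    """Train/test one network per neuron count and record its test RMSE (dB)."""
    return {n: train_and_score(n) for n in counts}


def select_neuron_count(rmse_by_neurons: Dict[int, float], tol_db: float = 0.05) -> int:
    """Smallest neuron count whose RMSE is within tol_db of the best RMSE found."""
    best = min(rmse_by_neurons.values())
    return min(n for n, r in rmse_by_neurons.items() if r <= best + tol_db)


# Illustrative RMSE values only (not the values in Table 3).
toy = {2: 5.1, 10: 3.4, 30: 2.71, 50: 2.70}
print(select_neuron_count(toy))  # 30
```

Preferring the smallest adequate network keeps training time down and reduces the risk of overfitting discussed above.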


**Table 3.** Neurons Number impact on MLP Network Predictive Modelling with LM.

**Figure 5.** Overall Performance Error Statistics with MAE, STD and RMSE.

**Figure 6.** Overall Performance Error Statistics with R (%) and COE (%).

#### *4.2. Transfer Function Impact*

The transfer function is a monotonically increasing, differentiable function that transforms the weighted input signals of a neuron into its output signal. The transfer function is fundamental to neural networks for three main reasons. Firstly, without non-linear activation functions, the entire neural network reduces to an ordinary linear function that cannot learn non-linear relationships. Secondly, transfer functions shape the main computation performed by neural networks. Thirdly, transfer functions can speed up learning and help the network capture patterns in datasets. Thus, the choice of the right transfer (activation) function also influences the performance of the NN training algorithm. Table 4 lists the sigmoid transfer functions employed in this study to ascertain the stability of the proposed neural network model, together with their performance in terms of MAE, MSE, RMSE, R and STD. The standard deviation (STD) statistics with one, two and three hidden layers are given in Figure 7, and the Root Mean Square Error performance statistics with one, two and three hidden layers are shown in Figure 8.
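For reference, the two sigmoid transfer functions used in this work, logsig and tansig, and a linear output function (purelin) can be written in NumPy as below, together with a forward pass through a network with two hidden layers. The formulas follow MATLAB's definitions: logsig(n) = 1/(1 + e^(−n)) and tansig(n) = 2/(1 + e^(−2n)) − 1, which equals tanh(n). Assigning logsig to the first hidden layer and tansig to the second is an assumption about how the two functions were paired, and `mlp_forward` is a hypothetical helper.

```python
import numpy as np


def logsig(n):
    """Log-sigmoid: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))


def tansig(n):
    """Hyperbolic tangent sigmoid: maps into (-1, 1); equal to tanh(n)."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0


def purelin(n):
    """Linear transfer function, typical for a regression output layer."""
    return n


def mlp_forward(x, weights, biases, activations):
    """Propagate x (samples x inputs) through layers; weights[k] has shape (fan_in, fan_out)."""
    a = np.asarray(x, dtype=float)
    for W, b, f in zip(weights, biases, activations):
        a = f(a @ W + b)
    return a


# Hypothetical 1-input network: two hidden layers (logsig, then tansig) and a linear output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(1, 30)), rng.normal(size=(30, 30)), rng.normal(size=(30, 1))]
biases = [np.zeros(30), np.zeros(30), np.zeros(1)]
out = mlp_forward(np.linspace(-1, 1, 5).reshape(-1, 1), weights, biases, [logsig, tansig, purelin])
print(out.shape)  # (5, 1)
```

The weights here are random and untrained; the block only illustrates how the transfer functions shape each layer's output.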


**Table 4.** Transfer Functions used for network training/testing and their performance.

**Figure 7.** Standard deviation (STD) statistics with one, two and three layers of training.

**Figure 8.** Root Mean Square Error performance statistics with one, two and three layers of training.

#### *4.3. Training Algorithm and Hidden Layers Number Impact*

The learning algorithm and the number of hidden layers also significantly impact the success of the neural network in producing a robust predictive model. Therefore, deciding on the number of hidden layers is one of the most important issues when designing the neural network architecture for predictive modelling and data mining. Using too many hidden layers can result in poor generalization and complicate neural network training. According to [31], two hidden layers, combined with m output neurons, are adequate for a neural network to learn N data samples with negligibly small errors.

Here, the impact of the learning algorithms has been studied with one, two and three hidden layers. The results reveal that the two-hidden-layer network is superior to the one- and three-hidden-layer networks for all 12 learning algorithms investigated. In particular, the neural network architecture trained using lm (i.e., the Levenberg–Marquardt training algorithm) with two hidden layers and the logsig-tansig transfer function combination gave the best performance. Table 5 displays the detailed network training/testing error statistics and hidden layer numbers for each learning algorithm.

#### *4.4. Performance of Grid Search Algorithm and Bayesian Optimisation Search Algorithm for Hyperparameter Tuning*

The hyperparameter tuning process has a significant influence on neural network learning performance. Given the computational resources required, tuning effort is best concentrated on the most relevant hyperparameters, since hyperparameters with a stronger influence on the weights are more effective during neural network training. Thus, the choice of hyperparameters for neural network model training influences both training and performance.

As displayed in Table 6, the results of the proposed MLP neural network model with grid search-based hyperparameter tuning (optimization) are compared with those obtained using the traditional Bayesian optimization search-based hyperparameter tuning approach for path loss predictive analysis, using location one as a case study. For brevity, we report the results attained with the lm and br learning algorithms. While grid search-based hyperparameter tuning performs an exhaustive, stepwise search over a limited hyperparameter space specified by the user, Bayesian search-based hyperparameter tuning performs a sequential search over the hyperparameters via several trials, without requiring the user to have prior information about the hyperparameter distribution. The results in Table 6 show that the proposed MLP neural network model with grid search-based hyperparameter tuning outperforms the one obtained using Bayesian optimization search-based hyperparameter tuning. Section 4.5 next applies the proposed MLP neural network model with grid search-based hyperparameter tuning to a detailed predictive path loss analysis across the three study locations.
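A grid search of this kind can be sketched in a few lines. `evaluate` is a hypothetical callable that trains and tests one configuration and returns a validation RMSE; the grid keys and the strings `"trainlm"` and `"trainbr"` (MATLAB's Levenberg–Marquardt and Bayesian regularization training functions) serve only as labels. Unlike Bayesian optimization, which chooses each new trial using the results of earlier ones, grid search evaluates every combination, so its cost grows with the product of the grid sizes.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Exhaustively evaluate every combination in `grid` (name -> list of values).

    `evaluate` trains/tests one configuration and returns a validation RMSE;
    the configuration with the lowest score is returned with all results.
    """
    names = list(grid)
    best_cfg, best_score, results = None, float("inf"), []
    for values in product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        results.append((cfg, score))
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score, results


# Illustrative grid: 2 algorithms x 3 layer counts x 11 neuron counts = 66 trainings.
EXAMPLE_GRID = {
    "algorithm": ["trainlm", "trainbr"],
    "hidden_layers": [1, 2, 3],
    "neurons": [2] + list(range(5, 51, 5)),
}
```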


**Table 5.** Learning Algorithm and Hidden Layers Number impact on MLP Network.


**Table 6.** Comparison of Hyperparameter Tuning algorithm performance with Grid Search and Bayesian Optimisation search.

#### *4.5. Evaluation of Proposed Neural Network Model at Different Locations*

The evaluation results of the proposed neural network model at the different study locations are presented as follows. Figures 9–11 compare the proposed neural network model predictions with the measured path loss data, using the Levenberg–Marquardt training algorithm, two hidden layers and the logsig-tansig transfer function combination. The prediction performance of the developed neural network model is assessed in terms of R and MSE values. The R-value measures the correlation between the outputs (predicted loss data) and the targets (actual loss data); an R-value close to 1 indicates a strong positive correlation, whereas a value far from 1 indicates poor correlation. Figures 12–14 plot the R-values between the predicted and actual loss values for sites 1 to 3 during training, validation and testing. The R-values obtained from the plots are 0.97, 0.93 and 0.94 for site 1; 0.92, 0.93 and 0.94 for site 2; and 0.91, 0.93, 0.96 and 0.94 for site 3.

The performance plots in Figures 15–17 indicate that the MSE decreases as the number of epochs increases. Here, an epoch is one complete training/validation/testing cycle, and the epoch count is a hyperparameter that defines the number of iterations the NN algorithms undergo over the entire training duration. The test and validation errors display similar characteristics while predicting the measured loss across sites 1 to 3. Specifically, the validation MSE shows that the proposed neural network model would not generalize well, or fit the measured loss data well, if trained beyond 4, 8 and 8 epochs for sites 1, 2 and 3, respectively. The mean prediction error along the measurement data points in sites 1, 2 and 3 is presented in Figures 18–20.
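The early stopping rule behind these epoch counts can be sketched as follows: training halts once the validation error has failed to improve for `patience` consecutive epochs, and the weights from the best validation epoch are retained. The default `patience=6` mirrors, to our understanding, the default `max_fail` setting of MATLAB's training functions; the function name and the use of 0-based epoch indices are assumptions.

```python
def early_stopping_epoch(val_errors, patience=6):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation MSEs.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the weights from best_epoch are kept.
    Epochs are indexed from 0.
    """
    best_epoch, best_err, fails = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, fails = epoch, err, 0
        else:
            fails += 1
            if fails >= patience:
                return best_epoch, epoch
    # Sequence ended before the patience limit was reached.
    return best_epoch, len(val_errors) - 1


# Illustrative validation curve: minimum at epoch 4, then steadily worse.
curve = [5.0, 4.0, 3.0, 2.5, 2.4, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2]
print(early_stopping_epoch(curve))  # (4, 10)
```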

**Figure 9.** Comparison between measured loss and the prediction ANN model in site 1.

**Figure 10.** Comparison between measured loss and the prediction ANN model in site 2.

**Figure 11.** Comparison between measured loss and the prediction ANN model in site 3.

**Figure 12.** Prediction performance with correlation coefficient in site 1.

**Figure 13.** Prediction performance with correlation coefficient in site 2.

**Figure 14.** Prediction performance with correlation coefficient in site 3.

**Figure 15.** Network training cycles in site 1.

**Figure 16.** Network training cycles in site 2.

**Figure 17.** Network training cycles in site 3.

**Figure 18.** Mean prediction error statistics along the data points in site 1.

**Figure 19.** Mean prediction error statistics along the data points in site 2.

**Figure 20.** Mean prediction error statistics along the data points in site 3.

#### *4.6. Comparison of Prediction Accuracy of Proposed Neural Network Model with Log-Distance Models*

Detailed prediction capabilities of all the log-distance models and the proposed model on the measured path loss data are plotted in Figures 21–23 in terms of MAE, RMSE and STD. The graphs show that the neural network model achieved the best predictions, with the smallest errors. COST 231 (W/I) made the closest prediction to the measured loss among the log-distance models, but in terms of accuracy, the proposed neural network model achieved the best performance, by 20%, 15% and 25%, respectively, across the study sites. For example, while COST 231 (W/I) reached RMSE values of 3.34, 2.35 and 4.23 dB, the proposed neural network model attained 1.73, 2.11 and 1.45 dB across the study sites. Generally, models that predict the path loss in the tested areas with RMSEs above the acceptable limit of 6 dB are not considered suitable, although such models could be further optimized for improved performance. The closer the RMSE is to zero, the better the model. Regarding standard deviation error, COST 231 (W/I) achieved 1.73, 2.11 and 1.45 dB, while the proposed neural network model achieved 1.73, 2.11 and 1.45 dB, respectively. The poor predictions made by the log-distance-based models can be ascribed to dissimilarities and variations in environmental formations (hilly, mountainous or quasi-plain), weather conditions, soil electrical properties and terrain types across different radio propagation environments.

**Figure 21.** A comparison of mean absolute error statistics between the proposed ANN model and log-distance models on measured path loss in site locations 1, 2 and 3.

**Figure 22.** A comparison of root mean square error statistics between the proposed ANN model and log-distance models on measured path loss in site locations 1, 2 and 3.

**Figure 23.** A comparison of standard deviation error statistics between the proposed ANN and log-distance models on measured path loss in site locations 1, 2 and 3.

#### **5. Conclusions**

The growing demand for mobile and fixed cellular telecommunication services has placed substantial pressure on the limited available radio frequency spectrum. Proper modelling and precise signal coverage prediction are crucial to utilizing this scarce resource effectively. Reliable predictive modelling of signal path loss helps control the load on base station transmitters and assists in designing efficient radio network channels with fewer interference and coverage hole problems. The conventional log-distance-based statistical models for path loss prediction, comprising the clustering factor, COST 231 Hata, free space, Hata and Lee models, etc., are generally limited in predicting signal attenuation losses, especially when employed in environments other than those for which they were designed.

The main objective of this paper was to develop a distinctive MLP-based path loss model with a well-structured network architecture, empowered with the grid search-based hyperparameter tuning method, for optimal path loss approximation between the mobile station and the base station. The prediction accuracy gain of the developed MLP network model over eight conventional log-distance-based path loss models is also quantified using first-order statistics. In summary, this research paper has revealed that:


Future work will consider more hyperparameter selection techniques to optimize MLP model prediction accuracy during NN training. We also intend to explore the greater learning capacity of deep neural networks, such as the long short-term memory (LSTM) network model, for predictive modelling of path loss data.

**Author Contributions:** The manuscript was written through the contributions of all authors. J.I. was responsible for the conceptualization of the topic; article gathering and sorting were carried out by J.I., A.L.I. and S.O.; manuscript writing, original drafting and formal analysis were carried out by J.I., A.L.I., S.O., O.K., Y.K., C.-C.L. and C.-T.L.; writing of reviews and editing was carried out by J.I., A.L.I., S.O., O.K., Y.K., C.-C.L. and C.-T.L.; and J.I. led the overall research activity. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2020R1G1A1099559).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data that support the findings of this paper are available from the corresponding author upon reasonable request.

**Acknowledgments:** The work of Agbotiname Lucky Imoize is supported in part by the Nigerian Petroleum Technology Development Fund (PTDF) and in part by the German Academic Exchange Service (DAAD) through the Nigerian–German Postgraduate Program under grant 57473408.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

