outlier value excluded; ND = no data available.

Similarly, the prediction of tebuthiuron was quite accurate, with a mean prediction error of just +0.3 mg/g. The prediction errors ranged between −7 and +5 mg/g. As shown in Figure 24, the prediction results were relatively linear (R²pred = 0.63), with an RMSEP of 3.8 mg/g. Again, this was well within the expected range of error associated with the HPLC reference method (7 mg/g).

**Figure 24.** The predicted vs. measured tebuthiuron content of the Regain 400 samples in the independent test set (*n* = 12).
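For reference, the two summary statistics quoted above (mean prediction error, i.e., bias, and RMSEP) can be computed from paired NIRS predictions and HPLC reference values as in the sketch below; the numbers are illustrative placeholders, not data from this study.

```python
import numpy as np

# Illustrative values only -- not the study's data.
measured = np.array([52.0, 48.5, 55.1, 50.3])   # HPLC reference values (mg/g)
predicted = np.array([51.6, 49.0, 54.2, 51.1])  # NIRS model predictions (mg/g)

bias = (predicted - measured).mean()                   # mean prediction error
rmsep = np.sqrt(((predicted - measured) ** 2).mean())  # root mean square error of prediction
print(f"bias = {bias:+.2f} mg/g, RMSEP = {rmsep:.2f} mg/g")
```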

It should be noted that the single Regain 200 sample was excluded from the tebuthiuron prediction results, as the model had only been trained on Regain 400 samples; therefore, we could not predict the tebuthiuron content of the Regain 200 sample with acceptable accuracy.

#### **4. Conclusions**

The results from this study suggest that NIRS is quite accurate for the rapid prediction of moisture content and moderately accurate for the prediction of tebuthiuron content. Handheld and even benchtop NIR devices could allow not only for rapid quality control, but also for improvement of the manufacturing process. This form of rapid, on-site testing with sufficient accuracy could allow sources of process variation to be isolated and could guide targeted efforts to minimize their effects. This is particularly important, as unwanted process variations can be costly, cause manufacturing downtime, or indicate phenomena that reduce plant reliability and performance. Overall, the results found here support the use of the handheld MicroNIR instrument for future studies and potential real-time implementation. Furthermore, the use of a larger calibration set is likely to moderately improve the prediction accuracy of the tebuthiuron model.

**Author Contributions:** Conceptualization, J.B.J., M.I. and M.N.; methodology, J.B.J.; software, J.B.J.; validation, J.B.J.; formal analysis, J.B.J.; investigation, J.B.J.; resources, J.B.J., M.N. and H.F.; data curation, J.B.J.; writing—original draft preparation, J.B.J.; writing—review and editing, J.B.J., H.F., M.I. and M.N.; visualization, J.B.J.; supervision, M.N.; project administration, J.B.J.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** Thanks to Kerry Walsh for providing access to the NIR instrumentation.

**Conflicts of Interest:** Hugh Farquhar is a current employee and Mansel Ismay is a past employee of Cirrus Ag, the manufacturer of Regain™. Aside from supplying the samples, Cirrus Ag had no role in the collection, analysis or interpretation of data.

#### **References**


### *Article* **On the "Thixotropic" Behavior of Fresh Cement Pastes**

**Youssef El Bitouri \* and Nathalie Azéma**

Laboratoire de Mécanique et Génie Civil, LMGC, IMT Mines Ales, University of Montpellier, CNRS, 34000 Montpellier, France

**\*** Correspondence: youssef.elbitouri@mines-ales.fr; Tel.: +33-4-66-78-53-67

**Abstract:** Thixotropic behavior describes a time-dependent rheological behavior characterized by reversible changes. Fresh cementitious materials often require thixotropic behavior to ensure sufficient workability and proper casting without vibration. Non-thixotropic behavior induces a workability loss. Cementitious materials cannot be considered ideal thixotropic materials due to cement hydration, which leads to irreversible changes. However, in some cases, cement paste may demonstrate thixotropic behavior during the dormant period of cement hydration. The aim of this work is to propose an approach able to quantify the contribution of cement hydration during the dormant period and to examine the conditions under which the cement paste may display thixotropic behavior. The proposed approach consists of a succession of stress growth procedures that allow the static yield stress to be measured. For an inert material, such as a calcite suspension, the structural build-up is due to the flocculation induced by attractive Van der Waals forces. This structural build-up is reversible. For cement paste, there is a significant increase in the static yield stress due to cement hydration. The addition of superplasticizer allows the thixotropic behavior to be maintained during the first hours due to its retarding effect. However, an increase in the superplasticizer dosage leads to a decrease in the magnitude of the Van der Waals forces, which can erase the thixotropic behavior.

**Keywords:** thixotropy; yield stress; cement paste; hydration; superplasticizer

**1. Introduction**

In rheology, thixotropy characterizes a time-dependent behavior [1–3]. This phenomenon, which is generally characteristic of flocculated suspensions, reflects the progressive breakdown (under a constant shear rate) of the structure formed at rest. The rheograms (shear stress as a function of shear rate) of thixotropic materials generally display a hysteresis loop. This evolution of the rheological behavior is reversible since the structural build-up occurs if the material is left at rest.

For cementitious materials, thixotropy was used to ensure proper casting and workability, especially for self-compacting concretes or printable concretes [4,5]. In addition, it allows the maintenance of workability and fluidity to be evaluated [6,7], which is very important from a practical point of view.

During the dormant or low activity period of cement hydration, the rheological behavior of cement pastes is often considered to be reversible. However, it appears that the initial structure can never be fully restored, even during this dormant period [6,8–10]. This is why cement pastes cannot be considered as typically thixotropic materials. In fact, due to the chemical evolution induced by the initial hydration reactions, the structural build-up (or breakdown) is not reversible. Roussel et al. [1] found that the structural build-up of cement pastes may be due to two origins: colloidal interactions between cement particles, which are reversible (thixotropy), and early hydrates, which form preferentially at the contact points between cement grains (irreversible). It can be noted that the irreversible changes in fresh cement paste structures can affect workability in time. This permanent change is thus defined as workability loss.

**Citation:** El Bitouri, Y.; Azéma, N. On the "Thixotropic" Behavior of Fresh Cement Pastes. *Eng* **2022**, *3*, 677–692. https://doi.org/10.3390/eng3040046

Academic Editors: Antonio Gil Bravo and F. Pacheco Torgal

Received: 16 November 2022 Accepted: 14 December 2022 Published: 14 December 2022


Furthermore, the addition of a superplasticizer decreases the contribution of hydration on the structural build-up when the cement paste is left at rest [11] and thus contributes to the decrease in the workability loss. The effect of the superplasticizer can be explained by the retarding effect.

The assessment of the contributions of the reversible flocculation (thixotropy) and the irreversible chemical evolution to the structural build-up is a very interesting challenge. Different methods based on rheological measurements have been developed to assess these contributions. One of these approaches consists of determining the evolution of the shear stress as a function of an ascending and then descending shear rate. The hysteresis loop, i.e., the area between the up and down curves, is an indicator of thixotropy [12–14]. Another relevant approach to assess the structural build-up is to use oscillatory measurements, such as small amplitude oscillatory shear (SAOS), which allow measurement of the viscoelastic properties of suspensions (storage modulus G′; loss modulus G″) within the linear viscoelastic region [1,15–18].

Another method consists of determining the evolution of the static yield stress (the minimum stress that induces flow) using a stress growth procedure [8,10,19–21]. The slope (Athix) of the static yield stress versus resting time curve is a relevant indicator of the structural build-up. In the literature, this slope represents the flocculation rate and describes the reversible part of the structural build-up (thixotropy) due to particle flocculation. Its order of magnitude is between 0.1 and 1.7 Pa/s [1,15].
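As a minimal illustration of how this slope can be extracted, the sketch below fits a line to a few (resting time, static yield stress) pairs with NumPy; the values are illustrative, not measurements from this study.

```python
import numpy as np

# Resting times (s) and static yield stresses (Pa) -- illustrative values only.
rest_time = np.array([120.0, 600.0, 1200.0, 2400.0])
yield_stress = np.array([80.0, 210.0, 380.0, 660.0])

# The structuration rate is the slope of the static yield stress vs. time curve.
slope, intercept = np.polyfit(rest_time, yield_stress, 1)
print(f"A_thix ~ {slope:.2f} Pa/s")
```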

Furthermore, the contribution of the chemical evolution during the dormant period is generally neglected, and the application of a strong shearing or remixing is considered sufficient for erasing the structural build-up. However, it appears that the structural build-up during the dormant period is not fully reversible. Recently, by using oscillatory measurements, Zhang et al. [15] found that the irreversible part of the structural build-up cannot be neglected and suggested that the structural build-up can be quantified by Astruct, which is the sum of the thixotropic part and the chemical part:

$$A_{\text{struct}} = A_{\text{thix}} + A_{\text{chem}} \tag{1}$$

However, Zhang et al. did not explicitly quantify Astruct. Inferred from their results, Astruct is about 0.07 Pa/s, Athix is about 0.06 Pa/s, and Achem is about 0.01 Pa/s.

The aim of this study is to propose another approach based on the static yield stress measurement able to quantify the contribution of the chemical evolution to the structural build-up during the dormant period of cement hydration. This method is tested on ordinary Portland cement and calcite. The effect of a superplasticizer on the structural build-up is examined.

#### **2. Materials**

In this study, an ordinary Portland cement (CEM I 52.5 R CE CP2 NF) provided by Lafarge Holcim is used. This cement is composed of clinker (95%) and gypsum (5%). Its specific (Blaine) surface is 4420 cm²/g, and its density is about 3.14 g/cm³. In addition to cement, an inert calcium carbonate (calcite) provided by Omya BL is used. Its density is about 2.75 g/cm³, and its BET specific surface is 2.25 m²/g. Calcite is commonly used as a model material to mimic the behavior of complex cementitious materials during the dormant period [16,22–24].

The particle size distributions of the cement and calcite are determined in water using a laser granulometer (LS 13320) from Beckman Coulter Company with an adapted optical model (Figure 1). The physical properties are summarized in Table 1.

**Figure 1.** Particle size distributions of cement and calcite.

**Table 1.** Physical properties of cement and calcite.


A commercial polycarboxylate-based superplasticizer (PCE) from Masters Builders with an equivalent dry extract content of 19.5 wt% is used.

The cement and calcite pastes were mixed with deionized water at a water-to-solid ratio (E/C) of 0.4 in a planetary agitator according to the following sequence: 5 min of mixing at 500 rpm, 30 s of scraping the mixer walls, and 1 min of mixing at 1000 rpm. Two dosages of superplasticizer were used: 0.05 and 0.1 wt% of dry substance. The superplasticizer was added in a delayed manner after 5 min of mixing.

The sample preparations are performed at ambient temperature (20 ± 2 °C).

#### **3. Methods**

#### *3.1. Rheological Measurements*

The rheological measurements were carried out using a rotational rheometer AR2000Ex from TA Instruments equipped with a four-blade vane geometry. The internal diameter of this geometry is 28 mm, and the outer cup diameter is 30 mm. The resulting gap is 1 mm. The geometry constants were calibrated using the Couette analogy suggested by Aït-Kadi et al. [25].

The proposed testing method consists of a succession of stress growth measurements [19,26,27] with different resting times, as shown in Tables 2 and 3. The testing procedures begin with a strong pre-shear (100 s⁻¹) to homogenize the paste in the rheometer cup, followed by a resting time (10 min, 20 min, and 40 min). Then, stress growth measurements (1, 2, 3, and 4) are applied to the paste. Procedure 1 measures the rate of increase in the static yield stress due to the total structural build-up (reversible thixotropy and irreversible chemical evolution), while Procedure 2 (Table 3) erases the reversible part of the structural build-up via the application of a strong pre-shear before the stress growth measurements.


**Table 2.** The proposed testing method for the total structural build-up (Procedure 1).

**Table 3.** The proposed testing method for the chemical structural build-up (Procedure 2).


The stress growth experiment consists of measuring the shear stress evolution under a very low constant shear rate (0.01 s⁻¹). The typical stress growth curve (Figure 2; see also Figures A1 and A2 in Appendix A) displays two domains: a first domain, in which the shear stress increases almost linearly with the strain until it reaches a peak, followed by a second domain (plateau) representing the steady-state flow. The peak defines the static yield stress, which is the minimum stress required to induce the first evidence of flow. The static yield stress originates from interparticle forces and direct contacts [28], and it constitutes a relevant parameter for examining the workability of cementitious materials.
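As a small sketch of this read-out (assuming the recorded curve is available as arrays), the static yield stress is simply the peak of the measured stress signal:

```python
import numpy as np

def static_yield_stress(strain, stress):
    """Return the peak stress of a stress growth curve and the strain at which
    it occurs; the peak marks the onset of flow (cf. Figure 2)."""
    stress = np.asarray(stress, dtype=float)
    i = int(np.argmax(stress))
    return stress[i], np.asarray(strain, dtype=float)[i]
```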

The experiments are carried out in triplicate, and the average values with their standard deviation are represented.

**Figure 2.** Typical evolution of shear stress under a low constant shear rate.

#### *3.2. Isothermal Calorimetry*

The addition of the superplasticizer leads to a retarding effect. To assess this effect, the hydration heat flow of the cement pastes is explored with a TAM Air isothermal calorimeter from TA Instruments. Pastes are prepared by external mixing at w/c = 0.4 and then introduced into the device. The calorimeter measures the difference in the heat flow between 5 g of cement paste and a reference (deionized water) at 25 °C.

#### **4. Results and Discussion**

#### *4.1. Thixotropy vs. Non-Reversible Structural Build-Up*

Thixotropy describes a time-dependent rheological behavior with reversible changes [2]. In fact, when a thixotropic material is left at rest for a long time, its yield stress (or viscosity) gradually increases. Shearing or mixing then makes it possible to recover the initial yield stress (or viscosity). At rest, there is a reversible structural build-up (flocculation) leading to an increase in the yield stress (or viscosity), whereas, under shearing, the structure formed at rest is broken (deflocculation).

For chemically inert colloidal suspensions, the structural build-up is almost reversible and is due to physicochemical interparticle interactions that lead to reversible agglomeration/dispersion phenomena. For cementitious materials, due to the chemical changes induced by hydration reactions, the structural build-up is not completely reversible; this is why they cannot be considered thixotropic materials. A part of the structural build-up induces permanent changes in fresh cement paste.

In order to assess the contribution of hydration reactions to the structural build-up, the proposed approach based on a succession of stress growth procedures is performed (Tables 2 and 3). The behavior of an inert carbonate calcium (calcite) suspension is compared to that of cement paste. The evolution of the static yield stress as a function of time is shown in Figure 3 for calcite and Figure 4 for cement paste.

**Figure 3.** Evolution of the yield stress of the calcite suspension.

**Figure 4.** Evolution of the yield stress of the cement paste.

For the calcite suspension, the application of a strong pre-shear before the stress growth measurements allows for the erasure of the structural build-up, as shown in Figure 3. Without pre-shearing (Procedure 1), the yield stress increases with time, which is characteristic of a structural build-up at rest. In contrast, the static yield stress remains almost constant when shearing is applied (Procedure 2). After the first pre-shear (Table 3), the suspension is left at rest for 2 min, and an initial static yield stress on the order of 160 Pa is measured. A second pre-shear is applied to the suspension in order to break down the structure formed at rest, and the suspension is then left at rest again for 10 min. The second static yield stress is about 155 Pa. After resting times of 20 min and 40 min, the calcite suspension successively exhibits static yield stresses of about 156 and 143 Pa. It thus appears that a thixotropic material, such as the calcite suspension, displays a constant static yield stress that is not time dependent, since the structure formed at rest is broken by the application of a strong pre-shear. The calcite suspension therefore represents a reference for the reversible part of the structural build-up, since no chemical reactions occur.

In contrast, the static yield stress of the cement paste increases even with the application of a strong pre-shear able to erase the reversible structural build-up. After 2 min of resting, the cement paste displays a static yield stress of 79 Pa. This static yield stress increases almost linearly with time to reach 661 Pa after 40 min of resting, despite the strong pre-shear applied. This shows the irreversible nature of the structural build-up during the dormant period of cement hydration. As observed for the calcite suspension, if the cement paste behaved as a thixotropic material, the static yield stress would remain constant with the application of the pre-shear. Figure 4 shows a significant increase in the static yield stress, which demonstrates that the irreversible structural build-up during the dormant period cannot be considered negligible.

As suggested by Zhang et al. [15], it thus appears that the structural build-up in the cement paste is the sum of a reversible part (thixotropy) and a chemical part (early age hydration). Procedure 1 provides the total structural build-up (Astruct), while Procedure 2 allows for the evaluation of the contribution of the chemical evolution (Achem). For a thixotropic material such as calcite, Astruct is equal to Athix and ranges from 0.14 to 0.22 Pa/s.

#### *4.2. Effect of the Superplasticizer on the Structural Build-Up*

Superplasticizers are usually used to improve the workability of cementitious materials [29,30]. They adsorb onto cement particles and act by electro-steric repulsion to enhance their dispersion [31–34]. This leads to the release of water trapped between agglomerated particles. Cement paste without a superplasticizer commonly exhibits a shear-thinning behavior (i.e., a non-linear behavior with a viscosity that decreases with the shear rate), while, with the addition of a superplasticizer, the rheological behavior becomes Newtonian. In addition, the yield stress decreases with the superplasticizer dosage.

In addition to their dispersive action, superplasticizers are known to retard cement hydration [35]. The retarding effect increases with the superplasticizer dosage [36,37]. Thus, the irreversible structural build-up during the dormant period is expected to be lower than that of the cement paste without a superplasticizer.

The approach presented in Tables 2 and 3 is applied to examine the effect of the superplasticizer on the structural build-up during the first 2 h of cement hydration. The results are presented in Figure 5.

First, it can be noted that the addition of the superplasticizer leads to a decrease in the static yield stress. This effect can be explained by the dispersive action. In fact, the superplasticizer allows for the deflocculation of the cement particles via the decrease in attractive Van der Waals forces, which leads to a decrease in the yield stress.

In contrast to the cement paste without a superplasticizer, the static yield stress of the cement paste with a superplasticizer remains almost constant during the first hour of cement hydration when a strong pre-shear is applied (Procedure 2). This indicates that the contribution of the chemical part to the structural build-up is negligible during this period. This effect may be due to the retardation induced by the presence of the superplasticizer, as shown in Figure 6. The static yield stress then starts increasing. Thus, the cement paste with the superplasticizer can be considered a thixotropic material during the first hour (or more, depending on the superplasticizer dosage). After this period, the contribution of cement hydration to the structural build-up cannot be neglected.

**Figure 5.** Effect of the superplasticizer dosage on the evolution of the static yield stress.

Thus, it appears that the addition of the superplasticizer affects the structural build-up during the dormant period since the superplasticizer induces a retarding effect. In this case, the contribution of cement hydration during the dormant period can be neglected. The cement paste thus behaves similarly to a thixotropic material with reversible changes. The use of the superplasticizer allows for a reduction in the workability loss during the dormant period.

The rheological procedures proposed in this work allow for quantification of the contribution of the structural build-up during the dormant period of cement hydration through the slope of the static yield stress–time curve (a derivative of the curve). As shown in Figure 7, the contribution of the irreversible chemical part (Achem) is almost constant for the cement paste without the superplasticizer. The contribution of the reversible part (Athix) increases with the resting time. In fact, when the cement paste is left at rest, attractive Van der Waals forces lead to a reversible structuration. For the cement paste with 0.05% SP, both the Athix and Achem increase with time. However, due to the retarding effect (Figure 6), during the first 40 min, this paste behaves similarly to a thixotropic material since the contribution of the chemical part is negligible. Furthermore, for the cement paste with 0.1% SP, the contribution of the chemical part increases with time, while the thixotropic part remains negligible. In fact, an increase in the SP dosage leads to a decrease in the magnitude of the attractive Van der Waals forces [38,39], which are no longer able to contribute to the structuration of the cement paste at rest.

**Figure 6.** Evolution of the heat flow due to cement hydration. (**a**) retarding effect on the main peak; (**b**) retarding effect on the dormant period.

**Figure 7.** Evolution of the contribution of the structural build-up: (**a**) cement paste; (**b**) cement paste with 0.05% SP; (**c**) cement paste with 0.1% SP.

The approach proposed in this study is thus able to quantify the contributions of the thixotropic part and the chemical changes to the structural build-up of cementitious materials at rest. In addition, it allows the effect of the superplasticizer during the dormant period to be examined and quantified. Very few techniques make it possible to quantify the effect of the chemical changes during the dormant period. Isothermal calorimetry allows cement hydration to be followed via an indirect method (heat flow, Figure 6a), but it does not detect significant differences during the dormant period, except for its extension (Figure 6b). Combining the calorimetric data with in situ XRD patterns may detect changes during the early stage of cement hydration [40]. The proposed procedure can complement the chemical data by quantifying the rheological effect of the chemical evolution during the first hours of cement hydration.

#### **5. Conclusions**

In this work, an approach has been proposed to quantify the contribution of the structural build-up during the dormant period of cement hydration.

This approach has been validated on an inert thixotropic calcite suspension in which the structural build-up is almost reversible. Then, the thixotropic behavior of a cement paste without a superplasticizer has been examined. It appears that the contribution of cement hydration during the so-called "dormant period" cannot be considered negligible. In fact, there is a significant increase in the static yield stress during this period despite the strong pre-shear performed. This increase in the static yield stress describes a loss of workability that can be detrimental from a practical point of view.

Furthermore, the proposed approach allowed the effect of the superplasticizer to be investigated. It seems that the cement paste with the superplasticizer behaved similarly to a thixotropic material during the first hours of cement hydration. The structural build-up during this period can be considered reversible. Beyond this period, there are permanent changes characterized by a significant increase in the static yield stress. In addition, an increase in the superplasticizer dosage leads to a decrease in the magnitude of the attractive Van der Waals forces, which can erase the thixotropic structural build-up.

The proposed approach can thus be applied to complement the chemical data provided by other techniques, such as in situ XRD patterns and calorimetric data, in examining the chemical changes during the dormant period.

**Author Contributions:** Y.E.B.: conceptualization, methodology, validation, investigation, writing original draft preparation. N.A.: validation, conceptualization. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Figure A1.** Example of the yield stress measurements of calcite (**a**), cement paste (**b**), cement paste with 0.05% SP (**c**), and cement paste with 0.1% SP (**d**) during Procedure 2.


**Figure A2.** Example of the yield stress measurements of calcite (**a**), cement paste (**b**), cement paste with 0.05% SP (**c**), and cement paste with 0.1% SP (**d**) during Procedure 1.

#### **References**


### *Article* **Real-Time Human Authentication System Based on Iris Recognition**

**Huma Hafeez 1, Muhammad Naeem Zafar 2, Ch Asad Abbas 1,3, Hassan Elahi 4,5,\* and Muhammad Osama Ali 5**

- National University of Science and Technology, Islamabad 44000, Pakistan

**Abstract:** Biometrics deals with the recognition of humans based on their unique physical characteristics. It can be based on face identification, iris, fingerprint and DNA. In this paper, we have considered the iris as a source of biometric verification, as it is a unique part of the eye that can never be altered and remains the same throughout the life of an individual. We have proposed an improved iris recognition system including image registration as a main step, as well as an edge detection method for feature extraction. A PCA-based method is also proposed as an independent iris recognition method based on a similarity score. Experiments conducted using our own developed database demonstrate that the first proposed system reduced the computation time to 6.56 s and improved the accuracy to 99.73%, while the PCA-based method has lower accuracy than this system.

**Keywords:** biometrics; iris recognition; security system; image processing; pattern recognition; iris image acquisition; image registration; PCA

#### **1. Introduction**

The classical human identification system was based on physical keys, passwords or ID cards, etc., which can be lost or forgotten easily, while the modern identification system is based on distinct and unique traits, i.e., physical or behavioral characteristics. The human eye has a very sensitive part named the iris, which has a unique pattern in every individual. The iris has a thin structure, and it has a sphincter muscle lying between the sclera and the pupil of the human eye. It is, in effect, a living password that can never be altered. Although fingerprints, face and voice recognition have also been widely used as proofs of identity [1], the iris pattern is more reliable, non-invasive and has a higher recognition accuracy rate [2–4]. The iris pattern does not change significantly throughout a human's life, and even the left and right eyes have different iris patterns [5]. Every eye has its own iris features with a very high degree of freedom [6]. These are some benefits of iris recognition, which make it better than other recognition systems [7]. The field began in 1936, when Dr. Frank Burch proposed the innovative idea of using iris patterns as a method to recognize an individual. In 1995, the first commercial product was made available [8]. Nowadays, iris recognition is extensively applied in many corporations for identification, such as in security systems, immigration systems, border control systems, attendance systems, and many more [9].

The iris recognition framework is divided into four sections: iris segmentation, iris normalization, iris feature extraction and matching [10]. Daugman proposed the method of capturing the image at a very close range using a camera and a point light source [2,11].

**Citation:** Hafeez, H.; Zafar, M.N.; Abbas, C.A.; Elahi, H.; Ali, M.O. Real-Time Human Authentication System Based on Iris Recognition. *Eng* **2022**, *3*, 693–708. https:// doi.org/10.3390/eng3040047

Academic Editor: Antonio Gil Bravo

Received: 4 October 2022 Accepted: 24 November 2022 Published: 15 December 2022


After an iris image has been captured, a series of integro-differential operators can be used for its segmentation [12]. In [13], the author proposed that the active contour method is better than fixed shape modelling for describing the inner and outer boundaries of the iris [14]. Wildes proposed the localization of the iris boundaries using the Hough transform and represented the iris pattern via a Laplacian pyramid, using normalized correlation to find the goodness of matching between two iris patterns [15]. Boles et al. proposed a wavelet transform zero-crossing representation for a finer approximation of the iris features at different resolution levels, and an average dissimilarity at each resolution level was calculated to determine the overall dissimilarity between two irises [16].

Zhu et al. used multi-scale texture analysis for global feature extraction, 2D texture analysis for feature extraction and a weighted Euclidean distance classifier for iris matching [17]. Daouk et al. used the Hough transform and the Canny edge detector to detect the inner and outer boundaries of an iris [18]. Tan et al. proposed the iterative pulling and pushing method for the localization of the iris boundaries, and they used key local variations and ordinal measure encoding to represent the iris pattern [18–21]. Patil et al. proposed the lifting wavelet scheme for the iris features [22]. Sundaram et al. used the circular Hough transform for iris localization, the 2D Haar wavelet and the Grey Level Co-occurrence Matrix (GLCM) to describe the iris features, and a Probabilistic Neural Network (PNN) for matching the computed iris features [23].

Shin et al. proposed pre-classification based on the left or right eye and the color information of the iris, and they then authenticated the iris by comparing the texture information in terms of binary code [24]. The authors of [25,26] proposed Contrast Limited Adaptive Histogram Equalization (CLAHE) to remove noise and occlusions from the image, and they used SURF (Speeded Up Robust Features)-based descriptors for the feature extraction. Jamaludin et al. proposed the improved Chan–Vese active contour method for iris localization and the 1D log-Gabor filter for feature extraction for non-ideal iris recognition [9]. Kamble et al. proposed their own developed database and used Fourier descriptors to make an image quality assessment [27]. Dua et al. proposed an integro-differential operator and the Hough transform for segmentation, a 1D log-wavelet filter for feature encoding and a radial basis function neural network (RBFNN) for classification and matching [28]. This algorithm presented a very high precision value, but it involved massive calculations that increased the computation time, and the method was not tested practically on humans or animals. So, in order to improve the recognition efficiency and reduce the computation time, we summarize the state of the art in the domain of iris recognition and propose an improved iris recognition system based on image registration along with feature extraction, which employs the physiological characteristics of the human eye. We propose two different methods, i.e., a classical image processing-based method and a PCA-based method, to determine which better handles noisy conditions, illumination problems and camera-to-face distance problems in the real-time implementation of the systems.

In this paper, Section 2 describes the steps of the proposed iris recognition method, in which we have added image registration (to align the image) as a compulsory step to reduce the FAR (False Acceptance Rate) and FRR (False Rejection Rate). Section 3 describes principal component analysis, which is our own proposed method to describe the iris texture in terms of its eigenvectors, eigenvalues and a similarity score. Section 4 describes the evaluation, and the last section concludes the paper.

#### **2. Proposed System**

The proposed system based on iris recognition consists of six main steps: data acquisition, pre-processing, image registration, segmentation, feature extraction and matching. The principal component analysis method is the second proposed method for iris recognition. Both methods are proposed with low computation time and a high level of accuracy in mind.

#### *2.1. Data Acquisition*

The data in our case are based on real-time images, while generally, pre-captured images are used. Data acquisition offers two options: using images that are already available for testing the system, i.e., the CASIA database, or using images taken directly from a camera. The iris pattern is only visible under infrared illumination, so an ordinary camera cannot be used for this purpose. The real-time implementation of the system proves the system's effectiveness, so we have used our own database based on real-time images taken instantly with an IR-based iris scanner. The core characteristics of the iris scanner are auto-iris-detection, auto-capturing and auto-storage to an assigned directory. The specifications of the iris scanner include a monocular IR camera, a capture distance of 4.7 cm to 5.3 cm, a capturing speed of 1 s, image dimensions of 640 × 480 (grayscale) and BMP as the storage format.

The iris database created using this scanner consists of 454 images of 43 persons. For 12 persons, we captured ten images of the right eye and five images of the left eye. For 15 persons, we captured five images of the right eye and five images of the left eye, and for 16 persons, we captured two images of the left eye and two images of the right eye. All of the images were captured at different angles and different iris locations.

#### *2.2. Pre-Processing*

After loading the images, the next step was pre-processing. This mainly involved RGB-to-grayscale conversion, contrast adjustment, brightness adjustment and deblurring. Figure 1 shows four right eye images of the same person taken using the iris scanner. As our images are already in grayscale, there was no need to conduct the grayscale conversion. The iris scanner took images after focusing, so there was very little chance that an image would be blurred; still, we used a weighted average filter to perform deblurring.

**Figure 1.** Four right eye images of one person.

In a weighted average filter, the pixels near the center of the neighborhood are given higher weights: the center values of the mask are multiplied by the highest coefficients, so the central pixels contribute most to the output. The weighted average filter for an image of size *M* × *N* with a filter of size *m* × *n* is given by Equation (1).

$$g(x, y) = \frac{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x + s, y + t)}{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)} \tag{1}$$

where *w*(*s*, *t*) is the weight, *f*(*x* + *s*, *y* + *t*) is the input image, for which *x* = 0, 1, 2, . . . , *M* − 1 and *y* = 0, 1, 2, . . . , *N* − 1, and *g*(*x*, *y*) is the output image. This results in an image with better intensities in the iris portion, as shown in Figure 2. During the iris recognition process, the pre-processing steps can be reapplied whenever required.

**Figure 2.** (**a**) Original Image. (**b**) Output of weighted average filter image.
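A minimal sketch of this filtering step follows, assuming the image is held as a NumPy array; the 3 × 3 mask is an illustrative choice with dominant center coefficients, not the exact mask used in the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def weighted_average_filter(image, mask):
    """Eq. (1): normalized weighted average of each pixel's neighborhood."""
    mask = np.asarray(mask, dtype=float)
    return convolve(image.astype(float), mask / mask.sum(), mode="nearest")

# A mask whose center coefficients dominate, as described in the text.
mask = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]])
```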

#### *2.3. Image Registration*

Image registration is a process in image processing that overlays two or more images of the same scene, taken by different sensors or at different angles, to align them geometrically for analysis and to reduce problems of misalignment, rotation and scale. If the angle of the iris changes (i.e., the person holds their eye near the scanner at a different angle or position), then the iris pattern at a given position will differ from that in the other image, which can cause a mismatch. As can be seen in Figure 1, the images were taken at different positions as well as angles, so we needed to perform image registration. If the new image *In*(*x*, *y*) is rotated or tilted at any angle, it is compared with the sample image *Is*(*x*, *y*) and automatically rotated to the ideal position. There are different image registration processes based on point mapping, multimodal configurations and feature matching. Corresponding points are detected in both images to find whether they are at the same angle and position, and if they are not, the images are aligned by adjusting these points. When choosing a mapping function (*u*(*x*, *y*), *v*(*x*, *y*)) to transform the original coordinates, the intensity values of the new image are made close to those of the corresponding points in the sample image. The mapping function must minimize Equation (2):

$$\min_{u,v} \int_{x} \int_{y} \left( I_{s}(x, y) - I_{n}(x - u, y - v) \right)^{2} dx\, dy \tag{2}$$

It is constrained to capture a similarity transformation of the image coordinates from (*x*, *y*) to (*x*′, *y*′), as shown in Equation (3):

$$
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} - sR(\phi) \begin{pmatrix} x \\ y \end{pmatrix} \tag{3}
$$

where *s* is a scaling factor and *R*(*φ*) is the rotation matrix for the angle *φ*. In practice, given a pair of iris images *In* and *Is*, the warping parameters *s* and *φ* are recovered via an iterative minimization procedure. The output of image registration is shown in Figure 3, and the image registration steps are given in Algorithm 1.

**Algorithm 1:** Image Registration.

Step 1: Read sample iris image and new (i.e., tilted or rotated) grayscale eye image.

Step 2: Detect surface features of both images.

Step 3: Extract features from both images.

Step 4: Find the matching features using Equation (2).

Step 5: Retrieve location of corresponding points for both images using Equation (3).

Step 6: Find a transformation corresponding to the matching point pairs using M-estimator Sample Consensus (MSAC) algorithm.

Step 7: Use a geometric transform to recover the scale and angle of the new image relative to the sample image. Let sc = scale ∗ cos(theta) and ss = scale ∗ sin(theta); then: Tinv = [sc −ss 0; ss sc 0; tx ty 1]

where tx and ty are the x and y translations of the new image relative to the sample image, respectively.

Step 8: Make the size of the new image the same as that of the sample image and display both in the same frame.


**Figure 3.** Image Registration. (**a**) Sample Image. (**b**) New Image. (**c**) Matching Points Between both images. (**d**) Rotated/Registered Image.
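The following is a rough sketch of Algorithm 1 in Python, substituting OpenCV's ORB features and a RANSAC-estimated similarity transform for the SURF detection and MSAC estimation described above (an assumption for illustration, since SURF is patented and often unavailable):

```python
import cv2
import numpy as np

def register_iris(sample, new):
    """Align a tilted/rotated grayscale eye image to the sample image (cf. Algorithm 1)."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(sample, None)  # Steps 2-3: detect and extract features
    k2, d2 = orb.detectAndCompute(new, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)              # Step 4: find matching features
    src = np.float32([k2[m.trainIdx].pt for m in matches])  # Step 5: point pairs
    dst = np.float32([k1[m.queryIdx].pt for m in matches])
    # Steps 6-7: robustly estimate scale, rotation and translation (similarity transform)
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    # Step 8: warp the new image onto the sample image's frame
    return cv2.warpAffine(new, M, (sample.shape[1], sample.shape[0]))
```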

#### *2.4. Segmentation*

Segmentation mainly involves the separation of the iris portion from the eye. The iris region is bounded by two circles: the outer iris–sclera boundary and the inner iris–pupil boundary. The eyelids and eyelashes sometimes hide the upper and lower parts of the iris region. Correctly detecting the outer and inner boundaries of the iris is a crucial stage. Different methods are commonly used for this step, including the integro-differential operator [29], moving agent [30], Hough transform [31], circular Hough transform [32], iterative algorithm [33], Chan–Vese active contour method [34] and Fourier spectral density [35]. We have used the circular Hough transform (CHT) for the detection of the iris boundaries due to its robust performance even under noise, occlusion and varying illumination. It depends upon the equation of a circle, which is described by Equation (4):

$$(x-a)^2 + (y-b)^2 = r^2 \tag{4}$$

where (*a*, *b*) is the center, *r* is the radius and (*x*, *y*) represents the coordinates of the circle. Equations (5) and (6) show the parametric representation of this circle:

$$x = a + r\cos\theta \tag{5}$$

$$y = b + r\sin\theta \tag{6}$$

The CHT uses a 3D accumulator array: the first two dimensions represent the coordinates of the circle center, and the third specifies the radius. When a circle of the desired radius is drawn at every edge point, the values in the accumulator (the array that finds the intersection point) increase. The accumulator keeps count of the circles passing through the coordinates of each edge point and votes for the highest count, as shown in Figure 4.

**Figure 4.** Circular Hough transform voting pattern.

The coordinates of the center of circles in the images will be the coordinates with the highest count. For efficient recognition, the circular Hough transform was performed on the iris–sclera boundary first, and then, on the iris–pupil boundary. The segmentation using the circular Hough transform method is shown in Figure 5. After the circular portion was detected, the next step was to separate this circular portion from the eye. We have applied the mask of zeros to extract the iris from the eye as shown in Figure 6, and the circle detection is represented in Algorithm 2.

**Algorithm 2:** Circle Detection Using Circular Hough Transform.


**Figure 5.** Output of circle detection.

**Figure 6.** Output of segmentation.
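Since the steps of Algorithm 2 did not survive extraction, the following is an illustrative stand-in for the described boundary detection and masking; the Hough parameters and radius ranges are assumptions to be tuned to the scanner geometry, and the sketch assumes a circle is found for each boundary.

```python
import cv2
import numpy as np

def segment_iris(gray):
    """Detect the iris-sclera and iris-pupil circles with the circular Hough
    transform, then mask out everything but the iris ring (cf. Figures 5-6)."""
    blur = cv2.medianBlur(gray, 5)
    iris = cv2.HoughCircles(blur, cv2.HOUGH_GRADIENT, dp=1, minDist=gray.shape[0],
                            param1=100, param2=30, minRadius=80, maxRadius=150)
    pupil = cv2.HoughCircles(blur, cv2.HOUGH_GRADIENT, dp=1, minDist=gray.shape[0],
                             param1=100, param2=30, minRadius=20, maxRadius=60)
    x, y, r = np.round(iris[0, 0]).astype(int)
    px, py, pr = np.round(pupil[0, 0]).astype(int)
    mask = np.zeros_like(gray)
    cv2.circle(mask, (x, y), r, 255, -1)   # keep the iris disc
    cv2.circle(mask, (px, py), pr, 0, -1)  # remove the pupil
    return cv2.bitwise_and(gray, gray, mask=mask)
```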

#### *2.5. Feature Extraction*

Feature extraction is one of the most important steps in the process. We have used two different methods for feature extraction: the two-dimensional Discrete Wavelet Transform (DWT) and edge detection.

#### 2.5.1. Two-Dimensional Discrete Wavelet Transform (2-D DWT)

The 2D wavelet and scaling functions are obtained by taking the vector product of the 1D wavelet and scaling functions. This leads to the decomposition of the approximation coefficients at level *j* into four components: the approximation at level *j* + 1 and the details in three orientations (horizontal, vertical and diagonal). The two-dimensional wavelet transform is generally obtained by separable products of scaling functions *φ* and wavelet functions *Ψ*, as in Equation (7):

$$C_{j+1}[k, l] = \sum_{m,n} h[m - 2k]\, h[n - 2l]\, C_{j}[m, n] \tag{7}$$

The detail coefficient images, which are obtained from three wavelets, are given by Equations (8)–(10), and they are shown in Figure 7.

$$\text{Vertical wavelet}: \ \Psi^{1}(t_1, t_2) = \phi(t_1)\,\Psi(t_2) \tag{8}$$

$$\text{Horizontal wavelet}: \ \Psi^{2}(t_1, t_2) = \phi(t_2)\,\Psi(t_1) \tag{9}$$

$$\text{Diagonal wavelet}: \ \Psi^{3}(t_1, t_2) = \Psi(t_1)\,\Psi(t_2) \tag{10}$$

**Figure 7.** Feature extraction using 2D DWT. (**a**) Original eye image. (**b**) Vertical wavelet. (**c**) Horizontal wavelet. (**d**) Diagonal wavelet.

The energy is computed to approximate the three detailed coefficients (the horizontal, vertical and diagonal ones) by Equation (11):

$$\text{Energy} = \sum\_{m=0}^{M-1} \sum\_{n=0}^{N-1} |X(m,n)| \tag{11}$$

where *X* (*m*, *n*) is a discrete function whose energy is to be computed.
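A brief sketch of this computation follows, assuming the PyWavelets package is available; the Haar wavelet is an illustrative choice, as the paper does not name its mother wavelet.

```python
import numpy as np
import pywt

def dwt_band_energies(image, wavelet="haar"):
    """One-level 2D DWT, then the energy of each detail band per Eq. (11)."""
    _, (horizontal, vertical, diagonal) = pywt.dwt2(image.astype(float), wavelet)
    return {"horizontal": np.abs(horizontal).sum(),
            "vertical": np.abs(vertical).sum(),
            "diagonal": np.abs(diagonal).sum()}
```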

#### 2.5.2. Edge Detection

Edge detection consists of a variety of mathematical methods used to identify the points in an image where the brightness changes sharply; these points are generally organized into sets of curved line segments or discontinuities. It can also be used to find points where the intensities change rapidly. There are many different edge detection techniques, but we have used the Canny edge detector, a gradient-based technique. It uses two thresholds, a high threshold for low edge sensitivity and a low threshold for high edge sensitivity, to detect strong as well as weak edges; this makes the detector robust to noise and more likely to detect true weak edges. Bing Wang and Shaosheng Fan developed a filter that evaluates the discontinuity between the grayscale values of each pixel [36]. For a higher discontinuity, a lower weight value is set to smooth the filter at the corresponding point, and for a lower discontinuity between the grayscale values, a higher weight value is set. The resultant image after applying edge detection is shown in Figure 8, and the feature extraction steps are listed in Algorithm 3.

#### **Algorithm 3:** Feature Extraction using Edge Detection.

Step 1: Convolve the Gaussian filter with image to smooth the image using:

$$H_{ij} = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(i - (k+1))^2 + (j - (k+1))^2}{2\sigma^2}\right); \quad 1 \le i, j \le (2k+1)$$

where *σ* is standard deviation and kernel size is (2*k* + 1) × (2*k* + 1).

Step 2: Compute the local gradient magnitude $\left(g_x^2 + g_y^2\right)^{1/2}$ at each point.

Step 3: Find the edge direction $\tan^{-1}(g_y/g_x)$ at each point.

Step 4: Apply an edge thinning technique to get a more accurate representation of real edges.

Step 5: Apply hysteresis thresholding based on two thresholds, *T*1 and *T*2 with *T*1 < *T*2, to determine potential edges in the image.

Step 6: Perform edge linking by incorporating the weak pixels connected to the strong pixels.

**Figure 8.** Feature extraction using canny edge detector. (**a**) Original eye image. (**b**) Output image of feature extraction using edge detection.
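A minimal sketch of Algorithm 3 using OpenCV's built-in Canny implementation follows; the smoothing and threshold values are illustrative placeholders, not the paper's settings.

```python
import cv2

def extract_edges(iris_gray, t1=50, t2=150):
    """Canny edge detection: Gaussian smoothing (Step 1) followed by gradient
    computation, thinning and hysteresis thresholding (Steps 2-6), with T1 < T2."""
    smoothed = cv2.GaussianBlur(iris_gray, (5, 5), sigmaX=1.4)
    return cv2.Canny(smoothed, t1, t2)
```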

#### *2.6. Feature Matching*

For matching, we have used two different methods: the Hamming distance and the absolute differencing method.

#### 2.6.1. Hamming Distance

The Hamming distance makes use of only those parts of the two iris patterns that correspond to "0". It is calculated using the formula in Equation (12). Its value will be zero when both iris patterns match exactly, but in practice this never happens because of light variation while capturing the image, noise that remains undetected during normalization, environmental effects on the sensor, etc. So, a distance of up to 0.5, a threshold chosen by trial and error, is usually considered accurate. If the Hamming distance is below 0.5, both iris patterns are considered the same; if the distance is greater than 0.5, the iris patterns may or may not match; and if the distance is 1, the iris patterns clearly do not match.

$$\text{Hamming Distance} = \frac{1}{N} \sum_{i=1}^{N} |X_i - Y_i| \tag{12}$$

where *N* is the number of parts to be compared and |*Xi* − *Yi*| is the difference between the two iris patterns.

#### 2.6.2. Absolute Differencing

The absolute differencing method finds the absolute difference between each element in one iris pattern and the corresponding element in the other iris pattern, and it returns the absolute differences in the corresponding elements of the output. If one pattern is identical to the other, the absolute difference will be zero.

We have combined both of these techniques: first, the Hamming distance is computed, and then the absolute difference between the two images is found. If the Hamming distance between the two images is zero, the features are displayed as matched; if it is not zero, we calculate the absolute difference between the two patterns, and the features are displayed as matched only if the absolute difference is zero and as non-matched otherwise.
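A loose sketch of the two metrics and the combined rule follows, assuming binarized feature maps held as NumPy arrays; the cascading logic reproduces the description above, with the 0.5 threshold applied as the acceptance cut-off.

```python
import numpy as np

def hamming_distance(x, y):
    """Normalized Hamming distance between two binary iris codes, Eq. (12)."""
    x, y = np.asarray(x, float).ravel(), np.asarray(y, float).ravel()
    return np.abs(x - y).mean()

def absolute_difference(x, y):
    """Element-wise absolute difference; all zeros means identical patterns."""
    return np.abs(np.asarray(x, float) - np.asarray(y, float))

def is_match(x, y, threshold=0.5):
    """Combined rule from Section 2.6: accept below the Hamming threshold,
    falling back to absolute differencing for the exact-equality check."""
    if hamming_distance(x, y) < threshold:
        return True
    return bool(absolute_difference(x, y).sum() == 0)
```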

#### **3. Principal Component Analysis (PCA)**

Principal component analysis is a method used to extract strong patterns from a given dataset. The data become easier to visualize using this technique, and it converts a set of correlated variables into a set of linearly uncorrelated variables. This process gives the differences and similarities in the dataset. The direction of highest variance becomes the first axis, called the first principal component; the direction of the second highest variance becomes the second axis, called the second principal component; and so on. PCA reduces the dimensions of the dataset while retaining its features and characteristics. We have used PCA to reduce the number of steps while obtaining the desired results of the traditional image processing pipeline. However, it does not detect the inner iris features, which are very important for our system. It makes a decision based on the eigenvectors, eigenvalues and matching score between two images. The results obtained using PCA are shown in Figure 9, and the PCA steps are listed in Algorithm 4.

**Algorithm 4:** Principal Component Analysis (PCA).

Step 1: Create MAT file of the database and load database.

Step 2: Find the mean of the images using $\frac{1}{n} \sum_{i=1}^{n} X_{ij}$.

Step 3: Find the mean shifted input image.

Step 4: Calculate the eigenvectors and eigenvalues using $A\upsilon = \lambda\upsilon$, where $\lambda$ is the eigenvalue of the non-zero square matrix $A$ corresponding to the eigenvector $\upsilon$.

Step 5: Find the cumulative energy content for each eigenvector using $g_j = \sum_{k=1}^{j} D_{kk}$, $j = 1, 2, 3, \ldots, p$. Only the top principal components are retained.

Step 6: Create the feature vector by taking the product of cumulative energy content of Eigen vector and mean shifted input image.

Step 7: Separate out feature vector (iris section) from input image.

Step 8: Find the similarity score with images in database.

Step 9: Display the image having highest similarity score with input image.

**Figure 9.** Results of applying principal component analysis.
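A rough sketch of Algorithm 4 follows, assuming the database is a stack of equally sized grayscale images; SVD stands in for the explicit eigendecomposition, and cosine similarity is an assumed choice of matching score, as the paper does not specify one.

```python
import numpy as np

def pca_best_match(database, query, n_components=50):
    """Project a query iris image onto the database's top principal components
    and return the index of the most similar database image (cf. Algorithm 4)."""
    X = database.reshape(len(database), -1).astype(float)  # one image per row
    mean = X.mean(axis=0)                                  # Step 2: mean image
    centered = X - mean                                    # Step 3: mean-shifted data
    # Steps 4-5: principal directions via SVD; keep only the top components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    features = centered @ basis.T                          # Step 6: feature vectors
    q = (query.ravel().astype(float) - mean) @ basis.T
    # Steps 8-9: cosine similarity scores against the whole database.
    scores = features @ q / (np.linalg.norm(features, axis=1)
                             * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(scores))
```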

#### **4. Results and Discussion**

Our proposed work was implemented on a laptop with an Intel Core 5Y10C, 4 GB RAM, Windows 8, IriCore software and MATLAB R2017a. About 454 images of 43 persons, taken with an MK-2120U camera through the IriCore software, were used to test the performance of the proposed system. Multiple samples of individual eyes were recorded in our database. Each image was captured as a 640 × 480, 8-bit grayscale image and saved in the BMP format. Figure 10 shows the left and right eye images of different persons.


**Figure 10.** Iris images of different persons in our database.


**Figure 11.** (**a**) Original eye image. (**b**) Image after registration. (**c**) Outer and inner boundaries of image. (**d**) Iris portion separation from an eye image. (**e**) Features of an Iris.

To check the accuracy of the proposed system, it was evaluated using the false acceptance rate (FAR), the false rejection rate (FRR) and the equal error rate (EER). The EER is achieved at the point where the FAR and FRR overlap; the lower the EER is, the better the accuracy of the system will be. The matching algorithm uses a threshold value that determines the closeness of the input iris to the database iris. The lower the threshold value is, the lower the FRR will be, while the FAR will be higher; a higher threshold value leads to a lower FAR and a higher FRR, as shown in Figures 12 and 13. The EER is the point at which the FRR equals the FAR, and it is considered to be the most important measure of a biometric system's accuracy. The proposed iris recognition system gives an EER = 0.134, as shown in Figure 12, while the PCA-based method for iris recognition gives an EER = 0.384, as shown in Figure 13.

**Figure 12.** Performance graph for proposed method. (**a**) % Error for threshold distance (**b**) ROC curve.

**Figure 13.** Performance graph for PCA-based method. (**a**) % Error for threshold distance. (**b**) ROC curve.

Table 1 describes the performance of different methodologies for iris recognition. It shows the false acceptance rates (FAR), the false rejection rates (FRR) and the recognition accuracy of different methodologies. It can be observed from the table that our proposed system outperforms the existing techniques in terms of processing time. Our proposed system has a recognition rate of 99.73% and an execution time of 6.56 s, while the PCA-based system has a recognition rate of 88.99% and an execution time of 21.52 s. Therefore, the proposed system without PCA is more proficient for identification than the PCA-based system, which has lower accuracy.


**Table 1.** Performance comparison of different methodologies.

It can be seen from the above table that our proposed system's accuracy is better than that of the other methods, as we have involved image registration, which provides much better results. The PCA-based system is less efficient than the other proposed system, but it could be made more efficient by using a camera with a higher resolution. So, the proposed system with image registration is proficient for the identification and verification of the iris.

#### **5. Conclusions**

Iris recognition is an emerging field in biometrics, as the iris has a data-rich, unique structure, which makes it one of the best ways to identify an individual. The designed project is an innovation on the current modes of security systems in use today. Due to the unique nature of the iris, it can be used as a password for life. As the iris is a part of the human body that can never be altered, an iris detection system leaves no opportunity for trespassing. In this paper, an efficient approach for an iris recognition system using image registration and PCA is presented, using a database built from images taken with an iris scanner. The characteristics of the iris enhance its suitability for automatic identification, including the ease of image registration, its natural protection from the external environment and the impossibility of surgical alteration without loss of vision.

Applications of iris recognition systems are seen in various areas of life, such as crime detection, airports, business applications, banks and industry. Image registration adjusts the angle and alignment of the input image to the reference image. The iris segmentation uses the circular Hough transform method for iris portion detection, and then a mask is applied to extract the iris segment from the eye. Feature extraction is achieved using the two-dimensional discrete wavelet transform (2D DWT) method and an edge detection technique so that the most salient areas of the iris pattern can be extracted, and hence a high recognition rate and low computation time are achieved. The Hamming distance and absolute differencing methods were applied to the extracted features, which gives us an accuracy of 99.73%.

PCA was also applied to the same database, with the decision based purely on the matching score. This system gives a recognition rate of 89.99%, as it does not analyze deep features. The doors were then automated using serial communication between MATLAB and Arduino. Door automation using iris recognition has reduced the manual labor of opening and closing the barrier. The system proved to be efficient and saved time, with a processing time of 6.56 s, which could be reduced further with more advanced programming software. In the future, the system can be improved by investigating the proposed iris recognition approach under different constraints and environments.

**Author Contributions:** H.H.: Proposed topic, basic study Design, Data Collection, methodology, statistical analysis and interpretation of results; M.N.Z.: Manuscript Writing, Review and Editing; C.A.A.: Literature review, Writing and Referencing; H.E.: Review and Editing; M.O.A.: Review and Editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The author acknowledges the National Development Complex (NDC, NESCOM) Islamabad, Pakistan, for giving sponsorship for the project and authorized faculty for technical and non-technical help to accomplish the goal.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Strategic Participation of Active Citizen Energy Communities in Spot Electricity Markets Using Hybrid Forecast Methodologies**

**Hugo Algarvio**

LNEG—National Laboratory of Energy and Geology, Est. Paço Lumiar 22, 1649-038 Lisbon, Portugal; hugo.algarvio@lneg.pt

**Abstract:** The increasing penetration of distributed renewable generation leads to the need for Citizen Energy Communities, which may be able to be active market players and solve local imbalances. The liberalization of the electricity sector brought wholesale and retail competition as a natural evolution of electricity markets. In retail competition, retailers and communities compete to sign bilateral contracts with consumers. In wholesale competition, producers, retailers and communities can submit bids to spot markets, where prices are volatile, or sign bilateral contracts to hedge against spot price volatility. To participate in those markets, communities have to rely on risky consumption forecasts made hours ahead of real-time operation; as Balance Responsible Parties, they may pay penalties for their real-time imbalances. This paper proposes and tests a new strategic bidding process in spot markets for communities of consumers. The strategic bidding process is composed of a forced forecast methodology for day-ahead bids and short-run trend forecasts of consumption for intraday bids. This paper also presents a case study where energy communities submit bids to spot markets to satisfy their members using the strategic bidding process. The results show that bidding in short-term markets leads to lower forecast errors than bidding in long- and medium-term markets. Better forecast accuracy leads to higher fulfillment of the community's programmed dispatch, resulting in lower imbalances and lower control reserve needs for the power system balance. Furthermore, by being active market players, energy communities may save around 35% in their electrical energy costs compared with retail tariffs.

**Keywords:** Balance Responsible Parties; Citizen Energy Communities; electricity markets; forecast methodologies; imbalance penalties; strategic bidding

#### **1. Introduction**

The liberalization process brought full competition to the electricity supply industry in both wholesale and retail markets [1]. As a consequence, the market agents have the option to trade electricity in different markets [2]: spots, continuous, derivatives, non-organized, and ancillary services markets.

In spot markets, agents can submit bids to electricity pools based on day-ahead and intraday or real-time marginal auctions. In continuous intraday markets, players can negotiate energy based on the pay-as-bid scheme, i.e., an automatic match of opposite bids [3]. These markets were designed for dispatchable players, i.e., players that can comply with a programmed dispatch, which means that players like consumers and variable generation without storage capacity will have real-time deviations [4,5]. Real-time deviations from the schedules of Balance Responsible Parties (BRPs) have to be balanced at balancing markets. Balancing markets are part of the ancillary services of the system, managed by transmission system operators (TSOs) to guarantee the secure operation of power systems. BRPs with deviations from their schedules may need to pay penalties concerning spot markets. They will receive the down/up balancing prices according to the direction of their deviations [6]. Those penalties are computed considering each country's imbalance settlement (IS) mechanism [7]. In derivatives markets, agents can sign standard financial and physical contracts [8]. For non-standard contracts, agents can negotiate and set the terms and conditions of the private bilateral agreements [9].

Normally, in retail competition, retailers sign private bilateral contracts with clients [10]. Citizen Energy Communities (CECs) are new market players that compete with retailers to sign private bilateral contracts with end-use consumers [11]. The main problem of retailers is that they usually follow a business-as-usual strategy, proposing high tariffs that are equal within each consumer segment [12]. Being part of a CEC is therefore more economically attractive than signing retail tariffs, but also more demanding, as it requires the active participation of community members. CECs may be composed of local consumers, prosumers, distributed generation, and storage assets. Considering the global goal of a carbon-neutral society and the increasing penetration of distributed generation, CECs aim to achieve energy sustainability by managing local resources [13]. Against this background, new European legislation supports the active participation of consumers through CECs by providing significant discounts on their grid usage and access costs [14–16]. To satisfy the needs of their members, CECs can enter into wholesale competition, submitting bids to spot markets, signing private bilateral contracts with producers, and signing standard contracts on the exchanges or OTC [17,18]. Algarvio [13] presented a review of CECs as power system alliances that need resource management and coordination.

To avoid future losses, forecasting market prices is one of the aspects that CECs have to consider when participating in wholesale markets. Furthermore, forecasting their energy needs is one of the biggest issues that CECs have to face. The consumption dynamics of members depend strongly on the meteorological conditions, the type of day, and the segment type of the consumers [19]. So, minimizing the consumption volatility of members can be a good way to avoid high forecast errors, which can result in imbalances and, consequently, in the payment of penalties by CECs. Thus, CECs should have an appropriate trading strategy to mitigate those errors. Adequate short-run strategic bidding on spot markets is crucial to mitigate potential consumption imbalances, since bilateral transactions are usually made in the long run (months before real-time consumption). Accordingly, monitoring the real local dispatch with smart meters is a critical aspect for communities with members comprising consumers, prosumers, and distributed generation [20,21]. Furthermore, it enables them to control their net load by using demand response programs, i.e., controlling the local energy production or consumption in case of shortages or excesses of energy [22,23].

Ayón et al. [24] indicated that large and diversified aggregations of end-use clients might reduce load forecast errors. Furthermore, they concluded that aggregations of flexible loads are typically beneficial for reducing forecast errors; load aggregations may therefore benefit market players in comparison with individual loads. Wei et al. [25] presented a complete review of 128 forecast models of energy load. They considered that highly accurate forecasts have a maximum mean absolute percentage error (MAPE) of 10%. Naturally, they concluded that forecasts of small-scale loads have larger errors than those of large-scale loads. Furthermore, they also concluded that forecast accuracy increases with the time horizon, i.e., long-term (yearly) and medium-term (monthly or quarterly) forecasts have smaller errors than short-term forecasts (from daily to sub-hourly). Demand is naturally weather-driven, so by analyzing the studied models, the authors concluded that forecast accuracy increases with the time scale, being higher for yearly forecasts than for hourly forecasts. However, considering hourly forecasts, the forecast accuracy increases the closer the forecast is made to real-time operation [26]. Koponen et al. presented a review of 12 models to forecast short-term electrical energy load [27]. They considered six different scenarios to test these models and concluded that the forecast errors decrease with an increase in the number of aggregated consumers, considering the normalized root mean square error (NRMSE). Furthermore, they indicated that their results do not support the use of specific criteria (such as MAPE or NRMSE) to compare methods. They also concluded that hybrid methods should be used to compute demand forecasts. Algarvio and Lopes [28] presented a strategic bidding strategy for retailers considering hybrid forecast methodologies in spot day-ahead and intraday markets. They concluded that the participation of retailers closer to real-time markets improves their forecast accuracy and their return from markets. It was also concluded that retailers with larger and more diversified portfolios have lower forecast errors.

Against this background, this paper focuses on upgrading the strategic bidding process for retailers in wholesale power markets presented in previous work, adapting it to CECs. It considers a new forecast methodology for the day-ahead market based on a forced forecast and adapts the forecast methodology for the spot intraday market based on the short-run energy trends of the community, aiming to reduce forecast errors and, consequently, imbalances and penalties. Specifically, the purpose of the paper is threefold:


The work presented here refines and extends previous work on CECs composed of consumers [11], their agent-based management [13] and model, the bilateral model [18], the strategic bidding of retailers [28], and risk management [29,30]. The main novelty of the presented work consists of equipping the agent-based model of CECs with a new strategic bidding process that enables them to participate in wholesale electricity markets. Indeed, CECs have already been recognized by European legislation, and some CECs are already active in Portugal [11,14–16]. The main limitation of CECs is that they need to bid at least 1 MW of power to participate in spot markets; CECs therefore need sufficient scale to avoid relying on market intermediaries.

The remainder of the paper is structured as follows. Section 2 presents an overview of electricity markets, considering spot, balancing, and IS markets. Section 3 introduces a model for strategic bidding of CECs. Section 4 presents a case study. Finally, concluding remarks are presented in Section 5.

#### **2. Electricity Markets**

Active market players have the option to trade electricity in five different markets: spots, continuous, derivatives (forwards, futures, swaps, and options), non-organized (private bilateral contracts), and ancillary services markets. In spot markets, agents can submit bids with a minimum of 1 MW to electricity pools based on day-ahead and intraday or real-time marginal auctions [3].

In Europe, day-ahead markets close at noon (CET) on the day before real-time operation, i.e., between 12 and 37 h before real-time commitment. European markets are coupled and use EUPHEMIA, a common marginal-pricing algorithm used to solve power flows between different market zones with the goal of maximizing social welfare [31]. In Europe, it is also possible to trade energy in several intraday auctions a few hours ahead of real-time operation and in the continuous intraday market. In continuous intraday markets, players can negotiate up to 15 min ahead of real-time operation based on the pay-as-bid scheme [3]. In derivatives markets, agents can sign standard financial and physical contracts on the exchanges (clearing houses) or over-the-counter (OTC) through electronic trading to reduce risk by hedging against spot price volatility and consumption uncertainty [17]. For non-standard agreements, agents can privately negotiate and set the terms and conditions of the contracts on non-organized markets [9]. These markets were designed for large dispatchable players, i.e., players that can comply with a programmed dispatch and have enough power to participate in these markets, which means that players like retailers, CECs, and variable generation without storage capacity may have real-time deviations [4,5]. Real-time imbalances of BRPs with respect to their final programmed dispatch have to be balanced during real-time operation [6]. TSOs use balancing markets to guarantee the security of power systems by balancing demand and supply of energy in real time. BRPs may have to pay/receive the down/up balancing costs, which normally results in penalties concerning spot markets [7].

#### *2.1. European Balancing Markets*

A variation in the kinetic energy, $q_t^{kin}$, caused by deviations of the instantaneous powers of the rotating generators, $\Delta P_t^s$, and/or motors, $\Delta P_t^d$, from their defined set-point values in period $t$, may lead to deviations between supply and demand and cause frequency and/or voltage oscillations, as expressed in the power equilibrium equation [32]:

$$
\Delta P_t^s - \Delta P_t^d = \frac{dq_t^{kin}}{dt} \tag{1}
$$

In Europe, the maximum secure frequency oscillation relative to the reference is 0.1%, and the maximum allowed oscillation is 0.5%. Frequency oscillations above 0.5% can lead to outages and to the separation of interconnected control areas. When frequency deviations reach 0.1%, balancing reserves are automatically activated to mitigate the deviations that caused the oscillation [33,34].

Traditionally, four different mechanisms exist in Europe to balance power systems [6]:

- Frequency Containment Reserves (FCR);
- automatic Frequency Restoration Reserves (aFRR);
- manual Frequency Restoration Reserves (mFRR);
- Replacement Reserves (RRs).

FCR is the fastest frequency reserve and the first to be activated to solve frequency disturbances caused by incidents or by imbalances between production and consumption, which result in a frequency deviation from the 50 Hz European programmed value. It has to be activated within a maximum of 15 s, and the disturbance needs to be controlled within a few seconds. Power systems of the continental European synchronous grid have to reserve 3000 MW of their capacity to support FCR.

aFRR has to be activated within a maximum of 30 s and can stay active for up to 15 min, replacing FCR; it also restores the grid frequency to the scheduled value. Considering the programmed size of the aFRR power band, the TSO defines the band needs for every period. ENTSO-E suggests the minimum size of the symmetric power band [33].

mFRR is firstly used to free up and/or support aFRR and then to continue balancing long-term disturbances for long periods. The TSO is responsible for directly activating this reserve, which allows for solving medium and long-term active-power deviations originated by generators, loads, or other grid disturbances.

In the aFRR and mFRR products, TSOs typically define schedules in blocks of 15 min. In the corresponding markets, an auction is carried out for every hour of the day (or for blocks of several hours), and technically capable generators are allowed to bid. The auction criterion aims to determine the lowest capacity price (aFRR capacity market) and the lowest energy price (aFRR and mFRR energy markets), based on marginal pricing, pay-as-bid, or other pricing methods.

RRs are activated to resolve long-term incidents that cannot be solved with the previous mechanisms. They are normally traded through bilateral agreements between TSOs and providers. They can be activated within 15 min and can remain active for hours. This mechanism is activated according to the programmed dispatch schedules agreed between TSOs and providers. While the other mechanisms can be directly activated and controlled by TSOs, for RRs the TSOs rely on providers to comply with the programmed dispatch.

Balancing reserves are directly traded between TSOs and providers. Providers of upward regulation receive the up-regulation price of the reserves; conversely, providers of downward regulation pay the down-regulation price. The costs or revenues of balancing markets are passed to BRPs that have deviations or need to be balanced, according to the imbalance settlement mechanism. Normally, the prices of upward regulation are higher, and those of downward regulation lower, than spot prices, which results in the payment of penalties. Otherwise, BRPs that deviate from their schedules may be compensated or pay no penalties.

In Europe, IS mechanisms strongly differ between countries. The following mechanisms are the most used [7]:


These mechanisms consider that BRPs will only pay for the balanced energy. The reserved capacity that guarantees the power system security is paid in the tariffs of end-use consumers.

The first two mechanisms are discriminatory, since only BRPs that contribute to deviations in the dominant direction may pay penalties. The second is more discriminatory because only BRPs that need to be balanced may pay penalties, but it incentivizes BRPs to autoregulate their set points, avoiding the payment of penalties. In these mechanisms, BRPs are not compensated, independently of the balancing prices, which may generate an economic surplus for TSOs. However, when the costs of balancing the system in the dominant direction are lower than in the non-dominant direction, TSOs may run an economic deficit. The third mechanism does not generate an economic deficit for TSOs, because all BRPs pay penalties on their deviations. The fourth mechanism considers that all balancing costs or revenues are passed to BRPs. This mechanism is fairer in the sense that TSOs have neither an economic surplus nor a deficit. However, it does not incentivize BRPs to balance themselves, because they can be compensated for their imbalances.

The details of the Portuguese balancing and IS markets are presented next.

#### Portuguese Balancing Markets

Portugal and Spain are members of the Iberian Market of Electricity (MIBEL). MIBEL only manages spot, derivatives, and bilateral markets. Ancillary services are independent for each country and managed by their local TSOs. However, some ancillary services can be traded between TSOs. For continuous balancing, Portugal considers the traditional European frequency reserves with the following specifications [6].

FCR is a mandatory and non-remunerated system service for all technically capable generators connected to the grid. They have to reserve 5% of their nominal power in stable conditions to support FCR. Portugal is part of the synchronous grid of continental Europe, contributing with its FCR reserved capacity to the required 3000 MW of positive and negative FCR ready to be activated in continental Europe.

The Portuguese TSO requires an asymmetrical aFRR power band in which the up capacity doubles the down capacity. Historically, in Portugal, the aFRR power band is used more for up-regulation than for down-regulation. Thus, relative to the ENTSO-E suggestions, the Portuguese TSO upscales the up capacity of the aFRR to 60% and downscales its down capacity to 40%. In Portugal, the TSO allows all technically capable generators to participate in the hourly auctions of aFRR capacity; they are remunerated based on the marginal prices of the hourly auction. Generators have to be capable of providing both down-regulation and up-regulation, bidding an up capacity that has to double the down capacity. Due to the lack of competition, in Portugal it is the combined-cycle gas turbines that participate in aFRR markets, and the price of the energy they provide in aFRR is defined by the regulator.

The energy of mFRR is obtained through hourly auction-based separate procurement of both upward and downward regulation on marginal markets. The problem with mFRR is that it is based on hourly auctions, so RRs shall be used for balancing long-term frequency deviations. RRs can be activated within 15 min and remain active for long periods, based on bilateral contracts negotiated between TSOs and the participants.

#### *2.2. Imbalance Settlement Mechanisms*

The Portuguese mechanism considers that BRPs have to pay/receive the costs/revenues of all the energy used to balance the system [7]. Therefore, the TSO has neither an economic surplus nor a deficit with respect to the energy used to balance the system. The TSO computes a single penalty, $p_t^{pen}$, and dual prices for each period, $t$, according to the following formulations:

$$p_t^{pen} = \frac{\sum_{o=1}^{O} (p_{o,t} - p_{0,t}) \times q_{o,t}}{q_t^{dev}} \tag{2}$$

$$p_t^{up} = p_{0,t} + p_t^{pen} \tag{3}$$

$$p_t^{down} = -(p_{0,t} - p_t^{pen}) \tag{4}$$
where:

- $p_{0,t}$ is the spot (day-ahead) market price in period $t$;
- $p_{o,t}$ and $q_{o,t}$ are the price and quantity of balancing energy offer $o$, out of the $O$ offers activated in period $t$;
- $q_t^{dev}$ is the total imbalance quantity settled in period $t$.

BRPs with upward deviations receive the sum of the spot price and the penalty. BRPs with downward deviations pay the spot price minus the penalty. When the penalty is positive, BRPs are compensated, because the prices of the ancillary services are lower than those of the spot market; otherwise, they are penalized. In the case of positive upward or downward imbalance prices, the TSO pays the BRPs; otherwise, it is the BRPs who pay the TSO.
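A minimal numerical sketch of this single-penalty/dual-pricing scheme of Eqs. (2)–(4) is given below. The offer prices and quantities are invented for illustration, and the sign conventions simply follow the equations as written above.

```python
def portuguese_imbalance_prices(p_spot, offers, q_dev_total):
    """Single penalty / dual pricing, Eqs. (2)-(4).
    offers: (price, quantity) pairs of the balancing energy activated
    in period t; q_dev_total: total imbalance quantity settled."""
    pen = sum((p_o - p_spot) * q_o for p_o, q_o in offers) / q_dev_total
    p_up = p_spot + pen        # received by BRPs with upward deviations
    p_down = -(p_spot - pen)   # paid by BRPs with downward deviations
    return pen, p_up, p_down

# Invented numbers: spot at 50 EUR/MWh, two upward activations above spot.
pen, p_up, p_down = portuguese_imbalance_prices(
    p_spot=50.0, offers=[(65.0, 30.0), (70.0, 20.0)], q_dev_total=50.0)
print(pen, p_up, p_down)  # 17.0 67.0 -33.0
```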

The Nordic and Spanish mechanisms compute the dominant balance direction, and only the BRPs whose deviations cause that balancing need pay the price of the energy used to balance the system [7,35]. In contrast to the Portuguese mechanism, this mechanism considers two penalties and single pricing, as presented in the following formulations:

$$\begin{cases} p_t^{up,pen} = 0 & \text{if } \sum_{o=1}^{O} q_{o,t}^{up} > \sum_{o=1}^{O} q_{o,t}^{down} \\ p_t^{up,pen} = \min\left[\dfrac{\sum_{o=1}^{O} (p_{o,t}^{down} - p_{0,t}) \times q_{o,t}^{down}}{\sum_{o=1}^{O} q_{o,t}^{down}}, 0\right] & \text{if } \sum_{o=1}^{O} q_{o,t}^{up} \le \sum_{o=1}^{O} q_{o,t}^{down} \\ p_t^{down,pen} = 0 & \text{if } \sum_{o=1}^{O} q_{o,t}^{up} < \sum_{o=1}^{O} q_{o,t}^{down} \\ p_t^{down,pen} = \min\left[\dfrac{\sum_{o=1}^{O} (p_{0,t} - p_{o,t}^{up}) \times q_{o,t}^{up}}{\sum_{o=1}^{O} q_{o,t}^{up}}, 0\right] & \text{if } \sum_{o=1}^{O} q_{o,t}^{up} \ge \sum_{o=1}^{O} q_{o,t}^{down} \end{cases} \tag{5}$$

$$p_t^{up} = p_{0,t} + p_t^{up,pen} \tag{6}$$

$$p_t^{down} = -(p_{0,t} - p_t^{down,pen}) \tag{7}$$

where:

- $q_{o,t}^{up}$ and $q_{o,t}^{down}$ are the quantities of the upward and downward balancing energy offers $o$ activated in period $t$;
- $p_{o,t}^{up}$ and $p_{o,t}^{down}$ are the corresponding upward and downward balancing energy prices.

Considering this mechanism, an upward penalty exists when the downward balancing needs are higher, penalizing BRPs with upward deviations. Conversely, BRPs with downward deviations pay penalties when the upward balancing needs are higher. The problem with this mechanism is that only net deviations in the dominant direction are paid. It is an unfair system that heavily penalizes the players that have to pay penalties. However, the Portuguese IS likewise does not incentivize BRPs to stay balanced.
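The dominant-direction mechanism of Eqs. (5)–(7) can be sketched in the same style. Again, this is an illustrative reading of the formulas above, not an official market implementation; the boundary case where upward and downward needs are equal follows the inclusive conditions of Eq. (5).

```python
def dominant_direction_prices(p_spot, up_offers, down_offers):
    """Two penalties / single pricing, Eqs. (5)-(7). up_offers and
    down_offers are (price, quantity) pairs of activated balancing energy."""
    q_up = sum(q for _, q in up_offers)
    q_down = sum(q for _, q in down_offers)
    pen_up = pen_down = 0.0
    if q_down > 0 and q_up <= q_down:   # downward balancing dominates
        pen_up = min(sum((p - p_spot) * q for p, q in down_offers) / q_down, 0.0)
    if q_up > 0 and q_up >= q_down:     # upward balancing dominates
        pen_down = min(sum((p_spot - p) * q for p, q in up_offers) / q_up, 0.0)
    return p_spot + pen_up, -(p_spot - pen_down)  # Eqs. (6) and (7)

# Invented example: upward balancing dominates, so only down deviations pay.
p_up, p_down = dominant_direction_prices(
    p_spot=50.0, up_offers=[(65.0, 40.0)], down_offers=[(45.0, 10.0)])
print(p_up, p_down)  # 50.0 -65.0
```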

The imbalance quantity, $q_t^{dev}$, assigned to a BRP is computed as the difference between its real-time dispatch, $q_t$, and its final programmed dispatch, $q_t^{prog}$, over period $T$, as follows:

$$q_t^{dev} = q_t - q_t^{prog} = \int_{t=0}^{T} \left(P_t - P_t^{prog}\right) dt \tag{8}$$

where $P_t$ and $P_t^{prog}$ are the instantaneous powers of the final and programmed dispatch, respectively.

The next section presents the strategic bidding process that enables CECs to reduce their imbalances.

#### **3. Strategic Bidding in Wholesale Electricity Markets**

Considering CECs with predefined members, such as consumers or prosumers, they need to satisfy the energy needs of those members. CECs can enter into bilateral agreements with producers, retailers, or other sellers to acquire energy, and/or can submit bids to spot markets if they are able to trade the required minimum power. Bilateral contracts are a form of hedging against the volatility of spot prices, although they are subject to risk premiums; buyers of energy normally get worse prices in bilateral agreements. Their risks are thus reduced to the consumption uncertainty of their portfolio and, to a smaller extent, to the volatility of spot prices, since they may need to adjust their energy quantities by submitting bids to spot markets, namely the day-ahead market (DAM) and the intraday market (IDM). The DAM is used to buy/sell the expected shortfall/excess of energy that will not be physically cleared by the members. Each session of the IDM can then be used to compensate for the expected short-run imbalances between all acquired and consumed electricity. The continuous intraday market can be used up to 15 min ahead of real-time operation to trade some of the close-to-real-time expected deviations [3]. Furthermore, as BRPs, CECs are responsible for their members' deviations. Thus, if they have imbalances in relation to their programmed dispatch, they may be penalized in balancing markets, paying/receiving the imbalance down/up prices [6,7].

This section presents a process for strategic bidding in wholesale electricity markets, considering that CECs can also use bilateral agreements to acquire electricity. The process uses different types of data. It uses historical data to forecast the next day's consumption for the DAM based on a forced forecast: the most recent hour, $h$ back, with an hourly consumption observation is selected from the database according to the type of the forecast day (D): weekday (W), Saturday (S), Sunday (U) or holiday (H). Considering the database with the historical daily consumption data, D = {W, S, U, H}, the formulation to obtain the forecast is:

$$\hat{q}_t = q_{t-h}, \quad \forall h \in \mathcal{D} \tag{9}$$

subject to:

$$\min_{h} \left\{ h \,\middle|\, (\hat{q}_t \wedge q_{t-h}) \in (\mathcal{W} \vee \mathcal{S} \vee \mathcal{U} \vee \mathcal{H}) \right\} \tag{10}$$

For every time period, CECs can have multiple contracts, $K$ in total, so the quantity of electricity already guaranteed through bilateral contracts, $q_{c,t}$, is used to compute the bid for each period of the DAM, $q_{0,t}$:

$$q_{0,t} = \hat{q}_t - q_{c,t} \tag{11}$$

$$q_{c,t} = \sum_{k=1}^{K} q_{c_k,t} \tag{12}$$
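A minimal sketch of the forced day-ahead forecast of Eqs. (9)–(10) follows, assuming the lookup is done at the same hour of day as the target period, which the day-type formulation implies. The toy load profile and day-type labels are invented for illustration.

```python
import pandas as pd

def forced_forecast(history: pd.Series, day_type: pd.Series,
                    t: pd.Timestamp) -> float:
    """Return q_{t-h}: the most recent observation before t taken at the
    same hour of day on a day of the same type (W, S, U or H) as t."""
    mask = ((history.index < t)
            & (history.index.hour == t.hour)
            & (day_type.values == day_type[t]))
    return float(history[mask].iloc[-1])  # smallest admissible lag h

# Toy two-week hourly series: weekdays (W), Saturdays (S), Sundays (U).
idx = pd.date_range("2019-01-01", periods=24 * 14, freq="h")
load = pd.Series(300.0 + 50.0 * (idx.hour >= 9), index=idx)
day_type = pd.Series(["W" if d < 5 else "S" if d == 5 else "U"
                      for d in idx.dayofweek], index=idx)
print(forced_forecast(load, day_type, pd.Timestamp("2019-01-14 10:00")))
# -> 350.0, the load observed on Friday 11 January at 10:00
```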

For each intraday session, $s$, the forecast, $\hat{q}_{s,t}$, uses the most up-to-date consumption information to forecast the consumption of the CEC and submit bids for the electricity required in that session. The intraday methodology has been adapted from a forecast methodology for retailers [28]. The quantity bid into an intraday session, $q_{s,t}$, submitted for every time period, $t$, of each intraday session, $s$, considers the short-run forecast and all electricity already acquired through bilateral contracts, $q_{c,t}$, the DAM, $q_{0,t}$, and the previous intraday session(s), $q_{i,t}$:

$$q_{s,t} = \hat{q}_{s,t} - q_{c,t} - q_{0,t} - \sum_{i=1}^{s-1} q_{i,t} \tag{13}$$

Then, the real-time imbalance, $q_t^{dev}$, of period $t$ is computed as the difference between the real-time consumption of the CEC, $q_t$, and the final programmed dispatch, $q_t^{prog}$:

$$q_t^{dev} = q_t - q_{c,t} - q_{0,t} - \sum_{s=1}^{S} q_{s,t} = q_t - q_t^{prog} \tag{14}$$

The balance responsibility cost of the CEC in each time period, $C_t^{dev}$, considering its deviation, $q_t^{dev}$, and the prices of the excess or lack of electricity in the cases of upward, $P_t^{up}$, or downward, $P_t^{down}$, deviations, respectively, is computed as follows:

$$\begin{cases} C_t^{dev} = q_t^{dev} P_t^{up}, & \text{for } q_t^{dev} > 0 \\ C_t^{dev} = \left| q_t^{dev} \right| P_t^{down}, & \text{for } q_t^{dev} < 0 \end{cases} \tag{15}$$

Each bilateral contract $k$ has its own price, $p_{c_k,t}$, so the cost of the CEC in each time period, $C_t$, is:

$$C_t = \sum_{k=1}^{K} p_{c_k,t} q_{c_k,t} + p_{0,t} q_{0,t} + \sum_{s=1}^{S} p_{s,t} q_{s,t} - C_t^{dev} \tag{16}$$
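Putting Eqs. (11)–(16) together, one period of the CEC's trading chain can be sketched as below. All prices and quantities are invented, a single bilateral contract is assumed for brevity, and the sign conventions for the imbalance cost simply follow Eqs. (15) and (16) as written; this is a sketch, not a market-grade settlement engine.

```python
def settle_period(q_real, q_hat_da, q_hat_id, q_c, p_c, p_dam, p_id, p_up, p_down):
    """One period of the CEC chain: DAM bid (Eq. 11), intraday bids
    (Eq. 13), imbalance (Eq. 14), imbalance cost (Eq. 15), cost (Eq. 16).
    q_hat_id / p_id: per-session intraday forecasts and prices;
    q_c / p_c: a single bilateral contract, for brevity."""
    q_dam = q_hat_da - q_c                      # Eq. (11)
    traded, q_s_list = q_c + q_dam, []
    for q_hat_s in q_hat_id:                    # Eq. (13), session by session
        q_s = q_hat_s - traded
        q_s_list.append(q_s)
        traded += q_s
    q_dev = q_real - traded                     # Eq. (14)
    c_dev = q_dev * p_up if q_dev > 0 else abs(q_dev) * p_down  # Eq. (15)
    cost = (p_c * q_c + p_dam * q_dam
            + sum(p * q for p, q in zip(p_id, q_s_list)) - c_dev)  # Eq. (16)
    return q_dev, cost

# Invented example: 100 MWh real load, slight over-forecast in the DAM.
print(settle_period(q_real=100.0, q_hat_da=105.0, q_hat_id=[101.0],
                    q_c=40.0, p_c=55.0, p_dam=50.0, p_id=[48.0],
                    p_up=67.0, p_down=-33.0))  # -> (-1.0, 5291.0)
```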

Two different indicators, MAPE and NRMSE, are used to evaluate the performance of the forecast techniques [28]:

$$MAPE = \frac{100\%}{T} \sum_{t=1}^{T} \left| \frac{q_t - \hat{q}_t}{q_t} \right| \tag{17}$$

$$NRMSE = 100\% \times \frac{\sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat{q}_t - q_t)^2}}{q_{max}} \tag{18}$$

where $q_{max}$ is the CEC's maximum demand. The value of $\hat{q}_t$ depends on the time horizon of each market forecast, being equal to $q_{0,t} + q_{c,t}$ in the case of day-ahead forecasts and equal to $q_t^{prog}$ in the case of intraday forecasts.
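The two indicators translate directly into code; the following helper functions implement Eqs. (17) and (18) as written, and the demo arrays are synthetic.

```python
import numpy as np

def mape(q, q_hat):
    """Mean absolute percentage error, Eq. (17)."""
    q, q_hat = np.asarray(q, float), np.asarray(q_hat, float)
    return 100.0 * np.mean(np.abs((q - q_hat) / q))

def nrmse(q, q_hat):
    """RMSE normalized by the maximum demand q_max, Eq. (18)."""
    q, q_hat = np.asarray(q, float), np.asarray(q_hat, float)
    return 100.0 * np.sqrt(np.mean((q_hat - q) ** 2)) / q.max()

# Synthetic example: true demand vs. a forecast with small errors.
q = np.array([300.0, 320.0, 410.0, 446.0])
q_hat = np.array([310.0, 315.0, 400.0, 430.0])
print(f"MAPE = {mape(q, q_hat):.2f}%  NRMSE = {nrmse(q, q_hat):.2f}%")
```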

The following section presents a case study to test the strategic bidding process presented in this section when a CEC participates in the markets presented in the previous section.

#### **4. Case Study**

This section presents a case study that tests the process of strategic bidding on spot markets, considering a CEC composed of real-world consumers that want to be active market players.

The case study uses real-world data from 312 Portuguese consumers connected to the medium voltage of the transmission grid, representing around 5% of the national demand during the period from 2011 to 2013 [36]. The CEC is composed of 72 residential aggregations, 189 small commercial aggregations, 13 large commercial, 8 industrial, and 32 aggregations of diverse consumer types. They have a peak demand of 446 MW. Therefore, their consumption data from 2012 are extrapolated to 2019.

In 2019, the regulated energy tariff for medium voltage consumers was 111.93 €/MWh. Of this tariff, 70.68 €/MWh corresponds to the wholesale price of energy, 5.26 €/MWh to retail commercialization, and the rest to grid access and usage [16]. The last parcel includes the General Economic Interest Cost (GEIC), which results from economic incentives for renewable and thermal generation, with a value of 24.70 €/MWh. Portuguese legislation strongly incentivizes CECs and self-consumption: CECs and self-consumption benefit from a 50% discount on the GEIC, and the discount rises to 100% for CECs with self-consumption. Thus, CECs may pay only 23.64 €/MWh for grid access plus the wholesale cost of energy of their own trades, instead of the retail tariff (111.93 €/MWh). Against this background, the goal of this section is to test the strategic bidding process of CECs, considering its forecast accuracy and the market outcomes of the CEC, also considering the different IS mechanisms.
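As a quick arithmetic check, using only the figures quoted above, the 23.64 €/MWh grid cost for CECs follows from the tariff breakdown:

$$
111.93 - 70.68 - 5.26 = 35.99 \text{ €/MWh (grid access and usage, including the GEIC)}
$$

$$
35.99 - 0.5 \times 24.70 = 23.64 \text{ €/MWh (after the 50\% GEIC discount for CECs)}
$$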

Considering the forecast accuracy, the DAM forecasts have a MAPE of 5.32% and an NRMSE of 4.6%, while the IDM forecasts have a MAPE of 4.43% and an NRMSE of 3.62%. According to the literature, forecasts with a MAPE lower than 10% are considered highly accurate [25]. Comparing these results with the forecast accuracy of retailers when these consumers are part of their portfolios, it can be concluded that only one out of six retailers can obtain lower errors, and only in the IDM forecasts [28]. So, CECs can improve local forecast accuracy, but reducing the diversification of retailers' portfolios may decrease their national demand forecast accuracy. Thus, CECs can be relevant for balancing power systems that consider local marginal pricing and balancing, as in the USA and Australia. These values demonstrate the strong accuracy of the employed forecast methodology, as can be seen in Figure 1.

**Figure 1.** DAM and IDM deviations in relation to real consumption. Brown lines consider the merge between DAM and IDM deviations.

Analyzing Figure 1, it can be concluded that IDM forecasts are worse than DAM forecasts for only a few hours of the year. It can also be seen that the CEC demand is higher during summer. This is because in Portugal, during summer, cooling demand is satisfied by electric air conditioning, while during winter, heating demand is satisfied by natural gas, wood, and electricity. Furthermore, while cooling demand occurs during working hours, heating demand occurs during the night. Also, while the electrification of commercial buildings is advanced, residential consumers still use other sources of energy for heating; moreover, the majority of the CEC participants are commercial consumers. Against this background, because of the high tourism rates and cooling demand during summer, the summer demand of the CEC is substantially higher than in other seasons. Concerning demand forecasts, it can be verified that deviations are higher during winter, mainly at the beginning of January and during December, even though demand is lower than in summer. This may occur because of unpredicted cold waves that lead commercial consumers to use electrical heating against predictions. No significant differences in forecast accuracy were detected according to the type of day (weekday, Saturday, Sunday, and holiday).

The main market outcomes of the CEC are presented in Table 1.


**Table 1.** Average hourly market outputs of the CEC on each market mechanism.

From the results, it is possible to conclude that the DAM forecasts overestimate the CEC consumption, leading the CEC to sell part of its excess energy in the intraday market. Moreover, the average cost of the imbalances accounts for around 3% of the total energy cost.

Table 2 presents the levelized cost of the CEC with energy on wholesale markets.

**Table 2.** Levelized energy costs of the wholesale market.


Analyzing Table 2, it is possible to conclude that consumers may reduce their costs in the energy part of the tariff from 70.68 €/MWh to values below 49.00 €/MWh by being active market players, in addition to significant savings in grid access costs for being part of a CEC. Also, the imbalance costs have a low weight compared with the energy cost. Consumers may reduce their tariffs from 111.93 €/MWh to 72.53 €/MWh, a reduction of around 35%, by being part of a CEC and active market players. Furthermore, their cost of electrical energy may be reduced significantly further if they invest in self-consumption.

The proposed strategic bidding already leads to high forecast accuracy and low imbalance costs, so the CEC has no incentive to invest in storage capacity for self-control of its consumption. However, future power systems with majority penetrations of variable renewable energy sources (vRES) may need the flexibility of demand players to guarantee the security of supply. Against this background, power systems shall design economically attractive demand response programs to incentivize demand-side flexibility. However, if consumers with self-consumption (prosumers) and/or distributed generation become members of the CEC, the forecast accuracy of the methodology may decrease, which can increase the need for storage solutions or for self-regulation of consumption to avoid the payment of high penalties. If self-consumption is considered, the CEC will not pay the GEIC costs, reducing its costs with grid access and usage from 28.90 €/MWh to 16.55 €/MWh.

The present study does not consider a change in individual consumer behavior, which may become more conscious and active when consumers are part of a community. With increasing levels of distributed generation and local storage, such as solar photovoltaics and electric vehicles, the tendency is for the distribution grid to grow in importance and for large-scale power plants on the transmission grid to be retired. To guarantee the security of supply and the security standards of the energy dispatched to/from the transmission grid, local distribution system operators may rely on local consumption flexibility to avoid outages. In power systems with nearly 100% renewable generation, imbalances may be solved locally, avoiding the need for large-scale fossil fuel power plants providing reserves to balancing markets. So, CECs are important as BRPs of current and future power systems. Their main problem is their lack of experience in participating in electricity markets. Local consumers may therefore be aggregated as a community, obtain bargaining power, and then participate in retail competition to avoid being divided among the portfolios of several retailers. However, retailers request substantial market premiums while negotiating long-term bilateral agreements [18]. CECs need to be more active as market players than as part of retailers' portfolios. So, the cost-benefit of being an active/passive consumer of an active/passive CEC may be considered.

In conclusion, it is economically beneficial for passive consumers to be part of an active CEC, considering savings of around 35% concerning retail tariffs, which may increase if consumers have self-consumption and flexibility.

#### **5. Conclusions**

This article has presented an overview of the European balancing and imbalance settlement markets. Furthermore, it has presented a strategic bidding process for Citizen Energy Communities (CECs) being active market players, by submitting bids on spot day-ahead (DAM) and intraday markets (IDMs).

The strategic bidding process uses two different hybrid forecast methodologies: a forced forecast for DAM bids and a short-run trend of the expected consumption behavior of the CEC members for IDM bids. The article has also presented a case study to evaluate the CECs' strategic bidding process in spot markets by using real data from the Iberian electricity market (MIBEL) in 2019 and from Portuguese consumers in 2012, extrapolated to 2019. The model was tested by considering a CEC composed of 312 real medium voltage consumers. Results from the study confirm that large, diversified aggregations of demand lead to high forecast accuracy. Furthermore, they confirm that passive consumers economically benefit from being part of CECs, considering tariff incentives and lower wholesale market prices. Indeed, the study showed that consumers save around 35% in electrical energy costs by being part of a CEC. Furthermore, their savings can increase if they invest in self-consumption. Moreover, the operation and outcomes of CECs can be improved by having storage assets and flexible consumers, contributing to the local balance of the power system. Indeed, towards a carbon-neutral society, power systems may speed up the replacement of large-scale fossil fuel power plants by renewable distributed (small-scale) and transmission-connected (large-scale) generation if consumers play an active role in the power system balance.

The main issues of CECs being active market players are the volatility of spot prices and the uncertain consumption of their members. They can mitigate the price risk by establishing medium to long-term bilateral agreements in wholesale markets. Furthermore, they can mitigate the quantity risk by signing demand response contracts with members and/or investing in storage solutions.

Future work will study how the strategic bidding model can be adapted to prosumers and distributed generators as members of the CEC, and how to deal with flexibility considering demand response and storage assets. Moreover, the benefits for CECs of being active market players versus being just part of retailers' portfolios will be analyzed.

**Funding:** This work has received funding from the EU Horizon 2020 research and innovation program under project TradeRES (grant agreement No 864276).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The real consumption dataset of the consumers can be found in an online repository at https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#. The market results of the Iberian market of electricity are available at https://www.omie.es/pt/market-results/daily/daily-market/daily-hourly-price. The market results of the Portuguese balancing markets and imbalance settlement are available at https://www.mercado.ren.pt/EN/Electr/MarketInfo/MarketResults/Pages/default.aspx. The Portuguese tariffs of electrical energy can be found at https://www.erse.pt/en/activities/market-regulation/tariffs-and-prices-electricity/. All data were accessed on 21 November 2022.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Abbreviations**


#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Experimental Design for the Propagation of Smoldering Fires in Corn Powder and Cornflour**

**Ana C. Rosa 1,2,\*, Ivenio Teixeira 1, Ana M. Lacasta 3, Laia Haurie 3, Carlos A. P. Soares 4, Vivian W. Y. Tam <sup>5</sup> and Assed Haddad <sup>6</sup>**


**Abstract:** Corn is an example of an agricultural grain with a specific combustibility level and can promote smoldering fires during storage. This paper conducts an experimental design to numerically evaluate how three parameters, namely particle size, moisture, and air ventilation, influence the smoldering velocity. The work methodology is based on Minitab's experimental design, which defined the number of experiments. First, a pile of corn is heated by a hot plate and a set of thermocouples registers all temperature variations. Then, a full-factorial experiment is implemented in Minitab to analyze the smoldering, which provides a mathematical equation to represent the smoldering velocity. The results indicate that particle size is the most influential factor in the reaction, with 35% and 45% variation between the dried and wet samples. Moreover, comparing the influence of moisture between corn flour and corn powder samples, a variation of 19% and 31% is observed; additionally, analyzing the ventilation as the only variant, we noticed variations of 15% and 17% for dried and wet corn flour, and 27% and 10% for dried and wet corn powder. Future studies may use the experimental design of this work to standardize the evaluation methodology and more effectively evaluate the relevant influencing factors.

**Keywords:** experimental design; corn; experiments; Minitab; smoldering velocity

#### **1. Introduction**

Smoldering is a term used to define the process of flameless burning within the material pores, with slow and low-temperature reactions [1], which is quite common in the storage of agricultural materials [2]. It can be defined as a process composed of two steps: pyrolysis and oxidation. The heat released by the oxidation step feeds the pyrolysis step, and if the pile height of the stored material is large enough not to dissipate heat and keep it stored, the reaction may be sustained for days or weeks. According to Ohlemiller [3], smoldering constitutes a severe fire hazard for two reasons: smoldering yields a higher fuel conversion to toxic compounds and allows a pathway to flaming combustion. A combustible powder or dust can react with oxygen and propagate the reaction without flaming, with velocities of mm/hour or cm/hour, evolving to glowing, flaming, or even explosive combustion [4]. Therefore, comprehensive knowledge of smoldering is essential to prevent facility accidents.

Some materials are susceptible to smoldering hazards triggered by self-heating or by an external source during storage [5]. Some grains, such as corn, have characteristics that

**Citation:** Rosa, A.C.; Teixeira, I.; Lacasta, A.M.; Haurie, L.; Soares, C.A.P.; Tam, V.W.Y.; Haddad, A. Experimental Design for the Propagation of Smoldering Fires in Corn Powder and Cornflour. *Eng* **2023**, *4*, 15–30. https://doi.org/ 10.3390/eng4010002

Academic Editor: Antonio Gil Bravo

Received: 17 October 2022 Revised: 8 December 2022 Accepted: 19 December 2022 Published: 24 December 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

may develop a smoldering reaction due to their level of combustibility. The physical and chemical features of the material and external conditions directly influence smoldering development [6]. The most evaluated characteristics of these materials are particle size and moisture; their variations may increase or decrease the propagation speed rate [2,7].

Although smoldering combustion is a common hazard in agricultural storage, there are few studies regarding the smoldering propagation phenomena and a lack of studies on agricultural materials, especially corn grains. Most studies focus on other materials with a higher level of combustibility, such as sawdust, biomass, foam, and coal. This paper aims to fill this gap by developing an upward smoldering reaction in a corn pile. The novelty of this work lies in proposing a framework for a design of experiments (DoE) to understand the variables' impact and characterize the smoldering velocity inside the pile. Since the reaction propagation depends on material characteristics and external conditions, this work evaluates the influence of three factors in a full factorial experiment: particle size, moisture content, and air ventilation. The present study investigates how granularity, moisture, and ventilation affect smoldering fire propagation. In order to make the experiments reliable and reproducible, the DoE was designed and developed via Minitab software, which offers a statistical representation for the extrapolation of the experiments and provides a mathematical equation in the three influencing variables. This, in turn, allows the smoldering velocity to be determined for different conditions.

#### **2. Theoretical Foundation**

Faced with the constant risk in industrial facilities, researchers have perceived the need to assess the hazard level of particulate materials igniting and developing smoldering or flaming combustion. Furthermore, if accumulated on surfaces with high temperatures, even thin layers of particulate material, such as powder or dust, may undergo thermal ignition, further leading to smoldering [8]. Over the years, this subject has been approached both theoretically and experimentally. Palmer [9] was one of the pioneers: he experimentally investigated the evolution of smoldering propagation within deposits of cork dust, wood dust, and grass dust up to 85 cm deep and measured the smoldering velocities and temperatures along the sample height. Another important study was carried out by Leish et al. [10], who investigated the spread of smoldering in grain, grain dust (corn and soybean), and wood dust under the influence of forced horizontal airflow.

As depicted by Ogle [11], smoldering requires four necessary conditions: a porous fuel that forms a solid char, an oxidizer, an ignition source with a minimum ignition energy, and a minimum deposit thickness providing the thermal insulation needed to store the energy released by oxidation, which increases the temperature within the fuel mass. Therefore, any change in these four conditions or in the external conditions will affect the smoldering propagation rate [12]. For this reason, many authors have studied variations in the characteristics of both the stored materials and the external conditions. The essential features studied are particle size, moisture content, sample height, and oxygen supply [13–15]; their variations may increase or decrease the propagation speed. For example, El-Sayed and Abdel-Latif [4] investigated the critical temperature and critical flux for igniting a layer of corn flour dust and a mixture of wheat flour and corn flour (80% wheat flour + 20% corn flour) on a hot plate. In another work, El-Sayed and Khass [13] determined the minimum hot surface temperature for dust ignition, the ignition temperature of the dust itself, and the ignition times, and evaluated the effects of the dust particle size and the rice sample size on the ignition parameters. Chunmiao et al. [16] proved in their studies that increasing the particle size of magnesium powder from 6 to 173 μm increased the minimum ignition temperature, and that increasing the thickness of the dust layer decreased this temperature. Sesseng et al. [17] performed experiments with wood chips to evaluate how granularity affects smoldering fire. All the aforementioned studies and other similar works elucidated the importance of studying the particle size and the minimum temperature at which dust layers or material piles ignite and can lead to smoldering fires.

Regarding the experiments assessing the spread progress, only a limited number of studies addressed smoldering propagation in fuel beds; these can be divided according to the location of the ignition source and the direction of the spread: upward and downward. Palmer's study [9] concluded that sustained upward smoldering could be obtained inside dust deposits up to 85 cm deep, with the propagation time approximately proportional to the square of the depth of the dust. Torero and Fernandez-Pello [18] conducted an experimental study of upward smoldering of polyurethane foam and evaluated the smoldering velocity and reaction temperature as a function of the fuel height. He and Behrendt [19] theoretically compared natural upward and downward smoldering of piled sawdust char powder. Their results showed that upward smoldering was more than ten times faster and that the temperature inside the fuel bed was significantly higher than in downward smoldering. They then experimentally investigated the natural smoldering of wood char granules in a packed bed and concluded that downward smoldering was stable.

In contrast, upward smoldering was affected by many factors, such as the fuel bed height, particle size, and ambient conditions [20]. He et al. experimentally investigated the effects of fuel type, moisture content, and particle size on the natural downward smoldering of biomass powder. Hagen et al. [21] studied the ignition temperature for smoldering in cotton at several densities, both experimentally and theoretically, and showed that the ignition temperature decreases with increasing density.

Over the years, the mechanisms that affect the spread of smoldering fires have been studied. Agricultural materials are studied in smaller proportions compared with the other materials mentioned above because they are not considered highly combustible. However, this does not rule out smoldering during storage. Therefore, the present study aims to experimentally evaluate a particular corn pile while varying three variables used in previous works with other materials: two material variables, particle size and moisture, and an external variable, the air ventilation over the sample. For the analysis, a full factorial experiment was carried out via Minitab (v. 17). Three influencing variables were chosen to evaluate the smoldering propagation velocity, and each factor was varied to identify the most influential one.

#### **3. Experimental Methodology**

The methodology of this work consisted of five steps: planning, preliminary, performance, graph plots, and Minitab evaluation, as exemplified in Figure 1.

#### **Figure 1.** Work methodology.

#### *3.1. Smoldering Schematic Setup*

The experimental setup adopted in this study is based on previous works that investigated the upward smoldering process with other combustible materials on a similar experimental setup [22–24], depicted in Figure 2.

The setup comprises an electric panel, a hot plate, a cylindrical reactor, eight thermocouples, and a data logger. The reactor consists of a perforated cylindrical steel structure, 80 mm in diameter (D) and 120 mm in height (L). The hot plate is controlled by the electric panel, which is set to deliver a power of 550 W, raising the hot plate temperature to approximately 350 °C. Preliminary experiments without the electric panel showed that the results could be affected by temperature fluctuations; the panel therefore ensures that there are no significant variations in hot plate temperature throughout the experiment.

During the tests, temperatures at different heights along the central axis of the reactor are measured with seven type K thermocouples, placed 3, 15, 30, 45, 60, 75, and 90 mm above the hot plate; an eighth thermocouple is positioned directly on the hot plate. After the temperatures are recorded, the smoldering velocities and the maximum temperature reached in the burning process are determined.
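Since the smoldering velocity is derived from these records, a minimal sketch of how a front velocity can be estimated from thermocouple arrival times is given below. The front-arrival criterion (first crossing of an assumed 250 °C threshold) and the toy data are illustrative assumptions; the paper does not prescribe this exact criterion.

```python
import numpy as np

def smoldering_velocity(times_s, temps_c, heights_mm, t_front=250.0):
    """Estimate the front velocity (mm/h) from thermocouple records.
    times_s: sample times [s]; temps_c: (n_times, n_thermocouples)
    temperatures [C]; heights_mm: thermocouple heights above the plate.
    The front 'arrives' at a thermocouple at its first crossing of t_front."""
    arrivals = []
    for j, h in enumerate(heights_mm):
        i = int(np.argmax(temps_c[:, j] >= t_front))
        if temps_c[i, j] >= t_front:        # skip heights never reached
            arrivals.append((times_s[i], h))
    (t0, h0), (t1, h1) = arrivals[0], arrivals[-1]
    return (h1 - h0) / (t1 - t0) * 3600.0

# Toy data: a front moving upward at ~20 mm/h past the seven heights.
t = np.arange(0, 5 * 3600, 60.0)
heights = np.array([3, 15, 30, 45, 60, 75, 90])      # mm, as in the setup
temps = 20 + 330 / (1 + np.exp(-6 * (t[:, None] / 3600 - heights / 20)))
print(round(smoldering_velocity(t, temps, heights), 1))  # ~20.0 mm/h
```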

#### *3.2. Material Preparation*

In order to carry out the experimental plan, corn grains produced in Spain were used; they were crushed and sieved to different diameters. Two samples were then selected, and their particle diameter and particle distribution were estimated by laser diffraction (Table 1).


**Table 1.** Particle distribution of the corn materials.

Before the onset of the experiments, the material was split between two environments to obtain two levels of moisture content. The first environment was a climatic chamber at a temperature of 16 °C, which maintained a moisture content of 15%. The other was a desiccator at a temperature of 100 °C, which eliminated the moisture content of the corn samples. All samples were kept in these environments for at least three days.

#### *3.3. Design of Experiments*

The objective of the experimental design is to perform a sequence of tests with some changes in the input variables of a process, which allows one to observe and identify corresponding changes in the output response. The core of this work was developed based on DoE defined by Minitab. As shown in Figure 3, its procedure is envisioned as a combination of influencing factors affecting a process and transforming an input material into an output product.

**Figure 3.** Schematic process of Minitab.

Sixteen experiments were performed, varying the three chosen influencing factors. For each particle diameter, two moisture content conditions were evaluated (referred to as dried and wet), with and without a ventilation system. Table 2 shows the eight experimental configurations, each of which was conducted twice; the design matrix is enumerated in the sketch below.
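A minimal sketch of how such a 2³ full-factorial design with two replicates can be enumerated follows; the factor labels mirror the paper's naming, and everything else is illustrative.

```python
from itertools import product

factors = {
    "particle size": ["corn flour", "corn powder"],
    "moisture": ["dried", "wet"],
    "ventilation": ["without", "with"],
}
replicates = 2  # each of the 8 combinations is run twice -> 16 experiments

design = [dict(zip(factors, combo))
          for combo in product(*factors.values())
          for _ in range(replicates)]
print(len(design))   # 16
print(design[0])     # first run of the first combination
```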



#### **4. Results and Discussion**

#### *4.1. Smoldering Combustion Analysis*

Many specific propagation behaviors were noticed during the experiments, similar to those observed by other authors [19,22]. Figure 4 depicts a sequence of images taken during a test with the corn flour sample. First, the fuel material was gently packed into the cylindrical reactor, avoiding compaction of the mass on the hot plate, and the thermocouples were connected (A). After the onset of heating, some displacement, a contraction at the bottom, and cracks on the top indicated the progress of the drying step and the shrinkage of the material (B). Once the plate heating started, the heat generated by the plate began to heat the first layer by conduction; the following layers were then heated by the previous layers as the reaction front moved upwards. The smoldering process can be explained by three basic steps: pre-heating and drying of the corn material, a pyrolysis reaction transforming the fuel into char, and a char oxidation reaction releasing more heat to the sample material. After 4 h, the electrical panel was turned off and the heating source was interrupted. If the heat generated by the oxidation step is sufficient, heat propagation continues and the upward smoldering process remains active. However, the smoldering process ceases if the heat released is not higher than the heat absorbed by the heating and pyrolysis steps. As illustrated in some works [17], a glowing mass may appear during the smoldering process, which was also observed during some experiments (C), especially those with dry corn flour. Additionally, smoke (D) was observed during the whole process. After the burning process, part of the material turned to ashes and char residue (E) or, if the burning was incomplete, remained as unburnt material.

**Figure 4.** The sequence of images taken during the experiments.

As previously mentioned, seven thermocouples measured the evolution of the temperatures inside the corn bed. For each case, a graph was plotted indicating the temperature variation along the central axis of the reactor. As the heating, pyrolysis, and oxidation steps progressed, the heat propagated toward the top of the sample, and temperatures at different heights of the corn bed were recorded. The graphs thus exhibit the temporal variation of the seven thermocouple readings within the sample, from which differences and patterns could be observed.

Figure 5 displays four graphs showing the temporal evolutions registered in the tests with corn flour; each graph corresponds to a different combination of moisture content and ventilation condition. Figure 5A,B show the temporal evolutions of the thermocouple temperatures obtained in experiments with dried (CF-Dried) and wet corn flour (CF-Wet), while Figure 5C,D show the experiments with the air ventilation system for dried (CF-Dried-V) and wet corn flour (CF-Wet-V).

**Figure 5.** Temperature evolution of CF-Dried (**A**), CF-Wet (**B**), CF-Dried-V (**C**), and CF-Wet-V (**D**).

In all cases, three specific zones can be identified. First, the pre-heating and drying zone is characterized by roughly constant temperatures around 100 °C, followed by a temperature increase associated with the pyrolysis step. Finally, the sustained high temperatures due to the oxidation step ensure the smoldering propagation. Comparing the wet and dried samples, the pre-heating zone of the dried samples is apparently shorter than that of the wet ones. This difference is more pronounced in the layers of the corn flour bed closer to the top. After the pre-heating step, the propagated heat feeds the pyrolysis reaction in the lower layers. As the reaction rises toward the top, the layers underneath the top initiate the oxidation step. The temporal evolution of the temperatures develops faster in the dried samples. However, the wet samples remain burning for longer.

Some researchers describe smoldering as a slow process, i.e., temperatures within a pile of particulate material take time to reach the point at which the heat is sufficient to self-sustain the propagation. For this reason, the corn flour layers packed in the cylindrical structure took a long time to reach the temperature needed to initiate ignition and maintain the burning process. With the addition of the ventilation system to the samples at the two moisture levels, a slight reduction in heat propagation is observed, which slows the temperature rise along the central axis of the fuel bed. In the initial pre-heating and drying phase, air ventilation influences the propagation time of the wet sample (CF-Wet-V), extending its drying phase. Another critical point is the extension of burning after switching off the heating plate. Ventilation extends the burning time at both moisture levels, by approximately 1 h for dried corn flour (CF-Dried-V) and nearly 2 h for wet corn flour (CF-Wet-V).

At the sample's middle point (60 mm), the temperatures of the CF-Dried and CF-Wet samples only reached the temperature of the hot plate at 2 h and 19 min, and 2 h and 42 min, respectively. With the addition of ventilation, these times increase to 2 h and 38 min and 3 h and 21 min. For this particle size, a heat-dissipating effect is noted rather than a reaction acceleration due to the additional oxygen supplied. Another point that can be compared and evaluated is the point located 90 mm away from the heating plate, where the last thermocouple is placed. Regarding the influence of the ventilation system, there is a one-hour increase in the burning process for the dried corn flour sample and a one-hour-and-thirty-minute increase for the wet corn flour sample.

The graphs of corn flour exhibit differences in the maximum temperature achieved and in the extension of the smoldering process. Although CF-Dried (A) presents higher temperatures, CF-Wet-V (D) shows the most extensive smoldering propagation. After the hot plate shutdown, smoldering propagation continued up to the top of both samples, which indicates that all the fuel material was consumed.

Similar to what is presented and discussed above, Figure 6 also shows the temporal variation of temperature along the central axis of a corn powder bed, whose particle diameter is larger than that of the corn flour samples. Figure 6A,B show the temporal evolutions of the thermocouple temperatures obtained in experiments with dried (CP-Dried) and wet corn powder (CP-Wet), while Figure 6C,D show the experiments with the air ventilation system for dried (CP-Dried-V) and wet corn powder (CP-Wet-V).

These tests also present the pre-heating and drying, pyrolysis, and oxidation phases, similar to the corn flour tests. The dried samples (CP-Dried and CP-Dried-V) show a less-defined drying phase than the wet samples (CP-Wet and CP-Wet-V). After the drying phase, unlike the corn flour samples, the temperatures of these corn powder samples increase slowly until they reach the hot plate temperature. Due to the larger particle size, the heat does not propagate at the same velocity as in corn flour. In the case of dried corn powder, smoldering spread only up to 60 mm and 75 mm, while in the case of wet corn powder, the spread did not reach half of the fuel bed. This indicates that the upper layers of the fuel bed cannot sustain the reaction because the heat generated by the oxidation step is not sufficient to maintain the pyrolysis reaction there. The addition of the air ventilation system does not extend the samples' burning time, unlike what was observed in the corn flour samples (CF-Dried-V and CF-Wet-V). However, the air ventilation affects the rise in temperatures in the CP-Dried-V case, making it difficult to reach the temperature of the hot plate, while in the CP-Wet-V case there is very little influence of this factor. Therefore, the dissipative effect of the heat has more impact on the dried sample and is almost negligible in the wet sample.

**Figure 6.** Temperature evolution of CP-Dried (**A**), CP-Wet (**B**), CP-Dried-V (**C**), and CP-Wet-V (**D**).

Evaluating the sample's middle point for the corn powder graphs, the temperatures of the CP-Dried and CP-Dried-V cases only reach the hot plate temperature at 3 h and at 3 h and 47 min, respectively. However, the middle point of the wet samples does not reach the hot plate temperature, only reaching approximately 200 °C.

The graphs show that the smoldering ceases before it reaches the sample top. Comparing the two cases with dried samples (CP-Dried and CP-Dried-V), we observe that the burning time is approximately the same, ending almost 7 h after the hot plate shutdown. However, the height attained by the smoldering propagation differs by at least 15 mm, i.e., the distance between thermocouples T5 and T6. In the cases with wet samples (CP-Wet and CP-Wet-V), the smoldering process extends 45 mm upwards and remains burning for almost 4 h after the plate has been turned off.

These results exhibit similarities in propagation and extension of the smoldering, indicating the low influence of ventilation on this particle size. Although the CP-Dried sample (A) has the highest temperatures and longest burning time, the combustion process is incomplete, and the reaction does not entirely consume the corn powder.

The propagation rate of a smoldering process is much lower than that of a flaming combustion process. In the tests performed, the corn bed layers take a long time to reach the hot plate temperature, but they generate enough heat to continue feeding the process. Half of the sample takes more than 2 h to exceed the hot plate temperature, as with the CF-Dried sample, and more than 3 h with CP-Dried. In addition, the wet corn flour samples take a long time to reach temperatures close to the hot plate's, as they remain longer in the drying phase. In contrast, wet corn powder samples do not develop smoldering in the layers located in the middle of the sample. Thus, the smoldering propagation is complete only for the corn flour samples, while the corn powder develops a partial smoldering.

The velocity at which smoldering propagates was determined from the data gathered from the temperature evolution graphs. First, for each thermocouple positioned along the central axis of the fuel bed, the time taken to reach a temperature of 350 °C was determined, as illustrated by the sketch below. Then, a regression line was fitted through the points of all thermocouples, and its slope quantifies the smoldering velocity. This process was repeated for each test, yielding the smoldering velocity for each condition.
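This velocity extraction reduces to a simple linear fit. The sketch below shows the idea; the arrival times are hypothetical values for illustration, not measurements from this study.

```python
import numpy as np

# Thermocouple heights above the hot plate (mm), as used in the reactor
heights = np.array([3, 15, 30, 45, 60, 75, 90])

# Hypothetical times (min) at which each thermocouple reached 350 °C;
# in practice, these are read off the recorded temperature curves
t_350 = np.array([22, 48, 80, 112, 145, 178, 210])

# Linear regression of height vs. time: the slope is the smoldering velocity
slope, intercept = np.polyfit(t_350, heights, 1)
print(f"Smoldering velocity: {slope:.2f} mm/min")
```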

Comparisons of the smoldering velocities acquired by varying the three adopted factors show that the smaller the particle size, the greater the smoldering velocity, as depicted in Figure 7. The corn flour propagation rates for the dried and wet cases are 0.55 and 0.45 mm/min. Meanwhile, the velocities of the corn powder samples reach 0.36 and 0.25 mm/min, respectively. A comparison between the two particle sizes at the same moisture content shows a variation of 35% for the dried samples and 45% for the wet samples. When comparing the influence of moisture, a variation of 19% for corn flour and 31% for corn powder is observed, indicating a more significant impact of this factor on corn powder samples. The air ventilation reduces the velocities of all samples. The corn flour samples, CF-Dried-V and CF-Wet-V, show velocity values of 0.47 and 0.37 mm/min, and the corn powder samples, CP-Dried-V and CP-Wet-V, registered 0.26 and 0.23 mm/min. Ventilation thus leads to 15% and 17% variations for dry and wet corn flour, and 27% and 10% for dry and wet corn powder. Although adding an airstream can provide more oxygen and thus promote a more intense combustion scenario, the tested cases indicate the opposite, showing a slower propagation velocity.
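For reference, the percentage variations quoted above appear consistent (to within rounding of the reported velocities) with relative differences taken with respect to the faster case; a minimal check, assuming that convention:

```python
def variation_pct(v_fast, v_slow):
    """Relative difference (%) with respect to the faster velocity."""
    return 100 * (v_fast - v_slow) / v_fast

# Particle-size effect: corn flour vs. corn powder at equal moisture
print(f"{variation_pct(0.55, 0.36):.0f}%")  # dried: ~35%
print(f"{variation_pct(0.45, 0.25):.0f}%")  # wet: ~44%, close to the reported 45%
# Ventilation effect on dried corn flour
print(f"{variation_pct(0.55, 0.47):.0f}%")  # ~15%
```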

**Figure 7.** Smoldering velocities of the samples with different particle diameters, moisture contents, and ventilation conditions.

Rein [1] pointed out that a smoldering reaction usually spreads at around 1 mm/min, despite variations in the chemical and physical properties of the fuel, which is far slower than flame propagation. The experiments performed in this work present slightly lower smoldering velocities, between 0.23 mm/min and 0.55 mm/min. Although these values are lower than the value Rein pointed out, smoldering remains a hazard when storing corn materials.

As previously described in this work and by several authors, the smoldering process is known to have a slow burn propagation compared with flaming combustion. For this reason, temperatures along the central axis of the fuel bed take a long time to reach the hot plate temperature and then to sustain the reaction through heat generation and storage. The temporal evolution of the temperatures inside the corn bed presents a behavior very similar to that shown by Hagen et al. in their experiments on cotton samples, which required an extended period for smoldering development [21]. In the first minutes of the experiments, the graphs show a drying step with little temperature change, followed by the pyrolysis and oxidation steps with a rapid temperature increase sustained even after the hot plate shutdown. The temperature decay finally indicates the termination of the char oxidation.

Inside the corn fuel bed, the heat generated by the hot plate is responsible for the beginning of the drying of the sample material, then for the pyrolysis reaction in the first layers at the bottom of the sample, and finally for the oxidation reaction that releases more heat into the corn material. The smoldering process requires sufficient heat to turn all the material into reactive char for the oxidation step. As the reaction spreads upwards, the heat generated by the hot plate and the pyrolysis step is stored, and thus the smoldering process continues to transfer heat to the upper layers. If the heat generated by the hot plate and the oxidation step of the lower layers is insufficient, the process ceases. Thus, some of the experiments conducted in this work developed complete smoldering, while others exhibited partial smoldering because the heat stored in the material was insufficient to continue feeding the pyrolysis step.

#### *4.2. Minitab Analysis*

The tests of this work were elaborated according to a DoE in Minitab to evaluate the smoldering velocity behavior in a corn bed. As previously mentioned, a full factorial experiment was selected, and three influencing factors were defined as the variables affecting the smoldering velocity. Each of these factors received two values. In addition, it was determined that each test would be reproduced twice. Therefore, the minimum number of tests required for this full factorial experiment was sixteen. Table 3 indicates the values of the selected factors, the smoldering velocity, and the smoldering level. From the tests and the graphs of the temporal evolution of the thermocouple temperatures, not all tests showed complete smoldering up to the top of the corn bed. Only the corn flour samples developed a complete reaction. At the end of the tests, it could be visually verified that some samples were burned entirely, leaving only ash remaining. In contrast, others were partially burnt, with char and unburnt material as the residue.


**Table 3.** Experiment results.

The tests were performed, varying the selected factors according to the specified values. After running the tests and calculating the smoldering velocity in each case, all data were inserted into Minitab to continue with the DoE analysis, whose design matrix can be generated as sketched below. Afterward, it was possible to produce graphs in this software to analyze the influencing factors and the smoldering velocity.
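The 2³ full factorial design with two replicates is straightforward to enumerate programmatically; a minimal sketch, where the factor labels follow the text but the exact numeric levels are omitted:

```python
from itertools import product

# Two levels per factor, labeled as in the text
diameters = ["corn flour", "corn powder"]  # small vs. large particle size
moistures = ["dried", "wet"]               # low vs. ~15% moisture content
ventilation = ["off", "on"]                # no airflow vs. forced airflow

runs = list(product(diameters, moistures, ventilation))  # 8 combinations
design = runs * 2  # two replicates of each combination -> 16 tests

for i, (d, m, v) in enumerate(design, 1):
    print(f"Run {i:2d}: {d}, {m}, ventilation {v}")
```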

The data set inserted in Minitab assisted in formulating a mathematical equation characterizing the smoldering phenomenon in corn material with the three chosen variables. However, as the linear regression model is not always appropriate for a given data set, it is advisable to evaluate the model's suitability by examining residual plots. One of the first graphs produced in Minitab was therefore the set of residual plots, which indicate the quality of the dataset for specifying the smoldering velocity (Figure 8).

The normal probability plot of these residuals is one of the four graphs presented in Figure 8. Most of the points in this plot fall along the straight line; some deviate slightly from it, indicating only minor problems with the normality assumption. Still, no severe abnormality in the data set is suspected. Therefore, the data can be considered satisfactory for the analysis. Another residual plot is the histogram, which shows a distribution similar to a Gaussian curve. This graph allows us to conclude that the number of replicated tests is sufficient to guarantee that a normal distribution can represent the smoldering process. Finally, the last two graphs, of residuals versus fitted values and residuals versus observation order, do not reveal any unusual or diagnostic pattern.

Figure 9 shows the Pareto chart of the standardized effects, which evaluates the effects of each influencing factor on the smoldering velocity by comparing the relative magnitude and the statistical significance of both main and interaction effects. The standardized effect compares the t-statistic of each factor to the value corresponding to the error. The chart defines the particle diameter as factor A, the moisture content as factor B, and the air ventilation system as factor C, and evaluates both the effects of each factor alone and the effects of factors acting together. The factor with the most significant influence on smoldering propagation is the particle diameter, followed by the moisture content of the material. It is worth emphasizing that the impact of the particle size is almost twice as large as the influence of the moisture content. Additionally, according to the Pareto chart, the combined effect of two or more factors is not very relevant. This can also be verified in the set of interaction plots for the smoldering velocity (Figure 10), where no significant interaction between factors is apparent.

**Figure 8.** Residual plots for smoldering velocity.

**Figure 9.** Pareto chart of the standardized effects.

Figure 10 presents a set of three graphs with interactions between two factors (particle diameter versus moisture content, particle diameter versus air ventilation, and moisture content versus air ventilation). An absence of interactions between the factors can be observed in all three plots. Consistent with the temporal evolution of the temperature graphs, this factor interaction graph indicates that small particle diameters, low moisture content, and low air ventilation provide higher smoldering propagation velocities. Additionally, it can be observed that when the particle diameter or the moisture of the material reaches higher values, the two ventilation levels tend to produce closer smoldering velocities.

Figure 11 depicts the contour plots showing the relationship between pairs of factors. Similar to the interaction plot, this graph also analyzes the interactions of two factors on the desired response, the smoldering velocity. Moreover, this set of plots presents a color gradient indicating ranges with different velocity values, which allows the evaluation of the extent of each range and even the specification of other cases between the limits of each level of the influencing factors. For example, the moisture content versus particle diameter plot shows more bands than the other two contour plots. This can be explained by the fact that these two factors have the strongest effect on the velocity.

**Figure 11.** Contour plots of smoldering velocity.

These contour plots can be used to analyze how variations between the factors' levels influence the smoldering and to predict the best and worst cases, where the smoldering propagation presents its smallest values. The best case lies in the dark blue band, with larger particle diameters, higher moisture content, and higher air ventilation levels, which exhibits the areas with the lowest velocities. Additionally, it is also possible to find other values of the factors that fit in the same band. In the three graphs, we can see that slight variations along the particle diameter axis lead to significant velocity changes and thus shift the velocity bands. However, when evaluating the axes of the other two factors (moisture and air ventilation), only a few velocity bands cover almost all values between the upper and lower levels of these two factors.

The moisture versus diameter plot has both lower and higher velocity ranges, below 0.25 mm/min and above 0.5 mm/min. Meanwhile, the ventilation versus moisture plot has fewer and broader bands than the other graphs. As can be noticed in the first contour plot, the same velocity below 0.25 mm/min is obtained with a moisture level above 10% and a particle diameter slightly smaller than 0.5 mm. The following bands are broader and include moistures varying from 0% to 15% and diameters greater than 0.300 mm. In the second and third contour plots, fewer bands appear, and the blue bands are more vertical, i.e., for the same value on the *x*-axis, the same velocity is obtained when the values on the *y*-axis change. The ventilation versus diameter plot shows that, for diameters above 0.3 mm, the ventilation may vary within the specified limits between 0 and 0.1 m/s and the smoldering velocity will remain in the same band. The ventilation versus moisture plot indicates that, with moisture above 7%, the air ventilation can likewise vary between its lower and upper levels. However, only the first plot allows the identification of the factor combination yielding the lowest smoldering velocity.

In addition to the plots, running the experimental design in Minitab provides a linear regression equation (Equation (1)) based on the data fed into the software. The equation represents the contributions of the three influencing factors (Diameter, D; Moisture, M; Air ventilation, V) along with the three two-factor interactions (DM, MV, DV) and the three-factor interaction (DMV). This linear regression equation provided by the full factorial experiment can therefore be used to represent the smoldering process: considering these factors, the smoldering rate of the corn material can be obtained for any set of values of these parameters. This can be helpful for future studies investigating the smoldering process with this material under similar conditions.

$$V = 7.445 \times 10^{-2} - 7.740 \times 10^{-2}\,D - 0.380 \times 10^{-3}\,M - 0.790 \times 10^{-1}\,V - 0.960 \times 10^{-3}\,DM + 2.60 \times 10^{-2}\,DV - 0.273 \times 10^{-2}\,MV + 1.160 \times 10^{-2}\,DMV \tag{1}$$
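Equation (1) can be wrapped in a small helper for predictions. The sketch below transcribes the coefficients verbatim; note that the factor units are an assumption based on the levels appearing in the contour plots (diameter in mm, moisture in %, ventilation in m/s), and that the original notation uses V for both the response velocity and the air ventilation factor.

```python
def smoldering_velocity(D, M, V):
    """Predicted smoldering velocity from Equation (1).

    D: particle diameter, M: moisture content, V: air ventilation.
    Coefficients transcribed from Equation (1); units are assumed to
    follow the factor levels of the experimental design.
    """
    return (7.445e-2
            - 7.740e-2 * D
            - 0.380e-3 * M
            - 0.790e-1 * V
            - 0.960e-3 * D * M
            + 2.60e-2 * D * V
            - 0.273e-2 * M * V
            + 1.160e-2 * D * M * V)

# Example: small particle diameter, dried material, no ventilation
print(smoldering_velocity(D=0.3, M=0.0, V=0.0))
```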

In this work, three factors were analyzed to evaluate the development of the smoldering process. First, the smaller the particle size, the larger the surface area and the greater the oxygen access for the oxidation step. Therefore, the smoldering reaction in corn flour samples develops faster than in corn powder samples, which agrees well with other experiments in the literature demonstrating that granularity significantly affects the smoldering dynamics [17].

Second, regarding the material moisture, the dried samples do not need to absorb the initial heat of the drying stage. Instead, this heat is conducted to the next step (the pyrolysis stage), which promotes higher temperatures throughout the corn bed. Correspondingly, the work performed by Huang et al. [14] also concluded that smoldering spread decreases with moisture content and that, above a specific threshold, the experiment exhibited an incomplete burning reaction. Furthermore, another work also corroborates that increasing the moisture content reduces the propagation of the reaction during the drying step [23].

Third, the air ventilation system applied in the experiments dissipated more heat than it supplied oxygen to the oxidation reaction, reducing the propagation velocity. Experiments by Urban et al. [24] used an airflow of 0.5 m/s above the fuel bed and demonstrated the importance of this factor and how it can affect the ignition process and establish a smoldering process.

The plots obtained from the experimental design indicate the influence and interactions of each adopted factor, making it possible to evaluate all the variations and impacts of the factors on the smoldering velocity. For example, the particle size has more influence on the smoldering velocity than the other two factors. This work focused on applying an experimental design to investigate the behavior of smoldering combustion in corn grain. As a limitation and a suggestion for future works, different conditions and other factors could be considered to evaluate the smoldering velocity and compared with the main outcomes of this paper.

#### **5. Conclusions**

This work proposed a new methodological framework for evaluating the effects of three influencing factors on smoldering propagation (particle diameter, moisture content, and air ventilation), which was implemented using the Design of Experiments in Minitab software. The experiments showed that upward propagation proceeded vertically from the base of the sample until the heat exchange became insufficient to sustain the process. The results plotted in the temperature evolution graphs and the graphs generated by the full factorial experiment indicated that the factor with the most significant influence on the propagation rate was the particle diameter, which represented a variation of 35% among the dried samples and 45% among the wet samples.

Moreover, comparing the influence of moisture between the corn flour and corn powder samples, variations of 19% and 31% were observed, indicating a more significant influence of this factor on corn powder samples. With ventilation as the only varied factor, variations of 15% and 17% were noticed for dried and wet corn flour, and of 27% and 10% for dried and wet corn powder. Additionally, the proposed framework based on the experimental planning produced a linear regression equation representing the smoldering process in corn grain particles with the three chosen influencing factors, which can be used to extrapolate the results and obtain the smoldering velocities for other cases. Future work may use the proposed methodology to study other material properties that affect the propagation rate.

**Author Contributions:** Conceptualization, formal analysis, investigation, data curation, validation, writing—original draft, writing—review & editing: A.C.R., Writing—review & editing: I.T.; Writing review & editing: A.M.L.; Writing—review & editing: L.H.; Writing—review & editing: C.A.P.S., Writing—review & editing: V.W.Y.T.; Formal analysis, investigation, data curation, writing—original draft, Writing—review & editing, validation, funding acquisition, project administration, supervision: A.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received financial support from CNPq (Brazilian National Council for Scientific and Technological Development) and CNE FAPERJ 2019-E-26/202.568/2019 (245653) Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Some or all of the data used are proprietary or confidential in nature and thus may only be provided with restrictions, as stated clearly in the article.

**Acknowledgments:** The authors want to acknowledge the financial support from CNPq and CNE FAPERJ.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Angle of the Perforation Line to Optimize Partitioning Efficiency on Toilet Papers**

**Joana Costa Vieira 1,\*, André Costa Vieira 2, Marcelo L. Ribeiro 3, Paulo T. Fiadeiro <sup>1</sup> and Ana Paula Costa <sup>1</sup>**


**Abstract:** Currently, tissue product producers try to meet consumers' requirements to retain their loyalty. In perforated products, such as toilet paper, these requirements involve the paper being portioned along the perforation line and not outside of it. Thus, it becomes necessary to enhance the behavior of the perforation line in perforated tissue papers. The current study aimed to verify whether a perforation line at 0° (the solution found in commercial perforated products) is the best solution to maximize the perforation efficiency. A finite element (FE) simulation was used to validate the experimental data, with deviations from the experiments of 5.2% for the case with a 4 mm perforation length and 8.8% for a perforation of 2 mm, and to optimize the perforation efficiency using a genetic algorithm while considering two different cases. In the first case, the blank distance and the perforation line angle were varied, with the best configuration being achieved with a blank distance of 0.1 mm and an inclination angle of 0.56°. In the second case, the blank distance was fixed at 1.0 mm and the only variable to be optimized was the inclination angle of the perforation line; the best inclination angle was found to be 0.67°. In both cases, it was verified that a slight inclination of the perforation line favors partitioning and therefore the perforation efficiency.

**Keywords:** FE model; optimization; perforation efficiency; perforation line angle; tissue toilet paper

#### **1. Introduction**

At the present time, environmentally conscious consumers demand products that result in the use of less disposable material. In the tissue paper converting industrial process, this has encouraged manufacturers to produce products with the ability to be partitioned [1].

In the production of finished tissue paper products, such as facial papers, paper towels and toilet papers, transversal perforation lines are used to facilitate the separation of the roll into the individual "sheets" or services needed by the consumer. This perforation feature allows the consumer to dispense a certain amount of the product according to their needs [2]. Perforation takes place in the tissue paper converting machine when the sheet of paper passes through a nip between a stationary anvil and the perforator blades. These blades are usually mounted on a rotating cylinder and have alternately spaced teeth and notches. Both the anvil and the perforator are skewed in the machine direction (MD) to decrease the impact of the blade against the anvil by reducing vibration and keeping the cut line perpendicular to the MD of the tissue paper sheet. It is important that the perforator blades produce the desired cut in the finished product, so that consumer acceptance is as intended. The quality of the product cannot be compromised by this operation through a poor distribution or type of perforations. On the other hand, there has to be a balance between the number of cuts, the dimension of the cuts, the number of
spacings, the dimension of the spacings and the number of plies, so that the partitioning of the paper roll by the consumer is neither too easy nor too hard [3–5]. This balance is called the perforation efficiency and can be determined according to the standard [6] by Equation (1):

$$E_p = 100\left[1 - \frac{\overline{S}_p}{\overline{S}_{np}}\right] \tag{1}$$

where *Ep* is the perforation efficiency (%), *Sp* is the average tensile strength of perforated papers (N/m) and *Snp* is the average tensile strength of unperforated papers (N/m).
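As a concrete illustration, Equation (1) translates directly into code; the numbers below are hypothetical tensile strengths, not values from this study.

```python
def perforation_efficiency(s_perf, s_nonperf):
    """Perforation efficiency Ep (%) per Equation (1).

    s_perf:    average tensile strength of perforated paper (N/m)
    s_nonperf: average tensile strength of unperforated paper (N/m)
    """
    return 100.0 * (1.0 - s_perf / s_nonperf)

# Hypothetical example: perforation removes most of the strength
print(perforation_efficiency(30.0, 300.0))  # -> 90.0 (%)
```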

During the tissue paper manufacturing process, raised cellulosic fibers are found on the sheet surface; these help in consumer hygiene but, in excess, can form agglomerates that impair the quality of the final product. To reduce the loss of cellulosic fibers on the paper surface, it is desirable that the perforation blade has relatively thin teeth [3,4]. Thus, a proper blade geometry must be considered. The perforator is also responsible for the visual appearance of the free edge of the remaining paper roll. The consumer wants an aesthetically pleasing free edge (smoother and less irregular between the cut and uncut areas) after tearing off the desired amount of paper [3,4].

The geometric discontinuity of the perforation line will affect the existing stress field in this area, thus affecting the stress concentration factor and consequently the final efficiency. The ratio between the highest stress value at a geometric discontinuity and the nominal stress in the minimum cross section is called the stress concentration factor [7]. In a previous work developed by Vieira et al. [8], it was concluded that in toilet paper samples with a stress concentration factor above 0.11, a tear occurs at locations away from the perforation line, whereas toilet papers with a stress concentration factor below 0.11 tear along the perforation line. Another study carried out by Vieira et al. [9] showed that the perforation efficiency increases with an increase in the cut distance, stabilizing at a cut distance of 6 mm. The differences between numerical simulations and experimental tests decreased from 27% to 4% as the cutting distance ranged from 2 mm to 8 mm, and the numerical simulations showed a trend towards stabilization of the perforation efficiency for a cutting distance of 6 mm.

The current study aimed to verify whether the perforation line at 0° is the best solution to maximize the perforation efficiency. To carry out this study, four commercial two-ply toilet papers were tested with the perforation line at several angles, and the perforation efficiency was evaluated at each angle. To the authors' knowledge, there are limited studies on this subject.

#### **2. Simulation–Materials and Methods**

#### *2.1. Optimization*

The optimization of a constrained problem, using discrete variables, is better performed using the genetic algorithm (GA) [10] than using gradient-based methods, with the use of the GA avoiding the trap of local minima [11]. For this problem, the objective was to find the minimum force necessary to detach the toilet paper service by optimizing the angle α and the blank distance *d* of the paper cuts (see Figure 1), where the cut distance was maintained constant in all simulations (*c* = 3 mm). Additionally, a second optimization was performed regarding only the angle α by maintaining the blank distance *d* = 1.0 mm.

As usual, the design variables were coded as genes (integer numbers) grouped into chromosomes (strings). The chromosomes were evaluated by the fitness function (minimum force), representing the chromosome phenotype. Populations of possible optimal values were generated considering their probabilistic characteristics and evolved over generations through reproduction. To avoid local minima, it is necessary to use enough search points within the design variable space [10]. The GA begins with a random population and assesses the fitness function. Reproduction is carried out by selecting the best individuals and generating the offspring. During reproduction, genes can be exchanged by crossovers [11].

**Figure 1.** Design parameters.

The optimization used a population of 40 individuals (20 times the number of design parameters) and 150 generations (or as many generations as it takes for a convergence criterion to be reached), with a 20% mutation probability and a 50% crossover probability [12].

As mentioned before, the objective was to minimize the force to detach the toilet paper subject to specific design constraints, i.e., the angle α, which ranged from 0° to 55°, and the blank distance, *d*, between the cuts, which ranged from 0.1 to 1.0 mm.

The GA created an initial population of angles, α, and blank distances, *d*, at random within the ranges of interest. These parameters then needed to be qualified according to how much better than others they achieved the design objective.

Once this was carried out using the finite element (FE) model, population crossing could produce a new generation, which was again qualified by the FE model, and this process was repeated until the best generation was found, as shown by the flowchart in Figure 2. After each crossing, the algorithm applied a pre-defined elitism, comparing the new generation with the previous one and selecting the best members to compose the next generation to be crossed. For the genetic algorithm, the mutation probability is 1% and the crossover probability is 100%.
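For illustration, the GA loop described above can be sketched as follows. The population size, variable bounds, and elitism follow the text (the 20% mutation rate follows Section 2.1), while the fitness function here is a purely illustrative placeholder for the FE tear-force evaluation that runs in ABAQUS™ in the actual workflow.

```python
import random

BOUNDS = {"alpha": (0.0, 55.0), "d": (0.1, 1.0)}  # degrees, mm
POP, GEN = 40, 150

def fitness(alpha, d):
    # Placeholder for the FE tear-force evaluation; illustrative only
    return 0.05 + 0.3 * d + 0.002 * alpha

def random_individual():
    return {k: random.uniform(*BOUNDS[k]) for k in BOUNDS}

def crossover(a, b):
    # Child inherits each gene from one of the two parents
    return {k: random.choice((a[k], b[k])) for k in BOUNDS}

def mutate(ind, rate=0.2):
    for k in BOUNDS:
        if random.random() < rate:
            ind[k] = random.uniform(*BOUNDS[k])
    return ind

pop = [random_individual() for _ in range(POP)]
for _ in range(GEN):
    pop.sort(key=lambda ind: fitness(**ind))  # minimize the tear force
    elite = pop[: POP // 2]                   # elitism: keep the best half
    children = [mutate(crossover(*random.sample(elite, 2)))
                for _ in range(POP - len(elite))]
    pop = elite + children

best = min(pop, key=lambda ind: fitness(**ind))
print(best)
```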

Regarding the optimization flowchart presented in Figure 2, four routines were developed separately:


The optimization process was controlled using the MATLAB® GA algorithm. The analysis started when the MATLAB® GA generated the first generation of design parameters. Then, a Python script was called to modify the FE model according to the design parameters. After that, MATLAB® ran the FE analysis with the material model.

Due to the fact that explicit FE analyses can take a long time and the GA demands a considerable number of analyses, it was necessary to obtain the maximum force value and then terminate the current analysis. This was performed by the MATLAB® code and a Python script that accessed the ABAQUS™ results repeatedly until it detected a reduction of 20% in the maximum force.
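A minimal sketch of such a termination monitor is shown below, assuming the reaction-force history has already been extracted from the ABAQUS™ output into a list of values; the extraction step itself (done through the results database in the real workflow) is omitted, and `read_force_history` is a hypothetical helper.

```python
import time

def should_terminate(force_history, drop=0.20):
    """Return True once the force has fallen `drop` below its running peak."""
    if not force_history:
        return False
    peak = max(force_history)
    return force_history[-1] < (1.0 - drop) * peak

def monitor(read_force_history, poll_s=10.0):
    """Poll the FE job's force output until the 20% drop is detected.

    read_force_history: hypothetical callable returning the force values
    extracted so far from the running analysis.
    """
    while True:
        history = read_force_history()
        if should_terminate(history):
            return max(history)  # maximum force, used as the GA fitness
        time.sleep(poll_s)
```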

**Figure 2.** Analysis flowchart.

#### *2.2. Material Model*

It is not possible to adopt isotropic behavior for tissue paper, since this kind of paper behaves differently in the machine and cross directions [8], and ABAQUS™ does not have a native constitutive law to model plasticity for orthotropic materials. Hence, a user material subroutine for explicit simulations (VUMAT) was implemented to simulate the orthotropic elastic–plastic behavior of the paper sheet. The material model, proposed by Mäkelä and Östlund [13], allows the anisotropic behavior of paper to be accounted for, since the paper response is highly dependent on the fiber orientation. The model assumes the decomposition of the strain tensor into an elastic strain tensor and a plastic strain tensor (Equation (2)) while conserving the volume.

$$\varepsilon_{ij} = \varepsilon_{ij}^{e} + \varepsilon_{ij}^{p} \tag{2}$$

where $\varepsilon_{ij}$ is the total strain, $\varepsilon_{ij}^{e}$ is the elastic strain, and $\varepsilon_{ij}^{p}$ is the plastic strain.

The material model adopts the concept of an isotropic plasticity equivalent material [14], a fictitious material that relates the orthotropic stress state to the isotropic stress state. Equation (3) gives the relation between the Cauchy stress tensor and the isotropic plasticity equivalent (IPE) deviatoric tensor.

$$s_{ij} = L_{ijkl}\sigma_{kl} \tag{3}$$

where *sij* is the deviatoric IPE stress tensor, *σkl* is the Cauchy stress and *Lijkl* is the fourth-order transformation tensor, shown in Equation (4) for plane stress.

$$L = \begin{bmatrix} 2A & C-A-B & 0 \\ C-A-B & 2B & 0 \\ B-C-A & A-B-C & 0 \\ 0 & 0 & 3D \end{bmatrix} \tag{4}$$

where the parameters *A*, *B*, *C* and *D* are calibrated from the experimental results at 0° (MD—machine direction) and 90° (CD—cross direction) without perforation, obtained in a previous work [15], using Equations (5)–(12) [11]:

$$A = \sqrt{1 - 12x^2} \tag{5}$$

$$B = 3(y - x) \tag{6}$$

$$C = 3(y + x) \tag{7}$$

$$D = \frac{K_{12}^{\frac{n}{(n+1)}}}{\sqrt{3}} \tag{8}$$

$$x = \sqrt{\frac{\alpha^2}{24(3\alpha^2 + \beta^2 - 4\beta + 4)}\left(\beta + 1 - \sqrt{6\beta - 3\alpha^2 - 3}\right)} \tag{9}$$

$$y = \frac{\alpha}{4\pi} - A \tag{10}$$

$$\alpha = K_{33}^{\frac{2n}{(n+1)}} - K_{22}^{\frac{2n}{(n+1)}} \tag{11}$$

$$\beta = K_{33}^{\frac{2n}{(n+1)}} + K_{22}^{\frac{2n}{(n+1)}} \tag{12}$$

The parameters *Kii* and *n* are obtained from the curve fit of the tensile test using the Ramberg–Osgood methodology. For the MD tensile test (see Equation (13)):

$$\varepsilon_{11} = \frac{\sigma_{11}}{E_{11}} + \left(\frac{\sigma_{11}}{E_0}\right)^n \tag{13}$$

For the CD (see Equation (14)):

$$\varepsilon_{kk} = \frac{\sigma_{kk}}{E_{kk}} + \left(\frac{K_{kk}\sigma_{kk}}{E_0}\right)^n, \quad k = 2, 3 \tag{14}$$

Note that in Equations (13) and (14), the repeated indices do not imply the usual summation rule of indicial notation. Finally, the parameter *K*<sup>12</sup> is obtained using Equation (15).

$$\gamma_{12} = \frac{\sigma_{12}}{G_{12}} + \left(\frac{K_{12}\sigma_{12}}{E_0}\right)^n \tag{15}$$
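To make this calibration step concrete, the sketch below fits the Ramberg–Osgood form of Equation (13) to MD tensile data with SciPy; the stress–strain values are hypothetical, and the elastic modulus is assumed to be fixed beforehand from the initial slope. The same procedure applies to Equations (14) and (15).

```python
import numpy as np
from scipy.optimize import curve_fit

E11 = 13.89  # MPa, MD elastic modulus (assumed fixed from the initial slope)

def ramberg_osgood(sigma, E0, n):
    """Equation (13): elastic strain plus power-law plastic strain."""
    return sigma / E11 + (sigma / E0) ** n

# Hypothetical MD tensile data (stress in MPa, total strain dimensionless)
sigma = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
eps = np.array([0.0073, 0.0154, 0.0250, 0.0368, 0.0516, 0.0702])

(E0, n), _ = curve_fit(ramberg_osgood, sigma, eps, p0=(1.0, 2.0))
print(f"E0 = {E0:.3f} MPa, n = {n:.2f}")
```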

Hooke's law for a plane-stress, small-strain, linear elastic orthotropic material is given by Equation (16).

$$\sigma = \mathbb{C} : \varepsilon^{e} \tag{16}$$

where *σ* is the second-order Cauchy stress tensor, *C* is the fourth-order plane-stress linear elastic orthotropic constitutive tensor and *ε<sup>e</sup>* is the second-order small-strain elastic tensor in matrix notation.

#### *2.3. Finite Element Model*

The implementation of this model follows the well-known *J*<sup>2</sup> flow theory for isotropic materials using the backward Euler algorithm [11]. The explicit solver was used to overcome convergence issues that are common when using the implicit solver for this type of simulation. On the other hand, the stable time increment is very small, which increases the computational cost. Simulations were performed using a workstation with two Intel Xeon E5-2630 8-core processors (16 cores and 32 threads in total) and 256 GB of RAM.

The FE model dimensions and boundary conditions are presented in Figure 3. The boundary conditions were imposed to represent a tensile test. Thus, all the displacement degrees of freedom were restricted on one side (see Figure 3), and a prescribed displacement was applied at the reference point. A rigid link between the reference point and the paper edge was used to connect the paper to the reference point.

Modeling the tensile test using the reference point to apply the prescribed displacement was important for post-processing, as it reduced the number of procedures for the automatic analysis of results. This strategy does not affect the analysis results, as the resultant applied forces are the same as in the case where a prescribed displacement is applied at each boundary node [8].

The paper was simulated using a four-node reduced-integration membrane element (M3D4R). The model has a total of 11,086 elements and, due to the cuts, a free mesh was used. It is important to mention that the mesh parameters were kept the same for all simulations. The material properties for the material model are: *E*<sup>11</sup> = 13.89 MPa, *E*<sup>22</sup> = *E*<sup>33</sup> = 4.23 MPa, μ = 0.33 and *G*<sup>12</sup> = 2.1 MPa. The parameters for the IPE model consider *K*<sup>22</sup> = *K*<sup>33</sup>, since the mechanical behavior in the CD (direction 2) is similar to that in the thickness direction (direction 3). Thus, *A* = 1, *B* = 2.40, *C* = 2.40 and *D* = 1.38.
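As an illustration of how these elastic constants enter the model, the plane-stress orthotropic compliance matrix can be assembled and inverted as below; interpreting μ = 0.33 as the in-plane Poisson ratio ν12 is our assumption.

```python
import numpy as np

E1, E2, G12 = 13.89, 4.23, 2.1  # MPa, from the calibrated material model
nu12 = 0.33                     # assumed to be the in-plane Poisson ratio
nu21 = nu12 * E2 / E1           # reciprocity relation for orthotropy

# Plane-stress orthotropic compliance matrix (engineering shear strain)
S = np.array([[1 / E1, -nu21 / E2, 0],
              [-nu12 / E1, 1 / E2, 0],
              [0, 0, 1 / G12]])

C = np.linalg.inv(S)  # stiffness used in Hooke's law, Equation (16)
print(np.round(C, 2))
```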

#### **3. Experimental Tests–Materials and Methods**

#### *3.1. Materials*

Four commercial two-ply toilet papers were selected and identified as A to D. It was previously verified that two of the two-ply papers tear off the perforation line when loaded manually (toilet papers A and B), while the other two papers tear on the perforation line when loaded manually.

#### *3.2. Methods*

The grammage was determined in accordance with the standard ISO 12625-6:2005 [16] and defined as the mass per unit paper area (g/m²). A Mettler Toledo PB303 DeltaRange analytical balance (Mettler Toledo, Columbus, OH, USA) was used to determine the paper sample weight. To determine the thickness, in which a stack of sheets of paper (or a single sheet) was compressed at a given pressure between two parallel plates, a FRANK-PTI® micrometer (FRANK-PTI GMBH, Birkenau, Germany) was used, in accordance with the standard ISO 12625-3:2014 [17]. According to this standard [17], the bulk, which is the inverse of density, can be determined from the grammage and thickness previously measured.
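Since bulk is the inverse of density, it follows directly from the two measured quantities; a minimal sketch with illustrative values (not measurements from Table 1):

```python
def bulk(thickness_um, grammage_gsm):
    """Bulk in cm^3/g: thickness (um) divided by grammage (g/m^2)."""
    return thickness_um / grammage_gsm

# Illustrative example: a 400 um thick sheet at 40 g/m^2
print(bulk(400, 40))  # -> 10.0 cm^3/g
```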

The perforation line was evaluated according to the standard ISO 12625-12:2010 [5]. Tensile tests were performed in the MD for all samples on a Thwing-Albert® VantageNX universal testing machine. For each paper, samples were prepared with the perforation in the center (0°) and with the perforation line at different angles (20°, 30°, 37.5°, 41° and 45°). Other samples of each paper were also prepared with the length of a single "sheet" without perforation, but with the orientation of the corresponding angle, to cancel out the fiber orientation contribution (see Figure 4).

**Figure 4.** Experimental set-up to test non-perforated and perforated toilet papers. (F shows the force direction applied in the tensile test).

The cut and blank distance measurements were made with a caliper and repeated on 10 different perforations for each toilet paper sample.

#### **4. Results and Discussion**

Structural characterizations were carried out on the four commercial two-ply toilet paper samples, according to the above-mentioned standards. Table 1 shows the results in terms of grammage, thickness, bulk, and cut and blank distances for all toilet paper samples.

**Table 1.** Physical characterization of the toilet papers: number of plies, grammage, thickness, bulk, cut and blank distance.


Looking at Table 1, the grammage shows values in the range of 32.4–44.9 g/m². Evaluating the outcomes for the thickness and bulk, the values vary by up to 51% and 60%, respectively, due to the type of embossing process.

Figure 5 shows the perforation efficiency behavior as a function of the perforation line angle obtained for all toilet paper samples. Analyzing Figure 5, a decreasing trend in perforation efficiency can be observed with an increasing perforation line angle. Although the selected toilet papers present different characteristics, they all present the same tendency in this regard. This is in line with what was found by Vieira et al. [9], who stated that the perforation efficiency depends on the cut dimensions and not on the fibrous composition and/or the number of plies.

To validate the FE model, the perforation efficiency for papers B and C (Table 1), with a cut distance, *c*, of 1.9 mm and 4.0 mm, respectively, was simulated. The experimental and simulated results are compared in Figure 6. For these simulations, the FE model considered the same conditions (boundary conditions and fiber orientation) as the experiments with and without perforation.

There are some differences between the numerical and experimental results regarding the perforation efficiency (see Figure 6), which could be related to how the failure evolves in the FE model, resulting in higher failure loads (see Equation (1)). Despite these differences, the FE model showed the same trend, and therefore the optimization can be performed using this model (Figure 6). For the 4 mm perforation, the average error between the simulations and experiments was 5.2%, while the error was 8.8% for the 2 mm perforation.

**Figure 5.** Perforation efficiency behavior as a function of the perforation line angle.

**Figure 6.** Experimental and theoretical perforation efficiency results as a function of the perforation line angle.

The first case considered the optimization of two parameters, the blank distance, *d*, and the angle of the perforation line, according to Figure 1, to minimize the tear force. The parameter boundaries used in the GA were 0° ≤ α ≤ 55° and 0.1 ≤ *d* ≤ 1.0 mm. Regarding the upper boundary for the perforation line angle, α, the value of 55° was chosen to avoid the cut line crossing the upper or lower edges of the paper model, where the displacement boundary conditions were applied.

For the case regarding the optimization of the perforation line angle and the blank distance, the optimum configuration was achieved after 51 generations, with a tear force in the region of 0.064 N. In the configuration for the minimum tear force, the optimum angle was 0.56°, which corresponds to a perforation efficiency of 96.8% and, as expected, *d* = 0.1 mm. In comparison to the perforation efficiency at 0°, an increase of 29.3% was obtained at the optimal angle.

The GA's best value and mean value over the generations are presented in Figure 7. In this figure, the best value is almost equal through all generations, and the mean value converges to the best value after 17 generations.

For the case where only the perforation line angle was the variable to be optimized (the blank distance *d* was fixed and equal to 1 mm), convergence occurred only after 66 generations, and the minimum tear force was 0.394 N. For this case, the angle for the minimum tear force was 0.67°, which corresponds to a perforation efficiency of 80.6%. Compared with the perforation efficiency at 0°, an increase of 7.6% was obtained at the optimal angle.

**Figure 7.** Optimization evolution of the best value and mean value.

As presented in Figure 8, the best value was almost constant after the 16th generation. On the other hand, the mean value did not converge. For this case, the stopping criterion adopted was that the change in the best value between generations was less than the MATLAB® default tolerance.

The stress fields for the optimum case, where the blank distance, *d*, and angle, α, were optimized, are presented in Figure 9 for the increment just before rupture.

The normal stress field in the MD (*σ*<sup>11</sup> in the *Y* direction), Figure 9a, shows a stress concentration between the cuts, as expected. As the distance between cuts is only 0.1 mm, the stress concentration factor is approximately *K*<sup>t</sup> = 21 (regarding the stress in the fiber direction, MD), justifying the low rupture force. The same behavior is detected for the other stresses (the CD (*σ*<sup>22</sup> in the *X* direction) in Figure 9b and the shear stress (*σ*12) in Figure 9c). Hence, the cuts affect the stress fields in the different directions of the paper plane. In this case, rupture begins at the center of the paper, moving fast towards the left and right edges (see Figure 9d), in the same way as occurs experimentally in the laboratory.

Considering the other case, the optimization regarding only the inclination of the cuts, the stress fields are shown in Figure 10. The stress concentration factor is approximately *K*t = 4.1 for the MD stress (significantly lower than in the previous case). As in the previous case, the rupture starts at the center of the paper and moves towards the left and right edges (see Figure 10d).

Additionally, the simulations regarded the paper as a homogeneous medium with no variations in fiber alignment or concentration throughout the model. This would not be the case in real paper, and such factors would influence the paper rupture force. Figure 11a shows the MD stress field distribution (*σ*<sup>11</sup> in the *Y* direction) around the cuts with an orientation of 45°, and Figure 11b shows the same MD stress field distribution at the beginning of the paper rupture, starting at the lower edge and moving towards the center. Due to the paper rupture (Figure 11b), the stress flows into the non-ruptured region and hence increases there (darker green), while in the ruptured region the stress field tends towards zero (darker blue).

**Figure 8.** Optimization evolution of best and mean value for parameter *d*.

**Figure 9.** (**a**) Stress field MD; (**b**) stress field in CD; (**c**) shear stress; (**d**) rupture.

**Figure 10.** Stress field for the optimum orientation: (**a**) in fiber direction MD; (**b**) normal to fiber direction CD; (**c**) shear stress; (**d**) rupture for half of the model.

**Figure 11.** (**a**) Fiber direction stress field in MD for cuts at 45°; (**b**) rupture starting at the lower edge and running to the center.

#### **5. Conclusions**

In general, the results of the FE model simulation analysis support the idea that the value of perforation efficiency tends to decrease with an increasing perforation line angle, in agreement with the experimental results.

A reduction in the tear force of the toilet paper was pursued using a genetic algorithm considering two different cases. In the first case, the blank distance and the angle of the cuts were the variables to be optimized; the best configuration was achieved with a blank distance of 0.1 mm and a 0.56° inclination of the perforation line, achieving an increase of 29.3% in perforation efficiency. Both the best and mean values converged to almost the same value in this case. For the case where the only variable to be optimized was the inclination of the cuts, with the blank distance fixed at 1.0 mm, the genetic algorithm found the best inclination angle to be 0.67°, achieving an increase of 7.6% in perforation efficiency, but the average values of the population did not converge. This was due to the complex failure mode of the paper and its kinematics as the damage evolved. Despite the complex failure behavior, the optimum configuration was achieved for both cases (with and without the blank distance fixed at 1.0 mm), and only a small inclination of the perforation line is needed to reduce the tear force, regardless of the rupture progression along the perforation line.

Digital twinning is an emergent simulation tool that will be commonly used in the near future, as it permits optimization in a digital environment and the subsequent transition to and application in the industrial environment, as demonstrated in this work.

The main limitation of this work is that it considered the material to be homogeneous and orthotropic. In fact, the material used experimentally contained heterogeneously distributed fibers, preferentially oriented in the MD. Nevertheless, this macroscale model is accurate enough to simulate different geometries in terms of both the perforation line and the cut itself, such as waves, triangles, etc.

**Author Contributions:** J.C.V.: data acquisition and curation, investigation, writing—original draft, writing—review and editing. A.C.V.: FEM analysis, writing—original draft, simulation supervision, writing—review and editing. M.L.R.: FEM analysis, writing—original draft, simulation supervision, writing—review and editing. P.T.F.: supervision, writing—review and editing. A.P.C.: project supervisor, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors gratefully acknowledge the funding of this work that was granted under the Project InPaCTus—Innovative Products and Technologies from Eucalyptus, Project No. 21874 funded by Portugal 2020 through the European Regional Development Fund (ERDF) in the framework of COMPETE 2020 no. 246/AXIS II/2017. The authors are also very grateful for the support given by the research unit Fiber Materials and Environmental Technologies (FibEnTech-UBI), under the project reference UIDB/00195/2020, and by the Center for Mechanical and Aerospace Science and Technologies (C-MAST-UBI), under the project reference UIDB/00151/2020, both funded by the Fundação para a Ciência e a Tecnologia, IP/MCTES through national funds (PIDDAC).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors acknowledge the materials, access to equipment and installations, and all the general support given by The Navigator Company, RAIZ, the Optical Center, Department of Physics, Department of Textile Science and Technology, Department of Chemistry of the Universidade da Beira Interior.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Using ARIMA to Predict the Growth in the Subscriber Data Usage**

**Mike Nkongolo 1,2**

<sup>1</sup> Department of Informatics, Faculty of Engineering, Built Environment and Information Technology, University of Pretoria, Pretoria 0028, South Africa; u21629545@tuks.co.za

<sup>2</sup> Maven Systems Worx (Pty) Ltd. & NEC XON (Pty) Ltd., Centurion 0157, South Africa

**Abstract:** Telecommunication companies collect a deluge of subscriber data without retrieving substantial information. Exploratory analysis of this type of data can facilitate the prediction of varied information, whether geographical, demographic, financial, or other. Prediction can therefore be an asset in the decision-making process of telecommunications companies, but only if the information retrieved follows a plan with strategic actions. Exploratory analysis of subscriber data was implemented in this research to predict subscriber usage trends based on historical timestamped data. The predictive outcome was unknown but was approximated using the data at hand. We used 730 data points selected from the Insights Data Storage (IDS); these data points were collected from the hourly statistics traffic table and subjected to exploratory data analysis to predict the growth in subscriber data usage. The Auto-Regressive Integrated Moving Average (ARIMA) model was used to forecast. In addition, we used the normal Q-Q, correlogram, and standardized residual metrics to evaluate the model. The model showed a *p*-value of 0.007, a result supporting our hypothesis predicting an increase in subscriber data growth. The ARIMA model predicted a growth of 3 Mbps, with a maximum data usage growth of 14 Gbps. In the experimentation, the ARIMA model was compared to the Convolutional Neural Network (CNN) and achieved the best results with the UGRansome data. The ARIMA model also performed better in execution speed, by a factor of 43 for more than 80,000 rows: on average, it takes 0.0016 s for the ARIMA model to execute one row and 0.069 s for the CNN to execute the same row, making the ARIMA 43× (0.069/0.0016) faster than the CNN model. These results provide a road map for predicting subscriber data usage so that telecommunication companies can be more productive in improving their Quality of Experience (QoE). This study provides a better understanding of the seasonality and stationarity involved in the growth of subscriber data usage, exposing new network concerns and facilitating the development of novel predictive models.

**Keywords:** time series forecasting; subscriber data; seasonality; ARIMA; telecommunication; UGRansome; stationarity

### **1. Introduction**

The growth of competition in the telecommunications industry, driven by technological variety, has encouraged the invention and expansion of new techniques for processing subscriber data in order to predict subscriber behavior. Subscriber traffic represents all kinds of electronic data transmitted in the network [1], usually in the form of network flows passing from one node to another [2]. Accurately predicting subscriber data can improve the Quality of Experience (QoE) and help foresee various anomalies, especially when the company faces revenue loss due to malicious activities. In addition, the ability to forecast future data usage can be crucial for bandwidth-sharing policy within the telecommunication business. In particular, forecasting with a strong sense of seasonality in data growth enables management to better predict potential revenue and anomalies.

The above is a time series forecasting problem, and there is a large body of research on different forecasting models [3–5]. Statistical models such as the Auto-Regressive Integrated Moving Average (ARIMA) and Machine Learning (ML) models such as Long Short-Term Memory (LSTM), gradient descent, and regression are popular techniques within the time series forecasting space. In particular, the LSTM model has demonstrated strong forecasting capabilities due to its ability to recall information, making it a strong contender against the traditional statistical ARIMA model. Execution speed is another factor to consider when selecting a model for subscriber data forecasting, because subscriber histories can contain petabytes of historical time series data; a model with fast execution enables faster decision-making. This research therefore addresses the performance of the Convolutional Neural Network (CNN) and ARIMA models in forecasting the growth of subscriber data usage, and aims to establish which of the two models performs better in terms of speed and accuracy.

In this article, we also describe the advantages of using seasonality to examine changes in subscriber data. In Time Series Analysis (TSA), seasonality is a characteristic of a time series in which the data experiences predictable and regular changes over a period [5]. Understanding seasonality in TSA can enhance the prediction performance of ML models. It can also assist in cleaning the features: identifying the seasonal component of the time series samples and removing it from the original dataset yields a normalized dataset correlating input and output variables. The seasonal component can also provide more information about the time series data, offering insights that enhance predictor performance [4]. Modeling seasonality improves the data preparation and feature engineering steps; in each step, seasonal patterns can be extracted and modeled as input/output class labels with a supervised learning scheme. In adaptive computation, ARIMA is a class of time series forecasting models; it is a special case of regression models, not of classification models [6]. We selected ARIMA as an adequate time series forecasting model to predict subscriber data usage and to analyze the seasonality, trends, and cycles of features. The methodology uses seasonality as a time series property in an ARIMA model that implements a distributed lag algorithm to forecast future subscriber data usage from lagged parameters. This article implements a predictive ARIMA model using subscriber data to study seasonality by predicting the growth in subscriber throughput.

#### *1.1. Research Question*

The main research question is as follows:

• Which forecasting model between ARIMA and CNN is effective in predicting subscriber data usage?

The research objective is to evaluate the two models using accuracy and computational speed.

#### *1.2. Research Contribution*

We propose the ARIMA model for subscriber data prediction using an unsupervised learning scheme. Specifically, we implemented the ARIMA model with unlabeled features to predict the growth in subscriber data usage. In the model, a predictive layer forecasts the throughput rate, which is fed into another layer that predicts the maximum usage growth. The remainder of the paper is structured as follows. Section 2 discusses related research, Section 3 describes the research methodology, Section 4 presents the ARIMA results and the comparative analysis using the UGRansome dataset, and Section 5 presents future research directions and concluding remarks.

#### **2. Related Works**

This section surveys the predictive techniques of TSA with attention to the proposed methodology.

#### *2.1. Stationarity*

Before delving deeper into the ARIMA model theory, it is important to understand the concept of stationarity. Unlike the CNN, the ARIMA model does not perform well when the dataset is not stationary, so the data needs to be made stationary before processing [7]. A dataset is said to be stationary if it conforms to all of the following conditions [7]:

• the mean is constant over time;
• the variance is constant over time;
• the autocovariance depends only on the lag between observations, not on time itself.


There are various ways to make the data stationary. One commonly used approach is the differencing method, where finite differencing is applied to the data points. Let *Yt* denote the nonstationary series and *Zt* the stationary one; *Zt* is then the difference between successive values of *Yt*:

$$Z_t = Y_t - Y_{t-1} \tag{1}$$

However, should the nonstationary dataset also exhibit seasonal characteristics, it is recommended to apply seasonal differencing to the dataset [8]:

$$Z_t = Y_t - Y_{t-m} \tag{2}$$

where *m* is the seasonal period in timestamps. For instance, a 12-month seasonal differencing can be written as follows:

$$Z_t = Y_t - Y_{t-12} \tag{3}$$

The differencing method is effective and widely recommended because, in most cases, non-stationary data can be transformed to stationary after the first difference, so no further transformation is required. This is not the case with other transformation methods, where stationarity may only be reached after multiple transformations of the same data points.
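As a brief illustration, Equations (1)–(3) map directly onto pandas' diff method; a minimal sketch, where the series values are placeholders rather than subscriber data:

```
import pandas as pd

# Placeholder series standing in for hourly subscriber throughput
y = pd.Series([10.0, 12.0, 15.0, 14.0, 18.0, 21.0, 20.0, 24.0])

z_first = y.diff(1).dropna()     # Equation (1): Z_t = Y_t - Y_{t-1}
z_seasonal = y.diff(4).dropna()  # Equation (2) with m = 4; m = 12 for monthly data

print(z_first.values)
print(z_seasonal.values)
```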

#### *2.2. Background*

In Ref. [9], a deep learning model was developed to forecast the product usage of a given consumer based on historical data. The authors adapted a CNN with auxiliary input to time series data and demonstrated an improvement in the accuracy of the model predicting future change. To improve the forecasting capability of in-flight aircraft navigation systems, Ref. [10] undertook a study on weather forecasting comparing the predictive ability of LSTM and ARIMA models; the LSTM performed much better than the ARIMA, with Root Mean Square Error (RMSE) values of 0.0007 for the LSTM and 0.948 for the ARIMA. A solution presented by [11] likewise demonstrates that the LSTM outperforms the ARIMA model. The purpose of that study was to forecast multi-step electricity load for Poland, Italy, and Great Britain; the RMSE values for each model were summarized, and the LSTM also outperformed the ARIMA on the RMSE metric for predicting wind speed. In Ref. [12], a study was undertaken to determine which of the ARIMA and LSTM techniques produced the most accurate predictions with minimal empirical error; the LSTM outperformed the ARIMA for all stock market predictions, with an average RMSE of 64 dollars. Limestone is an important raw material in today's world: around 10% of the sedimentary rocks on Earth are made up of limestone [13], over 25% of the world's population relies on limestone for drinking water, and about 50% of all known gas and oil reserves are encased in limestone rocks [13]. It is therefore crucial for various economies to accurately predict future prices of limestone. In Ref. [14], a study comparing the ARIMA and LSTM with regard to predicting future prices of limestone was conducted; the ARIMA performed slightly better than the LSTM, with an accuracy of 95.7% compared to the LSTM's 91.8%. However, we argue that the probable reason for the LSTM model's subpar performance

was due to the manual tuning of some of the model's hyper-parameters; for instance, the number of LSTM layers was tuned manually. In addition, the authors did not disclose the exact units of their target variable. The authors in [15] used regression to learn the correlation between a time series and continuous variables; the approach was to detect the correct coefficients to forecast various attributes. The regression model predicted annual rainfall using historical temperature values [16] with Random Forest (RF) and Gradient Descent (GD) algorithms; the final results confirmed that an in-depth understanding of time series data is needed to select the optimal fitting algorithm. Ref. [17] attempted to predict respiratory rates using a sliding window consisting of three modules: the first retrieved the signal of respiratory patterns, the second approximated the rates, and the third made various estimations. A Gaussian-based regression process extracted the respiratory features from the datasets and attempted to fit different Auto-Regressive (AR) algorithms to the retrieved signals; unfortunately, the AR model failed to detect seasonality. In Ref. [18], Dynamic Time Warping (DTW) and K-Nearest Neighbor (KNN) were used for time series forecasting; the 1-NN classifier with DTW exhibited high time complexity and relied on the engineering of hand-crafted patterns. In Ref. [19], a CNN used on time series data outperformed all other tested ML models; the author proposed a feature selection method to automate learning from input variables, with the learned patterns representing time series features through discriminatory layers. However, this technique relies on back-propagation, which turns the NN into an adequate feature selector. According to [20], the juxtaposition of Recurrent Neural Networks (RNN) such as LSTM and CNN yielded enhanced accuracy for classification tasks, in the range of 27% to 43% improvement compared to other well-known ML models. Classification was also considered by [21] and assessed with J48, LSTM, RF, Support Vector Machine (SVM), and CNN; the LSTM-based CNN with three hidden layers outperformed the other models. In Ref. [22], the authors used regression to allocate company resources and undertook a substantial review of well-known ML models for time series forecasting, while [23] used a CNN to address multivariate time-series regression problems; the LSTM and Gated Recurrent Unit (GRU) exhibited transferable CNN units compared to other models. The research in [24] used an LSTM with additional convolutional layers, and the results provided a boost in predictive performance. Lastly, three CNNs and four LSTMs were implemented by [25], with an improved CNN execution time. Generally, regression models using CNN and LSTM are the most common ML techniques used in the literature for time series forecasting (Table 1). The limitations of the discussed research lie in dataset misunderstanding, lack of feature engineering, non-seasonal patterns, computational biases, and time complexity. Classifiers such as SVM and Decision Trees (DT) are also prone to error in time series prediction, as they are not a good choice for forecasting (Table 1). Time series forecasting solutions are also implemented in various fields, such as weather, electricity, and price prediction (Table 1).


**Table 1.** Comparative analysis.



#### *2.3. Time Series Data Limitation*

Some attempts allow efficient computation of large-scale time series data. For instance, Ref. [26] implemented a Hadoop-based framework for accurate preprocessing of data, which is important for feature selection. Unlike [26], Ref. [27] concentrated on model selection, using MapReduce to compute cross-validation and thereby improve parallel rolling-window prediction on training sets of heterogeneous time series patterns; the predictive parameters computed the accuracy, but this technique could not tackle the challenges associated with forecasting. In Refs. [28,29], multi-step forecasting was handled by ML models in the Spark environment: Ref. [28] used H iterations to compute the multi-step prediction, while [29] implemented multivariate regression models using ML libraries. As a result, the H technique was not scalable for forecasting; in that setting, one can use a sample of patterns instead of the original data to predict. Ref. [30] provides an overview of forecasting big data using time series traffic; the paper lays a premise for time series forecasting, but it remains complicated to apply the proposed techniques to subscriber data and future forecasting. Some researchers investigate the underlying intuition of parallel computing models using time series data; unfortunately, these models resulted in expensive computational time complexity. For instance, Ref. [31] introduced a distributed approximator before the prediction calculation, requiring several iterations. Building on such frameworks, Refs. [32,33] proposed recursive techniques with Bayesian prediction, while [34] refined the estimator computation of a quantile regression model through various rounds of classification. Another well-known methodology is the alternation of eigenvectors for convex optimization of time series data; this technique blends the seasonality of time series data with the convergence properties of predictors [35], but streaming data complicates the forecasting. We argue that one-shot averaging is a straightforward technique to compute the prediction, requiring only a single computational round [36]. Various studies used distributed learning that splits features in a specific frequency domain, with the time series patterns used in the splitting process [37]; these algorithms model successive refinements but require re-implementing each estimator scheme and converge slowly compared to existing predictors designed for time series forecasting [38]. For example, Ref. [39] analyzed the cyclostationary properties of 0-day exploits with slow precision convergence; Boruta was the feature-based extraction method, combined with Principal Components Analysis (PCA), to extract the most cyclostationary patterns from the NSL-KDD, UGRansome, and KDD99 datasets, and RF and SVM were used to classify cyclostationary features. Supervised learning restricted those experiments, whereas our research implements an unsupervised learning scheme to study stationary prediction. Moreover, we have compared the ARIMA performance on the UGRansome [40] and subscriber datasets to assess the forecasting performance on stationary and time series data. The following section presents our methodology, Exploratory Data Analysis (EDA), and the UGRansome dataset [41]. All articles mentioned in this section are valuable because they provide recommendations regarding ML for forecasting the growth of subscriber data usage.

#### **3. Materials and Methods**

We used subscriber data collected from a network database and analyzed the patterns to predict the growth in subscriber data usage. The Network Subscriber Data Management (NSDM) approach is the relevant aspect of this research, as it stands at the core network layer and stores valuable data used by various subscribers. The NSDM extracts subscriber patterns from the Insights Data Storage (IDS) and monitors all real-time traffic of subscriber data [42]. We used the NSDM module, which keeps subscriber data in a centralized and secure environment with a scalable repository named the IDS (Figure 1).

**Figure 1.** The NSDM architecture.

The IDS directory provides distributed and resilient subscriber patterns stored in a single repository. The ARIMA model was used on this repository to predict the growth in subscriber data usage (Figure 1).

#### *3.1. Mathematical Formulation of ARIMA*

An ARIMA model has a Moving Average (MA) component as well as an AR component [43]. We use ARIMA(p, d, q) to denote a model where p is the order of the AR component, q the order of the MA component, and d the number of differences needed to make the series stationary [43]. One can extend the ARIMA predictor to a Seasonal ARIMA (SARIMA) model by incorporating additional seasonal terms to handle time series that exhibit a strong seasonal characteristic [43]. We write such a model as ARIMA(*p*, *d*, *q*)(P, D, Q)*<sup>m</sup>*, where the uppercase P, D, and Q denote the seasonal AR order, the number of seasonal differences, and the seasonal MA order, respectively, and m denotes the seasonality period [43,44]. For a time series (*yt*, *t* ∈ Z), the model has the following backshift-operator form:

$$\left(1-\sum_{i=1}^{p}\theta_i B^i\right)\left(1-\sum_{i=1}^{P}\alpha_i B^{im}\right)(1-B)^d(1-B^m)^D y_t = \left(1+\sum_{i=1}^{q}\gamma_i B^i\right)\left(1+\sum_{i=1}^{Q}\Gamma_i B^{im}\right)\omega_t \tag{4}$$

where B denotes the backward shift operator, *ω<sup>t</sup>* the white noise, and m the seasonality length; *θ<sup>i</sup>* and *α<sup>i</sup>* represent the non-seasonal and seasonal AR parameters, and *γ<sup>i</sup>* and *Γ<sup>i</sup>* the non-seasonal and seasonal MA parameters. This mathematical formulation combines the two groups of parameters *p*, *d*, *q* and *P*, *D*, *Q*, where:


The variation of these ARIMA parameters can identify the most optimal set of features in obtaining precise predictive values [43–45].

#### *3.2. Experimental Datasets*

Figure 2 presents the research methodology where our framework provides subscriber data stored in the IDS module.

**Figure 2.** The experimental methodology.

The subscriber data was extracted from the real-time network traffic using Structured Query Language (SQL). We pushed the features into a single comma-separated file and used EDA to visualize salient features of the network traffic. We then obtained critical Key Performance Indicators (KPIs) that can support the prediction of data usage growth. The executed SQL retrieved the subscriber timestamps, incoming throughput, and outgoing throughput (Figure 3).


**Figure 3.** The subscriber data.

The query extracts the timestamps (ts) by truncating them into a human-readable format (Year-Month-Time). The incoming throughput was computed using the following Equation (5):

$$Tpt_{in} = \frac{\mathrm{sum}(bytes_{in}) \times 8}{3600} \tag{5}$$

The SQL in Figure 3 illustrates this process. Equation (6) denotes the outgoing throughput computation:

$$Tpt_{out} = \frac{\mathrm{sum}(bytes_{out}) \times 8}{3600} \tag{6}$$

These are hourly statistics retrieved from the traffic stats table of the IDS over 60 days (Figure 3). The 3600 represents an hour in seconds, and the factor of eight converts bytes into bits. In addition, we grouped results by timestamps. Retrieved patterns were converted into Comma-Separated Values (CSV) (Figure 4).
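A minimal sketch of this computation in Python; the column names and byte counts are assumptions standing in for the IDS traffic stats table queried in Figure 3:

```
import pandas as pd

# Hypothetical hourly byte counters; the real values come from the IDS
df = pd.DataFrame({
    "ts": pd.date_range("2022-01-01", periods=4, freq="H"),
    "bytes_in": [4.5e9, 5.1e9, 4.8e9, 6.0e9],
    "bytes_out": [1.2e9, 1.4e9, 1.3e9, 1.6e9],
})

# Equations (5) and (6): bytes -> bits (factor 8), averaged over 3600 s
df["tpt_in"] = df["bytes_in"] * 8 / 3600
df["tpt_out"] = df["bytes_out"] * 8 / 3600

print(df[["ts", "tpt_in", "tpt_out"]])
```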

The subscriber dataset has 730 entries with four attributes (human-readable timestamps, UNIX timestamps, incoming throughput (Tpt in), and outgoing throughput (Tpt out)). A timestamp represents the time when the subscriber traffic was collected [46]. The throughput is the flow that measures inputs/outputs movements within the network [46]. The following Figure 5 illustrates our research methodology.


**Figure 4.** The CSV format of the subscriber data.

**Figure 5.** The research methodology.

The subscriber and UGRansome datasets are collected, and the EDA is executed before the computation of the ARIMA model, which predicts the growth in data usage based on the current timestamp. The techniques discussed in the literature train ML classifiers with human-labeled features, but this supervised learning method uses limited samples. We used an unsupervised learning technique whereby we did not label the features: the ARIMA model takes data points *x*<sup>1</sup> ... *xn* and assigns predicted values Θ<sup>1</sup> ... Θ*<sup>n</sup>* using predefined parameters.

#### *3.3. Feature Engineering and Data Cleaning*

Data cleaning is a method of mapping and transforming features from one raw data format to another to make them more suitable and valuable for downstream uses such as time series forecasting. One of the most important data cleaning processes is handling missing values [41]. Fortunately, the subscriber dataset contains no missing values. However, the data still needs to be transformed in various other ways for training and testing, which in this case includes:


#### *3.4. Stationarity of Data*

Two main methods can be used to determine the stationarity of a time series dataset:

• visual inspection, e.g., checking whether the rolling mean and variance remain constant over time;
• statistical testing, e.g., the Augmented Dickey-Fuller (ADF) test described later.


In this research, the differencing method was applied to make the dataset stationary (Figure 6). The differencing and original data are distinguished in Figure 6.

**Figure 6.** The original and differencing of subscriber data.

#### *3.5. The UGRansome Characteristics*

This dataset was created by extracting important features of two existing datasets (UGR'16 and ransomware) [41]. UGRansome is an anomaly detection dataset that includes normal and abnormal network activities [48]. The regular characteristic sequence makes up 41% of the dataset, whereas irregularity makes up 44%. The remaining 15% represents the predictive values of network attacks grouped into the signature (S), synthetic signature (SS), and anomalous (A) attacks (Figure 7).

**Figure 7.** Distribution of network threats.

Figure 7 depicts the signature attacks with a proportion of 44.02% (synthetic signature 28.71% and anomalous 27.27%). A significant proportion of signature traffic means that the UGRansome threat concerns are detectable. Regular threats, like User Datagram Protocol (UDP) and Botnet, provide about 9% of the anomalous category. The Internet Protocol (IP) and ransomware addresses have a ratio of 1% [39]. In addition, a ratio of 2% exists between communication protocols and ransomware addresses [41]. According to Refs. [39,41], the overall distribution of the UGRansome can be summed up as in Figure 8. However, UGRansome is more redundant than the subscriber data, and we removed 28.2% of duplicate records during the feature extraction phase (Figures 8 and 9).


**Figure 8.** The UGRansome data summary.


**Figure 9.** The subscriber data summary.

#### *3.6. Exploratory Techniques*

The exploratory analysis provides a set of techniques to understand the dataset. The results produced by the EDA can assist in mastering the data structure [49], as well as the distribution of the features, detection of outliers, and correlation within the dataset. Some of the statistical metrics used to evaluate the ARIMA model are standard deviation, correlation, mean, standardized residual, normal Q-Q, correlogram, theoretical quantile, *p*-value, and accuracy:

• Standardized residual (*ri*). It measures the strength of actual and predicted values and indicates the significance of features [50] (*ri* facilitates the recognition of patterns that contribute the most to the predictive values):

$$r\_i = \frac{\varepsilon\_i}{s(\varepsilon\_i)} = \frac{\varepsilon\_i}{RSE\sqrt{1 - h\_i}}\tag{7}$$

where *ε<sup>i</sup>* is the ith residual, RSE is the residual standard error of the model, and *h<sup>i</sup>* the ith leverage observation.

• Normal Q-Q. The normal Q-Q means normal Quantile-Quantile. It is a plot that compares actual and theoretical quantiles [50]. The metric considers the range of random variables to plot the normal Q-Q using a probabilistic computation. The x-axis represents the Z-score of the standardized normal distribution. Different formulations have been proposed in the literature for the plotting positions:

$$\frac{k-a}{n+1-2a}, \tag{8}$$

for some constant *a* between 0 and 1, with *k* = 1, ..., *n*; the extreme choices *a* = 0 and *a* = 1 give the following range (Equation (9)):

$$\frac{k}{n+1} \longleftrightarrow \frac{k-1}{n-1}. \tag{9}$$

• Correlogram. It is a correlational and statistical chart used in TSA that plots the sample auto-correlations *rh* versus the timestamp lags h to check for randomness [50]. The correlation is zero when randomness is detected. Equation (10) gives the auto-correlation parameter at lag h:

$$r_h = \frac{c_h}{c_0}, \tag{10}$$

where *ch* is the auto-covariance coefficient and *c*<sup>0</sup> the variance function.

• Augmented Dickey-Fuller (ADF) test. This statistical test assesses the stationarity of time series data [50] by testing for a unit root (*α* = 1) in the following autoregressive model (Equation (11)):

$$y_t = \alpha y_{t-1} + \beta x_e + \varepsilon_t \tag{11}$$

Here *y<sup>t</sup>* represents the time series value at time t, *x<sup>e</sup>* is a separate (exogenous) time series variable, and *ε<sup>t</sup>* is the error term; under the null hypothesis of a unit root, *α* = 1.


• Kurtosis. This metric measures the tailedness of the feature distribution:

$$\mathrm{Kurtosis}[x] = E\left[\left(\frac{x-\mu}{\sigma}\right)^4\right], \tag{12}$$

where *μ* is the mean of the inputs *x* and *σ* their standard deviation, subject to the constraints:

$$\sum\_{i=1} \sum\_{j=1} \frac{\mu^i}{\sigma^j}.\tag{13}$$

• Jarque-Bera (JB) test. This metric uses a Lagrange multiplier to test for data normality. The JB value tests if the distribution is normal by testing the Kurtosis to determine if

features have a normal distribution. A normal JB distribution has symmetrical Kurtosis, indicating the peakedness of the distribution. We formulate the JB test as follows:

$$JB = n\left[\frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24}\right], \tag{14}$$

where *n* is the sample size, √*b*<sub>1</sub> is the sample skewness, and *b*<sub>2</sub> the sample Kurtosis coefficient.

• Heteroscedasticity. This test checks the alternative hypothesis (*HA*) against the null hypothesis (*H*0) [50]. Under the alternative hypothesis, the error variances differ across observations:

$$H_A: \sigma_1^2 \neq \sigma_2^2 \neq \dots \neq \sigma_n^2. \tag{15}$$

However, a null hypothesis has equal error variances (homoscedasticity) [50]:

$$H\_0: \sigma\_1^2 = \sigma\_2^2 = \dots = \sigma\_n^2. \tag{16}$$

• Accuracy. The balanced accuracy *BA* of the ARIMA model is calculated with the following mathematical formulation [47]:

$$B_A = \frac{\dfrac{TP}{TP + FN} + \dfrac{TN}{TN + FP}}{2} \tag{17}$$

where True Positive (TP) and True Negative (TN) denote correct classification, and False Positive (FP) and False Negative (FN) denote misclassification [50]. We used cross-validation rounds to build multiple training/testing subsets to decide which model is a suitable predictor of the growth in subscriber data (80% training, 10% validation, and 10% testing). A small Python sketch of the main diagnostics in this list follows.
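A minimal sketch of how some of these diagnostics can be computed in Python, assuming statsmodels, SciPy, and scikit-learn; the series and labels below are illustrative placeholders rather than the study's data:

```
import numpy as np
from scipy import stats
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
y = rng.normal(size=730)  # placeholder for the subscriber throughput series

# ADF test (Equation (11)): a small p-value rejects the unit-root hypothesis
adf_stat, p_value = adfuller(y)[:2]
print(f"ADF statistic = {adf_stat:.3f}, p-value = {p_value:.3f}")

# Jarque-Bera test (Equation (14)) for normality
jb_stat, jb_p = stats.jarque_bera(y)
print(f"JB statistic = {jb_stat:.3f}, p-value = {jb_p:.3f}")

# Balanced accuracy (Equation (17)) on illustrative binary labels
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print("balanced accuracy =", balanced_accuracy_score(y_true, y_pred))
```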

#### *3.7. Feature Extraction*

ML models are used to address a range of prediction problems. Unsatisfactory predictions by ML classifiers often originate from overfitting or underfitting of features [7]. In this research, the removal of irrelevant patterns guaranteed improved performance of the ARIMA computation. PCA, the feature extraction methodology of this research, was applied to the UGRansome data to extract relevant patterns. We denote the PCA covariance matrix as follows:

$$\mathbf{P} = \frac{1}{l-1}\sum_{t=1}^{l} x(t)\,x(t)^T, \tag{18}$$

where *x*(*t*), *t* = 1, 2, ..., *l*, are stochastic *n*-dimensional, zero-mean inputs and **P** is their covariance matrix. The PCA formulation uses the covariance in Equation (18) together with a linear transformation of the *x*(*t*) inputs into *y*(*t*) outputs:

$$y(t) = Q^T x(t), \tag{19}$$

where Q is an orthogonal n×n matrix whose columns are the eigenvectors of **P**; the ith component is computed as follows:

$$y_i(t) = q_i^T x(t), \quad i = 1, \dots, n, \tag{20}$$

where *q<sup>i</sup>* is the ith eigenvector and *y<sup>i</sup>* the ith principal component. Table 2 depicts the PCA results for the UGRansome dataset; for the subscriber data, SQL was used as the feature extractor from the IDS (Figure 3). A scikit-learn sketch of this PCA step follows.
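As a sketch of Equations (18)–(20) using scikit-learn; the random matrix is a stand-in for the UGRansome feature columns:

```
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))  # placeholder for 12 UGRansome features

# Center and scale, then project onto the leading components (Equation (19))
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5)
Y = pca.fit_transform(X_scaled)  # rows of Y are y(t) = Q^T x(t)

print("explained variance ratios:", pca.explained_variance_ratio_)
```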


**Table 2.** The PCA results using the UGRansome.

#### *3.8. Model Training and Testing*

The parameters shown in Figure 10 will be used to train and test the ARIMA model.


**Figure 10.** The ARIMA model parameters.

The choice of these parameters is justified as follows: a model with only two AR terms is specified as an ARIMA of order (2, 0, 0); an MA(2) model is specified as an ARIMA of order (0, 0, 2); and a model with one AR term, a first difference, and one MA term has order (1, 1, 1). For the proposed model using the subscriber data, an ARIMA(2, 1, 2) model with two AR terms, a first difference, and two MA terms is applied to improve the forecasting accuracy (Figure 10). One AR(1) term with two differences and one MA(1) term is used for the UGRansome data to account for a linear trend in the data (Figure 10). The differencing order refers to successive first differences; for instance, for a differencing order of 2, the variable analyzed is *zt* = [*xt* − *xt−1*] − [*xt−1* − *xt−2*]. This type of difference can account for a quadratic trend in the data. However, due to the seasonality of the experimental data, the SARIMA model, an extension of the ARIMA with integrated seasonal components, is used. The training set is used to train the model and predict the data usage for the year 2022, and the test set is used to validate the final predictions for that year.
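A minimal sketch of fitting the ARIMA(2, 1, 2) and its seasonal extension with statsmodels; the series and the daily seasonal order (1, 0, 1, 24) are assumptions, since the text does not fix the seasonal terms:

```
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=730).cumsum()  # placeholder series

# ARIMA(2, 1, 2) as used for the subscriber data; the seasonal_order term
# turns it into the SARIMA variant discussed above (assumed daily cycle)
model = SARIMAX(y, order=(2, 1, 2), seasonal_order=(1, 0, 1, 24))
result = model.fit(disp=False)
print(result.summary())

forecast = result.forecast(steps=24)  # 24 h ahead
```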

#### *3.9. Model Tuning*

In contrast to the ARIMA, the CNN has several parameters that are not estimated from the data (unlike the ARIMA's p and q values), so the algorithm is trained by manually specifying a set of hyper-parameters using trial and error. The number of layers, neurons, batch sizes, and epochs utilized in the CNN are some examples of hyper-parameters that need to be tuned manually. It should also be noted that, unlike the ARIMA, which required the dataset to be transformed to its stationary format, the CNN in this research is trained on non-stationary time series patterns of the UGRansome dataset. Considering a single input of features, as depicted in Figure 11, the CNN weights these features to enable the learning trend of particular observations. We have various observations, so we merge their outputs into a hidden layer.

**Figure 11.** The CNN architecture.

Our CNN architecture uses two convolutional layers (with 70 and 30 filters) and a densely connected layer of 50 neurons with the ReLU activation function. This unit has six connected layers representing auxiliary outputs (Figure 11). Each layer produces a prediction and passes it to the next layer, which predicts the growth in subscriber data usage, until the final layer produces the long-term forecast. A sketch of this architecture is given after Table 3. The convolution layer uses the ReLU, computed as follows:

$$a_{i,k,j} = \max\left[W_k^T x_{j,i} + b_k,\; 0\right] \tag{21}$$

where *a<sub>i,k,j</sub>* denotes the activation value of the kth feature at location [j, i], *x<sub>j,i</sub>* is the input patch centered at location [j, i], and *w<sub>k</sub>* and *b<sub>k</sub>* are the weight vector and bias term of the kth filter. We use each layer to predict some additional days in advance, and a grid search to detect the optimal number of filters, convolutions, connected layers, and drop-out rate. For each layer *k* ∈ [1, 2, 3, 4, 5, 6] we added an empirical error and loss function; each layer k aims to produce forecasts for more than 14 days ahead:

$$k_{loss} = \frac{1}{n}\sum_{i}\left(y_{i,14_k} - \tanh(\hat{y}_{i,14_k})\right)^2 \tag{22}$$

where *y<sub>i,14k</sub>* represents the target of time series i at the 14-day horizon of layer k, and *ŷ<sub>i,k</sub>* the kth layer's forecast value; tanh restricts the range of *ŷ<sub>i,k</sub>* to [−1, 1]. With this, we can reformulate Equation (22) as a weighted sum and minimize the loss by decreasing the *λ<sub>k</sub>* values:

$$Min_{loss} = \frac{1}{K}\sum_{k=1}^{K}\frac{k_{loss}}{\lambda_k} \tag{23}$$

Table 3 shows a summary of the hyper-parameters used for the CNN model.

**Table 3.** CNN tuning parameters.
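A sketch of the described CNN in Keras, assuming TensorFlow; the input window length, kernel sizes, and optimizer are illustrative choices not fixed by the text:

```
import tensorflow as tf

window = 24  # assumed input window of hourly observations

# Two convolutional layers with 70 and 30 filters, then a 50-neuron
# ReLU dense layer, as described for the architecture above
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.Conv1D(70, kernel_size=3, activation="relu"),
    tf.keras.layers.Conv1D(30, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1),  # one-step-ahead throughput forecast
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```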


#### *3.10. ARIMA Predictor Model*

Given a long time series *yt*, *t* = 1, 2, ..., *T*, of spanning traffic, the aim is to devise a new scheme that works well for predicting the future outcomes H. We define *S* = [1, 2, ..., *T*] as the sequence of timestamps of the time series *yt*. The prediction problem can be written as *f*[Θ, Σ | *yt*, *t* ∈ *S*], where f is the predictor, Θ the global parameters, and Σ the covariance matrix. The time series data is divided into K sub-series with contiguous time intervals (Equation (24)):

$$S = \bigcup_{k=1}^{K} S_k, \tag{24}$$

where *Sk* contains the kth sub-series timestamps, and *T* = ∑*Tk* over *k* = 1, ..., *K*. With this assumption, the predictor estimator of the sub-problem is shown in Equation (25):

$$f\left[\Theta, \Sigma \mid y_t, t \in S\right] = g\left(f_1\left[\Theta_1, \Sigma_1 \mid y_t, t \in S_1\right], \dots, f_K\left[\Theta_K, \Sigma_K \mid y_t, t \in S_K\right]\right) \tag{25}$$

where *fk* represents the estimator function for the kth sub-series and g the combination function. The estimates are merged before the prediction; the idea is to use *g*(·) as a single mean parameter, so our computational framework can be viewed as an averaging ML algorithm (Figure 12).

**Figure 12.** The ARIMA model.

Figure 12 outlines the proposed ARIMA model to forecast the growth in subscriber data. The timestamps of historical data were recorded in the IDS before being processed by the ARIMA. In simple terms, the proposed model consists of the following phases:


We used available hourly-based timestamps to create a new set of timestamps (ts) used in the ARIMA prediction. The following formulation was used to predict new timestamps (Equation (26)):

$$Predicted_{ts} = LastEpochTimestamp + n \times 3600 \tag{26}$$

where *n* = *range*(1, 48). This computation provides new predicted timestamps on an hourly basis for the next hours; a short sketch of Equation (26) follows. Figure 13 shows the computation of the needed inputs x using the current value, step size, and stop value. The x represents current hourly values starting at zero milliseconds (ms) and going up to 731. The computation of the predicted values used x, a step size of 360,000, and a stop value. The needed value represents the result of the prediction, in which the current value of x was used as an index starting at zero and multiplied by the step size. This calculation stops at the 731st iteration, where 731 denotes the last index (Figure 13).
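Equation (26) amounts to a short loop; the last epoch timestamp below is a placeholder value:

```
last_epoch_ts = 1_640_995_200  # placeholder: last observed UNIX timestamp (s)

# Equation (26): one new hourly timestamp for each n in range(1, 48)
predicted_ts = [last_epoch_ts + n * 3600 for n in range(1, 48)]
print(predicted_ts[0], "...", predicted_ts[-1])
```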

**Figure 13.** The predictive computation.

#### *3.11. Computational Environment*

The IDS used to build the subscriber data is accessed through DBeaver, a database management tool used to monitor and manipulate the subscriber data; it can also be used to build analytical dashboards from various data stores. Table 4 presents the computing environment and Table 5 the feature extraction results.


**Table 4.** Framework specification.

#### *3.12. Feature Extraction*

There are various causes of duplication in a dataset, among which are imperfections in the data collection process and the properties of the patterns; feature extraction resolved these redundant dimensions. Features projected into a new space have lower dimensionality. Examples of such techniques include Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), and PCA [40]. We used PCA to extract relevant features of the UGRansome dataset. The PCA lowered computational complexity, helped build generalizable models, and optimized the storage space. To address the redundancy issue, the PCA selected a subset of relevant patterns from the original dataset based on their relevance.

We present the PCA results in Table 5, where the final dataset with a description of each attribute is given. The prediction attribute facilitates the forecasting of any ML model in predicting the category of a novel intrusion. Our final dataset has 12 variables with 180,564 observations (Table 5). If the deviation degree of a variable is high or low enough, it is considered an abnormality. However, we did not apply feature extraction to the subscriber data because it has no redundant patterns. Feature extraction on the UGRansome led to improved performance, higher prediction accuracy, reduced computational time, and better model interpretability.


**Table 5.** Extracted UGRansome features.

#### **4. Results**

This section is structured into the Dickey-Fuller Test (DFT) results, the comparative results of ARIMA and CNN, the execution speed test, and a comparison with additional standard forecasting approaches such as BATS and TBATS. The section compares the ARIMA model performance on the subscriber and UGRansome datasets. Figure 14 shows the subscriber data following a stationary distribution compared to the UGRansome timestamps.

**Figure 14.** The timestamp and density comparison.

Figure 15 portrays a distribution of incoming and outgoing throughput of the subscriber data compared to the UGRansome port traffic (5066–5068).

Each attack flow is also depicted; the figure shows the NerisBotnet threats with the least traffic. This result reveals a time series forecasting property of the subscriber data but not of the UGRansome. However, the UGRansome has more distributed or dependent variables (Figure 16).

**Figure 15.** Additional features comparison.

**Figure 16.** The normal Q-Q results.

The correlation of throughput is in Figure 17.

**Figure 17.** The throughput correlation.

This plot indicates the linear distribution of predicted values. The summary of the SARIMA model using the subscriber data is presented in Figure 18.


**Figure 18.** The SARIMA model summary.

The summary confirms that the prediction of subscriber data will have an increased mean or standard deviation, given the likelihood, Prob(Q), and Kurtosis values. The SARIMA model reports no outliers for the subscriber data given the Kurtosis value, and the JB test describes a normal distribution. Similarly, the level of heteroscedasticity supports our hypothesis, with a degree of 1.35 and a *p*-value of 0.112 at a likelihood value of −14,399.47. With these results, we reject the null hypothesis and accept the alternative; the former denies the growth of subscriber data, while the latter accepts it. In the next section, we compare the subscriber data with the UGRansome using the DFT results.

#### *4.1. Dickey Fuller Test*

The DFT results are in Table 6. A *p*-value of 0.007 with an accuracy of 90% was obtained. This result supports our hypothesis predicting an increase in subscriber data growth. In addition, (i) the UGRansome needed more iterations because its size surpasses that of the subscriber dataset, (ii) the balanced accuracy of the DFT reached 81%, and (iii) the dataset size had no effect on the prediction performance. The standardized residual and correlogram are in Figure 19, with the seasonality of the throughput.

**Figure 19.** The standardized residual and correlogram.


**Table 6.** The DFT results.

The data usage growth prediction is illustrated in Figure 20, where the ARIMA model predicts a growth of 3 Mbps at a specific timestamp. UNIX timestamps of the subscriber data were used to predict the maximum data growth with the ARIMA model.

**Figure 20.** The ARIMA prediction.

In Figure 21, ARIMA predicts a maximal subscriber data usage growth of 14 Gbps (where blue denotes actual data, orange is the predicted ARIMA data, and green is the future predicted values).

The predicted mean and standard deviation are in Figure 22.

The original data represents current values, from which ARIMA approximated a mean and standard deviation. ARIMA predicted a maximum mean value of 5 Mbps with a standard deviation of 2 Mbps; both are lower than the original (current) values. The scatter plot of the testing set is shown in orange, with 32 data points plotted or predicted (Figure 23).

**Figure 21.** The maximum data usage prediction.

**Figure 22.** The prediction of the mean and standard deviation.

**Figure 23.** The throughput prediction.

The mean of the testing set is shown in red, while the training set mean is black. The training set has more data points than the testing sample due to the cross-validation process. The prediction set has only 28 data points, and the forecast covers 24 h ahead; on average, 3 Mbps is predicted for the next 24 h (Figure 23). The ARIMA code used to generate the results is as follows:

```
import pandas as pd
import matplotlib.pyplot as plt

def forecast(ARIMA_model, periods=0):
    n_periods = periods
    # pmdarima-style predict: point forecasts plus confidence intervals
    fitted, interval = ARIMA_model.predict(n_periods=n_periods,
                                           return_conf_int=True)
    # df is the subscriber DataFrame with a datetime index (loaded earlier)
    index = pd.date_range(df.index[1], periods=n_periods, freq="H")
    fitted_series = pd.Series(fitted, index=index)
    lower_series = pd.Series(interval[:, 0], index=index)
    upper_series = pd.Series(interval[:, 1], index=index)
    plt.plot(fitted_series)
    plt.fill_between(lower_series.index, lower_series, upper_series,
                     alpha=0.15)
    plt.show()

forecast(ARIMA_model, 730)
ARIMA_model.summary()
```
#### *4.2. The Comparative Results of ARIMA and CNN*

The CNN is compared to the ARIMA using the subscriber and UGRansome datasets; the CNN depends on the predicted timestamps. In what follows, we present the CNN results compared to those obtained by the ARIMA. The prediction covers 30 to 60 days. The comparative results of the ARIMA and CNN models are in Table 7.



The *p*-value is a number between 0 and 1 and can be interpreted as follows:

• a small *p*-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so the null hypothesis is rejected;
• a large *p*-value (> 0.05) indicates weak evidence against the null hypothesis, so it fails to be rejected.


The reported *p*-values reject the null hypothesis stating a decrease in subscriber data usage. Moreover, the CNN is compared to the ARIMA using the experimental datasets. We used four samples: the first was the subscriber dataset, where the ARIMA model obtained 92% accuracy and outperformed the CNN; the second was the UGRansome dataset containing more features, where the ARIMA surpassed the CNN with 91% accuracy; the third was the testing sample of the subscriber data, where the ARIMA achieved 94%; and in the last sample, the ARIMA outperformed the CNN with 95% accuracy. Overall, the ARIMA model achieved the best results in all comparisons. The ARIMA model performed better with the UGRansome data, which we attribute to the nature of seasonal network traffic. We computed our models on fewer features of the subscriber data without producing poor results; we believe this is due to time series data properties, which improved the balanced accuracy to 93%.

#### *4.3. Execution Speed Test*

The results showed that the CNN did not outperform the ARIMA in terms of accuracy, while the ARIMA performed better than the CNN in terms of execution speed by a factor of 43 for more than 80,000 rows. Table 8 summarizes the speed test results for both models.

**Table 8.** Computational speed test results.


Python's time package was used before and after each model's execution to measure how long it took to complete the entire procedure, with the results reported right after each execution completed. The data points for each model were then plotted in multiples of ten (10 rows, 100 rows, 1000 rows, etc.), giving a general notion of how each model's execution time grows as the number of rows increases. Furthermore, this test was conducted using the respective parameters of the SARIMA and CNN models. From the experimental results, it is evident that on average the ARIMA model outperforms the CNN in terms of execution speed. Table 8 showcases the speed test results for both models up to 100,000 rows; the ARIMA outperforms the CNN in execution speed. Table 8 shows linear growth for both models, so the average rate of change can be determined by computing the following slope for each graph (Figure 24):

$$\mathrm{Slope} = \frac{\Delta y}{\Delta x}; \qquad \mathrm{Slope}_{ARIMA} = \frac{159.87 - 0}{100{,}000 - 0} = 0.0016; \qquad \mathrm{Slope}_{CNN} = \frac{6951.91 - 0}{100{,}000 - 0} = 0.0696$$

**Figure 24.** The slope computation.
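The timing procedure can be sketched with Python's time module; run_model is a stand-in for fitting either model on the first n rows:

```
import time

def run_model(n_rows):
    # Placeholder for fitting the ARIMA or CNN on the first n_rows of data
    return sum(i * i for i in range(n_rows))

for n_rows in [10, 100, 1_000, 10_000, 100_000]:
    start = time.perf_counter()
    run_model(n_rows)
    elapsed = time.perf_counter() - start
    print(f"{n_rows:>7} rows: {elapsed:.4f} s")
```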

What can be deduced from the slopes is that, on average, it takes 0.0016 s for the ARIMA model to execute one row and 0.069 s for the CNN to execute the same row, making the ARIMA roughly 43× (0.069/0.0016) faster than the CNN model. Nevertheless, different parameters will yield different results, and it can be argued that further tuning of the CNN would yield better results. To put it into perspective, it took under three minutes for the ARIMA to successfully execute 100,000 rows of data, compared to the CNN, which took nearly two hours to complete a similar task; this margin will only widen as more rows are introduced for training. In addition, both models must be fed a sufficient amount of data to train successfully and produce the best results, and additional yearly data would provide findings that would boost the study's credibility as well as point out parameter tuning issues for both models. Finding the ideal parameters was challenging in light of the above, especially with the CNN model; one option used in this research was to draw ideas from other models used in related contexts, in addition to rules of thumb for fine-tuning. Fortunately, this was not the case with the ARIMA model, since its tuning process was relatively simple and not based on trial and error as with the CNN. However, this process can be further improved for the ARIMA by a simple automated step-wise search using the Akaike Information Criterion (AIC). Developed by the Japanese statistician Hirotugu Akaike [51], the AIC is used to evaluate various candidate model parameters and choose the set that best fits the data. An AUTO ARIMA() function can be imported from Python's PMDARIMA library and used to compute the AIC for the ARIMA model. The main goal is to find the ARIMA parameters with the lowest AIC. For example, the ARIMA parameters for the patterns in the experimental datasets were (2, 1, 2); however, if we perform an automated step-wise search using AUTO ARIMA in Python (Figure 25), the parameters would be (2, 0, 0), which has the lowest AIC value at −103.947.


**Figure 25.** Step-wise AIC results using subscriber data.
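The step-wise AIC search is exposed by pmdarima's auto_arima function; a minimal sketch on a placeholder series:

```
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(0)
y = rng.normal(size=730).cumsum()  # placeholder for subscriber throughput

# Step-wise search over (p, d, q), ranking candidate models by AIC
model = pm.auto_arima(y, stepwise=True, information_criterion="aic",
                      seasonal=False, trace=True)
print(model.order, model.aic())
```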

It is thus safe to conclude that future ARIMA models can be tuned using the automated AIC step-wise search [51]. Due to our limited experience with constructing CNN models from scratch, the TensorFlow library was used to put the model into practice. Unfortunately, due to the high level of abstraction of the library, it can be very challenging to exercise full control over models built with TensorFlow. For example, one cannot directly pin the weights of the model, which results in mild changes in the final results every time the model is run because of how TensorFlow handles weights in the background; it is therefore unclear precisely what type of network is being constructed, which can create uncertainty about the performance of the model. Other libraries, such as PyTorch and Theano, exhibit similarly high levels of abstraction.

#### *4.4. ARIMA, CNN, BATS, and TBATS Comparison*

We compared the obtained results with standard forecasting methods such as BATS and TBATS. BATS is an exponential smoothing technique that handles non-linear data; its advantages are that it can treat non-linear patterns, resolve the auto-correlation issue, and account for multiple seasonalities. However, BATS is computationally expensive with a large seasonal period and hence is not suitable for hourly data. The TBATS model was developed to address this limitation: it represents each seasonal period with a trigonometric representation based on the Fourier series, which allows the model to fit large seasonal periods. It is thus a better choice when dealing with high-frequency data and usually fits faster than BATS. Figure 26 shows the comparative results with the testing sample and the original subscriber data (baseline). The BATS and TBATS predictions are also illustrated in Figure 27; the TBATS outperformed the BATS with a Mean Absolute Percentage Error (MAPE) of 32.63 (Figure 27). This metric defines the accuracy of the forecasting method: the MAPE is the average of the absolute percentage errors of each entry in a dataset, measuring how accurate the forecasted quantities were in comparison with the actual quantities. A maximum value of 6.5 KBS was predicted by the TBATS in terms of network traffic volume (Figure 26). Even so, the ARIMA model still predicted more subscriber data usage growth than the BATS and TBATS models. This is because ARIMA models have a fixed structure, are specifically built for time series or sequential features, and were applied with their predefined parameters unmodified.

**Figure 26.** Standard forecasting approach comparison.
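For reference, the tbats Python package implements both models; a minimal sketch, with the daily and weekly seasonal periods being assumed values:

```
import numpy as np
from tbats import TBATS

rng = np.random.default_rng(0)
y = rng.normal(size=730).cumsum()  # placeholder hourly series

# Trigonometric seasonality lets TBATS handle long seasonal periods cheaply
estimator = TBATS(seasonal_periods=[24, 168])  # daily and weekly cycles
model = estimator.fit(y)
forecast = model.forecast(steps=24)
print(forecast[:5])
```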

#### *4.5. Recommendation*

In general, the ARIMA model achieved the best predictive accuracy results on the subscriber data. With regard to model execution speed, it was posited that the ARIMA would perform better than the CNN due to the CNN's sequential weight computation for each hidden layer; however, it was not expected to outperform the CNN by such a large margin, where there was no need to validate future performance using Student's *t*-test. Nonetheless, there are various ways to improve the execution time of the CNN:


4. Similar to decreasing the number of neurons, decreasing the number of epochs will also reduce the final model run time, though at the expense of accuracy, since reducing the number of epochs too far results in underfitting the model.

#### **5. Conclusions and Discussion**

Insights retrieved from subscriber data impact the telecommunication landscape by facilitating information management and assisting decision-makers in predicting the future using ML techniques. We explored time series forecasting and predicted subscriber usage trends on the network using the ARIMA model. The unknown forecast value used by the ARIMA relied on historical data; we used the data storage to build the subscriber dataset from hourly traffic statistics. We used various metrics to evaluate the ARIMA model, including the normal Q-Q, standardized residual, theoretical quantile, correlogram, and accuracy. The UGRansome dataset was used to compare the obtained results, which demonstrate similar accuracy values of 90% with the CNN model. The subscriber data was stationary but exhibited less seasonality. In the experimentation, the ARIMA was compared to the CNN and achieved the best results with the UGRansome data. We used an NSDM environment with subscriber data in a secure setting to retrieve relevant patterns, such as timestamps and incoming/outgoing throughput, to build the subscriber dataset. The variation of the auto-regressive and moving average components identified the most optimal features for obtaining precise predictive values. In addition, the subscriber data have normal distributions, but the UGRansome has more dependent variables. The ARIMA model predicted a growth of 3 Mbps with a maximum data usage growth of 14 Gbps. Furthermore, in terms of accuracy, the ARIMA was superior to the CNN, and in terms of execution speed, the ARIMA outperformed the CNN by a 43:1 ratio for 100,000 rows. However, we recommend utilizing either method depending on whether speed or accuracy is the priority. In the future, we will explore additional forecasting models combining classical mathematical algorithms and compare their performance with Neural Networks, specifically Recurrent Neural Networks (RNN), using ensemble learning approaches. Lastly, it would also be worthwhile to explore other factors that affect subscribers' data usage, such as through multivariate forecasting.

**Funding:** The author wishes to extend his sincere appreciation and gratitude to the Editor-in-Chief for the invitation to contribute this article, which was granted a fee waiver discount.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset and code used can be obtained upon request or downloaded at https://www.researchgate.net/publication/342200905_An_Ensemble_Learning_Framework_for_Anomaly_Detection_in_the_Existing_Network_Intrusion_Detection_Landscape (Public Files/Ugransome.zip and subscriber data.csv). The code is under (Public Files/ARIMA). Accessed on 12 December 2022.

**Acknowledgments:** The author would like to thank Jacobus Phillipus van Deventer from the Faculty of Informatics, University of Pretoria, who supervised this research, and Maven Systems Worx (Pty) Ltd. for granting access to the subscriber data.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:



#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Investigating Metals and Metalloids in Soil at Micrometric Scale Using μ-XRF Spectroscopy—A Case Study**

**Sofia Barbosa 1,\*, António Dias 2, Marta Pacheco 3, Sofia Pessanha 2 and J. António Almeida 1**


**\*** Correspondence: svtb@fct.unl.pt; Tel.: +351-212-948-573

**Abstract:** Micrometric 2D mapping of distinct elements was performed on distinct grain-size fractions of a soil sample using the micro-X-ray fluorescence (μ-XRF) technique. The sample was collected in the vicinity of São Domingos, an old mine of massive sulphide minerals located in the Portuguese Iberian Pyrite Belt. As expected, high concentrations of distinct metals and metalloids, associated with the existing natural geochemical anomaly, were detected. Clustering and k-means statistical analyses were developed considering Red–Green–Blue (RGB) pixel proportions in the produced 2D micrometric image maps, allowing for the identification of elemental spatial distributions in 2D. The results evidence how elemental composition varies significantly at the micrometric scale per grain-size class, and how chemical elements present irregular spatial distributions in direct dependence on distinct mineral spatial distributions. Due to this fact, elemental composition is more differentiated in coarser grain-size classes, whereas the ground and milled fraction does not always represent the average of all partial grain-size fractions. Despite the complexity of the performed analysis, the achieved results evidence the suitability of μ-XRF for characterizing natural, heterogeneous, granular soil samples at the micrometric scale, making it a very promising high-resolution investigation technique.

**Keywords:** soil matrix; metal distribution per grain fraction; micro-X-ray elemental mapping; RGB clustering image analysis; k-means

#### **1. Introduction**

Quantification, imaging, and data processing of micro-X-ray fluorescence (μ-XRF) outputs presently constitute an interesting but also very challenging area of investigation. To obtain the elemental distribution of a sample, specific instrumentation that provides precise positioning and good energy resolution must be used. Micro-XRF imaging spectrometers rely on scanning samples along the X and Y directions, with a micro-X-ray beam irradiating a region of interest (ROI) point by point [1]. Recent developments in μ-XRF consider quantitative analysis using fundamental parameter-based 'standardless' quantification algorithms [2,3].

The works developed by [2,4–6] evidence the suitability of this technique for various applications within the earth sciences. Further, 2D high-resolution chemical distribution maps can be used as qualitative multi-element maps or as semiquantitative single-element maps through which bulk and phase-specific geochemical data sets can be established [4].

In [2], the authors discuss the accuracy and precision of these quantitative analyses by using a simple-type calibration against a certified reference material of similar matrix and composition. μ-XRF is a non-destructive technique and leaves samples intact for other types of analyses, such as Raman spectroscopy or X-ray diffraction, which allow for the characterization of molecular components [7]. The use of μ-XRF in conjunction with these established methods of molecular analysis allows for a more complete characterization of grains and particles [2,8,9]. Heterogeneous samples, such as soils, are much harder to characterize. Both single-particle and bulk analyses must be performed on sample specimens to ensure a full description by μ-XRF [8]. Using μ-XRF to analyse bulk soil samples necessarily requires clear elemental identification and distinction between the different occurring grades [10]. Quantification of soil data by μ-XRF is still a topic of considerable research interest and has been reported in only a limited number of publications [11,12]. Recent research studies show how statistical and geostatistical techniques can be applied to correlate distinct imaging results [13] and how it is already possible to generate 3D maps of chemical properties at the micrometric scale by combining 2D SEM-EDX data with 3D X-ray computed tomography images [14–16]. The effectiveness and potential of integrating micro-X-ray and SEM techniques are also well demonstrated by distinct researchers, even in the case of very irregular, porous matrixes [14–16]. In fluorescence microscopy, colocalization refers to the observation of spatial overlap between two (or more) different fluorescent labels, each having a separate emission wavelength. Ref. [13] discussed co-localization analysis, in the context of increasingly popular super-resolution imaging techniques, in terms of occurrence versus correlation, although this limits pixel-based image-processing techniques. Ref. [15] developed a method to generate 3D maps of soil chemical properties at the microscale by combining 2D SEM-EDX data with 3D X-ray computed tomography images. The spatial correlation between the X-ray grayscale intensities and the chemical maps made it possible to use a regression-tree model as an initial step to predict 3D chemical composition.

**Citation:** Barbosa, S.; Dias, A.; Pacheco, M.; Pessanha, S.; Almeida, J.A. Investigating Metals and Metalloids in Soil at Micrometric Scale Using μ-XRF Spectroscopy—A Case Study. *Eng* **2023**, *4*, 136–150. https://doi.org/10.3390/eng4010008

Academic Editor: Antonio Gil Bravo

Received: 4 November 2022 Revised: 20 December 2022 Accepted: 21 December 2022 Published: 2 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Bulk-sample analysis is a test method used when individual particulate samples are not representative or cannot be obtained for a certain type of material. Particulate products, such as soils, granulated powders, dusts, or foodstuffs, are usually analysed through bulk-sampling principles [8]. The microscopic analysis of a heterogeneous matrix, such as bulk soil samples, with μ-XRF is complex but has unique potentialities.

The present work is an introductory study in which 2D image clustering analysis based on μ-XRF XY scanning maps of a soil sample was performed. The case study, a soil sample denominated SD1, was collected at the former mine of São Domingos in Mértola, Portugal (Figure 1). The São Domingos mine is located in the Iberian Pyrite Belt (IPB). It is a world-renowned massive sulphide ore deposit, mainly exploited for its copper contents; high concentrations of As, Zn, and Pb are also found. Its exploitation started prior to the Roman occupation period, mainly for Au, Ag, and Pb. Due to the mine's extensive exploitation over the centuries, the area is filled with very heterogeneous mining waste. Natural gossan (iron cap) deposits and the natural local mineralogy result in the generation of heterogeneous soils with high contents of several heavy metals and metalloids. At this mining site, the geology is dominated by greywackes and quartzwackes, quartzites, phyllites, and schists, forming the "Baixo Alentejo" Flysch Group, turbidites, and a volcano–sedimentary complex. The lithostratigraphic units range mainly from the Devonian to the Carboniferous periods [17,18]. Due to its mining context and its local geology, the most common elements found in the soils around the mining area are, mainly, Al, Si, S, Ti, Mn, Cr, Fe, Cu, Zn, As, Ga, Pb, Sb, and Hg [19,20].

**Figure 1.** Location of São Domingos mine and location of the collected SD1 sample. Left figure is adapted from [17].

#### **2. Materials and Methods**

#### *2.1. Sampling and Sample Preparation*

The soil sample was collected with the aid of a small shovel, scooping the surface soil to a depth of about 10 to 20 cm. About 1.50 kg of material was collected, stored, and labelled adequately. SD1 consists of a reddish-brown soil with small to large particles (Figure 2). The sample was sieved into four grain-size classes: ≥2 mm to <3 mm, <2 mm to ≥500 μm, <500 μm to ≥250 μm, and <250 μm. A ground and milled bulk sample (TM, "Total Milled") was also prepared. Depending on the availability of material, and using a manual benchtop press, two to five pellets were made from each granulometry-size fraction and TM. Table 1 shows the number of pellets analysed by category. These pellets were analysed with a benchtop micro-XRF spectrometer, the M4 TORNADO by Bruker (Billerica, MA, USA).

**Figure 2.** Pellet of an original SD1 sample of grain size fraction "<2 mm to ≥500 μm" (image source: Bruker's M4 TORNADO camera).


**Table 1.** Number of pellets by category.

#### *2.2. Micro X-ray Fluorescence Multi-Point Measurements and 2D Image Mapping*

The micro-X-ray fluorescence technique is applied by means of the energy-dispersive spectrometer M4 TORNADO by Bruker. This instrument consists of a low-power X-ray tube with a Rh anode, which was operated in this case study at 50 kV and 300 μA. Placed after the X-ray tube, a poly-capillary lens focuses the beam to a spot size that can go down to 25 μm for Mo-Kα. This way, by selecting an area of the sample, point-by-point measurements can be performed and images of elemental distributions within the sample are generated.

In the case study, the pellets were analysed making use of an AlTiCu 100/50/25 μm filter composition. For elements emitting radiation from 5 to 35 keV, it is adequate to use filters that can lessen the effect of the Bremsstrahlung radiation that contributes to background radiation [21]. Therefore, for SD1, the filter composition mentioned above was used due to the presence of elements with an atomic number (Z) greater than 21, i.e., from titanium (Ti) to yttrium (Y), which were identified in a primary analysis without filters.

The measurements were taken under 20 mbar vacuum conditions (to improve detection limits), with a step size of 15 μm and an acquisition time of 10 ms per spectrum, requiring, on average, 1 h 30 min to ensure a high-resolution 2D map for each element.

Data treatment of micro-2D mapping was performed using the M4 TORNADO inbuilt software MQuant.

Due to the large amount of data obtained, only one pellet from each of the sample categories—TM, ≥2 mm to <3 mm, <2 mm to ≥500 μm, <500 μm to ≥250 μm, and <250 μm—was chosen for the 2D map surveys.

#### *2.3. Two-Dimensional Image Mapping Processing: Clustering RGB Pixel Analysis*

μ-XRF 2D mapping outputs consisted of 2D image files. The processing possibilities for these image files mainly concern pixel quantification and statistical analysis of pixel distributions. In this case study, each image refers to the spatial distribution of a certain element, whose occurrence and concentration are locally represented by a certain intensity of a certain RGB (Red, Green, Blue) colour. The highest elemental concentrations are represented by the lightest RGB colour proportions (Figure 3).

**Figure 3.** μ-XRF 2D mapping outputs for the element Iron (Fe). (**a**) Grain-size distribution: ≥2 mm to <3 mm; (**b**) <2 mm to ≥500 μm; (**c**) <250 μm.

Pixel proportion quantifications per distinct RGB colour intensity were established with the R package countcolors [22–25]. This package was originally developed with the aim of quantifying the area of white-nose syndrome infection on bat wings [25]. The countcolors package allows users to quantify regions of an image by distinct colours. It counts colours within specified colour ranges in image files and provides a masked version of the image with the targeted pixels changed to a different colour selected by the user. This package integrates techniques from image processing without using any machine learning, adaptive thresholding, or object-based detection, which makes it reliable and easy to use but limited in terms of application.

The principle of the image-processing analysis consisted of considering each RGB colour in three dimensions, where each colour is defined by its coordinates on the R (red), G (green), and B (blue) axes. The range of each RGB colour is, thus, interpreted in a 3D space (Figure 4a). The quantitative RGB pixel analysis performed for each 2D image begins with the verification of the level of similarity of colour intensities according to their respective RGB codes. Each RGB code represents a certain RGB cluster (and, thus, a certain colour intensity). RGB pixels per cluster are counted in samples of 10,000 pixels from the 2D image. For each RGB code representing a certain cluster, the respective frequencies are calculated (Figure 4b,c). Figure 4 presents an example of the pixel-counting frequencies for six distinct colour clusters representing the concentrations of the element Fe. The pixels of the lighter-colour clusters represent the locations with the highest concentrations of Fe. The number of clusters and the number of sampling pixels are established by the user.

**Figure 4.** RGB clustering analysis of a μ-XRF 2D map (element: Fe; pellet of a bulk sample). (**a**) RGB counting colours in three dimensions (sample size n = 10,000 pixels); (**b**) Pixel classification in 6 clusters; (**c**) RGB pixel proportions for each cluster.
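The following Python sketch illustrates the clustering step just described: sampling 10,000 pixels from a single-element 2D map, clustering their RGB coordinates with k-means, and reporting the pixel proportion per cluster. The study itself used the R countcolors package, so this is only an analogous re-implementation; the file name and the choice of six clusters follow the example of Figure 4 rather than a fixed protocol.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# Load a single-element 2D map exported as an RGB image (path is illustrative).
img = np.asarray(Image.open("fe_map.png").convert("RGB"), dtype=float)
pixels = img.reshape(-1, 3)

# Sample 10,000 pixels, as in the procedure described above.
rng = np.random.default_rng(seed=0)
sample = pixels[rng.choice(len(pixels), size=10_000, replace=False)]

# Cluster the sampled RGB coordinates into six colour-intensity clusters.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(sample)

# Frequency of pixels per RGB cluster; lighter clusters (higher mean RGB)
# correspond to higher elemental concentrations.
labels, counts = np.unique(kmeans.labels_, return_counts=True)
for label, count in zip(labels, counts):
    mean_rgb = kmeans.cluster_centers_[label].mean()
    print(f"cluster {label}: {count / len(sample):.1%} of pixels, "
          f"mean RGB {mean_rgb:.0f}")
```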

One of the main objectives of this case study was to estimate the areas that are associated with a certain range of RGB pixels. The light-colour ranges that are associated with the highest colour intensities represent the highest elemental concentrations. In the adopted methodology, after selecting the colour clusters that are most representative of a certain element occurrence, their respective areas are estimated. The processed images always integrate degrees of intensity of a unique colour, which relates to a certain element to be identified. The element occurrence is represented by the light-coloured clusters in each colour image. In [5], following the principles described in [23–25], the authors defined an analysis methodology based on two options: one that considers upper and lower limits for each colour range, where a box-shaped border is drawn around the region of that range (rectangular range), and a second option that considers the selection of a certain central colour and a search radius around it, where a "sphere" for the considered colour range is drawn (spherical range). Due to the possibilities of applying distinct criteria, estimated area calculations are reported in terms of percentages of minimum and maximum probable areas (Figure 5). In fact, the calculated areas have distinct possibilities, directly dependent on the number of colour clusters and the search criteria, which are, in turn, user defined. Due to these distinct possibilities, it is more correct to suggest a range of probable estimated areas than to present only a single estimated area. For this, the adopted methodology integrates the possibility of applying the search criteria to one, two, or three colour clusters simultaneously (Figure 5). When two or three colour clusters are considered, a search radius is applied to each colour. For minimum area calculations, it is advisable to consider the "one colour cluster" (with spherical or rectangular search criteria) or "two colour clusters" procedures. To calculate the maximum probable estimated areas, it is advisable to consider "three colour clusters" simultaneously.

**Figure 5.** Methodology applied to estimate minimum and maximum probable elemental occurrence in a μ-XRF 2D map (example of element Fe in a bulk pellet sample).
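As a sketch of the two search criteria described above, the Python functions below implement a rectangular colour range (lower and upper RGB bounds) and a spherical range (a central colour plus a search radius), in the spirit of the countcolors options. The specific bounds, centres, and radius are illustrative assumptions, and the random image stands in for a real 2D element map.

```python
import numpy as np

def rectangular_mask(img, lower, upper):
    """Pixels whose R, G, and B values all fall within the box-shaped range."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return np.all((img >= lower) & (img <= upper), axis=-1)

def spherical_mask(img, center, radius):
    """Pixels within a Euclidean radius of a central colour in RGB space."""
    return np.linalg.norm(img - np.asarray(center), axis=-1) <= radius

# Stand-in for an (H, W, 3) element map; the values below are illustrative.
img = np.random.default_rng(0).uniform(0, 255, size=(100, 100, 3))

# Minimum probable area: a single light-colour cluster (rectangular range).
min_area = rectangular_mask(img, (200, 200, 200), (255, 255, 255)).mean()

# Maximum probable area: three colour clusters combined (spherical ranges).
max_area = (spherical_mask(img, (230, 230, 230), 60)
            | spherical_mask(img, (190, 190, 190), 60)
            | spherical_mask(img, (150, 150, 150), 60)).mean()

print(f"probable occurrence area: {min_area:.1%} (min) to {max_area:.1%} (max)")
```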

This methodology allows one to accomplish a semi-quantitative analysis of the μ-XRF 2D mapping images. Uncertainty is mostly associated with the clustering classification and search criteria, which are user defined. The described methodology has already been applied to granular mining-waste samples [5] and to a nepheline syenite rock sample in order to identify incompatible and scarce metals at the micrometric scale [5]. The results evidence the potential of this methodology for interpreting elemental μ-XRF 2D mapping images of materials with heterogeneous granular textures, such as soils and mining wastes, and it is also quite promising for the elemental and mineral identification of distinct rock matrixes [5,6].

#### **3. Results—Elemental μ-2D Mapping Distributions**

Through multi-point measurement analysis, it was possible to identify the following elements in sample SD1, per size-fraction class: aluminium (Al), silicon (Si), potassium (K), calcium (Ca), titanium (Ti), manganese (Mn), iron (Fe), nickel (Ni), copper (Cu), zinc (Zn), gallium (Ga), arsenic (As), rubidium (Rb), strontium (Sr), and yttrium (Y). Figure 6 presents the results of the methodology applied to the element Fe: estimations of the minimum and maximum probable Fe occurrence in the μ-XRF 2D maps. Analogous results are presented for the elements Ca, Mn, Cu, Zn, and As in Appendix A.

**Figure 6.** Minimum and maximum probable elemental occurrence in μ-XRF 2D map (percentage of area %) for Fe.

As can be observed, the difference in spatial distribution patterns and in the estimated minimum and maximum elemental quantities according to grain-size fraction is clear. Further, the patterns of TM (ground and milled) are most similar to those of the grain-size fraction "<250 μm". This behavioural pattern can be observed for most of the analysed elements (Appendix A). Quantities per element are estimated as a percentage (%) of the total mapped area and vary according to grinding, milling, and grain-size fraction (Figure 6, Appendix A and Figure 7). Bulk milled samples do not always represent the average of the distinct size fractions. In fact, some elements tend to be present in distinct estimated quantities in the coarser grain-size fractions, such as "≥2 mm to <3 mm" and "<2 mm to ≥500 μm" (Figure 6, Appendix A and Figure 7). These two facts indicate that the occurrence of some elements depends directly on the mineralogy and, in turn, on its more representative granulometry. Table 2 includes a summary of the minimum and maximum elemental occurrence in the μ-XRF 2D maps (percentage of area, %) for the elements Al, Si, K, Ca, Ti, Mn, Fe, Ni, Cu, Zn, Ga, As, Sr, and Y.

Elements present in higher estimated percentages evidence the influence of the local geology on the soil's constitution [17,26–28]. Figure 8 presents some of the most representative results considering the maximum estimated percentages of elemental occurrence area (%). The elements presented in this figure (Si, Al, Cu, Zn, Ca, K, Ti, Fe, As, Ga, and Mn) are grouped according to their respective percentages of occurrence area (%). The results reflect not only the natural composition of the soil (Si, Al, Ca, K, Ti) but also the presence of natural geochemical anomalies, which are related to the existence of massive sulphide ore deposit minerals, increasing the percentages of occurrence of Cu, Zn, Fe, As, Ga, and Mn, among other elements. Apart from Si and Al, the elements Cu, Zn, Ca, K, Ti, Fe, As, Ga, and Mn present specific spatial distribution patterns. In the case of Fe, As, Ga, and Mn, the dependence on coarser minerals is quite evident. Spatial overlap of the elements according to mineralogy can also be observed. In this context, the spatial overlap of Fe, As, and Ga is an example and is a consequence of the local geochemistry and mineralogy, which include iron oxides and sulphides [17,26–28]. Simultaneously, the joint presence of As and Fe can be explained by the existence of arsenic-bearing sulphides, such as arsenopyrite, or sulphosalts. The presence of Ga in the soil is usually connected with the occurrence of silty minerals. Ga tends to be sorbed by Fe(III) and Mn(III) oxides [29,30] and occurs as an impurity in iron oxides, hydroxides, and sphalerite minerals, which can explain the spatial correspondence between Fe and Ga in the SD1 sample.

**Figure 7.** Minimum and maximum elemental occurrence in μ-XRF 2D map (percentage of area %) for Fe, Ca, Cu, and As.


**Table 2.** Synthesis of estimated minimum and maximum elemental occurrence (percentage of area %) for Al, Si, K, Ca, Ti, Mn, Fe, Ni, Cu, Zn, Ga, As, Sr, and Y.

**Figure 8.** Image 2D micrometric maps of the elements (**a**) Zn, Ca (**b**) Si, Al, Cu (**c**) K, Ti (**d**) Fe, As, Ga (**e**) Mn in sample SD1, grain-size fraction "≥2 mm to <3 mm", and correspondent maximum estimated percentages of occurrence area (%).

#### **4. Discussion and Conclusions**

Elemental 2D spatial mapping through micro-XRF spectroscopy is a promising technique for the detailed study of granular heterogeneous samples, such as soils and mining wastes [31–33]. In this exploratory study, a clustering image-analysis methodology was applied to detect elemental distributions at the micrometric scale according to distinct colour intensities. The results provide accurate information on the elemental distribution per grain fraction, offering clues to its geochemical occurrence (mainly primary in the coarse grain-size fractions and secondary in the finer fractions). Results are more regular and similar between the distinct fraction samples and the milled sample when the element occurs at lower granulometries. The results showed that the elemental spatial patterns per grain-size fraction are not always coincident with, or similar to, the spatial patterns of the ground and milled samples, showing that, in some cases, elemental distribution depends on a specific mineralogy, which can have its own grain-size distribution pattern according to the geochemical characteristics of the site. Some metals show distinctive percentages of occurrence according to grain-size fraction. Metal occurrence in the milled fractions does not always correspond to the average of the grain-size fractions. Certain elements tend to be present in higher quantities in the coarse fractions, mainly 2–3 mm, while other elements tend to be present in the smaller grain-size fractions (<250 μm). This depends on the mineralogy and the specific geochemical behaviour, especially mobility, of the elements. Ultimately, the mobility and geochemical source of an element (primary or secondary) dictate its specific spatial patterns at the micrometric scale.

In general, the minimum and maximum elemental estimations from the 2D maps show a tendency towards greater discrepancies when the element is more abundant and widespread in the matrix. Examples are the elements Si, K, Zn, and Sr, and the finer grain-size fractions of Fe. The major discrepancies in measurements are due to the greater difficulty in fixing the characteristic degree of colour intensity that marks the occurrence of the element, and the distance between the distinct colour-intensity degrees, which may make the clustering classification difficult. In this context, the joint interpretation of 2D images to estimate 3D grades is currently an emerging research area [13,14,31–33] that would represent a quite interesting upgrade of this investigation.

The exploration of data image-analysis techniques able to identify elemental spatial overlaps in μ-XRF 2D map surveys and the estimation of grain-size distributions per element, or per group of elements, are two promising areas for further investigation in granular and heterogeneous samples, such as soil samples.

**Author Contributions:** Conceptualization, S.B. and A.D.; methodology, S.B. and A.D.; software, S.B. and M.P.; validation, S.B., A.D., S.P. and J.A.A.; investigation, S.B., A.D. and M.P.; writing—original draft preparation, S.B.; writing—review and editing, A.D., S.P. and J.A.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by FCT-Fundação para a Ciência e a Tecnologia, Portugal, grant numbers UIDB/04035/2020 and UID/FIS/04559/2020.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors acknowledge the support of LIBPhys, GeoBiotec, Department of Physics and Department of Earth Sciences of Nova School of Science and Technology for the development of the laboratory work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Figure A1.** Minimum and maximum probable elemental occurrence in μ-XRF 2D map (percentage of area %) for Ca.


**Figure A2.** Minimum and maximum probable elemental occurrence in μ-XRF 2D map (percentage of area %) for Mn.

**Figure A3.** Minimum and maximum probable elemental occurrence in μ-XRF 2D map (percentage of area %) for Cu.

**Figure A4.** Minimum and maximum probable elemental occurrence in μ-XRF 2D map (percentage of area %) for Zn.

**Figure A5.** Minimum and maximum probable elemental occurrence in μ-XRF 2D map (percentage of area %) for As.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Network Pathway Extraction Focusing on Object Level**

**Ali Alqahtani 1,2**


**Abstract:** In this paper, I propose an efficient method of identifying important neurons that are related to an object's concepts by mainly considering the relationship between these neurons and their object concept or class. I first quantify the activation values among neurons, based on which histograms of each neuron are generated. Then, the obtained histograms are clustered to identify the neurons' importance. A network-wide holistic approach is also introduced to efficiently identify important neurons and their influential connections to reveal the pathway of a given class. The influential connections as well as their important neurons are carefully evaluated to reveal the sub-network of each object's concepts. The experimental results on the MNIST and Fashion MNIST datasets show the effectiveness of the proposed method.

**Keywords:** deep learning; network pathway extraction; neuron importance

#### **1. Introduction**

Deep learning algorithms (e.g., NNs and CNNs) are often viewed as "black-box models" because of their vagueness and ambiguous working mechanisms [1]. Efforts have been made to investigate complex models, as well as to clarify and describe their working mechanisms and internal functions, providing a general understanding of how to handle and enhance such models. Different approaches have been developed to understand the importance of intermediate units in neural networks, which constitute substantial steps toward gaining insight into the characteristics of the latent representations, understanding how information is propagated through a network, and evaluating the importance of a neuron by measuring the influence of hidden units. The established techniques have attempted to visually interpret and understand deep representations, mainly focusing on pixel-level annotations [2,3] and single-neuron properties via code inversion strategies (e.g., [4–8]) and activation maximization strategies (e.g., [9–12]) to illustrate the learned representations of deep learning algorithms. Their interpretability has been applied to visually evaluate neurons' importance and to understand their properties. The major priority of these approaches is to clarify a model's predictions by looking for an explanation for specific activations and by analyzing individual neurons. However, it is still challenging to intuitively measure decision linkages and the associations between nodes with a massive number of connections. In this paper, I propose an efficient method to identify the important neurons that are related to object concepts, mainly considering the relationship between these neurons and their object concept or class. I first quantify the activation values among neurons, based on which histograms of each neuron are generated. Then, the obtained histograms are clustered to identify the neurons' importance. I then introduce a network-wide holistic approach that efficiently identifies important neurons and their influential connections to reveal the pathway of a given class. The influential connections as well as their important neurons are carefully evaluated to reveal the sub-network of each object's concepts.

The rest of the paper is organized as follows. In Section 2, I present related works, while I describe the proposed methodology in Section 3. In Section 4, I present the experimental results. Finally, concluding remarks are provided in Section 5.

**Citation:** Alqahtani, A. Network Pathway Extraction Focusing on Object Level. *Eng* **2023**, *4*, 151–158. https://doi.org/10.3390/ eng4010009

Academic Editor: Antonio Gil Bravo

Received: 1 November 2022 Revised: 12 December 2022 Accepted: 27 December 2022 Published: 3 January 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **2. Related Works**

Considerable attention has been given to understanding the importance of internal units in neural networks. Understanding neuron properties and evaluating their importance raise awareness of the need to adopt the quantitative assessment of neurons' properties. Such methods are utilized to measure the activation importance of each node and to assign a score to it. To determine the importance of hidden neurons, Dhamdhere et al. [13] applied integrated gradients, calculating a summation of the gradients of the prediction with respect to the input. Morcos et al. [14] explored the relationship between the output of individual neurons and the classification performance of neural networks to evaluate the mutual information and class selectivity of each neuron's activation. Moreover, Na et al. [15] recently used the highest mean activation to measure the importance of individual units on language tasks, showing that different units are selectively responsive to specific morphemes, words, and phrases. Although most of the aforementioned techniques emphasize effective methods of identifying important neurons, their focus was mostly on gaining the best understanding of the network's mechanism, with limited attention paid to tracking the pathway of a given class and analyzing the network's behaviour at a sub-network level.

Several approaches have been developed to study the topic of network pruning [16]. Frankle et al. [17] found, through an iterative pruning technique, that networks contain a sub-network that reaches a test accuracy comparable with that of the original network. Their core idea was to find a smaller architecture well suited to the target task at the training phase. Ashual et al. [18] proposed a composite network that focuses on extracting the sub-network of each class during the training process. These methods show the possibility of revealing the sub-network in an indirect way, whether by eliminating unimportant parts of the neural network or by training multiple branches of the network, where different groups of branches denote different objects. Therefore, providing a way to measure the importance of different parts of the network, to detect the important neurons, and to identify relationships among neurons is worthwhile for extracting the sub-network of a specific object or class.

In this paper, I propose an efficient method to identify the important neurons that are related to object concepts and mainly consider the relationship between these neurons and their object concept or class. I first quantify the activation values among neurons, based on which histograms of each neuron are generated. Then, the obtained histograms are clustered to identify the neurons' importance. I then introduce a network-wide holistic approach that efficiently identifies important neurons and their influential connections to reveal the pathway of a given class. The influential connections as well as their important neurons are carefully evaluated to reveal the sub-network of each object's concepts.

#### **3. Method**

Measuring the importance of different network parts always requires a meticulous process. Most current techniques focus on providing efficient ways to determine important neurons, with limited attention being paid to tracking the pathway of a given class and analyzing the network's behaviour at the sub-network level. My importance-measurement method introduces a novel way to reveal the sub-network; it estimates the importance of neurons in each layer and identifies a subset of their influential connections whose activation values are the most effective in identifying relationships among neurons. This section presents my overall proposed framework, which consists of two parts. First, the evaluation of neuron importance is discussed; this determines the importance of neurons in each layer. Then, I present a network-wide holistic method that efficiently identifies important neurons and their influential connections to reveal the pathway of a given class. The entire algorithm is provided in Algorithm 1. The details are provided below.

**Algorithm 1:** Network Pathway Extraction.


#### *3.1. Analyzing the Importance of Individual Neuron*

The aim was to reveal the effective units in neural networks by estimating their activations. When the training data are fed through the network, a different representation is obtained for each example, with unique activations throughout all neurons in the network. A forward pass through a trained model is applied to derive the output of each unit. Different input examples present more instances, and the outputs can be seen as random variables. I utilized a novel process to estimate the importance of units. In each layer, the weights are multiplied with an input sample, *x*, to deliver the corresponding output activations. The activation of the *j*-th unit is determined by summing the weighted activations from all units in the (*i* − 1)-th layer. The output of the *j*-th unit in the *i*-th layer of the neural network is given by

$$t_{j}^{(i)}(x_{n}) = \sigma\left( b_{j}^{(i)} + \sum_{p} w_{p,j}^{(i-1)}\, t_{p}^{(i-1)}(x_{n}) \right), \tag{1}$$

where $x_n$ is the *n*-th data sample at the input, $\sigma$ denotes the activation function, $b_{j}^{(i)}$ is the corresponding bias for the *j*-th unit in the *i*-th layer, and $w_{p,j}^{(i-1)}$ denotes the weight that connects the *p*-th unit of the prior layer (*i* − 1) with the *j*-th unit in the *i*-th layer (the current layer).

After obtaining a matrix of activation values for each example by Equation (1), histograms for each unit were generated (see Figure 1). Then, the obtained histograms were clustered into three different clusters (High, Medium, and Low). Therefore, I obtained a binary vector for every layer that indicates whether the units are critical, where 1 indicates that the neuron is essential and 0 otherwise.

**Figure 1.** Activation distribution from different units of trained FC network.
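A minimal Keras/NumPy sketch of this procedure is given below: it collects each unit's activations over the training data via a forward pass (Equation (1) is what a dense layer computes internally), builds one histogram per unit, clusters the histograms into three groups, and marks the units in the highest-activation cluster as essential. The layer name, the number of histogram bins, and ranking clusters by mean activation are my assumptions for illustration, not details fixed by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow import keras

def layer_importance(model, layer_name, x_train, n_bins=20):
    """Binary importance vector (1 = essential) for one dense layer."""
    # Forward pass collecting each unit's activation over the dataset.
    probe = keras.Model(model.input, model.get_layer(layer_name).output)
    acts = probe.predict(x_train, verbose=0)        # (n_samples, n_units)

    # One fixed-range histogram per unit (the bin count is an assumption).
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    hists = np.stack([np.histogram(acts[:, j], bins=edges, density=True)[0]
                      for j in range(acts.shape[1])])

    # Cluster the histograms into three groups (High/Medium/Low) and keep
    # the cluster whose member units have the highest mean activation.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(hists)
    high = max(range(3), key=lambda c: acts[:, labels == c].mean())
    return (labels == high).astype(int)
```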

#### *3.2. From Individual Neuron to Sub-Network Analysis*

The collection of important neurons at each layer in a network only gives localized feature descriptors, from which the essential neurons for a particular class can be detected. Focusing on this assumption, I am able to justify how distinct neurons become the most representative for a given class, maintaining the class information within the input domain. Consequently, a way to detect how the activations from the nodes of the previous layer impact the activations of the current layer is required to extract a sub-network for a given class. A combination of neurons and their influential connections provides a novel way to reveal the sub-network.

As the activation of a neuron in the current layer is computed as the weighted sum of the activations from neurons in the previous layer, obtained by Equation (1), in fully connected layers the influential connections $t_{j}^{(i-1)}(x_n)$ of a single neuron are obtained by multiplying the weights of that neuron with its corresponding activations in the previous layer, as follows:

$$t_{j}^{(i-1)}(x_{n}) = w_{p,j}^{(i-1)}\, t_{p}^{(i-1)}(x_{n}), \tag{2}$$

where $x_n$ denotes the *n*-th data example at the input, $w_{p,j}^{(i-1)}$ is the weight that links the *p*-th unit of the former layer (*i* − 1) with the *j*-th unit in the *i*-th layer (the current layer), and $t_{p}^{(i-1)}(x_n)$ represents the corresponding activation in the previous layer (*i* − 1). After the values of the influential connections are obtained for each neuron, the majority voting (MV) method [19,20] is applied to detect the most important connections for each neuron.
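The sketch below applies Equation (2) and a simple majority vote across samples. Because the exact voting rule of [19,20] is not restated here, the per-sample top-k vote and the one-half threshold are assumptions for illustration.

```python
import numpy as np

def influential_connections(prev_acts, weights, top_k=5):
    """Boolean (n_prev, n_units) mask of each unit's influential connections.

    prev_acts: (n_samples, n_prev) previous-layer activations t_p^{(i-1)}(x_n).
    weights:   (n_prev, n_units) weights w_{p,j}^{(i-1)}.
    """
    n_samples, _ = prev_acts.shape
    votes = np.zeros_like(weights)

    for n in range(n_samples):
        # Per-sample influence of every connection into every unit, Eq. (2).
        influence = prev_acts[n][:, None] * weights        # (n_prev, n_units)
        # Each sample votes for its top-k strongest connections per unit
        # (this voting rule is an assumption, not the method of [19,20]).
        top = np.argsort(np.abs(influence), axis=0)[-top_k:]
        for j in range(weights.shape[1]):
            votes[top[:, j], j] += 1

    # Majority voting: keep connections chosen in more than half the samples.
    return votes > n_samples / 2
```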

#### **4. Experiment and Discussion**

I empirically investigated the performance of my proposed approach using two different datasets, MNIST [21] and Fashion-MNIST [22], through several models. The proposed method was implemented using Keras and TensorFlow [23] in Python. The specifications of the datasets and the architectures of the models used are presented in Table 1.


**Table 1.** Details of datasets and their architectures used in my experiments.

The first model was an auto-encoder, which was optimized in an unsupervised manner; no fine-tuning or pre-training processes were involved. Stochastic gradient descent was used, and each batch included 100 randomly shuffled examples. For both datasets, an initial learning rate of 0.006 with a momentum of 0.9 and a weight decay of 0.0005 was used.

I carried out a visual assessment to evaluate the results of the proposed method. Some instances of actual inputs and reconstruction images generated by my model are presented in Figures 2 and 3. Two evaluation studies were adopted: an ablation study and an insertion study.

After the pathway was identified, I applied the ablation study by forcing the pathway of a particular class to be zero and performing the forward-pass propagation. Three samples were fed to the network after identifying the class pathway: a different image (left), a random noise image (middle), and the same image (right). The reconstruction images from this assessment were visualized using the three samples (see Figures 2a,b and 3a,b). This ablation study is best suited for evaluating the effectiveness of extracting a particular class pathway. One clear example can be seen in Figure 2a; the model failed to reconstruct the image of digit 5 back to its original shape because the pathway of that digit was ablated. It is also worth noting that the reconstruction of the image of digit 7 obtains an optimal approximation of the underlying input data because the pathway of digit 7 was still available. These observations allow us to efficiently identify important neurons and their influential connections to reveal the pathway of a given class.

**Figure 2.** An ablation study (**Top**) and an insertion study (**Bottom**) with different examples and different identification of class pathway on the MNIST dataset.

**Figure 3.** An ablation study (**Top**) and an insertion study (**Bottom**) with different examples and different identification of class pathway on the fashion MNIST dataset.

Figures 2 and 3 also show the results of the insertion study, where I forced the pathway of a particular class to be one, and zero otherwise, and performed the forward-pass propagation (see Figures 2c,d and 3c,d). This study clearly showed that all input images ended up being reconstructed as the class of the identified pathway. The insertion study helped to evaluate the effectiveness of my method in revealing the pathway of a given class.

The second model was a classification model, which was trained end-to-end in a supervised manner. Figure 4 experimentally analyzes the performance of my proposed method. Using my method, I identified the pathway of each class of trained fully connected networks. My experiment showed that ablating a specific class pathway has no effect on other classes. One obvious explanation is that the proposed method succeeded in carefully identifying the pathway of each class. It is also crucial to note that because digit 6 has a structure comparable to that of digit 8, especially with regard to the bottom part of both digits (see Figure 4g), the model classified most examples of digit 6 into the class of digit 8. One reasonable justification is that the presented method was able to determine the homogeneous patterns of a particular digit, which leads to the identification of the pathway of the target class.

**Figure 4.** Results of different class pathways when applying an ablation study on the MNIST dataset.

#### **5. Conclusions**

In this paper, I proposed an efficient method to identify important neurons, mainly considering the relationship between these neurons and their object concept or class. I introduced a network-wide holistic approach that efficiently identifies important neurons and their influential connections to reveal the pathway of a given class. The influential connections as well as their important neurons were carefully evaluated to reveal the sub-network of each object's concepts. I showed the effectiveness of the proposed method using two different datasets. Potential future work is to expand the proposed framework to filters in CNNs and to investigate it with more difficult datasets. Although this procedure effectively identifies influential connections, further theoretical analysis is required to understand how such ideas can be generalized. Moreover, I believe more investigation is needed to carefully study or gather evidence on whether an object's patterns can form a local receptive field in FC networks, as connecting each neuron to only a local region of the input space might help to justify CNNs and to show that the actual connections are local receptive fields. There are also several challenges and extensions that I perceive as useful research directions. Extending the proposed framework to strengthen the discriminative features and improve the encoder's ability in deep clustering [24] is an important direction for future work.

**Funding:** This work was supported by the Deanship of Scientific Research, King Khalid University, Kingdom of Saudi Arabia, under research grant number RGP1/357/43.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Safety Occurrence Reporting amongst New Zealand Uncrewed Aircraft Users**

**Claire Natalie Walton \* and Isaac Levi Henderson \***

School of Aviation, Massey University, 47 Airport Drive, Palmerston North 4414, New Zealand **\*** Correspondence: clairenwalton@gmail.com (C.N.W.); i.l.henderson@massey.ac.nz (I.L.H.)

**Abstract:** Safety reporting has long been recognised as critical to reducing safety occurrences by identifying issues early enough that they can be remedied before an adverse outcome. This study examines safety occurrence reporting amongst a sample of 92 New Zealand civilian uncrewed aircraft users. An online survey was created to obtain the types of occurrences that these users have had, how (if at all) these are reported, and why participants did or did not report using particular systems. This study focussed on seven types of occurrences that have been highlighted by the Civil Aviation Authority of New Zealand as being reportable using a CA005RPAS form, the template for reporting uncrewed aircraft occurrences to the authority. The number of each type of occurrence was recorded, as well as what percentage of occurrences were reported using a CA005RPAS form or an internal reporting system, or were non-reported. Qualitative questions were used to understand why participants did or did not report using particular systems. Categorical and numerical data were analysed using Chi-Squared Tests of Independence, Kruskal–Wallis H Tests, and Mann–Whitney U Tests. Qualitative data were analysed using thematic analysis. The findings reveal that 85.72% of reportable safety occurrences went unreported by pilots, with only 2.74% of occurrences being self-reported by pilots using a CA005RPAS form. The biggest reason for non-reporting was that the user did not perceive the occurrence as serious enough, with not being aware of reporting systems and not being legally required to report also being major themes. Significant differences were observed between user groups, providing policy implications for improving safety occurrence reporting, such as making reporting compulsory, setting minimum training standards, having an anonymous and non-punitive reporting system, and working with member-based organisations.

**Keywords:** aviation safety; accident reporting; occurrence reporting; drones; unmanned aircraft; crewed aircraft; aviation regulation

### **1. Introduction**

While some types of uncrewed aircraft (UA) have been used within the civilian space for a long time (such as aeromodellers flying model aircraft), the proliferation of remotely-piloted aircraft and autonomous uncrewed aircraft (often called "drones") has accelerated in recent years [1], with the International Civil Aviation Organisation beginning to look at this issue as early as 2006 [2]. With applications ranging from aerial survey, mapping, and aerial photography and video to inspection in the agricultural, security, energy, and construction industries, the benefits and range of uses continue to grow [3,4]. Worldwide, UA have become "more accessible, affordable, adaptable and more capable of anonymity" [5] p. 1. As UA utilisation expands, the potential for occurrences also increases. This highlights the importance of communicating hazard information to mitigate risks and provide safety solutions. As technology progresses, UA are now performing Beyond Visual Line of Sight (BVLOS) operations and their degree of autonomy is also increasing, highlighting the need for safety systems that are fit for purpose [1]. While there is always a degree of inherent risk within aviation, the benefits of achieving a task must be perceived to outweigh any associated risks, particularly those toward crewed aircraft [6].

**Citation:** Walton, C.N.; Henderson, I.L. Safety Occurrence Reporting amongst New Zealand Uncrewed Aircraft Users. *Eng* **2023**, *4*, 236–258. https://doi.org/10.3390/eng4010014

Academic Editor: Antonio Gil Bravo

Received: 12 December 2022 Revised: 4 January 2023 Accepted: 10 January 2023 Published: 12 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

UA are inexpensive to purchase, and because no one is aboard the aircraft, the user may take greater risks. Considering this, being proactive and identifying potential risks and vulnerabilities in advance ensures that UA remain beneficial and do not become a danger to safety as the industry expands [7]. Occurrence reporting and the monitoring of UA activities may be one method of reducing the risk of further occurrences [4].

With the development of UA technology and the likelihood of the UA industry expanding into shared airspace, along with increased numbers of UA operating BVLOS [1], the objective of this research is to examine the types of UA safety occurrences that users are having, how (if at all) these UA safety occurrences are being reported, and why they are being reported (or non-reported) using particular systems. An online survey was created to obtain this information from users. Our study was restricted to UA users in New Zealand, and only measured UA safety occurrences for each user between 2015 and 2022. This was because the current regulatory framework came into effect in 2015. With the data obtained, similarities and differences between users on their reporting (or non-reporting) of safety occurrences can be examined to ensure that risks are managed collectively, and safety improvements are made for the benefit of the industry.

The next section will present a literature review, which examines the applicable regulations in New Zealand, compares these with other jurisdictions, and discusses relevant past literature in relation to safety occurrence reporting. Next, the methods are presented, including how the survey was created, how participants were recruited, our sample, and how the data were analysed. The results are then presented, both quantitative (i.e., number of types of occurrences, number reported or non-reported using particular systems, and so on) and qualitative (i.e., the reasons for reporting or non-reporting using particular systems, as well as use of alternative safety performance measures). The results are then discussed, highlighting a number of potential strategies for improving safety reporting amongst UA users in New Zealand. Finally, the study is concluded and limitations and opportunities for future research are provided.

#### **2. Literature Review**

#### *2.1. Terminology*

Worldwide, several terms are used to describe UA. Colloquially, they are often referred to as drones, with more formal terms including Unmanned/Uncrewed Aerial Vehicles (UAVs), Unmanned/Uncrewed Aircraft Systems (UASs), Remotely Piloted Aircraft Systems (RPAS), and model aircraft [4,8]. Nuances exist between these terms; for example, UAVs refer to all uncrewed aircraft, while RPAS excludes autonomous aircraft, and model aircraft tends to refer to scale versions of crewed aircraft [4,8,9]. This study is interested in all types of UA, and henceforth we have consistently used the UA abbreviation throughout the paper.

#### *2.2. Applicable Regulations*

UA operations are primarily governed by two Civil Aviation Rule (CAR) Parts, CAR Parts 101 and 102. CAR Part 101 outlines a series of general operating rules, while CAR Part 102 outlines a process for certificating organisations that want to execute operations outside of those general operating rules [10,11]. Crewed aircraft incident and accident notification is regulated under CAR Part 12, which gives dispensation for UA to file occurrence reports [12]. According to CAR 102.11, UA operators are required to produce an exposition to apply for Part 102 certification. The exposition must outline processes for a hazard register, which identifies the risks and known hazards involved with the operation and the mitigation measures taken to avoid them, along with details of the operation and the equipment to be used [10]. While there is no explicit legal requirement to submit CA005RPAS forms to report occurrences, an advisory circular (outlining acceptable means of compliance) recommends that UA operators conducting operations under Part 102 should report occurrences in this manner [13]. This advisory circular lists seven events for which reporting using a CA005RPAS form is recommended. These are "injury to persons; and loss of control; and fly-away; and motor or structural failure; and incidents involving manned aircraft; and incursion into airspace where not authorised; and damage to third party property" [13] p. 16. Nonetheless, the advisory circular does encourage UA operators to report any occurrences that they deem necessary, stating that the regular recording of statistical data may help establish the reliability of UA and be used when updating regulations [13]. However, with UA listed as a low-consequence and low-regulatory priority in the Civil Aviation Authority of New Zealand's (CAANZ) current safety strategy, these regulations may not be revised for some time [14].

New Zealand does not differ significantly from other jurisdictions in most respects with regard to the regulation of UA [8]. However, internationally, there are differences with regard to the requirements for UA occurrence reporting. For example, Australia, the United States, the United Kingdom, and member states of the European Union Aviation Safety Agency (including the 27 European Union member countries, Switzerland, Iceland, Norway, and Liechtenstein) all provide very explicit thresholds for when UA occurrence reporting is required [15–19]. The specific thresholds nonetheless vary. In this respect, New Zealand diverges from these major jurisdictions by only providing an advisory circular, which, because it relates to CAR Part 102, implies that it is not relevant to CAR Part 101, where more operations occur. Australia, the United States, and the European Union also provide confidential reporting systems, which are for all incidents, including those that do not reach the threshold for formal reporting [15,20,21]. These systems are aimed at collecting, analysing, and sharing safety information with users, industry, and regulators. They also emphasise that they are there to help learn from occurrences and will not be used to assign blame. There is no confidential reporting system available in New Zealand, and there is also no explicit statement on the CA005RPAS form to say that reports of UA occurrences will not be used to assign blame, though the form does highlight that the purpose of reporting is to help monitor risk, learn from incidents and accidents, and reduce the chances of accidents occurring [22].

#### *2.3. Safety Occurrence Reporting*

Reporting occurrences is an essential element of safety, providing information for the improvement and development of technology, in addition to aiding UA pilots in developing their skills and knowledge to avoid further occurrences. Occurrences can act as a forewarning of a gap or weakness in the safety hazard management system. Incidents enable the industry to learn how to avoid more serious occurrences [23]. For an organisation to prevent occurrences from leading to major accidents, it is crucial for occurrence information to be recorded and filtered to those affected, ensuring that risk management strategies are adapted to prevent occurrences from turning more serious [23]. For the sake of brevity, key findings and approaches from past studies related to safety occurrence reporting are summarised in Table 1.

Past research on occurrence reporting highlights some consistent themes. Firstly, occurrence reporting allows for learning and the improvement of safety over time, not just for the affected organisation, but also for the industry as a whole when information is shared. Secondly, reporting has an attitudinal element: organisational safety culture is often cited as having an influence on reporting, and vice versa, where acting upon safety reports helps to reinforce a safety culture. Thirdly, underreporting is a common issue, with UA occurrence reporting estimated to be much lower than for crewed aircraft, and with only a minority of reports self-reported by UA pilots. The perceived lack of seriousness of an occurrence, lack of knowledge of the legal system, usability of reporting systems, and fear of repercussions are cited as the key barriers to reporting. Fourthly, most past research has used reported occurrences as data, which those studies acknowledge as subject to bias because they cover only the reported occurrences. This highlights the novel approach of this study in capturing participant occurrences, whether reported or unreported. Finally, no study appears to have asked UA users qualitative questions to understand whether they are reporting and, if so, why they use particular systems.


**Table 1.** Abbreviated summary of past research relevant to UA safety occurrence reporting.

<sup>1</sup> We have been selective about which findings are immediately relevant to the study at hand—only a portion of the findings from any of these studies are reported here; for full results, readers will need to consult the sources themselves.

#### **3. Methods**

#### *3.1. Materials*

An online survey was created to examine what types of safety occurrences UA users are having, how (if at all) these are being reported, and why they are (or are not) being reported using particular reporting systems. To provide clear guidelines about what should be reported and what systems are available, the survey was designed for New Zealand UA users. Gender and age were collected for descriptive purposes. Users were categorised by user type (recreational, semi-professional, or professional), recency of flying, hours flown in the last 12 months, whether they had passed a course on UA operations, whether they had passed an Operational Competency Assessment (OCA) (the local term for a UA flight examination), whether they were a member of Model Flying New Zealand (MFNZ) (a nationwide member organisation for aeromodellers), whether they were a member of UAVNZ and/or Aviation New Zealand (UAVNZ is an industry and professional body for UA operators, and Aviation New Zealand is its parent organisation representing the wider commercial aviation sector), which Rule Part they operated under (Part 101, Part 102, both, or unsure), and whether they pilot aircraft of 15 kg or more (the point at which some qualification or certification becomes required). These categorisations were used for later statistical analyses. Advisory Circular AC102-1 outlines seven types of occurrences that CAANZ would like reported using a CA005RPAS form, which are [13] p. 16:

- injury to persons;
- loss of control;
- fly-away;
- motor or structural failure;
- loss of separation with a manned aircraft;
- incursion into airspace where the pilot was not authorised to fly; and
- damage to third-party property.

Because this advisory circular was published alongside the current regulatory framework in 2015, this study limited its scope to occurrences between 2015 and 2022. The survey asked participants to tick a box indicating whether they had experienced each type of occurrence within this period and, if they had, a follow-up question obtained the number of each type of occurrence within this timeframe. Participants who had no reportable occurrences during this period were asked follow-up questions to see whether they had reported anything else, while participants who had a reportable occurrence were asked to provide percentages for how many of their occurrences had been reported using a CA005RPAS form, how many had been reported using an internal reporting system, how many had been reported using both the CA005RPAS form and an internal reporting system, and how many were not reported using either system. Qualitative questions were used to obtain the reasons why they did or did not report using particular systems and whether they had any alternative ways of measuring safety performance for their operations. There were two reasons why qualitative questions were used for this purpose. The first was to avoid the issue of self-generated validity, where answers may be influenced by asking questions about measures that may not exist in long-term memory [31–34]. The second was to be consistent with a heterophenomenological epistemology, whereby it is important to recognise that each person lives in their own subjective reality, influenced by their life experiences and what they believe about those experiences [35,36]. By allowing participants to describe their reasoning in their own words, we build a better understanding of why they think they may behave in a particular manner [37], which is of interest when considering their behaviour in relation to safety occurrence reporting. The full list of questions in the survey, including display logic, is provided in Appendix A.

#### *3.2. Procedure*

An online survey hosted via Qualtrics was used to collect data. It was available from 12 September 2022 until 8 October 2022. Participants were recruited through posts on social media, through a link in an Aviation New Zealand weekly newsletter, and by encouraging participants to refer others to the survey. Posts were made on the following Facebook forums to recruit participants: (1) Kiwi Pilots, (2) DJI Drones New Zealand Operators' Group, (3) Drone Fishing New Zealand, (4) Multirotors New Zealand, (5) New Zealand Drone Photography, (6) NZ Drone Photography, and (7) Drones on Farm NZ. A post was also made on LinkedIn using the personal account of the second author. Participants were asked how they found out about the survey before completing it. When clicking on the link, participants were presented with an information sheet about the study. Three recruitment criteria were applied:


The use of convenience sampling was pragmatic: there are no reliable data on how many UA users there are in New Zealand, nor on the split between different user types, though some estimations have been made [8]. Thus, this study simply aimed to ensure a reasonable chance that different user types would be exposed to the recruitment materials posted across multiple forums. The "push-out" approach of social media recruitment (recruiting people while they are engaged in other, unrelated online activities) has been shown to provide demographically diverse samples [38]. Recruitment via Facebook has been shown to gather samples that are similarly representative to those from more traditional methods [39], with differences between Facebook data sets and comparison data sets being practically insignificant in magnitude [40]. However, some researchers have found that it results in an over-representation of young white women [41].

This project was peer-reviewed and deemed to be low risk. Consequently, it was not reviewed by one of the University's Human Ethics Committees but was registered as a low-risk study on the Massey University Human Ethics Database.

#### *3.3. Sample*

The survey obtained 110 responses during the study period. However, only 92 responses were complete enough to be useful (determined by completing at least 69% of the questions). Out of this sample of 92 participants, 83 (90.22%) were male, 6 (6.52%) were female, 1 (1.09%) was nonbinary, and 2 (2.17%) preferred not to say. The mean age was 42.78 years (SD = 16.25); one participant did not provide their age. All participants were current users of UA. Of these, 49 (53.26%) classified themselves as recreational users (primarily for enjoyment), 21 (22.83%) classified themselves as semi-professional (where less than 50% of the participant's work time is spent on activities related to unmanned aircraft operations), and 22 (23.91%) classified themselves as professional users (where more than 50% of the participant's work time is spent on activities related to unmanned aircraft operations).

To ensure that UA users from various groups were not over-represented, we asked participants how they found out about the survey. The majority of participants, 75 (81.52%), found out via social media, 14 (15.22%) found out from the Aviation New Zealand email, and 3 (3.26%) were referred by a friend.

Regarding flight recency, 62 (67.39%) had flown within the last month, 18 (19.57%) had flown within the last 6 months, 6 (6.52%) had flown within the last year, and 6 (6.52%) had flown more than a year ago. With regard to flight currency, 18 (19.57%) had flown less than 5 h within the last 12 months, 19 (20.65%) had flown 5–10 h, 16 (17.39%) had flown 10–25 h, 13 (14.13%) had flown 25–50 h, and 26 (28.26%) had flown more than 50 h within the last 12 months. There were 34 (36.96%) participants who had completed a course on UA operations, and 40 (43.48%) who had passed an operational competency assessment. In terms of member-based organisations, 21 (22.83%) were members of MFNZ, and 20 (21.74%) were members of UAVNZ and/or Aviation New Zealand.

Most of the participants (55, 59.78%) operated only under Part 101 of the CARs, while three (3.26%) operated only under a Part 102 Operator's Certificate, and 17 (18.48%) operated under both Part 101 and Part 102. Concerningly, 17 (18.48%) participants were unsure which set of CARs they were operating under. Only 6 (6.52%) participants operated UA with a mass of more than 15 kg.

#### *3.4. Analysis*

Participants were coded according to their user type, recency of flying, hours flown in the last 12 months, whether they had completed a course or OCA, whether they were members of MFNZ or UAVNZ, and which Rule Part they used for operations. Chi-Squared Tests of Independence [42] were run to see whether occurrence reporting (both to CAANZ and internally) as well as non-reporting were associated with particular participant groups. Effect size is reported with Cramér's V [43]. The percentage of occurrences that were reported (using any means) was calculated for each participant as a numerical value. To see whether differences existed in the percentage of reported occurrences based upon user type, recency of flying, and hours flown, Kruskal–Wallis H tests [44] were run. Distributions were checked for similarity by visual inspection of a boxplot. Pairwise comparisons using Dunn's [45] procedure were performed, and a Bonferroni [46] correction was applied. For other participant groupings (which only involve two groups), Mann–Whitney U tests [47] were used to see whether differences existed in the percentage of occurrences that were reported. Distributions were assessed to be similar based upon visual inspection, and directionality was assessed according to mean ranks and distributions using an exact sampling distribution for *U* [48].
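
As a rough illustration of the statistical procedures described above, the following Python sketch shows how such tests can be run with SciPy. All counts and group values are hypothetical placeholders, not the study's data, and the Dunn/Bonferroni follow-up is only indicated in a comment.

```python
import numpy as np
from scipy.stats import chi2_contingency, kruskal, mannwhitneyu

# Chi-squared test of independence: user type vs. reporting behaviour.
# Counts here are hypothetical, not the study's data.
contingency = np.array([[12, 37],   # recreational: reported / not reported
                        [9, 12],    # semi-professional
                        [14, 8]])   # professional
chi2, p, dof, expected = chi2_contingency(contingency)
n = contingency.sum()
cramers_v = np.sqrt(chi2 / (n * (min(contingency.shape) - 1)))  # effect size

# Kruskal-Wallis H test on the percentage of occurrences reported per user.
rec, semi, pro = [10, 0, 25, 0], [50, 100, 0], [80, 100, 40]
h_stat, p_kw = kruskal(rec, semi, pro)
# Pairwise follow-ups (Dunn's procedure with Bonferroni correction) can be
# run with scikit-posthocs: posthoc_dunn([rec, semi, pro], p_adjust="bonferroni").

# Mann-Whitney U test for two-group comparisons (e.g., passed OCA vs. not).
u_stat, p_mw = mannwhitneyu(rec, pro, alternative="two-sided")
```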

For qualitative responses, a process consistent with Braun and Clarke's [49] fifteen-point checklist for a good thematic analysis was used. First, all responses to each qualitative question were collated. Next, they were divided into single units of thought, which this study calls "statements". This step was important because a participant may outline multiple ideas within the same response, and so may have multiple statements for a particular question. Definitions for themes were created so that statements could be thematically classified. Definitions for themes did not overlap, so participants would only be grouped into multiple themes if they made statements expressing ideas consistent with multiple themes.

#### **4. Results**

#### *4.1. Descriptive Results*

Of the participants, 50 (54.35%) had a reportable occurrence in the period 2015 to 2022. A summary of the types of occurrences is presented in Table 2, showing the number of participants having each occurrence, the total number of each occurrence observed, the mean number for each occurrence across the sample (n = 92), and the mean number for each occurrence across the sub-sample of only those users who had that occurrence type.


**Table 2.** Types of occurrences observed among users.

<sup>1</sup> This number cannot be computed from the earlier numbers of participants for each occurrence type as a single participant may have had multiple types of occurrences.

Out of the 50 participants that indicated that they had reportable occurrences, 45 completed the follow-up questions about the percentage of occurrences that were reported using a CA005RPAS form (only), reported internally (only), reported both using a CA005RPAS form and internally, and the percentage that were not reported. Table 3 provides a summary of the prevalence of reporting systems to report occurrences.


**Table 3.** Prevalence of reporting systems to report occurrences.

<sup>1</sup> Computed by adding percentages for CA005RPAS form (only) and those who used that form as well as internal reporting. <sup>2</sup> Computed by adding percentages for internal reporting (only) and those who reported internally and submitted a CA005RPAS form.

Using the total number of reportable occurrences for each of the 45 participants who answered the follow-up questions, the percentages reported to CAANZ, reported internally, and not reported can be multiplied by each participant's number of reportable occurrences. This matters because different users had different numbers of reportable occurrences; it is not just the average percentages that are important, but the percentage of actual reportable occurrences. The 45 participants who answered the follow-up questions had a total of 427 reportable occurrences; Table 4 shows how those occurrences are divided amongst reporting systems.

**Table 4.** Number and percentage of occurrences that were reported using different systems.


<sup>1</sup> Computed by adding percentages for CA005RPAS form (only) and those who used that form as well as internal reporting, and then multiplying by the user's observed number of occurrences. <sup>2</sup> Computed by adding percentages for internal reporting (only) and those who reported internally and submitted a CA005RPAS form, and then multiplying by the user's observed number of occurrences.

For this particular sample, despite the mean percentage of non-reporting being 71.13% across users, the percentage of occurrences that were actually not reported was 85.72%. This is because some of the users with the highest numbers of occurrences had 0% reporting rates, skewing the percentage of unreported occurrences upwards.
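
A toy calculation (with made-up numbers, not the study's data) illustrates why the occurrence-weighted rate can diverge so sharply from the mean per-user percentage:

```python
# Hypothetical users: a high-occurrence non-reporter drags the weighted
# figure well below the simple mean of per-user percentages.
users = [
    {"occurrences": 2,  "pct_reported": 100.0},
    {"occurrences": 3,  "pct_reported": 50.0},
    {"occurrences": 20, "pct_reported": 0.0},   # many occurrences, 0% reported
]
mean_pct = sum(u["pct_reported"] for u in users) / len(users)              # 50.0%
total = sum(u["occurrences"] for u in users)                               # 25
reported = sum(u["occurrences"] * u["pct_reported"] / 100 for u in users)  # 3.5
weighted_pct = 100 * reported / total                                      # 14.0%
print(mean_pct, weighted_pct)
```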

#### *4.2. Quantitative Results*

For the sake of brevity, statistically significant results from the Chi-Squared Tests of Independence, Kruskal–Wallis H Tests, and Mann–Whitney U Tests are reported in abbreviated form in Figure 1. In this figure, user groups are highlighted as being associated with either reporting to CAANZ via submitting a CA005RPAS form, reporting using an internal process, or non-reporting of occurrences. These associations are based upon the results of the Chi-Squared Tests of Independence. Differences between groups in the overall percentage of occurrences that were reported (using either CA005RPAS forms or an internal process) are shown in the bottom right corner. These comparisons were made using Kruskal–Wallis H Tests (where there are more than two groups) or Mann–Whitney U Tests (for comparisons between two groups). Full statistical reporting, including effect sizes for statistically significant results, is provided in Appendix B.

**Figure 1.** Abbreviated results of statistical analyses. Note: \*, \*\*, and \*\*\* indicate statistical significance at the *p* < 0.05, 0.01, and 0.001 levels, respectively.

#### *4.3. Qualitative Results*

To help explain the quantitative findings, participants were asked to explain why they reported using particular systems, and if they indicated that they had alternative ways of measuring safety performance, they were also probed on what alternative ways they were using. Thematic analyses were performed to make sense of the qualitative data and are reported in this section. Sub-sections divide the different thematic analyses, and an explanation is provided before each one about the number of participants who were asked the relevant question.

#### 4.3.1. Non-Use of CA005RPAS Forms

Participants who did not report any occurrences using a CA005RPAS form were asked why they did not report any occurrences using this system. There were 40 participants who were presented with this question; however, only 31 provided an answer. These 31 participants made a total of 49 statements about why they did not use the CA005RPAS form to report any occurrences. Table 5 presents the themes from these qualitative answers.

#### 4.3.2. Use of Both CA005RPAS Forms and Internal Processes

Participants who indicated that they reported some occurrences using both a CA005RPAS form and through an internal process were asked why they used both systems. Only four participants were presented with this question due to the lack of users who used both reporting systems for the same occurrence. All four of these participants provided answers, making a total of five statements. Table 6 presents the thematic analysis from their answers.

#### 4.3.3. Use of Internal Reporting Instead of a CA005RPAS Form

Participants who indicated that they reported some occurrences internally, but not using a CA005RPAS form were asked why they reported only internally. Ten participants were presented with this question, with all ten participants providing answers, making a total of 13 statements. Table 7 presents the thematic analysis from their answers.


**Table 5.** Reasons for not using CA005RPAS forms to report any occurrences.

<sup>1</sup> Percentage calculated as of the number of participants who responded to the question, which was 31 (rather than the full sample of 92 participants).

**Table 6.** Reasons for using both a CA005RPAS form and an internal process.


<sup>1</sup> Percentage calculated as of the number of participants who responded to the question, which was 4 (rather than the full sample of 92 participants).

**Table 7.** Reasons for reporting only internally and not using a CA005RPAS form.


<sup>1</sup> Percentage calculated as of the number of participants who responded to the question, which was 10 (rather than the full sample of 92 participants).

#### 4.3.4. Non-Reporting Using Either CA005RPAS Form or Internal Systems

Participants who indicated that they did not report some occurrences using either a CA005RPAS form or an internal system were asked why they did not report those occurrences using either system. There were 33 participants who were asked this question, with 27 providing answers, making a total of 42 statements. Table 8 presents the thematic analysis from their answers.

#### 4.3.5. Alternative Ways of Monitoring Safety Performance

All participants were asked whether they used alternative ways of monitoring safety performance outside of occurrence reporting using CA005RPAS forms and internal reporting systems. Participants received different wording for this question based upon whether they were reporting occurrences or not. Eight of the 13 participants who were reporting occurrences (including one who did not have a reportable occurrence but reported anyway) indicated that they also used alternative systems for monitoring safety performance. Of those who were not reporting occurrences using CA005RPAS forms or internal systems (whether those occurrences were reportable or not), 6 indicated that they used alternative systems for monitoring safety performance, while 61 indicated that they used no alternative systems. There were 12 participants who did not answer either version of the question. Of the 14 participants who indicated that they used alternative ways of monitoring safety performance, only 12 provided information about those alternative systems. Table 9 presents the thematic analysis based upon the responses of these 12 participants (whether they used alternative systems in combination with CA005RPAS and internal reporting systems, or in lieu of those systems). These participants made a total of 15 statements.

**Table 8.** Reasons for not reporting occurrences using CA005RPAS forms or internal systems.


<sup>1</sup> Percentage calculated as of the number of participants who responded to the question, which was 27 (rather than the full sample of 92 participants).

**Table 9.** Alternative systems used by participants to monitor safety performance.


<sup>1</sup> Percentage calculated as of the number of participants who responded to the question, which was 12 (rather than the full sample of 92 participants).

#### **5. Discussion**

The results indicate that a very large portion of reportable occurrences in New Zealand go unreported, with each user on average reporting only 28.87% of their occurrences. Because of differences in the numbers of occurrences between users, the actual percentage of occurrences reported was only 14.28%. Reporting to the Civil Aviation Authority using a CA005RPAS form was particularly low, at an average of 7.62% of occurrences per user and an observed rate of 2.74% of occurrences across this study's sample. Given the importance of safety occurrence reporting for improving organisational systems, identifying common hazards across the sector, and informing how these should be regulated, it is important to increase these percentages. The focus of this discussion is on identifying ways that safety occurrence reporting might be improved, based upon the results of this study and an examination of the academic literature. It is divided into five core areas: (1) the role of training and assessment, (2) working with member-based organisations, (3) seriousness of occurrences, (4) regulatory considerations, and (5) exploring confidential reporting systems.

#### *5.1. The Role of Training and Assessment*

One of the most useful findings from the statistical analyses was that having completed a course on UA operations and having passed an OCA both improved reporting rates and decreased the likelihood of non-reporting. For the issuance of pilot licences in crewed aviation, there are requirements to pass both theory examinations and flight examinations [50]. Pilots operating under a Part 102 Operator's Certificate need to have completed a theory course covering general aviation knowledge and UA-specific knowledge, as well as have passed an OCA [13]. However, no such requirement exists for operators under Part 101, whether flying commercially or not. As part of their *Enabling Drone Integration* discussion document of 2021, the Ministry of Transport in New Zealand are proposing to introduce a basic pilot qualification for all UA users to complete [51]. If this proposal is implemented, there may be an opportunity to ensure that occurrence reporting is covered as part of this basic pilot qualification. Regardless of the exact approach taken, the results of this study suggest that the lack of any educational requirement to enter the aviation system as a UA user may be one of the drivers behind low reporting rates among users. A less formalised approach may also be to provide an online resource on CAANZ's website about occurrence reporting for UA users, which could provide clear guidelines on what should be reported and how this information will be used by the authority.

#### *5.2. Working with Member-Based Organisations*

This study found that members of UAVNZ/Aviation New Zealand were more likely to report an occurrence using the CA005RPAS form and were less likely to non-report occurrences. There were no statistically significant differences for MFNZ members. Both organisations have codes of conduct and can self-regulate their members. The difference in the observed results is likely because UAVNZ is a professional and industry body, meaning its members are commercial operators, whereas MFNZ is an organisation for recreational operators who want to partake in aeromodelling. Unlike UAVNZ, MFNZ does have an accident reporting form and requires members to fill it out whenever someone is injured, whenever the model pilot files for insurance, or whenever the model has deviated into controlled airspace without permission [52]. However, this is also consistent with reporting only the more serious occurrences (see later discussion).

One recent occurrence analysed under the UK's Confidential Human Factors Incident Reporting Programme (CHIRP) shows the significant amount of knowledge to be gained from reporting, analysing, and discussing occurrences. In this report, a UA pilot and spotter had carried out a sizeable amount of hazard mitigation and planning prior to a training flight in a park. Despite this preparedness, their plans were upset by an uninvolved person who walked across the UA's landing area, and after a delay, the UA ended up landing with minimal battery remaining. Lessons learnt and shared from this occurrence could help other new drone pilots with their hazard mitigations and preparations for training flights [53].

There is an opportunity for CAANZ to work more proactively alongside UAVNZ and MFNZ to ensure that their members report occurrences, even those that may not be strictly required by either organisation's internal policies. Working with organisations that have the ability to self-regulate may be an effective way of improving occurrence reporting, and member-based organisations have shown an ability to improve risk mitigation performance for UA operations [30]. Membership of MFNZ currently costs NZD 90 (USD 56.55) for an adult, while membership of UAVNZ currently costs NZD 253 (USD 158.98) for a company, with both organisations having other forms of membership available.

#### *5.3. Seriousness of Occurrences*

One recurring theme throughout this study was the perceived seriousness of occurrences. Although there is an argument for UA pilots to have a safe space for training and currency practice where mistakes can be made and lessons learnt, there is also an argument for all occurrences being reported for statistical purposes. While these data could be recorded through a system similar to the Aviation Safety Reporting System run by NASA for the US aviation industry, CAANZ is currently the only organisation in New Zealand that collects occurrence information on the aviation industry. The importance of statistical data being available to CAANZ is highlighted in the organisation's Regulatory Safety and Security Strategy for the 2022 to 2027 period, where the organisation's regulatory direction is towards an "intelligence-led" and "risk-based" assessment process. CAANZ state, "We rely in large part upon high-quality reporting by participants of occurrences" [14] p. 19, and "In short, we rely on data and information to provide intelligence that informs the formation of our strategic and operational policies and plans ... ." [14] p. 19. Without data, or with only limited data, the decisions made by the industry regulator may not reflect what the industry needs for safety and growth. A shortage of occurrence data may also limit the authority's ability to improve safety and avoid serious occurrences in the future [28].

Additionally, it is worth highlighting the importance of collecting occurrence data so that they can be shared within the industry, helping other UA pilots and operators avoid similar occurrences. The importance of understanding UA flight modes, what can go wrong, and maintaining the manual flying currency needed to recover was illustrated by the M600 Pro serious incident in 2019. In this occurrence, the UA experienced a GPS-compass error and reverted to a mode requiring manual control. However, neither the pilot nor the observer recognised the error message or the initial loss of control as reason to take manual control, and the UA continued drifting with the wind until it collided with a building [54]. The AAIB reported that the operator had 10 s between the GPS-compass error and losing VLOS, and although CAAUK required the UA pilot to prove competence prior to carrying out commercial operations, there was no requirement at the time to maintain currency in emergency procedures. As a result of this serious incident, the AAIB recommended that UA pilots regularly practice manual emergency actions in case automation is lost, and raised the importance of recording minor occurrences, as the same UA had experienced a similar minor occurrence in the weeks prior [54]. UA data and research are essential because UA are still in their infancy and the industry is growing [4].

#### *5.4. Regulatory Considerations*

A number of participants stated that they do not submit occurrence reports because the current civil aviation regulations do not require one to be submitted. This study also found that those who operated under CAR Part 101, which does not require a hazard register, were associated with the non-reporting of occurrences, while those who operated under CAR Part 102, which does require a hazard register, had higher reporting percentages. Although the current New Zealand regulations may not specifically require occurrences to be reported, occurrence data are vital for regulations to evolve. A collision between a Robinson R44 helicopter and a Phantom 4 UA in Israel underlines the importance of reporting even when an occurrence happens within the law: in this accident, both operators were operating within the bounds of current regulations and were both approved to be in the areas of operation [55].

In 2019, an Alauda Airspeeder MK II entered controlled airspace, affecting traffic in the Gatwick Airport area until its battery depleted, resulting in an uncontrolled landing in a field. This occurrence shows the importance of occurrence data for helping manufacturers improve. During the resultant investigation, the AAIB found that poor design and build quality contributed to the accident, with parts within the kill-switch system detaching from the UA circuit board [56]. The Australian-based organisation was compliant with Australian UA regulations and held a licence in accordance with CASA regulations. The UA was flying in the UK under an exemption approved by CAAUK, for which no inspection by CAAUK was required [56]. On the day the exemption was issued, a test flight was carried out by the operator, without CAAUK present, during which the UA experienced a heavy landing and damage to its landing gear. This was attributed to a power loss due to a battery fault. Although this previous day's occurrence was required under the exemption to be reported to CAAUK, no occurrence report was received by the organisation's home state authority (CASA), the ATSB, or CAAUK [56]. Given the rapid advancement of UA at the time, the CAAUK team responsible for signing off on the exemption had little experience in the area of UA and insufficient resources [56]. This occurrence highlights how critical it is for both operators and regulators to have information and in-depth knowledge of new technology before approvals are granted and flights are carried out, and it also highlights the importance of reporting and investigating occurrences to develop knowledge and learning.

As other jurisdictions have done, it would be beneficial for CAANZ to provide an explicit list of thresholds for when occurrence reporting is required, and for this list to be on an accessible part of its website. Relying on users to come across AC102-1 means that those without formal training are unlikely to know how to report using this system. Some of the thresholds used internationally centre around the weight of the craft, airspace incursions, damage to property exceeding a certain amount, injury to uninvolved persons, loss of separation with crewed aircraft, and whether the operation was commercial [15,16,18,19,57]. This study argues that any thresholds should be entirely risk-based and assessed using robust means such as the Joint Authorities for Rulemaking on Unmanned Systems' Specific Operations Risk Assessment (commonly called JARUS SORA) [58]. The current thresholds applied in AC102-1 could be used but need to be more accessible to users, such as through CAANZ's website, and clearly explained. A statement indicating that submitted CA005RPAS forms will not be used to assign blame or liability may also be appropriate if the purpose of data collection is to investigate and learn from occurrences, and would be more in line with the International Civil Aviation Organisation's Annex 13 requirements for air accident investigation [59].

#### *5.5. Exploring Confidential Reporting Systems*

This study discovered that CAANZ received CA005RPAS forms from UA users for only 2.74% of the occurrences within the sample over the study period. This resonates with a CAAUK report, which found that only 25 percent of the 44 reports filed each month were filed by the UA pilots or operators themselves [60]. Internal reporting was higher, however, with 13.15% of occurrences reported. The US confidential reporting system run by NASA, the ASRS, is aimed at collecting data and identifying deficiencies within the industry with the aim of improving safety. The ASRS grants those who voluntarily report occurrences immunity from prosecution so long as the act was not deliberate [20]. The EU and Australia have similar systems, ECCAIRS and REPCON, which are likewise aimed at improving safety.

#### **6. Conclusions**

This study presents the results obtained from a sample of 92 UA users in New Zealand regarding what sorts of occurrences they have had, how these have been reported (if at all), and why they chose to report (or not report) using such systems. There were 450 reportable safety occurrences within the sample between 2015 and 2022, with the most common being loss of control and fly-aways. Concerningly, the average reporting rate (combining both internal reporting and the use of CA005RPAS forms) per user was only 28.87%, with the actual percentage of occurrences reported being even lower at 14.28%. This suggests that individual organisations and the industry as a whole are missing out on important safety information that may help to prevent occurrences escalating into accidents that result in injury, damage to property, or collisions with other aircraft. Moreover, reporting rates to CAANZ via CA005RPAS forms were particularly low, with the average reporting rate to CAANZ per user at only 7.62% and only 2.74% of all occurrences reported to CAANZ. This suggests that the regulator may be ill-informed about how to improve safety outcomes for the UA sector, as it is not receiving sufficient data to lead evidence-based decisions. The statistical analyses and qualitative questions provide some potentially fruitful avenues for improving reporting rates amongst UA users in New Zealand: standards for training and assessment, working more effectively with member-based organisations, encouraging even non-serious occurrences to be reported, reconsidering the current CARs, and introducing a confidential reporting system similar to those used in the US, EU, and Australia. While these results are only directly applicable to New Zealand, the approach taken and the themes elucidated should also help guide other jurisdictions in the pursuit of improving safety occurrence reporting amongst UA users.

#### **7. Limitations and Future Research**

The key limitation of this research was the sample size, with only 92 valid responses. While understandable given the topic area and the number of qualitative questions presented to users, this means that the study only had the statistical power to detect medium effect sizes; small effect sizes may have been missed simply due to a lack of statistical power. Nonetheless, the qualitative themes may allow a more structured questionnaire to be developed in the future and administered to a larger sample, both to check that the findings are indeed generalisable and to provide enough statistical power to detect small effect sizes. While we believe that this study has captured a useful and pragmatic cross-section of UA users in New Zealand, the convenience sampling method combined with anonymous responses means that we cannot be certain that the sample is representative. Nonetheless, we have described the recruitment methods in such a way that they could be replicated and a similar result obtained.

Aside from generalisability, it would be valuable to have more in-depth discussions with UA users to understand why non-reporting of occurrences is so high. While some important conclusions can be drawn from this research, future research may benefit from focus groups or similar approaches to properly understand why these conclusions hold true and to see how behaviours might be changed in the future. This could then be developed into a more structured approach, built on a solid foundation for making assumptions.

The final limitation is that we only observed self-reporting rates from the UA users themselves. For CA005RPAS forms in particular, it is possible for someone other than the UA user to file a report to CAANZ; in the United Kingdom, only a quarter of occurrence reports came from the UA pilots themselves [60]. Thus, it is possible that CAANZ is aware of some of the occurrences identified in this study through others reporting them. Reports from UA pilots are arguably more useful, as they contain operational and technical details that cannot be deduced from simply observing an occurrence. Regardless, it is important to highlight that our study focusses only on UA users self-reporting occurrences, rather than total occurrence reporting including from third parties. Thus, reporting numbers inclusive of other parties may be higher than those reported in this study.

**Author Contributions:** Conceptualisation, C.N.W. and I.L.H.; methodology, C.N.W. and I.L.H.; formal analysis, C.N.W. and I.L.H.; investigation, C.N.W.; writing—original draft preparation, C.N.W.; writing—review and editing, I.L.H.; supervision, I.L.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** This study was peer-reviewed and deemed to be low risk. It was registered as such on the Massey University Human Ethics Database.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The full dataset that supports this study will be made available upon request to the corresponding author.

**Conflicts of Interest:** Claire Walton has been working on contract for the Civil Aviation Authority of New Zealand while completing this study, in the role of Regulatory Interventions Analyst. This job involves designing and managing safety initiatives for crewed aviation flight training. To manage any potential conflicts of interest with her research, she was not involved with any aspect of uncrewed aviation for the authority during her studies and has a signed conflict of interest with the authority. This research was completed in the capacity of her academic studies and none of the findings benefit her directly or indirectly. Isaac Henderson is the current Chair of UAVNZ, an industry and professional body representing the uncrewed aviation sector in New Zealand. He is also an active consultant in the uncrewed aviation industry. However, none of the findings directly or indirectly benefit him, and his primary involvement in this research was in the capacity of a supervisor.

#### **Appendix A. Online Survey Questions**

	- a. Male
	- b. Female
	- c. Other (please specify)
	- d. Prefer not to say
	- a. Referred by a friend
	- b. Sent a link from an organization that you are a member of
	- c. Saw it on social media
	- d. Other (please specify)
	- a. Not a current unmanned aircraft user
	- b. Recreational unmanned aircraft user (primarily for enjoyment)
	- c. Semi-professional unmanned aircraft user (where less than 50% of your work time is spent on activities related to unmanned aircraft operations)
	- d. Professional unmanned aircraft user (where more than 50% of your work time is spent on activities related to unmanned aircraft operations)
	- a. Within the last month
	- b. Within the last 6 months
	- c. Within the last year
	- d. More than a year ago
	- a. Yes
	- b. No
	- a. Yes
	- b. No
	- a. Yes
	- b. No
	- a. Yes
	- b. No
	- a. Under Part 101 of the Civil Aviation Rules
	- b. Under a Part 102 Operator's Certificate
	- c. Under both Part 101 and Part 102
	- d. Unsure
	- a. Yes
	- b. No
	- a. Injury to persons (including yourself)
	- b. Loss of control
	- c. Fly-away
	- d. Motor or structural failure
	- e. Loss of separation with a manned aircraft
	- f. Incursion into airspace where you were not authorised to fly
	- g. Damage to third-party property
	- h. None of the above

	- a. Yes
	- b. No
	- a. Yes
	- b. No

*Questions below this point were only asked to those who had at least one reportable accident since 2015*

	- a. CA005RPAS form (only) [Slider from 0–100%]
	- b. Internal reporting process (only) [Slider from 0–100%]
	- c. Both the CA005RPAS form and an internal reporting process [Slider from 0–100%]
	- d. Not reported using a CA005RPAS form or an internal reporting process [Slider from 0–100%] (Note that survey forced percentages to add up to 100%)

a. Yes

	- a. Yes
	- b. No
	- a. Yes
	- b. No

#### **Appendix B. Full Statistical Reporting**

*Appendix B.1. Chi-Squared Tests of Independence*

Results for Chi-Squared Tests of Independence are reported here and are divided by which demographic groupings were being tested for associations with use of particular reporting systems or non-reporting. Non-reported was tested in two manners: one as a categorical variable indicating that a participant had at least one non-reported occurrence, the other as a categorical variable indicating that a participant had not reported any occurrences.

Appendix B.1.1. User Type


Appendix B.1.2. Recency of Flying UA

No associations were statistically significant.

Appendix B.1.3. Hours of Flying UA within Last 12 Months

• Non-reporting of at least one occurrence was associated with users in the less than 5 h, 5–10 h, and 10–25 h categorisations, *χ*<sup>2</sup>(4) = 11.700, *p* = 0.020, with a large effect size, V = 0.510.

Appendix B.1.4. Completion of a Course on UA Operations


Appendix B.1.5. Passing an OCA


Appendix B.1.6. MFNZ Membership

No associations were statistically significant.

Appendix B.1.7. UAVNZ/Aviation New Zealand Membership


Appendix B.1.8. Rule Part Operated Under


*Appendix B.2. Kruskal–Wallis H Tests*

Kruskal–Wallis H Tests revealed the following results:


*Appendix B.3. Mann–Whitney U Tests*

Mann–Whitney U Tests revealed the following results:


#### **References**



### *Article* **Application of Connected Vehicle Data to Assess Safety on Roadways**

**Mandar Khanal \* and Nathaniel Edelmann**

Department of Civil Engineering, Boise State University, Boise, ID 83725, USA

**\*** Correspondence: mkhanal@boisestate.edu; Tel.: +1-208-426-1430

**Abstract:** Using surrogate safety measures is a common method to assess safety on roadways. Surrogate safety measures allow for proactive safety analysis; the analysis is performed prior to crashes occurring. This allows for safety improvements to be implemented proactively to prevent crashes and the associated injuries and property damage. Existing surrogate safety measures primarily rely on data generated by microsimulations, but the advent of connected vehicles has allowed for the incorporation of data from actual cars into safety analysis with surrogate safety measures. In this study, commercially available connected vehicle data are used to develop crash prediction models for crashes at intersections and segments in Salt Lake City, Utah. Harsh braking events are identified and counted within the influence areas of sixty study intersections and thirty segments and then used to develop crash prediction models. Other intersection characteristics are considered as regressor variables in the models, such as intersection geometric characteristics, connected vehicle volumes, and the presence of schools and bus stops in the vicinity. Statistically significant models are developed, and these models may be used as a surrogate safety measure to analyze intersection safety proactively. The findings are applicable to Salt Lake City, but similar research methods may be employed by researchers to determine whether these models are applicable in other cities and to determine how the effectiveness of this method endures through time.

**Keywords:** road safety; surrogate safety measure; crash; prediction; connected vehicle data; harsh braking

### **1. Introduction**

Surrogate safety measures (SSMs) offer benefits over traditional safety analysis methods that use historical crash data. SSMs are a type of safety analysis that makes use of data other than crash data, typically vehicle kinematic data. The first benefit of SSMs is that they use data that may be collected more rapidly than historical crash data: crashes are rare events, and historical data may require years of accumulation before a safety analysis can be conducted. The second benefit is that SSM analysis is proactive, allowing for safety analysis prior to crashes occurring. An unsafe location may therefore be identified and improved before crashes occur, preventing injuries and property damage and possibly saving lives. The third benefit is that the kinematic data used in an SSM analysis are much more voluminous than crash data, allowing statistical methods to be more effective.

The kinematic data employed by SSMs may come from several sources. In the past, manual measurement at the study site was used. This method of data collection was problematic because it allowed for subjectivity and was difficult to perform accurately due to the fleeting nature of traffic interactions. Manual observation was replaced with video recordings which made it possible for traffic interactions to be replayed and offered the chance for multiple observers to analyze interactions, thus improving the problem of subjectivity. This problem has been further ameliorated with automated video data reduction with technology such as that offered by Transoft, Iteris, and similar companies. Additionally, microsimulation technology has allowed for simulation to be used as a source of kinematic data. This method eliminates subjectivity, as the computer running the simulation provides the data rather than human observers [1]. Microsimulation produces highly detailed and precise data and can produce large volumes of data with relatively little effort in comparison with manual collection. The fault of microsimulation lies in it being an abstraction rather than reality. While microsimulations are still highly useful, there has been research into the use of connected vehicle (CV) data with SSMs, meaning the use of data from the physical world rather than simulation.

CVs are a source of traffic data that allows for the high level of precision offered by microsimulation along with the realism of being generated by human drivers. CVs are automobiles sold to the public that include a transceiver which allows data to be collected regarding the vehicle's motion. For the sake of privacy, no individually identifiable information about the vehicle is visible. Vendors offer CV data to clients who wish to use the data for research and engineering projects. The main drawback of using data from CVs is that they currently comprise a small percentage of the total number of vehicles in the United States; a study from October 2021 found the median CV penetration rate to be approximately 4.5% [2]. Therefore, CVs do not offer a full picture of traffic. They are gradually becoming more common, though, as older vehicles are retired and replaced with new, connected vehicles. Research into effective analysis methods with CV data will become more valuable as time goes on, underscoring the need for this research to take place now, ahead of a future increase in CVs.

One metric that is available from CVs is harsh braking event counts, which form the basis for the models developed in this study. Data points from CVs include information about braking and acceleration. The braking data may be filtered so that harsh braking events are identified and counted and then used as a regressor variable in a crash prediction model. This method is investigated in this paper. The significance of other regressor variables, such as CV volume and intersection geometric characteristics, was also investigated. The proposed crash prediction models may be used to estimate monthly counts of intersection-related crashes and offer all of the benefits of SSMs mentioned above.

The statistical models developed in this study show promise for use as a surrogate safety measure. Of the twelve statistical models developed in this study, ten possess a high level of statistical significance. Although connected vehicle penetration rates are too low at this time to depend upon models such as these, once these penetration rates increase, these models will offer an additional method of analysis.

#### *1.1. Literature Review*

Researchers have developed many SSMs which tend to fall into three categories. SSMs can be a time-based measure, a deceleration-based measure, or a safety index. Although most SSMs consider collisions involving two vehicles, it is possible to model single-vehicle crashes due to distraction or error [3]. SSMs operate upon the concept that events with greater risk tend to happen less frequently, with the riskiest and rarest events being the events that result in collision [4]. By analyzing less risky events that occur significantly more frequently, a safety analysis with SSMs can offer more insight into safety than an analysis with crash data alone.

#### 1.1.1. Time-Based Measures

Time-based measures consider the kinematics of vehicles and how much of a time gap exists between vehicles. Time-to-collision (TTC), post-encroachment time (PET), and proportion of stopping distance (PSD) are time-based SSMs. TTC is a measure of the amount of time required for the space between two vehicles to close. TTC on its own is transient, but Minderhoud and Bovy developed aggregation methods in the form of their extended TTC measures, namely time-integrated TTC and time-exposed TTC [5]. Post-encroachment time is the difference in time between when an encroaching vehicle exits the path of travel and when a following vehicle first occupies the location where a collision would have occurred. A modified form of PET exists as initially attempted PET (IAPE). IAPE corrects the measure to account for the acceleration that commonly occurs when a driver determines that a conflict has ended [6]. PSD is the ratio between the distance a vehicle is from a potential collision location and the minimum stopping distance. These distances depend upon the velocity of the vehicles involved, making PSD a time-based measure.
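
As an illustration of the simplest of these measures, the following sketch computes TTC for a car-following pair under the usual constant-speed assumption; the function name and input values are ours, not from the cited sources.

```python
# A minimal sketch of time-to-collision (TTC) for a car-following pair,
# assuming both vehicles hold their current speeds.
def time_to_collision(gap_m: float, v_follow: float, v_lead: float) -> float:
    """TTC in seconds; infinite if the gap is not closing."""
    closing_speed = v_follow - v_lead          # m/s
    if closing_speed <= 0:
        return float("inf")                    # gap constant or opening
    return gap_m / closing_speed

# Example: 20 m gap, follower at 15 m/s, leader at 10 m/s -> TTC = 4 s.
print(time_to_collision(20.0, 15.0, 10.0))
```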

There are both strengths and weaknesses associated with time-based SSMs. Their strength lies in their simplicity and intuitiveness. TTC and PET may be implemented with kinematic data supplied by either on-site measurements or microsimulation. PSD also requires such kinematic data, but additionally requires information on the vehicles' possible deceleration rates; this deceleration rate can be an established value or distribution of values, or may be derived from environmental conditions. Drivers are aware of the importance of following distance and time headway, making these measures intuitive for researchers and practitioners alike. A weakness of time-based SSMs is the possibility of multiple encounters producing identical measures [7]. TTC may evaluate to the same value for an encounter with a large speed differential between vehicles and a long following distance as for another encounter with a small speed differential but a short following distance. This has made it difficult to establish particularly meaningful safety thresholds for these measures. Another weakness is the inability of time-based SSMs to evaluate the severity of a potential collision: in the two encounters just described, which both result in an identical TTC, the severity of a resulting collision would be very different due to the differing speed differentials.

#### 1.1.2. Deceleration-Based Measures

Deceleration-based measures consider braking action and the braking capacity of vehicles and are better equipped than time-based measures to evaluate potential crash severity. Additionally, this type of measure considers a driver's evasive action, an important component of traffic conflicts. Deceleration-based measures include brake applications and deceleration rate to avoid collision (DRAC). Brake applications have been found to be a poor SSM due to the variability in braking habits among drivers: braking is such a common act, even in benign situations, that it is not highly indicative of a conflict [6]. Brake applications as an SSM fail to consider the severity of each particular braking action, something that DRAC and harsh braking are able to capture to their benefit. DRAC is a measure of the deceleration rate that a following vehicle would need to apply to avoid colliding with a leading vehicle. This measurement is compared to a safety threshold, commonly given as 3.35 m/s<sup>2</sup>, to determine whether a conflict occurred [8].
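
The following sketch computes DRAC for a simple car-following conflict using the standard closing-speed formulation and compares it with the 3.35 m/s² threshold cited above; the input values are hypothetical.

```python
# A sketch of deceleration rate to avoid collision (DRAC): the deceleration
# the follower needs to match the leader's speed before the gap closes.
DRAC_THRESHOLD = 3.35  # m/s^2, the commonly cited conflict threshold

def drac(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Required deceleration in m/s^2; zero if the gap is not closing."""
    closing_speed = v_follow - v_lead
    if closing_speed <= 0:
        return 0.0
    return closing_speed ** 2 / (2.0 * gap_m)

# Example: 20 m gap, 15 m/s vs. 10 m/s -> DRAC = 0.625 m/s^2 (no conflict).
is_conflict = drac(20.0, 15.0, 10.0) > DRAC_THRESHOLD
```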

Harsh braking events have also been suggested as an indicator of a conflict, which would also fall under the category of deceleration-based measures. A 2015 study found a high level of correlation between crash counts and harsh braking events, defined as events with a large absolute value of the first derivative of acceleration, known as jerk. These events were collected by vehicles with GPS units which recorded the vehicles' location over time, allowing the jerk value to be computed. Mousavi found a threshold of −0.762 m/s<sup>3</sup> to be the most effective for defining harsh braking but also noted that this threshold is lower than expected; further investigation of a proper jerk threshold was recommended [9].
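
As a sketch of how jerk can be derived from sampled speed data and screened against Mousavi's threshold, consider the following; the time series is invented for illustration.

```python
# Jerk by double differencing a speed time series: speed -> acceleration
# -> jerk, then flagging samples below the -0.762 m/s^3 threshold from [9].
import numpy as np

t = np.array([0.0, 3.0, 6.0, 9.0, 12.0])      # s (3 s sample spacing)
v = np.array([15.0, 14.5, 10.0, 4.0, 3.8])    # m/s, hypothetical speeds

accel = np.gradient(v, t)                     # m/s^2, first derivative
jerk = np.gradient(accel, t)                  # m/s^3, second derivative
harsh = jerk < -0.762                         # boolean mask of harsh braking
```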

#### 1.1.3. Safety Indices

Safety indices are the third category of SSM. These indices consider various factors and produce an indirect safety metric. Two examples are crash potential index (CPI) and the aggregated crash propensity metric (ACPM). CPI was developed to improve upon the drawbacks of the DRAC measure. While a constant safety threshold value is typically used with DRAC, the braking capacity of vehicles is variable for mechanical and environmental reasons. CPI considers this variability through the use of a maximum available deceleration rate (MADR) distribution. The probability that DRAC is greater than MADR is a term in the computation of CPI. ACPM also considers the MADR distribution in conjunction with a distribution of driver reaction times to compute the probability that each vehicle interaction will result in collision. These probabilities are aggregated to produce the ACPM [10]. CPI and ACPM indicate the safety level of a study location and time period without being a single measure of some observable quality.
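
To make the probabilistic ingredient concrete, the sketch below evaluates P(DRAC > MADR) for a single interaction, assuming a normal MADR distribution; the distribution choice and its parameters are illustrative assumptions on our part, not values taken from the cited works.

```python
# A sketch of the core CPI term: the probability that required deceleration
# (DRAC) exceeds the maximum available deceleration rate (MADR).
from scipy.stats import norm

MADR_MEAN, MADR_SD = 8.45, 1.40   # m/s^2, assumed distribution parameters

def p_drac_exceeds_madr(drac_value: float) -> float:
    """P(DRAC > MADR) = P(MADR < DRAC) under the assumed normal MADR."""
    return float(norm.cdf(drac_value, loc=MADR_MEAN, scale=MADR_SD))

# A DRAC of 3.35 m/s^2 is well within typical braking capacity, so the
# probability term is small.
print(p_drac_exceeds_madr(3.35))
```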

Of the SSMs discussed, the analysis of harsh braking events holds potential due to its compatibility with CV data. Previous studies, such as Mousavi's thesis [9] and the work of Bagdadi and Varhelyi [11], have analyzed harsh braking data from GPS units due to the lack of availability of large-scale CV data when these studies were conducted. He et al. investigated the use of CV data for SSMs, using a safety pilot model dataset to compute TTC, DRAC, and a modified form of TTC [12]. Their study demonstrated the effectiveness of computing these measures with kinematic data from CVs. The development of a crash prediction model that uses harsh braking data from CVs would bridge the gap between these two studies and provide another tool for safety analysis.

#### **2. Materials and Methods**

The methods undertaken in this study comprise three phases: selection of study intersections, data collection, and statistical modeling. CV data collection was enabled by the automobile companies that manufactured the CVs. This study uses data from Salt Lake City, Utah for the months of March 2019, January 2021, and August 2021. These months were selected due to the availability of CV data for these particular months. A larger sample size would be preferable in future studies, but only three months of CV data were available due to budgetary restrictions.

#### *2.1. Intersection Selection*

The intersection selection process involved the collection of crash counts for all major intersections in Salt Lake City, amounting to 370 intersections. Crash counts for the three study months were obtained from the UDOT database and summed to find the total number of crashes for the intersections. The crashes within the UDOT system were filtered to include only those deemed to be intersection related by law enforcement. The sixty intersections with the most crashes were selected. The total monthly crashes ranged from zero to six. The sixty chosen study intersections included both signalized and unsignalized intersections.

#### 2.1.1. Data Collection

The CV data interface comprises an interactive map and a control pane. The map displays waypoints that are produced by the CVs. When a CV is in motion, waypoints are produced once every three seconds. The waypoints are grouped by a journey ID number into the overall trip of which they are a part, making it possible to collect CV volumes. The waypoints also include data such as geographical location, a timestamp, speed, acceleration, jerk, heading, and information about the origin and destination of the trip that includes the particular waypoint. Harsh braking events were identified using the jerk values of these waypoints. Jerk, the first derivative of acceleration, is a continuous measure for a vehicle, similar to speed or location; each waypoint records the jerk value at the corresponding moment, derived from the speed data. A geospatial filter was applied to limit the waypoints to those within the influence area of the study intersections: the main intersection square and the legs of the intersection 250 ft behind the stop bar, as displayed in Figure 1 [13]. Another filter was applied to retain only waypoints whose jerk value exceeds, in magnitude, the threshold that differentiates a regular braking event from a harsh braking event. This jerk threshold was varied in this study to test the effectiveness of several harsh braking definitions. The thresholds tested varied between −0.15 m/s³ and −3.2 m/s³ in increments of 0.15 m/s³. The query tool was used to obtain counts of harsh braking events for each of the jerk thresholds.

**Figure 1.** Intersection influence area with waypoints displayed.
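A hedged sketch of the threshold sweep follows. The *JerkK* naming assumes that index K denotes the K × (−0.15 m/s³) threshold, consistent with *Jerk18* = −2.7 m/s³ later in the paper; the waypoint jerk values are stand-ins, not the CV vendor's actual schema.

```python
import numpy as np

# Stand-in jerk values for waypoints already filtered to one influence area.
rng = np.random.default_rng(0)
waypoint_jerk = rng.normal(-0.2, 1.0, size=10_000)       # m/s^3, invented

# Sweep thresholds downward from -0.15 m/s^3 in 0.15 m/s^3 steps.
thresholds = -0.15 * np.arange(1, 22)                    # -0.15 ... -3.15 m/s^3
counts = {f"Jerk{k}": int(np.sum(waypoint_jerk <= thr))
          for k, thr in enumerate(thresholds, start=1)}
print(counts["Jerk18"])                                  # events at -2.7 m/s^3
```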

Other metrics collected from the CV data included the CV volumes and the average jerk value for each of the intersections. The CV volumes were obtained by querying the unique count of the journey ID numbers, which counts the trips that pass through the intersection and thus yields the volume of vehicles using it. The total monthly CV volume was collected, as was the total monthly volume between the hours of 7 AM and 9 AM and between the hours of 4 PM and 6 PM. The average jerk value among all waypoints within the intersection influence area was obtained on a monthly basis for each of the three study months for each of the intersections.

In addition to the crash data and CV data, information regarding the geometry and geography of each of the intersections was collected. The number of approaches with left turn lanes, the number of approaches with right turn lanes, and the maximum number of lanes that a pedestrian would have to cross were collected using Google Earth. Historical imagery was employed to ensure that these values were correct for the study months in question. ArcGIS Pro was used to determine the number of bus stops and the number of schools within a 305 m radius of the center point of each of the intersections. These metrics were included in this study because they are used in the safety performance functions within the Highway Safety Manual (HSM) [14]. Table 1 summarizes the dependent, exposure, and regressor variables collected for analysis in this study, organized per intersection or segment per month.


#### **Table 1.** Summary of variables.

\* N/A denotes not applicable.

#### 2.1.2. Statistical Analysis

Once these data points were collected for each of the study intersections during each of the study months, a statistical regression analysis was performed to produce crash prediction models for Salt Lake City. Poisson regression, negative binomial regression, and generalized Poisson regression were considered in the analysis. Three statistical methods were used to produce a larger pool of candidate models and to investigate which regression method performed best. Poisson regression requires that the mean and variance of the dependent variable are equal. The mean and variance of the monthly crashes at the intersections were approximately equal, making Poisson regression a viable option.

#### Poisson Regression

Poisson regression is applicable when the variable of interest is assumed to follow the Poisson distribution, which is a model of the probability that a particular number of events will occur. The dependent variable is the event count, which can be any of the nonnegative integers. Large counts are assumed to be uncommon, making Poisson regression similar to logistic regression, with a discrete response variable. Poisson regression, unlike logistic regression, does not limit the response variable to specific values. The Poisson distribution model takes the form given in Equation (1), in which *Y* is the dependent variable, *y* is a count from among the nonnegative integers, and *μ* is the mean incidence rate for an event per unit of exposure.

$$Pr(Y = y | \mu) = \frac{e^{-\mu}\mu^y}{y!} \quad (y = 0, 1, 2, \cdots) \tag{1}$$

If the Poisson incidence rate, *μ*, is assumed to be determined by a set of regressor variables, then Poisson regression is possible through the expression displayed in Equation (2) and the regression model displayed in Equation (3). In these equations, *X* is a regressor variable, *β* is a regression coefficient, and *t* is the exposure variable.

$$\mu = \exp(\beta\_1 X\_1 + \beta\_2 X\_2 + \dots + \beta\_k X\_k) \tag{2}$$

$$Pr(Y\_i = y\_i | \mu\_i, t\_i) = \frac{e^{-\mu\_i t\_i} (\mu\_i t\_i)^{y\_i}}{y\_i!} \tag{3}$$

The regression coefficients in Equation (2) may be estimated by maximizing the log-likelihood for the regression model. This is achieved by setting the derivative of the log-likelihood equal to zero to generate a system of nonlinear equations, which may be solved with an iterative algorithm. The iteratively reweighted least squares method is typically able to converge to a solution within six iterations [15].
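A minimal sketch of such a fit, using Python's statsmodels rather than the R used in the study, with the CV volume entering as the exposure term of Equation (3); all data below are stand-ins for the variables in Table 1.

```python
import numpy as np
import statsmodels.api as sm

# Stand-in data: 60 intersections x 3 months.
rng = np.random.default_rng(1)
n = 180
jerk18 = rng.poisson(40, n).astype(float)        # harsh-braking counts (stand-in)
schools = rng.integers(0, 3, n).astype(float)    # schools within 305 m (stand-in)
X = sm.add_constant(np.column_stack([jerk18, schools]))
exposure = rng.integers(5_000, 50_000, n)        # monthly CV volume (stand-in)
y = rng.poisson(0.5, n)                          # monthly crash counts (stand-in)

# Exposure enters as a log offset, as in Equation (3); GLM fits by IRLS [15].
result = sm.GLM(y, X, family=sm.families.Poisson(), offset=np.log(exposure)).fit()
print(result.summary())
```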

#### Negative Binomial Regression

The negative binomial distribution is a generalization of the Poisson distribution that includes a gamma noise variable. This allows for negative binomial regression to be performed even if the dependent variable's mean and variance are not equal [16]. Negative binomial regression is commonly used for traffic safety applications because it has loosened restrictions in comparison to Poisson regression but is still capable of estimating an observed count, such as crash counts [17]. The negative binomial distribution takes the form presented in Equation (4), in which *α* is the reciprocal of the scale parameter of the gamma noise variable and other variables are as defined previously.

$$\Pr(Y = y\_i | \mu\_i, \alpha) = \frac{\Gamma(y\_i + \alpha^{-1})}{\Gamma(y\_i + 1)\Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu\_i}\right)^{\alpha^{-1}} \left(\frac{\mu\_i}{\alpha^{-1} + \mu\_i}\right)^{y\_i} \tag{4}$$

The mean of *y* in negative binomial regression depends upon the exposure variable and the regressor variables which are related by the expression displayed in Equation (5). Negative binomial regression is possible with the regression model displayed in Equation (6). In these equations, *x* is a regressor variable, and the other variables are as defined previously. As with Poisson regression, maximizing the log-likelihood may be used to estimate the regressor coefficients through an iterative algorithm [16].

$$\mu\_i = \exp(\ln(t\_i) + \beta\_1 \mathbf{x}\_{1i} + \beta\_2 \mathbf{x}\_{2i} + \dots + \beta\_k \mathbf{x}\_{ki}) \tag{5}$$

$$\Pr(Y = y\_i | \mu\_i, \alpha) = \frac{\Gamma\left(y\_i + \alpha^{-1}\right)}{\Gamma(\alpha^{-1})\Gamma(y\_i + 1)} \left(\frac{1}{1 + \alpha\mu\_i}\right)^{\alpha^{-1}} \left(\frac{\alpha\mu\_i}{1 + \alpha\mu\_i}\right)^{y\_i} \tag{6}$$
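A corresponding sketch, again in statsmodels rather than the R used in the study: `sm.NegativeBinomial` estimates the dispersion parameter α by maximum likelihood alongside the regression coefficients, and the exposure variable enters internally as ln(t), matching Equation (5). Data are stand-ins.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 180
X = sm.add_constant(rng.poisson(40, n).astype(float))  # single Jerk18-style regressor
exposure = rng.integers(5_000, 50_000, n)              # monthly CV volume (stand-in)
y = rng.poisson(0.5, n)                                # monthly crash counts (stand-in)

# exposure enters as ln(t_i) per Equation (5); alpha is estimated by MLE.
nb_fit = sm.NegativeBinomial(y, X, exposure=exposure).fit(disp=False)
print(nb_fit.summary())
```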

#### Generalized Poisson Regression

Generalized Poisson regression, like negative binomial regression, is applicable in a broader set of circumstances than Poisson regression because it does not require that the mean and variance of the dependent variable be equal. There are two types of generalized Poisson regression models: Consul's generalized Poisson model and Famoye's restricted generalized Poisson regression model. Consul's model, also known as the Generalized Poisson-1 (GP-1) model, is the regression model that was employed in this study. The GP-1 model operates on the assumption that the dependent variable, *y*, is a random variable following the probability distribution presented in Equation (7), in which *λ* is the number of events per unit of time and *α* is the dispersion parameter, which can be estimated using Equation (8) [18]. In Equation (8), *N* is the number of samples, *k* is the number of regression variables, *y<sub>i</sub>* is the *i*th observed value, and *ŷ<sub>i</sub>* is the Poisson rate *λ<sub>i</sub>* predicted for the *i*th sample [19].

$$\Pr(Y = y\_i) = \frac{e^{-(\lambda + \alpha y\_i)}\,(\lambda + \alpha y\_i)^{y\_i - 1}\,\lambda}{y\_i!} \tag{7}$$

$$\alpha = \frac{\sum\_{i=1}^{N} \left(\frac{|y\_i - \hat{y}\_i|}{\sqrt{\hat{y}\_i}} - 1\right)}{N - k - 1} \tag{8}$$

Poisson, negative binomial, and GP-1 regression techniques were explored by model generation in R. Models with many different combinations of regressor variables were created to find the model that performed best. In all models, the number of monthly crashes was used as the dependent variable, and the monthly CV volume was used as the exposure variable. The statistical models were evaluated based on the significance of the regressor variables used in the models, the Akaike Information Criterion (AIC), and the residuals generated by the models. The best-performing models were selected and are summarized and discussed in the Results and Discussion Sections.
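A hedged sketch of the GP-1 fit and the AIC-based comparison across the three families, with statsmodels' `GeneralizedPoisson(p=1)` standing in for Consul's model; the data are stand-ins, not the study's dataset.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 180
X = sm.add_constant(rng.poisson(40, n).astype(float))
exposure = rng.integers(5_000, 50_000, n)
y = rng.poisson(0.5, n)

fits = {
    "Poisson": sm.GLM(y, X, family=sm.families.Poisson(),
                      offset=np.log(exposure)).fit(),
    "Negative binomial": sm.NegativeBinomial(y, X, exposure=exposure).fit(disp=False),
    "GP-1": sm.GeneralizedPoisson(y, X, p=1, exposure=exposure).fit(disp=False),
}
for name, res in fits.items():
    print(f"{name}: AIC = {res.aic:.1f}")   # lower AIC: better fit/complexity trade-off
```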

#### *2.2. Segment Analysis*

The preliminary results from the intersection study prompted interest in how the results of an intersection-based study would compare to those of a segment-based study. To address this, a segment analysis was conducted. CV data were collected for thirty road segments in the Salt Lake City area. These segments include sections of interstate highway within the Salt Lake City limits and sections of interrupted state highway outside of the influence area of any intersections. The segments were all made approximately one-quarter mile in length so that they had roughly equal crash exposure, removing the need to determine a crash rate per unit length.

The segment CV data were collected in the same manner as the intersection CV data, with a couple of key differences. First, whereas the intersection CV data were all collected from within intersection influence areas, the segment CV data were all collected from areas entirely outside of intersection influence areas. Second, the geometric information and information related to schools and bus stops were not collected for the segments. Rather, the segment data included only harsh braking events for jerk thresholds ranging between −0.3 m/s³ and −3.0 m/s³ in increments of 0.3 m/s³, as well as monthly CV counts, monthly CV counts between the hours of 7 AM and 9 AM, and monthly CV counts between the hours of 4 PM and 6 PM. As with the intersection analysis, crash data were collected for the segments from the UDOT database. The increment between successive jerk thresholds for segments differs from that used for intersections; this was done simply to reduce the analysis workload. More thresholds could have been tested, but the preliminary results from the intersection study indicated that the increment did not need to be as fine as 0.15 m/s³. Table 2 is a summary of the variables collected for segments in this study.

Statistical analysis was conducted in the same manner as the intersection analysis, with Poisson, negative binomial, and generalized Poisson models generated and evaluated for the segment dataset. The best-performing models were selected and are summarized and discussed in the following sections.



#### **3. Results**

The collected intersection data were used for a statistical regression analysis, and for each of the model families, the best regression model was identified as the one with a high level of significance among the regressor variables and the intercept. The best Poisson model uses *Jerk18* and *Schools* from Table 1 as regressor variables. The best negative binomial model also uses *Jerk18* and *Schools* as regressor variables. The best generalized Poisson model uses *Jerk18* as a regressor variable. All of these models have a better than 0.1% significance level for their regressor variables and the intercept. In the case of the generalized Poisson model, both intercepts are significant at a better than 0.1% level. These models are summarized in Table 3.

**Table 3.** Summary of regression models for intersection analysis.




The segment analysis also yielded three statistical models: a Poisson regression model, a negative binomial regression model, and a generalized Poisson regression model. The best Poisson, negative binomial, and generalized Poisson models identified use *Jerk2* as a regressor variable. All models have a better than 0.1% significance level for their regressor variable and intercept(s). These models are summarized in Table 4.

**Table 4.** Summary of regression models for segment analysis.


The estimates for the coefficients of the harsh braking variable in each of these regression models (*Jerk18* and *Jerk2*) are all negative, indicating that an increase in hard braking events decreases the estimated number of crashes within the intersection area or along the segment in question. This suggests that hard braking events are an indication of safety, both at intersections and on segments away from the influence of intersections.

Tables 3 and 4 include the models with the best level of statistical significance, but numerous other models identified were also statistically significant. A number of potential models could theoretically be used with similar results. The models display a gradual degradation in significance as the jerk variable used moves further from *Jerk18* for intersections and *Jerk2* for segments.

Validation efforts conducted with the models produced the following graphs, displayed in Figures 2 and 3. These graphs display the expected monthly crash counts for each of the three models on the vertical axis. The horizontal axis represents the observed monthly crash counts that correspond to each of the expected crash counts. The "jitter" function in R has been used to generate these plots; hence, there is scatter around the integer counts of observed crashes.

**Figure 2.** Intersection fitted crash counts versus observed crash counts.

**Figure 3.** Segment fitted crash counts versus observed crash counts.
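A brief sketch of how such a validation plot can be produced in Python, mirroring R's `jitter()` with uniform noise on the integer observed counts; the data here are invented.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
observed = rng.poisson(0.8, 180)                              # stand-in observed counts
fitted = observed * 0.6 + rng.normal(0, 0.3, observed.size)   # stand-in model output

# Jitter spreads the integer-valued observed counts so overlapping points separate.
jittered = observed + rng.uniform(-0.2, 0.2, observed.size)
plt.scatter(jittered, fitted, s=10, alpha=0.6)
plt.xlabel("Observed monthly crashes (jittered)")
plt.ylabel("Fitted monthly crashes")
plt.show()
```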

An additional analysis was conducted in the same manner as that which yielded the results presented up to this point, except with outlier crash counts removed from the intersection and segment datasets. The outliers were identified using boxplots generated for the observed crash counts; these boxplots, in which outliers are denoted as black points, are presented in Figure 4. The best identified Poisson, negative binomial, and generalized Poisson models are summarized in Tables 5 and 6.
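A minimal sketch of boxplot-style outlier screening follows, assuming the conventional 1.5 × IQR fences (the paper does not state the exact rule used by its boxplots).

```python
import numpy as np

def drop_outliers(counts):
    """Keep observations inside the 1.5*IQR fences (assumed boxplot rule)."""
    q1, q3 = np.percentile(counts, [25, 75])
    iqr = q3 - q1
    keep = (counts >= q1 - 1.5 * iqr) & (counts <= q3 + 1.5 * iqr)
    return counts[keep], ~keep

crashes = np.array([0, 0, 1, 2, 0, 1, 6, 0, 1, 5, 0, 2])   # invented monthly counts
kept, is_outlier = drop_outliers(crashes)
print(kept, np.flatnonzero(is_outlier))                    # the count of 6 is dropped
```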

**Table 5.** Summary of regression models for intersection analysis with outliers removed.


**Table 6.** Summary of regression models for segment analysis with outliers removed.



**Figure 4.** Boxplots of the observed monthly crash counts at intersections and segments.

Validation efforts were conducted for the models generated with outlier crash counts removed from the datasets. These validation efforts produced the graphs displayed in Figures 5 and 6. These graphs display the expected monthly crash counts for each of the three models on the vertical axis. The horizontal axis represents the observed monthly crash counts that correspond to each of the expected crash counts.

**Figure 5.** Intersection fitted crash counts versus observed crash counts with outliers removed.

**Figure 6.** Segment fitted crash counts versus observed crash counts with outliers removed.

#### **4. Discussion**

This study demonstrates the effectiveness of using harsh braking data from CVs as a surrogate safety measure. For both intersections and segments, statistically significant models may be developed from multiple model families. Models such as these may be used to predict future crash rates for the purposes of prioritizing improvements and identifying risks to the public.

The results of this study reveal the most effective jerk thresholds for intersections and segments. For intersections, the jerk threshold is −2.7 m/s³, corresponding to the regressor variable *Jerk18*. This threshold was identified to be the most effective for all three statistical model families. The jerk threshold for segments was found to be −0.3 m/s³, corresponding to the variable *Jerk2*. A jerk threshold of −2.7 m/s³ for intersections and −0.3 m/s³ for segments indicates that intersections and segments operate differently in terms of safety. The jerk threshold is the value of jerk that differentiates ordinary events from harsh braking events. Events that do not meet the jerk threshold have little or no bearing on crash prediction. A larger absolute value of jerk threshold for intersections than for segments indicates that braking must be more severe at an intersection to qualify a braking event as *harsh*. This could be due to different expectations of drivers in these differing contexts. At intersections, drivers expect to brake and are typically able to see the status of the traffic signal well in advance. Moderately hard braking, such as an event that generates a jerk value of −1.5 m/s³, is expected and therefore ordinary. Such an event in a segment context, however, would be relatively unexpected and therefore extraordinary, because segments are expected to have more uniform and smooth flow. This event would therefore qualify as a harsh braking event in a segment context but not in an intersection context.

The coefficient estimates for the harsh braking event count variables were found to be negative for all statistical models generated, indicating that an increase in harsh braking events is correlated with an increase in the frequency of zero crashes and a decrease in the frequency of one or more crashes. This means that harsh braking is correlated with increased safety on roads. The coefficient estimates for the jerk variable are small relative to those of other statistically significant covariates. This is because the number of crashes is small relative to the high jerk event counts, as can be seen in Tables 1 and 2; obtaining a crash count estimate from a high jerk event count requires a very small coefficient. This study was predicated upon the notion that a harsh braking event corresponds to a traffic conflict and that traffic conflicts and collisions are related. That these models have negative estimates for the coefficients of the harsh braking event counts suggests that harsh braking events are indicative of the prevention of a traffic conflict, which increases the probability of zero crashes and decreases the probability of one or more crashes. Harsh braking events are events that might have become collisions but did not, as a result of the evasive action of the drivers involved.

The statistical models presented in Tables 3 and 4 are statistically significant at a level better than 0.1%. Poisson models are simpler than negative binomial and generalized Poisson models, making them preferable where applicable. While the requirement that the mean and variance of the dependent variable be equal was approximately satisfied for the dataset used in this study, that may not be the case for other datasets. Therefore, negative binomial and generalized Poisson regression are recommended for crash prediction models based on harsh braking data.

As mentioned above, the presence of schools was found to increase crash frequency within intersection influence areas. This confirms the efficacy of the use of school presence in HSM safety analysis methodology. The estimated coefficients for the *Schools* variable are positive, indicating that the presence of a school or multiple schools nearby decreases the frequency of zero crashes and increases the frequency of one or more crashes. The presence of schools increases pedestrian activity and the presence of young drivers, which may help explain this increase.

The graphs presented in Figures 2 and 3 illustrate that these models fail to predict high crash counts while performing better at locations with lower numbers of observed crashes. This was not unexpected, because the count models used in this study predict low probabilities for higher counts. The statistical significance of the regressor variables in the models, on the other hand, speaks to their overall strength. As CV penetration rates increase, allowing models based on CV data to be trained on a fuller picture of the activity on roads, models of this form will likely become more effective. Preliminary studies such as this, using CV data in its technological infancy, set the stage for a future in which CVs become significantly more widespread and CV data capture a large portion, if not a majority, of roadway traffic. Figures 5 and 6, as well as the RMSE values presented in Tables 5 and 6, demonstrate that the models' predictive ability improves when outliers are removed. The RMSE values decreased from approximately 0.95 to 0.63 for intersections and from approximately 1.55 to 0.76 for segments, indicating that the models produce more accurate crash count estimates when outliers are removed.

#### **5. Conclusions**

This study developed several statistical models which use harsh braking event counts from CV data in Salt Lake City as regressor variables and crash counts as the dependent variables. Intersections and segments were considered separately, with models derived for each. Poisson, negative binomial, and generalized Poisson models were developed, and they revealed the jerk threshold for intersection influence areas to be −2.7 m/s³ and the jerk threshold for segments to be −0.3 m/s³. Additionally, the presence of schools within 305 m was found to be a statistically significant variable for intersection influence areas.

Crash prediction models such as these, based on harsh braking event counts, hold promise for agencies and industry as another tool for safety analysis. Agencies may investigate these models and tailor them to their jurisdictions for the purpose of adding such models to their established methodologies. Such tailored models may then be employed as a means of conducting comparative safety analysis for the purpose of identifying crash-prone locations and prioritizing improvements. Once a particular area is identified as being crash prone, further investigation into the cause of the safety hazard may commence. Employing harsh braking models such as those developed in this study requires less labor investment than existing methods, allowing for more frequent and widespread analyses to identify and characterize road hazards. It should be noted that the intersection models developed in this research are likely not applicable to sites with low crash activity, as intersections were selected to maximize the amount of historical crash activity.

Future research into SSMs that are based on harsh braking events could include the investigation of regional differences in models, the use of additional regressor variables in segment-based models, and harsh positive acceleration data from CVs. Regional differences may exist pertaining to the relationship between harsh braking and collisions. Harsh braking events were found to be positively correlated with crashes in a previous study in Louisiana, which is contrary to the findings of this study [9]. While this may be due to the significant differences in the methods of data collection between these two studies, regional variations may also be a factor and ought to be investigated further. Additional regressor variables were not investigated in the segment-based models developed in this study to the degree to which they were investigated in the intersection-based models. The inclusion of such additional regressor variables for segments ought to be investigated more fully in a future study. These variables may include speed limits, curvature parameters, lane widths, or total number of lanes, among others. Finally, harsh positive acceleration data may be obtained in the same manner in which harsh braking data were collected in this study. Harsh acceleration may be an indicator of safety or the lack thereof because it can represent erratic driving behavior or situations in which a driver is attempting to clear a potential crash location rapidly. The consideration of harsh acceleration data may be performed separately from harsh braking data or in combination with harsh braking data. If successful, this would yield yet another tool for agencies and industry to employ for surrogate safety analysis.

**Author Contributions:** The authors confirm contribution to the paper as follows: study conception and design: M.K.; data collection: N.E.; analysis and interpretation of results: M.K. and N.E.; manuscript preparation: N.E. and M.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by Subaward No. UWSC9924 from the University of Washington to Boise State University from the U.S. Department of Transportation award to the University of Washington.

**Data Availability Statement:** Data used in this research are available by contacting the corresponding author at mkhanal@boisestate.edu.

**Acknowledgments:** The authors are grateful to the Boise State University Department of Civil Engineering for their support of this research. The authors would also like to express their gratitude to the support received from the PacTrans Region 10 University Transportation Center that made the procurement of the CV data possible.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **An Approach to Quantifying the Influence of Particle Size Distribution on Buried Blast Loading**

**Ross Waddoups 1, Sam Clarke 1,\*, Andrew Tyas 1,2, Sam Rigby 1, Matt Gant 3 and Ian Elgy 3**


**\*** Correspondence: sam.clarke@sheffield.ac.uk

**Abstract:** Buried charges pose a serious threat to both civilians and military personnel. It is well established that soil properties have a large influence on the magnitude and variability of loading from explosive blasts in buried conditions. In this study, work has been undertaken to improve techniques for processing pressure data from discrete measurement apparatus; this is performed through the testing of truncation methodologies and the area integration of impulses, accounting for the particle size distribution (PSD) of the soils used in testing. Two experimental techniques have been investigated to allow for a comparison between a global impulse capture method and an area-integration procedure from a Hopkinson Pressure Bar array. This paper explores an area-limiting approach, based on particle size distribution, as a possible approach to derive a better representation of the loading on the plate, thus demonstrating that the spatial distribution of loading over a target can be related to the PSD of the confining material.

**Keywords:** buried charges; impulse; particle size distribution; soil condition; landmine

### **1. Introduction**

Improvised Explosive Devices (IEDs) and landmines pose a serious threat to life in military and civilian situations around the world. In 2020, over seven thousand people were killed or injured by landmines or 'Explosive Remnants of War' (ERWs); 80% of these were civilians [1]. A better understanding of the behaviour of these explosive charges can lead to better protection against them, thus saving lives and preventing injuries.

Much work has been performed to investigate the effects of explosive loading, especially for military applications, both in free air and with buried charges. The experiments are often conducted at a reduced scale [2] due to the high cost and difficulty of full-scale testing. Hopkinson [3]–Cranz [4] cube-root scaling is regularly used for this purpose. These experiments are supported by numerical modelling efforts [2], although these are often simplified models that do not incorporate soil-specific effects.

Studies have, in the past, failed to take sufficient account of soil conditions in experimentation and for the prediction of loading. Børvik et al. [5] and Kyner et al. [6] used ∼200 μm glass microspheres as a synthetic soil to reduce the influence of variable soil conditions. McShane et al. [7] used compressed gas in place of explosives to reduce complexity and increase the ease of experimentation. This was found to be a suitable method of simulating sand-throw interactions with structures, although it only accounts for impulse transfer through said sand throw (ignoring blast-wave transfer).

It has been found that introducing/taking account of these complexities has wide-ranging effects on the loading generated and, thus, it is key that these are accounted for in future work. Hlady [8] used Concrete Fine Aggregate Sand (CFAS), a cohesionless well-graded sand, and compared this against Suffield Prairie Soil (PS), a fine-grained cohesive soil composed mainly of clay. It was found that CFAS led to a greater level of repeatability, alongside a much reduced level of labour required in preparation, compared
to the cohesive soil. Fourney et al. [9] conducted small-scale experiments in a range of soil conditions, which showed that soil ejecta contributes the majority of impulsive loading from buried charges. Anderson et al. [10] varied plate and soil parameters in a platejump-height experimental setup, finding that increasing the moisture content (and bulk density) resulted in greater momentum transfer. Bergeron et al. [11] carried out a series of small-scale experiments, using high-speed imaging and flash X-ray to capture soil ejecta and air shock detachment at greater distances. This showed that the ejection velocity of the soil decreases with increasing overburden, as does the air shock propagation speed (with this being greatest in a soil surface flush-buried condition). Weckert and Resnyansky [12] also used flash X-ray to capture ejecta expansion in experiments utilising a range of soils of varying PSD for the validation of numerical modelling. Very good agreement was found between the numerical and experimental results for the ejecta-wave expansion rate and shape.

Clarke et al. [13,14] found that the use of well-graded cohesionless soils results in greater variability in total impulse between tests, when compared with uniform cohesionless soils, for all moisture contents and bulk densities. Although geotechnical conditions (such as moisture content and bulk density) could be controlled to a high level, the well-graded nature of 'Stanag' (an approximation of the sandy gravel defined by [15]) results in a wider spread of impulse values than in a uniform soil such as Leighton Buzzard Sand (LB). A comparison of two LB fractions, 'Fraction B' (LB) and '25B Grit' (LBF), with respective *C<sub>u</sub>* (coefficient of uniformity, defined in Equation (1)) values of 1.4 and 3.2, showed a spread of impulse four times higher for LBF, even though similar levels of control of geotechnical conditions were achieved [16], demonstrating that increased variability is to be expected with an increasingly well-graded soil. A comparison of the particle size distributions of the soils used in this and other studies is shown in Figure 1.

$$C\_u = \frac{D\_{60}}{D\_{10}} \tag{1}$$

where *D*<sub>60</sub> is the 60th percentile particle size by mass and *D*<sub>10</sub> is the 10th percentile particle size.

**Figure 1.** Particle Size Distribution (PSD) curves of a range of soils tested in the literature. LB, LBF, Clay [13]; Stanag [17]; CFAS (Centre of Allowable Bounds) [18]; and Glass Microspheres [5,6].

Computer programs have been used for decades for the prediction of blast loading from buried charges, with much based on earlier work by Westine et al. [19]. Tremblay [20] built on Westine's work to establish algebraic equations for impulsive loading; however, these do not account for moisture content as a separate influence from soil density. This is necessary as, for a constant bulk density, an increasing moisture content results in increasing impulse delivery [21].

Numerical modelling has begun to capture the specific loading characteristics associated with soil conditions. Børvik et al. [5] and Kyner et al. [6] used discrete-particle-based numerical models to simulate their small-scale soil-analogue experimental work. Grujicic et al. [22–24] developed material models for sand that take account of soil parameters including saturation and particle size. It is imperative that experimental results can be gathered and used to validate these models.

Research at the University of Sheffield has been conducted via two methods: 'Characterisation of Blast Loading' (CoBL) and 'Free-flying mass impulse capture apparatus' (FFM) [25]. FFM utilised a half-scale (of STANAG Threat Level 2, as defined by [15]) experimental setup wherein a deformable target plate and reaction mass captured the impulse from the buried charge, with the global impulse derived (as in Figure 2a). Hence, this method only captured the overall loading, without the spatial implications.

This spatial relationship has been determined previously using removable tapered plugs in the target plate [26], where their ejection velocity was measured using high-speed video. It was found that, as distance from the centre of the charge increases, the specific impulse decreases exponentially. A more accurate and repeatable experimental method has been developed for the CoBL setup [27] at quarter-scale, using 17 Hopkinson Pressure Bars (HPBs) of 10 mm diameter, arranged radially up to 100 mm from the charge centre in the face of a rigid target plate. Each HPB measures the axial strain, which is converted to stress; the specific impulse is then obtained by integration in time [27], and the global impulse is interpolated over the instrumented area (as in Figure 2b).

**Figure 2.** The two experimental setups at the University of Sheffield. (**a**) FFM (from Figure 3 of Clarke et al. [28]). (**b**) CoBL (from Figure 2 of Rigby et al. [29]).

In well-graded Stanag soil, individual particles can be over twice the size of the HPBs used in the CoBL setup. The total impulse values reported from FFM and CoBL testing are not in agreement for this soil type [17]; the impulse from CoBL is found to be much greater than that expected from scaling FFM, whereas this is not the case for uniform soils. This suggests that the method of determining loading (a simple interpolation between discrete points) could be flawed for this well-graded soil. Hence, work is required to establish the relationship between a soil's PSD and the distribution and magnitude of impulsive loading.

#### **2. Methodology**

In order to address the disparity between the global impulse results for well-graded soil between CoBL and FFM experiments, alterations were required to the method of interpolation between the discrete measurement points (as laid out in Figure 3).

**Figure 3.** Layout of HPBs in the rigid target plate for the CoBL experimental setup.

The previous data processing method (as used by Rigby et al. [17] and Clarke et al. [30] for all soil types, and outlined in detail in [31]) operated by first importing the voltage signals from the experimental output, converting these to pressure signals, then truncating them to a chosen length of time. A breakwire placed within the explosive charge was used to trigger the recording, so the truncation is applied after this time. Next, all of the pressure traces are aligned in time by their maximum pressure, so that, at any time after wave arrival, the value of the pressure can be interpolated between each HPB in the same axial direction (thus eliminating the temporal progression element and reducing the problem to a one-dimensional interpolation). These four axes (positive xx, positive yy, negative xx, and negative yy) can then be interpolated between to populate the quadrants of a matrix with the expected pressure at every location (to a given mesh size). This occurs for the full test duration, after which the temporal wave progression is reintroduced to allow the algorithm to represent both the temporal and spatial aspects of the loading. In the previous work, this temporal matrix of pressures over the plate was used to derive a specific impulse and global impulse over the whole plate. This methodology, along with the new interventions proposed herein, is outlined in the flowchart in Figure 4; a simplified sketch of the spatial interpolation step follows the figure caption.

**Figure 4.** Flowchart of the methodology to convert discrete pressure measurements from HPBs to a full-plate dataset.
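To make the spatial step concrete, here is a heavily simplified, hedged sketch of the quadrant-filling interpolation for a single time sample: pressures known at the HPB radii on two adjacent cardinal axes are interpolated radially and blended linearly in angle. The mesh size, pressures, and blending rule are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

radii = np.array([0.0, 25.0, 50.0, 75.0, 100.0])     # HPB radial positions (mm)
p_xx = np.array([350.0, 180.0, 90.0, 40.0, 15.0])    # pressures on +xx axis (MPa)
p_yy = np.array([350.0, 160.0, 110.0, 35.0, 12.0])   # pressures on +yy axis (MPa)

n = 101                                              # 1 mm mesh over one quadrant
quadrant = np.zeros((n, n))
for i in range(n):                                   # x position (mm)
    for j in range(n):                               # y position (mm)
        r = np.hypot(i, j)
        if r > 100.0:
            continue                                 # outside instrumented radius
        w = np.arctan2(j, i) / (np.pi / 2)           # 0 on +xx axis, 1 on +yy axis
        quadrant[i, j] = ((1 - w) * np.interp(r, radii, p_xx)
                          + w * np.interp(r, radii, p_yy))
```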

#### *2.1. Signal Truncation*

Previously, an arbitrary cut-off, sufficiently later than the passing of the pressure wave, was used to truncate the input pressure signals. For the data presented by Clarke et al. [30], these results can be replicated by the use of a cut-off of 1.3 ms (1.2 ms after wave arrival), using peak global impulse as the reported value of impulse. Lide et al. [32] state that the speed of a wave in a narrow stainless steel rod is 5000 m/s, which means that over the 6 m distance that the wave travels from the strain gauges to the end of the HPB and back, a 1.2 ms time period will have elapsed before the reflection interferes with the pressure trace. This length of truncation has been found to have a significant effect on some test results due to the presence of pressure signal 'drift' after the loading has occurred. This drift is a phenomenon wherein, on certain tests, the gauge voltage (and, thus, the recorded pressure) does not return to zero after the loading period, even though the true pressure has returned to the ambient level. Given enough time to compound, this can lead to large changes in the derived global impulse: negative drift is acceptable, as the peak impulse can be measured before the drift pulls it down, but positive drift inflates the reported impulse value. The drift acts in opposite directions depending on the scope polarity during experimentation; this polarity varied during the testing, as it was assumed that it would not affect the results, with the systematic error only identified post-testing. As such, a reduced truncation time of 0.7 ms was introduced, as this has been found to reduce the influence of pressure drift whilst still capturing the full period of loading. The effects of this can be seen in Figure 5.

**Figure 5.** 1.3 ms truncated signals with opposing drift directions; the dashed line shows the reduced 0.7 ms truncation period. 'Dominant radius' indicates where the majority of impulse contributions are occurring at that time. (**b**) Example Stanag (well-graded sandy gravel) cumulative impulse plot with positive signal drift.
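The reflection arithmetic above is simple enough to sanity-check directly (a sketch; the sampling rate and the trace slicing are assumptions for illustration):

```python
c_bar = 5000.0                 # m/s, wave speed in a narrow steel rod [32]
round_trip = 6.0               # m, strain gauges to the free end of the HPB and back
print(round_trip / c_bar)      # 1.2e-3 s: reflections arrive 1.2 ms after the wave

fs = 312_500                   # Hz, assumed sampling rate for illustration
cutoff = 0.7e-3                # s, the reduced truncation adopted to limit drift
# truncated = pressure[: int(cutoff * fs)]   # hypothetical slicing of one trace
```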

#### *2.2. Signal Noise and Filtering*

In order to improve the readability of the pressure signals from the experimental apparatus, data smoothing was performed with the purpose of reducing signal noise without significantly affecting the peak pressure and impulse values. A Hampel (median) filter [33], with various window widths, was trialled with limited success: some especially noisy signals were improved marginally, though not to an adequate level. Savitzky–Golay filtering [34], as used by Pannell et al. [35] for the removal of noise from specific impulse data, was also evaluated. A first-order fit (moving average) with varying frame lengths was applied, with a frame length of 11 samples (equivalent to approximately 35 μs) selected as appropriate due to negligible reductions in peak pressure and global impulse whilst delivering a significantly 'cleaner' pressure signal.
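A short sketch of this smoothing step with SciPy follows; the synthetic pulse is invented, and the ~312 kHz sampling rate is an assumption chosen so that 11 samples span roughly 35 μs.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(5)
dt = 3.2e-6                                          # s, assumed sample interval
t = np.arange(0.0, 1.3e-3, dt)
pulse = 350e6 * np.exp(-((t - 2e-4) / 4e-5) ** 2)    # Pa, synthetic pressure pulse
noisy = pulse + rng.normal(0.0, 8e6, t.size)

# First-order Savitzky-Golay over an 11-sample frame is a moving average.
smoothed = savgol_filter(noisy, window_length=11, polyorder=1)
print(noisy.max() / 1e6, smoothed.max() / 1e6)       # peak (MPa) largely preserved
```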

To understand the origin of the noise in the signal, a Fast Fourier Transform (FFT) operation was undertaken to find the Discrete Fourier Transform (DFT) of the raw pressure signals, to determine whether there were any dominant frequencies within the signal that could be attributed to physical causes such as electrical noise. Figure 6 shows the frequency spectra resulting from the FFT process for three tests, showing that the majority of the signal is in the <100 kHz range. There is no secondary peak in the frequency spectrum, suggesting that the noise cannot be attributed to a consistent external source.

**Figure 6.** FFT analysis of LB experimental pressure signals. Numerical test IDs are consistent with [30], with alphabetical IDs being previously unpublished. The figure inset is a zoomed-in section of the same data.

Wang and Li [36] used FFTs to filter out high-frequency noise from signals in Split Hopkinson Pressure Bar experiments, applying a low-pass filter to the FFT output, then performing an inverse FFT to retrieve a low-noise output. The same method was applied in this study. Tyas and Watson [37] state that "the highest acceptable frequency component in a signal propagating in a steel bar [is limited] to approximately from 250/*a* to 500/*a* kHz", with *a* being the HPB radius in mm, in this case *a* = 5. Thus, the maximum theoretically acceptable frequency would be between 50 and 100 kHz. However, they go on to state that, in blast loading, the situation can be more complicated. They propose a dispersion correction method that results in a bar having "a bandwidth in excess of 1250/*a* kHz" [37]. Supported by Figure 6, 100 kHz was selected as the maximum frequency cut-off in the current analysis. A comparison of the original pressure signal, with the specific impulse take-up for the selected bar, against the Savitzky–Golay-filtered and inverse-FFT signals is presented in Figure 7. It can be seen that Savitzky–Golay filtering is an effective method of reducing signal noise whilst preserving the pressure peak and specific impulse. However, the inverse FFT method is not suitable as a noise-reduction method, as the pressure peak and specific impulse take-up are not preserved. Thus, Savitzky–Golay filtering has been used to process each of the raw pressure signals before further analysis is performed.

**Figure 7.** For a single HPB pressure trace: (**a**) Raw Signal vs. (**b**) Savitzky–Golay Filtered vs. (**c**) Inverse FFT Filtered. Blue = pressure, red = specific impulse.
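A hedged sketch of the inverse-FFT route evaluated above: compute the spectrum, zero everything above the 100 kHz cut-off, and invert. The sampling rate and the two-tone test signal are assumptions.

```python
import numpy as np

fs = 312_500.0                                   # Hz, assumed sampling rate
t = np.arange(0.0, 1.3e-3, 1.0 / fs)
signal = (np.sin(2 * np.pi * 20e3 * t)           # 20 kHz "real" content
          + 0.3 * np.sin(2 * np.pi * 140e3 * t)) # 140 kHz stand-in noise

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
spectrum[freqs > 100e3] = 0.0                    # low-pass at the 100 kHz cut-off
filtered = np.fft.irfft(spectrum, n=signal.size)
```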

#### *2.3. Wave Arrival Time*

The subsequent alignment of the pressure signals for interpolation, using the maximum value of pressure in each individual signal, is relatively good, with signals usually aligned within 50 μs of each other. However, this approach breaks down when the arrival of the wave and the maximum pressure peak do not coincide (through the presence of a second peak slightly after the first, potentially from the initial separation of the shock front and the soil ejecta wave). As such, the alignment was changed to operate by finding the *n* highest pressure peaks in the signal (*n* = 5 was selected as the optimum value), then using the time signature of the earliest of these as the arrival time of that pressure wave. This improved the signal alignment substantially and also allowed for the determination of the wave-expansion speed (across the plate), which can be compared with other experimental blast wave speed data. The transition from raw pressure signals to smoothed, arrival-time-aligned signals is demonstrated for an example LB test in Figure 8. It can be seen that many of these signals exhibit slight negative pressure drift, as outlined previously.
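A minimal sketch of that arrival pick with SciPy (function names are illustrative):

```python
import numpy as np
from scipy.signal import find_peaks

def arrival_index(pressure, n=5):
    """Return the sample index of the earliest of the n highest pressure peaks."""
    peaks, props = find_peaks(pressure, height=0.0)
    if peaks.size == 0:
        return int(np.argmax(pressure))          # degenerate trace: fall back to max
    top_n = peaks[np.argsort(props["peak_heights"])[-n:]]
    return int(top_n.min())                      # earliest of the n highest peaks

# Traces can then be aligned by shifting each one so that its arrival index
# matches that of a chosen reference HPB.
```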

The time of arrival (TOA) of a pressure wave at each HPB for a single test is plotted against the radial distance from the centre of the plate in Figure 9a. A reasonable TOA curve can be seen, growing somewhat exponentially with increasing radial distance, consistent with the behaviour exhibited at far-field distances in air [38]. The corresponding average wave-expansion velocity (calculated from an interpolation of the change in TOA at each HPB over the horizontal distance, from the central HPB) is shown in Figure 9b. This velocity is that of the coupled soil ejecta and blast-wave expansion, as the blast wave is found (through HSV) to usually detach from the soil ejecta only at greater distances [11,29]. It can be seen that, after an initial period of instability (indicated by the flat portion of the graph), the velocity reduces with radial distance. The arrival times are typically variable within the 0–25 mm range, potentially due to experimental error in the centring of the sand bin and charge below the instrumented plate surface, as well as from the influence of sand plumes ejecting ahead of the main wave. The decision was made to limit analysis of the wave speed to radii ≥25 mm due to the unreliability of the data before that point, exacerbated by the presence of only one sensor at the centre rather than the four at every other instrumented distance.

**Figure 8.** Full array of pressure signals for a single test: (**a**) raw pressure input, (**b**) smoothed, and (**c**) arrival-time-aligned.

**Figure 9.** Time of arrival and wave-expansion velocity for an LB test. (**a**) TOA vs. radial distance from plate centre.

#### *2.4. Comparison of CoBL and FFM*

So that the outputs of the modified interpolation algorithm could be reconciled against other data, global impulse values from the FFM tests were used as a comparison [28]. There is a disparity between the CoBL and FFM results for the Stanag tests (as discussed previously and in [17]), with the Stanag global impulse results being consistently higher than those of LB in CoBL testing but the reverse being true in FFM. There are two ways to reconcile the differences in these datasets. The first is that postulated in [17], wherein this higher impulse is caused by a more centralised loading in Stanag (due to a large number of discrete strikes directly above the charge, as well as the higher stiffness of Stanag), captured by the smaller relative instrumented area of CoBL. If this were the case, a greater peak deflection for the same impulse would be expected in plate-deflection experiments, such as that seen for testing performed with charges contained within a steel 'pot' (Minepot) in [28]. The peak deflections are marginally higher for Stanag when compared with LB (for the same impulse), but the extent and contribution of this loading centralisation is currently unknown. The second is that the existing area-integration of impulse inaccurately assumes (through cubic interpolation) that a regular wave of soil expansion occurs, disregarding the effects of discrete large particles, resulting in a consistent over-estimation of the global impulse. This study hypothesises that the particle strikes do not occur across the whole plate in this wave-like manner and, instead, only occur at a limited number of locations, with the simple interpolation currently acting to skew the results by 'stretching' the increased pressure readings over an excessively large 'zone of influence'. This effect is shown in Figure 10, with strikes at the two HPBs. The interpolation algorithm required alteration so that it could correctly account for discrete particle strikes on top of a background contiguous wave. This study is intended as a proof of concept to investigate whether it is plausible to correct the interpolation algorithm to account for the effect of discrete particle strikes.

It was determined that this alteration should use an area-limiting scheme, wherein the maximum area surrounding a HPB, over which a recorded pressure would be likely to have been applied, would be determined, as opposed to the existing algorithm, which assumed that a full 180 degrees of the plate could be influenced by this pressure spike. For example, the existing algorithm would assume that a strike at a 75 mm HPB effectively impacts an area of 25 mm by 236 mm, a total area of 5890 mm², when a typical Stanag particle (*D*<sub>50</sub>) is around 10 mm in diameter (a cross-sectional area of 79 mm²). As such, it was important to determine a way to limit this area of influence to the true limit that would be expected in the real, uninstrumented regions of the target plate (represented in simplified form for a single axis in Figure 11).

**Figure 10.** Demonstrating the difference between the actual total discrete strike area (shown here as 10% of the plate area) versus the 'zone of influence' of the existing interpolation. It is assumed that if a proportion of the plate area is struck, the same proportion of the HPBs will be struck also; a strike area of 3142 mm² results in a 'zone of influence' of 9817 mm² in this example case.

This area limiting, as a consequence of the PSD, has been attempted as a possible method of understanding the behaviour of the soil in blast conditions. A number of approximations have been created, including assuming perfectly plastic collision behaviour and the straight-on impact of particles. Further work is required to establish the validity of these assumptions and to increase accuracy.

**Figure 11.** Diagram showing a simplified 1D approximation of the area-limiting process for a single cardinal direction with discrete particle strike at 75 mm.

#### *2.5. Theoretical Particle Strike Area*

In order to establish the area limit for any given particle strike at any HPB, it was first important to consider the known soil parameters that could be used to derive it; the *D*<sub>50</sub> value (the median particle diameter) was selected for this purpose. The *D*<sub>50</sub> can be established from a particle size distribution graph and gives a typical particle size that is representative of the soil as a whole.

This *D*<sub>50</sub> value can be used, along with a stone density (assumed as a typical 2700 kg/m³), in the calculation of a typical particle mass (assuming a spherical particle). When this is multiplied by the velocity of a strike (determined from the average wave-expansion velocity at the relevant radial distance), the momentum of a typical particle strike can be established.

$$I\_{\text{strike}} = F\_{\text{typ.}} \cdot t\_{\text{strike}} = m\_{\text{particle}} \cdot v\_{\text{avg.}} \tag{2}$$

This momentum transfer is equivalent to the impulse experienced if a plastic response is assumed (particle obliterated on impact, which is not usually the case and is thus a simplification). The impulse would be greater if the particle were reflected elastically, up to a maximum of twice the incident momentum (for a perfectly elastic collision). This equivalence of recorded impulse to incoming momentum has been shown to be the case for a homogeneous sand slug [39], but it is likely a simplification of the behaviour of discrete particle loading. However, the expected increase in impulse due to collision elasticity may be counteracted by oblique or glancing impacts of particles on HPBs, reducing the incident force; for simplicity, these effects have been ignored in this study.

If divided by the time period of the strike (assumed as 25 μs from initial graphs), the impulse can be converted to a typical force value. This time period is supported by the findings of Liu et al. [40], who found that in a sand slug impact, the soil densification time (and thus strike length) is related to the column height of the soil (the overburden of 28 mm in Stanag testing), *H*, and the velocity of the soil (see Figure 12):

$$t = \frac{\bar{t} \cdot H}{v\_0} \tag{3}$$

where, for an incompressible (rigid) target, $\bar{t} \approx 1$ when the pressure drops to zero ($\bar{t}$ is a non-dimensional time).

Thus, for a 25 μs time period with *H* = 28 mm, a velocity of 1120 m/s would be expected, which is reasonable given the velocities of the soil in Stanag tests (peak average velocities: 864–2637 m/s and, at the 100 mm HPB, 636–1050 m/s).

$$\begin{split} F\_{\text{typ.}} &= \frac{m\_{\text{particle}} \cdot v\_{\text{avg.}}}{t\_{\text{strike}}}\\ &= \frac{\frac{4}{3} \cdot \pi \cdot \left(\frac{D\_{50}}{2}\right)^3 \cdot \rho\_{\text{stone}} \cdot v\_{\text{avg.}}}{t\_{\text{strike}}} \end{split} \tag{4}$$

If this typical force value is divided by the maximum actual recorded pressure at a given HPB, this will result in the area over which a particle strike should become effective, which can further be reduced to a radius of effect.

$$A\_{\text{strike}} = \pi \cdot r\_{\text{strike}}^2 = \frac{F\_{\text{typ.}}}{P\_{\text{rec.}}}$$

$$r\_{\text{strike}} = \sqrt{\frac{A\_{\text{strike}}}{\pi}} = \sqrt{\frac{F\_{\text{typ.}}}{\pi \cdot P\_{\text{rec.}}}} \tag{5}$$

$$r\_{\text{strike}} = \sqrt{\frac{\frac{4}{3} \cdot \left(\frac{D\_{50}}{2}\right)^3 \cdot \rho\_{\text{stone}} \cdot v\_{\text{avg.}}}{t\_{\text{strike}} \cdot P\_{\text{rec.}}}} \tag{6}$$

For example, the testing used Stanag soil with a *D*<sub>50</sub> of 10 mm (extracted from Figure 1). For the HPB in test 34 at yy−50, the maximum recorded pressure was 353.7 MPa and the average wave speed at 50 mm radius was 1760.65 m/s. Therefore, the effective radius of this particular particle strike can be calculated as below:

$$\begin{split} r\_{\text{strike}} &= \sqrt{\frac{\frac{4}{3} \cdot \left(\frac{10 \times 10^{-3}}{2}\right)^{3} \cdot 2700 \cdot 1760.65}{25 \times 10^{-6} \cdot 353.7 \times 10^{6}}} \\ &= 0.0095 \,\text{m} \\ &= 9.5 \,\text{mm} \end{split}$$
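Equation (6) and the worked example above translate directly into code (a sketch; the variable names follow the text):

```python
import numpy as np

def strike_radius(d50, rho_stone, v_avg, t_strike, p_rec):
    """Effective strike radius (m) per Equation (6); pi cancels between the
    spherical particle volume and the circular strike area."""
    f_typ = (4.0 / 3.0) * np.pi * (d50 / 2.0) ** 3 * rho_stone * v_avg / t_strike
    return np.sqrt(f_typ / (np.pi * p_rec))

r = strike_radius(d50=10e-3, rho_stone=2700.0, v_avg=1760.65,
                  t_strike=25e-6, p_rec=353.7e6)
print(round(r * 1e3, 1), "mm")          # 9.5 mm, matching the worked example
```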

The particle mass for this theoretical (spherical) *D*<sub>50</sub> particle would be:

$$\begin{aligned} m &= \frac{4}{3} \cdot \pi \cdot \left(\frac{D\_{50}}{2}\right)^3 \cdot \rho\_{\text{stone}} \\ &= \frac{4}{3} \cdot \pi \cdot \left(\frac{10 \times 10^{-3}}{2}\right)^3 \cdot 2700 \\ &= 1.4 \text{ g} \end{aligned}$$

The larger particles within the soil have much higher masses but are not representative of the soil as a whole and, given their lower probability of occurrence, are less likely to strike the HPBs. Some of these larger particles are pictured in Figure 13, with masses in the range from 9.70 g to 15.11 g. For reference, a spherical particle of diameter 20 mm has a mass of 11.3 g.

**Figure 13.** A selection of large particles extracted from a sample of Stanag soil, with masses of 10.41 g, 12.89 g, 9.70 g, and 15.11 g.

#### *2.6. Application of Area-Limiting: Well-Graded Soil*

To apply the area-limiting scheme to the pressure interpolation algorithm and, thus, garner a more accurate picture of the mechanisms occurring, a *background*-interpolated array was generated, onto which the discrete areas of an array inclusive of particle strikes were superimposed.

This background array was generated by finding the time signature of the maximum pressure value at each HPB and then removing a 25 μs section of data surrounding it from the signal, effectively removing the period of the particle strike. After performing this action on all 17 HPB signals, these were interpolated using the original method to create a 3D pressure array.

This interpolation was also carried out on the unaltered data, inclusive of the particle strikes. The circular portions of this array (for the full time of the test), with the appropriate strike radii, were then applied to the background array to create an overall pressure array. This overall pressure array consists of data representing the standard wave of smaller particles expanding from the centre, with limited discrete pressure spikes from larger particle strikes (see Figure 14).
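One way this two-step procedure might be implemented is sketched below; the linear bridging of the removed strike window and the array layout are our assumptions, as the original processing code is not reproduced here.

```python
import numpy as np

def remove_strike(signal, dt, window=25e-6):
    """Blank out a `window`-long section centred on the peak pressure,
    approximating the background (strike-free) signal at one HPB."""
    cleaned = signal.copy()
    peak = int(np.argmax(signal))
    half = int(round(window / dt / 2))
    lo, hi = max(peak - half, 0), min(peak + half, len(signal))
    # Linearly bridge the gap left by the removed strike period.
    cleaned[lo:hi] = np.linspace(signal[lo], signal[hi - 1], hi - lo)
    return cleaned

def area_limit(background, original, centres, radii, xx, yy):
    """Superimpose strike data onto the background pressure array within
    each strike radius. `background` and `original` are (nx, ny, nt)
    interpolated pressure arrays; `xx`, `yy` are (nx, ny) coordinate grids."""
    combined = background.copy()
    for (cx, cy), r in zip(centres, radii):
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        combined[mask, :] = original[mask, :]
    return combined
```

Here, the strike radii passed to `area_limit` would be those given by Equation (6) at each HPB.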

**Figure 14.** Test 34 (Stanag) pressure surfaces over a 40 μs period: (**a**) Background (strikes removed), (**b**) Original (skewed by particle strikes), and (**c**) Area-Limited.

#### *2.7. Application of Area-Limiting: Uniform Soil*

This new area-limiting algorithm was experimentally applied to the data from uniform soil (LB) tests in order to gauge its effectiveness. As can be seen from Figure 15, this failed to improve the interpolation of the data as the soil response consists of a contiguous pressure wave of similar-sized particles without discrete particle strikes. This meant that the algorithm removed a large proportion of real data from the array and, therefore, it performed much more poorly than the original method of interpolation.

**Figure 15.** Test 15 (LB) pressure surfaces over a 40 μs period: (**a**) Original and (**b**) Area-Limited.

This mishandling of the data by the algorithm can be explained by calculating the radius of effect of an LB particle strike, which shows that area-limiting is not an appropriate way to represent a pressure wave of uniform particles. For LB test 15 at the HPB at xx50, *D*<sup>50</sup> = 0.8 mm (from Figure 1), the average wave speed at 50 mm radius was 769.8 m/s, and the maximum recorded pressure was 195.2 MPa.

$$\begin{split} r\_{\text{strike}} &= \sqrt{\frac{\frac{4}{3} \cdot \left(\frac{0.8 \times 10^{-3}}{2}\right)^{3} \cdot 2700 \cdot 769.8}{25 \times 10^{-6} \cdot 195.2 \times 10^{6}}} \\ &= 3.38 \times 10^{-4} \,\text{m} \\ &= 0.34 \,\text{mm} \end{split}$$

Constricting the pressure wave peak to only a 0.34 mm radius around each HPB results in the area-limited pressure array represented in Figure 15, which clearly ignores the actual progression of the wave, limiting it to just the background pressure wave outside of the 25 μs 'strike' period.

Therefore, it was determined that this area-limiting process should only be applied to data from soils where the *D*<sup>50</sup> particle size would result in a discrete impulse that forms a sizeable proportion of the global impulse experienced. A 1.4 g *D*<sup>50</sup> Stanag particle, at a typical 1120 m/s (from the 25 μs strike length established earlier), would (assuming particle plasticity) result in an impulse of 1.57 N·s. For a *D*<sup>50</sup> LB particle, with a mass of 0.00072 g, at 1120 m/s, the impulse would be 0.81 mN·s, nearly 2000 times less. Given that a global impulse on the order of hundreds of newton-seconds is to be expected from this testing, a single typical-particle strike (not accounting for the interpolation causing this error to spread) in Stanag soil could represent a >1% portion of the result, whereas, in LB, this would represent <0.001% (indicating that loading has to occur as part of a contiguous wave in LB soil).
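This particle-impulse comparison can be reproduced in a few lines (again an illustrative sketch, not the original analysis code):

```python
import math

def particle_impulse(D50, rho=2700, v=1120):
    """Impulse (N s) of a spherical D50 particle, assuming a plastic response."""
    return (4 / 3) * math.pi * (D50 / 2) ** 3 * rho * v

I_stanag = particle_impulse(10e-3)    # ~1.57 N s
I_lb = particle_impulse(0.8e-3)       # ~0.00081 N s (0.81 mN s)
print(I_stanag / I_lb)                # ~1953, i.e., nearly 2000 times less
```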

#### **3. Results and Discussion**

The results from the CoBL experimental setup, presented herein, correspond to an overburden (OB) of 28 mm and a stand-off distance (SOD) of 140 mm. A 78 g 3:1 cylinder of PE4 explosive was utilised, buried within either LB (a uniform sand with a moisture content of 25% and a bulk density of 1990 kg/m³) or Stanag (a well-graded sandy gravel with a moisture content of 14% and a bulk density of 2220 kg/m³), both of which were fully saturated. Full saturation was achieved using the method outlined in [13]. The Test IDs used correspond to those in [30] if numerical and denote previously unpublished data if alphabetical. The interpolation was carried out with an element size of 5 mm to ensure consistency between tests. It was found that increasing the mesh resolution had a minimal effect on the results.

#### *3.1. Wave-Expansion Velocity*

In order to validate the analytical methodology described previously, the wave-expansion velocities found in this study (used to calculate an area of effect of a particle strike) were compared against the wave speeds corresponding to arrival time data from work by Ehrgott [41]. In this work, Ehrgott used 2.27 kg C4 charges, buried in different soils with a 100 mm OB, and measured the TOA at gauges suspended 500 mm above the soil surface, at varied horizontal distances from the charge centre. This charge size is equivalent to three times the scale of the CoBL tests using Hopkinson–Cranz cube root scaling; as such, extrapolating the wave speed, on an exponential trendline, to a 300 mm radius from the centre should provide indicative values of expected wave speeds for CoBL data at 100 mm (with a CoBL-scaled OB of 33 mm and SOD of 167 mm); see Appendix A.
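For clarity, the cube-root scale factor implied by the two charge masses is roughly (the arithmetic is ours; the masses and distances are those quoted above):

$$\left(\frac{2270\ \text{g}}{78\ \text{g}}\right)^{1/3} \approx 3.1, \qquad \frac{300\ \text{mm}}{3.1} \approx 100\ \text{mm}, \qquad \frac{100\ \text{mm OB}}{3.1} \approx 33\ \text{mm}$$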

As can be seen in Table 1, for a poorly graded sandy soil, or a silty sand, wave speeds in the range 1028–1672 m/s are to be expected. Table 2 displays the wave-expansion velocities at 100 mm radius for each CoBL test, where a similar overburden (at an equivalent scale) in saturated Stanag soil results in speeds in the range 636–1050 m/s for a stand-off distance of 140 mm (compared with a 167 mm equivalent), and in saturated LB soil, 523–694 m/s. These velocities are somewhat lower than those found by Ehrgott [41], likely a consequence of the higher level of saturation causing an increased soil throw volume due to a higher level of detonation product containment (reducing the expansion velocity). This is corroborated by a similar analysis of CoBL data from low-moisture-content (2.45%) LB, which results in velocities in the range 901–1611 m/s, agreeing much more closely with the range derived from Ehrgott [41].

**Table 1.** Wave speeds calculated from TOA data from Ehrgott [41], extrapolated to 300 mm for equivalence with CoBL data.


**Table 2.** Wave speeds calculated from CoBL TOA data.


#### *3.2. CoBL Global Impulse*

3.2.1. Full Plate Integration (No Area Limiting)

The global impulse, calculated as an integration of pressure over time from a full-plate cubic interpolation between the HPBs, is displayed in Table 3. The effects of the signal drift can be seen in the difference in the global impulses between the 0.7 ms and 1.3 ms signal truncation times. It can be seen that the Stanag results tend to increase with increased truncation time (due to a large net positive drift). On the other hand, the LB tests show much less of an increase (due to a much-reduced net drift). This, if allowed to influence the results, would inflate values from the Stanag tests relative to the LB tests.

Considering the mean values of the impulse, it can be seen that the ratio of impulses for LB:Stanag is 1.00:1.29 for the 0.7 ms truncation and 1.00:1.42 for the 1.3 ms truncation (an effect of the higher positive drift), showing the possible effect of discrete particle strikes in skewing the interpolation, as it is expected that the saturated Stanag would have a lower value of impulse than saturated LB [17].

#### 3.2.2. Area-Limiting Well-Graded Stanag

Application of the area-limiting process to the Stanag data results in the values of the global impulse displayed in Table 4. These results are for a 0.7 ms truncation of each of the tests, to reduce the influence of signal drift.


**Table 3.** Global impulse results without area limiting for 0.7 ms and 1.3 ms truncations of signals from CoBL. \* Test 17 data show no difference between truncation times as it was only recorded up to 0.5 ms.

**Table 4.** Global impulse derived from integration of pressure over a 0.7 ms truncation time for an area-limited interpolation between HPBs (from CoBL).


The mean value of global impulse, when this method has been applied, results in a ratio of impulse for LB:Stanag of 1.00:0.88, much closer to the 1.00:0.84 found in FFM testing by Rigby et al. [17].

#### *3.3. Comparison to FFM*

The global impulse results can be compared between the CoBL and FFM experiments in order to validate the area-limiting method. If it is assumed that the LB total impulse results scale accurately, the ratio of *I*<sub>CoBL</sub>:*I*<sub>FFM</sub> from the mean LB values for each setup (1:40.53) can be used to derive an expected mean value of impulse for Stanag. For reference, geometric scaling alone, through projection of the FFM target plate to the CoBL SOD at the same Hopkinson–Cranz scale, results in a ratio of 1:30.07; however, this does not account for losses in the wave expansion through air. Utilising the LB-equivalency ratio results in an expected mean of 135.29 N·s for Stanag (only 0.3% more than the mean value of 134.89 N·s resulting from the area-limiting process). Given the number of assumptions made throughout this analysis, this level of agreement should not be taken entirely at face value; however, it indicates that accounting for PSD is viable when analysing the spatial and temporal distributions of loading from buried charges. These assumptions require further investigation in the future. Each of the FFM test results, scaled by the 40.53 scale factor, is displayed in Table 5.


**Table 5.** Global impulse data from FFM testing, with impulse values scaled by the LB-equivalency scale factor (40.53). Results marked with \* are from [28], with other data previously unpublished.

#### **4. Conclusions**

This study has built on existing techniques to understand the spatial and temporal distributions of loading from soils in explosive blasts. Improvements have been made to pressure signal processing, including the use of data smoothing, alongside an improved arrival-time-finding algorithm. Further, a proof-of-concept method of area-limiting pressure spikes from discrete particle strikes was established, validated against another experimental setup to achieve similitude between results. The CoBL data (at one-quarter scale) has been directly compared against FFM (at one-half scale), with the area-limiting approximation allowing for agreement in the total impulsive loading for both well-graded and uniform soils.

From this, it can be understood that, in well-graded soil, the spatial distribution of loading is strongly affected by discrete particle strikes, with high-pressure strikes occurring over limited regions of a target. This leads to a lower level of global impulse than previously derived (thus bringing data from the CoBL experiment in line with expectations from FFM). This engenders a new understanding that the particle size distribution of a soil has not only a global effect on loading but also a discrete, localised effect when larger particles are present, with major implications for the design of protective structures and materials due to the presence of 'pockets' of much higher pressure (and thus specific impulse) within the overall wave.

**Author Contributions:** Conceptualization, S.C.; methodology, R.W.; formal analysis, R.W. and S.R.; writing, original draft preparation, R.W.; writing, review and editing, S.C., S.R., M.G. and I.E.; supervision, S.C. and A.T.; experimental management, A.T.; funding acquisition, M.G. and I.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by a University of Sheffield Faculty Prize Scholarship. The original experimental work was funded by the Defence Science and Technology Laboratory under contract DSTLX-1000059883.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** The authors would like to recognise the work of the technical support staff at Blastech Ltd. without whom we would have never been able to have such excellent datasets to work with.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Time of Arrival Data from Ehrgott [41]**



Arrival times for the soils most similar to those in use in this study were extracted from [41], converted to wave speeds, and then plotted with an exponential fit to find expected wave speeds at a CoBL-equivalent scale. The wave speeds found in this study have been plotted here also to allow for comparison.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Drone Detection Using YOLOv5**

**Burchan Aydin 1,\* and Subroto Singha <sup>2</sup>**


**Abstract:** The rapidly increasing number of drones in the national airspace, including those for recreational and commercial applications, has raised concerns regarding misuse. Autonomous drone detection systems offer a promising solution to the issue of potential drone misuse, such as drug smuggling, violations of privacy, etc. Detecting drones can be difficult, due to similar objects in the sky, such as airplanes and birds. In addition, automated drone detection systems need to be trained with ample amounts of data to provide high accuracy. Real-time detection is also necessary, but this requires highly configured devices, such as a graphical processing unit (GPU). The present study sought to overcome these challenges by proposing a one-stage detector, You Only Look Once version 5 (YOLOv5), trained using pre-trained weights and data augmentation. The trained model was evaluated using mean average precision (mAP) and recall measures. The model achieved a 90.40% mAP, a 21.57% improvement over our previous model, which used You Only Look Once version 4 (YOLOv4) and was tested on the same dataset.

**Keywords:** YOLOv5; autonomous drone detection; image recognition; machine learning; mAP; unmanned aerial vehicle (UAV)

#### **1. Introduction**

Drones are becoming increasingly popular. Most are inexpensive, flexible, and lightweight [1]. They are utilized in a variety of industries, including the military, construction, agriculture, real estate, manufacturing, photogrammetry, sports, and photography [2,3]. There were 865,505 drones registered as of 3 October 2022, with 538,172 of them being recreational [4]. Drones can take off and land autonomously, intelligently adapt to any environment, fly to great heights, and provide quick hovering ability and flexibility [5]. Increased usage of drones, on the other hand, poses a threat to public safety; for example, their capacity to carry explosives may be used to strike public locations, such as governmental and historical monuments [6]. Drones can also be used by drug smugglers and terrorists. Moreover, the increasing number of hobbyist drone pilots could result in interference with activities, such as firefighting, disaster response efforts, and so on [7]. A list of threats that drones currently pose and a discussion of how drones are being weaponized are offered in [8]. For instance, in April 2021, two police officers in Aguililla, Michoacan, Mexico were assaulted by drones *artillados* carrying explosive devices, resulting in multiple injuries [9]. Thirteen tiny drones attacked Russian soldiers in Syria, causing substantial damage [10]. Considering the possibility of drones being used as lethal weapons [11], authorities shut down the London Gatwick airport for 18 hours due to serious drone intrusion, causing 760 flights with over 120,000 people to be delayed [12].

Detecting drones may be difficult due to the presence of similar objects in the sky, such as aircraft, birds, and so forth. The authors of [13] used a dataset made up of drones and birds. To create the dataset, they gathered drone and bird videos and extracted images using the MATLAB image processing tool. After gathering 712 photos to train the algorithms, they utilized an 80:20 train:test split to randomly choose the training and testing images.

**Citation:** Aydin, B.; Singha, S. Drone Detection Using YOLOv5. *Eng* **2023**, *4*, 416–433. https://doi.org/ 10.3390/eng4010025

Academic Editor: Antonio Gil Bravo

Received: 30 November 2022 Revised: 24 January 2023 Accepted: 28 January 2023 Published: 1 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

They examined the accuracies of three different object detectors utilizing an Intel Core i5-4200M (2.5 GHz) CPU, 2 GB of DDR3L memory, and a 1 TB HDD, reaching 93%, 88%, and 80% accuracy using the CNN, SVM, and KNN, respectively. The suggested technique included drone-like objects, i.e., birds, in the dataset; however, it required 14 minutes and 28 seconds to attain 93% accuracy over just 80 epochs using the CNN methodology. As a result, their proposed approach was not feasible for real-time implementation.

Our previously proposed technique using a fine-tuned YOLOv4 [14] overcame the speed, accuracy, and model overfitting issues. In that study, we collected 2395 images of birds and drones from public sources, such as Google, Kaggle, and others. We labeled the images and divided them into two categories: drones and birds. The YOLOv4 model was then trained on a Tesla K80 GPU using the Google deep learning VM. To test the detection speed, we recorded two videos of our own drones at three different heights. The trained model obtained FPS values of 20.5 and 19.0, and the mAP was 74.36%. In terms of speed and accuracy, YOLOv5 surpasses prior versions of YOLO [1]. In this study, we compared the performance gain from using a fine-tuned YOLOv5 on the same dataset used in [14] for drone detection with a fine-tuned YOLOv4. YOLOv5 has recently demonstrated improved performance in identifying drones. The authors of [1] presented a method for detecting drones flying in prohibited or restricted zones. Their deep learning-based technique outperformed earlier deep learning-based methodologies in terms of precision and recall.

Our key contributions in this study were the addition of a data augmentation technique to artificially overcome data scarcity, the prevention of overfitting using a random 70:30 train:test split, the fine-tuning of the original YOLOv5 on our collected, customized dataset, and the testing of the model on a wide variety of backgrounds (dark, sunny) and on different views of images. The model was tested on our own videos using two drones (a DJI Mavic Pro and a DJI Phantom); the videos were taken at three common altitudes: 60 ft, 40 ft, and 20 ft.

#### *Paper Organization*

The rest of the research study is structured as follows. Section 2 provides background for our research. Section 3 addresses the research materials and methodologies. Section 4 covers the findings of this study. Section 5 discusses the model's complexity and uncertainty. Section 6 depicts the performance improvement and gives an argumentative discussion. Section 7 brings our paper to a conclusion.

#### **2. Background**

In the past, various techniques, such as radar, were used to detect drones [15]. However, it is very difficult for radar to do so, due to the low levels of electromagnetic signals that drones transmit [16]. Similarly, other techniques, such as acoustic and radio frequency-based drone detection, are costly and inaccurate [17]. Recently, machine learning-based drone detectors, such as SVM and artificial neural network classifiers, have been used to detect drones, achieving better success than radar and acoustic drone detection systems [18]. The YOLO algorithm has outperformed competitor algorithms, such as the R-CNN and SSD algorithms, due to its complex feature-learning capability with fast detection [18]. In fact, the YOLO algorithm is now instrumental in object detection tasks [19]. Many computer vision tasks use YOLO due to its faster detection with high accuracy, which makes the algorithm feasible for real-time implementation [20]. One of the latest developments, YOLOv5, has greatly improved the algorithm's performance, offering a 90% improvement over YOLOv4 [21]. In the present research, we used YOLOv5 to build an automated drone detection system and compared the results against our previous system, which used YOLOv4.

UAV detection systems are designed using various techniques. We have reviewed only those studies closely related to our methodology. UAV detection can be treated as an object detection problem in deep learning. Deep learning-based object detection techniques can be divided into one-stage and two-stage detection algorithms [22]. An example of a two-stage object detection technique is R-CNN [23]; examples of one-stage object detection techniques are YOLO [24], SSD [25], etc. The authors of [26] explained the mechanism of how object detectors work in general. Two-stage detectors use candidate object techniques, while one-stage detectors employ the sliding window technique. Thus, one-stage detectors are fast and operate in real-time [27]. YOLO is easy to train, faster and more accurate than its competitors, and can train on an entire image directly. Thus, YOLO is the most frequently used and reliable object detection algorithm [28]. It first divides an image into S×S grids and assigns a class probability with bounding boxes around the object [28]. It then uses a single convolutional network to perform the entire prediction. Conversely, R-CNNs begin by generating a large number of region proposals using a selective search method. Then, from each region proposal, a CNN is utilized to extract features. Finally, the R-CNN classifies and defines bounding boxes for distinct classes [28].

The authors of [28] used YOLOv2 to detect drones and birds, and achieved precision and recall scores above 90%. The authors of [27] proposed a drone detection pipeline with three different models: faster R-CNN with ResNet-101, faster R-CNN with Inception-v2, and SSD. After 60,000 iterations, they achieved mAP values of 0.49, 0.35, and 0.15, respectively. One example of an SSD object detector is MobileNet. MobileNetV2 was used as a classifier in [29]; the authors proposed a drone detection model whose methodology consisted of a moving object detector and a drone-bird-background classifier. The researchers trained the drone-vs-bird challenge dataset on an NVIDIA GeForce GT 1030 2 GB GPU with a learning rate of 0.05. At an IoU of 0.5, their highest precision, recall, and F1 scores were 0.786, 0.910, and 0.801, respectively, after testing on three videos. The authors of [30] used YOLOv3 to detect and classify drones, collecting different types of drone images from the internet and from videos to build a dataset. Images were annotated in the YOLO format in order to train a YOLOv3 model. An NVIDIA GeForce GTX 1050 Ti GPU was used to train the dataset with chosen parameter values, such as a learning rate of 0.0001, a batch size of 64, and 150 total epochs. The best mAP value was 0.74. PyTorch, an open-source machine learning framework, was used to train and test the YOLOv3 model.

The authors of [31] used YOLOv4 to automatically detect drones in order to integrate a trained model into a CCTV camera, thus reducing the need for manual monitoring. The authors collected their dataset from public resources such as Google images, opensource websites, etc. The images were converted into the YOLO format using free and paid image annotation tools. They fine-tuned the YOLOv4 architecture by customizing filters, max batches, subdivisions, batches, etc. After training the YOLOv4 model for 1300 iterations, the researchers achieved a mAP of 0.99. Though their mAP value was very high, they trained only 53 images and did not address model overfitting, resulting in a greater improvement scope.

The authors of [1] presented an approach based on YOLOv5. They utilized a dataset of 1359 drone images obtained from Kaggle. They fine-tuned the model on a local system with an 8 GB NVIDIA RTX 2070 GPU, 16 GB of RAM, and a 1.9 GHz CPU. They employed a 60:20:20 split of the dataset for training, testing, and validation. They trained the model on top of COCO pre-trained weights and obtained a precision of 94.70%, a recall of 92.50%, and a mAP of 94.1%.

#### **3. Materials and Methods**

In this research, we employed a recent version of the YOLO algorithm: YOLOv5 [32]. YOLOv5 is a high-performing and fast object detection algorithm that detects objects in real-time. Drones can fly at high speeds; thus, the detection speed also needs to be high. YOLOv5 has the ability to meet this requirement. The algorithm was developed using PyTorch, an open-source deep learning framework that has made training and testing on customized datasets easier and offers outstanding detection performance. YOLOv5 consists of three parts: the backbone, neck, and head [1].
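For orientation, YOLOv5 can be loaded with its published COCO-pretrained weights through PyTorch Hub in a few lines; the sketch below (using the example image from the YOLOv5 repository) illustrates the inference interface before the three parts are described in turn.

```python
import torch

# Load the small YOLOv5 variant with COCO-pretrained weights via PyTorch Hub.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference; the results object holds boxes, classes, and confidences.
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()
```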

The backbone is made of a CSPNet. The CSPNet reduces the model's complexity, resulting in fewer parameters and FLOPs. At the same time, it resolves the vanishing and exploding gradient issues caused by the depth of the neural network. These improvements enhance inference speed and accuracy in object detection. Inside the CSPNet, there are several convolutional layers, four CSP bottlenecks with three convolutions, and spatial pyramid pooling. The CSPNet is responsible for extracting features from an input image and using convolutions and pooling to form a feature map that combines all extracted features. Thus, the backbone plays the role of feature extractor in YOLOv5.

The middle part of YOLOv5, often called the neck, is also known as the PANet. The PANet takes all the extracted features from the backbone and saves and sends them to the deep layers in order to perform feature fusions. These feature fusions are passed to the head so that high-level features are known to the output layer for final object detection.

The head of YOLOv5 is responsible for object detection. It consists of 1×1 convolutions that predict the class of an object, with bounding boxes around the target object and a class probability score. Figure 1 shows the overall architecture of YOLOv5.

**Figure 1.** The YOLOv5 architecture.

The location of the bounding box is calculated using Equation (1):

$$\cup_x^y = P_{x,y} \cdot \mathrm{IoU}_{\text{predicted}}^{\text{ground truth}} \tag{1}$$

In Equation (1), *x* and *y* index the *x*th grid cell and its *y*th bounding box. $\cup_x^y$ is the probability score for the *y*th bounding box of the *x*th grid. $P_{x,y}$ equals 1 when there is a target in the *y*th bounding box and 0 when there is not. $\mathrm{IoU}_{\text{predicted}}^{\text{ground truth}}$ is the IoU between the ground truth and the predicted box. Higher IoUs mean more accurately predicted bounding boxes.

The loss function of YOLOv5 is the combination of loss functions for the bounding box, classification, and confidence. Equation (2) represents the overall loss function of YOLOv5 [32]:

$$\mathrm{loss}_{YOLOv5} = \mathrm{loss}_{\text{bounding box}} + \mathrm{loss}_{\text{classification}} + \mathrm{loss}_{\text{confidence}} \tag{2}$$

$\mathrm{loss}_{\text{bounding box}}$ is calculated using Equation (3):

$$\mathrm{loss}_{\text{bounding box}} = \lambda_{\text{if}} \sum_{a=0}^{b^2} \sum_{c=0}^{d} E_{a,c}^{\text{obj}} \left(2 - w_a \times h_a\right) \left[\left(x_a - \hat{x}_a\right)^2 + \left(y_a - \hat{y}_a\right)^2 + \left(w_a - \hat{w}_a\right)^2 + \left(h_a - \hat{h}_a\right)^2\right] \tag{3}$$

In Equation (3), the width and height of the target object are denoted using *h* and *w*, while $x_a$ and $y_a$ indicate the coordinates of the target object in an image; hatted terms denote the corresponding predictions, and $E_{a,c}^{\text{obj}}$ indicates whether the *c*th bounding box of the *a*th grid cell contains an object. Finally, the indicator function ($\lambda_{\text{if}}$) shows whether the bounding box contains the target object.

$\mathrm{loss}_{\text{classification}}$ is calculated using Equation (4):

$$\mathrm{loss}_{\text{classification}} = \lambda_{\text{classification}} \sum_{a=0}^{b^2} \sum_{c=0}^{d} E_{a,c}^{\text{obj}} \sum_{\hat{c} \in c_l} p_a(\hat{c}) \log\left(\hat{p}_a(\hat{c})\right) \tag{4}$$

$\mathrm{loss}_{\text{confidence}}$ is calculated using Equation (5):

$$\mathrm{loss}_{\text{confidence}} = \lambda_{\text{confidence}} \sum_{a=0}^{b^2} \sum_{c=0}^{d} E_{a,c}^{\text{noobj}} \left(c_a - \hat{c}_a\right)^2 + \lambda_{\text{obj}} \sum_{a=0}^{b^2} \sum_{c=0}^{d} E_{a,c}^{\text{obj}} \left(c_a - \hat{c}_a\right)^2 \tag{5}$$

In Equations (4) and (5), $\lambda_{\text{confidence}}$ denotes the confidence loss coefficient, $\lambda_{\text{classification}}$ the classification loss coefficient, $c_l$ the set of classes, and *c* the confidence score; hatted terms again denote predictions.

#### *Construction of the Experiment and Data Acquisition*

We collected drone and bird images from public resources such as Google, Kaggle, Flickr, Instagram, etc. The drone images came from different altitudes, angles, backgrounds, and views, ensuring variability in the dataset. The bird images covered 300 different species. The entire dataset was formed from 479 bird images and 1916 drone images; altogether, the dataset consisted of 2395 images. We used a 70:30 train:test split to train and test the YOLOv5 model. The training dataset had 1677 images and the testing dataset had 718 images. We used data augmentation techniques to overcome data scarcity; in fact, using three variants of data augmentation, we generated a total of 5749 images. Using a freely available labeling tool, we annotated the images and divided them into two classes: drone images were annotated as class 1 and bird images as class 0. The YOLO implementation requires that the annotations for each image be saved in a .txt file containing, for each object, the class index (0 or 1) followed by four bounding-box coordinates.
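To illustrate the label format, the following minimal parser reads such a YOLO-format .txt file (the file name is hypothetical, and the coordinate convention of normalized centre, width, and height is YOLO's standard one):

```python
CLASS_NAMES = {0: "bird", 1: "drone"}

def read_yolo_labels(path):
    """Parse a YOLO-format label file: one object per line,
    'class x_center y_center width height', normalized to [0, 1]."""
    objects = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            objects.append((CLASS_NAMES[int(cls)],
                            float(xc), float(yc), float(w), float(h)))
    return objects

# objects = read_yolo_labels("drone_0001.txt")  # hypothetical label file
```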

We collected two videos of drones flying, using our own two drones: a DJI Mavic Pro and DJI Phantom III. We captured video shots at three different altitudes: 60 feet, 40 feet, and 20 feet. These are altitudes commonly used by drone pilots, especially drone hobbyists. At 60 feet, the drone looked almost like a bird. We captured the videos to evaluate the performance of the YOLOv5 model in terms of accuracy and speed, mainly at high altitudes.

We conducted the experiment using Google CoLab, a free cloud notebook in which we wrote the code, to implement YOLOv5. We fine-tuned the original YOLOv5 to train and test the model using our customized dataset. To accelerate training and improve detection accuracy, we used a transfer learning technique. We employed the weights that were already available with the original YOLOv5 to implement transfer learning. We trained our customized model on top of the YOLOv5s.pt weights that were saved while training YOLOv5 on the COCO dataset. The original YOLOv5 was implemented using PyTorch, and we also chose PyTorch. At the time we trained our model, Google CoLab allocated an NVIDIA Tesla T4 GPU with 15,110 MiB of memory. To fine-tune YOLOv5, we chose the hyperparameter values suggested in the original implementation. We used a learning rate of 0.01, a momentum of 0.937, and a decay of 0.0005. The model was optimized using stochastic gradient descent. The albumentations were Blur (*p* = 0.01, blur\_limit = (3,7)), MedianBlur (*p* = 0.01, blur\_limit = (3,7)), ToGray (*p* = 0.01), and CLAHE (*p* = 0.01, clip\_limit = (1,4.0), tile\_grid\_size = (8,8)).
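A training run of this kind is typically launched from a clone of the YOLOv5 repository with a command along the following lines; the image size, batch size, and epoch count shown here are illustrative assumptions, while the learning rate, momentum, and decay quoted above match the repository's defaults:

```python
import subprocess

# Fine-tune YOLOv5s on the custom two-class dataset (run from a clone of
# the ultralytics/yolov5 repository; paths and epoch count are illustrative).
subprocess.run([
    "python", "train.py",
    "--img", "640",
    "--batch", "16",
    "--epochs", "300",
    "--data", "drone_bird.yaml",   # hypothetical dataset config with nc: 2
    "--weights", "yolov5s.pt",     # COCO-pretrained starting weights
], check=True)
```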

We changed the number of classes from 80 to 2, since we had two classes: drone and bird. The model had 214 layers, with 7,025,023 parameters, 7,025,023 gradients, and 16.0 GFLOPs. We used the Roboflow API to load the data and perform the data augmentation and preprocessing. We used the same dataset employed in our previous experiment. We performed auto-orient and modified classes as data preprocessing techniques. Auto-orient is an image processing technique that ensures that images match the source device orientation; sometimes, the coordinates from various cameras may confuse (x,y) and (y,x), and auto-orient prevents such bad data from being fed into YOLOv5. In addition to data preprocessing, we performed data augmentation using the Roboflow API, which helped reduce the data shortage issue. We set the parameters as follows: flip: horizontal; hue: between −25 degrees and +25 degrees; cutout: 3 boxes with 10% size each; and mosaic: 1. Data augmentation ensured data variability, artificially generating a total of 5749 images, which were randomly split into a 70:30 train:test split. Figure 2 shows the augmented dataset. We trained the model for 4000 iterations and saved the best weights to test the model using the testing images and videos. During training, we used %tensorboard to log the runs, which auto-generated the learning curves to evaluate the model's performance beyond the evaluation metrics. Figure 3 shows a flowchart of the overall conducted experiment.

**Figure 2.** Augmented dataset.

**Figure 3.** Overall conducted experiment flowchart.

#### **4. Results**

We evaluated the trained model using the mAP, precision, recall, and F1-scores. We used FPS as the evaluation metric to evaluate the speed of detection in the videos. Table 1 shows the mAP, precision, recall, and F1-scores. The model was evaluated on a testing dataset from a random train:test split. The testing images had data variability in terms of different backgrounds (e.g., bright, dark, blurred, etc.) and weather conditions (e.g., cloudy, sunny, foggy, etc.), as well as images with multiple classes. To track the evaluation metrics, we plotted the values across iterations. Figure 4 shows the overall training summary of the model. The loss curves indicate a downward trend, meaning that the losses were minimized during training, both for training and validation. The metrics curves show upward trends, meaning the performance of the model improved over the iterations during training. We plotted the precision-recall curve to evaluate the model's prediction preciseness (see Figure 5). The curve tended towards the top right corner, meaning that the values were mostly close to one (i.e., the rate of misclassification was very low when using this model).


**Figure 4.** Overall summary of training.

**Figure 5.** Precision-recall curve.

Finally, we show the model's evaluation metrics in Table 1, which offers an overall summary of the results regarding the trained model's performance on the testing images. We achieved precision, recall, and mAP50 values of 0.918, 0.873, and 0.904 for all images, respectively. In addition, we calculated individual precision, recall, and mAP50 values for each of the two classes (see Table 1). Figure 6 shows the drones predicted by the model using randomly chosen testing images. Figures 7–12 show the predictions with bounding boxes and class scores at three different altitudes (20 ft, 40 ft, and 60 ft) in videos using two different types of drones (the DJI Mavic Pro and DJI Phantom III, Da-Jiang Innovations, Shenzhen, China).

**Figure 6.** Drone predictions for test images.

**Figure 7.** At 20 ft, DJI Mavic Pro.

**Figure 8.** At 40 ft, DJI Mavic Pro.

**Figure 9.** At 60 ft, DJI Mavic Pro.

**Figure 10.** At 20 ft, DJI Phantom III.

**Figure 11.** At 40 ft, DJI Phantom III.

**Figure 12.** At 60 ft, DJI Phantom III.

Appendix A contains more predictions from the trained YOLOv5 model. While the model worked well on the majority of the test images, there were a few instances of misclassification. Figures A7 and A8 show two such cases, in which the model misidentified certain drone-like objects as drones alongside otherwise correct predictions. Blurred photos might be one cause of such misclassification. This problem could be addressed by employing more training photos, which is outside the scope of this study. The prediction confidence scores for these misclassifications were poor, hovering around 10%, so a confidence score threshold could be established to suppress them while the number of training images is increased. Only a few test images contained the "bird" class. Figure A3 depicts an example of correct "drone" and "bird" predictions. However, because of the rarity of both classes appearing in a single video frame, we were unable to perform combined drone and bird detection in videos.

#### **5. Model Complexity and Parameter Uncertainty**

To achieve quicker predictions, we employed YOLOv5, which relies mainly on GPU implementation; this reliance complicates CPU deployments. Data augmentation techniques, such as rotation and flipping, were used to artificially supplement the dataset for improved training and performance. The parameter uncertainty in our experiment included sampling errors, overfitting, and so forth. Too many samples from one class may create sampling error, whereas training a smaller number of images with a high parameter count may result in overfitting. We used pre-trained model weights that were trained on the COCO dataset, and we trained our fine-tuned model on top of the pre-trained weights.

#### **6. Discussion**

Using deep learning for the detection of drones has become a common topic in the research community, due to the substantial importance of restricting drones in unauthorized regions; however, improvement is still needed. The authors of [30] proposed a drone detection methodology using deep learning, employing YOLOv3 to detect and classify drones. More than 10,000 images spanning different categories of drones were used to train the algorithm, and a mAP of 0.74 was achieved at the 150th epoch. Though they used a YOLO-based approach, their study did not test the model using videos, different weather conditions, or varied backgrounds; most importantly, they did not test their model using images of drone-like objects. The authors of [33] used deep learning-based techniques and Faster R-CNN on a dataset created from videos collected by the researchers. The following image augmentation techniques were employed: geometric transformation, illumination variation, and image quality variation. The researchers did not calculate mAP values and instead plotted a precision-recall (AUC) curve to evaluate the performance. Using a synthetic dataset, their model achieved an overall AUC score of 0.93; for a real-world dataset, it achieved an overall AUC score of 0.58. The dataset was trimmed from video sequences and thus contained no objects much of the time. In our previous research, we analyzed the performance of our proposed methodology using YOLOv4 and showed that it outperformed existing methodologies in terms of mAP, precision, recall, and F-1 scores. Using YOLOv4, we were able to achieve a mAP of 0.7436, a precision of 0.95, and a recall of 0.68. Most importantly, we included another evaluation metric, FPS, achieving an average of 20.5 FPS for the DJI Phantom III videos and 19.0 FPS for the DJI Mavic Pro videos, all at three different altitudes (i.e., 20 ft, 40 ft, and 60 ft). We tested the model using a highly variable dataset with different backgrounds (e.g., sunny, cloudy, dark, etc.), various drone angles (e.g., side view, top view, etc.), long-range drone images, and multiple objects in a single image. Our previous methodology achieved such an improvement due to the real-time detection capability of YOLOv4 acting as a single-stage detection process and the various new features of YOLOv4 (e.g., CSP, CmBN, mish activation, etc.), which sped up detection. Furthermore, the default MOSAIC = 1 flag automatically performed the data augmentation. In this research, we employed Google CoLab and Google Deep Learning VM for parts of the training and testing. In addition to YOLOv4, YOLOv5 showed performance improvement, as shown in [1]. They obtained a precision of 0.9470, a recall of 0.9250, and a mAP of 0.9410. Although their evaluation metrics were higher than ours, our dataset was bigger. Furthermore, we had binary classes, whereas they had only a "drone" class. They did not employ data augmentation, whereas we used a data augmentation technique to build a collection of over 5700 images. As a result of the variability in the dataset and the addition of new classes with data augmentation, our suggested technique is resilient and scalable in real-world scenarios.

Our results for the present research outperformed our previous methodology, achieving a mAP of 0.904. Because of its lightweight design, YOLOv5 recognized objects faster than YOLOv4. YOLOv4 was created using the darknet architecture, whereas YOLOv5 is built with a PyTorch framework rather than a darknet framework. This is one of the reasons we obtained more accuracy and speed than earlier methodologies. In addition to the architecture itself, we fine-tuned the last layers of the original YOLOv5 architecture so that it performed better on our customized dataset. Beyond the layer tuning, we customized the default values of the learning rate, momentum, batch size, etc. We trained the model for 100 iterations since we trained the custom dataset on top of the transferred weights for the COCO dataset. In addition to mAP, we achieved a precision of 0.918, a recall of 0.875, and an F-1 score of 0.896. In terms of F-1 score and recall, we also outperformed the previous model. We further tested the new model on two videos, using a Tesla T4 GPU. For the DJI Mavic Pro, we achieved a maximum of 23.9 FPS, and for the DJI Phantom III, a maximum of 31.2 FPS. Thus, in terms of inference speed, we also outperformed the previous model's performance. We achieved this improvement due to the new features included in YOLOv5, such as the CSPDarknet53 backbone, which resolved the gradient issue using fewer parameters and is thus more lightweight. Other helpful additions included the fine-tuning of YOLOv5 for our custom dataset, the data augmentation performed to artificially increase the number of images, and the data preprocessing that made training the model smoother and faster. The F1 score is the harmonic mean of precision and recall. Precision is the accuracy of positive class predictions, whereas recall is the proportion of true positive classes detected. The greater the F1 score, the better the model in general. The correctness of bounding boxes on objects is measured by mAP, and the greater the value, the better. The speed of object detection is measured in frames per second (FPS). Table 2 compares the performance of the previous and proposed models in terms of five evaluation metrics: precision, recall, F1 score, mAP50, and FPS.
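As a check, the reported F-1 score follows directly from the harmonic mean of the reported precision and recall:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.918 \times 0.875}{0.918 + 0.875} \approx 0.896$$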


**Table 2.** Comparison between previous and proposed models' performance.

#### **7. Conclusions**

In this research, we compared the performance of one of the latest versions of YOLO, YOLOv5, to our previously proposed drone detection methodology that used YOLOv4. To make a fair comparison, we employed the same dataset and the same computing configurations (e.g., GPU). We first fine-tuned the original YOLOv5 for our customized dataset, which had two classes: bird and drone. We further tuned the values of the hyperparameters (e.g., learning rate, momentum, and decay) to improve the detection accuracy. In order to speed up the training, we used transfer learning, implementing the pre-trained weights provided with the original YOLOv5. The weights were trained on a popular and commonly used dataset called MS COCO. To address data scarcity and overfitting issues, we used data augmentation via the Roboflow API and included data preprocessing techniques to smoothly train the model. To evaluate the model's performance, we calculated the evaluation metrics on a testing dataset. We used precision, recall, F-1 score, and mAP, achieving values of 0.918, 0.875, 0.896, and 0.904, respectively. We outperformed the previous model's performance by achieving higher recall, F-1 score, and mAP values (a 21.57% improvement in mAP). Furthermore, we tested the speed of detection on videos of two different drone models, the DJI Mavic Pro and the DJI Phantom III. We achieved maximum FPS values of 23.9 and 31.2, respectively, using an NVIDIA Tesla T4 GPU. The videos were taken at three altitudes (20 ft, 40 ft, and 60 ft) to test the capability of the detector for objects at high altitudes. In future work, we will use different versions of YOLO and larger datasets. In addition, other object detection algorithms will be included to compare performance. Various drone-like objects, such as airplanes, will be added as classes alongside birds to improve the model's ability to distinguish among similar objects.

**Author Contributions:** Conceptualization, S.S. and B.A.; methodology, B.A. and S.S.; software, S.S.; validation, B.A. and S.S.; formal analysis, S.S. and B.A.; investigation, B.A. and S.S.; resources, S.S.; data curation, S.S.; writing—original draft preparation, S.S.; writing—review and editing, B.A.; visualization, S.S.; supervision, B.A.; project administration, S.S. and B.A.; funding acquisition, B.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.

**Conflicts of Interest:** We declare that there is no conflict of interest.

#### **Abbreviations**



#### **Appendix A**

In a variety of photos, our classifier effectively identified drone and bird objects. We evaluated images with intricate backgrounds and various climatic conditions. The detection results are presented here, with the images displayed together with their corresponding class names and class probabilities. YOLOv5 generated the predictions in batches; thus, the predictions are shown together in single figures. Additionally, we tested images that contained both a "drone" and a "bird" in one image. In the augmented training images, 0 refers to "bird" and 1 refers to "drone".

**Figure A1.** First batch prediction by YOLOv5.

**Figure A2.** Second batch prediction by YOLOv5.

**Figure A3.** Bird and drone in images predicted by YOLOv5.

**Figure A4.** First batch of augmented training image predicted by YOLOv5.

**Figure A5.** Second batch of augmented training image predicted by YOLOv5.

**Figure A6.** Third batch of augmented training image predicted by YOLOv5.

**Figure A7.** Instance 1 of a misclassified image predicted by YOLOv5.

**Figure A8.** Instance 2 of a misclassified image predicted by YOLOv5.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms**

**Shihan Ma and Jidong J. Yang \***

School of Environmental, Civil, Agricultural & Mechanical Engineering, University of Georgia, Athens, GA 30602, USA

**\*** Correspondence: jidong.yang@uga.edu

**Abstract:** This paper introduces a novel approach that leverages features learned from both supervised and self-supervised paradigms to improve image classification tasks, specifically vehicle classification. Two state-of-the-art self-supervised learning methods, DINO and data2vec, were evaluated and compared for their representation learning of vehicle images. The former contrasts local and global views, while the latter uses masked prediction on multiple layered representations. In addition, supervised learning is employed to finetune a pretrained YOLOR object detector for detecting vehicle wheels, from which definitive wheel positional features are retrieved. The representations learned from these self-supervised learning methods were combined with the wheel positional features for the vehicle classification task. In particular, a random wheel-masking strategy was utilized to finetune the previously learned representations in harmony with the wheel positional features during the training of the classifier. Our experiments show that the data2vec-distilled representations, which are consistent with our wheel-masking strategy, outperformed the DINO counterpart, resulting in a Top-1 classification accuracy of 97.2% for classifying the 13 vehicle classes defined by the Federal Highway Administration.

**Keywords:** vehicle classification; vision transformer; self-supervised learning; supervised learning; object detection

**Citation:** Ma, S.; Yang, J.J. Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms. *Eng* **2023**, *4*, 444–456. https://doi.org/ 10.3390/eng4010027

Academic Editor: Antonio Gil Bravo

Received: 29 November 2022 Revised: 22 January 2023 Accepted: 30 January 2023 Published: 1 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Vehicle classification is crucial information for highway infrastructure planning and design. In practice, a large number of sensors have been installed in state highway networks to collect vehicle information, such as the weight, speed, class, and count of vehicles [1]. Many studies have been conducted to classify vehicle types based on sensor data. For example, Wu et al. (2019) used roadside LiDAR data for vehicle classification [2]. The study evaluated traditional machine learning methods (e.g., naive Bayes, k-nearest neighbors (KNN), random forest (RF), and support vector machine) for classifying eight vehicle categories and achieved a best accuracy of 91.98%. In another study, Sarikan et al. (2017) employed KNN and decision trees for automated vehicle classification, where the inputs were features extracted from vehicle images. The method could distinguish all sedans and motorcycles in the test dataset [3]. Recent developments in vision-based deep learning, inspired by AlexNet [4], have made image-based vehicle classification a popular approach and continue to elevate the image classification benchmark. Zhou et al. (2016) demonstrated a 99.5% accuracy in distinguishing cars and vans and a 97.36% accuracy in distinguishing among sedans, vans, and taxis [5]. Similarly, Han et al. used YOLOv2 to extract vehicle images from videos and applied an autoencoder-based, layer-wise unsupervised pretraining to a convolutional neural network (CNN) for classifying motorcycles, transporter vehicles, passenger vehicles, and others [6]. ResNet-based vehicle classification and localization methods were developed using real traffic surveillance recordings containing 11 vehicle categories, obtaining a 97.95% classification accuracy and a 79.24% mean average precision (mAP) for the vehicle localization task [7]. To ensure the robustness of the models against weather and illumination variation, Butt et al. expanded a large dataset with six common vehicle classes considering adverse illuminous conditions and used it to finetune several pretrained CNN models (AlexNet, GoogleNet, Inception-v3, VGG, and ResNet) [8]. Among those, the finetuned ResNet was able to achieve 99.68% test accuracy. Regardless of the recent success in image-based vehicle classification, most of the models have been developed based on common vehicle categories that are not consistent with the vehicle classes established for engineering practice, such as the Federal Highway Administration (FHWA) vehicle classification, which defines 13 vehicle classes [9] with key axle information, as summarized in Table 1. The vehicle class details with illustrative pictures can be found in [10].


**Table 1.** FHWA vehicle classification definitions.

Nonetheless, several vehicle classification studies have been conducted with respect to FHWA vehicle classes by focusing on truck classes. Given the detailed axle-based classification rules established by the FHWA, many researchers have used them explicitly for truck classification [11]. In statewide practice, weigh-in-motion (WIM) systems and advanced inductive loop detectors are typically utilized to collect data for truck classification based on the FHWA definition, from which high correct classification rates have been reported for both single-unit trucks and multi-unit trucks [12]. As mentioned previously, besides the traditional sensing technologies, vision-based models have recently been applied in truck classification. Similarly to [5], YOLO was adopted for truck detection. Then, CNNs were used to extract features of the truck components, such as truck size, trailers, and wheels, followed by decision trees to classify the trucks into three groups [13]. This work was continued by further introducing three discriminating features (shape, texture, and semantic information) to better identify the trailer types [14].

As noted in [14], some of the classes in the FHWA scheme only have subtle differences, and deep learning models have potential overfitting issues with their imbalanced datasets. It remains a challenge for vision-based models to successfully classify all 13 FHWA vehicle classes. The objective of this study is to leverage general representations distilled from the state-of-the-art self-supervised methods (DINO [15] and data2vec [16]) as well as specific wheel positional features extracted by YOLOR [17] to improve vehicle classification. Our results show that vehicle representations are the primary features for classifying different vehicle types while wheel positions are complementary features to help better distinguish similar vehicle classes, such as classes 8 and 9, where the only salient feature difference between them is the number of axles. To reinforce this feature complementarity, the general vehicle representations from self-supervised methods were further finetuned in a subsequent supervised classification task together with a random wheel masking strategy, which is compatible with the contextualized latent representations distilled by the data2vec method [16]. As a result, our method significantly improves the classification performance and achieved a Top-1 accuracy of 97.2% for classifying the 13 FHWA vehicle classes. This paper is organized into five sections. Section 2 describes the dataset, followed by our proposed method in Section 3, experiments in Section 4, and finally conclusions and discussions in Section 5.

#### **2. Data Description**

The dataset contains 7898 vehicle images collected from two sources: the Georgia Department of Transportation (GDOT) WIM sites and the open-source ImageNet dataset [18]. The GDOT data were collected by cameras installed at selected WIM stations. A total of 6571 vehicle images was collected from the GDOT WIM sites, consisting mainly of the common classes, such as class 2 and class 9. The number of images across the 13 vehicle classes was not well balanced; rare classes in the GDOT image dataset, such as classes 1, 4, 7, 10, and 13, contained only a small number of images. To compensate for these low-frequency classes, an additional 1327 vehicle images were extracted from ImageNet. Figure 1 summarizes the distribution of vehicle images across the 13 FHWA vehicle classes from both sources, the WIM sites and ImageNet. Exemplar images from each data source are shown in Figure 2.

**Figure 1.** Distribution of image data among 13 FHWA classes from WIM sites and ImageNet.

Given the varying scale of vehicles relative to the image frame, the vehicles were cropped from the original images to remove the irrelevant background information. All models were trained based on the cropped vehicle images.

It should be noted that the number of vehicle axles is an important feature in the FHWA vehicle class definitions. For instance, classes 8, 9, and 10 are all one-trailer trucks; the only difference among these classes is the number of axles (represented by the number of wheels in vehicle images). In addition, the relative locations of the wheels differ across vehicle categories. To extract these wheel positional features, a wheel detector was trained to locate all wheels in an image, as shown in Figure 3. With the wheel locations identified, the relative positions of all wheels were computed by dividing the wheel spacings (Di) by the maximum distance (i.e., the distance between the center of the leftmost wheel and the center of the rightmost wheel), as depicted in Figure 4. This normalization removes the effect of different camera angles and unifies the wheel positional information across different vehicle sizes and scales. The resulting normalized wheel positional features form a vector of relative wheel positions, which was used to complement the features extracted by the vision transformer models for the downstream classification task.
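The sketch below illustrates this normalization step in C, assuming the wheel detector returns the wheel center x-coordinates in pixels; the function and variable names are illustrative and not part of the released code.

```
#include <stdlib.h>

/* Comparison function for qsort: orders wheel centers left to right. */
static int cmp_double(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

/* Sorts the n wheel center x-coordinates (pixels) in place and writes the
   n-1 relative spacings D_i / D_max to out, where D_max is the distance
   between the leftmost and rightmost wheel centers. Returns 0 on success,
   -1 if n < 2 or all wheels coincide. */
int normalize_wheel_positions(double *centers_x, int n, double *out) {
    if (n < 2) return -1;
    qsort(centers_x, (size_t)n, sizeof(double), cmp_double);
    double d_max = centers_x[n - 1] - centers_x[0];
    if (d_max <= 0.0) return -1;
    for (int i = 0; i < n - 1; i++)
        out[i] = (centers_x[i + 1] - centers_x[i]) / d_max; /* D_i / D_max */
    return 0;
}
```

For a five-wheel class 9 tractor-trailer, for example, the output is a four-element vector of spacing ratios that is invariant to image scale.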

**Figure 2.** Examples of vehicle images in the dataset. (**a**) Vehicle images from the ImageNet. (**b**) Vehicle images collected at the GDOT WIM sites.

**Figure 3.** Illustration of wheel positional feature extraction (D1, D2, D3, and D4 are the center-to-center horizontal distances between two successive wheels from left to right).

**Figure 4.** Illustration of a vision transformer.

#### **3. Method**

Artificial intelligence, especially deep learning, has grown dramatically over the past decade, meeting many real-world needs. Many creative and influential models have been introduced, especially in the cognitive computing, computer vision (CV), and natural language processing (NLP) areas. Some models have achieved success across disciplines. One typical example is the transformer [19], which was first developed for NLP and has since been successfully applied to vision tasks. The original transformer consists of multiple layers of encoder/decoder blocks, each of which combines three key modules: a self-attention module, feedforward network modules, and layer normalization. Given its increasing popularity, the self-attention mechanism and adapted ViT architectures have been widely adopted across different fields (e.g., [20,21]). In this study, we leveraged ViT encoders pretrained with the state-of-the-art self-supervised learning methods (i.e., DINO and data2vec) and complemented the ViT representations with wheel positional features retrieved from a finetuned object detection model (i.e., YOLOR). The two sets of features (i.e., ViT representations and wheel positional features) were harmonized by a wheel masking strategy during classifier training. Our proposed method shows dramatically improved classification performance when data2vec is used as the pretraining method and the ViT encoder is finetuned during the subsequent classifier training stage.

#### *3.1. Vision Transformer*

The transformer architecture has recently been adapted to successfully handle vision tasks [22]. The ViT model has been demonstrated to achieve comparable or better image classification results than traditional CNNs [23–25]. Specifically, ViT leverages embeddings from the transformer encoder for image classification. As depicted in Figure 4, the input image is first divided into small image patches. Each patch is flattened and linearly projected to a latent vector dimension, which is then kept constant throughout all layers. The projected vector is learnable and referred to as the patch embedding. A positional embedding is added to each patch embedding to retain the spatial relationship among the image patches. The positional embedding process is illustrated using 2 × 2 patches in Figure 4; in practice, usually 7 × 7 or more patches are used. A class token is added and serves as a learnable embedding prepended to the sequence of embedded patches. The learned representations from the encoder are passed to a multi-layer perceptron (MLP) for image classification. ViT has surpassed many popular CNN-based vision models, such as ResNet152 [22].
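As a minimal sketch of the patch-splitting step that precedes the learned linear projection (the projection, positional embeddings, and class token are omitted), assuming a row-major H × W × C float image with H and W divisible by the patch size:

```
#include <stddef.h>

/* Splits an HxWxC image (row-major floats) into non-overlapping PxP patches
   and flattens each into one row of out (one patch per row, P*P*C values).
   Dimensions are illustrative; assumes h and w are divisible by p. */
void extract_patches(const float *img, int h, int w, int c, int p, float *out) {
    int cols = w / p; /* number of patches per image row */
    for (int py = 0; py < h / p; py++)
        for (int px = 0; px < cols; px++) {
            float *dst = out + (size_t)(py * cols + px) * (size_t)(p * p * c);
            for (int y = 0; y < p; y++)
                for (int x = 0; x < p; x++)
                    for (int k = 0; k < c; k++)
                        *dst++ = img[(((py * p + y) * w) + (px * p + x)) * c + k];
        }
}
```

For a 224 × 224 input divided into 14 × 14 patches, each flattened patch row would then be projected to the constant latent dimension described above.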

#### *3.2. Self-Supervised Pretraining*

To leverage the large number of unlabeled images (e.g., ImageNet [18]), self-supervised learning was adopted for pretraining a ViT encoder (base network). Two state-of-the-art methods, (1) self-distillation with no labels (DINO) [15] and (2) data2vec [16], were evaluated in this setting. Figure 5 illustrates the structure of DINO, where input images are randomly cropped to form different views (global and local) and fed to a teacher ViT and a student ViT. Only the global views are passed to the teacher ViT, while both global and local views are passed to the student ViT; in this way, the student model learns to extract multi-scale features. With the encodings from both ViTs, a softmax function is applied to produce two probability distributions, p1 and p2, and a cross-entropy loss is computed between them. During training, the parameters of the student ViT are updated by stochastic gradient descent, while the parameters of the teacher ViT are updated as an exponential moving average of the student ViT's parameters.

**Figure 5.** Illustration of DINO.
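The teacher update reduces to a one-line rule per parameter; a minimal sketch in C, where the momentum value mentioned in the comment is a typical DINO choice rather than a setting reported in this paper:

```
#include <stddef.h>

/* Updates the teacher parameters as an exponential moving average of the
   student parameters: theta_t <- m * theta_t + (1 - m) * theta_s.
   A momentum m around 0.996 is a common DINO setting (an assumption here). */
void ema_update(float *teacher, const float *student, size_t n, float momentum) {
    for (size_t i = 0; i < n; i++)
        teacher[i] = momentum * teacher[i] + (1.0f - momentum) * student[i];
}
```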

data2vec was recently proposed by Baevski et al. [16] as a general self-supervised learning framework for speech, NLP, and computer vision tasks. The structure of data2vec is illustrated in Figure 6. Similar to DINO, data2vec employs a teacher–student paradigm: the teacher generates representations from the original input image, while the student generates representations from a masked image. Differently from DINO, data2vec predicts the masked latent representations and regresses representations from multiple neural network layers instead of just the top layer. Instead of the cross-entropy loss used in DINO, data2vec uses a smooth L1 loss.

**Figure 6.** Illustration of data2vec.
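A minimal sketch of a smooth L1 loss between the student's predicted representations and the teacher's targets; the transition parameter beta and the element-wise averaging are illustrative assumptions, not settings reported here:

```
#include <math.h>
#include <stddef.h>

/* Smooth L1 (Huber-style) loss between predicted and target representation
   vectors, averaged over n elements. beta controls where the loss switches
   from quadratic to linear behavior. */
double smooth_l1_loss(const double *pred, const double *target, size_t n, double beta) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = fabs(pred[i] - target[i]);
        sum += (d < beta) ? 0.5 * d * d / beta : d - 0.5 * beta;
    }
    return sum / (double)n;
}
```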

#### *3.3. Wheel Detection*

As discussed previously, the relative wheel positions are critical features according to the FHWA vehicle classifications. Therefore, being able to detect all wheels in vehicle images provides more definitive features for vehicle classification. To leverage the existing object detection models, three real-time object detection architectures, i.e., Faster R-CNN [26], YOLOv4 [27], and YOLOR [17], were evaluated as potential wheel detectors.

Based on our experiments, both YOLOv4 and YOLOR achieved a mAP of 99.9%, slightly outperforming the Faster R-CNN model (mAP = 99.0%). In light of its faster inference speed, YOLOR was chosen as the wheel detector in our study for extracting wheel positional features. In our experimental setting, the YOLOR model played dual roles: (1) detecting vehicles (with bounding boxes) so that they could be cropped out for further processing (as mentioned previously, all the models were trained and tested on cropped images rather than the original images) and (2) extracting wheel positional features, which were combined with the ViT features for the vehicle classification task. The fusion of these two sets of features was achieved by end-to-end training of a composite model architecture, as discussed in the following section.

#### *3.4. Composite Model Architecture*

A composite model architecture is proposed to improve vehicle classification by harnessing the features extracted from both the ViT and wheel detection models, as shown in Figure 7. The input image is fed to YOLOR for vehicle and wheel detection. The vehicle image is then cropped based on the vehicle bounding boxes output by YOLOR and resized to 224 × 224, which is the input size for the ViT encoder. Features extracted from the ViT encoder and the wheel positional features are concatenated and fed to a multi-layer perceptron (MLP) to classify the 13 FHWA vehicle classes. The wheel locations detected by YOLOR provide wheel positional features that are complementary to the features extracted by the ViT encoder: the former provide localized details on axle configuration, while the latter emphasize vehicle features at a coarser and larger scale (i.e., not necessarily attending to the wheel position details). To further reinforce this complementarity, one wheel was randomly masked when finetuning the ViT encoder. The vanilla ViT and the ViTs pretrained by DINO and data2vec were all evaluated in this composite architecture setting. The experimental results are presented in the Experiments section.

**Figure 7.** Structure of the composite model architecture.
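The fusion itself is a simple concatenation before the MLP; a minimal sketch in C, where the feature dimensions are placeholders since the exact sizes are not reported here:

```
#include <string.h>
#include <stddef.h>

/* Concatenates the ViT encoding (vit_dim floats) with the normalized wheel
   positional features (wheel_dim floats) into one input vector for the MLP
   classifier. The caller must provide out with vit_dim + wheel_dim floats. */
void concat_features(const float *vit_feat, size_t vit_dim,
                     const float *wheel_feat, size_t wheel_dim,
                     float *out) {
    memcpy(out, vit_feat, vit_dim * sizeof(float));
    memcpy(out + vit_dim, wheel_feat, wheel_dim * sizeof(float));
}
```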

#### **4. Experiments**

#### *4.1. Effects of Self-Supervised Pretraining*

The ViT model was trained with Adam [28] and a batch size of 64. The learning rate was initially set to 6 × 10<sup>−5</sup>. For model development and evaluation, the dataset was split with 80% allocated to training and 20% to testing. All cropped images were resized to 224 × 224 pixels and evenly divided into 14 × 14 patches. Two self-supervised methods, DINO and data2vec, were evaluated as a pretraining stage for the supervised classification task. ViTs pretrained on ImageNet with DINO and data2vec were utilized in this study. Our model training was conducted under two settings: (1) freezing the ViT backbone and only training the MLP classifier and (2) finetuning the ViT backbone while training the MLP classifier. For comparison purposes, the original ViT model was also trained end-to-end in a supervised fashion. The results are summarized in Table 2.


**Table 2.** Comparison of classification performance by different ViT training settings.

As shown in Table 2, the Top-1 accuracy, weighted average precision, and weighted average recall from the pretrained ViT models are significantly higher than those from the supervised ViT model regardless of whether the ViT backbone was frozen or not during the classifier training. There was a clear performance boost when the ViT encoder was finetuned during the classifier training. For the finetuned models, the ViT + DINO network performed slightly better than the ViT + data2vec.

For a detailed performance comparison across vehicle classes, the classification reports of the ViT, ViT + DINO, and ViT + data2vec are presented in Table 3. Overall, ViT performed well in the common classes (classes 2, 3, 6, and 9). However, for minority classes (classes 1, 4, 5, 7, 8, 10, 11, 12, and 13), pretrained ViTs reported much better precision, recall, and F1-scores.

**Table 3.** Comparison of classification reports of the ViT model with and without self-supervised pretraining.


#### *4.2. Performance of Composite Models*

As mentioned previously, the number of axles is a key factor in the FHWA vehicle classification rules. Therefore, we generated the wheel positional features from the wheel locations detected by YOLOR and fed these features to the classifier together with the ViT encodings. To assess the benefits of adding the wheel positional features, we evaluated two model scenarios: (1) ViT models, which did not include wheel positional features, and (2) composite models, which included the ViT models as well as the wheel positional features from YOLOR. Table 4 shows the results of the ViT models (upper part) and composite models (bottom part).


**Table 4.** Comparison of the supervised ViT model and the composite models under different training schemes.

The composite models, which fused the ViT encodings and wheel positional features, improved the classification accuracy by 0.3–1.5%. This confirms that specific wheel positional features are important for the vehicle classification task. The DINO-pretrained ViT models were slightly better than their data2vec counterparts. The best composite model was "ViT + DINO + YOLOR", which achieved an overall accuracy of 96%. The detailed performance metrics (precision, recall, and F1-score) across classes are included in Table 5 for both scenarios: with and without the wheel features.

**Table 5.** Comparison of classification reports of the pretrained model (ViT + DINO) with and without wheel features.


Adding the wheel features resulted in an obvious improvement in the F1-scores for classes 8 and 10. As indicated in Table 1, classes 8, 9, and 10 are all one-trailer trucks with minor differences (the number of axles) and highly imbalanced data distributions (with 82 images in class 8 but 2436 images in class 9). The model could easily have confused these classes. Adding the wheel positional features helped to better classify them, as well as classes 11, 12, and 13, which are all multi-trailer classes.

#### *4.3. Random Wheel Masking Strategy*

To regularize the learning process of the composite model, one of the YOLOR-detected wheels was randomly selected for masking when training the ViT. This allows the learned ViT representations to adapt to the injected wheel noise. The experimental results are summarized in Table 6.
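A minimal sketch of one plausible implementation of this masking step, which zero-fills the image region of one randomly chosen detected wheel; the zero fill value and the Box type are assumptions for illustration, not details taken from the pipeline:

```
#include <stdlib.h>

typedef struct { int x0, y0, x1, y1; } Box; /* wheel bounding box, pixels */

/* Masks the image region of one randomly chosen wheel by zeroing its
   pixels. img is HxWxC, row-major floats; boxes holds the per-wheel
   bounding boxes from the wheel detector. Assumes rand() was seeded. */
void mask_random_wheel(float *img, int h, int w, int c,
                       const Box *boxes, int n_wheels) {
    if (n_wheels <= 0) return;
    Box b = boxes[rand() % n_wheels];
    for (int y = (b.y0 < 0 ? 0 : b.y0); y < b.y1 && y < h; y++)
        for (int x = (b.x0 < 0 ? 0 : b.x0); x < b.x1 && x < w; x++)
            for (int k = 0; k < c; k++)
                img[(y * w + x) * c + k] = 0.0f;
}
```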


**Table 6.** Comparison of model performance with and without random wheel masking.

\* WAP, WAR: abbreviations for weighted average precision and weighted average recall, respectively.

As shown in Table 6, all models performed better when one wheel was randomly masked during classifier training. Without wheel masking, the best composite model was DINO + ViT + YOLOR, which achieved an accuracy of 96.3%. After applying one-wheel masking, its accuracy rose to 96.7%. The "ViT + data2vec + YOLOR" model benefited even more from random wheel masking: its accuracy was boosted by 1.9% to 97.2%, surpassing DINO + ViT + YOLOR. Table 7 shows the detailed classification results of ViT + data2vec + YOLOR with and without wheel masking.

**Table 7.** Comparison of classification reports of the composite model (ViT + data2vec + YOLOR) with and without wheel masking.


As indicated in Table 7, randomly masking one wheel increased the precision of classes 4, 5, 6, 7, 10, and 13 considerably. The F1-scores of most classes also improved, especially for the minority truck classes (5, 6, 7, 8, 10, and 13).

#### **5. Conclusions and Discussions**

The two self-supervised learning methods (DINO and data2vec) showed their superiority over the supervised ViT. The classification accuracies were further boosted after applying DINO or data2vec for pretraining. Finetuning the pretrained ViT encoders during the classifier training helped with the classification task. By adding additional wheel positional features, the models performed better than standalone ViTs. Additionally, the adoption of the random wheel masking strategy while finetuning the ViT encoder further improved the performance of the models, resulting in accuracies of 96.7 and 97.2%, respectively, for the DINO- and data2vec-pretrained models.

An important aspect to acknowledge is that classic supervised learning is largely constrained by limited annotated datasets. In contrast, self-supervised learning can take advantage of massive amounts of unlabeled data for representation learning and has become increasingly popular. Using self-supervised methods as a pretraining stage has been demonstrated to significantly improve the performance of vehicle classification. Between the two popular self-supervised learning methods, DINO and data2vec, there is an interesting finding: the DINO-pretrained ViT performed better than the data2vec-pretrained one, even with ViT finetuning and the addition of wheel positional features. However, by randomly masking a wheel during training, the data2vec-pretrained ViT outperformed the DINO-pretrained ViT. A plausible explanation is that during the pretraining stage, data2vec trains the ViT to predict the contextualized representations of masked image patches, which is consistent with our wheel masking strategy. This allows the data2vec-pretrained ViT to generalize easily over the masked wheel features, whereas for DINO, the ViT encoder learns from cropped parts of input images and does not capture contextual information the way data2vec does.

Although the ViT + data2vec + YOLOR model, coupled with the proposed random wheel masking strategy, demonstrated excellent performance in classifying the 13 FHWA vehicle classes, there is still plenty of room for future improvement. Dataset imbalance issues can be further mitigated by acquiring more images of minority-class vehicles. YOLOR was adopted as a wheel detector to extract wheel positional features, which increases the computational footprint since the two standalone models (ViT and YOLOR) are executed in parallel; a unified model architecture could be investigated to reduce computational costs for practical real-time applications. The work presented in this paper implicitly assumes that the full bodies of all vehicles are visible in the images, while this may not be true in real-world settings, where vehicle occlusion and superimposition often occur during heavy traffic conditions, causing only parts of vehicles to be visible. This issue could be mitigated by purposely training the models to recognize vehicle classes from partially occluded images. In fact, the data2vec method learns general representations by predicting contextualized latent representations of a masked view of the input in a self-distillation setting. Thus, data2vec-distilled representations are robust to partial occlusion of vehicles in images. Other mitigation methods may consider leveraging multiple views from different cameras or even multimodal sensory inputs. For example, using thermal cameras and LiDAR could help to improve model performance under low-light conditions (e.g., at night).

**Author Contributions:** Study conception and design, J.J.Y. and S.M.; data collection, S.M.; analysis and interpretation of results, S.M. and J.J.Y.; draft preparation, S.M. and J.J.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Georgia Department of Transportation, Grant Number: RP20-04.

**Data Availability Statement:** Some or all the data, models, and code that support the findings of this study are available from the corresponding author upon reasonable request.

**Acknowledgments:** The study presented in this paper was conducted by the University of Georgia under the auspices of the Georgia Department of Transportation (RP 20-04). The contents of this paper reflect the views of the authors, who are solely responsible for the facts and accuracy of the data, opinions, and conclusions presented herein. The contents may not reflect the views of the funding agency or other individuals.

**Conflicts of Interest:** The funder had no role in the design of the study; in the analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **High-Performance Computation of the Number of Nested RNA Structures with 3D Parallel Tiled Code**

**Piotr Błaszyński \*,†,‡ and Włodzimierz Bielecki †,‡**

Faculty of Computer Science and Information Systems, West Pomeranian University of Technology in Szczecin, 70-310 Szczecin, Poland; wbielecki@zut.edu.pl


**Abstract:** Many current bioinformatics algorithms have been implemented in parallel programming code. Some of them have already reached the limits imposed by Amdahl's law, but many can still be improved. In our paper, we present an approach allowing us to generate high-performance code for calculating the number of RNA pairs. The approach allows us to generate parallel tiled code of the maximal tile dimension, which for the discussed algorithm is 3D. Experiments carried out by us on two modern multi-core computers, an Intel(R) Xeon(R) Gold 6326 (2.90 GHz, 2 physical units, 32 cores, 64 threads, 24 MB Cache) and an Intel(R) i7-11700KF (3.6 GHz, 8 cores, 16 threads, 16 MB Cache), demonstrate a significant increase in the performance and scalability of the generated parallel tiled code. For both the Intel(R) Xeon(R) Gold 6326 and the Intel(R) i7, target code speedup increases linearly with an increase in the number of threads. The approach presented in the paper can be used by programmers to generate target parallel tiled code for other bioinformatics codes whose dependence patterns are similar to those of the code implementing the counting algorithm.

**Keywords:** bioinformatics; RNA folding; dynamic programming; tiled code generation; code parallelization; high-performance code

#### **1. Introduction**

Bioinformatics computing has long been one of the most important areas of science. Since the beginning, parallel versions of bioinformatics algorithms have been developed. Much of this work has led to significant speedup, some to new algorithms, but some algorithms remain in sequential versions. The approach presented in this paper, used to modify the code implementing the algorithm for calculating the number of pairs in an RNA structure, allows this algorithm to be parallelized and achieves significant target code speedup. The described method can also be used for a whole class of bioinformatics algorithms whose dependence patterns are similar to those of the code examined in this paper.

Serial bioinformatics codes subject to parallelization include Nussinov's, Zuker's, and Smith–Waterman's algorithms, among others. Typical dependence patterns in such codes are non-uniform and are generally expressed with affine expressions. This makes transformations of such codes much more difficult in comparison with codes exposing only uniform dependences (all elements of dependence distance vectors are constants). Some bioinformatics algorithms are implemented in FPGAs [1]. There are GPU-based solutions for both CUDA [2] and OpenCL [3], as well as Kokkos [4]. However, most implementations rely on the OpenMP API [5] due to the popularity and simplicity of code development using this API.

In Ref. [6], Smith and Waterman described a mathematical analysis of RNA secondary structure. In our paper, we use an implementation variant of the described algorithm; a description of this variant can be found in Raden's publications [7,8].


The counting algorithm computes the exact number of nested structures for a given RNA sequence. It populates matrix *C* using the following recursion:

$$C_{i,j} = C_{i,j-1} + \sum_{\substack{i \le k < (j-l) \\ S_k,\, S_j \text{ can pair}}} C_{i,k-1} \cdot C_{k+1,j-1} \tag{1}$$

where *l* is the minimal number of enclosed positions, and the entry *C<sub>i,j</sub>* provides the exact number of admissible structures for the sub-sequence from position *i* to *j*. The upper-right corner *C*<sub>1,*n*</sub> gives the overall number of admissible structures for the whole sequence. We chose the value 1 for *l*. The minimal number of enclosed positions could also be 0, 2, or more; the value of *l* affects the results generated by the code, but it does not significantly change the execution time of the examined algorithm. The value 1 is the default in most experiments [7,8].

The C code implementing the counting algorithm is presented in Listing 1.

**Listing 1.** C code implementing the counting algorithm

```
for (int i = N - 2; i >= 1; i--) {
    for (int j = i + 2; j <= N; j++) {
        for (int k = i; k <= j - 1; k++) {
            c[i][j] += paired(k, j) ? c[i][k - 1] * c[k + 1][j - 1] : 0; // S0
        }
        c[i][j] = c[i][j] + c[i][j - 1]; // S1
    }
}
```
The counting algorithm requires high-performance computing for longer RNA sequences. Currently, high-performance computing is achieved through the development of parallel tiled applications running on multi-core computers. Code parallelism allows many cores to be applied, while tiling improves code locality and increases code granularity, which is crucial for achieving good code performance and scalability. To the best of our knowledge, there is no manual implementation (as parallel tiled code) of the counting algorithm. Parallel tiled code can be generated automatically by means of optimizing compilers. To generate such code, we chose two optimizing compilers, PLUTO and TRACO, and carried out experiments with the codes generated by them. The results of the experiments exposed the main drawback of the PLUTO and TRACO codes implementing the counting algorithm: insufficient code locality, which limits code performance and scalability.

The problem statement is to derive an approach that allows for the generation of parallel tiled code with better code locality than that of the PLUTO and TRACO codes, to apply the approach to the source code implementing the counting algorithm, and to carry out an experimental study demonstrating the advantage of the generated parallel target code.

A short description of the presented approach is the following. We discovered that the structure of the dependences in the source code implementing the counting algorithm prevents the generation of 3D tiled code by means of PLUTO: PLUTO generates only 2D tiled code (the innermost loop is untiled). Increasing the tile dimension from 2D to 3D is crucial for enhancing tiled code locality, for the following reason. A 2D tile is unbounded because it includes all the loop nest statement instances enumerated along the untiled innermost loop. In general, the upper bound of that loop is a parameter, so the number of statement instances enumerated by the innermost loop is parametric, that is, unbounded.

Thus, the data associated with a single 2D tile cannot be held in the cache, which reduces code locality, whereas 3D tiles are bounded, and choosing a proper tile size allows all the data of a single 3D tile to be kept in the cache, which improves code locality.

We discovered that PLUTO generates 2D tiles (it fails to tile the innermost loop) instead of 3D ones because of the complex dependences in the source code implementing the counting algorithm. To improve the features of the dependences, we suggest applying to the source code a schedule based on the data flow concept (DFC), which envisages that a loop nest statement instance can be executed when all its operands are ready (already calculated). That is, we suggest generating a new serial source code as the result of transforming the original one. To implement such a transformation, a DFC schedule is derived and applied to the source code, and we formally verify the validity of the derived schedules. Then any compiler based on affine transformations can be applied to the new source code to generate parallel tiled code. The crucial steps in the presented approach are deriving schedules based on the DFC and verifying the legality of those schedules. The remaining steps of the approach are easily implemented with open source tools.

Thus, the goal of the paper is to present an approach to generate high-performance 3D parallel tiled code on the basis of the code presented in Listing 1 and demonstrate the efficiency of that code on two modern multi-core platforms.

Loop tiling is discussed in Refs. [9–12]. Let us illustrate loop tiling for the loop nest presented in Listing 2.

**Listing 2.** An illustrative example

```
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        A[i][j] = B[j][i];
    }
}
```
For this loop nest, the loop nest iteration space and the order of iteration execution are presented on the left side of Figure 1.

**Figure 1.** Loop transformation.

We may say that iteration execution takes place within a single tile (block). For a large problem size, it is impossible to hold all the data of such a tile in the cache, which hampers code locality.

We may split the loop nest iteration space into tiles as shown on the right side of Figure 1. If a tile size is chosen properly, that is, all the data of a single tile can be held in the cache and those data occupy as much cache capacity as possible, we may considerably improve code locality.

The tiled code is presented in Listing 3. As we can see, the tiled code includes four loops. The two outermost loops enumerate tiles, while the two innermost loops enumerate iterations within a single tile. In general, it is difficult to build tiled code manually even for simple loop nests. Usually, optimizing compilers are used for automatic tiled code generation.

Automatically generated parallel tiled code is based on the polyhedral model [13]. For a given loop nest, this model envisages forming the following data: (i) a loop nest iteration space (a set of all the iterations executed with the loop nest), (ii) an original loop nest schedule in the global iteration space, and (iii) dependence relations. This model can be used for implementation of many transformations, for example, loop interchange, loop unrolling, loop fusion, loop fission, and register blocking. However, the most popular and effective transformations are loop parallelization and loop tiling. The polyhedral model is a basis of the affine transformation framework [13] and the correction approach [14], which allow for automatic parallel tiled code generation.

**Listing 3.** Tiled version of the code in Listing 2

```
for (int TI = 0; TI < N; TI += 16)
    for (int TJ = 0; TJ < N; TJ += 16)
        for (int i = TI; i < min(TI + 16, N); i++)
            for (int j = TJ; j < min(TJ + 16, N); j++)
                A[i][j] = B[j][i];
```
The main contributions of the paper are the following: (i) introducing an approach to generate 3D tiled parallel code for the counting algorithm; (ii) presenting an OpenMP parallel tiled code implementing the counting algorithm; (iii) presenting and discussing results of experiments on modern multi-core platforms.

The rest of the paper is organized as follows. Section 2 presents the background, our approach to generate 3D parallel tiled code, and closely related codes in C implementing the counting algorithm. Section 3 discusses the results of experiments with the examined codes on two modern multi-core machines. Section 4 concludes.

#### **2. Materials and Methods**

#### *2.1. Background*

Usually, to increase serial code performance, parallelization and loop tiling are applied to the source code. Parallelism allows us to use many threads to execute the code, while loop tiling improves code locality and increases parallel code granularity, which is crucial for improving multi-threaded code performance.

Loop tiling is a reordering loop transformation that allows data to be accessed in blocks (tiles), with the block size defined as a parameter of this transformation. Each loop is transformed into two loops: one iterating within each block (the intratile loop) and the other iterating over the blocks (the intertile loop).

As far as loop tiling is concerned, it is very important to generate target code with the maximal tile dimension, which is defined by the number of loops in the loop nest. If one or more of the innermost loops remain untiled, the resulting tiles are unbounded along those loops. This reduces tiled code locality because it is not possible to hold in the cache all the data associated with a single unbounded tile [15,16]. If one or more of the outermost loops are untiled, they must be executed serially. This reduces target code parallelism and introduces additional synchronization events, reducing target code performance [10,13].

Each iteration in the loop nest iteration space is represented with an iteration vector. All iteration vectors of a given loop statement form the iteration space of that statement.

Code can expose dependences among iterations in a code iteration space. A dependence is a situation in which two different iterations access the same memory location and at least one of these accesses is a write. Each dependence is represented by its source and destination.

To extract dependences and generate target code, we use PET [17] and the iscc calculator [18]. The iscc calculator is an interactive tool for manipulating sets and relations of integer tuples bounded by affine constraints over the set variables, parameters and existentially quantified variables. PET is a library for extracting a polyhedral model from a C source. Such a model consists of an iteration space, access relations, and a schedule, each of which is described using affine constraints. A PET schedule specifies the original execution order of loop nest statement instances.

PET extracts dependences in the form of relations, where the input tuple of each relation represents iteration vectors of dependence sources and the output tuple represents those of the corresponding dependence destinations; that is, the dependence relation, *R*, is presented in the following form:

*R* := [*parameters*] → { [*input tuple*] → [*output tuple*] | *constraints* },

where [*parameters*] is the list of all parameters of the affine *constraints* imposed on [*input tuple*] and [*output tuple*].

For the dependence, a distance vector is the difference between the iteration vector of its destination and that of its source. Calculating such a difference is possible when both abovementioned vectors are of the same length. This is true for perfectly nested loops where all statements are surrounded with all loops. Otherwise, loops are imperfectly nested, that is, the dimensions of iteration spaces of loop nest statements are different and we cannot directly calculate a distance vector.

In such a case, to calculate distance vectors, we normalize the iteration space of each statement so that all the iteration spaces are of the same dimension. Normalization consists in applying a global schedule extracted with PET for each loop nest statement to an iteration space of the statement. The entire global schedule corresponds to the original execution order of a loop nest. As a result, the iteration spaces of each statement become of the same dimension in the global iteration space and we are able to calculate all distance vectors. We present details of normalization in the following subsection.

To tile and parallelize source codes, we should form time partition constraints [10], which state that if iteration *I* of statement *S*1 depends on iteration *J* of statement *S*2, then *I* must be assigned to a time partition that is executed no earlier than the partition containing *J*, that is, schedule(*J*) ≤ schedule(*I*), where schedule(*I*) and schedule(*J*) denote the discrete execution times of iterations *I* and *J*, respectively.

Linearly independent solutions to the time partition constraints are applied to generate schedules for statement instances of the original code. Those affine schedules are used to parallelize and tile the original loop nest.
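As a concrete illustration of this condition, a candidate affine time partition *x* respects a dependence with distance vector *d* exactly when the dot product *x* · *d* is non-negative. The C sketch below checks this for one concrete vector; in practice, the constraints are resolved symbolically over parametric sets, so this sketch is illustrative only.

```
/* Checks whether an affine time partition x respects a dependence with
   distance vector d of length n: the destination must be scheduled no
   earlier than the source, i.e., the dot product x . d must be >= 0. */
int respects_dependence(const int *x, const int *d, int n) {
    long dot = 0;
    for (int i = 0; i < n; i++)
        dot += (long)x[i] * d[i];
    return dot >= 0;
}
```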

We strove to extract as many linearly independent solutions to the time partition constraints as possible because the number of those solutions defines the dimension of the generated tiles [10].

The affine transformation framework comprises the above considerations and includes the following steps: (i) extracting dependence relations, (ii) forming time partition constraints on the basis of dependence relations, (iii) resolving the time partition constraints striving to find as many linearly independent solutions as possible, (iv) forming affine transformations on the basis of the independent solutions, and (v) generating parallel tiled code.

Details of the affine transformation framework can be found in Ref. [13]. The same paper presents implementation details of the PLUTO compiler, which is based on affine transformations.

An alternative approach to generating parallel tiled code is based on applying the transitive closure of a dependence graph. This approach is introduced in Ref. [14]. It envisages the following steps: (i) extracting dependence relations, (ii) forming a dependence graph as the union of all dependence relations, (iii) calculating the transitive closure of the dependence graph, (iv) applying the transitive closure to form valid tiles, and (v) generating parallel tiled code. This approach does not form or apply any affine transformation. The approach is implemented in the TRACO compiler.

#### *2.2. 3D Tiled Code Generation*

For the original loop nest in Listing 1, PET returns the following iteration spaces, *D*0 and *D*1, for statements *S*0 and *S*1, respectively.

$$D0 := N \to \{ \ S\_0(i, j, k) \mid i > 0 \land 2 + i \le j \le N \land i \le k < j \},$$

$$D1 := N \to \{ \ S\_1(i, j) \mid i > 0 \land 2 + i \le j \le N \}.$$

As we can see, the dimensions of the iteration spaces of *S*0 and *S*1 are different, so we could not directly calculate distance vectors. To normalize the iteration spaces, we applied the following global schedules returned with PET:

$$M0 := N \to \{\, S_0(i, j, k) \to (-i, j, 0, k) \,\},$$

$$M1 := N \to \{\, S_1(i, j) \to (-i, j, 1, 0) \,\},$$

to sets *D*0 and *D*1, respectively, and obtained the following global spaces:

$$D0' := N \to \{\, (-i, j, 0, k) \mid i > 0 \land 2 + i \le j \le N \land i \le k < j \,\},$$

$$D1' := N \to \{\, (-i, j, 1, 0) \mid i > 0 \land 2 + i \le j \le N \,\}.$$

It is worth noting that if a statement appears in a sequence of statements, PET extends the global schedule for these statements with a constant representing a global schedule dimension. The values of these dimensions correspond to the order of the statements in the sequence. If a statement appears as the body of a loop, then the schedule is extended with both an initial domain dimension and an initial range dimension. In schedules *M*0 and *M*1, in the output (right) tuples in the third positions, constants 0 and 1 are inserted, while in the fourth position of the output tuple of *M*1, constant 0 is inserted because statement *S*1 is not surrounded with iterator *k*.

In the same way, we transform dependence relations returned with PET and presented in original iteration spaces to dependence relations presented in the global iteration space, where all relation tuples are of the same dimension. Applying the deltas operator of the iscc calculator, which calculates the difference between the output and input tuples of dependence relations in the global iteration space, we obtained the following distance vectors represented with sets.

$$\begin{aligned} D1 &:= \{\, (0, i1, i2, i3) \mid i1 \ge 0 \land i1 \le i2 \le 1 \land i3 > i1 \,\},\\ D2 &:= \{\, (0, 1, 0, i3) \mid i0 > 0 \land i3 < 0 \,\},\\ D3 &:= \{\, (0, i1, 0, i3) \mid i1 \ge 2 \land i3 \ge 2 \,\},\\ D4 &:= \{\, (i0, 1, -1, i3) \mid i0 > 0 \land i3 \le -3 \,\},\\ D5 &:= \{\, (0, i1, -1, 1) \mid i1 \ge 2 \,\},\\ D6 &:= \{\, (0, 1, 0, 1) \,\}. \end{aligned}$$

To simplify extracting affine transformations, we approximate the distance vectors above with a single distance vector that subsumes each of them: *D* := { (*i*0, *i*1, *i*2, *i*3) | *i*0 ≥ 0 ∧ (*i*1 = 1 ∨ *i*1 ≥ 2) ∧ (*i*2 = 0 ∨ *i*2 = −1 ∨ *i*1 ≥ *i*2 ≥ 1) ∧ (*i*3 > *i*1 ∨ *i*3 < 0 ∨ *i*3 > 2 ∨ *i*3 ≤ −3 ∨ *i*3 = 1) }.

It is worth noting that the constraints of *D* are the logical conjunction of all the constraints of *D*1, *D*2, *D*3, *D*4, *D*5, and *D*6.

The time partition constraint formed on the basis of vector *D* is the following.

$$x_0 \cdot i0 + x_1 \cdot i1 + x_2 \cdot i2 + x_3 \cdot i3 \ge 0, \quad \textit{constraints}, \tag{2}$$

where *x*0, *x*1, *x*2, *x*3 are unknowns, and *constraints* denotes the constraints of set *D*.

Because variable *i*3 is unbounded, that is, −∞ ≤ *i*3 ≤ ∞, we conclude that *x*3 should be equal to 0 to satisfy constraint (2). We also conclude that unknown *x*2 should be 0 because variable *i*2 is not a loop iterator; it represents global schedule constants, which should not be transformed and are used only to generate target code properly (to correctly place loop statements in the target code).

Taking into account the conclusions above, we conclude that there exist only two linearly independent solutions to constraint (2), for example, (1, 0, 0, 0)<sup>T</sup> and (0, 1, 0, 0)<sup>T</sup>.

Thus, for the code in Listing 1, by means of affine transformations, we are able to generate only 2D tiles (see Background).

Next, we use the concept of a loop nest statement instance schedule, which specifies the order in which those instances are executed in the target code. To improve the features of the dependences of the code presented in Listing 1, we suggest applying to the loop nest statements a schedule formed according to the data flow concept (DFC): first, the readiness time of each operand of each statement is defined; then, if *t*<sub>1</sub><sup>*i*</sup>, *t*<sub>2</sub><sup>*i*</sup>, ... , *t*<sub>*k*</sub><sup>*i*</sup> are the *k* discrete readiness times of the *k* operands of statement *i*, the schedule of statement *i* is defined as *t*<sub>*i*</sub> = *max*(*t*<sub>1</sub><sup>*i*</sup>, *t*<sub>2</sub><sup>*i*</sup>, ... , *t*<sub>*k*</sub><sup>*i*</sup>) + 1. On the right of that formula, the first term is the maximal time among all operand readiness times of statement *i*, and "+1" means that statement *i* can be executed at the next discrete time after all its operands are ready (already calculated).

DFC schedules should be defined and applied to all the statements of the source loop nest to generate a transformed serial loop.

Analyzing the operands *c*[*i*][*k* − 1] and *c*[*k* + 1][*j* − 1] of statement *S*0 in Listing 1, as well as the bounds of loops *i* and *j*, we may conclude that their readiness times are *k* − *i* − 1 and *j* − *k* − 2, respectively. We also take into account that element *c*[*i*][*j*] can be updated many times for different values of *k*, and the final value of *c*[*i*][*j*] is formed at time *j* − *i* − 1. Thus, according to the DFC, statement *S*0 is to be executed at time *t* = *max*(*k* − *i* − 1, *j* − *k* − 2) + 1 for variables *i* and *j* satisfying the constraint *t* ≤ *j* − *i* − 1. The last constraint means that element *c*[*i*][*j*], formed with statement *S*0, can be updated many times at times *t* satisfying the condition *t* ≤ *j* − *i* − 1. Thus, taking into account the global schedule of statement *S*0 represented with relation *M*0, we obtain the following DFC schedule, *SCHED*(*S*0).

$$SCHED(S0) := N \to \{\, S_0(i, j, k) \to (t = \max(k - i - 1, j - k - 2) + 1, -i, j, 0, k) \mid t \le j - i - 1 \,\}.$$

Analyzing statement *S*1, we may conclude that it should be executed when, for given *i* and *j*, loop *k* has terminated, that is, when the calculation of the value of element *c*[*i*][*j*] is complete, which happens at time *j* − *i* − 1. Thus, taking into account the global schedule for *S*1 presented with relation *M*1 above, we obtained the following schedule for statement *S*1.

$$SCHED(S1) := N \to \{\, S_1(i, j) \to (t, -i, j, 1, 0) \mid t = j - i - 1 \,\}.$$

The constraint *t* = *j* − *i* − 1 means that statement *S*1 can be executed only when loop *k* has terminated. Constant 1 in the tuple (*t*, −*i*, *j*, 1, 0), inherited from the global schedule *M*1, guarantees that statement *S*1 is executed after all the iterations of loop *k* have terminated.
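A minimal C sketch of the two DFC schedule times defined above; it mirrors *SCHED*(*S*0) and *SCHED*(*S*1) for a single statement instance and is given only to make the formulas concrete.

```
static inline int max2(int a, int b) { return a > b ? a : b; }

/* DFC execution time of an instance (i, j, k) of statement S0:
   t = max(k - i - 1, j - k - 2) + 1, valid only when t <= j - i - 1. */
int dfc_time_s0(int i, int j, int k) {
    return max2(k - i - 1, j - k - 2) + 1; /* operand readiness times + 1 */
}

/* DFC execution time of statement S1 for a pair (i, j): it runs after all
   k-iterations for (i, j) complete, i.e., at time j - i - 1. */
int dfc_time_s1(int i, int j) {
    return j - i - 1;
}
```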

Applying schedules *SCHED*(*S*0) and *SCHED*(*S*1) to statements *S*0 and *S*1 by means of the iscc codegen operator, we obtain the transformed code presented in Listing 4.

**Listing 4.** Transformed C code implementing the counting algorithm

```
for (int c0 = 1; c0 < N - 1; c0 += 1)
    for (int c1 = -N + c0 + 1; c1 < 0; c1 += 1)
        for (int c2 = c0 - c1 + 1; c2 <= min(N, 2 * c0 - c1 + 1); c2 += 1) {
            if (2 * c0 >= c1 + c2) {
                c[-c1][c2] += paired(-c0 + c2 - 1, c2) ?
                    c[-c1][-c0 + c2 - 1 - 1] * c[-c0 + c2 - 1 + 1][c2 - 1] : 0;
            }
            c[-c1][c2] += paired(c0 - c1, c2) ?
                c[-c1][c0 - c1 - 1] * c[c0 - c1 + 1][c2 - 1] : 0;
            if (c1 + c2 == c0 + 1) {
                c[-c1][c0 - c1 + 1] = c[-c1][c0 - c1 + 1] + c[-c1][c0 - c1 + 1 - 1];
            }
        }
```

That code respects all the dependences present in the code in Listing 1 for the following reason. In the code in Listing 1, we distinguish two types of dependences: standard ones and reductions. If a loop nest statement uses an associative and commutative operation such as addition, we recognize the dependence between two references of this statement as a reduction dependence [19]. For example, in the code in Listing 1, statement *S*0

`c[i][j] += paired(k, j) ? c[i][k - 1] * c[k + 1][j - 1] : 0; // S0`

causes reduction dependences with regard to reads and writes of element *c*[*i*][*j*].

We may allow them to be reordered provided that the new order is serial; that is, reduction dependences do not impose an ordering constraint. In the code in Listing 4, reduction dependences are respected due to the serial execution of loop nest statement instances.

Standard dependences available in the code in Listing 1 are respected via the implementation of the DFC.

To prove the validity of the schedules applied to generate the code in Listing 4 in a formal way, we use the schedule validation technique presented in Ref. [20].

Given relation *F* representing all the dependences to be respected, schedule *S* is valid if the following inequality is true:

$$
\Delta(S \circ F \circ S^{-1}) \succeq 0,
$$

where Δ is the operator that maps a relation to the differences between image and domain elements.

The result of the composition *R* = (*S* ◦ *F* ◦ *S*<sup>−1</sup>) is a relation where the input (left) and output (right) tuples represent dependence sources and destinations, respectively, in the transformed iteration space. A schedule is valid (respects all the dependences available in an original loop nest) if the vector whose elements are the differences between the image and domain elements of relation *R* is lexicographically non-negative (⪰ 0). In such a case, each standard dependence in the original loop nest is respected in the transformed loop nest.

To apply the schedule validity technique above, we extract dependence relations *F* by means of PET. Then we eliminate from *F* all reduction dependences, taking into account the fact that such dependences are caused only by statement *S*0. We represent reduction dependences by means of the following relation:

$$\{\, S_0(i, j, k) \to S_0(i, j, k') \mid k' > k \,\}.$$

Next, applying the iscc calculator, we obtain a relation, *R*, as the result of the composition (*S* ◦ *F* ◦ *S*<sup>−1</sup>), where *S* is the union of the schedules *SCHED*(*S*0) and *SCHED*(*S*1) defined above to generate the target code.

To check whether each vector represented with set *C* = Δ(*S* ◦ *F* ◦ *S*<sup>−1</sup>) is lexicographically non-negative, we form the following set, which represents all lexicographically negative vectors in the unbounded 5D space.

*LD*5 := *N* → { (*t*, *i*0, *i*1, *i*2, *i*3) | *t* < 0 } ∪ *N* → { (0, *i*0, *i*1, *i*2, *i*3) | *i*0 < 0 } ∪ *N* → { (0, 0, *i*1, *i*2, *i*3) | *i*1 < 0 } ∪ *N* → { (0, 0, 0, *i*2, *i*3) | *i*2 < 0 } ∪ *N* → { (0, 0, 0, 0, *i*3) | *i*3 ≤ 0 }.

Then, we calculate the intersection of sets *C* and *LD*5. That intersection is the empty set, which means that all vectors of *C* are lexicographically non-negative. This proves the validity of schedules *SCHED*(*S*0) and *SCHED*(*S*1).
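A minimal sketch of the underlying check for a single 5D difference vector; the actual computation in the paper is carried out symbolically with the iscc calculator over parametric sets, not by enumerating vectors.

```
/* Checks that a difference vector of length n is lexicographically
   non-negative: its first nonzero element must be positive. The schedule
   validation above amounts to verifying this for every vector in
   C = Delta(S o F o S^-1); here n = 5. */
int lex_nonnegative(const int *v, int n) {
    for (int i = 0; i < n; i++) {
        if (v[i] > 0) return 1;
        if (v[i] < 0) return 0;
    }
    return 1; /* the zero vector counts as non-negative under >= 0 */
}
```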

For the code presented in Listing 4, by means of PET and the iscc calculator, we obtained the following distance vectors.

*D*1 := { (*i*0, *i*1, 1, *i*3) | *i*0 ≥ 2 ∧ −1 ≤ *i*3 ≤ 1 ∧ ((*i*1 ≥ 2 + *i*0 ∧ *i*3 ≥ 0) ∨ (*i*1 > 0 ∧ *i*3 ≤ 0)) },
*D*2 := { (2, 0, *i*2, −1) | *i*2 ≥ 2 },
*D*3 := { (*i*0, 0, *i*2, *i*3) | *i*3 ≤ 2 ∧ ((*i*0 ≥ 3 ∧ *i*2 ≥ 3 + *i*0 ∧ −2 ≤ *i*3 ≤ 0) ∨ (*i*0 ≥ 2 ∧ *i*2 ≥ 2 ∧ 0 ≤ *i*3 ≤ 1) ∨ (0 ≤ *i*2 ≤ 1 ∧ *i*2 ≤ *i*0 ∧ *i*3 ≥ *i*2 ∧ *i*3 > −*i*0)) },
*D*4 := { (2, *i*1, 1, −2) | *i*1 > 0 },
*D*5 := { (1, 0, 1, 0) },
*D*6 := { (*i*0, 0, 0, −1) | *i*0 > 0 }.

To simplify extracting affine transformations, we again approximate the distance vectors above with a single distance vector that subsumes each of them: *D* := { (*i*0, *i*1, *i*2, *i*3) | *i*0 ≥ 0 ∧ *i*1 ≥ 0 ∧ *i*2 ≥ 0 ∧ −2 ≤ *i*3 ≤ 2 }. The time partition constraint formed on the basis of this vector is the following.

$$x_0 \cdot i0 + x_1 \cdot i1 + x_2 \cdot i2 + x_3 \cdot i3 \ge 0, \quad \textit{constraints}, \tag{3}$$

where *x*0, *x*1, *x*2, *x*3 are unknowns, and *constraints* represents the constraints of set *D* above.

Unknown *x*3 should be 0 because variable *i*3 is not a loop iterator; it represents global schedule constants and should not be transformed. Taking this into account, there exist three linearly independent solutions to constraint (3), for example, (1, 0, 0, 0)<sup>T</sup>, (0, 1, 0, 0)<sup>T</sup>, and (0, 0, 1, 0)<sup>T</sup>.

Applying the DAPT optimizing compiler [21], which automatically extracts and applies affine transformations, to the code in Listing 4 to tile and parallelize that code by means of the wave-front technique [22], we obtain the target 3D tiled parallel code (Listing 5) with tiles of size 16 × 32 × 40. By means of experiments, we determined this size to be the optimal one with regard to tiled code performance.

In that code, the first three loops enumerate tiles, while the remaining three loops scan statement instances within each tile. The OpenMP [23] directive *#pragma omp parallel for* makes the loop *for (int h0 = ...)* parallel.

Ref. [24] illustrates the advantage of 3D tiled codes in comparison with 2D tiled ones implementing Nussinov's RNA algorithm [25]. However, there are the following differences between the approach used for code generation for the Nussinov problem [24] and that used for the counting problem considered in the current paper. The code for the Nussinov problem is derived from a calculation model based on systolic arrays (the first figure in the Nussinov paper), while the code for the counting problem is based on the data flow concept (DFC). The approach presented in the current paper uses a validity technique for the applied schedules to generate target code, while the approach presented in the Nussinov paper does not envisage any formal validation of the applied schedules.

**Listing 5.** 3D tiled parallel code


#### *2.3. Related Codes*

To the best of our knowledge, there is no manual implementation (as parallel tiled code) of the counting algorithm. To generate such a code, we chose two optimizing compilers, PLUTO and TRACO. They are open source and allow for automatic code optimization (tiling and parallelization). We did not consider other optimizing compilers because they do not satisfy one or more of the following demands: the compiler must be a source-to-source translator, it should be able to tile and parallelize source code, be based on polyhedral techniques, be currently maintained without building problems, and be well-documented. Applying PLUTO to the counting source code allows us to generate only 2D tiled parallel code, while applying TRACO results in the generation of codes with irregular tiles.

The usefulness of the tiled parallel code generated by the presented approach is the following. In contrast to the 2D tiled PLUTO code, the code generated by means of the presented approach is 3D and enumerates regular bounded tiles, which improves code locality and, as a consequence, target code performance. The tile regularity of the 3D tiled code also provides better code locality in comparison with the TRACO code because the tile size is limited, and it is possible to choose a tile size such that all the data associated with a single tile can be held in the cache.

**Listing 6.** PLUTO [13] code implementing the counting algorithm

```
if (N >= 3) {
    for (t1 = 3; t1 <= N; t1++) {
        lbp = 0;
        ubp = floord(t1 - 2, 32);
#pragma omp parallel for private(lbv, ubv, t3, t4, t5)
        for (t2 = lbp; t2 <= ubp; t2++) {
            for (t3 = t2; t3 <= floord(t1, 32); t3++) {
                if ((t1 >= 32 * t3 + 1) && (t1 <= 32 * t3 + 31)) {
                    for (t4 = max(1, 32 * t2); t4 <= min(t1 - 2, 32 * t2 + 31); t4++) {
                        for (t5 = max(32 * t3, t4); t5 <= t1 - 1; t5++) {
                            c[t4][t1] += paired(t5, t1) ? c[t4][t5 - 1] * c[t5 + 1][t1 - 1] : 0;
                        }
                        c[t4][t1] = c[t4][t1] + c[t4][t1 - 1];
                    }
                }
                if (t1 >= 32 * t3 + 32) {
                    for (t4 = max(1, 32 * t2); t4 <= min(t1 - 2, 32 * t2 + 31); t4++) {
                        for (t5 = max(32 * t3, t4); t5 <= 32 * t3 + 31; t5++) {
                            c[t4][t1] += paired(t5, t1) ? c[t4][t5 - 1] * c[t5 + 1][t1 - 1] : 0;
                        }
                    }
                }
                if (t1 == 32 * t3) {
                    for (t4 = max(1, 32 * t2); t4 <= min(t1 - 2, 32 * t2 + 31); t4++) {
                        if (t1 % 32 == 0) {
                            c[t4][t1] = c[t4][t1] + c[t4][t1 - 1];
                        }
                    }
                }
            }
        }
    }
}
```
The codes used for comparison are presented below. These codes were obtained from the code in Listing 1. The code in Listing 6 is generated by the PLUTO parallel compiler [13] and the code in Listing 7 is generated by TRACO [14].

The code in Listing 6 enumerates 2D tiles of size 32 × 32, which we established as optimal via experiments. PLUTO implements the affine transformation framework, and as we demonstrated in Section 2.2, there exist only two linearly independent solutions to the time partition constraints formed for the code in Listing 1. Thus, the maximal dimension of the tiles in the code in Listing 6 is 2D.

TRACO generates 3D tiles of size 8 × 127 × 16 (likewise determined by us to be optimal by means of experiments), but the tiles are irregular and some of them are unbounded, hampering thread load balance and reducing code locality because not all the data associated with a single unbounded tile can be held in the cache.

**Listing 7.** TRACO [14] code implementing the counting algorithm

```
for (c1 = 0; c1 < N + floord(N - 3, 128) - 2; c1 += 1)
#pragma omp parallel for
    for (c3 = max(0, -N + c1 + 3); c3 <= c1 / 129; c3 += 1)
        for (c4 = 0; c4 <= 1; c4 += 1) {
            if (c4 == 1) {
                for (c9 = N - c1 + 129 * c3; c9 <= min(N, N - c1 + 129 * c3 + 127); c9 += 1)
                    for (c10 = max(0, -c1 + 64 * c3 - c9 + (N + c1 + c3 + c9 + 1) / 2 + 1); c10 <= 1; c10 += 1) {
                        if (c10 == 1) {
                            c[(N - c1 + c3 - 2)][c9] = c[(N - c1 + c3 - 2)][c9] + c[(N - c1 + c3 - 2)][c9 - 1];
                        } else {
                            for (c11 = N - c1 + 129 * c3 + 1; c11 < c9; c11 += 1)
                                c[(N - c1 + c3 - 2)][c9] += paired(c11, c9) ? c[(N - c1 + c3 - 2)][c11 - 1] * c[c11 + 1][c9 - 1] : 0;
                        }
                    }
            } else {
                for (c5 = 0; c5 <= 8 * c3; c5 += 1)
                    for (c9 = N - c1 + 129 * c3; c9 <= min(N, N - c1 + 129 * c3 + 127); c9 += 1)
                        for (c11 = N - c1 + c3 + 16 * c5 - 2; c11 <= min(min(N - c1 + 129 * c3, N - c1 + c3 + 16 * c5 + 13), c9 - 1); c11 += 1)
                            c[(N - c1 + c3 - 2)][c9] += paired(c11, c9) ? c[(N - c1 + c3 - 2)][c11 - 1] * c[c11 + 1][c9 - 1] : 0;
            }
        }
```
#### **3. Results**

First, we show the impact of the number of threads on performance in Figures 2 and 3. To carry out the experiments, we used two machines: (i) an Intel(R) Xeon(R) Gold 6326 processor (2.90 GHz, 2 physical units, 32 cores, 64 threads, 24 MB cache), whose results are presented in Figures 4 and 5, and (ii) an Intel(R) i7-11700KF processor (3.6 GHz, 8 cores, 16 threads, 16 MB cache), whose results are depicted in Figures 6 and 7. All examined codes were compiled by means of the gcc 11.3 compiler with the -O3 optimization flag. This option was used to generate optimal serial and parallel executable code; without it, both the sequential and parallel codes run much longer.

Experiments were carried out for 24 randomly generated RNA sequences with lengths (parameter *N*) ranging from 500 to 12,000. The results presented in Refs. [26,27] show that cache-efficient code performance does not depend on the content of a string itself but on its length. We also performed a study with a variable number of threads, with a sequence length of 12,000 on the Intel Xeon and 10,000 on the Intel i7, using from 1 to 32 threads with a step of 1 on both machines. The results are shown in Figures 2 and 3. We compared the performance of the 3D tiled code generated with the presented approach with that of the PLUTO code (Listing 6) and the TRACO code (Listing 7).


All source codes used for carrying out experiments as well as a program allowing us to run each parallel program for a random or real RNA strand can be found in the Data Availability section.

**Figure 2.** Speedup for different thread numbers. Intel Xeon—sequence length 12,000.

### *3.1. Impact of the Number of Threads on Code Performance*

3.1.1. Intel Xeon

Figure 2 shows the speedup of the three examined parallel codes, i.e., the ratio of the serial code execution time to that of the parallel one. The 3D code demonstrates nearly linear speedup. From this chart, it can be assumed that the 3D code scales well, that is, code parallelism can be increased further as the number of threads grows. The 3D code can be run on a machine with a large number of threads without any code modification.

#### 3.1.2. Intel i7

Figure 3 shows the speedup of the same parallel codes on an i7 processor. The important part of this chart is between 12 and 16 threads (16 threads is the maximum for this unit). From this chart, it can be assumed that the 3D code works very well at the maximum number of threads, and that code parallelism can be increased further with more cores and threads.

**Figure 3.** Speedup for different thread numbers. Intel i7—sequence length 10,000.

#### *3.2. Impact of the Problem Size on Code Performance*

#### 3.2.1. Intel Xeon

From the results depicted in Figures 4 and 5, one can see a clear advantage of the 3D code for larger problem sizes. For smaller problem sizes, the PLUTO code is simpler than the 3D code (it comprises five loops, while the 3D code includes six), so PLUTO executes fewer iterations and demonstrates better performance. For larger problem sizes, the better code locality of the 3D code outweighs the benefits of the PLUTO code's simplicity.

**Figure 4.** Speedup for different sequence lengths. Intel Xeon—16 threads.

#### 3.2.2. Intel i7

Figure 6 shows that the 3D code computes faster for larger problem sizes. In addition, the 3D code is characterized by much greater performance stability (no spikes in speedup on either of the processors tested).

**Figure 5.** Speedup for different sequence lengths. Intel Xeon—32 threads.

**Figure 6.** Speedup for different sequence lengths. Intel i7—16 threads.

In addition, Figure 7 shows that the 3D code can fully utilize the available threads. However, when more threads are used than are available on the processor, this code is no longer the fastest.

**Figure 7.** Speedup for different sequence lengths. Intel i7—32 threads.

#### **4. Discussion**

The approach presented in this paper allows for parallel tiled code generation for the counting algorithm. The target code demonstrates a significant performance increase over the original sequential code. For larger problem sizes, it outperforms related parallel tiled codes and exposes better scalability. Our experimental results show that the 3D parallel tiled code implementing the counting algorithm utilises the computational capabilities of modern processor cores very well. The advantages of the obtained 3D code are more obvious for large problem sizes. We plan to apply the presented approach to other bioinformatics codes whose dependence patterns are similar to those in the code implementing the counting algorithm. This allows for increasing the tile dimension and, as a consequence, the performance and scalability of target codes. We also intend to fully automate the process of target code generation and implement it in an optimizing compiler.

**Author Contributions:** Conceptualization and methodology, W.B. and P.B.; software, P.B.; validation, W.B., P.B.; data curation, P.B.; original draft preparation, P.B.; writing—review and editing, W.B. and P.B.; visualization, P.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Source codes to reproduce all the results described in this article can be found at: https://github.com/piotrbla/counting3d. The iscc script (validity_check.iscc) carrying out the calculations above is presented at https://github.com/piotrbla/counting3d/blob/main/validity_check.iscc (accessed on 12 January 2023).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

RNA RiboNucleic Acid
TRACO Compiler based on the TRAnsitive ClOsure of dependence graphs

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Covering Arrays ML HPO for Static Malware Detection**

**Fahad T. ALGorain \*,† and John A. Clark \*,†**

Department of Computer Science, University of Sheffield, Sheffield S10 2TN, UK

**\*** Correspondence: ftalgorain1@sheffield.ac.uk (F.T.A.); john.clark@sheffield.ac.uk (J.A.C.)

† These authors contributed equally to this work.

**Abstract:** Malware classification is a well-known problem in computer security. Hyper-parameter optimisation (HPO) using covering arrays (CAs) is a novel approach that can enhance machine learning classifier accuracy. The tuning of machine learning (ML) classifiers to increase classification accuracy is needed nowadays, especially with newly evolving malware. Four machine learning techniques were tuned using cAgen, a tool for generating covering arrays. The results show that cAgen is an efficient approach for finding optimal parameter choices for ML techniques. Moreover, covering arrays, and cAgen in particular, show significant promise for the ML hyper-parameter optimisation community, the malware detection community, and security testing overall. This research will aid in adding better classifiers for static PE malware detection.

**Keywords:** cAgen; combinatorial testing; covering arrays; machine learning; static PE malware detection; hyper-parameter optimisation; grid search

#### **1. Introduction**

#### *1.1. Malware and Its Detection*

Malicious software is any executable programme intended to cause harm. Academic and commercial research and development into malware detection has been a constant focus for some time now [1], and malware remains one of the most important concerns in contemporary cybersecurity. There are three approaches to malware detection: static, dynamic, and hybrid detection. Static malware detection analyses malicious binary files without executing them; this is the focus of this paper. Dynamic malware detection uses the features of run-time execution behaviour to identify malware. Hybrid detection combines the two previous approaches. Many companies and universities have significantly invested in developing new methods for identifying malware, and many researchers have looked into the possibility of using machine learning (ML) to detect it.

#### *1.2. ML-Based Static Malware Detection: Related Literature*

Windows Portable Executable (PE) malware is one of the most commonly encountered forms of malware. Several works have explored the use of machine learning for PE malware detection, e.g., [2–4]. In [5], the authors provided a dataset (usually referred to as the Ember dataset) accompanied by various Python routines to facilitate access. They also provided baseline applications of various ML techniques to their dataset. In [6], the authors considered imbalanced dataset issues and model training duration. They also applied a static detection method using a gradient-boosting decision tree algorithm. Their model achieved better performance than the baseline model with less training time. (They used feature reduction based on the recommendation of the authors of [5].) Another approach used a subset of the Ember dataset and compared different ML models [7]. Their goal was to identify malware families, and their work was mainly concerned with scalability and efficiency. The proposed random forest model achieved a slightly better performance than the baseline model. In [8], the authors used a hybrid of two datasets, Ember (version 2017) and another dataset from the security partner of the Meraz' 18 techno-cultural festival (IIT Behali). A feature selection method, Fast Correlation-based Feature Selection (FCBF), was used to improve their model's performance. Thirteen features (with high variance) were selected. Several ML models (Decision Trees, Random Forest, Gradient boost, AdaBoost, Gaussian Naive Bayes) were introduced and trained. The Random Forest approach achieved the highest accuracy (99.9%). The study in [9] used the same dataset as this paper. It proposed an ensemble learning-based method for malware detection. A stacked ensemble of fully connected, one-dimensional convolutional neural networks (CNNs) performs the initial stage classification, while a machine learning algorithm handles the final stage classification. They evaluated 15 different machine learning classifiers in order to create a meta-learner. Several machine learning techniques were utilised for this comparison: Naive Bayes, Decision Tree, Random Forests, GB, K-Nearest Neighbours, Stochastic Gradient Descent and Neural Nets. The evaluation was conducted on the Windows Portable Executable (PE) malware dataset. An ensemble of seven neural networks with the ExtraTrees classifier as the last-stage classifier performed the best, achieving perfect accuracy. The model parameters were not stated.

Determining the full detection capabilities of the various methods is a tricky business, particularly when such methods are ML-based. Parameter selections for ML algorithms, for instance, are typically crucial to their performance and yet specific choices in the literature often lack convincing (or sometimes any) rationale. In this paper, we explore how to optimise the parameters of such algorithms, a process known as hyper-parameter optimisation (HPO). We specifically investigate the use of covering arrays as a way to combat the curse of dimensionality that results from Grid Search, which is the most common systematic approach used.

#### *1.3. Grid Search and the Curse of Dimensionality*

Grid Search is a powerful and widely used means of searching a parameter space to seek sets of values that give the best performance. Grid Search applies a full combinatorial evaluation of the cross-product of discretised parameter domains. A discretised domain is a set of 'representative' elements that 'span' the domain in some way. For example, the set {0, 5, ..., 95, 100} can be considered to span the set comprising the integers 0–100. The real interval [0, 1] can be spanned for some purposes by the set {0.0, 0.25, 0.5, 0.75, 1.0}.

The total number of combinations for Grid Search is the product of the (assumed finite) cardinalities of the individual discretised domains *Di*.

$$\text{totalCombinations} = \prod_{i=1}^{n} \text{card}(D_i)$$

Grid Search can obviously give a thorough exploration of the parameter space, assuming that the individual domains are suitably discretised. However, in some areas of engineering, it is found that full combinatorial evaluation can be wasteful. For example, in software testing, a particular sub-combination coverage of parameter values can provide a very high fault detection capability. However, we do not know in advance the specific sub-combinations that will be the most revealing. Some effective means of exploring the combinatorial space is needed so that we do not incur the costs of a full grid search.
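To make the multiplicative growth concrete, the following minimal sketch (our own toy domains, not values from this paper) enumerates a full grid with Python's itertools and prints its size:

```python
from itertools import product

# Illustrative discretised domains (assumed values, for demonstration only)
D1 = [0, 25, 50, 75, 100]          # an integer domain spanned by 5 points
D2 = [0.0, 0.25, 0.5, 0.75, 1.0]   # the real interval [0, 1] spanned by 5 points
D3 = [True, False]                 # a binary option

grid = list(product(D1, D2, D3))   # full cross-product for Grid Search
print(len(grid))                   # 5 * 5 * 2 = 50 evaluations
```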

*Covering arrays* provide one such mechanism. Furthermore, the concept can be applied at different 'strengths', allowing flexibility in the thoroughness of the exploration of the search space at hand. Each discretised parameter domain has a set of values. A covering array is defined over the cross-product of the discretised domains *D* = *D*<sub>1</sub> × *D*<sub>2</sub> × ... × *D<sub>n</sub>*. The rows of the array denote specific tests; the columns denote specific parameters. The (*i*, *j*) element of the array is the value of parameter *j* in test *i*. The Cartesian product of all parameter sets defines complete combinatorial coverage; the rows of a covering array provide a subset of it with a particular strength *t*. In a CA with strength *t*, for any subset of *t* parameters, each possible *t*-tuple of values occurs in at least one row (test). This is often called *t*-way testing. Orthogonal arrays (OAs) are the optimal version of CAs, where each *t*-tuple occurs *exactly* once (rather than at least once). For some problems, an OA may not actually exist. Pairwise testing (*t* = 2) is widely used. Furthermore, it has been found more generally that small values of *t* can actually give a strong performance in fault-finding. As *t* increases, the size of the covering array also increases. The test set reduction achieved by covering arrays compared with a full combinatorial grid search may be very significant. The concept of strength is illustrated below.

We will generally denote a covering array with cross-product *D* as above and with strength *t* by *CA<sup>D</sup><sub>t</sub>*. Consider a combinatorial search space with the (discretised) parameter domains *A* = {0, 1}, *B* = {0, 1}, and *C* = {0, 1}, i.e., with a cross-product *D<sub>abc</sub>* = *A* × *B* × *C*. A *CA<sup>D<sub>abc</sub></sup><sub>1</sub>* provides a suite of cases where each value of each domain occurs at least once. This is easily achieved by an array with just two rows (i.e., two cases), as shown below. *A* = 0 occurs in the first row and *A* = 1 occurs in the second. This is similar for *B* and *C*.

| *A* | *B* | *C* |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
If we had, say, 26 binary domains *A*, *B*, ..., *Z*, then a similar covering array, i.e., a *CA<sup>D<sub>abc...z</sub></sup><sub>1</sub>* with two rows, would satisfy the *t* = 1 strength requirement, i.e., with rows as shown below.

| *A* | *B* | ... | *Z* |
|:---:|:---:|:---:|:---:|
| 0 | 0 | ... | 0 |
| 1 | 1 | ... | 1 |
A *CA<sup>D</sup><sub>1</sub>* clearly gives rather weak coverage (exploration) of the domain space for most purposes. In the *A*, *B*, ..., *Z* example, only 2 of the 2<sup>26</sup> possible row values are sampled. For a *CA<sup>D</sup><sub>2</sub>*, each combination of values from any two (*t* = 2) domains is present in the array. A *CA<sup>D<sub>abc</sub></sup><sub>2</sub>* for the A, B, and C example is given below.

| *A* | *B* | *C* |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
We can see that the four possible values of (*A*, *B*) are present, i.e., (*A*, *B*) = (0, 0) in row 0, (0, 1) in row 1, (1, 0) in row 2, and (1, 1) in row 3. Similarly, the four possible values of (*A*, *C*) and the four values of (*B*, *C*) are also present. Thus, all pairs of values from any two domains from A, B, and C are present, and so the given array is indeed a *CA<sup>D<sub>abc</sub></sup><sub>2</sub>*. The simplest *CA<sup>D<sub>abc</sub></sup><sub>3</sub>* array would give full combinatorial coverage, i.e., with all eight (*A*, *B*, *C*) combinations, with the usual binary enumeration of 0–7 for the rows, i.e., [0, 0, 0] through to [1, 1, 1].
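The coverage property can be checked mechanically. The sketch below is our own illustration (not part of any covering-array tool): it verifies that the four-row array above covers every pair of values for the three binary domains, whereas full combinatorial coverage requires eight rows.

```python
from itertools import combinations, product

ca2 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # the strength-2 array above
domains = [(0, 1), (0, 1), (0, 1)]                   # A, B, C

def covers(array, domains, t):
    """True if every t-tuple of values of every t domains occurs in some row."""
    for cols in combinations(range(len(domains)), t):
        needed = set(product(*(domains[c] for c in cols)))
        seen = {tuple(row[c] for c in cols) for row in array}
        if needed - seen:
            return False
    return True

print(covers(ca2, domains, 2))        # True: 4 rows suffice for t = 2
print(len(list(product(*domains))))   # 8 rows for full combinatorial coverage
```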

#### *1.4. Generating Covering Arrays*

The actual generation of arrays is not our focus. A good deal of theoretical and practical work has been carried out into algorithms to do so. Our motivation to use covering arrays was inspired by their use in software testing. The in-parameter-order (IPO) method for generating *CA<sup>D</sup><sub>2</sub>* arrays for test suites is given in [10]. As they state, "For a system with two or more input parameters, the IPO strategy generates a pairwise test set for the first two parameters, extends the test set to generate a pairwise test set for the first three parameters, and continues to do so for each additional parameter."

CAs have become widely used in the combinatorial testing field, where they provide a means of reducing the number of tests needed in comparison to exhaustive combinatorial testing. This has led to an increased use of a specific instance of the IPO strategy called in-parameter-order-general (IPOG). IPOG can be used to generate covering arrays of arbitrary strengths [11]. It is a form of greedy algorithm and might not yield test suites of minimal size. It has been noted that providing an optimal covering array is an NP-complete problem [12].

The IPOG strategy has gained traction in the software testing field. This is due to the competitive test suites yielded by the covering arrays it generates in comparison with other approaches, together with a lower generation time than other algorithms. The main goal of IPOG is to minimise the generated test suite size. This is a significant area to explore, especially when the cost of testing is very high: the duration of test cycles is reduced with fewer tests. However, there are some cases where test execution is very fast and does not impact the overall testing time; in such cases, minimising test suites can be very costly, as the test generation time can become dominant [13,14]. Optimisation of the IPOG family was introduced in [15].

There are many open problems for CAs [16], of which the construction of optimal CAs is known to be the hardest [17]. Various methods for generating covering arrays have been proposed. These include the automatic efficient test generator (AETG) system [18], the deterministic density algorithm (DDA) [19,20], in-parameter order [21], and the advanced combinatorial testing system (ACTS) [22], each with its own advantages and disadvantages. Interested readers are referred to [18–21,23], respectively, for more information. The in-parameter-order (IPO) strategy grows the covering array column by column, adding rows as needed to ensure full *t*-way coverage. Various kinds of research on improving covering array generation with the in-parameter-order strategy have been conducted. The original aim of the strategy was the generalisability of generating covering arrays of arbitrary strength [11], resulting in the in-parameter-order-general (IPOG) algorithm. In [10], a modification to IPOG resulted in smaller covering arrays in some instances and faster generation times. In [24], a combination of IPOG with a recursive construction method was proposed that reduces the number of combinations to be enumerated. In [25], the use of graph-colouring schemes was proposed to reduce the size of the covering arrays. In [26], IPOG was modified with additional optimisations aimed at reducing don't-care values in order to have a smaller number of rows. Most of these works primarily aimed to reduce the size of generated covering arrays. The FIPOG technique was shown to outperform the IPOG implementation of ACTS in all benchmarks and improved test generation times by up to a factor of 146 [15].

In this paper, we use an implementation of FIPOG provided by the cAgen tool. We show that the use of FIPOG's covering arrays can achieve excellent results for the hyper-parameter optimisation of ML-based classifiers (better than using the default parameters) far more quickly than a full grid search. Below, we describe the cAgen tool [27], which implements the FIPOG technique.

#### *1.5. The cAgen Toolset*

The cAgen toolset provides a means of generating the covering arrays and is available online [27]. It allows the user to specify parameters and sets of associated values. For technical reasons concerned with our specific approach to the use of covering arrays, we will assume that a discretised parameter domain with *R* elements is indexed by the values 0, 1, ..., (*R* − 1). Figure 1 shows a completed specification for the (*A*, *B*, *C*) example above.


**Figure 1.** Full parameter specification for the ABC example (no constraints).

Having specified the parameters, we can invoke the generation capability of cAgen. Figure 2 shows the array generation stage for the A, B, and C example, where a value of *t* = 2 was selected. If we wanted each pair to occur multiple times, we could specify a larger value of *lambda*. Several generation algorithms are available; Figure 2 shows that we chose FIPOG for its better performance and fast generation [27]. The array can then be stored in a variety of formats. We chose to use the CSV format throughout.


**Figure 2.** Array generation for ABC example above with t = 2.

#### *1.6. Array Indexing*

We use lists to represent parameter spaces. A list's elements will be either actual parameter values or else a list representing a subdomain. The values 0, 1, ..., (*R* − 1) are interpreted as indices to the corresponding elements in the discretised domain list. For example, *MAX_DEPTH* = [5, 10, 15, 20, *None*] is a simple list with four specific integer values and a 'None' value. *LEARNING_RATE* = [0.001, 0.01, 0.1, 0.2] is a simple list of four real values. *MAX_LEAVES* = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] is a list of lists of values. Here, the list cardinalities are given by *card*(*MAX_DEPTH*) = 5, *card*(*LEARNING_RATE*) = 4 and *card*(*MAX_LEAVES*) = 3. Thus, for a list of lists, the cardinality is the cardinality of the highest-level list. Covering array values are indices into the top-level list.

Python lists allow us to include different types of elements. Thus, in *MAX_DEPTH*, we see that integer values, as well as a 'None' parameter value, can be specified. 'None' typically means that the algorithm can proceed as it sees fit, with no direction from the user for this parameter. ScikitLearn's ML algorithms often have such parameters as defaults. Where a parameter is represented by a simple list, the covering array value for the parameter is used to index the specific element of the list. Thus, an array value of 2 in the covering array column corresponding to *MAX_DEPTH* corresponds to a parameter value of 15, i.e., *MAX_DEPTH*[2] = 15. The indexed array element may itself be a list: a covering array value of 2 for *MAX_LEAVES* gives *MAX_LEAVES*[2] = [7, 8, 9]. In such a case, a value is randomly selected from the indexed list [7, 8, 9], so each of the values 7, 8, and 9 is selected with a probability of 1/3. In practice, we represent regular ranges of integer values more compactly, via the use of low, high, and increment indicators. Thus, we will typically represent the list [1, 2, 3, 4, 5] by [*low*, *high*, *incr*] = [1, 5, 1]. We adopt the convention that both low and high are included in the denoted range. We distinguish between simple lists with three elements and 'compact' lists with the same three elements (selection is resolved by different routines, determined at set-up time by the user). A sketch of this indexing convention is given below.
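In the sketch below, the helper name `resolve` is ours, and the comment on compact ranges shows one plausible expansion rather than the exact routine used in our experiments.

```python
import random

# The discretised domains from the text
MAX_DEPTH = [5, 10, 15, 20, None]               # simple list (cardinality 5)
LEARNING_RATE = [0.001, 0.01, 0.1, 0.2]         # simple list (cardinality 4)
MAX_LEAVES = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # list of lists (cardinality 3)

def resolve(domain, index):
    """Map a covering-array index to a concrete parameter value; if the
    indexed element is itself a list, draw from it uniformly at random."""
    element = domain[index]
    return random.choice(element) if isinstance(element, list) else element

# One covering-array row of indices, e.g., (2, 0, 2):
print(resolve(MAX_DEPTH, 2))      # -> 15
print(resolve(LEARNING_RATE, 0))  # -> 0.001
print(resolve(MAX_LEAVES, 2))     # -> 7, 8 or 9, each with probability 1/3
# A compact range [low, high, incr] such as [1, 5, 1] could be expanded with
# range(low, high + 1, incr) before a uniform draw.
```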

#### *1.7. Structure of the Paper*

In this paper, we investigate whether the clear efficiency benefits of a covering array approach can be brought to bear on the ML-based static malware detection problem. Section 2 describes our methodology, and Section 2.2 details the performed experiments. The results are presented in Section 3, and Section 4 concludes our paper.

#### **2. Methodology**

#### *2.1. Overall Approach*

We apply a variety of ML techniques and specify suitable domains for the parameters we wish to experiment with (other parameters assume defaults). We evaluate over the full combinatorial domain (for Grid Search) and over all rows of the covering arrays of interest (for t = 2, 3, 4). A full combinatorial evaluation or a full covering array evaluation (i.e., all rows evaluated) will be referred to as a 'run' or 'iteration'. We carry out 30 runs for each array of interest and for the full combinatorial case. We do this in order to gain insight into the distribution of outcomes from the technique. Some runs may give better results than others, even if the same array has been used as the basis for the run. This is due to the stochastic selection of elements within selected ranges as indicated above. Pooling the results from the 30 runs provides a means of determining an accurate and useful distribution for the approach. In practice, a user may simply use one run of a covering array search, if they are confident that it will give good enough results. Our evaluation activities aim to determine whether such confidence is justified.
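As a sketch of a single run, the loop below evaluates every row of a covering array and keeps the best accuracy. The classifier and parameter names are placeholders, not the exact configurations of Table 1, and `resolve` is the illustrative helper from Section 1.6, repeated here so the sketch is self-contained.

```python
import random
from sklearn.ensemble import RandomForestClassifier

def resolve(domain, index):
    """Index a discretised domain; draw uniformly if the element is a sub-list."""
    e = domain[index]
    return random.choice(e) if isinstance(e, list) else e

def run_covering_array(rows, domains, X_train, y_train, X_test, y_test):
    """One 'run': evaluate each covering-array row (a tuple of indices)
    against the named discretised domains and return the best result."""
    best_score, best_params = -1.0, None
    for row in rows:
        params = {name: resolve(domain, idx)
                  for (name, domain), idx in zip(domains.items(), row)}
        clf = RandomForestClassifier(**params).fit(X_train, y_train)
        score = clf.score(X_test, y_test)   # accuracy, our evaluation metric
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Thirty such runs are pooled to estimate the distribution of outcomes for a given array.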

#### *2.2. Experimental Details*

Our work uses two powerful toolkits: scikitLearn [28] and the cAgen tool [27]. The experiments are carried out using Windows 11, with an 11th Gen Intel Core i7-11800H 2.30 GHz processor and 16 GB RAM. The work uses a dataset [29] built using PE files from [30]. The dataset has 19,611 labelled malicious and benign samples from different repositories (such as VirusShare). The samples have 75 features. The dataset is split into a training dataset and a testing dataset (80% training, 20% testing) and can be found in [29]. All results are obtained using Jupyter Notebook version 6.1.0 and Python version 3.6.0.

A small amount of pre-processing is carried out on the malware dataset. The 'Malware' feature records the label for the supervised learning. From the remaining (i.e., input) feature columns, we restrict ourselves to binary and numerical features and so drop the 'Name', 'Machine', and 'TimeDateStamp' features. The filtered input features are then subject to scaling via scikitLearn's StandardScaler fit_transform function. The same approach is taken for all the ML approaches considered. No further feature engineering is performed. This is deliberate: our aim is to investigate ML model hyper-parameters, and we wish to keep other factors constant (researchers whose focus is any of the specific ML approaches are free to engage in further optimisations should they so wish).
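A minimal sketch of this pre-processing, assuming the dataset is loaded as a pandas DataFrame with the column names given above (the file path and the exact split/scale order are our assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("pe_malware_dataset.csv")   # illustrative path

y = df["Malware"]                            # supervised-learning label
X = df.drop(columns=["Malware", "Name", "Machine", "TimeDateStamp"])

# 80% training / 20% testing split, then scaling of the input features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```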

We evaluate a covering-array-based hyper-parametrisation on three well-established ML approaches (Decision Trees (DTs) [28], xgboost [31], and Random Forest (RF) [28,32]) together with a state-of-the-art approach (LightGBM [33]). Table 1 shows the implementation details for all four ML models using the cAgen tool. In particular, the hyper-parameter ranges of interest are shown for each technique, together with the corresponding IPM values (giving possible indices into the top level array list). The ML evaluation metric is *accuracy* as implemented by scikitLearn [34]. All three-element lists in our experiments are compact lists. The results are processed using SciPy's 'descriptive statistics' method [35].


**Table 1.** ML Models cAgen configurations.

#### **3. Results**

The results of hyper-parameter optimisation based on covering arrays (with strengths of 2, 3, or 4) and grid search are shown in the following tables. The best-performing parameter values are given, together with the time taken to complete the corresponding search, coverage (number of evaluations), and summary accuracy data. Tables 2–5 show results for RF, LightGBM, Xgboost, and DT, respectively. The results for Grid Search (over the same discretised parameter ranges) are also shown in each table. In the tables "No. of evaluations" is equal to the number of rows (i.e., combinations) in the covering array multiplied by the number of iterations (30).







**Table 2.** RF Model cAgen Results Comparison.

**Table 3.** LightGBM Model cAgen Results Comparison.

**Table 4.** Xgboost Model cAgen Results Comparison.

**Table 5.** DT Model cAgen Results Comparison.


We can see from Table 5 that the DT classifier is the fastest of all the ML models. Even Grid Search is efficient with this particular technique, taking only 5 min and 43 s to finish 3840 evaluations. However, cAgen is much more efficient, finishing in only 42.5 s. Although only 480 evaluations were made with *t* = 2, this achieves the same accuracy as Grid Search with less time and effort. The second fastest ML model after DT was LightGBM, which highlights the covering array capability even more. Table 3 shows a huge disparity in time between the Grid Search and cAgen runs: the cAgen approach completed the search in 1 h, 41 min and 18 s, while Grid Search took 7 h, 57 min and 14 s. Both reached excellent hyper-parameter choices with high accuracy. For LightGBM, strengths *t* = 2 and *t* = 3 reached almost the same score, although with different hyper-parameter values. cAgen is thus more efficient than Grid Search, using less time. The third ML model was RF, where the cAgen runs reached the highest-performing choices for *t* = 2 in 2 h and 35 min. In contrast, Grid Search took 2 days, 23 h and 58 min to complete the search. The difference between cAgen and Grid Search in Table 2 is significant evidence of the usefulness of covering arrays for hyper-parameter optimisation. Xgboost was the slowest of all models to achieve the best values: it took more computational time than the other techniques for strengths *t* = 3 and *t* = 4, and even for Grid Search. Figures 3–5 below compare the accuracy results of the selected models (*t* = 2, *t* = 3 and *t* = 4) as histograms. (These histograms are not normalised between techniques, i.e., the total counts may vary between techniques; however, the general distributions can be compared.)

**Figure 3.** cAgen ML Models Results Comparison for Strength *t* = 2.

**Figure 4.** cAgen ML Models Results Comparison for Strength *t* = 3.

**Figure 5.** cAgen ML Models Results Comparison for Strength *t* = 4.

The authors in [9] benchmarked several ML models' performances using ensemble learning with 10-fold cross-validation. For DT and RF, their accuracy results were 0.989 and 0.984, respectively; our models achieved 0.9849 and 0.9904. However, the main aim of our paper was to evaluate various coverage strategies, not necessarily to achieve an optimal value for each ML technique application. If explicit optima are the target, then further optimisations should be considered (see below).

#### **4. Discussion**

cAgen, a covering array approach applied with various strengths, was used to find high-performing hyper-parameters for the targeted ML models. It was compared to Grid Search. Our results show that the systematic coverage offered by covering arrays can be both highly effective and efficient. The covering arrays produced by cAgen gave superior results to Grid Search across all four ML models. We highly recommend the covering array approach to ML researchers and the community overall. Although our work focused on improving the attained accuracy of malware classification, other security tasks may benefit from such an approach (particularly ML-based classification tasks).

For future work, we would like to assess the feasibility of adding more ML techniques and hyper-parameters, increasing the complexity of the search space/test sets, comparing different settings within the workspace itself (e.g., FIPOG-F and FIPOG-2F), and increasing the complexity of *t*-way testing by adding constraints. We also believe that there is merit in considering hierarchical approaches to hyper-parametrisation, i.e., using the best values to come out of a set of runs (or even a single run) to identify a reduced space to be systematically searched (e.g., using another covering array).

We note that we generally seek only excellent results. There is no guarantee of optimality from any of our tested approaches. An optimal result may well lie at a point that is simply not present in the cross-product of discretised domains, because the discretisation process only defines *representative points* to span the domain. Furthermore, for each ML model considered, we presented what we believe are *plausible* discretised parameter ranges as the basis for our experiments. The specific choices made may affect the results. We acknowledge that other ranges are possible.

Although one might legitimately expect higher strengths of a covering array to give rise to improved results, this is not actually guaranteed. Furthermore, after building up experience with the approaches for a specific system, one might accept that a low-strength array gives highly acceptable results very quickly, and so choose to use such arrays for all subsequent runs when training data are updated.

**Author Contributions:** Writing—original draft, F.T.A. and J.A.C.; Writing—review & editing, F.T.A. and J.A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset can be found at https://www.kaggle.com/datasets/amauricio/pe-files-malwares.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Similarity of Musical Timbres Using FFT-Acoustic Descriptor Analysis and Machine Learning**

**Yubiry Gonzalez \* and Ronaldo C. Prati**

Center of Mathematics, Computer Science, and Cognition, Federal University of ABC, Av. Dos Estados, 5001, Santo André 09210-580, SP, Brazil

**\*** Correspondence: yubiry.gonzalez.17@gmail.com

**Abstract:** Musical timbre is a phenomenon of auditory perception that allows the recognition of musical sounds. The recognition of musical timbre is a challenging task because the timbre of a musical instrument or sound source is a complex and multifaceted phenomenon that is influenced by a variety of factors, including the physical properties of the instrument or sound source, the way it is played or produced, and the recording and processing techniques used. In this paper, we explore an abstract space with 7 dimensions formed by the fundamental frequency and FFT-Acoustic Descriptors in 240 monophonic sounds from the Tinysol and Good-Sounds databases, corresponding to the fourth octave of the transverse flute and clarinet. This approach allows us to unequivocally define a collection of points and, therefore, a timbral space (Category Theory) that allows different sounds of any type of musical instrument with its respective dynamics to be represented as a single characteristic vector. The geometric distance would allow studying the timbral similarity between audios of different sounds and instruments or between different musical dynamics and datasets. Additionally, a Machine-Learning algorithm that evaluates timbral similarities through Euclidean distances in the abstract space of 7 dimensions was proposed. We conclude that the study of timbral similarity through geometric distances allowed us to distinguish between audio categories of different sounds and musical instruments, between the same type of sound and an instrument with different relative dynamics, and between different datasets.

**Keywords:** musical timbre; FFT; musical instruments; acoustic descriptors; machine learning; data analysis; tinysol; goodsounds

**1. Introduction**

Musical timbre is a multidimensional attribute of musical instruments and of music in general, which, as a first approximation, allows one to differentiate one sound from another when they have the same intensity, duration, and pitch. It is well known that the complexity of musical timbre is not only associated with the identification of a musical instrument. We can find musical sounds with more similar timbre characteristics between acoustically different instruments than those of instruments with the same acoustic characteristics, considering the same pitch and dynamics.

Since musical timbre is a phenomenon of auditory perception, many of the investigations were developed in line with psychoacoustics with the aim of evaluating verbal descriptors that reveal measurable attributes of musical timbre [1–4]. The attributes of color vision and the perception of musical timbre were revealed through experiments on the subjective evaluation of perception [5]. Other more recent studies focused on similarities in the perception of images and the perception of timbre in various types of musical instruments, with models that represent timbre through linguistic-cognitive variables in a two-dimensional space [6,7]. Although the psychoacoustic perception of the musical timbre cannot be ignored, it must be recognized that the main timbral characteristics must be inscribed somehow within the Fast Fourier Transform (FFT) that enables the recording and subsequent reproduction of musical sound.


For the sake of argument, suppose that there are significant timbral characteristics that are not contained in the FFT performed on a musical audio record. In this scenario, the deconvolved audio (inverse convolution) of the reproduced digital record cannot be distinguished timbrally. However, this does not occur in musical digitization, as we are able to distinguish timbral aspects from deconvolved audios. Therefore, the FFT contains all of the significant timbral characteristics. If monophonic audio recordings of constant frequency (separate musical notes), equal intensity, and duration are considered, then the FFT will account for the timbre differences. Although psychoacoustic aspects are important, under these considerations, their effect on audio records does not affect the differences and timbral similarities of comparisons between the various audio records.

The characterization of musical timbre from the analysis of the spectrum contained in the Fast Fourier Transform has been one of the research topics of recent years in the fields of Musical Information Retrieval (MIR), Automatic Music Transcription (AMT), and performances of electro-acoustic music, among others. Recent developments in Signal Processing for Music Analysis [8–10] have allowed important applications in audio synthesis and in the deconvolution of polyphonic musical signals using spectrograms. However, it remains to be quantified which of the minimal descriptors of musical timbre characteristics, when present in the FFT of the audio records, are responsible for the acoustic stimulus, which allows the auditory identification of the sound source.

To extract information from the frequency spectrum of musical records, one must define the magnitudes, functions, or coefficients that describe or characterize a certain spectrum, which is generically called the acoustic descriptors. These provide quantitative measures that describe the set of amplitudes and frequencies of the FFTs of the audio records. Many researchers [11–23] focused on the presentation of an exhaustive collection of timbre descriptors (Timbre ToolBox, Librosa, etc.) that can be computationally extracted from a statistical analysis of the spectrum (FFT). Several other spectral descriptors appear in the literature, although there is no consensus on which or how many acoustic descriptors are necessary to characterize musical timbre. However, it is recognized that many of them are derivatives or combinations of others and that, in general, they are correlated with each other [12].

The use of the FFT and its representation in the frequency domain could be a way to study the physical characteristics of the musical timbre, thus, having a collection of well-bounded, discrete, and measurable pairs of computable numbers that represent the frequencies and amplitudes of the components of the Fourier analysis. In previous work, the authors [24,25] presented a minimum set of six dimensionless descriptors, motivated by musical acoustics and using the spectra obtained by the FFT, which allows for the description of the timbre of wooden aerophones (Bassoon, Clarinet, Transverse Flute, and Oboe) using individual sound recordings of the musical tempered scale. We show that these descriptors are sufficient to describe the timbral characteristics in the aerophones studied, allowing for the recognition of the musical instrument by means of the acoustic spectral signature. Also, Gonzalez & Prati [26] studied the timbral-variation dynamics (pianissimo, mezzo-forte, and fortissimo) in wooden aerophones using this set of six timbral descriptors in the Principal Component Analysis (PCA) of the TinySol audio library [27] and considering the common tessitura.

The goal of the present communication is to use the FFT-timbral coefficients to decrypt the similarity of musical timbres of different instruments. To this end, it is necessary to establish categories and build a space that classifies certain structures by applying Machine-Learning techniques. In Section 2, we use the timbre descriptors to define a point for each musical sound of frequency *f*<sub>0</sub>, each dynamic, and each instrument in an abstract timbral space of seven dimensions; the set of points is then represented as a moduli space, and, therefore, the classification of the similarity problem of musical sound can be approached using Category Theory [28,29]. Section 3 presents an algorithm based on a data table of the fundamental frequencies and timbral coefficients for classifying each sound in terms of Euclidean distances. Further, in Section 4, we present the preliminary results of variations arising as a function of the musical instrument, the dynamics, and the audio database used. Finally, the conclusions are presented in the last section.

#### **2. Acoustic Descriptors and Timbral Representation**

It should be noted that, unlike the timbral study of speech and environmental sounds, musical frequencies make up a finite, countable, and discrete set of only 12 different values in each musical octave for a total of 96 possible fundamental frequencies, and their integer multiples are in the audible range: from 20 Hz to 20 kHz. Therefore, the musical timbre can be characterized by a limited set of timbral coefficients, which are dimensionless quantities related to the frequencies and amplitudes in the Fourier spectrum of the audio records. Motivated by musical acoustics, these coefficients are tonal descriptors and, in essence, functionally describe the discrete distribution of normalized frequencies and amplitudes. As the amplitudes of the spectra of the FFTs are normalized (using the quotient of the amplitude of each partial frequency with respect to the greatest amplitude measured in each spectrum), it is possible to compare the relative amplitudes among them. They can be grouped into descriptors of the fundamental frequency (musical scale, 96 possible frequencies) and descriptors of the rest of the partial frequencies that arise when performing the FFT of the audio under analysis (descriptors of the shape of the distribution and statistical-frequency distribution). These proposed descriptors are dimensionless coefficients.

The FFT values are essentially a discrete collection of pairs of amplitudes and frequencies; therefore, they can be summarized by the following six dimensionless parameters (see [24,26] for further details).

#### *2.1. Fundamental Frequency Descriptors*

The measurement of the fundamental frequency in relation to the average frequency (Affinity *A*) is as follows:

$$A \equiv \frac{\sum_{i=1}^{N} a_i f_i}{f_0 \sum_{i=1}^{N} a_i} \tag{1}$$

The quantification of the amplitude of the fundamental frequency with respect to the collection of amplitudes (Sharpness *S*) follows below, where *f*<sub>0</sub> and *a*<sub>0</sub> represent the fundamental frequency and its amplitude, and *f<sub>i</sub>* and *a<sub>i</sub>* denote the frequency and amplitude of the *i*th FFT peak.

$$S \equiv \frac{a_0}{\sum_{i=1}^{N} a_i} \tag{2}$$

#### *2.2. Distribution Statistics*

A descriptor of how close the secondary pulses are to being integer multiples of the fundamental frequency (Harmonicity *H*) is as follows:

$$H \equiv \sum_{j=1}^{N} \left( \frac{f_j}{f_0} - \left[ \frac{f_j}{f_0} \right] \right) \tag{3}$$

where the [ ] denotes the integer part.

The envelope descriptor through the average slope in the collection of pulses (Monotony *M*) follows:

$$M \equiv \frac{f_0}{N} \sum_{j=1}^{N} \left( \frac{a_{j+1} - a_j}{f_{j+1} - f_j} \right) \tag{4}$$

#### *2.3. Descriptors of the Frequency Distribution*

The measurement of the frequency distribution with respect to the average frequency (Mean Affinity *MA*) is:

$$MA \equiv \frac{\sum_{i=1}^{N} \left| f_i - \overline{f} \right|}{N f_0} \tag{5}$$

The quantification of the average amplitude of the pulse collection (Mean Contrast *MC*) is:

$$MC \equiv \frac{\sum_{j=1}^{N} |a_0 - a_j|}{N} \tag{6}$$

These dimensionless timbral coefficients, together with the fundamental frequency, form a vector (*f*<sub>0</sub>, *A*, *S*, *H*, *M*, *MA*, *MC*) in an abstract, seven-dimensional configurational space for each monophonic audio record, which could represent the musical timbral space. Then, given a certain musical instrument, there will be only 96 possible sounds in Western music (12 semitones in 8 octaves), with each one represented by a unique septuple. The set of points is represented as a moduli space [29] or, equivalently, as a vector space.
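For illustration, the descriptors of Equations (1)–(6) can be computed directly from the FFT peaks. The sketch below is our own reading of the definitions; it assumes the peak frequencies and normalized amplitudes have already been extracted from the spectrum, with the fundamental as the first peak.

```python
import numpy as np

def timbral_vector(f, a, f0):
    """Return (f0, A, S, H, M, MA, MC) from FFT peak frequencies f,
    normalized amplitudes a, and the fundamental frequency f0."""
    f = np.asarray(f, dtype=float)
    a = np.asarray(a, dtype=float)
    N = len(f)
    a0 = a[0]                                        # amplitude of the fundamental
    A = np.sum(a * f) / (f0 * np.sum(a))             # Affinity, Equation (1)
    S = a0 / np.sum(a)                               # Sharpness, Equation (2)
    r = f / f0
    H = np.sum(r - np.floor(r))                      # Harmonicity, Equation (3)
    M = (f0 / N) * np.sum(np.diff(a) / np.diff(f))   # Monotony, Equation (4)
    MA = np.sum(np.abs(f - f.mean())) / (N * f0)     # Mean Affinity, Equation (5)
    MC = np.sum(np.abs(a0 - a)) / N                  # Mean Contrast, Equation (6)
    return np.array([f0, A, S, H, M, MA, MC])
```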

As a potential representation of timbres, Grey [30] proposed a three-dimensional timbre space based on the dissimilarity between pairs of sounds of musical instruments, in which neighboring stimulus points are related to their physical representations in terms of amplitude, time, and frequency. McAdams [31] found two dimensions for the set of wind/string musical instruments that qualitatively included the spectral and temporal envelopes, while those for the set of percussion instruments included the temporal envelope and either the spectral density or the pitch clarity/noisiness of the sound. The combined set had all three perceptual dimensions. Peeters et al. [12] calculated several measures on various sets of sounds and found that many of the descriptors correlated quite strongly with each other. Using a hierarchical cluster analysis of correlations between timbral descriptors, they concluded that there were only about ten classes of independent descriptors.

The problem of timbral representation is very similar to that of color-space representations. In both cases, the perceptions (audio, color) need to be defined operationally in abstract spaces for their computation and operational management. Thus, there are 256 digital levels (0–255) per channel in an RGB configuration. By analogy, one could think of an analogous representation of the 96 monophonic musical sounds. This assumption is formally justified through Category Theory in abstract mathematics [29]. The color and audio categories form groupoids, where the colors (timbres) are objects and the color variations (timbre variations) are morphisms. The functors between them are induced by the continuous maps [30]. Hence, if the sounds constitute groupoids, all of their morphisms or forms of representation are equivalent, and consequently, the categories of musical sounds admit a representation through a vector space, where the functors are linear transformations and a Euclidean metric can be defined for the distances between points in this abstract space.

#### **3. Timbre Similarities in Musical Instruments**

Two databases, Tinysol [32] and Good-Sounds [33], were used for the study of timbral similarities. The first dataset contained 2478 samples in the WAV audio format, sampled at 44.1 kHz, with a single channel (mono), at a bit depth of 16, each containing a single musical note from 14 different instruments, played in the so-called "ordinary" style and in the absence of a mute. The second dataset (Good-Sounds) contained monophonic recordings of two kinds of exercises: single notes and scales, from 12 different instruments and four different microphones. For the instruments, the entire set of playable semitones in the instrument was recorded several times with different tonal characteristics: "good-sound", "bad", "scale-good", and "scale-bad", see [33] for details.

For this study, only monophonic sounds were analyzed using the FFT of the audio records for the two woodwind instruments common to the Tinysol and Good-Sounds databases: the Transverse Flute and the Clarinet. The analysis presented includes only the fourth octave of the equal temperament scale. These are the most typical types of musical scales in Western music culture and are also the ones used by the audio recordings of the datasets used in this work. We used the following nomenclature for each sound of that octave: C4, C#4, D4, D#4, E4, F4, F#4, G4, G#4, A4, A#4, and B4. From the Good-Sounds database, only single-note sound recordings labeled "good-sound" were used, together with the records in the database called "AKG" and "Neumann", across all of the dynamics (*p*, *mf*, *f*).

The general procedure is summarized in Figure 1. First, the fundamental frequencies and their corresponding timbral coefficients were obtained for each of the 240 sounds analyzed from the 2 databases (List 1 in Figure 1). With these data, a general dataframe was built. The mean of the standardized timbral coefficients was then calculated, grouping the data by instrument, note, and dynamics (List 2). Subsequently, the Euclidean distance for each sound was calculated considering the data in List 3, which was grouped by musical sound and the dynamics of the instrument, specifically, by Flute and Clarinet, 12 sounds of the fourth octave, and 3 dynamics (*p*, *mf*, *f*), for a total of 72 types of audio in 244 records of the dataset. When a test audio was incorporated, the software obtained the timbral vector "b" of that audio and identified, using List 2, the characteristic vector "a" corresponding to the said instrument, sound, and dynamics. We then calculated the Euclidean distance (d) between both. If the distance was statistically compatible (less than 2.4 times the SD), the test audio was considered to correspond to the sound, instrument, and dynamics of List 3, in the *i*th position, and the new audio was incorporated into the database. Otherwise, the software reported the Euclidean distance as the similarity weighting together with the timbre characteristics associated with the said audio.

**Figure 1.** Flowchart diagram of the algorithm to calculate distances and relations of timbral similarity.
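A minimal sketch of the acceptance step of this procedure follows, with illustrative names and the 2.4 × SD criterion from the text (the List 2/List 3 bookkeeping is omitted):

```python
import numpy as np

def classify_by_distance(b, references, labels, sd, k=2.4):
    """Match a test timbral vector b = (f0, A, S, H, M, MA, MC) against an
    (n, 7) array of reference vectors; accept the nearest reference only if
    the Euclidean distance is below k times the sample standard deviation."""
    d = np.linalg.norm(references - b, axis=1)   # distances to all references
    i = int(np.argmin(d))
    if d[i] < k * sd:
        return labels[i], d[i]   # accepted: sound, instrument and dynamics match
    return None, d[i]            # rejected: report the distance as a similarity weight
```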

#### **4. Results**

#### *4.1. Variations Due to the Musical Instrument*

Following the previous procedure, for the Clarinet and Flute reference audios of the 4th octave and a dynamics of mezzo-forte for both databases, we obtained the average distances in the seven-dimensional timbral space between the positions of each sound with respect to all of the others (Figure 2). It was observed that the minimum distance occurs precisely for the correspondence between the sounds (diagonal elements), and in all cases, it is statistically discernible for a normal distribution (less than 2.4 times the standard deviation). In addition, the distance of any sound of the Clarinet with respect to those of the Flute, and reciprocally any of the sounds of the Flute with respect to those of the Clarinet (matrix sub-blocks without color), is greater than those corresponding to the distance between sounds of the same instrument (matrix sub-blocks in color green and violet).

The representation of the audio records by means of the timbral-coefficient vectors defines a timbral space, where the distance between points is a measure of their timbral proximity. The distances between any two audio records can then be represented in a matrix (Figure 2). To facilitate its reading, a color scale is included, distinguishing the distances that are statistically significant (blue) from those that are not (red), using as a criterion the value of 2.4 times the standard deviation in a normal distribution.

**Figure 2.** Patterned distance between musical sounds for Clarinet and Flute in mezzo-forte for the 4th octave, reference sounds in the data set.

The Good-Sounds database contains several registers considered "spurious" (called P1 and P2 in Figure 3) in addition to sounds selected as standards for each instrument and dynamics (called Neu1 and AKG1 in Figure 3). The spurious registers present variations of the ratio d/d_mean that are significantly higher with respect to the reference audios for all musical sounds of the 4th octave; this variation of the P1 and P2 audios occurs randomly in both series and in both instruments. The procedure outlined in Figure 1, when going through the records of these audios, incorporates the Neu1 and AKG1 audios into the database, while the P1 and P2 audios are discarded because they are incompatible with the registered standard values (see Figure 3).

**Figure 3.** Patterned distance between musical sounds in mezzo-forte for the 4th octave, reference sounds in the dataset: (**a**) Clarinet; (**b**) Flute.

#### *4.2. Variations Due to Musical Dynamics*

When variations of the dynamics are considered for the sounds of the fourth octave, the minimum distance again occurs precisely at the correspondence between identical sounds (diagonal elements). In Figure 4 (Clarinet, up; Flute, down), it can be seen that, in each row, the minimum distance occurs for the corresponding sound on the musical and dynamic scale. We also noted that the distances for the mezzo-forte dynamics are always smaller than those of the adjacent sounds (by rows or columns) for all musical notes and both instruments.

There does not seem to be a systematic d/d\_mean pattern across the various dynamics for the Clarinet. For the Flute, the fortissimo dynamic (Figure 4) is always close to the value of d\_mean for each musical sound. This may be because the monophonic sounds of the Flute are well defined by the performer when the air pressure within the resonant cavity is at its maximum, and because this dynamic is relatively easy to play. Thus, for the Flute sounds in fortissimo, the execution is very similar across different performers and the dispersion of distance values in the registers is smaller; that is, the standard deviation of the sample is very small (d/d\_mean varied by less than a tenth of the standard deviation).

**Figure 4.** Patterned distance between musical sounds for Clarinet (**up**) and Transverse Flute (**down**) in several dynamics (*p*, *mf*, *f*) for the 4th octave, reference sounds in the dataset.

#### *4.3. Variations Due to the TinySOL and GoodSounds Databases*

The classification of the timbre of musical instruments in the proposed seven-dimensional space depends critically on the standardization of the real audios taken as reference. For this reason, it is important to ensure the robustness and reliability of the reference audio records. The databases used are, as already mentioned, reliable [32,33]; the standardized distances of the records for Clarinet and Flute are shown in Figures 5 and 6, respectively, grouped by dataset type (TinySOL, GoodSounds-Neumann, and GoodSounds-AKG). Figures 5 and 6 show that the audio records are within the reliability radius of the normal distribution, with the separation from the mean value less than 2.2 times the standard deviation.


**Figure 5.** Patterned distance between musical sounds for Clarinet to reference sounds, according to several data sets. The circles of colors represent the shortest distance between the three dynamics.

The diagonal sub-blocks of the matrix indicate the correspondence with the expected values of the d/d\_mean ratio. The minimum value in each data set has been highlighted. For the Clarinet, the GoodSounds-Neumann audio recordings are closer to the mean value and better discriminate the smallest distance from the distances of the other sounds (minimum value in each row, highlighted in dotted ovals). Figure 6 shows the comparison of the Transverse Flute data sets. The GoodSounds database provides two sets of records, Neumann-type and AKG-type.

For some sounds, the distance closest to the mean value belongs to different recordings with no apparent systematic variation. However, the frequency with which a given database provides a weighted distance (d/d\_mean) closest to the mean value could be used as a quantitative evaluation criterion for various sound libraries.


**Figure 6.** Patterned distance between musical sounds for Flute to reference sounds, according to several data sets. The circles of colors represent the shortest distance between the sound libraries.

#### *4.4. Timbral Similarity between Clarinet and Transverse Flute*

For the common tessitura of the Clarinet and Flute in mezzo-forte dynamics, the timbral distance results (Figure 2) can be used to find those sounds generated by different instruments whose separation in the timbral space is statistically significant (less than 2.4 times the average distance), in accordance with the machine-learning algorithm presented in Figure 1. Table 1 shows the distance values between similar sounds for the fourth octave. Note that the minimum distance corresponds to the diagonal elements, as expected. However, there are sounds of different instruments whose distance is also significant, being less than 2.4 times the average distance. This suggests that they are timbrally related, that is, that the FFTs of these sounds should be similar in terms of the number of harmonics, envelopes, and distribution of partial frequencies. Indeed, Figure 7 shows the FFTs, where their similarity can be seen.


**Table 1.** Average distance in the timbral space of the sounds A#4 and B4 for Flute and Clarinet.

**Figure 7.** Comparison of the normalized FFTs of the sounds A#4 and B4 (rows) for Flute and Clarinet (columns). The intensities are normalized with respect to the amplitude (*a*0) of the fundamental frequency.
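As a rough illustration of the comparison in Figure 7, the sketch below computes a magnitude spectrum normalized to the fundamental amplitude *a*0. It is a sketch under the assumption that the fundamental is the strongest partial, which typically holds for these monophonic recordings; the function name and windowing are our choices, not the authors' code.

```python
import numpy as np

def normalized_spectrum(signal, fs):
    """Frequencies and FFT magnitudes normalized to the fundamental amplitude a0."""
    windowed = signal * np.hanning(len(signal))       # reduce spectral leakage
    spec = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    a0 = spec.max()   # fundamental amplitude, assuming it is the strongest partial
    return freqs, spec / a0
```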

#### **5. Conclusions**

The septuple made up of the fundamental frequency and the six timbral coefficients of each musical sound unambiguously defines a collection of points; therefore, formally (in the sense of Category Theory), a timbral space can be devised to represent the sounds. In such a space, the subsets of musical sounds are groupoids related to each other by morphisms. This suggests that, for each musical instrument, a given dynamics and musical sound can be represented by a single characteristic vector (*f*0, *A*, *S*, *H*, *M*, *AM*, *CM*) containing the significant timbral characteristics. The real audio recordings then constitute statistical variations around it, due to randomness in the execution of the musical sound by the performer, to specificities of the musical instrument with which the audio was made (model and manufacturer, build quality, materials used, imperfections of its acoustics, etc.), and even to the recording equipment. Hence, the audio sets for a given musical instrument, dynamics, and sound cover a spatial region around the characteristic timbral vector.

In this work, we were able to determine the timbral variations of the following audio categories:


We found that, for all the case studies, the smallest distances always occur between the diagonal elements. In the case of timbral variations by dynamics, we found that, for most of the sounds, the pianissimo and mezzo-forte dynamics have the smallest distances. This is related to the acoustic properties of the instrument and to the difficulty of air control in the fortissimo dynamic for the performers. Regarding the analyzed databases, we found that the GoodSounds-Neumann database was the one with the lowest distance values, suggesting that it is the more reliable database for the analysis of the timbral properties of instruments.

For the study of the timbral similarities between audio recordings, we proposed an algorithm (Figure 1) that evaluates such similarities through Euclidean distances in the abstract seven-dimensional space. This allowed us to find which FFTs are similar across different instruments (Section 4.4). For the two instruments in this study, statistically significant pairs of sounds were found, their distance being less than 2.4 times the average distance. This suggests that these sounds (Table 1) are timbrally related, that is, that their FFTs are similar in terms of the number of harmonics, envelope, and distribution of partial frequencies.

We plan to investigate different machine learning algorithms for future work, as well as different measures of distance.

**Author Contributions:** Conceptualization, Y.G. and R.C.P.; methodology, Y.G. and R.C.P.; software, Y.G. and R.C.P.; validation, Y.G. and R.C.P.; formal analysis, Y.G. and R.C.P.; investigation, Y.G. and R.C.P.; resources, Y.G. and R.C.P.; data curation, Y.G. and R.C.P.; writing—original draft preparation, Y.G.; writing—review and editing, Y.G. and R.C.P.; visualization, Y.G.; supervision, R.C.P.; project administration, Y.G. and R.C.P.; funding acquisition, Y.G. and R.C.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal deNível Superior—Brasil (CAPES)—Finance Code 001.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The sounds used in this work are available at the following link: https://zenodo.org/record/3685367#.XnFp5i2h1IU, accessed on 15 May 2022.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Tensor CSRMT System with Horizontal Electrical Dipole Sources and Prospects of Its Application in Arctic Permafrost Regions**

**Alexander K. Saraev \*, Arseny A. Shlykov and Nikita Yu. Bobrov**

Institute of Earth Sciences, St. Petersburg State University, 199034 St. Petersburg, Russia

**\*** Correspondence: a.saraev@spbu.ru; Tel.: +7-921-799-43-16

**Abstract:** When studying horizontally-inhomogeneous media, it is necessary to apply tensor modifications of electromagnetic soundings. Use of tensor measurements is of particular relevance in near-surface electrical prospecting because the upper part of the geological section is usually more heterogeneous than the deep strata. In the Enviro-MT system designed for the controlled-source radiomagnetotelluric (CSRMT) sounding method, two mutually perpendicular horizontal magnetic dipoles (two vertical loops) are used for tensor measurements. We propose a variant of the CSRMT method with two horizontal electrical dipole sources (two transmitter lines). The advantage of such sources is an extended frequency range of 1–1000 kHz in comparison with 1–12 kHz of the Enviro-MT system, greater operational distance (up to 3–4 km compared to 600–800 m), and the ability to measure the signal at the fundamental frequency and its subharmonics. To implement tensor measurements with the equipment of the CSRMT method described in the paper, a technique of creating a time-varying polarization of the electromagnetic field (rotating field) has been developed based on the use of two transmitters with slightly different current frequencies and two mutually-perpendicular transmitter lines grounded at the ends. In this way, we made it possible to change the direction of the electrical and magnetic field polarization continuously. This approach allows realization of the technique of tensor measurements using the new modification of the CSRMT method. In permafrost areas, the hydrogenic taliks are widespread. These local objects are important in the context of study of environmental changes in the Arctic and can be successfully explored by the tensor CSRMT method. For the numerical modeling, a 2D model of the talik was used. Results of the interpretation of synthetic data showed the advantage of bimodal inversion using CSRMT curves of both TM and TE modes compared to separate inversion of TM and TE curves. These new data demonstrate the prospects of the tensor CSRMT method in the study of permafrost regions. The problems that can be solved using the CSRMT method in the Arctic permafrost regions are discussed.

**Keywords:** electromagnetic soundings; controlled source; radio magnetotellurics; polarization of electromagnetic field; tensor measurements; permafrost; hydrogenic taliks

Received: 24 November 2022 Revised: 1 February 2023 Accepted: 7 February 2023 Published: 9 February 2023

Academic Editor: Antonio Gil Bravo

**Citation:** Saraev, A.K.; Shlykov, A.A.; Bobrov, N.Y. Tensor CSRMT System with Horizontal Electrical Dipole Sources and Prospects of Its Application in Arctic Permafrost Regions. *Eng* **2023**, *4*, 569–580. https://doi.org/10.3390/eng4010034

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

### **1. Introduction**

Horizontally-inhomogeneous media are challenging objects for electrical prospecting. To obtain reliable data, it is necessary to use tensor modifications of the prospecting methods. Methods for measuring and interpreting tensor data are developed in detail for the magnetotelluric (MT) sounding method, operating in the frequency range 0.001–10 Hz, and for the audiomagnetotelluric (AMT) sounding method, frequency range 1–10,000 Hz, both based on the use of natural electromagnetic fields. Using multidirectional measuring arrays, recording the horizontal components of the electrical field Ex and Ey and the magnetic field Hx and Hy, as well as the vertical component of the magnetic field Hz, and using the tensor processing and interpretation of sounding data, reliable information can be obtained about the structure of horizontally inhomogeneous media [1,2].

The tensor version of the controlled-source audio magnetotelluric (CSAMT) sounding method, frequency range 0.1–10,000 Hz, is used in areas with complex geology. Two differently-directed transmitter lines (placed nearby or separated) are used, and the measurements of electrical and magnetic field components are done in the same way as in MT and AMT methods [3]. If information about the predominant strike of geological structures at the studied site is available, orienting the transmitter lines parallel and perpendicular to the strike simplifies the interpretation, since measurement data are directly related to the transverse-electric (TE) and transverse-magnetic (TM) modes of the electromagnetic field. The CSAMT method is more efficient than MT and AMT methods when the level of industrial noise is high, as well as when it is necessary to obtain high-quality data in the MT "dead band" (frequency range 0.5–7 Hz) and AMT "dead band" (frequency range 800–5000 Hz) [4].

The features of horizontally-inhomogeneous media are of particular relevance when interpreting data produced by near-surface electrical prospecting methods, since the upper part of a section is usually more heterogeneous than the deep horizons. This is especially true for permafrost regions. The structure of permafrost areas is characterized by a significant heterogeneity, including rock properties changing dramatically both vertically and horizontally. Areas of thawed rocks alternate with ice-rich sediments. In this case, the cryogenic boundaries in a section may not coincide with the geological boundaries. Underground ice and cryopegs create large contrasts in electrical properties.

Currently, processes of permafrost degradation occur in large areas, leading to an increase in the heterogeneity of frozen strata. The production of mineral resources in the Arctic regions and the construction of infrastructure require considerable year-round engineering and geophysical research. Electrical-prospecting methods are widely used in permafrost regions, since the electrical properties of frozen sediments are a sensitive indicator of their state. The direct current (DC) methods, such as vertical electrical sounding (VES) and electrical resistivity tomography (ERT), are the most commonly used in these studies [5–9]. Taking into account the horizontally-heterogeneous structure of frozen strata, for engineering surveys in the permafrost regions, the VES method is often used not in a standard version with a single array, but in the two-component modification (MTC) with two differently-directed lines [10]. However, one of the main disadvantages of direct-current methods is the necessity of using grounded electrodes, which imposes seasonal restrictions on fieldwork and significantly slows the survey speed [8].

Electromagnetic methods are more promising for year-round surveys in the permafrost zone, but they are not as widely used as direct-current methods, despite having a number of advantages: higher survey speed, the ability to work at any season of the year with ungrounded electrical sensors, and the ability to study geological sections under high-resistivity screens. Among electromagnetic methods, transient electromagnetics (TEM) [11–15] and ground penetrating radar (GPR) [16–20] are most often used on permafrost. TEM has always been considered as a relatively deep geophysical technique compared to DC methods. In particular, it was proposed to be used for mapping the base of permafrost [21]. When studying the near-surface part of the section, TEM has a limitation regarding the minimum depth of research ("dead zone") associated with the influence of interfering transients when the current is turned off in the transmitter loop [22–24]. GPR is successfully used on glaciers and solves a number of problems in the study of near-surface horizons in permafrost areas. However, the penetration depth of GPR reduces to the uppermost few meters when clay sediments are present in a section. In addition, GPR data can be adversely influenced by multiple reflections from ice lenses and other heterogeneities in the upper part of the section.

For shallow-depth studies down to 30–50 m, the tensor measurement technique is implemented in radio-magnetotelluric (RMT) sounding, based on the registration of radio-transmitter signals in the frequency range of 10 kHz to 250–1000 kHz [25], which covers very low frequency 10–30 kHz (VLF), low frequency 30–300 kHz (LF), and medium frequency 300–1000 kHz (MF) signals. However, in remote polar regions, the spectra of the fields of LF and MF radio transmitters are quite limited. For these conditions, a tensor variant of the controlled-source radiomagnetotelluric (CSRMT) sounding method has been developed, which expands the capabilities of the RMT method. The first Enviro-MT system implementing this method was built at the University of Uppsala, Sweden [26]. Two mutually-perpendicular horizontal magnetic dipoles (HMD, two vertical loops) are used as controlled sources. Enviro-MT equipment has been widely applied to a variety of problems in near-surface geophysics [27–30]. A disadvantage of this technology is that it is difficult to generate large moments (the product of the current strength and the area of the loop) of the vertical loop source, reducing the amplitude of the response signal and thus limiting the area of survey without moving the source.

St. Petersburg State University, Russia, and the University of Cologne, Germany, have developed a variant of CSRMT method when two mutually-perpendicular horizontal electrical dipoles (HED, two transmitter lines) are used as sources for tensor measurements [31–33]. It is much easier to generate a large moment with such a source, which in this case is the current strength multiplied by the line length.

One of the problems of current interest for the CSRMT method in permafrost regions is detection and contouring of hydrogenic taliks. These objects (unfrozen rock mass surrounded by permafrost) are usually formed as a result of the warming effect of water reservoirs and streams on the underlying ground [34]. Despite their wide distribution in the permafrost, taliks are still relatively poorly studied. Often there is no reliable information even about their thickness [14,35]. Meanwhile, taliks are of interest as indicators of climate change in the Arctic and may serve as a potential source for the water supply in the permafrost areas.

Here, we describe features of the tensor version of the CSRMT method using the developed equipment with the HED sources, and present the results of numerical modeling applied to the problem of identifying and contouring hydrogenic taliks in the permafrost zone.

#### **2. Radiomagnetotelluric and Controlled Source Radiomagnetotelluric Methods**

The RMT method uses the electromagnetic fields of remote radio transmitters. The primary field of a radio transmitter is a linearly-polarized wave, in which, at the earth-air interface, the horizontal components of the electrical and magnetic fields are mutually perpendicular. Values of the complex surface impedance (*Z*) are determined by measuring the horizontal components of the electrical (*E*x) and magnetic (*H*y) fields:

$$Z = E_x / H_y. \tag{1}$$

*Z* is then transformed into the apparent resistivity

$$\rho_a = (1/\omega \mu_0) \cdot |Z|^2, \tag{2}$$

where *ω* = 2*πf*, *f* is the frequency in Hz, and *μ*<sub>0</sub> = 4*π*·10<sup>−7</sup> H/m is the vacuum magnetic permeability (magnetic constant).

The impedance phase (the phase difference between the *E*x and *H*y components) is determined as:

$$\varphi_Z = \varphi_{E_x} - \varphi_{H_y}. \tag{3}$$

The sounding curves are the frequency dependences of *ρ*a and *ϕ*Z; their inversion yields the resistivity section at the observation point. At a distance of several kilometers from a radio transmitter, the measured impedance coincides with the impedance of the vertically-incident plane wave, which depends only on the structure and properties of the underlying half-space. For the plane-wave model, interpretation methods have been developed in detail to ensure reliable sounding results [1,2].
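For illustration, Equations (1)–(3) translate directly into code. The sketch below assumes complex spectral values of the field components at a single frequency; it is a sketch only, not part of the authors' processing software.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum magnetic permeability, H/m

def rmt_transfer(Ex, Hy, f):
    """Apparent resistivity (Ohm*m) and impedance phase (deg), Eqs. (1)-(3).

    Ex, Hy: complex spectral values of the horizontal components (V/m, A/m)
    f: frequency in Hz
    """
    Z = Ex / Hy                                       # Eq. (1)
    rho_a = abs(Z) ** 2 / (2.0 * np.pi * f * MU0)     # Eq. (2), omega = 2*pi*f
    phi_z = np.degrees(np.angle(Ex) - np.angle(Hy))   # Eq. (3)
    return rho_a, phi_z
```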

The RMT sounding method is most effectively used in populated regions, where it is possible to measure the signals of VLF, LF, and MF radio transmitters in the full frequency range from 10 to 1000 kHz. Usually, the signals of 20–30 radio transmitters are measured with confidence, allowing both the acquisition of sounding curves suitable for the inversion and the derivation of resistivity sections [31]. In remote regions, such as permafrost areas in the Arctic, only VLF radio transmitter signals can be recorded, allowing use of the profiling technique only. In addition, existing radio transmitters operate at frequencies above 10 kHz, which limits the depth of investigation to 30–50 m, depending on the resistivity of the rocks.

To overcome limitations of the RMT method, the controlled source of the electromagnetic field is used in addition to measuring the signals from remote radio transmitters. Concerning the Enviro-MT equipment developed at the University of Uppsala, the frequency range was extended down to 1 kHz, increasing the depth of investigation to 100–150 m. Two mutually-perpendicular HMDs (two vertical loops) were used as controlled sources [26]. These sources have a number of advantages: absence of grounding, compactness and ease of installation, and the possibility of implementing tensor measurements with two differently-oriented HMDs. However, the working area of these sources is limited, measurements being possible at a distance from the source not exceeding 600–800 m. This leads to the need to move the source frequently. Furthermore, the Enviro-MT equipment measures the controlled-source field only in a narrow frequency range from 1 to 12 kHz. The need to tune the source in resonance with a load at each emitted frequency limits the survey productivity. The possibility of using subharmonics of the fundamental frequency has not been studied.

St. Petersburg State University, Russia, and the University of Cologne, Germany, jointly with St. Petersburg small enterprises Microkor LLC, Tenzor LLC, Magnetic Devices LLC, and with the Russian Institute for Power Radioengineering, have developed CSRMT equipment that includes a recorder and transmitter allowing operation of the CSRMT method in an extended frequency range of 1 kHz–1 MHz. The frequency range is extended by using as a source a transmitter line several hundred meters long grounded at the ends. This source has a wider working area than an HMD source. Measurements are possible at a distance from the source of up to 3–4 km. Along with the measurements of the signal at the fundamental frequency, signals of subharmonics are measured, increasing the survey productivity. On the other hand, the HMD source does not require grounding, which may be an advantage, particularly during fieldwork in winter.

First surveys by the CSRMT method with an HED source were carried out using the scalar technique. Significant experience has been gained in solving various problems of near-surface geophysics [31,32,36–39]. In recent years, the tensor variant of the CSRMT method, using two mutually perpendicular HEDs as electromagnetic field sources, has been developed. Examples of bimodal inversion of field data using the tensor modification of the CSRMT method have been reported in our papers [33,40].

#### **3. Equipment for the Controlled Source Radiomagnetotellurics**

The CSRMT hardware-software complex (Figure 1) includes a recorder with receiving electrical and magnetic antennas, a transmitter with electrical dipole sources, and data processing and interpretation software tools. The RMT-5 recorder [31] has five channels for synchronous measurements, with 16-bit ADCs in each channel (two electrical and three magnetic channels). The recorder frequency range is 1–1000 kHz, the built-in memory is 8 GB. The built-in display and keypad allow autonomous field work without having to connect to an external PC, and the built-in power supply provides an operation time of 6–8 h. Measured data are transferred to an external computer via an Ethernet channel. The GPS receiver records the survey coordinates and time. During operation, time series of magnetic and electrical field signals are recorded and stored in the built-in memory. The recorder operates in four frequency ranges: D1 (1–10 kHz, signal sampling frequency *fs* = 39 kHz), D2 (10–100 kHz, *fs* = 312 kHz), D3 (10–300 kHz, *fs* = 832 kHz), and D4 (100–1000 kHz, *fs* = 2496 kHz).

**Figure 1.** Recorder (**a**), magnetic sensors (**b**), and transmitter (**c**) of the CSRMT equipment.

Magnetic antennas have a frequency range of 1–1000 kHz, a self-noise level of 25 fT/√Hz, and a transfer factor of 20 mV/nT for converting magnetic induction into signal voltage. Electrical field measurements can be carried out with grounded and ungrounded (capacitive) receiving lines (2 × 10 m long wires), which allow one to work both in summer and in winter with ice and snow cover, as well as in conditions unfavorable for the grounding of electrical lines (asphalt, concrete, gravel). The compact size of the equipment allows for its use in areas with restricted access.

After the measurements, a fast Fourier transform of the time series of the electrical field components *E*x and *E*y (V/m) and the magnetic field components *H*x, *H*y, and *H*z (A/m) is performed, and auto-spectra and mutual spectra of the electrical and magnetic fields and their coherence are calculated. At a coherence level > 0.8, the data are considered suitable for further processing and are used to calculate the apparent resistivity and impedance phase.
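A minimal sketch of this coherence screening for one pair of channels is given below, assuming time series `ex` and `hy` sampled at `fs`; the function name and segment length are illustrative, and the actual processing chain operates on all five channels.

```python
from scipy.signal import coherence

def usable_frequencies(ex, hy, fs, nperseg=4096, threshold=0.8):
    """Frequencies at which the Ex-Hy magnitude-squared coherence exceeds 0.8."""
    f, coh = coherence(ex, hy, fs=fs, nperseg=nperseg)
    return f[coh > threshold]
```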

The GTS-1 transmitter for the CSRMT method is designed to excite rectangular bipolar pulses in the frequency range of 0.1 Hz–1 MHz, with an adjustable pulse-time ratio to a load with a resistance of 10–1000 Ω. The supply voltage is 220 V; power frequency is 50 Hz. Output voltage is up to 300 V, output current is from 100 mA to 7.5 A, and output power at a load of 100 Ω is up to 1 kW. The transmitter is operated from a control panel or remotely from an external computer.

#### **4. Measurement Technique**

When conducting the survey using the CSRMT method, we use an HED source, which is a cable with a length of 400 to 1000 m, grounded at the ends. As noted above, this source is more efficient for the CSRMT method than an HMD source, having a larger coverage area, a wider frequency range, and the ability to register both the main harmonic of the emitted signal and its subharmonics over a wide frequency range.

A finite-length cable source has long been used in the CSAMT method [3]. The high efficiency of this source is confirmed by many years of experience in applying the CSAMT method in different regions and with equipment from different manufacturers. The experience of using HED in the CSRMT method confirms the efficiency of this type of source.

In the surveys using the CSRMT method with HED, the measurement performance is significantly increased by measuring the signals at the main frequencies and their odd subharmonics. As shown in Figure 2, at the main signal frequency of 1 kHz, nine odd subharmonics with a coherence level higher than 0.8 are visible in the signal spectra for the electrical and magnetic channels. The spectra shown in this figure were obtained at a distance of 1 km from the HED source, with a transmitter-cable length of 200 m. To cover the full frequency range of 1–1000 kHz, three fundamental frequencies are usually used, each being accompanied by 8–12 subharmonics. As a result, a high measurement productivity of about 70–80 sounding stations per day is achieved.

**Figure 2.** Autospectra of electrical E1 and magnetic H1 field signals from an HED source and radio transmitters in the frequency range of 1–100 kHz. Directions of E1 and H1 are mutually perpendicular.

When using the CSRMT method with electrical transmitter lines, electrodes are required for the groundings. Owing to the large operational distance (up to 3–4 km) at which the electromagnetic fields of the HED source can be measured, the installation of the transmitter lines is usually performed once per several days or weeks of surveying. In the winter season, we use several (up to 10–15) electrodes for each grounding, and the grounding resistance usually does not exceed 100–200 Ω.

For controlled sources including HED, three electromagnetic field zones are commonly introduced: near-field, transition, and far-field zone [3]. The near-field zone corresponds to the condition |*k*|*r* << 1, where *k* is the wave number of the earth, and *r* is the distance from the source to the observation point. For the case of low frequency (quasi-stationary approximation), the near-field zone is introduced via the skin layer thickness *d*:

$$d \approx 503 \, (\rho/f)^{1/2}, \tag{4}$$

where *ρ* is the resistivity in Ω·m and *f* is the frequency in Hz.

In this case, the above condition for the near-field zone corresponds to the ratio *r*/*d* < 0.5. In the near-field zone, the alternating electromagnetic field behaves like the DC field. Components of the electrical field depend on the resistivity of the rocks, but do not depend on the frequency. Components of the magnetic field do not depend on either the frequency or the resistivity of rocks. Therefore, near-field impedance measurements cannot be used for frequency soundings. In the transition zone at |*k*|*r* ≈ 1 or *r*/*d* ≈ 1, the components of the electromagnetic field depend on both the frequency and the coordinates of the observation point. In the far-field zone, with |*k*|*r* >> 1 or *r*/*d* > 4–5, the components of the electromagnetic field correspond to the model of a vertically-incident plane wave [3]. The components depend only on the field frequency and do not depend on the coordinates of the observation point.
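The zone classification above can be expressed compactly; the sketch below applies Equation (4) and the quoted r/d thresholds (using 4.5 as the midpoint of the stated 4–5 boundary, which is our assumption).

```python
def skin_depth(rho, f):
    """Skin depth d in meters, Eq. (4); rho in Ohm*m, f in Hz."""
    return 503.0 * (rho / f) ** 0.5

def field_zone(r, rho, f):
    """Classify the field zone at distance r (m) from the source by r/d."""
    ratio = r / skin_depth(rho, f)
    if ratio < 0.5:
        return "near-field"
    if ratio > 4.5:          # text gives r/d > 4-5; midpoint assumed here
        return "far-field"
    return "transition"

# Check from Section 6: rho = 1000 Ohm*m at f = 1 kHz gives d ~ 500 m, so a
# station 3 km from the source has r/d ~ 6 and lies in the far-field zone.
print(field_zone(3000.0, 1000.0, 1000.0))  # -> far-field
```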

Surveys by the CSRMT method are carried out both in the far-field and in the transition zones of the controlled source. It should be noted that measurements in the transition zone make it possible to determine the anisotropy parameters of rocks, such as horizontal and vertical resistivity and anisotropy coefficient [38].

#### **5. Tensor Measurements**

Tensor measurements require registration of an electromagnetic field of different polarizations [1,2]. To generate a field with time-varying polarization (rotating field), two alternately-operating and differently-oriented sources are commonly used [26,41]. In this case, the transmitter is connected in turn to one then to the other transmitter line. The field registered by the recorder has a different orientation at different times. Time series obtained at the same frequency but with different sources are processed as a single data set.

Apart from sequential connection of sources with the same current frequency, field rotation can be achieved by using currents with slightly different frequencies in each of two transmitters operating simultaneously. In this case, the period of change in the direction of the total field is inversely proportional to the difference between the current frequencies in the two transmitters. The total field (both electrical and magnetic) will be elliptically polarized, and the direction of polarization will continuously change in time.
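A small numerical sketch of this principle is given below, using the standard orientation formula for the polarization ellipse of two orthogonal sinusoids; the base frequencies match the field experiment described later in this section, while the amplitudes are illustrative placeholders.

```python
import numpy as np

f1, f2 = 500.0, 500.1   # Hz: two transmitters with a 0.02% frequency shift
a, b = 1.0, 0.7         # relative amplitudes of the two fields at the receiver
t = np.linspace(0.0, 30.0, 3001)
phi = 2.0 * np.pi * (f2 - f1) * t   # slowly varying phase difference

# Orientation of the polarization ellipse for Ex = a*cos(w*t), Ey = b*cos(w*t + phi):
# tan(2*theta) = 2*a*b*cos(phi) / (a**2 - b**2)
theta = 0.5 * np.degrees(np.arctan2(2 * a * b * np.cos(phi), a**2 - b**2))

print("beat period, s:", 1.0 / (f2 - f1))                   # -> 10.0
print("theta sweep, deg:", theta.min(), "to", theta.max())  # direction precesses
```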

This approach is effective at frequencies below 10 kHz. At high frequencies (above 10 kHz) used in the CSRMT method, the wavelength of the current in the transmitter line hundreds of meters long becomes comparable to or smaller than the length of the line itself. In this case, the amplitude and phase of the recorded field will be affected by the current distribution along the wire, which in turn will be determined by the distributed electrical parameters of the wire, such as linear resistance, capacitance, and inductance. The wave effects that arise in the wire will influence the direction of polarization of the electromagnetic field.

The layout of the field experiment conducted to analyze the polarization of the rotating field is presented in Figure 3. We used two transmitters and two mutually-perpendicular lines grounded at the ends. In the first experiment, each line was 200 m long; in the second experiment, the lines were 600 m long.

**Figure 3.** Layout of the field experiment: 1—600 m long lines; 2—200 m long lines; 3—measurement point. Polarization ellipse elements are schematically shown around the measurement point: *a*—major semi axis, *b*—minor semi axis, θ—angle of rotation of the semi-major axis.

Figure 4 shows dynamic spectra of the parameter Δθ*E*, which is a measure of the deviation of the polarization direction of the electrical field, characterized by angle θ, at a particular moment of time from the average direction of polarization over the entire measurement time. We measured the time series and estimated the directions of electrical field polarization for short realizations. Then we estimated an averaged direction for each transmitter frequency (odd harmonics of the main frequency) over the entire measurement time.

**Figure 4.** Dynamic spectrum of the Δθ*<sup>E</sup>* parameter for frequencies with a small shift in different transmitter lines: (**a**) 200 m long lines; (**b**) 600 m long lines. Base frequencies are 0.5000 kHz in the first transmitter and 0.5001 kHz in the second transmitter, respectively.

This figure shows how much the directions of the horizontal electrical field polarization deviate from a certain averaged direction during the measuring time, i.e., the spatial precession of the direction of the horizontal electrical field for different frequencies over time. A small frequency shift between the two transmitters, 0.02% for a base frequency of 0.5 kHz (0.5000 and 0.5001 kHz), produces oscillations in the orientation of the horizontal electrical field with an amplitude of ±15–20°. The period of oscillations for the 200-m transmitter lines varies from about 20 s at 1.5 kHz to about 4 s at 9.5 kHz (Figure 4a). For the 600-m transmitter lines, the main oscillation period is about 3 s for all frequencies in the considered range (Figure 4b). The same properties apply to the magnetic field. This is sufficient for use in tensor measurements.

#### **6. Modeling of Taliks in Permafrost Regions**

We present here an assessment of the possibility of detecting and contouring hydrogenic taliks based on the data of numerical modeling. A two-dimensional model of a sub-lake talik is assumed (Figure 5). Under the lake with depth ranging from 2 to 6.5 m and water resistivity of 120 Ω·m, there is a zone of thawed sedimentary rocks with a maximum thickness up to 18 m and resistivity of 30 Ω·m. Surrounding frozen rocks have resistivity of 1000 Ω·m. This model is representative for arctic thermokarst lakes [42–45].

**Figure 5.** Two-dimensional model of a sub-lake talik. The direction to the field generation system (Tx) is shown by an arrow in the upper left corner. The measurement stations (Rx) are marked by black dots along the surface.

The synthetic field generation system consists of two perpendicular HEDs. The first dipole is directed along the *x*-axis, and the position of the measurement profile corresponds to the equatorial area of the source. The second dipole is directed along the *y*-axis, and in this case the profile is in the axial area of the source.

The sounding stations are located along the profile with a separation of 10 m. The nearest station, with number 00, is located at a distance of 3 km from the source. In total, there are 31 sounding stations in a 300 m long profile.

The modeling has been performed for 18 frequencies evenly distributed on a logarithmic scale in the range of 1–1000 kHz. For the selected talik model and measurement geometry, the field of the dipole directed along the *x*-axis corresponds to the TE mode, and the field of the dipole directed along the *y*-axis corresponds to the TM mode. At the lowest frequency of 1 kHz, the thickness of the skin layer for a half-space with a resistivity of 1000 Ω·m is approximately 500 m. Therefore, we can assume that the position of all sounding stations on the profile corresponds to the ratio *r*/*d* = 6 or, in other words, to the source far-field zone [3].

To obtain synthetic 2D data in the source far-field zone, the MARE2DEM software [46] has been used. Two-dimensional inversion of synthetic curves of apparent resistivity and impedance phase has been carried out using the ZondMT2D software [47]. When only curves of the TM mode are used, an anomalous object, the talik, is identified, but its contour is not determined exactly (Figure 6a). From the inversion of the TE mode, the contour of the talik has been determined more accurately; however, its shape relative to the model is somewhat distorted and resistivity of the host medium under the talik is overestimated up to 5000–7000 Ω·m (Figure 6b). The most reliable result with good correspondence of the shape and properties of the anomalous object to the original model was obtained by bimodal inversion using CSRMT curves of TM and TE modes (Figure 6c).

**Figure 6.** Results of the 2D inversion of the synthetic CSRMT data in the plane-wave approximation. (**a**)—inversion of the TM mode; (**b**)—inversion of the TE mode; (**c**)—bimodal inversion. The black dashed line indicates the contour of the talik and the water body assumed in the model.

From the analysis of the experience in application of different electrical and electromagnetic methods and results of numerical modeling, we can conclude that the CSRMT method has good prospects for application in the Arctic regions to study the structure and properties of frozen rocks, determine the depth of seasonal thawing, monitor the dynamics of permafrost degradation, identify and contour lenses of underground ice and taliks, map ice wedges and cryopegs, and control soil and groundwater pollution in the sensitive permafrost ecosystem. Frozen strata are characterized by high spatial heterogeneity that can be better resolved by tensor soundings using both TM and TE modes of the electromagnetic field. Data obtained by the CSRMT method can be used during planning and construction of oil and gas facilities in permafrost regions, of the infrastructure of the ports at the Northern seas, and for tracing linear industrial objects in the North (oil and gas pipelines, power lines, railways and roads).

#### **7. Conclusions**

The features of the CSRMT tensor system using two differently-directed HEDs (transmitter lines) as sources are considered. Compared to a previous variant of the CSRMT method with two multidirectional HMDs (vertical loops), realized in the Enviro-MT system, the proposed technique has a number of advantages: an extended frequency range of 1–1000 kHz compared to 1–12 kHz of the old system, a greater operational distance (up to 3–4 km, compared to 600–800 m), and the ability to measure the signal of the main frequency and its subharmonics. Nevertheless, the HMD retains the advantage that loop sources do not require grounding, which is especially important during winter fieldwork.

Technical parameters of the developed CSRMT system, which is used for realization of tensor measurements, are described. To implement tensor measurements, a technique of creating time-varying polarization of electromagnetic field (rotating field) has been developed. It is based on the use of two mutually perpendicular HEDs and two transmitters simultaneously operating with a slight frequency shift (below 10 kHz) or at the same frequency (above 10 kHz). This technique makes it possible to obtain, within a shorter measurement time, results similar to the results of tensor measurements based on a sequential switching of two perpendicular HEDs, and to process data as a single time series.

The numerical modelling was performed for one of the common objects in permafrost areas, a sub-lake hydrogenic talik, which can be studied using the new tensor CSRMT system. We considered a 2D model of the talik and obtained synthetic CSRMT curves for the TM and TE modes. Upon inversion of the TM curves, the talik was identified, but its borders were not determined exactly. The inversion of the TE curves produced a more accurate contour of the talik; however, its shape was somewhat distorted compared to the model, and the resistivity of the host medium under the talik was overestimated. The most reliable result, with good correspondence of the shape and properties of the talik to the initial model, was obtained by bimodal inversion using curves of both TM and TE modes. The numerical modelling confirmed the expected conclusion about the necessity and relevance of CSRMT tensor measurements in permafrost areas.

According to the analysis of applications of different electrical and electromagnetic methods in permafrost regions, the prospects for the use of the CSRMT method are highlighted. The possibility of working with a bimodal electromagnetic field is an important advantage of CSRMT compared to other electrical prospecting methods employed in the Arctic.

**Author Contributions:** Conceptualization, A.K.S.; methodology, A.K.S.; software, A.A.S.; validation, A.K.S., A.A.S. and N.Y.B.; formal analysis, A.A.S.; investigation, A.K.S., A.A.S. and N.Y.B.; resources, A.K.S. and A.A.S.; data curation, A.A.S.; writing—original draft preparation, A.K.S.; writing—review and editing, N.Y.B.; visualization, A.K.S. and A.A.S.; supervision, A.K.S.; project administration, A.K.S. and N.Y.B.; funding acquisition, A.K.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Science Foundation, project No. 21-47-04401.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The datasets generated and analyzed in the current study are available from the corresponding author on reasonable request.

**Acknowledgments:** The presented results were obtained with the support of the Russian Science Foundation, project No. 21-47-04401, and the Research park of St. Petersburg State University "Center for Geo-Environmental Research and Modeling (GEOMODEL)".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Use of the Analytic Hierarchy Process Method in the Variety Selection Process for Sugarcane Planting**

**Luiza L. P. Schiavon \*, Pedro A. B. Lima \*, Antonio F. Crepaldi and Enzo B. Mariano**

Department of Production Engineering, School of Engineering of Bauru, Campus Bauru, São Paulo State University (UNESP), Bauru 17033-360, Brazil

**\*** Correspondence: luiza.schiavon@unesp.br (L.L.P.S.); pedro.ab.lima@unesp.br (P.A.B.L.)

**Abstract:** The sugar and alcohol sectors are dynamic as a result of climate alterations, the introduction of sugarcane varieties, and new technologies. Despite these factors, Brazil stands out as the main producer of sugarcane worldwide, being responsible for 45% of the production of fuel ethanol. Several varieties of sugarcane have been developed in the past few years to improve features of the plant. This, however, led to the challenge of which variety producers should choose to plant on their property. In order to support this process, this research aims to test the application of the analytic hierarchy process (AHP) method to support producers in selecting which sugarcane variety to plant on their property. To achieve this goal, the research relied on a single case study performed on a rural property located in the interior of São Paulo state, the main producing state in Brazil. The results demonstrate the feasibility of the approach used, owing in particular to the adaptability of the AHP method.

**Keywords:** multicriteria method; decision-making process; sugarcane; analytic hierarchy process

#### **1. Introduction**

Sugarcane is an important commodity for several developing countries' economies, such as China [1], India [2], Belize [3], and Brazil—the main sugarcane producer in the world. Brazil started producing sugarcane in the 16th century, still in the colonial age, and around the 17th century, the country became the major sugarcane producer worldwide [4]. Nowadays, Brazil is responsible for producing 45% of the ethanol used for fuel in the world. The country is also one of the major exporters of sugar. São Paulo state is the main producer in the country, accounting for 53.7% of the Brazilian sugarcane in the 2019/2020 harvest, which yielded 29.03 million tons of sugar and 35.5 billion liters of ethanol. Moreover, the sector represented 26.89% (U.S. \$4.07 billion) of the state's exports [5]. The northeastern region of the state received the greatest increase in sugarcane production, which took place in areas that previously held cattle and other agricultural production [6]. Other states in the south-central region of the country also experienced a similar pattern to São Paulo, although of a lower magnitude [7].

Among the reasons for the expansion of Brazilian production of ethanol from sugarcane was the introduction of flex-fuel engines in the internal market, which provides the ability to use any combination of ethanol and gasoline in vehicles [6,8]. The global demand for less environmentally harmful fuels played a significant part in this process [8], motivating research on several alternatives for replacing fossil fuels in the agriculture field, e.g., producing biomass from agricultural residues [9]. Brazilian governmental and industrial stakeholders took advantage of these positive scenarios and worked for the development of sugarcane production in the country. The substantial production of sugarcane in the São Paulo state is related to several factors, such as the presence of a large amount of land with appropriate quality for sugarcane, the best infrastructure in the country, and a regional system of innovation to support the development of production [10]. An expansive picture of Brazilian sugarcane-based ethanol production can be found in [11].

**Citation:** Schiavon, L.L.P.; Lima, P.A.B.; Crepaldi, A.F.; Mariano, E.B. Use of the Analytic Hierarchy Process Method in the Variety Selection Process for Sugarcane Planting. *Eng* **2023**, *4*, 602–614. https://doi.org/10.3390/eng4010036

Academic Editor: Antonio Gil Bravo

Received: 29 November 2022 Revised: 10 February 2023 Accepted: 13 February 2023 Published: 15 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Sugarcane cultivation and processing plants have a significant impact on the Brazilian socio-economic structure, being a source of employment and income generation for several municipalities [12]. With the significant reduction in the pre-harvest burning technique [6], bioethanol production from sugarcane can also have positive outcomes for environmental sustainability, especially related to reducing CO2 emission compared with using fossil fuels (see [13] for a review about sugarcane production and sustainability in Brazil). Recent studies show new opportunities to increase environmental sustainability in the sector, such as with circular economy practices [14], which is a concept that can contribute to achieving sustainable and human development [15].

Different varieties of sugarcane present different features that affect the products made from sugarcane as well as the sugarcane growth and production itself [16]. Brazil has been experiencing a significant increase in the number of sugarcane varieties [17]. New varieties of sugarcane are useful for achieving higher efficiency (e.g., lower costs and higher productivity) according to different factors [10], such as the new intensive mechanization (as well as other management aspects) and new environments for plantation (including soil and climatic conditions) [18]. Notwithstanding all the apparent benefits, a massive number of options cannot be easily processed, resulting in difficulty in selecting the best choice [19]. The bounded rationality of individuals affects the efficiency of the decision-makers, including in the agriculture sector [20]. In other words, the higher the number of sugarcane varieties, the more complex the process to select the best sugarcane variety [21].

A decision-making process should be structured based on rules, methods, and procedures, and its goal should be the selection of the best-performing option, best expectation, or best evaluation among all the available choices [22]. Within the strategies for decision-making, multicriteria techniques are among the main approaches used, as they consider several factors—different conflicting criteria—related to the decision [23,24].

The use of multicriteria methods is a common approach in agriculture-related literature [14,25]. Within these applications, there is a branch of studies focused on variety selection using the analytic hierarchy process (AHP) method. AHP is a suitable approach to indicate preferences for different objectives [26]. For example, AHP has already been applied to support crop selection considering oilseed crops [27], as well as to select the best grape option for organic viticulture [28]. However, little research has relied on AHP to support the selection process of sugarcane variety (with the exception of [29]). Approaches to support variety selection can be quite useful for producers, especially in locations with low resources [30]. Thus, this research aims to test the application of the AHP method to support producers in selecting which sugarcane variety to plant on their property. In order to achieve this goal, the research relied on a single case study performed on a rural property located in São Paulo state, Brazil. Therefore, while the generalization of the approach used in this research can be made to other contexts, the specific outcomes (i.e., the variety selected) and the specific variables in the model (i.e., the varieties and their features) should be understood considering this case study. This is because of the external aspects that influence the sugarcane varieties, such as soil and climate. It is expected that the approach presented in this research can be especially useful for the decision-making processes related to variety selection faced by small and medium-sized rural producers.

After this introduction, Section 2 presents the materials and methods used in this research, describing the AHP method and the case study of this research. Next, Section 3 presents the results and discussions of the research's findings. Finally, Section 4 presents the conclusions.

#### **2. Materials and Methods**

*2.1. Analytic Hierarchy Process*

The analytic hierarchy process (AHP) was developed in the 1970s by Thomas L. Saaty and is a widely known multicriteria method used to support decision-making in problems with multiple criteria. It is based on the Newtonian and Cartesian approach of solving a problem by decomposing it into factors, which can in turn be decomposed into new factors down to the lowest level [31].

AHP is a method commonly applied to define weights for different criteria, being used in a diverse set of problems and fields, e.g., [27,28,32], mainly thanks to its robustness and simplicity [33]. These applications can be suitable for supporting the solution of simple issues involving only one person or of extremely complex situations involving several variables [33]. Generally, the AHP method follows four main steps: problem modeling, pairwise comparison, judgment scale, and priorities' derivation [34].

The hierarchical structure begins with an overall objective that descends to criteria and then alternatives [35]. The first level of the structure presents the general objective to be achieved. The second level indicates the criteria that contribute to reaching the general objective. The third level contains the decision alternatives for the problem [36] (Figure 1).

**Figure 1.** Representation of the basic hierarchical structure of the AHP method. Source: adapted from [37].

After the hierarchical model is developed to address the issue, the next step is the pairwise comparison of the elements at each level of the model by the decision-makers participating in the research. This step aims to obtain a weight factor for each element at a given level with respect to the element immediately above it. This process provides a measure of the relative importance of the element considered [36].

The priority definition should be based on the ability of the individuals to perceive the relationship between the objects, comparing the pairs in relation to a criterion or judgment. It is necessary to apply the following steps to achieve this [35,38]:


• Parity judgment: to judge pair-by-pair the elements in the hierarchy in relation to each element at the superior level, composing a judgment matrix A using the scale presented in Table 1. The number of judgments for the construction of a matrix A is n × (n − 1), where n is the number of elements.

**Table 1.** Saaty's fundamental scale for the AHP method.



#### *2.2. AHP Application*

In this research, the AHP method was used to support two medium producers of sugarcane located in the interior of São Paulo state, Brazil, in selecting varieties of sugarcane to be planted. Both are the owners and are responsible for decision-making on the property. Thus, the application of the AHP method can support them in selecting the best option according to their preferences regarding the sugarcane's features. The land has purple oxisol soil, also known as "purple land", a kind of reddish soil that is very fertile. The climate is high-altitude tropical, which is characterized by the concentration of rains in the summer and temperatures below 18 °C in the winter.

The São Paulo state (Figure 2) has an estimated population of 44.7 million inhabitants, being the most populous Brazilian state, representing 21% of Brazil's population. In 2019, the GDP achieved by the state was approximately USD 582.18 billion, representing 31% of the national GDP [39]. The state also has among the country's highest levels of human development and municipal environmental management practices [40].

**Figure 2.** Brazil and São Paulo state location.

For decision applications, the AHP method was carried out in two phases: (1) the hierarchic design and (2) the evaluation [33]. One of the ways to develop the hierarchic design—phase (1)—is by reaching a consensus in a group, with the presence of individuals with knowledge and experience in the analyzed field being recommended [33]. The hierarchical model should be "complex enough to capture the situation, but small and nimble enough to be sensitive to changes" [31] (p. 163). Thus, in order to decompose the problem into hierarchy elements and to establish the criteria to be evaluated, the first author arranged a meeting with an agronomist engineer. The interview with the agronomist engineer was used to understand the most relevant criteria to be considered for AHP and the best varieties of sugarcane to be included in the model. This study considered the following possible varieties as options to be selected: RB867515, RB966928, CT9001, and RB855156, which are the most cultivated varieties in the region according to the agronomist engineer. The selected criteria were as follows:


**Figure 3.** Representation of the basic hierarchical structure of sugarcane used in this research.

There is not a pre-established number of individuals that should be interviewed for the AHP method. In the agriculture-related literature, this number has ranged from large samples for complex issues, such as 60 [45] and 144 [46], to small numbers for decision-making problems that farmers apply in their own work context, such as 1 decision-maker [36] and 3 decision-makers [28]. This last context is the case of this research, as the main goal of the AHP application was to support the farmers in selecting which sugarcane variety to plant on their farm. Thus, for the evaluation phase (phase 2), the first author arranged individual meetings with two local farmers who are co-owners of a farm located in São Paulo's interior and have more than 50 years of experience in agricultural production. From now on, they will be called decision-maker 1 and decision-maker 2. This step was performed in order to conduct the paired comparison of each element on the hierarchical level, creating square judgment matrices. The paired comparisons used Saaty's fundamental scale for the AHP (Table 1). Next, the authors determined the degrees of preference for each criterion, developing five matrices that compare the degrees of intensity for pairs as a function of each characteristic, referencing the five criteria adopted. With the comparison matrices filled in, the authors created an algorithm in C and in MATLAB R2015a to implement the AHP method. The results were validated with the Super Decisions software. Next, the authors evaluated the consistency ratio (CR) of all hierarchies, dividing the consistency index (CI) by the random consistency index (RCI) obtained for a randomly generated matrix of order n with non-negative elements. The CR of the hierarchy must be less than or equal to 10%. The flowchart in Figure 4 presents all of the research steps.

**Figure 4.** Flowchart describing all of the steps of the AHP method applied in this research.
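The core of the procedure just described (column normalisation, row averaging, and the CI/CR consistency check) can be sketched in a few lines. The authors' own implementation was in C and MATLAB; the following Python sketch merely illustrates the standard procedure, with an illustrative 3 × 3 judgment matrix and Saaty's published RCI values.

```python
import numpy as np

# Saaty's random consistency index (RCI) for matrix orders 1..10
RCI = [0.00, 0.00, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]

def ahp_priorities(A):
    """Priority vector via column normalisation and row averaging,
    and the consistency ratio CR = CI / RCI (valid for n >= 3)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    w = (A / A.sum(axis=0)).mean(axis=1)   # normalise columns, average rows
    lam_max = ((A @ w) / w).mean()         # estimate of principal eigenvalue
    CI = (lam_max - n) / (n - 1)           # consistency index
    return w, CI / RCI[n - 1]              # CR must be <= 0.10

# Illustrative reciprocal judgment matrix on Saaty's 1-9 scale
A = [[1.0, 3.0, 5.0],
     [1/3, 1.0, 3.0],
     [1/5, 1/3, 1.0]]
w, CR = ahp_priorities(A)
print(np.round(w, 3), round(CR, 3))        # weights sum to 1; CR well below 0.10
```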

#### **3. Results**

Table 2 presents the prioritization of each alternative (variety of sugarcane) related to each criterion and the main focus of decision-maker 1 and decision-maker 2.

**Table 2.** Prioritization of each alternative of decision-maker 1 and decision-maker 2 for the variables considered in the AHP application.


With the application of the AHP method, after performing the matrices' normalization and the calculation of the average of each criterion, it was possible to determine the preference matrices, as presented in Table 3 for decision-maker 1 and decision-maker 2.


**Table 3.** Preference matrix of each variable and sugarcane variety for both decision-makers.

Regarding the sucrose accumulation criterion, both decision-makers indicated a preference for RB966928. However, while decision-maker 1 preferred CT9001 as the second-best variety, decision-maker 2 opted for the RB855156 variety. Regarding the criterion of ratoon sprouting, both decision-makers agreed that the best variety was RB966928 and the second preference was RB855156. Considering the criterion ton per hectare, decision-maker 1 elected CT9001 as the best choice, while decision-maker 2 opted for RB855156. For the longevity criterion, decision-maker 1 considered RB855156 as the best variety, followed by RB966928; decision-maker 2, however, considered the RB966928 variety as the best and RB855156 as the second preference. Finally, considering the soil requirement criterion, decision-maker 1 chose the RB855156 variety, while decision-maker 2 preferred the RB966928 variety.

Table 4 presents the criteria comparison matrix for decision-maker 1 and decision-maker 2. Both decision-makers provided similar judgments when comparing the criteria; most comparisons differed by at most two points of intensity. The main difference is that, while decision-maker 1 considered longevity to have moderate importance over ton per hectare, decision-maker 2 considered ton per hectare to have strong importance over longevity. In other words, when comparing only these two criteria, decision-maker 1 considered longevity more important, while decision-maker 2 considered ton per hectare more important. There was also a difference when comparing the sucrose accumulation criterion with the longevity criterion: while decision-maker 1 considered longevity to have very strong importance over sucrose accumulation, decision-maker 2 considered it to have only moderate importance.

**Table 4.** Criterion comparison matrix of each variable and sugarcane variety for both decision-makers.


Next, the authors normalized the comparison matrix of the criteria and calculated the average in order to achieve the final result, which is displayed in Table 5.

**Table 5.** Result of the AHP application with the preferences of both decision-makers for each sugarcane variety.


The results in Table 5 present the final quantification of each alternative according to the answers provided by decision-maker 1 and decision-maker 2. Considering decision-maker 1, 5.84% of the quantification was for selecting variety RB867515, 35.71% for choosing variety RB966928, 24.31% for variety CT9001, and 34.12% for variety RB855156. For decision-maker 2, 9.18% of the quantification was for selecting variety RB867515, 62.63% was for selecting variety RB966928, 10.60% was for selecting variety CT9001, and 17.57% was for selecting variety RB855156.

Comparing the final results of both decision-makers, it was possible to notice that, even though there were differences in the percentage for each variety, they presented the same ranking order. In this way, for both decision-makers, the best choice was the RB966928 variety, the second-best choice was RB855156, the third was CT9001, and the last was RB867515.

In order to verify the method's validity, the authors calculated the consistency ratio of the research's matrices, that is, they compared the consistency index with the random consistency index corresponding to the dimension of each matrix. As the consistency ratio of the hierarchy of all matrices was lower than 10%, the method can be considered valid. Therefore, the sugarcane variety RB966928, with numeric results of 35.71% and 62.63% for decision-makers 1 and 2, respectively, should be selected considering the pairwise comparisons provided and the verification of the matrices' coherence. After achieving the final results, the first author presented them to both producers, who reported that the variety RB966928 is usually the one that they prefer and the one they were thinking of planting in the next season. Therefore, the method presented in this research can also be applied to evaluate whether the variety selected by the producers is indeed the one that they believe is the best option. Future studies should include longitudinal and economic data in order to verify whether the selected option is indeed the one that presents the best economic outcomes for the producers.

The aggregation of individual judgments (AIJ) method was used in order to aggregate the results of decision-makers into a single group [47] (Table 6). It is noticeable that the decision-makers' ranking was maintained, that is, the selected variety was RB966928 with 49.28%, followed by the RB855156 variety with 27.57%, next was the CT9001 variety with 15.19%, and finally the RB867515 variety with 7.96%.

**Table 6.** Aggregated results for both decision-makers 1 and 2.
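The aggregation shown in Table 6 can be reproduced with the AIJ scheme of [47], conventionally realised as the element-wise geometric mean of the individual judgment matrices, which preserves reciprocity. A minimal sketch, assuming this standard formulation:

```python
import numpy as np

def aggregate_judgments(matrices):
    """AIJ: element-wise geometric mean of the individual pairwise
    comparison matrices; the result is again a reciprocal matrix."""
    stack = np.asarray(matrices, dtype=float)
    return stack.prod(axis=0) ** (1.0 / len(matrices))

# Two illustrative 2x2 judgment fragments from different decision-makers
group = aggregate_judgments([[[1.0, 3.0], [1/3, 1.0]],
                             [[1.0, 5.0], [1/5, 1.0]]])
print(group)  # off-diagonal entries are sqrt(3*5) and its reciprocal
```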


The final priority of the alternatives is mainly determined by the weights assigned to the main criteria. Therefore, small changes in the relative weights can lead to large changes in the final ranking. In this context, sensitivity analysis can be performed based on the scenarios they reflect, increasing or decreasing the weight of individual criteria, resulting in changes in priority and rank [48].

As a final analysis, the authors performed a sensitivity analysis of the chosen criteria. For decision-maker 1, when the weight of the potential for sucrose accumulation criterion was changed, the alternative to be chosen remained RB855156. When the weight of the ratoon sprouting criterion was below 53.05%, the selected variety was RB855156; when it was above 53.05%, the selected variety changed to RB966928. For the ton per hectare criterion, when its weight was at 0%, the selected varieties were RB855156 and CT9001; as the weight increased, the selected variety became RB966928, and at a weight of 36.65%, the chosen variety was CT9001. The chosen alternative was RB966928 when the longevity criterion had a weight lower than 22.59%; above this value, the selected variety was RB855156. On the other hand, when the weight of the soil requirement criterion was below 10.73%, the alternative chosen was RB966928; above this level, the RB855156 variety was chosen. The same analysis was performed for decision-maker 2; however, regardless of the weight of the criteria, the variety chosen was RB966928. These analyses indicate that, for decision-maker 1, the result is unstable mainly between the RB855156 and RB966928 varieties, which is justifiable because, as shown in Table 5, the difference between these two varieties is only about 1.59%. On the other hand, decision-maker 2's result is stable: the relative importance of the criteria can be changed without affecting the choice of sugarcane variety, proving it to be a robust choice and giving the decision-maker greater confidence in it.

#### **4. Conclusions**

The application of the analytic hierarchy process (AHP) to support a decision enables the analysis of all of the criteria and of the alternatives in light of each criterion. It can be considered that the goal of selecting the best variety of sugarcane was accomplished. The method used in this research is a supporting tool for the decision-making process, which does not diminish the farmer's role in it; he/she remains the decision element and the source of information for judging the values and constructing the hierarchical model. Moreover, the objective of the tool is to approach the selection process scientifically and to model, not remove, the subjectivity inherent to the decision-making process [29].

This research's importance is highlighted by applying a relatively simple method that can support farmers in selecting the best variety of sugarcane to plant. Another application of the method is to analyze whether farmers are already making the best choice for their farms. Future studies could rely on a larger sample in order to compare the varieties that farmers are planting with the choice identified as best by the AHP method. Considering theoretical implications, this research is important to increase the knowledge of the AHP's usefulness in the agricultural field.

Considering that the agricultural environment is very dynamic, owing to environmental changes and the introduction of new technologies and new varieties of sugarcane into the market, the application of the AHP proved to be adequate because of its capacity to adapt. Although there is no methodological issue in applying the AHP method with a sample of only two individuals, this can be considered one limitation of this research. Another limitation is that both farmers are from the same region of the state; therefore, future studies could compare the results of farmers in different locations and kinds of farms to better understand the issue. Finally, future studies should include longitudinal and economic data in order to perform further analyses and strengthen the outcomes of the method.

**Author Contributions:** Conceptualization, L.L.P.S. and P.A.B.L.; methodology, L.L.P.S.; software, L.L.P.S.; validation, A.F.C. and E.B.M.; formal analysis, L.L.P.S.; investigation, L.L.P.S.; resources, L.L.P.S.; data curation, L.L.P.S.; writing—original draft preparation, L.L.P.S. and P.A.B.L.; writing—review and editing, L.L.P.S., P.A.B.L., A.F.C. and E.B.M.; visualization, L.L.P.S. and P.A.B.L.; supervision, L.L.P.S. and P.A.B.L.; project administration, L.L.P.S. and P.A.B.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



### *Article* **Formalising Autonomous Construction Sites with the Help of Abstract Mathematics**

**Dmitrii Legatiuk <sup>1</sup> and Daniel Luckey <sup>2,</sup>\***


**Abstract:** With the rapid development of modern technologies, autonomous or robotic construction sites are becoming a new reality in civil engineering. Despite the various potential benefits of the automation of construction sites, there is still a lack of understanding of their complex nature, which combines physical and cyber components in one system. A typical approach to describing complex system structures is to use tools of abstract mathematics, which provide a high level of abstraction, allowing a formal description of the entire system while omitting non-essential details. Therefore, in this paper, autonomous construction is formalised using categorical ontology logs enhanced by abstract definitions of individual components of an autonomous construction system. In this context, following a brief introduction to category theory and ologs, exemplary algebraic definitions are given as a basis for the olog-based conceptual modelling of autonomous construction systems. As a result, any automated construction system can be described without providing exhaustive, detailed definitions of the system components. Existing ologs can be extended, contracted or revised to fit the given system or situation. To illustrate the descriptive capacity of ologs, a lattice of representations is presented. The main advantage of using the conceptual modelling approach presented in this paper is that any given real-world or engineering problem can be modelled with a mathematically sound background.

**Keywords:** modelling; abstract approach; formalisation; category theory; ontology logs; robotic construction; autonomous construction; conceptual modelling

#### **1. Introduction**

Civil engineering is widely considered very traditional, especially in comparison to other engineering disciplines, such as mechanical engineering, mainly because of the unique character of each structure. Even for standard residential buildings, the particular conditions on each construction site may require changes in the design and modelling of the building, which may affect the entire construction process. This one-of-a-kind production has long been a major obstacle to integrating modern technologies and automation in the field of civil engineering. Furthermore, the availability of cheap manual labour and the small and medium-sized enterprise structure of the construction sector have hindered advances in research and development. Recently, the integration of affordable yet highly flexible industrial robots into the digital design flow of architecture and construction has prompted a surge in robotic systems in construction [1].

The spectrum of automation in construction ranges from industrial and on-site prefabrication to autonomous on-site robots. However, for mobile in-situ construction robots, only prototypes have been presented so far; see, for example, works [2–4]. This is mainly because industrial robots are designed for repetitive tasks in controlled environments and are now used in manufacturing systems for different types of materials, for example, timber [5], masonry [6], and concrete [7]. In this regard, the transition to autonomous mobile on-site robots requires modifications to the robotic system, possibly adding wheels for movement, tracking or scanning devices for orientation and perception of obstacles, other robots or human co-workers. Furthermore, each construction site is unique, implying that a robot must adapt to sudden changes in environmental conditions.

As the success of robotic construction evidently depends on the interaction of robotic systems with the environment, it has been stated in [8] that design and construction systems have to be aligned to the capabilities of the robot arm and tolerances of the material, i.e., weight, friction, rigidity, as well as the robotic placement and potential connection systems. In addition to the one-of-a-kind character, structures are an aggregation of possibly hundreds of work steps for each individual part of the structure. From structural work to interior fitting, robotic systems have to be able to handle each step of the way and, therefore, need to interact or collaborate with the environment and other robots on the same construction site. Therefore, advanced feedback and localisation systems, e.g., external or integrated sensors, cameras or laser scanners, are required to cope with tolerances, uncertainties and possibly human interaction. To overcome these obstacles, a workflow and prototypes for autonomous construction sites have been presented in [4,9].

However, summarising current results related to robotic construction sites, it is noticeable that researchers solely present particular solutions aiming to address specific tasks. This approach results in somewhat similar yet different workflows, each highlighting the aspects of robotic construction sites relevant to a particular task. Furthermore, each prototype uses different tools and system components and is based on different types of robotic manipulators from various manufacturers, programmed with different types of code to control the robots, emphasising the lack of a general approach to modelling robotic construction sites that would be applicable across tasks. Therefore, this paper aims to provide a fundamental basis for the conceptual modelling of robotic construction sites based on mathematical abstractions and, in particular, category theory.

In recent years, several results related to the formal modelling of engineering systems have been presented. In particular, abstract approaches based on graph theory [10], abstract Hilbert spaces [11,12], relational algebra [13,14], predicate logic [15,16], type theory [17,18], and category theory [19–22] have been proposed. However, direct use of these results in the context of autonomous construction sites is difficult because of the coupled system robot-construction site, which requires conceptual modelling not only of a robot itself but also its surrounding and, in particular, kinematic constraints on robot movements. Therefore, to overcome this difficulty, the use of *categorical ontology logs*, or simply *ologs*, combined with an abstract algebraic approach is proposed in this paper.

Ologs were introduced by Spivak and Kent in [23] and are based on category theory, implying that ologs have a strong mathematical basis while providing the flexibility of general-purpose ontologies. In particular, ologs provide two distinct features making them very attractive for practical use: (i) a rigorous mathematical foundation, which allows ologs to be precisely connected and translated into one another via functors; and (ii) a human-readable structure that can directly serve as a database schema.
Although ologs have the obvious advantages discussed above, they also share an obstacle typical of all general-purpose ontologies: the subjective worldview of the ontology creator. Therefore, to overcome this obstacle, in this paper, we propose a slight modification of the concept of common ground presented in [23]. This modification is based on a two-step procedure: at first, formal definitions of individual components of an autonomous construction site, based on the abstract algebraic approach presented in [13], are introduced; after that, the abstract definitions are used as a common ground for all ologs describing an autonomous construction site. In this case, the subjectivity of the olog creator's worldview can be overcome, and thus, a formally sound lattice of ologs representing an autonomous construction site is obtained.

This paper aims to provide a basis for overcoming the previously mentioned issues in the conceptual modelling of automated construction sites by coupling ologs with abstract algebraic definitions. In this context, algebraic definitions are used as a common basis for olog-based conceptual modelling of autonomous construction systems. As a result, any automated construction system can be described without providing exhaustive, detailed definitions of the system components. Existing ologs can easily be extended, contracted or revised to fit the given system or situation, and these operations, e.g., revision, come with precise translation terminologies. To illustrate the capacity of ologs, a lattice of representations for automated construction sites is presented. The main advantage of using the conceptual modelling approach presented in this paper is that any given real-world scenario or engineering problem can be modelled with a mathematically sound background.

The paper is organised as follows: Section 2 provides a few basic facts about category theory and ologs; Section 3 introduces an abstract description of autonomous construction sites, used as the common ground for olog-based representation of advanced structures presented in Section 4; finally, a discussion on the results of the paper and remarks on further applications are provided in Section 5.

#### **2. Fundamentals of Category Theory and Ologs**

In this section, the concept of ologs for the purpose of conceptual modelling of real-life scenarios is described, following a basic introduction to category theory as the mathematical basis of ologs.

#### *2.1. Basics of Category Theory*

Ologs are based on category theory, and therefore, to support the reader in the upcoming discussion, a few basic definitions of category theory are provided in this section. Generally speaking, category theory can be seen as an abstract theory of functions studying different mathematical structures (objects) and relations between them [25]. A *category* is introduced via the following definition:

**Definition 1** ([25])**.** *A category consists of the following data:*

*(i) Objects: A, B, C, . . . ;*
*(ii) Arrows: f , g, h, . . . ;*
*(iii) For each arrow f , there are given objects* dom(*f*) *and* cod(*f*)*, called the domain and codomain of f ; the notation f* : *A* −→ *B indicates that A* = dom(*f*) *and B* = cod(*f*)*;*
*(iv) Given arrows f* : *A* −→ *B and g* : *B* −→ *C, there is given an arrow g* ◦ *f* : *A* −→ *C called the composite of f and g;*
*(v) For each object A, there is given an arrow* 1*<sup>A</sup>* : *A* −→ *A called the identity arrow of A.*

*These data are required to satisfy the following laws: associativity, h* ◦ (*g* ◦ *f*) = (*h* ◦ *g*) ◦ *f , and unit, f* ◦ 1*<sup>A</sup>* = *f* = 1*<sup>B</sup>* ◦ *f for all f* : *A* −→ *B.*

A category is anything satisfying this definition, and therefore, very general objects can be put together to form a category by specifying relations between objects via the arrows, which are sometimes called morphisms. This generality is the starting point for introducing ologs, as will be shown later. Mappings between different categories are introduced by the notion of a *functor*:

**Definition 2** ([25])**.** *A functor F* : **C** −→ **D** *between categories* **C** *and* **D** *is a mapping of objects to objects and arrows to arrows in such a way that:*

*(i) F*(*f* : *A* −→ *B*) = *F*(*f*) : *F*(*A*) −→ *F*(*B*)*;*
*(ii) F*(1*<sup>A</sup>*) = 1*<sup>F</sup>*(*A*)*;*
*(iii) F*(*g* ◦ *f*) = *F*(*g*) ◦ *F*(*f*)*.*

*That is, F respects domains and codomains, identity arrows, and composition. In other words, functors are structure-preserving mappings between categories.*
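For a computational view, the functor laws of Definition 2 can be checked mechanically on finite categories. The following Python sketch is purely illustrative (the categories and all names are invented for the example); it verifies conditions (i)-(iii) for a constant functor collapsing a two-object category onto a one-object category.

```python
# Category C: objects A, B; one non-identity arrow f: A -> B.
C = {
    "objects": ["A", "B"],
    "arrows": {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")},
    # composition table: (g, f) -> g o f, defined when cod(f) = dom(g)
    "compose": {("id_A", "id_A"): "id_A", ("id_B", "id_B"): "id_B",
                ("id_B", "f"): "f", ("f", "id_A"): "f"},
}
# Category D: a single object X with only its identity arrow.
D = {
    "objects": ["X"],
    "arrows": {"id_X": ("X", "X")},
    "compose": {("id_X", "id_X"): "id_X"},
}
# Functor F: C -> D as a mapping of objects and a mapping of arrows.
F_obj = {"A": "X", "B": "X"}
F_arr = {"id_A": "id_X", "id_B": "id_X", "f": "id_X"}

def is_functor(C, D, F_obj, F_arr):
    # (i) F respects domains and codomains
    for name, (dom, cod) in C["arrows"].items():
        if D["arrows"][F_arr[name]] != (F_obj[dom], F_obj[cod]):
            return False
    # (ii) F maps identity arrows to identity arrows
    if any(F_arr["id_" + o] != "id_" + F_obj[o] for o in C["objects"]):
        return False
    # (iii) F preserves composition
    return all(D["compose"][(F_arr[g], F_arr[f])] == F_arr[gf]
               for (g, f), gf in C["compose"].items())

print(is_functor(C, D, F_obj, F_arr))  # True
```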

#### *2.2. Introduction to Ologs*

Ologs, in general, as first introduced in [23], are intended to provide a framework for knowledge representation, in order to organise data and results, to make them comprehensible and comparable to other scientists. As stated by the name, **o**ntology **logs** are closely related to ontologies, which focus on defining what entities exist, thus consequently categorising entities and defining relationships between these categories. In engineering applications, ontologies are used to develop models of reality. Subsequently, ologs are intended to structure and represent the results of defining entities and modelling relationships between categories by recording them in a structure based on category theory.

However, as for every model, the structure is highly dependent on the subjective worldview of the creator(s). When creating ontologies, subjectivity should be eliminated as far as possible, as it may lead to information not being perceived by readers as intended by the creators. Hence, the olog framework acknowledges that the views of creators and readers may not correspond. Therefore, ologs do not attempt to accurately reflect reality but to be structurally sound and accurate with respect to the views of the creator. However, discrepancies in the views of different creators do not prevent ologs from being aligned and connected. Because of the strong mathematical basis provided by category theory, ologs can be linked and precisely connected by functors, which is the main advantage for conceptual modelling.

Functors allow ologs to be referenced by other authors and, in addition, extended, since any model, and thus any olog, needs to be extended in order to correctly represent new developments, features or different views. Moreover, the mapping of ologs by functors allows the generation of precise translation terminologies between models. Thus, as well as being represented as graphs, ologs can serve as database schemas that provide a human-readable interface, with the components of ologs representing tables and attributes, so that one system of tables can be translated into another. Therefore, the basic components and the respective graphic representation of ologs are presented in the following.

To keep the presentation short, the detailed discussion on the construction of ologs and their structure from [23,26] is compressed into one definition. More advanced concepts from olog theory will be discussed where they are directly used for the olog-based description of engineering systems. Additionally, to support the reader, the definition of ologs is placed in the engineering context. The following definition introduces ologs:

**Definition 3.** *An olog is a category, which has types as objects, aspects as arrows, and facts as commutative diagrams. The types, aspects, and facts are defined as follows:*

• *<sup>A</sup> type is an abstract concept represented as a box containing a singular indefinite noun phrase. Types are allowed to have compound structure, i.e., being composed of smaller units. The following boxes are types:*

*a triple* (*w*, *t*, *m*) *where w is a wall, t is a thickness of w, and m is a material of w*

• *Aspects are functional relationships between the types, represented by labelled arrows in ologs. Consider a functional relationship called f between types X and Y, denoted f* : *X* → *Y; then X is called the domain of definition for the aspect f , and Y is called the set of result values of f . Here are two examples of using aspects:*

• *Facts are commutative diagrams, i.e., graphs with some declared path equivalences, in ologs. Facts are constructed by composing several aspects and types.*

Facts, represented by commutative diagrams, have a crucial role in practical applications of ologs, because facts can be straightforwardly converted into databases of knowledge; see [23,26] again for a detailed discussion. Thus, ologs provide a general framework for knowledge representation supporting an easy integration into the engineering modelling process via the link to databases.
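As an illustration of this link, the compound type used as an example above (a wall with a thickness and a material) can be rendered as a small relational schema: each type becomes a table, and each aspect becomes a column or foreign key. The schema below is a hypothetical sketch, not taken from [23,26]:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- type: 'a material'
    CREATE TABLE material (id INTEGER PRIMARY KEY, name TEXT);
    -- type: 'a wall'; its aspects become columns of the table
    CREATE TABLE wall (
        id INTEGER PRIMARY KEY,
        thickness_mm REAL,                            -- aspect: has a thickness
        material_id INTEGER REFERENCES material(id)   -- aspect: is made of
    );
""")
con.execute("INSERT INTO material VALUES (1, 'concrete')")
con.execute("INSERT INTO wall VALUES (1, 240.0, 1)")
# Evaluating the aspect 'is made of' amounts to following the foreign key:
name, = con.execute("""SELECT m.name FROM wall w
                       JOIN material m ON w.material_id = m.id""").fetchone()
print(name)  # concrete
```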

With the definition of types, aspects, and facts, the main components of ologs have been introduced. However, as shown in Definition 3, the construction of ologs as graphs follows several rules intended to keep the system readable; for example, the declaration of a type should begin with "a" or "an", and that of an aspect with a verb. Detailed information on the construction of ologs can be found in [23].

#### **3. Abstract Description of Autonomous Construction Sites**

To enhance ologs with more objective constructions, it is necessary to provide a formal common ground for the development of ologs of an autonomous construction site. Therefore, this section provides abstract algebraic definitions of the essential parts constituting an autonomous construction site, such as the robot and the robotic environment. As a result of this section, an abstract framework for describing autonomous construction is created.

It is important to remark that existing definitions of a robot attempt to find a balance between being too vague and too specific, with a valid general definition of robots seemingly missing or still subject to debate, given the sheer number of robot variations. An overview of several varying robot definitions is given in [27]. Although this section takes steps in this direction, it does not aim to claim that the definitions provided below should be used as an industry standard. Additionally, it is worth remarking that it is certainly possible to connect existing definitions of a construction robot to the abstract constructions presented in this paper. However, this connection goes beyond the scope of the current paper and is therefore kept for future work.

Because of their predominant role in research and development, as stated above, robots are in the following considered to be industrial robots, with their components, structure, and operation described in [28]. The aim of this work is to illustrate how the coupling of the abstract algebraic approach and ologs can improve the conceptual modelling of autonomous construction sites.

In terms of abstract constructions, it is possible to follow either the top-to-bottom approach by first defining an autonomous construction site and then scaling it down to its components, or the bottom-to-top approach, by defining the components and then scaling them up to the autonomous construction site. For the purpose of this article, the first approach will be used. Therefore, we start with the following definition:

**Definition 4** (Autonomous construction site)**.** *An autonomous construction site or a robotic environment is the object* A = (T, R, H, E, G, B)*, where* T *is a tuple of tasks,* R *is an n-tuple of robots,* H *is a human–robot interaction,* E *is a set of pairs of environmental conditions,* G *is a 4-tuple of GPS information, and* B *is a base station.*


Let us discuss the role of each component from Definition 4 in more detail:


In summary, Definition 4 provides an abstract point of view on autonomous construction sites. This abstract point of view helps to "sieve out" all details that are not critical for the first stage of planning and designing an autonomous construction site.

Next, an abstract description of a robot needs to be introduced. It is also necessary to take into account that a robotic system can generally be subdivided into two parts: a physical part (the physical components of the system) and a logical part (control and communication signals). Hence, an abstract definition must also reflect this coupled nature of a robot. Therefore, the following definition is proposed:

**Definition 5** (Robot)**.** *A robot is the object* R = (C, K, P, S, Ac)*, where*


For providing a clear practical interpretation of this definition, let us now discuss the robot components individually:


the robotic system is the combination of internal and external axes determined by the kinematic chain. Based on the kinematic properties representing specific constraints, the robotic controller C is able to generate control signals for the robot to reach target coordinates in the determined work area. Additionally, note that, to make K consistent from the point of view of set theory, it is assumed that all kinematic constraints are formalised in terms of equations and inequalities, i.e., mathematical expressions.


For completing basic abstract definitions related to autonomous construction sites, it is necessary to introduce abstract descriptions of sensors and actuators. Abstract definitions for sensors and sensor networks have already been introduced in [13], and a sensor is then defined as follows:

**Definition 6** (Sensor, [13])**.** *A sensor is the object* S = (I, Y, T)*, where*


By this definition, sensors are allowed to measure several physical quantities, and not just one. Moreover, for simplicity, we assume that card *Ii* = *Ni* for all *i*. Nonetheless, it is important to remark that the case card *Ii* < *Ni* is also of practical interest for the further use of measurements, i.e., data and signal analysis, since it underlines that not all measured data can be used, but only a subset, which corresponds to the idea of frame analysis and sparse representations [30].

Further, the following definition of a sensor cluster has been presented in [13]:

**Definition 7** (Sensor cluster, [13])**.** *A sensor cluster is the object* SC = (B, S, R)*, where* B *is a base station (sensor node),* S *is a tuple of sensors, and* R *is an m-tuple of relations.*


In this definition, the *m*-tuple of relations R specifies the rules of communication between sensors, which are defined during the sensor network design; see [14,31] for specific examples of relations and their practical meanings in wireless sensor network modelling.

Taking into account Definition 7, it is also possible to change Definition 5 of a robot by replacing the *n*-tuple of sensors S with the sensor cluster SC. However, this approach might be somewhat inconsistent because robots typically have built-in sensors, which are directly controlled by the robotic controller C and not by a separate sensor node B, as required by Definition 7. Therefore, the current form of Definition 5 is preferred. Moreover, if extra sensors need to be installed on a robot, then it is always possible to combine both definitions via the composition

$$
\mathcal{R} \circ \mathcal{S}_C,
$$

where the composition ◦ represents communication rules between the sensor node and the robotic controller. Hence, the separation of two definitions provides more flexibility in terms of the descriptive capabilities of the whole abstract framework.

Finally, following Definition 6, let us now introduce a definition for an actuator:

**Definition 8** (Actuator)**.** *An actuator is the object* A = (B, AS, T)*, where* B *is a sensor node,* AS *is an actuation signal, and* T *is a k-tuple of specification information.*


This definition is based on the fact that each actuator has a sensor node attached to it, controlling the actuation process. The control of the actuation process is realised via the corresponding control model embedded into the sensor node, which is abstracted here in terms of the actuation signal AS. The *k*-tuple of specification information T represents standard information about the actuator (e.g., type, manufacturer).

Further, if necessary, a definition of an actuator cluster, similar to the sensor cluster introduced in Definition 7, can be provided. In this case, actuators have to be combined in a tuple, and another tuple of relations, specifying communications between the various actuators and base stations, must be introduced. For the purpose of this paper, a definition of an actuator cluster is omitted. Instead, Definition 8 concludes the common ground for an autonomous construction site, which will subsequently be used for creating ologs.
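To make the tuple structure of Definitions 6-8 concrete, the definitions can be transcribed as typed records in code. The following Python sketch is illustrative only; the field names and example values are assumptions chosen for readability, not part of the formal definitions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Sensor:                          # Definition 6: S = (I, Y, T)
    inputs: Tuple[str, ...]            # I: measured physical quantities
    outputs: Tuple[str, ...]           # Y: output signals (assumed reading)
    spec: Tuple[str, ...]              # T: specification information

@dataclass(frozen=True)
class SensorCluster:                   # Definition 7: S_C = (B, S, R)
    base_station: str                  # B: sensor node of the cluster
    sensors: Tuple[Sensor, ...]        # S: tuple of sensors
    relations: Tuple[str, ...]         # R: m-tuple of communication rules

@dataclass(frozen=True)
class Actuator:                        # Definition 8: A = (B, AS, T)
    sensor_node: str                   # B: attached sensor node
    actuation_signal: str              # AS: signal of the control model
    spec: Tuple[str, ...]              # T: e.g., type, manufacturer

humidity = Sensor(("relative humidity",), ("voltage",), ("sensor type X",))
cluster = SensorCluster("node-0", (humidity,), ("star topology",))
```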

#### **4. Olog Representations of Robotic Construction Sites**

In this section, the abstract definitions of autonomous construction sites introduced in Section 3 will be used as a common ground for creating olog representations of autonomous construction sites. In particular, the concept of the lattice of representations will be discussed and illustrated by examples.

Ologs reflect the idea of a lattice of theories by a lattice of representations (see again [23]), as mentioned in Section 1. Formally, the lattice of representations is represented by an entailment pre-order as part of the global category of specifications. Practically, it means that it is possible to move between different ologs by using four operations/mappings (see Figure 1): contraction **C**, expansion **E**, revision **R**, and analogy **A**.

**Figure 1.** A general idea of the lattice of representation concept.

More general or more detailed ologs are created by moving upwards or downwards, respectively, between ologs, i.e., by contraction **C** or expansion **E**, as indicated in Figure 1. In addition, ologs may be revised to update or remove details, which is realised via revision **R**, or translated into another olog using a morphism.

As mentioned in Section 3, a top-to-bottom approach was chosen for olog development. Therefore, we start our lattice of representations by developing olog O1 for describing an autonomous construction site based on Definition 4. The olog starts at type *A*, the autonomous construction site itself. Based on Definition 4, an autonomous construction site consists of a tuple of tasks T, an *n*-tuple of robots R, a human–robot interaction H, a set of pairs of environmental conditions E, a 4-tuple of GPS information G, and a base station B. Every object of the autonomous construction site is represented by a type in the corresponding olog illustrated in Figure 2.

**Figure 2.** Olog representation of an autonomous construction site.

In conformity with Section 2.2, every type begins with a or an. An aspect is connecting a type, representing all possible objects of that type, called *domain*, with another type, called *codomain*, representing a subset of possible results. The olog presented in Figure 2 provides a very general description of an autonomous construction site. To provide a better overview, several arrows connecting types have been omitted. These connecting arrows constitute facts about an autonomous construction site. For example, by connecting type *A* and type *K* via an arrow labelled as *has*, we would obtain the following fact: an autonomous construction site A has a base station. Similarly, other facts can be deduced from olog O1.

Next, let us illustrate how expansion works using the example of olog O1. We formally apply an expansion mapping **E** to O1, which results in adding more types and connecting arrows to the original olog of an autonomous construction site. Figure 3 presents the result of this expansion, a new olog **E**O1.


**Figure 3.** Olog representation of an autonomous construction site after the application of an expansion mapping **E**.

Evidently, olog **E**O1 has been expanded with more types and more facts provided by commutative paths, for example, the triangle *DEF*. Additionally, for illustrative purposes, we have added type *I3* as humidity, which is a particular instance of an environmental condition. This shows how concrete data can be added to an abstract olog, implying that an olog can be directly translated into a database of knowledge about an autonomous construction site.

Further, let us illustrate how contraction works using the example of olog **E**O1. It is important to underline that **C** = **E**<sup>−1</sup> is not required, meaning that by contracting an expanded olog, we do not need to obtain the original olog. Figure 4 illustrates one possible output (of many) of applying a contraction mapping **C** to olog **E**O1, where some details of Definition 4 have been omitted.

**Figure 4.** Olog representation of an autonomous construction site after the application of a contraction mapping **C** to the expanded olog **E**O1.

Thus, we have the following diagram on the level of ologs:

where the dashed arrow indicates that we cannot arrive at olog **C**(**R**O1) from olog O1 in one step, and a combination of contractions and expansions is required. Hence, we obtain a lattice of representations containing several ologs that are convertible between each other and represent different levels of detail about an autonomous construction site.

Similar to the construction of olog O1, ologs for Definitions 5–8 can be established. For illustrative purposes, only the olog for a robot is presented, while the other ologs will only be denoted. For consistency with the order of the definitions presented in Section 3, let us denote by O2 an olog for a robot, by O3 an olog for a sensor, by O4 an olog for a sensor cluster, and by O5 an olog for an actuator. Figure 5 presents olog O2, which is based on Definition 5. Similar to olog O1, olog O2 might be expanded or contracted in different ways; likewise, arrows making olog O2 commute, and thus further facts, could easily be added.

**Figure 5.** Olog representation of a robot based on Definition 5.

In addition, a potential connection between ologs O1 and O2 is worth discussing. In general, there are two possibilities to formally connect these ologs:


From the point of view of constructing a lattice of theories (or representations), the first approach is preferable, as an olog containing more information, and thus, in this sense, a more general olog, can be generated from O1. Following this, let us further introduce formal denotations:


Thus, we obtain the following lattice of representations:

Evidently, all arrows can be reversed and, hence, turned into contractions of ologs. The diagram above provides a clear structure of how different parts of the system "autonomous construction site" are connected. This structure underlines the advantages of working with the abstract definitions introduced in Section 3 in combination with ologs:


It is also worth underlining that the composition of an expansion followed by a contraction can be viewed as a "zoom-in" operation on an olog. This operation can be seen as a special kind of revision **R**, in which a type of the original olog is expanded and then everything except this expansion is removed. This procedure corresponds to the second alternative for connecting ologs O1 and O2, as discussed above.

Next, let us briefly illustrate a revision of an olog. According to [23], a revision is a composite that uses a contraction to discard irrelevant details, followed by an expansion to add new facts. Referring back to the discussion around Definition 7, it is possible to replace the *n*-tuple of sensors in the definition of a robot with a sensor cluster SC. For olog O2, this means that the revision **R** = **E** ◦ **C** is applied. Figure 6 shows the resulting olog **R**O2. It is worth noting that this revision of the olog, as well as of the definition, can be easily performed within the abstract approach proposed in this paper.

**Figure 6.** Illustration of olog revision on the example of the robot olog O2.

Finally, let us briefly discuss how analogy mapping works. An analogy is obtained by systematically renaming all types and aspects of an olog to describe/model a different realworld situation. For example, a mobile unmanned aerial vehicle can be seen as a system, which is similar to a robot. In this case, an analogy between olog O<sup>2</sup> and olog O*UAV*, describing a mobile unmanned aerial vehicle, can be created. This analogy is formally represented by the diagram

$$
\mathcal{O}_2 \xrightarrow{\mathbf{A}} \mathcal{O}_{UAV}
$$

For example, an analogy could mean that the types for kinematic properties need to be renamed or reorganised, or that the type for a robotic controller needs to be replaced by a remote controller device. Evidently, Definition 5 then needs to be adapted as well, which can be easily accomplished within the abstract approach proposed in this paper.

#### **5. Discussion and Conclusions**

In this paper, a conceptual modelling approach for autonomous construction based on categorical ontology logs coupled with abstract algebraic definitions was presented. The motivation for this coupling is twofold: first, introducing abstract definitions of individual components of an autonomous construction system allows removing subjectivity, which is typical for ontology-based representations; and second, these abstract definitions serve as a common ground for ologs making the whole framework easily extendable and interpretable. Therefore, after introducing abstract definitions of individual components of an autonomous construction system, several ologs for these definitions have been developed. Moreover, basic operations, i.e., contraction, expansion, revision, and analogy, have been discussed.

Let us now summarise and discuss the main points of the paper:

#### • **Abstract description of autonomous construction sites**

Several abstract definitions formalising autonomous construction sites have been introduced in Section 3. The idea of these definitions is to provide a common ground for an olog-based description of autonomous construction. A top-to-bottom approach for conceptual modelling of autonomous construction sites has been chosen. Hence, starting with an autonomous construction site, definitions of its more detailed components have been added step-by-step. The main advantage of this approach is that the resulting conceptual modelling framework is scalable and extendable with more details, if necessary. Any of the Definitions 4–8 can be revised or updated without the need for a general restructuring of the complete framework presented in this paper. It is also important to underline that the field of robotic construction still lacks generally accepted "standard" definitions. Therefore, the results presented in Section 3 should not be understood as definitions intended to become an industrial standard, but rather as an approach to addressing practical engineering problems at a more abstract level, sieving out all concrete details.

#### • **Olog-based representations of autonomous construction**

An olog-based representation of autonomous construction sites has been presented in Section 4. As described in Section 2.2, ologs are designed to handle the subjectivity of the creator of the abstract model. This point has been further strengthened by coupling ologs with the abstract definitions introduced in Section 3. This coupling makes the relation, comparison, and translation of ologs even more mathematically sound and formal. Hence, the ologs presented in this paper, as well as the extension/contraction rules, can be straightforwardly implemented in the form of databases. Further, if more details are desired in a concrete application, these details can be easily added via revision of existing ologs, as has been demonstrated in the paper.

#### • **Lattice of representations**

Finally, Section 4 presents a lattice of representations, which is developed by extending and revising existing ologs. Arguably, the concept of the lattice of representations is the most powerful tool of olog-based description of engineering systems. First, the lattice can be easily extended without the need for changing previous results. In this case, a new olog is simply added to the lattice, and the corresponding extension is then formally defined. Second, the lattice of representation can even be created first and, hence, provide a guideline for creating ologs and missing definitions.

It is also beneficial to provide a few comments on practical applications of the conceptual modelling framework presented in this paper:


3. Create ologs for each part and fit them into the lattice of representations defined in Step 1. Further, if necessary, ologs can be converted into databases and connected to other conceptual models, if available.

In summary, the results presented in this paper indicate that coupling ologs and abstract algebraic definitions provides a high degree of flexibility to the resulting framework. Moreover, as shown in several examples, the abstract framework can be easily extended with new definitions and, hence, with new ologs. Therefore, ologs are proposed to overcome the issues of incomparable prototypes and isolated solutions of systems for autonomous construction. As a result, any automated construction system can be described without providing exhaustive, detailed definitions of the system components, as existing ologs can be extended, contracted or revised to fit the given system or situation. To illustrate the capacity of ologs, an exemplary lattice of representations for autonomous construction sites has been presented. Additionally, the results obtained for autonomous construction can be transferred to other fields of engineering by using analogy operations on two levels: adapting the ologs and translating the respective definitions. Thus, the results presented in this paper can be seen not only as an attempt to formalise autonomous construction but as a general approach to formalising engineering problems.

For future work, since the definitions for a robot and an autonomous construction site are only exemplary, a detailed description of a complete autonomous construction system would be of relevance to determine the exact ramifications and parameters of describing such a complex system by means of ologs. Subsequently, the process of how an existing olog representation of an autonomous construction system can be translated or revised into another system needs to be examined. Furthermore, the investigation of different systems would allow the identification of matching parameters, in order to identify possible system-inherent properties and thus approach a general system definition, if required.

A further direction of future work could be to use the abstract definitions presented in Section 3 in concrete engineering applications, in particular in the context of path optimisation on graphs with the help of Clifford operator calculus, as presented in [31], to further underline the advantages of coupling abstract mathematics and engineering.

**Author Contributions:** Conceptualisation, D.L. (Daniel Luckey) and D.L. (Dmitrii Legatiuk); methodology, D.L. (Daniel Luckey) and D.L. (Dmitrii Legatiuk); writing—original draft preparation, D.L. (Daniel Luckey); writing—review and editing, D.L. (Daniel Luckey) and D.L. (Dmitrii Legatiuk); funding acquisition, D.L. (Dmitrii Legatiuk). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is supported by the German Research Foundation (DFG) through grant LE 3955/4-1.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the writing of the manuscript.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

Olog Ontology log

UAV Unmanned aerial vehicle

#### **References**



### *Article* **Bending and Torsional Stress Factors in Hypotrochoidal H-Profiled Shafts Standardised According to DIN 3689-1**

**Masoud Ziaei**

Department of Mechanical and Automotive Engineering, Institute for Machine Development, Westsächsische Hochschule Zwickau, D-08056 Zwickau, Germany; masoud.ziaei@fh-zwickau.de

**Abstract:** Hypotrochoidal profile contours have been produced in industrial applications in recent years using two-spindle processes, and they are considered effective high-quality solutions for form-fit shaft and hub connections. This study mainly concerns analytical approaches to determine the stresses and deformations in hypotrochoidal profile shafts due to pure bending loads. The formulation was developed according to bending principles using the mathematical theory of elasticity and conformal mappings. The loading was further used to investigate the rotating bending behaviour. The stress factors for the classical calculation of maximum bending stresses were also determined for all of the profiles presented and compiled in the German standard DIN 3689-1 for practical applications. The results were also compared with the corresponding numerical and experimental results, and very good agreement was observed. Additionally, based on previous work, the stress factor was determined for the case of torsional loading to calculate the maximum torsional stresses in the standardised profiles, and the results are listed in a table. This study contributes to the further refinement of the current DIN 3689 standard.

**Keywords:** hypotrochoidal profile shafts; DIN3689 H-profiles; bending stress; rotating bending loads in profiled shafts; flexure; torsional stress in profiled shafts; noncircular shafts; bending stress factor; torsional stress factor

#### **1. Introduction**

In the field of modern drive technology, there is an increasing demand for higher power transmission in a smaller construction space. A necessary and important component in drive trains is the form-fit shaft and hub connection. A widely used standard solution is the key-fit connection according to DIN 6885 [1]. However, this technique is reaching its mechanical limitations, which is why industry focus has increasingly been on form-fit connections with polygon profiles in the past few years. With the hypotrochoidal polygonal connection (H-profiles in Figure 1), a polygonal contour has been the new standard according to DIN 3689-1 [2] since November 2021. The great advantages of H-profiles over key-fit connections were studied in [3]. These investigations show a significant reduction of around 50% in the fatigue notch factor.

Additionally, a significant advantage of hypotrochoidal profiles (H-profiles) is their manufacturability through two-spindle turning [4,5] (Figure 2) and oscillating–turning [6] processes, as well as roller milling [7] (Figure 3). This allows time-efficient production.

Despite the excellent manufacturability described above and the great mechanical advantages of H-profiles, there is currently no reliable and cost-effective calculation method for the dimensioning of such profiles. The determination of the strength limit of H-profiles is still performed by means of extensive numerical investigations.

DIN 3689-1 refers to geometric specifications for H-profiles. Design guidelines are compiled in Part 2 of the standard. This paper presents an analytical solution for purely bending-loaded H-profile shafts in general, and specifically for all standardised H-profiles for the first time. Furthermore, the author uses the analytical solution developed in another paper [8] for all standard profiles under torsional stresses and compiles the results for practical and industrial applications.

The results can be used for a reliable and cost-effective calculation method of H-profile shafts with a simple pocket calculator for pure bending as well as torsional loads.

**Figure 1.** Description of exemplary hypotrochoid (H-profile) with four concave sides. A detailed explanation of the parameters is given below in Section 2.

**Figure 2.** Some H-profiles manufactured by two-spindle process, Iprotec GmbH, © Guido Kochsiek, www.iprotec.de, Zwiesel, Germany [5].

**Figure 3.** Roller milling manufacturing for H-profile [7].

#### **2. Geometry of H-Profiles**

A hypotrochoid (H-profile) is created by rolling a circle with radius *rr* (called a rolling circle) on the inside of a guiding circle with radius *rg* with no slippage (see, for instance, [9]). The distance between the centre point of the rolling circle and the generating point P is defined as eccentricity (Figure 1). Depending on the diameter ratios of the two circles and the location of the generating point P in the rolling circle, different H-profiles may be formed.

The diameter ratio (*rg*/*rr*) defines the number of sides "*n*" and should be an integer (*n* > 2) to obtain a closed curve without intersection. The coordinates of the generated point P describe the parameter equations for the hypotrochoid (H-profile) as follows:

$$\begin{array}{c} x(t) = r \cdot \cos(t) + e \cdot \cos[(n-1)\cdot t] \\ y(t) = r \cdot \sin(t) - e \cdot \sin[(n-1)\cdot t] \text{ with } 0^\circ \le t \le 360^\circ. \end{array} \tag{1}$$

The profile contour begins to overlap (self-intersect) at the limit eccentricity $e_{lim} = \frac{r}{n-1}$ and, accordingly, at the limit relative eccentricity $\varepsilon_{lim} = \frac{e_{lim}}{r} = \frac{1}{n-1}$.

Figure 4 shows some examples of the H-profiles obtained for different numbers of sides (*n*) and eccentricities.

**Figure 4.** Examples of H-profiles with different numbers of sides (*n*) and eccentricities.
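As a minimal illustration of Equation (1) and the overlap limit above, the following Python sketch (our own helper, assuming NumPy; it is not part of the paper) generates the contour points of an H-profile:

```python
import numpy as np

def h_profile(r, e, n, num=1000):
    """Contour points of a hypotrochoid (H-profile) per Equation (1)."""
    if n <= 2 or int(n) != n:
        raise ValueError("the number of sides n must be an integer > 2")
    if e >= r / (n - 1):
        raise ValueError("eccentricity exceeds the overlap limit e_lim = r/(n-1)")
    t = np.linspace(0.0, 2.0 * np.pi, num)
    x = r * np.cos(t) + e * np.cos((n - 1) * t)
    y = r * np.sin(t) - e * np.sin((n - 1) * t)
    return x, y

# Example: an H3 profile with r = 18.18 mm and e = 1.818 mm (relative eccentricity 0.1)
x, y = h_profile(r=18.18, e=1.818, n=3)
```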

If a rolling circle rolls on the outside of a guiding circle, the profile generated is called an epicycloid (E-profile).

#### *2.1. Geometric Properties*

Area

Starting from the parameter representation (1) for the hypotrochoidal contours, the following complex mapping function can be formulated:

$$
\omega(\zeta) = r \cdot \zeta + \frac{e}{\zeta^{n-1}} \tag{2}
$$

This function conformally maps the perimeter of the unit circle onto the contour of an H-profile. However, when the area enclosed by the polygon is mapped, multiple poles form at the corners of the contour. A complete conformal mapping is not essential for the determination of bending stresses; however, for shear force bending, a complete mapping of the profile cross-section is necessary (analogous to the torsion problem [8]).

By substituting mapping (2) into the equation for the area [10,11]:

$$A = \frac{1}{2} \int_0^{2\pi} \operatorname{Im}\left[ \overline{\omega}(\zeta) \cdot \dot{\omega}(\zeta) \right] dt \tag{3}$$

the following relationship can be derived for the area enclosed by an H-profile for any number of flanks *n* and eccentricity *e:*

$$A = A_a - \pi \cdot e \cdot \left[ d_a + e \cdot (n - 2) \right] \tag{4}$$

where $\dot{\omega} = d\omega/dt$ is the first derivative of the mapping function, *t* defines the parameter angle, and $A_a = \frac{\pi}{4} \cdot d_a^2$ is the area of the head circle (with $d_a = 2 \cdot r_a$).
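As a quick plausibility check (a sketch of ours, not from the paper, assuming NumPy), Equation (4) can be compared against a direct numerical evaluation of Equation (3):

```python
import numpy as np

def area_numeric(r, e, n, num=20001):
    """Numerically evaluate Equation (3) for the mapping of Equation (2)."""
    t = np.linspace(0.0, 2.0 * np.pi, num)
    omega = r * np.exp(1j * t) + e * np.exp(-1j * (n - 1) * t)
    domega = 1j * r * np.exp(1j * t) - 1j * (n - 1) * e * np.exp(-1j * (n - 1) * t)
    integrand = 0.5 * np.imag(np.conj(omega) * domega)
    # composite trapezoidal rule, written out to stay NumPy-version-agnostic
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

def area_closed_form(r, e, n):
    """Equation (4), with d_a = 2*(r + e) and A_a = pi/4 * d_a**2."""
    d_a = 2.0 * (r + e)
    return 0.25 * np.pi * d_a**2 - np.pi * e * (d_a + e * (n - 2))

r, e, n = 18.18, 1.818, 3
print(area_numeric(r, e, n), area_closed_form(r, e, n))  # both ~1017.6 mm^2
```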

#### *2.2. Radius of Curvature at Profile Corners and Flanks*

From a manufacturing point of view, the radius of the curvature of the contour at profile corners (on the head circle) plays an important role. Using the equation presented in [11], the radius of curvature can be determined:

$$\rho = 2\mathrm{i} \cdot \frac{\left(\dot{\omega} \cdot \overline{\dot{\omega}}\right)^{\frac{3}{2}}}{\overline{\dot{\omega}} \cdot \ddot{\omega} - \dot{\omega} \cdot \overline{\ddot{\omega}}} = \frac{\left| \dot{\omega} \right|^{3}}{\operatorname{Im}\left(\overline{\dot{\omega}} \cdot \ddot{\omega}\right)} \tag{5}$$

The second derivative of the mapping function in (5) is defined as $\ddot{\omega} = \frac{d^2\omega}{dt^2}$.

The radius of curvature at profile corners (on the head circle in Figure 1) can be determined by substituting mapping function (2) into Equation (5) for *t* = 0 as follows:

$$\rho_a = \frac{\left(d_a - 2 \cdot e \cdot n\right)^2}{2 \cdot \left[d_a + 2 \cdot e \cdot n \cdot (n - 2)\right]} \tag{6}$$

The radius of curvature at the profile corners *ρa* is important in connection with the minimum tool diameter regarding the manufacturability of the profile.

The radius of curvature at the profile flank *ρf* (Figure 1) can also be determined using Equation (5) for *t* = *π*/*n*:

$$\rho_f = \frac{\left[d_a + 2 \cdot e \cdot (n - 2)\right]^2}{2 \cdot \left[d_a - 2 \cdot e \cdot (n^2 - 2 \cdot n + 2)\right]} \tag{7}$$

The radius of curvature in the flank area *ρf* is a measure of the degree of the form closure of profile contours.
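For orientation, a small sketch of ours (plain Python, not from the paper) evaluates Equations (6) and (7) for the standardised H3 profile used in Section 2.6 (*n* = 3, *da* = 40 mm, *e* = 1.818 mm):

```python
def rho_corner(d_a, e, n):
    """Equation (6): radius of curvature at the profile corners (head circle)."""
    return (d_a - 2 * e * n) ** 2 / (2 * (d_a + 2 * e * n * (n - 2)))

def rho_flank(d_a, e, n):
    """Equation (7): radius of curvature in the middle of a profile flank."""
    return (d_a + 2 * e * (n - 2)) ** 2 / (2 * (d_a - 2 * e * (n**2 - 2 * n + 2)))

print(rho_corner(40.0, 1.818, 3))  # ~8.3 mm -> constrains the minimum tool diameter
print(rho_flank(40.0, 1.818, 3))   # ~43.6 mm -> measure of the form closure
```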

#### *2.3. Bending Stresses*

In many practical applications, failure may occur in the profiled shaft outside of the connection due to excessive stresses. For these cases, the following analytical approach based on [12] is used to solve the bending problem.

It is assumed that the cross-sections remain flat (without warping) after bending. The following relationships are valid for the stresses:

$$\begin{array}{c} \sigma_x = \sigma_y = \tau_{xy} = \tau_{yz} = \tau_{xz} = 0 \\ \sigma_z = \frac{M_b}{I_y} \cdot x, \end{array} \tag{8}$$

where *Iy* denotes the moment of inertia of the profile cross-section about the y-axis (Figure 5).

**Figure 5.** The bending coordinate system for a loaded profile shaft.

#### *2.4. Bending Deformations*

The displacements are determined using Hooke's law and the corresponding strain–displacement relations as follows (see [12,13]):

$$u_x = \frac{M_b}{2 \cdot E \cdot I_y} \cdot \left[ z^2 + \nu \cdot \left( y^2 - x^2 \right) \right] \tag{9}$$

#### *2.5. Moments of Inertia*

The moments of inertia involve a double integral over the profile's cross-section, but this can be reduced to a simple curvilinear integral over the profile contour using Green's theorem, as follows:

$$\begin{aligned} I_x &= -\frac{1}{3} \int_{\gamma} y^3\, dx \\ I_y &= \frac{1}{3} \int_{\gamma} x^3\, dy \\ I_{xy} &= \frac{1}{2} \int_{\gamma} x^2 y\, dy. \end{aligned} \tag{10}$$

The contour description according to Equation (2) is also advantageous here. For the contour of the profile's cross-section, the following coordinates apply:

$$\begin{aligned} x &= \frac{\omega(\lambda) + \overline{\omega(\lambda)}}{2} \\ y &= \frac{\omega(\lambda) - \overline{\omega(\lambda)}}{2 \cdot i}. \end{aligned} \tag{11}$$

By substituting Equation (11) into (10), *Ix*, *Iy*, and *Ixy* can be determined as follows:

$$\begin{aligned} I_x &= -\frac{i}{48} \int_{\gamma} \left( \omega(\lambda) - \overline{\omega(\lambda)} \right)^3 d\left( \omega(\lambda) + \overline{\omega(\lambda)} \right) \\ I_y &= -\frac{i}{48} \int_{\gamma} \left( \omega(\lambda) + \overline{\omega(\lambda)} \right)^3 d\left( \omega(\lambda) - \overline{\omega(\lambda)} \right) \\ I_{xy} &= -\frac{1}{32} \int_{\gamma} \left( \omega(\lambda) + \overline{\omega(\lambda)} \right)^2 \left( \omega(\lambda) - \overline{\omega(\lambda)} \right) d\left( \omega(\lambda) - \overline{\omega(\lambda)} \right) \end{aligned} \tag{12}$$

where $\lambda = e^{it}$. Equation (12) facilitates the determination of the moments of inertia with the assistance of Equation (2).

The moment of inertia *Iy* is necessary for the calculation of the bending stress *σz* as well as for the determination of the bending deformation *ux* (Equations (8) and (9)).

Inserting the mapping function from (2) into Equation (12) for *Iy*, the following relationship is determined for the bending moment of inertia for an arbitrary number of flanks *n* and eccentricity *e*:

$$I_y = \frac{\pi}{4} \cdot \left( r^4 - 2e^2(n-2)r^2 - e^4(n-1) \right) \tag{13}$$

If one substitutes *x*(*t*) from (1) and *Iy* from (13) into Equation (8), the distribution of the bending stress on the lateral surface of the profile can be determined as follows:

$$\sigma_b(t) = \frac{4M_b}{\pi} \cdot \frac{r \cos(t) + e \cos((n-1)t)}{r^4 - 2e^2(n-2)r^2 - e^4(n-1)} \tag{14}$$

The maximum bending stress on the tension side occurs at *x* = *r* + *e* (on the profile head, Figure 5), and therefore the following equation can be obtained:

$$
\sigma_{bh} = \frac{4M_b}{\pi} \cdot \frac{r + e}{r^4 - 2e^2(n-2)r^2 - e^4(n-1)} \tag{15}
$$

The bending stress on the compression side, which occurs at *x* = *r* − *e* in the middle of a profile flank (at the profile foot, Figure 5), can also be determined as follows:

$$
\sigma_{bf} = \frac{4M_b}{\pi} \cdot \frac{r - e}{r^4 - 2e^2(n-2)r^2 - e^4(n-1)} \tag{16}
$$

#### *2.6. Example*


An H-profile from DIN 3689-1 [2] with three sides, a head circle diameter of 40 mm, and eccentricity *e* = 1.818 mm (*r* = 18.18 mm; relative eccentricity *ε* = 0.1) was chosen as the object of investigation. The bending load was chosen as *Mb* = 500 Nm.
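Before the FE comparison, the analytical values of Equations (15) and (16) for this example can be spot-checked with a few lines of Python (our sketch; note that *Mb* must be converted to N·mm for consistent units):

```python
import math

def sigma_head_foot(M_b, r, e, n):
    """Equations (15) and (16): bending stresses at the profile head and foot."""
    denom = r**4 - 2 * e**2 * (n - 2) * r**2 - e**4 * (n - 1)
    return (4 * M_b / math.pi * (r + e) / denom,
            4 * M_b / math.pi * (r - e) / denom)

s_bh, s_bf = sigma_head_foot(M_b=500e3, r=18.18, e=1.818, n=3)  # 500 Nm = 500e3 N*mm
print(s_bh, s_bf)  # ~119 N/mm^2 at the head, ~97 N/mm^2 at the foot
```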

To verify the analytical results, numerical investigations were carried out using FE analyses with the MSC-Marc program system.

Figure 6 shows the mesh structure and the corresponding boundary conditions. The shaft is fixed on the right side. A bending moment is applied on the left side of the shaft via a reference node using RBE2 elements. The bending stresses were evaluated at an adequate distance (*lb*) from the loading point. The FE mesh in Figure 6 contains hexahedral elements with full integration, type 7 according to the Marc Element Library [14].

**Figure 6.** FE mesh and boundary conditions for the H-profile with *n* = 3 according to DIN 3689-1.

The FE structures were generated using software written in Python at the Chair of Machine Elements of the West Saxon University of Zwickau, Germany. The FE meshes were then transferred to the MSC-Marc program system and integrated into the pre-processing.

Figure 7 displays the distribution of bending stress on the circumference of the profile according to Equation (14) and its comparison with the numerical result. A good agreement between the results was observed.


**Figure 7.** Circumferential distribution of the bending stress on the lateral surface of a standardised H3 profile.

Additionally, bending stresses were experimentally determined for the profile head and foot areas. Figure 8 shows the test bench for bending load.

**Figure 8.** Bending loads test bench (Machine Elements Laboratory at West Saxon University of Zwickau).

Experimental results for head and foot areas are compared with Equations (15) and (16) in Figure 9, where a good agreement of the results is evident.

**Figure 9.** Comparison of the experimental results with the analytical solutions.

#### *2.7. Stress Factor for Bending Loads*

The stress factor is defined as the ratio of the bending stress in a profile shaft to the corresponding reference stress for a round cross-section with radius *r* (the nominal radius of the profile):

$$\begin{aligned} \alpha_b &= \frac{\sigma_b}{\sigma_{b,ref}} \\ \text{with: } \sigma_{b,ref} &= \frac{M_b \cdot r}{I_{y,ref}} \text{ and } I_{y,ref} = \frac{\pi}{4} \cdot r^4. \end{aligned} \tag{17}$$

For the head of the profile, the stress factor is determined as follows:

$$\alpha\_{bh} = \frac{1+\varepsilon}{1 - 2\varepsilon^2(n-2) - \varepsilon^4(n-1)}\tag{18}$$

Figure 10 shows the curves for the stress factor *αbh* as a function of the relative eccentricity *ε* for different numbers of sides *n*. It can be seen that the stress factor rises with an increase in the eccentricity and/or the number of sides.

**Figure 10.** Stress factors for the bending stress at the profile head (Equation (18)) with varying relative eccentricity and number of sides.

For the profile base (foot), the following stress factor is analogously obtained:

$$\alpha\_{bf} = \frac{1 - \varepsilon}{1 - 2\varepsilon^2(n - 2) - \varepsilon^4(n - 1)}\tag{19}$$
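A short sketch of ours evaluating Equations (18) and (19), here for the standardised H3 profile with *ε* = 0.1:

```python
def alpha_bh(eps, n):
    """Equation (18): bending stress factor at the profile head."""
    return (1 + eps) / (1 - 2 * eps**2 * (n - 2) - eps**4 * (n - 1))

def alpha_bf(eps, n):
    """Equation (19): bending stress factor at the profile foot."""
    return (1 - eps) / (1 - 2 * eps**2 * (n - 2) - eps**4 * (n - 1))

print(alpha_bh(0.1, 3), alpha_bf(0.1, 3))  # ~1.12 and ~0.92
```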

#### *2.8. Rotating Bending Stress*

During power transmission, the shaft rotates continuously. Therefore, rotating bending was also investigated.

Figure 11 schematically represents the rotated position of an H-profile with three flanks according to the Cartesian coordinates.

**Figure 11.** Rotated coordinate system for determining the bending moment of inertia.

The moment of inertia remains invariant due to the periodic symmetry of the cross-section of the H-profile described by Equation (2). Therefore, the following relationships follow from Equation (12):

$$I\_x = I\_y \text{ and } I\_{xy} = 0. \tag{20}$$

From Equation (20) and the use of Mohr's circle, it can be proven that the moment of inertia is independent of the rotation angle *φ* (see also [10]):

$$\begin{array}{c} I_{\xi} = I_{\eta} \left(= I_x = I_y\right) \\ I_{\xi\eta} = I_{xy} = 0. \end{array} \tag{21}$$

In order to obtain the general solution of the bending stress according to Equation (8) for an arbitrary angle of rotation, the perpendicular distance *ξ* is to be calculated in the rotated coordinate system:

$$\xi(\phi) = y \cos(\phi) - x \sin(\phi) \tag{22}$$

where *φ* denotes the angle of rotation. If the values for *x* and *y* from (1) are inserted into the relationship (22), the following equation results for the perpendicular distance in the rotated coordinate system (0 ≤ *t* ≤ 2*π*):

$$\xi(\phi, t) = r \sin(t - \phi) - e \sin((n - 1)t + \phi) \tag{23}$$

The distribution of bending stress on the profile contour may be determined by using (23) in the relation of bending stress as follows:

$$\sigma_b(\phi, t) = \frac{M_b}{I_\eta} \cdot \xi(\phi, t) = \frac{4M_b}{\pi} \cdot \frac{r \sin(t - \phi) - e \sin((n - 1)t + \phi)}{r^4 - 2e^2(n-2)r^2 - e^4(n-1)} \tag{24}$$

Figure 12 shows the distributions of the bending stresses on the profile contour for different angles of rotation, which were determined using Equation (24). As expected, the maximum stress occurred at the profile head.

**Figure 12.** Distributions of the bending stresses on the profile contour for different angles of rotation *φ*, with *r* = 18.18 mm, *n* = 3, *e* = 1.818 mm, and *Mb* = 500 Nm.

#### *2.9. Deflection*

The deflection of the profile shaft can also be determined with the help of the bending moment of inertia *Iy*. As explained above, this is independent of the angular position of the cross-section (Equation (21)).

The deflection of the neutral axis is determined from Equation (9) for *x* = *y* = 0 as follows:

$$\delta\_x = \frac{M\_b}{2 \cdot E \cdot I\_y} \cdot z^2 \tag{25}$$

Substituting (13) in (25), the deflection can be determined as

$$\delta_x = \frac{2M_b}{\pi E} \cdot \frac{z^2}{r^4 - 2e^2(n-2)r^2 - e^4(n-1)} \tag{26}$$

#### *2.10. Example*

Figure 13 shows the deflection for an H-profile shaft with three flanks according to DIN 3689-1 with *da* = 40 mm (H3-40 × 32.73 with *ε* = 0.1) and a length of 160 mm made of steel (*E* = 210,000 N/mm²). The bending load was chosen as *Mb* = 500 Nm. As can be seen in Figure 13, the comparison with the FE analysis shows very good agreement with Equation (26).

**Figure 13.** Deflection in a DIN3689-H3-40 × 32.73 profile shaft.
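Evaluating Equation (26) at the shaft end (*z* = 160 mm) gives the analytical deflection that Figure 13 compares against the FE result; a minimal sketch of ours, again in consistent N·mm units:

```python
import math

def deflection(M_b, E, r, e, n, z):
    """Equation (26): deflection of the neutral axis under pure bending."""
    denom = r**4 - 2 * e**2 * (n - 2) * r**2 - e**4 * (n - 1)
    return 2 * M_b / (math.pi * E) * z**2 / denom

# H3-40 x 32.73: r = 18.18 mm, e = 1.818 mm, E = 210,000 N/mm^2, M_b = 500 Nm
print(deflection(M_b=500e3, E=210e3, r=18.18, e=1.818, n=3, z=160.0))  # ~0.36 mm
```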

#### *2.11. H-Profiles According to DIN3689-1*

DIN 3689-1 is a new standard that was published for the first time in November 2021. It describes the geometric properties of 18 specified H-profiles in two series. Series A is based on the head diameter, and series B uses the foot diameter as the nominal size of the profile. The corresponding profiles of the two series are geometrically similar. Each series contains 48 nominal sizes, which remain geometrically similar amongst themselves. Consequently, all standardised profiles are limited to 18 variants. This facilitates the development of a generally valid design concept.

#### *2.12. Stress Factor for Bending*

The maximum bending stresses at the head and the foot of the profile are important from a technical point of view for the design of a profile shaft subject to bending. Therefore, in this section, the two stress factors *αbh* and *αbf* were determined for all 18 standardised profile series using Equations (18) and (19).

#### *2.13. Stress Factor for Torsion*

The stress concentration factor for torsion *αt* is defined as the ratio of the maximum torsional stress *τt,max* (occurring in the middle of the profile flank) to the torsional stress in a round reference shaft with radius *r*:

$$\begin{aligned} \alpha_t &= \frac{\tau_{t,max}}{\tau_{t,ref}} \\ \text{with: } \tau_{t,ref} &= \frac{M_t \cdot r}{I_{t,ref}} \text{ and } I_{t,ref} = \frac{\pi}{2} \cdot r^4. \end{aligned} \tag{27}$$

In [15], purely numerical investigations were carried out on the torsional stresses in H-profile shafts to calculate the stress factor.

The analytical solution for torsion may be obtained using the approach of Muskhelishvili [12]. However, this requires a conformal mapping of the unit circle onto the polygon's cross-section. For H-profiles, the mapping function derived from the parametric equation, Equation (1), cannot be directly used to solve for the torsional stresses due to the multiple poles. The authors of [16] employed an elaborate computational process to determine the polynomials required for the description of the mappings of H-profiles. In [8,17–19], successive methods according to Kantorovich [20] were used to develop a suitable mapping function in the form of a series converging to the profile contour. The convergence quality and limit were examined in [8] as a function of the number of terms in the developed series, and the torsional deformations were calculated for all standardised profiles. In the present work, this method, accompanied by FEA, was used for all 18 standardised profile geometries of DIN 3689-1 to determine the maximum torsional stresses, which occur in the middle of the profile flank at the profile foot. A stress concentration factor for torsional loading *αt* was also determined analogously to that defined for the case of bending load.

For practical applications, the results obtained for the bending and torsional stress factors for all standardised profile geometries according to DIN 3689-1 are compiled in Table 1 (rounded to two decimal places). Because the relative eccentricity is used, the factors show no dependence on the shaft diameter.

**Table 1.** Stress factors for bending and torsional loads for the H-profiles standardised according to DIN3689-1.


The bending moment of inertia of a circular cross-section with radius *r* is defined as the reference moment of inertia and labelled *I*0. The ratio between *Iy* and *I*0 is also listed in Table 1 for the standardised profiles. The H-profiles are normally slightly more flexible than round profiles.

#### **3. Conclusions**

In this paper, an analytical approach was presented to determine the bending stresses and deformations in the hypotrochoidal profile shafts. Valid calculation equations for the area, radii of curvature of the profile contour, and the bending moment of inertia were derived for such profiles. Furthermore, the solutions for bending stresses and deformations were presented. For practical applications, a stress factor was defined for the critical locations on the profile contour.

The analytical results demonstrated very good agreement with both numerical and experimentally determined results.

The stress factors of the bending stresses were determined for all profile geometries standardised according to DIN3689-1, and the values obtained were compiled in a table for practical applications. Based on previous works of the author, the stress factors for torsional stresses were also determined and added to the table. The data allow a reliable and cost-effective calculation of H-profile shafts with a pocket calculator for pure bending as well as torsional loads. This can be very advantageous for SMEs.

**Funding:** This research was funded by DFG (Deutsche Forschungsgemeinschaft) grant number [DFG ZI 1161/2].

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Abbreviations**

Formula Symbols:



#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **On Long-Range Characteristic Length Scales of Shell Structures**

**Harri Hakula \***

Department of Mathematics and Systems Analysis, Aalto University, Otakaari 1, FI-00076 Espoo, Finland

**Abstract:** Shell structures have a rich family of boundary layers, including internal layers. Each layer has its own characteristic length scale, which depends on the thickness of the shell. Some of these length scales are long, something that is not commonly considered in the literature. In this work, three types of long-range layers are demonstrated over an extensive set of simulations. The observed asymptotic behavior is consistent with theoretical predictions. These layers are shown to also appear on perforated structures, underlining the fact that these features are properties of the elasticity equations and do not depend on effective material parameters. The simulations are performed using a high-order finite element method implementation of the Naghdi-type dimensionally reduced shell model. Additionally, the effect of the perforations on the first eigenmodes is discussed. One possible model for buckling analysis is outlined.

**Keywords:** shells; boundary layers; finite element method

#### **1. Introduction**

Shell structures and, in particular, thin shells remain challenging for both theoretical and computational structural analysis [1]. One must either use special shell elements or rely on a high-order finite element method, as is performed here. Computing with 3D formulations is still prohibitively expensive. One of the defining features of shells is that every solution of a shell problem can be thought of as a linear combination of features or boundary layers each with its own characteristic length scale, including the so-called smooth component, which typically spans the whole structure. The effects of curvature lead to boundary layers that can also be internal, something that cannot happen in plates, for instance. Moreover, some of the layers can have long, yet parameter-dependent, length scales that have not received much attention in the literature. Indeed, in the first paper introducing modern boundary layer analysis [2], the long-range features on cylindrical shells were omitted since their meaning was not properly understood.

Thin structures are normally modeled as two-dimensional ones via dimension reduction, where the thickness becomes a parameter. For the sake of analysis, the thickness is defined as a dimensionless constant. Once the reduced linear elasticity equations have been obtained, the characteristic length scales can be derived as functions of the parameter. Classical boundary layers have short length scales, and internal boundary layers may have long-range effects along the characteristic curves of such surfaces, but with parameterdependent widths. The third category of the characteristic length scales, the long-range effects, are the focus of this paper. Every layer is generated by some combination of curvature, kinematic constraints, and loading; in other words, every layer has its own generator. The standard reference for boundary layers of shells is Pitkäranta et al. [3], where every generator of a layer is taken to be either a straight line or a point. This work was later extended via the introduction of curved generators [4,5]. In fact, this extension shows that the collection of boundary layers is not finite.

Shells of revolution are a representative class of thin structures. Let us denote the thickness with *d*, and the dimensionless thickness with *t* = *d*/*LD*, where *LD* is taken to be the diameter of the domain, for example. The practical range in engineering problems is typically *t* ∈ [1/1000, 1/100], and already at *t* = 1/10 the dimension reduction is not effective, depending on the model, of course. In analysis, *t* is the convenient parameter, and in the sequel, *d* and *t* are used interchangeably. The long-range layers have characteristic length scales of $L \sim 1/t$ and $L \sim 1/\sqrt{t}$. In addition, there are length scales $\sim \sqrt[n]{t}$, with *n* = 1, 2, ..., of which only *n* = 1, 2 are standard boundary layers. For illustrative examples, see Figure 1.

**Citation:** Hakula, H. On Long-Range Characteristic Length Scales of Shell Structures. *Eng* **2023**, *4*, 884–902. https://doi.org/10.3390/eng4010053

Academic Editor: Antonio Gil Bravo

Received: 10 December 2022; Revised: 3 February 2023; Accepted: 1 March 2023; Published: 6 March 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** Examples of structures subject to long-range layers: transverse deflection profiles. In both cases, the dimensionless thickness *t* = 1/100, all boundaries are kinematically fully constrained, and the loading is the unit pressure. (**a**): Long cylinder with an ellipsoidal hole. (**b**): Circular plate with an elliptic extension. The load is acting only on the extension.

The purpose of this paper is to show that many interesting structural responses in shells are due to long-range layers. The main contribution of this paper is to show that these effects also exist in perforated structures. This serves as a useful reminder that even though the internal and long-range boundary layers have wave propagation characteristics, the underlying equations are those of linear elasticity, and phenomena such as wave dispersion do not occur. In the numerical examples below, all three geometry classes, parabolic, hyperbolic, and elliptic, are included. Another contribution of this study is that the results on hyperbolic cases are shown to agree with those on the parabolic ones, as predicted by the theory.

One of the complicating aspects is that the long-range effects are also combined with standard boundary layers, which typically dominate in amplitude. This is probably one of the reasons why the existing literature is not extensive. As mentioned above, Pitkäranta et al. [3] is the standard reference. In elliptic shells, a curious boundary-layer-like long-range effect can also occur in connection with the so-called sensitivity of the structure. Here, the comprehensive study is by Sanchez-Palencia et al. [6]. In an excellent review article by Pietraszkiewicz and Konopińska [7], in the section on shells of revolution, only standard boundary layers are addressed. Similarly, in the paper by Malliotakis et al. [8], in connection with a wind tower problem, the focus is on the short-range phenomena. This is understandable since the maximal stresses occur due to standard boundary layers. Additionally, as is also evident in the numerical examples below, visualization and precise analysis of the long-range features of the solutions are difficult.

The simulation of thin structures with the finite element method poses its own set of difficulties. A highly influential case study was carried out by Szabo [9], where a complete analysis of a shell structure starting from measurements and ending with error analysis is discussed. The thin shell models have been verified in static and dynamic settings with experiments with reasonable agreement. In particular, one must take numerical locking into account and either use special shell elements or rely on higher-order methods [10]. In this work, the latter approach is adopted and justified using numerical energy convergence observations. One of the open problems in thin shell modeling is the question of buckling. One of the models applicable within the framework used in this paper is due to Niemi [11].

The three examples are motivated by the wind tower example mentioned above [8], by interesting modeling problems encountered when long-range effects emerge [12,13], and by recent work on the homogenization of perforated shell structures, where the curved generator cases have not been considered [14]. In both [12,13], interesting long-range responses are modeled where the driving force is internal torsion resulting from a bilayer or similar materials. It is an interesting question for future work to see if some connections with the results presented in this work can be found and formulated precisely.

The rest of the paper is structured as follows: In Section 2, the necessary background material is covered. Shell models are introduced in Section 3, and the related layers are discussed in the following Section 4. The set of three examples is presented in Section 5, followed by conclusions in Section 6. An alternative shell model is briefly outlined in Appendix A. Finally, the buckling problem is discussed in the last appendix (Appendix B).

#### **2. Preliminaries**

In this section, the necessary background material is introduced. The notation used in the sequel is established here as well.

#### *2.1. Navier's Equations of Elasticity*

In this section, the elasticity equations are introduced; for reference, see, for instance, [15]. Let *D* be a domain representing a deformable medium subject to a body force **f** and a surface traction **g**. The 3D model problem is then to find the displacement field **u** = (*u*1, *u*2, *u*3) and the symmetric stress tensor $\sigma = (\sigma_{ij})_{i,j=1}^{3}$, such that

$$\begin{aligned} \boldsymbol{\sigma} &= \lambda \operatorname{div}(\mathbf{u}) I + 2\mu \boldsymbol{\varepsilon}(\mathbf{u}), & \quad \text{in } D \\ -\operatorname{div}(\boldsymbol{\sigma}) &= \mathbf{f}, & \quad \text{in } D \\ \mathbf{u} &= \mathbf{0}, & \quad \text{on } \partial D_D \\ \boldsymbol{\sigma} \cdot \mathbf{n} &= \mathbf{g}, & \quad \text{on } \partial D_N \end{aligned} \tag{1}$$

where *∂D* = *∂DD* ∪ *∂DN* is a partitioned boundary of *D*. The Lamé constants are

$$
\lambda = \frac{E\,\nu}{(1+\nu)(1-2\nu)}, \qquad \mu = \frac{E}{2(1+\nu)}.\tag{2}
$$

with *E* and *ν* being Young's modulus and Poisson's ratio, respectively. Further, *I* is the identity tensor, **n** denotes the outward unit normal to *∂DN*, and the strain tensor is

$$\boldsymbol{\epsilon}(\mathbf{u}) = \frac{1}{2} (\nabla \mathbf{u} + \nabla \mathbf{u}^T). \tag{3}$$

The vector-valued tensor divergence is

$$\text{div}(\sigma) = \left(\sum\_{j=1}^{3} \frac{\partial \sigma\_{ij}}{\partial \mathbf{x}\_{j}}\right)\_{i=1}^{3} \,. \tag{4}$$

This formulation assumes a constitutive relation corresponding to linear isotropic elasticity with stresses and strains related by Hooke's generalized law

$$
\sigma_v = \mathbf{D}(\lambda, \mu)\, \varepsilon_v \tag{5}
$$

where the constitutive matrix **D**(*λ*, *μ*) relates the symmetric parts of *ε* and *σ*.

If the domain *D* is thin, then one of the dimensions is much smaller than the other two. In standard discretisation of the problem, for instance, with the finite element method, the small dimension, say the thickness, forces the sizes of the elements to be equally small and

the simulations become expensive. This could be alleviated with alternative discretisation methods such as isogeometric analysis or high-order finite element method with carefully constructed meshes. Here, the approach taken is to modify the equations via dimension reduction [16].

#### *2.2. Surface Definitions*

In this work, the focus is solely on thin shells of revolution. They can formally be characterized as domains in $\mathbb{R}^3$ of the type

$$D = \{ \mathbf{x} + z \mathbf{n}(\mathbf{x}) \mid \mathbf{x} \in \Gamma, -d/2 < z < d/2 \},\tag{6}$$

where *d* is the (constant) thickness of the shell, Γ is a (mid)surface of revolution, and **n**(**x**) is the unit normal to Γ. The principal curvature coordinates require only four parameters, the radii of principal curvature *R*1, *R*2, and the so-called Lamé parameters, *A*1, *A*2, which relate coordinate changes to arc lengths, to specify the curvature and the metric on Γ. The displacement vector field of the midsurface **u** = {*u*, *v*, *w*} can be interpreted as projections to the directions

$$\mathbf{e}\_1 = \frac{1}{A\_1} \frac{\partial \Psi}{\partial x\_1}, \quad \mathbf{e}\_2 = \frac{1}{A\_2} \frac{\partial \Psi}{\partial x\_2}, \quad \mathbf{e}\_3 = \mathbf{e}\_1 \times \mathbf{e}\_2,\tag{7}$$

where Ψ(*x*1, *x*2) is a suitable parametrisation of the surface of revolution, **e**1, **e**<sup>2</sup> are the unit tangent vectors along the principal curvature lines, and **e**<sup>3</sup> is the unit normal. In other words, **u** = *u* **e**<sup>1</sup> + *v* **e**<sup>2</sup> + *w* **e**3.

#### Profile Functions and Parametrisation

When a plane curve is rotated (in three dimensions) around a line in the plane of the curve, it sweeps out a surface of revolution. Consider a plane curve, the so-called profile function in the *xy*-plane, *y* = *γ*(*x*). Without any loss of generality, in the sequel the surfaces are generated by a curve rotating either around the *x*-axis or *y*-axis. This profile function is denoted with *f*(*x*) and the resulting surface Γ*<sup>f</sup>* for the case of the *x*-axis, and for the *y*-axis with *g*(*x*) and the resulting surface Γ*g*.

Let $I = [\alpha, \beta] \subset \mathbb{R}$ be a bounded closed interval, and let $f(x): I \to \mathbb{R}^+$ be a regular function. The shell midsurface Γ*f* is parameterised by means of the mapping

$$\begin{aligned} \Psi_f &: I \times [0, 2\pi] \longrightarrow \mathbb{R}^3 \\ \Psi_f(x_1, x_2) &= (x_1,\, f(x_1) \cos x_2,\, f(x_1) \sin x_2). \end{aligned} \tag{8}$$

For Γ*<sup>f</sup>*

$$A\_1(\mathbf{x}) = \sqrt{1 + [f'(\mathbf{x})]^2}, \quad A\_2(\mathbf{x}) = f(\mathbf{x}), \tag{9}$$

and

$$R\_1(\mathbf{x}) = -\frac{A\_1(\mathbf{x})^3}{f''(\mathbf{x})}, \quad R\_2(\mathbf{x}) = A\_1(\mathbf{x})A\_2(\mathbf{x}).\tag{10}$$

Let $J = [\alpha, \beta] \subset \mathbb{R}$ be a bounded closed interval with $\alpha > 0$, and let $g(x): J \to \mathbb{R}$ be a regular function. In this case, the shell midsurface Γ*g* is parameterised by means of the mapping

$$\begin{aligned} \Psi_g &: J \times [0, 2\pi] \longrightarrow \mathbb{R}^3 \\ \Psi_g(x_1, x_2) &= (x_1 \cos x_2,\, g(x_1),\, x_1 \sin x_2). \end{aligned} \tag{11}$$

For Γ*<sup>g</sup>*

$$A\_1(\mathbf{x}) = \sqrt{1 + [\mathbf{g'(x)}]^2}, \quad A\_2(\mathbf{x}) = \mathbf{x}, \tag{12}$$

and

$$R_1(x) = \frac{A_1(x)^3}{g''(x)}, \quad R_2(x) = A_1(x) A_2(x). \tag{13}$$
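As a small sketch of ours (assuming NumPy), Equations (9) and (10) for a Γ*f* surface can be evaluated directly; here for the mildly hyperbolic profile *f*(*x*) = 1 + (1/2)(*x*/*H*)² used in the simulations of Section 5, where *R*1 and *R*2 indeed have opposite signs (negative Gaussian curvature):

```python
import numpy as np

H = 30.0
f   = lambda x: 1.0 + 0.5 * (x / H) ** 2     # profile function
fp  = lambda x: x / H**2                      # f'(x)
fpp = lambda x: np.full_like(x, 1.0 / H**2)   # f''(x), constant here

def surface_data(x):
    """Equations (9)-(10): Lame parameters and principal radii for Gamma_f."""
    A1 = np.sqrt(1.0 + fp(x) ** 2)
    A2 = f(x)
    R1 = -A1**3 / fpp(x)   # < 0 here, while R2 > 0 -> hyperbolic geometry
    R2 = A1 * A2
    return A1, A2, R1, R2

print(surface_data(np.array([0.0, 15.0, 30.0])))
```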

#### *2.3. Perforations*

Perforated domains are characterized by the penetration patterns, which in turn depend on the underlying manufacturing processes and the related hole coverage, typically given as a percentage.

The quantity used to characterize perforated sheets of metal is the ligament efficiency *η*. Let us assume that the holes are ellipses with *a* and *b* as the horizontal and perpendicular semi-axes, and that the separations of the hole centers are *Px* and *Py*, respectively. Following [17–19], one can define both the horizontal and the perpendicular ligament efficiency, denoted *ηx* and *ηy*, respectively. For regular arrays of holes,

$$
\eta_x = (P_x - 2a)/P_x, \qquad \eta_y = (P_y - 2b)/P_y, \tag{14}
$$

and for triangular arrays, allowing for alternating layers,

$$
\eta_x = (P_x - 4a)/P_x, \qquad \eta_y = (P_y - 4b)/P_y. \tag{15}
$$

For circular holes, the radius *r* = *a* = *b*, of course, and further, if the pattern is regular, *η* = *ηx* = *ηy*. Both pattern types are illustrated in Figure 2. Notice that the triangular pattern in the figure has a tighter packing than that implied by (15).

**Figure 2.** Penetration patterns. (**a**) Regular pattern. (**b**) Triangular pattern.
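A minimal sketch of ours implementing Equations (14) and (15):

```python
def ligament_efficiency(Px, Py, a, b, pattern="regular"):
    """Equations (14)-(15): horizontal and perpendicular ligament efficiencies."""
    if pattern == "regular":
        return (Px - 2 * a) / Px, (Py - 2 * b) / Py
    if pattern == "triangular":
        return (Px - 4 * a) / Px, (Py - 4 * b) / Py
    raise ValueError("pattern must be 'regular' or 'triangular'")

# Circular holes (r = a = b) on a regular array: eta_x = eta_y
print(ligament_efficiency(Px=2.0, Py=2.0, a=0.5, b=0.5))  # (0.5, 0.5)
```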

#### *2.4. Finite Element Method*

All numerical simulations reported here have been computed with two different high-order continuous Galerkin codes in 2D, solving the variational formulation on conforming meshes of quadrilateral or triangular elements.

One of the challenges in shell problems is to avoid numerical locking. Instead of using special shell elements [1], one can let the higher-order FEM alleviate the locking and accept that some thickness-dependent error amplification or locking factor, *K*(*t*) ≥ 1, is unavoidable. For the *hp*-FEM solution, one can derive a simple error formulation

$$\text{error} \sim K(t)(h/L\_D)^p,\tag{16}$$

where *h* is the mesh spacing, *LD* is the diameter of the domain, and *p* is the degree of the elements. It is possible that *K*(*t*) diverges as *t* tends to zero, with the worst case being for pure bending problems: *K*(*t*) ∼ 1/*t*. In (16), the latter part follows from standard approximation theory of the FEM; it is the term *K*(*t*) that is shell specific. If the bending part in the energy expressions given below dominates, the energy norm depends linearly on *t*, and hence any energy error estimate has an inverse dependence, that is, ∼ 1/*t*. This simple error formula suggests why higher-order methods are advantageous in shell problems: the mesh over-refinement in the "worst" case is $\sim (1/t)^{1/p}$, which for a fixed *t* = 1/100, say, indicates that for *p* = 4 the requirement is moderate in comparison to the case of *p* = 1. This also suggests that convergence in *p* is a useful measure if the problem is otherwise difficult to analyze exactly. For a more detailed discussion and further references, see [20,21].
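The over-refinement estimate can be made concrete with one line of arithmetic (our sketch):

```python
# Worst-case mesh over-refinement factor (1/t)**(1/p) for t = 1/100:
t = 1 / 100
for p in (1, 2, 4, 8):
    print(p, (1 / t) ** (1 / p))  # 100.0, 10.0, ~3.16, ~1.78
```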

#### Implementations

The first solver used in this study is implemented with Mathematica, providing exact geometry handling of the holes via blending functions [22]. The second one is AptoFEM, a parallel code implemented in FORTRAN90 and MPI. Both codes allow for arbitrary order of polynomials to be used in the elements including different orders of polynomials in individual elements in the same mesh [14].

#### **3. Shell Models**

If the shell of revolution is defined by the profile function *f*(*x*) defined over some interval *x* ∈ *I* = [*x*0, *x*1], then using the derivatives of the profile function all shell geometries can be classified in terms of Gaussian curvature (see, for instance, [23]). The analysis of shell problems is greatly simplified if the type of curvature is uniform.


#### *Dimensionally Reduced Elasticity Equations: Naghdi Model*

Consider a shell of (constant) thickness *d*, the mid-surface *ω* of which occupies a region of some smooth surface Γ. This is a three-dimensional body equipped with principal curvature coordinates as defined in Section 2.2, for which the 3D theory of linear elasticity could be considered "exact" for small deformations. One of the classical dimension reduction models is applied here, since these models reveal the nature of shell deformations more explicitly. Such models are often reasonably accurate for thin shells [3]. The displacement field **u** has five components, *u*, *v*, *w*, *θ*, and *ψ*, each of which is a function of two variables on the mid-surface of the shell. The first two components represent the tangential displacements of the mid-surface, *w* is the transverse deflection, and *θ* and *ψ* are dimensionless rotations. The model is similar to the Reissner–Mindlin model for plate bending and is sometimes named after Naghdi. Assume that the shell consists of a homogeneous isotropic material with Young modulus *E* and Poisson ratio *ν*. Then, the total energy of the shell in our dimension reduction model is expressed as

$$\mathcal{F}(\mathbf{u}) = \frac{1}{2} S(d \, a(\mathbf{u}, \mathbf{u}) + d^3 \, b(\mathbf{u}, \mathbf{u})) - q(\mathbf{u}), \tag{17}$$

where $S = E/(12(1 - \nu^2))$ is a scaling factor, *q* is the external load potential, and *a*(**u**, **u**) and *b*(**u**, **u**) represent the portions of the total deformation energy that are stored in membrane and transverse shear deformations and in bending deformations, respectively. The latter are quadratic forms independent of *d* and defined as

$$\begin{aligned} a(\mathbf{u}, \mathbf{u}) &= a_{\mathrm{m}}(\mathbf{u}, \mathbf{u}) + a_{\mathrm{s}}(\mathbf{u}, \mathbf{u}) \\ &= 12 \int_{\omega} \left[ \nu(\beta_{11}(\mathbf{u}) + \beta_{22}(\mathbf{u}))^2 + (1 - \nu) \sum_{i,j=1}^{2} \beta_{ij}(\mathbf{u})^2 \right] A_1 A_2 \, d\gamma \\ &\quad + 6(1 - \nu) \int_{\omega} \left[ \rho_1(\mathbf{u})^2 + \rho_2(\mathbf{u})^2 \right] A_1 A_2 \, d\gamma, \end{aligned} \tag{18}$$

$$b(\mathbf{u}, \mathbf{u}) \quad = \int\_{\omega} \left[ \nu(\kappa\_{11}(\mathbf{u}) + \kappa\_{22}(\mathbf{u}))^2 + (1 - \nu) \sum\_{i,j=1}^{2} \kappa\_{ij}(\mathbf{u})^2 \right] A\_1 A\_2 \, d\gamma,\tag{19}$$

where *βij*, *ρi*, and *κij* stand for the membrane, transverse shear, and bending strains, respectively. The strain-displacement relations are linear and involve at most first derivatives of the displacement components.

**Remark 1.** *In the following, we shall omit the constant factor d S from the energy expressions. Consequently, all results can be considered to be scaled with a factor* $(d\, S)^{-1}$*.*

Following [16], the bending strains *κij* are

$$\begin{array}{rcl}\kappa_{11} &=& \frac{1}{A_1}\frac{\partial\theta}{\partial x} + \frac{\psi}{A_1 A_2}\frac{\partial A_1}{\partial y},\\ \kappa_{22} &=& \frac{1}{A_2}\frac{\partial\psi}{\partial y} + \frac{\theta}{A_1 A_2}\frac{\partial A_2}{\partial x},\\ \kappa_{12} &=& \kappa_{21} = \frac{1}{2}\left[\frac{1}{A_1}\frac{\partial\psi}{\partial x} + \frac{1}{A_2}\frac{\partial\theta}{\partial y} - \frac{\theta}{A_1 A_2}\frac{\partial A_1}{\partial y} - \frac{\psi}{A_1 A_2}\frac{\partial A_2}{\partial x}\right. \\ && \left. \; - \; \frac{1}{R_1}\left(\frac{1}{A_2}\frac{\partial u}{\partial y} - \frac{v}{A_1 A_2}\frac{\partial A_2}{\partial x}\right) - \frac{1}{R_2}\left(\frac{1}{A_1}\frac{\partial v}{\partial x} - \frac{u}{A_1 A_2}\frac{\partial A_1}{\partial y}\right)\right],\end{array} \tag{20}$$

similarly the membrane strains *βij*

$$\begin{array}{rcl}\beta\_{11} &=& \frac{1}{A\_1}\frac{\partial u}{\partial x} + \frac{v}{A\_1 A\_2}\frac{\partial A\_1}{\partial y} + \frac{w}{R\_1},\\\beta\_{22} &=& \frac{1}{A\_2}\frac{\partial v}{\partial y} + \frac{u}{A\_1 A\_2}\frac{\partial A\_2}{\partial x} + \frac{w}{R\_2},\\\beta\_{12} &=& \beta\_{21} = \frac{1}{2}\left(\frac{1}{A\_1}\frac{\partial v}{\partial x} + \frac{1}{A\_2}\frac{\partial u}{\partial y} - \frac{u}{A\_1 A\_2}\frac{\partial A\_1}{\partial y} - \frac{v}{A\_1 A\_2}\frac{\partial A\_2}{\partial x}\right),\end{array} \tag{21}$$

and finally the shear strains *ρ<sup>i</sup>*

$$\begin{array}{rcl}\rho\_1 &=& \frac{1}{A\_1}\frac{\partial w}{\partial x} - \frac{u}{R\_1} - \theta, \\\rho\_2 &=& \frac{1}{A\_2}\frac{\partial w}{\partial y} - \frac{v}{R\_2} - \psi. \end{array} \tag{22}$$

**Remark 2.** *When the shell parametrisations defined above are used, all terms of the form ∂Ai*/*∂y are identically zero.*

The energy norm ||| · ||| is defined in a natural way in terms of the deformation energy and taking the scaling into account:

$$\mathcal{E}\left(\mathbf{u}\right) := ||\!|\mathbf{u}\,|||^2 = a(\mathbf{u}, \mathbf{u}) + d^2 \, b(\mathbf{u}, \mathbf{u}).\tag{23}$$

Similarly for bending, membrane, and shear energies:

$$\mathbf{B}(\mathbf{u}) := d^2 b(\mathbf{u}, \mathbf{u}), \quad \mathbf{M}(\mathbf{u}) := a\_{\mathrm{m}}(\mathbf{u}, \mathbf{u}), \quad \mathbf{S}(\mathbf{u}) := a\_{\mathrm{s}}(\mathbf{u}, \mathbf{u}). \tag{24}$$

The load potential has the form $q(\mathbf{v}) = \int_{\omega} \mathbf{f}(x, y) \cdot \mathbf{v}\, A_1 A_2 \, dx \, dy$. If the load acts in the transverse direction of the shell surface, i.e., $\mathbf{f}(x, y) = [0, 0, f_w(x, y), 0, 0]^T$, and $\mathbf{f} \in [L^2(\omega)]^5$ holds, then the variational problem has a unique weak solution $\mathbf{u} \in [H^1(\omega)]^5$. The corresponding result is true in the finite-dimensional case when the finite element method is employed.

In the following discussion, both free vibration and buckling problems will be briefly covered. To this effect, the mass matrix is defined as $\mathbf{M}(t) = t\, M_l + t^3 M_r$, with $M_l$ (displacements) and $M_r$ (rotations) *independent of t*.

The geometric stiffness matrix used in buckling analysis in the dimensionally reduced case is still without a universally agreed definition (see Appendix B). Here, in the cylindrical case, the geometric stiffness matrix **U***g* is taken to be the inner product of the axial derivative of the transverse deflection with itself, as suggested in [11]. Formally, $\mathbf{U}_g(\mathbf{v}) = \int_{\omega} (\partial w/\partial x)^2 \, A_1 A_2 \, dx \, dy$.

#### **4. Boundary and Internal Layers**

The theory of one-dimensional *hp*-approximation of boundary layers is due to Schwab [24]. Boundary layer functions are of the form *u*(*x*) = exp(−*a x*/*δ*), 0 < *x* < *L*, where *δ* ∈ (0, 1] is a small parameter, *a* > 0 is a constant, and *L* is the characteristic length scale of the problem under consideration. Even though in certain classes of problems it is possible to choose a robust strategy leading to uniform convergence in *δ*, the distribution of the mesh nodes *depends* on *p*, and over a range of polynomial degrees *p* = 2, ... , 8, say, the mesh is different for every *p*. In 2D, this requires one to allow for the mesh topology to change over the range of polynomial degrees. In this study, the short-range layers have been addressed in the meshes, but optimality in *p* has not been attempted.

It is useful to define the central concepts in a problem-independent manner.

**Definition 1** (Layer Element Width)**.** *For every boundary layer in the problem, one should have an element of width O*(*p δ*) *in the direction of the decay of the layer.*

Note that, with *c* constant, if *c p δ* → *L* as *p* increases, the standard *p*-method can be interpreted as the limiting method. Boundary layers can also occur within the domains, i.e., be internal layers, or emanate from a point. For our discussion, it is useful to define the concept of boundary layer generators (see [3]).

**Definition 2** (Layer Generator)**.** *The subset of the domain from which the boundary layer decays exponentially is called the layer generator. Formally, the layer generator is of measure zero.*

The layer generators are independent of the length scale of the problem under consideration.

The types of layers that a shell structure can exhibit depend on its geometry, that is, on local curvature. Elliptic, parabolic, and hyperbolic structures each possess a distinctive set of layer deformations. The layer structure is classically assumed to be an exponential solution to the homogeneous Euler equations of the shell problem. In [3], it is shown using the Ansatz $\mathbf{u}(\xi, \eta) = \mathbf{U} e^{\lambda \xi} e^{ik\eta}$ that the solutions with $\operatorname{Re} \lambda < 0$, such that the characteristic lengths $L = 1/|\operatorname{Re} \lambda| \to 0$, are of the form $L \sim t^{1/n}$, where $n \in \{1, 2, 3, 4\}$. Here, *ξ* is the coordinate orthogonal to the layer generator. From these, the layer with *n* = 2 is present in all geometries, whereas layers with *n* = 3 and *n* = 4 are present only in hyperbolic and parabolic geometries, respectively. The case *n* = 1 arises from a shear deformation and shows up only when a model similar to the model of Reissner and Naghdi is used in the analysis.

If curved generators are included [5], more characteristic lengths can be found. In particular, for elliptic shells with a parabolic curved layer generator, any *n* ≥ 2 can be induced. Consider a shell structure generated by a rotation around the *y*-axis of the profile $g(x) = \alpha(x - x_0)^m$, $x_0 \le x \le x_1$, so that at *x* = *x*0 the geometry parameters vanish; otherwise, we have an elliptic shell. The solution to the shell problem under unit pressure is a layer deformation on the scale $L = t^{1/(m+1)}$.

Finally, the long-range layers have the characteristic lengths $L \sim t^{-1}$ and $L \sim t^{-1/2}$, where the first one is the axial torsion boundary layer, and the second is, for instance, induced by kinematic constraints on part of a long cylinder, such as a T-junction. These layers are referred to as long-range Fourier modes in Section 3.2 of [3]. The layer chart for a long cylinder is given in Figure 3.

**Figure 3.** Layer structure: Parabolic long cylinder. The expected length scales are indicated with the arrows. The generators are the boundaries at *x* = ±*H*, and the hole in the center. The location of the center does not play a role. If the periodic boundary at *y* = ±*π* is free, under torsion there will be a very long layer ∼ 1/*t*. The hole also generates a short range axial layer, which is not indicated.
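To fix ideas, the characteristic lengths discussed above can be tabulated for the practical thickness range (our sketch):

```python
# Characteristic length scales for the two extreme thicknesses used below.
for t in (1 / 100, 1 / 1000):
    print(t, {"1/t": 1 / t, "1/sqrt(t)": t**-0.5,
              "sqrt(t)": t**0.5, "t^(1/3)": t**(1 / 3), "t^(1/4)": t**0.25})
```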

#### **5. Numerical Simulations**

The three simulation sets considered here have been tabulated in Table 1. They have been selected to illustrate each of the three main types of long-range layers including perforated variants. The relative effects of the perforations on the fundamental natural frequency are given as ratios of the perforated and reference frequencies. For every realization of the shell geometry, five logarithmically equidistant thicknesses ∈ [1/1000, 1/100] have been used. All parabolic cases have been computed using both Naghdi and shallow shell models and as expected, the differences are negligible. The hyperbolic and elliptic cases have been simulated with the Naghdi model only. In all cases where short-range layers have been present at the clamped boundaries, the meshes have been adapted a priori to the corresponding length scales. In all structures with perforation patterns, the holes are free; that is, there are no kinematical constraints.

**Table 1.** The set of simulations with key parameters. The profile functions for the parabolic and hyperbolic cases are *f*(*x*) = 1 and *f*(*x*) = 1 + (1/2)(*x*/*H*)2, respectively. For every individual simulation, five logarithmically equidistant thicknesses ∈ [1/1000, 1/100] have been used. *H* is the half-width, *p* the uniform polynomial order, and *N* is the total number of degrees of freedom. In the non-perforated Slit Shell simulations, the meshes are topologically equivalent. In the perforated cases, the hole coverage percentage is 25%, the triangular pattern is 200 × 10 resulting in 3791 holes, and the regular pattern is 1000 × 10 resulting in 10,000 holes.


The selected polynomial orders can be justified by considering the energy convergence in *p*. In Figure 4, two examples of convergence graphs are shown.

**Figure 4.** Numerical Locking: *p*-convergence in energy. Total energy at *p* = 5 is taken as the reference. (**a**) Circular plate with an elliptic extension under unit pressure on the extension. (**b**) Long slit parabolic cylinder with a regular perforation pattern under torsion loading. In both cases starting at *p* = 4, the relative error in energy is sufficiently small, justifying the selected polynomial orders.

#### *5.1. Wind Turbine: Manhole*

The first simulation concerns the long-range layer on long or tall structures with a kinematically constrained section within the domain. The manhole title is inspired by an example in [8] (Figure 1). The authors discuss the effects of various stiffeners for the manhole of a wind tower. In their FEM analysis figure, the long-range effect is clearly visible but not shown at full length. The construction reported has a dimensionless thickness of 3/100, so it falls within the realm of thin structures.

In Figures 1a and 5, the overall solutions are shown for parabolic and hyperbolic shells, respectively. The overall features predicted in the layer chart of Figure 3 are clearly visible in both cases. The effect of the length *H* of the structure is shown in Figure 6. The long-range layer extends further in the longer cylinder, as expected, since $H = 10 \sim 1/\sqrt{t}$, whereas $H = 100 \gg 1/\sqrt{t}$.

**Figure 5.** Long hyperbolic shell with an ellipsoidal hole. Transverse deflection profile when the dimensionless thickness *t* = 1/100, all boundaries are kinematically fully constrained, *u* = *v* = *w* = *θ* = *ψ* = 0, and the loading is the unit pressure.

**Figure 6.** Two long cylinders: Transverse deflection profiles. In both cases, the dimensionless thickness *t* = 1/100, all boundaries are kinematically fully constrained, *u* = *v* = *w* = *θ* = *ψ* = 0, and the loading is the unit pressure. (**a**) *H* = 10. (**b**) *H* = 100, with the centre section shown.

The asymptotic behavior of the eigenmodes in the free vibration of shells of revolution is known [25]. It is of interest to monitor the effect of the hole on the lowest eigenvalue. Transverse profiles are shown in Figure 7. Since the hyperbolic profile is only mildly hyperbolic and the interval of thicknesses is kept realistic, the profiles appear very similar, with the oscillations only slightly more concentrated in the center in the hyperbolic case. Assuming the same material parameters, the eigenvalue amplification due to the hole is smaller in the hyperbolic case. In both cases, the amplification decreases as *t* → 0. This is because the angular oscillations (wave numbers) increase as *t* → 0, and hence the interaction of the hole and the layers becomes weaker.

**Figure 7.** Free vibration: Transverse deflection profiles of the first eigenmodes. In both cases, *H* = 30 and all boundaries are kinematically fully constrained, *u* = *v* = *w* = *θ* = *ψ* = 0. From top: $t \in \{1/1000,\ 1/(100\sqrt{10}),\ 1/100\}$. Ratio of the observed eigenvalue over the reference eigenvalue: (**a**,**c**,**e**) (Parabolic) – {1.1, 1.3, 1.8}. (**b**,**d**,**f**) (Hyperbolic) – {1.1, 1.3, 1.7}.

Of course, for cylindrical or parabolic shells, there is also the relatively long layer of $\sqrt[4]{t}$ in the angular direction. In Figure 8, this effect is shown by a visual comparison of two profiles corresponding to different thicknesses. Since the eigenmodes oscillate in the angular direction, as seen in Figure 7, the effect of the hole on the eigenvalues is very small, with the increase in the ratio being less than 1%.

**Figure 8.** Manhole: Transverse deflection profiles. In both cases, *H* = 30, boundaries are kinematically fully constrained, *u* = *v* = *w* = *θ* = *ψ* = 0, and the loading is the unit pressure. (**a**) *t* = 1/100, *w* ∈ [−0.10, 0.24]. (**b**) *t* = 1/1000, *w* ∈ [−0.56, 0.38].

**Remark 3.** *For hyperbolic shells, the $\sqrt[4]{t}$ layer does not exist. Instead, there is a $\sqrt[3]{t}$ layer along the characteristics of the surface. In Figure 5, the shell is nearly parabolic in the vicinity of the hole, and therefore this feature is not visible.*

Confirming the correct length scales is difficult since the long-range effects do not occur in isolation: in all cases, the deflection profile is a linear combination of different characteristic features. Using interpolated representations, the long-range effect is associated with the inflection point of the curve. The two thinnest cases are used to estimate the constant of the layer, and this value is used to predict the corresponding locations in the other cases.
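The calibration step can be stated compactly in code. The following is a minimal sketch, assuming each profile is available as a pair of sampled arrays (*x*, *w*) with distances measured from the layer generator; the function names and the finite-difference inflection detector are choices made here, not code from this work.

```python
import numpy as np

def inflection_point(x, w):
    """First inflection point of a sampled profile w(x), located via the
    first sign change of a finite-difference second derivative."""
    d2w = np.gradient(np.gradient(w, x), x)
    idx = np.where(np.diff(np.sign(d2w)) != 0)[0]
    return x[idx[0]] if idx.size else np.nan

def calibrate_layer_constant(profiles, thicknesses):
    """Fit L = c / sqrt(t) using the two thinnest cases, where L is the
    distance of the inflection point from the layer generator."""
    order = np.argsort(thicknesses)[:2]                    # two thinnest cases
    L = np.array([inflection_point(*profiles[i]) for i in order])
    t = np.asarray(thicknesses)[order]
    return float(np.mean(L * np.sqrt(t)))                  # c = L * sqrt(t)

# Prediction for any other thickness: L_pred = c / np.sqrt(t_new)
```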

In Figure 9a, another view of the angular layers is given. In addition, in Figure 9b, the agreement with the predicted length scale is illustrated. However, it appears that the clamped end at *x* = *H* already affects the overall profile. The estimated constant is *c* = 0.85, leading to the model *L* ∼ *c*(1/√*t*). In Figure 10, two long configurations with different geometries have been considered. The displacement graphs over a set of thicknesses show the stronger short-range effects. As can be seen, in the longer cases, *H* = 1000, the agreement is very good indeed. Notice that the good agreement is on the axial rotation component *θ*, which means that domain expertise was necessary to find the right component.

**Figure 9.** Manhole: Transverse deflection profiles and predicted characteristic length scales. Case *H* = 30 with ellipsoidal hole at (1 − *H*, *π*): (**a**) Profiles at *x* = 0. (**b**) Observed characteristic length scales.

**Figure 10.** Long-range layers: Predicted characteristic length scales. In all cases, the inflection points have been computed for the two thinnest cases and the constant has been set with these points. The constants are, from left to right: *c*<sub>1</sub> = 3.5, *c*<sub>2</sub> = 3.25, leading to models *L* ∼ *c<sub>i</sub>*(1/√*t*), *i* = 1, 2. The rest of the points have been selected based on the theoretical prediction. The agreement is surprisingly good. (**a**) Parabolic case, center hole, *H* = 1000, *θ*-component. (**b**) Hyperbolic case, center hole, *H* = 1000, *θ*-component.

#### *5.2. Slit Shells: Torsion Effect*

The torsion layer ∼ 1/*t* is naturally the most difficult to recover from simulation data. The effect on a slit cylinder is illustrated in Figure 11. The boundary at *x* = −*H* is clamped, *u* = *v* = *w* = *θ* = *ψ* = 0, but all other boundaries are free. The loading is a unit torsion load acting on the boundary at *x* = *H*. The technique used in the previous case was not successful here; indeed, the exact layer could not be found from the data. However, the effect of *H* can be deduced indirectly by observing the rate of change of curvature of the displacement profile over a set of thicknesses after the loading is scaled by *t*<sup>3</sup>. In Figure 12a, for *H* = 100, the trends for both geometries, including a perforated parabolic case, are constant. In Figure 12b, for *H* = 1000, the observed trends are somewhat polluted, but the important aspect for this discussion is that as the shell becomes longer, the torsion effect becomes milder, indicating the presence of a long-range layer. Since this phenomenon also exists at *H* = 1000 > 1/*t* for *t* = 1/100, it is likely that it is due to the corresponding predicted layer.

**Figure 11.** Slit cylinder: Transverse deflection profile. Perforation pattern is triangular with hole coverage of 25%.

**Figure 12.** Slit shell of revolution: Effect of the length of the shell on torsion loading acting on the free end. Parabolic and hyperbolic cases, with a perforated parabolic one as the third option. The torsion layer leads to a smaller effect close to the fixed boundary. Shown is the observed transverse deflection at *x* = 20 along the midline of the surface. (**a**) *H* = 100. The slope of the log–log graphs is 3. (**b**) *H* = 1000. The slope of the log–log graphs is 2.
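The reported slopes can be checked with a least-squares fit in log–log coordinates. A minimal sketch, assuming the scaled deflections at *x* = 20 are collected in arrays:

```python
import numpy as np

def loglog_slope(t, response):
    """Least-squares slope of log(response) vs. log(t); a slope of 3 matches
    the t^3-scaled torsion responses reported for H = 100 in Figure 12a."""
    slope, _intercept = np.polyfit(np.log(t), np.log(response), 1)
    return slope
```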

Interestingly, the first eigenmode corresponds to the kind of rotation caused by the chosen loading in the static case, and therefore the figure has been omitted. Now, the perforated case is softer (the holes are free), and hence the ratios are less than one, and decreasing as *t* → 0. For *t* = 1/100, the ratio is 0.64, whereas for *t* = 1/1000, it is 0.42.

#### *5.3. Curvature Effect*

The third and final simulation adds two aspects: First, the layer generator is curved, and second, the shell geometry is not of uniform type. The shell of revolution is formed by letting the profile function be *g*(*x*) = 1 for *x* ∈ [1, 1 + *π*], and *g*(*x*) = 1 + (*x* − (1 + *π*))<sup>*α*</sup> for *x* ∈ [1 + *π*, 1 + 2*π*]. In other words, the inner section is a plate and the outer section is an elliptic extension with curvature dependent on *α*. One realization is shown in Figure 1b. The parameter *α* determines the length of the layer. Two sets of simulations, with *α* = 2 or *α* = 3, are computed on the multipanel mesh of Figure 13a. The inner holes are free, and therefore as *α* increases and *t* → 0 for a fixed loading, the displacement amplitudes increase due to the sensitivity of the problem. However, for a given range of thicknesses, the response of the structure is reasonable (see Figure 13b).

**Figure 13.** Circular plate with an elliptic extension. Pressure load is acting on the right-hand-side half of the domain. The boundaries at *x* = 1 and *x* = 1 + 2*π* are clamped, *u* = *v* = *w* = *θ* = *ψ* = 0, and the *y* = 0 and *y* = 2*π* boundaries are periodic. The inner circular hole has radius 1. (**a**) Multipanel mesh. (**b**) Quadratic extension, *w*-component (transverse deflection), *t* = 1/100.

For small values of *α* the recovery of the length scales is successful (see Figure 14). In the previous study on curved generators, the plate was not present, and the structure was not perforated. This clearly indicates that these effects are features of the elasticity equations irrespective of the effective material properties.

**Figure 14.** Circular plate with an elliptic extension. Profiles along *y* = *π*. As above, in all cases the inflection points have been computed for the two thinnest cases and the constant has been set with these points. The constant is the same in both cases: *c* = 2.25, leading to the model *L* ∼ *c* *t*<sup>1/(*α*+1)</sup>. The rest of the points have been selected based on the theoretical prediction. Again, the agreement is very good indeed. (**a**) Quadratic extension, *w*-component (transverse deflection). (**b**) Cubic extension, *w*-component (transverse deflection).

The eigenanalysis becomes more involved in this case. As can be seen in Figure 15, the (relative) transverse deflection profiles are dramatically different. This is due to the free holes in the elliptic part exhibiting sensitivity [6]. The eigenmode includes strong oscillations in the elliptic part, a phenomenon that has no corresponding effect in the reference case, where the oscillation is confined to the plate section. It is due to sensitivity that the eigenvalue ratios increase as *t* → 0. For *t* = 1/100, the ratio is 0.26, whereas for *t* = 1/1000, it is 0.48.

**Figure 15.** Circular plate with an elliptic extension. First eigenmodes at *t* = 1/100. The boundaries at *x* = 1 and *x* = 1 + 2*π* are clamped, the *y* = 0 and *y* = 2*π* boundaries are periodic. The inner circular hole has radius 1. (**a**) Reference domain. (**b**) Perforated domain, *w*-component (transverse deflection).

#### **6. Conclusions**

Shell structures have a rich family of boundary layers, including internal layers. Each layer has its own characteristic length scale, which depends on the thickness of the shell. The long-range layers do exist and, in certain problem classes, play an important role. Interestingly, since they are not well known, recognizing them in the right contexts might bring forward new modeling ideas for some problems.

The simulation of such structures is subject to numerical locking, and high-order finite element methods provide one way to derive reliable solutions. The observed asymptotic behavior is consistent with the theoretical predictions. These layers are shown to also appear on perforated structures, underlining the fact that these features are properties of the elasticity equations and do not depend on the effective material properties.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** I would like to thank the anonymous referees for suggestions that improved the paper considerably.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A. Mathematical Shell Model**

In the following, the Naghdi model is simplified by assuming that *ω* is a domain expressed in the coordinates *x* and *y*. The curvature tensor {*b<sub>ij</sub>*} of the midsurface is assumed to be constant, with *a* = *b*<sub>11</sub>, *b* = *b*<sub>22</sub>, and *c* = *b*<sub>12</sub> = *b*<sub>21</sub>. The shell is then called elliptic when *ab* − *c*<sup>2</sup> > 0, parabolic when *ab* − *c*<sup>2</sup> = 0, and hyperbolic when *ab* − *c*<sup>2</sup> < 0. The above assumptions are valid when the shell is shallow, i.e., the midsurface differs only slightly from a plane. In the simplest case, one may set d*ω* = d*x*d*y* and write the relation between the strain and the displacement fields as

$$\begin{aligned} \beta_{11} &= \frac{\partial u}{\partial x} + aw, \quad \beta_{22} = \frac{\partial v}{\partial y} + bw, \quad \beta_{12} = \frac{1}{2}\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right) + cw, \\ \rho_1 &= \theta - \frac{\partial w}{\partial x}, \quad \rho_2 = \psi - \frac{\partial w}{\partial y}, \\ \kappa_{11} &= \frac{\partial \theta}{\partial x}, \quad \kappa_{22} = \frac{\partial \psi}{\partial y}, \quad \kappa_{12} = \frac{1}{2}\left(\frac{\partial \theta}{\partial y} + \frac{\partial \psi}{\partial x}\right). \end{aligned} \tag{A1}$$

This choice of shell model gives us additional flexibility in the design of the numerical simulations since the model admits *non-realizable* shell geometries. This is due to the assumption that the local curvatures are constant at every point of the surface.

Remarkably, for parabolic shells these strains differ from those of the standard Naghdi model only in *κ*<sub>12</sub> and *ρ*<sub>1</sub>, when the radius is 1. Naturally, for non-parabolic geometries the differences are much more extensive. Notice that the resulting system has constant coefficients, which simplifies the implementation of the model significantly.
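For illustration, the classification and the strain relations (A1) can be transcribed symbolically. The sketch below, in Python with sympy, shows the constant-coefficient structure of the model; it is not the solver used for the simulations in this work.

```python
import sympy as sp

x, y = sp.symbols("x y")
a, b, c = sp.symbols("a b c")   # constant curvatures b11, b22, b12 = b21

def shell_type(a_val, b_val, c_val):
    """Classify the shallow shell by the sign of a*b - c**2."""
    d = a_val * b_val - c_val ** 2
    return "elliptic" if d > 0 else ("parabolic" if d == 0 else "hyperbolic")

def strains(u, v, w, theta, psi):
    """Strain components (A1) of the simplified model for given
    displacement fields u(x, y), ..., psi(x, y)."""
    beta11 = sp.diff(u, x) + a * w
    beta22 = sp.diff(v, y) + b * w
    beta12 = sp.Rational(1, 2) * (sp.diff(u, y) + sp.diff(v, x)) + c * w
    rho1 = theta - sp.diff(w, x)
    rho2 = psi - sp.diff(w, y)
    kappa11 = sp.diff(theta, x)
    kappa22 = sp.diff(psi, y)
    kappa12 = sp.Rational(1, 2) * (sp.diff(theta, y) + sp.diff(psi, x))
    return beta11, beta22, beta12, rho1, rho2, kappa11, kappa22, kappa12

print(shell_type(1, 0, 0))   # parabolic (cylinder-like)
```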

#### **Appendix B. On Buckling Modes**

The critical load of the real shell is known to be very sensitive to small geometric imperfections and deviations in boundary conditions, which are difficult to take into account in linear or nonlinear stability theory. As a result, theoretical and experimental results do not agree well in many loading scenarios. In any case, the linear stability theory provides useful information regarding the buckling behavior of thin shells [11].

The first buckling modes are shown in Figure A1. Interestingly, even in the reference case, the lowest mode can exhibit axial oscillations. In the profile of the thinnest configuration with a hole, symmetry appears to be lost. This is due to the extreme ill-conditioning of the problem. Also in contrast with the free vibration, here the eigenvalue ratio between the perforated and reference configurations does not change as *t* → 0 (see Figure A2). In both cases, the dependence is linear, which is in fact a new result. It should be noted that in simulations with the given buckling model, the observed spectrum is clustered for every fixed thickness. For instance, the relative difference within the first ten modes is less than 1% in the case with a hole, and a fraction higher in the reference case. In fact, it has been proposed that the Shapiro–Lopatinsky conditions are not satisfied in the limit and the coercivity is lost, just as for the sensitive elliptic shells [6].

**Figure A1.** Buckling modes: Transverse deflection profiles of first modes. In both cases, *H* = 30, the hole is kinematically fully constrained, *u* = *v* = *w* = *θ* = *ψ* = 0, and the ends have *v* = *w* = *ψ* = 0. From top: *t* ∈ {1/1000, 1/(100√10), 1/100}. (**a**,**c**,**e**) Parabolic reference. (**b**,**d**,**f**) Parabolic with a hole.

**Figure A2.** Buckling modes: Linear dependence of the observed smallest *λ* (eigenvalue corresponding to the critical load) on the thickness. Thick line: Reference. Dashed line: With hole.

This set of simulations simply illustrates the inherent complexity of the buckling problem in this context. Many fundamental concepts remain open, starting from the selection of the right model and kinematic constraints.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **A Review of Coastal Protection Using Artificial and Natural Countermeasures—Mangrove Vegetation and Polymers**

**Deborah Amos \* and Shatirah Akib**

Department of Civil Engineering, School of Architecture, Design and the Built Environment, Nottingham Trent University, Nottingham NG1 4FQ, UK

**\*** Correspondence: deborah.amos2021@my.ntu.ac.uk

**Abstract:** Any stretch of coastline requires protection when the rate of erosion exceeds a certain threshold and seasonal coastal drift fluctuations fail to restore balance. Coastal erosion can be caused by natural factors, human-made factors, or a combination of the two. Severe storm occurrences, onshore interventions responsible for sedimentation, wave action on the coastlines, and rising sea levels caused by climate change are instances of natural factors. The protective methods used to counteract or prevent coastal flooding are categorized as hard and soft engineering techniques. This review paper is based on extensive reviews and analyses of scientific publications. In order to establish a foundation for the selection of appropriate adaptation measures for coastal protection, this research compiles literature on combined natural and artificial models using mangrove trees and polymer-based model configurations and their efficiency in mitigating coastal flooding. Mangrove roots occur naturally and cannot be manipulated, unlike artificial model configurations, which can be structurally configured with different hydrodynamic properties. Artificial models may lack the real structural features and hydrodynamic resistance of the mangrove roots they depict, and this can reduce their real-life applicability and accuracy. Further research is required on the integration of hybrid configurations to fully optimize the functionality of mangrove trees for coastal protection.

**Keywords:** hard engineering techniques; soft engineering techniques; coastal protection; hybrid configuration; hydrodynamic resistance

#### **1. Introduction**

In the coastal region, dry land and a maritime environment (water and submerged land) coexist in a zone where terrestrial functions and land uses directly affect the marine environment, and vice versa. Physical factors such as tides, waves, nearshore eddies, sand movement, and rivers impact coastlines. In several coastal cities worldwide, coastlines harbor ecosystems and habitats that generate goods and services for the local population. Coastal areas also serve as the origin or backbone of national economies [1].

According to Zanuttigh, erosion and flooding presently pose serious hazards to coastal communities, so developing defense mechanisms capable of dealing with the rising sea level and more frequent storms caused by climate change is a significant challenge [2]. Different techniques are used to protect coastlines against erosion, including hard engineering and soft engineering. In hard engineering, solid structures are used to withstand erosion pressures, such as dikes, embankments, piers, revetments, and breakwaters. The use of soft engineering methods of coastal protection involves taking into consideration all aspects of preservation, including environmental, sociological, and economic aspects, and utilizing smaller structures made of natural materials. Currently, many parts of the world prefer natural coastal defenses that employ vegetation such as mangroves [3–6].

Mangroves are vegetation formations that develop on alluvial soils in coastal and estuarine locations which are frequently inundated by ocean tides. Researchers have extensively studied the performance of mangrove forests in reducing waves caused by erosion, including [7–9], who conducted laboratory experiments on mangroves as coastal protection.

**Citation:** Amos, D.; Akib, S. A Review of Coastal Protection Using Artificial and Natural Countermeasures—Mangrove Vegetation and Polymers. *Eng* **2023**, *4*, 941–953. https://doi.org/10.3390/ eng4010055

Academic Editor: Antonio Gil Bravo

Received: 2 November 2022 Revised: 13 February 2023 Accepted: 13 February 2023 Published: 8 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Mangrove trees have been proven and used in several locations as solid structures capable of shielding coastlines against erosion. Establishing such natural coastal protection has nonetheless posed problems for decades; for example, mangrove seedlings can be destroyed by waves or tides before they have a chance to grow firmly, which requires at least two years after planting. Planting them requires temporary structures, according to Verhagen [10]. As a result of this challenge, a natural coastal protection system combining natural and temporary artificial structures is recommended [11].

Mangroves grow in tidal zones along estuaries and coastal areas. When considering mangrove regeneration, it is crucial to consider the appropriate habitat and planting strategy. Species are selected based on the species already present in the surrounding region as well as on access to seed. Yuanita et al. conducted a physical modelling experiment on four different types of model settings, without mangroves and with the presence of mangroves; the mangrove seedling model was made of iron bars and tested in a wave flume [12].

Several studies currently indicate that floods attenuate differently, but they fail to address the role of major factors such as slope bathymetry, forest area, forest channelization, plant density, and flood amplitudes and durations in determining those variations. More research is needed to understand how forest and storm features impact flood attenuation rates in mangrove forests so that informed decisions can be made about mangrove management. Natural resources necessary for human survival and growth have historically been found in coastal areas [13]. Today, coastal areas remain attractive due to their abundant ecological benefits. The majority of big cities, such as New York, Tokyo, Shanghai, and London, being located near coastal regions, as argued by Nicholls [14], and the population density in coastal regions being three times the global mean, are indicative of society's desire to live near the shore [15]. This is particularly important in the UK, given its diverse coastal areas and the socioeconomic status of its coastal communities.

Flooding and coastal erosion caused by coastal storms are becoming more common as a consequence of climate change, and tidal marshes and mangroves can help reduce these threats [16]. The researchers argue, however, that tidal wetlands are not able to reduce all risks equally, and that hazard reduction is governed by specific conditions. Because of severe weather conditions, wetland qualities, and relatively large coastal terrain geometries, long-period severe storms that raise ocean levels by several metres for about a day are less effectively attenuated. Although storm damage to vegetation (especially mangrove trees) is often severe, and recovery can take years, wetlands generally assist in reducing erosion.

Slinger and Vreugdenhil demonstrated the importance of nature-based solutions for coastal management using a critical reflection technique centered on the design process. They distinguish four axes in attempting to determine the extent to which a hydraulic infrastructure forms a nature-based solution: the degree of inclusion of ecological knowledge; the extent to which the full infrastructural lifecycle is addressed; the complexity of the actor arena considered; and the resulting form of the infrastructural artefact. They classified traditional and new sea defense facilities on the North and South Holland coasts along these axes, indicating how nature-based the newly implemented solutions are and how broadly societal values and stakeholders are included in the design process [17].

#### **2. Coastal Engineering Protection**

The coastal zone is a sensitive area in which the balance could be disturbed by a variety of factors; therefore, engineers, planners, and government agencies must pay close attention to detail before proceeding with any engineering activities along the coastline. Coastal behavior is largely site-specific, which means that a host of different factors must be considered closely. It is important to ensure that activities near beaches are ecologically sound, particularly in metropolitan areas. In coastal protection, measures are classified as hard (gabions, seawalls, offshore detached breakwaters) and soft (artificial nourishment of beaches, bio shields/vegetation, dune stabilization, geosynthetic application) as discussed previously.

#### *2.1. Hard Engineering Techniques*

Typically, hard engineering consists of the erection of gravity infrastructure made up of dunes, concrete structures, or rubble with a trapezoidal cross-section that is designed to withstand the waves along the shoreline. Structures built along the coast are often irreparably damaged. Many of the projects are undertaken to provide a quick solution to erosion concerns; they are most effective when they are meticulously constructed with a comprehensive understanding of the wave climate, the local bathymetry, and the sediment properties. Hard engineering structures along the coasts include groins, seawalls, breakwaters, and offshore breakwaters (emerged and submerged). Hard engineering approaches have strong impacts on the environment and are expensive to implement and maintain.

#### 2.1.1. Seawall

This structure prevents erosion immediately along a coastal stretch, but it may not contribute to or expand beach width. It may be necessary, however, in many circumstances to regularly repair seawalls, especially those constructed of rubble mounds. Figure 1 shows a cross-section of a seawall. There are several practical obstacles in transporting tonnes of rubble mound to the beach, as well as in continually fabricating concrete structures to drop along the coastlines. Gravity systems and hard methods can be efficient if the local soil structure is sustainable and construction materials are easily obtained at the construction site. The disadvantage of a seawall is that waves can erode the wall, defeating its purpose.

**Figure 1.** Cross Section of a Seawall. Adapted with permission from Ref. [16], 2013, Firth et al. More details on "Copyright and Licensing" are available via the following link: https://www.mdpi.com/ethics#10 (accessed on 12 February 2023).

#### 2.1.2. Gabion

According to [16], the utilization of gabion boxes as submarine reefs could be considered a soft engineering solution to counteract coastal flooding, since they contribute to fostering ocean life around them. Sundar and Murali went into detail about the use of gabions around the Kerala coast. Gabion boxes are considered a hard engineering solution when used as an alternative to rubble or concrete armour layers in traditional shore-linked structures. The gabion boxes were originally used to repair a damaged seawall cross-section. Although gabions are a hard engineering structure, they are not very attractive or effective [17].

#### 2.1.3. Offshore Detached Breakwater

Breakwaters that are detached offshore generate areas of low energy on their leeside, allowing for the creation of salients and, eventually, tombolos, the details of which are described by Sukanya [18]. The cost and time involved in the construction of offshore detached breakwaters have prevented them from being implemented over impacted portions of the Indian coastline. Waves are deflected by the breakwaters' ends, creating a quiet zone between them. They run offshore parallel to the eroding shoreline and contribute to the formation of salients in time, which in turn leads to the formation of tombolos, which enlarge the beach. Furthermore, they can also be built as part of wave energy conversion (WEC) systems, such as an oscillating water column [19].

#### *2.2. Soft Engineering Techniques*

Soft engineering measures are well-designed measures that have little or no impact on the coastal environment. As opposed to hard measures, they take longer to implement, and measures like these require extensive knowledge. In this category, artificial beach nourishment and natural vegetation are two of the most common solutions. Over time, geo-synthetics were increasingly used in coastal protection measures, where polymer-based synthetic fibers were utilized for drainage, separation, filtration, and retention [20]. The term "soft structures" refers to structures completely or partially composed of geo-synthetic materials, such as seawalls, underwater breakwaters, and submarine reefs.

#### 2.2.1. Coral Reef

Coral reefs' unique structures, some emerging from deep levels to the surface of the ocean and in many cases extending parallel to coasts for tens or hundreds of kilometers, place them on the front line of coastal protection. The structural geometry and ruggedness of reef formations determine their impact on currents and waves. This complicated structure is a result of the biotic proliferation of habitat-forming organisms, particularly hard corals and coralline algae. In addition to reducing coastline flooding, reef roughness has been found to have a substantial impact on reducing massive energy flows from underlying seas into the reef structure, greatly slowing the action of waves [21,22].

In tropical regions, coral reefs play an extremely important role in dispersing wave energy. On the other hand, fragmented reef patches and channels may be able to enhance or direct tidal energy locally [23,24]. The impact of storms on habitat and the kind of coastal protection provided by reefs must also be acknowledged [25].

Sea level rise also poses a critical threat to reef structures, including the beaches and islands connected to them. As evidenced by geological data from the Great Barrier Reef, coral reefs may grow rapidly [26], although such growth is dependent on reef stability. There are many regions where land has formed from coral reef deposits that are sculpted into beaches and islands by storms and sometimes augmented by windblown sediments [27]. A large-scale analysis of Pacific islands found that, while some islands are shrinking in area and many have dynamic borders, these mechanisms could be adequate to allow sustained island expansion or maintenance under some conditions of rising sea levels: despite the slight rise in sea levels that has occurred to date, the total area of coral islands appears to have expanded [28], although coastal development and climate change, particularly ocean acidification, may alter such processes.

Psychologically, sea level rise, as well as the possibility of sea level rise associated with changes in island sediment migration, may pose serious risks even if landmasses are not substantially reduced [27,29,30]. As with coastal wetlands, reef formations have spatially varying impacts on coastal protection. The primary sources of variability are listed in Table 1. A better understanding of these causes and their measurement is crucial to fully analyze how well a reef protects.


**Table 1.** Significant variation determinants in the coastal protection function of coastal wetlands and coral reefs. Coral reef data are based on [21,31–36]. Wetlands data are based on [4,35,37–39].

#### 2.2.2. Mangrove Forest

As noted in the Introduction, mangrove trees have been proven and used in several locations as solid structures capable of shielding coastlines against erosion, but mangrove seedlings can be destroyed by waves or tides before they have a chance to grow firmly, which requires at least two years after planting; temporary structures are therefore required [10]. A natural coastal protection system combining natural and temporary artificial structures is consequently recommended [12]. Mangrove species are selected based on the species already present in the surrounding region as well as on access to seed. Figure 2 illustrates a natural coastal protection system.

**Figure 2.** Temporary Structures and Natural Coastal Protection [12]. Reprinted/adapted with permission from Ref. [12]. 2019, Yuanita et al. More details on "Copyright and Licensing" are available via the following link: https://www.mdpi.com/ethics#10 (accessed on 12 February 2023).

A mangrove ecosystem is a tropical or subtropical wetland forest located between the land and the ocean composed of saltwater-adapted trees, shrubs, palms, and ferns. As mangroves grow at or above mean sea level, floods vary from near-constant to irregular [40–44]. Giri et al. estimate that mangrove ecosystems cover 152,400 square kilometres worldwide, distributed across 123 nations, and account for 30–35% of tropical wetland forests [43–45]. They are not one morphological group, but rather a variety of plant species with special adaptations that allow them to survive in the severe intertidal environment [40,42,46].

#### Traits and Adaptation

With both physiological and morphological adaptations, mangrove plant species are uniquely adapted to frequently waterlogged, saline, and turbulent intertidal environments, including:


*Rhizophora* species have tall lateral prop roots (or stilt roots), as, in some cases, do *Avicennia* spp. (e.g., *A. officinalis*). Shallow but far-reaching aerial roots producing surface-penetrating pneumatophores in *Avicennia*, *Laguncularia*, *Lumnitzera*, *Sonneratia*, and *Xylocarpus* spp., surface-penetrating knee roots in *Bruguiera*, *Ceriops*, and *Xylocarpus* spp., plank roots in *Camptostemon* and *Xylocarpus* spp., and buttress-forming stems in *Heritiera* and *Kandelia* spp. assist in stabilising mangrove stems (Figure 3; [41,43,46,47]). In the saline intertidal zone, high root:shoot ratios are a key factor for absorbing water [47], as well as for tolerance of strong intertidal disturbances [48]. The presence of lenticels on surface-penetrating aerial roots allows root aeration in anaerobic, water-logged sediments [46,47]. Species such as *Aegialitis*, *Aegiceras*, *Avicennia*, *Bruguiera*, *Ceriops*, *Excoecaria*, *Osbornia*, *Rhizophora*, and *Xylocarpus* spp. are also capable of excluding salt from their tissues by ultrafiltration; other species actively secrete salt from tissues, such as *Acanthus*, *Aegialitis*, *Aegiceras*, *Avicennia*, *Laguncularia*, and *Sonneratia* species, or shed it in senescent leaves, such as *Excoecaria* and *Xylocarpus* spp. [46,47].

**Figure 3.** Mangrove Species. Reprinted/adapted with permission from Ref. [41], 2010, Polidoro et al. More details on "Copyright and Licensing" are available via the following link: https://www.mdpi. com/ethics#10 (accessed on 12 February 2023).

Mangrove species make significant investments in leaf production, often producing lengthy, fleshy leaves with tough outer epidermal layers and specialised salt excretory glands that reduce transpiration losses at the expense of a reduced number of leaves and reduced photosynthetic activity [41,46,49]. *Rhizophora apiculata* mangrove seedlings can be planted on the shore when they are more than 30 cm tall and have four leaves. Seedlings of *Rhizophora mucronata* can be planted when they are at least 55 cm tall and have at least four to six leaves. As mangroves cannot thrive in permanently wet or permanently dry conditions, they should be planted in places where both wet and dry conditions occur daily.

#### Experimental Models of Coastal Protection

Yuanita et al. conducted physical modelling experiments on four distinct types of model settings, without mangroves and with the presence of mangroves [12]; the mangrove seedling model was made of iron bars and tested in a wave flume. The different configurations are illustrated in Figure 4.

**Figure 4.** Types of Configuration (**a**) staggered arrangement 10 cm; (**b**) tandem arrangement 10 cm; (**c**) tandem arrangement 5 cm. Reprinted/adapted with permission from Ref. [12]. 2019, Yuanita et al. More details on "Copyright and Licensing" are available via the following link: https://www.mdpi. com/ethics#10 (accessed on 12 February 2023).

In this study, the influence of the mangrove model was examined using the wave transmission coefficient (*Kt*). The transmission coefficient is the ratio of the transmitted wave height (*Ht*) to the incident wave height (*Hi*). To determine the transmitted wave height (*Ht*), data from wave gauge *CH3* were used, while the incident wave height (*Hi*) was determined from data from wave gauge *CH2*.

$$K_t = \frac{\text{Transmitted Wave Height}}{\text{Incident Wave Height}} = \frac{H_{t\,(CH3)}}{H_{i\,(CH2)}}\tag{1}$$
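For illustration, Equation (1) can be evaluated directly from gauge records. The sketch below is a minimal example with synthetic sinusoidal records standing in for the CH2 and CH3 data; the use of 4σ of the elevation record as a wave-height estimate is our assumption, not the procedure of [12].

```python
import numpy as np

def significant_wave_height(eta):
    """H_s estimated from the surface elevation variance (H_s ~ 4 * std)."""
    return 4.0 * np.std(eta)

def transmission_coefficient(eta_ch2, eta_ch3):
    """Kt = Ht / Hi with the incident record from CH2 and the
    transmitted record from CH3, as in Equation (1)."""
    return significant_wave_height(eta_ch3) / significant_wave_height(eta_ch2)

# Synthetic example records standing in for real gauge data:
t = np.linspace(0.0, 60.0, 6000)
eta_i = 0.05 * np.sin(2 * np.pi * t / 1.5)          # incident wave at CH2
eta_t = 0.03 * np.sin(2 * np.pi * t / 1.5 + 0.4)    # transmitted wave at CH3
print(f"Kt = {transmission_coefficient(eta_i, eta_t):.2f}")  # -> Kt = 0.60
```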

The research objective was to examine the wave height reduction with different mangrove densities and to investigate the effect of mangrove seedling tree patterns on wave attenuation. The experimental testing was carried out in a narrow wave flume with a mangrove model as the primary natural barrier and geotextile geo-bag models as a temporary constructed structure. During this laboratory experiment, several wave scenarios were established. The study focused on the wave propagation findings over mangrove seedling trees in order to discover the most effective configuration of mangrove tree planting against waves. The results revealed that the wave height reduction in areas with mangroves was twice as large as that over bare land [12].

During the research, it was also discovered that the difference in wave attenuation between the tandem and staggered tree configurations was about 20%, and that the temporary structure considerably reduces wave height and protects the growth of mangrove seedlings against wave action.

Safari et al., in their study, computed the transmission coefficient (*Kt*) as the ratio of the residual wave height after the models to the incident wave height before the models, as in Equation (2) [50,51]. To overcome the limitations of previously described armour blocks, Hogue et al. investigated a newly designed armour unit, called 'The Starbloc®', which is made up of a centralized hexagonal core, three legs, and two noses. Its structural characteristics facilitate simplified mobility, better positioning, and better hydraulic stability [52].

$$K_t = \frac{\text{Wave Height after Models}}{\text{Wave Height before Models}} = \frac{H_{aft}}{H_{bfr}}\tag{2}$$

An experimental investigation of the efficiency of artificial Xbloc walls made of hybrid polymer and mangrove root models for water wave defense was conducted by Safari et al. (Figure 5) [51].

**Figure 5.** Hybrid configuration using one Xbloc wall with two mangrove roots in a 5 m flume tank. Reprinted/adapted with permission from Ref. [51]. 2018, Safari et al. More details on "Copyright and Licensing" are available via the following link: https://www.mdpi.com/ethics#10 (accessed on 12 February 2023).

Three Xbloc pieces were placed on top of each other and bonded with water-resistant adhesive to form one Xbloc wall. Software such as SolidWorks and AutoCAD was used to create the artificial models, which were 3D printed, laser cut, and superglued. The test was carried out using a variety of single and multiple Xbloc barriers and mangrove root simulations. For six alternative model setups, changes in wavelength, height, celerity, and period were recorded. The results showed that the celerity, height, and wavelength were successfully reduced, while the wave period (the duration of one cycle) was lengthened.

In the research carried out, it was discovered that the hybrid configuration of one Xbloc wall and two mangrove roots gave the best protection, lowering the wavelength, celerity, and height by 5.50%, 26.46%, and 58.97%, respectively, and lengthening the wave period by 28.34%. The configuration with only one set of mangrove root models had the lowest attenuation. As a result, wave reduction utilizing the hybrid action of artificial polymer-made Xbloc walls and mangrove roots was superior, since it permitted wave energy dissipation to a larger extent than using just Xbloc walls or mangrove roots alone.

As shown in Equation (3), Zwicht computed the transmission coefficient in consideration of wave height and wave energy [53]. The study specifies the reflection and dissipation coefficients as two additional wave attenuation analysis factors; the linking method involves the energy balance among the three factors.

$$K_t = \frac{\text{Transmitted Wave Energy}}{\text{Incident Wave Energy}} = \frac{E_{m0,t}}{E_{m0,i}} \tag{3}$$
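Since the zeroth spectral moment equals the variance of the surface elevation record, Equation (3) can be approximated directly from gauge data. A minimal sketch (our formulation, not Zwicht's implementation):

```python
import numpy as np

def energy_transmission_coefficient(eta_i, eta_t):
    """Equation (3): the zeroth spectral moment m0 equals the variance of
    the surface elevation, so E_m0,t / E_m0,i reduces to a variance ratio."""
    return np.var(eta_t) / np.var(eta_i)
```

Because wave energy scales with the square of wave height, this energy-based coefficient is simply the square of the height-based coefficient in Equation (1).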

Hogue et al. investigated the irregular-wave attenuation performance of mangrove forests in terms of wave dissipation, reflection, and transmission coefficients. The experiment was carried out in a Twin Wave Flume (TWF), with the bigger flume containing quantified *Rhizophora* sp. mangrove trees and the smaller flume without them. *Rhizophora* sp. was extremely efficient in minimising tsunami-induced flow due to the complexity and thickness of its root system. The wave energy diminished exponentially throughout the flume forest area, and the amount of energy dissipated decreased from the front of the vegetation to the end, with most of the wave attenuation occurring at the front of the mangrove forest [52].

Artificial coastal protection measures were examined by Zwicht, who analyzed the effect of concrete unit weight on hydraulic stability, along with the ability to establish an appropriate computational model of the stability number (Ns) [53]. Based on the model testing, it was evident that as the specific weight increases, so does the hydraulic stability; however, when factoring in the impact of varied gradients, different results were obtained. For gradients of 2:3 and greater, stability was observed to be higher than predicted by the previous Ns equation, whereas stability was lower for gradients of 1:2. During coastal protection, the stability of armour block units depends on their structure, packing density, and deployment pattern (random or organized). Accropode® and Xbloc®, which are single-layer interlocking armour block units, can be damaged by oscillations. As a result of a weak foundation or inadequate interconnections, blocks wobble during this phase of destruction, causing deviations from their optimum state.

A laboratory experiment of wave attenuation through cylinder arrays, mimicking wave attenuation through a coastal mangrove forest, was conducted by Phan et al. in a flume of the Fluid Mechanics Laboratory at Delft University of Technology. The effective length, height, and width of the flume are 40 m, 1 m, and 0.8 m, respectively. A numerical model was constructed based on the SWASH model using Morison's equation, shown in Equation (4) [54].

$$F_x = \frac{1}{2}\rho C_D h_v b_v N_v u\,|u| \tag{4}$$
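Equation (4) is straightforward to evaluate for a given velocity record. The sketch below uses illustrative parameter values that are assumptions for demonstration, not the values used in [54].

```python
import numpy as np

def vegetation_drag(u, rho=1000.0, C_D=1.0, h_v=0.15, b_v=0.01, N_v=100):
    """Equation (4): F_x = 0.5 * rho * C_D * h_v * b_v * N_v * u * |u|.
    rho -- water density (kg/m^3); C_D -- bulk drag coefficient;
    h_v -- vegetation height (m); b_v -- stem diameter (m);
    N_v -- number of stems per unit area (1/m^2)."""
    return 0.5 * rho * C_D * h_v * b_v * N_v * u * np.abs(u)

# Drag under a synthetic sinusoidal orbital velocity:
t = np.linspace(0.0, 10.0, 1000)
u = 0.3 * np.sin(2 * np.pi * t / 2.0)   # horizontal velocity (m/s)
F = vegetation_drag(u)                  # force per unit bottom area (N/m^2)
```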

The physical model was constructed in such a way that the numerical results can be directly compared with the experimental results. A wide variety of wave characteristics, such as regular, irregular, broken, and non-broken waves, were used in the experiment to obtain additional information. The findings support the idea that vegetation can reduce wave heights. Furthermore, the vegetation influenced the set-down of the waves rather than the set-up of the waves. Data from the experiment were used to assess the effect of wave nonlinearity on wave reduction.

Maza et al. investigated the physical processes involved in flow–mangrove interaction, wave attenuation, and drag forces along a 1:6 scale fringe *Rhizophora* mangrove forest. A 26 m long forest composed of 135 models reproducing mature *Rhizophora* mangrove trees, each with 24 prop roots, was used for the experiment. Using both experimental and numerical approaches, it was observed that the water depth, the associated mangrove frontal area, and the wave height were the major variables causing wave attenuation for short waves. Wave shoaling was caused by the forest's seaward slope, which increases the wave steepness; therefore, the pressures imposed on the mangroves began to rise after 3–4 m. Wave decay models that match wave heights well produce smaller pressures farther into the forest [55].

#### **3. Conclusions**

Models are critical for forecasting and monitoring mangrove functioning and sustainability. First, classification techniques are necessary to characterize mangroves for use in coastal flood risk mitigation. Second, experimental and numerical mangrove models may be used to replicate severe flooding circumstances (functionality) and anticipate long-term development (persistence) in order to analyze the impacts of climatic and human-induced alteration. While mangrove model configurations have been used extensively, the creation of experimental and numerical methods with predictive validity is an ongoing area of study.

Globally, coastal areas suffer endemic human-induced problems associated with population growth while dealing with the effects of naturally occurring climate change and increased susceptibility to coastal flooding. Mangrove forests can aid flood mitigation and help adapt to climate change. Mangroves are suitable for minimizing coastal flooding when combined with artificial structures. Many researchers are experimenting with different methods of coastal protection using combinations of hard and soft engineering structures as hybrid coastal defense strategies. In order to reduce coastal flooding using mangrove forests, there is a need to study, analyze, and simulate the essential processes, patterns, and limitations of mangrove efficiencies.

This review provides an overview of the existing literature on experimental modeling and numerical approaches for the effective use of mangrove trees and artificial polymers in coastal protection. Mangrove roots occur naturally and cannot be manipulated, unlike artificial model configurations, which can be structurally configured with different hydrodynamic properties. Artificial models may lack the real structural features and hydrodynamic resistance of the mangrove roots they depict, and this can reduce their real-life applicability and accuracy.

#### **4. Innovation and Future Research Direction**

This research is limited to assessing the influence of natural and artificial countermeasures based on reviews of the past literature on the use of hybrid polymers and mangrove trees. The aim is to examine the effectiveness of the combined polymer and mangrove root models in comparison with each model being used separately for coastal protection. This study recommends the following:


**Author Contributions:** Conceptualization, D.A. and S.A.; methodology D.A. and S.A.; writing original draft preparation, D.A.; writing—review and editing, D.A. and S.A.; supervision, S.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Communication* **Measuring the Adoption of Drones: A Case Study of the United States Agricultural Aircraft Sector**

**Roberto Rodriguez III**

Spatial Data Analysis and Visualization Laboratory, University of Hawaii at Hilo, Hilo, HI 96720, USA; roberto6@hawaii.edu

**Abstract:** Unmanned aircraft systems (UAS), commonly referred to as drones, are an emerging technology that has changed the way many industries conduct business. Precision agriculture is one industry that has consistently been predicted to be a major locus of innovation for UAS. However, this has not been the case globally. The agricultural aircraft sector in the United States is used as a case study here to consider different metrics to evaluate UAS adoption, including a proposed metric, the normalized UAS adoption index. In aggregate, UAS operators only make up 5% of the number of agricultural aircraft operators. However, the annual number of new UAS operators exceeded that of manned aircraft operators in 2022. When used on a state-by-state basis, the normalized UAS adoption index shows that there are regional differences in UAS adoption with western and eastern states having higher UAS adoption rates while central states have significantly lower UAS adoption rates. This has implications for UAS operators, manufacturers, and regulators as this industry continues to develop at a rapid pace.

**Keywords:** unmanned aircraft system; UAS; unmanned aerial vehicle; UAV; drone; agriculture

### **1. Introduction**

Unmanned aircraft systems (UAS), also referred to as unmanned aerial vehicles (UAV) and drones, have made great strides globally as regulatory frameworks have gradually accommodated this growing sector. Precision agriculture is frequently projected to be the most significant industry to benefit from these new tools [1–3]. However, these optimistic predictions have not been achieved [4,5]. In many cases throughout the world, regulatory hurdles remain in the United States [6], Europe [7,8], India [9], and Africa [10]. Meanwhile, UAS have been at the forefront of aerial application in Japan [11], China [12], and Korea [13].

Traditional aerial applications of plant protection products have been at the core of developments in UAS [14–17]. Additionally, some more specific applications in agriculture and forestry have crossed over from manned aircraft, including insect sampling [18,19], encapsulated herbicide applications [20], and aerial ignition [21]. Novel applications that are unique to this platform include vegetation sampling [22–24] and cattle herding [25]. While these new developments have introduced more use cases for UAS, their implementation has proven difficult to measure.

Recent bibliometric studies on agricultural UAS have shown an increasing trend [26–28]. However, these studies are biased toward research and remote sensing. This study seeks to measure the implementation of UAS by industry, specifically agricultural aircraft operators. Several metrics for the assessment of technology adoption have been developed, which are typically based on percentages of users [29]. For agricultural technology, the Agricultural Technology Adoption Index uses the area that the technology is operated within as the base metric [30]. While area is an effective base measurement, these data are not publicly available for the areas treated by different aerial application technologies.

In this study, the adoption of UAS relative to existing manned aviation is considered, using the number of agricultural aircraft operators in the USA as a case study. The remainder of this paper is organized as follows: Section 2 describes the materials and methods, including operator data acquisition and analysis; Section 3 describes the results of the aggregated data analysis and annual trends; Section 4 discusses the results, including implications for regulators; and Section 5 concludes the work.

**Citation:** Rodriguez, R., III. Measuring the Adoption of Drones: A Case Study of the United States Agricultural Aircraft Sector. *Eng* **2023**, *4*, 977–983. https://doi.org/10.3390/ eng4010058

Academic Editor: Antonio Gil Bravo

Received: 31 December 2022 Revised: 27 February 2023 Accepted: 16 March 2023 Published: 17 March 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).


#### **2. Materials and Methods**

Data on agricultural aircraft operators were downloaded from the FAA databases [31] on 24 November 2022. Agricultural aircraft operators are regulated under Title 14 of the Code of Federal Regulations Part 137. The data were partitioned first by operator type (i.e., Part 137) and then by aircraft operated, to separate operators who had UAS listed on their operator certificates from those who did not. Data were then aggregated by year and by state for temporal and spatial analyses. Agricultural aircraft operators utilizing UAS who had certificates prior to the introduction of Part 107, i.e., who operated manned aircraft and added UAS to their existing certificates, were aggregated together during the temporal analysis. The number of farms and the average farm size on a state basis, based on a 2021 USDA report [32], were also incorporated into the analysis. ANOVA was performed using R [33] to identify significant factors correlating with the number of agricultural aircraft operators using UAS.
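The paper performs the ANOVA in R; for illustration, the same pipeline can be sketched in Python. File and column names below are hypothetical placeholders, as the actual FAA export uses different headers.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical file and column names; the actual FAA export differs.
ops = pd.read_csv("part137_operators.csv")      # one row per operator
ops["uses_uas"] = ops["aircraft"].str.contains("UAS", na=False)

# State-level counts of UAS and manned-only operators.
by_state = ops.groupby("state")["uses_uas"].agg(n_uas="sum", n_total="count")
by_state["n_manned"] = by_state["n_total"] - by_state["n_uas"]

# Merge USDA farm statistics (hypothetical columns) and fit the model.
farms = pd.read_csv("usda_farms_2021.csv")      # state, n_farms, avg_size
df = by_state.reset_index().merge(farms, on="state")
model = ols("n_uas ~ n_farms + avg_size + n_manned", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```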

To illustrate the adoption of UAS, an additional metric, the normalized UAS adoption index, *I*, is defined as

$$I = \frac{n_{\text{UAS},x}}{n_{\text{UAS}}} - \frac{n_{\text{M},x}}{n_{\text{M}}}$$

where *n*<sub>UAS</sub> is the total number of agricultural aircraft operators using UAS, *n*<sub>M</sub> is the total number of agricultural aircraft operators only using manned aircraft, and the subscript *x* denotes the quantity at the individual state level. The normalized UAS adoption index was calculated for each state, and regional trends were analyzed qualitatively.
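Continuing the hypothetical pipeline sketched above, the index can be computed per state in a few lines:

```python
# Continuing from the state-level counts in the previous sketch:
n_uas_total = df["n_uas"].sum()
n_manned_total = df["n_manned"].sum()
df["I"] = df["n_uas"] / n_uas_total - df["n_manned"] / n_manned_total
```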

#### **3. Results**

Following the initial introduction of the Part 137 operator certificate in 1967, there has been a steady increase in the number of operators (Figure 1). We see a similar pattern for UAS agricultural aircraft operators following the introduction of 14 CFR 107 in 2016 and the standardized exemption for agricultural UAS operations. At the end of the study period, there were 1767 Part 137 operators, of which 93 (5%) made use of UAS.

**Figure 1.** The total number of manned ((**A**), red) and unmanned ((**B**), blue) agricultural aircraft operators in the United States.

Focusing on manned agricultural aircraft operators, we see that the annual increase in the number of operators was relatively stable from 1986 until 2008 (Figure 2). The years from 2009 to 2020 saw a rise in the annual increase in operators, with a sharp increase starting in 2014. This increase is likely due to changes in the certification process following the FAA Modernization and Reform Act of 2012 [34] and in response to an audit report by the Inspector General [35]. Since 2021, the rate has fallen back to the average level of 29.3 new operators per year, likely due to complications in the certification process caused by COVID-19, e.g., restrictions on travel and meetings preventing in-person knowledge and skill tests.

**Figure 2.** The annual increase in manned agricultural aircraft operators. The average annual increase is 29.3 (black dashed) with a 95% confidence interval of 25.8–32.7 (dashed grey).

Comparing the rate of new operators per year, we see that the addition of UAS operators had a significantly slower start, without the initial spike that manned agricultural aircraft operators experienced (Figure 3). However, the rate of new additions has climbed rapidly, and in 2022, for the first time, the addition of new UAS agricultural aircraft operators exceeded both that of manned agricultural aircraft operators and the average annual rate of new manned agricultural aircraft operators.

**Figure 3.** The annual increase in manned (red) and unmanned (blue) agricultural aircraft operators and the average annual increase in manned operators since 1967, with a 95% confidence interval.

The ANOVA analysis showed that the number of farms in a state and the number of manned agricultural aircraft operators were both significant factors in determining the number of UAS agricultural aircraft operators (Table 1). On a state-by-state basis, the number of UAS agricultural aircraft operators is only weakly correlated with the number of manned agricultural aircraft operators (Figure 4A). This indicates that UAS are not simply replacing a portion of the existing aerial application market. Over a third of states do not have a UAS agricultural aircraft operator, and yet half have two or more, which indicates a regional bias (Figure 4B). The primary factor in this regional bias is the number of farms in a particular state. The normalized UAS adoption index (Figure 5) further illustrates this regional bias with states with relatively high adoption indices concentrated together. A positive index value indicates that the rate of increase in the number of agricultural aircraft operators using UAS exceeds that of operators using only manned aircraft while a negative index value indicates the opposite.

**Table 1.** Results of statistical analysis of factors affecting number of UAS agricultural aircraft operators. Only significant results, number of farms, and number of manned agricultural aircraft operators are shown.


**Figure 5.** Normalized UAS adoption index across the United States indicates significant regional bias in the adoption of UAS for aerial application.

#### **4. Discussion**

While the total number of agricultural aircraft operators utilizing UAS is relatively low compared to that of manned agricultural aircraft operators, the rapid increase in the annual rate of new agricultural aircraft operators using UAS, which has now overtaken the rate of new manned agricultural aircraft operators, provides a strong argument that the industry has finally started taking these new tools seriously. This also has ramifications for the FAA, as the agency must now account for twice the number of new Part 137 applicants, with roughly half being potential UAS agricultural aircraft operators. The relatively high number of states without an agricultural aircraft operator using UAS also impacts local Flight Standards District Offices (FSDO), with many having no experience with these new aircraft, which will lead to difficulties performing oversight and inspections of these new agricultural aircraft operators.

Based on the individual state analysis, there is still some regional bias to operators, as has been previously noted [6]. In particular, western and eastern states have higher UAS adoption rates while central states have significantly lower UAS adoption rates. The primary determining factor is the number of farms in a particular state, with the number of manned agricultural aircraft operators having a smaller effect size. Additional regional variability may be due to the types of crops in these areas and continuing regulatory barriers. Manufacturers may use this information when considering customer service locations, e.g., for repairs of UAS.

The normalized UAS adoption index as a metric was able to capture this regional bias. To analyze individual factors such as regulation and crops, alternative geographic boundaries could be used in place of state boundaries, such as Flight Standards Districts. The normalized UAS adoption index could be further applied within the United States to analyze other types of operators, such as Part 135 air carrier operators, or applied on a global scale to analyze adoption across different nations in order to understand how variations in regulations have helped or hindered the adoption of UAS.

The limitations of this study include the use of the headquarters' location listed on the certificate and the lack of applied area data in the calculation of adoption rates. The service area of an agricultural aircraft operator can extend beyond the state that the base of operations is located in. In particular, adjacent states typically accept the pesticide applicator license based on reciprocity. Due to the limited payload capacity of UAS, the applied area per flight is typically much lower than that of manned aircraft. This would result in a bias in area calculations, as manned aircraft are currently a more economical alternative to UAS.

#### **5. Conclusions**

Based on aggregate numbers of operators in the USA, UAS still have a long way to go in comparison to manned agricultural aircraft operators in agricultural operations, with only 5% of agricultural aircraft operators using UAS. However, in terms of new operators being added to the sector, UAS are now leading the charge. The normalized UAS adoption index, a proposed metric to evaluate the introduction of UAS into a sector, applied on a state-by-state basis, indicates a strong regional bias in the distribution of these operators. This index may be applied to other operator types and other geographic boundaries to determine factors that may be impacting UAS utilization.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All underlying data are made publicly available at https://av-info.faa.gov/dd_sublevel.asp?Folder=\AIROPERATORS (accessed on 24 November 2022).

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
