**1. Introduction**

Songbirds are one of the study targets of bioacoustics and of ecoacoustics [1,2], an interdisciplinary science that investigates natural and anthropogenic sounds and their relationship to the environment across multiple scales of time and space [3]. This is because their vocalizations (1) provide a range of useful information about the environment for monitoring, (2) have a rich and complex variety of structures [4], which serve as benchmark problems for classification tasks (e.g., BirdCLEF [5]), and (3) enable individual birds to interact in complex ways, behaving as complex systems [6].

There are several approaches for using microphone arrays to localize bird vocalizations. Rhinehart et al. recently surveyed applications of acoustic localization using autonomous recording units in terrestrial environments [7]. They pointed out that ecologists could make better use of acoustic localization, which can collect large-scale animal position data with minimal influence on the environment, if recording hardware and automated localization and classification software become more widely available and their algorithms are improved for outdoor measurement.

**Citation:** Suzuki, R.; Hayashi, K.; Osaka, H.; Matsubayashi, S.; Arita, T.; Nakadai, K.; Okuno, H.G. Estimating the Soundscape Structure and Dynamics of Forest Bird Vocalizations in an Azimuth-Elevation Space Using a Microphone Array. *Appl. Sci.* **2023**, *13*, 3607. https://doi.org/10.3390/app13063607

Academic Editors: Luis Gracia and Carlos Perez-Vidal

Received: 26 December 2022 Revised: 27 February 2023 Accepted: 4 March 2023 Published: 11 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Some of these studies focused on estimating both azimuth and elevation [8–10]. Hedley et al. developed a 4-channel microphone array unit by combining two stereo microphones and evaluated its accuracy in estimating the azimuth and elevation angles of replayed songs of a few species [9]. The results showed that most sounds were estimated within 12 degrees of the true direction of arrival (DOA) in azimuth and within 9 degrees in elevation, over a range of at least 30 m. The authors also discussed that DOA estimation may improve the ability to assess abundance in biodiversity surveys. However, the experiment was conducted in an open space, and the elevation angle was limited to −10 to 15 degrees from horizontal. As a different approach, Gayk et al. constructed a wireless microphone array system to estimate the 3D position of flying songbirds. The system consisted of four 7-m poles arranged in a 25 m square, and each pole had two microphone channels, placed at the top and bottom of the pole. They adopted a triangulation method that cross-correlates the recordings from these microphones and estimates the sound position from the resulting time-of-arrival differences. They showed that both broadcast bird calls and natural calls of migratory birds were successfully triangulated, with an accuracy and an estimated accuracy of less than 3 m. In addition, there is growing interest in, and development of, sound event localization and detection (SELD) of various environmental sounds using microphone arrays and ambisonic microphones [11]. We expect that practical experimental analyses of natural sounds, such as bird vocalizations in forests, can further contribute to better use of such microphone array-based techniques in the field.
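The time-of-arrival-difference step that underlies triangulation approaches of the kind described above can be illustrated with a minimal sketch. It is not the implementation used in any of the cited studies: the sampling rate, the speed of sound, the synthetic tone burst, and the helper names `best_lag`, `make_burst`, and `place` are all assumptions introduced here for illustration. The sketch cross-correlates two recordings of the same call and converts the best-matching lag into a time difference and a path-length difference.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at roughly 20 °C
SAMPLE_RATE_HZ = 16000       # assumed sampling rate, not taken from the cited study


def best_lag(a, b, max_lag):
    """Return the lag k (in samples) that maximizes sum_n a[n] * b[n + k].

    A positive result means the sound reached microphone b later than microphone a.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, a_n in enumerate(a):
            m = n + k
            if 0 <= m < len(b):
                score += a_n * b[m]
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def make_burst(length=20):
    """Synthetic Hann-windowed tone burst standing in for a short bird call."""
    return [
        math.sin(2 * math.pi * i / 5)
        * (0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1)))
        for i in range(length)
    ]


def place(burst, start, total=64):
    """Embed the burst in a silent recording of `total` samples, starting at `start`."""
    signal = [0.0] * total
    signal[start:start + len(burst)] = burst
    return signal


burst = make_burst()
mic_a = place(burst, start=10)
mic_b = place(burst, start=17)   # same call arriving 7 samples later

lag = best_lag(mic_a, mic_b, max_lag=15)
tdoa_s = lag / SAMPLE_RATE_HZ                    # time difference of arrival
path_difference_m = tdoa_s * SPEED_OF_SOUND_M_S  # extra distance travelled to mic_b
print(lag, tdoa_s, path_difference_m)
```

Each microphone pair yields one such path-length difference; a triangulation method then searches for the source position that is most consistent with the differences from all pairs. Real recordings would additionally require band-pass filtering and a more robust correlation measure than this brute-force loop.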

We have been proposing that robot audition techniques [12], especially HARK (Honda Research Institute Japan Audition for Robots with Kyoto University), an open-source software for robot audition, can contribute to bioacoustics and ecoacoustics. HARK not only provides DOA estimation of sounds, but also allows us to separate them and perform further signal processing on them, even in real time. We developed HARKBird, a collection of Python scripts for localizing bird songs in the field using HARK [13]. Previously, we confirmed the effects of playback of conspecific song on the song or call responses of birds by measuring the changes in their localized direction [14] and the changes in their 2D position using a set of microphone arrays [15].

It is recognized that data characterizing the vertical structure of vegetation are becoming increasingly useful for biodiversity applications as remote sensing techniques such as radar and lidar become more readily available [16]. We believe that direct observation of the vertical and horizontal soundscape of vocalizations among birds would also contribute to this field, as well as to bioacoustic analysis of bird behavior. There is initial work on 3D localization of bird songs using multiple microphone array units [10] and on the observation of nocturnal birds with a single microphone array unit based on azimuth-elevation estimation [17]. However, we still need to investigate how well HARK or HARKBird can estimate both the azimuth and elevation angles of bird songs in order to capture the dynamics and structure of the birdsong soundscape. In particular, a systematic evaluation of the localization accuracy of elevation angles, and an estimation of the soundscape structure in a realistic situation where multiple bird species are vocalizing, are important for the practical use of the system.

This paper aims to provide a systematic evaluation of the localization accuracy of the azimuth and elevation angles of replayed bird vocalizations in a practical forest environment, and to show an example field observation of the structure and dynamics of a birdsong soundscape. For this purpose, we use a 16-channel microphone array that we developed, called DACHO, together with HARK and HARKBird. Suzuki et al. [18] used the same microphone array to conduct a spatiotemporal analysis of acoustic interactions between great reed warblers (*Acrocephalus arundinaceus*). They conducted a 2D localization of the birds' vocalizations using multiple arrays and estimated the location of two individuals' song posts with a mean error distance of 5.5 ± 4.5 m from the location of the observed song posts. They then evaluated the temporal localization accuracy of the songs by comparing the duration of localized songs around the song posts with those annotated by human observers, obtaining an average accuracy score of 0.89 for one bird that stayed at one song post. However, the localization accuracy of songs in the elevation angle was not evaluated. A systematic analysis of the accuracy of elevation angle estimation in field conditions would therefore supplement and strengthen our knowledge about the application of robot audition techniques to ecoacoustic research.

We used a single microphone array unit because it is a minimal system and its cost for field deployment is low. We think that sound source localization is useful for passively monitoring the auditory behaviors of rare or nocturnal birds. Localized results can be used to estimate the abundance and distribution of those birds. High portability and low deployment cost are both essential in such cases.

First, we evaluated the accuracy in estimating the azimuth and elevation angles of bird vocalizations replayed from a loudspeaker placed on a tree, 6.55 m above the height of the array, at different horizontal distances in a forest. The results showed that the localization errors of the azimuth and elevation angles were at most 5 degrees and 15 degrees, respectively, in most cases when the horizontal distance from the array was 35 m or less. We then conducted a field observation of vocalizations to monitor birds in a forest. The results showed that the system can successfully capture how birds use the soundscape horizontally and vertically. This can contribute to bioacoustic and ecoacoustic research, including behavioral observations and the study of biodiversity.
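To make the geometry of the speaker test concrete, the following sketch computes the true elevation angle of a source placed 6.55 m above the array at a given horizontal distance, and the wrapped angular error between an estimated and a true angle. Only the 6.55 m height comes from the text. The listed distances, the sample estimate, and the function names `true_elevation_deg` and `angular_error_deg` are hypothetical, and the error definition is one common choice rather than necessarily the one used in the evaluation.

```python
import math

SPEAKER_HEIGHT_M = 6.55  # loudspeaker height above the array, from the text


def true_elevation_deg(height_m, horizontal_distance_m):
    """Elevation angle (degrees) of a source above the array's horizontal plane."""
    return math.degrees(math.atan2(height_m, horizontal_distance_m))


def angular_error_deg(estimated_deg, true_deg):
    """Smallest absolute difference between two angles, wrapping around 360 degrees."""
    diff = (estimated_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


# Hypothetical horizontal distances (m); the actual test positions may differ.
for distance_m in (5.0, 15.0, 25.0, 35.0):
    elevation = true_elevation_deg(SPEAKER_HEIGHT_M, distance_m)
    print(f"distance {distance_m:5.1f} m -> true elevation {elevation:5.1f} deg")

# Hypothetical estimate: an azimuth of 350 deg against a true azimuth of 10 deg
# is 20 deg off, not 340 deg, once the wrap-around is taken into account.
print(angular_error_deg(350.0, 10.0))
```

With the 6.55 m height, a source at a horizontal distance of 35 m sits about 10.6 degrees above the horizontal plane of the array, so the true elevation angle shrinks quickly as the loudspeaker moves away from the array.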

The paper is organized as follows. In Section 2, we introduce two experimental trials, a speaker test and a field observation of the soundscape dynamics of bird vocalizations, together with the sound source localization method based on HARK and HARKBird. In Section 3, we present the experimental results of the two trials. Finally, in Section 4, we summarize and discuss the significance of the findings and their implications for further contributions to ecoacoustics and related fields.
