#### *2.2. Field Observation of Soundscape Dynamics of Bird Vocalizations*

We also conducted a field observation with the same microphone array, set up in the Inabu field, the experimental forest of the Field Science Center, Graduate School of Bioagricultural Sciences, Nagoya University, Japan. The forest is mainly a conifer plantation with small patches of broadleaf trees. We placed a DACHO on a path around a patch of broadleaf trees, as shown in Figure 4.

**Figure 4.** A panorama (**A**) and a 360-degree photo (**B**) of the area around the microphone array.

Recording was conducted from 4 April to 7 May 2018. Common bird species vocalized actively during the breeding season at this site. In particular, Blue-and-white Flycatchers (*Cyanoptila cyanomelana*) tend to sing from the tops of tall trees to advertise their territories. We focused on a 1000-s recording starting at 8:00 AM on 3 May (Figure 5), in which such a typical pattern of bird vocalizations was observed.

**Figure 5.** A spectrogram of the whole recording and 5 time slots on which we focus in the analysis.

#### *2.3. Bird Song Localization Using HARKBird*

We used HARKBird 2.0 [13], a collection of Python scripts for bird song localization, to estimate the DOAs of sound sources in the recordings, using the sound source localization and separation functions in HARK.

The employed sound source localization algorithm is based on multiple signal classification (MUSIC) [19], using multiple spectrograms obtained by the short-time Fourier transform (STFT). The MUSIC method is a widely used high-resolution algorithm based on the eigenvalue decomposition of the correlation matrix of the multichannel signals from a microphone array. We adopted the standard eigenvalue decomposition (SEVD) MUSIC method implemented as one of the sound source localization methods in HARK. All localized sounds were separated into wave files (16-bit, 16 kHz) using the geometric high-order decorrelation-based source separation (GHDSS) method [20], which is also implemented in HARK. For more details on HARKBird (http://www.alife.cs.i.nagoya-u.ac.jp/~reiji/HARKBird/, accessed on 25 December 2022), see [13,21]; for HARK, see Nakadai et al. [12]. To optimize localization performance, we can adjust some parameters of HARKBird, such as the source tracking settings and the lower-bound frequency for MUSIC, to reduce the effects of noise.
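As a rough illustration of the SEVD-MUSIC principle described above (a simplified sketch, not HARK's actual implementation), the pseudo-spectrum for one frequency bin can be computed from the eigendecomposition of the spatial correlation matrix; the array geometry, grid resolution, and signal model below are ours for demonstration only:

```python
import numpy as np

def music_spectrum(X, steering, n_sources):
    """SEVD-MUSIC pseudo-spectrum for one frequency bin.

    X         : (n_mics, n_frames) complex STFT snapshots
    steering  : (n_dirs, n_mics) candidate steering vectors
    n_sources : assumed number of simultaneous sources
    """
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]            # spatial correlation matrix
    _, V = np.linalg.eigh(R)                   # eigenvectors, ascending eigenvalues
    En = V[:, : n_mics - n_sources]            # noise subspace (smallest eigenvalues)
    num = np.sum(np.abs(steering) ** 2, axis=1)               # a^H a
    den = np.sum(np.abs(steering.conj() @ En) ** 2, axis=1)   # a^H En En^H a
    return num / den                           # peaks at source directions

# Demo: a 4-mic linear array and one simulated plane wave from +30 degrees.
rng = np.random.default_rng(0)
pos = np.arange(4) * 0.05                      # mic positions along a line (m)
c, f = 343.0, 2000.0                           # sound speed (m/s), frequency (Hz)
angles = np.deg2rad(np.arange(-90, 91, 5))     # 5-degree DOA grid
A = np.exp(-2j * np.pi * f / c * np.outer(np.sin(angles), pos))
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.01 * (rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200)))
X = np.outer(A[24], s) + noise                 # grid index 24 is +30 degrees
P = music_spectrum(X, A, n_sources=1)
est = np.degrees(angles[np.argmax(P)])         # estimated DOA (degrees)
```

A threshold on this spectrum, analogous to HARKBird's THRESH parameter, would decide whether a peak counts as a detected source.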

We used a transfer function of the microphone array created from a numerical simulation based on the geometry of its channels using HARKTool5, assuming that the body of the array unit has no effect on sound transmission. The DOA resolution for the azimuth angle was 5 degrees. The resolutions for the elevation angle were 5 and 15 degrees for the speaker test and the bird observation, respectively.
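The free-field, far-field assumption behind such a simulated transfer function can be sketched as follows (a simplified stand-in for what HARKTool5 computes numerically; the function name and example geometry are hypothetical):

```python
import numpy as np

def free_field_steering(mic_pos, az_deg, el_deg, freq, c=343.0):
    """Far-field, free-field transfer function for one direction and frequency.

    mic_pos : (M, 3) microphone coordinates relative to the array center (m)
    Returns a length-M complex steering vector exp(-2*pi*j*freq*tau).
    """
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    u = np.array([np.cos(el) * np.cos(az),     # unit vector toward the source
                  np.cos(el) * np.sin(az),
                  np.sin(el)])
    tau = -(mic_pos @ u) / c                   # relative arrival delays (s)
    return np.exp(-2j * np.pi * freq * tau)

# A hypothetical 4-channel square layout, sampled on a 5-degree azimuth grid
# as in the DOA resolution used for azimuth.
mics = np.array([[0.05, 0, 0], [-0.05, 0, 0], [0, 0.05, 0], [0, -0.05, 0]])
grid = np.array([free_field_steering(mics, az, 0.0, 2000.0)
                 for az in range(0, 360, 5)])
```

Because the array body is assumed not to affect sound transmission, each entry has unit magnitude and only the inter-channel phase encodes the direction.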

For the DOA estimation of replayed vocalizations in the speaker test, we used a limited frequency range (1800–5000 Hz) for sound source localization, which included the replayed songs. We found that the amplitude of the replayed vocalizations became weaker as the microphone array was placed farther from the speaker. Therefore, we gradually decreased the threshold parameter (THRESH), which determines the minimum value of the MUSIC spectrum required to detect a sound source, from 29 to 20 with increasing distance. We determined these threshold values empirically according to the acoustic conditions around the microphone array. However, the lower thresholds also caused HARKBird to localize vocalizations of other bird species more frequently. Therefore, we manually selected the localized sounds that corresponded to replayed vocalizations and excluded the other localized sounds from the analyses. We also lowered the angular threshold (in degrees) used to distinguish multiple sound sources in different directions when other sound sources were close in direction to the replayed songs, to avoid recognizing them as part of the replayed vocalizations. We used the default values for the other parameters of HARKBird.

For the field observation, we focused on five 100-s time slots (A–E) during which a Blue-and-white Flycatcher (Figure 5) sang on top of tall trees, along with other species such as the Varied Tit (*Sittiparus varius*) and the Coal Tit (*Periparus ater*). We focused on the behavioral changes of this individual Blue-and-white Flycatcher and chose time slots that clearly illustrated its different behavioral patterns.

We adjusted the parameters of HARKBird to localize their songs and exclude other sound sources. We plotted the distribution of songs in a polar-coordinate system representing the azimuth-elevation space for each slot. We then calculated the variations in elevation and azimuth of the localized sounds to examine whether such statistical metrics could reflect the soundscape structures of bird songs.
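Since azimuth is a circular quantity (a song at 350 degrees is close to one at 10 degrees), one reasonable way to compute such a variation metric is the circular variance; this is a sketch of one possible formula, as the text does not specify the exact metric used:

```python
import numpy as np

def circular_variance(angles_deg):
    """Circular variance in [0, 1]: 0 = all angles identical, 1 = fully spread.

    Based on the mean resultant length R of the angles' unit vectors; unlike
    the ordinary variance, it handles wrap-around at 0/360 degrees correctly.
    """
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    R = np.hypot(np.mean(np.cos(a)), np.mean(np.sin(a)))
    return 1.0 - R

# Wrap-around check: these two sets describe the same spread of directions.
v1 = circular_variance([350, 10])
v2 = circular_variance([-10, 10])
```

Elevation, being bounded in [-90, 90] degrees, does not wrap around, so an ordinary standard deviation suffices there.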

#### **3. Results**

#### *3.1. Speaker Test*

Figure 6 shows the estimated azimuth (left) and elevation (right) of the replayed vocalizations in Experiments 1 (top) and 2 (bottom). The red line represents the expected value. Each box plot represents the distribution of localized values when the microphone array was placed *x* m from the loudspeaker. In Experiment 1, the expected azimuth and elevation angles were 0 degrees. The errors in the observed azimuth were 5 degrees or less when the distance was 50 m or less. The errors in the observed elevation were 10 degrees or less, except at a distance of 15 m. The slightly larger error at 15 m was likely due to the vocalization of another species, the Brown-eared Bulbul (*Hypsipetes amaurotis*). The large error at 0 m was likely because the estimated DOAs differed substantially among the localized sounds: when the microphone array is directly under the loudspeaker, even small noise can change the estimated direction drastically. Thus, both the elevation and azimuth angles of the replayed vocalizations were successfully estimated in this experiment.

In Experiment 2, the expected azimuth was 0 degrees, while the expected elevation decreased from 90 degrees as the horizontal distance increased, as shown in Figure 6 (bottom right). The errors in the observed azimuth were 5 degrees or less when the horizontal distance was 35 m or less, although they were larger than in Experiment 1. This was expected because the net distance between the microphone array and the loudspeaker was larger. It was also expected because other species sometimes vocalized in the same direction as the speaker in Experiment 2, causing the localization of replayed sounds to deviate from the expected value.

The observed elevation also reflected the expected value well; it decreased with distance as expected, and the errors were less than 15 degrees except in some cases (e.g., 15 or 40 m away). We expect that these errors could be reduced by adopting a transfer function with a higher resolution of elevation angles.
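The expected elevation in Experiment 2 follows directly from the geometry: if the loudspeaker is at height *h* above the array and the horizontal distance is *d*, the expected elevation is arctan(*h*/*d*), which is 90 degrees directly below the loudspeaker and falls off with distance. A quick numerical check, using a purely illustrative height of 10 m (not the actual experimental value):

```python
import numpy as np

h = 10.0                                       # assumed loudspeaker height (m); illustrative only
d = np.array([0.0, 5.0, 15.0, 25.0, 35.0])     # horizontal distances (m)
elev = np.degrees(np.arctan2(h, d))            # expected elevation angles (degrees)
# elev[0] is 90 degrees (array directly under the speaker), then it
# decreases monotonically as the horizontal distance grows.
```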

Overall, we were able to correctly estimate both the elevation and azimuth of bird vocalizations even when the songs were far from the microphone array, provided that there were no other vocalizations or sounds in a direction similar to that of the target sounds.

**Figure 6.** Estimated azimuth (**left**) and elevation (**right**) of replayed songs in Experiments 1 (**top**) and 2 (**bottom**).
