*2.1. Speaker Test*

We conducted a speaker test to investigate whether and how bird vocalizations can be localized in a forest environment using azimuth and elevation angles. The experiment was conducted at Nagano Park, Kawachinagano, Osaka, Japan, on 3 December 2018 (Figure 1). Figure 2 shows a schematic diagram of the two experimental setups. In Experiment 1, we placed a loudspeaker on a tripod (height = 1.3 m). A 16-ch microphone array, DACHO (WILD-BIRD-SONG-RECORDER; SYSTEM IN FRONTIER Inc., Tokyo, Japan), was also placed on a tripod. The array was developed specifically for bird observations in the field. It consists of 16 microphones arranged within an egg-shaped frame that is 17 cm in height and 13 cm in width, and it records in a 16-channel, 16-bit, 16 kHz format. Raw recordings are stored on SD cards and can be exported in wave format for further analysis. A recording can be scheduled by preparing the time settings on a micro-SD card. See [18] for more detail and an example of using this microphone array in open fields. We varied the distance between the loudspeaker and the microphone array from 0 to 65 m in 5 m steps by moving the microphone array along a straight path. The upper limit was set by the terrain: 65 m was the maximum length of the ridge around the loudspeaker that could be considered straight. Within this range, a spacing of 5 m was chosen because it was sufficient to measure the effects of the difference in loudspeaker height and of the horizontal distance between the array and the loudspeaker.

At each location, we replayed a sound file containing four vocalizations of Scaly Thrush (*Zoothera dauma*). Figure 3 shows an example recorded in Experiment 1 with the loudspeaker 30 m from the microphone array. In this example, the four replayed vocalizations were localized successfully, while other sound sources were also localized at around 1 and 8 s. This species is known to sing this type of song mainly at night. We adopted this vocalization as the playback sound to simulate observations of such nocturnal vocalizations, which are not easily observed by other methods such as video recordings.

In Experiment 2, we attached the loudspeaker to a tree, 6.55 m above the height of the microphone array. This was the maximum height at which we could safely place the loudspeaker, and it allowed us to study the effect of loudspeaker height on the localization accuracy of the replayed sound. We then repeated the procedure of Experiment 1.
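As a rough guide to the geometry of Experiment 2, the following minimal sketch computes the elevation angle at which the loudspeaker would nominally appear from the array at each measurement distance. It is an illustration only, not part of the analysis pipeline: it assumes that the stated distances are horizontal distances, that the 6.55 m height difference is constant along the path (that is, the ridge is level), and that the loudspeaker is a point source. The names `HEIGHT_DIFF_M`, `DISTANCES_M`, and `expected_elevation_deg` are hypothetical.

```python
import math

# Assumption: loudspeaker is 6.55 m above the array at every position
# (level path), as stated for Experiment 2.
HEIGHT_DIFF_M = 6.55

# Measurement distances: 0 to 65 m in 5 m steps (14 positions).
DISTANCES_M = list(range(0, 70, 5))


def expected_elevation_deg(horizontal_m, height_diff_m=HEIGHT_DIFF_M):
    """Nominal elevation angle (degrees) of the loudspeaker seen from the array.

    atan2 is used so that a horizontal distance of 0 m (loudspeaker
    directly overhead) gives 90 degrees without a division by zero.
    """
    return math.degrees(math.atan2(height_diff_m, horizontal_m))


# Map each distance to its nominal elevation angle.
elevations = {d: expected_elevation_deg(d) for d in DISTANCES_M}

for d, e in elevations.items():
    print(f"{d:2d} m -> {e:5.1f} deg")
```

Under these assumptions, the nominal elevation falls steeply over the first few positions and changes only slowly at the far end of the path, so the near positions probe a wide range of elevation angles while the far positions mainly probe the effect of distance.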

**Figure 1.** A snapshot of the experiment. We used DACHO, a 16-ch microphone array, for recording replayed songs from a loudspeaker.

**Figure 2.** A schematic image of the two experimental conditions.

**Figure 3.** An example of the recording of a replayed sound (**top**) and the localization results (**bottom**). We used the latter part of a replayed sound that includes four vocalizations of Scaly Thrush, which is shown in the top figure. The bottom figure shows a heat map of the MUSIC spectrum, whose value indicates how strongly a sound source is present in the corresponding direction. Each black line represents the duration and direction of a localized sound.
