1. Introduction
Passive acoustic monitoring (PAM) is a widely used tool for monitoring underwater environments: it is a flexible, non-invasive and cost-effective solution for acquiring information from remote areas or from shallow, coastal areas. Within PAM, soundscape analysis offers the possibility of surveying entire underwater habitats and their acoustic environments simultaneously, which is critical to understanding their fundamental interaction and long-term dynamics [1,2]. A soundscape is commonly understood as the complex chorus of sounds from a certain habitat [3,4,5]. The sounds contributing to a soundscape are typically subdivided into three categories: biophony (biotic events); geophony (abiotic events); and anthrophony (human activities). In the marine environment, biological sound sources include sounds actively produced by marine animals to communicate, navigate, mate and forage: these signals can then be used to detect animal presence, behavior and interactions [6]. In addition to these actively produced sounds, some animals also produce sound as a byproduct of their behavior. Abiotic sounds are mainly produced by physical events, such as rain, wind, sediment transport or currents. A soundscape can also comprise sounds from human activities, such as shipping, pile driving or sand and oil extraction, which have become increasingly present in recent decades [7]. Furthermore, a soundscape is also influenced by passive elements, such as the vegetation and landform of a particular location. Therefore, the sound signature of a soundscape depends on the shape and structure of the animals and plants living there, their interaction with the current and their influence on sound propagation [8,9]. In addition to local elements, distant sounds also contribute to the local soundscape. The propagation of these distant sound sources can ultimately be shaped by depth, topography, salinity and temperature. All these elements influence how these sounds are received. Soundscape analysis can thus provide valuable ecological information about the structure and functioning of an ecosystem, and can be used for monitoring purposes.
Shallow marine environments present complex interactions between all their sound components, which mostly stem from complex propagation patterns generated by the reflections at the water surface and the sea bottom [10]. Shallow soundscapes tend to contain many sound sources, resulting in complex acoustic patterns and often in relatively high sound levels [11]. Furthermore, in shallow water areas with strong tides, the tide has a significant influence on the sound propagation channel, hence affecting the received sound levels. As a consequence, the relationship between the acoustic activity produced and the sound received at a certain listening location can be intricate and difficult to disentangle.
Shallow environments, such as the Belgian Part of the North Sea (BPNS), are quite particular when studied acoustically [12]. Lower frequencies do not propagate very far, because the wavelength is greater than the depth of the water column (an illustrative cutoff estimate is given below), while other frequencies are amplified; the shallow water depth therefore forces the soundscapes to vary over small spatial scales (some meters [13]), due to variation in the occurrence and proximity of human use patterns, as well as sound propagation conditions and biological activity. Consequently, the faunas occupying the different habitats mapped in the BPNS are expected to contribute to the soundscape, as are the faunas living in the multiple shipwrecks present in the BPNS. The different types of substrate influence the sound propagation, and thereby the soundscape signature as well. In addition to the spatial soundscape variation, seasonal, latitudinal and celestial factors affect the presence of particular sounds. Certain sounds present cyclical patterns, or repeat at regular intervals, while others occur at random times. This time-dependent occurrence can take the form of short transient signals lasting seconds or minutes, or of a continuous presence in the soundscape over hours or days, resulting in chronic contributions to the soundscape. Understanding, measuring and integrating site features and environmental parameters is therefore necessary when interpreting and characterizing soundscapes in shallow waters: this will help to further understand the sound composition. Including these parameters can thus be useful when comparing soundscapes between sites [11].
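For intuition regarding this low-frequency cutoff, consider an idealized shallow-water waveguide with a pressure-release surface and a perfectly rigid bottom; the values below (a nominal depth of 30 m, typical of the BPNS, and a sound speed of 1500 m/s) are illustrative assumptions, not measurements from this study:

$$ f_c = \frac{c}{4h} \approx \frac{1500\ \mathrm{m/s}}{4 \times 30\ \mathrm{m}} \approx 12.5\ \mathrm{Hz} $$

Below this modal cutoff, sound does not propagate as a trapped mode and decays rapidly with range. Real seabeds are not rigid, so the effective cutoff is higher and depends on the bottom sound speed, but the scaling holds: the shallower the water, the higher the frequency below which sound is strongly attenuated.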
Studying marine soundscapes holistically, instead of focusing on specific sound events, can provide us with information at the habitat or community level. In a human-centered way, ISO 12913-1 [14] includes perception and understanding in the definition of soundscape; a soundscape is thus defined by how it is perceived and understood. This definition can be extended to the underwater world by including the perception and meaning of a variety of species or, in other words, by considering ecological relevance. In an artificial intelligence context, understanding the acoustic environment is also referred to as acoustic scene recognition; one of the important functions of the acoustic scene is supporting context and situation awareness. Acoustic scene identification does not need to be based on specific event recognition, but can be based on an overall sound impression [15]. In waters where visibility is low, assessing situation and context may be relevant for many species. The holistic soundscape contains many different sounds, and some species may use auditory stream segregation to disentangle these sounds depending on their relevance. However, in a multi-species and ecological approach, a more holistic technique may be more appropriate than methods that separate and classify individual sound sources [12]. Here, we mainly focused on continuous sound or sounds that were repeated frequently in a (complex) sequence (e.g., a fish chorus or breaking waves), including the combination of all sounds that occurred under certain conditions at specific places.
Characterizing marine soundscapes remains a challenging task, as there are no standards by which to do so, as noted in the International Quiet Ocean Experiment (IQOE) report published by the Marine Bioacoustical Standardization Working Group [16]. Historically, this has been done by detecting certain known acoustic events of relevance, such as animal vocalizations, passing ships or wind intensity [17]: the soundscape is then defined using the proportion and temporal distribution of these specific events [18,19]. This approach usually relies on manual analysis or on supervised methods, which require a prior investment in laborious data annotation. Another common approach is to measure the variation in pressure within a specified frequency band and time interval (sound pressure levels); however, sound pressure levels alone can limit the scope of interpretation [11]. Other studies propose the use of eco-acoustic indices to characterize soundscapes [20,21]: these indices summarize the acoustic information from a particular soundscape in a single value. They have been used to characterize acoustic attributes of the soundscape, and to test their relation to the structure of the ecological community, habitat quality and ecological functioning in ecosystems. They have been successfully used in terrestrial environments [22]. However, even though these approaches have been successfully implemented in some marine eco-acoustics studies [23,24,25], the performance of these indices has not been consistent across studies [26]. Moreover, Bradfer-Lawrence et al. (2019) [27] observed that acoustic indices used as habitat indicators do not stabilize until around 120 h of recordings from a single location have accumulated. By not using hand-crafted eco-acoustic indices, we thus avoided focusing on a small number of specific acoustic features that have not been proven robust in marine environments.
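For concreteness, the band-limited sound pressure levels mentioned above can be computed roughly as follows. This is a minimal sketch, assuming a calibrated pressure time series in µPa; the band edges, sampling rate and synthetic signal are placeholders for illustration, not values from this study.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_spl(pressure_upa: np.ndarray, fs: float,
             f_low: float, f_high: float) -> float:
    """Sound pressure level (dB re 1 uPa) within a frequency band.

    pressure_upa: calibrated pressure time series in micropascals.
    fs: sampling rate in Hz; f_low/f_high: band edges in Hz.
    """
    # Band-pass filter to isolate the band of interest.
    sos = butter(4, [f_low, f_high], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, pressure_upa)
    # RMS pressure in the band, referenced to 1 uPa.
    rms = np.sqrt(np.mean(band ** 2))
    return 20 * np.log10(rms / 1.0)

# Example with a synthetic 500 Hz tone of 1000 uPa amplitude (placeholder):
fs = 48_000
t = np.arange(fs) / fs
signal = 1e3 * np.sin(2 * np.pi * 500 * t)
print(band_spl(signal, fs, 100, 1000))  # approximately 57 dB re 1 uPa
```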
In this study, we propose an unsupervised method of assigning a label to different underwater acoustic scenes, with the aim of categorizing them. Previous studies have shown how different environments have distinct acoustic signatures that cluster together [28]. Other studies have successfully used unsupervised clustering algorithms to discriminate between terrestrial ecological habitats [29], and to test whether the combination of several acoustic indices could capture the difference in spectral and temporal patterns between under-shelf and pelagic marine soundscapes [30]. Sethi et al. (2020) [28] proposed unsupervised clustering as a method of detecting acoustic anomalies in natural terrestrial soundscapes. Michaud et al. (2023) [31] and Ulloa et al. (2018) [32] used unsupervised clustering to group pre-selected regions of interest from terrestrial soundscapes. Clustering has also been used as a tool to speed up labeling efforts [33], and has been proposed as a method of monitoring changes in the acoustic scene in terrestrial habitats [34]: in these cases, however, the obtained categories had to be analyzed manually for them to have an ecological meaning.
Knowledge about the marine acoustic scene is still limited, and there are no defined marine soundscape categories because, so far, humans have not needed to name them. To further understand these categories, we propose to explain them according to the spatiotemporal context in which they occur. This is the first study to describe soundscape categories in an automatic way, and also the first to consider the time component in soundscape categorization. The proposed solution is particularly useful in areas where the underwater acoustic scenes (soundscapes) have not yet been described. This is often the case in areas where the water is too turbid to employ camera or video sampling techniques, and where the sound signatures of most of the sound sources (especially biological ones) are not known. To understand when and where these categories occur, we linked them to environmental data using a supervised machine learning model. Interrelationships were checked using explainable artificial intelligence (XAI) tools [35]. These tools allowed for assessing which of the environmental features were important in predicting each class, and have already been successfully implemented in some fields of ecology [36]. The classes that could not be explained from the environmental parameters were not considered soundscape classes, but simply sound classes. Afterwards, by interpreting the XAI outcome, we could infer when and where these categories were found. The tools for interpreting the machine learning models led to an understanding of which environmental parameters were representative of different soundscapes, and of what differentiates soundscapes from one another, ecologically.
The relevant environmental conditions of each cluster could then be used to describe and understand each category, without the need for a large annotation effort: this helped explain the acoustic dissimilarity between habitats, and also provided a baseline for the soundscapes in their current state. In conclusion, we propose an automatic solution for extracting relevant ecological information from underwater soundscapes, by assessing which environmental factors contribute to the soundscape and quantifying their significance. This should allow for the monitoring of relevant ecological processes and of major changes in underwater ecosystems. Furthermore, we propose a semi-supervised method by which to remove artifacts from a dataset before processing. The analysis was performed on a dataset recorded in the BPNS, a shallow, highly turbid area subject to intense anthropogenic activity.
4. Discussion
In this study, we showed that by using an unsupervised approach we were able to categorize different marine shallow water soundscapes. In addition, we demonstrated that by using an automatic (supervised) approach based on explainable machine learning tools, we were able to characterize these categories ecologically. Our method was able to group soundscapes into different categories, which could be used to understand the spatiotemporal acoustic variations of a dataset. The obtained categories were afterwards shown to be connected to environmental parameters through a random forest (RF) classifier. To understand the predictions made by the trained RF, SHAP (SHapley Additive exPlanations) values were used: this allowed for the assessment of the main environmental parameters shaping the acoustic categories both overall and per category, which provided a practical ecological profile for each soundscape category.
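As an illustration of this explanation step, per-category feature importances can be derived from a trained RF roughly as follows. This is a minimal sketch assuming the scikit-learn and shap packages; the feature names, label counts and random data are placeholders, not the exact variables or data of this study.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Placeholder environmental features and acoustic category labels.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "hour_of_day": rng.uniform(0, 24, 500),
    "instrument_depth": rng.uniform(5, 40, 500),
    "distance_to_coast": rng.uniform(0, 50, 500),
    "salinity": rng.uniform(30, 35, 500),
    "moon_phase": rng.uniform(0, 1, 500),
})
y = rng.integers(0, 3, 500)  # category labels from the clustering step

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)
# Depending on the shap version, this is a list of per-class arrays or a
# single (samples, features, classes) array; normalize to a list.
if isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
    shap_values = [shap_values[:, :, k] for k in range(shap_values.shape[2])]

# Mean absolute SHAP value = importance of each feature for one category.
for category, sv in enumerate(shap_values):
    importance = pd.Series(np.abs(sv).mean(axis=0), index=X.columns)
    print(f"category {category}:")
    print(importance.sort_values(ascending=False))
```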
Our results show that the acoustic data analyzed with UMAP and DBSCAN mostly clustered into clear, independent groups. This indicates that there were major and quantifiable differences in the BPNS' underwater soundscapes, and that one-third octave bands encoded enough information to capture these dissimilarities. This was an advantage, given the current availability of built-in implementations in some recorders and the different tools available to compute them. The frequency band of study was chosen to capture the sound sources of interest and contained enough information to obtain distinct clusters, although increasing the frequency range might have led to additional clusters. Furthermore, similar artifact sounds clustered together. This was in line with findings from Sethi et al. (2020), where artifacts could be detected by using an unsupervised clustering technique. Accordingly, the semi-supervised process used in this study could be applied to detecting artifacts in acoustic datasets. This would be especially useful for long-term deployments, where exhaustive manual analysis is too time consuming, and it would provide a rapid solution for detecting instrument malfunction events.
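A minimal sketch of this clustering step, assuming the umap-learn and scikit-learn packages and a matrix of one-third octave band levels (one row per recording window); the band count, parameter values and random input are illustrative, not those of this study:

```python
import numpy as np
import umap
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Placeholder input: one-third octave band levels (dB), one row per window.
rng = np.random.default_rng(0)
band_levels = rng.normal(90, 10, size=(2000, 26))  # e.g., 26 bands

# Standardize so that no single band dominates the distance metric.
X = StandardScaler().fit_transform(band_levels)

# Non-linear dimensionality reduction to a low-dimensional embedding.
embedding = umap.UMAP(n_neighbors=50, min_dist=0.0,
                      n_components=2, random_state=0).fit_transform(X)

# Density-based clustering: points in low-density regions are labeled -1
# (noise), so no window is forced into a cluster.
labels = DBSCAN(eps=0.5, min_samples=20).fit_predict(embedding)
print(f"found {labels.max() + 1} clusters "
      f"({np.sum(labels == -1)} windows labeled as noise)")
```

DBSCAN's noise label is what makes the semi-supervised artifact screening described above practical: windows falling outside any dense group, or inside a group manually identified as an artifact, can be flagged for review rather than silently assigned to a soundscape category.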
The RF classifier was able to correctly classify more than 90% of the BPNS' acoustic data into the 17 relevant soundscape categories: this high accuracy suggests that the environmental parameters included in the analysis were good indicators of the observed acoustic patterns. The SHAP values showed that, in our case study, time of day, instrument depth, distance to the coast, salinity and moon phase were the most important environmental parameters shaping and differentiating the soundscape categories. These results should be interpreted carefully. The importance of the environmental parameters was not necessarily correlated with their influence on the total sound: rather, it described their relevance for discriminating between categories. For example, if all the categories from the study area had had a considerable and equally distributed sound contribution from shipping, this feature would not have had a great effect in differentiating them: as a consequence, it would have had a low importance score in the model. However, if we had expanded the dataset with acoustic data from other areas of the world with less shipping influence, shipping would have become a very dominant feature in explaining the differences between categories. In addition, the redundant variables that were removed should not be ignored, but should be considered together with their redundant counterparts. For example, temperature was removed because of its redundancy with season (week_n_sin). Consequently, in all the clusters where season had an influence, we could not distinguish whether the real effect was temperature or season: we would know, however, that these two correlated parameters had an influence on the soundscape. If distinguishing between two redundant features were relevant, additional data should be collected in such a way that the two variables are no longer correlated.
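This kind of redundancy pruning can be implemented, for example, by dropping one feature from each highly correlated pair before training, while keeping a record of what was removed so that each dropped variable can still be interpreted alongside its redundant counterpart. A minimal sketch (the 0.9 threshold, feature names and synthetic data are assumptions for illustration):

```python
import numpy as np
import pandas as pd

def drop_redundant(features: pd.DataFrame, threshold: float = 0.9):
    """Drop one feature of each pair with |Spearman rho| above threshold.

    Returns the pruned DataFrame and a dict mapping each dropped feature
    to the retained feature it was redundant with.
    """
    corr = features.corr(method="spearman").abs()
    dropped = {}
    for i, col_a in enumerate(corr.columns):
        if col_a in dropped:
            continue
        for col_b in corr.columns[i + 1:]:
            if col_b not in dropped and corr.loc[col_a, col_b] > threshold:
                dropped[col_b] = col_a  # keep col_a, drop col_b
    return features.drop(columns=list(dropped)), dropped

# Example: temperature tracks the seasonal cycle (placeholder data).
week = np.arange(520) % 52
df = pd.DataFrame({
    "week_n_sin": np.sin(2 * np.pi * week / 52),
    "temperature": 10 + 5 * np.sin(2 * np.pi * week / 52)
                   + np.random.default_rng(0).normal(0, 0.1, 520),
    "salinity": np.random.default_rng(1).uniform(30, 35, 520),
})
pruned, removed = drop_redundant(df)
print(removed)  # {'temperature': 'week_n_sin'}
```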
Instrument depth showed a strong influence on discriminating between soundscape categories. This result could have been expected: the recording depth was not kept constant, and in such shallow environments acoustic changes in the soundscape occur at very small spatial scales, including vertically in the water column [45]. It is therefore important to always consider and report the hydrophone depth when comparing different soundscapes, to avoid misleading conclusions. Furthermore, to exclude the position effect, and in order to better assess the biotic-driven soundscapes in shallow environments, recordings should be taken at a fixed depth.
The obtained categories showed the expected acoustic variation: this reflected the dynamic environment of the BPNS, and the need to study these soundscapes in more detail, in order to better understand this specific marine acoustic environment. Spatial or temporal changes in environmental variables could be noticed in the acoustic scenes, and the obtained categories reflected these changes. In general, a greater proportion of spatial environmental parameters were determined to be important by the RF model: this was most likely explained by the fact that we did not record over long periods of time (e.g., over all four seasons or several years) and, therefore, many biotic acoustic patterns that are known to be temporally driven [69] were not completely captured in our dataset.
Future work should repeat the same analysis on a more extensive dataset with stationary recording stations, such as the dataset from the LifeWatch Broadband Acoustic Network [70]. This would allow for testing the potential to capture circadian, monthly or seasonal cycles. To be able to generalize our conclusions and test the robustness of the method, it should be applied to new long- and short-term data from different underwater acoustic contexts. If these datasets came from ecosystems that are acoustically well studied, the results could be contrasted with existing knowledge. We would thereby be able to assess whether the obtained categories were sufficiently representative and informative, and whether they matched the currently existing knowledge or complemented it. Furthermore, the incorrectly classified data could also be analyzed manually, to detect specific events and to gain more insight into the missing explanatory parameters.
The method proposed here could be particularly useful in environments where a visual correlation between ecological factors and the underwater soundscape cannot be established: this includes low visibility and other challenging conditions, such as those occurring in remote high-latitude areas, where the winter season prevents traditional surveying, or in highly exploited areas. In these cases, a rapid and automated tool capable of characterizing the soundscape, and of monitoring its potential changes in relation to relevant environmental drivers, would be very valuable.
Ecologically characterizing the soundscape categories is only possible if data on all the environmental parameters are available. If not, the method could still be applied to categorize the different recorded soundscapes into acoustically relevant categories that could help guide conservation decisions on, e.g., areas with diverse soundscape patterns. It would also be possible to use the categories to optimize the sampling effort, and to only sample for potential drivers where the soundscape categories are, e.g., most distinct. If no environmental data from a specific site were available, it would be possible to train the model on a similar dataset with available environmental data. The acoustic data could then be explored according to the obtained classification, to assess whether there were similarities between the soundscape categories obtained in the two datasets, thus establishing a potential relationship with analogous drivers. In addition, the acoustic categories obtained in such an unsupervised way could be manually analyzed and labeled, and subsequently used as a baseline for future monitoring, to assess acoustic change over time or spatial acoustic (dis)similarities in a certain environment.