1. Introduction
The past few decades have seen the development, miniaturization and cost reduction of a variety of sensors that can be attached to animals to monitor their behavior, physiology and environment [1]. Data (archival) loggers are particularly appealing if the device can be retrieved, because their capacity to store large datasets allows for high sampling frequencies and thus fine-scale monitoring [2]. Often, sensors are used in tandem to better identify and contextualize behavior. For example, a tri-axial accelerometer can be used to measure body motion and posture along three orthogonal axes, through dynamic and gravitational forces, respectively. In turn, distinct behaviors corresponding to these waveform signatures can be identified (through direct observation, i.e., “ground-truthing”) or inferred, which has made accelerometers a popular choice for scientists aiming to understand the activity of an animal in the wild. When accelerometers are used in conjunction with sensors that provide information on the body’s angular velocity and rotation (a gyroscope and a magnetometer, respectively), the ability to reconstruct and differentiate behaviors can be improved [3,4,5]. However, with each sensor potentially yielding millions of data points, manually deciphering behaviors from these inertial measurement unit (IMU) data sets is impractical. As such, numerous machine learning (ML) methods have been employed to automate the process of classifying animal-borne sensor output into behavioral classes [6,7,8,9].
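The separation of an accelerometer trace into a gravitational (posture) component and a dynamic (body motion) component can be illustrated with a short sketch. It assumes a running-mean filter, which is one common way to estimate the static component; the 50 Hz sampling rate, the 2 s window, the synthetic signal and all function names are hypothetical and are not taken from this study. The sign of the pitch angle depends on how a tag is oriented on the animal.

```python
import math

def running_mean(values, window):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def split_static_dynamic(raw, window):
    """Static = smoothed signal (gravity); dynamic = raw minus static."""
    static = running_mean(raw, window)
    dynamic = [r - s for r, s in zip(raw, static)]
    return static, dynamic

def pitch_degrees(static_x, static_y, static_z):
    """Body pitch estimated from the static components (in g) of the three axes."""
    return math.degrees(math.atan2(static_x, math.sqrt(static_y ** 2 + static_z ** 2)))

# Synthetic surge axis: a 0.5 g posture offset plus a 2 Hz, 0.2 g oscillation.
SAMPLE_RATE_HZ = 50  # hypothetical sampling rate
surge = [0.5 + 0.2 * math.sin(2 * math.pi * 2.0 * i / SAMPLE_RATE_HZ)
         for i in range(500)]
static, dynamic = split_static_dynamic(surge, window=2 * SAMPLE_RATE_HZ)
```

Away from the edges of the record, `static` stays close to the 0.5 g offset while `dynamic` retains the oscillation, which is the kind of waveform a classifier would work from.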
Murphy [10] defines ML as “a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty”. ML is typically divided into two main types, supervised and unsupervised learning, each with advantages and disadvantages [8]. In supervised learning, a training data set is required whereby the input vector(s) x (e.g., sensor channel features) and associated outcome measure/label in vector y (e.g., behavior) are known. Once the input vectors can be appropriately mapped to the outcome, the algorithm can be used to make predictions from new input data [11]. This is termed supervised learning, as the outcome label is provided by an “instructor” who tells the ML algorithm what to do. If an animal cannot be housed in captivity for direct observation, or simultaneously fitted with the sensor(s) and a video camera while in situ, building a detailed training set may not be possible. In such instances, unsupervised learning can be implemented. Pre-defined classes are not provided by an instructor (hence “unsupervised learning”), but rather the algorithm finds structure in the data, grouping it based on inherent similarities between input variables [11]. While the terms supervised and unsupervised learning help to categorize some of the methods available, the two concepts are not mutually exclusive and can be used in tandem when labeled data is available for only a portion of the dataset (e.g., semi-supervised, multi-instance learning).
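The distinction between the two learning types can be made concrete with a toy sketch. A nearest-centroid classifier stands in for supervised learning (labels y are supplied) and a small k-means routine stands in for unsupervised learning (only x is supplied). Both were chosen purely for brevity and neither is a method used in this study; the two features and their values are invented for illustration.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_nearest_centroid(X, y):
    """Supervised: labels y define the groups whose centroids are learned."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in groups.items()}

def predict(centroids, features):
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

def k_means(X, k, iterations=20):
    """Unsupervised: groups are found from similarity alone, without labels."""
    centers = list(X[:k])  # deterministic initialisation, for illustration only
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in X:
            nearest = min(range(k), key=lambda i: sq_dist(centers[i], p))
            clusters[nearest].append(p)
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: sq_dist(centers[i], p)) for p in X]

# Hypothetical features per window: (mean dynamic acceleration in g, dominant frequency in Hz)
X = [(0.02, 0.1), (0.30, 1.5), (0.03, 0.2), (0.28, 1.4), (0.01, 0.1), (0.35, 1.6)]
y = ["resting", "swimming", "resting", "swimming", "resting", "swimming"]

model = fit_nearest_centroid(X, y)
predicted = predict(model, (0.25, 1.3))  # supervised prediction for a new window
cluster_ids = k_means(X, k=2)            # unsupervised grouping of the same windows
```

The clusters recover the same two groups here, but they carry only arbitrary integer identifiers; attaching a behavioral meaning to each cluster still requires interpretation by the researcher.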
Recently, deep learning approaches have become popular for modeling high-level data in areas such as image classification [12], text classification [13], medical data classification [14] and acoustic sound classification [15]. Unlike conventional machine learning approaches, deep learning is a form of ML that does not require manual extraction of features for training the model but instead can be fed raw data (Figure 1). Its development was driven by the challenges faced by conventional ML algorithms, including the inability to generalize well to new data, particularly when working with high-dimensional data, and the computational power required to do so.
Various deep learning approaches have been applied to accelerometer data for human activity classification including convolutional neural networks (CNNs), long short-term memory (LSTM) and a combination of the two [16,17,18,19,20,21,22,23,24]. Aviléz-Cruz et al. [19] proposed a deep learning model that achieved 100% accuracy across six activities, compared with 98% and 96% for the two most competitive conventional ML approaches (Hidden Markov Model and support vector machine, SVM, respectively). The model had three CNNs working in parallel, all receiving the same input signal from a tri-axial accelerometer and gyroscope. The feature maps of the three CNNs were flattened and concatenated before being passed into a fully connected layer and finally an output layer with a Softmax activation (a function that converts the numbers/logits generated by the last fully connected layer into a probability that an observation belongs to each potential class [25]). Other studies demonstrate the relevance of using LSTM networks for human activity recognition [17,20,21,22,23]. Lastly, a few studies have suggested augmenting CNNs with LSTM layers [26]. For example, Karim et al. [26] proposed a model architecture in which a three-layer CNN and an LSTM layer extract features from sensor data in parallel. The resulting feature vectors are then concatenated and passed into a Softmax classification layer. Although deep learning can yield improved classifier performance over conventional ML methods, it has been sparsely applied for animal behavior detection from IMU data [8].
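The final stage shared by the architectures described above (concatenating feature vectors from parallel branches, applying a fully connected layer, and converting the resulting logits to class probabilities with Softmax) can be sketched in a few lines. The branch outputs, weights and biases below are arbitrary placeholder numbers and not values from any cited model; only the flow of data is meant to be illustrative.

```python
import math

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Placeholder flattened feature vectors from three parallel branches.
branch_a = [0.2, 0.7]
branch_b = [0.1, 0.4]
branch_c = [0.9, 0.3]
features = branch_a + branch_b + branch_c  # concatenation

# Placeholder parameters for a layer with three output classes.
weights = [
    [0.5, -0.2, 0.1, 0.3, 0.8, -0.5],
    [-0.3, 0.9, 0.4, -0.1, 0.2, 0.6],
    [0.1, 0.1, -0.6, 0.7, -0.4, 0.2],
]
biases = [0.0, 0.1, -0.1]

logits = dense(features, weights, biases)
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))
```

The probabilities sum to one across classes, and the predicted class is simply the index with the highest probability.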
Within the realm of marine fishes, IMU sensors have been widely applied to highly mobile species including sharks [27,28,29], Atlantic bluefin tuna (Thunnus thynnus) [30], dolphinfish (Coryphaena hippurus) [31] and amberjack (Seriola lalandi) [32], providing insight into biomechanics, activity patterns, energy expenditure, diving and spawning behavior. However, applications of IMUs to more sedentary species that persist predominantly over highly complex structures, such as natural and artificial reefs, are rarer. These species, for example grouper, can be expected to engage in different behaviors from those of highly mobile species and to present a different activity budget.
Groupers (family Epinephelidae) comprise more than 160 species of commercially and recreationally important fishes that inhabit coastal areas of the tropics and subtropics [33]. This family of long-lived fishes shares life history traits that make them particularly vulnerable to overfishing, including late sexual maturity, protogyny, and the formation of spawning aggregations [34,35,36,37]. The Atlantic Goliath Grouper (Epinephelus itajara Lichtenstein 1822; hereafter referred to as Goliath grouper) is one of the largest grouper species, capable of attaining lengths of 2.5 m and exceeding 400 kg [38]. The species ranges from North Carolina to Brazil and throughout the Gulf of Mexico [39]. Much of our understanding of Goliath grouper behavior has come from divers, from underwater video footage, and from observing animals in captivity (e.g., feeding kinematics [40], abundance [41]). Passive acoustic monitoring of sound production (e.g., associated with spawning behavior) [42,43] and modest acoustic telemetry work have provided some insight into site fidelity and coarse horizontal and vertical movement [44]. To date, no studies have documented the fine-scale behavior of this species. IMUs provide the opportunity to learn about fine-scale Goliath grouper activity patterns over a range of temporal scales, and the energetic implications. Additionally, IMUs can yield insight into, inter alia, mating behavior, habitat selection and responses to environmental variables [45,46].
Accelerometer transmitters have been used to determine activity levels (active versus inactive) [47] and feeding behavior [48] of captive red-spotted groupers (Epinephelus akaara). An accelerometer-gyroscope data logger was used to identify feeding and escape response behavior of captive white-streaked grouper (Epinephelus ongus) [3]. In these studies, behaviors were validated using underwater video cameras situated in the tank. To our knowledge, no studies have used IMU sensors to elucidate the behavior of grouper species at liberty. However, as one of the largest grouper species, Goliath grouper can be equipped with multi-sensor tags that include a video camera for validation of IMU data obtained from individuals in the wild.
The goals of this study were to: (a) obtain ground-truthed body movement data from a custom-made tag fitted to Goliath grouper, which could be used to develop a behavioral classifier; (b) develop two conventional ML approaches, using handcrafted features, to classify behavior from the tag data; (c) design a deep learning approach using a CNN and frequency representations of IMU data; and (d) compare the performance of the conventional ML approaches to the deep learning approach to determine the preferred method for identifying and studying behaviors from animals at liberty. Knowledge of the fine-scale activity of these animals can help us understand the ecology of this species, a key research need highlighted by the International Union for the Conservation of Nature [39].
4. Discussion
The aim of this study was to develop and assess the performance of two conventional machine learning methods and a deep learning method for classifying IMU data obtained from Goliath grouper into behavioral classes. Prerequisites to achieving this were the development of a retrievable custom-made tag that recorded IMU data and video concurrently (for ground-truthing) and the establishment of a robust attachment method. We chose our dorsal spine attachment method as it conferred the following benefits: it was minimally invasive (compared to other tag attachment methods, e.g., drilling through the dorsal musculature [3]), no attachment materials were left in/on the individual when the tag detached, and it resulted in good tag stability on fish > ~1.3 m total length. Tag stability is imperative for the IMU to record data reflective of body movement and for behaviors to remain discernible from the data between deployments. Smaller fish tended to have narrower spines that did not sufficiently fill the gap between the arms of the tag, resulting in a less stable attachment. A similar tag design and attachment technique to that used here should be applicable to other morphologically similar species such as the Pacific analogs, Epinephelus tukula. As sensors, cameras and batteries continue to miniaturize there may be potential for a reduction in overall tag size, perhaps making it applicable for use with smaller species with conservation concerns (e.g., Nassau Grouper, Epinephelus striatus).
The tag captured a variety of behaviors, but the activity budget was dominated by hovering and/or resting for all but one individual (Fish 5), which spent 70% of its time swimming. These activity budget patterns may periodically shift to include more activity for individuals at liberty, particularly as Goliath grouper are thought to move to site-specific aggregations during the spawning season [43,67,68]. With low-movement (and thus low-energy) behaviors dominating the activity budget in this study, and the tag only recording video during daylight hours, it is perhaps not surprising that feeding events were infrequent and/or not seen. Goliath grouper are considered opportunistic predators, but feeding was only captured once during the study, when Fish 4 consumed a black margate (Anisotremus surinamensis). Consequently, we did not obtain enough data to develop a feeding class. Moreover, a study by Collins and Motta (2017) described how Goliath grouper modulate their feeding behavior depending on prey type [40], and thus feeding would likely warrant two classes: suction and ram feeding. When targeting slow-moving or benthic prey, which comprise most Goliath grouper prey items, they employ suction feeding. This involves a slow approach, potentially stopping in front of the prey before it is rapidly sucked into the mouth. When targeting more mobile prey, Goliath grouper typically employ ram feeding, which is characterized by faster capture that includes quicker approaches and wider gapes [40]. Thus, to appropriately classify feeding behavior from IMU data for this species, more data must be collected in future studies. This could be achieved using IMUs that record for longer and are fitted to captive Goliath grouper that can be directly observed/videoed, or from continued deployment of these custom tags to wild individuals.
Using the three learning approaches, we classified nine of the 13 behaviors identified as part of ethogram development. The CNN performed better overall than either conventional ML method according to each of the five metrics calculated. This may be attributable to both the number of features and the type of data used as the input to the CNN. The CNN had 36,864 feature maps used as input to the fully connected layer versus 187 handcrafted features (spanning the time and frequency domains) for the conventional ML approaches. The CNN was developed solely from frequency domain data for each tri-axial IMU sensor and is designed to identify and extract the features (which often have no meaningful interpretation outside of their application) most useful to the classification task. The feature importance plot obtained from the RF indicated four of the five most important features were from the frequency domain (Shannon entropy, minimum, median and mean energy; Figure 7). Therefore, the CNN not only had more features to train from but may have detected important features from the frequency domain that were not extracted as handcrafted features for the conventional ML approaches.
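To give a sense of what handcrafted frequency-domain features of this kind look like, the sketch below computes a spectral Shannon entropy together with the minimum, median and mean energy of a single window. The definitions used here (a mean-removed discrete Fourier transform, energy as squared magnitude divided by window length, entropy over the normalized spectrum) are assumptions made for illustration; the exact formulas behind the 187 features are not reproduced in this section, and the function names and synthetic signals are hypothetical.

```python
import cmath
import math
import statistics

def power_spectrum(signal):
    """Energy per positive-frequency bin from a naive DFT (mean removed first)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

def spectral_features(signal):
    energy = power_spectrum(signal)
    total = sum(energy)
    # Normalise the spectrum so it can be treated as a probability distribution.
    probs = [e / total for e in energy if e > 0] if total > 0 else []
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "entropy_bits": entropy,
        "min_energy": min(energy),
        "median_energy": statistics.median(energy),
        "mean_energy": total / len(energy),
        "dominant_bin": energy.index(max(energy)) + 1,  # bins start at k = 1
    }

N = 64  # hypothetical window length in samples
one_tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
two_tone = [math.sin(2 * math.pi * 4 * t / N) + math.sin(2 * math.pi * 9 * t / N)
            for t in range(N)]

rhythmic = spectral_features(one_tone)
mixed = spectral_features(two_tone)
```

A window dominated by one rhythmic movement concentrates its energy in a single bin and so has an entropy near zero, whereas energy spread evenly over two bins gives an entropy of about one bit.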
Both RF and SVMs are commonly employed to classify IMU data into behaviors. In a study investigating the performance of eight conventional machine learning methods classifying acceleration data into behavioral classes for Port Jackson sharks (Heterodontus portusjacksoni), the SVM and RF performed best, using 2 s epochs for labeling the data. The two methods obtained equal overall accuracy (89%) but the SVM achieved superior performance for fine-scale behaviors such as chewing [7]. Conversely, RFs performed better than SVMs for classifying acceleration data obtained from Griffon vultures (Gyps fulvus) into seven behaviors [6]. In our study, the RF performed better overall and achieved higher F1-scores for each class than the SVM. This indicates the importance of model comparison when determining which classifier to use to make predictions from a dataset. No single conventional machine learning algorithm consistently performs best for classifying IMU data into behavioral classes; relative performance depends upon factors such as training dataset size, linearity of the data, number of classes and the extent of kinematic similarities between classes (e.g., resting and hovering).
An important consideration when selecting a classifier is whether the researcher is more concerned with identifying a particular behavior or determining overall activity patterns. A need to identify each instance of a particular behavior would require high sensitivity (preferably coupled with good specificity) for that class, which in turn may influence the choice of classifier. The SVM had a marginally higher sensitivity for forward swimming (0.8251) than those obtained by the CNN and RF (0.8007 and 0.7631, respectively). However, it obtained much lower sensitivity values for all other behaviors, including booming (SVM = 0.3282, RF = 0.8733, CNN = 1.000). Goliath grouper produce sound (i.e., “booming”) as part of courtship, spawning and agonistic behavior, and booming is therefore a behavior of particular interest [42]. Passive acoustics can be used to remotely monitor these booms and have been used to determine the relative abundance of soniferous fishes at spawning aggregation sites [42,69]. However, a limitation of using passive acoustics is the inability to approximate how many fish are contributing to sound production. The CNN method developed here robustly classified “booming” behavior from the IMU data and provides a means to determine sound production at the individual level; as such, it may serve as a complementary method to passive acoustic monitoring.
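For readers less familiar with per-class metrics, sensitivity, specificity and the F1-score can all be derived from a multi-class confusion matrix by treating each class in turn as "positive" and all others as "negative". The sketch below shows that calculation; the confusion matrix counts are invented for illustration and are not results from this study.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest metrics; rows are true classes, columns are predictions."""
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # true class missed
        fp = sum(row[i] for row in confusion) - tp   # other classes mislabeled as this one
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        metrics[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "f1": f1,
        }
    return metrics

# Invented counts for three behaviors (not data from this study).
labels = ["swimming", "resting", "booming"]
confusion = [
    [80, 15, 5],   # true swimming
    [10, 85, 5],   # true resting
    [2, 3, 15],    # true booming
]

results = per_class_metrics(confusion, labels)
```

In this made-up example the rare "booming" class has a high specificity (0.95) but a lower sensitivity (0.75), which illustrates why a researcher focused on detecting every instance of one behavior may rank classifiers differently from one interested in overall accuracy.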
The CNN developed in this study has numerous practical applications for understanding the behavioral ecology of Goliath grouper. IMU sensors are capable of recording data over ever-increasing durations. These tools, coupled with the CNN classifier developed here, present the opportunity to quantify how the activity budget of wild Goliath grouper may differ: temporally (e.g., diel and seasonal patterns), between habitat types (e.g., artificial versus natural reefs) and between pristine habitats and those that are heavily impacted by anthropogenic activity (e.g., fishing, diving, boat traffic). For example, a study that applied accelerometers to red snapper (Lutjanus campechanus) found them to be more active over artificial structures (i.e., shipwrecks and submerged oil platform jackets) than on natural reefs, suggesting there may be differences in the functional role of these habitats for red snapper [70]. The same study also documented higher activity levels at night and during the summer. However, without video footage or a behavioral classifier to interpret the acceleration data, the reasons for these differences remain unclear [70]. Other acceleration-based studies have documented impacts of anthropogenic activities on fish behavior, such as impacts of provisioning sites on activity levels of whitetip reef sharks (Triaenodon obesus) [71] and dam construction on Chinese sturgeon (Acipenser sinensis) swimming behavior [72]. Furthermore, Goliath grouper are targeted for catch-and-release fishing and caught as incidental bycatch by fishermen targeting other reef fishes [73], but little is known about their post-release recovery. The CNN developed herein provides a means to determine if and how the activity budget changes after capture, and how long it may take for an individual to resume normal behavior [74,75].
Custom-made tags such as the one presented here provide an opportunity to document interactions with humans. Stakeholder interactions with Goliath grouper can directly influence their stance on whether Florida should re-open the fishery [73]. Spear fishers claim increased negative encounters with Goliath grouper, while commercial fishermen argue Goliath grouper are impacting their ability to land valuable snapper/grouper species as they presumably depredate their catch [73,76]. Conversely, many recreational dive companies and divers oppose the fishery, with out-of-state divers willing to pay ~336 USD to dive at a Goliath grouper spawning aggregation site [77]. These customized tags can thus help quantify the frequency of these interactions and help make more informed management decisions. Additionally, while not used in this study given the focus on body movement classification, the hydrophone component of the tag could be used to track boat traffic within the vicinity of the fish, as others have done recently with monitoring fishing activity on artificial reef sites [78].
Behavioral classification from animal-borne IMU tags is typically completed once the tag is recovered and the raw data can be downloaded. However, real-time behavioral monitoring requires data transmission from the tag to a nearby receiver. In this case, either the raw data must be transmitted from the tag and classified onboard the receiver, or the classification occurs onboard the tag and the class prediction is transmitted. A study by le Roux et al. [79] indicated that behavioral classification onboard the tag (using linear discriminant analysis) followed by transmission of the result reduced the tag’s battery consumption 27-fold compared to transmitting the raw data. This can lead to a substantial increase in the time a tag functions while on the animal, providing obvious benefits (e.g., reducing how often an animal needs to be recaptured if continuous monitoring is required, increased ability to capture rare events, etc.). Alternatively, on-animal classification and storage of the behavior, in favor of storing all the raw data, led to a 469-fold reduction in memory use and a 1.3% increase in power consumption [79]. However, the primary limitation of deep learning is the computational power required, which may prove problematic for on-animal classification where a larger battery, and thus a bigger tag, would be required. In such instances, a conventional machine learning approach may be more practical.
Overall, our study describes a novel multi-sensor tag with a reliable attachment method to a large reef fish that can be applied to analogous species around the world. Furthermore, analyses of behaviors revealed from the tag indicate better performance of a deep learning approach at classifying IMU data into behaviors compared to two commonly employed conventional ML approaches. The authors recommend that researchers looking to optimize classification of animal-borne IMU data into behavioral classes more regularly consider deep learning approaches alongside conventional ML approaches when developing and selecting a classifier.