Article

Non-Intrusive System for Honeybee Recognition Based on Audio Signals and Maximum Likelihood Classification by Autoencoder

by Urszula Libal *,† and Pawel Biernacki *,†
Department of Acoustics, Multimedia and Signal Processing, Wroclaw University of Science and Technology, 50-370 Wroclaw, Poland
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Sensors 2024, 24(16), 5389; https://doi.org/10.3390/s24165389
Submission received: 9 July 2024 / Revised: 9 August 2024 / Accepted: 20 August 2024 / Published: 21 August 2024
(This article belongs to the Special Issue Audio, Image, and Multimodal Sensing Techniques)

Abstract: Artificial intelligence and the Internet of Things are playing an increasingly important role in monitoring beehives. In this paper, we propose a method for automatic recognition of honeybee type by analyzing the sound generated by worker bees and drone bees during their flight close to an entrance to a beehive. We conducted a wide comparative study to determine the most effective preprocessing of audio signals for the detection problem. We compared the results for several different methods of signal representation in the frequency domain, including mel-frequency cepstral coefficients (MFCCs), gammatone cepstral coefficients (GTCCs), the multiple signal classification method (MUSIC), and parametric estimation of power spectral density (PSD) by the Burg algorithm. The coefficients serve as inputs for an autoencoder neural network that discriminates drone bees from worker bees. The classification is based on the reconstruction error of the signal representations produced by the autoencoder. We propose a novel approach to class separation by the autoencoder neural network with various thresholds between decision areas, including the maximum likelihood threshold for the reconstruction error. By classifying real-life signals, we demonstrated that it is possible to differentiate drone bees and worker bees based solely on audio signals. The attained level of detection accuracy enables the creation of an efficient automatic system for beekeepers.

1. Introduction

Swarming is a natural phenomenon that occurs when an Apis mellifera honeybee colony reproduces and splits into multiple colonies. Swarming typically occurs during the late spring and early summer months [1,2,3,4]. The traditional beekeeping approach requires systematic beehive monitoring by beekeepers, who observe the states of the colonies and take the actions necessary to prevent swarming. Nowadays, new automated approaches to beehive monitoring are being developed that utilize machine learning techniques to build intelligent monitoring systems. Non-invasive automated detection of Varroa mites [5] and early detection of swarming [6,7] are among the most actively studied topics in beekeeping. Swarming can be expected after a beekeeper has observed new queen cells inside a hive. Other indicators of imminent swarming are higher foraging activity, rapid growth of the colony, and the appearance of drone bees outside the hive.
Thus, the detection of swarming without interfering with the hive can be based on the detection of drone bee activity around the hive during the mating season. Each summer, thousands of male bees, the drones, travel to drone congregation areas [3,7,8,9,10], attempting to mate with virgin queens from neighboring hives, which correlates with their higher detectability around hive entrances. The share of drones in a bee colony during the mating season can increase to about 15%; Ref. [2] states that a honeybee colony rather precisely limits the drone comb to some 15% of the total comb within its hive. There have been studies on honeybee activity, including the automatic monitoring system counting forager bees entering and leaving the hives proposed in [11] and the drone activity monitoring in [12]. Based on these facts, an early-stage swarming alarm system for beekeepers can therefore be constructed by analyzing bee sounds around a hive and identifying drones.
Honeybees use sound as a means of communication, both within the hive and with other bees outside it. The sound produced by bees is a form of vibration created by the rapid beating of their wings and is used to convey information about the location of food, the presence of danger, and other important matters. One of the most recognizable sounds is the buzzing heard when bees are in flight, created by wing beating at a rate of approximately 260 Hz [2], i.e., 260 wing beats per second.
To distinguish between worker bees and drones, based on the sounds they make, one can use the fact that the drones are generally bigger and have longer wings (see Figure 1). The results shown for bumblebees by [13] indicate that the body shape or the wing size of a pollinator can be correlated with the fundamental frequency it produces. Additionally, by examining the frequency spectrum or mel-frequency cepstral coefficients (MFCCs) [14], it is possible to automatically recognize pollinating bees.

1.1. Internet of Things Solutions in Beehive Monitoring

In recent years, there has been a significant shift from manual beehive monitoring by beekeepers to the application of Internet of Things (IoT) technologies. The primary objectives of these advancements are to enable remote observation of bee colony conditions and to enhance the safety and health monitoring of bees. Additionally, the implementation of more sophisticated methods in beekeeping allows for the collection of large datasets, which facilitate research into the still poorly understood behaviors of honeybees.
Numerous commercial IoT systems and devices have been developed to enhance beekeeping practices, as detailed in [15,16]. These include Beebot, which measures hive weight, humidity, and internal temperature; Easy Bee Counter, which utilizes 48 infrared (IR) sensors to count bees entering and exiting the hive; and the Bee-Shop Camera Kit, designed to capture photos and video recordings that can be stored on an SD card or transmitted to a beekeeper’s mobile device via the 3G/4G LTE network. Additionally, EyeSon Hives employs an image detection algorithm to analyze swarm flight direction, offering real-time video monitoring through 3G/4G LTE connectivity. Other notable devices include Zygi, which measures temperature, humidity, and weight; Hive-Tech, which detects swarming activity using IR and reflectance sensors to monitor real-time crowd conditions; and HiveMind, which tracks bee activity using sound and IR sensors.
Another noteworthy example is the self-powered SBMaCS system [17], which incorporates temperature, humidity, weight, motion, and flame sensors. A system described in [18] uses a thermal camera to count bees, employing four popular classifiers: k-nearest neighbors, neural networks, random forest, and support vector machine. Additionally, bee classification using an autoencoder neural network trained on mel-frequency cepstral coefficients (MFCCs) as representations of audio samples was proposed in our previous works [19,20,21].
The AppMAIS project [22] was developed to better understand and prevent colony collapse disorder, which leads to the rapid loss of adult worker bees and sudden colony mortality. AppMAIS is based on the open-source IoT platform Thingsboard, which generates alerts upon detecting specific colony states. As highlighted in the survey in [7], energy consumption and non-intrusiveness are crucial aspects of such systems.
IoT technology is closely related to radio-frequency identification (RFID) technology. As noted in [23], RFID tags can be used for continuous monitoring of individual bees within a colony, offering a less time-consuming alternative to video surveillance. However, this approach involves attaching RFID tags to the backs of individual bees, which is challenging due to their small size. Considering that an average colony consists of 20,000 to 80,000 bees, a large number of RFID tags would be required, and the data obtained would only provide information about a bee entering or leaving a specific area defined by the location and parameters of an RFID reader.

1.2. Audio Analysis Methods

Mel-frequency cepstral coefficients (MFCCs) are the most common set of features used in the numerous studies [24,25,26,27] that exploit a machine learning framework based on audio recordings. In the mel-frequency cepstral method of obtaining a power spectrum representation of a signal, the frequency bands are equally spaced on the mel scale. Many studies have analyzed MFCCs to extract information for bee detection, queen absence and swarming detection, and bee species identification, as well as environmental effects, with the first three coefficients showing the highest discrimination potential.
Despite the popularity of MFCCs, methods based on power spectral density (PSD) have also been introduced to honeybee sound analysis. In a study on drone audio detection [19], parametric and non-parametric methods of PSD estimation were compared for various frequency bandwidths, ranging from 100 Hz to 3000 Hz. In [28], the short-time Fourier transform (STFT) calculated with filter banks and the overlapping method was used to detect swarming events. The STFT was calculated using 128, 256, 512, and 1024 bins to investigate the impact of window width.

1.3. Machine Learning Algorithms

A typical machine learning framework encompasses signal measurement, preprocessing, feature extraction, and lastly, classification. In the area of acoustic analysis of bee sounds, many different classifiers have been explored, with the support vector machine (SVM) [29,30,31] being the most common example. The SVM is a kernel-based method that projects data into a higher-dimensional space in which a hyperplane can effectively separate the classes.
Recently, however, neural-network-based deep learning methods [32] have been widely adopted. In [33,34], the use of long short-term memory (LSTM) recurrent neural networks (RNNs) for queen bee presence detection is proposed. A comparison between an LSTM, a multilayer perceptron (MLP) neural network, and logistic regression showed the power of the LSTM for the task at hand. Convolutional neural networks (CNNs) [31,35,36] have also gained popularity, especially within computer vision tasks. To make them directly applicable to bee acoustics analysis, researchers have relied on image-like inputs, such as spectrograms, mel-scaled spectrograms, or other two-dimensional time–frequency representations of the audio signals.

1.4. Data Acquisition and the Proposed Processing

This paper presents results based on a selected set of audio recordings acquired by our proposed beehive monitoring system. It is designed to identify and predict certain events and states of the beehive that are of interest to the beekeeper.
Data were collected twice a day during the periods of highest outdoor activity of worker bees foraging for pollen and nectar (around 8 am to 10 am and 3 pm to 5 pm). Bees generally avoid flying during rain or windy weather for safety reasons. To prevent abnormal bee behavior, rain damage to equipment, and wind interference with recordings, all recordings were collected on sunny, almost windless days in June (early summer in the Northern Hemisphere). A directional microphone, facing the entrance to the beehive, was mounted at the edge of its roof.
All recordings were sampled at a frequency of 44,100 Hz and stored in an uncompressed 16-bit WAV format. For digital signal processing purposes, the recordings were segmented into intervals of 1 s in length. The dataset utilized in the off-line simulation comprises 3400 flight sound signals of worker bees and about 1700 sound signals of flying drone bees. Our database [37] is available online with open access.
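As an illustration, the segmentation step can be sketched in Python as follows; the file name is hypothetical, while the sampling rate and segment length follow the description above (the recordings themselves are in the open-access database [37]):

```python
import soundfile as sf

# Hypothetical file name; the recordings are 44.1 kHz, 16-bit WAV files [37]
audio, fs = sf.read("hive_recording.wav")
assert fs == 44100

seg_len = fs  # 1 s segments = 44,100 samples
segments = [audio[i:i + seg_len]
            for i in range(0, len(audio) - seg_len + 1, seg_len)]
```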
The ground truth was verified using a video camera mounted on a tripod and aimed at the front of the beehive. The video footage was used to determine which segments of the audio recordings contained unambiguous representations of each class: worker bees or drone bees. Only audio segments that could be clearly classified were included in our database. Due to the manual verification process, the dataset is relatively small. The clear separation between classes contributed to the high classification accuracy across all methods; for the same reason, however, we anticipate lower performance in real-world applications.
The overall drone bee detection process is split into two main parts: feature extraction by signal preprocessing and classification with the use of autoencoder neural network reconstruction error. The signal processing diagram, including the detection of bees using the autoencoder neural network, is illustrated in Figure 2.
For feature extraction, we have used two spectral and two cepstral representations of audio signals, obtained by the following algorithms:
  • Burg algorithm—a parametric power spectral estimation,
  • MUSIC—a pseudospectrum estimation,
  • MFCCs—mel-frequency cepstral coefficients, and
  • GTCCs—gammatone cepstral coefficients.
Our paper is organized into six sections. We start by introducing the drone bee detection problem and the characteristics of the sounds generated by honeybees. We review the methods published in related works in Section 1.2 and Section 1.3. In Section 2, we describe the preprocessing methods to be compared, including feature extraction techniques based on power spectral density estimation in Section 2.1 and cepstral signal analysis in Section 2.2. The specific structure of the autoencoder neural network and its anomaly detection potential is explained in Section 2.3. A Bayesian approach to classification, exploiting the autoencoder neural network with a maximum likelihood threshold, is proposed in Section 3. In Section 4, we present the drone bee detection results. The discussion of the obtained results is given in Section 5, and the summary of the paper is placed at the end in Section 6.

1.5. The Main Contributions

The manuscript presents a novel method for automatic recognition of honeybee types: worker bees and drone bees, by analyzing audio signals generated by flying bees near a beehive entrance. The main contributions include the following:
  • Audio Signal Processing for Bee Classification: This study explores various methods for audio signal representation in the frequency domain, including mel-frequency cepstral coefficients (MFCCs), gammatone cepstral coefficients (GTCCs), multiple signal classification (MUSIC), and Burg’s method for parametric estimation of power spectral density (PSD).
  • Use of Autoencoder Neural Network: The proposed system uses an autoencoder neural network for classifying bees based on the reconstruction error of signal representations. The autoencoder, typically used for generative reconstruction, is repurposed to distinguish drone bees (anomalous signals) from worker bees (normal signals) by leveraging differences in reconstruction error.
  • Novel Thresholding Approach: The paper introduces a new method for class separation using various thresholds, including a maximum likelihood threshold (T*) to optimize the classification accuracy, achieving near-perfect accuracy in most cases.
  • Empirical Bayes Classifier: The method effectively acts as an empirical version of the Bayes classifier, minimizing misclassification probability based on reconstruction error histograms.
  • Practical Implications: The results, obtained under ideal low-noise conditions, suggest that the system could be effectively deployed in real-world beehive monitoring, with plans for integration with data acquisition and communication modules to transmit alerts and status updates to beekeepers.
The research demonstrates the feasibility of differentiating between drone and worker bees using audio signals alone, paving the way for practical applications in beekeeping, such as swarming prevention, with potential enhancements through additional sensors and data transmission via GSM.

2. Materials and Methods

2.1. Spectral Coefficients

The power spectral density (PSD) function is commonly used in signal processing, including audio signal processing, to analyze the frequency content of a signal. The PSD provides information about how the power of a signal is distributed across different frequency components. Audio signals are typically complex and can contain a wide range of frequencies. The PSD allows us to analyze the distribution of power across different frequency bands. This information is essential for understanding the characteristics of the sound, such as pitch, timbre, and other frequency-related attributes.
The PSD analysis is often used to extract relevant features from audio signals. These features can be used for various applications, including audio classification, speech recognition, and music analysis. For example, different genres of music may exhibit different patterns in the frequency domain.
We decided to use the power spectrum density (PSD) coefficients as features in the learning phase of the neural network in the detection process.
Power spectral density estimation techniques can be divided into parametric and non-parametric methods. The non-parametric methods estimate the PSD directly from signal samples, without any assumptions about a particular process structure. The parametric approaches assume that the signal can be described as a stationary process (MA—moving average; AR—autoregressive; ARMA—autoregressive moving average) of order m. The power spectral density is then calculated using the estimated model parameters. This paper presents PSD estimation with the parametric approach by the Burg method and with eigenvector decomposition of an autocorrelation matrix for the MUSIC method.

2.1.1. Burg Algorithm

The Burg algorithm [38,39] assumes that a signal can be described as an autoregressive (AR) process of order m:

$$\hat{x}(n) = \sum_{k=1}^{m} a_m(k)\, x(n-k).$$

The Burg algorithm solves a least squares problem: the AR parameters $a_m(k)$ are estimated by minimizing the sum of the forward and backward prediction errors, i.e., the errors between the actual signal values and their predictions in the forward and backward directions. The power spectral density is then obtained from the fitted model as

$$PSD_{BURG}(f) = \frac{E_m}{\left| 1 + \sum_{k=1}^{m} a_m(k)\, e^{-j 2 \pi f k} \right|^2},$$

where $E_m$ is the prediction error power of the order-$m$ model.
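As an illustration, a minimal NumPy sketch of Burg's recursion and the resulting PSD estimate is given below; the model order, FFT length, and sampling rate defaults are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np

def burg_psd(x, order, nfft=1024, fs=44100.0):
    """Burg AR model fit and parametric PSD estimate (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    a = np.array([1.0])                  # A(z) polynomial, a[0] = 1
    E = np.dot(x, x) / len(x)            # initial prediction error power
    f, b = x[1:].copy(), x[:-1].copy()   # forward/backward prediction errors
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-type update of the AR polynomial
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([a, [0.0]])[::-1]
        E *= (1.0 - k * k)
        # Update prediction error sequences (both computed from the old f, b)
        f, b = f[1:] + k * b[1:], b[:-1] + k * f[:-1]
    # PSD_BURG(f) = E_m / |A(e^{j 2 pi f})|^2, evaluated on an FFT grid
    A = np.fft.rfft(a, nfft)
    freqs = np.arange(len(A)) * fs / nfft
    return freqs, E / np.abs(A) ** 2
```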

2.1.2. MUSIC

The MUSIC (multiple signal classification) algorithm calculates the pseudospectrum based on a signal or correlation matrix using Schmidt’s eigenspace analysis technique [40]. This method involves analyzing the eigenspace of the correlation matrix of the signal to determine its frequency characteristics, as presented in Figure 3.
The algorithm is well suited for signals composed of the combination of sinusoids with added white Gaussian noise. Its appeal lies in the following recognized benefits:
  • Ability to manage multiple concurrent sound sources. In our case, it is hard to record only a single bee or drone; in most cases, we obtain sounds from many individuals at the same time.
  • Precise measurements. Our earlier research [19] has shown that the key to correct recognition lies in detailed spectrum analysis up to a few kilohertz.
  • Excellent spatial resolution. In many observations, worker bees and drones are moving, which means that the signal source is changing its position.
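A minimal sketch of the MUSIC pseudospectrum computation, assuming an autocorrelation matrix built from overlapping frames, is given below; the frame length M, the number of sinusoidal components, and the FFT grid are assumed values.

```python
import numpy as np

def music_pseudospectrum(x, n_components, M=64, nfft=1024, fs=44100.0):
    """MUSIC pseudospectrum via eigendecomposition of the autocorrelation
    matrix (illustrative sketch; M, n_components, nfft are assumed values)."""
    # M x M sample autocorrelation matrix from overlapping length-M frames
    frames = np.lib.stride_tricks.sliding_window_view(x, M)
    R = frames.T @ frames / frames.shape[0]
    # Noise subspace: eigenvectors of the M - n_components smallest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending eigenvalue order
    En = eigvecs[:, : M - n_components]
    # Steering vectors e(f) = [1, e^{-j2pi f/fs}, ..., e^{-j2pi f(M-1)/fs}]^T
    freqs = np.arange(nfft // 2) * fs / nfft
    e = np.exp(-2j * np.pi * np.outer(np.arange(M), freqs / fs))
    # Pseudospectrum 1 / ||En^H e(f)||^2 peaks at the sinusoid frequencies
    return freqs, 1.0 / np.sum(np.abs(En.conj().T @ e) ** 2, axis=0)
```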

2.2. Cepstral Coefficients

2.2.1. MFCC

Mel-frequency cepstral coefficients (MFCCs) are widely used in signal processing and speech analysis because of their effectiveness in capturing essential features of the audio signal, especially in the context of human auditory perception. The process involves representing the energy distribution across different frequency bands, emphasizing important spectral characteristics while discarding less relevant information. This feature extraction process is crucial for signal recognition tasks, as it helps highlight the discriminative aspects of the signal.
The MFCC calculation is mainly based on applying mel-scale filter banks to the absolute squared value of the fast Fourier transform (FFT) of a windowed signal frame:

$$Y(m) = \sum_{k=0}^{N-1} W_m(k)\, |X(k)|^2,$$

where $X(k)$ is the FFT of a signal frame (each frame is a vector of $N$ samples in the time domain), $k$ is the FFT bin number, $W_m(k)$ is the $m$th mel-scale filter for $m = 1, \ldots, M$, and $M$ is the chosen number of mel-scale filter banks. The formula for converting from frequency in hertz to the mel scale is

$$\mathrm{mel}(f) = 1127 \cdot \ln\left(1 + \frac{f}{700}\right),$$

where $f = \frac{m \cdot f_s}{N}$ is the frequency and $f_s$ is the sampling rate.

The last step is the application of the discrete cosine transform (DCT) to the obtained output, which gives the set of cepstral coefficients:

$$c_i = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} \log_{10} Y(m)\, \cos\left(\frac{(2m-1)\, i\, \pi}{2M}\right),$$

where $i$ is the MFCC coefficient index.
The block diagram of the MFCC extraction is shown in Figure 4.
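A sketch of this extraction using the librosa library is shown below, with 120 mel filter banks and 120 coefficients as in our setup; the file name, frame parameters, and the averaging over frames are illustrative assumptions.

```python
import numpy as np
import librosa

# Hypothetical file name; 1 s segments sampled at 44.1 kHz
y, sr = librosa.load("bee_segment.wav", sr=44100)

# 120 mel filter banks and 120 cepstral coefficients, as in Section 1.4;
# the frame and hop lengths are assumed values
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=120, n_mels=120,
                            n_fft=2048, hop_length=512)

# One 120-dimensional feature vector per segment (averaging is an assumption)
feature_vector = mfcc.mean(axis=1)
```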

2.2.2. GTCC

Gammatone cepstral coefficients (GTCCs) are popular features extracted from audio signals for use in recognition problems. The gammatone function models the response of the human auditory filter. The frequency selectivity properties of the cochlea and those measured psychophysically in humans appear to coincide with properly implemented gamma filters. The gammatone filter bank (composed of the frequency responses of several gammatone filters) emphasizes the perceptually meaningful sound signal frequencies.
The process of calculating gammatone cepstral coefficients is analogous to the MFCC extraction scheme. The gammatone filter bank requires setting the total filter bank bandwidth ($BW$); the gammatone filter order $N$; the equivalent rectangular bandwidth (ERB) model (Lyon; Greenwood; or Glasberg and Moore); and the number of filters. The frequency response of a single filter is defined as follows:

$$GT(f) = \left(1 + \frac{j\,(f - f_0)}{BW}\right)^{-N},$$

where $f_0$ is the center frequency of the filter.
In Figure 5, we present the block diagram for the GTCC calculation.
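A frequency-domain sketch of this scheme follows; the filter count of 120 matches our setup, while the filter order, the frequency range, and the ERB spacing (Glasberg and Moore model) are assumed parameter choices.

```python
import numpy as np
from scipy.fft import rfft, dct

def gtcc(frame, fs=44100, n_filters=120, order=4, fmin=50.0, fmax=3000.0):
    """GTCC sketch: gammatone filter bank applied in the frequency domain.
    The filter count matches Section 1.4; other parameters are assumptions."""
    N = len(frame)
    X = np.abs(rfft(frame * np.hanning(N))) ** 2          # power spectrum
    freqs = np.linspace(0.0, fs / 2.0, len(X))
    # Center frequencies equally spaced on the ERB scale (Glasberg and Moore)
    erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    erb_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    fc = erb_inv(np.linspace(erb(fmin), erb(fmax), n_filters))
    bw = 1.019 * 24.7 * (0.00437 * fc + 1.0)              # gammatone bandwidths
    # Squared magnitude of GT(f) = (1 + j(f - f0)/BW)^{-order}
    Y = np.array([np.sum(X * (1.0 + ((freqs - f0) / b) ** 2) ** (-order))
                  for f0, b in zip(fc, bw)])
    # Log compression and DCT, exactly as in the MFCC pipeline
    return dct(np.log10(Y + 1e-12), type=2, norm="ortho")
```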

2.3. Autoencoder Neural Network

For worker bee and drone classification, we propose using an autoencoder. It is a type of generative artificial neural network [41] that can learn the features of the input signals through the training process, during which the model adjusts its parameters to minimize the difference between the input signals and their reconstructions. As the autoencoder minimizes the reconstruction error, it learns to map the input signals to a lower-dimensional representation that captures the essential features of the data. Its weights are adjusted to extract meaningful patterns and structures from the input signals.
The primary goal of an autoencoder neural network is the reconstruction of signals from their encoded representations, and it is mainly used for dimensionality reduction. Our previous tests [20,21] have shown that autoencoder neural networks can be adapted for anomaly detection, in particular for drone bee detection. In Section 3.1, we investigate further variants of thresholding the autoencoder reconstruction error, including the maximum likelihood threshold.
The general structure of this network is shown in Figure 6.
For the audio recordings preprocessed by the spectral and cepstral methods, we performed a series of numerical experiments using autoencoder neural networks for honeybee classification. The audio signals represented by the spectral methods (Burg and MUSIC algorithms) yielded feature vectors of 512 power spectral density coefficients, which was the size of the input layer in those cases. The cepstral methods produced shorter feature vectors of 120 coefficients, equal to the number of filter banks used in the calculation of the gammatone and mel-frequency cepstral coefficients, and the input layer size was set accordingly. The honeybee classification was tested with four main autoencoder structures, with 1, 2, 3, and 4 hidden layers in the encoder. The size of each layer varied with the size of the input layer, but the code layer always consisted of 8 neurons. In all tests, we applied the most standard setting for anomaly detection by an autoencoder, with the decoder having one layer less than the encoder. Except for the output layer, which had a sigmoid activation function, all layers used ReLU activation functions.
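As an illustration, a minimal Keras sketch of the NN2 variant (two encoder hidden layers) is given below; the framework choice and the hidden layer sizes are assumptions, while the 8-neuron code layer, the ReLU hidden layers, the sigmoid output, and the one-layer-shorter decoder follow the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(input_dim=512):
    """NN2-style autoencoder sketch: two encoder hidden layers, an 8-neuron
    code layer, and a decoder with one layer less than the encoder."""
    inputs = layers.Input(shape=(input_dim,))
    h = layers.Dense(128, activation="relu")(inputs)   # encoder hidden 1 (assumed size)
    h = layers.Dense(32, activation="relu")(h)         # encoder hidden 2 (assumed size)
    code = layers.Dense(8, activation="relu")(h)       # code layer, 8 neurons
    h = layers.Dense(32, activation="relu")(code)      # decoder hidden layer
    outputs = layers.Dense(input_dim, activation="sigmoid")(h)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")        # MSE reconstruction loss
    return model
```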
The anomaly detection performed with the use of an autoencoder neural network has the three following main steps:
  • Training: In the first step, the autoencoder neural network is trained on feature vectors extracted from worker bee audio recordings only.
  • Testing: In the second step, the trained autoencoder is used to reconstruct both the training data (worker bee representations) and the testing data (drone bee representations). The reconstruction error is calculated for all signals from both sets.
  • Classification: The last step is the classification based on the value of the reconstruction error. The anomaly in the form of drone bee sounds should have a higher reconstruction error.
The classification result depends on the choice of the reconstruction error threshold, dividing the decision areas for the two classes, which is discussed in the next Section 3.
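A sketch of these three steps, using the build_autoencoder() helper above, might look as follows; the variable names, the scaling, and the training hyperparameters are assumptions.

```python
import numpy as np

# X_workers, X_drones: hypothetical feature matrices, min-max scaled to [0, 1]
# to match the sigmoid output layer
ae = build_autoencoder(input_dim=X_workers.shape[1])

# Step 1 (training): fit on worker bee feature vectors only
ae.fit(X_workers, X_workers, epochs=100, batch_size=32, verbose=0)

# Step 2 (testing): per-sample reconstruction error (MSE) for both classes
mse_train = np.mean((ae.predict(X_workers) - X_workers) ** 2, axis=1)
mse_test = np.mean((ae.predict(X_drones) - X_drones) ** 2, axis=1)

# Step 3 (classification) follows from thresholding, as described in Section 3
```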

3. Theory

3.1. Anomaly Detection Threshold

In anomaly detection tasks using autoencoders, the threshold for reconstruction error is typically set based on statistical analysis of the errors obtained from a dataset representing normal conditions. This threshold is often established by using statistical measures like the mean plus a multiple of the standard deviation. This approach ensures that only reconstruction errors significantly higher than those observed for normal data are classified as anomalies. By setting this threshold, the autoencoder can effectively distinguish between normal variations and deviations that suggest anomalies.
The reconstruction error for the worker bees should be significantly smaller, since the autoencoder was trained on the worker bee feature vectors only. The reconstruction error $MSE_{test}$ for the drone bee feature vectors, applied as input to the autoencoder in the testing stage, generally reached higher values, allowing the detection of drones as anomalies against the much more commonly occurring worker bees.
We performed classification using the following three threshold values:
$$T_1 = \mathrm{mean}(MSE_{train}) + \mathrm{std}(MSE_{train}),$$
$$T_2 = \mathrm{mean}(MSE_{train}) + 2 \cdot \mathrm{std}(MSE_{train}),$$
$$T_3 = \mathrm{mean}(MSE_{train}) + 3 \cdot \mathrm{std}(MSE_{train}),$$
where $MSE_{train}$ is the reconstruction error for worker bee feature vectors from the training dataset, $\mathrm{mean}(MSE_{train})$ is its mean value, and $\mathrm{std}(MSE_{train})$ is its standard deviation. It should be stressed here that the threshold values are calculated on the basis of the results for the training data only, containing the feature vectors for worker bee audio recordings—see Figure 7 and Figure 8. The autoencoder is an unsupervised anomaly detection method, and the testing set, containing the coefficient representations of audio signals from the second class (drone bees), does not take part in the training process. Nevertheless, in this paper, we use the term classification because in the proposed approach, the autoencoder becomes a classifier thanks to the addition of a supplementary step. Autoencoders, though not designed as classifiers, are highly effective in anomaly detection thanks to their generative nature. When an autoencoder is trained solely on data from a specific class, it becomes adept at reconstructing that class while showing a strong sensitivity to deviations. As a result, when data from a different class are encountered, the autoencoder struggles to accurately reconstruct them, flagging these instances as anomalies.
The most popular approach assumes that the threshold value is equal to T1. The threshold defines the decision areas for the classification of worker bees and drone bees, which is executed as a last stage after obtaining the reconstruction error from the autoencoder.
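Continuing the sketch above, the k-sigma thresholds and the resulting decision reduce to a few lines; the variable names carry over from the earlier snippet.

```python
# k-sigma thresholds computed from the worker bee (training) errors only
T1 = mse_train.mean() + 1 * mse_train.std()
T2 = mse_train.mean() + 2 * mse_train.std()
T3 = mse_train.mean() + 3 * mse_train.std()

# Reconstruction error above the threshold => anomaly => drone bee
is_drone = mse_test > T1
```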

3.2. Maximum Likelihood Approach to Classification

We propose a new way of choosing the threshold for autoencoder output, based on the maximum likelihood approach. The optimal classification decision rule is defined as follows:
$$d^*(x) = \arg\max_{c \in \{\text{class 1},\, \text{class 2}\}} \left[ P(c)\, P(x \mid c) \right],$$
where $P(c)$ is the a priori probability of class $c$ and, importantly, $x$ is not a feature vector as in a standard classifier; here, $x$ is the reconstruction error, i.e., the output of the autoencoder neural network.
The Bayes decision rule d * classifies the output x of the autoencoder to a class of higher a posteriori probability, i.e.,
$$d^*(x) = \begin{cases} \text{class 1}, & \text{if } P(1)\, P(x \mid 1) > P(2)\, P(x \mid 2), \\ \text{class 2}, & \text{otherwise}. \end{cases}$$
The classifier constructed in that way minimizes the expected risk for the zero–one loss function.
For the practical application of the classification rule (11), we relied on probability density function estimation. In our tests, we used Gaussian estimation, since the histograms of the MSE obtained from the autoencoder matched the normal distribution quite well. The determination of the optimal threshold T* by fitting Gaussian probability density functions to the data is illustrated in Figure 9.
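A sketch of this fitting step, assuming Gaussian class densities and a priori probabilities estimated from the class counts, could look as follows:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def ml_threshold(mse_workers, mse_drones, p1=None, p2=None):
    """Maximum likelihood threshold T*: the crossing point of the weighted
    Gaussian densities fitted to the two reconstruction-error histograms."""
    mu1, s1 = norm.fit(mse_workers)   # class 1: worker bees
    mu2, s2 = norm.fit(mse_drones)    # class 2: drone bees
    n = len(mse_workers) + len(mse_drones)
    p1 = len(mse_workers) / n if p1 is None else p1   # a priori probabilities
    p2 = len(mse_drones) / n if p2 is None else p2
    # Solve p1 * N(x; mu1, s1) = p2 * N(x; mu2, s2) between the two means;
    # assumes the classes are separated enough for a single crossing there
    g = lambda x: p1 * norm.pdf(x, mu1, s1) - p2 * norm.pdf(x, mu2, s2)
    return brentq(g, mu1, mu2)

# Usage with the variables from the earlier sketch:
# T_star = ml_threshold(mse_train, mse_test)
```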
The proposed threshold values, which separate the decision areas of the classifier, are marked by the dotted vertical lines in Figure 7 and Figure 8, where we present exemplary histograms of the MSE losses from two numerical experiments, returned by the autoencoder neural network for training (worker bees—class 1) and testing (drone bees—class 2) datasets.
The trained autoencoder neural network with the threshold value already calculated is at this point ready to perform class prediction for new input data in a fast way with low computational cost.

3.3. Statistical Evaluation

The binary classification by the autoencoder reconstruction error into the two classes of worker bees and drone bees, based on the spectral and cepstral feature vector representations from the training and test sets, allowed for the calculation of the following quality indicators: $TP$—true positives (number of correctly detected worker bees); $TN$—true negatives (number of correctly detected drone bees); $FP$—false positives (number of falsely detected worker bees); and $FN$—false negatives (number of falsely detected drone bees).
For the verification of the results, based on the four counts listed above, the accuracy of the classifier was calculated. In binary classification, the accuracy is defined as the proportion of correct predictions, being the sum of true positives and true negatives, to the total number of signals in the testing set:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}.$$
In addition to the accuracy, we also analyzed another classifier performance measure, the F1-score, calculated in the following way:
$$F1\text{-}score = \frac{2\, TP}{2\, TP + FP + FN}.$$
Both the accuracy and the F1-score can be expressed as percentages, from zero to one hundred. The accuracy measures the proportion of correct predictions of a classifier, while the F1-score is the harmonic mean of the precision and recall (sensitivity) of the method. An ideal classifier would have an accuracy and F1-score equal to 100%.
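For completeness, a short sketch computing both measures from predicted and true labels; the label encoding (1 = worker bee, 0 = drone bee) is an assumption matching the definitions above.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Accuracy and F1-score from the four counts; positive class = worker bee."""
    tp = np.sum((y_pred == 1) & (y_true == 1))   # correctly detected workers
    tn = np.sum((y_pred == 0) & (y_true == 0))   # correctly detected drones
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1
```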

4. Results

The results presented in this section were obtained for a selected set of 3400 recordings of worker bees and 1700 recordings of drone bees from our openly accessible database [37]. We extracted feature vectors, as frequency-domain representations of the analyzed audio signals, with the use of four methods: two spectral (Burg algorithm and MUSIC) and two cepstral (MFCC and GTCC). As described in the previous sections, the experimental classification of worker bees and drones was performed with the use of autoencoder neural networks with 1, 2, 3, and 4 encoder hidden layers, which we denote by NN1, NN2, NN3, and NN4.
The classification of drone bees and worker bees was carried out with four different autoencoder reconstruction error thresholds T1, T2, T3, and T*, described in Section 3.1 and Section 3.2. The classification accuracy and the F1-score obtained for the thresholds T1, T2, and T3, which follow the well-known 'k-sigma' rule dependent on the mean value and the standard deviation of the reconstruction error,
$$T_k = \mathrm{mean}(MSE_{train}) + k \cdot \mathrm{std}(MSE_{train}),$$
are presented in Figure 10 and Figure 11.
The detailed numerical results for the classification of drone bees and worker bees for all four autoencoder reconstruction error thresholds T1, T2, T3 and the optimal Bayesian threshold T* are shown in Table 1.

5. Discussion

Autoencoder neural networks are a specialized form of generative networks designed to copy and compress input data, encoding them in a reduced-dimensional space, and then reconstructing a representation of the original signal (or object) at the network’s output. Although autoencoders are not inherently classifiers, they excel in anomaly detection due to their generative properties. When trained exclusively on data from a single class, the autoencoder becomes highly sensitive to deviations from that class, effectively identifying instances from a second class as anomalies.
In this study, we leveraged this capability by training the autoencoder solely on representations of worker bees. This approach is justified by the fact that worker bees constitute the vast majority of a bee colony, while drone bees typically comprise no more than 15% of the population, and only during the brief swarming period. Consequently, this method simplifies the training process for beekeepers, who can train the network in their specific environments, such as during periods when drone bees are inactive.
The proposed usage of the autoencoder neural network as an anomaly detection tool allowed for the transformation of the high-dimensional classification problem into a one-dimensional one. The features of the honeybee signals were extracted with spectral and cepstral analysis methods, leading to 512-dimensional feature vectors for the Burg and MUSIC methods, and 120-dimensional feature vectors for the GTCC and MFCC extraction methods, due to the frequency band aggregation on the cepstral scale executed by the mel-scale or gammatone filter banks. The high-dimensional signal representations were then applied to the autoencoder neural network, whose most common task is generative reconstruction of the input data, but our approach was different. We noticed that for well-separated classes of signals, their representations produced a distinct reconstruction error: low for the worker bees, whose signals were used for training the autoencoder, and high for the drone bees, whose appearance at the entrance to the beehive is quite rare and can be treated as an anomaly. This led us to propose the novel method of classifying honeybees based only on the autoencoder reconstruction error. The standard output of an autoencoder neural network is a high-dimensional feature vector, i.e., a generative model mimicking the input data. Our approach, based on maximum likelihood (schematically presented in Figure 12), reduced the dimensionality of the problem and protected against the curse of dimensionality, which is a common problem in pattern recognition tasks.
Analyzing the reconstruction error histograms for both classes of signals, worker bees and drone bees (compare Figure 7 and Figure 8), we noticed that in most cases we had at least good separation of the classes. The results for the standard threshold T1 (around 96.28% for the Burg method, 92.13% for GTCC, 93.85% for MFCC, and 97.06% for MUSIC) were improved by using the threshold T2 at two standard deviations from the mean value, and slightly corrected again with the threshold T3 at three standard deviations. The reason for this effect is the high separability of the autoencoder MSE in the two classes of signals, and the result is highly data-dependent. It is worth noting that for different datasets with less separated histograms of the output MSE, the thresholds T2 and T3 can potentially lead to worse accuracy than the standard threshold T1, which is the safe option.
The proposed maximum likelihood threshold T* is a numerically obtained threshold that separates the decision areas of the classifier in the optimal way—see Figure 9. Applying the T* threshold to the autoencoder reconstruction error in fact produces an empirical version of the Bayes classifier with the smallest probability of misclassification. In theory, the misclassification probability of the Bayes classifier with full probabilistic information about the class distributions is the lowest among all classifiers, hence the name optimal. In practice, when the probability distributions in the decision rule (11) must be replaced with the empirical histograms obtained for the dataset, the misclassification rate remains the lowest and, at the same time, the complementary accuracy remains the highest. The accuracy results shown in Table 1 confirm this theoretical property of the Bayes classifier for our data. The accuracy for the optimal threshold T* was greater than 99.6% in all but two cases: the Burg and MUSIC preprocessing methods combined with the neural network structure NN1 with only one hidden layer. The high-dimensional signal representations, in the form of 512-dimensional vectors, were not encoded as effectively by a single hidden layer as by encoders with 2, 3, and 4 hidden layers. The optimal threshold T* gives accuracy close to 100%, which means that the classes of worker bees and drone bees, represented by feature vectors obtained by the described methods, are highly separable.

5.1. Prototype of LTE Module

The presented signal classification methods will be embedded into an IoT system based on a cellular LTE module. The ultimate goal is to build a functional beehive monitoring system with wireless communication. The device located near the beehive will not only record the audio signals but will also perform the signal processing and send reports to the beekeeper. Currently, we are working on a prototype of an evaluation board with an LTE module. A photo of the board is shown in Figure 13.
The good theoretical results give hope for a practical realization of the proposed system with a decent level of accuracy. It should be clearly noted that the results shown in this paper were obtained for a dataset of selected clean signals, recorded in low-noise conditions, and we are aware of these idealized conditions. The first step was to check whether it is possible to recognize drones and worker bees based on audio signals only, which we have shown to be possible. The next step will be confronting the proposed methods with data collected continuously at the beehive with the LTE module. The ground truth will be established by visual confirmation of the honeybee type on video recordings, as before. For this real-life application of the theoretical solution, we need a data acquisition and wireless communication module that will be able to transfer the data over the cellular network and send specific alarms defined by the beekeeper. The system should be able to send an update on the status of the honeybee colony at a given time or on demand. The proposed system can be extended by adding more sensors and enriching the daily reports with additional data about, for example, weather conditions outside the beehive and the temperature inside it.
Currently, we are developing a wireless communication system for audio data collection. Our hardware includes a radio module and a digital signal processor (DSP). The DSP has hardware support for the convolution operations used, for example, in the MUSIC and autoencoder algorithms. The preprocessing can be computationally quite costly; however, neural networks, aside from the training phase, can be highly efficient and fast. The computational complexity of the MUSIC preprocessing method is determined by the estimation of the $M \times M$ autocorrelation matrix for signal frames of length $N$ samples (equal to $N \cdot M^2$ operations, where $M \ll N$) and by the eigenvalue decomposition of complexity $O(M^3)$. For the case of NN2, with two hidden layers in the encoder, the computational complexity of the already trained autoencoder neural network is $O(n_0 n_1 + n_1 n_2 + n_2 n_3 + n_3 n_4 + n_4 n_5)$, where $n_i$ is the number of neurons in the $i$th layer, $i = 0, 1, \ldots, 5$. In our test, the number of neurons in the input layer was $n_0 = 512$, in the code layer $n_3 = 8$, and in the output layer $n_5 = 512$.
The proposed wireless system will not acquire data continuously, as a real-time system is not necessary. Instead, we plan to collect the audio data in short intervals (e.g., 1 s), followed by a 2 s break for processing. The numbers of worker bees and drone bees will be stored in two counters, and when the ratio of drones to worker bees exceeds a chosen value, an alarm will be sent to the beekeeper via the GSM network. Further testing of the MUSIC preprocessing method is necessary to accurately define the system parameters and ensure its reliability.

6. Conclusions

In this article, we investigated and proposed an early swarming detection method by determining the most effective preprocessing of audio signals. We compared the results for four different methods of signal representation in the frequency domain, including mel-frequency cepstral coefficients, gammatone cepstral coefficients, and the MUSIC and Burg algorithms. The extracted features were applied to an autoencoder neural network. To separate the classes of worker bees and drones, the maximum likelihood threshold T* was adopted in the classification part of the data processing.
The highest variability of the results can be seen for the standard threshold T1, for which the best performance was obtained for MUSIC feature extraction, followed by the Burg method in second place. Both spectral methods were significantly more accurate for all autoencoder structures compared to the cepstral methods, GTCC and MFCC, for which the results were several percentage points lower. For the 2-sigma threshold T2, and the 3-sigma T3, we observe much smaller variability in the results between all four feature extraction methods, mainly varying in the range 96.87–99.86%.
Even lower accuracy variability between 99.66 and 99.97% is obtained for the optimal threshold T*. This was possible only due to the high separability of the probability distributions of autoencoder reconstruction errors in classes of worker bees and drone bees. In that case, the best classification result was again achieved by the MUSIC spectral method.
The final conclusion is that the MUSIC feature extraction method performed the best for all proposed threshold values T1, T2, T3, and optimal T*, achieving the highest accuracy and the highest F1-score. The spectral methods, Burg and MUSIC, generally use higher-dimensional signal representations, and the insufficient number of hidden layers of the encoder (as for the NN1 case with only one hidden layer) negatively affects the results in comparison to more extensive autoencoder structures. With that constraint, the spectral methods Burg and MUSIC outperformed the cepstral methods GTCC and MFCC for the standard value of threshold T1, commonly used in anomaly detection problems.
Spectral methods, particularly parametric approaches such as the MUSIC algorithm and Burg’s spectral estimation, offer significantly higher resolution in the frequency-domain representation of a signal. It is therefore unsurprising that these spectral methods outperform cepstral methods in our application. Cepstral techniques, like mel-frequency cepstral coefficients (MFCCs), were originally designed for specific signal types, such as voice parameterization, and are commonly employed as default methods for audio signal processing. However, our experimental results demonstrate that for non-standard signals, such as insect buzzing, cepstral methods yield lower performance compared to spectral methods.
The proposed solution for audio classification of honeybees can be further developed for drone bee detection, facilitating the creation of an intelligent counting system for beekeepers. Future research should experimentally assess whether an autoencoder trained on data from one beehive can generalize effectively to another beehive and evaluate the impact of environmental noise in new settings on the accuracy of honeybee classification.
Data collection for this study was conducted over a brief period on a clear, sunny day, resulting in a database of clean recordings. Given the anomaly detection capability of autoencoder neural networks, it is likely that environmental noise or rain would be erroneously classified as anomalies, akin to the drone bee class. In future work, we intend to address the challenge of noisy data by collecting a more comprehensive dataset. While the classification method presented in this manuscript may currently yield low accuracy in outdoor scenarios, it provides valuable insights into which frequency-domain signal representation methods most effectively capture the distinguishing features between the two classes.

Author Contributions

Conceptualization, U.L.; methodology, U.L. and P.B.; software, U.L. and P.B.; validation, U.L. and P.B.; formal analysis, U.L.; investigation, U.L. and P.B.; resources, P.B.; data curation, P.B.; writing—original draft preparation, U.L. and P.B.; writing—review and editing, U.L. and P.B.; visualization, U.L. and P.B.; supervision, U.L.; project administration, U.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in a publicly accessible repository. The data presented in this study are openly available in “Dataset for honey bee audio detection” at https://zenodo.org/doi/10.5281/zenodo.10359685 (accessed on 9 July 2024), reference number [37].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ARAutoregressive process
ARMAAutoregressive moving average process
CNNConvolutional neural networks
DCTDiscrete cosine transform
FFTFast Fourier transform
GTCCGammatone cepstral coefficients
LSTMLong short-term memory
MAMoving average process
MFCCMel-frequency cepstrum coefficient
MLPMultilayer perceptron
MSEMean square error
MUSICMultiple signal classification
PSDPower spectral density
RNNRecurrent neural network
STFTShort-time Fourier transform
SVMSupport vector machine

References

  1. Johnson, B.R. Honey Bee Biology; Princeton University Press: Princeton, NJ, USA; Oxford, UK, 2023.
  2. Seeley, T.D. The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies; Harvard University Press: Cambridge, MA, USA, 1995.
  3. Tautz, J. The Buzz about Bees: Biology of a Superorganism; Springer: Berlin/Heidelberg, Germany, 2008.
  4. Wright, W.D. The Honey Bee; J.B. Lyon Company, Printers: Albany, NY, USA, 1913.
  5. Hall, H.; Bencsik, M.; Newton, M. Automated, non-invasive Varroa mite detection by vibrational measurements of gait combined with machine learning. Sci. Rep. 2023, 13, 10202.
  6. Bencsik, M.; Bencsik, J.; Baxter, M.; Lucian, A.; Romieu, J.; Millet, M. Identification of the honey bee swarming process by analysing the time course of hive vibrations. Comput. Electron. Agric. 2011, 76, 44–50.
  7. Hadjur, H.; Ammar, D.; Lefévre, L. Toward an intelligent and efficient beehive: A survey of precision beekeeping systems and services. Comput. Electron. Agric. 2022, 192, 106604.
  8. Ayup, M.M.; Gärtner, P.; Agosto-Rivera, J.L.; Marendy, P.; de Souza, P.; Galindo-Cardona, A. Analysis of Honeybee Drone Activity during the Mating Season in Northwestern Argentina. Insects 2021, 12, 566.
  9. Hayashi, S.; Satoh, T. Body size and age of drone honeybees (Apis mellifera) affect the structure and characteristics of mating congregations via dispersal. Apidologie 2023, 54, 3.
  10. Muerrle, T.M.; Hepburn, H.R.; Radloff, S.E. Experimental determination of drone congregation areas for Apis mellifera capensis Esch. J. Apic. Res. 2007, 46, 154–159.
  11. Jiang, J.-A.; Wang, J.-C.; Huang, C.-P.; Lee, M.-H.; Liu, A.-C.; Lin, H.-J.; Wang, C.-H.; Chou, C.-Y.; Yang, E.-C. Foraging flight-based health indicators for honey bee colonies using automatic monitoring systems. Comput. Electron. Agric. 2024, 216, 108476.
  12. Neubauer, L.C.; Davidson, J.D.; Wild, B.; Dormagen, D.M.; Landgraf, T.; Couzin, I.D.; Smith, M.L. Honey bee drones are synchronously hyperactive inside the nest. Anim. Behav. 2023, 203, 207–223.
  13. Gradisek, A.; Slapničar, G.; Šorn, J.; Luštrek, M.; Gams, M.; Grad, J. Predicting species identity of bumblebees through analysis of flight buzzing sounds. Bioacoustics 2017, 26, 63–76.
  14. Ribeiro, A.P.; da Silva, N.F.F.; Mesquita, F.N.; Araujo, P.D.C.S.; Rosa, T.C.; Mesquita-Neto, J.N. Machine learning approach for automatic recognition of tomato-pollinating bees based on their buzzing-sounds. PLoS Comput. Biol. 2021, 17, e1009426.
  15. Kontogiannis, S. An Internet of Things-Based Low-Power Integrated Beekeeping Safety and Conditions Monitoring System. Inventions 2019, 4, 52.
  16. Voudiotis, G.; Kontogiannis, S.; Pikridas, C. Proposed Smart Monitoring System for the Detection of Bee Swarming. Inventions 2021, 6, 87.
  17. Ntawuzumunsi, E.; Kumaran, S.; Sibomana, L. Self-Powered Smart Beehive Monitoring and Control System (SBMaCS). Sensors 2021, 21, 3522.
  18. Williams, S.M.; Bariselli, S.; Palego, C.; Holl, R.; Cross, P. A comparison of machine-learning assisted optical and thermal camera systems for beehive activity counting. Smart Agric. Technol. 2022, 2, 100038.
  19. Libal, U.; Biernacki, P. Detecting drones at an entrance to a beehive based on audio signals and autoencoder neural networks. In Proceedings of the IEEE Signal Processing Symposium (SPSympo), Karpacz, Poland, 26–28 September 2023; pp. 99–104.
  20. Libal, U.; Biernacki, P. MFCC Selection by LASSO for Honey Bee Classification. Appl. Sci. 2024, 14, 913.
  21. Libal, U.; Biernacki, P. Drone flight detection at an entrance to a beehive based on audio signals. Arch. Acoust. 2024; advance online publication.
  22. Hamza, A.S.; Tashakkori, R.; Underwood, B.; O'Brien, W.; Campell, C. BeeLive: The IoT platform of Beemon monitoring and alerting system for beehives. Smart Agric. Technol. 2023, 6, 100331.
  23. Nunes-Silva, P.; Hrncir, M.; Guimarães, J.T.F.; Arruda, H.; Costa, L.; Pessin, G.; Siqueira, J.O.; de Souza, P.; Imperatriz-Fonseca, V.L. Applications of RFID technology on the study of bees. Insect. Soc. 2019, 66, 15–24.
  24. Davis, S.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 357–366.
  25. Di, N.; Sharif, M.Z.; Hu, Z.; Xue, R.; Yu, B. Applicability of VGGish embedding in bee colony monitoring: Comparison with MFCC in colony sound classification. PeerJ 2023, 11, e14696.
  26. Phan, T.-T.-H.; Nguyen-Doan, D.; Nguyen-Huu, D.; Nguyen-Van, H.; Pham-Hong, T. Investigation on new Mel frequency cepstral coefficients features and hyper-parameters tuning technique for bee sound recognition. Soft Comput. 2023, 27, 5873–5892.
  27. Soares, B.S.; Luz, J.S.; de Macêdo, V.F.; Silva, R.R.V.e.; de Araújo, F.H.D.; Magalhães, D.M.V. MFCC-based descriptor for bee queen presence detection. Expert Syst. Appl. 2022, 201, 117104.
  28. Zlatkova, A.; Gerazov, B.; Tashkovski, D.; Kokolanski, Z. Analysis of parameters in algorithms for signal processing for swarming of honeybees. In Proceedings of the 2020 28th Telecommunications Forum (TELFOR), Belgrade, Serbia, 24–25 November 2020; pp. 1–4.
  29. Cejrowski, T.; Szymanski, J.; Logofătu, D. Buzz-Based Recognition of the Honeybee Colony Circadian Rhythm. Comput. Electron. Agric. 2020, 175, 105586.
  30. Kawakita, S.; Ichikawa, K. Automated classification of bees and hornet using acoustic analysis of their flight sounds. Apidologie 2019, 50, 71–79.
  31. Nolasco, I.; Benetos, E. To bee or not to bee: Investigating machine learning approaches to beehive sound recognition. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Surrey, UK, 19–20 November 2018; pp. 133–137.
  32. Quaderi, S.J.; Labonno, S.; Mostafa, S.; Akhter, S. Identify the Beehive Sound using Deep Learning. Int. J. Comput. Sci. Inf. Technol. 2022, 14, 13–29.
  33. Kulyukin, V.; Mukherjee, S.; Amlathe, P. Toward audio beehive monitoring: Deep learning vs. standard machine learning in classifying beehive audio samples. Appl. Sci. 2018, 8, 1573.
  34. Ruvinga, S.; Hunter, G.; Duran, O.; Nebel, J.-C. Identifying Queenlessness in Honeybee Hives from Audio Signals Using Machine Learning. Electronics 2023, 12, 1627.
  35. Kim, J.; Oh, J.; Heo, T.Y. Acoustic Scene Classification and Visualization of Beehive Sounds Using Machine Learning Algorithms and Grad-CAM. Math. Probl. Eng. 2021, 2021, 5594498.
  36. Truong, T.H.; Nguyen, H.D.; Mai, T.Q.A.; Nguyen, H.L.; Dang, T.N.M.; Phan, T.-T.-H. A deep learning-based approach for bee sound identification. Ecol. Inform. 2023, 78, 102274.
  37. Biernacki, P. Dataset for Honey Bee Audio Detection. Zenodo. 2023. Available online: https://zenodo.org/doi/10.5281/zenodo.10359685 (accessed on 9 July 2024).
  38. Kay, S.M. Modern Spectral Estimation: Theory and Application; Prentice Hall: Englewood Cliffs, NJ, USA, 1988.
  39. Orfanidis, S.J. Introduction to Signal Processing; Prentice-Hall: Englewood Cliffs, NJ, USA, 1995.
  40. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
  41. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2016.
Figure 1. Honeybees Apis mellifera L.: male drone (left) and female worker bee (right). Photography: Beltsville Agriculture Research Center, public domain.
Figure 2. Signal processing diagram.
Figure 3. Diagram of PSD estimation by MUSIC.
Figure 4. The MFCC extraction diagram.
Figure 5. The GTCC extraction diagram.
Figure 6. General structure of autoencoder neural network.
Figure 7. Example of poor class separation: worker bee (training set), drone bee (test set). Histograms of the MSE loss produced by the autoencoder neural network with marked threshold values T1, T2, T3, and T*.
Figure 8. Example of excellent class separation: worker bee (training set), drone bee (test set). Histograms of the MSE loss produced by the autoencoder neural network with marked threshold values T1, T2, T3, and T*.
Figure 9. Practical illustration of maximum likelihood approach to the classification with the optimal threshold T* separating the decision areas.
Figure 10. Accuracy of classification of worker bees and drones by autoencoder neural networks with 1, 2, 3, or 4 encoder hidden layers for threshold values: T1, T2, T3.
Figure 11. F1-score for classification of worker bees and drones by autoencoder neural networks with 1, 2, 3, or 4 encoder hidden layers for threshold values: T1, T2, T3.
Figure 12. A novel approach to autoencoder neural networks: transformation of high-dimensional classification problem based on feature vectors to a one-dimensional one based on autoencoder reconstruction error.
Figure 13. Our evaluation board prototype with LTE module and SIM card slot, designed for data acquisition, signal processing, and wireless communication with a beehive.
Table 1. Classification accuracy for class separation by thresholds T1, T2, T3, and T*.

Feature      Autoencoder   Accuracy   Accuracy   Accuracy   Accuracy
Extraction   Structure     for T1     for T2     for T3     for T*
Burg         NN1           95.72%     98.03%     98.60%     98.90%
Burg         NN2           96.57%     98.85%     99.58%     99.61%
Burg         NN3           96.43%     98.84%     99.60%     99.61%
Burg         NN4           96.40%     98.86%     99.65%     99.66%
GTCC         NN1           92.33%     97.62%     98.87%     99.93%
GTCC         NN2           92.06%     97.69%     98.94%     99.93%
GTCC         NN3           91.64%     97.91%     98.98%     99.94%
GTCC         NN4           92.48%     97.82%     98.89%     99.92%
MFCC         NN1           93.89%     97.65%     98.91%     99.89%
MFCC         NN2           93.80%     97.65%     98.99%     99.82%
MFCC         NN3           93.91%     97.68%     98.93%     99.86%
MFCC         NN4           93.78%     97.82%     98.95%     99.85%
MUSIC        NN1           96.87%     96.87%     98.54%     99.55%
MUSIC        NN2           99.59%     99.85%     99.86%     99.89%
MUSIC        NN3           97.69%     98.94%     99.54%     99.97%
MUSIC        NN4           93.99%     96.89%     98.79%     99.89%
The highest accuracy in bold font.