1. Introduction
Forests are the principal part of the terrestrial ecosystem and renewable resources of our planet. Forests enhance carbon sequestration, prevent soil erosion and desertification, contribute to the protection of watersheds and air quality, and provide habitats to a diverse array of species [
1]. Different forest disasters, including forest fires, tree pathogens, insect pests, and rodents, severely threaten the health of forest ecosystems. They further impact the stable development of agriculture, forestry, and the livelihood of humans [
2]. Among forest pest species, trunk-boring beetles are particularly difficult to manage. Such species include
Agrilus planipennis (Coleoptera: Buprestidae),
Semanotus bifasciatus (Coleoptera: Cerambycidae), and
Eucryptorrhynchus brandti (Coleoptera: Curculionidae). They tunnel and feed in the cambium layer of trees, which transports nutrients and water to the leaves. As a result, infested trees become increasingly weaker, their limbs and branches gradually fall off, and they eventually die. What is more troublesome is that some borer infestations often go unnoticed until plants or parts of plants begin to die or show external signs of damage. To deal with this issue, a relatively novel method is to embed a piezoelectric accelerometer into the tree trunk to pick up boring vibrations caused by larvae, and then to feed these vibrations to a trained model to distinguish whether the trunk is infected [
3]. To date, the standard equipment for vibration detection using contact sensors is the piezoelectric accelerometer. An accelerometer consists of a piezoelectric crystal coupled with a seismic mass. It detects the displacement of the substrate to which it is attached and measures the acceleration. Regarding its reliability, the highest degree of accuracy is reached by stud mounting the sensor on the substrate [
4]. Automatic detection of wood-boring larvae has been investigated before, but the signal analysis has consistently been hampered by background noise recorded at the same time. To increase the detectability of vibrations, our research focuses on the preprocessing step of this method, namely the enhancement of boring vibration signals.
Several studies have applied the technique described above in the identification of boring vibrations. Bilski et al. [
5] used an accelerometer to record the vibrations of wood-boring insects’ larvae in wooden constructions, and then employed the support vector machine to perform classification based on features defined in the time domain. Sutin et al. [
6] designed an algorithm that automatically detects the pulses of
Anoplophora glabripennis and
Agrilus planipennis larvae with parameters typical for larva-induced signals. The trunk was identified as infected when the mean rate of the detected insect pulses per minute exceeded a predefined threshold. Zhu et al. [
7] utilized the sound parameterization technique, which is frequently used in speech recognition, to discern insects. In their study, mel-frequency cepstral coefficients (MFCCs) were extracted from the recordings after preprocessing, followed by classification using a trained Gaussian mixture model (GMM). With the development of deep learning and neural networks, artificial intelligence has been shown to be a promising solution for various challenges that require specialized and labor-intensive work [
8]. Many researchers in recent years have held the same point of view and adopted deep learning models in their studies. For example, Sun et al. [
9] proposed a lightweight convolutional neural network called InsectFrames, which contains only four convolutional layers. They employed the technique of keyword spotting to automatically identify the boring vibrations of
Semanotus bifasciatus and
Eucryptorrhynchus brandti larvae. Karar et al. [
10] proposed a new IoT-based framework for early vibration detection of red palm weevils using the fine-tuned transfer learning classifier InceptionResNet-V2, which was trained using vibration data collected by a Tree Vibes [
11] recording device.
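The pulse-rate rule attributed to Sutin et al. above can be illustrated with a minimal Python sketch. The function names, the example pulse times, and the threshold of 10 pulses per minute are assumptions made for illustration only; they are not taken from the cited work, and the pulse detection step itself is not shown.

```python
def mean_pulse_rate(pulse_times_s, duration_s):
    """Mean number of detected pulses per minute over a recording."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(pulse_times_s) / (duration_s / 60.0)


def is_infested(pulse_times_s, duration_s, threshold_per_min):
    """Flag the trunk as infested when the mean pulse rate exceeds the threshold."""
    return mean_pulse_rate(pulse_times_s, duration_s) > threshold_per_min


# Hypothetical example: 45 pulses detected in a 3-minute (180 s) recording.
pulses = [i * 4.0 for i in range(45)]  # pulse onset times in seconds
print(mean_pulse_rate(pulses, 180.0))    # 15.0 pulses per minute
print(is_infested(pulses, 180.0, 10.0))  # True for the assumed threshold of 10
```

In practice the threshold would have to be chosen from labeled recordings, since a value that is too low raises false alarms from residual noise pulses.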
Although some studies are available, few have taken the interference of background noise into consideration. Yet detecting relevant signals in the presence of noise is a particular challenge for a recognition model. Noise is defined as unwanted sound or signal. Its sources can be biotic or abiotic; examples include wind and rain, whose noise is generally below 2 kHz, and anthropogenic noise caused by traffic and heavy machinery [
12]. Although the vibrational signaling channel has traditionally been considered “private” and thus less influenced by environmental noise than the acoustic channel, it can also be highly noisy in plant environments. In the vibrational channel, the frequency range of boring vibrations and the frequency range of environmental noise overlap, causing severe interference. Studies of the natural vibrational environment show that, regardless of the environment studied, geophysical vibrations induced by light wind are nearly always present as a component of the natural vibroscape. Stronger wind gusts generate high-amplitude vibrations in the frequency range up to 5 kHz, characterized by rapid, unpredictable short-term variations in amplitude [
4]. Our recordings show that the boring vibrations of
Agrilus planipennis are characterized by frequencies slightly below 2 kHz and about 17 kHz. As depicted in the spectrogram of noise in our recordings, bird twitter ranges from 1.5 to 4 kHz and babble noise ranges from below 400 Hz to above 1 kHz. The amplitudes of boring vibrations are typically low and subject to masking by incidental noise of biotic and abiotic origin. In addition to incidental noise, the signals also contain noise from the measurement equipment itself. Noise has been verified to have a negative impact on recognition accuracy [
13]. Mankin et al. [
14] analyzed the vibrations of
Rhynchophorus and the background noise in both the time and frequency domain. The results indicated that part of the background noise that has the same frequency as the larval vibrations could interfere with the discrimination of an infestation. Liu et al. [
15] designed a recognition model based on the convolutional neural network (CNN) to recognize the boring vibrations of
Semanotus bifasciatus. Moreover, they tested the noise immunity of the proposed CNN model and GMM. The results clearly showed that noise had a significant impact on the classification accuracy of both the CNN model and GMM: the lower the signal-to-noise ratio (SNR), the greater the decrease in accuracy. When the SNR was −7 dB, the recognition accuracy decreased by 10.8% and 15.6% for the CNN model and GMM, respectively. Zhou et al. [
16] introduced improved anti-noise power normalized cepstral coefficients (PNCCs) based on the wavelet package for trunk borer vibrations, and adopted the genetic algorithm support vector machine (GA-SVM) as a classifier. The audio clips in the research consisted of the clean boring vibrations of borer pests of five different species and various kinds of environmental noise. At an SNR of −5 dB, the accuracy of the model decreased from 100% to 83%, and it declined further to 70% at −10 dB.
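The SNR levels quoted in these studies can be reproduced on test data by scaling a noise recording before adding it to a clean signal. The following Python sketch assumes the usual definition SNR = 10 log10(P_signal / P_noise). The helper names, the 16 kHz sampling rate, and the synthetic tone and Gaussian noise are hypothetical stand-ins, not the data or the procedure of the cited authors.

```python
import math
import random


def power(x):
    """Mean square value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale the noise so that signal + scaled noise has the target SNR."""
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled


rng = random.Random(0)
# Synthetic stand-ins: a 2 kHz tone sampled at an assumed 16 kHz, plus Gaussian noise.
clean = [math.sin(2 * math.pi * 2000 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

for target in (0.0, -5.0, -10.0):
    _, scaled = mix_at_snr(clean, noise, target)
    print(target, round(snr_db(clean, scaled), 6))
```

A negative SNR such as −5 dB or −10 dB means that the noise power exceeds the signal power, which is why recognition accuracy drops so sharply in the results reported above.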
A previous study [
5] pointed out that noise that deteriorates proper larva detection should be suppressed if possible. Other research [
17] also indicated that environmental noise can be significant and can cover the feeble vibrations of wood-boring insects, consequently leading to false alarms. Therefore, the addition of a denoising or enhancement procedure to boring vibrations can mitigate or even eliminate interference, ensuring accurate early detection and opportune treatment. Yet, most existing discrimination methods of boring vibrations lack a noise suppression procedure or adopt primitive techniques. These techniques include spectral subtraction and the minimum mean square error short-time spectral amplitude estimation in the frequency domain, and adaptive filtering methods in the time domain. Most of the described noise reduction methods require a priori knowledge of the noise profile to operate correctly [
13]. Thus, these methods do not yield satisfactory results and are impractical. Signal analysis has made significant progress in recent years, and it is now time for these advances to be applied to biotremology.
To engineers and physicists, both sound and vibration encompass mechanical waves that can be technically described as both vibrational and acoustic. The categorization of these signals in biotremology is biological or perceptual in nature. Sound or acoustic waves are far-field, purely longitudinal waves perceived by pressure or pressure-difference receivers, whereas vibrations cover two further ways of emitting mechanical energy in biological interactions, namely “contact vibration or rhythmic touch” and “near-field medium motion”, and are perceived by motion detectors. The transmission medium imposes important differences as well: sounds are air-borne signals and vibrations are substrate-borne signals. Air is a relatively homogeneous medium, and its properties are fairly predictable. Substrates, on the other hand, are very heterogeneous media, and the transmission properties that influence attenuation and filtering depend on the physical properties of the substrate [
4]. Despite these differences, boring vibrations may share the same processing techniques as acoustic signals. One boring vibration identification model [
9] employed the technique of keyword spotting. Similarly, it is theoretically feasible to apply speech enhancement to boring vibration signals.
Speech enhancement is the task of taking noisy speech as input and producing an enhanced speech output with better speech quality, intelligibility, and, sometimes, better performance in downstream tasks [
18]. Since one probe is capable of detecting boring vibrations within a spherical region of a trunk [
10], the signal is a single channel, and monaural speech enhancement is the proper technique for this situation. Classical speech enhancement methods include spectral subtraction, Wiener filtering, statistical model-based methods, and subspace methods [
19]. Although the above-mentioned algorithms have the capability to suppress background noise, they are cumbersome and complicated [
20] and do not generalize well. Recently, neural network-based approaches have experienced much success in speech enhancement due to their powerful modeling capabilities [
21]. Among the neural network-based methods, some carry out enhancement on frequency-domain acoustic features and are called spectral-mapping-based approaches. In these approaches, speech signals are analyzed and reconstructed using the short-time Fourier transform (STFT) and inverse STFT, respectively. Another class of methods performs enhancement directly on the raw waveform; these are called waveform-mapping-based approaches [
22]. The waveform-mapping-based approaches do not rely on the representation of speech signals in the frequency domain, and as a result avoid the loss of accurate phase information. In addition, they are simpler because no transformation between the time domain and the frequency domain is needed.
This study takes the boring vibrations of emerald ash borer (EAB) larvae as the research subject. The emerald ash borer,
Agrilus planipennis Fairmaire, 1888 (Coleoptera: Buprestidae), is an invasive beetle of East Asian origin that has caused extreme levels of mortality in ash [
23], with a devastating economic and ecological impact [
24]. The monitoring methods for EAB are mainly visual inspections and the application of pheromone and color traps. The control measures include the cutting of infested trees, which are mostly detected by dry branches and typical D-shaped exit holes on the bark. In addition, the replacement of North American and European ash trees with more resistant Asian ash species or possibly hybrids, as well as chemical and biological control, are available [
25]. The detection of trunk-boring beetles by their vibrational cues is an efficient and convenient approach. Because it is independent of visual access, it enables early warning and early reaction. The aforementioned necessity for a denoising or enhancement process for the boring vibration signals of Buprestidae encouraged us to propose a waveform-mapping-based boring vibration enhancement model called VibDenoiser (Vibration Denoiser), which consists of convolution layers and SRU++ [
26] layers. The dataset used in this research consists of boring vibrations and environmental noise and a mixture of them. For boring vibrations, we inserted a piezoelectric vibration probe into several ash trunks that were infected by EAB larvae to pick up their boring vibrations. For environmental noise, we inserted the same probe into a dead ash trunk that was not infected by EAB and placed the trunk in noisy environments to pick up noise propagated in the trunk. Our results showed that VibDenoiser is able to increase SNR by 18.57, and it runs in real-time on a single laptop CPU core. We applied our noisy boring vibrations to four well-known classification models, namely VGG16 [
27], ResNet18 [
28], SqueezeNet [
29], and MobileNetV2 [
30]. Their classification accuracies were 81.14%, 89.39%, 78.45%, and 85.77%, respectively. With vibration clips enhanced by our model, their accuracies increased by a large margin, reaching 92.51%, 96.47%, 88.89%, and 90.40%, respectively. These results show that VibDenoiser is able to suppress noise effectively at an affordable cost, ensuring a more accurate early detection of larvae.
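The size of the improvement for each classifier follows directly from the accuracies reported above. The short Python sketch below only restates that arithmetic; the dictionary layout and variable names are illustrative choices, and the numbers are the ones given in the text.

```python
# Classification accuracy (%) on noisy and on enhanced clips, as reported in the text.
noisy = {"VGG16": 81.14, "ResNet18": 89.39, "SqueezeNet": 78.45, "MobileNetV2": 85.77}
enhanced = {"VGG16": 92.51, "ResNet18": 96.47, "SqueezeNet": 88.89, "MobileNetV2": 90.40}

# Absolute gain in percentage points for each model, and the mean gain.
gains = {name: round(enhanced[name] - noisy[name], 2) for name in noisy}
mean_gain = round(sum(gains.values()) / len(gains), 2)

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
print(f"mean: +{mean_gain:.2f} percentage points")
```

The gains are 11.37, 7.08, 10.44, and 4.63 percentage points for VGG16, ResNet18, SqueezeNet, and MobileNetV2, respectively, or 8.38 percentage points on average.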
4. Discussion
Until now, convenient and efficient monitoring of trunk-boring beetles has remained a difficult problem in pest control and forest management. An appropriate solution is to record vibrations in tree trunks by embedding a piezoelectric accelerometer and use a trained model to detect boring vibrations in these recordings. As stated in a previous study, environmental noise can be significant and cover the feeble vibrations of wood-boring insects [
17], thus having a negative impact on the recognition accuracy [
13]. It is therefore necessary to add an enhancement procedure to the boring vibrations before recognition. A previous study [
9] suggests that methods for acoustic signals are applicable to biological signal analysis; specifically, methods for processing air-borne sound signals are also feasible for substrate-borne boring signals. Given the need for an enhancement procedure, deep learning-based speech enhancement is the best option to realize it. Deep learning-based speech enhancement has shown numerous breakthroughs in recent years, and its powerful modeling capability may enhance signals beyond speech. Approaches to detecting larval activity rely heavily on clear boring vibration signals. To alleviate this problem, our research applied speech enhancement to the boring vibrations of trunk-boring larvae. As far as we know, almost no researchers have applied enhancement models to boring vibrations. We recorded dozens of hours of EAB boring vibrations and environmental noise to create our dataset, and we proposed the enhancement model VibDenoiser, which employs deep learning-based speech enhancement. VibDenoiser is a waveform-mapping-based convolutional recurrent neural network (CRN) with an encoder-decoder architecture. The encoder and decoder each consist of five convolution layers, and two SRU++ layers are placed between them, functioning as a sequence modeling module. The loss function for our model is log-cosh, a time-domain point-to-point loss. It resembles the mean square error but is less sensitive to abnormal points. We also tested a frequency-domain loss function, but it yielded poor results, probably because the evaluation metrics we adopted are computed in the time domain. The model with 8-layer SRU++ showed the best performance in our study. However, its model size and inference time were not satisfactory and can be further reduced.
The 2-layer SRU++ substantially reduced the model size and inference time at the expense of a slight decrease in the enhancement effect, which makes it better suited to practical application. It can be clearly seen in the frequency spectra of both the noisy and enhanced boring vibration segments (
Figure 4) that VibDenoiser could suppress most of the noise, leaving clear vibrations of EAB for classification. We applied four well-known classification models to both noisy and enhanced boring vibration segments for discrimination, and their accuracy increased by a large margin after enhancement. This further supports the necessity of an enhancement procedure and demonstrates the strong performance of VibDenoiser.
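To illustrate why the log-cosh loss behaves like the mean square error for small errors while being less affected by abnormal points, the sketch below compares the two losses on a short waveform. It is written in plain Python with hypothetical sample values and function names; it is not the training code of VibDenoiser.

```python
import math


def log_cosh(x):
    """Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_loss(estimate, target):
    """Mean log-cosh of the sample-wise error between two waveforms."""
    return sum(log_cosh(e - t) for e, t in zip(estimate, target)) / len(target)


def mse_loss(estimate, target):
    """Mean square error between two waveforms, for comparison."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


# Hypothetical waveforms: a target, a close estimate, and an estimate with one outlier.
target = [0.0, 0.1, -0.2, 0.05]
close = [0.01, 0.12, -0.18, 0.04]
outlier = [0.01, 0.12, -0.18, 10.0]  # last sample is an abnormal point

print(log_cosh_loss(close, target), mse_loss(close, target))
print(log_cosh_loss(outlier, target), mse_loss(outlier, target))
```

For small errors log-cosh is approximately half the squared error, whereas for large errors it grows only linearly, so a single abnormal sample contributes far less to the loss than it does under the mean square error.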
Datasets in future research should include the boring vibrations of a variety of larvae to train models with broader applicability. Future model development should aim for lighter models with fewer parameters and faster inference. Our model is valuable for the development of insect pest monitoring. It can be integrated into larvae surveillance programs with mobile deployment for general use. Prototypes of trunk-boring beetle monitoring systems built with our model are capable of early warning in both forests and urban areas, which enables early reaction and treatment, such as quarantine measures, sanitation felling, chemical control, and biological control [
25]. Some parasitoids are potential agents for the biological control of EAB. Examples include the braconid
Spathius polonicus Niezabitowski (Hymenoptera: Braconidae: Doryctinae); a species of egg parasitoid
Oobius sp. (Hymenoptera: Encyrtidae); and three species of larval ectoparasitoids:
Tetrastichus planipennisi Yang (Hymenoptera: Eulophidae),
Atanycolus nigriventris Voinovskaja-Krieger (Hymenoptera: Braconidae: Braconinae), and
Spathius galinae Belokobylskij et Strazanac (Hymenoptera: Braconidae: Doryctinae) [
46]. Chemical (insecticide) control of EAB falls into three categories: systemic insecticides that are applied as soil injections or drenches; systemic insecticides applied as trunk injections or trunk implants; and protective cover sprays that are applied to the trunk, main branches, and foliage. Insecticide formulations include Merit
®, IMA-jet
®, Imicide, and some other options [
47]. We hope that our research facilitates the development of a non-invasive early detection tool for larval infestations and thereby contributes to pest control and forest management.