As a crucial component of agricultural mechanization, tractors play a significant role in agricultural production. Equipped with various implements, tractors can efficiently perform tasks such as land cultivation and planting, thereby reducing the labor intensity of farmers and enhancing agricultural productivity [1]. However, tractors face numerous challenges [2], including the need to transmit large amounts of power, a wide range of speed variations, harsh operating environments, and frequent load fluctuations. When the reliability of the tractor’s transmission system degrades, or the system fails outright, agricultural operations are delayed, crop yields suffer, and the overall safety of the tractor is compromised [3,4]. Fault diagnosis for the transmission systems of large tractors is therefore crucial.
With the advancement of agricultural mechanization, tractors are evolving towards larger and more intelligent configurations. Tractor transmission systems have become correspondingly more complex, and the range of potential faults more diverse [5]. As a result, traditional fault diagnosis methods are no longer adequate. During operation, components such as the tractor gearbox generate vibrations, and a commonly used deep learning approach to fault diagnosis analyzes these vibration signals to detect gearbox faults [6,7]. Gangsar et al. [8] employed deep learning techniques for state detection and fault diagnosis of rotating machinery. Feng et al. [9] proposed an adaptive spiral flight sparrow search algorithm (SSA) to eliminate the interference components caused by noise and vibration sources. The method is a Laplacian of Gaussian (LoG) filtering technique optimized by the improved SSA, and its effectiveness has been verified experimentally. Wang et al. [10] proposed a lightweight fault diagnosis method based on an attention mechanism and a multi-layer fusion network, which addresses the mismatch between the heavy parameter computation of deep networks and the limited computing resources of current embedded platforms. The method uses a lightweight student network that reduces computational complexity and improves computing speed while maintaining accuracy. Saadi et al. [11] proposed a novel BILSTM neural network that extracts frequency characteristics from two-dimensional images of vibration signals and recombines the frequency data from these images to improve fault recognition accuracy. Guo et al. [12] proposed a method based on an attention CNN and BILSTM (ACNN-BILSTM) to address the inability of existing bearing fault diagnosis methods to select features adaptively and to cope with noise interference. The method introduces a convolutional block attention module that reallocates weights across feature dimensions, strengthening the model’s attention to important features and achieving a high fault recognition rate.
Thanks to the development of deep learning technology, intelligent algorithms have been widely used in a variety of pattern recognition domains, including image processing [13], computer vision [14], natural language processing [15], and medicine [16]. These algorithms are also increasingly applied to mechanical fault diagnosis, where methods have become more sophisticated and intelligent in recent years. For bearing fault identification, Zhang et al. [17] proposed a multi-scale deep residual shrinkage network (DRSN) with a mixed attention mechanism to address two problems in fault diagnosis: unexpected noise in the acquired vibration signals and the attenuation of global information in deep networks. The method introduces a spatial-domain attention mechanism into the residual shrinkage module and constructs a mixed attention mechanism that accounts for both internal and cross-channel characteristics. Combining the DRSN with dilated convolution enhances the global fault information of rolling bearings and improves model accuracy under noise interference. Zhang et al. [18] presented a novel signal decomposition method called empirical standard autoregressive power spectral decomposition, which can effectively decompose bearing fault signals and identify all fault characteristics. Ravikumar et al. [19] presented a fault diagnosis model that combines residual learning [20] and convolutional neural networks (CNNs) [21] for bearing and gear fault identification, reporting an accuracy of 94.08%. Hosseinpour-Zarnaq et al. [22] proposed a novel fault diagnosis approach for tractor gearboxes that classifies gear vibration signals using a Random Forest (RF) [23] classifier and a Multi-Layer Perceptron (MLP) [24] neural network, reporting an accuracy of 95%.
Most of the methods described above first process the original vibration signals with transforms such as the Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT), or Fast Fourier Transform (FFT) [25,26]. The synchrosqueezed transform (SST) [27] is then used to compress the signal and improve the resolution. Finally, the data are converted into two-dimensional images and passed to neural networks for fault recognition and classification. Although SST can improve the quality of signal analysis, it is computationally expensive. In addition, crucial information along the time series may be lost when one-dimensional signals are converted [28], ultimately reducing recognition accuracy, and converting one-dimensional signals into images can itself increase computational complexity and storage requirements [29]. Huang et al. [30] analyzed the feature extraction mechanism of one-dimensional convolutional neural networks and found that they exhibit excellent learning capabilities for time series data. Sun et al. [31] used one-dimensional convolutional neural networks to diagnose bearing faults with high accuracy. The methods above yield satisfactory results in traditional fault diagnosis scenarios such as gearboxes and bearings. However, they require extensive data preprocessing before classification and rely on complex network structures, leaving room to optimize both the structure and the efficiency of these algorithms. Moreover, they do not address noise interference in the harsh working environments of tractors, which makes them potentially unsuitable for fault diagnosis in modern tractor transmission systems.
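To illustrate why one-dimensional convolution suits raw time series, the following minimal sketch applies a single convolutional filter directly to a sampled signal, without any conversion to an image. It is not taken from any of the cited works: the toy signal, the spike position, and the hand-picked kernel are all hypothetical, and in a trained 1DCNN the kernel weights would be learned from data.

```python
import math


def conv1d_valid(signal, kernel, stride=1):
    """Slide the kernel over the signal without padding (cross-correlation),
    which is the core operation of a 1D convolutional layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical toy signal: a smooth sinusoid plus one impulsive spike at sample 40,
# standing in for a fault-induced impact in a vibration record.
n = 64
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
signal[40] += 3.0

# Hand-picked second-difference kernel (an assumption for illustration only).
kernel = [-1.0, 2.0, -1.0]

feature_map = relu(conv1d_valid(signal, kernel))
peak_index = max(range(len(feature_map)), key=feature_map.__getitem__)

# The strongest response occurs where the kernel is centred on the spike.
print(len(feature_map), peak_index)
```

The feature map keeps the temporal ordering of the input, so the position of the impulsive event remains readable from the output. Its length is also of the same order as the signal itself, whereas a time-frequency image has one row per analysed scale or frequency.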
The general procedure of fault diagnosis involves data collection, data preparation, feature extraction, identification, and classification. Feature extraction is an essential component of fault detection [32]. This paper therefore proposes a fault diagnosis method for tractor transmission systems based on one-dimensional convolutional neural networks (1DCNNs) and bidirectional long short-term memory (BILSTM). The method aims to enhance fault recognition accuracy and robustness by improving the CNN to construct a novel feature extractor. Firstly, feature extractors at different scales are constructed using 1DCNNs to directly extract feature information at different levels. Secondly, a multi-head attention (MHA) mechanism is introduced to enhance feature learning and increase fault recognition accuracy. Additionally, an adaptive soft threshold is incorporated to further improve the model’s robustness and generalization ability. Finally, the fused features are fed into a classifier made up of BILSTM and fully connected layers to perform fault recognition and classification.
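The role of soft thresholding in suppressing noise can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the threshold is assumed here to be a fraction `alpha` of the mean absolute feature value, and `alpha` is fixed by hand, whereas an adaptive scheme would typically produce it from the features through a small learned sub-network. The function names and the toy feature vector are hypothetical.

```python
def adaptive_tau(values, alpha):
    """Assumed threshold rule: a fraction alpha (0 < alpha < 1) of the mean
    absolute value of the features. In a trained model alpha would be learned."""
    return alpha * sum(abs(v) for v in values) / len(values)


def soft_threshold(values, tau):
    """Soft thresholding: y = sign(x) * max(|x| - tau, 0).
    Small-magnitude values become zero; larger ones shrink toward zero by tau."""
    out = []
    for v in values:
        mag = abs(v) - tau
        if mag <= 0:
            out.append(0.0)
        else:
            out.append(mag if v > 0 else -mag)
    return out


# Hypothetical feature vector: two strong responses among low-level noise.
features = [0.05, -0.1, 2.0, -0.02, -1.5, 0.08]

tau = adaptive_tau(features, alpha=0.5)
denoised = soft_threshold(features, tau)
print(tau, denoised)
```

Because the threshold scales with the feature magnitudes, the same rule removes the small-amplitude entries in this example while keeping the sign and most of the amplitude of the two dominant ones.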