1. Introduction
Drilling systems are used for the exploration and harvesting of oil, gas, and geothermal energy [
1]. Typically, the drilling process is affected by unwanted vibrations, including axial, lateral, and torsional vibrations [
2,
3]. More specifically, axial vibrations manifest as bit bounce; lateral vibrations manifest as buckling; and torsional vibrations manifest as stick–slip. Industry data show that these vibrations can cause serious damage to drill strings and drill bits, such as premature failure of drill string components, excessive wear of the bit, and many other negative impacts. Among all these abnormal vibrations, torsional vibration is the most destructive, affecting approximately 40% of annual drilling depth and leading to extended non-productive time (NPT) [
2,
3,
4,
5]. It is worth noting that these vibration signals not only contain key information for predicting downhole lithology [
6,
7,
8,
9,
10,
11], but their early identification and warning can also effectively reduce abnormal vibrations, thereby significantly reducing risks such as bit wear. Most downhole acceleration sensors collect data at a frequency of up to 100 Hz, which is sufficient to identify traditional stick–slip [
12,
13,
14]. Stick–slip arises from a rotational constraint that causes the drill string to twist. In this vibration mode, the drill bit stops rotating (stick) for a period and then, once the increasing torque overcomes the restriction, is released and rotates (slip) again [
12,
13,
14,
15,
16]. With advances in sensor technology enabling sampling frequencies up to 1000 Hz, high-frequency torsional oscillations (HFTOs) have been identified. The fundamental cause of HFTOs is usually considered to be torsional resonance at the natural frequency of the lower bottom-hole assembly (BHA), which is excited by the interaction between the drill bit and the rock. Both types of torsional vibration modes can cause BHA connection fatigue, drill bit wear, and electronic component damage [
2]. These risks underscore the critical need for real-time torsional vibration monitoring and early warning systems, necessitating advanced monitoring solutions beyond traditional approaches [
12].
Extensive research on drill string vibration modeling has been conducted worldwide, primarily focusing on two core components [16, 17, 18]: drill string dynamics modeling, which has evolved from low-degree-of-freedom torsional pendulum models to high-fidelity distributed parameter and finite element models [18, 19, 20], and bit–rock interaction modeling, which mainly includes torsional friction models and axial–torsional coupled rock/bit interaction models [21, 22]. Besselink et al. pioneered the coupling of axial and torsional dynamics, proposing a rate-independent bit–rock interaction law and employing semi-analytical methods to reveal axial vibration as the root cause of torsional stick–slip, while clarifying how bit bluntness parameters regulate stability [
23]. Kamel and Yigit advanced drill string dynamics modeling by integrating hoisting system dynamics with a three-phase polycrystalline diamond compact (PDC) bit cutting model, developing a “smoothness index” optimization function to quantify operational parameter effects on stick–slip and bit–bounce coupling, thus providing guidance for deep well parameter optimization [
24]. Most recently, Sharma et al. focused on geothermal hard rock scenarios, comparing velocity decaying friction (VDF) and state-dependent delayed friction (SDDF) models to validate the superior field data matching of VDF, identifying critical sensitive parameters, and establishing stick–slip operational windows [
25]. Although these methods have made significant contributions, they face major challenges. Vibration modeling and monitoring are complex processes that require ideal conditions. However, in actual drilling operations, these ideal conditions cannot be consistently met due to variations in BHA, reservoirs, geology, and formations [
26,
27,
28,
29].
With the development of downhole measurement tools and intelligent algorithms, various machine learning (ML) methods have been successfully applied to drill string vibration identification [
30]. Traditional approaches include Okoli et al.’s model for classifying axial/lateral vibrations using torque, rotational speed, and weight on bit (WOB) (with an accuracy of 50–85%) [
31], and Saadeldin et al.’s multi-model method for predicting axial, torsional, and lateral vibration modes using surface drilling parameters (with an accuracy of 90–99%) [
12]. For stick–slip vibrations, Hegde et al. pioneered the development of an ML-driven model for grading stick–slip severity [
32]; Gupta et al. proposed a data scale-based model selection strategy: random forest/gradient boosting for small datasets and convolutional neural network–long short-term memory (CNN-LSTM) for large datasets [
33]. Recent breakthroughs include Zha and Pham’s deep neural network (DNN), which achieved 99% accuracy in stick–slip classification using generalized data on torque, rotational speed, WOB, and triaxial acceleration [
34]; Yahia et al., on the other hand, explored LSTM data-driven models, transfer learning, and hybrid models integrating physical features, evaluating the impact of normalization methods on generalization ability [
35]. Kulke et al. developed a stability map algorithm based on downhole high-frequency data, suppressing HFTOs by inducing controlled stick–slip [
36].
Real-time monitoring is crucial for avoiding operational risks [
37,
38]. Michael Yi et al. developed a Bayesian network model that predicts downhole failures (stick–slip/bit–bounce/whirl) using only surface data [
39]; Millan et al. combined surface measurements with machine learning to achieve real-time vibration detection [
40]. Zhao et al. developed an ML event-triggered model to capture abnormal vibration signals in real time based on drilling data [
41]. Zhang et al. confirmed that HFTOs induce drilling tool failures through high-frequency downhole measurements in North America and proposed a real-time parameter adjustment scheme [
42]. Elahifar et al. innovatively combined Bayesian optimized extra trees (BO_ET) with model-agnostic meta-learning to realize real-time prediction of stick–slip events using downhole data [
43]. de Souza et al. integrated pre-job BHA modeling with real-time HFTO monitoring, dynamically adjusting parameters to reduce tool failure rates by 35% [
44]. However, surface monitoring systems are prone to high false alarm rates due to signal attenuation and transmission delays. To overcome these limitations, hybrid approaches have emerged: Sheth et al. integrated physics-based and data-driven methods [
45]; Hutahaean et al. used Bayesian-optimized Random Undersampling Boosting (RUSBoost) decision trees to predict the risk of drilling tool failures involving stick–slip [
46]; and Huang et al. innovatively introduced deep reinforcement learning to autonomously optimize drilling parameters through a comprehensive reward function (incorporating stick–slip, HFTOs, and bit wear) [
47]. For severity quantification, Zhang et al. combined the downhole rotational speed range predicted by eXtreme Gradient Boosting (XGBoost) with continuous wavelet transform (CWT) analysis to achieve real-time assessment of stick–slip severity [
48]. Challenges remain, however: the LSTM vibration prediction model by Vishnumolakala et al. suffers from accuracy degradation on sequences longer than 30 s due to vanishing gradients and convergence issues [
49].
The emergence of high-frequency downhole sensors and high-speed data transmission systems has enabled big data-driven drilling analytics. To address the limitations in prediction accuracy caused by data transmission delays and the inadequacies of downhole vibration data in long-sequence forecasting, this study conducted time-domain and frequency-domain characteristic analyses on five high-frequency downhole engineering parameters (weight on bit, torque, and triaxial acceleration) collected near the drill bit using a self-developed downhole engineering parameter measurement tool (with a sampling frequency of 400 Hz and a cumulative operation time of 23 h). Two characteristic operating conditions of torsional vibration were identified. This study proposes DS-DW-TimesNet, which integrates downsampling modules for computational efficiency, dilated convolutional structures to expand temporal receptive fields, and weight normalization for stable training, collectively forming a lightweight solution for torsional vibration prediction and early warning. Compared with Informer, LSTM, TimesNet-V1, and TimesNet-V2, DS-DW-TimesNet demonstrates superior effectiveness and performance.
2. Methodology
Figure 1 illustrates the workflow of the proposed DS-DW-TimesNet early warning model. The framework achieves precise torsional vibration prediction through three core stages: (a) data preprocessing, (b) model training and optimization, and (c) evaluation of prediction performance. The detailed implementation process is described as follows:
- (a)
Data Preprocessing:
Downhole sensors collect raw high-frequency signals. Due to the limited transmission bandwidth of measurement-while-drilling (MWD) systems, the raw data undergo two-step real-time preprocessing downhole: (1) the mean and root mean square (RMS) values of acceleration and torque are calculated over a fixed 1 s window, and (2) the calculated results are uploaded to the surface system, while the raw data remain stored in the downhole tool.
After receiving the data, the surface system processes them through the model’s built-in downsampling evaluation module. First, a downsampling operation is performed. Subsequently, a threefold evaluation using Kullback–Leibler divergence, relative mean, and relative RMS is conducted. Only the downsampled data that pass the verification are input into the prediction model.
- (b)
Model Training and Optimization:
The preprocessed data first undergo layer normalization for standardization before being input into the DW-TimesNet architecture. The model extracts spectral features of the input sequence through Fast Fourier Transform (FFT), identifies k periods, and calculates the corresponding amplitude weights. Based on the identified period lengths, the model reshapes the one-dimensional time series into a two-dimensional tensor representation, capturing both intra-period and inter-period trends simultaneously. In the feature extraction stage, dynamically weight-normalized dilated convolutions are applied to process the two-dimensional representations. The final output is generated through residual summation of multiple TimesBlock modules. Model optimization is achieved through performance evaluation on the validation set.
- (c)
Prediction Result Evaluation:
The finalized model is deployed on the test dataset, with quantitative assessment conducted using three key metrics.
Historical data from offset wells are used for model training. Drilling engineers can view prediction curves through the real-time monitoring interface, and when a risk trend of abnormal torsional vibration is identified, they can intervene in advance to reduce the occurrence of drilling accidents.
2.1. TimesNet
The TimesNet model employs TimesBlock as its backbone for time-series analysis, transforming 1D time series into 2D tensors to enhance representational capacity while simultaneously achieving unified modeling of intra-period and inter-period variations [
50]. The overall architecture of the model is illustrated in
Figure 2.
2.1.1. Dimensional Expansion from 1D to 2D
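To uniformly represent the temporal variations within and between periods, it is first necessary to explore the periodicity of the time series. For a one-dimensional time series $X_{1D} \in \mathbb{R}^{T \times C}$ with a time length of $T$ and a channel dimension of $C$, its periodicity can be calculated via FFT on the time dimension, specifically as follows:
$$A = \mathrm{Avg}\left(\mathrm{Amp}\left(\mathrm{FFT}(X_{1D})\right)\right), \quad \{f_1, \ldots, f_k\} = \underset{f_* \in \{1, \ldots, \lfloor T/2 \rfloor\}}{\mathrm{arg\,Topk}}(A), \quad p_i = \left\lceil \frac{T}{f_i} \right\rceil, \quad i \in \{1, \ldots, k\}$$
Among them, $A$ represents the intensity of each frequency component in $X_{1D}$. The $k$ frequencies with the highest intensity, $\{f_1, \ldots, f_k\}$, correspond to the most significant period lengths $\{p_1, \ldots, p_k\}$. The above process is abbreviated as follows:
$$A, \{f_1, \ldots, f_k\}, \{p_1, \ldots, p_k\} = \mathrm{Period}(X_{1D})$$
For the selected period $p_i$, the original one-dimensional time series $X_{1D}$ is folded:
$$X_{2D}^{i} = \mathrm{Reshape}_{p_i, f_i}\left(\mathrm{Padding}(X_{1D})\right), \quad i \in \{1, \ldots, k\}$$
Zero-padding, $\mathrm{Padding}(\cdot)$, is performed at the end of the sequence to ensure that the sequence length is divisible by the period $p_i$.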
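As a rough illustration of the period search and folding described above, the following Python sketch applies a naive DFT to a single-channel toy signal. The function names, the toy signal, and the choice to place one period on each row are assumptions made for illustration only; the period length is taken as ceil(T / f), following the original TimesNet formulation, and the averaging of amplitudes over channels is omitted.

```python
import cmath
import math


def dft_amplitudes(x):
    """Amplitude of each non-negative frequency bin of a real sequence (naive DFT)."""
    T = len(x)
    amps = []
    for f in range(T // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * f * t / T) for t in range(T))
        amps.append(abs(s))
    return amps


def top_k_periods(x, k):
    """Return (frequency, period_length, amplitude) for the k strongest non-zero bins."""
    T = len(x)
    amps = dft_amplitudes(x)
    amps[0] = 0.0  # ignore the DC component, which carries no periodicity
    freqs = sorted(range(1, len(amps)), key=lambda f: amps[f], reverse=True)[:k]
    return [(f, math.ceil(T / f), amps[f]) for f in freqs]


def fold_to_2d(x, period):
    """Zero-pad the end of x to a multiple of `period`, then put one period on each row."""
    pad = (-len(x)) % period
    padded = list(x) + [0.0] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


# Toy signal (hypothetical): a dominant 12-sample period plus a weaker 5-sample one.
T = 48
signal = [math.sin(2 * math.pi * t / 12) + 0.3 * math.sin(2 * math.pi * t / 5)
          for t in range(T)]
periods = top_k_periods(signal, k=2)
grid = fold_to_2d(signal, periods[0][1])
print(periods[0][:2], len(grid), len(grid[0]))
```

In this layout, moving along a row follows the variation within one period, while moving down a column compares the same phase across successive periods.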
2.1.2. 2D Temporal Feature Extraction
The architecture comprises multiple stacked TimesBlock modules. The input sequence first passes through an embedding layer to obtain the deep feature representation $X_{1D}^{0}$. Each subsequent TimesBlock then progressively processes the hierarchical features $X_{1D}^{l-1}$ from its preceding layer, employing 2D convolutions to extract temporal patterns through learned 2D representations.
2.1.3. Dimensionality Reduction and Adaptive Fusion
The model first projects the 2D temporal features back to 1D space while preserving their original period lengths. It then generates the final output through an amplitude-weighted summation of all 1D sequences, where the weights correspond to the spectral amplitudes of their associated frequency components.
In this step, the truncation operation $\mathrm{Trunc}(\cdot)$ removes the zero-padding added during the padding operation in Step 1.
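The fusion step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, each branch is a plain list for one channel, and the amplitudes are normalized with a softmax before weighting, which is an assumption taken from the original TimesNet formulation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branches, amplitudes, original_length):
    """Truncate each 1D branch to the original length, then sum with amplitude weights."""
    weights = softmax(amplitudes)  # assumption: softmax-normalized spectral amplitudes
    fused = [0.0] * original_length
    for w, branch in zip(weights, branches):
        for t in range(original_length):  # trailing padded steps are dropped here
            fused[t] += w * branch[t]
    return fused


# Two branches of padded length 4 for an original sequence of length 3.
out = fuse_branches([[1.0, 1.0, 1.0, 9.0], [3.0, 3.0, 3.0, 9.0]], [0.0, 0.0], 3)
print(out)
```

With equal amplitudes the two branches receive equal weight, so the fused sequence is their element-wise average over the first three steps.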
2.1.4. Residual Architecture
As illustrated in
Figure 3, adjacent TimesBlock modules are interconnected through residual connections. Specifically, the output from the previous layer is element-wise added to the output of the current layer, forming a residual learning framework that guarantees performance stability when increasing model depth.
2.2. Dilated Convolution with Weight Normalization
The initial TimesNet architecture adopted Inception-v1 modules for 2D convolution operations when transforming temporal sequences into two-dimensional representations. However, the inherent computational overhead of Inception’s multi-branch parallel convolutions substantially hinders training efficiency, particularly when processing high-resolution 2D temporal maps [
51]. To address this, this study has implemented dilated convolutions with exponentially increasing dilation rates, which systematically expand the effective receptive field while maintaining computational efficiency.
The effective size of a dilated kernel is $k' = k + (k - 1)(d - 1)$, where $k'$ represents the size of the dilated convolution kernel, $d$ denotes the dilation factor, and $k$ refers to the size of the original convolution kernel.
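A short worked example may help to show how dilation enlarges the receptive field without adding parameters. The sketch below uses the standard relation for the effective kernel size of a dilated convolution; the kernel size of 3 and the dilation rates (1, 2) are assumed values chosen for illustration, and stride 1 is assumed throughout.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by one dilated kernel along a single axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 convolution layers stacked with the given dilations."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


for d in (1, 2, 4):
    print(d, effective_kernel_size(3, d))
print(stacked_receptive_field(3, (1, 2)))
```

Under these assumptions, two stacked 3-tap layers with dilations 1 and 2 cover 7 positions per axis, the same span as a single 7-tap kernel, while using 6 weights per axis instead of 7.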
Weight normalization (WN) is integrated into the convolutional network to reparameterize each weight vector as a unit-norm direction scaled by a learnable magnitude. This offers three key benefits: (1) faster convergence through stable gradient updates, (2) better generalization via implicit regularization, and (3) improved training stability by mitigating exploding gradients. By decoupling the magnitude of each weight vector from its direction, WN enhances optimization in deep networks.
Representing the weight $w$ using the parameter vector $v$ and the scalar $g$, the new parameter formulation is given by
$$w = \frac{g}{\lVert v \rVert} v$$
where $v$ is the unnormalized weight vector, $g$ is a learnable scaling parameter, and $\lVert v \rVert$ represents the Euclidean norm of $v$.
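The reparameterization can be checked numerically with a minimal sketch. The function name and the example values are hypothetical; the code only demonstrates that the resulting weight keeps the direction of the parameter vector while its norm equals the scalar parameter.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for a parameter vector v and scalar g."""
    norm = math.sqrt(sum(c * c for c in v))
    return [g * c / norm for c in v]


w = weight_norm([3.0, 4.0], g=2.0)
w_norm = math.sqrt(sum(c * c for c in w))
print(w, w_norm)
```

Because the norm of the weight is fixed by the scalar alone, rescaling the parameter vector leaves the effective weight unchanged, which is the property that stabilizes the gradient updates.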
2.3. Downsampling Performance Evaluation
The bandwidth limitation in downhole data transmission significantly affects the accuracy of torsional vibration prediction. To address this issue, an optimized data compression method is proposed, aiming to achieve two critical objectives: (1) substantially reducing the volume of transmitted data and (2) effectively retaining key dynamic features essential for prediction reliability. The performance of the proposed method is evaluated using the following quantitative metrics: the Kullback–Leibler (KL) divergence, which measures the distributional differences in transmitted data before and after compression [52], and the relative mean variation and relative variance variation, which quantify the preservation of transmission data distribution characteristics [53, 54].
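One possible way to compute these three checks is sketched below in plain Python. Several choices here are assumptions rather than details given in the text: the KL divergence is estimated from histograms on a shared set of bins with a small smoothing constant, the relative variation is defined as (new - reference) / |reference|, and the downsampling is a non-overlapping block average over a synthetic signal.

```python
import math


def histogram_probs(values, lo, hi, bins, eps):
    """Smoothed histogram probabilities of `values` on a shared [lo, hi] grid."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    probs = [c / len(values) + eps for c in counts]
    total = sum(probs)
    return [p / total for p in probs]


def kl_divergence(reference, candidate, bins=20, eps=1e-9):
    """Histogram estimate of KL(reference || candidate)."""
    lo = min(min(reference), min(candidate))
    hi = max(max(reference), max(candidate))
    p = histogram_probs(reference, lo, hi, bins, eps)
    q = histogram_probs(candidate, lo, hi, bins, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def relative_change(reference, candidate):
    """Assumed definition: (candidate - reference) / |reference|; reference must be non-zero."""
    return (candidate - reference) / abs(reference)


# Synthetic stand-in for the transmitted data, downsampled by block averaging (n = 3).
transmitted = [2.0 + math.sin(0.1 * t) for t in range(600)]
n = 3
downsampled = [mean(transmitted[i:i + n]) for i in range(0, len(transmitted) - n + 1, n)]

kl = kl_divergence(transmitted, downsampled)
rel_mean = relative_change(mean(transmitted), mean(downsampled))
rel_var = relative_change(variance(transmitted), variance(downsampled))
print(kl, rel_mean, rel_var)
```

A downsampling factor would then be accepted only if all three values stay below thresholds chosen for the application; the text does not state those thresholds, so none are assumed here.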
4. Model Training
This study utilizes near-bit measurements including WOB, torque, and triaxial acceleration to construct two preprocessed datasets: Case One applies n-second moving average filtering (n ∈ {1,2,3,4,5}) to smooth transient fluctuations, while Case Two employs RMS processing within identical time windows to preserve vibration energy characteristics. The datasets contain labeled samples of both normal vibrations and torsional oscillations, with n = 1 (representing original downhole data transmission resolution) serving as the baseline for comparative validation. For time-series data, both datasets are partitioned into training, validation, and test sets in a 7:1:2 ratio, strictly following the chronological order.
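The difference between the two preprocessing schemes and the chronological split can be illustrated with a small sketch. The helper names are hypothetical, and non-overlapping windows are assumed for simplicity; a sliding window would be an equally plausible reading of "moving average".

```python
import math


def window_mean(samples, window):
    """Mean of each non-overlapping window (Case One style smoothing)."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]


def window_rms(samples, window):
    """Root mean square of each non-overlapping window (Case Two style energy measure)."""
    return [math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
            for i in range(0, len(samples) - window + 1, window)]


def chronological_split(series, ratios=(7, 1, 2)):
    """Split a series into train/validation/test parts without shuffling."""
    total = sum(ratios)
    n = len(series)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return series[:train_end], series[train_end:val_end], series[val_end:]


oscillation = [1.0, -1.0, 1.0, -1.0]
print(window_mean(oscillation, 4), window_rms(oscillation, 4))

train, val, test = chronological_split(list(range(10)))
print(train, val, test)
```

For a symmetric oscillation the windowed mean is zero while the RMS stays at the oscillation amplitude, which is why the RMS dataset retains vibration energy that averaging removes.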
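For the multi-step prediction task, five comparative experimental models were established: (1) DW-TimesNet; (2) TimesNet-V1, referring to the original TimesNet architecture; (3) TimesNet-V2, which uses the InceptionV2 structure to extract two-dimensional temporal features; (4) Informer; and (5) LSTM. To eliminate random initialization bias, each experiment was repeated 10 times, and the average was taken as the result. The predictive performance was evaluated using standard regression metrics: the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE):
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \quad \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
In this context, $y_i$ denotes the observed value, $\hat{y}_i$ denotes the predicted value, $\bar{y}$ denotes the mean of the observed values, and $N$ is the number of samples.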
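For reference, the three metrics can be computed with a few lines of plain Python. The function names and the toy values are illustrative; the definitions are the standard ones for the coefficient of determination, MAE, and RMSE.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed))


def r2(observed, predicted):
    """Coefficient of determination; can be negative for poor predictions."""
    y_bar = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
flat_prediction = [2.0, 2.0, 2.0, 2.0]
print(mae(observed, flat_prediction), rmse(observed, flat_prediction), r2(observed, flat_prediction))
```

The flat prediction in this toy case gives a negative coefficient of determination, showing that this metric penalizes point-wise misfit even when MAE and RMSE look moderate.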
This study adopts a fully unsupervised learning paradigm, where the original time-series data contain no labels generated manually or by algorithms. The model hyperparameters are set as follows: the learning rate is 0.01, the batch size is 32, patience for early stopping is 3, the dropout rate is 0.1, and the number of training epochs is 10; other hyperparameters are detailed in
Table 2 below.
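The interaction between the epoch limit and the early-stopping patience can be sketched as follows. The dictionary keys, the function name, and the validation-loss values are hypothetical, and the stopping rule (halt after `patience` consecutive epochs without improvement) is an assumed, commonly used interpretation rather than a detail stated in the text.

```python
HYPERPARAMS = {
    "learning_rate": 0.01,
    "batch_size": 32,
    "patience": 3,
    "dropout": 0.1,
    "max_epochs": 10,
}


def early_stopping_epochs(val_losses, max_epochs, patience):
    """Return (index of best epoch, number of epochs actually run)."""
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:  # no improvement for `patience` epochs in a row
                break
    return best_epoch, epochs_run


# Hypothetical validation losses: the best value occurs at epoch index 1.
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
result = early_stopping_epochs(losses, HYPERPARAMS["max_epochs"], HYPERPARAMS["patience"])
print(result)
```

With these example losses, training halts after five epochs and the checkpoint from the second epoch would be kept, so the later improvement at the sixth epoch is never reached.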
The models were employed to predict drilling data from both datasets. All experiments were conducted using the PyTorch 2.2.2+cu121 framework on a hardware platform equipped with an AMD Ryzen 7 7735H CPU, 60 GB RAM, and an NVIDIA RTX 4050D GPU.
5. Experimental Results and Analysis
To achieve comprehensive wellbore data utilization and real-time early warning, the selected model must possess low computational complexity while maintaining stable performance in long-sequence prediction. These characteristics are essential to adapt to field conditions, mitigate transmission error impacts, and ultimately enhance both the accuracy and timeliness of early warnings.
5.1. Case One
As shown in Table 3, when n = 3, the KL divergence in the normal acceleration dimension relative to the transmitted data is significantly lower than that for n = 4 and n = 5. This indicates limited variation in the normal dimension, with lower volatility and a distribution closer to the transmitted data. Additionally, the relative mean and variance changes are negligible for all values of n. Figure 7 confirms this: comparing n = 3 with the transmitted data shows that the non-stationary distribution is virtually unchanged across all dimensions, validating robust preservation of data integrity.
This experiment evaluates multi-step prediction performance across varying time horizons (10–70 steps). All predictions were iteratively generated based on a fixed 10-step historical observation window. The selected prediction horizons were empirically determined through preliminary experimental analysis. Given the 3 s sampling interval of the dataset, these prediction steps correspond to drilling operation forecasts spanning 30 to 210 s in increments of 30 s.
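The relation between prediction steps and forecast duration, and the way history and target windows can be cut from a series, is illustrated in the sketch below. The helper names are hypothetical, and the code shows direct window construction only; it does not reproduce the iterative generation procedure used in the experiments.

```python
def horizon_seconds(steps, sampling_interval_s):
    """Forecast duration covered by a number of prediction steps."""
    return steps * sampling_interval_s


def make_windows(series, history, horizon):
    """Pair each `history`-step input window with the `horizon` steps that follow it."""
    last_start = len(series) - history - horizon
    return [(series[i:i + history], series[i + history:i + history + horizon])
            for i in range(last_start + 1)]


# Case One settings from the text: 10 to 70 steps at a 3 s sampling interval.
durations = [horizon_seconds(steps, 3) for steps in range(10, 71, 10)]
print(durations)

pairs = make_windows(list(range(6)), history=2, horizon=3)
print(pairs)
```

The first print confirms that 10 to 70 steps at a 3 s interval span 30 to 210 s in 30 s increments, and that the 10-step history window corresponds to 30 s of data.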
Compared to the original dataset, the model’s runtime decreased from 108.4 s to 47.8 s after downsampling, representing a 55.9% reduction. This demonstrates a significant improvement in computational efficiency.
The proposed model demonstrates superior performance compared to other deep learning architectures. Particularly at the 210 s prediction step, its long-term time-series forecasting capability is significantly improved. Comparative analysis with LSTM shows that TimesNet-V1 achieves a significant average reduction of 78.8% in RMSE, which confirms its accuracy in trend capture and reliability in multi-step forecasting. In addition, MAE is reduced by 77.2%, indicating a significant decrease in prediction deviations and higher overall precision. Detailed comprehensive performance metrics are presented in
Figure 8.
Figure 9 compares the triaxial vibration prediction performance of different models at the 30 s and 210 s prediction horizons. All models demonstrate accuracy in short-term predictions at 30 s, with the predicted curves closely matching the actual values. However, at the 210 s prediction step, only TimesNet maintains prediction accuracy, while other models show significant performance degradation.
The experimental results demonstrate that while DW-TimesNet shows marginal accuracy gains over TimesNet-V1/V2, its optimized architecture—featuring weight-normalized two-layer dilated convolutions replacing Inception-V1—achieves substantial computational improvements. In continuous 34,353 s predictions, DW-TimesNet reduces runtime by 56.1% at 210 s horizons compared to TimesNet-V1. The simplified structure decreases floating-point operations (FLOPs) by two orders of magnitude (30 s input/210 s prediction) while maintaining prediction accuracy during incremental training; the specific data are shown in
Table 4.
The curves in Figure 10 for 30 s and 210 s predictions based on 30 s of historical data show that the TimesNet model does not exhibit a sharp increase in error as the prediction step increases. The model was tested using data from the same well that were not part of the dataset, with the results presented in Figure 10. Verified against actual drilling data, the model accurately captures the periodic characteristics of HFTOs and achieves precise prediction of peak values. For the measured HFTO periodicity of approximately 130 s in this well, the model can effectively predict the subsequent 210 s vibration trend using 30 s of historical data. This design achieves two goals:
- (1)
Complete cycle coverage: The prediction duration covers the full HFTO cycle of ≥130 s, ensuring accurate assessment of the evolution trend of oscillation energy;
- (2)
Key decision redundancy: An additional 20–50 s of buffer time is provided, compatible with downhole command transmission delays (5–10 s) and operator response time (10–30 s).
5.2. Case Two
Table 5 shows that the relative mean changes for all values of n are close to zero and the relative variance changes consistently match the transmitted data, indicating high stability in both mean and variance measures. However, at n = 2, the KL divergence reveals significant distributional differences in the tangential and normal dimensions compared to the transmitted data. In Figure 11, the comparison between n = 2 and the transmitted data likewise indicates compromised data integrity along the tangential and normal axes.
The experiment conducts multi-step predictions (30–150 steps at 30-step increments) using a 30-step historical window. Given a 1 s sampling interval, these steps translate to 30–150 s drilling operation forecasts.
Comparative analysis shows that although TimesNet has slightly lower R² values, it outperforms other models in RMSE and MAE across different forecasting horizons. Specifically, compared with LSTM, TimesNet-V1 achieves an average reduction of 11.19% in RMSE and a 7.91% decrease in MAE. As shown in
Figure 12, the model’s predicted curves are highly consistent with the actual measurements, which further validates its superiority.
While all models demonstrate excellent short-term forecasting capabilities, only TimesNet maintains high accuracy in long-term predictions. DW-TimesNet, as an optimized variant, achieves a 48.17% reduction in runtime compared to TimesNet-V1 at the 150 s prediction horizon; detailed indicators are provided in Table 6.
Figure 13 compares enlarged 1800 s segments of DW-TimesNet predictions at the 30 s and 150 s horizons. Although the overall trend is consistent with the actual signal, there are large local deviations, which explain the low R²: this metric quantifies both global trend alignment and point-wise fitting accuracy. In torsional vibration early warning applications, where the main goal is to detect macroscopic trend anomalies, the actual impact of these local errors is negligible.
As shown in
Figure 14, the predicted results are remarkably consistent with the characteristics of stick–slip and have been verified against actual drilling data. The predicted outcomes not only capture typical stick–slip behavior but also reveal an interesting phenomenon: regular periodic fluctuations in normal acceleration may serve as an early precursor to stick–slip. This provides a new perspective for the early warning of stick–slip events. In practical drilling operations, the timely and accurate detection of these signs is crucial. Once these early warning indicators are identified, effective mitigation strategies can be promptly implemented, such as increasing torque and adjusting rotational speed.
Based on an in-depth analysis of the stick–slip cycle characteristics, it has been determined that utilizing 30 s of historical data to predict vibrations for the subsequent 150 s is an optimal approach. This conclusion is derived from testing the trained model on data from the same well that were not part of the dataset, with the results illustrated in
Figure 14. This time interval ensures comprehensive coverage of the entire stick–slip cycle, considering the inevitable time delays in downhole data transmission and providing operators with sufficient response time.
5.3. Adjacent Well Test
To verify the generalization performance of the early warning model, this study selected Manshen Well in Shaya County, Aksu Prefecture, Xinjiang Uygur Autonomous Region, China, for testing. This well is approximately 80 km in straight-line distance from Yueman Well, which allows the model's adaptability to a different geographical location to be tested. The tested well section is 3501–3863 m, with a total working time of 83 h, and its BHA is consistent with that of Yueman Well, which supports the comparability of the test results to a certain extent.
In the specific testing process, this study adopted the 210 s root mean square model and 150 s mean model for data processing and analysis of Manshen Well. These two models, with different time window settings, can comprehensively capture the characteristics of the well data from multiple perspectives, thereby more accurately evaluating the generalization performance of the early warning model. The statistical data of Manshen Well processed by the two models are shown in
Table 7 below.
Figure 15 and
Figure 16 show comparisons of the prediction results based on the mean dataset and RMS dataset, respectively. In terms of curve trends, the model’s prediction results are generally consistent with the measured data. However, since the test well and the training well belong to different geological blocks and there are differences in drilling depth, the prediction accuracy at some peak positions has decreased. It is expected that better fitting results can be obtained if adjacent well data from the same block and with similar depths are used for testing.
From the vibration data in
Figure 15, the tangential vibration shows an obvious peak at around 150,000 s, while the axial vibration has a large fluctuation range with significant negative values. These characteristics are consistent with the periodic, large-amplitude fluctuation features caused by the “stick–slip” alternation in torsional vibrations. In particular, the severe fluctuations in axial vibrations conform to the force mutation characteristics during energy storage and release of the drill string in stick–slip, so it can be judged that stick–slip vibrations exist. Combined with the further verification results of downhole stored data, it can be clearly determined that stick–slip is present.
From the analysis of the vibration data characteristics in
Figure 15, it can be seen that the vibration signals do not exhibit the typical time-domain characteristics of HFTOs. Combined with the further verification results of downhole stored data, it can be clearly determined that HFTOs are absent.
5.4. Limitations and Future Work
While the proposed DS-DW-TimesNet demonstrates superior performance, several limitations should be acknowledged:
Geographical Generalization: All experiments were conducted with data from the Fuman Oilfield. The model’s performance in other geological formations requires further validation.
Noise Robustness: Downhole sensor noise may degrade prediction accuracy, which was not explicitly tested.
Data Gaps: The model assumes continuous data streams. Its resilience to missing data scenarios needs evaluation.
Physical Interpretability: Unlike physics-based models, the black-box nature of deep learning may limit operational trust.
Future work will center on adjacent well testing under the same block, depth, and formation conditions, while focusing on the following directions: (a) conducting multi-basin validation; (b) developing hybrid modeling combined with adaptive filters; and (c) researching real-time noise suppression technologies.