1. Introduction
Well logging is a geological exploration technique in which specialized logging tools, lowered into the well during or after drilling, measure the physical parameters of the formation near the wellbore. The data are transmitted in real time to the surface via cable or radio. The raw data are then processed and analyzed, converted into useful geological information, and interpreted by experts. This provides crucial data support for subsurface geological studies, oil and gas exploration, and resource evaluation. Common well-logging data include caliper, density, gamma-ray, neutron porosity, and P-wave measurements, among others. These parameters help in understanding the geological characteristics of the area, rock types, fluid content, and the hydrocarbon potential of subsurface formations. Shear wave velocity (Vs) is a crucial parameter for reservoir petrophysical parameter estimation, pre-stack inversion, fluid-type identification, unconventional oil and gas development, and CO2 injection [1,2,3,4,5]. However, the accurate acquisition of Vs is hindered by the high cost of dipole acoustic logging measurements and the limitations imposed by borehole conditions, leading to a lack of Vs data in many regions. Therefore, Vs prediction is essential. Various prediction methods have been proposed for estimating Vs, and they can be divided into three categories: (1) empirical correlation formulas, (2) theoretical rock physics models, and (3) artificial intelligence methods.
Empirical correlation formulas were established using the relationship between existing logging data and S-wave velocity in a specific study area, incorporating various mathematical equations [6,7,8,9]. These formulas are simple and can be applied quickly. However, they are only suitable for relatively simple geological environments and are highly dependent on the characteristics of the study area [10]. This limits the application of empirical correlation formulas for predicting Vs under complex geological conditions.
To address the limitations of empirical correlation formulas, theoretical rock physics models have been introduced to establish the relationship between elastic parameters and reservoir parameters for Vs prediction. Sun et al. utilized the DEM–Gassmann model to achieve more accurate Vs prediction than an empirical correlation formula [11]. Wang et al. introduced a method that inverts the coordination number of saturated rocks from the P-wave velocity by integrating the unified granular media model with Gassmann’s equation [12]. Zhang et al. designed a statistical model that links logging curves with Vs, using Bayesian inversion to calculate key petrophysical parameters of the Xu–White model [13].
To obtain accurate predictions, it is crucial to precisely determine various rock physics parameters such as mineral components, pore characteristics, and fluid distribution. Additionally, owing to noise interference, there is some uncertainty in the predictions of theoretical rock physics models when processing field data, and the prediction process using these models is complex and inefficient [14].
Artificial intelligence, especially deep learning, has developed rapidly and found widespread application in recent years. Convolutional Neural Networks (CNNs) are widely used across various fields; they can automatically and adaptively learn spatial features, establishing nonlinear relationships between inputs and outputs. Deep learning includes various CNN variants: AlexNet, which introduced the ReLU activation function and dropout to significantly improve training speed and prevent overfitting; VGGNet, which uses smaller convolutional kernels (3 × 3) and a deep network structure to enhance feature extraction; ResNet, which introduces residual connections to address the gradient vanishing problem in deep networks; and Inception, which performs multi-scale feature extraction through parallel convolution operations (1 × 1, 3 × 3, 5 × 5) while introducing bottleneck layers to reduce the number of parameters. Wang et al. summarized the application scope of CNNs of different dimensions: 1D-CNNs are suitable for time series and text processing, 2D-CNNs are used for computer vision, and 3D-CNNs are applied to CT image and video analysis [15]. Because of the inherent relationship between Vs and well-logging data, various artificial intelligence methods have been developed to predict Vs from well-logging data. For example, both compressional and shear waves are influenced by the same rock characteristics, and their velocities exhibit a positive correlation [16]. Zhang et al. analyzed the effect of different logging combinations on Vs prediction using a 1D-CNN model [17]. Kheirollahi et al. proposed a new algorithm to estimate Vs in a carbonate reservoir using various predictive models, with a deep artificial neural network demonstrating the highest accuracy [18].
However, previous models were point-to-point prediction models based on logging records at individual depths and did not take into account the time-series characteristics of logging data. The variation of Vs is often closely related to the sedimentary characteristics of the strata. Therefore, when predicting Vs, it is important to consider not only the logging data at the same depth but also the changes in characteristics along the depth direction. Recurrent Neural Networks (RNNs) can handle sequential data by maintaining a hidden state that captures temporal dependencies [19]. However, RNNs face challenges in learning long-term dependencies due to gradient vanishing and exploding issues. Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional Long Short-Term Memory (BiLSTM) networks are RNN variants that enhance the ability of models to capture and learn features across a broader range by incorporating modules such as gate units. LSTM networks employ a more complex structure with three types of gates (input, output, and forget gates) to retain or discard information over longer ranges. GRU is a simplification of LSTM that uses only two gates: reset and update gates. With fewer gates and parameters, GRUs are more efficient than LSTMs; however, for certain complex tasks, this simplicity may make them less flexible. BiLSTM enhances the traditional LSTM by processing sequences in both forward and backward directions and can outperform LSTM in sequence-based tasks. Therefore, these variant models can be applied to identify and capture the complex relationship between Vs and well-logging data [20]. Wang et al. proposed a novel hybrid Vs prediction network based on LSTM and optimized by a Particle Swarm Optimization (PSO) algorithm [21]. Zhang et al. utilized an LSTM model using six kinds of well-logging data to predict Vs [22]. You et al. proposed an LSTM network to predict the complete Vs profile of wells using limited Vs logging data [23]. Wang et al. proposed a hybrid model combining CNN with LSTM to realize intelligent inversion of Vs [24]. Using logging and core data, Liu et al. introduced an improved approach to time/depth series prediction, establishing the relationship between elastic properties and reservoir characteristics [25]. Yang et al. proposed a TCN–SA–BiLSTM model to study the internal correlation of features in logging datasets [26]. Feng et al. introduced a theoretical rock physics model into deep learning algorithms to enhance physical interpretability [27].
Well logs are sequential data generated along the depth of wells, with different types of logs representing distinct petrophysical characteristics of rocks. Therefore, the vertical dimension of logging data can be treated as analogous to the temporal dimension of the subsurface, while the various types of logs correspond to spatial characteristics. By considering both temporal and spatial characteristics simultaneously, more comprehensive and detailed information can be obtained for predicting Vs than with conventional methods. Parisa et al. presented an HC–BiLSTM model that predicts Japan’s future earthquake magnitudes by analyzing spatial and temporal patterns across 49 zones [28]. Shan et al. proposed a model integrating CNN, BiLSTM, and an attention mechanism to accurately predict missing well-log data by leveraging spatio–temporal correlations in highly heterogeneous reservoirs [29]. Guo et al. developed the MC–GAN–BiLSTM model to address missing logging data in geophysical logging [30]. Ma et al. developed a CNN–BiLSTM–attention hybrid neural network model for predicting horizontal in situ stresses in complex formations [31]. Chen et al. developed a hybrid network comprising a 2D-CNN and GRU to establish more complex nonlinear relationships between inputs and outputs [32]. Sun et al. utilized BiLSTM with an attention mechanism to predict porosity in the Tarim Oilfield by handling the nonlinear relationship between logging parameters and porosity [33].
To focus more on key features in the logging data, we incorporate the attention mechanism into the model, allowing it to better capture the serial relationships among logging parameters. The attention mechanism is a core concept in machine learning that enables models to focus selectively on specific parts of the inputs, helping them capture relevant information and enhance performance across various tasks [34].
The attention mechanism enhances prediction by selectively focusing on critical logging features, thus capturing temporal dependencies and reducing interference. This selective emphasis helps the model maintain key relationships over time, which improves prediction accuracy and model interpretability, allowing an understanding of the significant time intervals or features influencing outcomes. Additionally, it mitigates long-range dependency issues, making it well suited to complex sequences such as logging data, while its adaptability enables effective generalization across different tasks, ensuring robust performance in predicting Vs [35].
Traditional methods are highly reliant on the quality of logging data, and prediction results can be directly influenced by noise and missing data. There are also challenges in selecting appropriate logging data for fitting, and the nonlinear relationships between different types of logs increase the uncertainty of the prediction results. To address these issues, a hybrid network combining Inception, an attention mechanism, and BiLSTM is proposed in this article. The Inception module, recognized for its ability to capture multi-scale features through parallel convolutional layers with varying kernel sizes, is a more efficient feature extractor than conventional CNN layers. This allows the proposed model to capture diverse geological patterns, improving its ability to handle complex data. Meanwhile, the attention mechanism module is integrated to highlight essential information dynamically. By selectively emphasizing important time steps, this module enhances the proposed network’s sensitivity to crucial temporal features, addressing the limitations of conventional methods that often overlook these relationships. Finally, the BiLSTM module processes the sequence data in both forward and backward directions, allowing the proposed model to capture long-term dependencies more comprehensively, especially for data with significant spatio–temporal variations. The proposed network is referred to as the Inception–attention–BiLSTM hybrid network.
The novelty of the proposed hybrid network lies in its combination of three powerful components: Inception for multi-scale feature extraction, an attention mechanism for prioritizing critical time steps, and BiLSTM for learning long-term dependencies. Together, they create a robust framework for Vs prediction. To evaluate the applicability and performance of the proposed network, a dataset from the Jurassic Badaowan Formation in the Junggar Basin is processed and analyzed using Inception, BiLSTM, and the proposed network. Comparative experiments show that the proposed network outperforms standalone Inception and BiLSTM, achieving higher prediction accuracy and better generalization, demonstrating its superiority in handling complex geological data.
2. Methods
2.1. Inception
The Inception architecture is characterized by integrating multiple convolution and pooling operations into a cohesive network structure. When designing neural networks, the Inception architecture follows a modular approach, allowing for a sparse network structure that efficiently processes dense data. The key innovation of Inception is the incorporation of 1 × 1 convolution kernels before the 3 × 3 and 5 × 5 convolutions, as well as after max pooling. These 1 × 1 convolutions reduce the dimensionality of feature maps, contributing to the overall efficiency of the Inception network. The structures of the naive version and the dimension-reduction version of the Inception module are shown in Figure 1.
The flow of data through the dimension-reduction version of the Inception module is shown in Figure 2.
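As a concrete illustration, a minimal PyTorch sketch of such a dimension-reduction Inception block, adapted to one-dimensional logging sequences, might look as follows; the branch channel width is a hypothetical choice, not a value from the paper.

```python
import torch
import torch.nn as nn

class Inception1D(nn.Module):
    """1D Inception block with dimension-reducing 1x1 convolutions,
    mirroring the structure in Figure 1 (channel sizes are hypothetical)."""
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        c = out_ch_per_branch
        # Branch 1: 1x1 convolution only
        self.b1 = nn.Conv1d(in_ch, c, kernel_size=1)
        # Branch 2: 1x1 reduction followed by a 3-wide convolution
        self.b2 = nn.Sequential(
            nn.Conv1d(in_ch, c, kernel_size=1), nn.ReLU(),
            nn.Conv1d(c, c, kernel_size=3, padding=1))
        # Branch 3: 1x1 reduction followed by a 5-wide convolution
        self.b3 = nn.Sequential(
            nn.Conv1d(in_ch, c, kernel_size=1), nn.ReLU(),
            nn.Conv1d(c, c, kernel_size=5, padding=2))
        # Branch 4: max pooling followed by a 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_ch, c, kernel_size=1))

    def forward(self, x):  # x: (batch, channels, depth_steps)
        # Concatenate the four branch outputs along the channel axis
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```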
2.2. Attention Mechanism
In geophysical studies, the temporal attention mechanism is chosen to analyze the temporal characteristics of well-logging data. Such data typically consist of measurements at successive depth points, here treated as time steps, reflecting physical properties of the subsurface layers, such as resistivity, gamma-ray, and density, that change with depth. The model needs to extract important information related to formation characteristics and fluid content from these continuous measurement sequences. This approach significantly enhances the accuracy of identifying physical features and deepens the analysis of subsurface properties.
The core idea of the attention mechanism is to assign varying weights to elements at different positions in the input sequence, thereby enabling the model to prioritize important ones. A simplified formulation of the attention mechanism follows.

Suppose the input sequence is

$$H = \{h_1, h_2, \ldots, h_T\},$$

where $T$ is the sequence length. For each time step $t$, the corresponding attention weight $\alpha_t$ is calculated, and a weighted sum of the input sequence then yields the output.

The attention weights are calculated as

$$e_t = \operatorname{score}(c, h_t), \qquad \alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)},$$

where $h_t$ represents the hidden state or output vector at time step $t$, $\operatorname{score}(\cdot)$ is a scoring function that measures the relationship between the context $c$ and the input $h_t$, $e_t$ is the score, and $\alpha_t$ is the normalized attention weight.

The weighted summation is

$$o = \sum_{t=1}^{T} \alpha_t h_t,$$

where $o$ is the attention-weighted sum of the input sequence.
The attention mechanism calculates the correlation between a query and a key, identifies the most relevant values based on this correlation, and then assigns attention weights to those values to generate the final output. The attention mechanism process is shown in Figure 3.
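A minimal sketch of this temporal attention, assuming a learned linear layer as the scoring function (one of several possible choices for score(·)):

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Temporal attention over a sequence of hidden states.
    A learned linear layer plays the role of the scoring function score(.)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)   # maps each h_t to a scalar score e_t

    def forward(self, h):                       # h: (batch, T, hidden_dim)
        e = self.score(h)                       # scores e_t: (batch, T, 1)
        alpha = torch.softmax(e, dim=1)         # normalized weights over the T steps
        o = (alpha * h).sum(dim=1)              # weighted sum o: (batch, hidden_dim)
        return o, alpha.squeeze(-1)             # output and the attention weights
```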
2.3. Bidirectional LSTM
Conventional RNNs have difficulty capturing long-range dependencies because of the gradient vanishing problem, in which gradients shrink during backpropagation through time.
LSTM networks were introduced to address this problem. An LSTM integrates memory cells with input, forget, and output gates to selectively retain or discard information in long sequences. Despite the enhancements LSTMs provide over traditional RNNs, unidirectional LSTMs have difficulty capturing information from both past and future contexts, particularly in tasks where directional information is essential. A reference formulation of the LSTM cell is given below.
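For reference, the standard LSTM cell computations (the well-known formulation, not specific to this paper) are:

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$

where $\sigma$ is the sigmoid function and $\odot$ denotes elementwise multiplication.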
The bidirectional processing in BiLSTM utilizes both forward and backward information, enabling the network to capture contextual information more comprehensively [36]. This bidirectional approach allows the model to capture dependencies and patterns from past and future information, providing a more comprehensive understanding of the input sequence. BiLSTM has been successfully applied to a variety of tasks, including natural language processing, speech recognition, and sequence-to-sequence modeling [37,38,39].
The computations within an LSTM cell, which include updating cell states, calculating hidden states, and determining the output, govern both the forward and backward passes. The bidirectional nature allows the model to consider context in both directions for each time step. A diagram of the BiLSTM network is shown in Figure 4.
The input to a BiLSTM consists of a sequence of data points or tokens, each represented as a vector, so the entire sequence is a series of such vectors. For instance, in natural language processing tasks, individual words within a sentence can be represented as vectors using word embeddings.
In the forward LSTM processing phase, the input sequence is processed sequentially from beginning to end by a forward LSTM layer. At each time step $t$, the LSTM takes the input vector $x_t$ corresponding to the $t$-th token in the sequence and computes an output and a hidden state $\overrightarrow{h}_t$ based on $x_t$, the previous hidden state $\overrightarrow{h}_{t-1}$, and the cell state $\overrightarrow{c}_{t-1}$. The output and the updated hidden state are passed to the next time step.
In the backward LSTM processing phase, the input sequence is processed in reverse order by a backward LSTM layer. At each time step $t$ (from end to beginning), the LSTM takes the input vector $x_t$ and computes an output and a hidden state $\overleftarrow{h}_t$ based on $x_t$, the previous hidden state $\overleftarrow{h}_{t+1}$, and the cell state $\overleftarrow{c}_{t+1}$. The output and the updated hidden state are passed to the previous time step.
When combining outputs, after both passes are complete, the outputs $\overrightarrow{h}_t$ from the forward LSTM and $\overleftarrow{h}_t$ from the backward LSTM are concatenated for each time step $t$. The combined outputs capture information from both past and future contexts of the input sequence.
In the final output stage, the concatenated outputs $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$ are typically used as inputs to subsequent layers of the neural network (e.g., fully connected layers) for tasks like sequence tagging, sentiment analysis, and machine translation. In summary, a BiLSTM processes input data using two LSTM layers operating in opposite directions, enabling the model to capture dependencies and contexts from both past and future information within a sequence [40].
It is worth mentioning that we designed a sliding window mechanism in the data reading stage. The window size can be set flexibly to control how much data before and after the current depth the model considers, ensuring that the model captures temporal variations and trends more comprehensively (a sketch of this windowing is given below).
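As an illustration, the following minimal sketch builds such centered depth windows; the window length of 9 and the six logging curves are hypothetical values, not taken from the paper.

```python
import numpy as np

def sliding_windows(logs, window=9):
    """Builds overlapping depth windows from a log matrix.

    logs:   (n_depths, n_features) array of logging curves
    window: window length (hypothetical default of 9)
    Returns an array of shape (n_depths - window + 1, window, n_features),
    one sequence per prediction depth.
    """
    n, _ = logs.shape
    return np.stack([logs[i:i + window] for i in range(n - window + 1)])

# Example: 6 logging curves sampled at 1000 depths
X = sliding_windows(np.random.rand(1000, 6), window=9)
print(X.shape)  # (992, 9, 6)
```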
2.4. The Structure of the Hybrid Network
The Inception module replaces traditional convolution layers to enhance the feature extraction capability of the proposed model. The Inception module can capture multi-scale feature information by simultaneously using different-sized convolution kernels to process the input data. In geological data analysis, logging datasets contain features at different scales, such as subtle stratigraphic variations and large-scale geological structures; the Inception module helps to extract these multi-scale features and improves the model’s understanding of the geological data. Additionally, the design of the Inception module allows the network to learn features at different scales within the same layer, without passing through multiple separate convolution layers, reducing the parameter count and boosting computational efficiency.
The main objective of the attention mechanism is to intensify the model’s emphasis on significant features within the input data, enabling more effective learning and utilization of key information. In geological data analysis, stratigraphic datasets often contain noise and redundant information, and the attention mechanism helps the model prioritize crucial features, thereby enhancing prediction accuracy and analysis. In addition, the attention mechanism dynamically adjusts feature weights based on different segments of the input, enabling the model to focus on features at different locations and scales depending on contextual information. This adaptive adjustment enhances the flexibility and adaptability of the model, allowing it to better accommodate the changes and complexities within geological datasets.
The BiLSTM module can learn features from the input sequence in both the forward and backward directions, capturing the long-term dependencies within the sequence data more accurately [41]. This matters when analyzing geological data, particularly stratigraphic information with complex spatio–temporal dependencies: the dataset used in this experiment organizes multiple logging measurements in depth order, which also exhibits strong spatial relationships. The BiLSTM module is therefore adept at analyzing the evolution patterns in geological data, enabling accurate prediction of the target variables.
The structure of the proposed network is shown in Figure 5.
The overall network structure comprises a combination of modules, each responsible for different data processing and feature learning tasks, facilitating the prediction and analysis of logging data and ultimately realizing Vs prediction.
The input layer receives the feature vectors from the logging data as input to the model. The Inception module conducts feature extraction through several branches. A dropout layer is incorporated to prevent overfitting by randomly deactivating neurons, enhancing model generalization. The attention mechanism assigns weights to features, emphasizing important ones and thereby enhancing model expressiveness and accuracy. The BiLSTM processes sequential data to capture temporal dependencies and features, combining output sequences from both forward and backward LSTMs for comprehensive temporal feature information. The fully connected layer maps the output features of the BiLSTM to the final predicted output space, followed by an activation function to generate model predictions. The output layer produces the final model predictions.
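Putting these modules together in the order just described, a minimal sketch of the hybrid network might look as follows. It reuses the `Inception1D` block sketched in Section 2.1; the layer sizes are illustrative assumptions, and applying attention as a per-step re-weighting (so the sequence is preserved for the BiLSTM) is one plausible reading of the described module order.

```python
import torch
import torch.nn as nn

class InceptionAttentionBiLSTM(nn.Module):
    """Sketch of the hybrid network following the module order in the text:
    Inception -> dropout -> attention -> BiLSTM -> fully connected layer.
    Layer sizes are illustrative assumptions, not values from the paper."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.inception = Inception1D(n_features, out_ch_per_branch=16)  # 4 x 16 = 64 channels
        self.dropout = nn.Dropout(0.2)
        self.score = nn.Linear(64, 1)        # scores each time step for attention
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)   # maps BiLSTM features to one Vs value

    def forward(self, x):                            # x: (batch, window, n_features)
        z = self.inception(x.transpose(1, 2))        # multi-scale conv over the depth axis
        z = self.dropout(z).transpose(1, 2)          # back to (batch, window, 64)
        alpha = torch.softmax(self.score(z), dim=1)  # attention weights per time step
        z = alpha * z                                # emphasize important steps, keep sequence
        z, _ = self.bilstm(z)                        # (batch, window, 2 * hidden)
        return self.fc(z[:, -1])                     # predict Vs at the window's depth
```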
2.5. Training and Prediction of the Hybrid Network
The data preprocessing phase includes dataset creation and multidimensional normalization. In dataset creation, relevant logging parameters are first selected as inputs, and input sequences are generated using a sliding window. Normalizing the logging data is essential to ensure data consistency and processing efficiency. The multidimensional normalization method scales the data to a specific range and distribution, facilitating improved learning and processing by the model. Assume that an original data feature $x$ takes values in the range $[x_{\min}, x_{\max}]$ and is normalized to the range $[a, b]$, where $a$ and $b$ are the minimum and maximum values of the target normalized range, $x_{\min}$ is the minimum value of the feature in the dataset, $x_{\max}$ is the maximum value of the feature in the dataset, and $x'$ is the result after multidimensional normalization. The normalization formula is

$$x' = a + \frac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}}.$$
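A direct, per-feature translation of this formula (target range $[a, b] = [0, 1]$ assumed by default):

```python
import numpy as np

def minmax_normalize(x, a=0.0, b=1.0):
    """Scales each feature column of x to the target range [a, b]."""
    x_min = x.min(axis=0)   # per-feature minimum
    x_max = x.max(axis=0)   # per-feature maximum
    return a + (x - x_min) * (b - a) / (x_max - x_min)
```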
The proposed hybrid network is trained using the Adam optimizer and the mean squared error (MSE) loss function. Throughout training, the network iteratively optimizes its parameters to learn the relationship between input sequence features and target values, leveraging the specified optimizer and loss function. A validation set is used to evaluate model performance and prevent overfitting during training.
After training, the trained network is deployed to predict outcomes on new data. The test data are prepared through the same preprocessing to generate input sequences, which are then fed into the proposed network for prediction. During prediction, the model computes outputs based on the learned weights and input data features. Post-processing techniques such as inverse normalization can be applied to the prediction results to generate the final predicted values. This structured methodology reinforces the network’s effectiveness and generalization capability throughout the training and prediction phases.
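A hypothetical training loop matching this description (Adam optimizer, MSE loss, validation monitoring), with synthetic tensors standing in for the windowed logging dataset; the batch size, learning rate, and epoch count are assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 800 training windows of 9 depth steps x 6 curves (assumed shapes)
X_train, y_train = torch.randn(800, 9, 6), torch.randn(800, 1)
X_val, y_val = torch.randn(200, 9, 6), torch.randn(200, 1)
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

model = InceptionAttentionBiLSTM(n_features=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate assumed
loss_fn = nn.MSELoss()

for epoch in range(100):                  # epoch count assumed
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)     # MSE between predicted and true Vs
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():                 # validation loss to monitor overfitting
        val_loss = loss_fn(model(X_val), y_val).item()
```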
In addition, the Mean Absolute Error (MAE) is used as an evaluation metric to measure the differences between predicted values and true values. MAE is calculated as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,$$

where $n$ is the number of samples, $\hat{y}_i$ is the value of the $i$-th sample predicted by the network, and $y_i$ is the true value of the $i$-th sample.
The coefficient of determination (R²) is used to measure the fit of the regression model. R² is calculated as

$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad SS_{\mathrm{res}} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad SS_{\mathrm{tot}} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,$$

where $SS_{\mathrm{res}}$ (sum of squared residuals) represents the sum of the squared errors between the model’s predicted values and the true values, $SS_{\mathrm{tot}}$ (total sum of squares) represents the total variation of the dependent variable about its mean, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $n$ is the sample size, and $\bar{y}$ is the mean of the true values.
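Both metrics translate directly into code; a minimal sketch:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error between true and predicted Vs values."""
    return np.mean(np.abs(y_pred - y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```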
The workflow of the Vs prediction is shown in Figure 6.