1. Introduction
Well logging is a geological exploration technique in which specialized logging tools, lowered into the well during or after drilling, measure the physical parameters of the formation near the wellbore. The data are transmitted in real time to the surface via cable or radio. The raw data are then processed and analyzed, converted into useful geological information, and interpreted by experts. This provides crucial data support for subsurface geological studies, oil and gas exploration, and resource evaluation. Common well-logging data include caliper, density, gamma-ray, neutron porosity, and P-wave measurements, among others. These parameters help in understanding the geological characteristics of the area, rock types, fluid content, and the hydrocarbon potential of subsurface formations. Shear wave velocity (Vs) is a crucial parameter for reservoir petrophysical parameter estimation, pre-stack inversion, fluid-type identification, unconventional oil and gas development, and CO2 injection [1,2,3,4,5]. However, the accurate acquisition of Vs is hindered by the high cost of dipole acoustic logging measurements and the limitations imposed by borehole conditions, leading to a lack of Vs data in many regions. Therefore, Vs prediction is essential. Various prediction methods have been proposed for estimating Vs, and they can be divided into three categories: (1) empirical correlation formulas, (2) theoretical rock physics models, and (3) artificial intelligence methods.
Empirical correlation formulas were established using the relationship between existing logging data and S-wave velocity in a specific study area, incorporating various mathematical equations [6,7,8,9]. These formulas are simple and can be applied quickly. However, they are only suitable for relatively simple geological environments and are highly dependent on the characteristics of the study area [10]. This limits the application of empirical correlation formulas for predicting Vs under complex geological conditions.
To address the limitations of empirical correlation formulas, theoretical rock physics models have been introduced to establish the relationship between elastic parameters and reservoir parameters for Vs prediction. Sun et al. utilized the DEM–Gassmann model to achieve more accurate Vs prediction than an empirical correlation formula [11]. Wang et al. introduced a method that inverts the coordination number of saturated rocks from the P-wave velocity by integrating the unified granular media model with Gassmann’s equation [12]. Zhang et al. designed a statistical model that links logging curves with Vs, using Bayesian inversion to calculate key petrophysical parameters of the Xu–White model [13].
To obtain accurate predictions, it is crucial to precisely determine various rock physics parameters such as mineral components, pore characteristics, and fluid distribution. Additionally, owing to noise interference, there is some uncertainty in the predictions of theoretical rock physics models when processing field data, and the prediction process using these models is complex and inefficient [14].
Artificial intelligence, especially deep learning, has developed rapidly and found widespread application in recent years. Convolutional Neural Networks (CNNs) are widely used across various fields; they can automatically and adaptively learn spatial features, establishing nonlinear relationships between inputs and outputs. Deep learning includes various CNN variants: AlexNet, which introduced the ReLU activation function and dropout to significantly improve training speed and prevent overfitting; VGGNet, which uses smaller convolutional kernels (3 × 3) and a deep network structure to enhance feature extraction; ResNet, which introduces residual connections to address the gradient vanishing problem in deep networks; and Inception, which performs multi-scale feature extraction through parallel convolution operations (1 × 1, 3 × 3, 5 × 5) while introducing bottleneck layers to reduce the number of parameters. Wang et al. summarized the application scope of CNNs of different dimensions: 1D-CNNs are suitable for time series and text processing, 2D-CNNs are used for computer vision, and 3D-CNNs are applied to CT image and video analysis [15]. Because of the inherent relationship between Vs and well-logging data, various artificial intelligence methods have been developed to predict Vs from well-logging data. For example, both compressional and shear waves are influenced by the same rock characteristics, and their velocities exhibit a positive correlation [16]. Zhang et al. analyzed the effect of different logging combinations on Vs prediction using a 1D-CNN model [17]. Kheirollahi et al. proposed a new algorithm to estimate Vs in a carbonate reservoir using various predictive models, with a deep artificial neural network demonstrating the highest accuracy [18].
However, previous models were point-to-point prediction models based on logging records at individual depths and did not take into account the time-series characteristics of logging data. The variation of Vs is often closely related to the sedimentary characteristics of the strata. Therefore, when predicting Vs, it is important to consider not only the logging data at the same depth but also the changes in characteristics along the depth direction. Recurrent Neural Networks (RNNs) can handle sequential data by maintaining a hidden state that captures temporal dependencies [19]. However, RNNs face challenges in learning long-term dependencies due to gradient vanishing and exploding issues. Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional Long Short-Term Memory (BiLSTM) networks are RNN variants that enhance the ability of models to capture and learn features across a broader range by incorporating modules such as gate units. LSTM networks employ a more complex structure with three types of gates (input, output, and forget gates) to retain or discard information over longer ranges. GRU is a simplification of LSTM that uses only two gates: reset and update gates. With fewer gates and parameters, GRUs are more efficient than LSTMs; however, for certain complex tasks, this simplicity may make them less flexible. BiLSTM enhances the traditional LSTM by processing sequences in both forward and backward directions and can outperform LSTM in sequence-based tasks. Therefore, these variant models can be applied to identify and capture the complex relationship between Vs and well-logging data [20]. Wang et al. proposed a novel hybrid Vs prediction network based on LSTM and optimized by a Particle Swarm Optimization (PSO) algorithm [21]. Zhang et al. utilized an LSTM model using six kinds of well-logging data to predict Vs [22]. You et al. proposed an LSTM network to predict the complete Vs profile of wells using limited Vs logging data [23]. Wang et al. proposed a hybrid model combining CNN with LSTM to realize intelligent inversion of Vs [24]. Using logging and core data, Liu et al. introduced an improved approach to time/depth series prediction, establishing the relationship between elastic properties and reservoir characteristics [25]. Yang et al. proposed a TCN–SA–BiLSTM model to study the internal correlation of features in logging datasets [26]. Feng et al. introduced a theoretical rock physics model into deep learning algorithms to enhance physical interpretability [27].
Well logs are sequential data generated along the depth of wells, with different types of logs representing distinct petrophysical characteristics of rocks. Therefore, the vertical dimension of logging data can be treated as analogous to the temporal dimension of the subsurface, while the various types of logs correspond to spatial characteristics. By considering both temporal and spatial characteristics simultaneously, more comprehensive and detailed information can be obtained for predicting Vs than with conventional methods. Parisa et al. presented an HC–BiLSTM model that predicts Japan’s future earthquake magnitudes by analyzing spatial and temporal patterns across 49 zones [28]. Shan et al. proposed a model integrating CNN, BiLSTM, and an attention mechanism to accurately predict missing well-log data by leveraging spatio–temporal correlations in highly heterogeneous reservoirs [29]. Guo et al. developed the MC–GAN–BiLSTM model to address missing logging data in geophysical logging [30]. Ma et al. developed a CNN–BiLSTM–attention hybrid neural network model for predicting horizontal in situ stresses in complex formations [31]. Chen et al. developed a hybrid network comprising a 2D-CNN and GRU to establish more complex nonlinear relationships between inputs and outputs [32]. Sun et al. utilized BiLSTM with an attention mechanism to predict porosity in the Tarim Oilfield by handling the nonlinear relationship between logging parameters and porosity [33].
To focus more on key features in the logging data, we incorporate the attention mechanism into the model, allowing it to better capture the serial relationships among logging parameters. The attention mechanism is a core concept in machine learning that enables models to focus selectively on specific parts of the inputs, helping them capture relevant information and enhance performance across various tasks [34].
The attention mechanism enhances prediction by selectively focusing on critical logging features, thus capturing temporal dependencies and reducing interference. This selective emphasis helps the model maintain key relationships over time, which improves prediction accuracy and model interpretability, allowing an understanding of the significant time intervals or features influencing outcomes. Additionally, it mitigates long-range dependency issues, making it well suited to complex sequences such as logging data, while its adaptability enables effective generalization across different tasks, ensuring robust performance in predicting Vs [35].
Traditional methods are highly reliant on the quality of logging data, and prediction results can be directly influenced by noise and missing data. There are also challenges in selecting appropriate logging data for fitting, and the nonlinear relationships between different types of logs increase the uncertainty of the prediction results. To address these issues, a hybrid network combining Inception, an attention mechanism, and BiLSTM is proposed in this article. The Inception module, recognized for its ability to capture multi-scale features through parallel convolutional layers with varying kernel sizes, is a more efficient feature extractor than conventional CNN layers. This allows the proposed model to capture diverse geological patterns, improving its ability to handle complex data. Meanwhile, the attention mechanism module is integrated to highlight essential information dynamically. By selectively emphasizing important time steps, this module enhances the proposed network’s sensitivity to crucial temporal features, addressing the limitations of conventional methods that often overlook these relationships. Finally, the BiLSTM module processes the sequence data in both forward and backward directions, allowing the proposed model to capture long-term dependencies more comprehensively, especially for data with significant spatio–temporal variations. The proposed network is referred to as the Inception–attention–BiLSTM hybrid network.
The novelty of the proposed hybrid network lies in its combination of three powerful components: Inception for multi-scale feature extraction, an attention mechanism for prioritizing critical time steps, and BiLSTM for learning long-term dependencies. Together, they create a robust framework for Vs prediction. To evaluate the applicability and performance of the proposed network, a dataset from the Jurassic Badaowan Formation in the Junggar Basin is processed and analyzed using Inception, BiLSTM, and the proposed network. Comparative experiments show that the proposed network outperforms standalone Inception and BiLSTM, achieving higher prediction accuracy and better generalization, demonstrating its superiority in handling complex geological data.
2. Methods
2.1. Inception
The Inception architecture is characterized by integrating multiple convolution and pooling operations into a cohesive network structure. When designing neural networks, the Inception architecture follows a modular approach, allowing for a sparse network structure that efficiently processes dense data. The key innovation of Inception is the incorporation of 1 × 1 convolution kernels before the 3 × 3 and 5 × 5 convolutions, as well as after max pooling. These 1 × 1 convolutions reduce the dimensionality of feature maps, contributing to the overall efficiency of the Inception network. The structures of the naive version and the dimension-reduction version of the Inception module are shown in Figure 1.
The flow of data through the dimension-reduction version of the Inception module is shown in Figure 2.
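As a concrete illustration, a minimal PyTorch sketch of such a dimension-reduction Inception block, adapted to one-dimensional logging sequences, might look as follows; the branch channel width is a hypothetical choice, not a value from the paper.

```python
import torch
import torch.nn as nn

class Inception1D(nn.Module):
    """1D Inception block with dimension-reducing 1x1 convolutions,
    mirroring the structure in Figure 1 (channel sizes are hypothetical)."""
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        c = out_ch_per_branch
        # Branch 1: 1x1 convolution only
        self.b1 = nn.Conv1d(in_ch, c, kernel_size=1)
        # Branch 2: 1x1 reduction followed by a 3-wide convolution
        self.b2 = nn.Sequential(
            nn.Conv1d(in_ch, c, kernel_size=1), nn.ReLU(),
            nn.Conv1d(c, c, kernel_size=3, padding=1))
        # Branch 3: 1x1 reduction followed by a 5-wide convolution
        self.b3 = nn.Sequential(
            nn.Conv1d(in_ch, c, kernel_size=1), nn.ReLU(),
            nn.Conv1d(c, c, kernel_size=5, padding=2))
        # Branch 4: max pooling followed by a 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_ch, c, kernel_size=1))

    def forward(self, x):  # x: (batch, channels, depth_steps)
        # Concatenate the four branch outputs along the channel axis
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```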
2.2. Attention Mechanism
In geophysical studies, the temporal attention mechanism is chosen to analyze the temporal characteristics of well-logging data. Such data typically consist of measurements at successive depth points, here treated as time steps, reflecting physical properties of the subsurface layers, such as resistivity, gamma-ray, and density, that change with depth. The model needs to extract important information related to formation characteristics and fluid content from these continuous measurement sequences. This approach significantly enhances the accuracy of identifying physical features and deepens the analysis of subsurface properties.
The core idea of the attention mechanism is to assign varying weights to elements at different positions in the input sequence, thereby enabling the model to prioritize important ones. A simplified formulation of the attention mechanism follows.

Suppose the input sequence is

$$H = \{h_1, h_2, \ldots, h_T\},$$

where $T$ is the sequence length. For each time step $t$, the corresponding attention weight $\alpha_t$ is calculated, and a weighted sum of the input sequence then yields the output.

The attention weights are calculated as

$$e_t = \operatorname{score}(c, h_t), \qquad \alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)},$$

where $h_t$ represents the hidden state or output vector at time step $t$, $\operatorname{score}(\cdot)$ is a scoring function that measures the relationship between the context $c$ and the input $h_t$, $e_t$ is the score, and $\alpha_t$ is the normalized attention weight.

The weighted summation is

$$o = \sum_{t=1}^{T} \alpha_t h_t,$$

where $o$ is the attention-weighted sum of the input sequence.
The attention mechanism calculates the correlation between a query and a key, identifies the most relevant values based on this correlation, and then assigns attention weights to those values to generate the final output. The attention mechanism process is shown in Figure 3.
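A minimal sketch of this temporal attention, assuming a learned linear layer as the scoring function (one of several possible choices for score(·)):

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Temporal attention over a sequence of hidden states.
    A learned linear layer plays the role of the scoring function score(.)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)   # maps each h_t to a scalar score e_t

    def forward(self, h):                       # h: (batch, T, hidden_dim)
        e = self.score(h)                       # scores e_t: (batch, T, 1)
        alpha = torch.softmax(e, dim=1)         # normalized weights over the T steps
        o = (alpha * h).sum(dim=1)              # weighted sum o: (batch, hidden_dim)
        return o, alpha.squeeze(-1)             # output and the attention weights
```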
2.3. Bidirectional LSTM
Conventional RNNs have difficulty capturing long-range dependencies because of the gradient vanishing problem, in which gradients shrink during backpropagation through time.
LSTM networks were introduced to address this problem. An LSTM integrates memory cells with input, forget, and output gates to selectively retain or discard information in long sequences. Despite the enhancements LSTMs provide over traditional RNNs, unidirectional LSTMs have difficulty capturing information from both past and future contexts, particularly in tasks where directional information is essential. A reference formulation of the LSTM cell is given below.
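For reference, the standard LSTM cell computations (the well-known formulation, not specific to this paper) are:

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$

where $\sigma$ is the sigmoid function and $\odot$ denotes elementwise multiplication.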
The bidirectional processing in BiLSTM utilizes both forward and backward information, enabling the network to capture contextual information more comprehensively [36]. This bidirectional approach allows the model to capture dependencies and patterns from past and future information, providing a more comprehensive understanding of the input sequence. BiLSTM has been successfully applied to a variety of tasks, including natural language processing, speech recognition, and sequence-to-sequence modeling [37,38,39].
The computations within an LSTM cell, which include updating cell states, calculating hidden states, and determining the output, govern both the forward and backward passes. The bidirectional nature allows the model to consider context in both directions for each time step. A diagram of the BiLSTM network is shown in Figure 4.
The input to a BiLSTM consists of a sequence of data points or tokens, each represented as a vector, so the entire sequence is a series of such vectors. For instance, in natural language processing tasks, individual words within a sentence can be represented as vectors using word embeddings.
In the forward LSTM processing phase, the input sequence is processed sequentially from beginning to end by a forward LSTM layer. At each time step $t$, the LSTM takes the input vector $x_t$ corresponding to the $t$-th token in the sequence and computes an output and a hidden state $\overrightarrow{h}_t$ based on $x_t$, the previous hidden state $\overrightarrow{h}_{t-1}$, and the cell state $\overrightarrow{c}_{t-1}$. The output and the updated hidden state are passed to the next time step.
In the backward LSTM processing phase, the input sequence is processed in reverse order by a backward LSTM layer. At each time step $t$ (from end to beginning), the LSTM takes the input vector $x_t$ and computes an output and a hidden state $\overleftarrow{h}_t$ based on $x_t$, the previous hidden state $\overleftarrow{h}_{t+1}$, and the cell state $\overleftarrow{c}_{t+1}$. The output and the updated hidden state are passed to the previous time step.
When combining outputs, after both passes are complete, the outputs $\overrightarrow{h}_t$ from the forward LSTM and $\overleftarrow{h}_t$ from the backward LSTM are concatenated for each time step $t$. The combined outputs capture information from both past and future contexts of the input sequence.
In the final output stage, the concatenated outputs $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$ are typically used as inputs to subsequent layers of the neural network (e.g., fully connected layers) for tasks like sequence tagging, sentiment analysis, and machine translation. In summary, a BiLSTM processes input data using two LSTM layers operating in opposite directions, enabling the model to capture dependencies and contexts from both past and future information within a sequence [40].
It is worth mentioning that we designed a sliding window mechanism in the data reading stage. The window size can be set flexibly to control how much data before and after the current depth the model considers, ensuring that the model captures temporal variations and trends more comprehensively (a sketch of this windowing is given below).
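As an illustration, the following minimal sketch builds such centered depth windows; the window length of 9 and the six logging curves are hypothetical values, not taken from the paper.

```python
import numpy as np

def sliding_windows(logs, window=9):
    """Builds overlapping depth windows from a log matrix.

    logs:   (n_depths, n_features) array of logging curves
    window: window length (hypothetical default of 9)
    Returns an array of shape (n_depths - window + 1, window, n_features),
    one sequence per prediction depth.
    """
    n, _ = logs.shape
    return np.stack([logs[i:i + window] for i in range(n - window + 1)])

# Example: 6 logging curves sampled at 1000 depths
X = sliding_windows(np.random.rand(1000, 6), window=9)
print(X.shape)  # (992, 9, 6)
```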
2.4. The Structure of the Hybrid Network
The Inception module replaces traditional convolution layers to enhance the feature extraction capability of the proposed model. The Inception module can capture multi-scale feature information by simultaneously using different-sized convolution kernels to process the input data. In geological data analysis, logging datasets contain features at different scales, such as subtle stratigraphic variations and large-scale geological structures; the Inception module helps to extract these multi-scale features and improves the model’s understanding of the geological data. Additionally, the design of the Inception module allows the network to learn features at different scales within the same layer, without passing through multiple separate convolution layers, reducing the parameter count and boosting computational efficiency.
The main objective of the attention mechanism is to intensify the model’s emphasis on significant features within the input data, enabling more effective learning and utilization of key information. In geological data analysis, stratigraphic datasets often contain noise and redundant information, and the attention mechanism helps the model prioritize crucial features, thereby enhancing prediction accuracy and analysis. In addition, the attention mechanism dynamically adjusts feature weights based on different segments of the input, enabling the model to focus on features at different locations and scales depending on contextual information. This adaptive adjustment enhances the flexibility and adaptability of the model, allowing it to better accommodate the changes and complexities within geological datasets.
The BiLSTM module can learn features from the input sequence in both the forward and backward directions, capturing the long-term dependencies within the sequence data more accurately [41]. This matters when analyzing geological data, particularly stratigraphic information with complex spatio–temporal dependencies: the dataset used in this experiment organizes multiple logging measurements in depth order, which also exhibits strong spatial relationships. The BiLSTM module is therefore adept at analyzing the evolution patterns in geological data, enabling accurate prediction of the target variables.
The structure of the proposed network is shown in Figure 5.
The overall network structure comprises a combination of modules, each responsible for different data processing and feature learning tasks, facilitating the prediction and analysis of logging data and ultimately realizing Vs prediction.
The input layer receives the feature vectors from the logging data as input to the model. The Inception module conducts feature extraction through several branches. A dropout layer is incorporated to prevent overfitting by randomly deactivating neurons, enhancing model generalization. The attention mechanism assigns weights to features, emphasizing important ones and thereby enhancing model expressiveness and accuracy. The BiLSTM processes sequential data to capture temporal dependencies and features, combining output sequences from both forward and backward LSTMs for comprehensive temporal feature information. The fully connected layer maps the output features of the BiLSTM to the final predicted output space, followed by an activation function to generate model predictions. The output layer produces the final model predictions.
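Putting these modules together in the order just described, a minimal sketch of the hybrid network might look as follows. It reuses the `Inception1D` block sketched in Section 2.1; the layer sizes are illustrative assumptions, and applying attention as a per-step re-weighting (so the sequence is preserved for the BiLSTM) is one plausible reading of the described module order.

```python
import torch
import torch.nn as nn

class InceptionAttentionBiLSTM(nn.Module):
    """Sketch of the hybrid network following the module order in the text:
    Inception -> dropout -> attention -> BiLSTM -> fully connected layer.
    Layer sizes are illustrative assumptions, not values from the paper."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.inception = Inception1D(n_features, out_ch_per_branch=16)  # 4 x 16 = 64 channels
        self.dropout = nn.Dropout(0.2)
        self.score = nn.Linear(64, 1)        # scores each time step for attention
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)   # maps BiLSTM features to one Vs value

    def forward(self, x):                            # x: (batch, window, n_features)
        z = self.inception(x.transpose(1, 2))        # multi-scale conv over the depth axis
        z = self.dropout(z).transpose(1, 2)          # back to (batch, window, 64)
        alpha = torch.softmax(self.score(z), dim=1)  # attention weights per time step
        z = alpha * z                                # emphasize important steps, keep sequence
        z, _ = self.bilstm(z)                        # (batch, window, 2 * hidden)
        return self.fc(z[:, -1])                     # predict Vs at the window's depth
```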
2.5. Training and Prediction of the Hybrid Network
The data preprocessing phase includes dataset creation and multidimensional normalization. In dataset creation, relevant logging parameters are first selected as inputs, and input sequences are generated using a sliding window. Normalizing the logging data is essential to ensure data consistency and processing efficiency. The multidimensional normalization method scales the data to a specific range and distribution, facilitating improved learning and processing by the model. Assume that an original data feature $x$ takes values in the range $[x_{\min}, x_{\max}]$ and is normalized to the range $[a, b]$, where $a$ and $b$ are the minimum and maximum values of the target normalized range, $x_{\min}$ is the minimum value of the feature in the dataset, $x_{\max}$ is the maximum value of the feature in the dataset, and $x'$ is the result after multidimensional normalization. The normalization formula is

$$x' = a + \frac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}}.$$
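A direct, per-feature translation of this formula (target range $[a, b] = [0, 1]$ assumed by default):

```python
import numpy as np

def minmax_normalize(x, a=0.0, b=1.0):
    """Scales each feature column of x to the target range [a, b]."""
    x_min = x.min(axis=0)   # per-feature minimum
    x_max = x.max(axis=0)   # per-feature maximum
    return a + (x - x_min) * (b - a) / (x_max - x_min)
```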
The proposed hybrid network is trained using the Adam optimizer and the mean squared error (MSE) loss function. Throughout training, the network iteratively optimizes its parameters to learn the relationship between input sequence features and target values, leveraging the specified optimizer and loss function. A validation set is used to evaluate model performance and prevent overfitting during training.
After training, the trained network is deployed to predict outcomes on new data. The test data are prepared through the same preprocessing to generate input sequences, which are then fed into the proposed network for prediction. During prediction, the model computes outputs based on the learned weights and input data features. Post-processing techniques such as inverse normalization can be applied to the prediction results to generate the final predicted values. This structured methodology reinforces the network’s effectiveness and generalization capability throughout the training and prediction phases.
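A hypothetical training loop matching this description (Adam optimizer, MSE loss, validation monitoring), with synthetic tensors standing in for the windowed logging dataset; the batch size, learning rate, and epoch count are assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 800 training windows of 9 depth steps x 6 curves (assumed shapes)
X_train, y_train = torch.randn(800, 9, 6), torch.randn(800, 1)
X_val, y_val = torch.randn(200, 9, 6), torch.randn(200, 1)
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

model = InceptionAttentionBiLSTM(n_features=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate assumed
loss_fn = nn.MSELoss()

for epoch in range(100):                  # epoch count assumed
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)     # MSE between predicted and true Vs
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():                 # validation loss to monitor overfitting
        val_loss = loss_fn(model(X_val), y_val).item()
```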
In addition, the Mean Absolute Error (MAE) is used as an evaluation metric to measure the differences between predicted values and true values. MAE is calculated as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,$$

where $n$ is the number of samples, $\hat{y}_i$ is the value of the $i$-th sample predicted by the network, and $y_i$ is the true value of the $i$-th sample.
The coefficient of determination (R²) is used to measure the fit of the regression model. R² is calculated as

$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad SS_{\mathrm{res}} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad SS_{\mathrm{tot}} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,$$

where $SS_{\mathrm{res}}$ (sum of squared residuals) represents the sum of the squared errors between the model’s predicted values and the true values, $SS_{\mathrm{tot}}$ (total sum of squares) represents the total variation of the dependent variable about its mean, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $n$ is the sample size, and $\bar{y}$ is the mean of the true values.
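Both metrics translate directly into code; a minimal sketch:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error between true and predicted Vs values."""
    return np.mean(np.abs(y_pred - y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```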
The workflow of the Vs prediction is shown in Figure 6.