DRAG: A Novel Method for Automatic Geological Boundary Recognition in Shale Strata Using Multi-Well Log Curves

Zhou, Tianqi; Zhu, Qingzhong; Zhu, Hangyi; Zhao, Qun; Shi, Zhensheng; Zhao, Shengxian; Zhang, Chenglin; Wang, Shanyu

doi:10.3390/pr11102998


¹ Research Institute of Petroleum Exploration and Development, PetroChina, Beijing 100083, China

² Shale Gas Institute of PetroChina Southwest Oil & Gasfield Company, Chengdu 610051, China

* Author to whom correspondence should be addressed.

Processes 2023, 11(10), 2998; https://doi.org/10.3390/pr11102998

Submission received: 31 August 2023 / Revised: 7 October 2023 / Accepted: 11 October 2023 / Published: 17 October 2023


Abstract

Ascertaining the positions of geological boundaries serves as a cornerstone in the characterization of shale reservoirs. Existing methods heavily rely on labor-intensive manual well-to-well correlation, while automated techniques often suffer from limited efficiency and consistency due to their reliance on single well log data. To overcome these limitations, an innovative approach, termed DRAG, is introduced, which uses deep belief forest (DBF), principal component analysis (PCA), and an enhanced generative adversarial network (GAN) for automatic layering recognition in logging curves. The approach employed in this study involves the use of PCA for dimensionality reduction across multiple well log datasets, coupled with a sophisticated GAN to generate representative samples. The DBF algorithm is then applied for stratification, incorporating a confidence screening mechanism to improve computational efficiency. In order to improve both accuracy and stability, a coordinate system is introduced that adjusts for stratification variations among neighboring wells around the target well. Experimental comparisons demonstrate the superior performance of the proposed algorithm in reducing stratification fluctuations and improving precision.

Keywords:

logging curves; stratigraphic boundary delineation; automatic stratification; deep belief forest

1. Introduction

Well log stratification plays an integral role in geological data interpretation, significantly influencing lithology identification, log facies analysis, and reservoir parameter studies [1,2,3,4,5,6,7,8,9,10]. Conventional methodologies, which include both labor-intensive manual techniques and automated approaches, have inherent limitations. The origin of automatic stratification can be traced back to mathematical statistical methods, encompassing intra-layer difference, optimal segmentation, change point analysis, and extreme variance clustering [11,12,13,14,15]. While these pioneering methods were transformative, they generally necessitated substantial computational resources and had difficulty with the acquisition of accurate prior probabilities [16,17,18,19].

The advent of machine learning techniques, exemplified by fuzzy clustering and neural networks, represents a significant evolution [20,21,22,23,24]. These methods focus on stratification using individual well-logging curves, potentially leading to imprecision, especially in instances of ambiguous curve delineation [25]. Moreover, these techniques often have difficulty correlating stratification resulting from proximate wells that share similar stratigraphic structures [26,27].

More recently, the development of deep learning has provided a promising direction for well logging curve prediction. Techniques such as fully convolutional neural networks (FCNNs) and recurrent neural networks (RNNs) have exhibited considerable promise. Zhang et al. [28] utilized the RNN-based improved long short-term memory (LSTM) network and the C-LSTM network with a cascading system to generate logging curves. Zhou et al. [29] accurately predicted the changing trend of logging curves using LSTM networks and gated recurrent unit (GRU) neural networks [28,29,30,31,32]. In recent years, both the GRU algorithm and the backpropagation neural network (BPNN) algorithm have emerged as foundational tools in the domain of neural network-based stratigraphic division [28,29,30,31,32]. The GRU, a variant of the RNN, enhances the capacity to capture dependencies across different time steps, addressing some of the vanishing gradient challenges inherent to traditional RNNs [16,24,26,27,28,29,30,31,32]. Its architecture, characterized by update and reset gates, allows for efficient memory retention over longer sequences [28,29,30,31,32]. Conversely, the BPNN, a foundational artificial neural network, relies on the propagation of errors backward through the network to refine its predictions iteratively [28,29,30,31,32]. Although the BPNN has exhibited versatility across a wide range of tasks, its susceptibility to issues such as local minima and slow convergence in certain scenarios is noteworthy [16,24,26,27,28,29,30,31,32]. Compared to methods based on fully connected neural networks, methods based on recurrent neural networks can improve the prediction of logging curves to a certain extent; however, when abrupt local changes arise in logging curves, their prediction performance still needs improvement [28,29,30,31,32].

Despite these advancements, the majority of automatic stratification techniques rely on the data derived from single wells, frequently overlooking the pivotal data from drilling location coordinates. This singular focus results in stratification that does not consider the potential impact of neighboring wells, thereby posing a significant challenge [30,31,32].

Subtle changes in mineral composition and organic matter content of black shale surrounding the stratigraphic boundary result in small variations in the logging response tied to the shale stratigraphic boundary. Such nuances make the stratigraphic division notably more challenging than in sandstone and carbonate formations [33,34,35,36]. In light of these complexities, this paper focuses on Ordovician–Silurian black shale and the corresponding logging curves on the southern edge of the Sichuan Basin, aiming at automatic stratigraphic division. Based on the above analysis, an innovative method, DRAG, which combines deep belief forest (DBF), principal component analysis (PCA), and an advanced generative adversarial network (GAN), is introduced to achieve automatic layering recognition in logging curves [37,38,39,40]. This method encompasses several techniques; principal component analysis (PCA) is utilized for dimensionality reduction, followed by the application of a generative adversarial network (GAN) for sample generation. Furthermore, DRAG integrates the deep belief forest (DBF), a cascaded deep forest algorithm [40,41,42,43,44,45,46]. An integral feature of the method is the automatic calibration strategy, which uses neighborhood information to adjust single point results [47,48,49].

2. The Proposed Method

This paper proposes a novel approach, termed DRAG, for well log analysis and shale boundary identification, as illustrated in Figure 1. Specifically, the original well log curve data undergo dimensional reduction using the PCA algorithm. Subsequently, a GAN network is employed to generate synthetic samples of the well log curve data, aiming to achieve a balanced quantity of samples across different categories. The deep belief forest algorithm is then utilized for well log curve stratification. Lastly, the calibration threshold function of the layer division result for a single well is constructed, and the layer division result is corrected by considering the stratigraphic division schemes of neighboring wells.

2.1. Data Preprocessing

Well log curve data typically occupy a high-dimensional feature space, with each feature representing a different attribute. However, in well log analysis, not all features are equally informative or relevant for the target task [32,33,34,35]. Consequently, working with such high-dimensional data, which may contain redundant or noisy information, can lead to challenges such as high computational complexity and low analysis performance [38]. To overcome these limitations, dimensionality reduction is applied to the original well log curve data so that the most important and distinctive features are identified and retained [42,43,44].

In this study, a principal component analysis (PCA) algorithm, a widely utilized method, is employed for dimensionality reduction in well log curve analysis. PCA helps eliminate redundant dimensions and identifies the most important and distinctive features, thereby preserving crucial information [38,39,40]. By reducing the dimensionality of well log curve data while preserving vital attributes, PCA improves the robustness of analysis results. The trend in TOC can effectively predict the boundaries of stratigraphic units. Therefore, conducting PCA analysis on well logging data and TOC data enhances the accuracy in predicting the limits of these stratigraphic units. This enables more efficient processing and enhances the ability to extract meaningful insights from the well log data, leading to improved decision-making in shale boundary identification and well log analysis tasks [44].

First, the principal component score of each sample is obtained after PCA dimension reduction of the original data, and it is represented by $F_{ic}$ ($i = 1, 2, \dots, n$; $c = 1, 2, \dots, m$), where $n$ is the number of samples and $m$ is the number of principal components [24].

Then, each principal component is normalized. The process can be formulated as follows [38]:

$$F_{ic}^{*} = \frac{F_{ic} - F_{c(\min)}}{F_{c(\max)} - F_{c(\min)}}, \quad i = 1, 2, \dots, n; \ c = 1, 2, \dots, m \tag{1}$$

where $F_{ic}^{*}$ is the normalized score of the $c$-th principal component of sample $i$, $F_{ic}$ is the score of the $c$-th principal component of sample $i$, $F_{c(\max)}$ is the maximum score of the $c$-th principal component, and $F_{c(\min)}$ is the minimum score of the $c$-th principal component. Then, the proportion of each sample within each principal component is calculated.

$$P_{ic} = \frac{F_{ic}^{*}}{\sum_{i = 1}^{n} F_{ic}^{*}}, \quad i = 1, 2, \dots, n; \ c = 1, 2, \dots, m \tag{2}$$

where $P_{ic}$ is the proportion of the $c$-th principal component in sample $i$.

Then, the information entropy of each principal component is calculated. The formula is as follows [38]:

$$E_{c} = - \frac{1}{\ln n} \sum_{i = 1}^{n} P_{ic} \cdot \ln P_{ic}, \quad c = 1, 2, \dots, m \tag{3}$$

where $E_{c}$ is the information entropy of the $c$-th principal component. Next, the weight of each principal component is calculated. The formula is as follows [38]:

$$w_{c} = \frac{1 - E_{c}}{\sum_{f = 1}^{m} (1 - E_{f})}, \quad c, f = 1, 2, \dots, m \tag{4}$$

where $w_{c}$ is the weight of the $c$-th principal component, and $E_{f}$ is the information entropy of the $f$-th principal component. Then, the comprehensive score of the principal components based on information entropy is calculated as follows:

$$F'_{i} = \sum_{c = 1}^{m} w_{c} \cdot P_{ic}, \quad i = 1, 2, \dots, n \tag{5}$$

where $F'_{i}$ (written $F'$ when the sample index is omitted) is the comprehensive score of the principal components of sample $i$ based on information entropy. According to the comprehensive score of principal components based on information entropy, the first 30% of principal components are selected as the data after dimensionality reduction.
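The weighting steps in Equations (1)–(5) can be traced with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `entropy_weighted_scores` and the toy score matrix are hypothetical, each component is assumed to be non-constant (so the denominator in Equation (1) is non-zero), and the term $P_{ic} \ln P_{ic}$ is taken as 0 when $P_{ic} = 0$.

```python
import math

def entropy_weighted_scores(scores):
    """scores: list of n rows (samples), each with m principal component scores."""
    n, m = len(scores), len(scores[0])
    proportions, entropies = [], []
    for column in zip(*scores):  # iterate over components c = 1..m
        lo, hi = min(column), max(column)
        # Equation (1): min-max normalisation (assumes hi > lo).
        normalised = [(v - lo) / (hi - lo) for v in column]
        # Equation (2): proportion of each sample within the component.
        total = sum(normalised)
        p = [v / total for v in normalised]
        # Equation (3): information entropy, with 0 * ln(0) treated as 0.
        e = -sum(x * math.log(x) for x in p if x > 0) / math.log(n)
        proportions.append(p)
        entropies.append(e)
    # Equation (4): entropy-based weight of each component.
    redundancy = [1 - e for e in entropies]
    weights = [r / sum(redundancy) for r in redundancy]
    # Equation (5): comprehensive score of each sample.
    composite = [sum(weights[c] * proportions[c][i] for c in range(m))
                 for i in range(n)]
    return weights, composite

# Hypothetical scores for 4 samples and 2 components.
demo_weights, demo_composite = entropy_weighted_scores(
    [[0.0, 1.0], [1.0, 1.5], [2.0, 4.0], [3.0, 2.0]]
)
print(demo_weights, demo_composite)
```

By construction the weights sum to 1, and so do the composite scores, because each column of $P_{ic}$ sums to 1.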

This paper takes Well A as a case study and reduces the dataset for predicting TOC using PCA. The specific parameters related to the PCA are presented in Tables S1–S7. The weights of the principal components ($w_c$) and the composite scores ($F'$) for logging data at each depth allow for the identification of the principal components that explain the data with the highest granularity. The magnitude of these weights determines which principal components are pivotal in describing the data’s variability. Using Well A as a reference, Figure 2a reveals that the weights for the principal components GR, AC, RT, and DEN are comparatively high, indicating their pronounced predictive capacity for the TOC parameter. The variability in the composite scores ($F'$) for each sample reflects the inherent importance or influence of each sample within the entirety of the logging data (Figure 2b). This information facilitates the recognition of samples that stand out across all principal components.

2.2. Sample Balance Treatment

The generative adversarial network (GAN) is a powerful unsupervised learning mechanism that was introduced in 2014 by Goodfellow [42,43,44,45,46]. Goodfellow and his colleagues first applied GAN to image generation [48,49,50,51,52]. Since then, GAN has attracted significant attention in the deep learning community for its capability to model high-dimensional data distributions and produce realistic samples [48,49,50,51,52].

In the context of well log data analysis, GAN provides valuable contributions by addressing the challenges associated with complex and high-dimensional data [32]. Well log curve data contain intricate patterns and subtle variations that are crucial for accurate shale boundary identification [44]. Utilizing GAN allows for harnessing its robust computational power to model the underlying distribution of the well log data and generate synthetic samples that capture the essential characteristics of the actual data.

GAN’s key characteristic is its composition of two competing networks: a generator (G) and a discriminator (D) [48,49,50,51,52]. The generator captures the latent distribution of the actual sample data and creates new data samples, while the discriminator distinguishes between real input data and generator-produced samples [48,49,50,51,52]. Training alternates between competing phases of G and D and concludes once a balance is reached. At this point, the generator produces data that are realistic enough to elude detection by the discriminator [48,49,50,51,52]. The GAN process does not necessarily require prior knowledge of boundaries identified by experts: GANs learn from the input data in an unsupervised manner, without relying on pre-defined boundaries or labels, and generate new samples that resemble the training data.
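For reference, the competition between G and D described above is commonly written as the minimax objective of the original GAN formulation. The symbols $V$, $p_{\text{data}}$, $p_{z}$, $x$, and $z$ do not appear in this paper and are introduced here only for illustration; the improved GAN used in this study defines its own objective in Equation (7).

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right] + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]$$

Here $D(x)$ is the probability the discriminator assigns to $x$ being a real sample, and $G(z)$ is the sample generated from the random input $z$. The balance mentioned above corresponds to the point at which the discriminator can no longer separate the two kinds of samples.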

In this paper, an improved GAN algorithm is used to generate samples. The network structure of the generator is mainly composed of five deconvolution layers, in which the input is a 100-dimensional log vector, and the output is a matrix with both length and width of 64 and 3 channels.

The discriminator network structure is essentially the mirror image of the generator. It estimates the probability that the input feature matrix is a true sample and is composed of five convolution layers and one reshaping layer.

The input feature matrix is a 3-channel matrix with a length and width of 64. After five convolutional layers, a feature map with a length and width of 1 and a single channel is output. After reshaping, the output is a scalar in the range of 0–1; the closer the value is to 1, the higher the probability that the input feature matrix is true. As with the generator, batch normalization is added after each convolution layer.

The convolution kernel size of CONV1, CONV2, CONV3, CONV4, and CONV5 is 4 × 4, and the stride is 2. Figure 3 shows the framework of a GAN.
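The stated layer sizes can be checked with the usual convolution output-size formula, output = floor((input + 2 × padding − kernel) / stride) + 1. The paper gives the kernel size (4 × 4) and the stride (2) but not the padding, so the padding values in the sketch below are an assumption chosen so that a 64 × 64 input shrinks to 1 × 1 after five layers; the function names are hypothetical.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after one convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def discriminator_sizes(input_size=64, paddings=(1, 1, 1, 1, 0)):
    """Feature-map size after each of the five convolution layers.

    The paddings are an assumption: the source states only the 4 x 4
    kernel and the stride of 2.
    """
    sizes = [input_size]
    for padding in paddings:
        sizes.append(conv_output_size(sizes[-1], padding=padding))
    return sizes

print(discriminator_sizes())  # [64, 32, 16, 8, 4, 1]
```

With a padding of 1 on every layer the final size would be 2 rather than 1, which is why the last layer is assumed to use no padding.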

For logging curves, amplitude and frequency reflect the main differences between different curves. To capture these differences, we used the difference function $C(f_{t}, f_{u})$, defined as:

$$C(f_{t}, f_{u}) = \frac{f_{t} \times f_{u}}{f_{t} + f_{u}} \tag{6}$$

where $f_{t}$ is the amplitude and $f_{u}$ is the frequency of an individual sample. Both quantities span the entire data distribution of the dataset and are not confined to specific intervals. The function combines the amplitude ($f_{t}$) and the frequency ($f_{u}$) by dividing the product of the two values by their sum, which equals half of their harmonic mean. The term $C$ in Equation (6) combines the amplitude and frequency of the generated and real samples into a single quantity, which is used to capture the dissimilarity between them. Normalizing by the sum of the two values ensures that the result stays within a meaningful range. This formulation emphasizes both amplitude and frequency information in determining the dissimilarity between curves.

In the improved GAN algorithm, the similarity between the generated log and the real log is defined as the objective function of the network, and the formula is:

$$R_{loss} = | R_{real} - R_{generate} | \tag{7}$$

where $R_{real}$ represents the actual log curve and $R_{generate}$ represents the generated log curve.
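Equations (6) and (7) can be evaluated with a few lines of code. The paper does not state how $R_{real}$ and $R_{generate}$ are derived from a curve, so the sketch below makes the assumption that each is the value of $C(f_{t}, f_{u})$ for the corresponding curve; the function names and the numbers are hypothetical.

```python
def difference_function(f_t, f_u):
    """Equation (6): product of amplitude and frequency divided by their sum.

    Assumes f_t + f_u is non-zero.
    """
    return (f_t * f_u) / (f_t + f_u)

def r_loss(r_real, r_generate):
    """Equation (7): absolute difference between the real and generated values."""
    return abs(r_real - r_generate)

# Hypothetical amplitude/frequency pairs for a real and a generated curve.
# Assumption: R is taken to be C(f_t, f_u); the source does not specify this.
r_real = difference_function(3.0, 6.0)      # 18 / 9 = 2.0
r_generate = difference_function(2.0, 2.0)  # 4 / 4 = 1.0
print(r_loss(r_real, r_generate))           # 1.0
```

A loss of 0 would mean the generated curve matches the real one in this combined amplitude and frequency measure.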

2.3. Layering Recognition

In order to effectively extract geological information and identify different geological layers or reservoirs from well log curve data, a robust algorithm is required [53,54,55,56]. In this study, the deep belief forest (DBF) algorithm, which combines the concepts from deep belief networks (DBNs) and random forests, is employed for well log curve stratification, as illustrated in Figure 4. The DBF algorithm offers several advantages over traditional random forest algorithms [53,54,55,56]. It utilizes a cascade forest structure, consisting of multiple layers of decision trees, to extract higher-level geological features and enhance the model’s expressive capability [53,54,55,56]. Additionally, the DBF algorithm incorporates DBNs to extract shale-related features, further improving its performance [53,54,55,56]. Moreover, the hierarchical structure of random forests provides interpretability to the classified well log curve data, enabling users to better understand the obtained stratification results [53,54,55,56].

However, all the data in the deep forest must pass through each level of the cascade forest, so the time cost increases linearly with the number of cascade forest layers [57,58,59,60,61]. Moreover, each original sample generates hundreds of new samples after multi-grained scanning, greatly increasing the size of the training set and the computing cost [62,63,64]. To reduce the time and memory overhead caused by all samples passing through each layer of the cascaded forest, this paper proposes a confidence screening mechanism in the cascaded forest structure. Each layer of the cascaded forest automatically determines its own confidence threshold, which improves the computational efficiency of the deep forest model while maintaining performance [62,63,64].

Figure 5 shows a deep confidence level forest [62,63,64]. The confidence screening mechanism aims to divide the instances of each level of the cascade into two subsets, those that are easy to predict and those that are more difficult to predict [62,63,64]. Specifically, if an instance is classified as belonging to the easily predictable subset, it will be directly outputted and used as the final result. Conversely, if an instance is determined to be difficult to predict, it needs to be passed to the next level of the cascaded forest [62,63,64].

At layer $t$ of the cascade forest, the prediction confidence threshold $n_{t}$ is determined according to the cross-validation error rate $\epsilon_{t}$ of layer $t$. The hyperparameter $\alpha < 1$ sets the target: the training samples with high confidence need to achieve a cross-validation error rate of $\alpha \epsilon_{t}$. The training samples at the same level are sorted in descending order of prediction confidence, where $c_{i}$ represents the prediction confidence of sample $x_{i}$ among the $m$ samples. The confidence threshold is set as follows:

$$n_{t} = \min \{ c_{k} \mid L(x_{1}, \dots, x_{k}) < \alpha \epsilon_{t}, \ k \in [1, m] \} \tag{8}$$

where

$$L(x_{1}, \dots, x_{k}) = \frac{1}{k} \sum_{i = 1}^{k} 1[g_{t}(x_{i}) \neq y_{i}]$$

Here, $1[g_{t}(x_{i}) \neq y_{i}]$ is an indicator function: it takes the value 1 if the prediction $g_{t}(x_{i})$ at layer $t$ differs from the label $y_{i}$ and 0 otherwise. $L$ is therefore the cross-validation error rate of the $k$ samples with the highest prediction confidence. Considering only the samples with the highest confidence ensures that the selected training samples have a low cross-validation error rate, indicating more reliable predictions. In Equation (8), $m$ is the sample size, $\alpha \epsilon_{t}$ is the target cross-validation error rate, and $c_{k}$ denotes the prediction confidence of the $k$th sample.

When the output class vectors of the last layer of the forest are obtained, the class vectors of all forests in that layer are averaged, and the category corresponding to the maximum value is taken as the prediction category of the model, namely the classification category $Y_{t}$ of the stratum in this paper.
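The threshold rule in Equation (8) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the toy data are hypothetical, and the threshold is read as the smallest confidence $c_{k}$ for which the $k$ most confident samples still have an error rate below $\alpha \epsilon_{t}$.

```python
def confidence_threshold(confidences, is_correct, eps_t, alpha):
    """Smallest confidence c_k whose top-k error rate is below alpha * eps_t.

    Returns None when no prefix of the sorted samples meets the target.
    """
    # Sort sample indices by prediction confidence, highest first.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    threshold, errors = None, 0
    for k, idx in enumerate(order, start=1):
        if not is_correct[idx]:
            errors += 1
        if errors / k < alpha * eps_t:   # L(x_1..x_k) < alpha * eps_t
            threshold = confidences[idx]  # later hits have lower confidence
    return threshold

def screen(confidences, threshold):
    """Split sample indices into 'easy' (output now) and 'hard' (next layer)."""
    easy = [i for i, c in enumerate(confidences)
            if threshold is not None and c >= threshold]
    hard = [i for i in range(len(confidences)) if i not in easy]
    return easy, hard

# Hypothetical layer output: five samples, one misclassified (eps_t = 0.2).
conf = [0.99, 0.95, 0.90, 0.60, 0.55]
correct = [True, True, True, False, True]
n_t = confidence_threshold(conf, correct, eps_t=0.2, alpha=0.5)
print(n_t, screen(conf, n_t))  # 0.9 ([0, 1, 2], [3, 4])
```

In this toy case the three most confident samples are output directly, and the remaining two are passed to the next layer of the cascade.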

3. Result Calibration

The automatic calibration of single-point results using neighborhood information can significantly reduce the fluctuation of formation division between different logs in different regions. As a result, the individual detection results within the same region share a more similar formation division, which is more consistent with the actual formation division. In addition, since some formation boundaries are fuzzy in single-well logs, using the formation division results of neighboring wells can make the fuzzy boundaries of a single well clearer and the performance more stable [50,51].

In this paper, the calibration threshold function of the single point result is constructed, and the point information is corrected by considering the neighborhood stratification result. The correction function is constructed as follows:

$$Y_{fina} = \mathrm{rand} \left( \frac{5 Y_{t} + Y_{1} + Y_{2} + Y_{3} + Y_{4} + Y_{5} + Y_{6} + Y_{7} + Y_{8}}{13} \right) \tag{9}$$

where $Y_{fina}$ is the final classification result, $\mathrm{rand}()$ is the integral (integer-valued) function, and $Y_{t}$ is the classification result obtained directly from the stratigraphic division predicted by the DRAG method before considering any neighborhood stratification influences. In the DRAG method, the PCA algorithm reduces the dimensions of the well log curve data, the GAN network generates synthetic samples for a balanced distribution, and the deep belief forest algorithm performs the well log stratification (for details, see Section 2). Essentially, $Y_{t}$ is the primary output classification of a specific depth point in the well based on its intrinsic logging properties. This classification acts as the foundation, which is then refined by the correction function using neighborhood stratification results. $Y_{t}$ is computed through the deep belief forest-based analysis: the well log characteristics at each depth point in layer 4 are input into the model, which then provides a classification based on the learned patterns and relationships in the data.

$Y_{1}, Y_{2}, Y_{3}, Y_{4}, Y_{5}, Y_{6}, Y_{7}, Y_{8}$ are the stratigraphic classification results of the surrounding wells. Moreover, random numbers are utilized in the sample generation stage. Within GANs, the generator model (G) initiates its training with a random input, often referred to as a noise vector. This noise vector, after being processed by the generator, is transformed into an output resembling the distribution of the real data. The inherent randomness ensures that the generator produces varied results each time, enhancing the diversity of generated samples. Had we consistently employed the same input, the generator might repetitively produce identical or highly similar images. This randomness is pivotal for the generator’s capability to explore and learn diverse data distributions.
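A worked instance of Equation (9) may help. The sketch below assumes that rand() denotes rounding the weighted average to the nearest integer, which is how the description of it as the integral function is read here, and that the strata are coded as integer labels; the function name, the label coding, and the half-up rounding rule are all assumptions.

```python
import math

def calibrate_label(y_t, neighbour_labels):
    """Equation (9): weight the target well's label by 5 and each of the
    eight neighbouring wells' labels by 1, then round to an integer label.

    Assumption: rand() is read as rounding to the nearest integer (half up).
    """
    if len(neighbour_labels) != 8:
        raise ValueError("Equation (9) uses exactly eight neighbouring wells")
    weighted_mean = (5 * y_t + sum(neighbour_labels)) / 13
    return math.floor(weighted_mean + 0.5)

# Hypothetical labels: the target well predicts layer 3 at this depth point.
print(calibrate_label(3, [2] * 8))                   # 31 / 13 = 2.38 -> 2
print(calibrate_label(3, [3, 3, 3, 3, 2, 2, 3, 3]))  # 37 / 13 = 2.85 -> 3
```

The weight of 5 means the target well keeps its own label unless the neighbouring wells disagree with it strongly, as in the first call.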

4. Result and Discussion

4.1. Targeted Stratigraphic Formations and Sub-Divisions

The objective of this study was to conduct well log analysis and identify shale boundaries in the southern Sichuan area [40,46]. This region poses particular complexities in terms of stratigraphic division due to the subtle variations across different areas and the subtle changes in well log responses in shale formations. The primary focus of this study is on the Longmaxi Formation, specifically the sub-section of Long1, which shows substantial exploration potential [43].

The Longmaxi Formation can be broadly divided into two sections, Long1 and Long2. The Long1 section is further divided into two sub-sections, Long1-1 and Long1-2. The Long1-1 sub-section, which is the target layer, is sub-divided further into four layers: L11, L12, L13, and L14 [1,14,25]. Moreover, the Wufeng Formation is also a crucial target for shale gas exploration. It is necessary to automatically identify L11, L12, L13, L14, and W1 (Wufeng Formation) by using deep belief forest analysis [30,31,32].

4.2. Experimental Data

For this study, well log data from 168 shale gas wells drilled in the southern Sichuan Basin were utilized. These data were specifically extracted from the Wufeng–Longmaxi Formation (Long1-1 sub-section) [33]. Through PCA, the four logging curves with the highest principal component weights were selected for stratigraphic unit division (see Section 2.1 for details). The well log curves employed for training include gamma ray (GR), acoustic transit time (AC), bulk density (DEN), and deep resistivity (Rt). Each of these sets of well log data comes with the corresponding layer labels, which allow for precise identification of the stratigraphic layers [35]. In implementing the deep belief forest-based analysis, the goal is to overcome the challenges of shale formation division in the southern Sichuan Basin [42,43,44]. Furthermore, the ability to accurately identify the boundaries within the sub-sections of Long1-1 could facilitate more effective and efficient exploration of shale gas resources in this region [35].

4.3. Experimental Results and Analysis

Figure 6a displays the stratigraphic curve annotated manually by experts. Figure 6b presents the stratigraphic curve results yielded by the method proposed in this study. Figure 6c depicts the results of well log curve stratification carried out using the gated recurrent unit algorithm, while Figure 6d shows the outcomes of well log curve stratification performed via the backpropagation neural network (BPNN) algorithm. The comparative analysis of these results emphasizes that the method introduced in this study possesses superior classification accuracy and performance.

Compared with the GRU and BPNN algorithms, the automatically identified results based on the deep belief forest are the most similar to the results of manual stratigraphic division, with an error of ±1 m. Measured against the manual results, the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) of the deep belief forest-based method are the smallest. The results of automatic stratigraphic division based on the GRU algorithm generally differ from those of manual stratigraphic division by less than 5 m. The results based on the BPNN algorithm differ the most from the manual stratigraphic division (Table 1 and Table 2): the errors are generally concentrated within 10 m, which is relatively large.
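The three metrics named above follow their standard definitions, which can be sketched as below. The boundary depths are hypothetical and are not taken from Table 1 or Table 2; the function name is also an assumption.

```python
import math

def boundary_metrics(manual, predicted):
    """Return (MAE, RMSE, R^2) of predicted boundary depths against manual ones."""
    errors = [p - m for m, p in zip(manual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_manual = sum(manual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((m - mean_manual) ** 2 for m in manual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical boundary depths in metres.
manual_depths = [100.0, 110.0, 120.0, 130.0]
predicted_depths = [101.0, 109.0, 121.0, 130.0]
mae, rmse, r2 = boundary_metrics(manual_depths, predicted_depths)
print(mae, rmse, r2)
```

For these toy depths the MAE is 0.75 m, the RMSE is about 0.87 m, and R² is 0.994.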

Both GRU and BPNN have been widely regarded as effective tools for sequence modeling and stratigraphic division tasks, respectively. The GRU’s architecture, with its capacity to maintain memory over extended sequences, has demonstrated its efficacy in capturing intricate dependencies within data [28,29,30,31,32]. On the other hand, the BPNN, being a seminal artificial neural network approach, has demonstrated adaptability and precision across myriad applications, even though it can encounter challenges such as local minima or slow convergence in certain contexts [16,24,26,27,28,29,30,31,32]. Given their prominence and applicability in related domains, juxtaposing the performance of GRU and BPNN against DRAG provides a comprehensive and rigorous assessment [28,29,30,31,32]. This comparative analysis aims to discern the inherent strengths and potential limitations of our proposed method, while anchoring it to established benchmarks in the field [28,29,30,31,32]. Compared to the GRU and BPNN algorithms, the hierarchical results based on the deep belief forest are the most similar to the results of manual stratigraphic division, with a deviation from manual interpretations within ±1 m. In comparison to the results of manual stratigraphic division, the deep belief forest method yields the lowest values for MAE, RMSE and R². The results of automatic stratigraphic division based on the GRU algorithm generally differ by less than 5 m from manual interpretations. The automatic stratigraphic identification using the BPNN algorithm exhibits the largest discrepancy compared to manual interpretations, with the deviations typically concentrated within 10 m, indicating significant errors in the BPNN-based stratigraphic division results. The proposed method in this study introduces a confidence-based filtering mechanism within the cascade forest structure, partitioning instances into subsets of easily predictable and difficult-to-predict instances. As a result, it effectively reduces time and memory overhead, enhances classification accuracy, and exhibits advantages in terms of resource efficiency, hierarchical processing, and high-dimensional data handling (Table 1 and Table 2).

The proposed method, DRAG, demonstrates high accuracy, attributed not only to appropriate dimensionality reduction and sample generation operations, but also to the incorporation of a confidence-based filtering mechanism in the deep belief forest algorithm. Confidence filtering involves dividing instances at each cascade layer into two subsets, one comprising easily predictable instances and the other comprising difficult-to-predict instances. If an instance is deemed easy to predict, it is directly output as the final result; only when an instance is difficult to predict does it get passed to the next layer. This hierarchical approach greatly enhances the accuracy of classification. Additionally, utilizing neighborhood information for single-point result calibration further improves the classification performance.

As illustrated in Figure 6, the well log curve predictions for Well A have been adjusted using the stratigraphic layering outcomes from Wells B, C, and D (Table 3). Figure 7a exhibits the well log curve derived from the automatic division of stratigraphic units using a single well, whereas Figure 7b presents the well log curve for Well A corrected by employing stratigraphic data from Well B. Figure 7c depicts the well log curve for Well A modified with stratigraphic information from both Wells B and C. Figure 7d represents the well log curve for Well A refined using stratigraphic insights derived from Wells B, C, and D. When the manual stratigraphic division from Well B is used to correct the automatic stratigraphic division of Well A, the deviation between the automatic and manual interpretations is controlled within 5 m. When both Well B and Well C are used, the automatic interpretations of Well A align more closely with the manual interpretations, and the boundary errors between the two stratigraphic divisions are reduced to within 4 m. When the manual stratigraphic divisions from Wells B, C, and D, which are shale gas wells located within 100 km of Well A, are used simultaneously, the automatic interpretations become more accurate still: the deviation between the automatic and manual stratigraphic division results is reduced to within 1 m. The utilization of multi-well analysis and shale characterization for well-logging curve analysis thus provides auxiliary support for the classification results of the target well.

Evaluation of these well log curves shows that the adjusted results progressively approach the actual labelled well log curves as stratigraphic corrections from neighboring wells are applied to Well A. This progressive alignment underscores the efficacy of the correction method adopted in this study.
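The boundary deviations discussed here can be checked directly against the Well A rows of Table 3. The sketch below is an illustration only: it assumes that "deviation" means the absolute depth difference between corresponding automatic and manual boundaries (the top of each layer plus the base of W1), and the variable and function names are hypothetical.

```python
# Well A boundary depths (m) copied from Table 3:
# tops of L14, L13, L12, L11, W1, followed by the base of W1.
manual = [1273.471, 1302.5, 1311.652, 1318.332, 1320.368, 1323.0]

corrected = {
    "1 well": [1281.022, 1303.228, 1313.349, 1316.012, 1321.495, 1323.728],
    "2 wells": [1274.81, 1301.915, 1314.614, 1322.201, 1323.094, 1325.261],
    "3 wells": [1274.058, 1302.922, 1311.677, 1318.997, 1320.597, 1322.555],
}


def max_abs_deviation(auto, reference):
    """Largest absolute depth difference between paired boundaries (m)."""
    return max(abs(a - r) for a, r in zip(auto, reference))


# Worst-case boundary deviation for each correction scenario.
max_dev = {name: max_abs_deviation(depths, manual)
           for name, depths in corrected.items()}

for name, value in max_dev.items():
    print(f"{name}: max deviation = {value:.3f} m")
```

Printing `max_dev` gives one worst-case figure per scenario, which can be compared with the thresholds quoted in the text; a per-boundary listing would follow the same pattern with the `max` removed.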

Figure 7f–h show the manual stratification results for Wells B, C, and D, respectively. These results further substantiate that well log curve rectification is a necessary step, one that significantly improves the accuracy of stratification.

The proposed method offers clear advantages in handling high-dimensional well log data and in stratigraphic analysis. Central to the approach is PCA, which is used for dimensionality reduction. PCA removes redundant dimensions while preserving the critical attributes of the data: by exploiting strong correlations among the many logged variables, it retains the essential characteristics of the data even when the overall data size is substantially reduced.
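The quantities listed in the Nomenclature ($F_{ic}^{*}$, $P_{ic}$, $E_{c}$, $w_{c}$, $F'$) suggest that the principal components are combined through information-entropy weighting. The following sketch shows one common form of that calculation. It is an assumption, not the authors' code: it takes the principal-component scores $F_{ic}$ as given, uses the standard entropy-weight formulas, and forms $F'$ as the weighted sum of the raw scores. The function name `entropy_weighted_scores` is hypothetical.

```python
import math
from typing import List, Tuple


def entropy_weighted_scores(F: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (weights w_c, composite scores F') from a score matrix F[i][c]."""
    n, k = len(F), len(F[0])
    if n < 2:
        raise ValueError("need at least two samples")
    one_minus_entropy = []
    for c in range(k):
        column = [row[c] for row in F]
        lo, hi = min(column), max(column)
        if hi == lo:
            raise ValueError(f"component {c} is constant")
        # Normalized score F*_ic in [0, 1] (min-max scaling, assumed).
        f_star = [(v - lo) / (hi - lo) for v in column]
        total = sum(f_star)
        # Proportion P_ic of each sample within component c.
        p = [v / total for v in f_star]
        # Information entropy E_c, with 0 * ln(0) taken as 0.
        e_c = -sum(v * math.log(v) for v in p if v > 0) / math.log(n)
        one_minus_entropy.append(1.0 - e_c)
    denom = sum(one_minus_entropy)
    if denom == 0:
        raise ValueError("all components carry identical entropy of 1")
    # Weight w_c: components with lower entropy receive larger weights.
    weights = [d / denom for d in one_minus_entropy]
    # Composite score F' per sample (weighted sum of raw scores, assumed).
    scores = [sum(w * row[c] for c, w in enumerate(weights)) for row in F]
    return weights, scores


# Toy score matrix: 3 depth samples x 2 principal components.
F = [[0.0, 1.0], [1.0, 1.9], [2.0, 2.0]]
weights, scores = entropy_weighted_scores(F)
```

The weights always sum to one, so $F'$ stays on the same scale as the individual component scores.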

The experimental findings support the precision and efficacy of the proposed method. Its robustness and predictive accuracy are reflected in the way successive corrections bring the results progressively closer to the actual labelled well log curves. Compared with established algorithms such as the GRU and the BPNN, the proposed method achieves superior classification performance.

5. Conclusions

This research introduces DRAG, a method for well log analysis and automated stratigraphic layer identification within the Wufeng–Longmaxi shale of the Southern Sichuan Basin. The PCA algorithm first reduces the dimensions of the original well log curve data. A GAN then generates synthetic samples, ensuring a balanced distribution across categories. The deep belief forest algorithm performs the well log curve stratification, which is further refined through a calibration threshold function and by incorporating stratigraphic schemes from neighboring wells. Notably, the deviation between automated and manual stratigraphic divisions is reduced to within 1 m, a precision surpassing that of methods such as GRU, BPNN, and random forest.

Three pivotal facets underscore DRAG’s superiority in stratification precision: (1) the confidence-based filtering mechanism within the deep belief forest algorithm; (2) the integration of PCA, a critical tool for dimensionality reduction; and (3) the importance of well-to-well correlation rectification.

In the deep belief forest algorithm, the confidence-based filtering mechanism classifies instances at each cascade layer into two categories, easily predictable and challenging to discern. While easily predictable instances are immediately finalized, the more complex ones proceed to subsequent layers, enhancing classification accuracy. This accuracy is further refined by integrating neighborhood data to calibrate individual point results.

The DRAG approach is especially adept at managing high-dimensional well log data, chiefly owing to its integration of PCA, which efficiently trims redundant dimensions while preserving essential data attributes. By identifying potent correlations among numerous variables, the method ensures the retention of critical data features, even with significant data downsizing. Emphasizing well log curve rectification, the technique benefits from considering spatial information from surrounding wells, culminating in enhanced stratification accuracy.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pr11102998/s1, Table S1. Stratigraphic division data and original well-logging data of a well; Table S2. Process parameter F_ic calculated from 10 types of well-logging data of Well A during PCA computation; Table S3. Process parameter F^*_ic calculated from 10 types of well-logging data of Well A during PCA computation; Table S4. Process parameter P_ic calculated from 10 types of well-logging data of Well A during PCA computation; Table S5. Process parameter E_c calculated from 10 types of well-logging data of Well A during PCA computation; Table S6. Process parameter W_c calculated from 10 types of well-logging data of Well A during PCA computation; Table S7. Process parameter F’ calculated from 10 types of well-logging data of Well A during PCA computation.

Author Contributions

Conceptualization, T.Z., Q.Z. (Qingzhong Zhu) and S.Z.; Methodology, T.Z., Q.Z. (Qingzhong Zhu) and S.Z.; Software, Q.Z. (Qun Zhao), S.Z. and C.Z.; Validation, Z.S.; Formal analysis, Q.Z. (Qingzhong Zhu), H.Z., Q.Z. (Qun Zhao), C.Z. and S.W.; Investigation, Q.Z. (Qingzhong Zhu); Resources, Q.Z. (Qun Zhao) and C.Z.; Data curation, H.Z.; Writing—original draft, T.Z.; Writing—review & editing, T.Z., Q.Z. (Qun Zhao), Z.S. and S.W.; Visualization, T.Z., Q.Z. (Qingzhong Zhu), H.Z., Z.S., C.Z. and S.W.; Supervision, H.Z.; Project administration, Z.S. and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

DRAG	a novel deep belief forest-based automatic layering recognition method for logging curves
DBF	Deep belief forest
PCA	Principal component analysis
GAN	Generative adversarial network
FCNN	Fully convolutional neural network
RNN	Recurrent neural network
LSTM	Long short-term memory
C-LSTM	Convolutional long short-term memory
GRU	Gated recurrent unit neural networks
BPNN	Backpropagation neural network
$F_{ic}^{*}$	The normalized score of the c-th principal component of sample i
$F_{ic}$	The score of the c-th principal component of sample i
$F_{c(\max)}$	The maximum score of the c-th principal component
$F_{c(\min)}$	The minimum score of the c-th principal component
$P_{ic}$	The proportion of the c-th principal component in sample i
$E_{c}$	The information entropy of the c-th principal component
$w_{c}$	The weight of the c-th principal component
$E_{f}$	The weight of the f-th principal component
$F'$	The comprehensive score of principal components based on information entropy
$f_{t}$	The amplitude
$f_{u}$	The frequency
$R_{real}$	The actual log curve
$R_{generate}$	The generated log curve
$n_{t}$	Predicted confidence threshold of layer t
$\epsilon_{t}$	The cross-validation error rate of layer t
$\alpha$	Hyperparameter for indicating the cross-validation error rate
$c_{i}$	The prediction confidence of sample $x_{i}$
$Y_{t}$	The classification category of the stratum
$Y_{fina}$	The final classification result
$Y_{n}$	The stratigraphic classification result of the surrounding well
GR	Gamma ray
AC	Acoustic transit time
DEN	Bulk density
Rt	Deep resistivity
MAE	Mean absolute error
RMSE	Root mean square error
R²	Coefficient of determination

Figure 1. A framework illustration of DRAG. After the application of the PCA algorithm, the well log data undergo a dimensionality reduction, leading to the extracted principal components. These components, representing the main features of the original well log data, are then used in two distinct pathways. The first pathway feeds into the generative adversarial network (GAN), where it provides the basis for generating synthetic samples. The second pathway channels the reduced data directly into the deep belief forest for further stratigraphic analysis. Both routes are crucial for the comprehensive stratigraphic division process, with the GAN ensuring a balanced data distribution and the deep belief forest refining stratigraphic categorization. This dual-path strategy optimizes the use of reduced data, ensuring a more precise and reliable stratigraphic division.

Figure 2. (a) Weights ($w_{c}$) of principal components during PCA of Well A and (b) composite scores ($F'$) for each depth interval of Well A. PC1: Gamma ray (GR), PC2: Acoustic transit time (AC), PC3: Deep resistivity (Rt), PC4: Potassium–thorium–hydrogen (KTH), PC5: Compensated neutron log (CNL), PC6: Bulk density (DEN), PC7: Resistivity of flushed zone (RXO), PC8: Caliper log (CAL), PC9: Thorium–hydrogen (TH), PC10: Uranium (URAN).

Figure 3. The framework of a GAN for the deep belief forest-based automatic layering recognition method.

Figure 4. Schematic diagram of the deep belief forest used in this study.

Figure 5. Deep confidence level forest used for this study.

Figure 6. Comparison of automatic stratigraphic unit division results based on manual division, DRAG, GRU, and BPNN methods.

Figure 7. Comparison of corrections applied to stratigraphic division results from well logs. (a) Well A geological boundary identified by experts, (b) Well A geological boundary identified by Well B correlation after deep belief forest analysis, (c) Well A geological boundary identified by Well B and Well C correlations after deep belief forest analysis, (d) Well A geological boundary identified by Well B, Well C, and Well D correlations after deep belief forest analysis, (e) well locations of Well A, Well B, Well C, and Well D, (f) geological boundary identification of Well B, (g) geological boundary identification of Well C, (h) geological boundary identification of Well D.

Table 1. Shale boundary depths identified manually and automatically by different methods.

Well	Layer	Manual Top Depth (m)	Manual Bottom Depth (m)	Proposed Method Top Depth (m)	Proposed Method Bottom Depth (m)	GRU Top Depth (m)	GRU Bottom Depth (m)	BPNN Top Depth (m)	BPNN Bottom Depth (m)	Random Forest Top Depth (m)	Random Forest Bottom Depth (m)
Well A	L14	1273.471	1302.5	1274.058	1302.922	1273.956	1303.367	1279.607	1303.051	1274.081	1306.388
Well A	L13	1302.5	1311.652	1302.922	1311.677	1303.367	1312.749	1303.051	1305.885	1306.388	1313.845
Well A	L12	1311.652	1318.332	1311.677	1318.997	1312.749	1321.884	1305.885	1320.249	1313.845	1322.706
Well A	L11	1318.332	1320.368	1318.997	1320.597	1321.884	1322.138	1320.249	1322.236	1322.706	1326.11
Well A	W1	1320.368	1323	1320.597	1322.555	1322.138	1328.728	1322.236	1325.984	1326.11	1327.87
Well B	L14	2290.708	2321.879	2291.004	2322.636	2290.476	2322.676	2293.19	2325.156	2299.947	2316.072
Well B	L13	2321.879	2339.537	2322.636	2339.493	2322.676	2341.61	2325.156	2336.799	2316.072	2345.462
Well B	L12	2339.537	2345.506	2339.493	2345.201	2341.61	2343.233	2336.799	2347.478	2345.462	2346.248
Well B	L11	2345.506	2348.477	2345.201	2348.577	2343.233	2350.572	2347.478	2344.549	2346.248	2343.577
Well B	W1	2348.477	2357	2348.577	2356.591	2350.572	2354.883	2344.549	2361.742	2343.577	2357.78
Well C	L14	2906.318	2940.176	2906.81	2940.542	2908.149	2941.566	2908.784	2944.714	2907.024	2936.386
Well C	L13	2940.176	2954.237	2940.542	2953.414	2941.566	2952.814	2944.714	2951.976	2936.386	2957.748
Well C	L12	2954.237	2958.511	2953.414	2958.871	2952.814	2958.486	2951.976	2959.955	2957.748	2960.499
Well C	L11	2958.511	2959.96	2958.871	2959.993	2958.486	2960.405	2959.955	2964.13	2960.499	2962.396
Well C	W1	2959.96	2962.808	2959.993	2961.869	2960.405	2964.599	2964.13	2967.738	2962.396	2965.122
Well D	L14	2038.45	2073.933	2038.762	2072.952	2037.977	2069.651	2029.618	2071.743	2028.379	2074.321
Well D	L13	2073.933	2092.152	2072.952	2092.688	2069.651	2090.59	2071.743	2094.587	2074.321	2095.42
Well D	L12	2092.152	2101.53	2092.688	2100.649	2090.59	2100.941	2094.587	2102.906	2095.42	2102.095
Well D	L11	2101.53	2107.016	2100.649	2106.53	2100.941	2103.031	2102.906	2110.992	2102.095	2111.828
Well D	W1	2107.016	2111	2106.53	2111.796	2103.031	2113.375	2110.992	2109.787	2111.828	2108.992

Table 2. Comparison between the proposed method and the baseline algorithms.

Evaluation Index	Proposed Method	GRU	BPNN	Random Forest
MAE	6.221	8.876	10.241	10.221
RMSE	8.944	11.345	14.214	14.341
R²	0.932	0.911	0.834	0.831

MAE: mean absolute error, RMSE: root mean square error, R²: coefficient of determination.
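For reference, the three evaluation indices can be computed as in the sketch below. It assumes the standard textbook definitions of MAE, RMSE, and R²; the section does not state which predicted and reference quantities were compared, so the input values here are purely illustrative and do not reproduce Table 2. The function name `regression_metrics` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, R^2) using the standard definitions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Illustrative values only (not taken from the paper).
mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Lower MAE and RMSE and a higher R² indicate closer agreement, which is the ordering by which the methods in Table 2 are ranked.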

Table 3. Automatic stratigraphic division results of Well A modified by manual stratigraphic division results of the adjacent shale gas wells.

Well	Layer	Manual Top Depth (m)	Manual Bottom Depth (m)	1-Well Correlation Top Depth (m)	1-Well Correlation Bottom Depth (m)	2-Well Correlation Top Depth (m)	2-Well Correlation Bottom Depth (m)	3-Well Correlation Top Depth (m)	3-Well Correlation Bottom Depth (m)
Well A	L14	1273.471	1302.5	1281.022	1303.228	1274.81	1301.915	1274.058	1302.922
Well A	L13	1302.5	1311.652	1303.228	1313.349	1301.915	1314.614	1302.922	1311.677
Well A	L12	1311.652	1318.332	1313.349	1316.012	1314.614	1322.201	1311.677	1318.997
Well A	L11	1318.332	1320.368	1316.012	1321.495	1322.201	1323.094	1318.997	1320.597
Well A	W1	1320.368	1323	1321.495	1323.728	1323.094	1325.261	1320.597	1322.555
Well B	L14	2290.708	2321.879	2293.171	2320.868	2292.322	2322.993	2291.004	2322.636
Well B	L13	2321.879	2339.537	2320.868	2342.894	2322.993	2342.893	2322.636	2339.493
Well B	L12	2339.537	2345.506	2342.894	2348.412	2342.893	2345.719	2339.493	2345.201
Well B	L11	2345.506	2348.477	2348.412	2353.043	2345.719	2346.372	2345.201	2348.577
Well B	W1	2348.477	2357	2353.043	2355.871	2346.372	2357.055	2348.577	2356.591
Well C	L14	2906.318	2940.176	2898.726	2940.402	2905.641	2940.223	2906.81	2940.542
Well C	L13	2940.176	2954.237	2940.402	2952.837	2940.223	2956.697	2940.542	2953.414
Well C	L12	2954.237	2958.511	2952.837	2962.61	2956.697	2957.353	2953.414	2958.871
Well C	L11	2958.511	2959.96	2962.61	2957.152	2957.353	2963.454	2958.871	2959.993
Well C	W1	2959.96	2962.808	2957.152	2965.154	2963.454	2965.094	2959.993	2961.869
Well D	L14	2038.45	2073.933	2038.838	2073.495	2039.604	2071.56	2038.762	2072.952
Well D	L13	2073.933	2092.152	2073.495	2094.361	2071.56	2094.322	2072.952	2092.688
Well D	L12	2092.152	2101.53	2094.361	2098.649	2094.322	2102.831	2092.688	2100.649
Well D	L11	2101.53	2107.016	2098.649	2108.308	2102.831	2103.168	2100.649	2106.53
Well D	W1	2107.016	2111	2108.308	2109.049	2103.168	2111.777	2106.53	2111.796

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
