Article

Innovative Deep Learning Approaches for High-Precision Segmentation and Characterization of Sandstone Pore Structures in Reservoirs

1 College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
2 National Key Laboratory of Continental Shale Oil, Northeast Petroleum University, Daqing 163318, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7178; https://doi.org/10.3390/app14167178
Submission received: 4 July 2024 / Revised: 12 August 2024 / Accepted: 13 August 2024 / Published: 15 August 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The detailed characterization of the pore structure in sandstone is pivotal for the assessment of reservoir properties and the efficiency of oil and gas exploration. Traditional fully supervised learning algorithms are limited in performance enhancement and require a substantial amount of accurately annotated data, which can be challenging to obtain. To address this, we introduce a semi-supervised framework with a U-Net backbone network. Our dataset was curated from 295 two-dimensional CT grayscale images, selected at intervals from nine 4 mm sandstone core samples. To augment the dataset, we employed StyleGAN2-ADA to generate a large number of images with a style akin to real sandstone images. This approach allowed us to generate pseudo-labels through semi-supervised learning, with only a small subset of the data being annotated. The accuracy of these pseudo-labels was validated using ensemble learning methods. The experimental results demonstrated a pixel accuracy of 0.9993, with a pore volume discrepancy of just 0.0035 compared to the actual annotated data. Furthermore, by reconstructing the three-dimensional pore structure of the sandstone, we have shown that the synthetic three-dimensional pores can effectively approximate the throat length distribution of the real sandstone pores and exhibit high precision in simulating throat shapes.

1. Introduction

Digital core technology effectively overcomes the limitations of traditional rock experimental methods, including difficult sample acquisition, high equipment costs, and long experimental periods, by simulating the physical properties of rocks. This technology significantly reduces costs and improves experimental efficiency and the consistency of results. Compared to traditional methods such as mercury intrusion and NMR [1], digital core technology offers significant advantages in evaluating microscopic pore structures. It uses digital images to reflect the internal structure of the core, precisely measuring key physical parameters such as porosity and permeability [2,3]. It also facilitates the intuitive assessment of reservoir characteristics and connectivity and allows for an in-depth study of fluid flow mechanisms within rock pores [4,5,6]. Additionally, when combined with numerical simulation, this technology can create accurate reservoir models, predict pressure and production changes during extraction, and provide robust data support for reservoir management and optimized oil extraction strategies [7]. As a core component of the digital transformation of the petroleum industry, digital core technology is rapidly gaining widespread attention. In this field, image processing technology plays a crucial role. The integration of digital core technology with image processing has significantly enhanced our understanding of rock microstructures and markedly improved the accuracy of fluid flow simulations [8]. Using deep learning-based generative adversarial networks (GANs) such as StyleGAN, it is possible to create realistic rock images [9] that simulate various geological conditions, extending the application scope of digital core technology and improving the accuracy of identifying rocks' physical properties [10,11]. Image segmentation technology is pivotal in this process [12], accurately identifying microscopic structures such as pores and fractures and thus providing a solid foundation for fluid flow simulation. The combination of these technologies enhances the value of digital core technology in geological exploration and resource development. This paper explores the integration of digital core technology and image processing from the perspectives of image generation and image segmentation.
In the realm of image generation, GANs have demonstrated exceptional data-generation capabilities across various fields [13,14]. For instance, in 2020, Kramberger and Potočnik improved the quality of a large-scale automotive image dataset using StyleGAN [15]. This advancement paved new avenues for image data augmentation and quality improvement. StyleGAN2-ADA, an advanced version of StyleGAN, uses adaptive discriminator augmentation to generate high-resolution, visually realistic images. In 2023, Chan et al. applied StyleGAN2-ADA to an algae classification model, significantly improving classification accuracy by 16% through image augmentation [16]. That same year, Ahn et al. generated high-resolution knee X-ray images using StyleGAN2-ADA that closely resembled real images both visually and statistically, exhibiting characteristics akin to a real dataset [17]. In 2022, Liu and Mukerji ingeniously applied StyleGAN2-ADA to generate high-resolution 2D scanning electron microscope images, providing a new solution for the three-dimensional reconstruction of digital cores [18]. In 2023, Bhosale et al. explored using StyleGAN for data augmentation to enhance the segmentation accuracy of medical CT images [19]. This study employs StyleGAN2-ADA to augment 2D CT grayscale images, aiming to improve image segmentation accuracy by enhancing image quality.
The traditional methods of image segmentation are mainly based on two key characteristics: similarity and boundary discontinuity. Similarity means that pixels in the same region of the image have similar gray values, which is the basis of region-based segmentation methods. Boundary discontinuity indicates that there are obvious gray-value changes between different regions, which is the key to edge detection methods. The OTSU algorithm is an efficient region segmentation algorithm [20] and has been applied to the analysis of rock porosity [21]. Researchers have found that rock physical properties calculated using the threshold–porosity relationship are more accurate than those obtained from classical segmentation methods [22,23]. With the rapid development of deep learning technology [24,25], image segmentation methods based on deep learning are constantly emerging [26,27,28] and are increasingly widely used in the field of digital cores. For example, Manzoor et al. proposed using the SegNet network to automatically segment digital rock images, effectively improving the accuracy of image processing [29]. Alqahtani et al. further explored the application of convolutional neural networks in the super-resolution segmentation of carbonate rock images [30], providing a new perspective for high-precision geological analysis. The U-Net architecture, owing to its excellent image segmentation performance, has been widely used for the automatic segmentation of rock images, especially in the image analysis of rock types such as sandstone [31]. In recent years, researchers have made various innovative improvements to U-Net. For example, in 2024, Chen et al. proposed Pore-net [32], an improved model based on U-Net that significantly improves the efficiency of pore identification by optimizing the data-reading strategy, increasing the number of convolutional layers, and integrating the Canny edge detection algorithm. In addition, Liu et al. proposed a deep learning method called the UR network, which combines U-Net and Res_Unet to address the segmentation accuracy problems caused by the adhesion of ore particles and dark areas [33]. This method first uses U-Net for preliminary ore contour detection and then refines the contour through Res_Unet to improve segmentation accuracy. In May 2024, Li et al. [34] proposed Diamond-Unet, a new semantic segmentation network combining U-Net and a Transformer. This innovative model introduces the Feature Cross Fusion Path (FCFP) to enhance the network's ability to capture fine-grained details and understand coarse-grained semantics in full-scale image segmentation. Although the above methods have achieved remarkable results under the fully supervised learning framework, as the requirements for image segmentation accuracy continue to increase, traditional fully supervised methods gradually show limitations due to their dependence on large amounts of professionally annotated data. To address this challenge, semi-supervised learning methods have rapidly become a research hotspot due to their potential to reduce annotation requirements while maintaining high segmentation accuracy and have made breakthroughs in multiple fields [35,36,37]. Currently, semi-supervised techniques such as pseudo-labeling [38], consistency regularization [39], and graph learning methods [40] have attracted particular attention. In 2023, Yin et al. proposed SU-Net [41], a semi-supervised learning model based on U-Net that combines unsupervised and semi-supervised learning strategies and effectively reduces the dependence on large amounts of labeled data. Liang et al. further proposed an improved semi-supervised support vector machine–fuzzy C-means (CSVM-FCM) algorithm [42], which achieves a more refined segmentation of pores and rocks in the image through the optimization of the objective function. In view of the limitations of pseudo-labeling [43], we propose an innovative Adaptive Bagging method, improve the multi-model weight adjustment strategy, and optimize the pseudo-label generation process. We use multiple U-Net models to evaluate the accuracy of pseudo-labels based on porosity consistency and, with the support of high-quality images generated by StyleGAN2-ADA, not only improve the robustness of the model but also enhance its generalization ability, bringing new perspectives and solutions to the field of digital core image segmentation. Code is available at https://github.com/sumqiu/ELSSL (accessed on 13 August 2024).

2. Materials and Methods

2.1. Overview

Figure 1 illustrates the workflow of the entire task, encompassing two critical stages: generation and segmentation. First, data acquisition is carried out, followed by the image generation task. Using the StyleGAN2-ADA model, a large number of high-resolution 2D grayscale images of sandstone are generated. Next, during the image segmentation phase, a portion of the generated images is annotated. A semi-supervised segmentation model combining Adaptive Bagging with pseudo-labeling is then used to train on both the large, unannotated dataset and the small, annotated dataset. Finally, the trained model is employed for 3D pore reconstruction.

2.2. Model Architecture

2.2.1. GAN

GAN (generative adversarial network) is a type of deep learning model whose core idea is to train two models simultaneously: a generative model (Generator, denoted as G) and a discriminative model (discriminator, denoted as D) [44]. The task of the generative model is to generate fake data samples. The generative model typically uses the following loss function:
$$L_G = \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
Here, $z$ represents random noise, $p_z(z)$ is the noise distribution, $G(z)$ is the data generated by the generative model, and $D(G(z))$ indicates the discriminative model's judgment of the generated data. The task of the discriminative model is to distinguish whether a received data sample is a real image from the training data or a fake image from the generative model. The discriminative model outputs a probability that an input sample is a real image. Its loss function can be expressed as follows:
$$L_D = -\mathbb{E}_{x \sim p_t(x)}\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{2}$$
Here, $x$ represents the actual data, $p_t(x)$ represents the distribution of the actual data, and $D(x)$ represents the judgment of the discriminative model on the actual data.
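To make the two objectives concrete, the following is a minimal PyTorch sketch of Equations (1) and (2). The generator G, the discriminator D, and the assumption that D outputs probabilities in (0, 1) are illustrative placeholders rather than the actual networks used in this study:

```python
import torch

def generator_loss(D, G, z):
    # Equation (1): L_G = E_z[log(1 - D(G(z)))]
    # (non-saturating variants are common in practice)
    fake = G(z)
    return torch.log(1.0 - D(fake)).mean()

def discriminator_loss(D, G, x_real, z):
    # Equation (2): L_D = -E_x[log D(x)] - E_z[log(1 - D(G(z)))]
    fake = G(z).detach()  # do not backpropagate into the generator here
    loss_real = -torch.log(D(x_real)).mean()
    loss_fake = -torch.log(1.0 - D(fake)).mean()
    return loss_real + loss_fake
```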
During the training process, the generative model will continuously learn to deceive the discriminative model, and the discriminative model will also continuously learn to distinguish between real and fake data. The ultimate goal of the generative adversarial network (GAN) is to generate new data samples that are similar to the actual data. This paper accomplishes the task of generating sandstone sample two-dimensional CT grayscale images by using the adaptive enhancement of the generative adversarial network (StyleGAN2-ADA).

2.2.2. StyleGAN2-ADA

StyleGAN2-ADA is a training method for generative adversarial networks (GANs). It aims to address the discriminator overfitting issue that arises when training GANs with limited data, which can lead to divergence in the training process and a decline in the quality of generated images. The core technique of StyleGAN2-ADA is adaptive discriminator augmentation (ADA), an advancement over Stochastic Discriminator Augmentation (SDA). SDA applies a series of augmentation transformations to all the images seen by the discriminator while ensuring these transformations do not leak into the generated images, preventing the discriminator from overfitting to the training samples. SDA’s augmentation pipeline comprises 18 transformations, categorized into pixel-level operations, geometric transformations, color transformations, image space filtering, additive noise, and cropping. Each transformation is applied independently to each image with a certain probability. As shown in Figure 2, ADA dynamically adjusts the intensity of augmentations based on the degree of overfitting during training. It employs heuristic methods to quantify overfitting by measuring the difference between the training set and generated images and adjusts the augmentation probability p accordingly. If overfitting is detected, ADA increases the augmentation intensity; if overfitting is not apparent, it reduces the augmentation intensity.
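The following is a hedged sketch of this feedback loop. The overfitting heuristic $r_t = \mathbb{E}[\mathrm{sign}(D(x_{real}))]$ and the target value of 0.6 follow the StyleGAN2-ADA paper, while the function name and the step-size constants are illustrative:

```python
import numpy as np

def update_ada_p(p, d_real_logits, batch_size, target=0.6,
                 ada_interval=4, ada_kimg=500):
    """Adjust the augmentation probability p once every ada_interval steps."""
    # r_t > target suggests the discriminator is overfitting -> raise p;
    # r_t < target suggests no apparent overfitting -> lower p.
    r_t = float(np.mean(np.sign(d_real_logits)))
    step = np.sign(r_t - target) * (batch_size * ada_interval) / (ada_kimg * 1000)
    return float(np.clip(p + step, 0.0, 1.0))
```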

2.2.3. U-Net

U-Net is a deep convolutional neural network (CNN) architecture designed for image segmentation tasks across various fields. Proposed by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in 2015 [45], U-Net consists of multiple convolutional layers, activation functions (ReLU), upsampling layers, and skip connections. The network's input is an image, and its output is a segmentation map in which each pixel is assigned to a category. The contracting path progressively reduces the spatial dimensions of the feature maps while increasing the number of channels. In the expansive path, upsampling and convolution operations gradually restore the spatial dimensions while reducing the number of channels, combining the feature maps with those from the contracting path through skip connections. By leveraging a diverse dataset, U-Net can better learn the boundaries between different categories within images.
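As a minimal illustration of this encoder–decoder structure, the sketch below implements a two-level U-Net in PyTorch. The channel widths, depth, and single-channel grayscale input are illustrative choices, not the exact configuration used in this study:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 64), conv_block(64, 128)
        self.bottleneck = conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)  # 128 upsampled + 128 from skip
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)   # 64 upsampled + 64 from skip
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                    # contracting path
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                 # per-pixel class scores
```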

2.2.4. Adaptive Bagging

Bootstrap Aggregating (Bagging) is an ensemble learning method that improves overall performance by combining the predictions of multiple models. In our segmentation task, we enhance the model’s generalization ability and segmentation accuracy by training multiple U-Net models with random parameters. Each model is independently trained using different subsets of data obtained from the training set through Bootstrap sampling, ensuring diversity among the models. In the traditional Bagging method, all models’ predictions are given equal weight, and the final prediction is based on a simple average of all the model predictions. However, the improved version of this study introduces an adaptive weight adjustment mechanism. The weight of each model in the ensemble is dynamically adjusted according to its performance on the validation set, thereby optimizing the overall predictive performance. The method for calculating the weights is as follows:
$$W_i = \frac{e^{\alpha\left(\beta p_i + (1-\beta) q_i\right)}}{\sum_{j=1}^{n} e^{\alpha\left(\beta p_j + (1-\beta) q_j\right)}} \tag{3}$$
Here, $n$ represents the total number of models. The Intersection over Union (IoU) coefficient of each model $i$ on the validation set is denoted as $p_i$, and the Dice coefficient is denoted as $q_i$; $\beta$ balances $p_i$ and $q_i$, and $\alpha$ controls the degree of weight amplification. The numerator $e^{\alpha(\beta p_i + (1-\beta) q_i)}$ is the exponentially weighted score of model $i$, the denominator $\sum_{j=1}^{n} e^{\alpha(\beta p_j + (1-\beta) q_j)}$ sums these scores over all the models, and $W_i$ is the final weight of model $i$.
To avoid the situation where model weights are too high, this study introduces an L2 regularization term to optimize the model weights. The weight calculation formula after introducing the L2 regularization is as follows:
$$W_i' = \frac{W_i}{1 + \lambda \sum_{j=1}^{n} W_j^2} \tag{4}$$
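A short sketch of this weighting scheme is given below, assuming per-model IoU scores $p$ and Dice scores $q$ measured on the validation set; the hyperparameter values are illustrative:

```python
import numpy as np

def adaptive_weights(p, q, alpha=5.0, beta=0.5, lam=0.1):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    scores = alpha * (beta * p + (1.0 - beta) * q)
    w = np.exp(scores) / np.exp(scores).sum()  # softmax weighting, Equation (3)
    return w / (1.0 + lam * np.sum(w ** 2))    # L2-regularized weights, Equation (4)

# Example: three U-Net models with different validation IoU/Dice scores
w = adaptive_weights(p=[0.95, 0.97, 0.92], q=[0.96, 0.98, 0.93])
```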

2.2.5. Pseudo-Labeling

By introducing a semi-supervised learning strategy based on the idea of pseudo-labeling, the performance of the model can be significantly improved. The specific approach is to use the existing model to pre-segment the unlabeled data, and then use the pre-segmentation results as pseudo-labels for further training of the model. This method not only increases the size of the training dataset but also reduces the dependence on a large amount of manual annotation. In this process, the quality of the pseudo-labels is assessed through the following mechanism: if multiple models have a consistency indicator in the calculation of porosity for the segmentation results of the same image that is below a given consistency threshold, it indicates that this image is not suitable to be used as a pseudo-label.
Equations (5)–(7) define the consistency metrics for multiple models. Here, $M$ represents the total number of models, and $w_m$ is the weight of model $m$. For a given sample $x$, we collect the porosity values $\{p_m(x) \mid m \in \{1, \dots, M\}\}$ calculated from the pre-segmentation results of all the models, compute the weighted average $\mu_x$ of these porosity values, then calculate their weighted standard deviation $\sigma_x$, and finally take the reciprocal of the coefficient of variation, $C_x$, as the consistency score. The higher the score, the higher the consistency.
$$\mu_x = \frac{1}{M} \sum_{m=1}^{M} w_m\, p_m(x) \tag{5}$$
$$\sigma_x = \sqrt{\frac{1}{M} \sum_{m=1}^{M} w_m \left(p_m(x) - \mu_x\right)^2} \tag{6}$$
$$C_x = \frac{\mu_x}{\sigma_x} \tag{7}$$
Therefore, only pseudo-labels with high consistency are retained for training. The entire process is iteratively executed, with the termination condition being that the consistency metrics of the segmentation results in porosity calculation for the sampled unlabeled data by multiple models remain consistently low, or the maximum number of iterations is reached. The training process is shown in Algorithm 1 below:
Algorithm 1: Adaptive Bagging and Pseudo-Labeling Algorithm
Input: Labeled dataset D_L, unlabeled dataset D_U, validation dataset D_V, number of models M, consistency threshold ϵ, maximum iterations T
Result: Trained ensemble of models with high-quality pseudo-labeled data
Initialize: Train M base models on different bootstrap samples from D_L; set iteration counter t = 0
1    while t < T:
2        For each model m ∈ {1, …, M}: sample a bootstrap dataset D_L^m and train model m on D_L^m
3        Dynamically adjust the weight of each model based on its performance on D_V
4        For each sample x ∈ D_U: collect predictions {p_m(x) | m ∈ {1, …, M}}
5            Calculate the consistency score among the predictions
6            if consistency score > ϵ:
7                Assign pseudo-label y = mode({p_m(x)})
8                Add (x, y) to the pseudo-labeled dataset D_P
9            else:
10               Discard sample x
11           end if
12       Combine D_L and D_P to form the extended labeled dataset D_E
13       if the consistency score for all samples in D_U < ϵ: break
14       Retrain all M models on D_E
15       t = t + 1
16   end while
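The following is a hedged Python sketch of the consistency check and pseudo-label filtering steps of Algorithm 1, using Equations (5)–(7). The model interface, the pixel-wise majority vote standing in for mode({p_m(x)}), and the default threshold (the high threshold of 28 discussed in Section 3.3.2) are illustrative assumptions:

```python
import numpy as np

def porosity(mask):
    return float(mask.mean())  # fraction of pore pixels in a binary mask

def consistency_score(porosities, weights):
    p, w = np.asarray(porosities), np.asarray(weights)
    M = len(p)
    mu = (w * p).sum() / M                          # weighted mean, Equation (5)
    sigma = np.sqrt((w * (p - mu) ** 2).sum() / M)  # weighted std, Equation (6)
    return mu / sigma if sigma > 0 else np.inf      # C_x, Equation (7)

def filter_pseudo_labels(models, weights, unlabeled, eps=28.0):
    """models: callables mapping an image to a binary pore mask."""
    accepted = []
    for x in unlabeled:
        masks = [m(x) for m in models]  # pre-segmentation by each model
        score = consistency_score([porosity(mk) for mk in masks], weights)
        if score > eps:  # keep only samples the ensemble agrees on
            vote = np.stack(masks).mean(axis=0)
            accepted.append((x, (vote > 0.5).astype(np.uint8)))
    return accepted
```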

2.3. Data Acquisition

2.3.1. Micro-CT Scanning

Micro-computed tomography (micro-CT) scanning plays an important role in the non-destructive detection of the internal microstructure of samples thanks to its micrometer-level resolution. It is particularly suitable for the meticulous observation of small-scale samples, such as microorganisms, cellular tissues, and nanostructures in materials science. The technique uses X-rays to penetrate the sample and, in combination with the signals received by the detector, generates clear three-dimensional images using computer algorithms, allowing researchers to observe the internal structure of a sample in depth without physical sectioning. The CT scanning parameters for this study are shown in Table 1.
To conduct a more in-depth and intuitive analysis of the internal structure of geological samples such as rocks, the scan data must be combined with the Avizo software. Avizo uses three-dimensional reconstruction technology to build an accurate digital rock core model from the micro-CT scan data; by simulating the flow paths of fluids within the rock core, the absolute permeability of the rock sample can be calculated. Avizo also provides advanced three-dimensional visualization functions, including image data acquisition, three-dimensional reconstruction, image feature extraction, and segmentation, which are crucial for extracting useful information from the micro-CT scan data.

2.3.2. Dataset Optimization

In this study, nine small cylindrical samples, each with a diameter of 4 mm, were drilled from a cylindrical sandstone sample with a diameter of 2.5 cm. High-precision imaging of the samples was performed using CT scanning, and the scan results were then meticulously processed with the Avizo software. The imaging data of the first small cylinder contained 840 continuous two-dimensional CT grayscale images of 800 × 845 pixels (see Figure 3). To avoid the mode collapse phenomenon that can occur in deep learning training, where the generative model begins to produce almost identical images (see Figure 4), this study adopted an interval selection strategy that ensures the diversity and quality of the training data by assessing the degree of change between consecutive two-dimensional CT grayscale slices. By calculating the similarity between consecutive slices, we selected one image as training data from every 30 two-dimensional CT grayscale images, ensuring that the selected images have significant visual differences and avoiding excessive similarity between them. Ultimately, 28 two-dimensional CT grayscale images were selected from the first small cylinder as training data. The other samples were processed in the same way, accumulating a total of 295 non-continuous high-resolution two-dimensional CT grayscale images of 1024 × 1024 pixels and providing a high-quality dataset for the subsequent deep learning training.
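One possible implementation of this interval-selection strategy is sketched below; the structural similarity index (SSIM) is assumed as the similarity measure, since the exact measure is not detailed here:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def select_slices(slices, window=30):
    """slices: list of 2D grayscale arrays in scan order; keep one per window."""
    selected = []
    for start in range(0, len(slices), window):
        block = slices[start:start + window]
        if len(block) < 2:
            selected.append(block[0])
            continue
        # pick the slice least similar to its predecessor -> maximal diversity
        sims = [ssim(block[i - 1], block[i],
                     data_range=float(block[i].max() - block[i].min()))
                for i in range(1, len(block))]
        selected.append(block[int(np.argmin(sims)) + 1])
    return selected
```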

2.4. Evaluation Metrics

2.4.1. Evaluation Metrics for Image Generation Tasks

For image generation tasks, we use the Inception Score (IS), Kernel Inception Distance (KID), and Learned Perceptual Image Patch Similarity (LPIPS) as metrics [46]. IS evaluates the diversity and authenticity of the generated images by calculating the KL divergence between the conditional label distribution of each generated image and the marginal label distribution over the dataset. A high IS value indicates diverse and high-quality generated images. The calculation formula is as follows:
$$IS = \exp\left(\mathbb{E}_{x \sim p_t}\left[D_{KL}\left(p(y|x) \parallel p(y)\right)\right]\right) \tag{8}$$
Here, $p_t$ represents the distribution of the actual data, $p(y|x)$ represents the conditional probability distribution of predicting category $y$ given input $x$, $p(y)$ represents the marginal probability distribution of the model predicting category $y$ over the entire dataset, and $D_{KL}$ is the Kullback–Leibler divergence, which measures the difference between two probability distributions.
Because an inherent bias affects the evaluation results of the Fréchet Inception Distance (FID) when real image samples are scarce, FID is not suitable as an evaluation metric for our image generation task [47]. To address this issue, we adopted the unbiased KID as an alternative. KID assesses the performance of the generative model by measuring the difference between the mean embeddings of the real data and the generated data in feature space. The formula is as follows:
$$KID = \mathrm{Tr}\left((\phi_t - \phi_m)^T (\phi_t - \phi_m)\right) \tag{9}$$
Here, $\phi_t$ represents the embedded representation of the real dataset, typically extracted from a specific layer of the Inception network, while $\phi_m$ is the embedded representation of the images output by the generative model, extracted from the same layer. $\mathrm{Tr}(A^T A)$ is the Hilbert–Schmidt norm, which can be used to measure the difference between two embedded representations.
LPIPS is an advanced image quality assessment tool that uses deep learning models to measure the perceptual differences between images. Compared to traditional pixel-level comparison methods such as SSIM, LPIPS better reflects human visual perception. In the LPIPS assessment process (see Figure 5), the two input images are first passed through a pre-trained convolutional neural network (such as VGG19) to extract feature embeddings that capture high-level semantic information. The feature embeddings are then normalized on each channel, scaled to the range of [−1, 1] or [0, 1], and weighted by a weight vector. Next, the L2 distance between the weighted feature embeddings is calculated to quantify the differences in feature space, and the L2 distances of all the layers are averaged to obtain a global distance measurement, corresponding to d0 and d1 for the image pair. Finally, these two global distance measurements are fed into a small network G, trained to predict human perceptual judgments of image pairs, which outputs h.
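As a usage illustration, the snippet below computes an LPIPS distance with the third-party lpips package (assumed installed); the random tensors stand in for a real/generated image pair scaled to [−1, 1]:

```python
import torch
import lpips

loss_fn = lpips.LPIPS(net='vgg')           # pretrained VGG feature backbone
img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder "real" image
img1 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder generated image
distance = loss_fn(img0, img1)             # lower = perceptually more similar
print(distance.item())
```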

2.4.2. Evaluation Metrics for Image Segmentation Tasks

IoU, Recall, and pixel accuracy (PA) are common metrics for evaluating image segmentation tasks. IoU provides a quantitative measure to assess the consistency between the predicted segmentation area and the true segmentation area. Recall represents the proportion of the actual segmented area that is correctly identified, and PA measures the proportion of pixels in the image that are correctly classified out of the total number of pixels. Their calculation formulas are shown as follows:
$$IoU = \frac{TP}{TP + FP + FN} \tag{10}$$
$$Recall = \frac{TP}{TP + FN} \tag{11}$$
$$PA = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
Among them, $TP$ refers to the pixels or areas that the model correctly identifies as belonging to a certain category, $FP$ refers to the pixels or areas that the model incorrectly identifies as belonging to a certain category when they do not, $FN$ refers to the pixels or areas that the model fails to identify as belonging to a certain category, and $TN$ refers to the pixels or areas that the model predicts as the background or other categories and that actually are the background or other non-target categories.
To comprehensively evaluate the performance of segmentation models, we have also introduced the Local Consistency metric, a concept from Constraint Satisfaction Problems (CSPs). It refers to a strategy for advancing problem-solving by ensuring that a part of the problem (such as a single variable or a pair of variables) satisfies all the relevant constraints during the solving process. For example, Path Consistency requires that for any path of length $k$, if the variable $x$ is on the path and $y$ and $z$ are the previous and next variables of $x$, respectively, the following must be satisfied:
$$C_{x,z} \subseteq C_{x,y} \circ C_{y,z} \tag{13}$$
Here, $C_{x,y}$ represents the constraint between the variables $x$ and $y$, $C_{x,z}$ the constraint between $x$ and $z$, and $C_{y,z}$ the constraint between $y$ and $z$; $\circ$ denotes the composition of constraints. Local Consistency focuses on the consistency of the segmentation results within a local area, that is, whether the segmentation results are smooth within a small neighborhood, without obvious jumps or noise. This metric reflects the smoothness and continuity of the segmentation results, supplementing the pixel accuracy.
In conjunction with specific segmentation tasks, we also use porosity difference as an evaluation metric. Here is its mathematical expression:
$$\Delta P = \left| \frac{C_{0,1} - C_{0,2}}{N} \right| \tag{14}$$
Here, $N$ is the total number of pixels in the segmented result image, $C_{0,1}$ is the total number of pixels in the pore area of the labeled image, $C_{0,2}$ is the total number of pixels in the pore area of the model-segmented image, and $\Delta P$ is the resulting porosity difference.
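All four quantities can be computed directly from binary masks (1 = pore, 0 = background), as in the following sketch; the dictionary keys are illustrative:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # pore pixels correctly identified
    fp = np.sum(pred & ~truth)   # background predicted as pore
    fn = np.sum(~pred & truth)   # pore predicted as background
    tn = np.sum(~pred & ~truth)  # background correctly identified
    return {
        "IoU": tp / (tp + fp + fn),                                  # Equation (10)
        "Recall": tp / (tp + fn),                                    # Equation (11)
        "PA": (tp + tn) / (tp + tn + fp + fn),                       # Equation (12)
        "dP": abs(int(truth.sum()) - int(pred.sum())) / truth.size,  # Equation (14)
    }
```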

3. Results and Discussion

3.1. Training Details

The training utilized the PyTorch deep learning framework, with all processes conducted on a device equipped with an Nvidia GeForce 3080 Ti GPU and running Windows 10. For the image generation task, image snapshots and network weight snapshots were saved at a frequency set by the number of training iterations, storing sample images and model weights. The adaptive learning rate adjustment targeted a loss-function smoothness value of 0.6, tuning training to approximate this target. During the training loop, a custom sampler continuously fetched data from the dataset in an infinite loop, and the DataLoader iterator's batch loading and shuffling functions ensured a different data order in each training session, preventing dependency on a specific data sequence.
For the image segmentation task, our primary goal was to accurately annotate the pore regions in the images to enhance data usability. Given the vast amount of image data, manual annotation was not only time-consuming but also labor-intensive. To address this issue, we initially experimented with the OTSU algorithm, an automatic segmentation method that determines the global threshold based on maximizing inter-class variance. As shown in Figure 6, the OTSU algorithm performed excellently on the images with distinct pore features and low clay content. However, when dealing with images where pore features were indistinct or clay content was high, the OTSU algorithm struggled to find an appropriate global threshold, resulting in suboptimal segmentation performance.
While manual annotation can provide higher accuracy, especially for complex or detail-rich images, relying solely on manual annotation for a vast dataset remains prohibitively expensive. Therefore, we chose a semi-supervised learning model combining Adaptive Bagging with pseudo-labeling. First, we generated an enhanced dataset named Synthetic20000 consisting of 20,000 high-quality images using the trained StyleGAN2-ADA model. From Synthetic20000, we meticulously selected 600 images for professional manual annotation to accurately delineate pore regions. Of these, 420 images were used as initial training data, named train420, and the remaining 180 images were split equally into test and validation sets, named test90 and valid90, respectively. In the first training round, we trained ten U-Net420 models using these 420 images. The initial weights of these models were random, but they were adaptively adjusted during iterative training to optimize the ensemble strategy. Subsequently, we performed preliminary segmentation on the remaining unannotated images in Synthetic20000, initially set at n = 1000. After segmentation, we calculated the porosity consistency of the results. High porosity consistency indicated a high confidence level in the models’ predictions for that image, which could then be included in the training set as pseudo-labeled images. Ultimately, the ten trained models were integrated into a final model. This iterative training strategy not only enhanced the models’ generalization capabilities but also effectively handled large-scale image datasets while maintaining high annotation accuracy and efficiency.

3.2. Image Generation Results

This study quantitatively analyzed the image generation results, demonstrating the impact of different training parameters on model performance. Specifically, as shown in Table 2, we set three different total training image counts: 500,000, 1,000,000, and 2,000,000, corresponding to training endpoints after 500,000, 1,000,000, and 2,000,000 image iterations, respectively. Additionally, by adjusting the batch sizes to 4, 8, and 16, we further refined the training process to explore its specific effects on the results. Through nine different training settings, we deeply analyzed and compared the models' performance under various parameter configurations. In the field of image generation, a high Inception Score (IS) does not guarantee that generated images will have visual style consistency and diverse content similar to the real images. To address this issue, we introduced the KID and LPIPS metrics. The KID metric measures the proximity of the generated images to the real images at the feature level; a lower KID value indicates that the generated image distribution closely approximates the real image distribution, reflecting superior model performance. Meanwhile, the LPIPS metric assesses the visual similarity between the generated and real images from a human perceptual perspective; a lower LPIPS value signifies a closer visual resemblance. As the total number of training images and batch size increased, the overall performance of the model improved. Under the condition of a total training image count of 500,000 and a batch size of 4, we observed an IS of 1.0120, a KID of 7.4536, and an LPIPS of 0.5521. When the batch size increased from 4 to 8 and then to 16, the IS value significantly improved by 0.5937 and the KID value significantly decreased by 2.5479, while the LPIPS value slightly increased by 0.0237. These changes indicate that with an increase in batch size, both the diversity of the generated images and their proximity to real images in the feature space improved, although the slight rise in LPIPS points to a marginal loss in perceptual similarity. Furthermore, when the total number of training images increased to 1,000,000 and the batch size increased from 4 to 16, the IS value further rose by 0.3705, the KID value decreased by 2.3793, and the LPIPS value decreased by 0.0873. The rate of change in these values gradually slowed, indicating that at this stage, the performance improvement of the model began to stabilize, although there was still room for further parameter optimization. Finally, when the total number of training images increased to 2,000,000 and the batch size was again increased from 4 to 16, the IS value rose by 0.238, the KID value decreased by 0.9378, and the LPIPS value decreased by 0.0421. Compared to the previous stage, the changes in these indicators were smaller, showing that the model's performance was approaching its optimal state. With a total training image count of 2,000,000 and a batch size of 16, the model not only performed excellently in the diversity of the generated images but also achieved significant improvements in visual quality, attaining dual optimization in visual and statistical characteristics. This indicates that meticulously adjusting training parameters can significantly enhance the performance of the generative model.
In previous quantitative analyses, it was determined that increasing the total number of training images to 2,000,000 and adjusting the batch size to 16 significantly enhanced the performance of the generative model. To further evaluate the model’s generative capabilities, we conducted a visual analysis of the generated images, as shown in Figure 7a,b. Through detailed comparisons, it is evident that the generated images exhibit a high degree of visual similarity to real images. Specifically, the generated images accurately simulate the complex details found in real images, including the uneven distribution of large and small pores, fine cracks between particles, and the intricate mixture of clay and pores. The accurate reproduction of these details not only confirms the model’s high precision in simulating real images but also demonstrates the effective guarantee of diversity in the generated images. Combining the quantitative and qualitative evaluation results, we can conclude that this generative model excels in simulating real images and achieves outstanding visual quality.

3.3. Image Segmentation Results

3.3.1. Ablation Experiments

To evaluate the potential enhancement in model performance brought by Adaptive Bagging and pseudo-label techniques in image segmentation tasks, this study employed ablation experiments. Ablation experiments are a systematic analysis method that involves the gradual removal or replacement of the various components of the model to quantitatively assess their specific impact on the final performance. The datasets used were Synthetic20000, train420, valid90, and test90. This approach allows for a precise analysis of the performance of each component on the test90 dataset and subsequently assesses their contributions to the overall model performance. Specifically, this experiment aims to verify the contributions of the Adaptive Bagging and pseudo-label strategies in segmentation tasks. By comparing performance changes with the removal or application of different strategies, we can gain a deeper understanding of their effectiveness. The experimental results, as shown in Table 3, clearly demonstrate the impact of these strategies on model performance.
The initial baseline U-Net model, without any additional strategy optimization, maintained an IoU of around 0.96. The porosity difference reached 0.0231, reflecting the model's deficiency in identifying clay-filled areas within the images, which leads to the over-segmentation of the pore regions and deviations in porosity calculations. Introducing pseudo-labeling significantly enhanced the model's segmentation performance: IoU surged to 0.9796, Recall increased to 0.9680, and PA rose to 0.9724. These substantial improvements indicate a qualitative leap in segmentation accuracy, markedly reducing the porosity difference to 0.0156. Simultaneously, the improvement in Local Consistency indicates a significant enhancement in the quality of the segmented images, effectively reducing noise. The further application of Adaptive Bagging optimized the model performance even more: through iterative selection, ensemble learning continually enhanced model performance. Ultimately, combining the pseudo-labeling and Adaptive Bagging strategies, the model achieved scores above 0.99 in key metrics such as IoU, Recall, PA, and Local Consistency, with PA reaching 0.9993, indicating a high overlap between the predicted and true segmentation regions. The porosity difference further decreased to 0.0035, demonstrating the model's exceptional performance and accuracy in image segmentation tasks.

3.3.2. Qualitative Analysis

In this study, an ensemble learning approach was adopted to improve the accuracy of image segmentation. Specifically, we integrated multiple well-trained models to independently segment the same unsegmented image. Subsequently, we calculated the porosity consistency values of these model segmentation results and compared them with a preset consistency threshold. If the consistency value exceeded the threshold, the segmentation result was deemed reliable and used as a pseudo-label; if it was below the threshold, the result was excluded. We set two consistency thresholds: a high threshold of 28 and a low threshold of 18, and Figure 8 visually demonstrates the specific impact of the different threshold settings on segmentation results. The findings reveal that the models trained with a high consistency threshold significantly outperformed those trained with a low consistency threshold in segmentation accuracy. Particularly, in handling fine cracks between particles, the segmentation results with a high consistency threshold were more precise. Furthermore, for identifying clay-filled regions within pores, the segmentation results were robust and showed minimal differences between the two thresholds. However, in identifying large pores, although the high consistency threshold had some advantages in detail handling, both threshold-trained models generally performed well. Through qualitative analysis, we can clearly observe the impact of the different consistency threshold settings on the segmentation results. The high consistency threshold has a distinct advantage in improving the segmentation accuracy, especially in recognizing fine cracks. Although the performance in identifying large pores was similar between the two thresholds, the high consistency threshold models had notable advantages in detail handling. These findings provide valuable insights for image segmentation research and guide further model optimization.

3.3.3. Comparative Analysis

This study compared seven popular image segmentation methods, including three fully supervised and four semi-supervised methods, using the datasets Synthetic20000, train420, valid90, and test90. The fully supervised methods were trained on train420, while the semi-supervised methods, using a U-Net as the backbone network, were trained on Synthetic20000, train420, and valid90, with all the methods subsequently tested on test90. The evaluation metrics included IoU, PA, and porosity difference. As shown in Table 4, among the fully supervised methods, SegNet performed relatively poorly, with both IoU and PA around 0.95 and a porosity difference of 0.0968, indicating a deficiency in pore detection. In contrast, U-Net++ performed the best, achieving an IoU of 0.9806 and a PA of 0.9699, though there was still room for improvement in the porosity difference. Among the semi-supervised methods, including the one proposed in this study, all the metrics surpassed those of U-Net++, demonstrating that semi-supervised models trained on extended datasets could significantly enhance performance. Notably, the Mean Teacher (MN) method excelled, with IoU and PA both exceeding 0.98 and a porosity difference reduced to 0.0079, indicating high accuracy. However, the method proposed in this study, incorporating adaptive weight adjustment, ensemble strategies, and the optimization of the semi-supervised process, ultimately surpassed MN, achieving superior performance.

3.3.4. Three-Dimensional Pore Reconstruction and Micro-Pore Structure Analysis

To further evaluate the segmentation model's performance, this study segmented 840 consecutive 2D CT grayscale images of the first scanned sandstone column. All the segmentation results were then reassembled into a 3D pore map in their original sequence and compared with the real 3D pore map of the first sandstone column (Figure 9a,b). Observing the segmentation results from a 2D perspective allows only a single-image analysis of each segmentation map's shortcomings, whereas synthesizing the 2D segmentation results into a 3D image reveals the segmentation model's deficiencies more comprehensively. This 3D comparative analysis method not only showcases the local errors of each segmentation image but also reflects the cumulative effect of these errors on the overall structure, providing a more comprehensive and in-depth basis for optimizing the segmentation model.
As shown in Figure 10a, the distribution of throat lengths in both the real and synthetic 3D pores exhibits a certain degree of consistency, despite some subtle differences. Prior to the intersection point, the synthetic 3D pores show a higher probability distribution in the shorter throat length region, indicating that the segmentation model is more sensitive to, or more inclined towards, identifying shorter throat structures. However, beyond the intersection point, the probability distribution of the real 3D pores surpasses that of the synthetic ones, suggesting that the segmentation model’s accuracy in recognizing longer throats requires improvement. This variation implies that the model may need further optimization to more accurately reflect the throat length characteristics of real sandstone. Despite these differences, the overall distribution trends in both datasets are similar, indicating that the synthetic 3D pores can fairly approximate the throat length distribution of real sandstone to a certain extent. This similarity reflects the segmentation model’s high accuracy in simulating the pore structure of sandstone. Nevertheless, to more precisely emulate the characteristics of real sandstone, especially in recognizing longer throats, the segmentation model may need further fine-tuning to balance the identification of the throats of varying lengths.
As depicted in Figure 10b, analyzing the coordination number distribution of the real and synthetic 3D pores reveals a similar high probability distribution in the low coordination number region, indicating that most pore points have few connections. This similarity suggests that both the real and synthetic pores predominantly feature a pore network with low coordination numbers. However, as the coordination number increases, the probability distribution of the synthetic pores diverges from that of the real pores. Specifically, the probability distribution of synthetic pores drops rapidly when the coordination number exceeds three, whereas the probability distribution of the real pores declines more gradually. This discrepancy may reveal limitations or biases in the segmentation model’s ability to identify high coordination number pore structures. Particularly in regions with a coordination number greater than four, the probability distribution of the synthetic pores is significantly lower than that of real pores, potentially impacting the accurate representation of the complexity of the sandstone pore network.
The pore radius distributions of the real and synthetic 3D pores are shown in Figure 11a. The pore radius distribution of real 3D pores starts at smaller radii, reaching a peak probability distribution as the radius increases, and then gradually declines. The highest probability distribution for real 3D pores occurs at a pore radius of 9.342494 μm, indicating that pores within this radius range are more common. In contrast, the synthetic 3D pores generated by the segmentation model reach their peak probability distribution at a slightly higher radius of 10.353425 μm. This suggests that the segmentation model might be more sensitive or biased towards identifying medium-radius pores. As the pore radius further increases, the probability distribution of synthetic 3D pores declines more rapidly, particularly in the identification of larger radius pores, where the probability distribution is lower than that of real 3D pores. This implies that the segmentation model’s accuracy diminishes in simulating large-radius pore structures. Overall, while the segmentation model exhibits trends similar to real 3D pores in certain aspects of the pore radius distribution, the rapid decline in probability distribution for larger radius pores indicates that further optimization and adjustment may be necessary to improve simulation in this area.
As illustrated in Figure 11b, the analysis of throat shape factor distribution in real and synthetic 3D pores reveals a high degree of overall consistency. Starting from the low-value region of the throat shape factor, the probability distributions of both the pore structures gradually increase, indicating a high occurrence frequency in the low to medium shape factor range. As the throat shape factor increases, the probability distributions of both pore structures peak and then begin to decline. This trend reflects that in the medium shape factor region, both real and synthetic pores tend to exhibit common throat morphologies. Although the peak probability distribution of the synthetic pores is slightly lower than that of the real pores, the overall peak regions are similar, demonstrating the high accuracy of synthetic pores in simulating the shape factors of real throats. In the high shape factor region, the probability distributions of both pore structures rapidly decrease, but the decline is more gradual for real pores, suggesting a certain proportion of throats with complex shapes in the real pores. Despite the steeper decline in the synthetic pores’ distribution, the overall fit between the shape factor distributions of synthetic and real pores is high, indicating good performance of the segmentation model in simulating the shape factors of real pores. Overall, the shape factor distribution of the synthetic 3D pores shows a good fit with that of real 3D pores, especially in the medium shape factor range.

3.4. Limitations Analysis and Future Prospects

In the task of pore segmentation in the 2D CT grayscale images of sandstone, the segmentation model exhibited outstanding performance. However, when applied to 3D data, its performance significantly declined, particularly in identifying medium-radius pores. This decline in performance stems from the model’s overfitting in this specific area, leading to increased sensitivity to noise and a failure to adequately learn and simulate the complex morphological characteristics of medium-radius pores. Additionally, this study focused solely on sandstone, without encompassing other rock types such as carbonate rocks and shale, which limits the model’s generalizability. Considering the differences in pore structures across different rock types, the model’s performance may decline when applied to new rock types. The study foresees challenges in generalizing the model’s capabilities to identify pores in a diverse array of rock types. Continuing to use the StyleGAN2-ADA generative model may not suffice to meet this challenge, as the model’s stability and training time will be affected by the increase in rock types. Furthermore, the current semi-supervised image segmentation method combining Adaptive Bagging, pseudo-labeling, and U-Net shows limitations when dealing with a growing variety of rock types. Particularly, the computational complexity of Adaptive Bagging may increase sharply, necessitating a reevaluation of the ensemble model’s number and structure. However, the adaptive weight adjustment mechanism will demonstrate its advantages in handling more complex tasks. The pseudo-labeling strategy introduces a potential issue: mixing less accurate labels with professionally annotated data could lead to decreased accuracy during model training. Although our proposed method of calculating porosity consistency scores alleviates this problem to some extent, its effectiveness may be limited as the variety of rock types increases. Therefore, developing more refined algorithms to optimize this process is necessary for the future.
The study has the potential for further improvement, particularly in generalizing across different rock types, and plans to expand the method to accommodate a broader range of rock types to enhance its generalizability. Additionally, the authors are committed to making a series of innovative adjustments to the model. First, they will attempt to combine the Adaptive Bagging ensemble learning method with the StyleGAN2-ADA model. This innovative combination is expected to significantly enhance the model’s performance in handling complex tasks. The plan involves integrating multiple StyleGAN2-ADA models trained on different rock types or pore features to enhance the model’s adaptability to diverse rock types. Second, for optimizing the segmentation method, the traditional ensemble U-Net model may be insufficient to handle increasing complexity. Therefore, the study will draw on the methods proposed in the DatasetGAN model [48], training a Style Interpreter network with a small amount of labeled data. This network will serve as a label generation branch within the StyleGAN2-ADA architecture, automatically generating high-quality pixel-level annotations for many unlabeled images. By combining the advantages of ensemble learning strategies with the StyleGAN2-ADA model, we expect to significantly improve the model’s generation and segmentation capabilities when handling large-scale and highly complex tasks. These strategies will significantly enhance the model’s performance while providing new perspectives and solutions for the field of rock pore segmentation.

4. Conclusions

Our study successfully utilized StyleGAN2-ADA to generate high-resolution 2D CT grayscale images of sandstone from a limited dataset. These synthetic images, characterized by high stochasticity and diversity, were applied to pore structure image segmentation using a semi-supervised method that combines Adaptive Bagging, pseudo-labeling, and U-Net. Adaptive Bagging, an innovative enhancement of traditional Bagging, adaptively adjusts model weights to improve prediction performance. We also developed a novel method to ensure consistency in pore volume predictions across multiple models, enhancing the pseudo-labeling process.
Our approach demonstrated significant practical value in accurately identifying pore networks, predicting rock permeability and strength, and optimizing oil and gas extraction. The trained segmentation model enabled the synthesis of the 3D pore structures of sandstone columns with minimal differences compared to the real structures, highlighting the high accuracy of our method. This achievement underscores the potential of artificial intelligence in geoscience applications, offering a new and effective way to analyze rock pore structures and improve resource utilization efficiency.

Author Contributions

Methodology, Z.W. and L.S.; software, H.L. and X.Q.; validation, X.S. and H.L.; investigation, Z.W. and L.S.; resources, X.S. and L.C.; writing—original draft preparation, Z.W., H.L. and X.Q.; writing—review and editing, L.C.; visualization, Z.W. and X.Q.; supervision, L.S.; project administration, L.S.; funding acquisition, X.S., L.C. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 42304130).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the National Key Laboratory of Continental Shale Oil, Northeast Petroleum University, and are available from the author L.C. with the permission of the National Key Laboratory of Continental Shale Oil, Northeast Petroleum University.

Acknowledgments

Thanks to the National Key Laboratory of Continental Shale Oil, Northeast Petroleum University, for providing the sandstone microscopic CT scan data. Special thanks to L.C. for his guidance on the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Jiang, L.; Liu, T.; Yu, C.; Meng, H.; Wang, L.; Alamin, M.; Cao, X. A Novel Inversion Combining NMR Log and Conventional Logs. Appl. Magn. Reson. 2020, 51, 85–101.
2. Zhang, L.; Jing, W.; Yang, Y.; Yang, H.; Guo, Y.; Sun, H.; Zhao, J.; Yao, J. The investigation of permeability calculation using digital core simulation technology. Energies 2019, 12, 3273.
3. March, R.; Egya, D.; Maier, C.; Busch, A.; Doster, F. Numerical computation of stress-permeability relationships of fracture networks in a shale rock. arXiv 2020, arXiv:2012.02080.
4. Guo, Y.; Liang, Y.; Li, J.; Gong, B. A novel connectivity-based hierarchical model for multi-scale fracture system in carbonate reservoir simulation. Fuel 2019, 250, 327–338.
5. Zhao, Y.; Zhu, G.; Liu, S.; Wang, Y.; Zhang, C. Effects of pore structure on stress-dependent fluid flow in synthetic porous rocks using microfocus x-ray computed tomography. Transp. Porous Media 2019, 128, 653–675.
6. Zhang, Y.; Lin, C.; Ren, L. Flow Patterns and Pore Structure Effects on Residual Oil during Water and CO2 Flooding: In Situ CT Scanning. Energy Fuels 2023, 37, 15570–15586.
7. Zhu, L.; Zhang, C.; Zhang, C.; Zhou, X.; Zhang, Z.; Nie, X.; Liu, W.; Zhu, B. Challenges and prospects of digital core—Reconstruction research. Geofluids 2019, 2019, 7814180.
8. Liao, Q.; You, S.; Cui, M.; Guo, X.; Aljawad, M.S.; Patil, S. Digital Core Permeability Computation by Image Processing Techniques. Water 2023, 15, 1995.
9. Zha, W.; Li, X.; Li, D.; Xing, Y.; He, L.; Tan, J. Shale digital core image generation based on generative adversarial networks. J. Energy Resour. Technol. 2021, 143, 033003.
10. Zhang, T.; Li, D.; Lu, F. A pore space reconstruction method of shale based on autoencoders and generative adversarial networks. Comput. Geosci. 2021, 25, 2149–2165.
11. He, L.; Gui, F.; Hu, M.; Li, D.; Zha, W.; Tan, J. Digital core image reconstruction based on residual self-attention generative adversarial networks. Comput. Geosci. 2023, 27, 499–514.
12. Zhao, J.; Zhang, M.; Wang, C.; Mao, Z.; Zhang, Y. Application of the backpropagation neural network image segmentation method with genetic algorithm optimization in micropores of intersalt shale reservoirs. ACS Omega 2021, 6, 25246–25257.
13. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65.
14. de Souza, V.L.T.; Marques, B.A.D.; Batagelo, H.C.; Gois, J.P. A review on generative adversarial networks for image generation. Comput. Graph. 2023, 114, 13–25.
15. Kramberger, T.; Potočnik, B. LSUN-Stanford car dataset: Enhancing large-scale car image datasets using deep learning for usage in GAN training. Appl. Sci. 2020, 10, 4913.
16. Chan, W.H.; Fung, B.S.; Tsang, D.H.; Lo, I.M. A freshwater algae classification system based on machine learning with StyleGAN2-ADA augmentation for limited and imbalanced datasets. Water Res. 2023, 243, 120409.
17. Ahn, G.; Choi, B.S.; Ko, S.; Jo, C.; Han, H.S.; Lee, M.C.; Ro, D.H. High-resolution knee plain radiography image synthesis using style generative adversarial network adaptive discriminator augmentation. J. Orthop. Res. 2023, 41, 84–93.
18. Liu, M.; Mukerji, T. Multiscale fusion of digital rock images based on deep generative adversarial networks. Geophys. Res. Lett. 2022, 49, e2022GL098342.
19. Bhosale, S.; Krishna, A.; Wang, G.; Mueller, K. Improving CT Image Segmentation Accuracy Using StyleGAN Driven Data Augmentation. arXiv 2023, arXiv:2302.03285.
20. Huang, C.; Li, X.; Wen, Y. An OTSU image segmentation based on fruitfly optimization algorithm. Alex. Eng. J. 2021, 60, 183–188.
21. Wu, Y.; Li, X.; Wang, Y.; Zhang, B. An Improved Algorithm in Porosity Characteristics Analysis for Rock and Soil Aggregate. Discret. Dyn. Nat. Soc. 2014, 2014, 798235.
22. Shou, Y.; Zhao, Z.; Zhou, X. Sensitivity analysis of segmentation techniques and voxel resolution on rock physical properties by X-ray imaging. J. Struct. Geol. 2020, 133, 103978.
23. Lin, W.; Li, X.; Yang, Z.; Lin, L.; Xiong, S.; Wang, Z.; Wang, X.; Xiao, Q. A new improved threshold segmentation method for scanning images of reservoir rocks considering pore fractal characteristics. Fractals 2018, 26, 1840003.
24. Michelucci, U. Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks; Apress: Berkeley, CA, USA, 2018.
25. Menghani, G. Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 259.
26. Ghosh, S.; Das, N.; Das, I.; Maulik, U. Understanding deep learning techniques for image segmentation. ACM Comput. Surv. 2019, 52, 73.
27. Liu, X.; Song, L.; Liu, S.; Zhang, Y. A review of deep-learning-based medical image segmentation methods. Sustainability 2021, 13, 1224.
28. Shen, T.; Huang, F.; Zhang, X. CT medical image segmentation algorithm based on deep learning technology. Math. Biosci. Eng. 2023, 20, 10954–10976.
29. Manzoor, S.; Qasim, T.; Bhatti, N.; Zia, M. Segmentation of digital rock images using texture analysis and deep network. Arab. J. Geosci. 2023, 16, 436.
30. Alqahtani, N.J.; Niu, Y.; Wang, Y.D.; Chung, T.; Lanetc, Z.; Zhuravljov, A.; Armstrong, R.T.; Mostaghimi, P. Super-resolved segmentation of X-ray images of carbonate rocks using deep learning. Transp. Porous Media 2022, 143, 497–525.
31. Wang, F.; Zai, Y. Image segmentation and flow prediction of digital rock with U-net network. Adv. Water Resour. 2023, 172, 104384.
32. Chen, X.; Tang, X.; Xiong, J.; He, R.; Wang, B. Pore characterization was achieved based on the improved U-net deep learning network model and scanning electron microscope images. Pet. Sci. Technol. 2024, 371, 131923.
33. Liu, X.; Zhang, Y.; Jing, H.; Wang, L.; Zhao, S. Ore image segmentation method using U-Net and Res_Unet convolutional networks. RSC Adv. 2020, 10, 9396–9406.
34. Li, G.; Xi, B.; He, Y.; Zheng, T.; Li, Y.; Xue, C.; Chanussot, J. Diamond-Unet: A Novel Semantic Segmentation Network Based on U-Net Network and Transformer for Deep Space Rock Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8002205.
35. Zhang, M.; Zhou, Y.; Zhao, J.; Man, Y.; Liu, B.; Yao, R. A survey of semi- and weakly supervised semantic segmentation of images. Artif. Intell. Rev. 2020, 53, 4259–4288.
36. Han, K.; Sheng, V.S.; Song, Y.; Liu, Y.; Qiu, C.; Ma, S.; Liu, Z. Deep semi-supervised learning for medical image segmentation: A review. Expert Syst. Appl. 2024, 245, 123052.
37. Ma, Z.; He, X.; Sun, S.; Yan, B.; Kwak, H.; Gao, J. Zero-Shot Digital Rock Image Segmentation with a Fine-Tuned Segment Anything Model. arXiv 2023, arXiv:2311.10865.
38. Li, Y.; Guo, L.; Ge, Y. Pseudo labels for unsupervised domain adaptation: A review. Electronics 2023, 12, 3325.
39. Fan, Y.; Kukleva, A.; Dai, D.; Schiele, B. Revisiting consistency regularization for semi-supervised learning. Int. J. Comput. Vis. 2023, 131, 626–643.
40. Wang, H.; Wang, S.-B.; Li, Y.-F. Instance selection method for improving graph-based semi-supervised learning. Front. Comput. Sci. 2018, 12, 725–735.
41. Yin, B.; Hu, Q.; Zhu, Y.; Zhou, K. Semi-supervised learning for shale image segmentation with fast normalized cut loss. Geoenergy Sci. Eng. 2023, 229, 212039.
42. Liang, H.; Zou, J. Rock image segmentation of improved semi-supervised SVM–FCM algorithm based on chaos. Circuits Syst. Signal Process. 2020, 39, 571–585.
43. Huang, H.; Luo, X.; Xu, S.; Li, Y. Twin Pseudo-training for semi-supervised semantic segmentation. Comput. Graph. 2023, 115, 348–358.
44. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Commun. ACM 2020, 63, 139–144.
45. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Munich, Germany, 5–9 October 2015; Part III; pp. 234–241.
46. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
47. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst. 2020, 33, 12104–12114.
48. Zhang, Y.; Ling, H.; Gao, J.; Yin, K.; Lafleche, J.-F.; Barriuso, A.; Torralba, A.; Fidler, S. DatasetGAN: Efficient labeled data factory with minimal human effort. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10145–10155.
Figure 1. Overview of the entire task workflow.
Figure 2. StyleGAN2-ADA model framework, where G denotes the generator, D represents the discriminator, Augmentations refer to various augmentation techniques, and Compute D/G loss pertains to the calculation of the loss for the generator or discriminator.
Figure 3. Consecutive-layer 2D CT grayscale images of the first sandstone column sample.
Figure 4. Images generated by the StyleGAN2-ADA model, trained on consecutive-layer 2D CT grayscale images, exhibiting mode collapse.
Figure 5. LPIPS feature extraction and perceptual judgment prediction flowchart. F represents the network for feature extraction. G represents the small network for predicting human perceptual judgments based on the two sets of global distance measurements.
Figure 6. (a) Original 2D CT grayscale image of Fontainebleau sandstone; (b) pore segmentation of the same region using the OTSU algorithm.
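As a point of reference for Figure 6b, the OTSU baseline can be reproduced in a few lines with scikit-image; the file name below is a placeholder for a 2D CT grayscale slice.

```python
# Reproducing the OTSU baseline of Figure 6b with scikit-image; the file
# name is a placeholder, not part of the released dataset.
from skimage import io
from skimage.filters import threshold_otsu

gray = io.imread("sandstone_slice.png", as_gray=True)
t = threshold_otsu(gray)   # global threshold from the gray-level histogram
pores = gray < t           # pores appear darker than the rock matrix in CT
```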
Figure 7. (a) Real 2D CT grayscale image; (b) 2D CT grayscale image generated by StyleGAN2-ADA.
Figure 8. Comparison of segmentation model test results trained with consistency thresholds set at 28 (high consistency) and 18 (low consistency). The red boxes show the detailed differences between the segmented images.
Figure 9. (a) Real 3D pore structure (800 × 845 × 840); (b) synthetic 3D pore structure (1024 × 1024 × 840).
Figure 10. (a) Throat length distribution of real and synthetic 3D pores; (b) coordination number distribution of real and synthetic 3D pores.
Figure 11. (a) Pore radius distribution of real and synthetic 3D pores; (b) throat shape factor distribution of real and synthetic 3D pores.
Table 1. Parameter settings for CT scanning.

Parameter     Unit    Value
Current       μA      90
Voltage       kV      170
Pixel Size    μm      2.67
Table 2. Effects of various key training parameters on the performance of the generative model.

Total Training Images    Batch Size    IS        KID       LPIPS
500,000                  4             1.0120    7.4536    0.5521
500,000                  8             1.4896    5.3625    0.5405
500,000                  16            1.6057    4.9057    0.5284
1,000,000                4             1.6318    4.4064    0.5272
1,000,000                8             1.8869    3.3446    0.5031
1,000,000                16            2.0023    2.0271    0.4399
2,000,000                4             2.0048    1.1501    0.4385
2,000,000                8             2.1913    0.4203    0.4073
2,000,000                16            2.2428    0.2132    0.3965
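For reference, the LPIPS column in Table 2 can be computed with the lpips Python package accompanying reference [46]. A minimal sketch follows; the random tensors are placeholders for a real and a generated CT slice, scaled to the library's expected [-1, 1] range.

```python
# Sketch of an LPIPS measurement as reported in Table 2; images must be
# (N, 3, H, W) tensors scaled to [-1, 1]. The inputs here are placeholders.
import torch
import lpips

loss_fn = lpips.LPIPS(net="alex")            # AlexNet backbone, as in [46]
real = torch.rand(1, 3, 256, 256) * 2 - 1    # placeholder real CT slice
fake = torch.rand(1, 3, 256, 256) * 2 - 1    # placeholder generated slice
print(loss_fn(real, fake).item())            # lower = perceptually closer
```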
Table 3. Ablation study of different technical combinations.

U-Net    Adaptive Bagging    Pseudo-Label    IoU       Recall    PA        Local Consistency    Porosity Difference
✓        –                   –               0.9615    0.9667    0.9624    0.9649               0.0231
✓        ✓                   –               0.9796    0.9680    0.9724    0.9841               0.0156
✓        –                   ✓               0.9801    0.9699    0.9843    0.9855               0.0128
✓        ✓                   ✓               0.9981    0.9927    0.9993    0.9973               0.0035
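The metrics reported in Tables 3 and 4 can be computed from binary ground-truth and predicted pore masks as in the sketch below; this is our illustration of the standard definitions, where the porosity difference is the absolute difference in pore-pixel fractions.

```python
# Hedged sketch of the metrics in Tables 3 and 4, computed from binary
# ground-truth (gt) and predicted (pred) pore masks of equal shape.
import numpy as np

def segmentation_metrics(gt, pred):
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return {
        "IoU": inter / union,
        "Recall": inter / gt.sum(),                        # pore pixels recovered
        "PA": (gt == pred).mean(),                         # pixel accuracy
        "Porosity Difference": abs(gt.mean() - pred.mean()),
    }
```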
Table 4. Performance comparison of different segmentation methods.

Methods          IoU       PA        Porosity Difference
SegNet           0.9538    0.9567    0.0968
ResNet+U-Net     0.9710    0.9880    0.0552
U-Net++          0.9806    0.9699    0.0313
DAN              0.9851    0.9712    0.0259
EM               0.9866    0.9794    0.0116
MN               0.9880    0.9824    0.0079
Ours             0.9981    0.9993    0.0035
