Sedimentary Facies Identification Technique Based on Multimodal Data Fusion

Yi, Yuchuan; Zhang, Yuanfu; Hou, Xiaoqin; Li, Junyang; Ma, Kai; Zhang, Xiaohan; Li, Yuxiu

doi:10.3390/pr12091840


School of Energy Resources, China University of Geosciences, Beijing 100083, China

* Corresponding author: Yuanfu Zhang.

Processes 2024, 12(9), 1840; https://doi.org/10.3390/pr12091840

Submission received: 29 July 2024 / Revised: 23 August 2024 / Accepted: 26 August 2024 / Published: 29 August 2024

(This article belongs to the Special Issue Contribution of Artificial Intelligence/Big Data to Reservoir Engineering and Reservoir Modeling)


Abstract

Identifying sedimentary facies is a fundamental task in oil and gas exploration. In recent years, geologists have used deep learning methods to predict sedimentary facies comprehensively. However, these methods are usually restricted to a single data modality, which limits the practicality and generalizability of the resulting models. Oilfield data come from multiple heterogeneous sources and are difficult to fuse in a complementary way. To address this, this paper proposes a sedimentary facies identification technique based on multimodal data fusion. It uses multimodal data from core wells, including logging curves, physical properties, textual descriptions, and core images: each modality is first predicted separately, and the unimodal predictions are then combined by decision-level fusion to predict the sedimentary facies comprehensively. The method was applied to 12 core wells on the northwestern margin of the Junggar Basin, China, with good results: the accuracy exceeded 90% on both the validation and test sets. With this method, the sedimentary microfacies of a newly drilled core well can be predicted, and the interpretation of the sedimentary framework in the well area can be updated in real time as data from new core wells become available, significantly improving the efficiency and accuracy of oil and gas exploration and development.

Keywords:

sedimentary facies prediction; multimodal data fusion; hydrocarbon exploration; ResNet-50; sedimentary framework optimization

1. Introduction

The concept of sedimentary facies is fundamental to sedimentology, and sedimentary facies identification is a basic and important task in energy exploration, especially oil and gas exploration. Accurate identification of sedimentary facies is crucial not only for understanding depositional environments and depositional systems but also for assessing reservoir quality, predicting reservoir distribution, and optimizing production strategies. Sedimentary facies are typically identified by experienced geologists who combine geophysics, geochemistry, field investigation, core identification, and other work [1]. However, this manual approach is time-consuming and laborious, and its results are non-unique: geologists must make judgments based on their own experience, which introduces subjectivity into the process. The sedimentary facies delineated in a study area can be verified only at core wells. Where core wells are absent, lithology must be inferred by fitting geophysical data and combined with outcrop investigation and other methods to judge the sedimentary facies, an approach that is subject to a significant margin of error.

With the advancement of technologies such as deep learning, big data analytics, and image processing, and the growth of interdisciplinary work between geoscience and data science, the geology community has begun to apply these technologies to geological problems in both scientific research and production. Examples include mineral prediction, geological mapping, inferring geological formations, geochemical data prediction, and well logging evaluation [2,3,4,5,6,7]. These applications demonstrate the significant potential of deep learning for prediction, classification, and modeling in geology.

Deep learning has previously been applied to geological prediction from seismic and logging profiles. Yang Cun et al. developed an intelligent seismic recognition technique for sedimentary facies that uses weakly supervised learning with small samples by calibrating seismic profiles [8]. However, the seismic profile labels are detailed but narrow in scope, which prevents comprehensive identification of sedimentary facies: the method is confined to classifying carbonate mound-shoal facies and cannot distinguish a diverse range of sedimentary facies. Linqi Zhu et al. investigated a logging interpretation method for deep-sea gas hydrate reservoirs based on unlabeled big data and semi-supervised deep learning [9]. The method partly overcomes the small-sample limitation of traditional methods, but the relationship between hyperparameters and reservoir parameters was not established, which limits the accuracy and reliability of the prediction model. Hui Liu et al. proposed an automatic depositional microfacies identification technique based on DPNN [10]. Although their method achieves high accuracy, its identification results depend strongly on the sample set, and the algorithm needs further improvement to cope with randomly selected sample sets.

All of the above studies predict sedimentary facies from a single data type, chiefly logging curves. However, predicting sedimentary facies from logging curves alone has two inherent limitations. First, the recognition accuracy is not yet high enough to replace manual identification. Second, logging curves are not the only geological data that reflect sedimentary facies; seismic, geochemical, and core data also reflect the regional sedimentary system and its evolution. These data, however, include large volumes of both structured and unstructured data, which are difficult to use in a unified way.

Modality refers to the manner in which something occurs or exists, and multimodality involves the combination of various forms of two or more modalities [11]. Multi-modality Fusion Technology (MFT) in deep learning is the process by which a model synthesizes information from two or more modalities to make predictions as it processes different forms of data during analysis and recognition tasks. The fusion of multimodal data can provide more information for modeling decisions, thus improving the accuracy of the overall results of those decisions [12].

Four modalities of data, namely logging curves, physical property curves, core text descriptions, and core photographs, describe the physical and chemical attributes of geological objects in different ways [13,14,15,16]. Considering these modalities together provides more geological information and can help improve the identification of different sedimentary microfacies.

To address the heterogeneity of multi-source data and the difficulty of fusing different data types in a complementary way, several scholars have proposed methods for constructing multimodal knowledge graphs for the logging domain, which facilitate the fusion and storage of multimodal logging data [17,18,19,20]. However, these methods either only store data without predicting unknown data, or are limited to one type of data and ignore information from other types such as text and images.

In summary, this paper proposes a sedimentary facies identification technique based on deep learning and multimodal data fusion. By fusing data from multiple modalities, such as logging curves, physical property data, core photos, and core descriptions, it improves the accuracy and generalization of deep-learning-based sedimentary facies prediction. The model accurately predicts the sedimentary facies in the study area, achieving an overall accuracy of 94% even when some data are incomplete or missing. This significantly reduces the workload of manual identification and minimizes errors caused by subjective factors. With this method, once the core has been retrieved from a core well, the scanned core-barrel photos can be entered into the computer on site, and the sedimentary microfacies of the well can be identified immediately from the existing logging and physical property curves combined with the core photos. The updated sedimentary facies data can then be used to rapidly adjust the existing sedimentary framework of the entire work area. As oil and gas development in China continues to accelerate, more core wells will be drilled and their data increasingly used in actual production. Real-time analysis and adjustment allow drilling plans and resource allocation to be better optimized, further enhancing the benefits of oil and gas exploration and development. The method also offers new ideas for other fields of geological research and is expected to find wider geological applications.

2. Methodology

2.1. Framework of the Prediction Model

The multimodal fusion prediction model proposed in this paper comprises five main components: a random forest model for logging curves, a K-nearest neighbor model for physical property curves, a bagging classifier for textual descriptions, a residual convolutional neural network for core images, and a decision-level fusion module that combines their outputs. The framework adopts an object-based analysis approach, integrating logging curves, core images, physical properties, and textual descriptions to enhance sedimentary facies identification. Within this framework, each data type is treated as a component of a larger geological system, and specific geological features are regarded as distinct objects. For instance, in core images, regions corresponding to sedimentary layers are identified and analyzed as objects rather than as individual pixels. Similarly, logging curves and physical property data are processed as independent geological objects from which key geological structures or facies features are identified.

The prediction model works as follows. First, each type of data is preprocessed. Then, random forest, K-nearest neighbor, bagging classifier, and convolutional neural network models extract features from the respective modalities. During feature extraction, the object-based analysis framework helps capture and integrate high-level geological patterns from the various data types, enhancing the predictive capability and interpretability of each unimodal model. Next, a separate model is trained on the features of each modality to predict sedimentary facies. Finally, the unimodal predictions are combined by weighted voting at the decision level to produce the final sedimentary facies prediction.

2.2. Multimodal Feature Extraction

2.2.1. Logging Curve Feature Extraction

Random forest is an ensemble learning method that constructs multiple decision trees and derives the final prediction by voting or averaging [21,22]. The model structure is illustrated in Figure 1. Combining multiple models effectively mitigates overfitting and enhances the model’s generalization ability. The logging curve dataset contains many measurement parameters, including RT (true resistivity), AC (acoustic compressional), and RI (resistivity index). These parameters are high-dimensional and diverse and have intricate nonlinear relationships with one another, which calls for sufficiently expressive models. In addition, logging curve data often contain noise and missing values, which can lead to overfitting if not adequately addressed [23,24,25,26].

The random forest algorithm can avoid overfitting to a certain degree. Each tree is trained on a random sample of the data and considers a random subset of features, which makes the forest resistant to noise. Each split in a tree is based on a single feature, so the feature space is partitioned step by step and suitable split points can be found even when the feature space is high-dimensional; this makes random forest well suited to high-dimensional data. Even if some features are not selected in some trees, important features still contribute through other trees as long as enough trees are combined, which improves the stability and robustness of the overall model. Accordingly, random forest was selected to process the logging curve data: its feature importance mechanism identifies the most salient logging curves, clarifies the relationship between logging parameters and sedimentary facies, and supports the predictions. Despite these advantages, random forest may still face challenges such as class imbalance and reduced interpretability. To address these weaknesses, we used five-fold cross-validation to reduce overfitting and increase model robustness, and we carefully selected features to enhance the stability and reliability of the model.
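A minimal sketch of the random forest step with five-fold cross-validation, written with scikit-learn (an assumption; the text does not name a library). The data are synthetic placeholders for depth-sampled logging curves, and the hyperparameters, including `class_weight="balanced"` as one possible response to class imbalance, are illustrative rather than the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for depth-sampled logging curves (hypothetical data):
# each row is one depth sample, each column one curve (e.g. SP, AC, RT, RI, ...).
rng = np.random.default_rng(0)
n_samples, n_curves, n_facies = 600, 16, 5
y = rng.integers(0, n_facies, size=n_samples)   # facies codes 0..4
X = rng.normal(size=(n_samples, n_curves))
X[:, 0] += 1.5 * y                               # make curve 0 informative
X[:, 1] -= 1.0 * y                               # and curve 1 as well

rf = RandomForestClassifier(
    n_estimators=200,         # number of trees in the forest
    max_features="sqrt",      # random feature subset considered at each split
    class_weight="balanced",  # assumption: one common way to soften class imbalance
    random_state=0,
)

# Five-fold cross-validation, stratified so every fold keeps the facies mix.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=cv, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Impurity-based feature importances after fitting on all samples.
rf.fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print("most important curve indices:", ranking[:3])
```

Folds here are drawn from individual depth samples for simplicity; because neighbouring samples in a well are strongly correlated, grouping folds by well would give a more conservative estimate.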

The K-nearest neighbor (KNN) algorithm is one of the most basic and classical classification methods in data mining; it classifies a sample by measuring the distance between feature vectors [27,28]. The idea is simple: an n-dimensional input vector corresponds to a point in the feature space, and the output is the category label (or predicted value) assigned to that point on the basis of its nearest neighbors.

The KNN algorithm is expressed by Equations (1) and (2). Equation (1) gives the distance d(x_{i}, x_{j}) between samples x_{i} and x_{j}:

d(x_{i}, x_{j}) = \sqrt{\sum_{k=1}^{n} (x_{i}^{k} - x_{j}^{k})^{2}}

(1)

where n is the number of feature dimensions, and x_{i}^{k} and x_{j}^{k} are the values of samples x_{i} and x_{j} on the k-th feature.

Equation (2) expresses the category decision by majority voting:

\hat{y} = \arg\max_{c \in C} \sum_{i=1}^{K} \delta(y_{i}, c)

(2)

where \hat{y} is the predicted category, C is the set of all possible categories, K is the number of nearest neighbors selected, y_{i} is the true category of the i-th neighbor, and \delta(y_{i}, c) is the indicator function, which equals 1 when y_{i} = c and 0 otherwise. In practice, the K samples most similar to the new sample are selected from the sample set, K usually being an integer no greater than 20, and the category that occurs most frequently among these K neighbors is assigned to the new sample.
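Equations (1) and (2) can be sketched directly in NumPy. This is an illustrative from-scratch version, not the implementation used in the study; the two features (porosity and log10 permeability) and the facies codes in the usage example are hypothetical.

```python
import numpy as np
from collections import Counter

def euclidean(xi, xj):
    """Equation (1): Euclidean distance over the n feature dimensions."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    return float(np.sqrt(np.sum((xi - xj) ** 2)))

def knn_predict(X_train, y_train, x_new, K=5):
    """Equation (2): majority vote among the K nearest training samples."""
    dists = np.array([euclidean(x, x_new) for x in X_train])
    nearest = np.argsort(dists, kind="stable")[:K]   # indices of the K closest samples
    votes = Counter(y_train[i] for i in nearest)     # per class c: sum of delta(y_i, c)
    return votes.most_common(1)[0][0]                # arg max over classes c in C

# Hypothetical training samples: [porosity (fraction), log10 permeability]
X_train = np.array([[0.05, 0.0], [0.06, 0.1], [0.07, 0.2],
                    [0.20, 1.7], [0.22, 1.8], [0.21, 1.9]])
y_train = np.array([3, 3, 3, 1, 1, 1])               # hypothetical facies codes
print(knn_predict(X_train, y_train, [0.19, 1.6], K=3))
```

The three nearest neighbours of the query all carry code 1, so the vote returns 1.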

Physical property curves provide important information about the formation, including porosity, permeability, oil saturation, and resistivity. Although both physical property curves and logging curves reflect the subsurface strata, they differ in data volume, complexity, and characteristics, so different algorithms were selected to process them.

The logging curve dataset is larger and more complex and therefore calls for a more powerful algorithm; random forest, which handles high-dimensional data and resists overfitting, was selected for it. In contrast, the physical property dataset is relatively small and low-dimensional, and its data are smoother and more structured, which makes the distance-based KNN algorithm a good fit [29,30,31,32]. However, KNN is sensitive to the choice of K, and its performance may degrade with noisy data or when samples are unevenly distributed across categories. To mitigate these issues, we selected K by cross-validation and applied preprocessing to reduce noise in the physical property dataset.
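A sketch of selecting K by cross-validation, assuming scikit-learn. Features are standardized first because Equation (1) is scale-sensitive: a curve measured in large units would otherwise dominate the distance. The three synthetic curves stand in for POR, PERM, and SO and are not study data; the odd candidate values up to 19 reflect the "no more than 20" rule of thumb.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical physical property samples for three facies classes.
rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 3, size=n)
X = np.column_stack([
    0.08 + 0.05 * y + rng.normal(0, 0.02, n),  # porosity (fraction)
    0.5 + 0.8 * y + rng.normal(0, 0.3, n),     # log10 permeability
    0.3 + 0.1 * y + rng.normal(0, 0.05, n),    # oil saturation
])

# Scaling inside the pipeline is refit on each training fold, avoiding leakage.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
search = GridSearchCV(
    pipe,
    param_grid={"kneighborsclassifier__n_neighbors": list(range(1, 21, 2))},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)
best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("best K:", best_k, "CV accuracy:", round(search.best_score_, 3))
```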

2.2.2. Text Data Feature Extraction

A bagging classifier is an ensemble learning algorithm that improves classification performance by combining, through voting, the predictions of multiple weak classifiers [33,34]. It fits multiple instances of a black-box estimator, each on a random subset of the original training set, and then aggregates their individual predictions into the final prediction, as shown in Figure 2. Introducing randomization into the construction of the base estimators (e.g., decision trees) and then combining them reduces their variance. The final prediction is

H(x) = \mathrm{majority\_vote}\{h_{1}(x), h_{2}(x), \dots, h_{T}(x)\}

(3)

where H(x) is the final classification result, h_{i}(x) is the prediction of the i-th classifier, and T is the number of classifiers.
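Equation (3) can be illustrated with a hand-rolled bagging ensemble: T decision trees, each fitted on a bootstrap resample, combined by majority vote. This is a sketch assuming scikit-learn's `DecisionTreeClassifier` as the base estimator (also the default base estimator of scikit-learn's `BaggingClassifier`); the toy features stand in for label-encoded text attributes and are hypothetical.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, T=25, seed=0):
    """Fit T base classifiers h_1..h_T, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for t in range(T):
        idx = rng.integers(0, len(X), size=len(X))   # draw len(X) rows with replacement
        models.append(DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx]))
    return models

def predict_bagging(models, X):
    """Equation (3): H(x) = majority vote over h_1(x), ..., h_T(x)."""
    votes = np.array([m.predict(X) for m in models])  # shape (T, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

# Hypothetical label-encoded text attributes: [color, grain size, lithology] codes.
rng = np.random.default_rng(2)
X = rng.integers(0, 4, size=(200, 3))
y = X[:, 1]                        # toy rule: facies follows the grain-size code
models = fit_bagging(X, y)
pred = predict_bagging(models, X)
print("training accuracy:", (pred == y).mean())
```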

The textual description data contain detailed descriptions of core features, such as depth, lithology, color, grain size, and length. Unlike the other data types, text descriptions are unstructured and must be preprocessed before they can be used for model training. A bagging classifier suits such data because it samples the data multiple times, trains multiple classifiers, and votes on their results, which improves robustness and accuracy. Furthermore, the volume of textual description data is relatively small, so the predictions of a single classifier may be swayed by specific data points and vary considerably. By combining the predictions of multiple classifiers, bagging reduces these fluctuations and improves the stability of the model [35,36]. One limitation of the method is reduced interpretability: because the final prediction aggregates many classifiers, it is difficult to determine the contribution of each feature to the final decision. This “black-box” nature limits how fully the model’s behavior can be explained.

2.2.3. Core Image Feature Extraction

A convolutional neural network (CNN) is a deep learning model designed to process image data [37,38]. Through a combination of convolutional, pooling, and fully connected layers, CNNs efficiently capture hierarchical spatial features in images. The core idea of a CNN is to use local connectivity and weight sharing to reduce the number of parameters and improve generalization. ResNet-50 is a 50-layer deep residual network, a CNN variant built from convolutional, batch normalization, activation, and fully connected layers, and is available in pre-trained form [39,40]. By introducing skip connections, ResNet-50 mitigates the vanishing and exploding gradient problems that arise as networks deepen, making it possible to train deeper networks. ResNet-50 has achieved excellent performance on classical datasets such as ImageNet, demonstrating its powerful feature extraction capability [41].
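The skip connection can be illustrated with a toy residual block in NumPy. This is a simplified, fully connected sketch (real ResNet-50 blocks are convolutional bottleneck blocks with batch normalization); it shows only the core idea out = ReLU(F(x) + x). The identity path gives gradients a direct route back through the network, and when the learned residual F is zero the block simply outputs ReLU(x).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Toy residual block: out = ReLU(F(x) + x), with F(x) = W2 @ ReLU(W1 @ x)."""
    f = W2 @ relu(W1 @ x)      # learned residual branch F(x)
    return relu(f + x)         # skip connection adds the input back before the activation

x = np.array([1.0, -2.0, 3.0])
zeros = np.zeros((3, 3))
identity = np.eye(3)
print(residual_block(x, zeros, zeros))        # F(x) = 0, so the block outputs ReLU(x)
print(residual_block(x, identity, identity))
```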

Core images are high-resolution, richly detailed scans of cores recovered from wells during geological exploration. Because of differences in composition, handling, and depositional environment, core images show different grain sizes, textures, colors, and other characteristics, and they are typical unstructured data. Instead of performing pixel-level analysis, we apply object-based analysis to the core images. With its deep architecture and residual connections, ResNet-50 can effectively extract high-level features from images and is therefore suitable for processing complex core images [42,43,44], as shown in Figure 3.

Because drilling and coring are very costly, the number of core wells and core images is limited, and it is difficult to provide a large amount of labeled data for deep learning. ResNet-50, however, has been pre-trained on the ImageNet dataset, so it has already learned a large number of generic image features that can be transferred to core image feature extraction, reducing the reliance on large labeled datasets. Despite these advantages, ResNet-50 has limitations, the most significant being that overfitting remains a risk on small datasets such as the one used in this study.
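Because the network was pre-trained on ImageNet, its inputs are normally normalized with the same per-channel statistics. The following NumPy sketch mirrors the preprocessing described in Section 4.1 (resize to 224 × 224, random horizontal flip, ImageNet normalization); random rotation is omitted for brevity, and nearest-neighbour resizing is a simplification. In a PyTorch workflow the same steps would typically be composed from `torchvision.transforms` (`Resize`, `RandomHorizontalFlip`, `RandomRotation`, `ToTensor`, `Normalize`).

```python
import numpy as np

# ImageNet per-channel statistics (RGB order) used with torchvision's pre-trained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess_core_image(img, size=224, train=True, rng=None):
    """Resize an H x W x 3 uint8 core scan to size x size (nearest neighbour),
    randomly flip it horizontally during training, and normalize each channel
    with ImageNet statistics. Returns a 3 x size x size float array (CHW)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    rows = np.arange(size) * h // size          # source row for each output row
    cols = np.arange(size) * w // size          # source column for each output column
    out = img[rows][:, cols].astype(np.float64) / 255.0
    if train and rng.random() < 0.5:
        out = out[:, ::-1]                      # random horizontal flip
    out = (out - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization
    return out.transpose(2, 0, 1)               # HWC -> CHW layout used by PyTorch

scan = np.full((600, 300, 3), 128, dtype=np.uint8)  # placeholder uniform gray image
x = preprocess_core_image(scan, train=False)
print(x.shape)
```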

2.3. Multimodal Data Fusion

Multimodal data fusion uses information from different data sources to improve the precision of predictions. Fusion methods fall into three main categories: early fusion, late fusion, and hybrid fusion [45,46]. Early fusion, also known as feature-level fusion, combines data from different modalities at the feature extraction stage: the features from all data sources are concatenated into a single feature vector, which is then fed into one model for training and prediction. Because all information is integrated into one vector, the model can learn relationships between the data sources and make predictions with a single model, avoiding the complexity of training several models. However, merging the features of all sources produces high-dimensional feature vectors that are prone to the “curse of dimensionality”. In addition, features from different sources may suffer from different quality problems, and merging them directly can introduce noise [47].
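A sketch of early fusion with hypothetical feature sizes shows how quickly the fused vector grows: 16 logging curves, 6 physical property values, 6 encoded text attributes, and a 2048-dimensional pooled ResNet-50 embedding (the size of ResNet-50's final pooling layer) concatenate into a 2076-dimensional input. The random arrays are placeholders, not study data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                                # number of depth samples
logging_feats = rng.normal(size=(n, 16))             # e.g. 16 logging curves
physical_feats = rng.normal(size=(n, 6))             # e.g. POR, PERM, SO, ...
text_feats = rng.integers(0, 5, size=(n, 6)).astype(float)  # label-encoded text attributes
image_feats = rng.normal(size=(n, 2048))             # e.g. pooled ResNet-50 embedding

# Early (feature-level) fusion: one long vector per sample, fed to a single model.
fused = np.hstack([logging_feats, physical_feats, text_feats, image_feats])
print(fused.shape)
```

The image block alone contributes 2048 of the 2076 dimensions, which illustrates both the dimensionality problem and the imbalance between modalities.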

Late fusion, also called decision-level fusion, first trains a separate model on each modality and then fuses the outputs of these models into a final prediction by voting or weighted voting. Because each unimodal model is trained independently, it can fully exploit the characteristics of its data type, and interference between data sources is reduced. At the same time, integrating the decisions of multiple models improves the robustness of the whole system and reduces the biases and errors of any single model [48,49,50].

Hybrid fusion, i.e., combining feature-level fusion and decision-level fusion, can theoretically synthesize the advantages of the two and maximize the use of information from multimodal data, but it also combines the shortcomings of the two methods, increasing the structural complexity of the model and the difficulty of training, and has limitations in practical applications.

Which of early fusion and late fusion performs better depends on the specific problem. Early fusion tends to outperform late fusion when the modalities are strongly correlated, whereas late fusion is more appropriate when the modalities are largely uncorrelated, for example, when they differ greatly in dimensionality and sampling rate. Therefore, this study uses late fusion, also known as decision-level fusion, to fuse the multimodal data.

First, the logging data, physical property data, text data, and image data are each preprocessed and their features extracted. Next, each modality is modeled with a suitable algorithm: random forest, KNN, bagging classifier, and CNN, respectively. Finally, decision-level fusion combines the predictions of the four models by weighted majority voting to obtain more accurate and stable sedimentary facies predictions. The framework of this prediction model is illustrated in Figure 4.
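Weighted majority voting at the decision level can be sketched as below. The weights are hypothetical (the text does not state how they were set), and marking an unavailable modality with -1 is an assumption that shows one way the fusion step could tolerate missing data.

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """Decision-level fusion by weighted majority voting.

    predictions: dict modality -> sequence of predicted class codes (one per sample)
    weights:     dict modality -> non-negative weight for that modality's vote
    A code of -1 (assumed convention) means the modality is missing for that sample.
    """
    n_samples = len(next(iter(predictions.values())))
    scores = np.zeros((n_samples, n_classes))
    for name, preds in predictions.items():
        for s, c in enumerate(np.asarray(preds)):
            if c >= 0:                        # skip missing modalities
                scores[s, c] += weights[name]
    return scores.argmax(axis=1)              # class with the largest weighted vote

# Hypothetical unimodal predictions for four depth samples.
preds = {
    "logging":  [0, 1, 2, 3],
    "physical": [0, 1, 1, 3],
    "text":     [1, 1, 1, -1],
    "image":    [0, 2, 1, 4],
}
w = {"logging": 0.3, "physical": 0.2, "text": 0.2, "image": 0.3}  # hypothetical weights
fused = weighted_vote(preds, w, n_classes=5)
print(fused)
```

In the third sample, three modalities agreeing on class 1 (total weight 0.7) outvote the logging model's class 2 (weight 0.3).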

3. General Situation of Geology

3.1. Geological Features

The study area is located on the northwestern margin of the Junggar Basin. Tectonically, it lies in the northeastern section of the lower plate of the Ke-Wu fault belt on the northwestern margin of the Junggar Basin. It borders the Jayer and Hala’alat Mountains to the northwest and extends to the Mahu Sag in the east [51] (Figure 5).

The basement of the northwestern margin of the Junggar Basin formed in the Carboniferous. In the Late Carboniferous, the crust was uplifted by the Hercynian movement, forming the Ke-Wu fault belt. Activity on the Ke-Wu fault slowed in the Triassic, but differential sedimentation persisted. In the Late Triassic, the lake level rose and thick mudstone layers developed. At the end of the Triassic, crustal uplift exposed the Baijiantan Formation to denudation, and a weathering surface developed at its top. Fault activity nearly ceased in the Jurassic. After the Cretaceous, deposition stabilized and regional tectonic effects were minor.

In the Early Triassic, the northwestern margin developed a depositional system of coarse clastic deposits from alluvial (flood) fans, rivers, lacustrine deltas, and submarine fans. In the Middle Triassic, it evolved into a depositional system of alluvial fans, submarine fans, fan deltas, deltas, and shore-shallow lakes. By the Late Triassic, the area was dominated by thick, extensive shore-shallow lacustrine mudstone. The logging, physical property, core, and textual description data used in this study come from the Middle Triassic Karamay Formation of the northwestern margin.

3.2. Sedimentary Types

Core observation (Figure 6), combined with thin-section and grain-size analysis, shows that the Karamay Formation in the study area belongs to fan-delta facies, which are divided into two subfacies: fan-delta plain and fan-delta front. These are further subdivided into five microfacies: main channel, distributary channel, channel edge, interchannel, and debouch bar [54].

The main channel microfacies consist mainly of conglomerate, which accounts for ≥50% of the rock, with thicknesses of ≥5 m. The logging curve is box-shaped, reflecting relatively stable flow energy and sediment supply; the curve amplitude varies little from bottom to top, and the physical properties are good.

In the distributary channel microfacies, conglomerate makes up 25% to 50% of the rock, and thicknesses range from 2 m to 6 m. The logging curves are bell-shaped, indicating upward fining of grain size, which reflects a gradual decrease in the supply rate and strength of the clastic material from bottom to top; the physical properties are moderate.

The channel edge microfacies consist of sand encased in mud, with a conglomerate proportion below 25%, a reservoir thickness of 1–3 m, and low amplitude on the logging curve. They are mostly spindle-shaped (“date-pit” shaped) and have the poorest physical properties.

The debouch bar microfacies develop between river channels. The sand bodies are relatively well sorted and thick, typically showing reverse (coarsening-upward) or composite rhythms. Toward their edges, the sand bodies thin out gradually and become interbedded. Sand body thickness is ≥3 m, and the logging curves are funnel-shaped, reflecting a gradual increase in the supply rate and strength of the detrital material; the physical properties are good.

Sedimentary structures in the interchannel microfacies include small cross-bedding, wavy bedding, and sandstone lenses, with occasional thin sandstone and conglomerate layers. The subaerial interchannel is dominated by brownish-red and mottled mudstone and siltstone, whereas the subaqueous interchannel consists mainly of gray and dark-gray mudstone and silty mudstone.

4. Experiments and Results Analysis

4.1. Data Description and Processing

A total of 12 core wells, designated A to L, were selected for this study; details of the dataset are presented in Table 1. Wells A through K were used as the training and validation sets, while well L was held out as a test set to assess the model’s generalization ability. First, the sedimentary facies at the corresponding depths were identified manually for all 12 core wells by examining the core photos and checking the textual descriptions, as shown in Table 2 (the sedimentary microfacies codes are listed in Table 3). These manually identified microfacies served as the labels for model training.
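A sketch of the well-level split, assuming pandas: holding out whole wells, rather than random depth samples, keeps strongly correlated neighbouring samples from the same well out of the test set. The table is a hypothetical stand-in for the dataset in Table 1. For validation folds within wells A to K, scikit-learn's `GroupKFold` with the well name as the group is one option.

```python
import pandas as pd

# Hypothetical depth-indexed samples, each tagged with the well it came from.
df = pd.DataFrame({
    "well":   list("AABBCCLL"),
    "depth":  [1510.0, 1510.5, 1620.0, 1620.5, 1705.0, 1705.5, 1800.0, 1800.5],
    "facies": [1, 1, 2, 2, 3, 3, 1, 2],
})

TEST_WELLS = {"L"}                  # well L is held out entirely
is_test = df["well"].isin(TEST_WELLS)
test_df = df[is_test]               # used only for the final evaluation
trainval_df = df[~is_test]          # wells A to K: training and validation
print(len(trainval_df), len(test_df))
```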

(1) A total of 16 curves, including SP (spontaneous potential), AC (acoustic), RT (true formation resistivity), and RI (resistivity index), were selected for the logging curve dataset. The GR (gamma ray) curve is typically used to reflect lithological changes. However, the region was affected by volcanic activity (Figure 7 shows tuff in well L), so the GR curve contains outliers and was excluded from the analysis [55,56].

(2) The physical property curves include ART, ARI, AAC, POR, PERM, SO, and others. Among these, the POR, PERM, and SO curves provide a more intuitive reflection of the porosity, permeability, and fluid content of the rocks. These curves are crucial for highlighting the characteristics of sedimentary facies and aiding in the identification of stratigraphic properties. For example, higher porosity and permeability are commonly found in sandy sedimentary facies with high storage capacity, such as distributary channels and debouch bars.

(3) The logging and physical property curves are already in a standardized format and need little processing beyond outlier removal. The text descriptions and core images, however, require more intricate preprocessing. First, key features such as top depth, bottom depth, color, grain size, composition, and lithology are extracted from the text descriptions so that the information is accurate and structured. The extracted data are then cleaned: invalid or anomalous values are removed, missing data are handled, the data format is standardized, and the terms used for color, grain size, and composition are unified so that descriptions are consistent across records. Finally, the categorical features are converted to numerical features by label encoding (LEC) so that the model can process them.

(4) The core images require extensive manual labeling. Every image must be labeled with its depth and sedimentary facies, following the naming rule “well name_cylinder number-section number_core top depth_core bottom depth_sedimentary microfacies”. After labeling, all images were preprocessed and augmented. First, each image was resized to 224 × 224 pixels. Random horizontal flipping and rotation were then applied to enrich the image data. Finally, the images were converted into tensors and normalized using ImageNet’s standard mean and standard deviation.
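Items (3) and (4) describe two preprocessing paths. The plain-Python sketch below illustrates both under stated assumptions. The synonym table, the example records, and the example file name are hypothetical. Dropping incomplete records is only one way to handle missing data. The file-name parser assumes that no field contains an underscore and that the cylinder and section numbers are joined by a single hyphen. The channel means and standard deviations are the ImageNet statistics commonly used with pretrained torchvision models. In a PyTorch pipeline, the image steps would typically map to torchvision's `Resize((224, 224))`, `RandomHorizontalFlip()`, `RandomRotation(...)`, `ToTensor()`, and `Normalize(mean, std)`. The paper names neither the library nor the rotation range.

```python
from dataclasses import dataclass

# ---- Text descriptions, item (3) ----
# Hypothetical synonym table; the paper does not publish its controlled vocabulary.
TERM_SYNONYMS = {
    "grey": "gray",
    "grey-green": "gray-green",
    "gray green": "gray-green",
    "greenish gray": "gray-green",
}

def normalize_term(value):
    """Lower-case, collapse whitespace, and map synonyms to one canonical term."""
    term = " ".join(value.strip().lower().split())
    return TERM_SYNONYMS.get(term, term)

def label_encode(values):
    """Map each distinct category to an integer in sorted order (as scikit-learn's LabelEncoder does)."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping

records = [  # hypothetical extracted records
    {"top": 1832.1, "bottom": 1833.0, "color": "Grey-Green", "lithology": "conglomerate"},
    {"top": 1833.0, "bottom": 1833.6, "color": "gray green", "lithology": "fine sandstone"},
    {"top": 1833.6, "bottom": 1834.2, "color": None, "lithology": "mudstone"},
    {"top": 1834.2, "bottom": 1835.0, "color": "brown", "lithology": "mudstone"},
]
# Drop records with a missing field or an inverted depth interval, then canonicalize terms
clean = [
    {**r, "color": normalize_term(r["color"]), "lithology": normalize_term(r["lithology"])}
    for r in records
    if r["color"] is not None and r["lithology"] is not None and r["top"] < r["bottom"]
]
color_codes, color_map = label_encode([r["color"] for r in clean])
litho_codes, litho_map = label_encode([r["lithology"] for r in clean])

# ---- Core images, item (4) ----
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

@dataclass
class CoreImageLabel:
    well: str
    cylinder: int
    section: int
    top_depth: float
    bottom_depth: float
    microfacies: str

def parse_core_label(stem):
    """Parse 'well_cylinder-section_top_bottom_microfacies' (file extension already removed)."""
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected label format: {stem!r}")
    well, cyl_sec, top, bottom, facies = parts
    cylinder, section = cyl_sec.split("-")  # raises ValueError unless exactly one hyphen
    label = CoreImageLabel(well, int(cylinder), int(section), float(top), float(bottom), facies)
    if label.top_depth >= label.bottom_depth:
        raise ValueError("top depth must be shallower than bottom depth")
    return label

def normalize_pixel(rgb):
    """Scale 8-bit RGB to [0, 1], then standardize each channel with ImageNet statistics."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

label = parse_core_label("B_3-12_1832.50_1833.10_main channel")
```

Label encoding imposes an arbitrary order on the categories (here brown = 0, gray-green = 1). Tree-based ensembles generally tolerate this ordering. One-hot encoding is the usual alternative when a model relies on distances between encoded values.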

The multimodal data used in this study, including logging curves, physical properties, and core images, are highly heterogeneous because of differences in measurement units and data acquisition methods. To reduce the systematic errors this heterogeneity can introduce, we used data normalization to put the different data types on comparable scales and noise reduction to eliminate spurious data points. We also used cross-validation to assess model performance across multiple folds, which checks that the models remain robust when trained on varied, heterogeneous data. These measures reduce the risk of systematic errors arising from data heterogeneity and improve the reliability and generalizability of the models.
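The paragraph does not specify which normalization was used. The sketch below assumes z-score standardization, a common choice, and the curve names and values are illustrative. The mean and standard deviation are fitted on the training rows only, so validation and test wells do not influence the scaling.

```python
from statistics import mean, pstdev

def fit_zscore(train_columns):
    """Return {curve: (mean, std)} computed on training rows only."""
    stats = {}
    for name, values in train_columns.items():
        sd = pstdev(values)
        stats[name] = (mean(values), sd if sd > 0 else 1.0)  # guard against constant curves
    return stats

def apply_zscore(columns, stats):
    """Standardize each curve with statistics fitted on the training set."""
    return {
        name: [(v - stats[name][0]) / stats[name][1] for v in values]
        for name, values in columns.items()
    }

# Hypothetical training values for two curves
train = {"AC": [220.0, 240.0, 260.0], "POR": [0.10, 0.12, 0.14]}
stats = fit_zscore(train)
scaled_train = apply_zscore(train, stats)
scaled_test = apply_zscore({"AC": [240.0, 280.0]}, stats)
```

Putting all curves on a common scale matters most for distance-based models such as KNN. Without scaling, a curve measured in large units would dominate the distance.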

After preprocessing, the sedimentary facies were matched by depth with the logging data, the physical property data, and the text descriptions. The core images had already been labeled manually and needed no further processing. Finally, the data from well L were held out as an independent test set, and the remaining data were used for training and validation.
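Holding out an entire well, as done here with well L, keeps every depth point of the test well out of training. A minimal sketch follows. The 20% validation fraction and the seed are assumptions, because the paper does not state how the training and validation sets were divided. A random split of the remaining depth points places adjacent, strongly correlated samples in both training and validation. This can make validation scores somewhat optimistic relative to a held-out well.

```python
import random

def split_by_well(samples, test_wells, val_fraction=0.2, seed=42):
    """Hold out whole wells as the test set, then randomly split the rest."""
    test = [s for s in samples if s["well"] in test_wells]
    rest = [s for s in samples if s["well"] not in test_wells]
    random.Random(seed).shuffle(rest)
    n_val = int(round(len(rest) * val_fraction))
    return rest[n_val:], rest[:n_val], test  # train, validation, test

# Hypothetical samples: 10 depth points in each of wells A, B, and L
samples = [{"well": w, "depth": 1800.0 + d} for w in ("A", "B", "L") for d in range(10)]
train, val, test = split_by_well(samples, {"L"})
```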

4.2. Unimodal Experiments and Results

4.2.1. Well Logs Prediction Experiment

The computations for this and all subsequent experiments were performed on a system with an Intel Core i5-12400F CPU (2.50 GHz), 32 GB of RAM, and an NVIDIA GeForce RTX 4060 Ti graphics card with 16 GB of memory, running 64-bit Windows 10.

The random forest algorithm was selected as the most appropriate method for predicting sedimentary facies from the logging data. After the logging data were preprocessed, the optimal random forest hyperparameters were selected by cross-validation.

To find the optimal hyperparameter combination, the range of each hyperparameter was first defined as follows:

- number of trees: 100, 200, 300, 400, 500;
- maximum depth: 10, 20, 30, 40, 50;
- maximum number of features: ‘auto’, ‘sqrt’, ‘log2’;
- minimum number of samples required to split a node: 2, 5, 10;
- minimum number of samples per leaf: 1, 2, 4.

Every possible combination was then evaluated with 5-fold cross-validation. The dataset was randomly partitioned into five subsets; four were used for training and one for validation. The process was repeated five times so that each subset served once as the validation set. The combination with the highest average accuracy was selected, and the final random forest model was trained with these hyperparameters. Its performance was evaluated with accuracy, precision, recall, and F1 score (Table 4), and its confusion matrix is shown in Figure 8a.
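The search described above is an exhaustive grid search. It covers 5 × 5 × 3 × 3 × 3 = 675 combinations, each scored with 5-fold cross-validation, for 3375 fits in total. The sketch below shows the control flow in plain Python. `evaluate` is a hypothetical callback that would train a random forest on the training indices and return validation accuracy; scikit-learn's `RandomForestClassifier` is one option for that model. scikit-learn's `GridSearchCV` performs the same kind of search. Recent scikit-learn releases no longer accept `max_features='auto'` for random forests. For classifiers, `'auto'` was equivalent to `'sqrt'`, so the grid above effectively contains only two distinct options for that parameter.

```python
import random
from itertools import product

# Search space as listed in the text
PARAM_GRID = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [10, 20, 30, 40, 50],
    "max_features": ["auto", "sqrt", "log2"],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

def iter_grid(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def kfold_indices(n, k=5, seed=0):
    """Randomly split range(n) into k folds; yield (train, validation) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])

def grid_search(grid, n, evaluate, k=5):
    """Return the combination with the highest mean validation score.

    evaluate(params, train_idx, val_idx) is a hypothetical callback that trains a
    model on train_idx and returns its accuracy on val_idx.
    """
    best_params, best_score = None, float("-inf")
    for params in iter_grid(grid):
        # Same seed -> identical folds for every combination, so scores are comparable
        scores = [evaluate(params, tr, va) for tr, va in kfold_indices(n, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```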

The confusion matrix shows that the overall prediction accuracy of the model reaches 94%. Predictions are best for the main channel (Facies 1) and interchannel (Facies 4), with recall as high as 97%. They are poorer for the channel edge (Facies 3) and debouch bar (Facies 5), with recall below 85%. This is partly due to class imbalance: the channel edge and debouch bar have significantly fewer samples than the other three microfacies, which leads to lower recall for these two classes even though their precision is higher.
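To make the recall and precision argument concrete, the sketch below derives accuracy, precision, recall, and F1 from a confusion matrix. Recall for a class is read along its row (true samples), and precision is read down its column (predicted samples). The matrix is a toy example, not the data behind Figure 8a.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns overall accuracy and a (precision, recall, F1) tuple per class.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))  # column sum
        actual_c = sum(cm[c])  # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    return accuracy, metrics

# Toy two-class matrix (not the paper's data): 100 majority and 20 minority samples
cm = [[98, 2],
      [6, 14]]
accuracy, metrics = per_class_metrics(cm)
```

In this toy case, the minority class has recall 14/20 = 0.70 but precision 14/16 = 0.875. Six of its samples are missed, but only two majority samples are wrongly assigned to it. This is the low-recall, higher-precision pattern the text describes for the channel edge and debouch bar.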

In addition, the main channel is typically characterized by high-energy sedimentary features, such as coarser grains and consequently higher porosity. The interchannel, by contrast, is characterized by finer grains, lower porosity, and more mud. These pronounced differences allow the model to distinguish between these two microfacies more accurately.

The confusion matrix also shows that channel edges are misclassified as the main channel and interchannel, and that debouch bars are often misclassified as the distributary channel and interchannel. Besides the relatively small amount of training data for these two microfacies, this misclassification is likely due to similar lithologies occurring in different sedimentary microfacies. Near the channel, channel edges contain coarser grains and thicker deposits, similar to the main channel. Farther from the channel, they are dominated by fine-grained sediments, similar to the interchannel. Debouch bars, which develop between channels, consist mainly of well-sorted fine sand and silt, which are also common in distributary channels and interchannels.

4.2.2. Physical Properties Prediction Experiment

The physical property curves have a data structure similar to that of the logging curves, but they are markedly less complex and less unstructured, so the relatively simple KNN model was chosen for prediction. The same cross-validation method was used to tune the KNN hyperparameters. The optimal combination was the Manhattan distance metric, K = 9, and distance-weighted voting. The final model performance is presented in Table 5, and the confusion matrix is shown in Figure 8b.
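The selected configuration (Manhattan distance, K = 9, distance-weighted voting) can be sketched in plain Python as follows. In scikit-learn, the same setting is `KNeighborsClassifier(n_neighbors=9, metric='manhattan', weights='distance')`. The feature vectors below are hypothetical. The rule that lets exact matches vote alone is an assumption made to avoid dividing by zero.

```python
from collections import defaultdict

def manhattan(a, b):
    """L1 (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=9):
    """Distance-weighted k-NN vote with Manhattan distance."""
    nearest = sorted(
        ((manhattan(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    exact = [y for d, y in nearest if d == 0]
    if exact:
        # Exact matches would get infinite weight; let them vote alone instead
        for y in exact:
            votes[y] += 1.0
    else:
        for d, y in nearest:
            votes[y] += 1.0 / d  # closer neighbours count more
    return max(votes, key=votes.get)

# Hypothetical (POR, log10 PERM, SO) feature vectors for two microfacies
main = [(0.20, 2.0, 0.60), (0.21, 2.1, 0.58), (0.19, 1.9, 0.62), (0.22, 2.2, 0.55), (0.18, 1.8, 0.65)]
inter = [(0.05, 0.0, 0.10), (0.06, 0.1, 0.12), (0.04, -0.1, 0.08), (0.07, 0.2, 0.15), (0.05, 0.05, 0.11)]
X = main + inter
y = ["main channel"] * len(main) + ["interchannel"] * len(inter)
```

With K = 9 and only ten training points here, four interchannel points still enter the vote. Their large distances give them little weight, so a query near the main-channel cluster is assigned to the main channel.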

The accuracy of predicting sedimentary microfacies from physical property data is significantly lower than that achieved with the logging curves. One likely cause is that there are only a few physical property curves and the properties they reflect are limited, so they cannot comprehensively capture the complex characteristics of sedimentary microfacies. In addition, the differences between microfacies in the physical property curves may not be pronounced enough for the model to distinguish them reliably. The confusion matrix shows that the recalls of the channel edge and debouch bar are below 0.5. Channel edges are misidentified as the main channel, distributary channel, and interchannel with roughly equal probability. This may be due partly to the small number of samples. It may also reflect the lithological complexity of the channel edge, whose lithology resembles that of the other three microfacies (main channel, distributary channel, and interchannel), so its physical properties cannot be clearly separated from theirs. For the debouch bar, the limited amount of data likewise makes it difficult to distinguish this microfacies from the others.

4.2.3. Textual Descriptions Prediction Experiment

After preprocessing, the textual description dataset contains 474 data points, considerably fewer than the datasets of the other modalities. With so little data, a single classifier may overfit the training data. A bagging classifier can improve generalizability by combining the predictions of multiple classifiers. This ensemble learning property partly compensates for the small data size and improves the overall performance and stability of the model. As before, the hyperparameters were selected first; the model was then trained and evaluated on the validation set (Table 6), and the confusion matrix is shown in Figure 8c.
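Bagging trains each base learner on a bootstrap sample, that is, n rows drawn with replacement from the n training rows, and combines the learners by majority vote. The paper does not name its base estimator; scikit-learn's `BaggingClassifier` defaults to a decision tree. To stay self-contained, the sketch below uses a hypothetical one-rule (OneR) learner as a stand-in. The label-encoded rows are also hypothetical and are constructed so that grain size determines the facies.

```python
import random
from collections import Counter, defaultdict

def fit_one_rule(X, y):
    """Hypothetical OneR base learner: choose the single feature whose
    value -> majority-class table best fits the training rows."""
    best = None
    for f in range(len(X[0])):
        table = defaultdict(Counter)
        for row, label in zip(X, y):
            table[row[f]][label] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in table.items()}
        correct = sum(rule[row[f]] == label for row, label in zip(X, y))
        if best is None or correct > best[0]:
            best = (correct, f, rule)
    _, feature, rule = best
    fallback = Counter(y).most_common(1)[0][0]  # for values unseen in this bootstrap sample
    return lambda row: rule.get(row[feature], fallback)

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train each base learner on a bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_one_rule([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Majority vote over the base learners."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

# Hypothetical label-encoded (grain size, color, lithology) rows
FACIES_BY_GRAIN = {0: "interchannel", 1: "distributary channel", 2: "main channel"}
X = [(i % 3, (i * 7) % 4, (i // 3) % 2) for i in range(30)]
y = [FACIES_BY_GRAIN[row[0]] for row in X]
models = fit_bagging(X, y)
```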

Overall, the predictions from the textual description data are biased because of the small sample size. This is especially true for the channel edge and debouch bar, whose prediction accuracy and recall are low because these classes have very few samples. Furthermore, the confusion matrix shows that the distributary channel, channel edge, and interchannel are often misidentified as the main channel. This likely results from the larger number of main channel samples, which biased the model toward the main channel during training. The limited number of samples from the other microfacies also makes it difficult for the model to learn their distinctive characteristics. Nevertheless, although the textual description data are sparse, the model achieves an accuracy above 70%. This is less precise than the models trained on the other data, but it can still contribute to the overall prediction of sedimentary microfacies.

4.2.4. Core Image Prediction Experiments

A classic convolutional neural network (CNN) was used for the core image prediction experiments. After preprocessing, 530 core photos were available, a relatively modest amount of data for a complex classification task such as predicting sedimentary microfacies. A pre-trained ResNet-50 was therefore used as the base model and fine-tuned, with a fully connected layer added on top to output the sedimentary facies classification. The training results are presented in Table 7, and the confusion matrix is shown in Figure 8d.

As Table 7 shows, predictions are poor for the channel edge and debouch bar, which have few core image samples. Predictions for the main channel, distributary channel, and interchannel are better, with accuracies of about 90% or higher. Because the same lithology may occur in more than one sedimentary microfacies, predicting microfacies from images is prone to some misclassification; for example, fine sandstones may be present in both the interchannel and debouch bar. This further indicates that models using only unimodal data for sedimentary facies prediction tend to reach limits in accuracy and generalizability.

4.3. Multimodal Data Fusion Model

After the four models are trained, decision-level multimodal fusion requires further processing of the data. First, the logging and physical property data are matched one-to-one by depth, so that the logging and physical property data for the same depth are placed in the same row. Each textual description sample and core image sample, however, covers a depth interval rather than a single depth point, so each of their rows represents an interval. To align them with the logging and physical property data, each depth point must be matched to the interval that contains it.

That is, if a depth point in the logging and physical property data falls within the depth interval of a particular textual description, that description is assigned to the depth point. Similarly, if a depth point falls within the depth interval of a core image, that image is assigned to the depth point. Depth points that have only logging and physical property data, with no corresponding textual description or core image, are removed.
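The alignment above is an interval lookup. Sorting each modality's intervals by top depth and binary-searching them finds the containing interval for every depth point in logarithmic time. The sketch rests on three interpretations. Intervals are treated as half-open [top, bottom), so a depth on a shared boundary goes to the deeper interval. Intervals within one modality are assumed not to overlap. A point is dropped only when it has neither a text description nor a core image, and a single missing modality is kept as `None`. All depths and payload names are hypothetical.

```python
from bisect import bisect_right

def build_index(intervals):
    """Sort (top, bottom, payload) intervals by top depth for binary search.
    Assumes intervals of one modality do not overlap."""
    ordered = sorted(intervals, key=lambda iv: iv[0])
    return [iv[0] for iv in ordered], ordered

def lookup(index, depth):
    """Return the payload of the interval with top <= depth < bottom, or None."""
    tops, ordered = index
    i = bisect_right(tops, depth) - 1  # last interval whose top is at or shallower than depth
    if i >= 0:
        top, bottom, payload = ordered[i]
        if top <= depth < bottom:
            return payload
    return None

def align(points, text_intervals, image_intervals):
    """Attach the containing text description and core image to each depth point."""
    text_index = build_index(text_intervals)
    image_index = build_index(image_intervals)
    aligned = []
    for p in points:
        text = lookup(text_index, p["depth"])
        image = lookup(image_index, p["depth"])
        if text is None and image is None:
            continue  # only logging and physical data: removed
        aligned.append({**p, "text": text, "image": image})
    return aligned

points = [{"depth": d} for d in (1830.0, 1830.5, 1831.0, 1831.5, 1840.0)]
text_intervals = [(1830.0, 1831.0, "T1"), (1831.0, 1832.0, "T2")]
image_intervals = [(1830.2, 1831.2, "IMG1")]
aligned = align(points, text_intervals, image_intervals)
```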

Following the aforementioned processing, data with disparate depth resolutions can be aligned and integrated into a comprehensive dataset. This alignment and integration guarantees that each depth point contains multimodal data from which features can be extracted for multimodal data fusion prediction. Although this approach introduces additional complexity, it can leverage the distinctive attributes of each data type, thereby enhancing the model’s predictive accuracy.

As mentioned in the previous section, the common approaches in multimodal data fusion research are feature-level fusion and decision-level fusion. In feature-level fusion, features are extracted from each modality, combined, and fed into a single model for classification or regression. The benefit of this approach is that it fully exploits the detailed information in every modality, which can improve overall model performance. A significant drawback is that when the data dimensions and features of the modalities differ substantially, the fused feature space may become too complex, making the model difficult to train. Feature-level fusion may still give satisfactory results when only a few data types are involved, but an oilfield produces many information modalities. This paper uses only four. In a study area with more modalities, feature-level fusion makes it difficult to add new modalities to the existing dataset. Moreover, feature-level fusion requires extensive preprocessing to standardize and align data from different modalities. This adds complexity and substantially increases the preprocessing burden for large volumes of heterogeneous data such as oilfield data.

In contrast, decision-level fusion first processes and predicts each modality with a separate model and then combines the models' predictions to reach the final decision, typically by weighted voting or another aggregation technique. Its advantage is that it preserves the independence of each modality and handles heterogeneous data from multiple sources more flexibly. Compared with feature-level fusion, decision-level fusion is relatively straightforward, does not require complex preprocessing and standardization across modalities, and is well suited to large-scale, complex datasets.

Decision-level fusion is therefore better suited to oilfield data. First, a suitable model is selected and trained for each of the four modalities. Each model's predictions are then weighted according to that model's performance on the validation set. The weights are 0.94 for the logging (random forest) model, 0.76 for the physical property (KNN) model, 0.72 for the textual description (bagging) model, and 0.90 for the core image (CNN) model. Finally, the weighted predictions are summed, and the category with the highest score is selected as the final prediction. The results are shown in Table 8 and Figure 9a.
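The weights listed above equal the validation accuracies of the four unimodal models reported in the Conclusions. The paper does not say whether each model contributes a hard label or a class-probability vector. The sketch below therefore accepts either form, treating a label as a one-hot vector, and skips a modality that is unavailable. The modality keys and the class ordering are illustrative assumptions. As a worked example, suppose the logging and image models vote main channel while the physical property and text models vote interchannel. The scores are then 0.94 + 0.90 = 1.84 against 0.76 + 0.72 = 1.48, so main channel wins.

```python
FACIES = ["main channel", "distributary channel", "channel edge", "interchannel", "debouch bar"]
# Weights = validation accuracies of the four unimodal models reported in the paper
WEIGHTS = {"logging": 0.94, "physical": 0.76, "text": 0.72, "image": 0.90}

def fuse(predictions, weights=WEIGHTS, classes=FACIES):
    """Weighted decision-level fusion.

    predictions maps modality -> class label (treated as one-hot), a list of class
    probabilities in `classes` order, or None when the modality is unavailable.
    """
    scores = [0.0] * len(classes)
    for modality, pred in predictions.items():
        if pred is None:
            continue  # missing modality: the remaining models decide
        if isinstance(pred, str):
            if pred not in classes:
                raise ValueError(f"unknown class: {pred!r}")
            probs = [1.0 if c == pred else 0.0 for c in classes]
        else:
            probs = pred
        for i, p in enumerate(probs):
            scores[i] += weights[modality] * p
    best = max(range(len(classes)), key=lambda i: scores[i])
    return classes[best], scores

label, scores = fuse({"logging": "main channel", "physical": "interchannel",
                      "text": "interchannel", "image": "main channel"})
```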

The final results show that multimodal data fusion substantially improves prediction. Recognition rates improved notably for four of the five sedimentary microfacies, and only the debouch bar retained relatively low accuracy. Overall, the multimodal data fusion model performs markedly better than the unimodal prediction models. The sedimentary facies studied in this paper have complex formation mechanisms that usually require several factors to be considered together. Unimodal prediction, which considers only one type of factor, therefore gives insufficient results; the accuracy of predictions from the physical property curves and the text descriptions does not exceed 80%. Even when a unimodal prediction is good, the result may be partly due to chance. Multimodal prediction, by contrast, synthesizes several sources of information. When one modality is missing or predicts poorly, the other modalities can compensate, which makes predictions for complex data considerably more credible than unimodal predictions.

The data from test well L were predicted with the decision-level fusion model; the results are shown in Table 9 and Figure 9b. The overall accuracy reaches 93%, very close to that on the validation set, which supports the practical applicability of the fusion model.

To further verify the feasibility of the method for sedimentary facies identification, we predicted the sedimentary facies of four wells along one well-correlation profile. The three wells on the left are non-core wells with only logging curve and physical property curve data, and the fourth is a core well with all data types. The predictions, shown in Figure 10, are broadly consistent with the existing geological understanding of the study area [53]. This indicates that the method can be applied across the whole study area to predict the sedimentary facies of all wells and to construct the sedimentary framework of the study area.

5. Conclusions

To address the heterogeneity of oilfield data and the difficulty of fusing it, this paper proposes a sedimentary facies prediction method based on multimodal data fusion. The method reduces the dependence of traditional sedimentary facies identification on expert experience and limits the errors caused by manual identification. It also substantially reduces the manual workload while improving the efficiency and accuracy of sedimentary facies identification.

After feature-level and decision-level fusion were compared, the multimodal data fusion model was built using decision-level fusion. The random forest, KNN, bagging classifier, and CNN models achieved accuracies of 94%, 76%, 72%, and 90%, respectively, on the validation set, and the fused model achieved a recognition accuracy of 97% on the validation set. Applied to test well L, the multimodal data fusion model achieved 93% accuracy, similar to that on the validation set, which supports the accuracy of the model.

The multimodal fusion model was applied to predict the sedimentary facies along a section parallel to the sediment-source direction in the study area, and the results are broadly consistent with the existing understanding of the study area. This verifies that the method can be extended to predict sedimentary facies throughout the study area and to refine the interpretation of its sedimentary framework.

The algorithms used in this paper for the different unimodal data types were chosen to suit the limited amount of data currently available, because the data in this study area are not suited to training more complex networks. As the amount and variety of data increase, more complex ensemble learning methods can be used.

Given the rapid pace of oil and gas development in China, core drilling is expected to increase in future exploration and development, and data from core drilling will be used more widely. The multimodal data fusion method improves the accuracy and efficiency of sedimentary facies identification and can provide more accurate geological models for oil and gas field development. The method also offers new ideas for other fields of geological research and may play an important role in broader geological applications.

Author Contributions

Conceptualization, Y.Y. and Y.Z.; methodology, Y.Y.; validation, Y.Y. and X.Z.; formal analysis, Y.Y. and K.M.; investigation, Y.Y., K.M. and Y.L.; writing—original draft preparation, Y.Y.; writing—review and editing, X.H., J.L. and K.M.; visualization, Y.Y.; supervision, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Research Program on Key Technologies for the Detection and Development of Medium and Deep Geothermal Energy in Yunnan Province under grant no. 202302AF080001.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the China University of Geosciences (Beijing) for their financial support for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhu, X.; Li, Y.; Dong, Y.; Zhao, D.; Wang, X.; Zhu, M. The program of seismic sedimentology and its application to Shahejie Formation in Qikou depression of North China. Geol. China 2013, 40, 152–162.
2. Caté, A.; Perozzi, L.; Gloaguen, E.; Blouin, M. Machine learning as a tool for geologists. Lead. Edge 2017, 36, 215–219.
3. Zhao, C.; Jia, Y.; Qu, Y.; Zheng, W.; Hou, S.; Wang, B. Forecasting Gas Well Classification Based on a Two-Dimensional Convolutional Neural Network Deep Learning Model. Processes 2024, 12, 878.
4. Cracknell, M. Machine Learning for Geological Mapping: Algorithms and Applications. Ph.D. Thesis, University of Tasmania, Hobart, Australia, 2014.
5. Xu, L.; Green, E.C. Inferring geological structural features from geophysical and geological mapping data using machine learning algorithms. Geophys. Prospect. 2023, 71, 1728–1742.
6. He, Y.; Zhou, Y.; Wen, T.; Zhang, S.; Huang, F.; Zou, X.; Ma, X.; Zhu, Y. A review of machine learning in geochemistry and cosmochemistry: Method improvements and applications. Appl. Geochem. 2022, 140, 105273.
7. Wang, L.; Lu, J.; Luo, Y.; Ren, B.; Li, A.; Zhao, N. An Automated Quantitative Methodology for Computing Gravel Parameters in Imaging Logging Leveraging Deep Learning: A Case Analysis of the Baikouquan Formation within the Mahu Sag. Processes 2024, 12, 1337.
8. Yang, C.; Meng, H.; Ye, Y.; Yong, X.; Chang, D. Research and application of intelligent seismic identification technology of sedimentary facies. Oil Geophys. Prospect. 2023, 58, 528–539.
9. Zhu, L.; Wei, J.; Wu, S.; Zhou, X.; Sun, J. Application of unlabelled big data and deep semi-supervised learning to significantly improve the logging interpretation accuracy for deep-sea gas hydrate-bearing sediment reservoirs. Energy Rep. 2022, 8, 2947–2963.
10. Liu, H.; Xu, S.; Ge, X.; Liu, J.; Zahid, M.A. Automatic sedimentary microfacies identification from logging curves based on deep process neural network. Clust. Comput. 2019, 22, 12451–12457.
11. Summaira, J.; Li, X.; Shoib, A.M.; Li, S.; Abdul, J. Recent advances and trends in multimodal deep learning: A review. arXiv 2021, arXiv:2105.11087.
12. Arabi Aliabad, F.; Shojaei, S.; Zare, M.; Ekhtesasi, M. Assessment of the fuzzy ARTMAP neural network method performance in geological mapping using satellite images and Boolean logic. Int. J. Environ. Sci. Technol. 2019, 16, 3829–3838.
13. Hu, W.S.; Peng, J. Application of Well Logging Curves in Sedimentary Micro facies Research. West-China Explor. Eng. 2010, 22, 61–64.
14. Danciu, C.; Marchiș, D.; Cucăilă, S. The Influence of Physical Parameters on the Mechanical Characteristics of Some Volcanic Magmatic Rocks of the Andesite Type. Min. Rev. 2021, 27, 30–41.
15. Macfarlane, P.; Doveton, J.; Coble, G. Interpretation of lithologies and depositional environments of Cretaceous and Lower Permian rocks by using a diverse suite of logs from a borehole in central Kansas. Geology 1989, 17, 303–306.
16. Rivard, B.; Harris, N.; Feng, J.; Dong, T. Inferring total organic carbon and major element geochemical and mineralogical characteristics of shale core from hyperspectral imagery. AAPG Bull. 2018, 102, 2101–2121.
17. Cao, M.; Lin, S.; Xiao, Y.; Wang, R.; Qiu, B. Construction of multi-modal knowledge graph for logging field. Comput. Technol. Dev. 2024, 1–8.
18. Chang, D.; Yong, X.; Gao, J.; Chen, D.; Wang, W. Research on Multimodal Multitasking Intelligent Large Model for Oil and Gas Geophysics. In Proceedings of the Second Annual China Petroleum Physical Exploration Symposium (Next Volume); PetroChina Research Institute of Petroleum Exploration and Development-Northwest, China University of Petroleum: Beijing, China, 2024; pp. 380–383.
19. Caceres, V.A.T.; Duffaut, K.; Yazidi, A.; Westad, F.; Johansen, Y.B. Automated well log depth matching: Late fusion multimodal deep learning. Geophys. Prospect. 2023, 72, 155–182.
20. Liang, L.; Le, T.; Zimmermann, T.; Zeroug, S.; Heliot, D. A machine learning framework for automating well log depth matching. In Proceedings of the SPWLA Annual Logging Symposium, The Woodlands, TX, USA, 15–19 June 2019; p. D033S003R009.
21. Ho, T.K. Random Decision Forest. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995.
22. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
23. Khan, S.Q.; Kirmani, F.U.D. Applicability of deep neural networks for lithofacies classification from conventional well logs: An integrated approach. Pet. Res. 2024.
24. Wang, J.; Cao, J.; You, J.; Cheng, M.; Zhou, P. A method for well log data generation based on a spatio-temporal neural network. J. Geophys. Eng. 2021, 18, 700–711.
25. Rohit; Manda, S.R.; Raj, A.; Dheeraj, A.; Rawat, G.S.; Choudhury, T. Identification of Lithology from Well Log Data Using Machine Learning. EAI Endorsed Trans. Internet Things 2024, 10, 1–10.
26. McDonald, A. Data quality considerations for petrophysical machine-learning models. Petrophysics 2021, 62, 585–613.
27. Fix, E. Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties; Report No. 4; USAF School of Aviation Medicine: Randolph Field, TX, USA, 1951.
28. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
29. Xie, P.; Yin, Y.; Li, W.; Li, F.; Zhao, L. Distribution of remaining oil based on a single sand body analysis: A case study of Xingbei Oilfield. J. Pet. Explor. Prod. Technol. 2018, 8, 1159–1167.
30. Min, R.; Stanley, D.A.; Yuan, Z.; Bonner, A.; Zhang, Z. A deep non-linear feature mapping for large-margin knn classification. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, Miami, FL, USA, 6–9 December 2009; pp. 357–366.
31. Ali, J.; Ashraf, U.; Anees, A.; Peng, S.; Umar, M.U.; Vo Thanh, H.; Khan, U.; Abioui, M.; Mangi, H.N.; Ali, M.; et al. Hydrocarbon potential assessment of carbonate-bearing sediments in a meyal oil field, Pakistan: Insights from logging data using machine learning and quanti elan modeling. ACS Omega 2022, 7, 39375–39395.
32. Guo, L.; Renze, L.; Li, X.; Tuo, J.; Lei, C.; Yang, Z. Logging Data Completion Based on an MC-GAN-BiLSTM Model. IEEE Access 2021, 10, 1810–1822.
33. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
34. Shi, F.; Gao, X.; Li, R.; Zhang, H. Ensemble Learning for the Land Cover Classification of Loess Hills in the Eastern Qinghai–Tibet Plateau Using GF-7 Multitemporal Imagery. Remote Sens. 2024, 16, 2556.
35. Patil, S.; Subil, D.; Nasar, N.; Kokatnoor, S.A.; Krishnan, B.; Kumar, S. Text Mining-A Comparative Review of Twitter Sentiments Analysis. Recent Adv. Comput. Sci. Commun. 2024, 17, 21–37.
36. Kabari, L.G.; Onwuka, U.C. Comparison of bagging and voting ensemble machine learning algorithm as a classifier. Int. Journals Adv. Res. Comput. Sci. Softw. Eng. 2019, 9, 19–23.
37. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
38. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
39. Veit, A.; Wilber, M.J.; Belongie, S. Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016; Volume 29.
40. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 630–645.
41. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
42. Alzubaidi, F.; Mostaghimi, P.; Swietojanski, P.; Clark, S.R.; Armstrong, R.T. Automated lithology classification from drill core images using convolutional neural networks. J. Pet. Sci. Eng. 2021, 197, 107933.
43. Zhang, X.; Wu, K.; Ma, Q.; Chen, Z. Research on object detection model based on feature network optimization. Processes 2021, 9, 1654.
44. Trott, M.; Mooney, C.; Azad, S.; Sattarzadeh, S.; Bluemel, B.; Leybourne, M.; Layton-Matthews, D. Alteration assemblage characterization using machine learning applied to high-resolution drill-core images, hyperspectral data and geochemistry. Geochem. Explor. Environ. Anal. 2023, 23, geochem2023-032.
45. Boulahia, S.Y.; Amamra, A.; Madi, M.R.; Daikh, S. Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition. Mach. Vis. Appl. 2021, 32, 121.
46. Wajid, M.A.; Zafar, A. Multimodal fusion: A review, taxonomy, open challenges, research roadmap and future directions. Neutrosophic Sets Syst. 2021, 45, 8.
47. Fu, H.; Lu, H. A Review of Multimodal Sentiment Analysis: Modal Fusion and Representation. In Proceedings of the 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 27–31 May 2024; pp. 0049–0054.
48. Shen, H.; Song, S.; Gunes, H. Multi-modal Human Behaviour Graph Representation Learning for Automatic Depression Assessment. In Proceedings of the 2024 18th International Conference on Automatic Face and Gesture Recognition (FG), Istanbul, Turkiye, 27–31 May 2024.
49. Kniesmeijer, K. Ensemble Learning to Recognize Human Actions Using the Icub Robot Multimodal Dataset. Ph.D. Thesis, Tilburg University, Tilburg, The Netherlands, 2022.
50. Datsi, T.; Aznag, K.; BenAli, B.A.; Karbout, K.; El Oirrak, A.; Khayya, E.K. A Short Survey on Multimodal Data Fusion in Image Classification. In Proceedings of the 2024 International Conference on Global Aeronautical Engineering and Satellite Technology (GAST), Marrakesh, Morocco, 24–26 April 2024; pp. 1–4.
51. Qin, G.; Zou, C.; Kuang, M.; Tian, Y.; Chen, Y.; Chen, Y. The evolution characteristics of massive alluvial fan and its controlling effect on the hydrocarbon accumulation: A case of Baikouquan Formation in B21 block, Baikouquan Oilfield. Sci. Technol. Eng. 2018, 18, 79–87.
52. Lv, H. Characteristics of diagenetic fluids in the Lower Triassic Baikouquan Formation Clastic Reservoir in the Mahu Depression, Junggar Basin. Master’s Thesis, Guizhou University, Guiyang, China, 2023.
53. Chen, L.; Lu, Y.; Xu, H.; Xu, Z. High resolution sequence stratigraphy in fan delta facies: A case study from Upper Karamay Formation reservoir in Bai-21 well area in Karamay Oilfield. Lithol. Reserv. 2012, 24, 30–35.
54. Feng, Z.Z. A review on the definitions of terms of sedimentary facies. J. Palaeogeogr. 2019, 8, 1–11.
55. Li, Y. The Geochemical Characteristics and Significance of Basic Rocks in Karamay Ophiolitic Mélange Rock in West Junggar, Xinjiang. Master’s Thesis, Chengdu University of Technology, Chengdu, China, 2020.
56. Shen, C. Research of Rules of Composite Oil and Gas Accumulation in Northwestern Margin of Junggar Basin. Master’s Thesis, China University of Petroleum (East China), Qingdao, China, 2020.

Figure 1. Structure of the random forest model.

Figure 2. Bagging classifier model diagram.
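Figures 1 and 2 depict tree ensembles built by bagging. As a minimal sketch of the bagging idea alone (bootstrap resampling of the training set, then majority voting over the base learners), the Python below uses depth-one decision stumps as base learners. All function names are hypothetical, and this is not the authors' implementation: a random forest additionally grows deeper trees and draws a random feature subset at each split, both of which are omitted here.

```python
import random
from collections import Counter
from typing import List, Tuple

# (feature index, threshold, label if value <= threshold, label if value > threshold)
Stump = Tuple[int, float, int, int]


def fit_stump(X: List[List[float]], y: List[int]) -> Stump:
    """Exhaustively pick the feature and threshold with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best_err, best = None, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = Counter(left).most_common(1)[0][0] if left else majority
            right_lab = Counter(right).most_common(1)[0][0] if right else majority
            err = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best_err is None or err < best_err:
                best_err, best = err, (j, t, left_lab, right_lab)
    return best


def predict_stump(stump: Stump, row: List[float]) -> int:
    j, t, left_lab, right_lab = stump
    return left_lab if row[j] <= t else right_lab


def fit_bagging(X: List[List[float]], y: List[int],
                n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    """Train each base learner on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models: List[Stump], row: List[float]) -> int:
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-feature data: feature 0 separates code 1 from code 4.
    X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.8, 1.0], [0.9, 2.0], [0.7, 6.0]]
    y = [1, 1, 1, 4, 4, 4]
    models = fit_bagging(X, y)
    print(predict_bagging(models, [0.05, 4.0]), predict_bagging(models, [0.95, 4.0]))
```

Because each stump sees a different bootstrap sample, individual stumps can be degenerate (for example, trained on one class only), but the majority vote smooths these out; that variance reduction is the reason for bagging.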

Figure 3. Block diagram of the ResNet-50 network.
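For reference, the residual unit that ResNet-50 stacks is, in the standard formulation of He et al. (2016), given below. Batch normalization is omitted for brevity; σ is the ReLU, ∗ denotes convolution, W_1 and W_3 are 1×1 convolutions, W_2 is a 3×3 convolution, and W_s is the identity when input and output shapes match and a 1×1 projection otherwise. The last line counts the weighted layers: one stem convolution, 3 + 4 + 6 + 3 = 16 bottleneck blocks of three convolutions each, and the final fully connected layer. How the authors adapted the classification head to the five microfacies codes is not stated in this section.

```latex
\begin{aligned}
\mathbf{y} &= \sigma\!\left(\mathcal{F}(\mathbf{x}, \{W_i\}) + W_s\,\mathbf{x}\right), \\
\mathcal{F}(\mathbf{x}, \{W_i\}) &= W_3 \ast \sigma\!\left(W_2 \ast \sigma\!\left(W_1 \ast \mathbf{x}\right)\right), \\
L_{\text{ResNet-50}} &= 1 + 3\,(3 + 4 + 6 + 3) + 1 = 50.
\end{aligned}
```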

Figure 4. Framework diagram of the multimodal feature fusion prediction model.
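The framework combines the four unimodal predictions at the decision level. The exact fusion rule is not given in this section, so the sketch below assumes weighted soft voting: each modality supplies a probability vector over the five codes of Table 3, the vectors are averaged with per-modality weights, and the highest fused probability wins. The weights used in the usage example (the unimodal accuracies from Tables 4 to 7) and the modality names are illustrative assumptions, as is skipping a modality that has no observation at a given depth.

```python
from typing import Dict, List, Optional, Tuple

FACIES_CODES = [1, 2, 3, 4, 5]  # main channel ... debouch bar (Table 3)


def fuse_decisions(
    probs: Dict[str, Optional[List[float]]],
    weights: Dict[str, float],
) -> Tuple[int, List[float]]:
    """Weighted soft voting over per-modality class-probability vectors.

    A modality whose value is None (e.g. no core photo at this depth) is skipped
    and the remaining weights are renormalised. Missing weights default to 1.0.
    """
    fused = [0.0] * len(FACIES_CODES)
    total_w = 0.0
    for name, p in probs.items():
        if p is None:
            continue
        if len(p) != len(FACIES_CODES):
            raise ValueError(f"{name}: expected {len(FACIES_CODES)} probabilities")
        w = weights.get(name, 1.0)
        fused = [f + w * pk for f, pk in zip(fused, p)]
        total_w += w
    if total_w == 0.0:
        raise ValueError("no modality produced a prediction")
    fused = [f / total_w for f in fused]
    best = max(range(len(fused)), key=fused.__getitem__)
    return FACIES_CODES[best], fused


if __name__ == "__main__":
    # Hypothetical weights: unimodal accuracies reported in Tables 4-7.
    weights = {"logs": 0.94, "properties": 0.76, "text": 0.72, "image": 0.90}
    probs = {
        "logs": [0.10, 0.70, 0.10, 0.05, 0.05],
        "properties": [0.40, 0.30, 0.10, 0.10, 0.10],
        "text": None,  # no core description at this depth
        "image": [0.20, 0.50, 0.10, 0.10, 0.10],
    }
    print(fuse_decisions(probs, weights))
```

Soft voting keeps each model's confidence, so a strongly confident well-log prediction can outweigh a hesitant physical-property prediction; hard majority voting over labels would discard that information.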

Figure 5. (a) Tectonic location of the Junggar Basin, modified from [52]. (b) Comprehensive stratigraphic column (Well B), modified from [53].

Figure 6. Rock types and sedimentary structures of the Keshang Formation in the study area. (a) Brown mudstone, massive bedding, subaerial distributary channel, 1843.7 m, well B. (b) Reddish-brown sandy conglomerate with imbricated clasts, massive bedding, distributary channel, 1824 m, well A. (c) Gray–green mudstone, massive bedding, subaqueous distributary channel, 2166.6 m, well G. (d) Gray–green glutenite with chaotically arranged clasts, scour surface, main channel, 1832.5 m, well B. (e) Greenish-gray muddy siltstone, convolute bedding, channel edge, 2027.9 m, well F. (f) Gray–green muddy fine sandstone, parallel bedding, debouch bar, 1924.4 m, well F. (g) Gray conglomerate-bearing muddy fine sandstone, massive bedding, debouch bar, 1987 m, well I. (h) Greenish-gray conglomerate with imbricated clasts, massive bedding, main channel, 1711.5 m, well H. (i) Gray–green sandy anisometric conglomerate with imbricated clasts, massive bedding, distributary channel, 1898 m, well D.

Figure 7. Tuff from well L, indicating that deposition of this stratum in the region was influenced by volcanic activity.

Figure 8. Confusion matrices for the unimodal model predictions, illustrating classification performance for the four data types. (a) Well logs: relatively high accuracy owing to the rich feature set provided by the logging data, with occasional misclassifications likely caused by class imbalance. (b) Physical properties: stable predictions across most categories, but more variable performance where data are sparse or feature differences are subtle. (c) Textual descriptions: more misclassifications, particularly in underrepresented categories, reflecting the difficulty of working with unstructured text. (d) Core images: distinct geological features are identified effectively, with consistent accuracy across most categories and minor misclassifications reflecting the inherent complexity of the data.

Figure 9. Confusion matrices for the multimodal model predictions. (a) Validation set. (b) Test set.

Figure 10. Sedimentary facies prediction results for continuous well profiles.
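A continuous profile such as Figure 10 starts as one predicted code per log sample. A minimal sketch, assuming depth-sorted samples at the 0.125 m logging interval of Table 1, of how such predictions can be merged into (top, bottom, code) intervals in the layout of Table 2 follows; the function name is hypothetical, and the authors' post-processing (for example, any removal of very thin intervals) is not described in this section.

```python
from typing import List, Tuple

Interval = Tuple[float, float, int]  # (top depth m, bottom depth m, microfacies code)


def to_intervals(depths: List[float], codes: List[int], step: float = 0.125) -> List[Interval]:
    """Merge runs of equal predicted codes along a depth-sorted well profile.

    Each sample is taken to represent the cell [d, d + step); a gap in depth
    starts a new interval even if the code is unchanged.
    """
    if len(depths) != len(codes):
        raise ValueError("depths and codes must have the same length")
    intervals: List[Interval] = []
    for d, c in zip(depths, codes):
        if intervals and intervals[-1][2] == c and abs(intervals[-1][1] - d) < 1e-6:
            top = intervals[-1][0]
            intervals[-1] = (top, d + step, c)  # extend the current run downward
        else:
            intervals.append((d, d + step, c))  # start a new interval
    return intervals


if __name__ == "__main__":
    depths = [1941.0, 1941.125, 1941.25, 1941.375]
    codes = [3, 3, 2, 2]
    for top, bottom, code in to_intervals(depths, codes):
        print(f"{top:.2f}\t{bottom:.2f}\t{code}")
```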

Table 1. Data types of different well locations.

Well Name	Dataset	Depth Range (m)	Logging Interval (m)
A	Training and Validation set	1730–1927	0.125
B	Training and Validation set	1591–1901	0.125
C	Training and Validation set	1625–1936	0.125
D	Training and Validation set	1930–2013	0.125
E	Training and Validation set	1709–2008	0.125
F	Training and Validation set	1790–2029	0.125
G	Training and Validation set	1943–2213	0.125
H	Training and Validation set	2010–2258	0.125
I	Training and Validation set	1947–2203	0.125
J	Training and Validation set	1797–2081	0.125
K	Training and Validation set	1679–1971	0.125
L	Test set	1920–2192	0.125
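Table 1 assigns whole wells to either the training/validation pool (A to K) or the test set (L). The sketch below transcribes the table, groups wells by role, and computes the nominal number of log samples per well at the 0.125 m interval. Counting both endpoints of each depth range is an assumption, the names are hypothetical, and the counts are only nominal: they do not account for any filtering or depth matching the authors may have applied, so they are not expected to match the supports in Tables 4 to 9.

```python
from typing import Dict, List, Tuple

# (top depth m, bottom depth m, role) transcribed from Table 1
WELLS: Dict[str, Tuple[float, float, str]] = {
    "A": (1730, 1927, "train_val"), "B": (1591, 1901, "train_val"),
    "C": (1625, 1936, "train_val"), "D": (1930, 2013, "train_val"),
    "E": (1709, 2008, "train_val"), "F": (1790, 2029, "train_val"),
    "G": (1943, 2213, "train_val"), "H": (2010, 2258, "train_val"),
    "I": (1947, 2203, "train_val"), "J": (1797, 2081, "train_val"),
    "K": (1679, 1971, "train_val"), "L": (1920, 2192, "test"),
}
STEP_M = 0.125  # logging interval from Table 1


def n_log_samples(top: float, bottom: float, step: float = STEP_M) -> int:
    """Nominal sample count in [top, bottom] at a fixed interval, counting both ends."""
    return int(round((bottom - top) / step)) + 1


def split_by_well(wells: Dict[str, Tuple[float, float, str]]) -> Dict[str, List[str]]:
    """Group wells by role so every sample of a well lands in exactly one split."""
    groups: Dict[str, List[str]] = {}
    for name, (_, _, role) in sorted(wells.items()):
        groups.setdefault(role, []).append(name)
    return groups


if __name__ == "__main__":
    for role, names in split_by_well(WELLS).items():
        total = sum(n_log_samples(*WELLS[n][:2]) for n in names)
        print(role, names, total)
```

Holding out an entire well, rather than random depth samples, matters because neighbouring samples 0.125 m apart are strongly correlated; a sample-level split would likely let the model see near-duplicates of its test data.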

Table 2. Statistics of sedimentary microfacies (in well I).

Top Depth (m)	Bottom Depth (m)	Microfacies Code	Top Depth (m)	Bottom Depth (m)	Microfacies Code
1941.00	1950.00	3.0	2101.00	2113.00	2.0
1950.00	1955.50	2.0	2113.00	2120.20	1.0
1955.50	1959.50	5.0	2120.20	2144.00	2.0
1959.50	1979.00	2.0	2144.00	2151.00	1.0
1979.00	1990.00	3.0	2151.00	2170.00	3.0
1990.00	2028.00	2.0	2170.00	2177.00	2.0
2028.00	2035.00	4.0	2177.00	2185.00	3.0
2035.00	2061.00	2.0	2185.00	2190.00	2.0
2061.00	2066.00	3.0	2190.00	2195.00	3.0
2066.00	2083.50	5.0	2195.00	2203.00	5.0
2083.50	2094.00	2.0	2203.00	2100.20	2.0
2094.00	2101.00	4.0	2100.20	2102.00	4.0
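Interval tables like Table 2 must be converted to one label per log sample before a classifier can be trained on well logs. A minimal sketch, assuming a sample at depth d belongs to the interval with top ≤ d < bottom (the half-open convention is an assumption), is shown below; the function name is hypothetical. It also rejects intervals whose bottom depth does not exceed the top depth or that overlap, a cheap safeguard for hand-transcribed tables.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Interval = Tuple[float, float, int]  # (top depth m, bottom depth m, microfacies code)


def label_depths(depths: List[float], intervals: List[Interval]) -> List[Optional[int]]:
    """Give each log-sample depth the code of the interval with top <= d < bottom.

    Depths that fall outside every interval get None.
    """
    intervals = sorted(intervals)
    for (top, bottom, _), nxt in zip(intervals, intervals[1:] + [None]):
        if bottom <= top:
            raise ValueError(f"interval {top}-{bottom} has bottom <= top")
        if nxt is not None and nxt[0] < bottom:
            raise ValueError(f"interval starting at {nxt[0]} overlaps {top}-{bottom}")
    tops = [iv[0] for iv in intervals]
    labels: List[Optional[int]] = []
    for d in depths:
        i = bisect_right(tops, d) - 1  # last interval whose top is <= d
        if i >= 0 and intervals[i][0] <= d < intervals[i][1]:
            labels.append(intervals[i][2])
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    # First three rows of Table 2 (well I), sampled every 0.125 m.
    table2_head = [(1941.0, 1950.0, 3), (1950.0, 1955.5, 2), (1955.5, 1959.5, 5)]
    depths = [1941.0 + 0.125 * k for k in range(8)]
    print(label_depths(depths, table2_head))
```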

Table 3. Codes for sedimentary microfacies.

Code	Microfacies Type
1	Main channel
2	Distributary channel
3	Channel edge
4	Interchannel
5	Debouch bar

Table 4. Experimental results of logging to predict sedimentary microfacies.

Facies	Precision	Recall	F1-Score	Support
1.0	0.95	0.97	0.96	1275
2.0	0.94	0.92	0.93	688
3.0	0.98	0.83	0.9	312
4.0	0.93	0.97	0.95	1056
5.0	0.96	0.84	0.9	164
Accuracy			0.94	3495
Macro avg	0.95	0.91	0.93	3495
Weighted avg	0.94	0.94	0.94	3495
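Tables 4 to 9 follow the layout of a standard classification report: per-class precision, recall, F1-score and support, plus overall accuracy and macro and weighted averages (scikit-learn's `classification_report` prints the same layout). The plain-Python sketch below derives these quantities from a confusion matrix such as those in Figures 8 and 9; the function name is hypothetical and it is not presented as the authors' code. One useful identity: the weighted-average recall always equals overall accuracy, because weighting each class recall by its support recovers total true positives divided by total samples.

```python
from typing import Dict, List, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, F1)


def report_from_confusion(cm: List[List[int]]) -> Dict[str, object]:
    """Per-class and averaged metrics from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    support = [sum(row) for row in cm]                               # true samples per class
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predictions per class
    total = sum(support)
    per_class: List[Metrics] = []
    for c in range(k):
        tp = cm[c][c]
        p = tp / predicted[c] if predicted[c] else 0.0
        r = tp / support[c] if support[c] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class.append((p, r, f1))
    macro = tuple(sum(m[i] for m in per_class) / k for i in range(3))
    weighted = tuple(sum(m[i] * s for m, s in zip(per_class, support)) / total for i in range(3))
    accuracy = sum(cm[c][c] for c in range(k)) / total
    return {"per_class": per_class, "support": support, "accuracy": accuracy,
            "macro": macro, "weighted": weighted}


if __name__ == "__main__":
    rep = report_from_confusion([[8, 2], [1, 9]])
    print(rep["accuracy"], rep["macro"], rep["weighted"])
```

The macro average treats every microfacies equally, so rare classes such as the debouch bar (code 5) pull it down; the weighted average is dominated by the well-populated classes, which is why it tracks accuracy closely in these tables.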

Table 5. Experimental results of physical properties to predict sedimentary microfacies.

Facies	Precision	Recall	F1-Score	Support
1.0	0.78	0.83	0.81	1665
2.0	0.7	0.71	0.7	1283
3.0	0.61	0.47	0.53	452
4.0	0.83	0.88	0.86	1581
5.0	0.71	0.39	0.5	217
Accuracy			0.76	5198
Macro avg	0.73	0.65	0.68	5198
Weighted avg	0.76	0.76	0.76	5198

Table 6. Experimental results of textual descriptions to predict sedimentary microfacies.

Facies	Precision	Recall	F1-Score	Support
1.0	0.68	0.86	0.76	37
2.0	0.76	0.83	0.80	24
3.0	0.56	0.50	0.53	4
4.0	0.59	0.59	0.59	27
5.0	0.30	0.33	0.32	3
Accuracy			0.72	95
Macro avg	0.58	0.62	0.60	95
Weighted avg	0.66	0.72	0.70	95

Table 7. Experimental results of core images used to predict sedimentary microfacies.

Facies	Precision	Recall	F1-Score	Support
1.0	0.94	0.94	0.94	32
2.0	0.89	0.94	0.91	34
3.0	0.50	0.50	0.50	3
4.0	0.94	0.92	0.93	64
5.0	0.50	0.50	0.50	2
Accuracy			0.90	135
Macro avg	0.75	0.76	0.76	135
Weighted avg	0.91	0.91	0.91	135

Table 8. Experimental results of multimodal data fusion prediction.

Facies	Precision	Recall	F1-Score	Support
1.0	0.97	0.99	0.98	413
2.0	0.96	0.98	0.97	363
3.0	0.9	0.98	0.94	46
4.0	0.99	0.97	0.98	708
5.0	0.87	0.84	0.85	31
Accuracy			0.97	1561
Macro avg	0.94	0.95	0.94	1561
Weighted avg	0.97	0.97	0.97	1561

Table 9. Predictive effectiveness of the test set (well L) in multimodal fusion models.

Facies	Precision	Recall	F1-Score	Support
1.0	0.96	0.96	0.96	53
2.0	0.96	0.95	0.96	85
3.0	0.88	0.95	0.91	39
4.0	0.95	0.93	0.94	45
5.0	0.91	0.83	0.87	12
Accuracy			0.94	234
Macro avg	0.93	0.93	0.93	234
Weighted avg	0.95	0.94	0.94	234
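As a worked check on Table 9, the summary rows can be recomputed from the per-class rows, with n_c the support of class c. Because the per-class values are rounded to two decimals, the recomputed averages agree with the reported ones only to about ±0.01.

```latex
\begin{aligned}
\text{Macro-}F_1 &= \tfrac{1}{5}\,(0.96 + 0.96 + 0.91 + 0.94 + 0.87) = \tfrac{4.64}{5} = 0.928 \approx 0.93, \\
\text{Weighted-}F_1 &= \frac{\sum_c n_c F_{1,c}}{\sum_c n_c}
  = \frac{53(0.96) + 85(0.96) + 39(0.91) + 45(0.94) + 12(0.87)}{234}
  = \frac{220.71}{234} \approx 0.943 \approx 0.94.
\end{aligned}
```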

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

