Multitarget Intelligent Recognition of Petrographic Thin Section Images Based on Faster RCNN

Wang, Hanyu; Cao, Wei; Zhou, Yongzhang; Yu, Pengpeng; Yang, Wei

doi:10.3390/min13070872


by Hanyu Wang ^1,2,3, Wei Cao ^1,2,3,*, Yongzhang Zhou ^1,2,3,*, Pengpeng Yu ^1,2,3 and Wei Yang ^3,4

¹ School of Earth Science and Engineering, Sun Yat-sen University, Zhuhai 519000, China

² Centre for Earth Environment and Resources, Sun Yat-sen University, Zhuhai 519000, China

³ Guangdong Provincial Key Lab of Geological Process and Mineral Resources, Zhuhai 519000, China

⁴ Guangdong Avi Technology Research Institute, Guangzhou 510000, China

* Authors to whom correspondence should be addressed.

Minerals 2023, 13(7), 872; https://doi.org/10.3390/min13070872

Submission received: 22 May 2023 / Revised: 21 June 2023 / Accepted: 25 June 2023 / Published: 28 June 2023

(This article belongs to the Special Issue Application of Big Data Mining, Machine Learning and Artificial Intelligence in Ore Deposits)


Abstract:

The optical features of mineral composition and texture in petrographic thin sections are an important basis for rock identification and rock evolution analysis. However, the efficiency and accuracy of human visual interpretation of petrographic thin section images have long depended on the experience of experts. The application of image-based computer vision and deep-learning algorithms to the intelligent analysis of the optical properties of mineral composition and texture in petrographic thin section images (in plane-polarized light) has the potential to significantly improve the efficiency and accuracy of rock identification and classification. This study completed the transition from simple petrographic thin section image classification to multitarget detection, to address more complex research tasks and more refined research scales that contain more abundant information, such as spatial, quantitative and category target information. Oolitic texture is an important paleoenvironmental indicator that widely exists in sedimentary records and is related to shallow water hydraulic conditions. We used transfer learning and image data augmentation in this paper to identify the oolitic texture of petrographic thin section images based on the faster region-based convolutional neural network (Faster RCNN) method. In this study, we evaluated the performance of Faster RCNN, a two-stage object detection algorithm, using VGG16 and ResNet50 as backbones for image feature extraction. Our findings indicate that ResNet50 outperformed VGG16 in this regard. Specifically, the Faster RCNN model with ResNet50 as the backbone achieved an average precision (AP) of 92.25% for the ooids test set, demonstrating the accuracy and reliability of this approach for detecting ooids. The experimental results also showed that the uneven distribution of training sample images and the complexity of images both significantly affect detection performance; however, the uneven distribution of training sample images has a greater impact. Our work is preliminary for intelligent recognition of multiple mineral texture targets in petrographic thin section images. We hope that it will inspire further research in this field.

Keywords:

Faster RCNN; object detection; intelligent recognition; petrographic thin section; VGG; ResNet; big data mining; computer vision

1. Introduction

In recent years, the introduction of big data and artificial intelligence algorithms has been a major development in geological research fields. Machine learning and deep learning algorithms have been used to uncover hidden relationships in massive structured and unstructured geoscience data [1,2,3,4,5,6], identify geochemical anomalies [7,8,9,10,11,12], and predict potential mineral resources [13,14,15,16,17]. Furthermore, with the development of deep learning, image-based computer vision technology has made it possible to intelligently identify and classify rocks by their optical properties [18,19,20,21,22].

An important rock identification method is observing and studying the optical properties of mineral composition and texture in petrographic thin sections under a polarizing microscope [23]. This process requires the professional ability and expert experience of observers; thus, some scholars have used machine learning and deep learning algorithms to classify rocks with promising results [20,24,25]. For instance, Młynarczuk et al. [24] used the nearest neighbor method, K-nearest neighbor algorithm, closest approach mode method, and optimal spherical domain method to classify nine kinds of rock thin sections and achieved satisfactory accuracy. Shu et al. [25] compared manually selected features with features automatically extracted by the k-means method for classifying rock images. They found that the k-means-based approach was more robust and flexible for rock image classification, albeit requiring a large training set.

With advances in deep learning methods in computer vision, rock image classification research methods have shifted toward deep learning. Deep learning models generally adopt mature convolutional neural network models, such as the Visual Geometry Group Network (VGG) [26], UNet [27], Residual Network (ResNet) [28], Dense Convolutional Network (DenseNet) [29], Inception-v3 [30] and Faster Region-based Convolutional Neural Network (Faster RCNN) [31]. These models have demonstrated excellent performance in computer vision and have also achieved good classification results in rock image classification. Classification studies of rock images can be divided into two categories according to the image scale: macroscopic hand specimen rock and microscopic petrographic thin section images. While some scholars have conducted research on rock image classification from a macroscopic perspective [25,32,33,34,35], others have studied petrographic thin section image classification from a microscopic perspective [24,36,37,38,39]. For instance, Zhang et al. [33] classified granite, phyllite and breccia images based on the Inception-V3 deep learning model and transfer learning method. They used a total of 571 rock images and achieved a satisfactory overall classification effect despite the small size of the training set. Xu and Zhou [39] used the UNet model to classify and recognize pyrite, chalcopyrite, galena and sphalerite in petrographic thin section images. After data augmentation, they obtained a recognition success rate of more than 90% for each mineral. Bai et al. [40] used the VGG model to classify petrographic thin section images of dolomite, andesite, oolitic limestone, granite, lithic sandstone and quartz sandstone, with 1000 samples in each category. They achieved an accuracy rate of 82% on the test set. Polat et al. [37] used the transfer learning method to classify 1200 rock thin sections of 6 kinds of volcanic rocks based on ResNet50 and DenseNet121 models, with an average classification accuracy of more than 98%. These results demonstrate the potential of transfer learning and data augmentation in achieving good results even with small data sets. In these classification studies, deep learning models have been used to classify rocks by learning the features in rock images or petrographic thin section images, which usually contain only one rock type and one target [33,36,37,38]. To classify multitarget images, the tedious process of standardizing the images into single-target images of the same size and resolution is usually needed.

The above studies of rock and petrographic thin section images have largely focused on mineral composition identification, with few investigations into mineral textural and structural characteristics. In fact, not only the mineral composition but also the mineral textural and structural characteristics are crucial to determining the rock type and its genesis. Oolitic texture is a kind of texture that is widespread in sedimentary rocks. It is composed of small ooids (spherical or ellipsoidal particles composed of a core and a shell surrounding the core) with high porosity and permeability. Therefore, oolitic shoal reservoirs are important targets for oil and gas exploration [41,42]. However, previous studies on intelligent recognition of oolitic texture only focused on whether a petrographic thin section image has oolitic texture or not [40], and did not pay attention to the information carried by the ooids themselves, such as their size, shape, aspect ratio and sorting index, which is important for reconstructing the paleogeographic environment. Intelligent identification of these minimal units is more challenging than mineral composition identification, as it requires multitarget localization and classification, and thus meets the requirements of multicomponent intelligent recognition in petrographic thin section images.

In this study, the Faster RCNN model was applied to multitarget identification of oolitic texture on petrographic thin section images. The training and test sets were composed of sedimentary rock petrographic thin section images, which contain multiple ooids. This study has two main characteristics: (1) it transitions the research based on petrographic thin section images from simple classification to multitarget detection, making it more consistent with real situations and the research task more complex; (2) the research scale is more refined and contains more abundant information, such as the spatial, quantitative and category target information, which is of reference significance for multitarget mineral intelligent recognition on petrographic thin section images. The present study offers potential benefits to users of the model who can utilize the automatically detected ooids as a reference for their own research on oolitic texture, thereby reducing the time and effort required for manual identification and measurement. Furthermore, this study lays the groundwork for future research on oolitic texture and related fields, including mineralogy, geology and paleontology, which may lead to novel discoveries and insights.

2. Materials and Methods

The microscopic image data of oolitic texture are from the “Micro image data set of some rock-forming minerals, typical metamorphic minerals and oolitic thin sections” of the public database ScienceDB [43]. Each microscopic image contains multiple ooids.

The object detection model can accomplish both localization and classification tasks and, thus, multitarget recognition. The object detection model generates a large number of anchor boxes on the original image by direct or indirect methods and calculates the intersection over union (IOU) between these boxes and the ground truth box, as shown in Figure 1. Positive and negative samples are determined according to the threshold of the IOU ratio, and the positive and negative samples are classified and trained by regression. Therefore, the object detection task is decomposed into classification and regression tasks [44,45,46].

Object detection models can be divided into two categories: one-stage and two-stage models. One-stage models are generally faster, but two-stage object detection models have shown higher accuracy and better small target detection capabilities than one-stage models [44]. This study applies the classic Faster RCNN model in the two-stage approach to multitarget identification of oolitic texture in petrographic thin section images due to the varying mineral composition and texture scales.

2.1. Faster RCNN

The Faster RCNN model, as depicted in Figure 2, consists of a feature extraction backbone network, region proposal network (RPN), region of interest (ROI) pooling and classification and regression. The calculation process can be divided into the following four steps [31]:

Backbone network: The backbone network comprises a series of convolution, batch normalization, activation function and pooling operations. It is used to extract image features and generate feature maps.
Region proposal network (RPN): A large number of anchor boxes are indirectly generated on an image, and the IOU between each anchor box and the ground truth box is calculated. The anchor boxes are labeled as positive or negative samples according to the IOU threshold, and these samples are used to train the classification and box regression branches. The final output is 300 relatively accurate region proposals (ROIs).
ROI pooling: The region proposals output in the previous step are projected onto the feature map. The feature maps of these region proposals are of different sizes, so they are standardized to feature maps of the same size, 7 × 7, which makes it convenient to connect the subsequent fully connected layer.
Classification and regression: More accurate classification and regression are performed on the feature maps output in the previous step.
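
The ROI pooling step described above can be illustrated with a minimal pure-Python sketch. It assumes a single-channel feature map stored as a nested list and an ROI already projected to feature-map cells; the function names `bin_edges` and `roi_max_pool` and the bin-splitting rule are illustrative assumptions, not the implementation used in this study.

```python
def bin_edges(start, length, k, out_size):
    """Return the half-open cell range [lo, hi) of bin k along one axis."""
    lo = start + (k * length) // out_size
    # Ceiling division for the upper edge; force at least one cell per bin.
    hi = start + max(-(-(k + 1) * length // out_size), (k * length) // out_size + 1)
    return lo, hi


def roi_max_pool(feature_map, roi, out_size=7):
    """Max-pool one ROI of a 2D feature map into an out_size x out_size grid.

    feature_map: list of rows (single channel, hypothetical layout).
    roi: (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    width, height = x2 - x1, y2 - y1
    pooled = []
    for i in range(out_size):
        r0, r1 = bin_edges(y1, height, i, out_size)
        row = []
        for j in range(out_size):
            c0, c1 = bin_edges(x1, width, j, out_size)
            # Keep the strongest activation inside the bin.
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Whatever the size of the ROI, the output is always a 7 × 7 grid, which is what allows a fixed-size fully connected layer to follow.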

2.1.1. VGG16

During the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [47], the Visual Geometry Group (VGG) at the University of Oxford introduced the VGG series of network architectures [26]. The primary innovation of the VGG series is the use of stacks of consecutive small-kernel convolutions in place of a single large-kernel convolution. This approach increases the network depth while preserving the receptive field.
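
The receptive-field argument can be checked with a short calculation. The sketch below assumes stride-1 convolutions with equal input and output channel counts and ignores biases; the helper names are hypothetical.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (in pixels per side) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) of num_layers kernel x kernel convolutions
    whose input and output both have `channels` channels."""
    return num_layers * kernel * kernel * channels * channels


# Two stacked 3 x 3 layers see the same 5 x 5 window as one 5 x 5 layer,
# and three stacked 3 x 3 layers see the same 7 x 7 window as one 7 x 7 layer.
two_small = stacked_receptive_field(2)    # 5
three_small = stacked_receptive_field(3)  # 7
```

Under these assumptions, two 3 × 3 layers with 64 channels use 73,728 weights, compared with 102,400 for a single 5 × 5 layer, so the stacked form is deeper and also has fewer parameters.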

The VGG models are characterized by their simple and elegant structure, as well as their impressive performance. A VGG model comprises five VGG-blocks (VGG-block1 to VGG-block5), five max pooling layers and three fully connected (FC) layers. The blocks differ in the number of convolutional layers they contain, and VGG11, VGG13, VGG16 and VGG19 are obtained from different combinations of these blocks. Among the VGG series, VGG16 is the most classic. In this study, we selected VGG16, which uses a 3 × 3 convolution kernel size, as a backbone for feature extraction.

Figure 3 illustrates the architecture of VGG16, which comprises five VGG-blocks. VGG-block1 and VGG-block2 each consist of two convolution layers that employ a 3 × 3 convolution kernel size. In contrast, VGG-block3, VGG-block4 and VGG-block5 each contain three convolution layers that also utilize a 3 × 3 convolution kernel size. Subsequently, three fully connected layers are connected to the output.

2.1.2. ResNet50

The residual network (ResNet) is a classical convolutional neural network model proposed by He et al. [28]. It won the classification, localization and detection tasks of ILSVRC 2015. The residual block in the network structure is the primary feature that distinguishes ResNet from previous convolutional neural networks. The introduction of the residual block makes it possible to train much deeper convolutional neural networks, with dozens or hundreds of layers [28].

In a standard convolutional network, the input x is transformed by the convolutional layers into the output y, so the mapping that the network needs to learn is F, where y = F(x). In a residual network, the input x is additionally carried along a shortcut connection (where it may need to be transformed) and added to the output of the convolutional layers, giving y = F(x) + x. The mapping that the convolutional layers need to learn is therefore F(x) = y − x, the difference between output and input, and the network structure corresponding to F(x) is referred to as the residual building block. Compared with the standard convolutional network, the residual network adds shortcut connections, as shown in Figure 4.
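
A toy sketch of the shortcut connection follows, using plain Python lists in place of feature maps. The function `residual_block` and its arguments are hypothetical; `residual_fn` stands in for the convolutional layers F.

```python
def residual_block(x, residual_fn, shortcut_fn=None):
    """Return y = F(x) + shortcut(x), computed element-wise.

    residual_fn: stands in for the convolutional layers F (hypothetical).
    shortcut_fn: optional transform on the shortcut; identity when None.
    """
    shortcut = x if shortcut_fn is None else shortcut_fn(x)
    fx = residual_fn(x)
    return [f + s for f, s in zip(fx, shortcut)]


# If the layers learn F(x) = 0, the block reduces to the identity mapping.
y = residual_block([2.0, 4.0], lambda v: [0.0 for _ in v])
```

The identity case shows why very deep stacks remain trainable: a block that has nothing useful to add only needs to drive F toward zero.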

ResNet is built by stacking one convolutional layer, four stages of residual modules and one fully connected layer. Common ResNets are ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152. ResNet50 is used in this study, as shown in Figure 5. ResNet50 contains five blocks (Conv1 to Conv5). Conv1 has only one convolution layer, with a 7 × 7 convolution kernel. Conv2 contains 3 residual modules; Conv3 contains 4 residual modules; Conv4 contains 6 residual modules; and Conv5 contains 3 residual modules. Each residual module has a shortcut connection and contains 3 convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1. The backbone network of the Faster RCNN model applied in this study is based on ResNet50.
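
The "50" in ResNet50 follows from the module counts given above. The short tally below uses hypothetical names and counts only the layers named in the text; convolutions on shortcut connections are assumed not to be counted.

```python
# Residual modules per stage, as listed in the text.
RESNET50_STAGES = {"conv2": 3, "conv3": 4, "conv4": 6, "conv5": 3}
CONVS_PER_MODULE = 3  # 1 x 1, 3 x 3 and 1 x 1 convolutions


def count_weight_layers(stages=RESNET50_STAGES, convs_per_module=CONVS_PER_MODULE):
    """Count Conv1 + all residual-module convolutions + the final FC layer."""
    conv1 = 1
    fully_connected = 1
    return conv1 + convs_per_module * sum(stages.values()) + fully_connected


total_layers = count_weight_layers()  # 1 + 3 * 16 + 1
```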

2.1.3. Transfer Learning

Deep learning models have been shown to rely heavily on large data sets [48]. When the data set is small, deep neural networks with a large number of parameters (e.g., ResNet50) cannot be fully trained, which leads to overfitting and other problems. To avoid overfitting and enhance the generalization ability of the model, transfer learning and data augmentation are usually adopted. Transfer learning generally uses network weights that have been trained on other large data sets as the initial weights for a new task. Data augmentation generates additional image data by translation, flipping, cropping and brightness adjustment of the original images, which are then used to supplement the training data set [48,49].
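
For object detection, augmentation has to transform the annotated boxes together with the pixels. The sketch below illustrates this for two of the operations named above. It assumes a grayscale image stored as a nested list of 0 to 255 values and boxes given as (x1, y1, x2, y2) pixel-edge coordinates; the function names are hypothetical and this is not the augmentation code used in this study.

```python
def hflip(image, boxes):
    """Flip an image left to right and mirror its bounding boxes.

    image: list of pixel rows; boxes: list of (x1, y1, x2, y2) tuples.
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # Mirroring swaps the roles of the left and right box edges.
    new_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes


def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, clipping to the 0..255 range.

    Boxes are unchanged by this operation, so they are not passed in.
    """
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]
```

Geometric operations (flipping, translation, cropping, rotation) change box coordinates, whereas photometric operations (brightness, noise) leave the annotations untouched.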

2.1.4. Average Precision

In the binary decision problem, the classification decision is to classify the samples as positive or negative. Four results are generated by comparing the predicted value with the real value. The decision result of the classifier can be represented by a confusion matrix [50], as shown in Table 1.

T indicates that the sample is correctly classified, F indicates that the sample is incorrectly classified, P denotes the sample is classified as positive and N denotes the sample is classified as negative.

The meanings of TP, FP, FN and TN are as follows:

TP (true positive): A positive sample is correctly classified as positive.

FP (false positive): A negative sample is incorrectly classified as positive.

FN (false negative): A positive sample is incorrectly classified as negative.

TN (true negative): A negative sample is correctly classified as negative.

Recall: the proportion of all actual positive samples that are correctly classified as positive.

Recall = \frac{TP}{TP + FN} \qquad (1)

Precision: the proportion of all samples classified as positive that are actually positive.

Precision = \frac{TP}{TP + FP} \qquad (2)
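
Equations (1) and (2) translate directly into code. In the sketch below the function name and the convention of returning 0.0 for an empty denominator are assumptions made for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    An empty denominator yields 0.0 (a convention assumed here).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Example: 8 ooids detected correctly, 2 false detections, 4 ooids missed.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
```

Note that TN does not appear in either formula, which suits object detection, where the number of true negative boxes is not well defined.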

Intersection-Over-Union (IOU): IOU is the ratio of the intersection to the union of the predicted box and the ground truth box. In region proposal networks, samples with an IOU greater than 0.7 are considered positive, while those with an IOU less than 0.3 are considered negative. The IOU threshold is crucial for calculating the average precision (AP), which is a key metric for evaluating the quality of object detection. In the evaluation of the validation set and test set, an IOU threshold of 0.5 is utilized to calculate the AP.
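
The IOU computation and the 0.7 and 0.3 thresholds can be sketched as follows. Boxes are assumed to be (x1, y1, x2, y2) tuples; the function names are hypothetical, and treating anchors that fall between the two thresholds as ignored is an assumption of this simplified sketch rather than a detail stated above.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored, assumed here)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1
```

At evaluation time the same `iou` function would be compared against the 0.5 threshold to decide whether a predicted box counts as a true positive.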

Average precision (AP): the average of the precision over different recall levels. This index is used to evaluate the performance of ooid detection in this study. The prediction boxes in the test set were sorted by confidence from high to low, and the precision and recall were calculated cumulatively. The precision-recall (PR) curve was drawn with recall as the abscissa and precision as the ordinate, and the area enclosed between the smoothed PR curve and the coordinate axes was calculated by integration as the AP value of this category.
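
One common way to carry out this calculation is sketched below. It assumes that each detection has already been marked as a true or false positive (for example by matching at an IOU threshold of 0.5) and smooths the PR curve with a monotone envelope before integrating; whether this exact interpolation matches the one used in this study is an assumption, and the function name is hypothetical.

```python
def average_precision(detections, num_gt):
    """Area under the smoothed precision-recall curve.

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground truth boxes in the evaluated set.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Smooth the curve: each precision becomes the maximum precision
    # found at that recall level or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall as a sum of rectangles.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, with two ground truth ooids and three detections ranked as true, false, true, the smoothed curve gives AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.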

2.2. Experiments

The ooids in the original microscopic images were annotated with LabelImg [51], an open-source image annotation tool written in Python, and the annotated images were used for training and testing the Faster RCNN model. The code was modified from source code obtained from GitHub [52]. To improve computational efficiency, all training was completed on a GPU. To accelerate the convergence of the model, weights pre-trained on the PASCAL Visual Object Classes [53] data set were used as the initial weights. For the ablation experiments, VGG16 and ResNet50 were each employed as the backbone of the Faster RCNN model. The hardware and software used in the experiment are shown in Table 2.

Under ideal conditions, the performance of a model improves as the size of the data set increases until it saturates, beyond which it no longer improves. Similarly, performance improves with model complexity up to an optimum, and further increases in complexity lead to a decrease in performance [54]. The size of the data set and the complexity of the model are therefore closely linked to the model's performance: complex models require larger data sets to fit, and they tend to achieve better performance. Warden [55] suggests that for image classification tasks in deep learning, a general rule of thumb is to have 1000 images per class. However, this requirement can be reduced significantly by using transfer learning and introducing a pre-trained model in a classification task. In the same domain, ten or twenty new categories may suffice. However, if the categories are not in the same field, more training images are necessary.

The original data set contains a total of 103 images, of which 56 single-polarized (plane-polarized light) microscopic images were selected as experimental data. We chose a random sampling method to reduce human intervention while keeping the procedure simple and easy to implement. Ninety percent of the images were randomly selected as the training set (81%) and validation set (9%), and the remaining 10% were used as the test set. The training set images were augmented, and the data set was expanded to 315 images by cropping, translation, rotation, flipping, adding random noise and adjusting brightness. Figure 6 shows the comparison between the original image and the image after data augmentation. After training, the weights with the smallest loss value were selected and evaluated on the test set to verify the model's performance.
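
A random split in these proportions could be implemented along the following lines. The function name, the fixed seed and the rounding rule are assumptions; the exact per-subset image counts used in this study are not stated, so the counts produced here are only illustrative.

```python
import random


def split_dataset(items, train=0.81, val=0.09, seed=0):
    """Shuffle items and split them into train, validation and test lists.

    The test set receives whatever remains after train and validation.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train_set, val_set, test_set


# Hypothetical file names standing in for the 56 selected images.
image_names = [f"ooid_{i:03d}.jpg" for i in range(56)]
train_set, val_set, test_set = split_dataset(image_names)
```

Augmentation should be applied only after the split, and only to the training subset, so that augmented copies of a test image never leak into training.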

3. Results

The experimental results of this study showed that the multitarget detection of ooids based on Faster RCNN has good performance. In the ablation test of the Faster RCNN model with VGG16 and ResNet50 as backbones, the AP values for VGG16 and ResNet50 in the test set were 87.83% (Figure 7) and 92.25% (Figure 8), respectively. The performance of ResNet50 as the backbone of the Faster RCNN model in the test set was analyzed. Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 show the test results of ooid detection in the test set using ResNet50 as the backbone. It was observed that when the features of ooids in the microscopic image were distinct from the background, the detection performance was good, and the ooids were accurately located (Figure 9 and Figure 10). Conversely, when the composition of the image was relatively complex and the difference between the ooids and the background was not obvious, misclassification occurred (Figure 11, Figure 12 and Figure 13), and the uneven distribution of samples in the training set was found to have a significant impact on the detection performance, with cases of missed detection occurring when the number of similar samples in the training set and the test set was small (Figure 14).

In Figure 9, the detection model revealed a large number of ooids with moderate size proportions, small differences and uniform distribution, which were significantly distinct from the background. The detection result was highly intuitive, enabling the accurate identification of complete ooids in the image, as well as those partially displayed ooids located at the edge.

In Figure 10, the model detected a large number of ooids with moderate size proportions, small differences and uniform distribution, which were distinct from the background. The detection model yielded satisfactory results, with complete ooids near the center of the image correctly identified. However, a few misclassifications occurred at the image's edges.

In Figure 11, the composition complexity of components is low, and oolitic features are relatively obvious, while in Figure 12 and Figure 13, the oolitic texture features are more complex, and the size of oolitic particles differs greatly from the background, although not as much as in Figure 9 and Figure 10. The main ooids were correctly detected in all three figures, and large ooids at the edge were localized and detected.

The features of complete ooids in Figure 14 are not as obvious as those in Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13, and the contrast with the background is not as pronounced, leading to a potential misclassification. Nevertheless, the ooids in Figure 11, Figure 12 and Figure 13 are also highly similar to the background. Although misclassification may occur in the image edge, the main ooids were still detected and identified without error. Comparing Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 revealed that the detectability of ooids is not solely dependent on the distinctiveness of their features and the contrast with the background. A comparison of the original training data set shows that the number of samples with high similarity to the test in Figure 14 in the training set is small, suggesting that the distribution of training samples can have a significant impact on the detection performance.

4. Discussion

The potential causes of misclassification in detection can be attributed to a variety of factors, including the following:

The complexity of image components and the lack of distinction between target texture and background features can lead to misclassification in detection. This is exemplified by the petrographic thin section image classification of Bai et al. [40], where the similarity between oolitic limestone and dolomite thin sections was so high that misclassification occurred. To address this issue, incorporating such samples into the training process may enable the model to learn the differences between them and thus reduce misjudgment.
The uneven distribution of training image samples can have a significant impact on detection. This effect was demonstrated by the training set analysis, which revealed that the training set similar to Figure 9, Figure 10 and Figure 11 accounted for 54%, the training set similar to Figure 12 and Figure 13 accounted for 32%, and the training set similar to Figure 14 accounted for 14%. This skewed distribution of training set samples indicates that the model may not have sufficient learning experience, leading to better performance on simple images than on complex images.
The limited size of the training image data sets can lead to insufficient generalization of the model. Data augmentation has been employed to address the issue of small data sets, thus improving the model's generalization capacity [38,39]. After data augmentation, the overall result was promising, which shows the potential of this method. With the increase in data sets, the generalization ability of the model can be further enhanced, thus improving the detection performance.

When the number of samples is large, deep learning models can be constructed from the very beginning and yield satisfactory classification results. Conversely, when the number of samples is insufficient, transfer learning methods such as appropriate initial weight selection can be employed to expedite the model convergence, and the data set can be augmented and supplemented through data augmentation, ultimately leading to satisfactory classification results. These results suggest that complex deep learning models can be applied to small data sets with the help of appropriate transfer learning and data augmentation.

In this study, a deep learning model was applied to detect multiple targets of ooids in microscopic petrographic thin section images. Object detection was found to be more complex and difficult than classification, with an average precision of 92.25% using ResNet50 as the backbone. Most of the prediction targets had high confidence. However, the uneven distribution of samples in the training set and the component complexity of microscopic image samples both impacted the detection performance, with the former having a greater influence. The results of this experiment suggest that ooid detection methods may provide useful insights for multitarget recognition on microscopic petrological thin section images. Our future work will focus on applying these technologies to intelligently recognize the other mineral textures and structures that reflect the genesis and formation environment of rocks.

5. Conclusions

By analyzing the mineral composition, texture and structural features, petrographic thin section images can be employed to classify rock types, elucidate petrogenesis, invert paleoenvironments and facilitate other relevant studies. In this study, Faster RCNN object detection algorithms were applied to identify multiple objects in petrographic thin section images using oolitic texture images from the public database ScienceDB. Transfer learning was used to accelerate the model convergence, data augmentation was used to enhance the generalization ability of the model, and GPU was utilized to improve the computational efficiency. The following conclusions and understandings are drawn from the experimental research.

The AP value of the ooids test set using ResNet50 as the backbone was 92.25%, indicating good overall detection performance. This object detection model was found to be robust and generalizable, as it was able to identify both complete ooids in the middle of the image and partial ooids at the edge.
The uneven distribution of samples in the training set and the complex composition of microscopic images affected the detection, with the former having a greater effect. Deep learning was used to learn features from the training set and make predictions on the test set, but the uneven distribution of samples caused the distribution of the learned features to deviate, resulting in missed detection in the prediction process. The complexity of the microscopic image composition, with a small difference between the target and the background, also contributed to misclassification and thus affected the detection performance, although to a lesser extent than the uneven distribution.
This study sought to transition from the classification of petrographic thin section images to multitarget detection, incorporating richer content such as spatial, quantitative and categorical target information, as well as more complex tasks. The research scale was further refined, transitioning from rocks to the textures and structures within them, providing a reference for multitarget intelligent recognition on petrographic thin section images.

Author Contributions

Conceptualization, H.W. and Y.Z.; methodology, H.W. and Y.Z.; software, H.W.; validation, H.W. and W.C.; investigation, H.W., W.C. and W.Y.; resources, Y.Z. and W.Y.; data curation, H.W. and W.C.; writing—original draft preparation, H.W. and W.C.; writing—review and editing, Y.Z. and P.Y.; visualization, H.W. and W.C.; supervision, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by National Key Research and Development Program of China (Grant No. 2022YFF0801201), National Natural Science Foundation of China (Grant No. U1911202) and the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B1111370001).

Data Availability Statement

Data available on request from the authors.

Acknowledgments

We express our utmost gratitude to Fan Xiao and Jin Hong from Sun Yat-sen University. Furthermore, we extend our sincere appreciation to Renguang Zuo, the esteemed guest editor of this special issue, for his exceptional organization and guidance. We would also like to express our appreciation to the two anonymous reviewers who have helped us improve the paper, as well as section managing editor and other editors for their assistance throughout the publication process.

Conflicts of Interest

The authors declare that they have no conflict of interest in relation to this work, including any commercial or associative interest connected with the work submitted.

References

1. Zhou, Y.Z.; Zuo, R.G.; Liu, G.; Yuan, F.; Mao, X.C.; Guo, Y.J.; Xiao, F.; Liao, J.; Liu, Y.P. The great-leap-forward development of mathematical geoscience during 2010–2019: Big data and artificial intelligence algorithm are changing mathematical geoscience. Bull. Mineral. Petrol. Geochem. 2021, 40, 556–573.
2. Zhou, Y.Z.; Chen, S.; Zhang, Q.; Xiao, F.; Wang, S.G.; Liu, Y.P.; Jiao, S.T. Advances and prospects of big data and mathematical geoscience. Acta Petrol. Sin. 2018, 34, 255–263.
3. Liu, C.; Chen, J.P.; Li, S.; Qin, T. Construction of Conceptual Prospecting Model Based on Geological Big Data: A Case Study in Songtao-Huayuan Area, Hunan Province. Minerals 2022, 12, 669.
4. Mao, X.C.; Liu, P.; Deng, H.; Liu, Z.K.; Li, L.J.; Wang, Y.S.; Ai, Q.X.; Liu, J.X. A Novel Approach to Three-Dimensional Inference and Modeling of Magma Conduits with Exploration Data: A Case Study from the Jinchuan Ni–Cu Sulfide Deposit, NW China. Nat. Resour. Res. 2023, 32, 901–928.
5. Deng, H.; Zhou, S.F.; He, Y.; Lan, Z.D.; Zou, Y.H.; Mao, X.C. Efficient Calibration of Groundwater Contaminant Transport Models Using Bayesian Optimization. Toxics 2023, 11, 438.
6. Zeng, L.; Li, T.B.; Huang, H.T.; Zeng, P.; He, Y.X.; Jing, L.H.; Yang, Y.; Jiao, S.T. Identifying Emeishan basalt by supervised learning with Landsat-5 and ASTER data. Front. Earth Sci. 2023, 10, 1097778.
7. Wang, J.; Zhou, Y.Z.; Xiao, F. Identification of multi-element geochemical anomalies using unsupervised machine learning algorithms: A case study from Ag-Pb-Zn deposits in north-western Zhejiang, China. Appl. Geochem. 2020, 120, 104679.
8. Zuo, R.G. Machine Learning of Mineralization-Related Geochemical Anomalies: A Review of Potential Methods. Nat. Resour. Res. 2017, 26, 457–464.
9. Wu, G.P.; Chen, G.X.; Cheng, Q.M.; Zhang, Z.J.; Yang, J. Unsupervised Machine Learning for Lithological Mapping Using Geochemical Data in Covered Areas of Jining, China. Nat. Resour. Res. 2021, 30, 1053–1068.
10. Yu, X.T.; Xiao, F.; Zhou, Y.Z.; Wang, Y.; Wang, K.Q. Application of hierarchical clustering, singularity mapping, and Kohonen neural network to identify Ag-Au-Pb-Zn polymetallic mineralization associated geochemical anomaly in Pangxidong district. J. Geochem. Explor. 2019, 203, 87–95.
11. Wu, B.C.; Li, X.H.; Yuan, F.; Li, H.; Zhang, M.M. Transfer learning and siamese neural network based identification of geochemical anomalies for mineral exploration: A case study from the Cu-Au deposit in the NW Junggar area of northern Xinjiang Province, China. J. Geochem. Explor. 2022, 232, 106904.
12. Li, H.; Li, X.H.; Yuan, F.; Jowitt, S.M.; Zhang, M.M.; Zhou, J.; Zhou, T.F.; Li, X.L.; Ge, C.; Wu, B.C. Convolutional neural network and transfer learning based mineral prospectivity modeling for geochemical exploration of Au mineralization within the Guandian-Zhangbaling area, Anhui Province, China. Appl. Geochem. 2020, 122, 104747.
13. Qin, Y.Z.; Liu, L.M. Quantitative 3D Association of Geological Factors and Geophysical Fields with Mineralization and Its Significance for Ore Prediction: An Example from Anqing Orefield, China. Minerals 2018, 8, 300.
14. Zuo, R.G.; Kreuzer, O.P.; Wang, J.; Xiong, Y.H.; Zhang, Z.J.; Wang, Z.Y. Uncertainties in GIS-Based Mineral Prospectivity Mapping: Key Types, Potential Impacts and Possible Solutions. Nat. Resour. Res. 2021, 30, 3059–3079.
15. Liu, L.M.; Cao, W.; Liu, H.S.; Ord, A.; Qin, Y.Z.; Zhou, F.H.; Bi, C.X. Applying benefits and avoiding pitfalls of 3D computational modeling-based machine learning prediction for exploration targeting: Lessons from two mines in the Tongling-Anqing district, eastern China. Ore Geol. Rev. 2022, 142, 104712.
16. Wang, Z.Y.; Yin, Z.; Caers, J.; Zuo, R.G. A Monte Carlo-based framework for risk-return analysis in mineral prospectivity mapping. Geosci. Front. 2020, 11, 2297–2308.
17. Lu, Y.; Liu, L.M.; Xu, G.J. Constraints of deep crustal structures on large deposits in the Cloncurry district, Australia: Evidence from spatial analysis. Ore Geol. Rev. 2016, 79, 316–331.
18. Jia, L.Q.; Yang, M.; Meng, F.; He, M.Y.; Liu, H.M. Mineral Photos Recognition Based on Feature Fusion and Online Hard Sample Mining. Minerals 2021, 11, 1354.
19. Wu, B.K.; Ji, X.H.; He, M.Y.; Yang, M.; Zhang, Z.C.; Chen, Y.; Wang, Y.Z.; Zheng, X.Q. Mineral Identification Based on Multi-Label Image Classification. Minerals 2022, 12, 1338.
20. Su, C.; Xu, S.J.; Zhu, K.Y.; Zhang, X.C. Rock classification in petrographic thin section images based on concatenated convolutional neural networks. Earth Sci. Inform. 2020, 13, 1477–1484.
21. Ma, H.; Han, G.Q.; Peng, L.; Zhu, L.Y.; Shu, J. Rock thin sections identification based on improved squeeze-and-Excitation Networks model. Comput. Geosci. 2021, 152, 104780.
22. Singh, N.; Singh, T.N.; Tiwary, A.; Sarkar, K.M. Textural identification of basaltic rock mass using image processing and neural network. Comput. Geosci. 2010, 14, 301–310.
23. Flügel, E.; Munnecke, A. Microfacies of Carbonate Rocks: Analysis, Interpretation and Application; Springer: Berlin, Germany, 2010.
24. Mlynarczuk, M.; Gorszczyk, A.; Slipek, B. The application of pattern recognition in the automatic classification of microscopic rock images. Comput. Geosci. 2013, 60, 126–133.
25. Shu, L.; McIsaac, K.; Osinski, G.R.; Francis, R. Unsupervised feature learning for autonomous rock image classification. Comput. Geosci. 2017, 106, 10–17.
26. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
27. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Part III 18, Munich, Germany, 5–9 October 2015; pp. 234–241.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
29. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
30. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
31. Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
32. Liu, X.B.; Wang, H.Y.; Jing, H.D.; Shao, A.L.; Wang, L.C. Research on Intelligent Identification of Rock Types Based on Faster R-CNN Method. IEEE Access 2020, 8, 21804–21812.
33. Zhang, Y.; Li, M.C.; Han, S. Automatic identification and classification in lithology based on deep learning in rock images. Acta Petrol. Sin. 2018, 34, 333–342.
34. Liu, C.Z.; Li, M.C.; Zhang, Y.; Han, S.; Zhu, Y.Q. An Enhanced Rock Mineral Recognition Method Integrating a Deep Learning Model and Clustering Algorithm. Minerals 2019, 9, 516.
35. Sun, Y.Q.; Chen, J.P.; Yan, P.B.; Zhong, J.; Sun, Y.X.; Jin, X.Y. Lithology Identification of Uranium-Bearing Sand Bodies Using Logging Data Based on a BP Neural Network. Minerals 2022, 12, 546.
36. Cheng, G.J.; Guo, W.H. Rock images classification by using deep convolution neural network. J. Phys. Conf. Ser. 2017, 887, 012089.
37. Polat, O.; Polat, A.; Ekici, T. Automatic classification of volcanic rocks from thin section images using transfer learning networks. Neural Comput. Appl. 2021, 33, 11531–11540.
38. Ran, X.J.; Xue, L.F.; Zhang, Y.Y.; Liu, Z.Y.; Sang, X.J.; He, J.X. Rock Classification from Field Image Patches Analyzed Using a Deep Convolutional Neural Network. Mathematics 2019, 7, 755.
39. Xu, S.T.; Zhou, Y.Z. Artificial intelligence identification of ore minerals under microscope based on deep learning algorithm. Acta Petrol. Sin. 2018, 34, 3244–3252.
40. Bai, L.; Wei, X.; Liu, Y.; Wu, C.; Chen, L. Rock thin section image recognition and classification based on VGG model. Geol. Bull. China 2019, 38, 2053–2058.
41. Tan, X.C.; Zhao, L.Z.; Luo, B.; Jiang, X.F.; Cao, J.; Liu, H.; Li, L.; Wu, X.B.; Nie, Y. Comparison of basic features and origins of oolitic shoal reservoirs between carbonate platform interior and platform margin locations in the Lower Triassic Feixianguan Formation of the Sichuan Basin, southwest China. Petrol. Sci. 2012, 9, 417–428.
42. Hollis, C.; Lawrence, D.A.; de Periere, M.D.; Al Darmaki, F. Controls on porosity preservation within a Jurassic oolitic reservoir complex, UAE. Mar. Petrol. Geol. 2017, 88, 888–906.
43. Zhu, S.S.; Yang, W.Y.; Lu, B.B.; Huang, G.Y.; Hou, G.S.; Wei, S.P.; Zhang, Y.L. Micro image data set of some rock forming minerals, typical metamorphic minerals and oolitic thin sections. Sci. Data Bank 2020.
44. Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A survey of deep learning-based object detection. IEEE Access 2019, 7, 128837–128868.
45. Oksuz, K.; Cam, B.C.; Kalkan, S.; Akbas, E. Imbalance problems in object detection: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3388–3415.
46. Wu, X.W.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
47. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.H.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
48. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
49. Kukačka, J.; Golkov, V.; Cremers, D. Regularization for deep learning: A taxonomy. arXiv 2017, arXiv:1710.10686.
50. Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
51. Tzutalin. LabelImg. Git Code. 2015. Available online: https://github.com/heartexlabs/labelImg (accessed on 10 March 2022).
52. Bubbliiiing. Faster-RCNN-Pytorch. Git Code. 2022. Available online: https://github.com/bubbliiiing/faster-rcnn-pytorch (accessed on 12 March 2022).
53. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
54. Zhu, X.X.; Vondrick, C.; Fowlkes, C.C.; Ramanan, D. Do We Need More Training Data? Int. J. Comput. Vis. 2016, 119, 76–92.
55. Warden, P. How Many Images Do You Need to Train a Neural Network? Pete Warden’s Blog. 2017. Available online: https://petewarden.com/2017/12/14/how-many-images-do-you-need-to-train-a-neural-network (accessed on 10 June 2023).

Figure 1. Schematic of the anchor box and ground truth box (The underlying image was obtained from the public database ScienceDB [43]).
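
The overlap between an anchor box and a ground truth box, as drawn schematically in Figure 1, is commonly quantified by the intersection over union (IoU). The following is a minimal sketch of that calculation and not the code used in this study; the function name `iou` and the `(x_min, y_min, x_max, y_max)` pixel box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is assumed to be (x_min, y_min, x_max, y_max) in pixels.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate boxes with zero total area are treated as non-overlapping.
    return inter / union if union > 0 else 0.0
```

For example, an anchor covering `(0, 0, 10, 10)` and a ground truth box covering `(5, 0, 15, 10)` share half of each box, giving an IoU of one third.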

Figure 2. Architecture of the Faster RCNN framework.

Figure 3. VGG16 Architecture.

Figure 4. (a) Convolution module and (b) residual block.

Figure 5. ResNet50 Architecture (the symbol “× number” denotes the number of blocks stacked).

Figure 6. Comparison between the original image and the augmented data. (a) is the original image (from the public database ScienceDB [43]); (b) is the flipped image; (c) is the rotated image; (d) is the image after brightness adjustment; (e) is the image after adding random noise; (f) is the cropped image.

Figure 7. AP of ooids detection using VGG16 as the backbone.

Figure 8. AP of ooids detection using ResNet50 as the backbone.

Figure 9. Ooids image with correct detection results.

Figure 10. Misclassification of ooids at the image’s edges.

Figure 11. Detection results of ooids with low composition complexity.

Figure 12. Detection results of ooids with high composition complexity.

Figure 13. Detection results of ooids with moderate composition complexity.

Figure 14. Poor detection results of ooids.

Table 1. Confusion matrix table.

	Actual value: True	Actual value: False
Predicted value: Positive	TP	FP
Predicted value: Negative	FN	TN
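
The entries of Table 1 lead directly to precision, TP/(TP + FP), and recall, TP/(TP + FN), and the AP summarizes the precision-recall curve obtained by ranking detections by confidence. The sketch below illustrates these quantities under stated assumptions: the function names are hypothetical, detections are assumed to be pre-labelled as true or false positives, and the AP uses all-point interpolation in the manner of later Pascal VOC evaluations, which may differ from the exact protocol used in this study.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts (Table 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(is_true_positive, n_ground_truth):
    """Area under the interpolated precision-recall curve.

    is_true_positive: list of bools, one per detection, already sorted
        by descending confidence (assumed input format).
    n_ground_truth: number of annotated objects (TP + FN).
    """
    if n_ground_truth == 0:
        return 0.0

    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)

    # Replace each precision by the maximum precision at any higher recall,
    # so the curve is non-increasing (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

In this formulation a missed detection adds to FN and lowers recall, whereas a false detection adds to FP and lowers precision; both reduce the AP.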

Table 2. Hardware and software configuration.

Hardware/Software	Series/Version
CPU	Intel Core i7-10700KF @ 3.8 GHz
GPU	NVIDIA GeForce RTX 2080 Ti
DRAM	32 GB
SSD	1.5 TB
OS	Windows 10 Professional
Python	3.7.1
Torch	1.8.1
Torchvision	0.9.1
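
When reproducing the experiments, it may help to compare a local environment against the software versions in Table 2. The following is a minimal sketch of such a check, not part of the original workflow; the names `EXPECTED`, `installed_versions` and `version_mismatches` are hypothetical, and build suffixes such as `+cu111` are assumed to be irrelevant for the comparison.

```python
import platform
from importlib import import_module

# Software versions listed in Table 2.
EXPECTED = {"python": "3.7.1", "torch": "1.8.1", "torchvision": "0.9.1"}


def installed_versions():
    """Collect the local Python, torch and torchvision versions.

    A package that cannot be imported is reported as None.
    """
    found = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            found[name] = import_module(name).__version__
        except ImportError:
            found[name] = None
    return found


def version_mismatches(expected, found):
    """Return {name: (expected, found)} for every version that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        # Ignore local build tags such as "+cu111" (assumption).
        have_base = have.split("+")[0] if have else None
        if have_base != want:
            mismatches[name] = (want, have)
    return mismatches
```

Calling `version_mismatches(EXPECTED, installed_versions())` returns an empty dictionary when the environment matches Table 2; any difference is listed with the expected and detected versions.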

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

