Article

Research on Lightweight Lithology Intelligent Recognition System Incorporating Attention Mechanism

Zhiyu Zhang, Heng Li, Zhen Lei, Haoshan Liu and Yifeng Zhang
1 Faculty of Land Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China
2 Faculty of Public Safety and Emergency Management, Kunming University of Science and Technology, Kunming 650093, China
3 School of Mining Engineering, Guizhou Institute of Technology, Guiyang 550003, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(21), 10918; https://doi.org/10.3390/app122110918
Submission received: 20 September 2022 / Revised: 13 October 2022 / Accepted: 22 October 2022 / Published: 27 October 2022

Abstract

Achieving high-precision detection and real-time deployment of a lithology intelligent identification system has significant engineering implications in the geotechnical, geological, water conservation, and mining disciplines. In this study, a lightweight lithology intelligent identification model is proposed to address this problem. The MobileNetV2 model is used as the basic backbone network to decrease the number of network operation parameters, and channel attention and spatial attention mechanisms are incorporated into the model to improve the network's extraction of complicated and abstract petrographic features. The model is assessed through network training, computing performance, test results, Grad-CAM interpretability analysis, and comparison tests with the Resnet101, InceptionV3, and MobileNetV2 models. The training accuracy of the proposed model is 98.59%, the training duration is 76 min, and the trained model is only 6.38 MB in size. The precision (P), recall (R), and harmonic mean (F1-score) are 89.62%, 91.38%, and 90.49%, respectively. Compared with the three competing models, the proposed model strikes a better balance between lithology recognition accuracy and speed, attends to the rock feature area more widely and uniformly, and shows strong anti-interference capability and improved robustness and generalization performance. It can be deployed in real time on client or edge devices and has practical promotion value.

1. Introduction

Rapid, efficient, and accurate identification of rock lithology has significant engineering value. Lithology identification has long been a research focus in resource science, geological exploration, and geotechnical engineering both in China and abroad [1,2]. Traditional and contemporary lithology identification techniques include naked-eye identification [3], experimental analysis [4], thin-section identification [5], and machine learning [6,7,8,9,10,11,12,13,14], among others. Although these studies have made lithology identification more scientific and rational, flaws remain. Experimental analysis and thin-section identification require specialized equipment and instruments, the workload is substantial, and the time and cost are often prohibitive. Although mathematical statistics and machine learning have reduced the workload and time cost, they can only extract superficial characteristics of rocks, such as color and contour, and struggle to capture textural characteristics and deeper, more abstract features. Because of the complexity of rock groups and the presence of many influencing variables, certain rocks, especially metamorphic rocks, cannot be distinguished by macroscopic examination alone.
In recent years, some researchers have applied computer vision based on deep learning [15,16,17,18] and transfer learning techniques [19] to conventional domains such as geotechnical and geological engineering. As research has progressed, computer vision and transfer learning approaches have been used for tasks such as the intelligent classification of rock images. Y. J. Tan et al. [20] introduced the Xception model and transfer learning techniques to recognize lithology from a manually constructed dataset of 10 rock types, reaching a recognition accuracy of 86% and achieving good recognition results; Zhang Ye et al. [21] used the InceptionV3 model as a pre-training model to establish lithology recognition models for three rock types, including granite, micaceous rock, and breccia, with recognition accuracy greater than 85%, and developed an improved lithology recognition model; Xu Zhenhao et al. [22] used a self-supervised network to process the rock images and introduced a transfer-learning-based Resnet101 model to recognize different lithologies; Li Mingchao et al. [23] combined transfer learning with the InceptionV3 model and an SVM-based audio regression model to establish lithology recognition models for six rock types, including dolomite, slate, granite, quartzite, marble, and basalt. However, the rock datasets constructed in [21,23] lack diversity and do not refine the subclasses of the three primary rock types. In [22], the Resnet101 model achieves good recognition accuracy, but the number and volume of model parameters are large, and such models can typically only run on GPU systems with high processing capacity. In addition, deep learning networks have black-box characteristics, and most researchers do not provide model interpretability analyses, making the models difficult for non-computer experts to understand and less convincing.
Based on this, a total of 21 representative subclasses of the three major rock types are selected as the study dataset, and a lightweight lithology intelligent recognition model is proposed by combining the channel attention mechanism and the spatial attention mechanism with the MobileNetV2 model. Compared with traditional convolutional neural networks, the depthwise separable convolution and inverted residual structure of MobileNetV2 effectively address the problems of large model parameter counts and gradient vanishing or explosion caused by deep networks. The attention mechanism extracts more abstract and complex petrographic features and weakens the interference of non-petrographic features, so the model addresses cost issues such as training time and model size while maintaining recognition accuracy. The proposed model is compared and analyzed against Resnet101 [22], InceptionV3 [21], and MobileNetV2: it achieves the highest training accuracy with less training time and a smaller model size, demonstrating its superiority, and it is further evaluated on the test set of rock lithologies in the dataset. In addition, the Grad-CAM technique is used to analyze the interpretability of the proposed model, which verifies its universality and superiority and indicates promising application prospects and engineering value.

2. Rock Image Sample and Design

2.1. Dataset Source

The rock image selection criteria and the quality of the picture dataset determine the universality of the model and the upper limit of its identification accuracy; hence, the following rock category selection criteria and image sources were considered in this study. (1) To make the study dataset more widely applicable, representative subclasses of the three main rock types were chosen. Since the properties of the same type of rock vary between sources, the same rock type was also gathered from different places. (2) In the field, rocks with the same lithology are classified together despite differing geological settings; because of different formation mechanisms, some mineral components increase or decrease in content during rock formation, and metamorphic mineral components are later reordered under metamorphic conditions. To account for these differences, the igneous and metamorphic rocks in this paper were collected from several regions, but most of them do not span a long time range, so these differences do not affect the identification performance of the model. (3) To make the research dataset authoritative and authentic, a total of 1795 images of 21 rock types belonging to the three major rock classes were collected from multiple channels of the National Infrastructure of Mineral Rock and Fossil Resources for Science and Technology [24]. The collected rocks are recorded as igneous rocks (Class I): (I 1) andesitic volcanic breccia, (I 2) tuff, (I 3) granodiorite, (I 4) granite, (I 5) gabbro, (I 6) basalt, and (I 7) diorite; sedimentary rocks (Class II): (II 1) gritstone, (II 2) conglomerate, (II 3) shale, (II 4) greisen, (II 5) silicalite, (II 6) limestone, and (II 7) smokeless coal; metamorphic rocks (Class III): (III 1) amphibolite, (III 2) eclogite, (III 3) marble, (III 4) black slate, (III 5) fault breccia, (III 6) mylonite, and (III 7) skarn. To avoid hindering accurate feature extraction, images with distortion, heavy background occlusion, a high degree of degradation, or insufficient exposure of fresh rock faces were eliminated. After this elimination, the rock image collection for this study consists of 1661 photographs across the 21 categories; due to space constraints, only one image of each category is shown in Table 1.

2.2. Dataset Pre-Processing

According to the information and distribution of the rock dataset collected in Table 1, the sample size of the various rock types ranges from 32 to 155. Overfitting is often caused by an insufficient number of training samples or by large differences in sample counts between classes; therefore, following the literature [25], data expansion (enhancement) of the original dataset is necessary. The enhancement methods used in this study are horizontal flip, vertical flip, random rotation, image panning, and Gaussian noise. The enhancement effect is depicted in Figure 1 for a randomly chosen amphibolite image. A six-fold expansion enlarges the original dataset of 1661 images to 9966 images, and the number of images in each rock category after expansion is shown in Table 2. Matlab was used to convert these photos to standard-format, three-channel RGB JPEG images of 224 × 224 pixels.
In the extended dataset, 80% and 10% of the samples are randomly picked as training and validation samples, respectively, with the remaining 10% utilized as a test set to evaluate the model’s generalization ability and resilience.
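To make the augmentation procedure concrete, the following is a minimal Python/PIL sketch of the six-fold offline expansion described above (original image plus horizontal flip, vertical flip, random rotation, panning, and Gaussian noise). The authors performed these steps in Matlab; the rotation range, shift distance, and noise level used here are illustrative assumptions, not the paper's settings.

```python
# Illustrative six-fold offline augmentation (original + 5 transformed copies).
import numpy as np
from PIL import Image

def augment(img: Image.Image) -> list:
    """Return the original image plus five augmented copies (6x expansion)."""
    out = [img]
    out.append(img.transpose(Image.FLIP_LEFT_RIGHT))            # horizontal flip
    out.append(img.transpose(Image.FLIP_TOP_BOTTOM))            # vertical flip
    out.append(img.rotate(np.random.uniform(-30, 30)))          # random rotation (assumed range)
    out.append(img.transform(img.size, Image.AFFINE,
                             (1, 0, 20, 0, 1, 20)))             # panning/translation (assumed 20 px)
    noisy = np.asarray(img).astype(np.float32)
    noisy += np.random.normal(0, 10, noisy.shape)               # Gaussian noise (assumed sigma = 10)
    out.append(Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8)))
    # convert every copy to the standard three-channel 224 x 224 format
    return [im.convert("RGB").resize((224, 224)) for im in out]
```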

3. Model Introduction and Construction

3.1. MobileNetV2 Model Introduction

Google introduced the MobileNetV2 [26] network in 2018 as a lightweight convolutional neural network. Compared to conventional convolutional neural networks, the model has the following characteristics and benefits: (1) the depthwise separable (DWS) convolution has lower operational cost and fewer network parameters; (2) the proposed inverted residual structure extracts features more comprehensively as the network deepens and alleviates problems such as gradient vanishing and explosion; (3) the linear bottleneck (the last layer of the structure uses a linear layer) effectively reduces the loss of low-dimensional tensor features.
The traditional convolution is given by Equation (1), whereas the depthwise separable convolution consists of two components, depthwise convolution (DW) and pointwise convolution (PW), represented by Equations (2) and (3), respectively, and combined as in Equation (4).
Conv(W, F) = \sum_{k,l,c}^{K,L,C} W(k,l,c) \times F(k,l,c)    (1)
DConv(W, F) = \sum_{k,l}^{K,L} W(k,l) \times F(k,l)    (2)
PConv(W, F) = \sum_{c}^{C} W(c) \times F(c)    (3)
SepConv(W_p, W_d, F) = DConv(W_d, F) + PConv(W_p, F)    (4)
where W is the convolution kernel; F is the matrix of input features; K, L, and C are the dimensions of the convolution kernel W; W_d and W_p are the depthwise and pointwise convolution kernels, respectively.
According to Figure 2, assuming that the input feature map size is D_F × D_F × C_F × N_F, the convolution kernel size is D_K × D_K × C_K × N_K, and the depthwise convolution kernel and pointwise convolution kernel sizes are D_K × D_K × C_F × 1 and 1 × 1 × C_K × N_F, respectively, the computational cost P_1 of conventional convolution and the computational cost P_2 of DWS convolution can be expressed as Equations (5) and (6) [25].
P_1 = D_K \times D_K \times C_F \times N_K \times D_F \times D_F    (5)
P_2 = D_K \times D_K \times C_F \times D_F \times D_F + C_F \times N_K \times D_F \times D_F    (6)
where D_K is the convolution kernel size; D_F and C_F are the input feature map size and number of input channels, respectively; N_K is the number of convolution kernels (output channels).
Typically, the convolution kernel size is 3 × 3. Taking the ratio of Equations (6) and (5) gives Equation (7); for D_K = 3 the value of P is approximately one-ninth, demonstrating that the computation and parameter count of DWS convolution is only about one-ninth that of classic convolution. The depthwise separable convolution approach therefore significantly decreases the number of parameters, reduces reliance on computational resources, boosts speed, and lowers cost.
P = \frac{P_2}{P_1} = \frac{1}{N_K} + \frac{1}{D_K^2}    (7)
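The one-ninth ratio can be checked numerically. The sketch below evaluates Equations (5)-(7) for an assumed 112 × 112 feature map with 32 input channels and 64 convolution kernels; the specific sizes are illustrative and not taken from the paper.

```python
# Quick arithmetic check of Equations (5)-(7): standard 3x3 convolution versus
# depthwise-separable convolution for an assumed feature map / kernel configuration.
DK, DF, CF, NK = 3, 112, 32, 64                      # kernel size, map size, in channels, out channels

p1 = DK * DK * CF * NK * DF * DF                     # standard convolution, Eq. (5)
p2 = DK * DK * CF * DF * DF + CF * NK * DF * DF      # depthwise + pointwise, Eq. (6)
print(p2 / p1)   # ~0.127, matching 1/NK + 1/DK**2 = 1/64 + 1/9, i.e. roughly one-ninth
```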
The inverted residual structure introduced by the MobileNetV2 network is a further evolution of the Resnet residual structure proposed by Kaiming He et al. [27], as shown in Figure 3. In contrast to the ordinary residual connection, the inverted residual structure follows the idea of expanding first and then reducing: the layer is first expanded with a 1 × 1 convolution and activated nonlinearly with the ReLU6 function in the high-dimensional space, a depthwise convolution follows, and the last layer of the block uses a 1 × 1 convolution to reduce the dimension with a linear activation function. A residual link is created when the stride is 1, connecting the low-dimensional tensors on both sides, which makes the network layers deeper, the gradient propagation more effective, and enables the identification of deeper and more abstract characteristics.
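For readers who prefer code to diagrams, the following is a minimal PyTorch sketch of the inverted residual block just described (1 × 1 expansion with ReLU6, 3 × 3 depthwise convolution with ReLU6, 1 × 1 linear projection, and a shortcut when the stride is 1 and the channel counts match). The authors built their network in Matlab's Deep Network Designer, so this is an illustrative re-implementation rather than their code.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual block with a linear bottleneck."""
    def __init__(self, c_in, c_out, stride=1, expand=6):
        super().__init__()
        hidden = c_in * expand
        self.use_res = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, hidden, 1, bias=False),                               # 1x1 expansion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),   # 3x3 depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, c_out, 1, bias=False),                              # 1x1 linear projection
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out    # residual link only when stride = 1 and shapes match
```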

3.2. CBAM Module Introduction

CBAM is a module that combines channel attention and spatial attention to strengthen the feature extraction capability of the network through an attention mechanism [28,29]: it focuses on important features and suppresses non-essential ones. CBAM infers attention maps along two separate dimensions of the feature map, channel and spatial, and then multiplies the attention maps with the input feature map to perform adaptive feature refinement. CBAM can be seamlessly integrated into a CNN and trained end-to-end with it, while the module itself is extremely lightweight and adds minimal computation. Figure 4 depicts the channel attention and spatial attention structure of CBAM. Given an input feature map F ∈ R^(C×H×W), the procedure can be separated into two stages. (1) First, global maximum pooling and mean pooling are performed on each input channel. The two pooled one-dimensional vectors are fed into a shared fully connected layer, added, and mapped by the sigmoid function to obtain the one-dimensional channel attention M_C ∈ R^(C×1×1). The channel attention is then multiplied element-wise with the input feature map to obtain the channel-attention-adjusted feature map F′, which is passed to the spatial attention stage for further refinement. (2) Second, channel-wise maximum pooling and mean pooling are applied to F′; the two resulting two-dimensional maps are concatenated and passed through a convolution operation to obtain the two-dimensional spatial attention M_S ∈ R^(1×H×W). Finally, the spatial attention M_S is multiplied element-wise with F′ to obtain the optimized output feature map F″, in accordance with Equation (9).
F' = M_C(F) \times F    (8)
F'' = M_S(F') \times F'    (9)
where F is the input feature map; C, H, and W are the number of channels, height, and width of the input feature map; M_C is the channel attention weight; M_S is the spatial attention weight; F′ is the feature map obtained by weighting F with the channel attention module; F″ is the feature map obtained by weighting F′ with the spatial attention module.
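The two attention stages of Equations (8) and (9) can be sketched as follows in PyTorch. The reduction ratio of the shared MLP and the 7 × 7 spatial convolution kernel are common CBAM defaults and are assumptions here, since the paper does not state them.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Eqs. (8)-(9)."""
    def __init__(self, channels, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                                   # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)  # conv over [max; mean] maps

    def forward(self, f):
        b, c, h, w = f.shape
        # channel attention M_C: sigmoid(MLP(global max pool) + MLP(global mean pool)), Eq. (8)
        mc = torch.sigmoid(self.mlp(f.amax(dim=(2, 3))) + self.mlp(f.mean(dim=(2, 3))))
        f1 = f * mc.view(b, c, 1, 1)
        # spatial attention M_S: sigmoid(conv(concat of channel-wise max and mean)), Eq. (9)
        ms = torch.sigmoid(self.spatial(torch.cat(
            [f1.amax(dim=1, keepdim=True), f1.mean(dim=1, keepdim=True)], dim=1)))
        return f1 * ms
```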

3.3. Improvement of the MobileNetV2 Lithology Recognition Model Incorporating the Attention Mechanism

When deep learning is used for lithology recognition, it is crucial for the network to extract both shallow and deep features of the rocks. Some researchers have relied primarily on transfer learning: a pre-trained model is fine-tuned to learn the deep and shallow features of the lithology classification task, which reduces the amount of training data required by an order of magnitude and avoids training a new model from scratch. However, the following issues must be considered: (1) deeper and more abstract rock features, such as contour, texture, and structural features, must still be learned, because the features and morphology of some rocks are irregular; for example, similar rocks from different regions and periods may differ in external features; (2) a single lightweight MobileNetV2 CNN greatly improves recognition speed, but its accuracy falls short of models such as Resnet and InceptionV3. Therefore, in situations such as practical engineering and field exploration, where recognition speed and lithology recognition accuracy must be satisfied at the same time, conventional convolutional neural networks or transfer-learning fine-tuning alone are not well suited to the lithology recognition task.
Therefore, the MobileNetV2 network is enhanced to reconcile real-time detection with the accuracy of the rock recognition task, broaden the network's applicability to rock identification, and enable real-time deployment of the rock recognition system: (1) The channel attention module and spatial attention module of CBAM are integrated into the MobileNetV2 model in a serial connection after the second PW convolution. The channel attention module promotes the extraction of abstract characteristics such as texture and structural features of the rocks, while the spatial attention module reduces the interference of difficult backgrounds that impedes the extraction of essential information. (2) After the average pooling, a dropout layer with a dropout rate of 0.5 is added to prevent the network from overfitting. The output dimension of the fully connected layer is then set to the number of classes studied (21 in this paper), and each class of rocks is labeled and categorized accordingly.
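A minimal sketch of how these two modifications compose is given below, reusing the InvertedResidual and CBAM sketches above. Placing CBAM after the second (projection) pointwise convolution and using a dropout rate of 0.5 with a 21-way fully connected head follows the description above; the 1280-dimensional feature size of the head is an assumption carried over from the standard MobileNetV2 backbone.

```python
import torch.nn as nn

class InvertedResidualCBAM(nn.Module):
    """Inverted residual block with CBAM appended after the second (projection) PW convolution."""
    def __init__(self, c_in, c_out, stride=1, expand=6):
        super().__init__()
        self.body = InvertedResidual(c_in, c_out, stride, expand)   # from the sketch above
        self.cbam = CBAM(c_out)                                     # attention after the 2nd PW conv
        self.use_res = stride == 1 and c_in == c_out

    def forward(self, x):
        out = self.cbam(self.body.block(x))      # refine features before the residual add
        return x + out if self.use_res else out

# classification head: dropout (rate 0.5) and a 21-way fully connected layer
head = nn.Sequential(nn.Dropout(0.5), nn.Linear(1280, 21))  # 1280-dim feature size is assumed
```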
Figure 5 shows the original MobileNetV2 model, and the improved MobileNetV2 rock recognition model is depicted schematically in Figure 6. The network is composed of two parts: feature extraction and classification. The feature extraction part consists of two ordinary convolutions, seven depthwise separable convolution stages with an inverted residual structure fused with the attention mechanism, and one pooling layer. The classification part adopts a fully connected layer plus Softmax structure to classify the network output. According to Formula (10), the rocks are categorized, and the category with the highest confidence is chosen as the output category. The cross-entropy function, given in Formula (11), is used as the loss function: the smaller the cross-entropy, the closer the network output is to the true values corresponding to the labels, and the better the classification result.
S(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}    (10)
L = -\sum_{i=1}^{M} \sum_{j=1}^{N} m_{ij} \ln n_{ij}    (11)
where S(z)_j is the output confidence corresponding to the j-th class of rocks; N is the number of categories (21 classes in this paper); M is the total number of samples; m_{ij} is the label probability of sample i for category j; n_{ij} is the predicted output probability of sample i for category j.
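The following short numerical example illustrates Equations (10) and (11) for a single sample; the logits and the true class index are made up purely for illustration.

```python
# Numerical illustration of Eq. (10) (softmax confidence over the 21 rock classes)
# and Eq. (11) (cross-entropy loss) for one labeled sample with a one-hot target.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # Eq. (10), shifted for numerical stability
    return e / e.sum()

logits = np.random.randn(21)         # network outputs for one image (assumed values)
probs = softmax(logits)
true_class = 3                       # assumed label index for the example
loss = -np.log(probs[true_class])    # Eq. (11) reduced to a single sample
print(probs.argmax(), loss)
```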

3.4. Experimental Environment and Hyperparameter Settings

The experimental apparatus employed in this work has the following hardware and software setup: operating system, Windows 10 Professional x64; CPU, Intel(R) Core(TM) [email protected] GHz; GPU, NVIDIA GTX 2060Ti; memory, 16 GB (dual channel); deep learning development environment, Matlab 2021a with the Deep Network Designer module. The hyperparameter settings of the study model were determined by reviewing pertinent literature and existing data, repeatedly fine-tuning the model, and accounting for the performance of the available hardware. The hyperparameters of the research model are presented in Table 3. Stochastic gradient descent with momentum (sgdm) was used for the gradient update, and all other parameters were left at the program defaults except for the hyperparameters listed in the table.
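For reference, the Table 3 settings map onto an equivalent optimizer configuration as in the sketch below. The authors trained with Matlab's sgdm solver, so this PyTorch mapping is only illustrative, and the placeholder model stands in for the improved network.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 21)   # placeholder module; stands in for the improved network above

# Table 3 hyperparameters expressed as an SGD-with-momentum configuration
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-3,            # initial learning rate
                            momentum=0.9,       # sgdm momentum
                            weight_decay=1e-4)  # weight decay value
# training batch 16; 10 rounds of 523 iterations each (5230 iterations in total); dropout rate 0.5 in the head
```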

3.5. Model Evaluation Metrics

Confusion matrices were selected to reflect the classification accuracy of each rock image in this study; beyond overall accuracy, a confusion matrix allows a more detailed assessment of the strengths and weaknesses of the rock image classification models. Accuracy alone is not a desirable metric for classifier performance analysis, since it can be misleading for imbalanced rock datasets. Therefore, in addition to accuracy (EA), precision (P), recall (R), and the F1-score are used to assess the model in all respects.
EA = \frac{TP + TN}{TP + TN + FP + FN}    (12)
P = \frac{TP}{TP + FP}    (13)
R = \frac{TP}{TP + FN}    (14)
F1 = \frac{2PR}{P + R}    (15)
where TP is the number of positive samples correctly classified as positive by the network; TN is the number of negative samples correctly classified as negative; FP is the number of negative samples incorrectly classified as positive; FN is the number of positive samples incorrectly classified as negative.
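Computed from a multi-class confusion matrix, these metrics can be sketched as follows; macro-averaging the per-class precision and recall over the 21 classes is an assumption about how the reported values were aggregated.

```python
# Accuracy, precision, recall, and F1-score from a confusion matrix cm, where
# cm[i, j] counts samples of true class i predicted as class j.
import numpy as np

def macro_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                 # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp                 # actually class k but predicted as another class
    accuracy = tp.sum() / cm.sum()           # EA
    precision = np.mean(tp / (tp + fp))      # P, macro-averaged over classes
    recall = np.mean(tp / (tp + fn))         # R, macro-averaged over classes
    f1 = 2 * precision * recall / (precision + recall)   # F1-score
    return accuracy, precision, recall, f1
```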

4. Results

4.1. Training Result

To further validate the efficacy and superiority of the proposed model (CBAM+MobileNetV2), a comparison test was conducted with representative models used for lithology identification in recent years: the Resnet101 [22] model, the InceptionV3 [21] model, and the MobileNetV2 model without the fused channel and spatial attention mechanism, with consistent parameter settings for each model. Figure 7a,b depict the training loss and training accuracy curves taken from the Matlab training log. The training loss and training accuracy curves of the proposed model converge in approximately 2000 iterations, faster than the other three models; its final training loss of 0.046 is lower than that of the other three models, and its final training accuracy of 98.59% is higher, indicating that the proposed model recognizes the training set images better.
Table 4 displays each model's highest validation accuracy, lowest validation loss, training duration, and model size after training. Table 4 shows that directly applying the unimproved MobileNetV2 model to the lithology classification problem via transfer learning yields a validation accuracy of only 86.43%, despite its shorter training time and smaller model size, and its validation loss is also noticeably worse. The training time and post-training model size of the model in this study are only 7 min longer and 0.4 MB larger than those of the MobileNetV2 network, so the influence on real-time identification performance is minimal, while the proposed model's validation accuracy is excellent (95.71%), an improvement of 9.28 percentage points over the MobileNetV2 model. The validation accuracies of the Resnet101 and InceptionV3 models, 94.79% and 92.55%, respectively, are not significantly different from that of the model in this work. However, Table 4 also shows that the training times and model sizes of the Resnet101 and InceptionV3 models are 2.56 to 3.08 times and 59.83 to 78.22 times those of the model in this study, respectively, so they are difficult to deploy in real time under restrictions such as limited computational resources and processing capacity. As a result, compared with previous transfer learning models, the MobileNetV2 lithology identification model integrating the channel and spatial attention mechanism performs better in terms of training accuracy, model lightweighting, and speed.

4.2. Model Validation

To assess the models' robustness and generalizability, they were verified on the test set, a dataset unseen during training. Figure 8a–c display the rock identification accuracy for the three primary classes.
Figure 8a displays the recognition effect and accuracy for Class I rock images. The MobileNetV2 model performs only moderately on Class I rocks, with an average recognition accuracy of 77.26% and no rock exceeding 90%. The MobileNetV2 model with the fused attention mechanism recognizes three Class I rocks, andesitic volcanic breccia, tuff, and granite, with accuracies above 90%; granite is recognized best, with an accuracy of 98.47%. Its average recognition accuracy is 87.14%, an increase of 11.88 percentage points over the unimproved MobileNetV2 model. The average recognition accuracies of the ResNet101 and InceptionV3 models are 83.17% and 85.71%, respectively.
The recognition effect and accuracy for Class II rock images are shown in Figure 8b. Among the Class II rocks, the MobileNetV2 model achieves a recognition accuracy above 90% only for anthracite (smokeless coal), with an average recognition accuracy of 75.25%. The model in this study recognizes four Class II rocks, conglomerate, shale, greisen, and smokeless coal, with accuracies above 90%, and its average recognition accuracy is 90.59%. For the ResNet101 and InceptionV3 models, the average recognition accuracies are 86.51% and 84.47%, respectively.
Figure 8c displays the recognition effect and accuracy for Class III rock images. In Class III rocks, the MobileNetV2 model performs poorly, with an average recognition accuracy of 71.34%; no rock exceeds 90%, and marble is the only rock with a recognition accuracy above 80%. The average recognition accuracy of the model in this study is 89.24%. The average recognition accuracies of the ResNet101 and InceptionV3 models are 81.41% and 86.98%, respectively.
In conclusion, the test set comparison shows that the average recognition accuracy of the present study model for Class I–III rocks is 88.99%. Apart from individual rocks for which the ResNet101 model achieves higher recognition accuracy, its overall average recognition accuracy is inferior to that of the present study model. This shows that the present study model is more stable in recognizing Class I–III rocks and further demonstrates the superiority and universality of the model proposed in this paper.
Due to space restrictions, only the confusion matrix of the MobileNetV2 model with the fused attention mechanism is shown in Figure 9. The overall validation of the model uses the precision, recall, and harmonic mean (F1-score), with the results displayed in Table 5. The recall of this study's model is 91.38%, higher than the InceptionV3 and MobileNetV2 models and slightly lower than the 92.66% of the ResNet101 model, indicating a strong ability to identify positive samples. The precision of this study's model is 89.62%, which is 7.19 to 9.97 percentage points higher than the ResNet101, InceptionV3, and MobileNetV2 models. In addition, this model's F1-score of 90.49% is 3.88 to 13.82 percentage points higher than those of ResNet101, InceptionV3, and MobileNetV2, indicating that it is more reliable for identifying rocks overall. As a result, the MobileNetV2 model that incorporates the attention mechanism distinguishes positive and negative samples better. The model's precision of 89.62%, recall of 91.38%, and F1-score of 90.49% are all greater than 85%, indicating a good detection rate and accuracy and further confirming the model's robust performance and generalization ability.

4.3. Grad-CAM Model Visualization Interpretation

As is well known, convolutional neural networks have a black-box nature, their visual interpretability is weak, and the features extracted by their convolutional layers are difficult to inspect directly. To overcome this shortcoming and make the model better interpretable visually, the Grad-CAM method is used to visualize the network; the resulting heat maps show the rock regions on which the model focuses. In this paper, only macroscopic recognition of rock images with deep learning models is presented, and the advantages of the proposed model are explained in principle for the abstract feature recognition mentioned above, not for specific lithological features. The Grad-CAM technique itself is not the topic of this study, so it is not described in detail; its principles may be found in the literature [30,31,32].
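As a rough illustration of the cited technique [32], the sketch below computes a Grad-CAM heat map by global-average-pooling the gradients of the predicted class score with respect to a chosen convolutional layer and using them to weight that layer's feature maps. It is an illustrative PyTorch version, not the authors' implementation, and the choice of target layer is left to the caller.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Return an upsampled Grad-CAM heat map for the model's top predicted class."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    score = model(image.unsqueeze(0)).max()            # score of the predicted class
    model.zero_grad(); score.backward()
    h1.remove(); h2.remove()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)            # global average pool of gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))   # weighted feature maps + ReLU
    return F.interpolate(cam, image.shape[-2:], mode="bilinear")    # upsample to the input size
```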
Due to space constraints, only one image of each of six rock types in the dataset is used for the Grad-CAM interpretability study. The heat maps generated by the four models are presented in Table 6. For the categories with little background interference, the rock feature areas attended to by the Resnet101, InceptionV3, and MobileNetV2 models are neither as wide nor as uniform as those of the model proposed in this paper. For the categories with considerable background interference, such as limestone and marble, the Resnet101, InceptionV3, and MobileNetV2 models devote more attention to background regions, which degrades their feature extraction; the model proposed in this paper gives only slight attention to the background, keeps the rock characteristics as its primary focus, and is noticeably superior to the other three models in both the uniformity and the extent of its attention. In summary, the visual interpretation and analysis with Grad-CAM show that the model in this paper attends to the rock feature area more widely and evenly and better filters out background interference that is not a rock feature, thereby improving the network's extraction of lithological features.
This study applies the MobileNetV2 model fused with an attention mechanism to rock lithology identification and program development for the first time. Nevertheless, the study has certain limitations, and the following aspects will be developed further: (1) expanding the rock dataset to enrich the samples of each rock type, so that the model generalizes to a wider range of rock images; (2) coupling mechanical characteristics (for example, rock strength) into the lithology identification to bring it closer to actual engineering and field exploration scenarios.

5. Conclusions

A lightweight lithology intelligent recognition model is proposed in this paper, using the MobileNetV2 network as the basic framework and improving it by incorporating channel attention and spatial attention mechanisms. Two major aspects, model recognition accuracy and speed, are explored and analyzed, and the following conclusions are drawn from these investigations.
(1) The enhanced MobileNetV2 model, which fuses the attention mechanism into the inverted residual network, can extract more abstract and complex petrographic features and weaken the interference of non-rock features. It was compared and analyzed against the Resnet101, InceptionV3, and MobileNetV2 models. The training results show that the model's training time and size remain small while the recognition accuracy is maintained, which effectively realizes a lightweight lithology detection model.
(2) The MobileNetV2 model that incorporates the attention mechanism achieves an average recognition accuracy of 85% or more for the lithology of Class I–III rocks in the self-constructed dataset, and the recognition rate of some rocks exceeds 95%. Compared with the Resnet101, InceptionV3, and MobileNetV2 models, the model in this study has a better recognition effect and better stability on the overall dataset; its recognition accuracy does not vary greatly across individual rock classes, and no class has a recognition rate below 70%.
(3) The results of the Grad-CAM interpretability analysis show that this model focuses on the rock feature area more widely and uniformly and has stronger anti-interference ability, further illustrating the superiority of its robustness and generalization ability.

Author Contributions

Investigation, Y.Z.; Supervision, H.L. (Haoshan Liu); Validation, Z.L.; Writing—original draft, H.L. (Heng Li); Writing—review & editing, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NO. 52064025) and the Major Science and Technology Special Program in Yunnan Province (NO. 202102AG050024).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Saporetti, C.M.; Goliatt, L.; Pereira, E. Neural network boosted with differential evolution for lithology identification based on well logs information. Earth Sci. Inform. 2020, 14, 133–140.
  2. Čalogović, M.; Marjanac, T.; Fazinić, S. Chemical Composition of Allochthonous Melt-Rocks Possibly Associated with Proposed Krk Impact Structure in NW Croatia, Comparison of Three Spectroscopic Methods and Recognition of Source-Rock Lithology. Geochem. Int. 2020, 58, 308–320.
  3. Zhang, M.L.; Feng, L.; Lin, J.; Zeng, K. The Optimal Selecting Technique on Well Logging Items and its Application to Identify Complex Lithology Profile. Adv. Mater. Res. 2013, 652–654, 2508–2514.
  4. Cao, Z.; Yang, C.; Han, J.; Mu, H.; Wan, C.; Gao, P. Lithology identification method based on integrated K-means clustering and meta-object representation. Arab. J. Geosci. 2022, 15, 1462.
  5. Ma, H.; Han, G.; Peng, L.; Zhu, L.; Shu, J. Rock thin sections identification based on improved squeeze-and-Excitation Networks model. Comput. Geosci. 2021, 152, 104780.
  6. Mahmoud, A.A.; Elkatatny, S.; Al-AbdulJabbar, A. Application of machine learning models for real-time prediction of the formation lithology and tops from the drilling parameters. J. Pet. Sci. Eng. 2021, 203, 108574.
  7. Lopes, D.M.R.; Andrade, A.J.N. Lithology identification on well logs by fuzzy inference. J. Pet. Sci. Eng. 2019, 180, 357–368.
  8. Ren, Q.; Zhang, H.; Zhang, D.; Zhao, X.; Yan, L.; Rui, J. A novel hybrid method of lithology identification based on k-means++ algorithm and fuzzy decision tree. J. Pet. Sci. Eng. 2022, 208, 109681.
  9. Xu, T.; Chang, J.; Feng, D.; Lv, W.; Kang, Y.; Liu, H.; Li, J.; Li, Z. Evaluation of active learning algorithms for formation lithology identification. J. Pet. Sci. Eng. 2021, 206, 108999.
  10. Yu, Z.; Wang, Z.; Zeng, F.; Song, P.; Baffour, B.A.; Wang, P.; Wang, W.; Li, L. Volcanic lithology identification based on parameter-optimized GBDT algorithm: A case study in the Jilin Oilfield, Songliao Basin, NE China. J. Appl. Geophys. 2021, 194, 104443.
  11. Dong, S.; Zeng, L.; Du, X.; He, J.; Sun, F. Lithofacies identification in carbonate reservoirs by multiple kernel Fisher discriminant analysis using conventional well logs: A case study in A oilfield, Zagros Basin, Iraq. J. Pet. Sci. Eng. 2022, 210, 110081.
  12. Fagbohun, B.J.; Adeoti, B.; Aladejana, O.O. Litho-structural analysis of eastern part of Ilesha schist belt, Southwestern Nigeria. J. Afr. Earth Sci. 2017, 133, 123–137.
  13. Luo, D.; Liu, A. Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application in Lithological Identification. Math. Probl. Eng. 2015, 2015, 384183.
  14. Ali, A.M.M.; Haffar, A.M.A.M.A. Artificial Intelligence for Lithology Identification through Real-Time Drilling Data. J. Earth Sci. Clim. Chang. 2015, 6, 1000265.
  15. Goni, I.; Ahmadu, A.S.; Malgwi, Y.M. Image Processing Techniques and Neuro-computing Algorithms in Computer Vision. Adv. Networks 2021, 9, 33.
  16. Wang, Q.; Yu, H.; Wang, S.; Lin, J. Guest editorial: Graph learning for computer vision. IET Comput. Vis. 2021, 15, 547–548.
  17. AI Hub Singapore. AI Hub Singapore creates first AI computer vision application that allows businesses to monitor social distancing with a mobile phone. In Medical Letter on the CDC & FDA; AI Hub Singapore: Singapore, 2020.
  18. Hassaballah, M.; Awad, A.I. Deep Learning in Computer Vision: Principles and Applications; CRC Press: Boca Raton, FL, USA, 2020.
  19. Fuad, A.; Al-Yahya, M. Cross-Lingual Transfer Learning for Arabic Task-Oriented Dialogue Systems Using Multilingual Transformer Model mT5. Mathematics 2022, 10, 746.
  20. Tan, Y.J.; Tian, M.; Xu, D.X.; Sheng, G.X.; Ma, K.; Qiu, Q.J.; Pan, S.Y. Research on rock image classification and recognition based on Xception network. Geogr. Geogr. Inf. Sci. 2022, 38, 3.
  21. Zhang, Y.; Li, M.; Han, S. Automatic lithology identification and classification method based on deep learning of rock images. J. Petrol. 2018, 34, 333–342.
  22. Xu, Z.H. Intelligent lithology recognition based on rock image transfer learning. J. Appl. Basic Eng. Sci. 2021, 29, 5.
  23. Li, M.; Fu, J.; Zhang, Y.; Liu, C. Intelligent recognition and analysis method of lithology classification by coupling rock image and hammering audio. Chin. J. Rock Mech. Eng. 2022, 39, 137–145.
  24. National Science & Technology Infrastructure. National Infrastructure of Mineral Rock and Fossil Resources for Science and Technology [EB/OL]. 19 December 2003. Available online: http://www.nimrf.net.cn (accessed on 15 April 2022).
  25. Wang, J.M.; Chen, X.Y.; Yang, Z.Z.; Shi, C.Y.; Zhang, Y.H.; Qian, Z.K. Effects of different data enhancement methods on model recognition accuracy. Comput. Sci. 2021, 49, 418–423.
  26. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  28. Wang, J.; Yu, Z.; Luan, Z.; Ren, J.; Zhao, Y.; Yu, G. RDAU-Net: Based on a Residual Convolutional Neural Network With DFP and CBAM for Brain Tumor Segmentation. Front. Oncol. 2022, 12, 210.
  29. Cao, W.; Feng, Z.; Zhang, D.; Huang, Y. Facial Expression Recognition via a CBAM Embedded Network. Procedia Comput. Sci. 2020, 174, 463–477.
  30. Kim, J.; Kim, J.-M. Bearing Fault Diagnosis Using Grad-CAM and Acoustic Emission Signals. Appl. Sci. 2020, 10, 2050.
  31. Computers-Computer Graphics. Investigators from Georgia Institute of Technology Have Reported New Data on Computer Graphics (Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization). Comput. Technol. J. 2020.
  32. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. arXiv 2016, arXiv:1610.02391.
Figure 1. Example of amphibolite image enhancement. (a) Original image; (b) Horizontal flip; (c) Vertical flip; (d) Random rotation; (e) Image panning; (f) Gaussian noise.
Figure 2. Depthwise separable convolution.
Figure 3. MobileNetV2 linear bottleneck with inverted residual block structure. (a) Module with stride 1; (b) Module with stride 2.
Figure 4. Schematic diagram of the channel attention module and spatial attention module.
Figure 5. MobileNetV2 original model.
Figure 6. The improved MobileNetV2 lithology recognition model.
Figure 7. Comparison of model training loss and training accuracy curves. (a) Training loss curve; (b) Training accuracy curve.
Figure 8. Comparison of correct classification rate of rock images. (a) Accuracy of Class I lithology recognition; (b) Accuracy of Class II lithology recognition; (c) Accuracy of Class III lithology recognition.
Figure 9. Confusion matrix of rock image classification with the MobileNetV2 model fused with the attention mechanism.
Table 1. Representative rock image dataset display (one representative image per category; images omitted here).
Igneous rock (Class I): (I1) Andesitic Volcanic Breccia; (I2) Tuff; (I3) Granodiorite; (I4) Granite; (I5) Gabbro; (I6) Basalt; (I7) Diorite. Source areas of the rocks: Guiding County, Guizhou Province; Yuhua County, Henan Province; Ali region, Tibet Autonomous Region; Rikaze region, Tibet Autonomous Region; Qingdao City, Shandong Province; Jinan City, Shandong Province.
Sedimentary rock (Class II): (II1) Gritstone; (II2) Conglomerate; (II3) Shale; (II4) Greisen; (II5) Silicalite; (II6) Limestone; (II7) Smokeless Coal. Source areas of the rocks: Luodian County, Yunnan Province; Mianyang City, Sichuan Province; Liangshan Prefecture, Sichuan Province; Luzhou City, Sichuan Province; Kashgar, Aksu and Hotan regions of Xinjiang Uygur Autonomous Region; Rikaze, Tibet Autonomous Region; Xining City, Qinghai Province; Haibei Tibetan Autonomous Prefecture, Qinghai Province.
Metamorphic rock (Class III): (III1) Amphibolite; (III2) Eclogite; (III3) Marble; (III4) Black Slate; (III5) Fault Breccia; (III6) Mylonite; (III7) Skarn. Source areas of the rocks: Qinglong County, Pan County and Dafang County, Guizhou Province; Yulin and Guilin, Guangxi Zhuang Autonomous Region; Diqing Tibetan Autonomous Prefecture, Yunnan Province; Laos County, Chongqing; Wenchuan County, Sichuan Province; Dandong and Fushun, Liaoning Province; Heihe and Daxinganling regions of Heilongjiang Province.
Table 2. Distribution of original number, enhanced number, and training/test samples of the rock dataset.
Rock Category | Original Quantity (Sheets) | Number after Enhancement (Sheets) | Training + Validation Samples (Sheets) | Test Samples (Sheets)
Igneous rock (Class I): (I 1) Andesitic Volcanic Breccia | 38 | 228 | 205 | 23
(I 2) Tuff | 69 | 414 | 373 | 41
(I 3) Granodiorite | 55 | 330 | 297 | 33
(I 4) Granite | 108 | 648 | 583 | 65
(I 5) Gabbro | 85 | 510 | 459 | 51
(I 6) Basalt | 96 | 576 | 518 | 58
(I 7) Diorite | 75 | 450 | 405 | 45
Sedimentary rock (Class II): (II 1) Gritstone | 88 | 528 | 475 | 53
(II 2) Conglomerate | 112 | 672 | 605 | 67
(II 3) Shale | 128 | 768 | 691 | 77
(II 4) Greisen | 45 | 270 | 243 | 27
(II 5) Silicalite | 76 | 456 | 410 | 46
(II 6) Limestone | 155 | 930 | 837 | 93
(II 7) Smokeless Coal | 42 | 252 | 227 | 25
Metamorphic rock (Class III): (III 1) Amphibolite | 65 | 390 | 351 | 39
(III 2) Eclogite | 77 | 462 | 416 | 46
(III 3) Marble | 142 | 852 | 767 | 85
(III 4) Black Slate | 66 | 396 | 356 | 40
(III 5) Fault Breccia | 58 | 348 | 313 | 35
(III 6) Mylonite | 32 | 192 | 173 | 19
(III 7) Skarn | 49 | 294 | 265 | 29
Total | 1661 | 9966 | 8969 | 997
Table 3. Model hyperparameter settings.
Hyperparameter | Set Value
Training batch | 16
Number of rounds (epochs) | 10
Number of iterations per round | 523
Total number of iterations | 5230
Initial learning rate | 0.001
Weight decay value | 10^-4
Momentum (sgdm) | 0.9
Dropout discard probability | 0.5
Table 4. Model performance comparison.
Model | Minimum Validation Loss | Maximum Validation Accuracy/% | Training Time/min | Post-Training Model Size/MB
CBAM + MobileNetV2 * | 0.0723 | 95.71 | 76 | 6.38
ResNet101 [22] | 0.1125 | 94.79 | 195 | 485.78
InceptionV3 [21] | 0.1955 | 92.55 | 234 | 371.59
MobileNetV2 | 0.2885 | 86.43 | 69 | 5.98
Note: * represents the improved model proposed in this paper.
Table 5. Model evaluation index results.
Model | Precision/% | Recall/% | F1-Score/%
CBAM + MobileNetV2 * | 89.62 | 91.38 | 90.49
ResNet101 [22] | 81.32 | 92.66 | 86.62
InceptionV3 [21] | 79.65 | 88.52 | 83.85
MobileNetV2 | 82.43 | 71.65 | 76.66
Table 6. Grad-CAM interpretability analysis: heat maps for one image each of granite, basalt, diorite, shale, limestone, and marble, produced by the CBAM + MobileNetV2 * model, the MobileNetV2 model, the ResNet101 model [22], and the InceptionV3 model [21] (original images and heat maps omitted here).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
