Article

Pretraining Convolutional Neural Networks for Mudstone Petrographic Thin-Section Image Classification

by Rafael Pires de Lima 1,* and David Duarte 2
1 Geological Survey of Brazil, São Paulo 01304-010, Brazil
2 School of Geosciences, University of Oklahoma, Norman, OK 73019, USA
* Author to whom correspondence should be addressed.
Geosciences 2021, 11(8), 336; https://doi.org/10.3390/geosciences11080336
Submission received: 5 July 2021 / Revised: 30 July 2021 / Accepted: 5 August 2021 / Published: 11 August 2021
(This article belongs to the Special Issue Advances and Applications in Computational Geosciences)

Abstract

Convolutional neural networks (CNN) are currently the most widely used tool for the classification of images, especially if such images have large within-group and small between-group variance. Thus, one of the main factors driving the development of CNN models is the creation of large, labelled computer vision datasets, some containing millions of images. Thanks to transfer learning, a technique that modifies a model trained on a primary task to execute a secondary task, the adaptation of CNN models trained on such large datasets has rapidly gained popularity in many fields of science, geosciences included. However, the trade-off between two main components of the transfer learning methodology for geoscience images is still unclear: the difference between the datasets used in the primary and secondary tasks; and the amount of data available for the primary task itself. We evaluate the performance of CNN models pretrained on different types of image datasets—specifically, dermatology, histology, and raw food—that are fine-tuned to the task of petrographic thin-section image classification. Results show that CNN models pretrained on ImageNet achieve higher accuracy due to the larger number of samples, as well as the larger variability of the samples, in ImageNet compared to the other datasets evaluated.

1. Introduction

Although the roots of convolutional neural networks (CNN) emerged in the 1980s [1,2], they were only widely adopted in the 2010s, after a model used by Krizhevsky et al. [3] won the 2012 ImageNet competition [4] by a large margin [5] when competing against traditional machine-learning algorithms. ImageNet [6] is one of several computer vision datasets (e.g., [7,8,9]) that contributed to the development of CNN models, as well as to the standardization of model evaluation. CNN models used for the classification of images are trained on datasets containing pairs of input data (images) and labels (classes). During training, CNN models need to learn a mapping from the input data to the desired labels. Compared to fully connected (fc) layers, convolutional layers are composed of neurons that better exploit the locality, stationarity, and compositionality of signals, properties that make them well suited to image data. It is generally useful to have a large dataset for training CNNs for two main reasons: image data have a very high dimensionality, e.g., a red-green-blue (RGB) image with 224 × 224 pixels is a single sample in a 150,528-dimensional space; and CNN models are usually built with a very large number of parameters (on the order of millions). The widespread adoption of training and evaluating models on standardized datasets by the computer vision community facilitated the distribution of trained models at large. The popularization of pretrained models supported the adoption of transfer learning (e.g., [10,11,12,13]) by other fields of science in which the amount of labelled data is not as large. The field of geosciences is one of many in which the use of transfer learning has recently become popular.
The main idea behind transfer learning comes from the realization that the representations of the input generated by the layers in a neural network are generic for the layers closer to the input, and more complex and abstract for the layers closer to the output (or label), especially when trained with datasets of natural images such as ImageNet. The interpretability of CNN filters and their outputs is an active area of research, with several examples showing how images activate filters differently, for example [14,15,16,17]. Although the representations learned by CNN models contain the information needed to map the input to the output, these transformations can be useful for solving other tasks, and this procedure tends to be more successful if the tasks are related in some aspect [12,13,18,19]. In transfer learning, a model trained on a primary task (e.g., to classify the ImageNet dataset) is repurposed for a secondary task (e.g., to classify thin-section images). The repurposing step generally requires an adaptation of the model, as well as further training. Zamir et al. [19] investigated how different visual tasks are related to each other as a means of proposing better transfer learning strategies. Their results show, for example, that 2D segmentation is a task more similar to colorization and in-painting than to image denoising. They also observed that their results are model- and data-specific, meaning there are still knowledge gaps that should be studied. The objectives of this paper are well aligned with such observations. Here, we investigate the transferability of models trained on different datasets for the task of petrographic classification at a thin-section scale.
In many transfer learning applications, the similarity between the primary task and the secondary task is simplified down to the type of data used. For instance, many transfer learning applications rely on the fact that ImageNet is a dataset of RGB images; thus, the parameters learned by CNN models to classify the ImageNet dataset can be repurposed to classify other RGB image datasets. For example, Norouzzadeh et al. [20] repurposed CNN models pretrained on ImageNet to automate animal identification in a dataset composed of camera trap images. Tschandl et al. [21] used transfer learning to repurpose a model pretrained on ImageNet to classify pigmented lesions from different populations; Kather et al. [22] used transfer learning to identify microsatellite instability directly from histological images in gastrointestinal cancer. Both Hu et al. [18] and Pires de Lima and Marfurt [23] studied the use of transfer learning for the classification of high-resolution remote-sensing images, both using models pretrained on ImageNet. The list of studies using transfer learning for the classification of geoscience images is also expanding. Examples include Pires de Lima et al. [24], who used transfer learning for the classification of lithofacies using pictures of core data. In contrast to some of the examples cited, Baraboshkin et al. [25] reported inferior performance when using transfer learning than when training a CNN model with randomly initialized weights. Transfer learning has also proven highly valuable in the classification of petrographic thin-section images (e.g., [26,27,28,29]). All these transfer learning examples for the classification of geoscientific images repurpose models pretrained on ImageNet at some point in their analyses.
What is curious about these examples is that training a CNN model for the classification of ImageNet, an image dataset composed of a wide range of objects and scenes, always seems to improve the model’s performance when transferred to a secondary task, even if the images from the secondary task are visually dissimilar to ImageNet (e.g., histology or skin lesion images). The classes in ImageNet have a within-class variance that is much larger than what we usually expect for petrographic thin-section classifications. To put this in perspective, ImageNet-trained models should be able to identify dogs or cats, independent of their scale, lighting conditions, or background. In fact, ImageNet is specifically constructed so that objects in images have different appearances, background clutter and occlusions, as well as varying positions, viewpoints, and poses [6]. In contrast, thin-section images are obtained in well-defined configurations, rendering images that have a much smaller variance, especially when the photographs are taken at the same zoom level. The hypothesis that guides the experiments presented in this manuscript is that models primarily trained on a dataset more visually similar to the secondary task should outperform models primarily trained on a dataset dissimilar to the secondary task. In other words, adapting the weights should be easier when tasks are related, which aligns with recent findings (e.g., [19]). The evaluation of similarity here is somewhat qualitative, but sufficient to discuss the findings in depth. To perform such an analysis, we evaluate the results of using transfer learning to classify thin-section images using models primarily trained on ImageNet, the HAM10000 dataset [21], the RawFooT dataset [30], and part of Kather et al. [22]’s dataset (hereinafter, MSI vs. MSS, standing for microsatellite instable vs. microsatellite stable). These datasets were selected based on their resolution, the number of samples available, and because they were previously used to train CNN models. Moreover, these datasets have standardized images, reducing unnecessary variance in the samples imaged—a contrast with ImageNet, which aims to include variations, as described above. Understanding how the similarity between visual datasets affects the classification of petrographic thin sections can help us create better models, as well as better petrographic datasets.
The main contributions of this manuscript can be summarized as follows:
  • An evaluation of how the choice of dataset used to train a CNN model for the primary task affects the performance of such a model when used for transfer learning to classify petrographic thin sections in the secondary task;
  • The optimization of models that can accurately classify thin-section images from the Sycamore formation acquired at two different magnification levels;
  • An assessment of how the amount of data affects CNN models used for transfer learning;
  • An assessment of how the similarity between primary and secondary tasks affects CNN models used for transfer learning.
Section 2 presents the datasets used for the primary task of image classification, as well as the CNN architectures used and the general methodology of the study. Section 3 shows the results of all experiments conducted, as well as their interpretation. Section 4 discusses the results obtained with the proposed experiments. Section 5 presents the conclusions.

2. Materials and Methods

2.1. Petrographic Thin-Section Data

The central data used for the transfer learning in this study are 98 thin sections acquired from 5 different cores from the Sycamore formation (early Mississippian strata) in the Ardmore basin, Oklahoma, the same thin sections used in [26]. We took roughly five randomly placed and generally non-overlapping photographic images for every thin section using plane polarized light and a 2.5× or 10× magnification zoom, attaining a total of 513 samples. Then, we classified the thin sections into four microfacies (classes) based on their texture and mineralogical composition. Although this classification took into account general knowledge about the origin, depth, and geological background of the thin sections, some bias was introduced into the dataset, as the microfacies were defined based on our interpretation. Note that we mixed images with 2.5× and 10× magnification zoom. Even though the relative grain sizes are different at different magnification levels, we argue that the petrographic characteristics that explain the interpreted microfacies are well defined at both scales; thus, this will not negatively affect the models. Table 1 shows the count for each one of the microfacies interpreted, as well as the count of samples in the train and test sets. Except for Argillaceous mudstone (AMdst), the classes are somewhat well balanced. The images were then color balanced following [31], which assumes that the highest values of RGB observed in a photograph correspond to white and the lowest values correspond to black (Figure 1a). Moreover, [32] investigated the use of color balancing and found it helpful for image classification using CNN. The same methodology was used in [26]. We then resized the images to 646 × 484 pixels, maintaining the aspect ratio of the original photograph and the same pixel area relation. During training and inference, we used the five-crop technique to augment the data, extracting 224 × 224 samples from the corners and the center of the images (Figure 1b). The final prediction for a given sample was the mean prediction across the five crops. Five-crop is a technique frequently used in computer vision tasks and is somewhat simpler than the approach discussed in [26]. We then computed the mean of the RGB channels of the training set, as well as their standard deviations, which are used for the normalization of the data, and obtained (0.3579, 0.2924, 0.3122) for the mean and (0.2028, 0.2011, 0.1877) for the standard deviation. Figure 2 shows examples of images in the test set.
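A minimal sketch of how the resizing, five-crop extraction, normalization, and crop-averaged prediction described above could be implemented with torchvision; the function and variable names are illustrative, and the original photographs are assumed to already share the 646 × 484 aspect ratio.

```python
import torch
from PIL import Image
from torchvision import transforms

# Per-channel mean and standard deviation computed on the thin-section train set
MEAN = (0.3579, 0.2924, 0.3122)
STD = (0.2028, 0.2011, 0.1877)

preprocess = transforms.Compose([
    transforms.Resize((484, 646)),  # (height, width)
    transforms.FiveCrop(224),       # corners + center -> tuple of five crops
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.Normalize(MEAN, STD)(transforms.ToTensor()(c)) for c in crops])),
])

def predict_thin_section(model, image_path, device="cpu"):
    """Average the model's softmax output over the five crops of one photograph."""
    crops = preprocess(Image.open(image_path).convert("RGB")).to(device)  # (5, 3, 224, 224)
    with torch.no_grad():
        probs = torch.softmax(model(crops), dim=1).mean(dim=0)  # mean prediction across crops
    return probs.argmax().item()
```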

2.2. ImageNet

ImageNet is one of the most popular computer vision datasets. The full dataset is composed of over 15 million labelled high-resolution images belonging to roughly 22,000 classes. The images have variable resolutions and were collected from the internet and labelled by humans. Many computer vision experiments are conducted on a subset of the ImageNet dataset that originated from an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). The ILSVRC subset of ImageNet has roughly 1000 images in each of the 1000 categories [3,4]. Weights from models trained with the ILSVRC (hereinafter simplified back to ImageNet) are likely the most widely available weights used for transfer learning. We downloaded pretrained weights directly from [33].
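For illustration, pretrained ImageNet weights can be downloaded directly through torchvision, the PyTorch model zoo referenced above; this sketch uses the `pretrained` flag of older torchvision releases (newer releases expose a `weights` argument instead).

```python
import torchvision.models as models

# ResNet18 and ResNet50 with weights pretrained on ImageNet (ILSVRC)
resnet18_imagenet = models.resnet18(pretrained=True)
resnet50_imagenet = models.resnet50(pretrained=True)

# The final fully connected layer maps to the 1000 ILSVRC classes
print(resnet18_imagenet.fc)  # Linear(in_features=512, out_features=1000, bias=True)
```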

2.3. HAM10000

Tschandl et al. [21] released the “Human Against Machine with 10,000 training images”—HAM10000 dataset to facilitate training of neural networks for automated diagnosis of pigmented skin lesions. The dataset contains dermatoscopic images from different populations, acquired and stored by different modalities. The published dataset consists of 10,015 dermatoscopic images which can serve as a training set for academic machine-learning purposes. The images comprise a representative collection of many important pigmented lesion categories. Figure 3 shows some samples of the test set. Over 50% of the imaged lesions were confirmed through histopathology, and the remaining samples were confirmed through either follow-up examination, expert consensus, or confirmation by in vivo confocal microscopy. The train/test split is provided by the authors. We computed the mean of the RGB channels for the train set (0.7637, 0.5461, 0.5707), with a standard deviation of (0.0897, 0.1184, 0.1330).
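The per-channel statistics reported here and for the other datasets can be accumulated with a loop such as the following sketch; `dataset` is a placeholder for any PyTorch dataset returning image tensors scaled to [0, 1].

```python
import torch
from torch.utils.data import DataLoader

def channel_stats(dataset, batch_size=64):
    """Compute per-channel mean and standard deviation over a dataset of (image, label) pairs."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    n_pixels = 0
    channel_sum = torch.zeros(3)
    channel_sq_sum = torch.zeros(3)
    for images, _ in loader:  # images: (B, 3, H, W), values in [0, 1]
        b, _, h, w = images.shape
        n_pixels += b * h * w
        channel_sum += images.sum(dim=[0, 2, 3])
        channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])
    mean = channel_sum / n_pixels
    std = torch.sqrt(channel_sq_sum / n_pixels - mean ** 2)
    return mean, std
```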
Figure 3. Examples of images in the HAM10000 [21] test set and their corresponding labels. Table 2 shows class names and number of samples in the dataset.

2.4. RawFooT

The Raw Food Texture (RawFooT) dataset was designed to study the robustness of classification methods with respect to variations in lighting conditions. The dataset includes images of food with different visual textures, acquired under 46 lighting conditions, with variations in the light direction, in the illuminant color, in its intensity, or in a combination of such factors. The dataset contains 68 classes of raw food and includes different kinds of meat, fish, cereals, bread, and other foods [30]. The dataset is available in full image (800 × 800 pixels) and tile (200 × 200 pixels) formats. We used the tiles in this study. The tile images correspond to subdivisions of the original 800 × 800 images into 16 square regions. The authors of the RawFooT dataset proposed a checkerboard pattern and selected eight square regions to be part of the train set, while the remaining eight square regions were allocated to the test set. We used the split defined by the authors. We calculated the mean of the RGB images in the train set (0.3765, 0.2743, 0.1351) and the standard deviation (0.0571, 0.0537, 0.0433). Figure 4 shows examples of images in the test set. Table 3 lists the class names and the count of samples per set.
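As a sketch of the tiling just described, an 800 × 800 RawFooT image can be subdivided into 16 tiles of 200 × 200 pixels and split in a checkerboard fashion; the parity rule used for the assignment below is an assumption for illustration.

```python
import numpy as np

def checkerboard_tiles(image: np.ndarray, tile: int = 200):
    """Split an (800, 800, 3) image into 16 tiles; alternating tiles go to train and test."""
    train_tiles, test_tiles = [], []
    for i in range(image.shape[0] // tile):
        for j in range(image.shape[1] // tile):
            patch = image[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            # Checkerboard pattern: even (i + j) -> train, odd (i + j) -> test (assumed assignment)
            (train_tiles if (i + j) % 2 == 0 else test_tiles).append(patch)
    return train_tiles, test_tiles
```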

2.5. MSI vs. MSS

Knowing that gastrointestinal cancer MSI patients respond exceptionally well to immunotherapy, and that not every patient is tested for MSI, Kather et al. [22] showed that ResNets can predict MSI directly from H&E (hematoxylin and eosin stain) histology images. Kather et al. also used tiles of larger high-resolution slides for their analysis and made the tiles available for download. The available tiles are separated into train and test sets, with samples of 224 × 224 pixels, which is equivalent to 0.5 µm per pixel. We used the colorectal cancer images only. Figure 5 shows examples of the dataset. Table 4 shows the number of samples in the train and test sets. We computed the mean RGB (0.7263, 0.5129, 0.6925) and standard deviation (0.1444, 0.1833, 0.1310) of the images in the train set. This dataset is named MSI vs. MSS in this study.
Figure 5. Examples of Kather et al. [22]’s images in the MSI vs. MSS test set and their corresponding labels. Table 4 shows class names and number of samples in the dataset.

2.6. Transfer Learning and Implementation Details

The methodology behind transfer learning is well described in many studies (e.g., [12,13,18,23]). For instance, [26,27,29] described the technique using petrographic thin-section data. Thus, in this paper, we provide only a brief explanation of the transfer learning methodology. Initially, CNN models are created with randomly initialized weights. During training, the weights are updated according to the objective function, reducing the loss of the model with respect to the output. For classification problems, the loss is computed according to the true and predicted labels. After training, the weights of the models are in a state such that they are useful for classifying the input image data. In other words, the models learn to extract features that are useful for classification. This is the default training methodology, in which the weights of the model are randomly initialized and updated through gradient descent during training. In the transfer learning methodology, the model previously trained with randomly initialized weights for the primary task is further trained on the secondary task. The trained models can be used as feature extractors, when part of the weights of the models is not updated for the secondary task, or they can be fine-tuned on the secondary task, when all the weights are updated. Here, we use the fine-tuning technique.
We used residual nets (ResNets [34]) as the CNN architecture for the experiments, specifically ResNet18 and ResNet50. ResNets introduce shortcut connections to CNN architectures as a way to address the problem of vanishing gradients. The skip connections allow the gradients to flow from layers close to the output, where the loss is computed, to layers closer to the input. This strategy enabled the adoption of deeper networks, facilitating the optimization of large models. ResNet18 (ResNet50) is composed of 18 (50) parameter layers, either convolution or fc layers. ResNets end with a global average pooling layer, followed by an fc layer. For ResNets trained on ImageNet, the fc layer placed after the global average pooling layer contains 1000 neurons (fc 1000), one for each class of the dataset. In the implemented transfer learning methodology, we re-initialized the fc 1000 layer and added another fc layer, with one neuron for each class of the new dataset. In the fine-tuning technique implemented here, only the fc layers were updated during the first five epochs; then, the remaining layers were unfrozen and all the weights were updated for the remaining epochs.
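A sketch of the adaptation just described, assuming torchvision's ResNet implementation: the fc 1000 layer is re-initialized, a new fc layer with one neuron per microfacies class is appended, and the backbone is kept frozen for the first five epochs before all weights are unfrozen. The helper name and the initialization scheme are illustrative.

```python
import torch.nn as nn
import torchvision.models as models

N_CLASSES = 4  # four interpreted microfacies

model = models.resnet18(pretrained=True)
nn.init.kaiming_normal_(model.fc.weight)  # re-initialize the fc 1000 layer (assumed initializer)
nn.init.zeros_(model.fc.bias)
# Append a new fc layer mapping the 1000 outputs to the thin-section classes
model.fc = nn.Sequential(model.fc, nn.Linear(1000, N_CLASSES))

def set_backbone_trainable(model, trainable: bool):
    """Freeze or unfreeze every parameter except those of the final fc block."""
    for name, param in model.named_parameters():
        if not name.startswith("fc"):
            param.requires_grad = trainable

set_backbone_trainable(model, False)  # first five epochs: only the fc layers are updated
# ... train for five epochs ...
set_backbone_trainable(model, True)   # remaining epochs: all weights are updated
```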
For all experiments, 20% of the train dataset was randomly selected to form the validation set. The model continues training while the validation loss decreases, with a patience of five epochs unless otherwise noted. Patience is a hyperparameter indicating the number of epochs without improvement in the validation loss after which the model stops training. The thin-section data were cropped to the desired dimensions of 224 × 224 pixels using the five-crop technique described previously. The images of the remaining datasets were resized to 224 × 224 pixels. After cropping or resizing, the images were normalized using the computed dataset mean and standard deviation. Data augmentation was used for the train set only, not on the validation or test sets, and used the following pipeline: horizontal flip, vertical flip, and rotation limited to ±5 degrees, all with 50% probability. The hyperparameter search mostly covered batch size (increased via accumulation of gradients), learning rate, and optimizer. Adam [35] was the preferred optimizer, but RMSprop [36] was also evaluated. We used the PyTorch [33] framework for implementation, following PyTorch Lightning [37] structures. Experiments were tracked with Weights and Biases [38].
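The augmentation pipeline and patience-based stopping criterion described above could be expressed as in the following sketch, using torchvision transforms and PyTorch Lightning's EarlyStopping callback; the values are those given in the text.

```python
from torchvision import transforms
from pytorch_lightning.callbacks import EarlyStopping

# Train-set augmentation only: flips and small rotations, each applied with 50% probability
train_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomApply([transforms.RandomRotation(degrees=5)], p=0.5),
])

# Stop training once the validation loss has not improved for `patience` epochs
early_stopping = EarlyStopping(monitor="val_loss", mode="min", patience=5)
```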

3. Results and Interpretation

3.1. Results

We start this section by showing the results obtained by training ResNet18 and ResNet50 for the classification of the HAM10000, RawFooT, and MSI vs. MSS datasets. To recall, these datasets were chosen because they are more visually similar to petrographic thin-section images than ImageNet; samples from these datasets, for example, are at a consistent scale and have no background clutter. We evaluated different hyperparameters using the models and datasets described in Section 2. Appendix A (Table A1 and Table A2) shows details of the hyperparameters used for training the CNN models on the primary task. Table 5 shows the accuracy of the best performing models on the test set for each one of the datasets, using both ResNet18 and ResNet50. Other performance metrics are important for the evaluation of classification tasks, especially when there are different costs for different classes or when dealing with unbalanced datasets (e.g., [39,40]), but these are omitted here for the sake of simplicity. The results in Table 5 show some overfitting for both the MSI vs. MSS and RawFooT datasets, even though the train data were augmented during training. Moreover, the accuracy of ResNet18 on the RawFooT dataset is noticeably higher than that of ResNet50. The difference in accuracy between train and test sets for MSI vs. MSS is considerably larger—0.2 for ResNet18 and 0.19 for ResNet50. Nonetheless, the trained models learned weights that can serve as effective feature extractors for the classification of each one of the datasets, unlike the initial random weights. Thus, the models are then adapted to the secondary task.
The baseline for the classification of thin-section data is a model started with randomly initialized weights. We performed a hyperparameter search training ResNet18 and ResNet50 with the petrographic thin-section data described in Section 2.1. Figure 6 shows the computed accuracy and loss during training for the best performing hyperparameters in the ResNet18 and ResNet50 models. Although the train and validation metrics track each other closely for ResNet50, the model takes longer to achieve a higher performance. Thus, we increased the patience for all ResNet50s to 10 to accommodate possible spurious fluctuations in the validation loss that could make the model stop training earlier than desired. ResNet18 decreases the loss (increases the accuracy) more rapidly; however, it also shows signs of overfitting earlier. ResNet18 models are maintained with a patience of five.
Table 6 shows the hyperparameters searched while training ResNets with randomly initialized weights on the petrographic thin-section data. Due to GPU memory limitations, ResNet18s are trained with a batch size of 16 while ResNet50s use a batch size of 4. The batch size is increased before the optimizer steps by the accumulation of gradients. Results show that minor changes in the choice of hyperparameters can have significant effects on model performance. For example, changing the batch size (accumulated) for ResNet18 from 32 to 64 increased the accuracy by 0.03 in the test set, but increasing it again from 64 to 128 reduced the accuracy by 0.03 (test accuracy, rows 1, 2, and 4 in Table 6). Some hyperparameters bring the performance to noticeably suboptimal results (e.g., rows 5 and 8). The model in row 5 trained for only 8 epochs, which was fewer than most of the experiments reported for ResNet18. This happened because the model achieved a small validation loss in epoch three and then oscillated around slightly larger validation loss values. Due to the patience (five epochs), the model stopped after epoch eight. Changing only the learning rate (row 4), the model is able to improve the accuracy in the test set by 0.2. Results in rows 8 and 10 show that the optimizer can have a significant impact on the performance of ResNet50, where Adam improves the accuracy in the test set by 0.16 when compared to RMSprop with the same hyperparameters. ResNet18 achieves a mean test accuracy of 0.82, with the worst model achieving 0.66 and the best model reaching 0.89 accuracy. Hereinafter, we summarize such information as: 0.82 (0.66, 0.89). ResNet50 achieves accuracies of 0.75 (0.64, 0.82). Training ResNet18 with the petrographic thin-section data, using the five-crop strategy, is relatively cheap, with one epoch taking less than one minute with a laptop-based GTX 1050 GPU. ResNet50 training is a little more expensive, taking roughly 1.2 min per epoch with the same hardware.
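The gradient accumulation mentioned above can be sketched as follows: gradients from several small mini-batches are summed before each optimizer step, so a ResNet18 loader with batches of 16 and an accumulation factor of 4 behaves like an effective batch size of 64. In PyTorch Lightning, the same behavior is available through the Trainer's `accumulate_grad_batches` argument.

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion, accumulation_steps=4, device="cpu"):
    """Accumulate gradients over several mini-batches before each optimizer step."""
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)
        loss = criterion(model(images), labels) / accumulation_steps  # scale so gradients stay comparable
        loss.backward()  # gradients accumulate across backward calls
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```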
With the results obtained by training CNN models with randomly initialized weights as the baseline for the classification of petrographic thin-section images, we continued with the transfer learning analysis. The ResNet18 and ResNet50 models pretrained on the datasets described in Section 2.2, Section 2.3, Section 2.4 and Section 2.5 were fine-tuned for the classification of the dataset described in Section 2.1. As before, we performed a hyperparameter search to evaluate how the performance of the models is affected. The full combination of hyperparameters, as well as train, validation, and test set accuracies, is presented in Appendix B (Table A3).
Figure 7 and Table 7 show the most important summary of the results of the transfer learning experiments, with details on the test set accuracies. As before, the results show that ResNet18 tends to have better performance than ResNet50 on the petrographic thin-section image data; this is likely due to the limited data size. Curiously, training the networks with randomly initialized weights or fine-tuning models pretrained on the proposed datasets yields, in general, comparable mean accuracy. The mean accuracy of ResNet50 pretrained on HAM10000 and on MSI vs. MSS is considerably smaller than the baseline results. However, the best performing models, those showing the highest accuracy on the test set, are slightly better for most of the fine-tuned models. The only exception is ResNet50 pretrained on the MSI vs. MSS dataset, which has the same maximum accuracy as the ResNet50 trained with randomly initialized weights. Although the proposed pretraining datasets provided only marginal improvements over, or results comparable to, those achieved by training networks with randomly initialized weights, pretraining the model on ImageNet showed the most stable and accurate results. Pretraining the models on ImageNet improved the networks’ accuracy, and the results for ResNet18 are consistent, even with different choices of hyperparameters. Although RMSprop was generally detrimental for models pretrained on the other datasets, the RMSprop and Adam optimizers with the same batch size and learning rate (64 and 1 × 10−4, respectively) achieved the highest performance for ResNet18 pretrained on ImageNet. Figure 8 shows a comparison of the accuracy on the test set for different hyperparameters, identifying the optimizer choice, for ResNet50 trained with randomly initialized weights and pretrained on ImageNet. As previously demonstrated, the results show that pretraining on ImageNet helps the model achieve higher accuracy. The figure makes it clear that RMSprop does not affect the performance negatively when the model is pretrained on ImageNet, whereas RMSprop is detrimental when using randomly initialized weights.
The following figures focus on ResNet18, as its results are better than those of ResNet50. Figure 9 shows the best performing fine-tuned ResNet18 models when pretrained on the ImageNet, HAM10000, and MSI vs. MSS datasets. ResNet18 pretrained on RawFooT shows similar behavior; however, it trained for longer. Results show a break in continuity around epoch five, when the full model is unfrozen and all the weights can be updated. Such a break is more pronounced for the ResNet18 trained on ImageNet (grey curves in Figure 9). Results show that the network tends to overfit after some epochs, with the distance between the train and validation loss (accuracy) remaining somewhat constant or increasing (decreasing). Regularization and decreases in the learning rate can help with overfitting. We chose to present only the best performing models, but the curves obtained during the hyperparameter search behave similarly.
Figure 10 shows the confusion matrices computed on the test set for the best performing models, including the baseline model trained with randomly initialized weights, as well as the fine-tuned ResNet18 models. Results show, in general, that ResNet18 is capable of correctly classifying most of the images. The most significant confusion occurs for the baseline between the classes BMdst and AMdst. ResNet18 pretrained with the MSI vs. MSS dataset shows a similar performance, but also confuses one BMdst with MCcSt. ResNet18 pretrained with the ImageNet dataset incorrectly classified AMdst as BMdst twice. The remaining confusion is caused by very few, generally one or two, incorrectly classified images. Results in Figure 10 indicate that ResNet18 can achieve high levels of accuracy for the classification of thin-section image data, regardless of pretraining. They also show that adequate pretraining, when limited data are available, can aid the network in differentiating between classes whose between-class variance is small. Appendix C (Table A4, Table A5, Table A6, Table A7 and Table A8) shows complementary metrics for the results in Figure 10.
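Assuming the true and predicted test labels are available as integer arrays, the confusion matrices in Figure 10 and the per-class metrics in Appendix C can be computed with scikit-learn, as in this sketch.

```python
from sklearn.metrics import classification_report, confusion_matrix

CLASS_NAMES = ["AMdst", "BMdst", "MCSt", "MCcSt"]

def report_test_performance(y_true, y_pred):
    """Print the confusion matrix and per-class precision, recall, and F1-score."""
    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred, target_names=CLASS_NAMES, digits=2))
```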
Moving away from the performance of the models, the next set of figures shows the weights of the models. The first convolutional layer of ResNet18 is composed of 64 kernels with a size of 7 by 7. They represent only a fraction of the layers in ResNet18, but they are closer to the model input and are thus easier to exhibit and interpret. Each one of these 64 kernels operates on the three channels of an RGB image; therefore, they can also be shown as RGB images. Figure 11 shows the 64 weights of the baseline ResNet18, as well as the fine-tuned ResNet18 pretrained on ImageNet. The weights in Figure 11a are closer to random when compared to the weights in Figure 11b. The weights in Figure 11b are very similar before and after fine-tuning, meaning that there are almost no changes during the fine-tuning process.
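A sketch of how the 64 first-layer kernels can be extracted from a torchvision ResNet18 and displayed as RGB images, as in Figure 11, Figure 12 and Figure 13; the per-kernel min-max scaling used for display is an assumption made for visualization only.

```python
import matplotlib.pyplot as plt

def plot_first_layer_kernels(model):
    """Display the 64 kernels (3 x 7 x 7) of ResNet18's first convolutional layer as RGB images."""
    weights = model.conv1.weight.detach().cpu()  # shape: (64, 3, 7, 7)
    fig, axes = plt.subplots(8, 8, figsize=(8, 8))
    for kernel, ax in zip(weights, axes.flat):
        kernel = (kernel - kernel.min()) / (kernel.max() - kernel.min())  # scale to [0, 1] for display
        ax.imshow(kernel.permute(1, 2, 0))  # (7, 7, 3) RGB image
        ax.axis("off")
    plt.show()
```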
Figure 12 shows a comparison of the weights when RawFooT is the dataset used for the primary task. Figure 12a shows the weights of the first convolutional layer of the model before fine-tuning, and Figure 12b shows the weights after fine-tuning on the thin-section dataset. The weights in Figure 12a act as different color and edge detectors. In contrast to Figure 11b, the edge filters in Figure 12a show a sensitivity to color and are not as well defined. There are also semi-circular filters, or color blobs. The weights in Figure 12b retain some of these features, but exhibit weaker organization, moving towards values similar to those of the baseline ResNet18 model (Figure 11a).
This transition towards weights similar to what is observed for the baseline is also evident when MSI vs. MSS is the dataset used for the primary task. Figure 13a shows the weights of the model trained on MSI vs. MSS before fine-tuning, and Figure 13b shows the weights of the model after fine-tuning on the thin-section dataset. The weights in Figure 13a show somewhat oriented edge detectors, as well as some color blobs. Much of this organization, however, is lost after the model is fine-tuned on the thin-section dataset (Figure 13b). ResNet18 pretrained on HAM10000 shows a similar behavior.

3.2. Interpretation

We interpret that the higher accuracy of ResNet18 when compared to ResNet50 in Table 5 is an indication either that the accuracy becomes saturated even with residual connections, or that ResNet50 is harder to train due to its larger number of parameters. Further investigation and different hyperparameters might diminish this issue, as accuracy saturation is one of the issues ResNets were proposed to address. Table 5 also shows that there is a large difference in accuracy between train and test sets for MSI vs. MSS for both ResNet18 (0.2) and ResNet50 (0.19). We interpret that one of the main reasons for this difference is that the training data are perfectly balanced, whereas MSS samples greatly outnumber MSI samples in the test set (Table 4).
Table 6 shows that ResNet18 also has a better performance than ResNet50 in the thin-section data. In this case, the interpretation is that the reduced performance is explained by the large number of parameters in the larger model. Without a sufficiently large dataset, ResNet50 has a greater capability to overfit the training data. The table also shows the importance of the choice of hyperparameters. Two examples are easily noted due to their weak performance in comparison to the other experiments, namely rows 5 and 8. As described, the interpretation for the difference in performance is attributed to the choice of hyperparameters.
The most noticeable pattern in the confusion matrices in Figure 10 is the misclassification between the classes BMdst and AMdst, where the models classify BMdst as AMdst; this is stronger for the baseline model. This confusion arises because these two classes are very similar clay-rich facies; the only difference between them is the amount of bioturbation, which is higher in the BMdst. Additionally, this bioturbation is described using 2.5× magnification, but it is not particularly evident when using 10× magnification. However, this demonstrates the ability of the models to differentiate between two classes that show only small differences and that were described using a magnification different from some of those used by the model. Figure 10 also shows that the ResNet18 pretrained on RawFooT was the only model that did not incorrectly classify BMdst samples as AMdst. We interpret that the combination of lighting and texture present in RawFooT helped the model learn filters that are also useful for bioturbation detection, although this is not clearly defined in the weights in Figure 12. In contrast to the challenges between AMdst and BMdst, the models show better results when differentiating between MCSt and MCcSt. The difference between MCSt and MCcSt is the amount of calcite cement, which is evident at different scales. Moreover, the samples are stained with Alizarin red, which is used for calcite identification and also helps the models.
The results in Figure 7 and Table 7 are somewhat unexpected because they indicate that training the networks with randomly initialized weights or fine-tuning models pretrained on the proposed datasets provide similar mean accuracy results, whereas fine-tuning networks pretrained on ImageNet achieves higher performance. Based on Figure 11, Figure 12 and Figure 13, our interpretation is that the features learned by models trained on ImageNet are more general. For example, Figure 11b shows several mostly black-and-white edge detectors, indicating color invariance, and some well-defined color blobs. The results in Figure 12a and Figure 13a show some similarity to the ImageNet weights, but are sensitive to color contrast (the edge detectors are colorful and the blobs are not surrounded by a grey background). As the weights obtained when training on the proposed datasets seem more specific to each one of the datasets used in the primary task, these weights are more disturbed during the fine-tuning and, in general, move towards the values found for the baseline in Figure 11a. Curiously as well, these baseline weights do not show easily recognizable features and visually appear almost random. This is likely due to a combination of the small number of samples in the dataset, the general nature of the thin-section images in which minerals are generally somewhat randomly distributed, and the small subsection of weights (only one layer) investigated.

4. Discussion

As previously observed in several studies, the results presented here show once again that CNNs can achieve high levels of accuracy for the classification of petrographic thin-section image data. Using transfer learning and repurposing models pretrained for the classification of ImageNet can help increase the performance of CNN models. Unlike most of the previous studies, here we evaluate the performance of pretraining CNNs on different datasets that are visually more similar to petrographic thin-section images than the natural images of ImageNet. On average, pretraining ResNet18 and ResNet50 on the proposed datasets did not improve the accuracy significantly compared to training such models with randomly initialized weights (Figure 7 and Table 7). However, hyperparameter tuning shows that pretraining ResNet18 (ResNet50) on the proposed datasets can lead to improvements in accuracy of up to 4% (7%) in the classification of petrographic thin-section image data, as the "Maximum" column in Table 7 shows. In the experiments described here, we limited the hyperparameter search, selecting among some of the most common hyperparameter values, as that was sufficient for the proposed analysis. However, several hyperparameters and training strategies could be evaluated when the objective is directly related to improving model accuracy. For example, regularization and decaying the learning rate can help with overfitting. Bello et al. [41] studied how training strategies and hyperparameters can affect the performance of ResNets.
Although the MSI vs. MSS dataset was the largest of the proposed datasets, with more than 90k samples in the train set (Table 4), pretraining ResNets on MSI vs. MSS did not improve the classification significantly when compared to the HAM10000 (roughly 10k samples in the train set, Table 2) or RawFooT (roughly 25k samples in the train set, Table 3) datasets. This is unexpected, as the larger dataset was anticipated to help the model generate more robust filters compared to the smaller datasets. In fact, the results obtained by pretraining the models on the proposed datasets are similar, except for ResNet50 pretrained on the RawFooT dataset, which shows a slightly improved mean and minimum value. This is likely due to ResNet50 learning more generic filters, such as edge and texture detectors, on RawFooT than on the other proposed datasets.
The results of the experiments show that the widely adopted strategy of using models pretrained on ImageNet is usually very favorable for the classification of petrographic thin-section data, even though the datasets have different distributions. Models pretrained on ImageNet tend to exhibit strong performance with a wide selection of hyperparameters, making them easier to train. We argue that such superior performance is due to two main factors: the number of samples and the variability in ImageNet, and the overall GPU hours employed for hyperparameter tuning. As previously described, the number of samples in ImageNet is one or two orders of magnitude larger than the number of samples in the proposed datasets. Computer vision researchers training models with ImageNet datasets generally have access to more powerful GPUs and more experience in computer vision tasks, and are thus able to perform more experiments in the search for a good hyperparameter setting than the average geoscience researcher. Moreover, the weights for models trained on ImageNet are readily available in the most popular deep learning frameworks. Thus, using models pretrained on ImageNet remains a valuable approach for the classification of petrographic thin-section image data. Yosinski et al. [13] showed that transferring features even from distant tasks can be better than training models with randomly initialized weights, although feature transferability decreases as the distance between the primary and secondary task grows. The results presented here are thus partially aligned with what was observed before in [13]. The initial expectation was that the proposed datasets would facilitate transferability; however, the results show that more accurate models can be obtained by fine-tuning models previously trained on ImageNet.
In the study presented here, we use a five-crop technique to accommodate images larger than what is generally used as input for CNN models and to generate models that are able to classify petrographic thin-section images based on an average of the image. Extracting patches and averaging their predictions is a common strategy widely used to increase performance (e.g., [3,42]). Sultana et al. [43] showed that multiple cropping strategies often outperform single cropping strategies. The five-crop technique is somewhat simpler than what was presented in [26], with the advantage of also being incorporated in some deep learning frameworks. Generally, in [26], the thin-section photograph was split into six smaller crops, and each crop was treated as one independent sample. The final classification of the full photograph was given based on the classification of each one of such smaller crops. Such a technique allows for finer control over the number of crops extracted from a single photograph; however, converting the final probabilities to a single class for each one of the crops, i.e., converting a continuous value to a categorical value, might attenuate small differences that are better accommodated by averaging the predictions over the photograph.
The results of the experiments presented here also show that the petrographic thin-section images could be correctly classified even though the images were taken with two different magnification levels, indicating that CNNs can be trained to be scale invariant. Thin-section images at different magnifications were successfully used before, for example, by Koeshidayatullah et al. [28], who assembled thin-section images with different magnifications for carbonate petrography. Graziani et al. [44] observed that CNNs trained on natural images must achieve scale invariance due to viewpoint variations, and they studied this property in models trained on ImageNet. It is helpful to know that CNN models did not struggle with such variations in the characteristics of the data, as this can be used to facilitate the assemblage of larger petrographic thin-section datasets. That said, we anticipate that the variable magnification strategy might fail for some objectives. For example, the magnification level necessary for a sandstone modal analysis is very different from the magnification level needed to identify types of clays in the same sample, which indicates the same dataset would likely need labels for different levels of magnification. We believe multiple magnification levels can be used if the general characteristics of the thin section leading to the defined classification (e.g., mineralogical composition and texture) can be observed at different scales.
In general, the results presented for the best performing ResNet18s pretrained on different datasets show comparable performances. Moreover, [26] showed that the accuracy of such models tends to decrease when they are applied to classify thin-section data processed by different laboratories. Larger datasets would be helpful to understand which characteristics of the images cause the models to fail the classification. Although petrographers often photograph locations of the thin section on which the grains/crystals—or fossils—exhibit different behaviors, rather than the average behavior used for the thin-section description and microfacies classification, we believe the creation of a larger petrographic thin-section dataset can be helpful for many geoscientists. With larger models, our community would be able to pretrain robust petrographic classifiers and make weights available for fine-tuning on specific formations. However, the results of this paper, specifically the transfer learning analysis performed with the MSI vs. MSS dataset, indicate that such a petrographic dataset should likely have hundreds of thousands of samples.

5. Conclusions

The number of studies using models pretrained on ImageNet, and fine-tuning such models for the classification of thin-section data, is increasing. Despite the differences between the samples from ImageNet, a dataset with a wide range of classes and image characteristics, and the samples of petrographic thin-section image data, the fine-tuning strategy proved reliable and generated the models that achieved the best performance in experiments conducted with thin sections from five different wells of the Sycamore formation. Although expected to facilitate fine-tuning, on average, pretraining CNN models on datasets that are visually more similar to petrographic thin-section images did not show significant improvements over training CNN models with randomly initialized weights. Larger petrographic datasets should be helpful for the creation of more robust CNN models.

Author Contributions

Conceptualization, R.P.d.L. and D.D.; methodology, R.P.d.L.; software, R.P.d.L. and D.D.; validation, R.P.d.L. and D.D.; formal analysis, R.P.d.L.; investigation, R.P.d.L. and D.D.; resources, R.P.d.L. and D.D.; data curation, D.D.; writing—original draft preparation, R.P.d.L.; writing—review and editing, R.P.d.L. and D.D.; visualization, R.P.d.L.; supervision, R.P.d.L.; project administration, R.P.d.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The petrographic thin-section data presented in this study can be accessed through https://zenodo.org/record/5071794#.YONrUuj0lPY (accessed on 7 August 2021). The HAM10000 can be accessed using the following link: https://doi.org/10.7910/DVN/DBW86T (accessed on 7 August 2021). The dataset named in this manuscript as “MSI vs. MSS” can be accessed at https://zenodo.org/record/2530835#.YL_eOPn0lPb (accessed on 7 August 2021). This work uses only the colorectal cancer images. The RawFooT dataset can be accessed at http://projects.ivl.disco.unimib.it/minisites/rawfoot/ (accessed on 7 August 2021).

Acknowledgments

We thank our colleagues at the Attribute Assisted Seismic Processing and Interpretation consortium at the University of Oklahoma for the discussions that led us to investigate this problem, especially Kurt Marfurt. We thank Katie Welch for fruitful discussions and for help reviewing the manuscript. We thank the reviewers and editors for their valuable comments that greatly helped us improve the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This appendix provides extra details about the pretraining step. Table A1 shows the hyperparameter tuning for the different datasets used to pretrain the models. Table A2 shows the accuracies for each one of the sets and for each one of the models presented in Table A1. Table A2 also shows the approximate execution time (training and evaluation), with identification of the hardware for each one of the experiments. As described in the main text, the model stops training after five epochs without improvements on the validation set. ImageNet training parameters are not shown, as the pretrained models are downloaded directly from [33].
Table A1. Hyperparameters tested for pretraining models.
Model Name | Dataset | # of Layers | Batch Size | Optimizer | Learning Rate
ResNet18-M-128-A3 | MSI vs. MSS | 18 | 128 | Adam | 1 × 10−3
ResNet18-M-512-A3 | MSI vs. MSS | 18 | 512 | Adam | 1 × 10−3
ResNet18-M-1024-A3 | MSI vs. MSS | 18 | 1024 | Adam | 1 × 10−3
ResNet50-M-512-A3 | MSI vs. MSS | 50 | 512 | Adam | 1 × 10−3
ResNet50-M-1024-A3 | MSI vs. MSS | 50 | 1024 | Adam | 1 × 10−3
ResNet18-R-A2 | RawFooT | 18 | 128 | Adam | 1 × 10−2
ResNet18-R-A3 | RawFooT | 18 | 128 | Adam | 1 × 10−3
ResNet18-R-A4 | RawFooT | 18 | 128 | Adam | 1 × 10−4
ResNet50-R-A2 | RawFooT | 50 | 128 | Adam | 1 × 10−2
ResNet50-R-A3 | RawFooT | 50 | 128 | Adam | 1 × 10−3
ResNet50-R-A4 | RawFooT | 50 | 128 | Adam | 1 × 10−4
ResNet18-H-64-A3 | HAM10000 | 18 | 64 | Adam | 1 × 10−3
ResNet18-H-128-A3 | HAM10000 | 18 | 128 | Adam | 1 × 10−3
ResNet18-H-128-R3 | HAM10000 | 18 | 128 | RMSprop | 1 × 10−3
ResNet18-H-256-A3 | HAM10000 | 18 | 256 | Adam | 1 × 10−3
ResNet50-H-64-A3 | HAM10000 | 50 | 64 | Adam | 1 × 10−3
ResNet50-H-128-A3 | HAM10000 | 50 | 128 | Adam | 1 × 10−3
ResNet50-H-128-A4 | HAM10000 | 50 | 128 | Adam | 5 × 10−4
ResNet50-H-256-A3 | HAM10000 | 50 | 256 | Adam | 1 × 10−3
Table A2. Accuracy obtained for the primary task and other training information. The bold indicates best performing in group.
Model Name | Training | Validation | Test | GPU | Elapsed Time 1 | Epochs
ResNet18-M-128-A3 | 0.95 | 0.92 | 0.70 | GTX 1050 | 11.5 h | 34
ResNet18-M-512-A3 | 0.94 | 0.91 | 0.71 | GTX 1050 | 11.5 h | 33
ResNet18-M-1024-A3 | 0.93 | 0.91 | 0.70 | GTX 1050 | 9.5 h | 28
ResNet50-M-512-A3 | 0.92 | 0.91 | 0.68 | GTX 1050 | 24 h | 32
ResNet50-M-1024-A3 | 0.91 | 0.90 | 0.71 | GTX 1050 | 20 h | 27
ResNet18-R-A2 | 0.97 | 0.92 | 0.88 | Quadro K1200 | 12 h | 38
ResNet18-R-A3 | 0.98 | 0.96 | 0.88 | Quadro K1200 | 8 h | 25
ResNet18-R-A4 | 0.97 | 0.92 | 0.88 | Quadro K1200 | 11.5 h | 38
ResNet50-R-A2 | 0.85 | 0.79 | -- | Quadro K1200 | 6 h | 9
ResNet50-R-A3 | 0.89 | 0.87 | 0.73 | Quadro K1200 | 7 h | 10
ResNet50-R-A4 | 0.90 | 0.83 | -- | Quadro K1200 | 9 h | 14
ResNet18-H-64-A3 | 0.78 | 0.75 | 0.73 | GTX 1050 | 1 h | 16
ResNet18-H-128-A3 | 0.79 | 0.77 | 0.76 | GTX 1050 | 1.5 h | 21
ResNet18-H-128-R3 | 0.80 | 0.76 | 0.75 | GTX 1050 | 2 h | 30
ResNet18-H-256-A3 | 0.77 | 0.73 | 0.75 | GTX 1050 | 1 h | 15
ResNet50-H-64-A3 | 0.78 | 0.75 | 0.73 | GTX 1050 | 2 h | 29
ResNet50-H-128-A3 | 0.78 | 0.74 | 0.74 | GTX 1050 | 2 h | 28
ResNet50-H-128-A4 | 0.77 | 0.76 | 0.74 | GTX 1050 | 1.5 h | 16
ResNet50-H-256-A3 | 0.80 | 0.76 | 0.75 | GTX 1050 | 2.5 h | 32
1 The elapsed time includes all training and testing, but performance might vary according to other processes running concomitantly on the same machine.

Appendix B

Hyperparameter search for fine-tuned models discussed in Section 3.
Table A3. Hyperparameter search for fine-tuned models.
Model | Pretrained | Batch Size | Batch Size (Accumulated) | Optimizer | Learning Rate | Patience | Last Epoch | Train Accuracy | Validation Accuracy | Test Accuracy
ResNet18 | RawFooT | 4 | 8 | Adam | 1.00 × 10−3 | 10 | 25 | 0.73 | 0.64 | 0.65
ResNet18 | RawFooT | 4 | 16 | Adam | 1.00 × 10−3 | 10 | 49 | 0.82 | 0.72 | 0.85
ResNet18 | RawFooT | 4 | 32 | Adam | 1.00 × 10−4 | 10 | 26 | 0.84 | 0.79 | 0.81
ResNet18 | RawFooT | 4 | 32 | Adam | 1.00 × 10−3 | 10 | 32 | 0.77 | 0.78 | 0.80
ResNet18 | RawFooT | 4 | 64 | Adam | 1.00 × 10−4 | 10 | 35 | 0.87 | 0.69 | 0.80
ResNet18 | RawFooT | 4 | 64 | RMSprop | 1.00 × 10−3 | 10 | 48 | 0.73 | 0.61 | 0.79
ResNet18 | RawFooT | 4 | 64 | Adam | 1.00 × 10−3 | 10 | 72 | 0.95 | 0.85 | 0.93
ResNet18 | RawFooT | 4 | 128 | Adam | 1.00 × 10−3 | 10 | 37 | 0.85 | 0.79 | 0.82
ResNet18 | MSI vs. MSS | 16 | 32 | Adam | 1.00 × 10−4 | 5 | 26 | 0.91 | 0.78 | 0.84
ResNet18 | MSI vs. MSS | 16 | 64 | Adam | 1.00 × 10−3 | 5 | 25 | 0.91 | 0.79 | 0.77
ResNet18 | MSI vs. MSS | 16 | 64 | Adam | 5.00 × 10−5 | 5 | 21 | 0.87 | 0.74 | 0.85
ResNet18 | MSI vs. MSS | 16 | 64 | RMSprop | 1.00 × 10−4 | 5 | 21 | 0.82 | 0.56 | 0.73
ResNet18 | MSI vs. MSS | 16 | 64 | Adam | 1.00 × 10−4 | 5 | 25 | 0.92 | 0.78 | 0.91
ResNet18 | MSI vs. MSS | 16 | 128 | Adam | 1.00 × 10−4 | 5 | 23 | 0.88 | 0.75 | 0.85
ResNet18 | ImageNet | 16 | 32 | Adam | 1.00 × 10−4 | 5 | 14 | 1.00 | 0.89 | 0.94
ResNet18 | ImageNet | 16 | 64 | RMSprop | 5.00 × 10−5 | 5 | 20 | 1.00 | 0.92 | 0.94
ResNet18 | ImageNet | 16 | 64 | RMSprop | 1.00 × 10−4 | 5 | 19 | 0.99 | 0.82 | 0.95
ResNet18 | ImageNet | 16 | 64 | Adam | 5.00 × 10−5 | 5 | 18 | 0.98 | 0.93 | 0.94
ResNet18 | ImageNet | 16 | 64 | Adam | 1.00 × 10−4 | 5 | 15 | 1.00 | 0.91 | 0.95
ResNet18 | ImageNet | 16 | 128 | Adam | 1.00 × 10−4 | 5 | 18 | 0.99 | 0.93 | 0.94
ResNet18 | HAM10000 | 16 | 32 | Adam | 5.00 × 10−5 | 5 | 24 | 0.91 | 0.77 | 0.86
ResNet18 | HAM10000 | 16 | 64 | RMSprop | 5.00 × 10−5 | 5 | 14 | 0.82 | 0.52 | 0.68
ResNet18 | HAM10000 | 16 | 64 | Adam | 1.00 × 10−5 | 5 | 52 | 0.85 | 0.79 | 0.86
ResNet18 | HAM10000 | 16 | 64 | Adam | 5.00 × 10−5 | 5 | 37 | 0.95 | 0.86 | 0.93
ResNet18 | HAM10000 | 16 | 64 | Adam | 1.00 × 10−4 | 5 | 18 | 0.89 | 0.57 | 0.81
ResNet18 | HAM10000 | 16 | 128 | Adam | 5.00 × 10−5 | 5 | 32 | 0.89 | 0.78 | 0.86
ResNet50 | RawFooT | 4 | 8 | Adam | 1.00 × 10−3 | 10 | 58 | 0.79 | 0.75 | 0.84
ResNet50 | RawFooT | 4 | 16 | Adam | 1.00 × 10−3 | 10 | 30 | 0.75 | 0.76 | 0.79
ResNet50 | RawFooT | 4 | 32 | Adam | 1.00 × 10−4 | 10 | 43 | 0.82 | 0.71 | 0.84
ResNet50 | RawFooT | 4 | 32 | Adam | 1.00 × 10−3 | 10 | 50 | 0.87 | 0.69 | 0.73
ResNet50 | RawFooT | 4 | 64 | RMSprop | 1.00 × 10−3 | 10 | 34 | 0.66 | 0.48 | 0.55
ResNet50 | RawFooT | 4 | 64 | Adam | 1.00 × 10−3 | 10 | 39 | 0.80 | 0.77 | 0.89
ResNet50 | MSI vs. MSS | 4 | 8 | Adam | 1.00 × 10−3 | 10 | 53 | 0.76 | 0.74 | 0.65
ResNet50 | MSI vs. MSS | 4 | 16 | Adam | 1.00 × 10−3 | 10 | 35 | 0.72 | 0.48 | 0.82
ResNet50 | MSI vs. MSS | 4 | 32 | Adam | 5.00 × 10−5 | 10 | 25 | 0.63 | 0.56 | 0.69
ResNet50 | MSI vs. MSS | 4 | 32 | Adam | 1.00 × 10−4 | 10 | 21 | 0.73 | 0.64 | 0.66
ResNet50 | MSI vs. MSS | 4 | 64 | RMSprop | 1.00 × 10−3 | 10 | 32 | 0.60 | 0.55 | 0.54
ResNet50 | MSI vs. MSS | 4 | 64 | Adam | 1.00 × 10−3 | 10 | 40 | 0.83 | 0.75 | 0.81
ResNet50 | MSI vs. MSS | 4 | 128 | Adam | 1.00 × 10−3 | 10 | 57 | 0.92 | 0.76 | 0.77
ResNet50 | ImageNet | 4 | 8 | Adam | 1.00 × 10−3 | 10 | 14 | 0.78 | 0.70 | 0.82
ResNet50 | ImageNet | 4 | 16 | Adam | 1.00 × 10−3 | 10 | 14 | 0.79 | 0.77 | 0.85
ResNet50 | ImageNet | 4 | 32 | RMSprop | 1.00 × 10−4 | 10 | 28 | 1.00 | 0.91 | 0.90
ResNet50 | ImageNet | 4 | 32 | Adam | 5.00 × 10−5 | 10 | 31 | 1.00 | 0.90 | 0.84
ResNet50 | ImageNet | 4 | 32 | Adam | 1.00 × 10−4 | 10 | 37 | 1.00 | 0.94 | 0.90
ResNet50 | ImageNet | 4 | 64 | Adam | 1.00 × 10−3 | 10 | 33 | 0.96 | 0.85 | 0.84
ResNet50 | ImageNet | 4 | 128 | Adam | 1.00 × 10−3 | 10 | 27 | 0.98 | 0.75 | 0.76
ResNet50 | HAM10000 | 4 | 8 | Adam | 1.00 × 10−3 | 10 | 34 | 0.69 | 0.74 | 0.74
ResNet50 | HAM10000 | 4 | 16 | Adam | 1.00 × 10−3 | 10 | 29 | 0.72 | 0.61 | 0.74
ResNet50 | HAM10000 | 4 | 32 | Adam | 5.00 × 10−5 | 10 | 24 | 0.65 | 0.53 | 0.68
ResNet50 | HAM10000 | 4 | 32 | Adam | 1.00 × 10−4 | 10 | 22 | 0.74 | 0.64 | 0.68
ResNet50 | HAM10000 | 4 | 64 | Adam | 1.00 × 10−4 | 10 | 31 | 0.79 | 0.68 | 0.66
ResNet50 | HAM10000 | 4 | 64 | RMSprop | 1.00 × 10−3 | 10 | 28 | 0.65 | 0.59 | 0.49
ResNet50 | HAM10000 | 4 | 64 | Adam | 1.00 × 10−3 | 10 | 54 | 0.92 | 0.79 | 0.89
ResNet50 | HAM10000 | 4 | 128 | Adam | 1.00 × 10−3 | 10 | 39 | 0.89 | 0.85 | 0.77

Appendix C

This appendix provides extra performance metrics for the test set for the best performing fine-tuned ResNet18 models when pretrained on the ImageNet, HAM10000, and MSI vs. MSS datasets, as well as the Baseline ResNet18, as described in Section 3, specifically in relation to the results shown in Figure 10.
Table A4. Classification report for ResNet18—baseline. The overall accuracy is 0.89.
Class | Precision | Recall | F1-Score | Support
AMdst | 0.77 | 1.00 | 0.87 | 20
BMdst | 0.93 | 0.70 | 0.80 | 20
MCSt | 0.95 | 0.95 | 0.95 | 20
MCcSt | 0.95 | 0.90 | 0.92 | 20
macro average | 0.90 | 0.89 | 0.89 | 80
weighted average | 0.90 | 0.89 | 0.89 | 80
Table A5. Classification report for ResNet18 pretrained on ImageNet. The overall accuracy is 0.95.
Class | Precision | Recall | F1-Score | Support
AMdst | 0.95 | 0.90 | 0.92 | 20
BMdst | 0.90 | 0.90 | 0.90 | 20
MCSt | 1.00 | 1.00 | 1.00 | 20
MCcSt | 0.95 | 1.00 | 0.98 | 20
macro average | 0.95 | 0.95 | 0.95 | 80
weighted average | 0.95 | 0.95 | 0.95 | 80
Table A6. Classification report for ResNet18 pretrained on HAM10000. The overall accuracy is 0.93.
Class | Precision | Recall | F1-Score | Support
AMdst | 0.90 | 0.95 | 0.93 | 20
BMdst | 1.00 | 0.85 | 0.92 | 20
MCSt | 0.90 | 0.95 | 0.93 | 20
MCcSt | 0.90 | 0.95 | 0.93 | 20
macro average | 0.93 | 0.93 | 0.92 | 80
weighted average | 0.93 | 0.93 | 0.92 | 80
Table A7. Classification report for ResNet18 pretrained on RawFooT. The overall accuracy is 0.93.
Class | Precision | Recall | F1-Score | Support
AMdst | 1.00 | 0.85 | 0.92 | 20
BMdst | 0.90 | 0.95 | 0.93 | 20
MCSt | 0.95 | 0.90 | 0.92 | 20
MCcSt | 0.87 | 1.00 | 0.93 | 20
macro average | 0.93 | 0.92 | 0.92 | 80
weighted average | 0.93 | 0.93 | 0.92 | 80
Table A8. Classification report for ResNet18 pretrained on MSI vs. MSS. The overall accuracy is 0.91.
Class | Precision | Recall | F1-Score | Support
AMdst | 0.80 | 1.00 | 0.89 | 20
BMdst | 1.00 | 0.70 | 0.82 | 20
MCSt | 1.00 | 0.95 | 0.97 | 20
MCcSt | 0.91 | 1.00 | 0.95 | 20
macro average | 0.93 | 0.91 | 0.91 | 80
weighted average | 0.93 | 0.91 | 0.91 | 80
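The reports in Tables A4–A8 list per-class precision, recall, F1-score, and support, together with macro and weighted averages. Assuming the test-set labels and model predictions are available as integer class indices, a report in this format, and the actual-versus-predicted counts summarized in Figure 10, could be produced with scikit-learn as sketched below; the arrays shown are placeholders, not the values used in the study.

from sklearn.metrics import classification_report, confusion_matrix

# Placeholder labels and predictions; an assumed encoding of AMdst=0, BMdst=1, MCSt=2, MCcSt=3.
y_true = [0, 1, 2, 3, 0, 1]
y_pred = [0, 1, 2, 3, 1, 1]
class_names = ["AMdst", "BMdst", "MCSt", "MCcSt"]

# Per-class precision, recall, F1-score, and support, plus macro and weighted averages.
print(classification_report(y_true, y_pred, target_names=class_names, digits=2))

# Actual vs. predicted counts, the raw numbers behind a confusion matrix plot such as Figure 10.
print(confusion_matrix(y_true, y_pred))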

References

1. Fukushima, K. Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position. Biol. Cybern. 1980, 36, 193–202.
2. LeCun, Y.; Boser, B.E.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.E.; Jackel, L.D. Handwritten Digit Recognition with a Back-Propagation Network. In Advances in Neural Information Processing Systems 2; Touretzky, D.S., Ed.; Morgan-Kaufmann: Denver, CO, USA, 1990; pp. 396–404.
3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
4. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
5. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
6. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
7. LeCun, Y. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 7 August 2021).
8. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
9. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report TR-2009; University of Toronto: Toronto, ON, Canada, 2009.
10. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
11. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning; Guyon, I., Dror, G., Lemaire, V., Taylor, G., Silver, D., Eds.; PMLR: Bellevue, WA, USA, 2012; Volume 27, pp. 17–36.
12. Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 512–519.
13. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How Transferable Are Features in Deep Neural Networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328.
14. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. In Proceedings of the International Workshop at International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
15. Olah, C.; Mordvintsev, A.; Schubert, L. Feature Visualization. Distill 2017, 2, e7.
16. Olah, C.; Satyanarayan, A.; Johnson, I.; Carter, S.; Schubert, L.; Ye, K.; Mordvintsev, A. The Building Blocks of Interpretability. Distill 2018, 3, e10.
17. Carter, S.; Armstrong, Z.; Schubert, L.; Johnson, I.; Olah, C. Activation Atlas. Distill 2019, 4, e15.
18. Hu, F.; Xia, G.-S.; Hu, J.; Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 2015, 7, 14680–14707.
19. Zamir, A.R.; Sax, A.; Shen, W.; Guibas, L.J.; Malik, J.; Savarese, S. Taskonomy: Disentangling Task Transfer Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
20. Norouzzadeh, M.S.; Nguyen, A.; Kosmala, M.; Swanson, A.; Palmer, M.S.; Packer, C.; Clune, J. Automatically Identifying, Counting, and Describing Wild Animals in Camera-Trap Images with Deep Learning. Proc. Natl. Acad. Sci. USA 2018, 115, E5716–E5725.
21. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 Dataset, a Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions. Sci. Data 2018, 5, 180161.
22. Kather, J.N.; Pearson, A.T.; Halama, N.; Jäger, D.; Krause, J.; Loosen, S.H.; Marx, A.; Boor, P.; Tacke, F.; Neumann, U.P.; et al. Deep Learning Can Predict Microsatellite Instability Directly from Histology in Gastrointestinal Cancer. Nat. Med. 2019, 25, 1054–1056.
23. Pires de Lima, R.; Marfurt, K. Convolutional Neural Network for Remote-Sensing Scene Classification: Transfer Learning Analysis. Remote Sens. 2019, 12, 86.
24. Pires de Lima, R.; Suriamin, F.; Marfurt, K.J.; Pranter, M.J. Convolutional Neural Networks as Aid in Core Lithofacies Classification. Interpretation 2019, 7, SF27–SF40.
25. Baraboshkin, E.E.; Ismailova, L.S.; Orlov, D.M.; Zhukovskaya, E.A.; Kalmykov, G.A.; Khotylev, O.V.; Baraboshkin, E.Y.; Koroteev, D.A. Deep Convolutions for In-Depth Automated Rock Typing. Comput. Geosci. 2020, 135, 104330.
26. Pires de Lima, R.; Duarte, D.; Nicholson, C.; Slatt, R.; Marfurt, K.J. Petrographic Microfacies Classification with Deep Convolutional Neural Networks. Comput. Geosci. 2020, 142, 104481.
27. Liu, X.; Song, H. Automatic Identification of Fossils and Abiotic Grains during Carbonate Microfacies Analysis Using Deep Convolutional Neural Networks. Sediment. Geol. 2020, 410, 105790.
28. Koeshidayatullah, A.; Morsilli, M.; Lehrmann, D.J.; Al-Ramadan, K.; Payne, J.L. Fully Automated Carbonate Petrography Using Deep Convolutional Neural Networks. Mar. Pet. Geol. 2020, 122, 104687.
29. Ma, H.; Han, G.; Peng, L.; Zhu, L.; Shu, J. Rock Thin Sections Identification Based on Improved Squeeze-and-Excitation Networks Model. Comput. Geosci. 2021, 152, 104780.
30. Cusano, C.; Napoletano, P.; Schettini, R. Evaluating Color Texture Descriptors under Large Variations of Controlled Lighting Conditions. J. Opt. Soc. Am. A 2016, 33, 17–30.
31. Limare, N.; Lisani, J.-L.; Morel, J.-M.; Petro, A.B.; Sbert, C. Simplest Color Balance. Image Process. Line 2011, 1, 297–315.
32. Bianco, S.; Cusano, C.; Napoletano, P.; Schettini, R. Improving CNN-Based Texture Classification by Color Balancing. J. Imaging 2017, 3, 33.
33. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035.
34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
35. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
36. Tieleman, T.; Hinton, G. Lecture 6.5—RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude. 2012. Available online: https://www.youtube.com/watch?v=SJ48OZ_qlrc (accessed on 7 August 2021).
37. Falcon, W.A. PyTorch Lightning. GitHub. 2019, 3. Available online: https://github.com/PyTorchLightning/pytorch-lightning (accessed on 7 August 2021).
38. Biewald, L. Experiment Tracking with Weights and Biases. Available online: https://www.wandb.com/ (accessed on 7 August 2021).
39. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
40. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. Int. J. Mach. Learn. Technol. 2011, 2, 37–63.
41. Bello, I.; Fedus, W.; Du, X.; Cubuk, E.D.; Srinivas, A.; Lin, T.-Y.; Shlens, J.; Zoph, B. Revisiting ResNets: Improved Training and Scaling Strategies. arXiv 2021, arXiv:2103.07579.
42. Liu, Y.; Yin, B.; Yu, J.; Wang, Z. Image Classification Based on Convolutional Neural Networks with Cross-Level Strategy. Multimed. Tools Appl. 2017, 76, 11065–11079.
43. Sultana, F.; Sufian, A.; Dutta, P. Advancements in Image Classification Using Convolutional Neural Network. In Proceedings of the 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India, 22–23 November 2018; pp. 122–129.
44. Graziani, M.; Lompech, T.; Müller, H.; Depeursinge, A.; Andrearczyk, V. On the Scale Invariance in State of the Art CNNs Trained on ImageNet. Mach. Learn. Knowl. Extr. 2021, 3, 19.
Figure 1. Thin-section data: (a) color balancing effect; (b) five-crop technique. The bottom row shows each one of the five crops extracted from the original sample and used for training and inference.
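The five-crop extraction illustrated in (b) matches the behaviour of the standard torchvision FiveCrop transform (four corner crops plus the center crop). A minimal sketch is given below; the 224-pixel crop size and the averaging of the per-crop predictions at inference are assumptions made for illustration, not necessarily the exact procedure used in this study.

import torch
from torchvision import transforms

# Extract the four corner crops and the center crop, then stack them into a single tensor.
five_crop = transforms.Compose([
    transforms.FiveCrop(224),  # returns a tuple of five PIL images
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

# At inference, one common option is to average the model outputs over the five crops:
# a batch of shape (B, 5, C, H, W) is flattened to (B*5, C, H, W), passed through the
# network, and the resulting logits are averaged per original image.
def predict_five_crop(model, batch):
    b, ncrops, c, h, w = batch.shape
    logits = model(batch.view(-1, c, h, w))
    return logits.view(b, ncrops, -1).mean(dim=1)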
Figure 2. Examples of images in the petrographic thin-section test set and their corresponding labels. The figure shows two images for each one of the classes. The top image of each class is taken with a magnification zoom of 2.5×, the bottom image with 10×. The red labels in the bottom right-hand corner are randomly present in some 10× images, indicating 200 µm. Class names and descriptions are provided in Table 1.
Figure 4. Examples of images in the RawFooT [30] test set and their corresponding labels. Table 3 shows class names and number of samples in the dataset.
Figure 6. Training accuracy and loss for models trained on the thin-section data when the model starts with randomly initialized weights: (a) accuracy by epoch for the train and validation sets of models ResNet18 and ResNet50; (b) corresponding loss by epoch.
Figure 7. Test set accuracy for the transfer learning experiments: (a) accuracy for fine-tuned ResNet18 and (b) accuracy for fine-tuned ResNet50. Solid bars indicate the mean accuracy values. Black lines indicate minimum and maximum accuracy values. Blue bar on top indicates the baseline models trained with randomly initialized weights, as discussed in the text. The mean, minimum, and maximum values are shown in Table 7.
Figure 8. Experiment comparison by hyperparameter for ResNet50. The figure shows some of the hyperparameters tested and the corresponding accuracy on the test set. Solid lines show results for ResNet50 pretrained on ImageNet; dashed lines show results for ResNet50 trained with randomly initialized weights.
Figure 9. Training accuracy and loss for ResNet18 fine-tuned on the thin-section data: (a) accuracy by epoch for the train and validation sets of the best-performing ImageNet-, HAM10000-, and MSI vs. MSS-pretrained models; (b) corresponding loss by epoch.
Figure 10. Confusion matrix computed on the test set for ResNet18 models. The color of the bars corresponds to the pretraining dataset. The size of the bar indicates the number of samples classified in each of the actual vs. predicted locations. Description of each one of the classes is given in Table 1.
Figure 11. Weights for the first convolutional layer for ResNet18 used for the classification of the thin-section data: (a) baseline weights; (b) ResNet18 pretrained on ImageNet and fine-tuned on the thin-section dataset.
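Figures 11–13 display the 64 filters of the first convolutional layer (conv1) of ResNet18. One possible way to produce this type of panel from a torchvision ResNet18 is sketched below; the grid layout and per-filter rescaling to the [0, 1] range are display choices assumed here, not necessarily those used to render the published figures.

import matplotlib.pyplot as plt
from torchvision import models, utils

# conv1 of ResNet18 holds 64 filters of shape 3 x 7 x 7.
model = models.resnet18(pretrained=True)
filters = model.conv1.weight.detach().cpu()

# Tile the filters into an 8 x 8 grid, rescaling each filter individually for display.
grid = utils.make_grid(filters, nrow=8, normalize=True, scale_each=True)

plt.imshow(grid.permute(1, 2, 0).numpy())  # channels-last layout expected by matplotlib
plt.axis("off")
plt.show()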
Figure 12. Weights for the first convolutional layer for ResNet18 used for the classification of the thin-section dataset: (a) ResNet18 trained on RawFooT; (b) same weights after the model is fine-tuned on the thin-section dataset.
Figure 13. Weights for the first convolutional layer for ResNet18 used for the classification of the thin-section data: (a) ResNet18 trained on MSI vs. MSS; (b) same weights after the model is fine-tuned on the thin-section dataset.
Table 1. Thin-section data details.
Class | Microfacies | Description | Train | Test
AMdst | Argillaceous mudstone | Clay-rich mudstones. Structureless or slightly laminated. | 63 | 20
BMdst | Bioturbated mudstone | Clay-rich mudstones. Evident bioturbation at 10× magnification. | 134 | 20
MCcSt | Massive calcite cemented siltstone | Silt-rich mudstones. Structureless, abundant calcite cement and calcareous pellets. | 104 | 20
MCSt | Massive calcareous siltstone | Silt-rich mudstones. Structureless, some calcite cement. | 132 | 20
Table 2. The HAM10000 dataset details.
Class | Description | Train | Test
bkl | Benign keratosis-like lesions (solar lentigines/seborrheic keratoses and lichen-planus like keratoses) | 871 | 228
nv | Melanocytic nevi | 5367 | 1338
df | Dermatofibroma | 87 | 28
mel | Melanoma | 887 | 226
vasc | Vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage) | 121 | 21
bcc | Basal cell carcinoma | 421 | 93
akiec | Actinic keratoses and intraepithelial carcinoma/Bowen’s disease | 258 | 69
Table 3. The RawFooT dataset details.
Class | Description | Train | Test | Class | Description | Train | Test
0001 | chickpeas | 368 | 368 | 0035 | hazelnut grain | 368 | 368
0002 | corn | 368 | 368 | 0036 | flour | 368 | 368
0003 | salt | 368 | 368 | 0037 | bread crumbs | 368 | 368
0004 | cookie | 368 | 368 | 0038 | pasta (stars) | 368 | 368
0005 | lentils | 368 | 368 | 0039 | cut spaghetti | 368 | 368
0006 | candies | 368 | 368 | 0040 | pastina | 368 | 368
0007 | green peas | 368 | 368 | 0041 | red cabbage | 368 | 368
0008 | puffed rice | 368 | 368 | 0042 | grapefruit | 368 | 368
0009 | spelt | 368 | 368 | 0043 | hamburger | 368 | 368
0010 | white peas | 368 | 368 | 0044 | swordfish | 368 | 368
0011 | cous cous | 368 | 368 | 0045 | bread | 368 | 368
0012 | sliced bread | 368 | 368 | 0046 | candied fruit | 368 | 368
0013 | apple slice | 368 | 368 | 0047 | chili pepper | 368 | 368
0014 | pearl barley | 368 | 368 | 0048 | milk chocolate | 368 | 368
0015 | oat | 368 | 368 | 0049 | garlic grain | 368 | 368
0016 | black rice | 368 | 368 | 0050 | curry | 368 | 368
0017 | quinoa | 368 | 368 | 0051 | pink pepper | 368 | 368
0018 | buckwheat | 368 | 368 | 0052 | kiwi | 368 | 368
0019 | puffed rice | 368 | 368 | 0053 | mango | 368 | 368
0020 | basmati rice | 368 | 368 | 0054 | pomegranate | 368 | 368
0021 | steak | 368 | 368 | 0055 | currant | 368 | 368
0022 | fennel seeds | 368 | 368 | 0056 | pumpkin seeds | 368 | 368
0023 | poppy seeds | 368 | 368 | 0057 | tea | 368 | 368
0024 | brown sugar | 368 | 368 | 0058 | red lentils | 368 | 368
0025 | sultana | 368 | 368 | 0059 | green adzuki | 368 | 368
0026 | coffee powder | 368 | 368 | 0060 | linseeds | 368 | 368
0027 | polenta flour | 368 | 368 | 0061 | coconut flakes | 368 | 368
0028 | salami | 368 | 368 | 0062 | chicory | 368 | 368
0029 | air-cured beef | 368 | 368 | 0063 | pork loin | 368 | 368
0030 | flatbread | 368 | 368 | 0064 | chicken breast | 368 | 368
0031 | corn crackers | 368 | 368 | 0065 | carrots | 368 | 368
0032 | oregano | 368 | 368 | 0066 | sugar | 368 | 368
0033 | black beans | 368 | 368 | 0067 | salmon | 368 | 368
0034 | soluble coffee | 368 | 368 | 0068 | tuna | 368 | 368
Table 4. The MSI vs. MSS dataset details.
Class | Description | Train | Test
MSI | Microsatellite instable | 46,704 | 28,335
MSS | Microsatellite stable | 46,704 | 70,569
Table 5. Accuracy of the best performing models for the classification of each one of the datasets used for the primary task.
Dataset | Model | Train | Validation | Test
MSI vs. MSS | ResNet18 | 0.94 | 0.91 | 0.71
RawFooT | ResNet18 | 0.98 | 0.96 | 0.88
HAM10000 | ResNet18 | 0.79 | 0.77 | 0.76
MSI vs. MSS | ResNet50 | 0.91 | 0.90 | 0.71
RawFooT | ResNet50 | 0.89 | 0.87 | 0.73
HAM10000 | ResNet50 | 0.80 | 0.76 | 0.75
Table 6. Training hyperparameters and performance for models trained on the thin-section data when the model starts with randomly initialized weights.
Model | Batch Size (Accumulated) | Optimizer | Learning Rate | Last Epoch | Train Accuracy | Validation Accuracy | Test Accuracy
ResNet18 | 128 | Adam | 1.00 × 10−4 | 18 | 0.90 | 0.79 | 0.86 *
ResNet18 | 64 | Adam | 1.00 × 10−4 | 15 | 0.91 | 0.75 | 0.89 *
ResNet18 | 32 | Adam | 5.00 × 10−5 | 15 | 0.90 | 0.67 | 0.81
ResNet18 | 32 | Adam | 1.00 × 10−4 | 10 | 0.86 | 0.72 | 0.86
ResNet18 | 32 | Adam | 1.00 × 10−3 | 8 | 0.78 | 0.67 | 0.66
ResNet50 | 256 | Adam | 1.00 × 10−3 | 33 | 0.79 | 0.64 | 0.76
ResNet50 | 64 | Adam | 1.00 × 10−3 | 46 | 0.93 | 0.75 | 0.74
ResNet50 | 128 | RMSprop | 1.00 × 10−3 | 31 | 0.58 | 0.53 | 0.64
ResNet50 | 128 | Adam | 1.00 × 10−2 | 38 | 0.71 | 0.70 | 0.77
ResNet50 | 128 | Adam | 1.00 × 10−3 | 36 | 0.84 | 0.71 | 0.80
ResNet50 | 128 | Adam | 5.00 × 10−3 | 56 | 0.73 | 0.72 | 0.82
* Best performing models shown in Figure 6.
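The "Batch Size (Accumulated)" column presumably refers to gradient accumulation: gradients from several smaller batches are summed before a single optimizer update, so the effective batch size is larger than what fits in GPU memory at once. PyTorch Lightning, used for the experiments [37], exposes this behaviour through the accumulate_grad_batches argument of its Trainer; a self-contained toy sketch of the same idea in plain PyTorch is given below, with dimensions and data chosen only for illustration.

import torch
import torch.nn as nn

# Toy setup only; in the experiments the model is a ResNet and the inputs are thin-section crops.
model = nn.Linear(10, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
data = [(torch.randn(32, 10), torch.randint(0, 4, (32,))) for _ in range(8)]

accumulation_steps = 4  # e.g., a per-step batch of 32 accumulated to an effective batch of 128
optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = criterion(model(x), y) / accumulation_steps  # scale so the summed gradient matches one large batch
    loss.backward()                                      # gradients accumulate across steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()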
Table 7. Accuracies displayed in Figure 7.
Dataset | Model | Mean | Minimum | Maximum
Baseline | ResNet18 | 0.82 | 0.66 | 0.89
ImageNet | ResNet18 | 0.94 | 0.94 | 0.95
HAM10000 | ResNet18 | 0.83 | 0.68 | 0.93
RawFooT | ResNet18 | 0.81 | 0.65 | 0.93
MSI vs. MSS | ResNet18 | 0.83 | 0.73 | 0.91
Baseline | ResNet50 | 0.75 | 0.64 | 0.82
ImageNet | ResNet50 | 0.84 | 0.76 | 0.90
HAM10000 | ResNet50 | 0.70 | 0.49 | 0.89
RawFooT | ResNet50 | 0.77 | 0.55 | 0.89
MSI vs. MSS | ResNet50 | 0.71 | 0.54 | 0.82