
**Figure 5.** The first three rows show (1) the defect-free image segments with noise, (2) the PCA reconstruction, and (3) the residual between the raw and reconstructed image, respectively, for a range of imaging conditions. The bottom three rows show the same sequence of images, except that the raw image contains a randomly inserted circular defect. Notably, the PCA reconstruction does not accurately reproduce the defect since the PCA transformation was fitted only on images without defects.

The difference between an image segment and its reconstruction is referred to as the residual image. Intuitively, the residual image shows what remains when the general lattice structure is "subtracted" from the original image. Thus, the residual image consists of noise and any anomalies in the lattice structure. The reconstruction MSE can be regarded as a scalar that summarizes the residual image. For each of the 5600 images in the training set, we compute the reconstruction MSE with and without a circular defect to understand its distribution. Figure 6a shows how the presence of a defect changes the reconstruction MSE for each training example. In addition, Figure 6b shows how the addition of imaging noise affects the reconstruction MSE distribution with and without a defect. The concept of a residual image plays an important role in the CNN model that is presented in the next section.
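The relationship between reconstruction, residual image, and reconstruction MSE can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the toy data sizes, the mean-centering step, and the crude "defect" are all hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Fit PCA on the rows of X (one flattened image segment per row).

    Returns the training mean and the top-k principal directions (d x k).
    Mean-centering is an assumption of this sketch.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def residual_and_mse(x, mean, W_k):
    """Reconstruct a flattened segment x and summarize its residual."""
    scores = (x - mean) @ W_k          # project onto the k components
    x_hat = mean + scores @ W_k.T      # PCA reconstruction
    residual = x - x_hat               # what the fitted structure misses
    return residual, float(np.mean(residual ** 2))

# Toy data standing in for flattened defect-free segments (hypothetical sizes).
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 40))               # three latent "lattice" patterns
X_train = rng.normal(size=(200, 3)) @ basis    # defect-free training rows
mean, W_k = fit_pca(X_train, k=3)

clean = rng.normal(size=3) @ basis
defect = clean.copy()
defect[10:14] += 2.0                           # crude localized anomaly

_, mse_clean = residual_and_mse(clean, mean, W_k)
residual_defect, mse_defect = residual_and_mse(defect, mean, W_k)
```

In this noise-free toy setting the defect-free segment is reconstructed almost exactly, while the anomaly survives in the residual and raises the reconstruction MSE.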

After fitting the PCA transformation, we apply the resulting **Wk** to the test set images via a sliding window. Recall that each test set image is of dimension 1007 × 1024 and contains a single point defect with known location. We slide an 84 × 118 window across the 1007 × 1024 image and, for each window, complete the following three steps: (1) generate the PCA reconstruction, (2) generate the residual between the original image segment and the reconstruction, and (3) compute the pixel-wise mean squared error (MSE). We then generate a heatmap that shows the average reconstruction MSE for each pixel in the full-size TEM image. The predicted location of the defect corresponds to the area of the heatmap that has the largest reconstruction MSE. Figure 7 shows an example of a test image and the corresponding MSE heatmap. The defect in the test image is a substitutional defect where a single gallium atom is replaced with a dopant atom that has a 5% larger radius. The defect is difficult to identify visually, but the heatmap accurately locates it. This method is applied to all imaging conditions in the test set, and we evaluate the accuracy in predicting the location of each type of defect. Figure 3 summarizes the process for predicting defect location using PCA.
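The sliding-window averaging described above can be sketched as follows. The function name, the small image and window sizes, and the stand-in scoring function are assumptions made for illustration; in the actual pipeline the per-window score would be the PCA reconstruction MSE.

```python
import numpy as np

def sliding_window_heatmap(image, window_shape, stride, score_fn):
    """Average a per-window score over every window that covers each pixel."""
    H, W = image.shape
    h, w = window_shape
    total = np.zeros((H, W))
    count = np.zeros((H, W))
    for r in range(0, H - h + 1, stride):
        for c in range(0, W - w + 1, stride):
            score = score_fn(image[r:r + h, c:c + w])
            total[r:r + h, c:c + w] += score
            count[r:r + h, c:c + w] += 1
    # Pixels not covered by any window (possible near the edges) are set to 0.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

# Toy example: a flat image with one anomalous pixel (hypothetical sizes).
image = np.zeros((40, 48))
image[25, 30] = 5.0

# Stand-in score: mean squared pixel value of the window.
heat = sliding_window_heatmap(image, (8, 12), 4,
                              lambda seg: float(np.mean(seg ** 2)))
row, col = np.unravel_index(np.argmax(heat), heat.shape)
```

Because every window covering the anomalous pixel receives a high score, the averaged heatmap peaks in a small plateau around that pixel, and the arg-max falls within a few pixels of the true location.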

**Figure 6.** The scatterplot shows the reconstruction MSE of 5600 defect-free image segments in the PCA training set (x-axis) and the corresponding reconstruction MSE for the same image segment with a circular defect inserted (y-axis). Points that are close to the line *x* = *y* represent image segments where the reconstruction MSE does not differ much with or without a defect. The marginal plots show the distribution of reconstruction MSEs with and without defects. Left: without imaging noise. Right: with imaging noise.

**Figure 7.** Heatmap that shows the pixels with the largest average MSE based on the PCA reconstruction. Bright spots correspond to areas that are most likely to have a defect.

## *2.3. PCA-CNN Model*

In this section, we supplement the PCA-based detection method with a CNN classifier to improve the accuracy of the defect location predictions. This combined method significantly improves the prediction accuracy of the PCA model, especially in the case when there is imaging noise.

The PCA-based defect detection method has the benefit of being straightforward. However, in the presence of imaging noise, relying on the PCA reconstruction error can lead to issues. Figure 5 shows the PCA residual images of segments with and without defects. In these particular examples, the reconstruction MSE for the defect images is actually lower than the reconstruction MSE for the defect-free images. Yet, on visual inspection, the residual images clearly show the presence of a point defect. To address this shortcoming, we introduce a CNN classification model fitted on the PCA residual images. Intuitively, the reconstruction MSE is proportional to the sum of the squared values in the residual image, so it ignores any local patterns in the residual image. A CNN, on the other hand, can be trained to look for the presence of local patterns in the residual image that may be evidence of a defect. To the best of our knowledge, the use of the residual image for defect detection is a novel approach.
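The point that the MSE discards spatial information can be illustrated with a small experiment: shuffling the pixels of a residual image leaves its MSE unchanged, even though any compact, defect-like pattern is destroyed. The sketch below uses synthetic values chosen for illustration only; the noise scale, blob size, and the local-mean statistic are assumptions and are not taken from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
residual = rng.normal(scale=0.1, size=(84, 118))   # noise-like residual
residual[40:44, 60:64] += 1.0                      # compact, defect-like blob

# Move the same pixel values to random positions.
scrambled = rng.permutation(residual.ravel()).reshape(residual.shape)

mse_original = np.mean(residual ** 2)
mse_scrambled = np.mean(scrambled ** 2)

def max_local_mean(img, size=4):
    """Largest mean over all size x size patches: a simple local statistic."""
    best = -np.inf
    for i in range(img.shape[0] - size + 1):
        for j in range(img.shape[1] - size + 1):
            best = max(best, img[i:i + size, j:j + size].mean())
    return best

local_original = max_local_mean(residual)
local_scrambled = max_local_mean(scrambled)
```

The two MSE values agree up to floating-point rounding, whereas the local statistic is much larger for the unscrambled residual. A CNN can learn filters that respond to this kind of local structure, which a global scalar such as the MSE cannot see.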

A CNN is a type of neural network that is commonly used for analyzing image data ([4]). The key concept in a CNN involves the use of small filters or kernels to extract local information from an image. A filter, often of dimension 3 × 3, is a matrix consisting of weights. The filter is applied to an image by sliding the filter across the image and taking the sum of the element-wise product between the filter weights and the image pixel values. The sums of these element-wise products are then stored in a new matrix, commonly referred to as a feature map, which can once again be analyzed using another set of filters. The training process of a CNN involves optimizing the filter weights to minimize a given loss function. In our application, our goal is to train a CNN to identify the presence of a defect within a residual image.
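The filter operation described above can be written out directly. The following is a minimal sketch with stride 1 and no padding; the function name, the toy image, and the fixed averaging kernel are illustrative assumptions (in a trained CNN the kernel weights are learned rather than fixed).

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide kernel over image (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Sum of the element-wise product between filter weights and pixels.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0               # small bright blob
kernel = np.ones((3, 3)) / 9.0      # fixed 3 x 3 averaging filter
fmap = apply_filter(image, kernel)  # 4 x 4 feature map
```

The feature map is largest where the 3 × 3 window overlaps the whole blob, which is how a filter localizes a pattern within an image.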

The training data for the CNN model begins with the same set of defect-free training images used to fit the PCA. Recall that 50 random crops from each of the 112 training images were used to fit the PCA. These same 5600 images are used to build a set of labeled training data for the CNN classifier. Since the training data only includes image segments that are defect-free, a set of labeled training data with defects is generated by adding random, circular defects to each of the 5600 training images. These circular defects could be representative of an interstitial defect or a vacancy, but they are not necessarily meant to represent a realistic defect that would be observed in an experimental image. Instead, the hope is that the CNN will learn to classify any residual image with an abnormal local pattern as one containing a defect. Since the circular defects are arbitrary and are added post hoc to the simulated image, this method can easily be applied to experimental TEM images as well. After generating the labeled training data, a CNN classification model is trained such that, for an input PCA residual image, the model outputs a scalar *ŷ* = P(defect), where P(defect) ∈ [0, 1] is the probability that the image segment contains a defect. A summary of the CNN model development process is visualized in Figure 3.
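A rough sketch of how labeled pairs might be generated from defect-free segments is given below. The function names, the defect radius and intensity, and the random stand-in segments are hypothetical; the exact form of the circular defects used in this study is not reproduced here. In the full pipeline, each image would additionally be converted to a PCA residual image before being passed to the CNN.

```python
import numpy as np

def add_circular_defect(segment, rng, radius=4, intensity=1.0):
    """Return a copy of segment with a filled disc added at a random location.

    radius and intensity are illustrative values, not taken from the paper.
    """
    h, w = segment.shape
    cy = rng.integers(radius, h - radius)
    cx = rng.integers(radius, w - radius)
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    out = segment.copy()
    out[mask] += intensity
    return out, (int(cy), int(cx))

def make_labeled_batch(segments, rng):
    """Pair each defect-free segment (label 0) with a defective copy (label 1)."""
    defective = np.stack([add_circular_defect(s, rng)[0] for s in segments])
    images = np.concatenate([segments, defective])
    labels = np.concatenate([np.zeros(len(segments)), np.ones(len(defective))])
    return images, labels

rng = np.random.default_rng(0)
segments = rng.normal(size=(5, 84, 118))   # stand-in for defect-free crops
images, labels = make_labeled_batch(segments, rng)
```

Calling `make_labeled_batch` anew for every batch draws fresh defect locations each time, so no defect position is repeated systematically during training.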

Our primary CNN architecture is adapted from the classic LeNet-5 architecture [34] and has 58,000 trainable parameters. Figure 8 shows the details of each layer of the CNN. It contains four convolutional layers with max-pooling followed by two dense layers. The model uses a binary cross-entropy loss function and is optimized using Nadam. The model is trained for 200 epochs. Importantly, the training data are generated randomly for each batch, so the locations of the circular defects and the noise patterns in the training set are randomized during training. The CNN is trained using Python 3.7 and Keras 2.3 with a TensorFlow 2.4.1 backend. The model achieves >99% training accuracy and test accuracy in less than 100 epochs. At the completion of 200 epochs, the test accuracy is 99.8% (Figure 9). Since the test set images are generated using a separate set of imaging conditions (focal length and thickness), the strong performance on the test set suggests that the trained CNN generalizes well to imaging conditions that were not included in the training set.
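For reference, the standard form of the binary cross-entropy loss over a batch is written out below. The symbols *N* (batch size), *y*<sub>i</sub> (the 0/1 label of the *i*-th residual image), and *ŷ*<sub>i</sub> (the predicted P(defect) for that image) are notation introduced here for illustration.

```latex
\mathcal{L}_{\mathrm{BCE}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \Bigl[\, y_i \log \hat{y}_i + (1 - y_i) \log\bigl(1 - \hat{y}_i\bigr) \Bigr],
\qquad y_i \in \{0, 1\}, \quad \hat{y}_i \in (0, 1).
```

The loss is small when the predicted probability is close to 1 for residual images that contain a defect and close to 0 for those that do not.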

In addition to the LeNet-based architecture, a VGG-16 architecture [35] was also implemented for comparison. The VGG-16 model was pretrained on ImageNet and the top dense layers were retrained using the TEM images. This resulted in 14.7 million fixed parameters and 3.2 million trainable parameters. After training for 100 epochs, the VGG-16 model achieved an accuracy of 98.2%. Given the much smaller size of the LeNet-based model and its better test set performance, the LeNet-based model was chosen as the preferred model.


**Figure 8.** CNN architecture motivated by LeNet-5.

**Figure 9.** Train and test set accuracy during the CNN training process. The input to the CNN model is a residual image of dimension 84 × 118 and the output is the probability that the image contains a defect, P(defect).

After training the CNN, an 84 × 118 sliding window is applied to each of the 168 test images that are 1007 × 1024 with one hidden point defect. Using a stride of four pixels, this process results in approximately 50,000 image segments that must be classified as having a defect or not. For each 84 × 118 window, we apply the following three steps: (1) generate a PCA reconstruction, (2) generate a residual image between the original image segment and the PCA reconstruction, and (3) pass the residual image into the trained CNN to generate P(defect). For each pixel in the 1007 × 1024 test image, we compute the average P(defect) for all sliding windows that contain the pixel. This results in a smoothed heatmap for the entire test image. The location of the defect is then predicted to be the area of the heatmap that has the highest average P(defect). The heatmap shown earlier in Figure 3 is an example of a heatmap generated using the CNN classification model with a sliding window.
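The number of image segments produced by the sliding window follows from simple arithmetic. The short check below assumes that no padding is used and that the 84-pixel side of the window runs along the 1007-pixel axis; both are assumptions of this sketch, and the helper name is hypothetical.

```python
def num_windows(image_len, window_len, stride):
    """Number of window positions along one axis when no padding is used."""
    return (image_len - window_len) // stride + 1

# Assumption: the 84-pixel window side runs along the 1007-pixel image axis.
rows = num_windows(1007, 84, 4)    # 231 positions
cols = num_windows(1024, 118, 4)   # 227 positions
total = rows * cols                # 52,437 windows per test image
```

Under these assumptions each test image yields roughly 52,000 windows, in line with the order of magnitude stated above.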

In many applications of CNNs for anomaly detection, the output of the CNN classifier, P(defect), is compared to a fixed threshold value to determine if a particular input contains an anomaly or not [19]. Note that a threshold is not necessary here since the predicted defect location is simply the pixel with the largest average P(defect). If we generalize to the case where there are *n* defects in a GaAs sample, then the locations corresponding to the *n* largest average P(defect) would be the predicted locations of the defects.
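One possible way to extract *n* locations from a heatmap is sketched below. Because the heatmap is smoothed, the *n* largest pixel values would typically all lie around a single defect, so this sketch greedily masks a neighborhood around each selected peak. The masking step, the `min_separation` parameter, and the function name are assumptions introduced for illustration and are not part of the method described in this paper.

```python
import numpy as np

def top_n_locations(heatmap, n, min_separation=20):
    """Greedily pick n peaks, masking a square neighborhood around each pick."""
    work = heatmap.astype(float).copy()
    picks = []
    for _ in range(n):
        r, c = np.unravel_index(np.argmax(work), work.shape)
        picks.append((int(r), int(c)))
        r0 = max(r - min_separation, 0)
        c0 = max(c - min_separation, 0)
        # Exclude the neighborhood of this peak from later picks.
        work[r0:r + min_separation + 1, c0:c + min_separation + 1] = -np.inf
    return picks

# Toy heatmap with two well-separated peaks (hypothetical values).
heat = np.zeros((100, 100))
heat[20, 30] = 0.9
heat[21, 30] = 0.85   # belongs to the same peak region as (20, 30)
heat[70, 80] = 0.8
picks = top_n_locations(heat, n=2)   # [(20, 30), (70, 80)]
```

With *n* = 1 this reduces to the arg-max rule used for the single-defect test images.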

## **3. Results**

In this section, we compare the performance of the two defect detection methods discussed above. Recall that there are 56 imaging conditions that were reserved for the test set and there are three defect types. For each combination of imaging condition and defect type, we generate 10 simulated TEM images, each of dimension 1007 × 1024, where the defect location is randomized. This results in 1680 test images where the defect location is known. For each of the 1680 test images (560 images for each of the three defect types), we apply the PCA and PCA-CNN defect detection methods to predict the location of the defect. We compare the predicted defect location to the true defect location to determine whether the model successfully located the defect.

Table 1 shows the accuracy of both methods in predicting the defect location for various levels of imaging noise. The PCA defect detection method performs particularly well in the case of no imaging noise. It locates all three defect types with roughly 97% accuracy or better and generally outperforms the CNN model. However, as the imaging noise increases, we observe the superior performance of the CNN model. Specifically, when imaging noise rises to *σ*<sup>2</sup> = 0.10, the PCA model achieves an accuracy of 56% and 57% on antisite and circular defects, respectively, while the CNN model achieves 75% and 93% accuracy.

**Table 1.** Accuracy of the PCA and PCA-CNN models in locating point defects in the test set images. Table (a) shows the accuracy results when including all images in the test set. Table (b) shows the accuracy results when only the nominal defocus conditions are included. In both cases, the CNN model is more robust to imaging noise.


The results in Table 1a report the performance of the two methods under all test imaging conditions. Recall that the test set includes an equal number of TEM images for a range of defocus conditions. In practice, extreme defocus conditions are relatively uncommon and are actively avoided. Narrowing the focus to the central range of defocus conditions, {−6 nm, 0 nm, +6 nm}, provides a better representation of expected performance on experimental images. Table 1b shows the defect location accuracy of both methods under nominal defocus conditions. Under the restricted set of defocus conditions, the CNN model remains more robust in the presence of imaging noise. Specifically, when *σ*<sup>2</sup> = 0.10, the CNN model achieves 89% and 91% accuracy for antisite and circular defects, respectively, while the PCA model achieves 70% and 61% accuracy.

Based on these preliminary results, it appears that the substitution defects are more challenging to identify than the antisite and circular defects. This is unsurprising given that the substitution defects are also the most challenging to identify by visual inspection. The substitution defects were purposely subtle so as to determine the effectiveness of the proposed methods for a wide range of defects. In practice, the substitution defects are unlikely to sit precisely in a gallium or arsenic site. If the substitution defect is slightly misaligned, then it is likely that the proposed methods would be more effective in locating the defect. The antisite and random circular defects are more readily identified visually, which is reflected in the accuracy results. Although the circular defect is not meant to model one particular defect type, it could be representative of an interstitial defect or a vacancy.

## **4. Discussion**

In this paper, we introduce two methods for determining the location of a point defect in a TEM image of GaAs. Compared to recent applications of CNNs for defect detection ([1,2], and references therein), the proposed PCA and PCA-CNN methods of defect detection are unique in that they can be trained on TEM images that are defect-free. Unlike prior approaches to defect detection, this opens the door to training these models using experimental data. After training both models using a set of simulated images that are free of defects, we demonstrate the performance of both methods in locating a simulated defect in an HRTEM image. In the case of no imaging noise, we show the PCA method is sensitive to minor defects such as a subtle substitution defect (97% accuracy). However, as imaging noise is introduced, the performance of the PCA method declines rapidly. Supplementing the PCA method with a CNN classification model improves the performance of the model dramatically. The CNN classification model achieves >89% accuracy for both antisite and circular defects at the highest level of imaging noise (*σ*<sup>2</sup> = 0.10). These results suggest that the CNN approach has the potential to be highly effective in analyzing experimental images.

Our PCA-CNN classification model is unique in that it is trained on PCA residual images. Using the PCA reconstruction to generate a residual image is a novel approach that has notable benefits. One of the benefits is that it allows for a single pre-trained CNN to be used for a wide range of imaging conditions. This is in contrast to prior studies that rely on condition-specific models for defect detection. Imaging conditions, such as thickness and defocus condition, change the overall "pattern" that is visible in a TEM image. By taking the difference between an image segment and its reconstruction, we are, intuitively, "subtracting" the pattern that is associated with a set of imaging conditions. The residual images are then less correlated with the imaging conditions used to generate the TEM image and can be analyzed using a single pre-trained CNN. Another benefit is that using the residual images allows a CNN to more effectively classify defects. Specifically, when we trained a CNN classification model directly on image segments in the training set without using residual images, the trained model far underperformed our model that uses residual images. This suggests that the use of residual images is a key step in training an effective CNN classification model in the context of TEM images.

The results presented in this paper are based on simulated TEM images. However, the goal is to implement and adapt these methods for experimental images as they become available. We observe that experimental images pose unique challenges compared to simulated images. In the simulated TEM images, the imaging conditions and the imaging noise were assumed to be consistent across the entire image. In contrast, the thickness of a sample can vary in an experimental image and the imaging noise is unlikely to be consistent across an entire image. While additional steps will be necessary to account for these variations, we believe the key ideas of using PCA reconstructions and residual images will remain an integral part of analyzing defects in experimental TEM images.

## **5. Conclusions**

In this paper, we propose an anomaly detection method for locating point defects in crystalline materials using TEM images. The proposed method involves using a PCA reconstruction to generate a residual image and then a self-supervised CNN classifier to detect the presence of an anomaly in the residual image. Unlike earlier works that rely on extensive pixel-by-pixel labeled training data via simulation ([1,2]), our proposed method is a self-supervised method that only requires defect-free TEM images in the training set. Since the method only requires defect-free TEM images, it allows for the possibility of training a defect detection model directly on experimental TEM images that are defect-free. Additionally, our novel use of a residual image allows for strong results using a simple, computationally efficient CNN architecture that generalizes well to imaging conditions that are not included in the training set. Using simulated TEM images with a single point defect, we show that our PCA-CNN method is able to accurately locate point defects and it outperforms reconstruction error-based methods, particularly in the case when there is significant imaging noise.

**Author Contributions:** Conceptualization, P.C., A.W., K.M. and K.E.; methodology, P.C. and A.W.; software, P.C. and K.M.; investigation, P.C., A.W. and K.M.; formal analysis, P.C. and A.W.; investigation, P.C. and A.W.; writing—original draft preparation, P.C.; writing—review and editing, P.C., A.W., K.M. and K.E.; visualization, P.C.; supervision, A.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The simulated TEM images used in this study are being reviewed for public release by AFRL and are not currently available for public release. Individual data requests can be submitted to the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.
