*3.1. Pattern Recognition Algorithm*

To identify the damage mechanisms in the reactive specimens, a pattern recognition algorithm was employed for data classification. Pattern recognition is a branch of machine learning and has two major types: supervised and unsupervised. When background information about a data set is available or the data set has labeled classes, supervised pattern recognition is employed. If no labeled classes are available, unsupervised pattern recognition can be used to identify potential patterns in the data set based on the selected features.

An agglomerative hierarchical clustering algorithm [25] was employed to classify the AE data into subsets. The clustering procedure is illustrated step by step in Figure 3. The first step was to derive frequency-based features for the signals. Each AE signal was transformed into the frequency domain using the Fast Fourier Transform (FFT), its FFT amplitude spectrum was determined, and the frequency domain was divided into ten frequency ranges. The signal energy in each frequency range was then calculated and normalized to the total energy of the signal. These ten normalized energies are the signal features. For example, in Figure 3a, the area of the hatched region is the energy of the signal in the frequency band from 200 kHz to 250 kHz. This value is normalized to the total energy of the signal, which is the entire area under the FFT spectrum.

Principal component analysis (PCA) was used to reduce redundancy in the data. In this analysis, the original data are projected onto new orthogonal coordinates along which the variation is high. The input for PCA is a matrix whose number of rows equals the number of hits and whose number of columns equals the number of features. A variance-covariance matrix of the features was then calculated, and the coefficients and variance of each principal component were obtained by eigenvalue analysis of this matrix. The principal components were selected so that they represented more than 90% of the variance in the data. These principal components were used as the input features for the pattern recognition algorithm. The algorithm first calculated the Euclidean distances between the data points resulting from PCA. The result was a proximity matrix containing the pairwise distances between the original objects (data).
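The feature extraction, PCA, and proximity matrix steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names (`band_energy_features`, `pca_scores`, `proximity_matrix`), the sampling rate argument `fs`, and the choice of ten equal-width bands from 0 Hz up to `f_max` (the Nyquist frequency by default) are hypothetical. Following the description of Figure 3a, band energy is taken as the area under the FFT amplitude spectrum within the band.

```python
import numpy as np


def band_energy_features(signals, fs, n_bands=10, f_max=None):
    """Return an (n_hits, n_bands) array of normalized band energies.

    Assumption: equal-width bands between 0 Hz and f_max (Nyquist by default).
    """
    signals = np.atleast_2d(np.asarray(signals, dtype=float))
    amplitude = np.abs(np.fft.rfft(signals, axis=1))       # FFT amplitude spectra
    freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)  # bin frequencies in Hz
    f_max = freqs[-1] if f_max is None else f_max
    edges = np.linspace(0.0, f_max, n_bands + 1)
    energy = np.empty((signals.shape[0], n_bands))
    for k in range(n_bands):
        # The last band also includes its upper edge.
        below_top = (freqs <= edges[k + 1]) if k == n_bands - 1 else (freqs < edges[k + 1])
        in_band = (freqs >= edges[k]) & below_top
        # With uniform bin spacing, the area under the spectrum is proportional
        # to the sum of amplitudes; the bin width cancels in the normalization.
        energy[:, k] = amplitude[:, in_band].sum(axis=1)
    # Normalize to the total energy (assumes no all-zero signal).
    return energy / energy.sum(axis=1, keepdims=True)


def pca_scores(features, variance_kept=0.90):
    """Project the features onto the leading principal components.

    Keeps the smallest number of components whose cumulative explained
    variance exceeds variance_kept.
    """
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalue analysis
    order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.searchsorted(explained, variance_kept, side="right")) + 1
    n_keep = min(n_keep, eigvals.size)
    return centered @ eigvecs[:, :n_keep], explained[:n_keep]


def proximity_matrix(scores):
    """Pairwise Euclidean distances between rows of scores."""
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Here `signals` is assumed to hold one AE waveform per row, so the feature matrix has one row per hit and one column per frequency band, matching the PCA input layout described above. Because the ten features of each hit sum to one, the covariance matrix is rank deficient, which is one reason a reduced set of principal components can carry most of the variance.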

The objects were first linked according to the distances calculated in the previous step, using Ward's method. Ward's method is based on the total within-cluster sum of squares that results from combining clusters [25]; at each step, the pair of clusters whose merger gives the smallest increase in this quantity is combined. At each level, the data were merged through binary linkages, and the resulting clusters were in turn merged into new clusters according to Ward's method. This procedure continued until a single cluster containing all the data was formed. The number of clusters was determined from the resulting dendrogram, based on the height of each link relative to the adjacent links [26].
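The linkage step can be sketched with SciPy's hierarchical clustering routines. This is an illustrative sketch only: `scores` is assumed to be an array with one row per hit holding the retained principal-component scores, and the function names are hypothetical. The rule in `suggest_n_clusters` (cut the dendrogram at the largest jump between consecutive link heights) is a simple stand-in chosen for illustration and is not necessarily the criterion of [26].

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def ward_tree(scores):
    """Build the agglomerative linkage tree with Ward's method."""
    distances = pdist(scores, metric="euclidean")  # condensed proximity matrix
    # Each row of the result is one binary merge:
    # (cluster i, cluster j, link height, size of the new cluster).
    return linkage(distances, method="ward")


def suggest_n_clusters(tree, max_clusters=10):
    """Heuristic (assumption): cut at the largest gap between link heights.

    Requires at least three objects, so that there is at least one gap.
    """
    heights = tree[:, 2]
    n_points = heights.size + 1
    gaps = np.diff(heights)
    # Cutting between merge i and merge i + 1 leaves n_points - 1 - i clusters.
    candidates = np.arange(n_points - 1, 1, -1)
    allowed = candidates <= max_clusters
    best = int(np.argmax(np.where(allowed, gaps, -np.inf)))
    return int(candidates[best])


def cluster_labels(tree, n_clusters):
    """Cut the tree so that at most n_clusters flat clusters remain."""
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

A typical call sequence would be `tree = ward_tree(scores)`, then `k = suggest_n_clusters(tree)`, then `labels = cluster_labels(tree, k)`. In practice the dendrogram itself (for example via `scipy.cluster.hierarchy.dendrogram`) would be inspected before fixing the number of clusters.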

**Figure 3.** Clustering procedure. (**a**) Energy-frequency based feature extraction; (**b**) Flow chart of data clustering steps.
