#### 3.6.1. Linear Discriminant Analysis

Linear discriminant analysis (LDA) [26] is a supervised dimensionality reduction technique that is widely used for classifying high-dimensional data. It aims to maximize class separability by finding a set of optimal discriminant vectors that maximize the between-class scatter while minimizing the within-class scatter. In this study, the eggs were divided into intact eggs and cracked eggs, which makes this a two-class classification problem. Let *C* be the number of categories (here, *C* = 2), *x* the n-dimensional feature vector of a training sample, and *N* the number of samples. The within-class scatter matrix *S<sub>W</sub>* and the between-class scatter matrix *S<sub>B</sub>* are given in Equations (14) and (15), respectively:

$$S_W = \frac{1}{N} \sum_{i=1}^{C} \sum_{x \in c_i} (x - \mu_i)(x - \mu_i)^T \tag{14}$$

$$S_B = \sum_{i=1}^{C} p_i(\mu_i - \mu)(\mu_i - \mu)^T \tag{15}$$

where *p<sub>i</sub>* = *N<sub>i</sub>*/*N* is the prior probability of class *c<sub>i</sub>*, *N<sub>i</sub>* is the number of training samples in class *c<sub>i</sub>* (*i* = 1, 2, ..., *C*), *μ<sub>i</sub>* is the mean of the samples in class *c<sub>i</sub>*, and *μ* is the mean of all samples.

The goal of LDA is to find the projection matrix *W<sub>opt</sub>* that maximizes the Fisher criterion *J*(*W*):

$$W_{opt} = \underset{W}{\arg\max}\, J(W) = \underset{W}{\arg\max} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|} \tag{16}$$
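For the two-class case considered here, the maximizer of the Fisher criterion reduces to a single direction proportional to the inverse within-class scatter matrix applied to the difference of the class means. The following is a minimal sketch of that special case using NumPy. The toy feature values, the label coding (0 = intact, 1 = cracked), and the function names are illustrative assumptions and are not taken from the experiments.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_W, S_B) following Equations (14) and (15)."""
    N, n_features = X.shape
    mu = X.mean(axis=0)                      # mean of all samples
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_i = X[y == label]
        mu_i = X_i.mean(axis=0)              # class mean
        centered = X_i - mu_i
        S_W += centered.T @ centered / N
        d = (mu_i - mu).reshape(-1, 1)
        S_B += (len(X_i) / N) * (d @ d.T)    # p_i = N_i / N
    return S_W, S_B

def fisher_direction(X, y):
    """Two-class case: w is proportional to S_W^{-1} (mu_1 - mu_0)."""
    S_W, _ = scatter_matrices(X, y)
    mu_0 = X[y == 0].mean(axis=0)
    mu_1 = X[y == 1].mean(axis=0)
    w = np.linalg.solve(S_W, mu_1 - mu_0)
    return w / np.linalg.norm(w)

# Toy data (made up): two features per sample, 0 = intact, 1 = cracked.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [4.0, 4.5], [4.2, 4.0], [3.8, 4.4]])
y = np.array([0, 0, 0, 1, 1, 1])

w = fisher_direction(X, y)
projections = X @ w   # one-dimensional discriminant scores
```

On this toy data the projected scores of the two classes do not overlap, so a single threshold on `projections` separates them.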

#### 3.6.2. K-Means Classification Algorithm

K-means [27] is a common unsupervised learning algorithm that is often used to discover the inherent regularities within a dataset. First, K samples are randomly selected as the cluster centers of K categories. The Euclidean distance between each sample and every centroid is then calculated, and the sample is assigned to the category whose centroid is nearest. The centroids are recalculated after the samples have been assigned, and the two steps are repeated until the clustering converges or the preset number of iterations is reached. The Euclidean distance between samples is

$$D(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\sum_{n=1}^{N} (x_{i,n} - x_{j,n})^2} \tag{17}$$

where *D*(*x<sub>i</sub>*, *x<sub>j</sub>*) is the Euclidean distance between the i-th sample *x<sub>i</sub>* and the j-th sample *x<sub>j</sub>*, and *N* is the dimension of the sample data. If the samples are to be grouped into K categories, *C<sub>k</sub>* denotes the k-th cluster center, where *k* = 1, 2, ..., *K*. First, K points in the sample are selected as centroids. Each remaining point is then assigned to the nearest cluster center, which divides the samples into K sets, denoted by 𝒞<sub>k</sub>. Finally, each cluster center is recalculated as the mean of its set:

$$C_k = \frac{1}{m_k} \sum_{\mathbf{x} \in \mathcal{C}_k} \mathbf{x} \tag{18}$$

where *m<sub>k</sub>* is the number of samples in the k-th category. The K-means algorithm alternates between reassigning the samples and updating the cluster centers, and it stops when the maximum number of iterations is reached or the objective function falls below a preset threshold. The objective function is

$$J = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in \mathcal{C}_k} D(\mathbf{x}_i, C_k) \tag{19}$$
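The assignment and update steps of Equations (17) to (19) can be sketched as follows. This is a minimal illustration with NumPy rather than the implementation used in the experiments. The stopping rule based on how far the centers move (`tol`), the random seed, and the toy data are assumptions made for this example.

```python
import numpy as np

def distances_to_centers(X, centers):
    """Euclidean distance (Equation (17)) from every sample to every center."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def k_means(X, K, max_iter=100, tol=1e-6, seed=0):
    """Cluster the rows of X into K groups; returns (labels, centers, objective)."""
    rng = np.random.default_rng(seed)
    # K distinct samples are picked at random as the initial cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = distances_to_centers(X, centers).argmin(axis=1)
        # Equation (18): each center becomes the mean of its assigned samples.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:   # assumed stopping rule: centers no longer move
            break
    dists = distances_to_centers(X, centers)
    labels = dists.argmin(axis=1)
    objective = dists.min(axis=1).sum()   # Equation (19)
    return labels, centers, objective

# Toy data (made up): two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers, objective = k_means(X, K=2)
```

Because K-means is unsupervised, the cluster indices are arbitrary. Mapping a cluster to "intact" or "cracked" requires a separate step, for example a comparison with a few labeled samples.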

#### 3.6.3. SVM

A support vector machine (SVM) is a classifier based on statistical learning theory that can handle both linear and nonlinear problems. It performs well on a range of challenging practical problems, especially with small-sample data [28,29]. The basic idea of SVM is to learn from the training set the optimal hyperplane separating the two classes, that is, the hyperplane that maximizes the margin to the nearest data points.

The current signals obtained in this paper were not linearly separable, so it was necessary to first select an appropriate kernel function to map them to a high-dimensional space and then perform the optimization there. There is still no generally accepted criterion for selecting the kernel function. Commonly used kernel functions include the Gaussian, polynomial, linear, and sigmoid kernels. Owing to its few parameters and fast convergence, the Gaussian kernel function was used in this paper. Its mathematical definition is shown in Equation (20) [30]:

$$K(x,y) = e^{-\frac{\|x-y\|^2}{2\sigma^2}}\tag{20}$$

where *x* and *y* are feature vectors of the current signal and *σ* is the kernel width parameter.
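As a small illustration of Equation (20), the kernel value can be computed directly from two vectors. The sketch below uses only the Python standard library. The default `sigma` and the example vectors are arbitrary assumptions, since the kernel width used in the experiments is not stated here.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel of Equation (20): exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Made-up feature vectors for illustration.
u = [0.0, 0.0]
v = [1.0, 1.0]
k_same = gaussian_kernel(u, u)   # identical vectors give the maximum value
k_diff = gaussian_kernel(u, v)   # the value decays as the vectors move apart
```

A larger `sigma` makes the kernel decay more slowly with distance, which tends to give smoother decision boundaries.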

#### 3.6.4. CART Decision Tree

A decision tree [31] is a supervised machine learning algorithm that can be used to classify or predict unknown objects. A decision tree is constructed by top-down, recursive branching: the most effective split of the samples is selected according to the features, a new decision branch is formed, and the branches are then pruned to optimize the tree. Commonly used decision tree generation algorithms include ID3, C4.5, and CART. We employed the CART model in this study and used the Gini index to select the optimal split points of the optimal features. CART builds a binary tree by recursively analyzing the training dataset, and at each node it selects the attribute that minimizes the Gini index of the child nodes as the splitting rule.
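The split selection described above can be illustrated for a single feature. The sketch below computes the Gini index of a label set and scans candidate thresholds for the one that minimizes the size-weighted Gini index of the two child nodes. It is a simplified, standard-library illustration of one CART split, not a full tree builder, and the feature values and labels are made up.

```python
from collections import Counter

def gini(labels):
    """Gini index of a label list: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up values of one feature and the corresponding class labels.
values = [0.2, 0.3, 0.4, 1.1, 1.3, 1.5]
labels = ["intact", "intact", "intact", "cracked", "cracked", "cracked"]
threshold, score = best_threshold(values, labels)
```

A full CART tree repeats this search over all features at every node and recurses into the two child nodes until a stopping or pruning rule applies.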

#### 3.6.5. Random Forest

A random forest [32] uses decision trees as base classifiers. It mitigates the overfitting problem of a single decision tree by combining bagging ensemble learning with the random subspace method. The random forest generates the training data of each tree by random sampling from the original dataset. When a tree is built, n features are randomly drawn from the N feature variables, and the best of these n features is selected as the split feature. Finally, each decision tree gives a class prediction, and the class with the most votes becomes the model's prediction.
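The three ingredients named above (random sampling of training data, random selection of n out of N features, and majority voting) can be sketched with the standard library alone. The helper names are hypothetical, sampling with replacement is assumed as in standard bagging, and the per-tree predictions are made up, so this shows the mechanics rather than a trained forest.

```python
import random
from collections import Counter

def bootstrap_sample(n_samples, rng):
    """Indices drawn with replacement (assumed, as in standard bagging)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, n_selected, rng):
    """Pick n_selected distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), n_selected))

def majority_vote(tree_predictions):
    """Class predicted by most trees (ties go to the class seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)                          # fixed seed for repeatability
sample_indices = bootstrap_sample(10, rng)      # training rows for one tree
feature_indices = random_feature_subset(9, 3, rng)

# Made-up predictions of five trees for a single egg.
votes = ["cracked", "intact", "cracked", "cracked", "intact"]
forest_prediction = majority_vote(votes)
```

Each tree would be trained on its own `sample_indices` and restricted to its own `feature_indices`, which decorrelates the trees and is what reduces overfitting relative to a single tree.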

#### **4. Experiments and Results**

#### *4.1. Data Acquisition*

We purchased 770 eggs at a farmer's market near the laboratory and collected current signals for model training and algorithm verification, including 367 intact eggs and 403 cracked eggs. Stains on the eggshells can introduce noise that affects the experiment, so the cleaning and drying process of an actual egg factory was simulated before data acquisition. Small-scale experiments on the impact of cleaning showed that it removed stains from the eggshell surface and reduced their interference with current signal acquisition. In addition, the water used in cleaning could wet crack gaps that had been blocked for a long time, which improved the conductivity at the cracks.

At the initial stage of data acquisition, each egg was used for only one current signal measurement, which wasted many eggs. To improve the utilization of the sample eggs and the efficiency of data acquisition, eggs that were detected to be intact were reused as cracked eggs after being slightly cracked by our crack striking machine. The physical and experimental parameters of the tested eggs are shown in Table 3.


**Table 3.** Physical and experimental parameters of tested eggs.

#### *4.2. Extraction of Data Features*

As shown in Figure 12, the current signals of eggs of different sizes (three small and three large) fluctuated significantly. The current signals collected in the experiment were mixed with noise and easily affected by the environment, which reduces classification accuracy. Therefore, we used six common time domain features, three frequency domain features, and wavelet packet coefficients to extract stable and comprehensive feature information from the current signals for the classification models. The six time domain features were the weighted mean, average, standard deviation, range, skewness, and kurtosis, and their expressions are listed in Table 4, where *x<sub>i</sub>* (*i* = 1, 2, ..., *N*) is the current data, *N* is the length of the data, and *w* is the weighting coefficient. The three frequency domain features were the frequency of the center of gravity, the root mean square frequency, and the standard deviation of the frequency, and their expressions are given in Table 5, where *f* is the frequency value and *P*(*f*) is the power spectrum.
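To make the six time domain features concrete, the sketch below computes them with commonly used textbook definitions. These definitions are assumptions: the population standard deviation, the non-excess kurtosis, and a weighted mean normalized by the sum of the weights may differ in detail from the expressions in Table 4. The weights `w` and the sample values are made up.

```python
import math

def time_domain_features(x, w=None):
    """Six time domain features of a signal x (standard definitions assumed)."""
    n = len(x)
    if w is None:
        w = [1.0] * n   # assumption: uniform weights when none are given
    mean = sum(x) / n
    weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)   # population std
    value_range = max(x) - min(x)
    if std == 0.0:
        skewness = kurtosis = 0.0   # constant signal: moments are undefined
    else:
        skewness = sum((xi - mean) ** 3 for xi in x) / (n * std ** 3)
        kurtosis = sum((xi - mean) ** 4 for xi in x) / (n * std ** 4)
    return {
        "weighted_mean": weighted_mean,
        "mean": mean,
        "std": std,
        "range": value_range,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Made-up current samples for illustration.
features = time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Skewness and kurtosis are the features most sensitive to an isolated peak in the waveform, because they weight large deviations from the mean by the third and fourth power.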

**Figure 12.** The effect of egg size on current signal.

**Table 4.** Time domain features.


**Table 5.** Frequency domain features.
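The three frequency domain features can be sketched from the power spectrum *P*(*f*) as power-weighted moments of the frequency. The definitions below are the commonly used ones and are assumed rather than copied from Table 5. The sampling rate `fs`, the removal of the signal mean before the transform, and the test signal are likewise assumptions made for this illustration.

```python
import numpy as np

def frequency_domain_features(x, fs):
    """Return (center-of-gravity frequency, RMS frequency, frequency std)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x - x.mean())        # mean removed so DC does not dominate
    P = np.abs(spectrum) ** 2                   # power spectrum P(f)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)     # frequency of each bin in Hz
    total = P.sum()
    fc = (f * P).sum() / total                          # center of gravity
    rmsf = np.sqrt((f ** 2 * P).sum() / total)          # root mean square frequency
    std_f = np.sqrt(((f - fc) ** 2 * P).sum() / total)  # spread around fc
    return fc, rmsf, std_f

# Made-up test signal: a pure 50 Hz sine sampled at 1000 Hz for one second.
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2.0 * np.pi * 50.0 * t)
fc, rmsf, std_f = frequency_domain_features(x, fs)
```

For this single-tone signal the center of gravity and the root mean square frequency both sit at about 50 Hz and the spread is close to zero. With these definitions the three values always satisfy rmsf² = fc² + std_f².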


#### *4.3. Analysis of the Results*

In the process of acquiring an egg's current signal, various discharge phenomena occur, such as corona discharge, small air gap breakdown, and creeping discharge, which introduce a lot of noise into the current signal. Because the current signal at a microcrack is relatively weak, it can be submerged in this noise. To solve this, wavelet threshold denoising was adopted to remove the high-frequency noise while retaining the useful signal components. The principle is that, owing to the continuity of the real signal *f*(*t*), the wavelet coefficients it generates at different scales after the discrete wavelet transform are large, while those produced by the noise signal *e*(*t*) are small. Therefore, noise can be effectively suppressed by first thresholding the high-frequency wavelet coefficients with appropriate thresholds at each scale and then performing an inverse wavelet transform. The choice of wavelet base strongly affects the result of wavelet threshold denoising. By analyzing the shape of the current signal at the crack position, the sym2 wavelet base was finally selected. Its relatively good symmetry can, to a certain extent, reduce the phase distortion when analyzing and reconstructing the signal.
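The transform, threshold, and inverse transform steps described above can be sketched in a few lines. To keep the example self-contained, it uses a one-level Haar wavelet with soft thresholding instead of the sym2 wavelet used in this study, so it illustrates the principle only. The threshold values and the sample signal are made up.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def haar_dwt(signal):
    """One-level Haar transform; the signal length must be even."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) * SCALE, (a - d) * SCALE])
    return signal

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def denoise(signal, threshold):
    """Threshold the high-frequency (detail) coefficients, then reconstruct."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))

# Made-up signal: two levels with small sample-to-sample fluctuations.
noisy = [1.0, 1.2, 5.0, 4.8]
unchanged = denoise(noisy, threshold=0.0)   # no thresholding: exact reconstruction
smoothed = denoise(noisy, threshold=1.0)    # small details removed
```

With a threshold larger than every detail coefficient, each pair of samples is replaced by its average, which removes the fast fluctuation while keeping the slow level change. A multi-level transform applies the same idea at several scales.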

The current signals of two intact eggs and two cracked eggs were randomly selected from the dataset, as shown in Figure 13, where blue represents the signal before denoising and red represents the signal after denoising. The following can be observed from Figure 13: (1) The current signal of the cracked eggs had an evident peak within one cycle, while that of the intact eggs did not. As mentioned in Section 2, when the experimental voltage is lower than the breakdown voltage, the change in the current curve is dominated by the capacitance jump during rotation. The experimental voltage in this paper was higher than the breakdown voltage, so the change in the current curve was dominated by the electrical breakdown at the crack. When the crack was small, however, the experimental voltage may not have reached the breakdown voltage, and the change in the current curve may still have been dominated by a capacitance jump. We also designed a circuit protection function that automatically cut off the circuit to protect the equipment and eggs when the current exceeded the set threshold. (2) The current curve of the intact eggs was relatively smooth because their capacitance changed only slightly. However, the waveforms of the two intact eggs were not identical and could differ considerably, which may be related to differences in eggshell roughness.

**Figure 13.** Egg current waveform. (**a**,**b**) Waveforms of intact eggs. (**c**,**d**) Waveforms of cracked eggs.

After the wavelet threshold denoising, the time domain, frequency domain, and wavelet packet coefficient features of the current signal were extracted. Figures 14 and 15 show that most of the features differed clearly between the intact eggs and the cracked eggs, although some did not.

We fed the time domain, frequency domain, and wavelet packet coefficient features into the SVM model. The experimental results showed that the recognition rate differed from feature to feature and that the eggs misclassified by different features were also not the same. This indicates that features from different domains had different classification effects. Therefore, this paper used multi-domain features to fully reflect the inherent characteristics of the original current signal and thereby improve the detection accuracy.

**Figure 14.** Feature distribution diagram. (**a**) Time domain features. (**b**) Wavelet domain features.

**Figure 15.** Three-dimensional distribution diagram of frequency domain features.

Finally, we applied the machine learning methods described in Section 3.6, such as K-means clustering, linear discriminant analysis, and a support vector machine, for pattern classification, and calculated performance measures such as accuracy, precision, and recall from the testing data. The experimental results are shown in Table 6.
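For reference, accuracy, precision, and recall follow from the counts of correct and incorrect predictions. The sketch below assumes that "cracked" is treated as the positive class, which is not stated explicitly here, and the label lists are made up rather than taken from Table 6.

```python
def classification_metrics(y_true, y_pred, positive="cracked"):
    """Accuracy, precision, and recall with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Made-up ground truth and predictions for eight eggs.
y_true = ["cracked", "cracked", "cracked", "intact",
          "intact", "intact", "cracked", "intact"]
y_pred = ["cracked", "cracked", "intact", "intact",
          "intact", "cracked", "cracked", "intact"]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
```

In this setting, recall measures the share of cracked eggs that are caught, while precision measures the share of eggs flagged as cracked that really are cracked.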


**Table 6.** Combination feature classification effect in time domain, frequency domain, and wavelet domain.

The following conclusions can be drawn from the experimental results:


#### **5. Discussion**

This paper studied the electric field characteristics of eggs under the action of electrodes on the basis of the physical properties of the eggshell and established two discharge models. High-precision detection of eggshell cracks was achieved by designing an egg crack detection platform, analyzing the current signal, and comparing machine learning classification algorithms. The main contribution of this study is a novel method for crack detection in eggshells based on discharge analysis. Vision-based methods place high demands on the light source and image processing technology, and acoustic methods place high demands on the percussion equipment and are sensitive to environmental noise. In contrast, the method in this paper has high precision, stable results, and less dependence on the environment. It only requires control of the humidity, voltage, and a few other experimental conditions. This section further discusses the electrical characteristics of poultry eggs and explores the universality and generalization of the proposed method.

It is worth noting that the classification accuracy did not change significantly across the machine learning methods, which indicates that the features extracted from the current signals were stable. Therefore, the current-based crack detection method is feasible and can be used in actual production, with accuracy rates as high as 99%. For the misclassified eggs, we analyzed the position and condition of the cracks and the corresponding current signals. We found that, although the cracks were located in the effective detection area between the tip and the blunt end, they had been blocked by spilled egg liquid and dust during long storage. Therefore, it should be possible to further improve the classification accuracy by improving the design of the brushes.

In addition, we conducted further studies on the electrical properties of the eggs. We randomly selected 10 eggs as samples and recorded the current signals at applied voltages of 800 V, 1000 V, 1200 V, and 1400 V. According to whether an obvious discharge could be directly observed and heard, the eggs were divided into discharged eggs and undischarged eggs. The current signals of the two kinds of eggs are shown in Figures 16 and 17. Figure 18 compares the current signals of the discharged and undischarged eggs in the same coordinate system. The analysis shows that the higher the applied voltage, the larger the dynamic current of the egg. However, increasing the voltage also amplified the current fluctuation, which indirectly indicates that high voltage causes breakdown in the eggs. In addition, not all eggs in the discharged samples had cracks, which means that the raw current signal alone cannot reliably identify cracked eggs and that further data analysis of the current signal is necessary.

**Figure 16.** Current signal when the egg had no discharge phenomenon under different voltages.

**Figure 17.** Current signal when the egg produced the discharge phenomenon at different voltages.

**Figure 18.** Current signals of the eggs in the voltage range of 800–1400 V. The current signals of 3 eggs with obvious cracks are shown in blue, the current signals of 2 eggs with no cracks but obvious discharge are shown in green, and the current signals of the eggs without discharge are shown in orange.

Crack detection based on electrical characteristics is a new research direction for the quality inspection of agricultural products, with great research value and market potential. The method proposed in this paper can detect cracks not only in chicken eggs but also, with high precision, in duck eggs, among others, so it is a universal and generalizable method. We purchased 267 fresh duck eggs from the Dabao Breeding Duck Incubation Base in Xintai Tianbao Town for current signal acquisition, including 130 intact duck eggs and 137 cracked duck eggs. The physical and experimental parameters of the tested duck eggs are shown in Table 7. Following the analysis in Section 4.3, after wavelet denoising, the time domain, frequency domain, and wavelet packet coefficient features of the current signals of the duck eggs were extracted and combined, and we selected the RF classifier for training. The results are shown in Table 8. For the duck eggs, the accuracy of the model was slightly lower but still high. We speculate that there are two main reasons for the slight change in the evaluation indices: (1) Far fewer duck eggs than chicken eggs were used in verification. According to the equations for the precision rate and recall rate, when the overall base is small, each misclassification leads to a greater reduction in the relevant indicators. (2) Chicken eggs are usually laid in industrialized chicken houses, where the environment is relatively dry and hygienic. Ducks, in contrast, are typical waterfowl that usually live outdoors and in water, so duck eggs come from a relatively humid and dark environment, and their cracks are easily blocked by impurities such as dust. Although we simulated the cleaning process of the egg factory before testing, the impurities that had blocked the cracks for a long time had solidified, and it was difficult for water to enter the small cracks and wet the blocking substance during flushing. As a result, the conductivity at the cracks would decrease, causing these cracks to be missed during inspection.

**Table 7.** Physical and experimental parameters of tested duck eggs.


**Table 8.** Detection results of cracked duck eggs.


#### **6. Conclusions**

In this study, we established an electrical characteristics model of the egg and designed a microcrack detection system that is more accurate and more convenient than traditional methods. After the noise in the current signals was reduced with the sym2 wavelet, features extracted from the time, frequency, and wavelet domains were shown to contain abundant crack information. Based on these features, five typical machine learning algorithms were used to divide the eggs into cracked eggs and intact eggs, which verified the proposed model. The experimental results show that the RF had better robustness and that the fusion of multi-domain features can effectively improve the classification accuracy. The classification accuracy varied little across the machine learning methods, with all being around 99%, which indicates that detecting microcracks from current signal features is stable and reliable. The experiments on duck eggs also confirmed that the proposed method has a certain degree of universality and generalization ability. Our research will help enterprises to quickly and accurately detect cracked eggs on the production line, greatly reduce the number of cracked eggs in the end products, and improve product quality, so it has good prospects for practical application. Overall, this paper explored a new nondestructive testing method for egg cracks and lays a foundation for the development of nondestructive egg crack testing based on an electrical characteristics model.

**Author Contributions:** Conceptualization, C.S. and C.Z. (Changsheng Zhu); methodology, C.S., C.Z. (Changsheng Zhu) and Y.W.; software, Y.W., Y.C., B.J. and C.Z. (Changsheng Zhu); validation, C.S., C.Z. (Changsheng Zhu), C.Z. (Chun Zhang) and Y.W.; formal analysis, C.S., C.Z. (Chun Zhang) and J.Y.; investigation, C.S. and J.Y.; writing—original draft preparation, C.S., C.Z. (Changsheng Zhu), Y.W., Y.C. and B.J.; writing—review and editing, C.S., C.Z. (Chun Zhang), C.Z. (Changsheng Zhu), Y.W. and J.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Tai'an Science and Technology Innovation Development Plan (No.2021GX050 and No.2020GX055).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author (cs.zhu@sdust.edu.cn).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**

