1. Introduction
In the past, the limitations of electronic circuits and optical devices in exciting terahertz (THz) waves inhibited the exploitation of the THz spectrum in industrial and laboratory applications. THz radiation is electromagnetic radiation between the microwave and infrared regions, spanning frequencies of 0.1–10 THz [1]. THz waves are non-ionizing and non-invasive, and they penetrate many materials. Over the last two decades, research on probing the THz spectrum and generating THz signals has produced technologies for several scientific applications [1,2,3]. Today, THz applications are common in biology, physics, chemistry, materials science, and security. THz radiation is also known to be highly sensitive to molecular translational, vibrational, and rotational modes [2,3]. These distinctive characteristics enable the use of the THz spectrum in numerous inspection, scanning, and imaging applications, and its uniqueness opens new research directions in science and engineering.
A review of recent scientific publications shows that most THz applications remain confined to scientific research, with only a minority in engineering. The main reasons are that the engineering community is largely unaware of the technology and that ‘off-the-shelf appliances’ are lacking. Many THz systems contain expensive components, such as photoconductive antennas, laser sources, and low-noise, high-sensitivity amplifiers for acquiring the emitted radiation. With continued research on the physical principles of radiation, excitation, and probing, THz technology could become a breakthrough in the coming industrial era, with applications across industry that offer valuable upgrades to, or replacements for, standard devices.
This paper presents an automated classification of inorganic pigments (IP) in plastic materials, using terahertz frequency domain spectroscopy (THz-FDS) as the data source. Most THz spectroscopy is performed with the time domain spectroscopy (TDS) technique, which is robust and flexible. TDS uses short pulses and coherent detection, which suppresses standing waves and simplifies data analysis; its frequency resolution is in the range of 1–5 GHz [4]. Because the spectral properties of the pigments were unknown and had to be examined accurately, the THz-FDS technique was used. The main advantages of THz-FDS are high-resolution spectrum measurement, a user-selectable frequency span, and cost efficiency. Unlike TDS, FDS achieves a resolution in the range of 1–5 MHz. However, because FDS operates with continuous waves and standing waves may occur, FDS data are more complex and require advanced post-processing techniques [4,5]. The analysis of FDS data is discussed in depth below.
Plastic materials are used widely in fabrication, assembly, and design. Classifying the pigments of a processed workpiece is essential for quality control, material characterization, and fabrication validation [6,7]. The production of raw pigmented plastic material is a trade secret, and the manufacturer’s confidential information is usually not accessible to the client. To ensure supervision over material characteristics, many production lines require non-invasive and non-destructive material inspection. Currently, the inspection techniques used for plastic materials in industrial environments are mostly destructive, involving cutting and grinding the sample, whereas non-invasive techniques are based mostly on surface inspection [7]. For example, colored material can be classified straightforwardly with machine vision, but machine vision is inefficient for in-depth analysis [8]. Likewise, inspections based on thermal, microwave, ultrasonic, acoustic emission, and magnetic techniques are not adequate for IP classification [1,9]. Where these well-established techniques fail for the non-destructive testing of plastics, THz technology has an advantage [10]. THz waves penetrate polymers even when the material is optically opaque, and they offer higher spatial resolution than microwaves [11]. THz technology is therefore suitable for contact-free inspection and can be used to detect cracks and defects [12], mechanical stress [13], and aging [14] in polymer materials. Most of these inspections were done with TDS and some with FDS, using known processing techniques such as phase fringe extraction and the FFT [3]. All of these applications, and their data analysis, characterized a single instance of a material. In contrast, this paper introduces a high-frequency-resolution approach with advanced processing of complex FDS data, which enables pigment classification and, potentially, automated quality control of plastic material. The presented work extends THz spectroscopy to a real-time system for non-destructive IP classification and validation.
Most data analysis in THz-FDS determines the attenuation and phase shift of THz waves propagating through the medium [15]. The detected absorption lines of a medium are measured and compared with reference values, and the complex permittivity is estimated using the Kramers–Kronig relation [16]. In addition, chemometric methods such as principal component analysis [17], independent component analysis [18], and partial least squares [19] are used to identify the compounds in the measured medium. THz-FDS systems based on photoconductive antennas (PCAs) use two optical signals with different wavelengths to create an optical signal modulated at a THz frequency [3]. The most common method of measuring the spectral components is to sweep the modulation (THz) frequency and measure the phase fringes, from which the attenuation and phase shift of THz waves in the medium can be estimated. Such a THz-FDS system needs careful calibration and setup. The first calibration concern is the measurement uncertainty caused by the tuned distributed feedback laser diodes (DFLD) used in THz-FDS systems: when the detected THz wave is sampled, the actual measurement frequency can deviate from the reference frequency. The second main calibration concern is the environment. THz waves are sensitive to changes in temperature and humidity around the measured sample. Therefore, measurements are usually performed in a controlled environment, which is a significant limitation for industrial applications. The authors in [16] predicted that THz technology would be used more in laboratory environments than in industrial production processes. Material analysis in quality control and waste management could be areas where THz technology has an edge. THz waves are non-ionizing, so they do not damage the measured sample, and many materials have spectral fingerprints in the THz band. THz technology could also detect unwanted metals in the measured medium, since metals are opaque to THz waves. Examples in material and natural process control include fruit inspection [20] and fermentation supervision [21]. The major drawback remains that these applications are performed in a controlled environment and not in real time.
Our review of the literature found that classifying materials with a THz sensor is extremely difficult because of the wide THz spectrum and because measurements can be contaminated by various uncertainties, such as DFLD non-linearities, measurement bias, and environmental influences. These influences are known but cannot be determined and compensated precisely in the measurement. Because the uncertainties are known yet indeterminate, a supervised machine learning (ML) procedure can be used. Supervised learning provides a robust mapping over a set of input–output data pairs. All data pairs contain uncertainty, which can be suppressed efficiently provided that the significant features are preserved, even if hidden. Many ML algorithms could be deployed for IP classification; after extensive research and comparison of different ML strategies, the convolutional neural network (CNN) gave favorable results. The CNN is a subset of the deep neural network and deep learning paradigms [22], and has proven effective for image and speech recognition, face detection, and feature extraction. Recent research confirms that CNNs have advantages in time-series forecasting and in data-driven diagnosis and fault classification for various industrial processes and applications [23,24,25,26,27].
The paper proposes automated IP classification using a novel preprocessing algorithm suited to CNN classification [22], which can be executed in real time. The acquired 1D THz signal is transformed into a 2D representation and classified using a CNN. The transformation first preprocesses the 1D THz data with peak detection, envelope extraction, and downsampling algorithms that operate on the THz phase fringes. A new algorithm, called Windowing with Spectrum Dilatation (WSD), then transforms the 1D data into 2D data that represent a material’s spectral features obtained with THz-FDS. The 2D data are classified using a CNN, with the material’s spectral features distributed spatially throughout the 2D data. This transformation ensures that the spectral features are located regionally, within a certainty boundary. Classifying and detecting spatially spread features with added uncertainty is the main strength of CNN algorithms. The complexity of the CNN is related to the selection of the preprocessing parameters, which can be treated as an optimization procedure. The proposed method for IP classification with a CNN was evaluated experimentally. A plastic material, polyethylene (PE), was mixed with various IPs and used as the evaluation sample. The CNN was trained with the preprocessed training set, and the efficiency of the training is closely related to the choice of preprocessing algorithm and its parameters. The paper compares the novel WSD with the commonly used set cut technique (SetCT); the experimental results confirm that WSD clearly outperforms SetCT. The proposed WSD-CNN approach was also compared with other known classification algorithms, namely the support vector machine (SVM) [28], naive Bayes (NB) [29], classification tree (CT) [30], and discriminant analysis (DA) [29], all of which operate on 1D data. All the preprocessing methods are discussed and compared later in the article. The spectral characteristic of each PE sample was acquired between 0.1 THz and 1.2 THz. As confirmed in this work, the proposed WSD preprocessing for automated IP classification based on THz-FDS with CNN classification achieves high reliability, robustness, and accuracy, and can operate in real time. The paper also shows how the THz-FDS scanning technique is improved by efficient and robust analysis of complex FDS data.
The paper has six sections. Section 1 is the introduction, which presents the main objectives of the work. Section 2 describes the THz-FDS operating principle and the experimental setup with the inspection samples. Section 3 introduces different preprocessing algorithms for the THz-FDS measurements. Section 4 covers the CNN structure selection, the role of the hyperparameters, and the benefits of the WSD preprocessing algorithm. The experimental results are presented in Section 5, where different preprocessing methods and CNN structures are compared. The paper is concluded with Section 6.

2. Terahertz Frequency Domain Spectroscopy Principle for Inorganic Pigments (IP) Classification
The TeraScan 1550 from Toptica Photonics (Munich, Germany) was used for the THz-FDS experiment. The TeraScan 1550 can generate THz waves in the span of 0.03–1.21 THz, with high THz power and a wide dynamic range. It mixes (beats) the optical signals emitted by two distributed feedback (DFB) laser diodes with different wavelengths, and uses a PCA as emitter and detector. The experimental setup is presented in Figure 1.
The experiment involved five categories of samples, each with an individual inorganic pigment, differing in color: white, blue, green, yellow, and black. The geometrical parameters of the samples were equal in all batches used. The samples with different pigments are presented in Figure 2.
The organization of the samples for the deep learning algorithm is discussed in Section 5. Only the essential operating principle of the TeraScan 1550 is presented here, to give readers an overview of the approach. The system comprises two independent laser sources for signal modulation and two PCAs for emitting and detecting THz radiation, as depicted in Figure 3.
Two tunable DFB lasers generate optical signals with different wavelengths, $\lambda_1$ and $\lambda_2$, which are mixed in an optical fiber coupler. The resulting optical signal is modulated with the beat frequency $f_{\mathrm{THz}}$, which can be expressed as

$$f_{\mathrm{THz}} = \frac{c\,\Delta\lambda}{n_{\mathrm{eff}}\,\lambda_1 \lambda_2},$$

where $c$ is the speed of light in vacuum, $n_{\mathrm{eff}}$ is the effective refractive index of the optical fiber, and $\Delta\lambda$ is the wavelength difference, $\Delta\lambda = |\lambda_1 - \lambda_2|$. The two DFB laser diodes are of the same type; the wavelength of one diode is shifted by cooling and that of the other by heating [31,32].
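As a numerical illustration of the beat-frequency relation, the following minimal sketch computes $f_{\mathrm{THz}}$ for two wavelengths near 1550 nm and, conversely, the approximate detuning needed for a target frequency. It assumes that the wavelengths are given in a medium of index `n_eff` (so `n_eff = 1.0` for vacuum wavelengths, where the relation reduces to $c/\lambda_1 - c/\lambda_2$); the function names and example wavelengths are hypothetical and are not taken from the TeraScan documentation.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def beat_frequency(lambda1, lambda2, n_eff=1.0):
    """Beat (difference) frequency in Hz of two optical waves.

    lambda1, lambda2: wavelengths in metres, measured in a medium with
    refractive index n_eff (use n_eff=1.0 for vacuum wavelengths).
    """
    delta = abs(lambda1 - lambda2)
    return C * delta / (n_eff * lambda1 * lambda2)


def detuning_for(f_thz, lambda0, n_eff=1.0):
    """Approximate wavelength difference (m) giving beat frequency f_thz (Hz)
    around a centre wavelength lambda0, using lambda1 * lambda2 ~ lambda0**2."""
    return f_thz * n_eff * lambda0**2 / C


if __name__ == "__main__":
    # Hypothetical vacuum wavelengths near 1550 nm, 8 nm apart
    f = beat_frequency(1550e-9, 1558e-9)
    print(f"beat frequency: {f / 1e12:.3f} THz")  # ~0.993 THz
    for f_target in (0.1e12, 1.2e12):
        d = detuning_for(f_target, 1550e-9)
        print(f"{f_target / 1e12:.1f} THz needs ~{d * 1e9:.2f} nm detuning")
```

The example suggests why a detuning of only about 10 nm between two nearly identical 1550 nm diodes is enough to cover a sweep up to roughly 1.2 THz.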
The optical signal is coupled into the PCAs. When the antenna gap of the emitter PCA is not illuminated, the PCA acts as a capacitor holding the charge $Q$ [33]. When the gap is illuminated with the optical signal, a photocurrent is induced; this photocurrent drives a dipole antenna, and a THz wave is emitted [34]. The THz far field $E_{\mathrm{THz}}$ can be estimated as

$$E_{\mathrm{THz}}(r, t) = \frac{A}{4\pi\varepsilon_0 c^2 r}\,\frac{\partial J_s(t)}{\partial t},$$

where $A$ is the illuminated area, $r$ is the distance from the source, $\varepsilon_0$ is the dielectric constant (vacuum permittivity), $c$ is the speed of light, and $J_s(t)$ is the surface current induced in the PCA gap; the peak value of the far field therefore depends on the distance $r$. The detector PCA works similarly to the emitter PCA: the incident THz field drives the photocarriers apart and induces a voltage $V_{\mathrm{THz}}$ across the gap.
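To illustrate how the far field scales with frequency and distance, the sketch below assumes the common point-dipole approximation $E_{\mathrm{THz}} = \frac{A}{4\pi\varepsilon_0 c^2 r}\,\partial J_s/\partial t$ with retardation ignored, and a hypothetical continuous-wave surface current $J_s(t) = J_0\cos(2\pi f t)$ of fixed amplitude. Under these assumptions the peak field grows linearly with $f$ and falls as $1/r$, so the intensity falls as $1/r^2$. Real PCAs roll off at high frequency because of carrier lifetime and antenna response, which this idealized sketch does not model; all numbers are in arbitrary units.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)
C = 299_792_458.0        # speed of light in vacuum (m/s)


def far_field(t, f, j0, area, r):
    """Point-dipole far field (arbitrary units) of a PCA emitter, assuming the
    surface current J_s(t) = j0*cos(2*pi*f*t) and ignoring retardation."""
    dj_dt = -2.0 * np.pi * f * j0 * np.sin(2.0 * np.pi * f * t)  # analytic dJ_s/dt
    return area / (4.0 * np.pi * EPS0 * C**2 * r) * dj_dt


def peak_field(f, j0, area, r, n=2001):
    """Peak |E| over one period, evaluated on n time samples."""
    t = np.linspace(0.0, 1.0 / f, n)
    return float(np.max(np.abs(far_field(t, f, j0, area, r))))


if __name__ == "__main__":
    # Hypothetical values: unit current amplitude, 1e-10 m^2 spot, 0.1 m distance
    base = peak_field(0.5e12, 1.0, 1e-10, 0.1)
    print(peak_field(1.0e12, 1.0, 1e-10, 0.1) / base)  # ~2: field grows with f
    print(peak_field(0.5e12, 1.0, 1e-10, 0.2) / base)  # ~0.5: field falls as 1/r
```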
In a transmission-based FDS system, such as that shown in Figure 3, the beam emitted by the emitter PCA is collimated through the sample onto the detector PCA. The measured characteristic of the material is its transmittance, which describes how much of the emitted field has passed through the medium. The transmittance $T$ is defined as

$$T = \left|\frac{E}{E_0}\right|^2 = \frac{I}{I_0},$$

where $E$ is the remaining field after propagation through the medium, $E_0$ is the emitted field, and $I$ and $I_0$ are the measured and initial intensities, respectively. THz wave propagation can be described accurately by classical electromagnetic theory based on Maxwell’s equations [1]. With a transmission spectrometer, the focus is on the frequency dependence of absorption and dispersion in the measured medium. Absorption affects the amplitude of the propagating wave and is described by the absorption coefficient $\alpha$. The attenuated intensity of the propagating wave is described as

$$I(\lambda) = I_0(\lambda)\, e^{-\alpha(\lambda)\, d},$$

where $d$ is the propagation depth, i.e., the thickness of the inspected material, and $\lambda$ is the radiation wavelength. Dispersion, i.e., a change in propagation speed, causes a phase change of the propagating wave, as shown in Figure 4.
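The absorption and dispersion relations can be turned into a small calculation. The sketch below assumes the Beer–Lambert form $I = I_0 e^{-\alpha d}$ with reflection losses at the sample faces ignored, and the standard phase delay of a slab relative to the same path in air, $\Delta\varphi = 2\pi f (n-1) d / c$. The function names, the 2 mm thickness, and the refractive index of 1.5 are hypothetical illustration values, not measured properties of the PE samples.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)


def absorption_coefficient(i_measured, i_initial, thickness):
    """Absorption coefficient alpha (1/m) from Beer-Lambert, I = I0*exp(-alpha*d).
    Reflection losses at the sample surfaces are ignored (assumption)."""
    transmittance = i_measured / i_initial
    return -math.log(transmittance) / thickness


def phase_delay(f, n, thickness):
    """Extra phase (rad) of a wave at frequency f (Hz) crossing a slab of
    refractive index n, relative to the same path in air (n = 1)."""
    return 2.0 * math.pi * f * (n - 1.0) * thickness / C


def refractive_index(f, dphi, thickness):
    """Invert phase_delay: n = 1 + c*dphi / (2*pi*f*d)."""
    return 1.0 + C * dphi / (2.0 * math.pi * f * thickness)


if __name__ == "__main__":
    d = 2e-3  # hypothetical 2 mm sample
    print(absorption_coefficient(math.exp(-2.0), 1.0, d))  # ~1000 1/m
    dphi = phase_delay(0.5e12, 1.5, d)
    print(dphi, refractive_index(0.5e12, dphi, d))  # ~10.48 rad, ~1.5
```

In practice the measured phase is only known modulo $2\pi$, so a real evaluation would need phase unwrapping across the frequency sweep before applying the inversion.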
The photocurrent measured in the gap of the detector PCA depends on the emitted frequency and on the absorption and dispersion in the medium. The THz intensity also drops with the square of the distance between the emitter and detector PCAs. In addition, the distance between the PCA emitter and detector introduces a phase shift, as shown in Figure 5.
In the setup shown in Figure 1, the distance between the PCAs is fixed. To measure attenuation and phase shift in a medium, the frequency must be swept. As the frequency $f$ is swept, the induced photocurrent takes a sinusoidal form (phase fringes) because of the interference between the optical signal and the THz wave in the detector PCA. An example of the measured phase fringes of a sample from Figure 2 is shown in Figure 6.
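The fringe mechanism can be sketched with an idealized model of the detector photocurrent, $I_{\mathrm{ph}}(f) = A(f)\cos(2\pi f\,\Delta L / c)$, where $\Delta L$ is a hypothetical effective path difference between the THz path and the optical reference path and $A(f)$ is an arbitrary smooth amplitude standing in for the transmitted spectrum. In this model the fringe period in frequency is $c/\Delta L$, so a sweep over 0.1–1.2 THz with 20,000 points (the settings reported in Section 5) yields about $1.1\,\mathrm{THz}\cdot\Delta L/c$ fringes. The functions and the 6 cm path difference are illustrative assumptions, not the TeraScan signal model.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum (m/s)


def simulate_fringes(f_start=0.1e12, f_stop=1.2e12, n=20_000, delta_l=0.06):
    """Idealized phase fringes of a CW THz photomixing scan (assumption).

    delta_l: hypothetical optical path difference (m) between the THz and
    optical reference paths. Returns the frequency axis and photocurrent.
    """
    f = np.linspace(f_start, f_stop, n)
    amplitude = np.exp(-f / 1e12)            # arbitrary smooth spectrum
    phase = 2.0 * np.pi * f * delta_l / C    # path-difference phase
    return f, amplitude * np.cos(phase)


def count_fringes(current):
    """Number of full fringes, estimated from sign changes (two per period)."""
    sign_changes = np.count_nonzero(np.diff(np.signbit(current)))
    return sign_changes / 2.0


if __name__ == "__main__":
    f, i_ph = simulate_fringes()
    print("fringe period (GHz):", C / 0.06 / 1e9)          # ~5.0
    print("counted fringes:", count_fringes(i_ph))          # ~220
    print("expected fringes:", (f[-1] - f[0]) * 0.06 / C)   # ~220.2
```

The amplitude information sits in the fringe envelope and the phase information in the fringe positions, which is why envelope extraction is a natural first preprocessing step.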
The amplitude and phase can be extracted from the measured phase fringes in Figure 6. With advanced processing algorithms, the extracted amplitude and phase can be used for automated material classification and non-invasive inspection. The data processing for automated IP classification is described in the next section.
5. Experimental Results
The classification algorithms were tested on THz-FDS data acquired with the TeraScan 1550 shown in Figure 1. The TeraScan system was calibrated only for the measurements of the first batch; all other batches used the initial system setting, which is unusual for such applications. THz scanners usually require frequent recalibration to mitigate external influences and measurement uncertainty, but recalibration is time-consuming and reduces system availability. To avoid excessive recalibration while ensuring reliable classification, the robustness of the algorithm is crucial for real-time scanning. The frequency span of the TeraScan system was set to 0.1–1.2 THz, with 20,000 samples per measurement. Peak detection and envelope extraction used a maximum-seek procedure over a span of 150 samples, and the downsampling ratio was set to 1/50. The experimental procedure is depicted in Figure 13.
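The preprocessing settings above (a maximum seek over 150 samples and 1/50 downsampling of a 20,000-point sweep) can be sketched as a sliding maximum of the absolute fringe signal followed by decimation. This is a minimal sketch under the assumption that "maximum seek" means a sliding-window maximum; the function `envelope_downsample` and the synthetic fringes are hypothetical, and the sketch requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def envelope_downsample(x, window=150, step=50):
    """Sketch of max-seek envelope extraction followed by downsampling.

    A sliding maximum of |x| over `window` samples approximates the fringe
    envelope; keeping every `step`-th value gives a 1/step downsampling.
    Returns the downsampled envelope and the centre index of each window.
    """
    windows = sliding_window_view(np.abs(x), window)  # shape (len(x)-window+1, window)
    envelope = windows.max(axis=1)
    starts = np.arange(0, envelope.size, step)
    centres = starts + (window - 1) // 2
    return envelope[starts], centres


if __name__ == "__main__":
    C = 299_792_458.0
    f = np.linspace(0.1e12, 1.2e12, 20_000)                  # 0.1-1.2 THz, 20,000 samples
    amplitude = np.exp(-f / 1e12)                             # hypothetical spectrum
    fringes = amplitude * np.cos(2 * np.pi * f * 0.06 / C)    # hypothetical 6 cm path difference
    env, centres = envelope_downsample(fringes)
    print(env.size)  # 398 points
    print(np.max(np.abs(env - amplitude[centres]) / amplitude[centres]))  # small relative error
```

The window must span at least one fringe period (here roughly 90 samples) for the maximum to land on a fringe peak; a window shorter than the period would leak the fringe oscillation into the envelope.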
The results of CNN classification with SetCT and WSD are presented in Tables 2–6. The table parameters are the average processing time for a single item of 2D data, the average accuracy, the standard deviation of the average accuracy, and the best and worst probability values of the testing samples.
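A minimal sketch of how the accuracy and probability columns could be computed is shown below. It assumes that the average accuracy and its standard deviation are taken over repeated training/testing runs, and that the best and worst probabilities are the maximum and minimum predicted probability of the correct class over the test samples; these interpretations and the function name `summarize_runs` are assumptions. Processing time per item would be measured separately, for example with `time.perf_counter`.

```python
import numpy as np


def summarize_runs(run_accuracies, true_class_probs):
    """Hypothetical summary of the accuracy and probability table columns.

    run_accuracies: accuracy of each training/testing run (0..1).
    true_class_probs: predicted probability of the correct class for each
    test sample (assumed meaning of 'best'/'worst' probability).
    """
    acc = np.asarray(run_accuracies, dtype=float)
    probs = np.asarray(true_class_probs, dtype=float)
    return {
        "mean_accuracy": acc.mean(),
        "std_accuracy": acc.std(ddof=1),  # sample standard deviation
        "best_probability": probs.max(),
        "worst_probability": probs.min(),
    }


if __name__ == "__main__":
    stats = summarize_runs([0.96, 0.98, 1.00], [0.99, 0.91, 0.57, 0.88])
    print(stats)
```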
The confusion matrix of the CNN classification is presented in Figure 14, where rows and columns represent actual and predicted values, respectively.
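The row/column convention of the confusion matrix can be made explicit with a short sketch: rows index the actual pigment class, columns the predicted class, and the accuracy is the diagonal sum divided by the total. The integer labels and the mapping of indices to the five pigment colors are hypothetical choices for illustration.

```python
import numpy as np

CLASSES = ["white", "blue", "green", "yellow", "black"]  # hypothetical index-to-colour mapping


def confusion_matrix(y_true, y_pred, n_classes=len(CLASSES)):
    """Rows = actual class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each (actual, predicted) pair
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()


if __name__ == "__main__":
    y_true = [0, 0, 1, 2, 3, 4, 4]
    y_pred = [0, 1, 1, 2, 3, 4, 4]  # one white sample misclassified as blue
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    print(f"accuracy = {accuracy(cm):.3f}")  # 6/7 -> 0.857
```

Off-diagonal entries in a given row show which classes a pigment is confused with, which is how the scattering of scores discussed below shows up in the matrix.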
The comparisons with the support vector machine, naive Bayes, classification tree, and discriminant analysis [24,25,26] are presented in Table 7 and Figure 15. All of these classification algorithms (SVM, NB, CT, and DA) used 1D data. The preprocessing of these data included peak detection, envelope extraction, and downsampling, with the same parameters as in WSD_low.
Table 7 presents the achieved scores, given as the average accuracy over the complete validation set for each classification algorithm. Figure 15 presents the confusion matrices of the SVM, NB, CT, and DA classification algorithms.
The results presented in Tables 2–6 demonstrate the efficiency of the learning algorithm on the preselected testing group, and show significant differences among the approaches and CNN structures. The WSD method has a beneficial property: it remains efficient with the reduced data set. For WSD, the downsampling ratio of 1/50 does not influence the classification accuracy much (see Tables 2–6). In contrast, such a reduction cannot be applied to SetCT_low. The full data set, without reduction, showed promising IP classification efficiency, but it requires longer training and operation times (Tables 2–6). Selecting the CNN_full structure is more demanding and time-consuming than selecting CNN_low, because its kernel size, stride, and layer sequence require more attention and are more sensitive to parameter changes. Comparing the reduced and full data sets shows that WSD is a more suitable technique than SetCT. The advantage of WSD lies in its 2D data structuring and in the operation of the CNN convolution layers, which experimentally confirms the claims discussed in Section 3.3.

The reduction of the 2D data also benefits the CNN structure. Figure 10 and Figure 11 show that the CNN for the reduced 2D data does not contain a max-pooling layer. This confirms that WSD is advantageous both as a data reduction technique and as a way to relax the CNN structure for THz-FDS measurements. Adding a max-pooling layer with WSD_low lowers the CNN’s reliability, which indicates that the reduced 2D data after downsampling already contain all the significant spectral features of the measurement, and that further simplification inside the CNN eliminates them. The confusion values in Figure 14 show clear differences between WSD_full and WSD_low on the one hand and SetCT on the other: the SetCT scores are visibly scattered, which means that downsampling deteriorated the SetCT results. The WSD confusion values in Table 2 and Table 6 are more consistent than those of SetCT, indicating that the WSD approach is more robust. The advantage of WSD is finally confirmed by the accuracy of the reduced WSD_low, which is comparable to that of WSD_full.

Comparing the classification algorithms in Figure 15 and Table 7 shows that WSD achieved the highest results and outperformed the 1D-based algorithms. Regarding the confusion matrix in Figure 14, DA performed better than SVM, NB, and CT, but still reached lower results than the 2D-based algorithms, except for SetCT_low. SVM and NB achieved an accuracy of approximately 70%, and CT close to 50%. Notably, with unprocessed 1D data, the accuracy of SVM, NB, CT, and DA was much lower than with the preprocessed data presented in Table 7. The comparison shows that the WSD-CNN approach applied to THz-FDS data had the highest accuracy and the best performance, and that transforming 1D into 2D data with WSD was beneficial for the THz-FDS approach. The paper shows that the IP component in plastic material can be classified from THz-FDS data with the proposed WSD-CNN structure, with an average accuracy of 98%. The results also indicate that each IP in plastic material has unique spectral characteristics in the THz domain, which makes non-destructive analysis possible.