The methodology of this study is designed to address the challenges of recognizing and classifying components on densely packed printed circuit boards (PCBs), where single-modality imaging falls short. By integrating optical and X-ray imaging, our approach enhances the visibility of both surface and hidden component features and overcomes the typical limitations of each modality: the optical method's restricted view of internal component structures and the X-ray's lower resolution for surface details. This dual-modality fusion, enabled by the data fusion technique described below, produces a more detailed and comprehensive dataset, which is essential for high-accuracy classification. Furthermore, the WaferCaps capsule network offers an advance over conventional convolutional neural networks (CNNs) by preserving spatial hierarchies and using dynamic routing, which improves the accuracy and reliability of the classification results.
3.1. Image Fusion
The physical limitations of imaging sensors can make it difficult to obtain uniformly good images of a scene. Image fusion is one possible solution to this problem: a more complete image of a scene is created by combining multiple acquisitions that each provide complementary information. In this paper, X-ray and optical images of the same electronic PCB are combined. Through this process, salient information from parts that are hidden in optical images but visible to the X-ray machine (for instance, the inside of a chip) is fused with the surface details captured by the optical images [16].
Image fusion can take place at three levels: the pixel level, the feature level, and the decision level. In pixel-level fusion, the input images are combined directly for further processing. Feature-level fusion extracts relevant features, such as pixels, textures, or edges, and blends them to generate supplementary merged features. In decision-level fusion, multiple classifiers combine their decisions into a single one that describes the activity that occurred [17]. Fusion methods can be categorized into two groups: traditional algorithms (spatial- and frequency-domain techniques) and deep learning-based methods [18]. In spite of their high performance, traditional fusion methods have some disadvantages. A major problem is that fusion performance depends heavily on the extraction and selection of features, and there is no universal method for obtaining them. To address these drawbacks, deep learning-based fusion methods have been developed, in which deep networks extract deep representations of the information provided by the source images. Various strategies have been proposed to reconstruct the fused image, and the fusion strategy itself can also be designed with deep learning. In this paper, we employ the fusion method proposed by Jingwen Zhou et al. [16], which was presented for infrared and visible image fusion and is based on the VGG-19 model. In this method, unlike the approach proposed by Li et al. in [19], the source image does not need to be split into base and detail parts; that decomposition makes the fusion process overly complex and leads to incomplete extraction of details and salient targets. The Zhou et al. fusion method uses grayscale images as inputs. Because the optical images in this research are in color, the IHS (intensity, hue, and saturation) transform is applied to convert them from RGB (red, green, and blue) to the IHS color space [20]. In the IHS space, intensity indicates the spectral brightness, hue represents the dominant wavelength, and saturation reflects the purity of the spectrum. In the Zhou et al. method, the intensity component of the optical image is fused with the X-ray image, and the updated intensity, together with the original hue and saturation, is then converted back into the RGB color space. The result of this process is a fused color image.
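As an illustration of this color-space handling, the sketch below uses the simple "fast IHS" substitution, in which intensity is the mean of the R, G, and B bands and the fused intensity is injected back by shifting each band; the full transform of [20] may differ, and vgg19_fuse and the loader functions are placeholders for the fusion step described in the next subsection.

import numpy as np

def rgb_to_intensity(rgb):
    # Intensity component of the IHS model: mean of the R, G, and B bands.
    return rgb.mean(axis=2)

def replace_intensity(rgb, new_intensity):
    # "Fast IHS" substitution: shifting every band by the change in intensity
    # updates brightness while preserving hue and saturation.
    delta = new_intensity - rgb_to_intensity(rgb)
    return np.clip(rgb + delta[..., None], 0.0, 1.0)

# Usage sketch (loaders and vgg19_fuse are hypothetical helper names):
# optical = load_rgb("board_optical.png")        # H x W x 3 floats in [0, 1]
# xray = load_gray("board_xray.png")             # H x W floats in [0, 1]
# fused_intensity = vgg19_fuse(rgb_to_intensity(optical), xray)
# fused_rgb = replace_intensity(optical, fused_intensity)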
Figure 1 shows the architecture of the whole fusion process. Instead of decomposing the images into high- and low-frequency parts, the optical and X-ray images are fed into VGG-19 for layer-by-layer feature extraction. In X-ray images, hidden targets are usually visible because of the imaging characteristics of the technique. Optical cameras capture more surface detail, but targets are often covered by other objects, and there is no way to see inside the components. With its strong classification and localization capability, VGG-19 is well suited to the fusion task because it can extract detailed features and salient targets.
3.2. VGG-19 Network: Features Extraction, Processing, and Reconstruction
VGG-19 is a CNN that has been trained on more than a million images from the ImageNet database. The network is composed of 19 layers and can classify images into one thousand object categories; it has therefore learned rich feature representations for a wide variety of images. The fusion method of Zhou et al. uses five of its layers to extract detailed features and salient targets.
The first two selected convolutional layers are conv1_1 and conv1_2 of VGG-19, which are mainly responsible for extracting details and edges and are therefore retained. The third selected layer is conv2_1, which mainly extracts edges in the image. Conv3_1, the fourth selected layer, extracts the image's prominent targets, and the last retained layer, conv4_1, mainly extracts salient targets. Using the L1-norm and an average operator, activity level maps ($\hat{C}_k^i$) are derived from the extracted features and targets. Weight maps ($\hat{W}_k^i$) are then generated using the SoftMax function and an upsampling operator. The X-ray image and the intensity component of the optical image are weighted by the five weight maps to produce five candidates for fusion. Finally, the fused image is formed by applying the maximum strategy to the five candidate fused images.
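A minimal PyTorch sketch of this layer selection and of the L1-norm step is given below; the indices into torchvision's VGG-19 feature stack and the omission of ImageNet normalization are implementation assumptions rather than details taken from [16].

import torch
from torchvision.models import vgg19, VGG19_Weights

# Assumed indices of conv1_1, conv1_2, conv2_1, conv3_1, conv4_1 in
# torchvision's vgg19().features module list.
SELECTED_LAYERS = [0, 2, 5, 10, 19]

def extract_activity_maps(gray_image):
    # gray_image: torch tensor of shape (H, W) with values in [0, 1].
    # The single channel is replicated to three channels because VGG-19
    # expects RGB input; ImageNet normalization is omitted for brevity.
    model = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()
    x = gray_image[None, None].repeat(1, 3, 1, 1)  # (1, 3, H, W)
    activity_maps = []
    with torch.no_grad():
        for idx, layer in enumerate(model):
            x = layer(x)
            if idx in SELECTED_LAYERS:
                # L1-norm over the channel dimension -> raw activity map C_k^i
                activity_maps.append(x.abs().sum(dim=1).squeeze(0))
    return activity_maps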
Figure 1 also illustrates the feature-processing stage. The L1-norm converts the feature maps into an objective measure of activity, and the average operator provides robustness to misregistration of the images.
The final activity level map ($\hat{C}_k^i$) is obtained with the average operator shown in Equation (1):

$$\hat{C}_k^i(x, y) = \frac{\sum_{\beta=-r}^{r} \sum_{\theta=-r}^{r} C_k^i(x+\beta,\, y+\theta)}{(2r+1)^2} \qquad (1)$$

where $C_k^i(x, y) = \left\| \phi_k^{i,1:N}(x, y) \right\|_1$ is the L1-norm of the N-dimensional vector $\phi_k^{i,1:N}(x, y)$ of feature maps of input image $k$ (since we fuse two input images, $k \in \{1, 2\}$), derived from the i-th selected convolutional layer. N indicates the number of channels in the i-th layer. The parameter r represents the size of the average operator and, following [16], it is set to 1. Based on the final activity level map ($\hat{C}_k^i$), an initial weight map is calculated using the SoftMax function, so that, as Equation (2) shows, all weight map values fall within the range [0, 1]:

$$W_k^i(x, y) = \frac{\hat{C}_k^i(x, y)}{\sum_{n=1}^{K} \hat{C}_n^i(x, y)} \qquad (2)$$

where i denotes the convolutional layer number and K is the number of activity level maps, whose value is 2 since the source images are the X-ray image and the intensity component of the optical image. In VGG-19, the pooling operator gradually reduces the size of the feature maps by subsampling with a stride of two. As a result, the feature maps of the g-th convolutional layer group are $1/2^{\,g-1}$ the size of the original image.
Once the initial weight maps ($W_k^i$) are obtained, they are up-sampled so that they match the size of the source image. As expressed in Equation (3), the final weight map has the same dimensions as the source image:

$$\hat{W}_k^i(x + p,\, y + q) = W_k^i(x, y), \qquad p, q \in \{0, 1, \dots, s_i - 1\} \qquad (3)$$

where $s_i$ is the subsampling factor of the i-th selected layer ($s_i = 1$ for conv1_1 and conv1_2, 2 for conv2_1, 4 for conv3_1, and 8 for conv4_1).
Eventually, based on Equation (4), each pixel of the final fused image is taken as the maximum value over the five candidate fused images [16]:

$$F(x, y) = \max\!\left[ F^i(x, y) \;\middle|\; i \in \{1, 2, 3, 4, 5\} \right], \qquad F^i(x, y) = \sum_{k=1}^{K} \hat{W}_k^i(x, y)\, I_k(x, y) \qquad (4)$$

where $I_k$ are the source images and K is the number of source images.
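To make Equations (1)-(4) concrete, the following sketch continues the feature-extraction snippet above: it averages the L1-norm activity maps, normalizes them across the two sources, up-samples the weights, builds the candidates, and takes the pixel-wise maximum. The helper extract_activity_maps is the function sketched earlier, and all names are illustrative rather than the authors' implementation.

import torch
import torch.nn.functional as F

def fuse_pair(intensity, xray, extract_activity_maps, r=1):
    # Fuse two grayscale images of shape (H, W) following Equations (1)-(4).
    sources = [intensity, xray]                       # K = 2 source images
    per_source_maps = [extract_activity_maps(s) for s in sources]
    candidates = []
    for i in range(len(per_source_maps[0])):
        # Equation (1): box-average the L1-norm activity maps (window 2r+1).
        hats = [
            F.avg_pool2d(m[i][None, None], kernel_size=2 * r + 1,
                         stride=1, padding=r).squeeze()
            for m in per_source_maps
        ]
        # Equation (2): SoftMax-style normalization across the K sources.
        total = hats[0] + hats[1] + 1e-12
        weights = [h / total for h in hats]
        # Equation (3): up-sample weight maps to the source image size.
        weights = [
            F.interpolate(w[None, None], size=intensity.shape,
                          mode="nearest").squeeze()
            for w in weights
        ]
        # Candidate fused image for layer i.
        candidates.append(weights[0] * intensity + weights[1] * xray)
    # Equation (4): pixel-wise maximum over the candidate fused images.
    return torch.stack(candidates).max(dim=0).values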
Figure 2 shows example optical and X-ray images of a chip component and the resulting fused image.
3.3. WaferCaps
Convolutional neural networks (CNNs) have been used extensively in many computer vision tasks [21]. Although CNNs show remarkable performance in many classification problems, they still have some drawbacks. One of these is the use of pooling layers. Pooling layers reduce the computational requirements by shrinking the feature maps during the feed-forward pass; however, this comes at the cost of discarding features that could be important to the learning process. Additionally, CNNs are limited in their ability to accurately identify the spatial location of an inspected feature within an image [22].
The capsule network (CapsNet) is a more recently proposed neural network for classification tasks that can overcome these drawbacks of CNNs. It was introduced in 2017 by Sabour et al. [23] and was initially applied to the MNIST handwritten digit dataset. CapsNet stands apart from conventional CNNs due to two primary factors: dynamic routing and layer-based squashing [24]. Scalar-output feature detectors are replaced with vector-output capsules, and the routing-by-agreement concept is used instead of pooling layers. In CapsNet, each capsule consists of multiple neurons, where each neuron represents specific features in different regions of an image. This approach enables recognition of the entire image from its individual parts [25].
The initial layer of CapsNet is a convolutional layer similar to those in CNNs, but the subsequent layers differ in structure. In the second layer, known as PrimaryCaps, each of the 32 primary capsules possesses an activity vector $u_i$ that encodes spatial information through instantiation parameters. The output $u_i$ is then transmitted to the subsequent layer, DigitCaps, where each 16-dimensional capsule (one per digit class) receives $u_i$ and multiplies it with a weight matrix $W_{ij}$. This computation yields the prediction vector $\hat{u}_{j|i}$, which signifies the contribution of capsule i in PrimaryCaps to capsule j in DigitCaps, as indicated by Equation (5):

$$\hat{u}_{j|i} = W_{ij}\, u_i \qquad (5)$$
Subsequently, the predictions are multiplied by coupling coefficients $c_{ij}$, which signify the level of agreement between capsules. The coefficients $c_{ij}$ are updated iteratively, giving rise to what is commonly referred to as "dynamic routing". The coefficients are computed with a routing SoftMax function, where the initial logits $b_{ij}$ represent the log prior probabilities of coupling capsule i in PrimaryCaps with capsule j in DigitCaps. These operations are summarized in Equations (6)–(9):

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})} \qquad (6)$$

$$s_j = \sum_{i} c_{ij}\, \hat{u}_{j|i} \qquad (7)$$

$$a_{ij} = \hat{u}_{j|i} \cdot v_j \qquad (8)$$

$$b_{ij} \leftarrow b_{ij} + a_{ij} \qquad (9)$$
In this context, $s_j$ represents the weighted sum that is fed to the squashing function, and $v_j$ is the resulting output vector of capsule j. The role of the squashing operation is to produce a normalized vector from the collection of neurons within the capsule. The activation function employed for this purpose is described by Equation (10):

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\, \frac{s_j}{\|s_j\|} \qquad (10)$$
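The sketch below illustrates Equations (6)-(10) in PyTorch; the tensor layout and the use of three routing iterations are assumptions rather than the exact WaferCaps settings.

import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Equation (10): squash a capsule vector to a length in [0, 1).
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: prediction vectors u_hat_{j|i} of shape
    # (batch, num_primary, num_classes, out_dim).
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # initial logits b_ij
    for _ in range(num_iterations):
        c = F.softmax(b, dim=2)                    # Eq. (6): coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # Eq. (7): weighted sum per class
        v = squash(s)                              # Eq. (10): output capsule vectors
        # Eq. (8)-(9): agreement between predictions and outputs updates the logits.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v   # (batch, num_classes, out_dim)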
To facilitate the classification process, a margin loss function is established. This function evaluates a loss term derived from the output vectors of DigitCaps and measures the correspondence between the chosen digit capsule and the actual target of class k. The margin loss is given in Equation (11):

$$L_k = T_k\, \max(0,\, m^{+} - \|v_k\|)^2 + \lambda\, (1 - T_k)\, \max(0,\, \|v_k\| - m^{-})^2 \qquad (11)$$

Here, the label $T_k$ indicates the presence ("1") or absence ("0") of class k. The hyper-parameters of the model, denoted as $m^{+}$, $m^{-}$, and $\lambda$, hold specific values: $m^{+}$ is set to 0.9, $m^{-}$ is set to 0.1, and $\lambda$ is set to 0.5.
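A compact PyTorch version of Equation (11) might look as follows; summing over classes and averaging over the batch is an assumed reduction choice.

import torch

def margin_loss(v, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v:       DigitCaps outputs, shape (batch, num_classes, capsule_dim).
    # targets: one-hot labels T_k, shape (batch, num_classes).
    v_norm = v.norm(dim=-1)  # ||v_k|| per class
    positive = targets * torch.clamp(m_pos - v_norm, min=0.0) ** 2
    negative = lam * (1.0 - targets) * torch.clamp(v_norm - m_neg, min=0.0) ** 2
    return (positive + negative).sum(dim=1).mean()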
The original CapsNet classified the MNIST dataset well, with 99% accuracy; however, it did not achieve high accuracy on more complex images such as the CIFAR-10 dataset. Therefore, in this study, we use a modified version of CapsNet known as WaferCaps, which was originally proposed in [9] to classify semiconductor wafer defects and was also used in [26] to classify optoelectronic wafer defects. The structure of WaferCaps is shown in Figure 3 and Table 2. Compared to CapsNet, WaferCaps incorporates two additional convolutional layers with larger kernel sizes, enabling more effective feature extraction, and dropout layers are introduced after each convolutional layer to mitigate overfitting.
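Purely as an illustrative sketch (the actual layer counts, channel numbers, kernel sizes, and dropout rates are specified in Table 2 and are not reproduced here; the values below are placeholders), the WaferCaps convolutional front end could be organized along these lines:

import torch.nn as nn

class WaferCapsFrontEnd(nn.Module):
    # Convolutional front end preceding the primary capsules.
    # Kernel sizes, channel counts, and dropout rates are placeholders;
    # the actual WaferCaps configuration is given in Table 2.
    def __init__(self, in_channels=3, dropout=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=9), nn.ReLU(), nn.Dropout2d(dropout),
            nn.Conv2d(64, 128, kernel_size=9), nn.ReLU(), nn.Dropout2d(dropout),
            nn.Conv2d(128, 256, kernel_size=9), nn.ReLU(), nn.Dropout2d(dropout),
        )

    def forward(self, x):
        return self.features(x)  # fed to the PrimaryCaps layer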
3.4. Decision Fusion
Decision fusion refers to combining the outputs of different classifiers into a single decision about the observed activity. Many studies have shown that a decision fusion approach can significantly improve classification accuracy.
Different classification techniques may assign different labels to the same problem. With decision fusion, multiple classifiers are integrated into a common explanation of an event, and a variety of combination rules can be applied in a flexible manner, improving classification accuracy.
Decision fusion techniques can be divided into several types based on their architecture: serial, parallel, and hybrid decision fusion. In serial decision fusion, classifiers are arranged one after another, with the output of each fed into the next. In parallel decision fusion, several classifiers perform classification simultaneously and their results are then combined. Hybrid decision fusion is a hierarchy-based classification process [27].
In this study, three WaferCaps-based networks are combined in a parallel decision fusion process. As shown in Figure 4, these parallel branches use optical, X-ray, and fused images to provide a final decision, and Algorithm 1 gives a general view of this process. As will be seen from the results, some networks predict the class of certain components with a higher probability than others; integrating the three networks therefore improves the accuracy of the final classification across all classes. The combined classifier is composed of three individual classifiers and a selection rule.
Three WaferCaps-based classifiers, trained on three different datasets composed of optical, X-ray, and fused images, form the first layer. In the second layer, selection rules are applied to the outputs of the individual classifiers to produce the final classification result. Every classifier outputs the probability of each component class as a decimal number between 0 and 1, which represents its confidence level.
In the first layer, $X_p$, $O_p$, and $F_p$ denote the probabilities of the predicted classes, and $X_c$, $O_c$, and $F_c$ denote the classes predicted by the networks trained on X-ray, optical, and fused images, respectively. In the second layer, Algorithm 1 describes the selection rules. Applying these rules to the outputs of the three classifiers yields high accuracy because the advantages of all three classifiers are combined. The thresholds used in the rules are determined by trial and error.
Algorithm 2 summarizes the entire methodology in structured pseudocode, presenting all steps of our approach in a single sequence so that the logic and operations are clearly delineated and easily interpretable.
Algorithm 1 Selection rules.
1: procedure Selection(Oc, Op, Xc, Xp, Fc, Fp) ▹ Oc, Xc, and Fc represent the classes predicted by the networks trained on optical, X-ray, and fused images, respectively; Op, Xp, and Fp represent the corresponding probabilities of the predicted classes.
2:   if ⟨condition⟩ AND ⟨condition⟩ AND ⟨condition⟩ then
3:     predictedClassLabel ← classProbabilities[max(Op, Xp, Fp)] ▹ Assume that 'classProbabilities' is a dictionary linking probabilities to class labels
4:   else if (⟨class⟩ ≠ ⟨class⟩) AND (⟨class⟩ ≠ ⟨class⟩) AND (⟨probability threshold condition⟩) AND (⟨probability threshold condition⟩) then
5:     predictedClassLabel ← ⟨class⟩
6:   else if (⟨class⟩ ≠ ⟨class⟩) AND (⟨class⟩ ≠ ⟨class⟩) AND (⟨probability threshold condition⟩) then
7:     predictedClassLabel ← ⟨class⟩
8:   else
9:     predictedClassLabel ← ⟨class⟩
10:  end if
11:  return predictedClassLabel
12: end procedure
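Because the exact threshold values and the precise rule conditions are tuned by trial and error and are not listed here, the following Python sketch shows only one possible concrete form of such selection rules; all thresholds and tie-breaking choices are assumptions.

def select_class(o_class, o_prob, x_class, x_prob, f_class, f_prob,
                 t_all=0.5, t_single=0.9):
    # Possible concretization of Algorithm 1; thresholds are assumed values.
    candidates = {o_prob: o_class, x_prob: x_class, f_prob: f_class}
    # Rule 1: every classifier is reasonably confident -> take the most confident one.
    if o_prob >= t_all and x_prob >= t_all and f_prob >= t_all:
        return candidates[max(candidates)]
    # Rule 2: the optical network disagrees with the other two but is highly confident.
    if o_class != x_class and o_class != f_class and o_prob >= t_single and o_prob > f_prob:
        return o_class
    # Rule 3: the X-ray network disagrees with the other two but is highly confident.
    if x_class != o_class and x_class != f_class and x_prob >= t_single:
        return x_class
    # Default: fall back to the classifier trained on fused images.
    return f_class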
Algorithm 2 PCB component classification.
1: procedure PCB_Component_Classification
2:   opticalImage ← CaptureOpticalImage
3:   xrayImage ← CaptureXrayImage
4:   opticalIHS ← ConvertToIHS(opticalImage)
5:   IHSfusedImage ← FuseImages(opticalIHS, xrayImage, 'VGG-19')
6:   fusedImage ← ConvertToRGB(IHSfusedImage)
7:   opticalComponents ← ComponentExtraction(opticalImage)
8:   xrayComponents ← ComponentExtraction(xrayImage)
9:   fusedComponents ← ComponentExtraction(fusedImage)
10:  opticalClass ← Classify(opticalComponents, 'WaferCaps')
11:  xrayClass ← Classify(xrayComponents, 'WaferCaps')
12:  fusedClass ← Classify(fusedComponents, 'WaferCaps')
13:  finalDecision ← DecisionFusion(opticalClass, xrayClass, fusedClass)
14:  return finalDecision
15: end procedure
16: function CaptureOpticalImage
17:   // Capture the optical image using a camera setup
18: end function
19: function CaptureXrayImage
20:   // Capture the X-ray image using X-ray equipment
21: end function
22: function ConvertToIHS(image)
23:   // Convert the RGB image to the IHS color space
24: end function
25: function FuseImages(opticalIHS, xrayImage, method)
26:   // Apply the image fusion algorithm to the intensity component of the IHS optical image and the X-ray image
27: end function
28: function ConvertToRGB(image)
29:   // Convert the IHS image back to the RGB color space after fusion
30: end function
31: function ComponentExtraction(image)
32:   // Extract components from the image (single-component images have already been extracted from the PCB images and labelled)
33: end function
34: function Classify(image, method)
35:   // Classification process using the specified method (e.g., WaferCaps)
36: end function
37: function DecisionFusion(opticalClass, xrayClass, fusedClass)
38:   // The final decision is made based on Algorithm 1
39: end function