1. Introduction
In computer vision and object recognition applications, the extraction of texture features plays a significant role [1]. A machine learning algorithm is trained to recognize objects using texture features extracted from the image. Texture analysis is important in numerous applications, including the analysis of satellite or aerial imagery, facial recognition, biometric object recognition, texture enhancement, robot vision for unmanned aerial vehicles, texture synthesis for computer graphics, and image compression [2]. Numerous texture extraction techniques have been proposed since 1960 [3]. These methods convert an image's texture into a feature vector that describes its characteristics; this feature vector can then be applied to later tasks such as texture classification. Because an image's texture carries spatial context information, analyzing the pixel neighborhood is necessary to capture that context. Most methods turn an image into a collection of small-scale local features and then aggregate them into a global representation using operations such as sum, max, and min.
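This aggregation step can be sketched in a few lines (a toy illustration; the helper name and feature values are not from the paper):

```python
def global_pool(local_feats, op=max):
    """Aggregate per-patch feature vectors into one global descriptor
    by applying op (sum, max, min, ...) to each dimension independently."""
    return [op(dim) for dim in zip(*local_feats)]

# Three local 4-D texture features pooled into a single global descriptor.
patches = [[0.2, 0.9, 0.1, 0.4],
           [0.7, 0.3, 0.5, 0.4],
           [0.1, 0.6, 0.8, 0.2]]
max_pooled = global_pool(patches, max)   # -> [0.7, 0.9, 0.8, 0.4]
sum_pooled = global_pool(patches, sum)
```

The choice of pooling operator trades invariance against information loss: max pooling keeps only the strongest local response per dimension, while sum pooling preserves aggregate magnitude.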
Existing texture feature extraction algorithms can be classified into two categories: traditional and learning-based [4]. Traditional feature extraction algorithms extract various statistical, structural, spectral, and model-based features from the image, but these features are not adaptive to fine-level fabric texture differences and lack specialization to the regions that particular applications care about. Learning-based approaches, which have recently been proposed for texture feature extraction, can be categorized as vocabulary learning, extreme learning, and deep learning approaches, and they offer higher capability than traditional feature extraction algorithms. Among them, deep learning-based approaches have recently gained importance due to their ability to learn intricate features without the extensive handcrafting required by traditional feature extraction algorithms. However, although deep learning approaches avoid handcrafting, they still suffer from overly generalized learning and lack selective, intricate learning focused on the specialization regions desired by applications. This work proposes a solution to this problem using attention-based deep learning feature extraction. The proposed solution identifies the specialization regions in the image through frequency domain analysis, and an LBP-based convolutional kernel is designed to extract more intricate features at salient regions than at other regions. The novel contributions of this work are as follows.
(i) A novel specialization region selection algorithm based on frequency domain analysis using Quaternion wavelet transform;
(ii) A novel LBP convolutional kernel to extract more intricate features at the specialization regions.
The discriminative ability of features for object recognition applications improves with specialization and more intricate features at the specialization region.
This paper is organized as follows: Section 2 details the proposed attention-based deep-learning texture feature extraction technique; Section 3 provides the results of the discriminative ability of the proposed deep learning feature for different applications; Section 4 presents the conclusion and scope for further research.
2. Materials and Methods
Andrearczyk et al. [5] introduced a deep learning-based technique for texture feature extraction, replacing traditional filters. They modified the CNN architecture to reduce shape emphasis, but this led to higher-dimensional features and lacked compactness. They also used a modified AlexNet model for image texture extraction [6].
Lin et al. [7] studied CNN models for texture feature extraction, finding bilinear models superior but computationally complex. Li et al. [8] combined deep learning with Gabor wavelets to extract rotation-invariant texture features without specific weights. Simple et al. [9] proposed a novel texture feature as a Fisher vector, but lacked details on mapping features to specific regions. Liu et al. [10] presented a CNN-based method that lacked compactness and region mapping. Dixit et al. [11] combined deep learning with the whale optimization algorithm.
Kociołek et al. [12] used a CNN for texture directionality detection, but found it lacked granularity and compaction support. Sabine et al. [13] explored deep CNN models, but did not target specific regions. Zhang et al. [14] combined convolutional and encoding layers, experimenting with various encoders, but did not focus on extracting features from specific regions. Sabino et al. [15] proposed a multilayer network for texture feature extraction, but faced computational issues. Barburiceanu et al. [16] extracted textures from deep learning models, but lacked region specification. Anwer et al. [17] combined LBP with a deep learning model to create TEX-Nets, but the fusion did not consider specific image regions.
Jia et al. [18] used a two-stage recurrent neural network to extract shape and texture features, while Kasthuri et al. [19] combined deep learning with Gabor filters for face recognition. Simon et al. [20] combined deep architecture features with luminance information. Bello et al. [21] utilized a CNN to extract color texture features, revealing superior discriminative ability compared to hand-crafted descriptors.
This survey of deep learning methods for extracting texture features found that none are region-specific. However, for applications like fabric defect detection and hairline-defect detection in manufacturing, it is crucial not to extract features uniformly across the entire image, but to focus on areas where defects are more likely to occur. Such specialized feature extraction can significantly improve the accuracy of defect classification.
The proposed solution uses frequency domain analysis to generate salient regions from an image, an unsupervised clustering algorithm to build the attention matrix, and a modified CNN with an LBP convolutional kernel for texture feature extraction, as shown in Figure 1.
2.1. Proposal Generation
The HSI transform is applied to the input RGB image before performing frequency domain analysis (FDA). The HSI transform of an RGB color image is performed as follows:

$$I = \frac{R+G+B}{3}, \qquad S = 1 - \frac{3\min(R,G,B)}{R+G+B}, \qquad H = \begin{cases} \theta, & B \le G \\ 360^\circ - \theta, & B > G \end{cases}$$

where $\theta$ is given as

$$\theta = \cos^{-1}\!\left(\frac{\frac{1}{2}\left[(R-G)+(R-B)\right]}{\sqrt{(R-G)^2+(R-B)(G-B)}}\right)$$

where $R$, $G$, $B$ are the pixel values for the red, green and blue components of the image.
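A per-pixel sketch of this RGB-to-HSI conversion (pure Python; it assumes RGB components normalised to [0, 1], and the function name is illustrative):

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H, S, I).
    H is in degrees; S and I are in [0, 1]."""
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    # Hue via the arccos formulation; undefined (set to 0) for grey pixels.
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if b <= g else 360.0 - theta
    return h, s, i
```

For example, a pure red pixel maps to hue 0°, full saturation, and intensity 1/3; a grey pixel has zero saturation and an undefined hue.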
The HSI image can be represented in pure quaternion form as

$$f(n,m) = H(n,m)\,i + S(n,m)\,j + I(n,m)\,k$$

where $(n,m)$ is the location of the pixel and $H(n,m)$, $S(n,m)$, $I(n,m)$ are the hue, saturation, and intensity of the pixel at $(n,m)$. The value of $\mu$ below is selected as a unit pure quaternion satisfying $\mu^2 = -1$; a common choice is $\mu = (i+j+k)/\sqrt{3}$.

The quaternion Fourier transform for frequency $(u,v)$ is performed as

$$F(u,v) = \frac{1}{\sqrt{MN}} \sum_{n=0}^{M-1} \sum_{m=0}^{N-1} e^{-\mu 2\pi \left(\frac{nu}{M} + \frac{mv}{N}\right)}\, f(n,m)$$
The two-dimensional Gaussian quaternion high-pass filter attenuates low-frequency components while retaining the high-frequency content that carries edge and texture detail. The Gaussian quaternion high-pass filter function is defined as

$$H_{hp}(u,v) = 1 - e^{-D^2(u,v)/2\sigma^2}$$

where $\sigma$ is a measure of the Gaussian spread and $D(u,v)$ is the distance of the point $(u,v)$ from the centre of the frequency rectangle $(M/2, N/2)$. It is calculated as

$$D(u,v) = \sqrt{\left(u - \frac{M}{2}\right)^2 + \left(v - \frac{N}{2}\right)^2}$$
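A minimal sketch of building this high-pass mask in the centred frequency domain (the function name and grid size are illustrative, not from the paper):

```python
import math

def gaussian_highpass(M, N, sigma):
    """M x N Gaussian high-pass mask: H(u, v) = 1 - exp(-D^2(u, v) / (2 sigma^2)),
    where D(u, v) is the distance from the centre (M/2, N/2) of the frequency
    rectangle. The mask is 0 at the centre and approaches 1 far from it."""
    mask = [[0.0] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            d2 = (u - M / 2) ** 2 + (v - N / 2) ** 2
            mask[u][v] = 1.0 - math.exp(-d2 / (2.0 * sigma * sigma))
    return mask
```

The mask is applied by element-wise multiplication with the (centred) transform coefficients before the inverse transform.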
Compared to an ideal filter, the Gaussian high-pass filter offers a smoother transition without ringing artifacts. The filtered coefficients are then processed using the inverse quaternion Fourier transform, given as

$$f(n,m) = \frac{1}{\sqrt{MN}} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} e^{\mu 2\pi \left(\frac{nu}{M} + \frac{mv}{N}\right)}\, F(u,v)$$

The inverse-transformed image is evaluated for salient regions based on color contrast, and regions with sharp color contrast are marked as candidate proposal regions. The probability of a salient region being a proposal region is computed from this contrast, weighted by a factor $W$ calculated from the area of the superpixel segment and its spatial similarity. When this probability is greater than the threshold, the salient region is selected as a proposal region.
2.2. Attention Matrix Generation
The attention matrix is used to drive feature learning with different intensities across regions. A dataset of images is collected, and each image is split into an $n \times n$ grid. A binary matrix representing each grid is created, with a cell set to 1 if it falls inside a proposal region and 0 otherwise. K-means clustering is performed on the binary matrices, and the attention matrix is generated from the binary matrices of the higher-density cluster.
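The clustering step might be sketched as follows (a toy k-means over flattened 0/1 grid vectors; the initialisation and helper names are assumptions, not the paper's implementation):

```python
def kmeans_binary(grids, k=2, iters=10):
    """Toy k-means over flattened 0/1 grid vectors. Centroids are initialised
    with the first k vectors (assumed distinct); returns cluster labels."""
    cents = [list(v) for v in grids[:k]]
    assign = [0] * len(grids)
    for _ in range(iters):
        for idx, v in enumerate(grids):
            assign[idx] = min(range(k),
                              key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(v, cents[c])))
        for c in range(k):
            members = [v for idx, v in enumerate(grids) if assign[idx] == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def attention_from_cluster(grids, assign, label):
    """Element-wise majority vote over the grids of the chosen cluster."""
    members = [v for idx, v in enumerate(grids) if assign[idx] == label]
    return [1 if sum(col) >= len(members) / 2 else 0 for col in zip(*members)]
```

In practice, the higher-density cluster can be picked as the label that occurs most often in `assign`, and its majority vote becomes the attention matrix.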
2.3. Modified CNN with LBP Kernel
This study proposes a modified CNN whose convolutional kernel combines a Gaussian response with LBP, convolving pixel regions with a new kernel matrix to learn intricate features. For an image region marked by the attention matrix, a multi-scale representation is formed by applying a Laplacian of Gaussian (LoG) operator, calculated as

$$\nabla^2 G(x,y) = -\frac{1}{\pi\sigma^4}\left[1 - \frac{x^2+y^2}{2\sigma^2}\right] e^{-\frac{x^2+y^2}{2\sigma^2}}$$

The LoG response is usually 0 for a uniform image region and becomes positive or negative depending on the darkness level of the image. At each scale, the LBP is calculated as

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $g_c$ is the centre pixel and $g_p$ are its $P$ neighbours at radius $R$. The LBP map of each scale is logically ANDed with the region mask to generate the feature map.
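A pure-Python sketch of a basic 3 × 3 LBP code and the region masking (helper names are illustrative; the paper's multi-scale variant would repeat this per scale):

```python
def lbp_code(patch):
    """8-neighbour LBP code of the centre pixel of a 3 x 3 patch;
    neighbours are visited clockwise starting at the top-left."""
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(order):
        if patch[row][col] >= c:
            code |= 1 << bit
    return code

def lbp_and_region(lbp_map, region_mask):
    """Keep LBP responses only where the attention/region mask is 1."""
    return [[v if m else 0 for v, m in zip(vr, mr)]
            for vr, mr in zip(lbp_map, region_mask)]
```

A uniform patch yields the all-ones code 255 (every neighbour ties with the centre), while a bright centre surrounded by darker pixels yields 0.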
The effect of the LoG is approximated using a discrete convolutional kernel, as shown in Figure 2a. The 2-D LoG for different values of $\sigma$ is shown in Figure 2b.
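One common discrete approximation samples the 2-D LoG on a small grid and shifts it to zero sum, so that a perfectly uniform region produces zero response (a sketch; not necessarily the exact kernel of Figure 2a):

```python
import math

def log_kernel(size, sigma):
    """Sample the 2-D Laplacian of Gaussian on a size x size grid, then shift
    it to zero sum so a perfectly uniform region produces zero response."""
    half = size // 2
    k = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            r2 = (x - half) ** 2 + (y - half) ** 2
            k[y][x] = (-(1.0 / (math.pi * sigma ** 4))
                       * (1.0 - r2 / (2.0 * sigma ** 2))
                       * math.exp(-r2 / (2.0 * sigma ** 2)))
    mean = sum(map(sum, k)) / size ** 2
    return [[v - mean for v in row] for row in k]
```

Because the kernel sums to zero, convolving it with a constant patch gives exactly zero, consistent with the behaviour described above.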
The modified CNN uses attention matrix-based convolution to extract texture features from an input image, resulting in a 1024-dimensional texture feature vector. The architecture applies the default convolution in unmarked regions and the LBP-based convolution in marked regions, as shown in Figure 3. The convolution flow discussed so far is summarized in Figure 4.
3. Results
The attention-based deep learning texture feature and SVM classifier were tested on two datasets: the Outex_TC_00013 texture database [22] and the plant village dataset. The texture database offers 68 RGB images in 68 categories, while the plant village dataset contains numerous plant species in both healthy and diseased states. The proposed solution was compared to CNN, transfer learning, a deep convolutional neural network, and Deep Lumina. Texture features were classified using an SVM classifier, and performance was measured in terms of accuracy, precision, recall, and the F1-score [23,24]. Results are presented in Table 1 and Table 2.
The proposed solution has an average accuracy of 97.41%, which is at least 4.78% higher than previous works. As shown in Table 3, the proposed solution consistently outperformed existing works for all nine plant cases.
The proposed features' performance was compared with other deep learning models such as AlexNet, VGG16, and ResNet for the texture dataset, as shown in Table 4. The proposed attention-based deep feature extraction model outperforms these deep learning models by at least 9% for the texture dataset; the corresponding comparison for the plant village dataset is given in Table 5.
As shown in Figure 5, the proposed attention-based deep feature extraction achieved at least 7% higher accuracy than the other deep learning models on the texture dataset. The proposed solution also reduces feature extraction time by 48% compared to the other deep learning models, primarily because the modified CNN uses fewer layers.
The accuracy of the SVM classifier with the proposed feature extraction was tested for different SVM kernels, and the results are given in Table 6. The proposed texture features performed better with the radial-basis-function (RBF) kernel than with the linear and polynomial kernels. The performance of the proposed features with the RBF kernel was measured for various values of C, and the result is given in Figure 6.
The proposed solution reached its peak accuracy at C = 0.1, with no significant increase in accuracy for larger values of C, as shown in Figure 7. The ROC curve plot in Figure 8 reveals a highest ROC area of 90.7%, indicating higher sensitivity. The modified CNN reaches its peak accuracy at epoch 10, indicating fast convergence.