Article

Weight Quantization Retraining for Sparse and Compressed Spatial Domain Correlation Filters

1
Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering (E&ME), National University of Sciences and Technology, Islamabad 44000, Pakistan
2
Institute of Computer Engineering, Technische Universität Wien (TU Wien), 1040 Vienna, Austria
3
Faculty of Computer Engineering, HITEC University, Taxila 47080, Pakistan
4
Division of Engineering, New York University Abu Dhabi (NYU AD), Abu Dhabi 00000, United Arab Emirates
*
Author to whom correspondence should be addressed.
Electronics 2021, 10(3), 351; https://doi.org/10.3390/electronics10030351
Submission received: 13 November 2020 / Revised: 3 January 2021 / Accepted: 4 January 2021 / Published: 2 February 2021
(This article belongs to the Special Issue Compressive Optical Image Encryption)

Abstract

Spatial-domain Correlation Pattern Recognition (CPR) in Internet-of-Things (IoT)-based applications often faces constraints such as inadequate computational resources and limited memory. To reduce the inference workload caused by large spatial-domain CPR filters and to convert filter weights into hardware-friendly data types, this paper introduces power-of-two (Po2) and dynamic-fixed-point (DFP) quantization techniques for weight compression and sparsity induction in filters. Weight quantization retraining (WQR) and the log-polar and inverse log-polar geometric transformations are introduced to reduce quantization error. WQR retrains the CPR filter to recover the accuracy loss: it enforces the given quantization scheme by adding the quantization error to the training samples and then re-quantizes the filter to the desired quantization levels, which reduces quantization noise. Further, Particle Swarm Optimization (PSO) is used to fine-tune parameters during WQR. Both geometric transforms are applied as pre-processing steps. The Po2 quantization scheme performed close to full precision, while DFP quantization came even closer to the Receiver Operating Characteristic of full precision for the same bit-length. Overall, spatially-trained filters showed a better compression ratio for Po2 quantization after retraining of the CPR filter. The direct quantization approach achieved a compression ratio of 8 at 4.37× speedup with no accuracy degradation. In contrast, quantization with the log-polar transform achieved a compression ratio of 4 at 1.12× speedup, with a 16% accuracy degradation. The inverse log-polar transform showed a compression ratio of 16 at 8.90× speedup and a 6% accuracy degradation. All reported accuracies are for a common database.

1. Introduction

Computer vision systems faced many challenges during their early development, and these challenges impeded the target detection performance of artificial vision systems. Human vision can easily distinguish an object despite occlusion, clutter, rotation, lighting conditions, scale, or noise; camera sensors, however, have difficulty coping with these conditions. Multiple efforts have been made in the Correlation Pattern Recognition (CPR) literature to mitigate these hindrances. Usually, the challenges are addressed by building in an appropriate form of rotation and scale invariance, improving the statistical approach used to train the matched filter, extracting and exploiting scale-invariant features for detection, or applying a geometrical transform. However, adding pre-processing steps before the training and inference phases incurs extra computation cost.
Traditionally, CPR filters are trained and tested in the frequency domain. Contrary to that, spatial domain CPR filters are trained in the frequency domain, and they are later converted back to the space domain for inference. The current paper refers to this methodology as frequency-trained (FT). In addition to this approach, this paper also considers complete training and inference in the space domain known as spatially-trained (ST). Inference in the spatial domain is computationally expensive as compared to the frequency domain. It involves cross-correlation between the test image and the reference template. Inference can be performed on various devices, like CPU, Internet-of-Things (IoT) devices, GPU, or ASICs.
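The spatial-domain inference step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cross_correlate`, the synthetic scene, and the random template are all assumptions made for the example. It slides the template over the test image with a stride of 1 and counts the multiply-accumulate operations involved.

```python
import numpy as np

def cross_correlate(image, template):
    """Valid-mode spatial cross-correlation with a stride of 1 (illustrative helper)."""
    ih, iw = image.shape
    th, tw = template.shape
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Each output value costs th * tw multiply-accumulate operations.
            out[r, c] = np.sum(image[r:r + th, c:c + tw] * template)
    return out

rng = np.random.default_rng(0)
template = rng.random((5, 5))            # hypothetical reference template
scene = np.zeros((32, 32))               # synthetic test image with a black background
scene[10:15, 20:25] = template           # embed the target at row 10, column 20

plane = cross_correlate(scene, template)
peak = np.unravel_index(np.argmax(plane), plane.shape)
macs = plane.size * template.size        # total multiply-accumulates for this scene
print(peak, macs)
```

The operation count grows with both the scene size and the template size, which is why sparse or shift-friendly filter weights matter for inference on constrained devices.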

1.1. Motivation and Research Challenges

Computation Cost Associated with Spatial Domain Correlation Filters: In order to handle the false detection under non-uniform lighting conditions, the state-of-the-art CPR [1,2] employed spatial filters instead of a typical approach of training and testing filters in the frequency domain. However, real-time implementation of the spatial filters demands more computational resources than frequency domain filters.
Hardware Implementation Constraints: Intrinsically, embedded systems have limited resources. So, synthesizing the state-of-the-art CPR inference on embedded systems poses many challenges. Hardware is either constrained by the number of operations that can be executed in parallel or by the memory interface transmission rate [3].
Associated Research Challenges: The problem of computation complexity and hardware constraints poses the following challenges.
  • Efficiency and Computational Complexity of Inference due to the Number and Large Sizes of Spatial Domain Correlation Filters: The large size of CPR-trained templates and the number of filters required for each target, especially for out-of-plane training, make the inference phase computationally complex. This complexity is aggravated by limitations and critical requirements such as limited available power, high throughput demand, and hard real-time processing requirements; sparsity can therefore reduce the workload and increase inference efficiency.
  • Memory Requirement of CPR filter Weights: Full-precision filter weights have higher memory requirements, which increases with the size and number of spatial filters. In that case, memory minimization is possible through filter-weight compression.
Consequently, both the above-mentioned challenges increase the number and complexity of operations required to detect the target. To address such challenges, this paper mainly focuses on compression and retraining CPR approaches. Subsequently, the following research gaps should be explored:
  • We need to explore compression techniques for CPR filters and improve the inference computation efficiency; however, reducing the weight precision introduces quantization error, which degrades the classification accuracy. The real challenge is to maintain the classification accuracy while achieving the maximum possible compression ratio.
  • We need to minimize the computation workload for inference without degrading the classification accuracy.
Recent research on CPR applies pre-processing steps before training filters, focusing on accuracy or invariance. These steps are used to achieve zoom, rotation, or translation invariance. Gardezi et al. [1] use the Affine Scale Shift Invariant Feature Transform (ASIFT) along with a spatial correlation filter, which yields a fully invariant filter. Similarly, Awan et al. [4] devise an auto-contour-based technique to reduce the side lobes. This method achieves higher accuracy by segregating the object before correlation with the reference template; however, the mentioned techniques do not target the filters’ compression, efficiency, or memory requirements.
The flow of the proposed techniques is illustrated in Figure 1. Training, validation, and testing samples are read from the database and pre-processed through a geometric transformation (step 1 in Figure 1), which is either the log-polar or the inverse log-polar transform. For direct quantization, training samples are passed through spatial training (ST) (step 2 in Figure 1) or frequency training (FT) (step 3 in Figure 1) before the quantization schemes are applied. Weight quantization retraining (WQR) (step 5 in Figure 1) is applied after ST, and its outcome is then quantized (step 4 in Figure 1) to obtain compressed filters. The quantized filters from all methods are cross-correlated with a testing image to produce a correlation plane, which is used to generate the detection score. Particle Swarm Optimization (PSO) (step 6 in Figure 1) is employed to find and fine-tune the γ and β parameters. Table 1 lists the variables used in this paper and their descriptions.

1.2. Contributions

This paper makes the following contributions:
  • A Weight Quantization Retraining (WQR) (step 5 in Figure 1) method is proposed in this paper to retrain low-precision quantization weights of the CPR filter for dynamic fixed point and power-of-two (step 4 in Figure 1) quantization schemes. Further, the PSO (step 6 in Figure 1) technique is applied to optimize β and γ.
  • Log-polar and inverse log-polar transforms (step 1 in Figure 1) are introduced as the pre-processing strategies to support the low-precision CPR filter quantization.
  • An analysis is performed to compare the advantages of ST filters (step 2 in Figure 1) and FT filters (step 3 in Figure 1). This analysis is further extended to each domain, either spatially-trained or frequency-trained, to investigate the comparative benefits of power-of-two (Po2) and dynamic-fixed-point (DFP) quantization schemes.
  • The overall analysis compares the advantages of direct, log-polar, inverse log-polar, and WQR, which provides a better perspective.

2. Mathematical Background and Related Work

CPR is a matched-filtering [5,6,7] technique. The approach in [8] is one of the correlation-based pattern recognition approaches. Over the last three decades, CPR research has progressively improved the statistical methods used to train reference templates. Typically, the CPR training phase processes different sample images through a statistical training method to prepare a reference template for the target/object. Target localization in the test image uses cross-correlation with a stride of 1 (equivalent to convolution with the template rotated by 180 degrees), which searches for the target/object at each location and produces the output correlation plane. For spatial-domain filtering, each output in the correlation plane has a floating-point operation cost equal to the product of the height and width of the reference template. The presence of the target is identified by the height of the peak in the correlation plane, and the relative class resemblance of the target/object is proportional to that peak height. A larger peak corresponds to a higher probability that the target is present, whereas the absence of the target results in a broader peak. Each reference template must be designed with the trade-off between optimal correlation peak, distortion invariance, and clutter suppression in mind [9,10]. Generally, these traits are regulated by optimized parameters. Preliminary work on the target detection problem [5] was limited to optimal optical correlators [6,7]; however, these basic template designs do not address issues such as distortion and clutter rejection. Synthetic Discriminant Functions (SDFs) [11] were the first decent strategy to deal with these challenges. They are the earliest effort in the overall CPR domain and provide the foundation for further advancements in this field.
The later generalization of SDFs addresses the invariance issue in the filter design [12,13], which partially handles the mentioned challenges. To achieve optimality and invariance [13], the SDF design emphasizes enhancing the signal-to-noise ratio in the proximity of the target, but it allows side-lobes in the proximity of the correlation peak, which complicates estimation and makes it difficult to obtain the optimal threshold value. Casasent et al. [14] generate a bigger training dataset by rotating and shifting each image. After that, the Minimum Average Correlation Energy (MACE) filter [15,16], a hybrid form of MACE, and the minimum variance SDF (MVSDF) filter [17,18] are proposed. These approaches produce sharper peaks in the correlation output plane than their predecessors. Both filters (MACE, MVSDF) respond well to noise; however, their performance is inadequate against distortion. A trade-off must be kept between the parameters that control target detection in clutter and those that control tolerance to distortion of the target. To answer these challenges, a breakthrough in the CPR field was made in the mid-1990s: first the Maximum Average Correlation Height (MACH) filter and then the Optimal Trade-Off MACH filter were introduced. These statistical models optimize the filter response between noise, distortion, and clutter rejection, but they depend excessively on the mean of all the samples. This dependence impedes the classifier’s performance and results in false positives. The Eigen Maximum Average Correlation Height (EMACH) filter [19] mitigates the dependence on the sample average, which relatively improves the classifier accuracy. The Enhanced Eigen Maximum Average Correlation Height (EEMACH) filter [20] further improves the classification accuracy, at the cost of a trade-off.
WMACH [21,22,23] enhances the performance of the reference template by applying a Gaussian wavelet as a pre-processing step before training. Searching for the target in the input scene enables CPR techniques to localize the target accurately. The MMC filter [24] exploits this CPR feature by integrating the Support Vector Machine (SVM) with CPR to pinpoint the target’s location within the input scene: the CPR localizes the target, while the SVM provides generalization over the input. Further enhancement in CPR performance is introduced through partial-aliasing correlation filters [25]. These filtering techniques ensure sharper peaks in the presence of a target/object. The performance improvement comes from handling the aliasing effect that arises from circular correlation, as opposed to linear convolution, and that impedes CPR performance. CPR filters have also been applied to human action recognition [26,27,28,29].
Achuthanunni et al. [30] and Banerjee et al. [31] propose band-pass pre-processing with the Laplacian of Gaussian (LoG) for an unconstrained correlation filter used in facial recognition. The band-pass filtering achieves a trade-off between suppressing irrelevant details and enhancing the edges for feature representation. The proposed technique applies PSO to find the optimum scale. The filter successfully handles challenges such as illumination and noise during face recognition and outperforms other correlation filters. However, it fails to detect targets in the presence of out-of-plane rotation, in-plane rotation, and scale changes. Akbar et al. [32] employ a rotation-invariant correlation filter for moving human detection. The proposed methodology uses color conversion and background elimination as pre-processing to enhance the correlation filters’ speed and accuracy. Akbar et al. [33] propose a hardware implementation of correlation filters on FPGA, which reduces the processing time with negligible performance loss. The hardware design is implemented in LabView and may later be used in real-time security applications. Haris et al. [34] apply the MACH filter to localize the target in videos. The target is tracked using a particle filter, while motion is approximated using the Markov model. An approximate proximal gradient algorithm is applied to limit the object tracking to target templates. Haris et al. [35] implement fast tracking and recognition using a Proximal Gradient (PG) filter and a modified MACH filter. The proposed tracking approach resolves the challenge of detecting the target while its coordinates change. In another work [28], a blended approach is proposed to handle noise, clutter, and occlusion simultaneously. The logarithm transform and DoG are applied as pre-processing steps to achieve this. Further, to produce sharp peaks, a minimum average correlation energy filter is adopted to recognize the target. The results show a remarkable performance of this approach as compared to other correlation filters.
Reducing the data precision is a straightforward approximation technique to lower memory and energy requirements. It also brings some accuracy degradation, so determining the application’s resilience against the error introduced by bit-width reduction is vital for approximation. This approximation is feasible at both the software and the architecture level. Venkataramani et al. [36] propose an approximation approach to reduce the energy requirements of Neural Networks (NNs). Their framework employs back-propagation to convert a standard trained NN into an AxNN, an approximated and energy-efficient version with almost the same accuracy. The method locates the neurons that have the least effect on accuracy and replaces them with approximated equivalents. Different approximate versions of the original NN, each with its own energy-accuracy trade-off, are produced by adjusting the input precision and neuron weights. Retraining is used to recover the accuracy loss caused by the approximations. The authors also propose customized hardware, called a neuromorphic processing engine, that supports flexible weights, topologies, and tunable approximation. This engine exploits the computation and activation units to implement the AxNN and achieve a precision-energy trade-off during execution.
Rubio-Gonzalez et al. [37] propose the Precimonious framework for floating-point precision reduction. This approach finds a lower-precision floating-point data type for each of the program’s variables, subject to given accuracy constraints. For hardware applications, an FPGA implementation requires a code change, while a software application only requires a dedicated library or a modification of the data types. The framework is evaluated on different test programs, which include numerical analysis applications, the Scientific Library, and the Numerical Aerodynamic Simulation (NAS) parallel computing benchmarks. The results demonstrate a 41% improvement in performance from precision reduction in data types. Pandey et al. [38] propose a fixed-point logarithm function approximation implemented on FPGA. The proposed approach approximates the mathematical function with a binary logarithm unit. The hardware combines a fixed-point data path with a combinational logic circuit, enabling low area utilization. This approach is verified using a Xilinx Virtex-5 device. The hardware can approximate integer, mixed integer and fraction, and fractional-only inputs. Table 2 summarizes the comparison between significant works in the CPR literature.

2.1. Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) Filter

The Maximum Average Correlation Height (MACH) filter [39,40] is designed primarily for target/object recognition and, unlike previous methods, simultaneously handles distortion tolerance, discrimination between objects, and noise in the test image. The MACH filter is built on the criteria known as Average Correlation Height (ACH), Average Correlation Energy (ACE), and Output Noise Variance (ONV), together with the Average Similarity Measure (ASM). The MACH filter is derived from the following energy expression:
$$E(h) = \alpha(\mathrm{ONV}) + \beta(\mathrm{ACE}) + \gamma(\mathrm{ASM}) - \delta(\mathrm{ACH}), \tag{1}$$
$$E(h) = \alpha\, h^{+} C h + \beta\, h^{+} D_x h + \gamma\, h^{+} S_x h - \delta\, |h^{T} m_x|, \tag{2}$$
$$h = \frac{m_x}{\alpha C + \beta D_x + \gamma S_x}, \tag{3}$$
where $\alpha$, $\beta$ and $\gamma$ are non-negative filter-tuning parameters, $m_x$ is the average of the training image vectors $x_1, x_2, x_3, \ldots, x_N$, and $C$ is a diagonal power spectral density matrix of additive input noise.
$$D_x = \frac{1}{N}\sum_{i=1}^{N} X_i X_i^{+}, \tag{4}$$
where $X_i$ is a diagonal matrix of the $i$th training image. $S_x$ denotes the similarity matrix of the training images.
$$S_x = \frac{1}{N}\sum_{i=1}^{N} (X_i - M_x)(X_i - M_x)^{+}, \tag{5}$$
where $M_x$ is the mean of the vectors $X_i$. Different values of $\alpha$, $\beta$ and $\gamma$ are optimized to get the required response under different test-image scenarios.
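As a rough sketch of the closed-form OT-MACH expression above, the diagonal matrices can be stored as arrays so that the matrix inverse reduces to an element-wise division. The function name `ot_mach`, the white-noise default for $C$, and the random training stack are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def ot_mach(images, alpha, beta, gamma, noise_psd=None):
    """Frequency-domain OT-MACH sketch; diagonal matrices are stored as 2-D arrays."""
    X = np.fft.fft2(images, axes=(-2, -1))        # spectra X_i of the training images
    m_x = X.mean(axis=0)                          # mean spectrum m_x
    # Assumption: white noise (C = identity) when no power spectral density is given.
    C = np.ones(m_x.shape) if noise_psd is None else noise_psd
    D_x = (np.abs(X) ** 2).mean(axis=0)           # diagonal of D_x
    S_x = (np.abs(X - m_x) ** 2).mean(axis=0)     # diagonal of S_x
    return m_x / (alpha * C + beta * D_x + gamma * S_x)

rng = np.random.default_rng(0)
train = rng.random((6, 16, 16))                   # hypothetical stack of N = 6 images
H = ot_mach(train, alpha=0.01, beta=0.1, gamma=0.3)
h_spatial = np.real(np.fft.ifft2(H))              # spatial-domain template for inference
```

With $\alpha = 1$ and $\beta = \gamma = 0$ the expression collapses to the mean spectrum, which is a convenient sanity check for an implementation.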

2.2. Eigen Maximum Average Correlation Height (EMACH) Filter

EMACH [19] is designed to reduce the false positives caused by over-emphasis on the average training image. The improved statistical method introduces $\beta$ to control the average training image’s contribution to the filter design. The EMACH filter is defined by the criteria $C_x^{\beta}$ and $S_x^{\beta}$.
$$C_x^{\beta} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \beta m)(x_i - \beta m)^{+}, \tag{6}$$
$$S_x^{\beta} = \frac{1}{N}\sum_{i=1}^{N} [X_i - (1-\beta)M][X_i - (1-\beta)M]^{+}, \tag{7}$$
$$J_x^{\beta} = \frac{h^{+} C_x^{\beta} h}{h^{+} (1 + S_x^{\beta}) h}, \tag{8}$$
$$(1 + S_x^{\beta})^{-1} C_x^{\beta} h = \lambda h. \tag{9}$$
The eigenvalue $\lambda$ and the corresponding eigenvector of $(1 + S_x^{\beta})^{-1} C_x^{\beta}$ define the filter.
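The eigenvalue problem above can be illustrated on small vectors. The sketch below makes several simplifying assumptions that are not in the source: samples are short real vectors, the "1" in the equations is read as the identity matrix, $S_x^{\beta}$ is built as a diagonal matrix from the same real samples, and the dominant eigenvector is taken as the filter. The helper name `emach_filter` is hypothetical.

```python
import numpy as np

def emach_filter(samples, beta):
    """Dominant eigenpair of (I + S)^-1 C for real, vectorized samples (one per row)."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    m = x.mean(axis=0)
    dev = x - beta * m
    C = dev.T @ dev / n                                      # C_x^beta, a d x d matrix
    S = np.diag(((x - (1 - beta) * m) ** 2).mean(axis=0))    # S_x^beta, kept diagonal
    A = np.linalg.solve(np.eye(d) + S, C)                    # (I + S)^-1 C without an explicit inverse
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)                                 # pick the largest eigenvalue
    return vals[k].real, vecs[:, k].real, A

rng = np.random.default_rng(1)
data = rng.random((20, 6))                                   # 20 synthetic samples of length 6
lam, h, A = emach_filter(data, beta=0.5)
```

Because $(I + S)^{-1} C$ is similar to a symmetric positive semi-definite matrix, its eigenvalues are real and non-negative, so taking the real part only discards numerical round-off.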

2.3. Log-Polar Transform

The mammalian retina is analogous to the log-polar transform, which converts the standard Cartesian coordinates (x, y) into the log-polar coordinates $\theta$ and $\rho$. The log-polar transform is used for object rotation and scale invariance [41], where scaling and rotation translate into a shift of the peak position in the output correlation plane. The log-polar transform of a color image in RGB format is shown in Figure 2. Note that the transform is applied to each channel separately.
$$z = r e^{i(\theta + \rho)}. \tag{10}$$
In the log-polar domain, the same is represented as:
$$w = \log r + \theta i + \rho i = u + i v + \rho i. \tag{11}$$
In the log-polar domain, $\theta$ corresponds to the rotation. Rescaling an object by a factor $\gamma$ results in a horizontal shift in the mapping, as captured by the following set of equations (see Figure 2):
$$z = r \gamma e^{i\theta} \tag{12}$$
$$w = \log r + \theta i + \log \gamma = u + i v + \log \gamma \tag{13}$$
where $\log \gamma$ corresponds to the horizontal shift.
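The shift property can be checked numerically with the complex logarithm. In this small sketch the sample point, the rotation angle `rho`, and the scale factor `gamma` are arbitrary illustrative values; the angles are kept small enough that the principal branch of the logarithm applies.

```python
import numpy as np

r, theta = 2.0, 0.3           # a sample point in polar form (illustrative values)
rho, gamma = 0.5, 1.5         # rotation angle and scale factor (illustrative values)

z = r * np.exp(1j * theta)
w = np.log(z)                               # w = log r + i*theta = u + i*v

w_rot = np.log(z * np.exp(1j * rho))        # rotate the point by rho
w_scaled = np.log(gamma * z)                # rescale the point by gamma

shift_rot = w_rot - w                       # purely imaginary shift: i*rho
shift_scale = w_scaled - w                  # purely real shift: log(gamma)
print(shift_rot, shift_scale)
```

Rotation moves the point along the $v$ axis and scaling moves it along the $u$ axis, so both distortions become translations that a correlation filter can absorb as a displaced peak.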
This paper is organized into different sections. Each section describes a significant part of our approach in detail. Section 3 demonstrates the methodology and subdivides it into further sections. Section 3.2 describes the details of each quantization scheme. Section 3.3 explains the CPR filters’ re-training using the quantization error. Details of log-polar, inverse log-polar transformation and its support for quantization are mentioned in Section 3.4. The quantization configuration settings are given in Section 3.5. Section 4 provides the details of the experimental analysis, which is further divided into experimental setup, parameter optimization, and performance analysis. In the end, Section 5 concludes the paper.

3. Methodology

3.1. Overview

Figure 3 presents a block diagram of the integrated framework of the proposed approach. The framework accepts $N$ training instances $x_1, x_2, x_3, \ldots, x_N$ and a testing sample $y_i$ as inputs. The instances are passed through the log-map (step 1 in Figure 3) and inverse log-map (step 2 in Figure 3) transforms, which are detailed in Section 3.4. Direct quantization (step 3 in Figure 3) is applied without a transform. From this point onwards, each of these cases branches into the FT, ST, or spatially-retrained category. For an FT filter, the frequency transform (step 4 in Figure 3) is used.
After the Correlation Height (step 7 in Figure 3) and Average Similarity (step 8 in Figure 3) are obtained in the frequency domain, regular training (step 13 in Figure 3) is performed to get $H_{EEMACH}$. The spatial frequency response $h_{EEMACH}^{\text{spatial-frequency}}$ is obtained by applying the Inverse Fourier Transform (step 14 in Figure 3) to $H_{EEMACH}$. Further, the training process is repeated entirely in the spatial domain, where the Spatial Correlation Height (step 5 in Figure 3) and Spatial Average Similarity (step 6 in Figure 3) are calculated. Similarly, for Weight Quantization Retraining, the Modified Spatial Correlation Height (step 9 in Figure 3) and Modified Average Similarity (step 10 in Figure 3) are computed for retraining (step 12 in Figure 3) of the reference filter $h_{RtEEMACH}$. Retraining requires $\beta$, $\gamma$ and the already optimized filter $h_{EEMACH}^{opt}$ with floating-point precision. The details of retraining are available in Section 3.3. The trained templates from all approaches are then quantized (step 15 in Figure 3) using the power-of-two (Po2) and dynamic-fixed-point (DFP) schemes, which are detailed in Section 3.2. Subsequently, the correlation is calculated in the spatial domain as a sliding-window operation, which generates the correlation output plane. Lastly, the detection score is evaluated (step 16 in Figure 3) by post-processing the output correlation plane. The details of post-processing are available in Section 4.

3.2. Quantization Schemes

Quantization schemes convert the pre-trained floating-point precision weights into quantized weights with minimum distance from the original filter; however, the magnitude of distance depends on the type of quantization scheme.
Evaluation: For evaluating the quantization mechanism, two quantization schemes are chosen and evaluated for filter compression. These schemes are power-of-two (Po2) and dynamic-fixed-point (DFP) quantization. The resulting properties of these quantization schemes are studied in conjunction with direct, log-polar, inverse log-polar, and filter retraining.
Power-of-Two Quantization: Po2 is the state-of-the-art quantization technique used for data compression. Zhou et al. [45] implement the Po2 quantization for quantization of deep networks. This technique is employed due to its hardware-friendly nature, which means that multiplication can be performed using a shift operation. This property gives it an advantage over other quantization schemes during spatial cross-correlation. Po2 quantization can be defined using the following mathematical framework:
$$L_{pow2} = [\pm 2^{m_1}, \ldots, \pm 2^{m_2}, 0]. \tag{14}$$
$m_1$ and $m_2$ are integer numbers with
$$m_1 = \mathrm{floor}(\log_2(1.33\, v)), \tag{15}$$
$$v = \max(\mathrm{abs}(f_w)). \tag{16}$$
For a given bit-width (BW), $m_2$ can be mathematically represented as follows:
$$m_2 = m_1 - (2^{BW-1} - 1). \tag{17}$$
Equation (14) presents the proposed quantization levels for different values of $m_1$ and $m_2$; as Equations (15) and (16) show, these values in turn depend on the absolute maximum value of the weights in the filter $f_w$. Overall, the quantization levels depend on the distribution and the maximum absolute value of the weights in the filter. Since the quantization scheme is symmetric, we utilize only $2^{BW-1}$ of the $2^{BW}$ quantization levels. Additionally, we add an extra quantization level at zero, as shown in Equation (14), so the filter may retain high accuracy. Because CPR filters are trained on samples containing a black background, adding the quantization level at zero also increases the sparsity of the trained filters significantly.
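A minimal sketch of the Po2 level construction described above is given below. The level set follows Equations (14) to (16) and the bit-width relation for $m_2$; mapping each weight to its nearest level is an assumption of this sketch, since the section does not state the rounding rule. The helper names `po2_levels` and `quantize_nearest` and the sample weights are hypothetical.

```python
import numpy as np

def po2_levels(weights, bw):
    """Po2 quantization levels for a given bit-width bw, including the zero level."""
    v = np.max(np.abs(weights))                    # Equation (16): largest absolute weight
    m1 = int(np.floor(np.log2(1.33 * v)))          # Equation (15)
    m2 = m1 - (2 ** (bw - 1) - 1)                  # bit-width relation for m_2
    mags = 2.0 ** np.arange(m2, m1 + 1)            # magnitudes 2^m2 ... 2^m1
    return np.concatenate((-mags[::-1], [0.0], mags))

def quantize_nearest(weights, levels):
    """Map each weight to its closest level (assumed rounding rule)."""
    w = np.asarray(weights, dtype=float)
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

w = np.array([0.9, -0.3, 0.01, 0.0])               # toy filter weights
levels = po2_levels(w, bw=2)
q = quantize_nearest(w, levels)
sparsity = np.mean(q == 0.0)                       # fraction of weights mapped to zero
```

Every non-zero level is a signed power of two, so a multiplication by a quantized weight can be replaced by a bit shift and a sign change in fixed-point hardware.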
Dynamic-fixed-point Quantization: This scheme has been successfully applied in many cases to achieve a relatively better compression versus accuracy trade-off [46,47]. Unlike Po2 quantization, DFP quantization has a better peak signal-to-noise ratio (PSNR), which gives it an edge over Po2. Overall, this scheme produces less noise because its quantization levels are equidistant, unlike those of the previous quantization method. Equations (18) and (19) represent the quantization scheme. For a given bit-width (BW), Equation (18) keeps the quantization levels at equal distance from each other.
$$L_{dfp} = [\pm(2^{BW-1} - 1), \pm(2^{BW-1} - 2), \ldots, 0], \tag{18}$$
$$U_{dfp} = \frac{L_{dfp}}{2^{BW-1}} \times 2^{m_1}. \tag{19}$$
Like Po2 quantization, this approach is symmetric, while Equation (19) normalizes and scales the bounded quantization levels of Equation (18).
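The DFP levels can be sketched in the same spirit. The code below assumes that $m_1$ in Equation (19) is computed from the filter weights exactly as in Equation (15), and that weights are mapped to the nearest level, so values beyond the largest level are clipped to it. Neither choice is spelled out in this section; the helper names and sample weights are hypothetical.

```python
import numpy as np

def dfp_levels(weights, bw):
    """Equidistant DFP levels: integer grid of Eq. (18), normalized and scaled as in Eq. (19)."""
    v = np.max(np.abs(weights))
    m1 = int(np.floor(np.log2(1.33 * v)))          # assumed to match Equation (15)
    top = 2 ** (bw - 1) - 1
    grid = np.arange(-top, top + 1)                # symmetric integer levels around zero
    return grid / 2 ** (bw - 1) * 2.0 ** m1        # normalize, then scale by 2^m1

def quantize_nearest(weights, levels):
    """Map each weight to its closest level (assumed rounding rule)."""
    w = np.asarray(weights, dtype=float)
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

w = np.array([0.9, -0.3, 0.01, 0.6])               # toy filter weights
levels = dfp_levels(w, bw=3)
q = quantize_nearest(w, levels)
```

Unlike the Po2 grid, the spacing between neighbouring levels is constant, which is the property the text credits for the lower quantization noise at moderate bit-widths.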
The weight distributions before and after quantization are presented in Figure 4 for both quantization schemes. Comparing Figure 4a,b shows that Po2 has non-uniform quantization levels as compared to DFP, and that more weights are quantized around the zeroth quantization level, which preserves the low-value weights and helps to improve the accuracy. Meanwhile, DFP induces more sparsity than Po2 quantization. This sparsity increases with the compression ratio, as more weights are assigned to the zeroth level when the compression ratio rises.
Figure 5 and Figure 6 relate the quantization schemes to the resulting quantization noise. For DFP quantization, Figure 5 illustrates the relationship between the peak signal-to-quantization-noise ratio (PSNR) and the compression ratio. A sample is taken from the Fashion MNIST dataset and approximated using DFP quantization. With direct DFP applied to the filter, the PSNR remains constant up to 6-bit compression; beyond that, it starts to decrease because the quantization interval doubles with each bit removed as the compression ratio increases. Similarly, DFP quantization has a better PSNR after the log-polar transform, while both quantization schemes have the same PSNR values for 2-bit (CR = 8) to 1-bit (CR = 16) compression. Likewise, in Figure 6, applying Po2 quantization to a sample and monitoring the PSNR at each compression level yields a constant PSNR value up to 3-bit compression. From that point, as the compression ratio increases, new quantization levels are added near the zeroth level, which causes more quantization noise in the compressed version. For direct Po2 quantization, the PSNR gradually falls at 2-bit (CR = 8) and 1-bit (CR = 16) compression. Overall, this method achieves a better PSNR value than direct Po2 quantization, but the behavior reverses after 3-bit compression, where its PSNR for 2-bit (CR = 8) and 1-bit (CR = 16) compression falls more rapidly than that of direct Po2 quantization. Comparing the two quantization approaches shows that DFP quantization achieves better PSNR values than Po2 quantization, whereas the PSNR of Po2 remains insensitive to most compression values. Moreover, for 1-bit compression (CR = 16), the PSNR drops to −40 with DFP quantization but only to −32 with Po2 quantization; Figure 5 and Figure 6 therefore show that the PSNR of Po2 quantization drops less at lower bit-widths than that of DFP quantization.
These observations hint at the superior accuracy of DFP quantization as compared to Po2 up to 5-bit compression. After that, the accuracy of both should be equal for 4-bit compression, while, for 3-bit compression and beyond, Po2 quantization should have greater accuracy.
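The kind of PSNR-versus-bit-width curve discussed above can be reproduced in outline as follows. The section does not give the exact PSNR formula used for Figures 5 and 6, so this sketch assumes the common definition $10\log_{10}(\text{peak}^2/\text{MSE})$ with the peak taken as the largest absolute weight, and it uses a plain symmetric uniform quantizer and random data as stand-ins. Absolute values will therefore differ from the figures; only the downward trend is illustrated.

```python
import numpy as np

def psnr_db(reference, approx):
    """PSNR in dB, with the peak taken as the largest absolute reference value (assumed)."""
    reference = np.asarray(reference, dtype=float)
    mse = np.mean((reference - np.asarray(approx, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(np.max(np.abs(reference)) ** 2 / mse)

def uniform_quantize(x, bw):
    """Symmetric uniform quantizer, used only as a stand-in for this sketch."""
    top = 2 ** (bw - 1) - 1
    scale = np.max(np.abs(x)) / top
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
filt = rng.standard_normal((16, 16))               # synthetic stand-in for a trained filter
curve = {bw: psnr_db(filt, uniform_quantize(filt, bw)) for bw in (8, 6, 4, 2)}
for bw, value in curve.items():
    print(f"{bw}-bit: {value:.1f} dB")
```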

3.3. Retraining the CPR Filter

Quantization error makes the trained filters inaccurate. Fine-tuning the quantized trained filter through retraining is one way to obtain more accurate filters. A mathematical framework is proposed for retraining that adds a quantization error term to an already defined statistical training method. Equation (20) adds the quantization error of the trained filter, $h_{eq}$, to each sample $x_i$. Section 3.2 showed that the PSNR degrades as the compression ratio increases, and that this degradation differs between the Po2 and DFP quantization schemes because of their quantization noise. To compensate for the effect of these quantization approaches, the quantization error coefficient $\xi$, with $0 < \xi < 1$, controls the contribution of the quantization error $h_{eq}$ to the filter design. The Modified Average Image Correlation Height (mAICH) criterion has the additional term $h_{eq}$ because it is added to each sample $x_i$ as well as to the average of the samples, $m$. Equation (22) presents the mAICH after substitution of $v_i$ and $m_h$.
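$$ \mathrm{mAICH} = \frac{1}{N}\sum_{i=1}^{N}\left(h^{+}(x_i + \xi h_{eq})\right)^{2} - \alpha\left(h^{+}(m + \xi h_{eq})\right)^{2}, \tag{20} $$
where $h^{+}$ is the complex conjugate transpose of $h$.
$$ v_i = x_i + \xi h_{eq}, \qquad m_h = m + \xi h_{eq}, \tag{21} $$
$$ \mathrm{mAICH} = \frac{1}{N}\sum_{i=1}^{N}(h^{+} v_i)^{2} - \alpha (h^{+} m_h)^{2}, \tag{22} $$
$$ mC^{x}_{\beta} = \frac{1}{N}\sum_{i=1}^{N}[v_i - \beta m_h]^{+}[v_i - \beta m_h], \tag{23} $$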
where $\beta$ denotes the contribution of $m_h$ in Equation (23). In Equation (23), $mC^{x}_{\beta}$ is the average of the correlation peak intensities of the $(v_i - \beta m_h)$ samples. Ideally, all training images should follow this convention, in which a partial average of the training samples is subtracted from $v_i$. To achieve this, every sample $v_i$ should have an output correlation plane identical to the ideal output correlation plane $f$. To find the $f$ that suits the correlation output planes of all samples, the deviation between $f$ and these correlation planes should be minimal. Equation (24) describes this deviation as the average square error (ASE).
The average square error between $f$ and $g_i$ is given in Equation (24):
$$ \mathrm{ASE} = \frac{1}{N}\sum_{i=1}^{N}(g_i - f)^{+}(g_i - f), \tag{24} $$
where
$$ g = (1-\beta) M_h h^{*}, \tag{25} $$
where $h^{*}$ is the complex conjugate of $h$. To minimize the ASE, its partial derivative with respect to $f$ should be equal to zero, as given in Equation (26):
$$ \frac{\partial\,\mathrm{ASE}}{\partial f} = 0. \tag{26} $$
Solving Equation (26) and substituting $g_i$ gives the optimum, $f_{opt}$, in Equation (27):
$$ f_{opt} = \frac{1}{N}\sum_{i=1}^{N} g_i = (1-\beta) M_h h^{*} = (1-\beta)(M + \xi H_{eq}) h^{*}, \tag{27} $$
where $M_h = M + \xi H_{eq}$. $H_{eq}$ is a diagonal matrix having $h_{eq}$ along its main diagonal, and $M$ is a diagonal matrix having $m$ along its diagonal. Substituting Equation (21) into Equation (23), we get:
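$$ mC^{x}_{\beta,\gamma} = \frac{1}{N}\sum_{i=1}^{N}[x_i - \beta m + \gamma h_{eq}]^{+}[x_i - \beta m + \gamma h_{eq}], \tag{28} $$
$$ \gamma = \xi(1-\beta). $$
The next step is to modify the Average Similarity Measure (ASM), which measures the dissimilarity of the training samples to $(1-\beta) M_h h^{*}$; the resulting measure is referred to as the modified ASM, or mASM.
$$ \mathrm{mASM} = \frac{1}{N}\sum_{i=1}^{N}[V_i h^{*} - (1-\beta) M_h h^{*}]^{+}[V_i h^{*} - (1-\beta) M_h h^{*}] = h^{T}\left[\frac{1}{N}\sum_{i=1}^{N}[X_i - (1-\beta) M + \beta\xi H_{eq}]^{*}[X_i - (1-\beta) M + \beta\xi H_{eq}]\right] h^{*} = h^{T}\, mS^{x}_{\beta,\gamma}\, h^{*} = h^{+}\, mS^{x}_{\beta,\gamma}\, h, $$
where $V_i = X_i + \xi H_{eq}$ and $X_i$ is a diagonal matrix having the vector $x_i$ along its main diagonal.
Similarly,
$$ mS^{x}_{\beta,\gamma} = \frac{1}{N}\sum_{i=1}^{N}[X_i - (1-\beta) M + \beta\xi H_{eq}]^{*}[X_i - (1-\beta) M + \beta\xi H_{eq}], \tag{30} $$
where $mS^{x}_{\beta,\gamma}$ is a diagonal matrix.
$$ mJ^{x}_{\beta} = \frac{h^{+}\, mC^{x}_{\beta}\, h}{h^{+}(1 + mS^{x}_{\beta})\, h}, \tag{31} $$
$$ (1 + mS^{x}_{\beta})^{-1}\, mC^{x}_{\beta}\, h = \lambda h. \tag{32} $$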
The eigenvalue $\lambda$ and the corresponding eigenvector of $(1 + mS^{x}_{\beta})^{-1} mC^{x}_{\beta}$ define the filter. In Equations (28) and (30), $mC^{x}_{\beta,\gamma}$ and $mS^{x}_{\beta,\gamma}$ are the modified forms of $C^{x}_{\beta,\gamma}$ and $S^{x}_{\beta,\gamma}$. Figure 7 shows the histograms of the floating-point filter, its quantized version, and the retrained quantized version. The illustration clearly demonstrates the displacement of the weight values of the retrained filter: the intensity values of the filter shift to adjust to the new quantization levels. The complete retraining process for a 3-by-3 snip of the filter is shown in Figure 8a. The floating-point precision filter $h_f$ is transformed into a quantized version, $h_q$. The quantization error $h_{eq}$ is then calculated to support the retraining process. This process yields the filter $h_{rt}$ using Equations (31) and (32), which is a retrained version of the filter in floating-point precision. Finally, its quantized version, $h_{rtq}$, has a reduced quantization error $h_{eq}$. Figure 8a demonstrates the effect of the retraining approach: the quantization error $h_{eq}$ of the retraining method (see the first row in Figure 8a) is less than that of direct quantization (see the second row in Figure 8a). Note that the retrained filter in floating-point precision, $h_{rt}$, alters its weights to reduce the quantization error $h_{eq}$. Figure 8b illustrates a part of the filter before and after the retraining process. Not all weights of the filter change, because the alteration is limited to certain intensities. The above observations confirm that WQR reduces the quantization error $h_{eq}$. We expect that WQR will reduce the accuracy degradation in trained CPR filters due to the quantization process.
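The eigenproblem of Equation (32) can be illustrated with a small numerical sketch. It is not the authors' implementation: it works on real-valued vectors, it assumes that $mC$ is formed as a sum of outer products (so that $h^{+}\, mC\, h$ is a scalar quadratic form), and the function name `retrain_filter` and the toy data are hypothetical.

```python
import numpy as np

def retrain_filter(samples, h_eq, beta, xi):
    """Return the largest eigenvalue of (1 + mS)^-1 mC and its eigenvector.

    samples: (N, d) array, one vectorized training image per row.
    h_eq:    (d,) quantization error of the previously trained filter.
    """
    x = np.asarray(samples, dtype=float)
    m = x.mean(axis=0)
    # v_i - beta * m_h, i.e. x_i - beta*m + gamma*h_eq with gamma = xi*(1 - beta).
    dev_c = (x + xi * h_eq) - beta * (m + xi * h_eq)
    m_c = dev_c.T @ dev_c / len(x)                 # assumed outer-product form
    # Diagonal of mS, built from x_i - (1 - beta)*m + beta*xi*h_eq.
    dev_s = x - (1.0 - beta) * m + beta * xi * h_eq
    m_s = np.mean(dev_s ** 2, axis=0)
    a = m_c / (1.0 + m_s)[:, None]                 # (1 + mS)^-1 mC, row-wise scaling
    eigvals, eigvecs = np.linalg.eig(a)
    k = np.argmax(eigvals.real)                    # dominant eigenpair
    return eigvals[k].real, eigvecs[:, k].real

rng = np.random.default_rng(0)
samples = rng.random((6, 4))                       # six toy "images" of four pixels
h_eq = 0.01 * rng.standard_normal(4)               # toy quantization error
lam, h_rt = retrain_filter(samples, h_eq, beta=0.5, xi=0.3)
```

In a full retraining loop, `h_rt` would then be re-quantized with the chosen scheme; that step is omitted here.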

3.4. Geometric Transform

Quantizing the magnitude of 2-dimensional filters introduces quantization error, which degrades the quality and causes accuracy loss during inference. PSNR measures the ratio between the maximum possible signal power and the noise power, and this ratio estimates the quality after quantization. Equation (33) defines the PSNR. The noise power in the denominator is given by the Mean Square Error (MSE), which is the average of the squared pixel-by-pixel difference between the original image and its approximated version; however, the MSE also depends on the variances of the original and quantized signals. Equation (34) establishes the relationship between the MSE and the variances of the signals. For a higher number of pixels, the variances of the original and estimated images contribute more than the last three terms of the equation. The equation implies that a higher variance of the original signal and of its quantized version results in more error, as shown by the mathematical proof presented in Appendix A.
$$ \mathrm{PSNR} = 20 \log_{10}\left(\frac{MAX_f}{\sqrt{\mathrm{MSE}}}\right), \tag{33} $$
where $MAX_f$ is the maximum possible value for a given bit-width.
$$ \mathrm{MSE} = \sigma_{Y_i}^{2} + \sigma_{\hat{Y}_i}^{2} - \frac{2}{N}\sum_{i}^{N} Y_i \hat{Y}_i + \frac{1}{N^{2}}\left(\sum_{i}^{N} Y_i\right)^{2} + \frac{1}{N^{2}}\left(\sum_{i}^{N} \hat{Y}_i\right)^{2}, \tag{34} $$
where $Y_i$ and $\hat{Y}_i$ denote the original image and the estimated image, respectively, $\sigma_{Y_i}^{2}$ and $\sigma_{\hat{Y}_i}^{2}$ are their variances, and $N$ denotes the total number of pixels in the original image.
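The decomposition in Equation (34) can be checked numerically. The short sketch below compares the direct pixel-by-pixel MSE with the variance-based form on a toy signal; the function names and the sample values are illustrative only, and population variances (division by N) are assumed.

```python
import math
import statistics

def mse_direct(y, y_hat):
    """Average squared pixel-by-pixel difference."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mse_from_variances(y, y_hat):
    """Variance-based form of the MSE, using population variances."""
    n = len(y)
    cross = sum(a * b for a, b in zip(y, y_hat))
    return (statistics.pvariance(y) + statistics.pvariance(y_hat)
            - 2.0 * cross / n
            + (sum(y) / n) ** 2 + (sum(y_hat) / n) ** 2)

y = [0.0, 64.0, 128.0, 255.0]        # toy "original" pixels
y_hat = [0.0, 64.0, 128.0, 192.0]    # toy quantized pixels
assert math.isclose(mse_direct(y, y_hat), mse_from_variances(y, y_hat))
```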
To enhance the PSNR for a given compression ratio, the variance of the signal and of its approximated version must be minimized, which requires some transformation of the original signal. The next two subsections introduce two geometric transforms that reduce this variance.

3.4.1. Reducing the Standard Deviation Using Log-Polar Transform

Sabir et al. [41] have already demonstrated in a previous study that applying the log-polar transform has a negligible influence on classification accuracy. This paper demonstrates an additional property of the log-polar transform besides scale and rotation invariance: it alters the distribution of intensity levels, resulting in a reduced standard deviation in the transformed image compared to the original image. This property may be useful for the quantization of the filter weights. A grayscale sample of a shirt is selected to show the effect of the log-polar transform on the distribution of intensity values. Figure 9a is the picture of a shirt with stripes of various intensity levels. Figure 9c is the histogram of the image, which shows that a large portion of the image has zero intensity, while the 100th intensity level has the second-largest occurrence. The estimated standard deviation of the overall distribution is 63.91. Figure 9b shows the log-polar transform of the picture, which is a distorted form of the image; however, it changes the intensity distribution of the image. The histogram in Figure 9d shows that the transform reduces the frequency of black from 1200 to just 200, while the occurrence of the 100th intensity level changes from less than 200 to ∼280. When the log-polar transform is used, the standard deviation of the image falls from 63.91 to 45.62, which shows that the intensity levels are now distributed more compactly than before. It therefore becomes more efficient and convenient to apply any quantization scheme to represent the intensity levels, because the quantization noise is reduced. For log-polar quantization, the higher PSNR values compared to direct quantization, shown in Figure 5 and Figure 6, confirm the better resilience of this method.

3.4.2. Reducing the Standard Deviation Using Inverse Log-Polar Transform

One of the many properties of the log-polar transform is its reversibility, which means that an image can be converted back to its original form using a 2-dimensional inverse log-polar transform. Figure 10b shows a transformed image; unlike with the previous transform, the transformed object is reduced in size and quality because many horizontal features of the image are almost curbed. The effect of the inverse log-polar transform is demonstrated in Figure 10d. The standard deviation of the image is further reduced to 42.48, but, in this process, the frequency of zero intensity increased to 2300, which is almost double as compared to the log-polar transform.
Equations (35) and (36) show the conversion of the log-polar coordinates $(\rho, \theta)$ into the Cartesian coordinates $x$ and $y$; when $\theta$ varies across its range (0 to $2\pi$), the range of the Cartesian coordinates $x$ and $y$ is 0–r. Figure 10b provides evidence that the frequency of most intensity levels beyond zero is modified and reduced to the minimum level.
$$ x = e^{\rho} \cos\theta, \tag{35} $$
$$ y = e^{\rho} \sin\theta, \tag{36} $$
whereas ρ denotes the logarithm of the distance between the given point and the origin, and θ denotes the angle between the x-axis and the line through the origin and the given point. Based on the mentioned observations, we can expect that applying the log-map and inverse log-map pre-processing will reduce the quantization noise, which indirectly increases the compression ratio of spatial CPR filters.
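The coordinate mapping of Equations (35) and (36) and its forward counterpart can be written as a pair of small functions. This is only a sketch of the point-wise mapping; resampling a whole image onto the new grid, as done in the pre-processing step, is not shown, and the function names are illustrative.

```python
import math

def to_log_polar(x, y):
    """Cartesian point -> (rho, theta); undefined at the origin."""
    rho = math.log(math.hypot(x, y))               # log of distance to origin
    theta = math.atan2(y, x) % (2.0 * math.pi)     # angle in [0, 2*pi)
    return rho, theta

def from_log_polar(rho, theta):
    """Inverse mapping: x = e^rho cos(theta), y = e^rho sin(theta)."""
    r = math.exp(rho)
    return r * math.cos(theta), r * math.sin(theta)

rho, theta = to_log_polar(3.0, -4.0)
x_back, y_back = from_log_polar(rho, theta)        # recovers (3, -4) up to rounding
```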

3.5. Configurations for Weight Quantization

To understand quantization and re-quantization, it is necessary to first understand the different quantization methods and their configurations with or without the transform. Figure 11a illustrates the direct quantization method, in which regular training of the filter $h$ is followed by quantization with either DFP or Po2 for a given bit-width. Figure 11b represents the retraining method, in which intensity levels are reinforced using the retraining process. First, as in direct quantization, the filter $h_q$ is obtained after regular filter training. Then, using $h_q$, a separate retraining process is employed for each quantization approach (DFP and Po2). Finally, after the re-quantization process, $h_{qrt}$ is obtained, which is the quantized form of the retrained filter. In Figure 11c, transforms are applied to support the quantization process. The resulting filter $h$ is transformed using a log-polar or an inverse log-polar transform. For a given bit-width, each quantization technique is then applied to obtain a different filter $h_{ql}$.

4. Experimental Analysis

4.1. Experimental Setup

Figure 12 illustrates the overall experimental setup consisting of different components.

4.1.1. CPR Filter Implementations and Setting

EEMACH [20] and its derivatives [44] showed remarkable performance compared to other CPR filters. The literature shows their superior clutter-rejection capability, and the experiments that showed better results were conducted with $N_v = 1$ rather than other values of $N_v$. The same setting of $N_v$ is applied in our experiments.
We applied the quantization techniques to the filters following the proposed approaches, as shown in the block diagrams of Figure 11a–c. The experimental setup in Figure 12 uses in-house MATLAB scripts and functions for training (Train.m), cross-correlation (conv.m), quantization (Po2.m and DFT.m), WQR (Retraining.m), and geometric transforms (logpolar.m and inverselogpolar.m). The setup trains the filters on datasets 01, 02, and 03, each of which serves a distinct purpose. The parameters β and γ are obtained using PSO: the built-in MATLAB function (Particleswarm.m) is employed, and the cross-validation dataset is used explicitly for this purpose. Initially, the filters are trained (Train.m) either directly or with a geometric transform (logpolar.m or inverselogpolar.m). The optimal values of β and γ generated by the PSO are then employed for retraining (Retrain.m) the CPR filter. Each of these trained filters is quantized with the DFP (DFT.m) or Po2 (Po2.m) scheme. Then, a scale, moving-light, rotation, or classification test is performed using cross-correlation (conv.m). Finally, the evaluation and detection scores produce the analysis graphs separately for each test.

4.1.2. Database

For evaluation, the experimental work is carried out on publicly available datasets [48]. These datasets contain test images with or without a (black) background in different poses, which vary from 0 to 180 degrees out of a plane at different elevation angles. For the training phase, we use images without a background. These training snips are centered in the middle of the test image, which makes them ideal for recognizing correlation patterns. In order to analyze the responses of precision reduction in filters, dataset 01 is specifically used to evaluate the ROC’s comparison of different techniques and methods adopted in this paper. Similarly, to study the precision reduction responses against the scale enhancement and lighting alterations, dataset 02 and dataset 03 are employed, respectively.

4.1.3. Evaluation Framework

To understand the efficiency of the proposed techniques, we initially outlined an appropriate framework for experimental evaluation. After choosing the database, the next step is to define the performance evaluation framework. Instead of performing a lexicographical scan, equal window size for both the filter and the full-test image is considered. The block diagram of this framework is shown in Figure 3. Three different objects are demonstrated in Figure 13. At a 30-degree elevation angle, the images of each object are divided into six sections. Each section has six object images with six consecutive out-of-plane angles. These image sections have successive intervals 0–30, 35–55, 60–90, 95–125, 130–160, and 165–180 degrees with 5-degree incremental gap in each section.
For testing, we use an image at a 50-degree elevation angle. An example is shown in Figure 14. A total of 18 filters are formed, six for each object. To assess the clutter-rejection capability of each filter, we draw 2560 clutter images from the database. The response of each filter has a maximum value called the correlation output peak intensity (COPI). Instead of directly using the raw correlation plane $cp$ and its peak value to decide whether the target object is present, we use a mathematically derived form of the correlation plane output intensity, presented in Equations (37) and (38). This transform accounts for the quality of the correlation plane, which the raw response does not: the maximum value of the output correlation plane alone provides no information about the suppression of side lobes. Consequently, a response with a high raw correlation output peak intensity can have a reduced peak after this processing, whereas a response with a lower raw peak but weaker side lobes can have an increased peak.
$$ \vartheta_j = \frac{cp_j - \mu_{cp_j}}{\sum_{x}\sum_{y} |cp_j|^{2}}, \tag{37} $$
$$ ncp_j = \frac{\vartheta_j}{\sigma_{\vartheta_j}}. \tag{38} $$
Here, $cp_j$ is the raw correlation plane of the test image $j$, $\mu_{cp_j}$ is its mean, $\sigma_{\vartheta_j}$ is the standard deviation of the mean-subtracted normalized correlation plane $\vartheta_j$, and $ncp_j$ is the normalized correlation plane. The correlation output peak intensity of this plane serves as an object-detection score. Table 3 presents sample COPIs and their corresponding detection scores for both quantization schemes.
$$ \iota = \max(ncp_j), \tag{39} $$
$$ \Delta\% = \frac{(\iota - \tau)}{\iota} \times 100, \tag{40} $$
$$ \tau = 0.5 \times \frac{1}{N}\sum_{i}^{N} ncp_i. \tag{41} $$
Here, $ncp_j$ is the correlation response of the test image $j$, and $\iota$ in Equation (39) is its absolute peak correlation intensity. In Equation (40), $\Delta\%$ is the percentage difference between the threshold and the COPI of test response $j$. In Equation (41), the threshold $\tau$ is the average normalized correlation peak response of the training instances $i = 1, 2, 3, \ldots, N$, multiplied by a factor of 0.5.
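The detection score of Equations (37) to (41) can be sketched as follows. The sketch assumes that the threshold $\tau$ averages the peaks of the normalized training planes, as the text describes, and the function names and the toy correlation plane are hypothetical. Because the plane is divided by its own standard deviation in the last step, any positive scaling applied beforehand cancels out.

```python
import numpy as np

def normalized_plane(cp):
    """Mean-subtract, scale by the plane energy, then divide by the std."""
    cp = np.asarray(cp, dtype=float)
    v = (cp - cp.mean()) / np.sum(np.abs(cp) ** 2)
    return v / v.std()            # undefined for a perfectly flat plane

def detection_score(cp_test, cp_train):
    """Return (iota, tau, delta_percent) for one test correlation plane."""
    iota = normalized_plane(cp_test).max()
    # Assumption: tau is half the mean of the training planes' peaks.
    tau = 0.5 * np.mean([normalized_plane(c).max() for c in cp_train])
    delta = (iota - tau) / iota * 100.0
    return iota, tau, delta

plane = np.zeros((5, 5))
plane[2, 2] = 1.0                 # toy plane with a single sharp peak
iota, tau, delta = detection_score(plane, [plane])
```

A test response would be accepted as a detection when `iota` exceeds `tau`, that is, when `delta` is positive.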

4.2. Parameter Optimization

To maximize the filter response, we should select appropriate parameters for each filter. For this purpose, we establish a cross-validation set for each filter. This validation set has a test image from the training set of the corresponding filter as the true class and 100 clutter images, randomly chosen out of the 2560 clutter images, as the false class. A cross-validation set of 600 clutter images is thus defined for each object. Previously, the Peak-to-Side-lobe Ratio and the Fisher Ratio were used to estimate the binary class difference; however, in this paper, we employ a simple ratio of the mean correlation output peak intensity of the false class, $\mu_{ncp}^{F}$, to the mean correlation output peak intensity of the true class, $\mu_{ncp}^{T}$. This peak ratio is given in Equation (42).
$$ P_r = \frac{\mu_{ncp}^{F}}{\mu_{ncp}^{T}}. \tag{42} $$
We select the optimal $\beta$ value as the one with the minimum ratio $P_r$. We search for the optimal $\beta$ across the range 0–1 using PSO for each compression ratio. The quantization process results in a quantization error that is different for the direct quantization, log-polar, inverse log-polar, and filter retraining methods; therefore, each compression ratio holds a different set of optimal parameters.

PSO-Based Optimization of γ and β

The classical PSO is a self-organizing approach that holds the powerful property of dynamic non-linearity, which ensures the trade-off between the positive and negative responses. The positive response supports constructing the swarm structures, while the negative response acts as a counterweight to this construction. Our problem is non-linear, and we want to find the parameters that minimize the objective function, which makes PSO a suitable choice for parameter searching, as in previous literature [36]. Overall, this method provides a stable and complete solution to a non-linear problem. PSO also offers a balance between the exploitation and exploration of a solution. Potential solutions are known as particles, which interact extensively with neighboring particles; this interaction spreads updated information throughout the swarm. The search space of the filter retraining method is not limited to a single parameter. We employed the classical PSO technique to find the optimal values of $\beta$ and $\gamma$ while minimizing the peak ratio $P_r$ as the objective function, where $v_{i,j} = [v_{1,j}, v_{2,j}, \ldots, v_{20,j}]$ is the velocity vector of twenty particles and $p_i = [\gamma_i, \beta_i]$, with $\beta \in [0, 1]$ and $\gamma \in [0, 1]$, for particle $i$ and dimension $j$. Velocities and particles are randomly initialized; each particle is initialized with uniformly random values in the bounded range [0, 1]. For each particle, a filter is separately retrained and cross-correlated with the false and true images. Subsequently, the $P_r$ ratio is calculated using the average peak intensity of the false and true samples defined in the cross-validation dataset. The $P_r$ value is minimized for each filter using the algorithm given in Figure 15. PSO exits either on completion of the maximum number of epochs or when the same minimum value of $P_r$ is obtained for a given number of epochs. Equations (43) and (44) are used to update the velocity and position of each particle.
$$ v_{i,j}^{itr+1} = w\, v_{i,j}^{itr} + s_1\, r()\,(pBest_{i,j} - p_{i,j}^{itr}) + s_2\, R()\,(gBest_{j} - p_{i,j}^{itr}), \tag{43} $$
$$ p_{i,j}^{itr+1} = p_{i,j}^{itr} + v_{i,j}^{itr+1}, \tag{44} $$
where $itr$ denotes the iteration, $i = 1, 2, \ldots, 20$, and $j = 1, 2$. $s_1$ and $s_2$ are acceleration constants, r() and R() are random functions, and $w$, with $0 \le w < 1$, denotes the influence of the motion in the previous iteration. $pBest$ is the particle's best position, while $gBest$ is the global best position.
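The update rules of Equations (43) and (44) can be illustrated with a compact, self-contained sketch. The inertia weight, acceleration constants, iteration limit, and stall criterion below are assumed values chosen for illustration, and the quadratic objective is a hypothetical stand-in for the peak ratio $P_r$, which in the actual experiments requires retraining and cross-correlating a filter for every particle.

```python
import random

def pso_minimize(objective, n_particles=20, n_dims=2, n_iters=200,
                 w=0.7, s1=1.5, s2=1.5, patience=30, seed=0):
    """Classical PSO on the box [0, 1]^n_dims; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_dims)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_dims)]
           for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=p_best_val.__getitem__)
    g_best, g_best_val = p_best[g][:], p_best_val[g]
    stall = 0
    for _ in range(n_iters):
        improved = False
        for i in range(n_particles):
            for j in range(n_dims):
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][j] = (w * vel[i][j]
                             + s1 * rng.random() * (p_best[i][j] - pos[i][j])
                             + s2 * rng.random() * (g_best[j] - pos[i][j]))
                # Position update, clipped to the search range [0, 1].
                pos[i][j] = min(1.0, max(0.0, pos[i][j] + vel[i][j]))
            val = objective(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
                    improved = True
        # Stop early when the global best has not improved for `patience` epochs.
        stall = 0 if improved else stall + 1
        if stall >= patience:
            break
    return g_best, g_best_val

def toy_objective(p):
    """Stand-in for P_r with a known minimum at gamma = 0.3, beta = 0.7."""
    gamma, beta = p
    return (gamma - 0.3) ** 2 + (beta - 0.7) ** 2

best, best_val = pso_minimize(toy_objective)
```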

4.3. Performance Analysis

4.3.1. Rotational Analysis

For comparison, the full precision responses of 16-bit spatially-trained and frequency-trained EEMACH are considered a baseline for the DFP and Po2 quantized trained filters. For each compression ratio, the ST and FT filters are quantized using both the proposed approaches.
Commencing with a compression ratio of 16, the DFP and Po2 quantization schemes are separately analyzed for the spatial and frequency domains. The considered strategies show nearly identical performance graphs. The detection responses of all the quantization schemes are slightly above the threshold. For the direct quantization and retrained filters, the average detection response is ∼23% above the baseline. For the inverse log-map transform, the average response is ∼30% above the baseline in Figure 16a (box 1). The log-map quantization response has a slight dip around 100 degrees in Figure 16a (dip 1) and around 300 degrees in Figure 16a (dip 2). On average, the log-map response is ∼35% below the baseline in Figure 16a (box 2).
With compression ratio 8, the inverse log-map response reduces to ∼15% above the baseline on average in Figure 16e (box 3). The direct and retrained quantization filters show responses that are almost identical to the reference responses. The average response of the log-map pre-processing is ∼40% below the baseline in Figure 16e (box 4). In the second tetrad (e–h), the log-map response remains lower than the threshold in the 50–300 degree interval in Figure 16e (dip 3) for each quantization type. Contrary to the frequency-trained filter, the response of the spatially-trained filters for log-map quantization remains below the threshold within the 75–270 degree interval in Figure 16g (dip 4). Subsequently, for all spatial responses below compression ratio 8, the log-map pre-processing response for both quantization types diminishes below the threshold from 75 to 270 degrees in Figure 16o (dip 5). No significant change is observed in the rest of the responses as compared to the previous cases; apart from the log-map quantization, the responses do not vary significantly.
Regarding the compression ratio of 5.33, the average response of the log-map pre-processing remains ∼30% below the baseline; however, for all compression ratios below 5.33, the log-map pre-processing for FT filters with the Po2 quantization shows a response that diminishes below the threshold in the 0–190 degree interval in Figure 16m (dip 6). For compression ratio 4 or below, the DFP quantization for the FT filter remains below the threshold in the 50–190 degree interval. Consequently, the full precision and direct quantization responses have almost identical curves with a gradual drop in the compression ratio.

4.3.2. Scale and Moving Light Analysis

Dataset 2 includes images of a car at different scales. Some samples are shown in Figure 17. To investigate how well the compressed configurations handle changes in target scale, the filter detection responses are measured over a scale of 0–400% of the original target size and are given in Figure 18. As in the rotational test, each set of four graphs is obtained for the following compression ratios: 16, 8, 5.33, 4, 8, and 1.33.
For the compression ratio of 16, the full-precision detection response for both ST and FT filters is above the threshold up to 125% scale. Beyond this scale, the response mainly revolves around the threshold value; however, the detection response of the inverse log-map is well above the threshold, with a slight fall around 225% scale in Figure 18a (dip 1). For direct and retrained quantization, the detection score is above the threshold up to 350% in Figure 18a (dip 2), whereas log-map pre-processing is only successful up to 80% scale in Figure 18a (dip 3).
When the compression ratio is 8, the detection responses of the curves do not change much, except for the log-map pre-processing, whose response increases and remains above the threshold over the scale of 0 to 400% in Figure 18e (box 1). Overall, compression ratio 8 is found to be more resilient to scale enhancements for each type of quantization.
For the compression ratio of 5.33, the FT filter's detection score for the inverse log-map pre-processing has a slightly deeper dip at almost 225% scale enhancement in Figure 18i (dip 4). This drop stays above the threshold for the spatially-trained filter in Figure 18k (dip 5). On the other hand, for compression ratios below 5.33, this dip (Figure 18m, dip 6) in the detection score goes even deeper for ST and FT filters for both compression schemes, and the log-map pre-processing shows a detection score below the threshold for the FT-quantized filter versions.
Conversely, the detection score remains well above the threshold for the ST versions of the filters. For the rest of the compression ratios, the remaining quantization versions do not considerably alter their detection scores. Overall, the graphs in Figure 18 show that the detection responses of the ST-quantized filters are more resilient to scale enhancements than those of the FT-quantized filters.
Dataset 3 includes more than 1000 car images captured under various lighting conditions. Each image has a background and belongs to a specific set of images produced by incremental rotation from 0 to 360 degrees under a particular light setting around the car, as shown in Figure 19. Overall, the compressed versions of the filter exhibit excellent responses under different lighting conditions. For brevity, the compression ratios of 8 and 16 are shown in Figure 20 and Figure 21, respectively. Generally, for all compression ratios, the log-map and inverse log-map pre-processing exhibit superior performance compared to the baseline. That is equally valid for both ST and FT filters; however, the retrained and direct quantization filter responses are below the baseline. In Figure 20, the responses in each graph are identical for all the quantized instances. In comparison, Figure 21 shows a better response for the FT filter than for the ST filter for both the inverse log-map and the log-map.
Beyond a compression ratio of 8, the performance graphs do not change; however, their responses remain well above the threshold. Table 4 explains the legends in Figure 16, Figure 18, Figure 19, Figure 20 and Figure 21.

4.3.3. ROC Comparative Analysis

Typically, the CPR paradigm’s evaluation analysis is achieved through conventional Receiver Operator Characteristic (ROC). Previously, the EMACH [19] and EEMACH [20] were analyzed in the available literature using the ROC analysis approach, but the issues with the ROC results made it insignificant for statistical analysis and inconsistent for application. Each compression level has a ROC curve, so each is considered a separate classifier; therefore, there are many ROC curves. The full-precision implementation has a distinct ROC curve for each trained method. These FT or ST methods provide a baseline for comparison to the corresponding compression rates. Since each approach holds 16 curves for each trained method, a total of 32 ROC curves are evaluated for each method. All 32 ROC curves should be compared with the corresponding ROC baseline curve representing full precision to find the compression outcomes for each bit-width. In our experimental analysis, Z, D, E, and their corresponding p-values are used to describe the likeness between the baseline ROC and the corresponding compressed versions for each method, for example, direct, log-map, and inverse log-map. E measure may highlight the differences between any two paired ROCs and the E values with their p-values are more significant as compared to Z and D; therefore, for brevity, we have only discussed the bit-widths having a minimum E value. See Appendix B for detailed results.
Figure 22a illustrates the E values and p-values [49,50] of all the above-mentioned methods for ROC comparison. The E value measures the integrated absolute difference between two ROC curves, so a smaller E value indicates that the two ROC curves are closer. A large value of E with p < 0.05 indicates ROC degradation due to the corresponding bit-width compression. For direct quantization, the FT filter has the least E value of 1308 with p-value < 0.92 for both Po2 and DFP. This indicates a strong closeness between the ROCs, which implies classification performance nearly equal to that of the original ROCs. The ST filter has E = 2368 with p-value < 0.1525 for Po2 and E = 1106 with p-value < 0.106 for DFP, which means relatively less closeness as compared to FT. For the log-map transform, the E value of all FT and ST quantization schemes varies between 83,184 and 88,000 with p-value < $2.2 \times 10^{-16}$. This shows classification performance degradation for all bit-width compressions. For the inverse log-map transform, the E value of all FT and ST quantization schemes varies between 29,714 and 32,948 with p-value < $2.2 \times 10^{-16}$. This demonstrates relatively less classification performance degradation for all bit-width compressions as compared to the log-map transform. For WQR, the ST filter has E = 3618 with p-value < 0.028 for Po2 and E = 2442 with p-value < 0.053 for DFP. These bit-widths have better classification performance than the log-map and inverse log-map but lower classification performance than direct quantization.
For comparing ROC curves, the area under the curve (AUC) is assumed as an accuracy measure. In Equation (45), we present parameter Z and p-value to find the difference between the AUC of the two curves.
$$ Z = \frac{\theta_1 - \theta_2}{\sigma_2}. \tag{45} $$
In Equation (45), $\theta_1$ and $\theta_2$ denote the respective AUCs of ROC1 and ROC2, while $\sigma_2$ is the standard deviation of the difference between the two AUCs. Figure 22b illustrates the Z values and p-values [51] of all the above-mentioned methods for AUC comparison. Negative and small values indicate no or an insignificant drop in AUC, while a large value indicates a significant fall in AUC. For direct quantization, the Z value varies from −1.4929 to 0.22072, with p-values showing no significant difference between the AUCs of the quantized versions and the original. For the log-map transform, the Z value for ST and FT varies from 19.357 to 20.232 with p-value < $2.2 \times 10^{-16}$, illustrating a significant AUC loss for all quantization schemes. For the inverse log-map transform, the Z value for ST and FT varies from 9.799 to 11.198 with p-value < $2.2 \times 10^{-16}$, illustrating a less significant AUC loss than the log-map. For the WQR, the ST filter has Z = −1.4954 with p-value < 0.1348 for Po2 and Z = −0.3745 with p-value < 0.708 for DFP, which shows a better AUC for these bit-widths than with the log-map and inverse log-map transforms. Equation (46) gives another measure, D, for comparing the AUCs of ROCs:
$$ D = \frac{V_r(\theta_r) - V_s(\theta_s)}{\sqrt{\sigma_r^{2} + \sigma_s^{2}}}. \tag{46} $$
In Equation (46), $\theta_r$ and $\theta_s$ denote the respective AUCs of the ROC curves $r$ and $s$, while $\sigma_r$ and $\sigma_s$ are the standard deviations of “r” and “s”, respectively. Figure 22c presents the D values and p-values [52] for AUC comparison. The D values further support the results of E and Z. For direct quantization, the D value varies between −1.4993 and 0.22624, and the D values and p-values show no significant AUC loss for any quantization method. For the log-map transform, the D value ranges from 18.697 to 20.433 for all quantization schemes, which indicates substantial AUC degradation. For the inverse log-map transform, the D value ranges from 9.548 to 11.251 for all quantization schemes, which indicates less AUC degradation than with the log-map transform. Figure 22d illustrates the AUC values of each quantization method for both ST and FT filters and their comparison with the baseline AUCs.
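The relationship between a Z statistic and its p-value can be made concrete with a short sketch. It assumes a two-sided test against the standard normal distribution; under that assumption, Z = −1.4954 gives p ≈ 0.1348 and Z = −0.3745 gives p ≈ 0.708, in line with the values reported above for the WQR filters. The function names and the example AUCs are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a Z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def auc_z_test(auc_1, auc_2, sd_diff):
    """Z as the AUC difference divided by the std. deviation of that difference."""
    z = (auc_1 - auc_2) / sd_diff
    return z, two_sided_p(z)

z, p = auc_z_test(0.95, 0.93, 0.02)   # hypothetical AUCs and standard deviation
```

Very large statistics, such as the Z values near 20 reported for the log-map transform, yield p-values far below $2.2 \times 10^{-16}$ under the same assumption.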
To highlight the benefits of quantization for the CPR filters, the performance parameters are shown as column graphs in Figure 23, Figure 24 and Figure 25. The compressed versions of these filters are selected based on the smallest E value. The selected performance parameters are sparsity, CPU execution time, and memory reduction. Here, sparsity refers to the number of zero weights in the trained filters, which implies a reduced floating-point operation workload and a speed-up of the convolution during inference. The sparsity of the corresponding quantization schemes is displayed in Figure 23. On average, the direct and inverse log-map quantization schemes have higher weight sparsity than the full-precision versions. The best case is Po2 compression with a compression ratio of 16, whereas the log-map sparsity is insignificant. In a few instances, the sparsity lags behind that of the full-precision version. The second performance measure is the memory reduction, shown in Figure 24. Again, the inverse log-map and direct schemes have much lower memory requirements than the full-precision version.
On the other hand, the log-map memory requirements are modest. The third measure is the CPU execution time, given in Figure 25. The experiments run on a 64-bit system with a quad-core Intel i5-2500K CPU at 3.30 GHz (3301 MHz) and 16 GB of RAM. The CPU time is measured using the standard tic and toc functions available in MATLAB. Overall, the inverse log-map has the lowest CPU execution time, followed by the direct quantization scheme. The fastest case is Po2 quantization with 16-bit compression, which shows a ∼8.90× speed-up over the full-precision implementation.
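The link between weight sparsity and execution time can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB code used for the measurements: sparsity is expressed here as the fraction of zero weights, the 1-D correlation routines and the test data are hypothetical, and `time.perf_counter` stands in for tic and toc.

```python
import time


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


def correlate_dense(signal, kernel):
    """Valid-mode 1-D correlation that multiplies by every weight."""
    n = len(signal) - len(kernel) + 1
    return [sum(w * signal[i + k] for k, w in enumerate(kernel)) for i in range(n)]


def correlate_sparse(signal, kernel):
    """Same result, but zero weights are dropped before the main loop."""
    taps = [(k, w) for k, w in enumerate(kernel) if w != 0]
    n = len(signal) - len(kernel) + 1
    return [sum(w * signal[i + k] for k, w in taps) for i in range(n)]


# Hypothetical data: a periodic ramp and a kernel with 75% zero weights.
signal = [float(i % 17) for i in range(20000)]
kernel = [0.0, 0.5, 0.0, 0.0] * 16

start = time.perf_counter()            # plays the role of tic
dense = correlate_dense(signal, kernel)
t_dense = time.perf_counter() - start  # plays the role of toc

start = time.perf_counter()
sparse = correlate_sparse(signal, kernel)
t_sparse = time.perf_counter() - start

print(f"sparsity = {sparsity(kernel):.2f}")
print(f"dense: {t_dense:.4f} s, sparse: {t_sparse:.4f} s")
```

Both routines return the same output. The sparse variant simply performs fewer multiply-accumulate operations, which is the effect the sparsity and CPU-time figures describe.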

5. Conclusions

Spatial-domain CPR filters require substantial computation resources and memory. Weight quantization is therefore essential to reduce the computation workload, processing time, and memory footprint. We propose the WQR approach and pre-processing steps, namely the log-map and inverse log-map, to mitigate the accuracy degradation caused by quantizing full-precision weights. WQR regulates the filter retraining process by fine-tuning the weights under any stated quantization scheme, and PSO is used to select the WQR training parameters. Quantization error causes more accuracy loss in the ST filters than in the FT filters, and WQR alleviates this loss. For direct quantization, no accuracy degradation occurs at 9.88–88.54% MAC sparsity and 1.11×–4.73× speed-up, with a maximum compression ratio of 14. The inverse log-map achieves 34.30–94.87% MAC sparsity, 2.57×–8.90× speed-up, and maximum 1-bit compression with 6% accuracy loss, while the log-map achieves 4.25–34.30% MAC sparsity, 0.98×–1.12× speed-up, and maximum 4-bit compression with a 16% decline in accuracy. To study quantization for CPR, the Po2 and DFP quantization approaches are applied. The results show that DFP quantization yields an ROC closer to that of the floating-point filter, while Po2 achieves a greater precision reduction. Based on the results, retraining is unnecessary with DFP quantization, whereas retraining with Po2 yields a larger performance improvement than with DFP. We therefore conclude that Po2 quantization is the preferable choice for CPR. Moreover, multiplication in a Po2-quantized filter can be performed with shift operations, which makes it more hardware-friendly, although it requires optimized hardware. A hardware implementation of Po2 for spatial-domain CPR is recommended for future work. We also leave recovering the accuracy loss of geometric pre-processing combined with Po2 quantization and retraining for future work.
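The remark that a Po2 weight turns a multiplication into a shift can be illustrated with a short Python sketch. It assumes a hypothetical encoding in which each weight is stored as a sign and an integer exponent, and it works on integer pixel values. The function name is illustrative, and this is not a description of any particular hardware design.

```python
def po2_multiply(x, exponent, sign=1):
    """Multiply integer x by sign * 2**exponent using shifts only.

    Assumption: a negative exponent is applied as a right shift, which
    floors the result (no rounding is modelled here).
    """
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return sign * shifted


# Hypothetical pixel value and Po2 weights (+8, -4, +1/4).
print(po2_multiply(13, 3))      # same as 13 * 8
print(po2_multiply(13, 2, -1))  # same as 13 * -4
print(po2_multiply(40, -2))     # same as 40 // 4
```

Because only the sign and exponent need to be stored, the same encoding also accounts for the reduced bit-width of the weights.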

Author Contributions

Conceptualization, methodology and formal analysis, D.S.; supervision, A.H., S.R. and M.S.; writing—original draft, D.S.; writing—review and editing, M.A.H. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was partially funded by National University of Sciences and Technology, Islamabad.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Relationship between Mean Square Error and Variance of Sample and Its Quantized Version

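Here, $Y_i$ and $\hat{Y}_i$ denote the original image and the estimated image, respectively, and $N$ denotes the total number of pixels in the original image. $Y$ is the difference between an original image and its corresponding quantized version, and $\mu_Y$ is the average of the differences between the original and estimated images.
We start from the known identity
$$\sum_{i}^{N}(Y-\mu_Y)^2=\sum_{i}^{N}Y^2-\frac{1}{N}\Big(\sum_{i}^{N}Y\Big)^2,\qquad Y=Y_i-\hat{Y}_i. \tag{A1}$$
The first term on the right-hand side expands as
$$\sum_{i}^{N}Y^2=\sum_{i}^{N}(Y_i-\hat{Y}_i)^2=\sum_{i}^{N}\big(Y_i^2+\hat{Y}_i^2-2Y_i\hat{Y}_i\big)=\sum_{i}^{N}Y_i^2+\sum_{i}^{N}\hat{Y}_i^2-2\sum_{i}^{N}Y_i\hat{Y}_i. \tag{A2}$$
The second term, including its factor $-\frac{1}{N}$, expands as
$$-\frac{1}{N}\Big(\sum_{i}^{N}Y\Big)^2=-\frac{1}{N}\Big(\sum_{i}^{N}(Y_i-\hat{Y}_i)\Big)^2=-\frac{1}{N}\Big(\sum_{i}^{N}Y_i-\sum_{i}^{N}\hat{Y}_i\Big)^2=\frac{1}{N}\Big(-\Big(\sum_{i}^{N}Y_i\Big)^2-\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2+2\Big(\sum_{i}^{N}Y_i\Big)\Big(\sum_{i}^{N}\hat{Y}_i\Big)\Big)=-\frac{1}{N}\Big(\sum_{i}^{N}Y_i\Big)^2-\frac{1}{N}\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2+\frac{2}{N}\Big(\sum_{i}^{N}Y_i\Big)\Big(\sum_{i}^{N}\hat{Y}_i\Big). \tag{A3}$$
By substituting Equation (A2) and Equation (A3) into Equation (A1), we get
$$\sum_{i}^{N}(Y-\mu_Y)^2=\sum_{i}^{N}Y_i^2-\frac{1}{N}\Big(\sum_{i}^{N}Y_i\Big)^2+\sum_{i}^{N}\hat{Y}_i^2-\frac{1}{N}\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2-2\sum_{i}^{N}Y_i\hat{Y}_i+\frac{2}{N}\Big(\sum_{i}^{N}Y_i\Big)\Big(\sum_{i}^{N}\hat{Y}_i\Big). \tag{A4}$$
Equation (A4) can be simplified by substituting $\mu_{Y_i}=\frac{1}{N}\big(\sum_{i}^{N}Y_i\big)$ and $\mu_{\hat{Y}_i}=\frac{1}{N}\big(\sum_{i}^{N}\hat{Y}_i\big)$,
$$\sum_{i}^{N}Y^2-\frac{1}{N}\Big(\sum_{i}^{N}Y\Big)^2=\sum_{i}^{N}(Y_i-\mu_{Y_i})^2+\sum_{i}^{N}(\hat{Y}_i-\mu_{\hat{Y}_i})^2-2\sum_{i}^{N}Y_i\hat{Y}_i+\frac{2}{N}\Big(\sum_{i}^{N}Y_i\Big)\Big(\sum_{i}^{N}\hat{Y}_i\Big), \tag{A5}$$
and, after moving $\frac{1}{N}\big(\sum_{i}^{N}Y\big)^2$ to the right-hand side,
$$\sum_{i}^{N}Y^2=\sum_{i}^{N}(Y_i-\mu_{Y_i})^2+\sum_{i}^{N}(\hat{Y}_i-\mu_{\hat{Y}_i})^2-2\sum_{i}^{N}Y_i\hat{Y}_i+\frac{2}{N}\Big(\sum_{i}^{N}Y_i\Big)\Big(\sum_{i}^{N}\hat{Y}_i\Big)+\frac{1}{N}\Big(\sum_{i}^{N}Y_i\Big)^2+\frac{1}{N}\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2-\frac{2}{N}\Big(\sum_{i}^{N}Y_i\Big)\Big(\sum_{i}^{N}\hat{Y}_i\Big). \tag{A6}$$
By simplifying the above equations, we get
$$\sum_{i}^{N}Y^2=\sum_{i}^{N}(Y_i-\mu_{Y_i})^2+\sum_{i}^{N}(\hat{Y}_i-\mu_{\hat{Y}_i})^2-2\sum_{i}^{N}Y_i\hat{Y}_i+\frac{1}{N}\Big(\sum_{i}^{N}Y_i\Big)^2+\frac{1}{N}\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2. \tag{A7}$$
By multiplying Equation (A7) with $\frac{1}{N}$,
$$\frac{1}{N}\sum_{i}^{N}Y^2=\frac{1}{N}\sum_{i}^{N}(Y_i-\mu_{Y_i})^2+\frac{1}{N}\sum_{i}^{N}(\hat{Y}_i-\mu_{\hat{Y}_i})^2-\frac{2}{N}\sum_{i}^{N}Y_i\hat{Y}_i+\frac{1}{N^2}\Big(\sum_{i}^{N}Y_i\Big)^2+\frac{1}{N^2}\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2. \tag{A8}$$
$\sigma_{Y_i}^2$ and $\sigma_{\hat{Y}_i}^2$ are the variances of the original image and its corresponding quantized version, respectively, so that
$$MSE=\sigma_{Y_i}^2+\sigma_{\hat{Y}_i}^2-\frac{2}{N}\sum_{i}^{N}Y_i\hat{Y}_i+\frac{1}{N^2}\Big(\sum_{i}^{N}Y_i\Big)^2+\frac{1}{N^2}\Big(\sum_{i}^{N}\hat{Y}_i\Big)^2, \tag{A9}$$
where $MSE=\frac{1}{N}\sum_{i}^{N}Y^2$, $\sigma_{Y_i}^2=\frac{1}{N}\sum_{i}^{N}(Y_i-\mu_{Y_i})^2$, and $\sigma_{\hat{Y}_i}^2=\frac{1}{N}\sum_{i}^{N}(\hat{Y}_i-\mu_{\hat{Y}_i})^2$. The PSNR then follows as
$$PSNR=20\log_{10}\frac{MAX_f}{\sqrt{MSE}}. \tag{A10}$$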
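The final MSE relation of this appendix can be checked numerically. The Python sketch below compares the directly computed MSE with the sum of the two variances, the cross term, and the two squared means, and then evaluates the PSNR. The pixel values are hypothetical, and `max_f = 255` assumes 8-bit images, which the appendix does not state.

```python
import math


def mse_direct(y, y_hat):
    """Mean squared difference between original and quantized samples."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mse_decomposed(y, y_hat):
    """MSE written as variances, cross term, and squared means."""
    n = len(y)
    mu_y = sum(y) / n
    mu_h = sum(y_hat) / n
    var_y = sum((a - mu_y) ** 2 for a in y) / n
    var_h = sum((b - mu_h) ** 2 for b in y_hat) / n
    cross = (2.0 / n) * sum(a * b for a, b in zip(y, y_hat))
    # (1/N^2) * (sum Y_i)^2 is the squared mean, and likewise for Y_hat.
    return var_y + var_h - cross + mu_y ** 2 + mu_h ** 2


def psnr(y, y_hat, max_f=255.0):
    """PSNR in dB; max_f is the assumed maximum pixel value."""
    return 20.0 * math.log10(max_f / math.sqrt(mse_direct(y, y_hat)))


# Hypothetical 4-pixel image and its quantized version.
original = [52.0, 55.0, 61.0, 59.0]
quantized = [48.0, 56.0, 64.0, 56.0]

print(mse_direct(original, quantized))
print(mse_decomposed(original, quantized))
print(psnr(original, quantized))
```

The two MSE values agree up to floating-point rounding, which is a quick sanity check on the signs of the individual terms in the derivation.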

Appendix B. Performance Tables

Table A1. Performance measurements of direct Power-of-Two (Po2) quantization scheme.
FT columns: Frequency, AUC = 0.9601852. ST columns: Spatial, AUC = 0.9555699.
| Bit-Width | FT Z | FT p-Value < | FT D | FT p-Value < | FT E | FT p-Value < | FT AUC | ST Z | ST p-Value < | ST D | ST p-Value < | ST E | ST p-Value < | ST AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.2462 | 0.8055 | −0.2473 | 0.8047 | 10,574 | 2.20 × 10⁻¹⁶ | 0.96109 | −0.3126 | 0.7546 | −0.3114 | 0.7555 | 18,590 | 2.20 × 10⁻¹⁶ | 0.95734 |
| 2 | −0.3642 | 0.7157 | −0.3682 | 0.7127 | 1308 | 0.926 | 0.96099 | 2.3705 | 0.01776 | 2.3321 | 0.01977 | 7062 | 2.20 × 10⁻¹⁶ | 0.94758 |
| 3 | 3.9235 | 8.73 × 10⁻⁵ | 3.9435 | 8.03 × 10⁻⁵ | 3960 | 2.20 × 10⁻¹⁶ | 0.95384 | 0.22072 | 0.8253 | 0.22624 | 0.821 | 2368 | 0.1425 | 0.9551 |
| 4 | 4.1619 | 3.16 × 10⁻⁵ | 4.1554 | 3.25 × 10⁻⁵ | 2870 | 2.20 × 10⁻¹⁶ | 0.95552 | 0.70579 | 0.4803 | 0.71656 | 0.4736 | 2914 | 0.1435 | 0.95401 |
| 5 | 4.4951 | 6.95 × 10⁻⁶ | 4.5996 | 4.23 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0428 | 0.297 | 2972 | 0.115 | 0.95325 |
| 6 | 4.4951 | 6.95 × 10⁻⁶ | 4.507 | 6.58 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0751 | 0.2823 | 2972 | 0.117 | 0.95325 |
| 7 | 4.4951 | 6.95 × 10⁻⁶ | 4.398 | 1.09 × 10⁻⁵ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0528 | 0.2924 | 2972 | 0.103 | 0.95325 |
| 8 | 4.4951 | 6.95 × 10⁻⁶ | 4.4449 | 8.79 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0777 | 0.2812 | 2972 | 0.119 | 0.95325 |
| 9 | 4.4951 | 6.95 × 10⁻⁶ | 4.5224 | 6.11 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0657 | 0.2866 | 2972 | 0.111 | 0.95325 |
| 10 | 4.4951 | 6.95 × 10⁻⁶ | 4.5241 | 6.06 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0655 | 0.2866 | 2972 | 0.1195 | 0.95325 |
| 11 | 4.4951 | 6.95 × 10⁻⁶ | 4.5114 | 6.44 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0849 | 0.278 | 2972 | 0.119 | 0.95325 |
| 12 | 4.4951 | 6.95 × 10⁻⁶ | 4.528 | 5.95 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0418 | 0.2975 | 2972 | 0.1225 | 0.95325 |
| 13 | 4.4951 | 6.95 × 10⁻⁶ | 4.4079 | 1.04 × 10⁻⁵ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0544 | 0.2917 | 2972 | 0.1245 | 0.95325 |
| 14 | 4.4951 | 6.95 × 10⁻⁶ | 4.4938 | 7.00 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0746 | 0.2826 | 2972 | 0.1355 | 0.95325 |
| 15 | 4.4951 | 6.95 × 10⁻⁶ | 4.4672 | 7.93 × 10⁻⁶ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0528 | 0.2924 | 2972 | 0.1175 | 0.95325 |
| 16 | 4.4951 | 6.95 × 10⁻⁶ | 4.3845 | 1.16 × 10⁻⁵ | 3114 | 2.20 × 10⁻¹⁶ | 0.9551 | 1.0599 | 0.2892 | 1.0692 | 0.285 | 2972 | 0.1275 | 0.95325 |
| Mean | 3.838516 | 0.0950876 | 3.829514 | 0.0948503 | 3505 | 0.057875 | 0.955788 | 0.981449 | 0.346773 | 0.982549 | 0.345194 | 4202.625 | 0.1073125 | 0.953317 |
Table A2. Performance measurements of direct Dynamic-Fixed-Point (DFP) quantization scheme.
FT columns: Frequency, AUC = 0.9601852. ST columns: Spatial, AUC = 0.9555699.
| Bit-Width | FT Z | FT p-Value < | FT D | FT p-Value < | FT E | FT p-Value < | FT AUC | ST Z | ST p-Value < | ST D | ST p-Value < | ST E | ST p-Value < | ST AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.2462 | 0.8055 | −0.2416 | 0.8057 | 10,574 | 2.20 × 10⁻¹⁶ | 0.96109 | −0.2921 | 0.7702 | −0.2986 | 0.7652 | 18,644 | 2.20 × 10⁻¹⁶ | 0.95722 |
| 2 | −0.3642 | 0.7157 | −0.3621 | 0.7152 | 1308 | 0.9285 | 0.96099 | −2.859 | 0.00425 | −2.859 | 0.00425 | 5890 | 2.20 × 10⁻¹⁶ | 0.96201 |
| 3 | 3.6811 | 0.000232 | 3.6785 | 0.000235 | 3244 | 5.00 × 10⁻⁴ | 0.95496 | −1.4347 | 0.1514 | −1.4309 | 0.1525 | 2878 | 0.0075 | 0.95799 |
| 4 | 5.7414 | 9.39 × 10⁻⁹ | 5.6051 | 2.08 × 10⁻⁸ | 4800 | 2.20 × 10⁻¹⁶ | 0.95186 | −2.1165 | 0.0343 | −2.0812 | 0.03742 | 3076 | 0.0015 | 0.95884 |
| 5 | 5.3215 | 1.03 × 10⁻⁷ | 5.1707 | 2.33 × 10⁻⁷ | 4794 | 2.20 × 10⁻¹⁶ | 0.95198 | −1.9785 | 0.04788 | −2 | 0.0455 | 3212 | 2.20 × 10⁻¹⁶ | 0.95862 |
| 6 | 5.7997 | 6.65 × 10⁻⁹ | 5.7624 | 8.30 × 10⁻⁹ | 3984 | 2.20 × 10⁻¹⁶ | 0.95311 | −2.9184 | 0.00352 | −3.0227 | 0.00251 | 2062 | 0.003 | 0.95907 |
| 7 | 6.2809 | 3.37 × 10⁻¹⁰ | 5.9641 | 2.46 × 10⁻⁹ | 4038 | 2.20 × 10⁻¹⁶ | 0.953 | −2.5313 | 0.01136 | −2.6258 | 0.00865 | 1806 | 0.0165 | 0.95855 |
| 8 | 6.6003 | 4.10 × 10⁻¹¹ | 6.549 | 5.79 × 10⁻¹¹ | 4202 | 2.20 × 10⁻¹⁶ | 0.95272 | −3.2168 | 0.0013 | −3.1945 | 0.0014 | 2254 | 5.00 × 10⁻⁴ | 0.95951 |
| 9 | 6.7322 | 1.67 × 10⁻¹¹ | 6.6732 | 2.50 × 10⁻¹¹ | 4326 | 2.20 × 10⁻¹⁶ | 0.9525 | −1.5704 | 0.1163 | −1.5638 | 0.1179 | 1144 | 0.1025 | 0.95736 |
| 10 | 6.7452 | 1.53 × 10⁻¹¹ | 6.7872 | 1.14 × 10⁻¹¹ | 4296 | 2.20 × 10⁻¹⁶ | 0.95256 | −1.5612 | 0.1185 | −1.6002 | 0.1096 | 1134 | 0.1015 | 0.95735 |
| 11 | 6.7298 | 1.70 × 10⁻¹¹ | 6.5328 | 6.46 × 10⁻¹¹ | 4284 | 2.20 × 10⁻¹⁶ | 0.95258 | −1.5262 | 0.127 | −1.5251 | 0.1272 | 1126 | 0.1075 | 0.95731 |
| 12 | 6.7298 | 1.70 × 10⁻¹¹ | 6.5587 | 5.43 × 10⁻¹¹ | 4284 | 2.20 × 10⁻¹⁶ | 0.95258 | −1.5203 | 0.1284 | −1.5532 | 0.1204 | 1126 | 0.109 | 0.9573 |
| 13 | 6.7299 | 1.70 × 10⁻¹¹ | 6.6692 | 2.57 × 10⁻¹¹ | 4280 | 2.20 × 10⁻¹⁶ | 0.95258 | −1.4929 | 0.1355 | −1.493 | 0.1354 | 1106 | 0.106 | 0.95728 |
| 14 | 6.7357 | 1.63 × 10⁻¹¹ | 6.5614 | 5.33 × 10⁻¹¹ | 4280 | 2.20 × 10⁻¹⁶ | 0.95258 | −1.5074 | 0.1317 | −1.4917 | 0.1358 | 1110 | 0.11 | 0.95729 |
| 15 | 6.7343 | 1.65 × 10⁻¹¹ | 6.5028 | 7.88 × 10⁻¹¹ | 4280 | 2.20 × 10⁻¹⁶ | 0.95258 | −1.5041 | 0.1326 | −1.5299 | 0.126 | 1112 | 0.0985 | 0.95729 |
| 16 | 6.7349 | 1.64 × 10⁻¹¹ | 6.6145 | 3.73 × 10⁻¹¹ | 4282 | 2.20 × 10⁻¹⁶ | 0.95258 | −1.4929 | 0.1355 | −1.4704 | 0.1415 | 1110 | 0.0935 | 0.95728 |
| Mean | 5.417898 | 0.0950895 | 5.31412 | 0.0950709 | 4453.5 | 0.0580625 | 0.953766 | −1.84517 | 0.128107 | −1.85875 | 0.126951 | 3049.375 | 0.0535938 | 0.95814 |
Table A3. Performance measurements of log-map pre-processing with Power-of-Two (Po2) quantization scheme.
FT columns: Frequency, AUC = 0.9601852. ST columns: Spatial, AUC = 0.9555699.
| Bit-Width | FT Z | FT p-Value < | FT D | FT p-Value < | FT E | FT p-Value < | FT AUC | ST Z | ST p-Value < | ST D | ST p-Value < | ST E | ST p-Value < | ST AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 25.447 | 2.2 × 10⁻¹⁶ | 25.506 | 2.2 × 10⁻¹⁶ | 116,428 | 2.2 × 10⁻¹⁶ | 0.75207 | 20.725 | 2.2 × 10⁻¹⁶ | 20.959 | 2.2 × 10⁻¹⁶ | 153,240 | 2.2 × 10⁻¹⁶ | 0.68165 |
| 2 | 22.362 | 2.2 × 10⁻¹⁶ | 22.256 | 2.2 × 10⁻¹⁶ | 88,130 | 2.2 × 10⁻¹⁶ | 0.80265 | 19.84 | 2.2 × 10⁻¹⁶ | 20.344 | 2.2 × 10⁻¹⁶ | 88,062 | 2.2 × 10⁻¹⁶ | 0.79816 |
| 3 | 19.913 | 2.2 × 10⁻¹⁶ | 20.342 | 2.2 × 10⁻¹⁶ | 84,376 | 2.2 × 10⁻¹⁶ | 0.80936 | 22.533 | 2.2 × 10⁻¹⁶ | 22.49 | 2.2 × 10⁻¹⁶ | 105,932 | 2.2 × 10⁻¹⁶ | 0.76622 |
| 4 | 20.232 | 2.2 × 10⁻¹⁶ | 20.433 | 2.2 × 10⁻¹⁶ | 88,000 | 2.2 × 10⁻¹⁶ | 0.80289 | 19.497 | 2.2 × 10⁻¹⁶ | 19.802 | 2.2 × 10⁻¹⁶ | 83,184 | 2.2 × 10⁻¹⁶ | 0.80688 |
| 5 | 20.223 | 2.2 × 10⁻¹⁶ | 19.888 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.398 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 6 | 20.223 | 2.2 × 10⁻¹⁶ | 20.225 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.584 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 7 | 20.223 | 2.2 × 10⁻¹⁶ | 19.875 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.562 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 8 | 20.223 | 2.2 × 10⁻¹⁶ | 20.242 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 18.998 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 9 | 20.223 | 2.2 × 10⁻¹⁶ | 20.889 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.222 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 10 | 20.223 | 2.2 × 10⁻¹⁶ | 20.267 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.979 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 11 | 20.223 | 2.2 × 10⁻¹⁶ | 20.254 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.242 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 12 | 20.223 | 2.2 × 10⁻¹⁶ | 20.436 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.129 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 13 | 20.223 | 2.2 × 10⁻¹⁶ | 20.101 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.363 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 14 | 20.223 | 2.2 × 10⁻¹⁶ | 19.796 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.663 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 15 | 20.223 | 2.2 × 10⁻¹⁶ | 20.032 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.576 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| 16 | 20.223 | 2.2 × 10⁻¹⁶ | 20.425 | 2.2 × 10⁻¹⁶ | 88,022 | 2.2 × 10⁻¹⁶ | 0.80285 | 19.508 | 2.2 × 10⁻¹⁶ | 19.022 | 2.2 × 10⁻¹⁶ | 83,272 | 2.2 × 10⁻¹⁶ | 0.80672 |
| Mean | 20.66438 | 2.2 × 10⁻¹⁶ | 20.68544 | 2.2 × 10⁻¹⁶ | 89,574.88 | 2.2 × 10⁻¹⁶ | 0.80007 | 19.79319 | 2.2 × 10⁻¹⁶ | 19.77081 | 2.2 × 10⁻¹⁶ | 89,355.13 | 2.2 × 10⁻¹⁶ | 0.795847 |
Table A4. Performance measurements of log-map pre-processing with Dynamic-Fixed-Point (DFP) quantization scheme.
FT columns: Frequency, AUC = 0.9601852. ST columns: Spatial, AUC = 0.9555699.
| Bit-Width | FT Z | FT p-Value < | FT D | FT p-Value < | FT E | FT p-Value < | FT AUC | ST Z | ST p-Value < | ST D | ST p-Value < | ST E | ST p-Value < | ST AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 25.447 | 2.2 × 10⁻¹⁶ | 25.264 | 2.2 × 10⁻¹⁶ | 116,428 | 2.2 × 10⁻¹⁶ | 0.75207 | 21.279 | 2.2 × 10⁻¹⁶ | 20.926 | 2.2 × 10⁻¹⁶ | 103,956 | 2.2 × 10⁻¹⁶ | 0.76975 |
| 2 | 22.362 | 2.2 × 10⁻¹⁶ | 22.913 | 2.2 × 10⁻¹⁶ | 88,130 | 2.2 × 10⁻¹⁶ | 0.80265 | 17.184 | 2.2 × 10⁻¹⁶ | 17.45 | 2.2 × 10⁻¹⁶ | 89,472 | 2.2 × 10⁻¹⁶ | 0.79564 |
| 3 | 20.371 | 2.2 × 10⁻¹⁶ | 20.187 | 2.2 × 10⁻¹⁶ | 92,402 | 2.2 × 10⁻¹⁶ | 0.79502 | 22.68 | 2.2 × 10⁻¹⁶ | 22.463 | 2.2 × 10⁻¹⁶ | 105,502 | 2.2 × 10⁻¹⁶ | 0.76698 |
| 4 | 19.357 | 2.2 × 10⁻¹⁶ | 18.697 | 2.2 × 10⁻¹⁶ | 84,746 | 2.2 × 10⁻¹⁶ | 0.8087 | 22.86 | 2.2 × 10⁻¹⁶ | 23.514 | 2.2 × 10⁻¹⁶ | 107,350 | 2.2 × 10⁻¹⁶ | 0.76368 |
| 5 | 19.574 | 2.2 × 10⁻¹⁶ | 19.717 | 2.2 × 10⁻¹⁶ | 87,790 | 2.2 × 10⁻¹⁶ | 0.80326 | 19.466 | 2.2 × 10⁻¹⁶ | 19.577 | 2.2 × 10⁻¹⁶ | 85,634 | 2.2 × 10⁻¹⁶ | 0.8025 |
| 6 | 19.589 | 2.2 × 10⁻¹⁶ | 19.444 | 2.2 × 10⁻¹⁶ | 86,792 | 2.2 × 10⁻¹⁶ | 0.80504 | 19.242 | 2.2 × 10⁻¹⁶ | 18.842 | 2.2 × 10⁻¹⁶ | 85,456 | 2.2 × 10⁻¹⁶ | 0.80282 |
| 7 | 19.501 | 2.2 × 10⁻¹⁶ | 20.113 | 2.2 × 10⁻¹⁶ | 86,834 | 2.2 × 10⁻¹⁶ | 0.80497 | 19.386 | 2.2 × 10⁻¹⁶ | 19.155 | 2.2 × 10⁻¹⁶ | 86,260 | 2.2 × 10⁻¹⁶ | 0.80138 |
| 8 | 19.554 | 2.2 × 10⁻¹⁶ | 19.13 | 2.2 × 10⁻¹⁶ | 86,842 | 2.2 × 10⁻¹⁶ | 0.80496 | 19.361 | 2.2 × 10⁻¹⁶ | 19.45 | 2.2 × 10⁻¹⁶ | 86,204 | 2.2 × 10⁻¹⁶ | 0.80148 |
| 9 | 19.532 | 2.2 × 10⁻¹⁶ | 19.15 | 2.2 × 10⁻¹⁶ | 86,844 | 2.2 × 10⁻¹⁶ | 0.80495 | 19.341 | 2.2 × 10⁻¹⁶ | 18.899 | 2.2 × 10⁻¹⁶ | 86,280 | 2.2 × 10⁻¹⁶ | 0.80134 |
| 10 | 19.511 | 2.2 × 10⁻¹⁶ | 19.569 | 2.2 × 10⁻¹⁶ | 86,824 | 2.2 × 10⁻¹⁶ | 0.80499 | 19.352 | 2.2 × 10⁻¹⁶ | 19.555 | 2.2 × 10⁻¹⁶ | 86,256 | 2.2 × 10⁻¹⁶ | 0.80139 |
| 11 | 19.521 | 2.2 × 10⁻¹⁶ | 19.997 | 2.2 × 10⁻¹⁶ | 86,818 | 2.2 × 10⁻¹⁶ | 0.805 | 19.349 | 2.2 × 10⁻¹⁶ | 19.403 | 2.2 × 10⁻¹⁶ | 86,252 | 2.2 × 10⁻¹⁶ | 0.80139 |
| 12 | 19.517 | 2.2 × 10⁻¹⁶ | 19.707 | 2.2 × 10⁻¹⁶ | 86,814 | 2.2 × 10⁻¹⁶ | 0.80501 | 19.35 | 2.2 × 10⁻¹⁶ | 18.871 | 2.2 × 10⁻¹⁶ | 86,270 | 2.2 × 10⁻¹⁶ | 0.80136 |
| 13 | 19.524 | 2.2 × 10⁻¹⁶ | 20.022 | 2.2 × 10⁻¹⁶ | 86,822 | 2.2 × 10⁻¹⁶ | 0.80499 | 19.349 | 2.2 × 10⁻¹⁶ | 19.742 | 2.2 × 10⁻¹⁶ | 86,250 | 2.2 × 10⁻¹⁶ | 0.8014 |
| 14 | 19.523 | 2.2 × 10⁻¹⁶ | 19.428 | 2.2 × 10⁻¹⁶ | 86,824 | 2.2 × 10⁻¹⁶ | 0.80499 | 19.35 | 2.2 × 10⁻¹⁶ | 19.257 | 2.2 × 10⁻¹⁶ | 86,254 | 2.2 × 10⁻¹⁶ | 0.80139 |
| 15 | 19.523 | 2.2 × 10⁻¹⁶ | 19.449 | 2.2 × 10⁻¹⁶ | 86,824 | 2.2 × 10⁻¹⁶ | 0.80499 | 19.348 | 2.2 × 10⁻¹⁶ | 19.831 | 2.2 × 10⁻¹⁶ | 86,254 | 2.2 × 10⁻¹⁶ | 0.80139 |
| 16 | 19.524 | 2.2 × 10⁻¹⁶ | 19.354 | 2.2 × 10⁻¹⁶ | 86,822 | 2.2 × 10⁻¹⁶ | 0.80499 | 19.35 | 2.2 × 10⁻¹⁶ | 19.033 | 2.2 × 10⁻¹⁶ | 86,254 | 2.2 × 10⁻¹⁶ | 0.80139 |
| Mean | 20.12063 | 2.2 × 10⁻¹⁶ | 20.13381 | 2.2 × 10⁻¹⁶ | 89,034.75 | 2.2 × 10⁻¹⁶ | 0.801035 | 19.76544 | 2.2 × 10⁻¹⁶ | 19.748 | 2.2 × 10⁻¹⁶ | 89,994 | 2.2 × 10⁻¹⁶ | 0.794705 |
Table A5. Performance measurements of inverse log-map pre-processing with Power-of-Two (Po2) quantization scheme.
FT columns: Frequency, AUC = 0.9601852. ST columns: Spatial, AUC = 0.9555699.
| Bit-Width | FT Z | FT p-Value < | FT D | FT p-Value < | FT E | FT p-Value < | FT AUC | ST Z | ST p-Value < | ST D | ST p-Value < | ST E | ST p-Value < | ST AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10.459 | 2.20 × 10⁻¹⁶ | 10.252 | 2.20 × 10⁻¹⁶ | 32,948 | 2.20 × 10⁻¹⁶ | 0.90151 | 10.272 | 2.20 × 10⁻¹⁶ | 10.455 | 2.20 × 10⁻¹⁶ | 34,392 | 2.20 × 10⁻¹⁶ | 0.89409 |
| 2 | 12.387 | 2.20 × 10⁻¹⁶ | 12.347 | 2.20 × 10⁻¹⁶ | 35,646 | 2.20 × 10⁻¹⁶ | 0.89647 | 10.788 | 2.20 × 10⁻¹⁶ | 10.76 | 2.20 × 10⁻¹⁶ | 32,438 | 2.20 × 10⁻¹⁶ | 0.89759 |
| 3 | 11.824 | 2.20 × 10⁻¹⁶ | 11.667 | 2.20 × 10⁻¹⁶ | 33,776 | 2.20 × 10⁻¹⁶ | 0.89981 | 10.396 | 2.20 × 10⁻¹⁶ | 10.596 | 2.20 × 10⁻¹⁶ | 30,954 | 2.20 × 10⁻¹⁶ | 0.90024 |
| 4 | 12.313 | 2.20 × 10⁻¹⁶ | 12.229 | 2.20 × 10⁻¹⁶ | 34,682 | 2.20 × 10⁻¹⁶ | 0.89819 | 10.134 | 2.20 × 10⁻¹⁶ | 10.34 | 2.20 × 10⁻¹⁶ | 31,348 | 2.20 × 10⁻¹⁶ | 0.89954 |
| 5 | 12.447 | 2.20 × 10⁻¹⁶ | 12.339 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.255 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.89988 |
| 6 | 12.447 | 2.20 × 10⁻¹⁶ | 12.526 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.625 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.89988 |
| 7 | 12.447 | 2.20 × 10⁻¹⁶ | 12.32 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.288 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.89988 |
| 8 | 12.447 | 2.20 × 10⁻¹⁶ | 12.501 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.325 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.89988 |
| 9 | 12.447 | 2.20 × 10⁻¹⁶ | 12.128 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.421 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.89988 |
| 10 | 12.447 | 2.20 × 10⁻¹⁶ | 12.447 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.31 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.89988 |
| 11 | 12.447 | 2.20 × 10⁻¹⁶ | 12.514 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.088 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁶ | 0.899878 |
| 12 | 12.447 | 2.20 × 10⁻¹⁶ | 12.886 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.367 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁷ | 0.89988 |
| 13 | 12.447 | 2.20 × 10⁻¹⁶ | 12.793 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.008 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁸ | 0.89988 |
| 14 | 12.447 | 2.20 × 10⁻¹⁶ | 12.634 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.069 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻¹⁹ | 0.89988 |
| 15 | 12.447 | 2.20 × 10⁻¹⁶ | 12.501 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.538 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻²⁰ | 0.89988 |
| 16 | 12.447 | 2.20 × 10⁻¹⁶ | 12.731 | 2.20 × 10⁻¹⁶ | 36,004 | 2.20 × 10⁻¹⁶ | 0.89583 | 10.288 | 2.20 × 10⁻¹⁶ | 10.458 | 2.20 × 10⁻¹⁶ | 31,156 | 2.20 × 10⁻²¹ | 0.89988 |
| Mean | 12.27169 | 2.2 × 10⁻¹⁶ | 12.30094 | 2.2 × 10⁻¹⁶ | 35,568.75 | 2.2 × 10⁻¹⁶ | 0.896619 | 10.31538 | 2.2 × 10⁻¹⁶ | 10.36894 | 2.2 × 10⁻¹⁶ | 31,437.75 | 1.52778 × 10⁻¹⁶ | 0.899375 |
Table A6. Performance measurements of inverse log-map pre-processing with Dynamic-Fixed-Point (DFP) quantization scheme.
FT columns: Frequency, AUC = 0.9601852. ST columns: Spatial, AUC = 0.9555699.
| Bit-Width | FT Z | FT p-Value < | FT D | FT p-Value < | FT E | FT p-Value < | FT AUC | ST Z | ST p-Value < | ST D | ST p-Value < | ST E | ST p-Value < | ST AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10.459 | 2.2 × 10⁻¹⁶ | 10.465 | 2.2 × 10⁻¹⁶ | 32,948 | 2.20 × 10⁻¹⁶ | 0.90151 | 9.3355 | 2.2 × 10⁻¹⁶ | 9.3886 | 2.2 × 10⁻¹⁶ | 31,530 | 2.20 × 10⁻¹⁶ | 0.900483 |
| 2 | 12.387 | 2.2 × 10⁻¹⁶ | 12.36 | 2.2 × 10⁻¹⁶ | 35,646 | 2.20 × 10⁻¹⁶ | 0.89647 | 11.028 | 2.2 × 10⁻¹⁶ | 10.887 | 2.2 × 10⁻¹⁶ | 32,748 | 2.20 × 10⁻¹⁶ | 0.897033 |
| 3 | 12.176 | 2.2 × 10⁻¹⁶ | 12.541 | 2.2 × 10⁻¹⁶ | 34,758 | 2.20 × 10⁻¹⁶ | 0.89806 | 9.7993 | 2.2 × 10⁻¹⁶ | 9.548 | 2.2 × 10⁻¹⁶ | 29,714 | 2.20 × 10⁻¹⁶ | 0.902613 |
| 4 | 11.198 | 2.2 × 10⁻¹⁶ | 11.251 | 2.2 × 10⁻¹⁶ | 32,150 | 2.20 × 10⁻¹⁶ | 0.90272 | 10.378 | 2.2 × 10⁻¹⁶ | 10.258 | 2.2 × 10⁻¹⁶ | 31,416 | 2.20 × 10⁻¹⁶ | 0.899414 |
| 5 | 12.397 | 2.2 × 10⁻¹⁶ | 12.222 | 2.2 × 10⁻¹⁶ | 36,440 | 2.20 × 10⁻¹⁶ | 0.89505 | 9.7374 | 2.2 × 10⁻¹⁶ | 9.6062 | 2.2 × 10⁻¹⁶ | 29,964 | 2.20 × 10⁻¹⁶ | 0.902131 |
| 6 | 12.41 | 2.2 × 10⁻¹⁶ | 12.462 | 2.2 × 10⁻¹⁶ | 35,888 | 2.20 × 10⁻¹⁶ | 0.89604 | 9.7943 | 2.2 × 10⁻¹⁶ | 9.6758 | 2.2 × 10⁻¹⁶ | 29,980 | 2.20 × 10⁻¹⁶ | 0.902088 |
| 7 | 12.128 | 2.2 × 10⁻¹⁶ | 11.943 | 2.2 × 10⁻¹⁶ | 34,146 | 2.20 × 10⁻¹⁶ | 0.89915 | 10.011 | 2.2 × 10⁻¹⁶ | 10.015 | 2.2 × 10⁻¹⁶ | 30,208 | 2.20 × 10⁻¹⁶ | 0.901573 |
| 8 | 12.689 | 2.2 × 10⁻¹⁶ | 12.435 | 2.2 × 10⁻¹⁶ | 37,076 | 2.20 × 10⁻¹⁶ | 0.89391 | 10.179 | 2.2 × 10⁻¹⁶ | 10.241 | 2.2 × 10⁻¹⁶ | 31,246 | 2.20 × 10⁻¹⁶ | 0.899718 |
| 9 | 12.677 | 2.2 × 10⁻¹⁶ | 12.947 | 2.2 × 10⁻¹⁶ | 36,890 | 2.20 × 10⁻¹⁶ | 0.89424 | 10.071 | 2.2 × 10⁻¹⁶ | 9.8498 | 2.2 × 10⁻¹⁶ | 30,332 | 2.20 × 10⁻¹⁶ | 0.901351 |
| 10 | 12.769 | 2.2 × 10⁻¹⁶ | 12.823 | 2.2 × 10⁻¹⁶ | 37,116 | 2.20 × 10⁻¹⁶ | 0.89384 | 10.144 | 2.2 × 10⁻¹⁶ | 9.6202 | 2.2 × 10⁻¹⁶ | 30,474 | 2.20 × 10⁻¹⁶ | 0.901098 |
| 11 | 12.686 | 2.2 × 10⁻¹⁶ | 12.723 | 2.2 × 10⁻¹⁶ | 35,874 | 2.20 × 10⁻¹⁶ | 0.89606 | 10.106 | 2.2 × 10⁻¹⁶ | 9.9334 | 2.2 × 10⁻¹⁶ | 30,412 | 2.20 × 10⁻¹⁶ | 0.901208 |
| 12 | 12.689 | 2.2 × 10⁻¹⁶ | 12.54 | 2.2 × 10⁻¹⁶ | 35,888 | 2.20 × 10⁻¹⁶ | 0.89604 | 10.112 | 2.2 × 10⁻¹⁶ | 9.9988 | 2.2 × 10⁻¹⁶ | 30,416 | 2.20 × 10⁻¹⁶ | 0.901201 |
| 13 | 12.692 | 2.2 × 10⁻¹⁶ | 12.417 | 2.2 × 10⁻¹⁶ | 35,884 | 2.20 × 10⁻¹⁶ | 0.89604 | 10.107 | 2.2 × 10⁻¹⁶ | 10.105 | 2.2 × 10⁻¹⁶ | 30,406 | 2.20 × 10⁻¹⁶ | 0.901219 |
| 14 | 12.689 | 2.2 × 10⁻¹⁶ | 12.831 | 2.2 × 10⁻¹⁶ | 35,872 | 2.20 × 10⁻¹⁶ | 0.89606 | 10.107 | 2.2 × 10⁻¹⁶ | 10.377 | 2.2 × 10⁻¹⁶ | 30,410 | 2.20 × 10⁻¹⁶ | 0.901212 |
| 15 | 12.691 | 2.2 × 10⁻¹⁶ | 12.605 | 2.2 × 10⁻¹⁶ | 35,876 | 2.20 × 10⁻¹⁶ | 0.89606 | 10.107 | 2.2 × 10⁻¹⁶ | 9.6937 | 2.2 × 10⁻¹⁶ | 30,410 | 2.20 × 10⁻¹⁶ | 0.901212 |
| 16 | 12.689 | 2.2 × 10⁻¹⁶ | 13.224 | 2.2 × 10⁻¹⁶ | 35,872 | 2.20 × 10⁻¹⁶ | 0.89606 | 10.107 | 2.2 × 10⁻¹⁶ | 10.28 | 2.2 × 10⁻¹⁶ | 30,408 | 2.20 × 10⁻¹⁶ | 0.901216 |
| Mean | 12.33913 | 2.2 × 10⁻¹⁶ | 12.36181 | 2.2 × 10⁻¹⁶ | 35,520.25 | 2.2 × 10⁻¹⁶ | 0.896706 | 10.07022 | 2.2 × 10⁻¹⁶ | 9.967344 | 2.2 × 10⁻¹⁶ | 30,629.63 | 2.2 × 10⁻¹⁶ | 0.900923 |
Table A7. Performance measurements of retraining with Power-of-Two (Po2) quantization scheme.
All columns: Spatial, AUC = 0.9555699.
| Bit-Width | Z | p-Value < | D | p-Value < | E | p-Value < | AUC |
|---|---|---|---|---|---|---|---|
| 1 | −1.484 | 0.1378 | −1.5176 | 0.1291 | 11,484 | 2.20 × 10⁻¹⁶ | 0.9614257 |
| 2 | −1.3442 | 0.1789 | −1.3351 | 0.1819 | 4444 | 0.0055 | 0.9589876 |
| 3 | −1.4954 | 0.1348 | −1.4923 | 0.1356 | 3618 | 0.028 | 0.9592128 |
| 4 | −2.4204 | 0.0155 | −2.4541 | 0.01412 | 4054 | 0.005 | 0.961272 |
| 5 | −3.6926 | 0.000222 | −3.7461 | 0.00018 | 5778 | 2.20 × 10⁻¹⁶ | 0.9645181 |
| 6 | −2.2286 | 0.02584 | −2.2825 | 0.02246 | 3724 | 0.009 | 0.9606535 |
| 7 | −3.2514 | 0.00115 | −3.2259 | 0.00126 | 5118 | 2.20 × 10⁻¹⁶ | 0.9633884 |
| 8 | −1.9956 | 0.04598 | −2.0536 | 0.04001 | 3674 | 0.0155 | 0.9602138 |
| 9 | −1.9956 | 0.04598 | −2.003 | 0.04518 | 3674 | 0.0165 | 0.9602138 |
| 10 | −1.9956 | 0.04598 | −2.0175 | 0.04364 | 3674 | 0.0155 | 0.9602138 |
| 11 | −1.9956 | 0.04598 | −2.0453 | 0.04082 | 3674 | 0.009 | 0.9602138 |
| 12 | −1.9956 | 0.04598 | −1.9694 | 0.0489 | 3674 | 0.016 | 0.9602138 |
| 13 | −1.9956 | 0.04598 | −1.9812 | 0.04757 | 3674 | 0.014 | 0.9602138 |
| 14 | −1.9966 | 0.04587 | −1.9966 | 0.04587 | 3674 | 0.01 | 0.9602138 |
| 15 | −1.9956 | 0.04598 | −1.9794 | 0.04777 | 3674 | 0.0125 | 0.9602138 |
| 16 | −1.9956 | 0.04598 | −2.0171 | 0.04368 | 3674 | 0.0115 | 0.9602138 |
| Mean | −2.11738 | 0.056745 | −2.13229 | 0.055503 | 4455.375 | 0.0105 | 0.960711394 |
Table A8. Performance measurements of retraining with Dynamic-Fixed-Point (DFP) quantization scheme.
All columns: Spatial, AUC = 0.9555699.
| Bit-Width | Z | p-Value < | D | p-Value < | E | p-Value < | AUC |
|---|---|---|---|---|---|---|---|
| 1 | −0.84641 | 0.3973 | −0.84277 | 0.3994 | 11,650 | 2.20 × 10⁻¹⁶ | 0.95901 |
| 2 | −1.6187 | 0.1055 | −1.6063 | 0.1082 | 5118 | 2.20 × 10⁻¹⁶ | 0.95978 |
| 3 | −1.6091 | 0.1076 | −1.607 | 0.1081 | 3542 | 0.007 | 0.95882 |
| 4 | 0.17147 | 0.8639 | 0.17203 | 0.8634 | 2644 | 0.0645 | 0.95524 |
| 5 | 2.6438 | 0.008199 | 2.6344 | 0.008428 | 3636 | 0.0095 | 0.94999 |
| 6 | 4.2815 | 1.86 × 10⁻⁵ | 4.1887 | 2.81 × 10⁻⁵ | 10,824 | 2.20 × 10⁻¹⁶ | 0.93636 |
| 7 | 4.2447 | 2.19 × 10⁻⁵ | 4.2655 | 2.00 × 10⁻⁵ | 10,374 | 2.20 × 10⁻¹⁶ | 0.93725 |
| 8 | −0.5619 | 0.5742 | −0.5581 | 0.5768 | 2694 | 0.024 | 0.9566 |
| 9 | −0.3745 | 0.708 | −0.3739 | 0.7085 | 2442 | 0.053 | 0.95627 |
| 10 | −0.3324 | 0.7396 | −0.3411 | 0.733 | 2492 | 0.054 | 0.95619 |
| 11 | −0.3081 | 0.758 | −0.3148 | 0.7529 | 2474 | 0.0545 | 0.95615 |
| 12 | −0.3238 | 0.7461 | −0.3141 | 0.7535 | 2490 | 0.0535 | 0.95617 |
| 13 | −0.3181 | 0.7504 | −0.3001 | 0.7641 | 2480 | 0.051 | 0.95616 |
| 14 | −0.3239 | 0.746 | −0.3152 | 0.7526 | 2474 | 0.054 | 0.95617 |
| 15 | −0.3181 | 0.7504 | −0.3222 | 0.7473 | 2480 | 0.0435 | 0.95616 |
| 16 | −0.322 | 0.7475 | −0.3166 | 0.7516 | 2476 | 0.0545 | 0.95617 |
| Mean | 0.255282 | 0.5001712 | 0.253044 | 0.5017423 | 4393.125 | 0.0326875 | 0.953907 |

References

  1. Gardezi, A.; Malik, U.; Rehman, S.; Young, R.C.D.; Birch, P.M.; Chatwin, C.R. Enhanced target recognition employing spatial correlation filters and affine scale invariant feature transform. In Pattern Recognition and Tracking XXX; Alam, M.S., Ed.; International Society for Optics and Photonics, SPIE: Baltimore, MD, USA, 2019; Volume 10995, pp. 145–160.
  2. Gardezi, A.; Qureshi, T.; Alkandri, A.; Young, R.C.D.; Birch, P.M.; Chatwin, C.R. Comparison of spatial domain optimal trade-off maximum average correlation height (OT-MACH) filter with scale invariant feature transform (SIFT) using images with poor contrast and large illumination gradient. In Optical Pattern Recognition XXVI; Casasent, D., Alam, M.S., Eds.; International Society for Optics and Photonics, SPIE: Baltimore, MD, USA, 2015; Volume 9477, pp. 21–35.
  3. P. D., S.M.; Lin, J.; Zhu, S.; Yin, Y.; Liu, X.; Huang, X.; Song, C.; Zhang, W.; Yan, M.; Yu, Z.; Yu, H. A Scalable Network-on-Chip Microprocessor With 2.5D Integrated Memory and Accelerator. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64, 1432–1443.
  4. Awan, A.B.; Bakhshi, A.D.; Abbas, M.; Rehman, S. Active contour-based clutter defiance scheme for correlation filters. Electron. Lett. 2019, 55, 525–527.
  5. Kay, S. Fundamentals of Statistical Signal Processing: Detection Theory; PTR Prentice-Hall: Upper Saddle River, NJ, USA, 1993.
  6. Kumar, B.V.K.V.; Pochapsky, E. Signal-to-noise ratio considerations in modified matched spatial filters. J. Opt. Soc. Am. A 1986, 3, 777–786.
  7. Lugt, A.V. Signal detection by complex spatial filtering. IEEE Trans. Inf. Theory 1964, 10, 139–145.
  8. Kumar, B.; Juday, R.; Mahalanobis, A. Correlation Pattern Recognition; Cambridge University Press: Cambridge, UK, 2014.
  9. Psaltis, D.; Casasent, D. Position, Rotation, And Scale Invariant Optical Correlation. Appl. Opt. 1976, 15, 1795–1799.
  10. Mersereau, K.; Morris, G.M. Scale, rotation, and shift invariant image recognition. Appl. Opt. 1986, 25, 2338–2342.
  11. Casasent, D. Unified synthetic discriminant function computational formulation. Appl. Opt. 1984, 23, 1620–1627.
  12. Kumar, B.V.K.V. Minimum-variance synthetic discriminant functions. J. Opt. Soc. Am. A 1986, 3, 1579–1584.
  13. Bahri, Z.; Kumar, B.V.K.V. Generalized synthetic discriminant functions. J. Opt. Soc. Am. A 1988, 5, 562–571.
  14. Casasent, D.; Chang, W.T. Correlation synthetic discriminant functions. Appl. Opt. 1986, 25, 2343–2350.
  15. Mahalanobis, A.; Casasent, D.P. Performance evaluation of minimum average correlation energy filters. Appl. Opt. 1991, 30, 561–572.
  16. Mahalanobis, A.; Kumar, B.V.K.V.; Casasent, D. Minimum average correlation energy filters. Appl. Opt. 1987, 26, 3633–3640.
  17. Kumar, B.V.K.V. Tutorial survey of composite filter designs for optical correlators. Appl. Opt. 1992, 31, 4773–4801.
  18. Sudharsanan, S.I.; Mahalanobis, A.; Sundareshan, M.K. Unified framework for the synthesis of synthetic discriminant functions with reduced noise variance and sharp correlation structure. Opt. Eng. 1990, 29, 1021–1028.
  19. Alkanhal, M.; Vijaya Kumar, B.V.K.; Mahalanobis, A. Improving the false alarm capabilities of the maximum average correlation height correlation filter. Opt. Eng. 2000, 39, 1133–1141.
  20. Alkanhal, M.; Vijaya Kumar, B.V.K.; Mahalanobis, A. Eigen-extended maximum average correlation height (EEMACH) filters for automatic target recognition. Automat. Target Recogn. 2001.
  21. Goyal, S.; Nishchal, N.K.; Beri, V.K.; Gupta, A.K. Wavelet-modified maximum average correlation height filter for rotation invariance that uses chirp encoding in a hybrid digital-optical correlator. Appl. Opt. 2006, 45, 4850–4857.
  22. Goyal, S.; Nishchal, N.K.; Beri, V.K.; Gupta, A.K. Wavelet-modified maximum average correlation height filter for out-of-plane rotation invariance. Optik 2009, 120, 62–67.
  23. Ang, T.; Tan, A.W.I.; Loo, C.K.; Wong, W.K. Wavelet MACH Filter for Omnidirectional Human Activity Recognition. Int. J. Innov. Comput. Inf. Control IJICIC 2012, 8, 3565–3584.
  24. Rodriguez, A.; Boddeti, V.N.; Kumar, B.V.K.V.; Mahalanobis, A. Maximum Margin Correlation Filter: A New Approach for Localization and Classification. IEEE Trans. Image Process. 2013, 22, 631–643.
  25. Fernandez, J.A.; Vijaya Kumar, B.V.K. Partial-Aliasing Correlation Filters. IEEE Trans. Signal Process. 2015, 63, 921–934.
  26. Kiani, H.; Sim, T.; Lucey, S. Multi-channel correlation filters for human action recognition. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 1485–1489.
  27. Tehsin, S.; Rehman, S.; Bilal, A.; Chaudry, Q.; Saeed, O.; Abbas, M.; Young, R. Comparative analysis of zero aliasing logarithmic mapped optimal trade-off correlation filter. In Pattern Recognition and Tracking XXVIII; Alam, M.S., Ed.; International Society for Optics and Photonics, SPIE: Anaheim, CA, USA, 2017; Volume 10203, pp. 22–37.
  28. Tehsin, S.; Rehman, S.; Riaz, F.; Saeed, O.; Hassan, A.; Khan, M.A.; Alam, M.S. Fully invariant wavelet enhanced minimum average correlation energy filter for object recognition in cluttered and occluded environments. In Defense + Security; SPIE: Anaheim, CA, USA, 2017.
  29. Tehsin, S.; Rehman, S.; Saeed, M.O.B.; Riaz, F.; Hassan, A.; Abbas, M.; Young, R.; Alam, M.S. Self-Organizing Hierarchical Particle Swarm Optimization of Correlation Filters for Object Recognition. IEEE Access 2017, 5, 24495–24502.
  30. Achuthanunni, A.; Kishore Saha, R.; Banerjee, P.K. Unconstrained Band-pass Optimized Correlation Filter (UBoCF): An application to face recognition. In Proceedings of the 2018 IEEE Applied Signal Processing Conference (ASPCON), Kolkata, India, 7–9 December 2018; pp. 297–303.
  31. Banerjee, P.K.; Datta, A.K. Band-pass correlation filter for illumination- and noise-tolerant face recognition. SIViP 2017, 9–16.
  32. Akbar, N.; Tehsin, S.; Bilal, A.; Rubab, S.; Rehman, S.; Young, R. Detection of moving human using optimized correlation filters in homogeneous environments. In Pattern Recognition and Tracking XXXI; Alam, M.S., Ed.; International Society for Optics and Photonics, SPIE: Baltimore, MD, USA, 2020; Volume 11400, pp. 73–79.
  33. Akbar, N.; Tehsin, S.; ur Rehman, H.; Rehman, S.; Young, R. Hardware design of correlation filters for target detection. In Pattern Recognition and Tracking XXX; Alam, M.S., Ed.; International Society for Optics and Photonics, SPIE: Baltimore, MD, USA, 2019; Volume 10995, pp. 71–79.
  34. Masood, H.; Rehman, S.; Khan, M.; Javed, Q.; Abbas, M.; Alam, M.; Young, R. Approximate Proximal Gradient-Based Correlation Filter for Target Tracking in Videos: A Unified Approach. Arab. J. Sci. Eng. 2019, 9363–9380.
  35. Masood, H.; Rehman, S.; Khan, M.; Javed, Q.; Abbas, M.; Alam, M.; Young, R. A novel technique for recognition and tracking of moving objects based on E-MACH and proximate gradient (PG) filters. In Proceedings of the 2017 20th International Conference of Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 22–24 December 2017; pp. 1–6.
  36. Venkataramani, S.; Ranjan, A.; Roy, K.; Raghunathan, A. AxNN: Energy-efficient neuromorphic systems using approximate computing. In Proceedings of the 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), La Jolla, CA, USA, 11–13 August 2014; pp. 27–32.
  37. Rubio-González, C.; Nguyen, H.D.; Demmel, J.; Kahan, W.; Sen, K.; Bailey, D.H.; Iancu, C.; Hough, D. Precimonious: Tuning assistant for floating-point precision. In Proceedings of the SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 17–22 November 2013; pp. 1–12.
  38. Pandey, J.G.; Karmakar, A.; Shekhar, C.; Gurunarayanan, S. An FPGA-based fixed-point architecture for binary logarithmic computation. In Proceedings of the 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013), Shimla, India, 9–11 December 2013; pp. 383–388.
  39. Mahalanobis, A.; Kumar, B.V. Optimality of the maximum average correlation height filter for detection of targets in noise. Opt. Eng. 1997, 36, 2642–2648.
  40. Mahalanobis, A.; Kumar, B.V.K.V.; Song, S.; Sims, S.R.F.; Epperson, J.F. Unconstrained correlation filters. Appl. Opt. 1994, 33, 3751–3759.
  41. Sabir, D.; Rehman, S.; Hassan, A. Fully invariant quaternion based filter for target recognition. In Proceedings of the 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, 19–21 October 2015; pp. 509–513.
  42. Gardezi, A.; Alkandri, A.; Birch, P.; Young, R.; Chatwin, C. A space variant maximum average correlation height (MACH) filter for object recognition in real time thermal images for security applications. In Optics and Photonics for Counterterrorism and Crime Fighting VI and Optical Materials in Defence Systems Technology VII; Lewis, C., Burgess, D., Zamboni, R., Kajzar, F., Heckman, E.M., Eds.; International Society for Optics and Photonics, SPIE: Toulouse, France, 2010; Volume 7838, pp. 191–204.
  43. Gardezi, A.; Al-Kandri, A.; Birch, P.; Young, R.; Chatwin, C. Enhancement of the speed of space-variant correlation filter implementations by using low-pass pre-filtering for kernel placement and applications to real-time security monitoring. In Optical Pattern Recognition XXII; Casasent, D.P., Chao, T.H., Eds.; International Society for Optics and Photonics, SPIE: Orlando, FL, USA, 2011; Volume 8055, pp. 78–88.
  44. Awan, A.B. Composite filtering strategy for improving distortion invariance in object recognition. IET Image Process. 2018, 12, 1499–1509.
  45. Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; Chen, Y. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. CoRR 2017, abs/1702.03044. Available online: http://xxx.lanl.gov/abs/1702.03044 (accessed on 25 August 2017).
  46. Gysel, P. Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks. CoRR 2016, abs/1605.06402. Available online: http://xxx.lanl.gov/abs/1605.06402 (accessed on 20 May 2016).
  47. Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. CoRR 2016, abs/1609.07061. Available online: http://xxx.lanl.gov/abs/1609.07061 (accessed on 22 September 2016).
  48. Viksten, F.; Forssén, P.E.; Johansson, B.; Moe, A. Comparison of Local Image Descriptors for Full 6 Degree-of-Freedom Pose Estimation. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009.
  49. Venkatraman, E. A permutation test to compare receiver operating characteristic curves. Biometrics 2000, 56, 1134–1138.
  50. Venkatraman, E.S.; Begg, C.B. A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 1996, 83, 835–848. Available online: https://academic.oup.com/biomet/article-pdf/83/4/835/703326/83-4-835.pdf (accessed on 1 December 1996).
  51. Hanley, J.A.; McNeil, B.J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983, 148, 839–843. Available online: https://doi.org/10.1148/radiology.148.3.6878708 (accessed on 1 September 1983).
  52. DeLong, E.R.; DeLong, D.M.; Clarke-Pearson, D.L. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44, 837–845.
Figure 1. Optimization flow of our Correlation Pattern Recognition (CPR) quantization technique.
Figure 2. Illustration of the log-polar transform: mapping from Cartesian coordinates to log-polar coordinates.
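As a rough illustration of the mapping that Figure 2 depicts, the sketch below converts a single Cartesian point to log-polar coordinates and back. The function names, the default centre at the origin, and the use of the natural logarithm are assumptions made for this example only; the article's implementation may use a different centre, logarithm base, or sampling grid.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (rho, theta) about the centre (cx, cy).

    rho is the natural log of the radius (an assumed convention),
    theta is the angle in radians in (-pi, pi].
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("log-polar coordinates are undefined at the centre")
    return math.log(r), math.atan2(dy, dx)

def from_log_polar(rho, theta, cx=0.0, cy=0.0):
    """Inverse mapping: (rho, theta) back to Cartesian (x, y)."""
    r = math.exp(rho)
    return cx + r * math.cos(theta), cy + r * math.sin(theta)

# Scaling a point by a factor of 2 shifts rho by log(2) and leaves theta unchanged.
rho1, theta1 = to_log_polar(3.0, 4.0)
rho2, theta2 = to_log_polar(6.0, 8.0)
rho_shift = rho2 - rho1

# Round trip through both mappings.
x_back, y_back = from_log_polar(rho1, theta1)
```

In these coordinates a change of scale appears as a shift along rho and an in-plane rotation as a shift along theta, which is the usual motivation for applying the transform as a pre-processing step.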
Figure 3. Complete block diagram representation of step-by-step implementation of quantization schemes for CPR.
Figure 4. Distribution of filter weights before and after direct quantization: (a) power-of-two (Po2); (b) dynamic-fixed-point (DFP).
Figure 5. Peak signal-to-noise ratio (PSNR) of baseline DFP quantization and of DFP quantization after the log-polar transform, for different compression ratios (CR).
Figure 6. Peak signal-to-noise ratio (PSNR) of baseline Po2 quantization and of Po2 quantization after the log-polar transform, for different compression ratios (CR).
Figure 7. Comparison of weight distribution of filter between full precision, direct, and retrain quantization. (a) The comparison case for Po2. (b) The comparison case for DFP.
Figure 8. Retraining of a spatially-trained CPR filter. (a) The retraining process converts the floating-point precision filter $h_f$ into a retrained quantized version $h_{rtq}$. The first row shows direct quantization and the calculation of the quantization error $h_{eq}$. The second row shows the retrained filter $h_{rt}$ and its re-quantized version $h_{rtq}$ when $\xi = 1$. (b) Snippets of the filter after direct quantization and after retraining quantization. Highlighted weights are those that changed between the two.
Figure 9. (a) Full shirt sample image from the Fashion MNIST dataset. (b) Two-dimensional log-polar transform of the image sample. (c) Histogram of the image. (d) Histogram of the log-polar transform of the image.
Figure 10. (a) Full shirt sample image from the Fashion MNIST dataset. (b) Two-dimensional inverse log-polar transform of the image sample. (c) Histogram of the image. (d) Histogram of the inverse log-polar transform of the image.
Figure 11. Flow diagrams of (a) direct quantization using the DFP and Po2 quantization schemes: filters undergo regular training and are then quantized for a given bit-width; (b) retraining quantization using the DFP and Po2 quantization schemes: the quantized versions of regularly trained filters are retrained for the corresponding bit-width and then re-quantized to obtain the weight quantization re-training (WQR) filters; (c) pre-processing quantization using the DFP and Po2 quantization schemes: after regular training, a spatial transform is applied and quantization is then performed.
Figure 12. Experimental setup showing different software components.
Figure 13. Different training objects are illustrated. These objects have different object complexities.
Figure 14. Training images (a–f). Testing image (g).
Figure 15. Flow diagram of Particle Swarm Optimization (PSO)-based parameter optimization of β and γ.
Figure 16. Rotation test for CPR. Each graph shows the responses over a 0–360 degree rotation of the testing object. Every four consecutive graphs, (a–d), (e–h), (i–l), (m–p), (q–t), and (u–x), correspond to compression ratios of 16, 8, 5.33, 4, 2, and 1.33, respectively. The four columns represent Po2 frequency-trained, DFP frequency-trained, Po2 spatially-trained, and DFP spatially-trained filters, respectively.
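The compression ratios listed in the caption are consistent with dividing a 32-bit full-precision word by the quantized bit-width. The short sketch below reproduces the six values under that assumption. The 32-bit baseline, the bit-widths 2, 4, 6, 8, 16, and 24, and the helper name are inferred from the numbers rather than stated in this section.

```python
FULL_PRECISION_BITS = 32  # assumption: baseline weights are 32-bit floating point

def compression_ratio(bit_width, full_bits=FULL_PRECISION_BITS):
    """Ratio of the baseline word length to the quantized word length."""
    if bit_width <= 0:
        raise ValueError("bit_width must be positive")
    return full_bits / bit_width

# Bit-widths inferred from the caption's ratios (hypothetical mapping).
ratios = {bw: round(compression_ratio(bw), 2) for bw in (2, 4, 6, 8, 16, 24)}
print(ratios)
```

Under this reading, a compression ratio of 16 corresponds to 2-bit weights and a ratio of 1.33 to 24-bit weights.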
Figure 17. Random samples of a car from dataset 2 are illustrated at different scales.
Figure 18. Scalability test for CPR. Each graph shows the responses up to a maximum scale of 400% of the testing object. Every four consecutive graphs, (a–d), (e–h), (i–l), (m–p), (q–t), and (u–x), correspond to compression ratios of 16, 8, 5.33, 4, 2, and 1.33, respectively. The four columns represent Po2 frequency-trained, DFP frequency-trained, Po2 spatially-trained, and DFP spatially-trained filters, respectively.
Figure 19. Samples of a car from dataset 3 are illustrated under different lighting conditions.
Figure 20. CPR responses for the object under different moving lighting conditions at a compression ratio of sixteen. The first row shows the Po2 frequency-trained and DFP frequency-trained filter graphs, respectively. The second row shows the Po2 spatially-trained and DFP spatially-trained filter graphs, respectively.
Figure 21. CPR responses for the object under different moving lighting conditions at a compression ratio of eight. The first row shows the Po2 frequency-trained and DFP frequency-trained filter graphs, respectively. The second row shows the Po2 spatially-trained and DFP spatially-trained filter graphs, respectively.
Figure 22. Performance measures of each method for quantized bit-widths.
Figure 23. Sparsity comparison between direct, log-map, and inverse log-map quantization for the filter bank. The sparsity reported is that of the filter weights.
Figure 24. The graph demonstrates a memory comparison between direct, log-map, and inverse log-map quantization for filter bank.
Figure 25. The graph demonstrates a speed-up comparison between direct, log-map, and inverse log-map quantization for filter bank.
Table 1. Variables used in this work.

| Variable | Comment |
|---|---|
| $f_w$ | Spatial filter weights in floating-point precision |
| $L_{pow2}$ | Quantized weights for Power-of-Two (Po2) scheme |
| $L_{dfp}$ | Quantized weights for Dynamic-Fixed-Point (DFP) scheme |
| $BW$ | Bit-width for precision reduction |
| $m_1$ | Exponential power of two used for the upper-bound |
| $m_2$ | Exponential power of two used for the lower-bound |
| $v$ | Maximum absolute value of $f_w$ |
| $U_{dfp}$ | Quantized weights for the Dynamic-Fixed-Point scheme |
| $mC^{\beta,\gamma}_{x}$ | Modified Average Image Correlation Height |
| $mS^{\beta,\gamma}_{x}$ | Modified Average Image Similarity |
| $\gamma$ | Parameter of the contribution of quantization error |
| $\beta$ | Parameter of the contribution of average |
| $\xi$ | Coefficient of quantization error |
| $cp_j$ | Raw correlation plane for test image $j$ |
| $ncp$ | Normalized correlation plane |
| $COPI$ | Correlation output peak intensity |
| $\tau$ | Threshold for object detection |
| $\Delta\%$ | Percentage difference between threshold $\tau$ and COPI |
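To make the roles of $BW$, $v$, $m_1$, and $m_2$ more concrete, the following sketch shows one plausible way a Po2 and a DFP quantizer could be written. It is not the article's algorithm: the rule $m_1 = \lfloor \log_2 v \rfloor$, the number of Po2 levels, the pruning of small weights to zero, the integer/fraction split for DFP, and all function names are assumptions chosen for illustration.

```python
import math

def po2_quantize(f_w, bw):
    """Round each weight to a signed power of two in [2**m2, 2**m1], or to zero.

    Assumptions: one bit for the sign, one code reserved for zero,
    m1 = floor(log2(v)), and weights below the smallest level are pruned.
    """
    if bw < 2:
        raise ValueError("bw must be at least 2")
    v = max((abs(w) for w in f_w), default=0.0)  # maximum absolute weight
    if v == 0.0:
        return [0.0 for _ in f_w]
    m1 = math.floor(math.log2(v))                # upper-bound exponent
    n_levels = 2 ** (bw - 1) - 1                 # nonzero magnitudes available
    m2 = m1 - n_levels + 1                       # lower-bound exponent
    out = []
    for w in f_w:
        if w == 0.0:
            out.append(0.0)
            continue
        e = min(round(math.log2(abs(w))), m1)    # nearest exponent, capped at m1
        out.append(0.0 if e < m2 else math.copysign(2.0 ** e, w))
    return out

def dfp_quantize(f_w, bw):
    """Fixed-point rounding with the fraction length chosen from the data range."""
    if bw < 2:
        raise ValueError("bw must be at least 2")
    v = max((abs(w) for w in f_w), default=0.0)
    if v == 0.0:
        return [0.0 for _ in f_w]
    int_bits = math.ceil(math.log2(v)) + 1       # integer part, including sign
    step = 2.0 ** (-(bw - int_bits))             # value of one least-significant bit
    q_min, q_max = -(2 ** (bw - 1)), 2 ** (bw - 1) - 1
    return [max(q_min, min(q_max, round(w / step))) * step for w in f_w]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

weights = [0.9, -0.3, 0.05, 0.001, 0.0]          # made-up example values
po2 = po2_quantize(weights, bw=3)
dfp = dfp_quantize(weights, bw=4)
```

With these made-up weights, both quantizers send the two smallest nonzero values to zero, which illustrates how low bit-widths can induce sparsity in a filter.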
Table 2. Comparative analysis of related work.

| Approaches | Cross-Correlation Domain | Emphasis | Methodologies and Strengths | Limitations |
|---|---|---|---|---|
| MACH, OT-MACH [39,40] | Frequency domain | Automatic Target Recognition | Presents a generalization of the minimum average correlation energy approach, which improved target recognition in the presence of additive noise and distortions | False positives arise from excessive dependence on the mean image; low discrimination ability |
| EMACH [19] | Frequency domain | Automatic Target Recognition | Introduces two new metrics, the all image correlation height and the modified average similarity measure, to reduce false positives and increase discrimination ability | Poor generalization capability |
| EEMACH [20] | Frequency domain | Automatic Target Recognition | Based on Eigen analysis of the EMACH filter, which resulted in better generalization capability than the EMACH filter | The extensive Eigen analysis required is computationally expensive |
| Fully invariant quaternion based filter [41] | Frequency domain | Automatic Target Recognition, to achieve invariance in terms of color, scale, and orientation | For color target recognition, logarithmic mapping and EMACH are combined in the quaternion domain, which successfully handled color, rotation, and scale distortions | Incurs an extra computational cost due to the pre-processing involved |
| Space variant maximum average correlation height (MACH) filter [42] | Spatial domain | Automatic Target Recognition | Enables detection in an unpredictable environment and is resilient against background heat signature variance and scale changes | Incurs additional computation cost due to spatial domain filters |
| Pre-processing using low-pass filtering of space-variant correlation filter [43] | Spatial domain, frequency domain pre-processing | Automatic Target Recognition; reduces computation workload for target search | A low-pass filter is employed to reduce the search space for target detection | Incurs computation complexity due to spatial domain filters and pre-processing steps |
| Combination of spatial correlation filters and affine scale-invariant feature transform [1] | Spatial domain, spatial domain pre-processing | Automatic Target Recognition, to achieve invariance to color, scale, and orientation | Uses the Affine Scale Invariant Feature Transform (ASIFT) for pre-processing to achieve translation, zoom, rotation, and two camera axis orientation invariance | Increases performance but also increases the computation complexity due to ASIFT |
| Composite filtering [44] | Frequency domain | Automatic Target Recognition, to achieve full invariance | Resilience against distortions, for example, in-plane and out-of-plane rotation, illumination, and scale alterations, obtained by applying difference of Gaussian (DoG) and logarithmic pre-processing to EEMACH | Computational resource requirements increase extensively |
| Ours | Spatial domain | Automatic Target Recognition, to achieve sparse and compressed correlation filter representation | Automatic Target Recognition to achieve sparse and compressed correlation filter representation | |
Table 3. The correlation output peak intensity (COPI) of correlated surface and its corresponding detection score of sample compressed correlation filters.

| Bit-Width | Frequency, Po2: COPI | Frequency, Po2: Score | Frequency, DFP: COPI | Frequency, DFP: Score | Spatial, Po2: COPI | Spatial, Po2: Score | Spatial, DFP: COPI | Spatial, DFP: Score |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.42 × 10^11 | 3.5128 | 1.42 × 10^11 | 3.5128 | 3.15 × 10^17 | 3.5548 | 1.38 × 10^15 | 3.9381 |
| 2 | 2,439,063 | 4.3748 | 2,439,063 | 4.3748 | 7.36 × 10^15 | 3.4448 | 3.53 × 10^15 | 3.6044 |
| 3 | 1,557,504 | 3.8072 | 1,783,727 | 4.1245 | 3.85 × 10^10 | 3.8715 | 3.17 × 10^10 | 3.7645 |
| 4 | 1,532,146 | 3.7792 | 1,986,866 | 3.965 | 3.69 × 10^10 | 3.785 | 7.92 × 10^16 | 3.4876 |
| 5 | 1,532,708 | 3.7801 | 1,901,382 | 3.9308 | 3.72 × 10^10 | 3.7715 | 3.24 × 10^15 | 3.4693 |
| 6 | 1,532,707 | 3.7801 | 1,895,484 | 3.9069 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4648 |
| 7 | 1,532,707 | 3.7801 | 1,898,087 | 3.9115 | 3.72 × 10^10 | 3.7714 | 2.26 × 10^16 | 3.4632 |
| 8 | 1,532,707 | 3.7801 | 1,903,349 | 3.9076 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4611 |
| 9 | 1,532,707 | 3.7801 | 1,903,271 | 3.9042 | 3.72 × 10^10 | 3.7714 | 7.98 × 10^16 | 3.461 |
| 10 | 1,532,707 | 3.7801 | 1,903,113 | 3.9072 | 3.72 × 10^10 | 3.7714 | 7.98 × 10^16 | 3.4607 |
| 11 | 1,532,707 | 3.7801 | 1,902,467 | 3.9057 | 3.72 × 10^10 | 3.7714 | 7.98 × 10^16 | 3.461 |
| 12 | 1,532,707 | 3.7801 | 1,902,701 | 3.9063 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4609 |
| 13 | 1,532,707 | 3.7801 | 1,902,787 | 3.9058 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4609 |
| 14 | 1,532,707 | 3.7801 | 1,902,709 | 3.9058 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4609 |
| 15 | 1,532,707 | 3.7801 | 1,902,748 | 3.9058 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4609 |
| 16 | 1,532,707 | 3.7801 | 1,902,741 | 3.9059 | 3.72 × 10^10 | 3.7714 | 2.89 × 10^17 | 3.4609 |
Note: Detection scores represent very small values as compared to corresponding COPI.
Table 4. Description of legends employed in Figure 16, Figure 17, Figure 18, Figure 19, Figure 20 and Figure 21. ST denotes a spatially-trained filter and FT a frequency-trained filter.

| Legends | Response Description | Legends | Response Description |
|---|---|---|---|
| Full_Precision_sp | Floating-point precision of ST filter | Full_Precision_fre | Floating-point precision of FT filter |
| Direct_Quantization_fre_pw2 | Direct quantization using Po2 of FT filter | InverseLogPolar_Quantization_fre_pw2 | Quantization with inverse log-polar pre-processing using Po2 of FT filter |
| Direct_Quantization_fre_dft | Direct quantization using DFP of FT filter | InverseLogPolar_Quantization_fre_dft | Quantization with inverse log-polar pre-processing using DFP of FT filter |
| Direct_Quantization_sp_pw2 | Direct quantization using Po2 of ST filter | InverseLogPolar_Quantization_sp_pw2 | Quantization with inverse log-polar pre-processing using Po2 of ST filter |
| Direct_Quantization_sp_dft | Direct quantization using DFP of ST filter | InverseLogPolar_Quantization_sp_dft | Quantization with inverse log-polar pre-processing using DFP of ST filter |
| LogPolar_Quantization_fre_pw2 | Quantization with log-polar pre-processing using Po2 of FT filter | Retrain_Quantization_fre_pw2 | Retrain-quantization using Po2 of FT filter |
| LogPolar_Quantization_fre_dft | Quantization with log-polar pre-processing using DFP of FT filter | Retrain_Quantization_fre_dft | Retrain-quantization using DFP of FT filter |
| LogPolar_Quantization_sp_pw2 | Quantization with log-polar pre-processing using Po2 of ST filter | Retrain_Quantization_sp_pw2 | Retrain-quantization using Po2 of ST filter |
| LogPolar_Quantization_sp_dft | Quantization with log-polar pre-processing using DFP of ST filter | Retrain_Quantization_sp_dft | Retrain-quantization using DFP of ST filter |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Sabir, D.; Hanif, M.A.; Hassan, A.; Rehman, S.; Shafique, M. Weight Quantization Retraining for Sparse and Compressed Spatial Domain Correlation Filters. Electronics 2021, 10, 351. https://doi.org/10.3390/electronics10030351
