Article

Compressive Domain Deep CNN for Image Classification and Performance Improvement Using Genetic Algorithm-Based Sensing Mask Learning

by Baba Fakruddin Ali B H and Prakash Ramachandran *
School of Electronics Engineering, Vellore Institute of Technology, Vellore 632014, India
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(14), 6881; https://doi.org/10.3390/app12146881
Submission received: 13 June 2022 / Revised: 29 June 2022 / Accepted: 4 July 2022 / Published: 7 July 2022
(This article belongs to the Special Issue Applied and Innovative Computational Intelligence Systems)

Abstract

The majority of digital images are stored in compressed form. Generally, image classification using a convolutional neural network (CNN) is performed on uncompressed rather than compressed images. Training the CNN in the compressed domain eliminates the decompression step and results in improved efficiency, reduced storage, and lower cost. Compressive sensing (CS) is an effective and efficient method for signal acquisition and recovery, and training a CNN on CS measurements makes the entire process compact. The most popular sensing mechanism used in CS is based on image acquisition with a single-pixel camera (SPC), which has a complex design; in numerical demonstrations the SPC process is usually represented by a matrix simulation. CS measurements obtained in this way are visually different from the image, so before they can be added to the training set of a compressed learning framework, an inverse SPC process has to be applied to every image sample in the training and testing datasets. In this paper we propose a simple sensing mechanism that can be implemented on the output image of a standard digital camera by retaining a few pixels and forcing the rest to zero; this reduced set of pixels is treated as the CS measurements. The process is modeled as the application of a binary mask to the image, and the resulting image is still subjectively legible to human vision and can be used directly in the training dataset. The sensing mask has very few active pixels at arbitrary locations, which leaves considerable scope to heuristically learn a sensing mask suited to the dataset. Only a few attempts have been made to learn the sensing matrix, and the isolated effect of this learning process on CNN model accuracy has not been reported. We propose an ablation approach to study how sensing matrix learning improves the accuracy of a basic CNN architecture.
We applied CS to a two-class image dataset using a Primitive Walsh–Hadamard (PWH) binary mask function and performed the classification experiment using a basic CNN. By retaining an arbitrary proportion of pixels in the training and testing datasets, we applied the CNN to the compressed measurements to perform image classification, and we report the model performance in terms of training and validation accuracies as the proportion of retained pixels is varied. A novel Genetic Algorithm-based compressive learning (GACL) method is proposed to learn the PWH mask so as to optimize the model training accuracy, using two different crossover techniques. In the experiment with a compression ratio (CR) of 90%, in which only 10% of the pixels are retained in every image of both the training and testing datasets representing two classes, the training accuracy improves from 77% to 85% when diagonal crossover is used for offspring creation in GACL. The robustness of the method is examined by applying GACL to a user-defined multiclass dataset, for which better CNN model accuracies are achieved. This work brings out the strength of sensing matrix learning, which can be integrated with advanced training models to minimize the amount of information that has to be sent to central servers and is suitable for a typical IoT framework.

1. Introduction

The Shannon–Nyquist sampling theorem states that the sampling rate must be at least twice the maximum signal frequency, and this provides the foundation for signal reconstruction from discrete measurements. In the discrete setting, precise reconstruction requires the number of measurements to be at least equal to the signal length. However, this strategy may require large storage space, a long sensing time, considerable power, and a large number of sensors. Compressive sensing (CS) [1,2,3,4,5,6,7,8] is a theory that goes beyond this standard technique and demonstrates how a sparse signal can be reconstructed from a smaller number of incoherent samples.
Comprehensive reviews of the present state of the field of CS imaging have been published. The single-pixel camera (SPC) [7] is particularly helpful for imaging outside the visible range, such as infrared, terahertz, and hyperspectral imaging, where detector arrays are either prohibitively costly or non-existent. In many applications the aim is not to reconstruct the whole picture but to solve an inference problem such as anomaly detection or classification. The primary premise of the CS method is that most real-world signals have a compact representation in a transform domain in which just a few coefficients are significant while the remaining ones are zero or negligible; this condition is known as signal sparsity.
The advantage of learning with a deep neural network [9] is that the network performs feature extraction without the need for a human to do it manually. Deep neural networks have developed over time, with network topologies becoming deeper through additional hidden layers. The convolutional neural network (CNN) is one of the best-known deep neural networks in pattern recognition [10]. The CNN [11] has made significant contributions to image processing owing to its remarkable capacity to generate meaningful feature maps [12] and information for classification [13], object recognition [14], and signal analysis [15,16,17].
Deep convolutional neural networks are very promising for image classification, in which the CNN model is trained on a dataset comprising images from different classes and can then assign a test image to one of the existing classes. In the classical classification problem, complete uncompressed raw images are used in the training process, and it is a challenge to classify a test image that is in compressed form; the challenges lie both in validation accuracy and in the computational time required to decompress the image. One interesting solution is to perform training and validation in the compressed domain. In this approach learning happens directly on the compressed information, and validation is also done in the same compressed domain. Compressed domain inference was studied a decade ago [18,19,20,21,22,23,24,25,26], and many works have extended it recently [27,28,29,30,31,32,33]. The key benefits of this approach are two-fold: (i) there is no need for an explicit reconstruction computation; and (ii) the amount of information to be communicated through networks to the server is minimized. As discussed earlier, CS is a paradigm shift in sampling and compression in which only the required number of measurements is taken from the signal and the signal is reconstructed using optimization techniques [25,26]; learning on CS measurements inherits the compactness of CS. Davenport et al. [22] used the matched filter of compressed sensing patterns applied to a library of pictures to produce a ‘smashed’ filter and proved the validity of the random projection-based strategy for compressed domain image classification. For the same objective, Li et al. utilized the same SPC system but learnt sensing patterns using data-dependent “secant projections” [23].
The convolutional neural network (CNN) has recently been used [19,20,21,22] and has produced substantially better results. On the other hand, current neural network approaches need to build a distinct network model for each individual measurement rate (MR).
In this work, we focus on leveraging neural networks to conduct image classification directly on CS measurements without reconstruction, and on pushing the limits of object recognition and classification with very low sensing and processing resources. To overcome the above limitations, we first reduce the pixel density by applying a binary mask to the dataset images; after observing the reduced learning accuracy, we apply a genetic algorithm to determine the binary mask that is to be applied to the dataset while training the CNN model. It is demonstrated that mask learning improves the training accuracy for different crossover methods. Numerical experiments are conducted for CNN learning on CS measurements using standard image classification datasets, and a performance analysis is carried out. Two works [20,32] are similar to ours, and both use a sensing mechanism mimicking the concept of SPC. SPC has complicated implementation details but can be modeled mathematically using matrix operations. The sensing mechanism considered in our work is based on the output image of a standard digital camera, of which only selected pixels are retained. This reduced set of pixels can be modeled as CS measurements [34]. Unlike SPC, the CS measurement we consider here is a simple spatial domain process that retains some pixels and forces the remaining pixels to zero. If the test image has to be communicated in an application platform, it is enough to communicate the retained pixels, as the mask is known on both the client and the server side. On the server side, the CS measurements can easily be rearranged into an image using the mask pattern. In an SPC-based sensing mechanism the CS measurements cannot be used directly for testing or training; the inverse SPC operation has to be performed to obtain the approximate images that are to be trained or tested [20,32].
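The client/server exchange described above can be illustrated with a minimal NumPy sketch. The function names `pack_measurements` and `unpack_measurements`, the toy 8 × 8 image, and the random mask are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def pack_measurements(image, mask):
    """Client side: keep only the pixels where the binary mask is 1."""
    return image[mask.astype(bool)]

def unpack_measurements(values, mask):
    """Server side: place the received pixels back at the mask locations."""
    keep = mask.astype(bool)
    # Trailing dimensions (e.g. the 3 colour channels) are taken from the payload.
    out = np.zeros(mask.shape + values.shape[1:], dtype=values.dtype)
    out[keep] = values
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)  # toy RGB image
mask = (rng.random((8, 8)) < 0.1).astype(np.uint8)            # roughly 10% active pixels

sent = pack_measurements(image, mask)       # what would travel over the network
received = unpack_measurements(sent, mask)  # masked image rebuilt on the server
```

Only `sent` needs to be transmitted; because the mask is shared, the server recovers the same masked image that would be fed to the CNN, with no optimization-based reconstruction involved.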
In our experiment, the key performance parameter is the training accuracy of the CNN model for a given percentage of pixels used in the learning process. The percentage of pixels not used is taken as the compression ratio (CR). Given that the binary mask is fixed for all training and testing images and is active at only 10% of arbitrarily chosen locations, we propose Genetic Algorithm-based compressed learning (GACL), a novel process that improves the training accuracy of the CNN. GACL determines the best mask to be applied to all images in the training set so as to maximize the training accuracy of the CNN. This is achieved by using a genetic algorithm to determine the best chromosomes of the mask, with the CNN training accuracy as the objective function to be maximized. Our numerical experiment shows that, for CNN learning on CS measurements with 10% of pixels retained (CR = 90%), the training accuracy of 77% with an untrained fixed mask improves to 80% when the masks are genetically learned with vertical crossover, and improves further to 85% with diagonal crossover. Comparison of the CNN training accuracies with and without GACL shows that GACL improves the model accuracy to a large extent. This improvement is highly data dependent and can be compared only with existing work that uses the same sensing principle and dataset.

2. Related Work

This section discusses the basic work done in compressed domain image classification, proposed sensing matrix learning, and the heuristic algorithms for the proposed work.

2.1. Compressed Domain Image Classification

Neural networks are employed for image compression using CNN architectures [28], RNN architectures [29], and GAN architectures [30]. Alternatively, a neural network can perform inference on compressed data either after decompressing it or directly on the compressed data. Calderbank et al. [8] provided the first theoretical results for compressed domain inference, demonstrating that learning directly in the compressed domain is possible. They provided bounds demonstrating that a linear SVM in the compressed domain performs similarly to the best linear classifier in the uncompressed domain and that classifiers can be learned directly in the compressed domain. The “smashed” filter of Davenport et al. [22] used a random sensing matrix and a 1-nearest-neighbor classifier. Later, the SPC system was used to perform classification directly on compressed data using patterns learned via data-dependent “secant projections”. Prof. Pavan Turaga and his team recently created a rate-adaptive neural network for compressed domain classification [21]. Several studies have been conducted on frequency domain learning problems. The DCT is used by JPEG, the most frequently used compression standard, and DCT coefficients are fed into CNNs for picture classification, object identification, and instance segmentation [25,26,27,28,29,30,31]. Similar to the DCT, wavelet coefficients can also be used directly in CNN training [29,30].
Yibo Xu et al. [32] present an effective training technique for a neural network with a dynamic rate property, which allows a single neural network to classify at any measurement rate (MR) within the range of interest using a given sensing matrix that represents an SPC process. Only a few selected MRs are used in this training approach, yet the trained neural network is valid throughout the whole range of MRs of interest. They also demonstrated that the dynamic-rate training scheme may be thought of as a universal strategy that works with a variety of sensing matrices and neural network designs, and that it is a significant step forward in the use of neural networks for compressive inference and other compressive sensing applications.

2.2. Proposed Sensing Matrix Learning

When a CS sensing matrix is applied to an image, it produces CS measurements that are far fewer than the total number of pixels in the image, and the sensing matrix values are arbitrary and random. The sensing matrix proposed in this paper is a simple process that retains a certain set of pixels and forces the other pixels to zero. Image reconstruction from the reduced set of pixels is a CS problem [34,35,36], and the retained pixels can be considered CS measurements. This process can be modeled as the element-wise multiplication of the image and a binary mask. The work [32] discussed in Section 2.1 also uses a binary mask, one that mimics SPC. In that work the CS masks for various MRs are applied to the training images and the training network is made adaptable to multiple MRs. The adaptation of the network to multiple MRs is made possible by adding all MR representations of the dataset images in the training phase. It also involves an image reconstruction process for both training and testing images, performed by the inverse of the single-pixel camera process, which is another matrix operation with a computational cost. The possibility of training the sensing matrix is discussed in that paper but not attempted. Another work [20] discusses learning the sensing matrix jointly with the CNN weights for classification, assuming an SPC sensing process. That work also involves an image reconstruction process from the CS measurements for all the testing and training images, and uses the 28 × 28 MNIST handwritten digit dataset. Our proposed work shares the goal of the above works [20,32] but differs in the following aspects:
(i) Our sensing matrix is not based on the single-pixel camera principle but on a simple pixel-retaining process using a binary mask.
(ii) We use an evolutionary algorithm for binary mask learning, named Genetic Algorithm-based compressed domain learning (GACL). We evaluate our research outcomes purely by comparing our own results with and without GACL.
(iii) We do not have an image reconstruction step in either the testing or the training phase.
(iv) We consider a dataset with more natural images, for example cat and dog images.

2.3. Heuristic Algorithms for Proposed Work

To improve the classification accuracy of the CNN in compressed learning, we propose a mask learning strategy that determines the binary mask maximizing the model accuracy. The search space of the mask is very large, which makes an evolutionary method suitable. The state of the art in evolutionary algorithms (EA) has rapidly reached an advanced level. There are many variants of EA, of which we list a few: (i) the Genetic Algorithm (GA), which applies the “survival of fittest first” policy to select the best child candidates [37,38]; (ii) particle swarm optimization, which treats a potential solution as a particle [39]; (iii) grey wolf optimization [40], which is based on the hierarchical behavior of wolves; and (iv) modified GA [41]. To bring out the novel idea of mask learning, we decided to use the basic GA method. Moreover, GA is widely used for hyperparameter learning of CNNs and is well accepted in the AI ecosystem [42], and it is time-effective to use GA in the development phase of our mask learning to prove the concept.

3. Proposed Methodology

In this work, we focus on neural networks that conduct image classification directly on CS measurements taken in the spatial domain. The CS measurements are used directly for learning without any reconstruction. The CS sensing model is implemented by applying a binary mask to the dataset images, and the mask is then learned iteratively using GA to improve the learning accuracy. Different crossover methods were tried in the GA to achieve high accuracy.
This section is divided into three parts: Section 3.1 describes the study of binary masking-based compressed domain learning, Section 3.2 describes the proposed Genetic Algorithm-Based Compressive Convolution Neural Network (GACCNN) training, which is based on our proposed Genetic Algorithm-based compressed domain learning (GACL) method, and Section 3.3 applies GACL to multiclass datasets.

3.1. Study of Binary Masking Based Compressed Domain Learning

The database used for this work consists of RGB images of dimension M × N belonging to two categories, with 2000 training images and 500 test images. All images in the dataset are resized to a standard 256 × 256 in our scheme. The binary sensing matrix, which is a structured matrix, is contrasted with the Primitive Walsh–Hadamard (PWH) matrix, which is essentially a random matrix. Binary sensing matrices of size 256 × 256 are used to select only a few elements of the original image; these matrices act as a sieve on the image. Applying the binary matrix or the PWH matrix to the original image retains only P% of its pixels and drastically reduces the number of pixels to be transmitted or stored. However, this process also significantly reduces the classification accuracy because of the distortion of the image.
Let us consider a typical image classification problem, such as the widely used cat–dog classification problem, in which an image I under test has to be transmitted to a server where the CNN models are deployed. The test image I needs to be compressed into a compressed image T to reduce the communication time. One effective and fast way of compressing is CS. CS is applied by multiplying the image element-wise with a known binary mask, Mask, which forces many pixels to zero and retains only P% of the pixels. The mathematical model is T = I .* Mask, where .* denotes element-wise multiplication and T is the image used in training/testing.
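As a rough illustration of the model T = I .* Mask, the following NumPy sketch masks a stand-in 256 × 256 RGB image with a binary mask that keeps P = 10% of the pixel locations. The random image, the randomly placed mask, and the helper name `apply_mask` are assumptions for illustration; the paper's actual masks are PWH-based.

```python
import numpy as np

def apply_mask(image, mask):
    """Element-wise masking T = I .* Mask; the mask is broadcast over colour channels."""
    if image.ndim == 3:
        mask = mask[..., None]
    return image * mask

rng = np.random.default_rng(1)
# Stand-in for a resized RGB image (values start at 1 so zeros come only from masking).
I = rng.integers(1, 256, size=(256, 256, 3), dtype=np.uint8)

P = 10  # percentage of pixels retained
mask = np.zeros(256 * 256, dtype=np.uint8)
mask[rng.choice(mask.size, size=mask.size * P // 100, replace=False)] = 1
mask = mask.reshape(256, 256)

T = apply_mask(I, mask)

retained = 100.0 * mask.mean()        # percentage of pixels kept
compression_ratio = 100.0 - retained  # percentage of pixels forced to zero
```

With these numbers, 6553 of the 65,536 pixel locations are kept, which corresponds to a compression ratio of about 90% under the definition of CR used in this paper.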
The best binary mask for retaining the perceptual quality of the image is the Hadamard mask. We conducted an experiment in which a Hadamard binary mask that retains 50% of the pixels was applied to the test set in a CNN-based image classification for the dataset [43]. The basic functional parameters used to configure the CNN for our experiment are shown in Table 1, and the network architecture of the CNN is shown in Figure 1. For this case of retaining 50% of the pixels, we found that the training accuracy of the CNN is good, whereas the validation accuracy falls well below 70%. The reason is that the training set consists of unmasked complete images whereas the test set consists of masked images. When the experiment is repeated with a masked training set, the validation accuracy improves to 75%. We studied this further by decreasing the number of retained pixels, arbitrarily discarding active pixels of the PWH mask.
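The paper does not spell out how the PWH mask is constructed, so the sketch below is only one plausible reading: a Sylvester-type Hadamard matrix is binarized (+1 becomes 1, −1 becomes 0) to give a mask with about half of the pixels active, and active pixels are then discarded at random to reach a lower retention level. The function names and the random thinning rule are assumptions.

```python
import numpy as np

def sylvester_hadamard(n):
    """Return the n x n Hadamard matrix (entries +1/-1) for n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1]], dtype=np.int8)
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def hadamard_mask(n):
    """Binary mask with 1 where the Hadamard entry is +1 (about half the pixels)."""
    return (sylvester_hadamard(n) > 0).astype(np.uint8)

def thin_mask(mask, percent, rng):
    """Randomly switch off active pixels until `percent`% of all pixels remain active."""
    target = mask.size * percent // 100
    active = np.flatnonzero(mask)
    if target > active.size:
        raise ValueError("mask has fewer active pixels than requested")
    thinned = np.zeros(mask.size, dtype=np.uint8)
    thinned[rng.choice(active, size=target, replace=False)] = 1
    return thinned.reshape(mask.shape)

rng = np.random.default_rng(2)
base = hadamard_mask(256)         # roughly 50% of the pixels active
mask10 = thin_mask(base, 10, rng) # 10% of the pixels active, all taken from `base`
```

Because the thinned mask only ever removes pixels from the 50% mask, the retained pixels stay spread over the whole image, which matches the qualitative behaviour described for the 25%, 15%, and 10% cases.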
Figure 2 illustrates the masking process applied to the dataset images in the proposed algorithm and its outcome for various levels of pixel retention. Figure 2a depicts the case of 100% of pixels retained, and Figure 2b–e depicts the images obtained by applying PWH masks that retain 50%, 25%, 15%, and 10% of the pixels.
Using the same CNN architecture with the parameters shown in Table 1, we conducted the learning experiment in two different ways: (i) using complete images for training (100% of pixels retained) and PWH-masking the test set (retaining 50%, 25%, 15%, or 10%), and (ii) masking both the training and the testing sets. The experimental results are tabulated in Table 2. When 100% of the pixels are retained for both the training and testing datasets, no compression is involved and we obtained 97% training accuracy. The model accuracies and loss function for this case are shown in Figure 3a. When only a portion of the pixels is retained to achieve compression, the accuracy drops, as can be seen from Table 2. It should be noted, however, that the training accuracy is better in case (ii), which masks both the training and testing datasets, than in case (i), in which the training set is not masked. The CNN model accuracies and loss functions for case (i) are shown in Figure 3b–e and those for case (ii) in Figure 3f–i.

3.2. Genetic Algorithm-Based Compressed Domain Learning

The experiment conducted in Section 3.1 is a proof of concept that a CNN can learn even from 10% of distributed pixels of the training set, with a training accuracy of 77%, and can recognize the image with a validation accuracy of 61%. This is because, even when only 10% of the pixels are retained, the PWH-based binary mask distributes them well over all regions of the image. It should be noted that the same binary mask is used for all images in the training and testing sets. Since the mask is randomly distributed, with 10% of its entries equal to 1 and the rest equal to 0, there is considerable scope to search for a better mask that would give higher accuracy than our experimental accuracy of 77%. The search space for the best mask is large, and we propose to use a genetic algorithm to learn the mask so as to maximize the training accuracy, which is taken as the objective function of the GA. The generalized block diagram of Genetic Algorithm-based compressed learning (GACL) for a two-class dataset is shown in Figure 4, which depicts the workflow for the training and testing models.
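The evaluation-and-selection part of such a genetic search can be sketched as follows. In GACL the fitness of a mask is the CNN training accuracy obtained with that mask; training a CNN is too heavy for a short example, so `toy_fitness` below is a purely hypothetical stand-in objective, and `random_mask`, `rank_population`, the 64 × 64 mask size, and the population size of 30 are likewise illustrative assumptions.

```python
import numpy as np

def random_mask(shape, percent, rng):
    """Candidate chromosome: binary mask with `percent`% of entries set to 1."""
    n = shape[0] * shape[1]
    flat = np.zeros(n, dtype=np.uint8)
    flat[rng.choice(n, size=n * percent // 100, replace=False)] = 1
    return flat.reshape(shape)

def toy_fitness(mask):
    """Hypothetical stand-in for CNN training accuracy:
    share of active pixels that lie in the central half of the image."""
    h, w = mask.shape
    centre = mask[h // 4:3 * h // 4, w // 4:3 * w // 4]
    return float(centre.sum()) / float(mask.sum())

def rank_population(population, fitness, top_k=10):
    """Score every chromosome and return the top_k as (mask, score), best first."""
    scores = np.array([fitness(m) for m in population])
    order = np.argsort(scores)[::-1][:top_k]
    return [(population[i], float(scores[i])) for i in order]

rng = np.random.default_rng(42)
population = [random_mask((64, 64), 10, rng) for _ in range(30)]
top = rank_population(population, toy_fitness, top_k=10)
parents = [top[0][0], top[1][0]]  # the two fittest masks become the parents
```

In an actual GACL run, `toy_fitness` would be replaced by a routine that masks the training set with the candidate mask, trains the CNN, and returns the resulting training accuracy.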
This research work begins with training on the widely used Cat and Dog dataset. One of the highlights of this work is that, besides training on the original images, we used two types of fixed binary sensing patterns as binary sensing matrices in the genetic algorithm. The original images are fused with the binary sensing and primitive Walsh–Hadamard matrices, and the training and testing datasets are created from the resulting images; the pixel density can be varied as discussed in Section 3.1 before the images are applied to the CNN model. Because the masks are generated arbitrarily, each mask, termed a chromosome, yields a different training accuracy; the training accuracies of the top ten chromosomes achieved with the CNN are displayed in Figure 5.
To create offspring, a crossover point is chosen on the two chromosomes with the highest accuracy. On these two chromosomes, the crossover operation was performed by combining the left half of the first matrix with the right half of the second matrix, and then the right half of the first matrix with the left half of the second matrix, as shown in Figure 6c,d. The resulting matrices were used to mask the training and testing set images for the CNN; the results are tabulated in Table 3, and the resulting model accuracy graphs are shown in Figure 7a,b. Although the best achieved accuracy of 80% is better than that of the method without GA, the result was not convincing, so the experiment was extended with the diagonal crossover method shown in Figure 6e,f. In this method the crossover operation is performed on segments that are separated diagonally, and with this crossover an improved accuracy of 85% was obtained, as shown in Table 3; this is better than the method without GA and acceptable for classification. The resulting model accuracy graphs are shown in Figure 7c,d. This novel method of genetically learning the best pattern of pixels to be retained for accurate CNN classification sheds light on new ways of compressed sensing learning and will be useful for faster communication of test images in an IoT platform.
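The two crossover operators can be sketched on small toy masks as follows. The vertical crossover follows the left-half/right-half description above; for the diagonal crossover the text only states that the segments are separated diagonally, so splitting along the main diagonal with `np.triu` is an assumption, as are the function names.

```python
import numpy as np

def vertical_crossover(p1, p2):
    """Swap halves column-wise: left half of one parent, right half of the other."""
    half = p1.shape[1] // 2
    c1 = np.hstack([p1[:, :half], p2[:, half:]])
    c2 = np.hstack([p2[:, :half], p1[:, half:]])
    return c1, c2

def diagonal_crossover(p1, p2):
    """Swap triangular segments: on/above the main diagonal from one parent,
    below the diagonal from the other (assumed split)."""
    upper = np.triu(np.ones(p1.shape, dtype=bool))
    c1 = np.where(upper, p1, p2)
    c2 = np.where(upper, p2, p1)
    return c1, c2

# Toy parents chosen so the origin of every offspring entry is visible.
p1 = np.ones((4, 4), dtype=np.uint8)
p2 = np.zeros((4, 4), dtype=np.uint8)

v1, v2 = vertical_crossover(p1, p2)
d1, d2 = diagonal_crossover(p1, p2)
```

Note that neither operator is guaranteed to preserve the exact number of active pixels of the parents; the source does not say whether or how the offspring masks are adjusted back to the target retention level, so no such step is shown here.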

3.3. GACL for Multiclass Datasets

The proposed GACL method works well on a two-class dataset. To examine it further, we repeated the procedure on multiclass images by adding more user-defined classes to the dataset used in the two-class experiment. A multiclass dataset with five classes was considered initially; it attained a training accuracy of 66.34% with 10% of the pixels retained, while GACL achieved an accuracy of 50% with conventional crossover and an improved accuracy of 67% with diagonal crossover, as tabulated in Table 4 and Table 5. It should be noted that there is no drastic improvement in training accuracy, but the improvement in validation accuracy is considerable. The model accuracy, receiver operating characteristic, loss, and confusion matrices are shown in Figure 8 and Figure 9.

4. Results and Discussion

This research is a proof of concept of the possibility of training a CNN-based classifier on images in which only a few pixels are retained and the rest are forced to zero, which reduces the information to be communicated in an application platform. It is based on the fact that the retained pixels can be treated as CS measurements and used directly in the training process. This CS process can easily be implemented in practice with simple digital cameras. Looking at the image obtained by retaining only 50% of the pixels, shown in Figure 2b, we can infer that it is legible enough for the human eye to classify it as a cat, and the same applies to Figure 2c–e. Intuitively, a deep CNN, which mimics the human visual system, should also be able to recognize the image from CS images containing only part of the pixels. We consider a dataset of natural images rather than simple images, and it is interesting to note from the classification results in Table 2 that, for CS testing images, the training accuracy of CNN models trained with CS images is better than that of models trained with complete images. The percentage of pixels not retained can be considered the CR; for a CR of 90%, the training accuracy is 77% with a validation accuracy of 61%. The CNN used here is a very basic architecture that is widely used for common image recognition applications. For completeness, all the model accuracy graphs are presented in Figure 3a–i. To bring out the efficacy of the proposed GACL algorithm, the results obtained with the binary mask learned by GA are given in Table 3; it can be observed that the training accuracy of 77% improves to 85% using diagonal crossover for 90% CR, and the result is supported by the model accuracy graphs in Figure 7a–d.
These results validate the performance of the novel GACL method, which could be integrated with any other well-performing CNN.

5. Conclusions

This work brings out a new way of improving the accuracy of compressive domain learning using a novel CS sensing matrix learning method. In doing so, a very simple compressive sensing process is used, which can easily be implemented with a standard digital camera and a pixel-retaining mask function. By keeping the CNN architecture standard and simple, the strength of the sensing matrix learning process is brought out. In our experiments, the proposed GA-based learning of the sensing matrix increases the training accuracy of the CNN by 8 percentage points (from 77% to 85%) at a compression ratio of 90% for a dataset of natural images with many features. This gives a new direction for integrating this learning methodology with sophisticated CNN architectures customized to particular applications. The sensing matrix retains the image information in the CS measurements and thus does not need any pre-processing or reconstruction operation in the training and testing phases. The numerical experiments we conducted produced many intermediate results that shed light on compressive domain learning. Compressive sampling in the spatial domain is achieved by applying a binary mask to the dataset images to retain only P% of the pixels for a two-class image dataset, and the performance of the CNN is studied thoroughly. The study reveals that the CNN performs better when compressive sampling is applied to both the training and the testing datasets. The training accuracy achieved for the popular dog and cat dataset when only 10% of the pixels are retained in both the training and testing datasets is 77%, with a validation accuracy of 61%. The proposed GACL algorithm is used in this research to learn the best binary mask, and a training accuracy of 80% is achieved for 10% of pixels retained (CR = 90%) in both the training and testing datasets using vertical crossover of the mask function.
A further improved training accuracy of 85% is achieved with GACL for 10% of pixels retained in both the training and testing datasets using diagonal crossover of the mask function, which is very encouraging, and the concept can be deployed in an IoT framework where large volumes of image data are transferred for analytics. To examine the robustness of the proposed GACL-based compressive sampling, we extended the experiment to a small user-defined multiclass dataset and found that both training and validation accuracies improve with GACL. The complete work is limited to the spatial domain, and extending the implementation to frequency domains such as DCT and wavelet will open up new directions. The proposed GA fits well with the existing GA-based tuning of CNN hyperparameters. The binary mask applied to the dataset images can also be considered a standard parameter of the CNN architecture. Applying more computationally intensive advanced heuristic algorithms to real-valued masks can be explored in the future.

Author Contributions

Ideation and concept, B.F.A.B.H. and P.R.; methodology, B.F.A.B.H. and P.R.; writing (original manuscript preparation), B.F.A.B.H. and P.R.; review, P.R.; editing, B.F.A.B.H. and P.R.; supervision, P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We used the Kaggle dataset [43] for our classification experiments.

Acknowledgments

We thank the management of VIT, with which we are affiliated, for its constant support of our research.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CS       Compressive Sensing
CNN      Convolution Neural Network
SPC      Single Pixel Camera
PWH      Primitive Walsh Hadamard
GA       Genetic Algorithm
GACL     Genetic Algorithm Based Compressive Learning
GACCNN   Genetic Algorithm Based Compressive Convolution Neural Network
DRNN     Dynamic Rate Neural Network
SVM      Support Vector Machine
DCT      Discrete Cosine Transform
GAN      Generative Adversarial Network
RNN      Recurrent Neural Network
MR       Measuring Rate

References

  1. Candès, E.J. Compressive sampling. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 22–30 August 2006; Volume 3, pp. 1433–1452.
  2. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
  3. Candès, E.; Romberg, J. Sparsity and incoherence in compressive sampling. Inverse Probl. 2007, 23, 969.
  4. Candès, E.J.; Wakin, M.B. An introduction to compressive sampling. IEEE Signal Process. Mag. 2008, 25, 21–30.
  5. Lustig, M.; Donoho, D.L.; Santos, J.M.; Pauly, J.M. Compressed sensing MRI. IEEE Signal Process. Mag. 2008, 25, 72–82.
  6. Gao, Z.; Dai, L.; Han, S.; Chih-Lin, I.; Wang, Z.; Hanzo, L. Compressive sensing techniques for next-generation wireless communications. IEEE Wirel. Commun. 2018, 25, 144–153.
  7. Duarte, M.F.; Davenport, M.A.; Takhar, D.; Laska, J.N.; Sun, T.; Kelly, K.F.; Baraniuk, R.G. Single-pixel imaging via compressive sampling. IEEE Signal Process. Mag. 2008, 25, 83–91.
  8. Calderbank, R.; Jafarpour, S.; Schapire, R. Compressed Learning: Universal Sparse Dimensionality Reduction and Learning in the Measurement Domain. 2009. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.7564 (accessed on 12 June 2022).
  9. Kriegeskorte, N. Deep neural networks: A new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 2015, 1, 417–446.
  10. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
  11. O'Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458.
  12. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 818–833.
  13. Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Fully convolutional neural networks for remote sensing image classification. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 5071–5074.
  14. Lokanath, M.; Kumar, K.S.; Keerthi, E.S. Accurate object classification and detection by faster-RCNN. IOP Conf. Ser. Mater. Sci. Eng. 2017, 263, 052028.
  15. Hossain, M.B.; Posada-Quintero, H.F.; Kong, Y.; McNaboe, R.; Chon, K.H. Automatic motion artifact detection in electrodermal activity data using machine learning. Biomed. Signal Process. Control 2022, 74, 103483.
  16. Kapgate, D. Efficient Quadcopter Flight Control Using Hybrid SSVEP+ P300 Visual Brain Computer Interface. Int. J. Hum.–Comput. Interact. 2022, 38, 42–52.
  17. Roy, A.M. An efficient multi-scale CNN model with intrinsic feature integration for motor imagery EEG subject classification in brain-machine interfaces. Biomed. Signal Process. Control 2022, 74, 103496.
  18. Lohit, S.; Kulkarni, K.; Turaga, P.; Wang, J.; Sankaranarayanan, A.C. Reconstruction-free inference on compressive measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 16–24.
  19. Lohit, S.; Kulkarni, K.; Turaga, P. Direct inference on compressive measurements using convolutional neural networks. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 1913–1917.
  20. Adler, A.; Elad, M.; Zibulevsky, M. Compressed learning: A deep neural network approach. arXiv 2016, arXiv:1610.09615.
  21. Kulkarni, K.; Turaga, P. Reconstruction-free action inference from compressive imagers. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 772–784.
  22. Davenport, M.A.; Duarte, M.F.; Wakin, M.B.; Laska, J.N.; Takhar, D.; Kelly, K.F.; Baraniuk, R.G. The smashed filter for compressive classification and target recognition. Comput. Imaging Int. Soc. Opt. Photonics 2007, 6498, 64980H.
  23. Li, Y.; Hegde, C.; Sankaranarayanan, A.C.; Baraniuk, R.; Kelly, K.F. Compressive image acquisition and classification via secant projections. J. Opt. 2015, 17, 065701.
  24. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
  25. Fu, D.; Guimaraes, G. Using Compression to Speed up Image Classification in Artificial Neural Networks. 2016. Available online: https://www.danfu.org/files/CompressionImageClassification.pdf (accessed on 12 June 2022).
  26. Lohit, S.; Singh, R.; Kulkarni, K.; Turaga, P. Rate-adaptive neural networks for spatial multiplexers. arXiv 2018, arXiv:1809.02850.
  27. Ballé, J.; Laparra, V.; Simoncelli, E.P. End-to-end optimized image compression. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 1–27.
  28. Choi, Y.; El-Khamy, M.; Lee, J. Variable rate deep image compression with a conditional autoencoder. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 3146–3154.
  29. Johnston, N.; Vincent, D.; Minnen, D.; Covell, M.; Singh, S.; Chinen, T.; Hwang, S.J.; Shor, J.; Toderici, G. Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4385–4393.
  30. Agustsson, E.; Tschannen, M.; Mentzer, F.; Timofte, R.; Van Gool, L. Generative adversarial networks for extreme learned image compression. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 221–231.
  31. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. Learning in the frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1740–1749.
  32. Xu, Y.; Liu, W.; Kelly, K.F. Compressed Domain Image Classification Using a Dynamic-Rate Neural Network. IEEE Access 2020, 8, 217711–217722.
  33. Torfason, R.; Mentzer, F.; Agustsson, E.; Tschannen, M.; Timofte, R.; Van Gool, L. Towards image understanding from deep compression without decoding. arXiv 2018, arXiv:1803.06131.
  34. Stanković, I.; Orović, I.; Stanković, S. Image reconstruction from a reduced set of pixels using a simplified gradient algorithm. In Proceedings of the 22nd Telecommunications Forum Telfor (TELFOR), Belgrade, Serbia, 25–27 November 2014; IEEE: Piscataway, NJ, USA; pp. 497–500.
  35. He, T.; Sun, S.; Guo, Z.; Chen, Z. Beyond coding: Detection-driven image compression with semantically structured bit-stream. In Proceedings of the 2019 Picture Coding Symposium (PCS), Ningbo, China, 12–15 November 2019; IEEE: Piscataway, NJ, USA; pp. 1–5.
  36. Gharib, M.R. Comparison of robust optimal QFT controller with TFC and MFC controller in a multi-input multi-output system. Rep. Mech. Eng. 2020, 1, 151–161.
  37. Das, M.; Roy, A.; Maity, S.; Kar, S.; Sengupta, S. Solving fuzzy dynamic ship routing and scheduling problem through new genetic algorithm. Decis. Mak. Appl. Manag. Eng. 2021, 1–33.
  38. Ganguly, S. Multi-objective distributed generation penetration planning with load model using particle swarm optimization. Decis. Mak. Appl. Manag. Eng. 2020, 3, 30–42.
  39. Negi, G.; Kumar, A.; Pant, S.; Ram, M. Optimization of complex system reliability using hybrid grey wolf optimizer. Decis. Mak. Appl. Manag. Eng. 2021, 4, 241–256.
  40. Ghosal, S.G.; Dey, S.; Chattopadhyay, P.P.; Datta, S.; Bhattacharyya, P. Designing optimized ternary catalytic alloy electrode for efficiency improvement of semiconductor gas sensors using a machine learning approach. Decis. Mak. Appl. Manag. Eng. 2021, 4, 126–139.
  41. Sharma, R.; Kim, M.; Gupta, A. Motor imagery classification in brain-machine interface with machine learning algorithms: Classical approach to multi-layer perceptron model. Biomed. Signal Process. Control 2022, 71, 103101.
  42. Ragab, M.G.; Abdulkadir, S.J.; Aziz, N.; Al-Tashi, Q.; Alyousifi, Y.; Alhussian, H.; Alqushaibi, A. A novel one-dimensional CNN with exponential adaptive gradients for air pollution index prediction. Sustainability 2020, 12, 10090.
  43. Available online: https://www.kaggle.com/datasets/tongpython/cat-and-dog (accessed on 15 February 2022).
Figure 1. Network architecture of CNN.
Figure 2. Sample masked images of the dataset after PWH masking process with P% pixels retained (a) P = 100 (b) P = 50 (c) P = 25 (d) P = 15 (e) P = 10.
Figure 3. Training and validation accuracy and loss with respect to epochs for the PWH-masked dataset with P% of pixels retained in the testing dataset and Q% in the training dataset: (a) P = 100, Q = 100; (b) P = 50, Q = 100; (c) P = 25, Q = 100; (d) P = 15, Q = 100; (e) P = 10, Q = 100; (f) P = Q = 50; (g) P = Q = 25; (h) P = Q = 15; (i) P = Q = 10.
Figure 4. Proposed Genetic Algorithm-based compressed learning (GACL) scheme.
Figure 5. Top ten accuracies of the selected chromosomes of GACL representing the mask.
Figure 6. Crossover pattern of the selected 256 × 256 chromosomes of GACL representing the mask: (a,b) top two selected chromosomes; (c,d) vertical crossover; (e,f) diagonal crossover.
Figure 7. Validation and training performance with respect to epochs for the two-class CNN models achieved through GACL for P = Q = 10%: (a) accuracy for vertical crossover; (b) loss for vertical crossover; (c) accuracy for diagonal crossover; (d) loss for diagonal crossover.
Figure 8. Validation and training performance of the multiclass (5) CNN model without GACL for P = Q = 10%: (a) accuracy vs. epochs; (b) loss vs. epochs; (c) receiver operating characteristic (true positive rate vs. false positive rate); (d) confusion matrix.
Figure 9. Validation and training performance with respect to epochs for the multiclass (5) CNN model achieved through GACL for P = Q = 10%: (a) accuracy with vertical crossover; (b) loss with vertical crossover; (c) accuracy with diagonal crossover; (d) loss with diagonal crossover.
Table 1. CNN model parameters.
Parameter          Value
Model Type         Sequential
Activation Layer   ReLU
Shear Range        0.2
Zoom Range         0.2
Flip Type          Horizontal
Filter Size        64 × 64, 128 × 128
Kernel Size        3 × 3
Optimizer          Adam
Batch Size         32
Epochs             25–100
Class Mode         Binary
Loss Function      Binary Cross Entropy
Metrics            Accuracy
Table 2. Training and validation accuracy of the CNN for different percentages of pixels retained.
Pixels Retained (Training Dataset)   Pixels Retained (Testing Dataset)   Training Accuracy   Validation Accuracy
100%                                 100%                                97%                 76%
100%                                 50%                                 97%                 62%
100%                                 25%                                 97%                 59%
100%                                 15%                                 97%                 52%
100%                                 10%                                 96%                 49%
50%                                  50%                                 95%                 73%
25%                                  25%                                 87%                 67%
15%                                  15%                                 84%                 65%
10%                                  10%                                 77%                 61%
Table 3. Training and validation accuracy of the CNN on two-class images with GA.
Method               Pixels Retained (Training Dataset)   Pixels Retained (Testing Dataset)   Training Accuracy   Validation Accuracy
Vertical Crossover   10%                                  10%                                 80%                 60%
Diagonal Crossover   10%                                  10%                                 85%                 62%
Table 4. Training and validation accuracy of the CNN on multiclass images without GA.
Method       Pixels Retained (Training Dataset)   Pixels Retained (Testing Dataset)   Training Accuracy   Validation Accuracy
Without GA   10%                                  10%                                 67%                 45%
Table 5. Training and validation accuracy of the CNN on multiclass images with GA.
Method               Pixels Retained (Training Dataset)   Pixels Retained (Testing Dataset)   Training Accuracy   Validation Accuracy
Vertical Crossover   10%                                  10%                                 50%                 44%
Diagonal Crossover   10%                                  10%                                 67%                 52%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ali B H, B.F.; Ramachandran, P. Compressive Domain Deep CNN for Image Classification and Performance Improvement Using Genetic Algorithm-Based Sensing Mask Learning. Appl. Sci. 2022, 12, 6881. https://doi.org/10.3390/app12146881

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.