1. Introduction
The Shannon–Nyquist sampling theorem states that the sampling rate must be at least twice the maximum signal frequency, and this provides the foundation for signal reconstruction from discrete measurements. In the discrete setting, exact reconstruction requires the number of measurements to be at least equal to the signal length. However, this strategy may require a large storage space, a long acquisition time, considerable power, and a greater number of sensors. Compressive sensing (CS) [1,2,3,4,5,6,7,8] is a newer theory that goes beyond the standard sampling approach and shows how a sparse signal can be reconstructed from a smaller number of incoherent samples.
Comprehensive reviews of the present state of CS imaging have been published. The single-pixel camera (SPC) [7] is particularly helpful for imaging outside the visible range, such as infrared, terahertz, and hyperspectral imaging, where detector arrays are either prohibitively costly or non-existent. In many applications, the aim is not to reconstruct the whole picture but to solve an inference problem such as anomaly detection or classification. The primary premise of CS is that most real-world signals have a compact representation in some transform domain in which only a few coefficients are significant while the rest are zero or negligible; this condition is known as signal sparsity.
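The sparsity premise can be checked numerically on a toy signal. The sketch below is illustrative only: the orthonormal DCT-II basis, the Gaussian test signal, and the helper names `dct_matrix` and `top_k_energy_fraction` are assumptions, not part of the method described here. It measures how much of the signal energy the largest 10% of transform coefficients capture.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that coefficients are c = C @ x."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    m = np.arange(n)[None, :]          # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # the DC row uses a different scale
    return c

def top_k_energy_fraction(x: np.ndarray, k: int) -> float:
    """Fraction of the signal energy held by the k largest DCT coefficients."""
    coeffs = dct_matrix(len(x)) @ x
    energy = np.sort(coeffs ** 2)[::-1]  # coefficient energies, largest first
    return float(energy[:k].sum() / energy.sum())

# Hypothetical smooth test signal: a Gaussian bump sampled at 64 points
n = 64
t = np.arange(n)
x = np.exp(-((t - 32) / 10.0) ** 2)
frac = top_k_energy_fraction(x, k=n // 10)
print(f"top {n // 10} of {n} DCT coefficients hold {frac:.4f} of the energy")
```

Because the basis is orthonormal, the total coefficient energy equals the signal energy, so the printed fraction directly measures how compressible the signal is in that domain.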
The advantage of learning with deep neural networks [9] is that the network performs feature extraction without the need for a human to do it manually. Deep neural networks have evolved over time toward deeper topologies with more hidden layers. The convolutional neural network (CNN) is one of the best-known deep neural networks in pattern recognition [10]. CNNs [11] have made significant contributions to image processing owing to their remarkable capacity to generate meaningful feature maps [12] and information for classification [13], object recognition [14], and signal analysis [15,16,17].
Deep convolutional neural networks are very promising for image classification, in which the CNN model is trained on a dataset comprising images from different classes and can then classify a test image as belonging to one of those classes. In the classical classification problem, complete uncompressed raw images are used in training, and classifying a test image that arrives in compressed form is challenging both in terms of validation accuracy and in terms of the computational time required to decompress it. One interesting solution is to perform training and validation in the compressed domain, so that learning happens directly on the compressed information and validation is also performed in the same compressed domain. Compressed domain inference was studied a decade ago [18,19,20,21,22,23,24,25,26], and many works have extended it recently [27,28,29,30,31,32,33]. The key benefits of this approach are two-fold: (i) no reconstruction computation is needed; and (ii) the amount of information communicated over networks to the server is minimized. As discussed earlier, CS is a paradigm shift in sampling and compression in which only the required number of measurements is taken on the signal and the signal is reconstructed using optimization techniques [25,26]; learning on CS measurements therefore benefits from the compactness of CS. Davenport et al. [22] applied the matched filter of compressed sensing patterns to a library of pictures to produce a ‘smashed’ filter and demonstrated the validity of the random projection-based strategy for compressed domain image classification. With the same objective, Li et al. used the same SPC system but learned the sensing patterns using data-dependent “secant projections” [23]. CNNs have recently been used for this task [19,20,21,22] and have produced substantially better results. However, current neural network approaches need to build a distinct network model for each individual measurement rate (MR).
In this work, we use neural networks to perform image classification directly on CS measurements without reconstruction, and we push the limits of object recognition and classification with very low sensing and processing resources. To overcome the above limitations, we first reduced the pixel density by applying binary masking to the dataset images; after observing the resulting drop in learning accuracy, we applied a genetic algorithm to determine the binary mask to be applied to the dataset while training the CNN model. We show that mask learning improves the training accuracy for different crossover methods. Numerical experiments on CNN learning from CS measurements were conducted on standard image classification datasets, followed by a performance analysis. Two works [20,32] are similar to ours; both use a sensing mechanism that mimics the concept of the SPC. The SPC has complicated implementation details but can be modeled mathematically using matrix operations. The sensing mechanism considered in our work starts from a standard digital camera output image and retains only some selected pixels. This reduced set of pixels can be modeled as CS measurements [34]. Unlike the SPC, the CS measurements we consider are obtained by a simple spatial domain process that retains some pixels and forces the remaining pixels to zero. If the test image must be communicated in an application platform, it is enough to communicate the retained pixels, because the mask is known on both the client and server sides; on the server side, the CS measurements can easily be rearranged into an image using the mask pattern. With SPC-based sensing, the CS measurements cannot be used directly for training or testing; the inverse SPC operation has to be performed first to obtain approximate images for training or testing [20,32]. In our experiments, the key performance parameter is the training accuracy of the CNN model for a given percentage of pixels used in the learning process. The percentage of pixels not used is considered the compression ratio (CR). Since the binary mask is fixed for all training and testing images and is set to 1 at only an arbitrary 10% of locations, we propose Genetic Algorithm-based compressed learning (GACL), a novel process that improves the training accuracy of the CNN. GACL determines the best mask to apply to all images in the training set, namely the mask that maximizes the training accuracy of the CNN. This is achieved by using a genetic algorithm to determine the best chromosomes of the mask, with the CNN training accuracy as the objective function to be maximized. Our numerical experiments show that, in CNN learning on CS measurements with 10% of pixels retained (CR = 90%), the training accuracy of 77% obtained with an untrained fixed mask improves to 80% when the mask is genetically learned with vertical crossover, and improves substantially to 85% with diagonal crossover. A comparison of the classification training accuracies of the CNN with and without GACL shows that GACL improves model accuracy to a large extent. This improvement is highly data dependent and can be compared only with existing work that uses the same sensing principle and dataset.
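Because the mask is shared by client and server, only the retained pixel values need to travel over the network. The following is a minimal sketch of that idea under assumed conditions: an (H, W, 3) image, an (H, W) mask, and hypothetical helper names `pack` and `unpack`; the 64 × 64 size and 10% mask are illustrative choices, not the paper's settings.

```python
import numpy as np

def pack(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Client side: keep only pixels where the shared binary mask is 1.

    For an (H, W, C) image and an (H, W) mask this returns an (N, C) array,
    where N is the number of 1s in the mask.
    """
    return image[mask.astype(bool)]

def unpack(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Server side: scatter retained values back onto the grid; other pixels are 0."""
    out = np.zeros(mask.shape + values.shape[1:], dtype=values.dtype)
    out[mask.astype(bool)] = values
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
mask = rng.random((64, 64)) < 0.10          # hypothetical shared ~10% mask
payload = pack(image, mask)                  # what the client would transmit
rebuilt = unpack(payload, mask)              # masked image rebuilt on the server
print(payload.shape, f"{payload.size / image.size:.3f} of the raw pixel values sent")
```

The rebuilt array equals the masked image exactly, so the server-side CNN sees the same input it would have seen had the masked image itself been sent.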
3. Proposed Methodology
In this work, we use neural networks to perform image classification directly on CS measurements acquired in the spatial domain. The CS measurements are used directly for learning, without any reconstruction. The CS sensing model is implemented by applying binary masking to the dataset images, and the mask is then learned iteratively using a GA to improve learning accuracy. We experimented with different crossover methods in the GA to achieve high accuracy.
This section is organized as follows:
Section 3.1 describes the study of binary masking for compressed domain learning;
Section 3.2 describes the proposed Genetic Algorithm-Based Compressive Convolution Neural Network (GACCNN) training, which is based on our proposed Genetic Algorithm-based compressed domain learning (GACL) method; and
Section 3.3 applies GACL to multiclass datasets.
3.1. Study of Binary Masking Based Compressed Domain Learning
The database used for this work consists of RGB images of dimension belonging to two categories, with 2000 training images and 500 test images. All images in the dataset are resized to standard images in our scheme. The binary sensing matrix, which is a structured matrix, is contrasted with the primitive Walsh–Hadamard (PWH) matrix, which is essentially a random matrix. Binary sensing matrices of size are implemented to use only a few elements of the original image; these matrices act as a sieve that passes only the selected pixels. Applying the binary matrix or the PWH matrix to the original image retains only P% of its pixels and drastically reduces the number of pixels to be transmitted or stored. However, this process also significantly reduces the classification accuracy because of the distortion it introduces into the image.
Consider a typical image classification problem, such as the widely used cat-versus-dog classification problem, in which a test image has to be transmitted to a server where CNN models are deployed. The test image I must be compressed to reduce the communication time, and CS is one effective and fast way of doing so. CS is applied by multiplying the image element-wise by a known binary mask M, which forces many pixels to zero and retains only P% of them. The mathematical model is

$I_c = M \odot I$,

where $\odot$ denotes the element-wise product and $I_c$ is the compressed image used in training/testing.
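The masking model and the CR definition used in this paper can be written in a few lines. This sketch assumes images stored as (H, W, C) arrays and a random mask with an exact number of ones; the helper names `random_binary_mask`, `apply_mask`, and `retention_and_cr` are hypothetical, and the random mask only stands in for the masks actually used in the experiments.

```python
import numpy as np

def random_binary_mask(shape, fraction, rng):
    """Hypothetical helper: mask with round(fraction * size) ones at random positions."""
    size = int(np.prod(shape))
    mask = np.zeros(size, dtype=np.uint8)
    mask[rng.choice(size, size=int(round(fraction * size)), replace=False)] = 1
    return mask.reshape(shape)

def apply_mask(image, mask):
    """I_c = M ⊙ I: element-wise product, broadcast over colour channels if present."""
    m = mask[..., None] if image.ndim == 3 else mask
    return image * m

def retention_and_cr(mask):
    """P = fraction of retained pixels; CR = fraction of pixels not retained."""
    p = float(mask.mean())
    return p, 1.0 - p

rng = np.random.default_rng(1)
mask = random_binary_mask((64, 64), 0.10, rng)
image = rng.random((64, 64, 3))
compressed = apply_mask(image, mask)
p, cr = retention_and_cr(mask)
print(f"P = {p:.2%}, CR = {cr:.2%}")
```

With 10% of pixels retained, this gives P ≈ 10% and CR ≈ 90%, the operating point reported throughout the experiments.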
Among binary masks, the Hadamard mask best retains the perceptual quality of the image. We conducted an experiment in which a Hadamard binary mask retaining 50% of the pixels was applied to the test set of a CNN-based image classifier for the dataset [43]. The basic functional parameters used to configure the CNN for our experiment are listed in Table 1, and the network architecture of the CNN is shown in Figure 1. When 50% of the pixels are retained, the training accuracy of the CNN is good, whereas the validation accuracy falls well below 70%. This is because the training set contains complete, unmasked images, whereas the test set contains masked images. When the experiment is repeated with a masked training set, the validation accuracy improves to 75%. We then studied the effect of further decreasing the number of retained pixels by arbitrarily discarding pixels of the PWH mask.
Figure 2 illustrates the masking process applied to the dataset images in the proposed algorithm and its outcome for various levels of pixel retention. Figure 2a depicts the case in which 100% of the pixels are retained, and Figure 2b–e depict the images obtained by applying PWH masks that retain 50%, 25%, 15%, and 10% of the pixels.
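One way to obtain a roughly 50% Hadamard-based binary mask and thin it to lower retention levels is sketched below. It assumes a Sylvester-construction Hadamard matrix whose +1 entries become 1 (keep) and whose −1 entries become 0 (drop), with extra pixels discarded uniformly at random; the exact PWH construction used in the experiments may differ, and the helper names are hypothetical.

```python
import numpy as np

def sylvester_hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    return h

def hadamard_binary_mask(n):
    """Binary mask: +1 entries -> 1 (keep pixel), -1 entries -> 0 (drop pixel)."""
    return (sylvester_hadamard(n) > 0).astype(np.uint8)

def thin_mask(mask, target_fraction, rng):
    """Randomly switch off 1s until round(target_fraction * size) ones remain."""
    out = mask.copy()
    ones = np.flatnonzero(out)
    keep = int(round(target_fraction * out.size))
    if keep > ones.size:
        raise ValueError("target exceeds the pixels available in the mask")
    drop = rng.choice(ones, size=ones.size - keep, replace=False)
    out.flat[drop] = 0
    return out

rng = np.random.default_rng(0)
base = hadamard_binary_mask(64)              # 2080 of 4096 entries are 1 (about 50.8%)
for target in (0.25, 0.15, 0.10):
    m = thin_mask(base, target, rng)
    print(f"target {target:.0%}: retained {m.mean():.2%}")
```

Thinning only removes ones from the base mask, so each lower-retention mask keeps a subset of the pixels retained by the 50% mask.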
Using the same CNN architecture with the parameters shown in Table 1, we conducted the learning experiment in two ways: (i) using complete images for training (100% of pixels retained) and PWH masking of the test set (retaining 50%, 25%, 15%, or 10% of pixels); and (ii) masking both the training and testing sets. The experimental results are tabulated in Table 2. When 100% of the pixels are retained in both the training and testing datasets, no compression is involved, and the model reaches 97% training accuracy. The model accuracy and loss curves for this case are shown in Figure 3a. When only a portion of the pixels is retained to achieve compression, accuracy drops, as Table 2 shows. Note, however, that the training accuracy is better in case (ii), in which both the training and testing datasets are masked, than in case (i), in which the training set is not masked. The CNN model accuracies and loss functions for case (i) are shown in Figure 3b–e, and those for case (ii) in Figure 3f–i.
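The two experimental protocols differ only in whether the training images are masked. The sketch below shows the data preparation for each case; the array layout (N, H, W, C), the random stand-in data, and the helper name `build_case` are assumptions made for illustration, and the CNN training itself is omitted.

```python
import numpy as np

def build_case(train_images, test_images, mask, case):
    """Return (train, test) arrays for the two protocols.

    Arrays are (N, H, W, C); `mask` is a shared (H, W) binary mask.
    case "i":  train on complete images, test on masked images.
    case "ii": mask both the training and the test images with the same mask.
    """
    m = mask[None, :, :, None]               # broadcast over batch and channel axes
    masked_test = test_images * m
    if case == "i":
        return train_images, masked_test
    if case == "ii":
        return train_images * m, masked_test
    raise ValueError("case must be 'i' or 'ii'")

rng = np.random.default_rng(2)
train = rng.random((8, 32, 32, 3))            # stand-in data, not the real dataset
test = rng.random((4, 32, 32, 3))
mask = (rng.random((32, 32)) < 0.25).astype(train.dtype)
tr_i, te_i = build_case(train, test, mask, "i")
tr_ii, te_ii = build_case(train, test, mask, "ii")
```

In case (ii) the network sees the same zero pattern at training and test time, which is consistent with its better accuracy in Table 2.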
3.2. Genetic Algorithm-Based Compressed Domain Learning
The experiment in Section 3.1 is a proof of concept that a CNN can learn from only 10% of distributed pixels in the training set, with a training accuracy of 77%, and can recognize images with a validation accuracy of 61%. This is possible because, even when only 10% of the pixels are retained, the PWH-based binary mask distributes them well over all regions of the image. Note that the same binary mask is used for all images in the training and testing sets. Because the mask is a random pattern with 1s at 10% of locations and 0s elsewhere, there is considerable scope to search for a mask that gives better accuracy than the 77% obtained experimentally. The search space is large, so we propose using a genetic algorithm (GA) to evolve the mask so as to maximize the training accuracy, which is taken as the objective function of the GA. A generalized block diagram of Genetic Algorithm-based compressed learning (GACL) for a two-class dataset is shown in Figure 4; it depicts the workflow for the training and testing models.
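To make "the search space is large" concrete, the number of distinct masks with exactly k ones among n pixels is the binomial coefficient C(n, k). The short calculation below assumes a hypothetical 64 × 64 mask with 10% ones (the paper's image size is not restated here) and uses log-gamma to avoid building enormous integers.

```python
import math

def log10_mask_count(height: int, width: int, fraction: float) -> float:
    """log10 of C(n, k): number of binary masks with k = round(fraction * n) ones."""
    n = height * width
    k = round(fraction * n)
    # lgamma(x + 1) = ln(x!), so this is ln(n! / (k! (n - k)!)) / ln(10)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(10)

# Hypothetical 64 x 64 mask with 10% of entries set to 1
digits = log10_mask_count(64, 64, 0.10)
print(f"about 10^{digits:.0f} candidate masks")
```

With several hundred decimal digits of candidates, exhaustive search is out of the question, while each candidate evaluation (a CNN training run) is expensive; this is why a population-based heuristic such as a GA is attractive here.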
This work begins by training on the widely used Cat and Dog dataset. One highlight of this work is that, in addition to training on the original images, we used two types of fixed binary sensing patterns as binary sensing matrices in the genetic algorithm. The original images are fused with the binary sensing and primitive Walsh–Hadamard matrices, the training and testing datasets are created from this generated dataset, and the pixel density can be varied as discussed in Section 3.1 before the data are applied to the CNN model. Because the masks are generated arbitrarily, each yields a different training accuracy; the masks are termed chromosomes, and the training accuracies achieved by the CNN for the top ten chromosomes are displayed in Figure 5.
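The population and selection steps can be sketched as follows. In GACL the fitness of a chromosome is the CNN training accuracy obtained with that mask; because training a CNN is out of scope for a short example, `toy_fitness` below is a hypothetical stand-in that simply rewards ones in the image centre. The helper names, the population size of 30, and the 64 × 64 mask size are also assumptions.

```python
import numpy as np

def random_population(size, shape, fraction, rng):
    """Hypothetical initial population: `size` random masks, each with the same number of 1s."""
    n = shape[0] * shape[1]
    k = int(round(fraction * n))
    pop = np.zeros((size, n), dtype=np.uint8)
    for row in pop:                               # each row is a view into pop
        row[rng.choice(n, size=k, replace=False)] = 1
    return pop.reshape(size, *shape)

def top_chromosomes(population, fitness_fn, k=10):
    """Score every mask and return the k best masks with their scores, best first."""
    scores = np.array([fitness_fn(m) for m in population])
    order = np.argsort(scores)[::-1][:k]
    return population[order], scores[order]

def toy_fitness(mask):
    """Stand-in for CNN training accuracy: share of ones in the image centre."""
    return float(mask[16:48, 16:48].mean())

rng = np.random.default_rng(3)
population = random_population(30, (64, 64), 0.10, rng)
best, best_scores = top_chromosomes(population, toy_fitness, k=10)
print(np.round(best_scores, 4))
```

Swapping `toy_fitness` for a function that trains the CNN on masked images and returns its training accuracy would turn this into the selection step described above, at a much higher cost per evaluation.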
To create offspring, a crossover point is chosen on the two chromosomes with the highest accuracy. On these two chromosomes, the crossover combines the left half of the first matrix with the right half of the second matrix, and the right half of the first matrix with the left half of the second matrix, as shown in Figure 6c,d. The resulting matrices were used to mask the training and testing set images in the CNN algorithm; the results are tabulated in Table 3, and the resulting model accuracy graphs are shown in Figure 7a,b. Although the best accuracy achieved, 80%, is better than that of the method without GA, the result was not convincing, so the experiment was extended with a diagonal crossover method, as shown in Figure 6e,f. In this method, the crossover combines segments that are separated diagonally, and it achieves an improved accuracy of 85% (Table 3), which is better than the method without GA and acceptable for classification. The resulting model accuracy graphs are shown in Figure 7c,d. This novel method of genetically learning the best pattern of pixels to retain for accurate CNN classification suggests new directions for compressed sensing learning and could be very useful for faster communication of test images on an IoT platform.
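The two crossover operators can be sketched on 2-D binary masks. The vertical (half-and-half) operator follows the description above directly. For the diagonal operator, the text says only that the segments are "separated diagonally", so the split along the main diagonal used here (upper triangle including the diagonal from one parent, strict lower triangle from the other) is an assumption; the toy 4 × 4 parents are for illustration only.

```python
import numpy as np

def vertical_crossover(a, b):
    """Half-and-half crossover: (left of a | right of b) and (left of b | right of a)."""
    mid = a.shape[1] // 2
    child1 = np.hstack([a[:, :mid], b[:, mid:]])
    child2 = np.hstack([b[:, :mid], a[:, mid:]])
    return child1, child2

def diagonal_crossover(a, b):
    """Assumed diagonal crossover: swap the regions above and below the main diagonal."""
    upper = np.triu(np.ones(a.shape, dtype=bool))   # upper triangle incl. diagonal
    child1 = np.where(upper, a, b)
    child2 = np.where(upper, b, a)
    return child1, child2

a = np.ones((4, 4), dtype=np.uint8)    # parent 1 (toy)
b = np.zeros((4, 4), dtype=np.uint8)   # parent 2 (toy)
v1, v2 = vertical_crossover(a, b)
d1, d2 = diagonal_crossover(a, b)
print(v1, d1, sep="\n\n")
```

Neither operator guarantees that a child keeps exactly 10% ones, since each parent may concentrate its ones differently across the two regions; the text does not state how the retention level is controlled after crossover.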
3.3. GACL for Multiclass Datasets
The proposed GACL method works well on a two-class dataset. To examine it further, we applied the same procedure to multiclass images by adding more user-defined classes to the dataset used in the two-class experiment. We first trained on a five-class dataset: with 10% of the pixels retained, the training accuracy was 66.34%; with GA, the accuracy was 50% with conventional crossover and improved to 67% with diagonal crossover, as tabulated in Table 4 and Table 5. Note that although the training accuracy does not improve drastically, the improvement in validation accuracy is considerable. The model accuracy, receiver operating characteristic (ROC) curves, loss, and confusion matrices are shown in Figure 8 and Figure 9.
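For reference, a confusion matrix and the overall accuracy can be computed from predicted and true labels as below. This is a generic sketch, not the evaluation code behind Figures 8 and 9; the helper names and the toy five-class labels are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each (true, pred) pair
    return cm

def accuracy(cm):
    """Overall accuracy: correct predictions (diagonal) over all predictions."""
    return float(np.trace(cm) / cm.sum())

# Toy five-class labels, purely illustrative
y_true = [0, 1, 2, 2, 3, 4, 4, 1]
y_pred = [0, 1, 1, 2, 3, 4, 0, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=5)
print(cm)
print(f"accuracy = {accuracy(cm):.3f}")
```

Off-diagonal entries show which classes are confused with each other, which an overall accuracy figure alone cannot reveal in the multiclass setting.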
4. Results and Discussion
This research is a proof of concept showing that a CNN-based classifier can be trained on images in which only a few pixels are retained and the rest are forced to zero, which reduces the amount of information to be communicated in an application platform. This rests on the fact that the retained pixels can be treated as CS measurements and used directly in the training process. This CS process can easily be implemented in practice with simple digital cameras. In the image obtained by retaining only 50% of the pixels, shown in Figure 2b, a human observer can still recognize a cat, and the same holds for Figure 2c–e. Intuitively, a deep CNN, which mimics the human visual system, should also be able to recognize images from CS images that contain only a fraction of the pixels. We also consider a dataset of natural images rather than simple images, and it is interesting to note from the classification results in Table 2 that, for CS test images, CNN models trained on CS images achieve better training accuracy than models trained on complete images. The percentage of pixels not retained can be considered the CR; for a CR of 90%, the training accuracy is 77% and the validation accuracy is 61%. The CNN used here is a very basic architecture that is widely used for common image recognition applications. For completeness, all the model accuracy graphs are presented in Figure 3a–e. To demonstrate the efficacy of the proposed GACL algorithm, the results obtained with the binary mask learned by the GA are given in Table 3: at 90% CR, the training accuracy improves from 77% to 85% with diagonal crossover, as supported by the model accuracy graphs in Figure 7a–d. These results validate the performance of the novel GACL method, which could be integrated with any other well-performing CNN.