1. Introduction
Technology and the Internet have become vital to nearly every aspect of human life. Many institutions now rely almost entirely on technology: correspondence is exchanged by email, and in many cases data are stored in the cloud, which has become more secure than personal devices or even institutions’ own servers.
Healthcare is one of the most important sectors to adopt technology in many of its aspects. With the development of scanning and imaging devices, such as MRI and X-ray machines, medical images are produced and stored every day, in large volumes, in clinics, hospitals, and on physicians’ personal computers.
Medical images are considered among the most sensitive data transferred or stored over the Internet [
1]. Thus, preserving their privacy has become a pressing research problem that has attracted considerable effort toward proper solutions.
Encryption is one of the best solutions proposed for this problem. Several encryption algorithms have been created and used on data in general and medical images specifically.
Nevertheless, medical images vary in size from small to large, reaching over 4000 × 4000 pixels, and even larger for color images. Encrypting large images can be time-consuming, especially when additional steps are introduced to strengthen the encryption process against possible malicious attacks on medical images [
2]. We should note that practical encryption techniques such as AES cannot by themselves provide authentication and integrity [
3]; hence, they are usually combined with other techniques to be considered reliable.
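For example, integrity and authenticity can be layered on top of a cipher with an encrypt-then-MAC construction; the sketch below uses Python’s standard hmac module over placeholder ciphertext bytes (the cipher itself is out of scope here, so the ciphertext and key values are purely illustrative):

```python
import hmac
import hashlib

def attach_tag(ciphertext: bytes, mac_key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can verify integrity/authenticity."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def verify_and_strip(message: bytes, mac_key: bytes) -> bytes:
    """Recompute the tag over the ciphertext part and compare in constant time."""
    ciphertext, tag = message[:-32], message[-32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return ciphertext

mac_key = b"shared-mac-key"          # illustrative key, separate from the cipher key
ct = b"\x8a\x11\x42\x07\x9c"         # placeholder for AES-produced ciphertext
msg = attach_tag(ct, mac_key)
assert verify_and_strip(msg, mac_key) == ct
```

Any modification of the transmitted bytes then causes the receiver’s verification step to fail, which is the property AES alone does not provide.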
Images in general have also been the subject of research in artificial intelligence (AI) systems. Various studies have applied AI models that perform classification [
4,
5], clustering [
6], segmentation [
7,
8], generation of fake images [
9], denoising [
10], and inpainting [
11].
Autoencoders are used with medical images to extract necessary features and reconstruct the images with remarkable accuracy [
12]. Encoding medical images using autoencoders is a well-known deep learning method that reduces the dimensionality of the images into smaller, compact representations [
13]. The size of the data generated by the autoencoder can be controlled through the architecture of the autoencoder. The encoded data can be used to regenerate the original images; however, they look entirely different from the original data and cannot be visually interpreted as a representation of it. Hence, even if the encryption of the encoded data of a medical image is compromised, the content of the image cannot be viewed maliciously, whereas compromising an encryption of the original image content exposes that content directly [
2].
Autoencoders have also been widely used as a powerful denoising tool. The work in [
14,
15,
16] addressed the benefit of using autoencoders for medical image denoising. Medical images are prone to different types of noise and poor quality due to the technology used for taking the images [
14,
16]. In this work, we use an autoencoder as part of encrypting medical images to overcome the problem of malicious viewing, which is common with medical images. Encrypting the autoencoder’s extracted features also reduces the amount of data to be encrypted and transferred, resulting in a faster encryption process.
The AES encryption algorithm has proven to be a robust and reliable technique for transferring data securely over the Internet [
17]. AES is widely used in developing highly secure encryption techniques, such as those in [
17,
18,
19] and many others.
The autoencoder, illustrated in
Figure 1, is a deep learning model used to perform several tasks, such as denoising and inpainting. During training, the encoder extracts the important features of an image and compresses them into a bottleneck layer. The decoder then uses these features to reconstruct the image, either after removing noise (denoising) or by filling in missing regions (inpainting).
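The dimensionality reduction performed by the encoder can be imitated, purely at the level of array shapes, by a single 2 × 2 max pooling step; a minimal NumPy sketch follows (the 224 × 224 × 3 input size is an assumption chosen to match the 112 × 112 × 3 bottleneck reported later, and a trained encoder would of course apply learned convolutions as well):

```python
import numpy as np

def max_pool_2x2(image: np.ndarray) -> np.ndarray:
    """Downsample an H x W x C array by taking the max over each 2x2 spatial block."""
    h, w, c = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

image = np.random.rand(224, 224, 3).astype(np.float32)
features = max_pool_2x2(image)   # bottleneck-sized representation
print(features.shape)            # (112, 112, 3)
```

This illustrates how the architecture controls the size of the encoded data: each pooling stage halves the spatial dimensions, so the bottleneck shape is a design choice.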
Both symmetric and asymmetric encryption techniques aim to protect data confidentiality, integrity, and authenticity over the Internet and other computer-based systems, such as cloud computing. Symmetric encryption uses the same key to encrypt and decrypt data, whereas asymmetric encryption uses different keys on the encryption and decryption sides. Symmetric encryption is faster and requires fewer hardware and software resources; transmitting large amounts of data via asymmetric techniques is considered impractical. Symmetric key algorithms alone cannot provide authentication and integrity, so they should be combined with other techniques to be considered practical [
3].
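The defining property of the symmetric setting, one shared key for both directions, can be shown with a toy keystream cipher; this stand-in is not secure and is used here only because a real AES implementation would require a third-party library:

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream.
    A stand-in for AES for illustration only -- not secure for real use."""
    out, counter = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

key = b"shared secret key"
plaintext = b"patient scan metadata"
ciphertext = keystream_xor(key, plaintext)          # sender encrypts
assert keystream_xor(key, ciphertext) == plaintext  # receiver decrypts with the same key
```

In an asymmetric scheme, by contrast, the decryption key would differ from the encryption key, at a substantially higher computational cost per byte.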
This research proposes a medical image cryptosystem; the system uses an autoencoder to extract the important features from the image on the sender’s side. These features are then encrypted using the state-of-the-art Advanced Encryption Standard (AES) [
20], and they are then sent to the receiver. After decrypting the features, the receiver uses the decoder part to reconstruct the original image.
Consequently, in this study, we propose a robust medical image encryption algorithm that applies a deep learning model before encrypting the data with AES. The deep learning model is an autoencoder, which minimizes the data to be encrypted and transferred, since only the output of the encoder part of the autoencoder is transmitted. This allows medical images to be transmitted without sharing the real content of the image. On the decryption side, the decoder regenerates the medical image from the transmitted data after decryption. Encrypting the encoder output makes extracting information from the encrypted data over vulnerable transmission channels almost impossible: even if malicious parties access the data, the image itself is never transferred, and even if the data are decrypted, no conclusions can be drawn from them without the secret autoencoder model. The autoencoder also enhances the quality of the encrypted images, as it acts as a denoising tool.
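The proposed flow can be sketched end to end; the encoder and decoder below are crude stand-ins (max pooling and nearest-neighbour upsampling) and the cipher is a placeholder XOR keystream, so the sketch mirrors only the shapes and the order of operations, not the trained autoencoder or AES itself:

```python
import hashlib
import numpy as np

def encode(image):                      # stand-in for the trained encoder
    h, w, c = image.shape
    return image.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def decode(features):                   # stand-in for the trained decoder
    return features.repeat(2, axis=0).repeat(2, axis=1)

def xor_cipher(key, data):              # placeholder for AES-256 encryption/decryption
    out, ctr = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(b ^ k for b, k in zip(data, out))

key = b"shared secret key"
image = np.random.rand(224, 224, 3).astype(np.float32)

# Sender: extract features, encrypt their raw bytes, transmit.
feats = encode(image)
ct = xor_cipher(key, feats.tobytes())

# Receiver: decrypt the bytes, restore the feature matrix, run the decoder.
restored = np.frombuffer(xor_cipher(key, ct), dtype=np.float32).reshape(feats.shape)
reconstruction = decode(restored)
print(reconstruction.shape)             # (224, 224, 3)
```

Note that what travels over the channel is only the encrypted feature bytes; without the decoder model, even the decrypted bytes are just a feature matrix, not a viewable image.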
1.1. Summary of Contributions
The contributions of this research are listed in the following points:
We present a new technique for image encryption where deep learning (autoencoder) has been used to generate the shared encrypted data.
We present an encryption model that allows control of the size and structure of the data being encrypted and transmitted by using the autoencoder as a feature extractor instead of encrypting the actual image content.
We present an efficient encryption model that can denoise medical images during the decryption process.
Previous work that combined deep learning with cryptography used deep learning mainly as an obfuscation tool to enhance data hiding and prevent malicious viewing of the data carried in ciphers. This work instead uses deep learning as an enhancement prior to the encryption process. During encryption, deep learning minimizes the size of the data to be encrypted and maps the original data into another domain where malicious viewing is almost impossible. During transmission, even if the encryption is broken, the transferred data are only the features extracted by the autoencoder; hence, the attacker obtains useless data. This use of deep learning tools such as autoencoders can be considered a state-of-the-art technique that can efficiently improve the encryption process for medical images and many other forms of data.
1.2. Paper Organization
The paper is organized as follows. The next section reviews some significant work related to the current research. Then the proposed encryption model is presented. The fourth section presents the experiments and the results, and finally, the conclusion and future work are described in the last section.
2. Related Work
Data encryption has always been considered essential to protect digital data and information, especially during transmission over different channels. One form of information that has attracted special attention is medical images, which usually require special encryption methods and techniques to hide the information they contain [
21,
22]. Medical images also cannot tolerate data loss during encryption and transmission, owing to the importance of the details they carry and their role in the diagnosis process.
Medical image encryption was recently discussed in [
18,
23,
24]. In this work, we take a different path from the ones taken in the aforementioned studies: we focus on using deep learning methods, namely autoencoders, to transfer medical images safely and efficiently.
Special encryption methods for medical images were recently proposed, aiming to enhance the encryption process in many ways. For example, the work in [
21] proposed an encryption technique using the SCAN technique and a chaotic tent map system to enhance the security measures of the encryption process.
Medical image homomorphic encryption was discussed in [
25] to allow access to medical images in their encrypted form, which is useful when images are processed in the cloud. The study showed that homomorphic encryption has a very high computational cost, and the authors proposed a partitioning technique combined with a multi-agent approach to overcome this problem.
Machine learning and deep learning methods were used with medical images for multiple purposes, such as disease detection [
26], dermatology health care services [
27], and image segmentation [
28,
29]. Deep learning methods for improving medical image encryption techniques have also been proposed in the literature. The work presented in [
30] aims to obfuscate medical images so that human eyes cannot detect the important features, while the images can still be used to train deep learning models with an acceptable loss of accuracy. The work used a variational autoencoder (VAE) and a random non-bijective pixel intensity mapping to protect the content of medical images while keeping them usable for training DL models with good results.
Similarly, the authors of [
2] proposed an image encryption algorithm based on a deep learning model, intended to encrypt medical images from Internet of Medical Things systems. Their model (DeepEDN) consists of a cycle generative adversarial network (Cycle-GAN) trained to transform medical images into another form that serves as a cipher image sent to the receiver, where the original image is reconstructed (decrypted). The model was proven secure against several types of attacks, including ciphertext-only, chosen-ciphertext, known-plaintext, and chosen-plaintext attacks.
The authors of [
31] built, in some phases, on the work of [
2]. They proposed an autoencoder network mainly used to scramble the image and generate a key. Then they used the Cycle GAN, presented by [
2], to change the image into a different form on the sender’s side. On the receiver’s side, the operation is reversed using the same structure to reconstruct the scrambled image and then descramble it to retrieve the original. The parameters of the GAN network serve as public and private keys, creating an asymmetric encryption system.
As for [
32], they also used a GAN to remove the linear nature of an image encryption system. They claim that combining a GAN with SHA-256 and a chaotic system can create a cryptosystem immune to the known-plaintext and chosen-plaintext attacks that usually target linear image encryption systems. They start by adding noise to the original image and then, using logistic maps, convert it into a cipher image that can be sent to the receiver safely. Relying on the non-linear nature of the GAN, they showed that their system can resist well-known attacks such as known- or chosen-plaintext attacks.
Similar techniques were applied in [
33]. Their encryption model first uses logistic maps to scramble the image; an autoencoder then encrypts the scrambled image to create a cipher image.
On the other hand, ref. [
14] studied the efficiency of using autoencoders to denoise medical images. The autoencoder was trained on a flattened dataset in which each row, representing an image, was processed by adding Gaussian and Poisson noise with different parameter values. The testing results showed visual and measurable enhancement of corrupted images. The autoencoder improved the quality of noisy medical images even with small datasets; extremely corrupted images, whose original content was almost invisible after the noise was added, were restored so that the image content became visible. The study presented in [
14] showed that autoencoders could perform better than median filters commonly used in denoising medical images. Using autoencoders to denoise noisy bio-medical images was presented in [
34]; the work showed that autoencoders can eliminate the noise added to the images even with a very high noise factor.
It can be noticed that GANs, autoencoders, or any other neural networks used in these systems are not used alone to create encrypted images: a noising or scrambling phase is applied to the image before it is fed into the network to create the cipher image. The main target in the previous literature was to add randomness to the original image before encrypting it, through either scrambling or noising, while non-linearity was added through the NN architecture. To the best of our knowledge, our work is the first in the literature to use an autoencoder network as a pre-processing step before the encryption algorithm. In our encryption model, the encrypted data are not the image itself; rather, the extracted features are the data to be encrypted. We also demonstrate the efficiency of the autoencoder in denoising medical images through the encryption process.
Deep learning has found its way into many applications and moved them into a more sophisticated, intelligent scope. It is still new to cryptography, and few studies use deep learning methods in cryptographic applications. This work is motivated by the special requirements that arise when efficient encryption schemes are applied to medical images, and it uses the autoencoder model to enhance both the security and the time efficiency of the encryption process. Other work that used deep learning for the security of medical images mainly used it to produce ciphered data rather than suitable encrypted content.
Table 1 summarizes other work in scope and compares it to our work.
4. Experiment and Results
The experiment was conducted on Colab Pro. The dataset was split into 80% for training and 20% for testing. The proposed model consists of three main steps.
The first step was to train the autoencoder. In the proposed model, the autoencoder was trained for 60 epochs using the ‘adamax’ optimizer and mean squared error as the loss function; the model achieved an accuracy of 83%.
Figure 5 shows the training and testing loss. The two losses are almost the same, which indicates that the model generalizes well.
The second step was to use the trained encoder to extract the features from the image, encrypt them using the AES-256 encryption algorithm, and send the encrypted data to the intended party.
Figure 6a shows the original image before any processing, and
Figure 6b shows the features extracted from the original image (the output of the encoder model). It is worth noting that the output of the encoder is not an image-like structure; it has been rendered as an image for visualization purposes only. In other words, the extracted features do not conform to an image; they are stored in an n-dimensional matrix of floating-point data. Because the output of the encoder happens to have three dimensions, we were able to convert it into an image. The size of the feature matrix was (112 × 112 × 3).
Figure 6c represents the encrypted data output; as previously noted, the extracted features were encrypted using 256 keys derived from the original shared key. The first 256 rows of the features were encrypted using the 256 keys, and the process was repeated until all features were encrypted. The size of the encrypted data was (112 × 112 × 3).
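One plausible reading of this row-wise scheme can be sketched as follows; the subkey derivation (hashing the shared key with an index) and the XOR stand-in for AES-256 are illustrative assumptions, not the paper’s exact construction:

```python
import hashlib
import numpy as np

def derive_subkeys(master_key: bytes, n: int) -> list:
    """Derive n 32-byte subkeys from the shared key (hashing with an index is an
    illustrative choice, not necessarily the derivation used in the paper)."""
    return [hashlib.sha256(master_key + i.to_bytes(4, "big")).digest() for i in range(n)]

def xor_bytes(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher standing in for AES-256 applied to one row of features."""
    stream = (key * (len(data) // len(key) + 1))[: len(data)]
    return bytes(b ^ k for b, k in zip(data, stream))

features = np.random.rand(112, 112, 3).astype(np.float32)
subkeys = derive_subkeys(b"shared secret key", 256)

# Encrypt row i with subkey i mod 256, cycling until every row is covered.
rows = [xor_bytes(subkeys[i % 256], features[i].tobytes()) for i in range(features.shape[0])]

# Decryption reverses the same per-row operation with the same subkeys.
decrypted = [xor_bytes(subkeys[i % 256], rows[i]) for i in range(features.shape[0])]
recovered = np.stack([np.frombuffer(r, dtype=np.float32).reshape(112, 3) for r in decrypted])
assert np.array_equal(recovered, features)
```

Because the cipher operates on the raw bytes of the float32 rows, the recovered feature matrix is bit-for-bit identical to the original, which is what the decoder requires.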
The third and final step was to use the same AES256 key for decryption and then use the decoder to reconstruct the image.
Figure 6d represents the decrypted features.
Figure 6e illustrates the reconstructed image after using the decoder.
The encryption process takes place after the features are extracted by the autoencoder; AES encryption is performed on the extracted features rather than on the medical image data. Using the decoder on the decryption side to reconstruct the original image from the decrypted feature data gives the system additional robustness.
Table 3 shows the time analysis conducted on the proposed model. It can be noticed that the encryption and decryption times are reduced when using the proposed model.
It is worth noting that the well-known evaluation metrics used in the literature to evaluate medical image encryption algorithms cannot be used in our case, as the output of the encoder is not an image but a bulk of positive and negative floating-point values produced by the deep learning model.
Thus, to compare our work with previous related work, ref. [
2] has been chosen for this comparison. Their work used a deep learning model (a GAN) as an integral part of the encryption and decryption processes. In our case, the deep learning model (an autoencoder) was trained so that the encoder serves as a preliminary step before encryption and the decoder as a subsequent step after decryption. When comparing both models with the state-of-the-art AES, the model proposed by [
2] reduced the original encryption time of AES by 50%; in our model, the reduction was 72%.
Table 4 illustrates this comparison.
As previously noted, one of the applications of an autoencoder is denoising images. Medical images may gain some noise during their capturing due to a device problem or even an unclear lens. Thus, in addition to reducing the size of the data to be encrypted, using an autoencoder has also helped reduce the percentage of noise in medical images produced by the decoder in the final step.
Experiments were conducted to confirm this effect on some of the images in the testing dataset. Noise was added to selected images before inputting them into the model, and these images passed through all the phases: encoding, encryption, decryption, and decoding. It was found that the amount of noise in the resulting images was reduced.
Two metrics have been used to compare the amount of noise in the input images and the output images: structural similarity (SSIM) and mean squared error (MSE) [
36]. Structural similarity, illustrated in Equation (1), is a metric that gives a percentage of similarity between two images; thus, a higher value indicates a better result:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \quad (1)$$

In the equation, $\mu_x$ and $\mu_y$ indicate the local means, $\sigma_x$ and $\sigma_y$ represent the standard deviations, and $\sigma_{xy}$ represents the cross-covariance of both images; $C_1$ and $C_2$ are small stabilising constants.
As for the mean squared error metric, illustrated in Equation (2), it calculates the amount of error, or difference, between two images; thus, a lower error value is better:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \big(O(i, j) - N(i, j)\big)^2 \quad (2)$$

In the equation, $O$ and $N$ represent the original and the noisy images, respectively, and $m$ and $n$ represent the number of pixels along each dimension. For further illustration,
Table 5 summarizes the used mathematical notations.
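Both metrics can be computed directly from Equations (1) and (2); the sketch below uses global whole-image statistics for SSIM (standard SSIM averages windowed local statistics) and the common stabilising constants for images scaled to [0, 1]:

```python
import numpy as np

def mse(original: np.ndarray, noisy: np.ndarray) -> float:
    """Equation (2): mean of squared pixel differences (lower is better)."""
    return float(np.mean((original - noisy) ** 2))

def ssim_global(x: np.ndarray, y: np.ndarray,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """Equation (1) with global statistics (standard SSIM averages local windows)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

img = np.random.rand(64, 64)
assert mse(img, img) == 0.0
assert abs(ssim_global(img, img) - 1.0) < 1e-6
```

An identical pair of images yields MSE 0 and SSIM 1, which is why higher SSIM and lower MSE indicate less residual noise in the comparisons below.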
Table 6 illustrates the resulting SSIM and MSE scores, comparing the amount of noise between the original and the noisy images on one side and between the original and the decoder output images on the other. For all types of added noise, the decoder reduced the noise in the resulting images.
Analysis and Discussion
This sub-section discusses the results illustrated earlier. The encrypted data in our scheme are not the content of the original image but a compressed version of it; that is, we encrypt the features of the image. These features are floating-point numbers whose values are not visually related to the image’s content, and AES is robust for encrypting floating-point data. In other words, the encrypted data are no longer a medical image.
Encrypting the extracted features has minimized the amount of data that needs to be encrypted, so the time needed for encryption and decryption was reduced. Aside from changing the nature of the image, the encryption and decryption times were reduced by approximately 72%. This has also significantly helped resist malicious viewing of encrypted medical images.
The denoising effect of the autoencoder stems from the max pooling layer in the model [
37]. This layer reduces the size of the input image (matrix) by keeping only the most prominent features; noise, in this case, is not considered an important feature and is thus discarded.
In our model, one max pooling layer has been used. We noticed that if a second max pooling layer is added, the resulting image is even less noisy, but it becomes blurred. Because this model is intended for medical images, and given the sensitivity of these images, preserving the important features clearly is more important than removing a larger amount of noise; thus, we settled for one max pooling layer.
Regarding the quality of the autoencoder’s output compared to the original image, the MSE was 0.0019 and the SSIM was 0.9528, indicating a loss of only 0.0472, which can be neglected.
Returning to
Figure 6b–d, these do not represent the actual outputs of the proposed model, which are in fact sets of floating-point values, not image content. The outputs were reframed as the images shown earlier so that the reader can visualize the process taking place. In addition, we intentionally added a dense layer with three channels to the autoencoder so that its output could be presented to the reader as an image; in real-life applications, this last layer will not be applied, and presenting the transmitted data (features) as an image will not be an option.