4.1. Security Analysis
This section provides the overall security analysis of our privacy-preserving CNN inference protocol.
For any $\kappa$-bit message $m$, Theorem 1 states that the probability for a probabilistic polynomial time (PPT) adversary $\mathcal{A}$ to output a correct guess of $m$ is negligible.
Theorem 1. Given the ciphertext of a $\kappa$-bit message $m$ generated using iPPSP, the probability for a PPT adversary $\mathcal{A}$ to output a correct guess $m'$ for $m$ follows:
$$\Pr[m' = m] \le \frac{1}{2^{\kappa}} + \mathsf{negl}(\lambda),$$
where $\mathsf{negl}(\lambda)$ is a negligible function regarding the security parameter $\lambda$, $m'$ is the guess for $m$ made by $\mathcal{A}$, and $\Pr[m' = m]$ denotes the probability that the adversary makes a correct guess on $m$.

Proof. Given a $\kappa$-bit input $m$, iPPSP encrypts it with an $n$-bit random number $r$ generated by padding $T$ random numbers of $l$ bits with a secure PRG, which indicates that the distribution of $r$ should be indistinguishable from uniform. To make a correct guess on $m$ without the ciphertext, $\mathcal{A}$ has $\Pr[m' = m] = 1/2^{\kappa}$. Given the ciphertext $c = m + r$, we differentiate two cases to discuss the impact of the value of $c$. In the first case, $2^{\kappa} - 1 \le c \le 2^{n} - 1$, we have $\Pr[m' = m \mid c] = 1/2^{\kappa}$, since $c$ is indistinguishable from uniform within the range. If $c < 2^{\kappa} - 1$ or $c > 2^{n} - 1$, then the distribution of $m$ will be affected by $c$ and the total possible inputs are reduced to $c + 1$ or $2^{n} + 2^{\kappa} - 1 - c$, respectively. Hence, we have $\Pr[m' = m \mid c] = 1/(c + 1)$ or $\Pr[m' = m \mid c] = 1/(2^{n} + 2^{\kappa} - 1 - c)$, respectively. In these cases, $\Pr[m' = m \mid c] > 1/2^{\kappa}$.

Consider the second case, when $c < 2^{\kappa} - 1$ or $c > 2^{n} - 1$; this happens only when the value of the encryption pad $r$ is either too big or too small. More specifically, this happens only when $r < 2^{\kappa} - 1$ or $r > 2^{n} - 2^{\kappa}$. The probability of this event is
$$\Pr[\text{case 2}] \approx \frac{2^{\kappa + 1}}{2^{n}} = \frac{1}{2^{n - \kappa - 1}}.$$
Thus, we require $n - \kappa - 1$ to be greater than some pre-set security parameter $\lambda$. In this case, the probability $\Pr[\text{case 2}]$ is negligible, and we denote it as $\mathsf{negl}(\lambda)$. Combining the two cases:
$$\Pr[m' = m] \le \frac{1}{2^{\kappa}} \cdot \Pr[\text{case 1}] + 1 \cdot \Pr[\text{case 2}] \le \frac{1}{2^{\kappa}} + \mathsf{negl}(\lambda).$$
Hence, we have $\Pr[m' = m] \le 2^{-\kappa} + \mathsf{negl}(\lambda)$, which concludes the proof. □
Theorem 1 shows that, with appropriate system parameters, the ciphertext distribution cannot be distinguished with non-negligible probability by a PPT adversary. Since the inputs of different CNN layers are independently encrypted, this property holds for the entire CNN inference task.
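To make the scheme described above concrete, the following is a minimal sketch of pad-based encryption and decryption, assuming illustrative parameters and function names (the paper's actual key-pool layout, PRG-based padding, and algorithm details may differ): the pad is the sum of $T$ keys drawn with replacement from an $s$-element pool, and the ciphertext is simply $m + r$.

```python
import secrets

# Illustrative parameters (not the paper's exact choices):
# S = key-pool size, T = keys summed per pad, L = key bit-length.
S, T, L = 8192, 10, 64

def keygen(s=S, l=L):
    """Sample an s-element pool of l-bit secret keys."""
    return [secrets.randbits(l) for _ in range(s)]

def gen_pad(pool, t=T):
    """Pad r = sum of T keys drawn with replacement; the drawn
    indices are the client-side randomness kept secret."""
    idx = [secrets.randbelow(len(pool)) for _ in range(t)]
    return sum(pool[i] for i in idx), idx

def encrypt(m, pool):
    """Additive one-time-pad-style encryption: c = m + r."""
    r, idx = gen_pad(pool)
    return m + r, idx

def decrypt(c, pool, idx):
    """Recover m by re-deriving the pad from the secret indices."""
    return c - sum(pool[i] for i in idx)
```

For instance, `pool = keygen(); c, idx = encrypt(42, pool)` followed by `decrypt(c, pool, idx)` returns 42, while the edge device only ever sees `c`.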
Next, we show that our protocol is semantically secure under the chosen-plaintext attack (CPA) model. Assume a PPT adversary $\mathcal{A}$ has access to an encryption oracle $\mathcal{O}$. When $\mathcal{A}$ queries the oracle, it chooses a message $m$ on its own and sends it to the oracle. The oracle will execute the protocol and output an encryption $c = m + r$ of message $m$, where $r$ is sampled through our key-generation algorithm. Recall that, during pad generation, the client generates $T$ random numbers to identify elements in the key pool $\mathcal{K}$. Thus, the total number of possible combinations of the pad is given by $s^{T}$. For security reasons, we ensure that the probability that the same pad is re-used is negligible regarding the security parameter $\lambda$; that is, $1/s^{T} \le \mathsf{negl}(\lambda)$, which can be achieved by choosing appropriate parameters. Specifically, we can formulate the security in Theorem 2.
Theorem 2. For every PPT adversary $\mathcal{A}$, there exists a negligible function $\mathsf{negl}(\lambda)$ such that the probability of the CPA-indistinguishability experiment $\mathsf{Exp}$ to output 1 is only negligibly more than $1/2$:
$$\Pr[\mathsf{Exp} = 1] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$
Proof. We first describe the CPA-indistinguishability experiment $\mathsf{Exp}$. The adversary $\mathcal{A}$ is given input $1^{\lambda}$ and access to the encryption oracle $\mathcal{O}$. $\mathcal{A}$ samples a pair of messages $(m_0, m_1)$ from the message space and sends them to the challenger. Then, a random bit $b \in \{0, 1\}$ is selected by the challenger, who encrypts the message as $c = m_b + r$, where $r$ is generated through Algorithm 3. The challenger sends the ciphertext $c$ to the adversary. Here, we assume that the output of Algorithm 3 follows distribution $\mathcal{R}$. In Theorem 1, we show that $\mathcal{R}$ cannot be distinguished from a uniform distribution $\mathcal{U}$ with non-negligible probability. The adversary can keep querying the oracle for polynomially many times and then output a bit $b'$. The output of the experiment is defined as 1 if $b' = b$ and 0 otherwise.
Then, we define a sequence of hybrids:
$$H_0: c = m_0 + r, \quad H_1: c = m_0 + u, \quad H_2: c = m_1 + u, \quad H_3: c = m_1 + r,$$
where $r$ is generated through Algorithm 3 and $u$ is sampled from the uniform distribution $\mathcal{U}$.
By construction, if $\mathcal{A}$ can win the CPA game with a non-negligible probability $p$, i.e., the probability of $\mathsf{Exp}$ to output 1 exceeds the probability of outputting 0 by a non-negligible margin, then $\mathcal{A}$ can distinguish $H_0$ and $H_3$ with the same probability $p$. Therefore, from the hybrid lemma, which shows the transitivity of hybrids, we immediately know that $\mathcal{A}$ can also distinguish between two consecutive hybrids with non-negligible probability $p/3$. We will proceed to show that this gives rise to a contradiction.
Define a pair of nonuniform PPT machines by $f_i(x) = m_i + x$ for $i \in \{0, 1\}$. It is clear that $f_i$ is an efficient operator. Observe that $H_0 = f_0(r)$ and $H_1 = f_0(u)$. Guaranteed by our construction, with only negligible probability can the pad $r$ generated by our algorithm be distinguished from a truly random number $u$ sampled from the uniform distribution $\mathcal{U}$. Thus, from the efficient operation lemma, $H_0$ and $H_1$ are indistinguishable. Similarly, it is easy to show that $H_2$ and $H_3$ are indistinguishable. Then, to preserve the transitivity of hybrids, $\mathcal{A}$ must be able to distinguish $H_1$ and $H_2$. Nonetheless, by the perfect secrecy of the one-time pad, we know that $H_1$ and $H_2$ are indistinguishable, which immediately introduces the contradiction. □
Next, we show that our construction is secure against a key-recovery attack. Assume that a PPT adversary $\mathcal{A}$ has access to the encryption oracle $\mathcal{O}$ and wants to perform a key-recovery attack, i.e., to recover the key pool $\mathcal{K}$ given a polynomial number of accesses to the oracle $\mathcal{O}$. In Theorem 3, we show that the probability that the adversary succeeds in performing such an attack is negligible.
Theorem 3. For any PPT adversary $\mathcal{A}$ with access to the encryption oracle $\mathcal{O}$, the probability of performing a successful key-recovery attack is
$$P = \binom{q}{s} \left(\frac{1}{\binom{s+T-1}{T}}\right)^{s} \left(1 - \frac{1}{\binom{s+T-1}{T}}\right)^{q-s} \le \mathsf{negl}(\lambda),$$
which is negligible in terms of the security parameter $\lambda$, where $s$ denotes the number of secret keys, $T$ represents the number of keys used to generate the pad $r$ for encryption, and $q$ is the number of oracle queries.

Proof. By accessing the encryption oracle $\mathcal{O}$, for each query, $\mathcal{A}$ can choose a message $m$ from the message space, and the oracle will return its encryption $c = m + r$, where $r$ is generated through the pad-generation algorithm. Thus, the adversary can easily recover the pad $r$ by computing $r = c - m$. Recall that each pad is a combination of the secret keys:
$$r = \sum_{i=1}^{T} k_{j_i}, \quad k_{j_i} \in \mathcal{K}.$$
To successfully recover the key pool of size $s$, the adversary is asked to give a unique solution to a linear system $Ax = r$. The variable $x$ is an $s$-dimensional vector where each entry represents a secret key value in the key pool. The coefficient matrix $A$ is a sparse matrix of dimension $q \times s$, where $q$ is the number of queries made by the adversary. The non-zero elements indicate the secret key values used for generating the pad $r$. The sparsity of each row of the matrix ranges from $1/s$ to $T/s$.

To launch the attack, adversary $\mathcal{A}$ needs at least $s$ equations to give a solution to the system. Consider the case where $\mathcal{A}$ only retrieves $k$ equations with $k < s$; it is clear that the linear system would be underdetermined, which implies infinitely many solutions. Thus, to recover the key pool, the adversary shall have at least $s$ equations. However, notice that, for each query, the attacker is only able to recover the pad $r$, while the randomnesses $j_1, \dots, j_T$ are kept secret on the user's end. Thus, to recover the secret keys, $\mathcal{A}$ has to first guess the randomnesses, i.e., to guess the rows of the coefficient matrix $A$, for at least $s$ queries. For each query, the adversary needs to identify which key values are used in generating the pad. This problem can be described as a sampling problem with replacement. The number of possible combinations of secret key values is given by $\binom{s+T-1}{T}$. That is, the probability of a successful guess for one equation is $p = 1/\binom{s+T-1}{T}$. Here, we do not consider the trivial case $T = 1$, in which for each query the adversary can reveal a secret key value, since the pad is exactly equal to the secret key chosen. When $T \ge 2$, to retrieve the secret key values, the adversary needs to recover at least $s$ equations out of $q$ attempts. Thus, the overall probability of recovering $s$ equations out of $q$ attempts is given by
$$P = \binom{q}{s} p^{s} (1 - p)^{q-s}.$$
We require this probability to be negligible in terms of the security parameter $\lambda$; that is, $P \le \mathsf{negl}(\lambda)$. To prove this, we first show that an upper bound of $\binom{q}{s}$ is given by $(eq/s)^{s}$. To start with, we know that
$$\left(1 + \frac{s}{q}\right)^{q} \le e^{s}.$$
Consider the binomial expansion of $(1 + s/q)^{q}$. If we only consider the term of order $s$, we have $\binom{q}{s}\left(\frac{s}{q}\right)^{s} \le \left(1 + \frac{s}{q}\right)^{q}$, which immediately yields $\binom{q}{s} \le \left(1 + \frac{s}{q}\right)^{q}\left(\frac{q}{s}\right)^{s}$. Substituting the bound above, we have $\binom{q}{s} \le \left(\frac{eq}{s}\right)^{s}$. Now, it is easy to see that
$$P \le \left(\frac{eq}{s}\right)^{s} p^{s} = \left(\frac{eqp}{s}\right)^{s}.$$
Next, we show that, with careful parameter choices, we have $P \le \mathsf{negl}(\lambda)$. Specifically, we have $\binom{s+T-1}{T} \ge (s/T)^{T}$, where $s$ is the key pool size, and hence $p \le (T/s)^{T}$. Notice that $q$ is polynomial in terms of $\lambda$. Since $p \le (T/s)^{T}$, we have $P \le \left(\frac{eq\,T^{T}}{s^{T+1}}\right)^{s}$. As long as $q$ is at most sub-exponential in $\lambda$, the inequality $P \le \mathsf{negl}(\lambda)$ holds, which concludes the proof. □
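To see how small the key-recovery probability is in practice, the bound $P \le \binom{q}{s} p^{s}$ with $p = 1/\binom{s+T-1}{T}$ can be evaluated in log scale; the sketch below does this via `lgamma` to avoid overflow (the parameter values in the usage note are illustrative, not prescriptive):

```python
import math

def log2_comb(n, k):
    """log2 of the binomial coefficient C(n, k), computed via
    lgamma to avoid overflow for large arguments."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def log2_keyrec_bound(s, T, q):
    """log2 of the bound C(q, s) * p^s, where p = 1 / C(s+T-1, T)
    is the probability of guessing one equation's coefficients."""
    log2_p = -log2_comb(s + T - 1, T)
    return log2_comb(q, s) + s * log2_p
```

Even for $T = 2$ and a generous $q = 2^{20}$ oracle queries, the bound for $s = 8192$ is below $2^{-100000}$, far beyond any practical security level.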
Discussion We now discuss the overall security of the protocol and the impact of the parameters. In our protocol, we have two important parameters: $s$ and $T$, where $s$ denotes the number of secret keys, and $T$ represents the number of keys used to generate the pad $r$ for encryption. The key-pool size $s$ is strongly related to the storage overhead, since our protocol needs to pre-compute the products of the keys and weights. $T$ is related to the computational complexity, because the IoT device has to perform $T$ additions for each encryption and $T$ subtractions with a search operation for each decryption.
To fulfill the security requirement while preserving efficiency, the parameters shall be chosen carefully. Throughout our implementation, $s$ is chosen to be 8192 under careful consideration of the storage overhead. Under this condition, to fulfill the requirement of semantic security at the larger (or smaller) pre-set security parameter, $T$ is required to be at least 10 (or 6, respectively). Nonetheless, semantic security indicates that the adversary cannot distinguish between different plaintexts being encrypted. If such a security requirement can be relaxed by instead focusing on defeating the key-recovery attack, the protocol can be made more efficient. Following Theorem 3, when $T \ge 2$, the probability that the adversary can launch a successful key-recovery attack is negligible. In real IoT systems, the trade-off between security and efficiency can be chosen based on practical security requirements.
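The pad-reuse condition $1/s^{T} \le 2^{-\lambda}$ from the semantic-security analysis translates into a minimum $T$ of $\lceil \lambda / \log_2 s \rceil$. A quick check, with $\lambda = 128$ as an assumed example value:

```python
import math

def min_T(s, lam):
    """Smallest T with s**T >= 2**lam, i.e., pad-collision
    probability 1/s**T at most 2**(-lam)."""
    return math.ceil(lam / math.log2(s))
```

With $s = 8192 = 2^{13}$ and $\lambda = 128$, this gives $T = 10$, consistent with the requirement of $T \ge 10$ stated above.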
4.2. Performance Analysis
In this section, we provide the performance analysis both from a theoretical perspective and with experimental results. In our protocol, we consider input data and weights to be real numbers, and we use floating-point operations (FLOPs) to count the addition and multiplication operations. We consider the typical AlexNet network structure for our analysis.
Computational Cost Given a convolutional layer with $H$ kernels of size $k \times k \times D$, stride $st$, and padding $p$, for each input of size $W \times W \times D$, the output tensor is of size $o \times o \times H$, where $o = (W - k + 2p)/st + 1$. An IoT device needs to perform $T \cdot W^{2} D$ FLOPs and $T \cdot o^{2} H$ FLOPs to encrypt and decrypt, respectively. In comparison, the computational cost for local execution is $o^{2} H (2 k^{2} D - 1)$ FLOPs. Moreover, it is noteworthy that the operations in our protocol are additions/subtractions only, while the local execution contains $k^{2} D o^{2} H$ multiplications and $(k^{2} D - 1) o^{2} H$ additions. For a fully-connected layer of input size $n_i$ and output size $n_o$, the IoT device needs to perform $T \cdot n_i$ FLOPs for encryption and $T \cdot n_o$ FLOPs for decryption. For comparison, the IoT device needs to perform $n_o (2 n_i - 1)$ FLOPs if it executes such a fully-connected layer locally without outsourcing. It is worth noting that, when executing the fully-connected layer locally, the IoT device performs $n_i n_o$ multiplications, while, in our scheme, the IoT device only needs to perform $T (n_i + n_o)$ additions/subtractions on its end. Moreover, IoT devices need to handle non-linear computation locally. As depicted in Table 3, the comparison of IoT computation on linear layers between local execution and our protocol shows that our design dramatically reduces the overhead and achieves better performance in CNN inference tasks, considering that the parameter $T$ is usually small for practical security.
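As a sanity check of the counts above, the per-layer costs can be tabulated with a small helper (a sketch following the cost model above; the AlexNet-style layer shape in the usage note is illustrative):

```python
def conv_flops(W, D, k, H, stride, pad, T):
    """FLOP counts for one convolutional layer: T additions per
    input element to encrypt, T subtractions per output element
    to decrypt, versus full multiply-accumulate locally."""
    o = (W - k + 2 * pad) // stride + 1
    enc = T * W * W * D                      # encryption on IoT device
    dec = T * o * o * H                      # decryption on IoT device
    local = o * o * H * (2 * k * k * D - 1)  # local execution
    return enc, dec, local
```

For an AlexNet-style first layer ($W = 227$, $D = 3$, $k = 11$, $H = 96$, stride 4, no padding) with $T = 10$, encryption plus decryption costs about 4.4 million additions/subtractions, versus roughly 210 million FLOPs for local execution.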
Communication Cost The communication cost of our protocol mainly comes from the transmission of the ciphertext and the outputs of the convolutional and fully-connected layers. To outsource a convolutional layer with $W \times W \times D$ inputs, the IoT device first encrypts them and sends the corresponding ciphertext, which is of size $W \times W \times D$, to the edge device. Then, the edge device computes the expected scalar products and sends back the results, which are matrices of size $o \times o \times H$. Regarding the fully-connected layer, the IoT device transmits encrypted message vectors of exactly the same dimensions as the plaintext ones, i.e., $n_i$-dimensional input vectors and $n_o$-dimensional output vectors. Thus, the total communication cost for a fully-connected layer would be $n_i + n_o$.
Storage Overhead To assure correct decryption, the scalar products of the secret keys and weights shall be pre-computed and stored on the IoT device as shown in Table 2. The pre-computation occurs only in the initialization phase, and once it is finished, no extra storage overhead is introduced during the offloading process. To outsource a convolutional layer with $H$ kernels of $k \times k$ matrices, the IoT device needs to store the scalar products with secret keys, of size $s \cdot H \cdot k^{2}$, for decryption, which is quite small for most CNN architectures, where the kernel sizes are normally $3 \times 3$ or $5 \times 5$. Moreover, the IoT device also stores a constant-size key pool measured by $s$. To outsource a fully-connected layer with an $n_i$-dimensional input and an $n_o$-dimensional output, the IoT device needs to store an $s \times n_o$ table of scalar products as the decryption key.
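Under the table size sketched above ($s \cdot H \cdot k^{2}$ pre-computed products per convolutional layer), the per-layer storage is easy to estimate; the entry width in bytes below is an assumption for illustration:

```python
def conv_table_bytes(s, H, k, bytes_per_entry=8):
    """Pre-computation table size for one convolutional layer:
    one key-weight product per (key, kernel, kernel position),
    assuming bytes_per_entry bytes per stored product."""
    return s * H * k * k * bytes_per_entry
```

For example, with $s = 8192$, an $11 \times 11$ layer with 96 kernels needs $8192 \cdot 96 \cdot 121 \cdot 8$ bytes, about 0.76 GB, which illustrates why $s$ dominates the storage overhead.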
4.3. Experimental Results
We implemented our protocol on real devices to evaluate its performance. Our implementation adopts the TensorFlow library with Python 3.8. We used a Raspberry Pi 4 (Model B) as the IoT device, which is configured with Raspbian Debian GNU/Linux 11 (bullseye) and has a 1.5 GHz quad-core ARM Cortex-A72 processor, 8 GB of LPDDR4 SDRAM, and 32 GB of SD card storage. We used a desktop as the edge device, which is configured with Ubuntu 20.04.3 LTS, an 8-core 3.60 GHz Intel i7-9700K processor, 32 GB of memory, two Nvidia GeForce RTX 2080 Ti GPUs, and 2 TB of HDD storage. The IoT device and the edge device are deployed in the same building and connected through 2.4 GHz WiFi at 300 Mbps. We used ImageNet [
16] as the dataset and implemented privacy-preserving versions of SqueezeNet and ResNet-50 to compare with the current state-of-the-art cryptographic 2PC-NN solution CrypTFlow2 [
17]. The result depicted in
Table 4 shows that our solution achieves a 4.7× to 24.7× speed-up in computation compared to CrypTFlow2 on the two CNN models evaluated. For the communication overhead, our design requires interaction between the client and the edge node only for the transmission of the encrypted data and the encrypted scalar products, which are of the same size as their plaintext counterparts, in contrast to the expanded ciphertexts produced by heavy cryptographic primitives such as homomorphic encryption. As a result, the communication cost of our protocol is 65 to 4000 times less than that of CrypTFlow2.
We summarize the experimental results from AlexNet in
Table 5 and compare the performance of each layer with local execution and the overall performance with the literature [
15]. As shown in
Table 5, our protocol significantly improves the efficiency of the AlexNet inference task, achieving a 14.26× speed-up compared with executing the inference task on the IoT device locally. It is noteworthy that, for the convolutional layers, our protocol is up to 66× faster than local computation. As the number of convolutional layers increases in more complicated CNNs such as ResNet-50 and DenseNet-121, our secure inference scheme retains its performance advantage, as shown in
Table 4.
In terms of storage overhead, in our design, the storage cost, which mainly consists of the secret-key pool and the pre-computation table, is fixed once the pre-trained CNN model is selected. For SqueezeNet and ResNet-50, the storage overhead of our protocol is 0.54 GB and 2.61 GB, respectively. For the implementation of AlexNet, considering the extremely imbalanced ratio between parameters and FLOPs, where the fully-connected layers hold over 58 million parameters but perform only about 58 million FLOPs, while the convolutional layers hold about 3 million parameters but perform over 665 million FLOPs, we outsource the convolutional layers only and keep the fully-connected layers computed locally. The storage overhead for the convolutional layers of AlexNet is 1.98 GB. We note that better performance is expected when IoT devices with sufficient storage can pre-load the complete pre-computation table for AlexNet.
Our experiment also shows that [15] is faster than our protocol. According to our experiments, it takes [15] around 2.51 s to execute AlexNet, while ours takes around 6.76 s, though both are significantly faster than other crypto-based solutions such as [17]. However, as mentioned earlier, one outstanding limitation of [15] is that its storage cost on the IoT device grows linearly with the number of inference instances being executed. For example, a 32 GB SD card can store pre-computed secrets that are only enough for around 1600 AlexNet inferences and 250 ResNet-50 inferences. After that, the SD card needs to be replaced and re-initialized, which could disrupt the continuous operation of the IoT system in real-world applications such as drone-based systems. Our design eliminates this limitation because of its constant storage overhead.