```
10: end for
return h∗, f
```

#### *4.4. Watermark Extraction and Verification*

The verification of ownership involves a would-be owner in the role of prover and the authority in the role of verifier. The would-be owner claims that a remote model *h*′ is part of his/her IP. The authority is given the WM carrier set *Dwm*, the would-be owner's *signature*, the signing function *sign*, and remote access to *h*′. The authority sets an accuracy threshold *T* and a number of required verification rounds *r* to decide whether *h*′ is the IP of the would-be owner. In each round, the authority randomly selects a sample *c* from *Dwm*, signs it using *signature* in a random position *pk*, and sends the signed sample *cpk* to the remote model *h*′. The predictions *h*′(*cpk*) (which contain the encoded WM information) are forwarded to the would-be owner. The latter passes them to her private model *f* to obtain *lk* = *f*(*h*′(*cpk*)). Since the relationship between positions and labels is one-to-one, the would-be owner can use *lk* to tell the authority the position *pk* of her signature in *c*. After *r* rounds, the accuracy *acc* of the would-be owner at detecting the positions is the number of correct answers divided by *r*. If *acc* ≥ *T*, the authority certifies that *h*′ is owned by the would-be owner.

Note that the authority can also send samples without signing them, or sign them with fake signatures different from *signature*. In these cases, the would-be owner should tell the authority that the sample does not contain her signature; this is possible because the private model assigns such samples the label 0. Algorithm 3 formalizes the verification process.

#### **Algorithm 3** Watermark Verification

**Input:** Remote access to *h*′, threshold *T*, number of rounds *r*.

**Output:** Boolean decision *d* (True or False) on *h*′'s ownership.

```
 1: correct = 0
 2: d = False                       // Decision on the ownership of h′
 3: for each round i = 1, 2, . . . , r do
 4:     c ← randomSample(Dwm)
 5:     pk ← randomPosition(), k ∈ {1, 2, . . . , z}
 6:     cpk ← sign(signature, c, pk)
 7:     predictions ← h′(cpk)
 8:     lk ← f(predictions)
 9:     p′k ← position matching lk  // labels map one-to-one to positions
10:     if p′k = pk then
11:         correct ← correct + 1
12:     end if
13: end for
14:
15: acc = correct/r
16: if acc ≥ T then
17:     d ← True
18: end if
return d
```
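For concreteness, the following is a minimal Python sketch of this protocol under stated assumptions: `query_remote`, `sign_fn`, and `positions` are hypothetical stand-ins for remote access to *h*′, the signing function, and the *z* allowed positions; `f` is the would-be owner's private model viewed as a function from predictions to position labels.

```python
# Minimal sketch of Algorithm 3. Assumptions (not from the paper's code):
# query_remote(x) gives black-box access to the remote model h', sign_fn
# embeds the signature into sample c at position pk, and f maps the remote
# predictions to a position label (with 0 meaning "no valid signature").
import random

def verify_ownership(query_remote, f, sign_fn, signature, D_wm, positions,
                     T=0.9, r=100):
    correct = 0
    for _ in range(r):
        c = random.choice(D_wm)               # random carrier-set sample
        pk = random.choice(positions)         # random position, k in 1..z
        c_signed = sign_fn(signature, c, pk)  # signed sample c_pk
        predictions = query_remote(c_signed)  # h'(c_pk), sent by the authority
        lk = f(predictions)                   # private model decodes the label
        if lk == pk:                          # labels map one-to-one to positions
            correct += 1
    return (correct / r) >= T                 # decision d on ownership
```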
#### **5. Experimental Results**

In this section, we evaluate the performance of *KeyNet* on two image classification data sets and with two different DL model architectures. First, we present the experimental setup. After that, we evaluate the performance of the proposed framework against the requirements stated in Table 1. We focus on robustness, authentication, scalability, capacity, integrity, and fidelity, but, since our framework partly fulfills the remaining requirements as well, we also assess its performance on each of them.

The code and the models used in this section are available at https://github.com/NajeebJebreel/KeyNet.

#### *5.1. Experimental Setup*

**Original task data sets and DL models.** We used two image classification data sets: CIFAR10 [38] and FMNIST5. CIFAR10 has 10 classes, while FMNIST5 is a subset of the public data set Fashion-MNIST [39]; FMNIST5 contains the samples that belong to the first five classes in Fashion-MNIST (classes from 0 to 4). Table 2 summarizes the original task data sets, the carrier set, and the DL models and their corresponding private models.

**Table 2.** Data sets and deep learning model architectures. *C*(3, 32, 5, 1, 2) denotes a convolutional layer with 3 input channels, 32 output channels, a kernel of size 5 × 5, a stride of 1, and a padding of 2; *MP*(2, 1) denotes a max-pooling layer with a kernel of size 2 × 2 and a stride of 1; and *FC*(10, 20) denotes a fully connected layer with 10 inputs and 20 output neurons. We used *ReLU* as the activation function in the hidden layers and *LogSoftmax* as the activation function in the output layers of all DL models. The rightmost column contains the architecture of the corresponding private models.
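To make the caption's notation concrete, the following is a hedged PyTorch sketch of how such a specification maps to layers; the composition shown is a toy example, not one of the exact architectures of Table 2.

```python
# Illustrative mapping of the caption's notation to PyTorch layers:
#   C(3, 32, 5, 1, 2) -> nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2)
#   MP(2, 1)          -> nn.MaxPool2d(kernel_size=2, stride=1)
#   FC(10, 20)        -> nn.Linear(10, 20)
# The toy composition below assumes a 32x32 RGB input; it is not one of the
# exact Table 2 models.
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2),  # C(3, 32, 5, 1, 2)
    nn.ReLU(),                                              # ReLU in hidden layers
    nn.MaxPool2d(kernel_size=2, stride=1),                  # MP(2, 1) -> 31x31 maps
    nn.Flatten(),
    nn.Linear(32 * 31 * 31, 10),                            # FC(30752, 10)
    nn.LogSoftmax(dim=1),                                   # LogSoftmax at the output
)
```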


**Watermark carrier sets.** We employed three different data sets as WM carrier sets: STL10 [43], MNIST [44], and Fashion-MNIST (the latter was used only in attacks). We applied Algorithm 1 to label the carrier set's images. To it, we passed the carrier set, the owner's information, the signature size, a fake signature, some samples from different distributions, and a list specifying the labeling order of the positions of the owner's signature in the carrier set. We used the following labeling order: 1: Top left, 2: Top right, 3: Bottom left, 4: Bottom right, and 5: Image center. Algorithm 1 assigns label 0 to an image if (i) the image belongs to the carrier set but does not carry any signature, (ii) the image belongs to the carrier set but carries a signature different from the owner's signature, or (iii) the image does not belong to the carrier set distribution (even if it is signed with the owner's real signature). For WM accuracy evaluation, we randomly sampled 15% of the WM carrier set, signed the samples in different random positions, and assigned them the corresponding labels. Figure 2 shows some examples of signed carrier set images and their corresponding labels.

**Figure 2.** Examples of signed STL10 carrier set images employed with the CIFAR10 data set. Each image shows the signature position and its corresponding label.
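To illustrate the signing step, here is a hedged sketch, assuming images are float arrays in [0, 1] of shape height × width × channels; how the hexadecimal signature is rendered into pixels is our own choice (a small stamp derived from the hex digits), since only the five positions and their labels are fixed by the labeling order above.

```python
# Hedged sketch of signing a carrier image at one of the five labeled
# positions. The pixel encoding of the signature (a small patch derived from
# its hex digits) is an illustrative assumption; only the position-to-label
# mapping below comes from the labeling order described above.
import numpy as np

POSITION_LABELS = {1: "top left", 2: "top right", 3: "bottom left",
                   4: "bottom right", 5: "image center"}

def sign_image(img, signature_hex, pk, patch=6):
    """Stamp a signature-derived patch at position pk; the WM label is pk."""
    h, w, c = img.shape
    anchors = {1: (0, 0), 2: (0, w - patch), 3: (h - patch, 0),
               4: (h - patch, w - patch),
               5: ((h - patch) // 2, (w - patch) // 2)}
    y, x = anchors[pk]
    digits = np.array([int(ch, 16) * 17 for ch in signature_hex],
                      dtype=np.float64) / 255.0   # map hex digits to [0, 1]
    stamp = np.resize(digits, (patch, patch, c))  # tile digits into a patch
    signed = img.copy()
    signed[y:y + patch, x:x + patch] = stamp
    return signed, pk                             # signed image and its label
```

Per Algorithm 1, unsigned carrier images, images signed with a fake signature, and out-of-distribution images (even if signed with the real signature) would all be assigned label 0 instead.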

**Attacker configurations.** We assumed the attacker holds a varying fraction of the original training data, ranging from 1% to 30%. The attacker's training data were randomly sampled from the original training data. We also assumed that the WM carrier set distribution is a secret between the owner and the authority, so we assigned the attacker WM carrier sets different from the owner's. The attacker's private model was slightly different as well, because the owner's private model and its architecture are secret. The rest of the attacker's configurations and hyper-parameters were the same as the owner's. Table 3 summarizes the attacker's WM carrier sets and private model architectures.


**Table 3.** Attacker's WM carrier sets and private models. The attacker's WM carrier set and private model differ from the owner's.

**Performance metric.** We used *accuracy* as the performance metric to evaluate all the original and WM tasks. *Accuracy* is the number of correct predictions divided by the total number of predictions.

**Training hyperparameters.** We used the cross-entropy loss function and the stochastic gradient descent (SGD) optimizer with learning rate = 0.001, momentum = 0.9, weight decay = 0.0005, and batch size = 128. We trained all the original unmarked models for 250 epochs. To embed the WM from scratch, we trained the combined model for 250 epochs. To embed the WM in a pretrained model, we combined the private model and fine-tuned the combination for 30 epochs.

To jointly train the original and private models, we used the parameter *α* to weight the original task loss and the WM task loss before optimization. For CIFAR10, we used *α* = 0.9 when embedding the WM in a pretrained model, and *α* = 0.95 when embedding the WM into a DL model from scratch. For FMNIST5, we used *α* = 0.85 to embed the WM in a pretrained model and *α* = 0.9 to embed the WM from scratch. The experiments were implemented using PyTorch 1.6 and Python 3.6.
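As a hedged illustration of the joint optimization, the sketch below weights the two losses with *α* as described; the `marked` and `private` sub-modules and the exact data flow are our assumptions about the combined model's interface, not the paper's released code.

```python
# Hedged sketch of one joint training step with the hyperparameters quoted
# above. The combined model's interface (separate `marked` and `private`
# sub-modules) is an assumption for illustration.
import torch
import torch.nn.functional as F

def make_optimizer(combined):
    # SGD settings from the text: lr 0.001, momentum 0.9, weight decay 0.0005
    return torch.optim.SGD(combined.parameters(), lr=0.001,
                           momentum=0.9, weight_decay=0.0005)

def joint_step(combined, optimizer, x_orig, y_orig, x_wm, y_wm, alpha=0.9):
    """One update on alpha * L_original + (1 - alpha) * L_watermark."""
    optimizer.zero_grad()
    log_p_orig = combined.marked(x_orig)                # original-task predictions
    log_p_wm = combined.private(combined.marked(x_wm))  # private model reads them
    # All models end in LogSoftmax, so NLL on log-probabilities is exactly
    # the cross-entropy loss mentioned above.
    loss = (alpha * F.nll_loss(log_p_orig, y_orig)
            + (1 - alpha) * F.nll_loss(log_p_wm, y_wm))
    loss.backward()
    optimizer.step()
    return loss.item()
```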

#### *5.2. Experiments and Results*

First, we made sure that an accurate private model could not be obtained (and therefore the ownership of a DL model could not be claimed) by using only the predictions of a black-box DL model. To do so, we queried different unmarked DL models with all the signed samples in the owner's WM carrier set, and we used the predictions as input features to train the private model. Table 4 shows the performance of the private models obtained in this way after 250 epochs.

**Table 4.** Accuracy of the private models at detecting the position of the owner's signature in the WM carrier set when trained for 250 epochs with the predictions of black-box models.


It can be seen that the average accuracy of the private model at detecting the signature position inside the WM carrier set is as low as 32.27%. This accuracy was obtained by granting unconditional query access to the black-box model and by using its predictions as input to train the private model. Based on that, we decided to set threshold *T* = 0.9, which is nearly three times greater than the above average accuracy. Therefore, to prove her ownership of a black-box DL model, the owner's private model must detect the signature positions in the WM carrier set with an accuracy greater than or equal to 90%.

In the following, we report the results of *KeyNet* on several experiments that test its fulfillment of the requirements depicted in Table 1.

**Table 5.** Fidelity results. Column 3 shows the accuracy of the unmarked models in the original tasks (baseline accuracy) before embedding the WM. Columns 4 and 5 show the accuracy of the marked model in the original task after embedding the WM by fine-tuning a pretrained model or by training the combined model from scratch. Columns 6 and 7 show the accuracy of the private model in detecting the WM using the predictions of the corresponding marked model. To embed the WM in a pretrained model, we fine-tuned it for 30 epochs, while we trained models from scratch for 250 epochs.


**Fidelity.** Embedding the WM should not decrease the accuracy of the marked model on the original task. As shown in Table 5, the marked model's accuracy is very similar to that of the unmarked model. This is thanks to the joint training, which simultaneously minimizes the loss for the original task and the WM task. Furthermore, *KeyNet* not only preserved the accuracy in the original task, but sometimes even improved it. That is not surprising: the watermarking task adds a small amount of noise to the marked model, which helps reduce overfitting and thus generalize better.

*KeyNet* therefore fulfills the fidelity requirement by reconciling accuracy preservation for the original task with successful embedding of the WM in the target models.

**Reliability and robustness.** *KeyNet* guarantees robust DL watermarking and allows legitimate owners to prove their ownership with accuracy greater than the required threshold *T* = 90%. Table 5 shows that WM detection accuracy was almost 100%, and thus our framework was able to reliably detect the WM.

We assess the **robustness** of our framework against three types of attacks: *fine-tuning* [45], *model compression* [46,47], and *WM overwriting* [13,48]:


In the fine-tuning attack, the attacker retrained the marked models based on the original task loss only, using varying fractions of the original training data, ranging from 1% to 30%. We chose the lower bound 1% based on the work in [49]; the authors of that paper demonstrate that an attacker with less than 1% of the original data is able to remove the watermark with a slight loss in the accuracy of the original task.

**Table 6.** Fine-tuning results. In the fine-tuning attack, the marked models were retrained based on the original task loss only.


**Figure 3.** Robustness against model compression. The X-axis indicates the pruning levels we used for each marked model. The blue bars indicate the marked model accuracy in the original task, while the orange bars indicate the accuracy of WM detection. The horizontal dotted line indicates the threshold *T* = 90% used to verify the ownership of the model.
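As an indication of how such a compression attack can be mounted, here is a hedged sketch using global magnitude pruning via `torch.nn.utils.prune`; the pruning schemes in [46,47] may differ in detail.

```python
# Hedged sketch of a parameter-pruning compression attack: zero out the
# smallest `level` fraction of conv/linear weights globally. The cited
# compression methods may differ; this is only illustrative.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_marked_model(model, level=0.5):
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(targets, pruning_method=prune.L1Unstructured,
                              amount=level)
    for module, name in targets:      # bake the pruning masks into the weights
        prune.remove(module, name)
    return model
```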

To overwrite the WM, the attacker selected his/her own carrier set and signed it using Algorithm 1 with his/her signature. Then, he/she trained his/her private model along with the marked model in the same way as in Algorithm 2. Tables 7 and 8 summarize the results of the WM overwriting experiments. The attacker was able to overwrite the original WM and embed his/her new WM, but at the cost of a substantial accuracy loss in the original task when using a fraction of the training data up to 10%. Thus, our watermark easily survives this attack under the conditions described in [49]. For fractions above 10%, the accuracy of the marked model became competitive, but an attacker holding such a large amount of training data can easily train his/her own model and has no need to pirate the owner's model [18].

**Table 7.** Overwriting attack results with CIFAR10 marked models. The table shows the accuracy before and after overwriting each marked model and its corresponding private model depending on the fraction of training data known by the attacker (from 1% to 30%).


**Table 8.** Overwriting attack results with FMNIST5 marked models. The table shows the accuracy before and after overwriting each marked model and its corresponding private model depending on the fraction of training data known by the attacker (from 1% to 30%).


**Integrity.** *KeyNet* meets the integrity requirement by yielding low WM detection accuracy with unmarked models, and thus it does not falsely claim ownership of models owned by a third party. In our experiments, there were 6 classes for the watermarking task. Looking at Table 9, the detection accuracy obtained with unmarked models is not far from that of randomly guessing 1 label out of 6, which is approximately 16.6%.

**Authentication.** *KeyNet* fulfills the authentication requirement by design. Using a cryptographic hash function such as *SHA256* to generate the owners' signatures establishes a strong link between owners and their WMs. Furthermore, the verification protocol of *KeyNet* provides strong evidence of ownership: when the authority uses a fake signature, the private model outputs label 0, so no WM information is revealed. This dual authentication method provides strong confidence in the identity of the legitimate owner.
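A minimal sketch of signature generation follows, assuming the owner's identifying information is a UTF-8 string and the signature keeps the first *s* hexadecimal digits of the digest (the truncation length used in our experiments; see below).

```python
# Minimal sketch of signature generation with SHA-256. Encoding the owner's
# information as a UTF-8 string is an assumption for illustration.
import hashlib

def make_signature(owner_info: str, s: int = 25) -> str:
    digest = hashlib.sha256(owner_info.encode("utf-8")).hexdigest()  # 64 hex digits
    return digest[:s]   # keep s hex digits as the owner's signature
```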

**Security.** As *KeyNet* embeds the WM in the dynamic content of DL models through joint training, and as modern deep learning models contain a huge number of parameters, detecting the presence of the WM in such models is infeasible. If the attacker knows that a model contains WM information and wants to destroy it, he/she will only be able to do so by also impairing the accuracy of the model in the original task. Regarding the security of the owner's signature, the use of a strong cryptographic hash function, such as SHA256, provides high security, as we next justify. On the one hand, if the signature size *s* is taken long enough, it is virtually impossible for two different parties to have the same signature: the probability of collision for *s* hexadecimal digits is 1/16<sup>*s*</sup>, so *s* = 25 should be more than enough. On the other hand, even if the owner's signature is known by an attacker, the cryptographic hash function makes it impossible to deduce the owner's information from her signature.
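For a quick numerical check of this bound:

```python
# Probability that two given parties draw the same s-digit hex signature:
s = 25
p_collision = 1 / 16 ** s      # = 2^-100
print(f"{p_collision:.2e}")    # ~7.89e-31, negligible in practice
```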

**Unforgeability.** To prove ownership of a DL model that is not his/hers, an attacker needs to pass the verification protocol (Algorithm 3). However, the private model allowing watermark extraction is kept secret by the legitimate owner. Without the private model, even if the attacker knows both the WM carrier set and the owner's signature, he/she can only try a random strategy. Yet, the probability of randomly guessing the right position in at least a proportion *T* of the *r* rounds is at most 1/*z*<sup>*Tr*</sup>. This probability can be made negligibly small by increasing the number *r* of verification rounds.
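As a worked instance of this bound, assume *z* = 5 positions (as in our experiments), *T* = 0.9, and *r* = 20 rounds:

```python
# Worked instance of the forgery bound 1/z^(T*r); z = 5 positions, T = 0.9,
# r = 20 rounds are illustrative values.
z, T, r = 5, 0.9, 20
p_forge = 1 / z ** (T * r)     # = 1/5^18
print(f"{p_forge:.2e}")        # ~2.62e-13, and it vanishes exponentially in r
```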

Thus, *KeyNet* partially meets the unforgeability requirement: an attacker can embed additional WMs into a marked model, but cannot claim ownership of another party's WM.

**Capacity.** Capacity can be viewed from two perspectives: (i) the framework allows the inclusion of a large amount of WM information, and (ii) the set of triggers available in the verification process is large enough. Given that hashes are one-way functions with a fixed-length output, the amount of owner information that can be condensed into a signature is virtually unlimited. In our experiments, we used a medium-sized signature of *s* = 25 characters. Nevertheless, *KeyNet* allows flexibility in specifying various signature sizes and in using hash functions other than SHA256. On the other hand, *KeyNet* can use a large number of samples in WM verification. In addition to all samples belonging to a certain distribution (the WM carrier set), it allows using samples from other distributions, due to the labeling and training method used. The marked model gives the signature information if the signature is placed on top of a WM carrier set sample, while samples from different distributions are given the label 0 (even if they are signed with the signature of the legitimate owner).

**Uniqueness and scalability.** *KeyNet* can be easily extended to produce unique copies of a DL model for each user, as well as scale to cover a large number of users in the system. Furthermore, it can link a remote copy of a DL model with its user with minimal effort and high reliability.

**Table 9.** Integrity results with unmarked models. Each private model was tested with two different unmarked models: one with the same topology as its corresponding marked model, the other with a different topology. The last four columns show the detection accuracy obtained with the unmarked models.


In our experiments, we distributed two unique copies of the FMNIST5-CNN model: one for *User1* and another for *User2*, each copy having its corresponding private model. We took a pretrained FMNIST5-CNN model and fine-tuned it for 30 epochs to embed the WM linked to a specific user. To do so, we signed two copies of the WM carrier set, each copy with a different joint signature. Once we had two unique carrier sets, we trained two unique marked models, each with its corresponding private model, using Algorithm 2. In the end, we obtained two unique marked copies of the model, with their corresponding private models and users, and distributed each copy to its corresponding user.

We then assumed that *User1* (respectively, *User2*) leaked her (his) model, and we tried to find the leaker as follows.


1. We randomly selected samples from the WM carrier set.
2. We signed one copy of the selected samples with *User1*'s signature and another copy with *User2*'s signature.
3. We queried the leaked remote model with both sets of signed samples and collected its predictions.
4. We passed the predictions from samples signed with *User1*'s signature to her private model and calculated the accuracy at detecting the WM. We did the same with *User2*'s predictions and his private model.

Figure 4a shows the results of model owner detection if *User1* leaked her model. We see that we were able to determine that the model copy was most likely leaked by *User1*. Figure 4(a1) shows the normalized confusion matrix of *User1*'s private model in detecting the WM information using the predictions of *User1*'s remote model. It shows that the accuracy at detecting signature positions was almost 100% when we sent the samples signed by *User1*. Figure 4(a2) shows the normalized confusion matrix of *User2*'s private model in detecting the WM information by using the predictions of *User1*'s remote model when the samples were signed with *User2*' signature. As *User1*'s model was trained to distinguish only *User1*'s signature position, it output features that led *User2*'s private model to provide label 0 for samples signed by *User2*.

Figure 4b provides similar results when *User2* leaked his model. The same conclusions hold. Note that the private models were unable to distinguish the signature positions and output the label 0 when they were fed with predictions of non-corresponding marked models and non-corresponding signatures. This is an interesting feature of *KeyNet*, as all the remote models and their private models learned a common representation of label 0.

Regarding **scalability**, if we have *u* users and we use *m* signed samples for verification, the number of remote model queries is *u* × *m*, which is linear in *u*.


**Figure 4.** Normalized confusion matrices of the accuracy in detecting the individual copies of the FMNIST5-CNN model distributed among two users. Subfigure (**a**) shows the detection accuracy for the predictions of *User1*'s copy. Subfigure (**a1**) shows the confusion matrix of *User1*'s private model when *User1*'s copy was queried with samples signed by *User1*. Subfigure (**a2**) shows the confusion matrix of *User2*'s private model when *User1*'s copy was queried with samples signed by *User2*. Subfigure (**b**) shows the same results for *User2*'s model.

**Efficiency and generality.** The efficiency of *KeyNet* is related to the size of the output of the model to be marked: the smaller the number of output neurons, the fewer the parameters of the private model. On the other hand, our framework allows embedding the WM from scratch or by fine-tuning; the latter contributes to efficiency. Regarding **generality**, even though in our work we use image classification tasks that output softmax probabilities (confidence) for each class, *KeyNet* can be extended to cover a variety of ML tasks that take images as input and output multiple values, such as multi-label classification, semantic segmentation, or image transformation.

#### **6. Conclusions and Future Work**

We have presented *KeyNet*, a novel watermarking framework to protect the IP of DL models. We use the final output distribution of deep learning models to embed a robust WM that does not fall within the decision boundaries of the original task classes. To make the most of this advantage, we design the watermarking task in an innovative way that makes it possible (i) to embed a large amount of WM information, (ii) to establish a strong link between the owner and her marked model, (iii) to prevent the attacker from overwriting the WM information without losing accuracy in the original task, and (iv) to uniquely fingerprint several copies of a pretrained model for a large number of users in the system.

The results we obtained empirically prove that *KeyNet* is effective and can be generalized to various data sets and DL model architectures. Besides, it is robust against a variety of attacks, it offers a very strong authentication linking the owners and their WMs, and it can be easily used to fingerprint different copies of a DL model for different users.

As a future work, we plan to extend the framework to cover computer vision tasks that take images as input and output images as well. We also intend to study ways to estimate the watermark capacity of a deep neural network depending on its topology, the complexity of the learning task, and the watermark to be embedded.

**Author Contributions:** Conceptualization, N.M.J. and J.D.-F.; methodology, N.M.J.; validation, N.M.J., D.S. and A.B.-J.; formal analysis, N.M.J. and J.D.-F.; investigation, N.M.J. and A.B.-J.; writing—original draft preparation, N.M.J.; writing—review and editing, J.D.-F. and D.S.; funding acquisition, J.D.-F. and D.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the European Commission (projects H2020-871042 "SoBig-Data++" and H2020-101006879 "MobiDataLab"), the Government of Catalonia (ICREA Acadèmia Prizes to J. Domingo-Ferrer and D. Sánchez, FI grant to N. Jebreel and grant 2017 SGR 705), and the Spanish Government (projects RTI2018-095094-B-C21 "Consent" and TIN2016-80250-R "Sec-MCloud").

**Acknowledgments:** Special thanks go to Mohammed Jebreel, for discussions on an earlier version of this work, and to Rami Haffar, for implementing some of the experiments. The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

| Abbreviation | Meaning |
|---|---|
| DL | Deep Learning |
| WM | Watermark |
| IP | Intellectual Property |
| ML | Machine Learning |
| SGD | Stochastic Gradient Descent |
| CNN | Convolutional Neural Network |