**1. Introduction**

Deep learning (DL) models are used to solve many complex tasks, including computer vision, speech recognition, natural language processing, and stock market analysis [1–3]. However, building representative and highly accurate DL models is a costly endeavor. Model owners, such as technology companies, devote significant computational resources to process vast amounts of proprietary training data, whose collection also entails a significant effort. For example, a conversational model from Google Brain contains 2.6 billion parameters and takes about one month to train on 2048 TPU cores [4]. Besides, designing the architecture of a DL model and choosing its hyperparameters require substantial ML experience and many preliminary tests. Thus, it is not surprising that the owners of DL models seek compensation for the incurred costs by reaping profits from commercial exploitation. They may monetize their models in Machine Learning as a Service (MLaaS) platforms [5] or license them to their customers for a financial return for a specific period of time [6].

Unfortunately, the high value of pretrained DL models is attractive to attackers who would like to steal those models and use them illegally. For example, a user may leak a pretrained model to an unauthorized party or continue to use it after the license period has expired. Furthermore, if a model is offered as MLaaS, many model theft techniques are available to steal it based on its predictions [7,8]. Due to the competitive nature of the technology market, a stolen or misused model is clearly detrimental to its owner in both economic and competitive terms.

**Citation:** Jebreel, N.; Domingo-Ferrer, J.; Sánchez, D.; Blanco-Justicia, A. KeyNet: An Asymmetric Key-Style Framework for Watermarking Deep Learning Models. *Appl. Sci.* **2021**, *11*, 999. https://doi.org/10.3390/app11030999

Academic Editor: David Megías. Received: 30 November 2020; Accepted: 17 January 2021; Published: 22 January 2021.

As model theft cannot be prevented in advance, legitimate owners need a robust and reliable way to prove their ownership of DL models in order to protect their intellectual property (IP).

Digital watermarking techniques have been widely used in the past two decades as a means to protect the ownership of multimedia content such as photos, videos, and audio recordings [9–12]. The general idea of watermarking is to embed secret information into a data item (without degrading its quality) and then use the embedded secret to claim ownership of the item.

This concept of watermarking can also be extended to DL models. Several authors have proposed to use digital WMs to prove the ownership of models and address IP infringement issues [13–21]. The proposed methods fall into two main classes: (i) *white-box methods*, which directly embed the WM information into the model parameters and then extract it by accessing those parameters, and (ii) *black-box methods*, which embed WMs in the output predictions of DL models. The latter methods employ so-called trigger (or carrier) data samples that trigger an unusual prediction behavior: these unusual trigger–label pairs constitute the model watermark, and the model owner can use them to claim her ownership.

As shown in Table 1, watermarking should fulfill a set of requirements to ensure its effectiveness and robustness [13,15,18,19,22]. Nonetheless, simultaneously satisfying all of these requirements is difficult to achieve [22].


**Table 1.** Requirements for watermarking of deep learning models.

#### *Contributions and Plan of This Article*

We propose *KeyNet*, a novel watermarking framework that meets a wide range of desirable requirements for effective watermarking. In particular, it offers fidelity, robustness, reliability, integrity, capacity, security, authentication, uniqueness, and scalability.

*KeyNet* depends on three components: the WM carrier set distribution, the signature, and the marked model with its corresponding private model. The private model is trained together with the original model to decode the WM information from the marked model's predictions. The WM information is only triggered by passing a sample from the WM carrier set, signed by the legitimate owner, to the corresponding marked model. The predictions of the marked model represent the encoded WM information, which can be decoded only by the corresponding private model: the private model takes those predictions as input and decodes the WM information.

Unlike in previous works (discussed in Section 2), a watermarked input can take more than one label, each corresponding to a position of the owner's signature. Besides, the number of WM classes can be greater than the number of original task classes.

To successfully embed the WM and preserve the original task accuracy, the owner leverages multi-task learning (MTL) to learn both the original and the watermarking tasks together. After that, the owner distributes the marked original model, and keeps the private model secret. The owner uses the private model as a private key to decode the original model's outputs on the WM carrier set.

The main contributions of our work can be summarized as follows:


The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the attack model to watermarking systems. Section 4 presents our framework in detail. Section 5 describes the experimental setup and reports the results on a variety of data sets. Finally, Section 6 gathers conclusions and proposes several lines of future research.

#### **2. Related Work**

The use of digital watermarking techniques has recently been extended from traditional domains such as multimedia contents to deep learning models. Related works can be categorized based on their application scenario as follows.

#### *2.1. White-Box Watermarking*

In this scenario, the model internal weights are publicly accessible. In [13], the authors embed an *N*-bit string WM into specific parameters of a target model via regularization. To this end, they add a regularizer term to the original task loss function that causes a statistical bias on those parameters and use this bias to represent the WM. To project the parameters carrying the WM information, they use an embedding parameter *X* for WM embedding and verification. Based on the same idea, the authors of [23] use an additional neural network instead of the embedding parameter to project the WM. The additional network is kept secret and serves for WM verification. Other works [24,25] also adopt the same approach for embedding the WM information in the internal weights of DL models.

#### *2.2. Black-Box Watermarking*

Assuming direct access to the weights of a DL model to extract the WM is often unrealistic, particularly when someone wishes to extract the WM to claim legitimate ownership of a seemingly stolen model that is in someone else's hands. To overcome this problem, several black-box watermarking methods have been proposed. These methods assume access to the predictions of the model and, thus, embed the WM information into the model's outputs. The idea of these methods is to use some samples and assign each sample a specific label within the original task classes [14,15,17,19,26]. The trigger–label pairs form what is called a trigger set or a carrier set. The carrier set is then used to embed the WM into the target model by training the target model to memorize its trigger–label pairs along with learning the original task. As DL models are overparameterized, it is possible to make them memorize the trigger–label samples through overfitting [27]. Such embedding methods are known as backdooring methods [15]. The triggers are used later to query a remote model and compare its predictions with the corresponding labels. A high proportion of matches between the predictions and the labels is used to prove the ownership of the model.
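The black-box verification step just described, i.e., querying a remote model with the triggers and comparing its predictions with the stored labels, amounts to computing a match rate. A minimal sketch (the `model` callable and `trigger_match_rate` name are illustrative, not from any cited method):

```python
def trigger_match_rate(model, trigger_set):
    """Fraction of trigger samples whose predicted label equals
    the WM label assigned at embedding time."""
    matches = sum(1 for x, y in trigger_set if model(x) == y)
    return matches / len(trigger_set)
```

Ownership is then claimed when this rate exceeds a predefined threshold.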

Trigger set methods can be classified into several types. A first type of methods is based on assigning a random label to each trigger. The trigger samples themselves may be random samples from different distributions [15] or adversarial samples [17]. This approach has many drawbacks. Beyond its limited capacity regarding the number of triggers that can be used for verification, it does not establish a strong link between the owner and her WM. Thus, it is easy for an attacker to insert his own WM by taking a set of samples, assigning them random labels, and then claiming ownership of the owner's model. This type of attack is called the ambiguity attack [24].

A second type of methods relies on inserting the WM information into the original data. The inserted information may be a graphical logo [28], the owner's signature [26], a specific text string (which could be the company name) [14], or some pattern of noise [14]. These methods may affect the accuracy of the model in the original task. Besides, the WM may be vulnerable to model fine-tuning aimed at destroying the WM. That is possible because the WM samples will be close to their counterparts in the same class in the feature space. Therefore, fine-tuning may cause the WM pattern to be ignored and those samples to be classified into their original classes again [29].

Another type of black-box methods proposes to exploit the discarded capacity in the intermediate distribution of DL models' output to embed the WM information [21]. They use non-classification images as triggers and assign each trigger a serial number (SN) as label. The SN is a vector that contains *n* decimal units, where *n* is the number of neurons in the output layer. The value of the SN serves as an identity bracelet that proves ownership of a marked model. To embed the WM information in the softmax layer predictions, they train the target model to perform two tasks simultaneously: the original task, which is a multiclass classification task, and the watermarking task, which is a regression task. They use the mean square error (MSE) as a loss function for the watermarking task to minimize the difference between the predicted value of a trigger and its corresponding SN. To link the owner with her marked model, they have a certification authority endorse the generated SNs. Ownership verification is performed by sending some trigger inputs, extracting their corresponding SNs, and having them verified by the authority. This method preserves the original task accuracy and also creates a link between the owner and her marked model. However, it has several drawbacks. The length of the SN depends on the size of the output layer of the model, which may prevent the owner from embedding a large WM. Moreover, since the method relies on values after the decimal point to express each symbol of the SN, slightly changing some decimal values corrupts the entire SN; a modification like model fine-tuning would thus destroy the WM information. In this respect, the authors do not evaluate two important types of modifications that could affect the WM: model fine-tuning and WM overwriting.

A recent paper [30] proposes to watermark DL models that output images. They force the marked model to embed a certain WM (e.g., logo) in any image output by that model. They train two models together: the marked model and the extractor model. The latter extracts the WM from the output of the former. The marked model is distributed while the extractor is kept secret by the owner. The paper does not evaluate the robustness of the method against the basic attacks that may target the marked model, such as model fine-tuning, model compression, and WM overwriting. Besides, there is a high probability that the WM extractor has memorized the WM in its weights; as a result, when it receives images from models different from the marked one, it might generate the same WM each time.

A shortcoming of most of the aforementioned WM methods is that the WM is the same in all copies of the model [22]. Therefore, if the owner distributes more than one copy of a model, it is impossible for the legitimate owner to determine which of the authorized users has leaked it.

#### **3. Attack Model**

To ensure the robustness of a watermarking methodology, it should effectively overcome (at least) three potential attacks:


#### **4. The** *KeyNet* **Framework**

Instead of having the model memorize the WM through overfitting, in our approach we design the watermarking task as a standalone ML task with its logical context and rules. First, this task performs a one-vs.-all classification so that it can distinguish WMs from original samples with different distributions. Second, the watermarking task learns the features that enable it to identify the spatial information of the legitimate owner's signature. Third, it learns to distinguish the pattern of the owner's signature from the patterns of fake signatures. The purpose of designing the watermarking task in this way is (i) to increase the difficulty of the task so that an attacker with little training data cannot add his WM without losing the accuracy of the original task, (ii) to provide a reliable verification method that strengthens the owner's association with her marked model, (iii) to achieve greater security by keeping the private model in the hands of the owner, (iv) to embed a robust WM without affecting the accuracy of the original task, (v) to produce different unique copies of a DL model for different users of the system based on the same carrier by signing the carrier with the joint signature of the user/owner and (vi) to scale for a large number of users and identify the leakage point with high confidence and little effort.

*KeyNet* consists of two main phases: watermark embedding and watermark extraction and verification. Figure 1 shows the global workflow of *KeyNet*. The marked DL model is used as a remote service, so that the user can only obtain its final predictions. *KeyNet* passes the final predictions of the remote DL model to its corresponding private model, which uses them to decode the WM information. In the ownership verification protocol we exploit the fact that each sample in the owner's WM carrier set can take different labels based on the position of the owner's signature in it. We next briefly explain the workflow of each phase.

**Figure 1.** *KeyNet* global workflow.

**Watermark embedding.** *KeyNet* takes four main inputs in the WM embedding phase: the target model (pretrained or from scratch), the original data set, the owner's WM carrier set, and the owner's information string. The outputs are the marked model, its corresponding private model, and the owner's signature. The WM carrier set samples are signed using the owner's signature. After that, the signed WM carrier set is combined with the original data set, and both are used to fine-tune (or train) the target model. The private model takes the final predictions of the original model as inputs and outputs the position of the owner's signature on the WM sample. To embed the WM information and preserve the main task accuracy at the same time, WM embedding leverages multi-task learning (MTL) to train the two models jointly.

MTL is an ML approach that allows learning multiple tasks in parallel by sharing the feature representation among them [31,32]. Many MTL methods [33–35] show that different tasks can share several early layers, and then have task-specific parameters in the later layers. MTL also helps the involved tasks to generalize better by adding a small amount of noise that helps them reduce overfitting [36,37].

In our framework, the original model parameters are shared among the original task and the watermarking task. When the marked model receives unmarked data samples, its predictions represent the classification decision on those samples. However, when it receives a watermarked sample, its output represents the features that the private model needs to distinguish the signature position on that sample. For this to be possible, the private model forces the shared layer (the original model parameters) to produce a different representation of the WM samples. We can see the private model as a private key held only by the owner that decodes the WM information from the original model predictions. More details about this phase are given in Sections 4.2 and 4.3.

**Watermark extraction and verification.** The owner can extract the WM information from a suspicious remote DL model by taking a random sample from her WM carrier set, putting her signature in one of the predefined positions, and querying the remote model. After that, she passes the remote model's predictions to the private model. If the private model decodes the WM information and provides the position of the signature with high accuracy, then the owner can claim her ownership.

To verify the ownership of a remote black-box DL model, the owner first delivers the WM carrier set and her signature to the authority. She also tells the authority about the methodology used to sign the WM samples, along with the predefined positions where the WM may be placed. The authority (i.e., the *verifier*) randomly chooses a sample from the carrier set, puts the signature in a random position, queries the remote DL model, and sends the model's predictions to the owner. The owner (i.e., the *prover*) takes the predictions, passes them to her private model, and tells the authority the position of her signature on the image. The authority repeats the proof as many times as it desires. After that, the owner's answer accuracy is evaluated against a minimum threshold. If the owner surpasses the threshold, her ownership is regarded as proven by the authority. More details about this phase are given in Section 4.4.

The following subsections describe each phase in detail. First, we formalize the problem. Then, we describe the methodology for signing and labeling the WM carrier set using the hashed value of the owner's information. After that, we describe the WM embedding phase by training the original model and the private model on the original and the watermarking tasks jointly. Finally, we explain the WM extraction and verification phase.

#### *4.1. Problem Formulation*

The key idea of our framework is to perform two tasks at the same time: the original classification task *T<sub>org</sub>* and the watermarking task *T<sub>wm</sub>*. To do so, *KeyNet* leverages the multi-task learning (MTL) approach to achieve high accuracy in both tasks by sharing the parameters of the original model between the two tasks. *KeyNet* adds a private model to the original model. The original model's objective is to correctly classify the original data samples into their corresponding labels, while the private model's objective is to correctly predict the position of the owner's signature in a sample of the WM carrier set using the original model's predictions. The problem being tackled can be formally represented as follows:

• **Representation of the original, private, and combined models.** Let *D<sub>org</sub>* = {(*x<sub>i</sub>*, *y<sub>i</sub>*)}<sub>*i*=1</sub><sup>*n*</sup> be the original task data and *D<sub>wm</sub>* = {*c<sub>j</sub>*}<sub>*j*=1</sub><sup>*m*</sup> be the WM carrier set data. Let *h* be the function of the original model and *f* be the function of the private model. Let *θ*<sub>1</sub> be the parameters of *h* and *θ*<sub>2</sub> be the parameters of *f*. Let *signature* be the owner's signature and *PL* = {(*p<sub>k</sub>*, *l<sub>k</sub>*)}<sub>*k*=1</sub><sup>*z*</sup> be the set of predefined position–label pairs (e.g., position: top left, label: 1; position: bottom right, label: 4), where *z* is the total number of positions at which the signature can be located on a WM carrier set sample. Let *sign* be the function that puts a *signature* on a carrier set sample *c* at position *p<sub>k</sub>* and returns the signed sample *c<sup>p<sub>k</sub></sup>* and its corresponding label *l<sub>k</sub>* as:

$$(c^{p_k}, l_k) = \mathrm{sign}(signature, c, p_k, l_k)$$

Let *D<sub>wm</sub><sup>signed</sup>* be the signed carrier set, which contains all the (*c<sup>p</sup>*, *l*) pairs. We use *D<sub>org</sub>* and *D<sub>wm</sub><sup>signed</sup>* to train both *h* and *f* to perform *T<sub>org</sub>* and *T<sub>wm</sub>*.

Typically, the function *h* tries to map each *x<sub>i</sub>* ∈ *D<sub>org</sub>* to its corresponding *y<sub>i</sub>*, that is, *h*(*x<sub>i</sub>*) = *y<sub>i</sub>*.

Let *f*(*h*) be the composite function that aims at mapping each *c<sub>j</sub><sup>p<sub>k</sub></sup>* to its corresponding *l<sub>k</sub>*, that is, *f*(*h*(*c<sup>p<sub>k</sub></sup>*)) = *l<sub>k</sub>*.

• **Embedding phase.** We formulate the embedding phase as an MTL problem where we jointly learn two tasks that share some parameters in the early layers and then have task-specific parameters in the later layers. The shared parameters in our case are *θ*<sub>1</sub>, while *θ*<sub>2</sub> are the WM task-specific parameters. We compute the weighted combined loss *L* as

$$L = \alpha Loss(h(x), y) + (1 - \alpha) Loss(f(h(c^p)), l),$$

where *h*(*x*) represents the predictions on the original task samples, *f*(*h*(*c<sup>p</sup>*)) represents the predictions on *D<sub>wm</sub><sup>signed</sup>* samples, *Loss*(*h*(*x*), *y*) is the loss function that penalizes the difference between the original model outputs *h*(*x*) and the original data targets *y*, *Loss*(*f*(*h*(*c<sup>p</sup>*)), *l*) is the loss function that penalizes the difference between the composite model outputs *f*(*h*(*c<sup>p</sup>*)) and the signed WM carrier set targets *l*, and *α* is the combined weighted loss parameter. Then, we seek *θ*<sub>1</sub> and *θ*<sub>2</sub> that make *L* small enough to get acceptable accuracy on both *T<sub>org</sub>* and *T<sub>wm</sub>*. Once this is done, the WM has been successfully embedded while preserving the accuracy of the original task *T<sub>org</sub>*.
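For a single pair of (already softmax-normalized) prediction vectors, the weighted combined loss can be sketched as follows (an illustrative sketch, not the paper's implementation; `cross_entropy` here is the plain negative log-likelihood of the target class):

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target class, given a probability vector.
    return -math.log(probs[target])

def combined_loss(alpha, orig_probs, y, wm_probs, l):
    # L = alpha * Loss(h(x), y) + (1 - alpha) * Loss(f(h(c^p)), l)
    return alpha * cross_entropy(orig_probs, y) + (1 - alpha) * cross_entropy(wm_probs, l)
```

Setting *α* close to 1 favors the original task; setting it close to 0 favors the watermarking task.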

• **Verification phase.** The verification function *V* checks whether a claimer (also known as the *prover*), who has delivered her *signature* and WM carrier set *D<sub>wm</sub>* to the authority (also known as the *verifier*), is the legitimate owner of a remote model *h*′. If the prover is the legitimate owner of *h*′, she will be able to pass the verification process and thus prove her ownership of *h*′. This is because she possesses the private model *f*, which was trained to decode the predictions of *h*′ on her signed *D<sub>wm</sub>*.

Here, *r* represents the number of rounds required in the verification process and *T* denotes the threshold needed to prove the ownership of *h*′. Note that the authority also knows the signing function *sign* used to sign the *D<sub>wm</sub>* samples in order to obtain the (*c<sup>p<sub>k</sub></sup>*, *l<sub>k</sub>*) pairs.

The function *V* can be expressed as *V*({(*f*(*h*′(*c<sup>p<sub>k</sub></sup>*)), *l<sub>k</sub>*, *p<sub>k</sub>*)}<sub>*k*=1</sub><sup>*r*</sup>, *T*) ∈ {*True*, *False*}.
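In essence, *V* is an accuracy check over the *r* rounds. A minimal sketch (the name `verify_ownership` is hypothetical; each round pairs the position decoded by the prover's private model with the true position chosen by the verifier):

```python
def verify_ownership(rounds, threshold):
    """rounds: list of (decoded_position, true_position) pairs
    collected over the r verification rounds.
    Returns True iff the prover's accuracy reaches the threshold T."""
    correct = sum(1 for decoded, true in rounds if decoded == true)
    return correct / len(rounds) >= threshold
```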

#### *4.2. Watermark Carrier Set Signing and Labeling*

The methodology we use for labeling the WM carrier set is key in our approach. In contrast to related works, which assign a unique label to each WM carrier set sample, our labeling method allows a single sample to carry more than one label. More precisely, any sample *c* ∈ *D<sub>wm</sub>* can take one of *z* labels {*l<sub>k</sub>*}<sub>*k*=1</sub><sup>*z*</sup>, where *z* is the number of predefined positions {*p<sub>k</sub>*}<sub>*k*=1</sub><sup>*z*</sup> at which the *signature* of the owner can be placed. Besides the *z* positions, if a sample is not signed by the legitimate owner, it takes label 0 by default. Moreover, if the sample is not in *D<sub>wm</sub>*, it takes label 0 by default even if it is signed by the owner. Algorithm 1 formalizes the method used to sign and label the *D<sub>wm</sub>* samples.

First and foremost, the owner's information and the metadata of her model are endorsed by the authority. This information is a string of arbitrary length. After that, Algorithm 1 returns the signed WM carrier set *D<sub>wm</sub><sup>signed</sup>*, consisting of (signed *D<sub>wm</sub>* sample, label) pairs; a signed set *D<sub>dif</sub><sup>signed</sup>*, consisting of (signed sample from a set *D<sub>dif</sub>* of a different distribution, 0) pairs; and the owner's signature *signature* used to sign the samples. The inputs to Algorithm 1 are:


Algorithm 1 starts by taking the hash value of *infStr* and then converting it to a square array of size *s*, as follows. In our implementation we use *SHA256*, which yields 256 bits that are encoded as 64 hexadecimal characters. The last *s* (with *s* ≤ 64) of these characters are converted to a decimal vector of length *s*. The decimal vector values are normalized between 0 and 1 by dividing them by the maximum value in the vector. The normalized vector is then reshaped into a square array. The resulting array represents the owner's *signature*. Note that any hash function other than SHA256 could also be used.
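The steps above can be sketched as follows (a minimal sketch; the name `hash_and_reshape` mirrors the *hashAndReshape* call in Algorithm 1, and *s* is assumed to be a perfect square so the vector can be reshaped into a square array, e.g., *s* = 64 gives an 8×8 signature):

```python
import hashlib
import math

def hash_and_reshape(inf_str, s):
    """Derive the owner's signature from her information string:
    SHA-256 -> 64 hex characters -> last s characters -> decimal
    values -> normalize to [0, 1] -> square array."""
    hex_digest = hashlib.sha256(inf_str.encode()).hexdigest()  # 64 hex chars
    values = [int(ch, 16) for ch in hex_digest[-s:]]           # last s chars -> 0..15
    max_v = max(values) or 1                                   # avoid division by zero
    normalized = [v / max_v for v in values]
    side = math.isqrt(s)
    return [normalized[i * side:(i + 1) * side] for i in range(side)]
```

The derivation is deterministic, so the owner can regenerate her signature from *infStr* at any time.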

#### **Algorithm 1** Signing a WM Carrier Set

**Input:** Owner's information *infStr*, owner's signature size *s*, owner's WM carrier set *D<sub>wm</sub>*, signature positions/labels set *PL*, other distributions' samples *D<sub>dif</sub>*

**Output:** Signed labeled WM samples *D<sub>wm</sub><sup>signed</sup>*, signed samples from different distributions *D<sub>dif</sub><sup>signed</sup>*, owner's signature *signature*

1: *signature* ← *hashAndReshape*(*infStr*, *s*) // The owner's signature.


22: **end for**

**return** *D<sub>wm</sub><sup>signed</sup>*, *D<sub>dif</sub><sup>signed</sup>*, *signature*

Once we obtain *signature*, we start the labeling step of the WM carrier set *D<sub>wm</sub>*. For each *c* ∈ *D<sub>wm</sub>*, we replicate *c* *z* times, where *z* is the total number of possible positions {*p<sub>k</sub>*}<sub>*k*=1</sub><sup>*z*</sup> of the signature. Then, we use the *sign* function to place *signature* at position *p<sub>k</sub>* to obtain the (*c<sup>p<sub>k</sub></sup>*, *l<sub>k</sub>*) pair.
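The replication-and-signing step can be sketched as follows (an illustrative sketch assuming samples and the signature are plain 2-D lists of pixel values and positions are (row, column) offsets; `sign_sample` and `label_carrier_sample` are hypothetical helper names, not the paper's code):

```python
def sign_sample(signature, image, position):
    # Paste the signature block into a copy of the image at (row, col).
    r0, c0 = position
    signed = [row[:] for row in image]
    for i, sig_row in enumerate(signature):
        for j, v in enumerate(sig_row):
            signed[r0 + i][c0 + j] = v
    return signed

def label_carrier_sample(signature, c, positions):
    # One unsigned copy labeled 0, plus one signed copy per predefined
    # position, labeled 1..z (label k corresponds to position p_k).
    pairs = [(c, 0)]
    for k, p in enumerate(positions, start=1):
        pairs.append((sign_sample(signature, c, p), k))
    return pairs
```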

We also leave one copy of each sample without signing and assign it the label 0. That is, if a carrier set sample *c* is not signed with *signature*, it will be represented by the (*c*, 0) pair. We then do two steps:


The goal of the above two steps is to make the marked model *h*∗ output the information that tells the position of a signature only if we pass to it a sample that *belongs to D<sub>wm</sub> and is signed with the real signature*. Otherwise, *h*∗ ignores the presence of any signature different from *signature* on *D<sub>wm</sub>* samples. This also prevents *h*∗ from responding to samples from distributions other than the *D<sub>wm</sub>* distribution.

Finally, Algorithm 1 returns the signed WM samples *D<sub>wm</sub><sup>signed</sup>*, the signed samples from different distributions *D<sub>dif</sub><sup>signed</sup>*, and the signature *signature* that will be used to trigger the marked model *h*∗. Note that this process is performed only once, before the WM is embedded.

#### *4.3. Watermark Embedding*

To successfully embed the WM in the original model *h* without compromising the accuracy of the original task, we train *h* and the private model *f* jointly. As a large number of carrier set samples have been signed, we first randomly select one-fifth of *D<sub>wm</sub><sup>signed</sup>* for training. The random selection allows all the possible states to be represented while reducing the carrier set size. The signed samples from other distributions *D<sub>dif</sub><sup>signed</sup>* are combined with the randomly selected ones and assigned to *D*<sub>2</sub>. Finally, we add *D*<sub>2</sub> to the original task data *D<sub>org</sub>*. The resulting combined data set *D* = *D<sub>org</sub>* ∪ *D*<sub>2</sub> is used in the training step as specified next.
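Assembling the combined training set can be sketched as follows (a minimal sketch; `build_joint_dataset` is a hypothetical name, and the data sets are treated as plain lists of sample–label pairs):

```python
import random

def build_joint_dataset(d_org, d_wm_signed, d_dif_signed, frac=0.2):
    """Randomly keep a fraction (one-fifth by default) of the signed
    carrier set, add the different-distribution samples to form D_2,
    then merge D_2 with the original task data."""
    k = max(1, int(len(d_wm_signed) * frac))
    d2 = random.sample(d_wm_signed, k) + d_dif_signed
    return d_org + d2
```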

During joint training, a batch *b* is taken from *D*. Then, *b* is separated into two sub-batches: {*x*, *y*} ∈ *D<sub>org</sub>* and {*c*, *l*} ∈ *D*<sub>2</sub>. {*x*} is passed to the original model *h* and the loss *Loss*(*h*(*x*), *y*) is calculated. On the other hand, {*c*} is first passed to *h*, and then the predictions *h*(*c*) of the original model are passed to the private model *f*; the loss *Loss*(*f*(*h*(*c*)), *l*) is afterwards calculated. As we deal with two classification tasks, the cross-entropy loss function is used for both tasks.

We use the parameter *α* to balance the weights of *Loss*(*h*(*x*), *y*) and *Loss*(*f*(*h*(*c*)), *l*) before adding them up in the joint loss *L*. Parameter *α* allows us to choose the combination of weighted losses that best preserves the accuracy of *T<sub>org</sub>* while embedding the WM successfully. Then, the parameters of *h* and *f* are optimized to minimize *L*.

Reducing *Loss*(*h*(*x*), *y*) forces *h* to predict the correct class for *x*, while reducing the watermarking task loss *Loss*(*f*(*h*(*c*)), *l*) forces the private model to distinguish the distribution of the WM carrier set *D<sub>wm</sub>* and predict the location of the owner's signature in its samples. The original model, in addition to performing the original task, also executes the first part of the watermarking task by outputting the features needed to find the position of the signature. Using these features as input, the private model performs the second part of the watermarking task, which consists in identifying the signature position.
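A single joint-training step over the two sub-batches can be sketched as follows (illustrative only; `h`, `f`, and `loss_fn` are stand-ins for the original model, the private model, and the cross-entropy loss, and the gradient update on *θ*<sub>1</sub>, *θ*<sub>2</sub> is omitted):

```python
def joint_training_step(h, f, batch_org, batch_wm, alpha, loss_fn):
    """batch_org: list of (x, y) original-task pairs.
    batch_wm: list of (c, l) signed carrier-set pairs.
    Returns the joint loss L to be minimized over theta_1 and theta_2."""
    loss_org = sum(loss_fn(h(x), y) for x, y in batch_org) / len(batch_org)
    # WM samples go through h first; f decodes the signature position
    # from h's predictions.
    loss_wm = sum(loss_fn(f(h(c)), l) for c, l in batch_wm) / len(batch_wm)
    return alpha * loss_org + (1 - alpha) * loss_wm
```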

Regarding the architecture of the private model *f* , the number of inputs corresponds to the size of *h* predictions, whereas the number of outputs corresponds to the number of classes of the WM task *z* + 1. We also add at least one hidden layer in between. The hidden layer enriches the information coming from the original model before passing it to the output layer of the private model.

Algorithm 2 summarizes the process of embedding the WM. It takes an unmarked model *h* (pretrained or trained from scratch), the private model *f*, the original data set *D<sub>org</sub>*, the signed WM carrier set *D<sub>wm</sub><sup>signed</sup>*, signed samples from other distributions *D<sub>dif</sub><sup>signed</sup>*, and the joint loss balancing parameter *α*. The output of the embedding phase is a marked model *h*∗ along with its corresponding private model *f*.

#### **Algorithm 2** Watermark Embedding

**Input:** Unmarked DL model *h*, private model *f*, original data set *D<sub>org</sub>*, signed WM carrier set *D<sub>wm</sub><sup>signed</sup>*, signed samples from other distributions *D<sub>dif</sub><sup>signed</sup>*, batch size *BS*, weighted loss parameter *α*.

**Output:** Marked model *h*∗, corresponding private model *f* .


$$D_2 \gets D_2 \cup D_{dif}^{signed}$$

