#### **1. Introduction**

Digital watermarking refers to a family of methods that embed a signal into a digital object. For example, visible logos in a TV broadcast or on a picture for sale on the web are two applications of visible watermarking. Digital technology, however, supports more sophisticated applications of the same idea, including watermarks that are invisible to humans. Copyright protection, origin tracking, authentication, and integrity protection are examples of applications of this technology.

It is possible to classify watermarking algorithms according to various characteristics [1]: the main properties of digital watermarking algorithms will be discussed in the following.

A first, trivial, subjective property of watermarks is imperceptibility, that is the inability of a human being to recognize the presence of a watermark embedded into an object.

A digital watermark may be robust, semi-fragile, or fragile. Robust watermarking refers to the embedding of a signal that must resist malicious attempts at its removal, so that the presence of the signal can still be shown; a typical application of this kind of watermarking algorithm is copyright protection. On the other hand, fragile watermarks are designed to be altered by even minimal modifications of the digital object containing them; their fields of application are integrity protection and authentication. The class of semi-fragile watermarking algorithms contains those methods that accept a minimal level of modification without raising a tampering alarm.

On the detection side, watermarking methods may be grouped as informed (non-blind) or blind, depending on whether or not the original host object is needed. Moreover, if an algorithm can recover the original data, it is called reversible; otherwise, it is called lossy or non-reversible.

The values of the watermark signal *w* may be embedded into the host object by modifying the object's features in different domains; the most widely used are the spatial domain and the frequency (or transformed) domain. The spatial domain refers to the values that describe and represent the object: for an image, these may be the pixel intensities; for a 3D model, the (spatial) coordinates of the vertices. When embedding in the frequency domain, instead, the object's data are first transformed with a linear transform, such as the discrete Fourier transform, the wavelet transform, the singular value decomposition, or the Karhunen–Loève Transform (KLT); the watermark signal is then used to alter the transform coefficients, and finally the inverse transformation is applied to compute the modified values, which represent the watermarked object. In general, embedding in the frequency domain requires satisfying non-linear constraints (such as obtaining integer pixel values or modifying only some of the transform coefficients); thus, computational intelligence algorithms that search for near-optimal solutions in non-linear domains are a possible embedding method.

Most of the 3D model watermarking works are focused on copyright protection (by inserting robust watermarks), while there are fewer works on integrity protection and authentication. Moreover, existing approaches to fragile watermarking introduce high distortion and/or might require extra information for verification. In this paper, we present a blind, non-reversible fragile watermarking algorithm that adds an extra level of security (by defining a secret embedding space through the KLT transform) and protects the integrity of 3D models, with negligible alterations to the host model vertices' coordinates.

The original file format of the host model may be either textual or binary, but the watermarked model file is binary (as in many graphics file formats, such as the one used by Blender).

The following section recalls some related works in the field of 3D model watermarking. Sections 3 and 4 present two fundamental tools used by the proposed algorithm, which is discussed in Section 5, to securely embed a watermark in a 3D model. Section 6 reports numerical results from experiments applying the method to publicly available models. The results are discussed in Section 7, and some conclusions are drawn in Section 8.

#### **2. Related Works**

The field of digital watermarking has seen a large development of methods for images and audio, but the protection of 3D models has also received attention from researchers. Several algorithms for robust (see Table 1) and fragile (see Table 2) watermarking of 3D models have been developed in recent years. Here, we report a list of the properties declared by the authors, along with their pros and cons.


**Table 1.** Summary of related works on robust watermarking: main properties, pros, and cons.


**Table 2.** Summary of related works on fragile watermarking: main properties, pros, and cons. KLT, Karhunen–Loève Transform.

Our proposed method shares a number of features with the literature and differs substantially in others: the most significant difference is the embedding space, which is secret and provides an extra level of security, as it is infeasible for an attacker to reproduce such a space. Most of the related works in the literature are actually robust watermarking schemes, while our method is an extremely fragile one, whose watermark is broken by negligible alterations to the 3D models it protects. Similarly to [13], we modify the binary float representation of the vertices' coordinates, but our watermark is not directly inserted into them, as is done in [13]. Moreover, the proposed method is immune to vertex reordering and can tolerate affine transformations to the model when these transformations are stored separately from the vertex coordinates in the output file format (as is done in Blender). Finally, similarly to [20], we use a genetic algorithm to compute a (quasi)optimal solution to the embedding problem, but again, our watermark is embedded into a secret space and not directly into the vertices' coordinates.

#### **3. The Karhunen–Loève Transform**

The discrete Karhunen–Loève Transform (KLT) (also called Hotelling transform or Principal Component Analysis (PCA)) is a linear transformation that maps vectors from one *n*-dimensional vector space to another vector space of the same dimension: this mapping is defined by a square matrix of size *n* × *n*, called the kernel.

Unlike other linear transformations such as the discrete cosine transform or the Fourier transform, the KLT has a kernel that is not fixed a priori, i.e., its transformation matrix is computed from a set of vectors that must be given to define the mapping.

This characteristic of the KLT is exploited in the present algorithm to obtain a compact, secure, and efficient way to define a secret embedding space: if the set of vectors used in computing the KLT kernel is kept secret, then the kernel is secret as well, and it defines a transformed domain in which the watermark can be embedded securely. For example, if the pixel values of a secret image are used to define these vectors, then only the entities possessing this image will be able to verify the integrity of the 3D model. Moreover, without knowledge of the embedding space, it is not possible to modify the 3D model without altering the fragile watermark.

Consider a set of vectors {**x**<sub>1</sub>, **x**<sub>2</sub>, ... , **x**<sub>*r*</sub>} of an *n*-dimensional vector space, and compute their mean vector **m** = *E*{**x**<sub>*i*</sub>}. Then, evaluate the covariance matrix of the centered vectors **C** = *E*{(**x**<sub>*i*</sub> − **m**)(**x**<sub>*i*</sub> − **m**)′}, where **z**′ represents the conjugate transpose of **z**. The eigenvectors **e**<sub>*i*</sub>, *i* = 1, 2, . . . , *n*, of the covariance matrix **C**, computed as:

$$\mathbf{C}\mathbf{e}\_{i} = \lambda\_{i}\mathbf{e}\_{i} \tag{1}$$

define an orthonormal basis for the *n*-dimensional vector space; the *λ*<sub>*i*</sub> are the associated eigenvalues. Thus, arranging the eigenvectors as rows of a matrix **A** in descending order of the respective eigenvalues, the KLT **y** of a vector **x** is computed as:

$$\mathbf{y} = \mathbf{A}(\mathbf{x} - \mathbf{m}).\tag{2}$$

Each element of the vector **y** is called the coefficient of the transform; the position of every coefficient in the vector **y** is called the order of the coefficient.

It is possible to reverse the transformation process through the inverse KLT:

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{y} + \mathbf{m} \tag{3}$$

which, by the orthonormality of **A**, may be also written as:

$$\mathbf{x} = \mathbf{A}'\mathbf{y} + \mathbf{m}.\tag{4}$$

One property of the eigenvectors is that they define the directions of maximum spread in the data sample. Sorting them according to their associated eigenvalues allows exploiting the energy compaction property of the KLT, which is useful when expressing the elements of the vector space with a reduced number of coefficients: it may be shown that the eigenvalues represent the variance of the coefficients in each dimension.
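As an illustration of Equations (1)–(4), the following NumPy sketch computes a KLT kernel from a set of sample vectors and applies the forward and inverse transforms. The function names (`klt_basis`, `klt`, `inverse_klt`) and the synthetic real-valued samples are assumptions made for this example only; the covariance is estimated as the sample average over the *r* given vectors.

```python
import numpy as np

def klt_basis(samples):
    """Return (A, m, eigenvalues) for the KLT defined by `samples` (shape r x n)."""
    samples = np.asarray(samples, dtype=float)
    m = samples.mean(axis=0)                       # mean vector m
    centered = samples - m
    C = centered.T @ centered / samples.shape[0]   # covariance E{(x - m)(x - m)'}
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending eigenvalues, eigenvectors as columns
    order = np.argsort(eigvals)[::-1]              # descending eigenvalue order
    A = eigvecs[:, order].T                        # eigenvectors as rows of A
    return A, m, eigvals[order]

def klt(x, A, m):
    """Forward transform, Equation (2)."""
    return A @ (np.asarray(x, dtype=float) - m)

def inverse_klt(y, A, m):
    """Inverse transform, Equation (4): A is orthonormal, so its inverse is A'."""
    return A.T @ y + m

# Synthetic data (an assumption for the example): 500 vectors of dimension 4.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

A, m, lam = klt_basis(samples)
x = samples[0]
y = klt(x, A, m)
x_back = inverse_klt(y, A, m)
```

In this sketch, the variance of each transform coefficient over the sample set equals the corresponding eigenvalue, which is the property the energy compaction argument relies on.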

For more details on the KLT, see [21].

#### **4. Genetic Algorithms**

A Genetic Algorithm (GA) is a computing paradigm that simulates, in a simplified manner, the evolution and selection processes of natural species.

It may be used to solve non-linear optimization problems when they can be coded as a set of parameters and it is possible to define a function, called the fitness (function), that measures how close a solution is to the optimum.

A set of instantiated parameters describing the problem at hand is called an individual: these parameters are put in a sequence similar to a chromosome, which is distinctive for its individual and defines it. Moreover, a fitness function is defined to evaluate the quality of the individual in terms of approximating the optimal solution.

A set of individuals, i.e., a population, is evolved in a way similar to the development of natural species. They mate, reproduce, and possibly have random modifications of their genes.

A chromosome codes a sequence of parameters, typically binary or integer numbers or a mix of them (but any type of data could be used in principle) with the constraint that the operations on them keep the data types consistent in the resulting chromosomes.

A population of individuals is created and (randomly) initialized; in general, a population of a hundred chromosomes is considered adequate.

Then, the GA goes through an iterative process, and in every cycle, the population is evolved according to the following steps:


The described cycle is performed until a termination condition is met. In general, to avoid infinite cycles, an upper limit to the number of generations is fixed. Furthermore, the cycle may be terminated when the fitness of at least one individual drops below a pre-defined threshold (in this case, the lower the fitness, the better the individual). In both cases, the GA returns the best individual, representing the near-optimal solution it was able to find. For a deeper insight into GAs, see [22,23].
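The overall loop and its two termination conditions can be sketched as follows. This is a generic textbook GA and not necessarily the variant used in this work: tournament selection, one-point crossover, per-gene mutation, elitism, and all parameter values are assumptions chosen for the example, and the toy fitness (distance from a target integer vector) is purely illustrative.

```python
import random

def run_ga(fitness, chrom_len, gene_max, pop_size=100, max_gen=200,
           threshold=1, mutation_rate=0.05, seed=0):
    """Minimize `fitness` over integer chromosomes with genes in 0..gene_max."""
    rng = random.Random(seed)
    # Random initialization of the population.
    pop = [[rng.randint(0, gene_max) for _ in range(chrom_len)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)

    def tournament(population):
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(max_gen):                 # upper limit on the generations
        if fitness(best) < threshold:        # fitness dropped below the threshold
            break
        children = [best[:]]                 # elitism: the best individual survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, chrom_len - 1)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randint(0, gene_max) if rng.random() < mutation_rate else g
                     for g in child]                     # random gene mutation
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best

# Toy problem (hypothetical): find three byte values close to a target triple.
target = [12, 200, 77]

def toy_fitness(chromosome):
    return sum(abs(g - t) for g, t in zip(chromosome, target))

best = run_ga(toy_fitness, chrom_len=3, gene_max=255)
```

Because the best individual is always copied into the next generation, the best fitness never gets worse from one generation to the next in this sketch.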

As will be shown in Section 5, to minimize the noise due to watermark embedding, the watermark bits are stored by modifying the least significant part of the floating point numbers representing the 3D model vertex coordinates. This is obtained by treating the bytes encoding these floating point numbers as unsigned integers and changing the least significant bytes. This mode of operation results in a minimization problem in a non-linear space, a task for which GAs, among other algorithms, are flexible and efficient to implement and apply.
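To give a sense of the magnitude of the alteration, the sketch below overwrites the least significant mantissa byte of a single-precision value and measures the relative change. The helper name `set_lsb_byte` and the big-endian byte layout (so that the least significant byte is the fourth one) are assumptions made for the example.

```python
import struct

def set_lsb_byte(value, new_byte):
    """Overwrite the least significant mantissa byte of a float32 with new_byte (0..255)."""
    raw = bytearray(struct.pack(">f", value))   # big-endian: raw[3] is the least significant byte
    raw[3] = new_byte                            # the byte is handled as an unsigned integer
    return struct.unpack(">f", bytes(raw))[0]

x = 1.0
worst = set_lsb_byte(x, 255)          # largest change for this value
rel_err = abs(worst - x) / abs(x)     # relative distortion of the coordinate
```

For any normalized single-precision value, changing only this byte moves the value by at most 255 units in the last place, i.e., by a relative amount below 2<sup>−15</sup> (about 3.1 × 10<sup>−5</sup>).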

#### **5. The Proposed Algorithm**

The developed algorithm has the objective to protect the shape and the structure of a 3D model, defined in terms of a mesh of polygons, providing a verification that allows for identification and localization of modifications to the model. In particular, the algorithm that will be presented in this section will protect:


In the present embodiment, vertices' normal vectors are not considered because it was found that some 3D modeling software (e.g., Blender) alters them according to the vertices' positions; thus, the authentication data would be made obsolete by the modeling software at saving or loading time, inducing our verification procedure to flag vertices as forged. Nonetheless, protecting vertices will have as a side effect the protection of normals computed from them.

To allow modifications of the file that preserve the structure of the 3D model it represents, the proposed method tolerates changes in the order in which the vertices and the polygons are stored in the file. This provides independence, to some extent, from the particular file format.

The method is composed of two main modules, namely an embedder and an extractor, and three companion modules, which are a key generator, a watermark generator, and a verifier. The interactions between the modules and the main data interchanged are shown in Figure 1.

**Figure 1.** A high level scheme of the proposed method.

Each vertex is considered an Embedding Unit (EU); in the present embodiment, the vertex is defined by three spatial coordinates (*x*, *y*, *z*) and by the polygons to which the vertex belongs; a detailed description of this structure and of the embedding procedure will be given at the end of this section and in Section 5.2. The EU concept solves a problem of 3D models that does not arise with audio and image samples, namely the lack of a total order of the vertices.

A bird's eye-view of the various modules is the following:


The input data are a 3D model composed of a set of vertices and a set of polygons. Each vertex is defined by three coordinates (*x*, *y*, *z*); each coordinate value is handled in its binary form, as a four-byte float in the IEEE 754 format. Every polygon is defined by the sequence of the indexes of its vertices. The watermark Embedding Unit (EU) is defined by the vertex data (*x*, *y*, *z*) along with a combined fingerprint of the *t* polygons to which the vertex belongs.

The fingerprint of a polygon is computed as follows: for every pair of consecutive vertices (**v**<sub>*i*</sub>, **v**<sub>*i*+1</sub>) encountered on the perimeter, a cryptographic hash (c. h.) function *H* is computed, *H*(**v**<sub>*i*</sub>, **v**<sub>*i*+1</sub>) = *h*<sub>*i*,*i*+1</sub>; thus, for a polygon of *n* vertices, *n* cryptographic hashes *h*<sub>1,2</sub>, *h*<sub>2,3</sub>, ... , *h*<sub>*n*,1</sub> are obtained. To be independent of the starting vertex, these hashes are XORed, *q* = *h*<sub>1,2</sub> ⊕ *h*<sub>2,3</sub> ⊕ ... ⊕ *h*<sub>*n*,1</sub>, and a c. h. is then computed on the result *q*; the value *H*(*q*) is the fingerprint of the polygon. Finally, a combined fingerprint *F* is obtained by XORing (to be independent of the order in which the polygons are considered) the fingerprints *H*(*q*<sub>*i*</sub>) of the *t* polygons the vertex belongs to:

$$F = H(q\_1) \oplus H(q\_2) \oplus \dots \oplus H(q\_t). \tag{5}$$

The c. h. function *H* used is MD5, whose hash length is 16 bytes.

Thus, the EU is composed of *x*, *y*, *z* (all floats, each one occupying four bytes) and a fingerprint *F* (bit string, 16 bytes), making 28 bytes in total (see Figure 2).

In Figure 2, note the bytes marked with vertical stripes: they are the least significant bytes of the mantissa of each float value. The watermark is embedded by altering only those bytes; it follows that the polygon hashes do not take those bytes into account in the computation, i.e., if [·]<sub>3</sub> is the operator that extracts the three most significant bytes, then given two vertices **v**<sub>*i*</sub> = (*x*<sub>*i*</sub>, *y*<sub>*i*</sub>, *z*<sub>*i*</sub>) and **v**<sub>*j*</sub> = (*x*<sub>*j*</sub>, *y*<sub>*j*</sub>, *z*<sub>*j*</sub>), the c. h. *h*<sub>*i*,*j*</sub> is:

$$h\_{i,j} = H(\left[x\_i\right]\_3 \mid \left[y\_i\right]\_3 \mid \left[z\_i\right]\_3 \mid \left[x\_j\right]\_3 \mid \left[y\_j\right]\_3 \mid \left[z\_j\right]\_3), \tag{6}$$

where the symbol | means string concatenation.
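The construction of Equations (5) and (6) and of the 28-byte EU can be sketched with the Python standard library as follows. The function names, the representation of a polygon as a list of coordinate triples, and the big-endian packing of the floats (so that the three most significant bytes come first) are assumptions made for this example.

```python
import hashlib
import struct

def msb3(value):
    """Operator [.]_3: the three most significant bytes of a float32."""
    return struct.pack(">f", value)[:3]

def edge_hash(vi, vj):
    """Equation (6): MD5 of the concatenated truncated coordinates of two vertices."""
    data = b"".join(msb3(c) for c in vi) + b"".join(msb3(c) for c in vj)
    return hashlib.md5(data).digest()

def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def polygon_fingerprint(vertices):
    """H(q), where q is the XOR of the hashes of all consecutive vertex pairs."""
    q = bytes(16)
    n = len(vertices)
    for i in range(n):
        q = xor_bytes(q, edge_hash(vertices[i], vertices[(i + 1) % n]))
    return hashlib.md5(q).digest()

def combined_fingerprint(polygons):
    """Equation (5): XOR of the fingerprints of the polygons sharing a vertex."""
    F = bytes(16)
    for polygon in polygons:
        F = xor_bytes(F, polygon_fingerprint(polygon))
    return F

def embedding_unit(vertex, polygons):
    """28-byte EU: three float32 coordinates followed by the 16-byte fingerprint F."""
    coords = b"".join(struct.pack(">f", c) for c in vertex)
    return coords + combined_fingerprint(polygons)

# Hypothetical vertices and two triangles sharing vertex a.
a, b, c, d = (0.5, 1.0, 2.0), (1.5, -1.0, 0.25), (3.0, 3.5, -2.0), (0.0, 4.0, 1.0)
tri1, tri2 = [a, b, c], [a, c, d]
eu = embedding_unit(a, [tri1, tri2])
```

In this sketch, starting the perimeter walk from a different vertex or listing the polygons in a different order yields the same fingerprint, and so does any change confined to the least significant byte of a coordinate.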

**Figure 2.** The fields and structure of an Embedding Unit (EU).

Note that the algorithm can cope with more data related to each vertex, for example *uv* coordinates: it is sufficient to add them to the EU and modify the KLT basis size.

#### *5.1. Key Generation*

The key generator module has the objective to create a secret orthonormal basis to define a secret space of embedding. This space must be known to the embedder and to the extractor; thus, the key generator must provide the basis to both of them during their operations.

In principle, any secret orthonormal basis of the required dimension (in the present embodiment, 28) would suffice; nonetheless, the KLT provides a method to define a basis starting from a set of vectors, so we found that using a secret image to derive a set of vectors from which to compute a KLT basis was a flexible and viable solution.
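One possible way to derive such a basis is sketched below. How the vectors are extracted from the secret image is not specified here, so the choice of splitting the flattened image into consecutive, non-overlapping runs of 28 pixels is an assumption of this example, as are the function name `key_from_image` and the random array standing in for the secret image.

```python
import numpy as np

EU_SIZE = 28  # dimension of the embedding space (bytes in an EU)

def key_from_image(image, n=EU_SIZE):
    """Return (A, m): a KLT basis (eigenvectors as rows) and mean vector from image pixels."""
    pixels = np.asarray(image, dtype=float).ravel()
    r = pixels.size // n
    vectors = pixels[: r * n].reshape(r, n)        # assumed rule: consecutive runs of n pixels
    m = vectors.mean(axis=0)
    centered = vectors - m
    C = centered.T @ centered / r                  # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending eigenvalues, eigenvectors as columns
    order = np.argsort(eigvals)[::-1]              # descending eigenvalue order
    return eigvecs[:, order].T, m

# Stand-in for a secret grayscale image (a real deployment would load a private image).
rng = np.random.default_rng(2024)
secret_image = rng.integers(0, 256, size=(64, 64))

A, m = key_from_image(secret_image)
```

Both the embedder and the extractor would need the same `A` and `m`, so in this sketch sharing the secret image (and the extraction rule) is equivalent to sharing the key.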

#### *5.2. Watermark Embedding*

The watermark string *w* to be embedded into an EU is derived by the key generator module from the fixed part of the EU itself, i.e., the diagonal striped bytes in Figure 2. This string can be computed as the MD5 hash of the aforementioned data: from the resulting 128 bits, a predefined subset is extracted for embedding. Note that this step is not strictly necessary because the watermark is embedded into a secret space, as will be discussed in the following; thus, even a constant bit string *w* would suffice; nonetheless, a variable *w* further improves security.

A high level description of the algorithm is the following.

The fragile watermark is stored in the coefficients of every EU KLT: the 28 bytes of the EU are considered as 28 non-negative integer pixel values, which are KLT transformed, producing 28 coefficients. If *m* is the payload in bpv (bits-per-vertex), i.e., *m* is the length of the watermark *w* for every EU, then a subset of *m* KLT coefficients is chosen a priori to store the watermark bits. In the present implementation, a coefficient *c* is considered as carrying a watermark bit *b* in position *p* (the *p* value is a parameter of the algorithm) iff:

$$b = \text{round}(2^{-p}c) \bmod 2 \tag{7}$$

(see [24] for a deeper discussion on various methods for embedding the watermark bits into a set of coefficients).
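Equation (7) can be read as the following small function. The name `carried_bit` is hypothetical, and rounding half-way cases upward is an assumption, since the rounding rule is not specified in the text.

```python
import math

def carried_bit(c, p):
    """Bit carried by coefficient c in position p: round(2^(-p) * c) mod 2."""
    scaled = 2.0 ** (-p) * c
    return math.floor(scaled + 0.5) % 2   # assumed rule: round half up

# With p = -2 the coefficient is multiplied by 4 before rounding.
bit_a = carried_bit(3.3, -2)    # 13.2 rounds to 13
bit_b = carried_bit(3.4, -2)    # 13.6 rounds to 14
bit_c = carried_bit(-1.3, -2)   # -5.2 rounds to -5
```

A negative *p* therefore places the watermark bit in the fractional part of the coefficient, so that a small change of the coefficient is enough to flip it.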

The GA alters the values of the vertical striped bytes in Figure 2 until the watermark is embedded into the EU KLT coefficients, i.e., if **g** = (*g*<sub>1</sub>, *g*<sub>2</sub>, ... , *g*<sub>28</sub>) is the EU, the GA alters *g*<sub>4</sub>, *g*<sub>8</sub>, *g*<sub>12</sub> (the vertical striped bytes) so that the predefined coefficients in (*c*<sub>1</sub>, *c*<sub>2</sub>, ... , *c*<sub>28</sub>) = KLT(*g*<sub>1</sub>, *g*<sub>2</sub>, ... , *g*<sub>28</sub>) carry the respective *m*-bit watermark. The reason for using a GA in this task is its ability to find the solution of a non-linear problem (bytes representing integer values), looking for the (local) minima of the error w.r.t. the original data.

Referring to Equation (7), the present implementation of the algorithm considers the first *m* coefficients among the 28 transform coefficients and embeds into the bit in position *p* = −2.
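The embedding constraint for a single EU can be illustrated with the sketch below. Note that it replaces the GA with a brute-force search over a small window around the original byte values, which is enough to show the constraint but is not the search strategy of the proposed method. The random orthonormal basis, the constant mean vector, the sample EU, the window `radius`, and the half-up rounding in Equation (7) are all assumptions made for the example.

```python
import hashlib
import struct
import numpy as np

LSB_POS = [3, 7, 11]   # 0-based positions of g4, g8, g12

def carried_bits(units, A, mean, m_bits, p):
    """Equation (7) applied to the first m_bits KLT coefficients of each row of units."""
    coeffs = (units - mean) @ A[:m_bits].T
    return np.floor(2.0 ** (-p) * coeffs + 0.5).astype(np.int64) % 2

def extract_bits(eu, A, mean, m_bits, p=-2):
    g = np.frombuffer(eu, dtype=np.uint8).astype(float)
    return carried_bits(g[None, :], A, mean, m_bits, p)[0].tolist()

def embed(eu, A, mean, w, p=-2, radius=16):
    """Return a copy of eu whose three free bytes make the coefficients carry w."""
    g = np.frombuffer(eu, dtype=np.uint8).astype(float)
    d = np.arange(-radius, radius + 1)
    # All offset triples for (g4, g8, g12) inside the search window.
    grid = np.stack(np.meshgrid(d, d, d, indexing="ij"), axis=-1).reshape(-1, 3)
    cands = np.tile(g, (len(grid), 1))
    cands[:, LSB_POS] += grid
    lsb = cands[:, LSB_POS]
    in_range = np.all((lsb >= 0) & (lsb <= 255), axis=1)   # candidates must stay valid bytes
    match = np.all(carried_bits(cands, A, mean, len(w), p) == np.asarray(w), axis=1)
    idx = np.flatnonzero(in_range & match)
    if idx.size == 0:
        raise ValueError("no solution inside the search window")
    # Among the valid candidates, keep the one closest to the original bytes.
    best = idx[np.argmin(np.abs(grid[idx]).sum(axis=1))]
    return cands[best].astype(np.uint8).tobytes()

# Stand-ins for the secret key and for an EU (all values are hypothetical).
rng = np.random.default_rng(7)
A, _ = np.linalg.qr(rng.normal(size=(28, 28)))   # random orthonormal basis
mean = np.full(28, 127.5)
eu = (b"".join(struct.pack(">f", c) for c in (0.1234, -5.678, 42.42))
      + hashlib.md5(b"demo").digest())
w = [1, 0, 1, 1]                                  # m = 4 watermark bits

marked = embed(eu, A, mean, w)
```

Only the three least significant coordinate bytes differ between `eu` and `marked`, and `extract_bits(marked, A, mean, len(w))` recovers `w`; a GA becomes attractive when the payload grows and the window needed to find a solution makes exhaustive enumeration impractical.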

When all the EUs have been processed by the GA, the 3D model with the new modified coordinates contains the fragile watermark allowing for the integrity check by the modules presented in the following section.
