*2.3. Embedding Layer*

Encoding each word manually is a very difficult task. The embedding layer gives us an automatic and efficient way of representing words or documents in which related words have a similar encoding [16]. This is done by simply multiplying the one-hot vector from the left with a weight matrix $\mathbf{W} \in \mathbb{R}^{d \times |V|}$, where $|V|$ represents the number of primary symbols in the vocabulary, as shown in Equation (3).

$$\mathbf{v}\_t = \mathbf{W}\mathbf{x}\_t \tag{3}$$
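
Concretely, Equation (3) amounts to selecting one column of $\mathbf{W}$. The following NumPy sketch illustrates this, assuming an 8-dimensional embedding, a 20-symbol amino-acid vocabulary, and randomly initialized weights; these dimensions and values are illustrative assumptions, not the learned weights of this work.

```python
import numpy as np

d, vocab_size = 8, 20                 # assumed: 8-dim embeddings, 20 amino-acid symbols
rng = np.random.default_rng(0)
W = rng.normal(size=(d, vocab_size))  # illustrative weight matrix W in R^(d x |V|)

t = 3                                 # index of some amino-acid symbol in the vocabulary
x_t = np.zeros(vocab_size)
x_t[t] = 1.0                          # one-hot vector for that symbol

v_t = W @ x_t                         # Equation (3): v_t = W x_t
assert np.allclose(v_t, W[:, t])      # the product simply selects column t of W
```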

As a result, the input sequence of amino acids becomes a sequence of real-valued vectors $\mathbf{v}\_t$ ($t = 1, 2, 3, \ldots, n$). In the embedding layer, assume that the output dense vector length is 8, so that each amino-acid symbol maps to a vector of fixed length. After passing through this layer, a protein sequence of eight amino acids becomes an 8×8 matrix, as shown in Equation (4). For example, we may represent the amino acid Threonine with [0.5, −0.8, 0.7, 0.4, 0.3, −0.5, −0.7, 0.8] and Methionine with [0.4, −0.4, 0.5, 0.6, 0.2, −0.1, −0.3, 0.2].

$$\text{ProteinSeq2} = \begin{pmatrix}
0.1 & -0.4 & 0.1 & 0.2 & 0.6 & 0.4 & -0.1 & 0.1\\
0.4 & -0.4 & 0.5 & 0.6 & 0.2 & -0.1 & -0.3 & 0.2\\
0.2 & -0.2 & 0.6 & 0.7 & -0.1 & 0.1 & -0.2 & 0.1\\
0.5 & -0.2 & 0.1 & 0.6 & 0.2 & -0.6 & -0.2 & 0.9\\
0.4 & -0.4 & 0.5 & 0.6 & 0.2 & -0.1 & -0.3 & 0.2\\
0.8 & -0.5 & 0.4 & 0.7 & 0.5 & -0.2 & -0.5 & 0.3\\
0.9 & -0.6 & 0.7 & 0.8 & 0.2 & -0.1 & -0.2 & 0.7\\
0.5 & -0.8 & 0.7 & 0.4 & 0.3 & -0.5 & -0.7 & 0.8
\end{pmatrix} \tag{4}$$
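
Stacking $\mathbf{W}\mathbf{x}\_t$ for every position $t$ yields an $n \times d$ matrix, which for $n = 8$ and $d = 8$ gives an 8×8 matrix like the one in Equation (4). A minimal sketch of this sequence-level lookup follows; the integer indices encode a hypothetical 8-residue protein, and the weights are again illustrative assumptions rather than values from this work. Note that, as in Equation (4), a repeated residue receives an identical embedding row.

```python
import numpy as np

d, vocab_size = 8, 20                 # assumed dimensions, as in the sketch above
rng = np.random.default_rng(0)
W = rng.normal(size=(d, vocab_size))  # illustrative weight matrix, not learned values

# Hypothetical 8-residue protein encoded as integer symbol indices;
# positions 2 and 5 (indices 1 and 4) hold the same residue, e.g., Methionine.
seq = np.array([7, 12, 2, 5, 12, 9, 14, 16])

# Gathering the corresponding columns of W and transposing stacks one
# d-dimensional vector per residue, giving an n x d (here 8 x 8) matrix.
protein_matrix = W[:, seq].T
print(protein_matrix.shape)                               # (8, 8)
assert np.allclose(protein_matrix[1], protein_matrix[4])  # repeated residues share a row
```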
