## *2.4. Convolution Layer*

Deep learning models work very efficiently in image processing and video analysis by extracting deep features through convolutional and pooling layers [24,25]. For images or videos, the input data is fed to the model directly because it already exhibits a matrix arrangement [26]. When working with protein sequences, we first prepare the data as a fixed-size matrix and then forward it to the convolution layer for processing, just as with images. In this work, the model contains three convolution layers, each followed by a max pooling layer, for deep feature extraction. In this layer, we use 3 × 8 filters to scan the protein Seq2 and obtain a new feature map, as shown in Equations (5) and (6).

$$\text{Filter} = \begin{bmatrix} 0.2 & 0.2 & -0.3 & 0.8 & 0.5 & 0.3 & 0.2 & -0.2 \\ 0.1 & 0.3 & -0.3 & 0.6 & 0.1 & 0.3 & -0.2 & 0.3 \\ 0.8 & -0.2 & 0.3 & -0.5 & 0.6 & 0.3 & 0.2 & 0.1 \end{bmatrix} \tag{5}$$

$$\text{Protein Seq3} = \text{Convolution(Seq2)} = \begin{bmatrix} 0.48 \\ 0.53 \\ 0.75 \\ 0.20 \\ 0.25 \\ 0.62 \\ 0.40 \end{bmatrix} \tag{6}$$
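As a minimal illustrative sketch, the convolution step of Equation (6) can be reproduced in NumPy. The input length of 9 positions and the random values below are our assumptions, since the paper does not state them; 9 is chosen only so that a 3 × 8 filter sliding with stride 1 yields a 7-value feature map like the one in Equation (6).

```python
import numpy as np

# Minimal sketch of the convolution step in Equation (6), assuming
# Seq2 is a 9 x 8 matrix (9 sequence positions x 8 features per
# position). A 3 x 8 filter slides along the sequence axis with
# stride 1, giving 9 - 3 + 1 = 7 output values.
rng = np.random.default_rng(0)
seq2 = rng.random((9, 8))          # placeholder protein Seq2 matrix
filt = rng.random((3, 8)) - 0.5    # placeholder 3 x 8 filter weights

def convolve(seq: np.ndarray, f: np.ndarray) -> np.ndarray:
    """At each position, take the elementwise product of the filter
    with the covered 3 x 8 window and sum it to one feature value."""
    steps = seq.shape[0] - f.shape[0] + 1
    return np.array([np.sum(seq[i:i + f.shape[0]] * f) for i in range(steps)])

seq3 = convolve(seq2, filt)
print(seq3.shape)  # (7,) -- one value per filter position, as in Eq. (6)
```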

In the max pooling layer, a sliding window takes the highest of every two values, as shown in Equation (7).

$$\text{Protein Seq4} = \text{Max Pooling(Seq3)} = \begin{bmatrix} 0.65 \\ 0.53 \\ 0.48 \\ 0.62 \end{bmatrix} \tag{7}$$
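A corresponding sketch of the pooling step is shown below; the window size of 2 and stride of 2 are assumptions consistent with a 7-value input reducing to 4 pooled values, with the trailing lone element kept as-is.

```python
import numpy as np

# Minimal sketch of the max pooling step in Equation (7): a window of
# size 2 slides over Seq3 with stride 2 and keeps the larger value;
# the trailing lone element is kept as-is, reducing 7 values to 4.
seq3 = np.array([0.48, 0.53, 0.75, 0.20, 0.25, 0.62, 0.40])

def max_pool(x: np.ndarray, window: int = 2) -> np.ndarray:
    return np.array([x[i:i + window].max() for i in range(0, len(x), window)])

seq4 = max_pool(seq3)
print(seq4)  # 4 pooled values, one per window
```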

## *2.5. MBD-LSTM Layer*

RNNs are employed for problems involving short-term memory over sequences, mostly in cases where a long sequence must be handled and stored for both forward and backward steps. As this procedure continues, an RNN may drop crucial information from the beginning of the sequence. Moreover, during backpropagation a vanishing gradient issue is encountered, which makes it hard to memorize long-term changes in the sequence [27,28]. Throughout the propagation process, the neural network weights are updated and shrink because of the gradient. These extremely small weights do not contribute to the learning process in an RNN, and layers stop learning once they receive such a small gradient. In this situation the RNN loses the capability to store modifications observed earlier in a long sequence. LSTM, a special recurrent neural network architecture, provides a solution by incorporating a short-term memory unit. LSTM is built around memory cells and gates that regulate how information is processed and stored, and that decide when to update and when to forget the hidden states of the network [29]. The internal structure of LSTM contains the memory cell state $S_{ST-1}$. These cells relate directly to $H_{ST-1}$, the intermediate output state, and the successive input $X_{ST}$ controls how the internal state vector is updated. The LSTM structure has three gates, the input gate $I_{ST}$, the forget gate $F_{ST}$, and the output gate $O_{ST}$, together with the candidate state $N_{ST}$. The mathematical notation of these gates is as follows.

$$F_{ST} = \sigma(W_{FX}X_{ST} + W_{FH}H_{ST-1} + B_F) \tag{8}$$

$$I_{ST} = \sigma(W_{IX}X_{ST} + W_{IH}H_{ST-1} + B_I) \tag{9}$$

$$N_{ST} = \phi(W_{NX}X_{ST} + W_{NH}H_{ST-1} + B_N) \tag{10}$$

$$O_{ST} = \sigma(W_{OX}X_{ST} + W_{OH}H_{ST-1} + B_O) \tag{11}$$

$$S_{ST} = I_{ST}\,\theta\,N_{ST} + F_{ST}\,\theta\,S_{ST-1} \tag{12}$$

$$H_{ST} = \phi(S_{ST})\,\theta\,O_{ST} \tag{13}$$
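To make Equations (8)–(13) concrete, here is a minimal NumPy sketch of a single LSTM time step. The toy dimensions and the random weight initialization are illustrative assumptions; in the actual model these weights are learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_st, h_prev, s_prev, W, B):
    """One LSTM time step following Equations (8)-(13). W and B are
    dicts of hypothetical weights; theta (elementwise product) is
    '*' in NumPy."""
    f = sigmoid(W["FX"] @ x_st + W["FH"] @ h_prev + B["F"])   # Eq. (8)
    i = sigmoid(W["IX"] @ x_st + W["IH"] @ h_prev + B["I"])   # Eq. (9)
    n = np.tanh(W["NX"] @ x_st + W["NH"] @ h_prev + B["N"])   # Eq. (10)
    o = sigmoid(W["OX"] @ x_st + W["OH"] @ h_prev + B["O"])   # Eq. (11)
    s = i * n + f * s_prev                                    # Eq. (12)
    h = np.tanh(s) * o                                        # Eq. (13)
    return h, s

# Toy dimensions: 8 input features, 4 hidden units (assumed).
rng = np.random.default_rng(1)
d_in, d_h = 8, 4
W = {k: rng.standard_normal((d_h, d_in if k.endswith("X") else d_h)) * 0.1
     for k in ("FX", "FH", "IX", "IH", "NX", "NH", "OX", "OH")}
B = {k: np.zeros(d_h) for k in ("F", "I", "N", "O")}
h, s = lstm_step(rng.random(d_in), np.zeros(d_h), np.zeros(d_h), W, B)
```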

In Equations (8)–(13), the network input weight matrices are represented by $W_{FX}$, $W_{FH}$, $W_{IX}$, $W_{IH}$, $W_{NX}$, $W_{NH}$, $W_{OX}$, and $W_{OH}$. Here, θ denotes elementwise multiplication, and the two activation functions, sigmoid and tanh, are represented by σ and φ, respectively. A single time step of the LSTM architecture is shown in Figure 3a. In this article, we evaluated the performance of MBD-LSTM for protein sequence identification. The idea of MBD-LSTM is developed from the traditional bidirectional RNN [30], which likewise processes the hidden layer input sequence data in both the forward and backward directions. MBD-LSTM has achieved significant results in speech recognition [31], summarization [32], classification, energy consumption prediction [33], and text generation. The structure of MBD-LSTM consists of a forward and a backward layer, as shown in Figure 3b. The output of the forward layer $\overrightarrow{H}_T$ is computed from the inputs from $T-n$ to $T-1$, while the output of the backward layer $\overleftarrow{H}_T$ is generated from the reversed inputs from $T-n$ to $T-1$. Finally, the MBD-LSTM generates the output vector $O_T$, as illustrated in Equation (14).

$$O_T = \sigma\left(\overrightarrow{H}_T, \overleftarrow{H}_T\right) \tag{14}$$

**Figure 3.** (**a**) Internal architecture of the LSTM, comprising multiple gates along with LSTM cells that perform the various operations, permitting the gates to store and omit relevant information; (**b**) MBD-LSTM, which takes the input data sequence and processes it in both the forward and backward directions.

In Equation (14), σ combines the output sequences of the two layers; it is also known as the summation function.
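As an illustration of this forward/backward combination, a bidirectional LSTM with summed outputs can be sketched with Keras. This is our own minimal example, not the authors' exact configuration, and the layer width of 64 units is an assumption.

```python
import tensorflow as tf

# Minimal sketch of a bidirectional LSTM layer: Bidirectional runs one
# LSTM forward and one backward over the sequence, and merge_mode="sum"
# adds the two output sequences elementwise, matching the summation
# combination of Equation (14).
bilstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True),
    merge_mode="sum",
)

# Example: a batch of 2 sequences, each 7 steps long with 8 features,
# e.g. pooled feature maps coming out of the convolution block.
x = tf.random.normal((2, 7, 8))
y = bilstm(x)
print(y.shape)  # (2, 7, 64) -- forward and backward outputs summed
```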

#### **3. Results and Discussion**

In this section, we present an in-depth analysis of comprehensive experiments performed on three protein sequence datasets, together with a detailed comparative study of the proposed model against state-of-the-art techniques.
