*4.1. Framework of SSAE*

The SSAE is a generative model consisting of four modules: an encoder, a decoder, a discriminator and a classifier. The encoder and decoder are coupled as an autoencoder, so they can learn general features from all samples, both labeled and unlabeled. Because it is a generative model, the SSAE can effectively avoid overfitting. The discriminator regularizes the autoencoder with a specified arbitrary prior: it judges whether the encoding distribution of *X* is the same as the prior. This idea is borrowed from [27]. The classifier is designed to select features from the latent vector and to classify samples as normal or abnormal. The architecture of the SSAE is shown in Figure 5.

The autoencoder attempts to minimize the reconstruction error. The encoder defines an aggregated posterior distribution *q*(*Z*) over the latent vectors as follows:

$$q(Z) = \int_{X} E(Z|X) \, p_d(X) \, dX \tag{12}$$

where *E*(*Z*|*X*) is the encoding distribution and *pd*(*X*) is the data distribution. Meanwhile, the encoder tries to ensure that the aggregated posterior distribution *q*(*Z*) fools the discriminator into thinking that the latent vectors come from the true prior distribution *p*(*Z*).
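Concretely, with a deterministic encoder as used here, this regularization can be written, following the adversarial autoencoder formulation in [27], as the minimax game

$$\min_{E}\max_{D}\; \mathbb{E}_{Z\sim p(Z)}\big[\log D(Z)\big] \;+\; \mathbb{E}_{X\sim p_d(X)}\big[\log\big(1 - D(E(X))\big)\big]$$

where *D*(·) denotes the probability the discriminator assigns to its input being a sample from the prior *p*(*Z*).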

**Figure 5.** Framework of Semi-Supervised Deep Neural Network.

Figure 6 presents the detailed architecture of the proposed network. Because the samples are modeled as multi-channel 1D vectors, the encoder is built from 1D convolutional (Conv1D) layers. Specifically, the encoder contains four Conv1D layers; each convolutional layer has 64 or 128 filters, with a kernel size of 5 and a stride of 2. The last layer of the encoder is a fully connected layer without activation, whose output dimension equals the dimension of the latent space. In all experiments, a latent dimension of 50 gives the best results. The decoder contains three fully connected layers followed by a reshape layer to reconstruct the samples. The output of the third fully connected layer is activated by a sigmoid function, which matches the normalization of the samples.
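A minimal PyTorch sketch of this encoder/decoder pair is given below. The input shape (7 channels × 96 time steps), the ReLU activations in the convolutional stack, and the widths of the decoder's hidden layers are assumptions for illustration; only the four Conv1D layers (64/128 filters, kernel 5, stride 2), the unactivated 50-dimensional latent projection, and the sigmoid-plus-reshape output follow directly from the description above.

```python
import torch
import torch.nn as nn

LATENT_DIM = 50   # latent dimension reported as best in the experiments
N_CHANNELS = 7    # assumed number of input channels
N_STEPS = 96      # assumed length of each 1D channel

class Encoder(nn.Module):
    """Four Conv1D layers (64 or 128 filters, kernel 5, stride 2) + linear projection."""
    def __init__(self, n_channels=N_CHANNELS, latent_dim=LATENT_DIM):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.flatten = nn.Flatten()
        # Final fully connected layer without activation -> latent vector.
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, x):  # x: (batch, n_channels, n_steps)
        return self.fc(self.flatten(self.conv(x)))

class Decoder(nn.Module):
    """Three fully connected layers; sigmoid output reshaped back to the sample shape."""
    def __init__(self, latent_dim=LATENT_DIM, n_channels=N_CHANNELS, n_steps=N_STEPS):
        super().__init__()
        self.n_channels, self.n_steps = n_channels, n_steps
        self.fc = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),   # hidden widths are assumptions
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, n_channels * n_steps), nn.Sigmoid(),  # matches [0, 1] normalization
        )

    def forward(self, z):
        x_hat = self.fc(z)
        return x_hat.view(-1, self.n_channels, self.n_steps)    # reshape layer

if __name__ == "__main__":
    x = torch.rand(8, N_CHANNELS, N_STEPS)
    z = Encoder()(x)
    x_hat = Decoder()(z)
    print(z.shape, x_hat.shape)  # torch.Size([8, 50]) torch.Size([8, 7, 96])
```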

The discriminator also has three fully connected layers, whose parameters are the same as those of the decoder's fully connected layers. The difference is that the discriminator handles not only the latent vectors but also samples drawn from *N*(*Z*|0, *I*), called *Zreal*. The discriminator thus acts as a function that measures the *similarity* between the latent vectors and *Zreal*.
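The sketch below, under the same assumptions as before (layer widths mirroring the decoder's fully connected layers, which are not specified exactly), shows one way to realize this: a three-layer discriminator, samples *Zreal* drawn from *N*(0, *I*), and a GAN-style loss in the latent space in which the encoder is trained to fool the discriminator.

```python
import torch
import torch.nn as nn

LATENT_DIM = 50

class Discriminator(nn.Module):
    """Three fully connected layers scoring whether a latent vector looks like Z_real ~ N(0, I).
    Layer widths mirror the decoder's fully connected layers (exact sizes are assumed here)."""
    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 1),   # single logit: "drawn from the prior" vs. "encoded"
        )

    def forward(self, z):
        return self.net(z)

def adversarial_losses(discriminator, z_fake):
    """One way to realize the regularization: a standard GAN loss in latent space.
    z_fake are encoder outputs; z_real is drawn from the prior N(0, I)."""
    bce = nn.BCEWithLogitsLoss()
    z_real = torch.randn_like(z_fake)  # Z_real ~ N(0, I)
    d_loss = bce(discriminator(z_real), torch.ones(len(z_real), 1)) + \
             bce(discriminator(z_fake.detach()), torch.zeros(len(z_fake), 1))
    # Encoder loss (non-saturating form): try to make z_fake look like prior samples.
    g_loss = bce(discriminator(z_fake), torch.ones(len(z_fake), 1))
    return d_loss, g_loss

if __name__ == "__main__":
    d = Discriminator()
    z_fake = torch.randn(8, LATENT_DIM)  # stand-in for encoder outputs
    d_loss, g_loss = adversarial_losses(d, z_fake)
    print(float(d_loss), float(g_loss))
```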

**Figure 6.** The detail architecture of the SSAE modules.

The classifier contains only two fully connected layers and a dropout layer. The last layer is activated by Softmax, even though there are only two categories. The first fully connected layer raises the dimension of the latent features, because the customers' SM data differ from one another, and increasing the dimension of the latent features improves their linear separability. The dropout layer is used to avoid overfitting. The second fully connected layer finds a hyperplane between the categories to complete the classification. In fact, the classifier in the SSAE is similar to an SVM. However, an SVM cannot replace these two fully connected layers, because an SVM cannot be co-trained with the autoencoder. Separate training reduces learning efficiency; for example, [23] could not achieve satisfactory NTL detection performance.
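A corresponding sketch of the classifier is given below; the widened hidden dimension (200) and the dropout rate (0.5) are assumptions, while the two fully connected layers, the dropout layer, and the Softmax output follow the description above.

```python
import torch
import torch.nn as nn

LATENT_DIM = 50

class Classifier(nn.Module):
    """Two fully connected layers with dropout; the first widens the latent features
    to improve linear separability, the second separates normal from abnormal."""
    def __init__(self, latent_dim=LATENT_DIM, hidden_dim=200, n_classes=2, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),  # ascend the latent dimension
            nn.Dropout(p_drop),                            # guard against overfitting
            nn.Linear(hidden_dim, n_classes),
            nn.Softmax(dim=1),                             # Softmax even for two classes
        )

    def forward(self, z):
        return self.net(z)

if __name__ == "__main__":
    z = torch.randn(8, LATENT_DIM)  # stand-in for encoder outputs
    probs = Classifier()(z)
    # Because the classifier takes the encoder's latent vectors as input, its
    # classification loss on labeled samples can be back-propagated jointly with the
    # reconstruction and adversarial losses, which is the co-training a separate SVM
    # cannot provide.
    print(probs.shape)  # torch.Size([8, 2])
```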
