1. Introduction
Anomaly detection refers to the identification of data points in a dataset that deviate from normal behavior; such deviations are known as anomalies or outliers across application domains. Anomaly detection is extensively used in real-world settings, including credit card fraud detection, insurance fraud detection, cybersecurity intrusion detection, error detection in security systems, and military activity monitoring [1,2]. However, acquiring anomaly data in practical applications, such as medical diagnostics, machine malfunction detection, and circuit quality inspection, is expensive [3,4]. Consequently, there is significant interest in one-class classification (OCC) problems, where the training samples include only normal data (also known as target data) or normal data with a small number of anomalies (also referred to as non-target data) [5,6,7]. In this context, it is important to define the following terms:
Normal data: Normal data refers to data points that conform to the characteristics and behavior patterns of the majority of data points within a dataset. They represent the normal operating state of a system or process.
Anomalies: Anomalies are data points that significantly deviate from normal patterns and usually reflect actual problems or critical events in the system.
Outliers: Outliers are data points that are significantly different from other data points in the dataset, which may be due to natural fluctuations, special circumstances, or noise.
Noise: Noise refers to irregular, random errors or fluctuations, usually caused by measurement errors or data entry mistakes, and does not reflect the actual state of the system.
Support vector data description (SVDD) is a method extensively used for one-class classification [8]. The core idea of SVDD is to construct a hyper-sphere of minimal volume that encompasses all (or most) of the target class samples. Data points inside the hyper-sphere are considered normal, while those outside are considered anomalies. SVDD can be easily integrated with popular kernel methods or with deep neural network models [7], making it highly scalable and flexible. Owing to these attractive features, SVDD has garnered significant attention and has been extensively developed. SVDD is regarded as an effective technique for anomaly detection; however, it remains sensitive to outliers and noise in training datasets. In real-world scenarios, issues such as instrument failure, formatting errors, and unrepresentative sampling produce datasets contaminated with anomalies, which degrade the performance of SVDD [9,10].
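To make the hyper-sphere decision rule concrete, the following minimal sketch labels points by their squared distance to a given center; the function name and the assumption that a center and radius have already been fitted are illustrative:

```python
import numpy as np

def svdd_predict(X, center, radius):
    """SVDD decision rule: points whose squared distance to the center
    is at most radius**2 lie inside the hyper-sphere and are labeled
    normal (+1); all other points are labeled anomalous (-1)."""
    sq_dist = np.sum((X - center) ** 2, axis=1)
    return np.where(sq_dist <= radius ** 2, 1, -1)
```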
The existing methods for mitigating the impact of noise are typically categorized as follows:
In order to reduce the impact of outliers on OCC methods, researchers have attempted to remove outliers through data preprocessing. Fong used methods such as cluster analysis to remove anomalies from the training set and thereby obtain a robust classifier [11]. Breunig et al. assigned an outlier score to each sample in the dataset by estimating its local density, a method known as the local outlier factor (LOF) [12]. Zheng et al. used the LOF to filter raw samples and remove outliers [13]. Khan et al. [14] and Andreou and Karathanassi [15] calculated the interquartile range (IQR) of the training samples, which provides a boundary beyond which samples are marked as outliers and removed. Clustering methods (k-means, DBSCAN) and the LOF can significantly reduce noise points during preprocessing, thereby improving the quality and effectiveness of subsequent model training. For datasets with obvious noise and clear distribution characteristics, preprocessing can be very effective in enhancing model performance. However, preprocessing has several drawbacks: it requires additional computational resources and time, especially with large datasets, potentially making the preprocessing step time-consuming; it risks overfitting and inadvertently deleting useful normal data points, impairing the model’s ability to accurately detect anomalies; and it is sensitive to parameter settings, necessitating careful selection to achieve satisfactory results. A sketch of the LOF and IQR filters is given below.
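As a rough sketch of these two filters (assuming scikit-learn for the LOF; the function names, neighborhood size, and contamination level are illustrative choices, not the settings of [12,13,14,15]):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def filter_lof(X, n_neighbors=20, contamination=0.05):
    """Drop the samples that LOF flags as outliers (label -1)."""
    labels = LocalOutlierFactor(n_neighbors=n_neighbors,
                                contamination=contamination).fit_predict(X)
    return X[labels == 1]

def filter_iqr(x, k=1.5):
    """Keep 1-D samples inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]
```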
Zhao et al. proposed dynamic radius SVDD methods that account for hyper-sphere radius information and the existing data distribution [16,17]. These approaches achieve a more flexible decision boundary by assigning different radii to different samples. However, the dynamic radius SVDD framework is built on the traditional SVDD framework; consequently, if the underlying SVDD is not adequately trained, the good performance of these dynamic approaches is difficult to guarantee.
Density-weighted SVDD, position-weighted SVDD, Stahel–Donoho outlier-weighted SVDD, global plus local joint-weighted SVDD, and confidence-weighted SVDD are examples of weighted methods [18,19,20,21,22,23,24]. These methods assign smaller weights to sparse data points, which are commonly outliers, thus excluding them from the sphere. Weighted methods balance the target class data and the outliers during training, enhancing classification performance, especially when the data are contaminated by outliers. However, when the number of outliers in the dataset increases and they form clusters, the outliers may locally outnumber the normal samples. In such cases, weighted methods assign higher weights to the outliers and lower weights to the normal samples, leading to decreased algorithm performance. A minimal density-weighting sketch follows.
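The published weighting schemes differ in detail, but the shared idea can be sketched with a simple k-nearest-neighbor density proxy; the exponential form and the function name below are illustrative choices rather than the weights of [18,19,20,21,22,23,24]:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_weights(X, k=10):
    """Weight each sample by a decreasing function of its mean distance
    to its k nearest neighbors, so points in sparse regions (likely
    outliers) receive small weights."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)        # column 0 is the point itself
    mean_dist = dist[:, 1:].mean(axis=1)
    return np.exp(-mean_dist / mean_dist.mean())
```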
The convex, unbounded nature of the hinge loss function makes the SVDD algorithm sensitive to outliers. To address this issue, Xing et al. proposed a robust least squares one-class support vector machine (OCSVM) that employs a bounded, non-convex entropic loss function instead of the unbounded convex quadratic loss used in the traditional least squares OCSVM [25]. The non-convexity of the ramp loss function likewise yields models that are more robust than the traditional OCSVM [26]; Tian et al. introduced the ramp loss into the traditional OCSVM to create the Ramp-OCSVM model [27]. Xing et al. also enhanced the robustness of the OCSVM by introducing a re-scaled hinge loss function [28]. Additionally, Zhong et al. proposed a robust SVDD method, called pinball loss SVDD [29], for OCC tasks on data contaminated by outliers; the pinball loss function ensures minimal dispersion around the center of the sphere, producing a tighter decision boundary. Recently, Zheng introduced a mixed exponential loss function into the SVDD model, enhancing its robustness and simplifying its implementation [30]. The contrast between an unbounded hinge loss and a bounded ramp loss is sketched below.
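The effect of bounding a loss can be seen numerically: the hinge loss grows linearly with the boundary violation, while a ramp-style loss saturates, so a single extreme outlier contributes only a constant. A minimal sketch, with the clipping level s = 2 chosen arbitrarily:

```python
import numpy as np

def hinge(u):
    """Unbounded convex hinge loss: grows linearly with the violation."""
    return np.maximum(0.0, u)

def ramp(u, s=2.0):
    """Ramp loss: the hinge loss clipped at s, bounding each sample's
    contribution to the objective."""
    return np.minimum(np.maximum(0.0, u), s)

violations = np.array([0.5, 1.0, 10.0, 100.0])  # e.g., ||x - a||^2 - R^2
print(hinge(violations))  # [  0.5   1.   10.  100. ]
print(ramp(violations))   # [ 0.5  1.   2.   2. ]
```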
Extensive research has shown that, because unbounded convex loss functions are sensitive to anomalies, loss functions that are bounded or have bounded influence functions are more robust to outliers. Researchers therefore introduced an upper limit on unbounded loss functions, preventing them from growing beyond a fixed point. The resulting truncated loss function makes the SVDD model more robust. The advantages of truncated loss functions include:
Robustness to noise: Truncated loss functions can enhance the model’s robustness and stability by limiting the impact of outliers without removing data points.
Reduction of error propagation: In anomaly detection tasks, outliers may significantly contribute to the loss function, leading to error propagation and model instability. Truncated loss functions can effectively reduce error propagation caused by outliers, thereby improving overall model performance.
Generalization ability: Using truncated loss functions can prevent the model from overfitting to outliers, enhancing the model’s generalization ability. Truncated loss functions are well-suited for various types of datasets and noise conditions, particularly when noise is not obvious or easily detectable.
However, robust SVDD algorithms still face considerable challenges. Existing methods are designed around specific loss functions and lack a general framework for constructing robust losses, so practitioners must learn different algorithms and modify each loss function before use. Since truncated loss functions are often non-differentiable, methods such as the difference of convex algorithm (DCA) [31] and the concave–convex procedure (CCCP) [32,33] are commonly employed to solve the resulting problems. For some truncated loss functions, the DCA cannot guarantee a straightforward decomposition or the direct use of off-the-shelf convex toolboxes, potentially increasing development and maintenance costs [34]; a schematic DCA iteration is sketched below. At present, the literature offers neither a unified framework for designing robust loss functions nor a unified optimization algorithm. Therefore, although challenging, providing a new bounded-loss strategy for the SVDD model is crucial and has the potential to yield more efficient and universally applicable solutions.
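To illustrate how the DCA/CCCP handles such losses, the sketch below writes a truncated quadratic loss min(u^2, ε) as the difference of convex functions g(u) = u^2 and h(u) = max(u^2 − ε, 0), then alternates between linearizing h and minimizing the convex surrogate by gradient descent; step sizes and iteration counts are arbitrary illustrative choices:

```python
import numpy as np

def dca(g_grad, h_grad, x0, outer=30, inner=200, lr=0.05):
    """Difference-of-convex algorithm for min_x g(x) - h(x):
    linearize the concave part -h at the current iterate, then solve
    the convex surrogate min_x g(x) - h_grad(x_k) * x approximately."""
    x = np.asarray(x0, dtype=float)
    for _ in range(outer):
        slope = h_grad(x)                 # fixed during the inner solve
        for _ in range(inner):
            x = x - lr * (g_grad(x) - slope)
    return x

# Truncated quadratic loss min(u^2, 2.0) as g - h:
g_grad = lambda u: 2.0 * u
h_grad = lambda u: np.where(u ** 2 > 2.0, 2.0 * u, 0.0)
print(dca(g_grad, h_grad, x0=[0.5]))  # -> approx. [0.], the global minimum
```

Started in the truncated region (e.g., x0 = [3.0]), the same iteration stays put: the DCA only guarantees a stationary point of the non-convex objective, which is one aspect of the difficulties noted above.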
In response to the issues outlined above, this study proposes a universal framework for truncated loss functions in the SVDD model. To solve the non-differentiable, non-convex optimization problem introduced by the truncated loss function, we employ a fast ADMM algorithm. Our contributions are as follows:
We define a universal truncated loss framework that smoothly and adaptively bounds loss functions while preserving their symmetry and sparsity.
To solve different truncated loss functions, we propose the use of a unified proximal operator algorithm.
We introduce a fast ADMM algorithm to handle any truncated loss function within a unified scheme.
We implement the proposed robust SVDD model for various datasets with different noise intensities. The experimental results for real datasets show that the proposed model exhibits superior resistance to outliers and noise compared to more traditional methods.
The remainder of this paper is organized as follows:
Section 2: We review related SVDD models, providing a foundational understanding of the existing methodologies and their limitations.
Section 3: We propose a general framework for truncated loss functions. Within this framework, we examine the representative loss functions’ proximal operators and present a universal algorithm for solving these proximal operators.
Section 4: This section introduces the SVDD model that utilizes the truncated loss function, detailing its structure and theoretical framework.
Section 5: A new algorithm for solving the SVDD model with truncated loss functions is presented. This section also includes an analysis of the algorithm’s convergence properties, ensuring that the method is both robust and reliable.
Section 6: Numerical experiments and parameter analysis are conducted to validate the effectiveness of the proposed model. This section provides empirical evidence of the model’s performance across various datasets and noise scenarios.
Section 7: The conclusion summarizes the findings and contributions of the study, and discusses potential future research directions.
3. Truncated Loss Function
SVDD models with unbounded loss functions can achieve satisfactory results in noise-free scenarios. However, because these loss functions grow without bound, the model collapses when subjected to noise. Truncating the SVDD model’s loss function therefore makes it more robust. The general definition of a truncated loss function is as follows:
$$L_{\epsilon}(u)=\min\{L(u),\,\epsilon\}, \qquad (15)$$
where $\epsilon>0$ is a constant and $L(u)$ is an unbounded loss function, so that $L_{\epsilon}(u)=\epsilon$ whenever $L(u)\geq\epsilon$. Since $L(u)$ is an abstract function, this general form of the truncated loss function covers several loss functions. The three specific truncated loss functions constructed in our study are presented as follows:
Truncated generalized ramp loss function;
Truncated binary cross entropy loss function;
Truncated linear exponential loss function.
Assuming the truncation point is fixed, the mathematical properties of the three truncated loss functions presented above can be summarized as follows:
For samples within the decision boundary, the loss value is 0; for samples beyond the truncation point, the loss value is the constant $\epsilon$. Thus, the general truncated loss function exhibits sparsity and robustness to outliers.
Two of the three losses are truncated concave loss functions, which are non-differentiable at the truncation point; the third is a truncated convex loss function, non-differentiable at the truncation point but differentiable elsewhere.
Two of the losses admit explicit expressions for their proximal operators, while the third does not; their common truncation structure is illustrated in the sketch below.
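Under the min-form definition of Formula (15), all three losses share the same capping construction, as the following sketch shows; the hinge base loss and the level ε = 1.5 are illustrative stand-ins for the paper’s base losses:

```python
import numpy as np

def truncated(loss, eps):
    """Cap an unbounded base loss at eps: inside the boundary the loss
    (and its gradient) is zero, giving sparsity; beyond the truncation
    point it is the constant eps, bounding the influence of outliers."""
    return lambda u: np.minimum(loss(u), eps)

hinge = lambda u: np.maximum(0.0, u)
t_hinge = truncated(hinge, eps=1.5)
u = np.array([-1.0, 0.5, 5.0, 50.0])
print(t_hinge(u))  # [0.   0.5  1.5  1.5]
```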
In the following subsection, we provide explicit expressions for the proximal operators of the two losses that admit them.
3.1. Proximal Operators of Truncated Loss Functions
Definition 1 (Proximal Operator [35]). Assume $L:\mathbb{R}\to\mathbb{R}$ is a proper lower-semi-continuous loss function. The proximal operator of $L$ at $v$ is defined as follows:
$$\operatorname{prox}_{\sigma L}(v)=\mathop{\arg\min}_{u}\Big\{\sigma L(u)+\tfrac{1}{2}(u-v)^{2}\Big\}. \qquad (16)$$
When $L$ is a convex loss function, the proximal operator is single-valued; when $L$ is a non-convex loss function, the proximal operator can be multi-valued.
Lemma 1 ([36]). When , let , . The expressions for the proximal operators are as follows:
Lemma 2. The explicit expression of the proximal operator is as follows:
- 1. When , the explicit expression of the proximal operator is given by Equation (18).
- 2. When , the explicit expression of the proximal operator is given by Equation (19).
Proof of Lemma 2. Equation (16) shows that
is a local minimum of the following piecewise function:
The minima of the piecewise functions , , , , and are located at , , , , and , with minimum values of , , , , and , respectively.
Since , it follows that . If , then . When and is in the interval , .
When , the following conclusion can be reached by comparing the values of , , , , and .
- (1.1)
Since , we obtain , which means .
- (1.2)
Since , we obtain , which means or .
- (1.3)
Since , we obtain , which means .
- (1.4)
Since , we obtain , which means .
- (1.5)
Since , we obtain , which means .
According to (1.1)–(1.5), Equation (18) can be derived.
When , the following conclusion can be reached by comparing the values of , , , , and .
- (2.1)
As , we obtain , which means .
- (2.2)
As , we obtain , which means either or .
- (2.3)
As , we obtain , which means .
- (2.4)
As , we obtain , which means .
According to (2.1)–(2.4), Equation (19) can be derived. □
Lemma 3. The expression for the proximal operator of the truncated loss function is given by Equation (20), where , , , and represent the minimizers of the piecewise function, and , , , and represent the minimal values of the piecewise function.
Proof of Lemma 3. It can be deduced from Equations (15) and (16) that
represents the local minimum of the following piecewise function:
Let ; the minimizers of the piecewise functions are , , , , and , and their minimal values are , , , , and , respectively. From , it follows that .
When , the piecewise function’s minimal values are , , , and ; when , the piecewise function’s minimal values are , , and . Thus, we can observe that . If , then ; similarly, if , then . We can determine the following conclusions by comparing the values of , , , , and :
- (1.1)
When the conditions of , , and are met, it follows that ;
- (1.2)
When the condition of is met, if is true, then or can be derived;
- (1.3)
When the condition of is met, if is true, then or can be derived;
- (1.4)
When the conditions of , , and are met, it follows that ;
- (1.5)
When the conditions of , , and are met, it follows that ;
- (1.6)
When the condition of is met, it follows that .
Thus, Equation (20) can be derived. □
3.2. The Use of the Proximal Operator Algorithm to Solve Truncated Loss Functions
When the base loss in the truncated loss function is monotonic and non-piecewise, and its minimizer can be expressed explicitly, the proximal operator of the truncated loss function can be calculated using Formula (20). In practical applications, however, it is sometimes impossible to obtain this explicit expression; for example, one of the truncated loss functions introduced above does not admit one. The calculation of the proximal operator in such scenarios is discussed below.
For a base loss that is smooth and twice differentiable, the subproblem is a smooth unconstrained optimization problem, and Newton’s method is used to find its minimum owing to its high convergence rate. The gradient and Hessian matrix for problem (16) can be expressed as follows:
The minimizer and the minimal value can then be obtained with Newton’s method; a minimal sketch of this step follows.
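The sketch assumes the subproblem form σL(u) + ½(u − v)² from Definition 1 and uses an exponential stand-in for the smooth base loss, since the exact expression of the truncated linear exponential loss is not reproduced here:

```python
import numpy as np

def prox_smooth_newton(dL, d2L, v, sigma=1.0, tol=1e-10, max_iter=50):
    """Newton's method for the smooth subproblem
        min_u sigma * L(u) + 0.5 * (u - v)**2,
    whose gradient is sigma*L'(u) + (u - v) and Hessian sigma*L''(u) + 1."""
    u = v
    for _ in range(max_iter):
        step = (sigma * dL(u) + (u - v)) / (sigma * d2L(u) + 1.0)
        u -= step
        if abs(step) < tol:
            break
    return u

# Stand-in smooth base loss L(u) = exp(u):
u_star = prox_smooth_newton(dL=np.exp, d2L=np.exp, v=1.0, sigma=0.5)
print(u_star)  # satisfies 0.5*exp(u) + (u - 1) = 0 (approx. 0.317)
```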
If an explicit expression for the minimizer cannot be obtained, the calculation of the proximal operator follows the same process as Lemma 3. Once the minimizers and the corresponding minimal values are obtained numerically, the proximal operator can be calculated; when the relevant conditions are met, the same case analysis applies. Therefore, Formula (20) can be modified to express the following:
Based on the analysis presented above, the procedure for solving the proximal operator of the truncated loss function is summarized in Algorithm 1.
Algorithm 1: Solving the proximal operator of the truncated loss function. Newton’s method is iterated with the gradient and Hessian of the smooth subproblem (Formulas (21) and (22)) until the minimizer and its minimal value are obtained; the candidate minimizers are then compared as in Lemma 3, and the proximal operator is returned according to Formula (23).
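A compact sketch of this procedure is given below; it reuses prox_smooth_newton from the previous sketch and, unlike Algorithm 1, which compares all piecewise candidates from Lemma 3, this simplification checks only the smooth-branch minimizer against the identity candidate u = v:

```python
import numpy as np

def prox_truncated(L, dL, d2L, eps, v, sigma=1.0):
    """Two-candidate sketch of the proximal operator of min(L, eps):
    candidate 1 minimizes the untruncated branch with Newton's method;
    candidate 2 is u = v, which pays the capped loss min(L(v), eps)
    and no proximity cost."""
    u1 = prox_smooth_newton(dL, d2L, v, sigma)
    f1 = sigma * L(u1) + 0.5 * (u1 - v) ** 2
    f2 = sigma * min(L(v), eps)
    return u1 if f1 <= f2 else v

# Example with the exponential stand-in loss L(u) = exp(u):
print(prox_truncated(np.exp, np.exp, np.exp, eps=2.0, v=1.0, sigma=0.5))
```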
4. Robust SVDD Model
Formula (1) for the SVDD model can be rewritten as follows:
where the loss term is the hinge loss function. Since the hinge loss function is sensitive to outliers, it can be replaced with the truncated loss function of Formula (15). Thus, the objective function of the robust SVDD model is as follows:
Because the truncated loss function is non-differentiable, solving the objective function of the robust SVDD model is a non-convex optimization problem that cannot be handled by standard SVDD solution methods; a sketch of the resulting objective appears below.
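The sketch assumes the standard SVDD penalty form R² + C Σᵢ L(‖xᵢ − a‖² − R²), with the truncated hinge as the plug-in loss; the constants C and ε, and the function name, are illustrative:

```python
import numpy as np

def robust_svdd_objective(X, center, radius, C=1.0, eps=1.5):
    """Objective of the robust SVDD model: R^2 plus C times the
    truncated hinge loss of each sample's boundary violation."""
    violation = np.sum((X - center) ** 2, axis=1) - radius ** 2
    t_hinge = np.minimum(np.maximum(0.0, violation), eps)
    return radius ** 2 + C * np.sum(t_hinge)
```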
Theorem 1 (Nonparametric Representation Theorem [37]). Suppose we are given a non-empty set ; a positive definite real-valued kernel ; a training sample ; a strictly monotonically increasing real-valued function on ; an arbitrary cost function ; and a class of functions . In this scenario, denotes the norm in the RKHS associated with , i.e., for any . Then, any minimizing the regularized risk function admits a representation of , where represents the coefficients in the RKHS .
By the nonparametric representation theorem, there exists a set of vectors , and the center is the optimal solution of problem (25). Therefore, Formula (25) can be transformed into the following:
Formula (28) represents the single-class SVDD model. To handle data that include negative samples, these samples must be incorporated into the SVDD model; the center is then , and the objective function of the robust SVDD model is as follows:
When , it follows that . Problem (29) is rewritten in the following matrix form:
where , , , and . This study addresses data containing negative samples with the SVDD model, and the Lagrangian function for problem (30) is as follows:
The KKT conditions for problem (30) are provided below:
where denotes any KKT point.
The generalized non-smooth optimization problem can be represented by the following formula [38]:
where and are continuously differentiable functions on , and is a non-smooth function on . Problem (31) is a special case of this generalized non-smooth optimization problem, where , , , and , representing the optimization model’s constraints, are nonlinear equality constraints. In this study, the fast ADMM algorithm is employed to solve problem (33); the algorithm is introduced in a subsequent section, and its overall splitting structure is sketched below.
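The sketch below is a schematic scaled-form ADMM for min f(x) + L(z) subject to c(x) = z, in which the z-update is a proximal step (where Algorithm 1 enters for truncated losses); it is not the paper’s accelerated variant, all names are illustrative, and the usage example solves a simple soft-thresholding problem rather than the SVDD model:

```python
import numpy as np

def admm(argmin_x, prox_L, c, x0, z0, rho=1.0, iters=100):
    """Schematic scaled-form ADMM for min_x f(x) + L(z) s.t. c(x) = z:
    the x-step minimizes the augmented Lagrangian in x, the z-step is
    the proximal operator of L, and the dual step is gradient ascent."""
    x, z = np.asarray(x0, float), np.asarray(z0, float)
    lam = np.zeros_like(z)                 # scaled dual variable
    for _ in range(iters):
        x = argmin_x(z - lam, rho)         # user-supplied x-update
        z = prox_L(c(x) + lam, 1.0 / rho)  # proximal step on the loss
        lam = lam + c(x) - z               # dual update
    return x, z

# Usage example: min_x 0.5*(x - b)^2 + |x|, i.e., soft-thresholding.
b = np.array([3.0, -0.2])
argmin_x = lambda t, rho: (b + rho * t) / (1.0 + rho)
prox_abs = lambda t, s: np.sign(t) * np.maximum(np.abs(t) - s, 0.0)
x, _ = admm(argmin_x, prox_abs, c=lambda x: x, x0=b, z0=b)
print(x)  # approx. [2., 0.], the soft-threshold of b at 1
```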
7. Conclusions
This study aimed to enhance the robustness and effectiveness of the SVDD algorithm. We proposed a general framework for truncated loss functions, which uses bounded losses to mitigate the impact of outliers. Because the truncated loss function is non-differentiable, we employed the fast ADMM algorithm to solve the resulting SVDD model, handling all truncated losses within a unified scheme. In this context, the truncated generalized ramp, truncated binary cross entropy, and truncated linear exponential loss functions were constructed for the SVDD algorithm, and extensive experiments show that the three resulting SVDD models are more robust than competing SVDD models in most cases. However, the method still has the following shortcomings. First, introducing a truncated loss function increases the complexity of model training, as some truncated losses do not admit explicit expressions for their proximal operators and thus require additional computational overhead; this may limit the method’s applicability to large-scale datasets. To overcome this limitation, future work could adopt a distributed computing framework to accelerate the training process of the ADMM algorithm. Second, the truncated loss SVDD introduces new free parameters, which lengthens grid search parameter selection; when the data scale is large, the computation time of a grid search may become unacceptable. To address this drawback, algorithms such as Bayesian optimization could be used to find the optimal parameters, further improving model performance and optimization efficiency. Finally, for extremely noisy data, the truncated loss function may not completely eliminate the impact of noise. In such cases, hybrid approaches can be adopted: a clustering algorithm such as DBSCAN can first be used to preprocess the data and remove noise, after which the proposed method can be applied to detect anomalies.
This study aimed to enhance the robustness and effectiveness of the SVDD algorithm. We propose a general framework for the truncated loss function of this algorithm, which uses bounded loss functions to mitigate the impact of outliers. Due to the non-differentiability of the truncated loss function, we employed the fast ADMM algorithm to solve the SVDD model with the truncated loss function, which handles truncated loss functions within a unified framework. In this context, the truncated generalized ramp, truncated binary cross entropy, and truncated linear exponential loss functions for the SVDD algorithm were constructed, and extensive experiments show that these three SVDD models exhibit more robustness than other SVDD models in most cases. However, this method still has the following shortcomings. Firstly, introducing the truncated loss function increases the complexity of model training, as some truncated loss functions cannot directly provide explicit expressions for neighboring point operators, requiring additional computational overhead. These factors may limit the application of the method to large-scale datasets. To overcome these limitations, future work can consider using a distributed computing framework to accelerate the training process of the ADMM algorithm. Secondly, the truncated loss function SVDD introduces new free parameters, which increases the time required for grid search parameter selection. When the data scale is large, the computation time for the grid search method may become unacceptable. To address this drawback, algorithms such as Bayesian optimization can be considered in the future to find the optimal parameters, further improving model performance and optimization efficiency. Finally, for extremely noisy data, the truncated loss function may not completely eliminate its impact, and the effect is limited. In this case, methods combining clustering algorithms such as DBSCAN can be adopted. First, clustering algorithms like DBSCAN can be used to preprocess the data and remove noise, and then the proposed method can be used to detect anomalies.