5.1. Performance Comparison of rCBF, L-FBF, and L-rCBF
Our simulation was performed for datasets with specific distributions because learning-based data structures use the distribution of the elements in each set. A total of 245,514 URLs were used as a positive set (i.e., S) with six return values (i.e., six classes) [28], and 1,491,178 blacklisted URLs were used as a negative set [29].
We compared five BF structures: a single rCBF, two L-FBFs, and two L-rCBFs. Each of the two L-FBFs supports the deletion operation using a V-FBF for dynamic data. This performance comparison used two models with different classification accuracies and memory requirements; each model was included in one L-FBF and one L-rCBF. In a learning-based structure, the memory requirement of a model generally increases with its accuracy. However, this does not necessarily imply that the total memory requirement of the structure increases. For a fair comparison, we first constructed one L-rCBF and then constructed the other four structures with the same memory requirement as that L-rCBF.
To train the two models, character-level pretrained embeddings were utilized with principal component analysis (PCA). The models include a long short-term memory (LSTM) layer, two one-dimensional convolutional neural network (CNN) layers, and three fully connected layers with softmax activation. However, because the hyperparameters of the two models are different, the memory requirements and accuracies of the models are also different.
Table 1 compares the number of weights (w) and the memory requirements of the two models. The memory requirement of a model is calculated from w. Each model is shared by one L-rCBF and one L-FBF. One model requires more memory than the other; however, it also provides a higher level of accuracy.
Because an additional FP from the FR-BF causes an FN from the V-rCBF, an L-rCBF should be reconstructed at a predefined threshold, before the FR-BF begins to return additional FPs owing to too many deletions. The threshold can be set to the number of deleted elements at which the value calculated theoretically for the FR-BF is less than, and close to, 1. This value is calculated from the number of elements programmed in the FR-BF, the FR-BF size, the number of hash functions of the FR-BF, and the number of elements to be programmed into the FR-BF for deletion.
Figure 3 shows the theoretical and experimental values in an L-rCBF according to the number of elements deleted. When more than 15% of the elements in the L-rCBF are deleted, the value is no longer less than 1. Hence, the threshold can be predefined as 15% of the elements. In this experiment, reconstruction was not considered, to keep the performance comparison simple; hence, to evaluate the deletion performance, 15% of the URLs in S, selected at random, were deleted. In addition, a search performance experiment was performed for all URLs in U.
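The threshold selection described above can be illustrated with a minimal sketch. It does not reproduce the paper's own equation; it assumes the classical Bloom filter false-positive approximation and a hypothetical criterion, namely that the expected number of additional FPs over a given number of queries stays below 1. All function names, parameter names, and numeric values are illustrative assumptions.

```python
import math


def bloom_fp_probability(num_elements, num_bits, num_hashes):
    """Classical approximation of a Bloom filter's false-positive probability."""
    return (1.0 - math.exp(-num_hashes * num_elements / num_bits)) ** num_hashes


def expected_additional_fps(n_programmed, n_deleted, num_bits, num_hashes, n_queries):
    """Expected number of extra FPs after programming n_deleted more elements.

    Assumption (not from the paper): each of the n_queries lookups is an
    independent trial against the filter.
    """
    before = bloom_fp_probability(n_programmed, num_bits, num_hashes)
    after = bloom_fp_probability(n_programmed + n_deleted, num_bits, num_hashes)
    return n_queries * (after - before)


def deletion_threshold(n_programmed, num_bits, num_hashes, n_queries, max_deleted):
    """Largest n_deleted (<= max_deleted) whose expected extra FPs stay below 1."""
    threshold = 0
    for n_deleted in range(1, max_deleted + 1):
        extra = expected_additional_fps(
            n_programmed, n_deleted, num_bits, num_hashes, n_queries
        )
        if extra >= 1.0:
            # The FP probability grows monotonically, so we can stop here.
            break
        threshold = n_deleted
    return threshold


# Hypothetical sizes, chosen only to exercise the functions.
t = deletion_threshold(1_000, 32_768, 8, n_queries=100_000, max_deleted=5_000)
print("reconstruct after", t, "elements programmed for deletion")
```

Under this criterion, a larger filter (more bits for the same number of programmed elements) tolerates more deletions before reconstruction is needed, which matches the role the text assigns to the FR-BF size.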
In the construction procedure, one L-rCBF is constructed first, and the other four structures are then constructed with the same amount of memory as that L-rCBF. A verification structure is characterized by the number of elements programmed into it, its number of cells, and its size factor; an FR-BF is likewise characterized by its size factor. The L-rCBF comprises a model, an FR-BF, and a V-rCBF. To allocate the same amount of memory as in this L-rCBF, the size factors of the FR-BFs and verification structures should be adjusted in the two L-FBFs and the other L-rCBF.
For the FR-BF, the number of programmed elements depends on the model accuracy. As the accuracy of the model increases, the number of elements to be programmed into the FR-BF for deletion increases and the number of elements initially programmed in the FR-BF decreases. To satisfy the threshold condition until 15% of the elements are deleted in the learning-based structures, the FR-BF size factor in the L-rCBF and L-FBF that use the more accurate model should be increased to enlarge the FR-BF, considering (11). In other words, the FR-BF size factor in the L-FBF that uses the less accurate model is the same as that in the L-rCBF using the same model (i.e., 32). However, the FR-BF size factor in the L-FBF and L-rCBF that use the more accurate model is set to 49 to satisfy the condition. Hence, to support deletions, the FR-BF size factor should be increased as the accuracy of the model increases.
A verification structure is constructed with the memory that remains after the model and FR-BF are allocated from the total memory. Because six return values exist, a single cell in an rCBF has five bits, whereas a single cell in an FBF has three bits (i.e., L bits). Therefore, if an L-FBF and an L-rCBF include the same model, the size factor of the V-FBF in the L-FBF is greater than that of the V-rCBF in the L-rCBF. Specifically, the size factor of the V-FBF is 13.33 in one L-FBF and 14.03 in the other, and that of the V-rCBF in the other L-rCBF is 8.42. In addition, the size factor of the single rCBF is 6.19.
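The relation between these size factors can be checked with a short sketch, using the V-rCBF size factors of 8 and 8.42 quoted in this section. It assumes that, for a fixed number of programmed elements and a fixed memory budget, the size factor scales inversely with the number of bits per cell; the function name and this simplification are illustrative assumptions, not the paper's stated procedure.

```python
def equal_memory_size_factor(reference_factor, reference_cell_bits, target_cell_bits):
    """Size factor of a target structure that uses the same memory and holds
    the same number of elements as a reference structure (assumed relation)."""
    return reference_factor * reference_cell_bits / target_cell_bits


RCBF_CELL_BITS = 5  # five-bit rCBF cell, as stated in the text
FBF_CELL_BITS = 3   # three-bit FBF cell, as stated in the text

for v_rcbf_factor in (8, 8.42):
    v_fbf_factor = equal_memory_size_factor(
        v_rcbf_factor, RCBF_CELL_BITS, FBF_CELL_BITS
    )
    print(f"V-rCBF {v_rcbf_factor} -> V-FBF {v_fbf_factor:.2f}")
# V-rCBF 8 -> V-FBF 13.33
# V-rCBF 8.42 -> V-FBF 14.03
```

The computed values agree with the V-FBF size factors of 13.33 and 14.03 reported for the L-FBFs, which is consistent with each L-FBF sharing its model and FR-BF configuration with one L-rCBF.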
Table 2 compares the undeletable rates of the rCBF, the two L-FBFs, and the two L-rCBFs when using the same amount of memory. No UNDEL-FPs are observed from the rCBF and L-FBFs because these structures return more UNDEL-Cs than the L-rCBFs. However, the total number of undeletables from each L-rCBF is less than that from the rCBF and from each L-FBF. In terms of the deletion performance, structures using rCBFs outperform those using FBFs because of their counter fields. Even though each V-FBF in the L-FBFs has more cells than each V-rCBF in the L-rCBFs, the undeletable rates of the two L-rCBFs improve by 83.67% and 76.67%, respectively, compared with those of the corresponding L-FBFs.
Table 3 compares the search failure rates of the rCBF, the two L-FBFs, and the two L-rCBFs when using the same amount of memory. The reduction rate in search failures represents the proportion of search failures eliminated by a learning-based structure relative to the single rCBF. All four learning-based structures improve the search failure rate compared with the single rCBF. When comparing L-FBFs with L-rCBFs, the search failure rates of the L-FBFs are better than those of the L-rCBFs because the size factors of the V-FBFs (i.e., 13.33 and 14.03) are greater than those of the V-rCBFs (i.e., 8 and 8.42). However, if insertions and deletions are repeated for dynamic data, the gap in search failure rates between the L-rCBF and L-FBF with the same model would be reduced owing to an increase in the number of conflict cells in the V-FBF. Furthermore, because the deletion performance of the L-rCBFs is significantly superior to that of the L-FBFs, as shown in Table 2, the L-rCBFs are more appropriate than the L-FBFs for dynamic data processing. In addition, when comparing the two L-FBFs with each other or the two L-rCBFs with each other, in terms of the search and deletion performances, the structure using the more accurate model outperforms the structure using the less accurate model, even though the more accurate model requires more memory.
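The reduction rate defined above can be written out as a small sketch. The function name and the counts below are hypothetical and serve only to show the arithmetic; they are not taken from Table 3.

```python
def reduction_rate(baseline_failures, observed_failures):
    """Proportion of the baseline's search failures that are eliminated."""
    if baseline_failures <= 0:
        raise ValueError("baseline_failures must be positive")
    return (baseline_failures - observed_failures) / baseline_failures


# Hypothetical counts: 2,000 failures for the single rCBF and
# 500 failures for a learning-based structure.
rate = reduction_rate(2_000, 500)
print(f"{rate:.2%}")  # 75.00%
```

A rate of 0 means the learning-based structure fails as often as the single rCBF, and a rate of 1 means it eliminates every search failure.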
For static data, because insertions and deletions are infrequent and searches are primarily performed, using the L-FBF with its improved search performance is more efficient than using a single rCBF. In particular, if the FBF for dynamic data is replaced with that for static data in the L-FBF, the search performance of the L-FBF improves. Table 4 compares the search failure rates of the rCBF, the L-FBF with the FBF for dynamic data, and the L-FBF with the FBF for static data when using the same amount of memory. Because the number of conflict cells in the FBF for static data is less than that in the FBF for dynamic data, using the FBF for static data can reduce the number of INDETs included in set S. Hence, the L-FBF for static data is more efficient than a single rCBF and the L-FBF for dynamic data when insertions and deletions are infrequent.
Additionally, we compare the two L-rCBFs to two further L-FBFs, which are identical to the two L-FBFs above except for their V-FBFs: the size factors of these V-FBFs have the same values as those of the V-rCBFs in the L-rCBFs, namely 8 and 8.42, respectively.
Table 5 compares the undeletable rates when using verification structures with the same size factor. The undeletable rates of the two L-rCBFs are reduced by 98.73% and 98.20%, respectively, compared with those of the corresponding L-FBFs. In terms of the search performance, if an L-FBF and an L-rCBF with the same model have the same size factors for the FR-BF and the verification structure, the structures have the same search failure rates. Hence, the search failure rate of each L-rCBF is the same as that of the L-FBF with the same model. However, if insertions and deletions are repeated, the search failure rates of the L-rCBFs would be better than those of the L-FBFs.
5.2. Comparison of Probabilities between Theoretical and Experimental Results for rCBF
This section compares the undeletable and search failure probabilities between the theoretical and experimental results for the rCBF and the FBF supporting the deletion operation. To obtain the experimental probabilities, random URLs obtained from ALEXA [30] were used for set S and for the negative set. To allocate the same number of bits to a value field in the rCBF and a cell in the FBF, we assumed 254 return values; hence, a cell in the rCBF has a two-bit counter and an eight-bit value field, and a cell in the FBF has eight bits to store values. However, the rCBF can store up to 255 values with eight bits (i.e., 2^8 - 1) because it does not need to reserve the maximum value as a conflict value.
Let n be the number of elements stored in a BF structure, m the number of cells in the structure (i.e., the BF size, which is set by the size factor of the structure), and M the memory requirement of the structure. Figure 4 and Figure 5 compare the undeletable and search failure probabilities, respectively, between the theoretical and experimental results according to the BF size. Two rCBFs are considered: one uses the same M as the FBF, and the other uses the same size factor as the FBF. When the size factor of the FBF is 2, 4, and 8, the size factor of the rCBF using the same M is correspondingly smaller because this rCBF uses the same M as the FBF. Although the size factor of the rCBF using the same M is smaller than that of the FBF, its undeletable probability is much smaller than that of the FBF. Its search failure probability is slightly greater than that of the FBF; however, if insertions and deletions are repeated, the search failure probability of the rCBF would be better than that of the FBF. Hence, with dynamic data, replacing the FBF with the rCBF can improve the performance of the overall structure (i.e., L-rCBF), even though the FBF is better than the rCBF in terms of the search failure probability when using the same M. In addition, the experimental results validated the theoretical analysis, as shown in Figure 4 and Figure 5.
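The cell layouts used in this section can be made concrete with a short sketch. It assumes that both structures reserve the all-zero code for an empty cell and that the FBF additionally reserves the maximum code as its conflict value; it also assumes that, for a fixed memory and element count, the size factor scales inversely with the cell width. The names are illustrative, and the resulting rCBF size factors are derived under these assumptions rather than quoted from the paper.

```python
VALUE_BITS = 8    # eight-bit value field, as stated in the text
COUNTER_BITS = 2  # two-bit rCBF counter, as stated in the text


def fbf_capacity(value_bits):
    """Distinct return values in an FBF cell: all codes except the
    empty code (assumed to be zero) and the conflict code (maximum)."""
    return 2 ** value_bits - 2


def rcbf_capacity(value_bits):
    """Distinct return values in an rCBF value field: all codes except
    the empty code (assumed to be zero); no conflict code is reserved."""
    return 2 ** value_bits - 1


def equal_memory_rcbf_factor(fbf_factor):
    """rCBF size factor under the same memory and element count as the FBF."""
    fbf_cell_bits = VALUE_BITS
    rcbf_cell_bits = COUNTER_BITS + VALUE_BITS
    return fbf_factor * fbf_cell_bits / rcbf_cell_bits


print(fbf_capacity(VALUE_BITS), rcbf_capacity(VALUE_BITS))  # 254 255

for fbf_factor in (2, 4, 8):
    print(fbf_factor, "->", equal_memory_rcbf_factor(fbf_factor))
# 2 -> 1.6
# 4 -> 3.2
# 8 -> 6.4
```

The capacities reproduce the 254 and 255 values given in the text, and the smaller derived size factors illustrate why the rCBF with the same M has fewer cells than the FBF.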