1. Introduction
Smart devices, such as smartphones and tablets, play a key role in everyday life. Despite the technological advancements that have enabled such devices to dominate our lives, even the latest security measures are becoming outdated and expose users to various threats. For example, personal identification numbers, passwords, or patterns, while providing some degree of security, are especially prone to shoulder-surfing. Whereas biometrics-based methods such as fingerprint [
1], facial [
2], or iris recognition [
3] do provide more security, most are limited to authentication at the point of entry.
Unlike physical biometrics, which primarily authenticate, i.e., one-to-one matching, at the point of entry, continuous authentication involves ongoing verification. Behavioral biometrics facilitates a seamless and smooth authentication process in this regard. This body of literature involves keystrokes [
4], gait features [
5], and touchstroke dynamics [
6]. However, behavioral biometrics inherently include varying characteristics within a single identity, depending on factors such as mood. Therefore, machine learning approaches have been widely adopted, especially for touchstroke dynamics [
6,
7,
8,
9]. For clarity, we note that touchstroke dynamics are coupled with gait features; in this work, we refer to the touchstroke-gait multimodal signals as touchstrokes for brevity.
The touchstroke authentication problem can be framed as a traditional binary classification (closed-set) [
6,
7,
8,
10,
11,
12,
13,
14], anomaly detection (closed-set) [
7,
15], or open-set authentication [
16] problems. Among the most primitive is traditional binary classification, where access to abundant amounts of genuine and impostor data is assumed, and the model is trained as a binary classifier upon such data. However, [
7,
15] point out that the accessibility of impostor data is unrealistic, thus reformulating the problem as anomaly detection, where only genuine data are taken to train an anomaly detector. Unfortunately, anomaly detectors are widely known to suffer from the problem of feature representation collapse [
17]. Moreover, though framed as anomaly detectors, most rely upon limited generated or reinforced impostor data [
7,
13,
14,
18,
19,
20,
21]. Also, training a model individually on each genuine user's data can be resource-consuming, makes further updates and maintenance inefficient, and is prone to model parameter breaches.
To resolve the problems above, ref. [
16] introduces open-set authentication (OSA) for touchstroke biometrics, where the model is trained upon a multiple-identity pretraining dataset. Upon distribution of the model, the enrollment phase involves a simple extraction of the touchstroke features from the user inputs, where the extracted features are stored as the template, in contrast to the per-user model training of binary classification or anomaly-detection frameworks. Subsequently, in the authentication phase, the claimant's input is likewise forwarded through the model, and its features are compared to the stored template via the nearest-neighbor distance.
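The enrollment and authentication phases just described can be sketched as follows; the feature extractor, feature dimension, and threshold are illustrative stand-ins rather than the actual model or settings of [16]:

```python
import numpy as np

def enroll(extract, samples):
    """Enrollment: forward the user's inputs through the pretrained model
    and store the extracted features as the template (no training)."""
    return np.stack([extract(x) for x in samples])          # (n_samples, d)

def authenticate(extract, template, claimant_input, threshold):
    """Authentication: accept iff the nearest-neighbor distance between
    the claimant's feature and the stored template is below a threshold."""
    feat = extract(claimant_input)
    dists = np.linalg.norm(template - feat, axis=1)         # one distance per stored feature
    return float(dists.min()) <= threshold

# Toy check with an identity "extractor" on 4-dimensional features.
rng = np.random.default_rng(0)
extract = lambda x: x
genuine = [rng.normal(0.0, 0.1, 4) for _ in range(5)]
template = enroll(extract, genuine)
print(authenticate(extract, template, genuine[0], threshold=1.0))        # genuine: accepted
print(authenticate(extract, template, genuine[0] + 10.0, threshold=1.0)) # impostor: rejected
```

Note that the model itself never changes per user: only the template is user-specific, which is what distinguishes OSA from the per-user training of binary classification or anomaly-detection frameworks.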
By nature, OSA removes the shortcomings of binary classification and anomaly detection, enabling more secure and efficient authentication. Specifically, its application with transformer-based architectures [
22,
23], regarding the temporal nature of touchstroke dynamics, showed robust performance. State-of-the-art was achieved by a model called PIEformer [
16], which attempts to resemble the effect of an ensemble of multiple Transformers while incurring only a minimal increment in model parameters and computational complexity. While OSA with PIEformer sets a notable benchmark, we make the following observations.
In reality, the touchstroke input from the user may be substantial, leading to a bulky template on which resource-limited mobile devices may struggle to compute the nearest-neighbor distance. Hence, a method of extracting a user-representative, robust subset of the original template should be investigated.
The PIEformer intends to resemble the effect of having multiple models by having multiple learnable embedding inputs, and various experiments show the soundness of this design. However, the global attention mechanism introduces dependency between these embeddings, thus marginally limiting the original objective.
In this work, we primarily address the observations above. First, realizing the necessity of reducing the template to a user-representative, robust subset, we propose CoreTemp: coreset-reduced templates. CoreTemp is motivated by the memory-reduction technique of [
24], where the combination of iterative greedy approximation [
25,
26] and the Johnson–Lindenstrauss (JL) theorem [
27] results in an effective reduction of templates. Moreover, based on the integrity of the dimensionality reduction attributed to the JL theorem, we further put forth a quick-authentication mechanism, in which the claimant's features are also reduced and compared to a template reduced both in size (CoreTemp) and in dimension, that is, a dimension-reduced CoreTemp.
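The distance-preservation property underlying this quick-authentication mechanism can be sketched minimally as below; the dimensions, the Gaussian projector, and the synthetic features are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_star = 256, 64                    # original and reduced dimensions (illustrative)
# JL-style Gaussian random projector: approximately preserves pairwise distances.
P = rng.normal(0.0, 1.0 / np.sqrt(d_star), size=(d, d_star))

template = rng.normal(size=(10, d))    # synthetic stored template features
claimant = rng.normal(size=(d,))       # synthetic claimant feature

# Nearest-neighbor distance in the original vs. the projected space:
full = np.linalg.norm(template - claimant, axis=1).min()
fast = np.linalg.norm(template @ P - claimant @ P, axis=1).min()
print(full, fast)  # the projected distance approximates the original one
```

Because the projection approximately preserves pairwise distances, the threshold comparison can be carried out against the dimension-reduced template at a fraction of the cost.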
Achieving a user-representative template of much smaller size and dimensionality leads us to explore its new capabilities further. Noticing that some mobile devices are shared, such as educational tablets, we utilize the smaller capacity of CoreTemp to enable identification, i.e., one-to-many matching, by simply tagging multiple (identity) CoreTemps with their corresponding identities. We note that a naive application of the original templates within this OSA framework leads to a substantial computational overhead, deviating from the objective of behavioral biometric authentication.
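The identification-by-tagging idea can be sketched as follows; the identities, feature dimension, and synthetic templates below are hypothetical:

```python
import numpy as np

def identify(tagged_templates, feat):
    """One-to-many matching: compute the nearest-neighbor distance to each
    identity-tagged (reduced) template and return the closest identity."""
    return min(tagged_templates,
               key=lambda uid: np.linalg.norm(tagged_templates[uid] - feat, axis=1).min())

rng = np.random.default_rng(0)
tagged = {"user_a": rng.normal(0.0, 0.1, (8, 4)),   # hypothetical reduced template of user_a
          "user_b": rng.normal(3.0, 0.1, (8, 4))}   # hypothetical reduced template of user_b
print(identify(tagged, rng.normal(3.0, 0.1, 4)))    # prints "user_b"
```

Since the per-identity templates are coreset-reduced, the argmax scan over all enrolled identities stays cheap even on a shared device.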
Second, we improve the PIEformer by introducing a masking layer between the learnable embedding inputs. The main idea of PIEformer comes from resembling an explicit ensemble [
28] of multiple Transformers by adopting multiple aggregated learnable embedding inputs.
However, in an explicit ensemble, the performance enhancement depends on the diversity of the individual sets of model parameters, attributable predominantly to the independent random initialization and independent training of each model. In PIEformer, this independence between the embedding inputs is jeopardized by the attention mechanism itself.
Thus, by including an extra masking layer in the attention between the learnable embeddings, we achieve a clearer partitioning of information across the embedding inputs and thus a closer approximation to the behavior of an explicit ensemble. This modified architecture, referred to as PIEformer+ and illustrated in Figure 2, reaches state-of-the-art performance while maintaining computation and model complexity identical to those of PIEformer.
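One plausible realization of such a masking layer is an additive attention mask in which sequence tokens and embeddings interact freely, except that each learnable embedding is blocked from attending to the other embeddings; the layout below is an assumption for illustration, and the exact pattern in PIEformer+ may differ:

```python
import numpy as np

def embedding_partition_mask(n_tokens, n_embeds):
    """Additive attention mask (0 = attend, -inf = blocked): the last
    n_embeds positions hold the learnable embeddings, and each embedding
    is blocked from attending to the other embeddings, so the partitions
    behave closer to independently trained ensemble members."""
    n = n_tokens + n_embeds
    mask = np.zeros((n, n))
    block = np.full((n_embeds, n_embeds), -np.inf)
    np.fill_diagonal(block, 0.0)          # an embedding may still attend to itself
    mask[n_tokens:, n_tokens:] = block    # blocking applies only among embeddings
    return mask

m = embedding_partition_mask(n_tokens=3, n_embeds=2)
print(m[3, 4], m[3, 3], m[3, 0])  # -inf 0.0 0.0
```

Added before the softmax, the -inf entries zero out the cross-embedding attention weights while leaving the parameter count and per-layer computation unchanged.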
Our main contributions are as follows:
We propose utilizing greedy coreset sampling upon the touchstroke template. Extracting a user-representative yet reduced template makes efficient use of the limited memory on mobile devices. Moreover, we also propose a fast-match algorithm conceived from the JL theorem adopted in greedy coreset sampling, leading to much more efficient authentication.
We introduce a novel variant of PIEformer suitable for initially generating user-discriminative templates, PIEformer+. While [
16] has deeply investigated simulating an ensembled effect of Transformers with multiple learnable embedding inputs at a low computational cost, PIEformer+ simulates this phenomenon more closely by introducing a masking block in between the learnable parameters.
We demonstrate the feasibility of authentication and identification with the reduced templates. This highlights the importance of extracting a robust, user-representative touchstroke template. The proposed methods are evaluated on two datasets, HMOG and BBMAS, where they reach state-of-the-art performance in authentication and identification tasks.
Building on these foundations, our study aims to enhance the efficiency and effectiveness of mobile biometric authentication systems through several key innovations, namely CoreTemp and PIEformer+. These enhancements address the primary challenges of traditional methods by providing a more secure and efficient approach to continuous authentication and identification using touchstroke-gait behavioral biometrics.
2. Related Works
Authentication via touch gestures and gait features, or touchstrokes, involves identifying users by their touch interactions on the screen, which include touch trajectories, device movement, and orientation. Previous studies have primarily utilized data like touch coordinates, pressure, and motion sensor data from accelerometers and gyroscopes [
18,
19,
20,
21,
29,
30,
31,
32,
33,
34,
35].
Some primitive approaches manually handcraft features from the touchstroke signals, such as the trajectory length and the median pressure [
18,
19,
20,
21,
30,
31,
32,
33,
34,
35]. This dependence is due to the sensitivity of touch gestures to behavioral variances. Typically, these features are extracted and used to train various predictive models based on traditional machine learning, such as one-class support vector machines (SVM) [
21,
30], kernel ridge regression [
33], random forest [
19], temporal regression forest [
32], or SVM [
18,
20]. A major drawback of such methods is the reliance on task- or user-specific knowledge for their creation, making them challenging to design. Additionally, these features are often not fine-grained enough to effectively identify the subtle patterns of impostors. Some studies have further explored the realm of deep learning, where integration of deep models with handcrafted features takes place [
36], or the development of models that learn features directly from raw touch-gesture data in an end-to-end manner [
29,
37] to overcome the limitations of manual features.
Realizing the weak correlation of behavioral biometrics with the user's identity, and with the rise of deep learning, more of the literature has focused on utilizing deep models for authentication with touchstroke-gait biometrics. Specifically, most works commonly employ sensory data from the touchscreen, accelerometer, and gyroscope. We hereafter limit our scope to deep learning-based approaches.
A typical approach begins by framing this problem as binary classification between genuine and impostor users. The reader may refer to a recent survey in [
38]. However, acquiring complete datasets of impostor data for mobile devices is often unattainable. To overcome this limitation, the issue can be approached as a few-shot binary classification problem, as proposed by [
9]. This method seeks to enhance the robustness of detection against unseen impostors by utilizing the limited impostor data available. Ideally, a one-class classification model would be used, training solely on authentic user data [
7,
15]. Yet, this method performs poorly because it is particularly susceptible to feature representation collapse, which undermines the model’s ability to generalize and maintain resilience, as identified by [
17].
Traditionally, classification models for mobile behavioral biometrics authentication have been developed within a closed-set framework, utilizing either known genuine and impostor data or only genuine data for training. This traditional approach, however, falls short in practical security and usability scenarios. In response, ref. [
16] introduced an open-set authentication (OSA) strategy that allows for zero-shot inference, where the model can predict unseen data. Moreover, in [
16], in a realization that the exploration of various deep learning architectures for mobile biometrics authentication has been limited to LSTM networks [
6,
7], Convolutional LSTM [
8,
14], Autoencoders [
15], and Temporal Convolutional Networks (TCN) [
9], the authors attempt their solution with Transformers [
22,
23]. Furthermore, an implicitly ensembled model named PIEformer is proposed, noting the capability of ensembled models to exhibit robustness in open-set scenarios, mainly due to their enhanced generalization performance [
39].
In this work, we realize that such a framework requires a bank of templates, whose size would likely present a computational overhead in practical scenarios. Realizing the necessity of shrinking this bank into a user-representative, robust set of templates, we propose performing greedy coreset sampling on the initial template. The reduced template further extends its capabilities to identification, achieved by tagging templates with multiple identities, as detailed in
Section 4.2. Furthermore, we propose PIEformer+, which mimics the effect of ensembles more closely; its comparison with the Transformer and PIEformer is given in
Section 4.3.
3. Preliminary
3.1. Open-Set Authentication and Identification
Open-set methodologies are gaining traction in biometric security research. In this context, ref. [
16] is noted for pioneering Open Set Authentication (OSA) in touchstroke biometrics. At the same time, our study is the inaugural application of identification techniques within an open-set environment specifically for touchstroke-gait security. We delineate the scope of the “open set” used in our research to clarify the terminology and enhance the presentation.
We first assume a model $f_\theta$ pretrained upon a pretraining set that consists of samples from $K$ distinct classes, that is, $\mathcal{Y}_{\mathrm{pre}} = \{1, \ldots, K\}$. For authentication, a set of input samples $\mathcal{X}_u$ of a user $u$ is given to $f_\theta$ for the construction of templates, where the associated identity $y_u$ is considered disjoint from the pretraining set, that is, $y_u \notin \mathcal{Y}_{\mathrm{pre}}$.
During authentication, the claimant's sample $x$ is likewise passed to the system, whose associated, unknown identity $y$ likewise comes from an open set; thus, without loss of generality, it is assumed that $y \notin \mathcal{Y}_{\mathrm{pre}}$. Subsequently, a similarity score is derived upon comparison with the template, where the decision to accept or reject is made upon a certain threshold. We point out that a model with profound generalization ability is required because of the assumption that $y \notin \mathcal{Y}_{\mathrm{pre}}$.
For identification, we likewise assume a set of users $\mathcal{U} = \{u_1, \ldots, u_M\}$ whose identities are pairwise disjoint from $\mathcal{Y}_{\mathrm{pre}}$, that is, $y_{u_m} \notin \mathcal{Y}_{\mathrm{pre}}$ for all $m$, where the samples of each identity are utilized to generate the templates of that identity in enrollment. Likewise, following the aforementioned notation, we assume a claimant's sample $x$ whose associated identity comes from $\mathcal{U}$, that is, $y \in \{y_{u_1}, \ldots, y_{u_M}\}$, whose identity is derived by an argmax of the similarity scores spanning the entire set of templates.
3.2. Greedy Coreset Sampling
To our knowledge, ref. [
24] is one of the leading works to implement greedy coreset sampling for generating a reduced yet representative memory bank in machine learning; the ubiquity of memory banks used in conjunction with pretrained networks accentuates the importance of structuring a generalized, robust, yet practical (especially in size) memory bank. That realistic applications are hindered by memory-bank size has already been pointed out in [
40]. Consequently, a primitive approach to this problem would be understanding the generalizability and size of the memory bank as a tradeoff.
However, both concerns may be addressed simultaneously by reducing the memory bank with the coreset subsampling algorithm. To this end, let us assume training samples $\{x_i\}_{i=1}^{N}$ and a pretrained network $f_\theta$, where the initial memory bank would be simply defined as
$$\mathcal{M} = \{\, f_\theta(x_i) \mid i = 1, \ldots, N \,\}.$$
Here, we aim to search a representative subset $\mathcal{M}_C \subset \mathcal{M}$ such that $|\mathcal{M}_C| \ll |\mathcal{M}|$.
Observing that ref. [16] contemplates the use of the nearest-neighbor distance in harmony with [24], we likewise adopt minimax facility-location coreset selection via iterative greedy approximation [25,26], assuming a flowing memory bank randomly initialized as $\mathcal{M}_C \leftarrow \{m_0\}$ with $m_0 \in \mathcal{M}$. Subsequently, our objective is to search $\mathcal{M}_C^*$ such that
$$\mathcal{M}_C^* = \operatorname*{arg\,min}_{\mathcal{M}_C \subset \mathcal{M}} \max_{m \in \mathcal{M}} \min_{n \in \mathcal{M}_C} \lVert \psi(m) - \psi(n) \rVert_2, \tag{1}$$
where $\mathcal{M}_C$ is iteratively given as
$$\mathcal{M}_C \leftarrow \mathcal{M}_C \cup \Big\{ \operatorname*{arg\,max}_{m \in \mathcal{M} \setminus \mathcal{M}_C} \min_{n \in \mathcal{M}_C} \lVert \psi(m) - \psi(n) \rVert_2 \Big\}, \tag{2}$$
with $\mathcal{M}_C$ being continually updated by Equation (2) each iteration until it reaches the predefined size, and $\psi$ a random projector for computational efficiency [41], its usage justified by the JL theorem [27]. The integrity of such subsampling has been explored in [24], where the results indicate a significant enhancement in efficiency with minimal deterioration in generalizability.
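The iterative greedy approximation described above can be sketched as follows; the bank size, feature dimensions, and Gaussian projector are illustrative assumptions:

```python
import numpy as np

def greedy_coreset(memory, target_size, d_star=32, seed=0):
    """Minimax facility-location coreset via iterative greedy approximation:
    start from a random element, then repeatedly add the point farthest
    from the current coreset, measuring distances after a JL-style random
    projection for computational efficiency."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(0.0, 1.0 / np.sqrt(d_star), (memory.shape[1], d_star))
    z = memory @ proj                                  # projected features
    idx = [int(rng.integers(len(z)))]                  # random initialization
    min_d = np.linalg.norm(z - z[idx[0]], axis=1)      # distance to current coreset
    while len(idx) < target_size:
        nxt = int(np.argmax(min_d))                    # farthest point from the coreset
        idx.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(z - z[nxt], axis=1))
    return memory[idx]

bank = np.random.default_rng(1).normal(size=(500, 128))   # synthetic template bank
core = greedy_coreset(bank, target_size=50)
print(core.shape)  # (50, 128)
```

Maintaining the running minimum distance `min_d` keeps each iteration linear in the bank size, so the whole selection costs O(|M| * target_size) projected-distance evaluations.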