*3.1. Datasets*

To evaluate the proposed approach, we used the LivDet 2015 dataset for fingerprints and a real ECG dataset collected in our lab. LivDet 2015 contains approximately 19,000 images divided into two parts: training and testing. Each part contains bona fide (live) and artefact (fake) images captured using different fingerprint sensors, as shown in Table 1. Numerous materials were used to fabricate the artefact fingerprint samples, e.g., Ecoflex, gelatin, latex, and wood glue. The testing set contains artefact samples fabricated from various materials, some of which do not appear in the training set, such as OOMOO and RTV, as shown in Table 2. Figure 8 shows bona fide and artefact samples for the same subject captured with two different sensors, i.e., the Green Bit and Digital Persona sensors.


**Table 1.** Device and image characteristics of the LivDet 2015 dataset.

**Table 2.** Materials used for fabricating fake images in the LivDet 2015 dataset. Some materials in the testing set are unknown during training (underlined).


**Figure 8.** Bona fide and artefact fingerprint samples from the LivDet 2015 dataset captured using the Digital Persona and Green Bit sensors. Artefact samples were fabricated using different materials.

For the ECG modality, we used a dataset collected in our lab with a commercially available handheld ECG device, the ReadMyHeart by DailyCare BioMedical, Inc. (https://www.dcbiomed.com/webls-en-us/index.html), as shown in Figure 9. We built a database of 656 ECG records captured from 164 individuals over two sessions [48,58]. We have since extended this database with a third session so that most users have 10 records. The device captures a 15-second signal, digitizes it, and exports it to the computer as an ECG record. Such a signal may contain different types of noise, such as power-line interference, baseline wander, and patient-electrode motion artefacts. In the preprocessing step, we apply a fourth-order band-pass Butterworth filter with cut-off frequencies of 0.25 and 40 Hz to remove these noise components. An efficient curvature-based method is then used to detect heartbeats [59,60], and we take the first 10 beats from each record for this experiment. Figure 10 shows preprocessed ECG samples from four different subjects.
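As a minimal sketch, the band-pass filtering step described above could be implemented with SciPy as follows. The device's sampling rate is not stated in the text, so the 500 Hz used here is an assumption; the filter order (4) and cut-off frequencies (0.25 and 40 Hz) are taken from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500.0  # Hz -- hypothetical sampling rate; not specified in the text


def preprocess_ecg(signal, fs=FS, low=0.25, high=40.0, order=4):
    """Fourth-order band-pass Butterworth filter (0.25-40 Hz).

    Attenuates baseline wander (below 0.25 Hz) and power-line
    interference / high-frequency noise (above 40 Hz). Second-order
    sections keep the narrow low cut-off numerically stable, and
    zero-phase filtering avoids distorting the ECG waveform.
    """
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


# Example: a 15-second synthetic signal with a 1.2 Hz "heartbeat"
# component, slow baseline drift, and 60 Hz power-line interference.
t = np.arange(0, 15, 1 / FS)
raw = (np.sin(2 * np.pi * 1.2 * t)
       + 0.5 * np.sin(2 * np.pi * 0.05 * t)   # baseline wander
       + 0.2 * np.sin(2 * np.pi * 60.0 * t))  # power-line interference
clean = preprocess_ecg(raw)
```

Zero-phase filtering (`sosfiltfilt`) is one reasonable choice here; the text does not specify whether the authors used forward-only or forward-backward filtering.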

**Figure 9.** ECG data collection using the ReadMyHeart device.

**Figure 10.** ECG samples of 10 heartbeats from four different subjects.

Owing to the lack of publicly available multimodal datasets containing fingerprint and ECG signals, we constructed a multimodal dataset from the LivDet 2015 dataset and our ECG dataset. First, we built a mini fingerprint dataset from LivDet 2015, called mini-livdet2015, containing images from the Digital Persona sensor. Mini-livdet2015 comprises 70 subjects, each with 10 bona fide and 12 artefact samples; the artefact samples were selected at random from all available fabrication materials. To form the multimodal dataset, we then assigned a random subject from the ECG dataset to each subject in mini-livdet2015. Table 3 describes the resulting dataset, which comprises 70 subjects, each with 10 bona fide fingerprint samples, 12 artefact fingerprint samples, and 10 ECG samples.
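The subject-assignment step could be sketched as below. The subject identifiers are hypothetical, and sampling the ECG subjects without replacement (so each fingerprint subject gets a distinct ECG subject) is an assumption; the text only says the assignment is random.

```python
import random

# Hypothetical identifiers; the actual subject IDs are not given in the text.
fp_subject_ids = [f"fp{i:02d}" for i in range(70)]     # mini-livdet2015 subjects
ecg_subject_ids = [f"ecg{i:03d}" for i in range(164)]  # ECG dataset subjects


def assign_ecg_subjects(fp_ids, ecg_ids, seed=0):
    """Randomly assign one distinct ECG subject to each fingerprint subject.

    Sampling without replacement is an assumption: with 164 ECG subjects
    and 70 fingerprint subjects, each fingerprint subject can receive a
    unique ECG identity.
    """
    rng = random.Random(seed)
    chosen = rng.sample(ecg_ids, k=len(fp_ids))
    return dict(zip(fp_ids, chosen))


pairing = assign_ecg_subjects(fp_subject_ids, ecg_subject_ids)
```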

During training, we feed the network batches of input triplets that cover both classes. For the bona fide label, we pair a bona fide fingerprint sample with a bona fide ECG sample from the same subject; i.e., *X<sub>i</sub><sup>f</sup>* and *X<sub>i</sub><sup>e</sup>* are bona fide samples belonging to the same subject. Because we do not have artefact ECG signals, we pair an artefact fingerprint sample from one subject with a bona fide ECG sample from another subject (a zero-effort ECG sample); i.e., *X<sub>i</sub><sup>f</sup>* is an artefact sample and *X<sub>j</sub><sup>e</sup>* is a bona fide sample from a different subject.
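The pairing rule above can be sketched as follows. The per-subject data layout and the random selection of zero-effort ECG donors are assumptions for illustration; the text specifies only the labelling rule, not the data format.

```python
import random

# Toy stand-in for the multimodal dataset (illustrative structure only):
# each subject has 10 bona fide fingerprints, 12 artefact fingerprints,
# and 10 bona fide ECG records, here represented by string placeholders.
subjects = [
    {"fp_bonafide": [f"fp_b{i}_{k}" for k in range(10)],
     "fp_artefact": [f"fp_a{i}_{k}" for k in range(12)],
     "ecg":         [f"ecg{i}_{k}" for k in range(10)]}
    for i in range(3)
]


def make_triplets(subjects, seed=0):
    """Build (fingerprint, ECG, label) training triplets.

    label 1: bona fide fingerprint paired with a bona fide ECG of the
             same subject.
    label 0: artefact fingerprint of subject i paired with a bona fide
             ECG of a different subject (zero-effort ECG), since no
             artefact ECG samples exist.
    """
    rng = random.Random(seed)
    triplets = []
    for i, s in enumerate(subjects):
        for fp in s["fp_bonafide"]:
            triplets.append((fp, rng.choice(s["ecg"]), 1))
        donors = [d for j, d in enumerate(subjects) if j != i]
        for fp in s["fp_artefact"]:
            triplets.append((fp, rng.choice(rng.choice(donors)["ecg"]), 0))
    return triplets


triplets = make_triplets(subjects)
```

In practice the triplets would hold image tensors and ECG segments rather than strings; the strings merely make the subject assignment easy to inspect.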

Feeding the network these inputs allows it to learn the correlations between bona fide fingerprint and ECG samples of the same subject, and thereby to correctly identify bona fide presentations. Furthermore, the network learns to correctly predict an artefact by learning features representing the incoherence between an artefact fingerprint sample and a bona fide ECG sample of the same subject, or between a bona fide fingerprint sample of one subject and a bona fide ECG sample belonging to a different subject.


**Table 3.** Description of the customized multimodal dataset, which contains 70 subjects.
