**3. Method**

We construct an FV pipeline to evaluate FaceNet [12] on four benchmark datasets. We quantify bias under the "equal metrics" fairness definition using several distinct statistical fairness metrics, revealing representation bias. We then evaluate the clustering of face embeddings with respect to race and gender groups using clustering metrics and visualizations, revealing aggregation bias. Combining the statistical and cluster-based analyses, we draw conclusions about the connection between how faces cluster into protected and unprotected groups and the disparity in model performance between these groups. Figure 2 provides an overview of our method.

**Figure 2.** An overview of our approach. We use diverse face datasets to assess bias in FaceNet [12], leveraging the face embeddings it produces in various fairness experiments.

### *3.1. FV Pipeline*

We use MTCNN [29] for face detection and the facenet-pytorch Inception-ResNet V1 model (https://github.com/timesler/facenet-pytorch, accessed on 28 February 2022), removing the final fully connected classification layer from the FaceNet model so that it outputs face embeddings. The constructed pipeline is sketched below.
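As a minimal sketch (not the authors' exact code), the following uses facenet-pytorch's `MTCNN` and `InceptionResnetV1` classes to build the detect-then-embed pipeline; the image paths and the verification threshold are illustrative assumptions, not values from this paper.

```python
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

# Face detector (MTCNN) and FaceNet backbone pretrained on VGGFace2.
# With classify=False (the default), the network returns 512-d embeddings
# rather than class logits, i.e., the final classification layer is bypassed.
mtcnn = MTCNN(image_size=160, margin=0)
embedder = InceptionResnetV1(pretrained='vggface2').eval()

def embed(path: str) -> torch.Tensor:
    """Detect and crop the face, then map it to a unit-norm embedding."""
    face = mtcnn(Image.open(path).convert('RGB'))  # (3, 160, 160) tensor or None
    if face is None:
        raise ValueError(f"No face detected in {path}")
    with torch.no_grad():
        return embedder(face.unsqueeze(0)).squeeze(0)  # (512,) embedding

# Face verification: declare "same identity" if the embeddings are close enough.
# The paths and the threshold of 1.0 are hypothetical placeholders.
e1, e2 = embed("face_a.jpg"), embed("face_b.jpg")
distance = (e1 - e2).norm().item()  # L2 distance between unit-norm embeddings
print(f"L2 distance = {distance:.3f}, same identity: {distance < 1.0}")
```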


As detailed in [12], FaceNet is trained using a triplet loss; the pretrained model we use was trained on the VGGFace2 dataset [30], which comprises faces that are 74.2% White, 15.8% Black, 6.0% Asian, and 4.0% Indian, and that are 59.3% male and 40.7% female [30].
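For reference, the triplet loss of [12] pulls an anchor face $x_i^a$ toward a positive example $x_i^p$ of the same identity and pushes it away from a negative example $x_i^n$ of a different identity by at least a margin $\alpha$ in embedding space:

$$
\mathcal{L} = \sum_{i=1}^{N} \Big[\, \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \,\Big]_+,
$$

where $f(\cdot)$ denotes the embedding function and $[\cdot]_+ = \max(\cdot, 0)$.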
