Author Contributions
Conceptualization, S.L., S.O.; Data curation, S.L.; Formal analysis, S.L., S.O.; Funding acquisition, E.P.; Investigation, S.L., S.O.; Methodology, S.L., S.O.; Project administration, S.L., S.O.; Resources, E.P.; Software, S.L.; Supervision, S.O.; Validation, S.L., S.O.; Visualization, S.L., S.O.; Writing—original draft, S.L., S.O., and M.K.; Writing—review & editing, S.L., S.O. All authors have read and agreed to the published version of the manuscript.
Figure 1.
One example set of images for measuring race bias, where the targets are face images of European Americans and Asian Americans and the attributes are Career and Family. The images labeled with , , , and depict a target in the context of an attribute.
Figure 2.
Race classification probability between AF and EU as a function of the extent of race transformation. The x-axis indicates the level of race transformation, and the y-axis indicates the predicted probability, where 0 corresponds to EU and 1 to AF.
Table 1.
Statistics of the dataset used in our paper. To measure racial bias, the targets are EU (European American), AF (African American), and AS (Asian American), and the attributes are Career/Family, Pleasant/Unpleasant, Likable/Unlikable, and Competent/Incompetent. For the gender bias test, the targets are Male (M) and Female (F), with the same attributes as the racial bias test. For the age bias test, the targets are Young and Old, again with the same attributes. To measure gendered racism, specifically the most common stereotype of Asian American Females (ASF) as incompetent, we selected images of each racial group for a single gender (i.e., European American Female (EUF), African American Female (AFF), and ASF) and a single attribute pair (Competent/Incompetent).
| | EU | AF | AS | M | F | Young | Old | EUF | AFF | ASF |
|---|---|---|---|---|---|---|---|---|---|---|
| Target | 3434 | 3434 | 3434 | 5244 | 5058 | 851 | 851 | 1515 | 1684 | 1859 |
| Attribute: Career/Family | 237 | 239 | 280 | 236 | 230 | 264 | 250 | - | - | - |
| Attribute: Pleasant/Unpleasant | 541 | 579 | 681 | 546 | 541 | 713 | 537 | - | - | - |
| Attribute: Likable/Unlikable | 123 | 110 | 153 | 111 | 112 | 160 | 160 | - | - | - |
| Attribute: Competent/Incompetent | 177 | 155 | 189 | 158 | 148 | 200 | 197 | 92 | 82 | 92 |
Table 2.
Results of FEAT on the race tests, which measure biases toward races. Each cell reports the effect size, whose magnitude indicates a small (0.2), medium (0.5), or large (0.8) bias. Results with p < 0.001 are significant and are marked with *. The targets are European American (EU), African American (AF), and Asian American (AS); the attributes are Career/Family, Pleasant/Unpleasant, Likable/Unlikable, and Competent/Incompetent.
| Attribute | Targets | DeepFace | DeepID | VGGFace | FaceNet | OpenFace | ArcFace |
|---|---|---|---|---|---|---|---|
| Career/Family | EU/AF | 0.095 * | 0.078 * | 0.294 * | 0.569 * | 0.148 * | −0.000 |
| Career/Family | EU/AS | −0.006 | −0.209 | −0.476 | −0.097 | 0.372 * | 0.078 * |
| Pleasant/Unpleasant | EU/AF | 0.507 * | 0.557 * | 0.939 * | 1.081 * | 0.635 * | 0.277 * |
| Pleasant/Unpleasant | EU/AS | −0.049 | −0.001 | −0.138 | 0.009 | 0.140 * | 0.165 * |
| Likable/Unlikable | EU/AF | 0.134 * | 0.647 * | 0.021 | 1.084 * | 0.287 * | 0.517 * |
| Likable/Unlikable | EU/AS | −0.032 | −0.112 | −0.829 | −0.121 | 0.111 * | −0.524 |
| Competent/Incompetent | EU/AF | −0.038 | −0.520 | −1.215 | 0.704 * | −0.575 | −0.200 |
| Competent/Incompetent | EU/AS | 0.012 | 0.075 * | 0.223 * | −0.123 | −0.334 | 0.186 * |
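The captions report an effect size and a p-value for every target/attribute pair without defining them in this section. The Python sketch below shows one way to compute both, assuming FEAT mirrors the Word Embedding Association Test (WEAT) formulation applied to face embeddings. The function names, the use of cosine similarity, the sample standard deviation (`ddof=1`), and the one-sided Monte Carlo permutation test on the difference of means (chosen so that unequal target sizes, such as Male and Female in Table 1, are handled) are all our assumptions, not details taken from the paper.

```python
import numpy as np


def cosine_matrix(U: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T


def association(W: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """s(w, A, B) for every row w of W: mean cosine to A minus mean cosine to B."""
    return cosine_matrix(W, A).mean(axis=1) - cosine_matrix(W, B).mean(axis=1)


def effect_size(X, Y, A, B) -> float:
    """WEAT-style effect size (assumed FEAT analogue) for targets X, Y and attributes A, B."""
    s_x, s_y = association(X, A, B), association(Y, A, B)
    # Normalize by the std of s over all target embeddings X ∪ Y (ddof=1 is an assumption).
    return float((s_x.mean() - s_y.mean()) / np.concatenate([s_x, s_y]).std(ddof=1))


def permutation_p_value(X, Y, A, B, n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided Monte Carlo permutation test on the difference of mean associations."""
    rng = np.random.default_rng(seed)
    s = np.concatenate([association(X, A, B), association(Y, A, B)])
    n_x = len(X)
    observed = s[:n_x].mean() - s[n_x:].mean()
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(s)  # random re-partition of X ∪ Y into sizes |X|, |Y|
        if shuffled[:n_x].mean() - shuffled[n_x:].mean() >= observed:
            exceed += 1  # permuted statistic at least as extreme as the observed one
    # The +1 correction keeps the Monte Carlo estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Synthetic demo: X leans toward attribute set A and Y toward B along one axis.
rng = np.random.default_rng(42)
dim = 16
axis = np.eye(dim)[0]
A = rng.normal(size=(20, dim)) + 3 * axis
B = rng.normal(size=(20, dim)) - 3 * axis
X = rng.normal(size=(50, dim)) + 2 * axis
Y = rng.normal(size=(50, dim)) - 2 * axis

d = effect_size(X, Y, A, B)
p = permutation_p_value(X, Y, A, B)
print(f"effect size d = {d:.3f}, p = {p:.4f}")
```

With real data, X and Y would be the embeddings of the two target groups (e.g., EU and AF face images) and A and B the embeddings of the two halves of an attribute pair, produced by one of the models in the table. Note that resolving the p < 0.001 threshold used in the captions requires at least about 1000 permutations, since the smallest attainable estimate is 1/(n_perm + 1).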
Table 3.
Results of FEAT on the gender stereotype tests, which measure biases toward gender. Each cell reports the effect size, whose magnitude indicates a small (0.2), medium (0.5), or large (0.8) bias. Results with p < 0.001 are significant and are marked with *. The targets are Male and Female; the attributes are Career/Family, Pleasant/Unpleasant, Likable/Unlikable, and Competent/Incompetent.
| Attribute | Targets | DeepFace | DeepID | VGGFace | FaceNet | OpenFace | ArcFace |
|---|---|---|---|---|---|---|---|
| Career/Family | Male/Female | 0.002 | −0.412 | −0.197 | −0.106 | 0.445 * | 0.111 * |
| Pleasant/Unpleasant | Male/Female | 0.001 | −0.194 | −0.089 | −0.042 | 0.020 | 0.452 * |
| Likable/Unlikable | Male/Female | 0.002 | −0.053 | −0.030 | 0.237 * | 0.053 | −0.243 |
| Competent/Incompetent | Male/Female | −0.001 | −0.036 | 0.205 * | −0.343 | 0.212 * | 0.035 |
Table 4.
Results of FEAT on the age stereotype tests, which measure biases toward age. Each cell reports the effect size, whose magnitude indicates a small (0.2), medium (0.5), or large (0.8) bias. Results with p < 0.001 are significant and are marked with *. The targets are Young and Old; the attributes are Career/Family, Pleasant/Unpleasant, Likable/Unlikable, and Competent/Incompetent.
| Attribute | Targets | DeepFace | DeepID | VGGFace | FaceNet | OpenFace | ArcFace |
|---|---|---|---|---|---|---|---|
| Career/Family | Young/Old | −0.055 | −0.376 | 0.344 * | −0.166 | 0.993 | −0.416 |
| Pleasant/Unpleasant | Young/Old | 0.062 | −0.036 | 1.406 * | 0.137 | 0.551 * | −0.260 |
| Likable/Unlikable | Young/Old | 0.066 | 0.290 * | 1.222 * | 0.000 | 0.431 * | 0.509 * |
| Competent/Incompetent | Young/Old | −0.021 | −0.001 | 1.046 * | 0.031 | 0.225 * | −0.477 |
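The captions interpret effect sizes with the thresholds small (0.2), medium (0.5), and large (0.8). A minimal sketch of that bucketing follows, applied to the VGGFace column of Table 4. It assumes the thresholds apply to the absolute value, with the sign giving the direction of the association, and it uses a hypothetical "negligible" label for values below 0.2, which the captions do not name.

```python
def magnitude(d: float) -> str:
    """Bucket an effect size by |d| using the caption thresholds 0.2 / 0.5 / 0.8.

    The sign is treated as the direction of the association and ignored here;
    the 'negligible' label for |d| < 0.2 is an illustrative assumption.
    """
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# VGGFace column of Table 4 (targets Young/Old), significance markers dropped.
vggface_age = {
    "Career/Family": 0.344,
    "Pleasant/Unpleasant": 1.406,
    "Likable/Unlikable": 1.222,
    "Competent/Incompetent": 1.046,
}

for attribute, d in vggface_age.items():
    print(f"{attribute}: {d:+.3f} -> {magnitude(d)}")
```

The magnitude label is independent of significance: for example, the OpenFace Career/Family cell in Table 4 (0.993) would be labeled large but carries no significance marker.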
Table 5.
Results of FEAT on intersectional bias, measuring stereotypes toward Asian American females. Each cell reports the effect size, whose magnitude indicates a small (0.2), medium (0.5), or large (0.8) bias. Results with p < 0.001 are significant and are marked with *. The targets are European American Female (EUF), African American Female (AFF), and Asian American Female (ASF). All target pairs are tested with a single attribute pair, Competent/Incompetent.
| Attribute | Targets | DeepFace | DeepID | VGGFace | FaceNet | OpenFace | ArcFace |
|---|---|---|---|---|---|---|---|
| Competent/Incompetent | EUF/AFF | −0.017 | 0.465 * | −1.007 | 0.748 * | −0.095 | 0.358 * |
| Competent/Incompetent | EUF/ASF | 0.007 | −0.172 | 0.029 | 0.165 * | −0.237 | 0.354 * |
| Competent/Incompetent | AFF/ASF | 0.072 | 0.018 | 1.424 * | 0.451 * | 0.453 * | −0.367 |
Table 6.
Results of the race sensitivity analysis with FEAT, in which the racial features of each image are transformed to varying extents (race transformation). Each cell reports the effect size, whose magnitude indicates a small (0.2), medium (0.5), or large (0.8) bias. Results with p < 0.001 are significant and are marked with *. The targets are EU and AF; the attributes are Career/Family, Pleasant/Unpleasant, Likable/Unlikable, and Competent/Incompetent.
| Race Transformation | Attribute | DeepFace | DeepID | VGGFace | FaceNet | OpenFace | ArcFace |
|---|---|---|---|---|---|---|---|
| 25% | Career/Family | 0.598 * | 0.470 * | 0.354 * | 0.419 * | 0.657 * | 0.523 * |
| 25% | Pleasant/Unpleasant | 0.438 * | 0.314 * | 1.723 * | 0.720 * | 0.267 * | 0.901 * |
| 25% | Likable/Unlikable | 0.796 * | 0.202 * | 1.414 * | 0.607 * | 0.756 * | 0.077 |
| 25% | Competent/Incompetent | 0.957 * | 0.717 * | 1.420 * | 0.645 * | 1.306 * | 0.657 * |
| 50% | Career/Family | −0.007 | −0.560 | −0.689 | −0.770 | −0.281 | −0.443 |
| 50% | Pleasant/Unpleasant | −0.029 | −0.409 | 1.591 * | −0.754 | −0.510 | 0.201 * |
| 50% | Likable/Unlikable | 0.008 | −0.961 | 0.834 * | −0.729 | −0.378 | −0.951 |
| 50% | Competent/Incompetent | −0.095 | −0.624 | 0.817 * | −0.716 | 0.308 * | −0.501 |
| 75% | Career/Family | −0.768 | −1.226 | −1.362 | −1.467 | −1.134 | −1.089 |
| 75% | Pleasant/Unpleasant | −0.653 | −0.888 | 1.324 * | −1.547 | −1.188 | −0.475 |
| 75% | Likable/Unlikable | −1.018 | −1.515 | −0.387 | −1.490 | −1.318 | −1.375 |
| 75% | Competent/Incompetent | −1.170 | −1.439 | −0.549 | −1.509 | −1.036 | −1.278 |
| 100% | Career/Family | −1.112 | −1.538 | −1.586 | −1.725 | −1.490 | −1.382 |
| 100% | Pleasant/Unpleasant | −0.999 | −1.200 | 0.761 * | −1.785 | −1.493 | −0.884 |
| 100% | Likable/Unlikable | −1.448 | −1.733 | −1.102 | −1.745 | −1.619 | −1.593 |
| 100% | Competent/Incompetent | −1.536 | −1.697 | −1.046 | −1.755 | −1.493 | −1.628 |
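To make the trend in Table 6 easy to check, the sketch below transcribes its effect sizes (significance markers dropped) and lists every model/attribute combination whose effect size fails to decrease strictly as the transformation level rises from 25% to 100%. The data layout and the function name `not_strictly_decreasing` are illustrative choices, not part of the paper. For the values as transcribed, the list is empty: every effect size falls at each step.

```python
MODELS = ["DeepFace", "DeepID", "VGGFace", "FaceNet", "OpenFace", "ArcFace"]

# Effect sizes from Table 6, keyed by transformation level (%) and attribute;
# each list follows the MODELS column order. Significance markers are omitted.
TABLE6 = {
    25: {"Career/Family": [0.598, 0.470, 0.354, 0.419, 0.657, 0.523],
         "Pleasant/Unpleasant": [0.438, 0.314, 1.723, 0.720, 0.267, 0.901],
         "Likable/Unlikable": [0.796, 0.202, 1.414, 0.607, 0.756, 0.077],
         "Competent/Incompetent": [0.957, 0.717, 1.420, 0.645, 1.306, 0.657]},
    50: {"Career/Family": [-0.007, -0.560, -0.689, -0.770, -0.281, -0.443],
         "Pleasant/Unpleasant": [-0.029, -0.409, 1.591, -0.754, -0.510, 0.201],
         "Likable/Unlikable": [0.008, -0.961, 0.834, -0.729, -0.378, -0.951],
         "Competent/Incompetent": [-0.095, -0.624, 0.817, -0.716, 0.308, -0.501]},
    75: {"Career/Family": [-0.768, -1.226, -1.362, -1.467, -1.134, -1.089],
         "Pleasant/Unpleasant": [-0.653, -0.888, 1.324, -1.547, -1.188, -0.475],
         "Likable/Unlikable": [-1.018, -1.515, -0.387, -1.490, -1.318, -1.375],
         "Competent/Incompetent": [-1.170, -1.439, -0.549, -1.509, -1.036, -1.278]},
    100: {"Career/Family": [-1.112, -1.538, -1.586, -1.725, -1.490, -1.382],
          "Pleasant/Unpleasant": [-0.999, -1.200, 0.761, -1.785, -1.493, -0.884],
          "Likable/Unlikable": [-1.448, -1.733, -1.102, -1.745, -1.619, -1.593],
          "Competent/Incompetent": [-1.536, -1.697, -1.046, -1.755, -1.493, -1.628]},
}


def not_strictly_decreasing(table: dict) -> list[tuple[str, str]]:
    """Return (model, attribute) pairs whose effect size does not fall at every level step."""
    levels = sorted(table)
    cases = []
    for attribute in table[levels[0]]:
        for i, model in enumerate(MODELS):
            series = [table[level][attribute][i] for level in levels]
            if any(later >= earlier for earlier, later in zip(series, series[1:])):
                cases.append((model, attribute))
    return cases


print(not_strictly_decreasing(TABLE6))  # prints []
```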