**5. Discussion**

In this paper, we introduced an intrinsic approach to measuring gender bias in contextualized embeddings by using gender polarity: an existing bias metric that measures how strongly a word's embedding is associated with a specific gender. This metric has previously been applied to contextualized embeddings by first mapping them to the Word2Vec embedding space. We contribute by first detecting a stable gender direction in T5's embedding space and then computing, for each word, a distribution of gender polarity values over its contextualized embeddings rather than a single value. The results of this approach are consistent with those of an extrinsic approach that we also followed: we evaluated T5's and mT5's outputs in terms of how bias can propagate to the downstream task of semantic text similarity.
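The core computation can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the gender direction is estimated as the normalized average difference over gendered word pairs (e.g. "he"/"she"), and it uses random toy vectors in place of real T5 embeddings.

```python
import numpy as np

def gender_direction(pair_embeddings):
    """Estimate a gender direction as the normalized mean of
    difference vectors over gendered word pairs (male - female)."""
    diffs = np.array([m - f for m, f in pair_embeddings])
    d = diffs.mean(axis=0)
    return d / np.linalg.norm(d)

def gender_polarity(word_embedding, direction):
    """Signed cosine similarity between a word embedding and the
    gender direction: positive = male-leaning, negative = female-leaning."""
    v = word_embedding / np.linalg.norm(word_embedding)
    return float(np.dot(v, direction))

# Toy 4-dimensional vectors standing in for real model embeddings.
rng = np.random.default_rng(0)
he, she = rng.normal(size=4), rng.normal(size=4)
d = gender_direction([(he, she)])

# A contextualized model yields one embedding per occurrence of a word,
# so gender polarity becomes a distribution of values per word,
# rather than the single value obtained from static embeddings.
contexts = [rng.normal(size=4) for _ in range(3)]
polarities = [gender_polarity(c, d) for c in contexts]
```

In practice, the pair list would contain several gendered pairs and the context embeddings would come from encoding sentences containing the target word.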

Our results indicate that higher-status professions tend to be more strongly associated with the male gender than with the female gender. We also compared Swedish with English as well as various model sizes, and found that our methods detect less gender bias in Swedish, though we note that the detection method itself may be more sensitive to bias in English. Additionally, we find that larger model sizes can lead to an increased manifestation of gender bias. This finding suggests that the embedding dimensionality might be proportional to the extent to which biases are successfully encoded in the embedding vectors.

The consistency of the results between the intrinsic and extrinsic approaches is a positive indicator that deriving a stable gender direction in a Transformer model's embedding space is feasible and can lead to valid results. This is a simple, yet powerful idea which, if supported by further research, can offer a solid basis for effective debiasing in Transformer models.

**6. Ethics Statement**

It has been shown that changes in stereotypes and attitudes towards women and their participation in the workforce can be quantified by tracking the temporal dynamics of bias in word embeddings [20]. Furthermore, it has been observed in various use cases that models may marginalize specific groups in the way they handle downstream tasks, exhibiting behavior similar to that of stereotypically biased conduct [21–26]. To responsibly direct actions that will combat this problem, it is of crucial importance that we find reliable ways of detecting and quantifying it, which is what we aim for in this work. A reliable method of bias detection could be the touchstone for developing effective bias mitigation techniques, which could practically contribute to the pursuit of a fairer representation of different races and genders by the models. Such a course of action complies with the fifth and tenth goals, regarding "gender equality" and "reduced inequalities" respectively, as defined in the 17 Sustainable Development Goals (https://sdgs.un.org/goals, accessed on 12 December 2021) set by the United Nations General Assembly and intended to be achieved by the year 2030. More specifically, this work is aligned with sub-goal 10.2, which is about empowering and promoting "the social, economic and political inclusion of all, irrespective of age, sex, disability, race, ethnicity, origin, religion or economic or other status" (https://sdgs.un.org/goals/goal10, accessed on 12 December 2021). This work is also aligned with sub-goal 5.1, which is about ending "all forms of discrimination against women and girls everywhere", and sub-goal 5.5, which ensures "women's full and effective participation and equal opportunities for leadership at all levels of decision-making in political, economic and public life" (https://sdgs.un.org/goals/goal5, accessed on 12 December 2021).

**Author Contributions:** Conceptualization, S.K. and B.R.-G.; methodology, S.K. and B.R.-G.; software, S.K.; validation, S.K., B.R.-G. and J.S.; formal analysis, S.K.; investigation, S.K.; data curation, S.K.; writing—original draft preparation, S.K.; writing—review and editing, S.K., B.R.-G. and J.S.; visualization, S.K., B.R.-G. and J.S.; supervision, B.R.-G.; project administration, S.K. and B.R.-G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by VINNOVA, Sweden (project title: Språkmodeller för svenska myndigheter, grant number 2019-02996) and by the Swedish Research Council under contract 2019-03606.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data that support the findings of this study are openly available at http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark accessed on 12 December 2021, at https://github.com/timpal0l/sts-benchmark-swedish accessed on 12 December 2021 and at https://github.com/Stellakats/Master-thesis-gender-bias accessed on 12 December 2021.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Figure A1.** Average similarity scores per occupation. Model: T5 small.

**Figure A2.** Average similarity scores per occupation. Model: T5 base.

**Figure A3.** Average similarity scores per occupation. Model: T5 large.

**Figure A4.** Average similarity scores per occupation. Language: English. Model: mT5 small.

**Figure A5.** Average similarity scores per occupation. Language: Swedish. Model: mT5 small.

**Figure A6.** Average similarity scores per occupation. Language: English. Model: mT5 base.

**Figure A7.** Average similarity scores per occupation. Language: Swedish. Model: mT5 base.

**Figure A8.** Average similarity scores per occupation. Language: English. Model: mT5 large.

**Figure A9.** Average similarity scores per occupation. Language: Swedish. Model: mT5 large.
