A Survey on Bias in Deep NLP
Abstract
1. Introduction
2. Defining Bias
2.1. The Bias Problem in Machine Learning
2.2. A Reflection on Bias in Language Models
2.3. Definition of Bias at Semantic Level
2.4. Definition of Bias in Language Modeling
3. Overview on Bias Related Research
- Year. This column is the publication year, in ascending order, and serves as a timeline of research progress. It also highlights the increasing interest of the research community over time. We can see that it was not until two years after [26] that the community began to actively work on bias in word embedding models.
- Reference points to the publication.
- Domain(s) shows the category into which the studied bias falls. The most represented category is gender bias, usually reflecting the differential treatment of men and women. The second most represented is ethnicity bias, under which we group bias against race, ethnicity, nationality or language. We also found work on bias related to age, religion, sexual orientation and disability. It is worth mentioning that there is also some work on political bias.
- Model refers to the neural network model studied in the paper. When the bias is studied not in a model but in an application, we refer to that application. Bias is studied not only in open systems but also in black-box applications like Google Translate. It is interesting how some studies manage to discover and measure bias in those systems. Although they cannot mitigate the bias directly, some of them manage to reduce it without access to the model by strategically modifying the input.
- Data serves as a summary of the data used. We consider almost all the resources involved in each study: from the data used to train the models, to the corpora on which the models are applied, to any other datasets that help contextualize the technique used.
- Language column mainly shows that most of the work has been done on English datasets and models. Approaches that deal with bias in other languages usually take English as a reference point, translating the data or test sets from English into other languages with both automated tools and paid professionals. Another approach looks for analogies between different languages.
- Evaluation column shows the reader which technique was used to evaluate or measure the bias:
  - Sentiment of association. A common way to find biased terms is to measure the sentiment of sentences that differ in just one word; the differing words belong to the two classes being compared. A term is considered biased if its sentence carries a strongly negative sentiment compared with the complementary one. This is also tested with text generation tasks, where a given sentence start, changing only a word of each class, is used to produce a full sentence or text (a minimal sketch of this test follows this list).
  - Analogies. The use of analogies has proven useful to show the bias with simple examples. The word embedding space is well suited to this type of technique, as analogies can be studied from a geometric perspective (see the analogy probe sketched after this list).
  - Representation. The works that fall in this category compare the likelihood between two classes of the protected property. Some studies consider equal representation as the goal, but usually the likelihood of the classes is compared with real-world data. For example, the distribution of men and women in the United States for an occupation is compared with the probability of a sentence being completed with an attribute of each gender. In this way, the representation in the model output can be compared with the demographic percentage.
  - Accuracy. It is common to find studies that measure accuracy in tasks like classification or prediction to find out how biased the model is. This is similar to the general approach in machine learning with fairness measures.
- Mitigation shows how the bias is removed or attenuated from the data or the model.
  - Vector space manipulation evolves from the work of Bolukbasi et al. [26], which proposes finding the vector representation of gender in order to compensate for its deviation and equalize some terms with respect to the neutral gender. This technique is known as word embedding debiasing or hard debiasing. The proposal has since been explored with substantial improvements to better capture the bias while trying to avoid harming the model (the core projection is sketched after this list).
  - Data augmentation increases the source corpus/data to counter the bias, for example by adding examples that restore balance with respect to an attribute, thus making the data represent that attribute in a less biased way (see the gender-swapping sketch after this list).
  - Data manipulation makes changes to the data to help the model capture a less biased reality, for example removing named entities so that the model cannot learn differences associated with them.
  - Attribute protection tries to prevent an attribute from carrying bias. For this purpose, different techniques are used to manipulate the data, the model or the training in order to avoid capturing information about that attribute. For example, if you remove proper names from the phrases in a dataset and then train a model, the model will not be able to associate proper names with other features such as jobs; if you train a sentiment model on phrases without proper nouns, names will have no sentiment associated with them. Attribute protection can appear within the other techniques or as a combination of them, for example eliminating proper names so that they do not capture gender information, duplicating all gendered sentences with the opposite gender (data manipulation plus data augmentation) and finally training the model and manipulating it to remove the gender subspace (vector space manipulation).
- Stage column stands for mitigation stage, and indicates when the mitigation/bias correction work was done.
  - Before. Mostly altering or augmenting the source data to avoid bias or to balance the data that will be used in the model training.
  - During/Train. Changing the training process or fine-tuning the model, for example by using a custom loss function.
  - After. Usually changing the model vector space after the learning stage.
- Task. This column outlines the field or scope in which the authors work. Since the appearance of [26], an important part of the studies tries to solve the novel problems of both “debiasing” and “bias evaluation”. Since both tasks are already reported in other columns of the table, they do not appear in this column.
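To make the “Sentiment of association” test above concrete, the following is a minimal sketch in Python. It assumes the vaderSentiment package (the VADER scorer cited as [75]); the templates and word pairs are purely illustrative and not taken from any of the surveyed benchmarks, and any sentence-level sentiment scorer, including the system under evaluation itself, could be plugged in instead.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Sentence templates and word pairs (class A vs. class B) to swap in.
# Both lists are illustrative assumptions, not part of the surveyed datasets.
templates = [
    "This {} person was stopped by the police.",
    "Everyone agreed that the {} neighbour was trustworthy.",
]
word_pairs = [("young", "old"), ("European", "African")]

for template in templates:
    for a, b in word_pairs:
        score_a = analyzer.polarity_scores(template.format(a))["compound"]
        score_b = analyzer.polarity_scores(template.format(b))["compound"]
        # A large sentiment gap for the same context, differing in a single
        # word, suggests that the term carries a biased association.
        print(f"{a!r} vs {b!r} in {template!r}: gap = {score_a - score_b:+.3f}")
```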
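The analogy-based evaluation can be reproduced in a few lines. The sketch below assumes the gensim library and its pretrained-vector downloader; the vector set name is an assumption, and any word2vec or GloVe KeyedVectors object can be used instead.

```python
import gensim.downloader as api

# Download a set of pretrained GloVe vectors (assumed name from gensim-data).
vectors = api.load("glove-wiki-gigaword-100")

# Complete analogies of the form he : <occupation> :: she : ? by vector
# arithmetic (occupation - he + she) and inspect the nearest neighbours.
for occupation in ["doctor", "programmer", "nurse"]:
    completions = vectors.most_similar(positive=[occupation, "she"],
                                       negative=["he"], topn=3)
    print(occupation, "->", completions)
```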
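As for vector space manipulation, the core of hard debiasing [26] is a projection: estimate a gender direction from definitional pairs and remove its component from gender-neutral vectors. The numpy sketch below shows only that projection step on toy random vectors; full implementations derive the direction with PCA over several pairs and add neutralize/equalize stages.

```python
import numpy as np

def debias(vectors, definitional_pairs, neutral_words):
    # Gender direction: average of the normalized differences of the
    # definitional pairs, e.g. ("he", "she"), ("man", "woman").
    diffs = [vectors[a] - vectors[b] for a, b in definitional_pairs]
    direction = np.mean([d / np.linalg.norm(d) for d in diffs], axis=0)
    direction /= np.linalg.norm(direction)

    debiased = dict(vectors)
    for word in neutral_words:
        v = vectors[word]
        # Remove the component of v that lies along the gender direction.
        debiased[word] = v - np.dot(v, direction) * direction
    return debiased, direction

# Toy demo with random vectors; in practice these come from word2vec/GloVe.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=50) for w in ["he", "she", "man", "woman", "doctor", "nurse"]}
clean, g = debias(toy, [("he", "she"), ("man", "woman")], ["doctor", "nurse"])
print(np.dot(toy["doctor"], g), "->", np.dot(clean["doctor"], g))  # second value is ~0
```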
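Finally, data augmentation by gender swapping (in the spirit of CDA) can be illustrated with a minimal sketch: every sentence containing gendered terms is duplicated with those terms replaced by their counterparts. The swap list below is a tiny illustrative subset; real pipelines use much larger lexicons and handle names, coreference and ambiguous forms (e.g., “her” as possessive vs. object) explicitly.

```python
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man", "father": "mother", "mother": "father"}

def gender_swap(sentence: str) -> str:
    # Token-level swap that preserves simple capitalization and punctuation.
    out = []
    for tok in sentence.split():
        core = tok.strip(".,!?")
        swap = SWAPS.get(core.lower())
        if swap is None:
            out.append(tok)
        else:
            swap = swap.capitalize() if core[0].isupper() else swap
            out.append(tok.replace(core, swap))
    return " ".join(out)

corpus = ["He is a doctor.", "Her father is a nurse."]
augmented = corpus + [gender_swap(s) for s in corpus]
print(augmented)  # original sentences plus their gender-swapped counterparts
```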
4. Discussion
4.1. Association Tests
4.2. Translation
4.3. Coreference Resolution
4.4. GPT-3 and Black-Box Models
4.5. Vector Space
4.6. Deep Learning Versus Traditional Machine Learning Algorithms
- Deep models are very data-hungry: large datasets must be fed into the learning process, and gigabytes (even terabytes) of text are consumed during training. Most of the models are pre-trained with a language modeling approach (masking, sentence sequences) over corpora generated from sources available on the Internet, like movie subtitles, Wikipedia articles, news, tweets and so on. As these texts come from open communities with specific cultural profiles, mostly from the Western world, biased expressions are naturally present in such collections of texts.
- As the demand for larger corpora to feed larger models grows, bias digs its footprint deeper. Some of the most powerful architectures, like GPT-3, are examples of this situation: bigger models, bigger bias (and more stereotyped patterns found).
- Intensive research is being carried out to compress models so that they fit into environments with limited memory, latency and energy capabilities. Quantization, pruning and teacher–student approaches are enabling deep learning models to operate in more restricted infrastructures at a negligible cost in terms of performance [88]. It has been found that, contrary to what might be expected, bias is emphasized by these reduction methods [89].
4.7. Complementary Works
5. A General Methodology for Dealing with Bias in Deep NLP
- Define the stereotyped knowledge. This implies identifying one or more protected properties and all the related stereotyped properties. For each protected property, you have to develop its own ontology.
- With the previous model at hand, you can take on the task of identifying protected expressions and stereotyped expressions, so that your stereotyped language is defined. This is equivalent to populating your ontology (i.e., your stereotyped knowledge). There are some corpora available, like the ones mentioned in this work, but you may need to define your own expressions in order to capture all the potential biases that may harm your system. In any case, it is here where different resources can be explored to obtain a set of expressions as rich as possible.
- The next step is to evaluate your model. Choose a distance metric and compute the overall differences in the probabilities of sequences containing stereotyped expressions, with protected expressions as priors (a minimal sketch follows this list). Detail the benchmark evaluation framework used.
- Analyze the results of the evaluation to identify which expressions or categories of expressions result in higher bias.
- Design a corrective mechanism. You have to decide which strategy fits better with your problem and with your available resources: data augmentation, a constraint in the learning process, model parameters correction, etc.
- Re-evaluate your model and loop over these last three steps until an acceptable response is reached, or throw out your model if its behavior is not what is desired and rethink the whole process (network architecture, pre-training approach, fine-tuning, etc.).
- Report the result of this procedure by attaching model cards or a similar documentation formalism in order to achieve transparent model reporting.
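A minimal sketch of the evaluation step above, assuming the Hugging Face transformers library with GPT-2 as the model under inspection: each sequence pairing a protected expression with a stereotyped expression is scored by its average token log-likelihood, and large per-class gaps point at biased associations. The templates and word lists are illustrative; in practice the probabilities would be aggregated over a full benchmark rather than a couple of sentences.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == inputs, the returned loss is the mean token-level
        # cross-entropy, so its negative is the average log-likelihood.
        loss = model(ids, labels=ids).loss
    return -loss.item()

protected = ["The man", "The woman"]                            # protected expressions (priors)
stereotyped = ["worked as a nurse.", "worked as an engineer."]  # stereotyped expressions

for s in stereotyped:
    scores = {p: avg_log_likelihood(f"{p} {s}") for p in protected}
    gap = scores["The man"] - scores["The woman"]
    print(f"{s:28s} gap (man - woman) = {gap:+.3f}")
```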
6. Conclusions and Challenges
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Abbreviations
AAVE | African American Vernacular English |
AI | Artificial Intelligence |
BAT | Bias Analogy Test |
BERT | Bidirectional Encoder Representations from Transformers |
BLS | Bureau of Labor Statistics |
CAT | Context Association Test |
CDA | Counterfactual Data Augmentation |
CDS | Counterfactual Data Substitution |
CEAT | Contextualized Embedding Association Test |
CLAT | Cross-lingual Analogy Task |
ECT | Embedding Coherence Test |
EQT | Embedding Quality Test |
GPT | Generative Pre-Training Transformer |
IAT | Implicit Association Test |
LM | Language Model |
LSTM | Long Short-Term Memory |
NER | Named Entity Recognition |
NLP | Natural Language Processing |
PCA | Principal Component Analysis |
POS | Part of Speech |
SAE | Standard American English |
SEAT | Sentence Encoder Association Test |
SIRT | Sentence Inference Retention Test |
UBE | Universal Bias Encoder |
USE | Universal Sentence Encoder |
WEAT | Word Embedding Association Test |
WEFAT | Word Embedding Factual Association Test |
XWEAT | Multilingual and Cross-Lingual WEAT |
References
- Howard, A.; Borenstein, J. Trust and Bias in Robots: These elements of artificial intelligence present ethical challenges, which scientists are trying to solve. Am. Sci. 2019, 107, 86–90. [Google Scholar] [CrossRef] [Green Version]
- Rodger, J.A.; Pendharkar, P.C. A field study of the impact of gender and user’s technical experience on the performance of voice-activated medical tracking application. Int. J. Hum. Comput. Stud. 2004, 60, 529–544. [Google Scholar] [CrossRef]
- Bullinaria, J.A.; Levy, J.P. Extracting semantic representations from word co-occurrence statistics: A computational study. Behav. Res. Methods 2007, 39, 510–526. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Stubbs, M. Text and Corpus Analysis: Computer-Assisted Studies of Language and Culture; Blackwell: Oxford, UK, 1996. [Google Scholar]
- Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A Survey on Bias and Fairness in Machine Learning. arXiv 2019, arXiv:1908.09635. [Google Scholar]
- Obermeyer, Z.; Powers, B.; Vogeli, C.; Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019, 366, 447–453. [Google Scholar] [CrossRef] [Green Version]
- Kendall, A.; Gal, Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? arXiv 2017, arXiv:1703.04977. [Google Scholar]
- Tolan, S.; Miron, M.; Gómez, E.; Castillo, C. Why machine learning may lead to unfairness: Evidence from risk assessment for juvenile justice in catalonia. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, Montreal, QC, Canada, 17–21 June 2019; pp. 83–92. [Google Scholar]
- Xu, J.; Ju, D.; Li, M.; Boureau, Y.L.; Weston, J.; Dinan, E. Recipes for Safety in Open-domain Chatbots. arXiv 2020, arXiv:2010.07079. [Google Scholar]
- Caliskan, A.; Bryson, J.; Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 2017, 356, 183–186. [Google Scholar] [CrossRef] [Green Version]
- Greenwald, A.G.; McGhee, D.E.; Schwartz, J.L. Measuring individual differences in implicit cognition: The implicit association test. J. Personal. Soc. Psychol. 1998, 74, 1464. [Google Scholar] [CrossRef]
- Mitchell, M.; Wu, S.; Zaldivar, A.; Barnes, P.; Vasserman, L.; Hutchinson, B.; Spitzer, E.; Raji, I.D.; Gebru, T. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, New York, NY, USA, 23–24 February 2018. [Google Scholar]
- Bender, E.M.; Friedman, B. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Trans. Assoc. Comput. Linguist. 2018, 6, 587–604. [Google Scholar] [CrossRef] [Green Version]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Abid, A.; Farooqi, M.; Zou, J. Persistent Anti-Muslim Bias in Large Language Models. arXiv 2021, arXiv:2101.05783. [Google Scholar]
- Kahneman, D.; Tversky, A. On the psychology of prediction. Psychol. Rev. 1973, 80, 237. [Google Scholar] [CrossRef] [Green Version]
- Gigerenzer, G. Bounded and rational. In Philosophie: Grundlagen und Anwendungen/Philosophy: Foundations and Applications; Mentis: Paderborn, Germany, 2008; pp. 233–257. [Google Scholar]
- Haselton, M.G.; Nettle, D.; Murray, D.R. The evolution of cognitive bias. In The Handbook of Evolutionary Psychology; John Wiley & Sons: Hoboken, NJ, USA, 2015; pp. 1–20. [Google Scholar]
- Schneider, D.J. The Psychology of Stereotyping; Guilford Press: New York, NY, USA, 2005. [Google Scholar]
- Gajane, P.; Pechenizkiy, M. On Formalizing Fairness in Prediction with Machine Learning. arXiv 2017, arXiv:1710.03184. [Google Scholar]
- Verma, S.; Rubin, J. Fairness definitions explained. In Proceedings of the 2018 IEEE/ACM International Workshop on Software Fairness (Fairware), Gothenburg, Sweden, 29 May 2018; pp. 1–7. [Google Scholar]
- Qiu, X.; Sun, T.; Xu, Y.; Shao, Y.; Dai, N.; Huang, X. Pre-trained models for natural language processing: A survey. Sci. China Technol. Sci. 2020, 63, 1872–1897. [Google Scholar] [CrossRef]
- Baader, F.; Horrocks, I.; Sattler, U. Description logics. In Handbook on Ontologies; Springer: Berlin/Heidelberg, Germany, 2004; pp. 3–28. [Google Scholar]
- Antoniou, G.; Van Harmelen, F. Web ontology language: Owl. In Handbook on Ontologies; Springer: Berlin/Heidelberg, Germany, 2004; pp. 67–92. [Google Scholar]
- Bolukbasi, T.; Chang, K.W.; Zou, J.Y.; Saligrama, V.; Kalai, A. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. arXiv 2016, arXiv:1607.06520. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Lauscher, A.; Glavaš, G. Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), Minneapolis, MN, USA, 6–7 June 2019; pp. 85–91. [Google Scholar] [CrossRef]
- Lauscher, A.; Glavaš, G.; Ponzetto, S.P.; Vulić, I. A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces. arXiv 2020, arXiv:1909.06092. [Google Scholar]
- Lauscher, A.; Takieddin, R.; Ponzetto, S.P.; Glavaš, G. AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 12 December 2020; pp. 192–199. [Google Scholar]
- Gonen, H.; Goldberg, Y. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv 2019, arXiv:1903.03862. [Google Scholar]
- Zhao, J.; Zhou, Y.; Li, Z.; Wang, W.; Chang, K.W. Learning Gender-Neutral Word Embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4847–4853. [Google Scholar] [CrossRef] [Green Version]
- Jentzsch, S.; Schramowski, P.; Rothkopf, C.; Kersting, K. Semantics Derived Automatically from Language Corpora Contain Human-like Moral Choices. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES’19), New York, NY, USA, 27–28 January 2019; pp. 37–44. [Google Scholar] [CrossRef]
- Swinger, N.; De-Arteaga, M.; Heffernan, N.T., IV; Leiserson, M.D.; Kalai, A.T. What Are the Biases in My Word Embedding? In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 27–28 January 2019; pp. 305–311. [Google Scholar] [CrossRef] [Green Version]
- Dev, S.; Phillips, J. Attenuating Bias in Word vectors. In Proceedings of the The 22nd International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 16–18 April 2019; Volume 89, pp. 879–887. [Google Scholar]
- Zhao, J.; Wang, T.; Yatskar, M.; Ordonez, V.; Chang, K. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. arXiv 2018, arXiv:1804.06876. [Google Scholar]
- Manzini, T.; Yao Chong, L.; Black, A.W.; Tsvetkov, Y. Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 615–621. [Google Scholar] [CrossRef]
- Zhou, P.; Shi, W.; Zhao, J.; Huang, K.H.; Chen, M.; Chang, K.W. Analyzing and Mitigating Gender Bias in Languages with Grammatical Gender and Bilingual Word Embeddings; ACL: Montréal, QC, Canada, 2019. [Google Scholar]
- Conneau, A.; Lample, G.; Ranzato, M.; Denoyer, L.; Jégou, H. Word Translation Without Parallel Data. arXiv 2018, arXiv:1710.04087. [Google Scholar]
- Escudé Font, J.; Costa-jussà, M.R. Equalizing Gender Bias in Neural Machine Translation with Word Embeddings Techniques. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, Florence, Italy, 1–2 August 2019; pp. 147–154. [Google Scholar] [CrossRef]
- Ziemski, M.; Junczys-Dowmunt, M.; Pouliquen, B. The United Nations Parallel Corpus v1.0. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23–28 May 2016; pp. 3530–3534. [Google Scholar]
- Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. 2005. Available online: https://www.statmt.org/europarl/ (accessed on 1 October 2019).
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318. [Google Scholar] [CrossRef] [Green Version]
- Vig, J. A Multiscale Visualization of Attention in the Transformer Model. arXiv 2019, arXiv:1906.05714. [Google Scholar]
- Dev, S.; Li, T.; Phillips, J.M.; Srikumar, V. OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings. arXiv 2020, arXiv:2007.00049. [Google Scholar]
- Bhardwaj, R.; Majumder, N.; Poria, S. Investigating Gender Bias in Bert. arXiv 2020, arXiv:2009.05021. [Google Scholar]
- Chaloner, K.; Maldonado, A. Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, Florence, Italy, 1–2 August 2019; pp. 25–32. [Google Scholar] [CrossRef]
- Webster, K.; Recasens, M.; Axelrod, V.; Baldridge, J. Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns. Trans. Assoc. Comput. Linguist. 2018, 6, 605–617. [Google Scholar] [CrossRef] [Green Version]
- May, C.; Wang, A.; Bordia, S.; Bowman, S.R.; Rudinger, R. On Measuring Social Biases in Sentence Encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 622–628. [Google Scholar] [CrossRef]
- Tan, Y.; Celis, L. Assessing Social and Intersectional Biases in Contextualized Word Representations. arXiv 2019, arXiv:1911.01485. [Google Scholar]
- Hall Maudslay, R.; Gonen, H.; Cotterell, R.; Teufel, S. It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5267–5275. [Google Scholar] [CrossRef]
- Nadeem, M.; Bethke, A.; Reddy, S. StereoSet: Measuring stereotypical bias in pretrained language models. arXiv 2020, arXiv:2004.09456. [Google Scholar]
- Guo, W.; Çalişkan, A. Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases. arXiv 2020, arXiv:2006.03955. [Google Scholar]
- Díaz Martínez, C.; Díaz García, P.; Navarro Sustaeta, P. Hidden Gender Bias in Big Data as Revealed by Neural Networks: Man is to Woman as Work is to Mother? Rev. Esp. Investig. Sociol. 2020, 172, 41–60. [Google Scholar]
- Leavy, S.; Meaney, G.; Wade, K.; Greene, D. Mitigating Gender Bias in Machine Learning Data Sets. In Bias and Social Aspects in Search and Recommendation; Boratto, L., Faralli, S., Marras, M., Stilo, G., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 12–26. [Google Scholar]
- Bartl, M.; Nissim, M.; Gatt, A. Unmasking Contextual Stereotypes: Measuring and Mitigating BERT’s Gender Bias. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, Florence, Italy, 12–13 December 2020. [Google Scholar]
- Rudinger, R.; Naradowsky, J.; Leonard, B.; Van Durme, B. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 2, pp. 8–14. [Google Scholar] [CrossRef]
- Clark, K.; Manning, C. Deep Reinforcement Learning for Mention-Ranking Coreference Models. arXiv 2016, arXiv:1609.08667. [Google Scholar]
- Lu, K.; Mardziel, P.; Wu, F.; Amancharla, P.; Datta, A. Gender Bias in Neural Natural Language Processing. arXiv 2018, arXiv:1807.11714. [Google Scholar]
- Lee, K.; He, L.; Lewis, M.; Zettlemoyer, L. End-to-end Neural Coreference Resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 188–197. [Google Scholar] [CrossRef]
- Clark, K.; Manning, C.D. Deep Reinforcement Learning for Mention-Ranking Coreference Models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 2256–2262. [Google Scholar] [CrossRef] [Green Version]
- Zhao, J.; Wang, T.; Yatskar, M.; Cotterell, R.; Ordonez, V.; Chang, K.W. Gender Bias in Contextualized Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 629–634. [Google Scholar] [CrossRef] [Green Version]
- McGuffie, K.; Newhouse, A. The Radicalization Risks of GPT-3 and Advanced Neural Language Models. arXiv 2020, arXiv:2009.06807. [Google Scholar]
- Floridi, L.; Chiriatti, M. GPT-3: Its Nature, Scope, Limits, and Consequences. Minds Mach. 2020, 30, 681–694. [Google Scholar] [CrossRef]
- Farkas, A.; Németh, R. How to Measure Gender Bias in Machine Translation: Optimal Translators, Multiple Reference Points. arXiv 2020, arXiv:2011.06445. [Google Scholar]
- Badjatiya, P.; Gupta, M.; Varma, V. Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 49–59. [Google Scholar] [CrossRef] [Green Version]
- De-Arteaga, M.; Romanov, A.; Wallach, H.; Chayes, J.; Borgs, C.; Chouldechova, A.; Geyik, S.; Kenthapadi, K.; Kalai, A.T. Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 120–128. [Google Scholar] [CrossRef] [Green Version]
- Heindorf, S.; Scholten, Y.; Engels, G.; Potthast, M. Debiasing Vandalism Detection Models at Wikidata. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 670–680. [Google Scholar] [CrossRef]
- Zuckerman, M.; Last, M. Using Graphs for Word Embedding with Enhanced Semantic Relations. In Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13), Hong Kong, China, 3–7 November 2019; pp. 32–41. [Google Scholar] [CrossRef] [Green Version]
- Peng, X.; Li, S.; Frazier, S.; Riedl, M. Reducing Non-Normative Text Generation from Language Models. In Proceedings of the 13th International Conference on Natural Language Generation, Dublin, Ireland, 7–10 September 2020; pp. 374–383. [Google Scholar]
- Kiritchenko, S.; Mohammad, S. Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, New Orleans, LA, USA, 5–6 June 2018; pp. 43–53. [Google Scholar] [CrossRef] [Green Version]
- Prates, M.O.; Avelar, P.H.; Lamb, L.C. Assessing gender bias in machine translation: A case study with google translate. Neural Comput. Appl. 2019, 32, 1–19. [Google Scholar] [CrossRef] [Green Version]
- Sheng, E.; Chang, K.W.; Natarajan, P.; Peng, N. The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3407–3412. [Google Scholar] [CrossRef]
- Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. In Proceedings of the International AAAI Conference on Web and Social Media, Oxford, UK, 26–29 May 2015. [Google Scholar]
- Stanovsky, G.; Smith, N.A.; Zettlemoyer, L. Evaluating Gender Bias in Machine Translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1679–1684. [Google Scholar] [CrossRef]
- Ott, M.; Edunov, S.; Grangier, D.; Auli, M. Scaling Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, 31 October–1 November 2018; pp. 1–9. [Google Scholar] [CrossRef]
- Basta, C.; Costa-jussà, M.R.; Casas, N. Evaluating the Underlying Gender Bias in Contextualized Word Embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, Florence, Italy, 1–2 August 2019; pp. 33–39. [Google Scholar] [CrossRef]
- Groenwold, S.; Ou, L.; Parekh, A.; Honnavalli, S.; Levy, S.; Mirza, D.; Wang, W.Y. Investigating African-American Vernacular English in Transformer-Based Text Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), online, 16–20 November 2020; pp. 5877–5883. [Google Scholar] [CrossRef]
- Blodgett, S.L.; Green, L.; O’Connor, B. Demographic Dialectal Variation in Social Media: A Case Study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1119–1130. [Google Scholar] [CrossRef]
- Babaeianjelodar, M.; Lorenz, S.; Gordon, J.; Matthews, J.N.; Freitag, E. Quantifying Gender Bias in Different Corpora. In Companion Proceedings of the Web Conference 2020; ACM: New York, NY, USA, 2020. [Google Scholar]
- Iandola, F.; Shaw, A.; Krishna, R.; Keutzer, K. SqueezeBERT: What can computer vision teach NLP about efficient neural networks? In Proceedings of the SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, online, 11 November 2020; pp. 124–135. [Google Scholar] [CrossRef]
- Hutchinson, B.; Prabhakaran, V.; Denton, E.; Webster, K.; Zhong, Y.; Denuyl, S. Social Biases in NLP Models as Barriers for Persons with Disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, online, 5–10 July 2020; pp. 5491–5501. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
- Davis, J. Gender Bias In Machine Translation. 2020. Available online: https://towardsdatascience.com/gender-bias-in-machine-translation-819ddce2c452 (accessed on 1 January 2021).
- Johnson, M. Providing Gender-Specific Translations in Google Translate. Google AI Blog. 2018. Available online: https://ai.googleblog.com/2018/12/providing-gender-specific-translations.html (accessed on 1 August 2020).
- Johnson, M. A Scalable Approach to Reducing Gender Bias in Google Translate. Google AI Blog. 2018. Available online: https://ai.googleblog.com/2020/04/a-scalable-approach-to-reducing-gender.html (accessed on 1 September 2020).
- Gupta, M.; Agrawal, P. Compression of Deep Learning Models for Text: A Survey. arXiv 2020, arXiv:2008.05221. [Google Scholar]
- Hooker, S.; Moorosi, N.; Clark, G.; Bengio, S.; Denton, E. Characterising Bias in Compressed Models. arXiv 2020, arXiv:2010.03058. [Google Scholar]
Protected Property | Stereotyped Properties | Protected Terms | Stereotyped Terms |
---|---|---|---|
has-gender | {has-job} | {girl, women, Christine, man} | {doctor, nurse} |
has-age | {responsibility, efficiency} | {t: t < 25}, {t: t > 25} | {responsible, efficient, irresponsible, inefficient} |
has-religion | {confidence, crime-committed} | {Muslim, Christian, atheist} | {terrorist, dangerous, robbery, homicide} |
has-profession | {social-skills, intelligence, aspect} | {IT specialist, physicist, cleaning lady, politician, athlete, lawyer, CEO, teacher} | {empathetic, friendly, kind, confident, handsome, strong, intelligent, attractive, powerful, influencer} |
Year | Ref. | Stereotype(s) | Model | Data | Lang. | Evaluation | Mitigation | Stage | Task |
---|---|---|---|---|---|---|---|---|---|
2016 | [26] | Gender | Word2Vec, GloVe | GoogleNews corpus (w2vNEWS), Common Crawl | English | Analogies/Cosine Similarity | Vector Space Manipulation | After | - |
2018 | [37] | Gender | GloVe [27] | OntoNotes 5.0, WinoBias, Occupation Data (BLS), B&L | English | Prediction Accuracy | Data Augmentation (Gender Swapping), Vector Space Manipulation | After | Coreference Resolution |
2018 | [33] | Gender | GloVe [27], GN-GloVe, Hard-GloVe | 2017 English Wikipedia dump, SemBias (3) | English | Prediction Accuracy, Analogies (3) | Attribute Protection, Vector Space Manipulation(1), Hard-Debias (2) | Train (1), After (2) | Coreference resolution |
2019 | [38] | Ethnicity, Gender, Religion | Word2Vec | Reddit L2 corpus | English | PCA, WEAT, MAC, Clustering | Vector Space Manipulation | After | POS tagging, POS chunking, NER |
2019 | [39] | Gender | Spanish fastText | Spanish Wikipedia, bilingual embeddings (MUSE) [40] | English, Spanish | CLAT, WEAT | Vector Space Manipulation | After | - |
2019 | [41] | Gender | Transformer, GloVe, Hard-Debiased GloVe, GN-GloVe | United Nations [42], Europarl [43], newstest2012, newstest2013, Occupation data (BLS) | English, Spanish | BLEU [44] | Vector Space Manipulation (Hard-Debias) | Train, After | Translation |
2020 | [30] | General | CBOW, GloVe, FastText, DebiasNet | - | German, Spanish, Italian, Russian, Croatian, Turkish, English | WEAT, XWEAT, ECT, BAT, Clustering (KMeans) (BIAS ANALOGY TEST) | Vector Space Manipulation, DEBIE | After | - |
2019 | [45] | Gender | BERT (base, uncased), GPT-2 (small) | - | English | Visualization, Text Generation likelihood | - | - | - |
2020 | [46] | Gender | RoBERTa/GloVe (1) | Common Crawl (1) | English | WEAT*, SIRT | Vector Space Manipulation, OSCaR | Train | - |
2020 | [47] | Gender | BERT | Equity Evaluation Corpus, Gen-data | English | EEC, Gender Separability. Emotion/Sentiment Scoring | Vector Space Manipulation | Train | - |
Year | Ref. | Stereotype(s) | Model | Data | Lang. | Evaluation | Mitigation | Stage | Task |
---|---|---|---|---|---|---|---|---|---|
2017 | [10] | Gender, Ethnicity | GloVe, Word2Vec | Common Crawl, Google News Corpus, Occupation Data (BLS) | English | Association Tests (WEAT, WEFAT) | - | - | - |
2019 | [32] | Gender | HARD-DEBIASED [26], GN-GloVe [33] | Google News, English Wikipedia | English | WEAT, Clustering | - | - | - |
2019 | [34] | Gender, Crime, Moral | Skip-Gram | Google’s News | English | WEAT | - | - | Question answering, Decision making |
2019 | [35] | - | Word2Vec (1), fastText (2), GloVe (3) | Google News (1), Web data (2,3), First Names (SSA) | English | WEAT | - | - | Unsupervised Bias Enumeration |
2019 | [36] | Gender, Age, Ethnicity | GloVe | Wikipedia Dump, WSim-353, SimLex-999, Google Analogy Dataset | English | WEAT, EQT, ECT | Vector Space Manipulation | - | - |
2019 | [38] | Ethnicity, Gender, Religion | Word2Vec | Reddit L2 corpus | English | PCA, WEAT, MAC, Clustering | Vector Space Manipulation | After | POS tagging, POS chunking, NER |
2019 | [29] | Gender | CBOW (1), GloVe (1,2), FastText (1), Dict2Vec(1) | English Wikipedia (1), Common Crawl (2), Wikipedia (2), Tweets (2) | English, German, Spanish, Italian, Russian, Croatian, Turkish | WEAT, XWEAT | - | - | - |
2019 | [39] | Gender | Spanish fastText | Spanish Wikipedia, bilingual embeddings (MUSE)[40] | English, Spanish | CLAT, WEAT | Vector Space Manipulation | After | - |
2019 | [48] | Gender | Skip-Gram (1,2,3), FastText (4) | Google News (1), PubMed (2), Twitter (3), GAP-Wikipedia (4) [49] | English | WEAT, Clustering (K-Means++) | - | - | - |
2019 | [50] | Gender, Ethnicity | BERT(large, cased), CBoW-GloVe (Web corpus version), InferSent, GenSen, USE, ELMo, GPT | - | English | SEAT | - | - | - |
2019 | [51] | Gender, Race | BERT( base cased, large cased), GPT-2 (117M, 345M), ELMo, GPT | - | English | Contextual SEAT | - | - | - |
2019 | [52] | Gender | CBOW | English Gigaword, Wikipedia, Google Analogy, SimLex-999 | English | Analogies, WEAT, Sentiment Classification, Clustering | Hard-Debiasing, CDA, CDS | Train | - |
2020 | [30] | General | CBOW, GloVe, FastText, DebiasNet | - | German, Spanish, Italian, Russian, Croatian, Turkish, English | WEAT, XWEAT, ECT, BAT, Clustering(KMeans) (BIAS ANALOGY TEST) | Vector Space Manipulation, DEBIE | After | - |
2020 | [31] | Gender, Ethnicity | AraVec CBOW (1), CBOW (2), AraVec Skip-Gram (3) and FASTTEXT (4), FastText (5) | translated WEAT test set, Leipzig news (2), Wikipedia (1,3,5), Twitter (1,3,4), CommonCrawl (5) | Modern Arabic, Egyptian Arabic | WEAT, XWEAT, AraWEAT, ECT, BAT | - | - | - |
Year | Ref. | Stereotype(s) | Model | Data | Lang. | Evaluation | Mitigation | Stage | Task |
---|---|---|---|---|---|---|---|---|---|
2020 | [46] | Gender | RoBERTa/GloVe (1) | Common Crawl (1) | English | WEAT*, SIRT | Vector Space Manipulation, OSCaR | Train | - |
2020 | [53] | Gender, Profession, Race, Religion | BERT, GPT-2, RoBERTa, XLNet | StereoSet | English | CAT Context Association Test | - | - | Language Modeling |
2020 | [54] | Intersectional Bias (Gender, Ethnicity) | GloVe, ElMo, GPT, GPT-2, BERT | CommonCrawl, Billion Word Benchmark, BookCorpus, English Wikipedia dumps, WebText, Bert-small-cased? | English | WEAT, CEAT | - | - | - |
2020 | [55] | Gender | Word2Vec | Wikipedia-es 2006 | Spanish | Analogies | - | - | - |
2020 | [56] | Gender | CBOW | British Library Digital corpus, The Guardian articles | English | Association, Prediction likelihood, Sentiment Analysis | - | - | - |
2020 | [57] | Gender | BERT | GAP, BEC-Pro, Occupation Data (BLS) | English, German | Association Test (like WEAT) | Fine-tuning, CDS | Train | - |
2018 | [58] | Gender | Deep Coref. [59] | WinoGender, Occupation Data (BLS), B&L | English | Prediction Accuracy | - | - | Coreference Resolution |
2018 | [33] | Gender | GloVe [27], GN-GloVe, Hard-GloVe | 2017 English Wikipedia dump, SemBias (3) | English | Prediction Accuracy, Analogies (3) | Attribute Protection, Vector Space Manipulation (1), Hard-Debias (2) | Train (1), After (2) | Coreference resolution |
2018 | [60] | Gender | e2e-coref [61], deep-coref [62] | CoNLL-2012, Wikitext-2 | English | Coreference score (1), likelihood (2) | Data Augmentation (CDA), WED [26], | Before, Train, After | Coreference Resolution (1), Language Modeling (2) |
2019 | [63] | Gender | ELMo, GloVe | One Billion Word Benchmark, WinoBias, OntoNotes 5.0 | English | PCA, Prediction Accuracy | Data Augmentation (1), Attribute Protection (gender swapping averaging) (2) | Train (1), After (2) | Coreference Resolution |
2020 | [14] | Gender, Race, Religion | GPT-3 | Common Crawl, WebText2, Books1, Books2, Wikipedia | English | Text generation | - | - | - |
2020 | [64] | Ideological, Political, Race | GPT-3 | Common Crawl, WebText2, Books1, Books2, Wikipedia | English | QA, Text Generation | - | - | - |
2020 | [65] | Race | GPT-3 | Common Crawl, WebText2, Books1, Books2, Wikipedia | English | Text Generation | - | - | Question Answering |
2020 | [66] | Gender | Google Translate | United Nations [42], Europarl [43], Google Translate Community | English, Hungarian | Prediction accuracy | - | - | Translation |
2021 | [16] | Ethnicity | GPT-3 | Common Crawl, WebText2, Books1, Books2, Wikipedia, Humans of New York images | English | Analogies, associations, Text Generation | Positive Contextualization | After | - |
Year | Ref. | Stereotype(s) | Model | Data | Lang. | Evaluation | Mitigation | Stage | Task |
---|---|---|---|---|---|---|---|---|---|
2018 | [37] | Gender | GloVe [27] | OntoNotes 5.0, WinoBias, Occupation Data (BLS), B&L | English | Prediction Accuracy | Data Augmentation (Gender Swapping), Vector Space Manipulation | After | Coreference Resolution |
2018 | [33] | Gender | GloVe [27], GN-GloVe, Hard-GloVe | 2017 English Wikipedia dump, SemBias (3) | English | Prediction Accuracy, Analogies (3) | Attribute Protection, Vector Space Manipulation (1), Hard-Debias (2) | Train (1), After (2) | Coreference resolution |
2019 | [67] | Gender, Ethnicity, Disability, Sexual Orientation | Google Perspective API | WikiDetox, Wiki Madlibs, Twitter, WordNet | English | Classification Accuracy, likelihood | Data correction, Data Augmentation, Attribute Protection | Before | Hate Speech Detection |
2019 | [68] | Gender | fastText, BoW, DRNN with Custom Dataset | Common Crawl, Occupation Data (BLS) | English | Prediction Accuracy | Attribute protection (Removing Gender and NE) | Before | Hiring |
2019 | [69] | Account Age, user features | Graph Embeddings [70] | WikiData | English | Accuracy | Attribute Protection (Remove user information) | Train | Vandalism Detection |
2019 | [38] | Ethnicity, Gender, Religion | Word2Vec | Reddit L2 corpus | English | PCA, WEAT, MAC, Clustering | Vector Space Manipulation | After | POS tagging, POS chunking, NER |
2019 | [63] | Gender | ELMo, GloVe | One Billion Word Benchmark, WinoBias, OntoNotes 5.0 | English | PCA, Prediction Accuracy | Data Augmentation (1), Attribute Protection(gender swapping averaging) (2) | Train (1), After (2) | Coreference Resolution |
2019 | [52] | Gender | CBOW | English Gigaword, Wikipedia, Google Analogy, SimLex-999 | English | Analogies, WEAT, Sentiment Classification, Clustering | Hard-Debiasing, CDA, CDS | Train | - |
2020 | [71] | Ethnicity | GPT-2 | Science fiction story corpus, Plotto, ROCstories, toxic and sentiment datasets | English | Classification Accuracy | Loss function modification | Fine tuning | Normative text classification |
Year | Ref. | Stereotype(s) | Model | Data | Lang. | Evaluation | Mitigation | Stage | Task | |
---|---|---|---|---|---|---|---|---|---|---|
2018 | [72] | Gender, Ethnicity | - | EEC, Tweets (SemEval-2018) | English | Sentiment, Emotion of Association | - | - | Sentiment Scoring | |
2019 | [63] | Gender | ELMo, GloVe | One Billion Word Benchmark, WinoBias, OntoNotes 5.0 | English | PCA, Prediction Accuracy | Data Augmentation (1), Attribute Protection(gender swapping averaging) (2) | Train (1), After (2) | Coreference Resolution | |
2019 | [67] | Gender, Ethnicity, Disability, Sexual Orientation | Google Perspective API | WikiDetox, Wiki Madlibs, Twitter, WordNet | English | Classification Accuracy, likelihood | Data correction, Data Augmentation, Attribute Protection | Before | Hate Speech Detection | |
2019 | [73] | Gender | Google Translate API (1) | United Nations and European Parliament transcripts (1), Translate Community (1), Occupation Data (BLS), COCA | Malay, Estonian, Finnish, Hungarian, Armenian, Bengali, English, Persian, Nepali, Japanese, Korean, Turkish, Yoruba, Swahili, Basque, Chinese | Prediction accuracy | - | - | Translation | |
2019 | [74] | Race, Gender, Sexual Orientation | LSTM, BERT, GPT-2 (small), GoogleLM1b (4) | One Billion Word Benchmark(4) | English | Sentiment Score (VADER [75]), Classification accuracy | Train LSTM/BERT | Train | Text Generation | |
2019 | [76] | Gender | Google Translate, Microsoft Translator, Amazon Translate, SYSTRAN, Model of [77] | - | English, French, Italian, Russian, Ukrainian, Hebrew, Arabic, German | WinoMT (WinoBias + WinoGender), Prediction Accuracy | Positive Contextualization | After | Translation | |
2019 | [78] | Gender | ELMo | English-German news WTM18 | English | cosine similarity, clustering, KNN | - | - | - | |
2019 | [52] | Gender | CBOW | English Gigaword, Wikipedia, Google Analogy, SimLex-999 | English | Analogies, WEAT, Sentiment Classification, Clustering | Hard-Debiasing, CDA, CDS | Train | - | |
2020 | [47] | Gender | BERT | Equity Evaluation Corpus, Gen-data | English | EEC, Gender Separability. Emotion/Sentiment Scoring | Vector Space Manipulation | Train | - | |
2020 | [79] | Ethnicity | GPT-2 (small), DISTILBERT | TwitterAAE [80], Amazon Mechanical Turk annotators (SAE) | English (AAVE/SAE) | Text generation, BLEU, ROUGE, Sentiment Classification, VADER [75] | - | - | - | |
2020 | [56] | Gender | CBOW | British Library Digital corpus, The Guardian articles | English | Association, Prediction likelihood, Sentiment Analysis | - | - | - | |
2020 | [81] | Gender, Race, Religion, Disability | BERT(1) | Wikipedia(1), Book corpus(1), Jigsaw identity toxic dataset, RtGender, GLUE | English | Cosine Similarity, Accuracy, GLUE | Fine tuning | Fine tuning | Decision Making | |
2020 | [82] | Gender, Race | SqueezeBERT | Wikipedia, BooksCorpus | English | GLUE | - | - | - | |
2020 | [83] | Disability | BERT, Google Cloud sentiment model | Jigsaw Unintended Bias | English | Sentiment Score | - | - | Toxicity prediction, Sentiment analysis. |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).