TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks

Rajabi, Amirarsalan; Garibay, Ozlem Ozmen

doi:10.3390/make4020022


by Amirarsalan Rajabi ¹ and Ozlem Ozmen Garibay ¹,²,*

¹ Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA

² Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL 32816, USA

* Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2022, 4(2), 488-501; https://doi.org/10.3390/make4020022

Submission received: 12 April 2022 / Revised: 7 May 2022 / Accepted: 13 May 2022 / Published: 16 May 2022

(This article belongs to the Section Data)


Abstract

With the increasing reliance on automated decision making, algorithmic fairness has become increasingly important. In this paper, we propose a Generative Adversarial Network for tabular data generation. The model is trained in two phases. In the first phase, it is trained to generate synthetic data that closely resembles the reference dataset. In the second phase, we modify the value function to add a fairness constraint and continue training the network so that the generated data is both accurate and fair. We evaluate the model in both settings: unconstrained data generation and constrained (fair) data generation. We show that, with a fairly simple architecture and a quantile transformation of the numerical attributes, the model achieves promising performance. In the unconstrained case, i.e., when the model is trained only in the first phase and is only meant to generate accurate data following the joint probability distribution of the real data, the results show that the model outperforms the state-of-the-art GANs proposed in the literature for producing synthetic tabular data. In the constrained case, in which the first phase of training is followed by the second, we train and test the network on four datasets studied in the fairness literature, compare it with another state-of-the-art pre-processing method, and report promising results. Compared to other studies that use GANs for fair data generation, our model is more stable because it uses only one critic and avoids major problems of the original GAN model, such as mode-dropping and non-convergence.

Keywords: fairness in artificial intelligence; generative adversarial networks; fair data generation
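
The abstract notes that a quantile transformation is applied to the numerical attributes before training. As a rough illustration of the idea only, and not the authors' implementation, the sketch below maps each value of a numeric column to its empirical quantile in [0, 1] and maps quantiles back to reference values. The mid-rank CDF, the nearest-rank inverse, the function names `fit_quantile_transform` and `inverse_quantile`, and the `ages` column are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def fit_quantile_transform(values):
    """Return a function mapping a number to its empirical quantile in [0, 1]."""
    ref = sorted(values)
    n = len(ref)
    if n == 0:
        raise ValueError("need at least one reference value")

    def transform(x):
        # Mid-rank empirical CDF: average of the counts of reference
        # values strictly below x and less than or equal to x.
        lo = bisect_left(ref, x)
        hi = bisect_right(ref, x)
        return (lo + hi) / (2 * n)

    return transform


def inverse_quantile(values, q):
    """Map a quantile q in [0, 1] back to a reference value (nearest rank)."""
    ref = sorted(values)
    idx = min(int(q * len(ref)), len(ref) - 1)
    return ref[idx]


# Hypothetical skewed numeric column (illustrative data, not from the paper).
ages = [22, 25, 25, 31, 40, 58, 90]
to_q = fit_quantile_transform(ages)
quantiles = [to_q(a) for a in ages]

# Generated quantiles would be mapped back to the original scale like this.
recovered = [inverse_quantile(ages, q) for q in quantiles]
```

The transformed column is spread over [0, 1] regardless of how skewed the raw values are, which is the usual motivation for this kind of preprocessing; a library routine such as scikit-learn's `QuantileTransformer` could be used instead of this hand-rolled version.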
