Synthesizing Individual Consumers' Credit Historical Data Using Generative Adversarial Networks
Open Access
- 26 January 2021
- journal article
- research article
- Published by MDPI AG in Applied Sciences
- Vol. 11 (3), 1126
- https://doi.org/10.3390/app11031126
Abstract
Every day, the financial sector accumulates a massive amount of consumer data containing the most sensitive information. Access to these data is strictly limited outside financial institutions, and sometimes even within the same organization, for reasons such as privacy laws or asset management policies. Financial data have never been more valuable, especially when assessed jointly with data from other sectors, including healthcare, insurance, credit bureaus, and research institutions. It is therefore critical to generate synthetic datasets that retain the statistical or latent properties of the real datasets while guaranteeing privacy protection. In this paper, we apply Generative Adversarial Networks (GANs) to generate synthetic consumer credit data for educational use, specifically for developing machine learning models. A GAN is preferable to other pseudonymization methods such as masking, swapping, shuffling, or perturbation because it does not suffer from the addition of more attributes or data. This study is significant because it is the first attempt to generate synthetic data from real-world credit data in practical use. The results show that synthetic consumer credit data generated with a GAN offer substantial utility without severely compromising privacy and would be a useful resource for big data training programs.
Funding Information
- Institute for Information and Communications Technology Promotion (2017-0-00302)