Synthetic Population Generation Without a Sample
- 1 May 2013
- journal article
- Published by the Institute for Operations Research and the Management Sciences (INFORMS) in Transportation Science
- Vol. 47 (2), 266-279
- https://doi.org/10.1287/trsc.1120.0408
Abstract
The advent of microsimulation in the transportation sector has created a need for extensive disaggregated data about the population whose behavior is modeled. Because of the cost of collecting such data and existing privacy regulations, this need is often met by creating a synthetic population from aggregate data. Although several techniques for generating such a population are known, they suffer from a number of limitations. The first is the need for a sample of the population for which fully disaggregated data must be collected, although such samples may not exist or may not be financially feasible. The second limiting assumption is that the aggregate data used must be consistent, a situation that is most unusual because these data often come from different sources and are collected, possibly at different moments, using different protocols. The paper presents a new synthetic population generator in the class of Synthetic Reconstruction methods, designed to remove these limitations. It proceeds in three successive steps: generation of individuals, generation of joint distributions of household types, and generation of households by gathering individuals. The main idea in these steps is to use data at the most disaggregated level possible to define joint distributions, from which individuals and households are randomly drawn. The method also makes explicit use of both continuous and discrete optimization and uses the χ² metric to estimate distances between estimated and generated distributions. The new generator is applied to construct a synthetic population of approximately 10,000,000 individuals and 4,350,000 households located in the 589 municipalities of Belgium. The statistical quality of the generated population is assessed using criteria taken from the literature, and it is shown that the new population generator produces excellent results.
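The abstract states that individuals are randomly drawn from joint distributions and that a χ² metric measures the distance between estimated and generated distributions. The Python sketch below illustrates those two ideas for a single municipality under stated assumptions: the attribute cells, their probabilities, and the names `JOINT`, `draw_individuals`, and `chi2_distance` are hypothetical, and the Pearson form Σ(g − e)²/e is assumed as the χ² metric. The paper's exact definition of the metric, its household-generation step, and its optimization procedures are not reproduced here.

```python
import random
from collections import Counter

# Hypothetical joint distribution of individuals over (age_class, gender)
# cells for one municipality; cells and probabilities are illustrative only.
JOINT = {
    ("0-17", "F"): 0.10,
    ("0-17", "M"): 0.11,
    ("18-64", "F"): 0.31,
    ("18-64", "M"): 0.30,
    ("65+", "F"): 0.10,
    ("65+", "M"): 0.08,
}


def draw_individuals(joint, n, seed=0):
    """Randomly draw n individuals (as attribute tuples) from a joint distribution."""
    rng = random.Random(seed)
    cells = list(joint)
    weights = [joint[c] for c in cells]
    return rng.choices(cells, weights=weights, k=n)


def chi2_distance(estimated, generated):
    """Assumed Pearson-style distance: sum over cells of (g - e)^2 / e.

    `estimated` and `generated` map cells to counts; cells with an
    estimated count of zero are skipped to avoid division by zero.
    """
    total = 0.0
    for cell, e in estimated.items():
        if e > 0:
            g = generated.get(cell, 0)
            total += (g - e) ** 2 / e
    return total


def main():
    n = 10_000
    people = draw_individuals(JOINT, n)
    generated = Counter(people)
    # Expected (estimated) counts are the joint probabilities scaled to n.
    estimated = {cell: p * n for cell, p in JOINT.items()}
    print(len(people), round(chi2_distance(estimated, generated), 2))


if __name__ == "__main__":
    main()
```

Because the draws are random, the generated counts fluctuate around the estimated ones, so the χ² distance is small but generally nonzero; tightening that fit, or reconciling inconsistent aggregate sources, would require further adjustment steps that this sketch does not attempt.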