Development of a Database of Health Insurance Claims: Standardization of Disease Classifications and Anonymous Record Linkage
Open Access
- 1 January 2010
- journal article
- Published by Japan Epidemiological Association in Journal of Epidemiology
- Vol. 20 (5), 413-419
- https://doi.org/10.2188/jea.je20090066
Abstract
Background: Health insurance claims (ie, receipts) record patient health care treatments and expenses and, although created for the health care payment system, are potentially useful for research. Combining different types of receipts generated for the same patient would dramatically increase the utility of these receipts. However, technical problems, including standardization of disease names and classifications, and anonymous linkage of individual receipts, must be addressed.

Methods: In collaboration with health insurance societies, all information from receipts (inpatient, outpatient, and pharmacy) was collected. To standardize disease names and classifications, we developed a computer-aided post-entry standardization method using a disease name dictionary based on International Classification of Diseases (ICD)-10 classifications. We also developed an anonymous linkage system by using an encryption code generated from a combination of hash values and stream ciphers. Using different sets of the original data (data set 1: insurance certificate number, name, and sex; data set 2: insurance certificate number, date of birth, and relationship status), we compared the percentage of successful record matches obtained by using data set 1 to generate key codes with the percentage obtained when both data sets were used.

Results: The dictionary’s automatic conversion of disease names successfully standardized 98.1% of approximately 2 million new receipts entered into the database. The percentage of anonymous matches was higher for the combined data sets (98.0%) than for data set 1 (88.5%).

Conclusions: The use of standardized disease classifications and anonymous record linkage substantially contributed to the construction of a large, chronologically organized database of receipts. This database is expected to aid in epidemiologic and health services research using receipt information.
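The anonymous linkage step described in the Methods can be illustrated with a minimal Python sketch. The abstract does not name the hash function, the stream cipher, or any field normalization, so everything below is an assumption chosen for illustration: HMAC-SHA-256 stands in for the hash step, a SHA-256 counter-mode keystream XOR stands in for the stream cipher, and `HASH_KEY`, `CIPHER_KEY`, `normalize`, `keystream`, and `linkage_key` are hypothetical names. The sample records are made up.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secrets (assumption): held only by the party generating key codes.
HASH_KEY = b"example-hash-key"
CIPHER_KEY = b"example-cipher-key"


def normalize(value: str) -> str:
    """Normalize a field so trivial formatting differences do not change the key."""
    return unicodedata.normalize("NFKC", value).strip().upper()


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes by hashing key || counter.

    This is a toy stand-in for a stream cipher, not the cipher used in the paper.
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def linkage_key(*fields: str) -> str:
    """Hash the identifying fields, then XOR the digest with the keystream."""
    # Join with a separator that cannot appear in ordinary field values.
    message = "\x1f".join(normalize(f) for f in fields).encode("utf-8")
    digest = hmac.new(HASH_KEY, message, hashlib.sha256).digest()
    stream = keystream(CIPHER_KEY, len(digest))
    return bytes(d ^ s for d, s in zip(digest, stream)).hex()


# Made-up records in the shape of data set 1: certificate number, name, sex.
a = linkage_key("12345678", "YAMADA TARO", "M")
b = linkage_key(" 12345678", "yamada taro", "m")   # same person, different formatting
c = linkage_key("12345678", "YAMADA HANAKO", "F")  # different person, same certificate

print(a == b)  # records link without exposing the identifying fields
print(a == c)
```

Because the key code is deterministic, receipts for the same person produce the same code and can be joined without storing the name or certificate number. Any field that is recorded inconsistently breaks the match, which may be one reason a key built from a single set of fields links fewer records than a scheme that draws on both data sets.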