Development of a Database of Health Insurance Claims: Standardization of Disease Classifications and Anonymous Record Linkage

Open Access

1 January 2010

journal article
Published by Japan Epidemiological Association in Journal of Epidemiology

Vol. 20 (5), 413-419
https://doi.org/10.2188/jea.je20090066

Abstract

Background: Health insurance claims (ie, receipts) record patient health care treatments and expenses and, although created for the health care payment system, are potentially useful for research. Combining different types of receipts generated for the same patient would dramatically increase the utility of these receipts. However, technical problems, including standardization of disease names and classifications, and anonymous linkage of individual receipts, must be addressed.

Methods: In collaboration with health insurance societies, all information from receipts (inpatient, outpatient, and pharmacy) was collected. To standardize disease names and classifications, we developed a computer-aided post-entry standardization method using a disease name dictionary based on International Classification of Diseases (ICD)-10 classifications. We also developed an anonymous linkage system by using an encryption code generated from a combination of hash values and stream ciphers. Using different sets of the original data (data set 1: insurance certificate number, name, and sex; data set 2: insurance certificate number, date of birth, and relationship status), we compared the percentage of successful record matches obtained by using data set 1 to generate key codes with the percentage obtained when both data sets were used.

Results: The dictionary’s automatic conversion of disease names successfully standardized 98.1% of approximately 2 million new receipts entered into the database. The percentage of anonymous matches was higher for the combined data sets (98.0%) than for data set 1 (88.5%).

Conclusions: The use of standardized disease classifications and anonymous record linkage substantially contributed to the construction of a large, chronologically organized database of receipts. This database is expected to aid in epidemiologic and health services research using receipt information.
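The anonymous linkage idea in the Methods can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not specify how hash values and stream ciphers were combined, so a keyed hash (HMAC-SHA-256) stands in for that step here. The field names, the secret key, the normalization rule, and the rule that two receipts are linked when either key code agrees are all assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret; in practice it would be held by a trusted party only.
SECRET_KEY = b"demo-only-secret"


def normalize(value: str) -> str:
    """Unify Unicode width and case and drop whitespace (assumed rule)."""
    return "".join(unicodedata.normalize("NFKC", value).split()).upper()


def key_code(fields, secret: bytes = SECRET_KEY) -> str:
    """Keyed hash of the joined fields; a stand-in for hash + stream cipher."""
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def anonymize(record: dict) -> dict:
    """Replace identifying fields with two key codes and keep the claim."""
    return {
        # Data set 1: certificate number, name, sex
        "key1": key_code([record["cert_no"], record["name"], record["sex"]]),
        # Data set 2: certificate number, date of birth, relationship status
        "key2": key_code(
            [record["cert_no"], record["birth_date"], record["relationship"]]
        ),
        "claim": record["claim"],
    }


def same_person(a: dict, b: dict, use_both: bool = True) -> bool:
    """Link on key1; optionally fall back to key2 (assumed matching rule)."""
    if a["key1"] == b["key1"]:
        return True
    return use_both and a["key2"] == b["key2"]


# Two made-up receipts for one person whose name is spelled differently.
outpatient = anonymize({
    "cert_no": "12345678", "name": "Yamada Taro", "sex": "M",
    "birth_date": "1970-01-01", "relationship": "insured",
    "claim": "outpatient receipt",
})
pharmacy = anonymize({
    "cert_no": "12345678", "name": "Yamada Tarou", "sex": "M",
    "birth_date": "1970-01-01", "relationship": "insured",
    "claim": "pharmacy receipt",
})

print(same_person(outpatient, pharmacy, use_both=False))
print(same_person(outpatient, pharmacy, use_both=True))
```

In this toy case the name variant breaks the key built from data set 1, while the key built from data set 2 still agrees. That is one plausible reason a second key code could raise the match rate, as the Results report for the combined data sets.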
