Use of Surnames to Identify Individuals of Chinese Ancestry

Abstract
The objectives of this study were to develop and test surname lists for identifying individuals of Chinese ancestry. The Ontario all-cause mortality database for the period 1982–1989 was randomly split into source and test data sets. For each surname in the source data set, frequencies by birthplace were compiled separately by sex, and the surnames were weighted according to their positive likelihood ratios. Lists of Chinese surnames were then assembled at varying cutoff levels, and screening performance indicators were calculated for each list: sensitivity, specificity, positive and negative predictive values, post-test odds, positive likelihood ratio, and yield. The lists generated from the source data set were then evaluated in the test data set. Results indicated that surnames have good potential to identify individuals of Chinese origin. In the source data set, at a cutoff level of 100 for males (217 surnames) and females (210 surnames), the sensitivity and positive predictive value of the surname lists were above 80% for both sexes, and the positive likelihood ratio was above 600. In the test data set, using the same surname lists, the sensitivity, positive predictive value, and positive likelihood ratio remained high: 73%, 81%, and 603, respectively, for males; and 73%, 84%, and 772, respectively, for females. Various scenarios and their methodological implications are discussed.
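The screening indicators named above can be illustrated with a minimal sketch that assumes the standard epidemiological definitions based on a 2×2 table of surname-list result against birthplace. The function name, the `ScreeningMetrics` container, and the counts in the usage example are hypothetical and are not taken from the study. Yield is omitted because the abstract does not state how it was defined.

```python
import math
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    lr_positive: float
    post_test_odds: float


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> ScreeningMetrics:
    """Standard screening indicators from a 2x2 table (assumed definitions).

    tp: on the surname list and Chinese-born
    fp: on the surname list, not Chinese-born
    fn: not on the list, Chinese-born
    tn: not on the list, not Chinese-born
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # LR+ = sensitivity / (1 - specificity); the false-positive rate is
    # computed directly from counts to avoid rounding in 1 - specificity.
    false_positive_rate = fp / (fp + tn)
    lr_positive = (
        sensitivity / false_positive_rate if false_positive_rate > 0 else math.inf
    )
    # Post-test odds = pre-test odds x LR+.
    pre_test_odds = (tp + fn) / (fp + tn)
    post_test_odds = pre_test_odds * lr_positive
    return ScreeningMetrics(
        sensitivity, specificity, ppv, npv, lr_positive, post_test_odds
    )


# Hypothetical counts, for illustration only (not study data).
example = screening_metrics(tp=80, fp=10, fn=20, tn=9890)
print(example)
```

With these made-up counts, sensitivity is 0.80, the positive predictive value is about 0.89, and the positive likelihood ratio is about 792. Very high likelihood ratios alongside sensitivity and predictive values of about 80% arise when the target group is a small fraction of the population, so that specificity is very close to 1.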