Censoring Trace-Level Environmental Data: Statistical Analysis Considerations to Limit Bias

Abstract
Trace-level environmental data typically include values near or below detection and quantitation thresholds, in settings where health effects may result from low-concentration exposures to one chemical over time or to multiple chemicals. In a cook stove case study, bias in dibenzo[a,h]anthracene concentration means and standard deviations (SDs) was assessed following censoring at thresholds for five analysis approaches: substituting threshold/2, maximum likelihood estimation, robust regression on order statistics, Kaplan–Meier, and omitting censored observations. Means and SDs for gas chromatography–mass spectrometry-determined concentrations were calculated after censoring at the detection and calibration thresholds, which censored 17% and 55% of the data, respectively. In this case study, threshold/2 substitution was the least biased approach. Measurement values were subsequently simulated from two log-normal distributions at two sample sizes. Means and SDs were calculated for 30%, 50%, and 80% censoring levels and compared with their counterparts from the known distributions. Simulation results showed that (1) threshold/2 substitution was inferior to modern after-censoring statistical approaches and (2) all after-censoring approaches were inferior to including all measurement data in the analysis. Additionally, differences in stove-specific group means were tested for uncensored samples and after censoring. The results of tests for differences in group means varied depending on censoring and distributional decisions. Investigators should guard against censoring-related bias from (explicit or implicit) distributional and analysis approach decisions.
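The kind of comparison summarized above can be illustrated with a minimal sketch. It is not the study's code or data. The log-normal parameters, sample size, seed, 50% censoring level, and the function name `censored_summaries` are all assumptions chosen for illustration, and the regression-on-order-statistics step is a simplified single-threshold version using Blom plotting positions rather than the full method. Maximum likelihood estimation and Kaplan–Meier are left out to keep the example short.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def censored_summaries(values, threshold):
    """Return {approach: (mean, SD)} for one left-censoring threshold."""
    detected = sorted(v for v in values if v >= threshold)
    n, k = len(values), len(values) - len(detected)

    # Threshold/2 substitution: every censored value becomes threshold / 2.
    substituted = [threshold / 2] * k + detected

    # Simplified ROS (assumption: one threshold, Blom plotting positions):
    # regress log(detected) on normal scores, then impute the k lowest ranks.
    scores = [NormalDist().inv_cdf((r - 0.375) / (n + 0.25))
              for r in range(1, n + 1)]
    z, y = scores[k:], [math.log(v) for v in detected]
    z_bar, y_bar = mean(z), mean(y)
    sxy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
    sxx = sum((a - z_bar) ** 2 for a in z)
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    imputed = [math.exp(intercept + slope * s) for s in scores[:k]]

    samples = {
        "all data": values,
        "threshold/2": substituted,
        "omit censored": detected,
        "simplified ROS": imputed + detected,
    }
    return {name: (mean(s), stdev(s)) for name, s in samples.items()}


# Hypothetical simulation settings, not those used in the study.
random.seed(1)
data = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
threshold = sorted(data)[100]  # censor the lowest 50% of values
results = censored_summaries(data, threshold)
for name, (m, sd) in results.items():
    print(f"{name:15s} mean={m:.3f} SD={sd:.3f}")
```

Comparing each row with the "all data" row gives a rough sense of the bias each approach introduces. Omitting censored observations always raises the mean, because only the lowest values are dropped.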