A statistical selection strategy for normalization procedures in LC‐MS proteomics experiments through dataset‐dependent ranking of normalization scaling factors

Abstract
In a typical comparative proteomics experiment, the LC‐MS peak intensities quantified for identified peptides vary from run to run of the instrument because of both technical and biological variation. Normalization of peak intensities across an LC‐MS proteomics dataset is therefore a fundamental pre‐processing step. However, the choice of normalization method can dramatically affect the downstream analysis of LC‐MS proteomics data. We present current normalization procedures for LC‐MS proteomics data in the context of normalization values derived from subsets of the full collection of identified peptides. The distribution of these normalization values is unknown a priori. If the values are not independent of the biological factors associated with the experiment, the normalization process can introduce bias into the data and may affect downstream statistical biomarker discovery. We present a novel approach to evaluating normalization strategies that includes the peptide selection component associated with the derivation of normalization values. Our approach evaluates the effect of normalization on the between‐group variance structure in order to identify the normalization methods that improve the structure of the data without introducing bias into the normalized peak intensities.
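The independence concern raised above can be illustrated with a minimal sketch. It is not the procedure developed in this work. It assumes a median‐based normalization value per run, computed on log intensities over a chosen peptide subset, and it checks dependence on the biological grouping with a one‐way ANOVA F statistic and a permutation p‐value. The function names (`scaling_factors`, `f_statistic`, `permutation_p_value`), the data layout, and the choice of test are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, median


def scaling_factors(log_intensities, peptide_subset):
    """Return one median-based normalization value per run (an assumed choice).

    log_intensities maps peptide -> list of log intensities, one per run,
    with None marking a peptide that was not observed in that run.
    """
    n_runs = len(next(iter(log_intensities.values())))
    factors = []
    for run in range(n_runs):
        observed = [log_intensities[p][run] for p in peptide_subset
                    if log_intensities[p][run] is not None]
        factors.append(median(observed))
    return factors


def f_statistic(values, groups):
    """One-way ANOVA F statistic of values across group labels."""
    labels = sorted(set(groups))
    grand = mean(values)
    between = 0.0
    within = 0.0
    for label in labels:
        members = [v for v, g in zip(values, groups) if g == label]
        m = mean(members)
        between += len(members) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in members)
    if within == 0.0:
        # Degenerate case: no within-group spread at all.
        return float("inf") if between > 0.0 else 0.0
    df_between = len(labels) - 1
    df_within = len(values) - len(labels)
    return (between / df_between) / (within / df_within)


def permutation_p_value(values, groups, n_perm=2000, seed=0):
    """Permutation p-value for association between values and group labels."""
    rng = random.Random(seed)
    observed = f_statistic(values, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Under these assumptions, a small p‐value for the normalization values of a given peptide subset would suggest that the values track the biological groups, so applying them could remove or distort real between‐group differences.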
Funding Information
  • National Institutes of Health (1R011GM084892, U54-016015, U54-AI081680, HHSN272200800060C)
  • U.S. Department of Energy (DE-AC05-76RL01830)