SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements
Open Access
- 26 March 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 5 (3), e9905
- https://doi.org/10.1371/journal.pone.0009905
Abstract
Contemporary high-dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e., gene selection), before any biological conclusions are drawn. These steps can dramatically change the interpretation of an experiment, yet the evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods, and investigators are often unsure which method is best. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternative data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements according to their a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications illustrate the variety of problems that SWISS can help solve. The SWISS analysis of one-color versus two-color microarrays gives investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits of that design. Analysis of the MAQC data shows that intersite reproducibility differs by array platform. 
SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
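The abstract says only that SWISS uses Euclidean distance to judge how well a processing method clusters samples by their a priori classes. The sketch below is a minimal illustration under a stated assumption: the score is taken to be the within-class sum of squared Euclidean distances to each class centroid, divided by the total sum of squared distances to the overall centroid, so that values lie between 0 and 1 and lower is better. The paper's exact standardization, and any test it uses to decide whether two scores differ significantly, are not reproduced here. The function names `swiss_score`, `_centroid`, `_sq_dist` and the toy data are hypothetical.

```python
from typing import Hashable, Sequence


def _centroid(rows: Sequence[Sequence[float]]) -> list[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def swiss_score(samples: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Hypothetical SWISS-style score: within-class SS / total SS.

    ASSUMPTION: standardizing by the total sum of squares is how the
    score is made comparable across processing methods.
    samples: one row per sample (e.g. an array), one column per gene.
    labels:  the a priori class of each sample.
    Returns a value in [0, 1]; lower means the classes cluster more tightly.
    """
    if len(samples) != len(labels):
        raise ValueError("need exactly one label per sample")
    grand = _centroid(samples)
    total_ss = sum(_sq_dist(x, grand) for x in samples)
    if total_ss == 0:
        raise ValueError("all samples are identical; score is undefined")
    # Group samples by class, then add squared distances to each class centroid.
    groups: dict[Hashable, list[Sequence[float]]] = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    within_ss = 0.0
    for rows in groups.values():
        c = _centroid(rows)
        within_ss += sum(_sq_dist(x, c) for x in rows)
    return within_ss / total_ss


# Toy data: the same four samples processed by two hypothetical methods.
labels = ["A", "A", "B", "B"]
method_1 = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]  # classes well separated
method_2 = [[0.0, 0.0], [4.0, 1.0], [0.0, 1.0], [4.0, 0.0]]  # classes fully mixed

print(round(swiss_score(method_1, labels), 4))  # 0.0588
print(round(swiss_score(method_2, labels), 4))  # 1.0
```

Under this assumption, comparing two processing methods on the same samples and the same class labels reduces to comparing their scores: the method with the lower score places more of the total variation between the a priori classes rather than within them.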