InterPro and InterProScan

Abstract
Protein sequence classification and comparison have become increasingly important in the current “omics” revolution, in which functional genomics and proteomics technologies are being developed for large-scale protein function prediction. Functional classification is equally important for the bench scientist who wants to analyze a single protein, a small set of proteins, or a single genome. A number of tools are available for sequence classification, including sequence similarity searches, motif- or pattern-finding software, and protein signatures that identify protein families and domains. One such tool, InterPro, is a documentation resource that integrates the major protein signature databases to support protein annotation. Protein sequences are searched with the InterProScan software to identify signatures from the InterPro member databases: Pfam, PROSITE, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D, and PANTHER. The InterPro database can be searched to retrieve precalculated matches for UniProtKB proteins or to find additional information on protein families and domains. For completely sequenced genomes, the user can retrieve InterPro-based analyses of all nonredundant proteins in the proteome and can run user-selected proteome comparisons. This chapter describes how to use InterPro and InterProScan for protein sequence classification and comparative proteomics.