Thermodynamic instability of siRNA duplex is a prerequisite for dependable prediction of siRNA activities

Abstract
We developed a simple algorithm, i-Score (inhibitory-Score), to predict active siRNAs by applying a linear regression model to 2431 siRNAs. The algorithm is composed exclusively of nucleotide (nt) preferences at each position; no other parameters are taken into account. Using a validation dataset of 419 siRNAs, we found that the prediction accuracy of i-Score is as good as that of s-Biopredsi, ThermoComposition21 and DSIR, which employ a neural network model or more parameters in a linear regression model. Reynolds and Katoh also predict active siRNAs efficiently, but the numbers of siRNAs they predict to be active are less than one-eighth of that of i-Score. We additionally found that excluding thermostable siRNAs, whose whole stacking energy (ΔG) is less than −34.6 kcal/mol, improves the prediction accuracy of i-Score, s-Biopredsi, ThermoComposition21 and DSIR. We also developed a universal target vector, pSELL, with which the activity of an siRNA of any sequence can be assayed in either the sense or antisense direction. We assayed 86 siRNAs in HEK293 cells using pSELL and validated the applicability of i-Score and the whole ΔG value in designing siRNAs.
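
As an illustration only, the two ideas summarized above (a linear score built solely from per-position nucleotide indicators, and exclusion of duplexes whose whole ΔG is below −34.6 kcal/mol) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the published i-Score implementation: the function names, the one-hot encoding order (A, C, G, U), the weight layout and the `min_score` cutoff are hypothetical, the regression coefficients are not reproduced here and must be supplied by the user, and the whole ΔG of each duplex is assumed to have been computed elsewhere.

```python
from typing import List, Sequence, Tuple

NUCLEOTIDES = "ACGU"   # assumed encoding order, one indicator per nucleotide
DG_THRESHOLD = -34.6   # kcal/mol; the whole-duplex cutoff stated in the abstract


def one_hot(seq: str) -> List[int]:
    """Encode a sequence as 4 indicator variables per position (A, C, G, U)."""
    seq = seq.upper().replace("T", "U")
    vec: List[int] = []
    for nt in seq:
        if nt not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {nt!r}")
        vec.extend(1 if nt == n else 0 for n in NUCLEOTIDES)
    return vec


def position_score(seq: str, weights: Sequence[float], intercept: float = 0.0) -> float:
    """Linear score: intercept plus the weight of the nucleotide found at each position.

    `weights` is a hypothetical flat list of 4 * len(seq) coefficients; real
    values would come from fitting a regression to measured siRNA activities.
    """
    x = one_hot(seq)
    if len(x) != len(weights):
        raise ValueError("weights must hold 4 coefficients per position")
    return intercept + sum(w * xi for w, xi in zip(weights, x))


def is_thermostable(whole_dg: float) -> bool:
    """True if the whole stacking energy is below the -34.6 kcal/mol cutoff."""
    return whole_dg < DG_THRESHOLD


def select_candidates(
    candidates: Sequence[Tuple[str, float]],
    weights: Sequence[float],
    intercept: float,
    min_score: float,
) -> List[Tuple[str, float]]:
    """Drop thermostable duplexes, score the rest, keep those at or above min_score.

    `candidates` holds (sequence, whole_dg) pairs; `min_score` is an
    illustrative cutoff chosen by the user, not a value from the paper.
    """
    kept = []
    for seq, whole_dg in candidates:
        if is_thermostable(whole_dg):
            continue
        score = position_score(seq, weights, intercept)
        if score >= min_score:
            kept.append((seq, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

With fitted coefficients in `weights`, `select_candidates` would return the non-thermostable candidates ranked by score. The ΔG filter is applied before scoring, since the abstract reports that removing thermostable siRNAs improves accuracy regardless of which predictor is used.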