A Novel Method of Significant Words Identification in Text Summarization

Abstract
Text summarization reduces the size of a text document by extracting its significant sentences. We present a novel technique for text summarization. The originality of the technique lies in exploiting local and global properties of words to identify significant words. The local property of a word is the sum of two terms: its normalized term frequency multiplied by a weight, and the normalized number of sentences containing the word multiplied by a second weight. If the local score of a word is less than the local score threshold, we remove that word. The global property is the maximum semantic similarity between a word and the title words. We also introduce an iterative algorithm to identify significant words. This algorithm converges to a fixed number of significant words after some iterations, and the number of iterations depends strongly on the text document. We used a two-layered backpropagation neural network with three neurons in the hidden layer to calculate the weights. The results show that this technique performs better than the MS-Word 2007, baseline, and GistSumm summarizers.
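
The local and global word scores described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the paper: term frequency is normalized by the maximum frequency in the document, the sentence count is normalized by the total number of sentences, the two weights are fixed at 0.5 (the paper obtains them from a neural network), and the names `local_scores`, `filter_words`, `global_score`, and the `similarity` callback are hypothetical.

```python
from collections import Counter


def local_scores(sentences, w_tf=0.5, w_sf=0.5):
    """Return a local score for every word.

    sentences: list of sentences, each a list of word tokens.
    w_tf, w_sf: assumed weights for the two normalized terms.
    """
    # Term frequency: occurrences of each word in the whole document.
    tf = Counter(tok for sent in sentences for tok in sent)
    if not tf:
        return {}
    # Sentence frequency: number of sentences containing each word.
    sf = Counter(tok for sent in sentences for tok in set(sent))
    max_tf = max(tf.values())   # assumed normalizer for term frequency
    n_sent = len(sentences)     # assumed normalizer for sentence count
    return {w: w_tf * tf[w] / max_tf + w_sf * sf[w] / n_sent for w in tf}


def filter_words(scores, threshold):
    """Keep only words whose local score reaches the threshold."""
    return {w for w, s in scores.items() if s >= threshold}


def global_score(word, title_words, similarity):
    """Maximum similarity between a word and any title word.

    similarity: caller-supplied function (word, word) -> float; the
    semantic similarity measure itself is not specified here.
    """
    return max((similarity(word, t) for t in title_words), default=0.0)


if __name__ == "__main__":
    doc = [["text", "summary", "text"], ["summary", "word"], ["text"]]
    scores = local_scores(doc)
    print(sorted(filter_words(scores, threshold=0.5)))
```

With the toy document above and a threshold of 0.5, "word" falls below the threshold and is removed, while "text" and "summary" are kept. The threshold value is illustrative only.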