KEYWORD EXTRACTION FROM A SINGLE DOCUMENT USING WORD CO-OCCURRENCE STATISTICAL INFORMATION
- 1 March 2004
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in International Journal on Artificial Intelligence Tools
- Vol. 13 (01), 157-169
- https://doi.org/10.1142/s0218213004001466
Abstract
We present a new keyword extraction algorithm that applies to a single document without using a corpus. Frequent terms are extracted first, then a set of co-occurrences between each term and the frequent terms, i.e., occurrences in the same sentences, is generated. The co-occurrence distribution indicates the importance of a term in the document as follows: if the probability distribution of co-occurrence between term a and the frequent terms is biased toward a particular subset of frequent terms, then term a is likely to be a keyword. The degree of bias of a distribution is measured by the χ²-measure. Our algorithm shows performance comparable to tfidf without using a corpus.
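
The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `chi2_keywords`, its parameters, the naive regex tokenization, and the choice of expected probabilities (each frequent term's share of all co-occurrence counts) are assumptions made for illustration, and preprocessing such as stemming is omitted.

```python
import re
from collections import Counter, defaultdict


def chi2_keywords(text, num_frequent=10, top_k=5, stopwords=frozenset()):
    """Rank terms of one document by a chi-squared co-occurrence bias score.

    Hypothetical helper: names, defaults, and tokenization are assumptions.
    """
    # Split into sentences, then into lowercase word tokens.
    sentences = []
    for raw in re.split(r"[.!?]+", text):
        tokens = [t for t in re.findall(r"[a-z]+", raw.lower()) if t not in stopwords]
        if tokens:
            sentences.append(tokens)

    # Step 1: pick the most frequent terms in the document.
    freq = Counter(t for s in sentences for t in s)
    frequent = [t for t, _ in freq.most_common(num_frequent)]
    frequent_set = set(frequent)

    # Step 2: count sentence-level co-occurrences of each term w
    # with each frequent term g.
    cooc = defaultdict(Counter)
    for s in sentences:
        terms = set(s)
        for w in terms:
            for g in terms & frequent_set:
                if g != w:
                    cooc[w][g] += 1

    # Expected probability of g (assumption): its share of all co-occurrences.
    col_totals = Counter()
    for row in cooc.values():
        col_totals.update(row)
    grand_total = sum(col_totals.values())

    # Step 3: chi-squared statistic between observed and expected counts.
    scores = {}
    for w, row in cooc.items():
        n_w = sum(row.values())
        score = 0.0
        for g in frequent:
            if g == w or col_totals[g] == 0:
                continue
            expected = n_w * col_totals[g] / grand_total
            score += (row[g] - expected) ** 2 / expected
        scores[w] = score

    # Highest bias first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Calling `chi2_keywords(document_text, stopwords={"the", "of", "a"})` returns a list of `(term, score)` pairs, with a higher score meaning a co-occurrence distribution that deviates more from the expected one. Without a stop-word list, the frequent terms of real text would be dominated by function words, so some filtering is needed in practice.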