Example retrieval from a translation memory

Abstract
Clustering of a translation memory is proposed to make the retrieval of similar translation examples more efficient. A second contribution is a text similarity metric based on both surface structure and content. Both techniques are tested on part of the CELEX database. The results indicate that clustering the translation memory yields a significant gain in retrieval response time, while the deterioration in retrieval accuracy is negligible. The proposed text similarity metric is evaluated by a human expert and found to be compatible with the human perception of text similarity.