A Review of Text-Based Recommendation Systems

Open Access
Abstract
Many websites on the Internet produce a variety of textual data, such as news, research articles, ebooks, personal blogs, and user reviews. On these websites, the volume of textual data is so large that finding pertinent information often becomes cumbersome for the user. To overcome this issue, “Text-based Recommendation Systems (RS)” are being developed. These systems find relevant information in minimal time using text as the primary feature. There exist several techniques to build and evaluate such systems. Although a good number of surveys compile the general attributes of recommendation systems, a comprehensive literature review of text-based recommendation systems is still lacking. In this paper, we present a review of the latest studies on text-based RS. We conducted this survey by collecting literature published during the period 2010-2020 from preeminent digital repositories. The survey mainly covers four major aspects of the text-based recommendation systems in the reviewed literature: datasets, feature extraction techniques, computational approaches, and evaluation metrics. Because benchmark datasets play a vital role in any research, publicly available datasets are extensively reviewed in this paper. Many proprietary datasets, which are not available to the public, are also used for text-based RS. We have consolidated the attributes of both the publicly available and the proprietary datasets to familiarize new researchers with them. Furthermore, methods for extracting features from text are summarized, and their use in the construction of text-based RS is discussed. Various computational approaches that use these features are then discussed. Several metrics are adopted to evaluate these systems; we present an overview of these evaluation metrics and diagram them according to their popularity.
The survey concludes that Word Embedding is the most widely used feature selection technique in the latest research. The survey also deduces that hybridization of text features with other features enhances recommendation accuracy. The study highlights the fact that most of the work is on English textual data, and that news recommendation is the most popular domain.

This publication has 97 references indexed in Scilit: