Empirical Analysis of Attribute-Aware Recommender System Algorithms Using Synthetic Data

1 July 2006

journal article
Published by International Academy Publishing (IAP) in Journal of Computers

Vol. 1 (4), 18-29
https://doi.org/10.4304/jcp.1.4.18-29

Abstract

As the number of online shoppers grows rapidly, the need for recommender systems on e-commerce sites is increasing, especially as the number of users and products offered online continues to rise dramatically. There is much ongoing research on recommender systems and on recommendation algorithms that could optimize recommendation quality. Adequate public datasets of users and products have always been in demand for better evaluation of recommender system algorithms. Yet the amount of public data, especially data containing adequate content information (attributes), is limited. When evaluating recommendation algorithms, it is important to observe the behavior of each algorithm as the characteristics of the data vary. Synthetic data allow systematic changes to be applied to the data, which cannot be done with real-life data. Although synthetic data for recommender systems have been studied, artificial data with attribute information have rarely been examined. In this paper, we review public and synthetic data that are used in the field of recommender systems. We also discuss a synthetic data generation methodology that considers attributes. Furthermore, we present empirical evaluations of existing attribute-aware recommendation algorithms and other state-of-the-art algorithms, using a real-life dataset as well as varied synthetic data, to observe their behavior as the characteristics of the data vary. In particular, the informativeness of attributes is investigated further using both real-life datasets with augmented attribute sets and synthetic datasets with attributes. We show that synthetic data give a reasonably good overview of the behavior of attribute-aware algorithms when compared with results obtained on real-life datasets.

Cited by 9 articles