Constructing, organizing, and visualizing collections of topically related Web resources

Abstract
For many purposes, the Web page is too small a unit of interaction and analysis. Web sites are structured multimedia documents consisting of many pages, and users often are interested in obtaining and evaluating entire collections of topically related sites. Once such a collection is obtained, users face the challenge of exploring, comprehending and organizing the items. We report four innovations that address these user needs: (1) we replaced the Web page with the Web site as the basic unit of interaction and analysis; (2) we defined a new information structure, the clan graph, that groups together sets of related sites; (3) we augment the representation of a site with a site profile, information about site structure and content that helps inform user evaluation of a site; and (4) we invented a new graph visualization, the auditorium visualization, that reveals important structural and content properties of sites within a clan graph. Detailed analysis and user studies document the utility of this approach. The clan graph construction algorithm tends to filter out irrelevant sites and discover additional relevant items. The auditorium visualization, augmented with drill-down capabilities to explore site profile data, helps users to find high-quality sites as well as sites that serve a particular function.
