DocuBrowse

Faceted document browsing and search

DocuBrowse is a faceted document browsing and search system that presents results from online enterprise document repositories within user-created document hierarchies.
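Showing results inside the user's own hierarchy amounts to pruning the directory tree. The following is a minimal sketch under our own assumptions, not DocuBrowse's implementation. The `Document` and `Directory` classes, the `genre` and `year` facet fields, and the `matches` and `filter_tree` helpers are all hypothetical. A directory is kept only if some document below it matches every selected facet and contains all query terms. Using the returned match count as a simple score for highlighting promising directories is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical data model; the source does not describe DocuBrowse's internals.
@dataclass
class Document:
    name: str
    genre: str   # assumed facet, e.g. "slides" or "paper"
    year: int    # assumed facet derived from the file date
    text: str


@dataclass
class Directory:
    name: str
    docs: list[Document] = field(default_factory=list)
    subdirs: list["Directory"] = field(default_factory=list)


def matches(doc: Document, facets: dict, query: str) -> bool:
    """True if doc has every selected facet value and contains all query terms."""
    if any(getattr(doc, key) != value for key, value in facets.items()):
        return False
    words = set(doc.text.lower().split())
    return all(term in words for term in query.lower().split())


def filter_tree(d: Directory, facets: dict, query: str) -> tuple[Optional[Directory], int]:
    """Return a pruned copy of d and the number of matching documents below it.

    Directories with no matching documents anywhere beneath them are dropped,
    so the user's hierarchy is preserved but only its relevant parts remain.
    """
    kept_docs = [doc for doc in d.docs if matches(doc, facets, query)]
    total = len(kept_docs)
    kept_subdirs = []
    for sub in d.subdirs:
        pruned, count = filter_tree(sub, facets, query)
        if pruned is not None:
            kept_subdirs.append(pruned)
            total += count
    if total == 0:
        return None, 0
    return Directory(d.name, kept_docs, kept_subdirs), total
```

For example, filtering a tree that holds a `talks` folder with a slide deck and a `papers` folder with a draft, using facets `{"genre": "slides"}` and query `"search"`, keeps only the `talks` branch. An empty query with no facets keeps the whole tree.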

Browsing and searching for documents in large, online enterprise document repositories are common activities. While Internet search produces satisfying results for most user queries, enterprise search has not been as successful because of differences in document types and user requirements. To support users in finding the information they need in their online enterprise repository, we created DocuBrowse, a faceted document browsing and search system. Search results are presented within the user-created document hierarchy, showing only directories and documents that match the selected facets and contain the text query terms. In addition to file properties such as date and file size, automatically detected document types, or genres, serve as a search facet. Highlighting draws the user’s attention to the most promising directories and documents, while thumbnail images and automatically identified keyphrases help users select appropriate documents. DocuBrowse uses document similarities, browsing histories, and recommender system techniques to suggest additional promising documents for the current facet and content filters.
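The description does not say how document similarities and browsing histories are combined. One common hybrid approach, offered here only as an assumed sketch, mixes two scores. The content score is the highest cosine similarity between a candidate's term vector and any document the user has viewed. The collaborative score counts how often other users' sessions that touched those documents also touched the candidate. The function names, the `alpha` weight, and the choice to rank only candidates that already pass the current filters are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(viewed, candidates, term_vectors, histories, alpha=0.5, k=3):
    """Rank candidate documents for the current user (hypothetical hybrid scheme).

    viewed:       ids of documents the current user has opened
    candidates:   ids of documents assumed to pass the current facet/query filters
    term_vectors: mapping from document id to a Counter of terms
    histories:    other users' browsing sessions, each a list of document ids
    """
    viewed_set = set(viewed)

    # Collaborative signal: count sessions that contain a viewed document
    # and also contain the candidate.
    co_visits = Counter()
    for session in histories:
        s = set(session)
        if s & viewed_set:
            for doc_id in s - viewed_set:
                co_visits[doc_id] += 1
    max_co = max(co_visits.values(), default=0) or 1  # avoid division by zero

    scores = {}
    for doc_id in candidates:
        if doc_id in viewed_set:
            continue  # do not recommend what the user has already seen
        content = max((cosine(term_vectors[doc_id], term_vectors[v]) for v in viewed), default=0.0)
        behavior = co_visits[doc_id] / max_co
        scores[doc_id] = alpha * content + (1 - alpha) * behavior

    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]
```

Setting `alpha` to 1.0 ranks purely by content similarity, and 0.0 purely by co-browsing. A deployed system would tune or learn such a weight rather than fix it at 0.5.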


Related Publications

2010

Publication Details
  • Fuji Xerox Technical Report No. 19, pp. 88-100
  • Jan 1, 2010

Abstract

Browsing and searching for documents in large, online enterprise document repositories is an increasingly common problem. While users are familiar with, and usually satisfied by, Internet search results, enterprise search has not been as successful because of differences in data types and user requirements. To support users in finding the information they need among the electronic and scanned documents in their online enterprise repository, we created an automatic detector for genres such as papers, slides, tables, and photos. Several of these genres correspond roughly to file name extensions, but they are identified automatically from features of the document itself. This genre identifier plays an important role in our faceted document browsing and search system. The system presents documents in the directory hierarchy typically found in enterprise document collections. Documents and directories are filtered so that only documents matching the selected facets and containing any optional query terms are shown, and promising directories are highlighted. Thumbnail images and automatically identified keyphrases help users select the desired documents.
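The abstract says genres are identified from features of the document but names neither the features nor the classifier. As a stand-in, the sketch below uses a nearest-centroid classifier over three invented layout features: image coverage, text coverage, and page aspect ratio. Both the features and the classifier are assumptions chosen for illustration. A practical detector would likely need richer features and a stronger learned model.

```python
import math
from collections import defaultdict

# Hypothetical per-page features (not taken from the source):
#   image_fraction: share of page area covered by images (0..1)
#   text_fraction:  share of page area covered by text (0..1)
#   aspect:         page width / page height
Features = tuple[float, float, float]


def train_centroids(examples: list[tuple[Features, str]]) -> dict[str, Features]:
    """Average the feature vectors of the labeled examples for each genre."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for feats, genre in examples:
        for i, x in enumerate(feats):
            sums[genre][i] += x
        counts[genre] += 1
    return {g: tuple(s / counts[g] for s in sums[g]) for g in sums}


def classify(feats: Features, centroids: dict[str, Features]) -> str:
    """Assign the genre whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(feats, centroids[g]))
```

Training on a few labeled pages per genre and calling `classify` on a new page's features returns the nearest genre label. That label could then be stored as the document's genre facet value.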