Ranked Feature Fusion Models for Ad Hoc Retrieval

Abstract

We introduce the Ranked Feature Fusion framework for information retrieval system design. Typical information retrieval
formalisms such as the vector space model, the best-match model and the language model first combine features (such as
term frequency and document length) into a unified representation, and then use the representation to rank documents.
We take the opposite approach: Documents are first ranked by the relevance of a single feature value and are assigned
scores based on their relative ordering within the collection. A separate ranked list is created for every feature
value and these lists are then fused to produce a final document scoring. This new “rank then combine” approach is
extensively evaluated and is shown to be as effective as traditional “combine then rank” approaches. The model is
easy to understand and contains fewer parameters than other approaches. Finally, the model is easy to extend
(integrating new features is trivial) and to modify. Such extensions and modifications include, but are not limited
to, relevance feedback and distribution flattening.
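
To make the "rank then combine" idea concrete, the following is a minimal sketch and not the implementation evaluated in this work. It rests on assumptions that the abstract does not specify: a linear rank-to-score mapping, fusion by simple summation, and toy feature values. The names rank_scores and fuse, and the treatment of shorter documents as ranking higher, are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def rank_scores(feature_values, higher_is_better=True):
    """Score documents by their rank on a single feature.

    Assumption: a linear mapping in which the top-ranked document
    receives 1.0 and the last of n documents receives 1/n.
    Tied values are ordered arbitrarily (by input order).
    """
    ordered = sorted(feature_values, key=feature_values.get,
                     reverse=higher_is_better)
    n = len(ordered)
    return {doc: (n - i) / n for i, doc in enumerate(ordered)}


def fuse(ranked_lists):
    """Fuse per-feature rank scores by summation (an assumed fusion rule)."""
    fused = defaultdict(float)
    for scores in ranked_lists:
        for doc, score in scores.items():
            fused[doc] += score
    # Sort by fused score, highest first; break ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy collection: two query-term frequencies and document length.
tf_fusion = {"d1": 3, "d2": 1, "d3": 5}
tf_retrieval = {"d1": 4, "d2": 2, "d3": 0}
doc_length = {"d1": 50, "d2": 400, "d3": 100}

final = fuse([
    rank_scores(tf_fusion),
    rank_scores(tf_retrieval),
    # Assumption for this toy example: shorter documents rank higher.
    rank_scores(doc_length, higher_is_better=False),
])
print([doc for doc, _ in final])
```

Under these assumptions, adding a new feature amounts to appending one more ranked list to the input of fuse; no existing list has to be recomputed.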