Recent years have witnessed the rise of many effective text information retrieval systems. By treating local visual features as terms, training images as documents and input images as queries, we formulate the problem of object recognition into that of text retrieval. Our formulation opens up the opportunity to integrate some powerful text retrieval tools with computer vision techniques. In this paper, we propose to improve the efficiency of articulated object recognition by an Okapi-Chamfer matching algorithm. The algorithm is based on the inverted index technique.
The inverted index is a widely used way to effectively organize a collection of text documents. With the inverted index, only documents that contain query terms are accessed and used for matching. To enable inverted indexing in an image database, we build a lexicon of local visual features by clustering the features extracted from the training images. Given a query image, we extract visual features and quantize them based on the lexicon, and then look up the inverted index to identify the subset of training images with non-zero matching score. To evaluate the matching scores in the subset, we combined the modified Okapi weighting formula with the Chamfer distance. The performance of the Okapi-Chamfer matching algorithm is evaluated on a hand posture recognition system. We test the system with both synthesized and real world images. Quantitative results demonstrate the accuracy and efficiency our system.