Matthew Cooper - RESEARCH
Overview:
My research interests focus on the design of automatic tools to interpret and process multimedia information. I have a particular interest in combining statistical inference methods with signal processing techniques to facilitate multimedia information management, retrieval, and reuse.

MULTIMEDIA CONTENT ANALYSIS
Personal and institutional media collections are growing to unprecedented sizes. Our aim is to develop generic and scalable statistical techniques for structural (temporal) and semantic multimedia content analysis. Whenever possible, we attempt to avoid limiting assumptions about the characteristics of the content, and design algorithms that can be readily adapted to multiple modalities. Selected recent projects are described below.
  • semantic media annotation: We are presently designing systems for the high-level feature extraction task at TRECVID. More details to come...
  • interactive video search: For the interactive search task of TRECVID, we designed a system combining multi-level segmentation with powerful interface elements and validated the system successfully. We automatically group shots according to a text-based topic segmentation. These "stories" are the central organizational unit by which the interface presents video shots to users in response to their queries. More details may be found in [Adcock, et al., 2005]. We are now integrating semantic indexes based on automatically generated metadata into the interface.
  • video segmentation via classification: For the shot boundary detection task of TRECVID 2004, we designed a system for video segmentation combining similarity analysis with supervised classification. We first build a partial inter-frame similarity matrix to quantify local inter-frame similarity, using a chi-square measure of color histogram similarity. We then use an efficient exact k-nearest-neighbor classifier to determine which frames are cut or gradual shot boundaries. The system was among the top-performing systems, although it used relatively primitive low-level image features (YUV histograms) and minimal post-processing. The approach is documented in [Cooper, et al., 2007].
  • temporal clustering for digital photo collections: Consumers are quickly amassing large collections of digital photographs as digital cameras saturate the consumer market. Most commonly, photographers wish to organize these collections according to events such as vacations or family gatherings. Events are difficult to define consistently, either in terms of temporal duration or statistics of low-level image features. We present unsupervised methods for event-based clustering based on timestamps and image content that partition the photos into contiguous clusters in time order. The approach is a multiscale extension of the audio segmentation work of [Foote, 2000]. The algorithms and evaluation results are summarized in [Cooper et al., 2005], which includes extensions using the Bayesian information criterion and dynamic programming.
  • structural analysis of popular music: A similarity matrix provides a convenient means of visualizing the structure of a media stream [Foote, 1999]. We have been exploring the application of spectral clustering to music summarization. Music "thumbnails" are useful to enhance browsing of audio collections or to serve as proxies to improve music information retrieval. We take a hierarchical approach, first segmenting the audio into its main structural components. In the case of popular music, these components are commonly verse and chorus segments. We then compute an inter-segment similarity matrix and its SVD to cluster the segments, and can select a summary using any number of criteria. Selected examples are here. More details are in [Cooper and Foote, 2003].
  • automatic music video creation: We automatically align and synchronize selected video excerpts to an arbitrary soundtrack. This work is documented in [Foote, Cooper, and Girgensohn, 2002].
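
As a rough illustration of the first stage of the video segmentation project above, the sketch below builds a partial (banded) inter-frame matrix from color histograms. It is a minimal sketch under stated assumptions, not the system described in [Cooper, et al., 2007]: the function names, the `max_lag` parameter, the toy histograms, and the particular form of the chi-square measure (written here as a distance, so larger values mean less similar frames) are all illustrative choices. The k-nearest-neighbor classification stage is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """One common form of the chi-square histogram distance (an assumption here)."""
    total = 0.0
    for a, b in zip(h1, h2):
        denom = a + b
        if denom > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / denom
    return total


def partial_similarity_matrix(
    histograms: List[Sequence[float]], max_lag: int
) -> Dict[Tuple[int, int], float]:
    """Return distances for frame pairs (i, j) with 0 < j - i <= max_lag.

    Only a band around the main diagonal is computed, so the cost grows
    linearly with the number of frames rather than quadratically.
    """
    band: Dict[Tuple[int, int], float] = {}
    n = len(histograms)
    for i in range(n):
        for j in range(i + 1, min(i + max_lag, n - 1) + 1):
            band[(i, j)] = chi_square_distance(histograms[i], histograms[j])
    return band


# Toy data (hypothetical): three frames of one shot, then an abrupt cut.
frames = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.6, 0.4],
]
band = partial_similarity_matrix(frames, max_lag=2)
```

In this toy example the entry for the frame pair straddling the cut, `band[(2, 3)]`, is far larger than the entries for pairs within a shot. In a classification setting, the band entries around each frame could serve as the feature vector for that frame.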
COMPUTER VISION
My graduate research focused on algorithms and analysis for computer vision using pattern theory and information theory:

M. Cooper. Information Measures for Object Recognition Accommodating Signature Variability, D.Sc. Thesis, Washington University in St. Louis, 1999.

The thesis made two main contributions:
  • extending pattern theoretic object templates to accommodate surface appearance variability: Object templates are derived from three-dimensional geometric CAD models. The variability in the pose of objects in a three-dimensional scene is represented by the rigid transformations of rotation and translation. We thus formulate pose recognition as the estimation of the group action on the template that best accounts for the observed imagery. Standard statistical criteria for this estimation such as minimum-mean-squared-error (MMSE) or maximum-a-posteriori probability (MAP) are used. This approach was originally presented in [Miller, et al., 1997].
    We equip these rigid templates with a random field defined on the three-dimensional object surface. This scalar random field was used initially to account for variations in the radiant intensity of the object surface in forward-looking infrared (FLIR) imagery [Cooper et al., 1997]. Different regions on a vehicle's surface have different temperatures according to the vehicle's operational state. This produces considerable variation in object appearance. While the object surface is represented by a high-dimensional set of vertices, a Karhunen-Loeve expansion of the random field allowed us to use a low-dimensional set of basis functions to accurately account for the dominant variations in appearance. This approach has also been used to address diffuse illumination variability in computer vision [Cooper, et al., 2001]. Ongoing efforts to commercialize this technology are being conducted by Animetrics, Inc.
  • information-theoretic analysis of computer vision: Formulating recognition via statistical inference allows for principled performance analysis of various recognition problems. We use mutual information to quantitatively determine: (1) the amount of information a specific sensor (e.g., a camera) supplies about unknown object configuration parameters, and (2) the information gain associated with the combination of multiple sensors. Such analysis is invaluable in the design of recognition systems and allows for a systematic cost/benefit assessment of sensor deployment options. Finally, the use of entropy measures and Fano's inequality also provides bounds on recognition error. We also provided asymptotic analysis of these measures. These results are summarized in [Cooper and Miller, 2000].
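
To make the information-theoretic quantities above concrete, here is a minimal sketch for a small discrete problem. It assumes a finite set of object classes X and a finite set of sensor outputs Y described by a joint probability table, which is a simplification of the continuous pose and imagery models used in the thesis. The function names and the toy distributions are illustrative. The error bound shown is the standard weakened form of Fano's inequality, P_e >= (H(X|Y) - 1) / log2 |X|.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mutual_information(joint: List[List[float]]) -> float:
    """I(X;Y) in bits, where joint[x][y] = P(X = x, Y = y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi


def fano_error_lower_bound(joint: List[List[float]]) -> float:
    """Weakened Fano bound on the error of any estimator of X from Y.

    Uses H(X|Y) = H(X) - I(X;Y); requires at least two classes.
    """
    px = [sum(row) for row in joint]
    h_x_given_y = entropy(px) - mutual_information(joint)
    return max(0.0, (h_x_given_y - 1.0) / math.log2(len(joint)))


# Toy sensors (hypothetical) observing one of four equally likely classes.
# A noiseless sensor reports the class exactly; a blind sensor's output
# is independent of the class.
noiseless = [[0.25 if x == y else 0.0 for y in range(4)] for x in range(4)]
blind = [[1.0 / 16.0] * 4 for _ in range(4)]

mi_noiseless = mutual_information(noiseless)
mi_blind = mutual_information(blind)
bound_noiseless = fano_error_lower_bound(noiseless)
bound_blind = fano_error_lower_bound(blind)
```

The noiseless sensor supplies the full two bits needed to identify the class, so the bound on error is zero. The blind sensor supplies no information, and the bound says that no recognizer using it can have an error probability below one half. Comparing such values across candidate sensors, or for a pair of sensors treated as one joint observation, is the kind of cost/benefit assessment described above.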