Matthew Cooper - RESEARCH
Overview:
My research interests focus on the design of automatic tools to
interpret and process multimedia information. I have a particular
interest in the combination of statistical inference methods and signal
processing techniques to facilitate multimedia information management,
retrieval, and reuse.
MULTIMEDIA CONTENT ANALYSIS
Personal and institutional media collections are growing to
unprecedented sizes. Our aim is to develop generic and scalable
statistical techniques for structural (temporal) and semantic
multimedia content analysis. Whenever possible, we attempt to
avoid limiting assumptions about the characteristics of the content, and
design algorithms that can be readily adapted to multiple modalities.
Selected recent projects are described below.
- semantic media annotation: We are presently designing
systems for the high-level feature extraction task at TRECVID.
More details to come...
- interactive video search: For the interactive search task of
TRECVID, we designed a system combining multi-level
segmentation with powerful interface elements and validated the system
successfully. We automatically group shots according to a text-based
topic segmentation. These "stories" are the central organizational unit
by which the interface presents video shots to users in response to
their queries. More details may be found in [Adcock, et al., 2005].
We are now integrating semantic indexes based on automatically
generated metadata into the interface.
- video segmentation via classification: For the shot boundary
detection task of TRECVID 2004, we designed a system for video
segmentation combining similarity analysis with supervised
classification. We first build a partial inter-frame similarity
matrix to quantify local inter-frame similarity. For this we use a
chi-square measure of color histogram similarity. We then use an
efficient exact k-nearest-neighbor classifier to determine which
frames are cut or gradual shot boundaries. The system was among the
top performers, although it used relatively primitive low-level image
features (YUV histograms) and minimal post-processing. The approach
is documented in [Cooper, et al., 2007].
- temporal clustering for digital photo collections: Consumers are
quickly amassing large collections of digital photographs
as digital cameras saturate the consumer market. Most commonly,
photographers wish to organize these collections according to events
such as vacations or family gatherings. Events are
difficult to define consistently in terms of either temporal duration
or statistics of low-level image features. We present unsupervised
methods for event-based clustering based on
timestamps and image content that partition the photos into contiguous
clusters in time order. The approach is a multiscale extension of
the audio segmentation work of [Foote, 2000]. The algorithms
and evaluation results are summarized in [Cooper et al., 2005], which
includes extensions using the Bayes information criterion and dynamic
programming.
- structural analysis of popular music: A similarity matrix
provides a convenient means of visualizing the
structure of a media stream [Foote, 1999]. We have been exploring
the application of spectral clustering to music
summarization. Music "thumbnails" are useful to enhance browsing of
audio collections or to serve as proxies to improve music
information retrieval. We take a hierarchical approach, first
segmenting the audio into its main structural components. In the
case of popular music, these components are commonly verse and chorus
segments. We then compute an inter-segment similarity matrix and
its SVD to cluster the segments, and can select a summary using any
number of criteria. Selected examples are here. More details are in
[Cooper and Foote, 2003].
- automatic music video creation: In this work, we automatically
align and synchronize selected video excerpts to an arbitrary
soundtrack. The approach is documented in [Foote, Cooper, and
Girgensohn, 2002].
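The similarity analysis step of the video segmentation project above
can be illustrated with a short sketch. This is a minimal example under
stated assumptions, not the system's actual implementation: the function
names, the max_lag parameter, and the particular chi-square form used
here (the sum over bins of (a - b)^2 / (a + b)) are all hypothetical
choices made for illustration. The sketch computes a distance, so
smaller values mean more similar frames. The k-nearest-neighbor
classification step is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square distance between two histograms with the same bins.

    Assumed form: sum over bins of (a - b)^2 / (a + b).
    Returns 0.0 for identical histograms; larger means less similar.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return total


def partial_similarity_matrix(
    histograms: List[Sequence[float]], max_lag: int
) -> Dict[Tuple[int, int], float]:
    """Compute distances only for frame pairs (i, j) with 0 < j - i <= max_lag.

    Storing just this band around the main diagonal is what makes the
    matrix "partial": frames far apart in time are never compared.
    """
    band: Dict[Tuple[int, int], float] = {}
    n = len(histograms)
    for i in range(n):
        for j in range(i + 1, min(i + max_lag, n - 1) + 1):
            band[(i, j)] = chi_square_distance(histograms[i], histograms[j])
    return band


if __name__ == "__main__":
    # Toy two-bin "color histograms" for three frames; the third frame
    # differs sharply from the first two, as it would across a cut.
    frames = [[0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
    for pair, dist in sorted(partial_similarity_matrix(frames, 2).items()):
        print(pair, round(dist, 4))
```

In a system of the kind described above, the distances in the band
around each frame would form the feature vector passed to the
classifier. How those features are assembled is not specified here.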
COMPUTER VISION
My graduate research focused on algorithms and analysis for computer
vision using pattern theory and information theory:
M. Cooper. Information Measures for Object Recognition Accommodating
Signature Variability, D.Sc. Thesis, Washington University in
St. Louis, 1999.
The thesis made two main contributions:
- extending pattern theoretic object templates to
accommodate surface appearance variability: Object templates are
derived from three-dimensional geometric
CAD models. The variability in the pose of objects in a
three-dimensional scene is represented by the rigid transformations of
rotation and translation. We thus formulate pose
recognition as the estimation of the group action on the template that
best accounts for the observed imagery. Standard statistical
criteria for this estimation such as minimum-mean-squared-error (MMSE)
or maximum-a-posteriori probability (MAP) are used. This approach
was originally presented in [Miller, et al., 1997].
We equip these rigid templates with a random field
defined on the three-dimensional object surface. This scalar
random field was used initially to account for variations in the
radiant intensity of the object surface in forward-looking infrared
(FLIR) imagery [Cooper et al., 1997]. Different regions on a vehicle's
surface have different temperatures according to the vehicle's
operational state. This produces considerable variation in object
appearance. While the object surface is represented by a
high-dimensional set of vertices, a Karhunen-Loeve expansion of the
random field allowed us to use a low-dimensional set of basis functions
to accurately account for the dominant variations in
appearance. This approach has also been
used to address diffuse illumination variability in computer vision
[Cooper, et al., 2001]. Animetrics, Inc. is working to commercialize
this technology.
- information-theoretic analysis of computer vision: Formulating
recognition via statistical inference allows for principled
performance analysis of various recognition problems. We use
mutual information to quantify: (1) the amount of information a
specific sensor (e.g., a camera) supplies about unknown object
configuration parameters, and (2) the information gain associated
with the combination of multiple sensors. Such analysis is invaluable
in the design of recognition systems and
allows for a systematic cost/benefit assessment of sensor deployment
options. The use of entropy measures and Fano's
inequality also provides bounds on recognition error, and we
provide asymptotic analysis of these measures. These results are
summarized in [Cooper and Miller, 2000].
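The information measures named above can be made concrete with a small
discrete sketch. This is an illustration under stated assumptions, not
the analysis from the thesis: it assumes the object configuration X and
the sensor observation Y each take finitely many values and are
described by a joint probability table joint[x][y]. The function names
are hypothetical. The error bound is the standard weakened form of
Fano's inequality, P_e >= (H(X|Y) - 1) / log2(|X|), with entropies
in bits.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def mutual_information(joint: List[List[float]]) -> float:
    """I(X;Y) in bits, where joint[x][y] = P(X = x, Y = y)."""
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi


def fano_error_lower_bound(joint: List[List[float]]) -> float:
    """Weakened Fano bound on the probability of misidentifying X from Y.

    Uses H(X|Y) = H(X) - I(X;Y) and clips the bound at zero, since the
    weakened form can be negative (and therefore uninformative).
    """
    num_states = len(joint)
    if num_states < 2:
        return 0.0  # a single possible state can never be misidentified
    px = [sum(row) for row in joint]
    h_cond = entropy(px) - mutual_information(joint)
    return max(0.0, (h_cond - 1.0) / math.log2(num_states))


if __name__ == "__main__":
    # A sensor whose output is independent of a four-state object
    # configuration carries no information about it.
    uninformative = [[1 / 16] * 4 for _ in range(4)]
    print(mutual_information(uninformative))
    print(fano_error_lower_bound(uninformative))
```

The information gain from adding a second sensor can be estimated the
same way, by treating the pair of observations as a single variable Y
and comparing the resulting mutual information with that of the first
sensor alone.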