media analysis at FXPAL

introduction to similarity analysis

introduction

In this paper we review self-similarity analysis. We are developing a general framework for inferring structure in media streams by building and processing similarity matrices.
building the matrix

The heart of the analysis is the construction of the similarity matrix. The first step is to parameterize the time-indexed media samples using appropriate features. Assume the media samples are grouped into $N$ windows and a feature vector is calculated for each window, giving $\{v_1, \ldots, v_N\}$. The features must quantify similarity: similar media samples must produce similar features. The parameters need not be optimized for compression or transmission. After calculating appropriate features, we compare all pairwise combinations of feature vectors using a quantitative similarity measure $d$. We embed the results of these comparisons in an $N \times N$ matrix $S$, such that

$$S(i,j) = d(v_i, v_j)$$

The graphic above depicts the construction of the similarity matrix.  
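As a minimal sketch, the construction can be written in Python with NumPy, assuming the feature vectors are already stacked as the rows of an (N, D) array. The function names `cosine` and `similarity_matrix` are hypothetical, and the cosine is used as the default for d only because it is the measure applied in the visualization example; any other measure can be passed in.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v (1.0 when they point the same way)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Treat an all-zero vector (e.g. a silent window) as dissimilar to everything.
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

def similarity_matrix(features: np.ndarray, d=cosine) -> np.ndarray:
    """Build the N x N matrix S with S[i, j] = d(v_i, v_j).

    `features` has shape (N, D); row i is the feature vector v_i of window i.
    """
    n = features.shape[0]
    S = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = d(features[i], features[j])
    return S

# Three toy 2-D feature vectors: the first two point the same way, the third is orthogonal.
v = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
S = similarity_matrix(v)
```

The double loop mirrors the definition directly and costs N² evaluations of d; for a symmetric measure such as the cosine, only the upper triangle actually needs to be computed.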

examples

Example test signal. This 100-second audio file consists of 30 seconds of a 1 kHz sine wave, 40 seconds of a 500 Hz sine wave, and 30 seconds of a 2 kHz sine wave. Because the 500 Hz portion is the longest, the ideal summary should consist primarily of the 500 Hz signal rather than the shorter 1 kHz and 2 kHz segments.
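The test signal is simple enough to regenerate. The sketch below assumes a 16 kHz sample rate and 16-bit mono PCM output, neither of which is stated here, so it approximates rather than reproduces testtones.wav; the helper names `make_test_tones` and `write_wav` are hypothetical.

```python
import wave
import numpy as np

# (duration in seconds, frequency in Hz) for each segment, in playback order
SEGMENTS = [(30, 1000.0), (40, 500.0), (30, 2000.0)]

def make_test_tones(sample_rate: int = 16000) -> np.ndarray:
    """Concatenate the three pure-tone segments into one float signal in [-1, 1]."""
    parts = []
    for seconds, freq in SEGMENTS:
        t = np.arange(seconds * sample_rate) / sample_rate
        parts.append(np.sin(2 * np.pi * freq * t))
    return np.concatenate(parts)

def write_wav(path: str, signal: np.ndarray, sample_rate: int) -> None:
    """Write a mono 16-bit PCM WAV file with the standard-library wave module."""
    # Scale to 90% of full scale to leave headroom, then convert to little-endian int16.
    pcm = np.round(signal * 0.9 * 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

x = make_test_tones()
```

Calling `write_wav("testtones_synth.wav", x, 16000)` writes the result to disk; the file name is a placeholder.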

Download: testtones.wav, testtones.mp3

 

visualization

The figure above shows the similarity matrix for the three-tone test signal. For this example, 64-bin low-frequency spectrograms were computed over 256-sample windows at 20 Hz. The resulting coefficient vectors are compared using the cosine measure, i.e. the cosine of the angle between two coefficient vectors, which is 1 when the vectors point in the same direction and decreases as they diverge. The structure of this simple signal is clearly visible. The bright white squares (high self-similarity) along the main diagonal indicate the three homogeneous segments that make up the test audio. The dark rectangular regions off the main diagonal indicate the low cross-similarity between each pair of segments.
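The feature extraction and comparison described above can be sketched as follows. Several details are assumptions rather than the original settings: a 16 kHz sample rate (so a 20 Hz frame rate means a hop of 800 samples), a Hann window, and "low-frequency" taken to mean the lowest 64 bins of a 256-point FFT magnitude spectrum. The test signal is regenerated inline so the block runs on its own; all function names are hypothetical.

```python
import numpy as np

SR = 16000  # assumed sample rate (Hz)

def spectral_features(signal, sample_rate=SR, win=256, frame_rate=20, n_bins=64):
    """One row per window: magnitudes of the lowest n_bins FFT bins of a Hann-windowed frame."""
    hop = int(round(sample_rate / frame_rate))  # 800 samples at 16 kHz and 20 Hz
    window = np.hanning(win)
    starts = range(0, len(signal) - win + 1, hop)
    frames = np.stack([signal[s:s + win] * window for s in starts])
    spectra = np.abs(np.fft.rfft(frames, axis=1))  # shape (N, win // 2 + 1)
    return spectra[:, :n_bins]

def cosine_similarity_matrix(features):
    """S[i, j] = cosine of the angle between rows i and j (vectorized)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # avoid dividing by zero
    return unit @ unit.T

def tone(freq, seconds, sample_rate=SR):
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq * t)

# 30 s at 1 kHz, 40 s at 500 Hz, 30 s at 2 kHz, as in the test signal.
x = np.concatenate([tone(1000, 30), tone(500, 40), tone(2000, 30)])
S = cosine_similarity_matrix(spectral_features(x))

# At a 20 Hz frame rate the segment boundaries fall at frames 600 and 1400.
for a, b in [(0, 600), (600, 1400), (1400, 2000)]:
    print(f"frames {a}-{b}: mean within-segment similarity {S[a:b, a:b].mean():.3f}")
print(f"1 kHz vs 500 Hz mean cross-similarity {S[0:600, 600:1400].mean():.3f}")
```

Displayed with a grayscale colormap (for example matplotlib's `imshow(S, cmap="gray")`), this S shows three bright squares on the diagonal and dark off-diagonal rectangles, matching the description above.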

stream segmentation

A presentation describing how similarity matrices can be processed to segment audio streams is available at the following links.

Netscape (no audio)

Internet Explorer

For a more complete description, see 

J. Foote, "Automatic Audio Segmentation Using a Measure of Audio Novelty," in Proc. IEEE Intl. Conf. on Multimedia and Expo, vol. 1, pp. 452-455, 2000.
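The segmentation approach in the cited paper correlates a checkerboard-shaped kernel along the main diagonal of S. Where the kernel straddles a boundary, its within-segment quadrants cover bright regions and its cross-segment quadrants cover dark ones, so the correlation peaks. The following is a minimal sketch of that idea. The kernel half-width, the Gaussian taper width, the zero padding at the edges, and the function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def checkerboard_kernel(half_width: int, taper: float = 0.5) -> np.ndarray:
    """2L x 2L kernel: +1 in the two within-segment quadrants, -1 in the two
    cross-segment quadrants, multiplied by a radially symmetric Gaussian."""
    L = half_width
    sign = np.concatenate([-np.ones(L), np.ones(L)])
    checker = np.outer(sign, sign)  # +1 top-left/bottom-right, -1 elsewhere
    r = (np.arange(2 * L) - L + 0.5) / L  # offsets from the kernel centre, in units of L
    gauss = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2 * taper ** 2))
    return checker * gauss

def novelty(S: np.ndarray, half_width: int = 8) -> np.ndarray:
    """Correlate the kernel along the main diagonal of S; score[i] is large when
    frames i-L..i-1 and i..i+L-1 are each self-similar but unlike each other."""
    L = half_width
    K = checkerboard_kernel(L)
    padded = np.pad(S, L, mode="constant")  # zeros outside S so edge frames still get a score
    score = np.empty(S.shape[0])
    for i in range(S.shape[0]):
        # padded[i:i+2L, i:i+2L] covers rows/columns i-L .. i+L-1 of S
        score[i] = np.sum(K * padded[i:i + 2 * L, i:i + 2 * L])
    return score

# Toy block-diagonal similarity matrix: segments are frames 0-29, 30-69 and 70-99.
labels = np.repeat([0, 1, 2], [30, 40, 30])
S = (labels[:, None] == labels[None, :]).astype(float)
score = novelty(S)
print(sorted(np.argsort(score)[-2:].tolist()))  # prints [30, 70], the two boundaries
```

The half-width L sets the time scale of the analysis: a small kernel responds to short, local changes, while a larger one favors boundaries between longer sections.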