| introduction to similarity analysis | |
| introduction | In this paper we review self-similarity analysis. We are developing a general framework for inferring structure in media streams by building and processing similarity matrices. |
| building the matrix |
The heart of the analysis is the construction of the similarity matrix.
The first step is to parameterize the time-indexed media samples using
appropriate features. Assume the media samples are grouped into N
windows, and corresponding feature vectors {v1, ..., vN}
are calculated. The features must quantify similarity; similar media
samples must produce similar features. The parameters need not be optimized
for compression or transmission. After calculating appropriate features,
we compare all pairwise combinations of feature vectors using a quantitative
similarity measure d. We embed the results of these comparisons in
an N x N matrix, S, such that
S(i,j) = d(vi, vj).
The graphic above depicts the construction of the similarity matrix. |
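The construction described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names `similarity_matrix` and `cosine`, and the toy feature vectors, are hypothetical, and the cosine of the angle between vectors is used here only as one possible choice for the measure d.

```python
import numpy as np

def similarity_matrix(features, d):
    """Return the N x N matrix S with S[i, j] = d(features[i], features[j])."""
    n = len(features)
    S = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = d(features[i], features[j])
    return S

def cosine(u, v):
    """Cosine of the angle between u and v (assumes neither vector is all zeros)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy features: two parallel vectors and one orthogonal to both.
V = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
S = similarity_matrix(V, cosine)
```

With a symmetric measure such as this one, S is symmetric and only the upper triangle strictly needs to be computed; the double loop is kept for clarity.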
| examples |
Example test signal. This 100-second audio file consists of 30 seconds of a 1 kHz sine wave, 40 seconds of a 500 Hz sine wave, and 30 seconds of a 2 kHz sine wave. Because the 500 Hz portion is the longest, the ideal summary should consist primarily of the 500 Hz signal rather than the shorter 1 kHz and 2 kHz segments. Download: testtones.wav testtones.mp3
|
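A signal with the same layout as the test file can be synthesized as follows. This is a sketch under stated assumptions: the sample rate of 16 kHz is a guess (the text does not give one), and the helper name `tone` is hypothetical. It does not reproduce testtones.wav itself.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; assumed, the text does not state the sample rate

def tone(freq_hz, seconds, rate=SAMPLE_RATE):
    """Unit-amplitude sine wave of the given frequency and duration."""
    t = np.arange(int(seconds * rate)) / rate
    return np.sin(2 * np.pi * freq_hz * t)

# (frequency in Hz, duration in seconds), in the order given in the text
segments = [(1000, 30), (500, 40), (2000, 30)]
signal = np.concatenate([tone(f, s) for f, s in segments])
```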
| visualization |
The figure above shows the similarity matrix for the three-tone test signal. 64-bin low-frequency spectrograms were computed for the signal over 256-sample windows at 20 Hz. The resulting coefficients are compared using the cosine distance measure (i.e. the cosine of the angle between the two coefficient vectors). The structure of this simple signal is clearly visible. The bright white squares (high self-similarity) along the main diagonal indicate the three homogeneous segments that comprise the test audio. The dark rectangular regions off the main diagonal indicate the low cross-similarity between pairs of the three segments. |
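The analysis described above can be approximated with the sketch below. The window length (256 samples), frame rate (20 Hz), and number of bins (64) come from the text; everything else is an assumption: the 16 kHz sample rate, the Hann window, the use of FFT magnitudes as the spectrogram coefficients, and the function names. To keep the example light, the signal is shortened tenfold (3 s, 4 s, and 3 s instead of 30 s, 40 s, and 30 s).

```python
import numpy as np

RATE = 16000      # Hz; assumed sample rate (not given in the text)
WINDOW = 256      # samples per analysis window (from the text)
FRAME_RATE = 20   # feature vectors per second (from the text)
N_BINS = 64       # number of lowest-frequency FFT bins kept (from the text)

def tone(freq_hz, seconds):
    """Unit-amplitude sine wave of the given frequency and duration."""
    t = np.arange(int(seconds * RATE)) / RATE
    return np.sin(2 * np.pi * freq_hz * t)

def spectrogram_features(x):
    """One N_BINS-long magnitude spectrum per frame, FRAME_RATE frames per second."""
    hop = RATE // FRAME_RATE
    starts = range(0, len(x) - WINDOW + 1, hop)
    frames = np.stack([x[s:s + WINDOW] for s in starts]) * np.hanning(WINDOW)
    return np.abs(np.fft.rfft(frames, axis=1))[:, :N_BINS]

def cosine_similarity_matrix(V):
    """S[i, j] = cosine of the angle between rows i and j of V."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    U = V / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return U @ U.T

# Shortened version of the test signal: 1 kHz, then 500 Hz, then 2 kHz.
x = np.concatenate([tone(1000, 3), tone(500, 4), tone(2000, 3)])
V = spectrogram_features(x)
S = cosine_similarity_matrix(V)
```

Displaying `S` as a grayscale image (for example with matplotlib's `imshow`) should show three bright squares along the main diagonal, one per tone, with dark regions between them.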
| stream segmentation |
A presentation describing how similarity matrices can be processed to
segment audio streams is available at the following link:
Netscape (no audio). For a more complete description, see J. Foote, "Automatic Audio Segmentation Using a Measure of Audio Novelty," Proc. of IEEE Intl. Conf. on Multimedia and Expo, vol. 1, pp. 452-455, 2000. |