Automatic Music Summarization via Similarity Analysis

Abstract

We present methods for automatically producing summary excerpts or thumbnails of music. To find the most representative excerpt, we maximize the average segment similarity to the entire work. After window-based audio parameterization, a quantitative similarity measure is calculated between every pair of windows, and the results
are embedded in a 2-D similarity matrix. Summing the similarity matrix over the support of a segment results in a measure of how similar that segment is to the whole. This measure is maximized to find the segment that best represents the entire work. We discuss variations on the method, and present experimental results for orchestral music, popular songs, and jazz. These results demonstrate that the method finds significantly representative excerpts, using very few assumptions about the source audio.