Summarizing Popular Music via Structural Similarity Analysis

Abstract

We present a framework for summarizing digital media based on structural analysis. Though these methods are applicable to general media, we concentrate here on characterizing repetitive structure in popular music. In the first step, a similarity matrix is calculated from inter-frame spectral similarity. Segment boundaries, such as verse-chorus transitions, are found by correlating a kernel along the diagonal of the matrix. Spectral statistics are then computed for each segment. In the second step, segments are clustered based on the pairwise similarity of their statistics, using a matrix decomposition approach. Finally, the audio is summarized by combining segments representing the clusters most frequently repeated throughout the piece. We present results on a small corpus showing more than 90% correct detection of verse and chorus segments.
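
The first step described above (similarity matrix, then kernel correlation along the diagonal) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation evaluated here: the abstract specifies neither the similarity measure nor the kernel shape, so cosine similarity and an unweighted checkerboard kernel are assumed, and the names `similarity_matrix`, `checkerboard_kernel`, `novelty_score`, and `half_width` are hypothetical. The clustering and summarization steps are not shown.

```python
import numpy as np


def similarity_matrix(features):
    """Pairwise cosine similarity between frames (one frame per row).

    Cosine similarity is an assumption; the text only says
    "inter-frame spectral similarity".
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against silent frames
    return unit @ unit.T


def checkerboard_kernel(half_width):
    """Square kernel of side 2 * half_width (an assumed kernel shape).

    The two diagonal quadrants are +1 (similarity within each side of a
    candidate boundary); the two off-diagonal quadrants are -1
    (similarity across the boundary).
    """
    sign = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(sign, sign)


def novelty_score(S, half_width):
    """Correlate the kernel along the main diagonal of S.

    score[i] is large when frames just before i and frames from i onward
    are each self-similar but dissimilar to one another. Zero padding is
    used at the edges, so scores near the start and end are less reliable.
    """
    kernel = checkerboard_kernel(half_width)
    n = S.shape[0]
    padded = np.pad(S, half_width, mode="constant")
    score = np.zeros(n)
    for i in range(n):
        window = padded[i:i + 2 * half_width, i:i + 2 * half_width]
        score[i] = np.sum(window * kernel)
    return score


# Toy input: 20 frames of one "spectrum" followed by 20 frames of another.
frames = np.vstack([np.tile([1.0, 0.0], (20, 1)),
                    np.tile([0.0, 1.0], (20, 1))])
S = similarity_matrix(frames)
score = novelty_score(S, half_width=4)
boundary = int(np.argmax(score))  # frame index where the second segment starts
```

On real audio, the rows of `frames` would be per-frame spectral features, and segment boundaries would be taken at peaks of `score` rather than at a single maximum.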