Structural characterization of popular music


This page contains digital audio data that is reproduced here under the "Fair Use" clause of the 1976 Copyright Act (17 USCS §107).

In this experiment, we compute summaries of pop and rock songs.  Our aim is to construct summaries for use in database applications, both to help users browse individual audio files and to serve as proxies for indexing, searching, and retrieval.  The summarization is based on a complete structural characterization of the piece.  We extract the two segment clusters that are most frequently repeated in the song.  In contrast to existing methods, the measure of repetition does not depend on segment length.
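
As a rough illustration of a repetition measure that ignores segment length, the sketch below ranks clusters by how many times they occur rather than by how much time they cover.  This page does not state the actual measure, so the occurrence count is an assumption made for illustration, and the segment list, labels, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical segmentation output: (cluster_label, start_sec, end_sec).
segments = [
    ("intro", 0.0, 40.0),
    ("A", 40.0, 55.0),
    ("B", 55.0, 70.0),
    ("A", 70.0, 85.0),
    ("B", 85.0, 100.0),
    ("solo", 100.0, 150.0),
    ("A", 150.0, 165.0),
    ("B", 165.0, 180.0),
]


def dominant_clusters(segments, k=2):
    """Return the k cluster labels that occur most often, ignoring duration.

    Assumption: repetition is measured as an occurrence count.
    """
    counts = Counter(label for label, _, _ in segments)
    return [label for label, _ in counts.most_common(k)]


def longest_clusters(segments, k=2):
    """For contrast only: rank clusters by total duration instead."""
    totals = Counter()
    for label, start, end in segments:
        totals[label] += end - start
    return [label for label, _ in totals.most_common(k)]


by_count = dominant_clusters(segments)
by_duration = longest_clusters(segments)
```

In this toy song the count-based ranking selects the two repeated clusters A and B, while the duration-based ranking is pulled toward the single long solo.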


The table below shows results for a few songs.  The columns show the summary elements.  Segments I and II are representative segments for the two dominant clusters in the song, selected based on global similarity to the song's segments.  They are ordered by occurrence in the piece, so that Segment I is the predicted verse segment and Segment II is the predicted chorus segment, assuming that the verse and chorus are the dominant segment clusters detected by the automatic algorithm.  The 2 Segment Summary is the most globally similar contiguous combination of a segment from the verse cluster and a segment from the chorus cluster.  If one or more segments (e.g., a lead-in segment) fall between the verse and chorus in every verse/chorus occurrence in the piece, they are included so that the summary is contiguous.
Song (Title / Artist) | Entire Song | Segment I | Segment II | 2 Segment Summary
Wild Honey / U2 | MP3 | MP3 | MP3 | MP3
Lucy in the Sky with Diamonds / The Beatles | MP3 | MP3 | MP3 | MP3
The Magical Mystery Tour / The Beatles | MP3 | MP3 | MP3 | MP3
Optimistic / Radiohead | MP3 | MP3 | MP3 | MP3
Hash Pipe / Weezer | MP3 | MP3 | MP3 | MP3
Bohemian Like You / The Dandy Warhols | MP3 | MP3 | MP3 | MP3
Tahitian Moon / Porno for Pyros | MP3 | MP3 | MP3 | MP3
The Zephyr Song / The Red Hot Chili Peppers | MP3 | MP3 | MP3 | MP3
I Did It / The Dave Matthews Band | MP3 | MP3 | MP3 | MP3
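
The selection of Segments I and II can be sketched as follows.  This is a minimal sketch, not the implementation from the papers: it assumes a precomputed segment-to-segment similarity matrix, takes "global similarity" to mean the average similarity to all segments of the song, and uses made-up similarity values, labels, and function names.

```python
# Hypothetical time-ordered cluster labels, one per segment.
labels = ["intro", "A", "B", "A", "B"]

# Hypothetical symmetric similarity matrix between the five segments.
similarity = [
    [1.0, 0.2, 0.1, 0.2, 0.1],
    [0.2, 1.0, 0.3, 0.8, 0.3],
    [0.1, 0.3, 1.0, 0.5, 0.9],
    [0.2, 0.8, 0.5, 1.0, 0.4],
    [0.1, 0.3, 0.9, 0.4, 1.0],
]


def representative_segment(similarity, labels, cluster):
    """Index of the cluster member with the highest mean similarity
    to all segments of the song (assumed meaning of 'global similarity')."""
    members = [i for i, label in enumerate(labels) if label == cluster]
    if not members:
        raise ValueError(f"no segment carries label {cluster!r}")

    def global_similarity(i):
        return sum(similarity[i]) / len(similarity[i])

    return max(members, key=global_similarity)


def order_by_first_occurrence(labels, clusters):
    """Order clusters by where they first appear in the piece, so the
    earlier one is treated as the predicted verse."""
    return sorted(clusters, key=labels.index)


verse, chorus = order_by_first_occurrence(labels, ["B", "A"])
segment_one = representative_segment(similarity, labels, verse)
segment_two = representative_segment(similarity, labels, chorus)
```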

To review segmentation results for a subset of the above songs (included in [2]), see this page.

For comparison, a 30-second summary from Tahitian Moon

For comparison, a 30-second summary from Hash Pipe

For comparison, a 30-second summary from I Did It


Technical details
The approach is fully documented in the papers below.  A basic flowchart appears below.  The first step is to segment the digital audio into its major components.  For pop music, these are typically verse, chorus, bridge, etc.  The segments are then statistically clustered using spectral methods, and the dominant segment clusters are determined.  Representative segments are selected from the dominant segment clusters to comprise the summary.  Using the time-ordering of the segments, the verse and chorus clusters are predicted.  A final summary consisting of adjacent "verse/chorus" segments is also provided.  In the event that the verse and chorus are not adjacent anywhere in the piece, we include the intermediate segments.  Because each segment is assigned to a cluster, many other forms of summary can be provided, according to the application context or bandwidth constraints.
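
The final "verse/chorus" step can be illustrated with a short sketch.  It assumes segmentation and clustering have already produced a time-ordered list of cluster labels and that the verse and chorus clusters have been predicted; the labels and function names are hypothetical, and the choice of the most globally similar candidate is not shown.

```python
def verse_chorus_spans(labels, verse, chorus):
    """Return inclusive (start, end) index pairs that run from a verse
    segment to the next chorus segment, keeping anything in between."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == verse:
            start = i  # restart from the most recent verse
        elif label == chorus and start is not None:
            spans.append((start, i))
            start = None
    return spans


def candidate_summaries(labels, verse, chorus):
    """Prefer spans where verse and chorus are adjacent; fall back to
    spans with intermediate segments only if no adjacent pair exists."""
    spans = verse_chorus_spans(labels, verse, chorus)
    adjacent = [span for span in spans if span[1] - span[0] == 1]
    return adjacent if adjacent else spans


# Hypothetical song with adjacent verse/chorus pairs.
song_a = ["intro", "A", "B", "A", "L", "B", "solo", "A", "B"]
# Hypothetical song with a lead-in segment "L" before every chorus.
song_b = ["A", "L", "B", "A", "L", "B"]

candidates_a = candidate_summaries(song_a, "A", "B")
candidates_b = candidate_summaries(song_b, "A", "B")
```

For the second song, every candidate spans three segments, since the lead-in is kept to make the summary contiguous.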

[1] J. Foote and M. Cooper. Media Segmentation using Self-Similarity Decomposition. Proc. SPIE, 5021:167-75, 2003.
This paper provides a description of the approach, with a single example for exposition.

[2] M. Cooper and J. Foote. Summarizing Popular Music via Structural Similarity Analysis. Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2003.
This paper provides an overview of the approach and more complete experimental results.

Technical Contact: Matt Cooper

Related Publications

Copyright ©1999-2014 FX Palo Alto Laboratory