Video Keyframes

Accessing video through a timeline + keyframes interface

The video keyframe tools provide access to a video through keyframes placed along its timeline.

Our approach to selecting keyframes from a video starts by extracting video frames of high quality, that is, frames that are sharp and sufficiently bright. These extracted frames form the starting point for our other algorithms, such as the one that produces a video Manga.
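The page does not say how sharpness and brightness are measured. As a minimal sketch, assuming grayscale frames given as lists of pixel rows, the filter below uses mean luminance for brightness and the variance of a 4-neighbour Laplacian for sharpness. Both measures, the function names, and the threshold values are hypothetical choices for illustration, not the published method.

```python
from typing import List, Sequence

# A grayscale frame: a list of pixel rows with luminance values in [0, 255].
Frame = List[List[float]]


def brightness(frame: Frame) -> float:
    """Mean luminance of the frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def sharpness(frame: Frame) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    responses = []
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[y]) - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1] - 4 * frame[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def select_quality_frames(frames: Sequence[Frame],
                          min_brightness: float = 40.0,
                          min_sharpness: float = 100.0) -> List[int]:
    """Indices of frames that are bright enough and sharp enough.

    The default thresholds are hypothetical and would need tuning.
    """
    return [i for i, f in enumerate(frames)
            if brightness(f) >= min_brightness and sharpness(f) >= min_sharpness]
```

A uniformly dark frame fails the brightness test and a flat gray frame fails the sharpness test, so only a textured, well-lit frame survives the filter.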

We use hierarchical agglomerative clustering to select keyframes from the extracted video frames. Our algorithm divides the cluster tree into as many clusters as there are keyframes needed for the presentation, and one keyframe is selected from each cluster. For temporal presentations such as timelines, keyframes are selected from the clusters so that the selected keyframes are distributed roughly uniformly in time.
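The two steps above can be sketched under stated assumptions. Each frame is summarized by a hypothetical feature vector (for example, a color histogram). Clusters are merged with complete linkage on Euclidean distance, and merging stops when k clusters remain, which is equivalent to cutting the tree into k clusters. The temporal selection step is one simple heuristic, not necessarily the one we use: clusters are ordered by median time, and each cluster is matched to one of k evenly spaced target times. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def complete_linkage(a: List[int], b: List[int], feats: Sequence[Vector]) -> float:
    """Distance between two clusters: the largest pairwise member distance."""
    return max(math.dist(feats[i], feats[j]) for i in a for j in b)


def agglomerative_clusters(feats: Sequence[Vector], k: int) -> List[List[int]]:
    """Merge the closest pair of clusters until k remain (a cut of the tree at k)."""
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = complete_linkage(clusters[p], clusters[q], feats)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def timeline_keyframes(times: Sequence[float], clusters: List[List[int]]) -> List[int]:
    """One frame per cluster, aiming for roughly evenly spaced times.

    Hypothetical heuristic: order clusters by median time, match the i-th
    cluster to the i-th of k evenly spaced target times, and pick the
    member nearest that target.
    """
    k = len(clusters)
    start, end = min(times), max(times)

    def median_time(cluster: List[int]) -> float:
        ts = sorted(times[i] for i in cluster)
        return ts[len(ts) // 2]

    picks = []
    for slot, cluster in enumerate(sorted(clusters, key=median_time)):
        target = start + (slot + 0.5) * (end - start) / k
        picks.append(min(cluster, key=lambda i: abs(times[i] - target)))
    return sorted(picks, key=lambda i: times[i])
```

For a non-temporal presentation, the representative could instead be chosen purely by similarity, for example the member closest to the cluster's other members.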

Here we present two user interfaces that make use of the selected keyframes. One lets the user browse through a visual-temporal cluster tree to quickly locate a video clip of interest. The other uses keyframes to aid navigation in a video player.

Video Keyframe Browser

We created a novel interface for browsing through a video keyframe hierarchy to find frames or clips. We developed algorithms for selecting quality keyframes and for clustering keyframes hierarchically. At each level of the hierarchy, a single representative keyframe from each cluster is shown. Users can drill down into the most promising cluster and view representative keyframes for its sub-clusters. Our clustering algorithms optimize for short navigation paths to the desired keyframe. The ICMR 2011 publication listed under Related Publications provides additional details.
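To make the drill-down navigation concrete, here is a minimal sketch of a keyframe hierarchy. It deliberately substitutes a simpler technique for our similarity-based clustering. The keyframes are split recursively into at most `branching` contiguous, nearly equal groups, and the temporally middle keyframe represents each group. In a balanced tree like this, reaching any of n keyframes takes about ceil(log_b n) drill-down steps for branching factor b. For example, 64 keyframes with b = 4 need 3 steps. The names `Node`, `build_hierarchy`, and `drill_down_path` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    frames: List[int]  # keyframe indices covered by this node, in temporal order
    children: List["Node"] = field(default_factory=list)

    @property
    def representative(self) -> int:
        # Hypothetical choice: the temporally middle keyframe of the node.
        return self.frames[len(self.frames) // 2]


def build_hierarchy(frames: List[int], branching: int = 4) -> Node:
    """Recursively split frames into at most `branching` near-equal contiguous groups."""
    node = Node(frames)
    if len(frames) > 1:
        size = -(-len(frames) // branching)  # ceiling division
        for i in range(0, len(frames), size):
            node.children.append(build_hierarchy(frames[i:i + size], branching))
    return node


def drill_down_path(root: Node, target: int) -> List[List[int]]:
    """Representatives shown at each level while drilling down toward `target`."""
    path, node = [], root
    while node.children:
        path.append([c.representative for c in node.children])
        node = next(c for c in node.children if target in c.frames)
    return path
```

Drilling toward keyframe 37 of 64 shows the representatives 8, 24, 40, and 56 at the top level and reaches a single keyframe after three levels.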

The blue part of the timeline is linear and indicates the part of the video that is represented by the keyframes in the center, initially the whole video. The rest of the timeline is non-linear, with darker shades of gray indicating more compressed time. While the mouse is over a keyframe, the corresponding video segment is highlighted in yellow on the timeline. Thin blue lines connect the keyframes to their corresponding times on the timeline.
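One way to realize such a focus-plus-context timeline is a piecewise-linear mapping from video time to horizontal position. The sketch below makes two assumptions. First, the focus segment always occupies a fixed fraction of the timeline width. Second, the time before and after the focus segment shares the remaining width in proportion to its duration. The actual timeline may use a different, smoother compression. The function name and the parameters `width` and `focus_fraction` are hypothetical, and the focus segment is assumed to have nonzero length.

```python
def timeline_x(t: float, duration: float, focus_start: float, focus_end: float,
               width: float = 1000.0, focus_fraction: float = 0.5) -> float:
    """Map video time t (seconds) to a horizontal position on the timeline.

    The focus segment [focus_start, focus_end] is drawn linearly across
    focus_fraction of the width. The time outside it shares the remaining
    width in proportion to its duration, so it is compressed.
    """
    focus_w = width * focus_fraction
    outside = duration - (focus_end - focus_start)
    if outside <= 0:  # the focus covers the whole video: plain linear timeline
        return width * t / duration
    side_scale = (width - focus_w) / outside  # pixels per second outside the focus
    left_w = focus_start * side_scale
    if t < focus_start:
        return t * side_scale
    if t <= focus_end:
        return left_w + (t - focus_start) * focus_w / (focus_end - focus_start)
    return left_w + focus_w + (t - focus_end) * side_scale
```

For a 100-second video with the focus on seconds 40 to 60, the focus gets 25 pixels per second and the rest of the timeline gets only 6.25 pixels per second. That lower pixel density is what the darker gray shading conveys.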

Move the mouse across the images to enlarge them. While the pointer is over a keyframe, scroll the mouse wheel forward by one tick to zoom in on that video segment; scroll backward to zoom out. Mouse clicks are intended for video playback but are not functional in this demonstration. On a touch screen, drag across the images to enlarge them, drag up to see additional keyframes, and drag down to zoom out.

Video Keyframe Player

We created an interactive video keyframe player based on our keyframe selection algorithm. Keyframes are attached to the timeline and appear on mouse-over or touch. In the video player shown below, only the keyframes near the playback position or the mouse position on the timeline are shown. This approach supports navigation near the current position. The player can easily be switched to show keyframes with increasingly larger gaps that cover the whole video.
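The two keyframe display modes can be sketched as follows, assuming keyframe times in seconds. `nearby_keyframes` returns the keyframes closest to a position. `expanding_keyframes` places targets at gaps that grow geometrically (doubling by default) away from the position, so that a few keyframes still reach both ends of the video. Both function names and the doubling factor are hypothetical choices, not the player's actual code.

```python
from typing import List, Sequence


def nearby_keyframes(times: Sequence[float], position: float, count: int) -> List[float]:
    """The `count` keyframe times closest to `position`, in temporal order."""
    closest = sorted(times, key=lambda t: abs(t - position))[:count]
    return sorted(closest)


def expanding_keyframes(times: Sequence[float], position: float,
                        first_gap: float, growth: float = 2.0) -> List[float]:
    """Keyframes at gaps that grow geometrically away from `position`.

    Each side stops at the first or last keyframe, so the whole video is covered.
    """
    if first_gap <= 0 or growth <= 1:
        raise ValueError("need first_gap > 0 and growth > 1")
    ordered = sorted(times)
    picks = {min(ordered, key=lambda t: abs(t - position))}  # keyframe at the position
    for direction in (-1, 1):
        offset = first_gap
        while True:
            target = position + direction * offset
            if target <= ordered[0]:
                picks.add(ordered[0])
                break
            if target >= ordered[-1]:
                picks.add(ordered[-1])
                break
            picks.add(min(ordered, key=lambda t: abs(t - target)))
            offset *= growth
    return sorted(picks)
```

With keyframes every 10 seconds in a 100-second video and the position at 50 s, `expanding_keyframes(times, 50, 10)` returns the keyframes at 0, 10, 30, 40, 50, 60, 70, 90, and 100 s. The result is dense near the position and sparser toward the ends.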

The video player below is fully functional and can be controlled both with a mouse and on a touch device such as a mobile phone or a tablet.

Video from the NudgeCam project

Moving the mouse over the timeline or touching it makes the keyframes appear. Clicking on a keyframe moves the playback position to that time. Dragging the timeline thumb also changes the playback position. Both operations can be performed either when the video is paused or when it is playing. The larger dots in the timeline indicate the positions of the currently visible keyframes and the smaller dots indicate the positions of the other keyframes. The keyframe under the mouse is indicated in yellow.

Related Publications

Publication Details
  • ACM International Conference on Multimedia Retrieval (ICMR)
  • Apr 17, 2011


User-generated video from mobile phones, digital cameras, and other devices is increasing, yet people rarely want to watch all the captured video. More commonly, users want a single still image for printing or a short clip from the video for creating a panorama or for sharing. Our interface aims to help users search through video for these images or clips more efficiently than fast-forwarding or "scrubbing" through a video by dragging along a slider. It is based on a hierarchical structure of keyframes in the video, and combines a novel user interface design for browsing a video segment tree with new algorithms for keyframe selection, segment identification, and clustering. These algorithms take into account the need for quality keyframes and balance the desire for short navigation paths against the desire for similarity-based clusters. Our user interface presents keyframe hierarchies and displays visual cues that keep the user oriented while browsing the video. The system adapts to the task by using a non-temporal clustering algorithm when the user wants a single image. When the user wants a video clip, the system selects one of two temporal clustering algorithms based on a measure of the repetitiveness of the video. User feedback provided us with valuable suggestions for improvements to our system.
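The abstract does not define the repetitiveness measure or name the two temporal clustering algorithms. The sketch below only illustrates the task-dependent dispatch the abstract describes. It uses a hypothetical repetitiveness score: among visually similar frame pairs, the fraction that lie far apart in time, which indicates a recurring scene. The distance thresholds, the decision threshold, the function names, and the returned labels are all assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def repetitiveness(feats: Sequence[Vector], times: Sequence[float],
                   similar_within: float, min_gap: float) -> float:
    """Hypothetical measure: of all visually similar frame pairs, the fraction
    that lie at least min_gap seconds apart (a scene that recurs)."""
    similar = recurring = 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if math.dist(feats[i], feats[j]) <= similar_within:
                similar += 1
                if abs(times[i] - times[j]) >= min_gap:
                    recurring += 1
    return recurring / similar if similar else 0.0


def choose_clustering(task: str, repetitiveness_score: float,
                      threshold: float = 0.3) -> str:
    """Pick a clustering strategy by task (hypothetical labels and threshold)."""
    if task == "image":  # the user wants a single still image
        return "non-temporal clustering"
    if task == "clip":  # the user wants a clip: one of two temporal algorithms
        if repetitiveness_score >= threshold:
            return "temporal clustering for repetitive video"
        return "temporal clustering for non-repetitive video"
    raise ValueError(f"unknown task: {task!r}")
```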

Discriminative Techniques for Keyframe Selection

Publication Details
  • 2005 IEEE International Conference on Multimedia & Expo
  • Jul 6, 2005


A convenient representation of a video segment is a single keyframe. Keyframes are widely used in applications such as non-linear browsing and video editing. With existing methods of keyframe selection, similar video segments result in very similar keyframes, with the drawback that actual differences between the segments may be obscured. We present methods for keyframe selection based on two criteria: capturing the similarity to the represented segment, and preserving the differences from other segment keyframes, so that different segments will have visually distinct representations. We present two discriminative keyframe selection methods, and an example of experimental results.
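Here is a minimal sketch of the two criteria, assuming cosine similarity between hypothetical per-frame feature vectors. Each candidate frame is scored by its mean similarity to its own segment minus its mean similarity to the frames of all other segments. The highest-scoring frame becomes the keyframe. This simple difference is an illustration only and is not necessarily either of the two methods presented in the paper. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; a hypothetical choice of frame similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def mean_similarity(frame: Vector, others: Sequence[Vector]) -> float:
    """Average similarity of one frame to a set of frames (0 if the set is empty)."""
    return sum(cosine(frame, g) for g in others) / len(others) if others else 0.0


def discriminative_keyframes(segments: List[List[Vector]]) -> List[int]:
    """For each segment, the index of the frame that maximizes
    (mean similarity to its own segment) - (mean similarity to all other segments)."""
    picks = []
    for s, seg in enumerate(segments):
        others = [f for t, other in enumerate(segments) if t != s for f in other]
        scores = [mean_similarity(f, seg) - mean_similarity(f, others) for f in seg]
        picks.append(max(range(len(seg)), key=scores.__getitem__))
    return picks
```

Suppose a frame that looks like (1, 1) occurs in both segments, as in `discriminative_keyframes([[(1, 0), (1, 1)], [(0, 1), (1, 1)]])`. Similarity to the own segment alone cannot tell the two candidates in each segment apart. The discriminative score avoids the shared-looking frame and returns `[0, 0]`, so the two segments receive visually distinct keyframes.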

Publication Details
  • IEEE Computer, 34(9), pp. 61-67
  • Sep 1, 2001



To meet the diverse needs of business, education, and personal video users, the authors developed three visual interfaces that help identify potentially useful or relevant video segments. In such interfaces, keyframes (still images automatically extracted from video footage) can distinguish videos, summarize them, and provide access points. Well-chosen keyframes enhance a listing's visual appeal and help users select videos. Keyframe selection can vary depending on the application's requirements: a visual summary of a video-captured meeting may require only a few highlight keyframes, a video editing system might need a keyframe for every clip, and a browsing interface requires an even distribution of keyframes over the video's full length. The authors conducted user studies for each of their three interfaces, gathering input for subsequent interface improvements. The studies revealed that finding a similarity measure that groups video clips in a way that more closely matches human perception poses a challenge. Another challenge is to further improve the video segmentation algorithm used for selecting keyframes. A new version will provide users with more information and control without sacrificing the interface's ease of use.

Publication Details
  • Multimedia Tools and Applications, 11(3), pp. 347-358, 2000
  • Aug 1, 2000


In accessing large collections of digitized videos, it is often difficult to find both the appropriate video file and the portion of the video that is of interest. This paper describes a novel technique for determining keyframes that are different from each other and provide a good representation of the whole video. We use keyframes to distinguish videos from each other, to summarize videos, and to provide access points into them. The technique can determine any number of keyframes by clustering the frames in a video and by selecting a representative frame from each cluster. Temporal constraints are used to filter out some clusters and to determine the representative frame for a cluster. Desirable visual features can be emphasized in the set of keyframes. An application for browsing a collection of videos makes use of the keyframes to support skimming and to provide visual summaries.
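The abstract does not spell out the temporal constraints. The sketch below is one hypothetical illustration. It splits each cluster's frames into runs of consecutive frames and drops clusters whose longest run is shorter than `min_run` frames. For each remaining cluster, the middle frame of the longest run serves as the representative. The constraint, the function names, and the parameter name are assumptions.

```python
from typing import List


def runs(indices: List[int]) -> List[List[int]]:
    """Split frame indices into runs of consecutive frames."""
    out: List[List[int]] = []
    for i in sorted(indices):
        if out and i == out[-1][-1] + 1:
            out[-1].append(i)
        else:
            out.append([i])
    return out


def representatives(clusters: List[List[int]], min_run: int) -> List[int]:
    """Hypothetical temporal constraints: skip clusters whose longest run of
    consecutive frames is shorter than min_run; otherwise use the middle
    frame of the longest run as the cluster's representative."""
    reps = []
    for cluster in clusters:
        longest = max(runs(cluster), key=len)
        if len(longest) >= min_run:
            reps.append(longest[len(longest) // 2])
    return sorted(reps)
```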

Publication Details
  • Proceedings of the IEEE International Conference on Multimedia and Expo, vol. III, pp. 1329-1332, 2000
  • Jul 30, 2000


We describe a genetic segmentation algorithm for video. This algorithm operates on segments of a string representation. It is similar to both classical genetic algorithms that operate on bits of a string and genetic grouping algorithms that operate on subsets of a set. For evaluating segmentations, we define similarity adjacency functions, which are extremely expensive to optimize with traditional methods. The evolutionary nature of genetic algorithms offers a further advantage by enabling incremental segmentation. Applications include video summarization and indexing for browsing, plus adapting to user access patterns.
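The following sketch is a rough illustration of a genetic algorithm operating on a string representation of a segmentation. A segmentation is encoded as a bit string in which bit i set means a segment boundary before frame i + 1. The population evolves with one-point crossover, bit-flip mutation, and elitism. The fitness function is a hypothetical stand-in for the paper's similarity adjacency functions. It penalizes within-segment variation of a one-dimensional per-frame signal and adds a fixed cost per segment. The population size, mutation rate, penalty, and function names are arbitrary choices.

```python
import random
from typing import List, Sequence


def fitness(bits: Sequence[int], signal: Sequence[float], penalty: float) -> float:
    """Hypothetical fitness (higher is better): minus the within-segment squared
    error of the signal, minus a fixed penalty per segment.
    bits[i] == 1 means a segment boundary before frame i + 1."""
    boundaries = [i + 1 for i, bit in enumerate(bits) if bit] + [len(signal)]
    cost, start = 0.0, 0
    for end in boundaries:
        seg = signal[start:end]
        mean = sum(seg) / len(seg)
        cost += sum((x - mean) ** 2 for x in seg) + penalty
        start = end
    return -cost


def genetic_segmentation(signal: Sequence[float], penalty: float = 5.0,
                         pop_size: int = 30, generations: int = 200,
                         mutation_rate: float = 0.05, seed: int = 0) -> List[int]:
    """Evolve boundary bit strings with one-point crossover, bit-flip mutation,
    and elitism (the better half of each generation survives unchanged).
    Assumes a non-empty signal and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(signal) - 1  # one bit per possible boundary position

    def score(ind: List[int]) -> float:
        return fitness(ind, signal, penalty)

    # Seed the unsegmented string so the result is never worse than no boundaries.
    population = [[0] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = elite + children
    return max(population, key=score)
```

The unsegmented individual is seeded into the population, and the better half always survives. As a result, the returned segmentation is never worse than no segmentation at all. On a step signal such as ten 0s followed by ten 10s, the best possible bit string has a single boundary before frame 10.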