Manga

Interactive video summaries in a comic book style

Manga is an interactive video summary that provides video playback from the keyframes comprising the summary.

What is Manga?

Manga is a pictorial video summary named after the Japanese word for “comic book”. A video is automatically analyzed and represented by keyframes of different sizes, packed in a visually pleasing layout reminiscent of a comic book. Video Manga gives users a quick overview of a video’s contents at a glance, without watching the video from beginning to end. The visual summaries are suitable for printing, and they can also help users browse through videos.

To generate a Manga, we start with high-quality video keyframes. A video is segmented based on the color features of each keyframe. The segments are also clustered according to their similarities. We have introduced an importance score to rank the segments. A segment is considered to be important if it is long and rare. Keyframes are selected from highly ranked segments and sized according to their scores so that more important keyframes are presented as bigger frames.
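As a rough illustration of the ranking step, the sketch below scores each segment by its length multiplied by the logarithm of the inverse cluster weight, so that long segments from rare clusters score highest. The exact formula, the size thresholds, and all function names are assumptions made for illustration; the description above only states that important segments are long and rare and that more important keyframes are shown larger.

```python
import math
from collections import defaultdict


def importance_scores(segments):
    """Score segments given as (cluster_id, length_seconds) in temporal order.

    Assumed score: length * log(1 / cluster_weight), where cluster_weight is
    the fraction of the total video time covered by the segment's cluster.
    """
    total = sum(length for _, length in segments)
    cluster_time = defaultdict(float)
    for cluster, length in segments:
        cluster_time[cluster] += length
    return [
        length * math.log(total / cluster_time[cluster])
        for cluster, length in segments
    ]


def frame_size(score, max_score, sizes=(1, 2, 3)):
    """Map a score to one of a few discrete frame sizes (hypothetical thresholds)."""
    if max_score <= 0:
        return sizes[0]
    ratio = score / max_score
    index = min(int(ratio * len(sizes)), len(sizes) - 1)
    return sizes[index]


segments = [("a", 10), ("b", 30), ("a", 50), ("c", 10)]
scores = importance_scores(segments)
frame_sizes = [frame_size(s, max(scores)) for s in scores]
```

In this toy input, cluster "a" covers 60% of the video, so even its 50-second segment scores below the 30-second segment from the rarer cluster "b".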

Our frame-packing algorithm puts the different-sized keyframes in a compact “comic book” format such that images are placed in rows in temporal order from top-left to bottom-right. Within a row, there is some freedom for placing images but the overall order is maintained. The packing algorithm has to change the size of some of the images to avoid gaps and to fill the available space.
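The idea of packing in temporal order while resizing frames to avoid gaps can be illustrated with a much simplified, one-dimensional stand-in. The sketch below is not the packing algorithm described above: it only considers frame widths in integer grid units, shrinks a frame when it does not fit into the space left in its row, and widens the last frame to fill the final row. The function name and the grid-unit model are assumptions.

```python
def pack_rows(widths, row_width):
    """Greedily pack positive integer frame widths into rows of row_width units.

    Frames keep their temporal order. A frame that does not fit into the
    remaining space of the current row is shrunk to fill it, and the last
    frame of the final row is widened to absorb any leftover space.
    Returns a list of rows, each a list of (frame_index, width) pairs.
    """
    if row_width < 1:
        raise ValueError("row_width must be at least 1")
    rows, current, free = [], [], row_width
    for index, width in enumerate(widths):
        if free == 0:
            # The current row is full, so start a new one.
            rows.append(current)
            current, free = [], row_width
        placed = min(width, free)  # shrink the frame if the gap is smaller
        current.append((index, placed))
        free -= placed
    if current:
        if free > 0:
            last_index, last_width = current[-1]
            current[-1] = (last_index, last_width + free)
        rows.append(current)
    return rows


layout = pack_rows([2, 1, 3, 1, 2], row_width=4)
```

Every row in the result sums to exactly `row_width`, which mirrors the goal of filling the available space without gaps. A realistic layout would also have to handle frame heights and limit how far a frame may be resized.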

Manga on the Web

A previous version of Video Manga was implemented as a Java applet that was included in the Fuji Xerox product MediaDEPO. Recently, we implemented an interactive version of the pictorial summary using HTML5 technologies. This version will be included in a future version of MediaDEPO.

Moving the mouse over the displayed frames highlights the frame and the corresponding segment in the timeline. This allows users to explore the temporal properties of a video. Clicking on a keyframe starts video playback from the beginning of that segment. We have also implemented a way to explore segments that are not presented at the top level of the summary.
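Both interactions rely on mapping a point in time to the segment that contains it. A minimal sketch of that lookup, assuming segment start times are kept in a sorted list (the function names and the data layout are hypothetical, and the actual implementation uses HTML5 technologies rather than Python):

```python
from bisect import bisect_right


def segment_at(segment_starts, time_seconds):
    """Return the index of the segment that contains time_seconds.

    segment_starts must be sorted in ascending order, with the first entry
    at the start of the video.
    """
    index = bisect_right(segment_starts, time_seconds) - 1
    return max(index, 0)


def playback_start(segment_starts, keyframe_time):
    """Return the start time of the segment containing the keyframe."""
    return segment_starts[segment_at(segment_starts, keyframe_time)]


starts = [0.0, 12.5, 40.0]
hovered_segment = segment_at(starts, 13.0)
click_position = playback_start(starts, 39.9)
```

The same lookup serves both directions: a timeline position yields the segment to highlight, and a clicked keyframe's timestamp yields the position at which playback should begin.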

Video from the NudgeCam project

The Manga video summary above is fully interactive. Keyframes under the mouse expand. Moving the mouse along the timeline also expands the corresponding keyframes. Single-clicking on a keyframe starts video playback at that position. Double-clicking on a keyframe displays a more detailed view of that part of the video. On touch devices, touch-dragging takes the place of the mouse hover. Single and double taps have the same effects as mouse clicks.

Consistent keyframe selection

Manga video summaries of different sizes can use very different keyframes. This is visually confusing when a user resizes a Manga video summary. For video content owners who present their content as Manga summaries, it also creates uncertainty about what users will see when viewing those summaries on different display sizes. This uncertainty makes it impractical to attach additional information to particular keyframes, because those keyframes may not appear in some summaries.

The solution is to use all keyframes from a smaller summary in a larger one and to select additional keyframes for the larger summary. Importance scores from the smaller summary are propagated to the larger one to keep image sizes similar.
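One way to picture this nesting is sketched below. It assumes that each summary size comes with its own independently computed keyframe scores, builds the summaries from smallest to largest, carries over every keyframe and its score from the smaller summary, and fills the remaining slots with the best-scoring new keyframes. The data layout and the function name are assumptions for illustration only.

```python
def consistent_summaries(scores_by_count):
    """Build nested keyframe selections.

    scores_by_count maps a keyframe count to {keyframe_id: score}, the scores
    an independent selection for that count would assign (assumed input).
    Returns {count: {keyframe_id: score}} where every larger summary contains
    all keyframes of the smaller ones, with their scores carried over.
    """
    result = {}
    kept = {}  # keyframes and scores inherited from smaller summaries
    for count in sorted(scores_by_count):
        scores = scores_by_count[count]
        chosen = dict(kept)
        extras = sorted(
            (key for key in scores if key not in chosen),
            key=lambda key: scores[key],
            reverse=True,
        )
        for key in extras[: max(0, count - len(chosen))]:
            chosen[key] = scores[key]
        result[count] = chosen
        kept = chosen
    return result


summaries = consistent_summaries({
    2: {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2},
    3: {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.8},
})
```

In this example an independent three-frame selection would have preferred "c" and "d" over "a" and "b"; the nested version keeps "a" and "b" with their original scores and adds only "c".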

This memorandum provides more details about the approach.

Related Publications

2003
Publication Details
  • IEEE International Conference on Multimedia and Expo, v. II, pp. 77-80
  • Jul 7, 2003

Abstract

We created an improved layout algorithm for automatically generating visual video summaries reminiscent of comic book pages. The summaries are comprised of images from the video that are sized according to their importance. The algorithm performs a global optimization with respect to a layout cost function that encompasses features such as the number of resized images and the amount of whitespace in the presentation. The algorithm creates summaries that: always fit exactly into the requested area, are varied by containing few rows with images of the same size, and have little whitespace at the end of the last row. The layout algorithm is fast enough to allow the interactive resizing of the summaries and the subsequent generation of a new layout.
2002
Publication Details
  • Proceedings IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, August 2002
  • Aug 26, 2002

Abstract

We present a method for rapidly and robustly extracting audio excerpts without the overhead of speech recognition or speaker segmentation. An immediate application is to automatically augment keyframe-based video summaries with informative audio excerpts associated with the video segments represented by the keyframes. Short audio clips combined with keyframes comprise an extremely lightweight and Web-browsable interface for auditioning video or similar media, without using bandwidth-intensive streaming video or audio.
2001
Publication Details
  • IEEE Computer, 34(9), pp. 61-67
  • Sep 1, 2001

Abstract


To meet the diverse needs of business, education, and personal video users, the authors developed three visual interfaces that help identify potentially useful or relevant video segments. In such interfaces, keyframes (still images automatically extracted from video footage) can distinguish videos, summarize them, and provide access points. Well-chosen keyframes enhance a listing's visual appeal and help users select videos. Keyframe selection can vary depending on the application's requirements: a visual summary of a video-captured meeting may require only a few highlight keyframes, a video editing system might need a keyframe for every clip, while a browsing interface requires an even distribution of keyframes over the video's full length. The authors conducted user studies for each of their three interfaces, gathering input for subsequent interface improvements. The studies revealed that finding a similarity measure for collecting video clips into groups that more closely match human perception poses a challenge. Another challenge is to further improve the video-segmentation algorithm used for selecting keyframes. A new version will provide users with more information and control without sacrificing the interface's ease of use.

2000
Publication Details
  • In CHI 2000 Conference Proceedings, ACM Press, pp. 185-192, 2000.
  • Mar 31, 2000

Abstract

This paper presents a method for generating compact pictorial summarizations of video. We developed a novel approach for selecting still images from a video suitable for summarizing the video and for providing entry points into it. Images are laid out in a compact, visually pleasing display reminiscent of a comic book or Japanese manga. Users can explore the video by interacting with the presented summary. Links from each keyframe start video playback and/or present additional detail. Captions can be added to presentation frames to include commentary or descriptions such as the minutes of a recorded meeting. We conducted a study to compare variants of our summarization technique. The study participants judged the manga summary to be significantly better than the other two conditions with respect to their suitability for summaries and navigation, and their visual appeal.
1999
Publication Details
  • In Proceedings ACM Multimedia, (Orlando, FL) ACM Press, pp. 383-392, 1999.
  • Oct 30, 1999

Abstract

This paper presents methods for automatically creating pictorial video summaries that resemble comic books. The relative importance of video segments is computed from their length and novelty. Image and audio analysis is used to automatically detect and emphasize meaningful events. Based on this importance measure, we choose relevant keyframes. Selected keyframes are sized by importance, and then efficiently packed into a pictorial summary. We present a quantitative measure of how well a summary captures the salient events in a video, and show how it can be used to improve our summaries. The result is a compact and visually pleasing summary that captures semantically important events, and is suitable for printing or Web access. Such a summary can be further enhanced by including text captions derived from OCR or other methods. We describe how the automatically generated summaries are used to simplify access to a large collection of videos.
Publication Details
  • In Human-Computer Interaction INTERACT '99, IOS Press, pp. 205-212, 1999.
  • Aug 30, 1999

Abstract

When reviewing collections of video such as recorded meetings or presentations, users are often interested only in an overview or short segments of these documents. We present techniques that use automatic feature analysis, such as slide detection and applause detection, to help locate the desired video and to navigate to regions of interest within it. We built a web-based interface that graphically presents information about the contents of each video in a collection such as its keyframes and the distribution of a particular feature over time. A media player is tightly integrated with the web interface. It supports navigation within a selected file by visualizing confidence scores for the presence of features and by using them as index points. We conducted a user study to refine the usability of these tools.
Publication Details
  • In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (Phoenix, AZ), vol. 6, pp. 3041-3044, 1999.
  • Mar 14, 1999

Abstract

This paper presents methods of generating compact pictorial summarizations of video. By calculating a measure of shot importance, video can be summarized by de-emphasizing or discarding less important information, such as repeated or common scenes. In contrast to other approaches that present keyframes for each shot, this measure allows summarization by presenting only the most important shots. Selected keyframes can also be resized depending on their relative importance. We present an efficient packing algorithm that constructs a pictorial representation from differently-sized keyframes. This results in a compact and visually pleasing summary reminiscent of a comic book.