Publications

FXPAL publishes in top scientific conferences and journals.

2003
Publication Details
  • CHI 2003
  • Apr 7, 2003

Abstract

Shared freeform input is a technique for facilitating note taking across devices during a meeting. Laptop users enter text with a keyboard, whereas PDA and Tablet PC users input freeform ink with their stylus. Users can quickly reuse text and freeform ink already entered by others. We show how a new technique, freeform pasting, allowed us to deal with a variety of design issues such as quick and informal ink sharing, screen real estate, privacy and mixing ink-based and textual material.

Media Segmentation using Self-Similarity Decomposition

Publication Details
  • Proc. SPIE Storage and Retrieval for Multimedia Databases, Vol. 5021, pp. 167-75
  • Jan 20, 2003

Abstract

We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music can be summarized by selecting clusters with repeated segments throughout the piece. The summaries can be customized for various applications based on the structure of the original music.
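
The first step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the checkerboard-shaped kernel, and the function names `cosine_similarity`, `similarity_matrix`, and `novelty_scores` are all assumptions chosen for illustration, and the "frames" are toy feature vectors rather than real spectral data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_matrix(frames):
    """Compare every pair of frames; entry [i][j] is their similarity."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]

def novelty_scores(S, half_width):
    """Correlate a checkerboard kernel (an assumption) along the diagonal of S.

    The kernel is +1 on the two quadrants that lie within one side of the
    candidate boundary and -1 on the two cross quadrants, so the score is
    high where the signal is self-similar on each side but dissimilar across.
    """
    n = len(S)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for i in range(-half_width, half_width):
            for j in range(-half_width, half_width):
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * S[t + i][t + j]
        scores[t] = total
    return scores

# Toy input: four frames of one "texture" followed by four of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
S = similarity_matrix(frames)
scores = novelty_scores(S, half_width=2)
boundary = max(range(len(scores)), key=scores.__getitem__)
```

On this toy input the highest score falls at frame index 4, where the second group of frames begins. A real system would pick peaks above a threshold rather than a single maximum.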

AttrActive Windows: Active Windows for Pervasive Computing Applications

Publication Details
  • ACM Intelligent User Interface (IUI) 2003, Miami Beach, FL, pp 326
  • Jan 12, 2003

Abstract

We introduce the AttrActive Windows user interface, a novel approach for presenting interactive content on large-screen, interactive digital bulletin boards. Moving away from the desktop metaphor, AttrActive Windows are dynamic, non-uniform windows that can appear in different orientations and have autonomous behaviours to attract passers-by and invite interactions.
2002
Publication Details
  • IEEE Multimedia Signal Processing Workshop
  • Dec 11, 2002

Abstract

We present a novel approach to automatically extracting summary excerpts from audio and video. Our approach is to maximize the average similarity between the excerpt and the source. We first calculate a similarity matrix by comparing each pair of time samples using a quantitative similarity measure. To determine the segment with highest average similarity, we maximize the summation of the self-similarity matrix over the support of the segment. To select multiple excerpts while avoiding redundancy, we compute the non-negative matrix factorization (NMF) of the similarity matrix into its essential structural components. We then build a summary comprised of excerpts from the main components, selecting the excerpts for maximum average similarity within each component. Variations integrating segmentation and other information are also discussed, and experimental results are presented.
Publication Details
  • ACM Multimedia 2002
  • Dec 1, 2002

Abstract

We present methods for automatic and semi-automatic creation of music videos, given an arbitrary audio soundtrack and source video. Significant audio changes are automatically detected; similarly, the source video is automatically segmented and analyzed for suitability based on camera motion and exposure. Video with excessive camera motion or poor contrast is penalized with a high unsuitability score, and is more likely to be discarded in the final edit. High quality video clips are then automatically selected and aligned in time with significant audio changes. Video clips are adjusted to match the audio segments by selecting the most suitable region of the desired length. Besides a fully automated solution, our system can also start with clips manually selected and ordered using a graphical interface. The video is then created by truncating the selected clips (preserving the high quality portions) to produce a video digest that is synchronized with the soundtrack music, thus enhancing the impact of both.
Publication Details
  • ACM Multimedia 2002
  • Dec 1, 2002

Abstract

FlySPEC is a video camera system designed for real-time remote operation. A hybrid design combines the high resolution possible using an optomechanical video camera, with the wide field of view always available from a panoramic camera. The control system integrates requests from multiple users with the result that each controls a virtual camera. The control system seamlessly integrates manual and fully automatic control. It supports a range of options from untended automatic to full manual control, and the system can learn control strategies from user requests. Additionally, the panoramic view is always available for an intuitive interface, and objects are never out of view regardless of the zoom factor. We present the system architecture, an information-theoretic approach to combining panoramic and zoomed images to optimally satisfy user requests, and experimental results that show the FlySPEC system significantly assists users in remote inspection tasks.
Publication Details
  • ACM 2002 Conference on Computer Supported Cooperative Work
  • Nov 16, 2002

Abstract

Technology can play an important role in enabling people to interact with each other. The Web is one such technology with the affordances for sharing information and for connecting people to people. In this paper, we describe the design of two social interaction Web sites for two different social groups. We review several related efforts to provide principles for creating social interaction environments and describe the specific principles that guided our design. To examine the effectiveness of the two sites, we analyze the usage data. Finally, we discuss approaches for encouraging participation and lessons learned.

Moving Markup: Repositioning Freeform Annotations

Publication Details
  • Proceedings of ACM UIST 2002
  • Oct 27, 2002

Abstract

Freeform digital ink annotation allows readers to interact with documents in an intuitive and familiar manner. Such marks are easy to manage on static documents, and provide a familiar annotation experience. In this paper, we describe an implementation of a freeform annotation system that accommodates dynamic document layout. The algorithm preserves the correct position of annotations when documents are viewed with different fonts or font sizes, with different aspect ratios, or on different devices. We explore a range of heuristics and algorithms required to handle common types of annotation, and conclude with a discussion of possible extensions to handle special kinds of annotations and changes to documents.
Publication Details
  • IEEE InfoVis '02 Interactive Poster and Demo
  • Oct 27, 2002

Abstract

This work presents constructs called interactive space-time maps along with an application called the SpaceTime Browser for visualizing and retrieving documents. A 3D visualization with 2D planar maps and a time line is employed. Users can select regions on the maps and choose precise time intervals by sliding the maps along the telescopic time line. Regions are highlighted to indicate the presence of documents with matching space-time attributes, and documents are retrieved and displayed in an adjoining workspace. We provide two examples: (1) organizing travel photos, (2) managing documents created by room location-aware devices in a building.

Context-Aware Communication

Publication Details
  • IEEE Wireless Communications Magazine, Vol. 9, No. 5.
  • Oct 15, 2002

Abstract

This paper describes how the changing information about an individual's location, environment, and social situation can be used to initiate and facilitate people's interactions with one another, individually and in groups. Context-aware communication is contrasted with other forms of context-aware computing and we characterize applications in terms of design decisions along two dimensions: the extent of autonomy in context sensing and the extent of autonomy in communication action. A number of context-aware communication applications from the research literature are presented in five application categories. Finally, a number of issues related to the design of context-aware communication applications are presented.

Web Interaction Using Very Small Internet Devices

Publication Details
  • IEEE Computer Magazine, Cover Feature, Vol. 35, No. 10.
  • Oct 15, 2002

Abstract

Squeezing desktop Web content into smart phones and text pagers is more practical with separate interfaces for navigation and content manipulation. m-Links, a middleware proxy system, supports this dual-mode browsing, offering phonetop users an extendable set of actions.

Automatic Music Summarization via Similarity Analysis

Publication Details
  • 2002 International Symposium on Music Information Retrieval
  • Oct 13, 2002

Abstract

We present methods for automatically producing summary excerpts or thumbnails of music. To find the most representative excerpt, we maximize the average segment similarity to the entire work. After window-based audio parameterization, a quantitative similarity measure is calculated between every pair of windows, and the results are embedded in a 2-D similarity matrix. Summing the similarity matrix over the support of a segment results in a measure of how similar that segment is to the whole. This measure is maximized to find the segment that best represents the entire work. We discuss variations on the method, and present experimental results for orchestral music, popular songs, and jazz. These results demonstrate that the method finds significantly representative excerpts, using very few assumptions about the source audio.
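
The selection rule in this abstract (sum the similarity matrix over the support of a segment, then maximize) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the function name `best_excerpt`, the fixed excerpt length, and the 0/1 label-based similarity matrix are all hypothetical.

```python
def best_excerpt(S, length):
    """Return (start, score) of the window most similar to the whole work.

    S is a square similarity matrix; S[i][j] compares windows i and j.
    The score of a segment is the sum of S over the segment's rows and all
    columns, normalized to an average similarity.
    """
    n = len(S)
    row_sums = [sum(row) for row in S]  # similarity of each window to the whole
    best_start, best_score = 0, float("-inf")
    for start in range(n - length + 1):
        score = sum(row_sums[start:start + length]) / (n * length)
        if score > best_score:  # strict comparison keeps the earliest best window
            best_start, best_score = start, score
    return best_start, best_score

# Toy "piece": two windows of A, five of B, one of C.
# Similarity is 1.0 for matching labels, else 0.0.
labels = "AABBBBBC"
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
start, score = best_excerpt(S, length=3)
```

Here the chosen excerpt starts at index 2, inside the dominant B material, with an average similarity of 0.625. Precomputing the row sums means each candidate window costs only `length` additions.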

Audio Retrieval by Rhythmic Similarity

Publication Details
  • 2002 International Symposium on Music Information Retrieval
  • Oct 13, 2002

Abstract

We present a method for characterizing both the rhythm and tempo of music. We also present ways to quantitatively measure the rhythmic similarity between two or more works of music. This allows rhythmically similar works to be retrieved from a large collection. A related application is to sequence music by rhythmic similarity, thus providing an automatic "disc jockey" function for musical libraries. Besides specific analysis and retrieval methods, we present small-scale experiments that demonstrate ranking and retrieving musical audio by rhythmic similarity.
Publication Details
  • The 4th International Conference on Ubiquitous Computing (UbiComp 2002).
  • Sep 29, 2002

Abstract

As ubiquitous computing becomes widespread, we are increasingly coming into contact with "shared" computer-enhanced devices, such as cars, televisions, and photocopiers. Our interest is in identifying general issues in personalizing such shared everyday devices. Our approach is to compare alternative personalization methods by deploying and using alternative personalization interfaces (portable and embedded) for three shared devices in our workplace (a presentation PC, a plasma display for brainstorming, and a multi-function copier). This paper presents the comparative prototyping methodology we employed, the experimental system we deployed, observations and feedback from use, and resulting issues in designing personalized shared ubiquitous devices.
Publication Details
  • Workshop on User centered Evaluations for Ubiquitous Computing Systems: Best Known Methods, The 4th International Conference on Ubiquitous Computing (UbiComp 2002).
  • Sep 29, 2002

Abstract

Evaluating ubiquitous systems is hard, and has attracted the attention of others in the research community. These investigators, like others in CSCW, argue there is a basic mismatch between traditional evaluation techniques and the needs posed by ubiquitous systems. Namely, these systems are embedded in a variety of complex real world environments that cannot be easily modeled (as required by theoretical analyses), simulated, measured, or controlled (as required by laboratory experiments). As a result, many investigators have abandoned traditional comparative evaluation techniques and opted instead for techniques adapted from the social sciences, such as anthropology. We wanted to perform a comparative evaluation similar to a laboratory experiment, but in such a way that we could observe the effects of our design decisions in relatively unconstrained, real world use. This led us to the process described in this paper.

Low-Resolution Supplementary Tactile Cues for Navigational Assistance

Publication Details
  • In Proceedings of Mobile HCI 2002 (Pisa, Italy, 2002), Springer-Verlag, Lecture Notes in Computer Science #2411, pp. 369-372.
  • Sep 18, 2002

Abstract

The TactGuide is a mobile navigation device 'displaying' personalized direction cues by means of a tactile and 'tactful' representation. The TactGuide is operated by tactile inspection which is subtle enough to allow the users to engage/disengage in device interaction while preserving their visual, auditory and kinesthetic senses for inspection of the environment. The TactGuide design thereby accommodates the users' need to economize their attentional resources between device and environment while navigating through physical space. Preliminary experiments indicate that users readily map the tactile cues to spatial directions and that TactGuide can be operated as a supplement to, and without compromising, the use of our existing wayfinding abilities, rather than substituting for our natural abilities and earned skills for wayfinding.
Publication Details
  • Journal of Mathematical Physics, September 2002 special issue on Quantum Information Theory, Vol. 43 (9), pp. 4376-4381.
  • Sep 7, 2002

Abstract

To implement any quantum operation (a.k.a. "superoperator" or "CP map") on a d-dimensional quantum system, it is enough to apply a suitable overall unitary transformation to the system and a d^2-dimensional environment which is initialized in a fixed pure state. It has been suggested that a d-dimensional environment might be enough if we could initialize the environment in a mixed state of our choosing. In this note we show with elementary means that certain explicit quantum operations cannot be realized in this way. Our counterexamples map some pure states to pure states, giving strong and easily manageable conditions on the overall unitary transformation. Everything works in the more general setting of quantum operations from d-dimensional to d'-dimensional spaces, so we place our counterexamples within this more general framework.
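
The two constructions contrasted in this abstract can be written out explicitly. The notation is an assumption introduced here for illustration: \mathcal{E} for the quantum operation, \rho for the system state, E for the environment, U for the overall unitary, |0\rangle for the fixed pure environment state, and \sigma for the chosen mixed environment state.

```latex
% Standard construction: environment of dimension d^2 in a fixed pure state
\mathcal{E}(\rho) = \operatorname{Tr}_{E}\!\left[ U \left( \rho \otimes |0\rangle\langle 0| \right) U^{\dagger} \right],
\qquad \dim E = d^{2}.

% Suggested alternative: environment of dimension d in a mixed state \sigma
\mathcal{E}(\rho) \overset{?}{=} \operatorname{Tr}_{E}\!\left[ U \left( \rho \otimes \sigma \right) U^{\dagger} \right],
\qquad \dim E = d.
```

The abstract's claim is that the second form cannot reproduce certain explicit operations, whatever U and \sigma are chosen.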

Publication Details
  • Proceedings IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, August 2002
  • Aug 26, 2002

Abstract

We present a method for rapidly and robustly extracting audio excerpts without the overhead of speech recognition or speaker segmentation. An immediate application is to automatically augment keyframe-based video summaries with informative audio excerpts associated with the video segments represented by the keyframes. Short audio clips combined with keyframes comprise an extremely lightweight and Web-browsable interface for auditioning video or similar media, without using bandwidth-intensive streaming video or audio.
Publication Details
  • IEEE International Conference on Multimedia and Expo 2002
  • Aug 26, 2002

Abstract

This paper presents a camera system called FlySPEC. In contrast to a traditional camera system that provides the same video stream to every user, FlySPEC can simultaneously serve different video-viewing requests. This flexibility allows users to conveniently participate in a seminar or meeting at their own pace. Meanwhile, the FlySPEC system provides a seamless blend of manual control and automation. With this control mix, users can easily make tradeoffs between video capture effort and video quality. The FlySPEC camera is constructed by installing a set of Pan/Tilt/Zoom (PTZ) cameras near a high-resolution panoramic camera. While the panoramic camera provides the basic functionality of serving different viewing requests, the PTZ camera is managed by our algorithm to improve the overall video quality that may affect users watching details. The video resolution improvements from using different camera management strategies are compared in the experimental section.

Detecting Path Intersections in Panoramic Video

Publication Details
  • IEEE International Conference on Multimedia and Expo 2002
  • Aug 26, 2002

Abstract

Given panoramic video taken along a self-intersecting path, we present a method for detecting the intersection points. This allows "virtual tours" to be synthesized by splicing the panoramic video at the intersection points. Spatial intersections are detected by finding the best-matching panoramic images from a number of nearby candidates. Each panoramic image is segmented into horizontal strips. Each strip is averaged in the vertical direction. The Fourier coefficients of the resulting 1-D data capture the rotation-invariant horizontal texture of each panoramic image. The distance between two panoramic images is calculated as the sum of the distances between their strip texture pairs at the same row positions. The intersection is chosen as the two candidate panoramic images that have the minimum distance.
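
The matching procedure in this abstract can be sketched on toy data. Several details below are assumptions rather than the paper's stated method: using the magnitudes of the Fourier coefficients to obtain rotation invariance (a circular shift changes only the phase), using Euclidean distance between strip signatures, and the names `strip_signatures` and `panorama_distance`. Images are plain lists of grayscale rows whose columns wrap around the full 360 degrees.

```python
import cmath
import math

def strip_signatures(image, num_strips, num_coeffs):
    """One signature per horizontal strip of a panoramic image.

    Each strip is averaged in the vertical direction to give a 1-D profile,
    and the signature holds the magnitudes of its first Fourier coefficients.
    """
    width = len(image[0])
    strip_height = len(image) // num_strips
    signatures = []
    for s in range(num_strips):
        strip = image[s * strip_height:(s + 1) * strip_height]
        profile = [sum(row[x] for row in strip) / strip_height
                   for x in range(width)]
        magnitudes = []
        for k in range(num_coeffs):
            coeff = sum(profile[x] * cmath.exp(-2j * math.pi * k * x / width)
                        for x in range(width))
            magnitudes.append(abs(coeff))
        signatures.append(magnitudes)
    return signatures

def panorama_distance(a, b, num_strips=2, num_coeffs=4):
    """Sum of distances between strip signatures at the same row positions."""
    sig_a = strip_signatures(a, num_strips, num_coeffs)
    sig_b = strip_signatures(b, num_strips, num_coeffs)
    return sum(math.dist(x, y) for x, y in zip(sig_a, sig_b))

# Toy panorama (4 rows x 8 columns), the same view rotated by three columns,
# and an unrelated flat image.
image = [[float((x * (r + 1)) % 5) for x in range(8)] for r in range(4)]
rotated = [row[3:] + row[:3] for row in image]
flat = [[1.0] * 8 for _ in range(4)]

same_place = panorama_distance(image, rotated)
different_place = panorama_distance(image, flat)
```

The rotated copy yields a distance of essentially zero, while the unrelated image does not, so choosing the candidate pair with the minimum distance picks the same location regardless of camera heading.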
Publication Details
  • SPIE ITCOM 2002
  • Jul 31, 2002

Abstract

We present a framework, motivated by rate-distortion theory and the human visual system, for optimally representing the real world given limited video resolution. To provide users with high fidelity views, we built a hybrid video camera system that combines a fixed wide-field panoramic camera with a controllable pan/tilt/zoom (PTZ) camera. In our framework, a video frame is viewed as a limited-frequency representation of some "true" image function. Our system combines outputs from both cameras to construct the highest fidelity views possible, and controls the PTZ camera to maximize information gain available from higher spatial frequencies. In operation, each remote viewer is presented with a small panoramic view of the entire scene, and a larger close-up view of a selected region. Users may select a region by marking the panoramic view. The system operates the PTZ camera to best satisfy requests from multiple users. When no regions are selected, the system automatically operates the PTZ camera to minimize predicted video distortion. High-resolution images are cached and sent if a previously recorded region has not changed and the PTZ camera is pointed elsewhere. We present experiments demonstrating that the panoramic image can effectively predict where to gain the most information, and also that the system provides better images to multiple users than conventional camera systems.

Communication and Understanding for Decision Support

Publication Details
  • Proceedings of the IFIP International Conference on Decision Making and Decision Support in the Internet Age
  • Jul 4, 2002

Abstract

As the technology for communication changes, the role of communication in the conduct of business changes with it. Communication is no longer just a technical matter of separating signal from noise and managing bandwidth but also a social matter in which negotiating differences in understanding among and between communicators is a primary business priority. Addressing this priority requires an understanding of how individuals interact in the course of their decision making activities. Using the work of Anthony Giddens as a point of departure, this paper views interaction in communication as consisting of three dimensions - meaning, authority, and trust. These three dimensions are used to identify new opportunities for advances in decision making technology which help deal with potential breakdowns in social interaction.

The Elusive Ubiquitous Information System and m-Links

Publication Details
  • Fuji Xerox Technical Report, No. 14, 2002
  • Jun 25, 2002

Abstract

A basic objective of Weiser's Ubiquitous Computing vision is ubiquitous information access: being able to utilize any content or service (e.g., all the rich media content and services on the WWW), using devices that are always "at hand" (embedded in environments or portable), over a network with universal coverage and adequate bandwidth. Although much progress has been made, the ideal remains elusive. This paper examines the inter-relations among three dimensions of ubiquitous information systems: (1) ubiquitous content; (2) ubiquitous devices; and (3) ubiquitous networking. We use the space defined by these dimensions to reflect on the tradeoffs designers make and to chart some past and current information systems. Given this background, we present m-Links (mobile links), a new system that takes aim at the elusive ideal of ubiquitous information. Our approach builds on wireless web phone technologies because of their trend towards ubiquitous devices and networking (the second and third dimensions). Yet such very small devices sacrifice usability as rich media Internet terminals (the first dimension). To offset this limitation, we propose a new information access model for very small devices that supports a much wider range of content and services than previously possible. We have built this system with an emphasis on open systems extensibility and describe its design and implementation.

Going Back in Hypertext

Publication Details
  • Proceedings of ACM Hypertext 2002
  • Jun 11, 2002

Abstract

Hypertext interfaces typically involve navigation, the act (and interaction) of moving from one piece of information to another. Navigation can be exploratory, or it may involve backtracking to some previously-visited node. While backtracking interfaces are common, they may not reflect differences in readers' purposes and mental models. This paper draws on some empirical evidence regarding navigation between and within documents to suggest improvements on traditional hypertext navigation, and proposes a time-based view of backtracking.
Publication Details
  • Journal of Library Administration, 35:1-2, 99-123, Haworth
  • Jun 7, 2002

Abstract

In the emerging world of electronic publishing, how we create, distribute, and read books will be in large part determined by an underlying framework of content standards that establishes the range of technological opportunities and constraints for publishing and reading systems. But efforts to develop content standards based on sound engineering models must skillfully negotiate competing and sometimes apparently irreconcilable objectives if they are to produce results relevant to the rapidly changing course of technology. The Open eBook Forum's Publication Structure, an XML-based specification for electronic books, is an example of the sort of timely and innovative problem solving required for successful real-world standards development. As a result of this effort, the electronic book industry will not only happen sooner and on a larger scale than it would have otherwise, but the electronic books it produces will be more functional, more interoperable, and more accessible to all readers. Public interest participants have a critical role in this process.
Publication Details
  • CHI 2002
  • Apr 22, 2002

Abstract

Shared text input is a technique we implemented into a note taking system for facilitating text entry on small devices. Instead of writing out words on the tedious text entry interfaces found on handheld computers, users can quickly reuse words and phrases already entered by others. Sharing notes during a meeting also increases awareness among note takers. We found that filtering the text to share was appropriate to deal with a variety of design issues such as screen real estate, scalability, privacy, reciprocity, and predictability of text location.
Publication Details
  • CHI 2002
  • Apr 22, 2002

Abstract

In this paper, we describe an evaluation of the Palette, a presentation tool that was reported at CHI '99. The Palette allows presenters to quickly access digital presentations using physical cards that have unique barcodes printed on them. The Palette has been in use in our lab for over three years, and has been released as a product in Japan. Our evaluation consists of an analysis of usage logs, an expert walkthrough review, and observations and interviews with users, non-users and the system administrator. The findings reveal benefits and drawbacks of the technology, and offer design ideas for further work on tangible tools of this kind.
Publication Details
  • International Journal of Human-Computer Studies, 56, pp. 75-107
  • Feb 1, 2002

Abstract

We describe our experiences with the design, implementation, deployment, and evaluation of a Portholes tool which provides group and collaboration awareness through the Web. The research objective was to explore how such a system would improve communication and facilitate a shared understanding among distributed development groups. During the deployment of our Portholes system, we conducted a naturalistic study by soliciting user feedback and evolving the system in response. Many of the initial reactions of potential users indicated that our system projected the wrong image, so we designed a new version that provided explicit cues about being in public and who is looking back to suggest a social rather than information interface. We implemented the new design as a Java applet and evaluated design choices with a preference study. Our experiences with different Portholes versions and user reactions to them provide insights for designing awareness tools beyond Portholes systems. Our approach is for the studies to guide and to provide feedback for the design and technical development of our system.
2001

Signature Random Fields for Accommodating Illumination Variability

Publication Details
  • In Workshop on Identifying Objects Across Variations in Lighting: Psychophysics & Computation, Proc. IEEE Intl. Conf. on Computer Vision & Pattern Recognition 2001.
  • Dec 12, 2001

Abstract

In this paper, we document an extension to traditional pattern-theoretic object templates to jointly accommodate variations in object pose and in the radiant appearance of the object surface. We first review classical object templates accommodating pose variation. We then develop an efficient subspace representation for the object radiance indexed on the surface of the three dimensional object template. We integrate the low-dimensional representation for the object radiance, or signature, into the pattern-theoretic template, and present the results of orientation estimation experiments. The experiments demonstrate both estimation performance fluctuations under varying illumination conditions and performance degradations associated with unknown scene illumination. We also present a Bayesian approach for estimation accommodating illumination variability.

Work/place: mobile technologies and arenas of activity

Publication Details
  • ACM SIGGROUP Bulletin, Volume 22, Issue 3, pp. 3-9, Publisher ACM Press, New York, NY, USA
  • Dec 8, 2001

Abstract

The increasing number of wireless, portable devices has led inevitably to lyrical rhetorics of business cost-cutting and increased efficiency as workers can be productive while on the move and offices become streamlined areas of efficient activity. In this short paper, we raise a number of issues that have been appearing in common discourses about the (most) modern office, and the impact of wireless technologies thereupon. We also present an overview of a workshop held at ECSCW in Bonn in September of 2001 on this topic, giving an overview of the comments and discussions that took place at the workshop.

Framing Mobile Collaborations and Mobile Technologies.

Publication Details
  • In B. Brown, N. Green, R. Harper (Eds.) Wireless World: Social and Interactional Aspects of Wireless Technology, London, UK: Springer-Verlag.
  • Dec 1, 2001

Abstract

Recent years have seen a marked increase in the production and promotion of portable, wireless communication devices: mobile phones with internet access, wireless PDAs such as the Palm VII and smart pagers such as RIM's 850 and 950. Some claim the presence of such devices in the hands, bags and pockets of so many people heralds a new world of work in which people can be reached and information accessed "anywhere, anytime". Whether or not access to information in itself can promote new working practices, individuals whose lives revolve around movement between work sites have been singled out as an obvious market for such portable wireless communication devices. Using these devices, such “mobile workers” can be in touch with colleagues, collaborators and clients "24/7", and still sustain non-work social relationships due, apparently, to their constant connectedness whilst mobile. In this chapter we have two goals. The first is to address the design of mobile technologies. The second is to illustrate our design approach, wherein we consider local practices of technology use, but also the broader cultural context in which technologies are designed, produced, bought, sold, used and redesigned. Our ultimate design aim is to build upon existing practices, but also to consider possibilities for the development of innovative technologies that enable new, complementary, practices.
Publication Details
  • In Proceedings of the International Conference on Image Processing, Thessaloniki, Greece. October 7-10, 2001.
  • Oct 7, 2001

Abstract

In this paper, we present a novel framework for analyzing video using self-similarity. Video scenes are located by analyzing inter-frame similarity matrices. The approach is flexible to the choice of similarity measure and is robust and data-independent because the data is used to model itself. We present the approach and its application to scene boundary detection. This is shown to dramatically outperform a conventional scene-boundary detector that uses a histogram-based measure of frame difference.
Publication Details
  • Proceedings of ACM Multimedia 2001, Ottawa, Canada, Oct. 5, 2001.
  • Oct 5, 2001

Abstract

Given rapid improvements in storage devices, network infrastructure and streaming-media technologies, a large number of corporations and universities are recording lectures and making them available online for anytime, anywhere access. However, producing high-quality lecture videos is still labor intensive and expensive. Fortunately, recent technology advances are making it feasible to build automated camera management systems to capture lectures. In this paper we report our design of such a system, including system configuration, audio-visual tracking techniques, software architecture, and user study. Motivated by different roles in a professional video production team, we have developed a multi-cinematographer single-director camera management system. The system performs lecturer tracking, audience tracking, and video editing all fully automatically, and offers quality close to that of human-operated systems.
Publication Details
  • Proc. ACM Multimedia 2001, Ottawa, CA, Oct. 2001.
  • Sep 30, 2001

Abstract

We describe a system called FlyAbout which uses spatially indexed panoramic video for virtual reality applications. Panoramic video is captured by moving a 360° camera along continuous paths. Users can interactively replay the video with the ability to view any interesting object or choose a particular direction. Spatially indexed video gives the ability to travel along paths or roads with a map-like interface. At junctions, or intersection points, users can choose which path to follow as well as which direction to look, allowing interaction not available with conventional video. Combining the spatial index with a spatial database of maps or objects allows users to navigate to specific locations or interactively inspect particular objects.
Publication Details
  • Proc. International Conference on Computer Music (ICMC), Habana, Cuba, September 2001.
  • Sep 12, 2001

Abstract

This paper presents a novel approach to visualizing the time structure of musical waveforms. The acoustic similarity between any two instants of an audio recording is displayed in a static 2D representation, which makes structural and rhythmic characteristics visible. Unlike practically all prior work, this method characterizes self-similarity rather than specific audio attributes such as pitch or spectral features. Examples are presented for classical and popular music.
Publication Details
  • IEEE Computer, 34(9), pp. 61-67
  • Sep 1, 2001

Abstract

To meet the diverse needs of business, education, and personal video users, the authors developed three visual interfaces that help identify potentially useful or relevant video segments. In such interfaces, keyframes (still images automatically extracted from video footage) can distinguish videos, summarize them, and provide access points. Well-chosen keyframes enhance a listing's visual appeal and help users select videos. Keyframe selection can vary depending on the application's requirements: A visual summary of a video-captured meeting may require only a few highlight keyframes, a video editing system might need a keyframe for every clip, while a browsing interface requires an even distribution of keyframes over the video's full length. The authors conducted user studies for each of their three interfaces, gathering input for subsequent interface improvements. The studies revealed that finding a similarity measure for collecting video clips into groups that more closely match human perception poses a challenge. Another challenge is to further improve the video-segmentation algorithm used for selecting keyframes. A new version will provide users with more information and control without sacrificing the interface's ease of use.