Publications

FXPAL publishes in top scientific conferences and journals.

2001

Recording the Region of Interest from FlyCam Panoramic Video

Publication Details
  • Proc. International Conference on Image Processing, Thessaloniki, Greece, September 2001.
  • Sep 1, 2001

Abstract

A novel method for region of interest tracking and recording video is presented. The proposed method is based on the FlyCam system, which produces high resolution and wide-angle video sequences by stitching the video frames from multiple stationary cameras. The method integrates tracking and recording processes, and targets applications such as classroom lectures and video conferencing. First, the region of interest (which typically covers the speaker) is tracked using a Kalman filter. Then, the Kalman filter estimation results are used for virtual camera control and to record the video. The system has no physical camera motion and the virtual camera parameters are readily available for video indexing. The proposed system has been implemented for real time recording of lectures and presentations.
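
The tracking step can be pictured with a textbook constant-velocity Kalman filter applied to the horizontal center of the region of interest. This is a minimal sketch under stated assumptions, not the FlyCam implementation: the 1-D state, the noise values `q` and `r`, and the hypothetical helpers `KalmanROI` and `virtual_camera_window` are illustrative. The filtered position places a fixed-width virtual camera window inside the panorama, clamped to the panorama's edges.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KalmanROI:
    """Hypothetical constant-velocity Kalman filter for the horizontal center
    of a region of interest. State: [position, velocity]; dt is one frame."""
    q: float = 0.01      # process-noise variance added to P's diagonal (assumed)
    r: float = 4.0       # measurement-noise variance in pixels^2 (assumed)
    x: float = 0.0       # estimated position (pixels)
    v: float = 0.0       # estimated velocity (pixels per frame)
    p00: float = 100.0   # covariance P = [[p00, p01], [p01, p11]]
    p01: float = 0.0
    p11: float = 100.0

    def predict(self) -> float:
        # x <- F x and P <- F P F^T + Q, with F = [[1, 1], [0, 1]]
        self.x += self.v
        p00 = self.p00 + 2 * self.p01 + self.p11
        p01 = self.p01 + self.p11
        self.p00, self.p01 = p00 + self.q, p01
        self.p11 += self.q
        return self.x

    def update(self, z: float) -> float:
        # Only the position is measured: H = [1, 0].
        innovation = z - self.x
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s    # Kalman gain
        self.x += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P, written out for the symmetric 2x2 case
        self.p11 -= k1 * self.p01
        self.p01 *= 1 - k0
        self.p00 *= 1 - k0
        return self.x


def virtual_camera_window(center: float, width: int, pano_width: int) -> Tuple[int, int]:
    """Hypothetical helper: left/right pixel columns of a fixed-width crop
    centered on the estimate, clamped so it stays inside the panorama."""
    left = min(max(center - width / 2, 0), pano_width - width)
    return int(left), int(left + width)
```

Per frame, one would call `predict()`, then `update(z)` with the detected speaker position, and pass `kf.x` to `virtual_camera_window` to get the crop to record. Because the crop is purely digital, its parameters can be logged alongside the video for indexing.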

The Beat Spectrum: A New Approach to Rhythm Analysis

Publication Details
  • In Proceedings of the International Conference on Multimedia and Expo 2001 (ICME), Tokyo, Japan. August 22-25, 2001.
  • Aug 25, 2001

Abstract

We introduce the beat spectrum, a new method of automatically characterizing the rhythm and tempo of music and audio. The beat spectrum is a measure of acoustic self-similarity as a function of time lag. Highly structured or repetitive music will have strong beat spectrum peaks at the repetition times. This reveals both tempo and the relative strength of particular beats, and therefore can distinguish between different kinds of rhythms at the same tempo. We also introduce the beat spectrogram which graphically illustrates rhythm variation over time. Unlike previous approaches to tempo analysis, the beat spectrum does not depend on particular attributes such as energy or frequency, and thus will work for any music or audio in any genre. We present tempo estimation results for a variety of musical genres, which are accurate to within 1%. This approach has a variety of applications, including music retrieval by similarity and automatically generating music videos.
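
The definition above (self-similarity as a function of time lag) can be sketched by averaging a frame-to-frame similarity matrix along its diagonals. This minimal version assumes per-frame feature vectors are already computed and uses cosine similarity; averaging each diagonal by its length and the names `beat_spectrum` and `dominant_lag` are assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def beat_spectrum(features: List[Sequence[float]], max_lag: int) -> List[float]:
    """B(lag) = mean over k of S(k, k + lag), where S is the frame
    self-similarity matrix, i.e. the average of each superdiagonal."""
    n = len(features)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(features))")
    spectrum = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        total = sum(cosine_similarity(features[k], features[k + lag])
                    for k in range(pairs))
        spectrum.append(total / pairs)
    return spectrum


def dominant_lag(spectrum: List[float], min_lag: int = 1) -> int:
    """Lag (in frames) of the strongest peak, skipping the trivial lag 0."""
    return max(range(min_lag, len(spectrum)), key=lambda lag: spectrum[lag])
```

With a frame hop of h seconds, a dominant lag of L frames corresponds to a beat period of L*h seconds, or 60/(L*h) beats per minute.
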
Publication Details
  • In Proceedings of the Conference on Modeling and Design of Wireless Networks (ITCOM2001), Denver, Colorado, August 23-24, 2001.
  • Aug 23, 2001

Abstract

This paper reports our design and implementation of an automatic lecture-room camera-management system. The motivation for building this system is to facilitate online lecture access and reduce the expense of producing high quality lecture videos. The goal of this project is a camera-management system that can perform as a human video-production team. To achieve this goal, our system collects audio/video signals available in the lecture room and uses the multimodal information to direct our video cameras to interesting events. Compared to previous work (which has tended to be technology centric), we started with interviews with professional video producers and used their knowledge and expertise to create video production rules. We then targeted technology components that allowed us to implement a substantial portion of these rules, including the design of a virtual video director, a speaker cinematographer, and an audience cinematographer. The complete system is installed in parallel with a human-operated video production system in a middle-sized corporate lecture room, and used for broadcasting lectures through the web. The system's performance was compared to that of a human operator via a user study. Results suggest that our system's quality is close to that of a human-controlled system.

The impact of text browsing on text retrieval performance

Publication Details
  • Information Processing and Management 37 (3) pp. 507-520
  • Aug 21, 2001

Abstract

The results from a series of three experiments that used Text Retrieval Conference (TREC) data and TREC search topics are compared. These experiments each involved three novel user interfaces (one per experiment). User interfaces that made it easier for users to view text were found to improve recall in all three experiments. A distinction was found between a cluster of subjects (a majority of whom were search experts) who tended to read fewer documents more carefully (readers, or exclusives) and subjects who skimmed through more documents without reading them as carefully (skimmers, or inclusives). Skimmers were found to have significantly better recall overall. A major outcome from our experiments at TREC and with the TREC data is that hypertext interfaces to information retrieval (IR) tasks tend to increase recall. Our interpretation of this pattern of results across the three experiments is that increased interaction with the text (more pages viewed) generally improves recall. Findings from one of the experiments indicated that viewing a greater diversity of text on a single screen (i.e., not just more text per se, but more articles available at once) may also improve recall. In an experiment where a traditional (type-in) query interface was contrasted with a condition where queries were marked up on the text, the improvement in recall due to viewing more text was more pronounced with search novices. Our results demonstrate that markup and hypertext interfaces to text retrieval systems can benefit recall and can also benefit novices. The challenge now will be to find modified versions of hypertext interfaces that can improve precision as well as recall, and that can work with users who prefer to use different types of search strategy or have different types of training and experience.

m-Links: An Infrastructure for Very Small Internet Devices

Publication Details
  • The 7th Annual International Conference on Mobile Computing and Networking (MOBICOM 2001), Rome, Italy, July 16-21 2001, ACM Press, 2001, pp. 122-131.
  • Jul 16, 2001

Abstract

In this paper we describe the Mobile Link (m-Links) infrastructure for utilizing existing World Wide Web content and services on wireless phones and other very small Internet terminals. Very small devices, typically with 3-20 lines of text, provide portability and other functionality while sacrificing usability as Internet terminals. In order to provide access on such limited hardware we propose a small device web navigation model that is more appropriate than the desktop computer's web browsing model. We introduce a middleware proxy, the Navigation Engine, to facilitate the navigation model by concisely displaying the Web's link (i.e., URL) structure. Because not all Web information is appropriately "linked," the Navigation Engine incorporates data-detectors to extract bits of useful information such as phone numbers and addresses. In order to maximize program-data composability, multiple network-based services (similar to browser plug-ins) are keyed to a link's attributes such as its MIME type. We have built this system with an emphasis on user extensibility and we describe the design and implementation as well as a basic set of middleware services that we have found to be particularly important.
Publication Details
  • Proceedings of the INNS-IEEE International Joint Conference on Neural Networks, vol. 3, pp. 2176 - 2181, Washington, DC, July 14-19, 2001.
  • Jul 14, 2001

Abstract

The goal of this project is to teach a computer-robot system to understand human speech through natural human-computer interaction. To achieve this goal, we develop an interactive and incremental learning algorithm based on entropy-guided learning vector quantisation (LVQ) and memory association. Supported by this algorithm, the robot has the potential to learn unlimited sounds progressively. Experimental results of a multilingual short-speech learning task are given after the presentation of the learning system. Further investigation of this learning system will include human-computer interactions that involve more modalities, and applications that use the proposed idea to train home appliances.
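
For readers unfamiliar with learning vector quantization, the sketch below shows plain LVQ1 with prototypes created on demand the first time a label appears. It is a stand-in that illustrates only the prototype update: the entropy guidance and memory association described above are not modeled, and `IncrementalLVQ`, `lr`, and the toy feature vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


class IncrementalLVQ:
    """Hypothetical LVQ1 classifier that adds a prototype for each new label."""

    def __init__(self, lr: float = 0.2) -> None:
        self.lr = lr  # learning rate (assumed constant here)
        self.prototypes: List[Tuple[List[float], str]] = []

    def _nearest(self, x: Sequence[float]) -> int:
        return min(range(len(self.prototypes)),
                   key=lambda i: math.dist(self.prototypes[i][0], x))

    def learn(self, x: Sequence[float], label: str) -> None:
        if all(lab != label for _, lab in self.prototypes):
            # Unseen label: store the sample itself as a new prototype.
            self.prototypes.append((list(x), label))
            return
        i = self._nearest(x)
        weights, lab = self.prototypes[i]
        # LVQ1: pull the winner toward x if labels match, push it away otherwise.
        sign = 1.0 if lab == label else -1.0
        updated = [w + sign * self.lr * (xi - w) for w, xi in zip(weights, x)]
        self.prototypes[i] = (updated, lab)

    def classify(self, x: Sequence[float]) -> str:
        return self.prototypes[self._nearest(x)][1]
```

Because new prototypes are appended rather than allocated up front, the number of learnable sounds is bounded only by memory, which loosely mirrors the "learn unlimited sounds progressively" goal.
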
Publication Details
  • The Eighth IFIP TC.13 Conference On Human-Computer Interaction (INTERACT 2001). Tokyo, Japan, July 9-13, 2001.
  • Jul 9, 2001

Abstract

The two most commonly used techniques for evaluating the fit between application design and use - namely, usability testing and beta testing with user feedback - suffer from a number of limitations that restrict evaluation scale (in the case of usability tests) and data quality (in the case of beta tests). They also fail to provide developers with an adequate basis for: (1) assessing the impact of suspected problems on users at large, and (2) deciding where to focus development and evaluation resources to maximize the benefit for users at large. This paper describes an agent-based approach for collecting usage data and user feedback over the Internet that addresses these limitations to provide developers with a complementary source of usage- and usability-related information. Contributions include: a theory to motivate and guide data collection, an architecture capable of supporting very large scale data collection, and real-world experience suggesting the proposed approach is complementary to existing practice.
Publication Details
  • In Proceedings of Human-Computer Interaction (INTERACT '01), IOS Press, Tokyo, Japan, pp. 464-471
  • Jul 9, 2001

Abstract

Hitchcock is a system to simplify the process of editing video. Its key features are the use of automatic analysis to find the best quality video clips, an algorithm to cluster those clips into meaningful piles, and an intuitive user interface for combining the desired clips into a final video. We conducted a user study to determine how the automatic clip creation and pile navigation support users in the editing process. The study showed that users liked the ease-of-use afforded by automation, but occasionally had problems navigating and overriding the automated editing decisions. These findings demonstrate the need for a proper balance between automation and user control. Thus, we built a new version of Hitchcock that retains the automatic editing features, but provides additional controls for navigation and for allowing users to modify the system decisions.

Designing e-Books for Legal Research

Publication Details
  • In Proceedings of JCDL 2001 (Roanoke, VA, June 23-27). ACM Press. pp. 41-48.
  • Jun 23, 2001

Abstract

In this paper we report the findings from a field study of legal research in a first-tier law school and on the resulting redesign of XLibris, a next-generation e-book. We first characterize a work setting in which we expected an e-book to be a useful interface for reading and otherwise using a mix of physical and digital library materials, and explore what kinds of reading-related functionality would bring value to this setting. We do this by describing important aspects of legal research in a heterogeneous information environment, including mobility, reading, annotation, link following and writing practices, and their general implications for design. We then discuss how our work with a user community and an evolving e-book prototype allowed us to examine tandem issues of usability and utility, and to redesign an existing e-book user interface to suit the needs of law students. The study caused us to move away from the notion of a stand-alone reading device and toward the concept of a document laptop, a platform that would provide wireless access to information resources, as well as support a fuller spectrum of reading-related activities.
Publication Details
  • Proceedings of ACM CHI2001, vol. 3, pp. 442 - 449, Seattle, Washington, USA, March 31 - April 5, 2001.
  • Apr 5, 2001

Abstract

Given rapid improvements in network infrastructure and streaming-media technologies, a large number of corporations and universities are recording lectures and making them available online for anytime, anywhere access. However, producing high-quality lecture videos is still labor intensive and expensive. Fortunately, recent technology advances are making it feasible to build automated camera management systems to capture lectures. In this paper we report on our design, implementation and study of such a system. Compared to previous work (which has tended to be technology centric), we started with interviews with professional video producers and used their knowledge and expertise to create video production rules. We then targeted technology components that allowed us to implement a substantial portion of these rules, including the design of a virtual video director. The system's performance was compared to that of a human operator via a user study. Results suggest that our system's quality is close to that of a human-controlled system. In fact, most remote audience members could not tell whether the video was produced by a computer or a person.

Quiet Calls: Talking Silently on Mobile Phones

Publication Details
  • In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 174-181, ACM Press, March 31-April 5, 2001, Seattle, WA.
  • Mar 30, 2001
Publication Details
  • In Proceedings of the Thirty-fourth Annual Hawaii International Conference on System Sciences (HICSS), Big Island, Hawaii. January 7-12, 2001.
  • Feb 7, 2001

Abstract

This paper describes a new system for panoramic two-way video communication. Digitally combining images from an array of inexpensive video cameras results in a wide-field panoramic camera built from inexpensive off-the-shelf hardware. This system can aid distance learning in several ways, both by presenting a better view of the instructor and teaching materials to the students, and by enabling better audience feedback to the instructor. Because the camera is fixed with respect to the background, simple motion analysis can be used to track objects and people of interest. Electronically selecting a region of this panorama results in a rapidly steerable "virtual camera." We present system details and a prototype distance-learning scenario using multiple panoramic cameras.
Publication Details
  • WebNet 2001 World Conference on the WWW and Internet, Orlando, FL
  • Jan 17, 2001

Abstract

As more information is made available online, users collect information in personal information spaces like bookmarks and emails. While most users feel that organizing these collections is crucial to improve access, studies have shown that this activity is time consuming and highly cognitive. Automatic classification has been used, but because it relies on the full text of the documents, it does not generate personalized classifications. Our approach is to give users the ability to annotate their documents as they first access them. This annotation tool is unobtrusive and welcomed by most users, who generally miss this facility when dealing with digital documents. Our experiments show that these annotations can be used to generate personalized classifications of annotated Web pages.

Description and Narrative in Hypervideo

Publication Details
  • Proceedings of the Thirty-Fourth Annual Hawaii International Conference on System Sciences
  • Jan 3, 2001

Abstract

While hypertext was originally conceived for the management of scientific and technical information, it has been embraced with great enthusiasm by several members of the literary community for the promises it offers towards new approaches to narrative. Experiments with hypertext-based interactive narrative were originally based solely on verbal text but have more recently extended to include digital video artifacts. The most accomplished of these experiments, HyperCafe, provided new insights into the nature of narrative and how it may be presented; but it also offered an opportunity to reconsider other text types. This paper is an investigation of the application of an approach similar to HyperCafe to a descriptive text. We discuss how the approach serves the needs of description and illustrate the discussion with a concrete example. We then conclude by considering the extent to which our experiences with description may be applied to our continuing interest in narrative.
2000
Publication Details
  • ACM Computing Surveys, Vol. 32 No. 4, December 2000.
  • Dec 1, 2000

Abstract

Modern window-based user interface systems generate user interface events as natural products of their normal operation. Because such events can be automatically captured and because they indicate user behavior with respect to an application's user interface, they have long been regarded as a potentially fruitful source of information regarding application usage and usability. However, because user interface events are typically voluminous and rich in detail, automated support is generally required to extract information at a level of abstraction that is useful to investigators interested in analyzing application usage or evaluating usability. This survey examines computer-aided techniques used by HCI practitioners and researchers to extract usability-related information from user interface events. A framework is presented to help HCI practitioners and researchers categorize and compare the approaches that have been, or might fruitfully be, applied to this problem. Because many of the techniques in the research literature have not been evaluated in practice, this survey provides a conceptual evaluation to help identify some of the relative merits and drawbacks of the various classes of approaches. Ideas for future research in this area are also presented. This survey addresses the following questions: How might user interface events be used in evaluating usability? How are user interface events related to other forms of usability data? What are the key challenges faced by investigators wishing to exploit this data? What approaches have been brought to bear on this problem and how do they compare to one another? What are some of the important open research questions in this area?
Publication Details
  • Multimedia Modeling: Modeling Multimedia Information and Systems, Nagano, Japan
  • Nov 12, 2000

Abstract

While hypermedia is usually presented as a way to offer content in a nonlinear manner, hypermedia structure tends to reinforce the assumption that reading is basically a linear process. Link structures provide a means by which the reader may choose different paths to traverse; but each of these paths is fundamentally linear, revealed through either a block of text or a well-defined chain of links. While there are experiences that get beyond such linear constraints, such as driving a car, it is very hard to capture this kind of non-linearity, characterized by multiple sources of stimuli competing for attention, in a hypermedia document. This paper presents a multi-channel document infrastructure that provides a means by which all such sources of attention are presented on a single "page" (i.e., a display with which the reader interacts) and move between background and foreground in response to the activities of the reader. The infrastructure thus controls the presentation of content with respect to four dimensions: visual, audio, interaction support, and rhythm.
Publication Details
  • In Proceedings of UIST '00, ACM Press, pp. 81-89, 2000.
  • Nov 4, 2000

Abstract

Hitchcock is a system that allows users to easily create custom videos from raw video shot with a standard video camera. In contrast to other video editing systems, Hitchcock uses automatic analysis to determine the suitability of portions of the raw video. Unsuitable video typically has fast or erratic camera motion. Hitchcock first analyzes video to identify the type and amount of camera motion: fast pan, slow zoom, etc. Based on this analysis, a numerical "unsuitability" score is computed for each frame of the video. Combined with standard editing rules, this score is used to identify clips for inclusion in the final video and to select their start and end points. To create a custom video, the user drags keyframes corresponding to the desired clips into a storyboard. Users can lengthen or shorten the clip without specifying the start and end frames explicitly. Clip lengths are balanced automatically using a spring-based algorithm.
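
Two steps in this description lend themselves to a short sketch: cutting clips from runs of frames whose unsuitability score stays below a threshold, and rebalancing clip lengths so the storyboard keeps a target total duration. The version below is illustrative only. It assumes per-frame scores are already computed, and `find_clips`, `balance_lengths`, the threshold, and the springs-in-series model (each clip stretches in proportion to 1/stiffness) are assumptions, not Hitchcock's actual rules.

```python
from typing import List, Tuple


def find_clips(unsuitability: List[float], threshold: float,
               min_len: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges whose scores stay <= threshold
    for at least min_len frames (hypothetical editing rule)."""
    clips: List[Tuple[int, int]] = []
    start = None
    # A trailing sentinel closes a run that reaches the last frame.
    for i, score in enumerate(list(unsuitability) + [float("inf")]):
        if score <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                clips.append((start, i))
            start = None
    return clips


def balance_lengths(preferred: List[float], stiffness: List[float],
                    target_total: float) -> List[float]:
    """Treat clips as springs in series: one common force stretches or
    compresses each clip by force / stiffness until lengths sum to target."""
    slack = target_total - sum(preferred)
    compliance = sum(1.0 / k for k in stiffness)
    force = slack / compliance
    return [p + force / k for p, k in zip(preferred, stiffness)]
```

In this model a stiffer clip (for example, one the user has just resized) changes less, so the remaining clips absorb most of the adjustment.
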
Publication Details
  • In Proceedings of the International Symposium on Music Information Retrieval, in press.
  • Oct 23, 2000

Abstract

We introduce an audio retrieval-by-example system for orchestral music. Unlike many other approaches, this system is based on analysis of the audio waveform and does not rely on symbolic or MIDI representations. ARTHUR retrieves audio on the basis of long-term structure, specifically the variation of soft and louder passages. The long-term structure is determined from the envelope of audio energy versus time in one or more frequency bands. Similarity between energy profiles is calculated using dynamic programming. Given an example audio document, other documents in a collection can be ranked by similarity of their energy profiles. Experiments are presented for a modest corpus that demonstrate excellent results in retrieving different performances of the same orchestral work, given an example performance or short excerpt as a query.
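
The matching step can be sketched with dynamic time warping, one common dynamic-programming alignment, applied to single-band energy profiles. This is an illustration under assumptions rather than ARTHUR itself: the frame-energy definition, the absolute-difference local cost, and the names `energy_profile`, `dtw_distance`, and `rank_by_profile` are hypothetical, and the system described above may use several bands and a different cost.

```python
from typing import Dict, List, Sequence


def energy_profile(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy per non-overlapping frame (partial tail dropped)."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping with |a_i - b_j| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Best of: advance both, stretch a, or stretch b.
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    return cost[n][m]


def rank_by_profile(query: Sequence[float],
                    collection: Dict[str, Sequence[float]]) -> List[str]:
    """Names in the collection, most similar energy profile first."""
    return sorted(collection, key=lambda name: dtw_distance(query, collection[name]))
```

Warping is what lets a slower performance of the same work, whose loud and soft passages are stretched in time, still align closely with the query.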

An Introduction to Quantum Computing for Non-Physicists

Publication Details
  • ACM Computing Surveys, Vol. 32(3), pp. 300 - 335
  • Sep 1, 2000

Abstract

Richard Feynman's observation that quantum mechanical effects could not be simulated efficiently on a computer led to speculation that computation in general could be done more efficiently if it used quantum effects. This speculation appeared justified when Peter Shor described a polynomial time quantum algorithm for factoring integers. In quantum systems, the computational space increases exponentially with the size of the system which enables exponential parallelism. This parallelism could lead to exponentially faster quantum algorithms than possible classically. The catch is that accessing the results, which requires measurement, proves tricky and requires new non-traditional programming techniques. The aim of this paper is to guide computer scientists and other non-physicists through the conceptual and notational barriers that separate quantum computing from conventional computing. We introduce basic principles of quantum mechanics to explain where the power of quantum computers comes from and why it is difficult to harness. We describe quantum cryptography, teleportation, and dense coding. Various approaches to harnessing the power of quantum parallelism are explained, including Shor's algorithm, Grover's algorithm, and Hogg's algorithms. We conclude with a discussion of quantum error correction.
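
Grover's algorithm is small enough to simulate classically on a state vector, which makes its "flip the marked phase, then invert about the mean" loop concrete. The sketch below is a toy simulation, not code from the survey: it assumes a single marked item, real amplitudes, and the standard iteration count of about pi/4 times the square root of N; `grover_probabilities` is a hypothetical name.

```python
import math
from typing import List, Optional


def grover_probabilities(n_items: int, marked: int,
                         iterations: Optional[int] = None) -> List[float]:
    """Simulate Grover search over n_items basis states with one marked item."""
    if iterations is None:
        iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    # Start in the uniform superposition (what Hadamards on |0...0> produce).
    amps = [1.0 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        amps[marked] = -amps[marked]          # oracle: flip the marked phase
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]   # diffusion: invert about the mean
    # Measurement probabilities are the squared amplitudes.
    return [a * a for a in amps]
```

The classical simulation costs O(N) per iteration, so it only illustrates the amplitude dynamics; the quantum speedup comes from needing roughly sqrt(N) oracle calls instead of about N.
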
Publication Details
  • In Multimedia Tools and Applications, 11(3), pp. 347-358, 2000.
  • Aug 1, 2000

Abstract

In accessing large collections of digitized videos, it is often difficult to find both the appropriate video file and the portion of the video that is of interest. This paper describes a novel technique for determining keyframes that are different from each other and provide a good representation of the whole video. We use keyframes to distinguish videos from each other, to summarize videos, and to provide access points into them. The technique can determine any number of keyframes by clustering the frames in a video and by selecting a representative frame from each cluster. Temporal constraints are used to filter out some clusters and to determine the representative frame for a cluster. Desirable visual features can be emphasized in the set of keyframes. An application for browsing a collection of videos makes use of the keyframes to support skimming and to provide visual summaries.
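
The cluster-then-pick-a-representative idea can be sketched as follows. This is a hedged illustration, not the paper's technique: it swaps in k-means with deterministic farthest-point initialization, treats the temporal constraint simply as dropping clusters shorter than `min_size` frames, assumes each frame is already reduced to a feature vector such as a color histogram, and uses the hypothetical name `select_keyframes`.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_keyframes(features: List[Sequence[float]], k: int,
                     min_size: int = 1, iters: int = 20) -> List[int]:
    """Cluster frame features with k-means and return one frame index per
    sufficiently large cluster, in temporal order."""
    # Deterministic farthest-point initialization.
    centers = [list(features[0])]
    while len(centers) < k:
        far = max(features, key=lambda f: min(_sq_dist(f, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _sq_dist(f, centers[c]))
                  for f in features]
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    keyframes = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) < min_size:
            continue  # drop very short clusters (e.g. a brief flash)
        # Representative: the member frame closest to the cluster center.
        keyframes.append(min(idx, key=lambda i: _sq_dist(features[i], centers[c])))
    return sorted(keyframes)
```

Returning indices sorted by time means the keyframes can double as access points into the video in playback order.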

Expanding a Tangible User Interface

Publication Details
  • In Proceedings of DIS 2000, ACM Press, August 2000.
  • Aug 1, 2000
Publication Details
  • In Proceedings of IEEE International Conference on Multimedia and Expo, vol. III, pp. 1329-1332, 2000.
  • Jul 30, 2000

Abstract

We describe a genetic segmentation algorithm for video. This algorithm operates on segments of a string representation. It is similar to both classical genetic algorithms that operate on bits of a string and genetic grouping algorithms that operate on subsets of a set. For evaluating segmentations, we define similarity adjacency functions, which are extremely expensive to optimize with traditional methods. The evolutionary nature of genetic algorithms offers a further advantage by enabling incremental segmentation. Applications include video summarization and indexing for browsing, plus adapting to user access patterns.
Publication Details
  • In Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann Publishers, pp. 666-673, 2000.
  • Jul 8, 2000

Abstract

We describe a genetic segmentation algorithm for image data streams and video. This algorithm operates on segments of a string representation. It is similar to both classical genetic algorithms that operate on bits of a string and genetic grouping algorithms that operate on subsets of a set. It employs a segment fair crossover operation. For evaluating segmentations, we define similarity adjacency functions, which are extremely expensive to optimize with traditional methods. The evolutionary nature of genetic algorithms offers a further advantage by enabling incremental segmentation. Applications include browsing and summarizing video and collections of visually rich documents, plus a way of adapting to user access patterns.
Publication Details
  • In Japan Hardcopy 2000, The Annual Conference of the Imaging Society of Japan, June 12-14, 2000.
  • Jun 12, 2000
Publication Details
  • In Proceedings of Hypertext '00, ACM Press, pp. 244-245, 2000.
  • May 30, 2000

Abstract

We describe a way to make a hypermedia meeting record from multimedia meeting documents by automatically generating links through image matching. In particular, we look at video recordings and scanned paper handouts of presentation slides with ink annotations. The algorithm that we employ is the Discrete Cosine Transform (DCT). Interactions with multipath links and paper interfaces are discussed.
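
Matching a video frame to a scanned slide by comparing low-frequency DCT coefficients can be sketched as below. Assumptions: both images are already grayscale and downsampled to the same small square block (8x8 in the test), only the top-left `keep` x `keep` coefficients are compared by Euclidean distance, and `dct2`, `signature`, and `best_match` are hypothetical names; the paper's actual features and thresholds are not reproduced.

```python
import math
from typing import Dict, List

Block = List[List[float]]


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an n x n block (direct O(n^4) form)."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def signature(block: Block, keep: int = 3) -> List[float]:
    """Keep the low-frequency corner, which captures coarse layout."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def best_match(query: Block, candidates: Dict[str, Block], keep: int = 3) -> str:
    """Name of the candidate whose signature is nearest to the query's."""
    q = signature(query, keep)

    def distance(name: str) -> float:
        return math.dist(q, signature(candidates[name], keep))

    return min(candidates, key=distance)
```

Discarding high-frequency coefficients makes the comparison tolerant of small details such as ink annotations, which mostly add fine-scale energy, while the overall slide layout still dominates the signature.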

Hypertext Interaction Revisited

Publication Details
  • In Proceedings of Hypertext '00, ACM Press, pp. 171-179, 2000
  • May 30, 2000

Abstract

Much of hypertext narrative relies on links to shape a reader's interaction with the text. But links may be too limited to express ambiguity, imprecision, and entropy, or to admit new modes of participation short of full collaboration. We use an e-book form to explore the implications of freeform annotation-based interaction with hypertext narrative. Readers' marks on the text can be used to guide navigation, create a persistent record of a reading, or to recombine textual elements as a means of creating a new narrative. In this paper, we describe how such an experimental capability was created on top of XLibris, a next generation e-book, using Forward Anywhere as the hypernarrative. We work through a scenario of interaction, and discuss the issues the work raises.
Publication Details
  • In RIAO'2000 Conference Proceedings, Content-Based Multimedia Information Access, C.I.D., pp. 637-648, 2000.
  • Apr 12, 2000

Abstract

We present an interactive system that allows a user to locate regions of video that are similar to a video query. Thus, segments of video can be found by simply providing an example of the video of interest. The user selects a video segment for the query from either a static frame-based interface or a video player. A statistical model of the query is calculated on-the-fly, and is used to find similar regions of video. The similarity measure is based on a Gaussian model of reduced frame image transform coefficients. Similarity in a single video is displayed in the Metadata Media Player. The player can be used to navigate through the video by jumping between regions of similarity. Similarity can be rapidly calculated for multiple video files as well. These results are displayed in MBase, a Web-based video browser that allows similarity in multiple video files to be visualized simultaneously.
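
The on-the-fly query model can be sketched as a diagonal Gaussian fitted to the query frames' reduced transform coefficients, with every frame scored by log-likelihood and consecutive high-scoring frames grouped into regions. Assumptions: frames are already reduced to short coefficient vectors, dimensions are treated as independent, and the variance floor, the threshold, and the names `fit_diag_gaussian`, `log_likelihood`, and `similar_regions` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float]]


def fit_diag_gaussian(vectors: Sequence[Sequence[float]], floor: float = 1e-3) -> Model:
    """Per-dimension mean and variance (variance floored to avoid
    division by zero when the query frames barely change)."""
    n = len(vectors)
    dims = list(zip(*vectors))
    means = [sum(d) / n for d in dims]
    variances = [max(sum((x - m) ** 2 for x in d) / n, floor)
                 for d, m in zip(dims, means)]
    return means, variances


def log_likelihood(vec: Sequence[float], model: Model) -> float:
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(vec, means, variances))


def similar_regions(frames: Sequence[Sequence[float]], query: Tuple[int, int],
                    threshold: float) -> List[Tuple[int, int]]:
    """Fit the model to frames[start:end] and return [begin, end) runs of
    frames whose log-likelihood is at least the threshold."""
    start, end = query
    model = fit_diag_gaussian(frames[start:end])
    regions: List[Tuple[int, int]] = []
    begin = None
    for i, frame in enumerate(frames):
        if log_likelihood(frame, model) >= threshold:
            if begin is None:
                begin = i
        elif begin is not None:
            regions.append((begin, i))
            begin = None
    if begin is not None:
        regions.append((begin, len(frames)))
    return regions
```

Fitting only a mean and a variance per coefficient keeps the model cheap enough to rebuild for every new query and to score against many video files.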

Anchored Conversations: Chatting in the Context of a Document

Publication Details
  • In CHI 2000 Conference Proceedings, ACM Press, pp. 454-461, 2000.
  • Mar 31, 2000

Abstract

This paper describes an application-independent tool called Anchored Conversations that brings together text-based conversations and documents. The design of Anchored Conversations is based on our observations of the use of documents and text chats in collaborative settings. We observed that chat spaces support work conversations, but they do not allow the close integration of conversations with work documents that can be seen when people are working together face-to-face. Anchored Conversations directly addresses this problem by allowing text chats to be anchored into documents. Anchored Conversations also facilitates document sharing; accepting an invitation to an anchored conversation results in the document being automatically uploaded. In addition, Anchored Conversations provides support for review, catch-up and asynchronous communications through a database. In this paper we describe motivating fieldwork, the design of Anchored Conversations, a scenario of use, and some preliminary results from a user study.
Publication Details
  • In CHI 2000 Conference Proceedings, ACM Press, pp. 185-192, 2000.
  • Mar 31, 2000

Abstract

This paper presents a method for generating compact pictorial summarizations of video. We developed a novel approach for selecting still images from a video suitable for summarizing the video and for providing entry points into it. Images are laid out in a compact, visually pleasing display reminiscent of a comic book or Japanese manga. Users can explore the video by interacting with the presented summary. Links from each keyframe start video playback and/or present additional detail. Captions can be added to presentation frames to include commentary or descriptions such as the minutes of a recorded meeting. We conducted a study to compare variants of our summarization technique. The study participants judged the manga summary to be significantly better than the other two conditions with respect to their suitability for summaries and navigation, and their visual appeal.

Beyond Bits: The Future of Quantum Information Processing

Publication Details
  • IEEE Computer, pp. 38-45, January 2000.
  • Feb 1, 2000

Abstract

Recently, physicists and computer scientists have realized that not only do our ideas about computing rest on only partly accurate principles, but they miss out on a whole class of computation. Quantum physics offers powerful methods of encoding and manipulating information that are not possible within a classical framework. The potential applications of these quantum information processing methods include provably secure key distribution for cryptography, rapid integer factoring, and quantum simulation.