Publications

FXPAL publishes in top scientific conferences and journals.

2006

Security Risks in Java-based Mobile Code Systems

Publication Details
  • Henry Hexmoor, Marcin Paprzycki, Niranjan Suri (eds) Scalable Computing: Practice and Experience Volume 7, No. 4, December 2006
  • Dec 23, 2006

Abstract

Close
Java is the predominant language for mobile agent systems, both for implementing mobile agent execution environments and for writing mobile agent applications. This is due to inherent support for code mobility by means of dynamic class loading and separable class name spaces, as well as a number of security properties, such as language safety and access control by means of stack introspection. However, serious questions must be raised whether Java is actually up to the task of providing a secure execution environment for mobile agents. At the time of writing, it has neither resource control nor proper application separation. In this article we take an in-depth look at Java as a foundation for secure mobile agent systems.
Publication Details
  • MobCops 2006 Workshop in conjunction with IEEE/ACM CollaborateCom 2006, Atlanta, Georgia, USA.
  • Nov 17, 2006

Abstract

Close
Load balancing has been an increasingly important issue for handling computational intensive tasks in a distributed system such as in Grid and cluster computing. In such systems, multiple server instances are installed for handling requests from client applications, and each request (or task) typically needs to stay in a queue before an available server is assigned to process it. In this paper, we propose a high-performance queueing method for implementing a shared queue for collaborative clusters of servers. Each cluster of servers maintains a local queue and queues of different clusters are networked to form a unified (or shared) queue that may dispatch tasks to all available servers. We propose a new randomized algorithm for forwarding requests in an overcrowded local queue to a networked queue based on load information of the local and neighboring clusters. The algorithm achieves both load balancing and locality awareness.

Term Context Models for Information Retrieval

Publication Details
  • CIKM (Conference on information and Knowledge Management) 2006, Arlington, VA
  • Nov 7, 2006

Abstract

Close
At their heart, most if not all information retrieval models utilize some form of term frequency. The notion is that the more often a query term occurs in a document, the more likely it is that document meets an information need. We examine an alternative. We propose a model which assesses the presence of a term in a document not by looking at the actual occurrence of that term, but by a set of nonindependent supporting terms, i.e. context. This yields a weighting for terms in documents which is different from and complementary to tf-based methods, and is beneficial for retrieval.
Publication Details
  • In Proceedings of the fourth ACM International Workshop on Video Surveillance & Sensor Networks VSSN '06, Santa Barbara, CA, pp. 19-26
  • Oct 27, 2006

Abstract

Close
Video surveillance systems have become common across a wide number of environments. While these installations have included more video streams, they also have been placed in contexts with limited personnel for monitoring the video feeds. In such settings, limited human attention, combined with the quantity of video, makes it difficult for security personnel to identify activities of interest and determine interrelationships between activities in different video streams. We have developed applications to support security personnel both in analyzing previously recorded video and in monitoring live video streams. For recorded video, we created storyboard visualizations that emphasize the most important activity as heuristically determined by the system. We also developed an interactive multi-channel video player application that connects camera views to map locations, alerts users to unusual and suspicious video, and visualizes unusual events along a timeline for later replay. We use different analysis techniques to determine unusual events and to highlight them in video images. These tools aid security personnel by directing their attention to the most important activity within recorded video or among several live video streams.
Publication Details
  • UIST 2006 Companion
  • Oct 16, 2006

Abstract

Close
Video surveillance requires keeping the human in the loop. Software can aid security personnel in monitoring and using video. We have developed a set of interface components designed to locate and follow important activity within security video. By recognizing and visualizing localized activity, presenting overviews of activity over time, and temporally and geographically contextualizing video playback, we aim to support security personnel in making use of the growing quantity of security video.
Publication Details
  • UIST 2006 Companion
  • Oct 16, 2006

Abstract

Close
With the growing quantity of security video, it becomes vital that video surveillance software be able to support security personnel in monitoring and tracking activities. We have developed a multi-stream video player that plays recorded and live videos while drawing the users' attention to activity in the video. We will demonstrate the features of the video player and in particular, how it focuses on keeping the human in the loop and drawing their attention to activities in the video.
Publication Details
  • Proceedings of IEEE Multimedia Signal Processing 2006
  • Oct 3, 2006

Abstract

Close
This paper presents a method for facilitating document redirection in a physical environment via a mobile camera. With this method, a user is able to move documents among electronic devices, post a paper document to a selected public display, or make a printout of a white board with simple point-and-capture operations. More specifically, the user can move a document from its source to a destination by capturing a source image and a destination image in a consecutive order. The system uses SIFT (Scale Invariant Feature Transform) features of captured images to identify the devices a user is pointing to, and issues corresponding commands associated with identified devices. Unlike RF/IR based remote controls, this method uses object visual features as an all time 'transmitter' for many tasks, and therefore is easy to deploy. We present experiments on identifying three public displays and a document scanner in a conference room for evaluation.

The USE Project: Designing Smart Spaces for Real People

Publication Details
  • UbiComp 2006 Workshop position paper
  • Sep 20, 2006

Abstract

Close
We describe our work-in-progress: a "wizard-free" conference room designed for ease of use, yet retaining next-generation functionality. Called USE (Usable Smart Environments), our system uses multi-display systems, immersive conferencing, and secure authentication. It is based in cross-cultural ethnographic studies on the way people use conference rooms. The USE project has developed a flexible, extensible architecture specifically designed to enhance ease of use in smart environment technologies. The architecture allows customization and personalization of smart environments for particular people and groups, types of work, and specific physical spaces. The system consists of a database of devices with attributes, rooms and meetings that implements a prototype-instance inheritance mechanism through which contextual information (e.g. IP addresses application settings, phone numbers for teleconferencing systems, etc.) can be associated

Usable ubiquitous computing in next generation conference rooms: design, architecture and evaluation

Publication Details
  • International workshop at UbiComp 2006.
  • Sep 17, 2006

Abstract

Close
In the UbiComp 2005 workshop "Ubiquitous computing in next generation conference rooms" we learned that usability is one of the primary challenges in these spaces. Nearly all "smart" rooms, though they often have interesting and effective functionality, are very difficult to simply walk in and use. Most such rooms have resident experts who keep the room's systems functioning, and who often must be available on an everyday basis to enable the meeting technologies. The systems in these rooms are designed for and assume the presence of these human "wizards"; they are seldom designed with usability in mind. In addition, people don't know what to expect in these rooms; as yet there is no technology standard for next-generation conference rooms. The challenge here is to strike an effective balance between usability and new kinds of functionality (such as multiple displays, new interfaces, rich media systems, new uploading/access/security systems, robust mobile integration, to name just a few of the functions we saw in last year's workshop). So, this year, we propose a workshop to focus more specifically on how the design of next-generation conference rooms can support usability: the tasks facing the real people who use these rooms daily. Usability in ubiquitous computing has been the topic of several papers and workshops. Focusing on usability in next-generation conference rooms lets us bring to bear some of the insights from this prior work in a delineated application space. In addition the workshop will be informed by the most recent usability research in ubiquitous computing, rich media, context-aware mobile systems, multiple display environments, and interactive physical environments. We also are vitally concerned with how usability in smart environments tracks (or doesn't) across cultures. Conference room research has been and remains a focal point for some of the most interesting and applied work in ubiquitous computing. It is also an area where there are many real-world applications and daily opportunities for user feed-back: in short, a rich area for exploring usable ubiquitous computing. We see a rich opportunity to draw together researchers not only from conference room research but also from areas such as interactive furniture/smart environments, rich media, social computing, remote conferencing, and mobile devices for a lively exchange of ideas on usability in applied ubicomp systems for conference rooms.
Publication Details
  • International Conference on Pattern Recognition
  • Aug 20, 2006

Abstract

Close
This paper describes a framework for detecting unusual events in surveillance videos. Most surveillance systems consist of multiple video streams, but traditional event detection systems treat individual video streams independently or combine them in the feature extraction level through geometric reconstruction. Our framework combines multiple video streams in the inference level, with a coupled hidden Markov Model (CHMM). We use two-stage training to bootstrap a set of usual events, and train a CHMM over the set. By thresholding the likelihood of a test segment being generated by the model, we build a unusual event detector. We evaluate the performance of our detector through qualitative and quantitative experiments on two sets of real world videos.
Publication Details
  • Interactive Video; Algorithms and Technologies Hammoud, Riad (Ed.) 2006, XVI, 250 p., 109 illus., Hardcover.
  • Jun 7, 2006

Abstract

Close
This chapter describes tools for browsing and searching through video to enable users to quickly locate video passages of interest. Digital video databases containing large numbers of video programs ranging from several minutes to several hours in length are becoming increasingly common. In many cases, it is not sufficient to search for relevant videos, but rather to identify relevant clips, typically less than one minute in length, within the videos. We offer two approaches for finding information in videos. The first approach provides an automatically generated interactive multi-level summary in the form of a hypervideo. When viewing a sequence of short video clips, the user can obtain more detail on the clip being watched. For situations where browsing is impractical, we present a video search system with a flexible user interface that incorporates dynamic visualizations of the underlying multimedia objects. The system employs automatic story segmentation, and displays the results of text and image-based queries in ranked sets of story summaries. Both approaches help users to quickly drill down to potentially relevant video clips and to determine the relevance by visually inspecting the material.

Visualization in Audio-Based Music Information Retrieval

Publication Details
  • Computer Music Journal Vol. 30, Issue 2, pp. 42-62, 2006.
  • Jun 6, 2006

Abstract

Close
Music Information Retrieval (MIR) is an emerging research area that explores how music stored digitally can be effectively organized, searched, retrieved and browsed. The explosive growth of online music distribution, portable music players and lowering costs of recording indicate that in the near future most of recorded music in human history will be available digitally. MIR is steadily growing as a research area as can be evidenced by the international conference on music information retrieval (ISMIR) series soon in its sixth year and the increasing number of MIR-related publications in the Computer Music Journal as well as other journals and conferences.
Publication Details
  • Complexity, Vol 11, No 5.
  • Jun 3, 2006

Abstract

Close
Technology-the collection of devices and methods available to human society-evolves by constructing new devices and methods from ones that previously exist, and in turn offering these as possible components-building blocks-for the construction of further new devices and elements. The collective of technology in this way forms a network of elements where novel elements are created from existing ones and where more complicated elements evolve from simpler ones. We model this evolution within a simple artificial system on the computer. The elements in our system are logic circuits. New elements are formed by combination from simpler existing elements (circuits), and if a novel combination satisfies one of a set of needs it is retained as a building block for further combination. We study the properties of the resulting buildout. We find that our artificial system can create complicated technologies (circuits), but only by first creating simpler ones as building blocks. Our results mirror Lenski et al.'s, that complex features can be created in biological evolution only if simpler functions are first favored and act as stepping stones. We also find evidence that the resulting collection of technologies exists at self-organized criticality.
Publication Details
  • Proceedings of AVI '06 (Short Paper), ACM Press, pp. 258-261.
  • May 23, 2006

Abstract

Close
During grouping tasks for data exploration and sense-making, the criteria are normally not well-defined. When users are bringing together data objects thought to be similar in some way, implicit brushing continually detects for groups on the freeform workspace, analyzes the groups' text content or metadata, and draws attention to related data by displaying visual hints and animation. This provides helpful tips for further grouping, group meaning refinement and structure discovery. The sense-making process is further enhanced by retrieving relevant information from a database or network during the brushing. Closely related to implicit brushing, target snapping provides a useful means to move a data object to one of its related groups on a large display. Natural dynamics and smooth animations also help to prevent distractions and allow users to concentrate on the grouping and thinking tasks. Two different prototype applications, note grouping for brainstorming and photo browsing, demonstrate the general applicability of the technique.
Publication Details
  • The 15th International World Wide Web Conference (WWW2006)
  • May 23, 2006

Abstract

Close
In a landmark article, over a half century ago, Vannevar Bush envisioned a "Memory Extender" device he dubbed the "memex". Bush's ideas anticipated and inspired numerous breakthroughs, including hypertext, the Internet, the World Wide Web, and Wikipedia. However, despite these triumphs, the memex has still not lived up to its potential in corporate settings. One reason is that corporate users often don't have sufficient time or incentives to contribute to a corporate memory or to explore others' contributions. At FXPAL, we are investigating ways to automatically create and retrieve useful corporate memories without any added burden on anyone. In this paper we discuss how ProjectorBox a smart appliance for automatic presentation capture and PAL Bar a system for proactively retrieving contextually relevant corporate memories have enabled us to integrate content from a variety of sources to create a cohesive multimedia corporate memory for our organization.

Tunnel Vector: A New Routing Algorithm with Scalability

Publication Details
  • The 9th IEEE Global Internet Symposium in conjunction with the 25th IEEE INFOCOM Conference, Barcelona, Catalunya, Spain, April 28 - 29, 2006
  • Apr 28, 2006

Abstract

Close
Routing algorithms such as Distance Vector and Link States have the routing table size as O(n), where n is the number of destination identifiers, thus providing only limited scalability for large networks when n is high. As the distributed hash table (DHT) techniques are extraordinarily scalable with n, our work aims at adapting a DHT approach to the design of a network-layer routing algorithm so that the average routing table size can be significantly reduced to O(log n) without losing much routing efficiency. Nonetheless, this scheme requires a major breakthrough to address some fundamental challenges. Specifically, unlike a DHT, a network-layer routing algorithm must (1) exchange its control messages without an underlying network, (2) handle link insertion/deletion and link-cost updates, and (3) provide routing efficiency. Thus, we are motivated to propose a new network-layer routing algorithm, Tunnel Vector (TV), using DHT-like multilevel routing without an underlying network. TV exchanges its control messages only via physical links and is self-configurable in response to linkage updates. In TV, the routing path of a packet is near optimal while the routing table size is O(log n) per node, with high probability. Thus, TV is suitable for routing in a very large network.
Publication Details
  • Proceedings of ACM DIS (Designing Interactive Systems) 2006, Penn State, Penn.
  • Apr 5, 2006

Abstract

Close
What does a student need to know to be a designer? Beyond a list of separate skills, what mindset does a student need to develop for designerly action now and into the future? In the excitement of the cognitive revolution, Simon proposed a way of thinking about design that promised to make it more manageable and cognitive: to think of design as a planning problem. Yet, as Suchman argued long ago, planning accounts may be applied to problems that are not at base accomplished by planning, to the detriment of design vision. This paper reports on a pedagogy that takes Suchman's criticism to heart and avoids dressing up design methods as more systematic and predictive than they in fact are. The idea is to teach design through expo-sure to not just one, but rather, many methods---that is, sets of rules or behaviors that produce artifacts for further reflec-tion and development. By introducing a large number of design methods, decoupled from theories, models or frame-works, we teach (a) important cross-methodological regu-larities in competence as a designer, (b) that the practice of design can itself be designed and (c) that method choice affects design outcomes. This provides a rich and produc-tive notion of design particularly necessary for the world of pervasive and ubiquitous computing.
Publication Details
  • EACL (11th Conference of the European Chapter of the Association for Computational Linguistics)
  • Apr 3, 2006

Abstract

Close
Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, the parameters of a PLSA model are trained using the Expectation Maximization (EM) algorithm, and as a result, the trained model is dependent on the initialization values so that performance can be highly variable. In this paper we present a method for using LSA analysis to initialize a PLSA model. We also investigated the performance of our method for the tasks of text segmentation and retrieval on personal-size corpora, and present results demonstrating the efficacy of our proposed approach.
Publication Details
  • International Journal of Web Services Practices
  • Jan 17, 2006

Abstract

Close
Mobile users often require access to their documents while away from the office. While pre-loading documents in a repository can make those documents available remotely, people need to know in advance which documents they might need. Furthermore, it may be difficult to view, print, or share the document through a portable device such as cell phone. We describe DoKumobility, a network of web services for mobile users for managing, printing, and sharing documents. In this paper, we describe the infrastructure and illustrate its use with several applications. We conclude with a discussion of lessons learned and future work.
2005

On-Demand Overlay Networking of Collaborative Applications

Publication Details
  • IEEE CollaborateCom 2005 - The First IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing
  • Dec 19, 2005

Abstract

Close
We propose a new overlay network, called Generic Identifier Network (GIN), for collaborative nodes to share objects with transactions across affiliated organizations by merging the organizational local namespaces upon mutual agreement. Using local namespaces instead of a global namespace can avoid excessive dissemination of organizational information, reduce maintenance costs, and improve robustness against external security attacks. GIN can forward a query with an O(1) latency stretch with high probability and achieve high performance. In the absence of a complete distance map, its heuristic algorithms for self configuration are scalable and efficient. Routing tables are maintained using soft-state mechanisms for fault tolerance and adapting to performance updates of network distances. Thus, GIN has significant new advantages for building an efficient and scalable Distributed Hash Table for modern collaborative applications across organizations.
Publication Details
  • Proceedings of SPIE International Symposium ITCom 2005 on Multimedia Systems and Applications VIII, Boston, Massachusetts, USA, October 2005.
  • Dec 7, 2005

Abstract

Close
Meeting environments, such as conference rooms, executive briefing centers, and exhibition spaces, are now commonly equipped with multiple displays, and will become increasingly display-rich in the future. Existing authoring / presentation tools such as PowerPoint, however, provide little support for effective utilization of multiple displays. Even using advanced multi-display enabled multimedia presentation tools, the task of assigning material to displays is tedious and distracts presenters from focusing on content. This paper describes a framework for automatically assigning presentation material to displays, based on a model of the quality of views of audience members. The framework is based on a model of visual fidelity which takes into account presentation content, audience members' locations, the limited resolution of human eyes, and display location, orientation, size, resolution, and frame rate. The model can be used to determine presentation material placement based on average or worst case audience member view quality, and to warn about material that would be illegible. By integrating this framework with a previous system for multi-display presentation [PreAuthor, others], we created a tool that accepts PowerPoint and/or other media input files, and automatically generates a layout of material onto displays for each state of the presentation. The tool also provides an interface allowing the presenter to modify the automatically generated layout before or during the actual presentation. This paper discusses the framework, possible application scenarios, examples of the system behavior, and our experience with system use.
Publication Details
  • Video track, ACM Multimedia 2005.
  • Nov 13, 2005

Abstract

Close
A Post-Bit is a prototype of a small ePaper device for handling multimedia content, combining interaction control and display into one package. Post-Bits are modeled after paper Post-Its™; the functions of each Post-Bit combine the affordances of physical tiny sticky memos and digital handling of information. Post-Bits enable us to arrange multimedia contents in our embodied physical spaces. Tangible properties of paper such as flipping, flexing, scattering and rubbing are mapped to controlling aspects of the content. In this paper, we introduce the integrated design and functionality of the Post-Bit system, including four main components: the ePaper sticky memo/player, with integrated sensors and connectors; a small container/binder that a few Post-Bits can fit into, for ordering and multiple connections; the data and power port that allows communication with the host com-puter; and finally the software and GUI interface that reside on the host PC and manage multimedia transfer.
Publication Details
  • ACM Multimedia 2005, Technical Demonstrations.
  • Nov 5, 2005

Abstract

Close
The MediaMetro application provides an interactive 3D visualization of multimedia document collections using a city metaphor. The directories are mapped to city layouts using algorithms similar to treemaps. Each multimedia document is represented by a building and visual summaries of the different constituent media types are rendered onto the sides of the building. From videos, Manga storyboards with keyframe images are created and shown on the façade; from slides and text, thumbnail images are produced and subsampled for display on the building sides. The images resemble windows on a building and can be selected for media playback. To support more facile navigation between high overviews and low detail views, a novel swooping technique was developed that combines altitude and tilt changes with zeroing in on a target.

Seamless presentation capture, indexing, and management

Publication Details
  • Internet Multimedia Management Systems VI (SPIE Optics East 2005)
  • Oct 26, 2005

Abstract

Close
Technology abounds for capturing presentations. However, no simple solution exists that is completely automatic. ProjectorBox is a "zero user interaction" appliance that automatically captures, indexes, and manages presentation multimedia. It operates continuously to record the RGB information sent from presentation devices, such as a presenter's laptop, to display devices, such as a projector. It seamlessly captures high-resolution slide images, text and audio. It requires no operator, specialized software, or changes to current presentation practice. Automatic media analysis is used to detect presentation content and segment presentations. The analysis substantially enhances the web-based user interface for browsing, searching, and exporting captured presentations. ProjectorBox has been in use for over a year in our corporate conference room, and has been deployed in two universities. Our goal is to develop automatic capture services that address both corporate and educational needs.
Publication Details
  • World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education (E-Learn 2005)
  • Oct 24, 2005

Abstract

Close
Automatic lecture capture can help students, instructors, and educational institutions. Students can focus less on note-taking and more on what the instructor is saying. Instructors can provide access to lecture archives to help students study for exams and make-up missed classes. And online lecture recordings can be used to support distance learning. For these and other reasons, there has been great interest in automatically capturing classroom presentations. However, there is no simple solution that is completely automatic. ProjectorBox is our attempt to create a "zero user interaction" appliance that automatically captures, indexes, and manages presentation multimedia. It operates continuously to record the RGB information sent from presentation devices, such as an instructor's laptop, to display devices such as a projector. It seamlessly captures high-resolution slide images, text, and audio. A web-based user interface allows students to browse, search, replay, and export captured presentations.
Publication Details
  • In Proceedings of International Conference on Computer Vision, 2005, page 1026-1033
  • Oct 17, 2005

Abstract

Close
Recent years have witnessed the rise of many effective text information retrieval systems. By treating local visual features as terms, training images as documents and input images as queries, we formulate the problem of object recognition into that of text retrieval. Our formulation opens up the opportunity to integrate some powerful text retrieval tools with computer vision techniques. In this paper, we propose to improve the efficiency of articulated object recognition by an Okapi-Chamfer matching algorithm. The algorithm is based on the inverted index technique. The inverted index is a widely used way to effectively organize a collection of text documents. With the inverted index, only documents that contain query terms are accessed and used for matching. To enable inverted indexing in an image database, we build a lexicon of local visual features by clustering the features extracted from the training images. Given a query image, we extract visual features and quantize them based on the lexicon, and then look up the inverted index to identify the subset of training images with non-zero matching score. To evaluate the matching scores in the subset, we combined the modified Okapi weighting formula with the Chamfer distance. The performance of the Okapi-Chamfer matching algorithm is evaluated on a hand posture recognition system. We test the system with both synthesized and real world images. Quantitative results demonstrate the accuracy and efficiency our system.
Publication Details
  • IEEE Trans. Multimedia, Vol. 7 No. 5, pp. 981-990
  • Oct 11, 2005

Abstract

Close
Abstract-We present a system for automatically extracting the region of interest and controlling virtual cameras control based on panoramic video. It targets applications such as classroom lectures and video conferencing. For capturing panoramic video, we use the FlyCam system that produces high resolution, wide-angle video by stitching video images from multiple stationary cameras. To generate conventional video, a region of interest (ROI) can be cropped from the panoramic video. We propose methods for ROI detection, tracking, and virtual camera control that work in both the uncompressed and compressed domains. The ROI is located from motion and color information in the uncompressed domain and macroblock information in the compressed domain, and tracked using a Kalman filter. This results in virtual camera control that simulates human controlled video recording. The system has no physical camera motion and the virtual camera parameters are readily available for video indexing.
Publication Details
  • http://www.strata.com/gallery_detail.asp?id=1480&page=1&category=48
  • Oct 1, 2005

Abstract

Close
I produced these Illustrations for two multimedia applications that were developed by FX Palo Alto Laboratory and California State University at Sacramento's Department of Psychology. The applications were part of a study to see how primary school age children learn with certain multimedia tools. Each illustration was viewed as part of a fairly complex screen of information as well as on its own.
Publication Details
  • We organized and ran a full-day workshop at the UbiComp 2005 Conference in Tokyo, Japan, September 11, 2005.
  • Sep 29, 2005

Abstract

Close
Designing the technologies, applications, and physical spaces for next-generation conference rooms (This is a day-long workshop in Tokyo.) Next-generation conference rooms are often designed to anticipate the onslaught of new rich media presentation and ideation systems. Throughout the past couple of decades, many researchers have attempted to reinvent the conference room, aiming at shared online or visual/virtual spaces, smart tables or walls, media support and tele-conferencing systems of varying complexity. Current research in high-end room systems often features a multiplicity of thin, bright display screens (both large and small), along with interactive whiteboards, robotic cameras, and smart remote conferencing systems. Added into the mix one can find a variety of meeting capture and metadata management systems, automatic or not, focused on capturing different aspects of meetings in different media: to the Web, to one's PDA or phone, or to a company database. Smart spaces and interactive furniture design projects have shown systems embedded in tables, podiums, walls, chairs and even floors and lighting. Exploiting the capabilities of all these technologies in one room, however, is a daunting task. For example, faced with three or more display screens, all but a few presenters are likely to opt for simply replicating the same image on all of them. Even more daunting is the design challenge: how to choose which capabilities are vital to particular tasks, or for a particular room, or are well suited to a particular culture. In this workshop we'll explore how the design of next-generation conference rooms can be informed by the most recent research in rich media, context-aware mobile systems, ubiquitous displays, and interactive physical environments. How should conference room systems reflect the rapidly changing expectations around personal devices and smart spaces? What kinds of systems are needed to support meetings in technologically complex environments? How can design of conference room spaces and technologies account for differing social and cultural practices around meetings? What requirements are imposed by security and privacy issues in public spaces? What aspects of meeting capture and access technologies have proven to be useful, and how should a smart environment enable them? What intersections exist with other research areas such as digital libraries? Conference room research has been and remains a focal point for some of the most interesting and applied work in ubiquitous computing. What lessons can we take from the research to date as we move forward? We are confident that a lively and useful discussion will be engendered by bringing directions from recent ubicomp research in games, multimedia applications, and social software to ongoing research in conference rooms systems: integrating architecture and tangible media, information design and display, and mobile and computer-mediated communications.
Publication Details
  • Paper presented at SIGGRAPH 2005, Los Angeles.
  • Sep 29, 2005

Abstract

Close
The Convertible Podium is a central control station for rich media in next-generation classrooms. It integrates flexible control systems for multimedia software and hardware, and is designed for use in classrooms with multiple screens, multiple media sources and multiple distribution channels. The built-in custom electronics and unique convertible podium frame allows intuitive conversion between use modes (either manual or automatic). The at-a-touch sound and light control system gives control over the classroom environment. Presentations can be pre-authored for effective performance, and quickly altered on the fly. The counter-weighted and motorized conversion system allows one person to change modes simply by lifting the top of the Podium to the correct position for each mode. The Podium is lightweight, mobile, and wireless, and features an onboard 21" LCD display, document cameras and other capture devices, tangible controls for hardware and software, and also possesses embedded RFID sensing for automatic data retrieval and file management. It is designed to ease the tasks involved in authoring and presenting in a rich media classroom, as well as supporting remote telepresence and integration with other mobile devices.
Publication Details
  • INTERACT '05 short paper
  • Sep 12, 2005

Abstract

Close
Indexes such as bookmarks and recommendations are helpful for accessing multimedia documents. This paper describes the 3D Syllabus system, which is designed to visualize indexes to multimedia training content along with the information structures. A double-sided landscape with balloons and cubes represents the personal and group indexes, respectively. The 2D ground plane organizes the indexes as a table and the third dimension of height indicates their importance scores. Additional visual properties of the balloons and cubes provide other information about the indexes and their content. Paths are represented by pipes connecting the balloons. A reliminary evaluation of the 3D Syllabus prototype suggests that it is more efficient than a typical training CD-ROM and is more enjoyable to use.
Publication Details
  • INTERACT 2005, LNCS 3585, pp. 781-794
  • Sep 12, 2005

Abstract

Close
A video database can contain a large number of videos ranging from several minutes to several hours in length. Typically, it is not sufficient to search just for relevant videos, because the task still remains to find the relevant clip, typically less than one minute of length, within the video. This makes it important to direct the users attention to the most promising material and to indicate what material they already investigated. Based on this premise, we created a video search system with a powerful and flexible user interface that incorporates dynamic visualizations of the underlying multimedia objects. The system employes an automatic story segmentation, combines text and visual search, and displays search results in ranked sets of story keyframe collages. By adapting the keyframe collages based on query relevance and indicating which portions of the video have already been explored, we enable users to quickly find relevant sections. We tested our system as part of the NIST TRECVID interactive search evaluation, and found that our user interface enabled users to find more relevant results within the allotted time than other systems employing more sophisticated analysis techniques but less helpful user interfaces.
Publication Details
  • M.F. Costabile and F. Paternò (Eds.): INTERACT 2005, LNCS 3585
  • Sep 12, 2005

Abstract

Close
We developed and studied an experimental system, RealTourist, which lets a user to plan a conference trip with the help of a remote tourist consultant who could view the tourist's eye-gaze superimposed onto a shared map. Data collected from the experiment were analyzed in conjunction with literature review on speech and eye-gaze patterns. This inspective, exploratory research identified various functions of gaze-overlay on shared spatial material including: accurate and direct display of partner's eye-gaze, implicit deictic referencing, interest detection, common focus and topic switching, increased redundancy and ambiguity reduction, and an increase of assurance, confidence, and understanding. This study serves two purposes. The first is to identify patterns that can serve as a basis for designing multimodal human-computer dialogue systems with eye-gaze locus as a contributing channel. The second is to investigate how computer-mediated communication can be supported by the display of the partner's eye-gaze.
Publication Details
  • Short presentation in UbiComp 2005 workshop in Tokyo, Japan.
  • Sep 11, 2005

Abstract

Close
As the use of rich media in mobile devices and smart environments becomes more sophisticated, so must the design of the everyday objects used as containers or controllers. Rather than simply tacking electronics onto existing furniture or other objects, the design of a smart object can enhance existing ap-plications in unexpected ways. The Convertible Podium is an experiment in the design of a smart object with complex integrated systems, combining the highly designed look and feel of a modern lectern with systems that allow it to serve as a central control station for rich media manipulation in next-generation confer-ence rooms. It enables easy control of multiple independent screens, multiple media sources (including mobile devices) and multiple distribution channels. The Podium is designed to ease the tasks involved in authoring and presenting in a rich media meeting room, as well as supporting remote telepresence and in-tegration with mobile devices.
Publication Details
  • Demo and presentation in UbiComp 2005 workshop in Tokyo, Japan.
  • Sep 11, 2005

Abstract

Close
A Post-Bit is a prototype of a small ePaper device for handling multimedia content, combining interaction control and display into one package. Post-Bits are modeled after paper Post-Its™; the functions of each Post-Bit combine the affordances of physical tiny sticky memos and digital handling of information. Post-Bits enable us to arrange multimedia contents in our embodied physical spaces. Tangible properties of paper such as flipping, flexing, scattering and rubbing are mapped to controlling aspects of the content. In this paper, we introduce the integrated design and functionality of the Post-Bit system, including four main components: the ePaper sticky memo/player, with integrated sensors and connectors; a small container/binder that a few Post-Bits can fit into, for ordering and multiple connections; the data and power port that allows communication with the host com-puter; and finally the software and GUI interface that reside on the host PC and manage multimedia transfer.
Publication Details
  • Sixteenth ACM Conference on Hypertext and Hypermedia
  • Sep 6, 2005

Abstract

Close
Hyper-Hitchcock is a hypervideo editor enabling the direct manipulation authoring of a particular form of hypervideo called "detail-on-demand video." This form of hypervideo allows a single link out of the currently playing video to provide more details on the content currently being presented. The editor includes a workspace to select, group, and arrange video clips into several linear sequences. Navigational links placed between the video elements are assigned labels and return behaviors appropriate to the goals of the hypervideo and the role of the destination video. Hyper-Hitchcock was used by students in a Computers and New Media class to author hypervideos on a variety of topics. The produced hypervideos provide examples of hypervideo structures and the link properties and behaviors needed to support them. Feedback from students identified additional link behaviors and features required to support new hypervideo genres. This feedback is valuable for the redesign of Hyper-Hitchcock and the design of hypervideo editors in general.

DoKumobility: Web services for the mobile worker

Publication Details
  • IEEE International Conference on Next Generation Web Services Practices (NWeSP'05), Seoul, Korea
  • Aug 22, 2005

Abstract

Close
Mobile users often require access to their documents while away from the office. While pre-loading documents in a repository can make those documents available remotely, people need to know in advance which documents they might need. Furthermore, it may be difficult to view, print, or share the document through a portable device such as cell phone. We implemented DoKumobility, a network of web services for mobile users for managing, printing, and sharing documents. In this paper, we describe the infrastructure and illustrate its use with several applications
Publication Details
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Aug 8, 2005

Abstract

Close
Organizing digital photograph collections according to events such as holiday gatherings or vacations is a common practice among photographers. To support photographers in this task, we present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present several variants of an automatic unsupervised algorithm to partition a collection of digital photographs based either on temporal similarity alone, or on temporal and content-based similarity. First, inter-photo similarity is quantified at multiple temporal scales to identify likely event clusters. Second, the final clusters are determined according to one of three clustering goodness criteria. The clustering criteria trade off computational complexity and performance. We also describe a supervised clustering method based on learning vector quantization. Finally, we review the results of an experimental evaluation of the proposed algorithms and existing approaches on two test collections.

Parallel Changes: Detecting Semantic Interferences

Publication Details
  • The 29th Annual International Computer Software and Applications Conference (COMPSAC 2005), Edinburgh, Scotland
  • Jul 26, 2005

Abstract

Close
Parallel changes are a basic fact of modern software development. Where previously we looked at prima facie interference, here we investigate a less direct form that we call semantic interference. We reduce the forms of semantic interference that we are interested in to overlapping def-use pairs. Using program slicing and data flow analysis, we present algorithms for detecting semantic interference for both concurrent changes (allowed in optimistic version management systems) and sequential parallel changes (supported in pessimistic version management systems), and for changes that are both immediate and distant in time. We provide these algorithms for changes that are additions, showing that interference caused by deletions can be detected by considering the two sets of changes in reverse-time order.
Publication Details
  • International Conference on Image and Video Retrieval 2005
  • Jul 21, 2005

Abstract

Close
Large video collections present a unique set of challenges to the search system designer. Text transcripts do not always provide an accurate index to the visual content, and the performance of visually based semantic extraction techniques is often inadequate for search tasks. The searcher must be relied upon to provide detailed judgment of the relevance of specific video segments. We describe a video search system that facilitates this user task by efficiently presenting search results in semantically meaningful units to simplify exploration of query results and query reformulation. We employ a story segmentation system and supporting user interface elements to effectively present query results at the story level. The system was tested in the 2004 TRECVID interactive search evaluations with very positive results.
Publication Details
  • ICME 2005
  • Jul 20, 2005

Abstract

Close
A common problem with teleconferences is awkward turn-taking - particularly 'collisions,' whereby multiple parties inadvertently speak over each other due to communication delays. We propose a model for teleconference discussions including the effects of delays, and describe tools that can improve the quality of those interactions. We describe an interface to gently provide latency awareness, and to give advanced notice of 'incoming speech' to help participants avoid collisions. This is possible when codec latencies are significant, or when a low bandwidth side channel or out-of-band signaling is available with lower latency than the primary video channel. We report on results of simulations, and of experiments carried out with transpacific meetings, that demonstrate these tools can improve the quality of teleconference discussions.
Publication Details
  • 2005 IEEE International Conference on Multimedia & Expo
  • Jul 6, 2005

Abstract

Close
A convenient representation of a video segment is a single keyframe. Keyframes are widely used in applications such as non-linear browsing and video editing. With existing methods of keyframe selection, similar video segments result in very similar keyframes, with the drawback that actual differences between the segments may be obscured. We present methods for keyframe selection based on two criteria: capturing the similarity to the represented segment, and preserving the differences from other segment keyframes, so that different segments will have visually distinct representations. We present two discriminative keyframe selection methods, and an example of experimental results.

AN ONLINE VIDEO COMPOSITION SYSTEM

Publication Details
  • IEEE International Conference on Multimedia & Expo July 6-8, 2005, Amsterdam, The Netherlands
  • Jul 6, 2005

Abstract

Close
This paper presents an information-driven online video composition system. The composition work handled by the system includes dynamically setting multiple pan/tilt/zoom (PTZ) cameras to proper poses and selecting the best close-up view for passive viewers. The main idea of the composition system is to maximize captured video information with limited cameras. Unlike video composition based on heuristic rules, our video composition is formulated as a process of minimizing distortions between ideal signals (i.e. signals with infinite spatial-temporal resolution) and displayed signals. The formulation is consistent with many well-known empirical approaches widely used in previous systems and may provide analytical explanations to those approaches. Moreover, it provides a novel approach for studying video composition tasks systematically. The composition system allows each user to select a personal close-up view. It manages PTZ cameras and a video switcher based on both signal characteristics and users' view selections. Additionally, it can automate the video composition process based on past users' view-selections when immediate selections are not available. We demonstrate the performance of this system with real meetings.
Publication Details
  • CHI 2005 Extended Abstracts, ACM Press, pp. 1395-1398
  • Apr 1, 2005

Abstract

Close
We present a search interface for large video collections with time-aligned text transcripts. The system is designed for users such as intelligence analysts that need to quickly find video clips relevant to a topic expressed in text and images. A key component of the system is a powerful and flexible user interface that incorporates dynamic visualizations of the underlying multimedia objects. The interface displays search results in ranked sets of story keyframe collages, and lets users explore the shots in a story. By adapting the keyframe collages based on query relevance and indicating which portions of the video have already been explored, we enable users to quickly find relevant sections. We tested our system as part of the NIST TRECVID interactive search evaluation, and found that our user interface enabled users to find more relevant results within the allotted time than those of many systems employing more sophisticated analysis techniques.

Improving Proactive Information Systems

Publication Details
  • International Conference on Intelligent User Interfaces (IUI 2005)
  • Jan 9, 2005

Abstract

Close
Proactive contextual information systems help people locate information by automatically suggesting potentially relevant resources based on their current tasks or interests. Such systems are becoming increasingly popular, but designing user interfaces that effectively communicate recommended information is a challenge: the interface must be unobtrusive, yet communicate enough information at the right time to provide value to the user. In this paper we describe our experience with the FXPAL Bar, a proactive information system designed to provide contextual access to corporate and personal resources. In particular, we present three features designed to communicate proactive recommendations more effectively: translucent recommendation windows increase the user's awareness of particularly highly-ranked recommendations, query term highlighting communicates the relationship between a recommended document and the user's current context, and a novel recommendation digest function allows users to return to the most relevant previously recommended resources. We present empirical evidence supporting our design decisions and relate lessons learned for other designers of contextual recommendation systems.
2004

Contextual Lexical Valence Shifters

Publication Details
  • Yan Qu, James Shanahan, and Janyce Wiebe, Cochairs. 2004. Exploring Attitude and Affect in Text: Theories and Applications. Technical Report SS-04-07, AAAI Press, ISBN 1-57735-219-x
  • Dec 6, 2004
Publication Details
  • Springer Lecture Notes in Computer Science - Advances in Multimedia Information Processing, Proc. PCM 2004 5th Pacific Rim Conference on Multimedia, Tokyo, Japan
  • Dec 1, 2004

Abstract

Close
For some years, our group at FX Palo Alto Laboratory has been developing technologies to support meeting recording, collaboration, and videoconferencing. This paper presents several systems that use video as an active interface, allowing remote devices and information to be accessed "through the screen." For example, SPEC enables collaborative and automatic camera control through an active video window. The NoteLook system allows a user to grab an image from a computer display, annotate it with digital ink, then drag it to that or a different display. The ePIC system facilitates natural control of multi-display and multi-device presentation spaces, while the iLight system allows remote users to "draw" with light on a local object. All our systems serve as platforms for researching more sophisticated algorithms to support additional functionality and ease of use.
Publication Details
  • ACM Multimedia 2004
  • Oct 28, 2004

Abstract

Close
In this paper, we compare several recent approaches to video segmentation using pairwise similarity. We first review and contrast the approaches within the common framework of similarity analysis and kernel correlation. We then combine these approaches with non-parametric supervised classification for shot boundary detection. Finally, we discuss comparative experimental results using the 2002 TRECVID shot boundary detection test collection.

Who cares? Reflecting who is reading what on distributed community bulletin boards

Publication Details
  • UIST 2004, the Seventeenth Annual ACM Symposium on User Interface Software and Technology, October 24-27, 2004
  • Oct 24, 2004

Abstract

Close
In this paper, we describe the YeTi information sharing system that has been designed to foster community building through informal digital content sharing. The YeTi system is a general information parsing, hosting and distribution infrastructure, with interfaces designed for individual and public content reading. In this paper we describe the YeTi public display interface, with a particular focus on tools we have designed to provide lightweight awareness of others' interactions with and interest in posted content. Our tools augment content with metadata that reflect people's reading of content - captured video clips of who's reading and interacting with content, tools to allow people to leave explicit freehand annotations about content, and a visualization of the content access history to show when content is interacted with. Results from an initial evaluation are presented and discussed.
Publication Details
  • UIST 2004 Companion, pp. 37-38
  • Oct 24, 2004

Abstract

Close
As the size of the typical personal digital photo collection reaches well into the thousands or photos, advanced tools to manage these large collections are more and more necessary. In this demonstration, we present a semi-automatic approach that opportunistically takes advantage of the current state-of-the-art technology in face detection and recognition and combines it with user interface techniques to facilitate the task of labeling people in photos. We show how we use an accurate face detector to automatically extract faces from photos. Instead of having a less accurate face recognizer classify faces, we use it to sort faces by their similarity to a face model. We demonstrate our photo application that uses the extracted faces as UI proxies for actions on the underlying photos along with the sorting strategy to identify candidate faces for quick and easy face labeling.
Publication Details
  • UIST 2004 Companion, pp. 13-14
  • Oct 24, 2004

Abstract

Close
We developed a novel technique for creating visually pleasing collages from photo regions. The technique is called "stained glass" because the resulting collage with irregular shapes is reminiscent of a stained glass window. The collages reuse photos in novel ways to present photos with faces that can be printed, included in Web pages, or shared via email. The poster describes the requirements for creating stained glass visualizations from photos of faces, our approach for creating face stained glass, and techniques used to improve the aesthetics and flexibility of the stained glass generation. Early user feedback with face stained glass have been very positive.

Remote Interactive Graffiti

Publication Details
  • Proc. ACM Multimedia 2004
  • Oct 12, 2004

Abstract

Close
We present an installation that allows distributed internet participants to "draw" on a public scene using light. The iLight system is a camera/projector system designed for remote collaboration. Using a familiar digital drawing interface, remote users "draw" on a live video image of a real-life object or scene. Graphics drawn by the user are then projected onto the scene, where they are visible in the camera image. Because camera distortions are corrected and the video is aligned with the image canvas, drawn graphics appear exactly where desired. Thus the remote users may harmlessly mark a physical object to serve their own their artistic and/or expressive needs. We also describe how local participants may interact with remote users through the projected images. Besides the intrinsic "neat factor" of action at a distance, this installation serves as an experiment in how multiple users from different locales and cultures can create a social space that interacts with a physical one, as well as raising issues of free expression in a non-destructive context.
Publication Details
  • Proceedings of the International Workshop on Multimedia Information Retrieval, ACM Press, pp. 99-106
  • Oct 10, 2004

Abstract

Close
With digital still cameras, users can easily collect thousands of photos. We have created a photo management application with the goal of making photo organization and browsing simple and quick, even for very large collections. A particular concern is the management of photos depicting people. We present a semi-automatic approach designed to facilitate the task of labeling photos with people that opportunistically takes advantage of the strengths of current state-of-the-art technology in face detection and recognition. In particular, an accurate face detector is used to automatically extract faces from photos while the less accurate face recognizer is used not to classify the detected faces, but to sort faces by their similarity to a chosen model. This sorting is used to present candidate faces within a user interface designed for quick and easy face labeling. We present results of a simulation of the usage model that demonstrate the improved ease that is achieved by our method.
Publication Details
  • IEEE Computer Graphics & Applications, pp. 66-75
  • Sep 1, 2004

Abstract

Close
Information sharing, computation and social interaction are main features of the Web that has enabled online communities to abound and flourish. However, this trend has not been coupled with the development of cues and browsing mechanisms for the social space. On the flip side, active contributors to social spaces (i.e., Web communities) lack the means to present a public face to visitors that can be important for social organizations. Social browsers that combine social visualization and tools can enable newcomers and visitors to view and explore information and patterns. We present two social browsers for two Web communities. The CHIplace People browser provides an abstract graphical view of the CHIplace community based on the self-described work roles of its membership. The Portkey eTree browser uses a life-like tree ecosystem metaphor to reflect the people, activities and discussions occurring on the Portkey Web site.
Publication Details
  • In Proceedings of Hypertext 2004, ACM Press
  • Aug 9, 2004

Abstract

Close
The preservation of literary hypertexts presents significant challenges if we are to ensure continued access to them as the underlying technology changes. Not only does such an effort involve standard digital preservation problems of representing and refreshing metadata, any constituent media types, and structure; hypertext preservation poses additional dimensions that arise from the work's on-screen appearance, its interactive behavior, and the ways a reader's interaction with the work is recorded. In this paper, we describe aspects of preservation introduced by literary hypertexts such as the need to reproduce their modes of interactivity and their means of capturing and using records of reading. We then suggest strategies for addressing the pragmatic dimensions of hypertext preservation and discuss their status within existing digital preservation schemes. Finally, we examine the possible roles various stakeholders within and outside of the hypertext community might assume, including several social and legal issues that stem from preservation.

Hybrid Text Summarization: Combining external relevance measures with Structural Analysis

Publication Details
  • Proceedings of the ACL2004 Workshop Text Summarization Branches Out, Barcelona, Spain, July 25-26, 2004.
  • Jul 25, 2004

Abstract

Close
A novel linguistically advanced text summarization system is described for reducing the minimum size of highly readable variable-sized summaries of digitized text documents produced by text summarization methods that use discourse analysis to rank sentences for in-clusion in the final summary. The basic algorithm used in FXPAL's PALSUMM text summarization system combines text structure methods that preserve readability and correct reference resolution with statistical methods to reduce overall summary length while promoting the inclusion of important material.

Sentential Structure and Discourse Parsing

Publication Details
  • Proceedings of the ACL2004 Workshop on Discourse Annotation, Barcelona, Spain, July 25-26, 2004.
  • Jul 25, 2004

Abstract

Close
In this paper, we describe how the LIDAS System (Linguistic Discourse Analysis System), a discourse parser built as an implementation of the Unified Linguistic Discourse Model (U-LDM) uses information from sentential syntax and semantics along with lexical semantic information to build the Open Right Discourse Parse Tree (DPT) that serves as a representation of the structure of the discourse (Polanyi et al., 2004; Thione 2004a,b). More specifically, we discuss how discourse segmentation, sentence-level discourse parsing, and text-level discourse parsing depend on the relationship between sentential syntax and discourse. Specific discourse rules that use syntactic information are used to identify possible attachment points and attachment relations for each Basic Discourse Unit to the DPT.

LiveTree: An Integrated Workbench for Discourse Processing

Publication Details
  • Proceedings of the ACL2004 Workshop on Discourse Annotation, Barcelona, Spain, July 25-26, 2004.
  • Jul 25, 2004

Abstract

Close
In this paper, we introduce LiveTree, a core component of LIDAS, the Linguistic Discourse Analysis System for automatic discourse parsing with the Unified Linguistic Discourse Model. LiveTree is an integrated workbench for supervised and unsupervised creation, storage and manipulation of the discourse structure of text documents under the U-LDM. The LiveTree environment provides tools for manual and automatic U-LDM segmentation and discourse parsing. Document management, grammar testing, manipulation of discourse structures and creation and editing of discourse relations are also supported.
Publication Details
  • Proceedings of 2004 IEEE International Conference on Multimedia and Expo (ICME 2004)
  • Jun 27, 2004

Abstract

Close
This paper presents a method for creating highly condensed video summaries called Stained-Glass visualizations. These are especially suitable for small displays on mobile devices. A morphological grouping technique is described for finding 3D regions of high activity or motion from a video embedded in x-y-t space. These regions determine areas in the keyframes, which can be subsumed in a more general geometric framework of germs and supports: germs are the areas of interest, and supports give the context. Algorithms for packing and laying out the germs are provided. Gaps between the germs are filled using a Voronoi-based method. Irregular shapes emerge, and the result looks like stained glass.
Publication Details
  • Proceedings of 2004 IEEE International Conference on Multimedia and Expo (ICME 2004)
  • Jun 27, 2004

Abstract

Close
Using a machine to assist remote environment management can save people's time, effort, and traveling cost. This paper proposes a trainable mobile robot system, which allows people to watch a remote site through a set of cameras installed on the robot, drive the platform around, and control remote devices using mouse or pen based gestures performed in video windows. Furthermore, the robot can learn device operations when it is being used by humans. After being used for a while, the robot can automatically select device control interfaces, or launch a pre-defined operation sequence based on its sensory inputs.
Publication Details
  • Proceedings of 2004 IEEE International Conference on Multimedia and Expo (ICME 2004)
  • Jun 27, 2004

Abstract

Close
Many conference rooms are now equipped with multiple multi-media devices, such as plasma displays and surrounding speakers, to enhance presentation quality. However, most existing presentation authoring tools are based on the one-display-and-one-speaker assumption, which makes it difficult to organize and playback a presentation dispatched to multiple devices, thus hinders users from taking full advantage of additional multimedia devices. In this paper, we propose and implement a tool to facilitate authoring and playback of a multi-channel presentation in a media devices distributed environment. The tool, named PreAuthor, provides an intuitive and visual way to author a multi-channel presentation by dragging and dropping "hyper-slides" on corresponding visual representations of various devices. PreAuthor supports "hyper-slide" synchronization among various output devices during preview and playback. It also offers multiple options for the presenter to view the presentation in a rendered image sequence, live video, 3D VRML model, or real environment.
Publication Details
  • JOINT AMI/PASCAL/IM2/M4 Workshop on Multimodal Interaction and Related Machine Learning Algorithms
  • Jun 22, 2004

Abstract

Close
For some years, our group at FX Palo Alto Laboratory has been developing technologies to support meeting recording, collaboration, and videoconferencing. This paper presents a few of our more interesting research directions. Many of our systems use a video image as an interface, allowing devices and information to be accessed "through the screen." For example, SPEC enables hybrid collaborative and automatic camera control through an active video window. The NoteLook system allows a user to grab an image from a computer display, annotate it with digital ink, then drag it to that or a different display, while automatically generating timestamps for later video review. The ePIC system allows natural use and control of multi-display and multi-device presentation spaces, and the iLight system allows remote users to "draw" with light on a local object. All our systems serve as platforms for researching more sophisticated algorithms that will hopefully support additional advanced functions and ease of use.
Publication Details
  • ED-Media 2004
  • Jun 21, 2004

Abstract

Close
This paper presents a system designed to support note taking by students on wirelessly connected PDAs in a classroom. The system leverages the devices' wireless connectivity to allow students to share their notes in real time and quickly reuse words from their fellow note takers. In addition, presentation material such as Powerpoint slides is also extracted when presented by the instructor, giving students further means for reusing words. We describe the system and report our findings on an initial user study where the system was used for four months during a graduate level course.
Publication Details
  • Journal of Human Interface Society, 6(2), pp. 51-58
  • Jun 1, 2004

Abstract

Close
FX Palo Alto Laboratory provides multimedia and information technology research for the Fuji Xerox corporation based in Tokyo, Japan. FXPAL's mission is to help Fuji Xerox with a digital information technology infrastructure to support services in Fuji Xerox's Open Office Frontier. Our research spans interactive media, immersive conferencing, social computing, mobile and adaptive computing, natural language inquiry, and emerging technologies such as quantum computing and bioinformatics. Our research methods combine determining user needs, inventing new technologies, building prototype systems, informing professional communities, and transferring technology to Fuji Xerox. The physical distance between our laboratory and our parent company makes it natural for us to research problems with collaborations across time zones and cultures. To address these problems, to test our ideas, and to prepare for technology transfers, we actively create prototype systems for interactive media, immersive conferencing, and social and mobile computing. We also foster collaboration with our Japanese colleagues through a combination of face-to-face visits and both synchronous and asynchronous remote communication.
Publication Details
  • Proceedings of the Working Conference on Advanced Visual Interfaces, AVI 2004, pp. 290-297
  • May 25, 2004

Abstract

Close
We introduced detail-on-demand video as a simple type of hypervideo that allows users to watch short video segments and to follow hyperlinks to see additional detail. Such video lets users quickly access desired information without having to view the entire contents linearly. A challenge for presenting this type of video is to provide users with the appropriate affordances to understand the hypervideo structure and to navigate it effectively. Another challenge is to give authors tools that allow them to create good detail-on-demand video. Guided by user feedback, we iterated designs for a detail-on-demand video player. We also conducted two user studies to gain insight into people's understanding of hypervideo and to improve the user interface. We found that the interface design was tightly coupled to understanding hypervideo structure and that different designs greatly affected what parts of the video people accessed. The studies also suggested new guidelines for hypervideo authoring.
Publication Details
  • ACM Interactions Magazine
  • May 1, 2004

Abstract

Close
This article describes two years of experience with a research prototype for personalizing shared workplace devices such as projectors, public displays, and multi-function copiers. The system combines users' networked resources-or "personal information clouds"—with device-specific user interfaces for performing common device tasks. We developed and compared personal interfaces that are embedded (i.e., integrated or co-located with the shared device) and portable (i.e., accessible via personal devices such as mobile phones and PDAs). Our experience indicates that a little personalization can go a long way toward improving user friendliness, efficiency, and capabilities of shared document devices, helping them "weave themselves into the fabric of everyday life". We also gained important insights into subtle differences between embedded and portable approaches to ubiquitous computing systems.

A Rule Based Approach to Discourse Parsing

Publication Details
  • Proceedings of the 5th SIGdial Workshop in Discourse And Dialogue. Cambridge, MA USA pp. 108-117.
  • May 1, 2004

Abstract

Close
In this paper we present recent developments in discourse theory and parsing under the Linguistic Discourse Model framework, a semantic theory of discourse structure. We give a novel approach to the problems of discourse segmentation based on discourse semantics and sketch a limited but robust approach to symbolic discourse parsing based on syntactic, semantic and lexical rules. To demonstrate the utilioty of the system in a real application, we briefly describe the architecture of the PALSUMM System, a symbolic smmarization system being developed at FX Palo Alto Laboratory that uses discourse structures constructed usding the theory otlined to summarize written English texts.

MiniMedia Surfer: Browsing Video Segments on Small Displays

Publication Details
  • CHI 2004 short paper
  • Apr 27, 2004

Abstract

Close
It is challenging to browse multimedia on mobile devices with small displays. We present MiniMedia Surfer, a prototype application for interactively searching a multimedia collection for video segments of interest. Transparent layers are used to support browsing subtasks: keyword query, exploration of results through keyframes, and playback of video. This layered interface smoothly blends the key tasks of the browsing process and deals with the small screen size. During exploration, the user can adjust the transparency levels of the layers using pen gestures. Details of the video segments are displayed in an expandable timeline that supports gestural interaction.

Sharing Multimedia Content with Interactive Displays: A Case Study

Publication Details
  • ACM DIS 2004, Cambridge, August 1-4, 2004. New York: ACM
  • Apr 18, 2004

Abstract

Close
The Plasma Posters are large screen, digital, interactive posterboards designed for informal content sharing within teams, groups, organizations and communities. Leveraging the fact that the physical world is often used as a canvas for asynchronous information exchange via paper fliers and posters, the Plasma Posters provide a platform for sharing interactive digital content in public places. People serendipitously encounter multimedia content usually only encountered from the desktop, while going about their daily business. In this paper we describe the Plasma Poster interface in detail, and offer and overview of the underlying information authoring, parsing, storage, distribution and publishing infrastructure, the Plasma Poster Network. We report qualitative and quantitative data collected over 14 months of use that demonstrate the Plasma Posters have become an integral part of information sharing within our organization. We conclude the paper by reflecting on the patterns of adoption, and speculate on factors that have contributed to the system's success. Finally we briefly describe three other installations of the Plasma Poster Network and new interfaces that have been designed.

Collaborative Note Taking

Publication Details
  • WMTE 2004
  • Mar 22, 2004

Abstract

Close
Collaborative note taking enables students in a class to take notes on their PDAs and share them with their "study group" in real-time. Students receive instructor's slides on their PDAs as they are displayed by the instructor. As the individual members of the group take notes pertaining to the slide being presented, their notes are automatically sent to all members of the group. In addition, to reduce their typing, students can use text they receive from other students and from the instructors slides to construct their notes. This system has been used in actual practice for a graduate level course on wireless mobile computing. In developing this system, special attention has been paid to the task of inputting text on PDAs, efficient use of the screen real estate, dynamics among students, privacy and ease of use issues.

Shot boundary detection via similarity analysis

Publication Details
  • Proceedings TRECVID 2003
  • Mar 1, 2004

Abstract

Close
In this paper, we present a framework for analyzing video using self-similarity. Video scenes are located by analyzing inter-frame similarity matrices. The approach is flexible to the choice of both feature parametrization and similarity measure and it is robust because the data is used to model itself. We present the approach and its application to shot boundary detection.

Digital Graffiti: Public Annotation of Multimedia Content

Publication Details
  • CHI 2004, Vienna, Austria, April 24-29, 2004. New York: ACM Publications.
  • Feb 26, 2004

Abstract

Close
Our physical environment is increasingly filled with multimedia content on situated, community public displays. We are designing methods for people to post and acquire digital information to and from public digital displays, and to modify and annotate previously posted content to create publicly observable threads. We support in-the-moment and on-site "person-to-place-to-people-to-persons" content interaction, annotation, augmentation and publication. We draw design inspiration from field work observations of how people remove, modify and mark up paper postings. We present our initial designs in this arena, and some initial user reactions.

Gooey Interfaces: An Approach for Rapidly Repurposing Digital Content

Publication Details
  • CHI 2004, Vienna, Austria, April 24-29, 2004. New York: ACM Publications
  • Feb 20, 2004

Abstract

Close
With the acceleration of technological development we are reaching the point where our systems and their user interfaces become to some degree outdated 'legacy systems' as soon as they are released. This raises the question of how can we maintain, extend, override, and adapt these systems while preserving what people depend on in them? In this paper we describe an approach for dynamically restructuring user interfaces into a set of communicating processes that 1) provide methods for changing their appearance, behavior, and state; and 2) report their proposed state changes so that other processes may override their actions in updating themselves to a new state. We do this for both new and wrapped legacy user interface components, thereby allowing us to repurpose user interfaces for our evolving needs. We describe how this approach has been successfully used in rapidly creating and deploying interfaces that repurpose content for new appearances and behaviors.
Publication Details
  • Communications of the ACM, February 2004, Vol. 47, No. 2, pp. 38-44
  • Feb 1, 2004

Abstract

Close
Blurring the boundary between the digital and physical in social activity spaces helps blend - and motivate - online and face-to-face community participation. This paper discusses two experimental installations of large screen displays at conferences - CHI 2002 and CSCW 2002. The displays offered a window in the conference arena onto online community information.

Contextual Contact Retrieval

Publication Details
  • International Conference on Intelligent User Interfaces (IUI 2004)
  • Jan 13, 2004

Abstract

Close
People routinely rely on physical and electronic systems to remind themselves of details regarding personal and organizational contacts. These systems include rolodexes, directories and contact databases. In order to access details regarding contacts, users must typically shift their attention from tasks they are performing to the contact system itself in order to manually look-up contacts. This paper presents an approach for automatically retrieving contacts based on users' current context. Results are presented to users in a manner that does not disrupt their tasks, but which allows them to access contact details with a single interaction. The approach promotes the discovery of new contacts that users may not have found otherwise and supports serendipity.

Inhabited Information Spaces: Living with Your Data

Publication Details
  • London: Springer-Verlag, 2003
  • Jan 5, 2004

Abstract

Close
This edited volume brings together projects funded by, or related to, the European i3 initiative. Projects address the design, development and use of rich, digital information spaces for collocated and distributed collaboration.
2003

Public and Situated Displays. Social and Interactional Aspects of Shared Display Technologies

Publication Details
  • Lond: Kluwer Academic Publishers
  • Dec 31, 2003

Abstract

Close
Public and situated display technologies have an important impact on individual and social behaviour and present us with particular interesting new design considerations and challenges. While there is a growing body of research exploring these design considerations and social impacts this work remains somewhat disparate, making it difficult to assimilate in a coherent manner. This book brings together the perspective of key researchers in the area of public and situated display technology. The chapters detail research representing the social, technical and interactional aspects of public and sitauted display technologies. The underlying concern common to these chapters is how these displays can be best designed for collaboration, coordination, community building and mobility.

THE PLASMA POSTER NETWORK Social Hypermedia on Public Display

Publication Details
  • In Public and Situated Displays. Social and Interactional Aspects of Shared Display Technologies. K. O'Hara, M.Perry, E. Churchill and D. Russell (Eds) London: Kluwer Acamdemic Publishers
  • Dec 31, 2003

Abstract

Close
The sharing of digital materials within online communities has increased significantly in recent years. Our work focuses on promoting community information sharing in public spaces using large screen, interactive, digital poster boards called the Plasma Posters. In this chapter we first describe our fieldwork-led, iterative design process, and elaborate a number of design guidelines that resulted. Following this, the design and development of the Plasma Posters themselves and the underlying network infrastructure is discussed. Finally, we present results from qualitative and quantitative evaluations over the course of a ten-month deployment of three Plasma Posters within our own organization, a software research community made up of technologists and designers. We conclude with observations regarding ergonomic, social and other factors that were raised during the design and deployment and offer reflections on factors in the success of this deployment.

A fast, interactive 3D paper-flier metaphor for digital bulletin boards

Publication Details
  • UIST 2003
  • Nov 1, 2003

Abstract

Close
We describe a novel interface for presenting interactive content on public digital bulletin boards. Inspired by paper fliers on physical bulletin boards, posted content is displayed using 3D virtual fliers attached to a virtual corkboard by virtual pushpins. Fliers appear in different orientations, creating an attractive, informal look, and have autonomous behaviors like fluttering in the wind. Passers-by can rotate, move and fold fliers; they can also interact with fliers' live content. Flier content is streamed from a server and represented by the system on large screen displays using a real-time cloth simulation algorithm. We describe our prototype, and offer the results of an initial evaluative user study.
Publication Details
  • Proc. ACM Multimedia 2003. pp. 364-373
  • Nov 1, 2003

Abstract

Close
We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization. Finally, we include experimental results for the proposed algorithms and several competing approaches on two test collections.
Publication Details
  • Proc. ACM Multimedia 2003, pp. 546-554
  • Nov 1, 2003

Abstract

Close
We present a system that allows remote and local participants to control devices in a meeting environment using mouse or pen based gestures "through" video windows. Unlike state-of-the-art device control interfaces that require interaction with text commands, buttons, or other artificial symbols, our approach allows users to interact with devices through live video of the environment. This naturally extends our video supported pan/tilt/zoom (PTZ) camera control system, by allowing gestures in video windows to control not only PTZ cameras, but also other devices visible in video images. For example, an authorized meeting participant can show a presentation on a screen by dragging the file on a personal laptop and dropping it on the video image of the presentation screen. This paper presents the system architecture, implementation tradeoffs, and various meeting control scenarios.
Publication Details
  • Proc. ACM Multimedia 2003. pp. 92-93
  • Nov 1, 2003

Abstract

Close
To simplify the process of editing interactive video, we developed the concept of "detail-on-demand" video as a subset of general hypervideo. Detail-on-demand video keeps the authoring and viewing interfaces relatively simple while supporting a wide range of interactive video applications. Our editor, Hyper-Hitchcock, provides a direct manipulation environment in which authors can combine video clips and place hyperlinks between them. To summarize a video, Hyper-Hitchcock can also automatically generate a hypervideo composed of multiple video summary levels and navigational links between these summaries and the original video. Viewers may interactively select the amount of detail they see, access more detailed summaries, and navigate to the source video through the summary.
Publication Details
  • Proc. ACM Multimedia 2003. pp. 392-401
  • Nov 1, 2003

Abstract

Close
In this paper, we describe how a detail-on-demand representation for interactive video is used in video summarization. Our approach automatically generates a hypervideo composed of multiple video summary levels and navigational links between these summaries and the original video. Viewers may interactively select the amount of detail they see, access more detailed summaries, and navigate to the source video through the summary. We created a representation for interactive video that supports a wide range of interactive video applications and Hyper-Hitchcock, an editor and player for this type of interactive video. Hyper-Hitchcock employs methods to determine (1) the number and length of levels in the hypervideo summary, (2) the video clips for each level in the hypervideo, (3) the grouping of clips into composites, and (4) the links between elements in the summary. These decisions are based on an inferred quality of video segments and temporal relations those segments.

Detail-on-Demand Hypervideo

Publication Details
  • Proc. ACM Multimedia 2003. pp. 600-601
  • Nov 1, 2003

Abstract

Close
We demonstrate the use of detail-on-demand hypervideo in interactive training and video summarization. Detail-on-demand video allows viewers to watch short video segments and to follow hyperlinks to see additional detail. The player for detail-ondemand video displays keyframes indicating what links are available at each point in the video. The Hyper-Hitchcock authoring tool helps users create hypervideo by automatically dividing video into clips that can be combined in a direct manipulation interface. Clips can be grouped into composites and hyperlinks can be placed between clips and composites. A summarization algorithm creates multi-level hypervideo summaries from linear video by automatically selecting clips and placing links between them.

Shifting Attitudes

Publication Details
  • Multiple Approaches to Discourse 2003
  • Oct 22, 2003
Publication Details
  • 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
  • Oct 19, 2003

Abstract

Close
We present a framework for summarizing digital media based on structural analysis. Though these methods are applicable to general media, we concentrate here on characterizing repetitive structure in popular music. In the first step, a similarity matrix is calculated from inter-frame spectral similarity. Segment boundaries, such as verse-chorus transitions, are found by correlating a kernel along the diagonal of the matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the pairwise similarity of their statistics, using a matrix decomposition approach. Finally, the audio is summarized by combining segments representing the clusters most frequently repeated throughout the piece. We present results on a small corpus showing more than 90% correct detection of verse and chorus segments.

Palimpsests on Public View:Annotating Community Content with Personal Devices

Publication Details
  • UBICOMP, Seattle, October 12-15th
  • Oct 12, 2003

Abstract

Close
This demonstration introduces UbiComp attendees to a system for content annotation and open-air, social blogging on interactive, publicly situated, digital poster boards using public and personal devices. We describe our motivation, a scenario of use, our prototype, and an outline of the demonstration.

Technology Support for Communication and Understanding

Publication Details
  • Journal of Decision Systems, Volume 12, Number 2, pages 123-139
  • Sep 24, 2003

Abstract

Close
As the Internet and related technologies for communication change, the role of communication in the conduct of business changes with it. Communication used to be viewed as a technical problem of separating signal from noise and managing bandwidth. Now it is a social matter in which negotiating differences in understanding among communicators is a primary business priority. Addressing this priority requires an understanding of how individuals interact in the course of their decision making activities. Using the work of Anthony Giddens as a point of departure, this paper views interaction in communication as consisting of three dimensions - meaning, authority, and trust. These three dimensions are used to identify new opportunities for advances in decision making technology that help deal with potential breakdowns in social interaction.
Publication Details
  • Proc. IEEE Intl. Conf. on Image Processing
  • Sep 14, 2003

Abstract

Close
We present similarity-based methods to cluster digital photos by time and image content. This approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We describe versions of the algorithm using temporal similarity with and without content-based similarity, and compare the algorithms with existing techniques, measured against ground-truth clusters created by humans.