Publications

FXPAL publishes in top scientific conferences and journals.

2010

TalkMiner: A Lecture Webcast Search Engine

Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

The design and implementation of a search engine for lecture webcasts is described. A searchable text index is created allowing users to locate material within lecture videos found on a variety of websites such as YouTube and Berkeley webcasts. The index is created from words on the presentation slides appearing in the video along with any associated metadata such as the title and abstract when available. The video is analyzed to identify a set of distinct slide images; OCR and lexical processing are then applied to these images to generate a list of indexable terms. Several problems were discovered when trying to identify distinct slides in the video stream. For example, picture-in-picture compositing of a speaker and a presentation slide, switching cameras, and slide builds confuse basic frame-differencing algorithms for extracting keyframe slide images. Algorithms are described that improve slide identification. A prototype system was built to test the algorithms and the utility of the search engine. Users can browse lists of lectures, slides in a specific lecture, or play the lecture video. Over 10,000 lecture videos have been indexed from a variety of sources. A public website will be published in mid-2010 that allows users to experiment with the search engine.
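To make the "basic frame-differencing" baseline mentioned in the abstract above concrete, here is a minimal sketch in Python. It is not the TalkMiner implementation and does not include the improved algorithms the paper describes. The function names, the mean-absolute-difference measure, and the threshold value are all assumptions chosen for illustration, and frames are modeled as small grayscale grids rather than decoded video.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equal-sized grayscale frames."""
    total = sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )
    count = sum(len(row) for row in a)
    return total / count


def select_keyframes(frames, threshold=10.0):
    """Return indices of frames that differ from the last kept keyframe
    by more than `threshold` (a hypothetical value, not from the paper)."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[keyframes[-1]], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


def solid(value, width=4, height=3):
    """Build a synthetic frame filled with one gray level."""
    return [[value] * width for _ in range(height)]


# Five synthetic frames: small changes are ignored, large ones start a new slide.
frames = [solid(0), solid(1), solid(200), solid(201), solid(50)]
keyframe_indices = select_keyframes(frames)
print(keyframe_indices)
```

A global difference of this kind reacts to any large change in the frame, which is consistent with the abstract's remark that camera switches, picture-in-picture compositing, and slide builds confuse the basic approach.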
Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

An Embedded Media Marker (EMM) is a transparent mark printed on a paper document that signifies the availability of additional media associated with that part of the document. Users take a picture of the EMM using a camera phone, and the media associated with that part of the document is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image features of the document within the EMM boundary. Unlike other feature-based retrieval methods, the EMM clearly indicates to the user the existence and type of media associated with the document location. A semi-automatic authoring tool is used to place an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document. We will demonstrate how to create an EMM-enhanced document, and how the EMM enables access to the associated media on a cell phone.
Publication Details
  • ACM Multimedia
  • Oct 25, 2010

Abstract

FACT is an interactive paper system for fine-grained interaction with documents across the boundary between paper and computers. It consists of a small camera-projector unit, a laptop, and ordinary paper documents. With the camera-projector unit pointing to a paper document, the system allows a user to issue pen gestures on the paper document for selecting fine-grained content and applying various digital functions. For example, the user can choose individual words, symbols, figures, and arbitrary regions for keyword search, copy and paste, web search, and remote sharing. FACT thus enables a computer-like user experience on paper. This paper interaction can be integrated with laptop interaction for cross-media manipulations on multiple documents and views. We present the infrastructure, supporting techniques and interaction design, and demonstrate the feasibility via a quantitative experiment. We also propose applications such as document manipulation, map navigation and remote collaboration.
Publication Details
  • NPUC2010
  • Oct 22, 2010

Abstract

The massive amounts of information that are being collected about each of us will only increase as sensors become ever cheaper and more powerful. Analysis of this wealth of data supports advances in medicine and public health, improved software and services through user pattern analysis, and more efficient economic mechanisms. At the same time, the potential for misuse of such data is significant. A long-term research question is how best to support beneficial uses while inhibiting misuse. One approach is to enable individuals to maintain tighter control of their own data while still supporting the computation of group statistics. Currently, analysts are usually given access to all data in order to compute statistics, and often use a third party service provider to store, or even process, such data. Either the third party has access to all data or the data are encrypted, in which case the third party does no processing. An interesting research question is how to provide mechanisms to support "need to know" security in which an individual has full access to her own data, the third party learns nothing about the data but can nevertheless contribute to the processing, and the analyst learns only the desired statistics. We have explored "need to know" security in connection with MyUnity, a prototype awareness system. MyUnity collects data from a variety of sources and displays summary presence states, such as "in office" or "with visitor," computed from the received data. MyUnity was deployed in a small research lab and has been in use by over 30 people for more than a year. To avoid concerns about misuse, the system did not store any data initially. The researchers developing the system were interested, however, in analyzing usage patterns, and users expressed interest in seeing personal trends, activity patterns of coworkers, and long-term data pooled across groups of users, all requiring data to be stored.
At the same time, users continued to be concerned about misuse of stored data. We looked at "need to know" security for cases in which, at each time step, each member of a group of users has a value (i.e., a presence state) to contribute, and the group would like to provide only an aggregate view of those values to people outside their group. We designed and implemented an efficient protocol that enables each user to encrypt under her own key in such a way that a third party can compute an encryption of a sum across values encrypted under different keys without the need for further interactions with the individuals. The protocol provides means for an analyst to decrypt the encrypted sum. We designed key structures and extensions to provide a family of efficient non-interactive "need to know" protocols for time series data in which the analyst learns only the statistics, not the individual data values, and the third party learns nothing about the values.
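The idea of summing values encrypted under different keys can be illustrated with a toy additive-masking scheme. This is a sketch only and not the protocol from the paper: the abstract does not specify the construction. The modulus, the function names, and the assumption that the analyst is handed the sum of the users' keys are all hypothetical. Each key here is also single-use, so a real time-series scheme would need fresh keys per time step.

```python
import secrets

# Hypothetical modulus; it only needs to exceed the largest possible sum.
MODULUS = 2**61 - 1


def make_keys(n_users):
    """Each user draws her own random one-time key."""
    return [secrets.randbelow(MODULUS) for _ in range(n_users)]


def encrypt(value, key):
    """User side: mask the value with the user's own key."""
    return (value + key) % MODULUS


def aggregate(ciphertexts):
    """Third party: add ciphertexts without seeing any plaintext value."""
    return sum(ciphertexts) % MODULUS


def decrypt_sum(encrypted_sum, key_sum):
    """Analyst: remove the combined mask to recover only the total."""
    return (encrypted_sum - key_sum) % MODULUS


# Presence states encoded as 1 ("in office") or 0 (otherwise) for four users.
values = [1, 0, 1, 1]
keys = make_keys(len(values))
ciphertexts = [encrypt(v, k) for v, k in zip(values, keys)]

# Assumption: the analyst holds only the sum of the keys, not individual keys.
analyst_key = sum(keys) % MODULUS
total = decrypt_sum(aggregate(ciphertexts), analyst_key)
print(total)
```

In this sketch the third party sees only masked values, and the analyst can unmask the sum but not any single contribution, which mirrors the division of knowledge the abstract describes.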

Camera Pose Navigation using Augmented Reality

Publication Details
  • ISMAR 2010
  • Oct 13, 2010

Abstract

We propose an Augmented Reality (AR) system that helps users take a picture from a designated pose, such as the position and camera angle of an earlier photo. Repeat photography is frequently used to observe and document changes in an object. Our system uses AR technology to estimate camera poses in real time. When a user takes a photo, the camera pose is saved as a 'view bookmark.' To support a user in taking a repeat photo, two simple graphics are rendered in an AR viewer on the camera's screen to guide the user to this bookmarked view. The system then uses image adjustment techniques to create an image based on the user's repeat photo that is even closer to the original.
Publication Details
  • ACM DocEng 2010
  • Sep 21, 2010

Abstract

We present a method for picture detection in document page images, which can come from scanned or camera images, or rendered from electronic file formats. Our method uses OCR to separate out the text and applies the Normalized Cuts algorithm to cluster the non-text pixels into picture regions. A refinement step uses the captions found in the OCR text to deduce how many pictures are in a picture region, thereby correcting for under- and over-segmentation. A performance evaluation scheme is applied which takes into account the detection quality and fragmentation quality. We benchmark our method against the ABBYY application on page images from conference papers.
Publication Details
  • IIiX 2010
  • Aug 18, 2010

Abstract

Exploratory search is a difficult activity that requires iterative interaction. This iterative process helps the searcher to understand and to refine the information need. It also generates a rich set of data that can be used effectively to reflect on what has been found (and found useful). In this paper, we describe a framework for unifying transitions among various stages of exploratory search, and show how context from one stage can be applied to the next. The framework can be used both to describe existing information-seeking interactions, and as a means of generating novel ones. We illustrate the framework with examples from a session-based exploratory search system prototype that we have built.
Publication Details
  • ICME 2010, Singapore, July 19-23 2010
  • Jul 19, 2010

Abstract

Virtual, mobile, and mixed reality systems have diverse uses for data visualization and remote collaboration in industrial settings, especially factories. We report our experiences in designing complex mixed-reality collaboration, control, and display systems for a real-world factory, for delivering real-time factory information to multiple users. In collaboration with (blank for review), a chocolate maker in San Francisco, our research group is building a virtual “mirror” world of a real-world chocolate factory and its processes. Real-world sensor data (such as temperature and machine state) is imported into the 3D environment from hundreds of sensors on the factory floor. Multi-camera imagery from the factory is also available in the multi-user 3D factory environment. The resulting "virtual factory" is designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes. We are also looking at appropriate industrial uses for mobile devices such as cell phones and tablet computers, and how they intersect with virtual worlds and mixed realities. For example, an experimental iPhone web app provides mobile laboratory monitoring and control. The app allows a real-time view into the lab via steerable camera and remote control of lab machines. The mobile system is integrated with the database underlying the virtual factory world. These systems were deployed at the real-world factory and lab in 2009, and are now in beta development. Through this mashup of mobile, social, mixed and virtual technologies, we hope to create industrial systems for enhanced collaboration between physically remote people and places – for example, factories in China with managers in Japan or the US.
Publication Details
  • ACM SIGACT News, Vol 41, No. 3, 2010
  • Jul 12, 2010

Abstract

Over the years I have enjoyed Mermin's colorful, idiosyncratic, and insightful papers. His interest in the foundations of quantum mechanics has led him to discover alternative explanations for various quantum mechanical puzzles and protocols. These explanations are often superior to previous explanations in both simplicity and insight, and even when they are not outright better, they provide a valuable alternative point of view. His book is filled with such explanations, and with strong, sometimes controversial, opinions on the right way of seeing something, which make his book both valuable and entertaining.
Publication Details
  • JCDL 2010
  • Jun 21, 2010

Abstract

Photo libraries are growing in quantity and size, requiring better support for locating desired photographs. MediaGLOW is an interactive visual workspace designed to address this concern. It uses attributes such as visual appearance, GPS locations, user-assigned tags, and dates to filter and group photos. An automatic layout algorithm positions photos with similar attributes near each other to support users in serendipitously finding multiple relevant photos. In addition, the system can explicitly select photos similar to specified photos. We conducted a user evaluation to determine the benefit provided by similarity layout and the relative advantages offered by the different layout similarity criteria and attribute filters. Study participants had to locate photos matching probe statements. In some tasks, participants were restricted to a single layout similarity criterion and filter option. Participants used multiple attributes to filter photos. Layout by similarity without additional filters turned out to be one of the most used strategies and was especially beneficial for geographical similarity. Lastly, the relative appropriateness of the single similarity criterion to the probe significantly affected retrieval performance.
Publication Details
  • SIAM MI'09 monograph. Related talks: SIAM GPM'09, SIAM MI'09, and BAMA (Bay Area Mathematical Adventures)
  • May 1, 2010

Abstract

Creating virtual models of real spaces and objects is cumbersome and time consuming. This paper focuses on the problem of geometric reconstruction from sparse data obtained from certain image-based modeling approaches. A number of elegant and simple-to-state problems arise concerning when the geometry can be reconstructed. We describe results and counterexamples, and list open problems.

Making sense of Twitter Search

Publication Details
  • In Proc. CHI2010 Workshop on Microblogging: What and How Can We Learn From It? April 11, 2010
  • Apr 11, 2010

Abstract

Twitter provides a search interface to its data, along the lines of traditional search engines. But the single ranked list is a poor way to represent the richly-structured Twitter data. A more structured approach that recognizes original messages, re-tweets, people, and documents as interesting constructs is more appropriate for this kind of data. In this paper, we describe a prototype for exploring search results delivered by Twitter. The design is based on our own experience with using Twitter search, as well as on the results of a small online questionnaire.
Publication Details
  • In Proc. CHI 2010
  • Apr 10, 2010

Abstract

The use of whiteboards is pervasive across a wide range of work domains. But some of the qualities that make them successful—an intuitive interface, physical working space, and easy erasure—inherently make them poor tools for archival and reuse. If whiteboard content could be made available in times and spaces beyond those supported by the whiteboard alone, how might it be appropriated? We explore this question via ReBoard, a system that automatically captures whiteboard images and makes them accessible through a novel set of user-centered access tools. Through the lens of a seven week workplace field study, we found that by enabling new workflows, ReBoard increased the value of whiteboard content for collaboration.
Publication Details
  • In Proc. CHI 2010
  • Apr 10, 2010

Abstract

The modern workplace is inherently collaborative, and this collaboration relies on effective communication among coworkers. Many communication tools – email, blogs, wikis, Twitter, etc. – have become increasingly available and accepted in workplace communications. In this paper, we report on a study of communications technologies used over a one-year period in a small US corporation. We found that participants used a large number of communication tools for different purposes, and that the introduction of new tools did not significantly affect the use of previously-adopted technologies. Further, we identified distinct classes of users based on patterns of tool use. This work has implications for the design of technology in the evolving ecology of communication tools.
Publication Details
  • In Proc. of CHI 2010
  • Apr 10, 2010

Abstract

PACER is a gesture-based interactive paper system that supports fine-grained paper document content manipulation through the touch screen of a cameraphone. Using the phone's camera, PACER links a paper document to its digital version based on visual features. It adopts camera-based phone motion detection for embodied gestures (e.g. marquees, underlines and lassos), with which users can flexibly select and interact with document details (e.g. individual words, symbols and pixels). The touch input is incorporated to facilitate target selection at fine granularity, and to address some limitations of the embodied interaction, such as hand jitter and low input sampling rate. This hybrid interaction is coupled with other techniques such as semi-real-time document tracking and loose physical-digital document registration, offering a gesture-based command system. We demonstrate the use of PACER in various scenarios including work-related reading, maps and music score playing. A preliminary user study on the design has produced encouraging user feedback, and suggested future research for better understanding of embodied vs. touch interaction and one- vs. two-handed interaction.
Publication Details
  • Symposium on Eye Tracking Research and Applications 2010
  • Mar 22, 2010

Abstract

In certain applications such as radiology and imagery analysis, it is important to minimize errors. In this paper we evaluate a structured inspection method that uses eye tracking information as a feedback mechanism to the image inspector. Our two-phase method starts with a free viewing phase during which gaze data is collected. During the next phase, we either segment the image, mask previously seen areas of the image, or combine the two techniques, and repeat the search. We compare the different methods proposed for the second search phase by evaluating the inspection method using true positive and false negative rates, and subjective workload. Results show that gaze-blocked configurations reduced the subjective workload, and that gaze-blocking without segmentation showed the largest increase in true positive identifications and the largest decrease in false negative identifications of previously unseen objects.
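The "mask previously seen areas" step of the second phase can be sketched as follows. This is an illustrative assumption about how such masking might work, not the method evaluated in the paper: the circular region around each fixation, the `radius` parameter, and the function name are all hypothetical, and the image is represented only by its dimensions.

```python
def mask_seen_regions(width, height, fixations, radius):
    """Return a height x width grid of booleans where True marks pixels
    to hide in the second search phase because gaze already covered them.

    `fixations` is a list of (x, y) pixel positions from the free-viewing
    phase; `radius` is a hypothetical stand-in for the area seen per fixation.
    """
    mask = [[False] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(max(0, fy - radius), min(height, fy + radius + 1)):
            for x in range(max(0, fx - radius), min(width, fx + radius + 1)):
                # Keep only pixels inside the circle around the fixation.
                if (x - fx) ** 2 + (y - fy) ** 2 <= radius ** 2:
                    mask[y][x] = True
    return mask


# Two fixations on an 8x6 image, each covering a radius of one pixel.
mask = mask_seen_regions(8, 6, [(2, 2), (6, 4)], radius=1)
masked_pixels = sum(cell for row in mask for cell in row)
print(masked_pixels)
```

Applying the mask to the image (for example, by blanking the marked pixels) leaves only the unseen areas for the repeated search.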
Publication Details
  • IEEE Virtual Reality 2010 conference
  • Mar 19, 2010

Abstract

This project investigates practical uses of virtual, mobile, and mixed reality systems in industrial settings, in particular control and collaboration applications for factories. In collaboration with TCHO, a chocolate maker start-up in San Francisco, we have built virtual mirror-world representations of a real-world chocolate factory and are importing its data and modeling its processes. The system integrates mobile devices such as cell phones and tablet computers. The resulting "virtual factory" is a cross-reality environment designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes.
Publication Details
  • IEEE Pervasive Computing. 9(2). 46-55.
  • Mar 15, 2010

Abstract

Paper is static but it is also light, flexible, robust, and has high resolution for reading documents in various scenarios. Digital devices will likely never match the flexibility of paper, but come with all of the benefits of computation and networking. Tags provide a simple means of bridging the gap between the two media to get the most out of both. In this paper, we explore the tradeoffs between two different types of tagging technologies – marker-based and content-based – through the lens of four systems we have developed and evaluated at our lab. From our experiences, we extrapolate issues for designers to consider when developing systems that transition between paper and digital content in a variety of different scenarios.

Abstract

Browsing and searching for documents in large, online enterprise document repositories are common activities. While internet search produces satisfying results for most user queries, enterprise search has not been as successful because of differences in document types and user requirements. To support users in finding the information they need in their online enterprise repository, we created DocuBrowse, a faceted document browsing and search system. Search results are presented within the user-created document hierarchy, showing only directories and documents matching selected facets and containing text query terms. In addition to file properties such as date and file size, automatically detected document types, or genres, serve as one of the search facets. Highlighting draws the user’s attention to the most promising directories and documents while thumbnail images and automatically identified keyphrases help select appropriate documents. DocuBrowse utilizes document similarities, browsing histories, and recommender system techniques to suggest additional promising documents for the current facet and content filters.
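The filtering behavior described above, showing only directories and documents that match the selected facets and contain the query terms, can be sketched in a few lines. This is not DocuBrowse code. The document records, field names, and functions are hypothetical, and the genre labels are supplied by hand here, whereas the abstract says they are detected automatically.

```python
from pathlib import PurePosixPath

# Hypothetical repository records; genres are hand-labeled for this sketch.
documents = [
    {"path": "projects/alpha/report.pdf", "genre": "paper",
     "text": "search engine evaluation"},
    {"path": "projects/alpha/talk.ppt", "genre": "slides",
     "text": "search engine overview"},
    {"path": "projects/beta/budget.xls", "genre": "table",
     "text": "quarterly budget"},
]


def filter_documents(docs, genre=None, query_terms=()):
    """Keep documents matching the genre facet and containing every query term."""
    matches = []
    for doc in docs:
        if genre is not None and doc["genre"] != genre:
            continue
        words = doc["text"].lower().split()
        if all(term.lower() in words for term in query_terms):
            matches.append(doc)
    return matches


def visible_directories(matches):
    """Directories to keep in the hierarchy: every ancestor of a matching document."""
    dirs = set()
    for doc in matches:
        for parent in PurePosixPath(doc["path"]).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return sorted(dirs)


matches = filter_documents(documents, genre="slides", query_terms=["search"])
matching_paths = [doc["path"] for doc in matches]
directories = visible_directories(matches)
print(matching_paths, directories)
```

Keeping the ancestors of each match preserves the user-created hierarchy in the result view while hiding branches with no matching documents.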
Publication Details
  • IUI 2010 Best Paper Award
  • Feb 7, 2010

Abstract

Embedded Media Markers, or simply EMMs, are nearly transparent iconic marks printed on paper documents that signify the existence of media associated with that part of the document. EMMs also guide users' camera operations for media retrieval. Users take a picture of an EMM-signified document patch using a cell phone, and the media associated with the EMM-signified document location is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document contents. Retrieval of media associated with an EMM is based on image local features of the captured EMM-signified document patch. This paper describes a technique for semi-automatically placing an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document.

Seamless Document Handling

Publication Details
  • Fuji Xerox Technical Report, No.19, 2010, pp. 57-65.
  • Jan 12, 2010

Abstract

The current trend toward high-performance mobile networks and increasingly sophisticated mobile devices has fostered the growth of mobile workers. In mobile environments, an urgent need exists for handling documents using a mobile phone, especially for browsing documents and viewing Rich Contents created on computers. This paper describes Seamless Document Handling, which is a technology for viewing electronic documents and Rich Contents on the small screen of a mobile phone. To enhance operability and readability, we devised a method of scrolling documents efficiently by applying document image processing technology, and designed a novel user interface with a pan-and-zoom technique. We conducted on-site observations to test usability of the prototype, and gained insights difficult to acquire in a lab that led to improved functions in the prototype.
Publication Details
  • Fuji Xerox Technical Report No. 19, pp. 88-100
  • Jan 1, 2010

Abstract

Browsing and searching for documents in large, online enterprise document repositories is an increasingly common problem. While users are familiar and usually satisfied with Internet search results for information, enterprise search has not been as successful because of differences in data types and user requirements. To support users in finding the information they need from electronic and scanned documents in their online enterprise repository, we created an automatic detector for genres such as papers, slides, tables, and photos. Several of those genres correspond roughly to file name extensions but are identified automatically using features of the document. This genre identifier plays an important role in our faceted document browsing and search system. The system presents documents in a hierarchy as typically found in enterprise document collections. Documents and directories are filtered to show only documents matching selected facets and containing optional query terms and to highlight promising directories. Thumbnail images and automatically identified keyphrases help select desired documents.
2009

Quantum Computing

Publication Details
  • Entry in Wiley's The Handbook of Technology Management
  • Dec 31, 2009

Abstract

Changing the model underlying information and computation from a classical mechanical to a quantum mechanical one yields faster algorithms, novel cryptographic mechanisms, and alternative methods of communication. Quantum algorithms can perform a select set of tasks vastly more efficiently than any classical algorithm, but for many tasks it has been proven that quantum algorithms provide no advantage. The breadth of quantum computing applications is still being explored. Major application areas include security and the many fields that would benefit from efficient quantum simulation. The quantum information processing viewpoint provides insight into classical algorithmic issues as well as a deeper understanding of entanglement and other non-classical aspects of quantum physics.
Publication Details
  • ACM Multimedia 2009 Workshop on Large-Scale Multimedia Retrieval and Mining
  • Oct 23, 2009

Abstract

We describe an efficient and scalable system for automatic image categorization. Our approach seeks to marry scalable “model-free” neighborhood-based annotation with accurate boosting-based per-tag modeling. For accelerated neighborhood-based classification, we use a set of spatial data structures as weak classifiers for an arbitrary number of categories. We employ standard edge and color features and an approximation scheme that scales to large training sets. The weak classifier outputs are combined in a tag-dependent fashion via boosting to improve accuracy. The method performs competitively with standard SVM-based per-tag classification with substantially reduced computational requirements. We present multi-label image annotation experiments using data sets of more than two million photos.