Publications

FXPAL publishes in top scientific conferences and journals.

2011
Publication Details
  • ACM International Conference on Multimedia Retrieval (ICMR) 2011
  • Apr 17, 2011

Abstract

Close
Embedded Media Marker (EMM) identification system allows users to retrieve relevant dynamic media associated with a static paper document via camera phones. The user supplies a query image by capturing an EMM-signified patch of a paper document through a camera phone; the system recognizes the query and in turn retrieves and plays the corresponding media on the phone. Accurate image matching is crucial for positive user experience in this application. To address the challenges posed by large datasets and variations in camera-phone-captured query images, we introduce a novel image matching scheme based on geometrically consistent correspondences. Two matching constraints - "injection" and "approximate global geometric consistency" (AGGC), which are unique in EMM identification, are presented. A hierarchical scheme, combined with two constraining functions, is designed to detect the "injective-AGGC" correspondences between images. A spatial neighborhood search approach is further proposed to address challenging cases with large translational shift. Experimental results on a 100k+ dataset show that our solution achieves high accuracy with low memory and time complexity and outperforms the standard bag-of-words approach.

Quantum Computing: A Gentle Introduction

Publication Details
  • MIT Press
  • Mar 18, 2011

Abstract

Close
The combination of two of the twentieth century's most influential and revolutionary scientific theories, information theory and quantum mechanics, gave rise to a radically new view of computing and information. Quantum information processing explores the implications of using quantum mechanics instead of classical mechanics to model information and its processing. Quantum computing is not about changing the physical substrate on which computation is done from classical to quantum but about changing the notion of computation itself, at the most basic level. The fundamental unit of computation is no longer the bit but the quantum bit or qubit. This comprehensive introduction to the field offers a thorough exposition of quantum computing and the underlying concepts of quantum physics, explaining all the relevant mathematics and offering numerous examples. With its careful development of concepts and thorough explanations, the book makes quantum computing accessible to students and professionals in mathematics, computer science, and engineering. A reader with no prior knowledge of quantum physics (but with sufficient knowledge of linear algebra) will be able to gain a fluent understanding by working through the book. The text covers the basic building blocks of quantum information processing, quantum bits and quantum gates, showing their relationship to the key quantum concepts of quantum measurement, quantum state transformation, and entanglement between quantum subsystems; it treats quantum algorithms, discussing notions of complexity and describing a number of simple algorithms as well as the most significant algorithms to date; and it explores entanglement and robust quantum computation, investigating such topics as quantifying entanglement, decoherence, quantum error correction, and fault tolerance.

Augmented Perception through Mirror Worlds

Publication Details
  • Augmented Human 2011
  • Mar 12, 2011

Abstract

Close
We describe a system that mirrors a public physical space into cyberspace to provide people with augmented awareness of that space. Through views on web pages, portable devices, or on `Magic Window' displays located in the physical space, remote people may `look in' to the space, while people within the space are provided information not apparent through unaided perception. For example, by looking at a mirror display, people can learn how long others have been present, where they have been, etc. People in one part of a building can get a sense of the activities in the rest of the building, who is present in their office, look in to a talk in another room, etc. We describe a prototype for such a system developed in our research lab and office space.

DiG: A task-based approach to product search

Publication Details
  • IUI 2011
  • Feb 13, 2011

Abstract

Close
While there are many commercial systems designed to help people browse and compare products, these interfaces are typically product centric. To help users more efficiently identify products that match their needs, we instead focus on building a task centric interface and system. With this approach, users initially answer questions about the types of situations in which they expect to use the product. The interface reveals the types of products that match their needs and exposes high-level product features related to the kinds of tasks in which they have expressed an interest. As users explore the interface, they can reveal how those high-level features are linked to actual product data, including customer reviews and product specifications. We developed semi-automatic methods to extract the high-level features used by the system from online product data. These methods identify and group product features, mine and summarize opinions about those features, and identify product uses. User studies verified our focus on high-level features for browsing and low-level features and specifications for comparison.  

Privacy-Preserving Aggregation of Time-Series Data

Publication Details
  • NDSS 2011
  • Feb 6, 2011

Abstract

Close
We consider how an untrusted data aggregator can learn desired statistics over multiple participants' data, without compromising each individual's privacy. We propose a construction that allows a group of participants to periodically upload encrypted values to a data aggregator, such that the aggregator is able to compute the sum of all participants' values in every time period, but is unable to learn anything else. We achieve strong privacy guarantees using two main techniques. First, we show how to utilize applied cryptographic techniques to allow the aggregator to decrypt the sum from multiple ciphertexts encrypted under different user keys. Second, we describe a distributed data randomization procedure that guarantees the differential privacy of the outcome statistic, even when a subset of participants might be compromised.
Publication Details
  • IS&T and SPIE International Conference on Multimedia Content Access: Algorithms and Systems
  • Jan 23, 2011

Abstract

Close
This paper describes research activities at FX Palo Alto Laboratory (FXPAL) in the area of multimedia browsing, search, and retrieval. We first consider interfaces for organization and management of personal photo collections. We then survey our work on interactive video search and retrieval. Throughout we discuss the evolution of both the research challenges in these areas and our proposed solutions.
Publication Details
  • Fuji Xerox Technical Report
  • Jan 1, 2011

Abstract

Close
Embedded Media Markers, or simply EMMs, are nearly transparent iconic marks printed on paper documents that signify the existence of media associated with that part of the document. EMMs also guide users' camera operations for media retrieval. Users take a picture of an EMM-signified document patch using a cell phone, and the media associated with the EMM-signified document location is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image local features of the captured EMM-signified document patch. This paper describes a technique for semi-automatically placing an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document.
Publication Details
  • Encyclopledia of the Sciences of Learning
  • Jan 1, 2011

Abstract

Close
Supervised Learning is a machine learning paradigm for acquiring the input-output relationship information of a system based on a given set of paired input-output training samples. As the output is regarded as the label of the input data or the supervision, an input-output training sample is also called labelled training data, or supervised data. Occasionally, it is also referred to as Learning with a Teacher (Haykin 1998), Learning from Labelled Data, or Inductive Machine Learning (Kotsiantis, 2007). The goal of supervised learning is to build an artificial system that can learn the mapping between the input and the output, and can predict the output of the system given new inputs. If the output takes a finite set of discrete values that indicate the class labels of the input, the learned mapping leads to the classification of the input data. If the output takes continuous values, it leads to a regression of the input. The input-output relationship information is frequently represented with learning-model parameters. When these parameters are not directly available from training samples, a learning system needs to go through an estimation process to obtain these parameters. Different form Unsupervised Learning, the training data for Supervised Learning need supervised or labelled information, while the training data for unsupervised learning are unsupervised as they are not labelled (i.e., merely the inputs). If an algorithm uses both supervised and unsupervised training data, it is called a Semi-supervised Learning algorithm. If an algorithm actively queries a user/teacher for labels in the training process, the iterative supervised learning is called Active Learning.
2010
Publication Details
  • ACM International Conference on Multimodal Interfaces
  • Nov 8, 2010

Abstract

Close
Embedded Media Barcode Links, or simply EMBLs, are optimally blended iconic barcode marks, printed on paper documents, that signify the existence of multimedia associated with that part of the document content (Figure 1). EMBLs are used for multimedia retrieval with a camera phone. Users take a picture of an EMBL-signified document patch using a cell phone, and the multimedia associated with the EMBL-signified document location is displayed on the phone. Unlike a traditional barcode which requires an exclusive space, the EMBL construction algorithm acts as an agent to negotiate with a barcode reader for maximum user and document benefits. Because of this negotiation, EMBLs are optimally blended with content and thus have less interference with the original document layout and can be moved closer to a media associated location. Retrieval of media associated with an EMBL is based on the barcode identification of a captured EMBL. Therefore, EMBL retains nearly all barcode identification advantages, such as accuracy, speed, and scalability. Moreover, EMBL takes advantage of users' knowledge of a traditional barcode. Unlike Embedded Media Maker (EMM) which requires underlying document features for marker identification, EMBL has no requirement for the underlying features. This paper will discuss the procedures for EMBL construction and optimization. It will also give experimental results that strongly support the EMBL construction and optimization ideas.
Publication Details
  • Information Processing & Management, 46 (6), pp. 629-631
  • Nov 1, 2010

Abstract

Close
This special issue brings together papers that describe some of the many ways that collaborative information seeking manifests itself. Some papers report on collaborative practices in a range of domains, including medical (Hertzum), legal (Attfield et al.), and online Q&A (Gazan). Others propose and evaluate models of collaborative activity (Evans and Chi; Evans et al.; Wilson and schraefel; Foley and Smeaton), and others describe systems and algorithms that support collaboration in various ways (Boydell and Smyth; Fernandez-Luna et al., Halvey et al., Morris et al.; Shah et al.).

Role-based results redistribution for collaborative information retrieval

Publication Details
  • Information Processing & Management, 46 (6), pp. 773-781
  • Nov 1, 2010

Abstract

Close
We describe a new approach for algorithmic mediation of a collaborative search process. Unlike most approaches to collaborative IR, we are designing systems that mediate explicitly-defined synchronous collaboration among small groups of searchers with a shared information need. Such functionality is provided by first obtaining different rank-lists based on searchers' queries, fusing these rank-lists, and then splitting the combined list to distribute documents among collaborators according to their roles. For the work reported here, we consider the case of two people collaborating on a search. We assign them roles of Gatherer and Surveyor: the Gatherer is tasked with exploring highly promising information on a given topic, and the Surveyor is tasked with digging further to explore more diverse information. We demonstrate how our technique provides the Gatherer with high-precision results, and the Surveyor with information that is high in entropy.

Reverted Indexing for Feedback and Expansion

Publication Details
  • ACM Conference on Information and Knowledge Management (CIKM 2010)
  • Oct 26, 2010

Abstract

Close
Traditional interactive information retrieval systems function by creating inverted lists, or term indexes. For every term in the vocabulary, a list is created that contains the documents in which that term occurs and its relative frequency within each document. Retrieval algorithms then use these term frequencies alongside other collection statistics to identify the matching documents for a query. In this paper, we turn the process around: instead of indexing documents, we index query result sets. First, queries are run through a chosen retrieval system. For each query, the resulting document IDs are treated as terms and the score or rank of the document is used as the frequency statistic. An index of documents retrieved by basis queries is created. We call this index a reverted index. Finally, with reverted indexes, standard retrieval algorithms can retrieve the matching queries (as results) for a set of documents (used as queries). These recovered queries can then be used to identify additional documents, or to aid the user in query formulation, selection, and feedback.

TalkMiner: A Search Engine for Online Lecture Video

Publication Details
  • ACM Multimedia 2010 - Industrial Exhibits
  • Oct 25, 2010

Abstract

Close
TalkMiner is a search engine for lecture webcasts. Lecture videos are processed to recover a set of distinct slide images and OCR is used to generate a list of indexable terms from the slides. On our prototype system, users can search and browse lists of lectures, slides in a specific lecture, and play the lecture video. Over 10,000 lecture videos have been indexed from a variety of sources. A public website now allows users to experiment with the search engine.
Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

Close
NudgeCam is a mobile application that can help users capture more relevant, higher quality media. To guide users to capture media more relevant to a particular project, third-party template creators can show users media that demonstrates relevant content and can tell users what content should be present in each captured media using tags and other meta-data such as location and camera orientation. To encourage higher quality media capture, NudgeCam provides real time feedback based on standard media capture heuristics, including face positioning, pan speed, audio quality, and many others. We describe an implementation of NudgeCam on the Android platform as well as fi eld deployments of the application.

The Virtual Chocolate Factory:Mixed Reality Industrial Collaboration and Control

Publication Details
  • ACM Multimedia 2010 - Industrial Exhibits
  • Oct 25, 2010

Abstract

Close
We will exhibit several aspects of a complex mixed reality system that we have built and deployed in a real-world factory setting. In our system, virtual worlds, augmented realities, and mobile applications are all fed from the same infrastructure. In collaboration with TCHO, a chocolate maker in San Francisco, we built a virtual “mirror” world of a real-world chocolate factory and its processes. Sensor data is imported into the multi-user 3D environment from hundreds of sensors on the factory floor. The resulting virtual factory is used for simulation, visualization, and collaboration, using a set of interlinked, real-time layers of information. Another part of our infrastructure is designed to support appropriate industrial uses for mobile devices such as cell phones and tablet computers. We deployed this system at the real-world factory in 2009, and it is now is daily use there. By simultaneously developing mobile, virtual, and web-based display and collaboration environments, we aimed to create an infrastructure that did not skew toward one type of application but that could serve many at once, interchangeably. Through this mixture of mobile, social, mixed and virtual technologies, we hope to create systems for enhanced collaboration in industrial settings between physically remote people and places, such as factories in China with managers in the US.

TalkMiner: A Lecture Webcast Search Engine

Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

Close
The design and implementation of a search engine for lecture webcasts is described. A searchable text index is created allowing users to locate material within lecture videos found on a variety of websites such as YouTube and Berkeley webcasts. The index is created from words on the presentation slides appearing in the video along with any associated metadata such as the title and abstract when available. The video is analyzed to identify a set of distinct slide images, to which OCR and lexical processes are applied which in turn generate a list of indexable terms. Several problems were discovered when trying to identify distinct slides in the video stream. For example, picture-in-picture compositing of a speaker and a presentation slide, switching cameras, and slide builds confuse basic frame-differencing algorithms for extracting keyframe slide images. Algorithms are described that improve slide identification. A prototype system was built to test the algorithms and the utility of the search engine. Users can browse lists of lectures, slides in a specific lecture, or play the lecture video. Over 10,000 lecture videos have been indexed from a variety of sources. A public website will be published in mid 2010 that allows users to experiment with the search engine.
Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

Close
An Embedded Media Marker (EMM) is a transparent mark printed on a paper document that signifies the availability of additional media associated with that part of the document. Users take a picture of the EMM using a camera phone, and the media associated with that part of the document is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image features of the document within the EMM boundary. Unlike other feature-based retrieval methods, the EMM clearly indicates to the user the existence and type of media associated with the document location. A semi-automatic authoring tool is used to place an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document. We will demonstrate how to create an EMM-enhanced document, and how the EMM enables access to the associated media on a cell phone.
Publication Details
  • ACM Multimedia
  • Oct 25, 2010

Abstract

Close
FACT is an interactive paper system for fine-grained interaction with documents across the boundary between paper and computers. It consists of a small camera-projector unit, a laptop, and ordinary paper documents. With the camera-projector unit pointing to a paper document, the system allows a user to issue pen gestures on the paper document for selecting fine-grained content and applying various digital functions. For example, the user can choose individual words, symbols, figures, and arbitrary regions for keyword search, copy and paste, web search, and remote sharing. FACT thus enables a computer-like user experience on paper. This paper interaction can be integrated with laptop interaction for cross-media manipulations on multiple documents and views. We present the infrastructure, supporting techniques and interaction design, and demonstrate the feasibility via a quantitative experiment. We also propose applications such as document manipulation, map navigation and remote collaboration.
Publication Details
  • NPUC2010
  • Oct 22, 2010

Abstract

Close
The massive amounts of information that are being collected about each of us will only increase as sensors become ever cheaper and more powerful. Analysis of this wealth of data supports advances in medicine and public health, improved software and services through user pattern analysis, and more efficient economic mechanisms. At the same time, the potential for misuse of such data is significant. A long-term research question is how best to support beneficial uses while inhibiting misuse. One approach is to enable individuals to maintain tighter control of their own data while still supporting the computation of group statistics. Currently, analysts are usually given access to all data in order to compute statistics, and often use a third party service provider to store, or even process, such data. Either the third party has access to all data or the data are encrypted, in which case the third party does no processing. An interesting research question is how to provide mechanisms to support "need to know" security in which an individual has full access to her own data, the third party learns nothing about the data but can nevertheless contribute to the processing, and the analyst learns only the desired statistics. We have explored "need to know" security in connection with MyUnity, a prototype awareness system. MyUnity collects data from a variety of sources and displays summary presence states, such as ``in office'' or ``with visitor,'' computed from the received data. MyUnity was deployed in a small research lab and has been in use by over 30 people for more than a year. To avoid concerns about misuse, the system did not store any data initially. The researchers developing the system were interested, however, in analyzing usage patterns, and users expressed interest in seeing personal trends, activity patterns of coworkers, and long-term data pooled across groups of users, all requiring data to be stored. At the same time, users continued to be concerned about misuse of stored data. We looked at ``need to know'' security for cases in which, at each time step, each member of a group of users has a value (i.e., a presence state) to contribute, and the group would like to provide only an aggregate view of those values to people outside their group. We designed and implemented an efficient protocol that enables each user to encrypt under her own key in such a way that a third party can compute an encryption of a sum across values encrypted under different keys without the need for further interactions with the individuals. The protocol provides means for an analyst to decrypt the encrypted sum. We designed key structures and extensions to provide a family of efficient non-interactive ``need to know'' protocols for time series data in which the analyst learns only the statistics, not the individual data values, and the third party learns nothing about the values.

Camera Pose Navigation using Augmented Reality

Publication Details
  • ISMAR 2010
  • Oct 13, 2010

Abstract

Close
We propose an Augmented Reality (AR) system that helps users take a picture from a designated pose, such as the position and camera angle of an earlier photo. Repeat photography is frequently used to observe and document changes in an object. Our system uses AR technology to estimate camera poses in real time. When a user takes a photo, the camera pose is saved as a 'view bookmark.' To support a user in taking a repeat photo, two simple graphics are rendered in an AR viewer on the camera's screen to guide the user to this bookmarked view. The system then uses image adjustment techniques to create an image based on the user's repeat photo that is even closer to the original.
Publication Details
  • ACM DocEng 2010
  • Sep 21, 2010

Abstract

Close
We present a method for picture detection in document page images, which can come from scanned or camera images, or rendered from electronic file formats. Our method uses OCR to separate out the text and applies the Normalized Cuts algorithm to cluster the non-text pixels into picture regions. A refinement step uses the captions found in the OCR text to deduce how many pictures are in a picture region, thereby correcting for under- and over-segmentation. A performance evaluation scheme is applied which takes into account the detection quality and fragmentation quality. We benchmark our method against the ABBYY application on page images from conference papers.
Publication Details
  • IIiX 2010
  • Aug 18, 2010

Abstract

Close
Exploratory search is a difficult activity that requires iterative interaction. This iterative process helps the searcher to understand and to refine the information need. It also generates a rich set of data that can be used effectively to reflect on what has been found (and found useful). In this paper, we describe a framework for unifying transitions among various stages of exploratory search, and show how context from one stage can be applied to the next. The framework can be used both to describe existing information-seeking interactions, and as a means of generating novel ones. We illustrate the framework with examples from a session-based exploratory search system prototype that we have built.
Publication Details
  • ICME 2010, Singapore, July 19-23 2010
  • Jul 19, 2010

Abstract

Close
Virtual, mobile, and mixed reality systems have diverse uses for data visualization and remote collaboration in industrial settings, especially factories. We report our experiences in designing complex mixed-reality collaboration, control, and display systems for a real-world factory, for delivering real-time factory information to multiple users. In collaboration with (blank for review), a chocolate maker in San Francisco, our research group is building a virtual “mirror” world of a real-world chocolate factory and its processes. Real-world sensor data (such as temperature and machine state) is imported into the 3D environment from hundreds of sensors on the factory floor. Multi-camera imagery from the factory is also available in the multi-user 3D factory environment. The resulting "virtual factory" is designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes. We are also looking at appropriate industrial uses for mobile devices such as cell phones and tablet computers, and how they intersect with virtual worlds and mixed realities. For example, an experimental iPhone web app provides mobile laboratory monitoring and control. The app allows a real-time view into the lab via steerable camera and remote control of lab machines. The mobile system is integrated with the database underlying the virtual factory world. These systems were deployed at the real-world factory and lab in 2009, and are now in beta development. Through this mashup of mobile, social, mixed and virtual technologies, we hope to create industrial systems for enhanced collaboration between physically remote people and places – for example, factories in China with managers in Japan or the US.
Publication Details
  • ACM SIGACT News, Vol 41, No. 3, 2010
  • Jul 12, 2010

Abstract

Close
Over the years I have enjoyed Mermin's colorful, idiosyncratic, and insightful papers. His interest in the foundations of quantum mechanics has led him to discover alternative explanations for various quantum mechanical puzzles and protocols. These explanations are often superior to previous explanations in both simplicity and insight, and even when they are not outright better, they provide a valuable alternative point of view. His book is filled with such explanations, and with strong, sometimes controversial, opinions on the right way of seeing something, which make his book both valuable and entertaining.