Publications

FXPAL publishes in top scientific conferences and journals.

2013
Publication Details
  • CBDAR 2013
  • Aug 23, 2013

Abstract

Capturing book images is more convenient with a mobile phone camera than with more specialized flat-bed scanners or 3D capture devices. We built an application for the iPhone 4S that captures a sequence of hi-res (8 MP) images of a page spread as the user sweeps the device across the book. To do the 3D dewarping, we implemented two algorithms: optical flow (OF) and structure from motion (SfM). Making further use of the image sequence, we examined the potential of multi-frame OCR. Preliminary evaluation on a small set of data shows that OF and SfM had comparable OCR performance for both single-frame and multi-frame techniques, and that multi-frame was substantially better than single-frame. The computation time was much less for OF than for SfM.

SearchPanel: A browser extension for managing search activity

Publication Details
  • EuroHCIR 2013
  • Aug 1, 2013

Abstract

People often use more than one query when searching for information; they also revisit search results to re-find information. These tasks are not well-supported by search interfaces and web browsers. We designed and built a Chrome browser extension that helps people manage their ongoing information seeking. The extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, for navigation, and for re-finding documents.

Looking Ahead: Query Preview in Exploratory Search

Publication Details
  • SIGIR 2013
  • Jul 28, 2013

Abstract

Exploratory search is a complex, iterative information seeking activity that involves running multiple queries and finding and examining many documents. We introduced a query preview interface that visualizes the distribution of newly-retrieved and re-retrieved documents prior to showing the detailed query results. When evaluating the preview control against a control condition, we found effects on people’s information seeking behavior as well as improved retrieval performance. When using the preview, people spent more time formulating a query, were more likely to explore search results more deeply, retrieved a more diverse set of documents, and found more distinct relevant documents. With more time spent on query formulation, higher quality queries were produced and, as a consequence, the retrieval results improved; both average residual precision and recall were higher with the query preview present.
Publication Details
  • The International Symposium on Pervasive Displays
  • Jun 4, 2013

Abstract

Existing user interfaces for the configuration of large shared displays with multiple inputs and outputs usually do not allow users easy and direct configuration of the display's properties such as window arrangement or scaling. To address this problem, we are exploring a gesture-based technique for manipulating display windows on shared display systems. To aid target selection under noisy tracking conditions, we propose VoroPoint, a modified Voronoi tessellation approach that increases the selectable target area of the display windows. By maximizing the available target area, users can select and interact with display windows with greater ease and precision.
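
The selection idea can be illustrated with a minimal Python sketch. It shows only plain Voronoi assignment, in which a noisy pointer position selects the window whose center is nearest; the modifications that distinguish VoroPoint are not described in this abstract and are not reproduced here. The function name `select_window`, the window ids, and the coordinates are all hypothetical.

```python
import math


def select_window(pointer, window_centers):
    """Return the id of the window whose center is nearest to the pointer.

    Choosing the nearest center is the same as asking which Voronoi cell
    contains the pointer, so every point on the display selects some window,
    even when the tracked position falls outside the window's own rectangle.
    This is plain Voronoi assignment, an assumption for illustration only.
    """
    return min(
        window_centers,
        key=lambda window_id: math.dist(pointer, window_centers[window_id]),
    )


# Hypothetical layout: two windows identified by their center coordinates.
centers = {"A": (100, 100), "B": (500, 100)}
chosen = select_window((250, 300), centers)
```

With this layout, a pointer at (250, 300) lies outside both windows but still resolves to window "A", because its center is the closer of the two.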

Private Aggregation for Presence Streams

Publication Details
  • Future Generation Computer Systems
  • May 28, 2013

Abstract

Collaboration technologies must support information sharing between collaborators, but must also take care not to share too much information or share information too widely. Systems that share information without requiring an explicit action by a user to initiate the sharing must be particularly cautious in this respect. Presence systems are an emerging class of applications that support collaboration. Through the use of pervasive sensors, these systems estimate user location, activities, and available communication channels. Because such presence data are sensitive, to achieve wide-spread adoption, sharing models must reflect the privacy and sharing preferences of their users. This paper looks at the role that privacy-preserving aggregation can play in addressing certain user sharing and privacy concerns with respect to presence data. We define conditions to achieve CollaPSE (Collaboration Presence Sharing Encryption) security, in which (i) an individual has full access to her own data, (ii) a third party performs computation on the data without learning anything about the data values, and (iii) people with special privileges called “analysts” can learn statistical information about groups of individuals, but nothing about the individual values contributing to the statistic other than what can be deduced from the statistic. More specifically, analysts can decrypt aggregates without being able to decrypt the individual values contributing to the aggregate. Based in part on studies we carried out that illustrate the need for the conditions encapsulated by CollaPSE security, we designed and implemented a family of CollaPSE protocols. We analyze their security, discuss efficiency tradeoffs, describe extensions, and review more recent privacy-preserving aggregation work.
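
The three conditions above can be illustrated with a toy additive-masking scheme in Python. This is a sketch under stated assumptions, not one of the CollaPSE protocols: it assumes a trusted setup that hands each user a one-time key and gives the analyst only the sum of those keys, and every name in it (`make_keys`, `encrypt`, `aggregate`, `analyst_decrypt`, `MODULUS`) is hypothetical.

```python
import secrets

MODULUS = 2**32  # all arithmetic is done modulo this value (assumed bound)


def make_keys(num_users):
    """Return (per-user one-time keys, analyst key = sum of user keys)."""
    user_keys = [secrets.randbelow(MODULUS) for _ in range(num_users)]
    analyst_key = sum(user_keys) % MODULUS
    return user_keys, analyst_key


def encrypt(value, user_key):
    """Mask a single value with the user's key."""
    return (value + user_key) % MODULUS


def decrypt_own(ciphertext, user_key):
    """Condition (i): a user can recover her own value with her own key."""
    return (ciphertext - user_key) % MODULUS


def aggregate(ciphertexts):
    """Condition (ii): the third party adds ciphertexts without any key."""
    return sum(ciphertexts) % MODULUS


def analyst_decrypt(encrypted_sum, analyst_key):
    """Condition (iii): the analyst unmasks only the aggregate."""
    return (encrypted_sum - analyst_key) % MODULUS


values = [3, 5, 10]
user_keys, analyst_key = make_keys(len(values))
ciphertexts = [encrypt(v, k) for v, k in zip(values, user_keys)]
total = analyst_decrypt(aggregate(ciphertexts), analyst_key)
```

Because the analyst holds only the sum of the keys, subtracting it from a single ciphertext does not reveal that user's value when two or more users contribute. Each key must be used once; reusing keys across rounds would leak differences between a user's values.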

Leading People to Longer Queries

Publication Details
  • CHI 2013
  • Apr 27, 2013

Abstract

Although longer queries can produce better results for information seeking tasks, people tend to type short queries. We created an interface designed to encourage people to type longer queries, and evaluated it in two Mechanical Turk experiments. Results suggest that our interface manipulation may be effective for eliciting longer queries.
Publication Details
  • IUI 2013
  • Mar 19, 2013

Abstract

People frequently capture photos with their smartphones, and some are starting to capture images of documents. However, the quality of captured document images is often lower than expected, even when applications that perform post-processing to improve the image are used. To improve the quality of captured images before post-processing, we developed a Smart Document Capture (SmartDCap) application that provides real-time feedback to users about the likely quality of a captured image. The quality measures capture the sharpness and framing of a page or regions on a page, such as a set of one or more columns, a part of a column, a figure, or a table. Using our approach, while users adjust the camera position, the application automatically determines when to take a picture of a document to produce a good quality result. We performed a subjective evaluation comparing SmartDCap and the Android Ice Cream Sandwich (ICS) camera application; we also used raters to evaluate the quality of the captured images. Our results indicate that users find SmartDCap to be as easy to use as the standard ICS camera application. Additionally, images captured using SmartDCap are sharper and better framed on average than images using the ICS camera application.

Abstract

Motivated by the addition of gyroscopes to a large number of new smart phones, we study the effects of combining accelerometer and gyroscope data on the recognition rate of motion gesture recognizers with dimensionality constraints. Using a large data set of motion gestures we analyze results for the following algorithms: Protractor3D, Dynamic Time Warping (DTW) and Regularized Logistic Regression (LR). We chose to study these algorithms because they are relatively easy to implement, thus well suited for rapid prototyping or early deployment during prototyping stages. For use in our analysis, we contribute a method to extend Protractor3D to work with the 6D data obtained by combining accelerometer and gyroscope data. Our results show that combining accelerometer and gyroscope data is beneficial also for algorithms with dimensionality constraints and improves the gesture recognition rate on our data set by up to 4%.
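
As a minimal sketch of one of the algorithms named above, the following Python code computes a textbook Dynamic Time Warping distance over 6D samples formed by concatenating accelerometer and gyroscope readings, and labels a gesture by its nearest template. It is not the evaluated implementation: the function names, the Euclidean per-sample cost, and the nearest-template classifier are assumptions made for illustration.

```python
import math


def combine(accel, gyro):
    """Concatenate per-sample 3D accelerometer and 3D gyroscope readings."""
    return [tuple(a) + tuple(g) for a, g in zip(accel, gyro)]


def dtw_distance(seq_a, seq_b):
    """Textbook DTW distance between two sequences of equal-dimension samples.

    cost[i][j] holds the cheapest alignment of the first i samples of seq_a
    with the first j samples of seq_b; runs in O(len(seq_a) * len(seq_b)).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # seq_a advances alone
                cost[i][j - 1],      # seq_b advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def classify(gesture, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(gesture, templates[label]))
```

DTW tolerates differences in gesture speed: a recording that repeats some samples of a template still aligns to it at zero cost.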

Real-time Direct Manipulation of Screen-based Videos

Publication Details
  • IUI 2013
  • Mar 19, 2013

Abstract

We describe direct video manipulation interactions applied to screen-based tutorials. In addition to using the video timeline, users of our system can quickly navigate into the video by mouse-wheel, double click over a rectangular region to zoom in and out, or drag a box over the video canvas to select text and scrub the video until the end of a text line even if not shown in the current frame. We describe the video processing techniques developed to implement these direct video manipulation techniques, and show how they are implemented to run in most modern web browsers using HTML5's CANVAS and JavaScript.
Publication Details
  • SPIE Electronic Imaging 2013
  • Feb 3, 2013

Abstract

Video is becoming a prevalent medium for e-learning. Lecture videos contain useful information in both the visual and aural channels: the presentation slides and lecturer's speech respectively. To extract the visual information, we apply video content analysis to detect slides and optical character recognition (OCR) to obtain their text. Automatic speech recognition (ASR) is used similarly to extract spoken text from the recorded audio. These two text sources have distinct characteristics and relative strengths for video retrieval. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Experiments reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Additional experiments demonstrate higher precision video retrieval using automatically extracted slide text.  
2012

Mirror Worlds for Indoor Navigation and Awareness

Publication Details
  • IPIN2012
  • Nov 13, 2012

Abstract

We describe Explorer, a system utilizing mirror worlds - dynamic 3D virtual models of physical spaces that reflect the structure and activities of those spaces to help support navigation, context awareness and tasks such as planning and recollection of events. A rich sensor network dynamically updates the models, determining the position of people, status of rooms, or updating textures to reflect displays or bulletin boards. Through views on web pages, portable devices, or on 'magic window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided with augmented views showing information not physically apparent. For example, by looking at a mirror display, people can learn how long others have been present, or where they have been. People in one part of a building can get a sense of activities in the rest of the building, know who is present in their office, and look in to presentations in other rooms. A spatial graph is derived from the 3D models and is used both for navigational paths and for fusion of the acoustic, WiFi, motion and image sensors used for positioning. We describe usage scenarios for the system as deployed in two research labs and a conference venue.
Publication Details
  • IPIN2012
  • Nov 13, 2012

Abstract

Audio-based receiver localization in indoor environments has multiple applications including indoor navigation, location tagging, and tracking. Public places like shopping malls and consumer stores often have loudspeakers installed to play music for public entertainment. Similarly, office spaces may have sound conditioning speakers installed to soften other environmental noises. We discuss an approach to leverage this infrastructure to perform audio-based localization of devices requesting localization in such environments, by playing barely audible controlled sounds from multiple speakers at known positions. Our approach can be used to localize devices such as smart-phones, tablets and laptops to sub-meter accuracy. The user does not need to carry any specialized hardware. Unlike acoustic approaches which use high-energy ultrasound waves, the use of barely audible (low energy) signals in our approach poses very different challenges. We discuss these challenges, how we addressed them, and experimental results on two prototypical implementations: a request-play-record localizer, and a continuous tracker. We evaluated our approach in a real world meeting room and report promising initial results with localization accuracy within half a meter 94% of the time. The system has been deployed in multiple zones of our office building and is now part of a location service in constant operation in our lab.
Publication Details
  • ICPR 2012
  • Nov 11, 2012

Abstract

Images of document pages have different characteristics than images of natural scenes, and so the sharpness measures developed for natural scene images do not necessarily extend to document images primarily composed of text. We present an efficient and simple method for effectively estimating the sharpness/blurriness of document images that also performs well on natural scenes. Our method can be used to predict the sharpness in scenarios where images are blurred due to camera-motion (or hand-shake), defocus, or inherent properties of the imaging system. The proposed method outperforms the perceptually-based, no-reference sharpness work of [1] and [4], which was shown to perform better than 14 other no-reference sharpness measures on the LIVE dataset.
Publication Details
  • ACM Multimedia 2012
  • Oct 29, 2012

Abstract

Paper and computers have complementary advantages and are used side by side in many scenarios. Interactive paper systems aim to combine the two media. However, most such systems only allow fingers and pens to interact with content on paper. This finger-pen-only input suffers from low precision, lag, instability and occlusion. Moreover, it incurs frequent device switching (e.g. pen vs. mouse) in users' hands during cross-media interactions, yielding inefficiency and interruptions of a document workspace continuum. To address these limitations, we propose MixPad, a novel interactive paper system which incorporates mice and keyboards to enhance the conventional pen-finger-based paper interaction. Similar to many other systems, MixPad adopts a mobile camera-projector unit to recognize paper documents, detect pen and finger gestures and provide visual feedback. Unlike these systems, MixPad lets users employ mice and keyboards to select fine-grained content and create annotations on paper, and facilitates bimanual operations for more efficient and smoother cross-media interaction. This novel interaction style combines the advantages of mice, keyboards, pens and fingers, enabling richer digital functions on paper.
Publication Details
  • ACM Multimedia 2012
  • Oct 29, 2012

Abstract

Faithful sharing of screen contents is an important collaboration feature. Prior systems were designed to operate over constrained networks. They performed poorly even without such bottlenecks. To build a high performance screen sharing system, we empirically analyzed screen contents for a variety of scenarios. We showed that screen updates were sporadic with long periods of inactivity. When active, screens were updated at far higher rates than was supported by earlier systems. The mismatch was pronounced for interactive scenarios. Even during active screen updates, the number of updated pixels was frequently small. We showed that crucial information can be lost if individual updates are merged. When the available system resources could not support high capture rates, we showed ways in which updates can be effectively collapsed. We showed that Zlib lossless compression performed poorly for screen updates. By analyzing the screen pixels, we developed a practical transformation that significantly improved compression rates. Our system captured 240 updates per second while only using 4.6 Mbps for interactive scenarios. Still, while playing movies in fullscreen mode, our approach could not achieve higher capture rates than prior systems; the CPU remains the bottleneck. A system that incorporates our findings is deployed within the lab.
Publication Details
  • ACM Multimedia '12
  • Oct 29, 2012

Abstract

DisplayCast is a many-to-many screen sharing system that is targeted towards Intranet scenarios. The capture software runs on all computers whose screens need to be shared. It uses an application agnostic screen capture mechanism that creates a sequence of pixmap images of the screen updates. It transforms these pixmaps to vastly improve the lossless Zlib compression performance. These algorithms were developed after an extensive analysis of typical screen contents. DisplayCast shares the processor and network resources required for screen capture, compression and transmission with host applications whose output needs to be shared. It balances the need for high performance screen capture with reducing its resource interference with user applications. DisplayCast uses Zeroconf for naming and asynchronous location. It provides support for Cisco WiFi and Bluetooth based localization. It also includes an HTTP/REST based controller for remote session initiation and control. DisplayCast supports screen capture and playback in computers running Windows 7 and Mac OS X operating systems. Remote screens can be archived into an H.264 encoded movie on a Mac. They can also be played back in real time on Apple iPhones and iPads. The software is released under a New BSD license.
Publication Details
  • CIKM 2012 Books Online Workshop Keynote Address
  • Oct 29, 2012

Abstract

Reading is part of how we understand the world, how we share knowledge, how we play, and even how we think. Although reading text is the dominant form of reading, most of the text we read (letters, numbers, words, and sentences) is surrounded by illustrations, photographs, and other kinds of symbols that we include as we read. As dynamic displays migrate into the real world at many scales, whether personal devices, handhelds, or large screens in both interior and exterior spaces, opportunities for reading migrate as well. As has happened continually throughout the history of reading, new technologies, physical forms and social patterns create new genres, which themselves may then combine or collide to morph into something new. At PARC, the RED (Research in Experimental Design) group examined emerging technologies for impact on media and the human relationship to information, especially reading. We explored new ways of experiencing text: new genres, new styles of interaction, and unusual media. Among the questions we considered: how might “the book” change? More particularly, how does the experience of reading change with the introduction of new technologies…and how does it remain the same? In this talk, we'll discuss the ideas behind the design and research process that led to creating eleven different experiences of new forms of reading. We’ll also consider how our technological context for reading has changed in recent years, and what influence the lessons from XFR may have on our ever-developing online reading experiences.

Through the Looking-Glass: Mirror Worlds for Augmented Awareness & Capability

Publication Details
  • ACM MM 2012
  • Oct 29, 2012

Abstract

We describe a system for supporting mirror worlds - 3D virtual models of physical spaces that reflect the structure and activities of those spaces to help support context awareness and tasks such as planning and recollection of events. Through views on web pages, portable devices, or on 'magic window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided information not apparent through unaided perception. For example, by looking at a mirror display, people can learn how long others have been present, or where they have been. People in one part of a building can get a sense of activities in the rest of the building, know who is present in their office, and look in to presentations in other rooms. The system can be used to bridge across sites and help provide different parts of an organization with a shared awareness of each other's space and activities. We describe deployments of our mirror world system at several locations.
Publication Details
  • Mobile HCI 2012 demo track
  • Sep 21, 2012

Abstract

In this demonstration we will show a mobile remote control and monitoring application for a recipe development laboratory at a local chocolate production company. In collaboration with TCHO, a chocolate maker in San Francisco, we built a mobile Web app designed to allow chocolate makers to control their laboratory's machines. Sensor data is imported into the app from each machine in the lab. The mobile Web app is used for control, monitoring, and collaboration. We have tested and deployed this system at the real-world factory and it is now in daily use. This project is designed as part of a research exploration into enhanced collaboration in industrial settings between physically remote people and places, e.g. factories in China with clients in the US.
Publication Details
  • Workshop on Social Mobile Video and Panoramic Video
  • Sep 20, 2012

Abstract

The ways in which we come to know and share what we know with others are deeply entwined with the technologies that enable us to capture and share information. As face-to-face communication has been supplemented with ever-richer media––textual books, illustrations and photographs, audio, film and video, and more––the possibilities for knowledge transfer have only expanded. One of the latest trends to emerge amidst the growth of Internet sharing and pervasive mobile devices is the mass creation of online instructional videos. We are interested in exploring how smart phones shape this sort of mobile, rich media documentation and sharing.
Publication Details
  • USENIX/ACM/IFIP Middleware
  • Sep 19, 2012

Abstract

Faunus addresses the challenge of specifying and managing complex collaboration sessions. Many entities from various administrative domains orchestrate such sessions. Faunus decouples the entities that specify sessions from the entities that activate and manage them. It restricts the operations to specific agents using capabilities. It unifies the specification and management operations through its naming system. Each Faunus name is persistent and globally unique. A collection of attributes is attached to each name. Together, they represent a collection of services that form a collaboration session. Anyone can create a name; the creator has full read and write privileges that can be delegated to others. With the proper capability, anyone can modify session attributes between an active and inactive state. Though the system is designed for Internet scale deployments, the security model for providing and revoking capabilities currently assumes an Intranet style deployment. We have incorporated Faunus into a DisplayCast system that originally used Zeroconf. We are incorporating Faunus into another project that will fully exercise the power of Faunus.
Publication Details
  • International Journal on Document Analysis and Recognition (IJDAR): Volume 15, Issue 3 (2012), pp. 167-182.
  • Sep 1, 2012

Abstract

When searching or browsing documents, the genre of a document is an important consideration that complements topical characterization. We examine design considerations for automatic tagging of office document pages with genre membership. These include selecting features that characterize genre-related information in office documents, examining the utility of text-based features and image-based features, and proposing a simple ensemble method to improve genre identification performance. In the open-set identification of four office document genres, our experiments show that when combined with image-based features, text-based features do not significantly influence performance. These results provide support for a topic-independent approach to genre identification of office documents. Experiments also show that our simple ensemble method significantly improves performance relative to using a support vector machine (SVM) classifier alone. We demonstrate the utility of our approach by integrating our automatic genre tags in a faceted search and browsing application for office document collections.
Publication Details
  • IIiX 2012
  • Aug 21, 2012

Abstract

Exploratory search activities tend to span multiple sessions and involve finding, analyzing and evaluating information and collaborating with others. Typical search systems, on the other hand, are designed to support single-searcher, precision-oriented search tasks. We describe a search interface and system design of a multi-session exploratory search system, discuss design challenges encountered, and chronicle the evolution of our design. Our design describes novel displays for visualizing retrieval history information, and introduces ambient displays and persuasive elements to interactive information retrieval.
Publication Details
  • DIS (Designing Interactive Systems) 2012 Demos track
  • Jun 11, 2012

Abstract

We will demonstrate successive and final stages in the iterative design of a complex mixed reality system in a real-world factory setting. In collaboration with TCHO, a chocolate maker in San Francisco, we built a virtual “mirror” world of a real-world chocolate factory and its processes. Sensor data is imported into the multi-user 3D environment from hundreds of sensors and a number of cameras on the factory floor. The resulting virtual factory is used for simulation, visualization, and collaboration, using a set of interlinked, real-time layers of information. It can be a stand-alone or a web-based application, and also works on iOS and Android cell phones and tablet computers. A unique aspect of our system is that it is designed to enable the incorporation of lightweight social media-style interactions with co-workers along with factory data. Through this mixture of mobile, social, mixed and virtual technologies, we hope to create systems for enhanced collaboration in industrial settings between physically remote people and places, such as factories in China with managers in the US.
Publication Details
  • CHI 2012
  • May 7, 2012

Abstract

Affect influences workplace collaboration and thereby impacts a workplace's productivity. Participants in face-to-face interactions have many cues to each other's affect, but work is increasingly carried out via computer-mediated channels that lack many of these cues. Current presence systems enable users to estimate the availability of other users, but not their affect states or communication preferences. This work investigates relationships between affect state and communication preferences and demonstrates the feasibility of estimating affect state and communication preferences from a presence state stream.
Publication Details
  • CHI 2012
  • May 5, 2012

Abstract

Pico projectors have lately been investigated as mobile display and interaction devices. We propose to use them as ‘light beams’: Everyday objects sojourning in a beam are turned into dedicated projection surfaces and tangible interaction devices. While this has been explored for large projectors, the affordances of pico projectors are fundamentally different: they have a very small and strictly limited projection ray and can be carried around in a nomadic way during the day. Thus it is unclear how this could be actually leveraged for tangible interaction with physical, real world objects. We have investigated this in an exploratory field study and contribute the results. Based upon these, we present exemplary interaction techniques and early user feedback.

Designing a tool for exploratory information seeking

Publication Details
  • CHI 2012
  • May 5, 2012

Abstract

In this paper we describe our on-going design process in building a search system designed to support people's multi-session exploratory search tasks. The system, called Querium, allows people to run queries and to examine results as do conventional search engines, but it also integrates a sophisticated search history that helps people make sense of their search activity over time. Information seeking is a cognitively demanding process that can benefit from many kinds of information, if that information is presented appropriately. Our design process has been focusing on creating displays that facilitate on-going sense-making while keeping the interaction efficient, fluid, and enjoyable.

Querium: A Session-Based Collaborative Search System

Publication Details
  • European Conference on Information Retrieval 2012
  • Apr 1, 2012

Abstract

People's information-seeking can span multiple sessions, and can be collaborative in nature. Existing commercial offerings do not effectively help searchers share, save, collaborate on, or revisit their information. In this demo paper we present Querium: a novel session-based collaborative search system that lets users search, share, resume and collaborate with other users. Querium provides a number of novel search features in a collaborative setting, including relevance feedback, query fusion, faceted search, and search histories.
Publication Details
  • DAS 2012
  • Mar 27, 2012

Abstract

This paper describes a system for capturing images of a book with a 3D stereo camera which performs dewarping to produce output images that are flattened. A Fujifilm consumer grade 3D camera (FinePix W3) provides a highly mobile and low cost 3D capture device. Applying standard computer vision algorithms, the camera is calibrated and the captured images are stereo rectified. Due to technical limitations, the resulting point cloud has defects such as splotches and noise, which make it hard to recover the precise 3D locations of the points on the book pages. We address this problem by computing curve profiles of the depth map and using them to build a cylinder model of the pages. We then generate a mesh M1 on the source image and project this into a mesh M2 on the cylinder model in virtual space. Finally, the mesh M2 is flattened and the pixels in M1 are interpolated and rendered via M2 onto the output image. We have implemented a prototype of the system and report on some preliminary evaluation results.
Publication Details
  • ACM Transactions on Computer Human Interaction
  • Mar 1, 2012

Abstract

To combine the affordances of paper and computers, prior research has proposed numerous interactive paper systems that link specific paper document content to digital operations such as multimedia playback and proofreading. Yet, it remains unclear to what degree these systems bridge the inherent gap between paper and computers when compared to existing paper-only and computer-only interfaces. In particular, given the special properties of paper, such as limited dynamic feedback, how well does an average new user learn to master the interactive paper system? What factors affect the user performance? And how does the paper interface work in a typical use scenario? To answer these questions, we conducted two empirical experiments on a generic pen gesture based command system, called PapierCraft [Liao, et al., 2008], for paper-based interfaces. With it, people can select sections of a printed document and issue commands such as copy and paste, linking and in-text search. The first experiment focused on the user performance of drawing pen gestures on paper. It shows that users can learn the command system in about 30 minutes and achieve a performance comparable to a Tablet PC-based interface supporting the same gestures. The second experiment examined the application of the command system in Active Reading tasks. The results show promise for seamless integration of paper and computers in Active Reading for their combined affordances. In addition, our study identifies some key design issues, such as the pen form factor and feedback of gestures. This paper contributes to a better understanding of the pros and cons of paper and computers, and sheds light on the design of future interfaces for document interaction.

TalkMiner: A Lecture Video Search Engine

Publication Details
  • Fuji Xerox Technical Report, No. 21, 2012, pp. 118-128
  • Feb 3, 2012

Abstract

The design and implementation of a search engine for lecture webcasts is described. A searchable text index is created allowing users to locate material within lecture videos found on a variety of websites such as YouTube and Berkeley webcasts. The searchable index is built from the text of presentation slides appearing in the video along with other associated metadata such as the title and abstract when available. The automatic identification of distinct slides within the video stream presents several challenges. For example, picture-in-picture compositing of a speaker and a presentation slide, switching cameras, and slide builds confuse basic algorithms for extracting keyframe slide images. Enhanced algorithms are described that improve slide identification. A public system was deployed to test the algorithms and the utility of the search engine at www.talkminer.com. To date, over 17,000 lecture videos have been indexed from a variety of public sources.
Publication Details
  • Fuji Xerox Technical Report, No. 21, 2012
  • Feb 2, 2012

Abstract

Modern office work practices increasingly breach traditional boundaries of time and place, making it difficult to interact with colleagues. To address these problems, we developed myUnity, a software and sensor platform that enables rich workplace awareness and coordination. myUnity is an integrated platform that collects information from a set of independent sensors and external data aggregators to report user location, availability, tasks, and communication channels. myUnity's sensing architecture is component-based, allowing channels of awareness information to be added, updated, or removed at any time. Multiple channels of input are combined and composited into a single, high-level presence state. Early studies of a myUnity deployment have demonstrated that the platform allows quick access to core awareness information and show that it has become a useful tool for supporting communication and collaboration in the modern workplace.
Publication Details
  • Personal and Ubiquitous Computing (PUC)
  • Feb 1, 2012

Abstract

Presence systems are valuable in supporting workplace communication and collaboration. These systems are only effective if widely adopted and used. User perceptions of the utility of the information being shared and their comfort sharing such information strongly impact adoption and use. This paper describes the results of a survey of user preferences regarding comfort with and utility of workplace presence systems; the effects of sampling frequency, fidelity, and aggregation; and design implications of these results. We present new results that extend some past findings while challenging others. We contribute new design insights that inform the design of presence technologies to increase both utility and adoption.
2011
Publication Details
  • The 10th International Conference on Virtual Reality Continuum and Its Applications in Industry
  • Dec 11, 2011

Abstract

Augmented Paper (AP) is an important area of Augmented Reality (AR). Many AP systems rely on visual features for paper document identification. Although promising, these systems can hardly support large sets of documents (e.g., one million documents) because of the high memory and time cost of handling high-dimensional features. On the other hand, general large-scale image identification techniques are not well customized to AP and consume more resources than necessary to achieve the identification accuracy required by AP. To address this mismatch between AP and image identification techniques, we propose a novel large-scale image identification technique well geared to AP. At its core is a geometric verification scheme based on Minimum visual-word Correspondence Sets (MICSs). A MICS is a set of visual word (i.e., quantized visual feature) correspondences that contains the minimum number of correspondences sufficient for deriving a transformation hypothesis between a captured document image and an indexed image. Our method selects appropriate MICSs to vote in a Hough space of transformation parameters, and uses a robust dense region detection algorithm to locate the possible transformation models in the space. The models are then used to verify all the visual word correspondences to precisely identify the matching indexed image. By taking advantage of unique geometric constraints in AP, our method can significantly reduce the time and memory cost while achieving high accuracy. As shown in an evaluation with two AP systems called FACT and EMM, over a dataset with 1M+ images, our method achieves 100% identification accuracy and 0.67% registration error for FACT; for EMM, our method outperforms the state-of-the-art image identification approach by achieving a 4% improvement in detection rate and almost perfect precision, while saving 40% of the memory cost and 70% of the time cost.
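The voting idea can be sketched in a few lines. This is a simplified illustration, not the method evaluated above: it assumes a similarity transform, so that two point correspondences form a minimal set, votes only on scale and rotation, uses every pair rather than selecting appropriate sets, and takes the single fullest bin instead of running a dense region detector. The function names and bin sizes are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def similarity_from_pair(c1, c2):
    """Scale and rotation implied by two correspondences.

    Each correspondence is (p, q): point p in the captured image
    and its matched point q in the indexed image.
    """
    (p1, q1), (p2, q2) = c1, c2
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]
    dqx, dqy = q2[0] - q1[0], q2[1] - q1[1]
    lp, lq = math.hypot(dpx, dpy), math.hypot(dqx, dqy)
    if lp == 0 or lq == 0:
        return None  # degenerate pair, no hypothesis
    angle = math.atan2(dqy, dqx) - math.atan2(dpy, dpx)
    angle = (angle + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return lq / lp, angle

def hough_vote(correspondences, scale_step=0.25, angle_step=math.radians(10)):
    """Return ((scale_bin, angle_bin), votes) for the fullest bin, or None."""
    votes = Counter()
    for c1, c2 in combinations(correspondences, 2):
        hypothesis = similarity_from_pair(c1, c2)
        if hypothesis is None:
            continue
        scale, angle = hypothesis
        # Scale is binned in log2 space so that 2x and 0.5x are symmetric.
        key = (round(math.log2(scale) / scale_step), round(angle / angle_step))
        votes[key] += 1
    return votes.most_common(1)[0] if votes else None

# Four consistent matches (scale 2, no rotation) plus one outlier.
matches = [
    ((0, 0), (0, 0)),
    ((1, 0), (2, 0)),
    ((0, 1), (0, 2)),
    ((1, 1), (2, 2)),
    ((5, 5), (1, 9)),
]
print(hough_vote(matches))
```

Pairs that include the outlier scatter across other bins, so the consistent matches dominate the vote; a full implementation would also vote on translation.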

PaperUI

Publication Details
  • Springer LNCS
  • Dec 1, 2011

Abstract

PaperUI is a human-information interface concept that advocates using paper as displays and using mobile devices, such as camera phones or camera pens, as traditional computer mice. When emphasizing technical efforts, some researchers like to refer to the underlying PaperUI-related work as interactive paper systems. We prefer the term PaperUI for emphasizing the final goal, narrowing the discussion focus, and avoiding terminology confusion between interactive paper system and interactive paper computer [40]. PaperUI combines the merits of paper and mobile devices, in that users can comfortably read and flexibly arrange document content on paper, and access digital functions related to the document via the mobile computing devices. This concept aims at novel interface technology that seamlessly bridges the gap between paper and computers for a better user experience in handling documents. Compared with traditional laptops and tablet PCs, devices involved in the PaperUI concept are more lightweight, compact, energy efficient, and widely adopted. Therefore, we believe this interface vision can make computation more convenient to access for the general public.
Publication Details
  • ACM Multimedia 2011
  • Nov 28, 2011

Abstract

This paper describes methods for clustering photos that include both time stamps and location coordinates. We present versions of a two-part method that first detects clusters using time and location information independently. These candidate clusters partition the set of time-ordered photos. A subset of the candidate clusters is selected by an efficient dynamic programming procedure to optimize a cost function. We propose several cost functions to design clusterings that are coherent in space, time, or both. One set of cost functions minimizes inter-photo distances directly. A second set maximizes an information measure to select clusterings for consistency in both time and space across scale.
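The selection step can be sketched as a shortest-path style dynamic program over the time-ordered photos. This is a minimal sketch under stated assumptions: candidate clusters are half-open index ranges over the sorted photos, the chosen subset must tile the whole sequence, and the cost used in the example (time span plus a fixed per-cluster penalty) is a placeholder, not one of the cost functions proposed in the paper. The name `select_partition` is hypothetical.

```python
def select_partition(n, candidates, cost):
    """Pick candidate clusters that tile photos 0..n-1 at minimum total cost.

    candidates: iterable of (start, end) half-open index ranges, start < end.
    cost: function (start, end) -> float.
    Returns (list of chosen ranges in order, total cost).
    """
    inf = float("inf")
    best = [inf] * (n + 1)   # best[e] = cheapest tiling of photos 0..e-1
    back = [None] * (n + 1)  # back[e] = start of the last cluster in that tiling
    best[0] = 0.0
    by_end = {}
    for start, end in candidates:
        if not 0 <= start < end <= n:
            raise ValueError("invalid candidate range")
        by_end.setdefault(end, []).append(start)
    for end in range(1, n + 1):
        for start in by_end.get(end, []):
            total = best[start] + cost(start, end)
            if total < best[end]:
                best[end], back[end] = total, start
    if best[n] == inf:
        raise ValueError("candidates cannot tile the whole sequence")
    chosen, end = [], n
    while end > 0:
        chosen.append((back[end], end))
        end = back[end]
    chosen.reverse()
    return chosen, best[n]

# Toy data: photo timestamps in minutes, already sorted.
times = [0, 1, 2, 50, 51, 100]
candidates = [(0, 3), (3, 5), (5, 6), (0, 6), (0, 5), (3, 6), (0, 1), (1, 3)]
span_cost = lambda s, e: (times[e - 1] - times[s]) + 5.0  # placeholder cost
print(select_partition(len(times), candidates, span_cost))
```

Because each prefix is solved once, the running time is linear in the number of candidate clusters after grouping them by end index.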
Publication Details
  • ACM Multimedia 2011
  • Nov 28, 2011

Abstract

Embedded Media Markers (EMMs) are nearly transparent icons printed on paper documents that link to associated digital media. By using the document content for retrieval, EMMs are less visually intrusive than barcodes and other glyphs while still providing an indication of the presence of links. An initial implementation demonstrated good overall performance but exposed difficulties in guaranteeing the creation of unambiguous EMMs. We developed an EMM authoring tool that supports the interactive authoring of EMMs via visualizations that show the user which areas on a page may cause recognition errors and automatic feedback that moves the authored EMM away from those areas. The authoring tool and the techniques it relies on have been applied to corpora with different visual characteristics to explore the generality of our approach.
Publication Details
  • ACM Multimedia Industrial Exhibit
  • Nov 28, 2011

Abstract

The Active Reading Application (ARA) brings the familiar experience of writing on paper to the tablet. The application augments paper-based practices with audio, the ability to review annotations, and sharing. It is designed to make it easier to review, annotate, and comment on documents by individuals and groups. ARA incorporates several patented technologies and draws on several years of research and experimentation.
Publication Details
  • ACM Multimedia Industrial Exhibits
  • Nov 28, 2011

Abstract

Modern office work practices increasingly breach traditional boundaries of time and place, making it difficult to interact with colleagues. To address these problems, we developed myUnity, a software and sensor platform that enables rich workplace awareness and coordination. myUnity is an integrated platform that collects information from a set of independent sensors and external data aggregators to report user location, availability, tasks, and communication channels. myUnity's sensing architecture is component-based, allowing channels of awareness information to be added, updated, or removed at any time. Our current system includes a variety of sensor and data inputs, including camera-based activity classification, wireless location trilateration, and network activity monitoring. These and other input channels are combined and composited into a single, high-level presence state. Early studies of a myUnity deployment have demonstrated that use of the platform allows quick access to core awareness information and show that it has become a useful tool for supporting communication and collaboration in the modern workplace.

Session-based search with Querium

Publication Details
  • HCIR 2011
  • Oct 20, 2011

Abstract

We illustrate the use of Querium, a novel search system designed to support people's collaborative and multi-session search tasks, in the context of the HCIR 2011 Search Challenge. This report demonstrates how Querium's interface and search engine can be used to search for documents in an open-ended, exploratory task. We illustrate the use of relevance feedback, faceted search, query fusion, and the search history, as well as commenting and overview functions.

Designing for Collaboration in Information Seeking

Publication Details
  • HCIR 2011
  • Oct 19, 2011

Abstract

Information seeking is often a collaborative activity that can take many forms; in this paper we focus on explicit, intentional collaboration of small groups and explore a range of design decisions that should be considered when building Human-Computer Information Retrieval (HCIR) tools that support collaboration. In particular, we are interested in exploring the interplay between algorithmic mediation of collaboration and the mediated communication among team members. We argue that certain characteristics of the group's information need call for different design decisions.
Publication Details
  • Oct 3, 2011

Abstract

Documents created, stored, and retrieved digitally are often printed on paper to be read for the purposes of producing new documents. The cycle of electronic document "consumption" and production is often broken in the middle by printing. Our research in XLibris has examined these transitions between the digital and paper worlds. Starting with interfaces for analytic reading, we have focused on annotation, on retrieval and re-retrieval, and on shared annotation. In this talk, I will describe the interfaces and the empirical evaluations we have conducted, and will discuss the potential of this technology in digital--and in physical--libraries.

PaperUI

Publication Details
  • CBDAR 2011
  • Sep 18, 2011

Abstract

PaperUI is a human-computer interface concept that treats paper as displays that users can interact with via mobile devices such as mobile phones and projectors. It combines the merits of paper and mobile devices. Compared with traditional laptops and tablet PCs, devices involved in this concept are more lightweight, compact, energy efficient, and widely adopted. Therefore, we believe this interface vision can make computation more convenient to access for the general public. With our implemented prototype, pilot users can read documents easily and comfortably on paper, and access many digital functions related to the document via a camera phone or a mobile projector. Invited Talk. http://imlab.jp/cbdar2011/#keynote

Abstract

This demo shows an interactive paper system called MixPad, which features using mice and keyboards to enhance the conventional pen-finger-gesture based interaction with paper documents. Similar to many interactive paper systems, MixPad adopts a mobile camera-projector unit to recognize paper documents, detect pen and finger gestures and provide visual feedback. Unlike these systems, MixPad allows using mice and keyboards to help users interact with fine-grained document content on paper (e.g. individual words and user-defined arbitrary regions), and to facilitate cross-media operations. For instance, to copy a document segment from paper to a laptop, one first points a finger of her non-dominant hand to the segment roughly, and then uses a mouse in her dominant hand to refine the selection and drag it to the laptop; she can also type text as a detailed comment on a paper document. This novel interaction paradigm combines the advantages of mice, keyboards, pens and fingers, and therefore enables rich digital functions on paper.
Publication Details
  • MobileHCI
  • Aug 30, 2011

Abstract

Modern office work practices increasingly breach traditional boundaries of time and place, increasing the breakdowns workers encounter when coordinating interaction with colleagues. We conducted interviews with 12 workers and identified key problems introduced by these practices. To address these problems we developed myUnity, a fully functional platform enabling rich workplace awareness and coordination. myUnity is one of the first integrated platforms to span mobile and desktop environments, both in terms of access and sensing. It uses multiple sources to report user location, availability, tasks, and communication channels. A pilot field study of myUnity demonstrated the significant value of pervasive access to workplace awareness and communication facilities, as well as positive behavioral change in day-to-day communication practices for most users. We present resulting insights about the utility of awareness technology in flexible work environments.
Publication Details
  • International Journal of Arts and Technology
  • Jul 25, 2011

Abstract

Mobile media applications need to balance user and group goals, attentional constraints, and limited screen real estate. In this paper, we describe the iterative development and testing of an application that explores these tradeoffs. We developed early prototypes of a retrospective, time-based system as well as a prospective and space-based system. Our experiences with the prototypes led us to focus on the prospective system. We argue that attentional demands dominate and mobile media applications should be lightweight and hands-free as much as possible.

Estimation Methods for Ranking Recent Information

Publication Details
  • SIGIR 2011
  • Jul 24, 2011

Abstract

Temporal aspects of documents can impact relevance for certain kinds of queries. In this paper, we build on earlier work of modeling temporal information. We propose an extension to the Query Likelihood Model that incorporates query-specific information to estimate rate parameters, and we introduce a temporal factor into language model smoothing and query expansion using pseudo-relevance feedback. We evaluate these extensions using a Twitter corpus and two newspaper article collections. Results suggest that, compared to prior approaches, our models are more effective at capturing the temporal variability of relevance associated with some topics.
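To make the idea of a query-specific rate parameter concrete, here is a minimal sketch. The abstract does not give the form of the model, so this assumes an exponential recency prior, a common choice in time-aware language models, with the rate estimated by maximum likelihood from the ages of the top-ranked documents for the query. The function names and the inputs are hypothetical.

```python
import math

def estimate_rate(ages):
    """Maximum-likelihood rate of an exponential distribution over document ages.

    ages: ages (e.g. in days) of the top-ranked documents for one query.
    """
    mean_age = sum(ages) / len(ages)
    if mean_age <= 0:
        raise ValueError("mean age must be positive")
    return 1.0 / mean_age

def temporal_score(log_query_likelihood, age, rate):
    """Log query likelihood plus the log of the assumed exponential prior."""
    return log_query_likelihood + math.log(rate) - rate * age

# Hypothetical first-pass results: (doc id, log P(q|d), age in days).
results = [("old", -10.0, 30.0), ("new", -10.2, 1.0), ("mid", -10.1, 10.0)]
rate = estimate_rate([age for _, _, age in results])
reranked = sorted(results, key=lambda r: temporal_score(r[1], r[2], rate), reverse=True)
print([doc_id for doc_id, _, _ in reranked])
```

A query whose top results are all recent yields a large rate and a strong preference for new documents, while a query with older results yields a rate near zero and leaves the ranking close to plain query likelihood.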

Secured histories for presence systems

Publication Details
  • SECOTS 2011
  • May 23, 2011

Abstract

As sensors become ever more prevalent, more and more information will be collected about each of us. A long-term research question is how best to support beneficial uses while preserving individual privacy. Presence systems are an emerging class of applications that support collaboration. These systems leverage pervasive sensors to estimate end-user location, activities, and available communication channels. Because such presence data are sensitive, to achieve widespread adoption, sharing models must reflect the privacy and sharing preferences of the users. To reflect users' collaborative relationships and sharing desires, we introduce CollaPSE security in which an individual has full access to her own data, a third party processes the data without learning anything about the data values, and users higher up in the hierarchy learn only statistical information about the employees under them. We describe simple schemes that efficiently realize CollaPSE security for time series data. We implemented these protocols using readily available cryptographic functions, and integrated the protocols with FXPAL's myUnity presence system.
Publication Details
  • CHI 2011 Workshop on Mobile and Personal Projection (MP2)
  • May 8, 2011

Abstract

The field of personal mobile projection is advancing quickly, and a variety of work focuses on enhancing physical objects in the real world with dynamically projected digital artifacts. Due to technological restrictions, none of this work has yet investigated what we feel is the most promising research direction: (bi-manual) interaction with mobile projections on non-planar surfaces. To elicit the challenges of this field of research, we contribute (1) a technology-centered design space for mobile projector-based interfaces and discuss related work in light thereof, (2) a discussion of lessons learnt from two of our research projects, which aim at improving both usability and user experience, and (3) an outline of open research challenges within this field.