Publications

FXPAL publishes in top scientific conferences and journals.

2014
Publication Details
  • To appear in Ubicomp 2014
  • Sep 9, 2014

Abstract

In recent years, there has been an explosion of social and collaborative applications that leverage location to provide users with novel and engaging experiences. Current location technologies work well outdoors but fare poorly indoors. In this paper we present LoCo, a new framework that can provide highly accurate room-level location using a supervised classification scheme. We provide experiments showing that this technique is orders of magnitude more efficient than current state-of-the-art Wi-Fi localization techniques. Its low classification overhead and small computational footprint make classification practical and efficient even on mobile devices. Our framework has also been designed to be easily deployed and leveraged by developers to help create a new wave of location-driven applications and services.
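To make "room-level location using a supervised classification scheme" concrete, here is a minimal sketch that treats each Wi-Fi scan as a vector of signal strengths (RSSI) keyed by access point and labels it with a room. The classifier shown is a plain k-nearest-neighbour vote, swapped in for illustration; it is not LoCo's classifier. The names `classify_room` and `MISSING_RSSI` and the -100 dBm floor are assumptions.

```python
import math
from collections import Counter

# Assumed floor value for access points that a given scan did not hear.
MISSING_RSSI = -100.0


def to_vector(scan, aps):
    """Turn a {access_point: rssi_dbm} scan into a fixed-order vector."""
    return [scan.get(ap, MISSING_RSSI) for ap in aps]


def classify_room(scan, training, k=3):
    """Label a scan with a room via a k-nearest-neighbour vote.

    training: list of (scan_dict, room_label) pairs collected per room.
    """
    # Use every access point seen in training or in the query scan.
    aps = sorted({ap for s, _ in training for ap in s} | set(scan))
    query = to_vector(scan, aps)
    dists = sorted(
        (math.dist(query, to_vector(s, aps)), room) for s, room in training
    )
    votes = Counter(room for _, room in dists[:k])
    return votes.most_common(1)[0][0]
```

Because the fingerprints are only compared, never modelled, a sketch like this is cheap enough to run on a phone; a trained classifier would trade that simplicity for accuracy.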

Abstract

In this paper, we describe Gesture Viewport, a projector-camera system that enables finger gesture interactions with media content on any surface. We propose a novel and computationally very efficient finger localization method based on the detection of occlusion patterns inside a virtual sensor grid rendered in a layer on top of a viewport widget. We develop several robust interaction techniques to prevent unintentional gestures from being triggered, to provide visual feedback to the user, and to minimize the interference of the sensor grid with the media content. We show the effectiveness of the system through three scenarios: viewing photos, navigating Google Maps, and controlling Google Street View.
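The occlusion-based localization idea can be illustrated with a small sketch: each sensor-grid cell has a known rendered brightness, and a cell counts as occluded when the camera sees a large deviation from it. The rule that the topmost occluded cell approximates the fingertip assumes the hand enters from the bottom edge. The threshold value, the function names, and that rule are all hypothetical, not the paper's method.

```python
def occluded_cells(expected, observed, threshold=40):
    """Return (row, col) of grid cells whose observed brightness differs
    from the rendered (expected) brightness by more than threshold."""
    return [
        (r, c)
        for r, row in enumerate(expected)
        for c, e in enumerate(row)
        if abs(observed[r][c] - e) > threshold
    ]


def locate_fingertip(expected, observed, threshold=40):
    """Estimate the fingertip as (row, mean column) of the topmost occluded row.

    Assumes the hand enters the grid from the bottom edge, so the
    topmost occluded cells lie under the fingertip.
    """
    cells = occluded_cells(expected, observed, threshold)
    if not cells:
        return None  # no finger over the grid
    top = min(r for r, _ in cells)
    cols = [c for r, c in cells if r == top]
    return (top, sum(cols) / len(cols))
```

Only the grid cells are inspected rather than the full camera frame, which is one plausible reason such a scheme can be cheap to compute.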
Publication Details
  • ACM SIGIR International Workshop on Social Media Retrieval and Analysis
  • Jul 11, 2014

Abstract

We examine the use of clustering to identify selfies in a social media user's photos for use in estimating demographic information such as age, gender, and race. Faces are first detected within a user's photos followed by clustering using visual similarity. We define a cluster scoring scheme that uses a combination of within-cluster visual similarity and average face size in a cluster to rank potential selfie-clusters. Finally, we evaluate this ranking approach over a collection of Twitter users and discuss methods that can be used for improving performance in the future.
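The cluster scoring scheme described above combines two signals. Here is a minimal sketch, assuming each detected face is represented by a feature embedding and its area as a fraction of the photo. The linear weighting `alpha`, the cosine similarity measure, and the normalization by the largest average face size are illustrative assumptions, not the paper's exact formula.

```python
import itertools
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def within_similarity(embeddings):
    """Mean pairwise cosine similarity inside one cluster (0.0 for singletons)."""
    pairs = list(itertools.combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def rank_selfie_clusters(clusters, alpha=0.5):
    """Rank clusters by a hypothetical selfie score.

    clusters: {name: [(embedding, face_area_fraction), ...]}
    score = alpha * within-cluster similarity
            + (1 - alpha) * average face size / largest average face size
    """
    avg_size = {n: sum(s for _, s in faces) / len(faces) for n, faces in clusters.items()}
    max_size = max(avg_size.values())
    scores = {
        n: alpha * within_similarity([e for e, _ in faces])
        + (1 - alpha) * avg_size[n] / max_size
        for n, faces in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Large, visually consistent faces push a cluster up the ranking, which matches the intuition that selfies show the same face close to the camera.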
Publication Details
  • SIGIR 2014
  • Jul 6, 2014
  • pp. 495-504

Abstract

People often use more than one query when searching for information. They revisit search results to re-find information and build an understanding of their search need through iterative exploration of query formulations. These tasks are not well supported by search interfaces and web browsers. We designed and built SearchPanel, a Chrome browser extension that helps people manage their ongoing information seeking. The extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, navigation, and re-finding documents. Results from a real-world deployment spanning more than two months suggest that SearchPanel was used primarily for complex information needs, in search sessions with long durations and large numbers of queries. The process metadata features in SearchPanel appear to be particularly important when working on complex information needs.

Supporting media bricoleurs

Publication Details
  • ACM interactions
  • Jul 1, 2014

Abstract

Online video is incredibly rich. A 15-minute home improvement YouTube tutorial might include 1500 words of narration, 100 or more significant keyframes showing a visual change from multiple perspectives, several animated objects, references to other examples, a tool list, comments from viewers and a host of other metadata. Furthermore, video accounts for 90% of worldwide Internet traffic. However, it is our observation that video is not widely seen as a full-fledged document; instead, it is dismissed as a medium that, at worst, gilds over substance and, at best, simply augments text-based communication. In this piece, we suggest that negative attitudes toward multimedia documents that include audio and video are largely unfounded and arise mostly because we lack the necessary tools to treat video content as first-order media or to support seamlessly mixing media.
Publication Details
  • ACM TVX 2014
  • Jun 25, 2014

Abstract

Creating compelling multimedia content is a difficult task. It involves not only the creative process of developing a compelling media-based story, but also significant technical support for content editing, management and distribution. This has been true for printed, audio and visual presentations for centuries. It is certainly true for broadcast media such as radio and television. The talk will survey several approaches to describe and manage media interactions. We will focus on the temporal modeling of context-sensitive personalized interactions of complex collections of independent media objects. Using the concepts of ‘togetherness’ employed in the EU’s FP-7 project TA2: Together Anywhere, Together Anytime, we will follow the process of media capture, profiling, composition, sharing and end-user manipulation. We will consider the promise of using automated tools and contrast this with the reality of letting real users manipulate presentation semantics in real time. The talk will not present a closed-form solution, but will present a series of topics and problems that can stimulate the development of a new generation of systems to foster social media interaction.
Publication Details
  • ICWSM (The 8th International AAAI Conference on Weblogs and Social Media)
  • Jun 1, 2014

Abstract

A topic-independent sentiment model is commonly used to estimate sentiment in microblogs. But for movie and product reviews, domain adaptation has been shown to improve sentiment estimation performance. We investigated the utility of topic-dependent polarity estimation models for microblogs. We examined both a model trained on Twitter tweets containing a target keyword and a model trained on an enlarged set of tweets containing terms related to a topic. Comparing the performance of the topic-dependent models to a topic-independent model trained on a general sample of tweets, we noted that for some topics, topic-dependent models performed better. We then propose a method for predicting which topics are likely to have better sentiment estimation performance when a topic-dependent sentiment model is used.
Publication Details
  • CHI 2014 (Interactivity)
  • Apr 26, 2014

Abstract

AirAuth is a biometric authentication technique that uses in-air hand gestures to authenticate users tracked through a short-range depth sensor. Our method tracks multiple distinct points on the user's hand simultaneously that act as a biometric to further enhance security. We describe the details of our mobile demonstrator that will give Interactivity attendees an opportunity to enroll and verify our system's authentication method. We also wish to encourage users to design their own gestures for use with the system. Apart from engaging with the CHI community, a demonstration of AirAuth would also yield useful gesture data input by the attendees which we intend to use to further improve the prototype and, more importantly, make available publicly as a resource for further research into gesture-based user interfaces.
Publication Details
  • CHI Extended Abstracts 2014
  • Apr 26, 2014

Abstract

AirAuth is a biometric, gesture-based authentication system based on in-air gesture input. We describe the operations necessary to sample enrollment gestures and to perform matching for authentication, using data from a short-range depth sensor. We present the results of two initial user studies. The first study was conducted to crowdsource a simple gesture set for use in further evaluations. The results of our second study indicate that AirAuth achieves a very high Equal Error Rate (EER)-based accuracy of 96.6% for the simple gesture set and 100% for user-specific gestures. Future work will encompass the evaluation of possible attack scenarios and obtaining qualitative user feedback on the usability advantages of gesture-based authentication.
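An EER-based accuracy figure is typically derived from two score distributions: scores from genuine attempts and scores from impostor attempts. The sketch below sweeps candidate thresholds, picks the one where the false acceptance rate (FAR) and false rejection rate (FRR) are closest, and reports their mean as the EER. It assumes higher scores mean a closer match and that "EER-based accuracy" means 1 − EER; the function name and the example scores are hypothetical, not AirAuth data.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for similarity scores (higher = better match).

    FAR(t): fraction of impostor scores accepted (score >= t).
    FRR(t): fraction of genuine scores rejected (score < t).
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores with some overlap between the two distributions.
eer, threshold = equal_error_rate([0.9, 0.6, 0.8, 0.4], [0.5, 0.3, 0.7, 0.2])
accuracy = 1 - eer  # assumed reading of "EER-based accuracy"
```

With these made-up scores the crossing point is at threshold 0.6, where one genuine and one impostor attempt out of four are misclassified, giving an EER of 0.25.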
Publication Details
  • ACM ICMR 2014
  • Apr 1, 2014

Abstract

Motivated by scalable partial-duplicate visual search, there has been growing interest in a wealth of compact and efficient binary feature descriptors (e.g., ORB, FREAK, BRISK). Typically, binary descriptors are clustered into codewords and quantized with Hamming distance, following the conventional bag-of-words strategy. However, such codewords formulated in Hamming space have not shown obvious improvements in indexing and search performance compared to Euclidean ones. In this paper, without explicit codeword construction, we explore using binary descriptors directly as codebook indices (addresses). We propose a novel approach that builds multiple index tables which check in parallel for collisions of identical hash values. The evaluation is performed on two public image datasets: DupImage and Holidays. The experimental results demonstrate the indexing efficiency and retrieval accuracy of our approach.
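One way to read "binary descriptors as direct codebook indices" with "multiple index tables" is the following sketch: split each descriptor (an integer of `bits` bits) into equal chunks, use chunk i directly as the key into table i, and let a query vote for every image that collides with it in any table. The chunking scheme, the voting rule, and the class name `MultiIndex` are assumptions for illustration, not the paper's exact construction.

```python
from collections import Counter, defaultdict


class MultiIndex:
    """Hypothetical multi-table index keyed directly by descriptor chunks."""

    def __init__(self, bits=256, tables=8):
        assert bits % tables == 0, "bits must split evenly across tables"
        self.chunk = bits // tables
        self.mask = (1 << self.chunk) - 1
        self.tables = [defaultdict(set) for _ in range(tables)]

    def _keys(self, desc):
        # Chunk i of the descriptor is used as-is as the address in table i.
        return [(desc >> (i * self.chunk)) & self.mask for i in range(len(self.tables))]

    def add(self, image_id, descriptors):
        for d in descriptors:
            for table, key in zip(self.tables, self._keys(d)):
                table[key].add(image_id)

    def query(self, descriptors):
        """Return [(image_id, collision_count), ...], most collisions first."""
        votes = Counter()
        for d in descriptors:
            for table, key in zip(self.tables, self._keys(d)):
                votes.update(table.get(key, ()))
        return votes.most_common()
```

Usage with tiny 16-bit descriptors and four tables of 4-bit keys: after `index.add("a", [0x1234, 0xABCD])`, querying `[0x1235]` collides with `0x1234` in three of the four tables, since only the lowest chunk differs. Each table lookup is independent, so the tables could be probed in parallel.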
Publication Details
  • HotMobile 2014
  • Feb 26, 2014

Abstract

In this paper, we propose HiFi, a system that enables users to interact with surrounding physical objects. It uses coded light to encode position in an environment. By attaching a tiny light sensor to a mobile device, users can attach digital information to arbitrary static physical objects, or retrieve and modify the information anchored to these objects. With this system, a family member may attach a digital maintenance schedule to a fish tank or indoor plants. In a store, a manager may use such a system to attach price tags, discount information, and multimedia content to any product, and customers can retrieve the attached information by moving their phone close to the product of interest. Similarly, a museum can use this system to provide visitors with additional information about displayed items. Unlike computer-vision-based systems, HiFi does not depend on object texture or bright illumination. Unlike regular barcode approaches, HiFi does not require extra physical attachments that may change an object’s native appearance. HiFi has much higher spatial resolution for distinguishing nearby objects or different parts of the same object. Because HiFi can track a mobile device at 80 positions per second, it also responds much faster than any of the systems listed above.
Publication Details
  • Fuji Xerox Technical Report, No. 23, 2014, pp. 34-42
  • Feb 20, 2014

Abstract

Video content creators invest enormous effort creating work that is in turn typically viewed passively. However, learning tasks using video require users not only to consume the content but also to engage with, interact with, and repurpose it. Furthermore, to promote learning with video in domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. A literature review coupled with formative field studies led to a system design that can incorporate a broad set of video-creation and interaction styles.
2013
Publication Details
  • IEEE ISM 2013
  • Dec 9, 2013

Abstract

Real-time tele-immersion requires low-latency, synchronized multi-camera capture. Prior high definition (HD) capture systems were bulky. We investigate the suitability of using flocks of smartphone cameras for tele-immersion. Smartphones can potentially integrate HD capture and streaming into a single portable package. However, they are designed for archiving the captured video into a movie. Hence, we create a sequence of H.264 movies and stream them. We lower the capture delay by reducing the number of frames in each movie segment. Increasing the number of movie segments adds compression overhead. Smartphone video encoders do not sacrifice video quality to lower the compression latency or the stream size. On an iPhone 4S, our application that uses published APIs streams 1920x1080 videos at 16.5 fps with a delay of 712 msec between a real-life event and displaying an uncompressed bitmap of this event on a local laptop. For comparison, the bulky Cisco Tandberg required a 300 msec delay. Stereoscopic video from two unsynchronized smartphones showed minimal visual artifacts in an indoor teleconference setting.
Publication Details
  • Education and Information Technologies journal
  • Oct 11, 2013

Abstract

Video tends to be imbalanced as a medium. Typically, content creators invest enormous effort creating work that is then watched passively. However, learning tasks require that users not only consume video but also engage, interact with, and repurpose content. Furthermore, to promote learning across domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. Specifically, we describe a needfinding study involving interviews with amateur video creators as well as our experience with an early prototype to support expository capture and access. Our findings led to a system redesign that can incorporate a broad set of video-creation and interaction styles.
Publication Details
  • Interactive Tabletops and Surfaces (ITS) 2013
  • Oct 6, 2013

Abstract

The expressiveness of touch input can be increased by detecting additional finger pose information at the point of touch such as finger rotation and tilt. PointPose is a prototype that performs finger pose estimation at the location of touch using a short-range depth sensor viewing the touch screen of a mobile device. We present an algorithm that extracts finger rotation and tilt from a point cloud generated by a depth sensor oriented towards the device's touchscreen. The results of two user studies we conducted show that finger pose information can be extracted reliably using our proposed method. We show this for controlling rotation and tilt axes separately and also for combined input tasks using both axes. With the exception of the depth sensor, which is mounted directly on the mobile device, our approach does not require complex external tracking hardware, and, furthermore, external computation is unnecessary as the finger pose extraction algorithm can run directly on the mobile device. This makes PointPose ideal for prototyping and developing novel mobile user interfaces that use finger pose estimation.
Publication Details
  • ACM Trans. on Multimedia Computing, Communications and Applications (TOMCCAP)
  • Oct 1, 2013

Abstract

A panel at ACM Multimedia 2012 addressed research successes in the past 20 years. While the panel focused on the past, this article discusses successes since the ACM SIGMM 2003 Retreat and suggests research directions in the next ten years. While significant progress has been made, more research is required to allow multimedia to impact our everyday computing environment. The importance of hardware changes on future research directions is discussed. We believe ubiquitous computing—meaning abundant computation and network bandwidth—should be applied in novel ways to solve multimedia grand challenges and continue the IT revolution of the past century.
Publication Details
  • DocEng 2013
  • Sep 10, 2013

Abstract

Unlike text, copying and pasting parts of video documents is challenging. Yet, the huge amount of video documents now available in the form of how-to tutorials begs for simpler techniques that allow users to easily copy and paste fragments of video materials into new documents. We describe new direct video manipulation techniques that allow users to quickly copy and paste content from video documents such as how-to tutorials into a new document. While the video plays, users interact with the video canvas to select text regions, scrollable regions, slide sequences built up across many frames, or semantically meaningful regions such as dialog boxes. Instead of relying on the timeline to accurately select sub-parts of the video document, users navigate using familiar selection techniques such as mouse-wheel to scroll back and forward over a video region where content scrolls, double-clicks over rectangular regions to select them, or clicks and drags over textual regions of the video canvas to select them. We describe the video processing techniques that run in real-time in modern web browsers using HTML5 and JavaScript; and show how they help users quickly copy and paste video fragments into new documents, allowing them to efficiently reuse video documents for authoring or note-taking.
Publication Details
  • CBDAR 2013
  • Aug 23, 2013

Abstract

Capturing book images is more convenient with a mobile phone camera than with more specialized flat-bed scanners or 3D capture devices. We built an application for the iPhone 4S that captures a sequence of hi-res (8 MP) images of a page spread as the user sweeps the device across the book. To do the 3D dewarping, we implemented two algorithms: optical flow (OF) and structure from motion (SfM). Making further use of the image sequence, we examined the potential of multi-frame OCR. Preliminary evaluation on a small set of data shows that OF and SfM had comparable OCR performance for both single-frame and multi-frame techniques, and that multi-frame was substantially better than single-frame. The computation time was much less for OF than for SfM.
Publication Details
  • EuroHCIR 2013
  • Aug 1, 2013

Abstract

People often use more than one query when searching for information; they also revisit search results to re-find information. These tasks are not well-supported by search interfaces and web browsers. We designed and built a Chrome browser extension that helps people manage their ongoing information seeking. The extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, for navigation, and for re-finding documents.
Publication Details
  • SIGIR 2013
  • Jul 28, 2013

Abstract

Exploratory search is a complex, iterative information-seeking activity that involves running multiple queries and finding and examining many documents. We introduced a query preview interface that visualizes the distribution of newly retrieved and re-retrieved documents prior to showing the detailed query results. Compared with a control condition, the preview affected people’s information-seeking behavior and improved retrieval performance. When using the preview, people spent more time formulating a query, were more likely to explore search results more deeply, retrieved a more diverse set of documents, and found more distinct relevant documents. With more time spent on query formulation, higher-quality queries were produced and, as a consequence, the retrieval results improved; both average residual precision and recall were higher with the query preview present.
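The data behind such a preview can be sketched simply: before showing results, split the ranked list into rank bins and count how many documents in each bin are new versus already retrieved earlier in the session. The bin size, the output shape, and the name `query_preview` are assumptions; the paper's visualization may summarize the distribution differently.

```python
def query_preview(ranked_ids, seen_ids, bin_size=10):
    """Summarize a ranked result list as new vs. re-retrieved counts per rank bin.

    ranked_ids: document ids in rank order for the pending query.
    seen_ids: ids already retrieved by earlier queries in the session.
    """
    seen = set(seen_ids)
    preview = []
    for start in range(0, len(ranked_ids), bin_size):
        chunk = ranked_ids[start:start + bin_size]
        old = sum(doc in seen for doc in chunk)
        preview.append({
            "ranks": (start + 1, start + len(chunk)),  # 1-based, inclusive
            "new": len(chunk) - old,
            "re_retrieved": old,
        })
    return preview
```

A query whose top bins are dominated by re-retrieved documents signals that it adds little, which is the cue that could prompt people to reformulate before examining results.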
Publication Details
  • The International Symposium on Pervasive Displays
  • Jun 4, 2013

Abstract

Existing user interfaces for the configuration of large shared displays with multiple inputs and outputs usually do not allow users easy and direct configuration of the display's properties such as window arrangement or scaling. To address this problem, we are exploring a gesture-based technique for manipulating display windows on shared display systems. To aid target selection under noisy tracking conditions, we propose VoroPoint, a modified Voronoi tessellation approach that increases the selectable target area of the display windows. By maximizing the available target area, users can select and interact with display windows with greater ease and precision.
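The benefit of a Voronoi-style target area is that every pointer position selects some window, including positions in the gaps between windows. Here is a minimal sketch of the unmodified idea: selecting the window whose center is nearest to the pointer is equivalent to testing which Voronoi cell of the window centers contains the pointer. VoroPoint's modification of the tessellation is not reproduced; the function name and rectangle format are assumptions.

```python
import math


def select_window(pointer, windows):
    """Return the window whose center is nearest to the (noisy) pointer.

    windows: {name: (x, y, width, height)}. Nearest-center selection is
    the same as finding the Voronoi cell of the centers that holds the pointer.
    """
    def center(rect):
        x, y, w, h = rect
        return (x + w / 2, y + h / 2)

    return min(windows, key=lambda name: math.dist(pointer, center(windows[name])))
```

For example, with windows at `(0, 0, 100, 100)` and `(300, 0, 100, 100)`, a pointer at `(180, 50)` lies in empty space but is still closer to the left window's center, so jitter between targets resolves to a window instead of to nothing.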
Publication Details
  • Future Generation Computer Systems
  • May 28, 2013

Abstract


Collaboration technologies must support information sharing between collaborators, but must also take care not to share too much information or share information too widely. Systems that share information without requiring an explicit action by a user to initiate the sharing must be particularly cautious in this respect. Presence systems are an emerging class of applications that support collaboration. Through the use of pervasive sensors, these systems estimate user location, activities, and available communication channels. Because such presence data are sensitive, to achieve wide-spread adoption, sharing models must reflect the privacy and sharing preferences of their users. This paper looks at the role that privacy-preserving aggregation can play in addressing certain user sharing and privacy concerns with respect to presence data. We define conditions to achieve CollaPSE (Collaboration Presence Sharing Encryption) security, in which (i) an individual has full access to her own data, (ii) a third party performs computation on the data without learning anything about the data values, and (iii) people with special privileges called “analysts” can learn statistical information about groups of individuals, but nothing about the individual values contributing to the statistic other than what can be deduced from the statistic. More specifically, analysts can decrypt aggregates without being able to decrypt the individual values contributing to the aggregate. Based in part on studies we carried out that illustrate the need for the conditions encapsulated by CollaPSE security, we designed and implemented a family of CollaPSE protocols. We analyze their security, discuss efficiency tradeoffs, describe extensions, and review more recent privacy-preserving aggregation work.

Publication Details
  • CHI 2013
  • Apr 27, 2013

Abstract

Although longer queries can produce better results for information seeking tasks, people tend to type short queries. We created an interface designed to encourage people to type longer queries, and evaluated it in two Mechanical Turk experiments. Results suggest that our interface manipulation may be effective for eliciting longer queries.
Publication Details
  • IUI 2013
  • Mar 19, 2013

Abstract

People frequently capture photos with their smartphones, and some are starting to capture images of documents. However, the quality of captured document images is often lower than expected, even when applications that perform post-processing to improve the image are used. To improve the quality of captured images before post-processing, we developed a Smart Document Capture (SmartDCap) application that provides real-time feedback to users about the likely quality of a captured image. The quality measures capture the sharpness and framing of a page or regions on a page, such as a set of one or more columns, a part of a column, a figure, or a table. Using our approach, while users adjust the camera position, the application automatically determines when to take a picture of a document to produce a good quality result. We performed a subjective evaluation comparing SmartDCap and the Android Ice Cream Sandwich (ICS) camera application; we also used raters to evaluate the quality of the captured images. Our results indicate that users find SmartDCap to be as easy to use as the standard ICS camera application. Additionally, images captured using SmartDCap are sharper and better framed on average than images captured using the ICS camera application.

Abstract

Motivated by the addition of gyroscopes to a large number of new smartphones, we study the effects of combining accelerometer and gyroscope data on the recognition rate of motion gesture recognizers with dimensionality constraints. Using a large data set of motion gestures, we analyze results for the following algorithms: Protractor3D, Dynamic Time Warping (DTW) and Regularized Logistic Regression (LR). We chose to study these algorithms because they are relatively easy to implement, and thus well suited for rapid prototyping or early deployment. For use in our analysis, we contribute a method to extend Protractor3D to work with the 6D data obtained by combining accelerometer and gyroscope data. Our results show that combining accelerometer and gyroscope data is also beneficial for algorithms with dimensionality constraints and improves the gesture recognition rate on our data set by up to 4%.
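Of the three algorithms, DTW is the simplest to show concretely. This sketch fuses each accelerometer sample with the gyroscope sample taken at the same time into one 6D vector, then classifies a gesture by the template with the smallest DTW distance, using Euclidean distance between samples. It assumes both sensors are already sampled at the same rate and aligned in time; the function names and the nearest-template classifier are illustrative, not the paper's exact setup.

```python
import math


def fuse(accel, gyro):
    """Concatenate time-aligned 3D accelerometer and gyroscope samples into 6D."""
    return [tuple(a) + tuple(g) for a, g in zip(accel, gyro)]


def dtw(a, b):
    """Classic dynamic time warping distance between two sample sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the template sequence nearest to sample under DTW."""
    return min(templates, key=lambda label: dtw(sample, templates[label]))
```

DTW itself places no limit on the dimensionality of each sample, so moving from 3D to 6D only changes the per-sample distance; the cost is that every comparison grows with the product of the two sequence lengths.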