Publications

FXPAL publishes in top scientific conferences and journals.

2014
Publication Details
  • MobileHCI 2014 (Industrial Case Study)
  • Sep 23, 2014

Abstract

Close
Telepresence systems usually lack mobility. Polly, a wearable telepresence device, allows users to explore remote locations or experience events remotely by means of a person that serves as a mobile "guide". We built a series of hardware prototypes and our current, most promising embodiment consists of a smartphone mounted on a stabilized gimbal that is wearable. The gimbal enables remote control of the viewing angle as well as providing active image stabilization while the guide is walking. We present qualitative findings from a series of 8 field tests using either Polly or only a mobile phone. We found that guides felt more physical comfort when using Polly vs. a phone and that Polly was accepted by other persons at the remote location. Remote participants appreciated the stabilized video and ability to control camera view. Connection and bandwidth issues appear to be the most challenging issues for Polly-like systems.
Publication Details
  • MobileHCI 2014 (Full Paper)
  • Sep 23, 2014

Abstract

Close
Secure authentication with devices or services that store sensitive and personal information is highly important. However, traditional password and pin-based authentication methods compromise between the level of security and user experience. AirAuth is a biometric authentication technique that uses in-air gesture input to authenticate users. We evaluated our technique on a predefined (simple) gesture set and our classifier achieved an average accuracy of 96.6% in an equal error rate (EER-)based study. We obtained an accuracy of 100% when exclusively using personal (complex) user gestures. In a further user study, we found that AirAuth is highly resilient to video-based shoulder surfing attacks, with a mea- sured false acceptance rate of just 2.2%. Furthermore, a longitudinal study demonstrates AirAuth’s repeatability and accuracy over time. AirAuth is relatively simple, robust and requires only a low amount of computational power and is hence deployable on embedded or mobile hardware. Un- like traditional authentication methods, our system’s security is positively aligned with user-rated pleasure and excitement levels. In addition, AirAuth attained acceptability ratings in personal, office, and public spaces that are comparable to an existing stroke-based on-screen authentication technique. Based on the results presented in this paper, we believe that AirAuth shows great promise as a novel, secure, ubiquitous, and highly usable authentication method.
Publication Details
  • DocEng 2014
  • Sep 16, 2014

Abstract

Close
Distributed teams must co-ordinate a variety of tasks. To do so they need to be able to create, share, and annotate documents as well as discuss plans and goals. Many workflow tools support document sharing, while other tools support videoconferencing, however there exists little support for connecting the two. In this work we describe a system that allows users to share and markup content during web meetings. This shared content can provide important conversational props within the context of a meeting; it can also help users review archived meetings. Users can also extract shared content from meetings directly into other workflow tools.
Publication Details
  • Assistive Computer Vision and Robotics Workshop of ECCV
  • Sep 12, 2014

Abstract

Close
Polly is an inexpensive, portable telepresence device based on the metaphor of a parrot riding a guide's shoulder and acting as proxy for remote participants. Although remote users may be anyone with a desire for `tele-visits', we focus on limited mobility users. We present a series of prototypes and field tests that informed design iterations. Our current implementations utilize a smartphone on a stabilized, remotely controlled gimbal that can be hand held, placed on perches or carried by wearable frame. We describe findings from trials at campus, museum and faire tours with remote users, including quadriplegics. We found guides were more comfortable using Polly than a phone and that Polly was accepted by other people. Remote participants appreciated stabilized video and having control of the camera. One challenge is negotiation of movement and view control. Our tests suggests Polly is an effective alternative to telepresence robots, phones or fixed cameras.

Abstract

Close
In recent years, there has been an explosion of social and collaborative applications that leverage location to provide users novel and engaging experiences. Current location technologies work well outdoors but fare poorly indoors. In this paper we present LoCo, a new framework that can provide highly accurate room-level location using a supervised classification scheme. We provide experiments that show this technique is orders of magnitude more efficient than current state-of-the-art Wi- Fi localization techniques. Low classification overhead and computational footprint make classification practical and efficient even on mobile devices. Our framework has also been designed to be easily deployed and lever- aged by developers to help create a new wave of location- driven applications and services.
Publication Details
  • International Journal of Multimedia Information Retrieval Special Issue on Cross-Media Analysis
  • Sep 4, 2014

Abstract

Close
Media Embedded Target, or MET, is an iconic mark printed in a blank margin of a page that indicates a media link is associated with a nearby region of the page. It guides the user to capture the region and thus retrieve the associated link through visual search within indexed content. The target also serves to separate page regions with media links from other regions of the page. The capture application on the cell phone displays a sight having the same shape as the target near the edge of a camera-view display. The user moves the phone to align the sight with the target printed on the page. Once the system detects correct sight-target alignment, the region in the camera view is captured and sent to the recognition engine which identifies the image and causes the associated media to be displayed on the phone. Since target and sight alignment defines a capture region, this approach saves storage by only indexing visual features in the predefined capture region, rather than indexing the entire page. Target-sight alignment assures that the indexed region is fully captured. We compare the use of MET for guiding capture with two standard methods: one that uses a logo to indicate that media content is available and text to define the capture region and another that explicitly indicates the capture region using a visible boundary mark.

Abstract

Close
In this paper, we describe Gesture Viewport, a projector-camera system that enables finger gesture interactions with media content on any surface. We propose a novel and computationally very efficient finger localization method based on the detection of occlusion patterns inside a virtual sensor grid rendered in a layer on top of a viewport widget. We develop several robust interaction techniques to prevent unintentional gestures to occur, to provide visual feedback to a user, and to minimize the interference of the sensor grid with the media content. We show the effectiveness of the system through three scenarios: viewing photos, navigating Google Maps, and controlling Google Street View.
Publication Details
  • ACM SIGIR International Workshop on Social Media Retrieval and Analysis
  • Jul 11, 2014

Abstract

Close
We examine the use of clustering to identify selfies in a social media user's photos for use in estimating demographic information such as age, gender, and race. Faces are first detected within a user's photos followed by clustering using visual similarity. We define a cluster scoring scheme that uses a combination of within-cluster visual similarity and average face size in a cluster to rank potential selfie-clusters. Finally, we evaluate this ranking approach over a collection of Twitter users and discuss methods that can be used for improving performance in the future.
Publication Details
  • SIGIR 2014
  • Jul 6, 2014
  • pp. pp.495-504

Abstract

Close
People often use more than one query when searching for information. They revisit search results to re-find information and build an understanding of their search need through iterative explorations of query formulation. These tasks are not well-supported by search interfaces and web browsers. We designed and built SearchPanel, a Chrome browser extension that helps people manage their ongoing information seeking. This extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, navigation, and re-finding documents. In a real-world deployment spanning over two months, results show that SearchPanel appears to have been primarily used for complex information needs, in search sessions with long durations and high numbers of queries. The process metadata features in SearchPanel seem to be of particular importance when working on complex information needs.

Supporting media bricoleurs

Publication Details
  • ACM interactions
  • Jul 1, 2014

Abstract

Close
Online video is incredibly rich. A 15-minute home improvement YouTube tutorial might include 1500 words of narration, 100 or more significant keyframes showing a visual change from multiple perspectives, several animated objects, references to other examples, a tool list, comments from viewers and a host of other metadata. Furthermore, video accounts for 90% of worldwide Internet traffic. However, it is our observation that video is not widely seen as a full-fledged document; dismissed as a media that, at worst, gilds over substance and, at best, simply augments text-based communications. In this piece, we suggest that negative attitudes toward multimedia documents that include audio and video are largely unfounded and arise mostly because we lack the necessary tools to treat video content as first-order media or to support seamlessly mixing media.
Publication Details
  • ACM TVX 2014
  • Jun 25, 2014

Abstract

Close
Creating compelling multimedia content is a difficult task. It involves not only the creative process of developing a compelling media-based story, but it also requires significant technical support for content editing, management and distribution. This has been true for printed, audio and visual presentations for centuries. It is certainly true for broadcast media such as radio and television. The talk will survey several approaches to describe and manage media interactions. We will focus on the temporal modeling of context-sensitive personalized interactions of complex collections of independent media objects. Using the concepts of ‘togetherness’ being employed in the EU’s FP-7 project TA2: Together Anywhere, Together Anytime, we will follow the process of media capture, profiling, composition, sharing and end-user manipulation. We will consider the promise of using automated tools and contrast this with the reality of letting real users manipulation presentation semantics in real time. The talk will not present a closed form solution, but will present a series of topics and problems that can stimulate the development of a new generation of systems to stimulate social media interaction.
Publication Details
  • ICWSM (The 8th International AAAI Conference on Weblogs and Social Media)
  • Jun 1, 2014

Abstract

Close
A topic-independent sentiment model is commonly used to estimate sentiment in microblogs. But for movie and product reviews, domain adaptation has been shown to improve sentiment estimation performance. We investigated the utility of topic-dependent polarity estimation models for microblogs. We examined both a model trained on Twitter tweets containing a target keyword and a model trained on an enlarged set of tweets containing terms related to a topic. Comparing the performance of the topic-dependent models to a topic-independent model trained on a general sample of tweets, we noted that for some topics, topic-dependent models performed better. We then propose a method for predicting which topics are likely to have better sentiment estimation performance when a topic-dependent sentiment model is used.
Publication Details
  • CHI 2014 (Interactivity)
  • Apr 26, 2014

Abstract

Close
AirAuth is a biometric authentication technique that uses in-air hand gestures to authenticate users tracked through a short-range depth sensor. Our method tracks multiple distinct points on the user's hand simultaneously that act as a biometric to further enhance security. We describe the details of our mobile demonstrator that will give Interactivity attendees an opportunity to enroll and verify our system's authentication method. We also wish to encourage users to design their own gestures for use with the system. Apart from engaging with the CHI community, a demonstration of AirAuth would also yield useful gesture data input by the attendees which we intend to use to further improve the prototype and, more importantly, make available publicly as a resource for further research into gesture-based user interfaces.
Publication Details
  • CHI Extended Abstracts 2014
  • Apr 26, 2014

Abstract

Close
AirAuth is a biometric, gesture-based authentication system based on in-air gesture input. We describe the operations necessary to sample enrollment gestures and to perform matching for authentication, using data from a short range depth sensor. We present the results of two initial user studies. A first study was conducted to crowd source a simple gesture set for use in further evaluations. The results of our second study indicate that AirAuth achieves a very high Equal Error Rate (EER-)based accuracy of 96.6 % for simple gesture set and 100 % for user-specific gestures. Future work will encompass the evaluation of possible attack scenarios and obtaining qualitative user feedback on usability advantages of gesture-based authentication.
Publication Details
  • ACM ICMR 2014
  • Apr 1, 2014

Abstract

Close
Motivated by scalable partial-duplicate visual search, there has been growing interest on a wealth of compact and efficient binary feature descriptors (e.g. ORB, FREAK, BRISK). Typically, binary descriptors are clustered into codewords and quantized with Hamming distance, which follows conventional bag-of-words strategy. However, such codewords formulated in Hamming space did not present obvious indexing and search performance improvement as compared to the Euclidean ones. In this paper, without explicit codeword construction, we explore to utilize binary descriptors as direct codebook indices (addresses). We propose a novel approach to build multiple index tables which parallelly check the collision of same hash values. The evaluation is performed on two public image datasets: DupImage and Holidays. The experimental results demonstrate the index efficiency and retrieval accuracy of our approach.
Publication Details
  • HotMobile 2014
  • Feb 26, 2014

Abstract

Close
In this paper, we propose HiFi system which enables users to interact with surrounding physical objects. It uses coded light to encode position in an environment. By attaching a tiny light sensor on a user’s mobile device, the user can attach digital info to arbitrary static physical objects or retrieve/modify them anchored to these objects. With this system, a family member may attach a digital maintenance schedule to a fish tank or indoor plants, etc. In a store, a store manager may use such system to attach price tag, discount info and multimedia contents to any products and customers can get the attached info by moving their phone close to the focused product. Similarly, a museum can use this system to provide extra info of displayed items to visitors. Different from computer vision based systems, HiFi does not have requests on texture, bright illumination, etc. Different from regular barcode approaches, HiFi does not require extra physical attachments that may change an object’s native appearance. HiFi has much higher spatial resolution for distinguishing close objects or attached parts of the same object. As HiFi system can track a mobile device at 80 positions per second, it also has much faster response than any above listed system.
Publication Details
  • Fuji Xerox Technical Report, No. 23, 2014, pp. 34-42
  • Feb 20, 2014

Abstract

Close
Video content creators invest enormous effort creating work that is in turn typically viewed passively. However, learning tasks using video requires users not only to consume the content but also to engage, interact with, and repurpose it. Furthermore, to promote learning with video in domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. A literature review coupled with formative field studies led to a system design that can incorporate a broad set of video-creation and interaction styles.
2013
Publication Details
  • IEEE ISM 2013
  • Dec 9, 2013

Abstract

Close
Real-time tele-immersion requires low latency, synchronized multi-camera capture. Prior high definition (HD) capture systems were bulky. We in vestigate the suitability of using flocks of smartphone cameras for tele-immersion. Smartphones can potentially integrate HD capture and streaming into a single portable package. However, they are designed for archiving the captured video into a movie. Hence, we create a sequence of H.264 movies and stream them. We lower the capture delay by reducing the number of frames in each movie segment. Increasing the number of movie segments adds compression overhead. Smartphone video encoders do not sacrifice video quality to lower the compression latency or the stream size. On an iPhone 4S, our application that uses published APIs streams 1920x1080 videos at 16.5 fps with a delay of 712 msec between a real-life event and displaying an uncompressed bitmap of this event on a local laptop. For comparison, the bulky Cisco Tandberg required 300 msec delay. Stereoscopic video from two unsynchronized smartphones showed minimal visual artifacts in an indoor teleconference setting.
Publication Details
  • Education and Information Technologies journal
  • Oct 11, 2013

Abstract

Close
Video tends to be imbalanced as a medium. Typically, content creators invest enormous effort creating work that is then watched passively. However, learning tasks require that users not only consume video but also engage, interact with, and repurpose content. Furthermore, to promote learning across domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. Specifically, we describe a needfinding study involving interviews with amateur video creators as well as our experience with an early prototype to support expository capture and access. Our findings led to a system redesign that can incorporate a broad set of video-creation and interaction styles.
Publication Details
  • Interactive Tabletops and Surfaces (ITS) 2013
  • Oct 6, 2013

Abstract

Close
The expressiveness of touch input can be increased by detecting additional finger pose information at the point of touch such as finger rotation and tilt. PointPose is a prototype that performs finger pose estimation at the location of touch using a short-range depth sensor viewing the touch screen of a mobile device. We present an algorithm that extracts finger rotation and tilt from a point cloud generated by a depth sensor oriented towards the device's touchscreen. The results of two user studies we conducted show that finger pose information can be extracted reliably using our proposed method. We show this for controlling rotation and tilt axes separately and also for combined input tasks using both axes. With the exception of the depth sensor, which is mounted directly on the mobile device, our approach does not require complex external tracking hardware, and, furthermore, external computation is unnecessary as the finger pose extraction algorithm can run directly on the mobile device. This makes PointPose ideal for prototyping and developing novel mobile user interfaces that use finger pose estimation.
Publication Details
  • ACM Trans. On Multimedia Computing, Communications and Applications (TOMCCAP)
  • Oct 1, 2013

Abstract

Close
A panel at ACM Multimedia 2012 addressed research successes in the past 20 years. While the panel focused on the past, this article discusses successes since the ACM SIGMM 2003 Retreat and suggests research directions in the next ten years. While significant progress has been made, more research is required to allow multimedia to impact our everyday computing environment. The importance of hardware changes on future research directions is discussed. We believe ubiquitous computing—meaning abundant computation and network bandwidth—should be applied in novel ways to solve multimedia grand challenges and continue the IT revolution of the past century.
Publication Details
  • DocEng 2013
  • Sep 10, 2013

Abstract

Close
Unlike text, copying and pasting parts of video documents is challenging. Yet, the huge amount of video documents now available in the form of how-to tutorials begs for simpler techniques that allow users to easily copy and paste fragments of video materials into new documents. We describe new direct video manipulation techniques that allow users to quickly copy and paste content from video documents such as how-to tutorials into a new document. While the video plays, users interact with the video canvas to select text regions, scrollable regions, slide sequences built up across many frames, or semantically meaningful regions such as dialog boxes. Instead of relying on the timeline to accurately select sub-parts of the video document, users navigate using familiar selection techniques such as mouse-wheel to scroll back and forward over a video region where content scrolls, double-clicks over rectangular regions to select them, or clicks and drags over textual regions of the video canvas to select them. We describe the video processing techniques that run in real-time in modern web browsers using HTML5 and JavaScript; and show how they help users quickly copy and paste video fragments into new documents, allowing them to efficiently reuse video documents for authoring or note-taking.
Publication Details
  • CBDAR 2013
  • Aug 23, 2013

Abstract

Close
Capturing book images is more convenient with a mobile phone camera than with more specialized flat-bed scanners or 3D capture devices. We built an application for the iPhone 4S that captures a sequence of hi-res (8 MP) images of a page spread as the user sweeps the device across the book. To do the 3D dewarping, we implemented two algorithms: optical flow (OF) and structure from motion (SfM). Making further use of the image sequence, we examined the potential of multi-frame OCR. Preliminary evaluation on a small set of data shows that OF and SfM had comparable OCR performance for both single-frame and multi-frame techniques, and that multi-frame was substantially better than single-frame. The computation time was much less for OF than for SfM.
Publication Details
  • EuroHCIR 2013
  • Aug 1, 2013

Abstract

Close
People often use more than one query when searching for information; they also revisit search results to re-find information. These tasks are not well-supported by search interfaces and web browsers. We designed and built a Chrome browser extension that helps people manage their ongoing information seeking. The extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, for navigation, and for re-finding documents.
Publication Details
  • SIGIR 2013
  • Jul 28, 2013

Abstract

Close
Exploratory search is a complex, iterative information seeking activity that involves running multiple queries, finding and examining many documents. We introduced a query preview interface that visualizes the distribution of newly-retrieved and re-retrieved documents prior to showing the detailed query results. When evaluating the preview control with a control condition, we found effects on both people’s information seeking behavior and improved retrieval performance. People spent more time formulating a query and were more likely to explore search results more deeply, retrieved a more diverse set of documents, and found more different relevant documents when using the preview. With more time spent on query formulation, higher quality queries were produced and as consequence the retrieval results improved; both average residual precision and recall was higher with the query preview present.