Publications

By Qiong Liu

2017
Publication Details
  • IEEE PerCom 2017
  • Mar 13, 2017

Abstract

We present Lift, a visible light-enabled finger tracking and object localization technique that allows users to perform freestyle multi-touch gestures on any object’s surface in an everyday environment. By projecting encoded visible patterns onto an object’s surface (e.g. paper, display, or table), and localizing the user’s fingers with light sensors, Lift offers users a richer interactive space than the device’s existing interfaces. Additionally, everyday objects can be augmented by attaching sensor units onto their surfaces to accept multi-touch gesture input. We also present two applications as a proof of concept. Finally, results from our experiments indicate that Lift can localize ten fingers simultaneously with an accuracy of 0.9 mm and 1.8 mm on the two axes respectively, at an average refresh rate of 84 Hz, with 16.7 ms delay over WiFi and 12 ms delay over serial, making gesture recognition on non-instrumented objects possible.
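The position-decoding step described above can be illustrated with a brief sketch: a light sensor takes one reading per projected pattern frame, the readings are thresholded into bits, and the Gray-coded bit strings are converted into x/y coordinates. This is a minimal sketch under assumed encoding details (one Gray-code bit per frame, a fixed threshold); all names are illustrative rather than taken from the paper.

    def gray_to_binary(gray: int, num_bits: int) -> int:
        """Convert a Gray-coded integer to plain binary (prefix XOR)."""
        binary, shift = gray, 1
        while shift < num_bits:
            binary ^= binary >> shift
            shift <<= 1
        return binary

    def decode_position(samples, bits_x, bits_y, threshold=0.5):
        """Threshold one light sample per pattern frame, then split the
        bit string into Gray-coded x and y pixel IDs."""
        bits = [1 if s > threshold else 0 for s in samples]
        assert len(bits) == bits_x + bits_y
        gray_x = int("".join(map(str, bits[:bits_x])), 2)
        gray_y = int("".join(map(str, bits[bits_x:])), 2)
        return gray_to_binary(gray_x, bits_x), gray_to_binary(gray_y, bits_y)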
2016
Publication Details
  • Encyclopedia with Semantic Computing
  • Oct 31, 2016

Abstract

Improvements in sensors and wireless networks enable accurate, automated, and instant determination and dissemination of a user's or object's position. Apart from the current ubiquitous networking infrastructure, the new enabler of location-based services (LBSs) is the enrichment of the different systems with semantic information, such as time, location, individual capability, preference, and more. Such semantically enriched system modeling aims at developing applications with enhanced functionality and advanced reasoning capabilities. These systems are able to deliver more personalized services to users by combining domain knowledge with advanced reasoning mechanisms, and provide solutions to problems that were otherwise infeasible. This approach also takes users' preferences and place properties into consideration, which can be utilized to achieve a comprehensive range of personalized services, such as advertising, recommendations, or polling. This paper provides an overview of indoor localization technologies, popular models for extracting semantics from location data, approaches for associating semantic information and location data, and applications that may be enabled with location semantics. To make the presentation easy to understand, we will use a museum scenario to explain pros and cons of different technologies and models. More specifically, we will first explore users' needs in a museum scenario. Based on these needs, we will then discuss advantages and disadvantages of using different localization technologies to meet these needs. From these discussions, we can highlight gaps between real application requirements and existing technologies, and point out promising localization research directions. By identifying gaps between various models and real application requirements, we can draw a road map for future location semantics research.
Publication Details
  • 3rd IEEE International Workshop on Mobile Multimedia Computing (MMC)
  • Jul 11, 2016

Abstract

Mobile Audio Commander (MAC) is a mobile phone-based multimedia sensing system that facilitates the introduction of extra sensors to existing mobile robots for advanced capabilities. In this paper, we use MAC to introduce an accurate indoor positioning sensor to a robot to facilitate its indoor navigation. More specifically, we use a projector to send out a position ID through a light signal, use a light sensor and the audio channel on a mobile phone to decode the position ID, and send navigation commands to a target robot through audio output. With this setup, our system can simplify robot navigation for users. Users can define a robot navigation path on a phone, and our system will compare the navigation path with its accurate location sensor inputs and generate an analog line-following signal, a collision-avoidance signal, and an analog angular signal to adjust the robot’s straight movements and turns. This paper describes two examples of using MAC and a positioning system to enable complicated robot navigation with proper user interface design, external circuit design, and real sensor installation on existing robots.
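How a correction signal might be derived from comparing the planned path with live position fixes can be sketched as a simple proportional controller. The gains, names, and the mapping to an audio output are assumptions for illustration, not the paper's actual design.

    import math

    def steering_signal(pos, heading, waypoint, k_heading=1.0, k_cross=0.05):
        """Signed correction in [-1, 1]: negative steers left, positive
        steers right; near zero keeps the robot on the planned line."""
        dx, dy = waypoint[0] - pos[0], waypoint[1] - pos[1]
        desired = math.atan2(dy, dx)
        # Wrap the heading error into (-pi, pi].
        err = (desired - heading + math.pi) % (2 * math.pi) - math.pi
        cross = math.hypot(dx, dy) * math.sin(err)  # lateral offset proxy
        raw = k_heading * err + k_cross * cross
        return max(-1.0, min(1.0, raw))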
Publication Details
  • IEEE MultiMedia Magazine
  • May 2, 2016

Abstract

Silicon Valley is home to many of the world’s largest technology corporations, as well as thousands of small startups. Despite the development of other high-tech economic centers throughout the US and around the world, Silicon Valley continues to be a leading hub for high-tech innovation and development, in part because most of its companies and universities are within 20 miles of each other. Given the high concentration of multimedia researchers in Silicon Valley, and the high demand for information exchange, I was able to work with a team of researchers from various companies and organizations to start the Bay Area Multimedia Forum (BAMMF) series back in November 2013.
2015
Publication Details
  • ISM 2015
  • Dec 14, 2015

Abstract

Indoor localization is challenging in terms of both accuracy and possible usage scenarios. In this paper, we introduce the design and implementation of a toy car localization and navigation system, which demonstrates that a projected-light-based localization technique allows multiple devices to know and exchange their fine-grained location information in an indoor environment. The projected light consists of a sequence of Gray code images which assigns each pixel in the projection area a unique Gray code so as to distinguish its coordinates. The light sensors installed on the toy car and the potential “passenger” receive the projected light stream, based on which their locations are computed. The toy car then utilizes the A* algorithm to plan a route based on its own location and orientation, the target’s location, and the map of available “roads”. The fast speed of localization enables the toy car to adjust its own orientation while “driving” and keep itself on the “roads”. The toy car system demonstrates that the localization technique can power other applications that require fine-grained location information of multiple objects simultaneously.
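The route-planning step is standard A* over the map of "road" cells; a compact sketch under assumed map and cost conventions (4-connected grid, unit step cost) follows. Names are illustrative.

    import heapq

    def astar(roads, start, goal):
        """roads: set of traversable (x, y) cells. Returns a cell path
        from start to goal, or None if the goal is unreachable."""
        def h(a, b):  # Manhattan-distance heuristic
            return abs(a[0] - b[0]) + abs(a[1] - b[1])
        frontier = [(h(start, goal), 0, start, [start])]
        visited = set()
        while frontier:
            _, cost, cell, path = heapq.heappop(frontier)
            if cell == goal:
                return path
            if cell in visited:
                continue
            visited.add(cell)
            x, y = cell
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt in roads and nxt not in visited:
                    heapq.heappush(frontier, (cost + 1 + h(nxt, goal),
                                              cost + 1, nxt, path + [nxt]))
        return None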

Abstract

New technology comes about in a number of different ways. It may come from advances in scientific research, through new combinations of existing technology, or simply from imagining what might be possible in the future. This video describes the evolution of Tabletop Telepresence, a system for remote collaboration through desktop videoconferencing combined with a digital desk. Tabletop Telepresence provides a means to share paper documents between remote desktops, interact with documents and request services (such as translation), and communicate with a remote person through a teleconference. It was made possible by combining advances in camera/projector technology that enable a fully functional digital desk, embodied telepresence in video conferencing, and concept art that imagines future workstyles.
Publication Details
  • International Journal of Semantic Computing
  • Sep 15, 2015

Abstract

A localization system is a coordinate system for describing the world, organizing the world, and controlling the world. Without a coordinate system, we cannot specify the world in mathematical forms; we cannot regulate processes that may involve spatial collisions; we cannot even automate a robot for physical actions. This paper provides an overview of indoor localization technologies, popular models for extracting semantics from location data, approaches for associating semantic information and location data, and applications that may be enabled with location semantics. To make the presentation easy to understand, we will use a museum scenario to explain pros and cons of different technologies and models. More specifically, we will first explore users' needs in a museum scenario. Based on these needs, we will then discuss advantages and disadvantages of using different localization technologies to meet these needs. From these discussions, we can highlight gaps between real application requirements and existing technologies, and point out promising localization research directions. Similarly, we will also discuss context information required by different applications and explore models and ontologies for connecting users, objects, and environment factors with semantics. By identifying gaps between various models and real application requirements, we can draw a road map for future location semantics research.

POLI: Mobile AR by Hearing Position from Light

Publication Details
  • ICME 2015 Mobile Multimedia Workshop
  • Jun 29, 2015

Abstract

Connecting digital information to physical objects can enrich their content and make them more vivid. Traditional augmented reality techniques reach this goal by augmenting physical objects or their surroundings with various markers, and typically require end users to wear additional devices to explore the augmented content. In this paper, we propose POLI, which allows a system administrator to author digital content with his/her mobile device while allowing end users to explore the authored content with their own mobile devices. POLI provides three novel interactive approaches for authoring digital content. It does not change the natural appearance of physical objects and does not require users to wear any additional hardware on their bodies.
2014
Publication Details
  • International Journal of Multimedia Information Retrieval Special Issue on Cross-Media Analysis
  • Sep 4, 2014

Abstract

Media Embedded Target, or MET, is an iconic mark printed in a blank margin of a page that indicates a media link is associated with a nearby region of the page. It guides the user to capture the region and thus retrieve the associated link through visual search within indexed content. The target also serves to separate page regions with media links from other regions of the page. The capture application on the cell phone displays a sight having the same shape as the target near the edge of a camera-view display. The user moves the phone to align the sight with the target printed on the page. Once the system detects correct sight-target alignment, the region in the camera view is captured and sent to the recognition engine which identifies the image and causes the associated media to be displayed on the phone. Since target and sight alignment defines a capture region, this approach saves storage by only indexing visual features in the predefined capture region, rather than indexing the entire page. Target-sight alignment assures that the indexed region is fully captured. We compare the use of MET for guiding capture with two standard methods: one that uses a logo to indicate that media content is available and text to define the capture region and another that explicitly indicates the capture region using a visible boundary mark.
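The sight-target alignment test that triggers capture can be sketched as an overlap check between the target detected in the camera view and the fixed on-screen sight. The intersection-over-union metric and its threshold are illustrative assumptions, not details from the paper.

    def iou(a, b):
        """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    def capture_ready(detected_target_box, sight_box, threshold=0.8):
        """Fire the capture once the printed target lines up with the sight."""
        return iou(detected_target_box, sight_box) >= threshold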
Publication Details
  • ICME 2014, Best Demo Award
  • Jul 14, 2014

Abstract

In this paper, we describe Gesture Viewport, a projector-camera system that enables finger gesture interactions with media content on any surface. We propose a novel and computationally very efficient finger localization method based on the detection of occlusion patterns inside a virtual sensor grid rendered in a layer on top of a viewport widget. We develop several robust interaction techniques to prevent unintentional gestures from occurring, to provide visual feedback to the user, and to minimize the interference of the sensor grid with the media content. We show the effectiveness of the system through three scenarios: viewing photos, navigating Google Maps, and controlling Google Street View.
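The occlusion-pattern idea can be sketched as follows: the system knows where the virtual sensor dots are rendered and what color each should have, so any dot whose observed color deviates strongly is likely covered by a finger. Sampling one pixel per dot and using a fixed color tolerance are simplifying assumptions for illustration.

    import numpy as np

    def occluded_cells(camera_img, grid_points, expected_colors, tol=40.0):
        """Return indices of sensor-grid dots whose observed color deviates
        from the rendered color, i.e. dots likely occluded by a finger."""
        hits = []
        for i, ((x, y), expected) in enumerate(zip(grid_points, expected_colors)):
            observed = camera_img[y, x].astype(float)
            if np.linalg.norm(observed - np.asarray(expected, float)) > tol:
                hits.append(i)
        return hits

    def fingertip_estimate(grid_points, hit_indices):
        """Crude fingertip position: centroid of the occluded dots."""
        if not hit_indices:
            return None
        pts = np.array([grid_points[i] for i in hit_indices], float)
        return tuple(pts.mean(axis=0))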
Publication Details
  • ACM ICMR 2014
  • Apr 1, 2014

Abstract

Motivated by scalable partial-duplicate visual search, there has been growing interest in a wealth of compact and efficient binary feature descriptors (e.g. ORB, FREAK, BRISK). Typically, binary descriptors are clustered into codewords and quantized with Hamming distance, following the conventional bag-of-words strategy. However, such codewords formulated in Hamming space have not shown obvious indexing and search performance improvements over their Euclidean counterparts. In this paper, without explicit codeword construction, we explore utilizing binary descriptors as direct codebook indices (addresses). We propose a novel approach that builds multiple index tables which check collisions of identical hash values in parallel. The evaluation is performed on two public image datasets: DupImage and Holidays. The experimental results demonstrate the index efficiency and retrieval accuracy of our approach.
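The direct-indexing idea can be sketched with a small multi-table scheme: each binary descriptor is split into fixed substrings, each substring addresses its own table, and candidates that collide in any table are verified with a full Hamming-distance check. The chunk sizes and threshold below are assumptions for illustration, not the paper's exact parameters.

    from collections import defaultdict

    CHUNKS, BITS = 4, 256        # e.g. a 256-bit ORB descriptor in 4 chunks

    def substrings(desc: int):
        width = BITS // CHUNKS
        mask = (1 << width) - 1
        return [(desc >> (i * width)) & mask for i in range(CHUNKS)]

    tables = [defaultdict(list) for _ in range(CHUNKS)]

    def index(desc: int, image_id: str):
        for table, key in zip(tables, substrings(desc)):
            table[key].append((desc, image_id))

    def query(desc: int, max_dist: int = 16):
        hits = set()
        for table, key in zip(tables, substrings(desc)):
            for cand, image_id in table[key]:
                if bin(desc ^ cand).count("1") <= max_dist:  # Hamming check
                    hits.add(image_id)
        return hits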
Publication Details
  • HotMobile 2014
  • Feb 26, 2014

Abstract

In this paper, we propose HiFi, a system which enables users to interact with surrounding physical objects. It uses coded light to encode position in an environment. By attaching a tiny light sensor to a user’s mobile device, the user can attach digital information to arbitrary static physical objects, or retrieve and modify the information anchored to these objects. With this system, a family member may attach a digital maintenance schedule to a fish tank, indoor plants, etc. In a store, a manager may attach price tags, discount information, and multimedia content to any product, and customers can retrieve the attached information by moving their phones close to the product of interest. Similarly, a museum can use this system to provide extra information about displayed items to visitors. Unlike computer vision based systems, HiFi does not impose requirements on texture, bright illumination, etc. Unlike regular barcode approaches, HiFi does not require extra physical attachments that would change an object’s native appearance. HiFi has much higher spatial resolution for distinguishing close objects or attached parts of the same object. As HiFi can track a mobile device at 80 positions per second, it also responds much faster than any of the systems listed above.
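The anchoring behavior these scenarios describe can be illustrated with a small position-keyed store: information is attached at decoded coded-light coordinates, and a lookup returns whatever is anchored near the sensor's current fix. The store, radius, and payloads are hypothetical, not HiFi's actual data model.

    import math

    class AnchorStore:
        def __init__(self, radius=5.0):       # radius in pattern pixels
            self.radius = radius
            self.anchors = []                 # [((x, y), payload), ...]

        def attach(self, position, payload):
            self.anchors.append((position, payload))

        def retrieve(self, position):
            x, y = position
            return [p for (ax, ay), p in self.anchors
                    if math.hypot(ax - x, ay - y) <= self.radius]

    store = AnchorStore()
    store.attach((120, 340), "Fish tank: clean filter on Friday")
    print(store.retrieve((122, 338)))         # a nearby fix finds the note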
2012
Publication Details
  • ACM Multimedia 2012
  • Oct 29, 2012

Abstract

Paper and computers have complementary advantages and are used side by side in many scenarios. Interactive paper systems aim to combine the two media. However, most such systems only allow fingers and pens to interact with content on paper. This finger-and-pen-only input suffers from low precision, lag, instability, and occlusion. Moreover, it incurs frequent device switching (e.g. pen vs. mouse) in users' hands during cross-media interactions, yielding inefficiency and interruptions of a document workspace continuum. To address these limitations, we propose MixPad, a novel interactive paper system which incorporates mice and keyboards to enhance conventional pen- and finger-based paper interaction. Similar to many other systems, MixPad adopts a mobile camera-projector unit to recognize paper documents, detect pen and finger gestures, and provide visual feedback. Unlike these systems, MixPad allows users to use mice and keyboards to select fine-grained content and create annotations on paper, and facilitates bimanual operations for more efficient and smoother cross-media interaction. This novel interaction style combines the advantages of mice, keyboards, pens, and fingers, enabling richer digital functions on paper.
Publication Details
  • CHI 2012
  • May 5, 2012

Abstract

Pico projectors have lately been investigated as mobile display and interaction devices. We propose to use them as ‘light beams’: everyday objects sojourning in a beam are turned into dedicated projection surfaces and tangible interaction devices. While this has been explored for large projectors, the affordances of pico projectors are fundamentally different: they have a very small and strictly limited projection ray and can be carried around in a nomadic way during the day. Thus it is unclear how this could actually be leveraged for tangible interaction with physical, real-world objects. We have investigated this in an exploratory field study and contribute the results. Based upon these, we present exemplary interaction techniques and early user feedback.
2011
Publication Details
  • The 10th International Conference on Virtual Reality Continuum and Its Applications in Industry
  • Dec 11, 2011

Abstract

Augmented Paper (AP) is an important area of Augmented Reality (AR). Many AP systems rely on visual features for paper document identification. Although promising, these systems can hardly support large sets of documents (i.e. one million documents) because of the high memory and time cost in handling high-dimensional features. On the other hand, general large-scale image identification techniques are not well customized to AP, costing unnecessarily more resources to achieve the identification accuracy required by AP. To address this mismatch between AP and image identification techniques, we propose a novel large-scale image identification technique well geared to AP. At its core is a geometric verification scheme based on Minimum visual-word Correspondence Sets (MICS). A MICS is a set of visual word (i.e. quantized visual feature) correspondences, each of which contains a minimum number of correspondences sufficient for deriving a transformation hypothesis between a captured document image and an indexed image. Our method selects appropriate MICSs to vote in a Hough space of transformation parameters, and uses a robust dense region detection algorithm to locate the possible transformation models in the space. The models are then utilized to verify all the visual word correspondences to precisely identify the matching indexed image. By taking advantage of unique geometric constraints in AP, our method can significantly reduce the time and memory cost while achieving high accuracy. As shown in evaluations with two AP systems called FACT and EMM, over a dataset with 1M+ images, our method achieves 100% identification accuracy and 0.67% registration error for FACT; for EMM, our method outperforms the state-of-the-art image identification approach by achieving a 4% improvement in detection rate and almost perfect precision, while saving 40% and 70% in memory and time cost respectively.
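The voting step can be sketched as follows: each minimal correspondence set yields a transformation hypothesis, hypotheses vote in a quantized parameter space, and dense bins indicate plausible page-to-capture transforms. A two-point similarity model (scale and rotation only) and the bin sizes below are simplifying assumptions; the paper's transformation model may differ.

    import math
    from collections import Counter

    def hypothesis(p1, p2, q1, q2):
        """Similarity transform (scale, rotation) from two correspondences
        p1->q1 and p2->q2, page coordinates to captured-image coordinates."""
        dp = (p2[0] - p1[0], p2[1] - p1[1])
        dq = (q2[0] - q1[0], q2[1] - q1[1])
        scale = math.hypot(*dq) / max(math.hypot(*dp), 1e-9)
        rot = math.atan2(dq[1], dq[0]) - math.atan2(dp[1], dp[0])
        return scale, rot

    def vote(minimal_sets, scale_bin=0.1, rot_bin=math.radians(5)):
        """minimal_sets: iterable of ((p1, q1), (p2, q2)) correspondence pairs."""
        acc = Counter()
        for (p1, q1), (p2, q2) in minimal_sets:
            s, r = hypothesis(p1, p2, q1, q2)
            acc[(round(s / scale_bin), round(r / rot_bin))] += 1
        return acc.most_common(3)             # densest bins = candidate models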

PaperUI

Publication Details
  • Springer LNCS
  • Dec 1, 2011

Abstract

PaperUI is a human-information interface concept that advocates using paper as displays and using mobile devices, such as camera phones or camera pens, as traditional computer mice. When emphasizing technical efforts, some researchers refer to the underlying work related to PaperUI as interactive paper systems. We prefer the term PaperUI for emphasizing the final goal, narrowing the discussion focus, and avoiding terminology confusion between an interactive paper system and an interactive paper computer [40]. PaperUI combines the merits of paper and mobile devices, in that users can comfortably read and flexibly arrange document content on paper, and access digital functions related to the document via the mobile computing devices. This concept aims at novel interface technology to seamlessly bridge the gap between paper and computers for a better user experience in handling documents. Compared with traditional laptops and tablet PCs, the devices involved in the PaperUI concept are more lightweight, compact, energy efficient, and widely adopted. Therefore, we believe this interface vision can make computation more convenient to access for the general public.
Publication Details
  • ACM Multimedia 2011
  • Nov 28, 2011

Abstract

Embedded Media Markers (EMMs) are nearly transparent icons printed on paper documents that link to associated digital media. By using the document content for retrieval, EMMs are less visually intrusive than barcodes and other glyphs while still providing an indication for the presence of links. An initial implementation demonstrated good overall performance but exposed difficulties in guaranteeing the creation of unambiguous EMMs. We developed an EMM authoring tool that supports the interactive authoring of EMMs via visualizations that show the user which areas on a page may cause recognition errors and automatic feedback that moves the authored EMM away from those areas. The authoring tool and the techniques it relies on have been applied to corpora with different visual characteristics to explore the generality of our approach.

PaperUI

Publication Details
  • CBDAR 2011
  • Sep 18, 2011

Abstract

PaperUI is a human-computer interface concept that treats paper as displays that users can interact with via mobile devices such as mobile phones and projectors. It combines the merits of paper and mobile devices. Compared with traditional laptops and tablet PCs, the devices involved in this concept are more lightweight, compact, energy efficient, and widely adopted. Therefore, we believe this interface vision can make computation more convenient to access for the general public. With our implemented prototype, pilot users can read documents easily and comfortably on paper, and access many digital functions related to the document via a camera phone or a mobile projector.

Invited talk: http://imlab.jp/cbdar2011/#keynote

Abstract

This demo shows an interactive paper system called MixPad, which features using mice and keyboards to enhance the conventional pen-finger-gesture based interaction with paper documents. Similar to many interactive paper systems, MixPad adopts a mobile camera-projector unit to recognize paper documents, detect pen and finger gestures and provide visual feedback. Unlike these systems, MixPad allows using mice and keyboards to help users interact with fine-grained document content on paper (e.g. individual words and user-defined arbitrary regions), and to facilitate cross-media operations. For instance, to copy a document segment from paper to a laptop, one first points a finger of her non-dominant hand to the segment roughly, and then uses a mouse in her dominant hand to refine the selection and drag it to the laptop; she can also type text as a detailed comment on a paper document. This novel interaction paradigm combines the advantages of mice, keyboards, pens and fingers, and therefore enables rich digital functions on paper.
Publication Details
  • CHI 2011 Workshop on Mobile and Personal Projection (MP2)
  • May 8, 2011

Abstract

The field of personal mobile projection is advancing quickly, and a variety of work focuses on enhancing physical objects in the real world with dynamically projected digital artifacts. Due to technological restrictions, none of this work has yet investigated what we feel is the most promising research direction: the (bi-manual) interaction with mobile projections on non-planar surfaces. To elicit the challenges of this field of research, we contribute (1) a technology-centered design space for mobile projector-based interfaces, discussing related work in light thereof, (2) a discussion of lessons learnt from two of our research projects, which aim at improving both usability and user experience, and (3) an outline of open research challenges within this field.
Publication Details
  • ACM International Conference on Multimedia Retrieval (ICMR) 2011
  • Apr 17, 2011

Abstract

The Embedded Media Marker (EMM) identification system allows users to retrieve relevant dynamic media associated with a static paper document via camera phones. The user supplies a query image by capturing an EMM-signified patch of a paper document through a camera phone; the system recognizes the query and in turn retrieves and plays the corresponding media on the phone. Accurate image matching is crucial for a positive user experience in this application. To address the challenges posed by large datasets and variations in camera-phone-captured query images, we introduce a novel image matching scheme based on geometrically consistent correspondences. Two matching constraints, "injection" and "approximate global geometric consistency" (AGGC), which are unique to EMM identification, are presented. A hierarchical scheme, combined with two constraining functions, is designed to detect the "injective-AGGC" correspondences between images. A spatial neighborhood search approach is further proposed to address challenging cases with large translational shift. Experimental results on a 100k+ dataset show that our solution achieves high accuracy with low memory and time complexity and outperforms the standard bag-of-words approach.
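The "injection" constraint can be illustrated with a greedy one-to-one assignment: each query feature matches at most one database feature and vice versa, taking the closest available pairs first. This greedy stand-in is an illustration only; the paper uses a hierarchical scheme with additional AGGC constraints.

    def injective_matches(dist, threshold=0.25):
        """dist[i][j]: distance between query feature i and database feature j.
        Returns one-to-one (i, j) pairs, best pairs first."""
        pairs = sorted((d, i, j)
                       for i, row in enumerate(dist)
                       for j, d in enumerate(row) if d <= threshold)
        used_q, used_d, matches = set(), set(), []
        for d, i, j in pairs:
            if i not in used_q and j not in used_d:
                used_q.add(i)
                used_d.add(j)
                matches.append((i, j))
        return matches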
Publication Details
  • Fuji Xerox Technical Report
  • Jan 1, 2011

Abstract

Embedded Media Markers, or simply EMMs, are nearly transparent iconic marks printed on paper documents that signify the existence of media associated with that part of the document. EMMs also guide users' camera operations for media retrieval. Users take a picture of an EMM-signified document patch using a cell phone, and the media associated with the EMM-signified document location is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image local features of the captured EMM-signified document patch. This paper describes a technique for semi-automatically placing an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document.
Publication Details
  • Encyclopedia of the Sciences of Learning
  • Jan 1, 2011

Abstract

Supervised Learning is a machine learning paradigm for acquiring the input-output relationship information of a system based on a given set of paired input-output training samples. As the output is regarded as the label of the input data or the supervision, an input-output training sample is also called labelled training data, or supervised data. Occasionally, it is also referred to as Learning with a Teacher (Haykin, 1998), Learning from Labelled Data, or Inductive Machine Learning (Kotsiantis, 2007). The goal of supervised learning is to build an artificial system that can learn the mapping between the input and the output, and can predict the output of the system given new inputs. If the output takes a finite set of discrete values that indicate the class labels of the input, the learned mapping leads to the classification of the input data. If the output takes continuous values, it leads to a regression of the input. The input-output relationship information is frequently represented with learning-model parameters. When these parameters are not directly available from training samples, a learning system needs to go through an estimation process to obtain them. Different from Unsupervised Learning, the training data for Supervised Learning need supervised or labelled information, while the training data for unsupervised learning are unsupervised as they are not labelled (i.e., merely the inputs). If an algorithm uses both supervised and unsupervised training data, it is called a Semi-supervised Learning algorithm. If an algorithm actively queries a user/teacher for labels in the training process, the iterative supervised learning is called Active Learning.
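The paradigm this entry defines can be made concrete with a minimal, self-contained example: paired input-output samples train a model that predicts the output for a new input. A 1-nearest-neighbor classifier is used purely for illustration.

    def predict(train_x, train_y, query):
        """Classify `query` by the label of its nearest training input."""
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        best = min(range(len(train_x)),
                   key=lambda i: sq_dist(train_x[i], query))
        return train_y[best]

    # Labelled training data: inputs paired with supervised outputs.
    xs = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (9.1, 8.7)]
    ys = ["small", "small", "large", "large"]
    print(predict(xs, ys, (8.5, 9.2)))        # -> "large"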
2010
Publication Details
  • ACM International Conference on Multimodal Interfaces
  • Nov 8, 2010

Abstract

Embedded Media Barcode Links, or simply EMBLs, are optimally blended iconic barcode marks, printed on paper documents, that signify the existence of multimedia associated with that part of the document content (Figure 1). EMBLs are used for multimedia retrieval with a camera phone. Users take a picture of an EMBL-signified document patch using a cell phone, and the multimedia associated with the EMBL-signified document location is displayed on the phone. Unlike a traditional barcode, which requires an exclusive space, the EMBL construction algorithm acts as an agent to negotiate with a barcode reader for maximum user and document benefits. Because of this negotiation, EMBLs are optimally blended with content and thus interfere less with the original document layout and can be moved closer to a media-associated location. Retrieval of media associated with an EMBL is based on the barcode identification of a captured EMBL. Therefore, EMBL retains nearly all barcode identification advantages, such as accuracy, speed, and scalability. Moreover, EMBL takes advantage of users' knowledge of traditional barcodes. Unlike an Embedded Media Marker (EMM), which requires underlying document features for marker identification, EMBL has no requirement on the underlying features. This paper will discuss the procedures for EMBL construction and optimization. It will also give experimental results that strongly support the EMBL construction and optimization ideas.
Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

An Embedded Media Marker (EMM) is a transparent mark printed on a paper document that signifies the availability of additional media associated with that part of the document. Users take a picture of the EMM using a camera phone, and the media associated with that part of the document is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image features of the document within the EMM boundary. Unlike other feature-based retrieval methods, the EMM clearly indicates to the user the existence and type of media associated with the document location. A semi-automatic authoring tool is used to place an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document. We will demonstrate how to create an EMM-enhanced document, and how the EMM enables access to the associated media on a cell phone.