Publications


2017
Publication Details
  • IEEE Internet of Things Journal
  • Nov 22, 2017

Abstract

Advances in small and low-power electronics have created new opportunities for the Internet of Things (IoT), leading to an explosion of physical objects being connected to the Internet. However, there is still no indoor localization solution that answers the needs of various location-based IoT applications with the desired simplicity, robustness, accuracy, and responsiveness. We introduce Foglight, a visible-light-enabled indoor localization system for IoT devices that relies on the unique spatial encoding produced when the mechanical mirrors inside a projector are flipped according to gray-coded binary images. Foglight employs simple off-the-shelf light sensors that can be easily coupled with existing IoT devices - such as thermometers, gas meters, or light switches - making their locations discoverable. Our sensor unit is computationally efficient; it can perform high-accuracy localization with minimal signal processing overhead, allowing any low-power IoT device on which it rests to locate itself. Additionally, results from our evaluation reveal that Foglight can locate a target device with an average accuracy of 1.7 millimeters and an average refresh rate of 84 Hz with minimal latency: 31.46 milliseconds over WiFi and 23.2 milliseconds over serial communication. Two example applications are developed to demonstrate possible scenarios as proofs of concept. We also discuss limitations, how they could be overcome, and propose next steps.
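A minimal sketch (not the authors' implementation) of the gray-code idea the abstract describes: each projected binary frame contributes one bit, and a light sensor that records the on/off sequence across frames can recover which projector column it sits in.

```python
# Hypothetical sketch of gray-coded spatial encoding for light-sensor
# localization. With n projected frames, 2**n columns can be distinguished.

def gray_encode(index: int) -> int:
    """Convert a column index to its Gray code (adjacent codes differ by one bit)."""
    return index ^ (index >> 1)

def gray_decode(gray: int) -> int:
    """Recover the column index from the Gray-coded bit pattern a sensor observed."""
    index = gray
    mask = gray >> 1
    while mask:
        index ^= mask
        mask >>= 1
    return index
```

The Gray coding matters at column boundaries: a sensor straddling two adjacent columns sees patterns that differ in only one bit, so it decodes to one of the two neighboring positions rather than an arbitrary one.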
Publication Details
  • ICDAR 2017
  • Nov 10, 2017

Abstract

We present a system for capturing ink strokes written with ordinary pen and paper using a fast camera with a frame rate comparable to a stylus digitizer. From the video frames, ink strokes are extracted and used as input to an online handwriting recognition engine. A key component in our system is a pen up/down detection model for detecting the contact of the pen-tip with the paper in the video frames. The proposed model consists of feature representation with convolutional neural networks and classification with a recurrent neural network. We also use a high speed tracker with kernelized correlation filters to track the pen-tip. For training and evaluation, we collected labeled video data of users writing English and Japanese phrases from public datasets, and we report on character accuracy scores for different frame rates in the two languages.
Publication Details
  • Computer-Supported Cooperative Work and Social Computing
  • Nov 1, 2017

Abstract

Video telehealth is growing to allow more clinicians to see patients from afar. As a result, clinicians, typically trained for in-person visits, must learn to communicate both health information and non-verbal affective signals to patients through a digital medium. We introduce a system called ReflectLive that senses and provides real-time feedback about non-verbal communication behaviors to clinicians so they can improve their communication behaviors. A user evaluation with 10 clinicians showed that the real-time feedback helped clinicians maintain better eye contact with patients and was not overly distracting. Clinicians reported being more aware of their non-verbal communication behaviors and reacted positively to summaries of their conversational metrics, motivating them to want to improve. Using ReflectLive as a probe, we also discuss the benefits and concerns around automatically quantifying the “soft skills” and complexities of clinician-patient communication, the controllability of behaviors, and the design considerations for how to present real-time and summative feedback to clinicians.
Publication Details
  • ACM MM Workshop
  • Oct 23, 2017

Abstract

Humans are complex and their behaviors follow complex multimodal patterns; yet to solve many social computing problems, one often looks for complexity in large-scale but single-point data sources or methodologies. While single-data, single-method techniques, fueled by large-scale data, have enjoyed some success, they are not without fault. With only one type of data and method, other aspects of human behavior are often overlooked, discarded, or, worse, misrepresented. We identify this as two distinct problems: first, social computing problems that cannot be solved using a single data source and need intelligence from multiple modalities; and second, social behavior that cannot be fully understood using only one form of methodology. Throughout this talk, we discuss these problems and their implications, illustrate them with examples, and propose new directions for approaching social computing research in today's age.
Publication Details
  • Fuji Xerox Technical Report
  • Oct 1, 2017

Abstract

The development of mobile technology and continuous connectivity in daily life have greatly influenced how work is done. While most use cases for sensing technology involve individual, personal use, the workplace is an important and well-suited environment for applying sensing technology: employees can connect sensing technologies through their own trackable devices. In this paper, we report two recent research results on technologies for improving physical, mental, and social well-being and productivity in the workplace, along with mechanisms for sustaining behavior change. We then briefly discuss new areas of work.
Publication Details
  • IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST 2017)
  • Oct 1, 2017

Abstract

Discovering and analyzing biclusters, i.e., two sets of related entities with close relationships, is a critical task in many real-world applications, such as exploring entity co-occurrences in intelligence analysis and studying gene expression in bioinformatics. While the output of biclustering techniques can offer some initial low-level insights, visual approaches are required on top of them due to the complexity of the algorithmic output. This paper proposes a visualization technique, called BiDots, that allows analysts to interactively explore biclusters over multiple domains. BiDots overcomes several limitations of existing bicluster visualizations by encoding biclusters in a more compact and cluster-driven manner. A set of handy interactions is incorporated to support flexible analysis of biclustering results. More importantly, BiDots addresses the case of weighted biclusters, which has been underexplored in the literature. The design of BiDots is grounded in a set of analytical tasks derived from previous work. We demonstrate its usefulness and effectiveness for exploring computed biclusters with an investigative document analysis task, in which suspicious people and activities are identified from a text corpus.

Supporting Handoff in Asynchronous Collaborative Sensemaking Using Knowledge-Transfer Graphs

Publication Details
  • IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST 2017)
  • Oct 1, 2017

Abstract

During asynchronous collaborative analysis, handoff of partial findings is challenging because externalizations produced by analysts may not adequately communicate their investigative process. To address this challenge, we developed techniques to automatically capture and help encode tacit aspects of the investigative process based on an analyst's interactions, and to streamline explicit authoring of handoff annotations. We designed our techniques to mediate awareness of analysis coverage, support explicit communication of progress and uncertainty through annotation, and support implicit communication through playback of investigation histories. To evaluate our techniques, we developed an interactive visual analysis system, KTGraph, that supports an asynchronous investigative document analysis task. We conducted a two-phase user study to characterize a set of handoff strategies and to compare investigative performance with and without our techniques. The results suggest that our techniques promote the use of more effective handoff strategies, increase awareness of the prior investigative process and insights, and improve final investigative outcomes.

How Do Ancestral Traits Shape Family Trees over Generations?

Publication Details
  • IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST 2017)
  • Oct 1, 2017

Abstract

Whether, and how, does the structure of family trees differ by ancestral traits over generations? This is a fundamental question regarding the structural heterogeneity of family trees for multi-generational transmission research. However, previous work mostly focuses on parent-child scenarios due to the lack of proper tools to handle the complexity of extending the research to multi-generational processes. Through an iterative design study with social scientists and historians, we developed TreeEvo, which assists users in generating and testing empirical hypotheses for multi-generational research. TreeEvo summarizes and organizes family trees by structural features in a dynamic manner based on a traditional Sankey diagram. A pixel-based technique is further proposed to compactly encode trees with complex structures in each Sankey node. Detailed tree information is accessible through a space-efficient visualization with semantic zooming. Moreover, TreeEvo embeds a multinomial logit model (MLM) to examine statistical associations between tree structure and ancestral traits. We demonstrate the effectiveness and usefulness of TreeEvo through an in-depth case study with domain experts using a real-world dataset (containing 54,128 family trees of 126,196 individuals).

Abstract

For tourists, interactions with digital public displays often depend on specific technologies that users may not be familiar with (QR codes, NFC, Bluetooth); may not have access to because of networking issues (SMS); may lack a required app (QR codes) or device capability (NFC) for; may not want to use because of time constraints (WiFi, Bluetooth); or may not want to use because they are worried about sharing their data with a third-party service (text, WiFi). In this demonstration, we introduce ItineraryScanner, a system that allows users to seamlessly share content with a public travel kiosk system.
Publication Details
  • British Machine Vision Conference (BMVC) 2017
  • Sep 4, 2017

Abstract

Video summarization and video captioning are considered two separate tasks in existing studies. For longer videos, automatically identifying the important parts of the video content and annotating them with captions enables a richer and more concise condensation of the video. We propose a general neural network architecture that jointly considers two supervisory signals (i.e., an image-based video summary and text-based video captions) in the training phase and generates both a video summary and corresponding captions for a given video in the test phase. Our main idea is that the summary signals can help a video captioning model learn to focus on important frames; conversely, caption signals can help a video summarization model learn better semantic representations. Jointly modeling the video summarization and video captioning tasks offers a novel end-to-end solution that generates a captioned video summary, enabling users to index and navigate the highlights in a video. Moreover, our experiments show that the joint model achieves better performance than state-of-the-art approaches on both individual tasks.
Publication Details
  • ACM Document Engineering 2017
  • Aug 30, 2017

Abstract

In this paper, we describe DocHandles, a novel system that allows users to link to specific document parts in their chat applications. As users type a message, they can invoke the tool by referring to a specific part of a document, e.g., “@fig1 needs revision”. By combining text parsing and document layout analysis, DocHandles can find and present all figures labeled “1” inside previously shared documents, allowing users to explicitly link to the relevant “document handle”. Documents become first-class citizens inside the conversation stream, where users can seamlessly integrate documents into their text-centric messaging application.

Abstract

It is increasingly possible to use cameras and sensors to detect and analyze human appearance for the purposes of personalizing user experiences. Such systems are already deployed in some public places to personalize advertisements and recommend items. However, since these technologies are not yet widespread, we do not have a good sense of the perceived benefits and drawbacks of public display systems that use face detection as an input for personalized recommendations. We conducted a user study with a system that inferred a user’s gender and age from a facial detection and analysis algorithm and used this to present recommendations in two scenarios (finding stores to visit in a mall and finding a pair of sunglasses to buy). This work provides an initial step towards understanding user reactions to a new and emerging form of implicit recommendation based on physical appearance.

Image-Based User Profiling of Frequent and Regular Venue Categories

Publication Details
  • IEEE ICME 2017
  • Jul 10, 2017

Abstract

The availability of mobile access has shifted social media use. With that shift, what users share on social media and where they visit is naturally an excellent resource for learning their visiting behavior. Knowing visiting behaviors would help market surveys and customer relationship management, e.g., sending customers coupons for the businesses they visit frequently. Most prior studies leverage metadata, e.g., check-in locations, to profile visiting behavior but neglect important information in user-contributed content, e.g., images. This work addresses a novel use of image content for predicting a user's visiting behavior, i.e., the frequent and regular business venue categories that the content owner would visit. To collect training data, we propose a strategy that uses the geo-metadata associated with images to derive labels for an image owner's visiting behavior. Moreover, we model a user's sequential images with an end-to-end learning framework to reduce the optimization loss, which helps improve prediction accuracy over the baseline, as demonstrated in our experiments. The prediction is based entirely on image content, which is more available in social media than geo-metadata, and thus allows profiling a wider set of users.
Publication Details
  • Communities & Technologies 2017
  • Jun 26, 2017

Abstract

Video conferencing is widely used to help deliver educational presentations, such as lectures or informational webinars, to a distributed audience. While individuals in a dyadic conversation may be able to use webcam streams to assess the engagement level of their interlocutor with some ease, as the size of the audience in a video conference setting increases, it becomes increasingly difficult to interpret how engaged the overall group may be. In this work, we use a mixed-methods approach to understand how presenters and attendees of online presentations use available cues to perceive and interpret audience behavior (such as how engaged the group is). Our results suggest that while webcams are seen as useful by presenters to increase audience visibility and encourage attention, audience members do not uniformly benefit from seeing others’ webcams; other interface cues such as chat may be more useful and informative engagement indicators for both parties. We conclude with design recommendations for future systems to improve what is sensed and presented.
Publication Details
  • International Conference on Robotics and Automation
  • May 29, 2017

Abstract

In this paper, we propose a real-time classification scheme to cope with the noisy Radio Signal Strength Indicator (RSSI) measurements utilized in indoor positioning systems. RSSI values are often converted to distances for position estimation. However, due to multipath and shadowing effects, finding a unique sensor model using either parametric or nonparametric methods is highly challenging. We learn decision regions using Gaussian Process classification to accept measurements that are consistent with the operating sensor model. The proposed approach can perform online, does not rely on a particular sensor model or parameters, and is robust to sensor failures. Experimental results obtained using hardware show that available positioning algorithms can benefit from incorporating the classifier into their measurement model as a meta-sensor modeling technique.
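The RSSI-to-distance conversion the abstract mentions is commonly done with the log-distance path-loss model. A minimal sketch, assuming example values for the reference power and path-loss exponent (this is the standard textbook model, not the paper's Gaussian Process classifier):

```python
# Hedged sketch: log-distance path-loss model for converting an RSSI reading
# (dBm) into an estimated distance (meters). p0_dbm and n are assumed values;
# multipath and shadowing make a single fixed model unreliable indoors,
# which motivates the paper's measurement-classification approach.

def rssi_to_distance(rssi_dbm: float, p0_dbm: float = -40.0, n: float = 2.0) -> float:
    """Estimate distance from RSSI.

    p0_dbm: reference RSSI at 1 m from the transmitter.
    n: path-loss exponent (2 in free space; larger indoors).
    """
    return 10 ** ((p0_dbm - rssi_dbm) / (10 * n))
```

With these assumed parameters, a reading equal to the 1 m reference power maps to 1 m, and every additional 10·n dB of attenuation multiplies the estimated distance by ten.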

Gaze-informed multimodal interaction

Publication Details
  • The Handbook of Multimodal-Multisensor Interfaces
  • May 9, 2017

Abstract

Observe a person pointing at and describing something. Where is that person looking? Chances are good that she is also looking at what she is talking about and pointing at. Gaze is naturally coordinated with our speech and hand movements. By utilizing this tendency, we can create natural interactions with computing devices and environments. In this chapter, we first briefly discuss some basic properties of the gaze signal obtainable from eye trackers, followed by a review of multimodal systems utilizing the gaze signal as one input modality. In multimodal gaze interaction, data from eye trackers is used as an active input mode where, for instance, gaze serves as an alternative, or complementary, pointing modality alongside other input modalities. Using gaze as an active or explicit input method is challenging for several reasons. One is that the eyes are primarily used for perceiving our environment, so knowing when a person selects an item with gaze versus just looking around is an issue. Researchers have tried to solve this by combining gaze with various input methods, such as manual pointing, speech, touch, etc. However, gaze information can also be used in interactive systems for purposes other than explicit pointing, since a user's gaze is a good indication of the user's attention. In passive gaze interaction, gaze is not used as the primary input method but as a supporting one. In these kinds of systems, gaze is mainly used for inferring and reasoning about the user's cognitive state or activities in a way that can support the interaction. Such multimodal systems often combine gaze with a multitude of input modalities. In this chapter we focus on interactive systems, exploring the design space for gaze-informed multimodal interaction, spanning from gaze as an active input mode to a passive one, and from stationary usage scenarios (e.g., at a desk) to mobile ones.
There are a number of studies aimed at describing, detecting, or modeling specific behaviors or cognitive states. We touch on some of these works, since they can guide us in building gaze-informed multimodal interaction.

Abstract

Work breaks can play an important role in the mental and physical well-being of workers and contribute positively to productivity. In this paper we explore the use of activity-, physiological-, and indoor-location sensing to promote mobility during work-breaks. While the popularity of devices and applications to promote physical activity is growing, prior research highlights important constraints when designing for the workplace. With these constraints in mind, we developed BreakSense, a mobile application that uses a Bluetooth beacon infrastructure, a smartphone and a smartwatch to encourage mobility during breaks with a game-like design. We discuss constraints imposed by design for work and the workplace, and highlight challenges associated with the use of noisy sensors and methods to overcome them. We then describe a short deployment of BreakSense within our lab that examined bound vs. unbound augmented breaks and how they affect users’ sense of completion and readiness to work.

Abstract

Users often use social media to share their interest in products. We propose to identify purchase stages from Twitter data following the AIDA model (Awareness, Interest, Desire, Action). In particular, we define a task of classifying the purchase stage of each tweet in a user's tweet sequence. We introduce RCRNN, a Ranking Convolutional Recurrent Neural Network which computes tweet representations using convolution over word embeddings and models a tweet sequence with gated recurrent units. Also, we consider various methods to cope with the imbalanced label distribution in our data and show that a ranking layer outperforms class weights.
Publication Details
  • IEEE PerCom 2017
  • Mar 13, 2017

Abstract

We present Lift, a visible-light-enabled finger tracking and object localization technique that allows users to perform freestyle multi-touch gestures on any object’s surface in an everyday environment. By projecting encoded visible patterns onto an object’s surface (e.g., paper, display, or table), and localizing the user’s fingers with light sensors, Lift offers users a richer interactive space than the device’s existing interfaces. Additionally, everyday objects can be augmented by attaching sensor units onto their surfaces to accept multi-touch gesture input. We also present two applications as a proof of concept. Finally, results from our experiments indicate that Lift can localize ten fingers simultaneously with accuracies of 0.9 mm and 1.8 mm on the two axes, respectively, and an average refresh rate of 84 Hz, with 16.7 ms delay over WiFi and 12 ms delay over serial, making gesture recognition on non-instrumented objects possible.
Publication Details
  • TRECVID Workshop
  • Mar 1, 2017

Abstract

This is a summary of our participation in the TRECVID 2016 video hyperlinking task (LNK). We submitted four runs in total. A baseline system was based on established vector-space text indexing and cosine similarity. Our other runs explored the use of distributed word representations in combination with fine-grained inter-segment text similarity measures.
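A minimal sketch of the kind of baseline the summary describes (assumed details, not the submitted runs): tf-idf vector-space indexing of segment text, with cosine similarity ranking candidate target segments against a query segment.

```python
import math
from collections import Counter

# Hypothetical tf-idf + cosine-similarity baseline for ranking text segments.
# Documents are pre-tokenized lists of terms.

def tf_idf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each document's terms by term frequency times inverse document frequency."""
    df = Counter(term for doc in docs for term in set(doc))
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0
```

Ranking then amounts to sorting candidate segments by their cosine score against the query segment's vector.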