Publications

FXPAL publishes in top scientific conferences and journals.

“Notice: FX Palo Alto Laboratory will be closing. All Research and related operations will cease as of June 30, 2020.”

2020
Publication Details
  • Natural Language Engineering
  • Jul 15, 2020

Abstract

Twitter and other social media platforms are often used for sharing interest in products. The identification of purchase decision stages, such as in the AIDA model (Awareness, Interest, Desire, Action), can enable more personalized e-commerce services and a finer-grained targeting of ads than predicting purchase intent only. In this paper, we propose and analyze neural models for identifying the purchase stage of single tweets in a user's tweet sequence. In particular, we identify three challenges of purchase stage identification: imbalanced label distribution with a high number of negative instances, limited amount of training data, and domain adaptation with no or only little target domain data. Our experiments reveal that the imbalanced label distribution is the main challenge for our models. We address it with ranking loss and perform detailed investigations of the performance of our models on the different output classes. In order to improve the generalization of the models and augment the limited amount of training data, we examine the use of sentiment analysis as a complementary, secondary task in a multitask framework. For applying our models to tweets from another product domain, we consider two scenarios: For the first scenario without any labeled data in the target product domain, we show that learning domain-invariant representations with adversarial training is most promising while for the second scenario with a small number of labeled target examples, finetuning the source model weights performs best. Finally, we conduct several analyses, including extracting attention weights and representative phrases for the different purchase stages. The results suggest that the model is learning features indicative of purchase stages and that the confusion errors are sensible.
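
As a concrete illustration of the ranking-loss idea mentioned above, here is a minimal PyTorch sketch of a pairwise margin ranking loss for one purchase stage; the tensor shapes, margin, and all-pairs scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

margin_loss = nn.MarginRankingLoss(margin=1.0)

def stage_ranking_loss(scores, labels):
    """scores: (batch,) model scores for one purchase stage;
    labels: (batch,) 1 if the tweet belongs to that stage, else 0."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return scores.new_zeros(())
    # Form all positive/negative pairs; target 1 means "rank first input higher".
    p = pos.unsqueeze(1).expand(-1, len(neg)).reshape(-1)
    n = neg.unsqueeze(0).expand(len(pos), -1).reshape(-1)
    return margin_loss(p, n, torch.ones_like(p))
```

Unlike plain cross-entropy, this objective only cares that positives outscore negatives by a margin, which is one way to cope with heavily imbalanced labels.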

Abstract

Managing post-surgical pain is critical for successful surgical outcomes. One of the challenges of pain management is accurately assessing the pain level of patients. Self-reported numeric pain ratings are limited because they are subjective, can be affected by mood, and can influence the patient’s perception of pain when making comparisons. In this paper, we introduce an approach that analyzes 2D and 3D facial keypoints of post-surgical patients to estimate their pain intensity level. Our approach leverages the previously unexplored capabilities of a smartphone to capture a dense 3D representation of a person’s face as input for pain intensity level estimation. Our contributions are a data collection study with post-surgical patients to collect ground-truth labeled sequences of 2D and 3D facial keypoints for developing a pain estimation algorithm, a pain estimation model that uses multiple instance learning to overcome inherent limitations in facial keypoint sequences, and the preliminary results of the pain estimation model using 2D and 3D features with comparisons of alternate approaches.
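
A minimal sketch of the multiple-instance-learning idea, assuming each recording is treated as a bag of fixed-size keypoint-feature windows and the bag-level pain score is taken from the highest-scoring instance (max pooling). Dimensions and architecture are assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class MILPainEstimator(nn.Module):
    def __init__(self, feat_dim=132, hidden=64):  # e.g., 66 keypoints x (x, y)
        super().__init__()
        self.instance_scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, bag):                # bag: (num_instances, feat_dim)
        scores = self.instance_scorer(bag) # one pain score per window
        # The bag label is driven by its most painful-looking instance.
        return scores.max(dim=0).values
```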

Abstract

Flow-like experiences at work are important for productivity and worker well-being. However, it is difficult to objectively detect when workers are experiencing flow in their work. In this paper, we investigate how to predict a worker's focus state based on physiological signals. We conducted a lab study to collect physiological data from knowledge workers as they experienced different levels of flow while performing work tasks. We used the nine characteristics of flow to design tasks that would induce different focus states. A manipulation check using the Flow Short Scale verified that participants experienced three distinct states: one overly challenging non-flow state and two types of flow states, balanced flow and automatic flow. We built machine learning classifiers that can distinguish between non-flow and flow states with 0.889 average AUC and can distinguish rest states from working states with 0.98 average AUC. The results show that physiological sensing can detect focused flow states of knowledge workers and can enable ways for individuals and organizations to improve both productivity and worker satisfaction.
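
For readers unfamiliar with this evaluation setup, a hedged sketch of a binary flow/non-flow classifier scored by cross-validated average AUC; the random features and labels here are placeholders for real physiological data, and the classifier choice is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(120, 20)          # placeholder: features per sensing window
y = np.random.randint(0, 2, 120)     # placeholder: 1 = flow, 0 = non-flow

aucs = []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[train], y[train])
    aucs.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))
print(f"average AUC: {np.mean(aucs):.3f}")
```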

Interpretable Contrastive Learning for Networks

Publication Details
  • arXiv
  • Jun 3, 2020

Abstract

Contrastive learning (CL) is an emerging analysis approach that aims to discover unique patterns in one dataset relative to another. By applying this approach to network analysis, we can reveal unique characteristics in one network by contrasting with another. For example, with networks of protein interactions obtained from normal and cancer tissues, we can discover unique types of interactions in cancer tissues. However, existing CL methods cannot be directly applied to networks. To address this issue, we introduce a novel approach called contrastive network representation learning (cNRL). This approach embeds network nodes into a low-dimensional space that reveals the uniqueness of one network compared to another. Within this approach, we also propose a method, named i-cNRL, that offers interpretability in the learned results, allowing for understanding which specific patterns are found in one network but not the other. We demonstrate the capability of i-cNRL with multiple network models and real-world datasets. Furthermore, we provide quantitative and qualitative comparisons across i-cNRL and other potential cNRL algorithm designs.
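
The abstract does not spell out the algorithm; one way to build intuition for "contrast one dataset against another" is contrastive PCA, which finds directions with high variance in the target data but low variance in the background. A small NumPy sketch (alpha and the inputs are assumptions; i-cNRL itself operates on features learned from the networks):

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Directions with high variance in `target` but low variance in `background`.

    target, background: (n_samples, n_features) arrays of node features."""
    ct = np.cov(target, rowvar=False)
    cb = np.cov(background, rowvar=False)
    # Top eigenvectors of the contrast matrix C_target - alpha * C_background.
    eigvals, eigvecs = np.linalg.eigh(ct - alpha * cb)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return target @ top, top
```
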
Publication Details
  • Personal and Ubiquitous Computing
  • May 31, 2020

Abstract

An important capability of most smart, Internet-of-Things-enabled spaces (e.g., office, home, hospital, factory) is the ability to leverage context of use. This can support social awareness, allowing people to interact more effectively with each other. Location is a key context element, particularly indoor location. Recent advances in radio ranging technologies, such as 802.11-2016 FTM, promise the availability of low-cost, near-ubiquitous time-of-flight-based ranging estimates. In this paper, we build on prior work to enhance this ranging technology's ability to provide useful location estimates. For further improvement, we model user-motion behavior to estimate the user's motion state from the temporal measurements available from time-of-flight ranging, and we select the velocity parameter of a particle filter based on this motion state. We demonstrate meaningful improvements in coordinate-based estimation accuracy and substantial increases in room-level estimation accuracy. Furthermore, insights gained in our real-world deployment provide important implications for future Internet of Things context applications, such as social interaction, workflow management, inventory control, or healthcare information tools, and their supporting technology deployments.
Publication Details
  • CHI 2020
  • Apr 25, 2020

Abstract

The demands of daily work offer few opportunities for workers to take stock of their own progress, big or small, which can lead to lower motivation, engagement, and higher risk of burnout. We present Highlight Matome, a personal online tool that encourages workers to quickly record and rank a single work highlight each day, helping them gain awareness of their own successes. We describe results from a field experiment investigating our tool's effectiveness for improving workers' engagement, perceptions, and affect. Thirty-three knowledge workers in Japan and the U.S. used Highlight Matome for six weeks. Our results show that using our tool for less than one minute each day significantly increased measures of work engagement, dedication, and positivity. A qualitative analysis of the highlights offers a window into participants' emotions and perceptions. We discuss implications for theories of inner work life and worker well-being.

Social VR: A New Medium for Remote Communication and Collaboration

Publication Details
  • CHI 2020
  • Apr 25, 2020

Abstract

There is a growing need for effective remote communication, which has many positive societal impacts, such as reducing environmental pollution and travel costs and supporting rich collaboration by remotely connecting talented people. Social Virtual Reality (VR) invites multiple users to join a collaborative virtual environment, creating new opportunities for remote communication. The goal of social VR is not to completely replicate reality, but to facilitate and extend the existing communication channels of the physical world. Alongside the benefits provided by social VR, privacy concerns and ethical risks arise when the boundary between the real and the virtual world is blurred. This workshop is intended to spur discussions regarding technology, evaluation protocols, application areas, research ethics, and legal regulations for social VR as an emerging immersive remote communication tool.

Abstract

While it is often critical for indoor-location- and proximity-aware applications to know whether a user is in a space or not (e.g., a specific room or office), a key challenge is that the difference between standing on one side or another of a doorway or wall is well within the error range of most RF-based approaches. In this work, we address this challenge by augmenting RF-based localization and proximity detection with active ultrasonic sensing, taking advantage of the limited propagation of sound waves. This simple and cost-effective approach can allow, for example, a Bluetooth smart-lock to discern whether a user is inside or outside their home. We describe a configurable architecture for our solution and present experiments that validate this approach but also demonstrate that different user behavior and application needs can impact system configuration decisions. Finally, we describe applications that could benefit from our solution and address privacy concerns.
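
One hedged sketch of how active ultrasonic presence detection can work in principle: emit a near-ultrasonic chirp and decide "same room" if a strong echo of it appears in the recording, exploiting the fact that the chirp does not pass through walls. All parameters and the matched-filter threshold are illustrative assumptions; the paper's actual pipeline is not specified here.

```python
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000                                  # sample rate of commodity audio hardware
t = np.arange(0, 0.05, 1 / FS)
probe = chirp(t, f0=18_000, t1=t[-1], f1=20_000)   # near-ultrasonic sweep

def same_room(recorded, threshold=5.0):
    """Matched-filter detection: chirp correlation peak vs. background level."""
    corr = np.abs(correlate(recorded, probe, mode="valid"))
    return corr.max() > threshold * corr.mean()
```
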
2019
Publication Details
  • IEEE ISM2019
  • Dec 8, 2019

Abstract

This paper reports our explorations on learning Sensory Media Association through Reciprocating Training (SMART). The proposed learning system contains two deep autoencoders, one for learning speech representations and another for learning image representations. Two deep networks are trained to bridge the latent spaces of two autoencoders, yielding representation mappings for both speech-to-image and image-to-speech. To improve feature clustering in both latent spaces, the system alternately uses one modality to guide the learning of another modality. Different from traditional technology that uses a fixed modality for supervision (e.g. using text labels for image classification), the proposed approach facilitates a machine to learn from sensory data of two or more modalities through alternating guidance among these modalities. We evaluate the proposed model with MNIST digit images and corresponding digit speeches in the Google Command Digit Dataset (GCDD). We also evaluate the model with a dataset based on COIL-100 and corresponding Watson synthesized speech. The results demonstrate the model's promising viability for sensory media association.
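
A compact sketch of the described setup: two autoencoders plus mapping networks that bridge their latent spaces, where detaching one modality's code lets it act as the fixed guide for the other. Network sizes, losses, and wiring are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def mlp(din, dout):
    return nn.Sequential(nn.Linear(din, 256), nn.ReLU(), nn.Linear(256, dout))

img_enc, img_dec = mlp(784, 32), mlp(32, 784)     # image autoencoder (assumed dims)
spc_enc, spc_dec = mlp(1024, 32), mlp(32, 1024)   # speech autoencoder (assumed dims)
img2spc, spc2img = mlp(32, 32), mlp(32, 32)       # latent-space bridges

def losses(img, spc):
    zi, zs = img_enc(img), spc_enc(spc)
    recon = nn.functional.mse_loss(img_dec(zi), img) + \
            nn.functional.mse_loss(spc_dec(zs), spc)
    # Each bridge maps one modality's code onto the paired modality's code;
    # detaching the target side makes that modality the (fixed) supervisor.
    bridge = nn.functional.mse_loss(img2spc(zi), zs.detach()) + \
             nn.functional.mse_loss(spc2img(zs), zi.detach())
    return recon + bridge
```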

Abstract

We present a remote assistance system that enables a remotely located expert to provide guidance using hand gestures to a customer who performs a physical task in a different location. The system is built on top of a web-based real-time media communication framework which allows the customer to use a commodity smartphone to send a live video feed to the expert, from which the expert can see the view of the customer's workspace and can show his/her hand gestures over the video in real-time. The expert's hand gesture is captured with a hand tracking device and visualized with a rigged 3D hand model on the live video feed. The system can be accessed via a web browser, and it does not require any app software to be installed on the customer's device. Our system supports various types of devices, including smartphones, tablets, desktop PCs, and smart glasses. To improve the collaboration experience, the system provides a novel gravity-aware hand visualization technique.
Publication Details
  • ACM ISS 2019
  • Nov 9, 2019

Abstract

In a telepresence scenario with remote users discussing a document, it can be difficult to follow which parts are being discussed. One way to address this is by showing the user's hand position on the document, which also enables expressive gestural communication. An important practical problem is how to capture and transmit the hand movements efficiently with high resolution document images. We propose a tabletop system with two channels that integrates document capture with a 4K video camera and hand tracking with a webcam, in which the document image and hand skeleton data are transmitted at different rates and handled by a lightweight Web browser client at remote sites. To enhance the rendering, we employ velocity based smoothing and ephemeral motion traces. We tested our prototype over long distances from USA to Japan and to Italy, and report on latency and jitter performance. Our system achieves relatively low latency over a long distance in comparison with a tele-immersive system that transmits mesh data over much shorter distances.
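
A minimal sketch of velocity-based smoothing for streamed hand keypoints: smooth heavily when the hand moves slowly (jitter dominates) and lightly when it moves fast (lag dominates), in the spirit of velocity-adaptive filters such as the 1-euro filter. The class name and parameter values are illustrative assumptions.

```python
import numpy as np

class VelocitySmoother:
    def __init__(self, min_alpha=0.1, max_alpha=0.9, v_scale=200.0):
        self.prev = None
        self.min_alpha, self.max_alpha, self.v_scale = min_alpha, max_alpha, v_scale

    def smooth(self, points, dt):
        """points: (n_joints, 2) pixel coordinates from the hand tracker."""
        if self.prev is None:
            self.prev = points
            return points
        # Mean joint speed decides how much of the new sample to trust.
        speed = np.linalg.norm(points - self.prev, axis=1).mean() / dt
        alpha = np.clip(speed / self.v_scale, self.min_alpha, self.max_alpha)
        self.prev = alpha * points + (1 - alpha) * self.prev
        return self.prev
```
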
Publication Details
  • International Conference on the Internet of Things (IoT 2019)
  • Oct 22, 2019

Abstract

A motivating, core capability of most smart, Internet of Things enabled spaces (e.g., home, office, hospital, factory) is the ability to leverage context of use. Location is a key context element, particularly indoor location. Recent advances in radio ranging technologies, such as 802.11-2016 FTM, promise the availability of low-cost, near-ubiquitous time-of-flight-based ranging estimates. In this paper, we build on prior work to enhance the technology's ability to provide useful location estimates. We demonstrate meaningful improvements in coordinate-based estimation accuracy and substantial increases in room-level estimation accuracy. Furthermore, insights gained in our real-world deployment provide important implications for future Internet of Things context applications, such as workflow management, inventory control, or healthcare information tools, and their supporting technology deployments.
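
A hedged sketch of the basic step of turning FTM ranging estimates into a coordinate estimate: nonlinear least squares over known access-point positions. The coordinates and ranges are toy values, and the paper's estimator adds the enhancements described above on top of this kind of baseline.

```python
import numpy as np
from scipy.optimize import least_squares

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])   # AP positions (m), assumed
ranges = np.array([6.1, 5.8, 4.2])                           # FTM range estimates (m)

# Residual: difference between predicted and measured distances to each AP.
residual = lambda p: np.linalg.norm(anchors - p, axis=1) - ranges
position = least_squares(residual, x0=anchors.mean(axis=0)).x
print(position)
```
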
Publication Details
  • ACM MM
  • Oct 21, 2019

Abstract

Despite work on smart spaces, nowadays a lot of knowledge work happens in the wild: at home, in coffee shops, trains, buses, planes, and of course in crowded open office cubicles. Conducting web conferences in these settings creates privacy issues, and can also distract participants, leading to a perceived lack of professionalism from the remote peer(s). To solve this common problem, we implemented CamaLeon, a browser-based tool that uses real-time machine vision powered by deep learning to change the webcam stream sent by the remote peer: specifically, CamaLeon dynamically changes the "wild" background into one that resembles that of the office workers. In order to detect the background in wild settings, we designed and trained a fast UNet model on head and shoulder images. CamaLeon also uses a face detector to determine whether it should stream the person's face, depending on its location (or lack of presence). It uses face recognition to make sure it streams only a face that belongs to the user who connected to the meeting. The system was tested during a few real video conferencing calls at our company in which two workers were remote. Both parties felt a sense of enhanced co-presence, and the remote participants felt more professional with their background replaced.
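
A minimal sketch of the compositing step that follows segmentation: given a person/background mask from a model such as the UNet described above, blend the camera frame over an office background. Array shapes and the soft-mask convention are assumptions.

```python
import numpy as np

def replace_background(frame, office_bg, mask):
    """frame, office_bg: (H, W, 3) uint8 images; mask: (H, W) float in [0, 1],
    1.0 where the segmentation model predicts the person."""
    m = mask[..., None]                      # broadcast mask over color channels
    out = m * frame.astype(float) + (1 - m) * office_bg.astype(float)
    return out.astype(np.uint8)
```
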
Publication Details
  • ACM MM
  • Oct 21, 2019

Abstract

Responding to requests for information from an application, a remote person, or an organization that involve documenting the presence and/or state of physical objects can lead to incomplete or inaccurate documentation. We propose a system that couples information requests with a live object recognition tool to semi-automatically catalog requested items and collect evidence of their current state.
Publication Details
  • ACM MM
  • Oct 20, 2019

Abstract

Multimedia research has now moved beyond laboratory experiments and is rapidly being deployed in real-life applications including advertisements, social interaction, search, security, automated driving, and healthcare. Hence, the developed algorithms now have a direct impact on the individuals using the abovementioned services and on society as a whole. While there is a huge potential to benefit society using such technologies, there is also an urgent need to identify the checks and balances to ensure that the impact of such technologies is ethical and positive. This panel will bring together an array of experts who have experience collecting large-scale datasets, building multimedia algorithms, and deploying them in practical applications, as well as a lawyer whose eyes have been on the fundamental rights at stake. They will lead a discussion on the ethics and lawfulness of dataset creation, licensing, privacy of individuals represented in the datasets, algorithmic transparency, algorithmic bias, explainability, and the implications of application deployment. Through an interactive process engaging the audience, the panel hopes to increase the awareness of such concepts in the multimedia research community and initiate a discussion on community guidelines, all for setting the future direction of conducting multimedia research in a lawful and ethical manner.

Albireo: An Interactive Tool for Visually Summarizing Computational Notebook Structure

Publication Details
  • VDS'19
  • Oct 20, 2019

Abstract

Computational notebooks have become a major medium for data exploration and insight communication in data science. Although expressive, dynamic, and flexible, in practice they are loose collections of scripts, charts, and tables that rarely tell a story or clearly represent the analysis process. This leads to a number of usability issues, particularly in the comprehension and exploration of notebooks. In this work, we design, implement, and evaluate Albireo, a visualization approach to summarize the structure of notebooks, with the goal of supporting more effective exploration and communication by displaying the dependencies and relationships between the cells of a notebook using a dynamic graph structure. We evaluate the system via a case study and expert interviews, with our results indicating that such a visualization is useful for an analyst’s self-reflection during exploratory programming, and also effective for communication of narratives and collaboration between analysts.
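
A hedged sketch of how cell dependencies like those Albireo visualizes can be recovered: link a cell to the earlier cells that define the names it uses. This is a deliberate simplification (it ignores function/class definitions, imports, and re-execution order), not the system's actual analysis.

```python
import ast

def defs_and_uses(code):
    """Names assigned (defs) and read (uses) in one cell's source."""
    tree = ast.parse(code)
    defs = {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    uses = {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    return defs, uses

def cell_edges(cells):
    """cells: list of code strings; returns (src_cell, dst_cell) dependency edges."""
    edges, defined_in = [], {}
    for i, code in enumerate(cells):
        defs, uses = defs_and_uses(code)
        edges += [(defined_in[u], i) for u in uses if u in defined_in]
        defined_in.update({d: i for d in defs})
    return edges
```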

Interactive Bicluster Aggregation in Bipartite Graphs

Publication Details
  • IEEE VIS 2019
  • Oct 20, 2019

Abstract

Exploring coordinated relationships is important for sensemaking of data in various fields, such as intelligence analysis. To support such investigations, visual analysis tools use biclustering to mine relationships in bipartite graphs and visualize the resulting biclusters with standard graph visualization techniques. Due to overlaps among biclusters, such visualizations can be cluttered (e.g., with many edge crossings), when there are a large number of biclusters. Prior work attempted to resolve this problem by automatically ordering nodes in a bipartite graph. However, visual clutter is still a serious problem, since the number of displayed biclusters remains unchanged. We propose bicluster aggregation as an alternative approach, and have developed two methods of interactively merging biclusters. These interactive bicluster aggregations help organize similar biclusters and reduce the number of displayed biclusters. Initial expert feedback indicates potential usefulness of these techniques in practice.
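
A minimal sketch of one plausible aggregation rule: merge two biclusters when both their row and column sets are similar enough by Jaccard similarity. The threshold and greedy strategy are illustrative assumptions; the paper's two interactive methods are not specified here.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

def merge_similar(biclusters, threshold=0.6):
    """biclusters: list of (row_set, col_set) pairs; returns merged list."""
    merged, used = [], set()
    for i, (rows, cols) in enumerate(biclusters):
        if i in used:
            continue
        for j, (r2, c2) in enumerate(biclusters[i + 1:], start=i + 1):
            # Merge only if both dimensions overlap strongly.
            if j not in used and min(jaccard(rows, r2), jaccard(cols, c2)) >= threshold:
                rows, cols = rows | r2, cols | c2
                used.add(j)
        merged.append((rows, cols))
    return merged
```
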
Publication Details
  • IEEE InfoVis 2019
  • Oct 20, 2019

Abstract

Think-aloud protocols are widely used by user experience (UX) practitioners in usability testing to uncover issues in user interface design. It is often arduous to analyze large amounts of recorded think-aloud sessions and few UX practitioners have an opportunity to get a second perspective during their analysis due to time and resource constraints. Inspired by the recent research that shows subtle verbalization and speech patterns tend to occur when users encounter usability problems, we take the first step to design and evaluate an intelligent visual analytics tool that leverages such patterns to identify usability problem encounters and present them to UX practitioners to assist their analysis. We first conducted and recorded think-aloud sessions, and then extracted textual and acoustic features from the recordings and trained machine learning (ML) models to detect problem encounters. Next, we iteratively designed and developed a visual analytics tool, VisTA, which enables dynamic investigation of think-aloud sessions with a timeline visualization of ML predictions and input features. We conducted a between-subjects laboratory study to compare three conditions, i.e., VisTA, VisTASimple (no visualization of the ML’s input features), and Baseline (no ML information at all), with 30 UX professionals. The findings show that UX professionals identified more problem encounters when using VisTA than Baseline by leveraging the problem visualization as an overview, anticipations, and anchors as well as the feature visualization as a means to understand what ML considers and omits. Our findings also provide insights into how they treated ML, dealt with (dis)agreement with ML, and reviewed the videos (i.e., play, pause, and rewind).
Publication Details
  • IEEE VIS 2019
  • Oct 20, 2019

Abstract

The analysis of bipartite networks is critical in a variety of application domains, such as exploring entity co-occurrences in intelligence analysis and investigating gene expression in bio-informatics. One important task is missing link prediction, which infers the existence of unseen links based on currently observed ones. In this paper, we propose MissBiN that involves analysts in the loop for making sense of link prediction results. MissBiN combines a novel method for link prediction and an interactive visualization for examining and understanding the algorithm outputs. Further, we conducted quantitative experiments to assess the performance of the proposed link prediction algorithm, and a case study to evaluate the overall effectiveness of MissBiN.
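
To make the task concrete, here is a hedged sketch of a simple bipartite link predictor: score an unseen (u, v) pair by how strongly u's neighborhood overlaps with the neighborhoods of left nodes already connected to v. This is a common-neighborhood heuristic for intuition only, not MissBiN's algorithm.

```python
from collections import defaultdict

def link_score(edges, u, v):
    """edges: iterable of (left, right) pairs; score candidate link (u, v)."""
    right_of = defaultdict(set)   # left node  -> its right neighbors
    left_of = defaultdict(set)    # right node -> its left neighbors
    for l, r in edges:
        right_of[l].add(r)
        left_of[r].add(l)
    # Left nodes that already connect to v; how much does u resemble them?
    candidates = left_of[v]
    return sum(len(right_of[u] & right_of[w]) for w in candidates)
```

Higher scores suggest the unseen link is more plausible; a real predictor would rank all candidate pairs this way.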

Abstract

Localization in an indoor and/or Global Positioning System (GPS)-denied environment is paramount for various applications that require locating humans and/or robots in an unknown environment. Various localization systems have been developed using ubiquitous sensors such as cameras, radio frequency, and inertial measurement units. Most of these systems cannot accommodate scenarios with substantial changes in the environment, such as a large number of people (unpredictable) or a sudden change in the floor plan (unstructured). In this paper, we propose a system, InFo, that leverages real-time visual information captured by surveillance cameras and augments it with images captured by the smart device user to deliver accurate discretized location information. Through our experiments, we demonstrate that our deep learning based InFo system provides an improvement of 10% compared to a system that does not utilize this real-time information.
Publication Details
  • British Machine Vision Conference (BMVC 2019)
  • Sep 1, 2019

Abstract

Automatic medical report generation from chest X-ray images is one possibility for assisting doctors to reduce their workload. However, the different patterns and data distribution of normal and abnormal cases can bias machine learning models. Previous attempts did not focus on isolating the generation of the abnormal and normal sentences in order to increase the variability of generated paragraphs. To address this, we propose to separate abnormal and normal sentence generation by using a dual word LSTM in a hierarchical LSTM model. In addition, we conduct an analysis on the distinctiveness of generated sentences compared to the BLEU score, which increases when less distinct reports are generated. Together with this analysis, we propose a way of selecting a model that generates more distinctive sentences. We hope our findings will help to encourage the development of new metrics to better verify methods of automatic medical report generation.
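
A compact sketch of the dual word decoder idea: a sentence-level LSTM emits topic states plus a normal/abnormal gate, which routes word decoding to one of two word LSTMs. Dimensions, wiring, and the simplified routing are assumptions, not the exact model.

```python
import torch
import torch.nn as nn

class DualWordDecoder(nn.Module):
    def __init__(self, hid=256, vocab=5000):
        super().__init__()
        self.sent_lstm = nn.LSTM(hid, hid, batch_first=True)
        self.gate = nn.Linear(hid, 2)          # choose normal vs. abnormal decoder
        self.word_lstms = nn.ModuleList(
            [nn.LSTM(hid, hid, batch_first=True) for _ in range(2)])
        self.out = nn.Linear(hid, vocab)

    def forward(self, visual_feats, word_embs):
        topics, _ = self.sent_lstm(visual_feats)   # (B, n_sents, hid)
        branch = self.gate(topics).argmax(-1)      # which word LSTM per sentence
        # Simplification: decode with the first sentence's branch; the real
        # model routes each sentence to its own decoder independently.
        decoded, _ = self.word_lstms[int(branch[0, 0])](word_embs)
        return self.out(decoded)
```
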
Publication Details
  • The 17th IEEE International Conference on Embedded and Ubiquitous Computing (IEEE EUC 2019)
  • Aug 2, 2019

Abstract

Human activity forecasting from videos in routine-based tasks is an open research problem that has numerous applications in robotics, visual monitoring, and skill assessment. Currently, many challenges exist in activity forecasting because human actions are not fully observable from continuous recording. Additionally, a large number of human activities involve fine-grained articulated human motions that are hard to capture using frame-level representations. To overcome these challenges, we propose a method that forecasts human actions by learning the dynamics of local motion patterns extracted from dense trajectories using long short-term memory (LSTM). The experiments on a public dataset validated the effectiveness of our proposed method in activity forecasting and demonstrate large improvements over the baseline two-stream end-to-end model. We also learnt that human activity forecasting benefits from learning both the short-range motion patterns and long-term dependencies between actions.
Publication Details
  • 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)
  • Jul 28, 2019

Abstract

A common issue in training a deep learning, abstractive summarization model is lack of a large set of training summaries. This paper examines techniques for adapting from a labeled source domain to an unlabeled target domain in the context of an encoder-decoder model for text generation. In addition to adversarial domain adaptation (ADA), we introduce the use of artificial titles and sequential training to capture the grammatical style of the unlabeled target domain. Evaluation on adapting to/from news articles and Stack Exchange posts indicates that the use of these techniques can boost performance for both unsupervised adaptation as well as fine-tuning with limited target data.
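
Adversarial domain adaptation's core trick is commonly implemented with a gradient reversal layer: a domain classifier learns to tell domains apart, while the reversed gradient pushes the shared encoder toward domain-invariant features. A minimal PyTorch sketch of that layer (a standard construction, not necessarily the paper's exact implementation):

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)          # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing back into the encoder.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage sketch: domain_logits = domain_classifier(grad_reverse(encoder_states))
```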

Abstract

An open challenge in current telecommunication systems, including Skype and existing research systems, is the lack of physical interaction, and consequently a restricted feeling of connection for users. For example, these systems cannot allow remote users to move the pieces of a board game while playing with a local user. We propose that installing a robot arm and teleoperating it can address this problem by enabling remote physical interaction. We compare three methods for remote control to study users' feeling of connection and how it relates to the agency and autonomy of each control scheme.
Publication Details
  • ACM SIGMOD/PODS workshop on Human-In-the-Loop Data Analytics (HILDA)
  • Jun 30, 2019

Abstract

Manufacturing environments require changes in work procedures and settings based on changes in product demand affecting the types of products for production. Resource re-organization and time needed for worker adaptation to such frequent changes can be expensive. For example, for each change, managers in a factory may be required to manually create a list of inventory items to be picked up by workers. Uncertainty in predicting the appropriate pick-up time due to differences in worker-determined routes may make it difficult for managers to generate a fixed schedule for delivery to the assembly line. To address these problems, we propose OPaPi, a human-centric system that improves the efficiency of manufacturing by optimizing parts pick-up routes and schedules. OPaPi leverages frequent pattern mining and the traveling salesman problem solver to suggest rack placement for more efficient routes. The system further employs interactive visualization to incorporate an expert’s domain knowledge and different manufacturing constraints for real-time adaptive decision making.
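
A hedged sketch of the route-optimization ingredient: a nearest-neighbor traveling-salesman heuristic over rack locations on a pick list. The coordinates are toy values, and OPaPi combines this kind of solver with frequent pattern mining and an interactive visual interface rather than using the heuristic alone.

```python
import numpy as np

def pick_route(locations, start=0):
    """locations: (n, 2) rack coordinates; returns a greedy visiting order."""
    todo = set(range(len(locations))) - {start}
    route, cur = [start], start
    while todo:
        # Always walk to the nearest unvisited rack next.
        cur = min(todo, key=lambda j: np.linalg.norm(locations[j] - locations[cur]))
        todo.remove(cur)
        route.append(cur)
    return route

print(pick_route(np.array([[0, 0], [4, 1], [1, 3], [5, 5]])))
```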