Publications

FXPAL publishes in top scientific conferences and journals.

2016

From Single Screen to Dual Screen - a Design Study for a User-Controlled Hypervideo-Based Physiotherapy Training

Publication Details
  • WSICC Workshop at TVX
  • Jun 22, 2016

Abstract

Close
Hypervideo based physiotherapy trainings bear an opportunity to support patients in continuing their training after being released from a rehabilitation clinic. Many exercises require the patient to sit on the floor or a gymnastic ball, lie on a gymnastics mat, or do the exercises in other postures. Using a laptop or tablet with a stand to show the exercises is more helpful than for example just having some drawings on a leaflet. However, it may lead to incorrect execution of the exercises while maintaining eye contact with the screen or require the user to get up and select the next exercise if the devices is positioned for a better view. A dual screen application, where contents are shown on a TV screen and the flow of the video can be controlled from a mobile second device, allows patients to keep their correct posture and the same time view and select contents. In this paper we propose first studies for user interface designs for such apps. Initial paper prototypes are discussed and refined in two focus groups. The results are then presented to a broader range of users in a survey. Three prototypes for the mobile app and one prototype for the TV are identified for future user tests.

Screen Concepts for Multi-Version Hypervideo Authoring

Publication Details
  • WSICC Workshop at TVX
  • Jun 22, 2016

Abstract

Close
The creation of hypervideos usually requires a lot of planning and is time consuming with respect to media content creation. However, when structure and media are put together to author a hypervideo, it may only require minor changes to make the hypervideo available in other languages or for another user group (like beginners versus experts). However, to make the translation of media and all navigation elements of a hypervideo efficient and manageable, the authoring tool needs a GUI that provides a good overview of elements that can be translated and of missing translations. In this work, we propose screen concepts that help authors to provide different versions (for example language and/or experience level) of a hypervideo. We analyzed different variants of GUI elements and evaluated them in a survey. We draw guidelines from the results that can help with the creation of similar systems in the future.
Publication Details
  • International Workshop on Interactive Content Consumption
  • Jun 22, 2016

Abstract

Close
The confluence of technologies such as telepresence, immersive imaging, model based virtual mirror worlds, mobile live streaming, etc. give rise to a capability for people anywhere to view and connect with present or past events nearly anywhere on earth. This capability properly belongs to a public commons, available as a birthright of all humans, and can been seen as part of an evolutionary transition supporting a global collective mind. We describe examples and elements of this capability, and suggest how they can be better integrated through a tool we call TeleViewer and a framework called WorldViews, which supports easy sharing of views as well as connecting of providers and consumers of views all around the world.
Publication Details
  • EICS 2016
  • Jun 21, 2016

Abstract

Close
Most current mobile and wearable devices are equipped with inertial measurement units (IMU) that allow the detection of motion gestures, which can be used for interactive applications. A difficult problem to solve, however, is how to separate ambient motion from an actual motion gesture input. In this work, we explore the use of motion gesture data labeled with gesture execution phases for training supervised learning classifiers for gesture segmentation. We believe that using gesture execution phase data can significantly improve the accuracy of gesture segmentation algorithms. We define gesture execution phases as the start, middle and end of each gesture. Since labeling motion gesture data with gesture execution phase information is work intensive, we used crowd workers to perform the labeling. Using this labeled data set, we trained SVM-based classifiers to segment motion gestures from ambient movement of the device t. We describe initial results that indicate that gesture execution phase can be accurately recognized by SVM classifiers. Our main results show that training gesture segmentation classifiers with phase-labeled data substantially increases the accuracy of gesture segmentation: we achieved a gesture segmentation accuracy of 0.89 for simulated online segmentation using a sliding window approach.

Beyond Actions: Exploring the Discovery of Tactics from User Logs

Publication Details
  • Information Processing & Management
  • Jun 11, 2016

Abstract

Close
Search log analysis has become a common practice to gain insights into user search behaviour, it helps gain an understanding of user needs and preferences, as well as how well a system supports such needs. Currently log analysis is typically focused on the low-level user actions, i.e. logged events such as issued queries and clicked results; and often only a selection of such events are logged and analysed. However, the types of logged events may differ widely from interface to interface, making comparison between systems difficult. Further, analysing a selection of events may lead to conclusions out of context— e.g. the statistics of observed query reformulations may be influenced by the existence of a relevance feedback component. Alternatively, in lab studies user activities can be analysed at a higher level, such as search tactics and strategies, abstracted away from detailed interface implementation. However, the required manual codings that map logged events to higher level interpretations prevent this type of analysis from going large scale. In this paper, we propose a new method for analysing search logs by (semi-)automatically identifying user search tactics from logged events, allowing large scale analysis that is comparable across search systems. We validate the efficiency and effectiveness of the proposed tactic identification method using logs of two reference search systems of different natures: a product search system and a video search system. With the identified tactics, we perform a series of novel log analyses in terms of entropy rate of user search tactic sequences, demonstrating how this type of analysis allows comparisons of user search behaviours across systems of different nature and design. This analysis provides insights not achievable with traditional log analysis.
Publication Details
  • ACM International Conference on Multimedia Retrieval (ICMR)
  • Jun 6, 2016

Abstract

Close
We propose a method for extractive summarization of audiovisual recordings focusing on topic-level segments. We first build a content similarity graph between all segments of all documents in the collection, using word vectors from the transcripts, and then select the most central segments for the summaries. We evaluate the method quantitatively on the AMI Meeting Corpus using gold standard reference summaries and the Rouge metric, and qualitatively on lecture recordings using a novel two-tiered approach with human judges. The results show that our method compares favorably with others in terms of Rouge, and outperforms the baselines for human scores, thus also validating our evaluation protocol.
Publication Details
  • LREC 2016
  • May 23, 2016

Abstract

Close
Many people post about their daily life on social media. These posts may include information about the purchase activity of people, and insights useful to companies can be derived from them: e.g. profile information of a user who mentioned something about their product. As a further advanced analysis, we consider extracting users who are likely to buy a product from the set of users who mentioned that the product is attractive. In this paper, we report our methodology for building a corpus for Twitter user purchase behavior prediction. First, we collected Twitter users who posted a want phrase + product name: e.g. "want a Xperia" as candidate want users, and also candidate bought users in the same way. Then, we asked an annotator to judge whether a candidate user actually bought a product. We also annotated whether tweets randomly sampled from want/bought user timelines are relevant or not to purchase. In this annotation, 58% of want user tweets and 35% of bought user tweets were annotated as relevant. Our data indicate that information embedded in timeline tweets can be used to predict purchase behavior of tweeted products.
Publication Details
  • CHI 2016
  • May 7, 2016

Abstract

Close
The negative effect of lapses during a behavior-change program has been shown to increase the risk of repeated lapses and, ultimately, program abandonment. In this paper, we examine the potential of system-driven lapse management -- supporting users through lapses as part of a behavior-change tool. We first review lessons from domains such as dieting and addiction research and discuss the design space of lapse management. We then explore the value of one approach to lapse management -- the use of "cheat points" as a way to encourage sustained participation. In an online study, we first examine interpretations of progress that was reached through using cheat points. We then present findings from a deployment of lapse management in a two-week field study with 30 participants. Our results demonstrate the potential of this approach to motivate and change users' behavior. We discuss important open questions for the design of future technology-mediated behavior change programs.

Abstract

Close
Taking breaks from work is an essential and universal practice. In this paper, we extend current research on productivity in the workplace to consider the break habits of knowledge workers and explore opportunities of break logging for personal informatics. We report on three studies. Through a survey of 147 U.S.-based knowledge workers, we investigate what activities respondents consider to be breaks from work, and offer an understanding of the benefit workers desire when they take breaks. We then present results from a two-week in-situ diary study with 28 participants in the U.S. who logged 800 breaks, offering insights into the effect of work breaks on productivity. We finally explore the space of information visualization of work breaks and productivity in a third study. We conclude with a discussion of implications for break recommendation systems, availability and interuptibility research, and the quantified workplace.
Publication Details
  • CHI 2016 (Late Breaking Work)
  • May 7, 2016

Abstract

Close
We describe a novel thermal haptic output device, ThermoTouch, that provides a grid of thermal pixels. Unlike previous devices which mainly use Peltier elements for thermal output, ThermoTouch uses liquid cooling and electro-resistive heating to output thermal feedback at arbitrary grid locations. We describe the design of the prototype, highlight advantages and disadvantages of the technique and briefly discuss future improvements and research applications.
Publication Details
  • IEEE Multimedia Magzine
  • May 2, 2016

Abstract

Close
Silicon Valley is home to many of the world’s largest technology corporations, as well as thousands of small startups. Despite the development of other high-tech economic centers throughout the US and around the world, Silicon Valley continues to be a leading hub for high-tech innovation and development, in part because most of its companies and universities are within 20 miles of each other. Given the high concentration of multimedia researchers in Silicon Valley, and the high demand for information exchange, I was able to work with a team of researchers from various companies and organizations to start the Bay Area Multimedia Forum (BAMMF) series back in November 2013.
Publication Details
  • Multimedia Systems Journal
  • Apr 12, 2016

Abstract

Close
With modern technologies, it is possible to create annotated interactive non-linear videos (a form of hypervideo) for the Web. These videos have a non-linear structure of linked scenes to which additional information (other media like images, text, audio, or additional videos) can be added. A variety of user interactions - like in- and between-scene navigation or zooming into additional information - are possible in players for this type of video. Like linear video, quality of experience (QoE) with annotated hypervideo experiences is tied to the temporal consistency of the video stream at the client end - its flow. Despite its interactive complexity, users expect this type of video experience to flow as seamlessly as simple linear video. However, the added hypermedia elements bog playback engines down. Download and cache management systems address the flow issue, but their effectiveness is tied to numerous questions respecting user requirements, computational strategy, and evaluative metrics. In this work, we a) define QoE metrics, b) examine structural and behavioral patterns of interactive annotated non-linear video, c) propose download and cache management algorithms and strategies, d) describe the implementation of an evaluative simulation framework, and e) present the algorithm test results.

Social Media-Based Profiling of Business Locations

Publication Details
  • Fuji Xerox Technical Report
  • Mar 17, 2016

Abstract

Close
We present a method for profiling businesses at specific locations that is based on mining information from social media. The method matches geo-tagged tweets from Twitter against venues from Foursquare to identify the specific business mentioned in a tweet. By linking geo-coordinates to places, the tweets associated with a business, such as a store, can then be used to profile that business. From these venue-located tweets, we create sentiment profiles for each of the stores in a chain. We present the results as heat maps showing how sentiment differs across stores in the same chain and how some chains have more positive sentiment than other chains. We also estimate social group size from photos and create profiles of social group size for businesses. Sample heat maps of these results illustrate how the average social group size can vary across businesses.
Publication Details
  • IUI 2016
  • Mar 7, 2016

Abstract

Close
We describe methods for analyzing and visualizing document metadata to provide insights about collaborations over time. We investigate the use of Latent Dirichlet Allocation (LDA) based topic modeling to compute areas of interest on which people collaborate. The topics are represented in a node-link force directed graph by persistent fixed nodes laid out with multidimensional scaling (MDS), and the people by transient movable nodes. The topics are also analyzed to detect bursts to highlight "hot" topics during a time interval. As the user manipulates a time interval slider, the people nodes and links are dynamically updated. We evaluate the results of LDA topic modeling for the visualization by comparing topic keywords against the submitted keywords from the InfoVis 2004 Contest, and we found that the additional terms provided by LDA-based keyword sets result in improved similarity between a topic keyword set and the documents in a corpus. We extended the InfoVis dataset from 8 to 20 years and collected publication metadata from our lab over a period of 21 years, and created interactive visualizations for exploring these larger datasets.
Publication Details
  • IUI 2016
  • Mar 6, 2016

Abstract

Close
The use of videoconferencing in the workplace has been steadily growing. While multitasking during video conferencing is often necessary, it is also viewed as impolite and sometimes unacceptable. One potential contributor to negative attitudes towards such multitasking is the disrupted sense of eye contact that occurs when an individual shifts their gaze away to another screen, for example, in a dual-monitor setup, common in office settings. We present a system to improve a sense of eye contact over videoconferencing in dual-monitor setups. Our system uses computer vision and desktop activity detection to dynamically choose the camera with the best view of a user's face. We describe two alternative implementations of our system (RGB-only, and a combination of RGB and RGB-D cameras). We then describe results from an online experiment that shows the potential of our approach to significantly improve perceptions of a person's politeness and engagement in the meeting.
Publication Details
  • Proceedings of CSCW 2016
  • Feb 27, 2016

Abstract

Close
This paper presents a detailed examination of factors that affect perceptions of, and attitudes towards multitasking in dyadic video conferencing. We first report findings from interviews with 15 professional users of videoconferencing. We then report results from a controlled online experiment with 397 participants based in the United States. Our results show that the technology used for multitasking has a significant effect on others' assumptions of what secondary activity the multitasker is likely engaged in, and that this assumed activity in turn affects evaluations of politeness and appropriateness. We also describe how different layouts of the video conferencing UI may lead to better or worse ratings of engagement and in turn ratings of polite or impolite behavior. We then propose a model that captures our results and use the model to discuss implications for behavior and for the design of video communication tools.

MixMeetWear: Live Meetings At-a-glance

Publication Details
  • CSCW 2016
  • Feb 27, 2016

Abstract

Close
We present MixMeetWear, a smartwatch application that allows users to maintain awareness of the audio and visual content of a meeting while completing other tasks. Users of the system can listen to the audio of a meeting and also view, zoom, and pan webcam and shared content keyframes of other meeting participants' live streams in real time. Users can also provide input to the meeting via speech-to-text or predefined responses. A study showed that the system is useful for peripheral awareness of some meetings.
Publication Details
  • CSCW 2016
  • Feb 26, 2016

Abstract

Close
Remote meetings are messy. There are an ever-increasing number of support tools available, and, as past work has shown, people will tend to select a subset of those tools to satisfy their own institutional, social, and personal needs. While video tools make it relatively easy to have conversations at a distance, they are less adapted to sharing and archiving multimedia content. In this paper we take a deeper look at how sharing multimedia content occurs before, during, and after distributed meetings. Our findings shed light on the decisions and rationales people use to select from the vast set of tools available to them to prepare for, conduct, and reconcile the results of a remote meeting.
Publication Details
  • Personal and Ubiquitous Computing (Springer)
  • Feb 19, 2016

Abstract

Close
In recent years, there has been an explosion of services that lever- age location to provide users novel and engaging experiences. However, many applications fail to realize their full potential because of limitations in current location technologies. Current frameworks work well outdoors but fare poorly indoors. In this paper we present LoCo, a new framework that can provide highly accurate room-level indoor location. LoCo does not require users to carry specialized location hardware—it uses radios that are present in most contemporary devices and, combined with a boosting classification technique, provides a significant runtime performance improvement. We provide experiments that show the combined radio technique can achieve accuracy that improves on current state-of-the-art Wi-Fi only techniques. LoCo is designed to be easily deployed within an environment and readily leveraged by application developers. We believe LoCo’s high accuracy and accessibility can drive a new wave of location-driven applications and services.
Publication Details
  • AAAI
  • Feb 12, 2016

Abstract

Close
Image localization is important for marketing and recommendation of local business; however, the level of granularity is still a critical issue. Given a consumer photo and its rough GPS information, we are interested in extracting the fine-grained location information (i.e. business venues) of the image. To this end, we propose a novel framework for business venue recognition. The framework mainly contains three parts. First, business aware visual concept discovery: we mine a set of concepts that are useful for business venue recognition based on three guidelines including business-awareness, visually detectable, and discriminative power. Second, business-aware concept detection by convolutional neural networks (BA-CNN): we pro- pose a new network architecture that can extract semantic concept features from input image. Third, multimodal business venue recognition: we extend visually detected concepts to multimodal feature representations that allow a test image to be associated with business reviews and images from social media for business venue recognition. The experiments results show the visual concepts detected by BA-CNN can achieve up to 22.5% relative improvement for business venue recognition compared to the state-of-the-art convolutional neural network features. Experiments also show that by leveraging multimodal information from social media we can further boost the performance, especially in the case when the database images belonging to each business venue are scarce.
Publication Details
  • MMM 2016
  • Jan 4, 2016

Abstract

Close
Hypervideos yield to different challenges in the area of navigation due to their underlying graph structure. Especially when used on tablets or by older people, a lack of clarity may lead to confusion and rejection of this type of medium. To avoid confusion, the hypervideo can be extended with a well known table of contents, which needs to be created separately by the authors due to an underlying graph structure. In this work, we present an extended presentation of a table of contents for hypervideos on mobile devices. The design was tested in a real world medical training scenario with the target group of people older than 45 which is the main target group of these applications. This user group is a particular challenge since they sometimes have limited experience in the use of mobile devices and physical deficiencies with growing age. Our user interface was designed in three steps. The findings of an expert group and a survey were used to create two different prototypical versions of the display, which were then tested against each other in a user test. This test revealed that a divided view is desired. The table of contents in an easy-to-touch version should be on the left side and previews of scenes should be on the right side of the view. These findings were implemented in the existing SIVA HTML5 open source player and tested with a second group of users. This test only lead to minor changes in the GUI.
2015
Publication Details
  • ISM 2015
  • Dec 14, 2015

Abstract

Close
Indoor localization is challenging in terms of both the accuracy and possible using scenarios. In this paper, we introduce the design and implementation of a toy car localization and navigation system, which demonstrates that a projected light based localization technique allows multiple devices to know and exchange their fine-grained location information in an indoor environment. The projected light consists of a sequence of gray code images which assigns each pixel in the projection area a unique gray code so as to distinguish their coordination. The light sensors installed on the toy car and the potential “passenger” receive the light stream from the projected light stream, based on which their locations are computed. The toy car then utilizes A* algorithm to plan the route based on its own location, orientation, the target’s location and the map of available “roads”. The fast speed of localization enables the toy car to adjust its own orientation while “driving” and keep itself on “roads”. The toy car system demonstrates that the localization technique can power other applications that require fine-grained location information of multiple objects simultaneously.
Publication Details
  • MM Commons Workshop co-located with ACM Multimedia 2015.
  • Oct 30, 2015

Abstract

Close
In this paper, we analyze the association between a social media user's photo content and their interests. Visual content of photos is analyzed using state-of-the-art deep learning based automatic concept recognition. An aggregate visual concept signature is thereby computed for each user. User tags manually applied to their photos are also used to construct a tf-idf based signature per user. We also obtain social groups that users join to represent their social interests. In an effort to compare the visual-based versus tag-based user profiles with social interests, we compare corresponding similarity matrices with a reference similarity matrix based on users' group memberships. A random baseline is also included that groups users by random sampling while preserving the actual group sizes. A difference metric is proposed and it is shown that the combination of visual and text features better approximates the group-based similarity matrix than either modality individually. We also validate the visual analysis against the reference inter-user similarity using the Spearman rank correlation coefficient. Finally we cluster users by their visual signatures and rank clusters using a cluster uniqueness criteria.

Inferring Crowd-Sourced Venues for Tweets

Publication Details
  • IEEE BigData 2015
  • Oct 29, 2015

Abstract

Close
Knowing the geo-located venue of a tweet can facilitate better understanding of a user's geographic context, allowing apps to more precisely present information, recommend services, and target advertisements. However, due to privacy concerns, few users choose to enable geotagging of their tweets resulting in a small percentage of tweets being geotagged; furthermore, even if the geo-coordinates are available, the closest venue to the geo-location may be incorrect. In this paper, we present a method for providing a ranked list of geo-located venues for a non-geotagged tweet, which simultaneously indicates the venue name and the geo-location at a very fine-grained granularity. In our proposed method for Venue Inference for Tweets ({\VIT}), we construct a heterogeneous social network in order to analyze the embedded social relations, and leverage available but limited geographic data to estimate the geo-located venue of tweets. A single classifier is trained to predict the probability of a tweet and a geo-located venue being linked, rather than training a separate model for each venue. We examine the performance of four types of social relation features and three types of geographic features embedded in a social network when predicting whether a tweet and a venue are linked, with a best accuracy of over 88%. We use the classifier probability estimates to rank the predicted geo-located venues of a non-geotagged tweet from over 19k possibilities, and observed an average top-5 accuracy of 29%.
Publication Details
  • ACM MM
  • Oct 26, 2015

Abstract

Close
Establishing common ground is one of the key problems for any form of communication. The problem is particularly pronounced in remote meetings, in which participants can easily lose track of the details of dialogue for any number of reasons. In this demo we present a web-based tool, MixMeet, that allows teleconferencing participants to search the contents of live meetings so they can rapidly retrieve previously shared content to get on the same page, correct a misunderstanding, or discuss a new idea.

Abstract

Close
New technology comes about in a number of different ways. It may come from advances in scientific research, through new combinations of existing technology, or by simply from imagining what might be possible in the future. This video describes the evolution of Tabletop Telepresence, a system for remote collaboration through desktop videoconferencing combined with a digital desk. Tabletop Telepresence provides a means to share paper documents between remote desktops, interact with documents and request services (such as translation), and communicate with a remote person through a teleconference. It was made possible by combining advances in camera/projector technology that enable a fully functional digital desk, embodied telepresence in video conferencing and concept art that imagines future workstyles.
Publication Details
  • ACM Multimedia
  • Oct 18, 2015

Abstract

Close
While synchronous meetings are an important part of collaboration, it is not always possible for all stakeholders to meet at the same time. We created the concept of hypermeetings to support meetings with asynchronous attendance. Hypermeetings consist of a chain of video-recorded meetings with hyperlinks for navigating through the video content. HyperMeeting supports the synchronized viewing of prior meetings during a videoconference. Natural viewing behavior such as pausing generates hyperlinks between the previously recorded meetings and the current video recording. During playback, automatic link-following guided by playback plans present the relevant content to users. Playback plans take into account the user's meeting attendance and viewing history and match them with features such as speaker segmentation. A user study showed that participants found hyperlinks useful but did not always understand where they would take them. The study results provide a good basis for future system improvements.
Publication Details
  • International Journal of Semantic Computing
  • Sep 15, 2015

Abstract

Close
A localization system is a coordinate system for describing the world, organizing the world, and controlling the world. Without a coordinate system, we cannot specify the world in mathematical forms; we cannot regulate processes that may involve spatial collisions; we cannot even automate a robot for physical actions. This paper provides an overview of indoor localization technologies, popular models for extracting semantics from location data, approaches for associating semantic information and location data, and applications that may be enabled with location semantics. To make the presentation easy to understand, we will use a museum scenario to explain pros and cons of different technologies and models. More specifically, we will first explore users' needs in a museum scenario. Based on these needs, we will then discuss advantages and disadvantages of using different localization technologies to meet these needs. From these discussions, we can highlight gaps between real application requirements and existing technologies, and point out promising localization research directions. Similarly, we will also discuss context information required by different applications and explore models and ontologies for connecting users, objects, and environment factors with semantics. By identifying gaps between various models and real application requirements, we can draw a road map for future location semantics research.

Creating Gaze Annotations in Head Mounted displays

Publication Details
  • International Symposium on Wearable Computers (ISWC)
  • Sep 8, 2015

Abstract

Close
To facilitate distributed communication in mobile settings, we developed a system for creating and sharing gaze anno-tations using head mounted displays, such as Google Glass. Gaze annotations make it possible to point out objects of interest within an image and add a verbal description to it. To create an annotation, the user simply looks at an object of interest in the image and speaks out the information connected to the object. The gaze location is recorded and inserted as a gaze marker and the voice is transcribed using speech recognition. After an annotation has been created, it can be shared with another person. We performed a user study that showed that users experienced that gaze annota-tions add precision and expressiveness compared to an annotation to the whole image.
Publication Details
  • DocEng 2015
  • Sep 8, 2015

Abstract

Close
We present a novel system for detecting and capturing paper documents on a tabletop using a 4K video camera mounted overhead on pan-tilt servos. Our automated system first finds paper documents on a cluttered tabletop based on a text probability map, and then takes a sequence of high-resolution frames of the located document to reconstruct a high quality and fronto-parallel document page image. The quality of the resulting images enables OCR processing on the whole page. We performed a preliminary evaluation on a small set of 10 document pages and our proposed system achieved 98% accuracy with the open source Tesseract OCR engine.
Publication Details
  • DocEng 2015
  • Sep 8, 2015

Abstract

Close
Web-based tools for remote collaboration are quickly becoming an established element of the modern workplace. During live meetings, people share web sites, edit presentation slides, and share code editors. It is common for participants to refer to previously spoken or shared content in the course of synchronous distributed collaboration. A simple approach is to index with Optical Character Recognition (OCR) the video frames, or key-frames, being shared and let user retrieve them with text queries. Here we show that a complementary approach is to look at the actions users take inside the live document streams. Based on observations of real meetings, we focus on two important signals: text editing and mouse cursor motion. We describe the detection of text and cursor motion, their implementation in our WebRTC-based system, and how users are better able to search live documents during a meeting based on these detected and indexed actions.

Abstract

Close
Location-enabled applications now permeate the mobile computing landscape. As technologies like Bluetooth Low Energy (BLE) and Apple's iBeacon protocols begin to see widespread adoption, we will no doubt see a proliferation of indoor location enabled application experiences. While not essential to each of these applications, many will require that the location of the device be true and verifiable. In this paper, we present LocAssure, a new framework for trusted indoor location estimation. The system leverages existing technologies like BLE and iBeacons, making the solution practical and compatible with technologies that are already in use today. In this work, we describe our system, situate it within a broad location assurance taxonomy, describe the protocols that enable trusted localization in our system, and provide an analysis of early deployment and use characteristics. Through developer APIs, LocAssure can provide critical security support for a broad range of indoor location applications.

Assistive Image Comment Robot - A Novel Mid-Level Concept-Based Representation

Publication Details
  • IEEE Transactions on Affective Computing
  • Aug 30, 2015

Abstract

Close
We present a general framework and working system for predicting likely affective responses of the viewers in the social media environment after an image is posted online. Our approach emphasizes a mid-level concept representation, in which intended affects of the image publisher is characterized by a large pool of visual concepts (termed PACs) detected from image content directly instead of textual metadata, evoked viewer affects are represented by concepts (termed VACs) mined from online comments, and statistical methods are used to model the correlations among these two types of concepts. We demonstrate the utilities of such approaches by developing an end-to-end Assistive Comment Robot application, which further includes components for multi-sentence comment generation, interactive interfaces, and relevance feedback functions. Through user studies, we showed machine suggested comments were accepted by users for online posting in 90% of completed user sessions, while very favorable results were also observed in various dimensions (plausibility, preference, and realism) when assessing the quality of the generated image comments.
Publication Details
  • MobileHCI 2015
  • Aug 24, 2015

Abstract

Close
In this paper we report findings from two user studies that explore the problem of establishing common viewpoint in the context of a wearable telepresence system. In our first study, we assessed the ability of a local person (the guide) to identify the view orientation of the remote person by looking at the physical pose of the telepresence device. In the follow-up study, we explored visual feedback methods for communicating the relative viewpoints of the remote user and the guide via a head-mounted display. Our results show that actively observing the pose of the device is useful for viewpoint estimation. However, in the case of telepresence devices without physical directional affordances, a live video feed may yield comparable results. Lastly, more abstract visualizations lead to significantly longer recognition times, but may be necessary in more complex environments.
Publication Details
  • IEEE Pervasive Computing
  • Jul 1, 2015

Abstract

Close
Tutorials are one of the most fundamental means of conveying knowledge. In this paper, we present a suite of applications that allow users to combine different types of media captured from handheld, standalone, or wearable devices to create multimedia tutorials. We conducted a study comparing standalone (camera on tripod) versus wearable capture (Google Glass). The results show that tutorial authors have a slight preference for wearable capture devices, especially when recording activities involving larger objects.

POLI: MOBILE AR BY HEARING POSITION FROM LIGHT

Publication Details
  • ICME 2015 Mobile Multimedia Workshop
  • Jun 29, 2015

Abstract

Close
Connecting digital information to physical objects can enrich their content and make them more vivid. Traditional augmented reality techniques reach this goal by augmenting physical objects or their surroundings with various markers and typically require end users to wear additional devices to explore the augmented content. In this paper, we propose POLI, which allows a system administrator to author digital content with his/her mobile device while allows end-users to explore the authored content with their mobile devices. POLI provides three novel interactive approaches for authoring digital content. It does not change the nature appearances of physical objects and does not require users to wear any additional hardware on their bodies.
Publication Details
  • Presented in "Everyday Telepresence" workshop at CHI 2015
  • Apr 18, 2015

Abstract

Close
As video-mediated communication reaches broad adoption, improving immersion and social interaction are important areas of focus in the design of tools for exploration and work-based communication. Here we present three threads of research focused on developing new ways of enabling exploration of a remote environment and interacting with the people and artifacts therein.
Publication Details
  • Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  • Apr 18, 2015

Abstract

Close
Edge targets, such as buttons or menus along the edge of a screen, are known to afford fast acquisition performance in desktop mousing environments. As the popularity of touch based devices continues to grow, understanding the affordances of edge targets on touchscreen is needed. This paper describes results from two controlled experiments that examine in detail the effect of edge targets on performance in touch devices. Our results shows that on touch devices, a target's proximity to the edge has a significant negative effect on reaction time. We examine the effect in detail and explore mitigating factors. We discuss potential explanations for the effect and propose implications for the design of efficient interfaces for touch devices.
Publication Details
  • CHI 2015 (Extended Abstracts)
  • Apr 18, 2015

Abstract

Close
We present our ongoing research on automatic segmentation of motion gestures tracked by IMUs. We postulate that by recognizing gesture execution phases from motion data that we may be able to auto-delimit user gesture entries. We demonstrate that machine learning classifiers can be trained to recognize three distinct phases of gesture entry: the start, middle and end of a gesture motion. We further demonstrate that this type of classification can be done at the level of individual gestures. Furthermore, we describe how we captured a new data set for data exploration and discuss a tool we developed to allow manual annotations of gesture phase information. Initial results we obtained using the new data set annotated with our tool show a precision of 0.95 for recognition of the gesture phase and a precision of 0.93 for simultaneous recognition of the gesture phase and the gesture type.
Publication Details
  • CHI 2015
  • Apr 18, 2015

Abstract

Close
Websites can record individual users' activities and display them in a variety of ways. There is a tradeoff between detail and abstraction in visualization, especially when the amount of content increases and becomes more difficult to process. We conducted an experiment on Mechanical Turk varying the quality, detail, and visual presentation of information about an individual's past work to see how these design features affected perceptions of the worker. We found that providing detail in the display through text increased processing time and led to less positive evaluations. Visually abstract displays required less processing time but decreased confidence in evaluation. This suggests that different design parameters may engender differing psychological processes that influence reactions towards an unknown person.
Publication Details
  • CSCW 2015
  • Mar 14, 2015

Abstract

Close
Collaboration Map (CoMap) is an interactive visualization tool showing temporal changes of small group collaborations. As dynamic entities, collaboration groups have flexible features such as people involved, areas of work, and timings. CoMap shows a graph of collaborations during user-adjustable periods, providing overviews of collaborations' dynamic features. We demonstrate CoMap with a co-authorship dataset extracted from DBLP to visualize 587 publications by 29 researchers at a research organization.

Abstract

Close
In this paper, we report findings from a study that compared basic video-conferencing, emergent kinetic video-conferencing techniques, and face-to-face meetings. In our study, remote and co-located participants worked together in groups of three. We show, in agreement with prior literature, the strong adverse impact of being remote on participation-levels. We also show that local and remote participants perceived differently their own contributions and others. Extending prior work, we also show that local participants exhibited significantly more overlapping speech with remote participants who used an embodied proxy, than with remote participants in basic-video conferencing (and at a rate similar to overlapping speech for co-located groups). We also describe differences in how the technologies were used to follow conversation. We discuss how these findings extend our understanding of the promise and potential limitations of embodied video-conferencing solutions.
Publication Details
  • CSCW 2015
  • Mar 14, 2015

Abstract

Close
In a variety of peer production settings, from Wikipedia to open source software development to crowdsourcing, individuals may encounter, edit, or review the work of unknown others. Typically this is done without much context to the person's past behavior or performance. To understand how exposure to an unknown individual's activity history influences attitudes and behaviors, we conducted an online experiment on Mechanical Turk varying the content, quality, and presentation of information about another Turker's work history. Surprisingly, negative work history did not lead to negative outcomes, but in contrast, a positive work history led to positive initial impressions that persisted in the face of contrary information. This work provides insight into the impact of activity history design factors on psychological and behavioral outcomes that can be of use in other related settings.
Publication Details
  • Human-Robot Interaction (HRI) 2015
  • Mar 2, 2015

Abstract

Close
Our research focuses on improving the effectiveness and usability of driving mobile telepresence robots by increasing the user's sense of immersion during the navigation task. To this end we developed a robot platform that allows immersive navigation using head-tracked stereoscopic video and a HMD. We present the result of an initial user study that compares System Usability Scale (SUS) ratings of a robot teleoperation task using head-tracked stereo vision with a baseline fixed video feed and the effect of a low or high placement of the camera(s). Our results show significantly higher ratings for the fixed video condition and no effect of the camera placement. Future work will focus on examining the reasons for the lower ratings of stereo video and and also exploring further visual navigation interfaces.
Publication Details
  • The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15)
  • Jan 25, 2015

Abstract

Close
Name of an identity is strongly influenced by his/her cultural background such as gender and ethnicity, both vital attributes for user profiling, attribute-based retrieval, etc. Typically, the associations between names and attributes (e.g., people named "Amy" are mostly females) are annotated manually or provided by the census data of governments. We propose to associate a name and its likely demographic attributes by exploiting click-throughs between name queries and images with automatically detected facial attributes. This is the first work attempting to translate an abstract name to demographic attributes in visual-data-driven manner, and it is adaptive to incremental data, more countries and even unseen names (the names out of click-through data) without additional manual labels. In the experiments, the automatic name-attribute associations can help gender inference with competitive accuracy by using manual labeling. It also benefits profiling social media users and keyword-based face image retrieval, especially for contributing 12% relative improvement of accuracy in adapting to unseen names.
2014

Synchronizing Web Documents with Style

Publication Details
  • ACM Brazilian Symposium on Multimedia and the Web
  • Nov 17, 2014

Abstract

Close
In this paper we report on our efforts to define a set of document extensions to Cascading Style Sheets (CSS) that allow for structured timing and synchronization of elements within a Web page. Our work considers the scenario in which the temporal structure can be decoupled from the content of the Web page in a similar way that CSS does with the layout, colors and fonts. Based on the SMIL (Synchronized Multimedia Integration Language) temporal model we propose CSS document extensions and discuss the design and implementation of a proof of concept that realizes our contributions. As HTML5 seems to move away from technologies like Flash and XML (eXtensible Markup Language), we believe our approach provides a flexible declarative solution to specify rich media experiences that is more aligned with current Web practices.
Publication Details
  • ACM International Workshop on Understanding and Modeling Multiparty, Multimodal Interactions (UMMMI)
  • Nov 15, 2014

Abstract

Close
In this paper we discuss communication problems in video-mediated small group discussions. We present results from a study in which ad-hoc groups of five people, with moderator, solved a quiz question-select answer style task over a video-conferencing system. The task was performed under different delay conditions, of up to 2000ms additional one-way delay. Even with a delay up to 2000ms, we could not observe any effect on the achieved quiz scores. In contrast, the subjective satisfaction was severely negatively affected. While we would have suspected a clear conversational breakdown with such a high delay, groups adapted their communication style and thus still managed to solve the task. This is, most groups decided to switch to a more explicit turn-taking scheme. We argue that future video-conferencing systems can provide a better experience if they are aware of the current conversational situation and can provide compensation mechanisms. Thus we provide an overview of what cues are relevant and how they are affected by the video-conferencing system and how recent advancements in computational social science can be leveraged. Further, we provide an analysis of the suitability of normal webcam data for such cue recognition. Based on our observations, we suggest strategies that can be implemented to alleviate the problems.
Publication Details
  • ACM International Workshop on Socially-aware Multimedia (SAM)
  • Nov 6, 2014

Abstract

Close
As commercial, off-the-shelf, services enable people to easily connect with friends and relatives, video-mediated communication is filtering into our daily activities. With the proliferation of broadband and powerful devices, multi-party gatherings are becoming a reality in home environments. With the technical infrastructure in place and has been accepted by a large user base, researchers and system designers are concentrating on understanding and optimizing the Quality of Experience (QoE) for participants. Theoretical foundations for QoE have identified three crucial factors for understanding the impact on the individual’s perception: system, context, and user. While most of the current research tends to focus on the system factors (delay, bandwidth, resolution), in this paper we offer a more complete analysis that takes into consideration context and user factors. In particular, we investigate the influence of delay (constant system factor) in the QoE of multi-party conversations. Regarding the context, we extend the typical one-to-one condition to explore conversations between small groups (up to five people). In terms of user factors, we take into account conversation analysis, turn-taking and role-theory, for better understanding the impact of different user profiles. Our investigation allows us to report a detailed analysis on how delay influences the QoE, concluding that the actual interactivity pattern of each participant in the conversation results on different noticeability thresholds of delays. Such results have a direct impact on how we should design and construct video-communication services for multi-party conversations, where user activity should be considered as a prime adaptation and optimization parameter.

Multi-modal Language Models for Lecture Video Retrieval

Publication Details
  • ACM Multimedia 2014
  • Nov 2, 2014

Abstract

Close
We propose Multi-modal Language Models (MLMs), which adapt latent variable models for text document analysis to modeling co-occurrence relationships in multi-modal data. In this paper, we focus on the application of MLMs to indexing slide and spoken text associated with lecture videos, and subsequently employ a multi-modal probabilistic ranking function for lecture video retrieval. The MLM achieves highly competitive results against well established retrieval methods such as the Vector Space Model and Probabilistic Latent Semantic Analysis. Retrieval performance with MLMs is also shown to improve with the quality of the available extracted spoken text.

Social Media-based Profiling of Store Locations

Publication Details
  • ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia
  • Nov 2, 2014

Abstract

Close
We present a method for profiling businesses at specific locations that is based on mining information from social media. The method matches geo-tagged tweets from Twitter against venues from Foursquare to identify the specific business mentioned in a tweet. By linking geo-coordinates to places, the tweets associated with a business, such as a store, can then be used to profile that business. We used a sentiment estimator developed for tweets to create sentiment profiles of the stores in a chain, computing the average sentiment of tweets associated with each store. We present the results as heatmaps which show how sentiment differs across stores in the same chain and how some chains have more positive sentiment than other chains. We also created profiles of social group size for businesses and show sample heatmaps illustrating how the size of a social group can vary.