Publications


2016

Automatic Geographic Metadata Correction for Sensor-Rich Video Sequences.

Publication Details
  • ACM SIGSPATIAL GIS 2016
  • Nov 2, 2016

Abstract

Videos recorded with current mobile devices are increasingly geotagged at fine granularity and used in various location-based applications and services. However, the raw sensor data collected are often noisy, resulting in inaccurate subsequent geospatial analysis. In this study, we focus on the challenging correction of compass readings and present an automatic approach to reduce these metadata errors. Given the small geo-distance between consecutive video frames, image-based localization does not work due to the high ambiguity in the depth reconstruction of the scene. As an alternative, we collect geographic context from OpenStreetMap and estimate the absolute viewing direction by comparing the image scene to world projections obtained with different external camera parameters. To design a comprehensive model, we further incorporate smooth approximation and feature-based rotation estimation when formulating the error terms. Experimental results show that our proposed pyramid-based method outperforms its competitors and reduces orientation errors by an average of 58.8%. Hence, for downstream applications, improved results can be obtained with these more accurate geo-metadata. To illustrate, we present the performance gain in landmark retrieval and tag suggestion obtained by utilizing the accuracy-enhanced geo-metadata.
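As a rough illustration of the idea (not the paper's actual formulation: the candidate grid, the weighting, and the `map_score` function standing in for the OpenStreetMap projection comparison are all invented for this sketch), a greedy heading correction that balances a geographic-context term against a smoothness term could look like:

```python
def correct_headings(raw_headings, map_score, w_smooth=0.5):
    """Toy greedy heading correction: for each frame, pick the candidate
    heading that best fits the geographic context (map_score, a stand-in
    for comparing the scene to map projections) while staying close to
    the previous corrected heading (smooth approximation)."""
    corrected = []
    for i, _ in enumerate(raw_headings):
        prev = corrected[-1] if corrected else raw_headings[i]

        def cost(h):
            # Angular difference to the previous heading, wrapped to [-180, 180].
            smooth = abs((h - prev + 180) % 360 - 180) / 180
            return -map_score(i, h) + w_smooth * smooth

        corrected.append(min(range(0, 360, 5), key=cost))
    return corrected

# Hypothetical "true" headings and a map term that rewards closeness to them.
true = [90, 90, 100]
map_score = lambda i, h: -abs(h - true[i]) / 180
assert correct_headings([60, 130, 80], map_score) == [90, 90, 100]
```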

A General Feature-based Map Matching Framework with Trajectory Simplification.

Publication Details
  • 7th ACM SIGSPATIAL International Workshop on GeoStreaming (IWGS 2016)
  • Oct 31, 2016

Abstract

Accurate map matching has been a fundamental but challenging problem that has drawn great research attention in recent years. It aims to reduce the uncertainty in a trajectory by matching the GPS points to the road network on a digital map. Most existing work has focused on estimating the likelihood of a candidate path based on the GPS observations, while neglecting to model the probability of a route choice from the perspective of drivers. Here we propose a novel feature-based map matching algorithm that estimates the cost of a candidate path based on both GPS observations and human factors. Taking human factors into consideration is especially important when dealing with low-sampling-rate data, where most of the movement details are lost. Additionally, we simultaneously analyze a subsequence of coherent GPS points by utilizing a new segment-based probabilistic map matching strategy, which is less susceptible to the noisiness of the positioning data. We have evaluated the proposed approach on a public large-scale GPS dataset, which consists of 100 trajectories distributed all over the world. The experimental results show that our method is robust to sparse data with large sampling intervals (e.g., 60 s to 300 s) and challenging track features (e.g., U-turns and loops). Compared with two state-of-the-art map matching algorithms, our method substantially reduces the route mismatch error by 6.4% to 32.3% and obtains the best map matching results in all the different combinations of sampling rates and challenging features.
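A minimal sketch of the core idea: score candidate paths by combining a GPS observation term with a route-choice (human-factor) term. The cost function, weights, and distance helpers below are illustrative assumptions, not the paper's actual features:

```python
import math

def euclid(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def path_length(path):
    return sum(euclid(path[i], path[i + 1]) for i in range(len(path) - 1))

def dist_to_path(point, path):
    # Distance from a GPS fix to the nearest path vertex
    # (a real matcher would project onto road segments).
    return min(euclid(point, v) for v in path)

def path_cost(gps_points, candidate_path, w_obs=1.0, w_route=0.5):
    # Observation term: how well the GPS fixes agree with the path.
    obs = sum(dist_to_path(p, candidate_path) for p in gps_points) / len(gps_points)
    # Route-choice term: a crude human-factor stand-in that
    # penalizes detours relative to the direct distance.
    direct = euclid(gps_points[0], gps_points[-1])
    route = path_length(candidate_path) / max(direct, 1e-9)
    return w_obs * obs + w_route * route

# The cheapest candidate wins; here the straight route beats the detour.
trace = [(0, 0), (5, 1), (10, 0)]
straight = [(0, 0), (10, 0)]
detour = [(0, 0), (5, 6), (10, 0)]
best = min([straight, detour], key=lambda p: path_cost(trace, p))
assert best == [(0, 0), (10, 0)]
```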
Publication Details
  • ENCYCLOPEDIA WITH SEMANTIC COMPUTING
  • Oct 31, 2016

Abstract

Improvements in sensor and wireless network technologies enable accurate, automated, and instant determination and dissemination of a user's or object's position. The new enabler of location-based services (LBSs), apart from the current ubiquitous networking infrastructure, is the enrichment of the different systems with semantic information, such as time, location, individual capability, preference, and more. Such semantically enriched system modeling aims at developing applications with enhanced functionality and advanced reasoning capabilities. These systems are able to deliver more personalized services to users by applying domain knowledge with advanced reasoning mechanisms, and provide solutions to problems that were otherwise infeasible. This approach also takes users' preferences and place properties into consideration, which can be utilized to achieve a comprehensive range of personalized services, such as advertising, recommendations, or polling. This paper provides an overview of indoor localization technologies, popular models for extracting semantics from location data, approaches for associating semantic information and location data, and applications that may be enabled with location semantics. To make the presentation easy to understand, we will use a museum scenario to explain the pros and cons of different technologies and models. More specifically, we will first explore users' needs in a museum scenario. Based on these needs, we will then discuss advantages and disadvantages of using different localization technologies to meet these needs. From these discussions, we can highlight gaps between real application requirements and existing technologies, and point out promising localization research directions. By identifying gaps between various models and real application requirements, we can draw a road map for future location semantics research.
Publication Details
  • UIST 2016 (Demo)
  • Oct 16, 2016

Abstract

We propose a robust pointing detection method with a virtual shadow representation for interacting with a public display. Using a depth camera, the user's shadow is generated by a model with an angled virtual sun light, and the shadow's nearest point to the display is detected as a pointer. The shadow's position rises as the user walks closer, which conveys the correct distance for controlling the pointer and offers access to the higher areas of the display.
Publication Details
  • ACM MM
  • Oct 15, 2016

Abstract

The proliferation of workplace multimedia collaboration applications has meant on one hand more opportunities for group work but on the other more data locked away in proprietary interfaces. We are developing new tools to capture and access multimedia content from any source. In this demo, we focus primarily on new methods that allow users to rapidly reconstitute, enhance, and share document-based information.

Second Screen Hypervideo-Based Physiotherapy Training

Publication Details
  • Multimedia for personal health and health care – MMHealth 2016 @ ACM Multimedia 2016
  • Oct 15, 2016

Abstract

Adapting to personal needs and supporting correct posture are important in physiotherapy training. In this demo, we show a dual screen application (handheld and TV) that allows patients to view hypervideo training programs. Designed to guide their daily exercises, these programs can be adapted to daily needs. The dual screen concept offers the positional flexibility missing in single screen solutions.

A Dual Screen Concept for User-Controlled Hypervideo-Based Physiotherapy Training

Publication Details
  • Multimedia for personal health and health care – MMHealth 2016 @ ACM Multimedia 2016
  • Oct 15, 2016

Abstract

Dual screen concepts for hypervideo-based physiotherapy training are important in healthcare settings, but existing applications often cannot be adapted to personal needs and do not support correct posture. In this paper, we describe the design and implementation of a dual screen application (handheld and TV) that allows patients to view hypervideos designed to help them correctly perform their exercises. This approach lets patients adapt their training to their daily needs and their overall training progress. We evaluated this prototypical implementation in a user test with post-operative care prostate cancer patients. From our results, we derived design recommendations for dual screen physical training hypervideo applications.

Hypervideo Production Using Crowdsourced Youtube Videos

Publication Details
  • ACM Multimedia 2016
  • Oct 15, 2016

Abstract

Various systems exist nowadays for the creation of hypervideos. However, the creation of the video scenes that are combined into a hypervideo is a tedious and time-consuming job. At the same time, huge video databases like YouTube already provide rich sources of video material. Yet it is not legal to download and re-purpose the videos from there, which calls for a solution that links whole videos or parts of videos and plays them from the platform in an embedded player. This work presents the SIVA Web Producer, a Chrome extension for the creation of hypervideos consisting of scenes from YouTube videos. After creating a project, the Chrome extension allows authors to import YouTube videos, or parts thereof, as video clips. These can then be linked into a scene graph. A preview is provided, and finalized videos can be published on the SIVA Web Portal.
Publication Details
  • Document Engineering DocEng 2016
  • Sep 13, 2016

Abstract

In this paper we describe DocuGram, a novel tool to capture and share documents from any application. As users scroll through pages of their document inside the native application (Word, Google Docs, web browser), the system captures and analyzes the video frames in real time and reconstitutes the original document pages into an easy-to-view HTML-based representation. In addition to regenerating the document pages, a DocuGram also includes the interactions users had over them, e.g., mouse motions and voice comments. A DocuGram acts as a modern copy machine, allowing users to copy and share any document from any application.
Publication Details
  • Mobile HCI 2016
  • Sep 6, 2016

Abstract

Most teleconferencing tools treat users in distributed meetings monolithically: all participants are meant to be connected to one another in more or less the same manner. In reality, though, people connect to meetings in all manner of different contexts, sometimes sitting in front of a laptop or tablet giving their full attention, but at other times mobile and involved in other tasks, or as a liminal participant in a larger group meeting. In this paper we present the design and evaluation of two applications, Penny and MeetingMate, designed to help users in non-standard contexts participate in meetings.
Publication Details
  • CBRecSys: Workshop on New Trends in Content-Based Recommender Systems at ACM Recommender Systems Conference
  • Sep 2, 2016

Abstract

The abundance of data posted to Twitter enables companies to extract useful information, such as Twitter users who are dissatisfied with a product. We endeavor to determine which Twitter users are potential customers for companies and would be receptive to product recommendations through the language they use in tweets after mentioning a product of interest. With Twitter's API, we collected tweets from users who tweeted about mobile devices or cameras. An expert annotator determined whether each tweet was relevant to customer purchase behavior and whether a user, based on their tweets, eventually bought the product. For the relevance task, among four models, a feed-forward neural network yielded the best cross-validation accuracy of over 80% per product. For customer purchase prediction of a product, we observed improved performance with the use of sequential input of tweets to recurrent models, with an LSTM model being best; we also observed the use of relevance predictions in our model to be more effective with less powerful RNNs and on more difficult tasks.
Publication Details
  • Ro-Man 2016
  • Aug 26, 2016

Abstract

Two related challenges with current teleoperated robotic systems are the lack of peripheral vision and awareness, and the difficulty or tedium of navigating through remote spaces. We address these challenges by providing an interface with a focus plus context (F+C) view of the robot's location, in which the user can navigate simply by looking where they want to go and clicking or drawing a path on the view to indicate the desired trajectory or destination. The F+C view provides an undistorted, perspectively correct central region surrounded by a wide-field-of-view peripheral portion, and avoids the need for separate views. The navigation method is direct and intuitive in comparison to keyboard- or joystick-based navigation, which requires the user to be in a control loop as the robot moves. Both the F+C views and the direct click navigation were evaluated in a preliminary user study.
Publication Details
  • Ro-Man 2016
  • Aug 26, 2016

Abstract

Mobile Telepresence Robots (MTRs) are an emerging technology that extends the functionality of telepresence systems by adding mobility. MTRs today, however, rely on stationary imaging systems such as a single narrow-view camera for vision, which can lead to reduced operator performance due to view-related deficiencies in situational awareness. We therefore developed an improved imaging and viewing platform that allows immersive telepresence using a Head Mounted Device (HMD) with head-tracked mono and stereoscopic video. Using a remote collaboration task to ground our research, we examine the effectiveness of head-tracked HMD systems in comparison to a baseline monitor-based system. We performed a user study where participants were divided into three groups: a fixed-camera monitor-based baseline condition (without HMD), an HMD with a head-tracked 2D camera, and an HMD with a head-tracked stereo camera. Results showed that the use of an HMD reduces task error rates and improves perceived collaborative success and quality of view, compared to the baseline condition. No major difference was found, however, between the stereo and 2D camera conditions for participants wearing an HMD.

Tweetviz: Visualizing Tweets for Business Intelligence

Publication Details
  • SIGIR 2016
  • Jul 18, 2016

Abstract

Social media offers potential opportunities for businesses to extract business intelligence. This paper presents Tweetviz, an interactive tool to help businesses extract actionable information from a large set of noisy Twitter messages. Tweetviz visualizes tweet sentiment of business locations, identifies other business venues that Twitter users visit, and estimates some simple demographics of the Twitter users frequenting a business. A user study to evaluate the system's ability indicates that Tweetviz can provide an overview of a business's issues and sentiment as well as information aiding users in creating customer profiles.

Pre-fetching Strategies for HTML5 Hypervideo Players

Publication Details
  • Hypertext 2016
  • Jul 12, 2016

Abstract

Web videos are becoming more and more popular. Current web technologies make it simpler than ever both to stream videos and to create complex constructs of interlinked videos with additional information (video, audio, images, and text); so-called hypervideos. When viewers interact with hypervideos by clicking on links, new content has to be loaded. This may lead to excessive waiting times, interrupting the presentation -- especially when videos are loaded into the hypervideo player. In this work, we propose hypervideo pre-fetching strategies, which can be implemented in players to minimize waiting times. We examine the possibilities offered by the HTML5
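One simple pre-fetching strategy of the kind discussed here would pre-load linked scenes most-likely-first under a bandwidth budget. The scene names, link-follow probabilities, sizes, and budget below are hypothetical, purely to sketch the mechanism:

```python
def prefetch_plan(links, bandwidth_budget):
    """Pre-fetch linked scenes most-likely-first until a (hypothetical)
    bandwidth budget is spent. `links` maps a scene id to an assumed
    (follow_probability, size_mb) pair."""
    plan, spent = [], 0.0
    for scene, (prob, size) in sorted(links.items(), key=lambda kv: -kv[1][0]):
        if spent + size <= bandwidth_budget:
            plan.append(scene)
            spent += size
    return plan

# The most probable links are fetched first; the big, unlikely scene is skipped.
links = {"intro": (0.1, 8.0), "exercise": (0.7, 6.0), "outro": (0.2, 6.0)}
assert prefetch_plan(links, bandwidth_budget=12.0) == ["exercise", "outro"]
```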
Publication Details
  • 3rd IEEE International Workshop on Mobile Multimedia Computing (MMC)
  • Jul 11, 2016

Abstract

Mobile Audio Commander (MAC) is a mobile phone-based multimedia sensing system that facilitates the introduction of extra sensors to existing mobile robots for advanced capabilities. In this paper, we use MAC to introduce an accurate indoor positioning sensor to a robot to facilitate its indoor navigation. More specifically, we use a projector to send out a position ID through a light signal, use a light sensor and the audio channel on a mobile phone to decode the position ID, and send navigation commands to a target robot through audio output. With this setup, our system can simplify the user's robot navigation. Users can define a robot navigation path on a phone, and our system will compare the navigation path with its accurate location sensor inputs and generate analog line-following, collision-avoidance, and angular signals to adjust the robot's straight movements and turns. This paper describes two examples of using MAC and a positioning system to enable complicated robot navigation with proper user interface design, external circuit design, and real sensor installations on existing robots.
Publication Details
  • ICME 2016
  • Jul 11, 2016

Abstract

Captions are a central component in image posts that communicate the background story behind photos. Captions can enhance the engagement with audiences and are therefore critical to campaigns or advertising. Previous studies in image captioning either rely solely on image content or summarize multiple web documents related to an image's location; both neglect users' activities. We propose business-aware latent topics as a new contextual cue for image captioning that represents user activities. The idea is to learn the typical activities of people who posted images from business venues with similar categories (e.g., fast food restaurants) to provide appropriate context for similar topics (e.g., burger) in new posts. User activities are modeled via a latent topic representation. In turn, the image captioning model can generate sentences that better reflect user activities at business venues. In our experiments, the business-aware latent topics are more effective than existing baselines for adapting captions to images captured in various businesses. Moreover, they complement other contextual cues (image, time) in a multi-modal framework.

Abstract

We previously created the HyperMeeting system to support a chain of geographically and temporally distributed meetings in the form of a hypervideo. This paper focuses on playback plans that guide users through the recorded meeting content by automatically following available hyperlinks. Our system generates playback plans based on users' interests or prior meeting attendance and presents a dialog that lets users select the most appropriate plan. Prior experience with playback plans revealed users' confusion with automatic link following within a sequence of meetings. To address this issue, we designed three timeline visualizations of playback plans. A user study comparing the timeline designs indicated that different visualizations are preferred for different tasks, making switching among them important. The study also provided insights that will guide research of personalized hypervideo, both inside and outside a meeting context.
Publication Details
  • Springer Multimedia Tools and Applications (Special Issue)
  • Jul 1, 2016

Abstract

It is difficult to adjust the content of traditional slide presentations to the knowledge level, interest and role of individuals. This might force presenters to include content that is irrelevant for part of the audience, which negatively affects the knowledge transfer of the presentation. In this work, we present a prototype that is able to eliminate non-pertinent information from slides by presenting annotations for individual attendees on optical head-mounted displays. We first create guidelines for creating optimal annotations by evaluating several types of annotations alongside different types of slides. Then we evaluate the knowledge acquisition of presentation attendees using the prototype versus traditional presentations. Our results show that annotations with a limited amount of information, such as text up to 5 words, can significantly increase the amount of knowledge gained from attending a group presentation. Additionally, presentations where part of the information is moved to annotations are judged more positively on attributes such as clarity and enjoyment.

4th International Workshop on Interactive Content Consumption (WSICC'16)

Publication Details
  • ACM TVX 2016
  • Jun 22, 2016

Abstract

With three successful editions at EuroITV'13, TVX'14, and TVX'15, WSICC has established itself as a truly interactive workshop. The fourth edition of the WSICC workshop aims to bring together researchers and practitioners working on novel approaches for interactive multimedia content consumption. New technologies, devices, media formats, and consumption paradigms are emerging that allow for new types of interactivity. Examples include multi-panoramic video and object-based audio, increasingly available in live scenarios with content feeds from a multitude of sources. All these recent advances have an impact on different aspects related to interactive content consumption, which the workshop categorizes into Enabling Technologies, Content, User Experience, and User Interaction. The resources from past editions of the workshop are available on the http://wsicc.net website.

Speech Control for HTML5 Hypervideo Players

Publication Details
  • WSICC Workshop at TVX
  • Jun 22, 2016

Abstract

Hypervideo usage scenarios like physiotherapy training or instructions for manual tasks make it hard for users to operate an input device like a mouse or touch screen on a hand-held device while they are performing an exercise or using both hands for a manual task. In this work, we try to overcome this issue by providing an alternative input method for hypervideo navigation using speech commands. In a user test, we evaluated two different speech recognition libraries, annyang (in combination with the Web Speech API) and PocketSphinx.js (in combination with the Web Audio API), for their suitability for controlling hypervideo players. Test users spoke 18 words, either in German or English, which were recorded and then processed by both libraries. We found that annyang shows better recognition results. However, depending on other factors of influence, like the occurrence of background noise (reliability), the availability of an internet connection, or the browser used, PocketSphinx.js may be a better fit.

From Single Screen to Dual Screen - a Design Study for a User-Controlled Hypervideo-Based Physiotherapy Training

Publication Details
  • WSICC Workshop at TVX
  • Jun 22, 2016

Abstract

Hypervideo-based physiotherapy training offers an opportunity to support patients in continuing their training after release from a rehabilitation clinic. Many exercises require the patient to sit on the floor or a gymnastic ball, lie on a gymnastics mat, or adopt other postures. Using a laptop or tablet with a stand to show the exercises is more helpful than, for example, just having some drawings on a leaflet. However, it may lead to incorrect execution of the exercises while maintaining eye contact with the screen, or require the user to get up and select the next exercise if the device is positioned for a better view. A dual screen application, where contents are shown on a TV screen and the flow of the video can be controlled from a mobile second device, allows patients to keep their correct posture and at the same time view and select contents. In this paper we propose initial studies of user interface designs for such apps. Initial paper prototypes are discussed and refined in two focus groups. The results are then presented to a broader range of users in a survey. Three prototypes for the mobile app and one prototype for the TV are identified for future user tests.

Screen Concepts for Multi-Version Hypervideo Authoring

Publication Details
  • WSICC Workshop at TVX
  • Jun 22, 2016

Abstract

The creation of hypervideos usually requires a lot of planning and is time-consuming with respect to media content creation. However, once structure and media are put together to author a hypervideo, it may require only minor changes to make the hypervideo available in other languages or for another user group (such as beginners versus experts). Yet, to make the translation of media and all navigation elements of a hypervideo efficient and manageable, the authoring tool needs a GUI that provides a good overview of the elements that can be translated and of missing translations. In this work, we propose screen concepts that help authors provide different versions (for example, language and/or experience level) of a hypervideo. We analyzed different variants of GUI elements and evaluated them in a survey. From the results, we draw guidelines that can help with the creation of similar systems in the future.
Publication Details
  • International Workshop on Interactive Content Consumption
  • Jun 22, 2016

Abstract

The confluence of technologies such as telepresence, immersive imaging, model-based virtual mirror worlds, mobile live streaming, etc. gives rise to a capability for people anywhere to view and connect with present or past events nearly anywhere on earth. This capability properly belongs to a public commons, available as a birthright of all humans, and can be seen as part of an evolutionary transition supporting a global collective mind. We describe examples and elements of this capability, and suggest how they can be better integrated through a tool we call TeleViewer and a framework called WorldViews, which supports easy sharing of views as well as connecting providers and consumers of views all around the world.
Publication Details
  • EICS 2016
  • Jun 21, 2016

Abstract

Most current mobile and wearable devices are equipped with inertial measurement units (IMUs) that allow the detection of motion gestures, which can be used for interactive applications. A difficult problem to solve, however, is how to separate ambient motion from an actual motion gesture input. In this work, we explore the use of motion gesture data labeled with gesture execution phases for training supervised learning classifiers for gesture segmentation. We believe that using gesture execution phase data can significantly improve the accuracy of gesture segmentation algorithms. We define gesture execution phases as the start, middle, and end of each gesture. Since labeling motion gesture data with gesture execution phase information is work intensive, we used crowd workers to perform the labeling. Using this labeled data set, we trained SVM-based classifiers to segment motion gestures from ambient movement of the device. We describe initial results that indicate that gesture execution phase can be accurately recognized by SVM classifiers. Our main results show that training gesture segmentation classifiers with phase-labeled data substantially increases the accuracy of gesture segmentation: we achieved a gesture segmentation accuracy of 0.89 for simulated online segmentation using a sliding window approach.
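As a toy illustration of the sliding-window segmentation described above (the paper trains SVMs on phase-labeled data; here a simple motion-energy threshold stands in for the classifier, and all values are made up):

```python
def window_energy(samples):
    # Mean squared magnitude of accelerometer samples in a window.
    return sum(x * x + y * y + z * z for x, y, z in samples) / len(samples)

def segment(stream, win=4, threshold=1.0):
    """Return (start, end) sample index ranges classified as gesture."""
    flags = [window_energy(stream[i:i + win]) > threshold
             for i in range(len(stream) - win + 1)]
    segments, start = [], None
    for i, f in enumerate(flags + [False]):          # sentinel closes the last run
        if f and start is None:
            start = i
        elif not f and start is not None:
            segments.append((start, i - 1 + win - 1))  # end of last positive window
            start = None
    return segments

# Quiet ambient motion, a burst of gesture motion, then quiet again.
ambient, gesture = (0.1, 0.0, 0.0), (2.0, 0.0, 0.0)
stream = [ambient] * 6 + [gesture] * 6 + [ambient] * 6
assert segment(stream) == [(3, 14)]
```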

Beyond Actions: Exploring the Discovery of Tactics from User Logs

Publication Details
  • Information Processing & Management
  • Jun 11, 2016

Abstract

Search log analysis has become a common practice for gaining insights into user search behaviour; it helps build an understanding of user needs and preferences, as well as of how well a system supports such needs. Currently, log analysis typically focuses on low-level user actions, i.e. logged events such as issued queries and clicked results, and often only a selection of such events is logged and analysed. However, the types of logged events may differ widely from interface to interface, making comparison between systems difficult. Further, analysing a selection of events may lead to conclusions taken out of context -- e.g., the statistics of observed query reformulations may be influenced by the existence of a relevance feedback component. Alternatively, in lab studies user activities can be analysed at a higher level, such as search tactics and strategies, abstracted away from detailed interface implementation. However, the required manual coding that maps logged events to higher-level interpretations prevents this type of analysis from going large scale. In this paper, we propose a new method for analysing search logs by (semi-)automatically identifying user search tactics from logged events, allowing large-scale analysis that is comparable across search systems. We validate the efficiency and effectiveness of the proposed tactic identification method using logs of two reference search systems of different natures: a product search system and a video search system. With the identified tactics, we perform a series of novel log analyses in terms of the entropy rate of user search tactic sequences, demonstrating how this type of analysis allows comparisons of user search behaviours across systems of different nature and design. This analysis provides insights not achievable with traditional log analysis.
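The entropy-rate analysis of tactic sequences can be illustrated with a small first-order (bigram) estimator; the tactic names below are invented for the sketch:

```python
import math
from collections import Counter

def entropy_rate(tactics):
    """First-order (bigram) entropy rate of a tactic sequence, in bits:
    how unpredictable the next tactic is given the current one."""
    pairs = list(zip(tactics, tactics[1:]))
    ctx = Counter(a for a, _ in pairs)        # counts of each context tactic
    joint = Counter(pairs)                    # counts of tactic bigrams
    n = len(pairs)
    return -sum((c / n) * math.log2(c / ctx[a]) for (a, b), c in joint.items())

# A user who strictly alternates two tactics is perfectly predictable...
assert entropy_rate(["query", "inspect"] * 10) == 0.0
# ...while a mixed sequence has a positive entropy rate.
assert entropy_rate(["query", "query", "click", "query", "click", "click", "query"]) > 0.0
```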
Publication Details
  • ACM International Conference on Multimedia Retrieval (ICMR)
  • Jun 6, 2016

Abstract

We propose a method for extractive summarization of audiovisual recordings focusing on topic-level segments. We first build a content similarity graph between all segments of all documents in the collection, using word vectors from the transcripts, and then select the most central segments for the summaries. We evaluate the method quantitatively on the AMI Meeting Corpus using gold standard reference summaries and the Rouge metric, and qualitatively on lecture recordings using a novel two-tiered approach with human judges. The results show that our method compares favorably with others in terms of Rouge, and outperforms the baselines for human scores, thus also validating our evaluation protocol.
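A bare-bones version of the centrality-based segment selection described above (bag-of-words cosine stands in for the word vectors used in the paper; the example segments are invented):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two texts using bag-of-words counts.
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def central_segments(segments, k=1):
    """Rank segments by summed similarity to all other segments
    (degree centrality) and return the indices of the top k."""
    scores = [sum(cosine(segments[i], segments[j])
                  for j in range(len(segments)) if j != i)
              for i in range(len(segments))]
    return sorted(range(len(segments)), key=lambda i: -scores[i])[:k]

segments = ["budget planning meeting", "budget review meeting", "coffee break chat"]
assert central_segments(segments) == [0]
```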
Publication Details
  • LREC 2016
  • May 23, 2016

Abstract

Many people post about their daily life on social media. These posts may include information about the purchase activity of people, and insights useful to companies can be derived from them: e.g. profile information of a user who mentioned something about their product. As a further advanced analysis, we consider extracting users who are likely to buy a product from the set of users who mentioned that the product is attractive. In this paper, we report our methodology for building a corpus for Twitter user purchase behavior prediction. First, we collected Twitter users who posted a want phrase + product name: e.g. "want a Xperia" as candidate want users, and also candidate bought users in the same way. Then, we asked an annotator to judge whether a candidate user actually bought a product. We also annotated whether tweets randomly sampled from want/bought user timelines are relevant or not to purchase. In this annotation, 58% of want user tweets and 35% of bought user tweets were annotated as relevant. Our data indicate that information embedded in timeline tweets can be used to predict purchase behavior of tweeted products.

Abstract

The negative effect of lapses during a behavior-change program has been shown to increase the risk of repeated lapses and, ultimately, program abandonment. In this paper, we examine the potential of system-driven lapse management -- supporting users through lapses as part of a behavior-change tool. We first review lessons from domains such as dieting and addiction research and discuss the design space of lapse management. We then explore the value of one approach to lapse management -- the use of "cheat points" as a way to encourage sustained participation. In an online study, we first examine interpretations of progress that was reached through using cheat points. We then present findings from a deployment of lapse management in a two-week field study with 30 participants. Our results demonstrate the potential of this approach to motivate and change users' behavior. We discuss important open questions for the design of future technology-mediated behavior change programs.

Abstract

Taking breaks from work is an essential and universal practice. In this paper, we extend current research on productivity in the workplace to consider the break habits of knowledge workers and explore opportunities for break logging for personal informatics. We report on three studies. Through a survey of 147 U.S.-based knowledge workers, we investigate what activities respondents consider to be breaks from work, and offer an understanding of the benefit workers desire when they take breaks. We then present results from a two-week in-situ diary study with 28 participants in the U.S. who logged 800 breaks, offering insights into the effect of work breaks on productivity. We finally explore the space of information visualization of work breaks and productivity in a third study. We conclude with a discussion of implications for break recommendation systems, availability and interruptibility research, and the quantified workplace.
Publication Details
  • CHI 2016 (Late Breaking Work)
  • May 7, 2016

Abstract

We describe a novel thermal haptic output device, ThermoTouch, that provides a grid of thermal pixels. Unlike previous devices which mainly use Peltier elements for thermal output, ThermoTouch uses liquid cooling and electro-resistive heating to output thermal feedback at arbitrary grid locations. We describe the design of the prototype, highlight advantages and disadvantages of the technique and briefly discuss future improvements and research applications.
Publication Details
  • IEEE Multimedia Magazine
  • May 2, 2016

Abstract

Silicon Valley is home to many of the world’s largest technology corporations, as well as thousands of small startups. Despite the development of other high-tech economic centers throughout the US and around the world, Silicon Valley continues to be a leading hub for high-tech innovation and development, in part because most of its companies and universities are within 20 miles of each other. Given the high concentration of multimedia researchers in Silicon Valley, and the high demand for information exchange, I was able to work with a team of researchers from various companies and organizations to start the Bay Area Multimedia Forum (BAMMF) series back in November 2013.
Publication Details
  • Multimedia Systems Journal
  • Apr 12, 2016

Abstract

With modern technologies, it is possible to create annotated interactive non-linear videos (a form of hypervideo) for the Web. These videos have a non-linear structure of linked scenes to which additional information (other media like images, text, audio, or additional videos) can be added. A variety of user interactions - like in- and between-scene navigation or zooming into additional information - are possible in players for this type of video. As with linear video, the quality of experience (QoE) of annotated hypervideo is tied to the temporal consistency of the video stream at the client end - its flow. Despite its interactive complexity, users expect this type of video to flow as seamlessly as simple linear video. However, the added hypermedia elements bog playback engines down. Download and cache management systems address the flow issue, but their effectiveness depends on numerous questions regarding user requirements, computational strategy, and evaluation metrics. In this work, we a) define QoE metrics, b) examine structural and behavioral patterns of interactive annotated non-linear video, c) propose download and cache management algorithms and strategies, d) describe the implementation of an evaluative simulation framework, and e) present the algorithm test results.
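The download and cache management idea can be illustrated with a minimal sketch: prefetch scenes in closest-first order over the hypervideo's link graph, and keep them in a small LRU cache. This is an assumption for illustration, not the algorithms proposed in the paper.

```python
from collections import OrderedDict, deque

# Illustrative sketch (not the paper's algorithm): rank scenes by graph
# distance from the currently playing scene and prefetch the closest ones,
# on the assumption that nearby scenes are the likeliest navigation targets.
def prefetch_order(graph, current, limit):
    """Return up to `limit` scene ids in breadth-first (closest-first) order."""
    seen, order, queue = {current}, [], deque([current])
    while queue and len(order) < limit:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order[:limit]

class SceneCache:
    """Tiny LRU cache keyed by scene id."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def put(self, scene_id, data):
        self.store[scene_id] = data
        self.store.move_to_end(scene_id)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def get(self, scene_id):
        if scene_id in self.store:
            self.store.move_to_end(scene_id)
            return self.store[scene_id]
        return None
```

A real player would additionally weight scenes by link-traversal probability and media size, which is where the evaluative simulation framework described above comes in.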

Social Media-Based Profiling of Business Locations

Publication Details
  • Fuji Xerox Technical Report
  • Mar 17, 2016

Abstract

We present a method for profiling businesses at specific locations that is based on mining information from social media. The method matches geo-tagged tweets from Twitter against venues from Foursquare to identify the specific business mentioned in a tweet. By linking geo-coordinates to places, the tweets associated with a business, such as a store, can then be used to profile that business. From these venue-located tweets, we create sentiment profiles for each of the stores in a chain. We present the results as heat maps showing how sentiment differs across stores in the same chain and how some chains have more positive sentiment than other chains. We also estimate social group size from photos and create profiles of social group size for businesses. Sample heat maps of these results illustrate how the average social group size can vary across businesses.
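The tweet-to-venue matching step can be sketched as nearest-venue assignment by great-circle distance. The 100 m radius and the venue dictionary below are illustrative assumptions, not details from the paper.

```python
import math

# Sketch of matching a geo-tagged tweet to a Foursquare-style venue by
# nearest great-circle distance. The distance threshold is hypothetical.
def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_venue(tweet_lat, tweet_lon, venues, max_dist_m=100.0):
    """Return the closest venue within max_dist_m, or None.

    venues: {name: (lat, lon)}
    """
    best, best_d = None, max_dist_m
    for name, (vlat, vlon) in venues.items():
        d = haversine_m(tweet_lat, tweet_lon, vlat, vlon)
        if d <= best_d:
            best, best_d = name, d
    return best
```

Tweets assigned to the same venue would then be aggregated into the per-store sentiment and group-size profiles described above.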
Publication Details
  • IUI 2016
  • Mar 7, 2016

Abstract

We describe methods for analyzing and visualizing document metadata to provide insights about collaborations over time. We investigate the use of Latent Dirichlet Allocation (LDA) based topic modeling to compute areas of interest on which people collaborate. The topics are represented in a node-link force directed graph by persistent fixed nodes laid out with multidimensional scaling (MDS), and the people by transient movable nodes. The topics are also analyzed to detect bursts to highlight "hot" topics during a time interval. As the user manipulates a time interval slider, the people nodes and links are dynamically updated. We evaluate the results of LDA topic modeling for the visualization by comparing topic keywords against the submitted keywords from the InfoVis 2004 Contest, and we found that the additional terms provided by LDA-based keyword sets result in improved similarity between a topic keyword set and the documents in a corpus. We extended the InfoVis dataset from 8 to 20 years and collected publication metadata from our lab over a period of 21 years, and created interactive visualizations for exploring these larger datasets.
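The burst-detection step can be illustrated with a simple threshold rule over per-year topic counts. This mean-plus-k-standard-deviations test is a stand-in sketch, not the burst algorithm used in the paper.

```python
import statistics

# Illustrative burst detector: a topic is "hot" in a year when its
# publication count exceeds the topic's mean yearly count by more than
# k standard deviations. The rule and k=1.0 default are assumptions.
def hot_years(counts_by_year, k=1.0):
    """counts_by_year: {year: publication count for one topic}."""
    counts = list(counts_by_year.values())
    if len(counts) < 2:
        return []  # no spread to measure
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    return sorted(y for y, c in counts_by_year.items() if c > mean + k * sd)
```

In the visualization, years flagged by such a rule would drive the highlighting of "hot" topics as the user drags the time-interval slider.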

Abstract

The use of videoconferencing in the workplace has been steadily growing. While multitasking during video conferencing is often necessary, it is also viewed as impolite and sometimes unacceptable. One potential contributor to negative attitudes towards such multitasking is the disrupted sense of eye contact that occurs when an individual shifts their gaze away to another screen, for example in a dual-monitor setup, common in office settings. We present a system that improves the sense of eye contact over videoconferencing in dual-monitor setups. Our system uses computer vision and desktop activity detection to dynamically choose the camera with the best view of a user's face. We describe two alternative implementations of our system (RGB-only, and a combination of RGB and RGB-D cameras). We then describe results from an online experiment that shows the potential of our approach to significantly improve perceptions of a person's politeness and engagement in the meeting.
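The camera-selection logic can be sketched as picking the camera with the best face-view score, with a switching margin to avoid rapid flip-flopping. The scoring inputs and the margin value are hypothetical, not taken from the paper's implementation.

```python
# Hypothetical sketch of dynamic camera selection: each camera reports a
# face-view confidence (e.g. from a face detector); switch away from the
# current camera only when another camera is better by a clear margin.
def pick_camera(scores, current, margin=0.15):
    """scores: {camera_id: face-view confidence in [0, 1]}."""
    best = max(scores, key=scores.get)
    if best != current and scores[best] < scores.get(current, 0.0) + margin:
        return current  # not confidently better; keep the current camera
    return best
```

In practice the confidence would come from per-frame face detection, possibly combined with desktop activity signals indicating which monitor the user is attending to.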
Publication Details
  • Proceedings of CSCW 2016
  • Feb 27, 2016

Abstract

This paper presents a detailed examination of factors that affect perceptions of, and attitudes towards multitasking in dyadic video conferencing. We first report findings from interviews with 15 professional users of videoconferencing. We then report results from a controlled online experiment with 397 participants based in the United States. Our results show that the technology used for multitasking has a significant effect on others' assumptions of what secondary activity the multitasker is likely engaged in, and that this assumed activity in turn affects evaluations of politeness and appropriateness. We also describe how different layouts of the video conferencing UI may lead to better or worse ratings of engagement and in turn ratings of polite or impolite behavior. We then propose a model that captures our results and use the model to discuss implications for behavior and for the design of video communication tools.
Publication Details
  • CSCW 2016
  • Feb 27, 2016

Abstract

We present MixMeetWear, a smartwatch application that allows users to maintain awareness of the audio and visual content of a meeting while completing other tasks. Users of the system can listen to the audio of a meeting and also view, zoom, and pan webcam and shared content keyframes of other meeting participants' live streams in real time. Users can also provide input to the meeting via speech-to-text or predefined responses. A study showed that the system is useful for peripheral awareness of some meetings.
Publication Details
  • CSCW 2016
  • Feb 26, 2016

Abstract

Remote meetings are messy. There are an ever-increasing number of support tools available, and, as past work has shown, people will tend to select a subset of those tools to satisfy their own institutional, social, and personal needs. While video tools make it relatively easy to have conversations at a distance, they are less adapted to sharing and archiving multimedia content. In this paper we take a deeper look at how sharing multimedia content occurs before, during, and after distributed meetings. Our findings shed light on the decisions and rationales people use to select from the vast set of tools available to them to prepare for, conduct, and reconcile the results of a remote meeting.
Publication Details
  • Personal and Ubiquitous Computing (Springer)
  • Feb 19, 2016

Abstract

In recent years, there has been an explosion of services that leverage location to provide users novel and engaging experiences. However, many applications fail to realize their full potential because of limitations in current location technologies. Current frameworks work well outdoors but fare poorly indoors. In this paper we present LoCo, a new framework that can provide highly accurate room-level indoor location. LoCo does not require users to carry specialized location hardware—it uses radios that are present in most contemporary devices and, combined with a boosting classification technique, provides a significant runtime performance improvement. We provide experiments that show the combined radio technique can achieve accuracy that improves on current state-of-the-art Wi-Fi only techniques. LoCo is designed to be easily deployed within an environment and readily leveraged by application developers. We believe LoCo’s high accuracy and accessibility can drive a new wave of location-driven applications and services.
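The boosting classification idea can be illustrated with a minimal AdaBoost over decision stumps on radio signal-strength features. This two-room, stdlib-only sketch is an assumption for illustration, not LoCo's classifier; the RSSI values are made up.

```python
import math

# Minimal AdaBoost with decision stumps on Wi-Fi RSSI feature vectors, as
# an illustrative stand-in for a boosting-based room classifier.
# Labels are +1 / -1 (two rooms only; a real system handles many rooms).
def best_stump(X, y, w):
    """Exhaustively pick the weighted-error-minimizing stump (feature, threshold, sign)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[f] <= t else -sign for x in X]
                err = sum(wi for wi, yi, pi in zip(w, y, preds) if yi != pi)
                if best is None or err < best[1]:
                    best = ((f, t, sign), err, preds)
    return best

def train_adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump, err, preds = best_stump(X, y, w)
        err = max(err, 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Reweight: increase the weight of misclassified samples.
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    total = sum(alpha * (sign if x[f] <= t else -sign)
                for alpha, (f, t, sign) in model)
    return 1 if total >= 0 else -1
```

Each feature here would be the RSSI of one access point; fingerprints collected per room form the training set.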
Publication Details
  • AAAI
  • Feb 12, 2016

Abstract

Image localization is important for the marketing and recommendation of local businesses; however, the level of granularity is still a critical issue. Given a consumer photo and its rough GPS information, we are interested in extracting the fine-grained location information (i.e. the business venue) of the image. To this end, we propose a novel framework for business venue recognition. The framework contains three main parts. First, business-aware visual concept discovery: we mine a set of concepts useful for business venue recognition based on three guidelines: business awareness, visual detectability, and discriminative power. Second, business-aware concept detection by convolutional neural networks (BA-CNN): we propose a new network architecture that extracts semantic concept features from an input image. Third, multimodal business venue recognition: we extend visually detected concepts to multimodal feature representations that allow a test image to be associated with business reviews and images from social media for business venue recognition. The experimental results show that the visual concepts detected by BA-CNN achieve up to 22.5% relative improvement for business venue recognition compared to state-of-the-art convolutional neural network features. Experiments also show that leveraging multimodal information from social media can further boost performance, especially when the database images belonging to a business venue are scarce.
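The multimodal recognition step can be sketched as late fusion: score each candidate venue by a weighted combination of visual-concept similarity and review-text similarity. The fusion weight and the feature vectors below are illustrative assumptions, not the BA-CNN pipeline itself.

```python
import math

# Illustrative late-fusion sketch: rank venues by a weighted sum of a
# visual-concept similarity and a review-text similarity. The weight
# w_vis=0.6 is a made-up default, not a value from the paper.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_venues(query_vis, query_txt, venues, w_vis=0.6):
    """venues: {name: (visual_concept_vec, review_text_vec)}; best-first list."""
    def score(name):
        vis, txt = venues[name]
        return w_vis * cosine(query_vis, vis) + (1 - w_vis) * cosine(query_txt, txt)
    return sorted(venues, key=score, reverse=True)
```

Backing off toward the text term when a venue has few database images mirrors the scarce-image case where the paper reports the largest multimodal gains.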
Publication Details
  • MMM 2016
  • Jan 4, 2016

Abstract

Hypervideos pose particular navigation challenges due to their underlying graph structure. Especially when they are used on tablets or by older people, a lack of clarity may lead to confusion and rejection of this type of medium. To avoid confusion, the hypervideo can be extended with a well-known table of contents, which, because of the graph structure, needs to be created separately by the authors. In this work, we present an extended presentation of a table of contents for hypervideos on mobile devices. The design was tested in a real-world medical training scenario with people older than 45, the main target group of these applications. This user group is a particular challenge since they sometimes have limited experience with mobile devices and growing physical deficiencies with age. Our user interface was designed in three steps. The findings of an expert group and a survey were used to create two different prototypical versions of the display, which were then tested against each other in a user test. This test revealed that a divided view is desired: the table of contents in an easy-to-touch version should be on the left side, and previews of scenes on the right side of the view. These findings were implemented in the existing SIVA HTML5 open source player and tested with a second group of users. This second test led only to minor changes in the GUI.