Sven Kratz, Ph.D.

Research Scientist

Sven’s research focuses on sensor-based mobile user interfaces that enable “post-touchscreen” interactions such as motion gestures or free-space interactions. Further interests include the exploration of gestural input in cooperative work environments and augmented spaces, as well as the use of mobile devices in surface computing applications.

Sven worked as a research assistant in the Media Informatics and Human-Computer Interaction Group at the University of Munich, Germany, where he obtained his Ph.D. in Computer Science in 2012. From 2008 to 2011, Sven was a junior researcher and Ph.D. student in the Quality and Usability Group at Telekom Innovation Laboratories, TU Berlin, Germany. Sven holds a Diplom degree in Computer Science from RWTH Aachen University, Germany.

See Sven’s personal homepage for more information and a full publications list.

Publications

2016
Publication Details
  • UIST 2016 (Demo)
  • Oct 16, 2016

Abstract

We propose a robust pointing-detection method with a virtual shadow representation for interacting with a public display. Using a depth camera, the shadow is generated by a model with an angled virtual sun light, and its nearest point to the display is detected as the pointer. The shadow's position rises as the user walks closer, which conveys the correct distance for controlling the pointer and offers access to higher areas of the display.
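
As a rough illustration of the shadow model described above, the sketch below (Python; the coordinate conventions and function name are assumptions, not the paper's actual implementation) projects the user's depth points along an angled virtual sun direction onto the display plane and takes the shadow of the user's nearest point as the pointer.

```python
import numpy as np

def shadow_pointer(user_points, sun_dir, display_z=0.0):
    """Project the user's depth points along the virtual sun direction
    onto the display plane z = display_z and return the shadow of the
    user's nearest point as the (x, y) pointer position.

    user_points: (N, 3) points in display coordinates, z = distance
                 in front of the display (z > 0).
    sun_dir:     direction of the virtual sun light; its z component
                 must be negative so rays travel toward the display.
    """
    sun = np.asarray(sun_dir, dtype=float)
    sun /= np.linalg.norm(sun)
    # Ray p + t * sun meets the plane z = display_z at this t:
    t = (display_z - user_points[:, 2]) / sun[2]
    shadow = user_points + t[:, None] * sun
    # With a light angled from above, a farther user casts a lower
    # shadow and walking closer raises it, conveying distance.
    nearest = np.argmin(user_points[:, 2])
    return shadow[nearest, :2]
```
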
Publication Details
  • Ro-Man 2016
  • Aug 26, 2016

Abstract

Two related challenges with current teleoperated robotic systems are a lack of peripheral vision and awareness, and the difficulty or tedium of navigating through remote spaces. We address these challenges by providing an interface with a focus plus context (F+C) view of the robot location, in which the user can navigate simply by looking where they want to go and clicking or drawing a path on the view to indicate the desired trajectory or destination. The F+C view provides an undistorted, perspectively correct central region surrounded by a wide field-of-view peripheral portion, and avoids the need for separate views. The navigation method is direct and intuitive in comparison to keyboard- or joystick-based navigation, which requires the user to be in a control loop as the robot moves. Both the F+C views and the direct click navigation were evaluated in a preliminary user study.
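
The radial profile below is a minimal sketch of the focus-plus-context idea: an identity mapping inside the focus region and logarithmic compression of the periphery. The function name and the compression constant k are illustrative; the paper's actual projection is not reproduced here.

```python
import numpy as np

def fplus_c_radius(r, r_focus, r_max, k=0.35):
    """Map a source radius r (distance from the view center) to a
    display radius: identity inside the focus region, logarithmic
    compression for the wide-FOV periphery. The profile is continuous
    at r_focus; k controls how tightly the periphery is packed.
    """
    r = np.asarray(r, dtype=float)
    return np.where(
        r <= r_focus,
        r,  # undistorted, perspectively correct center
        r_focus + k * r_max * np.log1p((r - r_focus) / (k * r_max)),
    )
```
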
Publication Details
  • Ro-Man 2016
  • Aug 26, 2016

Abstract

Mobile Telepresence Robots (MTRs) are an emerging technology that extends the functionality of telepresence systems by adding mobility. Today's MTRs, however, rely on stationary imaging systems such as a single narrow-view camera for vision, which can lead to reduced operator performance due to view-related deficiencies in situational awareness. We therefore developed an improved imaging and viewing platform that allows immersive telepresence using a head-mounted device (HMD) with head-tracked mono and stereoscopic video. Using a remote collaboration task to ground our research, we examine the effectiveness of head-tracked HMD systems in comparison to a baseline monitor-based system. We performed a user study where participants were divided into three groups: a fixed-camera, monitor-based baseline condition (without HMD), an HMD with a head-tracked 2D camera, and an HMD with a head-tracked stereo camera. Results showed that using an HMD reduces task error rates and improves perceived collaborative success and quality of view compared to the baseline condition. No major difference was found, however, between the stereo and 2D camera conditions for participants wearing an HMD.
Publication Details
  • Springer Multimedia Tools and Applications (special issue)
  • Jul 1, 2016

Abstract

It is difficult to adjust the content of traditional slide presentations to the knowledge level, interest, and role of individual attendees. This can force presenters to include content that is irrelevant for part of the audience, which negatively affects the knowledge transfer of the presentation. In this work, we present a prototype that eliminates non-pertinent information from slides by presenting annotations for individual attendees on optical head-mounted displays. We first derive guidelines for optimal annotations by evaluating several types of annotations alongside different types of slides. We then evaluate the knowledge acquisition of presentation attendees using the prototype versus traditional presentations. Our results show that annotations with a limited amount of information, such as text of up to five words, can significantly increase the amount of knowledge gained from attending a group presentation. Additionally, presentations where part of the information is moved to annotations are judged more positively on attributes such as clarity and enjoyment.
Publication Details
  • EICS 2016
  • Jun 21, 2016

Abstract

Most current mobile and wearable devices are equipped with inertial measurement units (IMUs) that allow the detection of motion gestures, which can be used for interactive applications. A difficult problem to solve, however, is how to separate ambient motion from an actual motion gesture input. In this work, we explore the use of motion gesture data labeled with gesture execution phases for training supervised learning classifiers for gesture segmentation. We believe that using gesture execution phase data can significantly improve the accuracy of gesture segmentation algorithms. We define gesture execution phases as the start, middle, and end of each gesture. Since labeling motion gesture data with gesture execution phase information is labor-intensive, we used crowd workers to perform the labeling. Using this labeled data set, we trained SVM-based classifiers to segment motion gestures from ambient movement of the device. We describe initial results indicating that gesture execution phases can be accurately recognized by SVM classifiers. Our main results show that training gesture segmentation classifiers with phase-labeled data substantially increases the accuracy of gesture segmentation: we achieved a gesture segmentation accuracy of 0.89 for simulated online segmentation using a sliding-window approach.
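
A minimal sketch of this kind of phase-based segmentation pipeline, assuming scikit-learn and simple per-window statistics as features; the feature set, window size, and label names are illustrative, not the paper's exact configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def window_features(imu, size=32, step=8):
    """Slide a window over a (T, 6) accel+gyro stream and compute
    simple per-axis statistics for each window."""
    feats = []
    for start in range(0, len(imu) - size + 1, step):
        w = imu[start:start + size]
        feats.append(np.concatenate([
            w.mean(axis=0),                           # mean per axis
            w.std(axis=0),                            # energy per axis
            np.abs(np.diff(w, axis=0)).mean(axis=0),  # jerkiness
        ]))
    return np.array(feats)

# Phase labels such as "ambient", "start", "middle", "end" would come
# from the crowd-sourced annotations; X_train/y_train are placeholders.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
# clf.fit(X_train, y_train)
# phases = clf.predict(window_features(live_stream))
# A start -> middle -> end run of predictions delimits one gesture.
```
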
Publication Details
  • CHI 2016 (Late Breaking Work)
  • May 7, 2016

Abstract

We describe a novel thermal haptic output device, ThermoTouch, that provides a grid of thermal pixels. Unlike previous devices which mainly use Peltier elements for thermal output, ThermoTouch uses liquid cooling and electro-resistive heating to output thermal feedback at arbitrary grid locations. We describe the design of the prototype, highlight advantages and disadvantages of the technique and briefly discuss future improvements and research applications.
Publication Details
  • Personal and Ubiquitous Computing (Springer)
  • Feb 19, 2016

Abstract

In recent years, there has been an explosion of services that leverage location to provide users novel and engaging experiences. However, many applications fail to realize their full potential because of limitations in current location technologies. Current frameworks work well outdoors but fare poorly indoors. In this paper we present LoCo, a new framework that can provide highly accurate room-level indoor location. LoCo does not require users to carry specialized location hardware: it uses radios that are present in most contemporary devices and, combined with a boosting classification technique, provides a significant runtime performance improvement. We provide experiments that show the combined radio technique can achieve accuracy that improves on current state-of-the-art Wi-Fi-only techniques. LoCo is designed to be easily deployed within an environment and readily leveraged by application developers. We believe LoCo's high accuracy and accessibility can drive a new wave of location-driven applications and services.
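
As a hedged illustration of the boosting-based room classifier, the toy example below trains boosted decision stumps on radio fingerprints. All names and numbers are made up; LoCo's actual features and classifier configuration may differ.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy fingerprints: each row holds RSSI readings for a fixed list of
# radio sources; a floor value (e.g. -100) would stand in for sources
# that were not heard.
X_train = np.array([[-42.0, -71.0, -90.0],   # room "lab"
                    [-45.0, -70.0, -88.0],   # room "lab"
                    [-80.0, -50.0, -60.0],   # room "kitchen"
                    [-78.0, -52.0, -61.0]])  # room "kitchen"
y_train = np.array(["lab", "lab", "kitchen", "kitchen"])

# Boosted decision stumps over the fingerprints, in the spirit of a
# boosting-based room classifier.
room_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=50)
room_clf.fit(X_train, y_train)
print(room_clf.predict([[-44.0, -72.0, -89.0]]))  # -> ['lab']
```
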
2015
Publication Details
  • MobileHCI 2015
  • Aug 24, 2015

Abstract

In this paper we report findings from two user studies that explore the problem of establishing a common viewpoint in the context of a wearable telepresence system. In our first study, we assessed the ability of a local person (the guide) to identify the view orientation of the remote person by looking at the physical pose of the telepresence device. In the follow-up study, we explored visual feedback methods for communicating the relative viewpoints of the remote user and the guide via a head-mounted display. Our results show that actively observing the pose of the device is useful for viewpoint estimation. However, in the case of telepresence devices without physical directional affordances, a live video feed may yield comparable results. Lastly, more abstract visualizations lead to significantly longer recognition times, but may be necessary in more complex environments.
Publication Details
  • Presented at the "Everyday Telepresence" workshop at CHI 2015
  • Apr 18, 2015

Abstract

As video-mediated communication reaches broad adoption, improving immersion and social interaction are important areas of focus in the design of tools for exploration and work-based communication. Here we present three threads of research focused on developing new ways of enabling exploration of a remote environment and interacting with the people and artifacts therein.
Publication Details
  • CHI 2015 (Extended Abstracts)
  • Apr 18, 2015

Abstract

We present our ongoing research on automatic segmentation of motion gestures tracked by IMUs. We postulate that, by recognizing gesture execution phases from motion data, we may be able to auto-delimit user gesture entries. We demonstrate that machine learning classifiers can be trained to recognize three distinct phases of gesture entry: the start, middle, and end of a gesture motion. We further demonstrate that this type of classification can be done at the level of individual gestures. Furthermore, we describe how we captured a new data set for data exploration and discuss a tool we developed to allow manual annotation of gesture phase information. Initial results obtained using the new data set annotated with our tool show a precision of 0.95 for recognition of the gesture phase and a precision of 0.93 for simultaneous recognition of the gesture phase and the gesture type.
Publication Details
  • Human-Robot Interaction (HRI) 2015
  • Mar 2, 2015

Abstract

Our research focuses on improving the effectiveness and usability of driving mobile telepresence robots by increasing the user's sense of immersion during the navigation task. To this end we developed a robot platform that allows immersive navigation using head-tracked stereoscopic video and an HMD. We present the results of an initial user study that compares System Usability Scale (SUS) ratings of a robot teleoperation task using head-tracked stereo vision with a baseline fixed video feed, as well as the effect of a low or high placement of the camera(s). Our results show significantly higher ratings for the fixed video condition and no effect of camera placement. Future work will focus on examining the reasons for the lower ratings of stereo video and on exploring further visual navigation interfaces.
2014
Publication Details
  • SUI-Symposium
  • Oct 4, 2014

Abstract

It is now possible to develop head-mounted devices (HMDs) that allow for ego-centric sensing of mid-air gestural input. Therefore, we explore the use of HMD-based gestural input techniques in smart space environments. We developed a usage scenario to evaluate HMD-based gestural interactions and conducted a user study to elicit qualitative feedback on several HMD-based gestural input techniques. Our results show that for the proposed scenario, mid-air hand gestures are preferred to head gestures for input and rated more favorably compared to non-gestural input techniques available on existing HMDs. Informed by these study results, we developed a prototype HMD system that supports gestural interactions as proposed in our scenario. We conducted a second user study to quantitatively evaluate our prototype comparing several gestural and non-gestural input techniques. The results of this study show no clear advantage or disadvantage of gestural inputs vs. non-gestural input techniques on HMDs. We did find that voice control as (sole) input modality performed worst compared to the other input techniques we evaluated. Lastly, we present two further applications implemented with our system, demonstrating 3D scene viewing and ambient light control. We conclude by briefly discussing the implications of ego-centric vs. exo-centric tracking for interaction in smart spaces.
Publication Details
  • MobileHCI 2014 (Industrial Case Study)
  • Sep 23, 2014

Abstract

Telepresence systems usually lack mobility. Polly, a wearable telepresence device, allows users to explore remote locations or experience events remotely by means of a person who serves as a mobile "guide". We built a series of hardware prototypes, and our current, most promising embodiment consists of a smartphone mounted on a wearable, stabilized gimbal. The gimbal enables remote control of the viewing angle as well as active image stabilization while the guide is walking. We present qualitative findings from a series of 8 field tests using either Polly or only a mobile phone. We found that guides felt more physical comfort when using Polly vs. a phone and that Polly was accepted by other persons at the remote location. Remote participants appreciated the stabilized video and the ability to control the camera view. Connection and bandwidth issues appear to be the most challenging issues for Polly-like systems.
Publication Details
  • MobileHCI 2014 (Full Paper)
  • Sep 23, 2014

Abstract

Secure authentication with devices or services that store sensitive and personal information is highly important. However, traditional password- and PIN-based authentication methods compromise between the level of security and user experience. AirAuth is a biometric authentication technique that uses in-air gesture input to authenticate users. We evaluated our technique on a predefined (simple) gesture set, and our classifier achieved an average accuracy of 96.6% in an equal error rate (EER)-based study. We obtained an accuracy of 100% when exclusively using personal (complex) user gestures. In a further user study, we found that AirAuth is highly resilient to video-based shoulder-surfing attacks, with a measured false acceptance rate of just 2.2%. Furthermore, a longitudinal study demonstrates AirAuth's repeatability and accuracy over time. AirAuth is relatively simple and robust, requires only a small amount of computational power, and is hence deployable on embedded or mobile hardware. Unlike traditional authentication methods, our system's security is positively aligned with user-rated pleasure and excitement levels. In addition, AirAuth attained acceptability ratings in personal, office, and public spaces that are comparable to an existing stroke-based on-screen authentication technique. Based on the results presented in this paper, we believe that AirAuth shows great promise as a novel, secure, ubiquitous, and highly usable authentication method.
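
The sketch below shows one plausible shape of such a gesture matcher: dynamic time warping between an authentication attempt and enrolled gesture templates, accepted against a threshold tuned to the desired operating point. It is illustrative only; AirAuth's actual matching pipeline is not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two gesture recordings,
    each a (T, D) array of tracked hand points per frame."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def authenticate(attempt, enrolled_templates, threshold):
    """Accept if the attempt is close enough to any enrollment sample.
    The threshold would be tuned on held-out data to the desired
    operating point, e.g. the equal error rate (EER)."""
    best = min(dtw_distance(attempt, t) for t in enrolled_templates)
    return best <= threshold
```
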

Polly: Telepresence from a Guide's Shoulder

Publication Details
  • Assistive Computer Vision and Robotics Workshop of ECCV
  • Sep 12, 2014

Abstract

Polly is an inexpensive, portable telepresence device based on the metaphor of a parrot riding a guide's shoulder and acting as a proxy for remote participants. Although remote users may be anyone with a desire for `tele-visits', we focus on users with limited mobility. We present a series of prototypes and field tests that informed design iterations. Our current implementations utilize a smartphone on a stabilized, remotely controlled gimbal that can be hand-held, placed on perches, or carried on a wearable frame. We describe findings from trials at campus, museum, and faire tours with remote users, including quadriplegics. We found guides were more comfortable using Polly than a phone and that Polly was accepted by other people. Remote participants appreciated stabilized video and having control of the camera. One challenge is the negotiation of movement and view control. Our tests suggest Polly is an effective alternative to telepresence robots, phones, or fixed cameras.
Publication Details
  • Ubicomp 2014
  • Sep 9, 2014

Abstract

In recent years, there has been an explosion of social and collaborative applications that leverage location to provide users novel and engaging experiences. Current location technologies work well outdoors but fare poorly indoors. In this paper we present LoCo, a new framework that can provide highly accurate room-level location using a supervised classification scheme. We provide experiments that show this technique is orders of magnitude more efficient than current state-of-the-art Wi-Fi localization techniques. Low classification overhead and a small computational footprint make classification practical and efficient even on mobile devices. Our framework has also been designed to be easily deployed and leveraged by developers to help create a new wave of location-driven applications and services.
Publication Details
  • CHI 2014 (Interactivity)
  • Apr 26, 2014

Abstract

AirAuth is a biometric authentication technique that uses in-air hand gestures, tracked by a short-range depth sensor, to authenticate users. Our method simultaneously tracks multiple distinct points on the user's hand, which act as a biometric to further enhance security. We describe the details of our mobile demonstrator, which will give Interactivity attendees an opportunity to enroll with and verify our system's authentication method. We also wish to encourage users to design their own gestures for use with the system. Apart from engaging with the CHI community, a demonstration of AirAuth will also yield useful gesture data input by the attendees, which we intend to use to further improve the prototype and, more importantly, to make available publicly as a resource for further research into gesture-based user interfaces.
Publication Details
  • CHI Extended Abstracts 2014
  • Apr 26, 2014

Abstract

AirAuth is a biometric authentication system based on in-air gesture input. We describe the operations necessary to sample enrollment gestures and to perform matching for authentication, using data from a short-range depth sensor. We present the results of two initial user studies. A first study was conducted to crowd-source a simple gesture set for use in further evaluations. The results of our second study indicate that AirAuth achieves a very high equal error rate (EER)-based accuracy of 96.6% for the simple gesture set and 100% for user-specific gestures. Future work will encompass the evaluation of possible attack scenarios and obtaining qualitative user feedback on the usability advantages of gesture-based authentication.
2013
Publication Details
  • Interactive Tabletops and Surfaces (ITS) 2013
  • Oct 6, 2013

Abstract

The expressiveness of touch input can be increased by detecting additional finger pose information at the point of touch such as finger rotation and tilt. PointPose is a prototype that performs finger pose estimation at the location of touch using a short-range depth sensor viewing the touch screen of a mobile device. We present an algorithm that extracts finger rotation and tilt from a point cloud generated by a depth sensor oriented towards the device's touchscreen. The results of two user studies we conducted show that finger pose information can be extracted reliably using our proposed method. We show this for controlling rotation and tilt axes separately and also for combined input tasks using both axes. With the exception of the depth sensor, which is mounted directly on the mobile device, our approach does not require complex external tracking hardware, and, furthermore, external computation is unnecessary as the finger pose extraction algorithm can run directly on the mobile device. This makes PointPose ideal for prototyping and developing novel mobile user interfaces that use finger pose estimation.
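
A minimal sketch of extracting rotation and tilt from a fingertip point cloud via its principal axis (PCA); the paper's actual extraction algorithm is not reproduced here, and the coordinate conventions (z pointing away from the touchscreen) are assumptions.

```python
import numpy as np

def finger_pose(points):
    """Estimate finger rotation (orientation in the screen plane) and
    tilt (angle relative to the screen) from the (N, 3) points of the
    finger near the touch. The dominant principal axis of the point
    cloud is taken as the finger direction.
    """
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]               # first principal axis = finger direction
    if axis[2] < 0:            # orient the axis away from the screen
        axis = -axis
    rotation = np.degrees(np.arctan2(axis[1], axis[0]))  # in-plane angle
    tilt = np.degrees(np.arcsin(np.clip(axis[2], -1.0, 1.0)))
    return rotation, tilt
```
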
Publication Details
  • The International Symposium on Pervasive Displays
  • Jun 4, 2013

Abstract

Existing user interfaces for the configuration of large shared displays with multiple inputs and outputs usually do not allow users easy and direct configuration of the display's properties such as window arrangement or scaling. To address this problem, we are exploring a gesture-based technique for manipulating display windows on shared display systems. To aid target selection under noisy tracking conditions, we propose VoroPoint, a modified Voronoi tessellation approach that increases the selectable target area of the display windows. By maximizing the available target area, users can select and interact with display windows with greater ease and precision.
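
With a plain (unmodified) Voronoi tessellation, selecting a window reduces to picking the nearest window center, so every screen position resolves to some window and small targets gain a larger selectable area. The sketch below shows that baseline; VoroPoint's modifications to the tessellation are not reproduced here.

```python
import numpy as np

def voronoi_select(pointer_xy, window_centers):
    """Return the index of the display window whose Voronoi cell
    contains the pointer; for a plain Voronoi tessellation this is
    simply the nearest window center."""
    p = np.asarray(pointer_xy, dtype=float)
    centers = np.asarray(window_centers, dtype=float)
    return int(np.argmin(np.linalg.norm(centers - p, axis=1)))

# Even a noisy pointer landing between windows resolves to the
# closest one rather than to dead space.
windows = [(200, 150), (800, 150), (500, 600)]
print(voronoi_select((430, 90), windows))  # -> 0
```
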

Abstract

Motivated by the addition of gyroscopes to a large number of new smartphones, we study the effects of combining accelerometer and gyroscope data on the recognition rate of motion gesture recognizers with dimensionality constraints. Using a large data set of motion gestures, we analyze results for the following algorithms: Protractor3D, Dynamic Time Warping (DTW), and regularized logistic regression (LR). We chose to study these algorithms because they are relatively easy to implement and thus well suited for rapid prototyping or early deployment. For use in our analysis, we contribute a method to extend Protractor3D to work with the 6D data obtained by combining accelerometer and gyroscope data. Our results show that combining accelerometer and gyroscope data is also beneficial for algorithms with dimensionality constraints, improving the gesture recognition rate on our data set by up to 4%.
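
As a rough illustration of the 6D extension, the sketch below fuses accelerometer and gyroscope frames, resamples traces to a fixed length, and scores them with a Protractor-style cosine similarity. It deliberately omits Protractor3D's closed-form rotation alignment, and all function names are illustrative rather than the paper's.

```python
import numpy as np

def to_6d(accel, gyro):
    """Fuse synchronized accelerometer and gyroscope streams (each
    (T, 3)) into one (T, 6) trace for recognizers that accept
    higher-dimensional input."""
    return np.hstack([accel, gyro])

def resample(trace, n=32):
    """Resample a (T, D) trace to n points by linear interpolation,
    since Protractor-style matchers expect equal-length templates."""
    t_old = np.linspace(0.0, 1.0, len(trace))
    t_new = np.linspace(0.0, 1.0, n)
    return np.column_stack([np.interp(t_new, t_old, trace[:, d])
                            for d in range(trace.shape[1])])

def protractor_score(a, b):
    """Cosine similarity between flattened, normalized traces; a
    simplification that skips the optimal rotation search."""
    va, vb = a.ravel(), b.ravel()
    va, vb = va / np.linalg.norm(va), vb / np.linalg.norm(vb)
    return float(np.dot(va, vb))
```
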