Chelhwon Kim, Ph.D.

Research Scientist


Chelhwon Kim joined FXPAL as a Research Engineer in April 2016. His research focuses on computer vision. He received his Ph.D. in Electrical Engineering from the University of California, Santa Cruz. For more information, please visit his personal website.


Publications

2019

Abstract

We present a remote assistance system that enables a remotely located expert to provide guidance, using hand gestures, to a customer who performs a physical task in a different location. The system is built on top of a web-based real-time media communication framework, which allows the customer to use a commodity smartphone to send a live video feed to the expert; the expert sees the customer's workspace and can show his or her hand gestures over the video in real time. The expert's hand gestures are captured with a hand tracking device and visualized with a rigged 3D hand model on the live video feed. The system can be accessed via a web browser and does not require any app software to be installed on the customer's device. Our system supports various types of devices, including smartphones, tablets, desktop PCs, and smart glasses. To improve the collaboration experience, the system provides a novel gravity-aware hand visualization technique.
Publication Details
  • ACM ISS 2019
  • Nov 9, 2019
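To illustrate the kind of hand-over-video compositing such a system performs, the following is a minimal sketch (not the authors' implementation) that draws 2D hand joints over a live webcam frame with OpenCV. The get_hand_joints helper and the joint layout are hypothetical placeholders for a real hand tracking device; the rigged 3D hand model and gravity-aware rendering are not shown.

```python
# Minimal sketch: overlay tracked hand joints on a live video feed with OpenCV.
# `get_hand_joints` is a hypothetical stand-in for a real hand tracking device.
import cv2

# Simple bone list connecting joint indices (hypothetical 5-joint chain).
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]

def get_hand_joints(frame):
    """Placeholder for a real hand tracker; returns a list of (x, y) joints."""
    h, w = frame.shape[:2]
    return [(w // 2 + 20 * i, h // 2) for i in range(5)]

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    joints = get_hand_joints(frame)
    for a, b in BONES:
        cv2.line(frame, joints[a], joints[b], (0, 255, 0), 2)
    for x, y in joints:
        cv2.circle(frame, (x, y), 4, (0, 0, 255), -1)
    cv2.imshow("remote-assistance overlay (sketch)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```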

Abstract

In a telepresence scenario with remote users discussing a document, it can be difficult to follow which parts are being discussed. One way to address this is to show the user's hand position on the document, which also enables expressive gestural communication. An important practical problem is how to capture and transmit the hand movements efficiently along with high-resolution document images. We propose a tabletop system with two channels that integrates document capture with a 4K video camera and hand tracking with a webcam, in which the document image and hand skeleton data are transmitted at different rates and handled by a lightweight web browser client at remote sites. To enhance the rendering, we employ velocity-based smoothing and ephemeral motion traces. We tested our prototype over long distances from the USA to Japan and to Italy, and report on latency and jitter performance. Our system achieves relatively low latency over a long distance in comparison with a tele-immersive system that transmits mesh data over much shorter distances.
Publication Details
  • ACM MM
  • Oct 21, 2019
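The velocity-based smoothing mentioned in the abstract adapts how strongly hand positions are filtered: slow motion is smoothed heavily to suppress jitter, while fast motion is passed through with little filtering to keep latency low. The sketch below shows this general idea with a simple velocity-adaptive exponential smoother; the parameter values and the speed-to-weight mapping are assumptions, not the paper's exact filter.

```python
# Minimal sketch of velocity-adaptive exponential smoothing for 2D joint positions.
# The mapping from speed to smoothing weight is an assumption, not the paper's filter.
import numpy as np

class VelocitySmoother:
    def __init__(self, min_alpha=0.2, max_alpha=0.9, speed_scale=50.0):
        self.prev = None
        self.min_alpha = min_alpha      # heavy smoothing when nearly still
        self.max_alpha = max_alpha      # light smoothing when moving fast
        self.speed_scale = speed_scale  # pixels/frame at which alpha saturates

    def update(self, point):
        point = np.asarray(point, dtype=float)
        if self.prev is None:
            self.prev = point
            return point
        speed = np.linalg.norm(point - self.prev)
        t = min(speed / self.speed_scale, 1.0)
        alpha = self.min_alpha + t * (self.max_alpha - self.min_alpha)
        self.prev = alpha * point + (1.0 - alpha) * self.prev
        return self.prev

smoother = VelocitySmoother()
for raw in [(100, 100), (101, 99), (160, 140)]:  # jittery, then fast motion
    print(smoother.update(raw))
```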

Abstract

Despite work on smart spaces, much knowledge work nowadays happens in the wild: at home, in coffee shops, on trains, buses, and planes, and of course in crowded open-office cubicles. Conducting web conferences in these settings creates privacy issues and can also distract participants, leading to a perceived lack of professionalism on the part of the remote peer(s). To address this common problem, we implemented CamaLeon, a browser-based tool that uses real-time machine vision powered by deep learning to change the webcam stream sent by the remote peer: specifically, CamaLeon dynamically replaces the "wild" background with one that resembles that of the office workers. To detect the background in wild settings, we designed and trained a fast UNet model on head-and-shoulder images. CamaLeon also uses a face detector to determine whether it should stream the person's face, depending on its location (or lack of presence), and uses face recognition to make sure it streams only a face that belongs to the user who connected to the meeting. The system was tested during a few real video conferencing calls at our company in which two workers were remote. Both parties felt a sense of enhanced co-presence, and the remote participants felt more professional with their backgrounds replaced.
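Once a segmentation network has produced a foreground mask, the background replacement itself is a simple compositing step. The sketch below illustrates that compositing with OpenCV and NumPy, assuming a per-pixel foreground probability map is already available; the placeholder mask and file names stand in for the output of a model such as the UNet described above.

```python
# Minimal sketch: composite a person over a replacement background using a
# foreground mask. The mask is assumed to come from a segmentation model;
# here it is just a placeholder array, and the file names are hypothetical.
import numpy as np
import cv2

frame = cv2.imread("webcam_frame.jpg")           # current webcam frame
office_bg = cv2.imread("office_background.jpg")  # replacement background
office_bg = cv2.resize(office_bg, (frame.shape[1], frame.shape[0]))

# Placeholder foreground probability map in [0, 1]; a real system would
# run the segmentation network on `frame` to obtain this.
mask = np.zeros(frame.shape[:2], dtype=np.float32)
mask[100:400, 200:450] = 1.0

# Soften the mask edge so the composite does not look cut out.
mask = cv2.GaussianBlur(mask, (21, 21), 0)
mask3 = mask[..., None]

composite = (mask3 * frame + (1.0 - mask3) * office_bg).astype(np.uint8)
cv2.imwrite("composite.jpg", composite)
```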

Abstract

Localization in an indoor and/or Global Positioning System (GPS)-denied environment is paramount for applications that require locating humans and/or robots in an unknown environment. Various localization systems using ubiquitous sensors such as cameras, radio frequency, and inertial measurement units have been developed. Most of these systems cannot accommodate scenarios with substantial changes in the environment, such as a large number of people (unpredictable) or a sudden change in the floor plan (unstructured). In this paper, we propose InFo, a system that leverages real-time visual information captured by surveillance cameras and augments it with images captured by the smart device user to deliver accurate discretized location information. Through our experiments, we demonstrate that our deep learning based InFo system provides an improvement of 10% over a system that does not utilize this real-time information.
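One plausible way to combine the user's photo with a real-time surveillance view is to encode both images with CNNs and classify the fused features into discrete location cells. The sketch below shows a hypothetical two-stream PyTorch model in that spirit; it is an illustration only, not the actual InFo architecture.

```python
# Hypothetical two-stream classifier: one CNN encodes the user's photo, another
# encodes a surveillance frame; fused features predict a discretized location.
# Illustrative sketch only; not the InFo architecture from the paper.
import torch
import torch.nn as nn

class TwoStreamLocalizer(nn.Module):
    def __init__(self, num_cells=16):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.user_enc = encoder()
        self.cam_enc = encoder()
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_cells),
        )

    def forward(self, user_img, cam_img):
        feats = torch.cat([self.user_enc(user_img), self.cam_enc(cam_img)], dim=1)
        return self.head(feats)  # logits over discrete location cells

model = TwoStreamLocalizer()
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 16])
```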
2018
Publication Details
  • ISS 2018
  • Nov 25, 2018

Abstract

Projector-camera systems can turn any surface, such as a tabletop or wall, into an interactive display. A basic problem is to recognize gesture actions on the projected UI widgets. Previous approaches using finger template matching or occlusion patterns have issues with environmental lighting conditions, with artifacts and noise in the video images of a projection, and with inaccuracies of depth cameras. In this work, we propose a new recognizer that employs a deep neural network with an RGB-D camera; specifically, we use a CNN (convolutional neural network) with optical flow computed from the color and depth channels. We evaluated our method on a new dataset of RGB-D videos of 12 users interacting with buttons projected on a tabletop surface.
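As an illustration of the input representation described above, the sketch below computes dense optical flow on both the color (converted to grayscale) and depth streams of consecutive RGB-D frames with OpenCV and stacks the flow fields into a single multi-channel array for a CNN; the normalization and frame-pairing details are assumptions, not the paper's exact preprocessing.

```python
# Minimal sketch: compute dense optical flow on the color and depth channels of
# consecutive RGB-D frames and stack the flow fields as CNN input.
import cv2
import numpy as np

def flow(prev, curr):
    """Dense Farneback optical flow between two single-channel uint8 images."""
    return cv2.calcOpticalFlowFarneback(
        prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)  # shape (H, W, 2)

def make_cnn_input(rgb_prev, rgb_curr, depth_prev, depth_curr):
    gray_prev = cv2.cvtColor(rgb_prev, cv2.COLOR_BGR2GRAY)
    gray_curr = cv2.cvtColor(rgb_curr, cv2.COLOR_BGR2GRAY)
    color_flow = flow(gray_prev, gray_curr)
    depth_flow = flow(depth_prev, depth_curr)
    # 4-channel input: (color_dx, color_dy, depth_dx, depth_dy)
    return np.concatenate([color_flow, depth_flow], axis=2).astype(np.float32)

# Example with random frames standing in for an RGB-D camera stream.
h, w = 240, 320
rgb_a = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
rgb_b = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
d_a = np.random.randint(0, 255, (h, w), dtype=np.uint8)
d_b = np.random.randint(0, 255, (h, w), dtype=np.uint8)
print(make_cnn_input(rgb_a, rgb_b, d_a, d_b).shape)  # (240, 320, 4)
```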
2017
Publication Details
  • ICDAR 2017
  • Nov 10, 2017

Abstract

We present a system for capturing ink strokes written with an ordinary pen and paper using a fast camera with a frame rate comparable to that of a stylus digitizer. From the video frames, ink strokes are extracted and used as input to an online handwriting recognition engine. A key component of our system is a pen up/down detection model for detecting contact of the pen tip with the paper in the video frames. The proposed model consists of feature representation with convolutional neural networks and classification with a recurrent neural network. We also use a high-speed tracker with kernelized correlation filters to track the pen tip. For training and evaluation, we collected labeled video data of users writing English and Japanese phrases from public datasets, and we report character accuracy scores for different frame rates in the two languages.
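A minimal PyTorch sketch of the CNN-plus-RNN structure described above is shown below: a small convolutional encoder extracts per-frame features from patches around the tracked pen tip, and an LSTM classifies each frame as pen-up or pen-down. The layer sizes and patch handling are assumptions, not the architecture from the paper.

```python
# Illustrative sketch: per-frame CNN features followed by an LSTM that
# classifies pen up/down for each frame. Layer sizes are assumptions.
import torch
import torch.nn as nn

class PenStateNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)  # pen-up vs pen-down logits

    def forward(self, clips):
        # clips: (batch, time, 1, H, W) grayscale patches around the pen tip
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.rnn(feats)
        return self.out(seq)  # (batch, time, 2)

model = PenStateNet()
print(model(torch.randn(2, 30, 1, 32, 32)).shape)  # torch.Size([2, 30, 2])
```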
Publication Details
  • IEEE PerCom 2017
  • Mar 13, 2017

Abstract

We present Lift, a visible light-enabled finger tracking and object localization technique that allows users to perform freestyle multi-touch gestures on any object's surface in an everyday environment. By projecting encoded visible patterns onto an object's surface (e.g. paper, display, or table) and localizing the user's fingers with light sensors, Lift offers users a richer interactive space than the device's existing interfaces. Additionally, everyday objects can be augmented by attaching sensor units to their surface to accept multi-touch gesture input. We also present two applications as a proof of concept. Finally, results from our experiments indicate that Lift can localize ten fingers simultaneously with an accuracy of 0.9 mm and 1.8 mm on the two axes, respectively, at an average refresh rate of 84 Hz, with 16.7 ms delay over Wi-Fi and 12 ms delay over a serial connection, making gesture recognition on non-instrumented objects possible.
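The abstract does not specify the encoding, but one classic way to localize a light sensor under projected patterns is temporal binary (Gray) coding, in which each projector column flashes a unique bit sequence and the sensor's brightness readings over one pattern cycle spell out its position. The sketch below shows only that hypothetical decoding step, purely to illustrate the idea of recovering a position from encoded visible light; it is not Lift's actual scheme.

```python
# Hypothetical sketch: decode a light sensor's position from a sequence of
# binary brightness samples under Gray-coded projected patterns. Shown only
# to illustrate the idea; the paper does not specify this particular coding.
def gray_to_binary(gray):
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary

def decode_position(samples, threshold=0.5):
    """samples: brightness readings, one per projected bit-plane (MSB first)."""
    bits = [1 if s > threshold else 0 for s in samples]
    gray = 0
    for b in bits:
        gray = (gray << 1) | b
    return gray_to_binary(gray)

# Example: 6 bit-planes -> 64 possible positions along one axis.
readings = [0.9, 0.1, 0.8, 0.7, 0.2, 0.1]  # sensor output per pattern frame
print(decode_position(readings))
```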
2015
Publication Details
  • ACM Multimedia Conference 2015
  • Oct 26, 2015

Abstract

New technology comes about in a number of different ways. It may come from advances in scientific research, from new combinations of existing technology, or simply from imagining what might be possible in the future. This video describes the evolution of Tabletop Telepresence, a system for remote collaboration through desktop videoconferencing combined with a digital desk. Tabletop Telepresence provides a means to share paper documents between remote desktops, interact with documents and request services (such as translation), and communicate with a remote person through a teleconference. It was made possible by combining advances in camera/projector technology that enable a fully functional digital desk, embodied telepresence in video conferencing, and concept art that imagines future work styles.
Publication Details
  • DocEng 2015
  • Sep 8, 2015

Abstract

We present a novel system for detecting and capturing paper documents on a tabletop using a 4K video camera mounted overhead on pan-tilt servos. Our automated system first finds paper documents on a cluttered tabletop based on a text probability map, and then takes a sequence of high-resolution frames of the located document to reconstruct a high-quality, fronto-parallel document page image. The quality of the resulting images enables OCR processing of the whole page. In a preliminary evaluation on a small set of 10 document pages, our proposed system achieved 98% accuracy with the open-source Tesseract OCR engine.
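To illustrate the last stages of such a pipeline, the sketch below warps a detected page quadrilateral to a fronto-parallel image and runs Tesseract OCR on the result via pytesseract; the page corners and file names are hypothetical placeholders standing in for the system's document detection, and the multi-frame reconstruction is not shown.

```python
# Minimal sketch: warp a detected page quadrilateral to a fronto-parallel image
# and run Tesseract OCR on it. Corner detection is assumed to have happened.
import cv2
import numpy as np
import pytesseract  # requires the Tesseract binary to be installed

frame = cv2.imread("tabletop_frame.jpg")

# Hypothetical page corners in the source frame (top-left, top-right,
# bottom-right, bottom-left), e.g. from a document detector.
corners = np.float32([[420, 180], [1580, 210], [1560, 1020], [400, 990]])

out_w, out_h = 1240, 1754  # roughly A4 at 150 dpi
target = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])

H = cv2.getPerspectiveTransform(corners, target)
page = cv2.warpPerspective(frame, H, (out_w, out_h))

text = pytesseract.image_to_string(page)
print(text)
```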
2013
Publication Details
  • CBDAR 2013
  • Aug 23, 2013

Abstract

Capturing book images is more convenient with a mobile phone camera than with more specialized flat-bed scanners or 3D capture devices. We built an application for the iPhone 4S that captures a sequence of high-resolution (8 MP) images of a page spread as the user sweeps the device across the book. For the 3D dewarping, we implemented two algorithms: optical flow (OF) and structure from motion (SfM). Making further use of the image sequence, we examined the potential of multi-frame OCR. A preliminary evaluation on a small data set shows that OF and SfM had comparable OCR performance for both single-frame and multi-frame techniques, and that multi-frame was substantially better than single-frame. The computation time was much less for OF than for SfM.
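As an illustration of the optical-flow ingredient, the sketch below tracks page features between consecutive frames of a camera sweep using pyramidal Lucas-Kanade flow in OpenCV; the synthetic frames are placeholders, and the actual 3D dewarping and multi-frame OCR are not shown.

```python
# Minimal sketch: track page features between consecutive frames of a camera
# sweep with pyramidal Lucas-Kanade optical flow. Dewarping and OCR would
# build on correspondences like these.
import cv2
import numpy as np

def track_features(prev_gray, curr_gray):
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    return pts[good].reshape(-1, 2), nxt[good].reshape(-1, 2)

# Example with two synthetic frames standing in for the phone video.
prev = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
curr = np.roll(prev, 3, axis=1)  # simulate a small horizontal sweep
p0, p1 = track_features(prev, curr)
print(len(p0), "features tracked; mean shift:", (p1 - p0).mean(axis=0))
```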