Video Surveillance

Dynamic Object Tracking System (DOTS)

The Dynamic Object Tracking System (DOTS) is an indoor, real-time, multi-camera video surveillance system, deployed in a real office setting.

Video surveillance systems are common in commercial, industrial, and residential environments. A common surveillance activity is to track important people, or people exhibiting suspicious behavior, as they move from camera to camera. With the decreasing cost of video hardware, the number of video streams per installation is increasing. The increased scale creates difficulties for humans trying to recognize important events as they happen and to track people through the monitored space.

Video surveillance has a number of applications related to home and office security. It is also required in places such as casinos, banks, hospitals, and health care facilities. In addition, surveillance techniques are increasingly applied to retail and business process analysis. We envision surveillance systems that give users an understanding of remote activity and events through video analysis, remote sensors, and user interfaces.

Combining our expertise in video analysis and user interfaces, we developed DOTS with the following features:

  • integration across multiple cameras,
  • a “smart” interface that makes it easy to follow a specified person across cameras,
  • summarization of the amount of activity and of movement patterns,
  • integration with other sensors.

DOTS Video Surveillance System

The DOTS user interface displays multiple streams of recorded or live video. It provides automatic object tracking and visualization based on video analysis results stored in a database. The user interface includes a camera bank, a timeline with event display, a floor plan, and a main player area.

The camera bank presents views of all cameras in low resolution and at a low frame rate. All displays are synchronized to the same playback position; skipping to a different position in the timeline controls all video displays. Users can manually select cameras in the camera bank to see a larger display at a higher frame rate in the main video playback area.

Video playback is controlled by a non-linear timeline that is synchronized to all displayed video streams. The timeline uses a detailed linear scale for the video around the current playback position, which is shown in yellow, and a less detailed linear scale for video far from the playback position. Controls let the user increase and decrease the playback speed and reverse the playback direction. DOTS also supports externally generated events, such as those from door sensors or RFID tags.

Security video installations often require security personnel to monitor hundreds of video streams. A floor-plan interface component lets them select which video streams to include in the multi-stream video player. The floor plan displays the location of each camera, its field of view, the cameras being shown in the main viewer, and the objects being tracked.

The main player area displays one or more video streams at high frame rates, favoring important camera views such as those that show a person currently being tracked. The size of a video stream display indicates its relative importance as determined by either the user or the system. Users can switch between automatic and manual selection modes and can override the automatic selection at any time.
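As a rough illustration of the non-linear timeline, the sketch below maps a video time to a horizontal pixel position with a piecewise-linear scale: a window around the playback position gets a detailed scale, and everything outside it gets a coarser one. The function name, the window size, and the pixels-per-second values are assumptions made for illustration, not DOTS parameters.

```python
def time_to_pixel(t, playhead, center_px, focus_half_width=60.0,
                  fine_px_per_s=4.0, coarse_px_per_s=0.25):
    """Map video time t (seconds) to an x position on the timeline (pixels).

    Times within focus_half_width seconds of the playhead use the fine
    scale. Times farther away use the coarse scale, continuing from the
    edge of the focus window so the mapping stays continuous and monotonic.
    All numeric defaults are illustrative assumptions.
    """
    offset = t - playhead
    sign = 1.0 if offset >= 0 else -1.0
    distance = abs(offset)
    if distance <= focus_half_width:
        px = distance * fine_px_per_s
    else:
        px = (focus_half_width * fine_px_per_s
              + (distance - focus_half_width) * coarse_px_per_s)
    return center_px + sign * px
```

With these assumed defaults, one minute of video on either side of the playback position occupies 240 pixels, while each additional minute farther out occupies only 15 pixels.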

DOTS Figure 1

When manually tracking a person walking from camera view to camera view, it is difficult for users to predict the camera view in which a tracked person might appear after walking out of the main camera view. To better support this task, we created an alternative to the main player area. The Spatial Multi-Video (SMV) player selects and organizes its contents primarily based on geographic relations between the main camera and the other cameras. Rather than displaying all camera views, only views in close proximity are shown. Multiple smaller views surround the central view; a person walking out of the main camera’s view will likely appear in the camera view adjacent to the direction they walked out.
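The idea of arranging nearby views around the main view can be sketched as follows. This is a minimal example under stated assumptions, not the SMV player's actual layout algorithm: cameras are described by hypothetical dictionaries with floor-plan coordinates, proximity is a plain distance threshold, and each neighbor is assigned to one of four sides of the central view according to its direction relative to where the main camera looks.

```python
import math

def arrange_neighbors(main, cameras, max_distance=15.0):
    """Return {camera_name: side} for cameras near the main camera.

    main has floor-plan coordinates 'x', 'y' and a 'heading' in degrees,
    the viewing direction measured counter-clockwise from the +x axis.
    cameras is a list of dicts with 'name', 'x', 'y' and must not include
    the main camera itself. max_distance is an assumed proximity cutoff.
    """
    layout = {}
    for cam in cameras:
        dx = cam["x"] - main["x"]
        dy = cam["y"] - main["y"]
        if math.hypot(dx, dy) > max_distance:
            continue  # too far away to be a likely next view
        bearing = math.degrees(math.atan2(dy, dx))
        # Direction of the neighbor relative to the viewing direction,
        # normalized to the range [-180, 180).
        rel = (bearing - main["heading"] + 180.0) % 360.0 - 180.0
        if abs(rel) <= 45.0:
            side = "top"      # ahead of the main camera
        elif abs(rel) >= 135.0:
            side = "bottom"   # behind the main camera
        elif rel > 0:
            side = "left"     # counter-clockwise from the viewing direction
        else:
            side = "right"
        layout[cam["name"]] = side
    return layout
```

A person who leaves the main view toward the left edge would then be expected in the view drawn on the left side of the central display.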

DOTS Figure 2

The DOTS user interface suite includes a 3D viewer that displays segmented foreground regions of tracked people in a simple 3D model of the surveillance area. Foreground segments are shown as ‘billboards’ facing the virtual camera, placed at the tracked position of the person. Arbitrary viewpoints are supported, but two modes provide particularly useful mobile views. One places the virtual view at the position of a tracked person to help a surveillance user understand what the tracked person can see as they move. The other automatically chooses the best camera view of a tracked person. As the person moves around and the best camera view changes, the virtual view smoothly transitions to the next camera. Models are produced by a tool that allows tracing over a floor plan image to define walls, but they could be imported from CAD or architectural files. The floor is texture mapped with the building floor plan, and surfaces are currently given simple texture maps.
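The billboard placement can be illustrated with a small geometric sketch. It assumes a 2D floor-plan coordinate system and a billboard that rotates only about the vertical axis; the function names and the width parameter are hypothetical and are not taken from the DOTS viewer.

```python
import math

def billboard_yaw(person_xy, viewer_xy):
    """Angle (radians, counter-clockwise from +x) from the tracked
    position toward the virtual camera; the billboard's front faces it."""
    dx = viewer_xy[0] - person_xy[0]
    dy = viewer_xy[1] - person_xy[1]
    return math.atan2(dy, dx)

def billboard_base(person_xy, viewer_xy, width):
    """Return the two floor-plan endpoints of the billboard's bottom edge.

    The edge is centered on the tracked position and runs perpendicular
    to the direction toward the virtual camera.
    """
    yaw = billboard_yaw(person_xy, viewer_xy)
    # Unit vector perpendicular to the facing direction.
    ex, ey = -math.sin(yaw), math.cos(yaw)
    half = width / 2.0
    px, py = person_xy
    return ((px - half * ex, py - half * ey),
            (px + half * ex, py + half * ey))
```

Recomputing the edge whenever the virtual camera or the tracked position moves keeps the foreground segment facing the viewer from any viewpoint.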

DOTS Figure 3

DOTS detects foreground objects in a camera view by performing foreground segmentation that combines pixel-level background modeling with feature-level subtraction. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion. In our installation, cameras are mounted near the ceiling with oblique downward views. We measured the location of each camera in three dimensions. We also estimated pan, tilt, yaw, and field of view by matching known points in the world (e.g., corners of walls) with their positions in a camera’s view. Our system uses calibrated camera information and a model of the building geometry to estimate each object’s position given the bounding box associated with the object. Once objects are tracked in single-camera views, camera handoff determines the likelihood that multiple tracks result from the same object. In our surveillance system, people may pass from one camera view to another through “blind” regions in which they cannot be seen. In those cases, previously learned travel times between cameras are used to match tracks.
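To make the handoff through blind regions concrete, the following sketch scores candidate tracks by how well the observed gap matches a learned travel time between two cameras. The Gaussian model, the data layout, and the likelihood threshold are assumptions chosen for illustration; the text above states only that learned times between cameras are used to match tracks.

```python
import math

def transit_likelihood(dt, mean, std):
    """Gaussian likelihood of a dt-second gap, given learned statistics."""
    z = (dt - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def best_handoff(exit_track, entry_tracks, transit_stats, min_likelihood=1e-3):
    """Pick the entry track most likely to continue exit_track.

    exit_track: dict with 'camera' and 'end_time' (seconds).
    entry_tracks: list of dicts with 'id', 'camera', 'start_time'.
    transit_stats: {(from_camera, to_camera): (mean_s, std_s)}, assumed to
    have been learned from earlier observations.
    Returns the id of the best candidate, or None if nothing is plausible.
    """
    best_id, best_score = None, min_likelihood
    for track in entry_tracks:
        stats = transit_stats.get((exit_track["camera"], track["camera"]))
        if stats is None:
            continue  # no learned travel time between these two cameras
        dt = track["start_time"] - exit_track["end_time"]
        if dt <= 0:
            continue  # the track started before the person left the first view
        score = transit_likelihood(dt, *stats)
        if score > best_score:
            best_id, best_score = track["id"], score
    return best_id
```

A real system would likely combine this timing score with other evidence, such as appearance or estimated position, before merging tracks.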

DOTS is an integrated system for office video surveillance. It combines an infrastructure for recording network video cameras with a video analysis component for detecting events and tracking people, and a user interface that can quickly access recorded and live video. DOTS uses the results of the video analysis to guide users’ attention to interesting events for more effective monitoring in systems with many video streams. We gained a better understanding of issues in video surveillance through the year of DOTS’ deployment in our office. We improved our analysis methods and the user interface, but additional work remains. For example, we are working on detecting higher-level events such as unusual behavior, fights or falls.

Related Publications

2008
Publication Details
  • ACM Multimedia
  • Oct 27, 2008

Abstract

Retail establishments want to know about traffic flow and patterns of activity in order to better arrange and staff their business. A large number of fixed video cameras are commonly installed at these locations. While they can be used to observe activity in the retail environment, assigning personnel to watch them is too time-consuming to be valuable for retail analysis. We have developed video processing and visualization techniques that generate presentations appropriate for examining traffic flow and changes in activity at different times of the day. Taking the results of video tracking software as input, our system aggregates activity in different regions of the area being analyzed, determines the average speed of moving objects in the region, and segments time based on significant changes in the quantity and/or location of activity. Visualizations present the results as heat maps to show activity and object counts and average velocities overlaid on the map of the space.
2007

DOTS: Support for Effective Video Surveillance

Publication Details
  • Fuji Xerox Technical Report No. 17, pp. 83-100
  • Nov 1, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
Publication Details
  • ICDSC 2007, pp. 132-139
  • Sep 25, 2007

Abstract

Our analysis and visualization tools use 3D building geometry to support surveillance tasks. These tools are part of DOTS, our multicamera surveillance system with over 20 cameras spread throughout the public spaces of our building. The geometric input to DOTS is a floor plan and information such as cubicle wall heights. From this input we construct a 3D model and an enhanced 2D floor plan that are the bases for more specific visualization and analysis tools. Foreground objects of interest can be placed within these models and dynamically updated in real time across camera views. Alternatively, a virtual first-person view suggests what a tracked person can see as she moves about. Interactive visualization tools support complex camera-placement tasks. Extrinsic camera calibration is supported both by visualizations of parameter adjustment results and by methods for establishing correspondences between image features and the 3D model.

DOTS: Support for Effective Video Surveillance

Publication Details
  • ACM Multimedia 2007, pp. 423-432
  • Sep 24, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
Publication Details
  • ICME 2007, pp. 1015-1018
  • Jul 2, 2007

Abstract

We describe a new interaction technique that allows users to control nonlinear video playback by directly manipulating objects seen in the video. This interaction technique is similar to video "scrubbing" where the user adjusts the playback time by moving the mouse along a slider. Our approach is superior to variable-scale scrubbing in that the user can concentrate on interesting objects and does not have to guess how long the objects will stay in view. Our method relies on a video tracking system that tracks objects in fixed cameras, maps them into 3D space, and handles hand-offs between cameras. In addition to dragging objects visible in video windows, users may also drag iconic object representations on a floor plan. In that case, the best video views are selected for the dragged objects.
Publication Details
  • ICME 2007, pp. 675-678
  • Jul 2, 2007

Abstract

In this paper we describe the analysis component of an indoor, real-time, multi-camera surveillance system. The analysis includes: (1) a novel feature-level foreground segmentation method which achieves efficient and reliable segmentation results even under complex conditions, (2) an efficient greedy-search-based approach for tracking multiple people through occlusion, and (3) a method for multi-camera handoff that associates individual trajectories in adjacent cameras. The analysis is used for an 18-camera surveillance system that has been running continuously in an indoor business over the past several months. Our experiments demonstrate that the processing method for people detection and tracking across multiple cameras is fast and robust.
Publication Details
  • CHI 2007, pp. 1167-1176
  • Apr 28, 2007

Abstract

A common video surveillance task is to keep track of people moving around the space being monitored. It is often difficult to track activity between cameras because locations such as hallways in office buildings can look quite similar and do not indicate the spatial proximity of the cameras. We describe a spatial video player that orients nearby video feeds with the field of view of the main playing video to aid in tracking between cameras. This is compared with the traditional bank of cameras with and without interactive maps for identifying and selecting cameras. We additionally explore the value of static and rotating maps for tracking activity between cameras. The study results show that both the spatial video player and the map improve user performance when compared to the camera-bank interface. Also, subjects change cameras more often with the spatial player than with either the camera bank or the map, when available.
2006
Publication Details
  • In Proceedings of the fourth ACM International Workshop on Video Surveillance & Sensor Networks VSSN '06, Santa Barbara, CA, pp. 19-26
  • Oct 27, 2006

Abstract

Video surveillance systems have become common across a wide range of environments. While these installations have included more video streams, they also have been placed in contexts with limited personnel for monitoring the video feeds. In such settings, limited human attention, combined with the quantity of video, makes it difficult for security personnel to identify activities of interest and determine interrelationships between activities in different video streams. We have developed applications to support security personnel both in analyzing previously recorded video and in monitoring live video streams. For recorded video, we created storyboard visualizations that emphasize the most important activity as heuristically determined by the system. We also developed an interactive multi-channel video player application that connects camera views to map locations, alerts users to unusual and suspicious video, and visualizes unusual events along a timeline for later replay. We use different analysis techniques to determine unusual events and to highlight them in video images. These tools aid security personnel by directing their attention to the most important activity within recorded video or among several live video streams.