Video surveillance systems are common in commercial, industrial, and residential environments. A common surveillance activity is to track important people, or people exhibiting suspicious behavior, as they move from camera to camera. With the decreasing cost of video hardware, the number of video streams per installation is increasing. The increased scale creates difficulties for humans trying to recognize important events as they happen and to track people through the monitored space.
Video surveillance has a number of applications related to home and office security. It is also required in places such as casinos, banks, hospitals, and health care facilities. In addition, surveillance techniques are increasingly applied to retail and business process analysis. We envision surveillance systems that give users an understanding of remote activity and events through video analysis, remote sensors, and user interfaces.
Combining our expertise in video analysis and user interfaces, we have developed DOTS (Dynamic Object Tracking System), an indoor, real-time, multi-camera surveillance system deployed in a real office setting, with the following features:
integration across multiple cameras,
a "smart" interface that makes it easy to follow a specified person across cameras,
summarization of the amount of activity and of movement patterns.
The DOTS user interface displays multiple streams of recorded or live video. It provides automatic object tracking and visualization based on video analysis results stored in a database. The user interface includes a camera bank, a timeline with event display, a floor plan, and a main player area.
The camera bank presents views of all cameras in low resolution and at a low frame rate. All displays are synchronized to the same playback position; skipping to a different position in the timeline controls all video displays. Users can manually select cameras in the camera bank to see a larger display at a higher frame rate in the main video playback area.
Video playback is controlled by a non-linear timeline that is synchronized to all displayed video streams. The timeline uses a detailed linear scale for the video around the current playback position (shown in yellow) and a less detailed linear scale for video far from the playback position. Controls let the user increase and decrease the playback speed and reverse the playback direction. DOTS also supports externally generated events, such as those from door sensors or RFID tags.
Security video installations often require security personnel to monitor hundreds of video streams. A floor-plan interface component enables security personnel to select which video streams to include in the multi-stream video player. The floor plan displays the location of each camera, its field of view, the cameras being shown in the main viewer, and the objects being tracked.
The main player area displays one or more important camera views, such as those that show a person currently being tracked, at high frame rates. The size of a video stream display indicates its relative importance as determined by either the user or the system. Users can switch between automatic and manual selection modes and can override the automatic selection at any time.
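As an illustration only, the two-scale timeline described above can be sketched as a piecewise-linear mapping from video time to horizontal pixel position. The function name, the 60-second focus window, and the pixel densities are assumptions made for this sketch; they are not values taken from DOTS.

```python
def time_to_x(t, playhead, center_x,
              focus_s=60.0, fine_px_per_s=2.0, coarse_px_per_s=0.1):
    """Map a video time (seconds) to a pixel position on a two-scale timeline.

    All parameter values are hypothetical. Times within `focus_s` seconds of
    the playhead use the detailed scale; times farther away use the coarse one.
    """
    dt = t - playhead
    dist = abs(dt)
    if dist <= focus_s:
        # Detailed linear scale around the current playback position.
        offset = dist * fine_px_per_s
    else:
        # The detailed region is fully used; the remainder is compressed.
        offset = focus_s * fine_px_per_s + (dist - focus_s) * coarse_px_per_s
    # Earlier times fall to the left of the playhead, later times to the right.
    return center_x + offset if dt >= 0 else center_x - offset
```

Because both segments are linear and meet at the edge of the focus window, the mapping is continuous and monotonic, so events keep their temporal order on screen.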
When manually tracking a person walking from camera view to camera view, it is difficult for users to predict the camera view in which the person will appear after walking out of the main camera view. To better support this task, we created an alternative to the main player area. The Spatial Multi-Video (SMV) player selects and organizes its contents primarily based on geographic relations between the main camera and the other cameras. Rather than displaying all camera views, it shows only views in close proximity. Multiple smaller views surround the central view; a person walking out of the main camera's view will likely appear in the adjacent view that lies in the direction in which they walked out.
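One way to picture the proximity-based selection is the following minimal sketch. It assumes floor-plan coordinates with the y axis pointing up, a known viewing direction for the main camera, and four display slots around the central view; the function name, the slot names, and the 90-degree sectors are hypothetical and are not the SMV player's actual layout rules.

```python
import math

def smv_layout(main_xy, main_heading, cameras, max_dist):
    """Pick nearby cameras and assign each to a slot around the main view.

    main_heading is the main camera's viewing direction in radians on the
    floor plan. Cameras ahead of the main view go to "top", those behind to
    "bottom". Only the nearest camera is kept per slot.
    """
    mx, my = main_xy
    nearest = {}  # slot -> (distance, camera name)
    for name, (x, y) in cameras.items():
        dx, dy = x - mx, y - my
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > max_dist:
            continue  # skip the main camera itself and distant cameras
        # Bearing relative to the viewing direction, wrapped to [-pi, pi).
        rel = math.atan2(dy, dx) - main_heading
        rel = (rel + math.pi) % (2 * math.pi) - math.pi
        deg = math.degrees(rel)
        if -45 <= deg <= 45:
            slot = "top"
        elif 45 < deg < 135:
            slot = "left"   # counterclockwise from the viewing direction
        elif -135 < deg < -45:
            slot = "right"
        else:
            slot = "bottom"
        if slot not in nearest or dist < nearest[slot][0]:
            nearest[slot] = (dist, name)
    return {slot: name for slot, (_, name) in nearest.items()}
```

With this arrangement, a person leaving the main view on the left side would be expected in the view shown in the left slot, provided a camera covers that area.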
The DOTS user interface suite includes a 3D viewer that displays segmented foreground regions of tracked people in a simple 3D model of the surveillance area. Foreground segments are shown as "billboards" facing the virtual camera, placed at the tracked position of the person. Arbitrary viewpoints are supported, but two modes provide particularly useful mobile views. One places the virtual view at the position of a tracked person to help a surveillance user understand what the tracked person can see as they move. The other automatically chooses the best camera view of a tracked person. As the person moves around and the best camera view changes, the virtual view smoothly transitions to the next camera. Models are produced by a tool that allows tracing over a floor plan image to define walls, but they could also be imported from CAD or architectural files. The floor is texture mapped with the building floor plan, and surfaces are currently given simple texture maps.
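The billboard placement can be illustrated with a short sketch that computes the four corners of an upright quad, centered at a tracked floor position and turned about the vertical axis so that it faces the virtual camera. The function name and the choice of an upright, vertical-axis billboard are assumptions for this example, not details of the DOTS viewer.

```python
import math

def billboard_corners(person_xy, camera_xy, width, height):
    """Return the 3D corners of an upright quad that faces the camera.

    The quad stands on the floor (z = 0) at person_xy. Its horizontal edge
    is perpendicular to the floor-plane direction from person to camera.
    """
    px, py = person_xy
    dx, dy = camera_xy[0] - px, camera_xy[1] - py
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("camera is directly above the person; facing is undefined")
    # Unit vector along the billboard's width: view direction rotated by 90 degrees.
    rx, ry = -dy / norm, dx / norm
    hw = width / 2.0
    a = (px - rx * hw, py - ry * hw)
    b = (px + rx * hw, py + ry * hw)
    # Order: bottom edge first, then top edge, going around the quad.
    return [(a[0], a[1], 0.0), (b[0], b[1], 0.0),
            (b[0], b[1], height), (a[0], a[1], height)]
```

The foreground segment of the tracked person would then be drawn as a texture on this quad; recomputing the corners whenever the virtual camera moves keeps the billboard facing the viewer.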
DOTS detects foreground objects in a camera view by performing a foreground segmentation that combines pixel-level background modeling with a feature-level subtraction approach. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion. In our installation, cameras are mounted near the ceiling with oblique downward views. We measured the location of each camera in three dimensions. We also estimated pan, tilt, yaw, and field of view by matching known reference points in the world (e.g., corners of walls) with their locations in each camera view. Our system uses the calibrated camera information and a model of the building geometry to estimate each object's position from the bounding box associated with the object. Once objects are tracked in single-camera views, camera handoff determines the likelihood that multiple tracks result from the same object. In our surveillance system, people may pass from one camera view to another through "blind" regions in which they cannot be seen. In those cases, previously learned travel times between cameras are used to match tracks.
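To make the position estimate concrete, the sketch below casts a ray through the bottom center of a bounding box and intersects it with the floor. It is a simplification under stated assumptions: an ideal pinhole camera, a flat floor at z = 0 instead of a full building model, pan and tilt only (no third rotation), and hypothetical function and parameter names. It is not the DOTS implementation.

```python
import math

def foot_position_on_floor(cam_pos, pan, tilt, hfov, image_size, bbox):
    """Estimate the floor position (x, y) of an object from its bounding box.

    cam_pos: camera location (x, y, z) with z as height above the floor.
    pan: rotation about the vertical axis in radians (0 looks along +x).
    tilt: downward angle in radians. hfov: horizontal field of view in radians.
    bbox: (left, top, right, bottom) in pixels, with the origin at top left.
    Returns None if the ray does not hit the floor.
    """
    cam_x, cam_y, cam_z = cam_pos
    width, height = image_size
    left, _top, right, bottom = bbox
    u, v = (left + right) / 2.0, bottom           # assumed foot point
    f = (width / 2.0) / math.tan(hfov / 2.0)      # focal length in pixels
    xn, yn = (u - width / 2.0) / f, (v - height / 2.0) / f
    cp, sp = math.cos(pan), math.sin(pan)
    ct, st = math.cos(tilt), math.sin(tilt)
    fwd = (ct * cp, ct * sp, -st)                 # optical axis in world coordinates
    rgt = (sp, -cp, 0.0)                          # image x axis (right)
    dwn = (-st * cp, -st * sp, -ct)               # image y axis (down)
    d = tuple(fwd[i] + xn * rgt[i] + yn * dwn[i] for i in range(3))
    if d[2] >= 0:
        return None                               # ray points at or above the horizon
    t = -cam_z / d[2]                             # ray parameter where z reaches 0
    return (cam_x + t * d[0], cam_y + t * d[1])
```

For example, a camera 3 m above the floor, tilted down by 45 degrees, sees the point at the image center about 3 m in front of it on the floor. Bounding boxes that reach lower in the image map to positions closer to the camera.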
DOTS is an integrated system for office video surveillance. It combines an infrastructure for recording network video cameras with a video analysis component for detecting events and tracking people, and a user interface that can quickly access recorded and live video. DOTS uses the results of the video analysis to guide users' attention to interesting events for more effective monitoring in systems with many video streams. We gained a better understanding of issues in video surveillance during the year-long deployment of DOTS in our office. We improved our analysis methods and the user interface, but additional work remains. For example, we are working on detecting higher-level events such as unusual behavior, fights, or falls.