This paper summarizes our environment-image/videosupported collaboration technologies developed in the past several years. These technologies use environment images and videos as active interfaces and use visual cues in these images and videos to orient device controls, annotations and other information access. By using visual cues in various interfaces, we expect to make the control interface more intuitive than buttonbased control interfaces and command-based interfaces. These technologies can be used to facilitate high-quality audio/video capture with limited cameras and microphones. They can also facilitate multi-screen presentation authoring and playback, teleinteraction, environment manipulation with cell phones, and environment manipulation with digital pens.