The design and implementation of a search engine for lecture webcasts is described. A searchable text index lets users locate material within lecture videos hosted on a variety of websites, such as YouTube and Berkeley webcasts. The index is built from the text of presentation slides appearing in the video, along with associated metadata such as the title and abstract when available.
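As a rough illustration of such an index, the following minimal sketch maps each word found in slide text or metadata to the videos and time offsets where it occurs. The record layout (`id`, `title`, `abstract`, `slides`), the function names, and the choice to file metadata hits under offset 0.0 are assumptions made for this example, not details of the deployed system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(videos):
    """Build an inverted index: token -> set of (video_id, time_offset) hits.

    Each video is a dict with hypothetical keys 'id', 'title', 'abstract',
    and 'slides', where 'slides' is a list of (time_offset, slide_text).
    Metadata hits are filed under offset 0.0 (an assumption of this sketch).
    """
    index = defaultdict(set)
    for video in videos:
        video_id = video["id"]
        for field in ("title", "abstract"):
            for token in tokenize(video.get(field, "")):
                index[token].add((video_id, 0.0))
        for time_offset, slide_text in video.get("slides", []):
            for token in tokenize(slide_text):
                index[token].add((video_id, time_offset))
    return index


def search(index, query):
    """Return sorted hits whose slide (or metadata) contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


videos = [
    {
        "id": "v1",
        "title": "Intro to Graphs",
        "abstract": "",
        "slides": [(12.0, "Breadth first search"), (95.5, "Depth first search")],
    }
]
index = build_index(videos)
```

With this layout, a query such as `search(index, "depth search")` returns the time offset of the matching slide, so a player could seek directly to that point in the video.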
The automatic identification of distinct slides within the video stream presents several challenges. For example, picture-in-picture compositing of a speaker and a presentation slide, camera switches, and slide builds confuse basic algorithms for extracting keyframe slide images. Enhanced algorithms that improve slide identification are described.
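For orientation, a basic keyframe extractor of the kind that these conditions confuse might look like the sketch below. It is a generic frame-differencing baseline, not the enhanced algorithms referred to above. Frames are assumed to be grayscale images given as rows of 0-255 values, and the function names and thresholds are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale image: rows of 0-255 pixel values


def changed_fraction(a: Frame, b: Frame, pixel_delta: int = 30) -> float:
    """Fraction of pixels whose intensity differs by more than pixel_delta."""
    total = 0
    changed = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def extract_keyframes(
    frames: List[Frame],
    change_threshold: float = 0.1,
    stable_frames: int = 2,
) -> List[int]:
    """Return the index of the first frame of each stable segment.

    A segment starts at frame 0 or at any frame that differs from its
    predecessor by more than change_threshold. It is reported once
    stable_frames consecutive frames have passed without another change.
    """
    keyframes: List[int] = []
    if not frames:
        return keyframes
    candidate = 0    # first frame of the current segment
    stable_run = 0   # unchanged frames seen since the candidate
    emitted = False  # whether the current segment was already reported
    for i in range(1, len(frames)):
        if changed_fraction(frames[i - 1], frames[i]) > change_threshold:
            candidate = i
            stable_run = 0
            emitted = False
        else:
            stable_run += 1
            if stable_run >= stable_frames and not emitted:
                keyframes.append(candidate)
                emitted = True
    return keyframes


# Two synthetic 4x4 "slides": all black and all white.
slide_a = [[0] * 4 for _ in range(4)]
slide_b = [[255] * 4 for _ in range(4)]
```

Under these assumptions, a brief cut away from a slide and back makes the baseline report the same slide twice, and a slide build that changes only a small region may fall below the threshold and be missed.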
To test the algorithms and the utility of the search engine, a public system was deployed at www.talkminer.com. To date, over 17,000 lecture videos have been indexed from a variety of public sources.