TalkMiner

TalkMiner aggregates web-distributed lecture video for advanced text-based search.

TalkMiner is a system for aggregating and processing web-based lecture video to provide a slide-based visual index and text-based search of the content within each presentation.

TalkMiner is offered online as a public demo at talkminer.fxpal.com. Lecture webcasts are readily available on the Internet, including class lectures, research seminars, and product demonstrations. These webcasts often combine presentation slides with either a synchronized audio stream (i.e., a podcast) or an audio/video stream.

Conventional web search engines will retrieve this content if you include “webcast” or “lecture” among your search terms, or if you search on a website that specifically organizes lecture content. But users, particularly students, want to find the point in a lecture where an instructor covers a specific topic. Answering such queries requires a search engine that can look inside the webcast itself for the relevant keywords.
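To make "searching within the webcast" concrete, consider a toy inverted index that maps each slide word to the talks and time codes where it appears, so a query can jump straight to the relevant moment. The index layout and sample talks below are purely illustrative, not TalkMiner's actual schema:

```python
# Toy within-lecture search: index each slide's OCR text under its time code.
# Talk IDs, slide text, and the index layout are hypothetical examples.
from collections import defaultdict

def build_index(talks):
    """talks: {talk_id: [(time_seconds, slide_text), ...]} -> inverted index."""
    index = defaultdict(list)
    for talk_id, slides in talks.items():
        for t, text in slides:
            for word in set(text.lower().split()):
                index[word].append((talk_id, t))
    return index

def search(index, query):
    """Return (talk_id, time) pairs where the query word appears on a slide."""
    return index.get(query.lower(), [])

talks = {
    "cs101-lec2": [(0, "Sorting algorithms"), (540, "Quicksort partitioning")],
    "cs101-lec3": [(120, "Hash tables and quicksort comparison")],
}
idx = build_index(talks)
```

A query such as `search(idx, "quicksort")` returns every (talk, time) pair whose slide text contains the word, which is exactly the kind of within-lecture lookup a page-level web search cannot answer.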

TalkMiner processes RSS feeds from various sites to collect lecture videos. TalkMiner does not maintain a copy of the original videos. Instead, processing generates metadata about each talk, including the video frames that contain slides, their time codes, and the text recovered from those frames by OCR. When a user plays a lecture, the video streams from the original website hosting the webcast, so the system's storage requirements are modest. TalkMiner analyzes web videos to identify unique slide images and builds a search index from the words on those presentation slides. Each talk can also be browsed by its captured slide images for efficient non-linear playback, and visual cues indicate which slides contain the search terms within relevant talks.
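A minimal sketch of this metadata-only ingestion step, assuming a standard RSS 2.0 feed in which each `<item>` carries the talk title and the hosted video URL. The feed contents and record fields here are illustrative, not TalkMiner's actual schema:

```python
# Sketch of metadata-only ingestion: parse an RSS feed and record each
# talk's title and hosted video URL, without ever copying the video itself.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Intro to Information Retrieval</title>
    <link>https://example.edu/lectures/ir-intro.mp4</link>
  </item>
  <item>
    <title>Graph Algorithms, Lecture 3</title>
    <link>https://example.edu/lectures/graphs-3.mp4</link>
  </item>
</channel></rss>"""

def collect_talk_metadata(feed_xml):
    """Return one metadata record per feed item; the video stays on the host site."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title"),
            "video_url": item.findtext("link"),  # played back from the original site
            "slides": [],  # filled in later by slide detection + OCR
        })
    return records

talks = collect_talk_metadata(SAMPLE_FEED)
```

Because only these small records are stored, playback is always served from the original hosting site, which is what keeps storage requirements modest.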

A slide detection algorithm was developed to handle common video production techniques and correctly identify slides. Such techniques include shooting the slide screen from the back of the room, picture-in-picture compositing, and multi-camera productions that intersperse shots of slides with shots of the speaker. We also developed specialized processing for slides with progressive content, e.g., bulleted lists that are revealed gradually. Because the detected slide images underlie both the search index and the browsing interface, their automatic detection is a critical component of the system design.
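The basic idea behind keyframe slide extraction can be sketched as frame differencing with a stability window: a frame is accepted as a new slide only if it differs strongly from the last accepted slide and then stays unchanged for several consecutive frames, which filters out camera cuts to the speaker and mid-build transitions. The thresholds below are made-up parameters for illustration, not the tuned values from the deployed system:

```python
# Illustrative keyframe slide detection via frame differencing.
# Frames are reduced to small grayscale grids (flat lists of pixel values);
# change_thresh and stable_frames are hypothetical parameters.

def frame_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def detect_slides(frames, change_thresh=10.0, stable_frames=3):
    """Return indices of frames accepted as new, distinct slides.

    A frame becomes a candidate when it differs strongly from the last
    accepted slide; it is accepted once that content persists for
    `stable_frames` consecutive frames, filtering brief camera cuts
    and mid-build transitions.
    """
    slides = []
    last_slide = None
    candidate = None
    stable = 0
    for i, f in enumerate(frames):
        is_new = last_slide is None or frame_diff(f, last_slide) > change_thresh
        if is_new:
            if candidate is not None and frame_diff(f, candidate[1]) <= change_thresh:
                stable += 1  # same new content persisting
            else:
                candidate, stable = (i, f), 1  # content changed again; restart
            if stable >= stable_frames:
                slides.append(candidate[0])
                last_slide = candidate[1]
                candidate, stable = None, 0
        else:
            candidate, stable = None, 0  # back to the current slide
    return slides
```

A real detector must additionally locate the slide region within the frame (for back-of-room shots and picture-in-picture layouts) and merge progressive builds of the same slide; this sketch shows only the temporal filtering step.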

TalkMiner builds its index and interface from ordinarily recorded video, without dedicated lecture-capture systems, careful post-capture authoring, or onerous constraints on capture style. As a result, the system can scale to a greater volume and variety of content at a much lower cost than would otherwise be possible. The system is detailed in our 2010 ACM Multimedia paper.

Related Publications

2014

Multi-modal Language Models for Lecture Video Retrieval

Publication Details
  • ACM Multimedia 2014
  • Nov 2, 2014

Abstract

We propose Multi-modal Language Models (MLMs), which adapt latent variable models for text document analysis to modeling co-occurrence relationships in multi-modal data. In this paper, we focus on the application of MLMs to indexing slide and spoken text associated with lecture videos, and subsequently employ a multi-modal probabilistic ranking function for lecture video retrieval. The MLM achieves highly competitive results against well established retrieval methods such as the Vector Space Model and Probabilistic Latent Semantic Analysis. Retrieval performance with MLMs is also shown to improve with the quality of the available extracted spoken text.
2013

Publication Details
  • SPIE Electronic Imaging 2013
  • Feb 3, 2013

Abstract

Video is becoming a prevalent medium for e-learning. Lecture videos contain useful information in both the visual and aural channels: the presentation slides and lecturer's speech respectively. To extract the visual information, we apply video content analysis to detect slides and optical character recognition (OCR) to obtain their text. Automatic speech recognition (ASR) is used similarly to extract spoken text from the recorded audio. These two text sources have distinct characteristics and relative strengths for video retrieval. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Experiments reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Additional experiments demonstrate higher precision video retrieval using automatically extracted slide text.  
2012

TalkMiner: A Lecture Video Search Engine

Publication Details
  • Fuji Xerox Technical Report, No. 21, 2012, pp. 118-128
  • Feb 3, 2012

Abstract

The design and implementation of a search engine for lecture webcasts is described. A searchable text index is created allowing users to locate material within lecture videos found on a variety of websites such as YouTube and Berkeley webcasts. The searchable index is built from the text of presentation slides appearing in the video along with other associated metadata such as the title and abstract when available. The automatic identification of distinct slides within the video stream presents several challenges. For example, picture-in-picture compositing of a speaker and a presentation slide, switching cameras, and slide builds confuse basic algorithms for extracting keyframe slide images. Enhanced algorithms are described that improve slide identification. A public system was deployed to test the algorithms and the utility of the search engine at www.talkminer.com. To date, over 17,000 lecture videos have been indexed from a variety of public sources.
2010

TalkMiner: A Search Engine for Online Lecture Video

Publication Details
  • ACM Multimedia 2010 - Industrial Exhibits
  • Oct 25, 2010

Abstract

TalkMiner is a search engine for lecture webcasts. Lecture videos are processed to recover a set of distinct slide images and OCR is used to generate a list of indexable terms from the slides. On our prototype system, users can search and browse lists of lectures, slides in a specific lecture, and play the lecture video. Over 10,000 lecture videos have been indexed from a variety of sources. A public website now allows users to experiment with the search engine.

TalkMiner: A Lecture Webcast Search Engine

Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

The design and implementation of a search engine for lecture webcasts is described. A searchable text index is created allowing users to locate material within lecture videos found on a variety of websites such as YouTube and Berkeley webcasts. The index is created from words on the presentation slides appearing in the video along with any associated metadata such as the title and abstract when available. The video is analyzed to identify a set of distinct slide images, to which OCR and lexical processes are applied which in turn generate a list of indexable terms. Several problems were discovered when trying to identify distinct slides in the video stream. For example, picture-in-picture compositing of a speaker and a presentation slide, switching cameras, and slide builds confuse basic frame-differencing algorithms for extracting keyframe slide images. Algorithms are described that improve slide identification. A prototype system was built to test the algorithms and the utility of the search engine. Users can browse lists of lectures, slides in a specific lecture, or play the lecture video. Over 10,000 lecture videos have been indexed from a variety of sources. A public website will be published in mid 2010 that allows users to experiment with the search engine.