Multi-modal Language Models for Lecture Video Retrieval


We propose Multi-modal Language Models (MLMs), which adapt latent variable models for text document analysis to model co-occurrence relationships in multi-modal data. In this paper, we focus on applying MLMs to index the slide text and spoken text associated with lecture videos, and then employ a multi-modal probabilistic ranking function for lecture video retrieval. The MLM achieves results that are highly competitive with well-established retrieval methods such as the Vector Space Model and Probabilistic Latent Semantic Analysis. Retrieval performance with MLMs is also shown to improve with the quality of the extracted spoken text.