Harnessing Popularity in Social Media for Extractive Summarization of Online Conversations

Abstract

We leverage a popularity measure in social media as a distant label for extractive summarization of online conversations. In social media, users can vote for, share, or bookmark posts they like, and the number of such actions is regarded as a measure of popularity. However, popularity is not determined solely by the content of a post (e.g., its text or images); it is also strongly affected by context, such as the timing of the post and the authority of its author. We propose a disjunctive model that computes the contributions of content and context separately. For evaluation, we build a dataset in which the informativeness of each comment is annotated. We evaluate the results with ranking metrics and show that our model outperforms the baseline model, which directly uses popularity as a measure of informativeness.
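The idea of separating content from context can be illustrated with a toy additive model. This is a minimal sketch under our own assumptions, not the architecture proposed here: log(1 + popularity) is modeled as a linear content term plus a linear context term plus a bias, fit by full-batch gradient descent. The `Comment` type, the feature values, and the function names are hypothetical. After fitting, comments are ranked by the content term alone, so context-driven popularity does not count as informativeness.

```python
import math
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    content: list[float]  # hypothetical features of the comment text itself
    context: list[float]  # hypothetical features such as posting time or author status


def dot(w: list[float], x: list[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def fit_disjunctive(comments, popularity, lr=0.1, epochs=5000):
    """Fit log(1 + popularity) ~ content_score + context_score + bias
    by full-batch gradient descent on mean squared error (an assumed setup)."""
    w_content = [0.0] * len(comments[0].content)
    w_context = [0.0] * len(comments[0].context)
    bias = 0.0
    targets = [math.log1p(p) for p in popularity]  # damp heavy-tailed vote counts
    n = len(comments)
    for _ in range(epochs):
        g_content = [0.0] * len(w_content)
        g_context = [0.0] * len(w_context)
        g_bias = 0.0
        for c, y in zip(comments, targets):
            err = dot(w_content, c.content) + dot(w_context, c.context) + bias - y
            for i, v in enumerate(c.content):
                g_content[i] += 2 * err * v / n
            for i, v in enumerate(c.context):
                g_context[i] += 2 * err * v / n
            g_bias += 2 * err / n
        # Simultaneous update: all gradients were computed with the old weights.
        w_content = [w - lr * g for w, g in zip(w_content, g_content)]
        w_context = [w - lr * g for w, g in zip(w_context, g_context)]
        bias -= lr * g_bias
    return w_content, w_context, bias


def rank_by_content(comments, w_content):
    """Rank comments by the content term only, ignoring context."""
    return sorted(comments, key=lambda c: dot(w_content, c.content), reverse=True)


comments = [
    Comment("A", content=[1.0], context=[0.0]),
    Comment("B", content=[0.0], context=[1.0]),
    Comment("C", content=[0.5], context=[0.0]),
    Comment("D", content=[0.0], context=[0.0]),
]
# Toy votes generated so that log(1 + votes) = 1 * content + 2 * context.
votes = [math.expm1(1.0), math.expm1(2.0), math.expm1(0.5), math.expm1(0.0)]
w_content, w_context, bias = fit_disjunctive(comments, votes)
print([c.text for c in rank_by_content(comments, w_content)])  # ['A', 'C', 'B', 'D']
```

In this toy data, comment B has the most votes only because of its context feature, so ranking by raw popularity would place it first, whereas ranking by the fitted content term places A first.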