Learning to Disentangle Interleaved Conversational Threads with a Siamese Hierarchical Network and Similarity Ranking

Abstract

An enormous amount of conversation occurs online every day, including on chat platforms where multiple conversations may take place concurrently.
Interleaved conversations make it difficult not only to follow discussions but also to retrieve relevant information from simultaneous messages.
Conversation disentanglement aims to separate intermingled messages into their individual conversations.

In this paper, we propose to leverage representation learning for conversation disentanglement. A Siamese Hierarchical Convolutional Neural Network (SHCNN), which integrates local and more global representations of a message, is first presented to estimate the conversation-level similarity between closely posted messages. With the estimated similarity scores, our algorithm for Conversation Identification by SImilarity Ranking (CISIR) then derives conversations based on high-confidence message pairs and pairwise redundancy.
Experiments were conducted on four publicly available datasets of conversations from Reddit and IRC channels. The experimental results show that our approach significantly outperforms the compared baselines in both pairwise similarity estimation and conversation disentanglement.
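
To illustrate the general idea of deriving conversations from pairwise similarity scores, the following is a minimal sketch and not the CISIR algorithm itself. It assumes the similarity scores are already available (for example, from a trained pairwise model), links every message pair whose score meets a threshold, and treats each connected group as one conversation. The function name `group_by_similarity`, the input layout, and the default threshold of 0.5 are hypothetical choices made for this example, and the sketch does not model pairwise redundancy.

```python
from collections import defaultdict


def group_by_similarity(num_messages, pair_scores, threshold=0.5):
    """Group message indices into conversations.

    num_messages: total number of messages, indexed 0..num_messages-1.
    pair_scores:  dict mapping (i, j) index pairs to a similarity score.
    threshold:    hypothetical cutoff; pairs scoring at or above it are linked.
    """
    # Union-find structure: each message starts in its own group.
    parent = list(range(num_messages))

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair that is confidently similar.
    for (i, j), score in pair_scores.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[max(root_i, root_j)] = min(root_i, root_j)

    # Collect messages by group root.
    groups = defaultdict(list)
    for message in range(num_messages):
        groups[find(message)].append(message)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy scores for five messages; the values are made up for illustration.
    scores = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8, (0, 3): 0.1, (1, 4): 0.7}
    print(group_by_similarity(5, scores))
```

Linking by a single threshold is the simplest possible choice; a ranking-based procedure such as the one proposed here would instead order pairs by confidence before committing to a grouping.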