Tackling Challenges of Neural Purchase Stage Identification from Imbalanced Twitter Data


Twitter and other social media platforms are often used to share interest in products. Identifying purchase decision stages, such as those of the AIDA model (Awareness, Interest, Desire, Action), can enable more personalized e-commerce services and finer-grained ad targeting than predicting purchase intent alone.
In this paper, we propose and analyze neural models for identifying the purchase stage of single tweets in a user’s tweet sequence. In particular, we identify three challenges of purchase stage identification: an imbalanced label distribution with a high number of negative instances, a limited amount of training data, and domain adaptation with little or no target-domain data.

Our experiments reveal that the imbalanced label distribution is the main challenge for our models. We address it with a ranking loss and investigate in detail how our models perform on the different output classes. To improve the generalization of the models and augment the limited training data, we examine sentiment analysis as a complementary secondary task in a multitask framework. To apply our models to tweets from another product domain, we consider two scenarios. In the first scenario, with no labeled data in the target product domain, we show that learning domain-invariant representations with adversarial training is most promising; in the second, with a small number of labeled target examples, finetuning the source model weights performs best.
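The abstract names a ranking loss as the remedy for the many negative instances but does not give its form. The sketch below assumes one common margin-based variant in which only the positive classes (here the four AIDA stages) receive scores and the negative class has no score of its own: a tweet is predicted as negative when no stage scores above zero. The function names `ranking_loss` and `predict`, and the hyperparameters `gamma`, `m_pos` and `m_neg` with their values, are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

def ranking_loss(scores, gold, gamma=2.0, m_pos=2.5, m_neg=0.5):
    """Margin-based ranking loss for one tweet (hypothetical, illustrative sketch).

    scores: one score per positive class (e.g. the four AIDA stages);
            the negative class deliberately has no score of its own.
    gold:   index of the gold stage, or None for a negative instance.
    """
    scores = np.asarray(scores, dtype=float)
    loss = 0.0
    if gold is not None:
        # softplus(gamma * (m_pos - s_gold)): push the gold score above m_pos.
        loss += np.logaddexp(0.0, gamma * (m_pos - scores[gold]))
        competitors = np.delete(scores, gold)
    else:
        # Negative instance: every stage score is a competitor and should stay low.
        competitors = scores
    if competitors.size > 0:
        # Penalize only the hardest (highest-scoring) competitor, pushing it below -m_neg.
        loss += np.logaddexp(0.0, gamma * (m_neg + competitors.max()))
    return float(loss)

def predict(scores):
    """Return the best stage index, or None (negative class) if no score is positive."""
    scores = np.asarray(scores, dtype=float)
    return int(scores.argmax()) if scores.max() > 0 else None

stage_scores = [3.0, -1.0, -2.0, -1.5]  # Awareness, Interest, Desire, Action
print(predict(stage_scores))              # 0 (Awareness)
print(predict([-0.5, -1.0, -2.0, -1.0]))  # None (negative class)
print(ranking_loss(stage_scores, gold=0) < ranking_loss(stage_scores, gold=1))  # True
```

One plausible reason for leaving the negative class unscored is that "no purchase stage" is a heterogeneous catch-all. Under this assumption, the model never has to learn a prototype for it, and the large number of negatives only teaches it to keep every stage score low.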

Finally, we conduct several analyses, including extracting attention weights and representative phrases for the different purchase stages. The results suggest that the model learns features indicative of purchase stages and that its confusion errors are sensible.
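The abstract does not say how representative phrases are derived from the attention weights. One simple way, assumed here purely for illustration, is to average the attention weight each token receives across all tweets of a given stage and keep the top-ranked tokens. For brevity, the sketch works on single tokens rather than multi-word phrases. The function `representative_tokens`, its `min_count` filter and the toy tweets and weights are all hypothetical, not data or code from the paper.

```python
from collections import defaultdict

def representative_tokens(examples, k=3, min_count=1):
    """Rank tokens per purchase stage by mean attention weight (hypothetical sketch).

    examples: iterable of (stage, tokens, weights) triples, where weights are the
    attention weights a model assigned to the tokens of one tweet.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for stage, tokens, weights in examples:
        for token, weight in zip(tokens, weights):
            totals[stage][token] += weight
            counts[stage][token] += 1
    result = {}
    for stage, token_totals in totals.items():
        # Mean weight per token; drop tokens seen fewer than min_count times.
        means = {t: s / counts[stage][t] for t, s in token_totals.items()
                 if counts[stage][t] >= min_count}
        # Highest mean first; ties broken alphabetically for determinism.
        result[stage] = sorted(means, key=lambda t: (-means[t], t))[:k]
    return result

# Toy tweets with made-up attention weights (illustrative, not from the paper).
examples = [
    ("Desire", ["i", "want", "the", "new", "phone"], [0.05, 0.6, 0.05, 0.1, 0.2]),
    ("Desire", ["want", "it", "so", "bad"], [0.5, 0.1, 0.1, 0.3]),
    ("Action", ["just", "bought", "a", "phone"], [0.1, 0.7, 0.05, 0.15]),
]
print(representative_tokens(examples, k=2))
# {'Desire': ['want', 'bad'], 'Action': ['bought', 'phone']}
```

Averaging rather than summing keeps frequent function words from dominating. It also lets rare tokens rank highly, which is why a `min_count` threshold is exposed.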