Many people post about their daily life on social media. These posts may include information about the purchase activity of people, and insights useful to companies can be derived from them: e.g. profile information of a user who mentioned something about their product. As a further advanced analysis, we consider extracting users who are likely to buy a product from the set of users who mentioned that the product is attractive.
In this paper, we report our methodology for building a corpus for Twitter user purchase behavior prediction. First, we collected Twitter users who posted a want phrase + product name: e.g. “want a Xperia” as candidate want users, and also candidate bought users in the same way. Then, we asked an annotator to judge whether a candidate user actually bought a product. We also annotated whether tweets randomly sampled from want/bought user timelines are relevant or not to purchase. In this annotation, 58% of want user tweets and 35% of bought user tweets were annotated as relevant. Our data indicate that information embedded in timeline tweets can be used to predict purchase behavior of tweeted products.