Times Change and Your Training Data Should Too: The Effect of Training Data Recency on Twitter Classifiers

Sophisticated adversaries are moving their botnet command and control infrastructure to social media microblogging sites such as Twitter. As security practitioners work to identify new methods for detecting and disrupting such botnets, including machine-learning approaches, we must better understand how the recency of training data affects classifier performance. This research investigates how well several binary classifiers distinguish between non-verified and verified tweets as the offset between the age of the training data and the age of the test data changes.
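
The experimental idea in the abstract, holding the test data fixed and varying how old the training data is, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names `fit_token_votes` and `accuracy_by_offset`, the integer period indexing, the toy tweets, and the token-voting classifier itself, which merely stands in for the paper's classifiers (the abstract does not name them).

```python
from collections import Counter, defaultdict


def fit_token_votes(samples):
    """Hypothetical toy classifier, not the paper's method.

    Each token remembers the label it co-occurred with most often in
    training; at prediction time the known tokens vote.
    """
    token_labels = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        for token in text.lower().split():
            token_labels[token][label] += 1
    # Fall back to the most frequent training label when no token is known.
    default = label_counts.most_common(1)[0][0]

    def predict(text):
        votes = Counter()
        for token in text.lower().split():
            if token in token_labels:
                votes[token_labels[token].most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0] if votes else default

    return predict


def accuracy_by_offset(data_by_period, test_period, offsets, fit=fit_token_votes):
    """Train on period (test_period - offset) and test on test_period.

    Offsets should be >= 1 in this sketch; an offset of 0 would train and
    test on the same samples.
    """
    test = data_by_period[test_period]
    results = {}
    for offset in offsets:
        predict = fit(data_by_period[test_period - offset])
        correct = sum(predict(text) == label for text, label in test)
        results[offset] = correct / len(test)
    return results


# Made-up data: label 1 = verified, 0 = non-verified; keys are period indices.
toy_data = {
    0: [("old slang giveaway", 0), ("old slang promo", 0),
        ("press release statement", 1)],
    2: [("new meme airdrop", 0), ("new meme promo", 0),
        ("official statement update", 1)],
    3: [("new meme drop", 0), ("official update today", 1)],
}
results = accuracy_by_offset(toy_data, test_period=3, offsets=[1, 3])
print(results)
```

In this toy setup the vocabulary drifts between periods, so the model trained on the older period recognizes fewer tokens in the test tweets. Passing a different `fit` callable would let the same loop compare other classifiers under the same offsets.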

11 Jul 2018

By Ryan O'Grady
