Recent Experiments on Abusive Language Detection


In this talk, I will present two recent projects of mine in the area of abusive language detection. In the first part, I will focus on abusive compounds, a specific subset of compounds that convey abusive content, such as "curry muncher" or "booze hound". I will propose a new unsupervised classification approach that incorporates linguistic properties of compounds and relies mostly on a simple distributional representation. I will compare this approach against previously established methods for extracting abusive unigrams. In the second part, I will address biases found in existing datasets for abusive language detection, focusing on the question of why many previous evaluations on this task reported unrealistically high performance scores. I will present simple experiments on the most popular dataset for the task to show why its particular sampling method is responsible for this behaviour and what can be done to avoid the problem.

