Identifying and Categorizing Offensive Language in Social Media using Sentence Embeddings

Authors: Vijaysaradhi Indurthi, Bakhtiyar Syed, Manish Shrivastava, Manish Gupta, Vasudeva Varma
Conference: 13th International Workshop on Semantic Evaluation (SemEval-2019)
Location: Minneapolis, USA
Date: 2019-06-06
Report no: IIIT/TR/2019/98

Abstract

This paper describes our system (Fermi) for Task 6 of SemEval-2019, OffensEval: Identifying and Categorizing Offensive Language in Social Media. We participated in all three sub-tasks of Task 6. We evaluate multiple sentence embeddings in combination with various supervised machine learning algorithms and assess the performance of these simple yet effective embedding-ML combinations. Our model achieved F1-scores of 64.40%, 62.00% and 62.60% on sub-tasks A, B and C respectively on the official leaderboard. Our model for sub-task C, which uses pretrained ELMo embeddings to transform the input and an SVM (RBF kernel) for training, placed third on the official leaderboard. In this paper we provide a detailed description of the approach, as well as the results obtained for the task.
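The embedding-plus-classifier pattern described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Fermi system itself: `token_vector` and `embed_sentence` are hypothetical stand-ins that replace pretrained ELMo with deterministic pseudo-random vectors, the toy sentences and the `OFF`/`NOT` labels are made up for the example, the SVM hyperparameters are scikit-learn defaults rather than values from the paper, and macro-averaging of the F1-score is an assumption since the abstract does not say which averaging was used.

```python
import zlib

import numpy as np
from sklearn.metrics import f1_score
from sklearn.svm import SVC

DIM = 32  # hypothetical embedding size, chosen only to keep the example small


def token_vector(token, dim=DIM):
    """Stand-in for a pretrained embedding lookup (NOT real ELMo).

    Returns a deterministic pseudo-random vector seeded by the CRC32 of the
    lowercased token, so the same word always maps to the same vector.
    """
    rng = np.random.default_rng(zlib.crc32(token.lower().encode("utf-8")))
    return rng.standard_normal(dim)


def embed_sentence(text, dim=DIM):
    """Mean-pool the token vectors into one fixed-size sentence embedding."""
    tokens = text.split()
    if not tokens:
        return np.zeros(dim)
    return np.mean([token_vector(t, dim) for t in tokens], axis=0)


# Toy training data, invented for illustration only.
train_texts = [
    "you are a total idiot",
    "what a stupid take",
    "nobody wants you here idiot",
    "have a lovely day",
    "thanks for sharing this",
    "what a lovely photo",
]
train_labels = ["OFF", "OFF", "OFF", "NOT", "NOT", "NOT"]

# Transform the input text into embeddings, then train an RBF-kernel SVM.
X_train = np.vstack([embed_sentence(t) for t in train_texts])
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # library defaults, not tuned
clf.fit(X_train, train_labels)

# Score held-out toy examples (averaging scheme is an assumption).
test_texts = ["such a stupid idiot", "thanks have a lovely day"]
test_labels = ["OFF", "NOT"]
X_test = np.vstack([embed_sentence(t) for t in test_texts])
predictions = clf.predict(X_test)
macro_f1 = f1_score(test_labels, predictions, average="macro")
print(predictions, macro_f1)
```

To try a different embedding, only `embed_sentence` would need to change, provided it still returns one fixed-size vector per sentence. The classifier and the scoring step are independent of how the vectors are produced.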


Centre for Search and Information Extraction Lab

IIIT Hyderabad Publications
