IIIT Hyderabad Publications
Emotion Unmasked: A Transformer-based Analysis of Lyrics for Improved Emotion Recognition

Author: R Guru Ravi Shanker (2018114011)
Date: 2023-06-16
Report no: IIIT/TH/2023/66
Advisor: Vinoo Alluri

Abstract

Music is an important art form that has been present for centuries. Lyrics are a crucial part of a song, conveying thoughts, messages and a wide range of emotions, and individuals listen to songs to satisfy their emotional needs. Identifying emotions from a given music track has been an active pursuit in the Music Information Retrieval (MIR) community for years. Music emotion recognition has typically relied on acoustic features, social tags and other metadata to identify and classify music emotions, and the role of lyrics remains under-appreciated despite several studies reporting superior performance of music emotion classifiers based on features extracted from lyrics. In the first study, we use a transformer-based approach with XLNet as the base architecture, which has not previously been applied to identifying the emotional connotations of music from lyrics. Our proposed approach outperforms existing methods on multiple datasets. We also employ a robust methodology to improve the accuracy of web crawlers used to extract lyrics.

No existing dataset of Indian-language songs contains manual valence and arousal ratings of lyrics. We present a new manually annotated dataset of Telugu song lyrics collected from Spotify, with valence and arousal annotated on a discrete scale; a fairly high inter-annotator agreement was observed for both dimensions. Subsequently, we create two music emotion recognition models using two classification techniques to identify valence, arousal and the corresponding emotion quadrant from lyrics: a support vector machine (SVM) with term frequency-inverse document frequency (TF-IDF) features, and a fine-tuned pre-trained XLM-RoBERTa (XLM-R) model. The fine-tuned XLM-RoBERTa model outperforms the SVM, improving macro-averaged F1-scores from 54.69%, 67.61% and 34.13% to 77.90%, 80.71% and 58.33% for valence, arousal and quadrant classification, respectively, under 10-fold cross-validation. In addition, we compare our lyrics annotations with Spotify's annotations of valence and energy (equivalent to arousal), which are based on entire music tracks, and we compare the performance of the emotion recognition models on the original, translated and transliterated texts. The implications of our findings are discussed. We make the dataset publicly available with lyrics, annotations and Spotify IDs.

We also use the XLM-R model to identify emotional connotations in Hindi and Telugu song-lyrics datasets and conduct a perceptual validation study on misclassified lyrics. Lastly, we conduct a study to understand individual differences in preferences for music with and without lyrics.

Full thesis: pdf

Centre for Cognitive Science
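To make the SVM baseline described in the abstract concrete, the following minimal Python sketch pairs TF-IDF features with a linear SVM and scores it with macro-averaged F1 under 10-fold cross-validation. The toy lyric strings, binary valence labels and variable names are illustrative placeholders, not the thesis code or the released Telugu dataset.

    # Minimal sketch of the lyric-emotion baseline named in the abstract:
    # TF-IDF features + linear SVM, scored with macro-averaged F1 under
    # 10-fold cross-validation. The data below is a toy placeholder, not
    # the annotated Telugu dataset released with the thesis.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Hypothetical corpus: 10 high-valence and 10 low-valence lyric lines.
    lyrics = [f"joyful dancing in the rain {i}" for i in range(10)] + \
             [f"lonely tears in the dark night {i}" for i in range(10)]
    labels = [1] * 10 + [0] * 10  # 1 = high valence, 0 = low valence

    baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(baseline, lyrics, labels, cv=cv, scoring="f1_macro")
    print(f"macro-averaged F1 over 10 folds: {scores.mean():.4f}")

The XLM-R model mentioned above would instead be fine-tuned as a standard multilingual sequence classifier on the same valence, arousal and quadrant labels; the sketch only illustrates the classical feature-based baseline it is compared against.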