IIIT Hyderabad Publications
Towards NLP in Climate Change

Author: Roopal Vaid 201502023
Date: 2023-06-23
Report no: IIIT/TH/2023/73
Advisor: Manish Shrivastava

Abstract

Climate change is one of the most pressing issues of our time, and understanding the discourse surrounding it is crucial for effective communication and action. This discourse encompasses a wide range of perspectives, attitudes, and opinions. Analyzing it is essential for identifying current challenges, road-maps, and the systematic changes that governments, organizations, and institutions need to make to combat the effects of climate change. Social media is an important platform for climate change discourse because of its widespread use and real-time nature. This makes it possible to analyze the discourse in near real-time, providing valuable insights into public opinions, attitudes, and concerns surrounding climate change, as well as into topic framing and event-dependent attention to the issue. We evaluate the contextual and social features that play a key role in the coverage on different platforms. In this thesis, we focus on the fine-grained classification and stance detection of climate change-related social media text surrounding the United Nations Climate Change Conference. We establish two corpora, ClimateStance and ClimateEng, from tweets posted during the 2019 United Nations Framework Convention on Climate Change with the Intergovernmental Panel in Geneva. We comprehensively outline the dataset collection, pre-processing, annotation methodology, and dataset composition. We provide a set of guidelines and specifications for creating the expandable corpora ClimateEng and ClimateStance, a collection of 3777 tweets that have been manually labeled with information about events, states, the categories they belong to, and their corresponding stance.
We benchmark both datasets for climate change prevention stance detection and fine-grained classification using state-of-the-art text classification methods, and we discuss the experiments and results in detail. In addition, we create a dataset called ClimateReddit, which is based on Reddit and includes 6262 comments from climate change-related subreddits. We perform semi-supervised learning on the corpus with pseudo-labelling and manually annotate 329 comments for the tasks of fine-grained classification and stance detection of climate change data. We compare the results with the best-performing models for both tasks from the supervised experiments. Finally, we provide a linguistic analysis of ClimateEng, ClimateStance, and ClimateReddit using techniques such as part-of-speech tagging and named-entity recognition. Further, we extend our work to the code-mixed setting. We collect Hindi-English code-mixed data from Twitter during 2020 and construct a corpus from it. We define the task of fine-grained classification for this corpus and outline the data collection and annotation methodology for code-mixed data.

Full thesis: pdf

Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.