IIIT Hyderabad Publications
Evaluation of Transformer Models on Summarization and Beyond

Author: Pawan Sasanka Ammanamanchi (2020701028)
Date: 2024-06-27
Report no: IIIT/TH/2024/135
Advisor: Manish Shrivastava

Abstract

Owing to their self-attention architecture, ease of training, and high parallelization capabilities, transformer models have taken the field of deep learning by storm. They have been applied in natural language processing, computer vision, speech recognition, protein folding, reinforcement learning, and other areas where deep learning intersects with subfields of artificial intelligence and beyond.

Deep learning models in natural language processing, especially transformers, have given rise to skillful, well-performing language models. Recently, one such application, ChatGPT, reached 100 million users within three months of launch thanks to its user-friendly interface and its wide-ranging task-solving capabilities through prompting, finetuning, and other techniques.

We aim to investigate the capabilities of transformer language models and understand the nuances of evaluating them. We focus on applying a pre-trained language model to the summarization task and extending its capabilities through finetuning. These models can have their abilities augmented and extended to solve tasks in various domains.

We then look at developing a generation benchmark. Building a benchmark gives us insight into where the field stands, where it is heading, and the choices that must be made to meet the needs of a fast-moving, ever-broadening natural language processing landscape while remaining challenging enough to offer genuine insight into the nature of language models.

Next, we turn our attention to large language models (LLMs) to appreciate the capabilities of transformer language models and their scalability. We describe the development of BLOOM, a 176B-parameter multilingual language model trained on 46 languages, and evaluate its 0-shot and 1-shot performance on a variety of tasks to understand what one can expect from a massively trained large language model.

Full thesis: pdf

Centre for Language Technologies Research Centre