IIIT Hyderabad Publications
Deciphering Beyond the View: A Brain Decoding Approach to Language Processing Tasks

Author: Jashn Arora
Date: 2023-05-05
Report no: IIIT/TH/2023/25
Advisor: Bapi Raju Surampudi

Abstract

Brain decoding involves the reconstruction of stimuli from brain recordings. These recordings can be obtained by presenting stimuli to a subject in various forms, such as text, image, and speech. Despite extensive research on brain decoding, important questions remain unanswered. Can we develop multi-view decoders capable of decoding concepts from brain recordings of any view, be it a picture, a sentence, or a word cloud? Can we build a system that uses brain recordings to automatically generate descriptions of what a subject is viewing, as keywords or sentences? How about a system that automatically extracts important keywords from sentences a subject is reading? Answering these questions requires new approaches to brain decoding: previous efforts have focused only on single-view analysis and therefore cannot support such systems.

As a first step toward building such systems, and inspired by the Natural Language Processing literature on multilingual and cross-lingual modelling, this thesis proposes three novel brain decoding setups: (1) Multi-view Decoding (MVD), (2) Cross-view Decoding (CVD), and (3) Abstract vs. Concrete Decoding. In MVD, the goal is to build a multi-view decoder that takes brain recordings for any view as input and predicts the concept. In CVD, the goal is to train a model that takes brain recordings for one view as input and decodes a semantic vector representation of another view; specifically, the thesis studies practically useful CVD tasks such as image captioning, image tagging, keyword extraction, and sentence formation. In Abstract vs. Concrete Decoding, the goal is to train a decoder on concrete concepts and test it on both abstract and concrete concepts, and similarly to train a decoder on abstract concepts and test it on both types of concepts.

Extensive experiments yield MVD models with ∼0.68 average pairwise accuracy across view pairs and CVD models with ∼0.8 average pairwise accuracy across tasks. The decoder trained on concrete concepts was found to decode both abstract and concrete concepts with high accuracy, outperforming the model trained on abstract concepts. Analysis of the contribution of different brain networks reveals exciting cognitive insights: (1) models trained on the picture or sentence view of stimuli are better multi-view decoders than a model trained on the word cloud view; (2) an extensive analysis across 9 broad brain regions, 11 language sub-regions, and 16 visual sub-regions of the brain localizes, for the first time, the parts of the brain involved in cross-view tasks such as image captioning, image tagging, sentence formation, and keyword extraction; (3) the visual brain network is very important for processing Word+Picture stimuli for concrete concepts, but surprisingly not for abstract concepts, where voxels from the language and default mode (DMN) networks are more active.

Full thesis: pdf

Centre for Cognitive Science