IIIT Hyderabad Publications
Pseudowords: Generatucing, Evaluadating, and their Impactfluence

Author: Mukund Choudhary
Date: 2023-06-28
Report no: IIIT/TH/2023/81
Advisor: Bapi Raju Surampudi

Abstract

Pseudowords are a part of a language that cannot be translated into another: they have no meaning attached to them, yet they are constrained to sound like phonologically valid sequences under the desired language's native phonotactics. This thesis explores automated, language-agnostic pseudoword generation, the evaluation of pseudowords, and their use outside psycholinguistics research and clinical settings. As the thesis progresses, we highlight current research, draw inspiration from closely related topics of study, build a pipeline to generate pseudowords, and generate Hindi and English pseudoword candidates for further experimentation. We make this reusable pipeline available in a public repository as one of the deliverables of this work.

We then show that current evaluation work in this field is very scarce, and put together an evaluation framework with reproducible details on how to design and analyse a human-in-the-loop experiment for something as tricky as pseudoword judgement, conducted with lay native speakers. After showing various ways to probe a pseudoword set for quality, we compare our results against past sets in English and present observations summarising how comparable they are. As there is no Hindi pseudoword dataset yet, we add psycholinguistic features on top of the evaluation-metric results for each Hindi pseudoword and release "Soodkosh", another fully public resource usable for research.

Finally, we conduct two separate studies involving pseudowords to show their application, impact, and importance across fields. The first study uses pseudowords to establish a gradient between high-frequency words, low-frequency words, and nonsensical alphanumeric sequences used as passwords. The aim of this study is to find whether the perceived security and memorability of a password or passphrase are correlated, and how strongly. The other part of this chapter explores language models' performance on aphasia classification and whether replacing pseudowords can help them. This is because pseudowords such as neologisms, mispronunciations, and other novel forms produced by aphasic speakers are largely out-of-vocabulary for a standard language model, which is trained on a pile of mostly well-formed and coherent data. As these are not directly helpful to the field of aphasia, this work replaces one possible hurdle to see whether doing so is a feasible solution. However, the results show that pseudowords are passively used as features and cannot be replaced directly.

Full thesis: pdf

Centre for Cognitive Science
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.