Question Answering for Health Forums

Author: Jalan Raksha Sanjay
Date: 2019-07-11
Report no: IIIT/TH/2019/82
Advisor: Vasudeva Varma, Manish Gupta

Abstract

Users increasingly prefer a precise answer to their question over the set of relevant documents that a search engine returns. This preference has led to the immense popularity of Community Question Answering (CQA) forums, where forum users respond directly to questions with targeted answers. Users pose questions to other members of the community and expect domain experts to answer them. Questions asked on CQA forums generally differ in nature from traditional search queries because they reflect specific user needs, which can range from a general topic to a very specific one. Users therefore turn to CQA platforms to express their queries in a more verbose form. Recently, CQA forums have been gaining huge popularity in the health-care domain. The impact of the health-care industry on day-to-day patient care and on biomedical research is immense. Owing to the exponential growth in the number of online information seekers in the health domain, there is a great and growing demand for QA systems that can effectively and efficiently aid consumers in their information search. However, to get better and quicker responses to questions on health forums, it is important to categorize questions and direct them to appropriate experts based on question type. Because CQA forums are so popular, users receive noisy, irrelevant responses along with relevant ones. To improve the quality of CQA forums, it is important to filter out noisy answers and present only relevant ones. In this thesis, we propose a model that classifies questions based on user intent. We also introduce a model that predicts the relevance of answers with respect to the question and, for relevant answers, their polarity with respect to the question. In the first model, we propose a novel approach for classifying questions posted on health forums according to the user intent each one captures.
Our proposed model combines deep-learning-based features with domain knowledge and statistical features. We further add weak supervision to improve performance, and we propose a new weak supervision method called "Self-training with Lookups". Our best-performing question classification model achieves an accuracy of 71.13%, beating the state-of-the-art method by a margin of 3.13%. In the second model, we work towards finding the relevance and polarity of responses to questions. When a question receives thousands of responses, it becomes difficult for the user to read every one, and he or she may miss an important response. The dataset consists of questions and corresponding responses that carry implicit opinions. First, we devise a model to determine whether an answer is relevant to the question. In the next step, the model categorizes the relevant answers by polarity. Our final model takes an ensemble representation of the TF-IDF vector and the Doc2Vec vector as features. A traditional machine learning model, multinomial naive Bayes, outperformed the other models in the ensemble setting.
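The two-step design of the second model (relevance first, then polarity for relevant answers only) can be illustrated with a minimal sketch. This is not the thesis implementation: it uses only TF-IDF weighted term counts with a hand-written multinomial naive Bayes, omits the Doc2Vec half of the ensemble, and looks at the answer text alone rather than the question-answer pair. The class `TfidfNaiveBayes`, the function `classify_answer`, the label names, and the toy answers are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class TfidfNaiveBayes:
    """Toy multinomial naive Bayes over TF-IDF-weighted term counts."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        self.n_docs = len(tokenized)
        # Document frequency: number of training documents containing each term.
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF; always positive, so feature weights stay non-negative.
        self.idf = {t: math.log((1 + self.n_docs) / (1 + c)) + 1.0
                    for t, c in df.items()}
        self.class_counts = Counter(labels)
        # Accumulate TF-IDF mass per (class, term).
        self.term_weight = defaultdict(Counter)
        for toks, label in zip(tokenized, labels):
            for term, tf in Counter(toks).items():
                self.term_weight[label][term] += tf * self.idf[term]
        self.total_weight = {c: sum(self.term_weight[c].values())
                             for c in self.class_counts}
        return self

    def predict(self, doc):
        # Terms unseen during training are ignored.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.n_docs)  # log prior
            denom = self.total_weight[label] + len(self.idf)  # Laplace smoothing
            for term, tf in counts.items():
                likelihood = (self.term_weight[label][term] + 1.0) / denom
                score += tf * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy answers: (text, relevance label, polarity label or None).
TOY_ANSWERS = [
    ("yes ibuprofen helps reduce fever", "relevant", "positive"),
    ("yes rest and fluids help with fever", "relevant", "positive"),
    ("no antibiotics do not help viral fever", "relevant", "negative"),
    ("no aspirin is not recommended for children", "relevant", "negative"),
    ("click here for cheap deals", "irrelevant", None),
    ("visit my website to win a free phone", "irrelevant", None),
    ("click here to visit my website", "irrelevant", None),
]

# Step 1: the relevance model is trained on every answer.
relevance_model = TfidfNaiveBayes().fit(
    [text for text, _, _ in TOY_ANSWERS],
    [rel for _, rel, _ in TOY_ANSWERS],
)

# Step 2: the polarity model is trained on relevant answers only.
relevant_rows = [row for row in TOY_ANSWERS if row[1] == "relevant"]
polarity_model = TfidfNaiveBayes().fit(
    [text for text, _, _ in relevant_rows],
    [pol for _, _, pol in relevant_rows],
)


def classify_answer(answer):
    """Return (relevance, polarity); polarity is None for irrelevant answers."""
    if relevance_model.predict(answer) != "relevant":
        return ("irrelevant", None)
    return ("relevant", polarity_model.predict(answer))


for answer in ["yes fluids help reduce fever", "click here for a free phone"]:
    print(answer, "->", classify_answer(answer))
```

Keeping the two classifiers separate means the polarity model never sees noisy answers during training, which mirrors the filter-then-categorize order described in the abstract. A fuller version would also need the Doc2Vec representation and real forum data.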

Full thesis: pdf

Centre for Search and Information Extraction Lab

IIIT Hyderabad Publications
