IIIT Hyderabad Publications
Towards Identification, Classification and Analysis of Mental Illness on Social Media

Author: Sravani Boinepelli
Date: 2022-09-29
Report no: IIIT/TH/2022/117
Advisor: Vasudeva Varma

Abstract

The number of people consuming social media, and the amount of time they dedicate to it, increases every day. With it, cases of mental illness and suicide risk are escalating on social media platforms. The lack of mental health detection mechanisms capable of proactively reaching out to people in need is becoming increasingly evident. The space of automatic mental health detection is ripe with potential, as previous work on mental health has predominantly been theoretical and small in scale. Work that detects mental illness on a more expansive scale is few and far between, and it is riddled with practical and ethical challenges. We must consider the degree or severity of a case of mental illness in order to estimate whether a person requires intervention. Identifying these intervention points can help us contact users and provide them with guidance and resources before their condition escalates to suicidal ideation. However, research efforts are limited by the social stigma and discrimination surrounding mental health care, which has often kept users from disclosing their personal problems, not only on public social media forums but to mental health professionals as well. In addition, there are other barriers to accessing appropriate help when necessary, such as financial concerns, apprehensions about privacy and confidentiality, and long waiting periods to see a mental health professional. Anonymous social media platforms such as mental health blogs or Reddit forums have therefore become increasingly popular, as users can share their personal stories without judgment. These platforms also generate a shared sense of community, as users no longer have to suffer alone.
People who face similar issues can share their experiences, give advice, and motivate one another to seek counsel from professionals. Social media has therefore become a valuable source of linguistic cues for identifying mental health problems from textual data. As researchers, it is imperative that we maintain user privacy and do not disclose personal identities or any other information associated with users; anonymous subreddits and mental health forums thus make for excellent data sources. We additionally anonymize author usernames, as they could contain sensitive information such as the user's name or location. Our work takes steps towards automated identification, classification, and analysis of mental health on large-scale social media platforms. Research has indicated that approximately two-thirds of people who die by suicide were dealing with depression or other mental illnesses at the time of death. Hence, we first turn our efforts toward the identification of depression. Our novel architecture addresses various issues that arise when dealing with large social media datasets. A majority of these challenges stem from the inherent size and composition of the internet: social media is vast, and the amount of content on depression is nearly negligible compared to the number and variety of topics discussed on these platforms. This can often lead to skewed results, which is incredibly damaging given the sensitivity and severity of our problem. We attempt to simulate these conditions and adapt our system to them. We also strive to reduce the demand on time and computational resources; the resulting system runs faster and more efficiently, making it suitable for deployment. We also experiment with different deep learning methods to analyze and categorize mental health issues from social media forums.
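The username anonymization mentioned above can be implemented in several ways; the thesis does not specify its method, so the following is a minimal sketch assuming a salted-hash scheme (the function name and salt value are illustrative, not taken from the thesis):

```python
import hashlib

def anonymize_username(username: str, salt: str = "per-dataset-secret") -> str:
    """Replace a raw username with a salted SHA-256 digest.

    The same username always maps to the same token, so a user's posts
    remain linkable across the dataset without exposing their identity.
    """
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return "user_" + digest[:16]
```

Keeping the salt secret matters: without it, an attacker could hash candidate usernames and compare tokens to re-identify users.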
While our user-level depression detection system is, at its crux, a binary classification problem, we now focus on the challenges that arise when dealing with multiple labels or categories of mental health issues. The dataset used contains posts from 15 specific mental health support groups, collected from related subreddits. We explore different approaches and class imbalance techniques to develop more effective learning models and to counteract the bias present in any given dataset. While our work as computational linguists usually revolves around finding linguistic intricacies in textual data, we also attempt to evaluate sentiment from pictorial data. We must not assume that social media users limit themselves to sharing their mental health issues on predominantly textual platforms such as Twitter and Reddit. With the rise of meme culture and the popularity of sites such as Instagram, pictorial formats are being used to a greater extent. We apply our knowledge of textual deep learning methods to this mode of data, in which images contain a short amount of text, as seen in memes. This allows us to detect positive or negative sentiment in these images and is a step towards detecting mental health sufferers on multi-modal social media. Suicide is among the most pressing public health issues facing today's society, stressing the need for rapid and effective detection tools. Most suicides are related to psychiatric disease, with depression, substance use disorders, and other mental health disorders being the most relevant risk factors. Shared tasks such as CLPsych have played a part in the rise of using social media datasets to develop deep learning architectures for suicidality prediction. We therefore present our findings from our participation in CLPsych's 2022 shared task.
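The abstract does not name the specific imbalance techniques used; one common countermeasure in this setting is inverse-frequency class weighting, sketched below (the function and the toy two-class label counts are illustrative, not from the thesis):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Rare classes (e.g. a small support group among the 15) receive
    larger weights, so a weighted loss penalises their
    misclassification more heavily than that of majority classes.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

# Toy example: 90 posts from a majority group vs. 10 from a minority group.
weights = inverse_frequency_weights(["depression"] * 90 + ["ptsd"] * 10)
# The minority class receives a proportionally larger weight.
```

These weights would typically be passed to a weighted cross-entropy loss during training, so the model is not rewarded for simply predicting the majority class.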
The task is divided into two parts: capturing changes in a user's mood over time and assessing a user's suicidality risk level. Considering the longitudinal nature of the task, we use transformer-based LSTM architectures to take historical context into account. We also keep in mind the need for platform-agnostic detection mechanisms that can run in real time while devising our fine-tuned transformer suicidality risk system. Our team not only outperformed all baselines but also achieved top results in different categories for both subtasks.

Full thesis: pdf

Centre for Language Technologies Research Centre
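The first subtask above, capturing changes in a user's mood over time, can be framed as detecting large shifts between consecutive windows of per-post sentiment scores. The following is a minimal heuristic sketch of that framing only; the thesis itself uses transformer-based LSTM architectures, and the window size, threshold, and scores here are illustrative:

```python
def mood_shifts(scores, window=3, threshold=0.5):
    """Return indices where mean sentiment over consecutive windows of
    posts shifts by more than `threshold` (scores in [-1, 1], oldest
    post first)."""
    shifts = []
    for i in range(window, len(scores) - window + 1):
        before = sum(scores[i - window:i]) / window  # mean of prior window
        after = sum(scores[i:i + window]) / window   # mean of next window
        if abs(after - before) > threshold:
            shifts.append(i)
    return shifts

# A user whose posts swing from positive to strongly negative mid-timeline.
print(mood_shifts([0.6, 0.5, 0.7, -0.8, -0.9, -0.7]))  # → [3]
```

A learned model replaces the fixed threshold and hand-picked window with representations of each post and its history, but the underlying question — did the user's mood change, and where — is the same.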
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.