IIIT Hyderabad Publications
METHODS IN LEGAL CONTRACTUAL CONTENT GENERATION

Author: Sagar Sandeep Joshi
Date: 2023-06-17
Report no: IIIT/TH/2023/80
Advisor: Vasudeva Varma

Abstract

Legal contracts are formal agreements that create legally binding obligations and rights, typically between two parties involved in a transaction. Legal clauses are the fundamental units of discourse in a contract: individual provisions that set out specific terms or conditions of the agreement and collectively make up the entire contract. Contracts are typically drafted by lawyers or other legal professionals with expertise in contract law, through a process designed to capture the interests of both involved parties while ensuring the legal correctness of the content. The domain-specific content in legal contracts is an interesting area for the application of NLP techniques, not only because of the peculiar nature of legalese, but also because of the challenges involved in processing documents of such length. Compared to general-domain language such as news, encyclopedic content, stories, and social media, or to other domain-specific verticals such as scientific articles and judicial proceedings, the domain of legal contracts has seen much less research on the application of deep learning techniques in NLP, with most existing work focusing on contract understanding and review. The application of generative methods to this domain, however, has been severely lacking. This thesis aims to establish a stepping stone towards the AI-aided generation of legal content by presenting a study of two generation problems. We also focus on involving user customizability in the process, for easy tailoring of legal content with respect to the parties involved. The thesis starts with a simple exploratory idea focused on a small set of rental agreements.
In this pilot study, we built and evaluated an agreement drafting tool that matches informal user intents to rental agreement clauses. Observations from this study brought forth the challenges that lie in the generation of legal content, and the need for explicit finetuning in the face of scarce supervised data while leveraging large unlabeled corpora for content generation. Following this, we study and extend prior work on clause recommendation for contract drafting. Here, we consider the addition of a new clause to an existing, incomplete contract draft and experiment with several strategies to study the effect of different informative signals on clause recommendation. While we model the contexts for recommendation using mean-based pooling of representations from BERT-based architectures, we also explore methods to achieve better modeling with long-range transformers and present the difficulties involved. In another major contribution of this thesis, we devise a pipeline for generating legal clauses from minimal, keyword-level information. Taking inspiration from the content-planning paradigm in story generation, we study the application of coherence-based techniques for devising a content plan for the generation of legal clauses. Our approach is centered on the idea of generating a legal clause given the topic (or type) of the clause and a few keywords for customization. Our proposed pipeline feeds the topic and keyword input to a lightweight graph-based mechanism that creates a content plan to act as the outline for the clause to be generated. The content plan is then consumed by a transformer-based generative model, which produces an appropriate legal clause by interpolating through the content plan keywords. We propose an ordered content plan consisting of clause keywords ranked by their generic-ness, with keywords more generic to the topic ranked higher.
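The two core mechanisms above can be illustrated with a minimal sketch. Both functions below are hypothetical simplifications: `mean_pool` shows the mean-based pooling of token-level representations into a single context vector, and `order_content_plan` approximates "generic-ness" by document frequency over a topic's clauses — the thesis itself uses a graph-based mechanism, which this stand-in does not reproduce.

```python
from collections import Counter
from typing import List


def mean_pool(token_embeddings: List[List[float]]) -> List[float]:
    """Average token-level embeddings (e.g. from a BERT-style encoder)
    into one fixed-size context vector."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def order_content_plan(keywords: List[str], topic_clauses: List[str]) -> List[str]:
    """Rank keywords so that terms more generic to the topic come first.

    Generic-ness is approximated here by how many of the topic's clauses
    mention each keyword; ties keep the natural order.
    """
    df = Counter()
    for kw in keywords:
        df[kw] = sum(kw in clause.lower() for clause in topic_clauses)
    return sorted(keywords, key=lambda kw: -df[kw])
```

Under this sketch, a keyword such as "termination" that appears across most clauses of a termination topic would be ranked ahead of a customization-specific term like a notice period, yielding the generic-to-specific outline the generative model conditions on.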
We show the benefit of this ordering over a natural, sequential order. The study also compares ordered plan-based generation against baselines involving prompt-based generation and generation from unordered content plans. We further show the robustness of our approach across a broad range of topics. With our techniques based entirely on pretrained transformer architectures, we also contribute a short chapter on the effectiveness of these architectures on four other tasks.

Full thesis: pdf

Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.