XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages

Authors: Tushar Abhishek,shivprasad sagare,Bhavyajeet Singh,Anubhav Sharma,Manish Gupta,Vasudeva Varma
Conference: Companion Proceedings of the Web Conference 2022
Pages: 1-5

Date: 2022-04-04
Report no: IIIT/TR/2022/12

Abstract

Multiple critical scenarios need automated generation of descriptive text in low-resource (LR) languages given English fact triples. For example, Wikipedia text generation given English Infoboxes, automated generation of non-English product descriptions using English product attributes, etc. Previous work on fact-to-text (F2T) generation has focused on English only. Building an effective crosslingual F2T (XF2T) system requires alignment between English structured facts and LR sentences. Either we need to manually obtain such alignment data at a large scale, which is expensive, or build automated models for cross-lingual alignment. To the best of our knowledge, there has been no previous attempt on automated cross-lingual alignment or generation for LR languages. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5402 pairs have been manually annotated. We also train strong baseline XF2T generation models on XAlign. We make our code and dataset publicly available1 , and hope that this will help advance further research in this critical area.

Full paper: pdf

Centre for Language Technologies Research Centre

IIIT Hyderabad Publications

XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages

Abstract