IIIT Hyderabad Publications
Text Embeddings in Riemannian Manifolds

Author: Aman Mehta
Date: 2023-04-17
Report no: IIIT/TH/2023/15
Advisor: Kamalakar Karlapalem

Abstract

Although the concept of a knowledge graph has been discussed since at least 1972 [93], it wasn't until Google [96] unveiled its version in 2012 that the idea truly took off. Since then, several companies have built their own knowledge graphs, including Google, Amazon [54], eBay [82], Twitter, IBM [21], LinkedIn [37], Microsoft [95], and Uber. The number of academic publications devoted to knowledge graphs has risen in recent years, reflecting the growing interest in this idea. Several books [79, 85, 52] and papers [84, 24] detail knowledge graphs, along with techniques for generating and analysing them and assessments of their various aspects [103]. Apart from these enterprise knowledge graphs, which are private and not accessible to the public, a number of public knowledge graphs have been published whose content is accessible to users of the web. These publicly available knowledge graphs are called open KGs. DBpedia [58], YAGO [40], Freebase [101], and Wikidata [8] are prominent examples of such open KGs; they are multi-domain and built using data from Wikipedia.

Automatic extraction of information from text and its transformation into a structured format is an important goal in both Semantic Web research and computational linguistics. Knowledge Graphs (KGs) serve as an intuitive way to provide structure to unstructured text. A fact in a KG is expressed as a triple that captures two entities and their interrelationship (the predicate). Multiple triples extracted from text can be semantically identical yet differ in vocabulary, which can lead to an explosion in the number of redundant triples. Hence, to eliminate this vocabulary gap, triples must be mapped to a homogeneous namespace.
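The vocabulary-gap problem described above can be sketched concretely. In this illustrative example (the entity and predicate surface forms, and the toy lookup tables, are assumptions for illustration, not the thesis's actual mapping method), two triples extracted from different sentences state the same fact with different wording, and mapping them into the shared DBpedia namespace collapses them into one canonical triple:

```python
# Two triples extracted from different sentences that express the same fact
# with different vocabulary (a "vocabulary gap").
triple_a = ("Barack Obama", "was born in", "Honolulu")
triple_b = ("Obama", "birthplace", "Honolulu")

# Hypothetical lookup tables mapping surface forms into the homogeneous
# DBpedia namespace (dbr: for resources, dbo: for ontology predicates).
ENTITY_MAP = {
    "Barack Obama": "dbr:Barack_Obama",
    "Obama": "dbr:Barack_Obama",
    "Honolulu": "dbr:Honolulu",
}
PREDICATE_MAP = {
    "was born in": "dbo:birthPlace",
    "birthplace": "dbo:birthPlace",
}

def canonicalise(triple):
    """Map a raw (subject, predicate, object) triple into the shared namespace."""
    s, p, o = triple
    return (ENTITY_MAP.get(s, s), PREDICATE_MAP.get(p, p), ENTITY_MAP.get(o, o))

# After mapping, the two redundant triples become the same canonical fact.
assert canonicalise(triple_a) == canonicalise(triple_b)
```

In the thesis itself this mapping is learned rather than table-driven, but the sketch shows why canonicalisation removes redundancy.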
In this work, we present an end-to-end KG construction system that identifies and extracts entities and relationships from text and maps them to the homogeneous DBpedia namespace. For predicate mapping, we propose a deep learning architecture to model semantic similarity. This mapping step is computationally heavy, owing to the large number of triples in DBpedia; we identify and prune unnecessary comparisons to make the step scalable. Our experiments show that the proposed approach constructs a richer KG at a significantly lower computational cost than previous work.

Over recent years, document similarity has become the foundation of various natural language processing tasks that are crucial to information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. For document or topic similarity, focusing on the semantics within the text has been the most common and pursued direction of effort. Our novel KG-based similarity classifier addresses the limitations of previous approaches and also provides reasoning behind its classifications. Our results show that we are able to score similarity between Wikipedia documents accurately. Furthermore, the accuracy of our approach withstands noise and paraphrasing. We also show that our classifier can be used for category-outlier detection in DBpedia.

In this thesis, we focus on knowledge graph construction from unstructured text and on document similarity. Our knowledge graph construction approach performs more than 25% better than cosine- and rule-based approaches, and is also computationally cheaper than the state-of-the-art T2KG [53] by a factor of at least 10^6. We use these constructed knowledge graphs to classify similarity between two documents, which in turn is used to detect category outliers in DBpedia and to find highly similar documents.

Full thesis: pdf

Centre for Data Engineering
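The core intuition behind KG-based document similarity can be illustrated with a simple stand-in. The thesis's classifier is a learned model, but this minimal sketch (assuming Jaccard overlap between two documents' sets of canonical triples, with made-up DBpedia-style triples) shows how comparing structured facts, rather than raw text, yields both a score and the shared triples that explain it:

```python
def kg_similarity(triples_a, triples_b):
    """Jaccard similarity between two documents' sets of canonical KG triples."""
    a, b = set(triples_a), set(triples_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy canonical triples extracted from two hypothetical documents.
doc1 = {("dbr:Paris", "dbo:country", "dbr:France"),
        ("dbr:Paris", "dbo:populationTotal", "2100000")}
doc2 = {("dbr:Paris", "dbo:country", "dbr:France"),
        ("dbr:Paris", "dbo:mayor", "dbr:Anne_Hidalgo")}

score = kg_similarity(doc1, doc2)   # 1 shared triple out of 3 distinct
shared = doc1 & doc2                # the overlap doubles as the "reasoning"
```

Because both documents' facts live in the same homogeneous namespace, paraphrasing the surface text does not change the triple sets, which is one way such an approach can be robust to noise and rewording.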
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.