IIIT Hyderabad Publications
FAST: Fragment Assisted Storage for query execution in read-only databases

Author: Vivek Hamirwasia
Date: 2019-12-02
Report no: IIIT/TH/2019/121
Advisor: Kamalakar Karlapalem

Abstract

Historically, row store has been the most popular storage layout for relational data in commercial databases. With the advent of column store, it became possible to improve the efficiency of real-world queries by storing and operating on each attribute separately. Traditional row store has the disadvantage of reading irrelevant attributes into main memory when only a few attributes are queried. Column store, in contrast, suffers from large “stitching” costs when the number of queried attributes is large. Moreover, traditional row store suffers from a large number of cache misses, whereas column store is cache-efficient because irrelevant attributes are not brought into the cache.

Recent years have also witnessed an in-memory revolution in databases, wherein a large part of the data is stored in RAM, thereby removing the I/O overhead. Although the concept of in-memory databases has been around for a long time, commercial and practical implementations have become feasible only recently, due to the falling price of RAM coupled with its increasing capacity. However, research has shown that most of the time in such systems is lost to cache misses. Most commercial row and column storage systems do not optimize the layout of data in the available main memory to improve performance.

In this thesis, we introduce an intuitive, hybrid combination of row and column storage mechanisms to reduce the I/O cost and cache misses for ad-hoc read-only queries. Our system acts as a commercial off-the-shelf (COTS) solution on top of existing databases. We leverage the concept of vertical fragmentation by analyzing the historical query load and grouping certain related attributes together to form vertical fragments.
The attributes within a particular fragment are selected on the basis of their co-occurrence over all queries. The generated fragments are then stored as materialized views in the main memory buffer, and the more “valuable” fragments are given priority when allocating space in main memory. We also present algorithms to optimally select the required data from these fragments for a given query so as to reduce the total number of I/O calls to secondary storage. Moreover, this placement of data in main memory is cache-efficient, which further improves performance. We show that, on average, FAST executes queries up to an order of magnitude faster than row storage and as much as twice as fast as column storage for an ad-hoc workload. We also demonstrate the superior performance of FAST on multiple TPC-H benchmark queries, and we present techniques to automatically adapt the main memory layout to a changing workload.

Although the framework described in this thesis largely focuses on main memory and secondary storage for data layout, our algorithms and techniques have been designed to work for any generic multi-level data storage system. For example, our approach can be applied to a three-level storage model consisting of main memory, an intermediate SSD layer, and secondary storage. This is possible because we work at the logical database layer rather than modifying the physical layer to achieve performance gains. Moreover, our storage model is designed to support both OLAP (online analytical processing) and OLTP (online transaction processing) workloads. In fact, one of the advantages of our framework is its ease of implementation on top of existing database systems. To this end, we apply FAST in the context of distributed databases.
We also discuss an ingrained application of FAST for distributed databases, which would result in larger query efficiency gains at the cost of additional implementation effort. We hope this discussion opens up new areas of systems research in this direction.

Full thesis: pdf
Centre for Data Engineering
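The abstract describes grouping attributes into vertical fragments by their co-occurrence across a historical query load, and then choosing fragments to answer a query so that fewer reads go to secondary storage. The following is only a minimal illustrative sketch of that idea, not the algorithm from the thesis. The function names, the `min_count` threshold, the single-link grouping, and the greedy selection are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(workload):
    """Count how often each pair of attributes appears in the same query."""
    counts = Counter()
    for attrs in workload:
        for a, b in combinations(sorted(set(attrs)), 2):
            counts[(a, b)] += 1
    return counts


def build_fragments(workload, min_count):
    """Group attributes whose pairwise co-occurrence is at least min_count.

    Assumption: a simple single-link grouping (union-find), standing in for
    whatever fragment-generation method the thesis actually uses.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for attrs in workload:
        for a in attrs:
            find(a)  # register every attribute, even ones never grouped
    for (a, b), n in cooccurrence(workload).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for a in list(parent):
        groups.setdefault(find(a), set()).add(a)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def select_fragments(query_attrs, fragments):
    """Greedily pick in-memory fragments that cover the query's attributes.

    Returns the chosen fragments and the attributes that no fragment covers,
    which would have to be read from secondary storage.
    """
    needed = set(query_attrs)
    chosen = []
    while needed:
        best = max(fragments, key=lambda f: len(f & needed), default=None)
        if best is None or not (best & needed):
            break
        chosen.append(best)
        needed -= best
    return chosen, needed


# Hypothetical workload: each query is the list of attributes it touches.
workload = [
    ["price", "qty"],
    ["price", "qty", "date"],
    ["name", "addr"],
    ["name", "addr"],
    ["price", "date"],
]
fragments = build_fragments(workload, min_count=2)
chosen, from_disk = select_fragments(["price", "name", "zip"], fragments)
```

In this sketch, an attribute that never appears in the workload (such as `zip` above) is not placed in any fragment, so it is reported separately as data that must come from secondary storage. A fuller treatment would also weigh fragment size against the available memory budget, which the abstract alludes to when it mentions prioritizing the more valuable fragments.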
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.