Scaling Blockchain using Codes and DRL based Approach for Blockchain and UAV

Author: Divija Swetha Gadiraju 2018802001
Date: 2023-09-14
Report no: IIIT/TH/2023/135
Advisor: Lalitha Vadlamani

Abstract

Blockchain and Reinforcement Learning (RL) are two game-changing research areas that have received a lot of attention recently. Significant advances in RL have led to tremendous success in solving various sequential decision-making problems in machine learning. This work discusses two of the most successful RL applications: blockchain and unmanned aerial vehicles (UAVs). Blockchain is a distributed ledger technology whose first application was Bitcoin. The main challenge in blockchain-based cryptocurrencies is to provide a distributed trust environment with security as high as that of a centralized financial system. Bitcoin's current throughput is around 4 to 7 transactions per second, and its confirmation latency is about one hour. For Bitcoin to go mainstream, its throughput would have to reach thousands of transactions per second, with latency on the order of a few seconds. Recent blockchain research has proposed consensus algorithms that scale Bitcoin, such as sharding and the Prism blockchain. However, Bitcoin's security is very high: it can tolerate up to 50% adversarial nodes and prevents double-spending attacks. The current blockchain size is over 260 GB and is growing at an astonishing rate, imposing a huge storage requirement on the nodes. Recent work on improving Bitcoin consensus has shown that there is a tradeoff between decentralization, scaling, and security. In this thesis, we leverage coding theory and RL to scale blockchains. Because of the increasing storage requirement, only a few miners can afford the computation. Sharding has been proposed to scale blockchains, improving their storage and transaction efficiency at the cost of weaker security guarantees. Incorporating coding theory into existing consensus algorithms has been shown to improve storage efficiency and reduce latency.
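The link between Bitcoin's confirmation latency and its resistance to double spending can be made concrete with Nakamoto's calculation from the Bitcoin whitepaper: the probability that an attacker with a fraction q of the hash power ever overtakes a chain that is z blocks ahead. The sketch below implements that published formula; it is not an analysis from this thesis. The function names are our own, and the 10-minute block interval used in the usage lines is Bitcoin's target average, stated here as an assumption.

```python
import math


def attacker_success(q: float, z: int) -> float:
    """Probability that an attacker holding fraction q of the hash power
    ever overtakes an honest chain that is z blocks ahead
    (Nakamoto's calculation, Bitcoin whitepaper, Section 11)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with at least half the hash power always catches up
    lam = z * q / p  # expected attacker progress while the honest chain mines z blocks
    prob = 1.0
    for k in range(z + 1):
        # Poisson(lam) probability that the attacker has mined exactly k blocks,
        # computed in log space so that large z does not overflow.
        if lam > 0:
            poisson = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        else:
            poisson = 1.0 if k == 0 else 0.0
        # Remove the chance of having k blocks AND never closing the remaining gap.
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob


def min_confirmations(q: float, eps: float = 1e-3, max_z: int = 10_000) -> int:
    """Smallest number of confirmations z whose attacker success is below eps."""
    for z in range(max_z + 1):
        if attacker_success(q, z) < eps:
            return z
    raise ValueError("no z <= max_z reaches the target probability")


if __name__ == "__main__":
    for q in (0.10, 0.30):
        z = min_confirmations(q)
        # Assumes Bitcoin's ~10-minute average block interval.
        print(f"q={q:.2f}: {z} confirmations, about {10 * z} minutes")
```

The required depth grows quickly as q approaches one half, which is why waiting for several blocks (the commonly used six confirmations is roughly one hour) drives Bitcoin's latency.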
We propose a Secure-Repair-Blockchain (SRB) that aims to decrease the storage cost at the miners. SRB also decreases the bootstrapping cost, which allows new miners to easily join a sharded blockchain. To reduce storage, SRB uses coding-theoretic techniques; to decrease the amount of data transferred to a new node joining a shard, it uses exact-repair secure regenerating codes. The proposed blockchain protocol requires less storage than protocols that do not use coding and achieves lower bootstrapping costs than the baselines.

Prism is a recent blockchain algorithm that achieves the physical limits on throughput and latency without compromising security. However, like traditional blockchain systems, Prism still has a trade-off between security, latency, and cost. Recently, reinforcement learning approaches have been investigated in traditional blockchains to improve performance. In this work, we apply Deep Reinforcement Learning (DRL) to one of the promising blockchain protocols, Prism, to optimize its performance. We propose a Deep Reinforcement Learning-based Prism Blockchain (DRLPB) scheme, which dynamically optimizes the parameters of the Prism blockchain to achieve better performance. In DRLPB, we apply two widely used DRL algorithms, Dueling Deep Q Networks (DDQN) and Proximal Policy Optimization (PPO). This work presents a novel approach to applying DDQN and PPO to a blockchain protocol and compares their performance. We also provide an analysis of Prism's latency and security level as functions of the other blockchain parameters. Using this analysis, the DRLPB scheme adapts the Prism blockchain parameters to enhance security by up to 84% over Prism, while still preserving Prism's performance guarantees.
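Why regenerating codes reduce the bootstrapping cost of SRB can be seen from the minimum-storage regenerating (MSR) point of the standard cut-set bound for regenerating codes. The sketch below is a generic calculation under assumed parameters, not SRB's construction: a shard of size B is encoded across n nodes, any k nodes can rebuild it, and a newcomer downloads from d helpers. It deliberately ignores the secrecy constraint of secure regenerating codes, which reduces how much data can be stored securely. The function name and parameter values are hypothetical.

```python
from fractions import Fraction


def msr_costs(B, n: int, k: int, d: int) -> dict:
    """Per-node storage and repair download at the MSR point for data of
    size B spread over n nodes, recoverable from any k of them, with a
    new node repaired by contacting d helper nodes."""
    if not 1 <= k <= d <= n - 1:
        raise ValueError("need 1 <= k <= d <= n - 1")
    B = Fraction(B)
    alpha = B / k                  # storage per node (same as an MDS code)
    beta = B / (k * (d - k + 1))   # data sent by each of the d helpers
    return {
        "storage_per_node": alpha,
        "repair_bandwidth": d * beta,  # total download for a joining node
        "mds_repair": B,               # classic MDS repair: fetch k pieces, decode
        "replication_storage": B,      # uncoded sharding: every node keeps a full copy
    }


if __name__ == "__main__":
    costs = msr_costs(B=1, n=10, k=4, d=9)
    for name, value in costs.items():
        print(f"{name:>20}: {value} x shard size")
```

With k = 4 and d = 9, each node stores a quarter of the shard and a joining node downloads 3/8 of it instead of the whole shard; setting d = k removes the saving, which is why contacting more helpers lowers the repair cost.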
Recent advancements in the Internet of Things (IoT) motivate the development of a secure infrastructure for storing and sharing vast amounts of data. Blockchain, a distributed and immutable ledger, is widely regarded as a potential solution for data security and privacy in IoT. However, the enormous amount of IoT data makes blockchain scalability a challenge: the blockchain must optimize throughput while handling the dynamics of the IoT environment. The critical challenge in scaling blockchain is to guarantee decentralization, latency, and security while optimizing transaction throughput. This work presents a deep reinforcement learning (DRL)-based performance optimization for blockchain-enabled IoT. We consider Prism, one of the recent promising blockchains, as the underlying blockchain system because of its good performance guarantees. We integrate IoT data into the Prism blockchain and optimize the performance of the system using the Proximal Policy Optimization (PPO) method. The DRL method optimizes blockchain parameters such as the mining rate and mined blocks to adapt to the dynamics of the IoT environment. Our results show that the proposed method improves the throughput of Prism-based IoT systems while preserving Prism's performance guarantees. Our scheme achieves 1.5 times more system reward than IoT-integrated Prism and improves the average throughput of the system by about 6,000 transactions per second.

Unmanned aerial vehicles (UAVs) are widely used for missions in dynamic environments. DRL can find effective strategies for multiple agents that need to cooperate to complete a task. We address the challenge of controlling the movement of a group of UAVs with Multi-Agent Deep Reinforcement Learning (MARL). The collaborative movement of the UAV fleet can be controlled either centrally or in a decentralized fashion, and this work studies both.
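The difference between centralized and decentralized action control can be illustrated with a toy example. In the hypothetical sketch below, which is not GLIDE, not a learned policy, and not the UAV SIM environment, each UAV moves on a grid; a centralized controller searches the joint action space of size |A|^N and scores joint moves with a team objective that penalizes collisions, while in the decentralized version each UAV greedily picks the move that is best for its own target. The grid moves, scoring function, and collision penalty are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical per-UAV action set on a grid (not UAV SIM's action space).
MOVES = {"stay": (0, 0), "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
ACTIONS = list(MOVES)


def step(pos, action):
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)


def distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan distance


def joint_score(positions, joint_action, targets, collision_penalty=10):
    """Team score: negative total distance to targets, minus a penalty
    if two UAVs end up in the same cell."""
    nxt = [step(p, a) for p, a in zip(positions, joint_action)]
    score = -sum(distance(n, t) for n, t in zip(nxt, targets))
    if len(set(nxt)) < len(nxt):
        score -= collision_penalty
    return score


def decentralized(positions, targets):
    """Each UAV picks the move that brings it closest to its own target,
    ignoring the other UAVs."""
    return tuple(
        min(ACTIONS, key=lambda a: distance(step(p, a), t))
        for p, t in zip(positions, targets)
    )


def centralized(positions, targets):
    """One controller searches all |A|^N joint actions for the best team score."""
    return max(
        product(ACTIONS, repeat=len(positions)),
        key=lambda joint: joint_score(positions, joint, targets),
    )


if __name__ == "__main__":
    positions = [(0, 0), (2, 0)]
    targets = [(1, 0), (1, 0)]  # both UAVs are drawn to the same cell
    print("joint actions searched:", len(ACTIONS) ** len(positions))
    print("decentralized:", decentralized(positions, targets))
    print("centralized:  ", centralized(positions, targets))
```

Here the two independent choices collide in the same cell, while the centralized search finds a collision-free joint move. The price is a joint action space that grows exponentially with the number of UAVs, which is the usual argument for decentralized or coordinated per-agent policies.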
We consider a dynamic military environment in which a group of UAVs must destroy targets while avoiding obstacles such as mines. Because UAVs have limited fuel capacity, our research focuses on minimizing the task completion time. We leverage GLIDE, a continuous-time PPO-based algorithm, in which the UAVs coordinate among themselves and communicate with the central base to choose the best possible action. We developed a simulator, UAV SIM, for our experiments; at the beginning of each episode, mines are placed at random locations unknown to the UAVs. We evaluate the proposed scheme through extensive simulations and compare centralized and decentralized action control.
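PPO appears throughout this work: in DRLPB, in the blockchain-enabled IoT scheme, and as the basis of GLIDE. Its core is the clipped surrogate objective, which rewards improving the probability of advantageous actions but stops rewarding the policy once the probability ratio leaves the interval [1 - eps, 1 + eps]. The sketch below computes that standard objective from log-probabilities and advantage estimates; it does not reproduce the thesis's networks, rewards, or hyperparameters, and eps = 0.2 is a commonly used default assumed here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized):
    mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    if not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)            # probability ratio r
        clipped = min(max(ratio, 1 - eps), 1 + eps)  # clip(r, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)     # pessimistic (lower) bound
    return total / len(advantages)


if __name__ == "__main__":
    # One action became 1.5x more likely (positive advantage) and one
    # became half as likely (negative advantage).
    new_logp = [math.log(1.5), math.log(0.5)]
    old_logp = [0.0, 0.0]
    print(ppo_clip_objective(new_logp, old_logp, [1.0, -1.0]))
```

In training, the new policy's parameters are updated by gradient ascent on this quantity; taking the minimum means a large policy change never looks better than its clipped version, which keeps each update close to the policy that collected the data.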


Centre for Others

IIIT Hyderabad Publications