IIIT Hyderabad Publications
PRIVACY PRESERVING OUTLIER DETECTION OVER VERTICALLY PARTITIONED DATA
Author: P Madhuchand Rushi
Report no: IIIT/TH/2016/61
Consider a scenario wherein a bank with a large database of customer information intends to identify the set of customers to whom a credit line could be extended. For this, the bank might need various other information, apart from its own database, such as tax information from the income tax department, criminal history of the customers from the law enforcement authorities etc. Although all the other parties might be willing to share their respective information (databases) in order for the banking organization to compute the result, they might not be willing to compromise on the privacy of their databases. To preserve the privacy of the individual data, in an idealistic world, all parties could send their data to a Trusted Third Party (TTP) which would then apply any of the data mining techniques to compute the intended result and share it with the banking organization. The bank would only get to know the final set of customers to whom a line of credit could be extended and any local information with other parties would not be revealed other than what could be computed from the final result itself. However, in the real world, in many instances it would be infeasible to find such a Trusted third party with whom all the other sources of data would be willing to share their information. To overcome such problems, Privacy Preserving Data Mining techniques have been developed which allow the computation of a function on various databases distributed across a set of players, with out either party learning anything about the data of the other parties other than that can be derived from the final result itself. In other words, these techniques simulate the functionality of a TTP. In this thesis, we consider the problem of Privacy Preserving outlier Detection(PPOD) over vertically partitioned data. The computational and communication complexities of the proposed PPOD algorithm are sub-quadratic in the size of the dataset as opposed to the quadratic complexities of the previously known results. In order to achieve efficient algorithms in the private setting, we first develop an approximation algorithm for outlier detection in the centralized setting and then extend it to the case of privacy preserving outlier detection.
Full thesis: pdf
Centre for Security, Theory and Algorithms
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.