Hiding Sensitive Frequent Itemsets over Privacy Preserving Distributed Data Mining

Abstract

Data mining is the process of extracting hidden patterns from data. One of the most important activities in data mining is the association rule mining and the new head for data mining research area is privacy of mining. Privacy preserving data mining is a new research trend in privacy data for data mining and statistical database. Data mining can be applied on centered or distributed databases. Most efficient approaches for mining distributed databases suppose that all of the data at each site can be shared. Privacy concerns may prevent the sites from directly sharing the data, and some types of information about the data. Privacy Preserving Data Mining (PPDM) has become increasingly popular because it allows sharing of privacy sensitive data for analysis purposes. In this paper, the problem of privacy preserving association rule mining in horizontally distributed database is addressed by proposing a system to compute a global frequent itemsets or association rules from different sites without disclosing individual transactions. Indeed, a new algorithm is proposed to hide sensitive frequent itemsets or sensitive association rules from the global frequent itemsets by hiding them from each site individually. This can be done by modifying the original database for each site in order to decrease the support for each sensitive itemset or association rule. Experimental results show that the proposed algorithm hides rules in a distributed system with the good execution time, and with limited side effects. Also, the proposed system has the capability to calculate the global frequent itemsets from different sites and preserves the privacy for each site.