Data Pre-processing for Knowledge Discovery

Abstract

The data pre-processing stage, also known as the data preparation stage, is a fundamental stage of data analysis and knowledge discovery. If the data contain much irrelevant and redundant information, or are noisy and unreliable, knowledge discovery during the analysis and mining phase becomes more difficult. We therefore consider pre-processing an important step in the knowledge discovery process, one that has a significant impact on predictive accuracy. Because each customer attribute may require special treatment for each algorithm, the choice of data pre-processing (DPP) techniques depends on the individual dataset or database used. In this paper we select and explain two pre-processing techniques, consistency and reduction, chosen to suit our marketing data warehouse, which contains inconsistent attributes and duplicated records. We also propose two new algorithms, the Removing Duplication Algorithm for reduction and the Resolving Inconsistency Algorithm for consistency, designed to achieve the best performance on this dataset. We implemented both algorithms in the C# programming language and applied them to our data warehouse, stored as a Microsoft Access file. The result is a clean data warehouse with consistent attributes and no duplicated records, ready to serve as quality input to data mining algorithms or any other analysis method. This in turn influences the quality of the knowledge discovered during the data mining process.
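To make the two techniques concrete, the following is a minimal illustrative sketch in Python of what consistency resolution and duplicate removal can look like on a small set of records. It is not the Removing Duplication Algorithm or the Resolving Inconsistency Algorithm proposed in this paper, which were implemented in C# against a Microsoft Access file. The attribute names, the `CANONICAL` mapping, the function names, and the sample rows are all assumptions invented for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical mapping from variant spellings to one canonical value per attribute.
CANONICAL: Dict[str, Dict[str, str]] = {
    "gender": {"m": "Male", "male": "Male", "f": "Female", "female": "Female"},
}


def resolve_inconsistency(records: List[Record]) -> List[Record]:
    """Trim whitespace and map known variant spellings to a canonical value."""
    cleaned = []
    for rec in records:
        out = {}
        for key, value in rec.items():
            text = value.strip()
            mapping = CANONICAL.get(key, {})
            # Unknown values are kept as-is (after trimming).
            out[key] = mapping.get(text.lower(), text)
        cleaned.append(out)
    return cleaned


def remove_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each record, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        # An order-independent fingerprint of the whole record.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


# Made-up sample rows: the first two describe the same customer inconsistently.
rows = [
    {"name": "Ali", "gender": "M"},
    {"name": "Ali ", "gender": "male"},
    {"name": "Sara", "gender": "F"},
]
clean = remove_duplicates(resolve_inconsistency(rows))
```

In this sketch, consistency is resolved before duplicates are removed, because two records that differ only in spelling or spacing become identical only after normalization. Whether the paper's algorithms run in the same order is not stated in the abstract.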