Studying the Impact of Handling Missing Values in the Dataset on the Efficiency of Data Mining Techniques

Abstract

Medical data sets hold potentially valuable information in the form of hidden patterns. Classification is a form of data analysis that can be used to extract models describing important data classes or to predict future data trends. Such analysis can help provide a better understanding of large data sets. The diagnosis of a medical condition from symptoms is one example of a classification task, in which the classes could be either the various disease states or the possible therapies. Data cleaning and normalization may improve the accuracy and efficiency of mining algorithms. In this paper we apply two data mining techniques (neural network and decision tree) to a known diabetic dataset to predict future outcomes from the given attributes, and we observe the impact of handling missing values in the dataset on the results.
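
To make the notion of handling missing values concrete, the following is a minimal sketch of two common strategies: dropping incomplete records and replacing missing entries with the column mean. It is an illustration only and is not taken from the paper's experiments. The function names, the use of `None` to mark a missing entry, and the toy glucose and BMI values are all assumptions introduced here.

```python
from statistics import mean


def drop_incomplete(rows):
    """Keep only the records that have no missing (None) entries."""
    return [row for row in rows if None not in row]


def impute_column_means(rows):
    """Replace each None entry with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: fall back to 0.0 when a column has no observed values.
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records in the form [glucose, bmi]; None marks a missing value.
records = [
    [148.0, 33.6],
    [None, 26.6],
    [183.0, None],
    [89.0, 28.1],
]

complete = drop_incomplete(records)     # discards the two incomplete records
imputed = impute_column_means(records)  # keeps all four records
```

Dropping records shrinks the training set, while mean imputation keeps every record at the cost of introducing estimated values. Either choice can change what a classifier learns, which is the kind of effect the study sets out to observe.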