Intelligent Documents Classification System

Abstract

A huge number of documents are available from many different sources in unorganized form, so these unstructured documents need to be classified. This paper proposes the Intelligent Documents Classification (IDC) system, which assigns each document to the correct class based on its textual content. The system works in four steps: preprocessing, feature extraction, a proposed feature selection method, and a modified naïve Bayes model. Two datasets are used to evaluate the proposed system. The first, "bbc from ucd repository", is a standard dataset available on the internet that contains technical research documents distributed over five classes. The second is a dataset of book documents distributed over six classes, collected during this work. The IDC system achieved strong results. On the standard dataset, accuracy is 95.1%, precision is 95%, recall is 95.8%, and F1-measure is 95.39%; on the collected dataset, accuracy is 95.3%, precision is 95.16%, recall is 95.83%, and F1-measure is 95.49%.
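
To make the four-step flow concrete, the following is a minimal sketch and not the system evaluated here. The abstract does not specify the proposed feature selection method or the modification to naïve Bayes, so this sketch substitutes a simple document-frequency filter and a standard multinomial naïve Bayes with Laplace smoothing. All names (STOP_WORDS, preprocess, select_features, NaiveBayes, build_classifier) and the toy training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}

def preprocess(text):
    """Step 1: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def extract_features(tokens):
    """Step 2: bag-of-words term counts."""
    return Counter(tokens)

def select_features(feature_list, min_df=2):
    """Step 3 (stand-in, assumed): keep terms found in at least min_df documents."""
    df = Counter()
    for feats in feature_list:
        df.update(feats.keys())
    return {term for term, n in df.items() if n >= min_df}

class NaiveBayes:
    """Step 4 (stand-in, assumed): standard multinomial naive Bayes."""

    def fit(self, feature_list, labels, vocab):
        self.vocab = vocab
        self.class_docs = Counter(labels)
        self.total_docs = len(labels)
        self.term_counts = defaultdict(Counter)
        for feats, label in zip(feature_list, labels):
            for term, n in feats.items():
                if term in vocab:
                    self.term_counts[label][term] += n
        return self

    def predict(self, feats):
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            total = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood of each selected term.
            score = math.log(n_docs / self.total_docs)
            for term, n in feats.items():
                if term in self.vocab:
                    score += n * math.log((counts[term] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def build_classifier(train):
    """Run the four steps on (text, label) pairs and return a classify function."""
    feature_list = [extract_features(preprocess(text)) for text, _ in train]
    labels = [label for _, label in train]
    vocab = select_features(feature_list)
    model = NaiveBayes().fit(feature_list, labels, vocab)
    return lambda text: model.predict(extract_features(preprocess(text)))

# Toy training data, invented for illustration only.
TRAIN = [
    ("The match ended with a late goal for the home team", "sport"),
    ("The team won the match after a second goal", "sport"),
    ("The bank raised interest rates as the market fell", "business"),
    ("Shares in the bank rose as the market recovered", "business"),
]

classify = build_classifier(TRAIN)
print(classify("A goal decided the match"))
print(classify("The market and the bank"))
```

Working in log space avoids numeric underflow when many term probabilities are multiplied, and the add-one smoothing keeps a term unseen in one class from forcing that class's probability to zero.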