Text Similarity Based on English Morphological Analyzer Approach


Many applications now require text similarity, which has become important for comparing texts on websites. Keywords are useful for a variety of purposes, including summarizing, indexing, labeling, information retrieval, text similarity, clustering, and searching. The objective of the proposed system is to test text similarity automatically and to compute a similarity ratio. The system is based on several techniques, chiefly an English Morphological Analyzer (EMA). In this work, keyword extraction and text summarization are used to determine text similarity for long and very long texts. The proposed system addresses the text similarity problem by applying several statistical and linguistic approaches, in particular morphological rules. The linguistic approaches in this system also include synonyms, word frequencies, word position, and Part-Of-Speech (POS) information. It is shown that keyword extraction and text summarization built on the EMA approach, together with the other statistical and linguistic approaches, are very useful in building a highly accurate method for text similarity. The system was tested, and the accuracy of the results ranged from 98.85% to 100%.
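
To make the idea of a keyword-based similarity ratio concrete, the following is a minimal sketch and not the system described above. It assumes a crude suffix-stripping function as a stand-in for the EMA, a small hypothetical stop-word list, frequency-ranked keywords, and the Jaccard overlap of the two keyword sets as the similarity ratio. The names `strip_suffix`, `extract_keywords`, and `similarity_ratio` are illustrative only; synonyms, word position, and POS information are not modeled.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with"}


def strip_suffix(word):
    """Crude suffix stripping; an assumed stand-in for a real morphological analyzer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text, top_n=10):
    """Return the set of the top_n most frequent stripped, non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [strip_suffix(t) for t in tokens if t not in STOP_WORDS]
    return {word for word, _ in Counter(stems).most_common(top_n)}


def similarity_ratio(text_a, text_b, top_n=10):
    """Jaccard overlap of the two keyword sets, a value in [0, 1]."""
    keys_a = extract_keywords(text_a, top_n)
    keys_b = extract_keywords(text_b, top_n)
    union = keys_a | keys_b
    if not union:
        return 0.0  # Assumption: two texts without keywords are treated as dissimilar.
    return len(keys_a & keys_b) / len(union)


if __name__ == "__main__":
    print(similarity_ratio("The dog walked home", "Dogs walking"))
```

The suffix stripper is deliberately naive: it maps "walked" and "walking" to the same form but fails on many inflections (for example, "running" and "runs"), which is the kind of gap a rule-based morphological analyzer is meant to close.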