Extractive Multi-Document Summarization Model Based On Different Integrations of Double Similarity Measures

Abstract

The importance of automatic multi-document summarization stems from the rapid growth of information on the Internet. Automatic document summarization technology is progressing and may offer a solution to the problem of information overload, but producing a high-quality summary remains a challenge. In this study, the design of a generic text summarization model based on sentence extraction is redirected toward a more semantic measure that reflects, individually, two significant objectives, content coverage and diversity, when generating summaries from multiple documents as an explicit optimization model. The two proposed models are then coupled and defined as a single-objective optimization problem. To improve the performance of the proposed model, different integrations of two similarity measures are introduced and applied to it, alongside the single similarity measures Cosine, Dice, and Jaccard for measuring text similarity. A Genetic Algorithm (GA) is used to solve the proposed model. The document sets supplied by the Document Understanding Conference 2002 (DUC2002) serve as the evaluation dataset, and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit is used as the evaluation metric. Experimental results illustrate the positive impact of measuring text similarity with a double integration of similarity measures rather than a single similarity measure when applied to the proposed model: the best performance in terms of ROUGE-2 (0.1354) and ROUGE-1 (0.4210) was recorded for the integration of Cosine similarity and Jaccard similarity.
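As an illustration of the three single similarity measures named above, the following minimal Python sketch computes Cosine, Dice, and Jaccard similarity between two tokenized sentences. The abstract does not state how two measures are integrated, so the `combined_similarity` function, its weighted-average rule, and its `weight` parameter are purely hypothetical stand-ins, as are the whitespace tokenizer and all function names; they should not be read as the method used in this study.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return sentence.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the term-frequency vectors of the two token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    # Size of the shared vocabulary divided by the size of the joint vocabulary.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice_similarity(a, b):
    # Twice the shared vocabulary size divided by the sum of both vocabulary sizes.
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def combined_similarity(a, b, first, second, weight=0.5):
    # Hypothetical integration rule (NOT taken from the paper):
    # a weighted average of two single similarity measures.
    return weight * first(a, b) + (1 - weight) * second(a, b)


s1 = tokenize("the cat sat on the mat")
s2 = tokenize("the cat lay on the rug")

cos_score = cosine_similarity(s1, s2)
jac_score = jaccard_similarity(s1, s2)
dice_score = dice_similarity(s1, s2)
cos_jac_score = combined_similarity(s1, s2, cosine_similarity, jaccard_similarity)
```

In a sentence-extraction setting, scores like these could feed the coverage and diversity terms of the objective function, with coverage rewarding similarity between selected sentences and the document set and diversity penalizing similarity among the selected sentences themselves.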