A Secure Index for Document Similarity Detection

Abstract

Document similarity detection plays an essential role in many applications, such as plagiarism detection, copyright protection, document management, and document searching. However, current methods do not address the privacy of document contents outsourced to remote servers, a limitation that restricts their use. For example, plagiarism detection between two conferences should protect the privacy of the submitted papers. In this paper, we consider the problem of privacy-preserving document similarity detection. The proposed scheme allows documents to be compared without disclosing them to untrusted servers. For each document, a fingerprint set is computed, and an inverted index is built over the entire fingerprint set. The index is protected with the Paillier cryptosystem before it is uploaded to the untrusted server. We have developed a secure yet efficient method to rank the retrieved documents. Several experiments are conducted to investigate the performance of the proposed scheme.
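
As a rough illustration of the fingerprint and inverted-index steps outlined above, the following Python sketch may help. It assumes word 3-gram shingles hashed with truncated SHA-256 as fingerprints and ranks documents by the number of fingerprints they share with a query. The function names (fingerprints, build_index, rank), the fingerprinting method, and the ranking score are illustrative assumptions rather than the scheme of this paper, and the Paillier encryption step is omitted entirely, so the index here is in plaintext.

```python
import hashlib
from collections import defaultdict


def fingerprints(text, k=3):
    """Return the set of hashed word k-grams of a document (assumed fingerprint method)."""
    words = text.lower().split()
    # Slide a window of k words; a document shorter than k words yields one gram.
    grams = [" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))]
    # Truncate SHA-256 to 8 hex digits: a toy fingerprint size, chosen for readability.
    return {hashlib.sha256(g.encode("utf-8")).hexdigest()[:8] for g in grams}


def build_index(docs, k=3):
    """Build a plaintext inverted index: fingerprint -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for fp in fingerprints(text, k):
            index[fp].add(doc_id)
    return index


def rank(query, index, k=3):
    """Rank indexed documents by the count of fingerprints shared with the query."""
    scores = defaultdict(int)
    for fp in fingerprints(query, k):
        for doc_id in index.get(fp, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "a quick brown fox leaps over a lazy cat",
        "c": "completely unrelated text about encryption",
    }
    print(rank("quick brown fox jumps over", build_index(docs)))
```

In a privacy-preserving setting, the index entries and scores would have to be protected before leaving the data owner; this sketch only shows the plaintext data flow that such protection would wrap.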