Convergent encryption

From Wikipedia, the free encyclopedia

Convergent encryption, also known as content hash keying, is a cryptosystem that produces identical ciphertext from identical plaintext files. In cloud computing, this property lets a storage provider remove duplicate files from storage without having access to the encryption keys.[1] The combination of deduplication and convergent encryption was described in a backup system patent filed by Stac Electronics in 1995.[2] This combination has been used by Farsite,[3] Permabit,[4] Freenet, MojoNation, GNUnet, flud, and the Tahoe Least-Authority Filesystem.[5]

The system gained additional visibility in 2011 when cloud storage provider Bitcasa announced that it was using convergent encryption to enable deduplication of data in its cloud storage service.[6]


A convergent encryption system processes a file in three steps:

  1. The system computes the cryptographic hash of the plaintext in question.
  2. The system then encrypts the plaintext, using its hash as the key.
  3. Finally, the hash itself is stored, encrypted with a key chosen by the user.
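
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scheme used by any system named in this article: SHA-256 is assumed as the hash, and the helper `keystream_xor` is a toy stream cipher (an XOR with a SHAKE-256 keystream) that stands in for a real cipher such as AES so that the example needs only the standard library. The names `keystream_xor`, `convergent_encrypt`, and `convergent_decrypt` are hypothetical.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher (assumption, not a vetted cipher).

    XORs data with a SHAKE-256 keystream derived from key. Applying it
    twice with the same key returns the original data.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def convergent_encrypt(plaintext: bytes, user_key: bytes):
    # Step 1: hash the plaintext (SHA-256 is an assumed choice).
    content_key = hashlib.sha256(plaintext).digest()
    # Step 2: encrypt the plaintext using its own hash as the key.
    ciphertext = keystream_xor(content_key, plaintext)
    # Step 3: store the hash encrypted under a key chosen by the user.
    wrapped_key = keystream_xor(user_key, content_key)
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    # Recover the content key with the user's key, then decrypt.
    content_key = keystream_xor(user_key, wrapped_key)
    plaintext = keystream_xor(content_key, ciphertext)
    # The content key doubles as an integrity check on the result.
    if hashlib.sha256(plaintext).digest() != content_key:
        raise ValueError("decrypted data does not match its content key")
    return plaintext


# Two users encrypt the same file under different user keys.
alice_ct, alice_wrapped = convergent_encrypt(b"same file contents", b"alice secret key")
bob_ct, bob_wrapped = convergent_encrypt(b"same file contents", b"bob secret key")
```

In this sketch both users obtain the same ciphertext, which is what lets a storage provider keep a single copy, while each user's wrapped key is different and can only be unwrapped with that user's own key.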

Known Attacks

Convergent encryption is open to a "confirmation of a file attack", in which an attacker confirms whether a target possesses a certain file by encrypting an unencrypted (plaintext) version and comparing the output with the files possessed by the target.[7] This attack is a problem only for a user storing information that is not unique, i.e. that is either publicly available or already held by the adversary, for example banned books or files that infringe copyright. It can be argued that a confirmation of a file attack is rendered ineffective by adding a unique piece of data, such as a few random characters, to the plaintext before encryption; the uploaded file then becomes unique and therefore produces a unique encrypted file. However, some implementations of convergent encryption break the plaintext into blocks based on file content and convergently encrypt each block independently; these may inadvertently defeat attempts to make the file unique by adding bytes at the beginning or end.[8]
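
The attack and the countermeasure described above can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the cited sources: SHA-256 as the hash, a toy XOR cipher with a SHAKE-256 keystream in place of a real cipher such as AES, and fixed-size blocks as a simplification of the content-based blocks mentioned above. All function and variable names are hypothetical.

```python
import hashlib
import os


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Encrypt data under a key derived from its own SHA-256 hash.

    The cipher is a toy XOR with a SHAKE-256 keystream (assumption),
    standing in for a real cipher such as AES.
    """
    key = hashlib.sha256(plaintext).digest()
    stream = hashlib.shake_256(key).digest(len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def confirms_file(candidate: bytes, stored: set) -> bool:
    """Attacker's check: does the store hold the encryption of candidate?"""
    return convergent_encrypt(candidate) in stored


def encrypt_blocks(plaintext: bytes, block_size: int = 8) -> list:
    """Convergently encrypt each fixed-size block independently."""
    return [convergent_encrypt(plaintext[i:i + block_size])
            for i in range(0, len(plaintext), block_size)]


known_file = b"text of a publicly available book"  # 33 bytes
salted_file = known_file + os.urandom(16)          # random bytes appended

# Whole-file encryption: the attack works on the plain upload only.
plain_store = {convergent_encrypt(known_file)}
salted_store = {convergent_encrypt(salted_file)}
attack_succeeds = confirms_file(known_file, plain_store)
attack_after_salting = confirms_file(known_file, salted_store)

# Block-wise encryption: the four full leading blocks are unaffected by
# the appended bytes, so they still match the attacker's own blocks.
leading_blocks_match = (
    encrypt_blocks(known_file)[:4] == encrypt_blocks(salted_file)[:4]
)
```

Here `attack_succeeds` is true and `attack_after_salting` is false, while `leading_blocks_match` is true: with independent per-block encryption, most of the file remains recognisable despite the appended random bytes.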

There is also a "learn the remaining information attack", described by Drew Perttula in 2008.[9] This type of attack applies to the encryption of documents that differ only slightly from a public document, such as marginal revisions of a public document or a filled-in public form. For example, if the defender encrypts a bank form that includes a ten-digit bank account number, an attacker who knows the generic format of the form can recover the account number by producing the form for every possible account number, encrypting each one, and comparing the results with the defender's encrypted file.
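
The bank form example can be sketched in Python as a brute-force search. This is an illustration under assumptions, not Perttula's own code: the form template, the function names, SHA-256 as the hash, and the toy XOR cipher with a SHAKE-256 keystream are all hypothetical choices. The account number is shortened to four digits so the search finishes quickly; a ten-digit number means 10^10 candidates rather than 10^4.

```python
import hashlib


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Encrypt data under a key derived from its own SHA-256 hash.

    The cipher is a toy XOR with a SHAKE-256 keystream (assumption),
    standing in for a real cipher such as AES.
    """
    key = hashlib.sha256(plaintext).digest()
    stream = hashlib.shake_256(key).digest(len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


# Hypothetical public form: everything except the account number is
# known to the attacker.
FORM_TEMPLATE = "Example Bank transfer form\nAccount number: {account}\n"


def fill_form(account: str) -> bytes:
    return FORM_TEMPLATE.format(account=account).encode("ascii")


def recover_account(target_ciphertext: bytes, digits: int = 4):
    """Try every possible account number until the ciphertexts match."""
    for n in range(10 ** digits):
        account = str(n).zfill(digits)
        if convergent_encrypt(fill_form(account)) == target_ciphertext:
            return account
    return None


# The defender stores an encrypted form; the attacker sees only this.
victim_ciphertext = convergent_encrypt(fill_form("0427"))
recovered = recover_account(victim_ciphertext)
```

The search returns the defender's account number because each guess can be checked offline against the stored ciphertext; the only secret in the file is the small part that differs from the public form.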

References


  1. ^ Secure Data Deduplication, Mark W. Storer, Kevin Greenan, Darrell D. E. Long, Ethan L. Miller
  2. ^ System for backing up files from disk volumes on multiple nodes of a computer network, US Patent 5,778,395 filed October 1995
  3. ^ Reclaiming Space from Duplicate Files in a Serverless Distributed File System, MSR-TR-2002-30
  4. ^ Data repository and method for promoting network storage of data, US Patent 7,412,462 provisionally filed Feb 2000
  5. ^ Drew Perttula and Attacks on Convergent Encryption
  6. ^ Finally! Bitcasa CEO Explains How The Encryption Works, September 18th, 2011
  7. ^ [1] (2008-08-20). Retrieved on 2013-09-05.
  8. ^ Storer, Greenan, Long & Miller: "Secure Data Deduplication", University of California at Santa Cruz (2008-10-31). Retrieved on 2013-09-05.
  9. ^ [2] (2008-08-20). Retrieved on 2013-09-05.