Security Laboratory

Security Laboratory

Security Laboratory: Cryptography in Business Series

We are grouping papers in this series to focus on the many facets of data encryption.

Other Related Articles in Security Laboratory: Cryptography in Business Series


Hash Functions


By Stephen Northcutt, Google+

There are three types of cryptography algorithms: secret key, public key, and hash functions. Unlike secret key and public key algorithms, hash functions, also called message digests or one-way encryption, have no key. Instead, a fixed-length hash value is computed based on the plaintext that makes it impossible for either the contents or length of the plaintext to be recovered.

The primary application of hash functions in cryptography is message integrity. The hash value provides a digital fingerprint of a message's contents, which ensures that the message has not been altered by an intruder, virus, or by other means. Hash algorithms are effective because of the extremely low probability that two different plaintext messages will yield the same hash value.

There are several well-known hash functions in use today:

  • Hashed Message Authentication Code (HMAC): Combines authentication via a shared secret with hashing.
  • Message Digest 2 (MD2): Byte-oriented, produces a 128-bit hash value from an arbitrary-length message, designed for smart cards.
  • MD4: Similar to MD2, designed specifically for fast processing in software.
  • MD5: Similar to MD4 but slower because the data is manipulated more. Developed after potential weaknesses were reported in MD4.
  • Secure Hash Algorithm (SHA): Modeled after MD4 and proposed by NIST for the Secure Hash Standard (SHS), produces a 160-bit hash value.
What do they all have in common? Well according to NIST:
"Serious attacks have been reported in recent years against cryptographic hash algorithms, including SHA-1, and because SHA-1 and the SHA-2 family share a similar design, NIST has decided to standardize an additional hash algorithm to augment the ones currently specified in FIPS 180-2. NIST issued a Call for a New Cryptographic Hash Algorithm (SHA-3) Family in a Federal Register Notice on Nov. 2, 2007. The announcement specifies the submission requirements, the minimum acceptability requirements, and the evaluation criteria for candidate hash algorithms. Entries for the competition must be received by Oct. 31, 2008."[1]

In search of the perfect hash
Such a hash algorithm has a few basic properties. The algorithm converts a message of any size into a fixed-size digital string. The length of the result is called the "hash length" or "L" once we start referring to it in an equation. According to RFC 4270 "Finding a pair of messages M1 and M2 that have the same hash value takes 2^(L/2) attempts. For any reasonable hash length, this is an impossible problem to solve (collision free). Also, given a message M1, finding any other message M2 that has the same hash value as M1 takes 2^L attempts. This is an even harder problem to solve (one way)."[2]

"A "collision attack" allows an attacker to find two messages M1 and M2 that have the same hash value in fewer than 2^(L/2) attempts."[2] Hopefully, the winner of the competition will be resistant to collision attacks

An example use of a hash in a modern cryptosystem
Today's cryptosystems are built with the foundations that were discussed in the previous section, including symmetric keys, asymmetric keys and Hash functions. We have described a number of cryptography algorithms that are employed for different applications that enable secure communications. In today's environment, computers come in many varieties from desktop systems to mobile communications devices to home appliances. The Internet, although it provides global communication, is the ultimate nonsecure communications medium.

So how are these types of cryptosystems deployed in the real world? In this section, we will examine Pretty Good Privacy (PGP), the Secure Sockets Layer (SSL), and Kerberos. These public key systems are arguably the de facto standards worldwide in their respective niches. SSL is built in to virtually every Web browser, and PGP is widely used to encrypt or digitally sign documents and e-mail. Kerberos is now the authentication used by Microsoft operating systems. Kerberos is a single sign-on system for client/server authentication, which was invented at MIT. The university has deployed it in its high-risk environment for more than 15 years.

In today's crypto products, what appears to the user as a single system actually comprises multiple algorithms used in conjunction to form a hybrid cryptosystem. Multiple algorithms are employed because each is optimized for a specific purpose.

For example, Alice wants to send a message to Bob. The message needs to be private, the message integrity verified, and Alice's identity confirmed. Alice knows several things, including the message, her own private key, and Bob's public key. Alice starts by passing the message through a hash function to obtain a hash value. She encrypts the hash value with her private key using an asymmetric algorithm. This forms the digital signature.

Alice also creates a random session key for use by the symmetric encryption, which is used to encrypt the message. The secret key is encrypted with Bob's public key using asymmetric encryption. The encrypted message and encrypted session key form a digital envelope. The digital envelope and digital signature are sent to Bob.

Bob obtains the symmetric session key by decrypting it with his private key using asymmetric encryption. The session key is then used to decrypt the message. The decrypted message is run through the hash function, and the value is compared to the digital signature's hash value that was decrypted with Alice's public key.

At this point, Bob knows:
  • The contents of the private message (symmetric encryption).
  • That the message was intended for him (because he was able to obtain the secret key).
  • That the message was not altered (because his hash value matched Alice's hash value).
  • That the message was sent by Alice (because he was able to recover the hash value using Alice's public key).
But why do we need all of these crypto algorithms? Why not just use asymmetric encryption for everything? The answer is processing speed: symmetric encryption is about 1000 times faster than asymmetric encryption for bulk encryption. Diffie-Hellman and RSA were originally seen by their inventors as a way to encrypt and decrypt information using a split key, thereby eliminating the key exchange problem of asymmetric encryption. In the mid-1980s, Lotus Notes designer Ray Ozzie and PGP developer Phil Zimmermann independently observed that asymmetric encryption was much slower than symmetric encryption and using asymmetric encryption for large volumes of data would be infeasible. They designed their software to use symmetric encryption for encryption of data and asymmetric encryption for key exchange. Other algorithms were added, such as hash values for integrity and signed hash values for authenticating the sender.

Applications for hashes
We can use a hash any time we want to prove message integrity. Hash values have been important in incident response for a long time. They can be used to put a "tamper proof seal" on digital evidence as it is collected. For instance, many incident responders prefer Polaroid cameras since digital photos can be easily altered. However, digital cameras are much more convenient, so best practice is to make a hash of the digital photo as soon as possible to reduce the time window one could claim the photo was altered. Some cameras such as Nikon D200 and beyond have the ability to "authenticate" the images they shoot; this, of course, is done with a hash.


1. http://www.ieee-security.org/Calendar/cfps/cfp-NIST-SHA3.html
2. http://www.faqs.org/rfcs/rfc4270.html