This is an attempt to understand the basics of cryptography. The very basics 🙂
Beware: this is a link-heavy post. Like many others, it is meant to serve me as a reference.
It started with the recent iCloud privacy problems, then the article about hashing of secrets intrigued me a bit and made me curious to read more about this field. So here it is.
Hashing vs Encrypting vs Encoding
Hashing – irreversible; used to check integrity of data, to irreversibly encode data (passwords) and also to sign data (in conjunction with HMAC).
Encrypting – reversible; used for maintaining data confidentiality
Encoding – reversible, for usability (e.g. Base64Encode)
Update 16/Dec/2014: There is a small debate whether applying ROT13 to a string is considered encryption or not. ROT13 is a very simple substitution cipher (one of the 26 possible alphabet shifts, also known as Caesar ciphers) – it substitutes each letter with the one placed 13 positions further in the alphabet.
I would say that ROT13 is a form of encryption; true, a very very weak one. But it has an algorithm (substitution of letters) and a key (13 positions). So in theory it encodes a message so that only authorized parties can read it. In practice, almost anyone with a basic motivation can read it.
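To make the distinction concrete, here is a minimal Python sketch (Python is my choice for the examples, the post itself only quotes PHP and ColdFusion signatures). It assumes a sample message and a hypothetical helper named caesar_shift. It shows that Base64 needs no key at all, that ROT13 is a shift with the key fixed at 13, and that trying every possible key is trivial.

```python
import base64
import codecs
import string

message = "Attack at dawn"  # sample text, chosen only for illustration

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(message.encode("utf-8"))
decoded = base64.b64decode(encoded).decode("utf-8")

# ROT13 via the standard library: applying it twice restores the text.
rot13_once = codecs.encode(message, "rot_13")
rot13_twice = codecs.encode(rot13_once, "rot_13")


def caesar_shift(text, key):
    """Shift every ASCII letter `key` positions forward (hypothetical helper)."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[key:] + lower[:key] + upper[key:] + upper[:key],
    )
    return text.translate(table)


# "Breaking" the cipher: just try all 26 shifts and read the results.
candidates = [caesar_shift(rot13_once, key) for key in range(26)]

print(rot13_once)             # Nggnpx ng qnja
print(rot13_twice == message) # True
print(message in candidates)  # True
```

So ROT13 does have an algorithm and a key, but the key space is so small that the key protects nothing in practice.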
Hashing vs HMAC vs KDF
1. Hashing algorithms
A hashing algorithm converts a variable-length string to a fixed-length string that can act as a “fingerprint” or practically unique identifier for the original string (collisions exist in theory, but a good algorithm makes them infeasible to find). It is not feasible to convert the hash result back to the source string.
Hash(string [, algorithm [, encoding ]])
string hash ( string $algo, string $data [, bool $raw_output = false ] )
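As an illustration of the fixed-length "fingerprint" idea, here is a small Python sketch using the standard hashlib module. The choice of SHA-256 and the sample inputs are assumptions made for the example, not something the signatures above prescribe.

```python
import hashlib

# Inputs of very different lengths produce digests of the same length.
short_digest = hashlib.sha256(b"abc").hexdigest()
long_digest = hashlib.sha256(b"abc" * 10_000).hexdigest()

# Changing a single character produces a completely different digest.
changed_digest = hashlib.sha256(b"abd").hexdigest()

print(len(short_digest), len(long_digest))  # 64 64
print(short_digest)
print(changed_digest)
print(short_digest == changed_digest)       # False
```

There is no function to go back from the digest to the input. The only generic way to "reverse" a hash is to guess inputs and hash them.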
2. HMAC (Hash-Based Message Authentication Codes)
HMAC is used to verify the data integrity and authenticity of a message transmitted. It involves a cryptographic hash function in combination with a secret key.
According to the official specifications, HMAC is defined as:
H(K XOR opad, H(K XOR ipad, text))
H is a cryptographic hash function where data is hashed by iterating a basic compression function on blocks of data
B is the byte-length of such blocks (B=64 for MD5, SHA-1)
L is the byte-length of hash outputs (L=16 for MD5, L=20 for SHA-1)
K is the authentication key and can be of any length up to B, the block length of the hash function.
Applications that use keys longer than B bytes will first hash the key using H and then use the resultant L byte string as the actual key to HMAC. In any case the minimal recommended length for K is L bytes (as the hash output length). »» this is an interesting fact leading to potential problems, but it does not make PBKDF2-HMAC-SHA1 insecure
ipad, opad (inner/outer pad) are two fixed and different strings defined as
ipad = the byte 0x36 repeated B times
opad = the byte 0x5C repeated B times.
0x5C? “Their values have been arbitrarily chosen by the HMAC designers, and any pair (opad,ipad) could have been selected, as long as opad≠ipad.”
string hash_hmac ( string $algo , string $data , string $key [, bool $raw_output = false ] )
In ColdFusion, the hmac() function is available starting with ColdFusion 10, while in the Open Source world Railo introduced it with version 4 (see cfml.io)
hmac(object message,object key,[string algorithm,[string encoding]]):string
Custom implementations of the function: here, here and here
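The definition above can be followed step by step. This is a minimal Python sketch, assuming that H(a, b) means hashing the concatenation of a and b, and that keys shorter than B are padded with zero bytes (both as described in the HMAC specification). The name hmac_manual is hypothetical. In real code the standard hmac module should be used, and the sketch only checks itself against it.

```python
import hashlib
import hmac


def hmac_manual(key: bytes, text: bytes, hash_name: str = "sha1") -> bytes:
    """Compute H(K XOR opad, H(K XOR ipad, text)) by hand (illustration only)."""
    block_size = hashlib.new(hash_name).block_size  # B, 64 for MD5 and SHA-1

    # Keys longer than B bytes are hashed first, giving an L-byte key.
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()

    # Shorter keys are padded with zero bytes up to B bytes.
    key = key.ljust(block_size, b"\x00")

    inner_key = bytes(b ^ 0x36 for b in key)  # K XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K XOR opad

    inner = hashlib.new(hash_name, inner_key + text).digest()
    return hashlib.new(hash_name, outer_key + inner).digest()


key = b"secret-key"             # sample values for the demonstration
message = b"message to protect"

expected = hmac.new(key, message, "sha1").digest()
print(hmac.compare_digest(hmac_manual(key, message), expected))  # True
```

The receiver recomputes the HMAC with the shared key and compares it with the received one, ideally with a constant-time comparison such as hmac.compare_digest.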
3. Password-based Key Derivation Function (PBKDF)
A key derivation function (or KDF) derives the encryption key from a master password. Specifications
PBKDF2 applies HMAC to the input password along with a salt value and repeats the process many times to produce a derived key, which can then be used as a cryptographic key in subsequent operations. The added computational work makes password cracking much more difficult, and is known as key stretching. When the standard was written in 2000, the recommended minimum number of iterations was 1000, but the parameter is intended to be increased over time as CPU speeds increase.
Having a salt added to the password reduces the ability to use precomputed hashes (rainbow tables) for attacks, and means that multiple passwords have to be tested individually, not all at once. The standard recommends a salt length of at least 64 bits.
In ColdFusion the PBKDF support was introduced very recently (April 2014) – with ColdFusion 11:
GeneratePBKDFKey(algorithm, inputString, salt, iterations, keysize) (algorithm can be ‘PBKDF2WithHmacSHA1’)
Same story with PHP, which supports PBKDF2 only starting with version 5.5.0:
string hash_pbkdf2 ( string $algo , string $password , string $salt , int $iterations [, int $length = 0 [, bool $raw_output = false ]] )
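For comparison, Python exposes the same function as hashlib.pbkdf2_hmac. The sketch below assumes a 16-byte random salt, 100,000 iterations and a 32-byte key. These numbers are illustrative choices of mine, not recommendations taken from the standard, and derive_key is a hypothetical wrapper name.

```python
import hashlib
import os


def derive_key(password: bytes, salt: bytes,
               iterations: int = 100_000, length: int = 32) -> bytes:
    """Derive `length` bytes from a password using PBKDF2 with HMAC-SHA1."""
    return hashlib.pbkdf2_hmac("sha1", password, salt, iterations, dklen=length)


password = b"correct horse battery staple"  # sample password
salt = os.urandom(16)                       # 128-bit random salt

key = derive_key(password, salt)
same_key = derive_key(password, salt)
other_key = derive_key(password, os.urandom(16))

print(len(key))          # 32
print(key == same_key)   # True: same password, salt and iteration count
print(key == other_key)  # False: a different salt gives a different key
```

The salt and the iteration count are not secret. They have to be stored next to the result so the same key can be derived again later.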
1. Password hashing
By ‘password hashing’ I don’t mean ‘apply a hash function to the password’ (this would be against the rule of not hashing secrets 🙂 ), but performing an irreversible set of operations on the password in order to obtain a unique output string.
The recommended way to go in this case is ‘always use libraries’ (PHP example: phpass). They basically do the following:
– generate a random and unique salt
– hash the password + salt
– then the output is strengthened using a key strengthening function (bcrypt, PBKDF2, etc).
At the end of the process, we have a password hash and a salt. They both need to be stored (along with the algorithms used) and in some cases – like Phpass – the salt is a part of the password hash:
Phpass actually uses a salt to hash the password (and it is also a slower algorithm, which is good); you just don’t need to store it in a separate column because the salt is included within the output string. Generating multiple hashes of the very same password resulted in different strings:
You can notice their start is the same, that’s because first few bytes ($2a) identify used algorithm, next few bytes ($08) identify used options for that algorithm, roughly half of the rest is used salt and the rest is password hash. So, the salt is included within the string, it is just not so obvious to human.
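The idea of packing the algorithm, its options, the salt and the hash into one string can be sketched in Python. This is not phpass and not bcrypt: it swaps in PBKDF2-HMAC-SHA256 from the standard library, and the "$"-separated layout, the iteration count and the function names are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import os

SCHEME = "pbkdf2-sha256"  # hypothetical scheme label
ITERATIONS = 100_000      # illustrative work factor


def hash_password(password: str) -> str:
    """Return '$scheme$iterations$salt$hash' with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return "$".join([
        "", SCHEME, str(ITERATIONS),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash using the salt and options embedded in `stored`."""
    _, scheme, iterations, salt_b64, digest_b64 = stored.split("$")
    if scheme != SCHEME:
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(actual, expected)


first = hash_password("hunter2")
second = hash_password("hunter2")

print(first == second)                    # False: different salts
print(verify_password("hunter2", first))  # True
print(verify_password("hunter3", first))  # False
```

Both strings start with the same scheme and iteration count and then diverge, which matches the behaviour described above. For production use, a maintained library is still the safer option.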
Note 1: Hashing, salting and key derivation are not intended to protect against a brute-force attack aimed at logging in to the system. For this use case, other strategies can be used (2-factor authentication, automatically locking the account after several failed login attempts, introducing an artificial delay after posting the login data, etc…). Hashing+salting is intended to prevent malicious users from getting the clear-text password in the event that the user database is compromised (that is – password hash + salt were stolen).
Note 2: The problem with using hashing alone (without salting + key derivation) is not that the hash can be reversed. In theory it could be, but that would take so many resources that it’s actually easier to brute-force the hash by hashing candidate passwords and comparing the results, especially if the input dictionary is more or less known.
2. Confidential data encryption
Example: storing an encrypted file in the cloud. This operation must be reversible, so we don’t need hashing, but encryption.
It works like this:
– apply an HMAC-based key derivation function on the file password => get a private encryption key (basically this is the first use case above)
– encrypt the confidential file using the encryption key from the step above
– store the encrypted file in the cloud
At this point, we have a public encrypted file, and a private encryption key.
To decrypt the file you will need the encryption key. As we’ve seen above, recovering that key without knowing the password would be very difficult, considering the process of generating it (salt, HMAC, key derivation). For additional security, use a sufficiently long, randomly generated salt and a high iteration count when deriving the key from the password (link).
3. Downloaded file integrity verification
Probably the only place where you can still use md5/sha1 for considerations of speed + non-confidential information.
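A minimal Python sketch of such a check, assuming the publisher lists a hex checksum next to the download. The function names, the default algorithm and the chunk size are my own choices for the example.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Hash a file in chunks so large downloads do not have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum(path, expected_hex, algorithm="sha1"):
    """Compare the computed digest with the published one, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A call would look like matches_checksum("download.iso", published_sha1), where both the file name and the published value are placeholders. This detects a corrupted download. It does not help against an attacker who can replace both the file and the checksum shown next to it.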
Some real life examples
1. Linkedin – or what happens when you do a
More than 6 million LinkedIn passwords stolen
The 6.5 million leaked passwords were posted Monday on a Russian online forum, camouflaged with a common cryptographic code called SHA-1 hash. It’s a format that’s considered weak if added precautions aren’t taken.
2. Reddit …but storing sha1(password) is still better than storing just the password (that is, if you have a password resistant to brute-force and rainbow-table attacks):
Recently, the folks behind Reddit.com confessed that a backup copy of their database had been stolen. Later, spez, one of the Reddit developers, confirmed that the database contained password information for Reddit’s users, and that the information was stored as plain, unprotected text. In other words, once the thief had the database, he had everyone’s passwords as well.
Never store passwords in a database!
Related link: You’re Probably Storing Passwords Incorrectly
3. LastPass explains how they are using PBKDF2 + SHA256 to turn the master password into an encryption key:
LastPass has opted to use SHA-256, a slower hashing algorithm that provides more protection against brute-force attacks. LastPass utilizes the PBKDF2 function implemented with SHA-256 to turn your master password into your encryption key. LastPass performs x number of rounds of the function to create the encryption key, before a single additional round of PBKDF2 is done to create your login hash.
LastPass User Manual: Password Iterations (PBKDF2)
4. Finally, a very comprehensive and well appreciated StackOverflow answer: http://stackoverflow.com/questions/4948322/fundamental-difference-between-hashing-and-encryption-algorithms
Written by Dorin Moise