Cryptographic hash algorithms

What are hash algorithms?

A cryptographic hash function is an algorithm (a sequence of mathematical and cryptographic operations) that takes an arbitrary amount of data and produces a fixed-size output called a hash. Hash functions are usually used to secure the transfer of information between two computer systems. They mainly serve to confirm certain properties of a message, such as its integrity, its origin, or its destination, or simply to tell two similar items apart without revealing their content.

Example of a hash function

md5(a) = 0cc175b9c0f1b6a831c399e269772661

md5(b) = 92eb5ffee6ae2fec3ad71c777531578f
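The two values above can be reproduced with a few lines of code. This is a minimal sketch using Python's standard hashlib module; the helper name md5_hex and the choice of UTF-8 encoding are assumptions made for the illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` (encoded as UTF-8) in hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

print(md5_hex("a"))  # 0cc175b9c0f1b6a831c399e269772661
print(md5_hex("b"))  # 92eb5ffee6ae2fec3ad71c777531578f
```

Whatever the length of the input, the output always has 32 hexadecimal characters (128 bits).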

You can check that the first requirement, determinism, is respected by computing these hashes yourself on any website that offers a hash calculator. All implementations of a given function, whether MD5 as above or SHA-256, are identical, so whichever website or programming language you use, you should get the same result as I did. Be careful with characters such as the ligature œ, which is one single character and not an "o" followed by an "e". You will also observe that the efficiency requirement is respected, since the result is instant.
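These checks can also be done locally. The sketch below uses Python's hashlib with SHA-256; the sample word "cœur" and the helper name sha256_hex are hypothetical choices made for the illustration, and the timing printed at the end will vary from one machine to another.

```python
import hashlib
import time

def sha256_hex(text: str) -> str:
    """SHA-256 of the UTF-8 bytes of `text`, as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical sample words: the first uses the single character "œ"
# (U+0153), the second spells it with two letters, "o" then "e".
with_ligature = "c\u0153ur"
two_letters = "coeur"

# Determinism: hashing the same input twice gives the same digest.
print(sha256_hex(with_ligature) == sha256_hex(with_ligature))  # True

# Different characters mean different bytes, hence a different digest.
print(sha256_hex(with_ligature) == sha256_hex(two_letters))    # False

# Efficiency: time the hashing of one megabyte of data.
payload = b"x" * 1_000_000
start = time.perf_counter()
hashlib.sha256(payload).hexdigest()
print(f"1 MB hashed in {time.perf_counter() - start:.4f} s")
```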

Finally, you will notice that the hashes of two close words are easy to tell apart. Of course, checking the security requirements of attack resistance and result uniqueness is not easy. Doing so requires very hard mathematics, and sometimes years of work for a small amount of progress. This is not a task for everyone, and we have to trust cryptography specialists. As for SHA-256, the function seems secure so far. But several of its predecessors, such as MD5 and SHA-1, were broken by exceptional mathematicians and cryptographers, so we need to keep in mind that this may happen to SHA-256 too.
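How far apart the hashes of two close inputs are can be measured by counting the bits that differ. The sketch below does this for SHA-256 with Python's hashlib; the function name differing_bits is an assumption for the illustration. For a good hash function, about half of the 256 output bits are expected to change.

```python
import hashlib

def differing_bits(first: str, second: str) -> int:
    """Count the bits that differ between the SHA-256 digests of two strings."""
    digest_a = hashlib.sha256(first.encode("utf-8")).digest()
    digest_b = hashlib.sha256(second.encode("utf-8")).digest()
    # XOR leaves a 1 wherever the two digests disagree; count those 1s.
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

print(differing_bits("a", "b"), "of 256 bits differ")
print(differing_bits("a", "a"), "of 256 bits differ")  # 0 of 256 bits differ
```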

Therefore, cryptographers regularly propose alternative functions, tuned to improve particular points such as security or speed. SHA-3 (also called Keccak) and Whirlpool are examples, among others, of newer functions designed to explore new ways of working in search of better results.
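As an illustration, Python's hashlib exposes both SHA-256 and SHA-3 (since Python 3.6), so the two can be compared on the same input. This is only a sketch: the message is arbitrary, and whether Whirlpool is available depends on the underlying OpenSSL build, so the code merely reports it.

```python
import hashlib

message = b"hello"  # arbitrary sample input (assumption)

sha2_digest = hashlib.sha256(message).hexdigest()
sha3_digest = hashlib.sha3_256(message).hexdigest()

# Same input, same output size, but unrelated results: the two functions
# are built on different internal constructions.
print("SHA-256 :", sha2_digest)
print("SHA3-256:", sha3_digest)

# Whirlpool is not guaranteed by hashlib; it may or may not be listed here.
print("Whirlpool available:", "whirlpool" in hashlib.algorithms_available)
```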

How do they work?

It is both very simple and very complicated.

On the mathematical side, everything depends on which hash function is used: there are dozens of them, some more secure than others. Each function is built from a different combination of operations: substitution, permutation, addition of content, difference of content… As a consequence, each function produces different results.
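To give an idea of how such operations can be combined, here is a deliberately simplified toy function written in Python. It is not a real or secure hash function: the constants, the order of the steps, and the names toy_hash and rotate_left are all assumptions chosen for the illustration.

```python
MASK = 0xFFFFFFFF  # keep the state on 32 bits, so the output size is fixed

def rotate_left(value: int, shift: int) -> int:
    """Rotate a 32-bit value: the bits are permuted, none is lost."""
    return ((value << shift) | (value >> (32 - shift))) & MASK

def toy_hash(data: bytes) -> str:
    """Toy 32-bit hash for illustration only. It is NOT secure."""
    state = 0x9E3779B9  # arbitrary starting constant (assumption)
    for byte in data:
        state = (state + byte) & MASK        # addition of content
        state ^= state >> 15                 # XOR: a bitwise "difference"
        state = rotate_left(state, 7)        # permutation of bit positions
        state = (state * 0x01000193) & MASK  # multiplication spreads changes
    return f"{state:08x}"

print(toy_hash(b"a"))
print(toy_hash(b"b"))
```

Real functions such as SHA-256 repeat far more elaborate steps over many rounds, which is what makes them hard to attack.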

To keep it simple, the mathematical side of a hash function is not very different from ancient ciphers or the Enigma code: to understand what the result means or does, we have to know the procedure that was used.

What hash functions have in common, however, is that they are used in a computing context. Another difference from those earlier cryptographic schemes is that the objective is not to recover the initial message from the result.

To fulfil this role, hash functions must follow some rules:

  • Determinism: the result of the function must never vary. Whenever the function is used, by whomever, and however many times, the same initial message must always give the same result.

  • Efficiency: the hash function must return its result almost instantly; otherwise it would slow down every computer system that depends on it.

  • Attack resistance: two messages that are close to each other must produce two clearly distinct hashes.

  • Result uniqueness: it must be practically impossible to find two different inputs that produce the same result.
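The last rule depends heavily on the size of the output. The sketch below, written in Python, deliberately weakens SHA-256 by keeping only the first 4 hexadecimal characters (16 bits) of the hash, and then searches for two different inputs that share this shortened value. The function names and the "message-N" inputs are assumptions for the illustration.

```python
import hashlib

def short_hash(text: str, hex_digits: int = 4) -> str:
    """First `hex_digits` hex characters of SHA-256 (deliberately weakened)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:hex_digits]

def find_collision(hex_digits: int = 4):
    """Try successive inputs until two of them share the same short hash."""
    seen = {}  # short hash -> first input that produced it
    counter = 0
    while True:
        candidate = f"message-{counter}"
        digest = short_hash(candidate, hex_digits)
        if digest in seen:
            return seen[digest], candidate, digest
        seen[digest] = candidate
        counter += 1

first, second, digest = find_collision()
print(f"{first!r} and {second!r} share the short hash {digest}")
```

With only 16 bits there are 65,536 possible values, so a collision is found almost at once. With the full 256 bits the same search is far beyond practical reach, which is what the rule requires.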