MD5

MD5

MD5, meaning Message Digest 5, is the director successor of MD4. It was created in 1991 by Ronald Rivest by improving MD4’s architecture in order to counter the potential attacks that could be done onto it. The main problem with this function was the high probability of finding a collision, which was lowered with its successor MD5, but unfortunately did not suppress this risk.

MD5

History

MD5 is the successor of 2 other hash functions, MD2 and MD4, who were all created by Ronald Rivest. MD2 was created in 1989 and was in fact more like a signature algorithm. As it was destined to 8bits processors, it’s no longer used, and was replaced by MD4 and MD5. MD4 was created in 1990 but wasn’t used long, as its improved version, MD5 was released a year later. It was replaced so fast because it was rapidly found out that MD4 was not cryptographically safe, as collisions can be generated with only approximately (2^8) operations.

MD5 was used from 1991 until 2004 when it was found that it wasn’t cryptographically safe as a Chinese team found out a method to create collisions for the full MD5.

How does it work

MD5 is a hash function using a variable size message and creates a 128 bits print.

By the way it proceed, it will first divide the message in blocs with a 512 bits size, which are then filled with the same process, being : Putting a 1 at the end of the message; adding a sequence of zero depending on how much needs to be filled; writing the size of the initial message, coded on 64bits. This filling is always done, even if the message length can already be divided by 512. This padding method is commonly used in the functions of the MD family of course, but also in the SHA family. This is a necessary preparation of the message, before sending it into the main loop. Each bloc will then be subdivided in 16 words of 32 bits. It is worth noting that every bloc uses a bit organization convention called “Little Endian”, putting the lightest word at the beginning.

These words are then initialized with the constants of the function. The main algorithm then uses each 512-bit message block in turn to modify the state. The processing of a message block consists of four similar stages, called rounds; each round is composed of 16 similar operations based on a non-linear function F, modular addition, and left rotation. There are four possible functions, a different one being used each round (the words to be encripted are named A, B, C and D here):
F(B,C,D)=(B⋀C)⋁(¬B⋀D)
G(B,C,D)=(B∧D)∨(C∧¬D)
H(B,C,D)=B⨁C⨁D
I(B,C,D)= C⨁(B∨¬D)

Possible threats

As said earlier, MD5 was made to be more collision resistant than MD4, and it was. However, it wasn’t long before this weakness was exposed again. It was in 1993 that the first pseudo-collision on the compression function of MD5 was discovered, but a limited one. Unfortunately, in 1996 was discovered the first partial collision on MD5, which was enough for cryptographers to replace it with SHA-1 or RIPEMD-160.

Another weakness of MD5 is that its short size of only 128 bits makes it small enough for a birthday attack to be performed. It was theorized in 2005 by a Chinese team and proven in 2007 by a project called Hashclash, led by Marc Stevens as his master’s degree thesis. He managed to demonstrate construction of two X.509 certificates (security certificates for web pages) with different public keys and the same MD5 hash value. Later, in 2010, the same Chinese team as before managed to create the first full collision on MD5.

Finally, again due to its short size, rainbow table attacks can be performed too on the MD5 function. However, it can easily be countered, or at least made longer to perform, by adding a random salt to the MD5.

<![endif]–>

The cost of an attack

Doing an attack on a MD5 hash is actually pretty cheap, as it is an old hash function. The Chinese team that demonstrated it no longer being secure used a cluster of IBM p690, a server machine used here as a workstation, and managed to create a collision within an hour. These server machines had POWER4 processors, which have a calculation strength, or clock rate, of around 1.5GHz to 2.3GHz. That means that any recent processor, which have a clock rate of around 3.5GHz, could be used in order to create a collision.

Furthermore, a collision attack exists that can find collisions within seconds on a computer with a 2.6 GHz Pentium 4 processor (complexity of 224), which means a mere 10$ are needed to perform this attack, given you just need the CPU.

But there is more, as there is also a chosen-prefix collision attack that can produce a collision for two inputs with specified prefixes within seconds, using off-the-shelf computing hardware (complexity 239). The ability to find collisions has been greatly aided by the use of off-the-shelf GPUs. On an NVIDIA GeForce 8400GS graphics processor, 16–18 million hashes per second can be computed. An NVIDIA GeForce 8800 Ultra can calculate more than 200 million hashes per second. So this attack can be performed if you buy only the graphics processor by paying between 20 and 40$ depending on where you buy the GPU and which one you buy.

Estimating the time needed for the attack

If you want to perform an attack, for exemple the chose-prefix collision attack we just talked about, using the material listed, you might want to know how much time you would need to wait before you are sure to have found an actual collision. Let’s do a bit of math:

The complexity of this attack is around 239and our GPUs can calculate between 16 million and 200 million hashes per second. Let’s estimate for our first GPU.

We need 239/16.106=34359 seconds to calculate all the hashes needed, or around 9h30. Using the other GPU, it will take 239/200.106=2748 seconds, or around 45 minutes.

Should I use MD5?

It is absolutely not advised to use MD5 in any way related to security, as collisions can be found very easily with this function. However, you can still use it when doing checksums, as it is still very hard to modify a message without modifying the checksum it is associated to.

Even then, as of 2019 one quarter management content systems still used MD5 for password hashing.