MD5 algorithm
I'm trying to implement the MD5 algorithm. (Note: I'm not looking for an
existing MD5 library; this is an exercise). It compiles and returns
something that looks like a hash superficially, but that's clearly the
wrong result. Even if there were a Unicode/Ascii problem, the empty
string should return the correct result (since it is always padded to
one 1-bit and 511 0-bits), but it doesn't.
Also, several strings (say, "" and "The") return the same incorrect hash
- but by dumping the 128-bit hash in each iteration, I've found only
very little in common between the intermediate steps, only the final one
is completely identical.
This is the function, as implemented from the pseudocode on Wikipedia.
http://en.wikipedia.org/wiki/MD5#Pseudocode
--------------------------------------------------------
Constants:
int[4] s = {
0x67452301,
0xEFCDAB89,
0x98BADCFE,
0x10325476
};
int[64] shift= {
7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};
int[64] k;
for (int i=0;i<64;i++)
{
k[i]=(int)(Math.abs(Math.sin(i+1))*1073741824*4);
}
private static int[] processFin(int[] message)
{
int[] result=s.clone(); // initialize
// message is already padded, just split:
int blocks=message.length/16;
for (int i=0;i<blocks;i++)
{ // for each 16-int block
int[] state=result.clone(); // copy state
for (int j=0;j<64;j++)
{
int f,g,temp;
if (j<16)
{
f= state[1] & state[2] | ~state[1] & state[3];
g=j;
}
else if (j<32)
{
f=state[3] & state[1] | ~state[3] & state[2];
g=(5*j+1)%16;
}
else if (j<48)
{
f=state[1]^state[2]^state[3];
g=(3*j+5)%16;
}
else
{
f=state[2]^ (state[1] | ~state[3]);
g=(7*j)%16;
}
temp=state[3];
state[3]=state[2];
state[2]=state[1];
state[1]=(state[0]+f+k[j]+message[i*16+g]) << shift[j];
state[0]=temp;
}
for(int j=0;j<4;j++) result[i]+=state[i];
}
return result;
}
---------------------------------------------
The hex string returned for both "" and "The" is
3c00a101efcdab8998badcfe10325476.
I just can't see what's wrong with the code - all parts of the
pseudocode description seem to be implemented exactly as they should
be... unless the signed ints don't wrap modulo 2^32 as unsigned ones
should. But they do for all examples I could think of, and anyway, that
shouldn't cause those hashes to be exactly the same...
--cb