Welcome back for part two of this six-part series on building your own advanced File Integrity Monitor (FIM). In this post, we’ll be upgrading the MD5 hashing to SHA-256, as well as grabbing the Base64-encoded bytes of our files.

There isn’t a whole lot of work to do in this post, so I’d like to start off by writing a bit about MD5, and why it is not a good fit for our project.

Note: If you haven’t completed Part 1 of this series, check that out before continuing.

MD5: What is it?

MD5, short for Message-Digest Algorithm 5, is a hashing algorithm commonly used for checking data integrity. It shows up on download pages as a checksum, in databases for hashing passwords, and even in other FIMs.

MD5 has been around for almost three decades, so researchers have had plenty of time to poke and prod the algorithm, and they have found a number of weaknesses in it. These include collision attacks that keep getting faster as hardware improves, and a growing number of rainbow tables that allow quick matching of known hashes.

Because of these issues, MD5 is not recommended for use in secure systems, or in any application that relies on a hash to detect deliberate tampering.

Difference between MD5 and SHA-256

SHA-256 is part of the SHA-2 (Secure Hash Algorithm 2) family and is widely treated as the current standard for integrity hashing.

Check out the hashes generated for the string “Science Vikings”:
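If you want to generate both digests yourself, here is a minimal sketch using Python’s hashlib. Encoding the string as UTF-8 is an assumption on my part; a different encoding would produce different hashes.

```python
import hashlib

# Assumption: the string is encoded as UTF-8 before hashing.
text = "Science Vikings".encode("utf-8")

md5_hex = hashlib.md5(text).hexdigest()
sha256_hex = hashlib.sha256(text).hexdigest()

# Print each digest alongside its length in hexadecimal characters.
print("MD5    :", md5_hex, f"({len(md5_hex)} hex characters)")
print("SHA-256:", sha256_hex, f"({len(sha256_hex)} hex characters)")
```

Running it prints the two digests one above the other, which makes the difference in length easy to spot.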

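As you can see, the SHA-256 output is twice as long as the MD5 hash. This is because SHA-256 produces a 256-bit digest (64 hexadecimal characters), while MD5 produces a 128-bit digest (32 hexadecimal characters). The longer digest gives SHA-256 a much higher resistance to collisions, at the cost of a slightly longer calculation time.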

Hash Times

The above times (in seconds) are based on larger files, as you won’t see much change until you’re dealing with gigabytes or terabytes of data. SHA-256 ends up taking roughly 30% longer than MD5, but this is negligible compared to what we gain in security.
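If you’d like to measure the difference on your own machine, the following is a rough sketch rather than a rigorous benchmark. The time_hash() helper, the 8 MiB in-memory payload, and the 64 KiB chunk size are all assumptions chosen for illustration; your numbers will depend on your hardware and Python build.

```python
import hashlib
import time


def time_hash(algorithm, data, chunk_size=65536):
    """Return the seconds taken to hash `data` with the named algorithm."""
    hasher = hashlib.new(algorithm)
    start = time.perf_counter()
    # Feed the data in chunks, the same way a large file would be read.
    for offset in range(0, len(data), chunk_size):
        hasher.update(data[offset:offset + chunk_size])
    hasher.hexdigest()
    return time.perf_counter() - start


# Hypothetical payload: 8 MiB of zero bytes held in memory.
payload = bytes(8 * 1024 * 1024)

for name in ("md5", "sha256"):
    print(f"{name}: {time_hash(name, payload):.4f} s")
```

For results closer to the ones discussed above, swap the in-memory payload for a genuinely large file and repeat each measurement several times.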

Upgrading MD5

Now that we’ve got a basic understanding of MD5, let’s get rid of it.

Walk-through:

These are pretty straightforward changes, so we’ll go over them quickly. Because we’re already importing Python’s hashlib module, we can access all the other hashes within it. By simply replacing hashlib.md5() with hashlib.sha256(), the upgrade process is complete.

I also went through and renamed all instances of the md5 variable to sha256 to avoid any confusion later.
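As a rough illustration of the swap, here is a minimal sketch of a file-hashing function after the change. The name get_sha256() and the chunked read are assumptions for this example and may not match the function from Part 1; the only part taken from the walk-through is the call to hashlib.sha256() in place of hashlib.md5().

```python
import hashlib


def get_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    sha256 = hashlib.sha256()  # previously hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()
```

Nothing else about the hashing logic needs to change, since every hashlib hash object exposes the same update() and hexdigest() methods.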

Retrieve Bytes from files

In my original post on Basic File Integrity Monitors, I mentioned that some FIMs can actually prevent changes from sticking. Retrieving bytes from our known safe files is the first step towards accomplishing this goal in our own FIM.

Walk-through:

output

getBytes()
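A minimal sketch of what a getBytes() function could look like, assuming it takes a file path and returns the file’s contents as a Base64 string. The parameter name and the choice to return a str rather than bytes are assumptions, not necessarily the original implementation.

```python
import base64


def getBytes(path):
    """Return the contents of the file at `path` as a Base64 string."""
    with open(path, "rb") as handle:
        raw = handle.read()  # reads the whole file into memory
    # b64encode returns bytes; decode to ASCII so it is easy to store.
    return base64.b64encode(raw).decode("ascii")
```

Passing the returned string to base64.b64decode() gives back the original bytes, which is what will make restoring a known safe copy possible later on. Note that this version loads the entire file into memory, so it is best suited to smaller files.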

Adding getBytes() to the main script

Now that we have a working getBytes() function going, let’s add it to our main script. We’ll also want to update our code to store the new data in our files variable.
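To give an idea of how the pieces might fit together, here is a hedged sketch. The structure of the files variable from Part 1 isn’t shown here, so this example assumes it is a dictionary keyed by file path; the getSHA256() and scan() names are hypothetical as well.

```python
import base64
import hashlib
import os


def getSHA256(path, chunk_size=65536):
    """Hypothetical helper: SHA-256 hex digest of a file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def getBytes(path):
    """Base64-encoded contents of a file, as a string."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("ascii")


def scan(directory):
    """Walk `directory` and record the hash and bytes of every file."""
    files = {}  # assumption: a dict keyed by file path
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            files[path] = {
                "sha256": getSHA256(path),
                "bytes": getBytes(path),
            }
    return files
```

Calling scan() on a monitored directory returns one entry per file, with the hash used for detection stored next to the bytes needed to restore the file.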

Walk-through:

Conclusion

This wraps up part two of building our FIM. We’ve now hardened our detection against collisions and prepared our script for self-healing capabilities. In the upcoming post, we’ll be taking a look at implementing a database for storage, so be sure to check back soon.