Building a File Integrity Monitor - Part 2
- MD5: What is it?
- Difference between MD5 and SHA-256
- Upgrading MD5
- Retrieve Bytes from files
- Adding getBytes() to the main script
- Conclusion
Welcome back for part two in this six part series of building your own File Integrity Monitor. In this post, we’ll be upgrading the MD5 hashing to SHA-256, as well as grabbing the Base64 encoded bytes of our files.
There isn’t a whole lot of work to do in this post, so I’d like to start off by writing a bit about MD5, and why it is not a good fit for our project.
Note: If you haven’t completed Part 1 of this series, check that out before continuing.
MD5: What is it?
MD5, or Message Digest 5, is a hashing algorithm commonly used for checking data integrity. It can be seen on download pages as a checksum, used in databases for hashing passwords, and even found in other FIMs.
Having been around for almost three decades, researchers have had plenty of time to poke and prod the algorithm, finding a number of weaknesses within it. This include an increasingly rapid rate of collision generation due to improving technology, and a growing number of Rainbow-Tables that allow for quick matching of known hashes.
Because of these issues, MD5 is not recommended for use in secure systems, or in any application requiring data integrity.
Difference between MD5 and SHA-256
SHA-256 is part of the Secure Hashing Algorithm family and is considered to be the new standard in hashing.
Check out the hashes generated for the string “Science Vikings”:
- MD5: 8aecf444e829fd7717cd4c3ad57f80c1
- SHA-256: 55168a73fd9e98e5fef9ae8fe615cf78445613d384ffe4a4b01a812f78975451
As you can see, the SHA-256 output is twice as long as the MD5 hash. This is because SHA-256 uses twice as many bits, giving it a much higher resistance to collisions but also a slightly longer calculation time.
The above times (in seconds) are based on larger files, as you won’t see much change until you’re dealing with gigabytes and terabytes worth of data. SHA-256 ends up taking roughly 30% longer than MD5, but this is negligible compared to what we gain in security.
Upgrading MD5
Now that we’ve got a basic understanding of MD5, let’s get rid of it.
import os,hashlib,time
monitor=[
{'path':'E:\Dropbox\SVL\Projects\AdvancedFIM_GatherFiles','recursive':True},
{'path':'E:\Dropbox\SVL\Projects','recursive':False},
{'path':'E:\Dropbox\SVL\Projects\BasicFIM\BasicFIM.py','recursive':False}
]
files={}
def getFiles():
filesList=[]
for x in monitor:
if os.path.isdir(x['path']):
if x['recursive']:
filesList.extend([os.path.join(root, f) for (root, dirs, files) in os.walk(x['path']) for f in files])
else:
filesList.extend([item for item in os.listdir(x['path']) if os.path.isfile(item)])
elif os.path.isfile(x['path']):
filesList.append(x['path'])
return filesList
while True:
for file in getFiles():
hash = hashlib.sha256()
with open(file) as f:
for chunk in iter(lambda: f.read(2048), ""):
hash.update(chunk)
sha256 = hash.hexdigest()
if file in files and sha256 <> files[file]:
print '%s\t%s has been changed!'%(time.strftime("%Y-%m-%d %H:%M:%S") , file)
files[file]=sha256
time.sleep(1)
Walk-through:
Pretty straight forward changes, so we’ll go over it quickly. Because we’re already importing Python’s hashlib
module, we can access all the other hashes within it. By simply replacing hashlib.md5()
with hashlib.sha256()
, the upgrade process is complete.
I also went through and renamed all instances of the md5
variable with sha356
to avoid any confusion later.
Retrieve Bytes from files
In my original post on Basic File Integrity Monitors, I mentioned that some FIMs can actually prevent changes from sticking. Retrieving bytes from our known safe files is the first step towards accomplishing this goal in our own FIM.
import os,hashlib,base64
file='E:\Dropbox\SVL\Projects\BasicFIM\BasicFIM.py'
def getBytes(file):
return base64.b64encode(open(file, "rb").read())
print getBytes(file)
Walk-through:
- Line 1: In addition the the
os
andhashlib
modules you’re already familiar with, thebase64
library will be imported as well. - Line 3: Declaring a variable
file
to pass to thegetBytes()
function. - Line 6: Returns output of the
base64
module’sb64encode()
function after passing it the file in “read binary” mode.
output
Adding getBytes() to the main script
Now that we have a working getBytes()
function going, let’s add it to our main script. We’ll also want to update our code to store the new data in our files
variable.
import os,hashlib,time,base64
monitor=[
{'path':'E:\Dropbox\SVL\Projects\AdvancedFIM_GatherFiles','recursive':True},
{'path':'E:\Dropbox\SVL\Projects','recursive':False},
{'path':'E:\Dropbox\SVL\Projects\BasicFIM\BasicFIM.py','recursive':False}
]
files={}
def getFiles():
filesList=[]
for x in monitor:
if os.path.isdir(x['path']):
if x['recursive']:
filesList.extend([os.path.join(root, f) for (root, dirs, files) in os.walk(x['path']) for f in files])
else:
filesList.extend([item for item in os.listdir(x['path']) if os.path.isfile(item)])
elif os.path.isfile(x['path']):
filesList.append(x['path'])
return filesList
def getBytes(file):
return base64.b64encode(open(file, "rb").read())
while True:
for file in getFiles():
hash = hashlib.sha256()
with open(file) as f:
for chunk in iter(lambda: f.read(2048), ""):
hash.update(chunk)
sha256 = hash.hexdigest()
if file in files and sha256 <> files[file]['sha256']:
print '%s\t%s has been changed!'%(time.strftime("%Y-%m-%d %H:%M:%S") , file)
files[file]={'sha256':sha256,'bytes':getBytes(file)}
time.sleep(1)
Walk-through
- Line 20: Added the
getBytes()
function to the script. - Line 31: The previous script set dictionary values to a string containing the hash. This updated line will now set the value to a dictionary containing the SHA-256 hash and base64 output from
getBytes()
.
Conclusion
This wraps up part two of building our FIM. We’ve now helped to secure our detection against collisions, as well as prepare our script for self-healing capabilities. In the upcoming post, we’ll be taking a look at implementing a database for storage, so be sure to check back soon.