In malware research, cryptographic hashes are useful for positively identifying a piece of software. If a suspected file has the same cryptographic hash as a known file, an analyst can be reasonably confident that the files are identical. Modifying even a single bit of a malicious file, however, will alter its cryptographic hash. As a result, inconsequential changes to a malicious file prevent analysts from quickly recognizing that a suspected file is essentially the same as one they have already seen. To counter this, analysts look for better ways of assessing whether two files are similar. One such method is known as fuzzy hashing.
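The effect of a single-bit change can be seen with any cryptographic hash. The short sketch below uses SHA-256 from Python's standard hashlib module. The sample bytes and the helper name flip_lowest_bit are made up for illustration and simply stand in for the contents of a real file.

```python
import hashlib


def flip_lowest_bit(data: bytes, index: int = 0) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped.

    Hypothetical helper, used only to simulate a trivial modification.
    """
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)


# Placeholder bytes standing in for a file's contents (an assumption).
original = b"MZ" + b"\x90" * 62
modified = flip_lowest_bit(original)

digest_original = hashlib.sha256(original).hexdigest()
digest_copy = hashlib.sha256(bytes(original)).hexdigest()
digest_modified = hashlib.sha256(modified).hexdigest()

# Identical content gives an identical digest.
print(digest_original == digest_copy)
# A single flipped bit gives a digest that no longer matches.
print(digest_original == digest_modified)
```

The two inputs differ in one bit out of 512, yet an exact-match comparison of the digests says nothing about how close they are.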
Fuzzy hashing methods produce hash values that let analysts assign a percentage score indicating how much content two files have in common. A recent type of fuzzy hashing, known as context triggered piecewise hashing, has gained enormous popularity in malware detection and analysis in the form of an open-source tool called ssdeep. You can download the ssdeep tool at http://ssdeep.sourceforge.net.
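To make the idea of context triggered piecewise hashing more concrete, here is a deliberately simplified toy sketch in Python. It is not ssdeep's algorithm and does not produce ssdeep-compatible signatures. The window size, block size, alphabet, use of CRC32 and SHA-256, and the difflib-based score are all assumptions chosen for brevity. It only illustrates the principle: piece boundaries are triggered by the local context, each piece contributes one character to the signature, and a small edit therefore disturbs only a small part of the signature.

```python
import difflib
import hashlib
import zlib

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7        # bytes of context that decide whether a piece ends here
BLOCK_SIZE = 16   # a boundary is triggered roughly once every 16 bytes


def toy_ctph(data: bytes) -> str:
    """Toy piecewise signature: one character per context-triggered piece."""
    signature = []
    piece_start = 0
    for i in range(len(data)):
        # The trigger depends only on the last WINDOW bytes. A real tool
        # would update a rolling hash instead of recomputing it each time.
        window = data[max(0, i - WINDOW + 1): i + 1]
        if zlib.crc32(window) % BLOCK_SIZE == BLOCK_SIZE - 1:
            piece = data[piece_start: i + 1]
            signature.append(ALPHABET[hashlib.sha256(piece).digest()[0] % 64])
            piece_start = i + 1
    if piece_start < len(data):
        # Emit a character for the trailing piece that had no trigger.
        tail = data[piece_start:]
        signature.append(ALPHABET[hashlib.sha256(tail).digest()[0] % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> int:
    """Score two toy signatures from 0 to 100 using difflib's ratio."""
    matcher = difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    return round(100 * matcher.ratio())


def sample_bytes(prefix: str, blocks: int = 64) -> bytes:
    """Deterministic pseudo-random test data (hypothetical stand-in for a file)."""
    return b"".join(
        hashlib.sha256(f"{prefix}{n}".encode()).digest() for n in range(blocks)
    )


original = sample_bytes("a")
edited = bytearray(original)
edited[1000] ^= 0xFF              # change a single byte in the middle
edited = bytes(edited)
unrelated = sample_bytes("b")

print(similarity(toy_ctph(original), toy_ctph(edited)))
print(similarity(toy_ctph(original), toy_ctph(unrelated)))
```

Because the trigger looks only at the last few bytes, the piece boundaries line up again shortly after the edited byte, so most signature characters are unchanged and the first score stays well above the second.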
Usage: ssdeep [-m file] [-k file] [-dpgvrsblcxa] [-t val] [-h|-V] [FILES]
-m – Match FILES against known hashes in file
-k – Match signatures in FILES against signatures in file
-d – Directory mode, compare all files in a directory
-p – Pretty matching mode. Similar to -d but includes all matches
-g – Cluster matches together
-v – Verbose mode. Displays each filename as it is being processed
-r – Recursive mode
-s – Silent mode; all errors are suppressed
-b – Uses only the bare name of files; all path information omitted
-l – Uses relative paths for filenames
-c – Prints output in CSV format
-x – Compare FILES as signature files
-a – Display all matches, regardless of score
-t – Only displays matches above the given threshold
-h – Display this help message
-V – Display version number and exit
$ ssdeep config.h INSTALL doc\README
config.h, INSTALL, and doc\README are the files to be hashed.
$ ssdeep -b foo.txt > hashes.txt
hashes.txt receives the hash that ssdeep computes for foo.txt.
$ ssdeep -bm hashes.txt bar.txt
This matches bar.txt against the signatures stored in hashes.txt.
We can also compare the files in one folder against those in another, find truncated files, compare two signature files with each other, and compare a signature file against ordinary files that have not been hashed yet.
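The hash-then-match workflow from the examples above can also be scripted. The following is a minimal sketch in Python, assuming ssdeep is installed and on the PATH. The function names and the default hashes.txt path are hypothetical. Only the -b, -m, and -t options from the usage text are used, and combining -t with -m here is an assumption based on that usage line.

```python
import shutil
import subprocess


def hash_command(files):
    """Build the argv for `ssdeep -b FILES` (bare filenames in the output)."""
    return ["ssdeep", "-b", *files]


def match_command(known_hashes, files, threshold=None):
    """Build the argv for `ssdeep -b [-t val] -m known_hashes FILES`."""
    argv = ["ssdeep", "-b"]
    if threshold is not None:
        argv += ["-t", str(threshold)]
    argv += ["-m", known_hashes, *files]
    return argv


def run_workflow(reference, suspect, hashes_path="hashes.txt"):
    """Hash `reference`, save the output, then match `suspect` against it.

    Returns ssdeep's standard output from the match step, or None when
    ssdeep is not installed.
    """
    if shutil.which("ssdeep") is None:
        return None
    hashed = subprocess.run(
        hash_command([reference]), capture_output=True, text=True
    )
    with open(hashes_path, "w") as out:
        out.write(hashed.stdout)
    matched = subprocess.run(
        match_command(hashes_path, [suspect]), capture_output=True, text=True
    )
    return matched.stdout
```

For example, run_workflow("foo.txt", "bar.txt") mirrors the two commands shown earlier, and the command builders can be inspected on their own without ssdeep being present.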