This website was designed to complement the file hash sets released by the National Software Reference Library (NSRL), a project of the U.S. Department of Commerce's National Institute of Standards and Technology (NIST) (www.nsrl.nist.gov). The NSRL maintains the largest known collection of hash values (more than 128 million files analyzed as of 2016), which are free to the public.
In November 2003, while reviewing the NSRL dataset releases, we observed that the MD5 and SHA-1 hash values were the direct result of very advanced custom scripting aimed at software product media (floppy disk, CD and DVD). This scripting included processes to parse out and hash the files found within a software product’s compressed files (cab files, zip files, etc.).
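As a rough illustration of hashing files inside a compressed archive, the sketch below computes MD5 and SHA-1 digests for every member of a zip file using only the Python standard library. It is not the NSRL's actual scripting, which is not described here; the function names `hash_zip_members` and `build_demo_archive`, and the sample archive contents, are hypothetical.

```python
import hashlib
import io
import zipfile


def hash_zip_members(archive):
    """Return {member_name: (md5_hex, sha1_hex)} for every file in a zip archive.

    `archive` may be a path or a binary file-like object.
    """
    results = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no content to hash
            md5 = hashlib.md5()
            sha1 = hashlib.sha1()
            with zf.open(info) as member:
                # Read in chunks so large members need not fit in memory.
                for chunk in iter(lambda: member.read(65536), b""):
                    md5.update(chunk)
                    sha1.update(chunk)
            results[info.filename] = (md5.hexdigest(), sha1.hexdigest())
    return results


def build_demo_archive():
    """Build a small in-memory zip with made-up contents (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("setup.exe", b"hello")
        zf.writestr("docs/readme.txt", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, (md5_hex, sha1_hex) in hash_zip_members(build_demo_archive()).items():
        print(name, md5_hex, sha1_hex)
```

Cabinet (cab) files would need a separate extraction tool, since the Python standard library does not read that format.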
While performing our own validation testing of the NSRL datasets, we discovered that far more otherwise unidentified hash values could be derived from the actual installation of computer software, operating systems, etc. Unfortunately, the NSRL datasets were not a direct result of a product’s ‘installation process’.
To illustrate the difference, in our earlier testing we installed a typical Microsoft Windows operating system (i.e. Vista Home Basic) onto two dissimilar IBM-compatible computers and then performed a file hash analysis across both systems to see how many files the most recent release of an NSRL hash set could detect.
From an average of 36,002 files installed onto either Intel-compatible computer system, the NSRL hash sets detected 8,324 files from within their own hash library. That is a detection rate of 23% for files that are known to be installed from a sample Microsoft Windows operating system CD/DVD and are therefore considered trustworthy, known and non-threatening during any typical computer forensic examination.
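The percentage above is simply the number of matched files divided by the number of installed files. The sketch below shows one way such a detection rate might be computed from a mapping of file paths to hash values and a set of known hashes. The function name `detection_rate` and its input layout are assumptions made for illustration, not the procedure actually used in the test.

```python
def detection_rate(file_hashes, known_hashes):
    """Count how many files have a hash that appears in a known hash set.

    file_hashes:  dict mapping file path -> hex digest (hypothetical layout)
    known_hashes: iterable of hex digests, e.g. loaded from a hash set
    Returns (list_of_matched_paths, fraction_of_files_matched).
    """
    # Compare case-insensitively, since hex digests may be upper or lower case.
    known = {digest.lower() for digest in known_hashes}
    matched = [path for path, digest in file_hashes.items()
               if digest.lower() in known]
    total = len(file_hashes)
    rate = len(matched) / total if total else 0.0
    return matched, rate


if __name__ == "__main__":
    # Figures quoted in the text: 8,324 of 36,002 files detected.
    print(f"{8324 / 36002:.1%}")  # prints 23.1%
```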
Using our own method of installing an operating system and then gathering the hash values common to both computers, we were able to detect 99.98% of the files that were known to be installed from a Microsoft Windows operating system and were therefore also considered trustworthy, known and non-threatening. Specifically, 35,456 files were detected on either test computer.
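A minimal sketch of this idea, assuming each installed system is available as a directory tree (for example a mounted disk image): hash every file on each system, then keep only the hash values that occur on both. The helper names `hash_tree` and `common_hashes` are hypothetical, and SHA-1 is used here only because it is one of the algorithms mentioned above.

```python
import hashlib
import os


def hash_tree(root):
    """Return {relative_path: sha1_hex} for every file found under root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            sha1 = hashlib.sha1()
            with open(full_path, "rb") as handle:
                # Read in chunks so large files need not fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    sha1.update(chunk)
            digests[os.path.relpath(full_path, root)] = sha1.hexdigest()
    return digests


def common_hashes(system_a, system_b):
    """Hash values present on both systems, regardless of file path."""
    return set(system_a.values()) & set(system_b.values())
```

Taking the intersection means a file whose content differs between the two machines contributes nothing to the resulting hash set; only content seen on both installations is kept.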
Based on the larger number of hash values discovered, we decided that spending the added time and effort of installing an operating system, hashing its files and then gathering all unique hash values into one hash set would be just as valuable as the NSRL datasets, and would additionally complement any current NSRL datasets during computer forensic examinations.
It is important to understand that this analysis does NOT suggest in any form or manner that computer forensic and computer security examiners should consider discontinuing the use of NSRL datasets. On the contrary, the NSRL datasets are EXTREMELY significant to the computer forensic and computer security communities, as they provide the largest known repository of hash values (far more than 40,000,000 unique as of 2016), free of charge, for many current and legacy software and operating system programs.
To summarize, our goal with this website is to recommend that, when performing computer forensic or computer security investigations, every analyst, examiner and professional should seek out and consider additional hash values that could offset other ‘unidentified’ computer files and their hash values throughout an examination. This is especially true if the computer forensic or security analysis entails large-scale, timely and thorough analysis.