HashSets.com complements NSRL

This website was designed to complement the file hash sets released by the National Software Reference Library (NSRL), a project of the National Institute of Standards and Technology (NIST), U.S. Department of Commerce (www.nsrl.nist.gov).  The NSRL maintains the largest known collection of hash values, covering more than 432 million modern-day computer files (72,015,335 unique) as of March 2026, all of which are free for the public to download.

In November 2003, while initially reviewing the NSRL dataset releases up to that point, we observed that the MD5 and SHA-1 hash values were the direct result of very advanced custom scripting aimed at software product media (floppy disk, CD and DVD).  This scripting included processes to parse out and hash the files found within a software product’s compressed and uncompressed files (CAB files, ZIP files, ISO files, etc.).
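As a rough illustration of what "parse out and hash" means for one archive type, the following is a minimal sketch only. It is not the NSRL's actual scripting, which is not described here. The function names `hash_bytes` and `hash_zip_members` are hypothetical, and only ZIP archives are handled.

```python
import hashlib
import zipfile


def hash_bytes(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def hash_zip_members(archive) -> dict:
    """Hash every regular file stored inside a ZIP archive.

    `archive` may be a path or a file-like object. Returns a mapping of
    member name -> {"md5": ..., "sha1": ...}. Directory entries are skipped.
    """
    results = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Read the decompressed member content and hash it.
            results[info.filename] = hash_bytes(zf.read(info))
    return results
```

Other container formats (CAB, ISO) would need their own readers. For very large members, hashing in chunks rather than reading the whole member into memory would be the safer choice.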

While performing some of our own validation testing of the NSRL datasets, we discovered that far more otherwise unidentified hash values could be derived from the actual installation of computer software, operating systems, etc. Unfortunately, the NSRL datasets were not yet a direct result of a product’s ‘installation process’.

To demonstrate the difference during our earlier testing, we installed a typical Microsoft Windows operating system (Vista Home Basic at that time) onto two dissimilar but compatible computers and then performed a file hash analysis across both systems to see how many files the most recent release of an NSRL hash set could detect.

Of an average of 36,002 files installed onto either computer system, the NSRL hash sets detected 8,324 files from within their own hash library. That is a detection rate of 23% for files that are known to be installed from a sample Microsoft Windows operating system CD/DVD and are therefore considered trustworthy, known and non-threatening during any typical digital forensic and cybersecurity examination.
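The 23% figure is the number of matched files divided by the number of installed files. A minimal sketch of that calculation follows. The function name `detection_rate` is hypothetical, and it assumes the hash values are supplied as hex strings.

```python
def detection_rate(file_hashes, known_hashes):
    """Return (matched_count, percentage) of file hashes found in a known set.

    Hex digests are compared case-insensitively, because hash set files may
    store them in either upper or lower case.
    """
    known = {h.lower() for h in known_hashes}
    hashes = [h.lower() for h in file_hashes]
    if not hashes:
        return 0, 0.0
    matched = sum(1 for h in hashes if h in known)
    return matched, 100.0 * matched / len(hashes)


# The figures quoted above: 8,324 matches out of 36,002 installed files.
print(round(100.0 * 8324 / 36002, 1))  # 23.1
```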

Using our own method of installing an operating system and then gathering the hash values common to both computers, we were able to detect 99.98% of the files that were known to be installed from a Microsoft Windows operating system and were therefore also considered trustworthy, known and non-threatening. Specifically, 35,456 files were detected on either test computer.
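Gathering the hash values common to both computers amounts to a set intersection. The sketch below assumes each system has already been hashed into a mapping of file path to hex digest. The function name `common_hashes` and the sample paths and digests are hypothetical.

```python
def common_hashes(system_a: dict, system_b: dict) -> set:
    """Return the hash values present on both systems.

    Each argument maps file path -> hex digest. Paths are ignored, so a file
    counts as common when its content hash appears on both machines, even if
    it is stored under a different name or location.
    """
    hashes_a = {h.lower() for h in system_a.values()}
    hashes_b = {h.lower() for h in system_b.values()}
    return hashes_a & hashes_b


# Hypothetical example: two installs that share one file and differ in another.
pc1 = {r"C:\Windows\a.dll": "AAAA", r"C:\Windows\driver1.sys": "1111"}
pc2 = {r"C:\Windows\a.dll": "aaaa", r"C:\Windows\driver2.sys": "2222"}
print(sorted(common_hashes(pc1, pc2)))  # ['aaaa']
```

Files unique to one machine, such as hardware-specific drivers or per-install logs, fall outside the intersection. That is presumably why two dissimilar computers were used.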

Based on the larger number of hash values discovered, we decided that spending the added time and effort to install an operating system, hash its files and then gather all unique hash values into one hash set would be just as valuable as the NSRL datasets and would additionally complement any current NSRL datasets during digital forensic and cybersecurity examinations.

It is important to understand that this analysis does NOT suggest in any form or manner that digital forensic and cybersecurity examiners should consider discontinuing the use of NSRL datasets. On the contrary, the NSRL datasets are EXTREMELY significant to the cybersecurity community as they provide the largest known repository of free hash values (more than 72 million unique as of 2026) for many current and legacy software and operating system programs.

To summarize, our goal with this website is to recommend that, when performing digital forensic and cybersecurity investigations, every analyst, examiner and professional should seek out and incorporate all hash values that could help offset otherwise ‘unidentified’ computer files and their hash values throughout an examination. This is especially true if the digital forensic or cybersecurity analysis entails large-scale, timely and thorough analysis.