This website was designed to complement the file hash sets released by the National Software Reference Library (NSRL), a project of the U.S. Department of Commerce's National Institute of Standards and Technology (NIST) (www.nsrl.nist.gov). The NSRL maintains the largest known collection of hash values (more than 128 million files analyzed as of 2016), which are free to the public.
In November 2003, while reviewing the NSRL dataset releases, we observed that the MD5 and SHA-1 hash values were the direct result of very advanced custom scripting aimed at software product media (floppy disk, CD and DVD). This advanced scripting included processes to parse out and hash the files found within a software product’s compressed files (cab files, zip files, etc.).
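As a rough illustration of what hashing the files inside a compressed archive involves, here is a minimal Python sketch. It is not the NSRL's actual scripting, which is not described here. It handles zip archives only, and the names `hash_zip_members` and `build_demo_archive` are hypothetical helpers introduced for this example.

```python
import hashlib
import io
import zipfile


def hash_zip_members(archive):
    """Return {member_name: (md5_hex, sha1_hex)} for every file in a zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    results = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no content to hash
            md5 = hashlib.md5()
            sha1 = hashlib.sha1()
            with zf.open(info) as member:
                # Read in chunks so large members are not loaded into memory at once.
                for chunk in iter(lambda: member.read(65536), b""):
                    md5.update(chunk)
                    sha1.update(chunk)
            results[info.filename] = (md5.hexdigest(), sha1.hexdigest())
    return results


def build_demo_archive():
    """Build a tiny in-memory zip archive (hypothetical sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("setup/readme.txt", b"hello")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, (md5_hex, sha1_hex) in hash_zip_members(build_demo_archive()).items():
        print(name, md5_hex, sha1_hex)
```

Other container formats, such as cab files, would need their own extraction step before the same hashing loop could be applied.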
While performing our own validation testing of the NSRL datasets, we discovered that far more otherwise unidentified hash values could be derived from the actual installation of computer software, operating systems, etc. Unfortunately, the NSRL datasets were not a direct result of a product’s ‘installation process’.
To illustrate the difference, during our earlier testing we installed a typical Microsoft Windows operating system (i.e. Vista Home Basic) onto two dissimilar IBM-compatible computers and then performed a file hash analysis across both systems to see how many of the installed files the most recent NSRL hash set release could identify.
Of an average of 36,002 files installed on each Intel-compatible computer system, the NSRL hash sets identified 8,324 files from within their own hash library. That is a detection rate of about 23% for files that are known to have been installed from a sample Microsoft Windows operating system CD/DVD and are therefore considered trustworthy, known and non-threatening during any typical computer forensic examination.
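The percentage above follows directly from the two file counts. A minimal sketch of the arithmetic, where `detection_rate` is a hypothetical helper name used only for this example:

```python
def detection_rate(matched, total):
    """Percentage of installed files whose hash was found in a known hash set."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matched / total


# Figures quoted above: 8,324 matches out of an average of 36,002 installed files.
nsrl_rate = detection_rate(8_324, 36_002)
print(f"NSRL coverage: {nsrl_rate:.1f}%")  # prints: NSRL coverage: 23.1%
```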
Using our own method of installing an operating system and then gathering the hash values common to both computers, we were able to detect 99.98% of the files that were known to be installed from a Microsoft Windows operating system and were therefore also considered trustworthy, known and non-threatening. Specifically, 35,456 files were detected on each test computer.
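The idea of gathering the hash values common to two installations can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact tooling we used. It assumes each installed system is available as a mounted directory tree, it uses SHA-1 only, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of one file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Return the set of SHA-1 digests of all regular files under `root`."""
    hashes = set()
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                hashes.add(sha1_of_file(path))
            except OSError:
                continue  # skip files that cannot be read (locked, permissions)
    return hashes


def common_hashes(root_a, root_b):
    """Hash values present on both systems, regardless of file name or location."""
    return hash_tree(root_a) & hash_tree(root_b)
```

Keeping only the intersection tends to drop files that are unique to one machine, such as logs or hardware-specific configuration, and leaves the files that the installation itself placed on both systems.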
Based on the larger number of hash values discovered, we decided that spending the added time and effort to install an operating system, hash its files and then gather all unique hash values into one hash set would be just as valuable as the NSRL datasets and would additionally complement any current NSRL datasets during computer forensic examinations.
It is important to understand that this analysis does NOT suggest in any way that computer forensic and computer security examiners should consider discontinuing the use of NSRL datasets. On the contrary, the NSRL datasets are EXTREMELY significant to the computer forensic and computer security communities, as they provide the largest known repository of hash values (far more than 40,000,000 unique as of 2016), free of charge, for many current and legacy software and operating system programs.
To summarize, our goal with this website is to recommend that, when performing computer forensics or computer security investigations, every analyst, examiner and professional should seek out and consider additional hash values that could offset other ‘unidentified’ computer files and their hash values throughout an examination. This is especially true if the computer forensic or security analysis entails large-scale, timely and thorough analysis.
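One way to picture this recommendation is a filter that treats a file as known if its hash appears in any available hash set, whether NSRL or supplemental. The sketch below assumes the hash sets have already been loaded into Python sets of hex digests. The function name and the sample values are hypothetical.

```python
def unidentified_files(file_hashes, *known_hash_sets):
    """Return the {path: hash} entries whose hash is in none of the known sets.

    `file_hashes` maps a file path to its hex digest. Hex digests are compared
    case-insensitively, since hash sets may store them in upper or lower case.
    """
    known = {h.lower() for hash_set in known_hash_sets for h in hash_set}
    return {
        path: digest
        for path, digest in file_hashes.items()
        if digest.lower() not in known
    }


if __name__ == "__main__":
    # Hypothetical placeholder values, shortened for readability.
    evidence = {"C:/a.dll": "AAAA", "C:/b.dll": "bbbb", "C:/c.exe": "cccc"}
    nsrl_set = {"aaaa"}
    supplemental_set = {"BBBB"}
    print(unidentified_files(evidence, nsrl_set, supplemental_set))
```

Each additional hash set passed in can only shrink the list of unidentified files, which is why supplemental sets complement the NSRL datasets rather than replace them.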