How to Verify File Integrity with Hashes
When you download software, firmware, or important documents, how do you know the file is exactly what the publisher intended? File hashing gives you a cryptographic fingerprint: a short string that is, for practical purposes, unique to the file's contents and changes if even a single bit of the file is different. Verifying a hash takes seconds and can save you from installing tampered software, flashing corrupted firmware that bricks a device, or trusting a corrupted backup that fails when you actually need to restore from it.
A short history of hash functions
The idea of a checksum is older than computing itself: accountants used arithmetic digit-sums to detect transcription errors long before there were files to verify. Cryptographic hashes appeared in the late 1980s. Ron Rivest published MD4 in 1990 and designed MD5 in 1991 (published as RFC 1321 in 1992); MD5 became the de facto checksum for two decades. NIST standardised SHA-0 in 1993, then withdrew it in favour of SHA-1 in 1995 after a flaw was discovered. The SHA-2 family followed in 2001 with SHA-256, SHA-384, and SHA-512 (SHA-224 was added in 2004), offering longer digests and a larger security margin than SHA-1.
Each generation was retired by attacks that turned out to be cheaper than expected. MD5 collisions were demonstrated in 2004, made practical for digital certificates in 2008, and are now trivial. SHA-1 fell to the SHAttered attack in 2017, when researchers from CWI Amsterdam and Google produced two different PDFs with identical SHA-1 digests. SHA-2 has held up since 2001, and SHA-3 (Keccak, standardised in 2015) provides a structurally different fallback in case SHA-2 ever breaks. The lesson is that hash algorithms have a shelf life; the right algorithm today may need replacing within a decade.
How file hashing works
A hash function reads every byte of a file and produces a fixed-length string. The same file always produces the same hash. Change one byte, and the hash changes completely. This is called the avalanche effect, and it is the property that makes hashes useful for verification.
Example (SHA-256):
- Empty file (zero bytes):
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
- File containing the text "The quick brown fox jumps over the lazy dog" (no trailing newline):
d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
The two hashes share no recognisable pattern, and the same holds for two files that differ by only a single bit. That sensitivity is what makes verification possible: generate the hash, compare it to the published hash, and you know instantly whether the file matches what was published.
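The avalanche effect is easy to observe directly. The following is a minimal Python sketch using only the standard library's hashlib; the sample bytes and the helper names `bit_difference` and `flip_lowest_bit_of_first_byte` are illustrative assumptions, not part of any tool described here.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_lowest_bit_of_first_byte(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its first byte flipped."""
    if not data:
        raise ValueError("data must not be empty")
    return bytes([data[0] ^ 0x01]) + data[1:]


# Stand-in for a real file's contents (hypothetical sample data).
original = b"example file contents"
modified = flip_lowest_bit_of_first_byte(original)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

print(digest_a.hex())
print(digest_b.hex())
print(f"{bit_difference(digest_a, digest_b)} of 256 output bits differ")
```

For a well-behaved hash, the count printed on the last line is typically close to half of the output bits, even though the inputs differ by exactly one bit.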
Internally, hash functions in the SHA-2 family split the file into fixed-size blocks (64 bytes for SHA-256, 128 for SHA-512), feed each block through a compression function, and chain the state forward. The output is the final state after the last (padded) block has been mixed in. Because the chain depends on every byte, an attacker has no shortcut for changing the content while keeping the hash the same.
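The same chaining is what lets a program hash a file of any size without loading it into memory. Below is a minimal sketch with Python's hashlib; the function names `sha256_stream` and `sha256_file` and the 64 KiB read size are assumptions chosen for illustration. The read size is unrelated to the algorithm's internal 64-byte block; the library buffers and splits the data itself.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a binary stream chunk by chunk and return the hex digest."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each update() mixes more data into the running internal state.
        hasher.update(chunk)
    return hasher.hexdigest()


def sha256_file(path: str) -> str:
    """Hash a file on disk without reading it into memory all at once."""
    with open(path, "rb") as handle:
        return sha256_stream(handle)


# Feeding the data in pieces gives the same digest as hashing it in one call.
data = b"x" * 200_000
assert sha256_stream(io.BytesIO(data), chunk_size=1000) == hashlib.sha256(data).hexdigest()
```

The closing assertion shows the practical consequence of chaining: how the input is split between calls does not affect the result, only the byte sequence does.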
How to verify a file
- Find the official hash: the software publisher typically lists file hashes on their download page, often labelled "SHA-256 checksum" or provided as a SHA256SUMS file.
- Select your downloaded file: choose the file in the hash calculator. The hash is computed locally in your browser; the file never leaves your machine.
- Compare the hashes: if your calculated hash matches the official hash exactly, the file is byte-for-byte the one the publisher hashed. Copy-paste both into a text diff if the strings are long.
- Match the algorithm: SHA-256 hashes only match other SHA-256 hashes. If the publisher gives you SHA-512, generate SHA-512 too; mixing algorithms is the most common "they do not match" mistake.
- Verify the published hash itself when possible: a signed SHA256SUMS file (signed with the publisher's GPG key) tells you the hash list was not tampered with, which the bare hash on a download page does not.
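For scripted use, the steps above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the calculator's implementation: the function names `file_digest_hex` and `matches_published_hash` are hypothetical, and the published hash is assumed to be a plain hex string for one of the fixed-length algorithms hashlib supports.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256") -> str:
    """Compute the hex digest of a file, reading it in 1 MiB chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path: str, published_hex: str,
                           algorithm: str = "sha256") -> bool:
    """Return True if the file's digest equals the published one.

    Raises ValueError when the published string has the wrong length for
    the chosen algorithm, which usually means the algorithms are mixed up.
    """
    expected = published_hex.strip().lower()
    expected_length = hashlib.new(algorithm).digest_size * 2  # hex chars
    if len(expected) != expected_length:
        raise ValueError(
            f"{algorithm} digests have {expected_length} hex characters, "
            f"but the published hash has {len(expected)}"
        )
    return file_digest_hex(path, algorithm) == expected
```

A call such as `matches_published_hash("installer.bin", published)` (with a hypothetical file name) returns True only on an exact match. The length check turns the common "wrong algorithm" mistake into an explicit error instead of a silent mismatch.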
When to verify file hashes
- Software downloads: verify that installers and updates have not been tampered with or corrupted during download. Anyone serving you the binary (your ISP, a CDN, a mirror) could in theory swap it for something hostile.
- Firmware updates: a corrupted firmware file can brick a device. Always verify before flashing a router, IoT device, or motherboard BIOS; recovery often requires special hardware.
- ISO images: operating system images should be verified before writing to USB or installing. Most major Linux distributions publish both SHA-256 hashes and GPG signatures over those hashes; verify both.
- Legal and financial documents: verify that contracts, audit reports, or financial statements have not been altered after signing or sharing. Hashes provide a tamper-evident receipt.
- Backup verification: confirm that backup files are identical to the originals. Silent disk corruption is real, and a backup whose hash has changed since creation may not restore cleanly.
- CI/CD artifacts: pinning a build artifact by its SHA-256 in a deployment pipeline guarantees that the binary you tested is the binary you deploy. Container images do this by design with their content-addressable digests.
- Software supply chain audits: when a security advisory says "this version has hash X," the only way to confirm what is on your servers is to hash the file there and compare. Without it you are trusting the version string, which the attacker controls.
Supported algorithms
| Algorithm | Hash length | Output bits | Recommendation |
|---|---|---|---|
| MD5 | 32 hex chars | 128 | Broken; legacy use for accidental-corruption checks only |
| SHA-1 | 40 hex chars | 160 | Legacy only, broken, do not trust for security |
| SHA-224 | 56 hex chars | 224 | Niche; prefer SHA-256 |
| SHA-256 | 64 hex chars | 256 | Recommended general-purpose standard |
| SHA-384 | 96 hex chars | 384 | High security, used in TLS 1.3 cipher suites |
| SHA-512 | 128 hex chars | 512 | Maximum strength of SHA-2, fast on 64-bit CPUs |
| SHA3-256 | 64 hex chars | 256 | Different internal design from SHA-2, future-proof |
| BLAKE2b/BLAKE3 | varies (128 hex chars for default BLAKE2b, 64 for default BLAKE3) | 256 or 512 | Fast modern hashes; BLAKE2b is the generic hash in libsodium and the core of Argon2 |
| CRC32 | 8 hex chars | 32 | Error detection only, not a security hash |
If you have a choice, SHA-256 is the right default. SHA-512 can be slightly faster on 64-bit machines that lack dedicated SHA-256 instructions, because the algorithm works on 64-bit words. Reach for BLAKE3 when throughput matters most: it parallelises across CPU cores and SIMD lanes, so it can keep up with fast NVMe SSDs where single-threaded SHA-256 often cannot.
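The hash lengths in the table can be checked, and used to guess which algorithm produced a given string, with a short Python sketch. It assumes a Python build whose hashlib provides all of the listed algorithms (MD5 may be unavailable on FIPS-restricted builds); the names `ALGORITHMS`, `hex_lengths`, `candidate_algorithms`, and `crc32_hex` are illustrative. BLAKE3 is not in the standard library, so it is left out.

```python
import hashlib
import zlib

# Algorithm names as hashlib spells them.
ALGORITHMS = ["md5", "sha1", "sha224", "sha256",
              "sha384", "sha512", "sha3_256", "blake2b"]


def hex_lengths() -> dict:
    """Map each algorithm name to the length of its hex digest."""
    return {name: hashlib.new(name).digest_size * 2 for name in ALGORITHMS}


def candidate_algorithms(hex_digest: str) -> list:
    """List the algorithms whose hex digests have the same length as the input."""
    length = len(hex_digest.strip())
    return [name for name, size in hex_lengths().items() if size == length]


def crc32_hex(data: bytes) -> str:
    """CRC32 as 8 hex characters (error detection only, not a security hash)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


for name, size in hex_lengths().items():
    print(f"{name:10s} {size:3d} hex chars  {size * 4:3d} bits")
```

Length alone cannot distinguish SHA-256 from SHA3-256, since both produce 64 hex characters, so the publisher's label remains the authoritative source for which algorithm to use.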
Hash vs digital signature
A hash tells you whether a file changed. A digital signature tells you whether it changed AND who vouched for it. To sign, the publisher computes the file's hash and runs it through a signing algorithm with their private key; to verify, you recompute the hash and check the signature against it using their public key. (This is often described as "encrypting the hash with the private key", a loose picture that only roughly fits RSA.) If verification succeeds, you know the file is intact AND that the publisher (or someone with their private key) approved it.
When a download page shows both a SHA-256 hash and a .sig or .asc file, the hash protects against corruption in transit, but the signature is what protects against an attacker who breached the download server. The attacker can swap the file and update the displayed hash; they cannot forge a valid signature without the publisher's key.
Common pitfalls
- Comparing the wrong algorithm: MD5 and SHA-256 produce strings of different lengths, and digests from different algorithms will never match, even on identical files.
- Trusting the hash served from the same place as the file: if both are on the same compromised mirror, both can be replaced. Whenever possible, fetch the hash from a different source (the project's GitHub releases page, a signed SHA256SUMS file).
- Hex case sensitivity: hex digests are case-insensitive (a3f and A3F represent the same value), but a plain string comparison may still flag them as different. Lower-case both sides or compare as bytes.
- Trailing whitespace and BOMs: a copied hash with a trailing newline or invisible character will look identical but not match. Trim both sides before comparing.
- Hashing the wrong file: on Windows the download path may be Downloads\file (1).exe rather than Downloads\file.exe if you have downloaded it twice. Verify the path before hashing.
- Treating MD5 or SHA-1 as security guarantees: an attacker can craft a benign-looking file and a malicious one that share the same MD5 in seconds. For security, always use SHA-256 or stronger.
- Verifying the wrong thing: if the publisher hashes the compressed .tar.gz and you hash the extracted contents, the hashes will not match. Hash whatever was hashed, usually the downloaded archive itself.
- Ignoring the GPG signature: many people verify the hash but skip the signature. The signature is what defends against the attacker who controls the mirror.
- Silent network truncation: a truncated download will match a hash that was itself computed from an identically truncated copy, so a match proves nothing unless the reference hash came from the complete file. Always hash the complete file you intend to use, not a chunk.
- Browser caching of old downloads: a stale cached file can make a fresh download appear to fail verification. Force a fresh download if the hash mismatches and you suspect caching.
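The case, whitespace, and BOM pitfalls in the list above can be handled by normalising both strings before comparing them. A minimal Python sketch follows; the helper names `normalize_hex_digest` and `same_digest` are hypothetical.

```python
import string

_HEX_DIGITS = set(string.hexdigits)


def normalize_hex_digest(text: str) -> str:
    """Strip BOMs and surrounding whitespace, lower-case, and validate hex."""
    cleaned = text.replace("\ufeff", "").strip().lower()
    if not cleaned or any(ch not in _HEX_DIGITS for ch in cleaned):
        # Any other invisible or stray character is reported, not ignored.
        raise ValueError(f"not a plain hex digest: {text!r}")
    return cleaned


def same_digest(first: str, second: str) -> bool:
    """Compare two hex digests, ignoring case, whitespace, and BOMs."""
    return normalize_hex_digest(first) == normalize_hex_digest(second)
```

Raising an error on unexpected characters, rather than silently dropping them, is a deliberate choice in this sketch: a zero-width space pasted into a hash is surfaced as a problem with the copied text instead of as a mysterious mismatch.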
Alternative tools and contexts
A web hash calculator is the fastest path when you have one file to check. For repeated use or scripting, command-line tools are the standard.
| Tool | Platform | Strength | Watch out for |
|---|---|---|---|
| Web hash calculator | Browser | No install, file never uploads | One file at a time |
| sha256sum | Linux | Fast, scriptable, GNU coreutils | --check reads SHA256SUMS files |
| shasum -a 256 | macOS, BSD | Bundled, same output format | Different binary name than Linux |
| Get-FileHash | Windows PowerShell | First-class on Windows | Output format differs from sha256sum |
| certutil -hashfile | Windows cmd | Available on every Windows | Verbose output needs parsing |
| openssl dgst -sha256 | Cross-platform | If you already have OpenSSL | Slower than dedicated tools |
| b3sum | Cross-platform | BLAKE3, multi-GB/s throughput | Newer, less ubiquitous |
| rhash | Cross-platform | Computes many algorithms at once | Extra install |
In CI/CD pipelines the same task usually runs as sha256sum file > file.sha256 during build and sha256sum -c file.sha256 during verify, sometimes wrapped in a signed manifest. The principle (compute, publish, compare on retrieval) is identical to what the browser tool does interactively.
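Where sha256sum is not available, the same compute-publish-compare cycle can be approximated in Python. This sketch assumes the simple manifest layout of a 64-character hex digest, a two-character separator, and a file name; it does not handle the escaped file names that GNU sha256sum can emit, and the function names `write_manifest` and `check_manifest` are illustrative.

```python
import hashlib


def sha256_file(path: str) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def write_manifest(paths, manifest_path: str) -> None:
    """Build step: record '<digest>  <path>' for every artifact."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(f"{sha256_file(path)}  {path}\n")


def check_manifest(manifest_path: str) -> dict:
    """Verify step: map each listed path to True (match) or False."""
    results = {}
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\r\n")
            if not line:
                continue
            # 64 hex chars, then a 2-char separator ("  " or " *"), then the name.
            expected, name = line[:64], line[66:]
            try:
                results[name] = sha256_file(name) == expected.lower()
            except OSError:
                results[name] = False  # missing or unreadable file
    return results
```

As with sha256sum -c, relative paths in the manifest are resolved against the current working directory, so the verify step should run from the same directory layout as the build step.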
Privacy and the hash calculator
The hash calculator runs entirely in your browser. The file you select is read with the FileReader API, fed through the Web Crypto SubtleCrypto interface, and the resulting hash is shown to you. The file's bytes never travel to our servers: there is no upload, no log of which files were hashed, and no analytics on file sizes or extensions. For sensitive material (contracts, medical records, private keys), the difference between a tool that uploads and one that hashes locally is the difference between trusting one third party and trusting none. A hash is a tiny output (64 hex characters for SHA-256), but the input it summarises can be very revealing. Keeping that input client-side is the right default for any verification task.
Frequently Asked Questions
How do I compare a file hash to the official one?
After generating the hash, compare it character by character with the hash published by the file's source (usually on the download page). If every character matches, the file is identical to the one the publisher hashed. Even a one-character difference means the file differs from the published one, or that the two hashes were produced with different algorithms.
Which hash algorithm should I use?
SHA-256 is the standard for file verification. Use whichever algorithm the publisher provides. If you have a choice, SHA-256 offers a good balance of security and performance.
Can a corrupted file have the correct hash?
It is theoretically possible (a collision) but statistically negligible with SHA-256. The odds are so astronomically low that for all practical purposes, matching hashes guarantee identical files.
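To put a rough number on "statistically negligible": assuming SHA-256 behaves like an ideal random function (a modelling assumption, not a proven property), the chance that one particular corrupted file has the same digest as the original, and the approximate chance of any accidental collision among n distinct files (the birthday bound), are:

```latex
P(\text{one corrupted file matches}) = 2^{-256} \approx 8.6 \times 10^{-78}

P(\text{any collision among } n \text{ files}) \approx \frac{n^2}{2^{257}},
\qquad n = 10^{12} \;\Rightarrow\; P \approx 4.3 \times 10^{-54}
```

These figures cover accidental matches only; deliberate attacks depend on weaknesses in the algorithm, which is why MD5 and SHA-1 are treated differently.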
Is my file uploaded to a server?
No. The hash is calculated entirely in your browser. Your file never leaves your device, making it safe for any file including sensitive documents.
What is the difference between a hash and a digital signature?
A hash proves a file has not changed since the hash was computed; anyone can verify it. A digital signature proves both integrity AND identity: the publisher signs the hash with their private key, and you verify with their public key. Hashes alone do not protect against an attacker who replaced both the file and the published hash on the same compromised mirror.
Why are MD5 and SHA-1 considered insecure?
Researchers have demonstrated practical collision attacks for both. In 2017 researchers from CWI Amsterdam and Google produced two different PDFs with identical SHA-1 hashes (the SHAttered attack), and MD5 collisions can be generated in seconds on a laptop. For deliberate-tamper detection use SHA-256 or stronger; MD5 and SHA-1 still work for catching accidental corruption but should never be trusted as security boundaries.