What's the purpose of hashing files?
Many websites offering downloads also provide the MD5 or SHA1 hashes of the files which are available for download. What is the security purpose behind providing such a facility to the end-user?
The point of a file hash is that if the data changes, the hash changes. So if you know that the hash matches (i.e. hash(data_you_have) = announced_hash), and if you know that the announced hash is good (i.e. announced_hash = hash(genuine_data)), then you know that you have the right data (data_you_have = genuine_data). So why would you send a hash separately? Because someone might be able to modify the data but not the hash. This makes sense if you send the data through a channel that has high bandwidth but may be subject to data corruption, and the hash through a channel that is more reliable but has little bandwidth.
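The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes SHA-256 (the `hashlib` module also provides `md5` and `sha1` if you need to match digests published in those formats), and the function names `sha256_hex` and `matches_announced_hash` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_announced_hash(data_you_have: bytes, announced_hash: str) -> bool:
    """Check that hash(data_you_have) == announced_hash.

    A match only proves the data is genuine if announced_hash itself
    came from a channel you trust (announced_hash == hash(genuine_data)).
    """
    # Published digests are sometimes upper-case or padded; normalise first.
    return sha256_hex(data_you_have) == announced_hash.strip().lower()


if __name__ == "__main__":
    genuine = b"release-1.0 contents"
    announced = sha256_hex(genuine)         # published over the reliable channel
    corrupted = b"release-1.0 cOntents"     # one byte changed in transit
    print(matches_announced_hash(genuine, announced))    # True
    print(matches_announced_hash(corrupted, announced))  # False
```

Changing a single byte of the data changes the digest completely, which is why the corrupted copy is rejected even though it is almost identical to the genuine one.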
Torrents are a good example. The .torrent file contains a hash of each piece of the files in the torrent. It's only a few kilobytes and can be downloaded easily over the web. The pieces of the torrent, on the other hand, are downloaded from random unknown machines somewhere on the Internet, which you have no reason to trust. So once the data has been downloaded, your torrent software verifies the hash of each piece and rejects any piece whose hash doesn't match.
Here, providing the hashes separately on the web page is redundant if you get the .torrent file from the same source: the .torrent file already carries hashes from that source, so there is no need to verify them separately. If you get the .torrent file from a different source (such as a tracker), that source may itself be corrupted or malicious, and the hashes on the web page let you verify that what you downloaded is the genuine data.
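The per-piece check a torrent client performs can be sketched as follows. This is loosely modelled on BitTorrent v1, where the .torrent file lists a SHA-1 hash for each fixed-size piece (the last piece may be shorter); it deliberately ignores the .torrent file format, multi-file layout and networking, and the helper names `piece_hashes` and `bad_pieces` are hypothetical.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """Split data into fixed-size pieces (the last may be shorter) and SHA-1 each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def bad_pieces(downloaded: bytes, expected: List[bytes], piece_length: int) -> List[int]:
    """Return the indices of pieces whose SHA-1 differs from the expected hash.

    `expected` plays the role of the hashes in the .torrent file (small,
    obtained from a trusted source); `downloaded` came from untrusted peers.
    """
    actual = piece_hashes(downloaded, piece_length)
    if len(actual) != len(expected):
        raise ValueError("downloaded data has the wrong number of pieces")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    genuine = bytes(range(256)) * 4               # 1024 bytes -> 4 pieces of 256
    expected = piece_hashes(genuine, 256)         # what the .torrent would carry
    tampered = bytearray(genuine)
    tampered[300] ^= 0xFF                         # a peer corrupts one byte
    print(bad_pieces(bytes(tampered), expected, 256))  # [1]
```

Because each piece is checked independently, a client only has to re-download the rejected piece (index 1 here) rather than the whole file.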
- If you download the hashes from a website over HTTP, then you need to trust the following parties:
  - the site you're downloading from (to provide “good” data and matching hashes);
  - your ISP, the site's host, and more generally any Internet infrastructure in between;
  - anyone who might have installed malware on your machine, on the site, or in the Internet infrastructure.
- You do not, however, need to trust anyone involved in getting the actual data to you: only those involved in getting the hashes to you.
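The split of trust in the list above can be sketched like this: only the checksum list must come over a trusted channel, while the file itself can come from any mirror. This is a minimal sketch assuming a list in the layout written by GNU `sha256sum` (`<hex digest>  <filename>` per line, with an optional `*` before the filename); `parse_checksum_list`, `file_sha256`, `verify_download` and `example.iso` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def parse_checksum_list(text: str) -> Dict[str, str]:
    """Parse '<hex digest>  <filename>' lines into {filename: digest}.

    Assumes the layout written by GNU `sha256sum`; a leading '*'
    (binary-mode marker) before the filename is dropped.
    """
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        if name.startswith("*"):
            name = name[1:]
        result[name] = digest.lower()
    return result


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, checksum_text: str) -> bool:
    """Return True if the file's SHA-256 matches the listed digest for its name.

    Only checksum_text must come from a trusted channel; the file itself
    may have been fetched from any mirror.
    """
    expected = parse_checksum_list(checksum_text).get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name}")
    return file_sha256(path) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        iso = Path(d) / "example.iso"               # stands in for a mirror download
        iso.write_bytes(b"genuine bytes")
        sums = f"{file_sha256(iso)}  example.iso\n"  # stands in for the trusted list
        print(verify_download(iso, sums))           # True
        iso.write_bytes(b"tampered bytes")
        print(verify_download(iso, sums))           # False
```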
It's up to you to decide whether to trust the site (both to serve useful data and not to be compromised by malware) and whether to trust that your own computer is free of malware. As for the Internet infrastructure, that's a very low risk in practice unless you're using public Wi-Fi. If you download the hashes from an HTTPS website with a valid certificate, this removes the risk from the Internet infrastructure. You still need to trust the website owner, your own machine, and the parties involved in vouching for the certificate (browser vendor, certification authority).
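HTTPS only removes the infrastructure risk if the client actually checks the certificate. A minimal sketch in Python: `ssl.create_default_context()` loads the system's trusted certification authorities and enables certificate and hostname verification, so you are still relying on those authorities, as noted above. The function names are hypothetical, and the sketch simply refuses plain-HTTP URLs rather than handling redirects or retries.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def verifying_context() -> ssl.SSLContext:
    """TLS context that checks the server certificate and hostname."""
    # create_default_context() loads the system CA store and sets
    # verify_mode=CERT_REQUIRED and check_hostname=True.
    return ssl.create_default_context()


def fetch_checksums(url: str, timeout: float = 10.0) -> str:
    """Download a checksum list, refusing anything that is not HTTPS."""
    if urlparse(url).scheme != "https":
        raise ValueError("refusing to fetch hashes over a non-HTTPS URL")
    # With a verifying context, an invalid certificate makes urlopen raise
    # an error instead of returning data.
    with urllib.request.urlopen(url, timeout=timeout, context=verifying_context()) as resp:
        return resp.read().decode("utf-8")
```

A call would look like `fetch_checksums("https://example.com/SHA256SUMS")`, where the URL is only a placeholder for wherever the site actually publishes its hashes.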