Generate MD5 hash of a file in Golang

07 April, 2016

Why generate an MD5 hash of a file?

You may have noticed that some download pages show a strange string of text next to the download link, labelled "checksum", for example "md5 checksum: dcdfb1413d7fa48c3cab920c0448f236". This may look like a bunch of random characters, but don't be fooled: it's a one-way mathematical hash that corresponds to the file you want to download. The idea is to let you check the integrity of the file you downloaded (some more information below). You generate an MD5 hash of the file that you saved to your disk and compare it to the MD5 hash published where you downloaded it. If they don't match, something went wrong while the file was being transferred.
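The comparison itself can be sketched in a few lines of Go. This is only an illustration: matchesChecksum is a made-up helper name, and it hashes data that is already in memory rather than a file on disk. It also assumes the published checksum may arrive in upper case or with stray whitespace, so it normalises that before comparing.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// matchesChecksum is a hypothetical helper (not from any library): it hashes
// data with MD5 and compares the hex digest with the published checksum,
// ignoring case and surrounding whitespace.
func matchesChecksum(data []byte, published string) bool {
	sum := md5.Sum(data) // md5.Sum returns a [16]byte array
	got := hex.EncodeToString(sum[:])
	return got == strings.ToLower(strings.TrimSpace(published))
}

func main() {
	data := []byte("The quick brown fox jumps over the lazy dog")

	// Well-known MD5 digest of the sentence above.
	fmt.Println(matchesChecksum(data, "9e107d9d372bb6826bd81d3542a419d6"))

	// The checksum quoted earlier belongs to some other file, so no match.
	fmt.Println(matchesChecksum(data, "dcdfb1413d7fa48c3cab920c0448f236"))
}
```

The first call prints true and the second prints false.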

Nowadays, with stable broadband internet connections, this is much less of an issue. But back in the days of dial-up internet, errors were common. During transfers packets could get lost, or bits could be misinterpreted by the modem, and because of the avalanche effect even a single wrong bit gives you a completely different hash.
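To get a feel for the avalanche effect, here is a small sketch that flips a single bit in a message and counts how many of the 128 bits in the MD5 digest change. The helper name differingBits and the sample message are made up for this illustration, and math/bits needs Go 1.9 or newer.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"math/bits"
)

// differingBits is a hypothetical helper: it counts how many of the 128
// digest bits differ between the MD5 hashes of a and b.
func differingBits(a, b []byte) int {
	ha, hb := md5.Sum(a), md5.Sum(b)
	n := 0
	for i := range ha {
		// XOR leaves a 1 wherever the two bytes disagree.
		n += bits.OnesCount8(ha[i] ^ hb[i])
	}
	return n
}

func main() {
	original := []byte("hello world")
	corrupted := []byte("hello world")
	corrupted[0] ^= 0x01 // simulate a single bit flipped during transfer

	fmt.Printf("%x\n", md5.Sum(original))
	fmt.Printf("%x\n", md5.Sum(corrupted))
	fmt.Println("differing bits:", differingBits(original, corrupted))
}
```

The two digests printed should look unrelated, even though the inputs differ by one bit.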

It's also good practice in programming, when you transfer files between programs, to check that the received file matches the file that was sent, using an MD5 checksum or another popular algorithm such as CRC or SHA.
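In Go, switching algorithm is mostly a matter of picking a different constructor, because the standard library hashes all implement the hash.Hash interface. The sketch below assumes a small lookup table and a helper called digestOf; both names are invented for this example, and it hashes an in-memory byte slice to keep things short.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"hash/crc32"
)

// algorithms is a hypothetical lookup table from a name to a constructor
// returning a fresh hash.Hash.
var algorithms = map[string]func() hash.Hash{
	"md5":    md5.New,
	"sha256": sha256.New,
	"crc32":  func() hash.Hash { return crc32.NewIEEE() },
}

// digestOf hashes data with the named algorithm and returns the hex digest.
func digestOf(name string, data []byte) (string, error) {
	newHash, ok := algorithms[name]
	if !ok {
		return "", fmt.Errorf("unknown algorithm %q", name)
	}
	h := newHash()
	h.Write(data) // Write on a hash.Hash never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	for _, name := range []string{"md5", "sha256", "crc32"} {
		digest, err := digestOf(name, []byte("abc"))
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-6s %s\n", name, digest)
	}
}
```

The digest length depends on the algorithm: 32 hex characters for MD5, 64 for SHA-256 and 8 for CRC-32.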

 

A Golang function

The function below takes a single argument: the path (absolute or relative) to the file you want to hash. It returns either an empty string and a non-nil error, or a 32-character hexadecimal string and a nil error.

import (
	"crypto/md5"
	"encoding/hex"
	"io"
	"os"
)

func hash_file_md5(filePath string) (string, error) {
	// Declare the result up front so an empty string can be returned on error
	var returnMD5String string

	// Open the file at the given path and check for any error
	file, err := os.Open(filePath)
	if err != nil {
		return returnMD5String, err
	}

	// Make sure the file is closed when the current function returns
	defer file.Close()

	// Create a new MD5 hash to write to
	hash := md5.New()

	// Stream the file contents into the hash and check for any error
	if _, err := io.Copy(hash, file); err != nil {
		return returnMD5String, err
	}

	// Get the 16-byte digest (Sum already returns exactly 16 bytes for MD5)
	hashInBytes := hash.Sum(nil)

	// Convert the bytes to a 32-character hexadecimal string
	returnMD5String = hex.EncodeToString(hashInBytes)

	return returnMD5String, nil
}

Example

This example calculates the MD5 hash of its own executable, using the program path in os.Args[0].

package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

func hash_file_md5(filePath string) (string, error) {
	var returnMD5String string
	file, err := os.Open(filePath)
	if err != nil {
		return returnMD5String, err
	}
	defer file.Close()
	hash := md5.New()
	if _, err := io.Copy(hash, file); err != nil {
		return returnMD5String, err
	}
	hashInBytes := hash.Sum(nil)
	returnMD5String = hex.EncodeToString(hashInBytes)
	return returnMD5String, nil
}

func main() {
	hash, err := hash_file_md5(os.Args[0])
	if err != nil {
		// Report the failure instead of exiting silently
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(hash)
}

This should output something like the following on Windows with Go 1.5.3 and no special compiler options (the hash itself depends on the compiled binary):

C:/Gotests/md5string.exe  [C:/Gotests]
656fc88239fed34577fca4084cf2add6