Monday 26 June 2017

Hashing

Hashing is the process of mapping binary data of variable length to binary data of a fixed size. You use a hash function to transform the data; if you apply the same hash function to two identical pieces of data, the result is always two identical hashes.

Hashing has four main applications:

  • Indexing: hashing is used in hash tables to generate an index. You hash your key and the result is used as the address for your data. Often multiple keys end up with the same hash (a collision), in which case they're grouped together.
  • Data Integrity: if one user wants to send data to another, the sender hashes the data and sends the data, the hash, and the name of the hash algorithm used. The receiver then hashes the received data with the same algorithm and compares the result with the received hash. If they are equal, the data is intact; if they differ, the data was corrupted on the way, maliciously or otherwise. Keep in mind that an attacker who can replace the hash as well as the data will go undetected.
  • Data Authenticity: used when the receiver wants to verify both the authenticity of the sender and the integrity of the data. The sender computes a cryptographic hash of the data and signs it with their own private key. The receiver hashes the received data, decrypts the received signature using the sender's public key, and verifies that the result matches that hash.
  • Password Storage: instead of storing a user's password, you store a hash of it. When the user enters their password, you hash it and compare the two hashes. With a cryptographic hash it is highly unlikely that two different inputs will generate the same hash; in fact, two very similar inputs will generate completely different hashes.
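The Data Authenticity flow above can be sketched with the RSA class from System.Security.Cryptography. This is only a minimal sketch: the class name SignatureSketch and the helpers Sign and Verify are made up for illustration, the key pair is generated on the spot rather than loaded from anywhere, and PKCS#1 padding is an arbitrary choice. SignData hashes the data with SHA-256 and signs the hash in one call; VerifyData re-hashes the data and checks it against the signature.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper names, used only for this illustration.
static class SignatureSketch
{
    // Sender side: hash the data with SHA-256 and sign the hash with the private key.
    public static byte[] Sign(RSA senderPrivateKey, byte[] data)
    {
        return senderPrivateKey.SignData(data, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    // Receiver side: re-hash the data and check the signature with the public key.
    public static bool Verify(RSA senderPublicKey, byte[] data, byte[] signature)
    {
        return senderPublicKey.VerifyData(data, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    static void Main()
    {
        using (RSA sender = RSA.Create())
        using (RSA receiver = RSA.Create())
        {
            // Assumption: the receiver obtained the sender's public key beforehand.
            receiver.ImportParameters(sender.ExportParameters(false));

            byte[] data = Encoding.UTF8.GetBytes("A paragraph of text");
            byte[] signature = Sign(sender, data);

            Console.WriteLine(Verify(receiver, data, signature)); // Displays: True

            data[0] ^= 1; // simulate the data being altered on the way
            Console.WriteLine(Verify(receiver, data, signature)); // Displays: False
        }
    }
}
```

The receiver only ever holds the public half of the key, so it can check signatures but cannot produce them.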

There are two kinds of hash algorithms: those that use a key and those that don't. The algorithms without keys are used to ensure data integrity, while the ones with a key are used for both integrity and authenticity.
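As a rough illustration of the keyed kind, here is a sketch using HMACSHA256, one of the keyed hash algorithms that ships with .NET. The class name HmacSketch, the helper ComputeTag, and the randomly generated 32-byte key are assumptions made for the example; in practice the sender and receiver would have to share the key in advance.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper names, used only for this illustration.
static class HmacSketch
{
    // Computes a keyed hash: only someone holding the same key can reproduce it.
    public static byte[] ComputeTag(byte[] key, byte[] data)
    {
        using (var hmac = new HMACSHA256(key))
        {
            return hmac.ComputeHash(data);
        }
    }

    static void Main()
    {
        // Assumption: a secret key both parties already share (random here).
        byte[] key = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(key);
        }

        byte[] data = Encoding.UTF8.GetBytes("A paragraph of text");
        byte[] tag = ComputeTag(key, data);

        // The receiver recomputes the keyed hash with the shared key and compares.
        Console.WriteLine(ComputeTag(key, data).SequenceEqual(tag)); // Displays: True

        // Someone without the right key cannot produce a matching hash.
        byte[] wrongKey = (byte[])key.Clone();
        wrongKey[0] ^= 1;
        Console.WriteLine(ComputeTag(wrongKey, data).SequenceEqual(tag)); // Displays: False
    }
}
```

SequenceEqual keeps the sketch short, but it is not a constant-time comparison; on newer runtimes CryptographicOperations.FixedTimeEquals is probably the safer choice for comparing keyed hashes.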

Here is an example of ensuring data integrity using the Secure Hash Algorithm (SHA-256):

using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

namespace pc.HashingExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // SHA256 is IDisposable, so release it when we're done.
            using (var sha256 = SHA256.Create())
            {
                // Use an explicit encoding so the bytes (and therefore the
                // hashes) are the same on every machine.
                var data = Encoding.UTF8.GetBytes("A paragraph of text");
                byte[] hashA = sha256.ComputeHash(data);

                data = Encoding.UTF8.GetBytes("A paragraph of text 2");
                byte[] hashB = sha256.ComputeHash(data);

                data = Encoding.UTF8.GetBytes("A paragraph of text");
                byte[] hashC = sha256.ComputeHash(data);

                // A hash is arbitrary binary data, not encoded text,
                // so display it as hexadecimal.
                Console.WriteLine("Hash A");
                Console.WriteLine(BitConverter.ToString(hashA));

                Console.WriteLine("Hash B");
                Console.WriteLine(BitConverter.ToString(hashB));

                Console.WriteLine("Hash C");
                Console.WriteLine(BitConverter.ToString(hashC));

                Console.WriteLine(hashA.SequenceEqual(hashB)); // Displays: False
                Console.WriteLine(hashA.SequenceEqual(hashC)); // Displays: True
            }
        }
    }
}

Notice that if you run the above, hashes A and B are completely different even though the two input strings differ only slightly, while the identical inputs behind hashes A and C produce identical hashes.
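To put a number on "completely different", one option is to count how many bits differ between the two hashes. The sketch below does that for the same two strings; the class name AvalancheSketch and the helper CountDifferingBits are made up for the example. For a good cryptographic hash you would expect roughly half of the 256 bits to differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper names, used only for this illustration.
static class AvalancheSketch
{
    // Counts the bit positions in which two equal-length hashes differ.
    public static int CountDifferingBits(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Hashes must have the same length.");
        }

        int count = 0;
        for (int i = 0; i < a.Length; i++)
        {
            int diff = a[i] ^ b[i]; // a set bit marks a position where the bytes differ
            while (diff != 0)
            {
                count += diff & 1;
                diff >>= 1;
            }
        }
        return count;
    }

    static void Main()
    {
        using (var sha256 = SHA256.Create())
        {
            byte[] hashA = sha256.ComputeHash(Encoding.UTF8.GetBytes("A paragraph of text"));
            byte[] hashB = sha256.ComputeHash(Encoding.UTF8.GetBytes("A paragraph of text 2"));

            int differing = CountDifferingBits(hashA, hashB);
            Console.WriteLine($"{differing} of {hashA.Length * 8} bits differ");
        }
    }
}
```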