NML Says

Data Security 3 - Hashing

References for this Part

Jean-Philippe Aumasson. Serious Cryptography. No Starch Press, 2018. Chapter 6

Model Solutions Previous Lesson

Nothing to show; the exercises were hands-on and practical. Questions are welcome.

Hashing

Here we shall look at hashing, hashing algorithms, and how we logically verify user input against a hash. Your guru (Aumasson, 2018, chapter 6) calls hash functions “the cryptographer’s Swiss Army Knife: they are used in digital signatures, public-key encryption, integrity verification, message authentication, password protection, key agreement protocols, and many other cryptographic protocols.” They are indeed everywhere. First, however, a few words on general terminology, covering encryption, which we looked at previously, as well as hashing:

Hashing / Encryption, Terminology

When we deal with keeping information confidential on any network or computer, we have to know some of the terms of the domain.

Cryptography[1], sometimes also called cryptology, is the practice and study of techniques for secure communication in the presence of third parties called adversaries.

In cryptography, ciphertext (or cyphertext) is the result of encryption performed on plaintext using an algorithm called a cipher.[2]

Decryption is the opposite process, i.e. converting ciphertext back into plaintext. If this is done securely, Alice[3] may encrypt a plaintext message into ciphertext and send it across the network to Bob, who decrypts it back into plaintext with a key and then reads it. Anyone eavesdropping on the message while it traverses the network, Eve the eavesdropper, will only get unintelligible ciphertext. The message stays confidential.

Hashing is in some respects similar, and in others, fundamentally different.

A cryptographic hash function (CHF) is a hash function that is suitable for use in cryptography. It is a mathematical algorithm that maps data of arbitrary size (often called the “message”) to a bit string of a fixed size (the “hash value”, “hash”, or “message digest”) and is a one-way function …

Please notice that we do not call hashed text ciphertext, but message digest, or just digest. The last part of the definition above, the practical irreversibility of hash functions, is exactly what we want when obfuscating passwords. There is no key for Eve to get hold of. The only practical way to crack a good hash is brute force, i.e. systematically guessing plaintexts, hashing each guess, and comparing it with the digest. The strength of a CHF is measured by the time it would take a modern computer to find the plaintext, the preimage, this way. Here we are preferably talking years, many years, even many, many years.
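To make “years, many years” concrete, here is a back-of-the-envelope calculation in Python. It is a sketch under stated assumptions: the password uses only lowercase letters and digits and is exactly 10 characters long, and the two guess rates are assumed orders of magnitude for a fast hash and a deliberately slow one, not measured benchmarks. The function name years_to_exhaust is ours.

```python
# Worst-case brute-force time = keyspace / guess rate.
# The guess rates below are ASSUMED orders of magnitude, not benchmarks.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Years needed to try every password of exactly `length` symbols."""
    keyspace = alphabet_size ** length        # number of possible passwords
    return keyspace / guesses_per_second / SECONDS_PER_YEAR

# 36 symbols: lowercase a-z plus digits 0-9, password length 10
fast = years_to_exhaust(36, 10, 1e10)   # assumed rate for a fast hash such as SHA-256 on GPUs
slow = years_to_exhaust(36, 10, 1e4)    # assumed rate for a deliberately slow hash such as bcrypt

print(f"fast hash: {fast * 365:.1f} days")
print(f"slow hash: {slow:,.0f} years")
```

With these assumptions, exhausting the keyspace takes about four days against the fast hash, but more than eleven thousand years against the slow one; on average an attacker succeeds after searching half the keyspace. Slowness is a feature here, which is one reason bcrypt is preferred for passwords.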

As you have already realized, we hash passwords to obfuscate them beyond recognition before we store them. Encryption would entail the possibility of decryption, and we neither want nor need that. To crack a message digest you must hash a candidate plaintext, compare the result with the stored password digest, and, if there is no match, try another, then another, then …

Knowing the hashed representation of the password does not, in itself, help you.
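The hash-compare-repeat loop described above can be sketched in Python as a dictionary attack. This is a minimal illustration, assuming the passwords were stored as unsalted SHA-256 digests; WORDLIST is a tiny made-up stand-in for a real list such as rockyou.txt, and sha256_hex and dictionary_attack are hypothetical helper names. The target is the SHA-256 digest of test, as printed later in this lesson by Examples 3 and 4.

```python
import hashlib

# A tiny, made-up wordlist standing in for a real one such as rockyou.txt
WORDLIST = ["123456", "password", "qwerty", "test", "letmein"]

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def dictionary_attack(target_digest: str, candidates):
    """Hash each candidate and compare; return the first match, else None."""
    for guess in candidates:
        if sha256_hex(guess) == target_digest:
            return guess
    return None

# The stolen, unsalted SHA-256 digest of "test"
stolen = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
print(dictionary_attack(stolen, WORDLIST))
```

Running it prints test. The attacker never needs a key; the stolen digest plus a good guess list is enough, which is why a password found in such a list is a bad password.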

The Mechanics

Let me outline hashing and brute forcing, inspired by (Aumasson, 2018, chapter 6):

Example 1. Hashing Illustrated
/*
 * AN ILLUSTRATION OF THE IDEA, USING NODE'S BUILT-IN crypto MODULE
 *
 * input: plaintext
 * output: message digest, as a hexadecimal string
 * algorithm: SHA-256, a publicly known and vetted hashing algorithm
 * duration: short enough to warrant synchronicity
 * warning: Do Not Write the Algorithm Yourself
 */
const crypto = require('crypto');

const hash = function (plaintext) {
    const messageDigest = crypto.createHash('sha256').update(plaintext).digest('hex');
    return messageDigest;
}
Example 2. Brute Force Illustrated
/*
 * AN ILLUSTRATION OF THE IDEA, REUSING hash() FROM EXAMPLE 1
 *
 * input: message digest, and an iterable of candidate plaintexts
 * output: the plaintext producing the digest, or null if no candidate does
 * duration: measurable, possibly significant, measure of quality
 * warning: You Should Probably Not Write This Yourself
 */
const crack = function (md, candidates) {
    for (const niceTry of candidates) {         // guess systematically
        if (hash(niceTry) === md) return niceTry;
    }
    return null;                                // every guess failed
}

Usage

Most programming languages offer hashing functions with a range of different algorithms. The algorithm we choose, and we should choose a good one, has repercussions on other parts of the software we are building. It affects, for example, the layout of the database user table or collection we authenticate our users against.

It should also be stated that unless you are totally sure of what you are doing, you should NOT use homemade hashing, with or without salt. It is not secure. Instead, use the publicly known and recognized functions available in the language of your application.

We may have hinted that the primary use of hashing is to obfuscate passwords. Aumasson (2018, chapter 6) says otherwise:

The notion of security for hash functions is different from what we’ve seen thus far. Whereas ciphers protect data confidentiality in an effort to guarantee that data sent in the clear can’t be read, hash functions protect data integrity in an effort to guarantee that data — whether sent in the clear or encrypted — hasn’t been modified. If a hash function is secure, two distinct pieces of data should always have different hashes. A file’s hash can thus serve as its identifier.

Consider the most common application of a hash function: digital signatures, or just signatures. When digital signatures are used, applications process the hash of the message to be signed rather than the message itself, as shown in Figure 6-2. The hash acts as an identifier for the message. If even a single bit is changed in the message, the hash of the message will be totally different. The hash function thus helps ensure that the message has not been modified. Signing a message’s hash is as secure as signing the message itself, and signing a short hash of, say, 256 bits is much faster than signing a message that may be very large. In fact, most signature algorithms can only work on short inputs such as hash values.
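The quote’s claim that a tiny change produces a totally different hash can be checked with Python’s hashlib. This sketch changes a single character of a made-up message, counts how many of the 256 SHA-256 output bits differ, and then performs the receiver’s integrity check by recomputing the digest. sha256_hex is a hypothetical helper name.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"Pay Bob 100 EUR"       # made-up message
tampered = b"Pay Bob 900 EUR"       # the same message with one character changed

d1, d2 = sha256_hex(original), sha256_hex(tampered)
print(d1)
print(d2)

# XOR the digests as integers and count the 1-bits: the number of differing bits
diff_bits = bin(int(d1, 16) ^ int(d2, 16)).count("1")
print(f"{diff_bits} of 256 bits differ")

# Integrity check on arrival: recompute the digest and compare with the sent one
received = tampered
print("intact" if sha256_hex(received) == d1 else "modified")
```

Typically around half of the bits differ. Note that a bare digest sent next to the message only detects accidental or naive changes: an attacker who can replace the message can replace the digest too, which is why the hash is signed.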

So, according to Aumasson, the main use of hashes is signing documents or mails for transmission. We talked about that in a previous lesson; it is often a feature of your mail user agent, the general term for a mail program. In our context, however, we focus on hashing for authentication, i.e. for password obfuscation.

To work with hashing and encryption we need the necessary software.

For JavaScript
The two essential Node modules are bcryptjs and crypto-js; for hashing you need just one of them. bcryptjs still seems to be current best practice for passwords. The other, crypto-js, offers other standardized algorithms, and also has encryption functions.
For Python
The modules to import are bcrypt, which must be installed (for example with pip install bcrypt), and hashlib, which is part of the standard library.
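A quick way to see what your Python installation offers, as a sketch: hashlib.algorithms_guaranteed lists the algorithms available on every platform, and importlib.util.find_spec reports whether the third-party bcrypt package is installed without importing it.

```python
import hashlib
import importlib.util

# Algorithms that hashlib provides on every platform
print(sorted(hashlib.algorithms_guaranteed))

# bcrypt is a third-party package; find_spec returns None when it is missing
if importlib.util.find_spec("bcrypt") is None:
    print("bcrypt is not installed; try: pip install bcrypt")
else:
    print("bcrypt is installed")
```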

Let us see them in practice:

Example 3. Hashing Example in JavaScript
// runme.js: hash the command-line argument with several algorithms.
// SHA256, SHA512 and BCRYPT are run twice each: the SHA digests repeat,
// the bcrypt hashes do not, because bcrypt picks a new random salt per call.
const MD5 = require("crypto-js/md5");
const SHA1 = require("crypto-js/sha1");
const SHA256 = require("crypto-js/sha256");
const SHA512 = require("crypto-js/sha512");
const bcrypt = require('bcryptjs');

const argv = process.argv;
const argc = argv.length;

if (argc < 3) {
    console.error('usage: node runme.js <plaintext>');
    process.exit(1);
}

let algo, hash;
const input = argv[2];

// crypto-js returns a WordArray; the template literal converts it to hex
algo = 'MD5';
hash = `${MD5(input)}`;
console.log(`${algo}\t${hash.length}\t${hash}\n`);

algo = 'SHA1';
hash = `${SHA1(input)}`;
console.log(`${algo}\t${hash.length}\t${hash}\n`);

algo = 'SHA256';
hash = `${SHA256(input)}`;
console.log(`${algo}\t${hash.length}\t${hash}`);

algo = 'SHA256';
hash = `${SHA256(input)}`;
console.log(`${algo}\t${hash.length}\t${hash}\n`);

algo = 'SHA512';
hash = `${SHA512(input)}`;
console.log(`${algo}\t${hash.length}\t${hash}`);

algo = 'SHA512';
hash = `${SHA512(input)}`;
console.log(`${algo}\t${hash.length}\t${hash}\n`);

// bcryptjs generates a random salt (default cost 10) when none is given
algo = 'BCRYPT';
hash = bcrypt.hashSync(input);
console.log(`${algo}\t${hash.length}\t${hash}`);

algo = 'BCRYPT';
hash = bcrypt.hashSync(input);
console.log(`${algo}\t${hash.length}\t${hash}`);

Usage:

$ node runme.js test 
MD5     32      098f6bcd4621d373cade4e832627b4f6

SHA1    40      a94a8fe5ccb19ba61c4c0873d391e987982fbbd3

SHA256  64      9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
SHA256  64      9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08

SHA512  128     ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff
SHA512  128     ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff

BCRYPT  60      $2a$10$wHDAWGHfNzFwten5wVh1KOSKq63CGTh0bJMBIeXUSoPJyRtRN6I2O
BCRYPT  60      $2a$10$EFyB0w0BbbAnlpQqIjggT.9mFfZKdBJ9cAlmjA62POYxwk7hlvd5m
Example 4. Hashing Example in Python
import bcrypt
import hashlib
import sys

argv = sys.argv
password = argv[1]                                  # pwd from cli
justpwd = password.encode()                         # to byte sequence
saltedpwd = (password + "qpurhafherue").encode()    # fixed hand-made salt, unused below; try it instead of justpwd

hash = hashlib.md5(justpwd)                         # hash, salting optional
hash = hash.hexdigest()                             # to hexadecimal string
hash = f"MD5\t{len(hash)}\t{hash}"                  # format
print(hash, "\n")


hash = hashlib.sha1(justpwd)
hash = hash.hexdigest()
hash = f"SHA1\t{len(hash)}\t{hash}"
print(hash)

hash = hashlib.sha1(justpwd)
hash = hash.hexdigest()
hash = f"SHA1\t{len(hash)}\t{hash}"
print(hash, "\n")


hash = hashlib.sha256(justpwd)
hash = hash.hexdigest()
hash = f"SHA256\t{len(hash)}\t{hash}"
print(hash)

hash = hashlib.sha256(justpwd)
hash = hash.hexdigest()
hash = f"SHA256\t{len(hash)}\t{hash}"
print(hash, "\n")


hash = hashlib.sha512(justpwd)
hash = hash.hexdigest()
hash = f"SHA512\t{len(hash)}\t{hash}"
print(hash)

hash = hashlib.sha512(justpwd)
hash = hash.hexdigest()
hash = f"SHA512\t{len(hash)}\t{hash}"
print(hash, "\n")


salt = bcrypt.gensalt()                             # generate salt
hash = bcrypt.hashpw(justpwd, salt)                 # hash with salt, required
hash = f"{hash.decode('utf-8')}"                    # byte seq to string
hash = f"BCRYPT\t{len(hash)}\t{hash}"               # format
print(hash)

salt = bcrypt.gensalt()
hash = bcrypt.hashpw(justpwd, salt)
hash = f"{hash.decode('utf-8')}"
hash = f"BCRYPT\t{len(hash)}\t{hash}"
print(hash)

Usage:

$ python runme.py test 
MD5     32      098f6bcd4621d373cade4e832627b4f6 

SHA1    40      a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
SHA1    40      a94a8fe5ccb19ba61c4c0873d391e987982fbbd3 

SHA256  64      9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
SHA256  64      9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 

SHA512  128     ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff
SHA512  128     ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff 

BCRYPT  60      $2b$12$VAqqHGdp/L4r1g6F.oeo6.xXTX.shGkckwZ5F9JnNIkRVQMeTdv8W
BCRYPT  60      $2b$12$TYKfbIWeolRzCdDCeDewVexcHvuxwZam8xf74aCahN0REOCS8ABu2

Take a detailed look. Notice that the length of each hash tells you how wide the database column that stores the hashed password in an application must be. The length is fixed per algorithm, regardless of the length of the password.
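The widths above follow from the digest sizes: hex encoding spends two characters per byte. This Python sketch derives them from hashlib rather than counting characters by hand; the dictionary name widths is ours. bcrypt strings are a fixed 60 characters, as the outputs show. Note that SQLite does not enforce declared lengths such as varchar(32), while many other database systems do.

```python
import hashlib

# Hex digest length = digest size in bytes * 2 (two hex characters per byte)
widths = {name: hashlib.new(name).digest_size * 2
          for name in ("md5", "sha1", "sha256", "sha512")}

for name, width in widths.items():
    print(f"{name:8}{width:4}")

BCRYPT_WIDTH = 60   # fixed length of a bcrypt string, per the outputs above
```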

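Please notice also that the two bcrypt calls in each program, four in all, produce four different hashes of the same password, yet they all verify as correct with bcryptjs’s compareSync or Python’s bcrypt.checkpw. This is because bcrypt generates a new random salt on every call and stores it inside the resulting hash string.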

What Is Salt?

Salting is a technique that makes hashed passwords even harder to brute force. Please refer to the article Salt (cryptography), which states that “salt is random data that is used as an additional input to a one-way function that hashes data.”

Study that, and put special emphasis on the section Example usage, as well as on the following quote:

… a salt cannot protect common or easily guessed passwords. Without a salt, the hashed value is the same for all users that have a given password, making it easier for hackers to guess the password from the hashed value …
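The effect described in the quote can be demonstrated with Python’s standard library alone. This is a sketch that swaps in PBKDF2-HMAC-SHA256 (hashlib.pbkdf2_hmac) for bcrypt, because it makes the salt explicit; bcrypt does the same salting internally. The iteration count is an assumed, illustrative work factor, and salted_hash and verify are hypothetical helper names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # illustrative work factor; follow current guidance in real use

def salted_hash(pwd: bytes, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 of pwd with the given salt."""
    return hashlib.pbkdf2_hmac("sha256", pwd, salt, ITERATIONS)

def verify(entered: bytes, stored_salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the entered password with the STORED salt, compare in constant time."""
    return hmac.compare_digest(salted_hash(entered, stored_salt), stored_hash)

password = b"test"

# Without salt: two users with the same password get the same digest
alice_plain = hashlib.sha256(password).hexdigest()
bob_plain = hashlib.sha256(password).hexdigest()
print("unsalted equal:", alice_plain == bob_plain)

# With salt: each user gets a fresh random salt, stored next to the hash
alice_salt, bob_salt = os.urandom(16), os.urandom(16)
alice_salted = salted_hash(password, alice_salt)
bob_salted = salted_hash(password, bob_salt)
print("salted equal:  ", alice_salted == bob_salted)

print("alice verifies:", verify(b"test", alice_salt, alice_salted))
```

The unsalted digests are equal, the salted ones differ, and verification succeeds only because the user’s own salt is stored and reused. hmac.compare_digest compares in constant time, so the comparison itself leaks no timing hints.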

Verify the Hash

Example 5. The JavaScript Way
// verifyps.js

const bcrypt = require('bcryptjs');

const argv = process.argv;
const argc = argv.length;

const verify = function (user, pwd) {
    // read user in db
    // extract password as readpwd, here we use a known hash for demo
    let readpwd = "$2a$10$wHDAWGHfNzFwten5wVh1KOSKq63CGTh0bJMBIeXUSoPJyRtRN6I2O";
    let rc = bcrypt.compareSync(pwd, readpwd);
    return rc;
}

let enteredpassword = argv[2];
if (verify('dummy', enteredpassword)) {
    console.log('You chose wisely, you are in');
} else {
    console.log('Try again');
}

Usage:

$ node verifyps.js test
You chose wisely, you are in
~/exercises/ds3 $ node verifyps.js test1
Try again
Example 6. The Python Way
# verifyps.py

import bcrypt
import sys

def verify(user, pwd):
    # read user in db
    # extract password as readpwd, here we use a known hash for demo
    readpwd = "$2a$10$wHDAWGHfNzFwten5wVh1KOSKq63CGTh0bJMBIeXUSoPJyRtRN6I2O".encode()
    rc = bcrypt.checkpw(pwd, readpwd)
    return rc

if __name__ == "__main__":
    argv = sys.argv
    enteredpassword = argv[1].encode()
    if verify('dummy', enteredpassword):
        print('You chose wisely, you are in')
    else:
        print('Try again')

Usage:

$ python verifyps.py test 
You chose wisely, you are in
~/exercises/ds3 $ python verifyps.py test1
Try again

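bcrypt needs the stored password digest because the salt, and the cost factor, are embedded in it. It extracts them, hashes the entered password with that same salt and cost, and compares the result with the stored digest. Without the original salt, a correct password would produce a different hash, and verification would be meaningless.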
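You can see where the salt lives by taking the stored hash from Example 5 apart. This Python sketch only splits the string, it does not compute bcrypt; parse_bcrypt is a hypothetical helper, and the field layout is bcrypt’s modular crypt format: version, cost, then 22 characters of salt followed by 31 characters of hash.

```python
# Layout of a bcrypt string:  $<version>$<cost>$<22-char salt><31-char checksum>
stored = "$2a$10$wHDAWGHfNzFwten5wVh1KOSKq63CGTh0bJMBIeXUSoPJyRtRN6I2O"

def parse_bcrypt(h: str) -> dict:
    _, version, cost, rest = h.split("$")   # leading "$" yields an empty first field
    return {
        "version": version,          # bcrypt variant, e.g. 2a or 2b
        "cost": int(cost),           # work factor: 2**cost iterations of the key setup
        "salt": rest[:22],           # 16 random bytes in bcrypt's own base64
        "checksum": rest[22:],       # 23 bytes of hash output in base64
    }

parts = parse_bcrypt(stored)
for key, value in parts.items():
    print(f"{key:9}{value}")
```

A verification function such as checkpw reads version, cost and salt from this string, hashes the entered password with them, and compares the result with the stored checksum. That is also why every bcrypt string, whatever the password, is 60 characters long.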

Just One More Thing on Hashing

The way your operating system stores passwords is also interesting, and gives some more perspective on hashing. For Linux especially, though not only, you might want to read https://crypto.stackexchange.com/questions/40841/what-is-the-algorithm-used-to-encrypt-linux-passwords and, for more background, https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords

Exercises

The rules for handing in assignments may be found in the README.

Exercise DS.3.0

In this assignment you must write a program that simulates a user registration process.

The main requirements

  1. The userid and password may be prompted for by the program, or read from the CLI.
  2. The password must not be found in the rockyou.txt list of 14.3 million compromised passwords.
  3. A non-compliant password must be rejected with a message.
  4. Success must be acknowledged.
  5. The password must be hashed with the bcrypt algorithm.
  6. On success, the userid and hashed password must be inserted into the database test.db.
Re 2. Rockyou

To get rockyou.txt, clone the repository https://gitlab.com/arosano/rockyou.git and unzip the file rockyou.zip. This gives you rockyou.txt, which holds 14.3 million passwords.

The Rockyou class applies the Singleton pattern, which ensures that the data exists only once. In the Python version, an attempt to create a Rockyou returns the already existing object if there is one; otherwise it is created. The JavaScript version achieves the same effect by keeping the list in a static field that is filled only once. Either way the big file occupies memory once, and not once per call.
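The Singleton idea can be tried out without the big file. This Python sketch uses a hypothetical Blocklist class with a three-word set in place of rockyou.txt, and a counter that shows the expensive loading step runs only once, however many times the class is instantiated.

```python
class Blocklist:
    """Hypothetical stand-in for Rockyou: loads its data only once."""
    _instance = None
    loads = 0                                    # how often the data was loaded

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._words = cls._load()   # expensive step, done once
        return cls._instance

    @classmethod
    def _load(cls):
        cls.loads += 1
        # A real version would read rockyou.txt here
        return {"123456", "password", "test"}

    def search(self, pwd: str) -> bool:
        return pwd in self._words

a = Blocklist()
b = Blocklist()
print(a is b, Blocklist.loads)
print(a.search("test"), a.search("abc123ghj"))
```

Design note: this sketch stores the words in a set, which gives constant-time lookups on average, whereas the lists in Examples 7 and 8 are searched linearly. For 14.3 million entries a set costs more memory, so it is a trade-off.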

Example 7. JavaScript Code Rockyou.js
/*
        Rockyou.js
        Rockyou as Singleton, lean and mean
*/

const fs = require('fs');

module.exports = class Rockyou {
        static #rockyou = [];                // 14.3 million bad passwords
        static #filename = './rockyou.txt';
        
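        constructor() {
                // read the file only the first time a Rockyou is created
                if (Rockyou.#rockyou.length === 0) {
                        Rockyou.#rockyou = fs.readFileSync(Rockyou.#filename, 'utf8').split('\n');
                }
        }

        // true if pwd is among the compromised passwords (linear search)
        search(pwd) {
                return Rockyou.#rockyou.includes(pwd);
        }
}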
Example 8. Python Code Radical Rockyou Singleton, rock.py
'''
    Rockyou as a singleton, lean and mean
'''

class Rockyou(object):
    _instance = None
    _miserable_passwords = []

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(Rockyou, cls).__new__(cls)
            with open('./rockyou.txt', 'r', errors='replace') as f:
                cls._miserable_passwords = [line.rstrip() for line in f]
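        return cls._instance

    def search(self, pwd):
        # True if pwd is among the compromised passwords (linear search)
        return pwd in self._miserable_passwords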
Re 6. The Database

The database can be created from the CLI. It is a one time thing.

The database software sqlite3 is installed by default on many modern operating systems. If it is not on yours, install it.

Example 9. Create a Database as Follows, CLI:
$ sqlite3 test.db 
SQLite version 3.44.2 2023-11-24 11:41:44
Enter ".help" for usage hints.
sqlite> create table user (
(x1...> id integer primary key autoincrement,
(x1...> userid varchar(32) unique not null,
(x1...> password blob not null
(x1...> );
sqlite> 

The database may be used from either JavaScript or Python as you please.
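The table from Example 9 can also be created and used from Python. This sketch uses an in-memory database so it leaves test.db untouched, and shows how the unique constraint on userid surfaces as sqlite3.IntegrityError, which a register function could turn into a False return. insert_user and the placeholder hash strings are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")        # throwaway database for the demo
con.execute("""create table user (
    id integer primary key autoincrement,
    userid varchar(32) unique not null,
    password blob not null
)""")

def insert_user(userid: str, pwd_hash: str) -> bool:
    """Return True if inserted, False if the userid is already taken."""
    try:
        with con:                        # commits on success, rolls back on error
            con.execute("insert into user(userid, password) values (?, ?)",
                        (userid, pwd_hash))
        return True
    except sqlite3.IntegrityError:       # raised by the unique constraint
        return False

# placeholder strings; a real program stores a bcrypt hash here
print(insert_user("ls4", "placeholder-hash"))
print(insert_user("ls4", "placeholder-hash"))
```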

The Application
Example 10. Driving Code Hint JavaScript
// register.js

const bcrypt = require('bcryptjs');
const sqlite3 = require('better-sqlite3');
const Rockyou = require('./Rockyou.js');

const argv = process.argv;
const argc = argv.length;
const db = new sqlite3('test.db');              // includes connect

const register = function(user, pwd) {
    // hash password
    // insert
    // return true if ok else false
}


let entereduser = argv[2];
let enteredpassword = argv[3];
let rockyou = new Rockyou();

let exists = rockyou.search(enteredpassword);
// if exists error
// if not exists register

db.close();

Usage:

# tba if or when example 10 has been completed
Example 11. Driving Code Hint Python
# register.py

# https://docs.python.org/3/library/sqlite3.html for doc
# https://www.slingacademy.com/article/python-sqlite3-insert-new-row-get-id/

import bcrypt
import rock
import sqlite3
import sys

def register(user, pwd):
    con = sqlite3.connect("test.db")
    cur = con.cursor()
    # hash password
    # insert
    # return True if success
    # False if not

if __name__ == "__main__":

    # get input from cli
    entereduser = sys.argv[1]
    enteredpassword = sys.argv[2]
    rockyou = rock.Rockyou()
    exists = rockyou.search(enteredpassword)
    # if exists error
    # else register

Usage from a completed version of the code from example 11:

$ python register.py ls4 abc123ghj
inserted
$ python register.py nml test
inadequate password 
$ sqlite3 test.db             
SQLite version 3.44.2 2023-11-24 11:41:44
Enter ".help" for usage hints.
sqlite> select * from user;
6|ls4|$2b$12$sU1uR7S5cioCg0RotY.oPu4Y1.Q5PJd4dvJqYhqyh61lDdldc.J3O
sqlite> 
Extra Hints

To compensate for reported lack of SQL experience, we have created two programs that, together with all the other hints given above, should enable you to do the database work.

Example 12. Three SQL Examples in One Program, testsqlite3.py
# testsqlite3.py

import sqlite3
import sys

argv = sys.argv
argc = len(argv)
con = sqlite3.connect('test.db')
cur = con.cursor()

# insert into the database table - necessary in register
sql = "insert into user(id, userid, password) values (null, ?, ?)"
res = cur.execute(sql, (argv[1], 'x'))
con.commit()

# select all rows from table
sql = 'select * from user'
res = cur.execute(sql)
print(res.fetchall())                   # list of (id, userid, password) tuples


# select one from table - necessary before compare/verify
sql = 'select userid, password from user where userid = ?'
res = cur.execute(sql, (argv[1],))      # parameters must be a sequence: note the comma
print(res.fetchone())                   # (userid, password), or None if not found
        
con.close()
Example 13. Three SQL Examples in One Program, testsqlite3.js
// testsqlite3.js

const sqlite3 = require('better-sqlite3');
// https://www.npmjs.com/package/better-sqlite3

const argv = process.argv;
const argc = argv.length;
const db = new sqlite3('test.db');

let res, rows, sql, stmt;

// insert into the database table - necessary in register
sql = "INSERT INTO user(userid, password) VALUES (?, ?)";
stmt = db.prepare(sql);
res = stmt.run(argv[2], 'x');
console.log(res);


// select all rows from table
sql = 'select * from user';
stmt = db.prepare(sql);
rows = stmt.all();
console.log(rows);

// select one from table - necessary before compare/verify
sql = 'select userid, password from user where userid = ?';
stmt = db.prepare(sql);
rows = stmt.get(argv[2]);
console.log(`${rows.userid}, password=${rows.password}`);
        
if (db) db.close();

The code simulates what a web application should do to ensure that users choose better passwords. You may add a password length requirement too, if you like.
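If you add a length requirement, one way, sketched in Python, is a single function that returns the reason for rejection, or None for an acceptable password. MIN_LENGTH = 10 is an assumed policy value, password_problem is a hypothetical name, and the three-word leaked set stands in for a Rockyou search.

```python
from typing import Optional

MIN_LENGTH = 10   # assumed policy value, not part of the exercise requirements

def password_problem(pwd: str, compromised) -> Optional[str]:
    """Return the reason to reject pwd, or None if it is acceptable."""
    if len(pwd) < MIN_LENGTH:
        return f"too short, use at least {MIN_LENGTH} characters"
    if pwd in compromised:               # in the exercise: a Rockyou search
        return "found in a list of compromised passwords"
    return None

leaked = {"123456", "password", "iloveyou1234"}   # tiny stand-in for rockyou.txt

for candidate in ("test", "iloveyou1234", "correct horse battery"):
    print(candidate, "->", password_problem(candidate, leaked) or "ok")
```

Checking the length first is cheap; the list lookup then only runs for passwords that pass the simple rule.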