Data Security 3 - Hashing
References for this Part
Jean-Philippe Aumasson. Serious Cryptography. No Starch Press, 2018. Chapter 6
Model Solutions Previous Lesson
Nothing to show. The exercises were hands on practical. Questions are welcome.
Hashing
Here we shall look at hashing, hashing algorithms, and how we verify user input against a hash. Your guru (Aumasson, 2018, chapter 6) calls hash functions “the cryptographer’s Swiss Army Knife: they are used in digital signatures, public-key encryption, integrity verification, message authentication, password protection, key agreement protocols, and many other cryptographic protocols.” They are indeed everywhere, but first a few words on general terminology. We looked at encryption previously; in general terms:
Hashing / Encryption, Terminology
To keep information confidential on any network or computer, we have to recognize some of the terms of the domain.
Cryptography, sometimes also called cryptology, is the practice and study of techniques for secure communication in the presence of third parties called adversaries.
In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher.
Decryption is the opposite process, ie converting ciphertext into plaintext. If this is done securely, Alice may encrypt a plaintext message into ciphertext and send it across the network to Bob, who may decrypt it back into plaintext with a key and then read it. Any eavesdropping on the message while it is traversing the net will only give Eve, the eavesdropper, some unintelligible ciphertext. The message is confidential.
Hashing is in some respects similar, and in others, fundamentally different.
A cryptographic hash function (CHF) is a hash function that is suitable for use in cryptography. It is a mathematical algorithm that maps data of arbitrary size (often called the “message”) to a bit string of a fixed size (the “hash value”, “hash”, or “message digest”) and is a one-way function …
Please notice that we do not call hashed text ciphertext, but message digest, or just digest. The last part of the definition above, the practical irreversibility of hash functions, is exactly what we are looking for when obfuscating passwords. There is no key for Eve to get. The only way to crack a hash is to apply brute force, ie systematically guessing the plaintext that produces the hash. The strength of a CHF depends on the time it will take a modern computer to find the plaintext, the preimage, by brute force. Here we are preferably talking years, many years, even many, many years.
As you have already realized, we use hashing for obfuscating passwords beyond recognition before we store them. Encryption would entail the possibility of decryption. We don’t want that. We don’t need that. To crack a message digest you will need to hash a string of plaintext, match the result against the stored digest, and, if there’s no match, try another, then another, then …
Knowing the hashed representation of the password does not, by itself, give you the password.
The Mechanics
Let me outline hashing and brute forcing, inspired by (Aumasson, 2018, chapter 6):
Example 1. Hashing Pseudo Code
|
|
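A minimal Python sketch of the idea, assuming SHA-256 from the standard `hashlib` module as the CHF (a general-purpose hash, not a password hashing function). The helper name `digest` is made up for this illustration. It shows that input of any size maps to a digest of fixed size, and that a small change in the input gives a different digest.

```python
import hashlib


def digest(message: str) -> str:
    """Return the SHA-256 digest of message as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


short_digest = digest("abc")
long_digest = digest("abc" * 100_000)   # 300,000 characters of input
changed_digest = digest("abd")          # one character differs from "abc"

print(short_digest)
print(len(short_digest), len(long_digest))   # same length for both inputs
print(short_digest == changed_digest)        # False: small change, new digest
```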
Example 2. Brute Force Pseudo Code
|
|
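Brute forcing can be sketched in Python along the same lines. This is an illustration under stated assumptions, not the book's listing: the digest is unsalted SHA-256, the candidate alphabet is lowercase letters only, and the names `sha256_hex` and `brute_force` are hypothetical.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def sha256_hex(text: str) -> str:
    """Hash text with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_digest: str, max_length: int = 4):
    """Try every lowercase string up to max_length; return the preimage or None."""
    for length in range(1, max_length + 1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if sha256_hex(candidate) == target_digest:
                return candidate
    return None


stored = sha256_hex("cab")      # pretend this digest was leaked
print(brute_force(stored))      # the loop eventually reaches "cab"
```

The search space grows as 26 to the power of the length for this alphabet, so every extra character multiplies the work. A fast function such as SHA-256 makes each guess cheap, which is one reason deliberately slow functions such as bcrypt are preferred for passwords.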
Usage
All programming languages have functions for hashing with an array of different hashing algorithms. Choose a good one, and be aware that the choice has repercussions for other parts of whatever software you are building. It affects, for example, the layout of the database user table or collection you are authenticating your users against.
It should also be stated that unless you are totally sure of what you are doing, do NOT use home made hashing, with or without salt. It is not secure. Instead you should apply publicly known and recognized functions from whatever language you use for your applications.
We may have hinted that the primary use of hashing is to create obfuscated passwords. Aumasson (2018, chapter 6) says otherwise. He holds that:
The notion of security for hash functions is different from what we’ve seen thus far. Whereas ciphers protect data confidentiality in an effort to guarantee that data sent in the clear can’t be read, hash functions protect data integrity in an effort to guarantee that data — whether sent in the clear or encrypted — hasn’t been modified. If a hash function is secure, two distinct pieces of data should always have different hashes. A file’s hash can thus serve as its identifier.
Consider the most common application of a hash function: digital signatures, or just signatures. When digital signatures are used, applications process the hash of the message to be signed rather than the message itself, as shown in Figure 6-2. The hash acts as an identifier for the message. If even a single bit is changed in the message, the hash of the message will be totally different. The hash function thus helps ensure that the message has not been modified. Signing a message’s hash is as secure as signing the message itself, and signing a short hash of, say, 256 bits is much faster than signing a message that may be very large. In fact, most signature algorithms can only work on short inputs such as hash values.
This means that signing documents or mails for transmission is the main use of hashes. We have already talked about that in a previous lesson, and it is often done as a feature of your mail user agent, the general term for mail program. In our context, however, we focus on the use in authentication, ie for password obfuscation.
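The integrity use described in the quote can be sketched in a few lines of Python. This assumes SHA-256 from `hashlib`; the function name `fingerprint` and the messages are made up for the illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"Pay Bob 100 kr."
published = fingerprint(original)    # short identifier for the message

received_ok = b"Pay Bob 100 kr."
received_bad = b"Pay Bob 900 kr."    # one character changed in transit

print(fingerprint(received_ok) == published)    # True: unmodified
print(fingerprint(received_bad) == published)   # False: modification detected
print(len(published) * 4)                       # 256 bits, whatever the message size
```

Note that a bare digest only proves integrity if the digest itself can be trusted. That is what the signature on the hash provides.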
In order to work with hashing and encryption we have to get the necessary software.
- For JavaScript
- The two essential modules for node are bcryptjs and crypto-js. You just need one of them for hashing. bcryptjs still seems to be current best practice. The other one, crypto-js, has other, also standardized, algorithms. It also has encryption functions.
- For Python
- The modules to import for python are bcrypt, which must be installed, and hashlib, which is part of the standard library.
Let us see them in practice:
Example 3. Hashing Example in JavaScript
|
|
Usage:
|
|
Example 4. Hashing Example in Python
|
|
Usage:
|
|
Take a detailed look. Notice that the length of the digest tells you how to design the database column that will contain the hashed password in an application.
Please also notice that the two instances of using the bcrypt algorithm through the bcrypt.hash function result in four different hashes, yet they all verify as correct with bcrypt.compare.
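The same behaviour can be observed from Python. The sketch below assumes the third-party `bcrypt` package is installed, where the corresponding calls are `hashpw`, `gensalt` and `checkpw`. The wrapper names `hash_password` and `verify_password` and the sample password are hypothetical. The import is guarded so the file still loads where the package is missing.

```python
try:
    import bcrypt          # third-party package: pip install bcrypt
except ImportError:        # lets the sketch load without the package
    bcrypt = None


def hash_password(password: str) -> bytes:
    """Hash with bcrypt; gensalt() draws a fresh random salt each time."""
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())


def verify_password(password: str, stored: bytes) -> bool:
    """Re-hash password with the salt embedded in stored and compare."""
    return bcrypt.checkpw(password.encode("utf-8"), stored)


if bcrypt is not None:
    first = hash_password("violet-tractor-42")
    second = hash_password("violet-tractor-42")
    print(len(first), len(second))      # bcrypt digests are 60 bytes long
    print(first == second)              # False: each call used its own salt
    print(verify_password("violet-tractor-42", first))    # True
    print(verify_password("violet-tractor-42", second))   # True
    print(verify_password("wrong guess", first))          # False
else:
    print("bcrypt is not installed")
```

A column of 60 characters is therefore enough for a bcrypt digest in the user table.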
What Is Salt?
There is a technique called salting that reinforces password hashing, making it even harder to brute force. Please refer to Salt (cryptography). There it is stated that “salt is random data that is used as an additional input to a one-way function that hashes data.”
Study that, and put special emphasis on the section Example usage, as well as on the following quote:
… a salt cannot protect common or easily guessed passwords. Without a salt, the hashed value is the same for all users that have a given password, making it easier for hackers to guess the password from the hashed value …
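The effect described in the quote can be shown with a small Python sketch. It assumes plain SHA-256 with the salt prepended to the password, which is only an illustration of the principle and not a recommended password scheme. The function names and the password are made up.

```python
import hashlib
import os


def unsalted(password: str) -> str:
    """Hash the password alone."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users happen to choose the same password.
alice_plain = unsalted("letmein")
bob_plain = unsalted("letmein")
print(alice_plain == bob_plain)      # True: identical values in the user table

# Each user gets 16 random bytes of salt, stored next to the digest.
alice_salt, bob_salt = os.urandom(16), os.urandom(16)
alice_salted = salted("letmein", alice_salt)
bob_salted = salted("letmein", bob_salt)
print(alice_salted == bob_salted)    # False: the salts differ
```

With a salt, an attacker can no longer spot shared passwords by comparing stored values, and has to brute force each digest separately.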
Verify the Hash
Example 5. The JavaScript Way
|
|
Usage:
|
|
Example 6. The Python Way
|
|
Usage:
|
|
The bcrypt verification requires the stored password digest to be known in order to extract the salt; it then hashes the entered password with the salt from the known password digest. Otherwise verification would be meaningless.
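The mechanism can be made visible with the standard library. The sketch below is an analogy and not bcrypt's actual storage format: it assumes PBKDF2 from `hashlib` as a stand-in, an iteration count chosen for the example, and a made-up record layout of `salt$digest` in hex.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # assumed work factor for this sketch


def make_record(password: str) -> str:
    """Return 'salt$digest' (both hex) for storage in the user table."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify(password: str, record: str) -> bool:
    """Extract the salt from record, re-hash password with it, and compare."""
    salt_hex, digest_hex = record.split("$")
    salt = bytes.fromhex(salt_hex)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate.hex(), digest_hex)


record = make_record("s3cret-passphrase")
print(verify("s3cret-passphrase", record))   # True
print(verify("wrong guess", record))         # False
```

Hashing the entered password with a new random salt would never reproduce the stored digest, which is why the salt has to come from the stored record.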
Just One More Thing on Hashing
The passwords used by your operating system are also interesting in order to get some more perspective on hashing. Regarding Linux especially, but not only, you might want to read https://crypto.stackexchange.com/questions/40841/what-is-the-algorithm-used-to-encrypt-linux-passwords and, for more background: https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords
Exercises
The rules for handing in assignments may be found in the README
Exercise DS.3.0
In this assignment you must write a program that simulates a user registration process.
The main requirements
- Input of userid and password may be prompted by the program, or read from the CLI
- The password must not be found in the rockyou.txt list of 14.3 million compromised passwords
- Non compliant password must be rejected with a message
- Success must be acknowledged
- The password must be hashed with the bcrypt algorithm
- On success the userid and hashed password must be entered in the database test.db
Re 3. Rockyou
In order to get rockyou.txt, clone the repo https://gitlab.com/arosano/rockyou.git and then unzip the file rockyou.zip. This will give you rockyou.txt, which holds 14.3 million passwords.
The object Rockyou applies the Singleton pattern. This pattern creates an object which can have but one instance. An attempt to create a Rockyou will return the already existing object if it exists; otherwise it will be created. This way the big file only uses memory space once, and not per call.
Example 7. JavaScript Code Rockyou.js
|
|
Example 8. Python Code Radical Rockyou Singleton, rock.py
|
|
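One possible shape of such a singleton in Python is sketched below. It is an assumption about the design, not the course's rock.py: the `path` parameter, the `passwords` attribute and the use of `__contains__` are choices made for this illustration, and a tiny temporary file stands in for rockyou.txt so that the sketch runs on its own.

```python
import os
import tempfile


class Rockyou:
    """Singleton holding the compromised passwords in a set."""

    _instance = None

    def __new__(cls, path="rockyou.txt"):
        if cls._instance is None:
            instance = super().__new__(cls)
            # rockyou.txt contains bytes that are not valid UTF-8;
            # latin-1 can decode every byte, so reading never fails.
            with open(path, encoding="latin-1") as handle:
                instance.passwords = {line.rstrip("\n") for line in handle}
            cls._instance = instance
        return cls._instance

    def __contains__(self, password):
        return password in self.passwords


# Tiny stand-in file so the sketch runs without the real list.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("123456\npassword\niloveyou\n")

first = Rockyou(tmp.name)
second = Rockyou()               # path is ignored: the instance already exists
print(first is second)           # True: the file was read only once
print("password" in second)      # True
print("violet-tractor-42" in second)   # False
os.remove(tmp.name)
```

A set gives fast membership tests, which matters with 14.3 million entries.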
Re 6. The Database
The database can be created from the CLI. It is a one time thing. The database software sqlite3 should be installed by default on all modern operating systems. If not, install it.
Example 9. Create a Database as Follows, CLI:
|
|
The database may be used from either JavaScript or Python as you please.
The Application
Example 10. Driving Code Hint JavaScript
|
|
Usage:
|
|
Example 11. Driving Code Hint Python
|
|
Usage from a completed version of the code from example 10:
|
|
Extra Hints
To compensate for reported lack of SQL experience, we have created two programs that, together with all the other hints given above, should enable you to do the database work.
Example 12: Three SQL examples in one program. testsqlite3.py
|
|
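As a rough idea of what three such SQL operations look like with Python's standard `sqlite3` module, here is a sketch. The table name `users`, its columns, and the placeholder digest are assumptions for the illustration. It uses an in-memory database so that it leaves no file behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # use "test.db" for the exercise database

# 1. Create the table (hypothetical schema; 60 characters fit a bcrypt digest).
conn.execute(
    "CREATE TABLE IF NOT EXISTS users ("
    "userid TEXT PRIMARY KEY, password CHAR(60) NOT NULL)"
)

# 2. Insert a row; the ? placeholders keep user input out of the SQL text.
conn.execute(
    "INSERT INTO users (userid, password) VALUES (?, ?)",
    ("alice", "<bcrypt digest goes here>"),
)
conn.commit()

# 3. Select the row back by userid.
row = conn.execute(
    "SELECT userid, password FROM users WHERE userid = ?", ("alice",)
).fetchone()
print(row)

# A second insert with the same userid violates the primary key.
duplicate_rejected = False
try:
    conn.execute(
        "INSERT INTO users (userid, password) VALUES (?, ?)", ("alice", "other")
    )
except sqlite3.IntegrityError as error:
    duplicate_rejected = True
    print("rejected:", error)

conn.close()
```

Making the userid the primary key lets the database itself refuse a second registration under the same name.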
Example 13: Three SQL examples in one program. testsqlite3.js
|
|
The code will simulate what a web application should do to ascertain that users use better passwords. You may add a password length requirement too, if you like.
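The password check, including the optional length requirement, could look like this in Python. The function name `check_password`, the messages, and the minimum length of 8 are assumptions for this sketch, and a small set stands in for the contents of rockyou.txt.

```python
def check_password(password, compromised, min_length=8):
    """Return (accepted, message) for a candidate password."""
    if len(password) < min_length:
        return False, f"Password must be at least {min_length} characters."
    if password in compromised:
        return False, "Password is on the list of compromised passwords."
    return True, "Password accepted."


compromised = {"123456", "password", "iloveyou1"}   # stand-in for rockyou.txt

for candidate in ("short", "iloveyou1", "violet-tractor-42"):
    print(candidate, check_password(candidate, compromised))
```

Only a password that passes both checks should go on to be hashed and stored.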