Learn how passwords can be stored without a risk of leaking them in this tutorial by Alessandro Molina, a Python developer since 2001 and currently the core developer of the TurboGears2 web framework and maintainer of the Beaker Caching/Session framework.

While cryptography is generally perceived as a complex field, some tasks based on it are part of the everyday life of a software developer, or at least they should be, to ensure a minimum level of security in your code base.
This article covers one of the most common of these tasks, hashing passwords, which helps make your software resilient to attacks.
While software written in Python is unlikely to suffer from exploits such as buffer overflows (unless there are bugs in the interpreter or in compiled libraries you rely on), there are still plenty of cases where you might be leaking information that must remain undisclosed.
How can passwords be stored without a risk of leaking them?
Avoiding storing passwords in plain text is a known best practice. Software usually only needs to check whether the password provided by the user is correct, so it can store the hash of the password and compare it with the hash of the provided password. If the two hashes match, the passwords are equal; if they don't, the provided password is wrong.
Storing hashed passwords is a pretty standard practice, and usually they are stored together with some salt. The salt is a randomly generated string that is joined with the password before hashing. Because it is random, it ensures that even equal passwords produce different hashes.
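To make the salting idea concrete, here is a minimal sketch that joins a random salt with the password and hashes the result with a single round of SHA-256. The helper name salted_sha256 and the 16-byte salt length are illustrative assumptions. A single fast hash like this is only meant to show the concept; for real password storage, use a key derivation function such as the ones discussed in the next section.

```python
import hashlib
import os


def salted_sha256(password, salt):
    # Hypothetical helper: join the salt with the password *before* hashing.
    # Illustration only; a single SHA-256 round is too fast for real password storage.
    return hashlib.sha256(salt + password.encode('utf-8')).hexdigest()


password = 'ThisIsAPassWord'
salt_a = os.urandom(16)  # a fresh random salt for each stored password (assumed length)
salt_b = os.urandom(16)

hash_a = salted_sha256(password, salt_a)
hash_b = salted_sha256(password, salt_b)

# Same password, different salts: the stored hashes (almost certainly) differ.
print(hash_a == hash_b)

# Checking a login attempt: recompute with the *stored* salt and compare.
print(salted_sha256('ThisIsAPassWord', salt_a) == hash_a)  # True
print(salted_sha256('WrongPassword', salt_a) == hash_a)    # False
```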
The Python standard library provides a pretty complete set of hashing functions, some of them very well-suited to storing passwords.
How to do it…
Python 3 introduced key derivation functions, which are especially convenient for storing passwords. Both pbkdf2 and scrypt are provided. scrypt is more robust against attacks, as it is both memory- and CPU-heavy, but it only works on systems that provide OpenSSL 1.1+. pbkdf2 works on any system because, in the worst case, a Python-provided fallback is used (that pure-Python fallback was deprecated in Python 3.10 and removed in 3.12).
So, while from a security point of view scrypt would be preferred, you can rely on pbkdf2 due to its wider availability and the fact that it’s been available since Python 3.4 (scrypt is only available on Python 3.6+):
```python
import binascii
import hashlib
import os


def hash_password(password):
    """Hash a password for storing."""
    # Random salt: the SHA-256 hexdigest of 60 random bytes (always 64 hex characters).
    salt = hashlib.sha256(os.urandom(60)).hexdigest().encode('ascii')
    pwdhash = hashlib.pbkdf2_hmac('sha512', password.encode('utf-8'),
                                  salt, 100000)
    pwdhash = binascii.hexlify(pwdhash)
    # Store the salt followed by the hex-encoded hash.
    return (salt + pwdhash).decode('ascii')


def verify_password(stored_password, provided_password):
    """Verify a stored password against one provided by the user."""
    # The first 64 characters are the salt; the rest is the hash.
    salt = stored_password[:64]
    stored_password = stored_password[64:]
    pwdhash = hashlib.pbkdf2_hmac('sha512',
                                  provided_password.encode('utf-8'),
                                  salt.encode('ascii'),
                                  100000)
    pwdhash = binascii.hexlify(pwdhash).decode('ascii')
    return pwdhash == stored_password
```
The two functions can be used to hash the user-provided password for storage on disk or in a database (hash_password) and to verify the password against the stored one when a user tries to log back in (verify_password):
```python
>>> stored_password = hash_password('ThisIsAPassWord')
>>> print(stored_password)
cdd5492b89b64f030e8ac2b96b680c650468aad4b24e485f587d7f3e031ce8b63cc7139b18aba02e1f98edbb531e8a0c8ecf971a61560b17071db5eaa8064a87bcb2304d89812e1d07febfea7c73bda8fbc2204e0407766197bc2be85eada6a5
>>> verify_password(stored_password, 'ThisIsAPassWord')
True
>>> verify_password(stored_password, 'WrongPassword')
False
```
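If your Python is 3.6+ and linked against OpenSSL 1.1+, a scrypt-based version of the two functions could look like the following sketch. It is not part of the original recipe: the names hash_password_scrypt and verify_password_scrypt, the cost parameters (n=2**14, r=8, p=1), the 16-byte raw salt from os.urandom and the hex layout are all assumptions chosen for illustration, and the final comparison uses hmac.compare_digest.

```python
import hashlib
import hmac
import os

# Assumed cost parameters: n=2**14, r=8, p=1 need roughly 16 MiB of memory,
# which fits within OpenSSL's default scrypt memory limit.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1
SALT_BYTES = 16  # assumed salt length in raw bytes (32 hex characters)


def hash_password_scrypt(password):
    """Hash a password with scrypt; returns hex(salt) + hex(key)."""
    salt = os.urandom(SALT_BYTES)
    key = hashlib.scrypt(password.encode('utf-8'), salt=salt,
                         n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P)  # dklen defaults to 64
    return salt.hex() + key.hex()


def verify_password_scrypt(stored_password, provided_password):
    """Verify a password produced by hash_password_scrypt."""
    salt = bytes.fromhex(stored_password[:SALT_BYTES * 2])
    stored_key = stored_password[SALT_BYTES * 2:]
    key = hashlib.scrypt(provided_password.encode('utf-8'), salt=salt,
                         n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P)
    # compare_digest avoids content-based short-circuiting when comparing.
    return hmac.compare_digest(key.hex(), stored_key)


stored = hash_password_scrypt('ThisIsAPassWord')
print(len(stored))                                        # 160
print(verify_password_scrypt(stored, 'ThisIsAPassWord'))  # True
print(verify_password_scrypt(stored, 'WrongPassword'))    # False
```

Keeping the salt as a fixed-length prefix mirrors the layout used by hash_password. If you later change SALT_BYTES or the cost parameters, you would need to store them alongside each hash to keep verifying older entries.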
How it works…
There are two functions involved here:
- hash_password: Encodes a provided password in a way that is safe to store in a database or file.
- verify_password: Given an encoded password and a plain-text one provided by the user, verifies whether the provided password matches the encoded (and thus saved) one.
hash_password actually does multiple things; it doesn't just hash the password. The first thing it does is generate some random salt to be added to the password. That's just the sha256 hash of some random bytes read from os.urandom. It then extracts a string representation of the hashed salt as a set of hexadecimal digits (hexdigest).
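The salt-generation step can be checked in isolation. This sketch repeats the recipe's expression and confirms that the result is always 64 lowercase hex characters. It also shows secrets.token_hex(32), an alternative that is not used in the recipe (available since Python 3.6), which returns 32 random bytes directly as 64 hex characters.

```python
import hashlib
import os
import secrets

# Salt generation as in hash_password: SHA-256 of 60 random bytes, as hex text.
salt = hashlib.sha256(os.urandom(60)).hexdigest()
print(len(salt))  # 64: a SHA-256 digest is 32 bytes, i.e. 64 hex digits
print(all(c in '0123456789abcdef' for c in salt))  # True: hexdigest is lowercase

# Alternative (not the recipe's approach): 32 random bytes as 64 hex characters.
alt_salt = secrets.token_hex(32)
print(len(alt_salt))  # 64
```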
The salt is then provided to pbkdf2_hmac together with the password itself to hash the password in a randomized way. As pbkdf2_hmac requires bytes as its input, the two strings (password and salt) are first encoded into bytes. The salt is encoded as plain ASCII, as the hexadecimal representation of a hash only contains the characters 0-9 and a-f. The password, instead, is encoded as UTF-8, because it could contain any character. (Is there anyone with emojis in their passwords?)
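The difference between the two encodings is easy to see with a password containing non-ASCII characters. The sample password below is an assumption for illustration; it is written with escape sequences so the byte counts do not depend on how the source file is saved.

```python
# 'pässwörd' followed by a key emoji, written with escapes.
password = 'p\u00e4ssw\u00f6rd\U0001F511'

# UTF-8 can encode any Unicode character, so the password always converts.
pw_bytes = password.encode('utf-8')
print(len(password), len(pw_bytes))  # 9 14 (ä and ö take 2 bytes, the emoji 4)

# ASCII fails on the same password...
try:
    password.encode('ascii')
except UnicodeEncodeError:
    print('ASCII cannot encode this password')

# ...but it is safe for a hex salt, which only contains 0-9 and a-f.
salt_bytes = 'cdd5492b89b64f03'.encode('ascii')
print(salt_bytes)  # b'cdd5492b89b64f03'
```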
The result of pbkdf2_hmac is a sequence of raw bytes. As you want to store it in a database, you use binascii.hexlify to convert those bytes into their hexadecimal representation, which can then be decoded into a string. hexlify is a convenient way to convert bytes to text without losing data: it just writes each byte as two hexadecimal digits, so the resulting data is twice as big as the original, but apart from this, it carries exactly the same information.
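A short sketch of the hexlify behaviour described above: each byte becomes two hex digits, binascii.unhexlify reverses the conversion exactly, and a sha512-based pbkdf2_hmac key (64 bytes by default, the digest size) becomes 128 characters. The password and salt values are placeholders.

```python
import binascii
import hashlib

raw = bytes([0x00, 0x7f, 0xff])
hexed = binascii.hexlify(raw)
print(hexed)                             # b'007fff': two hex digits per byte
print(binascii.unhexlify(hexed) == raw)  # True: no information is lost

# With sha512, pbkdf2_hmac returns a 64-byte key by default,
# so its hex form is 128 characters long.
key = hashlib.pbkdf2_hmac('sha512', b'password', b'salt', 100000)
print(len(key), len(binascii.hexlify(key)))  # 64 128
```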
In the end, the function joins the salt and the hash together. As the hexdigest of a sha256 hash (the salt) is always 64 characters long, you can get the salt back by reading the first 64 characters of the resulting string. This allows verify_password to recover the salt, which it needs in order to verify the password.
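The fixed-length layout can be verified directly. This sketch rebuilds the stored string the same way hash_password does and shows that slicing at position 64 separates the salt from the hash; the 192-character total matches the example output shown earlier.

```python
import binascii
import hashlib
import os

salt = hashlib.sha256(os.urandom(60)).hexdigest()  # always 64 characters
pwdhash = binascii.hexlify(
    hashlib.pbkdf2_hmac('sha512', b'ThisIsAPassWord', salt.encode('ascii'), 100000)
).decode('ascii')                                  # always 128 characters

stored = salt + pwdhash  # same layout that hash_password returns
print(len(stored))       # 192

# Because the salt has a fixed length, slicing recovers both parts.
print(stored[:64] == salt)     # True
print(stored[64:] == pwdhash)  # True
```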
Once you have your hashed password, verify_password can be used to check provided passwords against it. It takes two arguments: the hashed password and the new password that should be verified. The first thing verify_password does is extract the salt from the hashed password (remember, it was placed in the first 64 characters of the string returned by hash_password).
The extracted salt and the password candidate are then provided to pbkdf2_hmac to compute their hash, which is converted into a string with binascii.hexlify. If the resulting hash matches the hash part of the previously stored password (the characters after the salt), the two passwords match.
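As a possible hardening of this comparison step (a suggestion, not part of the original recipe), the standard library's hmac.compare_digest can replace ==: it is designed to prevent timing analysis by avoiding content-based short-circuiting. The helper name hashes_match is hypothetical.

```python
import hmac


def hashes_match(computed_hex, stored_hex):
    # Hypothetical helper. compare_digest accepts two ASCII-only str objects
    # or two bytes-like objects and does not stop at the first differing character.
    return hmac.compare_digest(computed_hex, stored_hex)


print(hashes_match('abc123', 'abc123'))  # True
print(hashes_match('abc123', 'abc124'))  # False
```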
If the resulting hash doesn't match, the provided password is wrong. As you can see, it's very important to store the salt together with the hashed password: you need the same salt to verify the password, and a different salt would result in a different hash, so you'd never be able to verify the password.
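The dependency on the salt can be demonstrated with pbkdf2_hmac using the same parameters as the recipe. For brevity, this sketch assumes a raw 32-byte salt instead of the recipe's hex salt, and derive is a hypothetical helper name.

```python
import hashlib
import os


def derive(password, salt):
    # Hypothetical helper using the recipe's parameters: sha512, 100000 iterations.
    return hashlib.pbkdf2_hmac('sha512', password.encode('utf-8'), salt, 100000)


stored_salt = os.urandom(32)
stored_key = derive('ThisIsAPassWord', stored_salt)

# Re-deriving with the stored salt reproduces the key...
print(derive('ThisIsAPassWord', stored_salt) == stored_key)  # True

# ...but with a different salt (every byte flipped here), even the correct
# password produces a different key, so verification would fail.
other_salt = bytes(b ^ 0xFF for b in stored_salt)
print(derive('ThisIsAPassWord', other_salt) == stored_key)   # False
```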
If you found this article interesting, you can explore Alessandro Molina’s Modern Python Standard Library Cookbook to build optimized applications in Python by smartly implementing the standard library. This book will help you acquire the skills needed to write clean code in Python and develop applications that meet your needs.