After writing about hashing passwords in Python, I have decided to publish the other parts of the chapter “Cryptography” from the “Modern Python Standard Library Cookbook”, written by Alessandro Molina and kindly provided to me by Packt Publishing. In the article, the following recipes are covered:
- Asking for passwords: when asking for a password in terminal-based software, make sure you don’t leak it.
- Verifying a file’s integrity: how to check that a file transferred over a network wasn’t corrupted.
- Verify a message’s integrity: how to check that a message you are sending to another piece of software hasn’t been altered.
Introduction
While cryptography is generally perceived as a complex field, there are tasks based on it that are part of our everyday lives as software developers, or at least they should be, to ensure a minimum level of security in our code base. This chapter tries to cover recipes for the most common of those tasks, which can help make your software resilient to attacks. While software written in Python will hardly suffer from exploits such as buffer overflows (unless there are bugs in the interpreter or in the compiled libraries you rely on), there are still plenty of cases where you might be leaking information that must remain undisclosed.
Asking for passwords
In terminal-based programs, it’s common to ask our users for passwords. It’s usually a bad idea to do so through command-line options because, on Unix-like systems, they will be visible to anyone with access to the shell who is able to run the ps command to list the running processes, and to anyone willing to run the history command to list recently executed commands. While there are ways to tweak the command arguments to hide them from the list of processes, it’s always best to ask for passwords interactively so that no trace of them is left. But asking for them interactively is not enough unless you also ensure they are not displayed while typing; otherwise, anyone looking at your screen can grab all your passwords.
How to do it…
Luckily, the Python standard library provides an easy way to input passwords from a prompt without showing them back:
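```python
import getpass

# Prompts with "Password: " and reads the input without echoing it.
pwd = getpass.getpass()
print(pwd)  # printed here only to show what was read
```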
How it works…
On Unix-like systems, the getpass.getpass function uses the termios module to disable the echoing back of the characters typed by the user. To avoid messing with the rest of the application’s input, it does so on a new file descriptor opened on the controlling terminal (/dev/tty) rather than on sys.stdin. On Windows, it reads the characters one at a time through the msvcrt module without echoing them. If no echo-free input is available at all, it falls back to reading from sys.stdin and issues a GetPassWarning, because the password might then be displayed.
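To make the termios mechanism more concrete, here is a minimal, Unix-only sketch of the same idea. The read_without_echo function and its fd and out parameters are hypothetical names chosen for this illustration; this is not the getpass source, and it skips the Windows and fallback paths. It clears the ECHO flag, reads one line, and always restores the original terminal settings.

```python
import os
import sys
import termios


def read_without_echo(prompt='Password: ', fd=None, out=sys.stderr):
    """Hypothetical helper: read one line from a terminal with echo disabled.

    Unix-only sketch of the termios technique, not the getpass implementation.
    """
    if fd is None:
        fd = sys.stdin.fileno()
    old_attrs = termios.tcgetattr(fd)
    new_attrs = termios.tcgetattr(fd)
    # Index 3 of the attribute list holds the local mode flags (lflag).
    new_attrs[3] &= ~termios.ECHO
    out.write(prompt)
    out.flush()
    try:
        # TCSADRAIN applies the change once pending output has been written.
        termios.tcsetattr(fd, termios.TCSADRAIN, new_attrs)
        chunks = []
        while True:
            byte = os.read(fd, 1)
            if not byte or byte == b'\n':
                break
            chunks.append(byte)
    finally:
        # Always restore the user's terminal settings, even on Ctrl+C.
        termios.tcsetattr(fd, termios.TCSADRAIN, old_attrs)
        out.write('\n')
        out.flush()
    return b''.join(chunks).decode()
```

In real code getpass.getpass remains the better choice, since it also covers the Windows and fallback cases described above.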
Verifying a file’s integrity
If you’ve ever downloaded a file from a public network, you might have noticed that its URL frequently has the form http://files.host.com/somefile.tar.gz#md5=3b3f5b2327421800ef00c38ab5ad81a6. That’s because the download might go wrong and the data you got might be partially corrupted, so the URL includes an MD5 hash that you can use with the md5sum tool to verify that the downloaded file is fine. The same applies when you download a file from a Python script: if the file provided has an MD5 hash for verification, you might want to check whether the retrieved file is valid and, if it is not, retry the download.
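As an illustration of that URL convention, the part after # can be split into an algorithm name and an expected digest with urllib.parse. The helper name parse_hash_fragment is hypothetical, and the sketch assumes the fragment always has the single name=hexdigest shape shown above.

```python
from urllib.parse import urlsplit


def parse_hash_fragment(url):
    """Hypothetical helper: split '...file#md5=<hex>' into (filename, 'md5', '<hex>')."""
    parts = urlsplit(url)
    # The fragment is everything after '#', e.g. 'md5=3b3f...'.
    algorithm, sep, expected = parts.fragment.partition('=')
    if not sep or not expected:
        raise ValueError('No hash fragment in %r' % url)
    filename = parts.path.rsplit('/', 1)[-1]
    return filename, algorithm, expected.lower()


print(parse_hash_fragment(
    'http://files.host.com/somefile.tar.gz#md5=3b3f5b2327421800ef00c38ab5ad81a6'))
# ('somefile.tar.gz', 'md5', '3b3f5b2327421800ef00c38ab5ad81a6')
```

The returned algorithm name and digest can be passed as the hashtype and expectedhash arguments of verify_file.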
How to do it…
Within hashlib, there are multiple supported hashing algorithms, and probably the most widespread one is md5, so we can rely on hashlib to verify our downloaded file:
```python
import hashlib


def verify_file(filepath, expectedhash, hashtype='md5'):
    with open(filepath, 'rb') as f:
        try:
            # Look up the constructor (hashlib.md5, hashlib.sha256, ...)
            # and call it to create an empty hash object.
            filehash = getattr(hashlib, hashtype)()
        except AttributeError:
            raise ValueError('Unsupported hashing type %s' % hashtype) from None

        # Feed the file to the hash in 4 KB chunks to keep memory usage low.
        while True:
            data = f.read(4096)
            if not data:
                break
            filehash.update(data)

        return filehash.hexdigest() == expectedhash
```
Our file can then be downloaded and verified with verify_file. For example, I might download the wrapt distribution from the Python Package Index (PyPI) and want to verify that it was correctly downloaded. The download link would end with wrapt-1.10.11.tar.gz#sha256=d4d560d479f2c21e1b5443bbd15fe7ec4b37fe7e53d335d3b9b0a7b1226fe3c6, so I could run my verify_file function on the file:
```python
verify_file(
    'wrapt-1.10.11.tar.gz',
    'd4d560d479f2c21e1b5443bbd15fe7ec4b37fe7e53d335d3b9b0a7b1226fe3c6',
    'sha256'
)
```
How it works…
The first thing the function does is open the file in binary mode. As all hash functions require bytes, and we don’t know anything about the content of the file, reading it in binary mode is the most convenient solution. Then, it checks whether the requested hashing algorithm is available in hashlib. That’s done through getattr by trying to grab hashlib.md5, hashlib.sha256, and so on. If the algorithm is not supported, it won’t be a valid hashlib attribute (as it won’t exist in the module) and getattr will raise AttributeError. To make the error easier to understand, it is trapped and a new ValueError is raised that states clearly that the algorithm is not supported. Once the file is opened and the algorithm is verified, an empty hash gets created (notice that the parentheses right after getattr call the returned constructor, which creates the hash object). We start with an empty hash because the file might be very big, and we don’t want to read the complete file and throw it at the hashing function at once. Instead, we read the file in chunks of 4 KB and feed each chunk to the hashing algorithm to update the hash. Finally, once the hash is computed, we grab its representation as a string of hexadecimal digits and compare it to the one provided to the function. If the two match, the file was properly downloaded.
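The claim that feeding the data in chunks gives the same result as hashing it all at once can be checked directly. This sketch uses hashlib.new, which takes the algorithm name as a string and raises ValueError for unknown names, as an alternative to the getattr lookup; the helper name hash_in_chunks and the 4096-byte default are only illustrative.

```python
import hashlib


def hash_in_chunks(data, hashtype='sha256', chunk_size=4096):
    """Hypothetical helper: hash `data` incrementally, chunk_size bytes at a time."""
    # hashlib.new raises ValueError by itself if hashtype is unknown.
    filehash = hashlib.new(hashtype)
    for start in range(0, len(data), chunk_size):
        filehash.update(data[start:start + chunk_size])
    return filehash.hexdigest()


data = b'x' * 10000  # spans three 4 KB chunks, the last one partial
one_shot = hashlib.sha256(data).hexdigest()
print(hash_in_chunks(data) == one_shot)  # True
```

On Python 3.11 and later, hashlib.file_digest(f, 'sha256') performs this chunked reading of a binary file object for you.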
Verify a message’s integrity
When sending messages through a public network, or storing them where other users and systems can access them, we need to know whether a message still contains the original content or whether someone intercepted and modified it. That’s a typical form of a man-in-the-middle attack: anyone who can reach the place where our content travels or is stored, such as an unencrypted network or a disk on a shared system, can modify it. The HMAC algorithm can be used to guarantee that a message wasn’t altered from its original state, and it’s frequently used to sign digital documents to ensure their integrity. A good scenario for HMAC might be a password-reset link; those links usually include a parameter about the user whose password should be reset:
```
http://myapp.com/reset-password?user=myuser@email.net
```
But anyone might replace the user argument and reset other people’s passwords. So we want to ensure that the link we provided wasn’t modified after we sent it, which we can do by attaching an HMAC to it. That will result in something such as:
```
http://myapp.com/reset-password?user=myuser@email.net&signature=8efc6e7161004cfb09d05af69cc0af86bb5edb5e88bd477ba545a9929821f582
```
Any attempt at modifying the user will then make the signature invalid, thus making it impossible to reset other people’s passwords. Another use case is authenticating and verifying requests to REST APIs. Amazon Web Services uses HMAC as an authentication system for its web services: when you register, you are given an access key and a secret key. Every request you make must be signed with HMAC using the secret key, which proves that you are actually the user stated in the request (as only you own the secret key) and that the request itself wasn’t changed in any way, because its details are part of the HMAC too. HMAC signatures are frequently involved in cases where your software has to send messages to itself or receive messages from a verified partner that shares a secret key with it.
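A minimal sketch of the password-reset scenario, assuming a server-side secret SECRET_KEY and hypothetical helpers make_reset_link and check_reset_link: the HMAC-SHA256 of the user value is appended as the signature query parameter, and the link is accepted only if the recomputed HMAC matches. This only illustrates the signing idea; it leaves out expiry and everything else a real reset flow needs.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b'replace-with-a-real-secret'  # hypothetical server-side secret


def _sign(user):
    return hmac.new(SECRET_KEY, user.encode('utf-8'), hashlib.sha256).hexdigest()


def make_reset_link(user, base='http://myapp.com/reset-password'):
    """Hypothetical helper: build a reset link carrying an HMAC of the user."""
    return base + '?' + urlencode({'user': user, 'signature': _sign(user)})


def check_reset_link(link):
    """Hypothetical helper: return the user if the signature matches, else None."""
    params = parse_qs(urlsplit(link).query)
    try:
        user = params['user'][0]
        signature = params['signature'][0]
    except KeyError:
        return None
    # Constant-time comparison of bytes, so timing doesn't leak information.
    if hmac.compare_digest(_sign(user).encode('ascii'), signature.encode('utf-8')):
        return user
    return None


link = make_reset_link('myuser@email.net')
print(check_reset_link(link))                              # myuser@email.net
print(check_reset_link(link.replace('myuser', 'victim')))  # None
```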
How to do it…
For this recipe, perform the following steps:
- The standard library provides an hmac module that, combined with the hashing functions provided in hashlib, can serve the purpose of computing the message’s authentication code for any provided message:
```python
import hashlib
import hmac
import time


def compute_signature(message, secret):
    message = message.encode('utf-8')
    # Timestamp in hundredths of a second since the epoch.
    timestamp = str(int(time.time() * 100)).encode('ascii')
    # The digits-only timestamp goes first, followed by a separator, so digits
    # can't be shifted between the message and the timestamp.
    hashdata = timestamp + b':' + message
    signature = hmac.new(secret.encode('ascii'), hashdata,
                         hashlib.sha256).hexdigest()
    return {
        'message': message,
        'signature': signature,
        'timestamp': timestamp
    }


def verify_signature(signed_message, secret):
    timestamp = signed_message['timestamp']
    expected_signature = signed_message['signature']
    message = signed_message['message']
    # compute_signature only ever produces digits-only timestamps.
    if not timestamp.isdigit():
        return False
    hashdata = timestamp + b':' + message
    signature = hmac.new(secret.encode('ascii'), hashdata,
                         hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing attacks.
    return hmac.compare_digest(signature, expected_signature)
```
- Our functions can then be used to compute a signed message and we can check that a signed message wasn’t altered in any way:
```python
signed_msg = compute_signature('Hello World', 'very_secret')
verify_signature(signed_msg, 'very_secret')  # returns True
```
- If you try to change the message field of the signed message, it won’t be valid anymore, and only the real message will match the signature:
```python
signed_msg['message'] = b'Hello Boat'
verify_signature(signed_msg, 'very_secret')  # returns False
```
How it works…
Our purpose is to ensure that any given message can’t be changed without invalidating the signature attached to it. So the compute_signature function, given a message and a private secret key, returns all the data that the signed message should include when it’s sent to the receiver: the message itself, the signature, and a timestamp. The timestamp is included because, in many cases, it’s a good idea to ensure that the message is a recent one. If you are receiving an API request signed with HMAC, or a cookie that you just set, you might want to ensure that you are handling a recent message and not one that was sent an hour ago. The timestamp can’t be tampered with, as it’s included in the signature together with the message, and thanks to the timestamp, two identical messages will result in two different signatures.
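The recipe’s functions carry the timestamp but don’t check how old it is. A minimal sketch of such a check, assuming the timestamp is encoded as compute_signature does it (ASCII digits counting hundredths of a second since the epoch); the helper name is_recent and its max_age_seconds and now parameters are hypothetical. It is meant to run only after the signature has been verified, so the timestamp can be trusted.

```python
import time


def is_recent(timestamp, max_age_seconds=300, now=None):
    """Hypothetical helper: True if `timestamp` is at most max_age_seconds old.

    `timestamp` is bytes of ASCII digits counting hundredths of a second,
    like str(int(time.time() * 100)).encode('ascii').
    """
    if not timestamp.isdigit():
        return False
    if now is None:
        now = time.time()
    age = now - int(timestamp) / 100
    # Reject messages that are too old, and ones stamped in the future.
    return 0 <= age <= max_age_seconds
```

Passing now explicitly makes the check easy to test; in production it defaults to the current time.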
Once the message and the timestamp are known, the compute_signature function hands them to hmac.new, together with the secret key, to compute the signature itself. For convenience, the signature is represented as a string of hexadecimal digits that encode the bytes the signature is made of. This ensures that it can be transferred as plain text in HTTP headers or in a similar way. Once we have our signed message as returned by compute_signature, it can be stored somewhere and, when loading it back, we can use verify_signature to check that it wasn’t tampered with. The verify_signature function takes the same steps as compute_signature. The signed message includes the message itself, the timestamp, and the signature, so verify_signature grabs the message and the timestamp and hashes them with the secret key to compute the signature. If the computed signature matches the one provided in the signed message, the message wasn’t altered in any way. Otherwise, even a minor change to the message or to the timestamp will make the signature invalid.
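The hexadecimal representation is what hexdigest returns, as opposed to digest, which returns the raw bytes. This short sketch reuses the 'very_secret' key and 'Hello World' message from the recipe’s example but leaves out the timestamp; it shows that both forms carry the same information: SHA-256 produces 32 bytes, which become 64 hexadecimal characters that are safe to place in text-based formats.

```python
import hashlib
import hmac

mac = hmac.new(b'very_secret', b'Hello World', hashlib.sha256)

raw = mac.digest()      # 32 raw bytes, often not printable
text = mac.hexdigest()  # 64 characters drawn from 0-9 and a-f

print(len(raw), len(text))  # 32 64
# The hex string decodes back to exactly the raw signature bytes.
print(bytes.fromhex(text) == raw)  # True
```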