You know what’s really lame? Slow websites.
Unfortunately, certain parts of the authentication process are supposed to be slow. This may seem counterintuitive, but slowness in the authentication process is a big part of being secure.
This article talks about how authentication works in Python (not just hashing), and how you can make your site faster for your users without compromising your security.
I’ll walk you through Python pseudocode, and show you exactly what you need to understand to ensure your auth system is as quick as possible.
Password Hashing is Slow
When a user signs up on your website and gives you their password, the best practice is to hash the password before storing it. This means you’ll use a hashing algorithm like bcrypt or scrypt, which turns the password string into a bunch of gibberish that cannot be reversed.
Once you have a hash, there’s no way to recover the original password.
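To make this concrete, here is a minimal sketch of hashing a password at signup and checking it at login. It uses `hashlib.scrypt` from Python’s standard library. The function names `hash_password` and `verify_password` and the cost parameters are my own illustrative assumptions, not settings to copy into production as-is.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumptions; tune them for your hardware).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024}


def hash_password(password: str) -> tuple:
    """Return (salt, digest) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # a fresh random salt for every user
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the results."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Notice that login never "decrypts" anything: it hashes the attempt again and compares the two digests.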
Here’s the important bit: strong hashing functions like bcrypt and scrypt are meant to be slow!
The more CPU, RAM, and time required to compute a hash, the longer an attacker will need to spend attempting to brute force a password.
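If you want to see the slowness for yourself, a rough sketch like the one below times a single scrypt hash at a few different work factors. The password, salt, and chosen values of `n` are arbitrary assumptions, and the timings you get will depend entirely on your machine.

```python
import hashlib
import time

PASSWORD = b"hunter2"
SALT = b"0123456789abcdef"  # fixed salt so only the work factor varies


def time_scrypt(n: int) -> float:
    """Return the seconds taken to compute one scrypt hash with cost n."""
    start = time.perf_counter()
    hashlib.scrypt(PASSWORD, salt=SALT, n=n, r=8, p=1, maxmem=128 * 1024 * 1024)
    return time.perf_counter() - start


# Each step multiplies the CPU and memory cost by four.
for exponent in (10, 12, 14, 16):
    seconds = time_scrypt(2**exponent)
    print(f"n=2**{exponent}: {seconds * 1000:.1f} ms per hash")
```

A legitimate login pays this cost once. An attacker pays it for every single guess.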
Pretend for a moment, that you’ve hacked a company user database and now have access to all user password hashes.
Let’s also say that these are bcrypt hashes with a cost factor of 10.
If you’re trying to figure out what the passwords are, there’s only one thing for you to do: brute force them.
To do this, you might write some code that iterates over every possible password combination (in the example below I’m using the brute library on PyPI):
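```shell
$ pip install brute bcrypt
```

```python
# crack.py
import bcrypt
from brute import brute

# Placeholder: replace with the stolen bcrypt hash, as bytes.
HASH_TO_CRACK = b'xxx'

for pw in brute(length=8):
    # bcrypt hashes are salted, so verify each guess with checkpw()
    # rather than hashing it and comparing the strings directly.
    if bcrypt.checkpw(pw.encode(), HASH_TO_CRACK):
        print('Password is:', pw)
        break
```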
In the example above, we iterate over every possible password of 8 characters or fewer, attempting to brute force the hash.
Each time you generate a new candidate password, you run it through bcrypt and check the result against the stolen password hash you have. If you get a match, it means you’ve successfully brute forced the user’s password!
But here’s the kicker: bcrypt and scrypt take a while to compute, and use a lot of resources.
Because both bcrypt and scrypt are deliberately expensive to compute, attackers have a much harder time brute forcing these hashes: every single guess costs real computing resources ($$$).
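To put a rough number on that, here is some back-of-the-envelope arithmetic. Every figure in it is an assumption I picked for illustration: 95 printable ASCII characters, passwords of up to 8 characters, 0.1 seconds per hash, and 1,000 machines guessing in parallel.

```python
def keyspace(alphabet_size: int, max_length: int) -> int:
    """Count every password of length 1..max_length over the alphabet."""
    return sum(alphabet_size**length for length in range(1, max_length + 1))


def years_to_exhaust(candidates: int, seconds_per_hash: float, workers: int = 1) -> float:
    """Worst-case time to try every candidate, in years."""
    seconds = candidates * seconds_per_hash / workers
    return seconds / (365 * 24 * 60 * 60)


# Assumed inputs; change them to match the scenario you care about.
total = keyspace(95, 8)
print(f"{total:,} candidate passwords")
print(f"{years_to_exhaust(total, 0.1, workers=1000):,.0f} years to try them all")
```

Try shrinking `seconds_per_hash` to what a fast, unsalted hash would cost and watch the estimate collapse. That gap is the whole reason to pick a slow algorithm.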
So, since we now understand how hashing works, and why it is time intensive — let’s talk about authentication.
NOTE: If you’re interested in learning more about password security, you might want to read through an article we wrote a while back on the right way to do password security — it’s a good read. And if you want even more info, check this out.
How Authentication Works
Typically, when a user registers or signs in to a site, you hash their password and either store it in a database or compare it to a value in a database. But what happens after that? You remember the user either with an ID in a session, or via an API key of some sort.
Here’s some pseudocode:
```python
# register.py

user.save()  # save this user to the database

# Create a new session cookie in the browser, which holds the user ID.
session.create('session', user.id)
```
The idea is that a user ID will be stored in the user’s browser via cookies — this way, the next time the user requests a page on your site, the user’s browser will send that cookie with the user’s ID along to your server, allowing you to look up this user’s account information, without needing the email address and password again.
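If you’re curious what that round trip looks like without a framework, here is a minimal sketch using `http.cookies` from the standard library. The helper names `build_session_cookie` and `read_session_cookie` are hypothetical. Note that real frameworks usually sign the cookie or store an opaque session ID instead of a raw user ID, so that users can’t simply edit it; this sketch skips that step.

```python
from http.cookies import SimpleCookie
from typing import Optional


def build_session_cookie(user_id: int) -> str:
    """Build the Set-Cookie header value the server sends after login."""
    cookie = SimpleCookie()
    cookie["session"] = str(user_id)
    cookie["session"]["path"] = "/"
    cookie["session"]["httponly"] = True  # hide the cookie from page scripts
    return cookie["session"].OutputString()


def read_session_cookie(cookie_header: str) -> Optional[int]:
    """Parse the Cookie header the browser sends back on later requests."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session" not in cookie:
        return None
    try:
        return int(cookie["session"].value)
    except ValueError:
        return None  # the cookie did not hold a numeric user ID


print(build_session_cookie(42))
print(read_session_cookie("session=42; theme=dark"))
```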
Here’s some more pseudocode:
```python
# views.py

user = User.find(id=session)
```
As you can imagine — finding a user account by ID is very quick (no password hashing is necessary).
So — what this means is that only the initial user creation and login processes are slow — the rest of your site can still be fast!
But let’s not stop just yet.
Optimizing for Speed
Since user data is typically required on every page of a website, this data is accessed very frequently.
If you’re using a database like Postgres or MySQL, this means that if you have a few hundred active users, you might be querying your users table a couple hundred times per second.
That’s quite a few queries!
If your site needs to do other things, you might be unnecessarily slowing down page loads.
So what can you do to speed things up? Cache!
Caching is the solution to most speed and performance problems — and making user data quickly available is one of the most effective ways to speed up your site.
The idea is pretty simple: keep a key / value store in memory (or in a specialized database like Redis or Memcached) that uses the user ID as the key and the user’s account data as the value.
This helps, because the next time a user makes a request for a page on your site, and sends you their session cookie, instead of querying the database to find the account, you can instead query an in-memory cache directly for this information.
This might be the difference between 1ms and 100+ms on every user request: that’s a lot of saved time! While it doesn’t seem like much on its own, when you start adding in latency caused by other parts of your application, you can really speed things up a lot overall.
For caching, you’ll most likely want to store this data in a cache system like Memcached or Redis, both of which have awesome Python libraries.
In pseudocode, you’ll likely do something like this:
```python
if session:
    user = cache.get(session)

    # If no user was found in the cache, query the database directly,
    # then cache the result for the next request.
    if not user:
        user = User.get(id=session)
        cache.set(session, user)
```
Implementation
If you’re using a web framework like Django, you can really easily do all of the things mentioned in this article by simply using the built-in auth system.
If you’re using another framework / tool, you might want to google around for libraries — there are typically a few good options to help with this stuff regardless of what tooling you’re using.
Lastly, if you’re using Python / Flask / Django, and want to get all the awesomeness of best practices around user storage and security, you might want to check out our developer service: Stormpath.
Our service stores user accounts and user data for you, taking care of password hashing, encryption, data security, best practices, and everything else.
It’s free to use for most applications and integrates easily into Python, Flask, and Django apps.
The latest release of our Python library includes built-in support for in-memory, memcached, and redis caching to ensure your site is ALWAYS as fast as possible, out of the box.
If you’d like to get started with Stormpath, you can check out our libraries here:
To learn more about what Stormpath is doing for password security, check out our security page.