Cryptography is an important part of almost every web or mobile application and yet most developers feel that they don’t understand it or worse, are doing it wrong. Yes, the field of cryptography is dominated by uber-smart mathematicians, researchers, and PhDs galore but thanks to the hard work of those people, we have great tools that the average dev can benefit from.

Here at Stormpath, our goal is to make developer’s lives easier and help them build more secure applications faster. Sharing the basics of cryptography will hopefully help you in your development projects.

What is Cryptography

Cryptography is the practice of protecting information from undesired access by hiding it or converting it into nonsense. For developers, cryptography is a notoriously complex topic and difficult to implement.

The goal of this guide is to walk you through the basics of Cryptography. How some of these algorithms and techniques are implemented is often dependent on your chosen language and/or framework but this guide will hopefully help you know what to look for. If you’re a Java developer, check out Apache Shiro. It’s a popular security framework thats makes implementing crypto much easier than messing with the JCE. ::cough:: We’re the authors.

Terminology

First, a bit of terminology. In crypto-land, you can break most algorithms down to ciphers and hashes.

  • Ciphers are algorithms that can encrypt or decrypt data based on shared or public/private keys.
  • Hashes (a.k.a. Message Digests) are one way, irreversible conversion of an input source. Often used for passwords and checksums.

When to use a Cipher vs a Hash

Put simply, you should use a hash when you know you will not need the original data in plain text ever again. When would you ever take data that you don’t ever need it’s original value? Well, passwords are a good example.

Let’s say Sean’s password is ‘Suck!tTrebek’. Your application doesn’t actually need to know the raw value of the password. It just needs to be able to verify that any future authentication attempt by Sean gives you a matching value. Hashes are great for that because the hash, while gibberish, will always be the same for a particular input, thereby letting you verify a match. Because you just need the hash to perform comparisons, the big benefit to you is that you don’t need to store Sean’s raw (plaintext) password directly in your database, which would be a bad idea.

Another example is a checksum. A checksum is a simple way to know if data has been corrupted or tampered with during storage or transmission. You take the data, hash it, and then send data along with the hash (the checksum). On the receiving end, you’ll apply the same hashing algorithm to the received data and compare that value to the checksum. If they don’t match, your data has been changed. This is one of the strategies Stormpath uses to prevent man-in-the-middle attacks on API requests.

Ciphers on the other hand are your tool if you will need the original raw value (called ‘plaintext’ in cryptography, even if its binary data) eventually. Credit card numbers are a good example here. Sean gives you his credit card and later you’ll need the plain text value to process a transaction. In order to encrypt and decrypt any piece of data you’ll need deal with keys and make sure you’re keeping them safe.

Working with Ciphers

Want to encrypt and decrypt your sensitive data like a boss? Let’s talk ciphers.

Choosing a Cipher Algorithm

If you’re reading this… well, then you should probably stick to AES-256 Encryption as it is approved by the US Military for top-secret material.

Most languages and many security frameworks support multiple cipher algorithms but determining which is appropriate for you is a complex topic outside the scope of this guide. Consult your local cryptanalyst.

Ciphers typically come in a few varieties. Symmetric or Asymmetric Ciphers. Symmetric ciphers can be further categorized into Block or Stream ciphers. Let’s discuss the differences.

Symmetric vs Asymmetric Cipher

Symmetric encryption (aka secret key encryption) uses the same (or trivially similar) key to both encrypt and decrypt data. As long as the sender and receiver both know the key, then they can encrypt and decrypt all the messages that use that use that key. AES and Blowfish are both Symmetric (Block) Ciphers.

But there could be a problem. What happens if the key falls into the wrong hands. Oh No!

Asymmetric ciphers to the rescue. Asymmetric encryption (aka public key encryption) uses a pair of keys, one key to encrypt and the other to decrypt. One key, the public key, is published openly so that anyone one can send you a properly encrypted message. The other key, the private key, you keep secret to yourself so that only you can decrypt those messages. Asymmetric sounds perfect, right? It has limitations too, unfortunately. It’s slower, using up more computing resources than a symmetric cipher and it requires more coordination between parties for each direction of communication.

Consider defaulting to asymmetric encryption until your project requirements suggest otherwise. And always remember to properly secure you private keys.

Block vs Stream Cipher

Symmetric ciphers can be categorized into two sub-categories: Block Ciphers and Stream Ciphers. A Stream Cipher has a streaming key that is used to encrypt and decrypt data. A Block Cipher uses a ‘block’ of data (a ‘chunk’ of bytes = a byte array) as the key that is used to encrypt and decrypt. We recommend most people use Block Ciphers since byte array keys are easy to work with and store. AES and Blowfish are both Block Ciphers for example.

Stream ciphers have their benefits (like speed) but are harder to work with so we don’t generally recommend them unless you know what you’re doing.

There’s a common misconception that Block ciphers are for block data like files and data fields while Stream ciphers are for stream data like network streams. This is not the case. The word ‘stream’ reflects a stream of bits used for the key, not the raw data (again, _plaintext) being encrypted or decrypted.

Securing Keys… Yes

Speaking of securing your keys, you should! No, I’m serious. If someone ever gets a hold of your private keys, all your encrypted data might as well be in plaintext. Like most everything in Cryptography, there are very advanced strategies but most are outside the skill set and budget of most developers. If you can afford key management software, then it’s your best and safest bet. Otherwise, you can still reduce some risk with basic strategies.

  1. Keep your keys on another server and database than your encrypted data.
  2. Keep that server’s security patches 100% up-to-date all the time (Firmware, OS, DB, etc).
  3. Lock down its network access as much as you can. Invest in a good firewall.
  4. Limit who on your team has access to the server and enforce multi-factor authentication for them to access it.

Working with Hashes

Don’t actually care what the original input value was but still need to do other things like matching? Let’s hash it out… get it?

Choosing a Hash Algorithm

Here too, most languages and many security frameworks support a variety of algorithms including MD-2, MD-5 and SHA-1, SHA-256, SHA-384, SHA-512, BCrypt, PBKDF2, and Scrypt. Unless you have a requirement for a particular hashing algorithm, we recommend you stick to BCrypt for secure data like passwords. It is a widely used and reviewed algorithm.

If you’ve heard of Scrypt before, then you’re probably wondering “Shouldn’t I be using Scrypt? Isn’t it better?” Maybe. Cryptanalysts are nothing if not paranoid (to our benefit). The prominent view among experts is that Scrypt is very promising but still too new to be considered a guaranteed bet for most people. So we recommend you stick with Bcrypt for now until more concrete research confirms SCrypt as better.

Salting and Repeated Hashing

For secure data, hashing is often not enough. The same message hashed will always have the same output and can therefore be susceptible to dictionary and rainbow table attacks.

To protect against these attacks, you should use salts. A salt is (preferably) randomly generated data that is used as an input to a hashing algorithm to generate the hash. This way, the same input message with two different salts will have different hashes.

Again, think of a password. If your password is 12345, then the hashed output would be the same anyone else who uses the same password (and very easy to guess for an attacker), unless the password is salted as well and each person has a different salt. Oh, and please don’t use the same salt for every record, that’s not a good idea. Different secure random salts for each password hash is a good idea.

In addition to salting, repeated hashing is recommended. Repeated hashing increases the time it takes someone to try to guess a password. For a user in human time, a difference of milliseconds to half a second is probably negligible. But to an attacker running a script with billions of password candidates, the added time per password can change the time it takes to attack a password from hours to years or even centuries! You’ll need to play with the appropriate number of iterations (or ‘rounds’ in BCrypt speak) in order to find the appropriate balance of security and performance.

What work factor (or # of iterations) should I use?

The complexity of your hashing algorithm comes down to performance versus security. If you’re talking about passwords, then a good rule of thumb is 500 milliseconds to process the algorithm on your production hardware. Most people won’t notice a half second delay during an authentication but it will make an attack prohibitively expensive for many attackers.

Keeping your Hashes and Ciphers up to date

The limiting factors to someone cracking your hashed or encrypted data is compute power and time. If a password is properly hashed with strong salts and a high number of iterations (work factor), it might take an attacker years to break a single password— today. With passwords in particular, you could have a high enough complexity factor so that it takes 0.5 to 1 second to process a password. But Moore’s law presents a problem. Every year compute power doubles and compute costs drop significantly. Moreover, new technologies pop up all the time that give attackers new advantages, like elastic on-demand compute clouds or power GPUs. So, what was a pretty secure hashing strategy today, may not be tomorrow. For any production application, you should be evaluating your strategy at least once a year and perhaps ratcheting up things like the number of hash iterations or salt sizes.

About Stormpath

Stormpath is a user mananagement API for developers. We make it easy to register, login, and manage users without build a database, worrying about data encryption or spending time on maintenance. Stormpath plugs into identity stores like Facebook, Google, LDAP and Active Directory, and also hosts robust user profiles.