When you build a web application, one thing you may need to think about is how you plan to store user files.
If you’re building an application that requires users to upload or download files (images, documents, receipts, etc.), file storage can be an important part of your application architecture.
Deciding where you’ll store these files, how you’ll access them, and how you’ll secure them is an important part of the engineering process, and can take quite a bit of time to figure out for complex applications.
In this guide, I’m going to walk you through the best ways to store files for your users if you’re already using Stormpath to handle your user storage, authentication, and authorization.
If you aren’t already using Stormpath, are you crazy?! Go sign up and start using it right now! It’s totally free (unless you’re building a large project) and makes building secure web applications, API services, and mobile apps wayyy simpler.
When building web applications, you’ve got a few choices for where to store your files. You can:
- Store user files in your database in a text column, or something similar
- Store user files directly on your web server
- Store user files in a file storage service like Amazon S3
Out of the above choices, the third one (a file storage service like Amazon S3) is your best bet.
Storing files in a database directly is not very performant. Databases are not optimized for storing large blobs of content: both storing and retrieving files from a database server are slow, and those large reads and writes tax every other query the database has to serve.
Storing files locally on your web server is also not normally a good idea. A given web server only has so much disk space, which means you now have to deal with the very real possibility of running out of disk space. Furthermore, ensuring your user files are properly backed up and easily accessible at all times can be a difficult task for even experienced engineers.
Unlike the other two options, storing files in a file storage service like S3 is a great option: it’s cheap, your files are replicated and backed up transparently, and you’re also able to quickly retrieve and store files there without taxing your web servers or database servers. It even provides fine-grained control over who can access what files, which allows you to build complex authorization rules for your files if necessary.
For storing what can sometimes be sensitive information, a file storage service like Amazon S3 is a great way to get the best of all worlds: availability, simplicity, and security.
To sign up for an Amazon Web Services (AWS) account and start using Amazon S3, visit the AWS website.
Now that we’ve talked about where you should store your user files (a service like Amazon S3), let’s talk about how you actually store your files there.
When storing files in S3, there are a few things you need to understand.
Firstly, you need to pick the AWS region in which you want your files to live. An AWS region is a group of Amazon datacenters in a particular part of the world.
Like all big tech companies, Amazon maintains datacenters all over the world so it can build fast services for users in different physical locations. One of the benefits of using an Amazon service is that you can take advantage of this to build faster web applications.
Let’s say you’re building a website for Korean users. You probably want to put all of your web servers and content in a datacenter somewhere in Korea. This way, when your users visit your site, they only need to connect over a short physical distance to your web server, thereby decreasing latency.
Amazon publishes the list of regions in which you can store files in S3 on its website.
The first thing you need to do is use that list to pick the most appropriate location for storing your files. If you’re building a web application that needs to be fast from all over the world, don’t worry: just pick the AWS region closest to you. You can always use a CDN service like Amazon CloudFront to optimize this later.
Next, you need to create an S3 bucket. An S3 bucket is basically a top-level container in which all of your files will be stored. I usually give my S3 buckets the same name as my application.
Let’s say I’m building an application called “The Greatest Test App”—I would probably name my S3 bucket: “the-greatest-test-app”.
S3 allows you to create as many buckets as you want, but each bucket name must be globally unique. That means that if someone else has already created a bucket with the name you want to use: you won’t be able to use it.
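If you want to derive the bucket name from your application name automatically, here’s a minimal Python sketch. The `bucket_name_for` helper is hypothetical, and it only checks the basic shape of a name (3 to 63 characters of lowercase letters, digits, and hyphens, starting and ending with a letter or digit), not every rule Amazon enforces. It also can’t tell you whether the name is already taken; only S3 can, when you try to create the bucket.

```python
import re

# Basic S3 bucket-name shape: 3-63 characters of lowercase letters, digits,
# and hyphens, beginning and ending with a letter or digit. (S3 also allows
# dots and has further rules; this sketch never produces dots.)
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def bucket_name_for(app_name: str) -> str:
    """Derive a candidate S3 bucket name from an application name."""
    # Lowercase, then turn every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", app_name.lower()).strip("-")
    # Respect the 63-character limit without leaving a trailing hyphen.
    slug = slug[:63].rstrip("-")
    if not _BUCKET_NAME.match(slug):
        raise ValueError(f"cannot derive a valid bucket name from {app_name!r}")
    return slug


print(bucket_name_for("The Greatest Test App"))  # the-greatest-test-app
```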
Finally, after you’ve picked your region and created your bucket, you can now start storing files.
This brings us to the next question: how should you structure your S3 bucket when storing user files?
The best way to do this is to partition your S3 bucket into user-specific sub-folders. (S3 doesn’t have real folders: a “sub-folder” is just a key prefix, such as a user ID followed by a slash, shared by every object inside it.)
Let’s say you have three users for your web application, and each one has a unique ID. You might then create one sub-folder per user in your main S3 bucket, named after that user’s ID. This way, when you store files for a user, those files land in that user’s sub-folder.
Here’s how this might look:
the-greatest-test-app
├── <user-1-id>
│   └── avatar.png
├── <user-2-id>
│   └── avatar.png
└── <user-3-id>
    └── avatar.png
This is a nice structure because you can easily see the separation of files by user, which makes managing these files in a central location simple. If you have multiple processes or applications reading and writing these files, they already know which files are owned by which user.
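In code, this convention boils down to prefixing every object key with the owner’s ID. The Python sketch below shows one way to do it; `user_file_key` and `owner_of` are hypothetical helper names, and `user-123` stands in for a real Stormpath account ID. Keeping only the last path component of an uploaded filename keeps keys exactly one level under the user’s prefix, and rejecting a `/` in the user ID guarantees `owner_of` recovers the right owner from any key.

```python
import posixpath


def user_file_key(user_id: str, filename: str) -> str:
    """Return the S3 key for `filename` inside `user_id`'s sub-folder (prefix)."""
    # A '/' in the ID would create extra prefix levels and confuse owner_of().
    if not user_id or "/" in user_id:
        raise ValueError(f"invalid user id: {user_id!r}")
    # Keep only the last path component of the client-supplied name.
    name = posixpath.basename(filename)
    if name in ("", ".", ".."):
        raise ValueError(f"invalid filename: {filename!r}")
    return f"{user_id}/{name}"


def owner_of(key: str) -> str:
    """Recover the owning user's ID from a key like 'user-123/avatar.png'."""
    user_id, slash, _ = key.partition("/")
    if not user_id or not slash:
        raise ValueError(f"key is not under a user prefix: {key!r}")
    return user_id


key = user_file_key("user-123", "avatar.png")  # "user-123" is a hypothetical ID
print(key, owner_of(key))  # user-123/avatar.png user-123
```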
Now that you’ve seen how to store files in S3, how do you ‘link’ those files to your actual Stormpath user accounts? The answer is custom data.
Custom Data is essentially a JSON store that Stormpath provides for every resource. It lets you store any arbitrary JSON data you want on your user accounts, which makes it the perfect place to keep file metadata and make searching for user files simpler.
Let’s say you have just uploaded two files for a given user into S3, and want to store a ‘link’ to those files in your Stormpath Account for that user. To do this, you add a JSON object describing each file (its S3 key, its URL, and when it was last modified) to your Stormpath user’s CustomData resource.
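Here’s a minimal Python sketch of what that custom data could look like and how you might build it. The layout is an assumption: a top-level `s3` object keyed by each file’s S3 key, with hypothetical `href` and `lastModified` fields, and the `us-west-2` region is only an example. The URL uses S3’s virtual-hosted style (`https://<bucket>.s3.<region>.amazonaws.com/<key>`). Saving the resulting dict onto the account is left to your Stormpath SDK and isn’t shown.

```python
import json
from datetime import datetime, timezone
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str) -> str:
    """Virtual-hosted-style URL for an S3 object (only works if permissions allow)."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def record_file(custom_data: dict, bucket: str, region: str, key: str,
                last_modified: datetime) -> None:
    """Add or update one file entry under the (assumed) "s3" field of custom data."""
    files = custom_data.setdefault("s3", {})
    files[key] = {
        "href": object_url(bucket, region, key),
        "lastModified": last_modified.astimezone(timezone.utc).isoformat(),
    }


# Two files just uploaded for hypothetical user "user-123".
custom_data: dict = {}
uploaded_at = datetime(2016, 3, 1, 12, 0, tzinfo=timezone.utc)
for name in ("avatar.png", "receipt.pdf"):
    record_file(custom_data, "the-greatest-test-app", "us-west-2",
                f"user-123/{name}", uploaded_at)

print(json.dumps(custom_data, indent=2))
```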
Keeping file metadata in custom data is useful because every time you have the user account object in your application code, you can easily know:
- What files this user owns.
- How to access each file the user owns by its public URL. NOTE: These URLs may not actually be public depending on how you permission these files in S3. More on this later.
- When each file was last modified.
This JSON data makes it much easier to build complex web applications, as you can seamlessly find your user files either directly from S3, or from your user account. Either way: finding the files you need is no longer a problem.
So far we’ve seen how you can store files, link them to your user accounts, and manage them.
But now let’s talk about how you can secure your user files.
Security is a large issue for sensitive applications. Storing medical records or personal information can be a huge risk. Ensuring you take the proper precautions when working with this type of data will save you a lot of trouble down the road.
There are several things you need to know about securely storing files in Amazon S3.
First: let’s talk about file encryption.
If you’re building a web app that stores personal information of some sort, you’ll want to use client-side encryption. This is the most “secure” form of file storage, as it requires you (the developer) to encrypt the files on your web server BEFORE storing them in S3. As long as the encryption keys never leave your control, Amazon (as a company) cannot decrypt and view your stored files, no matter what happens.
On the other hand, if your application doesn’t need the extra (and usually more complicated) protection of client-side encryption, you can instead use S3’s built-in server-side encryption. Because Amazon manages the keys, it could theoretically decrypt and read your files, but your files are still encrypted at rest, which provides a decent amount of protection against many forms of attack.
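To make the choice concrete, here’s a hedged Python sketch that builds the keyword arguments for boto3’s `put_object` call. `put_object_params` is a hypothetical helper: if you pass it your own `encrypt` function (one built on a vetted crypto library, with keys you hold), the body is encrypted before it ever leaves your server; otherwise it asks S3 for server-side encryption with S3-managed keys via `ServerSideEncryption="AES256"`. The boto3 call itself appears only as a comment, so the sketch runs without AWS credentials.

```python
from typing import Callable, Optional


def put_object_params(bucket: str, key: str, body: bytes,
                      encrypt: Optional[Callable[[bytes], bytes]] = None) -> dict:
    """Build kwargs for boto3's s3.put_object(...).

    encrypt given -> client-side: only ciphertext is sent to S3.
    encrypt None  -> server-side: S3 encrypts the object at rest (SSE-S3).
    """
    if encrypt is not None:
        return {"Bucket": bucket, "Key": key, "Body": encrypt(body)}
    return {"Bucket": bucket, "Key": key, "Body": body,
            "ServerSideEncryption": "AES256"}


# Usage with boto3 (not executed here):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(**put_object_params("the-greatest-test-app",
#                                     "user-123/avatar.png", data))
params = put_object_params("the-greatest-test-app", "user-123/avatar.png", b"...")
print(params["ServerSideEncryption"])  # AES256
```

Newer S3 buckets apply SSE-S3 to new objects by default (since early 2023), so on those the explicit parameter mostly documents intent.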
The next thing you need to know about is file permissions, which S3 manages through access control lists (ACLs). Amazon’s S3 documentation covers ACLs in full.
The gist of it is, however, that when you upload files to S3, you can tell Amazon to give your files certain permissions.
You can say things like:
- Ensure only I (the creator of the file) can view or change this file
- Make this file publicly accessible to anyone who has the file URL
- Make this file temporarily accessible to anyone who has a time-limited (pre-signed) URL
Using these controls, you get fine-grained control over who has access to which files, and for how long: it is an ideal system for building secure applications.
A general rule of thumb is to only grant file permissions when absolutely necessary. Unless you’re building a public image hosting service, or storing files that are meant to be publicly accessible always (like user avatars), you’ll probably want to lock your files down to the maximum extent possible.
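A simple way to follow this rule in code is to default every upload to the `private` canned ACL and opt in to `public-read` only for file kinds you’ve deliberately made public. The names in this sketch (`PUBLIC_FILE_KINDS`, `acl_for`) are hypothetical; the two ACL strings are real S3 canned ACLs, which boto3’s `put_object` accepts through its `ACL` parameter. On buckets with ACLs disabled (the default for new buckets since 2023), you would express the same rule with a bucket policy instead.

```python
# Hypothetical: the only kinds of user file meant to be world-readable.
PUBLIC_FILE_KINDS = frozenset({"avatar"})


def acl_for(kind: str) -> str:
    """Return the canned ACL for a kind of user file, defaulting to "private".

    "private" gives the object owner full control and no one else access;
    any kind not explicitly listed as public fails closed.
    """
    return "public-read" if kind in PUBLIC_FILE_KINDS else "private"


# e.g. s3.put_object(Bucket=bucket, Key=key, Body=data, ACL=acl_for("receipt"))
print(acl_for("avatar"), acl_for("receipt"))  # public-read private
```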
Now that we’ve covered the main things you need to know to securely store user files for your user accounts with S3, let’s do a quick review of what we’ve learned.
When storing user files, keep them namespaced by user IDs in your S3 bucket. This way, you can easily distinguish between user files when looking at them from your storage service alone.
Use Stormpath’s Custom Data store to store all user file metadata. This way you have a single, simple place to reference all of your file data from your user account alone.
If you’re not using Stormpath to store your user accounts: you’ll want to build something similar.
If you’re building a sensitive application: use client-side encryption to encrypt your files before storing them in S3. As long as you keep the keys to yourself, not even Amazon can read them.
If you’re not building a sensitive application, use Amazon’s server-side encryption to help alleviate various security concerns. It’s not as secure as client-side encryption, but is better than nothing.
Finally, be sure to only grant the minimal necessary permissions you need for each file you store. This way, files are not left open or accessible to people who shouldn’t see them.
And… That’s it! If you follow these rules for storing user files, you’ll do just fine.