Encrypting files in AWS S3 using Java API

If you use AWS S3 Java API, and would like to see how you can encrypt files on S3, this post is for you.

First of all, there are two ways you can encrypt files in S3. One is to encrypt files on the server side, and one is to encrypt files on the client side. With using the server side option, you don’t have to worry about too much. S3 encrypts the files for you when they are written to disk, and decrypts them when they are read, seamlessly. With the client side option, the client (your application) has to encrypt files before transmitting them to S3, and decrypt them after receiving the file from S3.

In this post I’ll cover server side encryption. We opted to use this one because it’s just simpler, and seamless. You don’t have to worry about encrypting/decrypting files yourself, nor do you have to worry about the key.

I’m assuming that you’re already familiar with the AWS Java API. For most things related to S3, AWS provides a class called AmazonS3Client. Once you have AmazonS3Client instantiated with your configuration, you will need to enable encryption in the matadata for each file you upload.

Example:

File fileForUpload = new File(...);
AmazonS3Client amazonS3Client = new AmazonS3Client(...);
ObjectMetadata meta = new ObjectMetadata();
meta.setContentType(URLConnection.guessContentTypeFromName(fileForUpload.getName()));
meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
amazonS3Client.putObject(s3Bucket, s3FullDestinationPath, new FileInputStream(fileForUpload), meta);

Let’s examine. First you instantiate the File you want to upload, and AmazonS3Client. Next you set the metadata on the file. This includes setting the content type of the file (important because having the wrong content-type can cause issues down the line), and sets the encryption flag for the file. Then when you upload the file using AmazonS3Client.putObject(…), the file will be encrypted by S3 before it is stored, and automatically decrypted when it is retrieved, all by S3’s servers. And that’s it!

Note that according to AWS Java API documentation, AmazonS3Client uses SSL under the hood so you don’t have to worry about transmitting unencrypted files over the network.

Migrating AWS RDS MySQL instances from non-encrypted to encrypted

We had to do this exercise recently due to a security audit requirement, so I thought I’d write about it. If you’ve got old AWS (Amazon Web Services) RDS (Relational Database Service) instances around since before encrypted databases were an option in RDS, or you just never encrypted your databases, and are now deciding to encrypt them, you’ve come to the right place. The steps below apply specifically to MySQL RDS instances, but the same guidelines can be used for other database server types as well.

In summary, RDS doesn’t give you an option to simply encrypt your database if it was created as non-encrypted. Furthermore, you cannot take a snapshot of your non-encrypted database and create an encrypted instance out of it. Essentially you have to manually export the data from the non-encrypted instance, import it into the new encrypted instance, and switch your applications over to use the new encrypted database instances. Then you can get rid of your old non-encrypted instances.

Note that RDS does not allow you to create a full blown replicated database (that is not a read-only replica tied to the existence of a master database). Ideally, this feature would exist in RDS, in which case you can use replication from one DB instance (the un-encrypted one) to another (the encrypted one) so that data is automatically replicated and synchronized between the two. This is essential if you have live applications using your databases, in which case you can have almost no downtime if you can replicate data from an old database to a new one, and simply switch the application to use the new database without doing any manual export/import of data.

So unfortunately if you have live applications using your non-encrypted AWS RDS databases and you need to migrate to encrypted databases, you’ll need to pick a time to do the migration, get prepared, let your users know about the maintenance downtime, and take your applications offline to make it happen. (I’m hoping the good folks at AWS one day soon add a feature for us to fully replicate independently standing databases within RDS).

Anyway, on to the steps. To start out, depending on the size of your databases and your connection speed, you’ll need to decide whether to export the data from your non-encrypted database onto a machine external to AWS (such as your own developer/administration machine wherever you are), OR export it onto an EC2 instance in AWS. If your databases are relatively small in size (and non-numerous) or you just have a ton of bandwidth, you can decide to download all the data onto your own machine. Otherwise I’d recommend you create an EC2 instance in AWS if you don’t have one already, and use that to temporarily act as the machine to export data to, and import data from.

We used a Linux EC2 machine so I’ll focus on that. First of all, you want to make sure that the EC2 machine has an encrypted volume attached to it. This ensures that the data you export doesn’t end up on a non-encrypted disk in AWS (so as not to violate any security rules or policies for your data). At the time of writing this blog entry, EC2 machine root volumes cannot be encrypted, but you can attach encrypted volumes to them. In summary, from the AWS web console, in the EC2 console, you can create an encrypted volume and attach it to your EC2 machine. Then log on to your machine, format the volume, and mount it. I’m sure there are various guides out there for this, so I won’t focus on the nitty gritty.

ssh into your Linux EC2 instance and ensure that mysql client is installed by typing in the “mysql” command. If not, try “yum install mysql” to install it. Next, if you have security group (firewall) rules applied to your RDS instances, make sure that the EC2 machine can connect to the databases (add the IP for the EC2 machine to your RDS security group(s)). Ensure you can connect to your database by typing in the following command: mysql -u (username) -p –host=(database hostname)

You will probably want to create the new encrypted databases in RDS ahead of time from the actual scheduled “maintenance” with your users, so that there is minimal downtime during the actual maintenance window. So assuming your encrypted databases are created and ready, you’re in the maintenance window and are ready to migrate, and have taken your live applications offline, you can begin exporting data from each database.

Now you’re finally ready to export the data from the database. Connect to the EC2 linux machine and cd to the directory the encrypted volume is mounted on. Type in the following command to dump the database from the old non-encrypted MySQL RDS instance:

mysqldump –opt –events –routines –triggers –user=(username)-p –host=(hostname) (database name) > (database name).sql

Of course you’ll want to replace the username, hostname, and database names (everything in parenthesis) with real values. You will be prompted for the password. This command includes everything you’ll need from your old database. More information, or if you want to include multiple databases from the same MySQL server, can be found here on the mysqldump command: http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html

Then to import the data into your new encrypted database, use the following command:

mysql –user=(user) -p –host=(hostname) -e “drop database if exists (database name); create database (database name); use (database name); source (database name).sql;”

Note that the export is usually very fast, but the import is slower. Also note that if you changed the username from what it was in the old database, you’ll need to modify all instances of the username in the .sql file dumped from mysql dump. In order to accomplish that, try this sed command:

sed -i “s/\`(old username)\`@\`%\`/CURRENT_USER/g” (database name).sql

Lastly, after the export and import are finished, update the hostnames of the old databases with the new ones in all your applications. Try out your applications to ensure your new databases are being queried. And once everything checks out, at this point you are ready to update your live applications and put them back online!