Setting up AWS CLI and dumping an S3 bucket

AWS CLI (command line interface) is very useful when you want to automate certain tasks. This post is about dumping a whole S3 bucket from the command line. This could be for any purpose, such as creating a backup.

First of all, if you don’t already have it installed, you’ll need to download and install the AWS CLI. More information here: http://docs.aws.amazon.com/cli/latest/userguide/installing.html

To configure AWS CLI, type the command:

aws configure

It will ask for your credentials: the Access Key ID and the Secret Access Key. More information on how to set up a key is here: http://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html

And that’s it! You now have the power to manipulate your AWS environment from your command line.

In order to dump a bucket, you’ll need to first make sure that the account belonging to the AWS Key you generated has read access to the bucket. More on setting up permissions in S3 here: http://docs.aws.amazon.com/AmazonS3/latest/dev/s3-access-control.html

To dump the whole contents of an S3 bucket, you can use the following command:

aws s3 cp --quiet --recursive s3://(bucket name)/ (local directory)

This will copy the entire contents of the bucket to your local directory. As easy as that!

Creating a simple ping servlet for AWS Elastic Load Balancer (ELB) Health Check

If you use the AWS Elastic Load Balancer (ELB) you’ll need to decide what to use as an endpoint on your application server for the health checker. This is how the ELB will determine if an instance is healthy, or not.

If you use the default “/” path, this may mean that a session in your application is kicked off every time the health checker connects, which could translate to unnecessary added load on your server (though perhaps it may be negligible).

Furthermore, if you map a static .html file and point the health checker to that, it could turn out that even though your application server is hung, the simple static .html file is still being served. This would not make for an accurate health check, as I learned from experience.

The best way to ensure that your application server is online and not hung, without adding extra load to your server, is to create a simple program that runs on the application server itself. In the case of a Java application server, you can create a simple servlet, as follows:

package com.whatever.aws.elb;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@SuppressWarnings("serial")
public class Ping extends HttpServlet {
 private String message;

 public void init() throws ServletException {
  message = "Pong";
 }

 public void doGet(HttpServletRequest request,
  HttpServletResponse response)
 throws ServletException, IOException {
  response.setContentType("text/html");

  PrintWriter out = response.getWriter();
  out.println("<h1>" + message + "</h1>");
 }
}

And map the servlet to a path in web.xml:

<servlet>
  <servlet-name>Ping</servlet-name>
  <servlet-class>com.whatever.aws.elb.Ping</servlet-class>
  <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
  <servlet-name>Ping</servlet-name>
  <url-pattern>/Ping</url-pattern>
  <url-pattern>/ping</url-pattern>
</servlet-mapping>

Now, you can configure the ELB health checker to connect to the “/ping” path on your instance. If it times out, or returns an error, that means the application server is not healthy. If it returns a normal HTTP code, then all is good.
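Before pointing the ELB at your instance, you can sanity-check this request/response pattern locally. The sketch below is only an illustration, not the servlet above: it uses the JDK’s built-in com.sun.net.httpserver as a stand-in for your servlet container, serves “Pong” at /ping, and performs the same GET the health checker would:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PingCheckDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the servlet container: serve "Pong" at /ping.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ping", exchange -> {
            byte[] body = "<h1>Pong</h1>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();

        // What the ELB health checker effectively does: GET /ping and expect a 2xx status.
        int port = server.getAddress().getPort();
        HttpURLConnection conn =
            (HttpURLConnection) new URL("http://localhost:" + port + "/ping").openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        System.out.println("Health check HTTP status: " + conn.getResponseCode());
        server.stop(0);
    }
}
```

A 2xx status here is what tells the ELB the instance is healthy; a timeout or an error status marks it unhealthy.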

Encrypting already existing files in AWS S3 using the AWS Java API

In my last post I covered how to server-side encrypt files in S3 using the AWS Java API. Unfortunately, if you didn’t turn on encryption from the very first day you started uploading to S3, you may have some files that are not encrypted. This post covers a simple block of Java code which you can use to server-side encrypt any existing files that aren’t already encrypted, using the AWS Java API.

In summary, you need to loop through all existing files in the bucket and check which ones are not encrypted. For each unencrypted file, you set the metadata to turn on server-side encryption and save the file again in S3. Note: this may change the timestamps on your files, but it is essentially the only way through the API to save the metadata for a file and turn on encryption.

Here is the code:

public S3EncryptionMigrator(String bucketName) {
 Logger.getLogger("com.amazonaws.http.AmazonHttpClient").setLevel(Level.OFF); // The AWS API logs too much information, completely flooding the console; turn it off

 AmazonS3Client amazonS3Client = new AmazonS3Client(...);

 ObjectListing objectListing = amazonS3Client.listObjects(bucketName);
 List<S3ObjectSummary> s3ObjectSummaries = objectListing.getObjectSummaries();
 while (objectListing.isTruncated()) {
  objectListing = amazonS3Client.listNextBatchOfObjects(objectListing);
  s3ObjectSummaries.addAll(objectListing.getObjectSummaries());
 }

 for (S3ObjectSummary s3ObjectSummary: s3ObjectSummaries) {
  String s3ObjectKey = s3ObjectSummary.getKey();
  S3Object unencryptedS3Object = amazonS3Client.getObject(bucketName, s3ObjectKey);
  ObjectMetadata meta = unencryptedS3Object.getObjectMetadata();
  String currentSSEAlgorithm = meta.getSSEAlgorithm();
  unencryptedS3Object.close();
  if (currentSSEAlgorithm != null && currentSSEAlgorithm.equals(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION))
   continue; //Already encrypted, skip
  meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); //set encryption
  CopyObjectRequest copyObjectRequest = new CopyObjectRequest(bucketName, s3ObjectKey, bucketName, s3ObjectKey);
  copyObjectRequest.setNewObjectMetadata(meta);
  amazonS3Client.copyObject(copyObjectRequest); //Save the file
  System.out.println(">> '" + s3ObjectKey + "' encrypted.");
 }
}

Let’s examine the code. First you instantiate AmazonS3Client with the correct credentials; this should be tailored to your S3 authentication setup. You start by getting a list of all the files in the bucket. Note that you have to keep calling listNextBatchOfObjects() while the listing is truncated, because only 1,000 results are returned at a time; if you have more than 1,000 files, you loop until you’ve collected all of them.

Then you loop through the list of files. For each file you check if server-side encryption is already turned on by reading the existing metadata of the file. If not, you set the flag for encryption, and then essentially copy the file onto itself. This will save the new metadata, and will turn on server-side encryption.

Encrypting files in AWS S3 using Java API

If you use the AWS S3 Java API and would like to see how you can encrypt files on S3, this post is for you.

First of all, there are two ways you can encrypt files in S3: on the server side, or on the client side. With the server-side option, you don’t have to worry about much: S3 encrypts the files for you when they are written to disk, and decrypts them when they are read, seamlessly. With the client-side option, the client (your application) has to encrypt files before transmitting them to S3, and decrypt them after receiving them from S3.

In this post I’ll cover server side encryption. We opted to use this one because it’s just simpler, and seamless. You don’t have to worry about encrypting/decrypting files yourself, nor do you have to worry about the key.

I’m assuming that you’re already familiar with the AWS Java API. For most things related to S3, AWS provides a class called AmazonS3Client. Once you have AmazonS3Client instantiated with your configuration, you will need to enable encryption in the metadata for each file you upload.

Example:

File fileForUpload = new File(...);
AmazonS3Client amazonS3Client = new AmazonS3Client(...);
ObjectMetadata meta = new ObjectMetadata();
meta.setContentType(URLConnection.guessContentTypeFromName(fileForUpload.getName()));
meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
amazonS3Client.putObject(s3Bucket, s3FullDestinationPath, new FileInputStream(fileForUpload), meta);

Let’s examine. First you instantiate the File you want to upload, and AmazonS3Client. Next you set the metadata on the file: this includes setting the content type (important because the wrong content type can cause issues down the line) and setting the encryption flag. Then when you upload the file using AmazonS3Client.putObject(…), the file will be encrypted by S3 before it is stored, and automatically decrypted when it is retrieved, all on S3’s servers. And that’s it!

Note that according to AWS Java API documentation, AmazonS3Client uses SSL under the hood so you don’t have to worry about transmitting unencrypted files over the network.

Migrating AWS RDS MySQL instances from non-encrypted to encrypted

We had to do this exercise recently due to a security audit requirement, so I thought I’d write about it. If you’ve got old AWS (Amazon Web Services) RDS (Relational Database Service) instances around since before encrypted databases were an option in RDS, or you just never encrypted your databases, and are now deciding to encrypt them, you’ve come to the right place. The steps below apply specifically to MySQL RDS instances, but the same guidelines can be used for other database server types as well.

In summary, RDS doesn’t give you an option to simply encrypt your database if it was created as non-encrypted. Furthermore, you cannot take a snapshot of your non-encrypted database and create an encrypted instance out of it. Essentially you have to manually export the data from the non-encrypted instance, import it into the new encrypted instance, and switch your applications over to use the new encrypted database instances. Then you can get rid of your old non-encrypted instances.

Note that RDS does not let you set up full replication between two independently standing database instances (as opposed to a read-only replica tied to the existence of a master database). Ideally this feature would exist in RDS: you could then replicate from the non-encrypted instance to the encrypted one so that data is automatically synchronized between the two. This matters if you have live applications using your databases, because with replication you could switch an application over to the new database with almost no downtime and without any manual export/import of data.

So unfortunately if you have live applications using your non-encrypted AWS RDS databases and you need to migrate to encrypted databases, you’ll need to pick a time to do the migration, get prepared, let your users know about the maintenance downtime, and take your applications offline to make it happen. (I’m hoping the good folks at AWS one day soon add a feature for us to fully replicate independently standing databases within RDS).

Anyway, on to the steps. To start out, depending on the size of your databases and your connection speed, you’ll need to decide whether to export the data from your non-encrypted database onto a machine external to AWS (such as your own developer/administration machine wherever you are), OR export it onto an EC2 instance in AWS. If your databases are relatively small in size (and non-numerous) or you just have a ton of bandwidth, you can decide to download all the data onto your own machine. Otherwise I’d recommend you create an EC2 instance in AWS if you don’t have one already, and use that to temporarily act as the machine to export data to, and import data from.

We used a Linux EC2 machine so I’ll focus on that. First of all, you want to make sure that the EC2 machine has an encrypted volume attached to it. This ensures that the data you export doesn’t end up on a non-encrypted disk in AWS (so as not to violate any security rules or policies for your data). At the time of writing this blog entry, EC2 machine root volumes cannot be encrypted, but you can attach encrypted volumes to them. In summary, from the AWS web console, in the EC2 console, you can create an encrypted volume and attach it to your EC2 machine. Then log on to your machine, format the volume, and mount it. I’m sure there are various guides out there for this, so I won’t focus on the nitty gritty.

ssh into your Linux EC2 instance and ensure that the mysql client is installed by typing in the “mysql” command. If it’s not there, try “yum install mysql” to install it. Next, if you have security group (firewall) rules applied to your RDS instances, make sure that the EC2 machine can connect to the databases (add the IP for the EC2 machine to your RDS security group(s)). Ensure you can connect to your database with the following command: mysql -u (username) -p --host=(database hostname)

You will probably want to create the new encrypted databases in RDS ahead of the actual scheduled maintenance window, so that there is minimal downtime during the maintenance itself. So, assuming your encrypted databases are created and ready, you’re in the maintenance window, and you have taken your live applications offline, you can begin exporting data from each database.

Now you’re finally ready to export the data from the database. Connect to the Linux EC2 machine and cd to the directory the encrypted volume is mounted on. Type in the following command to dump the database from the old non-encrypted MySQL RDS instance:

mysqldump --opt --events --routines --triggers --user=(username) -p --host=(hostname) (database name) > (database name).sql

Of course you’ll want to replace the username, hostname, and database names (everything in parentheses) with real values. You will be prompted for the password. This command includes everything you’ll need from your old database. More information on the mysqldump command, including how to dump multiple databases from the same MySQL server, can be found here: http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html

Then to import the data into your new encrypted database, use the following command:

mysql --user=(user) -p --host=(hostname) -e "drop database if exists (database name); create database (database name); use (database name); source (database name).sql;"

Note that the export is usually very fast, but the import is slower. Also note that if you changed the username from what it was in the old database, you’ll need to modify all instances of the username in the .sql file produced by mysqldump. To accomplish that, try this sed command:

sed -i "s/\`(old username)\`@\`%\`/CURRENT_USER/g" (database name).sql
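If you want to see what that substitution does before running it over a large dump file, the same replacement can be expressed in plain Java; the DEFINER line and the user name olduser below are hypothetical stand-ins for whatever your mysqldump output actually contains:

```java
public class DefinerRewriteDemo {
    public static void main(String[] args) {
        // A sample line as mysqldump might emit it (hypothetical content).
        String line = "/*!50013 DEFINER=`olduser`@`%` SQL SECURITY DEFINER */";
        // The same substitution the sed command performs for the old user "olduser".
        String fixed = line.replace("`olduser`@`%`", "CURRENT_USER");
        System.out.println(fixed);
    }
}
```

The sed version performs this same replacement in place across the whole .sql file.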

Lastly, after the export and import are finished, update the old database hostnames to the new ones in all your applications. Try out your applications to ensure the new databases are being queried. Once everything checks out, you are ready to put your live applications back online!

The power of AWS Elastic Beanstalk Environment Configuration using .ebextensions

This is a quick post exploring the usefulness of AWS Elastic Beanstalk Environment Configuration files using “.ebextensions”.

.ebextensions config files, written in YAML (http://yaml.org/), can be used to set up the server platform by automatically performing various custom actions and configuration when an application is uploaded to AWS Elastic Beanstalk.

Through .ebextensions you can:

  • Create configuration or other files (SSL certificates, etc) on the server machine
  • Install packages/programs
  • Start/stop services
  • Execute custom commands
  • And much more

This can help you set up a new or existing server, as far as the configuration on the server machine is concerned, without manually having to do it yourself every time you deploy a new application.

Since I’m most familiar with how .ebextensions works with Java .war files deployed to AWS Elastic Beanstalk, here’s a quick rundown on how to set it up for your Java environment: in your web project’s WebContent folder, create a folder called “.ebextensions”. Within that folder you can create one or many files ending with a .config extension. Any and all .config files within it (ProjectRoot/WebContent/.ebextensions/*.config) will get executed when you upload the .war file for your project to AWS Elastic Beanstalk.
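To make this concrete, here is a sketch of what a single .config file might contain; the package name, file path, and command below are hypothetical examples chosen for illustration, not anything your project requires:

```yaml
# ProjectRoot/WebContent/.ebextensions/01-setup.config (hypothetical example)
packages:
  yum:
    # install an OS package on the instance
    htop: []

files:
  # create a configuration file on the server machine
  "/etc/myapp/app.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      log_level = info

commands:
  # run an arbitrary shell command during deployment
  01_log_deploy:
    command: echo "deployed at $(date)" >> /tmp/deploy.log
```

Each top-level key (packages, files, commands, and so on) maps to one of the capabilities listed above.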

So if you’re using AWS Elastic Beanstalk and aren’t yet using .ebextensions, I would highly recommend you look into it. There is more documentation here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/ebextensions.html

Resolving Vaadin push with long polling issues in AWS Elastic Beanstalk

This took me a long time to figure out, so I thought I’d share it with anyone else struggling with this.

When you upload a Vaadin application to the AWS Elastic Beanstalk, Tomcat sits behind an Apache server that proxies connections to it. (Note: this may apply to you even if you’re not in AWS/ElasticBeanstalk. If you use Apache as a proxy to an application server, be it Tomcat or something else, this could still be an issue).

If you’re using Vaadin push in long polling mode (Transport.LONG_POLLING), it turns out that the connection between the client (browser) and the server (Tomcat, or your other Java application server) gets dropped due to a timeout. If no data is exchanged between the server and the client, the connection is deemed “idle” by Apache and gets dropped. This is a problem because with long polling the HTTP connection remains open indefinitely, and that open connection is how information is pushed from the server to the client.

Now, Vaadin’s push is supposed to recover from a dropped connection: the connection simply gets reestablished and everything resumes. That’s the ideal scenario. In my experience, it doesn’t always work the way it should. If I’m in a Vaadin web app in my browser and sit idle for a while, when I come back to it, much of the time it resumes fine (I can see push reestablishing itself in the console), but sometimes the app just hangs forever.

Another thing to note is that Vaadin uses a heartbeat between the client and the server to check that the client is still there. The default heartbeat interval is 5 minutes. When a heartbeat is sent, Apache’s idle-connection timeout is reset. So the goal is to adjust the configuration so that, even with no user activity, a heartbeat happens before Apache times out the HTTP connection.

So there are a few ways to solve this:

  1. Modify Apache’s timeout for idle connections. I believe the default setting in AWS’s configuration for Apache is 60 seconds. You’ll need to ssh into your server, edit /etc/httpd/conf/httpd.conf, and look for the “Timeout” directive. Change the Timeout directive to be greater than the default Vaadin heartbeat of 5 minutes. Try 6 minutes (note: the Timeout value is in seconds), so you’d set “Timeout 360”. The downside of this approach is that if your server handles hundreds of thousands of requests, surely some legitimately idle connections (where the client is gone without cleaning up and isn’t coming back) will linger for longer than 60 seconds, thus affecting your system performance.
  2. Modify Vaadin’s heartbeat to be less than Apache’s 60-second timeout. You can edit your web.xml and add a heartbeatInterval parameter (the value is in seconds) like so:
        <context-param>
            <param-name>heartbeatInterval</param-name>
            <param-value>45</param-value>
        </context-param>
  3. You can tweak both the Vaadin heartbeat, and the Apache Timeout value to some reasonable values that suit your needs the best. Just make sure that the heartbeat interval is shorter than the timeout.

Enjoy!