Creating a rotating proxy in AWS using the Java SDK

AWS EC2 instances can be used to create an HTTP proxy server, so when a client browser using the proxy browses the internet, the AWS EC2 instance’s public IP address effectively becomes their IP address. This may be useful for anonymity, for example if you’re browsing the Internet from home but want to mask your IP address.

Furthermore, you can even have the IP address of your AWS EC2 instance change, by releasing and attaching a new AWS Elastic IP to it, thus “rotating” the public IP of the HTTP proxy. This way you can achieve even more anonymity by using an ever changing IP address.

This is a guide on how to use an AWS EC2 instance (particularly Linux) to create a rotating HTTP proxy. We’ll achieve this using the AWS Java SDK.

To get started, install tinyproxy on your EC2 instance. SSH into it, and run the following command:

sudo yum -y install tinyproxy --enablerepo=epel

Then edit /etc/tinyproxy/tinyproxy.conf. Note the port, which should be 8888 by default. Make sure the following options are set:

BindSame yes
Allow 0.0.0.0/0
#Listen 192.168.0.1 (make sure this is commented out, meaning line starts with #)
#Bind 192.168.0.1 (make sure this is commented out, meaning line starts with #)

Fire up the tinyproxy by running:

sudo service tinyproxy start

You may also want to add the same command (without the sudo) to /etc/rc.local so tinyproxy is started whenever the EC2 instance is restarted. The more conventional way to register a service for startup is chkconfig on SysV-style distributions (sudo chkconfig tinyproxy on) or systemctl enable on systemd-based ones, but adding the command to /etc/rc.local will certainly do the trick.

Now set your web browser (or your OS-level network settings) to use an HTTP proxy, pointing it at the public IP address of the EC2 instance and the tinyproxy port. If you don’t know the IP already, you can get it from the AWS EC2 web console, or by running the following command in a shell on the EC2 server:

wget http://ipinfo.io/ip -qO -

You can now go to Google and type in “What is my IP address”. Google will show you, and you’ll notice that it’s not your real IP, but the public IP of the EC2 instance you’re using as a proxy.
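
If you’d rather check the proxy from code than from a browser, here is a minimal sketch that uses only the JDK’s java.net classes. The class name ProxyCheck and its methods are made up for illustration, 203.0.113.10 is a placeholder address, and the sketch assumes tinyproxy is listening on port 8888 as configured above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of tinyproxy or the AWS SDK.
final class ProxyCheck {

    // Describes an HTTP proxy. No connection is opened here.
    static Proxy httpProxy(String ip, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port));
    }

    // Asks checkip.amazonaws.com, through the proxy, which IP the request came from.
    static String publicIpVia(Proxy proxy) throws IOException {
        URL url = new URL("http://checkip.amazonaws.com");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java ProxyCheck <proxy-ip> <proxy-port>");
            return;
        }
        Proxy proxy = httpProxy(args[0], Integer.parseInt(args[1]));
        System.out.println("Seen by the outside world as: " + publicIpVia(proxy));
    }
}
```

Run it with the EC2 instance’s public IP and 8888 as arguments. If the proxy is working, the address printed should be the instance’s IP rather than your own.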

Before we move on, let’s set up some security group settings for the EC2 instance to restrict access. This is necessary so not everyone on the Internet can use your proxy server. The best way to go about this is to use the AWS EC2 web console. Navigate to the security group of the EC2 instance, and note the “Group Name” of the security group (we’ll use that later). Add a custom inbound TCP rule to allow traffic from your IP address to port 8888 (or whatever you configured the proxy to run on).

Next, attach one or more additional network interfaces to your EC2 instance. These extra interfaces are what you’ll map Elastic IP addresses to. You don’t want to mess with the main network interface, because you need at least one static IP through which you can always connect to your EC2 instance. The other network interfaces will rotate their public IPs by having Elastic IPs attached and released. (AWS seems to have an endless pool of Elastic IPs: each time you release one and allocate a new one you get a new, seemingly random address, which works in our favor.)

To attach an Elastic Network Interface to your EC2 instance, check out this documentation: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html. Also note that the number of network interfaces you can attach depends on the type of EC2 instance, and the smaller types allow only a few in total (check the per-instance-type limits in that documentation before counting on a particular number). Lastly, once you create them, take note of the Elastic Network Interface IDs and their corresponding private IP addresses. We’ll use them in our Java code.

Now, below is a Java code segment that can be used to assign and rotate Elastic IPs to your EC2 instance, which then become the IPs used as proxy. Note at the top of the code there are a number of configuration parameters (static class level variables) that you’ll need to fill out. And of course you’ll need to have the AWS Java SDK in your classpath.

The method associateAll() will associate the Elastic Network Interfaces provided with new Elastic IPs. And the method releaseAll() will detach the Elastic IPs from the Elastic Network Interfaces and release them to the wild (and thus a subsequent associateAll() will then return new IPs). associateAll() will return an ArrayList of Strings corresponding to the new Elastic IPs attached to the EC2 instance. And these IPs can then be used as the HTTP proxy (tinyproxy will automatically bind itself to the proxy port (8888) on the new public IP addresses, so you can connect to them from your client/browser).

Also note that associateAll() will authorize the public IP of the machine running this code by adding it to the EC2 security group to allow connection to TCP port 8888 (or whatever you configured your HTTP proxy port to be) going into the EC2 instance.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashSet;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.Address;
import com.amazonaws.services.ec2.model.AllocateAddressRequest;
import com.amazonaws.services.ec2.model.AllocateAddressResult;
import com.amazonaws.services.ec2.model.AmazonEC2Exception;
import com.amazonaws.services.ec2.model.AssociateAddressRequest;
import com.amazonaws.services.ec2.model.AssociateAddressResult;
import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressResult;
import com.amazonaws.services.ec2.model.DescribeAddressesResult;
import com.amazonaws.services.ec2.model.DomainType;
import com.amazonaws.services.ec2.model.IpPermission;
import com.amazonaws.services.ec2.model.IpRange;
import com.amazonaws.services.ec2.model.ReleaseAddressRequest;
import com.amazonaws.services.ec2.model.ReleaseAddressResult;

public class AWSProxyUtil 
{
	static String SECURITY_GROUP = "security-group-name-of-your-ec2-instance";
	static int DEFAULT_PROXY_PORT_TO_ASSIGN = 8888;
	static String PUBLIC_IP_TO_IGNORE = "1.2.3.4"; 	//This is the IP you want to remain static,
							//so you can connect to your EC2 instance.

	@SuppressWarnings("serial")
	static HashSet<String> NETWORK_ID_PRIVATE_IPs_TO_ASSOCIATE_WITH = new HashSet<String>()
	{{
		//These are the network interface IDs and their private IPs
		//that will be used to attach Elastic IPs to. Format is <ID>:<IP>.
		add("eni-xxxxxxxx:1.2.3.4");
		add("eni-xxxxxxxx:1.2.3.4");
		add("eni-xxxxxxxx:1.2.3.4");
	}};
	
	public static String AWS_ACCESS_KEY_ID = "xxx"; //Your AWS API key info
	public static String AWS_SECRET_KEY_ID = "xxx";

	public static Regions AWS_REGIONS = Regions.US_WEST_2;

	public static void releaseAll() throws Exception
	{
		debugSOP("Releasing elastic IPs");
		
		BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_ACCESS_KEY_ID, AWS_SECRET_KEY_ID);
		final AmazonEC2 ec2 = 
				AmazonEC2ClientBuilder
					.standard()
					.withCredentials(new AWSStaticCredentialsProvider(awsCreds))
					.withRegion(AWS_REGIONS)
					.build(); 

		DescribeAddressesResult response = ec2.describeAddresses();

		for(Address address : response.getAddresses()) 
		{
			if(address.getPublicIp().equals(PUBLIC_IP_TO_IGNORE))
			{
				debugSOP(" * Keeping "+address.getPublicIp());
				continue;
			}
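			if(address.getAssociationId() != null)
			{
				//A VPC Elastic IP generally has to be disassociated before it can be released.
				debugSOP(" * Disassociating "+address.getPublicIp());
				ec2.disassociateAddress(new com.amazonaws.services.ec2.model.DisassociateAddressRequest().withAssociationId(address.getAssociationId()));
			}
			debugSOP(" * Releasing "+address.getPublicIp());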
			ReleaseAddressRequest releaseAddressRequest = new ReleaseAddressRequest().withAllocationId(address.getAllocationId());
			ReleaseAddressResult releaseAddressResult = ec2.releaseAddress(releaseAddressRequest);
			debugSOP("   * Result "+releaseAddressResult.toString());
		}
	}
	
	public static ArrayList<String> associateAll() throws Exception
	{
		ArrayList<String> result = new ArrayList<String>();
		
		debugSOP("Associating elastic IPs");
		
		BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_ACCESS_KEY_ID, AWS_SECRET_KEY_ID);
		final AmazonEC2 ec2 = 
				AmazonEC2ClientBuilder
					.standard()
					.withCredentials(new AWSStaticCredentialsProvider(awsCreds))
					.withRegion(AWS_REGIONS)
					.build(); 

		DescribeAddressesResult response = ec2.describeAddresses();

		HashSet<String> alreadyAssociated = new HashSet<String>();
		for(Address address : response.getAddresses()) 
		{
			if(address.getPublicIp().equals(PUBLIC_IP_TO_IGNORE))
			{
				continue;
			}
			debugSOP(" * Already associated - Private IP: "+address.getPrivateIpAddress()+", Public IP: "+address.getPublicIp());
			result.add(address.getPublicIp()+":"+DEFAULT_PROXY_PORT_TO_ASSIGN);
			alreadyAssociated.add(address.getNetworkInterfaceId()+":"+address.getPrivateIpAddress());
		}
		
		for(String networkIdPrivateId : NETWORK_ID_PRIVATE_IPs_TO_ASSOCIATE_WITH)
		{
			if(alreadyAssociated.contains(networkIdPrivateId))
				continue;
			
			String fields[] = networkIdPrivateId.split(":");
			String networkId = fields[0];
			String privateIp = fields[1];

			AllocateAddressRequest allocate_request = new AllocateAddressRequest()
				    .withDomain(DomainType.Vpc);

			AllocateAddressResult allocate_response =
			    ec2.allocateAddress(allocate_request);

			String publicIp = allocate_response.getPublicIp();
			String allocation_id = allocate_response.getAllocationId();

			debugSOP(" * Associating Public IP "+publicIp+" to "+networkIdPrivateId);

			AssociateAddressRequest associate_request =
			    new AssociateAddressRequest()
			    	.withNetworkInterfaceId(networkId)
			    	.withPrivateIpAddress(privateIp)
			        .withAllocationId(allocation_id);
			
			AssociateAddressResult associate_response =
				    ec2.associateAddress(associate_request);
			
			debugSOP("   * Result "+associate_response.toString());
			
			result.add(publicIp+":"+DEFAULT_PROXY_PORT_TO_ASSIGN);
		}
		
		debugSOP("Getting public IP address of this machine");
		URL awsCheckIpURL = new URL("http://checkip.amazonaws.com");
		HttpURLConnection awsCheckIphttpUrlConnection = (HttpURLConnection) awsCheckIpURL.openConnection();
		BufferedReader awsCheckIpReader = new BufferedReader(new InputStreamReader(awsCheckIphttpUrlConnection.getInputStream()));
		String thisMachinePublicIp = awsCheckIpReader.readLine();
		
		debugSOP("Authorizing public IP for this machine "+thisMachinePublicIp+" to security group "+SECURITY_GROUP+" for incoming tcp port "+DEFAULT_PROXY_PORT_TO_ASSIGN);
		IpRange ip_range = new IpRange()
			    .withCidrIp(thisMachinePublicIp+"/32");
		IpPermission ip_perm = new IpPermission()
		    .withIpProtocol("tcp")
		    .withToPort(DEFAULT_PROXY_PORT_TO_ASSIGN)
		    .withFromPort(DEFAULT_PROXY_PORT_TO_ASSIGN)
		    .withIpv4Ranges(ip_range);
		AuthorizeSecurityGroupIngressRequest auth_request = new
		    AuthorizeSecurityGroupIngressRequest()
		        .withGroupName(SECURITY_GROUP)
		        .withIpPermissions(ip_perm);
		try
		{
			AuthorizeSecurityGroupIngressResult auth_response =
			    ec2.authorizeSecurityGroupIngress(auth_request);
			debugSOP(" * Result "+auth_response.toString());
		}
		catch(AmazonEC2Exception e)
		{
			if(e.getMessage().contains("already exists"))
				debugSOP(" * Already associated");
			else
			{
				throw e;
			}
		}
		
		debugSOP("Sleeping for 120 seconds to allow EC2 instance(s) to get up to speed.");
		Thread.sleep(120000);

		return result;
	}

	public static void debugSOP(String str)
	{
		System.out.println("[AWSProxyUtil] "+str);
	}
}
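
Once associateAll() hands back its list of “ip:port” strings, you’ll probably want to cycle through them on the client side. Here is one way that could look, as a small sketch using only the JDK. ProxyRotator is a made-up name, and the only thing it assumes is the “ip:port” format produced by the code above.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration: hands out proxies round-robin.
final class ProxyRotator {
    private final List<Proxy> proxies = new ArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    // Each entry is expected to look like "203.0.113.10:8888".
    ProxyRotator(List<String> ipPortEntries) {
        if (ipPortEntries.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        for (String entry : ipPortEntries) {
            int colon = entry.lastIndexOf(':');
            String ip = entry.substring(0, colon);
            int port = Integer.parseInt(entry.substring(colon + 1));
            proxies.add(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port)));
        }
    }

    // Returns the next proxy in the list, wrapping around at the end.
    Proxy next() {
        int index = Math.floorMod(cursor.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }
}
```

Each call to next() can then be passed to URL.openConnection(Proxy), so consecutive requests leave through different Elastic IPs.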

An important note on cost! If you allocate and release Elastic IPs too many times, AWS starts charging you (I think the first couple hundred(?) are free, but after that they start charging and it can add up!). And there is also a cost for leaving an Elastic IP address allocated.

Dynamic DNS using AWS Route 53 and AWS Java SDK

Route 53 is the Amazon Web Services (AWS) DNS service. Assuming your domain’s DNS is hosted with Route 53, you can create a utility in Java, using the AWS Java SDK, to update a hostname under your domain that points to a dynamic IP address. This may be useful if for example your home’s public IP address changes often, and you want to be able to access it remotely.

To start off, you’ll need to create a hostname in AWS Route 53 that maps to an “A” record pointing to an IP address (doesn’t matter what IP address at this point, since we’ll update it through code later). This can be done manually online, and should be pretty self-explanatory once you open up the Route 53 control panel in the AWS web console.

Let’s say your domain name is domain.com. And you want to dynamically update two hosts: home.domain.com, and dynamic.domain.com, to point to the IP address of a machine that has a dynamically assigned IP.

For this, you can use the following code snippet, which I whipped up using the AWS Java SDK documentation for Route 53 and lots of trial and error:

package utils;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.logging.Logger;

import org.xbill.DNS.ARecord;
import org.xbill.DNS.Lookup;
import org.xbill.DNS.Record;
import org.xbill.DNS.Resolver;
import org.xbill.DNS.SimpleResolver;
import org.xbill.DNS.Type;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.route53.AmazonRoute53;
import com.amazonaws.services.route53.AmazonRoute53ClientBuilder;
import com.amazonaws.services.route53.model.Change;
import com.amazonaws.services.route53.model.ChangeAction;
import com.amazonaws.services.route53.model.ChangeBatch;
import com.amazonaws.services.route53.model.ChangeResourceRecordSetsRequest;
import com.amazonaws.services.route53.model.GetHostedZoneRequest;
import com.amazonaws.services.route53.model.HostedZone;
import com.amazonaws.services.route53.model.ListResourceRecordSetsRequest;
import com.amazonaws.services.route53.model.ListResourceRecordSetsResult;
import com.amazonaws.services.route53.model.ResourceRecord;
import com.amazonaws.services.route53.model.ResourceRecordSet;

public class DynamicDNSUpdater {
	static String AWS_ACCESS_KEY_ID = "xxx";
	static String AWS_SECRET_KEY_ID = "xxx";
	static String ROUT53_HOSTED_ZONE_ID = "Zxxxxxxxxxxxxx";
	static String[] HOSTNAMES_TO_UPDATE = { "home.domain.com", "dynamic.domain.com" };

	static void UpdateIP() throws Exception
	{
		Logger log = ...;

		HashSet<String> hostnamesNeedingUpdate = new HashSet<String>();

		URL awsCheckIpURL = new URL("http://checkip.amazonaws.com");
		HttpURLConnection awsCheckIphttpUrlConnection = (HttpURLConnection) awsCheckIpURL.openConnection();
		BufferedReader awsCheckIpReader = new BufferedReader(new InputStreamReader(awsCheckIphttpUrlConnection.getInputStream()));
		String thisMachinePublicIp = awsCheckIpReader.readLine();
		log.fine("Current public IP of this machine: "+thisMachinePublicIp);
		
	    Resolver resolver = new SimpleResolver("8.8.8.8");
		for(String hostname : HOSTNAMES_TO_UPDATE)
		{
		    Lookup lookup = new Lookup(hostname, Type.A);
		    lookup.setResolver(resolver);
		    Record[] records = lookup.run();
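		    if(records == null || records.length == 0)
		    {
		    	//Lookup failed or returned nothing; skip rather than hit a NullPointerException
		    	log.fine("!!! Could not resolve "+hostname+", skipping");
		    	continue;
		    }
		    String address = ((ARecord) records[0]).getAddress().getHostAddress();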
			if(!address.equals(thisMachinePublicIp))
			{
				log.fine("!!! Needs update: "+hostname+". Current IP: "+address+". New public IP: "+thisMachinePublicIp);
				hostnamesNeedingUpdate.add(hostname+".");
			}
		}

		if(hostnamesNeedingUpdate.size()>0)
		{
			BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_ACCESS_KEY_ID, AWS_SECRET_KEY_ID);
			AmazonRoute53 route53 = AmazonRoute53ClientBuilder
					.standard()
					.withCredentials(new AWSStaticCredentialsProvider(awsCreds))
					.withRegion("us-east-1") //Route 53 is a global service; us-east-1 is used here as the signing region
					.build(); 
		    HostedZone hostedZone = route53.getHostedZone(new GetHostedZoneRequest(ROUT53_HOSTED_ZONE_ID)).getHostedZone();

		    ListResourceRecordSetsRequest listResourceRecordSetsRequest = new ListResourceRecordSetsRequest()
		            .withHostedZoneId(hostedZone.getId());
		    ListResourceRecordSetsResult listResourceRecordSetsResult = route53.listResourceRecordSets(listResourceRecordSetsRequest);
		    List<ResourceRecordSet>	resourceRecordSetList = listResourceRecordSetsResult.getResourceRecordSets();
	    	List<Change> changes = new ArrayList<Change>();
		    for(ResourceRecordSet resourceRecordSet : resourceRecordSetList)
		    {
		    	if(resourceRecordSet.getType().equals("A") && hostnamesNeedingUpdate.contains(resourceRecordSet.getName()))
		    	{
			    	List<ResourceRecord> resourceRecords = new ArrayList<ResourceRecord>();
			    	ResourceRecord resourceRecord = new ResourceRecord();
			    	resourceRecord.setValue(thisMachinePublicIp);
			    	resourceRecords.add(resourceRecord);
			    	resourceRecordSet.setResourceRecords(resourceRecords);
			    	Change change = new Change(ChangeAction.UPSERT, resourceRecordSet);
			    	changes.add(change);
			    	log.fine("Updating "+resourceRecordSet.getName()+" to A "+thisMachinePublicIp);
		    	}
		    }
		    if(changes.size()>0)
		    {
		    	ChangeBatch changeBatch = new ChangeBatch(changes);
		    	ChangeResourceRecordSetsRequest changeResourceRecordSetsRequest = new ChangeResourceRecordSetsRequest()
		    			.withHostedZoneId(ROUT53_HOSTED_ZONE_ID)
		    			.withChangeBatch(changeBatch);
		    	route53.changeResourceRecordSets(changeResourceRecordSetsRequest);
		    	log.fine("Done!");
		    }
		    else
		    {
		    	log.fine("None of the specified hostnames found in this zone");
		    }
		}
		else
			log.fine("No updates required!");
	}

	public static void main(String args[]) throws Exception {
		UpdateIP();
	}
}

In order for this to work correctly, you’ll need to set up an AWS API key. This key will need either full access to your AWS account, or at least access to Route53. The documentation for setting it up is available at AWS.

You’ll need to update the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY_ID in the code block above with the key details you get from AWS. And then you’ll need to update ROUT53_HOSTED_ZONE_ID with the Zone ID of your domain hosted in Route 53 (it begins with Z, at least as far as I’ve noticed). And, of course, you’ll need to update HOSTNAMES_TO_UPDATE with the hostname(s) that need to be dynamically updated with the public IP of the machine running this utility.

Here’s a quick breakdown of the code: We start by getting the public IP of the machine this code is running on, and then we look up the IP of the hostnames provided. If these don’t match, that means an update with the new IP is needed. That’s when the com.amazonaws.services.route53.AmazonRoute53 class is used to do the following: using the AWS API access key, it gets a list of all the “A” records for the hosted zone provided. It then loops through the hostnames needing update, and simply posts a com.amazonaws.services.route53.AmazonRoute53.changeResourceRecordSets() with the new public IP of the machine.
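
The comparison step in that breakdown is easy to pull out and reason about on its own. The sketch below is only an illustration: DnsDiff and its method names are made up, and it works on plain strings instead of doing real DNS lookups. It also shows why the code above appends a “.” to each hostname: Route 53 reports record names as fully qualified names with a trailing dot.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; no DNS or AWS calls are made here.
final class DnsDiff {

    // resolvedIps maps hostname -> the IP it currently resolves to (null if the lookup failed).
    // Returns the names, with trailing dot, whose record differs from the machine's current IP.
    static Set<String> hostnamesNeedingUpdate(Map<String, String> resolvedIps, String currentPublicIp) {
        Set<String> stale = new LinkedHashSet<>();
        for (Map.Entry<String, String> entry : resolvedIps.entrySet()) {
            // A null (failed) lookup never equals the current IP, so it counts as stale.
            if (!currentPublicIp.equals(entry.getValue())) {
                stale.add(toFqdn(entry.getKey()));
            }
        }
        return stale;
    }

    // Adds the trailing dot unless it is already there.
    static String toFqdn(String hostname) {
        return hostname.endsWith(".") ? hostname : hostname + ".";
    }

    public static void main(String[] args) {
        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put("home.domain.com", "198.51.100.7");
        resolved.put("dynamic.domain.com", "203.0.113.5");
        System.out.println(hostnamesNeedingUpdate(resolved, "203.0.113.5"));
    }
}
```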

And that’s it! There you have it–a Java util that will dynamically update the IP address for the machine it’s running on.

Now in order to run this utility periodically (so it can actually do what it’s meant to, without you manually running it), you can compile the Java code and stick it in a jar, or simply copy the .class files into a directory somewhere. (Note: if you’re using Eclipse, it makes it easy to export your project as an executable jar).

Then, if you’re on Linux, you can set up a crontab entry that runs this Java utility from the command line every 5 minutes or so. Provided Java is installed and available in the system path, the command would look something like: java -cp /path/to/MyUtils.jar utils.DynamicDNSUpdater. And if you’re on Windows, you can set up a task with the Windows Task Scheduler to run the same command every 5 minutes. Pro tip: on Windows, you may want to use “javaw” instead of “java” if you don’t want a little window to pop up and disappear periodically while you’re in the middle of working on the same machine.
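
If you’d rather not depend on cron or the Task Scheduler, an alternative (not what this post uses) is to keep one JVM running and let it schedule the update itself. The sketch below uses the JDK’s ScheduledExecutorService. PeriodicRunner is a made-up name, and the task you pass in is assumed to be whatever wraps the call to UpdateIP().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration: runs a task repeatedly inside one JVM.
final class PeriodicRunner {

    // Runs the task immediately, then again 'period' after each run finishes.
    static ScheduledExecutorService start(Runnable task, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An exception escaping here would cancel all future runs, so log and carry on.
                System.err.println("Run failed, will try again next cycle: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }

    // Usage idea (assumes this class sits in the same package as DynamicDNSUpdater):
    //   PeriodicRunner.start(() -> {
    //       try { DynamicDNSUpdater.UpdateIP(); }
    //       catch (Exception e) { throw new RuntimeException(e); }
    //   }, 5, TimeUnit.MINUTES);
}
```

The scheduler thread is not a daemon thread, so the JVM stays alive until you call shutdown() on the returned scheduler.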

AmazonS3Client to loop through batches of S3 file objects

AWS provides the AmazonS3Client class, which is part of the AWS Java SDK. This class can be used to interact with files in S3.

An important feature to note of the AmazonS3Client is that it limits results to batches of 1000. If you have 1000 files or fewer, then all is good. You can use amazonS3Client.listObjects(bucketName); and the listing it returns will contain all the objects in the bucket.

But if the bucket contains more than 1000 files, you will need to loop through the files in batches. This is not entirely obvious and can cause you to miss files (as I certainly did)!

To get started, you would initiate AmazonS3Client like so:

AmazonS3Client amazonS3Client = new AmazonS3Client(new BasicAWSCredentials(KEY, SECRET));

The approach I like to take is to first loop through and collect all the files up front like so:

ObjectListing objectListing = amazonS3Client.listObjects(bucketName);
List<S3ObjectSummary> s3ObjectSummaries = objectListing.getObjectSummaries();
while (objectListing.isTruncated()) 
{
   objectListing = amazonS3Client.listNextBatchOfObjects (objectListing);
   s3ObjectSummaries.addAll (objectListing.getObjectSummaries());
}

Note: if memory is a concern or you have a very large number of files, you can simply modify the approach to do whatever you need to with each file as you fetch it in batches from the API, instead of collecting them up front.
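
To make the “process each batch as you fetch it” variation concrete, here is a sketch of the loop written as a small generic helper, so it can be shown (and tested) without the AWS SDK on the classpath. Batches and forEachItem are made-up names. With the SDK, the page type would be ObjectListing and the item type S3ObjectSummary.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration; it knows nothing about S3 itself.
final class Batches {

    // Visits every item of every page without ever holding more than one page in memory.
    // Returns the number of items visited.
    static <P, T> int forEachItem(P firstPage,
                                  Function<P, List<T>> itemsOf,
                                  Predicate<P> isTruncated,
                                  UnaryOperator<P> nextPage,
                                  Consumer<T> action) {
        int count = 0;
        P page = firstPage;
        while (true) {
            for (T item : itemsOf.apply(page)) {
                action.accept(item);
                count++;
            }
            if (!isTruncated.test(page)) {
                return count; // last page reached
            }
            page = nextPage.apply(page);
        }
    }

    // With the AWS SDK, the call should look roughly like this:
    //   Batches.forEachItem(
    //       amazonS3Client.listObjects(bucketName),
    //       ObjectListing::getObjectSummaries,
    //       ObjectListing::isTruncated,
    //       amazonS3Client::listNextBatchOfObjects,
    //       summary -> System.out.println(summary.getKey()));
}
```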

If you first collected them in a List up front, you can then loop through each file like so:

for(S3ObjectSummary s3ObjectSummary : s3ObjectSummaries)
{
	String s3ObjectKey = s3ObjectSummary.getKey();
	//Do whatever with s3ObjectSummary
}

Installing pandas, scipy, numpy, and scikit-learn on AWS EC2

Most of the development/experimentation I was doing with scikit-learn’s machine learning algorithms was on my local development machine. But eventually I needed to do some heavy duty model training / cross validation, which would take weeks on my local machine. So I decided to make use of one of the cheaper compute optimized EC2 instances that AWS offers.

Unfortunately I had some trouble getting scikit-learn to install on a stock Amazon EC2 Linux instance, but I figured it out eventually. I’m sure others will run into this, so I thought I’d write about it.

Note: you can of course get an EC2 community image or an image from the EC2 marketplace that already has Anaconda or scikit-learn and tools installed. This guide is for installing it on a stock Amazon EC2 Linux instance, in case you already have an instance setup you want to use.

In order to get scikit-learn to work, you’ll need to have pandas, scipy and numpy installed too. Fortunately Amazon EC2 Linux comes with python 2.7 already installed, so you don’t need to worry about that.

Start by ssh’ing into your box. Drop into a root shell with the following command (if you’re going to be typing “sudo” before every single command, might as well be root by default anyway, right?)

sudo su

First you need to install some development tools, since you will literally be compiling some libraries in a bit. Run the following commands:

yum groupinstall 'Development Tools'
yum install python-devel

Next you’ll install the ATLAS and LAPACK libraries, which are needed by numpy and scipy:

yum install atlas-sse3-devel lapack-devel

Now you’re ready to install the necessary Python libraries first, and finally scikit-learn:

pip install numpy
pip install scipy
pip install pandas
pip install scikit-learn

Congratulations. You now have scikit-learn installed on the EC2 Linux box!

Amazon EC2 ssh timeout due to inactivity

Well, this applies to any Linux instance that you may be remotely connected to, depending on how sshd is configured on the remote server. And depending on how your localhost (developer machine) ssh config is done. But essentially in some instances the sshd host you’re connecting to times you out pretty quickly, so you have to reconnect often.

This was bothering me for a while. I usually am off and on all day on Linux shell on EC2 instances. And it seemed every time I come back to it, I’d be timed out, causing me to have to reconnect. Not a huge deal, just a nuisance.

To remedy this without changing the remote server’s sshd config, edit the ~/.ssh/config file on your local machine and add the following line:

ServerAliveInterval 50

And it’s as simple as that! It seems that AWS EC2s are set up to time you out at 60 seconds. So a 50 second keep-alive interval prevents you from getting timed out so aggressively.

Installing MongoDB on AWS EC2 and turning on zlib compression

At this time AWS doesn’t provide an RDS type for MongoDB. So in order to have a MongoDB server on the AWS cloud, you have to install it manually on an EC2 instance.

The full documentation for installing a MongoDB instance on an AWS EC2 can be seen at: https://docs.mongodb.com/v3.0/tutorial/install-mongodb-on-amazon/. Here’s a quick summary though.

First you’ll need to create a Linux EC2 server. Once you have the server created, log in to the machine through secure shell. Drop into root shell using the following command:

sudo su

Next you’ll need to create the repository info for yum to use to download the prebuilt MongoDB packages. You’ll create a file at /etc/yum.repos.d/mongodb-org-3.0.repo:

vi /etc/yum.repos.d/mongodb-org-3.0.repo

And copy/paste the repository:

[mongodb-org-3.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1

Save and exit from vi. And type in the following command to install:

yum install -y mongodb-org

And that’s it! Now you have MongoDB installed on your EC2.

Next, to turn on compression, you’ll need to edit /etc/mongod.conf

vi /etc/mongod.conf

Scroll down to the “storage” directive, and add this configuration beneath it, indented one level so that it nests under “storage:” alongside the settings already there:

  engine: "wiredTiger"
  wiredTiger:
    collectionConfig:
      blockCompressor: "zlib"

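Now any collections you create will be compressed with zlib, which gives the highest compression ratio of the block compressors available in this MongoDB version (the default, snappy, trades some compression for speed).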

Turn on your MongoDB instance by typing in this command:

service mongod start

And of course you’ll want to custom configure your MongoDB instance (or not). You can find several guides and tutorials to do that online.

Running AWS CLI commands from crontab

This is a short post to explain how to run AWS CLI commands from a crontab.

First you’ll need to install and set up the AWS CLI. More information here: http://docs.aws.amazon.com/cli/

Once you’ve set up AWS CLI, you’ll notice that there is a “.aws” folder created in the HOME folder for the user you’re logged in as. If it’s root, it would be “/root/.aws”.

The problem with running AWS CLI commands from crontab is that crontab sets HOME to “/”, so the “aws” command will not find “~/.aws”.

In order to get around this, you simply need to set HOME="/root" (or whatever the HOME is for the user AWS CLI was set up under). This can be done in the shell script that is being called by crontab, or if the aws command is directly in crontab, the crontab command could be something like the following:

HOME="/root" && aws cli

And that’s it!

Setting up AWS CLI and dumping a S3 bucket

AWS CLI (command line interface) is very useful when you want to automate certain tasks. This post is about dumping a whole S3 bucket from the command line. This could be for any purpose, such as creating a backup.

First of all, if you don’t already have it installed, you’ll need to download and install the AWS CLI. More information here: http://docs.aws.amazon.com/cli/latest/userguide/installing.html

To configure AWS CLI, type the command:

aws configure

It will ask for credentials: the Access Key ID, and the Secret Access Key. More information on how to set up a key is here: http://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html

And that’s it! You now have the power of manipulating your AWS environment from your command line.

In order to dump a bucket, you’ll need to first make sure that the account belonging to the AWS Key you generated has read access to the bucket. More on setting up permissions in S3 here: http://docs.aws.amazon.com/AmazonS3/latest/dev/s3-access-control.html

To dump the whole contents of an S3 bucket, you can use the following command:

aws s3 cp --quiet --recursive s3://<bucket-name>/ <local-directory>

This will copy the entire contents of the bucket to your local directory. As easy as that!

Creating a simple ping servlet for AWS Elastic Load Balancer (ELB) Health Check

If you use the AWS Elastic Load Balancer (ELB) you’ll need to decide what to use as an endpoint on your application server for the health checker. This is how the ELB will determine if an instance is healthy, or not.

If you use the default “/” path, this may mean that a session in your application is kicked off every time the health checker connects, which could translate to unnecessary added load on your server (though perhaps it may be negligible).

Furthermore, if you create a static .html file and map it, and point the health checker to that, it could turn out that though your application server is hung, the simple static .html is still getting served. This would not make for an accurate health check, and happened to me in my experience.

The best way to ensure that your application server is online and not hung, without adding extra load to your server, will be to create a simple program that runs on the application server. In the case of a Java application server, you can create a simple servlet, as follows:

package com.whatever.aws.elb;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@SuppressWarnings("serial")
public class Ping extends HttpServlet {
 private String message;

 public void init() throws ServletException {
  message = "Pong";
 }

 public void doGet(HttpServletRequest request,
  HttpServletResponse response)
 throws ServletException, IOException {
  response.setContentType("text/html");

  PrintWriter out = response.getWriter();
  out.println("<h1>" + message + "</h1>");
 }
}

And map the servlet to a path in web.xml:

<servlet>
  <servlet-name>Ping</servlet-name>
  <servlet-class>com.whatever.aws.elb.Ping</servlet-class>
  <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
  <servlet-name>Ping</servlet-name>
  <url-pattern>/Ping</url-pattern>
  <url-pattern>/ping</url-pattern>
</servlet-mapping>

Now, you can configure the ELB health checker to connect to the “/ping” path on your instance. If it times out, or returns an error, that means the application server is not healthy. If it returns HTTP 200 within the timeout, then all is good.
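
Before pointing the ELB at the new endpoint, it can be handy to probe it yourself in roughly the way a health checker would. The sketch below is just that, an approximation using the JDK’s HttpURLConnection. HealthProbe is a made-up name, and treating only HTTP 200 as healthy is an assumption modeled on the usual health check default.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration; it only mimics what a health checker does.
final class HealthProbe {

    // Returns true only if the http(s) URL answers with HTTP 200 within the timeout.
    static boolean isHealthy(String pingUrl, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(pingUrl).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // Malformed URL, connection refused, timeout, or any other I/O failure.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java HealthProbe <url-of-ping-servlet>");
            return;
        }
        System.out.println(isHealthy(args[0], 5_000) ? "healthy" : "unhealthy");
    }
}
```

A hung application server shows up here as a read timeout, which is exactly the case the static .html approach failed to catch.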

Encrypting already existing files in AWS S3 using the AWS Java API

In my last post I covered how to server-side encrypt files in S3 using the AWS Java API. Unfortunately, if you didn’t turn on encryption from the very first day when uploading to S3, you may have some files that are not encrypted. This post will cover an easy block of Java code which you can use to server-side encrypt any existing files that aren’t already, using the AWS Java API.

In summary, you need to loop through all existing files in a bucket and see which ones are not encrypted. For each unencrypted file, you set the metadata to turn on server-side encryption and save the file again in S3. Note: this may change the timestamps on your files, but this is essentially the only way through the API to save the metadata for a file to turn on encryption.

Here is the code:

public S3EncryptionMigrator(String bucketName) throws IOException { //S3Object.close() below can throw IOException
 Logger.getLogger("com.amazonaws.http.AmazonHttpClient").setLevel(Level.OFF); //AWS API outputs too much information, totally flooding the console. Turn it off

 AmazonS3Client amazonS3Client = new AmazonS3Client(...);

 ObjectListing objectListing = amazonS3Client.listObjects(bucketName);
 List<S3ObjectSummary> s3ObjectSummaries = objectListing.getObjectSummaries();
 while (objectListing.isTruncated()) {
  objectListing = amazonS3Client.listNextBatchOfObjects(objectListing);
  s3ObjectSummaries.addAll(objectListing.getObjectSummaries());
 }

 for (S3ObjectSummary s3ObjectSummary: s3ObjectSummaries) {
  String s3ObjectKey = s3ObjectSummary.getKey();
  S3Object unecryptedS3Object = amazonS3Client.getObject(bucketName, s3ObjectKey);
  ObjectMetadata meta = unecryptedS3Object.getObjectMetadata();
  String currentSSEAlgorithm = meta.getSSEAlgorithm();
  unecryptedS3Object.close();
  if (currentSSEAlgorithm != null && currentSSEAlgorithm.equals(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION))
   continue; //Already encrypted, skip
  meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); //set encryption
  CopyObjectRequest copyObjectRequest = new CopyObjectRequest(bucketName, s3ObjectKey, bucketName, s3ObjectKey);
  copyObjectRequest.setNewObjectMetadata(meta);
  amazonS3Client.copyObject(copyObjectRequest); //Save the file
  System.out.println(">> '" + s3ObjectKey + "' encrypted.");
 }
}

Let’s examine the code. First you instantiate AmazonS3Client with the correct credentials; this should be tailored to your S3 authentication setup. You start by getting a list of all files in a bucket. Note that you have to keep calling listNextBatchOfObjects() while objectListing.isTruncated() is true, because only 1000 results are returned at a time. If you have more than 1000 files, you’ll need to loop through the rest until you get all of them.

Then you loop through the list of files. For each file you check if server-side encryption is already turned on by reading the existing metadata of the file. If not, you set the flag for encryption, and then essentially copy the file onto itself. This will save the new metadata, and will turn on server-side encryption.
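
One thing worth noticing in the check above: it re-copies any file whose algorithm is anything other than AES256. If some of your files were encrypted with KMS-managed keys, the metadata reports “aws:kms” for them, and they would be rewritten with AES256 instead. If that’s not what you want, a stricter test might look like the sketch below. SseCheck is a made-up name, and it works on the plain string returned by getSSEAlgorithm(), so it doesn’t need the SDK.

```java
// Hypothetical helper for illustration: decides whether an object still needs encrypting.
final class SseCheck {

    // currentSseAlgorithm is the value of ObjectMetadata.getSSEAlgorithm():
    // null when the object is unencrypted, otherwise a name such as "AES256" or "aws:kms".
    static boolean needsEncryption(String currentSseAlgorithm) {
        return currentSseAlgorithm == null || currentSseAlgorithm.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("unencrypted -> " + needsEncryption(null));
        System.out.println("AES256      -> " + needsEncryption("AES256"));
        System.out.println("aws:kms     -> " + needsEncryption("aws:kms"));
    }
}
```

With this in place, the skip condition in the loop would become if (!SseCheck.needsEncryption(currentSSEAlgorithm)) continue;.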