Using AmazonS3Client to loop through batches of S3 objects

AWS provides the AmazonS3Client class, which is part of the AWS Java SDK. This class can be used to interact with files in S3.

An important thing to note about AmazonS3Client is that it returns listing results in batches of at most 1000. If you have 1000 files or fewer, then all is good. You can use amazonS3Client.listObjects(bucketName); and it will provide all the objects in a bucket.

But if the bucket contains more than 1000 files, you will need to loop through the files in batches. This is not entirely obvious and can cause you to miss files (as I certainly did)!

To get started, you would instantiate AmazonS3Client like so:

AmazonS3Client amazonS3Client = new AmazonS3Client(new BasicAWSCredentials(KEY, SECRET));

The approach I like to take is to first loop through and collect all the files up front like so:

ObjectListing objectListing = amazonS3Client.listObjects(bucketName);
//Copy into our own list so we never modify the list that belongs to the SDK response
List<S3ObjectSummary> s3ObjectSummaries = new ArrayList<>(objectListing.getObjectSummaries());
while (objectListing.isTruncated())
{
   objectListing = amazonS3Client.listNextBatchOfObjects(objectListing);
   s3ObjectSummaries.addAll(objectListing.getObjectSummaries());
}

Note: if memory is a concern or you have a very large number of files, you can simply modify the approach to do whatever you need to with each file as you fetch it in batches from the API, instead of collecting them up front.
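Here is a minimal sketch of that batch-at-a-time variant. So that it runs without the AWS SDK, it uses hypothetical stand-ins: the Page class plays the role of ObjectListing, and the fetchPage function plays the role of listObjects() / listNextBatchOfObjects(). Neither name comes from the SDK, and fakeBucket() is only an in-memory demo source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for ObjectListing: one batch of keys plus a marker for the next batch.
final class Page {
    final List<String> keys;
    final String nextMarker; // null when the listing is not truncated

    Page(List<String> keys, String nextMarker) {
        this.keys = keys;
        this.nextMarker = nextMarker;
    }

    boolean isTruncated() {
        return nextMarker != null;
    }
}

final class BatchWalker {
    // Handles every key as soon as its batch arrives, so only one batch is held at a time.
    // fetchPage is a hypothetical stand-in for listObjects / listNextBatchOfObjects:
    // it receives the marker of the previous batch (null for the first call).
    static int forEachKey(Function<String, Page> fetchPage, Consumer<String> handler) {
        int processed = 0;
        Page page = fetchPage.apply(null);
        while (true) {
            for (String key : page.keys) {
                handler.accept(key);
                processed++;
            }
            if (!page.isTruncated()) {
                return processed;
            }
            page = fetchPage.apply(page.nextMarker);
        }
    }

    // In-memory fake bucket that serves keys in batches of batchSize, for demonstration only.
    static Function<String, Page> fakeBucket(List<String> allKeys, int batchSize) {
        return marker -> {
            int start = (marker == null) ? 0 : Integer.parseInt(marker);
            int end = Math.min(start + batchSize, allKeys.size());
            String next = (end < allKeys.size()) ? String.valueOf(end) : null;
            return new Page(new ArrayList<>(allKeys.subList(start, end)), next);
        };
    }
}
```

With the real client, the body of the inner for loop is where you would do your per-file work, and the while loop would be driven by objectListing.isTruncated().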

If you first collected them in a List up front, you can then loop through each file like so:

for (S3ObjectSummary s3ObjectSummary : s3ObjectSummaries)
{
	String s3ObjectKey = s3ObjectSummary.getKey();
	//Do whatever with s3ObjectSummary
}

Creating Java “Helper” class to interact with Egnyte RESTful API

Egnyte is a HIPAA compliant secure cloud file service. If you’re using Egnyte, and have Java applications that need to access your files on Egnyte, you can create a “Helper” class to interact with the RESTful API Egnyte provides. You could create methods such as “downloadFile()”, “uploadFile()”, “listFileOrFolder()” to use anywhere in your Java code.

As of when this blog post was written, Egnyte does not provide a native Java API. But it does provide a pretty rich RESTful API using web services. The documentation for this API is here: https://developers.egnyte.com/docs

The API requires an authentication token sent in with the HTTP headers. So first you’ll need to obtain this token using your Egnyte account username and password. More details on obtaining this token are in the documentation referenced above.

Once you have your authorization token, you can get started with making RESTful calls.

Let’s define two variables in a Constants class, called EGNYTEAPI_BASE_URL and EGNYTEAPI_AUTH_TOKEN. The base URL should be “https://(your-domain).egnyte.com/pubapi/v1/”, and the auth token should be the simple alphanumeric string you obtained from Egnyte. You’ll need to use these in all the methods that interact with the Egnyte RESTful API.

Here’s a snippet of what a downloadFile() method would look like:

public File downloadFile(String fullPathWithSpaces) throws Exception {
 File file = null;

 URL url = UriBuilder.fromPath(Constants.EGNYTEAPI_BASE_URL + "fs-content/" + fullPathWithSpaces).build().toURL();
 URLConnection urlConnection = url.openConnection();
 HttpURLConnection httpUrlConnection = (HttpURLConnection) urlConnection;
 httpUrlConnection.setRequestMethod("GET");
 httpUrlConnection.setRequestProperty("Authorization", "Bearer " + Constants.EGNYTEAPI_AUTH_TOKEN);

 httpUrlConnection.connect();

 if (httpUrlConnection.getResponseCode() == HttpURLConnection.HTTP_OK || httpUrlConnection.getResponseCode() == HttpURLConnection.HTTP_CREATED) {
  String fileName = fullPathWithSpaces.substring(fullPathWithSpaces.lastIndexOf('/') + 1);
  File tempDir = new File(System.getProperty("java.io.tmpdir") + "/" + Long.toString(System.nanoTime()));
  tempDir.mkdir();
  file = new File(tempDir.getAbsolutePath() + "/" + fileName);
  InputStream is = httpUrlConnection.getInputStream();
  Files.copy(is, file.toPath());
 }

 httpUrlConnection.disconnect();

 return file;
}

As simple as that! The method takes a full path to a file, and returns a Java File object, saved in a new folder under the system temp directory. If the file is not found, or the API responds with any other error status, the method simply returns null. Note that the method will handle any URL-sensitive characters by percent-encoding them (https://en.wikipedia.org/wiki/Percent-encoding), so you won’t need to worry about spaces or other characters in the file name.
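To see what percent-encoding does to a path, here is a small sketch that uses only the JDK. Note that it swaps in java.net.URI instead of the UriBuilder used in the method above, purely so it runs without extra libraries. The class name EgnytePathEncoding and the domain example.egnyte.com are placeholders for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class EgnytePathEncoding {
    // Builds a request URL whose path is percent-encoded.
    // The multi-argument URI constructor quotes characters that are illegal in a path, such as spaces.
    static String encode(String host, String basePath, String filePath) {
        try {
            URI uri = new URI("https", host, basePath + filePath, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI for " + filePath, e);
        }
    }

    public static void main(String[] args) {
        // "example.egnyte.com" is a placeholder domain.
        System.out.println(encode("example.egnyte.com", "/pubapi/v1/fs-content/", "Shared/My Report.pdf"));
        // prints https://example.egnyte.com/pubapi/v1/fs-content/Shared/My%20Report.pdf
    }
}
```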

Another useful method is one that returns the listing of a file or folder on your Egnyte file server account:

public JSONObject listFileOrFolder(String fullFileOrFolderPathWithSpaces) throws Exception {
 JSONObject listing;

 URL url = UriBuilder.fromPath(Constants.EGNYTEAPI_BASE_URL + "fs/" + fullFileOrFolderPathWithSpaces).build().toURL();
 URLConnection urlConnection = url.openConnection();
 HttpURLConnection httpUrlConnection = (HttpURLConnection) urlConnection;
 httpUrlConnection.setRequestMethod("GET");
 httpUrlConnection.setRequestProperty("Authorization", "Bearer " + Constants.EGNYTEAPI_AUTH_TOKEN);

 httpUrlConnection.connect();

 if (httpUrlConnection.getResponseCode() == HttpURLConnection.HTTP_OK || httpUrlConnection.getResponseCode() == HttpURLConnection.HTTP_CREATED) {
  Scanner scanner = new Scanner(httpUrlConnection.getInputStream());
  scanner.useDelimiter("\\A");
  String returnJsonString = (scanner.hasNext() ? scanner.next() : "{}");
  scanner.close();
  listing = new JSONObject(returnJsonString);
 } else
  listing = new JSONObject();

 httpUrlConnection.disconnect();

 return listing;
}

This method will also take a full path to a folder or file (can include URL-sensitive characters, such as spaces). It will return an empty JSONObject if the API responded with an error status or if the file or folder was not found; otherwise it will return a JSONObject of the listing returned by Egnyte.
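The Scanner lines in the method above are a compact way to read the whole response body into one String: the delimiter "\\A" matches only the start of the input, so the first token is everything. A minimal JDK-only sketch of that trick follows. The class name StreamText is made up for illustration, and the explicit UTF-8 charset is my assumption (the method above uses the platform default).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class StreamText {
    // Reads the entire stream into a String; returns "" when the stream is empty.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\A"); // "\\A" = beginning of input, so there is only one token
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream("{\"is_folder\":true}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(body));
    }
}
```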

The upload file method is a bit more complex, and took me a while to get right. Here it is:

public boolean uploadFile(String fullPathWithSpaces, File file) throws Exception {
 boolean returnValue = false;

 URL url = UriBuilder.fromPath(Constants.EGNYTEAPI_BASE_URL + "fs-content/" + fullPathWithSpaces).build().toURL();
 URLConnection urlConnection = url.openConnection();
 HttpURLConnection httpUrlConnection = (HttpURLConnection) urlConnection;
 httpUrlConnection.setRequestMethod("POST");
 httpUrlConnection.setDoOutput(true);
 httpUrlConnection.setRequestProperty("Authorization", "Bearer " + Constants.EGNYTEAPI_AUTH_TOKEN);

 String crlf = "\r\n", twoHyphens = "--", boundary = UUID.randomUUID().toString();

 httpUrlConnection.setRequestProperty("Content-Type", "multipart/form-data;boundary=" + boundary);

 DataOutputStream request = new DataOutputStream(httpUrlConnection.getOutputStream());

 request.writeBytes(twoHyphens + boundary + crlf);
 request.writeBytes("Content-Disposition: form-data; name=\"file\";filename=\"" + file.getName() + "\"" + crlf);
 request.writeBytes(crlf);

 //Stream the file in fixed-size chunks, writing only the bytes that were actually read
 try (InputStream fileInputStream = new FileInputStream(file)) {
  byte[] buffer = new byte[1024 * 1024];
  int bytesRead;
  while ((bytesRead = fileInputStream.read(buffer)) != -1) {
   request.write(buffer, 0, bytesRead);
  }
 }

 request.writeBytes(crlf);
 request.writeBytes(twoHyphens + boundary + twoHyphens + crlf);

 request.flush();
 request.close();

 if (httpUrlConnection.getResponseCode() == HttpURLConnection.HTTP_OK || httpUrlConnection.getResponseCode() == HttpURLConnection.HTTP_CREATED)
  returnValue = true;

 httpUrlConnection.disconnect();

 return returnValue;
}

The uploadFile() method takes a Java File object, and a full path for where you want the file uploaded to on your Egnyte account. It then reads the file, and uploads it as an HTTP multipart/form-data request.
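If the multipart framing is hard to picture, this sketch writes the same kind of body into memory so you can inspect it. It is only an illustration of the wire format used above (boundary line, Content-Disposition header, blank line, file bytes, closing boundary). The class name MultipartSketch and the fixed boundary "demo-boundary" are made up; a real request should use a random boundary as the method above does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class MultipartSketch {
    private static final String CRLF = "\r\n";

    // Writes a single-file multipart/form-data body to the given stream.
    static void writeFilePart(OutputStream out, String boundary, String fileName, byte[] content)
            throws IOException {
        String header = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"" + CRLF
                + CRLF; // blank line separates the part headers from the file bytes
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(content);
        String footer = CRLF + "--" + boundary + "--" + CRLF; // closing boundary ends with "--"
        out.write(footer.getBytes(StandardCharsets.UTF_8));
    }

    // Convenience wrapper that builds the body in memory.
    static byte[] build(String boundary, String fileName, byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeFilePart(buffer, boundary, fileName, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = build("demo-boundary", "hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(body, StandardCharsets.UTF_8));
    }
}
```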

Essentially using the methods above you have the basics down: listing a file or folder, uploading a file, and downloading a file. The same code from these can be used to do other actions too such as making a directory, or anything else that the Egnyte RESTful API allows.

Enjoy!

Encrypting already existing files in AWS S3 using the AWS Java API

In my last post I covered how to server-side encrypt files in S3 using the AWS Java API. Unfortunately, if you didn’t turn on encryption from the very first day when uploading to S3, you may have some files that are not encrypted. This post will cover an easy block of Java code which you can use to server-side encrypt any existing files that aren’t already, using the AWS Java API.

In summary, you need to loop through all existing files in a bucket, and see which ones are not encrypted. If a file is not encrypted, you set the metadata to turn on server-side encryption, and save the file again in S3. Note: this may change the timestamps on your files, but this is essentially the only way through the API to save the metadata for a file to turn on encryption.

Here is the code:

public S3EncryptionMigrator(String bucketName) throws IOException { //S3Object.close() below can throw IOException
 Logger.getLogger("com.amazonaws.http.AmazonHttpClient").setLevel(Level.OFF); //AWS API outputs too much information, totally flooding the console. Turn it off

 AmazonS3Client amazonS3Client = new AmazonS3Client(...);

 ObjectListing objectListing = amazonS3Client.listObjects(bucketName);
 List<S3ObjectSummary> s3ObjectSummaries = objectListing.getObjectSummaries();
 while (objectListing.isTruncated()) {
  objectListing = amazonS3Client.listNextBatchOfObjects(objectListing);
  s3ObjectSummaries.addAll(objectListing.getObjectSummaries());
 }

 for (S3ObjectSummary s3ObjectSummary: s3ObjectSummaries) {
  String s3ObjectKey = s3ObjectSummary.getKey();
  S3Object unecryptedS3Object = amazonS3Client.getObject(bucketName, s3ObjectKey);
  ObjectMetadata meta = unecryptedS3Object.getObjectMetadata();
  String currentSSEAlgorithm = meta.getSSEAlgorithm();
  unecryptedS3Object.close();
  if (currentSSEAlgorithm != null && currentSSEAlgorithm.equals(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION))
   continue; //Already encrypted, skip
  meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); //set encryption
  CopyObjectRequest copyObjectRequest = new CopyObjectRequest(bucketName, s3ObjectKey, bucketName, s3ObjectKey);
  copyObjectRequest.setNewObjectMetadata(meta);
  amazonS3Client.copyObject(copyObjectRequest); //Save the file
  System.out.println(">> '" + s3ObjectKey + "' encrypted.");
 }
}

Let’s examine the code. First you instantiate AmazonS3Client with the correct credentials. This should be tailored to your S3 authentication setup. You start by getting a list of all files in a bucket. Note that you have to keep calling listNextBatchOfObjects() while objectListing.isTruncated() is true, because only 1000 results are returned at a time. In case you have more than 1000 files, you’ll need to loop through the rest until you get all of them.

Then you loop through the list of files. For each file you check if server-side encryption is already turned on by reading the existing metadata of the file. If not, you set the flag for encryption, and then essentially copy the file onto itself. This will save the new metadata, and will turn on server-side encryption.
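Before copying anything, it can be useful to do a dry run that only reports which files would be touched. The sketch below isolates the skip check from the loop above, without the AWS SDK. The class EncryptionPlan and its map input (object key to the value you would get from getSSEAlgorithm(), null when unencrypted) are hypothetical names for illustration. Like the check above, it treats only AES256 as already done, so an object encrypted with another algorithm would also be listed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EncryptionPlan {
    // Same string value as ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION in the AWS SDK.
    static final String AES_256 = "AES256";

    // Returns the keys whose current algorithm is not AES256 (null means not encrypted at all).
    static List<String> keysNeedingEncryption(Map<String, String> sseAlgorithmByKey) {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, String> entry : sseAlgorithmByKey.entrySet()) {
            if (!AES_256.equals(entry.getValue())) {
                pending.add(entry.getKey());
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        Map<String, String> bucket = new LinkedHashMap<>();
        bucket.put("reports/2014.pdf", null);      // never encrypted
        bucket.put("reports/2015.pdf", "AES256");  // already encrypted, will be skipped
        System.out.println(keysNeedingEncryption(bucket)); // prints [reports/2014.pdf]
    }
}
```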

Encrypting files in AWS S3 using Java API

If you use AWS S3 Java API, and would like to see how you can encrypt files on S3, this post is for you.

First of all, there are two ways you can encrypt files in S3. One is to encrypt files on the server side, and one is to encrypt files on the client side. With the server-side option, you don’t have to worry about much. S3 encrypts the files for you when they are written to disk, and decrypts them when they are read, seamlessly. With the client-side option, the client (your application) has to encrypt files before transmitting them to S3, and decrypt them after receiving them from S3.

In this post I’ll cover server side encryption. We opted to use this one because it’s just simpler, and seamless. You don’t have to worry about encrypting/decrypting files yourself, nor do you have to worry about the key.

I’m assuming that you’re already familiar with the AWS Java API. For most things related to S3, AWS provides a class called AmazonS3Client. Once you have AmazonS3Client instantiated with your configuration, you will need to enable encryption in the metadata for each file you upload.

Example:

File fileForUpload = new File(...);
AmazonS3Client amazonS3Client = new AmazonS3Client(...);
ObjectMetadata meta = new ObjectMetadata();
meta.setContentType(URLConnection.guessContentTypeFromName(fileForUpload.getName()));
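meta.setContentLength(fileForUpload.length()); //Lets the SDK stream the upload instead of buffering it in memory to find the length
meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);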
amazonS3Client.putObject(s3Bucket, s3FullDestinationPath, new FileInputStream(fileForUpload), meta);

Let’s examine. First you instantiate the File you want to upload, and AmazonS3Client. Next you set the metadata on the file. This includes setting the content type of the file (important because having the wrong content-type can cause issues down the line), and setting the encryption flag for the file. Then when you upload the file using AmazonS3Client.putObject(…), the file will be encrypted by S3 before it is stored, and automatically decrypted when it is retrieved, all by S3’s servers. And that’s it!
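One thing to watch with the content type: URLConnection.guessContentTypeFromName() returns null when it does not recognise the file extension. A small sketch of a guard for that case follows. The helper name ContentTypes.guess and the choice of application/octet-stream as the fallback are my assumptions, not something the AWS API requires.

```java
import java.net.URLConnection;

final class ContentTypes {
    // Generic binary type used when the extension is not recognised (an assumed default).
    static final String FALLBACK = "application/octet-stream";

    // Guesses the content type from the file name, never returning null.
    static String guess(String fileName) {
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        return (guessed != null) ? guessed : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(guess("index.html"));
        System.out.println(guess("backup.zzunknownext"));
    }
}
```

The result of guess(...) could then be passed to meta.setContentType(...) in place of the direct call.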

Note that according to AWS Java API documentation, AmazonS3Client uses SSL under the hood so you don’t have to worry about transmitting unencrypted files over the network.