AmazonS3Client to loop through batches of S3 files objects

AWS provides the AmazonS3Client class, which is part of the AWS Java SDK. This class can be used to interact with files in S3.

An important feature to note of the AmazonS3Client is that it limits results to batches of 1000. If you have less than 1000 files, then all is good. You can use amazonS3Client.listObjects(bucketName); and it will provide all the objects in a bucket.

But if the bucket contains more than 1000 files, you will need to loop through the files in batches. This is not entirely obvious and can cause you to miss files (as I certainly did)!

To get started, you would initiate AmazonS3Client like so:

AmazonS3Client amazonS3Client = new AmazonS3Client(new BasicAWSCredentials(KEY, SECRET));

The approach I like to take is to first loop through and collect all the files up front like so:

ObjectListing objectListing = amazonS3Client.listObjects(bucketName);
List<S3ObjectSummary> s3ObjectSummaries = objectListing.getObjectSummaries();
while (objectListing.isTruncated()) 
{
   objectListing = amazonS3Client.listNextBatchOfObjects (objectListing);
   s3ObjectSummaries.addAll (objectListing.getObjectSummaries());
}

Note: if memory is a concern or you have an unlimited number of files, you can simply modify the approach to do whatever you need to with each file as you fetch it in batches from the API, instead of collecting them up front.

If you first collected them in a List up front, you can then loop through each file like so:

for(S3ObjectSummary s3ObjectSummary : s3ObjectSummaries)
{
	String s3ObjectKey = s3ObjectSummary.getKey();
	//Do whatever with s3ObjectSummary