Java: Workaround for Array.sort() slowness when sorting on File.lastModified()

java

Let’s say you have a File[] array gotten using File.listFiles() (or any other means). Now you want to sort that array based on the last modified date of the files. You could whip up the following code:

File directory = new File("/SomeDirectory");
File[] filesList = directory.listFiles();
Arrays.sort(filesList, new Comparator<File>() {
    public int compare(File file1, File file2)
    {
    	return Long.valueOf(file1.lastModified()).compareTo(file2.lastModified());
    } 
});

Note: this will sort them with the latest modified files first.

So this is all well and good, but let’s say your directory has 5 million files in it. Turns out the code above will be extremely slow in sorting the array on such a large list of files (also depending on the speed of your disk drive). The reason for that is because File.lastModified() is called on each file, every time a comparison is made during the sort. Arrays.sort() is an O(n log(n)) operation, so you do the math to see how many times File.lastModified() will be called on each individual file repeatedly in the worst case. (The issue with the repeated File.lastModified() calls is that the method does not cache the last modified timestamp; the call ventures out to the OS and the disk in real time to get the information every time.)

The way around this is simple. Cache the File.lastModified() timestamp. Here’s a code snippet on how to go about that:

public class FileLastModifiedWrapper implements Comparable<FileLastModifiedWrapper> 
{
	public final File file;
	public final long lastModified;

	public FileLastModifiedWrapper(File file) 
	{
		this.file = file;
		lastModified = file.lastModified();
	}

	public int compareTo(FileLastModifiedWrapper other) 
	{
		return Long.compare(this.lastModified, other.lastModified);
	}
}

//...somewhere else:

File directory = new File("/SomeDirectory");
File[] filesList = directory.listFiles();
FileLastModifiedWrapper[] wrappedFilesList = new FileLastModifiedWrapper[filesList.length];
for(int i=0; i<filesList.length; i++)
	wrappedFilesList[i] = new FileLastModifiedWrapper(filesList[i]);
Arrays.sort(wrappedFilesList);
for(int i=0; i<filesList.length; i++)
	filesList[i] = wrappedFilesList[i].file;

And voila! This will sort immensely faster. I noted that on around 100k files, it took just a few seconds, whereas the original code took up to two minutes.

As you see, FileLastModifiedWrapper caches the lastModified timestamp locally. Then we instantiate an array of FileLastModifiedWrapper objects with each file in our filesList. We then sort this new array, and use it to rearrange the original array.

Leave a Reply