I’ve known for a long time that the Java Just-In-Time compiler produces some very fast code but I didn’t realize how fast until I went one-on-one with it and lost.

At work, my manager sponsored a little programming contest. It’s the first of its kind in our department and the goal is to foster creativity and get developers talking with each other about problems solving. The problem itself is trivial. Given the Java method signature

public void reverseSent(char[] sentence)

reverse the order of the words contained in the char array. So, a sentence like, “Mike Heath is cool” would be morphed to “cool is Heath Mike”.

I wrote the following Java code to solve the problem:

	public void reverseSent(char[] sentence) {
		int i = 0;
		int j = sentence.length - 1;

		// Reverse the array
		while (i < j) {
			char tmp = sentence[i];
			sentence[i] = sentence[j];
			sentence[j] = tmp;
			i++; j--;
		}

		// Reverse each word
		i = 0;
		for (int k = 0; k < sentence.length; k++) {
			if (sentence[k] == ' ') {
				j = k - 1;
				while (i < j) {
					char tmp = sentence[i];
					sentence[i] = sentence[j];
					sentence[j] = tmp;
					i++; j--;
				}
				i = k + 1;
			}
		}
		j = sentence.length - 1;
		while (i < j) {
			char tmp = sentence[i];
			sentence[i] = sentence[j];
			sentence[j] = tmp;
			i++; j--;
		}
	}

My solution reverses the entire array and then re-reverses each word. This is so that I don't have to allocate a second array and copy data back and forth between two arrays.

After writing it, I started thinking that this would be the perfect problem for pointer arithmetic in C or C++. So I decided to cheat in the contest. I would provide a Java method that would invoke some C++ code using Java Native Interface (JNI).

The C++ code I wrote is found below:

JNIEXPORT void JNICALL Java_NativeReverseSentence_reverseSent
  (JNIEnv *env, jobject self, jcharArray sentence)
{
	int length = env->GetArrayLength(sentence);
	jchar *sentencePtr = (jchar *)env->GetPrimitiveArrayCritical(sentence, NULL);

	// Reverse the sentence
	jchar *i = sentencePtr;
	jchar *j = sentencePtr + length - 1;
	while (i < j) {
		swap(i, j);
		i++; j--;
	}

	// Reverse each word
	i = sentencePtr;
	for (jchar* k = sentencePtr; k < sentencePtr + length; k++) {
		if (*k == ' ') {
			j = k - 1;
			while (i < j) {
				swap(i, j);
				i++; j--;
			}
			i = k + 1;
		}
	}

	// Swap last word
	j = sentencePtr + length - 1;
	while (i < j) {
		swap(i, j);
		i++; j--;
	}

	env->ReleasePrimitiveArrayCritical(sentence, sentencePtr, 0);
}

The algorithms are identical but the in the C++ example, I'm not indexing each array; I'm using pointer arithmetic. Since Java doesn't let you use pointers, I figured the C++ code must be faster. Array indexing requires multiplication and my pointer arithmetic is only doing addition and subtraction. A slight advantage but an advantage nonetheless.

To compare the performance of the two methods, I ran the Java version 1 million times to make sure the JIT had optimized as much is possible and then ran it a 1 million times again. I ran it 1 million times a third time just for good measure. I measured how much time it took to run each batch of 1 million. I did the same with the native code. The second and third Java measurements always had very similar run times. The first Java measurement was always slower but this was expected since the JIT needs time to go in and optimize the code.

The native code always ran at the same speed and had very little variance. This is expected because the C++ code is compiled once and doesn't have any dynamic run-time compiling as is the case with the Java code.

The results of my experiment shocked me. Java won by a large margin. I was mad. How could Java beat my finely crafted C code?

I run Linux so, of course, I used GCC to compile my C code. I went back and recompiled it with -O3. This tells GCC to optimize as much as possible. After turning on the optimizations, my C code was faster. I smiled and thought to myself, "Should I have them put 'Mike Heath' or 'Michael Heath' on my programming contest trophy." I didn't think any of my coworkers would use JNI.

The contest was going to be run on Windows so I pulled out my Dell Latitude D810, booted it to Windows and compiled my code using GCC under Windows. I would have used the Visual C++ compiler but I could not create a .dll project with Visual Studio Express. Microsoft apparently doesn't believe in allowing their free tools to be used for much more than a trivial project. That's a rant for another day. Apparently it can be done with the command-line tools but I was too lazy to figure that out. Suffice it to say, I hate Visual Studio more now than ever.

It didn't take long at all to create a .dll using GCC on Cygwin. Honestly, Cygwin is the only thing that makes Windows usable for a Unix geek like myself.

I ran through all my tests and sure enough. My C++ code still ran faster than the Java equivalent on Windows.

After getting everything working under Windows, I decided I had better try Java 6 to see how it compares. Java 6 has a number of significant performance improvements over Java 5. Java 6 was significantly faster than Java 5 but my C code was still faster. I patted myself on the back and decided to run my code on my Dell Lattitude D820. I wanted to make sure that everything would run fine without Cygwin being installed. I've only used Windows on my D820 to play Age of Empires III. Windows might not be hacker friendly but it is gamer friendly.

I ran the code on my D820 and much to my surprise, the Java code ran faster than the C++ code. I was shocked. How could this be? The C++ code was always faster on the D810. The only explanation I can think of is that the Java just-in-time compiler is producing code that takes advantage of my D820's Intel Core 2 Duo processor. I'm sure it's not using both cores (that would require threading) but the Core 2 Duo does have a 64-bit instruction set that the JIT may be using to push more data through more quickly. Further investigation is required.

So, I decided to submit my native code to the contest anyway because it lets my true geek shine through and I'm relatively certain the code will be run on a D810.

I will post a matrix in the near future that compares various JVM's (Sun Java 5, Sun Java 6, IBM Java 5, JRocket, etc.) to my native code. I'll run the experiments on both my D810 and my D820 and do both Windows and Linux on my D820. It will take some time to compile all the results but they should be interesting. I'll also trying running the experiment on my single core 64-bit Athlon processor.