Thursday, September 11, 2008
Thursday, August 14, 2008
XFire interface proxies
As I wrote in my last post, we are currently working on a project where we have a queue with a huge number of Java objects, and we need to squeeze every last bit of memory out of them. Even after the previous optimization, it still seemed like the memory usage was much higher than it should have been.
One thing I didn't mention is that the objects in the queue are coming in via Web service calls, marshaled by XFire's Aegis binding. No big deal, right? Except that the objects are exposed in our API using Java interfaces, not concrete classes. In other words, our code is something like this:
How does XFire handle this? At first glance, it doesn't seem like a problem, but when XFire is converting a SOAP call into Java objects, how does it know what type of object to instantiate? XFire doesn't magically know that our implementation of the Node interface is NodeImpl. And in fact, early versions of XFire did not support using interfaces in a service's API.
The trick XFire uses since version 1.0 RC (way back when) is to create a dynamic proxy class. This is really great because it allows interfaces to "just work", but it carries a huge amount of overhead.
Now, back to our queue. If these proxies are taking so much memory, what can we do about it? Do we have to change our API to use concrete classes? Once again, XFire comes to the rescue. You can use XFire settings to configure your service so that, even though you use interfaces, XFire will always instantiate the class that you tell it to. The details are all here.
Now for the results: using dynamic proxies, our queue could hold about 1,000 elements (using a JVM configured with a maximum heap size of 13 GB). After the change was made, our queue could hold 10,000 elements - a 10x increase! Now that's what I'm talking about :-).
One thing I didn't mention is that the objects in the queue are coming in via Web service calls, marshaled by XFire's Aegis binding. No big deal, right? Except that the objects are exposed in our API using Java interfaces, not concrete classes. In other words, our code is something like this:
public interface Node { ... }
public class NodeImpl implements Node { ... }
public class OurService {
public void enqueueNode(Node node) { ... }
}
How does XFire handle this? At first glance, it doesn't seem like a problem, but when XFire is converting a SOAP call into Java objects, how does it know what type of object to instantiate? XFire doesn't magically know that our implementation of the Node interface is NodeImpl. And in fact, early versions of XFire did not support using interfaces in a service's API.
The trick XFire uses since version 1.0 RC (way back when) is to create a dynamic proxy class. This is really great because it allows interfaces to "just work", but it carries a huge amount of overhead.
Now, back to our queue. If these proxies are taking so much memory, what can we do about it? Do we have to change our API to use concrete classes? Once again, XFire comes to the rescue. You can use XFire settings to configure your service so that, even though you use interfaces, XFire will always instantiate the class that you tell it to. The details are all here.
Now for the results: using dynamic proxies, our queue could hold about 1,000 elements (using a JVM configured with a maximum heap size of 13 GB). After the change was made, our queue could hold 10,000 elements - a 10x increase! Now that's what I'm talking about :-).
Friday, August 8, 2008
The dangers of micro-benchmarking
This week I started on a project where we needed to optimize the memory usage of a Java queue with a very, very large number of objects. Right away I noticed that the objects we are putting in the queue use "wrapper objects" for native types like long, int, etc.
Changing these to use regular primitives seemed like a great way to save memory, so I wrote up a little micro-benchmark:
When I ran this test, the old node class came out ahead of the new node class! In other words, I was able to add around 300,000 of the old objects to the list before running out of memory, but I could only add around 280,000 of the new objects. Obviously, this didn't make any sense.
The reason is easy to spot if you know how auto-boxing works. The following line:
gets converted by the Java compiler to:
and the implementation of Long.valueOf() (at least in the Sun JVM) uses cached values for -128 to 127. So the old node class was re-using a single Long object for every instance! Obviously this uses a lot less memory.
Once I changed the benchmark to use Rand.nextLong() to generate the values, the new node class came out ahead, using about 40% of the memory that the old class used.
The larger picture is that if you write micro-benchmarks, you have to approximate real-world data. The part that made my original test fail was taking the easy way out and using all 0's.
public class Node {
public java.lang.Long oneValue;
public java.lang.Long anotherValue;
// etc.
}
Changing these to use regular primitives seemed like a great way to save memory, so I wrote up a little micro-benchmark:
public class OldNode {
public java.lang.Long oneValue;
// etc.
}
public class NewNode {
public long oneValue;
// etc.
}
public static void main(String[] args) {
List nodes = new ArrayList();
try {
OldNode node = new OldNode();
node.oneValue = 1L; // take advantage of Java 5 auto-boxing
nodes.add(node);
} catch (OutOfMemoryError e) {
System.out.println("Out of memory. Size: " + nodes.size());
}
// And then repeat the same thing with NewNode
}
When I ran this test, the old node class came out ahead of the new node class! In other words, I was able to add around 300,000 of the old objects to the list before running out of memory, but I could only add around 280,000 of the new objects. Obviously, this didn't make any sense.
The reason is easy to spot if you know how auto-boxing works. The following line:
node.oneValue = 1L; // take advantage of Java 5 auto-boxing
gets converted by the Java compiler to:
node.oneValue = Long.valueOf(1L);
and the implementation of Long.valueOf() (at least in the Sun JVM) uses cached values for -128 to 127. So the old node class was re-using a single Long object for every instance! Obviously this uses a lot less memory.
Once I changed the benchmark to use Rand.nextLong() to generate the values, the new node class came out ahead, using about 40% of the memory that the old class used.
The larger picture is that if you write micro-benchmarks, you have to approximate real-world data. The part that made my original test fail was taking the easy way out and using all 0's.
Hello world!
I'm starting a new blog to post the random stuff that I come up with at work and otherwise. Hope you enjoy it :-).
Subscribe to:
Posts (Atom)