Saturday, October 3, 2009

Implementing wait_timeout in PostgreSQL

In MySQL, the wait_timeout setting can be used to automatically close connections that have been inactive for a certain amount of time.  The default is 8 hours.

Unfortunately, there is no equivalent in PostgreSQL (as of version 8.3).  However, it's easy to implement.  You just need to use the query_start column (of the pg_stat_activity view) to determine when a connection was last used.

Here is an example using Bash:


#!/bin/bash
# Usage: $0 <timeout-in-minutes>
WAIT_TIMEOUT_MINUTES=$1
/usr/bin/psql -U postgres -t -c "SELECT procpid FROM pg_stat_activity WHERE (NOW() - query_start) > INTERVAL '$WAIT_TIMEOUT_MINUTES minutes' AND current_query = '<IDLE>'" | xargs kill

This script takes the timeout as a command-line parameter, and then kills off any connections that have been sitting idle for too long.

Add this to a cron job and you're all set.
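As a sketch, assuming the script is saved at a path like /usr/local/bin/kill_idle_connections.sh (the path, schedule, and 60-minute timeout are all examples, not requirements), the crontab entry might look like:

```shell
# Run every 10 minutes, killing connections idle for more than 60 minutes.
# Path and values are illustrative -- adjust for your setup.
*/10 * * * * /usr/local/bin/kill_idle_connections.sh 60
```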

If you're using PostgreSQL 8.4, then you can do this entirely in SQL, thanks to the new pg_terminate_backend() function:

SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE (NOW() - query_start) > INTERVAL '60 minutes' AND current_query = '<IDLE>';

This is especially useful if you're working in a Windows environment, where there's no equivalent of xargs (that I know of), or if you're connecting to the database remotely and therefore can't use kill on the server's backend processes.

Thanks to the anonymous commenter who pointed out an error in the original script.

Thursday, August 14, 2008

XFire interface proxies

As I wrote in my last post, we are currently working on a project where we have a queue with a huge number of Java objects, and we need to squeeze every last bit of memory out of them. Even after the previous optimization, it still seemed like the memory usage was much higher than it should have been.

One thing I didn't mention is that the objects in the queue are coming in via Web service calls, marshaled by XFire's Aegis binding. No big deal, right? Except that the objects are exposed in our API using Java interfaces, not concrete classes. In other words, our code is something like this:

public interface Node { ... }

public class NodeImpl implements Node { ... }

public class OurService {
    public void enqueueNode(Node node) { ... }
}

How does XFire handle this? At first glance, it doesn't seem like a problem, but when XFire is converting a SOAP call into Java objects, how does it know what type of object to instantiate? XFire doesn't magically know that our implementation of the Node interface is NodeImpl. And in fact, early versions of XFire did not support using interfaces in a service's API.

The trick XFire uses since version 1.0 RC (way back when) is to create a dynamic proxy class. This is really great because it allows interfaces to "just work", but it carries a huge amount of overhead.
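To illustrate the mechanism (this is a minimal sketch of java.lang.reflect.Proxy, not XFire's actual internals), here is a proxy for a hypothetical Node interface. Note what each instance drags along: the proxy object itself, an InvocationHandler, and a backing map for the property values.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class ProxyDemo {
    interface Node {
        String getName();
    }

    public static void main(String[] args) {
        // Store property values in a map keyed by method name, which is
        // roughly what a binding framework must do when it only knows
        // the interface, not a concrete class.
        final Map<String, Object> properties = new HashMap<String, Object>();
        properties.put("getName", "node-1");

        Node node = (Node) Proxy.newProxyInstance(
                Node.class.getClassLoader(),
                new Class<?>[] { Node.class },
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method method, Object[] methodArgs) {
                        // Every getter call becomes a reflective map lookup.
                        return properties.get(method.getName());
                    }
                });

        System.out.println(node.getName()); // prints "node-1"
    }
}
```

Three heap objects (plus map entries) where a plain NodeImpl would need one: that's the overhead that adds up across a huge queue.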

Now, back to our queue. If these proxies are taking so much memory, what can we do about it? Do we have to change our API to use concrete classes? Once again, XFire comes to the rescue. You can use XFire settings to configure your service so that, even though the API uses interfaces, XFire will always instantiate the concrete class that you tell it to. The details are in the XFire documentation.

Now for the results: using dynamic proxies, our queue could hold about 1,000 elements (using a JVM configured with a maximum heap size of 13 GB). After the change was made, our queue could hold 10,000 elements - a 10x increase! Now that's what I'm talking about :-).

Friday, August 8, 2008

The dangers of micro-benchmarking

This week I started on a project where we needed to optimize the memory usage of a Java queue with a very, very large number of objects. Right away I noticed that the objects we are putting in the queue use "wrapper objects" for native types like long, int, etc.

public class Node {
    public java.lang.Long oneValue;
    public java.lang.Long anotherValue;
    // etc.
}

Changing these to use regular primitives seemed like a great way to save memory, so I wrote up a little micro-benchmark:

public class OldNode {
    public java.lang.Long oneValue;
    // etc.
}

public class NewNode {
    public long oneValue;
    // etc.
}

public static void main(String[] args) {
    List<OldNode> nodes = new ArrayList<OldNode>();
    try {
        while (true) {
            OldNode node = new OldNode();
            node.oneValue = 1L; // take advantage of Java 5 auto-boxing
            nodes.add(node);
        }
    } catch (OutOfMemoryError e) {
        System.out.println("Out of memory. Size: " + nodes.size());
    }
}

// And then repeat the same thing with NewNode

When I ran this test, the old node class came out ahead of the new node class! In other words, I was able to add around 300,000 of the old objects to the list before running out of memory, but I could only add around 280,000 of the new objects. Obviously, this didn't make any sense.

The reason is easy to spot if you know how auto-boxing works. The following line:

node.oneValue = 1L; // take advantage of Java 5 auto-boxing

gets converted by the Java compiler to:

node.oneValue = Long.valueOf(1L);

and the implementation of Long.valueOf() (at least in the Sun JVM) uses cached values for -128 to 127. So the old node class was re-using a single Long object for every instance! Obviously this uses a lot less memory.
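The caching behavior is easy to demonstrate. This sketch compares boxed values inside and outside the cached range; the language spec only guarantees caching for -128 to 127, and standard JVMs don't cache beyond that for Long:

```java
public class BoxingCacheDemo {
    public static void main(String[] args) {
        // Inside the cached range: both calls return the same instance.
        Long a = Long.valueOf(1L);
        Long b = Long.valueOf(1L);
        System.out.println(a == b); // true: same cached object

        // Outside the cached range: two distinct objects are allocated.
        Long c = Long.valueOf(1000L);
        Long d = Long.valueOf(1000L);
        System.out.println(c == d); // false: separate allocations
    }
}
```

This is exactly why a benchmark filled with the constant 1L allocates almost no Long objects at all.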

Once I changed the benchmark to use Random.nextLong() to generate the values, the new node class came out ahead, using about 40% of the memory that the old class used.

The larger picture is that if you write micro-benchmarks, you have to approximate real-world data. What made my original test fail was taking the easy way out and using the same small constant for every value.

Hello world!

I'm starting a new blog to post the random stuff that I come up with at work and otherwise. Hope you enjoy it :-).