Tuesday, November 16, 2010

GemFire Hibernate L2 cache

One of the most important strategies for improving the performance of your Hibernate applications is caching.
I recently worked on implementing second-level caching for Hibernate using GemFire. Some of the key advantages of this implementation are:

Smart Eviction
You do not have to worry about calculating how many entries will fit in your cache. By default, GemFire monitors your heap and evicts the least recently used entries when the heap is about 80% full. It is recommended that you enable the concurrent mark-sweep (CMS) collector, so that GemFire has accurate heap statistics and does not over-evict your data.
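As a sketch, the JVM options for a cache JVM might look like the following. The flag names are standard HotSpot options; the specific occupancy value is illustrative, chosen so that CMS kicks in before the roughly 80% eviction threshold mentioned above:

```
# JVM options for the GemFire cache JVM (a sketch; the 65 is illustrative)
-XX:+UseConcMarkSweepGC                  # enable the CMS collector
-XX:CMSInitiatingOccupancyFraction=65    # start CMS below the ~80% eviction threshold
```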

In process and clustered caching
This GemFire module can be configured to cache data within the same JVM as your application (for relatively small amounts of data) or in the main memory of a cluster of machines. To use clustering, just start a GemFire process on each machine in your cluster. By default, these processes discover each other (using multicast) and create connections to one another. If you have more than a few hundred JVMs running your Hibernate application, it is recommended that you switch your deployment to a client-server topology, in which the processes holding the data act as servers to your Hibernate application JVMs. This is a one-line configuration change.
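As a sketch, the Hibernate configuration might look like the following. The first two property names are standard Hibernate settings; the region factory class and the topology property are the ones shipped with the GemFire Hibernate module, so verify them against your module version:

```properties
# Standard Hibernate second-level cache settings
hibernate.cache.use_second_level_cache=true
hibernate.cache.region.factory_class=com.gemstone.gemfire.modules.hibernate.GemFireRegionFactory

# The "one-line change": switch from the default peer-to-peer topology
# to client-server (property name per the module docs; check your version)
gemfire.cache-topology=client-server
```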
We also have a little flowchart to help you decide on a deployment strategy.



No distributed locks
A distributed cache provider for Hibernate is expected to provide distributed locking for the entities in the cache. Although GemFire supports distributed locks, we feel that grabbing a lock on each entry is overkill, so we have implemented a version-checking scheme instead: we keep track of entity versions and make sure the distributed cache stays consistent with the database. We call our strategy "when in doubt, throw it out". It may cause a few extra cache misses in the unlikely event that the same entry is modified by two threads simultaneously, but it will never return stale data.
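The idea can be sketched in a few lines of Java. This is our own hypothetical illustration, not the module's actual API: each cached entry carries the version Hibernate assigns to the entity, an update is accepted only when it is the unambiguous successor of what is cached, and anything ambiguous is simply evicted so the next read goes to the database:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "when in doubt, throw it out" (names are ours,
// not the GemFire module's API).
public class VersionedCache {
    private static final class Entry {
        final long version;
        final Object value;
        Entry(long version, Object value) { this.version = version; this.value = value; }
    }

    private final Map<Object, Entry> cache = new ConcurrentHashMap<>();

    public void put(Object key, long version, Object value) {
        cache.put(key, new Entry(version, value));
    }

    public void update(Object key, long newVersion, Object newValue) {
        Entry current = cache.get(key);
        if (current == null || newVersion == current.version + 1) {
            // Unambiguous successor of what we hold: safe to cache.
            cache.put(key, new Entry(newVersion, newValue));
        } else {
            // Two writers raced and we cannot tell which won: evict the entry
            // and let the next read miss and go to the database.
            cache.remove(key);
        }
    }

    public Object get(Object key) {
        Entry e = cache.get(key);
        return e == null ? null : e.value; // null means miss: read the database
    }
}
```

Note that the eviction path trades a cache miss for correctness: stale data is never returned, at the cost of one extra database read after a conflicting update.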

Eager pre-fetching
When you have relationships among your entities, we make sure that all the dependent entities are eagerly fetched from the remote cache and stored in the local JVM, so subsequent access to those entities is local and therefore fast. If you are using the client-server topology and caching data on the client, the server makes sure that whenever one client changes an entity, the change is pushed to every other client that has that entity.
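The prefetching idea can be sketched as follows. Again this is a hypothetical illustration with names of our own choosing, using plain maps to stand in for the remote cluster: fetching an entity also pulls its dependent entities into the local map in the same pass, so navigating the relationship later never leaves the JVM:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of eager pre-fetching (names are ours, not the module's API).
public class PrefetchingCache {
    private final Map<String, String> remote;           // stands in for the cluster
    private final Map<String, List<String>> relations;  // entity key -> dependent entity keys
    private final Map<String, String> local = new HashMap<>();

    public PrefetchingCache(Map<String, String> remote, Map<String, List<String>> relations) {
        this.remote = remote;
        this.relations = relations;
    }

    public String get(String key) {
        String value = local.get(key);
        if (value != null) return value;   // local hit: no network hop
        value = remote.get(key);
        local.put(key, value);
        // Eagerly pull every dependent entity into the local JVM as well,
        // so the follow-up relationship traversal is a local read.
        for (String dep : relations.getOrDefault(key, List.of())) {
            local.putIfAbsent(dep, remote.get(dep));
        }
        return value;
    }

    public boolean isLocal(String key) {
        return local.containsKey(key);
    }
}
```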