Tuesday, April 24, 2012

Gemcached - Improved Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data. It is distributed, but its servers are disconnected from each other, which means it is the client's responsibility to put data on, and retrieve it from, the correct server. Most memcached clients use some form of hashing to decide a key's location. When deploying a memcached cluster (if you can call it that), there are a lot of things that can go wrong.

Configuring Servers

To determine the server where a particular key should reside, most memcached clients take the key's hash code and mod it by the number of servers (this changes slightly when the servers are weighted). For all the clients to do this consistently, you have to configure every one of them with the same server list. Not only that, you also have to make sure the order of the servers is the same, because some clients may not sort the server list provided to them.
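Here is a minimal sketch of that selection logic (the hostnames and key are made up); note how both the contents of the list and its order determine the result:

    import java.util.Arrays;
    import java.util.List;

    public class ServerSelection {
        public static void main(String[] args) {
            // The same list, in the same order, must be configured on every client.
            List<String> servers = Arrays.asList("cache1:11211", "cache2:11211", "cache3:11211");
            String key = "user:42";
            // Typical client logic: hash the key, mod by the server count.
            // Math.abs guards against negative hash codes.
            int index = Math.abs(key.hashCode()) % servers.size();
            System.out.println(key + " -> " + servers.get(index));
        }
    }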

Mixed memcached Clients

If you want to use a mix of clients, say a PHP and a Java client, you could be in for trouble. These clients may not hash keys in a uniform manner, so you could end up putting the same key on more than one server, leading to stale data, lost updates, and all sorts of confusion.
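To see why, here is a sketch comparing two plausible hashing choices on the same key (the actual functions vary by client; String.hashCode() and CRC32 are just illustrative):

    import java.util.zip.CRC32;

    public class HashMismatch {
        public static void main(String[] args) throws Exception {
            String key = "user:42";
            int serverCount = 3;

            // A Java client might hash with String.hashCode()...
            int javaIndex = Math.abs(key.hashCode()) % serverCount;

            // ...while a client in another language might use CRC32.
            CRC32 crc = new CRC32();
            crc.update(key.getBytes("UTF-8"));
            int crcIndex = (int) (crc.getValue() % serverCount);

            // If the indices differ, each client reads and writes the
            // same key on a different server.
            System.out.println("hashCode picks server " + javaIndex
                    + ", crc32 picks server " + crcIndex);
        }
    }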

High Availability

If one of the memcached servers crashes, a get request for a key on that server simply results in a cache miss. The application then has to go to its datastore and fetch the entry again. If many concurrent requests arrive for keys on the crashed server, there can be severe delays in serving them all.
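In other words, every such miss falls through to the slow path. A rough sketch of that read pattern (the map stands in for the memcached client, and loadFromDatabase is a hypothetical datastore call):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ReadThrough {
        // Stand-in for the memcached client.
        static Map<String, String> cache = new ConcurrentHashMap<String, String>();

        static String get(String key) {
            String value = cache.get(key);     // null on a miss, e.g. after a server crash
            if (value == null) {
                value = loadFromDatabase(key); // slow path: every concurrent miss lands here
                cache.put(key, value);
            }
            return value;
        }

        static String loadFromDatabase(String key) {
            return "value-for-" + key;         // hypothetical datastore lookup
        }

        public static void main(String[] args) {
            System.out.println(get("user:42"));
        }
    }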

Dynamic Scaling

When you want to add capacity to your system, you start a new memcached server. But that by itself does not enable the clients to use the new server. You have to let each client know that there is a new server, either by stopping and re-configuring the client or by invoking an API (like addServer() in xmemcached). When adding capacity, stopping your client (which may be your application server) is not really an option, so you have to build a mechanism to notify an already-running client about the server that was added.
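For example, with xmemcached the notification could look roughly like this (hostnames are made up; addServer() is the API mentioned above):

    import net.rubyeye.xmemcached.MemcachedClient;
    import net.rubyeye.xmemcached.XMemcachedClient;

    public class AddCapacity {
        public static void main(String[] args) throws Exception {
            // The client starts out configured with one server.
            MemcachedClient client = new XMemcachedClient("cache1.example.com", 11211);

            // Later, when capacity is added, the running client must be
            // told about the new server explicitly.
            client.addServer("cache2.example.com", 11211);

            client.shutdown();
        }
    }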

Gemcached

I recently worked on gemcached, which solves all of the memcached problems listed above. gemcached uses VMware GemFire to provide clustered caching. You can download gemcached from VMware Labs.

Starting gemcached

After adding gemfire.jar and gemcached-X.X-RELEASE.jar to your CLASSPATH, you can start gemcached from the command line using:
 java com.gemstone.memcached.GemFireMemcachedServer -port=[port]

If a port is not specified, 11212 is used by default. You can start as many gemcached servers as you want and point your clients to one, some, or all of them, without worrying about the order of the servers. When you want to scale the system, just start another gemcached server. The new server discovers the running servers by default and shares their load. Clients do not have to be restarted: they continue talking to their configured set of servers, and they simply start benefiting from the new server's capacity.
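Since gemcached speaks the memcached protocol, any standard client should work unchanged. A quick sketch using spymemcached (my choice of client here, not something the post prescribes; the host is made up):

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class GemcachedDemo {
        public static void main(String[] args) throws Exception {
            // 11212 is the gemcached default port.
            MemcachedClient client = new MemcachedClient(
                    new InetSocketAddress("localhost", 11212));

            client.set("greeting", 3600, "hello from gemcached");
            System.out.println(client.get("greeting"));

            client.shutdown();
        }
    }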

Tuesday, November 16, 2010

GemFire Hibernate L2 cache

One of the important strategies for improving the performance of your Hibernate applications is caching. I recently worked on implementing a second-level cache for Hibernate using GemFire. Some of the key advantages of using this implementation are:

Smart Eviction
You do not have to worry about calculating how many entries will fit in your cache. By default, GemFire monitors your heap and evicts the least recently used entries when the heap is about 80% full. It is recommended that you enable the ConcurrentMarkSweep collector, so that GemFire has accurate heap statistics and does not over-evict your data.
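For example, the JVM running GemFire could be started with flags along these lines (the occupancy fraction and class name are illustrative, not prescribed values):

    java -XX:+UseConcMarkSweepGC \
         -XX:CMSInitiatingOccupancyFraction=70 \
         com.example.MyHibernateApp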

In-process and clustered caching
This GemFire module can be configured to cache data within the same JVM as your application (for relatively small amounts of data), or to cache the data in the main memory of a cluster of machines. To use clustering, just start a GemFire process on each machine in your cluster. By default, all these processes discover each other (using multicast) and create connections to each other. If you have more than a few hundred JVMs running your Hibernate application, it is recommended that you split your deployment into a client-server topology, in which the processes holding the data act as servers to your Hibernate application JVMs. This is a one-line configuration change, as shown below.
We also have a little flowchart to help you decide your deployment strategy.
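The one-line change mentioned above is a topology property in your Hibernate configuration. The property name below is my recollection of the module's setting and should be verified against the module docs:

    import org.hibernate.cfg.Configuration;

    public class TopologySwitch {
        public static void main(String[] args) {
            Configuration cfg = new Configuration()
                // Assumed property name; the module defaults to peer-to-peer.
                .setProperty("gemfire.cache-topology", "client-server");
        }
    }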

No distributed locks
A distributed cache provider for Hibernate is expected to provide distributed locking for the entities in the cache. Although GemFire supports distributed locks, we feel that grabbing a lock on each entry is overkill, so we have implemented a smart version-checking scheme: we keep track of the versions of the entities and make sure the distributed cache stays consistent with the database. We call our strategy "when in doubt, throw it out". It may result in a few extra cache misses in the unlikely scenario that the same entry is modified by two threads simultaneously, but it will never return stale data.
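A minimal sketch of the idea (not the module's actual code): each cached entry carries a version, and any mismatch discards the entry instead of taking a lock:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class VersionCheckedCache {
        static class Entry {
            final Object value;
            final long version;
            Entry(Object value, long version) { this.value = value; this.version = version; }
        }

        private final Map<String, Entry> cache = new ConcurrentHashMap<String, Entry>();

        // Return the cached value only if its version matches what the
        // caller expects; otherwise throw the entry out and report a
        // miss, so the caller reloads from the database.
        public Object get(String key, long expectedVersion) {
            Entry e = cache.get(key);
            if (e == null) {
                return null;
            }
            if (e.version != expectedVersion) {
                cache.remove(key); // when in doubt, throw it out
                return null;
            }
            return e.value;
        }

        public void put(String key, Object value, long version) {
            cache.put(key, new Entry(value, version));
        }
    }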

Eager pre-fetching
When you have relationships among your entities, we make sure that all the dependent entities are eagerly fetched from the remote cache and stored in the local JVM, so that subsequent access to these entities is local and hence fast. If you are running in the client-server topology and caching data in the client, the server makes sure that whenever one client changes an entity, the change is pushed to all other clients that have that entity.
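Finally, to wire the module into an application: enabling a second-level cache in Hibernate comes down to two properties. The region factory class name below is my assumption from the module's package naming, so treat it as a placeholder:

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class EnableGemFireL2Cache {
        public static void main(String[] args) {
            Configuration cfg = new Configuration()
                .setProperty("hibernate.cache.use_second_level_cache", "true")
                // Assumed factory class; check the module docs for the exact name.
                .setProperty("hibernate.cache.region.factory_class",
                        "com.gemstone.gemfire.modules.hibernate.GemFireRegionFactory");

            SessionFactory factory = cfg.buildSessionFactory();
            factory.close();
        }
    }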