Tuesday, July 5, 2011

What is Koda (in one sentence)?

A key-code-value in-memory data store. Here is the promised feature set:

1. Scalable off-heap storage makes the whole server's RAM usable. Tested with up to 30 GB of memory.
2. Very low and predictable latencies. With a 30 GB cache: average latency = 3.5 microseconds, median = 2.5 microseconds, 99th percentile = 25 microseconds, 99.9th = 50 microseconds, 99.99th < 300 microseconds.
3. Maximum latencies stay in the hundreds of milliseconds even for very large caches.
4. Very fast (up to 7x faster than BigMemory): 3.5M requests per second (90/10 get/put mix) with LRU eviction and a 28 GB cache.
5. Very fast (up to 4x faster than on-heap Ehcache).
6. Hard limit on the maximum cache instance size (in bytes) and a hard limit on overall allocated memory.
7. Low overhead per cache entry (24 bytes).
8. Compact in-memory String representation.
9. Fast serialization using Kryo.
10. Custom serialization through the Writable and Serializer interfaces.
11. Multiple eviction policies: LRU, FIFO, LFU, RANDOM.
12. Two types of secondary cache indexes: one optimized for scans and one optimized for lookups.
13. Very low index memory overhead (< 10 bytes per cache entry).
14. Indexes are lightweight (creation rate: 10-20M cache entries per second).
15. Indexes live off-heap as well, so you do not have to worry about Java OOM.
16. The Execute/ExecuteForUpdate API lets you run Java code inside the server process, enabling more efficient data-processing algorithms (no need to lock-load-store-unlock data on the client side).
17. Direct Memory Access (DMA) allows implementing rich off-heap data structures: lists, sets, trees, etc.
18. Query API.
19. Queries run on the serialized form of objects (without materializing the objects themselves, using DMA).
20. Queries are very fast (scans of tens of millions of cache entries per second per server without indexes).
21. Supported OS: Linux, FreeBSD, NetBSD, Mac OS X, Solaris 10/11. 64-bit only.
22. Infinispan 5.0 integration.
23. Ehcache integration.
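This post does not show Koda's actual API, but the core off-heap idea behind features 1, 6, and 15 can be sketched with plain NIO. In this toy version, values live in a direct (off-heap) buffer that the garbage collector never scans, only a small on-heap index maps keys to offsets, and the fixed arena size gives a hard memory limit. The class `OffHeapStore` and its methods are hypothetical names for illustration, not Koda's interface:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class OffHeapStore {
    private final ByteBuffer arena;                            // off-heap region, invisible to the GC
    private final Map<String, int[]> index = new HashMap<>();  // key -> {offset, length}

    public OffHeapStore(int capacityBytes) {
        // allocateDirect reserves native memory outside the Java heap
        arena = ByteBuffer.allocateDirect(capacityBytes);
    }

    public void put(String key, byte[] value) {
        // the fixed arena size acts as a hard byte limit (cf. feature 6)
        if (arena.remaining() < value.length) {
            throw new IllegalStateException("arena full");
        }
        int offset = arena.position();
        arena.put(value);
        index.put(key, new int[]{offset, value.length});
    }

    public byte[] get(String key) {
        int[] loc = index.get(key);
        if (loc == null) return null;
        // duplicate() shares the off-heap bytes but has its own position/limit
        ByteBuffer view = arena.duplicate();
        view.position(loc[0]);
        view.limit(loc[0] + loc[1]);
        byte[] out = new byte[loc[1]];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1024);
        store.put("user:1", "Alice".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("user:1"), StandardCharsets.UTF_8));
    }
}
```

A real store would also need eviction, free-space reclamation, and concurrency control; the point here is only that the payload bytes never touch the Java heap, which is why GC pauses stop growing with cache size.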


  1. Interested in hearing more about this! Our company is looking for ways to scale cached data. I dug into the concept of using NIO's direct allocation as a way to get around the constraints of garbage collection, and your approach seems to be right on the money. Are you planning for this to be an open-source project? Are you mostly using it to solve some problems at work? The cool thing about off-heap caching in my case is that I'm already storing most data as arrays of primitives anyway, so there would be little serialization overhead in using buffers.
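The case described above, primitive arrays going straight into direct buffers, really does make the "serialization" step a bulk copy. A minimal sketch (class name `PrimitiveArrayBuffer` is illustrative, not part of Koda or NIO):

```java
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;

public class PrimitiveArrayBuffer {
    // Copy a double[] into off-heap memory in one bulk operation.
    public static ByteBuffer store(double[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Double.BYTES);
        buf.asDoubleBuffer().put(values); // bulk copy, no per-element boxing or object graph walk
        return buf;
    }

    // Copy the off-heap bytes back into a fresh on-heap array.
    public static double[] load(ByteBuffer buf) {
        DoubleBuffer view = buf.asDoubleBuffer();
        double[] out = new double[view.remaining()];
        view.get(out);
        return out;
    }
}
```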


  2. Eventually it will be open-sourced, but as part of another (much bigger) project I am currently working on. This is my own project (not my employer's), and I am borrowing time from my family. If you want to discuss your requirements and needs for a scalable caching solution, you can contact me at:
    vladrodionov at gmail.com, and I will be happy to help you.