Designed to scale (and fly :)

Saturday, June 14, 2014

How not to benchmark something

Jonathan Ellis of Cassandra wrote a blog post, "How not to benchmark Cassandra" (here: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra-a-case-study), and presented his own benchmark results for Cassandra, Couchbase, HBase and MongoDB. Ironically, his benchmark methodology is even more obscure than Thumbtack's (read his post for the context). As usual, every vendor "tunes" the benchmark in its own favor and runs competitors in their default configurations, which are often sub-optimal (at least for HBase). We need an independent entity to conduct and publish NoSQL benchmarks.

Thursday, May 29, 2014

Rock solid

What is "rock solid"?

30th hour of performance test ...

1.3B reads/ 130M writes

SSD cache capacity is hovering around 95%,

RAM used freezes at 84.7%,

ops - 14.5K (+/- 20)

Not a single sign of instability, degradation, or memory or disk partition leaks.

BigBase news

BigBase (an optimized fork of the NoSQL data store HBase) is in final testing right now. L3 (SSD-based) cache performance is very good, and the latency distribution is exceptional, far better than anything on the market (except maybe Aerospike). On the standard YCSB benchmark with all data in the L3 cache, I expect performance on par with Aerospike, better than MapR M7 and way better than Cassandra, MongoDB etc. A couple of spoilers:

L3 block cache access latencies:

99% - 1.7ms
99.9% - 4.8ms
99.99% - 17ms
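
For readers who want to build the same kind of latency table from their own test runs, here is a minimal sketch of a nearest-rank percentile calculation. This is not how BigBase collects its numbers (the post does not say); the function names `percentile` and `latency_report` and the synthetic log-normal samples are assumptions made purely for illustration.

```python
import math
import random


def percentile(sorted_samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_samples) / 100.0)
    return sorted_samples[max(rank, 1) - 1]


def latency_report(samples_ms, percentiles=(99.0, 99.9, 99.99)):
    """Return {percentile: latency in ms} for the requested percentiles."""
    ordered = sorted(samples_ms)
    return {p: percentile(ordered, p) for p in percentiles}


if __name__ == "__main__":
    # Synthetic latencies, NOT BigBase measurements.
    rng = random.Random(42)
    samples = [rng.lognormvariate(-1.0, 0.8) for _ in range(200_000)]
    for p, value in latency_report(samples).items():
        print(f"{p}% - {value:.2f}ms")
```

Keep in mind that a 99.99% figure only means something with enough samples: with 200,000 measurements, only about 20 of them sit above that line.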

Raw performance is 12K ops for 8KB blocks and 6K ops for 16KB blocks, with compression (16KB and 32KB decompressed, respectively). This is on an AWS m3.2xlarge with 2 x 80GB of local SSD storage. Add the in-memory block cache and overall performance is going to be in the tens of thousands of ops per node. M.C. Srivas will get his personal copy of the benchmark report for comparison. No, M7 is not an engineering marvel.
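
A quick back-of-envelope conversion of those raw numbers into bytes per second, as a sketch. It assumes that "ops" means operations per second and that 8KB and 16KB are the compressed (on-SSD) block sizes; the helper name `throughput_mb_per_s` is made up for this example.

```python
def throughput_mb_per_s(ops_per_sec, block_kb):
    """Convert an operation rate and a block size in KB to MB/s (1 MB = 1024 KB)."""
    return ops_per_sec * block_kb / 1024.0


# Compressed block size in KB -> ops per second, as quoted in the post.
raw_results = {8: 12_000, 16: 6_000}

for block_kb, ops in raw_results.items():
    compressed = throughput_mb_per_s(ops, block_kb)
    # The post quotes a 2x ratio: 8KB -> 16KB and 16KB -> 32KB decompressed.
    decompressed = throughput_mb_per_s(ops, block_kb * 2)
    print(f"{block_kb}KB blocks: {compressed:.2f} MB/s read from SSD, "
          f"{decompressed:.2f} MB/s after decompression")
```

Both configurations come out at the same byte rate (93.75 MB/s compressed), so halving the block size doubles the operation count without changing the amount of data moved. Whether that points to a bandwidth limit rather than an IOPS limit is a guess, not something the numbers alone prove.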

DataStax Enterprise 4.0 In-Memory option

Somehow I managed to miss this announcement. It was back in Feb. 2014. I must admit I strongly believe that the popularity of a product (person, movie, book - you name it ...) depends not on quality and feature set (a person has a feature set as well :)), but on other factors, like the money spent to promote the product (person, movie, book, etc.). Cassandra and MongoDB are good examples ...
The former is still struggling to get counters done right; the latter promises not to lock the table for every write operation... in the next release, of course.

OK, here is the link to the tech paper:

http://www.datastax.com/wp-content/uploads/2014/02/WP-DataStax-Enterprise-In-Memory.pdf

It contains everything except raw benchmark performance data, but the most amusing statement can be found at the end of the document:

Because in memory tables are stored on the JVM heap, the total SIZE_LIMIT_IN_MB for an in memory table should currently be limited to 1GB per node, with amount also being dependent on the heap size and the need to leave room for normal Cassandra activity.

Cassandra, you say? :)

Tuesday, May 6, 2014

BigBase announcement (Koda inside :)

Finally, more than three years later ...

http://www.bigbase.org

For those who are still interested in Koda: it is part of the BigBase distribution, but you can extract it, of course. Sorry guys and girls, there is no separate doc for Koda; browse the source code and test cases.

Friday, September 16, 2011

Some random thoughts on Hadoop, HBase and OSS

Let me start with the link - it's a very interesting piece of information: Google sorts a petabyte. First of all, I must confess - I am a Hadoop programmer. Hadoop is an open-source alternative to Google's proprietary MapReduce framework (this is what they used to sort the petabyte). This is my day-to-day job - doing some stuff in Hadoop and HBase (a distributed k-v data store inspired by Google's BigTable). I wonder how long it would take to sort a petabyte in Hadoop. The last number I am aware of: 973 minutes on a 3600+ node cluster in 2009. That is 30 times slower. Of course, an average server in 2009 cannot be compared to an average server in 2011, and that cluster had less than half as many servers. How long would it take to finish the same benchmark on the same 8000-node cluster but with Apache Hadoop instead of Google's proprietary MapReduce framework? I would say we can divide 973 minutes by 4, giving ~240 minutes as an approximate estimate. That's ~8 times slower than Google can do. So what? I must confess one more time - I do not believe in OSS (Open Source Software) as a good model for EVERY type of software. When you need:

Performance.
Optimal (minimal) system resource usage.
Robustness.
Predictable release schedule.
Innovation.

You had better look for commercial alternatives or develop this software in-house (if you have the budget, time and skilled professionals).
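
The back-of-envelope sort estimate from this post can be reproduced in a few lines. This is only a sketch of the arithmetic using the numbers quoted above: 973 minutes for Hadoop in 2009, the "30 times slower" ratio, and the guessed speedup of 4 from newer and more numerous servers. The function name `sort_estimate` is invented for this example, and the speedup factor is an assumption, not a measurement.

```python
def sort_estimate(hadoop_minutes, google_speed_factor, assumed_speedup):
    """Return (implied Google time, estimated Hadoop time, remaining slowdown).

    hadoop_minutes      -- measured Hadoop petabyte sort time (2009 run)
    google_speed_factor -- how many times faster Google's run was
    assumed_speedup     -- guessed gain from 2011 hardware and a bigger cluster
    """
    google_minutes = hadoop_minutes / google_speed_factor
    hadoop_estimate = hadoop_minutes / assumed_speedup
    return google_minutes, hadoop_estimate, hadoop_estimate / google_minutes


if __name__ == "__main__":
    google, hadoop, slowdown = sort_estimate(973, 30, 4)
    print(f"Implied Google time:   ~{google:.0f} min")
    print(f"Estimated Hadoop time: ~{hadoop:.0f} min")
    print(f"Remaining slowdown:    ~{slowdown:.1f}x")
```

With these inputs the remaining slowdown works out to 30 / 4 = 7.5, which is the "~8 times slower" figure after rounding. The result depends entirely on the guessed divisor of 4.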

Thursday, August 18, 2011

Update on Koda

Fixed one issue with the native memory allocator used internally by Koda, which greatly affected multithreaded update performance. The preliminary test numbers are ~ 4M queries per sec with a 90/10 read/write ratio (which is up 15%). I think for a 50/50 ratio the performance gain must be much larger.
Implemented persistence layer support (based on the very promising leveldb library). The support is not integrated yet (I will be working on this later next week).
Currently working on compression support (snappy and gzip) for Koda.

My plans for early autumn 2011: Koda will be released as part of a new open source distributed Key-Value data store. Wait for announcements. We will be in the same league as Cassandra and HBase.