NoSQL hype – Cassandra example

In last few weeks there was a lot of NoSQL hype, with more and more information about companies which migrate from rational databases such MySQL to NoSQL solution. There are a lot of pretty awesome NoSQL solution on the market, but from my point of view the most promising is Cassandra.

Originally Cassandra was developed by Facebook guys (btw the core developer was hired from amazon – he was one of the author of Amazon Dynamo). In 2008 Facebook open sourced Cassandra project and now it is developed by Apache.

Apache Cassandra Project was based on two awesome papers “Bigtable: A distributed storage system for structured data, 2006″ and “Dynamo: amazon’s highly available key- value store, 2007″ the result is:

Fault tolerant
Data is replicated and nodes can be replaced with no downtime.
Scalable
Read and Write throughput increase linearly with new nodes are added.
Proven
Digg, Facebook, Twitter and more, are for sure great example of usage.
Easy to use
High Level API, Java, Ruby, Python, Scala and more

Most of the API is done through Thrift it is also developed by Apache Foundation but still in incubator. It is framework for class language services development.

As Cassandra is NoSQL storage it does not have typical rational tables, instead of that it uses Data Structures such as:

Column
Tuple of name, value and timestamp
SuperColumn
Name and Map of
ColumnFamily
Is the infinite container for Columns
SuperColumnFamily
Is the infinite container for SuperColumns
Keyspace
Keyspace is the outer most grouping of your data.

Very nice introduction article for Cassandra Data Model.

The main problem with NoSQL databases is that, data modeling is completely different that relation database data modeling. Because we ask for a given key for some structured data, to achieve the best performance boost we should store data in proper way.

First at all NoSQL databases are not always better, so as always use the right tool to get job done without pain. We should decide do we need NoSQL database such as Cassandra. So … why we may want to use NoSQL solution:

  1. No single point of failure – the relational model is hard and expensive to be clustered (sequences, cascades, transactions, etc.), Oracle or MySQL database are focused on Consistency opposite to Cassandra (see CAP Theorem).
  2. Relational model theory is about normalization 1NF, 2NF, 3NF, and above, NoSQL make here difference, as we want to get all the needed data as single query we allow to duplicate data, so we are not structuring our data to be normalized, we want to structure our data for queries that are executed.
  3. With document store like Cassandra we have flexible scheme, so we may add and remove fields on the fly, this is huge pros as our deployment grows (hundreds of nodes)
  4. Most of setups with “normal” databases are Master (with mostly one “master” node) where writes operation goes, with Cassandra we have distributed writes so we can write data anywhere.

There are a lot of nice articles about installation of Cassandra, so I will just point them here:

As OSX user I’ve used last two, but highlighted link is worth to see as you can build Cassandra cluster.

To play a little bit with Cassandra we will use Cascal library which is hosted at GitHub. Cascal has pretty good documentation so if something is unclear refer to cascal wiki page. One additional important project is twissandra. It is example project to demonstrate how to use Cassandra, so to better understand Cassandra data model it is good to get that project and play with it a little bit.

The practical part is outdated, Cascal is outdated. Current list of drivers on cassandra planet page.

Summary

Cassandra is well-known for having no single point of failure, it is data storage for Facebook, Twitter and Digg. And what is most important here Cassandra has now Commercial support, so if your business don’t have time to learn and play with Cassandra. Now you may call Riptano. They provide Service and Training for Apache Cassandra.

Yes, Riptano is now called DataStax :)

Cassandra has now hers five minutes, and as we see she proves that it is worth to put some affords to learn NoSQL style data models. As always some of problems are ideal for rational data storage and some of them are typical for Cassandra it is good to have both tools in ours toolbox.

Meantime I was playing with Cassandra, Cassandra team have released version 0.6 and 0.6.1. The most important feature is Hadoop MapReduce support, there are also performance improvements with new caching layer. So as you see they moving fast :).

Yes they went really fast. I’ve played with 0.5.1 and now we have 2.1. I decided to publish this outdated post, but to make it interesting I’ve added Cassandra Time Machine.

References:

Cassandra Time Machine:

This  time machine shows us most important changes in major version of Cassandra. Despite of this changes there was a lot of bug fixes and improvements as well. I’ve done this time machine to appreciate work of Cassandra’s contributors. They did and still do great job!

0.6.x (2010)

The Cassandra’s team resolved 348 issues (part of them are port from 0.7x), there was thirteen releases. From 0.6.7 version all releases were bug fixing  ported form 0.7.x.

Features added:

  • Simple and very “stupid :)” Hadoop integration,
  • Dynamic endpoint snitch – “An endpoint snitch that automatically and dynamically infers “distance” to other machines without having to explicitly configure rack and datacenter positions solves two problems:”,
  • MX is accessible for none java client
  • Authorization and authentication (the beginning)
  • Per-keyspace replication factor (the beginning of replication strategy)
  • Row level cache
  • InProcessCassandraServer for testing purpose. Now it is replaced by EmbeddedCassandraService.
  • and many more minor features (ConsistencyLevel.ANY, ClusterProbe, Pretty-print column names, more JMX operations, global_snapshot and clear_global_snapshot commands, cleanup utility …)

0.7.x (2011)

This time they resolved 1006 issues and there was ten releases.

Features added:

  • Expiration time for column. Expired column acts as ‘markedForDelete’.
  • Configurable ‘merge factor’ for Column Families. MergeFactor attribute is used to tune read vs write performance for a ColumnFamily. A lower MergeFactor will cause compaction more frequently, leading to improved read performance at the cost of decreased write performance.
  • Allow creating indexes on existing data.
  • EC2Snitch – this snitch assumes  that EC2 region is a DC and  availability zone is a rack.
  • scrub command – rebuild sstables for one or more column family.
  • Removal operation which operates on key ranges and delete an entire columnfamily (truncate operation).
  • Weighted request scheduler.
  • and many more (access level for Thirft, many cassandra-cli improvements, NumericType  column comparator, support for Hadoop Streaming, cfhistograms, secondary indices for column families, JMX per node interface,

0.8.x (2011)

Last version before 1.0. Team resolved 549 issues and released ten versions.

Features added:

  • CQL (Cassandra Query Language) 1.0 language specification.
  • Idea of Coprocessors  (from hackathon) which was renamed to Plugins, which was implemented in 2.x as Triggers.
  • SeedProvider is pluggable via interface
  • Encryption support for internode communication (all, none).
  •  EC2 features for setting seeds and tokens (in EC2 machines die and bring up more frequently)
  • Compaction Throttling .
  • Support for batch insert/delete in CQL.
  • JDBC driver for CQL.
  • and many more (more commands in cli, different timeouts for different classes of operation, counter column support for SSTableExport, EndpointSnitchInfoMBean …)

1.0.x (2011)

First stable release. 510 issues were resolved, there was also twelve versions.

Features added:

  • SSTable compression – long waited feature (CASSANDRA-47). Most of the time it is good to exchange CPU for I/O.
  • Stream compression – Today we have Snappy, LZ4 and Deflate compression.
  • Checksum for compressed data to detect corrupted columns.
  • Better performance for rows with contains many (more than thousand) columns.
  • Max client timestamp for an SSTable is being captured and provided via SSTableMetadata.
  • Encryption for data across DC only.
  • Timing information to cassandra-cli queries – it looks like cosmetics, but is very handy.
  • Redesigned Compaction
  • CQL 1.1
  • and many more (RoundRobinScheduler metrics, overriding RING_DELAY, upgradesstables nodetool command, CQL improvements,  bloomfilter stats and memory size, ….)

1.1.x (2012)

One dot One line had twelve releases and the team resolved 620 issues.

Features added:

  • Concurrent Schema Migrations,
  • Prepared statements,
  • Infinite bootstrap – for new configuration testing purpose with live traffic. In this mode node would follow the bootstrap procedure as normal, but never fully join the ring.
  • Running Map/Reduce job with server side filtering.
  • Override of available processors value, so we can deploy multiple instances on single machine.
  • CompositeType comparator is now extendable.
  • Fine-grained control over data directories, so we can control what sstable are placed where.
  • Eagerly re-write data at read time.
  • Configurable transport in RecordReader and RecordWriter.
  • and many more (ALTER of Column Family attributes in CQL, Gossip goodbye command, loading from flat file, COPY TO command,  CQL support for describing keyspaces and column families,  rebuild index” JMX command , disable snapshots option,  “ALTER KEYSPACE” statement …)

1.2.x (2013)

This time Cassnadra team resolved 997 issues and released nineteen versions.

Features added:

  • Disallow client-provided timestamps, so WITH TIMESTAMP was ripped out.
  • Query tracing details – very helpful feature.
  • Ability to query collection types (list, set, and map) in CQL.
  • CQL 3.0 (better support for wide rows and generalization for composite columns, per-column family default consistency level)
  • Murmur3 partitioner which is faster then MD5.
  • Different timeout for reads and writes.
  • Atomic, eventually-consistent batches.
  • Compressed and off heap bloomfilters.
  • Global prepared statement instead of connection based.
  • Describe cluster for nodetool and cqlsh.
  • Metrics for native protocols and for global ColumnFamily.
  • Latency consistency analysis within nodetool. Users can accurately predict Cassandra’s behavior in their production environments without interfering with performance.
  • Custom CQL protocol and transport.
  • LZ4Compressor two time faster compression than Snappy.
  • LOCAL_ONE consistency level.
  • and many more (improved authentication logs, Multiple independent Level Compactions, UpgradeSSTables optimization, tombstone statistics in cfstats, ReverseCompaction,  resizing of threadpools via JMX, allow disabling the SlabAllocator, Notify before deleting SSTable…)

2.0.x (2013)

The 2.0 was released and here you find great Datastax article: What’s under the hood in Cassandra 2.0. There were ten versions and the team resolved 868 issues.

Features added:

  • Triggers – Asynchronous triggers is a basic mechanism to implement various use cases of asynchronous execution of application code at database side.
  • Query paging mechanism native CQL protocol.
  • Compare and Set Support (SET with IF statements).
  • Streaming 2.0.
  • Multiple ports to gossip from a single IP address this allow for multiple Cassandra service instances to run on a single machine, or from a group of machines behind a NAT.
  • CQL improvements.
  • Reduce Bloom Filter Garbage Allocation.
  • Network topology snitch supporting preferred addresses, so having cluster spanning multiple data centers, some in Amazon EC2 and some not is possible.
  • and many more ( index_interval configurable per column family, Single-pass compaction, Track sstable coldness, beforeChange Notification, CqlRecordReader, balance utility for vnodes, triggers LWT operations …..)

2.1.x (2014 during Cassandra Summit)

Two beta, six release candidate, 535 issues resolved. That’s great news. Datastax provided great articles about 2.1:

Final Word

Rafał provides great Cassandra Modeling Kata. It’s worth reading!

This post is original from April 2010. I’ve added citation to make some comments, and I’ve added Time Machine section (after orginal references). Currently the best place to start is DataStax Blog,  CassandraPlanet, and of course Twitter.

Some of futures where backported into previous version, that’s the reason they are in previous version (eg. CASSANDRA-5935 was fixed in 2.0.1 but also ported to 1.2.10 so it will show in 1.2.x).

DataStax in meantime became great company. They are behind Cassandra for many years providing stable and continuous growth. Currently valuation is around 830 million american dollars. If you have spear money I recommend you to invest in this company :). Last pre IPO round raised DataStax by $106 Million.

CloudFront Joined AWS Free Usage Tier

Amazon CloudFront is Content Delivery Network (CDN) and more as it is integrated and optimized to work with other AWS services. By using edge locations of Amazon’s DCs we can cache our content and deliver it with low latency. It doesn’t meter if the content is static (S3 object) or dynamic (EC2 service). We can deliver entire website, or cache part of it to improve performance (static content at edge location means faster download of images, js etc. ).  We should it consider in our mobile apps, when latency is challenging. You will find detailed information on product page.

So what’s the big deal with AWS Free Usage Tier?  AWS Free Usage Tier is how AWS promote services. We can use certain services for free. YES for FREE, of course with some restriction such as: storage size, total requests, hours, etc. For us it means we can build quite successful web page for free. On the other hand we can just run some tests to find out if particular solution works for us. It’s marketing for AWS, and opportunity for us.

Amazon CloudFront free tier allows for for 2,000,000 HTTP and HTTPS Requests each month for one year. For small business or testing purpose it’s enough.

If we can use it something for free it is good to mention it. In our profession we have to learn all the time, and access for free gives us opportunity to play a little bit before buying. For working professionals paying few bucks is not a problem, but for students it might be.

Currently AWS Free Usage Tier has:

  • Amazon EC2 – so we have some servers for free
  • Amazon S3 – so we have some static object for free
  • Amazon RDS – so we have database for free
  • Amazon CloudWatch – so we monitor for free
  • Amazon EBS – so we have storage for free
  • Amazon SES – so we can send email for free
  • Amazon SQS – so we have messaging for free
  • Amazon SNS – so we have notification for free
  • Amazon SWF – so we have workflows for free
  • and few more …

and of course Amazon CloudFront for free so we have CDN

In my opinion a lot of small businesses can start for free, and if somebody succeed than …… you know FB or Google will pay few millions  and will provide infrastructure :)

 

JIRA Tuning – part one

There was a time when I did some JVM tuning. It’s time to reactivate the part of the brain responsible for this activity.

The key principle is to prepare a baseline. What is it?

The baseline refers to:

  • the first measuring mechanisms (np. gc.log) – we should measure on different levels (cpu, I/O, database etc) if in doubt follow thus rule: “more is better”
  • a load test script that makes JIRA work (e.g. Apache JMeter script),
  • a test environment – should be identical as production (different environment create different baseline), and,
  • a desired goal (we tend to tune solutions not having any goal). Never tune just for tuning.

The next required value is time… a lot of time. The reason is simple. We make only one change at a time and when it is completed, we launch load scripts and use metrics to see if we’re going in the right direction. The whole operation is based on making several steps forward followed by a few steps back.

Our recent goal was to make JIRA stable, despite working slow (no restart during the day).

We launched the analysis of gc.loc. Due to short pauses, we used the Concurrent Mark Sweep collector. Unfortunately, there are two sides of a coin. As CMS does not compact the memory, you may not be able to place any object in free spaces despite having free memory.

This usually results in the death of JVM – when looking for some free space, you do some cleaning, which takes pretty long time. This causes a “slashdot effect”. As a result you need more and more time, because of which the main GC works and the remaining part is stuck -> reset!

OK, we update two parameters causing CMS to start working earlier – usually, it is launched when the old generation is filled up to 68% (exact value vary depending on JVM version):

  • -XX:CMSInitiatingOccupancyFraction=40
  • -XX:+UseCMSInitiatingOccupancyOnly

Because of that, the whole commotion happens on the bottom of the stack. On the top, there is a place for larger objects. Obviously, you may (and will) experience a problem with stack fragmentation.

We launch load tests. The situation was better, but….

Unfortunately, we killed JIRA. The next decision is to limit the work of GC by reducing the pipe. We lower the maxThreads parameter on tomcat. We launch tests on one of the machines. Nothing happen. We add the second machine and start acting more aggressively. Result: although slow, Jira responds in a stable manner. This is what we were hoping for!

I may still try two other options:

  • -XX:+UseCompressedStrings
  • -XX:+OptimizeStringConcat

They were added in java 6u21 and 6u20. It is an interesting option, because it optimizes String-type objects, so that – whenever possible – it used 1 byte tables instead of 2 byte char tables. Most of the texts can be squeezed in a single byte. We check the memory usage histogram (previously 98% – char[]). This time a byte[] takes precedence over a char[].

In the meantime, the jConsole operator screams with surprise: “Pedro, what have you done?!”. This draws our attention (we think we’ve f… up something). Fortunately, all this hassle was about a 50% drop in memory usage. Uff, this is yet another confirmation that our switch is working.

Time to finish. We note down our ideas for later. Migration is in just two days. Quick help of our colleague gave us a new machine. We’ve finally got a place to use the G1 collector.