Architecturally, Cassandra is an open-source, peer-to-peer (P2P), distributed data store written in the Java programming language.
Distributed, P2P and Fault Tolerant:
It is designed to run on multiple nodes, and all nodes are equal: you do not have to designate any master nodes. One of the main ideas behind Cassandra is that machines fail and that the system should be able to sustain node failures. Cassandra fulfills this vision by partitioning its data among all the nodes in the cluster and using replication to store multiple copies of the data.
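To make the partitioning-plus-replication idea concrete, here is a toy sketch of a hash ring. It is not Cassandra's actual partitioner (Cassandra uses its own partitioners and token assignment); the node names, the MD5 hash, and the token scheme are all illustrative assumptions.

```python
import hashlib
from bisect import bisect_right

class ToyRing:
    """Toy token ring: a key lands on the first node whose token is
    >= the key's hash (wrapping around), and is replicated to the
    next replication_factor - 1 nodes clockwise on the ring."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Hypothetical token assignment: hash each node's name.
        self.ring = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(min(self.rf, len(self.ring)))]

ring = ToyRing(["node1", "node2", "node3", "node4", "node5"],
               replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes hold this key
```

Because every node can compute the same ring from the same configuration, any node can route a request for a key to the right replicas — which is what makes the "write anywhere, read anywhere" behavior below possible.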
One of the biggest features of its architecture is that Cassandra is write-anywhere, read-anywhere. This means that if you have a Cassandra cluster of 50 nodes, you can write some data to any node in the cluster and read it back from any node in the cluster. You do not have to deal with partitioning and other messy details at the application level.
Cassandra is scalable. I have read articles which show that Cassandra can easily scale to handle petabytes of data. Cassandra also claims to scale linearly with respect to performance: when the number of nodes is doubled, so is the throughput. In our experimentation with Cassandra on Rackspace, this claim seems to hold, as we saw close to double the throughput every time we doubled the number of nodes.
Distribution Among Multiple Data Centers:
This is really an extension of the “distributed” nature of Cassandra, but it is important enough that I’d like to discuss it on its own. The TCP/IP stack is rack and data center topology unaware. When you provide an IP address to an application using TCP/IP, the networking stack does not know whether that end-point is in the same data center or a different one across the globe. This means that something at the application layer must determine the location of a TCP/IP end-point: either define the end-points explicitly, or use some algorithm to infer the location from the IP address.
Cassandra provides a way to define data center configurations. You edit a configuration file in Cassandra to specify nodes and then group the nodes by racks or data centers. Cassandra does no magic to determine whether the nodes are in the same data center or not; it simply reads it from the configuration. Cassandra uses the term Snitch for the component that maps nodes to physical locations in the network. A snitch maps IPs to racks and data centers; it defines the network topology in terms of nodes. Snitches are configured in the main Cassandra config file: cassandra.yaml.
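As a hedged illustration, here is roughly what this configuration might look like with the PropertyFileSnitch, one of the snitches that ships with Cassandra: cassandra.yaml names the snitch, and cassandra-topology.properties maps each node’s IP to a data center and rack. The IPs and the DC/rack names below are made up.

```yaml
# cassandra.yaml (fragment)
endpoint_snitch: PropertyFileSnitch
```

```properties
# cassandra-topology.properties -- hypothetical topology
# node IP = data center : rack
192.168.1.10=DC1:RAC1
192.168.1.11=DC1:RAC2
10.20.1.10=DC2:RAC1
# fallback for nodes not listed above
default=DC1:RAC1
```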
Actually, to make my point, I lied: Cassandra can in fact “infer” a node’s location in the network, say a rack, if your network topology is encoded in the IP addresses. The RackInferringSnitch does this.
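The rule the RackInferringSnitch applies is simple enough to sketch in a few lines: it assumes the second octet of a node’s IPv4 address identifies the data center and the third octet identifies the rack. This is a toy re-implementation of that rule, not Cassandra’s code.

```python
def infer_location(ip):
    """Mimic RackInferringSnitch: second octet of an IPv4 address
    is taken as the data center, the third octet as the rack."""
    octets = ip.split(".")
    return {"dc": octets[1], "rack": octets[2]}

print(infer_location("10.100.200.1"))  # {'dc': '100', 'rack': '200'}
```

So if your addressing plan already encodes topology this way, no explicit topology file is needed.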
The nodes in a Cassandra cluster use a gossip protocol to exchange internal information about what’s going on with each other. For example, when a new node is added to the cluster, it uses gossip to announce its arrival to the other nodes. When writing Cassandra applications, you do not use the gossip protocol at all, but it is handy to know about since it comes up in the official documentation.
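To give a feel for how a new node’s arrival spreads without any master, here is a toy push-gossip simulation. It is a sketch of the general gossip idea, not Cassandra’s actual Gossiper; the node names, state layout, and round structure are all assumptions for illustration.

```python
import random

def gossip_round(states, rng):
    """One toy gossip round: every node pushes its state map to one
    randomly chosen peer, which keeps the newer version of each entry."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        for key, (version, value) in list(states[node].items()):
            if key not in states[peer] or states[peer][key][0] < version:
                states[peer][key] = (version, value)

rng = random.Random(7)  # fixed seed so the run is repeatable
states = {f"node{i}": {} for i in range(5)}
states["node0"]["node0"] = (1, "JOINED")  # node0 announces its arrival

rounds = 0
while any("node0" not in s for s in states.values()):
    gossip_round(states, rng)
    rounds += 1
print(f"all nodes learned of node0 after {rounds} rounds")
```

The key property: no node talks to everyone, yet the information reaches the whole cluster in a handful of rounds, and the protocol keeps working if individual nodes miss a round.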
Writing Data to Cassandra:
When data is written to Cassandra, the node handling the request first appends it to a log file called the “commit log”. Commit logs are used to ensure data durability and to make sure that writes are not lost if the machine crashes all of a sudden. Cassandra then writes the data to a table in memory (the memtable) and eventually flushes that data to disk as an SSTable (Sorted String Table).
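This commit-log-then-memtable-then-SSTable flow can be sketched in a few lines. This is a toy model to show the ordering of the steps, not Cassandra’s storage engine; the flush threshold and record format are made up.

```python
import json

class ToyNode:
    """Toy write path: append to a commit log for durability, buffer
    in an in-memory table, flush to an immutable sorted 'SSTable'."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the on-disk commit log
        self.memtable = {}
        self.sstables = []     # list of flushed, key-sorted snapshots
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # Durability first: the write hits the log before anything else.
        self.commit_log.append(json.dumps({"k": key, "v": value}))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # SSTables are written sorted by key, which is what makes
        # them cheap to merge and scan later.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

node = ToyNode()
for i, k in enumerate(["cherry", "apple", "banana", "date"]):
    node.write(k, i)
print(node.sstables)  # first three writes flushed, sorted by key
print(node.memtable)  # "date" still buffered in memory
```

Note that writes never rewrite an existing file: the memtable absorbs updates in memory and flushes append-only, which is a big part of why Cassandra’s writes are fast.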
When you are writing to and reading from any node in a Cassandra cluster, there are consistency issues to think about. In other words, for a real-time system writing data thousands of times a second, how do we read the latest copy of the data? For example, if you are writing some very critical data, you can make Cassandra write that data to every node in the cluster. Similarly, if you would like the latest copy of the data, you can read from all nodes and then take the most recent copy (by timestamp).
Cassandra offers tunable consistency, which means you can set it yourself. If you want high consistency, no problem. Cassandra takes this one step further: consistency can be set on a per-operation basis.
Consistency Level ALL: every replica must acknowledge the operation — the strongest, and slowest, setting.
Consistency Level ONE: a single replica’s acknowledgement is enough — the fastest, least consistent setting.
Consistency Level QUORUM: a majority of the replicas must acknowledge the operation — a middle ground between ONE and ALL.
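The arithmetic behind QUORUM is worth spelling out: a quorum is a majority of the replicas, and when both reads and writes use QUORUM, the read set and the latest write set must overlap in at least one replica, so the read sees the latest write.

```python
def quorum(replication_factor):
    """A quorum is a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

# With RF = 3, QUORUM writes touch 2 replicas and QUORUM reads touch 2,
# and 2 + 2 > 3, so every quorum read overlaps the latest quorum write.
for rf in (3, 5, 7):
    w = r = quorum(rf)
    print(f"RF={rf}: quorum={w}, overlap guaranteed: {w + r > rf}")
```

More generally, any combination where the write count plus the read count exceeds the replication factor gives this overlap guarantee — which is exactly why per-operation tuning is useful.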
Cassandra uses no fixed schema like an RDBMS to define data structure; you can store structured or unstructured data. Here’s the cool thing: Cassandra still allows you to define indexes.
Much like the SQL language in the RDBMS universe, Cassandra has CQL to access and modify data. CQL’s syntax is close to SQL’s and supports core SQL DDL/DML commands like CREATE, SELECT, INSERT, UPDATE, etc. Here’s an example query in CQL:
SELECT * FROM usercounters WHERE NumberOfTimesPoked > 2
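For context, here is a hypothetical schema that query could run against, in CQL 3 syntax. The table name comes from the query above; the `username` column is my invention. Note that in CQL, filtering on a non-key column like this generally requires either a secondary index or an explicit `ALLOW FILTERING` clause.

```sql
-- Hypothetical table backing the query above
CREATE TABLE usercounters (
    username text PRIMARY KEY,
    NumberOfTimesPoked int
);

INSERT INTO usercounters (username, NumberOfTimesPoked)
VALUES ('alice', 5);

-- Range filter on a non-key column needs ALLOW FILTERING
SELECT * FROM usercounters
WHERE NumberOfTimesPoked > 2 ALLOW FILTERING;
```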
To provide space savings, as of version 1.0 Cassandra also supports data compression using Google’s Snappy algorithm. You can tell Cassandra to compress the data, and when you try to read it, Cassandra will automatically decompress the relevant data and send it to you. This is something that I may not experiment with. There is an obvious trade-off in performance, as the CPU has to do extra work decompressing the data.