Whereas data partitioning (discussed in the last chapter) is concerned with picking the node that stores the first copy of a row, replication is all about storing additional copies of that row on other nodes so the cluster can survive node failures without data loss.
In Cassandra terminology, these copies of data or rows are called replicas.
Replication Factor: This parameter determines how many nodes in your cluster store copies of each row. For example, if the Replication Factor (RF) is set to 2, two copies of every row are stored on different nodes. As common sense dictates, the RF cannot be greater than the number of nodes in the cluster. You cannot ask for 10 replicas of your data (RF=10) when you only have 8 nodes available. If you try to do this, your writes will fail.
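The RF constraint can be stated as a one-line check. This is a plain-Python sketch of the rule, not Cassandra code; the function name is mine:

```python
def validate_rf(replication_factor, num_nodes):
    """Reject an RF larger than the cluster size, mirroring why writes
    fail when you ask for more replicas than there are nodes."""
    if replication_factor > num_nodes:
        raise ValueError(
            f"RF={replication_factor} exceeds cluster size {num_nodes}")
    return True

validate_rf(2, 8)     # fine: 2 copies spread across 8 nodes
# validate_rf(10, 8)  # would raise: only 8 nodes for 10 replicas
```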
Cassandra has two replication protocols or strategies, which I will discuss below. Which one you should use depends mainly on whether your Cassandra cluster lives in a single data center or spans multiple data centers.
- Simple Strategy: As the name indicates, this is the simplest strategy. The data partitioner (e.g. RandomPartitioner) picks the node for the first replica of the data. SimpleStrategy then places each additional replica on the very next node in the ring, moving clockwise, without considering topology.
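The ring walk above can be sketched in a few lines of Python. This is a simplified model (real Cassandra works with token ranges and vnodes, not list indices); node names are illustrative:

```python
def simple_strategy_replicas(ring, first_index, rf):
    """ring: nodes ordered by token position. Returns rf nodes starting
    at first_index and walking clockwise, wrapping around the ring."""
    return [ring[(first_index + i) % len(ring)] for i in range(rf)]

ring = ["node1", "node2", "node3", "node4"]
# Suppose the partitioner chose node3 for the first replica; with RF=2,
# SimpleStrategy adds the next node clockwise, node4.
print(simple_strategy_replicas(ring, 2, 2))  # ['node3', 'node4']
```

Note the wrap-around: if the partitioner picks the last node in the ring, the next replica lands on the first node.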
- Network Topology Strategy: As the name indicates, this strategy is aware of the network topology (the location of nodes in racks, data centers, etc.) and is much more intelligent than Simple Strategy. This strategy is a must if your Cassandra cluster spans multiple data centers, and it lets you specify how many replicas you want per data center. It also tries to distribute replicas across racks to minimize failures: when choosing nodes to store replicas, it will try to find a node on a different rack from the replicas already placed.

Let's talk about a real-world scenario from my experience. My team and I were designing a Cassandra cluster spanning 4 data centers (we called them sites). Please note that I'm changing several details of the actual scenario; however, the concepts are the same. Each site also had its own application servers. See the figure below.
We had three requirements:
1. Allow local reads in a data center to avoid cross data center latency. For example, the application server in Site #1 should read from the Cassandra nodes in the same site.
2. The application should continue to work, albeit with limited functionality, if connectivity between data centers is disrupted.
3. Tolerate the complete loss of the Cassandra nodes in two data centers, plus some individual node failures across the remaining data centers.
To meet all three requirements, we used NetworkTopologyStrategy with 2 replicas per data center. That is, every single row of data is stored on 8 nodes in total (2 nodes per site x 4 sites). Hence we can tolerate the failure of a single node per site while still allowing local reads. We can also tolerate the loss of all Cassandra nodes in a site, at the extra cost of cross data center latency for the application server in that site. For example, if the Cassandra nodes in Site #2 die completely, the application server in that site will have to retrieve data from Site #1, 3 or 4, with the extra cost of inter-site communication. In fact, the application keeps working (with decreased performance) even if the Cassandra nodes in 3 sites fail completely, since the remaining site still holds 2 replicas of every row.
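The replica arithmetic from this scenario is worth making explicit. A quick sketch, with the per-data-center counts written as a map (the site names are the labels from my example, not anything Cassandra-defined):

```python
# Replicas requested per data center, NetworkTopologyStrategy-style.
replicas_per_dc = {"Site1": 2, "Site2": 2, "Site3": 2, "Site4": 2}

total_replicas = sum(replicas_per_dc.values())
print(total_replicas)  # 8 copies of every row across the cluster

# If Site2 loses all its Cassandra nodes, its 2 replicas are gone,
# but 6 replicas survive in the other sites for cross-site reads.
surviving = total_replicas - replicas_per_dc["Site2"]
print(surviving)  # 6
```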
SimpleStrategy Versus NetworkTopologyStrategy – Use Cases:
I have read several posts suggesting that if you have a single data center, you should use Simple Strategy. However, it's not so simple. Simple Strategy is completely oblivious to network topology, including rack configuration. If the replicas happen to land on the same rack and that rack fails, you will suffer data loss. Racks are susceptible to failures due to power, heat or network issues. So, I would recommend using Simple Strategy only when your cluster is in a single data center and on a single rack (or you don't care about rack configuration).
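The same-rack risk is easy to see side by side. Below is a toy contrast between rack-oblivious and rack-aware placement; it is a simplified model of the two strategies, not Cassandra's actual implementation, and the node/rack names are made up:

```python
racks = {"node1": "rack1", "node2": "rack1", "node3": "rack2"}
ring = ["node1", "node2", "node3"]

def simple_replicas(first, rf):
    # SimpleStrategy-like: just walk the ring, ignoring racks.
    i = ring.index(first)
    return [ring[(i + k) % len(ring)] for k in range(rf)]

def rack_aware_replicas(first, rf):
    # NetworkTopologyStrategy-like: walk the ring, but prefer nodes
    # on racks that do not already hold a replica.
    i = ring.index(first)
    chosen, used_racks, skipped = [first], {racks[first]}, []
    for k in range(1, len(ring)):
        node = ring[(i + k) % len(ring)]
        if racks[node] in used_racks:
            skipped.append(node)          # same rack: keep as fallback
        else:
            chosen.append(node)
            used_racks.add(racks[node])
        if len(chosen) == rf:
            return chosen
    return (chosen + skipped)[:rf]        # not enough distinct racks

print(simple_replicas("node1", 2))      # ['node1', 'node2'] - same rack!
print(rack_aware_replicas("node1", 2))  # ['node1', 'node3'] - two racks
```

With RF=2, the rack-oblivious walk puts both copies on rack1, so one rack failure loses the row; the rack-aware walk skips node2 and lands the second copy on rack2.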
How Does Cassandra Know Your Network Topology?
There is no magic here: Cassandra doesn't learn your topology automatically. You must define the topology yourself, assigning nodes to racks and data centers, and give this information to Cassandra, which then uses it. This mapping of nodes to racks and data centers is done using a snitch, which I will discuss later.
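Conceptually, a snitch is just a lookup from node to (data center, rack). A minimal sketch of that idea, with made-up addresses and labels (not a real snitch configuration):

```python
# The mapping a snitch provides: node address -> (data center, rack).
topology = {
    "10.0.1.1": ("DC1", "rack1"),
    "10.0.1.2": ("DC1", "rack2"),
    "10.0.2.1": ("DC2", "rack1"),
}

def datacenter(node):
    return topology[node][0]

def rack(node):
    return topology[node][1]

print(datacenter("10.0.2.1"), rack("10.0.2.1"))  # DC2 rack1
```

NetworkTopologyStrategy consults exactly this kind of mapping when deciding which nodes count as "a different rack" or "a different data center".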