Stupid, Simple Intros: Apache ZooKeeper Explained

What is Apache ZooKeeper?

Apache ZooKeeper is an open-source software project for sharing data between distributed applications. It acts as a “coordination” service that lets distributed applications synchronize with one another.

In a nutshell, ZooKeeper gives you the tools to help build distributed applications. For example, suppose you have a distributed web server application running on 10 nodes, and you want the total real-time hit count. One way to do this is to write an application that connects to the 10 nodes, gets the count from each, and presents the sum. Alternatively, you can have each web server write its hit count to ZooKeeper at regular intervals and then query ZooKeeper to get the total.
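To make the hit-count idea concrete, here is a minimal sketch in Python. It does not talk to a real ZooKeeper cluster: `ToyStore` is a hypothetical in-memory stand-in for a tree of paths, and the `/hits/<server>` layout, `report_hits`, and `total_hits` are names I made up for illustration. With a real client library you would swap the store for a connection to your cluster.

```python
# Toy in-memory stand-in for a ZooKeeper-like tree of paths -> bytes.
# This is NOT the ZooKeeper client API; every name here is hypothetical.
class ToyStore:
    def __init__(self):
        self._data = {}

    def set(self, path, value):
        """Create or overwrite the value stored at path."""
        self._data[path] = value

    def get(self, path):
        """Return the value stored at path."""
        return self._data[path]

    def get_children(self, parent):
        """Return the names of the direct children of parent."""
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self._data
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )


def report_hits(store, server_name, count):
    # Each web server periodically publishes its own running total.
    store.set(f"/hits/{server_name}", str(count).encode())


def total_hits(store):
    # A reader sums whatever each server last published.
    return sum(
        int(store.get(f"/hits/{name}").decode())
        for name in store.get_children("/hits")
    )


store = ToyStore()
for i in range(10):
    report_hits(store, f"web-{i:02d}", 100 + i)
print(total_hits(store))  # prints 1045
```

The point of the design is that the reader never needs to know which web servers exist or how to reach them; it only lists the children of one path and adds them up.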

ZooKeeper gives you a hierarchical naming service out of the box, and its primitives can be used to build locking, synchronization, queues, and much more.
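As an illustration of how locking can work on top of ZooKeeper, the usual lock recipe has every client create a sequentially numbered node under a lock path, and the client with the lowest number holds the lock. The sketch below models only that rule in plain Python. `ToyLock` and its methods are hypothetical names, not a real client API, and it leaves out watches and session handling entirely.

```python
import itertools


class ToyLock:
    """Toy model of a sequential-node lock recipe (hypothetical names)."""

    def __init__(self):
        self._seq = itertools.count()
        self._waiters = {}  # sequence number -> client name

    def enqueue(self, client):
        # Comparable to creating a sequential node under the lock path.
        n = next(self._seq)
        self._waiters[n] = client
        return n

    def holder(self):
        # The client with the lowest sequence number owns the lock.
        if not self._waiters:
            return None
        return self._waiters[min(self._waiters)]

    def release(self, n):
        # Comparable to deleting the node, or the client's session ending.
        self._waiters.pop(n, None)


lock = ToyLock()
a = lock.enqueue("worker-a")
b = lock.enqueue("worker-b")
first = lock.holder()   # worker-a arrived first, so it holds the lock
lock.release(a)
second = lock.holder()  # the lock passes to the next in line
print(first, second)    # prints: worker-a worker-b
```

Because ownership is decided purely by arrival order, clients are served first come, first served, and a client that goes away simply drops out of the line.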

ZooKeeper runs on a cluster of servers. Data is replicated across the nodes and kept in memory. Clients can connect to any node to read and write; writes, however, are forwarded to the cluster `leader`.

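ZooKeeper is eventually consistent: writes are guaranteed to be executed in the order they were received from the client, updates are atomic, and they will eventually be replicated to the other nodes. In practice this means a read from a node that has not caught up yet may briefly return older data.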
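Here is a small sketch of what "ordered writes, eventually replicated" looks like. It is a toy model under heavy assumptions: `ToyCluster`, `Node`, and `sync` are hypothetical names, replication is triggered by hand, and there is no quorum or leader election, both of which a real ZooKeeper ensemble relies on.

```python
class Node:
    def __init__(self):
        self.log = []   # applied (txn_id, key, value) entries, in order
        self.data = {}  # current in-memory state

    def apply(self, txn):
        _txn_id, key, value = txn
        self.log.append(txn)
        self.data[key] = value


class ToyCluster:
    """Hypothetical model: one leader orders writes, followers replay them."""

    def __init__(self, followers=2):
        self.leader = Node()
        self.followers = [Node() for _ in range(followers)]
        self.next_id = 1

    def write(self, key, value):
        # Whichever node a client talks to, the write ends up at the
        # leader, which assigns it the next position in a single order.
        txn = (self.next_id, key, value)
        self.next_id += 1
        self.leader.apply(txn)
        return txn[0]

    def sync(self, follower):
        # Replay, in order, every transaction the follower has not seen.
        for txn in self.leader.log[len(follower.log):]:
            follower.apply(txn)


cluster = ToyCluster()
lagging = cluster.followers[0]
cluster.write("/config", "v1")
cluster.sync(lagging)
cluster.write("/config", "v2")
stale = lagging.data["/config"]  # follower has not caught up yet
cluster.sync(lagging)
fresh = lagging.data["/config"]  # same order as the leader, just later
print(stale, fresh)              # prints: v1 v2
```

The follower never sees the writes out of order or half-applied; it is only ever behind, which is the trade-off the paragraph above describes.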

ZooKeeper is widely used by some big names such as Yahoo!, Netflix, Twitter, and LinkedIn.

Redis: An Alternative?

I have used Redis in the past to coordinate and share data between our distributed applications. Redis is much faster than ZooKeeper and extremely simple to set up. However, the issue with Redis is that it is not truly distributed in itself, and problems may occur if the main Redis node fails (although I have been hearing that Redis Cluster is coming soon; I haven’t tried it). I don’t want to get into a Redis vs. ZooKeeper debate, but if you want to recover from failures automatically, use ZooKeeper. If you are sharing a lot of data at a very high rate and can tolerate some downtime, use Redis.
