Operations Management and Automation with the ZooKeeper Open Source Project
Overview
Apache ZooKeeper is a coordination service that supports the operation of distributed systems. It centralizes maintenance functions for individual nodes and for the system as a whole.
Basic Description
Implementing a distributed application involves a large amount of work on configuration and other settings for each node. ZooKeeper factors out these common tasks and exposes them through a simple interface as a centralized coordination service.
ZooKeeper provides many frequently used services, including configuration management, name resolution, synchronization, and group services.
Out of the box, it can be used to implement consensus, group management, leader election, and presence protocols. It can also serve as a foundation for services with custom requirements.
It is an open source project of the Apache Software Foundation. It started as a Hadoop subproject and is now a top-level Apache project. It is implemented in Java and runs on the JVM. Client APIs are provided for Java and C.
Main Features
High Throughput and Low Latency
ZooKeeper keeps the data it manages in memory and serves it from there rather than reading from disk, which gives it high throughput and low latency. Updates are also recorded on disk so that the state can be recovered after a restart.
This performance lets it support large-scale distributed systems. It is especially fast when client access is dominated by reads.
High Availability (Master Redundancy)
ZooKeeper can be installed on multiple servers that run together as a group (an ensemble). This improves availability and can also improve read performance. When multiple ZooKeeper instances are started, one of them is automatically elected master (the "leader" in ZooKeeper's own terminology) and coordinates the others.
If the master node stops for any reason, the remaining nodes hold an election and select a new master. The service stays available as long as a majority of the servers are running.
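As a rough illustration of the election described above, the following sketch picks a new master from the servers that are still running. It assumes a simplified rule: the server that has seen the most recent transaction wins, the higher server id breaks ties, and a result is valid only if a majority of the configured servers is alive. The `Server` type, its fields, and `elect_leader` are hypothetical names for this example; this is a toy model, not ZooKeeper's actual election code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Server:
    sid: int            # server id (hypothetical field name)
    zxid: int           # id of the last transaction this server has seen
    alive: bool = True


def elect_leader(ensemble: List[Server]) -> Optional[Server]:
    """Return the new leader, or None if no majority of servers is alive."""
    live = [s for s in ensemble if s.alive]
    quorum = len(ensemble) // 2 + 1
    if len(live) < quorum:
        return None
    # Prefer the most up-to-date server; break ties with the higher server id.
    return max(live, key=lambda s: (s.zxid, s.sid))


ensemble = [Server(1, 100), Server(2, 105), Server(3, 105)]
print(elect_leader(ensemble).sid)   # 3

# Server 3 stops: the remaining two still form a majority of three.
ensemble[2] = Server(3, 105, alive=False)
print(elect_leader(ensemble).sid)   # 2
```

With two of the three servers stopped, `elect_leader` returns `None`, which mirrors the rule that no master can be chosen without a majority.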
Atomicity
Reads and writes of data managed by ZooKeeper are atomic: an update either succeeds completely or fails, and updates from a client are applied one at a time in the order they were sent. Access restrictions (ACLs) can also be configured for each data node (znode).
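One way to picture ordered, all-or-nothing updates is a value that carries a version counter, where a write is accepted only if the writer has seen the latest version. The sketch below models that idea in memory. `VersionedValue` and its methods are hypothetical names, and a local lock stands in for the ordering that the ZooKeeper service itself would provide.

```python
import threading


class VersionedValue:
    """Toy stand-in for the data and version counter of a single data node."""

    def __init__(self, data: bytes) -> None:
        self._lock = threading.Lock()
        self.data = data
        self.version = 0

    def read(self):
        with self._lock:
            return self.data, self.version

    def write(self, data: bytes, expected_version: int) -> bool:
        """Apply the update only if nobody else has written in between."""
        with self._lock:
            if expected_version != self.version:
                return False      # stale writer must re-read and retry
            self.data = data
            self.version += 1
            return True


node = VersionedValue(b"limit=10")
_, v = node.read()
first = node.write(b"limit=20", v)    # accepted, version becomes 1
second = node.write(b"limit=30", v)   # rejected, v is now stale
print(first, second, node.read())     # True False (b'limit=20', 1)
```

The rejected writer loses nothing silently: it learns that its view was out of date and can read the new value before trying again.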
Hierarchical Namespace
ZooKeeper stores data in a hierarchical namespace, organized much like a file system with slash-separated paths. Processes distributed across multiple servers share this namespace and use it to coordinate with one another as they carry out distributed processing.
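The shape of such a namespace can be sketched with a small in-memory model in which every entry is addressed by a slash-separated path and can hold both data and children. The `Namespace` class and the paths used here are hypothetical and exist only to illustrate the structure; they are not part of any ZooKeeper client API.

```python
class Namespace:
    """Toy in-memory model of a hierarchical namespace of slash-separated paths."""

    def __init__(self) -> None:
        self._nodes = {"/": b""}

    def create(self, path: str, data: bytes = b"") -> None:
        # A node can only be created underneath an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent does not exist: {parent}")
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = data

    def get(self, path: str) -> bytes:
        return self._nodes[path]

    def children(self, path: str) -> list:
        # Direct children only: one path segment below the given node.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self._nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


ns = Namespace()
ns.create("/app")
ns.create("/app/config", b"timeout=30")
ns.create("/app/workers")
print(ns.children("/app"))      # ['config', 'workers']
print(ns.get("/app/config"))    # b'timeout=30'
```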
Eliminating Inconsistency Between Nodes
Because all data updates go through the master node, which applies them in a single order before they are propagated to the other nodes, the design prevents the nodes from ending up with inconsistent data.
Main Functions
Shared Configuration
If configuration files are stored in ZooKeeper and each instance retrieves its configuration from there, you can ensure that every instance uses identical configuration.
Distributed Locking
ZooKeeper can be used to implement locks that prevent multiple instances from modifying a shared resource at the same time.
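A common way to build such a lock on top of ZooKeeper is to have each instance create a sequence-numbered entry and to treat the instance with the lowest number as the current lock holder. The sketch below models only that ordering logic in memory. `LockQueue` and the instance names are hypothetical, and a real implementation would also have to handle sessions that end while waiting.

```python
import itertools


class LockQueue:
    """Toy model of a lock built from sequence-numbered entries."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._waiting = {}              # sequence number -> instance name

    def request(self, instance: str) -> int:
        """Join the queue and return the sequence number assigned."""
        seq = next(self._counter)
        self._waiting[seq] = instance
        return seq

    def holder(self):
        """The instance with the lowest sequence number holds the lock."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq: int) -> None:
        """Leave the queue; the next-lowest entry becomes the holder."""
        self._waiting.pop(seq, None)


lock = LockQueue()
a = lock.request("instance-a")
b = lock.request("instance-b")
print(lock.holder())    # instance-a
lock.release(a)
print(lock.holder())    # instance-b
```

Because the order is fixed when an instance joins the queue, the lock is handed over in request order and no two instances can hold it at once.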
Membership Acquisition
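When each instance in a distributed system registers an ephemeral node in ZooKeeper, the list of those nodes serves as a list of the instances currently available. An ephemeral node is removed automatically when the session that created it ends, so clients can use the list to connect only to servers that are active.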
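The membership idea can be sketched as a registry whose entries are tied to the session that created them and disappear when that session ends. `MembershipRegistry`, the session ids, and the worker names below are hypothetical. In a real deployment the session would be ended by the ZooKeeper service, not by an explicit call.

```python
class MembershipRegistry:
    """Toy model of ephemeral registrations that vanish with their session."""

    def __init__(self) -> None:
        self._members = {}      # member name -> owning session id

    def register(self, session_id: int, name: str) -> None:
        self._members[name] = session_id

    def end_session(self, session_id: int) -> None:
        # Drop every registration that belonged to the ended session.
        self._members = {
            name: sid for name, sid in self._members.items()
            if sid != session_id
        }

    def members(self) -> list:
        return sorted(self._members)


registry = MembershipRegistry()
registry.register(session_id=1, name="worker-1")
registry.register(session_id=2, name="worker-2")
print(registry.members())   # ['worker-1', 'worker-2']

registry.end_session(1)     # worker-1 crashes or disconnects
print(registry.members())   # ['worker-2']
```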
Watch Function
A process can set a watch on a specific ZooKeeper node and is then notified when that node changes. This can be used, for example, as a trigger for reloading configuration files.
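The notification pattern can be sketched with an in-memory store that calls back registered watchers when a path changes. The sketch assumes one-shot watches, so the callback registers itself again each time it reloads. `WatchedStore`, `reload`, and the path are hypothetical names for this illustration and not a ZooKeeper client API.

```python
from collections import defaultdict


class WatchedStore:
    """Toy key-value store with one-shot change notifications per path."""

    def __init__(self) -> None:
        self._data = {}
        self._watches = defaultdict(list)

    def get(self, path, watch=None):
        """Read a value and optionally register a one-shot watch on it."""
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path)

    def set(self, path, value) -> None:
        self._data[path] = value
        # Each watch fires once and is removed before it is called.
        for callback in self._watches.pop(path, []):
            callback(path)


store = WatchedStore()
store.set("/app/config", "v1")
reloads = []


def reload(path):
    # Re-read the configuration and watch for the next change.
    reloads.append(store.get(path, watch=reload))


store.get("/app/config", watch=reload)   # initial read plus first watch
store.set("/app/config", "v2")
store.set("/app/config", "v3")
print(reloads)   # ['v2', 'v3']
```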
Adoption Cases
ZooKeeper has been adopted by the distributed key-value database HBase, by SolrCloud (the distributed mode of the Solr search engine), and by the distributed online machine learning framework Jubatus.
Major application areas include directory services, configuration management services, synchronization services, leader election services, message queue services, and enterprise search systems.
License Information
Apache ZooKeeper is licensed under the Apache License 2.0. Under this license, you may use, modify, and redistribute the source code, provided you comply with the license conditions.