Kafka: A Hands-On Guide to the Publish-Subscribe Messaging System (Part I)
Kafka is, at its core, a publish-subscribe messaging system. Producers publish messages to topics, and consumers consume, or pull, that data.
A real-life example is Dish TV, which publishes different channels like sports, movies, music, etc., and anyone can subscribe to their own set of channels and get them whenever their subscribed channels are available.
Messages are persisted in topics. Every topic has at least one partition, and each partition is an ordered, append-only log in which every message is assigned a sequential offset.
Now let's look at the role each component plays in this architecture.
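To make the topic/partition/offset relationship concrete, here is a minimal conceptual sketch in Python. This is not the Kafka API; the `Partition` class and its methods are illustrative names only, modeling a partition as an append-only log where each message receives a sequential offset.

```python
# Conceptual sketch (NOT the Kafka API): a topic as a set of
# append-only partition logs, where each message gets a sequential offset.
class Partition:
    def __init__(self):
        self.log = []  # append-only list of messages

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the message just appended

    def read(self, offset):
        return self.log[offset]

topic = {0: Partition()}         # a topic with a single partition (id 0)
off = topic[0].append("hello")   # first message lands at offset 0
print(off, topic[0].read(off))
```

The key property this models is that offsets are assigned in arrival order and never change, which is what lets consumers track their position with a single number.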
Broker
- A Kafka cluster typically consists of multiple brokers to maintain load balance.
- Kafka brokers are stateless, so they use ZooKeeper for maintaining their cluster state.
- One Kafka broker instance can handle hundreds of thousands of reads and writes per second, and each broker can handle terabytes of messages without performance impact.
- Leader election among Kafka brokers is handled by ZooKeeper: if a broker fails, ZooKeeper decides which broker becomes the new leader (master) and which brokers act as followers (slaves).
ZooKeeper
- ZooKeeper plays an important role in the Kafka system; it is used to manage and coordinate the brokers.
- The ZooKeeper service is mainly used to notify producers and consumers about the presence or failure of any broker in the Kafka system.
- Based on the notifications received from ZooKeeper about a broker's presence or failure, producers and consumers decide accordingly and start coordinating their work with another broker.
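The notification flow above can be sketched as a watch-and-callback pattern. This is a simplified conceptual model, not the real ZooKeeper client API: `Registry`, `watch`, `broker_up`, and `broker_down` are hypothetical names standing in for ZooKeeper's ephemeral nodes and watches.

```python
# Conceptual sketch of ZooKeeper-style notifications (hypothetical names,
# NOT the real ZooKeeper client API): clients register watch callbacks and
# are notified when a broker joins or fails.
class Registry:
    def __init__(self):
        self.brokers = set()
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def broker_up(self, broker_id):
        self.brokers.add(broker_id)
        for cb in self.watchers:
            cb("up", broker_id)

    def broker_down(self, broker_id):
        self.brokers.discard(broker_id)
        for cb in self.watchers:
            cb("down", broker_id)

events = []
registry = Registry()
registry.watch(lambda state, bid: events.append((state, bid)))
registry.broker_up("broker-1")    # producers/consumers learn of the new broker
registry.broker_down("broker-1")  # ...and of its failure
print(events)
```

In real ZooKeeper the same effect is achieved with ephemeral znodes: a broker's znode vanishes when its session dies, and watchers are notified.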
Producers
- Producers push data to brokers.
- When a new broker starts, all producers discover it and automatically begin sending messages to that new broker.
- A Kafka producer doesn't have to wait for acknowledgments from the broker; it can send messages as fast as the broker can handle.
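The "don't wait for acknowledgments" behavior corresponds to the producer's `acks=0` (fire-and-forget) setting. Below is a sketch of such a configuration in the style of the kafka-python client; the broker address `localhost:9092` is an assumption, and no connection is made here since a live broker would be required.

```python
# Sketch of fire-and-forget producer settings in the style of the
# kafka-python client (assumption: a broker at localhost:9092).
producer_config = {
    "bootstrap_servers": "localhost:9092",
    "acks": 0,       # fire-and-forget: do not wait for broker acknowledgment
    "linger_ms": 5,  # batch messages briefly to increase throughput
}

# With a running broker, this config would be passed to a producer, e.g.:
#   producer = KafkaProducer(**producer_config)
#   producer.send("my-topic", b"payload")
print(producer_config["acks"])
```

The trade-off: `acks=0` maximizes throughput but can silently lose messages, since the producer never learns whether the broker received them.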
Consumers
- Since Kafka brokers are stateless, the consumer has to track how many messages it has consumed, using the partition offset.
- If the consumer acknowledges a particular message offset, it implies that the consumer has consumed all prior messages.
- The consumer issues an asynchronous pull request to the broker to have a buffer of bytes ready to consume.
- Consumers can rewind or skip to any point in a partition simply by supplying an offset value. ZooKeeper keeps consumers notified of this offset value.
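The consumer side of the offset mechanics can be sketched as follows. This is a conceptual model, not the Kafka consumer API: `SimpleConsumer`, `seek`, and `poll` are illustrative names showing how a single offset value is all a consumer needs to track, rewind, or skip its position.

```python
# Conceptual sketch (NOT the Kafka consumer API): a consumer keeps its own
# position in a partition and can rewind or skip just by setting an offset.
class SimpleConsumer:
    def __init__(self, log):
        self.log = log      # the partition's append-only message list
        self.position = 0   # next offset to consume

    def seek(self, offset):
        self.position = offset  # rewind or skip to any point

    def poll(self):
        if self.position >= len(self.log):
            return None     # nothing new to consume
        msg = self.log[self.position]
        self.position += 1
        return msg

consumer = SimpleConsumer(["m0", "m1", "m2"])
first = consumer.poll()  # consumes offset 0
consumer.seek(2)         # skip ahead to offset 2
print(first, consumer.poll())
```

Because the broker stores nothing about the consumer's progress, `seek` is cheap: the consumer just changes one number and re-reads from the log.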
Role of ZooKeeper: without ZooKeeper, Kafka cannot exist :)
- Zookeeper serves as the coordination interface between the Kafka brokers and consumers.
- The Kafka servers share information via a Zookeeper cluster.
- All the critical information is stored in ZooKeeper, so the failure of a Kafka broker or a ZooKeeper node does not affect the state of the Kafka cluster.
- Kafka restores its state once ZooKeeper restarts, which gives Kafka effectively zero downtime.
- Leader election among the Kafka brokers is also done through ZooKeeper in the event of a leader failure.
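A common ZooKeeper leader-election recipe has each broker create an ephemeral sequential znode, with the lowest sequence number winning. The sketch below is a heavily simplified, hypothetical model of that idea; the function and data names are illustrative, not part of any real API.

```python
# Conceptual sketch of ZooKeeper-style leader election (simplified,
# hypothetical): each broker holds an ephemeral sequential node, and the
# broker with the lowest sequence number becomes the leader.
def elect_leader(ephemeral_nodes):
    # ephemeral_nodes maps broker id -> ZooKeeper sequence number
    return min(ephemeral_nodes, key=ephemeral_nodes.get)

nodes = {"broker-1": 3, "broker-2": 1, "broker-3": 2}
leader = elect_leader(nodes)
print(leader)

# If the leader fails, its ephemeral node disappears and a new
# election picks the next-lowest sequence number.
del nodes[leader]
print(elect_leader(nodes))
```

The ephemeral property is what makes this self-healing: a crashed broker's node is removed automatically when its ZooKeeper session expires, triggering re-election without manual intervention.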
This much basic information is enough to get started with a hands-on coding session :) Stay tuned for Part II.