How To Set Up Apache Kafka

Apache Kafka

In this guide I will show you how to set up and work with the publish-subscribe model in Apache Kafka on Windows. I am not going to explain Kafka itself here; the official Apache Kafka documentation covers that very well.

Kafka follows the publish-subscribe model, and this tutorial shows how to create a topic, publish messages to it, and consume those messages using the command line tools.

Prerequisites

Kafka binary distribution, Windows 10 or 11 (64 bit), and an installed Java runtime (the Kafka scripts launch java.exe)

Install Kafka

Go through the following steps to install Kafka on Windows:

  1. Download the Apache Kafka binary distribution.
  2. Extract the file kafka_*.tgz, then extract the resulting file kafka_*.tar. Copy the root folder kafka_* to the C drive. You can choose any other drive, but then you have to adjust the command paths in the rest of this tutorial accordingly.

Start ZooKeeper Server

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the brokers and the cluster topology, and it is used for leader election for topic partitions.

The default data directory is dataDir=/tmp/zookeeper, where ZooKeeper stores its snapshots. If you want to change the default location, edit the file C:\kafka_*\config\zookeeper.properties. For example, you can change the location to dataDir=C:/kafka/zookeeper.
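If you prefer to script this change instead of editing the file by hand, here is a minimal Python sketch. The helper name set_property is my own (it is not part of Kafka or ZooKeeper), and the sample text is only an illustrative excerpt, not the full zookeeper.properties file.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return .properties text with `key` set to `value`.

    Replaces the first uncommented `key=...` line, or appends one if
    the key is not present. Comment lines (starting with #) are kept.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            break
    else:
        # Key not found anywhere: add it at the end.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Illustrative excerpt only (assumption: the real file has more lines).
sample = "# illustrative excerpt\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
print(set_property(sample, "dataDir", "C:/kafka/zookeeper"))
```

The same helper would work for log.dirs in server.properties, since both files use the plain key=value format.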

Start the ZooKeeper server:

  • Open a command prompt and navigate to the directory C:\kafka_*.
  • Execute the command: bin\windows\zookeeper-server-start.bat config\zookeeper.properties
  • The ZooKeeper server then starts and listens on port 2181.
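A quick way to confirm that ZooKeeper is really listening is to try a TCP connection to the port. The sketch below uses only the Python standard library; the function name is_port_open is my own, and I am assuming ZooKeeper runs on localhost with the port 2181 mentioned above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False


# Assumption: ZooKeeper was started locally with the default port.
print("ZooKeeper reachable:", is_port_open("localhost", 2181))
```

The same check can be pointed at port 9092 later on to see whether the Kafka broker is up. It only tells you that something accepts connections on the port, not that the service is healthy.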

Start Kafka Server

The Kafka server acts as a broker. Producers (publishers) send messages into topics within the broker. Consumers (subscribers) pull messages off those topics.
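To make the idea concrete, here is a toy in-memory model of a single-partition topic. This is only my simplified illustration and not a Kafka API: the class ToyTopic and its methods are made up. It shows that the producer appends to a log and that each consumer pulls from its own position (offset).

```python
class ToyTopic:
    """In-memory stand-in for a single-partition topic (not a Kafka API)."""

    def __init__(self):
        self.log = []  # append-only list of messages

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, offset, max_messages=10):
        """Return up to max_messages starting at the given offset."""
        return self.log[offset:offset + max_messages]


topic = ToyTopic()
for text in ["hello", "kafka"]:
    topic.publish(text)

# The consumer, not the broker, keeps track of how far it has read.
offset = 0
batch = topic.poll(offset)
offset += len(batch)
print(batch)               # ['hello', 'kafka']
print(topic.poll(offset))  # [] because nothing new was published
```

A second consumer starting at offset 0 would receive the same two messages again, because reading does not remove anything from the log.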

The default log location is log.dirs=/tmp/kafka-logs, where Kafka stores its log files (the topic data itself, not application logs). If you want to change the default location, edit the file C:\kafka_*\config\server.properties. For example, you can change it to log.dirs=C:/logs/kafka.

Start Kafka server:

  • Open a second command prompt in C:\kafka_* and execute the command: bin\windows\kafka-server-start.bat config\server.properties
  • The Kafka server (broker) then starts. The commands in the rest of this tutorial reach it at localhost:9092.

Create Topic

The next step is to create a topic and publish some messages into it. Let’s create a topic named “roytuts” with a single partition and only one replica.

  • Execute the command in a command prompt: bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic roytuts
  • We can now verify the topic we created: bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092
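If you want to drive these steps from a script, the command can be assembled as an argument list. The sketch below only builds and prints the command; the function name kafka_topics_create is my own, the flags are the ones used in the bullets above, and I am assuming the script would be run from the C:\kafka_* folder.

```python
def kafka_topics_create(topic, partitions=1, replication_factor=1,
                        bootstrap_server="localhost:9092"):
    """Build the kafka-topics.bat --create command as an argument list."""
    return [
        r"bin\windows\kafka-topics.bat",
        "--create",
        "--bootstrap-server", bootstrap_server,
        "--replication-factor", str(replication_factor),
        "--partitions", str(partitions),
        "--topic", topic,
    ]


cmd = kafka_topics_create("roytuts")
print(" ".join(cmd))
# To actually execute it on Windows, one option is
# subprocess.run(cmd, check=True) from the Kafka root folder
# (untested assumption; shell=True may be needed for .bat files).
```

Note that every option starts with two plain hyphens. If you copy commands from a web page, check that the hyphens were not converted into a single long dash.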

Send Message

Now we can test by sending some messages:

  • Execute the command: bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic roytuts
  • Then type the messages you want to send. Each line you enter is sent as a separate message.

Consume Message

Start a consumer to consume messages:

  • Execute the command to consume messages from topic roytuts: bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic roytuts --from-beginning

Set up Multi-broker Clusters

For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get a feel for it, let’s expand our cluster to three nodes on our local machine.

We will copy the existing file server.properties and create as many files as the number of brokers we need.

  • Copy the file C:\kafka_*\config\server.properties twice and name the copies server1.properties and server2.properties, in the same folder as server.properties. We now have three files for three nodes: server.properties, server1.properties and server2.properties.
  • Now open the file server1.properties and make the following changes:
    • Replace broker.id=0 by broker.id=1
    • Set the listener port by adding the line listeners=PLAINTEXT://:9093
    • Replace log.dirs=/tmp/kafka-logs by log.dirs=/tmp/kafka-logs-1
  • Open the file server2.properties and make the following changes:
    • Replace broker.id=0 by broker.id=2
    • Set the listener port by adding the line listeners=PLAINTEXT://:9094
    • Replace log.dirs=/tmp/kafka-logs by log.dirs=/tmp/kafka-logs-2

The broker.id property is the unique and permanent name of each node in the cluster. We have to override the port and log directory only because we are running these all on the same machine and we want to keep the brokers from all trying to register on the same port or overwrite each other’s data.
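The three edits per file follow a simple pattern, so they can also be generated. Here is a minimal Python sketch under my own assumptions: the function name derive_broker_config is made up, and the base text is a two-line excerpt rather than the complete server.properties.

```python
def derive_broker_config(base_text: str, broker_id: int, port: int) -> str:
    """Derive the properties for an extra broker on the same machine.

    Changes broker.id, suffixes log.dirs with the broker id, and sets
    (or adds) a PLAINTEXT listener on the given port.
    """
    out = []
    has_listener = False
    for line in base_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("broker.id="):
            out.append(f"broker.id={broker_id}")
        elif stripped.startswith("log.dirs="):
            out.append(f"{stripped}-{broker_id}")
        elif stripped.startswith("listeners="):
            out.append(f"listeners=PLAINTEXT://:{port}")
            has_listener = True
        else:
            out.append(line)  # comments and other settings stay as they are
    if not has_listener:
        out.append(f"listeners=PLAINTEXT://:{port}")
    return "\n".join(out) + "\n"


# Illustrative excerpt only (assumption: the real file has more settings).
base = "broker.id=0\nlog.dirs=/tmp/kafka-logs\n"
for n in (1, 2):
    print(f"--- server{n}.properties ---")
    print(derive_broker_config(base, n, 9092 + n))
```

For n = 1 this produces broker.id=1, log.dirs=/tmp/kafka-logs-1 and listeners=PLAINTEXT://:9093, which matches the manual edits listed above.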

We already have ZooKeeper and our single node started, so we just need to start the two new nodes:

  • Execute each of these commands in its own command prompt: bin\windows\kafka-server-start.bat config\server1.properties and bin\windows\kafka-server-start.bat config\server2.properties
  • Now create a new topic with a replication factor of three: bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic roytuts-replicated-topic

Now that we have a cluster, how can we know which broker is doing what? To see that, run the “describe topics” command: bin\windows\kafka-topics.bat --describe --bootstrap-server localhost:9092 --topic roytuts-replicated-topic

The first line of the output gives a summary of all partitions; each additional line gives information about one partition. Since we have only one partition for this topic, there is only one such line.

“leader” is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.

“replicas” is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.

“isr” is the set of “in-sync” replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.

Note that in the example node 1 is the leader for the only partition of the topic.
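If you want to read these fields in a script, a partition line can be split into its "Name: value" pairs. The sketch below is a rough illustration: the sample line is my own reconstruction of the usual layout (the replica order in particular is an assumption), the function name parse_partition_line is made up, and newer Kafka versions may print additional fields.

```python
import re


def parse_partition_line(line: str) -> dict:
    """Parse one partition line of the describe output into a dict."""
    # Collect every "Name: value" pair found on the line.
    fields = dict(re.findall(r"(\w+):\s*(\S*)", line))

    def to_ids(text):
        return [int(x) for x in text.split(",") if x]

    return {
        "topic": fields["Topic"],
        "partition": int(fields["Partition"]),
        "leader": int(fields["Leader"]),
        "replicas": to_ids(fields["Replicas"]),
        "isr": to_ids(fields["Isr"]),
    }


# Assumed sample line, with node 1 as the leader as in the example above.
sample_line = ("Topic: roytuts-replicated-topic\tPartition: 0\t"
               "Leader: 1\tReplicas: 1,2,0\tIsr: 1,2,0")
info = parse_partition_line(sample_line)
print(info["leader"], info["replicas"], info["isr"])
```

This sketch does not handle a partition that currently has no leader, so treat it as a starting point only.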

We can run the same command on the topic roytuts we created to see where it is: bin\windows\kafka-topics.bat --describe --bootstrap-server localhost:9092 --topic roytuts

So there is no surprise – the topic roytuts has no replica and is on server 0, the only server in our cluster when we created it.

Let’s publish some messages to the new topic roytuts-replicated-topic using the command below:

bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic roytuts-replicated-topic

Let’s consume these messages:

bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --from-beginning --topic roytuts-replicated-topic

Let’s test fault-tolerance. Broker 1 was acting as the leader so let’s kill it:

Get the process id: wmic process where "caption = 'java.exe' and commandline like '%server1.properties%'" get processid

Kill the process (replace 3820 with the process id returned by the previous command): taskkill /pid 3820 /f

Check the topic roytuts-replicated-topic: bin\windows\kafka-topics.bat --describe --bootstrap-server localhost:9092 --topic roytuts-replicated-topic

So leadership has switched to one of the followers and node 1 is no longer in the in-sync replica set.
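The change can be pictured with a small toy model. This is a deliberately simplified sketch and not Kafka's actual controller logic: the function fail_broker is made up, the starting state is an assumed example, and I am assuming the new leader is simply the first assigned replica that is still in sync.

```python
def fail_broker(state: dict, broker_id: int) -> dict:
    """Return the partition state after the given broker goes down.

    The failed broker drops out of the in-sync set. If it was the
    leader, the first assigned replica that is still in sync takes over.
    """
    isr = [b for b in state["isr"] if b != broker_id]
    leader = state["leader"]
    if leader == broker_id:
        leader = next((b for b in state["replicas"] if b in isr), None)
    # The replicas list is the assignment, so it keeps the dead broker.
    return {"leader": leader, "replicas": list(state["replicas"]), "isr": isr}


# Assumed starting state with node 1 as the leader.
before = {"leader": 1, "replicas": [1, 2, 0], "isr": [1, 2, 0]}
after = fail_broker(before, 1)
print(after)  # {'leader': 2, 'replicas': [1, 2, 0], 'isr': [2, 0]}
```

Node 1 still appears under replicas because that list shows the assignment regardless of whether the node is alive, while the in-sync set shrinks to the surviving nodes.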

But the messages are still available for consumption even though the leader that took the writes originally is down. Running the consumer command from above again with --from-beginning prints them.

That’s all. I hope you now have an idea of how to set up and work with the publish-subscribe model in Apache Kafka.
