Apache Kafka Terminology or Keywords

In this article, we discuss Apache Kafka terms such as Topic, Partition, Broker, Producer, Consumer, Consumer Group, Message Key, Offset, and Replication Factor.

 

Topic

  • Kafka maintains feeds of messages in categories called topics.
  • Each topic has a user-defined name (the category or feed name) to which messages are published.
  • Topics in Kafka are loosely similar to tables in a database, but without the constraints that tables have.
  • A topic is used to store and publish a particular stream of data.
  • The producer publishes messages to the topic and the consumer consumes messages from the topic.
kafka-topics.bat --zookeeper localhost:2181 --create --topic MyFirstTopic2 --partitions 1 --replication-factor 3

Replication Factor

  • A replica is a backup copy.
  • The same data is replicated according to the replication factor given at the time of topic creation.
  • A replica of a partition is a “backup” of that partition.
  • Replication is used to prevent data loss.
kafka-topics.bat --zookeeper localhost:2181 --create --topic MyFirstTopic2 --partitions 1 --replication-factor 3

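Here the topic has 1 partition, and that partition's data is stored as 3 copies, each on a different broker. A replication factor of 3 therefore needs at least 3 brokers in the cluster.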
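As a rough illustration of what the replication factor means, the sketch below places the copies of each partition on distinct brokers. It is a toy model, not Kafka's actual assignment algorithm; the function name place_replicas and the round-robin placement are assumptions made for this example.

```python
def place_replicas(num_partitions, replication_factor, broker_ids):
    """Toy model: give each partition `replication_factor` copies on distinct brokers."""
    if replication_factor > len(broker_ids):
        # Each copy of a partition must live on a different broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        # Round-robin placement, starting at a different broker per partition
        # (an assumption of this sketch, not Kafka's real algorithm).
        placement[p] = [broker_ids[(p + i) % len(broker_ids)]
                        for i in range(replication_factor)]
    return placement


# Mirrors the command above: 1 partition, replication factor 3, on 3 brokers.
placement = place_replicas(1, 3, broker_ids=[0, 1, 2])
print(placement)  # {0: [0, 1, 2]}
```

With fewer than 3 brokers the same call raises an error, which matches the idea that every copy needs its own broker.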

Partitions

  • Kafka topics are divided into a number of partitions.
  • Each partition can be placed on a separate machine, which allows multiple consumers to read from a topic in parallel.
  • Kafka appends new messages to a partition in an ordered, immutable sequence.
  • Each broker holds a number of partitions, and each of these partitions can be either the leader or a replica for a topic.
  • All writes and reads of data to a topic go through the leader partitions, and the leader coordinates updating the replica partitions with new data.
  • Messages are mapped to partitions by Message Key: messages sent with the same key are stored in the same partition.
  • Messages are stored in sequence order: a lower number means an older message and a higher number means a newer message.
Kafka-Partitions

kafka-topics.bat --zookeeper localhost:2181 --create --topic MyFirstTopic2 --partitions 1 --replication-factor 3

Broker

  • A Kafka cluster consists of one or more servers, each of which is called a broker.
  • Each broker holds a number of partitions, and each of these partitions can be either the leader or a replica for a topic.
  • All writes and reads of data to a topic go through the leader partitions, and the leader coordinates updating the replica partitions with new data.
  • If a leader fails at any moment, one of its replicas takes over as the new leader.
  • Each broker is simply one server (one system) in the cluster.
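The leader takeover described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the class name PartitionReplicas is hypothetical, and promoting the first surviving replica stands in for Kafka's real leader election.

```python
class PartitionReplicas:
    """Toy model of one partition: a leader broker plus follower replicas."""

    def __init__(self, replica_brokers):
        self.alive = list(replica_brokers)  # brokers that hold a live copy
        self.leader = self.alive[0]         # first replica starts as leader

    def fail(self, broker_id):
        """Mark a broker as failed; promote a follower if it was the leader."""
        self.alive.remove(broker_id)
        if broker_id == self.leader:
            # Promote the first surviving replica (assumption of this sketch);
            # with no survivors the partition has no leader.
            self.leader = self.alive[0] if self.alive else None


partition = PartitionReplicas([0, 1, 2])
print(partition.leader)  # 0
partition.fail(0)
print(partition.leader)  # 1
```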

Kafka-Broker-Leader-Replications

Message Key

  • When publishing a message we can attach a Key, so that the message goes to the particular partition that the Key maps to. That partition is not exclusive to one Key; it also holds messages with other keys.
  • The Key is also used to JOIN streams or tables.
  • If a message is sent without a Key, the producer chooses a partition for it, spreading such messages across the available partitions.
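The key-to-partition mapping can be sketched as "hash the key, then take the remainder by the number of partitions". The snippet below is a minimal illustration, not the Kafka client itself: choose_partition is a hypothetical helper, and zlib.crc32 is only a stand-in hash (Kafka's Java client uses murmur2 in its default partitioner).

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition index: hash the key bytes, then modulo."""
    # zlib.crc32 is a stand-in hash for this sketch (assumption);
    # it is deterministic, so the same key always gives the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The two "user-1" messages land in the same partition.
for key in ["user-1", "user-2", "user-1"]:
    print(key, "->", choose_partition(key, num_partitions=3))
```

Because the result depends on the partition count, the same key may map to a different partition if the number of partitions changes.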

Producer

  • The Producer publishes messages to a topic in the cluster.
  • The Producer can publish a message together with a Key to the topic in the cluster.
  • The Producer can also publish the contents of a text file as messages.
  • For example, when the Producer writes to partition 0 of the topic, the leader of partition 0 replicates that write to the available replicas.
  • The Producer publishes each message in binary form (bytes), for example by using StringSerializer.

Kafka-Producers

Consumer

  • The Consumer consumes messages from a topic in the cluster.
  • The Consumer receives each message together with its Key from the topic in the cluster.
  • The Consumer can consume the contents of a text file that a Producer published.
  • The Consumer converts the received bytes back into a String, for example by using StringDeserializer.
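The serialization round trip between Producer and Consumer can be sketched in a few lines. This is an illustration of the idea only; the helper names serialize and deserialize are hypothetical, and UTF-8 is assumed as the encoding.

```python
def serialize(text):
    """Producer side: turn a string into bytes (UTF-8 assumed)."""
    return text.encode("utf-8")


def deserialize(data):
    """Consumer side: turn the received bytes back into a string."""
    return data.decode("utf-8")


payload = serialize("hello kafka")
print(payload)               # b'hello kafka'
print(deserialize(payload))  # hello kafka
```

The producer and consumer have to agree on the format: bytes written with one serializer must be read with the matching deserializer.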

Kafka-Consumers

Producers – Consumers

Kafka-Producers-Consumers

Consumer Group

  • Consumers can be organized into consumer groups for a given topic.
  • Within a group, each partition is read by exactly one consumer, and the group as a whole consumes all messages from the entire topic.

Example (within one consumer group):

Consumer 1 -> Partition 1: Right

Consumer 1 -> Partition 1 & Partition 2: Right

Partition 1 -> Consumer 1: Right

Partition 1 -> Consumer 1 & Consumer 2: Wrong

  • If you have more consumers than partitions then some consumers will be idle because they have no partitions to read from.

Example: 4 Consumers and 3 Partitions -> 1 Consumer will be idle

Kafka-Consumers-Consumers-Count-Greater-Than-Partitions-Count

  • If you have more partitions than consumers then consumers will receive messages from multiple partitions.

Example: 3 Consumers and 4 Partitions

Consumer 1 -> Partition 1

Consumer 2 -> Partition 2

Consumer 3 -> Partition 3

Partition 4 is assigned to one of Consumer 1, Consumer 2, or Consumer 3, so that consumer reads from two partitions.

Kafka-Consumers-PartitionCount-Greater-Than-Consumer-Count

  • If you have equal numbers of consumers and partitions, each consumer reads messages in order from exactly one partition.

Example: 3 Consumers and 3 Partitions

Consumer 1 -> Partition 1

Consumer 2 -> Partition 2

Consumer 3 -> Partition 3

Kafka-Consumers-Partitions-Consumers-Equal
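The consumer-count cases above can be reproduced with a toy assignment function. This is a simplified sketch, not Kafka's real group assignment logic; assign_partitions is a hypothetical helper that hands out partitions round-robin so that each partition has exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin: each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# 4 consumers, 3 partitions: one consumer stays idle.
print(assign_partitions(["P1", "P2", "P3"], ["C1", "C2", "C3", "C4"]))
# {'C1': ['P1'], 'C2': ['P2'], 'C3': ['P3'], 'C4': []}

# 3 consumers, 4 partitions: one consumer reads two partitions.
print(assign_partitions(["P1", "P2", "P3", "P4"], ["C1", "C2", "C3"]))
# {'C1': ['P1', 'P4'], 'C2': ['P2'], 'C3': ['P3']}
```

Which consumer ends up with the extra partition depends on the assignment strategy; the round-robin order here is just one possibility.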

Offset

Each message in a topic is assigned a sequential number that uniquely identifies the message within its partition. This number is called an offset.
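A partition can be pictured as an append-only list in which the position of each message is its offset. The sketch below is a toy model of that idea; the class name PartitionLog and its methods are hypothetical and are not part of any Kafka API.

```python
class PartitionLog:
    """Toy append-only log: each appended message gets the next offset."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        offset = len(self.messages)  # offsets start at 0 and increase by 1
        self.messages.append(message)
        return offset

    def read_from(self, offset):
        """Return (offset, message) pairs starting at the given offset."""
        return list(enumerate(self.messages[offset:], start=offset))


log = PartitionLog()
print(log.append("a"))   # 0
print(log.append("b"))   # 1
print(log.append("c"))   # 2
print(log.read_from(1))  # [(1, 'b'), (2, 'c')]
```

Offsets are only meaningful within one partition: two partitions of the same topic each have their own offset 0.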

 

 
