Kafka Series

=Basic use & knowledge=

Abstract

  • Review
  • Kafka Basic Concept
  • How to install
  • Why Kafka is so fast

Review

  • Why Kafka
  • What Kafka can do

Why Kafka?

Kafka is a distributed messaging system with these properties:

  1. Horizontally scalable
  2. Fault-tolerant
  3. Wicked fast
 

What Kafka can do

  • Messaging
  • Website Activity Tracking
  • Log Aggregation
  • Stream Processing
 

Kafka Basic Concept

  • Overview
  • Broker
  • Producer
  • Consumer
  • Topic
  • Zookeeper

Overview

Broker

  • Cluster has no dedicated master node
  • Accepts and handles clients' requests
  • Persists messages
  • A controller (one elected broker) coordinates and manages the broker cluster
 

Producer

  • Push messages to Kafka
  • Partition strategy
    • Round-robin
    • Random
    • Key-ordering
  • Compression algorithm
    • GZIP
    • Snappy
    • LZ4
    • Zstandard
  • How not to lose messages
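
The round-robin and key-ordering strategies listed above can be sketched in a few lines of shell. This is only an illustration: real Kafka clients hash keys with murmur2, while this sketch substitutes cksum, and the names NUM_PARTITIONS, partition_for_key, next_round_robin and RR_PARTITION are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: Kafka's default partitioner uses murmur2, not cksum.

NUM_PARTITIONS=3

# Key-ordering: the same key always maps to the same partition,
# so messages that share a key keep their relative order.
partition_for_key() {
  local key="$1" hash
  hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $(( hash % NUM_PARTITIONS ))
}

# Round-robin: cycle through partitions regardless of message content.
# The result is stored in RR_PARTITION so the counter survives between calls.
rr_counter=0
next_round_robin() {
  RR_PARTITION=$(( rr_counter % NUM_PARTITIONS ))
  rr_counter=$(( rr_counter + 1 ))
}
```

Calling partition_for_key "user-42" twice yields the same partition number both times, while successive next_round_robin calls set RR_PARTITION to 0, 1, 2, 0, and so on.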

Consumer

 
  • Pull messages from Kafka
  • Why consumer groups
  • What consumer offsets are
  • How to commit offsets
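
To make the idea of a committed offset concrete, here is a toy model in shell. It is not how Kafka stores data: a text file stands in for one partition's log, and a second file stands in for the offset committed by one consumer group. All names (LOG, OFFSET_FILE, poll_and_commit) are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model: one line per message, one file holding the committed offset.

WORKDIR=$(mktemp -d)
LOG="$WORKDIR/partition-0.log"
OFFSET_FILE="$WORKDIR/group-a.offset"

printf '%s\n' "m0" "m1" "m2" "m3" "m4" > "$LOG"
echo 0 > "$OFFSET_FILE"

# Read up to $1 messages starting at the committed offset,
# print them, then commit the new offset.
poll_and_commit() {
  local max="$1" offset batch count
  offset=$(cat "$OFFSET_FILE")
  # tail uses 1-based line numbers; offsets are 0-based.
  batch=$(tail -n +"$(( offset + 1 ))" "$LOG" | head -n "$max")
  if [ -z "$batch" ]; then
    return 0
  fi
  printf '%s\n' "$batch"
  count=$(printf '%s\n' "$batch" | wc -l)
  echo $(( offset + count )) > "$OFFSET_FILE"
}
```

Because the offset is committed only after the batch is printed, a consumer that restarts picks up where the last commit left off: the first poll_and_commit 2 returns m0 and m1, the next returns m2 and m3.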

Topic

 
  • Why partitions
  • How replicas work
  • What the ISR (in-sync replica set) is
  • How the log is persisted

A topic is a category or feed name to which records are published.

Zookeeper

 
 
  • Manage Kafka cluster metadata
  • Helps the controller coordinate the cluster

How to install

  • Install Zookeeper Cluster
  • Install Kafka Cluster
  • Most Important Properties
  • Demonstration: Horizontally scalable
 

Install Zookeeper Cluster

  • Install Java
  • Install Zookeeper
  • Set myid for each node
  • Start Zookeeper Cluster
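
The "set myid for each node" step pairs with the server.N entries in zoo.cfg: the number in each node's myid file must match that node's server.N line. A minimal sketch follows, assuming a three-node ensemble with the hypothetical hostnames zk1, zk2 and zk3; it writes to a temporary directory rather than a real install, and the variable names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: generate a zoo.cfg and the matching myid file for one node.

NODE_ID="${NODE_ID:-1}"                         # 1, 2 or 3, unique per node
ZK_DATA_DIR="${ZK_DATA_DIR:-$(mktemp -d)}"      # stand-in for the real dataDir
ZK_CFG="${ZK_CFG:-$ZK_DATA_DIR/zoo.cfg}"

cat > "$ZK_CFG" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$ZK_DATA_DIR
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF

# myid lives inside dataDir and holds only this node's id.
echo "$NODE_ID" > "$ZK_DATA_DIR/myid"
```

The same zoo.cfg is used on every node; only the content of myid differs.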

Install Kafka Cluster

  • Install Java
  • Install Kafka
  • Set server.properties
  • Start Kafka Cluster

Most Important Properties

 
 
 
Name | Description | Example value
broker.id | Unique, permanent ID of each node in the cluster | 1
log.dirs | Directories in which the log data is kept. If not set, the value of log.dir is used | /kafka/data
zookeeper.connect | ZooKeeper connection string | zk1:2181,zk2:2181,zk3:2181
listeners | List of URIs the broker listens on | PLAINTEXT://host:9092
advertised.listeners | Listeners published to ZooKeeper for clients to use | PLAINTEXT://host:9092
unclean.leader.election.enable | Whether replicas not in the ISR set may be elected as leader | false
log.retention.hours (also .minutes, .ms) | How long to keep a log file before deleting it | 48
auto.leader.rebalance.enable | Whether a background thread checks at regular intervals and triggers leader rebalancing if required | false
message.max.bytes | Largest record batch size the broker accepts (max.message.bytes is the topic-level override) | 104857600 (100 MB)
default.replication.factor | Default replication factor for automatically created topics | >= 3
min.insync.replicas | Minimum number of replicas that must acknowledge a write for it to be considered successful | >= 2
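
As a rough illustration, the properties above could be written into a server.properties file as follows. The values are the examples from the table, with two assumptions: the listener addresses carry a PLAINTEXT:// prefix because Kafka expects a listener name before the host, and the size limit uses the broker-level name message.max.bytes (max.message.bytes is the topic-level override). CONF_FILE is a hypothetical temporary path, not the real config location.

```shell
#!/usr/bin/env bash
# Sketch: write a sample broker config to a temporary file for inspection.

CONF_FILE="${CONF_FILE:-$(mktemp)}"

cat > "$CONF_FILE" <<'EOF'
broker.id=1
log.dirs=/kafka/data
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
listeners=PLAINTEXT://host:9092
advertised.listeners=PLAINTEXT://host:9092
unclean.leader.election.enable=false
log.retention.hours=48
auto.leader.rebalance.enable=false
message.max.bytes=104857600
# Assumed values: 3 replicas, 2 of which must acknowledge each write.
default.replication.factor=3
min.insync.replicas=2
EOF

echo "wrote $CONF_FILE"
```

Each broker needs its own broker.id and its own host in the listener entries; the remaining lines can be shared across the cluster.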

Demonstration

Install Zookeeper

# install tools
yum -y install which wget nc net-tools

# install jdk
yum -y install java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64

# java home
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

# get zk from web
wget "${ZOOKEEPER_URL}" -O "/tmp/${ZOOKEEPER_FILENAME}"

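# install zookeeper
# (the paths below assume the extracted directory is available as /opt/zookeeper)
tar xfz "/tmp/${ZOOKEEPER_FILENAME}" -C /opt

# add zookeeper config (list every zk node as server.N=host:port:port)
cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg


# write this node's id to the myid file
# (myid must live in the dataDir set in zoo.cfg; /zookeeper/log is assumed here)
mkdir -p /zookeeper/log
echo "$NODE_ID" > /zookeeper/log/myid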

# start zk
/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg

Checking Zookeeper is alive

# [4 letter words]

echo [word] | nc localhost 2181

conf : Print details about serving configuration.
cons : List full connection/session details for all clients connected to this server.
crst : Reset connection/session statistics for all connections.
dump : Lists the outstanding sessions and ephemeral nodes. 
       This only works on the leader.
envi : Print details about serving environment
ruok : Tests if server is running in a non-error state. 
       The server will respond with imok 
       if it is running. Otherwise it will not respond at all.
srst : Reset server statistics.
srvr : Lists full details for the server.
stat : Lists brief details for the server and connected clients.
wchs : Lists brief information on watches for the server.
mntr : Outputs a list of variables that could be used for monitoring the health 
       of the cluster.

# example
echo srvr | nc localhost 2181
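
To find out whether a node is currently the leader or a follower, the Mode line can be pulled out of the srvr response. A small sketch, where zk_mode is a hypothetical helper that reads the response on standard input:

```shell
#!/usr/bin/env bash
# Sketch: extract the value of the "Mode:" line from `srvr` output.

zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Usage against a live node (requires nc and a running ZooKeeper):
#   echo srvr | nc localhost 2181 | zk_mode
```

Running this against every node in the ensemble should report exactly one leader.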

Install Kafka

# install tools
yum -y install which wget nc net-tools

# install jdk
yum -y install java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64

# java home
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

# get Kafka from web
wget "${KAFKA_URL}" -O "/tmp/${KAFKA_FILENAME}"

# install kafka
# (the paths below assume the extracted directory is available as /opt/kafka)
tar xfz "/tmp/${KAFKA_FILENAME}" -C /opt

# back up the default config before editing
cp /opt/kafka/config/server.properties /opt/kafka/config/server-default.properties

# edit your broker config (listeners, zookeeper.connect; see Most Important Properties)
vi /opt/kafka/config/server.properties

# start kafka
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties

Checking Kafka is alive

# check kafka cluster size
echo dump | nc 10.200.252.232 2181 | grep brokers

# test create topic
# (Kafka 3.0+ removed --zookeeper from kafka-topics.sh; use --bootstrap-server {{kas_ip:9092}} instead)
/opt/kafka/bin/kafka-topics.sh --zookeeper {{zks_ip:2181}} --create --topic {{name}} \
 --partitions 3 --replication-factor 3

# list topics
/opt/kafka/bin/kafka-topics.sh --zookeeper {{zks_ip:2181}} --list

# show topic detail
/opt/kafka/bin/kafka-topics.sh --zookeeper {{zks_ip:2181}} --describe --topic {{name}}

# produce messages to topic
/opt/kafka/bin/kafka-console-producer.sh --broker-list {{kas_ip:9092}} --topic {{name}}

# consume messages from topic
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server {{kas_ip:9092}} \
 --topic {{name}} --group {{name}} --from-beginning \
 --consumer-property enable.auto.commit=false
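
The cluster-size check can be turned into a number by counting the broker registrations in the dump response. The sketch below assumes each live broker appears as an ephemeral node on its own line in the form /brokers/ids/N; count_brokers is a hypothetical helper that reads the response on standard input.

```shell
#!/usr/bin/env bash
# Sketch: count broker registrations in ZooKeeper `dump` output.

count_brokers() {
  # One line per registered broker, e.g. "/brokers/ids/1" (possibly indented).
  grep -c '^[[:space:]]*/brokers/ids/'
}

# Usage against the ZooKeeper leader (requires nc):
#   echo dump | nc localhost 2181 | count_brokers
```

Adding a broker to the cluster and re-running the check should raise the count by one, which is a quick way to confirm horizontal scaling.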

Why Kafka is so fast

  • When Kafka is writing
  • When Kafka is reading

When Kafka is writing

  • Sequential disk access:
    • Can be faster than random memory access
    • Avoids JVM garbage-collection overhead and high heap usage
  • Page cache: managed by the OS, LRU eviction

When Kafka is reading

Review

  • Kafka Basic Concept
    • Overview
    • Broker
    • Producer
    • Consumer
    • Topic
    • Zookeeper
  • How to install
    • Install Zookeeper Cluster
    • Install Kafka Cluster
    • Most Important Properties
    • Demonstration: Horizontally scalable

Review

  • Why Kafka is so fast
    • When Kafka is writing
    • When Kafka is reading

Next Time!

  • Concept of Kafka Broker
  • Concept of Kafka Topic
  • Concept of Kafka Producer
  • Concept of Kafka Consumer

Any Questions?

Kafka Series - Basic use & knowledge

By Harvey Jhuang

Kafka basic concept and use