Knowledge worth sharing
#05
Florian Dambrine - Principal Engineer - @GumGum
***
A Log is a sequence of events
[Diagram: a Kafka log as an append-only sequence of events]
| Workload | Partition Sizing |
|---|---|
| Common | 8 - 16 |
| Big topics | 120 - 200 |
| YOU'RE WRONG! | > 200 |
[Diagram: a topic's events spread over a 4-node Kafka cluster of brokers]
Partitions are units of scalability: they let client applications read and write data from/to many brokers at the same time, as sketched below.
[Diagram: partition replicas spread across the 4 brokers; L = Leader, F = Follower]
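To see who leads what, here is a minimal sketch (it assumes confluent-kafka-python, which this deck uses later; the `<ELB>` and `<TOPIC>` placeholders follow the deck's conventions): it prints the leader broker and replica set for each partition of a topic.

```python
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "<ELB>"})

# Fetch cluster metadata for one topic and print each partition's leader
topic_meta = admin.list_topics("<TOPIC>", timeout=10).topics["<TOPIC>"]
for pid, pmeta in sorted(topic_meta.partitions.items()):
    print(f"partition {pid}: leader=broker {pmeta.leader}, replicas={pmeta.replicas}")
```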
Data is ingested or served by the broker that leads a given partition. On the 1st connection, a client only needs a bootstrap endpoint:

```
{
    "bootstrap.servers": "<ELB>",
    <other settings>
}
```

The bootstrap broker replies with the full broker list, ["broker1", "broker2", ...]; once connected, the client talks directly to whichever brokers lead the partitions it needs.
A couple of RCAs were actually caused by topics with a misconfigured replication factor, leading to data loss.
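A hedged sketch of guarding against that (topic name, partition count, and sizing are illustrative, not a prescription): create topics with an explicit replication factor, and pair it with min.insync.replicas so losing a single broker cannot lose acknowledged data.

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "<ELB>"})

# replication_factor=3 keeps 3 copies of each partition;
# min.insync.replicas=2 makes acks=all writes fail fast instead of
# silently accepting under-replicated data
new_topic = NewTopic(
    "<TOPIC>",
    num_partitions=12,
    replication_factor=3,
    config={"min.insync.replicas": "2"},
)
for topic, future in admin.create_topics([new_topic]).items():
    future.result()  # raises if creation failed
```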
You need to set advertised.listeners (or KAFKA_ADVERTISED_LISTENERS if you're using Docker images) to the external address (host/IP) so that clients can correctly connect to it. Otherwise, they'll try to connect to the internal host address, and if that's not reachable, problems ensue.
| Listener name | listeners (bind) | advertised.listeners |
|---|---|---|
| PLAINTEXT | 0.0.0.0:9092 | kafka:9092 |
| PLAINTEXT_DOCKER | 0.0.0.0:29092 | host.docker.internal:29092 |
| PLAINTEXT_NGROK | 0.0.0.0:29093 | 4.tcp.ngrok.io:18028 |
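Put together, a sketch of the matching broker settings (ports and hostnames taken from the table above; listener.security.protocol.map maps each named listener to the PLAINTEXT protocol):

```properties
listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_DOCKER://0.0.0.0:29092,PLAINTEXT_NGROK://0.0.0.0:29093
advertised.listeners=PLAINTEXT://kafka:9092,PLAINTEXT_DOCKER://host.docker.internal:29092,PLAINTEXT_NGROK://4.tcp.ngrok.io:18028
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_DOCKER:PLAINTEXT,PLAINTEXT_NGROK:PLAINTEXT
```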
[Diagram: a producer calls produce() and waits for the broker's ack; a consumer calls consume()]
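A minimal sketch of that ack (broker and topic placeholders as above): the producer asks for acknowledgement from all in-sync replicas and observes it through a delivery callback.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "<ELB>", "acks": "all"})

def on_delivery(err, msg):
    # Called once the broker acks the write (or the delivery fails)
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"acked: partition={msg.partition()} offset={msg.offset()}")

producer.produce("<TOPIC>", value=b"hello", on_delivery=on_delivery)
producer.flush()  # block until outstanding acks arrive
```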
[Diagram: a consumer group of Consumer 1 and Consumer 2 sharing partitions P1, P2, P3]
[Diagram: a partition as offsets 0..8, annotated with the Last Committed Offset, the Current Position, the High Watermark, and the Log End Offset]
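Most of these offsets can be read programmatically; a minimal sketch (partition 0 is illustrative, and the `<GROUP>` placeholder is mine, in the spirit of `<TOPIC>`):

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({"bootstrap.servers": "<ELB>", "group.id": "<GROUP>"})
tp = TopicPartition("<TOPIC>", 0)

committed = consumer.committed([tp])[0].offset  # last committed offset
consumer.assign([tp])                           # position() needs an assignment
position = consumer.position([tp])[0].offset    # current position
low, high = consumer.get_watermark_offsets(tp)  # high = high watermark; the
print(committed, position, low, high)           # log end offset is broker-side
```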
[Diagram: the same consumer group, consuming P1, P2, P3 with auto-commit enabled]
```python
import logging
from confluent_kafka import Consumer

logger = logging.getLogger(__name__)
consumer = Consumer({
    "bootstrap.servers": "<ELB>",
    "group.id": "<GROUP>",
    "enable.auto.commit": True,  # offsets committed in the background
})
consumer.subscribe(["<TOPIC>"])

while True:
    msg = consumer.poll(1.0)  # one Message, or None on timeout
    if msg is None or msg.error():
        continue
    try:
        process(msg)  # heavy
    except Exception:
        logger.exception("Failed processing messages")
        # the offset may already be auto-committed: the message is lost
```
[Diagram: the same consumer group, now committing offsets manually]
```python
import logging
from confluent_kafka import Consumer

logger = logging.getLogger(__name__)
consumer = Consumer({
    "bootstrap.servers": "<ELB>",
    "group.id": "<GROUP>",
    "enable.auto.commit": False,  # commit manually, after processing
})
consumer.subscribe(["<TOPIC>"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        process(msg)
        consumer.commit(message=msg)  # commit only once processed
    except Exception:
        logger.exception("Failed processing messages")
        # nothing committed: the message will be re-delivered
```
Rebalance/Rebalancing: the procedure that is followed by a number of distributed processes that use Kafka clients and/or the Kafka coordinator to form a common group and distribute a set of resources among the members of the group.
Examples: a new consumer joining the group, an existing consumer leaving or crashing, or partitions being added to a subscribed topic.
MUST READ !
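To watch a rebalance happen, a minimal sketch (same placeholders as above): register callbacks that fire when the group coordinator assigns or revokes partitions for this consumer.

```python
from confluent_kafka import Consumer

def on_assign(consumer, partitions):
    print("assigned:", [(p.topic, p.partition) for p in partitions])

def on_revoke(consumer, partitions):
    print("revoked:", [(p.topic, p.partition) for p in partitions])

consumer = Consumer({"bootstrap.servers": "<ELB>", "group.id": "<GROUP>"})
consumer.subscribe(["<TOPIC>"], on_assign=on_assign, on_revoke=on_revoke)

while True:
    consumer.poll(1.0)  # rebalance callbacks fire from within poll()
```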
```bash
alias kaf="docker run --entrypoint='' -v ~/.kaf:/root/.kaf -it lowess/kaf bash"

# Consume the content of <TOPIC> and copy it to a file
kaf consume <TOPIC> --offset latest 2>/dev/null | tee /tmp/kafka-stream.log

# Consume <TOPIC> and reshape each payload to keep url and uuid only
kaf consume <TOPIC> 2>/dev/null | jq '{"url": .url, "uuid": .uuid}'

# Send each line of <FILE> as an individual record to <TOPIC>
cat <FILE> | while read -r line; do echo "$line" | kaf produce <TOPIC>; done

alias kafka-cli="docker run --rm --entrypoint='' -it confluentinc/cp-kafka bash"
```
By Florian