Intro to
Apache Kafka

What is Kafka?

A highly scalable, enterprise-grade, pub/sub messaging system implemented as a distributed transaction log.

Created at LinkedIn to handle real-time data feeds across the company
Open sourced in 2011
Written in Scala and Java
Runs on the JVM

What is Kafka?

(cont.)

Confluent was founded by Kafka's original developers to provide commercial support for Kafka
Currently in use at many large companies including:
- Netflix
- PayPal
- Spotify
- Uber
- many, many more

What is it used for?

Replaces "traditional" messaging systems like ActiveMQ, RabbitMQ, etc. for pub/sub applications
Operational monitoring/metrics
Log aggregation - central storage of distributed, individual logs
Stream processing - incoming messages can be processed before passing along

Excellent example of using Kafka to provide real-time enforcement of trading rules in an MMORPG can be found at:
http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/

Architecture

Partitions

Partitions are the unit of parallelism in Kafka
Partitions provide durability to Topics
Partitions are stored on disk on Brokers
The Leader's copy of a Partition may contain data not yet replicated to Follower nodes
A write is not "committed" until all in-sync replicas have it
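
The commit rule above can be illustrated with a toy model. This is only a sketch: the broker names, the offsets, and the `high_watermark` helper are made up for illustration and are not part of any Kafka API. It assumes each replica reports the next offset it would write, and that only replicas currently in sync count toward the commit point.

```python
def high_watermark(log_end_offsets, in_sync):
    """Return the first offset that is NOT yet on every in-sync replica.

    Everything below this offset is considered committed in this toy model.
    """
    return min(log_end_offsets[broker] for broker in in_sync)


# Hypothetical state: broker id -> next offset that replica would write.
log_end_offsets = {"broker-1": 10, "broker-2": 8, "broker-3": 5}

# Assume broker-3 has fallen behind and is no longer counted as in sync.
in_sync = {"broker-1", "broker-2"}

hw = high_watermark(log_end_offsets, in_sync)
print(hw)  # 8: offsets 0..7 are committed, 8 and 9 exist only on broker-1
```

In this model a slow replica that drops out of the in-sync set stops holding back commits, which is why the rule is stated in terms of in-sync replicas rather than all replicas.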

Replication

Topic Partitions are replicated across Brokers to provide automatic failover in the event of Broker failure.
Each Partition has a Leader Broker that handles all reads & writes to that partition. The other Brokers holding a replica of that Partition are Followers.
In the event of Leader failure, a Follower is chosen to take over as Leader.
Replication Factor can be assigned on a per-topic basis.
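
A rough sketch of the failover idea, using a hypothetical `Partition` class that is not Kafka's real implementation. It assumes the new Leader is simply the first surviving replica that was in sync; the actual election is coordinated by the cluster and has more rules than this.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    replicas: list       # broker ids holding a copy, in preference order
    in_sync: set         # replicas currently caught up with the leader
    leader: str = None   # defaults to the first replica

    def __post_init__(self):
        if self.leader is None:
            self.leader = self.replicas[0]

    def broker_failed(self, broker):
        """Drop a failed broker; if it was the leader, promote a follower."""
        self.in_sync.discard(broker)
        if broker == self.leader:
            candidates = [b for b in self.replicas if b in self.in_sync]
            self.leader = candidates[0] if candidates else None
        return self.leader


# Replication factor 3: one leader and two followers.
p = Partition(replicas=["b1", "b2", "b3"], in_sync={"b1", "b2", "b3"})
print(p.leader)               # b1
print(p.broker_failed("b1"))  # b2 takes over as leader
print(p.broker_failed("b3"))  # b2 stays leader; only a follower was lost
```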

Producers

Available in many languages
Write messages to a Topic Partition on its Leader Broker
Writes can be keyed so that all messages with the same key go to the same Partition
Writes can be individually sent or batched by latency or size
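
Keyed writes and batching can be sketched in a few lines. Everything here is illustrative: `partition_for` and `Batcher` are hypothetical names, and CRC32 stands in for whatever hash a real client uses. The batcher takes the current time as an argument to keep the example deterministic, and it only checks its limits when a message is added, whereas a real producer would also flush on a timer.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; the same key always gives the same result."""
    return zlib.crc32(key) % num_partitions


class Batcher:
    """Buffer messages and release them when a size or latency limit is hit."""

    def __init__(self, max_size, max_wait_s):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.first_at = None  # time the oldest buffered message arrived

    def add(self, message, now):
        """Buffer a message; return a full batch to send, or None."""
        if not self.buffer:
            self.first_at = now
        self.buffer.append(message)
        too_big = len(self.buffer) >= self.max_size
        too_old = now - self.first_at >= self.max_wait_s
        if too_big or too_old:
            batch, self.buffer = self.buffer, []
            return batch
        return None


# Same key, same partition, no matter how many times it is sent.
same = partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
print(same)  # True

batcher = Batcher(max_size=3, max_wait_s=0.5)
print(batcher.add("a", now=0.0))  # None
print(batcher.add("b", now=0.1))  # None
print(batcher.add("c", now=0.2))  # ['a', 'b', 'c'] (size limit reached)
print(batcher.add("d", now=1.0))  # None
print(batcher.add("e", now=1.6))  # ['d', 'e'] (latency limit reached)
```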

Consumers

Also available in many languages
Read messages off Topic Partition on Leader Broker
Consumers track progress by offset
Messages can be re-consumed by seeking back to the desired offset
Consumer Groups are collections of Consumers that share the work of reading a topic: its partitions are divided evenly among the group's members, and each partition is read by only one Consumer in the group.
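
Offsets and Consumer Groups can both be shown with a small in-memory model. This is a sketch under simplifying assumptions: a partition is a plain Python list, `PartitionReader` and `assign_round_robin` are hypothetical names, and round-robin is just one possible way to spread partitions over a group.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers so each partition has one reader."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


class PartitionReader:
    """Read one partition's log sequentially, tracking progress by offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # next offset to read

    def poll(self, max_records=2):
        records = self.log[self.offset:self.offset + max_records]
        self.offset += len(records)
        return records

    def seek(self, offset):
        """Move to an earlier (or later) offset, e.g. to re-consume."""
        self.offset = offset


# Five partitions shared by a group of two consumers.
print(assign_round_robin([0, 1, 2, 3, 4], ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}

reader = PartitionReader(["m0", "m1", "m2"])
print(reader.poll())  # ['m0', 'm1']
print(reader.poll())  # ['m2']
reader.seek(1)        # rewind to offset 1
print(reader.poll())  # ['m1', 'm2'] are consumed again
```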

Demo

Intro to Kafka

By Andrew MacKenzie
