Intro to Apache Kafka
What is Kafka?
A highly scalable, enterprise-grade publish/subscribe messaging system implemented as a distributed commit log (also called a transaction log).

- Created at LinkedIn to handle real-time data feeds across the company
- Open sourced in 2011
- Written in Scala and Java
- Runs on the JVM
What is Kafka? (cont.)

- Confluent was spun off by Kafka's original developers to provide commercial support for Kafka
- Currently in use at many large companies including:
  - Netflix
  - PayPal
  - Spotify
  - Uber
  - many, many more

What is it used for?
- Replaces "traditional" messaging systems like ActiveMQ, RabbitMQ, etc. for pub/sub applications
- Operational monitoring/metrics
- Log aggregation - central storage of distributed, individual logs
- Stream processing - incoming messages can be processed before being passed along

An excellent example of using Kafka to provide real-time enforcement of trading rules in an MMORPG can be found at:
http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
Architecture

Partitions
- Partitions are the unit of parallelism in Kafka
- Partitions provide durability to Topics
- Partitions are stored on disk on Brokers
- Partitions may contain data not yet replicated to Follower nodes
- A write is not "committed" until all in-sync replicas have received it
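
The relationship between a Leader's log, a lagging Follower, and the "committed" point can be sketched in a few lines. This is a toy in-memory model, not Kafka's storage code: `PartitionLog` and `high_watermark` are hypothetical names chosen for illustration, and the sketch assumes that "committed" means "present on every in-sync replica".

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy append-only log for one partition (illustrative, not Kafka's storage)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    @property
    def log_end_offset(self):
        """Offset that the next appended record will receive."""
        return len(self.records)


def high_watermark(in_sync_replicas):
    """Records below this offset exist on every in-sync replica (assumed 'committed')."""
    return min(replica.log_end_offset for replica in in_sync_replicas)


leader, follower = PartitionLog(), PartitionLog()
for msg in ["a", "b", "c"]:
    leader.append(msg)
follower.append("a")
follower.append("b")  # the follower has not fetched "c" yet

hw = high_watermark([leader, follower])
committed = leader.records[:hw]  # only these records count as committed
```

In this sketch the Leader holds three records, but only the first two are treated as committed until the Follower catches up.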

Replication
- Topic Partitions are replicated across Brokers to provide automatic failover in the event of Broker failure.
- Each Partition has a Leader Broker that handles all reads & writes to that partition. The other Brokers holding replicas of that partition are Followers.
- In the event of Leader failure, a Follower is chosen to take over as Leader.
- Replication Factor can be assigned on a per-topic basis.
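
A simplified sketch of the failover idea follows. It assumes the new Leader is the first replica in the assignment list that is both alive and in sync. `elect_leader` and the broker ids are hypothetical, and real leader election in Kafka involves more machinery than this.

```python
def elect_leader(replicas, in_sync, live_brokers):
    """Return the first replica that is both in sync and alive, or None if there is none."""
    for broker in replicas:
        if broker in in_sync and broker in live_brokers:
            return broker
    return None


replicas = [1, 2, 3]   # replication factor 3; broker ids are made up
in_sync = {1, 2, 3}

# All brokers healthy: the first replica in the list leads.
leader_before = elect_leader(replicas, in_sync, live_brokers={1, 2, 3})

# Broker 1 fails: it drops out of the live set and the in-sync set.
leader_after = elect_leader(replicas, in_sync - {1}, live_brokers={2, 3})
```

With a replication factor of 3, the partition in this sketch keeps a Leader as long as at least one in-sync replica is still alive.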

Producers
- Available in many languages
- Writes messages to Topic Partition on Leader Broker
- Writes can be keyed so that all messages with the same key go to the same Partition
- Writes can be individually sent or batched by latency or size
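
Keyed writes and batching can be illustrated without a real client. The sketch below is a toy model under stated assumptions: `partition_for` uses CRC32 as a stand-in hash (not the hash real Kafka clients use), and `Batcher` with its `max_records` and `max_wait_s` parameters is a hypothetical class, not a client API. The caller passes the current time as `now` to keep the example deterministic.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always land on the same partition."""
    # crc32 is a stand-in hash for this sketch, not what Kafka clients use.
    return zlib.crc32(key) % num_partitions


class Batcher:
    """Collects records and flushes when the batch is full or has waited too long."""

    def __init__(self, max_records, max_wait_s):
        self.max_records = max_records
        self.max_wait_s = max_wait_s
        self.pending = []
        self.first_added_at = None
        self.sent = []  # each entry is one flushed batch

    def add(self, record, now):
        if not self.pending:
            self.first_added_at = now  # start the latency clock for this batch
        self.pending.append(record)
        self.maybe_flush(now)

    def maybe_flush(self, now):
        if not self.pending:
            return
        full = len(self.pending) >= self.max_records
        waited = now - self.first_added_at >= self.max_wait_s
        if full or waited:
            self.sent.append(self.pending)
            self.pending = []


batcher = Batcher(max_records=2, max_wait_s=0.5)
batcher.add("m1", now=0.0)
batcher.add("m2", now=0.1)     # size limit reached, batch is flushed
batcher.add("m3", now=0.2)
batcher.maybe_flush(now=0.9)   # "m3" has waited long enough, batch is flushed
```

The trade-off shown here is the usual one: larger batches and longer waits improve throughput at the cost of per-message latency.
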
Consumers
- Also available in many languages
- Read messages off Topic Partition on Leader Broker
- Consumers track progress by offset
- Messages can be re-consumed by stating the desired offset
- Consumer Groups are collections of Consumers that divide the partitions of a topic among themselves, so reads are distributed evenly and each partition is read by only one Consumer in the group.
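
Offset tracking, re-consumption, and group assignment can be sketched as follows. This is a toy model: `assign_partitions` uses a simple round-robin scheme as one assumed way to spread partitions over a group, and `OffsetConsumer` is a hypothetical class reading from a plain Python list, not a real client.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


class OffsetConsumer:
    """Reads a list-backed partition log and remembers its position (offset)."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # offset of the next record to read

    def poll(self, max_records=10):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move to an earlier (or later) offset, e.g. to re-consume messages."""
        self.position = offset


assignment = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

consumer = OffsetConsumer(["a", "b", "c"])
first = consumer.poll()     # reads everything from offset 0
consumer.seek(1)            # rewind to offset 1
replayed = consumer.poll()  # re-consumes from offset 1 onward
```

Because the log is not modified by reads, rewinding the offset is all it takes to replay messages in this model.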

Demo

Intro to Kafka
By Andrew MacKenzie