Intro to Apache Kafka

What is Kafka?

A highly scalable, enterprise-grade, pub/sub messaging system implemented as a distributed transaction log.

  • Created at LinkedIn for internal needs to handle real-time data feeds across the company
  • Open sourced in 2011
  • Written in Scala and Java
  • Runs on the JVM

What is Kafka? (cont.)

  • Confluent was founded by Kafka's original developers to provide commercial support for Kafka
  • Currently in use at many large companies, including:
    • Netflix
    • PayPal
    • Spotify
    • Uber
    • many, many more

What is it used for?

  • Replaces "traditional" messaging systems like ActiveMQ, RabbitMQ, etc. for pub/sub applications
  • Operational monitoring/metrics
  • Log aggregation - central storage of distributed, individual logs
  • Stream processing - incoming messages can be processed before passing along

An excellent example of using Kafka to provide real-time enforcement of trading rules in an MMORPG can be found at:
http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/

Architecture

Partitions

  • Partitions are the unit of parallelism in Kafka
  • Partitions provide durability to Topics
  • Partitions are stored on disk on Brokers
  • The Leader's copy of a Partition may contain data not yet replicated to Follower nodes
  • A write is not "committed" until all in-sync replicas have received it
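
The relationship between a Partition, its offsets, and "committed" writes can be sketched with a toy model. This is not Kafka code: `PartitionLog` and `committed_offset` are hypothetical names invented for illustration, and the model assumes every replica shown is in sync.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one replica of a partition: an append-only list."""
    records: list = field(default_factory=list)

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    @property
    def end_offset(self):
        """Offset that the next appended record would receive."""
        return len(self.records)


def committed_offset(replicas):
    """Records at offsets below this value exist on every replica passed in."""
    return min(r.end_offset for r in replicas)


leader, follower = PartitionLog(), PartitionLog()
for msg in ["a", "b", "c"]:
    leader.append(msg)
follower.append("a")  # the follower has copied only the first record so far

high_watermark = committed_offset([leader, follower])
committed = leader.records[:high_watermark]  # only these count as committed
```

In this sketch the Leader holds three records, but only the first is treated as committed because the Follower has not yet copied the other two.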

Replication

  • Topic Partitions are replicated across Brokers to provide automatic failover in the event of Broker failure.
  • Each Partition has a Leader Broker that handles all reads & writes to that partition. The other Brokers holding replicas of that partition are Followers.
  • In the event of Leader failure, a Follower is chosen to take over as Leader.
  • Replication Factor can be assigned on a per-topic basis.
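
Failover can be pictured as picking a surviving replica that is fully caught up. The sketch below is a simplified illustration, not Kafka's actual election logic: `elect_leader`, the broker ids, and the "first eligible replica wins" rule are all assumptions made for the example.

```python
def elect_leader(replica_brokers, in_sync, failed):
    """Return the first surviving in-sync replica, or None if there is none."""
    for broker in replica_brokers:
        if broker in in_sync and broker not in failed:
            return broker
    return None


replica_brokers = [1, 2, 3]  # replication factor 3; broker 1 is the current leader
in_sync = {1, 3}             # broker 2 has fallen behind and is not eligible
new_leader = elect_leader(replica_brokers, in_sync, failed={1})
```

Here broker 3 takes over, because broker 1 has failed and broker 2 is not in sync. With a replication factor of 1 there would be no Follower to promote, which is why the Replication Factor matters for availability.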

Producers

  • Available in many languages
  • Write messages to Topic Partition on Leader Broker
  • Writes can be keyed so that all messages with the same key go to the same Partition
  • Writes can be individually sent or batched by latency or size
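
Keyed writes usually come down to hashing the key and taking the result modulo the number of partitions. The sketch below shows that idea only; `choose_partition` is a hypothetical helper, and CRC32 is used as a stand-in hash (Kafka's default Java partitioner uses murmur2), so the partition numbers it produces will not match a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always map to the same partition."""
    return zlib.crc32(key) % num_partitions


keys = [b"user-1", b"user-2", b"user-1"]
partitions = [choose_partition(k, 4) for k in keys]
# The first and third entries are equal because they share the key b"user-1".
```

Because ordering is only guaranteed within a single Partition, routing by key is what keeps all messages for one entity (a user, an account) in order.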

Consumers

  • Also available in many languages
  • Read messages off Topic Partition on Leader Broker
  • Consumers track progress by offset
  • Messages can be re-consumed by stating the desired offset
  • Consumer Groups are collections of Consumers that are used to evenly distribute reads from all the partitions of a topic. Each partition is read by only one Consumer in the group.
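
Two of the ideas above, spreading partitions across a Consumer Group and tracking progress by offset, can be sketched in a few lines. This is a toy model rather than a client API: `assign_partitions` and `read_from` are hypothetical helpers, and round-robin is just one possible assignment strategy.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions out to consumers round-robin, one owner per partition."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def read_from(log, offset, max_records):
    """Return up to max_records records starting at the given offset."""
    return log[offset:offset + max_records]


assignment = assign_partitions([0, 1, 2, 3, 4], ["c1", "c2"])

log = ["m0", "m1", "m2", "m3"]  # stand-in for one partition's records
offset = 0
batch = read_from(log, offset, 2)
offset += len(batch)             # the consumer records how far it has read
replayed = read_from(log, 0, 2)  # re-consume by asking for an earlier offset
```

Reading does not remove anything from the log, so replaying is just a matter of starting again from a smaller offset.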

Demo

Intro to Kafka

By Andrew MacKenzie
