Event-Driven Architectures with Apache Kafka on Heroku

Chris Castle, Developer Advocate

Rand Fitzpatrick, Director of Product

November 3, 2016

What problems does Apache Kafka solve?
 

What are the core concepts of Kafka?
 

Why Apache Kafka on Heroku?

What problems does Apache Kafka solve?

Event-Driven Architecture

Event-driven architecture (EDA), also known as message-driven architecture, is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.

Source: Wikipedia


What Are Events?

"Contextualized operation on state"

Context

When was the event? (event time, processing time)
What produced the event? (causal history, device, etc.)
Where did the event occur? (system location, geo location)

Operation

What function was applied? (create, update, delete, etc.)
What are the characteristics of the function?

State

What is the data involved in the event?
How is that data identified?

"Contextualized operation on state"

Event Examples

Product views

Completed sales

Page visits

Site logins

Shipping notifications

Inventory received

IoT sensor values

Weather data

Traffic data

Tweets

Election polling data!

Completed sale

Context

2016-11-03T15:13:27Z
Retail www site
referrer Google search

Operation

Inventory item purchased

State

Amazon Echo, Black
$179.99
ID B00X4WHP5E
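As a rough illustration of "contextualized operation on state", the completed-sale example above could be modeled as a small record type. This is only a sketch: the `Event` class and its field names (`event_time`, `source`, `referrer`, `price_usd`) are hypothetical and not part of Kafka or any Heroku API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """Hypothetical model of an event: a contextualized operation on state."""
    context: Dict[str, Any]   # when, what produced it, where it occurred
    operation: str            # what function was applied
    state: Dict[str, Any]     # the data involved and how it is identified


# The completed-sale example, split into its three parts.
completed_sale = Event(
    context={
        "event_time": datetime(2016, 11, 3, 15, 13, 27, tzinfo=timezone.utc),
        "source": "Retail www site",
        "referrer": "Google search",
    },
    operation="inventory_item_purchased",
    state={
        "id": "B00X4WHP5E",
        "name": "Amazon Echo, Black",
        "price_usd": "179.99",
    },
)

print(completed_sale.operation, completed_sale.state["id"])
```

Keeping context, operation, and state as separate fields makes it easy for a consumer to answer "when, what, and on which data" without parsing a free-form message.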

Why Should I Care?

Inbound Streams

  • Scaling too slowly leads to dropped data
  • Overprovisioning leads to inefficient systems
  • Backpressure and other coordination is hard!

Data Pipelines

  • Dataflow between processing stages requires coordination
  • Parallel pipelines with the same data can be nontrivial
  • Provenance and auditability!?

Microservices

  • Service discovery must support current and future processes
  • Sequencing service availability is critical to system function
  • Possible loss of state when individual services fail

Why Should I Care?

Inbound Streams

  • Event streams in Kafka allow other resources to pull when ready
  • Resources can fail and reconnect without dropping events
  • Kafka provides elasticity, reducing the need for backpressure

Data Pipelines

  • Dataflow coordination is reduced via event stream structure
  • The immutability of data allows for trivial parallel processing
  • Tracking provenance and lineage of data becomes possible

Microservices

  • Services now only need to discover topics in Kafka
  • Service availability sequencing is relaxed
  • Inter-service communication is more robust
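The pull model behind these points can be sketched with a toy in-memory log. This is not a Kafka client: `EventLog`, `poll`, and `commit` are hypothetical names that only imitate the idea of consumers tracking their own offsets in an append-only stream.

```python
from typing import Dict, List


class EventLog:
    """Toy append-only log standing in for a single topic partition."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}  # committed offset per consumer group

    def append(self, event: str) -> int:
        """Producers only append; existing events are never modified."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group: str, max_events: int = 10) -> List[str]:
        """Consumers pull when ready, starting at their committed offset."""
        start = self._offsets.get(group, 0)
        return self._events[start:start + max_events]

    def commit(self, group: str, count: int) -> None:
        """Record that `count` more events have been processed by `group`."""
        self._offsets[group] = self._offsets.get(group, 0) + count


log = EventLog()
for i in range(5):
    log.append(f"event-{i}")

# The "dashboard" consumer pulls two events, commits them, then goes offline.
batch = log.poll("dashboard", max_events=2)
log.commit("dashboard", len(batch))

# Events keep arriving while the consumer is down; nothing is dropped.
log.append("event-5")

# After reconnecting, the consumer resumes from its committed offset.
resumed = log.poll("dashboard")

# A second, independent consumer group reads the same events from the start.
audit = log.poll("audit")

print(resumed)
print(audit)
```

Because the log is immutable and each group keeps its own offset, adding a parallel pipeline is just another group name rather than a change to the producer.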

Use Cases

Heroku Platform Event Stream

Heroku Operational Experience: App Metrics

Heroku App Metrics

Twitter Analytics Dashboard

Use Cases Generalized

Inbound Streams

Data Pipelines

Microservices

Platform Event Stream

App Metrics

Twitter Analytics

What are the core concepts of Kafka?

Apache Kafka Core Concepts

Brokers

The instances running Kafka and managing streams of events in a cluster.

Producers + Consumers

Clients that write to or read from a Kafka cluster.

Topics

Streams of events that are replicated across the brokers. Configured with time-based retention or log compaction.

Partitions

Discrete subsets of topics, and important tuning points for parallelism and ordering.
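Why partitions matter for both parallelism and ordering can be shown with a small simulation. This sketch assumes a hypothetical topic with three partitions and uses CRC32 as a stand-in hash; Kafka's Java client hashes keys with murmur2 instead, so actual partition numbers would differ.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is its own ordered, append-only list of (key, value) pairs.
partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "view"),
    ("user-2", "purchase"),
    ("user-1", "logout"),
]
for key, value in events:
    partitions[partition_for(key)].append((key, value))


def events_for(key: str) -> List[str]:
    """All events for a key live in one partition, in the order produced."""
    return [v for k, v in partitions[partition_for(key)] if k == key]


print(events_for("user-1"))
print(events_for("user-2"))
```

Events that share a key always land in the same partition, so their relative order is preserved, while different partitions can be consumed in parallel. Ordering across partitions is not guaranteed.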

Example Producers

  • Product views (Web server)
  • Completed sales (Payment processor)
  • Page visits (Browser)
  • Site logins (Authentication service)
  • Shipping notifications (Shipping provider)
  • Inventory received (Warehouse)
  • IoT data (Motion sensor)
  • Weather data (Rain gauge)
  • Traffic data (Vehicle sensor)
  • Tweets (Twitter)
  • Election polling data! (Online/phone survey)

Example Consumers

  • Product views (Personalization engine)
  • Completed sales (Accounting system)
  • Page visits (Reporting dashboard)
  • Site logins (Security audit service)
  • Shipping notifications (Shipping provider)
  • Inventory received (Inventory database)
  • IoT data (Actuator)
  • Weather data (Climate model)
  • Traffic data (Traffic map)
  • Tweets (Analytics dashboard)
  • Election polling data! (Election forecast)

Complex Architecture

Complex Controls

Other Kafka primitives that give structure to event streams

Retention

Log compaction

Replication factor

Delivery guarantees
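The difference between time-based retention and log compaction can be sketched on a toy log. This is a simplified model, not broker code: `Record`, `apply_retention`, and `compact` are hypothetical names, and real brokers apply both policies lazily, per log segment.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    timestamp: int  # seconds, relative to an arbitrary start
    key: str
    value: str


def apply_retention(log: List[Record], now: int, retention_seconds: int) -> List[Record]:
    """Time-based retention: drop records older than the retention window."""
    return [r for r in log if now - r.timestamp <= retention_seconds]


def compact(log: List[Record]) -> List[Record]:
    """Log compaction: keep only the latest record per key, in log order."""
    latest: Dict[str, int] = {}
    for index, record in enumerate(log):
        latest[record.key] = index  # later records overwrite earlier ones
    return [r for i, r in enumerate(log) if latest[r.key] == i]


log = [
    Record(100, "sku-1", "in stock: 5"),
    Record(200, "sku-2", "in stock: 9"),
    Record(300, "sku-1", "in stock: 4"),
    Record(400, "sku-1", "in stock: 3"),
]

recent = apply_retention(log, now=450, retention_seconds=200)
current = compact(log)

print(recent)   # only the records from the last 200 seconds
print(current)  # one record per key: the most recent value
```

Retention suits streams where old events lose value over time, such as page visits. Compaction suits streams where only the current value per key matters, such as inventory levels.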

Interacting with Kafka

and many more...

Kafka Connect

Some examples: HDFS, JDBC, Elasticsearch, Couchbase, Oracle, MS SQL Server, Cassandra, DynamoDB, Salesforce Streaming API, Splunk

Why Apache Kafka on Heroku?

Without Heroku

Apache Kafka
The heart of the event management system, with a broad variety of configurations and options.


Apache Zookeeper
The system’s consensus and coordination cluster is vital for Kafka’s operation.


OS + JVM Tuning
Tuning the cluster runtimes can be an art.


Instances + Networking
Physical or virtual, the infrastructure behind clusters must be well considered.

 

Myriad Moving Pieces

Apache Kafka on Heroku

Simple Configuration
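As one possible illustration of what simple configuration can look like from application code, the sketch below turns a connection string into a broker list that a Kafka client could use. It assumes the add-on exposes a `KAFKA_URL` config var holding comma-separated `kafka+ssl://host:port` URLs; that name and format, the `bootstrap_servers` helper, and the example hosts are assumptions not taken from the slides, so check them against the Heroku documentation before relying on them.

```python
import os
from typing import List
from urllib.parse import urlparse


def bootstrap_servers(kafka_url: str) -> List[str]:
    """Turn 'kafka+ssl://host:port,kafka+ssl://host:port' into ['host:port', ...]."""
    servers = []
    for url in kafka_url.split(","):
        parsed = urlparse(url.strip())
        servers.append(f"{parsed.hostname}:{parsed.port}")
    return servers


# Hypothetical value for local experimentation only.
example_url = (
    "kafka+ssl://broker-1.example.com:9096,"
    "kafka+ssl://broker-2.example.com:9096"
)

# Assumption: the add-on sets KAFKA_URL; fall back to the example otherwise.
servers = bootstrap_servers(os.environ.get("KAFKA_URL", example_url))
print(servers)
```

Reading the broker list from the environment keeps connection details out of the code, so the same application can run against different clusters without changes.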

Automated Operations

  • Self-Healing
  • Current Version
  • No-Downtime Upgrades

Experienced Staff

Heroku engineers have contributed patches to the core open source Kafka project.

Apache Kafka on Heroku

Global

US West

US East

Ireland

Germany

Japan

Sydney

Let's Review...

...and get you started with Kafka!

Apache Kafka is a valuable tool for building architectures to support inbound event streams, data processing pipelines, and microservices coordination.
 

The primitives provided by Kafka -- topics, partitions, retention duration, log compaction, and replication -- provide the tools to manage structured event streams.
 

Apache Kafka on Heroku reduces operational complexity so that any developer can get started quickly and feel confident that their application is backed by a rock-solid production service.

Get started at

hrku.co/use-kafka

Q&A

Rand Fitzpatrick, Director of Product

Chris Castle, Developer Advocate

Learn More

Thank you!

Apache Kafka on Heroku

By Christopher Castle
