Event-Driven Architectures with Apache Kafka on Heroku
Chris Castle, Developer Advocate
Rand Fitzpatrick, Director of Product
November 3, 2016
What problems does Apache Kafka solve?
What are the core concepts of Kafka?
Why Apache Kafka on Heroku?
What problems does Apache Kafka solve?
Event-Driven Architecture
Event-driven architecture (EDA), also known as message-driven architecture, is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.
Source: Wikipedia
What Are Events?
"Contextualized operation on state"
Context
When did the event happen? (event time, processing time)
What produced the event? (causal history, device, etc.)
Where did the event occur? (system location, geo location)
Operation
What function was applied? (create, update, delete, etc.)
What are the characteristics of the function?
State
What is the data involved in the event?
How is that data identified?
"Contextualized operation on state"
Event Examples
Product views
Completed sales
Page visits
Site logins
Shipping notifications
Inventory received
IoT sensor values
Weather data
Traffic data
Tweets
Election polling data!
Completed sale
Context: 2016-11-03T15:13:27Z; Retail www site; referrer: Google search
Operation: Inventory item purchased
State: Amazon Echo, Black; $179.99; ID B00X4WHP5E
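The completed-sale example can be expressed as a structured record. The Python sketch below is illustrative only: the `Event` class, its field names, the JSON encoding, and the choice of the item ID as the record key are assumptions, not a schema defined in this presentation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Event:
    """A 'contextualized operation on state' (hypothetical schema)."""
    context: Dict[str, Any]   # when, where, and what produced the event
    operation: str            # the function applied, e.g. a purchase
    state: Dict[str, Any]     # the data involved, including its identifier

    def to_record(self, key_field: str = "id") -> Tuple[bytes, bytes]:
        """Serialize to a (key, value) pair of bytes, as a producer would send it.

        Using the state's identifier as the key is an illustrative assumption.
        """
        key = str(self.state[key_field]).encode("utf-8")
        value = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return key, value


sale = Event(
    context={
        "event_time": "2016-11-03T15:13:27Z",
        "source": "Retail www site",
        "referrer": "Google search",
    },
    operation="inventory_item_purchased",
    state={
        "id": "B00X4WHP5E",
        "item": "Amazon Echo, Black",
        "price_usd": "179.99",  # kept as a string to avoid float rounding
    },
)

key, value = sale.to_record()
print(key)                             # b'B00X4WHP5E'
print(json.loads(value)["operation"])  # inventory_item_purchased
```

Keeping context, operation, and state as separate fields lets different consumers read only the part they care about, for example a reporting dashboard using the context and an inventory database using the state.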
Why Should I Care?
Inbound Streams
- Scaling too slowly leads to dropped data
- Overprovisioning leads to inefficient systems
- Backpressure and other forms of coordination are hard!
Data Pipelines
- Dataflow between processing stages requires coordination
- Parallel pipelines with the same data can be nontrivial
- Provenance and auditability are hard to track
Microservices
- Service discovery must support current and future processes
- Sequencing service availability is critical to system function
- Possible loss of state when individual services fail
Why Should I Care?
Inbound Streams
- Kafka stores event streams so consumers can pull events when they are ready
- Resources can fail and reconnect without dropping events
- Kafka provides elasticity, reducing the need for backpressure
Data Pipelines
- Processing stages coordinate through topics instead of directly with one another
- Because events are immutable, parallel pipelines can read the same stream independently
- Tracking provenance and lineage of data becomes possible
Microservices
- Services now only need to discover topics in Kafka
- Service availability sequencing is relaxed
- Inter-service communication is more robust
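The pull model and failure recovery described above rest on two ideas: the stream is an append-only log, and each consumer group tracks its own position (offset) in it. A minimal in-memory Python sketch, assuming a single partition; `TopicLog`, `poll`, and `commit` are hypothetical names that only mimic the concepts and are not the Kafka client API.

```python
from typing import Dict, List, Tuple


class TopicLog:
    """In-memory stand-in for one partition: an append-only list of events.

    Models the idea only; this is not the Kafka client API.
    """

    def __init__(self) -> None:
        self.events: List[str] = []
        self.committed: Dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, event: str) -> int:
        """Producer side: append an event and return its offset."""
        self.events.append(event)
        return len(self.events) - 1

    def poll(self, group: str, max_events: int) -> Tuple[int, List[str]]:
        """Consumer side: pull up to max_events starting at the group's committed offset."""
        start = self.committed.get(group, 0)
        return start, self.events[start:start + max_events]

    def commit(self, group: str, next_offset: int) -> None:
        """Record that the group has processed every event before next_offset."""
        self.committed[group] = next_offset


log = TopicLog()
for n in range(5):
    log.append(f"page-visit-{n}")

# The "billing" group pulls two events when it is ready, processes them, and commits.
start, batch = log.poll("billing", max_events=2)
log.commit("billing", start + len(batch))

# Suppose the billing consumer now fails and later reconnects:
# it resumes from its committed offset, so no event is dropped.
start, batch = log.poll("billing", max_events=10)
print(start, batch)  # 2 ['page-visit-2', 'page-visit-3', 'page-visit-4']

# An independent "analytics" group reads the same immutable events from the start.
print(log.poll("analytics", max_events=10)[1][:2])  # ['page-visit-0', 'page-visit-1']
```

Because a group commits only after processing, a failure between processing and committing causes events to be read again rather than lost; consumers built this way should tolerate occasional duplicates (at-least-once delivery).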
Use Cases
- Heroku Platform Event Stream
- Heroku Operational Experience: App Metrics
- Heroku App Metrics
- Twitter Analytics Dashboard
Use Cases Generalized
Inbound Streams
Data Pipelines
Microservices
Platform Event Stream
App Metrics
Twitter Analytics
What are the core concepts of Kafka?
Apache Kafka Core Concepts
Brokers
The instances running Kafka and managing streams of events in a cluster.
Producers + Consumers
Clients that write to or read from a Kafka cluster.
Topics
Named streams of events that are replicated across brokers. Each topic is configured with time-based retention or log compaction.
Partitions
Disjoint, ordered subsets of a topic. The partition count is an important tuning point for parallelism, and ordering is guaranteed only within a partition.
[Diagram: Producers, Consumers, Broker, Topic, Partition]
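Partitions are where ordering and parallelism meet: records with the same key go to the same partition, so per-key order is preserved while different partitions can be consumed in parallel. A Python sketch, assuming a hypothetical topic with three partitions. Kafka's default partitioner hashes keyed records with murmur2; `zlib.crc32` stands in for it here only because it is in the standard library, so actual partition numbers will differ from a real cluster.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition.

    CRC32 is a stand-in for Kafka's murmur2; the property that matters is the
    same: equal keys always land in the same partition.
    """
    return zlib.crc32(key) % num_partitions


def partition_events(events: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Group (key, payload) events by partition, preserving arrival order in each."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, payload in events:
        partitions[choose_partition(key.encode("utf-8"))].append((key, payload))
    return dict(partitions)


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "view product"),
    ("user-3", "login"),
    ("user-1", "checkout"),
]

for partition, records in sorted(partition_events(events).items()):
    print(partition, records)
```

Whatever partition `user-1` hashes to, all of that user's events end up there in their original order. More partitions raise the ceiling on how many consumers in a group can read in parallel, but changing the count later also changes which partition each key maps to.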
Example Producers
Product views: Web server
Completed sales: Payment processor
Page visits: Browser
Site logins: Authentication service
Shipping notifications: Shipping provider
Inventory received: Warehouse
IoT data: Motion sensor
Weather data: Rain gauge
Traffic data: Vehicle sensor
Tweets: (producer not listed)
Election polling data: Online/phone survey
Example Consumers
Product views: Personalization engine
Completed sales: Accounting system
Page visits: Reporting dashboard
Site logins: Security audit service
Shipping notifications: Shipping provider
Inventory received: Inventory database
IoT data: Actuator
Weather data: Climate model
Traffic data: Traffic map
Tweets: Analytics dashboard
Election polling data: Election forecast
Complex Architecture
Complex Controls
Other Kafka primitives that give structure to event streams:
Retention
Log compaction
Replication factor
Delivery guarantees
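Retention and log compaction decide which events a topic keeps. The Python sketch below models both policies on an in-memory list; `Record`, `apply_time_retention`, and `compact` are hypothetical helpers, and real Kafka applies these policies to log segments on the broker rather than to individual records in a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp: float       # seconds since epoch (illustrative)
    key: str
    value: Optional[str]   # None models a "tombstone" (delete marker)


def apply_time_retention(log: List[Record], now: float, retention_seconds: float) -> List[Record]:
    """Time-based retention: drop records older than the retention window.

    Kafka deletes whole log segments, so in practice records may outlive the
    window slightly; this sketch filters record by record.
    """
    return [r for r in log if now - r.timestamp <= retention_seconds]


def compact(log: List[Record]) -> List[Record]:
    """Log compaction: keep only the latest record for each key, in offset order.

    Offsets are not renumbered. Tombstones delete the key here immediately;
    Kafka keeps them for a configurable period and never compacts the active
    segment, details this sketch ignores.
    """
    latest: Dict[str, Record] = {}
    for record in log:
        latest[record.key] = record  # later offsets overwrite earlier ones
    survivors = [r for r in latest.values() if r.value is not None]
    return sorted(survivors, key=lambda r: r.offset)


log = [
    Record(0, 100.0, "item-42", "stock=10"),
    Record(1, 200.0, "item-7", "stock=3"),
    Record(2, 300.0, "item-42", "stock=9"),
    Record(3, 400.0, "item-7", None),  # item-7 deleted
    Record(4, 500.0, "item-99", "stock=1"),
]

print([(r.key, r.value) for r in compact(log)])
# [('item-42', 'stock=9'), ('item-99', 'stock=1')]
print([r.offset for r in apply_time_retention(log, now=500.0, retention_seconds=250.0)])
# [2, 3, 4]
```

Time-based retention suits streams where only recent history matters (page visits, sensor readings); compaction suits streams that represent the current state of each key (inventory levels), since the latest value per key is kept indefinitely.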
Interacting with Kafka
Kafka Connect
Some examples: HDFS, JDBC, Elasticsearch, Couchbase, Oracle, MS SQL Server, Cassandra, DynamoDB, Salesforce Streaming API, Splunk, and many more.
Image credit: Confluent Kafka Connect announcement blog post
Why Apache Kafka on Heroku?
Without Heroku
Apache Kafka
The heart of the event management system, with a broad variety of configurations and options.
Apache ZooKeeper
The consensus and coordination cluster that Kafka depends on to operate.
OS + JVM Tuning
Tuning the cluster runtimes can be an art.
Instances + Networking
Physical or virtual, the infrastructure behind clusters must be well considered.
Myriad Moving Pieces
Apache Kafka on Heroku
Simple Configuration
Automated Operations
- Self-Healing
- Current Version
- No-Downtime Upgrades
Experienced Staff
Heroku engineers have contributed patches to the core open source Kafka project.
Global
US West
US East
Ireland
Germany
Japan
Sydney
Let's Review...
...and get you started with Kafka!
Apache Kafka is a valuable tool for building architectures to support inbound event streams, data processing pipelines, and microservices coordination.
The primitives Kafka provides (topics, partitions, retention duration, log compaction, and replication) are the tools for managing structured event streams.
Apache Kafka on Heroku reduces operational complexity, so any developer can get started quickly and be confident that their application runs on a rock-solid production service.
Get started at hrku.co/use-kafka
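On Heroku, the add-on supplies broker connection details to the app as config vars. The sketch below assumes they arrive as a comma-separated list of `kafka+ssl://host:port` URLs in a `KAFKA_URL` variable (check the Dev Center documentation listed under Learn More for the exact names and the TLS settings clients also need) and converts them into the `host:port` bootstrap list most Kafka clients expect. The fallback hostnames and port are placeholders.

```python
import os
from typing import List
from urllib.parse import urlparse


def bootstrap_servers(kafka_url: str) -> List[str]:
    """Convert comma-separated broker URLs into 'host:port' strings."""
    servers = []
    for url in kafka_url.split(","):
        parsed = urlparse(url.strip())
        if not parsed.hostname or parsed.port is None:
            raise ValueError(f"cannot parse broker URL: {url!r}")
        servers.append(f"{parsed.hostname}:{parsed.port}")
    return servers


# Assumption: broker URLs are exposed in the KAFKA_URL config var.
# Placeholder brokers are used when it is not set, e.g. on a local machine.
kafka_url = os.environ.get(
    "KAFKA_URL",
    "kafka+ssl://broker-1.example.com:9096,kafka+ssl://broker-2.example.com:9096",
)
print(bootstrap_servers(kafka_url))
```

Reading connection details from the environment rather than hard-coding them keeps the same code working across apps and lets broker addresses change without a redeploy of new code.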
Q&A
Learn More
Apache Kafka on Heroku
Get Started
https://elements.heroku.com/addons/heroku-kafka
Documentation
https://devcenter.heroku.com/articles/kafka-on-heroku
Kafka Event Stream Modeling
https://devcenter.heroku.com/articles/kafka-event-stream-modeling
Podcast: Managed Kafka with Heroku Engineer Tom Crayford
http://softwareengineeringdaily.com/2016/10/25/managed-kafka-with-tom-crayford/
Thank you!
Apache Kafka on Heroku
By Christopher Castle