streamin' aint easy

Building Real-Time Streaming Architectures Across Distributed and Heterogeneous Microservices with Kafka

Nathan Murthy

Staff Software Engineer, Tesla

@natemurthy

Control Service

Telemetry Service

Asset Service

MicroServ Energy API

About Me

8.5 years of software engineering experience
Undergrad at UC Berkeley
Written applications for start-ups, big companies, R&D, and academia
- Banatao Institute @ CITRIS
- Fujitsu Labs
- Wireless Glue Networks
- SolarCity
- Tesla
Passionate about cleantech (and clean APIs)

Outline

Protocol potpourri
- From HTTP to {WebSockets,AMQP,gRPC,Kafka,...}
Kafka to the rescue (well, sort of)
Enabling streaming data science
- Case Study: Hornsdale Power Reserve

Protocol Potpourri

HTTP/1.1

Limitations of HTTP/1.1

Client-server constraint (REST)
One-to-one communication model
Synchronous request-response
CRUD semantics (POST, GET, PUT, DELETE)
Long-polling can be expensive
Request-response payload size limits

Composability Patterns

HTTP/1.1

Streaming HTTP

WebSockets

JMS

MQTT

AMQP

Kinesis

Cloud Pub/Sub

Redis

Kafka

gRPC

NATS

...

Requirements

Decouple readers and writers (pub/sub)
Asynchronous communication
Define our own semantics and domain model
High performance: process large volumes of data with low latency and high throughput
High availability, horizontal scalability
Persistent message queues
Avoid cloud provider/vendor lock-in

Topic A

Commit Log

Distributed Commit Log

Kafka: The Definitive Guide (2017)

Kreps, Narkhede, Rao (2011)

Kafka FTW!

Not all Kafka client libraries implemented to spec
No web browser support
Large operational overhead
Producing is easy, consuming can be tricky
Need to interoperate with existing systems

(well, sort of)

Enabling Streaming Data Science

Kafka Ecosystem

Kafka in the Big Data Ecosystem (2017)

Kafka + Spark

https://databricks.com/glossary/what-is-spark-streaming

https://electrek.co/2018/12/06/tesla-battery-report/

Conclusion

There are a lot of streaming technologies out there, find what fits your needs best
Apache Kafka is an effective pub/sub technology choice for critical, high-throughput systems
Recommend data scientists to add Kafka to their developer toolkit for real-time insights

Thank You!

@natemurthy

The Business Case for Streaming

Connect all the things!

https://steemit.com/technology/@michaelizer/internet-of-things-iot-what-is-it-and-predictions-for-this-year

Performance of Large-Scale Battery Energy Storage: Hornsdale Power Reserve (2018)

Model Predictive Control (MPC)

Ungarala, Botchu (2007)

Behrendt (2009)

Performance of Large-Scale Battery Energy Storage: Hornsdale Power Reserve (2018)