Flink 102
Core Concepts
Stream Basics
Life is Easy without State
(Source Float) DataStream<Float>
DataStream<Float> (map Float->Float) DataStream<Float>
DataStream<Float> (map Float->Boolean) DataStream<Boolean>
DataStream<Boolean> (sink Boolean)
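The pipeline above can be sketched in plain Java to show why it needs no state. This is a stand-in using java.util.stream, not Flink API; the Fahrenheit-to-Celsius conversion and the 100 degree threshold are assumptions chosen only to give the two map steps something to do.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java stand-in for the source -> map -> map -> sink pipeline; not Flink API.
class StatelessPipeline {
    // map Float -> Float: hypothetical conversion from Fahrenheit to Celsius.
    static float toCelsius(float fahrenheit) {
        return (fahrenheit - 32f) * 5f / 9f;
    }

    // map Float -> Boolean: is this single reading over the threshold?
    static boolean isOverThreshold(float celsius) {
        return celsius > 100f;
    }

    // Each element is handled on its own; nothing is remembered between elements.
    static List<Boolean> run(List<Float> source) {
        return source.stream()
                .map(StatelessPipeline::toCelsius)
                .map(StatelessPipeline::isOverThreshold)
                .collect(Collectors.toList());
    }
}
```

Because each output depends only on its own input, restarting from a saved source offset reproduces the same results.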
Stream Basics
- DataStreams are infinite
- Operators perform transformations
- Data is processed in the order in which it is received
- Backup/Restore is easy (only the source offset needs to be saved)
State Instantly Makes Life HARD
- Say we want a temperature alert: when the reading goes over 100 degrees C it fires once, and it doesn't fire again until the next time the threshold is crossed.
State Instantly Makes Life HARD
- Requires the data to arrive in order
- Requires knowing whether we have already alerted
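A minimal sketch of the one-shot alert in plain Java, assuming the 100 degree threshold from the example. The class and method names are hypothetical, and this is not how Flink stores state; it only shows that the operator now has to remember something between readings.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a one-shot alert; names are illustrative, not Flink API.
class OneShotAlert {
    // The state: have we already fired for the current over-threshold episode?
    private boolean alerted = false;

    // Returns true only for the reading that first crosses the threshold.
    boolean onReading(float celsius) {
        if (celsius > 100f) {
            if (!alerted) {
                alerted = true;
                return true;
            }
            return false; // still over the threshold, already fired
        }
        alerted = false; // back under the threshold: re-arm the alert
        return false;
    }

    // Feeds readings through one alert instance in the order given.
    static List<Boolean> run(List<Float> readings) {
        OneShotAlert alert = new OneShotAlert();
        List<Boolean> fired = new ArrayList<>();
        for (float reading : readings) {
            fired.add(alert.onReading(reading));
        }
        return fired;
    }
}
```

The result depends on the order of the readings, which is why out-of-order data becomes a problem as soon as state is involved.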
Watermarking
- Event Time
- Buffers events for a defined period, until the watermark passes them
- Then emits the buffered data
Watermarking
- How long can we wait?
- What do we do if an event arrives after our wait timer has expired?
- How much memory does this use?
- What does this do to system latency?
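The buffering idea and the questions above can be made concrete with a small plain-Java sketch. It assumes the watermark trails the largest event time seen so far by a fixed maxDelay and that late events are simply counted and dropped; both are illustrative choices, and none of this is Flink's watermark API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch of watermark-style buffering; not Flink API.
class WatermarkBuffer {
    private final long maxDelay; // how long we are willing to wait, in event-time units
    private final PriorityQueue<Long> buffer = new PriorityQueue<>(); // ordered by event time
    private long watermark = Long.MIN_VALUE;
    private int lateCount = 0;

    WatermarkBuffer(long maxDelay) {
        this.maxDelay = maxDelay;
    }

    // Accepts one event time and returns the event times released by it, in order.
    List<Long> onEvent(long eventTime) {
        List<Long> emitted = new ArrayList<>();
        if (eventTime <= watermark) {
            lateCount++; // arrived after the wait expired: dropped in this sketch
            return emitted;
        }
        buffer.add(eventTime);
        watermark = Math.max(watermark, eventTime - maxDelay);
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            emitted.add(buffer.poll());
        }
        return emitted;
    }

    int lateCount() {
        return lateCount;
    }

    // Memory in use grows with the number of events still waiting.
    int buffered() {
        return buffer.size();
    }
}
```

In this sketch a larger maxDelay tolerates more disorder, but every event waits longer before it is emitted and more events sit in the buffer at once.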
Flink State
- Flink can capture state variables and remember them
- This hurts composition
- This means operators need to be serializable
- This means we have backup/restore operational stories
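To illustrate why serializable state leads to a backup/restore story, here is a sketch that snapshots a small state object to bytes and restores it. It uses java.io serialization as a stand-in; Flink's own serialization and checkpointing machinery is not shown, and the AlertState class and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical operator state; java.io serialization stands in for a real checkpoint.
class AlertState implements Serializable {
    private static final long serialVersionUID = 1L;

    boolean alerted;   // have we already fired?
    long sourceOffset; // where to resume reading the source

    // Backup: turn the state into bytes that could be written to durable storage.
    static byte[] snapshot(AlertState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restore: rebuild an equivalent state object from the saved bytes.
    static AlertState restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (AlertState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike the stateless case, saving the source offset alone is no longer enough here; the remembered variables have to be saved with it.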
Flink Windows
- What if we wanted to know the average temperature during an alert?
- We now need to collect data for an arbitrary period
- Windowing lets us aggregate data according to some criteria and emit a collection.
- The window starts when we pass the threshold and collects data until we drop back under it
- Fire the collected data points to an operator that computes an average
- Windows can be very simple too (e.g. collect 1 minute of data)
- Windows don't like to reopen once closed (tricky)
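Both kinds of window can be sketched in plain Java. This is an illustration under assumptions, not Flink's window API: the threshold window closes on the first reading at or under the threshold, and the simple window is modeled as fixed-size, non-overlapping time buckets. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the windows described above; not Flink's window API.
class WindowSketch {
    // Threshold window: opens on the first reading over the threshold,
    // closes on the first reading at or under it, then emits the average.
    static List<Float> alertAverages(List<Float> readings, float threshold) {
        List<Float> averages = new ArrayList<>();
        List<Float> window = new ArrayList<>();
        for (float reading : readings) {
            if (reading > threshold) {
                window.add(reading);
            } else if (!window.isEmpty()) {
                float sum = 0f;
                for (float r : window) {
                    sum += r;
                }
                averages.add(sum / window.size());
                window.clear();
            }
        }
        // A window still open at the end of the input emits nothing:
        // on an infinite stream it simply has not closed yet.
        return averages;
    }

    // Simple time window: start of the fixed-size window containing a timestamp.
    // Assumes non-negative timestamps in the same unit as sizeMillis.
    static long tumblingWindowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - (timestampMillis % sizeMillis);
    }
}
```

For example, with a one-minute window, a timestamp of 125,000 ms falls in the window starting at 120,000 ms.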
Final Graph
More interesting things
- Streams form a graph
- Can be forked
- Operators can emit zero to many data points.
- Cycles should be avoided
- We can query running state
- We need to advance forward down the graph
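Two of the points above, forking and operators that emit zero to many data points, can be sketched in plain Java. The comma-separated input format and all names here are assumptions made for illustration; this is not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a zero-to-many operator and a fork; not Flink API.
class GraphSketch {
    // One input line can yield zero, one, or many readings.
    // Assumes comma-separated numbers; a malformed token throws NumberFormatException.
    static List<Float> parseReadings(String line) {
        List<Float> out = new ArrayList<>();
        for (String token : line.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                out.add(Float.parseFloat(trimmed));
            }
        }
        return out;
    }

    // Applies the zero-to-many operator to every line and flattens the results.
    static List<Float> flatMap(List<String> lines) {
        List<Float> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(parseReadings(line));
        }
        return out;
    }

    // Fork, branch A: count the readings over a threshold.
    static long countOver(List<Float> readings, float threshold) {
        return readings.stream().filter(r -> r > threshold).count();
    }

    // Fork, branch B: the same readings feed an unrelated computation.
    static float max(List<Float> readings) {
        float best = Float.NEGATIVE_INFINITY;
        for (float r : readings) {
            best = Math.max(best, r);
        }
        return best;
    }
}
```

Both branches read the same upstream data and neither feeds back into the other, so the graph stays free of cycles.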
Streaming vs Micro Batching
- Streaming allows us to play data in Event Time order
- Windowing sounds similar to a batch, but allows more flexibility in terms of window size
Nothing New Under the Sun
- People have done this all before, but Flink as a framework assembles these features for us.
Flink 102
By Philip Doctor