Flat Buffers
https://google.github.io/flatbuffers/
Flashback to structs in C
struct contact
{
    char name[50];
    char phone[10];
    int age;
};
Data is stored as raw bytes, laid out back to back in memory. It's very compact (compared to serialized formats like JSON) and fast to read, but it's hard to transport. Why?
* If I handed someone the raw bytes at that memory address, they'd need to know how long each field is (e.g., the first 50 bytes are the name field)
* They'd need to know the byte order (big vs. little endian)
* What if the consumer is Java and not C? What's a struct?
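Here's what those three problems look like from the Java side: a minimal sketch of a consumer reading the raw bytes of `struct contact`. Everything about the layout is knowledge the reader has to hard-code. The sketch assumes no padding (50 + 10 = 60, so `age` already sits on a 4-byte boundary on typical platforms), a 4-byte `int`, and NUL-padded ASCII strings. The caller supplies the sender's byte order. `ContactReader` and its methods are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical reader for the raw bytes of C `struct contact`.
// ASSUMED layout (C does not guarantee it): no padding, so
//   name = bytes 0..49, phone = bytes 50..59, age = 4-byte int at 60.
// ASSUMED encoding: NUL-padded ASCII; the sender's byte order is passed in.
final class ContactReader {
    static final int NAME_OFFSET = 0, NAME_LEN = 50;
    static final int PHONE_OFFSET = 50, PHONE_LEN = 10;
    static final int AGE_OFFSET = 60;

    // Read a fixed-width C char array, stopping at the first NUL or at the field end
    static String readFixedString(byte[] raw, int offset, int len) {
        int end = offset;
        while (end < offset + len && raw[end] != 0) end++;
        return new String(raw, offset, end - offset, StandardCharsets.US_ASCII);
    }

    static String name(byte[] raw) { return readFixedString(raw, NAME_OFFSET, NAME_LEN); }
    static String phone(byte[] raw) { return readFixedString(raw, PHONE_OFFSET, PHONE_LEN); }

    // ByteBuffer defaults to big endian, so the sender's order must be known
    static int age(byte[] raw, ByteOrder senderOrder) {
        return ByteBuffer.wrap(raw).order(senderOrder).getInt(AGE_OFFSET);
    }
}
```

A 10-digit phone number fills `phone[10]` and leaves no room for a terminating NUL. That's why the string reader stops at the field boundary instead of trusting the terminator.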
Typical deserialization (not just Java)
* Given a byte array
* (for schemaless formats like JSON) scan the array for tokens and parse out the structure (e.g., look for {}, "", :, etc.)
* Create a new object in memory
* Copy the data into the new object
[ A1, 00, B7, 23 ]
class Foo {
    int bar; // 161 (bytes A1 00, little endian)
    int baz; // 9143 (bytes B7 23, little endian)
}
Costs: needed to know the byte order, had to copy the data, the data is in memory twice, extra GC
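Here is the eager approach as a Java sketch. It assumes the 2-byte little-endian fields from the byte array above, and this version of `Foo` is hypothetical. Every field is copied into a freshly allocated object up front, whether or not anyone ever reads it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of "typical" eager deserialization for the Foo example.
// ASSUMED layout: bar = bytes 0-1, baz = bytes 2-3, both little-endian 16-bit.
final class Foo {
    final int bar;
    final int baz;

    private Foo(int bar, int baz) {
        this.bar = bar;
        this.baz = baz;
    }

    static Foo deserialize(byte[] bytes) {
        // We have to know the byte order to decode anything
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        // Copy every field out now, whether or not the caller ever reads it
        int bar = bb.getShort(0);
        int baz = bb.getShort(2);
        // A new object holding a second copy of the data, later garbage for the GC
        return new Foo(bar, baz);
    }
}
```

Once deserialized, the object no longer depends on the byte array. If the source bytes change, the copy still holds the old values.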
Flat Buffer approach (simplified)
* Given a byte array
* Given a schema
* Read data "on demand" out of the byte array, at the offsets the schema defines
[ A1, 00, B7, 23 ]
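import java.nio.ByteBuffer
import java.nio.ByteOrder

class Foo(byteArray: ByteArray) {
    // Wrap, don't copy: the buffer is a view over the caller's array
    private val bb = ByteBuffer.wrap(byteArray).order(ByteOrder.LITTLE_ENDIAN)
    fun getBar(): Int {
        // jump to offset 0, read 2 bytes as a little-endian short
        return bb.getShort(0).toInt()
    }
    fun getBaz(): Int {
        // jump to offset 2, read 2 bytes as a little-endian short
        return bb.getShort(2).toInt()
    }
}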
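Because nothing is copied, one accessor object can be re-pointed at different spots in the same buffer. Walking many records then allocates nothing per record. FlatBuffers' generated Java accessors follow the same idea by accepting an existing object to reuse.

Below is a minimal Java sketch of that pattern. It assumes a hypothetical buffer of packed 4-byte `Foo` records. Real FlatBuffers tables are located through offsets, not fixed-size slots, and `FooView` is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of lazy, zero-copy access with a reusable accessor object.
// ASSUMED (hypothetical) layout: packed 4-byte records, each with
// bar = bytes 0-1 and baz = bytes 2-3 as little-endian 16-bit values.
final class FooView {
    static final int RECORD_SIZE = 4;

    private ByteBuffer bb;
    private int pos; // start of the record this view currently points at

    // Wrap (not copy) the caller's array with the agreed byte order
    static ByteBuffer wrap(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Re-point this view at another record instead of allocating a new object
    FooView assign(ByteBuffer bb, int pos) {
        this.bb = bb;
        this.pos = pos;
        return this;
    }

    int bar() { return bb.getShort(pos); }     // read on demand from offset 0
    int baz() { return bb.getShort(pos + 2); } // read on demand from offset 2

    // Sum bar over every record with one view object; baz is never touched
    static long sumBar(ByteBuffer bb) {
        FooView view = new FooView();
        long sum = 0;
        for (int p = 0; p + RECORD_SIZE <= bb.limit(); p += RECORD_SIZE) {
            sum += view.assign(bb, p).bar();
        }
        return sum;
    }
}
```

Unlike the eager copy, a view sees changes to the underlying bytes, because it never held its own copy of the data.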
Flat Buffer use (tl;dr)
* Write Schema file
* Generate Java classes (or C++/Python/etc.) from the schema
* Use FlatBufferBuilder to build FBO
* Traverse your byte buffer in place
Flat Buffer (physIQ)
* Write Schema file
contracts/flatbuffers/schema/com.physiq.vitalink.sdk.flatbuffers.series.fbs
table Int8Channel {
data:[byte];
}
union ChannelDataUnion {
Int8Channel,
...
StringChannel
}
table ChannelData {
readings:ChannelDataUnion;
}
table SamplingSetData {
channels:[ChannelData];
}
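So how does a reader find `data` inside an `Int8Channel` without parsing anything? Each table starts with a signed offset back to a vtable, and the vtable records where each field sits relative to the table. An entry of 0, or a vtable too short to contain the field, means "not present." That's how a reader built from a newer schema copes with an older buffer.

Below is a simplified, hand-rolled Java sketch of that lookup for the one-field `Int8Channel` table. It follows the FlatBuffers little-endian binary layout. `Int8ChannelDecoder` is a hypothetical illustration, not the generated class, and it assumes the buffer starts at index 0 with no file identifier.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hand-decoding `table Int8Channel { data:[byte]; }` in place.
// Layout followed (little endian):
//   [uint32 root offset] -> table; table starts with int32 soffset -> vtable;
//   vtable = [uint16 vtable size][uint16 table size][uint16 offset per field].
// ASSUMED: buffer starts at index 0, no file identifier. Hypothetical class.
final class Int8ChannelDecoder {
    private final ByteBuffer bb;
    private final int table;

    Int8ChannelDecoder(byte[] bytes) {
        bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        table = bb.getInt(0); // root offset is relative to position 0
    }

    // Offset of field `id` from the table start, or 0 if the field is absent
    private int fieldOffset(int id) {
        int vtable = table - bb.getInt(table); // soffset points back to the vtable
        int vtableSize = Short.toUnsignedInt(bb.getShort(vtable));
        int slot = 4 + 2 * id; // skip the two uint16 size entries
        return slot < vtableSize ? Short.toUnsignedInt(bb.getShort(vtable + slot)) : 0;
    }

    // Position of the `data` vector's length prefix, or -1 if absent
    private int dataVector() {
        int o = fieldOffset(0); // `data` is field 0 in the schema
        if (o == 0) return -1;
        int fieldPos = table + o;
        return fieldPos + bb.getInt(fieldPos); // offset is relative to the field itself
    }

    int dataLength() {
        int v = dataVector();
        return v < 0 ? 0 : bb.getInt(v);
    }

    // Elements follow the 4-byte length prefix; 0 if the vector is absent
    byte data(int j) {
        int v = dataVector();
        return v < 0 ? 0 : bb.get(v + 4 + j);
    }
}
```

No bytes are copied out until `data(j)` is called, and then only the one byte asked for.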
Flat Buffer (physIQ)
* Generate Java Classes
cloud/code/sdk/java-sdk/build/generated/source/flatbuffers-generator/main/java/com/physiq/vitalink/sdk/flatbuffers/series/Int8Channel.java
public final class Int8Channel extends Table {
  public static Int8Channel getRootAsInt8Channel(ByteBuffer _bb) { return getRootAsInt8Channel(_bb, new Int8Channel()); }
  ...
  public static int createInt8Channel(FlatBufferBuilder builder,
      int dataOffset) {
    builder.startObject(1);
    Int8Channel.addData(builder, dataOffset);
    return Int8Channel.endInt8Channel(builder);
  }
  ...
}
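Why does `createInt8Channel` take an `int dataOffset` rather than the data itself? FlatBufferBuilder writes the buffer back to front, and offsets to child objects are unsigned and point forward. So children (here the `data` vector) must be written first and then referenced by their offset.

Below is a deliberately tiny, hypothetical Java builder that emits bytes for just this one table. It skips alignment, vtable de-duplication and buffer growth, all of which the real FlatBufferBuilder handles, so treat it as an illustration of the ordering only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical back-to-front builder for `table Int8Channel { data:[byte]; }`.
// NOT the real FlatBufferBuilder: fixed capacity, no alignment, no vtable sharing.
final class TinyBuilder {
    private final ByteBuffer bb;
    private int head; // the next write goes just before this index

    TinyBuilder(int capacity) {
        bb = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
        head = capacity;
    }

    // Offsets are distances from the END of the buffer, so they stay valid
    // as more data is prepended in front of them
    int offset() { return bb.capacity() - head; }

    private void prependShort(int v) { head -= 2; bb.putShort(head, (short) v); }
    private void prependInt(int v) { head -= 4; bb.putInt(head, v); }

    // Vector of bytes: [uint32 length][bytes...]; returns the vector's offset
    int createByteVector(byte[] data) {
        head -= data.length;
        for (int i = 0; i < data.length; i++) bb.put(head + i, data[i]);
        prependInt(data.length);
        return offset();
    }

    // The child vector must already exist: we only store a forward offset to it
    int createInt8Channel(int dataOffset) {
        prependInt(0);
        int fieldOff = offset();
        bb.putInt(head, fieldOff - dataOffset); // distance from this field to the vector
        prependInt(0);                          // placeholder for the table's soffset
        int tableOff = offset();
        prependShort(4);                        // field 0 lives 4 bytes into the table
        prependShort(8);                        // table size: soffset + one uoffset
        prependShort(6);                        // vtable size: three uint16 entries
        int vtableOff = offset();
        // table position - soffset = vtable position
        bb.putInt(bb.capacity() - tableOff, vtableOff - tableOff);
        return tableOff;
    }

    // Prepend the root offset and return only the bytes actually used
    byte[] finish(int rootOffset) {
        prependInt(0);
        bb.putInt(head, offset() - rootOffset);
        return Arrays.copyOfRange(bb.array(), head, bb.capacity());
    }
}
```

The same ordering is why the nested `buildVitalHRSeriesFrame` code works. Function arguments are evaluated before the call, so the innermost vectors and tables are written first.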
Flat Buffer (physIQ)
* Build FBO
sampling_sets:
  - alias: min-avg
    channels:
      - alias: hr
        desc: Minute average of heart rate.
        name: Heart Rate
        type: INT16
        classification: HR
        units: BPM
table SeriesFrame {
frameId:long;
samplingSets:[SamplingSetData];
}
table SamplingSetData {
id:byte;
channels:[ChannelData];
}
cloud/code/flink/processor/sbm/src/test/kotlin/com/physiq/vitalink/timeseries/sbm/DataHelpers.kt
import com.google.flatbuffers.FlatBufferBuilder
import com.physiq.vitalink.sdk.flatbuffers.series.*

fun buildVitalHRSeriesFrame(frameNumber: Int, x: ShortArray): ByteArray {
    return FlatBufferBuilder().apply {
        finish(SeriesFrame.createSeriesFrame(
            this, // builder
            17460 + frameNumber.toLong(), // frame id
            -1, // ingested at micros
            SeriesFrame.createSamplingSetsVector(
                this, // builder
                intArrayOf(SamplingSetData.createSamplingSetData(
                    this, // builder
                    0.toByte(), // sampling set number
                    0, // start offset
                    SamplingSetData.createChannelsVector(
                        this, // builder
                        intArrayOf(ChannelData.createChannelData(
                            this, // builder
                            0.toByte(), // vector number
                            ReadingType.INT16,
                            Int16Channel.createInt16Channel(
                                this, // builder
                                Int16Channel.createDataVector(
                                    this, // builder
                                    x // shortArray
                                )
                            )
                        ))
                    )
                ))),
            -1,
            0
        ))
    }.sizedByteArray()
}
val result = Int16Channel().apply {
    flatBufferObject.obj.dataAsSeriesFrame().samplingSets(0)
        .channels(0).readings(this)
}
val hr = result.data(0)
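`result.data(0)` reads a single reading straight out of the buffer. To aggregate, for example to compute the minute-average heart rate, you can loop over the vector in place instead of copying it into a `ShortArray`. With the generated class, that's a loop over `dataLength()` and `data(i)`.

Below is a plain-Java sketch of the same arithmetic. It assumes `vectorPos` already points at the vector's 4-byte length prefix, which the generated accessors normally compute for you. `Int16VectorStats` is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: average a FlatBuffers-style [short] vector in place.
// ASSUMED: vectorPos points at the uint32 length prefix; elements follow as
// little-endian int16 values. No ShortArray is allocated.
final class Int16VectorStats {
    static double average(ByteBuffer bb, int vectorPos) {
        // duplicate() shares the bytes; we then set the byte order on our copy of the view
        ByteBuffer le = bb.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int n = le.getInt(vectorPos);
        if (n == 0) return Double.NaN;
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += le.getShort(vectorPos + 4 + 2 * i); // element i, read in place
        }
        return (double) sum / n;
    }
}
```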
Read the data without copies
short hp = monster.hp();
Vec3 pos = monster.pos();
Compare to Google's example code (the Monster example from the FlatBuffers tutorial)
Benchmarks and Benefits
Also: less GC = better performance for us JVM'ers
Further Reading
Flat buffers https://google.github.io/flatbuffers/
C structs https://www.geeksforgeeks.org/structures-c/
Flat Buffers
By Philip Doctor