Ops
TALKS
Knowledge worth sharing
#02
Karim Lamouri - Machine Learning Team Lead - @GumGum
AVRO OVERVIEW
Agenda
What DOES it DO
***
Basics
***
DEEP dive
***
CHEATSHEET
What does it do
/ Avro - What is it? /
Avro - What is it?
> Avro provides
- Rich data structures.
- A compact, fast, binary data format.
- A container file, to store persistent data.
- Remote procedure call (RPC).
Avro is a serialization system
> Avro relies on schemas
When Avro data is read, the schema used when writing it is always present, so each datum can be written with no per-value overhead. This makes serialization both fast and small.
When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program.
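The container file layout can be sketched in pure Python: an object container file starts with the magic bytes `Obj\x01`, followed by a metadata map whose `avro.schema` entry holds the full writer schema as JSON. A minimal sketch (stdlib only; helper names are mine, and it assumes positive map block counts - the real format also allows negative counts followed by a byte size):

```python
import io
import json
import os

def write_long(buf, n):
    """Write a long with Avro's zig-zag + variable-length encoding."""
    n = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes -> small codes
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            buf.write(bytes([b | 0x80]))
        else:
            buf.write(bytes([b]))
            break

def read_long(buf):
    """Read a zig-zag variable-length long."""
    shift, result = 0, 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)

def write_header(buf, schema):
    """Write an Avro container-file header embedding the writer schema."""
    buf.write(b"Obj\x01")                       # container-file magic
    meta = {"avro.schema": json.dumps(schema).encode(),
            "avro.codec": b"null"}
    write_long(buf, len(meta))                  # one map block
    for key, value in meta.items():
        raw = key.encode()
        write_long(buf, len(raw)); buf.write(raw)
        write_long(buf, len(value)); buf.write(value)
    write_long(buf, 0)                          # end of map
    buf.write(os.urandom(16))                   # sync marker

def read_schema(buf):
    """Read back the schema stored in the container-file header."""
    assert buf.read(4) == b"Obj\x01", "not an Avro container file"
    meta = {}
    while True:
        count = read_long(buf)                  # assumes positive block counts
        if count == 0:
            break
        for _ in range(count):
            key = buf.read(read_long(buf)).decode()
            meta[key] = buf.read(read_long(buf))
    return json.loads(meta["avro.schema"])

schema = {"type": "record", "name": "Callback",
          "fields": [{"name": "callback_uuid", "type": "string"}]}
buf = io.BytesIO()
write_header(buf, schema)
buf.seek(0)
print(read_schema(buf) == schema)  # True: the writer schema travels with the file
```

Because the schema travels in the file header, any later program can decode the data without out-of-band coordination.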
Basics
/ How to represent Avro / Bare minimum / More functionalities / Higher level representation /
The bare minimum
{
"namespace": "com.gumgum.avro.verity",
"type": "record",
"name": "Callback",
"fields": [
{
"name": "callback_uuid",
"type": "string"
},
{
"name": "target_url",
"type": "string"
}
]
}
> An AVRO Schema
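To illustrate how compact the binary format is: given this schema, a `Callback` record serializes to just its fields in order, each string as a zig-zag varint length followed by UTF-8 bytes - no field names, no delimiters on the wire. A hand-rolled sketch (the `encode_callback` helper is mine; real code would use an Avro library):

```python
def encode_callback(callback_uuid, target_url):
    """Encode a Callback record per the schema above.
    For strings shorter than 64 bytes, the zig-zag varint
    length fits in a single byte equal to 2 * len."""
    out = b""
    for s in (callback_uuid, target_url):
        raw = s.encode("utf-8")
        assert len(raw) < 64  # keep the single-byte-length shortcut valid
        out += bytes([len(raw) * 2]) + raw
    return out

print(encode_callback("abc", "http://x"))
# b'\x06abc\x10http://x' -- 13 bytes, field names never hit the wire
```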
More Functionalities
{
"protocol": "Callback",
"namespace": "com.gumgum.avro.verity",
"types": [
{
"type": "record",
"name": "Callback",
"fields": [
{
"name": "callback_uuid",
"type": "string"
},
{
"name": "target_url",
"type": "string"
}
]
}
],
"messages": {}
}
> AVRO Protocol
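The empty `messages` block above is where RPC messages would go. As a hypothetical illustration (the `reportCallback` name is invented, not part of this deck), a one-way message taking a `Callback` could look like:

```json
"messages": {
  "reportCallback": {
    "request": [{"name": "callback", "type": "Callback"}],
    "response": "null",
    "one-way": true
  }
}
```

Per the Avro spec, a one-way message must declare a `null` response.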
HIGHER LEVEL REPRESENTATION
@namespace("com.gumgum.avro.verity")
protocol Callback {
record Callback {
string callback_uuid;
string target_url;
}
}
> AVRO IDL
Deep-Dive
/ Schema Registry / Schema Compatibility / Tips for easy schema evolution / Schema evolution and Kafka Connect /
SCHEMA REGISTRY
If we always need a schema, do we always send the schema to Kafka as well?
Yes and no. The Kafka library seamlessly sends a pointer to the schema rather than the schema itself.
The Kafka SerDe libraries communicate with the Schema Registry to get a unique ID for each schema.
After the Avro serialization, 5 bytes are prepended to the Avro binary:

| Bytes | Content |
|---|---|
| Byte 0 | Magic byte |
| Bytes 1-4 | 4-byte schema ID as returned by Schema Registry |
| Bytes 5-... | Serialized data for the specified schema format |
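That framing can be sketched with Python's stdlib `struct` (the magic byte is 0; schema ID 42 and the helper names are just examples):

```python
import struct

MAGIC_BYTE = 0

def frame(schema_id, avro_payload):
    """Prepend the wire-format header: 1 magic byte + 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message):
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    assert magic == MAGIC_BYTE, "unknown magic byte"
    return schema_id, message[5:]

framed = frame(42, b"\x06abc")
print(unframe(framed))  # (42, b'\x06abc')
```

The consumer reads the schema ID, fetches (and caches) the schema from the registry, then decodes the payload.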
COMPATIBILITY MATRIX
| Compatibility Type | Changes allowed | Check against which schemas | Upgrade first |
|---|---|---|---|
| BACKWARD | Delete fields, add optional fields | Last version | Consumers |
| BACKWARD_TRANSITIVE | Delete fields, add optional fields | All previous versions | Consumers |
| FORWARD | Add fields, delete optional fields | Last version | Producers |
| FORWARD_TRANSITIVE | Add fields, delete optional fields | All previous versions | Producers |
| FULL | Add optional fields, delete optional fields | Last version | Any order |
| FULL_TRANSITIVE | Add optional fields, delete optional fields | All previous versions | Any order |
| NONE | All changes are accepted | Compatibility checking disabled | Depends |
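The BACKWARD rule in the matrix can be sketched as a check over record schemas: a new (reader) schema stays backward-compatible with the previous (writer) schema as long as every field it adds carries a default. A simplified sketch - real registries also check types, aliases, and union branches:

```python
def is_backward_compatible(reader_schema, writer_schema):
    """True if data written with writer_schema can be read with reader_schema.
    Simplified: only checks that fields added by the reader have defaults."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    for field in reader_schema["fields"]:
        if field["name"] not in writer_fields and "default" not in field:
            return False  # reader adds a required field -> old data unreadable
    return True

v1 = {"type": "record", "name": "Callback",
      "fields": [{"name": "callback_uuid", "type": "string"},
                 {"name": "target_url", "type": "string"}]}

# Adding an optional field with a default: allowed under BACKWARD.
v2_ok = {"type": "record", "name": "Callback",
         "fields": v1["fields"] + [{"name": "retries",
                                    "type": ["null", "int"],
                                    "default": None}]}
# Adding a required field with no default: rejected.
v2_bad = {"type": "record", "name": "Callback",
          "fields": v1["fields"] + [{"name": "retries", "type": "int"}]}

print(is_backward_compatible(v2_ok, v1), is_backward_compatible(v2_bad, v1))
# True False
```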
Tips for easy schema evolution
/ Make fields nullable / Add default value to fields / Only remove fields with nullable and default value /
TIPS FOR EASY SCHEMA EVOLUTION
- Make fields nullable
"fields" : [
{"name": "next", "type": ["null", "int"]} // optional next element
]
- Add default value to fields
- Only remove fields with nullable and default value
"fields" : [
{"name": "next", "type": ["null", "int"], "default": 0} // default value for next
]
SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCE
Schema evolution: NONE
[Diagram: event stream E1, E2, E2, E1 -> sink output S1, S2, S1]
SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCE
Schema evolution: FORWARD
[Diagram: event stream E1, E2, E2, E1 -> sink output S1]
SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCE
Schema evolution: BACKWARD & FULL
[Diagram: event stream E1, E2, E2, E1 -> sink output S1, S2]
CHEATSHEET
By Florian