Florian Dambrine
I am a freelance DevOps Engineer who graduated from UTC (University of Technology of Compiègne) in 2014. I am a DevOps enthusiast embracing cloud computing technologies to build automated infrastructure at large scale.
Knowledge worth sharing
#02
Karim Lamouri - Machine Learning Team Lead - @GumGum
> Avro provides
Avro is a data serialization system.
> Avro relies on schemas
When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overhead, making serialization both fast and compact.
When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program.
{
  "namespace": "com.gumgum.avro.verity",
  "type": "record",
  "name": "Callback",
  "fields": [
    { "name": "callback_uuid", "type": "string" },
    { "name": "target_url", "type": "string" }
  ]
}
> An AVRO Schema
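To see why schema-driven serialization is compact, here is a minimal, stdlib-only sketch of Avro's binary encoding for the Callback record above (the helper names and sample values are mine, not part of any library):

```python
import io

def write_long(buf, n):
    # Avro longs use zig-zag encoding followed by variable-length (varint) bytes.
    n = (n << 1) ^ (n >> 63)
    while (n & ~0x7F) != 0:
        buf.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    buf.write(bytes([n]))

def write_string(buf, s):
    # An Avro string is its byte length (encoded as a long) followed by UTF-8 bytes.
    data = s.encode("utf-8")
    write_long(buf, len(data))
    buf.write(data)

def encode_callback(callback_uuid, target_url):
    # A record is just its fields encoded back-to-back, in schema order.
    # No field names are written -- the schema carries that information,
    # which is what keeps Avro payloads small.
    buf = io.BytesIO()
    write_string(buf, callback_uuid)
    write_string(buf, target_url)
    return buf.getvalue()

payload = encode_callback("123e4567", "https://example.com/cb")
print(len(payload))  # 32 bytes: two 1-byte length prefixes + the raw UTF-8
```

The same 30 bytes of string data serialized as JSON would repeat the field names in every message; Avro writes only the values because reader and writer share the schema.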
{
  "protocol": "Callback",
  "namespace": "com.gumgum.avro.verity",
  "types": [
    {
      "type": "record",
      "name": "Callback",
      "fields": [
        { "name": "callback_uuid", "type": "string" },
        { "name": "target_url", "type": "string" }
      ]
    }
  ],
  "messages": {}
}
> AVRO Protocol
HIGHER LEVEL REPRESENTATION
@namespace("com.gumgum.avro.verity")
protocol Callback {
  record Callback {
    string callback_uuid;
    string target_url;
  }
}
> AVRO IDL
SCHEMA REGISTRY
If we always need a schema, do we also send the schema to Kafka with every message?
Yes and no. The Kafka SerDe libraries communicate with the Schema Registry to obtain a unique id for each schema, and seamlessly send that pointer to the schema rather than the schema itself.
After Avro serialization, 5 bytes are prepended to the Avro binary payload:
Byte 0: the magic byte (always 0)
Bytes 1-4: the schema id, a 4-byte big-endian integer assigned by the Schema Registry
Bytes 5-...: the Avro serialized data
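The 5-byte prefix can be sketched with the stdlib struct module (the schema id and payload below are made-up values for illustration):

```python
import struct

MAGIC_BYTE = 0  # byte 0: always zero in the Schema Registry wire format

def frame(schema_id, avro_payload):
    # bytes 1-4: the Schema Registry id as a big-endian 32-bit int,
    # followed by the raw Avro binary from byte 5 onward.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message):
    # A consumer reads the first 5 bytes, looks the id up in the
    # Schema Registry, then decodes the rest with that schema.
    magic, schema_id = struct.unpack(">bI", message[:5])
    assert magic == MAGIC_BYTE, "not a Schema-Registry framed message"
    return schema_id, message[5:]

framed = frame(42, b"\x10avro-data")
schema_id, payload = unframe(framed)
print(schema_id, len(framed) - len(payload))  # 42 5
```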
COMPATIBILITY MATRIX
| Compatibility Type | Changes allowed | Check against which schemas | Upgrade first |
|---|---|---|---|
| BACKWARD | Delete fields, add optional fields | Last version | Consumers |
| BACKWARD_TRANSITIVE | Delete fields, add optional fields | All previous versions | Consumers |
| FORWARD | Add fields, delete optional fields | Last version | Producers |
| FORWARD_TRANSITIVE | Add fields, delete optional fields | All previous versions | Producers |
| FULL | Add optional fields, delete optional fields | Last version | Any order |
| FULL_TRANSITIVE | Add optional fields, delete optional fields | All previous versions | Any order |
| NONE | All changes are accepted | Compatibility checking disabled | Depends |
TIPS FOR EASY SCHEMA EVOLUTION
- Make fields nullable
- Add default values to fields
- Only remove fields that are nullable and have a default value
"fields" : [
  {"name": "next", "type": ["null", "int"]} // optional (nullable) next element
]
"fields" : [
  {"name": "next", "type": ["null", "int"], "default": null} // the default of a union field must match its first type, hence null
]
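To see why defaults matter, here is a toy re-implementation (not the Avro library itself) of the field-resolution step a consumer performs when its reader schema is newer than the writer's schema; the field names echo the Callback record from earlier:

```python
def resolve_record(old_record, new_schema_fields):
    # Sketch of Avro schema resolution: fields absent from the writer's
    # data are filled from the reader schema's "default". This only works
    # if every added field carries a default -- hence the tips above.
    resolved = {}
    for field in new_schema_fields:
        if field["name"] in old_record:
            resolved[field["name"]] = old_record[field["name"]]
        elif "default" in field:
            resolved[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value and no default for {field['name']}")
    return resolved

# Reader schema: the old Callback record plus a new optional "next" field.
new_fields = [
    {"name": "callback_uuid", "type": "string"},
    {"name": "target_url", "type": "string"},
    {"name": "next", "type": ["null", "int"], "default": None},
]
# A record written with the old schema, which knows nothing about "next".
old = {"callback_uuid": "123e4567", "target_url": "https://example.com/cb"}
print(resolve_record(old, new_fields))
```

Had "next" been added without a default, the consumer would fail on every record written before the schema change.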
SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCES
E1
E2
E2
E1
Schema evolution: NONE
E1
E2
E2
E1
S1
S2
S1
SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCES
E1
E2
E2
E1
Schema evolution: FORWARD
E1
E2
E2
E1
S1
SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCES
E1
E2
E2
E1
Schema evolution: BACKWARD & FULL
E1
E2
E2
E1
S1
S2
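For Confluent's storage sink connectors (S3/HDFS), the behaviour illustrated above is controlled by the schema.compatibility connector property. A minimal sketch of an S3 sink config (the connector name, topic, and bucket are made-up values):

```json
{
  "name": "s3-sink-callbacks",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "verity-callbacks",
    "s3.bucket.name": "example-bucket",
    "schema.compatibility": "BACKWARD"
  }
}
```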
By Florian Dambrine
OpsTalks #02 - AVRO