Ops

 TALKS

Knowledge worth sharing

#02

Karim Lamouri - Machine Learning Team Lead - @GumGum

AVRO OVERVIEW

Agenda

  • What does it do
  • Basics
  • Deep dive
  • Cheatsheet

What does it do

/ Avro: what is it? /

Avro - What is it?

> Avro provides

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).

Avro is a serialization system

> Avro relies on schemas

When Avro data is read, the schema used when writing it is always present. This makes serialization both fast and small.

When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program.

Basics

/ How to represent Avro / Bare minimum / More functionalities / Higher level representation /

The bare minimum

{
  "namespace": "com.gumgum.avro.verity",
  "type": "record",
  "name": "Callback",
  "fields": [
    {
      "name": "callback_uuid",
      "type": "string"
    },
    {
      "name": "target_url",
      "type": "string"
    }
  ]
}

> An AVRO Schema
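To illustrate why a schema-driven format stays compact, here is a minimal sketch of Avro binary encoding for the Callback record above, written by hand rather than with an Avro library. It assumes string fields only, and the helper names (`encode_long`, `encode_string`, `encode_callback`) are made up for this example. Field names never appear in the output, so a reader needs the schema to interpret the bytes.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag encoding, then variable-length base-128 bytes."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_callback(record: dict) -> bytes:
    """Avro record: field values concatenated in schema order, no field names."""
    return encode_string(record["callback_uuid"]) + encode_string(record["target_url"])


if __name__ == "__main__":
    payload = encode_callback({"callback_uuid": "ab", "target_url": "c"})
    print(payload)  # 5 bytes in total for this record
```

In practice an Avro library does this work; the sketch only shows what ends up on the wire.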


More Functionalities

{
  "protocol": "Callback",
  "namespace": "com.gumgum.avro.verity",
  "types": [
    {
      "type": "record",
      "name": "Callback",
      "fields": [
        {
          "name": "callback_uuid",
          "type": "string"
        },
        {
          "name": "target_url",
          "type": "string"
        }
      ]
    }
  ],
  "messages": {}
}

> AVRO Protocol

HIGHER LEVEL REPRESENTATION

@namespace("com.gumgum.avro.verity")
protocol Callback {
  record Callback {
    string callback_uuid;
    string target_url;
  }
}

> AVRO IDL

Deep-Dive

/ Schema Registry / Schema compatibility / Tips for easy schema evolution / Schema evolution and Kafka Connect /

SCHEMA REGISTRY

If we always need a schema, do we always send the schema to Kafka as well?

Yes and no. The Kafka SerDe library transparently sends a pointer to the schema instead of the schema itself.

The Kafka SerDe libraries communicate with the Schema Registry to get the unique ID of a schema.
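The "pointer to a schema" idea can be pictured with a toy, in-memory stand-in for a registry. This is only a sketch: the class `InMemorySchemaRegistry` and its methods are hypothetical and do not mirror the real Schema Registry API. The assumption is simply that registering the same schema twice yields the same ID, and that a consumer can look the schema up from that ID.

```python
import json


class InMemorySchemaRegistry:
    """Toy registry: maps each distinct schema to a small integer ID."""

    def __init__(self):
        self._ids = {}      # canonical schema text -> ID
        self._schemas = {}  # ID -> schema

    def register(self, schema: dict) -> int:
        # Sorting keys gives a stable text form, so equal schemas share one ID.
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[key] = new_id
            self._schemas[new_id] = schema
        return self._ids[key]

    def get(self, schema_id: int) -> dict:
        return self._schemas[schema_id]


if __name__ == "__main__":
    registry = InMemorySchemaRegistry()
    schema = {"type": "record", "name": "Callback", "fields": []}
    schema_id = registry.register(schema)
    # The producer ships only schema_id; the consumer resolves it back.
    print(schema_id, registry.get(schema_id)["name"])
```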

 

After Avro serialization, 5 bytes are prepended to the Avro binary:

  • Byte 0: magic byte
  • Bytes 1-4: 4-byte schema ID as returned by Schema Registry
  • Bytes 5-...: serialized data for the specified schema format
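The byte layout above can be sketched with Python's `struct` module. Two details are assumptions not stated on the slide: the magic byte value is taken to be 0 and the schema ID is taken to be big-endian. The function names `frame` and `unframe` are hypothetical.

```python
import struct

MAGIC_BYTE = 0  # assumption: the slide does not give the value


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend the 5-byte header (magic byte + 4-byte schema ID) to the Avro binary."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> tuple:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


if __name__ == "__main__":
    framed = frame(42, b"\x04ab\x02c")
    schema_id, payload = unframe(framed)
    print(schema_id, payload)
```

A consumer would use the recovered schema ID to fetch the writer schema before decoding the payload.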

COMPATIBILITY MATRIX

Compatibility type    Changes allowed                               Check against which schemas       Upgrade first
BACKWARD              Delete fields; add optional fields            Last version                      Consumers
BACKWARD_TRANSITIVE   Delete fields; add optional fields            All previous versions             Consumers
FORWARD               Add fields; delete optional fields            Last version                      Producers
FORWARD_TRANSITIVE    Add fields; delete optional fields            All previous versions             Producers
FULL                  Add optional fields; delete optional fields   Last version                      Any order
FULL_TRANSITIVE       Add optional fields; delete optional fields   All previous versions             Any order
NONE                  All changes are accepted                      Compatibility checking disabled   Depends
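The "changes allowed" column can be reproduced with a deliberately simplified model. In this sketch a schema is just a mapping from field name to whether the field has a default ("optional"); type changes and the transitive variants are ignored. The function names and the `ttl` field are hypothetical.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """A reader can decode writer data if every field it expects but the
    writer lacks has a default value to fall back on."""
    return all(
        has_default
        for name, has_default in reader.items()
        if name not in writer
    )


def is_backward(new: dict, old: dict) -> bool:
    """BACKWARD: consumers on the new schema can read data written with the old one."""
    return can_read(reader=new, writer=old)


def is_forward(new: dict, old: dict) -> bool:
    """FORWARD: consumers on the old schema can read data written with the new one."""
    return can_read(reader=old, writer=new)


def is_full(new: dict, old: dict) -> bool:
    return is_backward(new, old) and is_forward(new, old)


if __name__ == "__main__":
    old = {"callback_uuid": False, "target_url": False}
    add_required = {**old, "ttl": False}  # new field without a default
    add_optional = {**old, "ttl": True}   # new field with a default
    print(is_backward(add_required, old), is_forward(add_required, old))
    print(is_full(add_optional, old))
```

Adding a required field passes the forward check but fails the backward one, which matches the FORWARD and BACKWARD rows of the matrix.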

Tips for easy schema evolution

/ Make fields nullable / Add default value to fields / Only remove fields with nullable and default value /

TIPS FOR EASY SCHEMA EVOLUTION

  • Make fields nullable
  "fields" : [
    {"name": "next", "type": ["null", "int"]} // optional next element
  ]
  • Add default value to fields
  "fields" : [
    {"name": "next", "type": ["null", "int"], "default": null} // default value for next; null matches the first union branch
  ]
  • Only remove fields that are nullable and have a default value
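Why defaults make evolution easy can be seen in a small sketch of how a reader fills in missing fields. This is a simplified model of schema resolution, not the Avro library's implementation: records are plain dicts, field definitions follow the JSON shape used above, and the function name `resolve` is hypothetical.

```python
def resolve(written: dict, reader_fields: list) -> dict:
    """Project a decoded record onto the reader's fields.

    - fields present in the data are copied,
    - fields missing from the data take the reader's default,
    - fields unknown to the reader are dropped.
    """
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            result[name] = written[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


if __name__ == "__main__":
    reader_fields = [
        {"name": "callback_uuid", "type": "string"},
        {"name": "next", "type": ["null", "int"], "default": None},
    ]
    # Data written before "next" existed still resolves, thanks to the default.
    print(resolve({"callback_uuid": "abc"}, reader_fields))
```

Without the default on `next`, the same call would fail, which is the situation the tips above are meant to avoid.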

SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCES

Schema evolution: NONE

[Diagram: input events E1, E2, E2, E1; output labeled S1, S2, S1]

SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCES

Schema evolution: FORWARD

[Diagram: input events E1, E2, E2, E1; output labeled S1]

SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCES

Schema evolution: BACKWARD & FULL

[Diagram: input events E1, E2, E2, E1; output labeled S1, S2]
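One way to read the three diagrams above is as a rule for when a sink starts a new output with a different schema. The toy model below reproduces the S labels from the diagrams for the event sequence E1, E2, E2, E1. The rules themselves are an assumption inferred from those labels, not a description of the actual Kafka Connect implementation, and `output_schemas` is a hypothetical name.

```python
def output_schemas(event_versions: list, mode: str) -> list:
    """Return the schema versions of the outputs produced for a stream of events.

    Assumed rules (inferred from the diagrams):
    - NONE: any schema change starts a new output.
    - BACKWARD / FULL: only a newer schema starts a new output; older
      events are projected onto the latest schema seen.
    - FORWARD: every event is projected onto the first schema seen.
    """
    outputs = []
    for version in event_versions:
        if not outputs:
            outputs.append(version)
        elif mode == "NONE":
            if version != outputs[-1]:
                outputs.append(version)
        elif mode in ("BACKWARD", "FULL"):
            if version > outputs[-1]:
                outputs.append(version)
        elif mode == "FORWARD":
            continue  # keep writing with the first schema
        else:
            raise ValueError(f"unknown mode: {mode}")
    return outputs


if __name__ == "__main__":
    events = [1, 2, 2, 1]  # E1, E2, E2, E1
    for mode in ("NONE", "FORWARD", "BACKWARD"):
        print(mode, output_schemas(events, mode))
```

Fewer schema switches mean fewer, larger outputs, which is presumably the performance point the slides make.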

CHEATSHEET

OpsTalks #02 - AVRO

By Florian Dambrine