Ops

TALKS

Knowledge worth sharing

#02

Karim Lamouri - Machine Learning Team Lead - @GumGum

AVRO OVERVIEW

Agenda

What does it do

***

Basics

***

Deep-Dive

***

CHEATSHEET

What does it do

/ Avro: What is it? /

Avro - What is it?

> Avro provides

Rich data structures.
A compact, fast, binary data format.
A container file, to store persistent data.
Remote procedure call (RPC).

Avro is a serialization system

> Avro relies on schemas

When Avro data is read, the schema used to write it is always present. Each datum can therefore be written with no per-value overhead (no field names or type tags), which makes serialization both fast and small.

When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program.

Basics

/ How to represent Avro / Bare minimum / More functionalities / Higher level representation /

The bare minimum

{
  "namespace": "com.gumgum.avro.verity",
  "type": "record",
  "name": "Callback",
  "fields": [
    {
      "name": "callback_uuid",
      "type": "string"
    },
    {
      "name": "target_url",
      "type": "string"
    }
  ]
}

> An AVRO Schema
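To make the "compact binary format" claim concrete, here is a minimal hand-rolled sketch of Avro's binary encoding for this record, using only the Python standard library. It assumes the encoding rules from the Avro specification (a string is a zigzag varint length followed by UTF-8 bytes, a record is its fields concatenated in schema order). The helper names `encode_long`, `encode_string` and `encode_record` and the sample values are illustrative only; a real project would use an Avro library.

```python
import json

# The Callback schema from the slide, parsed as plain JSON.
SCHEMA = json.loads("""
{
  "namespace": "com.gumgum.avro.verity",
  "type": "record",
  "name": "Callback",
  "fields": [
    {"name": "callback_uuid", "type": "string"},
    {"name": "target_url", "type": "string"}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Encode a 64-bit integer as a zigzag, variable-length value."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small negatives become small positives
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its byte length (long) followed by UTF-8 data."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field values in schema order; no names or tags are written."""
    out = b""
    for field in schema["fields"]:
        if field["type"] != "string":
            raise ValueError("this sketch only handles string fields")
        out += encode_string(record[field["name"]])
    return out


# Hypothetical sample values.
payload = encode_record(SCHEMA, {"callback_uuid": "abc", "target_url": "http://x"})
print(len(payload))  # 13 bytes: two 1-byte lengths plus 3 + 8 bytes of text
```

The payload carries no field names, which is why a reader needs the writer's schema to make sense of the bytes.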


More Functionalities

{
  "protocol": "Callback",
  "namespace": "com.gumgum.avro.verity",
  "types": [
    {
      "type": "record",
      "name": "Callback",
      "fields": [
        {
          "name": "callback_uuid",
          "type": "string"
        },
        {
          "name": "target_url",
          "type": "string"
        }
      ]
    }
  ],
  "messages": {}
}

> AVRO Protocol

HIGHER LEVEL REPRESENTATION

@namespace("com.gumgum.avro.verity")

protocol Callback {
  record Callback {
    string callback_uuid;
    string target_url;
  }
}

> AVRO IDL

Deep-Dive

/ Schema Registry / Schema Compatibility / Tips for easy schema evolution / Schema evolution and Kafka Connect /

SCHEMA REGISTRY

If we always need a schema, do we always send the schema to Kafka as well?

Yes and no. The Kafka library transparently sends a pointer to the schema instead of the schema itself.

The Kafka SerDe libraries communicate with the Schema Registry to get the unique ID of a schema.

After Avro serialization, 5 bytes are prepended to the Avro binary payload:

Bytes	Content
0	Magic byte
1-4	4-byte schema ID as returned by Schema Registry
5-...	Serialized data for the specified schema format
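The 5-byte header can be sketched with Python's `struct` module. This is a minimal illustration, not the SerDe library's code: it assumes the magic byte is `0` and the schema ID is a big-endian 32-bit integer, as described in Confluent's wire format documentation. The function names `frame` and `unframe` are hypothetical.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # assumption: current wire format version


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend the magic byte and the 4-byte schema ID to an Avro payload."""
    # ">bI" = big-endian, one signed byte followed by one unsigned 32-bit int
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


# Hypothetical schema ID and payload.
framed = frame(42, b"\x06abc")
print(unframe(framed))
```

A consumer would use the extracted ID to fetch the writer's schema from the Schema Registry before decoding the remaining bytes.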

COMPATIBILITY MATRIX

Compatibility Type	Changes allowed	Check against which schemas	Upgrade first
BACKWARD	Delete fields; add optional fields	Last version	Consumers
BACKWARD_TRANSITIVE	Delete fields; add optional fields	All previous versions	Consumers
FORWARD	Add fields; delete optional fields	Last version	Producers
FORWARD_TRANSITIVE	Add fields; delete optional fields	All previous versions	Producers
FULL	Add optional fields; delete optional fields	Last version	Any order
FULL_TRANSITIVE	Add optional fields; delete optional fields	All previous versions	Any order
NONE	All changes are accepted	Compatibility checking disabled	Depends
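The rules in the matrix can be approximated with a few lines of Python. This is a deliberately simplified sketch, not the Schema Registry's actual checker: it only looks at added and removed top-level fields of a record and treats a field as "optional" when it declares a default. Type promotions, aliases and nested records are ignored, and all function names are hypothetical.

```python
def missing_defaults(reader: dict, writer: dict) -> list:
    """Fields the reader expects that the writer never wrote and that lack a default."""
    written = {f["name"] for f in writer["fields"]}
    return [
        f["name"]
        for f in reader["fields"]
        if f["name"] not in written and "default" not in f
    ]


def is_backward_compatible(new: dict, old: dict) -> bool:
    # BACKWARD: consumers on the new schema must read data written with the old one.
    return not missing_defaults(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    # FORWARD: consumers still on the old schema must read data written with the new one.
    return not missing_defaults(reader=old, writer=new)


def is_full_compatible(new: dict, old: dict) -> bool:
    return is_backward_compatible(new, old) and is_forward_compatible(new, old)


old = {"type": "record", "name": "Callback", "fields": [
    {"name": "callback_uuid", "type": "string"},
    {"name": "target_url", "type": "string"},
]}
# Hypothetical evolutions of the schema above.
add_optional = {**old, "fields": old["fields"] + [
    {"name": "retries", "type": "int", "default": 0}]}
add_required = {**old, "fields": old["fields"] + [
    {"name": "retries", "type": "int"}]}
delete_required = {**old, "fields": old["fields"][:1]}

print(is_full_compatible(add_optional, old))
print(is_backward_compatible(add_required, old), is_forward_compatible(add_required, old))
print(is_backward_compatible(delete_required, old), is_forward_compatible(delete_required, old))
```

Adding a field without a default breaks BACKWARD only, and deleting a field without a default breaks FORWARD only, which matches the "Changes allowed" column.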

Tips for easy schema evolution

/ Make fields nullable / Add default value to fields / Only remove fields that are nullable and have a default value /

TIPS FOR EASY SCHEMA EVOLUTION

Make fields nullable

  "fields" : [
    {"name": "next", "type": ["null", "int"]} // optional next element
  ]

Add default value to fields

  "fields" : [
    {"name": "next", "type": ["null", "int"], "default": null} // default value for next; it must match the first type of the union
  ]

Only remove fields that are nullable and have a default value
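Why defaults make evolution easy can be shown with a small sketch of what a reader does when the writer's data lacks a field. This is an illustration of the idea behind Avro schema resolution rather than a library implementation: the `resolve` function and the reader schema below are hypothetical, and records are represented as already-decoded Python dicts.

```python
def resolve(written: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader schema, filling in defaults."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            result[name] = written[name]
        elif "default" in field:
            # Field missing from the written data: fall back to the default.
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result  # fields unknown to the reader are dropped


# Hypothetical reader schema: "next" is nullable with a null default.
READER_SCHEMA = {"type": "record", "name": "Node", "fields": [
    {"name": "value", "type": "int"},
    {"name": "next", "type": ["null", "int"], "default": None},
]}

# Data written before "next" existed, plus a field the reader no longer knows.
print(resolve({"value": 1, "extra": "x"}, READER_SCHEMA))
```

Without the default on `next`, the same call would raise, which is the situation a compatibility check is meant to catch before deployment.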

SCHEMA EVOLUTION & KAFKA-CONNECT PERFORMANCE

Schema evolution: NONE

Schema evolution: FORWARD

Schema evolution: BACKWARD & FULL

CHEATSHEET

   https://lowess.github.io/ops-talks/cheatsheet/ops-talks-02/


https://lowess.github.io/ops-talks

Ops-Talks #02

By Florian Dambrine


