Avro with Apache Kafka: Schema Registry and Schema Evolution

How to use Apache Avro schemas with Kafka — covering Schema Registry, compatibility rules, and safe schema evolution

February 2026 · 12 min read

Why Avro and Kafka Go Together

Apache Kafka moves enormous amounts of data between services. JSON works fine at small scale, but at millions of messages per second, you start hitting real problems: wasted bandwidth from repeating field names, no schema enforcement, and breaking changes that silently corrupt downstream consumers.

Apache Avro solves these. It's the most popular serialization format for Kafka — especially in the Confluent ecosystem — because it offers compact binary encoding, schema-enforced data, and built-in support for schema evolution without breaking existing consumers. If you're new to Avro itself, start with What is Apache Avro? before continuing here.

Smaller Messages

Binary encoding without field names. Avro messages are typically 50–70% smaller than equivalent JSON, reducing Kafka storage and network costs.

Schema Enforcement

Every message is validated against a schema at publish time. Malformed data never enters your Kafka topic.

Safe Evolution

Add or remove fields without breaking existing consumers. Schema Registry enforces compatibility rules before any change goes live.

The Problem with JSON in Kafka

Here's a concrete example of why JSON causes pain at scale:

// JSON message — every record repeats field names
{"userId": 12345, "event": "purchase", "amount": 99.99, "currency": "USD"}
{"userId": 67890, "event": "purchase", "amount": 24.99, "currency": "USD"}
// At 10 million messages/day: ~100MB just for field name repetition
// Avro message — schema stored once in Schema Registry, data is binary
// Same two records: ~40% smaller, type-safe, schema-validated

With Avro, the schema lives once in Schema Registry. Each Kafka message carries only a small schema ID (4 bytes) and the binary-encoded field values. Consumers look up the schema ID to deserialize — no field names in every message.
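If you are curious what that framing looks like on the wire, here is a minimal Python sketch that splits a Confluent-framed message value into its schema ID and Avro payload. The 1-byte magic prefix and 4-byte ID are described in step 2 below; raw would be the bytes returned by msg.value() on a consumer.

import struct

def split_confluent_message(raw: bytes):
    """Split a Confluent-framed Avro message into (schema_id, avro_payload).

    Layout: 1 magic byte (0x00) + 4-byte big-endian schema ID + binary Avro data.
    """
    if len(raw) < 5 or raw[0] != 0:
        raise ValueError("not a Confluent Avro-framed message")
    schema_id = struct.unpack(">I", raw[1:5])[0]
    return schema_id, raw[5:]

# Example with a hypothetical consumer message:
# schema_id, payload = split_confluent_message(msg.value())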

What is Schema Registry?

Confluent Schema Registry is a centralized service that stores and manages Avro schemas for your Kafka topics. Think of it as a schema version control system.

1

Producer registers schema

The Avro serializer sends the schema to Schema Registry on first use. If compatible with previous versions, it gets a numeric ID (e.g., id=3).

2

Message is published with schema ID

Each Kafka message starts with a magic byte (0x00), 4-byte schema ID, then binary Avro data. No schema embedded in the message.

3

Consumer fetches schema and deserializes

The Avro deserializer reads the schema ID, fetches the schema from Registry (cached after first call), and deserializes the binary data back to a record.

The key benefit: Schema Registry enforces compatibility rules. If you try to register a schema that breaks existing consumers, the registry rejects it before it reaches Kafka.

# Register a schema via REST API
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"}' \
  http://localhost:8081/subjects/users-value/versions

# Response: {"id": 1}
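You can do the same registration from Python with the confluent-kafka client instead of curl. A minimal sketch against the same local registry and users-value subject as the curl example above:

from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

client = SchemaRegistryClient({'url': 'http://localhost:8081'})

schema_str = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"}
  ]
}
"""

# Registers the schema under the subject and returns its numeric ID.
# An incompatible schema is rejected by the registry (HTTP 409) and
# surfaces in the client as a SchemaRegistryError.
schema_id = client.register_schema('users-value', Schema(schema_str, 'AVRO'))
print(schema_id)  # e.g. 1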

Schema Evolution in Kafka

Schema evolution is the ability to change your schema over time without breaking existing producers or consumers. It's one of Avro's biggest strengths and a core reason why it pairs so well with Kafka. If you need a deep dive into schema types and structure first, the Avro Schema Guide covers all primitives, complex types, and field rules. Schema Registry enforces three main compatibility modes (each also has a transitive variant, listed later):

Compatibility | What it means              | Safe upgrade order
Backward      | New schema reads old data  | Upgrade consumers first
Forward       | Old schema reads new data  | Upgrade producers first
Full          | Both directions work       | Upgrade in any order

Safe vs Breaking Changes

✓ Safe changes

  • Add an optional field with a default value
  • Remove a field that had a default value
  • Add a type to a union (with care)
  • Change field order (Avro matches fields by name, not position)
  • Add an alias to an existing field

✗ Breaking changes

  • Remove a required field (no default)
  • Add a required field without a default
  • Change a field's type (e.g. int → string)
  • Rename a field without an alias
  • Change the record name or namespace

// Adding an optional field — backward AND forward compatible

{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": ["null", "int"], "default": null }
  ]
}

The new age field uses a union type ["null", "int"] with default: null. Old consumers that don't know about age will simply ignore it. New consumers reading old data will use the default null. For more ready-to-use schema patterns like this, see Avro Schema Examples.
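To watch this resolution happen outside Kafka, here is a small sketch using the fastavro library (an extra dependency not used elsewhere in this post): a record written with the old three-field schema is read back with the new schema, and the missing age falls back to its default.

import io
from fastavro import schemaless_writer, schemaless_reader, parse_schema

old_schema = parse_schema({
    "type": "record", "name": "User", "namespace": "com.example.avro",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
    ],
})

new_schema = parse_schema({
    "type": "record", "name": "User", "namespace": "com.example.avro",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
})

# Encode a record with the OLD writer schema (no age field).
buf = io.BytesIO()
schemaless_writer(buf, old_schema, {"id": 1, "name": "Alice", "email": "alice@example.com"})
buf.seek(0)

# Decode with the NEW reader schema: the missing age falls back to its default.
record = schemaless_reader(buf, old_schema, new_schema)
print(record)  # {'id': 1, 'name': 'Alice', 'email': 'alice@example.com', 'age': None}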

Tip: Use our Avro Schema Compatibility Checker to verify backward, forward, and full compatibility between your schemas before deploying to Kafka.
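You can also ask Schema Registry directly: its /compatibility endpoint tests a candidate schema against the latest registered version without registering anything. A minimal sketch using the requests library (an assumed dependency), reusing the users-value subject from earlier:

import json
import requests

candidate_schema = {
    "type": "record",
    "name": "User",
    "namespace": "com.example.avro",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
}

# POST the candidate against the latest registered version of the subject.
resp = requests.post(
    "http://localhost:8081/compatibility/subjects/users-value/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    json={"schema": json.dumps(candidate_schema)},
)
print(resp.json())  # {"is_compatible": true} when the change is safe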

Producer and Consumer Setup

Here's how to configure Kafka producers and consumers to use Avro serialization with Schema Registry. These examples use the Confluent Kafka Python client. Before wiring up producers and consumers, use the Avro Schema Validator to confirm your schema is valid, or the Avro Schema Generator to auto-generate one from your JSON data.

Kafka Producer (Python)

from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Schema Registry client
schema_registry_conf = {'url': 'http://localhost:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

# Define your Avro schema
schema_str = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"}
  ]
}
"""

avro_serializer = AvroSerializer(schema_registry_client, schema_str)

producer_conf = {
    'bootstrap.servers': 'localhost:9092',
}

producer = Producer(producer_conf)

# Produce a message
user = {"id": 1, "name": "Alice", "email": "alice@example.com"}
producer.produce(
    topic='users',
    value=avro_serializer(user, SerializationContext('users', MessageField.VALUE))
)
producer.flush()
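Two things you will usually add on top of this minimal producer are a serialized message key and a delivery callback, so failures don't go unnoticed. A hedged extension of the block above (it reuses producer, user, and avro_serializer from that block; delivery_report is just an illustrative name):

from confluent_kafka.serialization import StringSerializer

string_serializer = StringSerializer('utf_8')

def delivery_report(err, msg):
    # Invoked from producer.poll()/flush() once the broker acks (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce(
    topic='users',
    key=string_serializer(str(user['id']), SerializationContext('users', MessageField.KEY)),
    value=avro_serializer(user, SerializationContext('users', MessageField.VALUE)),
    on_delivery=delivery_report,
)
producer.flush()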

Kafka Consumer (Python)

from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_registry_conf = {'url': 'http://localhost:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

avro_deserializer = AvroDeserializer(schema_registry_client)

consumer_conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'user-consumer-group',
    'auto.offset.reset': 'earliest'
}

consumer = Consumer(consumer_conf)
consumer.subscribe(['users'])

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    user = avro_deserializer(
        msg.value(),
        SerializationContext(msg.topic(), MessageField.VALUE)
    )
    print(f"Received: {user}")
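The loop above runs until the process is killed. A slightly more robust variant of the same loop checks for broker-level errors and closes the consumer cleanly so its partitions are rebalanced promptly (names carried over from the block above):

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            # Log and skip broker-level errors instead of passing them to the deserializer.
            print(f"Consumer error: {msg.error()}")
            continue
        user = avro_deserializer(
            msg.value(),
            SerializationContext(msg.topic(), MessageField.VALUE)
        )
        print(f"Received: {user}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()  # commits final offsets and leaves the group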

Setting Compatibility Level

You can set compatibility per subject (topic) or globally via the Schema Registry REST API:

# Set compatibility for a specific topic's value schema
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL"}' \
  http://localhost:8081/config/users-value

# Options: BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE, NONE
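The same configuration can be driven from Python; this sketch (using the requests library as an assumed dependency) sets the subject's compatibility level and reads it back:

import requests

base = "http://localhost:8081"
headers = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# Set FULL compatibility for the subject (PUT /config/{subject}).
resp = requests.put(f"{base}/config/users-value", headers=headers,
                    json={"compatibility": "FULL"})
print(resp.json())  # {"compatibility": "FULL"}

# Read the effective setting back (GET /config/{subject}).
print(requests.get(f"{base}/config/users-value").json())  # {"compatibilityLevel": "FULL"}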

Recommendation: Use FULL for new topics when possible. This gives you the most flexibility in rolling upgrades without coordination between teams.

Avro vs JSON vs Protobuf in Kafka

Quick comparison of the most common serialization choices for Kafka:

Feature                  | Avro                | JSON         | Protobuf
Message size             | Small (binary)      | Large (text) | Smallest (binary)
Schema enforcement       | ✓ Built-in          | ✗ Manual     | ✓ Built-in
Schema Registry support  | Native              | Basic        | Supported
Schema evolution         | Excellent           | Manual       | Good
Human readable           | Schema yes, data no | Yes          | Schema yes, data no
Code generation required | Optional            | None         | Yes
Kafka ecosystem fit      | Best                | Simple cases | Good

Avro is the go-to for Kafka because Schema Registry was designed with Avro first. Protobuf is slightly more compact and works well too, especially if you already use it across services. JSON is fine for low-volume internal topics where readability matters more than efficiency. For a deeper comparison, see the Protobuf vs Avro guide.

Best Practices

1

Always use Schema Registry

Embedding schemas in messages wastes space and breaks versioning. Schema Registry is the correct approach for production Kafka with Avro.

2

Set compatibility to FULL for critical topics

Full compatibility allows rolling upgrades without coordination. Start strict and relax later if needed — it's harder to tighten after the fact.

3

Always add defaults to new fields

New fields without defaults break backward compatibility. Use ["null", "type"] unions with default: null for optional fields.

4

Use namespaces

Always set a namespace (e.g. com.example.events) to avoid name collisions between teams and services.

5

Check compatibility before deploying

Use our Avro Schema Compatibility Checker or the Schema Registry API /compatibility endpoint to test changes before they go live.
