Introduction
Apache Avro really clicks when you see a schema and its data side by side. Reading schemas in isolation only gets you so far; seeing the records they actually describe makes the format concrete.
Here are complete examples from production Kafka and Hadoop systems. Each shows the schema (blueprint) and sample data (what actually gets stored).
About These Examples
In production, Avro stores data in binary format. The examples below show JSON encoding so you can read them. Binary is what makes Avro efficient - typically 30-50% smaller than the equivalent JSON. Learn more at Apache Avro docs.
Example 1: Simple User Record
Let's start simple - a user record with basic fields. You'd use this in user services or auth systems.
Schema:
{
"type": "record",
"name": "User",
"namespace": "com.example.users",
"fields": [
{ "name": "id", "type": "int" },
{ "name": "username", "type": "string" },
{ "name": "email", "type": "string" },
{ "name": "age", "type": "int" },
{ "name": "active", "type": "boolean" }
]
}
Data:
{
"id": 1001,
"username": "john_doe",
"email": "[email protected]",
"age": 28,
"active": true
}
What's Happening
Schema defines five fields with types. Data provides values matching those types. In binary encoding, this would be stored without field names, making it way more compact. Try the JSON to Avro converter to see the difference.
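If you want to see this round trip in code, here's a minimal sketch using Python with the fastavro library (an assumption; any Avro binding works the same way):
import io
from fastavro import parse_schema, schemaless_reader, schemaless_writer

# The User schema from above, parsed once and reused
schema = parse_schema({
    "type": "record",
    "name": "User",
    "namespace": "com.example.users",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "active", "type": "boolean"},
    ],
})

user = {"id": 1001, "username": "john_doe", "email": "john.doe@example.com", "age": 28, "active": True}

# Encode: only the values are written; the schema supplies field names and order
buf = io.BytesIO()
schemaless_writer(buf, schema, user)

# Decode: the same schema is needed to interpret the bytes
buf.seek(0)
assert schemaless_reader(buf, schema) == user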
Example 2: Nested Records
Real data has nested structures. Here's a customer with an embedded address - common in e-commerce.
Schema:
{
"type": "record",
"name": "Customer",
"namespace": "com.example.customers",
"fields": [
{ "name": "customerId", "type": "long" },
{ "name": "name", "type": "string" },
{
"name": "address",
"type": {
"type": "record",
"name": "Address",
"fields": [
{ "name": "street", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "zipCode", "type": "string" }
]
}
}
]
}
Data:
{
"customerId": 100234567890,
"name": "Jane Smith",
"address": {
"street": "123 Main Street",
"city": "San Francisco",
"zipCode": "94102"
}
}
Nested Data
Address record sits inside Customer. Keeps related data together. When you send this over Kafka, the whole customer object travels as one compact message.
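In code the nested record is just a nested structure. A small sketch with Python and fastavro (assumed here, as in Example 1) that checks a customer dict against the schema:
from fastavro import parse_schema
from fastavro.validation import validate

customer_schema = parse_schema({
    "type": "record",
    "name": "Customer",
    "namespace": "com.example.customers",
    "fields": [
        {"name": "customerId", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "address", "type": {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "street", "type": "string"},
                {"name": "city", "type": "string"},
                {"name": "zipCode", "type": "string"},
            ],
        }},
    ],
})

customer = {
    "customerId": 100234567890,
    "name": "Jane Smith",
    "address": {"street": "123 Main Street", "city": "San Francisco", "zipCode": "94102"},
}

# Raises ValidationError if the nested Address is missing a field or has a wrong type
validate(customer, customer_schema)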
Example 3: Arrays
When you need lists - order items, tags, IDs - arrays handle it.
Schema:
{
"type": "record",
"name": "Order",
"namespace": "com.example.orders",
"fields": [
{ "name": "orderId", "type": "string" },
{ "name": "total", "type": "double" },
{
"name": "items",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "OrderItem",
"fields": [
{ "name": "productId", "type": "string" },
{ "name": "quantity", "type": "int" },
{ "name": "unitPrice", "type": "double" }
]
}
}
}
]
}
Data:
{
"orderId": "ORD-2024-001",
"total": 149.97,
"items": [
{
"productId": "PROD-123",
"quantity": 2,
"unitPrice": 49.99
},
{
"productId": "PROD-456",
"quantity": 1,
"unitPrice": 49.99
}
]
}
The items array holds any number of OrderItem records - zero to thousands. Each follows the same structure.
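A quick way to confirm the length is unconstrained: validate an order with two items and one with none against the same schema. Another sketch in Python with fastavro (assumed):
from fastavro import parse_schema
from fastavro.validation import validate

order_schema = parse_schema({
    "type": "record",
    "name": "Order",
    "namespace": "com.example.orders",
    "fields": [
        {"name": "orderId", "type": "string"},
        {"name": "total", "type": "double"},
        {"name": "items", "type": {"type": "array", "items": {
            "type": "record",
            "name": "OrderItem",
            "fields": [
                {"name": "productId", "type": "string"},
                {"name": "quantity", "type": "int"},
                {"name": "unitPrice", "type": "double"},
            ],
        }}},
    ],
})

# Two items...
validate({
    "orderId": "ORD-2024-001",
    "total": 149.97,
    "items": [
        {"productId": "PROD-123", "quantity": 2, "unitPrice": 49.99},
        {"productId": "PROD-456", "quantity": 1, "unitPrice": 49.99},
    ],
}, order_schema)

# ...or none: both pass, because the schema fixes the item structure, not the count
validate({"orderId": "ORD-2024-002", "total": 0.0, "items": []}, order_schema)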
Example 4: Optional Fields (Unions)
Unions make fields optional. Critical for schema evolution - add new fields without breaking old data.
Schema:
{
"type": "record",
"name": "Product",
"namespace": "com.example.products",
"fields": [
{ "name": "productId", "type": "string" },
{ "name": "name", "type": "string" },
{ "name": "price", "type": "double" },
{ "name": "discount", "type": ["null", "double"], "default": null },
{ "name": "description", "type": ["null", "string"], "default": null }
]
}
With Optional Fields:
{
"productId": "LAPTOP-2024",
"name": "Professional Laptop",
"price": 999.99,
"discount": 50.00,
"description": "High-performance laptop"
}
Without Optional Fields:
{
"productId": "MOUSE-2024",
"name": "Wireless Mouse",
"price": 29.99,
"discount": null,
"description": null
}
Evolution Magic
["null", "double"] means "can be null or a double". Old data without these fields works with new systems. New data can include or skip them. That's why Avro works great with Schema Registry.
Example 5: Enums
Enums restrict fields to specific values. Status fields, categories - stuff with fixed options.
Schema:
{
"type": "record",
"name": "Transaction",
"namespace": "com.example.transactions",
"fields": [
{ "name": "transactionId", "type": "string" },
{ "name": "amount", "type": "double" },
{
"name": "status",
"type": {
"type": "enum",
"name": "Status",
"symbols": ["PENDING", "COMPLETED", "FAILED", "CANCELLED"]
}
}
]
}
Data:
{
"transactionId": "TXN-2024-12345",
"amount": 250.75,
"status": "COMPLETED"
}
Why Enums?
Try setting status to "FINALIZED" (not in the list) and you'll get an error. Catches bugs at write time instead of in production. Keeps data consistent across your whole Kafka pipeline.
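You can see the guardrail directly: a record whose status isn't one of the declared symbols fails validation. A sketch with Python and fastavro (assumed):
from fastavro import parse_schema
from fastavro.validation import validate

transaction_schema = parse_schema({
    "type": "record",
    "name": "Transaction",
    "namespace": "com.example.transactions",
    "fields": [
        {"name": "transactionId", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "status", "type": {
            "type": "enum",
            "name": "Status",
            "symbols": ["PENDING", "COMPLETED", "FAILED", "CANCELLED"],
        }},
    ],
})

good = {"transactionId": "TXN-2024-12345", "amount": 250.75, "status": "COMPLETED"}
bad = {"transactionId": "TXN-2024-12346", "amount": 10.00, "status": "FINALIZED"}

validate(good, transaction_schema)                            # passes: COMPLETED is a declared symbol
print(validate(bad, transaction_schema, raise_errors=False))  # False: FINALIZED is not in the list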
Example 6: Maps
Maps store flexible key-value pairs when you don't know all keys ahead of time. Common for metadata.
Schema:
{
"type": "record",
"name": "UserEvent",
"namespace": "com.example.events",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventType", "type": "string" },
{ "name": "timestamp", "type": "long" },
{ "name": "userId", "type": "string" },
{ "name": "metadata", "type": { "type": "map", "values": "string" } }
]
}
Data:
{
"eventId": "evt-550e8400-e29b-41d4-a716-446655440000",
"eventType": "USER_LOGIN",
"timestamp": 1704067200000,
"userId": "user-12345",
"metadata": {
"ipAddress": "192.168.1.100",
"userAgent": "Mozilla/5.0",
"sessionId": "sess-abc123"
}
}
The metadata map holds arbitrary string keys and values, so different events can carry different metadata without schema changes. Very flexible.
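Two events with completely different metadata keys both satisfy the same schema, which is easy to check. A sketch in Python with fastavro (assumed); the PURCHASE event is a made-up second example:
from fastavro import parse_schema
from fastavro.validation import validate

event_schema = parse_schema({
    "type": "record",
    "name": "UserEvent",
    "namespace": "com.example.events",
    "fields": [
        {"name": "eventId", "type": "string"},
        {"name": "eventType", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "userId", "type": "string"},
        {"name": "metadata", "type": {"type": "map", "values": "string"}},
    ],
})

login = {
    "eventId": "evt-1", "eventType": "USER_LOGIN", "timestamp": 1704067200000,
    "userId": "user-12345",
    "metadata": {"ipAddress": "192.168.1.100", "sessionId": "sess-abc123"},
}
purchase = {
    "eventId": "evt-2", "eventType": "PURCHASE", "timestamp": 1704067260000,
    "userId": "user-12345",
    "metadata": {"orderId": "ORD-2024-001", "paymentMethod": "card"},
}

# Different keys in each event, same schema: both validate
validate(login, event_schema)
validate(purchase, event_schema)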
Understanding Avro Format
Binary vs JSON
Examples above show JSON encoding so you can read them. Production uses binary - typically 30-50% smaller than the equivalent JSON and faster to parse.
The string "username" in JSON takes 8 bytes. In binary Avro, it's not stored at all - only the value is stored since the schema defines the field name.
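A rough way to measure it yourself, using the User record from Example 1 (Python with fastavro again, as an assumed binding; exact sizes depend on the data):
import io
import json
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "type": "record",
    "name": "User",
    "namespace": "com.example.users",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "active", "type": "boolean"},
    ],
})

user = {"id": 1001, "username": "john_doe", "email": "john.doe@example.com", "age": 28, "active": True}

json_size = len(json.dumps(user).encode("utf-8"))

buf = io.BytesIO()
schemaless_writer(buf, schema, user)  # no field names in the output, only encoded values
avro_size = len(buf.getvalue())

print(json_size, "bytes as JSON vs", avro_size, "bytes as binary Avro")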
Self-Describing
.avro files embed the schema with data. Makes files self-describing - open a file from 5 years ago and still know what's in it.
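For example, you can write an object container file and read it back later without supplying any schema. A sketch in Python with fastavro (assumed); the users.avro path is just for illustration:
from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "User",
    "namespace": "com.example.users",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
    ],
})

# Write a container file: the schema travels inside the file
with open("users.avro", "wb") as out:
    writer(out, schema, [{"id": 1001, "username": "john_doe"}])

# Later, with no schema on hand, the file still describes itself
with open("users.avro", "rb") as fo:
    avro_reader = reader(fo)
    print(avro_reader.writer_schema)   # the embedded schema
    for record in avro_reader:
        print(record)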
In Kafka, schemas live in Schema Registry and messages reference them by ID.
Type Safety
Unlike JSON where anything goes, Avro enforces types. Schema says int, you can't send a string.
Catches errors during development, not in production. Your data pipeline stays clean.
Common Uses
Kafka streaming, Hadoop data lakes, ETL pipelines, microservices, data warehouses, log aggregation.
Companies like LinkedIn, Netflix, and Uber use Avro for billions of messages daily.
Tools to Work with Avro
Official Resources
- Avro Specification (full format spec)
- Kafka with Avro (integration guide)
- Official Examples (more examples on GitHub)
Related Guides
- What is Apache Avro? (complete overview)
- Schema Guide (all types explained)
- Schema Examples (ready-to-use templates)