Apache Avro Format Examples - Complete Reference Guide
Comprehensive examples of Apache Avro data format with schemas, data encoding, and real-world use cases for Kafka and Hadoop ecosystems.
Simple Avro Record Format
The most basic Apache Avro format example shows a simple user record with primitive types. This format is commonly used in microservices and data streaming applications.
Avro Schema:
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.users",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "username", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": "int" },
    { "name": "active", "type": "boolean" }
  ]
}
Data in Avro Format (JSON Encoding):
{
  "id": 1001,
  "username": "elara_quinn",
  "email": "[email protected]",
  "age": 28,
  "active": true
}
Note: In production, Avro data is typically stored in efficient binary format. The JSON encoding shown above is used for readability and testing. Binary encoding is 30-50% smaller and faster to process.
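To make the binary encoding concrete, here is a minimal, stdlib-only Python sketch (not the official avro library) that hand-encodes the User record above. Avro binary writes the field values in schema order with no field names or tags: ints/longs are zigzag-encoded then varint-packed, strings are a varint byte length followed by UTF-8 bytes, and booleans are a single byte.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long: zigzag (small magnitudes -> small codes), then varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def avro_string(s: str) -> bytes:
    """Avro string: varint byte length, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

# Encode the User record: field values concatenated in schema order,
# with no field names or tags in the payload.
payload = (
    zigzag_varint(1001)                   # id
    + avro_string("elara_quinn")          # username
    + avro_string("[email protected]")    # email (placeholder from the example)
    + zigzag_varint(28)                   # age
    + b"\x01"                             # active: boolean true
)
print(payload.hex())
```

Note that the 22 bytes of string content dominate the payload; the numeric and boolean fields cost only one or two bytes each, which is where the size advantage over JSON comes from.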
Nested Record Format
Complex Avro format with nested records, commonly used for representing structured data like customer profiles with address information in Apache Kafka systems.
Avro Schema with Nested Record:
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.example.customers",
  "fields": [
    { "name": "customerId", "type": "long" },
    { "name": "name", "type": "string" },
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          { "name": "street", "type": "string" },
          { "name": "city", "type": "string" },
          { "name": "zipCode", "type": "string" }
        ]
      }
    }
  ]
}
Data Example:
{
  "customerId": 100234567890,
  "name": "Jane Smith",
  "address": {
    "street": "123 Main Street",
    "city": "San Francisco",
    "zipCode": "94102"
  }
}
Avro Array Format
This example shows how arrays are represented in Avro format, useful for collections like order items, tags, or product lists in e-commerce and inventory systems.
Avro Schema with Array:
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example.orders",
  "fields": [
    { "name": "orderId", "type": "string" },
    { "name": "total", "type": "double" },
    {
      "name": "items",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderItem",
          "fields": [
            { "name": "productId", "type": "string" },
            { "name": "quantity", "type": "int" },
            { "name": "price", "type": "double" }
          ]
        }
      }
    }
  ]
}
Data Example:
{
  "orderId": "ORD-2024-001",
  "total": 149.97,
  "items": [
    { "productId": "PROD-123", "quantity": 2, "price": 49.99 },
    { "productId": "PROD-456", "quantity": 1, "price": 49.99 }
  ]
}
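On the wire, Avro arrays are written as one or more counted blocks: a varint item count, that many items, repeated as needed, terminated by a count of zero. The following stdlib-only sketch (an illustration of the wire format, not the official library) encodes just the items array from the order above:

```python
import struct

def zz(n: int) -> bytes:
    """Avro int/long: zigzag then varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def s(v: str) -> bytes:
    """Avro string: varint byte length, then UTF-8 bytes."""
    d = v.encode("utf-8")
    return zz(len(d)) + d

def dbl(v: float) -> bytes:
    """Avro double: 8 bytes, IEEE 754 little-endian."""
    return struct.pack("<d", v)

items = [("PROD-123", 2, 49.99), ("PROD-456", 1, 49.99)]
body = zz(len(items))                    # one block holding both items
for pid, qty, price in items:
    body += s(pid) + zz(qty) + dbl(price)  # each OrderItem in field order
body += zz(0)                            # zero count terminates the array
```

The block structure lets writers stream large arrays without knowing the total length up front, and (with the optional negative-count form) lets readers skip whole blocks without decoding them.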
Union Type Format (Optional Fields)
Avro union types allow fields to accept multiple types, commonly used for optional fields. This format is essential for schema evolution in production systems.
Avro Schema with Union Types:
{
  "type": "record",
  "name": "Product",
  "namespace": "com.example.products",
  "fields": [
    { "name": "productId", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "price", "type": "double" },
    { "name": "discount", "type": ["null", "double"], "default": null },
    { "name": "description", "type": ["null", "string"], "default": null }
  ]
}
Data Example (with optional fields):
{
  "productId": "LAPTOP-2024",
  "name": "Professional Laptop",
  "price": 999.99,
  "discount": 50.00,
  "description": "High-performance laptop with 16GB RAM"
}
Data Example (without optional fields):
{
  "productId": "MOUSE-2024",
  "name": "Wireless Mouse",
  "price": 29.99,
  "discount": null,
  "description": null
}
Note: In strict Avro JSON encoding, a non-null union value is wrapped with its type name (for example, "discount": {"double": 50.0}), while null is written as plain null. The unwrapped JSON shown here is the "friendly" representation many tools accept for readability.
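In the binary encoding, a union value is written as the zero-based index of the branch (as a zigzag varint), followed by the value encoded per that branch's type; the null branch contributes no further bytes. A minimal sketch for the ["null", "double"] discount field:

```python
import struct

def zz(n: int) -> bytes:
    """Avro int/long: zigzag then varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

# Union ["null", "double"]: branch index, then the value (if any).
with_discount = zz(1) + struct.pack("<d", 50.0)  # branch 1 = "double", 8-byte LE double
no_discount = zz(0)                              # branch 0 = "null", nothing follows
```

So an absent optional field costs exactly one byte on the wire, which is why the ["null", T] pattern with a null default is the standard way to model optional fields in Avro.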
Enum Type Format
Avro enums define a fixed set of allowed values, perfect for status fields, categories, or any field with a predefined set of options. This ensures data quality and validation.
Avro Schema with Enum:
{
  "type": "record",
  "name": "Transaction",
  "namespace": "com.example.transactions",
  "fields": [
    { "name": "transactionId", "type": "string" },
    { "name": "amount", "type": "double" },
    {
      "name": "status",
      "type": {
        "type": "enum",
        "name": "Status",
        "symbols": ["PENDING", "COMPLETED", "FAILED", "CANCELLED"]
      }
    }
  ]
}
Data Example:
{
  "transactionId": "TXN-2024-12345",
  "amount": 250.75,
  "status": "COMPLETED"
}
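Enums are stored very compactly: the binary encoding writes only the zero-based index of the symbol in the schema's symbols list, as a zigzag varint. A small sketch (illustrative, not the official library):

```python
SYMBOLS = ["PENDING", "COMPLETED", "FAILED", "CANCELLED"]

def encode_enum(symbol: str) -> bytes:
    """Avro enum: zero-based index of the symbol, zigzag-varint encoded."""
    idx = SYMBOLS.index(symbol)  # raises ValueError for an unknown symbol
    z = idx << 1                 # zigzag of a non-negative int is just 2*n
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)
```

This is also why the symbol list matters for evolution: the index is only meaningful relative to the schema, so readers resolve it back to a name via their copy of the symbols.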
Kafka Event Message Format
A complete example of an Avro-formatted event message for Apache Kafka, including metadata fields commonly used in event-driven architectures and data streaming pipelines.
Kafka Event Schema:
{
  "type": "record",
  "name": "UserEvent",
  "namespace": "com.example.events",
  "fields": [
    { "name": "eventId", "type": "string" },
    { "name": "eventType", "type": "string" },
    { "name": "timestamp", "type": "long" },
    { "name": "userId", "type": "string" },
    { "name": "action", "type": "string" },
    { "name": "metadata", "type": { "type": "map", "values": "string" } }
  ]
}
Event Data Example:
{
  "eventId": "evt-550e8400-e29b-41d4-a716-446655440000",
  "eventType": "USER_LOGIN",
  "timestamp": 1704067200000,
  "userId": "user-12345",
  "action": "login",
  "metadata": {
    "ipAddress": "192.168.1.100",
    "userAgent": "Mozilla/5.0",
    "sessionId": "sess-abc123"
  }
}
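When Avro messages go through Confluent Schema Registry, each Kafka record value is framed with a small header so consumers can look up the schema: a magic byte of 0, a 4-byte big-endian schema ID, then the Avro binary payload. A stdlib sketch of that framing (the schema ID 42 and payload bytes are hypothetical):

```python
import struct

def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Confluent Schema Registry wire format: magic byte 0x00,
    4-byte big-endian schema ID, then the Avro-encoded body."""
    return struct.pack(">bI", 0, schema_id) + avro_payload

def unframe_confluent(message: bytes):
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not a Schema Registry framed message")
    return schema_id, message[5:]

msg = frame_confluent(42, b"\x02example-avro-bytes")  # 42: hypothetical registry ID
```

The 5-byte header is the only per-message schema overhead; the full schema text lives in the registry, which is what keeps Avro-on-Kafka messages so small compared with embedding a schema in every record.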
Understanding Avro Data Format
What is Apache Avro Format?
Apache Avro is a data serialization format that pairs every piece of data with a schema. Unlike JSON or XML, which carry their structure in the text itself, Avro serializes data into a compact binary form whose layout is defined entirely by the schema, making it highly efficient for storage and transmission in distributed systems.
The Avro format consists of two main components: the schema (which defines the structure) and the data (which follows that structure). This separation enables schema evolution, where you can modify schemas over time while maintaining compatibility with existing data.
Binary vs JSON Encoding
Avro supports two encoding formats:
- Binary encoding: Compact, fast, and efficient for production use. Data is serialized in a space-efficient binary format that requires the schema to be decoded.
- JSON encoding: Human-readable format used for testing, debugging, and development. The examples on this page use JSON encoding for clarity.
In production systems like Kafka Schema Registry, binary encoding is typically used because it is 30-50% more efficient than JSON.
Common Use Cases for Avro Format
- Apache Kafka: Avro is the preferred format for Kafka message serialization with Schema Registry integration
- Hadoop Ecosystem: Used for storing data in HDFS, Hive, Spark, and other big data tools
- Data Pipelines: ETL processes use Avro for efficient data transfer between systems
- Microservices: Service-to-service communication with schema validation and evolution support
- Data Lakes: Long-term storage of structured data with schema versioning
Frequently Asked Questions
What is the difference between Avro format and JSON format?
While both can represent structured data, Avro format includes a schema definition and uses binary encoding for efficiency. JSON is human-readable but larger in size and slower to parse. Avro is 30-50% smaller than JSON and significantly faster for serialization and deserialization, making it ideal for high-throughput systems like Kafka and Hadoop.
How do I create data in Avro format?
To create Avro format data, you need a schema and your data. Use the JSON to Avro converter to transform JSON data into Avro format, or use the Avro Schema Generator to create a schema from sample JSON data first.
Can I read Avro format data without the schema?
No, you need the schema to decode Avro binary data. However, Avro files (.avro) typically embed the schema in the file header, so the schema travels with the data. When using Kafka Schema Registry, schemas are stored centrally and referenced by ID, allowing consumers to retrieve the schema and decode messages.
What are Avro primitive types?
Avro supports primitive types: null, boolean, int (32-bit), long (64-bit), float, double, bytes, and string. It also supports complex types including records (structured objects), arrays, maps, unions (multiple possible types), enums (predefined values), and fixed (fixed-length byte arrays). These types cover most data modeling needs in modern applications.
How does Avro handle schema evolution?
Avro supports schema evolution through forward, backward, and full compatibility modes. You can add fields with defaults, remove fields that have defaults, and rename fields via aliases while maintaining compatibility with existing data. This makes Avro excellent for systems that evolve over time, like long-running Kafka topics or data warehouses.
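The core resolution rule can be sketched in a few lines: for each field the reader expects, use the writer's value if it was written, otherwise fall back to the reader's declared default. This is a simplified, dict-level illustration (the real library resolves against the binary stream and full schemas; the field list here is made up for the example):

```python
# Reader schema v2 adds a "discount" field with a default; writer v1 lacked it.
reader_fields = [
    {"name": "productId", "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "discount", "type": ["null", "double"], "default": None},  # new in v2
]

def resolve(writer_record: dict, reader_fields: list) -> dict:
    """Simplified Avro schema resolution: writer's value if present,
    otherwise the reader's default; error if neither exists."""
    out = {}
    for field in reader_fields:
        if field["name"] in writer_record:
            out[field["name"]] = writer_record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for {field['name']}")
    return out

old_record = {"productId": "MOUSE-2024", "price": 29.99}  # written with v1
print(resolve(old_record, reader_fields))
```

The error branch is the essence of a compatibility check: a new reader schema is backward compatible only if every field it adds carries a default, so old data can always be resolved.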
What tools work with Avro format?
Many tools support Avro format: JSON to Avro for encoding data, Avro to JSON for decoding, Avro Formatter for beautifying schemas, Avro Schema Validator for checking schema validity, and CSV to Avro for batch data conversion.
Related Tools
JSON to Avro
Convert JSON data to Apache Avro format with automatic schema generation
Avro to JSON
Convert Apache Avro data to JSON format with schema validation
Avro Fixer
Fix and repair broken Avro schemas with automatic syntax error correction
Avro Schema Generator
Generate Apache Avro schemas from JSON data with automatic type inference
Avro Schema Validator
Validate Apache Avro schemas with syntax and structure checking for Kafka
Avro Formatter
Format and beautify Apache Avro schemas with proper indentation