Introduction
Apache Avro really clicks when you see a schema and its data side by side. Reading schemas in isolation only gets you so far; seeing the records they actually describe makes the format concrete.
Here are complete examples from production Kafka and Hadoop systems. Each shows the schema (blueprint) and sample data (what actually gets stored).
About These Examples
In production, Avro stores data in binary format. The examples below show JSON encoding so you can read them. Binary is what makes Avro efficient - typically 30-50% smaller than the equivalent JSON. Learn more at Apache Avro docs.
Example 1: Simple User Record
Let's start simple - a user record with basic fields. You'd use this in user services or auth systems.
Schema:
{
"type": "record",
"name": "User",
"namespace": "com.example.users",
"fields": [
{ "name": "id", "type": "int" },
{ "name": "username", "type": "string" },
{ "name": "email", "type": "string" },
{ "name": "age", "type": "int" },
{ "name": "active", "type": "boolean" }
]
}
Data:
{
"id": 1001,
"username": "john_doe",
"email": "[email protected]",
"age": 28,
"active": true
}
What's Happening
Schema defines five fields with types. Data provides values matching those types. In binary encoding, this would be stored without field names, making it way more compact. Try the JSON to Avro converter to see the difference.
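If you want to see this round trip in code, here's a minimal sketch using Python with the fastavro library (an assumption; any Avro binding works the same way):
import io
from fastavro import parse_schema, schemaless_reader, schemaless_writer

# The User schema from above, parsed once and reused
schema = parse_schema({
    "type": "record",
    "name": "User",
    "namespace": "com.example.users",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "active", "type": "boolean"},
    ],
})

user = {"id": 1001, "username": "john_doe", "email": "john.doe@example.com", "age": 28, "active": True}

# Encode: only the values are written; the schema supplies field names and order
buf = io.BytesIO()
schemaless_writer(buf, schema, user)

# Decode: the same schema is needed to interpret the bytes
buf.seek(0)
assert schemaless_reader(buf, schema) == user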
Example 2: Nested Records
Real data has nested structures. Here's a customer with an embedded address - common in e-commerce.
Schema:
{
"type": "record",
"name": "Customer",
"namespace": "com.example.customers",
"fields": [
{ "name": "customerId", "type": "long" },
{ "name": "name", "type": "string" },
{
"name": "address",
"type": {
"type": "record",
"name": "Address",
"fields": [
{ "name": "street", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "zipCode", "type": "string" }
]
}
}
]
}
Data:
{
"customerId": 100234567890,
"name": "Jane Smith",
"address": {
"street": "123 Main Street",
"city": "San Francisco",
"zipCode": "94102"
}
}
Nested Data
Address record sits inside Customer. Keeps related data together. When you send this over Kafka, the whole customer object travels as one compact message.
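In code the nested record is just a nested structure. A small sketch with Python and fastavro (assumed here, as in Example 1) that checks a customer dict against the schema:
from fastavro import parse_schema
from fastavro.validation import validate

customer_schema = parse_schema({
    "type": "record",
    "name": "Customer",
    "namespace": "com.example.customers",
    "fields": [
        {"name": "customerId", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "address", "type": {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "street", "type": "string"},
                {"name": "city", "type": "string"},
                {"name": "zipCode", "type": "string"},
            ],
        }},
    ],
})

customer = {
    "customerId": 100234567890,
    "name": "Jane Smith",
    "address": {"street": "123 Main Street", "city": "San Francisco", "zipCode": "94102"},
}

# Raises ValidationError if the nested Address is missing a field or has a wrong type
validate(customer, customer_schema)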
Example 3: Arrays
When you need lists - order items, tags, IDs - arrays handle it.
Schema:
{
"type": "record",
"name": "Order",
"namespace": "com.example.orders",
"fields": [
{ "name": "orderId", "type": "string" },
{ "name": "total", "type": "double" },
{
"name": "items",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "OrderItem",
"fields": [
{ "name": "productId", "type": "string" },
{ "name": "quantity", "type": "int" },
{ "name": "unitPrice", "type": "double" }
]
}
}
}
]
}
Data:
{
"orderId": "ORD-2024-001",
"total": 149.97,
"items": [
{
"productId": "PROD-123",
"quantity": 2,
"unitPrice": 49.99
},
{
"productId": "PROD-456",
"quantity": 1,
"unitPrice": 49.99
}
]
}
The items array holds any number of OrderItem records - zero to thousands. Each follows the same structure.
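A quick way to confirm the length is unconstrained: validate an order with two items and one with none against the same schema. Another sketch in Python with fastavro (assumed):
from fastavro import parse_schema
from fastavro.validation import validate

order_schema = parse_schema({
    "type": "record",
    "name": "Order",
    "namespace": "com.example.orders",
    "fields": [
        {"name": "orderId", "type": "string"},
        {"name": "total", "type": "double"},
        {"name": "items", "type": {"type": "array", "items": {
            "type": "record",
            "name": "OrderItem",
            "fields": [
                {"name": "productId", "type": "string"},
                {"name": "quantity", "type": "int"},
                {"name": "unitPrice", "type": "double"},
            ],
        }}},
    ],
})

# Two items...
validate({
    "orderId": "ORD-2024-001",
    "total": 149.97,
    "items": [
        {"productId": "PROD-123", "quantity": 2, "unitPrice": 49.99},
        {"productId": "PROD-456", "quantity": 1, "unitPrice": 49.99},
    ],
}, order_schema)

# ...or none: both pass, because the schema fixes the item structure, not the count
validate({"orderId": "ORD-2024-002", "total": 0.0, "items": []}, order_schema)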
Example 4: Optional Fields (Unions)
Unions make fields optional. Critical for schema evolution - add new fields without breaking old data.
Schema:
{
"type": "record",
"name": "Product",
"namespace": "com.example.products",
"fields": [
{ "name": "productId", "type": "string" },
{ "name": "name", "type": "string" },
{ "name": "price", "type": "double" },
{ "name": "discount", "type": ["null", "double"], "default": null },
{ "name": "description", "type": ["null", "string"], "default": null }
]
}
With Optional Fields:
{
"productId": "LAPTOP-2024",
"name": "Professional Laptop",
"price": 999.99,
"discount": 50.00,
"description": "High-performance laptop"
}
Without Optional Fields:
{
"productId": "MOUSE-2024",
"name": "Wireless Mouse",
"price": 29.99,
"discount": null,
"description": null
}
Evolution Magic
["null", "double"] means "can be null or a double". Old data without these fields works with new systems. New data can include or skip them. That's why Avro works great with Schema Registry.
Example 5: Enums
Enums restrict fields to specific values. Status fields, categories - stuff with fixed options.
Schema:
{
"type": "record",
"name": "Transaction",
"namespace": "com.example.transactions",
"fields": [
{ "name": "transactionId", "type": "string" },
{ "name": "amount", "type": "double" },
{
"name": "status",
"type": {
"type": "enum",
"name": "Status",
"symbols": ["PENDING", "COMPLETED", "FAILED", "CANCELLED"]
}
}
]
}
Data:
{
"transactionId": "TXN-2024-12345",
"amount": 250.75,
"status": "COMPLETED"
}
Why Enums?
Try setting status to "FINALIZED" (not in the list) and you'll get an error. Catches bugs at write time instead of in production. Keeps data consistent across your whole Kafka pipeline.
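You can see the guardrail directly: a record whose status isn't one of the declared symbols fails validation. A sketch with Python and fastavro (assumed):
from fastavro import parse_schema
from fastavro.validation import validate

transaction_schema = parse_schema({
    "type": "record",
    "name": "Transaction",
    "namespace": "com.example.transactions",
    "fields": [
        {"name": "transactionId", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "status", "type": {
            "type": "enum",
            "name": "Status",
            "symbols": ["PENDING", "COMPLETED", "FAILED", "CANCELLED"],
        }},
    ],
})

good = {"transactionId": "TXN-2024-12345", "amount": 250.75, "status": "COMPLETED"}
bad = {"transactionId": "TXN-2024-12346", "amount": 10.00, "status": "FINALIZED"}

validate(good, transaction_schema)                            # passes: COMPLETED is a declared symbol
print(validate(bad, transaction_schema, raise_errors=False))  # False: FINALIZED is not in the list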
Example 6: Maps
Maps store flexible key-value pairs when you don't know all keys ahead of time. Common for metadata.
Schema:
{
"type": "record",
"name": "UserEvent",
"namespace": "com.example.events",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventType", "type": "string" },
{ "name": "timestamp", "type": "long" },
{ "name": "userId", "type": "string" },
{ "name": "metadata", "type": { "type": "map", "values": "string" } }
]
}
Data:
{
"eventId": "evt-550e8400-e29b-41d4-a716-446655440000",
"eventType": "USER_LOGIN",
"timestamp": 1704067200000,
"userId": "user-12345",
"metadata": {
"ipAddress": "192.168.1.100",
"userAgent": "Mozilla/5.0",
"sessionId": "sess-abc123"
}
}
The metadata map holds arbitrary string keys and values, so different events can carry different metadata without schema changes. Very flexible.
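Two events with completely different metadata keys both satisfy the same schema, which is easy to check. A sketch in Python with fastavro (assumed); the PURCHASE event is a made-up second example:
from fastavro import parse_schema
from fastavro.validation import validate

event_schema = parse_schema({
    "type": "record",
    "name": "UserEvent",
    "namespace": "com.example.events",
    "fields": [
        {"name": "eventId", "type": "string"},
        {"name": "eventType", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "userId", "type": "string"},
        {"name": "metadata", "type": {"type": "map", "values": "string"}},
    ],
})

login = {
    "eventId": "evt-1", "eventType": "USER_LOGIN", "timestamp": 1704067200000,
    "userId": "user-12345",
    "metadata": {"ipAddress": "192.168.1.100", "sessionId": "sess-abc123"},
}
purchase = {
    "eventId": "evt-2", "eventType": "PURCHASE", "timestamp": 1704067260000,
    "userId": "user-12345",
    "metadata": {"orderId": "ORD-2024-001", "paymentMethod": "card"},
}

# Different keys in each event, same schema: both validate
validate(login, event_schema)
validate(purchase, event_schema)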
Understanding Avro Format
Binary vs JSON
Examples above show JSON encoding so you can read them. Production uses binary - typically 30-50% smaller than the equivalent JSON and faster to parse.
The string "username" in JSON takes 8 bytes. In binary Avro, it's not stored at all - only the value is stored since the schema defines the field name.
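A rough way to measure it yourself, using the User record from Example 1 (Python with fastavro again, as an assumed binding; exact sizes depend on the data):
import io
import json
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "type": "record",
    "name": "User",
    "namespace": "com.example.users",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "active", "type": "boolean"},
    ],
})

user = {"id": 1001, "username": "john_doe", "email": "john.doe@example.com", "age": 28, "active": True}

json_size = len(json.dumps(user).encode("utf-8"))

buf = io.BytesIO()
schemaless_writer(buf, schema, user)  # no field names in the output, only encoded values
avro_size = len(buf.getvalue())

print(json_size, "bytes as JSON vs", avro_size, "bytes as binary Avro")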
Self-Describing
.avro files embed the schema with data. Makes files self-describing - open a file from 5 years ago and still know what's in it.
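For example, you can write an object container file and read it back later without supplying any schema. A sketch in Python with fastavro (assumed); the users.avro path is just for illustration:
from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "User",
    "namespace": "com.example.users",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
    ],
})

# Write a container file: the schema travels inside the file
with open("users.avro", "wb") as out:
    writer(out, schema, [{"id": 1001, "username": "john_doe"}])

# Later, with no schema on hand, the file still describes itself
with open("users.avro", "rb") as fo:
    avro_reader = reader(fo)
    print(avro_reader.writer_schema)   # the embedded schema
    for record in avro_reader:
        print(record)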
In Kafka, schemas live in Schema Registry and messages reference them by ID.
Type Safety
Unlike JSON where anything goes, Avro enforces types. Schema says int, you can't send a string.
Catches errors during development, not in production. Your data pipeline stays clean.
Common Uses
Kafka streaming, Hadoop data lakes, ETL pipelines, microservices, data warehouses, log aggregation.
Companies like LinkedIn, Netflix, and Uber use Avro for billions of messages daily.
Tools to Work with Avro
Official Resources
- Avro Specification (full format spec)
- Kafka with Avro (integration guide)
- Official Examples (more examples on GitHub)
Related Guides
- What is Apache Avro? (complete overview)
- Schema Guide (all types explained)
- Schema Examples (ready-to-use templates)