Apache Avro Format Examples - Complete Reference Guide
Comprehensive examples of Apache Avro data format with schemas, data encoding, and real-world use cases for Kafka and Hadoop ecosystems.
Simple Avro Record Format
The most basic Apache Avro format example shows a simple user record with primitive types. This format is commonly used in microservices and data streaming applications.
Avro Schema:
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.users",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "username", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": "int" },
    { "name": "active", "type": "boolean" }
  ]
}
Data in Avro Format (JSON Encoding):
{
  "id": 1001,
  "username": "elara_quinn",
  "email": "[email protected]",
  "age": 28,
  "active": true
}
Note: In production, Avro data is typically stored in efficient binary format. The JSON encoding shown above is used for readability and testing. Binary encoding is 30-50% smaller and faster to process.
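To make the binary encoding concrete, here is a minimal, stdlib-only Python sketch (not the official avro library) that hand-encodes the User record above. Avro binary writes the field values in schema order with no field names or tags: ints/longs are zigzag-encoded then varint-packed, strings are a varint byte length followed by UTF-8 bytes, and booleans are a single byte.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long: zigzag (small magnitudes -> small codes), then varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def avro_string(s: str) -> bytes:
    """Avro string: varint byte length, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

# Encode the User record: field values concatenated in schema order,
# with no field names or tags in the payload.
payload = (
    zigzag_varint(1001)                   # id
    + avro_string("elara_quinn")          # username
    + avro_string("[email protected]")    # email (placeholder from the example)
    + zigzag_varint(28)                   # age
    + b"\x01"                             # active: boolean true
)
print(payload.hex())
```

Note that the 22 bytes of string content dominate the payload; the numeric and boolean fields cost only one or two bytes each, which is where the size advantage over JSON comes from.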
Nested Record Format
Complex Avro format with nested records, commonly used for representing structured data like customer profiles with address information in Apache Kafka systems.
Avro Schema with Nested Record:
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.example.customers",
  "fields": [
    { "name": "customerId", "type": "long" },
    { "name": "name", "type": "string" },
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          { "name": "street", "type": "string" },
          { "name": "city", "type": "string" },
          { "name": "zipCode", "type": "string" }
        ]
      }
    }
  ]
}
Data Example:
{
  "customerId": 100234567890,
  "name": "Jane Smith",
  "address": {
    "street": "123 Main Street",
    "city": "San Francisco",
    "zipCode": "94102"
  }
}
Avro Array Format
This example shows how arrays are represented in Avro format, useful for collections like order items, tags, or product lists in e-commerce and inventory systems.
Avro Schema with Array:
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example.orders",
  "fields": [
    { "name": "orderId", "type": "string" },
    { "name": "total", "type": "double" },
    {
      "name": "items",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderItem",
          "fields": [
            { "name": "productId", "type": "string" },
            { "name": "quantity", "type": "int" },
            { "name": "price", "type": "double" }
          ]
        }
      }
    }
  ]
}
Data Example:
{
  "orderId": "ORD-2024-001",
  "total": 149.97,
  "items": [
    { "productId": "PROD-123", "quantity": 2, "price": 49.99 },
    { "productId": "PROD-456", "quantity": 1, "price": 49.99 }
  ]
}
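On the wire, Avro arrays are written as one or more counted blocks: a varint item count, that many items, repeated as needed, terminated by a count of zero. The following stdlib-only sketch (an illustration of the wire format, not the official library) encodes just the items array from the order above:

```python
import struct

def zz(n: int) -> bytes:
    """Avro int/long: zigzag then varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def s(v: str) -> bytes:
    """Avro string: varint byte length, then UTF-8 bytes."""
    d = v.encode("utf-8")
    return zz(len(d)) + d

def dbl(v: float) -> bytes:
    """Avro double: 8 bytes, IEEE 754 little-endian."""
    return struct.pack("<d", v)

items = [("PROD-123", 2, 49.99), ("PROD-456", 1, 49.99)]
body = zz(len(items))                    # one block holding both items
for pid, qty, price in items:
    body += s(pid) + zz(qty) + dbl(price)  # each OrderItem in field order
body += zz(0)                            # zero count terminates the array
```

The block structure lets writers stream large arrays without knowing the total length up front, and (with the optional negative-count form) lets readers skip whole blocks without decoding them.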
Union Type Format (Optional Fields)
Avro union types allow fields to accept multiple types, commonly used for optional fields. This format is essential for schema evolution in production systems.
Avro Schema with Union Types:
{
  "type": "record",
  "name": "Product",
  "namespace": "com.example.products",
  "fields": [
    { "name": "productId", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "price", "type": "double" },
    { "name": "discount", "type": ["null", "double"], "default": null },
    { "name": "description", "type": ["null", "string"], "default": null }
  ]
}
Data Example (with optional fields):
{
  "productId": "LAPTOP-2024",
  "name": "Professional Laptop",
  "price": 999.99,
  "discount": 50.00,
  "description": "High-performance laptop with 16GB RAM"
}
Data Example (without optional fields):
{
  "productId": "MOUSE-2024",
  "name": "Wireless Mouse",
  "price": 29.99,
  "discount": null,
  "description": null
}
Note: In strict Avro JSON encoding, a non-null union value is wrapped with its type name (for example, "discount": {"double": 50.0}), while null is written as plain null. The unwrapped JSON shown here is the "friendly" representation many tools accept for readability.
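In the binary encoding, a union value is written as the zero-based index of the branch (as a zigzag varint), followed by the value encoded per that branch's type; the null branch contributes no further bytes. A minimal sketch for the ["null", "double"] discount field:

```python
import struct

def zz(n: int) -> bytes:
    """Avro int/long: zigzag then varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

# Union ["null", "double"]: branch index, then the value (if any).
with_discount = zz(1) + struct.pack("<d", 50.0)  # branch 1 = "double", 8-byte LE double
no_discount = zz(0)                              # branch 0 = "null", nothing follows
```

So an absent optional field costs exactly one byte on the wire, which is why the ["null", T] pattern with a null default is the standard way to model optional fields in Avro.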
Enum Type Format
Avro enums define a fixed set of allowed values, perfect for status fields, categories, or any field with a predefined set of options. This ensures data quality and validation.
Avro Schema with Enum:
{
  "type": "record",
  "name": "Transaction",
  "namespace": "com.example.transactions",
  "fields": [
    { "name": "transactionId", "type": "string" },
    { "name": "amount", "type": "double" },
    {
      "name": "status",
      "type": {
        "type": "enum",
        "name": "Status",
        "symbols": ["PENDING", "COMPLETED", "FAILED", "CANCELLED"]
      }
    }
  ]
}
Data Example:
{
  "transactionId": "TXN-2024-12345",
  "amount": 250.75,
  "status": "COMPLETED"
}
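Enums are stored very compactly: the binary encoding writes only the zero-based index of the symbol in the schema's symbols list, as a zigzag varint. A small sketch (illustrative, not the official library):

```python
SYMBOLS = ["PENDING", "COMPLETED", "FAILED", "CANCELLED"]

def encode_enum(symbol: str) -> bytes:
    """Avro enum: zero-based index of the symbol, zigzag-varint encoded."""
    idx = SYMBOLS.index(symbol)  # raises ValueError for an unknown symbol
    z = idx << 1                 # zigzag of a non-negative int is just 2*n
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)
```

This is also why the symbol list matters for evolution: the index is only meaningful relative to the schema, so readers resolve it back to a name via their copy of the symbols.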
Kafka Event Message Format
A complete example of an Avro-formatted event message for Apache Kafka, including metadata fields commonly used in event-driven architectures and data streaming pipelines.
Kafka Event Schema:
{
  "type": "record",
  "name": "UserEvent",
  "namespace": "com.example.events",
  "fields": [
    { "name": "eventId", "type": "string" },
    { "name": "eventType", "type": "string" },
    { "name": "timestamp", "type": "long" },
    { "name": "userId", "type": "string" },
    { "name": "action", "type": "string" },
    { "name": "metadata", "type": { "type": "map", "values": "string" } }
  ]
}
Event Data Example:
{
  "eventId": "evt-550e8400-e29b-41d4-a716-446655440000",
  "eventType": "USER_LOGIN",
  "timestamp": 1704067200000,
  "userId": "user-12345",
  "action": "login",
  "metadata": {
    "ipAddress": "192.168.1.100",
    "userAgent": "Mozilla/5.0",
    "sessionId": "sess-abc123"
  }
}
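When Avro messages go through Confluent Schema Registry, each Kafka record value is framed with a small header so consumers can look up the schema: a magic byte of 0, a 4-byte big-endian schema ID, then the Avro binary payload. A stdlib sketch of that framing (the schema ID 42 and payload bytes are hypothetical):

```python
import struct

def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Confluent Schema Registry wire format: magic byte 0x00,
    4-byte big-endian schema ID, then the Avro-encoded body."""
    return struct.pack(">bI", 0, schema_id) + avro_payload

def unframe_confluent(message: bytes):
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not a Schema Registry framed message")
    return schema_id, message[5:]

msg = frame_confluent(42, b"\x02example-avro-bytes")  # 42: hypothetical registry ID
```

The 5-byte header is the only per-message schema overhead; the full schema text lives in the registry, which is what keeps Avro-on-Kafka messages so small compared with embedding a schema in every record.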
Understanding Avro Data Format
What is Apache Avro Format?
Apache Avro is a data serialization format that pairs every piece of data with a schema. Unlike JSON or XML, which carry their structure in the text itself, Avro serializes data into a compact binary form whose layout is defined entirely by the schema, making it highly efficient for storage and transmission in distributed systems.
The Avro format consists of two main components: the schema (which defines the structure) and the data (which follows that structure). This separation enables schema evolution, where you can modify schemas over time while maintaining compatibility with existing data.
Binary vs JSON Encoding
Avro supports two encoding formats:
- Binary encoding: Compact, fast, and efficient for production use. Data is serialized in a space-efficient binary format that requires the schema to be decoded.
- JSON encoding: Human-readable format used for testing, debugging, and development. The examples on this page use JSON encoding for clarity.
In production systems like Kafka Schema Registry, binary encoding is typically used because it is 30-50% more efficient than JSON.
Common Use Cases for Avro Format
- Apache Kafka: Avro is the preferred format for Kafka message serialization with Schema Registry integration
- Hadoop Ecosystem: Used for storing data in HDFS, Hive, Spark, and other big data tools
- Data Pipelines: ETL processes use Avro for efficient data transfer between systems
- Microservices: Service-to-service communication with schema validation and evolution support
- Data Lakes: Long-term storage of structured data with schema versioning
Frequently Asked Questions
What is the difference between Avro format and JSON format?
While both can represent structured data, Avro format includes a schema definition and uses binary encoding for efficiency. JSON is human-readable but larger in size and slower to parse. Avro is 30-50% smaller than JSON and significantly faster for serialization and deserialization, making it ideal for high-throughput systems like Kafka and Hadoop.
How do I create data in Avro format?
To create Avro format data, you need a schema and your data. Use the JSON to Avro converter to transform JSON data into Avro format, or use the Avro Schema Generator to create a schema from sample JSON data first.
Can I read Avro format data without the schema?
No, you need the schema to decode Avro binary data. However, Avro files (.avro) typically embed the schema in the file header, so the schema travels with the data. When using Kafka Schema Registry, schemas are stored centrally and referenced by ID, allowing consumers to retrieve the schema and decode messages.
What are Avro primitive types?
Avro supports primitive types: null, boolean, int (32-bit), long (64-bit), float, double, bytes, and string. It also supports complex types including records (structured objects), arrays, maps, unions (multiple possible types), enums (predefined values), and fixed (fixed-length byte arrays). These types cover most data modeling needs in modern applications.
How does Avro handle schema evolution?
Avro supports schema evolution through forward, backward, and full compatibility modes. You can add fields with defaults, remove fields that have defaults, and rename fields via aliases while maintaining compatibility with existing data. This makes Avro excellent for systems that evolve over time, like long-running Kafka topics or data warehouses.
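The core resolution rule can be sketched in a few lines: for each field the reader expects, use the writer's value if it was written, otherwise fall back to the reader's declared default. This is a simplified, dict-level illustration (the real library resolves against the binary stream and full schemas; the field list here is made up for the example):

```python
# Reader schema v2 adds a "discount" field with a default; writer v1 lacked it.
reader_fields = [
    {"name": "productId", "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "discount", "type": ["null", "double"], "default": None},  # new in v2
]

def resolve(writer_record: dict, reader_fields: list) -> dict:
    """Simplified Avro schema resolution: writer's value if present,
    otherwise the reader's default; error if neither exists."""
    out = {}
    for field in reader_fields:
        if field["name"] in writer_record:
            out[field["name"]] = writer_record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for {field['name']}")
    return out

old_record = {"productId": "MOUSE-2024", "price": 29.99}  # written with v1
print(resolve(old_record, reader_fields))
```

The error branch is the essence of a compatibility check: a new reader schema is backward compatible only if every field it adds carries a default, so old data can always be resolved.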
What tools work with Avro format?
Many tools support Avro format: JSON to Avro for encoding data, Avro to JSON for decoding, Avro Formatter for beautifying schemas, Avro Schema Validator for checking schema validity, and CSV to Avro for batch data conversion.
Related Tools
JSON to Avro
Convert JSON data to Apache Avro format with automatic schema generation
Avro to JSON
Convert Apache Avro data to JSON format with schema validation
Avro Fixer
Fix and repair broken Avro schemas with automatic syntax error correction
Avro Schema Generator
Generate Apache Avro schemas from JSON data with automatic type inference
Avro Schema Validator
Validate Apache Avro schemas with syntax and structure checking for Kafka
Avro Formatter
Format and beautify Apache Avro schemas with proper indentation