Schema evolution is probably the biggest challenge with Protocol Buffers. You need to change your data structure, but you can't break the thousands of clients already running old code. Get it wrong and everything breaks. Get it right and you can evolve gracefully.
This guide shows you exactly how to add fields, remove fields, and modify your schemas without causing disasters. We'll use real examples and explain what works, what doesn't, and why.
The Golden Rules of Schema Evolution
Rule #1: Never Change Field Numbers
This is the most important rule. Field numbers are how protobuf identifies fields. Change the number and old messages parse into wrong fields. This breaks everything.
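To see why the number matters, here is a minimal pure-Python sketch of how a string field is laid out on the wire. It is not the protobuf library: the helper names (`encode_varint`, `encode_string_field`, `decode_string_fields`) are made up for illustration, and it only handles length-delimited string fields.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    """Tag (field number + wire type 2), then length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_fields(buf, names_by_number):
    """Decode string fields using the reader's own number -> name map."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        length, pos = decode_varint(buf, pos)
        value = buf[pos:pos + length].decode("utf-8")
        pos += length
        number = tag >> 3
        fields[names_by_number.get(number, "<unknown %d>" % number)] = value
    return fields


# The writer uses the original numbering: msisdn = 1, name = 2.
wire = encode_string_field(1, "+91-9876543210") + encode_string_field(2, "Test User")

original = decode_string_fields(wire, {1: "msisdn", 2: "name"})
# A reader whose schema swapped the numbers silently mislabels the data.
renumbered = decode_string_fields(wire, {1: "name", 2: "msisdn"})
```

Nothing fails in the second decode. The phone number simply shows up under `name`, which is why renumbering is so hard to spot in production.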
Rule #2: Always Add New Fields, Never Remove
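Adding fields is safe. Deleting a field outright is risky: old code may still read or write it, and a later field could accidentally reuse its number. Instead of deleting, mark the field as deprecated, migrate your clients, then reserve the number.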
Rule #3: Make All Fields Optional
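Proto3 has no required fields: any field may be missing from a message, and readers simply see its default value. This makes evolution much easier because new code can handle old messages that lack new fields, and old code can skip fields it doesn't know.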
Adding New Fields (Safe)
Adding fields is the safest operation. Old code ignores new fields, and new code sees default values for fields that old messages don't carry.
Scenario: Adding Email Field
Version 1 (Original):
message Subscriber {
  string msisdn = 1;
  string name = 2;
  bool is_active = 3;
}
✓ Version 2 (Safe - Added email):
message Subscriber {
  string msisdn = 1;
  string name = 2;
  bool is_active = 3;
  string email = 4;    // New field - totally safe!
  string address = 5;  // Can keep adding more
}
What happens: Old clients ignore fields 4 and 5. New clients see empty values for these fields when parsing old messages. Everyone works fine!
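The "ignore what you don't know" behaviour can be sketched with a toy parser. This is a simplified stand-in for a real protobuf decoder, handling only varint and length-delimited fields; `parse`, `string_field`, and `bool_field` are hypothetical helpers, and the schemas are plain number-to-name dicts.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def string_field(number, text):
    data = text.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def bool_field(number, flag):
    return encode_varint((number << 3) | 0) + encode_varint(1 if flag else 0)


def parse(buf, schema):
    """schema maps field number -> name; other numbers are skipped and noted."""
    known, skipped, pos = {}, [], 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        if number in schema:
            known[schema[number]] = value
        else:
            skipped.append(number)  # the wire type says how far to skip
    return known, skipped


V1_SCHEMA = {1: "msisdn", 2: "name", 3: "is_active"}
V2_SCHEMA = {1: "msisdn", 2: "name", 3: "is_active", 4: "email", 5: "address"}

v1_bytes = string_field(1, "+91-9876543210") + string_field(2, "Test User") + bool_field(3, True)
v2_bytes = v1_bytes + string_field(4, "test@example.com")

old_view, skipped = parse(v2_bytes, V1_SCHEMA)  # old client, new message
new_view, _ = parse(v1_bytes, V2_SCHEMA)        # new client, old message
email = new_view.get("email", b"")              # absent field reads as the default
```

The old client can step over field 4 because the tag carries the wire type, which tells it how many bytes to skip even though it has no idea what the field means.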
Removing Fields (The Right Way)
You can't actually "remove" fields - but you can stop using them. Here's the safe migration path:
Scenario: Removing Old Status Field
Version 1 (Original):
message Subscriber {
  string msisdn = 1;
  string name = 2;
  string old_status = 3;  // We want to remove this
}
Step 1: Mark Deprecated (Deploy this first):
message Subscriber {
  string msisdn = 1;
  string name = 2;
  string old_status = 3 [deprecated = true];  // Mark deprecated
  SubscriberStatus new_status = 4;            // Add replacement (enum defined elsewhere)
}
✓ Step 2: Reserve Number (After all clients migrated):
message Subscriber {
  reserved 3;             // Never use this number again
  reserved "old_status";  // Never use this name again
  string msisdn = 1;
  string name = 2;
  SubscriberStatus new_status = 4;  // Only the new field remains
}
Migration timeline:
- Deploy the Step 1 schema: mark old_status as deprecated, add new_status
- Update all writers to populate both fields (for backward compat)
- Update all readers to use new_status and ignore old_status
- Wait until ALL clients are updated
- Deploy the Step 2 schema: remove old_status and reserve its number and name
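The dual-write phase of this timeline might look like the sketch below. Plain dicts stand in for generated message classes, and both the `SubscriberStatus` values and the legacy strings are assumptions made for illustration, since the schema above never spells them out.

```python
from enum import IntEnum


class SubscriberStatus(IntEnum):
    # Hypothetical values: the schema above never defines this enum.
    STATUS_UNSPECIFIED = 0
    ACTIVE = 1
    SUSPENDED = 2


# Hypothetical legacy strings that old_status is assumed to carry.
NEW_TO_LEGACY = {
    SubscriberStatus.ACTIVE: "active",
    SubscriberStatus.SUSPENDED: "suspended",
}


def write_status(record, status):
    """Transition-phase writer: populate both fields on every write."""
    record["new_status"] = int(status)
    record["old_status"] = NEW_TO_LEGACY.get(status, "")
    return record


def read_status_old_client(record):
    """A client that has not migrated yet only knows old_status."""
    return record.get("old_status", "")


def read_status_new_client(record):
    """A migrated client reads new_status and ignores old_status."""
    return SubscriberStatus(record.get("new_status", 0))


record = write_status({}, SubscriberStatus.SUSPENDED)
```

Because every write fills both fields, migrated and unmigrated clients agree on the status for as long as the transition lasts. Once no client calls the old reader any more, the writer can stop populating `old_status` and the number can be reserved.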
Renaming Fields (Surprisingly Easy)
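Field names are just for code generation. The binary wire format uses numbers. So renaming is actually safe for binary-encoded messages! One caveat: the JSON and text-format encodings do use field names, so a rename breaks anything that exchanges or stores messages in those formats.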
Before:
message Subscriber {
  string phone_number = 1;  // Unclear name
  string customer_name = 2;
}
✓ After (Completely safe!):
message Subscriber {
  string msisdn = 1;  // Renamed - totally safe!
  string name = 2;    // Also renamed - no problem!
}
Why it works: The field numbers (1, 2) didn't change, so old and new binaries exchange identical bytes. Only the generated accessor names in your code change, and code that uses them has to be updated when you regenerate.
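A quick way to convince yourself is to encode the same values under both sets of names and compare the bytes. The sketch below uses a hand-rolled encoder (`encode_message` is a hypothetical helper, not a protobuf API), and a plain `json.dumps` of a dict stands in for any name-keyed encoding.

```python
import json


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_message(values, numbers_by_name):
    """Serialize string fields; only the number from the schema reaches the wire."""
    out = b""
    for name, text in values.items():
        data = text.encode("utf-8")
        tag = (numbers_by_name[name] << 3) | 2  # wire type 2 = length-delimited
        out += encode_varint(tag) + encode_varint(len(data)) + data
    return out


BEFORE = {"phone_number": 1, "customer_name": 2}
AFTER = {"msisdn": 1, "name": 2}

old_bytes = encode_message(
    {"phone_number": "+91-9876543210", "customer_name": "Test User"}, BEFORE)
new_bytes = encode_message(
    {"msisdn": "+91-9876543210", "name": "Test User"}, AFTER)
binary_unchanged = old_bytes == new_bytes

# Name-keyed encodings are a different story: the keys themselves change.
old_json = json.dumps({"phone_number": "+91-9876543210"})
new_json = json.dumps({"msisdn": "+91-9876543210"})
json_unchanged = old_json == new_json
```

The binary payloads match byte for byte, while the name-keyed output differs, so it is worth checking whether any consumer reads your messages as JSON before renaming.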
Changing Field Types (Dangerous!)
Changing types is risky. Some changes work, others cause data corruption. Here's the breakdown:
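✓ SAFE Type Changes
- ✓ int32 ↔ int64 (as long as values fit in 32 bits; larger values are truncated when read as int32)
- ✓ sint32 ↔ sint64 (same caveat; not interchangeable with int32/int64)
- ✓ fixed32 ↔ sfixed32
- ✓ fixed64 ↔ sfixed64
- ✓ Single value ↔ repeated (for string, bytes, and message fields; risky for numeric types, because proto3 packs repeated numeric fields by default)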
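✗ UNSAFE Type Changes
- ✗ int32 → string (different wire types; old readers can't read the value)
- ✗ bool → int32 (wire-compatible, but old readers treat every nonzero value as true)
- ✗ string → bytes (only works while the bytes remain valid UTF-8)
- ✗ Message → primitive type (total failure)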
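Two of the pitfalls behind these lists can be shown with plain integer arithmetic: truncation when a 64-bit value is read through a 32-bit field, and the ZigZag mapping that makes sint32/sint64 incompatible with int32/int64. This is a sketch of the arithmetic only, and the helper names are made up for illustration.

```python
def as_int32(varint_value):
    """Interpret a decoded varint as int32: keep the low 32 bits, two's complement."""
    low = varint_value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def zigzag_encode(n):
    """sint32/sint64 mapping: 0, -1, 1, -2, ... become 0, 1, 2, 3, ..."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


# int64 -> int32: small values survive, large ones lose their high bits.
small_ok = as_int32(42)
truncated = as_int32(2**32 + 5)

# A sint32 writer stores -1 as the varint 1; an int32 reader takes it literally.
on_wire = zigzag_encode(-1)
misread = as_int32(on_wire)
round_trip = zigzag_decode(zigzag_encode(-123))
```

Here `truncated` comes out as 5 rather than 4294967301, and `misread` comes out as 1 rather than -1. Neither case raises an error, which is what makes type changes dangerous.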
⚠️ BETTER APPROACH: Add New Field
Instead of changing types, add a new field with the correct type:
message Subscriber {
  string user_id_old = 1 [deprecated = true];  // Was string
  int64 user_id = 2;                           // Now integer
  // Migrate over time, then reserve field 1
}
Evolving Enums
Enums have special rules. Old code must handle new enum values gracefully.
Adding Enum Values (Safe)
Version 1:
enum SubscriptionType {
  PREPAID = 0;
  POSTPAID = 1;
}
✓ Version 2 (Safe - Added new value):
enum SubscriptionType {
  PREPAID = 0;
  POSTPAID = 1;
  CORPORATE = 2;   // New value - safe!
  ENTERPRISE = 3;  // Can keep adding
}
What happens: Old code receives CORPORATE (value 2) but doesn't recognize it. Most implementations store it as the integer and preserve it when re-serializing.
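Application code still has to cope with the unrecognized number. One defensive pattern, sketched here with Python's `IntEnum` standing in for the generated enum (`describe` and `handle` are hypothetical helpers), is to label unknown values instead of raising, and to pass the raw integer along untouched.

```python
from enum import IntEnum


class SubscriptionType(IntEnum):
    """Version 1 of the enum, as known to an old client."""
    PREPAID = 0
    POSTPAID = 1


def describe(raw_value):
    """Return a label without failing on values added by newer schemas."""
    try:
        return SubscriptionType(raw_value).name
    except ValueError:
        return "UNKNOWN(%d)" % raw_value


def handle(record):
    """Process a record but keep the raw integer so nothing is lost downstream."""
    raw = record["subscription_type"]
    return {"subscription_type": raw, "label": describe(raw)}


known = handle({"subscription_type": 1})
from_newer_writer = handle({"subscription_type": 2})  # CORPORATE in Version 2
```

Keeping the raw value matters when the old service forwards or stores the record, because a newer consumer further down the line can still interpret 2 as CORPORATE.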
✗ Removing Enum Values (Dangerous)
Don't delete an enum value and leave its number free for reuse! Old messages carrying that value would be misread. If a value has to go (here, CORPORATE), reserve its number and name:
enum SubscriptionType {
  reserved 2;            // Don't reuse this number
  reserved "CORPORATE";  // Don't reuse this name
  PREPAID = 0;
  POSTPAID = 1;
  ENTERPRISE = 3;        // Skip 2 forever
}
Refactoring Message Structures
Sometimes you need to restructure - split messages apart or combine them. Here's how:
Scenario: Splitting Flat Structure into Nested
Version 1 (Flat):
message Subscriber {
  string msisdn = 1;
  string name = 2;
  string street = 3;
  string city = 4;
  string country = 5;
}
✓ Version 2 (Nested - Done right):
message Address {
  string street = 1;
  string city = 2;
  string country = 3;
}

message Subscriber {
  string msisdn = 1;
  string name = 2;
  // Keep old fields for backward compat
  string street = 3 [deprecated = true];
  string city = 4 [deprecated = true];
  string country = 5 [deprecated = true];
  // New nested structure
  Address address = 6;
}

// Migration: Write to both old and new fields
// Read from new field first, fall back to old fields
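The two migration comments at the end of that schema can be sketched as a pair of accessors. Dataclasses stand in for the generated classes here, `None` models "the nested field was never set", and `set_address` / `get_address` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str = ""
    city: str = ""
    country: str = ""


@dataclass
class Subscriber:
    msisdn: str = ""
    name: str = ""
    street: str = ""   # deprecated flat field
    city: str = ""     # deprecated flat field
    country: str = ""  # deprecated flat field
    address: Optional[Address] = None  # None models "field not set"


def set_address(sub, addr):
    """Transition-phase write: fill the nested field and mirror the flat ones."""
    sub.address = addr
    sub.street, sub.city, sub.country = addr.street, addr.city, addr.country


def get_address(sub):
    """Read the nested field first, fall back to the deprecated flat fields."""
    if sub.address is not None:
        return sub.address
    return Address(sub.street, sub.city, sub.country)


# A message written by an old client only has the flat fields.
old_style = Subscriber(msisdn="+91-9876543210", street="1 Main St", city="Pune", country="IN")
from_old = get_address(old_style)

# A message written during the transition carries both shapes.
new_style = Subscriber(msisdn="+91-9876543210")
set_address(new_style, Address("2 Park Rd", "Delhi", "IN"))
from_new = get_address(new_style)
```

Routing every read and write through accessors like these keeps the fallback logic in one place, so removing it later is a small change.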
Testing Schema Changes
Always test compatibility before deploying. Here's a simple test pattern:
# Python - Test backward compatibility
import subscriber_v1_pb2
import subscriber_v2_pb2


def test_forward_compatibility():
    """Old code can read new messages"""
    # Create message with v2 (new schema)
    v2_msg = subscriber_v2_pb2.Subscriber()
    v2_msg.msisdn = "+91-9876543210"
    v2_msg.name = "Test User"
    v2_msg.email = "test@example.com"  # New field

    # Serialize with v2
    data = v2_msg.SerializeToString()

    # Parse with v1 (old schema)
    v1_msg = subscriber_v1_pb2.Subscriber()
    v1_msg.ParseFromString(data)  # Should not crash!

    # Old fields should work
    assert v1_msg.msisdn == "+91-9876543210"
    assert v1_msg.name == "Test User"
    # v1 just ignores the email field


def test_backward_compatibility():
    """New code can read old messages"""
    # Create message with v1 (old schema)
    v1_msg = subscriber_v1_pb2.Subscriber()
    v1_msg.msisdn = "+91-9876543210"
    v1_msg.name = "Test User"

    # Serialize with v1
    data = v1_msg.SerializeToString()

    # Parse with v2 (new schema)
    v2_msg = subscriber_v2_pb2.Subscriber()
    v2_msg.ParseFromString(data)  # Should not crash!

    # Old fields should work
    assert v2_msg.msisdn == "+91-9876543210"
    assert v2_msg.name == "Test User"
    # New fields are empty/default
    assert v2_msg.email == ""  # Default value
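Round-trip tests like these can be paired with a structural check that compares two schema versions before anything is deployed. The sketch below is a hypothetical helper, not part of any protobuf tooling: it assumes each schema has been reduced by hand to a `{field_number: (name, type)}` dict, and it is deliberately conservative in that it flags every type change, including wire-compatible ones.

```python
def find_breaking_changes(old_fields, new_fields, reserved_numbers=()):
    """Compare two {field_number: (name, type)} maps and list risky differences."""
    problems = []
    for number, (old_name, old_type) in sorted(old_fields.items()):
        if number not in new_fields:
            if number not in reserved_numbers:
                problems.append(
                    "field %d (%s) removed without reserving its number" % (number, old_name))
            continue
        new_name, new_type = new_fields[number]
        if new_type != old_type:  # a rename alone is not flagged
            problems.append(
                "field %d changed type %s -> %s" % (number, old_type, new_type))
    for number in sorted(new_fields):
        if number in reserved_numbers:
            problems.append("field %d reuses a reserved number" % number)
    return problems


V1 = {1: ("msisdn", "string"), 2: ("name", "string"), 3: ("old_status", "string")}
GOOD_V2 = {1: ("msisdn", "string"), 2: ("name", "string"), 4: ("new_status", "SubscriberStatus")}
BAD_V2 = {1: ("msisdn", "string"), 2: ("name", "string"), 3: ("is_active", "bool")}

clean = find_breaking_changes(V1, GOOD_V2, reserved_numbers={3})
forgot_reserved = find_breaking_changes(V1, GOOD_V2)
reused_number = find_breaking_changes(V1, BAD_V2)
```

A check along these lines can run in CI next to the compatibility tests, so that a reused number is caught at review time instead of in production.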
Evolution Best Practices Checklist
- Use reserved for the numbers and names of deleted fields
- Mark fields you are phasing out with [deprecated = true]
External References
Official Documentation
- Updating Message Types - Official evolution guide
- API Compatibility - Google's recommendations
Conclusion
Schema evolution doesn't have to be scary. Follow the golden rules: never change field numbers, always add (never remove), and test both directions. When in doubt, add a new field instead of modifying existing ones.
Plan your migrations carefully. Keep both old and new fields during transitions. Use deprecation warnings to guide developers. And always, always test compatibility before deploying to production. These practices will save you from painful debugging sessions at 2 AM.