Schema evolution is one of the hardest parts of working with Protocol Buffers. You need to change your data structure, but you can't break the thousands of clients already running old code. Get it wrong and old and new binaries misread each other's messages. Get it right and you can evolve gracefully.

This guide shows you exactly how to add fields, remove fields, and modify your schemas without causing disasters. We'll use real examples and explain what works, what doesn't, and why.

The Golden Rules of Schema Evolution

Rule #1: Never Change Field Numbers

This is the most important rule. Field numbers are how protobuf identifies fields on the wire. Change a number and data written under the old number is either skipped as unknown or decoded into whichever field now owns that number.
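
To make the rule concrete, here is a minimal sketch of how a string field is laid out on the wire, written in plain Python without the protobuf library. The helper names (`encode_varint`, `encode_string_field`) are made up for illustration. The layout follows the documented encoding: the tag is the field number shifted left by 3 bits, combined with the wire type.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, nested messages


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag + length + UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | WIRE_TYPE_LEN
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# The field name never appears on the wire, only the number does.
as_field_1 = encode_string_field(1, "12345")
as_field_2 = encode_string_field(2, "12345")
print(as_field_1.hex())  # 0a053132333435
print(as_field_2.hex())  # 12053132333435
```

The same value produces different bytes under a different number, which is why a reader built against the old numbering cannot find it.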

Rule #2: Always Add New Fields, Never Remove

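Adding fields is safe. Deleting a field outright breaks old code that still reads it, and frees its number for accidental reuse. Instead of deleting, mark the field as deprecated first, then reserve its number once nothing depends on it.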

Rule #3: Make All Fields Optional

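Proto3 has no required fields: any field may be missing from a message, and readers see its default value instead. This makes evolution much easier because new code can parse messages written before a field existed, and old code can skip fields it doesn't know.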

Adding New Fields (Safe)

Adding fields is the safest operation. Old code ignores new fields, and new code sees default values for fields that are missing.

Scenario: Adding Email Field

Version 1 (Original):

message Subscriber {
  string msisdn = 1;
  string name = 2;
  bool is_active = 3;
}

✓ Version 2 (Safe - Added email):

message Subscriber {
  string msisdn = 1;
  string name = 2;
  bool is_active = 3;
  string email = 4;        // New field - totally safe!
  string address = 5;      // Can keep adding more
}

What happens: Old clients ignore fields 4 and 5. New clients see empty values for these fields when parsing old messages. Everyone works fine!
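
The skipping behaviour can be sketched with a tiny hand-written decoder. This is an illustration under simplifying assumptions, not how the real runtime is implemented: it handles only varint and length-delimited fields, the function names are hypothetical, and it drops unknown fields (real runtimes typically keep them so they survive re-serialization).

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known_fields(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode varint and length-delimited fields, skipping unknown numbers."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (bool, int32, ...)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:  # fields with unknown numbers are skipped
            fields[known[number]] = value
    return fields


# A message written with the Version 2 schema, including email (field 4).
v2_bytes = (
    b"\x0a\x03123"        # field 1, string "123"
    b"\x12\x02Al"         # field 2, string "Al"
    b"\x18\x01"           # field 3, varint 1 (true)
    b"\x22\x07a@b.com"    # field 4, string "a@b.com"
)
V1_FIELDS = {1: "msisdn", 2: "name", 3: "is_active"}
print(parse_known_fields(v2_bytes, V1_FIELDS))
```

A reader that only knows the Version 1 numbers still recovers msisdn, name, and is_active, because each field carries enough information (wire type and length) to be skipped safely.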

Removing Fields (The Right Way)

You can't actually "remove" fields - but you can stop using them. Here's the safe migration path:

Scenario: Removing Old Status Field

Version 1 (Original):

message Subscriber {
  string msisdn = 1;
  string name = 2;
  string old_status = 3;  // We want to remove this
}

Step 1: Mark Deprecated (Deploy this first):

message Subscriber {
  string msisdn = 1;
  string name = 2;
  string old_status = 3 [deprecated = true];  // Mark deprecated
  SubscriberStatus new_status = 4;             // Add replacement
}

✓ Step 2: Reserve Number (After all clients migrated):

message Subscriber {
  reserved 3;                       // Never use this number again
  reserved "old_status";            // Never use this name again
  
  string msisdn = 1;
  string name = 2;
  SubscriberStatus new_status = 4;  // Only the new field remains
}

Migration timeline:

1. Deploy the Step 1 schema: mark old_status as deprecated and add new_status
2. Update all writers to populate both fields (for backward compatibility)
3. Update all readers to use new_status and ignore old_status
4. Wait until ALL clients are updated
5. Deploy the Step 2 schema: drop old_status and reserve its number and name
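
The dual-write and read steps of this timeline could look roughly like the sketch below. Everything here is a stand-in: the dataclass replaces the generated Subscriber class, the SubscriberStatus values are hypothetical (the guide does not define that enum), and the string-to-enum mapping is an assumption about what the legacy field contained.

```python
from dataclasses import dataclass
from enum import IntEnum


class SubscriberStatus(IntEnum):
    # Hypothetical values for illustration only.
    STATUS_UNSPECIFIED = 0
    ACTIVE = 1
    SUSPENDED = 2


@dataclass
class Subscriber:
    """Plain stand-in for the generated Subscriber class."""
    msisdn: str = ""
    name: str = ""
    old_status: str = ""  # deprecated field 3
    new_status: SubscriberStatus = SubscriberStatus.STATUS_UNSPECIFIED  # field 4


# Assumed mapping between legacy strings and the new enum.
LEGACY_TO_NEW = {
    "active": SubscriberStatus.ACTIVE,
    "suspended": SubscriberStatus.SUSPENDED,
}
NEW_TO_LEGACY = {status: text for text, status in LEGACY_TO_NEW.items()}


def set_status(sub: Subscriber, status: SubscriberStatus) -> None:
    """Dual-write: populate both fields while old readers still exist."""
    sub.new_status = status
    sub.old_status = NEW_TO_LEGACY.get(status, "")


def get_status(sub: Subscriber) -> SubscriberStatus:
    """Prefer the new field; fall back to the legacy string from old writers."""
    if sub.new_status != SubscriberStatus.STATUS_UNSPECIFIED:
        return sub.new_status
    return LEGACY_TO_NEW.get(sub.old_status, SubscriberStatus.STATUS_UNSPECIFIED)
```

Funnelling all access through two helpers like these keeps the transition logic in one place, so it can be deleted in a single change once the old field is reserved.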

Renaming Fields (Surprisingly Easy)

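Field names are used for code generation, not for the binary wire format, which identifies fields by number. So renaming is safe on the wire. The exceptions are name-based encodings such as JSON and text format, where a rename is a breaking change.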

Before:

message Subscriber {
  string phone_number = 1;  // Unclear name
  string customer_name = 2;
}

✓ After (safe on the binary wire format):

message Subscriber {
  string msisdn = 1;  // Renamed - totally safe!
  string name = 2;    // Also renamed - no problem!
}

Why it works: The field numbers (1, 2) didn't change, so old and new binaries still exchange binary messages correctly. Only the generated accessor names in your code change.
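
A small sketch can show where a rename is invisible and where it is not. It encodes the same values with both sets of names, once in the binary layout and once as JSON. The helpers are hypothetical and simplified: the binary encoder handles only short strings, and the JSON output uses the field names as written, whereas the real protobuf JSON mapping uses lowerCamelCase names by default.

```python
import json

# Two versions of the same message: names differ, numbers do not.
OLD_NAMES = {1: "phone_number", 2: "customer_name"}
NEW_NAMES = {1: "msisdn", 2: "name"}

values_by_number = {1: "12345", 2: "Asha"}


def to_wire(values: dict[int, str]) -> bytes:
    """Binary layout for short string fields: tag, length, UTF-8 bytes."""
    out = bytearray()
    for number, text in sorted(values.items()):
        payload = text.encode("utf-8")
        # Keeps tag and length to one byte each in this sketch.
        assert number < 16 and len(payload) < 128
        out += bytes([(number << 3) | 2, len(payload)]) + payload
    return bytes(out)


def to_json(values: dict[int, str], names: dict[int, str]) -> str:
    """Name-keyed encoding: the field names become part of the payload."""
    return json.dumps({names[n]: v for n, v in sorted(values.items())})


wire = to_wire(values_by_number)  # no names involved, same for both versions
old_json = to_json(values_by_number, OLD_NAMES)
new_json = to_json(values_by_number, NEW_NAMES)
print(wire.hex())
print(old_json)
print(new_json)
```

The binary bytes never mention a name, while the two JSON documents have different keys, so a client parsing JSON with the old names would not find the renamed fields.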

Changing Field Types (Dangerous!)

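Changing types is risky. Some changes are wire-compatible, others silently corrupt data. Here's the breakdown:

✓ SAFE Type Changes

✓ int32 ↔ uint32 ↔ int64 ↔ uint64 ↔ bool (values that don't fit the narrower type are truncated, so this only holds for small numbers)
✓ sint32 ↔ sint64 (small numbers; not interchangeable with the other integer types)
✓ fixed32 ↔ sfixed32
✓ fixed64 ↔ sfixed64
✓ string ↔ bytes (as long as the bytes are valid UTF-8)
✓ Single value ↔ repeated for string, bytes, and message fields

✗ UNSAFE Type Changes

✗ int32 → string (different wire type; old data is lost to the new field)
✗ int32 ↔ sint32 (same wire type but different encoding, so values are misread)
✗ Single value ↔ repeated for numeric types (repeated numerics are packed by default in proto3)
✗ Message → primitive type (total failure)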
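
To see how a type change can corrupt values without any parse error, consider two cases in plain Python: a value written as sint32 (ZigZag-encoded) but read as int32, and an int64 value read through an int32 field. The function names are illustrative; the ZigZag formula and the keep-the-low-32-bits behaviour follow the documented encoding rules.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to the unsigned value sint64 puts on the wire."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode."""
    return (u >> 1) ^ -(u & 1)


def as_int32(u: int) -> int:
    """Interpret a decoded varint as int32, keeping only the low 32 bits."""
    u &= 0xFFFFFFFF
    return u - (1 << 32) if u & 0x80000000 else u


wire_value = zigzag_encode(-1)      # writer declared the field as sint32
misread = as_int32(wire_value)      # reader declared it as int32
truncated = as_int32(2**32 + 5)     # int64 value read through an int32 field

print(misread)    # 1, not -1
print(truncated)  # 5, the upper bits are gone
```

Both reads succeed, which is the dangerous part: nothing fails, the numbers are simply wrong.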

⚠BETTER APPROACH: Add New Field

Instead of changing types, add a new field with the correct type:

message Subscriber {
  string user_id_old = 1 [deprecated = true];  // Was string
  int64 user_id = 2;                            // Now integer
  // Migrate over time, then reserve field 1
}

Evolving Enums

Enums have special rules. Old code must handle new enum values gracefully.

Adding Enum Values (Safe)

Version 1:

enum SubscriptionType {
  PREPAID = 0;
  POSTPAID = 1;
}

✓ Version 2 (Safe - Added new value):

enum SubscriptionType {
  PREPAID = 0;
  POSTPAID = 1;
  CORPORATE = 2;   // New value - safe!
  ENTERPRISE = 3;  // Can keep adding
}

What happens: Old code receives CORPORATE (value 2) but doesn't recognize it. In proto3, most implementations keep it as the raw integer and preserve it when re-serializing, so old code should treat unrecognized values as a normal case rather than an error.
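
On the application side, old code needs a path for values it has never seen. A minimal sketch, using a plain Python IntEnum as a stand-in for the generated Version 1 enum (the `describe` helper is hypothetical):

```python
from enum import IntEnum


class SubscriptionType(IntEnum):
    """The Version 1 enum as old code knows it."""
    PREPAID = 0
    POSTPAID = 1


def describe(raw_value: int) -> str:
    """Return a label without failing on values added after this code was built."""
    try:
        return SubscriptionType(raw_value).name
    except ValueError:
        # Unknown to this build: keep the number instead of raising or guessing.
        return f"UNKNOWN({raw_value})"


print(describe(1))  # POSTPAID
print(describe(2))  # UNKNOWN(2)
```

The same idea applies in any language: give every switch or lookup over an enum a default branch that tolerates new values.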

✗ Removing Enum Values (Dangerous)

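Don't delete an enum value and leave its number free for reuse! Old messages carrying that value would be misinterpreted. If a value has to go, reserve its number and name, and leave every other value's number unchanged:

enum SubscriptionType {
  reserved 2;              // Don't reuse this number
  reserved "CORPORATE";    // Don't reuse this name
  
  PREPAID = 0;
  POSTPAID = 1;
  ENTERPRISE = 3;  // Keeps its number; 2 is skipped forever
}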

Refactoring Message Structures

Sometimes you need to restructure - split messages apart or combine them. Here's how:

Scenario: Splitting Flat Structure into Nested

Version 1 (Flat):

message Subscriber {
  string msisdn = 1;
  string name = 2;
  string street = 3;
  string city = 4;
  string country = 5;
}

✓ Version 2 (Nested - Done right):

message Address {
  string street = 1;
  string city = 2;
  string country = 3;
}

message Subscriber {
  string msisdn = 1;
  string name = 2;
  
  // Keep old fields for backward compat
  string street = 3 [deprecated = true];
  string city = 4 [deprecated = true];
  string country = 5 [deprecated = true];
  
  // New nested structure
  Address address = 6;
}

// Migration: Write to both old and new fields
// Read from new field first, fall back to old fields
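
The two migration comments above can be sketched as a pair of helpers. The dataclasses are plain stand-ins for the generated classes and the helper names are made up; with real generated Python classes, the presence check on the nested message would typically be `HasField("address")` rather than a comparison with None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str = ""
    city: str = ""
    country: str = ""


@dataclass
class Subscriber:
    """Plain stand-in for the generated Version 2 class."""
    msisdn: str = ""
    name: str = ""
    street: str = ""    # deprecated field 3
    city: str = ""      # deprecated field 4
    country: str = ""   # deprecated field 5
    address: Optional[Address] = None  # new field 6; None means "not set"


def read_address(sub: Subscriber) -> Address:
    """Read from the nested field first, fall back to the flat fields."""
    if sub.address is not None:
        return sub.address
    return Address(sub.street, sub.city, sub.country)


def write_address(sub: Subscriber, addr: Address) -> None:
    """Write to both the nested field and the deprecated flat fields."""
    sub.address = addr
    sub.street, sub.city, sub.country = addr.street, addr.city, addr.country


legacy = Subscriber(msisdn="123", city="Pune", country="IN")  # written by old code
migrated = Subscriber(msisdn="123")
write_address(migrated, Address("1 Main St", "Pune", "IN"))
```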

Testing Schema Changes

Always test compatibility before deploying. Here's a simple test pattern:

# Python - Test forward and backward compatibility
import subscriber_v1_pb2
import subscriber_v2_pb2

def test_forward_compatibility():
    """Old code can read new messages"""
    # Create message with v2 (new schema)
    v2_msg = subscriber_v2_pb2.Subscriber()
    v2_msg.msisdn = "+91-9876543210"
    v2_msg.name = "Test User"
    v2_msg.email = "test.user@example.com"  # New field
    
    # Serialize with v2
    data = v2_msg.SerializeToString()
    
    # Parse with v1 (old schema)
    v1_msg = subscriber_v1_pb2.Subscriber()
    v1_msg.ParseFromString(data)  # Should not crash!
    
    # Old fields should work
    assert v1_msg.msisdn == "+91-9876543210"
    assert v1_msg.name == "Test User"
    # v1 just ignores the email field

def test_backward_compatibility():
    """New code can read old messages"""
    # Create message with v1 (old schema)
    v1_msg = subscriber_v1_pb2.Subscriber()
    v1_msg.msisdn = "+91-9876543210"
    v1_msg.name = "Test User"
    
    # Serialize with v1
    data = v1_msg.SerializeToString()
    
    # Parse with v2 (new schema)
    v2_msg = subscriber_v2_pb2.Subscriber()
    v2_msg.ParseFromString(data)  # Should not crash!
    
    # Old fields should work
    assert v2_msg.msisdn == "+91-9876543210"
    assert v2_msg.name == "Test User"
    # New fields are empty/default
    assert v2_msg.email == ""  # Default value
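
Round-trip tests catch problems at runtime. A complementary idea is to compare two schema versions directly before deploying. The sketch below is a hypothetical, deliberately conservative checker: schemas are hand-written dicts of field number to (name, type) rather than parsed .proto files, renames are allowed, and every type change is flagged even if it happens to be wire-compatible.

```python
# Each schema is described as {field_number: (field_name, field_type)}.
Schema = dict[int, tuple[str, str]]


def find_breaking_changes(
    old: Schema, new: Schema, reserved: frozenset[int] = frozenset()
) -> list[str]:
    """List changes between two schema versions that deserve a closer look."""
    problems = []
    for number, (old_name, old_type) in old.items():
        if number not in new:
            if number not in reserved:
                problems.append(
                    f"field {number} ({old_name}) removed without reserving its number"
                )
            continue
        _new_name, new_type = new[number]  # renames are not flagged
        if new_type != old_type:
            problems.append(f"field {number} changed type {old_type} -> {new_type}")
    for number in new:
        if number in reserved:
            problems.append(f"field {number} reuses a reserved number")
    return problems


V1 = {1: ("msisdn", "string"), 2: ("name", "string"), 3: ("old_status", "string")}
V2 = {1: ("msisdn", "string"), 2: ("name", "string"), 4: ("new_status", "SubscriberStatus")}

print(find_breaking_changes(V1, V2, reserved=frozenset({3})))  # []
print(find_breaking_changes(V1, V2))  # field 3 was dropped but not reserved
```

A check like this could run in CI next to the round-trip tests, so a risky edit is flagged before any binary is built.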

Evolution Best Practices Checklist

✓ Always add new fields, never remove

✓ Use reserved for deleted fields

✓ Mark deprecated fields with [deprecated = true]

✓ Test both forward and backward compatibility

✓ Avoid required fields (proto3 has none)

✓ Document breaking changes in comments

✓ Plan migration paths before deploying

✓ Keep old and new fields during transition

✗ Never reuse field numbers

✗ Never change field numbers

✗ Never change types unless absolutely necessary

Conclusion

Schema evolution doesn't have to be scary. Follow the golden rules: never change field numbers, always add (never remove), and test both directions. When in doubt, add a new field instead of modifying existing ones.

Plan your migrations carefully. Keep both old and new fields during transitions. Use deprecation warnings to guide developers. And always, always test compatibility before deploying to production. These practices will save you from painful debugging sessions at 2 AM.
