Protocol Buffers Best Practices

Essential dos and don'ts for building robust, maintainable Protobuf schemas

Published: January 2025 • 12 min read

Protocol Buffers is powerful, but like any tool, there are right ways and wrong ways to use it. Follow these best practices to avoid common pitfalls, ensure backward compatibility, and build schemas that stand the test of time.

These aren't just theoretical rules. They're battle-tested practices from teams running protobuf in production at scale. We'll use simple examples so you can apply them immediately.

Schema Design Best Practices

✓ DO: Keep Messages Small and Focused

Each message should represent one thing. Don't build giant "god messages" that hold everything.

❌ BAD:

// Too much in one message
message UserData {
  string name = 1;
  string email = 2;
  string address = 3;
  string phone = 4;
  repeated Order orders = 5;
  repeated Payment payments = 6;
  AccountSettings settings = 7;
  // ... 20 more fields
}

✓ GOOD:

// Split into focused messages
message User {
  string user_id = 1;
  string name = 2;
  string email = 3;
}

message UserContact {
  string user_id = 1;
  string phone = 2;
  string address = 3;
}

message UserOrders {
  string user_id = 1;
  repeated Order orders = 2;
}

✓ DO: Use Clear, Descriptive Names

Names should tell you what the field contains. Avoid abbreviations unless they're industry standard.

❌ BAD:

message Sub {
  string msdn = 1;  // Typo
  string nm = 2;    // Too short
  int32 st = 3;     // Unclear
  bool act = 4;     // Ambiguous
}

✓ GOOD:

message Subscriber {
  string msisdn = 1;      // Standard telecom
  string name = 2;        // Clear
  SubscriptionType type = 3;  // Descriptive
  bool is_active = 4;     // Unambiguous
}

✗ NEVER: Change Field Numbers

Once a field number is assigned, it's permanent. The wire format identifies fields by number, not by name, so changing a number makes old and new code disagree about what the data means.

// Version 1
message Subscriber {
  string name = 1;
  string email = 2;  // ← This field number is FOREVER
}

// Version 2 - WRONG! ❌
message Subscriber {
  string name = 1;
  string email = 3;  // ← Changed from 2 to 3 = DISASTER!
}

// Version 2 - CORRECT! ✓
message Subscriber {
  string name = 1;
  string email = 2;       // Keep original number
  string phone = 3;       // New field gets new number
}
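To see why the number matters more than the name, here's a minimal sketch in plain Python (no protobuf library). It hand-encodes two string fields the way the wire format describes them and decodes them against two number-to-name tables. The helper names, the tables `V1` and `V2_WRONG`, and the sample values are made up for this illustration, and the decoder only understands string fields.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (wire type 2 = length-delimited), then length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_fields(data: bytes, names_by_number: dict) -> dict:
    """Decode a message made only of string fields.

    Field numbers missing from names_by_number are skipped, which is
    roughly how a real parser treats fields it does not know.
    """
    result, pos = {}, 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        length, pos = read_varint(data, pos)
        value = data[pos:pos + length].decode("utf-8")
        pos += length
        name = names_by_number.get(tag >> 3)
        if name is not None:
            result[name] = value
    return result


V1 = {1: "name", 2: "email"}        # original schema
V2_WRONG = {1: "name", 3: "email"}  # email renumbered from 2 to 3

# Bytes written by a V1 producer: note that no field names appear in them.
wire = encode_string_field(1, "Asha") + encode_string_field(2, "asha@example.com")

print(decode_string_fields(wire, V1))
print(decode_string_fields(wire, V2_WRONG))
```

The V1 reader gets both fields back. The renumbered reader gets only `name`: the email bytes are still in the message, but they sit under number 2, which that schema no longer recognizes.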

Field Numbering Best Practices

✓ DO: Reserve Numbers 1-15 for Frequently Used Fields

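Every field on the wire starts with a tag that combines the field number and the wire type. For field numbers 1-15 the tag fits in 1 byte; numbers 16 through 2047 need 2 bytes. Give your most common fields the low numbers.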

message CallRecord {
  // Hot path fields (1-15) - used in every message
  string caller_msisdn = 1;      // Always present
  string callee_msisdn = 2;      // Always present
  int64 duration_seconds = 3;     // Always present
  int64 timestamp = 4;            // Always present
  
  // Less frequent fields (16+)
  string call_id = 16;            // Only for debugging
  repeated string notes = 17;     // Rarely used
  string operator_id = 18;        // Optional metadata
}
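If you want to check the byte counts yourself, here's a small sketch in plain Python. It relies only on the documented wire format, where a field's tag is the varint of `(field_number << 3) | wire_type`. The function names `varint_size` and `tag_size` are made up for this illustration.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Size in bytes of a field's tag: varint of (field_number << 3) | wire_type."""
    return varint_size((field_number << 3) | wire_type)


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

This prints 1 byte for fields 1 and 15, 2 bytes for 16 and 2047, and 3 bytes for 2048. One extra byte per field is small, but it is paid on every occurrence of the field in every message.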

✓ DO: Reserve the Numbers of Deleted Fields

When you remove a field, reserve its number so it's never reused accidentally.

message Subscriber {
  // Reserved numbers from deleted fields
  reserved 2, 5, 9 to 11;
  reserved "old_field_name", "deprecated_status";
  
  // Active fields
  string msisdn = 1;
  string name = 3;
  bool is_active = 4;
  // Never use 2, 5, 9, 10, or 11 again!
}

✓ DO: Leave Gaps for Future Fields

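Don't number fields 1, 2, 3, 4... Leave space so you can add related fields later. These gaps are only a convention: don't declare them with the reserved keyword, because the compiler rejects any field that uses a reserved number.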

message Subscriber {
  // Identity fields (1-10)
  string msisdn = 1;
  string email = 2;
  string name = 3;
  // 4-10 left free for future identity fields
  
  // Status fields (11-20)
  bool is_active = 11;
  SubscriptionType type = 12;
  // 13-20 left free for future status fields
  
  // Metadata fields (21-30)
  int64 created_at = 21;
  int64 updated_at = 22;
  // 23-30 left free for future metadata
}

Choosing the Right Data Types

✓ DO: Use Fixed-Width Types for Performance-Critical Fields

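For fields whose values are almost always large, fixed-width types are faster to encode and smaller on the wire: fixed64 beats a varint once values regularly exceed 2^56, and fixed32 once they exceed 2^28. For small values, varint types stay smaller.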

message TelemetryData {
  // Use fixed64 for timestamps (always 8 bytes, faster)
  fixed64 timestamp_nanos = 1;
  
  // Use fixed32 for hashes (always 4 bytes)
  fixed32 checksum = 2;
  
  // Use regular int64 for IDs (variable length, smaller for small numbers)
  int64 user_id = 3;
  
  // Use sint32 when values are often negative (ZigZag keeps them small)
  sint32 temperature_celsius = 4;
}
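The trade-off is easier to judge with numbers. Here's a rough sketch in plain Python that computes how many bytes a value takes as a varint, and what ZigZag does to a negative number. It assumes only the documented encodings; the helper names and sample values are made up for this illustration, and the sizes exclude the field tag.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int64_wire_size(value: int) -> int:
    """int32/int64 write negatives as 64-bit two's complement before varint encoding."""
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """ZigZag mapping used by sint32: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


timestamp_nanos = 1_700_000_000_000_000_000  # a nanosecond Unix timestamp
print("timestamp as int64 varint:", int64_wire_size(timestamp_nanos), "bytes")
print("timestamp as fixed64:     ", 8, "bytes")
print("42 as int64 varint:       ", int64_wire_size(42), "bytes")
print("-40 as int32 varint:      ", int64_wire_size(-40), "bytes")
print("-40 as sint32 (ZigZag):   ", varint_size(zigzag32(-40)), "bytes")
```

The nanosecond timestamp costs 9 bytes as a varint against 8 as fixed64, while a small ID like 42 costs 1 byte as a varint. A temperature of -40 costs 10 bytes as int32 but 1 byte as sint32.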

✓ DO: Use Enums Instead of Strings for Fixed Values

Enums are more efficient and type-safe than strings. Plus, you catch typos at compile time.

❌ BAD:

message Subscriber {
  string status = 1;  
  // "active", "inactive", 
  // "suspended", "activee" (typo!)
}

✓ GOOD:

enum SubscriberStatus {
  SUBSCRIBER_STATUS_UNKNOWN = 0;  // Zero is the default, so keep it neutral
  ACTIVE = 1;
  INACTIVE = 2;
  SUSPENDED = 3;
}

message Subscriber {
  SubscriberStatus status = 1;
  // Type-safe!
}
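To put a number on "more efficient", here's a quick sketch in plain Python comparing the encoded size of a status sent as a string with the same status sent as an enum. It assumes the documented wire format (strings are length-delimited, enums are varints); the helper names are made up, and the enum number 2 stands in for any small enum value.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def string_field_size(field_number: int, text: str) -> int:
    """Tag + length prefix + UTF-8 payload."""
    payload = len(text.encode("utf-8"))
    return varint_size((field_number << 3) | 2) + varint_size(payload) + payload


def enum_field_size(field_number: int, enum_number: int) -> int:
    """Tag + the enum's number as a varint."""
    return varint_size(field_number << 3) + varint_size(enum_number)


print("string 'suspended':", string_field_size(1, "suspended"), "bytes")
print("enum value 2:      ", enum_field_size(1, 2), "bytes")
```

The string costs 11 bytes and the enum costs 2. In proto3 an enum left at its zero value is not written at all.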

✓ DO: Use Appropriate String Types

Use string for UTF-8 text and bytes for binary data.

message DataPacket {
  // Use string for human-readable text
  string customer_name = 1;      // UTF-8 text
  string description = 2;        // UTF-8 text
  
  // Use bytes for binary data
  bytes encrypted_payload = 3;   // Binary data
  bytes file_contents = 4;       // Binary data
  bytes checksum = 5;            // Binary hash
}

Versioning and Backward Compatibility

✓ DO: Always Add New Fields, Never Remove

Adding fields is safe. Removing a field breaks code that still reads or writes it, so mark it deprecated first. If you do delete it later, reserve its number and name.

message Subscriber {
  string msisdn = 1;
  string name = 2;
  
  // Old field we don't want anymore - DON'T DELETE!
  string old_status = 3 [deprecated = true];  // Mark deprecated
  
  // New replacement field
  SubscriberStatus status_v2 = 4;  // Use this instead
  
  // New field added in v2
  string email = 5;  // Safe to add anytime
}

✓ DO: Make All Fields Optional (proto3 default)

proto3 has no required fields: any field can be absent from the wire, and an absent field reads as its default value. This makes evolution easier.

// proto3 - all fields optional by default
message Subscriber {
  string msisdn = 1;        // Can be missing
  string name = 2;          // Can be missing
  bool is_active = 3;       // Can be missing (defaults to false)
  
  // If you really need to know if a field was set:
  optional string email = 4;  // Explicitly optional (has presence)
}

✓ DO: Use Default Values Wisely

Remember that proto3 doesn't send default values over the wire. Design with this in mind.

message FeatureFlags {
  // Good: false is the safe default
  bool enable_new_feature = 1;  // Default false = feature off
  
  // Careful: false might be meaningful
  bool is_verified = 2;         // false could mean "not verified" or "field not set"
  
  // Better: Use enum when false is meaningful
  enum VerificationStatus {
    VERIFICATION_UNKNOWN = 0;   // Explicit unknown state
    VERIFIED = 1;
    NOT_VERIFIED = 2;
  }
  VerificationStatus verification = 3;
}
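Here's a tiny sketch of that behavior in plain Python. It mimics how a proto3 bool field is written, under two assumptions: the field number is 15 or lower (so the tag is one byte), and calling the function means the field was set. The function name and the `explicit_presence` flag are made up for this illustration; the flag stands for declaring the field with the optional keyword.

```python
def encode_bool_field(field_number: int, value: bool,
                      explicit_presence: bool = False) -> bytes:
    """Encode a proto3 bool field with a field number from 1 to 15.

    Implicit presence (plain `bool x = N;`): False is never written.
    Explicit presence (`optional bool x = N;`): a set field is always written.
    """
    if not value and not explicit_presence:
        return b""  # default value: nothing goes on the wire
    tag = field_number << 3  # wire type 0 (varint)
    return bytes([tag, 1 if value else 0])


print(encode_bool_field(2, True))                           # two bytes: tag and 1
print(encode_bool_field(2, False))                          # empty: looks like "not set"
print(encode_bool_field(2, False, explicit_presence=True))  # two bytes: tag and 0
```

With implicit presence, the receiver sees no bytes for `is_verified = false`, which is exactly what it sees when the sender never touched the field. That's why the enum with an explicit unknown state is the safer design.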

Performance Best Practices

✓ DO: Put Frequently Accessed Fields First

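Most implementations write fields in field-number order (the format doesn't guarantee it) and parsers read a message front to back, so low-numbered fields usually come first on the wire and carry the smallest tags. Give hot data low numbers and push rarely used data to higher ones.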

message CallRecord {
  // Hot path - checked on every message
  string caller_msisdn = 1;      // Read immediately
  string callee_msisdn = 2;      // Read immediately
  int64 duration = 3;             // Read immediately
  
  // Cold path - rarely accessed
  repeated string debug_logs = 20;     // Usually skipped
  map<string, string> metadata = 21;   // Usually skipped
}

✓ DO: Use Repeated Instead of Maps When Possible

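Maps are convenient but have overhead. On the wire, a map is encoded like a repeated key/value message, so the extra cost is in memory: the parser has to build a keyed container. If you only iterate over the entries and never look them up by key, use a repeated field.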

⚠️ SLOWER:

message Subscriber {
  // Map has overhead
  map<string, string> tags = 1;
}

✓ FASTER:

message Tag {
  string key = 1;
  string value = 2;
}

message Subscriber {
  // Repeated is faster
  repeated Tag tags = 1;
}
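Here's a minimal sketch in plain Python of what the two declarations share and where they differ. It hand-encodes entries as nested messages with key = 1 and value = 2, which is the layout a map field uses on the wire and also the layout of the Tag message above. The helper names and the sample tags are made up for this illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (wire type 2), length, UTF-8 payload."""
    payload = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_entry(field_number: int, key: str, value: str) -> bytes:
    """One map entry, or one repeated Tag: a nested message with key = 1, value = 2."""
    body = encode_string_field(1, key) + encode_string_field(2, value)
    return encode_varint((field_number << 3) | 2) + encode_varint(len(body)) + body


entries = [("plan", "prepaid"), ("region", "north"), ("plan", "postpaid")]

# The same bytes serve both `map<string, string> tags = 1` and `repeated Tag tags = 1`.
wire = b"".join(encode_entry(1, k, v) for k, v in entries)

as_repeated = list(entries)  # repeated: order and duplicates preserved
as_map = dict(entries)       # map: one value per key, the last one seen wins

print(len(wire), "bytes on the wire either way")
print(as_repeated)
print(as_map)
```

So the choice doesn't change message size. It changes what the receiver gets: a map gives keyed lookup and unique keys, a repeated field keeps order and duplicates and skips the cost of building the keyed container.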

✓ DO: Reuse Message Objects

Creating new messages allocates memory. Reuse when processing many messages.

⚠️ SLOWER:

# Python - creates new object each time
for data in stream:
    msg = Subscriber()  # New alloc
    msg.ParseFromString(data)
    process(msg)

✓ FASTER:

# Python - reuse object
msg = Subscriber()  # One alloc
for data in stream:
    # ParseFromString clears msg before parsing, so no explicit Clear() is needed
    msg.ParseFromString(data)
    process(msg)

Organization and Maintainability

✓ DO: Use Packages to Organize Schemas

Packages prevent naming conflicts and organize related messages.

// Good organization
syntax = "proto3";

package telecom.subscriber.v1;  // Clear namespace

message Subscriber {
  string msisdn = 1;
}

message SubscriberList {
  repeated Subscriber subscribers = 1;
}

✓ DO: Add Comments Generously

Future you (and your team) will thank you. Explain non-obvious fields.

message CallRecord {
  // Mobile Subscriber ISDN (international format, e.g., +91-9876543210)
  // Must include country code for international calls
  string caller_msisdn = 1;
  
  // Call duration in seconds. Rounded down to nearest second.
  // Billing is based on this value, so ensure accuracy.
  int64 duration_seconds = 2;
  
  // Unix timestamp in milliseconds when call started.
  // Uses server time (UTC) not client time.
  int64 start_timestamp_ms = 3;
}

✓ DO: Version Your Package Names

Include a version in the package name and bump it for breaking changes. Old and new versions can then coexist, which allows gradual migration.

// subscriber_v1.proto
package telecom.subscriber.v1;

message Subscriber {
  string msisdn = 1;
  string name = 2;
}

// subscriber_v2.proto (major breaking changes)
package telecom.subscriber.v2;

message Subscriber {
  string msisdn = 1;
  PersonName name = 2;  // Changed from string to message
  // Can coexist with v1 during migration
}

Common Mistakes to Avoid

❌ Mistake #1: Using required Fields (proto2)

Problem: Required fields can't be removed without breaking everything.

Solution: Use proto3 (all fields optional) or make everything optional in proto2.

❌ Mistake #2: Reusing Field Numbers

Problem: Old messages will parse into wrong fields.

Solution: Use reserved for deleted field numbers.

❌ Mistake #3: Storing Large Binary Data Directly

Problem: The entire message must be loaded into memory.

Solution: Store large files elsewhere, keep references in protobuf.
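One way to keep a reference is to store a location, a size, and a content hash instead of the bytes. Here's a hedged sketch in Python of what that reference could hold. The `BlobRef` class, its field names, and the example URI are all hypothetical; in a real schema these would be ordinary fields on a small protobuf message.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical reference to a large file stored outside the message."""
    uri: str          # where the bytes live (object store, file server, ...)
    size_bytes: int   # lets readers decide whether to fetch at all
    sha256_hex: str   # lets readers verify what they fetched


def make_blob_ref(uri: str, content: bytes) -> BlobRef:
    """Build the reference to embed in the message in place of the raw bytes."""
    return BlobRef(uri, len(content), hashlib.sha256(content).hexdigest())


# Example URI is made up; the content would normally be read from disk or a stream.
ref = make_blob_ref("s3://example-bucket/recordings/call-123.wav", b"hello")
print(ref.uri, ref.size_bytes, ref.sha256_hex)
```

The message stays a few dozen bytes no matter how big the file gets, and consumers that don't need the file never pay for it.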

❌ Mistake #4: Not Planning for Extension

Problem: Schema becomes unmaintainable as it grows.

Solution: Leave gaps in field numbers and group related fields into number ranges.

Quick Reference Checklist

✓ DO

  • Use clear, descriptive field names
  • Reserve 1-15 for frequent fields
  • Add comments to your schema
  • Use enums for fixed values
  • Reserve deleted field numbers
  • Version your package names
  • Keep messages small and focused

✗ DON'T

  • Change field numbers ever
  • Reuse field numbers
  • Use required fields (proto2)
  • Create "god messages"
  • Store huge binary blobs directly
  • Use non-standard abbreviations in names
  • Delete fields outright (deprecate first, then reserve the number)

Conclusion

Following these best practices will save you from painful debugging sessions and breaking changes. Protocol Buffers is incredibly powerful when used correctly, but small mistakes can cause big problems down the line.

Start with good schema design, never change field numbers, and always think about backward compatibility. Your future self and your team will thank you. Remember: it's easier to design it right the first time than to fix it later when you have millions of messages in production.