Protocol Buffers is powerful, but like any tool, there are right ways and wrong ways to use it. Follow these best practices to avoid common pitfalls, ensure backward compatibility, and build schemas that stand the test of time.
These aren't just theoretical rules. They're battle-tested practices from teams running protobuf in production at scale. We'll use simple examples so you can apply them immediately.
Schema Design Best Practices
✓ DO: Keep Messages Small and Focused
Each message should represent one thing. Don't create giant "god messages" with everything.
❌ BAD:
// Too much in one message
message UserData {
  string name = 1;
  string email = 2;
  string address = 3;
  string phone = 4;
  repeated Order orders = 5;
  repeated Payment payments = 6;
  AccountSettings settings = 7;
  // ... 20 more fields
}
✓ GOOD:
// Split into focused messages
message User {
  string user_id = 1;
  string name = 2;
  string email = 3;
}
message UserContact {
  string user_id = 1;
  string phone = 2;
  string address = 3;
}
message UserOrders {
  string user_id = 1;
  repeated Order orders = 2;
}
✓ DO: Use Clear, Descriptive Names
Names should tell you what the field contains. Avoid abbreviations unless they're industry standard.
❌ BAD:
message Sub {
  string msdn = 1;  // Typo
  string nm = 2;    // Too short
  int32 st = 3;     // Unclear
  bool act = 4;     // Ambiguous
}
✓ GOOD:
message Subscriber {
  string msisdn = 1;              // Standard telecom
  string name = 2;                // Clear
  SubscriptionType type = 3;      // Descriptive
  bool is_active = 4;             // Unambiguous
}
✗ NEVER: Change Field Numbers
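Once a field number is assigned, it's permanent. The wire format identifies each field by its number, not its name, so changing a number makes existing data and older clients see the field as missing or as something else.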
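To see why the number is what matters, here is a minimal sketch of how a string field is laid out on the wire. It is written by hand in Python rather than with the protobuf library, and the helper names (`encode_varint`, `encode_string_field`) and the sample address are illustrative assumptions, not part of any API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: tag, then byte length, then UTF-8 data."""
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data


# The same email written under field number 2 and under field number 3.
as_field_2 = encode_string_field(2, "a@example.com")
as_field_3 = encode_string_field(3, "a@example.com")

print(as_field_2.hex())         # first byte is the tag for field 2
print(as_field_3.hex())         # first byte is the tag for field 3
print(b"email" in as_field_2)   # the field name never appears in the bytes
```

Only the leading tag byte differs between the two encodings, and the name `email` is nowhere in the bytes. A reader that expects the email under number 2 has no way to recognize data written under number 3.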
// Version 1
message Subscriber {
  string name = 1;
  string email = 2;  // ← This field number is FOREVER
}
// Version 2 - WRONG! ❌
message Subscriber {
  string name = 1;
  string email = 3;  // ← Changed from 2 to 3 = DISASTER!
}
// Version 2 - CORRECT! ✓
message Subscriber {
  string name = 1;
  string email = 2;  // Keep original number
  string phone = 3;  // New field gets new number
}
Field Numbering Best Practices
✓ DO: Reserve Numbers 1-15 for Frequently Used Fields
Field numbers 1-15 need a 1-byte tag on the wire. Numbers 16 through 2047 need 2 bytes. Give your most frequently set fields the low numbers.
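The byte counts follow from how a tag is built: the field number is shifted left by three bits, combined with a 3-bit wire type, and written as a varint. A small sketch that computes the tag size for a few field numbers (the function names are illustrative, not a library API):

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for the tag, i.e. the varint of (field_number << 3) | wire_type."""
    return len(encode_varint((field_number << 3) | wire_type))


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

The tag is paid once per field occurrence, so the saving matters most for fields that appear in every message or inside large repeated fields.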
message CallRecord {
  // Hot path fields (1-15) - used in every message
  string caller_msisdn = 1;     // Always present
  string callee_msisdn = 2;     // Always present
  int64 duration_seconds = 3;   // Always present
  int64 timestamp = 4;          // Always present
  // Less frequent fields (16+)
  string call_id = 16;          // Only for debugging
  repeated string notes = 17;   // Rarely used
  string operator_id = 18;      // Optional metadata
}
✓ DO: Reserve Field Numbers for Future Use
When you remove a field, reserve its number so it's never reused accidentally.
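The compiler rejects a field that uses a reserved number, but it cannot tell that a number was deleted without being reserved. A check along these lines could run in CI. This is a sketch only: it assumes each schema version has already been reduced to a set of field numbers, and the function and variable names are hypothetical. Real tooling would read the .proto files themselves.

```python
def unreserved_removals(old_numbers: set, new_numbers: set, reserved: set) -> list:
    """Return field numbers that were removed but never added to `reserved`."""
    removed = old_numbers - new_numbers
    return sorted(removed - reserved)


# Hypothetical history of one message: fields 2 and 5 were deleted,
# but only 2 made it into the `reserved` statement.
old = {1, 2, 3, 4, 5}
new = {1, 3, 4}
reserved = {2}

print(unreserved_removals(old, new, reserved))  # numbers still at risk of reuse
```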
message Subscriber {
  // Reserved numbers from deleted fields
  reserved 2, 5, 9 to 11;
  reserved "old_field_name", "deprecated_status";
  // Active fields
  string msisdn = 1;
  string name = 3;
  bool is_active = 4;
  // Never use 2, 5, 9, 10, or 11 again!
}
✓ DO: Leave Gaps for Future Fields
Don't number fields 1, 2, 3, 4... Leave space so you can add related fields later.
message Subscriber {
  // Identity fields (1-10)
  string msisdn = 1;
  string email = 2;
  string name = 3;
  // 4-10 left open for future identity fields
  // Status fields (11-20)
  bool is_active = 11;
  SubscriptionType type = 12;
  // 13-20 left open for future status fields
  // Metadata fields (21-30)
  int64 created_at = 21;
  int64 updated_at = 22;
  // 23-30 left open for future metadata
}
Choosing the Right Data Types
✓ DO: Use Fixed-Width Types for Performance-Critical Fields
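fixed64 always takes 8 bytes and fixed32 always takes 4, so they only beat varints when values are usually large: above 2^56 for 64-bit fields and above 2^28 for 32-bit fields. They are also cheaper to decode than varints.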
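The trade-off is easy to check by computing encoded sizes by hand. The sketch below reimplements varint and ZigZag encoding in plain Python (it does not use the protobuf library, and the helper names and sample values are assumptions for illustration) and compares them with the fixed 8-byte layout.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_size_int64(value: int) -> int:
    """Size of an int32/int64 value; negatives are sign-extended to 64 bits."""
    return len(encode_varint(value & 0xFFFFFFFFFFFFFFFF))


def zigzag32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned one, as sint32 does."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


timestamp_nanos = 1_700_000_000_000_000_000  # a nanosecond Unix timestamp

print(varint_size_int64(timestamp_nanos))       # as int64
print(len(struct.pack("<Q", timestamp_nanos)))  # as fixed64
print(varint_size_int64(42))                    # a small int64
print(varint_size_int64(-5))                    # a negative int32/int64
print(len(encode_varint(zigzag32(-5))))         # the same value as sint32
```

The nanosecond timestamp needs 9 bytes as a varint but 8 as fixed64, while a small ID needs only 1 byte as a varint. A negative value costs 10 bytes as int32 or int64 and 1 byte as sint32.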
message TelemetryData {
  // Use fixed64 for timestamps (always 8 bytes, faster)
  fixed64 timestamp_nanos = 1;
  // Use fixed32 for hashes (always 4 bytes)
  fixed32 checksum = 2;
  // Use regular int64 for IDs (variable length, smaller for small numbers)
  int64 user_id = 3;
  // Use sint32 for values that are often negative (ZigZag encoding keeps
  // small negatives short; a negative int32 always takes 10 bytes)
  sint32 temperature_celsius = 4;
}
✓ DO: Use Enums Instead of Strings for Fixed Values
Enums are more efficient and type-safe than strings. Plus, you catch typos at compile time.
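The size difference can be estimated by encoding the same status both ways. This is a hand-rolled sketch of the wire format in Python, not the protobuf library; the helper names and the enum number used are assumptions for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (wire type 2), byte length, then the UTF-8 data."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_enum_field(field_number: int, enum_number: int) -> bytes:
    """Tag (wire type 0), then the enum number as a varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(enum_number)


as_string = encode_string_field(1, "suspended")
as_enum = encode_enum_field(1, 2)  # any enum number below 128 fits in one byte

print(len(as_string), len(as_enum))
```

The string costs a tag byte, a length byte, and one byte per character, while the enum costs a tag byte plus a single value byte.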
❌ BAD:
message Subscriber {
  string status = 1;
  // "active", "inactive",
  // "suspended", "activee" (typo!)
}
✓ GOOD:
enum SubscriberStatus {
  SUBSCRIBER_STATUS_UNSPECIFIED = 0;  // Zero value means "never set"
  ACTIVE = 1;
  INACTIVE = 2;
  SUSPENDED = 3;
}
message Subscriber {
  SubscriberStatus status = 1;
  // Type-safe!
}
✓ DO: Use Appropriate String Types
Use string for UTF-8 text and bytes for binary data.
message DataPacket {
  // Use string for human-readable text
  string customer_name = 1;      // UTF-8 text
  string description = 2;        // UTF-8 text
  // Use bytes for binary data
  bytes encrypted_payload = 3;   // Binary data
  bytes file_contents = 4;       // Binary data
  bytes checksum = 5;            // Binary hash
}
Versioning and Backward Compatibility
✓ DO: Always Add New Fields, Never Remove
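Adding fields is safe: old readers skip field numbers they don't know. Removing a field breaks any code that still reads or writes it, so mark it deprecated first. If you do delete it later, reserve its number and name.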
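The reason an added field is harmless can be sketched with a tiny hand-written parser. It is plain Python, not the protobuf library, and it handles only string fields; the helper names, the reader tables, and the sample values are assumptions for illustration. Real parsers typically also retain the skipped bytes as unknown fields, which this sketch simply drops.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (wire type 2), byte length, then the UTF-8 data."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def decode_varint(buf: bytes, pos: int):
    """Read one varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_strings(buf: bytes, known: dict) -> dict:
    """Parse string fields, keeping known numbers and skipping the rest."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        payload = buf[pos:pos + length]
        pos += length  # moving past the payload is all that skipping requires
        if number in known:
            fields[known[number]] = payload.decode("utf-8")
    return fields


OLD_READER = {1: "msisdn", 2: "name"}
NEW_READER = {1: "msisdn", 2: "name", 5: "email"}

# A message from a writer that already knows about email (field 5).
wire = (
    encode_string_field(1, "919876543210")
    + encode_string_field(2, "Asha")
    + encode_string_field(5, "asha@example.com")
)

print(parse_strings(wire, OLD_READER))  # email is skipped, nothing breaks
print(parse_strings(wire, NEW_READER))  # email is available
```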
message Subscriber {
  string msisdn = 1;
  string name = 2;
  // Old field we don't want anymore - DON'T DELETE!
  string old_status = 3 [deprecated = true];  // Mark deprecated
  // New replacement field
  SubscriberStatus status_v2 = 4;  // Use this instead
  // New field added in v2
  string email = 5;  // Safe to add anytime
}
✓ DO: Make All Fields Optional (proto3 default)
proto3 has no required fields: any field may be absent from a message, and an absent field reads back as its default value. This makes evolution easier.
// proto3 - all fields optional by default
message Subscriber {
  string msisdn = 1;   // Can be missing
  string name = 2;     // Can be missing
  bool is_active = 3;  // Can be missing (defaults to false)
  // If you really need to know if a field was set:
  optional string email = 4;  // Explicitly optional (has presence)
}
✓ DO: Use Default Values Wisely
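Remember that proto3 doesn't send default values over the wire, so a receiver can't tell a field that was set to its default from one that was never set (unless the field is declared optional). Design with this in mind.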
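The effect is easiest to see in the bytes. The sketch below hand-encodes a bool field twice in plain Python: once the way a regular proto3 field behaves, and once the way a field declared `optional` behaves. The function names are illustrative assumptions, not library calls.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bool_implicit(field_number: int, value: bool) -> bytes:
    """Regular proto3 field: the default (False) is not written at all."""
    if not value:
        return b""
    return encode_varint(field_number << 3) + b"\x01"


def encode_bool_explicit(field_number: int, value: Optional[bool]) -> bytes:
    """`optional` field: written whenever it has been set, even to False."""
    if value is None:  # None stands in for "never set"
        return b""
    return encode_varint(field_number << 3) + (b"\x01" if value else b"\x00")


print(encode_bool_implicit(2, False))  # nothing on the wire
print(encode_bool_explicit(2, False))  # tag plus a zero byte
print(encode_bool_explicit(2, None))   # nothing on the wire
```

With the regular field, "set to false" and "never set" produce the same empty output, which is why a meaningful false needs either `optional` or an enum with an explicit unknown value.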
message FeatureFlags {
  // Good: false is the safe default
  bool enable_new_feature = 1;  // Default false = feature off
  // Careful: false might be meaningful
  bool is_verified = 2;  // false could mean "not verified" or "field not set"
  // Better: Use enum when false is meaningful
  enum VerificationStatus {
    VERIFICATION_UNKNOWN = 0;  // Explicit unknown state
    VERIFIED = 1;
    NOT_VERIFIED = 2;
  }
  VerificationStatus verification = 3;
}
Performance Best Practices
✓ DO: Put Frequently Accessed Fields First
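Low field numbers have smaller tags, and serializers typically write fields in field-number order, so hot fields with low numbers end up at the front of the encoded message. Keep rarely used data in the higher numbers.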
message CallRecord {
  // Hot path - checked on every message
  string caller_msisdn = 1;  // Read immediately
  string callee_msisdn = 2;  // Read immediately
  int64 duration = 3;        // Read immediately
  // Cold path - rarely accessed
  repeated string debug_logs = 20;    // Usually skipped
  map<string, string> metadata = 21;  // Usually skipped
}
✓ DO: Use Repeated Instead of Maps When Possible
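Maps are convenient, but the convenience costs memory and CPU: on the wire a map is encoded exactly like a repeated key/value message, yet the parser has to build a hash table for it. If you only iterate over the entries and never look them up by key, use a repeated field.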
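On the wire, a map field is laid out as a repeated entry message whose key is field 1 and whose value is field 2, so the two schemas that follow produce the same bytes and differ only in the in-memory structure the parser builds. A hand-rolled sketch in plain Python (the helper names and sample tags are illustrative, and it bypasses the protobuf library):

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Tag (wire type 2), payload length, then the payload itself."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def string_field(field_number: int, text: str) -> bytes:
    return length_delimited(field_number, text.encode("utf-8"))


def encode_map_field(field_number: int, mapping: dict) -> bytes:
    """map<string, string>: one entry per pair, key = field 1, value = field 2."""
    return b"".join(
        length_delimited(field_number, string_field(1, key) + string_field(2, value))
        for key, value in mapping.items()
    )


def encode_tag(key: str, value: str) -> bytes:
    """message Tag { string key = 1; string value = 2; }"""
    return string_field(1, key) + string_field(2, value)


def encode_repeated_tags(field_number: int, pairs: list) -> bytes:
    """repeated Tag: one length-delimited Tag message per element."""
    return b"".join(length_delimited(field_number, encode_tag(k, v)) for k, v in pairs)


tags = {"plan": "prepaid", "region": "south"}
same = encode_map_field(1, tags) == encode_repeated_tags(1, list(tags.items()))
print(same)
```

Because the bytes match, switching between the two forms changes what the generated code does with the data, not how much data is sent.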
⚠ SLOWER:
message Subscriber {
  // Map has overhead
  map<string, string> tags = 1;
}
✓ FASTER:
message Tag {
  string key = 1;
  string value = 2;
}
message Subscriber {
  // Repeated is faster
  repeated Tag tags = 1;
}
✓ DO: Reuse Message Objects
Creating new messages allocates memory. Reuse when processing many messages.
⚠ SLOWER:
# Python - creates a new object each time
for data in stream:
    msg = Subscriber()  # New alloc
    msg.ParseFromString(data)
    process(msg)
✓ FASTER:
# Python - reuse object
msg = Subscriber()  # One alloc
for data in stream:
    msg.Clear()  # Reuse memory (ParseFromString also clears first)
    msg.ParseFromString(data)
    process(msg)  # Must not keep a reference to msg after returning
Organization and Maintainability
✓ DO: Use Packages to Organize Schemas
Packages prevent naming conflicts and organize related messages.
// Good organization
syntax = "proto3";
package telecom.subscriber.v1;  // Clear namespace
message Subscriber {
  string msisdn = 1;
}
message SubscriberList {
  repeated Subscriber subscribers = 1;
}
✓ DO: Add Comments Generously
Future you (and your team) will thank you. Explain non-obvious fields.
message CallRecord {
  // Mobile Subscriber ISDN (international format, e.g., +91-9876543210)
  // Must include country code for international calls
  string caller_msisdn = 1;
  // Call duration in seconds. Rounded down to nearest second.
  // Billing is based on this value, so ensure accuracy.
  int64 duration_seconds = 2;
  // Unix timestamp in milliseconds when call started.
  // Uses server time (UTC) not client time.
  int64 start_timestamp_ms = 3;
}
✓ DO: Version Your Package Names
Include the version in the package name for major changes. This lets old and new versions coexist during a gradual migration.
// subscriber_v1.proto
package telecom.subscriber.v1;
message Subscriber {
  string msisdn = 1;
  string name = 2;
}
// subscriber_v2.proto (major breaking changes)
package telecom.subscriber.v2;
message Subscriber {
  string msisdn = 1;
  PersonName name = 2;  // Changed from string to message
  // Can coexist with v1 during migration
}
Common Mistakes to Avoid
❌ Mistake #1: Using required Fields (proto2)
Problem: Required fields can't be removed without breaking everything.
Solution: Use proto3 (all fields optional) or make everything optional in proto2.
❌ Mistake #2: Reusing Field Numbers
Problem: Old messages will parse into wrong fields.
Solution: Use reserved for deleted field numbers.
❌ Mistake #3: Storing Large Binary Data Directly
Problem: The entire message must be loaded into memory before it can be parsed.
Solution: Store large files elsewhere, keep references in protobuf.
❌ Mistake #4: Not Planning for Extension
Problem: Schema becomes unmaintainable as it grows.
Solution: Leave gaps in field numbers, use reserved blocks.
Quick Reference Checklist
✓ DO
- ✓ Use clear, descriptive field names
- ✓ Reserve 1-15 for frequent fields
- ✓ Add comments to your schema
- ✓ Use enums for fixed values
- ✓ Reserve deleted field numbers
- ✓ Version your package names
- ✓ Keep messages small and focused
✗ DON'T
- ✗ Change field numbers, ever
- ✗ Reuse field numbers
- ✗ Use required fields (proto2)
- ✗ Create "god messages"
- ✗ Store huge binary blobs directly
- ✗ Use non-standard abbreviations in names
- ✗ Delete fields outright (mark them deprecated instead)
Related Resources
External References
Official Documentation
- Proto3 Language Guide - Official syntax reference
- API Best Practices - Google's official recommendations
- Encoding Guide - Understanding wire format
Conclusion
Following these best practices will save you from painful debugging sessions and breaking changes. Protocol Buffers is incredibly powerful when used correctly, but small mistakes can cause big problems down the line.
Start with good schema design, never change field numbers, and always think about backward compatibility. Your future self and your team will thank you. Remember: it's easier to design it right the first time than to fix it later when you have millions of messages in production.